PROMOTIF v 2.0

E. Gail Hutchinson and Janet M. Thornton, Biomolecular Structure and Modelling Unit, Department of Biochemistry and Molecular Biology, University College, Gower Street, London WC1E 6BT.

e-mail gail@uk.ac.ucl.bioc.bsm, or gail@bsm.bioc.ucl.ac.uk (if mailing from outside uk)

PROMOTIF provides details of the location and types of structural motifs in proteins of known structure by analysis of Brookhaven format coordinate files. The current version of the program analyses the following structural features:

Secondary structure			Beta strands
Disulphide bridges			Beta bulges
Beta turns				Beta hairpins
Gamma turns				Beta alpha beta units
Helical geometry			Psi loops
Helical interactions			Beta sheet topology
Main chain hydrogen bonding patterns

The program also produces a summary page, which gives a briefer description of each motif found in the protein. The program can be used to compare motifs in a group of related structures such as an ensemble of NMR structures. A description of the program and some applications has recently been published (Hutchinson & Thornton, 1996). Sample outputs for the program can be viewed on our World Wide Web page (address: http://www.biochem.ucl.ac.uk ).

Availability

The program is freely available for academic users. Industrial users should contact the authors directly. The files can be downloaded from our anonymous ftp server (IP address 128.40.46.11). The files are in the /pub/promotif/v2.0 directory. Please read the LICENSE file, sign it and return it to the authors. If you experience problems in accessing the files by ftp, contact the authors via e-mail at one of the following addresses: gail@uk.ac.ucl.bioc.bsm or thornton@uk.ac.ucl.bioc.bsm .

Installation

The programs can be copied either as a single tar file (in /tar directory) or individually. The following are the source code files and their associated "parameters" files which need to be copied and compiled.


  	       Source file		Parameters file 
		p_bulge2.f		p_bulge.par
		p_disulph2.f		p_disulph.par
		p_hairpin2.f		p_hairpin.par
		p_helix2.f		p_helix.par
		p_hera2.f		p_hera.par
		p_sheet2.f		p_sheet.par
		p_sssum2.f		p_sssum.par
		p_sstruc2.f		p_sstruc.par
		p_turn2.f		p_turn.par
		psplot2.f		psplot.par
		psplotc.f		     "
		psrout.f		     " 
		nmrconvert.f		nmrconvert.par

There are also a number of additional files as shown below:


		makefile2		this will compile the above set of 
					programs on a Unix machine
		promotif2.prm		the standard parameters file which
					can be edited by the user to
					control which outputs are produced
					and the colours to be used
		promotif.scr
		promotif_multi.scr
		promotif_nmr.scr	these are the script files for
					running the program. Make sure
					these are 'user-executable'
		phipsi.mat		data file for Ramachandran plot
		hera_colours		colours file for hydrogen bonding
					diagram

It is easiest if you create a directory and copy all the files into there. Compile the programs by typing:-


	sh makefile2

Set up the following aliases (by adding them to your .cshrc file).

	setenv motifdir 'directory'
	alias promotif $motifdir'/promotif.scr'
	alias promotif_multi $motifdir'/promotif_multi.scr'
	alias promotif_nmr $motifdir'/promotif_nmr.scr'

where directory is the directory in which you have stored the executables of the programs. Once these have been set up the program can be run from any directory.

Running the program

This documentation describes PROMOTIF v2.0, which can now be run in one of 3 possible modes:-

(1) Single protein

This takes as input a single Brookhaven format file of protein coordinates and produces a series of output files for each motif. To run this, type

		
	promotif pdbfile

where pdbfile is the full filename of a Brookhaven format file.

The output consists of:

Various ASCII text files intended to be machine readable for further automatic processing.

Black and white postscript tables

Colour postscript schematic diagrams

(2) Multiple protein

This is used for processing a list of Brookhaven coordinate files. The input file is a list of Brookhaven files and this can be processed in 3 ways:

Postscript (p) option

This produces a series of output text and postscript files as in the single protein version for each of the proteins in the list. To do this, type

	promotif_multi p file

where file is the file containing the list of proteins.
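
The list file can be prepared by hand or generated. The following Python sketch is not part of PROMOTIF; it assumes that the list file holds one coordinate file name per line and that the coordinate files end in ".pdb". Both points are assumptions, and the function name write_list_file is hypothetical.

```python
from pathlib import Path


def write_list_file(pdb_dir, list_path, pattern="*.pdb"):
    """Write the coordinate file names found in pdb_dir to list_path.

    Assumption: one file name per line is the layout expected by
    promotif_multi; check this against your installation.
    """
    names = sorted(str(p) for p in Path(pdb_dir).glob(pattern))
    Path(list_path).write_text("".join(name + "\n" for name in names))
    return names
```

The resulting file would then be passed as the second argument, for example "promotif_multi p proteins.list".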

List (l) option

This produces a text file for each motif. Each file contains a list of all the examples of the particular motif found in the entire list of proteins. To do this, type

	promotif_multi l file

The files created using this option have file names in upper case: BETATURNS, GAMMATURNS, DISULPHIDES, HELICES, HELIX_INT, STRANDS, SHEETS, BULGES, HAIRPINS, BETAALPHABETA, PSILOOPS, SSSUM.

Compare (c) option

This will generate all the motifs for each protein in the list and, in addition, produce a postscript file in which the motifs in the different proteins are compared. To run this, type

	promotif_multi c file

Note: In the current version the program assumes that the structures in the data set are aligned with identical sequence numbers at equivalent positions in all structures. If you run this option on a list of unrelated proteins the results of the comparison will be meaningless.

(3) NMR ensemble

This is used for processing a file containing an ensemble of NMR structures. The program generates individual coordinate files for each of the members of the ensemble and these are then treated as in the multiple protein mode above.

Postscript (p) option

This will produce a set of output postscript files giving details of the motifs in each of the members of the ensemble. To do this, type

	
	promotif_nmr p nmrfile

where nmrfile is a Brookhaven file containing an ensemble of NMR structures.

List (l) option

This produces a flat file for each motif. Each file contains a list of all the examples of the particular motif found in the NMR ensemble. To do this, type

	promotif_nmr l nmrfile

Compare (c) option

This will generate all the motifs for each member of the ensemble and, in addition, produce a postscript file in which the motifs in the different members of the ensemble are compared. To run this, type

	promotif_nmr c nmrfile

Options

The user can control which outputs are produced and the colours to be used by editing the standard parameters file promotif2.prm. There is also the option to produce all the outputs in black and white only. See Appendix I for details of how to modify this file.

Details of PROMOTIF Output

Most of the motifs are identified and classified according to rules defined in published papers. A more detailed description of the analysis and output for each motif follows below.

Secondary Structure

The program calculates the secondary structure of the protein using a local implementation (D. K. Smith, unpublished data) of the DSSP algorithm of Kabsch and Sander (1983). In the standard DSSP algorithm, a residue is included in a secondary structure only if its NH and CO groups form the appropriate hydrogen bonds or alternatively, for beta sheets only, if the CO(i-1) and NH(i+1) groups are involved in the appropriate hydrogen bonds. This gives assignments which broadly agree with IUPAC rule 6.2 (1970), which states that, to be involved in a particular secondary structure, a residue should have phi and psi values close to the ideal values for that secondary structure.

The slightly modified algorithm used in this suite of programs conforms to IUPAC convention rule 6.3, according to which a residue is considered part of a beta sheet or alpha helix if either its NH or CO groups are involved in the appropriate hydrogen bonds. In practice this means that one extra residue is added to the ends of each strand and helix where possible. These extra residues are classified using lower case letters for the secondary structure, while the remainder of the residues in secondary structures are indicated using upper case letters (E for beta strands, H for alpha helices and G for 3,10 helices). This rule is the one most commonly used amongst crystallographers. The secondary structure output is given in the file pdbn.sst, where 'pdbn' represents the characters preceding the decimal point in the Brookhaven file name (i.e. usually the Brookhaven code). This secondary structure file provides the raw data used for the remainder of the analyses.
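
The 'pdbn' prefix recurs in every output file name described in this document. As an illustration only, the Python sketch below derives it from an input file name. It assumes that any directory part is dropped and that the name is cut at the first decimal point; neither detail is stated above, and the function names are hypothetical.

```python
import os


def pdbn(filename):
    """Return the characters preceding the decimal point in the file name.

    Assumptions: the directory part is ignored and the FIRST '.' is used.
    """
    return os.path.basename(filename).split(".", 1)[0]


def output_name(filename, suffix):
    """Build an output file name such as '1abc.sst' or '1abc_bturn_tab01.ps'."""
    return pdbn(filename) + suffix
```

For an input file called 1abc.pdb this gives 1abc.sst for the secondary structure file.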

Beta Turns

A beta turn is defined for 4 consecutive residues (denoted by i, i+1, i+2 and i+3) if the distance between the Calpha atom of residue i and the Calpha atom of residue i+3 is less than 7 Angstroms and if the central two residues are not helical (either using the Kabsch and Sander criteria or using author defined criteria) (Lewis, 1973).
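
The definition above can be sketched in Python as follows. This is an illustration, not the PROMOTIF source. It assumes that the Calpha coordinates are given as (x, y, z) tuples in Angstroms, that the secondary structure is a string with one letter per residue, that H and G (in either case) count as helical, and that a window is rejected if either central residue is helical. It also ignores chain breaks.

```python
import math

HELIX_CODES = {"H", "G"}  # assumption: alpha and 3,10 helix both count as helical


def find_beta_turns(ca_coords, sec_struct, max_dist=7.0):
    """Return the start indices i of windows i..i+3 that satisfy the
    distance criterion and have no helical central residue."""
    turns = []
    for i in range(len(ca_coords) - 3):
        central = sec_struct[i + 1:i + 3].upper()
        if any(code in HELIX_CODES for code in central):
            continue  # central residues i+1, i+2 must not be helical
        if math.dist(ca_coords[i], ca_coords[i + 3]) < max_dist:
            turns.append(i)
    return turns
```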

The turns are assigned to one of 9 classes on the basis of the phi, psi angles of residues i+1 and i+2. The ideal angles for each of the turn types are as follows:


Type	    Phi(i+1)	 Psi(i+1)	Phi(i+2)	Psi(i+2)
I	     -60	   -30		  -90		     0
II	     -60	   120		   80		     0
VIII	     -60	   -30		 -120              120
I'	      60	    30		   90		     0
II'	      60	  -120		  -80		     0
VIa1	     -60	   120		  -90		     0 cis-proline(i+2)
VIa2	    -120	   120		  -60		     0 cis-proline(i+2)
VIb	    -135	   135		  -75		   160 cis-proline(i+2)
IV	    turns excluded from all the above categories

With the exception of the type VI turns these angles were originally defined by Venkatachalam (1968). The angles for the type VI turns were originally defined by Richardson (1981). We have used the nomenclature VIa1 and VIa2 to distinguish between two subclasses of type VIa turns with the phi, psi angles of residue i+1 in the beta and polyproline region of the Ramachandran plot (Hutchinson and Thornton 1994).

The phi and psi angles are allowed to vary by +/- 30 degrees from these ideal values with the added flexibility of one angle being allowed to deviate by as much as 40 degrees. Types VIa1, VIa2 and VIb turns are subject to the additional condition that residue i+2 must be a cis-proline. Turns which do not fit any of the above criteria are classified as type IV.
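
The classification rule can be illustrated with the Python sketch below, which uses the ideal angles from the table above. It is not the PROMOTIF source. The cis-proline is taken at residue i+2, as marked in the table. The sketch assumes that angle differences wrap around at 360 degrees and that the classes are tested in table order, with the first match taken; the order of testing is not specified in this document.

```python
# (type, (phi(i+1), psi(i+1), phi(i+2), psi(i+2)), needs cis-proline at i+2)
IDEAL_TURNS = [
    ("I",    (-60, -30, -90, 0), False),
    ("II",   (-60, 120, 80, 0), False),
    ("VIII", (-60, -30, -120, 120), False),
    ("I'",   (60, 30, 90, 0), False),
    ("II'",  (60, -120, -80, 0), False),
    ("VIa1", (-60, 120, -90, 0), True),
    ("VIa2", (-120, 120, -60, 0), True),
    ("VIb",  (-135, 135, -75, 160), True),
]


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def classify_beta_turn(phi1, psi1, phi2, psi2, cis_pro_i2=False):
    """Return the turn type; turns that fit no class are type IV."""
    observed = (phi1, psi1, phi2, psi2)
    for name, ideal, needs_cis in IDEAL_TURNS:
        if needs_cis and not cis_pro_i2:
            continue
        devs = [angle_diff(o, t) for o, t in zip(observed, ideal)]
        over_30 = sum(1 for d in devs if d > 30.0)
        # all angles within 30 degrees, except that one may reach 40
        if max(devs) <= 40.0 and over_30 <= 1:
            return name
    return "IV"
```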

Promotif output data for beta turns

pdbn_bturn_tab01.ps

A postscript table giving details of each beta turn in the protein; pdbn represents the characters before the decimal point in the file name, i.e. usually the Brookhaven code of the protein. From left to right are listed the sequence numbers of the first (i) and last (i+3) residues in each turn, the one-letter amino acid codes for each of the four residues in the turn, the turn type, the phi and psi angles of residues i+1 and i+2, and the regions of the Ramachandran plot occupied by residues i+1 and i+2 (Appendix II). The final columns show the chi1 angles of residues i+1 and i+2 and the distance between the Calpha atoms of residues i and i+3. Once this page is full, subsequent turns are recorded in files named pdbn_bturn_tab02.ps, pdbn_bturn_tab03.ps etc.

pdbn_bturns_01.ps

A set of colour postscript schematic diagrams, one for each turn. Each diagram provides a Ramachandran plot with residues i+1 and i+2 plotted on it, as well as a schematic plot of the turn with the 4 residues and the Calpha(i) to Calpha(i+3) distance marked, and arrows to indicate whether or not a hydrogen bond is formed between the CO of residue i and the NH of residue i+3. The residue numbers and turn type are indicated above the Ramachandran plot. 16 beta turns are plotted on each page; subsequent turns are plotted in file pdbn_bturns_02.ps etc.

pdbn.bturns

A flat file containing the same information as in pdbn_bturn_tab01.ps. The information is in the order: residue number and one-letter amino acid code of residues i, i+1, i+2 and i+3; turn type, Ramachandran regions of residues i+1 and i+2, phi(i+1), psi(i+1), phi(i+2), psi(i+2); Y or N to indicate whether or not a hydrogen bond is formed between the NH of residue i+3 and the CO of residue i; chi1(i+1), chi1(i+2); logical sequence number corresponding to the first residue in the turn.

BETATURNS

This is the corresponding flat file created when promotif_multi or promotif_nmr is run using the list (l) option. The data are the same as above, but in addition the protein structure from which each turn was derived is given in the first column.

Gamma Turns

A gamma turn is defined for 3 residues i, i+1, i+2 if a hydrogen bond exists between residues i and i+2 and the phi and psi angles of residue i+1 fall within 40 degrees of one of the following 2 classes (Rose et al., 1985; Milner-White et al., 1988):


turn type	       phi(i+1)		       psi(i+1)
classic	 		75.0			-64.0
inverse		       -79.0			 69.0
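
The gamma turn test can be sketched in Python as below. This is an illustration rather than the PROMOTIF source. It assumes that "within 40 degrees" applies to phi and psi separately and that the hydrogen bond between residues i and i+2 has already been detected elsewhere; the function names are hypothetical.

```python
GAMMA_IDEAL = {
    "classic": (75.0, -64.0),
    "inverse": (-79.0, 69.0),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def classify_gamma_turn(phi, psi, hbond_i_to_i2, tol=40.0):
    """Return 'classic', 'inverse' or None for the central residue i+1."""
    if not hbond_i_to_i2:
        return None  # the i to i+2 hydrogen bond is required
    for name, (ideal_phi, ideal_psi) in GAMMA_IDEAL.items():
        if angle_diff(phi, ideal_phi) <= tol and angle_diff(psi, ideal_psi) <= tol:
            return name
    return None
```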

Promotif output data for gamma turns

pdbn_gturn_tab01.ps

A postscript table giving details of each gamma turn in the protein; pdbn represents the characters before the decimal point in the file name, i.e. usually the Brookhaven code of the protein. From left to right are listed the sequence numbers of the first (i) and last (i+2) residues in the turn, the one-letter amino acid codes for each of the three residues in the turn, the turn type (classic or inverse), the phi and psi angles of residue i+1 and the distance between the Calpha atoms of residues i and i+2. Once this page is full, subsequent turns are recorded in files named pdbn_gturn_tab02.ps, pdbn_gturn_tab03.ps etc.

pdbn_gturns_01.ps

A set of colour postscript schematic diagrams, one for each turn. This provides a Ramachandran plot with residue i+1 plotted on it, and a schematic plot of the turn with the 3 residues and the Calpha(i)-Calpha(i+2) distance marked and arrows to indicate the i to i+2 hydrogen bond. The residue numbers and turn type are indicated above the Ramachandran plot.

pdbn.gturns

A flat file containing the same information as in pdbn_gturn_tab01.ps. The information is in the order: residue number and one-letter amino acid code of residues i, i+1 and i+2, turn type, Ramachandran region of residue i+1, phi(i+1), psi(i+1); distance between the Calpha atoms of residues i and i+2; chi1(i+1); logical sequence number corresponding to the first residue in the turn.

GAMMATURNS

This is the corresponding flat file created when promotif_multi or promotif_nmr is run using the list (l) option. The data are the same as above, but in addition the name of the structure in which the turn was found is recorded in the first column.

Beta Bulges

A beta bulge is a region of irregularity in a beta sheet, where the normal pattern of hydrogen bonding is disrupted e.g. by the insertion of an extra residue. Using the definition of beta strands and main-chain hydrogen bonds provided by the Kabsch and Sander algorithm the program identifies such irregularities and classifies them as described in Chan et al. (1993). The bulges are defined as parallel or antiparallel depending on whether they occur in parallel or antiparallel regions of beta sheet. Within each of these categories bulges are further subdivided into classic, wide, bent, G1 and special types depending on the number of residues involved and the hydrogen bonding pattern. Classic and wide bulges both involve an extra residue on one beta strand relative to its neighbouring strand. In antiparallel beta sheet the classic bulges occur where the extra residue is between two narrowly spaced pairs of hydrogen bonds, whilst in the case of the wide bulges the extra residue is between the widely spaced pairs of hydrogen bonds. Corresponding hydrogen bonding patterns for parallel classic and wide bulges can be found in Chan et al. (1993). Bent bulges occur much less frequently, and have one extra residue on both strand partners. The term special bulges is used to refer to several possible situations where there can be up to 3 extra residues in one strand. G1 bulges occur only in antiparallel sheets; in these cases residue 1 is in the alpha left conformation and is therefore usually glycine. This usually occurs at the end of a beta strand.

Promotif output data for beta bulges

pdbn_bulge_tab01.ps

A postscript table giving details of each beta bulge in the protein; pdbn represents the characters before the decimal point in the file name, i.e. usually the Brookhaven code of the protein. From left to right are the residue numbers of residue X (on the normal strand) and residues 1 and 2 (on the bulged strand), and the one-letter amino acid code for each of these residues. The bulge type is described using two letters: the first letter is P or A, depending on whether the bulge involves parallel or antiparallel beta strands; the second letter denotes the class (Classic, Wide, G1, Bent or Special). The final columns list the phi and psi angles of residues X, 1 and 2. At the moment just these residues are represented in the table, although there may be some additional residues involved in special bulges. These are listed in the flat file described below.

pdbn_bulges_01.ps etc

Colour postscript schematic diagrams for each bulge in the protein. For each bulge there is a Ramachandran plot displaying the phi,psi angles of the residues involved in the bulge. The 'normal' residues in the bulge (X, 1 and 2) are indicated by one colour (pink in the default setting of the program), and any remaining residues (3 and 4 if they occur) are displayed in a second colour (sky blue as a default). The type of bulge is indicated above the Ramachandran plot. To the right of the Ramachandran plot there is a schematic diagram indicating the residues and hydrogen bonding in and around the beta bulge. There are 10 bulges per page; further bulges, where present are represented in subsequent files: pdbn_bulges_02.ps etc.

pdbn.blg

A flat file containing the same information as in pdbn_bulge_tab01.ps. The information is in the order: bulge type indicated by two letters; (residue number, one-letter amino acid code, phi, psi) of residues X, 1, 2, 3 and 4. Most of the bulges will have just residues X, 1 and 2; the special bulges may have one or more of residues 3 and 4, and the bent bulges will just have residues 1 and 2. If the residues are not present in a given bulge their phi and psi angles are given as -999.9. (The final numbers in each row give sequential numbering of the residues in the bulge, for use by the plotting program).

BULGES

This is the corresponding flat file created when promotif_multi or promotif_nmr is run using the list (l) option.

Helices

The helix termini and helix type are identified directly from the secondary structure assignment program.

Promotif output data for helices

pdbn_helix_tab01.ps

A postscript table giving details of each helix in the protein; pdbn represents the characters before the decimal point in the file name, i.e. usually the Brookhaven code of the protein. The helices are numbered consecutively from the N-terminus to allow helix interactions to be described. The start and end residues of each helix, helix type (alpha or 3,10), number of residues and amino acid sequence are given for each helix.

pdbn_helix_geom01.ps

A postscript table giving details of the geometry of each alpha helix. From left to right are the helix number, the length and unit rise (both in Angstroms), the number of residues per turn (ideally 3.6 for alpha helices), the helix pitch in Angstroms and a measure of the deviation of the helix geometry from an ideal helix (in degrees). This latter value should be 0 for a perfect helix. These parameters are not calculated for helices with fewer than four residues.
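
The columns of this table are related by a standard geometric identity: the pitch of a helix equals the unit rise multiplied by the number of residues per turn. The Python sketch below shows this relationship only; how PROMOTIF itself derives the values is not described here, and the function name is hypothetical.

```python
def helix_pitch(unit_rise, residues_per_turn, n_residues):
    """Pitch in Angstroms, or None for helices with fewer than four residues.

    Uses the identity pitch = unit rise x residues per turn.
    """
    if n_residues < 4:
        return None  # parameters are not calculated for very short helices
    return unit_rise * residues_per_turn
```

For an ideal alpha helix with a unit rise of 1.5 Angstroms and 3.6 residues per turn, this gives a pitch of about 5.4 Angstroms.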

pdbn_helix_diag01.ps

A colour postscript file providing schematic diagrams - helical wheels and helical nets for each helix. The residues are colour coded for hydrophobic (green as default), polar (blue) and charged (red) amino acid types. The wheels and nets assume the ideal helical value of 3.6 residues per turn.
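
With 3.6 residues per turn, successive residues on a helical wheel are separated by 360/3.6 = 100 degrees. The Python sketch below computes the wheel angles on that basis. It is an illustration of the geometry only and does not reproduce the PROMOTIF plotting code; the starting angle of zero for the first residue is an assumption.

```python
def wheel_angles(n_residues, residues_per_turn=3.6):
    """Angular position (degrees, 0 <= angle < 360) of each residue
    on a helical wheel, with the first residue placed at 0 degrees."""
    step = 360.0 / residues_per_turn  # 100 degrees for an ideal alpha helix
    return [(i * step) % 360.0 for i in range(n_residues)]
```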

pdbn_helix_int01.ps

A postscript table giving details of the interacting pairs of helices in the protein. Two helices are defined as "interacting" if they contain one or more atoms within 4.5 Angstroms of the other helix. From left to right the data represent the helix numbers for the two helices involved, their distance of closest approach (in Angstroms), the omega angle between them, the number of interacting pairs of residues and the number of residues in each of the two helices involved in the interaction.
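
The interaction criterion can be sketched in Python as follows. This is an illustration, not the PROMOTIF source. It assumes that each helix is supplied as a list of (x, y, z) atom coordinates in Angstroms and that "within 4.5 Angstroms" includes the boundary value; it uses a simple all-pairs search.

```python
import math


def closest_approach(atoms_a, atoms_b):
    """Smallest interatomic distance between two sets of atom coordinates."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def helices_interact(atoms_a, atoms_b, cutoff=4.5):
    """True if any atom of one helix lies within cutoff of the other helix."""
    return closest_approach(atoms_a, atoms_b) <= cutoff
```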

pdbn.hlx

This is a flat file containing the same information as in pdbn_helix_tab01.ps, pdbn_helix_geom01.ps and pdbn_helix_int01.ps. The first part of the file contains the data for each helix. The order of the information is as follows: helix number, chain letter, the first and last residue numbers of the helix, helix type (H for alpha helix or G for 3,10 helix). This is followed by the number of residues in the helix and a 9-letter character string indicating the secondary structures immediately surrounding the helix, from 4 secondary structures before the helix to 4 secondary structures after the helix. These are classed as E for strand, T for turn, H for alpha helix, G for 3,10 helix. Then follow various pieces of geometrical information about each helix: length in Angstroms, unit rise, number of residues per turn, pitch and a measure of the linearity of the helix. The final information in the file is the sequence of the helix recorded as a string of 1-letter codes.

The second section of the file gives details of the interacting pairs of helices in the protein. For each such pair the file gives the helix numbers and helix type for each helix, distance of closest approach and the omega angle between the helices. The next two letters indicate where in each of the two helices the position of closest approach to the other helix is located (N: beyond the N-terminus, C: beyond the C-terminus, I: within the helix). Also recorded are the number of residues in each of the helices which are involved in the interaction, as well as the total number of interacting residues.

HELICES

The flat file for helices created using the list option.

HELIX_INT

The flat file for helix interactions created using the list option. The file contains the same information as above. The Brookhaven code of the protein in which the helix interaction occurs is also given in the first column.

Strands, Sheets, Beta-Alpha-Beta Units and Psi-loops

These data are produced by a single program, which classifies the strands and the sheets to which they belong according to the output of the Secondary Structure Assignment program.

Promotif output for strands, sheets etc

pdbn_strands_tab01.ps

A postscript table giving the strand number, start and end residues, the letter corresponding to the sheet in which the strand is involved, number of residues and sequence of each strand in the protein. The strands are numbered consecutively from the N-terminus of the protein.

pdbn_sheets_tab01.ps

A postscript table giving details of each beta sheet in the protein. The sheet letter is given, as well as the number of strands, the nature of the sheet (antiparallel, parallel or mixed). If the sheet forms a closed beta-barrel this is also mentioned here. Finally the table gives the topology of the sheet, using the nomenclature of Richardson (1981).

pdbn_motifs_tab01.ps

This contains up to two tables. The first gives details of the beta-alpha-beta units, which consist of two parallel hydrogen bonded beta strands connected by an alpha helix. The table gives the start and end residues of each of the two strands, the number of residues in each of the strands, the length of the loop and the length of the helical part of the loop. The second table gives details of the psi-loops, if any. These consist of two antiparallel strands connected by a '+2' connection, i.e. with one strand in between, hydrogen bonded to both of them (Tang et al., 1978). In contrast to the beta-alpha-beta units these occur very rarely in proteins (Hutchinson & Thornton, 1990). Just the start and end residues of each of the two strands involved are included at the moment.

pdbn.str

This is a flat file giving information about each strand in the protein as in pdbn_strands_tab01.ps. Some additional information is given relative to the postscript tables. The order of information is as follows: strand number; initial and final residues; sheet letter; number of residues in strand; amino acid sequence; number of hydrogen bonding partners of strand; 4 columns representing the strand numbers of these partners (if there are fewer than 4 partners, the remaining numbers are 0); 4 signs giving the orientation of these partners relative to the current strand (- for antiparallel and + for parallel); Richardson nomenclature for the topology of the connection to the next strand in the sequence (99 means that the next strand is in a different sheet); a character string to represent 4 secondary structure units before and 4 secondary structure units after the strand (E for strand, H for alpha helix, G for 3,10 helix and T for turn).

pdbn.sht

This flat file contains details of the beta sheets in the protein as in pdbn_sheets_tab01.ps. Two lines of data are given for each sheet. The first line gives the sheet letter, the number of strands in the sheet, a letter indicating whether the sheet is parallel (P), antiparallel (A) or mixed (M) and a letter indicating whether the sheet is a closed barrel (Y) or not (N). The second row gives the topology of the sheet according to the nomenclature of Richardson (1981).

pdbn.bab

This file gives information about the beta alpha beta units in the protein, if any. From left to right the data represent the initial and final residues of strand 1, initial and final residues of strand 2, length of strand 1, length of strand 2, length of loop, length of helix. (The final 2 numbers represent the logical sequence number of the beginning and end residues; these are used internally by the rest of the program).

pdbn.psi

This file gives information about the psi-loops in the protein, if any. From left to right the data represent the initial and final residues of strand 1, initial and final residues of strand 2 and the logical sequence number of the beginning and end residues.

SHEETS, STRANDS, BETAALPHABETA, PSILOOPS

These are the corresponding flat files created when promotif_multi or promotif_nmr is run using the list (l) option. They contain the same data as in the above files, and the name of the structure in which each motif was found is also recorded in each case.

Hairpins

Beta hairpins consist of two beta-strands which are antiparallel and hydrogen bonded together (connected by at least one bridge). The hairpins are classified as in Sibanda et al. (1989) using two numbers X:Y, which denote the number of residues in the loop defined using two different IUPAC conventions. If the end strand residues form two hydrogen bonds, then X=Y. If the distal hydrogen bond is not formed the number of residues in the loop depends on which definition of strand residues is used, according to the IUPAC convention. In practice, if the end hydrogen bond is not formed then Y=X+2.

For the smaller loops the hairpins are dominated by the formation of beta turns (usually I' and II'). The 3:5 hairpins are dominated by one well defined conformation which can be described as a type I turn followed by a G1 bulge. The most common class among the 4:4 hairpins contains a type I beta turn. Where these particular conformations occur, they are indicated by appropriate letters after the main classification.

If there is a break in the polypeptide chain within the loop region the hairpin will not be classed by PROMOTIF and the classification will be given as 0:0.
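
The X:Y rule described above can be written as a short Python sketch. It is an illustration only: it takes the loop length X and the state of the end hydrogen bonds as inputs, and does not attempt to derive them from coordinates. The function and parameter names are hypothetical.

```python
def hairpin_class(x, both_end_hbonds, chain_break=False):
    """Return the X:Y hairpin class as a string.

    x               -- number of loop residues under the first convention
    both_end_hbonds -- True if the end strand residues form two hydrogen bonds
    chain_break     -- True if the chain is broken within the loop region
    """
    if chain_break:
        return "0:0"  # hairpin is not classified
    y = x if both_end_hbonds else x + 2
    return f"{x}:{y}"
```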

Promotif output for hairpins

pdbn_hairpin_tab01.ps

A black and white postscript table giving the start and end residues for each of the two beta-strands involved in the hairpin, the number of residues in each of these two strands and the hairpin class.

pdbn_hairpins_01.ps

A colour postscript schematic diagram for each hairpin. The left hand diagram shows the residue numbers and hairpin classification, with the lengths of the strands in the diagram proportional to the number of residues. The right hand diagram shows the sequence of the hairpin, with the strand residues in green and the loop residues in purple. The main-chain hydrogen bonds are indicated in pink.

pdbn.hpin

A flat file giving information as in pdbn_hairpin_tab01.ps for each hairpin. From left to right the information is as follows: strand numbers of the two strands involved, beginning and end residues of each strand, hairpin classification, number of residues in the two strands, sequence of strand 1, sequence of loop, sequence of strand 2, a string representing phi,psi classification of the loop, and various other numbers for use internally by the plotting program.

HAIRPINS

This is the corresponding flat file created when promotif_multi or promotif_nmr is run using the list (l) option. The file contains the same data as above, and the name of the structure in which the hairpin was found is also recorded.

Disulphide Bridges

Disulphide bridges are identified for two cysteine residues whose sulphur atoms are less than 3 Angstroms apart. Richardson (1981) identified several categories of disulphide bridges based on their internal chi angles, in particular the chi2, chi3 and chi2' angles. We have loosely classified disulphides into the categories shown below, based on the signs of these angles.


Disulphide type		Chi2		Chi3		Chi2'
left handed spiral	 -		 -		 -
right handed hook	 +		 +		 -
right handed spiral	 +		 +		 +
short right handed hook	 -		 +		 -

Note that the chi2 and chi2' values can be interchanged, as they merely reflect which of the two cysteines involved in the bridge is mentioned first.
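
The sign-based classification in the table above can be sketched in Python as follows. This is an illustration, not the PROMOTIF source. It assumes that an angle of exactly zero counts as positive and that a bridge matching no row, in either order of the two cysteines, is left unassigned (returned as None).

```python
import math

SIGN_CLASSES = {
    ("-", "-", "-"): "left handed spiral",
    ("+", "+", "-"): "right handed hook",
    ("+", "+", "+"): "right handed spiral",
    ("-", "+", "-"): "short right handed hook",
}


def is_disulphide(sg1, sg2, cutoff=3.0):
    """True if the two sulphur atoms are less than cutoff Angstroms apart."""
    return math.dist(sg1, sg2) < cutoff


def _sign(angle):
    return "+" if angle >= 0 else "-"  # assumption: zero is treated as positive


def classify_disulphide(chi2, chi3, chi2_prime):
    """Return the class name from the signs of chi2, chi3 and chi2',
    trying both orders of the two cysteines; None if unassigned."""
    key = (_sign(chi2), _sign(chi3), _sign(chi2_prime))
    swapped = (key[2], key[1], key[0])
    return SIGN_CLASSES.get(key) or SIGN_CLASSES.get(swapped)
```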

Richardson found that the majority of disulphides could be classed as left handed spirals or right handed hooks.

Promotif output for disulphide bridges

pdbn_disulph_01.ps

A postscript table which gives details of each disulphide bridge found in the protein, the residue numbers of the two cysteines involved, chi1, chi2, chi3, chi2' and chi1' values, the distance between the Calpha atoms of the residues involved and the classification of the disulphide bridge, according to the above table, where assigned.

pdbn.dsf

A flat file containing details of each disulphide bridge found in the protein as in pdbn_disulph_01.ps. The type of bridge is abbreviated (RHH: right handed hook; SRH: short right handed hook; LHS: left handed spiral; RHS: right handed spiral), and the final two columns in the file are used internally by the program.

DISULPHIDES

This is the corresponding flat file created when promotif_multi or promotif_nmr is run using the list (l) option. The data are the same as above, and the name of the structure in which the disulphide bridge was found is also recorded.

Main Chain Hydrogen Bonding Patterns

These are represented by hydrogen bonding diagrams drawn using the HERA program (Hutchinson and Thornton, 1990). The full version of the program and documentation is available separately from the authors. At the moment the version included with PROMOTIF simply draws a hydrogen bonding diagram of all sheets and helices in the given protein. More flexibility will be added later. The output file containing this diagram is called pdbn.her, where, again, pdbn represents the Brookhaven code of the protein used.

Summary Page

The above detailed description for each motif can be quite lengthy and is not always required. PROMOTIF also produces a summary page for use when only a brief summary of the motifs in a protein is required.

Promotif summary data

pdbn_summary_01.ps

A page consisting of a set of postscript tables giving a summary of the secondary structure and motifs in the protein. The top part of the diagram shows the name of the protein, the amino acid sequence and the number of residues and chains. The number of residues and sequence refer to those residues actually observed in the electron density. Any difference between this and the actual sequence is indicated by the number of disordered residues. The number and percentage of residues in each secondary structural type are shown below this. A string of characters indicates the sequence of secondary structures as assigned by the Kabsch and Sander algorithm. In this E represents strands, H and G represent alpha and 3,10 helices respectively and T represents turns. For turns in particular this may not correspond exactly to the assignments by PROMOTIF, because the Kabsch and Sander assignments are based solely on hydrogen bonding criteria. In the second part of the figure mini-tables summarise the location of each type of motif in the protein as found by PROMOTIF.

pdbn.sum

A flat file giving summary secondary structure information as in pdbn_summary_01.ps. The file gives the protein name identified from the Brookhaven file, the amino-acid sequence and the number of residues and chains. It also gives the number of strands, alpha-helices and 3,10 helices, the percentage of residues in each of these secondary structure types, and a character string representing the sequence of secondary structures in the protein.

SECSUM

This is the corresponding flat file created when promotif_multi or promotif_nmr is run using the list (l) option.

Comparison

When PROMOTIF is run on a set of related proteins (e.g. members of an ensemble of NMR structures) using the compare (c) option it generates a postscript figure in which the motifs in the proteins are compared. In the current version the program assumes that the structures in the data set are aligned with identical sequence positions in all structures.

The left hand side of the figure represents consensus information about the set of proteins. By default the consensus is calculated using all motifs that occur in more than 50% of the set of structures. The fraction of structures used to calculate the consensus can be varied by editing the promotif2.prm file (Appendix I). The columns show, from left to right, the residue number (No.), amino acid sequence (Seq) and consensus secondary structure assignments (SS) derived using the modified Kabsch and Sander algorithm (h/H, alpha-helix; t/T, turn; e/E, beta-strand; and S, bend). The remaining four columns of the consensus structure indicate the locations of beta-turns (BT), gamma-turns (GT), beta-bulges (BG) and disulphide bridges (DS) in the consensus structure. Where present, these motifs are indicated by the class of the motif in the appropriate column for a given residue. For beta- and gamma-turns, in addition to the various turn types (I, I', II, II', IV, VIII, VIa1, VIa2 and VIb for beta-turns and IN(VERSE) and CL(ASSIC) for gamma turns) a residue can also be classified as part of a composite turn (C) if it is involved in more than one turn or simply beta or gamma if the consensus structure has a turn, but there is no dominant turn type. Bulges are indicated by two letters, A or P, depending on whether the strands are antiparallel or parallel, and C(lassic), W(ide), S(pecial) or B(ent), depending on the pattern of hydrogen bonds.
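The consensus rule for turns can be sketched in Python as follows. This is a simplified illustration and not the code used by PROMOTIF. In particular, treating a turn type as "dominant" when that single type occurs in more than the threshold percentage of structures is an assumption, as are the function and argument names.

```python
from collections import Counter

def consensus_turn(types, threshold_percent=50.0, generic="beta"):
    """Consensus turn label for one residue position.

    types holds one entry per structure: a turn type such as "I" or "II",
    or None if that structure has no turn at this residue.
    """
    n = len(types)
    if n == 0:
        return None
    present = [t for t in types if t is not None]
    # A turn enters the consensus only if it occurs in MORE than the
    # threshold percentage of structures (50% by default).
    if 100.0 * len(present) / n <= threshold_percent:
        return None
    top_type, top_count = Counter(present).most_common(1)[0]
    # Assumed definition of a dominant type: one type alone passes the threshold.
    if 100.0 * top_count / n > threshold_percent:
        return top_type
    # A turn is present but no type dominates: report the generic label.
    return generic

print(consensus_turn(["I", "I", "I", None]))     # I
print(consensus_turn(["I", "II", "IV", None]))   # beta
print(consensus_turn(["I", "I", None, None]))    # None
```

The last call returns None because exactly 50% of the structures have a turn, which is not more than the default threshold.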

The remainder of the figure represents differences from this consensus structure for each of the proteins in the data set. The numbers at the top of the columns represent the individual proteins in the list. The left hand column of the data for each protein highlights differences in secondary structure - extra secondary structure is indicated by the appropriate letter and secondary structure missing with respect to the consensus is indicated by the consensus structure with an X through it. The remaining space indicates differences in the turns, bulges and disulphides. If one of these is present in a particular structure and absent in the consensus, or if the motif type is different from the consensus, the residue is marked with the motif type. If a motif present in the consensus is absent from an individual structure, this is indicated by a cross through the motif (beta: beta-turn; gamma: gamma turn, BG: bulge).
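The marking of differences for turns, bulges and disulphides can be sketched in Python as below. The sketch is illustrative only and is not taken from PROMOTIF. The "X:" prefix is a hypothetical text stand-in for the cross drawn through a motif in the postscript figure, and the function name is likewise an assumption.

```python
def difference_marks(consensus, structure):
    """Compare one structure with the consensus, residue by residue.

    Both arguments are lists with one entry per residue: a motif type
    such as "I", or None where no motif is present.
    """
    if len(consensus) != len(structure):
        raise ValueError("consensus and structure must have the same length")
    marks = []
    for cons, obs in zip(consensus, structure):
        if obs == cons:
            marks.append("")            # agrees with consensus: nothing drawn
        elif obs is None:
            marks.append("X:" + cons)   # consensus motif missing: crossed out
        else:
            marks.append(obs)           # extra motif or different type: show type
    return marks

consensus = ["I", None, "II", "I"]
structure = ["I", "IV", None, "II"]
print(difference_marks(consensus, structure))   # ['', 'IV', 'X:II', 'II']
```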

Each page of the output contains information about a maximum of 76 residues (vertically) and 16 structures (horizontally). The names of the postscript files depend on the name of the file containing the list of proteins. If, for example, you have run

	      promotif_multi c list

the first file of comparison data (showing the first 76 residues of the first 16 proteins in the list) will be called list01_1.ps. Data for subsequent residues are found by incrementing the first number, thus list02_1.ps will contain the next 76 residues for the same 16 proteins. Data for subsequent proteins are found by incrementing the second number, thus list01_2.ps will contain information on the first 76 residues for the next set of 16 proteins. If you have run promotif_nmr the program generates a list of proteins in a file called nmrlist and thus the comparison data will always be in files nmrlist01_1.ps, nmrlist02_1.ps etc.
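The naming scheme described above can be summarised by a short Python sketch that works out which file holds a given residue and protein. The function name is hypothetical, positions are counted from 1 in the order in which residues and proteins appear in the output, and the two-digit padding of the first number is inferred from the examples above (list01_1.ps, list02_1.ps).

```python
def comparison_filename(list_name, residue_position, protein_position,
                        residues_per_page=76, proteins_per_page=16):
    """Name of the postscript file holding a given residue and protein."""
    # The first number advances every 76 residues.
    residue_page = (residue_position - 1) // residues_per_page + 1
    # The second number advances every 16 proteins.
    protein_page = (protein_position - 1) // proteins_per_page + 1
    return f"{list_name}{residue_page:02d}_{protein_page}.ps"

print(comparison_filename("list", 1, 1))       # list01_1.ps
print(comparison_filename("list", 77, 1))      # list02_1.ps
print(comparison_filename("list", 1, 17))      # list01_2.ps
print(comparison_filename("nmrlist", 76, 16))  # nmrlist01_1.ps
```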

References:

Chan, A. W. E., Hutchinson, E. G., Harris, D. & Thornton, J. M. (1993) Identification, classification and analysis of beta-bulges in proteins. Protein Science 2, 1574-1590.

Efimov, A. V. (1991) Structure of alpha-alpha hairpins with short connections. Protein Engineering 4, 245-250.

Hutchinson, E. G. & Thornton, J. M. (1990) HERA - A program to draw schematic diagrams of protein secondary structure. Proteins Struct. Funct. Genet. 8, 203-212.

Hutchinson, E. G. & Thornton, J. M. (1994) A revised set of potentials for beta turn formation in proteins. Protein Science 3, 2207-2216.

Hutchinson, E. G. & Thornton, J. M. (1996) PROMOTIF - A program to identify and analyze structural motifs in proteins. Protein Science 5, 212-220.

IUPAC-IUB Commission on Biochemical Nomenclature (1970) Abbreviations and symbols for the description of the conformation of polypeptide chains. J. Mol. Biol. 52, 1-17.

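Kabsch, W. & Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577-2637.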

Lewis, P. N., Momany, F. A. & Scheraga, H. A. (1973) Chain reversals in proteins. Biochim. Biophys. Acta 303, 211-229.

Milner-White, E. J., Ross, B. M., Ismail, R., Belhadj-Mostefa, K. & Poet, R. (1988) One type of gamma turn, rather than the other, gives rise to chain reversal in proteins. J. Mol. Biol. 204, 777-782.

Richardson, J. S. (1981) The anatomy and taxonomy of protein structure. Adv. Protein Chem. 34, 167-339.

Rose, G. D., Gierasch, L. M. & Smith, J. A. (1985) Turns in peptides and proteins. Adv. Prot. Chem. 37, 1-109.

Sibanda, B. L., Blundell, T. L. & Thornton, J. M. (1989) Conformation of beta-hairpins in protein structures. A systematic classification with applications to modelling by homology, electron density fitting and protein engineering. J. Mol. Biol. 206, 759-777.

Tang, J., James, M. N. G., Hsu, I. N., Jenkins, J. A. & Blundell, T. L. (1978) Structural evidence for gene duplication in the evolution of the acid proteases. Nature 271, 618-621.

Venkatachalam, C. M. (1968) Stereochemical criteria for polypeptides and proteins. V. Conformation of a system of three linked peptide units. Biopolymers 6, 1425-1436.

Appendix I Modifying the promotif2.prm file

The standard promotif2.prm file supplied with PROMOTIF will cause the program to produce output for all of the motifs described above. This is not always necessary and the files thus generated may take up a lot of space. In the first section of the promotif2.prm file a Y(es) is assigned by default for each motif type. If you do not want the output for one or more of the motifs then change the assignment to N(o). The basic calculations will still be performed and the flat files generated, but the associated postscript files will not be produced. The final N in the list means N(o) to black and white plots, i.e. the plots are produced in colour. If you would like all the output to be generated in black and white then change this to Y(es). The default percentage of structures used to calculate the consensus structure when the c option is used is given as 50.0 in the next line of the file. If you want to use a different percentage, change this number.

The second section of the file refers to colour output and we have assigned a colour to each of the parts of the colour schematic diagrams. If you would like a different colour combination then change the colour to one of those listed in the file. Alternatively you can change the colours and their rgb codes completely by editing the appropriate numbers in the file.

Appendix II The phi,psi regions of the Ramachandran plot

The PROMOTIF output for several of the motifs described in this documentation contains references to assigned regions of the Ramachandran plot. These were originally assigned by Efimov (1991). The major regions are alpha (alpha-helical region), beta E (beta sheet region), beta P (polyproline region), alpha L (left handed alpha helical region occupied mainly by glycine), gamma L (adjacent to alpha L and again populated mainly by glycine) and epsilon, which is another small region in the bottom right hand corner of the Ramachandran plot, occupied mainly by glycine residues. These regions are sometimes indicated in the PROMOTIF output by corresponding English letters:


Region		      English letter

alpha		      A, a*
beta E		      B
beta P		      P
alpha L		      L
gamma L		      G
epsilon		      E

* A refers to the core regions of the Ramachandran plot occupied by alpha-helical residues in good quality high resolution structures. a refers to a region immediately surrounding the A region, which is occupied by helical residues in less well defined structures.

In our analysis the assignment of individual phi,psi values to a particular region is based on comparison with a matrix, in which each 10 x 10 degree interval in phi,psi space is assigned to a region. The matrix was calculated based on contouring the Ramachandran plot derived from a set of non-homologous protein structures. This matrix is encoded for use in the program as the file phipsi.mat. Residues with phi,psi values which fall outside these regions do not have an assigned phi,psi region and appear as blanks in the relevant tables.
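The matrix lookup described above can be sketched in Python as follows. This is an illustration of the idea only. It does not read phipsi.mat, whose file layout is not described here, and the toy matrix below marks a single cell purely for demonstration. The orientation of the matrix (rows for phi, columns for psi, both starting at -180 degrees) and the use of a blank for unassigned cells are assumptions.

```python
def bin_index(angle):
    """Index (0-35) of the 10 degree interval containing an angle in degrees."""
    # Angles are periodic, so +180 falls in the same interval as -180.
    return int(((angle + 180.0) % 360.0) // 10.0)

def phi_psi_region(phi, psi, matrix):
    """Look up the region letter for a phi,psi pair, or None if unassigned."""
    code = matrix[bin_index(phi)][bin_index(psi)]
    return None if code == " " else code

# Toy 36 x 36 matrix with every 10 x 10 degree cell unassigned (blank) ...
toy_matrix = [[" "] * 36 for _ in range(36)]
# ... except one illustrative cell, -60 <= phi < -50 and -50 <= psi < -40.
toy_matrix[bin_index(-60.0)][bin_index(-45.0)] = "A"

print(phi_psi_region(-57.0, -47.0, toy_matrix))  # A
print(phi_psi_region(60.0, 45.0, toy_matrix))    # None
```

A residue whose phi,psi values fall in a blank cell returns None, which corresponds to the blank entries in the PROMOTIF tables.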

Gail Hutchinson 22nd May, 1996