E. Gail Hutchinson and Janet M. Thornton,
Biomolecular Structure and Modelling Unit,
Department of Biochemistry and Molecular Biology,
University College, Gower Street, London WC1E 6BT.
e-mail gail@uk.ac.ucl.bioc.bsm,
or gail@bsm.bioc.ucl.ac.uk (if mailing from
outside uk)
PROMOTIF provides details of the location and types of structural motifs in proteins of known structure by analysis of Brookhaven format coordinate files. The current version of the program analyses the following structural features:
Secondary structure       Beta strands
Disulphide bridges        Beta bulges
Beta turns                Beta hairpins
Gamma turns               Beta alpha beta units
Helical geometry          Psi loops
Helical interactions      Beta sheet topology
Main chain hydrogen bonding patterns
The program also produces a summary page, which gives a briefer description of each motif found in the protein. The program can be used to compare motifs in a group of related structures such as an ensemble of NMR structures. A description of the program and some applications has recently been published (Hutchinson & Thornton, 1996). Sample outputs for the program can be viewed on our World Wide Web page (address: http://www.biochem.ucl.ac.uk ).
Source file       Parameters file
p_bulge2.f        p_bulge.par
p_disulph2.f      p_disulph.par
p_hairpin2.f      p_hairpin.par
p_helix2.f        p_helix.par
p_hera2.f         p_hera.par
p_sheet2.f        p_sheet.par
p_sssum2.f        p_sssum.par
p_sstruc2.f       p_sstruc.par
p_turn2.f         p_turn.par
psplot2.f         psplot.par
psplotc.f         "
psrout.f          "
nmrconvert.f      nmrconvert.par
There are also a number of additional files as shown below:
makefile2             this will compile the above set of programs on a Unix machine
promotif2.prm         the standard parameters file, which can be edited by the user to control which outputs are produced and the colours to be used
promotif.scr
promotif_multi.scr
promotif_nmr.scr      these are the script files for running the program. Make sure these are 'user-executable'
phipsi.mat            data file for Ramachandran plot
hera_colours          colours file for hydrogen bonding diagram
It is easiest if you create a directory and copy all the files into there. Compile the programs by typing:-
sh makefile2
Set up the following aliases (by adding them to your .cshrc file).
setenv motifdir 'directory'
alias promotif $motifdir'/promotif.scr'
alias promotif_multi $motifdir'/promotif_multi.scr'
alias promotif_nmr $motifdir'/promotif_nmr.scr'
where directory is the directory in which you have stored the executables of the programs. Once these have been set up the program can be run from any directory.
This takes as input a single Brookhaven format file of protein coordinates and produces a series of output files for each motif. To run this type
promotif pdbfile
where pdbfile is the full filename of a Brookhaven format file.
The output consists of:
This is used for processing a list of Brookhaven coordinate files. The input file is a list of Brookhaven files and this can be processed in 3 ways:
This produces a series of output text and postscript files as in the single protein version for each of the proteins in the list. To do this type
promotif_multi p file
where file is the file containing the list of proteins.
This produces a text file for each motif. Each file contains a list of all the examples of the particular motif found in the entire list of proteins. To do this type
promotif_multi l file
The files created using this option have file names in upper case: BETATURNS, GAMMATURNS, DISULPHIDES, HELICES, HELIX_INT, STRANDS, SHEETS, BULGES, HAIRPINS, BETAALPHABETA, PSILOOPS, SSSUM.
This will generate all the motifs for each protein in the list and, in addition produce a postscript file in which the motifs in the different proteins are compared. To run this type
promotif_multi c file
Note: In the current version the program assumes that the structures in the data set are aligned with identical sequence numbers at equivalent positions in all structures. If you run this option on a list of unrelated proteins the results of the comparison will be meaningless.
This is used for processing a file containing an ensemble of NMR structures. The program generates individual coordinate files for each of the members of the ensemble and these are then treated as in the multiple protein mode above.
This will produce a set of output postscript files giving details of the motifs in each of the members of the ensemble. To do this type
promotif_nmr p nmrfile
where nmrfile is a Brookhaven file containing an ensemble of NMR structures.
This produces a flat file for each motif. Each file contains a list of all the examples of the particular motif found in the NMR ensemble. To do this type
promotif_nmr l nmrfile
This will generate all the motifs for each member of the ensemble and, in addition, produce a postscript file in which the motifs in the different members of the ensemble are compared. To run this type
promotif_nmr c nmrfile
The slightly modified algorithm used in this suite of programs conforms to IUPAC convention rule 6.3, according to which a residue is considered part of a beta sheet or alpha helix if either its NH or CO groups are involved in the appropriate hydrogen bonds. In practice this means that one extra residue is added to the ends of each strand and helix where possible. These extra residues are classified using lower case letters for the secondary structure, while the remainder of the residues in secondary structures are indicated using upper case letters (E for beta strands, H for alpha helices and G for 3,10 helices). This is the rule most commonly used amongst crystallographers. The secondary structure output is given in the file pdbn.sst, where 'pdbn' represents the characters preceding the decimal point in the Brookhaven file name (i.e. usually the Brookhaven code). This secondary structure file provides the raw data used for the remainder of the analyses.
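As an illustration of the 'pdbn' naming rule, the following Python sketch derives the stem and the name of the secondary structure file from a coordinate file name. It is not part of PROMOTIF. The function names are hypothetical, and two points are assumptions: that any directory part of the path is ignored, and that "the decimal point" means the first '.' in the file name.

```python
import os


def pdbn_stem(path):
    """Return the characters before the first '.' in the file name.

    Assumption: the directory part of the path is dropped first.
    """
    name = os.path.basename(path)
    return name.split(".", 1)[0]


def sst_filename(path):
    """Name of the secondary structure file for a given coordinate file."""
    return pdbn_stem(path) + ".sst"


if __name__ == "__main__":
    print(sst_filename("pdb1abc.ent"))
```

The same stem is the prefix of the other output files named in this document, for example pdbn_bturn_tab01.ps.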
The turns are assigned to one of 9 classes on the basis of the phi, psi angles of residues i+1 and i+2. The ideal angles for each of the turn types are as follows:
Type    Phi(i+1)   Psi(i+1)   Phi(i+2)   Psi(i+2)
I         -60        -30        -90          0
II        -60        120         80          0
VIII      -60        -30       -120        120
I'         60         30         90          0
II'        60       -120        -80          0
VIa1      -60        120        -90          0     cis-proline(i+2)
VIa2     -120        120        -60          0     cis-proline(i+2)
VIb      -135        135        -75        160     cis-proline(i+2)
IV      turns excluded from all the above categories
With the exception of the type VI turns these angles were originally defined by Venkatachalam (1968). The angles for the type VI turns were originally defined by Richardson (1981). We have used the nomenclature VIa1 and VIa2 to distinguish between two subclasses of type VIa turns with the phi, psi angles of residue i+1 in the beta and polyproline region of the Ramachandran plot (Hutchinson and Thornton 1994).
The phi and psi angles are allowed to vary by +/- 30 degrees from these ideal values with the added flexibility of one angle being allowed to deviate by as much as 40 degrees. Types VIa1, VIa2 and VIb turns are subject to the additional condition that residue i+2 must be a cis-proline. Turns which do not fit any of the above criteria are classified as type IV.
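The tolerance rule can be sketched in Python as follows. This is an illustration only, not the code used in p_turn2.f. The function names are hypothetical, the ideal angles are copied from the table of turn types, and the cis-proline requirement is applied at residue i+2 as marked in that table. Two further assumptions are made: angle differences are taken on the 360 degree circle, and where a set of angles fits more than one type the first match in table order is returned.

```python
# (type, (phi(i+1), psi(i+1), phi(i+2), psi(i+2)), needs cis-proline at i+2)
IDEAL_TURNS = [
    ("I", (-60, -30, -90, 0), False),
    ("II", (-60, 120, 80, 0), False),
    ("VIII", (-60, -30, -120, 120), False),
    ("I'", (60, 30, 90, 0), False),
    ("II'", (60, -120, -80, 0), False),
    ("VIa1", (-60, 120, -90, 0), True),
    ("VIa2", (-120, 120, -60, 0), True),
    ("VIb", (-135, 135, -75, 160), True),
]


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def classify_beta_turn(phi1, psi1, phi2, psi2, cis_pro_i2=False):
    """Return the turn type, or 'IV' if no ideal type fits."""
    observed = (phi1, psi1, phi2, psi2)
    for name, ideal, needs_cis_pro in IDEAL_TURNS:
        if needs_cis_pro and not cis_pro_i2:
            continue
        devs = [angle_diff(o, t) for o, t in zip(observed, ideal)]
        # All angles within 30 degrees, except that one may reach 40.
        over_30 = sum(1 for d in devs if d > 30.0)
        if max(devs) <= 40.0 and over_30 <= 1:
            return name
    return "IV"


if __name__ == "__main__":
    print(classify_beta_turn(-60, -30, -90, 35))
```

A turn with a single angle 35 degrees from the ideal still matches, whereas two such deviations push the turn into type IV.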
where pdbn represents the characters before the decimal point in the file name i.e. usually the Brookhaven code of the protein. A postscript table giving details of each beta turn in the protein. From left to right are listed the sequence numbers of the first (i) and last (i+3) residues in each turn, the one-letter amino acid codes for each of the four residues in the turn, the turn type, the phi and psi angles of residues i+1 and i+2, the regions of the Ramachandran plot occupied by residues i+1 and i+2 (Appendix II). The final columns show the chi1 angles of residues i+1 and i+2 and the distance between the Calpha atoms of residues i and i+3. Once this page is full, subsequent turns are recorded in files named pdbn_bturn_tab02.ps, pdbn_bturn_tab03.ps etc.
A set of colour postscript schematic diagrams, one for each turn. This provides a Ramachandran plot with residues i+1 and i+2 plotted on it, as well as a schematic plot of the turn with the 4 residues and the Calpha(i) to Calpha(i+3) distance marked and arrows to indicate whether or not residues i and i+3 are hydrogen bonded. The residue numbers and turn type are indicated above the Ramachandran plot. 16 beta turns are plotted on each page; subsequent turns are plotted in file pdbn_bturns_02.ps etc.
A flat file containing the same information as in pdbn_bturn_tab01.ps. The information is in the order: residue number and one-letter amino acid code of residues i, i+1, i+2 and i+3; turn type, Ramachandran regions of residues i+1 and i+2, phi(i+1), psi(i+1), phi(i+2), psi(i+2); Y or N to indicate whether or not a hydrogen bond is formed between the NH of residue i+3 and the CO of residue i; chi1(i+1), chi1(i+2); logical sequence number corresponding to the first residue in the turn.
This is the corresponding flat file created when promotif_multi or promotif_nmr is run using the list (l) option. The data are the same as above, but in addition the protein structure from which each turn was derived is given in the first column.
turn type    phi(i+1)    psi(i+1)
classic        75.0       -64.0
inverse       -79.0        69.0
where pdbn represents the characters before the decimal point in the file name i.e. usually the Brookhaven code of the protein. A postscript table giving details of each gamma turn in the protein. From left to right are listed the sequence numbers of the first (i) and last (i+2) residues in the turn, the one-letter amino acid codes for each of the three residues in the turn, the turn type (classic or inverse), the phi and psi angles of residue i+1 and the distance between the Calpha atoms of residues i and i+2. Once this page is full, subsequent turns are recorded in files named pdbn_gturn_tab02.ps, pdbn_gturn_tab03.ps etc.
A set of colour postscript schematic diagrams, one for each turn. This provides a Ramachandran plot with residue i+1 plotted on it, and a schematic plot of the turn with the 3 residues and the Calpha(i)-Calpha(i+2) distance marked and arrows to indicate the i to i+2 hydrogen bond. The residue numbers and turn type are indicated above the Ramachandran plot.
A flat file containing the same information as in pdbn_gturn_tab01.ps. The information is in the order: residue number and one-letter amino acid code of residues i, i+1 and i+2, turn type, Ramachandran region of residue i+1, phi(i+1), psi(i+1); distance between the Calpha atoms of residues i and i+2; chi1(i+1); logical sequence number corresponding to the first residue in the turn.
This is the corresponding flat file created when promotif_multi or promotif_nmr is run using the list (l) option. The data are the same as above, but in addition the name of the structure in which the turn was found is recorded in the first column.
where pdbn represents the characters before the decimal point in the file name i.e. usually the Brookhaven code of the protein. A postscript table giving details of each beta bulge in the protein. From left to right are residue numbers of residue X (on the normal strand), and residues 1 and 2 (on the bulged strand) and one letter amino acid code for each of these residues. The bulge type is described using two letters: the first letter is P or A depending on whether the bulge involves parallel or antiparallel beta strands; the second letter can be Classic, Wide, G1, Bent or Special. The final columns list the phi and psi angles of residues X, 1 and 2. At the moment just these residues are represented in the table, although there may be some additional residues involved in special bulges. These are listed in the flat file described below.
Colour postscript schematic diagrams for each bulge in the protein. For each bulge there is a Ramachandran plot displaying the phi,psi angles of the residues involved in the bulge. The 'normal' residues in the bulge (X, 1 and 2) are indicated by one colour (pink in the default setting of the program), and any remaining residues (3 and 4 if they occur) are displayed in a second colour (sky blue as a default). The type of bulge is indicated above the Ramachandran plot. To the right of the Ramachandran plot there is a schematic diagram indicating the residues and hydrogen bonding in and around the beta bulge. There are 10 bulges per page; further bulges, where present are represented in subsequent files: pdbn_bulges_02.ps etc.
A flat file containing the same information as in pdbn_bulge_tab01.ps. The information is in the order: bulge type indicated by two letters; (residue number, one letter amino acid code, phi, psi) of residues X, 1, 2, 3 and 4. Most of the bulges will have just residues X, 1 and 2; the special bulges may have one or more of residues 3 and 4, and the bent bulges will just have residues 1 and 2. If the residues are not present in a given bulge their phi and psi angles are given as -999.9. (The final numbers in each row give sequential numbering of the residues in the bulge, for use by the plotting program).
This is the corresponding flat file created when promotif_multi or promotif_nmr is run using the list (l) option. The data are the same as above, but in addition the name of the structure in which the bulge was found is recorded in the first column.
where pdbn represents the characters before the decimal point in the file name i.e. usually the Brookhaven code of the protein. A postscript table giving details of each helix in the protein. The helices are numbered consecutively from the N-terminus to allow helix interactions to be described. The start and end residues of each helix, helix type (alpha or 3,10), number of residues and amino acid sequence are given for each helix.
A postscript table giving details of the geometry of each alpha helix. From left to right are the helix number, the length and unit rise (both in Angstroms), the number of residues per turn (ideally 3.6 for alpha helices), the helix pitch in Angstroms and a measure of the deviation of the helix geometry from an ideal helix (in degrees). This latter value should be 0 for a perfect helix. These parameters are not calculated for helices with fewer than four residues.
A colour postscript file providing schematic diagrams - helical wheels and helical nets for each helix. The residues are colour coded for hydrophobic (green as default), polar (blue) and charged (red) amino acid types. The wheels and nets assume the ideal helical value of 3.6 residues per turn.
A postscript table giving details of the interacting pairs of helices in the protein. Two helices are defined as "interacting" if they contain one or more atoms within 4.5 Angstroms of the other helix. From left to right the data represent the helix numbers for the two helices involved, their distance of closest approach (in Angstroms), the omega angle between them, the number of interacting pairs of residues and the number of residues in each of the two helices involved in the interaction.
This is a flat file containing the same information as in pdbn_helix_tab01.ps, pdbn_helix_geom01.ps and pdbn_helix_int01.ps. The first part of the file contains the data for each helix. The order of the information is as follows: helix number, chain letter, the first and last residue numbers of the helix, helix type (H for alpha helix or G for 3,10 helix). This is followed by the number of residues in the helix and a 9-letter character string indicating the secondary structures immediately surrounding the helix, from 4 secondary structures before the helix to 4 secondary structures after the helix. These are classed as E for strand, T for turn, H for alpha helix, G for 3,10 helix. Then follow various pieces of geometrical information about each helix: length in Angstroms, unit rise, number of residues per turn, pitch and a measure of the linearity of the helix. The final information in the file is the sequence of the helix recorded as a string of 1-letter codes.
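The 4.5 Angstrom interaction criterion can be sketched in Python as below. This is only an illustration of the definition, not the code in p_helix2.f. The function names are hypothetical and each helix is assumed to be supplied as a list of (x, y, z) atom coordinates in Angstroms. The minimum atom-atom distance computed here is used solely for the interaction test; it is not claimed to be the "distance of closest approach" reported in the table, whose exact definition is not given in this document.

```python
import math


def min_atom_distance(atoms_a, atoms_b):
    """Smallest distance between any atom of helix A and any atom of helix B."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def helices_interact(atoms_a, atoms_b, cutoff=4.5):
    """True if at least one atom pair lies within the cutoff (Angstroms)."""
    return min_atom_distance(atoms_a, atoms_b) <= cutoff


if __name__ == "__main__":
    helix_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    helix_2 = [(1.5, 4.0, 0.0), (9.0, 9.0, 9.0)]
    print(helices_interact(helix_1, helix_2))
```

The all-pairs comparison is quadratic in the number of atoms, which is adequate for a pair of helices but would need a spatial index for larger sets.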
The second section of the file gives details of the interacting pairs of helices in the protein. For each such pair the file gives the helix numbers and helix type for each helix, distance of closest approach and the omega angle between the helices. The next two letters indicate where in each of the two helices the position of closest approach to the other helix is located (N: beyond the N-terminus, C: beyond the C-terminus, I: within the helix). Also recorded are the number of residues in each of the helices which are involved in the interaction, as well as the total number of interacting residues.
This is the corresponding flat file created when promotif_multi or promotif_nmr is run using the list (l) option. The data are the same as above, but in addition the name of the structure in which the helix was found is recorded in the first column.
The flat file for helix interactions created using the list option. The file contains the same information as above. The Brookhaven code of the protein in which the helix interaction occurs is also given in the first column.
A postscript table giving the strand number (sequentially numbered from the N-terminus of the protein), start and end residues, the letter corresponding to the sheet in which the strand is involved, number of residues and sequence of each strand in the protein.
A postscript table giving details of each beta sheet in the protein. The sheet letter is given, as well as the number of strands, the nature of the sheet (antiparallel, parallel or mixed). If the sheet forms a closed beta-barrel this is also mentioned here. Finally the table gives the topology of the sheet, using the nomenclature of Richardson (1981).
This contains up to two tables. The first gives details of the beta-alpha-beta units, which consist of two parallel hydrogen bonded beta strands connected by an alpha helix. The table gives the start and end residues of each of the two strands, the number of residues in each of the strands, the length of the loop and the length of the helical part of the loop. The second table gives details of the psi-loops, if any. These consist of two antiparallel strands connected by a '+2' connection i.e. with one strand in between, hydrogen bonded to both of them (Tang et al., 1978). In contrast to the beta-alpha-beta units these occur very rarely in proteins (Hutchinson & Thornton, 1990). Just the start and end residues of each of the two strands involved are included at the moment.
This is a flat file giving information about each strand in the protein as in pdbn_strands_tab01.ps. Some additional information is given relative to the postscript tables. The order of information is as follows: strand number; initial and final residues; sheet letter; number of residues in strand; amino acid sequence; number of hydrogen bonding partners of strand; 4 columns representing the strand numbers of these partners (if there are less than 4 partners, the remaining numbers are 0); 4 signs giving the orientation of these partners relative to the current strand (- for antiparallel and + for parallel); Richardson nomenclature for the topology of the connection to the next strand in the sequence (99 means that the next strand is in a different sheet); a character string to represent 4 secondary structure units before and 4 secondary structure units after the strand (E for strand, H for alpha helix, G for 3,10 helix and T for turn).
This flat file contains details of the beta sheets in the protein as in pdbn_sheets_tab01.ps. Two lines of data are given for each sheet. The first line gives the sheet letter, the number of strands in the sheet, a letter indicating whether the sheet is parallel (P), antiparallel (A) or mixed (M) and a letter indicating whether the sheet is a closed barrel (Y) or not (N). The second row gives the topology of the sheet according to the nomenclature of Richardson (1981).
This file gives information about the beta alpha beta units in the protein, if any. From left to right the data represent the initial and final residues of strand 1, initial and final residues of strand 2, length of strand 1, length of strand 2, length of loop, length of helix. (The final 2 numbers represent the logical sequence number of the beginning and end residues; these are used internally by the rest of the program).
This file gives information about the psi-loops in the protein, if any. From left to right the data represent the initial and final residues of strand 1, initial and final residues of strand 2 and the logical sequence number of the beginning and end residues.
These are the corresponding flat files created when promotif_multi or promotif_nmr are run using the list (l) option. They contain the same data as in the above files, and the name of the structure in which each motif was found is also recorded in each case.
For the smaller loops the hairpins are dominated by the formation of beta turns (usually I' and II'). The 3:5 hairpins are dominated by one well defined conformation which can be described as a type I turn followed by a G1 bulge. The most common class among the 4:4 hairpins contains a type I beta turn. Where these particular conformations occur, they are indicated by appropriate letters after the main classification.
If there is a break in the polypeptide chain within the loop region the hairpin will not be classed by PROMOTIF and the classification will be given as 0:0.
A black and white postscript table giving the start and end residues for each of the two beta-strands involved in the hairpin, the number of residues in each of these two strands and the hairpin class.
A colour postscript schematic diagram for each hairpin. The left hand diagram shows the residue numbers and hairpin classification, with the lengths of the strands in the diagram proportional to the number of residues. The right hand diagram shows the sequence of the hairpin, with the strand residues in green and the loop residues in purple. The main-chain hydrogen bonds are indicated in pink.
A flat file giving information as in pdbn_hairpin_tab01.ps for each hairpin. From left to right the information is as follows: strand numbers of the two strands involved, beginning and end residues of each strand, hairpin classification, number of residues in the two strands, sequence of strand 1, sequence of loop, sequence of strand 2, a string representing phi,psi classification of the loop, and various other numbers for use internally by the plotting program.
This is the corresponding flat file created when promotif_multi or promotif_nmr is run using the list (l) option. The file contains the same data as above, and the name of the structure in which the hairpin was found is also recorded.
Disulphide type               Chi2   Chi3   Chi2'
left handed spiral             -      -      -
right handed hook              +      +      -
right handed spiral            +      +      +
short right handed hook        -      +      -
Note that the chi2 and chi2' values can be interchanged, as they merely reflect which of the two cysteines involved in the bridge is mentioned first.
Richardson found that the majority of disulphides could be classed as left handed spirals or right handed hooks.
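The sign table and the note on interchanging chi2 and chi2' can be expressed as a small Python sketch. It is illustrative only and is not taken from p_disulph2.f. The function names are hypothetical, the class abbreviations (LHS, RHH, RHS, SRH) are those used in the disulphide flat file, and an angle of exactly zero is arbitrarily treated as positive. Sign patterns that match no row in either order are returned as None, corresponding to an unassigned bridge.

```python
# Sign patterns (chi2, chi3, chi2') taken from the table above.
SIGN_CLASSES = {
    ("-", "-", "-"): "LHS",  # left handed spiral
    ("+", "+", "-"): "RHH",  # right handed hook
    ("+", "+", "+"): "RHS",  # right handed spiral
    ("-", "+", "-"): "SRH",  # short right handed hook
}


def sign(angle):
    """Sign of a torsion angle; zero is treated as positive (assumption)."""
    return "+" if angle >= 0 else "-"


def classify_disulphide(chi2, chi3, chi2_prime):
    """Return the class abbreviation, or None if no pattern matches."""
    signs = (sign(chi2), sign(chi3), sign(chi2_prime))
    # Naming the other cysteine first swaps chi2 and chi2'.
    swapped = (signs[2], signs[1], signs[0])
    return SIGN_CLASSES.get(signs) or SIGN_CLASSES.get(swapped)


if __name__ == "__main__":
    print(classify_disulphide(-60.0, 90.0, 60.0))
```

Only the right handed hook pattern is affected by the swap, since the other three patterns are symmetric in chi2 and chi2'.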
A postscript table which gives details of each disulphide bridge found in the protein, the residue numbers of the two cysteines involved, chi1, chi2, chi3, chi2' and chi1' values, the distance between the Calpha atoms of the residues involved and the classification of the disulphide bridge, according to the above table, where assigned.
A flat file containing details of each disulphide bridge found in the protein as in pdbn_disulph_01.ps. The type of bridge is abbreviated (RHH: right handed hook; SRH: short right handed hook; LHS: left handed spiral; RHS: right handed spiral), and the final two columns in the file are used internally by the program.
This is the corresponding flat file created when promotif_multi or promotif_nmr is run using the list (l) option. The data are the same as above, and the name of the structure in which the disulphide bridge was found is also recorded.
A page consisting of a set of postscript tables giving a summary of the secondary structure and motifs in the protein. The top part of the diagram shows the name of the protein, the amino acid sequence and the number of residues and chains. The number of residues and sequence refer to those residues actually observed in the electron density. Any difference between this and the actual sequence is indicated by the number of disordered residues. The number and percentage of residues in each secondary structural type are shown below this. A string of characters indicates the sequence of secondary structures as assigned by the Kabsch and Sander algorithm. In this E represents strands, H and G represent alpha and 3,10 helices respectively and T represents turns. For turns in particular this may not correspond exactly to the assignments by PROMOTIF, because the Kabsch and Sander assignments are based solely on hydrogen bonding criteria. In the second part of the figure mini-tables summarise the location of each type of motif in the protein as found by PROMOTIF.
A flat file giving summary secondary structure information as in pdbn_summary_01.ps. The file gives the protein name identified from the Brookhaven file, the amino-acid sequence, the number of residues and chains, the number of strands, alpha-helices and 3,10 helices and the percentage of residues in each of these secondary structure types and a character string representing the sequence of secondary structures in the protein.
This is the corresponding flat file created when promotif_multi or promotif_nmr is run using the list (l) option.
The left hand side of the figure represents consensus information about the set of proteins. By default the consensus is calculated using all motifs that occur in more than 50% of the set of structures. The fraction of structures used to calculate the consensus can be varied by editing the promotif2.prm file (Appendix I). The columns show, from left to right, the residue number (No.), amino acid sequence (Seq) and consensus secondary structure assignments (SS) derived using the modified Kabsch and Sander algorithm (h/H, alpha-helix; t/T, turn; e/E, beta-strand; and S, bend). The remaining four columns of the consensus structure indicate the locations of beta-turns (BT), gamma-turns (GT), beta-bulges (BG) and disulphide bridges (DS) in the consensus structure. Where present, these motifs are indicated by the class of the motif in the appropriate column for a given residue. For beta- and gamma-turns, in addition to the various turn types (I, I', II, II', IV, VIII, VIa1, VIa2 and VIb for beta-turns and IN(VERSE) and CL(ASSIC) for gamma turns) a residue can also be classified as part of a composite turn (C) if it is involved in more than one turn or simply beta or gamma if the consensus structure has a turn, but there is no dominant turn type. Bulges are indicated by two letters, A or P, depending on whether the strands are antiparallel or parallel, and C(lassic), W(ide), S(pecial) or B(ent), depending on the pattern of hydrogen bonds.
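The default consensus rule (a motif must occur in more than 50% of the structures) can be sketched in Python as follows. This is an illustration only. The function name and the representation of a motif as a (residue number, motif class) pair are hypothetical, and the sketch does not model the composite or "no dominant type" cases described above.

```python
from collections import Counter


def consensus_motifs(structures, fraction=0.5):
    """Return motifs found in more than `fraction` of the structures.

    `structures` is a list with one collection of motifs per structure.
    """
    counts = Counter()
    for motifs in structures:
        # Count each motif at most once per structure.
        counts.update(set(motifs))
    threshold = fraction * len(structures)
    return {motif for motif, n in counts.items() if n > threshold}


if __name__ == "__main__":
    ensemble = [
        {(10, "I"), (20, "II")},
        {(10, "I")},
        {(10, "I"), (20, "II")},
        {(30, "IV")},
    ]
    print(consensus_motifs(ensemble))
```

In this example the type I turn at residue 10 occurs in three of four structures and is kept, while the type II turn at residue 20 occurs in exactly half and is not, because the rule requires more than the chosen fraction.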
The remainder of the figure represents differences from this consensus structure for each of the proteins in the data set. The numbers at the top of the columns represent the individual proteins in the list. The left hand column of the data for each protein highlights differences in secondary structure - extra secondary structure is indicated by the appropriate letter and secondary structure missing with respect to the consensus is indicated by the consensus structure with an X through it. The remaining space indicates differences in the turns, bulges and disulphides. If one of these is present in a particular structure and absent in the consensus, or if the motif type is different from the consensus, the residue is marked with the motif type. If a motif present in the consensus is absent from an individual structure, this is indicated by a cross through the motif (beta: beta-turn; gamma: gamma turn, BG: bulge).
Each page of the output contains information about a maximum of 76 residues (vertically) and 16 structures (horizontally). The names of the postscript files depend on the name of the file containing the list of proteins. If, for example, you have run
promotif_multi c list
the first file of comparison data (showing the first 76 residues of the first 16 proteins in the list) will be called list01_1.ps. Data for subsequent residues are found by incrementing the first number, thus list02_1.ps will contain the next 76 residues for the same 16 proteins. Data for subsequent proteins are found by incrementing the second number, thus list01_2.ps will contain information on the first 76 residues for the next set of 16 proteins. If you have run promotif_nmr the program generates a list of proteins in a file called nmrlist and thus the comparison data will always be in files nmrlist01_1.ps, nmrlist02_1.ps etc.
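The naming scheme can be sketched in Python to predict which comparison files a run should produce. This is an illustration only: the function name is hypothetical, and it assumes that the first number is always written with two digits and the second without padding, as in the examples above.

```python
import math


def comparison_files(list_name, n_residues, n_structures,
                     residues_per_page=76, structures_per_page=16):
    """List the expected comparison postscript file names."""
    residue_pages = math.ceil(n_residues / residues_per_page)
    structure_pages = math.ceil(n_structures / structures_per_page)
    return [
        f"{list_name}{r:02d}_{s}.ps"
        for s in range(1, structure_pages + 1)
        for r in range(1, residue_pages + 1)
    ]


if __name__ == "__main__":
    # 100 residues and 20 structures need 2 x 2 pages.
    for name in comparison_files("list", 100, 20):
        print(name)
```

For an NMR ensemble the list name would be nmrlist, giving nmrlist01_1.ps and so on.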
Chan, A. W. E., Hutchinson, E. G., Harris, D. & Thornton, J. M. (1993) Identification, classification and analysis of beta-bulges in proteins. Protein Science 2, 1574-1590.
Efimov, A. V. (1991) Structure of alpha-alpha hairpins with short connections. Protein Engineering 4, 245-250.
Hutchinson, E. G. & Thornton, J. M. (1990) HERA - A program to draw schematic diagrams of protein secondary structure. Proteins Struct. Funct. Genet. 8, 203-212.
Hutchinson, E. G. & Thornton, J. M. (1994) A revised set of potentials for beta turn formation in proteins. Protein Science 3, 2207-2216.
Hutchinson, E. G. & Thornton, J. M. (1996) PROMOTIF - A program to identify and analyze structural motifs in proteins. Protein Science 5, 212-220.
IUPAC-IUB Commission on Biochemical Nomenclature (1970) Abbreviations and symbols for the description of the conformation of polypeptide chains. J. Mol. Biol. 52, 1-17.
Kabsch, W. & Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577-2637.
Lewis, P. N., Momany, F. A. & Scheraga, H. A. (1973) Chain reversals in proteins. Biochim. Biophys. Acta 303, 211-229.
Milner-White, E. J., Ross, B. M., Ismail, R., Belhadj-Mostefa, K. & Poet, R. (1988) One type of gamma turn, rather than the other, gives rise to chain reversal in proteins. J. Mol. Biol. 204, 777-782.
Richardson, J. S. (1981) The anatomy and taxonomy of protein structure. Adv. Protein Chem. 34, 167-339.
Rose, G. D., Gierasch, L. M. & Smith, J. A. (1985) Turns in peptides and proteins. Adv. Prot. Chem. 37, 1-109.
Sibanda, B. L., Blundell, T. L. & Thornton, J. M. (1989) Conformation of beta-hairpins in protein structures. A systematic classification with applications to modelling by homology, electron density fitting and protein engineering. J. Mol. Biol. 206, 759-777.
Tang, J., James, M. N. G., Hsu, I. N., Jenkins, J. A. & Blundell, T. L. (1978) Structural evidence for gene duplication in the evolution of the acid proteases. Nature 271, 618-621.
Venkatachalam, C. M. (1968) Stereochemical criteria for polypeptides and proteins. V. Conformation of a system of three linked peptide units. Biopolymers 6, 1425-1436.
The second section of the file refers to colour output and we have assigned a colour to each of the parts of the colour schematic diagrams. If you would like a different colour combination then change the colour to one of those listed in the file. Alternatively you can change the colours and their rgb codes completely by editing the appropriate numbers in the file.
Region      English letter
alpha       A, a*
beta E      B
beta P      P
alpha L     L
gamma L     G
epsilon     E
* A refers to the core regions of the Ramachandran plot occupied by alpha-helical residues in good quality high resolution structures. a refers to a region immediately surrounding the A region, which is occupied by helical residues in less well defined structures.
In our analysis the assignment of individual phi,psi values to a particular region is based on comparison with a matrix, in which each 10 x 10 degree interval in phi,psi space is assigned to a region. The matrix was calculated based on contouring the Ramachandran plot derived from a set of non-homologous protein structures. This matrix is encoded for use in the program as the file phipsi.mat. Residues with phi,psi values which fall outside these regions do not have an assigned phi,psi region and appear as blanks in the relevant tables.
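The lookup of a phi,psi pair in a matrix of 10 x 10 degree intervals can be sketched in Python as below. This is only an illustration of the binning idea. The layout of phipsi.mat is not described in this document, so the matrix here is a hypothetical 36 x 36 grid indexed from -180 degrees, and the single filled cell is a toy value rather than real data from the file.

```python
N_BINS = 36  # 360 degrees divided into 10 degree intervals


def bin_index(angle):
    """Map an angle in degrees to a bin number from 0 to 35."""
    wrapped = (angle + 180.0) % 360.0
    # min() guards against floating point rounding at the upper edge.
    return min(int(wrapped // 10), N_BINS - 1)


def region(matrix, phi, psi):
    """Return the region letter for (phi, psi), or '' if unassigned."""
    return matrix[bin_index(phi)][bin_index(psi)]


def toy_matrix():
    """Hypothetical matrix with one cell marked; real data are in phipsi.mat."""
    matrix = [["" for _ in range(N_BINS)] for _ in range(N_BINS)]
    matrix[bin_index(-60.0)][bin_index(-45.0)] = "A"
    return matrix


if __name__ == "__main__":
    print(repr(region(toy_matrix(), -55.0, -42.0)))
```

Cells holding an empty string play the role of the unassigned intervals, which appear as blanks in the tables.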
Gail Hutchinson 22nd May, 1996