To report bugs, please contact Cai X.-J. Zhang at chk@uoxray.uoregon.edu
Also see: MATCH1D
3D_A file_name
The 3D_A card is used to input the structural information for protein molecule A. Two files are required for each 3D_A card. One is a coordinate file in PDB format, file_name.pdb. The other is a secondary structure list, file_name.dssp. The .dssp file is created with the program DSSP (Kabsch & Sander, 1983) in the CCP4 package (Evans, 1991).
3D_B file_name
The 3D_B card is used to input the structural information for protein molecule B. Two files are required for each 3D_B card. One is the coordinate file in PDB format, file_name.pdb. The other is a secondary structure list, file_name.dssp. Each 3D_B card activates a comparison between the structure being specified and a structure previously specified using a 3D_A card. The most recently defined search criteria (or the defaults, if none are defined explicitly) are used for the homology search.
BEST {ON, OFF}
The BEST card forces the program to output only those solutions that have either the largest number of matched residues or the smallest rms deviation. The default is to output every solution.
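As an illustration of the selection that the BEST card describes, the following Python sketch keeps only the solutions with the largest number of matched residues or the smallest rms deviation. The `Solution` record and its field names are hypothetical and are not part of MATCH3D; how the program itself treats ties is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Solution:
    # Hypothetical record; the field names are assumptions for illustration.
    matched_residues: int
    rms: float


def best_solutions(solutions):
    """Keep solutions with the most matched residues or the lowest rms."""
    if not solutions:
        return []
    most = max(s.matched_residues for s in solutions)
    lowest = min(s.rms for s in solutions)
    return [s for s in solutions
            if s.matched_residues == most or s.rms == lowest]


# Made-up candidate list, used only to exercise the function.
candidates = [Solution(42, 3.5), Solution(30, 2.1), Solution(25, 4.0)]
kept = best_solutions(candidates)
```

With these made-up candidates, the first solution is kept for its residue count and the second for its rms; the third is dropped.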
CUTOFF cutoff min_wnor max_rms
The cutoff value is used at two levels of the homology search. First, two secondary structure elements, ai and aj, in molecule A are considered similar to two secondary structure elements, bk and bl, in molecule B if the superposition of their vectors has an rms deviation less than cutoff (in Å). Within two sets of homologous vectors, it is necessary that every corresponding combination of ai & aj and bk & bl has an rms deviation less than cutoff. Second, the sufficient condition for two sets of vectors to be homologous is that the overall rms deviation is less than cutoff. The default value of cutoff is 3.0 Å.
min_wnor is the minimum weighted number of residues (wnor). A potential solution with a wnor smaller than min_wnor is not listed in the output. It may be used to select the better solutions among a group of candidates. The default is zero (0).
max_rms is the maximum allowed rms deviation between the two sets of C-alpha atoms. A potential solution with an rms larger than max_rms is not listed in the output. The default is no limit (coded as 1000.0 Å).
INCLUDE parameter_file_name
The INCLUDE card defines an input parameter file, which may contain any input cards, including the INCLUDE card itself. One use of the INCLUDE card is to create one template file for each template structure, containing the file name and suitable search criteria. Such template files can then be grouped into libraries for systematic homology searches.
CLIQUE min_clique
MATCH3D uses the Maximum Common Subgraph (MCS) technique (Bron & Kerbosch, 1973) to search for homologous sets of secondary structures (i.e. vectors). min_clique defines the minimum number of vector pairs in two structures for them to be listed as homologous. min_clique must be larger than one (1). The default is five (5).
QUIT
QUIT stops the program. It functions the same as [end_of_file] (e.g. control-Z while running the program interactively).
SEQUENTIAL {ON, OFF}
The SEQUENTIAL card sets a restriction on the solution, i.e. whether or not homologous secondary structures must be in the same sequential order in the two structures being compared. If this option is turned on, the secondary structure pairs will be in the same order. Otherwise, permutation is allowed between the two structures. The default is OFF.
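The ordering restriction described for SEQUENTIAL can be sketched as a simple check on the matched pairs. This is only an illustration under the assumption that each match is recorded as a pair of positions (element index in molecule A, element index in molecule B); the function name and data layout are hypothetical and do not come from MATCH3D.

```python
def is_sequential(pairs):
    """Return True if the matched elements appear in the same order in A and B.

    pairs: iterable of (index_in_A, index_in_B) tuples (assumed layout).
    """
    ordered = sorted(pairs)                 # sort by position in molecule A
    b_positions = [b for _, b in ordered]   # positions in molecule B, in A order
    # Same order means the B positions are strictly increasing.
    return all(x < y for x, y in zip(b_positions, b_positions[1:]))


same_order = is_sequential([(0, 1), (2, 3), (5, 4 + 2)])
permuted = is_sequential([(0, 3), (2, 1)])
```

A solution for which the check fails would only be acceptable with SEQUENTIAL OFF, where permutation between the two structures is allowed.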
!comment
Any input line starting with a semicolon (;) or an exclamation mark (!) will be ignored.
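To illustrate how the cards above fit together, here is a minimal Python sketch of a card reader. It is not MATCH3D's own parser. It assumes that card names are case-insensitive (the example input later in this document uses lower case), that arguments are separated by white space, and that blank lines are skipped. Files are passed as an in-memory dictionary so that the example is self-contained; the file names `main.inp` and `template.inp` are hypothetical.

```python
class _Quit(Exception):
    """Raised internally when a QUIT card is read."""


def _walk(name, files, depth):
    if depth > 20:  # arbitrary guard against INCLUDE loops (assumption)
        raise RecursionError("INCLUDE nesting too deep")
    for line in files[name].splitlines():
        text = line.strip()
        if not text or text[0] in ";!":   # blank line or comment line
            continue
        card, *args = text.split()
        card = card.upper()
        if card == "QUIT":
            raise _Quit                   # stops reading everywhere
        if card == "INCLUDE":
            yield from _walk(args[0], files, depth + 1)
        else:
            yield card, args


def read_cards(name, files):
    """Return the list of (card, arguments) read before QUIT or end of file."""
    cards = []
    try:
        for item in _walk(name, files, 0):
            cards.append(item)
    except _Quit:
        pass
    return cards


files = {
    "main.inp": "! search settings\ninclude template.inp\n3d_b 2stv\nquit\n3d_b ignored\n",
    "template.inp": "; template for bgal_e\ncutoff 5.5\nclique 5\n3d_a bgal_e\n",
}
cards = read_cards("main.inp", files)
```

Here the template file carries the search criteria and the 3D_A card, in the spirit of the template libraries described for INCLUDE, and the card after QUIT is never read.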
$ run [chk.mcs]match3d.exe
cutoff 5.5
clique 5
sequential on
3d_a bgal_e
3d_b 2stv
$
The output from MATCH3D is the following.
< cutoff 5.5
< clique 5
< sequential on
< 3d_a bgal_e
 20 vectors in molecule bgal_e
< 3d_b 2stv
 12 vectors in molecule 2stv
 The MCS matrix has a dimension of 162 x 162.
# 1 clique               rms       weights  rms      best
    vectors              omitting           ca-atom  matches
E: 833- 844E: 26 - 37    4.54      1.00     4.22     ' 833- 844' AVLITTAHAWQH
                                                     ' 26 - 37 ' HKRFALINSGNT
E: 881- 888E: 83 - 92    5.04      0.89     3.54     ' 881- 888' RIGLNCQL
                                                     ' 85 - 92 ' FRFIWFRD
H: 964- 967H: 102- 105   4.94      1.00     5.63     ' 964- 967' QQQL
                                                     ' 102- 105' VLEV
E: 982- 990E: 125- 135   4.92      0.90     4.21     ' 982- 990' TWLNIDGFH
                                                     ' 125- 133' FTILK-VTL
E: 1013-1021E: 142- 150  5.68      1.00     2.22     '1013-1021' RYHYQLVWC
                                                     ' 142- 150' IKDRIINLP
rtn polar 130 90 -2 1 0 4 ! 5 vect.( 5.2), 42/ 38 res.( 3.5)

In the above output, for each input protein molecule or domain (i.e. each 3D_A or 3D_B card), MATCH3D lists the number of helices or beta-strands defined by the program DSSP (Kabsch & Sander, 1983), i.e. the number of vectors. It also lists the number of possible pairs of vectors between the two structures, i.e. the dimension of the MCS matrix. The amount of calculation is roughly proportional to the linear dimension of the matrix (here, 162).
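The clique search cited for MATCH3D (Bron & Kerbosch, 1973) can be illustrated with the basic form of that algorithm. The sketch below is a generic textbook version without pivoting, not MATCH3D's implementation. It assumes that each node stands for one candidate vector pair and that an edge joins two nodes that are mutually compatible; the toy graph and the function name are hypothetical. The `min_clique` argument plays the role of the CLIQUE card.

```python
def maximal_cliques(adjacency, min_clique=1):
    """Enumerate maximal cliques with at least min_clique nodes.

    adjacency: dict mapping each node to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already-explored nodes
        if not p and not x:
            if len(r) >= min_clique:
                cliques.append(sorted(r))
            return
        for v in sorted(p):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adjacency), set())
    return sorted(cliques)


# Toy compatibility graph: nodes 0, 1, 2 are mutually compatible,
# node 3 is compatible with node 2 only, node 4 with nothing.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}
found = maximal_cliques(graph, min_clique=2)
```

Raising `min_clique` discards small cliques, which is the effect the CLIQUE card has on the list of homologous sets.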
In the list of the matched vectors, H: and E: stand for alpha-Helical and beta-strand (Extended) secondary structures. Note that only the same type of secondary structure vectors can match with one another.
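Because only vectors of the same type can match, one plausible way to count the candidate vector pairs is to multiply the helix counts and the strand counts separately and add the two products. The Python sketch below does this for a made-up example. It is an assumption that the MCS matrix dimension is obtained this way; the source does not give the helix and strand breakdown behind the value 162, so the numbers used here are purely illustrative.

```python
from collections import Counter


def count_candidate_pairs(types_a, types_b):
    """Count same-type pairs between two lists of secondary structure types.

    types_a, types_b: sequences of 'H' (helix) or 'E' (strand) labels.
    """
    a, b = Counter(types_a), Counter(types_b)
    # A Counter returns 0 for a type that is absent from one molecule.
    return sum(a[t] * b[t] for t in a)


# Hypothetical molecules: A has 3 strands and 1 helix, B has 2 of each.
n_pairs = count_candidate_pairs("EEEH", "EEHH")
```

For this toy case there are 3 x 2 strand pairs and 1 x 2 helix pairs, eight in total.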
The rms omitting column lists the rms deviation between the two vector sets when that particular pair of vectors is omitted. A small value of rms omitting often indicates that the corresponding pair is an outlier. In other words, if one deletes the pair from the two vector lists, the rms of the remaining vector pairs may improve significantly.
The weights column lists the weight of each vector pair. The weights are used in the least-squares structure superposition (McLachlan, 1979). Initially, the weight for a given pair of vectors (one from each of the two protein molecules) is set to
1 - |ni - nj| / |ni + nj|
where ni and nj are the numbers of residues in the two secondary structure fragments being compared. The weight is one (1) if the two fragments contain the same number of residues; it decreases as ni and nj become more different. In other words, the superposition favors pairs of secondary structure elements that have the same length and disfavors pairs whose lengths differ significantly. During the homology search, if the overall rms of the two vector sets is larger than the user-defined cutoff, the two vector sets do not match well. In some cases, however, a bad match may be caused by only one or two outliers. MATCH3D therefore explores this possibility by eliminating potential outliers: the weight of the vector pair that has the smallest rms omitting is reset to zero (0). This new set of weights is then used to recalculate rms omitting and subsequently to calculate rms ca-atom.
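The weight formula can be checked against the example output. The Python sketch below assumes that ni and nj are the inclusive lengths of the residue ranges shown for each vector pair in the example; under that assumption it reproduces the printed weights. The `reset_outlier` function is a simplified reading of the outlier step described above, with hypothetical names, and is not taken from the MATCH3D source.

```python
def initial_weight(n_i, n_j):
    """Initial weight of a vector pair from the two fragment lengths."""
    return 1.0 - abs(n_i - n_j) / (n_i + n_j)


def length(first, last):
    """Number of residues in an inclusive residue range."""
    return last - first + 1


# Residue ranges of the five vector pairs in the example output (A, B).
pairs = [
    ((833, 844), (26, 37)),
    ((881, 888), (83, 92)),
    ((964, 967), (102, 105)),
    ((982, 990), (125, 135)),
    ((1013, 1021), (142, 150)),
]
weights = [round(initial_weight(length(*a), length(*b)), 2) for a, b in pairs]


def reset_outlier(weights, rms_omitting, overall_rms, cutoff):
    """Zero the weight of the pair with the smallest rms omitting,
    but only when the overall rms exceeds the cutoff."""
    if overall_rms <= cutoff:
        return list(weights)
    outlier = min(range(len(rms_omitting)), key=rms_omitting.__getitem__)
    adjusted = list(weights)
    adjusted[outlier] = 0.0
    return adjusted


rms_omitting = [4.54, 5.04, 4.94, 4.92, 5.68]  # from the example output
unchanged = reset_outlier(weights, rms_omitting, overall_rms=5.2, cutoff=5.5)
# Hypothetical stricter cutoff, only to show the reset taking effect.
adjusted = reset_outlier(weights, rms_omitting, overall_rms=5.2, cutoff=5.0)
```

With the cutoff of 5.5 used in the example, the overall vector rms of 5.2 is within the limit and no weight is reset, which agrees with the non-zero weights in the output.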
The best matches column lists the best match of C-alpha atoms within each pair of vectors at the position determined by the weighted vector superposition. This information is useful when the two vectors differ in length. The column also lists the amino acid sequences of the matched stretches in single-letter code. The rms ca-atom column lists the corresponding rms coordinate difference of the C-alpha atoms.
The rtn polar output line gives the rotation in polar angles (here 130°, 90° and -2°) and the translation along the Cartesian axes (here 1 Å, 0 Å and 4 Å) that bring molecule B onto molecule A according to the weighted structure superposition (McLachlan, 1979) of the two sets of C-alpha atoms. In this example, MATCH3D reports that five (5) vectors and 42 C-alpha atoms are matched between the structures of bgal_e and 2stv, with rms deviations of 5.2 Å and 3.5 Å, respectively. The weighted number of residues is 38 instead of 42; in other words, the weight used in the structural overlap is not evenly distributed among the 42 C-alpha atoms. The result does not include possible matches between atoms in non-helical, non-beta-strand regions.
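For post-processing, the summary at the end of the rtn polar line can be read with a regular expression and compared with the min_wnor and max_rms limits of the CUTOFF card. The Python sketch below assumes the line looks exactly like the sample shown in this document; the spacing of real output may differ, and the function and key names are hypothetical.

```python
import re

SUMMARY = re.compile(
    r"(\d+)\s*vect\.\(\s*([\d.]+)\),\s*(\d+)/\s*(\d+)\s*res\.\(\s*([\d.]+)\)"
)


def parse_summary(line):
    """Extract vector and residue statistics from an rtn polar line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("no summary found in line")
    return {
        "vectors": int(m.group(1)),
        "vector_rms": float(m.group(2)),
        "residues": int(m.group(3)),
        "wnor": int(m.group(4)),   # weighted number of residues
        "ca_rms": float(m.group(5)),
    }


def passes_output_filters(wnor, rms, min_wnor=0.0, max_rms=1000.0):
    """A solution is listed unless wnor < min_wnor or rms > max_rms."""
    return wnor >= min_wnor and rms <= max_rms


line = "rtn polar 130 90 -2 1 0 4 ! 5 vect.( 5.2), 42/ 38 res.( 3.5)"
summary = parse_summary(line)
listed_by_default = passes_output_filters(summary["wnor"], summary["ca_rms"])
listed_if_strict = passes_output_filters(summary["wnor"], summary["ca_rms"],
                                         min_wnor=40)
```

With the default limits the example solution is listed; a hypothetical min_wnor of 40 would suppress it, because its weighted number of residues is 38.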
In more general cases, MATCH3D gives more than one solution for a given pair of structures. Some of them may be only partially correct. The first choice is usually the solution with the largest number of matched C-beta atoms and the smallest rms deviation.
Bron, C. & Kerbosch, J. (1973). Algorithm 457: finding all cliques of an undirected graph. Commun. ACM 16, 575-577.
Evans, P. R. (1991). The CCP4 package program. Crystallographic Computing 5, edited by Moras et al. Oxford Science Publications.
Grindley, H. M., et al. (1993). Identification of tertiary structure resemblance in proteins using a maximal common subgraph isomorphism algorithm. J. Mol. Biol. 229, 707-721.
Holm, L. & Sander, C. (1993). Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 233, 123-138.
Jacobson, R. H., Zhang, X.-J., Dubose, R. F. & Matthews, B. W. (1994). Three-dimensional structure of beta-galactosidase from E. coli. Nature 369, 761-766.
Kabsch, W. & Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577-2637.
Matthews, B. W. & Rossmann, M. G. (1985). Methods Enzymol. 115, 397-420.
McLachlan (1979). J. Mol. Biol. 128, 49-79.
Orengo, C. A., Jones, D. T. & Thornton, J. M. (1994). Protein superfamilies and domain superfolds. Nature 372, 631-634.
Taylor, W. R. & Orengo, C. A. (1989). Protein structure alignment. J. Mol. Biol. 208, 1-22.