Program SARF2 searches for common Spatial ARrangements of backbone (actually Ca-trace) Fragments (SARFs) in protein structures.
The algorithm finds all compatible pairs of secondary structure (SS) elements (alpha-helix or beta-strand) and then forms ensembles of mutually compatible pairs.
Alexandrov, N.N. SARFing the PDB. Protein Engineering (1996), 9:727-732.
To run the program type 'sarf2'.
SARF2 reads its parameters from the file PARAM. These include the names of the two files that list the PDB-formatted structures to compare (list1 and list2) and the directories containing those structure files (dir1 and dir2). To compare just two structures, put the name of the PDB-formatted file with the first structure into list1 and the name of the file with the second one into list2. In a list of protein structures you can specify the chain after the file name, separated by a space. If dir1==dir2 and list1==list2, the program makes all-against-all comparisons of the structures from list1.
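As an illustration of the list files described above, the following Python sketch writes a list with one structure per line and an optional chain identifier after a space. The helper names (format_entry, write_structure_list) and the example file names are hypothetical, and the one-entry-per-line layout is an assumption. PARAM itself is not generated, because its exact layout is not described here.

```python
from pathlib import Path


def format_entry(pdb_file, chain=None):
    """Return one list line: the file name, optionally followed by a space and a chain ID."""
    return f"{pdb_file} {chain}" if chain else pdb_file


def write_structure_list(path, entries):
    """Write (pdb_file, chain_or_None) pairs to a list file, one per line (assumed layout)."""
    lines = [format_entry(name, chain) for name, chain in entries]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    # Hypothetical example: compare chain A of one structure with a second structure.
    write_structure_list("list1", [("1abc.pdb", "A")])
    write_structure_list("list2", [("2xyz.pdb", None)])
```

Pointing list1 and list2 (and dir1 and dir2) in PARAM at the same file and directory would then request the all-against-all comparison.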
The resulting file keeps the best match for each pairwise comparison. Each line of the file contains the two PDB identifiers, the sizes of the proteins, the size of the match, the rmsd, the fraction of matched residues in the smaller protein (%), the fraction of identical residues in the match (%), and the short names of the proteins.
Print level - use 2.
Min fragment size (for SS) - use 5; for small proteins (less than 60 residues) use 4.
Max rmsd for alpha 1 - use 0.4.
Max rmsd for alpha 2 - use 0.3.
Max rmsd for beta 1 - use 0.8.
Max rmsd for beta 2 - use 0.4.
Max angle between compatible SS - should be from 30 to 60 degrees. For most cases the recommended value is 50 degrees.
Error in distance between compatible SS - should be from 1.0 to 2.0 A. The recommended value is 1.5.
Filtering distance - should be from 3 to 5. The recommended value is 4.
Min fragment size (residue level) - use 5.
Rmsd for two SS pairs from ensemble - not used now.
Number of the ensembles - should be from 50 to 300. The recommended value is 200, but if you don't care about the calculation time, make it 300.
Fraction of the matched residues - if the similarity between two proteins is larger than this value, the alignment is reported.
Min number of SS in ensemble - should be from 2 to 5. In theory, it shows how many SS elements we should expect in the match. In practice it is not exactly so. For small proteins use 2, for most cases use 3.
Fragment size (extension) - the min size of the fragments forming the similarity. Should be from 4 to 7. For most cases use 5.
Number of iterations in extension - use 4.
Mean dist for extension - should be from 3 to 6. Use 4.5.
Indel for extension - use 2.
Min dist for extension - use 3.
Max dist for extension - should be from 3 to 6. This parameter defines the max distance between a pair of corresponding atoms. The recommended value is 5.
Max rmsd for extension - together with the previous parameter, this limits the resulting r.m.s.d. of the match. It should be from 2.5 to 3.5. The recommended value is 3.2.
To incorporate single residues? - should be 'Y' if you want to incorporate single residues into the match. If you want only fragments, type 'N'. The recommended value is 'N'; however, 'Y' increases the total number of residues in the match. There is a bug in the program that sometimes causes a crash if you choose 'Y'.
Consecutive fragments only? - should be 'Y' if you want to include only consecutive fragments. Type 'N' if you want to allow opposite directions of the fragments and disregard the connectivity. It is recommended to compare proteins first using the 'Y' option and then to try 'N'.
Print sequence alignment? - when 'Y', the output includes a sequence alignment.
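The numeric ranges in the parameter notes above can be checked before a run. The Python sketch below transcribes those ranges and recommended values into a table and reports any value that falls outside them. The dictionary keys are hypothetical labels chosen for this example, not names that SARF2 reads, and the sketch does not parse or write PARAM.

```python
# (low, high, recommended) for each parameter that has a documented range.
# Keys are hypothetical labels, not identifiers used by SARF2 itself.
RECOMMENDED = {
    "max_angle_deg": (30.0, 60.0, 50.0),
    "distance_error_A": (1.0, 2.0, 1.5),
    "filtering_distance": (3.0, 5.0, 4.0),
    "number_of_ensembles": (50, 300, 200),
    "min_ss_in_ensemble": (2, 5, 3),
    "fragment_size_extension": (4, 7, 5),
    "mean_dist_extension": (3.0, 6.0, 4.5),
    "max_dist_extension": (3.0, 6.0, 5.0),
    "max_rmsd_extension": (2.5, 3.5, 3.2),
}


def defaults():
    """Return the recommended value for every parameter in the table."""
    return {name: rec for name, (_, _, rec) in RECOMMENDED.items()}


def check_parameters(values):
    """Return a list of warnings for values outside the documented ranges."""
    warnings = []
    for name, value in values.items():
        if name not in RECOMMENDED:
            warnings.append(f"{name}: no documented range")
            continue
        low, high, _ = RECOMMENDED[name]
        if not low <= value <= high:
            warnings.append(f"{name}: {value} is outside {low}..{high}")
    return warnings


if __name__ == "__main__":
    settings = defaults()
    settings["max_angle_deg"] = 75.0  # deliberately out of range
    for message in check_parameters(settings):
        print(message)
```

A table like this keeps the advice in one place; parameters with a single suggested value (for example Print level) are left out because no range is given for them.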
The program's output begins with the input parameters, followed by the results of the pairwise comparison of protein structures. Each protein from list1 is compared with each protein from list2. Every time the program reads a new protein from list1, it prints 'FIRST PROTEIN' and its filename. Every time the program reads a new protein from list2, it prints 'next:' and the filenames of the proteins with their sizes in brackets.
The program reports several similarities, sorted by size (the biggest first). The report about each similarity begins with the names of the proteins, followed on the same line by the number of residues in the similarity, the r.m.s.d. between corresponding Ca-atoms after the superposition, the fraction of matched residues in the smaller protein, and the fraction of identical residues in the match. If the fraction of matched residues in the smaller protein is larger than the value in file PARAM ('Fraction of the matched residues'), the alignment is reported, showing the corresponding residues. PDB enumeration is used.
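For post-processing, the numbers in a similarity report can be pulled out with a regular expression. The Python sketch below assumes the summary text has the form that appears in the sample output quoted in this document ("67 Ca-atoms ( 94%), rmsd = 1.85, 22% identical residues"); the exact spacing of real output may differ, so the pattern tolerates variable whitespace. The function name parse_summary and the dictionary keys are hypothetical.

```python
import re

# Pattern for the summary text of one similarity, based on the sample output only.
SUMMARY = re.compile(
    r"(\d+)\s+Ca-atoms\s*\(\s*(\d+)%\),\s*rmsd\s*=\s*([\d.]+),"
    r"\s*(\d+)%\s+identical\s+residues"
)


def parse_summary(line):
    """Extract match size, matched %, rmsd and identity % from a summary line.

    Returns None when the line does not contain a summary in the assumed form.
    """
    m = SUMMARY.search(line)
    if m is None:
        return None
    return {
        "matched": int(m.group(1)),
        "matched_percent": int(m.group(2)),
        "rmsd": float(m.group(3)),
        "identical_percent": int(m.group(4)),
    }


if __name__ == "__main__":
    sample = "67 Ca-atoms ( 94%), rmsd = 1.85, 22% identical residues"
    print(parse_summary(sample))
    # {'matched': 67, 'matched_percent': 94, 'rmsd': 1.85, 'identical_percent': 22}
```

Because the pattern is searched for rather than anchored, it also matches when the protein names precede the numbers on the same line.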
(i) Copy the files /usr/local/sarf2/ALPHA, /usr/local/sarf2/BETA and /usr/local/sarf2/PARAM into your working directory.
(ii) Create files xlist and ylist, which are lists of the pdb files to be superposed. Files in ylist get superposed on those in xlist.
     NOTE - I think the program can only accept pdb file names up to six characters in length (now there's a nice feature). So you may have to do some renaming or some tricky symbolic linking of your files.
(iii) Type sarf2.
(iv) Look at the output.
(v) If you're happy with the superposition and you want to actually apply it to the molecules involved, you'll have to use a pdb manipulation program - like pdbset from the ccp4 suite, for example.
    NOTE that because of rounding errors the determinant of the rotation matrix may not be exactly 1 and you may have to fix this.

Let's say this is the relevant output from sarf2:

    rsv.pdb ( 76)  hiv.pdb ( 71)
    common SARF # 1
    --------------------------------------------------
    155 - 209   152 - 206
    212 - 223   208 - 219
    67 Ca-atoms ( 94%), rmsd = 1.85, 22% identical residues
    alignment of 4 SS elements: 4 alpha and 0 beta
    a01a02a03a04
    a01a02a04a05
    translation vector t1 = ( -0.528 -1.379 1.355)
    translation vector t2 = ( 20.840 25.764 0.513)
    rotation matrix r:
    -0.992  0.020 -0.126
    -0.126 -0.311  0.942
    -0.020  0.950  0.311

Then we need to apply (to the 2nd molecule - in this case hiv.pdb) -t2, the rotation, +t1.

Here's how to apply this transformation with pdbset. The rotation matrix has been adjusted slightly to make its determinant 1 (using the program convrot).

    #!/bin/csh -f
    setenv CCP4_OPEN UNKNOWN
    ./pdbset XYZIN hiv.pdb XYZOUT junk1.pdb << $
    ROTATE 1.0 0.0 0.0 -
    0.0 1.0 0.0 -
    0.0 0.0 1.0
    SHIFT -20.840 -25.764 -0.513
    $
    ./pdbset XYZIN junk1.pdb XYZOUT junk2.pdb << $
    ROTATE MATRIX -0.99201 0.02000 -0.12600 -
    -0.12600 -0.31100 0.94201 -
    -0.02000 0.95001 0.31100
    $
    ./pdbset XYZIN junk2.pdb XYZOUT hiv_transformed.pdb << $
    ROTATE 1.0 0.0 0.0 -
    0.0 1.0 0.0 -
    0.0 0.0 1.0
    SHIFT -0.528 -1.379 1.355
    $
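The same three steps (-t2, the rotation, +t1) can be written out directly, which also makes the determinant issue easy to inspect. The Python sketch below is not part of SARF2 or CCP4. It assumes the matrix acts on column vectors, so that x' = r (x - t2) + t1; that convention is an assumption and should be checked against a known superposition. The determinant is repaired here with a Gram-Schmidt step on the rows, a technique swapped in for convrot, whose method is not described in this document. All function names are hypothetical.

```python
import math

# Values copied from the sample sarf2 output above.
T1 = [-0.528, -1.379, 1.355]
T2 = [20.840, 25.764, 0.513]
R = [[-0.992, 0.020, -0.126],
     [-0.126, -0.311, 0.942],
     [-0.020, 0.950, 0.311]]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def det3(m):
    """Determinant of a 3x3 matrix as the triple product of its rows."""
    return dot(m[2], cross(m[0], m[1]))


def orthonormalize(m):
    """Gram-Schmidt on the first two rows; the third row is their cross product,
    so the result is a proper rotation with determinant +1."""
    r0 = normalize(m[0])
    proj = dot(m[1], r0)
    r1 = normalize([m[1][i] - proj * r0[i] for i in range(3)])
    return [r0, r1, cross(r0, r1)]


def transform(point, r, t1, t2):
    """Apply -t2, then the rotation, then +t1 (column-vector convention assumed)."""
    shifted = [point[i] - t2[i] for i in range(3)]
    rotated = [dot(r[i], shifted) for i in range(3)]
    return [rotated[i] + t1[i] for i in range(3)]


if __name__ == "__main__":
    fixed = orthonormalize(R)
    print("det before:", det3(R), "det after:", det3(fixed))
    print(transform([21.0, 26.0, 1.0], fixed, T1, T2))
```

Building the third row as a cross product forces the determinant to +1, so this repair is only appropriate when the printed matrix is already close to a proper rotation, as it is here.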
For more information, look at sarf's homepage