What is SARF?

The program SARF2 searches for common Spatial ARrangements of backbone (actually Ca-trace) Fragments (SARFs) in protein structures.

The algorithm first finds all compatible pairs of secondary structure (SS) elements (a-helices or b-strands) and then forms ensembles of mutually compatible pairs.

Reference

Alexandrov, N.N. SARFing the PDB. Protein Engineering (1996), 9:727-732.

How to run the program

To run the program type 'sarf2'.

Files needed

sarf2 - the program SARF2;
PARAM - parameters;
ALPHA - fragment of the helical structure;
BETA - fragment of the b-strand;
list1 and list2 - lists of the PDB-formatted files;
PDB-formatted files of structures to compare.

SARF2 reads its parameters from the file PARAM. These include the names of the two files listing the PDB-formatted structures to compare (list1 and list2) and the directories where those structures are located (dir1 and dir2). Each protein in list1 is compared with each protein in list2. To compare just two structures, put the name of the PDB-formatted file with the first structure into list1 and the name of the file with the second one into list2. In a list, you can specify a chain after the file name, separated by a space. If dir1==dir2 and list1==list2, the program makes all-against-all comparisons of the structures in list1.
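As an illustration of the list format described above, here is a minimal Python sketch that parses list lines (file name, optionally followed by a space and a chain) and enumerates the list1 x list2 comparisons. The function names and the chain 'A' are hypothetical. Whether SARF2 skips self-comparisons or mirrored pairs in the all-against-all case is not stated here, so the sketch simply lists every combination.

```python
from itertools import product


def read_structure_list(lines):
    """Parse list lines of the form 'filename' or 'filename chain'."""
    entries = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        filename = parts[0]
        chain = parts[1] if len(parts) > 1 else None
        entries.append((filename, chain))
    return entries


def planned_comparisons(list1, list2):
    """Every entry of list1 paired with every entry of list2."""
    return list(product(list1, list2))


if __name__ == "__main__":
    # Hypothetical list contents; chain 'A' is only an example.
    list1 = read_structure_list(["rsv.pdb A"])
    list2 = read_structure_list(["hiv.pdb"])
    for first, second in planned_comparisons(list1, list2):
        print(first, "vs", second)
```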

Other parameters

Resulting file - keeps the best match for each pairwise comparison. Each line of the file has the two PDB identifiers, the sizes of the proteins, the size of the match, the rmsd, the fraction of matched residues in the smaller protein (%), the fraction of identical residues in the match (%), and the short names of the proteins.

Print level - use 2.

Min fragment size (for SS) - use 5; for small proteins (fewer than
60 residues) use 4.

Max rmsd for alpha 1 - use 0.4
Max rmsd for alpha 2 - use 0.3
Max rmsd for beta 1 - use 0.8
Max rmsd for beta 2 - use 0.4

Max angle between compatible SS - should be from 30 to 60 degrees.
For most cases the recommended value is 50 degrees.

Error in distance between compatible SS - should be from 1.0 to 2.0 A. 
The recommended value is 1.5.

Filtering distance - should be from 3 to 5.
The recommended value is 4.

Min fragment size (residue level) - use 5.

Rmsd for two SS pairs from ensemble - is not used now.

Number of the ensembles - should be from 50 to 300. 
Recommended value is 200, but if you don't care about
the time of the calculations, make it 300. 

Fraction of the matched residues - if the similarity
between two proteins is larger than this value, then
the alignment is reported.

Min number of SS in ensemble - should be from 2 to 5. In theory,
it shows how many SS elements we should expect in the match. In
practice this is only approximate. For small proteins use 2, for
most cases use 3.

Fragment size (extension) - the parameter shows the min size of the
fragments forming the similarity. Should be from 4 to 7. For most 
cases use 5.

Number of iterations in extension - use 4.

Mean dist for extension - should be from 3 to 6. Use 4.5.
Indel for extension - use 2.
Min dist for extension - use 3.

Max dist for extension - should be from 3 to 6. This parameter
defines the max distance between a pair of the corresponding
atoms. The recommended value is 5.

Max rmsd for extension - this parameter, together with the
previous one, limits the resulting r.m.s.d. of the match.
It should be from 2.5 to 3.5. The recommended value is 3.2.

To incorporate single residues? should be 'Y' if you want to
incorporate single residues into the match. If you want only
fragments, type 'N'. The recommended value is 'N'; 'Y' increases
the total number of residues in the match, but there is a bug
in the program that sometimes causes a crash if you choose 'Y'.

Consecutive fragments only? should be 'Y' if you want to 
include only consecutive fragments. Type 'N' if you want
to allow opposite direction of the fragments and disregard the
connectivity. It is recommended to compare proteins first
using 'Y' option and then to try 'N'.

Print sequence alignment? if 'Y', the output includes
a sequence alignment.
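The ranges and recommended values listed above can be collected into a small table and checked before a run. The Python sketch below is only a convenience under stated assumptions: it does not read or write PARAM (whose layout is not described here), and the names RANGES and out_of_range are hypothetical. The numbers themselves are copied from the descriptions above.

```python
# (low, high, recommended) copied from the parameter descriptions above.
RANGES = {
    "Max angle between compatible SS": (30, 60, 50),
    "Error in distance between compatible SS": (1.0, 2.0, 1.5),
    "Filtering distance": (3, 5, 4),
    "Number of the ensembles": (50, 300, 200),
    "Min number of SS in ensemble": (2, 5, 3),
    "Fragment size (extension)": (4, 7, 5),
    "Mean dist for extension": (3, 6, 4.5),
    "Max dist for extension": (3, 6, 5),
    "Max rmsd for extension": (2.5, 3.5, 3.2),
}


def out_of_range(values):
    """Return {name: (value, low, high)} for values outside the documented range."""
    problems = {}
    for name, value in values.items():
        if name not in RANGES:
            continue  # no range is documented for this parameter
        low, high, _recommended = RANGES[name]
        if not low <= value <= high:
            problems[name] = (value, low, high)
    return problems


# The recommended settings, as a starting point.
recommended = {name: rec for name, (_low, _high, rec) in RANGES.items()}

if __name__ == "__main__":
    trial = dict(recommended, **{"Number of the ensembles": 500})
    print(out_of_range(trial))
```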

Output

The program's output begins with the input parameters, followed by the results of the pairwise comparisons of protein structures. Each protein from list1 is compared with each protein from list2. Each time the program reads a new protein from list1, it prints 'FIRST PROTEIN' and its filename. Each time it reads a new protein from list2, it prints 'next:' and the filenames of the proteins with their sizes in brackets.

The program reports several similarities, sorted by size (largest first). The report for each similarity begins with the names of the proteins, followed on the same line by the number of residues in the similarity, the r.m.s.d. between corresponding Ca-atoms after superposition, the fraction of matched residues in the smaller protein, and the fraction of identical residues in the match. If the fraction of matched residues in the smaller protein is larger than the value in file PARAM ('Fraction of the matched residues'), the alignment is also reported, showing the corresponding residues. PDB residue numbering is used.
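The fraction of matched residues is taken relative to the smaller protein. A minimal Python sketch of that arithmetic follows, using the numbers from the sample output later in this document (67 Ca-atoms, proteins of 76 and 71 residues). The function names are hypothetical, and the sketch assumes the threshold in PARAM is given in percent, which is not confirmed here.

```python
def matched_fraction(n_matched, size1, size2):
    """Percentage of the smaller protein's residues that are in the match."""
    return 100.0 * n_matched / min(size1, size2)


def alignment_is_reported(n_matched, size1, size2, threshold_percent):
    """True if the match exceeds the 'Fraction of the matched residues' value.

    Assumption: the threshold is expressed in percent.
    """
    return matched_fraction(n_matched, size1, size2) > threshold_percent


if __name__ == "__main__":
    # 67 matched Ca-atoms, proteins of 76 and 71 residues (sample output).
    print(round(matched_fraction(67, 76, 71)))
```

With these numbers the smaller protein has 71 residues, so the fraction is 67/71, which rounds to the 94% shown in the sample output.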

How to use SARF2 ... some additional brief instructions by Richard (Sep 99)

(i)    copy the files /usr/local/sarf2/ALPHA
                      /usr/local/sarf2/BETA
                      /usr/local/sarf2/PARAM
 
       into your working directory

(ii)   create files xlist and ylist, which are lists of the pdb files to be 
       superposed. Files in ylist get superposed on those in xlist.
       NOTE - I think the program can only accept pdb file names up to
       six characters in length (now there's a nice feature). So you may
       have to do some renaming or some tricky symbolic linking of your files.

(iii)  type sarf2

(iv)   look at the output

(v)    if you're happy with the superposition and you want to actually
       apply it to the molecules involved you'll have to use a pdb 
       manipulation program - like pdbset from the ccp4 suite for example.
       NOTE that because of rounding errors the determinant of the 
       rotation matrix may not be exactly 1 and you may have to fix this.

       Let's say this is the relevant output from sarf2


       rsv.pdb    ( 76) hiv.pdb    ( 71) common SARF #  1
       --------------------------------------------------

         155  -   209     152  -   206 
         212  -   223     208  -   219 

        67 Ca-atoms ( 94%), rmsd = 1.85,  22% identical residues


       alignment of  4 SS elements: 4 alpha and  0 beta

         a01a02a03a04                            
         a01a02a04a05                            

       translation vector t1 = (  -0.528  -1.379   1.355)
       translation vector t2 = (  20.840  25.764   0.513)
       rotation matrix r:
         -0.992   0.020  -0.126
         -0.126  -0.311   0.942
         -0.020   0.950   0.311

       then we need to apply three steps to the 2nd molecule (in this
       case hiv.pdb): subtract t2, apply the rotation r, then add t1
       
       Here's how to apply this transformation with pdbset ...

       The rotation matrix has been adjusted slightly to make its
       determinant 1 (using the program convrot)

#!/bin/csh -f

setenv CCP4_OPEN UNKNOWN

# Step 1: shift by -t2 (identity rotation)
./pdbset XYZIN hiv.pdb XYZOUT junk1.pdb << $
ROTATE 1.0 0.0 0.0 -
       0.0 1.0 0.0 -
       0.0 0.0 1.0
SHIFT  -20.840  -25.764   -0.513
$

# Step 2: apply the (adjusted) rotation matrix r
./pdbset XYZIN junk1.pdb XYZOUT junk2.pdb << $
ROTATE MATRIX   -0.99201   0.02000  -0.12600 -
                -0.12600  -0.31100   0.94201 -
                -0.02000   0.95001   0.31100
$

# Step 3: shift by +t1 (identity rotation)
./pdbset XYZIN junk2.pdb XYZOUT hiv_transformed.pdb << $
ROTATE 1.0 0.0 0.0 -
       0.0 1.0 0.0 -
       0.0 0.0 1.0
SHIFT   -0.528  -1.379   1.355
$
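
The same three steps (subtract t2, rotate, add t1) can also be sketched in plain Python, for example to check the determinant of the rotation matrix before using it. This is only an illustration under stated assumptions: the function names are hypothetical, the matrix is assumed to act on column vectors row by row (as it is passed to pdbset above), and the Gram-Schmidt clean-up is a stand-in technique, not necessarily what convrot does.

```python
import math

# Values copied from the sample sarf2 output above.
T1 = [-0.528, -1.379, 1.355]
T2 = [20.840, 25.764, 0.513]
R = [[-0.992, 0.020, -0.126],
     [-0.126, -0.311, 0.942],
     [-0.020, 0.950, 0.311]]


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def orthonormalize(m):
    """Gram-Schmidt on rows 1 and 2; row 3 is their cross product (det = +1)."""
    r1 = _unit(m[0])
    d = sum(m[1][k] * r1[k] for k in range(3))
    r2 = _unit([m[1][k] - d * r1[k] for k in range(3)])
    r3 = [r1[1] * r2[2] - r1[2] * r2[1],
          r1[2] * r2[0] - r1[0] * r2[2],
          r1[0] * r2[1] - r1[1] * r2[0]]
    return [r1, r2, r3]


def apply_transform(point, r, t1, t2):
    """Return r * (point - t2) + t1 for one (x, y, z) coordinate."""
    shifted = [point[k] - t2[k] for k in range(3)]
    rotated = [sum(r[row][k] * shifted[k] for k in range(3)) for row in range(3)]
    return [rotated[k] + t1[k] for k in range(3)]


if __name__ == "__main__":
    print("determinant as printed:", det3(R))
    fixed = orthonormalize(R)
    print("determinant after clean-up:", det3(fixed))
    print(apply_transform([21.0, 26.0, 1.0], fixed, T1, T2))
```

Applying apply_transform to every atom of the second molecule should be equivalent to the three pdbset runs above, up to the small difference between the two ways of adjusting the matrix.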

For more information, look at sarf's homepage