OVRLAP

A program for comparing and superimposing distantly related protein structures

Purpose:

Superimposing closely or distantly related protein structures is not a simple problem. The OVRLAP algorithm is still one of the best available, not least because it provides a 3-D superposition for inspection. OVRLAP is based on the algorithms developed by Michael Rossmann and Patrick Argos in the 1970s; this version was written by Bill Bennett. For details of the probability arguments, consult Rossmann & Argos, J. Biol. Chem. 250, 7525 (1975).

OVRLAP needs only a minimal set of initial equivalences to get started and will find a superposition for any pair of proteins or domains. The algorithm is based on probability arguments: two residues are deemed equivalent if they are "close in space" (controlled by the E1 value) and their chains are "running in the same direction" (controlled by E2). The probability distribution is modelled as a Gaussian. Equivalences are redefined or discarded as the algorithm progresses, which makes the procedure extremely robust. This version uses alpha carbon coordinates only.
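To make the two criteria concrete, here is a minimal Python sketch of a Rossmann-Argos style score for one candidate pair of residues. It assumes the textbook Gaussian form exp(-x^2 / (2 sigma^2)) with E1 and E2 as the sigmas, and it uses a hypothetical direction measure: the distance between the local chain vectors CA(i+1) - CA(i-1) of the two structures. The function names are illustrative only, and the exact expressions in ovrlap.f and in the 1975 paper may differ.

```python
import math


def equivalence_probability(d, s, e1, e2):
    """Product of a distance term and a direction term (hypothetical sketch).

    d:      distance (angstroms) between the two alpha carbons after superposition
    s:      direction mismatch (angstroms), e.g. from direction_mismatch() below
    e1, e2: the E1 and E2 sigma values
    Assumes the form exp(-x**2 / (2 * sigma**2)); ovrlap.f may scale differently.
    """
    return math.exp(-d * d / (2.0 * e1 * e1)) * math.exp(-s * s / (2.0 * e2 * e2))


def direction_mismatch(ca_a, i, ca_b, j):
    """Hypothetical direction term: distance between the local chain vectors
    CA(i+1) - CA(i-1) and CA(j+1) - CA(j-1), both in the superimposed frame."""
    va = [ca_a[i + 1][k] - ca_a[i - 1][k] for k in range(3)]
    vb = [ca_b[j + 1][k] - ca_b[j - 1][k] for k in range(3)]
    return math.dist(va, vb)


# Two strands 1 angstrom apart, running in the same or in opposite directions.
trial = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
parallel = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
antiparallel = parallel[::-1]
p_same = equivalence_probability(1.0, direction_mismatch(trial, 1, parallel, 1), 5.0, 5.0)
p_opposite = equivalence_probability(1.0, direction_mismatch(trial, 1, antiparallel, 1), 5.0, 5.0)
```

The middle residues are equally close in both cases, but the score drops sharply when the chains run in opposite directions, which is the behaviour the E2 term is meant to capture.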

Implementation:

OVRLAP is a Fortran 77 program that will run on almost all machines with an f77 compiler, including PCs. Source code is available via the link at the end of this page.

Use:

There are a number of adjustable parameters in the code, but suitable defaults have been chosen and there is usually no need to change them. Comments in the source code explain the meaning of these parameters and suggest default values based on the author's experience.

The program reads its input from standard input, so it can be run interactively or with its input redirected from a text file.

Input data:

"Trial" coordinate file name (PDB format)

"Reference" coordinate file name (PDB format)

"Trial" is superimposed on "Reference". Only alpha carbons are stored.

Initial equivalences (pairs of alpha carbon positions).

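First give the number of initial equivalences (at least 3; the default is 5), then that many pairs of residue "numbers". Input is free format, with the two residue numbers of a pair separated by blanks or commas.
Chain identifiers are ignored, so if a file contains several chains you may need to edit the coordinates so that each residue number refers to only one residue.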

E values: E1, E2, E1min, E2min

(angstroms, useful range is about 0-8)

The E-values (the sigma values of the bivariate Gaussian probability distribution) control which equivalences the program defines. They are in angstroms and are related to the rms errors. E1 and E2 are the starting values; as the set of equivalences is refined, they are reduced toward the limits set by E1min and E2min. If you let the program run with the defaults, it will discard or reassign equivalent residues as it refines the superposition, giving you a tight but small set of equivalences. The process has converged when the set of equivalences no longer changes from one cycle to the next. To limit this process for distantly related proteins, set E1min and E2min to 2-3 angstroms; for example, the input 5,5,2,2 would be appropriate.
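The refinement can be pictured as an outer loop. The Python sketch below is hypothetical: the callback `assign` stands in for one cycle of superposition and reassignment in ovrlap.f, and the fixed `shrink` factor is an assumption, since the description does not say how E1 and E2 are lowered from cycle to cycle. Only the stopping rule (no change in the set of equivalences between cycles) and the floor at E1min and E2min come from the text.

```python
def refine_equivalences(assign, e1, e2, e1min, e2min, shrink=0.9, max_cycles=50):
    """Hypothetical outer loop: reassign equivalences with the current E values,
    stop when the set no longer changes, otherwise lower E1 and E2 toward the
    minima. `assign(e1, e2)` must return a comparable set of residue pairs."""
    previous = None
    for cycle in range(1, max_cycles + 1):
        current = assign(e1, e2)
        if current == previous:          # no change from one cycle to the next
            return current, e1, e2, cycle
        previous = current
        e1 = max(e1min, e1 * shrink)     # refine downward, never below E1min
        e2 = max(e2min, e2 * shrink)     # refine downward, never below E2min
    raise RuntimeError("equivalences did not converge")


# Toy stand-in for assign(): residue -> deviation (angstroms); keep residues closer than E1.
deviation = {1: 0.5, 2: 1.5, 3: 2.5, 4: 4.0, 5: 6.0}


def toy_assign(e1, e2):
    return frozenset(res for res, dev in deviation.items() if dev < e1)


result = refine_equivalences(toy_assign, 5.0, 5.0, 2.0, 2.0, shrink=0.5)
```

With the floor at 2 angstroms the toy run settles on the two residues whose deviation is below 2 angstroms, illustrating how a smaller E1min gives a tighter but smaller set.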

Output:

In addition to a small amount of terminal output, the program writes more detailed results to two files:

a listing file "ovrlap.lis"

At the end of the listing file is a set of "structural equivalences" together with the amino acid sequences. The trial structure is in the leftmost column. Nonequivalent loops are left unassigned. In general, smaller values of E1min and E2min increase the gap sizes and decrease the number of equivalenced residues.

a matrix file "rtn.dat"

If you like the results of an OVRLAP run, use the program EDPDB to read rtn.dat and apply it to your trial coordinate set to perform the superposition. The relevant EDPDB command is "rtn file", which applies the matrix and vector contained in rtn.dat to the currently selected "on" atoms. rtn.dat is written with the Fortran format FORMAT(4(3f10.5/)), i.e. four records of three real numbers, each in a 10-character field with 5 decimal places.
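If you want to check or reuse the matrix outside EDPDB, rtn.dat can be read with fixed-width parsing that mirrors FORMAT(4(3f10.5/)). This Python sketch assumes that records 1 to 3 are the rows of the rotation matrix and record 4 is the translation vector, applied as x' = R x + t. That layout is an assumption (the matrix could be stored transposed), so confirm it against the WRITE statement in ovrlap.f before relying on it. The function names are hypothetical.

```python
def parse_rtn(text):
    """Parse rtn.dat text written with FORMAT(4(3f10.5/)): four records of three
    10-character real fields. Assumes records 1-3 are the rows of the rotation
    matrix R and record 4 is the translation t, so that x' = R x + t."""
    records = text.splitlines()[:4]
    rows = [[float(rec[k:k + 10]) for k in (0, 10, 20)] for rec in records]
    if len(rows) != 4:
        raise ValueError("rtn.dat should contain four records")
    return rows[:3], rows[3]


def read_rtn(path="rtn.dat"):
    with open(path) as fh:
        return parse_rtn(fh.read())


def apply_rtn(rotation, translation, xyz):
    """Transform one coordinate triple: x' = R x + t."""
    return [sum(rotation[r][c] * xyz[c] for c in range(3)) + translation[r] for r in range(3)]


# Example: a 90-degree rotation about Z plus a shift, written as f10.5 fields.
sample = "".join(
    "".join("%10.5f" % v for v in row) + "\n"
    for row in ([0, -1, 0], [1, 0, 0], [0, 0, 1], [1.5, 2, 3])
)
R, t = parse_rtn(sample)
moved = apply_rtn(R, t, [1.0, 0.0, 0.0])
```

Slicing fixed 10-character columns, rather than splitting on whitespace, matches how Fortran reads the file even when adjacent fields touch.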

Finally, the program decomposes the transformation into a rotation about, and a translation along, a screw axis. It also supplies (in the listing file) a transformation that rotates the screw axis onto your line of sight (the Z axis) when viewing the superimposed structures. The rotation then lies in the plane of the drawing, which is the clearest way to show the transformation in a figure for publication. This approach is particularly useful for showing domain motions in enzymes.

In a typical domain motion comparison, one has two or more crystal structures, or several independent molecules in the asymmetric unit. To represent the domain motion, first choose the "fixed" part of the molecule(s) and superimpose those parts. Then calculate and apply an additional superposition for the variable domains. The screw axis decomposition of this second superposition, as described above, is the rigid-body motion that best represents the overall domain motion.
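The two-step procedure can also be expressed with transformations. If T_fixed superimposes the fixed parts and T_direct superimposes the variable domains of the original (unmoved) coordinates, then, with the same equivalences, the additional superposition is T_var = T_direct T_fixed^-1, because moving the trial coordinates rigidly changes the least-squares fit only by that same motion. The Python sketch below composes and inverts (R, t) pairs under the x' = R x + t convention; the names are hypothetical, and T_var is what you would pass to a screw-axis decomposition.

```python
def compose(first, second):
    """Single transform equal to applying `first` and then `second`.
    Each transform is a pair (R, t) acting as x' = R x + t."""
    (r1, t1), (r2, t2) = first, second
    r = [[sum(r2[i][k] * r1[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    t = [sum(r2[i][k] * t1[k] for k in range(3)) + t2[i] for i in range(3)]
    return r, t


def invert(transform):
    """Inverse of a rigid transform: (R^T, -R^T t)."""
    r, t = transform
    rt = [[r[j][i] for j in range(3)] for i in range(3)]
    return rt, [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]


def domain_motion(t_fixed, t_direct):
    """Hypothetical helper: T_var = T_direct * T_fixed^-1, the motion of the
    variable domain once the fixed parts are already superimposed."""
    return compose(invert(t_fixed), t_direct)


t_fixed = ([[0, -1, 0], [1, 0, 0], [0, 0, 1]], [1.0, 2.0, 3.0])
t_direct = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [0.0, 0.0, 5.0])
t_var = domain_motion(t_fixed, t_direct)
```

Applying T_fixed and then T_var reproduces T_direct, which is a quick consistency check on the convention.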

Example:

Run interactively (type "ovrlap")
or redirect input from a file (e.g. ovrlap < ovrlap.inp):

Contents of ovrlap.inp (5 initial equivalences):
kfpa.pdb
zfpa.pdb
5
10 10
20 20
30 30
100 100
150 150
5,5,2,2
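It can help to check an input file before running the program. The Python sketch below reads the layout shown in the example, assuming (as the example suggests) that the line after the two file names gives the number of pairs, followed by the pairs and the four E values. It is a hypothetical helper, not part of OVRLAP, and it keeps residue labels as strings because the description does not say whether non-numeric labels are accepted.

```python
import re


def parse_ovrlap_input(text):
    """Hypothetical checker for an OVRLAP input file: trial file, reference file,
    number of pairs, the pairs, then E1, E2, E1min, E2min. Fields after the
    count may be separated by blanks or commas."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    trial, reference, npairs = lines[0], lines[1], int(lines[2])
    if npairs < 3:
        raise ValueError("at least 3 initial equivalences are required")
    tokens = [tok for ln in lines[3:] for tok in re.split(r"[\s,]+", ln) if tok]
    if len(tokens) != 2 * npairs + 4:
        raise ValueError("expected %d residue labels and 4 E values" % (2 * npairs))
    # Residue "numbers" are kept as strings (trial label, reference label).
    pairs = list(zip(tokens[0:2 * npairs:2], tokens[1:2 * npairs:2]))
    e1, e2, e1min, e2min = (float(x) for x in tokens[2 * npairs:])
    return trial, reference, pairs, (e1, e2, e1min, e2min)


example = """kfpa.pdb
zfpa.pdb
5
10 10
20 20
30 30
100 100
150 150
5,5,2,2
"""
trial, reference, pairs, e_values = parse_ovrlap_input(example)
```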

If you are unsure how the starting superposition will affect the outcome, try several alternative sets of initial equivalences. If the proteins have even vaguely similar folds, any reasonable set of initial equivalences will usually give the same answer.

Source code

ovrlap.f