Superimposing protein structures, whether closely or distantly related, is not a simple problem. The OVRLAP algorithm is still one of the best available because it provides a 3-D superposition for inspection. OVRLAP is based on the algorithms developed by Michael Rossmann and Patrick Argos in the 1970s. This version was written by Bill Bennett. For details of the probability arguments, consult Rossmann & Argos, J. Biol. Chem. 250, 7525 (1975).
OVRLAP requires a minimal set of initial equivalences to get started and will find a superposition for any pair of proteins or domains. The algorithm is based on probability arguments: two residues are deemed equivalent if they are "close in space" (controlled by the E1 value) and the chain is "running in the same direction" (controlled by E2). The probability distribution is modelled as a Gaussian. Equivalences are redefined or discarded as the algorithm progresses, and the procedure is extremely robust. Alpha carbon coordinates are used in this version.
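The equivalence test described above can be illustrated with a minimal Python sketch. This is not OVRLAP's code: the function name `equivalence_score` and, in particular, the form of the direction term (the difference between local chain vectors from CA i-1 to CA i+1) are assumptions made for illustration. The exact expressions are given in the Rossmann & Argos reference and in the source code.

```python
import math


def sub(a, b):
    """Component-wise difference of two 3-D points."""
    return tuple(x - y for x, y in zip(a, b))


def norm(v):
    """Euclidean length of a 3-D vector."""
    return math.sqrt(sum(x * x for x in v))


def equivalence_score(trial, ref, i, j, e1, e2):
    """Gaussian-style score that trial residue i matches reference residue j.

    trial, ref: lists of alpha carbon coordinates (angstroms), already in a
    common frame. e1 weights "close in space", e2 weights "running in the
    same direction". The direction term below is an assumed stand-in.
    """
    d = norm(sub(trial[i], ref[j]))                 # distance between CA atoms
    dir_trial = sub(trial[i + 1], trial[i - 1])     # local chain direction
    dir_ref = sub(ref[j + 1], ref[j - 1])
    s = norm(sub(dir_trial, dir_ref))               # direction mismatch
    return math.exp(-(d / e1) ** 2 - (s / e2) ** 2)
```

With this form, a pair of residues that coincides exactly and runs in the same direction scores 1, and the score falls off as either the distance or the direction mismatch grows relative to E1 or E2.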
OVRLAP is a Fortran 77 program that will run on almost all machines with an f77 compiler, including PCs. Source code is available via the link at the end of this page.
There are a number of variable parameters in the code, but suitable defaults have been chosen and there is usually no need to change these. Some comments in the source code discuss the meaning of the variable parameters and suggest default values based on the experience of the author.
The program is run interactively with input from "standard input" but will also take input from a text file.
The E-values (sigma values for the bivariate Gaussian probability distribution) control the equivalences defined by the program. They are in angstroms and are related to the rms errors. E1 and E2 are the initial values, and these are refined downward toward the limits specified by E1min and E2min as the set of equivalences is refined. If you let the program run with the defaults, it will discard or reassign equivalent residues as it refines the superposition, giving you a tight but small set of equivalences. The process converges when the equivalences no longer change from one cycle to the next. To set a limit to this process for distantly related proteins, choose E1min and E2min to be 2-3 angstroms; for example, an input of 5,5,2,2 (initial values of 5 and limits of 2) would be appropriate.
In addition to a small amount of terminal output, the program produces more details in the form of two files: a listing file and rtn.dat, which holds the matrix and vector of the superposition.
At the end of the listing file, a set of "structural equivalences" is given
along with the amino acid sequence information. The trial structure is in the
extreme left hand column. Nonequivalent loops are left unassigned. In general,
smaller values for E1min and E2min will increase the gap sizes and decrease the
number of equivalenced residues.
If you like the results of an OVRLAP run, use the program EDPDB
to read rtn.dat and apply it to your trial coordinate set to perform the
superposition. The relevant EDPDB command is "rtn file", which will apply
the matrix and vector contained in rtn.dat to the currently selected "on"
atoms. The Fortran statement describing the layout of rtn.dat is FORMAT(4(3f10.5/)), that is, four records of three 10-column real numbers each.
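
If EDPDB is not at hand, the rtn.dat layout is simple enough to read directly. The Python sketch below parses four records of three 10-column fields, as the FORMAT statement implies. It assumes that the first three records are the rows of the rotation matrix, that the fourth is the translation vector, and that the transformation is applied as x' = M x + v; check these assumptions against the source code before relying on them. The function names are hypothetical.

```python
def parse_rtn(text):
    """Parse rtn.dat text written with FORMAT(4(3f10.5/)).

    Returns (matrix, vector), ASSUMING records 1-3 are matrix rows and
    record 4 is the translation vector.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines left by the trailing "/" in the format
        # Fixed-width fields: three values, 10 columns each.
        rows.append([float(line[k:k + 10]) for k in (0, 10, 20)])
        if len(rows) == 4:
            break
    if len(rows) != 4:
        raise ValueError("expected 4 records of 3 values in rtn.dat")
    return rows[:3], rows[3]


def apply_rtn(matrix, vector, xyz):
    """Apply x' = M x + v to one coordinate triple (assumed convention)."""
    return tuple(
        sum(matrix[r][c] * xyz[c] for c in range(3)) + vector[r]
        for r in range(3)
    )
```

A typical use would be `matrix, vector = parse_rtn(open("rtn.dat").read())`, followed by `apply_rtn` on each atom of the trial coordinate set.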
Finally, the program decomposes the transformation into a rotation about and
translation along a screw axis. It further supplies (in the list file) a
transformation that rotates the screw axis to place this along your line of
sight (the Z axis) when viewing the superimposed structures. This places the rotation
component in the plane of a drawing so that you can optimally represent the
transformation for a figure in a publication. This approach is
useful for representing domain motions in enzymes.
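
For readers who want to see what a screw axis decomposition involves, here is a minimal Python sketch using standard rigid-body geometry. It is not taken from the OVRLAP source: the function name is hypothetical, the input is assumed to be a proper rotation matrix and translation vector applied as x' = R x + t, and the sketch does not handle rotations of nearly 0 or 180 degrees.

```python
import math


def screw_decomposition(rot, trans):
    """Return (angle in degrees, unit axis, shift along the axis).

    rot: 3x3 rotation matrix as nested lists; trans: translation vector.
    """
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    # Rotation angle from the trace, clamped against rounding error.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    angle = math.acos(cos_angle)
    # The axis direction comes from the antisymmetric part of the matrix.
    raw = (
        rot[2][1] - rot[1][2],
        rot[0][2] - rot[2][0],
        rot[1][0] - rot[0][1],
    )
    length = math.sqrt(sum(a * a for a in raw))
    if length < 1e-8:
        raise ValueError("axis is ill-defined for this sketch (angle near 0 or 180)")
    axis = tuple(a / length for a in raw)
    # The screw translation is the component of trans along the axis.
    shift = sum(t * a for t, a in zip(trans, axis))
    return math.degrees(angle), axis, shift
```

The component of the translation perpendicular to the axis only fixes where the axis lies in space; it does not contribute to the shift along the axis.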
In a typical case of domain motion comparison, one has two or more crystal structures or independent molecules in the asymmetric unit. To represent the domain motion, one first needs to choose the "fixed" part of the molecule(s) and superimpose these. Then, an additional superposition for the variable domains is calculated and performed. The screw axis decomposition described above is the rigid body motion that best represents the overall domain motion.
Run interactively (type "ovrlap"), or redirect the input from a text file such as ovrlap.inp; the example at the end of this page uses 5 initial equivalences.
You might try several alternatives for starting superpositions if you
aren't sure about how these will affect the outcome. If the proteins
have even vaguely similar folds, any reasonable set of initial equivalences will
usually give the same answer.
Input data:
CHAIN IDENTIFIERS are ignored, so you may need to edit coordinates to restrict the possibilities.
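One way to restrict the coordinates is to write out a file containing only the chain of interest before running OVRLAP. The Python sketch below keeps the ATOM records of a single chain, relying on the standard PDB column layout (chain identifier in column 22). The function name and the file names in the usage note are hypothetical.

```python
def select_chain(pdb_lines, chain_id):
    """Return only the ATOM records belonging to one chain.

    Uses the fixed PDB columns: record name in columns 1-6 and the
    chain identifier in column 22 (index 21 in a Python string).
    """
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) > 21 and line[21] == chain_id:
            kept.append(line)
    return kept
```

For example, reading the lines of a multi-chain file, passing them through `select_chain(lines, "A")`, and writing the result to a new file gives an input set in which the residue numbers of chain A are unambiguous.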
Output:
Example input file ovrlap.inp, for use with redirected standard input (e.g. ovrlap < ovrlap.inp):
kfpa.pdb
zfpa.pdb
5
10 10
20 20
30 30
100 100
150 150
5,5,2,2
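
When trying several alternative starting superpositions, it can be convenient to generate input files like the one above with a short script. The Python sketch below simply reproduces the line order of the example: two coordinate file names, the number of initial equivalences, one pair of residue numbers per line, and the E-values. The function and argument names are hypothetical, and the sketch makes no claim about which file is the trial structure or which column of each pair refers to which file.

```python
def build_ovrlap_input(first_pdb, second_pdb, pairs, e_values):
    """Build the text of an input file in the layout of the example above.

    pairs: list of (residue number, residue number) initial equivalences.
    e_values: four numbers, written comma-separated on the last line.
    """
    lines = [first_pdb, second_pdb, str(len(pairs))]
    lines.extend("%d %d" % (a, b) for a, b in pairs)
    lines.append(",".join("%g" % v for v in e_values))
    return "\n".join(lines) + "\n"
```

Writing the returned text to ovrlap.inp and running `ovrlap < ovrlap.inp` then corresponds to the redirected run shown above.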
Source code
ovrlap.f