WHAT IF allows you to do superpositionings of structures or parts of
structures on top of each other. All these operations are performed from
the SUPPOS menu. The general flow of events is as follows. You first
define a set of atoms and/or residues on top of which you want to superposition
a second set of atoms and/or residues. Then you give this second set. The
command DOSUP then calculates the best rotation matrix and translation vector
that, when applied to the second set, puts it on top of the first set.
You can optionally store this matrix and vector in the matrix and vector
database. The command APPLY will then prompt you for a range of residues (or
drugs, or,...) and will apply the presently active transformation (that is the
last calculated or read matrix/vector) to this range. Several options are
available for automatic superpositionings, for statistics and for comparison
of structures.
The 3SSP menu holds all options that are related to automatically comparing
one structure with all structures in the database.
The format for the storage of transformation files is as follows:
One line A80 with the title.
One line A80 with the date and time.
These first two lines are of no importance, and they may contain anything you
want (they may even be empty, but they should be there).
Three lines 3F10.6,10X,F10.4 for the transformation matrix and vector, in the
following order:
R11 R12 R13 T1
R21 R22 R23 T2
R31 R32 R33 T3
An example:
test1
0.999999 0.000995 0.001096 0.0184
-0.000997 0.999998 0.001606 -0.0174
-0.001094 -0.001607 0.999998 0.0125
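The file layout described above can be sketched in Python as follows. This is only an illustration, not WHAT IF code: the function names are made up, and it assumes that the Fortran descriptors 3F10.6,10X,F10.4 are taken literally (three fields of 10 columns, 10 blank columns, one field of 10 columns), so the example above would have wider spacing in a real file.

```python
def format_transformation(title, date, rotation, translation):
    """Return the text of a five-line transformation file (hypothetical helper)."""
    lines = [title[:80], date[:80]]            # two A80 lines: title, date/time
    for row, t in zip(rotation, translation):
        matrix_part = "".join(f"{r:10.6f}" for r in row)   # 3F10.6
        lines.append(matrix_part + " " * 10 + f"{t:10.4f}")  # 10X, F10.4
    return "\n".join(lines) + "\n"


def parse_transformation(text):
    """Read the five-line layout back; returns (title, date, rotation, translation)."""
    lines = text.splitlines()
    title, date = lines[0], lines[1]           # content is free, but both must exist
    rotation, translation = [], []
    for line in lines[2:5]:
        line = line.ljust(50)                  # guard against stripped trailing blanks
        rotation.append([float(line[i:i + 10]) for i in (0, 10, 20)])
        translation.append(float(line[40:50]))
    return title, date, rotation, translation
```

Reading by fixed columns rather than by splitting on blanks follows the format statement more closely, because Fortran fields may touch each other when a value fills its full width.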
WHAT IF stores these matrices as files on disk. Only one matrix can be in
memory at any moment. WHAT IF only remembers which matrices exist.
Matrices are stored in files called SUPMAT.MAT** where ** is 1,2,3,4,...
PUTMAT will add the presently active transformation to the superposition
transformation database. In case the space reserved in the database is
full, a warning will be issued.
WHAT IF keeps only one transformation in memory. This is called the active
transformation. WHAT IF can however keep track of many (presently 25)
transformations. It only keeps the names of files (format: 1 line A80 for
the title; 1 line A80 with the date and time of creation; 3 lines
3F10.6,10X,F10.4 for the matrix and vector) in which the superposition
transformations are
stored. GETMAT copies the transformation from a file to memory. You will be
prompted for the number of the file. Use LSTMAT to get a directory of
available transformations. Use GETOLD to read a transformation file not
created by WHAT IF.
LSTMAT will cause WHAT IF to show you the names (=title lines) and the numbers
of the transformations presently available in the database. You only need
to give the number of a transformation if you are prompted for it. The title
is just there to help you tell the matrices apart.
DELMAT will cause WHAT IF to prompt you for the number of a transformation.
This transformation will then be removed from the database. The file
corresponding with it will stay in your directory, and can later still be
recovered with the GETOLD command. You can get a directory
of transformations in the database with the LSTMAT command.
The command DIRMAT will cause WHAT IF to go over all transformations in the
transformation database, and list their names (=title lines), their creation
date and time, and the actual transformation matrices and vectors.
With the GETOLD command you can read an external transformation file. This can
either be a file created by another program or by hand, or a
transformation left over from another WHAT IF session which did not get
stored. It can of course also be a transformation that was earlier deleted.
See above for the required file format.
SHOMAT will cause WHAT IF to show you the presently active transformation
matrix and vector.
WHAT IF creates transformation matrices that superpose one set of atoms
on top of another set. The set that stays on the same place is called set 1.
The command RANGE1 allows you to (re-)create set 1. You will be prompted
for residue ranges until you give 0 as input. You can only give a range
of residues that is in the same molecule. If you need ranges spanning
multiple molecules (eg 12 aa and a drug), you need to give 1 or more
zones for the protein, and one per drug. DNA/RNA, water and single atomic
drugs are not allowed in set 1. Every time you execute a SUPPOS command
that does more than range setting the previous set 1
and set 2 are thrown away. In case you enter amino acids, only the alpha
carbons are used. In case you enter a drug, all atoms are used.
In case of mixed sets you should keep this in mind, because
every atom bears the same weight in the lsq superpositioning algorithm.
Overlapping ranges are not allowed in set 1. Ranges that lie head to tail
in the molecule are automatically converted to one single range.
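The bookkeeping for set 1 ranges could look roughly like the Python sketch below. It is not taken from WHAT IF: the function name is hypothetical, and it assumes that residues are identified by consecutive integer numbers within one molecule.

```python
def add_range(ranges, first, last):
    """Add an inclusive residue range to a list of (first, last) tuples.

    Overlapping ranges are rejected; ranges that lie head to tail
    (e.g. 1-10 and 11-20) are fused into one single range.
    """
    if first > last:
        first, last = last, first
    for lo, hi in ranges:
        if first <= hi and lo <= last:
            raise ValueError(f"range {first}-{last} overlaps {lo}-{hi}")
    merged = sorted(ranges + [(first, last)])
    result = [merged[0]]
    for lo, hi in merged[1:]:
        prev_lo, prev_hi = result[-1]
        if lo == prev_hi + 1:              # head to tail: fuse
            result[-1] = (prev_lo, hi)
        else:
            result.append((lo, hi))
    return result
```

For example, adding 11-20 to a set that already holds 1-10 gives the single range 1-20, while adding 15-20 keeps two separate ranges.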
After you created set 1 with the RANGE1 command, you can create set 2 with
the RANGE2 command. For every zone of contiguous alpha carbons found in set 1
you will be prompted for a zone of amino acids that should be put in set 2.
The superpositioning will then try to put the corresponding alpha carbons
on top of each other. If the zone you enter for set 2 is too short, the tail
of the corresponding zone in set 1 is removed. If you enter too many residues
the tail of this zone in set 2 is removed. For every drug in set 1 you
will be prompted for a drug in set 2. If the drug in either of the two zones
has too many atoms, it will be truncated. The atom order in both drugs
should be identical!
WARNING!
Due to a small design flaw, it is not possible to have two stretches in either
of the two ranges that are neighbors in the soup. Eg. put 1-10 in range 1
on top of 2-11 in range 2 can not be combined with 11-20 in range 1 on
top of 27-36 in range 2 because 1-10 and 11-20 in range 1 are neighbors.
The command PICKIN will cause WHAT IF to ask you for the number of atom
pairs you want to pick. As soon as you have given the number control is passed
to the PS300 screen and data tablet, and in the message area on the top right
of the screen the message 'pick atom one' appears. Now you can pick the first
atom. When you pick one, the keyboard bell will ring, and the text at the
screen changes to 'pick atom two'. After picking the second atom the bell
rings again, and 'pick atom one' appears again. This is now atom one of
the second pair. This keeps going on till all the requested pairs are picked.
Be aware that you should always pick in the same order. That means always
atom one in molecule (or group) one, and atom two in molecule (or group) two.
SHORNG will show the two sets created (see RANGE1 and RANGE2). For every
atom the residue number in each set, and the coordinates will
be shown.
The command SAVRNG will cause WHAT IF to prompt you for a file name. It will
then write the presently available data in the two sets (see RANGE1 and RANGE2)
in this file. This command is especially handy since WHAT IF only
remembers transformations, and not sets of atoms needed to create these
transformations.
The command RESRNG will cause WHAT IF to prompt you for a file name. If you
give the name of a file previously written with the SAVRNG command, the two
sets will be read back in.
The command DOSUP will calculate the best matrix and vector to put set 2
on top of set 1. Be aware that the transformation is only calculated, but NOT
performed. For that you should use the APPLY command. The transformation
calculated with DOSUP will overwrite the existing active transformation
matrix. Transformations can be saved with the PUTMAT command.
The command APPLY will cause WHAT IF to prompt you for a range of residues.
Here you can enter anything present in the soup. The transformation presently
in memory will then be applied to this range (of protein, DNA/RNA, drugs and
water if wanted). That means that you can use one set of atoms/residues
to determine a superposition transformation matrix, but apply this matrix
to another set if wanted.
If you made a mistake with the range in the APPLY command, you can use the
UNDO command to restore the old situation. This command does the same as
the APPLY command, but it only uses the inverse of the presently
active transformation. If you screw up here again, you can undo the UNDO with
APPLY again etc., but that makes the whole soup into a big mess after
a few times. UNDO is of course a very nice option to misuse for all kinds
of other purposes....
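What APPLY and UNDO do to coordinates can be sketched in Python as below. The function names are hypothetical; the sketch assumes the usual convention that the rotation matrix is applied first and the translation vector is added afterwards, and that the matrix is a proper rotation, so that its inverse is its transpose.

```python
def apply_transformation(coords, rotation, translation):
    """APPLY-like step: x' = R x + T for every (x, y, z) triple."""
    return [
        tuple(
            sum(rotation[i][j] * x[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for x in coords
    ]


def undo_transformation(coords, rotation, translation):
    """UNDO-like step: x = R^T (x' - T), the inverse of the step above."""
    return [
        tuple(
            sum(rotation[j][i] * (x[j] - translation[j]) for j in range(3))
            for i in range(3)
        )
        for x in coords
    ]
```

Repeated APPLY and UNDO cycles on real coordinates accumulate rounding errors, which is one more reason not to alternate them too often.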
The command CADIFF will cause WHAT IF to prompt you for two ranges and
a cutoff distance. It will then list all alpha carbons that are in the one
range, and are close to an alpha carbon in the other range.
The command CENDIF does the same as the command CADIFF (see above) with the
difference that CENDIF uses centers of residues rather than alpha carbons.
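The kind of listing that CADIFF and CENDIF produce can be approximated with the Python sketch below. The names are hypothetical, and it assumes that the center of a residue is the unweighted mean of its atom coordinates; WHAT IF may define the center differently.

```python
import math


def residue_center(atoms):
    """Unweighted mean position of a residue's atoms (an assumption)."""
    n = len(atoms)
    return tuple(sum(a[k] for a in atoms) / n for k in range(3))


def close_pairs(points1, points2, cutoff):
    """List (index1, index2, distance) for all inter-range pairs within cutoff."""
    pairs = []
    for i, p in enumerate(points1):
        for j, q in enumerate(points2):
            d = math.dist(p, q)
            if d <= cutoff:
                pairs.append((i, j, d))
    return pairs
```

Passing alpha carbon positions gives CADIFF-like output; passing the results of residue_center gives CENDIF-like output.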
The option SUPSTS does two things. First, it creates a set of lines
between the two sets that were created with RANGE1 and RANGE2
(regardless of whether APPLY has been executed or not).
These lines connect the atoms that were paired in the superpositioning.
Second, it gives the RMS displacement between the two sets.
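The RMS displacement between two paired sets is the square root of the mean squared distance between corresponding atoms. A minimal Python sketch (the function name is hypothetical, and the two lists are assumed to be in pairing order):

```python
import math


def rms_displacement(set1, set2):
    """RMS distance between corresponding atoms of two equally long sets."""
    if len(set1) != len(set2) or not set1:
        raise ValueError("sets must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(set1, set2))
    return math.sqrt(total / len(set1))
```

Every atom pair carries the same weight here, in line with the remark about mixed sets of alpha carbons and drug atoms.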
The command COMPAR should be used after DOSUP and APPLY, and also needs
the accessibilities to be known before DOSUP and APPLY were done.
It will compare the two ranges but skip side chains of residues with
too high accessibility because once something sticks into solution,
it is close to irrelevant where it goes (at least in bio-computational
terms).
The option ANAFIT requires the two ranges to be set. It will then do
the superpositioning of the second range on the first one, but rather than
just doing it the fast way, it will do it in a way that the human being
will understand the transformation. Normally transformations consist
of a rotation matrix which should be applied first, and a translation vector
which should be added afterwards. ANAFIT will do this the other way around.
It will first calculate the translation vector, and then the rotation matrix.
The rotation matrix will be decomposed in three independent rotations
around the X-axis, Y-axis, and Z-axis which together give the same
result as the rotation matrix. Don't try to use APPLY or UNDO after this
option, because the result will be meaningless at best.
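Decomposing a rotation matrix into three axis rotations can be sketched in Python as below. This is an illustration only: it assumes the rotations are applied in the order X, then Y, then Z (so R = Rz·Ry·Rx), which may differ from the order and sign conventions ANAFIT actually uses, and all names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def compose_xyz(ax, ay, az):
    """Rotation about X first, then Y, then Z: R = Rz * Ry * Rx."""
    return matmul(rot_z(az), matmul(rot_y(ay), rot_x(ax)))


def decompose_xyz(r):
    """Recover (ax, ay, az) in radians from R = Rz * Ry * Rx.

    Only well defined when the Y rotation is not +/- 90 degrees.
    """
    ay = -math.asin(max(-1.0, min(1.0, r[2][0])))
    ax = math.atan2(r[2][1], r[2][2])
    az = math.atan2(r[1][0], r[0][0])
    return ax, ay, az
```

When the Y rotation approaches 90 degrees the X and Z rotations can no longer be separated, so angles reported close to that situation should be read with care.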
The command EQUAL will cause WHAT IF to prompt you for two molecules. It will
check if these are two copies of the same molecule. If that is the case, all
kinds of statistics will be shown. Also the two molecules will be put at
the screen, the first one green, the second one ranging green to red
from perfect to worst superpositioning result. Lines will be drawn
between all pairs of identical atoms. This MOL-item provides a nice,
fast way of visualizing the largest differences. In case you do not have
twice the same molecules, you can use the EQUALF option or one of the
COMPAR-like options.
The command EQUALF will cause WHAT IF to prompt you for two molecules and a file.
This file should hold the unique identifiers of residues in the two molecules
given. All alpha carbons of pairs of residues given in the file will be
connected by lines (provided that they reside in the molecules).
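The format of the file is one line per pair, format A4,1X,A4. That means that
the unique identifier of the residue in the first molecule occupies columns
1-4, followed by one blank and the unique identifier of the residue in the
second molecule in columns 6-9.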
The main disadvantage of aligning protein sequences is that nothing is done
with secondary and tertiary structure knowledge. The option MOTIVS will
overcome this problem. MOTIVS is a rather time consuming option. It will
make a diagonal plot of 3-D superposition results. Depending on the size of
the proteins and the parameters you set, this can take from 30 seconds to
ten minutes CPU time on a micro VAX.
If you do not have a log file open upon entering this option you will
get a warning, and the possibility to jump out is offered, so you can open
a log file and start again.
MOTIVS will prompt you for two ranges. It will then try to do a 3-D
superposition of every stretch of every length in the first range on
every stretch of the same length in the other protein. (For the mathematicians,
this means that with N amino acids in each of the two proteins there
will be N**3 superpositionings tried with an average length of N/2 amino
acids). Thanks to some nifty little tricks, two proteins of over 400 amino
acids each can be tested in roughly 5 minutes CPU on a micro-VAX.
This part of the program is not written very nicely, because clarity
often had to yield to speed.
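To get a feeling for the amount of work, the number of stretch pairs can be counted with the small Python sketch below. It is a brute-force count under the assumption that every stretch of every length is really tried (the function name is hypothetical); WHAT IF's own shortcuts are not modelled.

```python
def count_stretch_pairs(n1, n2, min_len=1):
    """Number of (stretch in range 1, stretch in range 2) pairs of equal length."""
    total = 0
    for length in range(min_len, min(n1, n2) + 1):
        # a stretch of this length can start at (n - length + 1) positions
        total += (n1 - length + 1) * (n2 - length + 1)
    return total
```

For two ranges of N residues each, this sum equals N(N+1)(2N+1)/6, which indeed grows with the third power of N; for N=400 it is about 21 million pairs.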
You will also be asked if you want to skip helixes or not. This is added
because every helix always fits every other helix perfectly. If you ask for
helixes to be skipped, then every time a stretch is found in range 1
that has a matching stretch in range 2, but has fewer than 5 non-helical
residues (this number can be changed with the parameter setting module), it
will be skipped.
Now you will see the stretches of the two given ranges that superimpose
well enough according to the parameters you set (see chapter on parameter
setting). For each pair you will see their location in the ranges, their
aligned sequences, if available their secondary structure, their length,
and the RMS error and the maximal error on the superimposed alpha carbons.
WARNING. THE DIAGONAL PLOT IS TEMPORARILY REMOVED FROM THE CODE.
But when it gets put back the following applies again.
At the end you will be asked if you want a diagonal plot. Just try it, it
does not take much time, and only one page of output, but it is very
illustrative. The way you read such a plot is identical to the well known
sequence alignment diagonal plots, but now things are done in 3-D.
Program control is now passed to the PS300. The screen
menu will change, and the following screen menu options become available:
WAIT does the same as always.
NOID removes the labels placed next to the squares. Old atom labels that
are accidentally left over from before entering the graphics menu can not
be removed with NOID from the graphics menu. You should go to the GRAFIC
menu first, give there GO, and use the NOID from there before entering
MOTIVS.
INIT as usual removes all mol-items. First picking NO and thereafter INIT
will as usual remove all WHAT IF generated mol-items (MOL0).
NEIM will ask you to pick a square. It will then draw the local environment
of the residue picked. This environment is not pickable.
First picking NO and then NEIM is the only way of removing this local
structure.
The residues will be translated from their normal soup positions to such
a position that they roughly are 'near' the MOTIVS square. This way you
do not have to search through the whole PS300 graphics space.
YES as such does nothing. Only as answer to questions might YES be needed.
NO as such does nothing. NO only has a function in conjunction with
NEIM or INIT, or as answer to a
question.
CONT will ask you to pick a square. This square will now become the center
of the graphics.
STER works the same as everywhere else.
As usual, CHAT passes control back from the PS300 to the VAX.
When you pass control back to the VAX resident part of WHAT IF (with CHAT),
a clustering algorithm will be started. WHAT IF will try to find the largest
cluster of superimposed stretches that can together be used to superimpose
the two ranges within the limits that you are prompted for at this moment.
When you are pleased with the clustering, WHAT IF starts an iterative set
of superimposition operations. In each round it will use the present set
of amino acid pairs that it thinks have to fall on top of each other
(in the first round the present set is the sum of all clusters), and
use those to do a superpositioning on the whole ranges given. Then it will
make a new set of amino acid pairs. This time all pairs that after applying
the transformation fall within the limits given will be put in the set.
Normally this process converges in 3 to 6 rounds. At the end of this
iterative process you will be prompted for a mol-object number and a
mol-item name. The two ranges will be put at the screen superimposed
(only alpha carbons will be shown). The
coloring scheme used is as follows: The one range is green, the other
range is red. The dark red and dark green alpha carbons were used in
obtaining the superpositioning. The less saturated red and green alpha
carbons were not used.
After this display part, WHAT IF will (upon request) show you the sequence
alignment that resulted from the 3D-alignment. This will be done both
at the terminal, and at the PS300.
Since two alpha carbons in the one range often both have the same alpha
carbon in the other range as their nearest neighbor, WHAT IF tries some AI
to clean this up.
I am sorry for the complexity of this option, but just try it once for
the two domains of rhodanese as an example, and use the defaults everywhere.
You will see that it is simpler than expected.
There are essentially two different ways of running this option:
1) Contact analysis mode
2) sequence alignment mode (default)
The parameter LENHOM (see PARAMS) selects between these two modes. LENHOM=0
gives you 'contact analysis mode'. This means that after the initial
clustering all residues that actually fall on top of each other will be
aligned and used for further superpositioning refinement. As a consequence,
an accidental close spatial proximity of two residues in the 3-D alignment
(e.g. a beta strand cutting through a helix) will also be used for further
superpositioning improvement.
If you do not like this (deliberate) feature, you can use 'sequence alignment
mode'. For that you have to set the LENHOM parameter to 4 or larger. This
parameter takes care that at least a couple of residues in a row fall on
top of each other. The number of residues that should match is given by
the value of the LENHOM parameter.
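One possible reading of the LENHOM rule is sketched below in Python. It is an interpretation, not WHAT IF code: matched residues are assumed to be given as (position in range 1, position in range 2) pairs, a run is assumed to mean that both positions increase by one from pair to pair, and the function name is hypothetical.

```python
def filter_runs(pairs, lenhom):
    """Keep only matched pairs that lie in a run of at least lenhom pairs."""
    pair_set = set(pairs)
    kept = []
    for i, j in sorted(pair_set):
        if (i - 1, j - 1) in pair_set:
            continue                      # not the first pair of its run
        run = [(i, j)]
        while (run[-1][0] + 1, run[-1][1] + 1) in pair_set:
            run.append((run[-1][0] + 1, run[-1][1] + 1))
        if len(run) >= lenhom:
            kept.extend(run)
    return sorted(kept)
```

With lenhom set to 0 every pair survives, which corresponds to the contact analysis mode; with lenhom set to 4 isolated contacts and short runs are dropped.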
The command PARAMS, as usual, brings you in the menu from which the parameters
for SUPPOS operations can be changed.
The parameter MINLEN determines the minimal length of fragments to be
considered in the initial fragment search in the MOTIVS option. It is advised
to use no fragments shorter than 9 because of CPU and internal variable
overflow problems. For very homologous proteins, larger values, e.g. 25-35
are advised.
The parameter MAXERR determines the maximal superposition error allowed for
two alpha carbons in order to be equivalenced in the MOTIVS option.
Suggested values are 0.5 - 8.0 Angstrom. Halfway through the execution of
MOTIVS, the user is prompted for the maximal and RMS error in the final
superpositioning. MAXERR and RMSERR are then the defaults.
The parameter RMSERR determines the maximal allowed RMS misplacement of
equivalenced alpha carbons in the initial fragment matching procedure of the
MOTIVS option. Suggested values are 0.5 - 3.8 Angstrom.
Halfway through the execution of MOTIVS, the user is prompted for the
maximal and RMS error in the final superpositioning. MAXERR and RMSERR are
then the defaults.
The LENHOM parameter determines how many residues should at least be
equivalenced in the final superposition in the MOTIVS option. The suggested
value is minimally 4.
Every helix always fits perfectly many times on every other helix. To avoid
finding billions of helix-helix matches in the fragment search part of the
MOTIVS option, you can tell MOTIVS to skip helixes. It will not skip them
entirely, but only accept helical fragments if more than NONHEL residues
are non-helical in at least one of the fragments. The secondary structure
is determined by DSSP, and thus this option is useless if you try to
superpose alpha-carbon-only molecules.
The option EQUAL can be used to compare different copies of the same molecule.
EQUAL will do some comparisons, and draw lines between equivalenced atoms.
In case you want to compare unequal molecules, you can set the EQUMOD flag to 1.
INISUP will cause WHAT IF to (re-)initialize all arrays and other information
relevant to SUPPOS options. Files on disk will remain intact. This command
is always automatically executed when the SUPPOS menu is entered or left.
There should actually be no need to ever use this option.
The command COLDIF will use the presently set RANGE1 and RANGE2, and colors
the second range from blue to red as function of the misfit. This option
should be used immediately after the APPLY option or strange coloring
schemes will result.
The command APLITM will cause WHAT IF to prompt you for the name of
a mol-item. It will then apply the presently active transformation matrix
to this mol-item.
The command CABOX will cause WHAT IF to prompt you for two ranges
and an alpha carbon distance cutoff. It will then create a diagonal plot
in which you will see one little square for every inter range alpha carbon
pair that has a distance less than the cutoff. If you use the same
ranges as for the MOTIVS option, you can see which matching fragments
are actually used after the clustering. There are many other ways that
you can mis-use this option.
The command SFUDGE will cause WHAT IF to prompt you for two molecules. These
two molecules should be identical (that means covalently identical, their
coordinates are allowed to be different). You will also be prompted for
a cutoff limit. All equivalent atoms in the two molecules that are
closer to each other than the cutoff limit will get their coordinates
pairwise averaged. This is a good option to emphasize differences between
molecules. The similar parts will get identical, but the larger differences
remain. This option of course makes chemical nonsense out of your molecules.
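The pairwise averaging can be sketched in Python as below. The function name is hypothetical, and the sketch assumes that the two coordinate lists are already in equivalent atom order, as is the case for two covalently identical molecules.

```python
import math


def fudge(coords1, coords2, cutoff):
    """Average equivalent atom pairs that are closer together than cutoff.

    Returns two new coordinate lists; pairs at or beyond the cutoff are
    left where they were.
    """
    out1, out2 = [], []
    for a, b in zip(coords1, coords2):
        if math.dist(a, b) < cutoff:
            mid = tuple((x + y) / 2 for x, y in zip(a, b))
            out1.append(mid)
            out2.append(mid)
        else:
            out1.append(tuple(a))
            out2.append(tuple(b))
    return out1, out2
```

After this step the similar atoms coincide exactly, so only the pairs that differ by more than the cutoff remain visible as differences.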
The command SFUDG2 will cause WHAT IF to prompt you for two molecules. These
two molecules do not need to be identical (neither covalently identical, nor
have the same coordinates). You will also be prompted for
a cutoff limit. All atoms in the two molecules that are
closer to each other than the cutoff limit will get their coordinates
pairwise averaged, whether they are supposed to be equivalenced or not.
Be aware that with a large cutoff limit this option will become unstable, and
will produce strange results.
This is a good option to emphasize differences between
molecules. The similar parts will get identical, but the larger differences
remain. This option of course makes chemical nonsense out of your molecules.