Superimposing structures (SUPPOS)

Introduction.

WHAT IF allows you to do superpositionings of structures or parts of structures on top of each other. All these operations are performed from the SUPPOS menu. The general flow of events is as follows. You first define a set of atoms and/or residues on top of which you want to superposition a second set of atoms and/or residues. Then you give this second set. The command DOSUP then calculates the best rotation matrix and translation vector that, when applied to the second set, puts it on top of the first set. You can optionally store this matrix and vector in the matrix and vector database. The command APPLY will then prompt you for a range of residues (or drugs, or,...) and will apply the presently active transformation (that is the last calculated or read matrix/vector) to this range. Several options are available for automatic superpositionings, for statistics and for comparison of structures.

The 3SSP menu holds all options related to automatically comparing one structure with all structures in the database.

Options to do matrix administration

Transformation file format

The format for the storage of transformation files is as follows:

One line A80 with the title.

One line A80 with the date and time.

These first two lines are of no importance, and they may contain anything you want (they may even be empty, but they should be there).

Three lines 3F10.6,10X,F10.4 for the transformation matrix and vector. In the following order:


R11  R12  R13       T1
R21  R22  R23       T2
R31  R32  R33       T3

an example:

test1                                                                           

  0.999999  0.000995  0.001096              0.0184
 -0.000997  0.999998  0.001606             -0.0174
 -0.001094 -0.001607  0.999998              0.0125

WHAT IF stores these matrices as files on disk. Only one matrix can be in memory at any moment. WHAT IF only remembers which matrices exist. Matrices are stored in files called SUPMAT.MAT** where ** is 1,2,3,4,...
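The fixed-column layout described above can be read with a few lines of Python. This is only a minimal sketch and not part of WHAT IF: the function name parse_transformation is made up for this illustration, and blank numeric fields (which a Fortran read would accept) are not handled.

```python
def parse_transformation(text):
    """Parse a title line, a date line and three 3F10.6,10X,F10.4 lines."""
    lines = text.split("\n")
    if len(lines) < 5:
        raise ValueError("expected a title, a date and three matrix lines")
    title, date = lines[0].rstrip(), lines[1].rstrip()
    matrix, vector = [], []
    for line in lines[2:5]:
        line = line.ljust(50)  # pad short lines so all fixed columns exist
        # 3F10.6 -> columns 1-30, 10X -> columns 31-40 (skipped),
        # F10.4 -> columns 41-50
        matrix.append([float(line[i:i + 10]) for i in (0, 10, 20)])
        vector.append(float(line[40:50]))
    return title, date, matrix, vector


# The example file shown above (empty date line).
example = (
    "test1\n"
    "\n"
    "  0.999999  0.000995  0.001096              0.0184\n"
    " -0.000997  0.999998  0.001606             -0.0174\n"
    " -0.001094 -0.001607  0.999998              0.0125\n"
)
title, date, R, T = parse_transformation(example)
```

The numbers are sliced by column rather than split on blanks, because fixed-format fields may touch each other when a value fills its whole field.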

Put a matrix/vector in the database (PUTMAT)

PUTMAT will add the presently active transformation to the superposition transformation database. In case the space reserved in the database is full, a warning will be issued.

Get a matrix/vector from the database (GETMAT)

WHAT IF keeps only one transformation in memory. This is called the active transformation. WHAT IF can however keep track of many (presently 25) transformations. It only keeps the names of files (format: 1 line A80 for the title; 1 line A80 with the date and time of creation; 3 lines 3F10.6,10X,F10.4 for the matrix and vector) in which the superposition transformations are stored. GETMAT copies the transformation from a file to memory. You will be prompted for the number of the file. Use LSTMAT to get a directory of available transformations. Use GETOLD to read a transformation file not created by WHAT IF.

List the matrix/vector database (LSTMAT)

LSTMAT will cause WHAT IF to show you the names (=title lines) and the numbers of the transformations presently available in the database. You only need to give the number of a transformation if you are prompted for it. The title is only there to help you tell the matrices apart.

Delete a matrix/vector from the database (DELMAT)

DELMAT will cause WHAT IF to prompt you for the number of a transformation. This transformation will then be removed from the database. The file corresponding with it will stay in your directory, and can later still be recovered with the GETOLD command. You can get a directory of transformations in the database with the LSTMAT command.

Looking in transformation database (DIRMAT)

The command DIRMAT will cause WHAT IF to go over all transformations in the transformation database, and list their names (=title lines), their creation date and time, and the actual transformation matrices and vectors.

Reading external matrices in (GETOLD)

With the GETOLD command you can read an external transformation file. This can either be a file created by another program or by hand, or a transformation left over from another WHAT IF session which did not get stored. It can of course also be a transformation that was earlier deleted. See above for the required file format.
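If you want to create such an external file with your own program, the following Python sketch writes the required layout. It is an illustration only: the function name format_transformation is hypothetical, and the title and date strings are free text, as stated in the format description.

```python
def format_transformation(title, date, matrix, vector):
    """Return the text of a transformation file (title, date, 3 matrix lines)."""
    lines = [title[:80].ljust(80), date[:80].ljust(80)]  # two A80 lines
    for row, t in zip(matrix, vector):
        # 3F10.6, then 10 blanks (10X), then F10.4
        lines.append("%10.6f%10.6f%10.6f%10s%10.4f"
                     % (row[0], row[1], row[2], "", t))
    return "\n".join(lines) + "\n"


matrix = [[0.999999, 0.000995, 0.001096],
          [-0.000997, 0.999998, 0.001606],
          [-0.001094, -0.001607, 0.999998]]
vector = [0.0184, -0.0174, 0.0125]
text = format_transformation("test1", "", matrix, vector)
```

Writing the returned text to a file gives something that follows the format described above. Whether GETOLD accepts it has not been tested here.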

Show the presently active matrix (SHOMAT)

SHOMAT will cause WHAT IF to show you the presently active transformation matrix and vector.

Atom/residue selection

Select the atoms on which to superimpose (RANGE1)

WHAT IF creates transformation matrices that superpose one set of atoms on top of another set. The set that stays in place is called set 1. The command RANGE1 allows you to (re-)create set 1. You will be prompted for residue ranges until you give 0 as input. Each range must lie within a single molecule. If you need ranges spanning multiple molecules (e.g. 12 amino acids and a drug), you need to give one or more zones for the protein, and one zone per drug. DNA/RNA, water and single-atom drugs are not allowed in set 1. Every time you execute a SUPPOS command that does more than range setting, the previous set 1 and set 2 are thrown away. If you enter amino acids, only the alpha carbons are used. If you enter a drug, all atoms are used. Keep this in mind for mixed sets, because every atom carries the same weight in the least-squares superpositioning algorithm. Overlapping ranges are not allowed in set 1. Ranges that lie head to tail in the molecule are automatically converted to one single range.
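The two rules on overlapping and head-to-tail ranges can be sketched as follows. This is an assumption about the bookkeeping, not WHAT IF code: ranges are represented as (first, last) residue numbers within one molecule, the function name normalise_ranges is hypothetical, and sorting the input first is a choice made for this sketch.

```python
def normalise_ranges(ranges):
    """Reject overlapping ranges and fuse ranges that lie head to tail."""
    merged = []
    for first, last in sorted(ranges):
        if first > last:
            raise ValueError("range %d-%d is reversed" % (first, last))
        if merged and first <= merged[-1][1]:
            raise ValueError("overlapping ranges are not allowed in set 1")
        if merged and first == merged[-1][1] + 1:
            # Head to tail: extend the previous range instead of adding one.
            merged[-1] = (merged[-1][0], last)
        else:
            merged.append((first, last))
    return merged


set1 = normalise_ranges([(1, 10), (11, 20), (30, 35)])
```

Here 1-10 and 11-20 end up as the single range 1-20, while 30-35 stays separate.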

Select the atoms to be superposed (RANGE2)

After you created set 1 with the RANGE1 command, you can create set 2 with the RANGE2 command. For every zone of contiguous alpha carbons found in set 1 you will be prompted for a zone of amino acids that should be put in set 2. The superpositioning will then try to put the corresponding alpha carbons on top of each other. If the zone you enter for set 2 is too short, the tail of the corresponding zone in set 1 is removed. If you enter too many residues the tail of this zone in set 2 is removed. For every drug in set 1 you will be prompted for a drug in set 2. If the drug in either of the two zones has too many atoms, it will be truncated. The atom order in both drugs should be identical!
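The truncation rule for zones of unequal length amounts to pairing residues from the start of both zones and dropping the tail of the longer one. A minimal sketch, assuming each zone is simply a list of residue numbers (pair_zones is a hypothetical name):

```python
def pair_zones(zone1, zone2):
    """Pair residues of two zones; the tail of the longer zone is dropped."""
    n = min(len(zone1), len(zone2))
    return list(zip(zone1[:n], zone2[:n]))


# Zone 1-10 in set 1, but only 2-8 (seven residues) given for set 2.
pairs = pair_zones(list(range(1, 11)), list(range(2, 9)))
```

In this example residues 8, 9 and 10 of set 1 are left out of the superpositioning.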

WARNING!

Due to a small design flaw, neither of the two ranges can contain two stretches that are neighbors in the soup. E.g. putting 1-10 in range 1 on top of 2-11 in range 2 cannot be combined with putting 11-20 in range 1 on top of 27-36 in range 2, because 1-10 and 11-20 in range 1 are neighbors.

Picking pairs of atoms to superimpose (PICKIN)

The command PICKIN will cause WHAT IF to ask you for the number of atom pairs you want to pick. As soon as you have given the number control is passed to the PS300 screen and data tablet, and in the message area on the top right of the screen the message 'pick atom one' appears. Now you can pick the first atom. When you pick one, the keyboard bell will ring, and the text at the screen changes to 'pick atom two'. After picking the second atom the bell rings again, and 'pick atom one' appears again. This is now atom one of the second pair. This keeps going on till all the requested pairs are picked.

Be aware that you should always pick in the same order. That means always atom one in molecule (or group) one, and atom two in molecule (or group) two.

Showing the sets (SHORNG)

SHORNG will show the two sets created (see RANGE1 and RANGE2). For every atom the residue number in each set, and the coordinates will be shown.

Saving sets in a file (SAVRNG)

The command SAVRNG will cause WHAT IF to prompt you for a file name. It will then write the presently available data in the two sets (see RANGE1 and RANGE2) in this file. This command is especially handy since WHAT IF only remembers transformations, and not sets of atoms needed to create these transformations.

Retrieving sets (RESRNG)

The command RESRNG will cause WHAT IF to prompt you for a file name. If you give the name of a file previously written with the SAVRNG command, the two sets will be read back in.

Manual superimposing

The superimposing step (DOSUP)

The command DOSUP will calculate the best matrix and vector to put set 2 on top of set 1. Be aware that the transformation is only calculated, but NOT performed. For that you should use the APPLY command. The transformation calculated with DOSUP will overwrite the existing active transformation matrix. Transformations can be saved with the PUTMAT command.
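The algorithm that DOSUP uses is not described in this chapter. To illustrate what "best matrix and vector" means in the least-squares sense, the sketch below uses Horn's unit-quaternion method, which is a substitute chosen for this example and not necessarily what WHAT IF does. All names are hypothetical. The dominant eigenvector is found by a simple shifted power iteration, which assumes at least three non-collinear atom pairs and can fail in rare cases where the start vector is orthogonal to the solution.

```python
import math

def centroid(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]

def best_fit(set1, set2, iterations=500):
    """Return (R, T) such that R * p + T moves the atoms of set2 onto set1."""
    c1, c2 = centroid(set1), centroid(set2)
    # Correlation sums over the centred pairs: S[a][b] = sum(set2_a * set1_b).
    S = [[0.0] * 3 for _ in range(3)]
    for p1, p2 in zip(set1, set2):
        for a in range(3):
            for b in range(3):
                S[a][b] += (p2[a] - c2[a]) * (p1[b] - c1[b])
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    # Horn's symmetric 4x4 matrix; its dominant eigenvector is the unit
    # quaternion (q0, qx, qy, qz) of the optimal rotation.
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shift by the Frobenius norm so that the largest eigenvalue dominates.
    shift = math.sqrt(sum(v * v for row in N for v in row)) or 1.0
    q = [1.0, 0.5, 0.25, 0.125]
    for _ in range(iterations):
        q = [sum(N[i][j] * q[j] for j in range(4)) + shift * q[i]
             for i in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        q = [v / norm for v in q]
    q0, qx, qy, qz = q
    R = [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]
    # The translation puts the rotated centroid of set2 on the centroid of set1.
    T = [c1[i] - sum(R[i][j] * c2[j] for j in range(3)) for i in range(3)]
    return R, T


# set1 is set2 rotated by 90 degrees around Z and shifted by (1, 2, 3).
set2 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
set1 = [(1.0, 2.0, 3.0), (1.0, 3.0, 3.0), (0.0, 2.0, 3.0), (1.0, 2.0, 4.0)]
R, T = best_fit(set1, set2)
```

As with DOSUP, nothing is moved here: the function only returns the matrix and vector, and every atom pair carries the same weight.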

Applying a transformation (APPLY)

The command APPLY will cause WHAT IF to prompt you for a range of residues. Here you can enter anything present in the soup. The transformation presently in memory will then be applied to this range (of protein, DNA/RNA, drugs and water if wanted). That means that you can use one set of atoms/residues to determine a superposition transformation matrix, but apply this matrix to another set if wanted.
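Applying a transformation means rotating each atom with the matrix first and adding the vector afterwards. A minimal sketch, assuming the matrix rows R11..R33 multiply a column vector of coordinates (this convention and the function name are assumptions, not taken from the WHAT IF source):

```python
def apply_transformation(R, T, coords):
    """Return new coordinates R * p + T for every atom p in coords."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + T[i] for i in range(3))
        for p in coords
    ]


# 90 degree rotation around the Z-axis followed by a shift of (1, 2, 3).
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
T = [1.0, 2.0, 3.0]
moved = apply_transformation(R, T, [(1.0, 0.0, 0.0)])
```

The coordinates passed in need not be the atoms that were used to determine the matrix, which mirrors the freedom APPLY gives you.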

Undoing an applied transformation (UNDO)

If you made a mistake with the range in the APPLY command, you can use the UNDO command to restore the old situation. This command does the same as the APPLY command, but it only uses the inverse of the presently active transformation. If you screw up here again, you can undo the UNDO with APPLY again etc., but that makes the whole soup into a big mess after a few times. UNDO is of course a very nice option to misuse for all kinds of other purposes....
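For a rotation matrix the inverse is the transpose, so the inverse transformation subtracts the vector first and then rotates back. The sketch below shows this under the same assumed convention (new = R * old + T); the function names are hypothetical.

```python
def apply_transformation(R, T, coords):
    """Forward transformation: R * p + T."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + T[i] for i in range(3))
        for p in coords
    ]

def undo_transformation(R, T, coords):
    """Inverse transformation: transpose(R) * (p - T)."""
    result = []
    for p in coords:
        d = [p[j] - T[j] for j in range(3)]
        # R[j][i] instead of R[i][j] takes the transpose of the matrix.
        result.append(tuple(sum(R[j][i] * d[j] for j in range(3))
                            for i in range(3)))
    return result


R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
T = [1.0, 2.0, 3.0]
original = [(1.0, 0.0, 0.0), (0.5, -2.0, 4.0)]
restored = undo_transformation(R, T, apply_transformation(R, T, original))
```

Each round trip adds a little rounding noise to the coordinates, so repeated apply/undo cycles do not give back exactly the same numbers.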

Evaluation options

Alpha carbon distances (CADIFF)

The command CADIFF will cause WHAT IF to prompt you for two ranges and a cutoff distance. It will then list all alpha carbons that are in the one range, and are close to an alpha carbon in the other range.
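The listing that CADIFF produces can be thought of as an all-against-all distance check. A minimal sketch, assuming each range is a list of (residue number, alpha carbon coordinates) and that the cutoff is inclusive (both are assumptions made for this example):

```python
import math

def close_alpha_carbons(range1, range2, cutoff):
    """Return (residue1, residue2, distance) for all CA pairs within cutoff."""
    hits = []
    for number1, ca1 in range1:
        for number2, ca2 in range2:
            distance = math.dist(ca1, ca2)
            if distance <= cutoff:
                hits.append((number1, number2, distance))
    return hits


range1 = [(1, (0.0, 0.0, 0.0)), (2, (3.8, 0.0, 0.0))]
range2 = [(101, (0.0, 1.0, 0.0)), (102, (10.0, 0.0, 0.0))]
hits = close_alpha_carbons(range1, range2, 2.0)
```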

Residue center distances (CENDIF)

The command CENDIF does the same as the command CADIFF (see above) with the difference that CENDIF uses centers of residues rather than alpha carbons.

Statistics (SUPSTS)

This option does two things. First SUPSTS will create a set of lines between the two sets that were created with RANGE1 and RANGE2 (regardless whether APPLY has been executed or not). These lines connect the atoms that were paired in the superpositioning. Also the RMS displacement between the two sets is given.
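The RMS displacement is the square root of the mean squared distance between paired atoms. A small sketch of that formula (the function name is hypothetical, and whether SUPSTS uses exactly this definition is assumed):

```python
import math

def rms_displacement(set1, set2):
    """Root mean square distance between the paired atoms of two sets."""
    if len(set1) != len(set2) or not set1:
        raise ValueError("both sets must hold the same, non-zero number of atoms")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(set1, set2))
    return math.sqrt(total / len(set1))


# Two pairs, 3 and 4 Angstrom apart.
set1 = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
set2 = [(3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
rms = rms_displacement(set1, set2)
```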

Tabulating the superpositioning result (COMPAR)

The command COMPAR should be used after DOSUP and APPLY, and also needs the accessibilities to be known before DOSUP and APPLY were done. It will compare the two ranges but skip side chains of residues with too high accessibility because once something sticks into solution, it is close to irrelevant where it goes (at least in bio-computational terms).

Analyze a transformation (ANAFIT)

The option ANAFIT requires the two ranges to be set. It will then do the superpositioning of the second range on the first one, but rather than just doing it the fast way, it will do it in a way that lets a human being understand the transformation. Normally transformations consist of a rotation matrix which should be applied first, and a translation vector which should be added afterwards. ANAFIT will do this the other way around. It will first calculate the translation vector, and then the rotation matrix. The rotation matrix will be decomposed into three independent rotations around the X-axis, Y-axis, and Z-axis which together give the same result as the rotation matrix. Don't try to use APPLY or UNDO after this option, because the result will be meaningless at best.
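How a rotation matrix can be split into rotations around the three axes is sketched below. The order used here (first X, then Y, then Z, i.e. R = Rz * Ry * Rx) and the sign conventions are assumptions; the order ANAFIT uses is not stated in this chapter. The formulas break down when the Y rotation is exactly plus or minus 90 degrees.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def decompose(R):
    """Return (ax, ay, az) in radians with R = rot_z(az) * rot_y(ay) * rot_x(ax)."""
    # In this convention R[2][0] = -sin(ay); clamp against rounding noise.
    ay = math.asin(max(-1.0, min(1.0, -R[2][0])))
    ax = math.atan2(R[2][1], R[2][2])
    az = math.atan2(R[1][0], R[0][0])
    return ax, ay, az


R = matmul(rot_z(1.1), matmul(rot_y(-0.5), rot_x(0.3)))
ax, ay, az = decompose(R)
```

Multiplying the three single-axis matrices in the same order gives back the original matrix, which is the property the text describes.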

Comparing two identical molecules (EQUAL)

The command EQUAL will cause WHAT IF to prompt you for two molecules. It will check if these are two copies of the same molecule. If that is the case, all kinds of statistics will be shown. Also the two molecules will be put at the screen, the first one green, the second one ranging green to red from perfect to worst superpositioning result. Lines will be drawn between all pairs of identical atoms. This MOL-item provides a nice, fast way of visualizing the largest differences. If the two molecules are not copies of the same molecule, you can use the EQUALF option or one of the COMPAR-like options.

Comparing two almost identical molecules (EQUALF)

The command EQUALF will cause WHAT IF to prompt you for two molecules and a file. This file should hold the unique identifiers of residues in the two molecules given. All alpha carbons of pairs of residues given in the file will be connected by lines (provided that they reside in the molecules).

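The format of the file is one line per pair, format A4,1X,A4. That means that the unique identifier of the residue in the first molecule occupies columns 1-4, column 5 is skipped, and the unique identifier of the residue in the second molecule occupies columns 6-9.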
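A file in the A4,1X,A4 layout can be read by slicing fixed columns. The sketch below is an illustration only: the function name and the identifiers in the example are invented, and what a valid unique identifier looks like is not defined here.

```python
def parse_pair_file(text):
    """Return a list of (identifier1, identifier2) read as A4,1X,A4."""
    pairs = []
    for line in text.split("\n"):
        if not line.strip():
            continue  # ignore empty lines
        line = line.ljust(9)
        # Columns 1-4, one skipped column, columns 6-9.
        pairs.append((line[0:4].strip(), line[5:9].strip()))
    return pairs


# Build two example lines with made-up identifiers.
example = "\n".join("%4s %4s" % pair for pair in [("12", "45"), ("A10", "B20")])
pairs = parse_pair_file(example)
```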

Automatic 3-d alignment

Aligning two molecules/domains (MOTIVS)

The main disadvantage of aligning protein sequences is that nothing is done with secondary and tertiary structure knowledge. The option MOTIVS will overcome this problem. MOTIVS is a rather time consuming option. It will make a diagonal plot of 3-D superposition results. Depending on the size of the proteins and the parameters you set, this can take from 30 seconds to ten minutes CPU time on a micro VAX.

If you do not have a log file open upon entering this option you will get a warning, and the possibility to jump out is offered, so you can open a log file and start again.

MOTIVS will prompt you for two ranges. It will then try to do a 3-D superposition of every stretch of every length in the first range on every stretch of the same length in the other protein. (For the mathematicians, this means that with N amino acids in each of the two proteins there will be of the order of N**3 superpositionings tried with an average length of N/2 amino acids). Some nifty little tricks make it possible to test two proteins of over 400 amino acids each in roughly 5 minutes CPU on a micro-VAX. This part of the program is not written very nicely, because clarity had to bow to speed rather often.
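The N**3 estimate can be checked by counting. A stretch of length L can start at N-L+1 positions in a range of N residues, so the number of stretch pairs is the sum over L of (N1-L+1)*(N2-L+1). The sketch below counts a naive enumeration without any of the tricks mentioned above; min_len is a hypothetical argument playing the role of a minimal fragment length.

```python
def count_superpositions(n1, n2, min_len=1):
    """Number of equal-length stretch pairs between ranges of n1 and n2 residues."""
    total = 0
    for length in range(min_len, min(n1, n2) + 1):
        total += (n1 - length + 1) * (n2 - length + 1)
    return total


tries = count_superpositions(100, 100)
```

For two ranges of N residues the sum equals N(N+1)(2N+1)/6, which grows as N**3/3.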

You will also be asked if you want to skip helixes or not. This is added because every helix always fits every other helix perfectly. If you ask for helixes to be skipped, then every time a stretch is found in range 1 that has a matching stretch in range 2, but has fewer than 5 non-helical residues (this can be changed with the parameter setting module), it will be skipped.

Now you will see the stretches of the two given ranges that superimpose well enough according to the parameters you set (see chapter on parameter setting). For each pair you will see their location in the ranges, their aligned sequences, if available their secondary structure, their length, and the RMS error and the maximal error on the superimposed alpha carbons.




WARNING. THE DIAGONAL PLOT IS TEMPORARILY REMOVED FROM THE CODE.


But when it gets put back the following applies again.

At the end you will be asked if you want a diagonal plot. Just try it, it does not take much time, and only one page of output, but it is very illustrative. The way you read such a plot is identical to the well known sequence alignment diagonal plots, but now things are done in 3-D.

Program control is now passed to the PS300. The screen menu will change, and the following screen menu options become available:


WAIT does the same as always.

NOID removes the labels placed next to the squares. Old atom labels that are accidentally left over from before entering the graphics menu cannot be removed with NOID from the graphics menu. You should go to the GRAFIC menu first, give there GO, and use the NOID from there before entering MOTIVS.

INIT as usual removes all mol-items. First picking NO and thereafter INIT will as usual remove all WHAT IF generated mol-items (MOL0).

NEIM will ask you to pick a square. It will then draw the local environment of the residue picked. This environment is not pickable.

First picking NO and then NEIM is the only way of removing this local structure.

The residues will be translated from their normal soup positions to such a position that they roughly are 'near' the MOTIVS square. This way you do not have to search through the whole PS300 graphics space.

YES as such does nothing. Only as answer to questions might YES be needed.

NO as such does nothing. NO only has a function in conjunction with NEIM or INIT, or as answer to a question.

CONT will ask you to pick a square. This square will now become the center of the graphics.

STER works the same as everywhere else.

As usual, CHAT passes control back from the PS300 to the VAX.


When you pass control back to the VAX resident part of WHAT IF (with CHAT), a clustering algorithm will be started. WHAT IF will try to find the largest cluster of superimposed stretches that can together be used to superimpose with the limits that you are supposed to give at this moment.

When you are pleased with the clustering, WHAT IF starts an iterative set of superimposition operations. In each round it will take the present set of amino acid pairs that it thinks have to fall on top of each other (in the first round the present set is the sum of all clusters), and use those to do a superpositioning on the whole ranges given. Then it will make a new set of amino acid pairs. This time all pairs that after applying the transformation fall within the limits given will be put in the set. Normally this process converges in 3 to 6 rounds. At the end of this iterative process you will be prompted for a mol-object number and a mol-item name. The two ranges will be put at the screen superimposed (only alpha carbons will be shown). The coloring scheme used is as follows: The one range is green, the other range is red. The dark red and dark green alpha carbons were used in obtaining the superpositioning. The less saturated red and green alpha carbons were not used.

After this display part, WHAT IF will (upon request) show you the sequence alignment that resulted from the 3D-alignment. This will be done both at the terminal, and at the PS300. Since often two alpha carbons in the one range both have the same alpha carbon in the other range as the nearest, some AI is tried to clean this up.

I am sorry for the complexity of this option, but just try it once for the two domains of rhodanese as an example, and use the defaults everywhere. You will see that it is simpler than expected.


There are essentially two different ways of running this option:
1) contact analysis mode
2) sequence alignment mode (default)
The parameter LENHOM (see PARAMS) selects between these two modes. LENHOM=0 gives you `contact analysis mode`. This means that after the initial clustering all residues that actually fall on top of each other will be aligned and used for further superpositioning refinement. As a consequence, an accidental close spatial proximity of two residues in the 3-D alignment (e.g. a beta strand cutting through a helix) will also be used for further superpositioning improvement.

If you do not like this (deliberate) feature, you can use `sequence alignment mode`. For that you have to set the LENHOM parameter to 4 or larger. This parameter ensures that at least a few residues in a row fall on top of each other. The number of residues that should match is given by the value of the LENHOM parameter.

Superpositioning parameters (PARAMS)

The command PARAMS, as usual, brings you in the menu from which the parameters for SUPPOS operations can be changed.

Minimal fragment length in motivs (MINLEN)

The parameter MINLEN determines the minimal length of fragments to be considered in the initial fragment search in the MOTIVS option. It is advised to use no fragments shorter than 9 because of CPU and internal variable overflow problems. For very homologous proteins, larger values, e.g. 25-35, are advised.

Maximal error in superposition (MAXERR)

The parameter MAXERR determines the maximal superposition error allowed for two alpha carbons in order to be equivalenced in the MOTIVS option. Suggested values are 0.5 - 8.0 Angstrom. Halfway through the execution of MOTIVS, the user is prompted for the maximal and RMS error in the final superpositioning. MAXERR and RMSERR are then the defaults.

Root mean square error in superposition (RMSERR)

The parameter RMSERR determines the maximal allowed RMS misplacement of equivalenced alpha carbons in the initial fragment matching procedure of the MOTIVS option. Suggested values are 0.5 - 3.8 Angstrom. Halfway through the execution of MOTIVS, the user is prompted for the maximal and RMS error in the final superpositioning. MAXERR and RMSERR are then the defaults.

Length of equivalenced stretches (LENHOM)

The LENHOM parameter determines how many residues should at least be equivalenced in the final superposition in the MOTIVS option. The suggested value is minimally 4.
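One way to picture the effect of LENHOM is as a filter on the list of equivalenced residue pairs: only runs of at least LENHOM pairs that are consecutive in both ranges survive, and LENHOM=0 keeps everything. The sketch below is an interpretation of the description in this chapter, not the actual MOTIVS code; the function name and the pair representation are hypothetical.

```python
def filter_runs(pairs, lenhom):
    """Keep only pairs that belong to runs of at least lenhom consecutive residues.

    pairs is a list of (residue_in_range1, residue_in_range2), sorted on the
    first number. A run continues when both numbers go up by exactly one.
    """
    if lenhom <= 0:
        return list(pairs)  # contact analysis mode: keep every pair
    kept, run = [], []
    for pair in pairs:
        if run and pair == (run[-1][0] + 1, run[-1][1] + 1):
            run.append(pair)
        else:
            if len(run) >= lenhom:
                kept.extend(run)
            run = [pair]
    if len(run) >= lenhom:
        kept.extend(run)
    return kept


pairs = [(1, 5), (2, 6), (3, 7), (4, 8), (10, 40), (20, 21), (21, 22)]
aligned = filter_runs(pairs, 4)
```

In the example the isolated pair (10, 40), which could be an accidental contact, is dropped together with the run of only two residues.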

Skipping helix-helix superpositions (NONHEL)

Every helix always fits perfectly many times on every other helix. To avoid finding billions of helix-helix matches in the fragment search part of the MOTIVS option, you can tell MOTIVS to skip helixes. It will not skip them entirely, but only accept helical fragments if more than NONHEL residues are non-helical in at least one of the fragments. The secondary structure is determined by DSSP, and thus this option is useless if you try to superpose alpha-carbon-only molecules.

Comparing molecules (EQUMOD)

The option EQUAL can be used to compare different copies of the same molecule. EQUAL will do some comparisons, and draw lines between equivalenced atoms. In case you want to compare unequal molecules, you can set the EQUMOD flag to 1.

(Re-)initialization (INISUP)

INISUP will cause WHAT IF to (re-)initialize all arrays and other information relevant to SUPPOS options. Files on disk will remain intact. This command is always automatically executed when the SUPPOS menu is entered or left. There should actually be no need to ever use this option.

Color as function of the misfit (COLDIF)

The command COLDIF will use the presently set RANGE1 and RANGE2, and color the second range from blue to red as function of the misfit. This option should be used immediately after the APPLY option or strange coloring schemes will result.

Apply matrix to mol-items (APLITM)

The command APLITM will cause WHAT IF to prompt you for the name of a mol-item. It will then apply the presently active transformation matrix to this mol-item.

Manual diagonal plot (CABOX)

The command CABOX will cause WHAT IF to prompt you for two ranges and an alpha carbon distance cutoff. It will then create a diagonal plot in which you will see one little square for every inter range alpha carbon pair that has a distance less than the cutoff. If you use the same ranges as for the MOTIVS option, you can see which matching fragments are actually used after the clustering. There are many other ways that you can mis-use this option.
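The idea of such a diagonal plot can be shown with a character grid: one row per alpha carbon of the first range, one column per alpha carbon of the second range, and a mark where the distance is below the cutoff. This is a text-only sketch with hypothetical names; the real option draws squares on the graphics screen.

```python
import math

def contact_plot(ca1, ca2, cutoff):
    """Return one string per atom in ca1 with '#' where a ca2 atom is close."""
    return [
        "".join("#" if math.dist(p, q) < cutoff else "." for q in ca2)
        for p in ca1
    ]


# Three alpha carbons on a line, 3.8 Angstrom apart, compared with themselves.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
plot = contact_plot(chain, chain, 4.0)
```

Printing the rows one per line gives a band along the diagonal, since each residue is close to itself and to its direct neighbors.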

Averaging molecules (SFUDGE)

The command SFUDGE will cause WHAT IF to prompt you for two molecules. These two molecules should be identical (that means covalently identical, their coordinates are allowed to be different). You will also be prompted for a cutoff limit. All equivalent atoms in the two molecules that are closer to each other than the cutoff limit will get their coordinates pairwise averaged. This is a good option to emphasize differences between molecules. The similar parts will get identical, but the larger differences remain. This option of course makes chemical nonsense out of your molecules.
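The pairwise averaging can be sketched as follows, assuming the two molecules are given as coordinate lists in the same atom order, so that atom i in the one molecule is equivalent to atom i in the other (the function name is hypothetical):

```python
import math

def average_close_pairs(mol1, mol2, cutoff):
    """Average the coordinates of equivalent atoms closer than cutoff."""
    new1, new2 = [], []
    for p, q in zip(mol1, mol2):
        if math.dist(p, q) < cutoff:
            middle = tuple((a + b) / 2.0 for a, b in zip(p, q))
            new1.append(middle)
            new2.append(middle)
        else:
            # Too far apart: both atoms keep their own coordinates.
            new1.append(tuple(p))
            new2.append(tuple(q))
    return new1, new2


mol1 = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
mol2 = [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
new1, new2 = average_close_pairs(mol1, mol2, 2.0)
```

The first atom pair collapses onto its midpoint, while the second pair, 5 Angstrom apart, is left alone, so it stands out as a difference.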

Averaging molecules (SFUDG2)

The command SFUDG2 will cause WHAT IF to prompt you for two molecules. These two molecules do not need to be identical (neither covalently identical, nor have the same coordinates). You will also be prompted for a cutoff limit. All atoms in the two molecules that are closer to each other than the cutoff limit will get their coordinates pairwise averaged, whether they are supposed to be equivalenced or not. Be aware that with a large cutoff limit this option will become unstable, and will produce strange results. This is a good option to emphasize differences between molecules. The similar parts will get identical, but the larger differences remain. This option of course makes chemical nonsense out of your molecules.