This chapter should start by honouring Alwyn Jones for having the idea
to create a fragment database. Even though I did not understand how his method
was implemented, and therefore had to redesign the whole procedure around
another, faster algorithm, the idea was his.
The idea is that all proteins are built from a limited number of possible
short fragments, which together cover all possible
backbone conformations. Therefore, if one has a large enough
fragment database, it must be possible to build a new protein just by using
these fragments. The problem, however, is how to find, for example, all groups
of 9 amino acids in the whole database that have an RMS deviation smaller than
1.0 Angstrom on C-alpha positions when fitted to a group of 9 amino acids
in the molecule we are working on. Doing this by brute force would
take around 50 hours of CPU time on a micro VAX. Using inversely sorted
C-alpha distance tables with integer distance pointer arrays can speed this
process up by many orders of magnitude. The ability to find fragments
in the database that superimpose well on a part of the molecule you
are working on has been incorporated in the program WHAT IF in many ways.
Almost all these commands start with the two characters DG, after Alwyn,
who used the same nomenclature.
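As a rough illustration of why distance tables are attractive, the sketch
below compares fragments by their internal C-alpha distances, which needs no
superposition at all. It is not the WHAT IF algorithm: the sorted tables and
integer pointer arrays are not reproduced, and the function names, the toy
coordinates and the use of a 1.0 Angstrom cutoff on the distance RMS are
assumptions made for this example.

```python
import math

def ca_distances(coords):
    """Return all pairwise C-alpha distances of a fragment, in a fixed order."""
    n = len(coords)
    return [math.dist(coords[i], coords[j])
            for i in range(n) for j in range(i + 1, n)]

def distance_rms(frag_a, frag_b):
    """RMS difference between the internal distances of two equally long fragments."""
    da, db = ca_distances(frag_a), ca_distances(frag_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))

def find_similar(query, database_chain, cutoff=1.0):
    """Slide a window over one database chain; return (start, rms) of similar fragments."""
    length = len(query)
    hits = []
    for start in range(len(database_chain) - length + 1):
        window = database_chain[start:start + length]
        rms = distance_rms(query, window)
        if rms < cutoff:
            hits.append((start, rms))
    return hits

# Toy data: an idealised straight chain with 3.8 Angstrom C-alpha spacing.
chain = [(3.8 * i, 0.0, 0.0) for i in range(12)]
query = [(0.0, 3.8 * i, 0.0) for i in range(9)]   # same shape, other orientation
hits = find_similar(query, chain)
```

Note that a comparison on internal distances cannot tell a fragment from its
mirror image, so in practice it would serve as a fast filter before a real
superposition rather than as a replacement for it.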
Because most DG*** options at some point explicitly use the middle amino
acid of the stretch, your group length should always be odd
(it can be set with the
SETLEN command). The DG*** commands are all activated from the DGLOOP
menu. Type DGLOOP to enter this menu.
WHAT IF accepts every
hit that meets the user-defined (or default) criteria for RMS and maximal
errors. However, most options have an upper limit on the number of hits.
This explains why, for example, you can work with crambin but
not find the perfect hit in the database, even though crambin is in the database.
That is simply the result of finding enough hits before the fragment in the
database that came from crambin was actually inspected. If you want to be
sure that you will get all hits, set the number of hits high and the search
criteria tight. Also, hits that give an RMS better than 0.000001 are skipped
because that normally means that the database contains the protein you are
working with.
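The effect of the hit limit can be sketched as follows. This is only a model
of the behaviour described above, not WHAT IF code: the function name, the
candidate list and the parameter names are assumptions, and candidates are
simply inspected in database order.

```python
def collect_hits(candidates, rms_cutoff=0.7, max_hits=3, self_rms=1e-6):
    """Accept candidates in database order until max_hits is reached."""
    hits = []
    for name, rms in candidates:
        if rms < self_rms:
            continue              # treated as the protein itself: skipped
        if rms < rms_cutoff:
            hits.append((name, rms))
            if len(hits) == max_hits:
                break             # later, possibly better, entries are never inspected
    return hits

# (fragment label, RMS in Angstrom), in the order they occur in the database
candidates = [("A", 0.65), ("B", 0.0), ("C", 0.50), ("D", 0.60), ("E", 0.05)]

loose = collect_hits(candidates)                                # stops after A, C, D
tight = collect_hits(candidates, rms_cutoff=0.1, max_hits=100)  # reaches E
```

With the loose settings the search is full before fragment E is looked at;
with a high hit limit and a tight cutoff the best fragment is found.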
DGFIND will cause WHAT IF to prompt you for a residue number.
This cannot be a residue that is too close to the N- or C-terminus
of any chain (why will be explained below). WHAT IF will take the
fragment (of at least 5 residues, see SETLEN) with this residue in the middle
and search the database for equally long fragments with a highly similar
backbone conformation. Highly similar is defined by the parameters, but
typically it means that the RMS on alpha carbons is better than 0.7 Angstrom.
There are no additional constraints on this fragment.
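One plausible reading of the terminus restriction is that the fragment must
fit entirely inside the chain on both sides of the middle residue. The sketch
below works on that assumption; the function name and the 0-based residue
indexing are choices made for the example, not WHAT IF conventions.

```python
def fragment_window(chain_length, middle, group_length=5):
    """Return (first, last) indices of the fragment centred on `middle` (0-based)."""
    if group_length % 2 == 0:
        raise ValueError("group length must be odd so that a middle residue exists")
    half = group_length // 2
    first, last = middle - half, middle + half
    if first < 0 or last >= chain_length:
        raise ValueError("residue is too close to a chain terminus")
    return first, last

# A 46-residue chain, fragment of 5 around residue index 10
window = fragment_window(46, 10)
```

With a group length of 9 the same call needs four residues on either side, so
the first four and last four residues of a chain cannot serve as the middle.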
The DGINS option does rather a lot of things, one after the other.
You will first be prompted
for a residue after which to insert 1 to N amino acids (N depends on
a parameter in the CCONFI.FIG file, see also PRP006). Then you will be
asked for the number of amino acids to be inserted. The program will then
send the best hits to the graphics window, and you can loop through them
with the movie buttons (MOV+ and MOV-). After clicking CHAT you
are asked to choose which one you
want to use for the insertion. Only the backbone of the inserted residues
will be inserted (thus a poly-glycine insertion). No corrections for
non-covalent contacts (bumps) are made!
DGFIX will cause WHAT IF to prompt you for a residue number.
This cannot be a residue that is too close to the N- or C-terminus
of any chain. WHAT IF will take the
fragment (of at least 5 residues, see SETLEN) with this residue in the middle
and search the database for equally long fragments with a highly similar
backbone conformation. Highly similar is defined by the parameters, but
typically it means that the RMS on alpha carbons is better than 0.7 Angstrom.
The middle residue in the database fragment must be of the same type
as the residue on which you perform the search.
DGMUT will cause WHAT IF to prompt you for a residue number.
This cannot be a residue that is too close to the N- or C-terminus
of any chain. WHAT IF will take the
fragment (of at least 5 residues, see SETLEN) with this residue in the middle
and search the database for equally long fragments with a highly similar
backbone conformation. Highly similar is defined by the parameters, but
typically it means that the RMS on alpha carbons is better than 0.7 Angstrom.
You will be prompted for the residue type of the middle residue in the
database fragments.
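The difference between a search with a fixed middle residue type and a search
with a user-chosen one comes down to which type is used to filter the
backbone hits. A minimal sketch, assuming hits are stored as (sequence, RMS)
pairs with one-letter codes; the data and function names are made up for the
example.

```python
def middle_type(sequence):
    """One-letter type of the middle residue of an odd-length fragment."""
    if len(sequence) % 2 == 0:
        raise ValueError("fragment length must be odd")
    return sequence[len(sequence) // 2]

def filter_hits(hits, wanted_type):
    """Keep only (sequence, rms) hits whose middle residue has the wanted type."""
    return [hit for hit in hits if middle_type(hit[0]) == wanted_type]

# Backbone hits for a query fragment, before any residue type constraint
hits = [("AVLKG", 0.31), ("GSFTE", 0.42), ("PELDA", 0.55), ("TALRW", 0.61)]
query = "KDLQE"

same_type = filter_hits(hits, middle_type(query))   # DGFIX-like: middle must be L
mutated = filter_hits(hits, "F")                    # DGMUT-like: user asks for F
```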
The command DGCONT allows you to search for pairs of residues that have the
same spatial relationship as the pair you give it as an example. You will be
prompted for a central residue. For this residue you will have to tell which
atoms should make the contact with the neighbouring residue that is still to
be given.
You will also have to give the atoms to be used for superimposing the database
hits on the central residue in the soup that you gave. Thereafter you are
prompted for the neighbouring residue and for the atoms in this neighbouring
residue that should have a contact with the indicated atoms in the central
residue. The last information needed is the contact distance. A contact is
counted if the distance between two atoms is less than the sum of this
contact distance and the Van der Waals radii of the two contacting atoms.
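The contact criterion can be written down directly. In the sketch below the
radii are the commonly quoted Bondi values and are only illustrative; the
radii WHAT IF actually uses may differ, and the function and variable names
are assumptions.

```python
import math

# Illustrative Van der Waals radii in Angstrom (Bondi values); WHAT IF may use others.
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def in_contact(atom_a, atom_b, contact_distance):
    """atom = (element, (x, y, z)). True if the atoms are closer than
    contact_distance plus the sum of their Van der Waals radii."""
    (elem_a, pos_a), (elem_b, pos_b) = atom_a, atom_b
    limit = contact_distance + VDW_RADIUS[elem_a] + VDW_RADIUS[elem_b]
    return math.dist(pos_a, pos_b) < limit

oxygen = ("O", (0.0, 0.0, 0.0))
nitrogen = ("N", (3.0, 0.0, 0.0))
carbon = ("C", (5.0, 0.0, 0.0))

close_pair = in_contact(oxygen, nitrogen, 0.25)   # 3.0 < 0.25 + 1.52 + 1.55
far_pair = in_contact(oxygen, carbon, 0.25)       # 5.0 > 0.25 + 1.52 + 1.70
```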
WHAT IF will now loop over all residues in the database that are of the same
type as the central residue given. For each of these database hits it will
superimpose this residue (using only the atoms marked for superimposing)
on the central one, and apply the superposition transformation to the whole
molecule in which the database hit resides. If there is now (in the rotated and
translated database protein) a residue of the
same type as the given neighbour residue at approximately the same place in
space as the indicated neighbour, then this pair will be marked as a hit.
Don't worry about the stupidity of this algorithm. In reality it works a little
bit differently, but that is way too difficult to explain.
All hits found are stored in a group, sent to the MOVIE area, and upon request
sent to a mol-item. This is because the neighbouring information is not stored
in the group, so if you later want to look at this contact group again, you
will have to redo the whole option.
'Approximately being at the same place' is defined as the average distance
between the equivalent atoms being less than a certain cutoff. The default
value is 4 Angstrom. Use the PARAMS option to change this cutoff.
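The "same place" test can be sketched as a mean distance over equivalent
atoms, evaluated after the database protein has been transformed. The
function name and the representation of atoms as coordinate tuples are
assumptions; only the 4 Angstrom default comes from the text above.

```python
import math

def same_place(atoms_a, atoms_b, cutoff=4.0):
    """True if the mean distance between equivalent atoms is below cutoff (Angstrom)."""
    if len(atoms_a) != len(atoms_b) or not atoms_a:
        raise ValueError("need two equally long, non-empty atom lists")
    mean = sum(math.dist(a, b) for a, b in zip(atoms_a, atoms_b)) / len(atoms_a)
    return mean < cutoff

neighbour = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
candidate_near = [(3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]   # every atom 3.0 Angstrom away
candidate_far = [(5.0, 0.0, 0.0), (6.5, 0.0, 0.0)]    # every atom 5.0 Angstrom away

near_ok = same_place(neighbour, candidate_near)
far_ok = same_place(neighbour, candidate_far)
```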
The options DGFIND, DGFIX and DGMUT all prepare groups of hits. If you want
to mutate the amino acid used to make these hits into the middle amino acid
of one of these hits, you should use the DGREP option. This option does the
same as the DGGRA option (see DGGRA), but after showing the hits on the PS300
screen you are prompted for the number of the hit to be used. These numbers
are indicated at the top right of the screen while you click through
the movie with MOV+ and MOV-. If there is
no hit to your liking, you can (as usual) escape by typing zero.
The command DGGRA can be used to send hits to the graphics window
for visual inspection.
After typing DGGRA you will be prompted for a group number. You can only look
at groups that were made using any of the DG*** options (also after a logical
operation with another group has been performed). The hits are sent to the
MOVIE. The middle residue, the one of interest, is drawn somewhat
more intensely than the other residues. The right-hand side of the top bar
indicates the
number of the hit presently on the screen.
You can switch the movie off with the MOVIE button at the bottom of the screen.
Also, a next set of DG*** hits will overwrite the previous one when sent
over with a subsequent DGGRA command.
The command DGGRAL can be used to send hits to the graphics window for
visual inspection.
After typing DGGRAL you will be prompted for a group number. You can only look
at groups that were made using any of the DG*** options (also after a logical
operation with another group has been performed). The hits are stored in a
MOL-item. They are
coloured by quality of fit: blue for the best one, red for the worst.
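A colouring of this kind can be imitated with a simple ramp. The sketch below
assumes a linear blue-to-red interpolation on the RMS of each hit; the actual
colour scale used by WHAT IF is not documented here, so both the function
name and the interpolation are illustrative only.

```python
def fit_colours(rms_values):
    """Map each RMS to an (r, g, b) triple: blue for the lowest, red for the highest."""
    best, worst = min(rms_values), max(rms_values)
    span = worst - best
    colours = []
    for rms in rms_values:
        t = 0.0 if span == 0 else (rms - best) / span   # 0 = best fit, 1 = worst fit
        colours.append((t, 0.0, 1.0 - t))
    return colours

colours = fit_colours([0.25, 0.75, 0.5])
```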
The command DGSHOW does almost the same as the command SHOHIT (see SHOHIT in
the SCAN3D menu).
It lists the hits one by one, including the sequence, the secondary structure
determined for the fragment, and the RMS deviation for the alpha carbons
after superposition. Be aware that the RMS deviation is no longer correct
if you have done logical combinations on this group.
The command CATOAL will run over the entire molecule and replace every amino
acid for which only the alpha carbon coordinates are present by a complete
residue. This option loops over the DGMUT option, and every time accepts
the best hit found, without user intervention.
If you are running this option on experimental alpha carbon positions, you
should probably run the RELAX option (see below) a couple of times
before starting with CATOAL.
The command ALTOCA causes WHAT IF to set all coordinates to zero except those
for the alpha carbons. This is of course a rather useless command, but
it is nice to test the quality of the CATOAL option.
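Such a round-trip test could look like the sketch below: strip a structure
down to its alpha carbons, rebuild it, and compare the result with the
original. The atom representation and both function names are assumptions;
the rebuilding step itself is not shown.

```python
import math

def alpha_carbons_only(atoms):
    """atoms: list of (name, (x, y, z)). Zero every coordinate except those of CA."""
    return [(name, xyz if name == "CA" else (0.0, 0.0, 0.0)) for name, xyz in atoms]

def coordinate_rms(original, rebuilt):
    """RMS distance between equivalent atoms of two equally ordered atom lists."""
    squared = [math.dist(a[1], b[1]) ** 2 for a, b in zip(original, rebuilt)]
    return math.sqrt(sum(squared) / len(squared))

residue = [("N", (1.0, 2.0, 3.0)), ("CA", (2.4, 2.1, 3.2)), ("C", (3.1, 3.4, 3.0))]
stripped = alpha_carbons_only(residue)
```

After rebuilding the stripped structure, coordinate_rms(residue, rebuilt)
gives a single number for how well the original atoms were recovered.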
This option does the same as DGROTA (see below). This option is only added
for option nomenclature consistency.
The command DGROTA does almost the same as DGMUT. However, it will automatically
add a DGGRAL option at the end. In this DGGRAL option only the side chains
of the middle residue of the search string will be shown. Also, in DGROTA the
weight on the central residue's alpha carbon is infinite in the superposition.
This is a very good option to get an impression of the possible side chain
conformations (=rotamers) at a certain position.
The command DGROTA does the same as DGR1-1. It is left in here for compatibility
purposes.
The command DGRN-1 will prompt you for one residue. It will then determine
the rotamers (as described for the DGR1-1 option) for all 20 residue types
at this position (nothing is shown for glycine because it has no side chain).
The hits will be stored in the first 20 frames of the movie option.
The command DGR1-N will prompt you for a residue range and a residue type.
The range should not span more than 100 residues.
For every residue in the range the rotamers for the requested residue type
will be determined as described for the DGR1-1 option, and put in the movie.
At present the output is also a surprise to me.
The command DGRN-N determines rotamer distributions for all residue types
for a complete range of residues. As this can no longer be displayed, you
get the Chi-1 statistics. The statistics consist of a table with, for every
position and every residue type, the distribution of preferred Chi-1 angles
in steps of 10 degrees. Also, three graphs will be shown with the frequency
of occurrence around +60, +/-180, and -60 degrees (from bottom to top) at each
position averaged over the 17 residue types (gly, ala, pro are excluded).
A second plot shows the distribution of the average residue over the 360
degrees of chi-1, averaged over the 17 residue types.
Since these two plots are drawn in the colour of the residues (actually their
alpha carbons), you are advised to think about colouring them cleverly
before you run this extremely time consuming option!
At present the output is also a surprise to me.
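The kind of Chi-1 statistics described here can be sketched as a histogram in
10 degree bins plus the fraction of angles falling near each of the three
staggered positions. The function names, the bin layout starting at -180
degrees, and the 120 degree wide wells are assumptions made for this example.

```python
def chi1_histogram(angles, step=10):
    """Count Chi-1 angles (degrees) in bins of `step` degrees over [-180, 180)."""
    bins = [0] * (360 // step)
    for angle in angles:
        wrapped = (angle + 180.0) % 360.0                 # 0 <= wrapped < 360
        index = min(int(wrapped // step), len(bins) - 1)  # guard against rounding
        bins[index] += 1
    return bins

def rotamer_fractions(angles):
    """Fraction of angles in the wells around +60, 180 and -60 degrees."""
    counts = {60: 0, 180: 0, -60: 0}
    for angle in angles:
        wrapped = (angle + 180.0) % 360.0 - 180.0         # -180 <= wrapped < 180
        if 0.0 <= wrapped < 120.0:
            counts[60] += 1
        elif -120.0 <= wrapped < 0.0:
            counts[-60] += 1
        else:
            counts[180] += 1
    return {well: n / len(angles) for well, n in counts.items()}

angles = [-65.0, -58.0, 62.0, 175.0, -178.0, 181.0]
histogram = chi1_histogram(angles)
fractions = rotamer_fractions(angles)
```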
The command DGRSLF will cause WHAT IF to prompt you for a residue range. It
will then execute the DGR1-1 option on each residue in this range, and store
the results in the movie. The rotamers will be for the residue
type that is present at that position. This option allows you to inspect how
many of your residues are in the most preferred conformation.
The range should not span more than 100 residues.
The command DGRS-N will cause WHAT IF to prompt you for a residue range.
For all residues in this range the geometrically best rotamer (that is
the rotamer that is closest to the middle of the cloud and has the
best backbone fit) will be determined. These best rotamers will be
plotted.
The command SETLEN can be used to change the length of the groups to search for.
The commands DGFIX, DGFIND and DGMUT need the group length to be odd. DGCONT
works independently of the group length.
This SETLEN command is completely equivalent to the SETLEN command in the
SCAN3D menu.
The command INIGRP does the same as the command with the same name in the
SCAN3D menu: it initializes all groups. This is an irreversible command.
The only way to get the groups back is by regenerating them.
The command SHOGRP does the same as the command with the same name in the
SCAN3D menu: it shows you all groups. The presently available groups
are shown including their group number, the number of hits in the group,
and a short description of how the group was created.
The command PARAMS brings you directly into the menu to change the DG*** related
parameters. See the chapter on parameter setting for a detailed description
of these parameters.
The command TIGHT will cause WHAT IF to tighten all DGLOOP related parameters
by a factor of 1.67. This means that the quality of the hits will on
average get better, at the cost of the number of hits.
The command RELAX will cause WHAT IF to relax all DGLOOP related parameters
by a factor of 1.67. This means that the quality of the hits will on
average get worse, but you will get more hits.
The command RESPAR will cause WHAT IF to reset all DGLOOP related
parameters to their default values.
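The interplay of TIGHT, RELAX and RESPAR can be modelled as scaling a set of
cutoffs. The sketch assumes that tightening means dividing every cutoff by
1.67 and relaxing means multiplying by it; the parameter names and the
default for the maximal error are hypothetical, and only the factor and the
typical 0.7 Angstrom RMS come from this chapter.

```python
FACTOR = 1.67

# Hypothetical parameter names and default values, for illustration only.
DEFAULTS = {"max_rms": 0.7, "max_single_error": 1.5}

def tight(params):
    """Divide every cutoff by FACTOR: fewer, better hits."""
    return {name: value / FACTOR for name, value in params.items()}

def relax(params):
    """Multiply every cutoff by FACTOR: more, worse hits."""
    return {name: value * FACTOR for name, value in params.items()}

def respar():
    """Return a fresh copy of the defaults."""
    return dict(DEFAULTS)

params = relax(relax(respar()))    # two RELAX steps in a row
```

Under this model one TIGHT undoes one RELAX, and RESPAR returns to the
starting point regardless of how many steps were taken.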
The command SHOPAR will cause WHAT IF to show you all DGLOOP related
parameters.
The option DGDMUT is experimental. It is supposed to become the smartest
possible mutant predicting module ever made...
Just try it. At worst WHAT IF crashes, and at best you understand what it
does and get some results.
The SCNSTS menu that is normally used to evaluate SCAN3D relational
database hits can also be used to determine residue statistics for
DG*** groups.