WHAT IF has a relational protein structure database available on-line. The
command SCAN3D activates the menu that operates this database. I used the
name SCAN3D to honour Steve Gardner, who called his very fancy relational
database program 3DSCAN; eons ago he and I discussed this methodology for a
long time over a lot of beer. SCAN3D allows you to search the database for
characteristics such as sequence, secondary structure, 3D structure, and
accessibility. The last part of this chapter is an early draft of an
article explaining why SCAN3D is so very special...
The commands are roughly divided into five groups:
1) Database inspection commands.
2) Scanning (database search) commands.
3) Logical operations (making relations).
4) Evaluation of results (listings, graphics).
5) Other commands.
Since there are many options, only a limited set is initially active in
this menu. Use the command MORE to activate all options.
The main principle of this database is that you search for fixed-length
stretches of amino acids whose stored parameters satisfy certain relations.
The stretches found are stored in groups. These groups can be combined
using logical operations like AND, OR, XOR, etc. They can also be
visualized on the graphics display. The length of the stretches searched
for can be set from 5 to 35 (with the SETLEN command).
The experienced user will see that there is some overlap between the groups
described in this chapter and the DG*** groups described in the structure
fragment chapter.
The database consists of sequence files, frequency tables,
secondary structure files, coordinate files,
pointer files, etc. All relevant information stored in these files can be
shown at the terminal.
The following commands allow you to look at specific parts of the database:
NOW INDEX SHOHED SHOHDS SDBHST SEQINF.
Some of these commands use internal parameters, but you do not have to
worry about those: the defaults are set such that you will most likely get
what you want anyway. Otherwise, look in the chapter on parameter settings.
All these options also write to the log file if the LOGGER option has been
switched on (with the DOLOG command).
The command INDEX shows you the PDB 4-letter identification code for all
proteins presently available in the database. For every entry the number
of residues, the number of atoms, the number of water molecules and the
number of co-factors are shown.
NOW is an extended version of INDEX. It gives all the output that INDEX
would give, and it also shows you the values of the parameters that
influence the way the database files are dealt with. You do not have to
worry too much about these parameters: the defaults are such that the
program does what you want 99 percent of the time anyway.
When you type SHOHED you will be prompted for the PDB 4-letter identification
code of the protein you want to see. If you do not know this code, you can
use the INDEX command to get a list of all codes presently in the database.
The header part of the PDB file for this entry will then be shown at the
terminal.
The command SHOHDS causes WHAT IF to show the HEADER, COMPND, SOURCE, and
AUTHOR records for all entries in the database at the terminal. For every
entry the number of residues, the number of atoms, the number of water
molecules, and the number of co-factors are also shown.
SEQINF is a very crude option. It lists all sequences presently in the
database. You can reduce this to one sequence at a time by using parameter 4
(see the chapter on setting parameters), but in that case the SDBHST option
does virtually the same. However, SEQINF can also show frequency
distributions and neighbour matrices if you set the parameters correctly.
WHAT IF has stored a secondary structure determination for every amino acid
in the database. These determinations were made by the program DSSP (written
by Kabsch and Sander, see appendix D). This option allows you to see these
secondary structure determinations. You will be prompted for the name of the
entry. If you do not know these codes, you can use the INDEX command to get
a list of all of them presently available in the database.
WHAT IF will show the sequence in one-letter code, and the secondary
structure element code as given by DSSP. If you do not know the principle
of this program, I suggest you read the article, but in short: no code means
random coil; H means normal helix; 3 or G both mean 3-10 helix; T means turn;
and E and S mean sheet. See also appendix D.
The command SDBACC will cause WHAT IF to prompt you for the name of the entry.
If you do not know these codes, you can use the INDEX
command to get a list of all of them presently available in the database.
WHAT IF will show the sequence in three-letter code, and the total
accessibility for every residue. After every 15th residue the screen stops
scrolling until you hit return. Please read the chapter on accessibilities
about some pitfalls with accessibility calculations.
The command SDBCHI will cause WHAT IF to prompt you for the name of the entry.
If you do not know these codes, you can use the INDEX
command to get a list of all of them presently available in the database.
WHAT IF will show the sequence in three-letter code, and all torsion
angles per residue. As 7 database files need to be inspected for this option,
it will take a few seconds before the listing of the angles commences. The
screen will stop scrolling after every 15th hit. Hit return to continue
scrolling.
The relational database in WHAT IF does not use a fancy language like SQL
for its queries. Instead, the user has to make the relations. The general
principle is that you look for one characteristic, and store the result
in something called a group. Thereafter you search for a second characteristic,
store this result in another group, etc. Then you make the relations by
means of logical operations on these groups. The normal logical operations
AND, OR, XOR, etc. are available. This section describes everything you can
ask the database. The next section describes the usage of these logical
operations.
You can get any sequence out of the database in a fully flexible way. See the
last part of this chapter about how to set the length of the stretches
searched for, and about how to define groups of amino acids with a common
name. See the previous section for inspection of the sequences in the
database.
The command SEQUEN will cause WHAT IF to ask you N times (N is the length
of the stretches searched for) 'Give the amino acid at position *'.
For every position in the search string you can give just return, meaning
that every amino acid is allowed there. You can also type one or more
amino acids in three-letter code. If you type more than one amino acid,
every typed amino acid is acceptable at this position in the
stretches to be found. You may also mix in one or more of the so-called
self-made amino acids (see the last section of this chapter).
You will then be asked to give the `mismatch` parameter. This mismatch parameter
tells WHAT IF how many positions in each hit are maximally allowed to be
different from what was requested.
After the very fast search WHAT IF will tell you how many hits it found
and ask you in which group you want to store these hits. The suggested
default is just the first free group available. Be aware that there are
at present only 10 groups allowed to be active at one time. If you do not
want to store the hits, just give group number 0.
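The SEQUEN search logic described above can be sketched in a few lines of
Python. This is only an illustration of the principle (per-position sets of
allowed residues plus a mismatch budget), not the way WHAT IF implements it;
the function name, the data layout, and the example sequence are all
assumptions made for this sketch.

```python
def sequen_search(sequence, pattern, max_mismatch=0):
    """Return the start indices of all stretches that match `pattern`.

    sequence: list of three-letter residue codes.
    pattern:  one entry per position; None means every residue is allowed
              (the 'just return' answer), otherwise a set of allowed codes.
    """
    n = len(pattern)
    hits = []
    for start in range(len(sequence) - n + 1):
        mismatches = 0
        for offset, allowed in enumerate(pattern):
            if allowed is not None and sequence[start + offset] not in allowed:
                mismatches += 1
        if mismatches <= max_mismatch:
            hits.append(start)
    return hits


# Hypothetical sequence and a search string of length 5.
seq = ["GLY", "ALA", "CYS", "LYS", "GLY", "GLY", "THR", "CYS", "SER", "ALA"]
pattern = [{"GLY"}, None, {"CYS"}, {"LYS", "ARG"}, None]

print(sequen_search(seq, pattern))                  # [0]
print(sequen_search(seq, pattern, max_mismatch=1))  # [0, 5]
```

With a mismatch parameter of 1 the stretch starting at index 5 is also
accepted, because only its fourth position (SER) deviates from the request.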
The name of the HELSHT command is a little bit too humble. It does not only
allow you to search for helix and/or sheet, but also for turns and coil. If
you type HELSHT, WHAT IF will loop over all positions in the search string,
and for every position prompt you whether this position should be HSTC*.
You can now give any combination of these 5 characters. If you give, for
example, H at position 3, all stretches that are lifted from the database
will be helical at the third position. If you give CT at position 6, all
stretches that are lifted from the database will be either turn or
coil at the sixth position. If you give * at any position, then every type
of secondary structure will be allowed at that position.
You will then be asked to give the `mismatch` parameter. This mismatch parameter
tells WHAT IF how many positions in each hit are maximally allowed to be
different from what was requested.
After the very fast search WHAT IF will tell you how many hits it found
and ask you in which group you want to store these hits. The suggested
default is just the first free group available. Be aware that there are
at present only 10 groups allowed to be active at one time. If you do not
want to store the hits, just give group number 0.
The command PSIPHI will cause WHAT IF to loop over the length of the search
string, and for every position prompt you for the limits of phi and psi
at this position. Here you have to give 4 values, all in the range -180.0 to
180.0. The first two are the lower and upper limits for phi, the last two
are the lower and upper limits for psi.
You will then be asked to give the `mismatch` parameter. This mismatch parameter
tells WHAT IF how many positions in each hit are maximally allowed to be
different from what was requested.
After the very fast search WHAT IF will tell you how many hits it found
and ask you in which group you want to store these hits. The suggested
default is just the first free group available. Be aware that there are
at present only 10 groups allowed to be active at one time. If you do not
want to store the hits, just give group number 0.
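As a rough illustration of the PSIPHI test, the sketch below checks one
candidate stretch against per-position phi/psi limits, given in the same
order as the four values described above. The function name, the example
angles, and the limits are assumptions; ranges that wrap around +/-180
degrees are not handled here.

```python
def psiphi_match(phi_psi, limits, max_mismatch=0):
    """Check one stretch against per-position angle limits.

    phi_psi: list of (phi, psi) tuples, one per residue in the stretch.
    limits:  list of (phi_low, phi_high, psi_low, psi_high) per position.
    """
    mismatches = 0
    for (phi, psi), (phi_lo, phi_hi, psi_lo, psi_hi) in zip(phi_psi, limits):
        if not (phi_lo <= phi <= phi_hi and psi_lo <= psi <= psi_hi):
            mismatches += 1
    return mismatches <= max_mismatch


# Illustrative helix-like limits for a search string of length 5.
limits = [(-90.0, -30.0, -70.0, -10.0)] * 5
stretch = [(-60, -45), (-65, -40), (-58, -47), (-120, 130), (-62, -41)]

print(psiphi_match(stretch, limits))                  # False
print(psiphi_match(stretch, limits, max_mismatch=1))  # True
```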
The command OMEGA will cause WHAT IF to loop over the length of the search
string, and for every position prompt you for the limits on omega
at this position. Here you have to give 2 values, both in the range -180.0 to
180.0.
You will then be asked to give the `mismatch` parameter. This mismatch parameter
tells WHAT IF how many positions in each hit are maximally allowed to be
different from what was requested.
After the very fast search WHAT IF will tell you how many hits it found
and ask you in which group you want to store these hits. The suggested
default is just the first free group available. Be aware that there are
at present only 10 groups allowed to be active at one time. If you do not
want to store the hits, just give group number 0.
The command SCNCNS will cause WHAT IF to prompt you for the degree of
conservation at each position in the search profile. Conservation 0 (zero)
means that every residue is allowed and possible at this position.
Conservation 100 means that only 1 residue is found at this position.
Conservation is measured by HSSP.
You will then be asked to give the `mismatch` parameter. This mismatch parameter
tells WHAT IF how many positions in each hit are maximally allowed to be
different from what was requested.
After the very fast search WHAT IF will tell you how many hits it found
and ask you in which group you want to store these hits. The suggested
default is just the first free group available. Be aware that there are
at present only 10 groups allowed to be active at one time. If you do not
want to store the hits, just give group number 0.
The command SCNPOS will cause WHAT IF to prompt you for the absolute and
fractional position in the chain. The absolute position indicates where
in the chain the first residue of the search stretch is allowed to be.
Use negative numbers to indicate a distance to the C-terminus. The fractional
position (values from 0.0 to 1.0) indicates in which part of the protein
the first residue of the search stretch is allowed to be.
Examples: To find all C-terminal helical stretches one would combine
a HELSHT run with SCNPOS with absolute range -1 to -2 (leaves one
residue free at the end), and fractional range 0.0 to 1.0.
To find all cysteines in C-terminal domains, one would combine a SEQUEN
search with a SCNPOS search with absolute range 80 to 1000, and
fractional range 0.5 to 1.0.
After the very fast search WHAT IF will tell you how many hits it found
and ask you in which group you want to store these hits. The suggested
default is just the first free group available. Be aware that there are
at present only 10 groups allowed to be active at one time. If you do not
want to store the hits, just give group number 0.
The command SCNHBO will cause WHAT IF to ask you for every backbone nitrogen
and oxygen in each residue in the search stretch whether it should be
hydrogen bonded. If you answer yes, it will ask for the secondary structure
type of the residue it is hydrogen bonded to. (Here, as usual, you can
answer H, S, T, C, or any combination of these, or *.)
You will then be asked to give the `mismatch` parameter. This mismatch parameter
tells WHAT IF how many positions in each hit are maximally allowed to be
different from what was requested.
After the very fast search WHAT IF will tell you how many hits it found
and ask you in which group you want to store these hits. The suggested
default is just the first free group available. Be aware that there are
at present only 10 groups allowed to be active at one time. If you do not
want to store the hits, just give group number 0.
The command CHIVAL will cause WHAT IF to prompt you for the number of the
sidechain chi-angle. You can give here 1 to 5 for chi-1, chi-2, chi-3,
chi-4, chi-5, respectively. WHAT IF will then loop over the length of the
search string, and for every position prompt you for the limits on the
requested chi-angle. If the database amino acid does not have this chi-angle
(there is no chi-4 in alanine, for example) then this database hit is not
acceptable. The only way to accept everything is by just hitting return when
the defaults of -180 180 are suggested: in that case every amino acid is
acceptable at this position. If you actually retype -180 180, only amino
acids that possess this chi-angle are acceptable.
If you do not use the suggested default, you have to give 2 values, both
in the range -180.0 to 180.0.
You will then be asked to give the `mismatch` parameter. This mismatch parameter
tells WHAT IF how many positions in each hit are maximally allowed to be
different from what was requested.
After the very fast search WHAT IF will tell you how many hits it found
and ask you in which group you want to store these hits. The suggested
default is just the first free group available. Be aware that there are
at present only 10 groups allowed to be active at one time. If you do not
want to store the hits, just give group number 0.
The command ACCVAL will cause WHAT IF to loop over the length of the search
string, and for every position prompt you for the limits of the surface
accessibility of the residue at this position. Here you have to give 2
values, which are the lower and upper limit for the total surface
accessibility of the residue at that position. Read the chapter on
accessibility calculations about some pitfalls with accessibility values.
You will then be asked to give the `mismatch` parameter. This mismatch parameter
tells WHAT IF how many positions in each hit are maximally allowed to be
different from what was requested.
After the very fast search WHAT IF will tell you how many hits it found
and ask you in which group you want to store these hits. The suggested
default is just the first free group available. Be aware that there are
at present only 10 groups allowed to be active at one time. If you do not
want to store the hits, just give group number 0.
The command DAYHOF will activate the sequence search option that accepts
only amino acids which, according to the Dayhoff scoring matrix, are worth
at least a certain number of points. The following scoring matrix is
used:
   V  L  I  M  F  W  Y  G  A  P  S  T  C  H  R  K  Q  E  N  D
V  5  2  2  1  0 -1  0 -1  0 -1 -1  0 -2 -1 -1 -1 -1 -1 -2 -2
L  2  5  2  3  2  0  0 -2 -1 -2 -1  0 -3 -1 -1 -1  0 -2 -2 -2
I  2  2  5  2  0  0  0 -2 -1 -2 -1  0 -2 -1 -2 -2 -3 -2 -2 -3
M  1  3  2  5  2 -2 -1 -2  0 -2 -1  0  0  0 -2 -2  0 -2 -1 -1
F  0  2  0  2  6  3  3 -3 -2 -2 -1 -2 -3  1 -2 -3 -3 -3 -3 -2
W -1  0  0 -2  3  6  3 -2 -2 -3  0 -1 -1  0  0 -2 -1 -2 -3 -3
Y  0  0  0 -1  3  3  6 -3 -2 -3  0 -2 -2  1 -1 -2 -2 -1 -1 -2
G -1 -2 -2 -2 -3 -2 -3  5  0  0  0 -1 -2 -1  0  0 -1  0  0  0
A  0 -1 -1  0 -2 -2 -2  0  5  1  1  0 -2  0 -1  0  0  1  0  0
P -1 -2 -2 -2 -2 -3 -3  0  1  5  0  0 -3  0  0  0  0  1 -2  0
S -1 -1 -1 -1 -1  0  0  0  1  0  5  2 -1  0  1  0  1  1  2  0
T  0  0  0  0 -2 -1 -2 -1  0  0  2  5 -1  1  0  0  0  1  0  0
C -2 -3 -2  0 -3 -1 -2 -2 -2 -3 -1 -1  6  0 -2 -3 -3 -3 -2 -2
H -1 -1 -1  0  1  0  1 -1  0  0  0  1  0  5  2  1  1 -1  1  1
R -1 -1 -2 -2 -2  0 -1  0 -1  0  1  0 -2  2  5  2  2  0  0 -2
K -1 -1 -2 -2 -3 -2 -2  0  0  0  0  0 -3  1  2  5  1  1  1  0
Q -1  0 -3  0 -3 -1 -2 -1  0  0  1  0 -3  1  2  1  5  2  1  1
E -1 -2 -2 -2 -3 -2 -1  0  1  1  1  1 -3 -1  0  1  2  5  1  2
N -2 -2 -2 -1 -3 -3 -1  0  0 -2  2  0 -2  1  0  1  1  1  5  2
D -2 -2 -3 -1 -2 -3 -2  0  0  0  0  0 -2  1 -2  0  1  2  2  5
This means that if you request an aspartic acid at a certain position in the
search string, and say that the score should be at least 2 points,
glutamic acid, asparagine, and aspartic acid are acceptable at this position.
You will be prompted for the average Dayhoff scoring value first. This is
simply the average of the scores for all positions in the search string.
Thereafter you will be prompted, one by one, for the residue at each position
in the search string, and its minimal Dayhoff score. If a certain position
is allowed to be anything, just give any residue and -100 or something
else very negative for the requested minimal score.
You will then be asked to give the `mismatch` parameter. This mismatch parameter
tells WHAT IF how many positions in each hit are maximally allowed to be
different from what was requested.
After the very fast search WHAT IF will tell you how many hits it found
and ask you in which group you want to store these hits. The suggested
default is just the first free group available. Be aware that there are
at present only 10 groups allowed to be active at one time. If you do not
want to store the hits, just give group number 0.
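The aspartic acid example given for DAYHOF can be checked with a small Python
sketch. Only the D row of the matrix is copied here, from the table in the
text; the function name is an assumption made for this illustration.

```python
# Column order of the scoring matrix, as printed in the text.
RESIDUES = "V L I M F W Y G A P S T C H R K Q E N D".split()

# Row for aspartic acid (D), copied from the matrix in the text.
D_ROW = [-2, -2, -3, -1, -2, -3, -2, 0, 0, 0, 0, 0, -2, 1, -2, 0, 1, 2, 2, 5]


def acceptable(row, min_score):
    """Residues whose score against the requested residue is >= min_score."""
    return [res for res, score in zip(RESIDUES, row) if score >= min_score]


print(acceptable(D_ROW, 2))          # ['E', 'N', 'D']
print(len(acceptable(D_ROW, -100)))  # 20: a very negative score accepts all
```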
The command TURNTP allows you to search for turns of a certain type. Turns
are defined as a reversal of the chain direction over four residues. These
residues are called I, I+1, I+2, I+3. Normally a hydrogen bond is found
between residues I and I+3. WHAT IF will prompt you for the turn type. You
should give one of the following names:
I IP II IIP VIA VIB VIII IV
according to the nomenclature of Wilmot and Thornton in J.Mol.Biol, (1988)
203, 221-232 (where P stands for ' or prime).
The following limitations will now be placed on the phi and psi angles
of the residues I+1 and I+2:
                        PHI1  PSI1  PHI2  PSI2
I   : TYPE I    TURN (   -60   -30   -90     0)
IP  : TYPE I'   TURN (    60    30    90     0)
II  : TYPE II   TURN (   -60   120    80     0)
IIP : TYPE II'  TURN (    60  -120   -80     0)
VIA : TYPE VIA  TURN (   -60   120   -90     0)
VIB : TYPE VIB  TURN (  -120   120   -60     0)
VIII: TYPE VIII TURN (   -60   -30  -120   120)
IV  : TYPE IV   TURN (   ALL OTHERS)
Type IV is only there for completeness. Using it means that you actually
get the whole database as result.
You will then be asked to give the `mismatch` parameter. This mismatch parameter
tells WHAT IF how many positions in each hit are maximally allowed to be
different from what was requested.
After the very fast search WHAT IF will tell you how many hits it found
and ask you in which group you want to store these hits. The suggested
default is just the first free group available. Be aware that there are
at present only 10 groups allowed to be active at one time. If you do not
want to store the hits, just give group number 0.
For a full description of hydrophobic moment calculations see the chapter
on this item.
WHAT IF has the hydrophobic moments for all proteins in the database
available on-line for repeat angle 100 degrees and window width 7.
The command SCNHYD will cause WHAT IF to loop over the length of the search
string, and for every position prompt you for the limits on the hydrophobic
moment at this position. Here you have to give 2 values. All values are
allowed. Normal values fall in the range 0.0 to 0.5.
You will then be asked to give the `mismatch` parameter. This mismatch parameter
tells WHAT IF how many positions in each hit are maximally allowed to be
different from what was requested.
After the very fast search WHAT IF will tell you how many hits it found
and ask you in which group you want to store these hits. The suggested
default is just the first free group available. Be aware that there are
at present only 10 groups allowed to be active at one time. If you do not
want to store the hits, just give group number 0.
The SCNCON option is absolutely amazing. However, it is relatively slow, and
not very user friendly. The idea is to find atomic contacts. Together
with the other SCAN3D options it allows you to look, for example, at all
buried glutamic acids for which the O-epsilons are not in contact with a
basic nitrogen.
In order to use this option, you have to type a lot. For every position in the
search string, you will be prompted for the amino acid(s). If you give one
amino acid, you can use the subsequent question about which atoms to use
to specify individual atoms. If you give multiple amino acids, you can,
when asked for the atoms, only give `SIDE-CHAINS` or `BACK-BONE`. After the
amino acid is known, the same questions as above will be repeated for
the residues with which there should be a contact. The same kind of answers as
for the residues searched for should be given. The last question per position
in the search string is the contact distance. This is the distance between
the atom centers minus the two Van der Waals radii. So for atoms that just
touch, give zero. Since the database does not contain hits where this
distance is larger than 1.5 Angstrom, it is useless (but not fatal) to give
very large numbers. You can also give negative distances to detect `bumps`.
The last thing you will be prompted for is the database range. Just hit return
to use the whole range. If you take the full database (approximately 100
proteins), then the average search will take roughly 20 seconds CPU on a
VAX workstation.
After the search WHAT IF will tell you how many hits it found
and ask you in which group you want to store these hits. The suggested
default is just the first free group available. Be aware that there are
at present only 10 groups allowed to be active at one time. If you do not
want to store the hits, just give group number 0.
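The contact distance used by this option (distance between the atom centers
minus the two Van der Waals radii) can be written out as follows. The radii
in the table and the coordinates are illustrative assumptions; they are not
the values used by WHAT IF.

```python
import math

# Illustrative Van der Waals radii in Angstrom (assumed values).
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def contact_distance(coord_a, elem_a, coord_b, elem_b):
    """Distance between the atom centers minus the two Van der Waals radii."""
    return math.dist(coord_a, coord_b) - VDW_RADIUS[elem_a] - VDW_RADIUS[elem_b]


# Hypothetical coordinates for a Glu O-epsilon and a Lys N-zeta.
d = contact_distance((0.0, 0.0, 0.0), "O", (3.5, 0.0, 0.0), "N")
print(round(d, 2))   # 0.43: close, but not touching
print(d <= 1.5)      # True: within the 1.5 Angstrom limit mentioned above

# A negative value corresponds to a `bump`.
bump = contact_distance((0.0, 0.0, 0.0), "O", (2.5, 0.0, 0.0), "N")
print(bump < 0.0)    # True
```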
The command SCNCYS will cause WHAT IF to prompt you for the cysteine
status for each of the positions in the search group. At every position you
can give one of the following:
* (asterisk) if this residue is completely free.
-2 if this residue should not be a cysteine.
-1 if this residue should be an unpaired cysteine.
N if this residue should be a paired cysteine.
N can be zero if you do not care how far down in the sequence the partner
cysteine is. If you give for example 4, you are
going to search for cysteines that are paired with a cysteine that is
four residues further in the same sequence. So, this is one of the exceptions
where zero is valid input...
How to create groups is explained earlier in this chapter. The things
one can do with groups should have been explained first, since what can
be done with them determines in a sense how one wants to go about making
them. A group, also sometimes called subgroup, is a set of peptides all
having the same fixed number of amino acids in them. Such a group is the
result of searching through the database. Upon searching, the program finds
a certain number of hits. All hits are stored, and the user can look at them.
A very simple group would for example be: all stretches of 8 amino acids with
an alanine in them. This would generate a group with several thousands of
hits in it.
These groups can then be combined by means of logical operations.
These operations are AND, OR, NOT, and XOR. The user has to type SANDOR to be
able to use one of these options.
The user has several options available to look in or at groups. These options
are SHOGRP (shows all groups made) and SHOHIT (shows hits in a group). Also,
the option INIGRP is available to clean groups. SETLEN can be used to vary the
length of the stretches searched for.
After the command SANDOR is given, the program responds by asking
for the number of the first group. Thereafter you will be prompted for the
second group. Both times you should give a number from 1 to 10,
being the number of one of the groups you generated earlier. You will then
be prompted for the number of the group that will receive the result. If this
result group is already in use, you get the choice to overwrite it, or to
make another choice. Then you will be prompted for the logical operation.
Here you should give one of the following:
AND OR NOT XOR
These operations do the following:
AND creates a new group consisting of all hits that both groups on which it
operates have in common.
OR creates a new group which consists of all hits that are present in at
least one of the two groups on which it operates.
NOT creates a new group consisting of all hits that are present in the first
of the two groups on which it operates, but not in the second.
XOR creates a new group which consists of all hits that are present in exactly
one of the two groups on which it operates, but not in both. (I don't think
that this operator will be used very often).
The command SCNINV will prompt you for an input group number and an output
group number. These may be the same. It will then invert the input group,
and store the result in the output group. If you do a logical OR on
the input and the output group of this option, you get the whole database
back.
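Groups behave like sets of hits, so AND, OR, and the inversion done by SCNINV
can be illustrated with Python sets. In this sketch a hit is identified by a
protein code and the number of its first residue; that representation, the
PDB-like codes, and the tiny `database` are all hypothetical.

```python
# Every possible hit in a (tiny, hypothetical) database.
database = {("1ABC", 1), ("1ABC", 2), ("1ABC", 3), ("2XYZ", 1), ("2XYZ", 2)}

# Two groups as they might result from two different searches.
group1 = {("1ABC", 1), ("1ABC", 2), ("2XYZ", 1)}
group2 = {("1ABC", 2), ("2XYZ", 2)}

and_group = group1 & group2   # hits that both groups have in common
or_group = group1 | group2    # hits present in at least one group

# Inversion as described for SCNINV: everything that is not in the group.
inverted = database - group1

print(sorted(and_group))               # [('1ABC', 2)]
print(len(or_group))                   # 4
print((group1 | inverted) == database) # True: OR with the inverse gives all
```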
There are several ways of inspecting the hits found. They are divided in
two categories, listing them at the normal terminal, or showing them
graphically.
The command SHOGRP shows you which groups have been generated so far. Also
shown is how each group was generated, and how many hits there are in each
group.
The command SHOHIT causes WHAT IF to prompt you for the number of the group.
You should then give the number of one of the earlier generated groups. Do not
worry if you forgot the group numbers: if you type something wrong, WHAT IF
will at worst tell you so, but it will not crash. After the group number has
been accepted, you will be prompted for the range of hits. Just give the number
of the first and the last hit you want to see. Now be ready
to use the no-scroll option on your terminal, because for all requested hits
the program will show you the protein in which the hit was found, the sequence
numbers of the hit in this protein, the actual sequence, and the secondary
structure of this stretch as determined by DSSP (see appendix D).
In case you are looking at a group of fragments
created by one of the DG*** options, you will also get the RMS deviation between
the C-alpha coordinates of the hit and the stretch on top of which it
was fitted. However, DG*** groups cannot yet be fully mixed with 'normal'
groups.
If you want a fresh start, or for any other reason want to get rid of all
the groups you have created so far, you can use the command INIGRP. Be careful
with this command: it works immediately, and it is irreversible. You can, of
course, always create all removed groups over again.
There are several ways to display hits.
The command SCNGRA will cause WHAT IF to prompt you for the number of a group.
Thereafter you will be asked how many hits you want to see. At present you
can give at most 100 hits. These hits will be coloured red, all superimposed
on the first structure (which will sometimes look strange), centered at the
present center of the PS300, and placed in ridge. If you now rotate the
lower right dial of the dial box (the one labeled ridge) you will flip
through the hits.
The command SCNGRL will cause WHAT IF to prompt you for the number of a group.
Thereafter you will be asked how many hits you want to see. At present you
can give as many hits as you wish, but strange things will happen if all these
hits together have more than 2500 amino acids in them. These hits will be
coloured from blue to red as a function of their position in the hit list, all
superimposed on the first structure (which will sometimes look strange),
centered at the present center of the PS300, and placed in a MOL-item. You will
be prompted for the number of the MOL-object, and the name of the MOL-item.
WARNING! This option only works as expected when the middle residues of
all hits in a group are the same amino acid type (all alanines, or all
cysteines, etc.).
The command SCNGRN will cause WHAT IF to prompt you for the number of a group.
Thereafter you will be asked how many hits you want to see. At present you
can give as many hits as you wish, but strange things will happen if all these
hits together have more than 2500 amino acids in them. These hits will be
coloured from blue to red as a function of their position in the hit list, all
superimposed on the first structure (which will sometimes look strange),
centered at the present center of the PS300, and placed in a MOL-item. In order
to superimpose the structures you will be asked which atoms to use for
superpositioning.
The parameter setting menu can be used to determine which parts of the hit
and its environment will be shown.
This option needs a lot of input. You will be prompted for the atoms in the
central residue that should make the contact, for the atoms in the central
residue that should be used for superpositioning, for the neighbouring
residues, and for the atoms in the neighbours that make the contact. This
seems somewhat redundant because you typed all this already for the previously
run SCNCON option, but I have great plans for these options in the future, and
when those are ready, you will understand why.
After roughly 10 seconds CPU on a VAX workstation you will be prompted
for the number of the MOL-object, and the name of the MOL-item.
The command SCNGRE will cause WHAT IF to prompt you for a group and a range
of hits. It will also prompt you for the atoms in the central residue
to be used for superpositioning at the screen, for the atoms in the central
residue to be shown, etc. This is the same procedure as for the SCNGRN option.
After all the input, all hits will be shown at the PS300 screen, with their
entire environments present.
This option is not yet tested.
The length of the stretches of amino acids that are searched for is always
a fixed number. I know that this is not optimally flexible for the user, but I
could not work out a method with flexible group lengths that works just
as fast as the present one. If you do not like the length of these stretches,
you can use the SETLEN command. Execution of this command will cost several
seconds because WHAT IF has to read many new pointer files.
It is possible to do logical operations on groups that have stretches of
different lengths in them. The program only looks at the first amino acid. If
these are the same, meaning that it is the same amino acid at the same location
in the same protein, then those stretches are the same for WHAT IF.
The command SETEAA allows you to create one extra self made amino acid. This
one is called the user defined self made amino acid. You will be prompted for
a three letter code under which this self made amino acid should be known.
This three letter code should of course not be one of the existing real or self
made amino acids. After the name is accepted, you will be prompted for the
names of the amino acids which make up this user defined self made amino acid.
Here you can only give the twenty real amino acids. If you later would like
to change this user defined self made amino acid, you can just use the SETEAA
command again. The first thing that SETEAA does is remove the old user
defined self made amino acid if it exists.
The command SHOEAA shows you all presently set self made amino acids. It also
shows the user defined self made amino acid if there is one defined already.
The following self made amino acids are predefined:
BIG TRP + TYR + PHE + HIS + ARG + LYS + MET
SML GLY + ALA + SER
POS ARG + LYS
NEG GLU + ASP
POL ARG + LYS + GLU + ASP + GLN + ASN + HIS
If you want more, you should ask Gert Vriend, but ask very friendly, because
it means at least an hour of work.
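The predefined self made amino acids listed above are simply names for sets
of real amino acids. A minimal sketch of how such names could be expanded
when they are mixed with ordinary three-letter codes is given below; the
function name `expand` is an assumption, the set contents are copied from
the list above.

```python
# Predefined self made amino acids, copied from the list in the text.
SELF_MADE = {
    "BIG": {"TRP", "TYR", "PHE", "HIS", "ARG", "LYS", "MET"},
    "SML": {"GLY", "ALA", "SER"},
    "POS": {"ARG", "LYS"},
    "NEG": {"GLU", "ASP"},
    "POL": {"ARG", "LYS", "GLU", "ASP", "GLN", "ASN", "HIS"},
}


def expand(codes):
    """Replace self made names by their members; keep real codes as they are."""
    allowed = set()
    for code in codes:
        allowed |= SELF_MADE.get(code, {code})
    return allowed


print(sorted(expand(["POS", "GLY"])))   # ['ARG', 'GLY', 'LYS']
print(sorted(expand(["SML", "NEG"])))   # ['ALA', 'ASP', 'GLU', 'GLY', 'SER']
```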
The command SCNUSE will cause WHAT IF to prompt you for a group and a
hit number. It will then lift this hit from the database, and store it
as a separate molecule at the end of the protein range of your soup.
If you want to lift all contacts from the database that look like a
certain contact in your protein, the option to do this is DGCONT in the
DGLOOP menu.
The command DOSCAN can be used to get hits out of the database that
give a minimal score against a stretch of residues in the soup when
compared using the Dayhoff matrix.
You will be prompted for a range and a minimal score. All stretches in that
range will be compared with all stretches in the database. Every time
a hit is found that, when compared with the stretch in your
molecule, has no mutations that score below the given Dayhoff cutoff, one is
added to the count of the protein in which that hit was found. At the end a
list with the number of hits per protein is shown.
The command SAVGRP will save all present groups in a file. You will be
prompted for the name of the file in which to store the groups.
The command RESGRP will restore the groups from a file. You will be
prompted for the name of the file from which to restore the groups. This
file must of course be created earlier with the SAVGRP option.
The command PARAMS will, as usual, bring you to the menu from which you
can change the parameters for the SCAN3D related options.
The command SCNSTS brings you to the menu that allows for the evaluation
of groups. The following options are available in this menu:
The command ONEPRF will cause WHAT IF to prompt you for a group
number, a hit range, and the position in the group (the first, second,
etc., position in the fragments in the group). For all hits the
frequency of the twenty residues at the given position is listed.
Behind the frequency for each residue type the total frequency
in the database, and the frequency in the database in Helix,
Strand, Turn and Coil are given. The last number is the preference
parameter (that is the natural logarithm of the expected
frequency divided by the observed frequency).
This preference parameter is NOT normalized for secondary
structure, accessibility, etc. It is just a comparison between the
frequency of occurrence of the residue type at this position in
this group and the frequency that is expected from the frequency
of residue types in the whole database assuming a random
distribution.
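The preference parameter can be sketched as below. The ratio is taken in the
order the text states it (expected divided by observed); this sign convention
has not been checked against actual WHAT IF output. The function name and the
database frequencies are hypothetical.

```python
import math
from collections import Counter


def preference(residues_at_position, database_frequency):
    """ln(expected / observed) for every residue type seen at one position.

    residues_at_position: residue types found at that position in all hits.
    database_frequency:   fraction of each residue type in the whole database.
    Residue types that do not occur at the position are skipped.
    """
    counts = Counter(residues_at_position)
    total = len(residues_at_position)
    result = {}
    for res, count in counts.items():
        observed = count / total
        expected = database_frequency[res]
        result[res] = math.log(expected / observed)
    return result


# Hypothetical data: four hits, and assumed database frequencies.
column = ["ALA", "ALA", "GLY", "ALA"]
db_freq = {"ALA": 0.08, "GLY": 0.08}
pref = preference(column, db_freq)
print({res: round(value, 3) for res, value in pref.items()})
```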
The command ALLPRF will cause WHAT IF to prompt you for a group
number and a hit range. For all hits the
frequency of the twenty residues at all positions in the group
is listed. The last column shows the total frequency of this residue
summed over all positions in the group.
Documentation to be typed.
Documentation to be typed.
Documentation to be typed.
The command LSTCHI will cause WHAT IF to prompt you for a group number,
for the range of hits to be used,
and for a position in that group. For every hit the protein 4-letter
identifier, the number of the residue in this protein and all torsion
angles will be listed. The order of the torsion angles is: Phi, Psi,
Omega, and then, when present, Chi-1 to Chi-5.
The command LSTCHI will cause WHAT IF to prompt you for a group number,
for the range of hits to be used, and for two positions in that group.
For each residue its frequency at every position in the hit will be
listed (and the sum over all positions in all hits).
For the two residue positions a 20 * 20 residue distribution table will
be produced. The 10 most frequent pairs in this table will be listed
separately.
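The pair distribution for two positions amounts to counting residue pairs
over all hits and listing the most frequent ones. A minimal sketch, assuming
each hit is available as a list of three-letter codes and positions are
counted from 1 (the function name and the data are hypothetical):

```python
from collections import Counter


def pair_table(hits, pos_a, pos_b, top=10):
    """Count residue pairs at two positions and return the `top` most frequent.

    hits:         list of hits, each a list of three-letter residue codes.
    pos_a, pos_b: positions within the hit, counted from 1.
    """
    pairs = Counter((hit[pos_a - 1], hit[pos_b - 1]) for hit in hits)
    return pairs.most_common(top)


hits = [
    ["ALA", "GLY", "CYS"],
    ["ALA", "SER", "CYS"],
    ["GLY", "GLY", "CYS"],
]
for pair, count in pair_table(hits, 1, 3):
    print(pair, count)
```

Only pairs that actually occur are stored here; the full 20 * 20 table would
contain zeros for all other combinations.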
These parameters are probably totally useless for the SCNSTS menu. This
PARAMS command was added to ease future parameter additions.