Relational protein structure database (SCAN3D)

Introduction

WHAT IF has a relational protein structure database available on-line. The command SCAN3D activates the menu that operates this database. I used the name SCAN3D to honour Steve Gardner, who called his very fancy relational database program 3DSCAN; eons ago he and I discussed this methodology for a long time over a lot of beer. SCAN3D allows you to search the database for sequence, secondary structure, 3D-structure, accessibility, and other characteristics. The last part of this chapter is an early draft of an article explaining why SCAN3D is so very special...

The commands are roughly divided into five groups:
1) Database inspection commands.
2) Scanning (database search) commands.
3) Logical operations (making relations).
4) Evaluation of results (listings, graphics).
5) Other commands.

Since there are many options, only a limited set is initially active in this menu. Use the command MORE to activate all options.

The main principle of this database is that you search for fixed-length stretches of amino acids that have certain relations between all their stored parameters. The stretches found are stored in groups. These groups can be combined using logical operations like AND, OR, XOR, etc. They can also be visualized on the graphics screen. The length of the stretches searched for can be set from 5 to 35 (with the SETLEN command).
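The idea of fixed-length stretches collected into groups can be sketched in a few lines. This is a minimal illustration, not WHAT IF's implementation: the toy database, the one-letter sequences and the function names are hypothetical, and hits are stored as (protein code, 0-based start index) pairs.

```python
from typing import Dict, List, Tuple

# A hit is identified by (protein code, 0-based index of its first residue).
Hit = Tuple[str, int]

MIN_LEN, MAX_LEN = 5, 35  # stretch lengths accepted by SETLEN


def all_stretches(db: Dict[str, str], length: int) -> List[Hit]:
    """Enumerate every fixed-length stretch in a toy sequence database."""
    if not MIN_LEN <= length <= MAX_LEN:
        raise ValueError(f"stretch length must be {MIN_LEN}..{MAX_LEN}")
    hits: List[Hit] = []
    for code, seq in db.items():
        # A chain of n residues contains n - length + 1 stretches.
        for start in range(len(seq) - length + 1):
            hits.append((code, start))
    return hits


def stretch_sequence(db: Dict[str, str], hit: Hit, length: int) -> str:
    """Return the one-letter sequence of a stored hit."""
    code, start = hit
    return db[code][start:start + length]


toy_db = {"TOY1": "ACDEFGHIK"}  # hypothetical one-chain database
hits = all_stretches(toy_db, 5)
print(len(hits), stretch_sequence(toy_db, hits[0], 5))  # prints: 5 ACDEF
```

Every search command then keeps a subset of these stretches, and that subset is what gets stored as a group.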

The experienced user will see that there is some overlap between the groups described in this chapter and the DG*** groups described in the structure fragment chapter.

Inspecting the database (SHOENT)

The database consists of sequence files, frequency tables, secondary structure files, coordinate files, pointer files, etc. All relevant information stored in these files can be shown at the terminal.

The following commands allow you to look at specific parts of the database: NOW INDEX SHOHED SHOHDS SDBHST SEQINF. Some of these commands use internal parameters, but you do not have to worry about those; the defaults are set such that you will most likely get what you want anyway. Otherwise, see the chapter on parameter setting. All these options also write to the log file if the LOGGER option has been switched on (with the DOLOG command).

Indexing the database (INDEX)

The command INDEX shows you the PDB 4-letter identification code for all proteins presently available in the database. For every entry the number of residues, the number of atoms, the number of water molecules and the number of co-factors are shown.

Indexing the database and parameters (NOW)

NOW is an extended version of INDEX. It not only gives all output that would be given by INDEX, but also shows the values of the parameters that influence the way the database files are dealt with. You do not have to worry too much about these parameters; the defaults are such that the program does what you want 99 percent of the time anyway.

Inspecting PDB file header (SHOHED)

When you type SHOHED you will be prompted for the PDB 4-letter identification code of the protein you want to see. If you do not know this code, you can use the INDEX command to get a list of all codes presently in the database. The header part of the PDB file for this entry will then be shown at the terminal.

Inspecting all PDB file headers (SHOHDS)

The command SHOHDS causes WHAT IF to show the HEADER, COMPND, SOURCE, and AUTHOR records for all entries in the database at the terminal. For every entry, the number of residues, the number of atoms, the number of water molecules, and the number of co-factors are also shown.

Inspecting 3-d database sequences (SEQINF)

This is a very crude option. It lists all sequences presently in the database. You can reduce this to one sequence at a time by using parameter 4 (see the chapter on setting parameters), but in that case the SDBHST option does virtually the same thing. However, SEQINF can also show frequency distributions and neighbour matrices if you set the parameters correctly.

Inspecting database helix/sheet/turn determinations (SDBHST)

WHAT IF has stored a secondary structure determination for every amino acid in the database. These determinations were made by the program DSSP (written by Kabsch and Sander, see appendix D). This option allows you to see these secondary structure determinations. You will be prompted for the name of the entry. If you do not know these codes, you can use the INDEX command to get a list of all of them presently available in the database. WHAT IF will show the sequence in one-letter code, and the secondary structure element code as given by DSSP. If you do not know this program's principles, I suggest you read the article, but in short: no code means random coil; H means normal helix; 3 or G both mean 3-10 helix; T means turn; and E and S mean sheet. See also appendix D.

Inspecting database accessibilities (SDBACC)

The command SDBACC will cause WHAT IF to prompt you for the name of the entry. If you do not know these codes, you can use the INDEX command to get a list of all of them presently available in the database. WHAT IF will show the sequence in three-letter code, and the total accessibility of every residue. After every 15th residue the screen stops scrolling until you hit return. Please read the chapter on accessibilities for some pitfalls with accessibility calculations.

Inspecting database torsion angles (SDBCHI)

The command SDBCHI will cause WHAT IF to prompt you for the name of the entry. If you do not know these codes, you can use the INDEX command to get a list of all of them presently available in the database. WHAT IF will show the sequence in three-letter code, and all torsion angles per residue. As 7 database files need to be inspected for this option, it will take a few seconds before the listing of the angles starts. The screen will stop scrolling after every 15th hit. Hit return to continue scrolling.

Scanning the database

The relational database in WHAT IF does not use a fancy language like SQL for its queries. Instead, the user has to make the relations. The general principle is that you look for one characteristic and store the result in something called a group. Thereafter you search for a second characteristic, store this result in another group, etc. Then you make the relations by means of logical operations on these groups. The normal logical operations AND, OR, XOR, etc. are available. This section describes everything you can ask the database; the next section describes the usage of these logical operations.

Introduction

You can retrieve any sequence from the database with full flexibility. See the last part of this chapter for how to set the length of the stretches searched for, and for how to define groups of amino acids under a common name. See the previous section for inspection of the sequences in the database.

Search sequences (SEQUEN)

The command SEQUEN will cause WHAT IF to ask you N times (N is the length of the stretches searched for) 'Give the amino acid at position *'. For every position in the search string you can just give return, meaning that every amino acid is allowed there. You can also type one or more amino acids in three-letter code. If you type more than one amino acid, every typed amino acid is acceptable at this position in the stretches to be found. You may also mix in one or more of the so-called self made amino acids (see the last section of this chapter).

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

After the very fast search, WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that at present at most 10 groups can be active at one time. If you do not want to store the hits, just give group number 0.
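The SEQUEN query and its mismatch parameter can be sketched as a per-position filter. This is an illustrative sketch, not WHAT IF code: it uses one-letter codes for brevity (WHAT IF asks for three-letter codes), and the function names and toy database are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def matches_sequence(stretch: str, allowed: List[Set[str]], mismatch: int) -> bool:
    """True if at most `mismatch` positions violate the per-position query.

    allowed[i] holds the acceptable residues at position i; an empty set
    stands for 'just return', i.e. any residue is acceptable there.
    """
    if len(stretch) != len(allowed):
        raise ValueError("query length must equal the stretch length")
    misses = sum(1 for res, ok in zip(stretch, allowed) if ok and res not in ok)
    return misses <= mismatch


def search(db: Dict[str, str], allowed: List[Set[str]], mismatch: int) -> List[Tuple[str, int]]:
    """Scan every stretch of len(allowed) residues in a toy database."""
    n = len(allowed)
    return [
        (code, i)
        for code, seq in db.items()
        for i in range(len(seq) - n + 1)
        if matches_sequence(seq[i:i + n], allowed, mismatch)
    ]


# Cys at positions 1 and 4, Lys or Arg at position 5, anything elsewhere.
query = [{"C"}, set(), set(), {"C"}, {"K", "R"}]
print(search({"TOY": "CAACKCAACR"}, query, 0))  # [('TOY', 0), ('TOY', 5)]
```

Raising the mismatch parameter to 1 would also accept, for example, "CAAAK", which differs from the request only at position 4.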

Search for secondary structure (HELSHT)

The name of this command is a little bit too humble. It does not only allow you to search for helix and/or sheet, but also for turns and coil. If you type HELSHT, WHAT IF will loop over all positions in the search string, and for every position ask which of the codes H, S, T, C and * (helix, sheet, turn, coil, anything) are allowed there. You can give any combination of these 5 characters. If you give, for example, H at position 3, all stretches lifted from the database will be helical at the third position. If you give CT at position 6, all stretches lifted from the database will be either turn or coil at the sixth position. If you give * at any position, every type of secondary structure will be allowed at that position.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.
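The HELSHT pattern and the mismatch count can be sketched as follows, assuming each database residue carries one of the codes H, S, T or C. The function name is hypothetical and this is not how WHAT IF stores its data internally.

```python
from typing import List


def matches_sse(sse: str, pattern: List[str], mismatch: int) -> bool:
    """Compare per-residue codes (H, S, T or C) with a HELSHT-style pattern.

    pattern[i] is the string typed at position i, e.g. "H", "CT" or "*".
    """
    if len(sse) != len(pattern):
        raise ValueError("pattern length must equal the stretch length")
    misses = 0
    for code, allowed in zip(sse, pattern):
        if "*" in allowed:
            continue  # every secondary structure type is allowed here
        if code not in allowed:
            misses += 1
    return misses <= mismatch


# Helix at position 3 and turn or coil at position 6, as in the text above.
pattern = ["*", "*", "H", "*", "*", "CT"]
print(matches_sse("CCHHTT", pattern, 0), matches_sse("CCSHHH", pattern, 0))  # True False
```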

Search for phi-psi combinations (PSIPHI)

The command PSIPHI will cause WHAT IF to loop over the length of the search string, and for every position prompt you for the limits of phi and psi at this position. Here you have to give 4 values, all in the range -180.0 to 180.0. The first two are the lower and upper limit for phi, the last two are the lower and upper limit for psi.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.
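The per-position limit test behind PSIPHI can be sketched as a range check with a mismatch counter; the same pattern would apply to the two-value limits of OMEGA, ACCVAL and SCNHYD. This sketch assumes lower <= upper and does not handle ranges that wrap around +/-180 degrees, because the chapter does not say how WHAT IF treats them. The helix limits are hypothetical.

```python
from typing import List, Tuple

Limits = Tuple[float, float, float, float]  # phi_lo, phi_hi, psi_lo, psi_hi


def matches_phipsi(angles: List[Tuple[float, float]], limits: List[Limits], mismatch: int) -> bool:
    """Count positions whose (phi, psi) falls outside the requested box."""
    if len(angles) != len(limits):
        raise ValueError("limits must be given for every position")
    misses = 0
    for (phi, psi), (phi_lo, phi_hi, psi_lo, psi_hi) in zip(angles, limits):
        # Assumes lower <= upper; wrap-around ranges are not handled here.
        if not (phi_lo <= phi <= phi_hi and psi_lo <= psi <= psi_hi):
            misses += 1
    return misses <= mismatch


helix_box = (-90.0, -30.0, -70.0, -10.0)  # hypothetical limits
angles = [(-60.0, -45.0), (-65.0, -40.0), (-120.0, 130.0)]
print(matches_phipsi(angles, [helix_box] * 3, 0), matches_phipsi(angles, [helix_box] * 3, 1))  # False True
```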

Search for omega values (OMEGA)

The command OMEGA will cause WHAT IF to loop over the length of the search string, and for every position prompt you for the limits on omega at this position. Here you have to give 2 values, the lower and upper limit, both in the range -180.0 to 180.0.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

Search for conserved residues (SCNCNS)

The command SCNCNS will cause WHAT IF to prompt you for the degree of conservation at each position in the search profile. Conservation 0 (zero) means that every residue is allowed and possible at this position; conservation 100 means that only one residue type is found at this position. Conservation is measured by HSSP.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

Search for position in the chain (SCNPOS)

The command SCNPOS will cause WHAT IF to prompt you for the absolute and fractional position in the chain. The absolute position indicates where in the chain the first residue of the search stretch is allowed to be; use negative numbers to indicate a distance to the C-terminus. The fractional position (values from 0.0 to 1.0) indicates in which part of the protein the first residue of the search stretch is allowed to be.

Examples: to find all C-terminal helical stretches, one would combine a HELSHT search with an SCNPOS search with absolute range -1 to -2 (which leaves one residue free at the end) and fractional range 0.0 to 1.0.

To find all cysteines in C-terminal domains, one would combine a SEQUEN search with an SCNPOS search with absolute range 80 to 1000 and fractional range 0.5 to 1.0.

Search for backbone H-bonds (SCNHBO)

The command SCNHBO will cause WHAT IF to ask you, for every backbone nitrogen and oxygen in each residue in the search stretch, whether it should be hydrogen bonded. If you answer yes, it will ask for the secondary structure type of the residue it is hydrogen bonded to. (Here, as usual, you can answer H, S, T, C, any combination of these, or *.)

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

Search for side chain chi-angles (CHIVAL)

The command CHIVAL will cause WHAT IF to prompt you for the number of the side chain chi-angle. Give 1 to 5 for chi-1, chi-2, chi-3, chi-4 or chi-5, respectively. WHAT IF will then loop over the length of the search string, and for every position prompt you for the limits on the requested chi-angle. If the database amino acid does not have this chi-angle (there is, for example, no chi-4 in alanine), then this database hit is not acceptable. The only way to accept everything is to just hit return when the defaults of -180 180 are suggested: if you just hit return, every amino acid is acceptable at this position, but if you actually retype -180 180, only amino acids that possess this chi-angle are acceptable. If you do not use the suggested default, you have to give 2 values, both in the range -180.0 to 180.0.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.
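The subtle difference between hitting return and retyping -180 180 in CHIVAL can be made explicit with two kinds of "missing" value. In this hypothetical sketch, a limit of None stands for "just hit return", and a chi value of None stands for a residue that lacks the requested chi-angle; the function names are illustrative only.

```python
from typing import List, Optional, Tuple

Limit = Optional[Tuple[float, float]]  # None: the user just hit return


def chi_ok(chi: Optional[float], limit: Limit) -> bool:
    """chi is None when the residue lacks this chi-angle (e.g. chi-4 of Ala)."""
    if limit is None:
        return True           # plain return: every amino acid is acceptable
    if chi is None:
        return False          # explicit limits require the angle to exist
    lo, hi = limit
    return lo <= chi <= hi


def matches_chi(chis: List[Optional[float]], limits: List[Limit], mismatch: int) -> bool:
    misses = sum(1 for chi, limit in zip(chis, limits) if not chi_ok(chi, limit))
    return misses <= mismatch


# Retyping -180 180 rejects a residue without the angle; hitting return accepts it.
print(chi_ok(None, (-180.0, 180.0)), chi_ok(None, None))  # False True
```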

Searching for accessibility values (ACCVAL)

The command ACCVAL will cause WHAT IF to loop over the length of the search string, and for every position prompt you for the limits of the surface accessibility of the residue at this position. Here you have to give 2 values, which are the lower and upper limit for the total surface accessibility of the residue at that position. Read the chapter on accessibility calculations for some pitfalls with accessibility values.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

Searching related sequences using the Dayhoff scoring matrix (DAYHOF)

The command DAYHOF will activate the sequence search option that accepts only amino acids which, according to the Dayhoff scoring matrix, are worth more than a certain number of points. The following scoring matrix is used:


   V  L  I  M  F  W  Y  G  A  P  S  T  C  H  R  K  Q  E  N  D
V  5  2  2  1  0 -1  0 -1  0 -1 -1  0 -2 -1 -1 -1 -1 -1 -2 -2
L  2  5  2  3  2  0  0 -2 -1 -2 -1  0 -3 -1 -1 -1  0 -2 -2 -2
I  2  2  5  2  0  0  0 -2 -1 -2 -1  0 -2 -1 -2 -2 -3 -2 -2 -3
M  1  3  2  5  2 -2 -1 -2  0 -2 -1  0  0  0 -2 -2  0 -2 -1 -1
F  0  2  0  2  6  3  3 -3 -2 -2 -1 -2 -3  1 -2 -3 -3 -3 -3 -2
W -1  0  0 -2  3  6  3 -2 -2 -3  0 -1 -1  0  0 -2 -1 -2 -3 -3
Y  0  0  0 -1  3  3  6 -3 -2 -3  0 -2 -2  1 -1 -2 -2 -1 -1 -2
G -1 -2 -2 -2 -3 -2 -3  5  0  0  0 -1 -2 -1  0  0 -1  0  0  0
A  0 -1 -1  0 -2 -2 -2  0  5  1  1  0 -2  0 -1  0  0  1  0  0
P -1 -2 -2 -2 -2 -3 -3  0  1  5  0  0 -3  0  0  0  0  1 -2  0
S -1 -1 -1 -1 -1  0  0  0  1  0  5  2 -1  0  1  0  1  1  2  0
T  0  0  0  0 -2 -1 -2 -1  0  0  2  5 -1  1  0  0  0  1  0  0
C -2 -3 -2  0 -3 -1 -2 -2 -2 -3 -1 -1  6  0 -2 -3 -3 -3 -2 -2
H -1 -1 -1  0  1  0  1 -1  0  0  0  1  0  5  2  1  1 -1  1  1
R -1 -1 -2 -2 -2  0 -1  0 -1  0  1  0 -2  2  5  2  2  0  0 -2
K -1 -1 -2 -2 -3 -2 -2  0  0  0  0  0 -3  1  2  5  1  1  1  0
Q -1  0 -3  0 -3 -1 -2 -1  0  0  1  0 -3  1  2  1  5  2  1  1
E -1 -2 -2 -2 -3 -2 -1  0  1  1  1  1 -3 -1  0  1  2  5  1  2
N -2 -2 -2 -1 -3 -3 -1  0  0 -2  2  0 -2  1  0  1  1  1  5  2
D -2 -2 -3 -1 -2 -3 -2  0  0  0  0  0 -2  1 -2  0  1  2  2  5

This means that if you request an aspartic acid at a certain position in the search string, and say that the score should be at least 2 points, then glutamic acid, asparagine and aspartic acid are acceptable at this position.

You will first be prompted for the average Dayhoff score. This is simply the average of the scores over all positions in the search string. Thereafter you will be prompted, one by one, for the residue at each position in the search string and its minimal Dayhoff score. If a certain residue is allowed to be anything, just give any residue and -100, or something else very negative, as the requested minimal score.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.
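The acceptance rule of DAYHOF can be checked directly against the matrix above. The sketch below transcribes that matrix (one-letter codes, row = requested residue) and reproduces the aspartic acid example. How WHAT IF combines the average-score criterion with the per-position minima and the mismatch parameter is not spelled out in this chapter, so `matches_dayhoff` is an assumption, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# The Dayhoff matrix printed above; row = requested residue, column = database residue.
_MATRIX = """
   V  L  I  M  F  W  Y  G  A  P  S  T  C  H  R  K  Q  E  N  D
V  5  2  2  1  0 -1  0 -1  0 -1 -1  0 -2 -1 -1 -1 -1 -1 -2 -2
L  2  5  2  3  2  0  0 -2 -1 -2 -1  0 -3 -1 -1 -1  0 -2 -2 -2
I  2  2  5  2  0  0  0 -2 -1 -2 -1  0 -2 -1 -2 -2 -3 -2 -2 -3
M  1  3  2  5  2 -2 -1 -2  0 -2 -1  0  0  0 -2 -2  0 -2 -1 -1
F  0  2  0  2  6  3  3 -3 -2 -2 -1 -2 -3  1 -2 -3 -3 -3 -3 -2
W -1  0  0 -2  3  6  3 -2 -2 -3  0 -1 -1  0  0 -2 -1 -2 -3 -3
Y  0  0  0 -1  3  3  6 -3 -2 -3  0 -2 -2  1 -1 -2 -2 -1 -1 -2
G -1 -2 -2 -2 -3 -2 -3  5  0  0  0 -1 -2 -1  0  0 -1  0  0  0
A  0 -1 -1  0 -2 -2 -2  0  5  1  1  0 -2  0 -1  0  0  1  0  0
P -1 -2 -2 -2 -2 -3 -3  0  1  5  0  0 -3  0  0  0  0  1 -2  0
S -1 -1 -1 -1 -1  0  0  0  1  0  5  2 -1  0  1  0  1  1  2  0
T  0  0  0  0 -2 -1 -2 -1  0  0  2  5 -1  1  0  0  0  1  0  0
C -2 -3 -2  0 -3 -1 -2 -2 -2 -3 -1 -1  6  0 -2 -3 -3 -3 -2 -2
H -1 -1 -1  0  1  0  1 -1  0  0  0  1  0  5  2  1  1 -1  1  1
R -1 -1 -2 -2 -2  0 -1  0 -1  0  1  0 -2  2  5  2  2  0  0 -2
K -1 -1 -2 -2 -3 -2 -2  0  0  0  0  0 -3  1  2  5  1  1  1  0
Q -1  0 -3  0 -3 -1 -2 -1  0  0  1  0 -3  1  2  1  5  2  1  1
E -1 -2 -2 -2 -3 -2 -1  0  1  1  1  1 -3 -1  0  1  2  5  1  2
N -2 -2 -2 -1 -3 -3 -1  0  0 -2  2  0 -2  1  0  1  1  1  5  2
D -2 -2 -3 -1 -2 -3 -2  0  0  0  0  0 -2  1 -2  0  1  2  2  5
"""

_ROWS = _MATRIX.strip().splitlines()
AMINO_ACIDS = _ROWS[0].split()
DAYHOFF: Dict[Tuple[str, str], int] = {}
for _line in _ROWS[1:]:
    _label, *_values = _line.split()
    if len(_values) != len(AMINO_ACIDS):
        raise ValueError(f"row {_label} has {len(_values)} values")
    for _col, _val in zip(AMINO_ACIDS, _values):
        DAYHOFF[(_label, _col)] = int(_val)


def acceptable(requested: str, min_score: int) -> Set[str]:
    """Residues that score at least min_score against the requested one."""
    return {aa for aa in AMINO_ACIDS if DAYHOFF[(requested, aa)] >= min_score}


def matches_dayhoff(stretch: str, requested: str, min_scores: List[int],
                    min_average: float, mismatch: int) -> bool:
    """Assumed combination: per-position minima with mismatches, plus an average."""
    scores = [DAYHOFF[(r, s)] for r, s in zip(requested, stretch)]
    misses = sum(1 for sc, lim in zip(scores, min_scores) if sc < lim)
    return misses <= mismatch and sum(scores) / len(scores) >= min_average


print(sorted(acceptable("D", 2)))  # ['D', 'E', 'N']
```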

Searching for turns (TURNTP)

The command TURNTP allows you to search for turns of a certain type. Turns are defined as a reversal of the chain direction over four residues, called I, I+1, I+2 and I+3. Normally a hydrogen bond is found between residues I and I+3. WHAT IF will prompt you for the turn type. You should give one of the following names:

I IP II IIP VIA VIB VIII IV

according to the nomenclature of Wilmot and Thornton, J. Mol. Biol. (1988) 203, 221-232 (where P stands for ' or prime).

The following limitations will now be placed on the phi and psi angles of the residues I+1 and I+2:

                      PHI1 PSI1   PHI2 PSI2
I   : TYPE I    TURN ( -60  -30    -90   0)
IP  : TYPE I`   TURN (  60   30     90   0)
II  : TYPE II   TURN ( -60  120     80   0)
IIP : TYPE II`  TURN (  60 -120    -80   0)
VIA : TYPE VIA  TURN ( -60  120    -90   0)
VIB : TYPE VIB  TURN (-120  120    -60   0)
VIII: TYPE VIII TURN ( -60  -30   -120 120)
IV  : TYPE IV   TURN ( ALL OTHERS)

Type IV is only there for completeness. Using it means that you actually get the whole database as result.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.
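The table of ideal phi/psi values for residues I+1 and I+2 can be turned into a simple classifier. The sketch below is illustrative: the tolerance of 30 degrees is a hypothetical choice (the limits WHAT IF actually places around these ideal values are not given here), the first matching type wins, and anything unmatched is reported as type IV.

```python
from typing import Dict, Tuple

# Ideal (phi1, psi1, phi2, psi2) for residues I+1 and I+2, from the table above.
IDEAL: Dict[str, Tuple[float, float, float, float]] = {
    "I": (-60, -30, -90, 0),
    "IP": (60, 30, 90, 0),
    "II": (-60, 120, 80, 0),
    "IIP": (60, -120, -80, 0),
    "VIA": (-60, 120, -90, 0),
    "VIB": (-120, 120, -60, 0),
    "VIII": (-60, -30, -120, 120),
}

TOLERANCE = 30.0  # hypothetical tolerance in degrees


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def turn_type(phi1: float, psi1: float, phi2: float, psi2: float) -> str:
    observed = (phi1, psi1, phi2, psi2)
    for name, ideal in IDEAL.items():
        if all(angle_diff(o, i) <= TOLERANCE for o, i in zip(observed, ideal)):
            return name
    return "IV"  # 'all others'


print(turn_type(-60, -30, -90, 0), turn_type(-65, -25, -115, 125))  # I VIII
```

Using the wrapped angle difference means that, for example, 179 and -179 degrees count as 2 degrees apart rather than 358.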

Looking for hydrophobic moment values (SCNHYD)

For a full description of hydrophobic moment calculations see the chapter on this item.

WHAT IF has the hydrophobic moments of all proteins in the database available on-line, calculated with a repeat angle of 100 degrees and a window width of 7.

The command SCNHYD will cause WHAT IF to loop over the length of the search string, and for every position prompt you for the limits on the hydrophobic moment at this position. Here you have to give 2 values, the lower and upper limit. All values are allowed; normal values fall in the range 0.0 to 0.5.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.
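For orientation, a hydrophobic moment over a window of 7 residues with a repeat angle of 100 degrees can be written in the Eisenberg form below. This is an assumption, not necessarily WHAT IF's exact formula: h_n is the hydrophobicity of residue n on some scale, the window is taken as centred on residue k, and the division by the window width is a guess consistent with the quoted 0.0 to 0.5 range. See the chapter on hydrophobic moments for the definition actually used.

```latex
\mu_H(k) = \frac{1}{7}\sqrt{\Bigl(\sum_{n=k-3}^{k+3} h_n \sin(n\delta)\Bigr)^2
         + \Bigl(\sum_{n=k-3}^{k+3} h_n \cos(n\delta)\Bigr)^2},
\qquad \delta = 100^\circ
```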

Finding atomic contacts (SCNCON)

This option is absolutely amazing. However, it is relatively slow, and not very user friendly. The idea is to find atomic contacts. Together with the other SCAN3D options, it allows you, for example, to look at all buried glutamic acids whose O-epsilons are not in contact with a basic nitrogen.

In order to use this option, you have to type a lot. For every position in the search string, you will be prompted for the amino acid(s). If you give one amino acid, you can use the subsequent question about which atoms to use to specify individual atoms. If you give multiple amino acids, you can only answer `SIDE-CHAINS` or `BACK-BONE` when asked for the atoms. After the amino acid is known, the same questions will be repeated for the residues with which there should be a contact; the same kind of answers should be given as for the residues searched for. The last question per position in the search string is the contact distance. This is the distance between the atom centers minus the two Van der Waals radii, so for just touching atoms, give zero. Since the database does not contain hits where this distance is larger than 1.5 Angstrom, it is useless (but not fatal) to give very large numbers. You can also give negative distances to detect `bumps`.
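The contact distance used by SCNCON is simple arithmetic on coordinates and radii. In the sketch below the van der Waals radii are hypothetical (Bondi values), since the radii WHAT IF uses are not listed in this chapter, and the function names are illustrative only.

```python
import math
from typing import Sequence

# Hypothetical van der Waals radii in Angstrom (Bondi values).
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def contact_distance(xyz_a: Sequence[float], xyz_b: Sequence[float],
                     elem_a: str, elem_b: str) -> float:
    """Centre-to-centre distance minus both van der Waals radii."""
    return math.dist(xyz_a, xyz_b) - VDW_RADIUS[elem_a] - VDW_RADIUS[elem_b]


def in_contact(xyz_a: Sequence[float], xyz_b: Sequence[float],
               elem_a: str, elem_b: str, cutoff: float) -> bool:
    """cutoff 0 means just touching; a negative cutoff only accepts bumps."""
    return contact_distance(xyz_a, xyz_b, elem_a, elem_b) <= cutoff


# An oxygen and a nitrogen 2.8 Angstrom apart overlap by about 0.27 Angstrom.
d = contact_distance((0.0, 0.0, 0.0), (2.8, 0.0, 0.0), "O", "N")
print(round(d, 2))  # -0.27
```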

The last thing you will be prompted for is the database range. Just hit return to use the whole range. If you use the full database (approximately 100 proteins), the average search will take roughly 20 seconds of CPU time on a VAX workstation.

After the search, WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that at present at most 10 groups can be active at one time. If you do not want to store the hits, just give group number 0.

Cysteines and cys-cys bridges (SCNCYS)

The command SCNCYS will cause WHAT IF to prompt you for the cysteine status for each of the positions in the search string. At every position you can give one of the following:

 * (asterisk) if this residue is completely free.
-2 if this residue should not be a cysteine.
-1 if this residue should be an unpaired cysteine.
 N if this residue should be a paired cysteine.

N can be zero if you don't care how far down the sequence the partner cysteine is. If you give, for example, 4, you are going to search for cysteines that are paired with a cysteine four residues further along the same sequence. So, this is one of the exceptions where zero is valid input...
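The SCNCYS status codes can be summarised as a small decision function. In this hypothetical sketch, `partner_offset` is the sequence distance (partner index minus this index) to the bridged cysteine, or None for an unpaired residue; residues are given as one-letter codes.

```python
from typing import Optional, Union

Status = Union[str, int]  # "*", -2, -1, or N >= 0


def cys_position_ok(status: Status, residue: str, partner_offset: Optional[int]) -> bool:
    """Apply one SCNCYS status code to a single database residue."""
    if status == "*":
        return True                      # completely free
    if not isinstance(status, int) or status < -2:
        raise ValueError(f"invalid SCNCYS status: {status!r}")
    if status == -2:
        return residue != "C"            # must not be a cysteine
    if residue != "C":
        return False                     # -1 and N both require a cysteine
    if status == -1:
        return partner_offset is None    # unpaired cysteine
    if partner_offset is None:
        return False                     # N >= 0 requires a bridge
    return status == 0 or partner_offset == status


print(cys_position_ok(4, "C", 4), cys_position_ok(4, "C", 5))  # True False
```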

Relating groups with logical operations

Creating groups

How to create groups is explained earlier in this chapter. The things one can do with groups should have been explained first, since what can be done with them determines, in a sense, how one wants to go about making them. A group, also sometimes called a subgroup, is a set of peptides all having the same fixed number of amino acids in them. Such a group is the result of searching through the database. Upon searching, the program finds a certain number of hits. All hits are stored, and the user can look at them. A very simple group would, for example, be: all stretches of 8 amino acids containing an alanine. This would generate a group with several thousand hits in it.

These groups can then be combined by means of logical operations.

These operations are AND, OR, NOT and XOR. To use one of them, type SANDOR.

The user has several options available to look in or at groups. These options are SHOGRP (shows all groups made) and SHOHIT (shows the hits in a group). The option INIGRP is available to clear all groups, and SETLEN can be used to vary the length of the stretches searched for.

Combining groups by logical operators (SANDOR)

After the command SANDOR is given, the program asks for the number of the first group and thereafter for the number of the second group; both times you should give a number from 1 to 10, being the number of one of the groups you generated earlier. You will then be prompted for the number of the group that will receive the result. If this result group is already in use, you get the choice to overwrite it, or to make another choice. Then you will be prompted for the logical operation. Here you should give one of the following:

AND OR NOT XOR

These operations do the following:

And

AND creates a new group consisting of all hits that both groups on which it operates have in common.

Or

OR creates a new group which consists of all hits that are present in at least one of the two groups on which it operates.

Not

NOT creates a new group consisting of all hits that are present in one of the two groups on which it operates, but not in the other.

Xor

XOR creates a new group which consists of all hits that are present in the first of the two groups on which it operates, but not in the second. (I don't think that this operator will be used very often.)
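Since a group is a set of hits, the four operations map onto Python set operators. This sketch keys each hit on (protein code, first residue), following the rule stated under SETLEN that WHAT IF only compares the first amino acid of two stretches. The NOT and XOR branches implement the definitions exactly as they are given above. The function name and sample hits are hypothetical.

```python
from typing import Set, Tuple

Hit = Tuple[str, int]  # (protein code, index of the first residue)


def sandor(first: Set[Hit], second: Set[Hit], op: str) -> Set[Hit]:
    """Combine two groups as the operators are defined in this chapter."""
    op = op.upper()
    if op == "AND":
        return first & second   # hits common to both groups
    if op == "OR":
        return first | second   # hits in at least one group
    if op == "NOT":
        return first ^ second   # hits in one group but not in the other
    if op == "XOR":
        return first - second   # hits in the first group but not the second
    raise ValueError(f"unknown operation: {op}")


a = {("P1", 1), ("P1", 2)}
b = {("P1", 2), ("P2", 5)}
print(sorted(sandor(a, b, "AND")))  # [('P1', 2)]
```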

Inverting a group of hits (SCNINV)

The command SCNINV will prompt you for an input group number and an output group number. These may be the same. It will then invert the input group, and store the result in the output group. If you do a logical OR on the input and the output group of this option, you get the whole database back.
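Inverting a group amounts to taking its complement within all stretches of the current length, which is why OR-ing a group with its inverse returns the whole database. A minimal sketch with a hypothetical toy database and illustrative function names:

```python
from typing import Dict, Set, Tuple

Hit = Tuple[str, int]  # (protein code, index of the first residue)


def all_stretches(db: Dict[str, str], length: int) -> Set[Hit]:
    """Every stretch of the given length in a toy database."""
    return {(code, i) for code, seq in db.items() for i in range(len(seq) - length + 1)}


def scninv(group: Set[Hit], db: Dict[str, str], length: int) -> Set[Hit]:
    """Every stretch of the current length that is NOT in `group`."""
    return all_stretches(db, length) - group


toy_db = {"TOY": "ACDEFGH"}  # 7 residues, so 3 stretches of length 5
inverse = scninv({("TOY", 1)}, toy_db, 5)
print(sorted(inverse))  # [('TOY', 0), ('TOY', 2)]
```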

Evaluation of results

There are several ways of inspecting the hits found. They fall into two categories: listing them at the terminal, or showing them graphically.

Showing groups (SHOGRP)

The command SHOGRP shows you which groups have been generated so far. Also shown is how each group was generated, and how many hits there are in each group.

Showing hits in groups (SHOHIT)

The command SHOHIT causes WHAT IF to prompt you for the number of the group. You should then give the number of one of the earlier generated groups. Don't worry if you forgot the group numbers: if you type something wrong, WHAT IF will at worst tell you so, but it will not crash. After the group number has been accepted, you will be prompted for the range of hits; just give the numbers of the first and the last hit you want to see. Be ready to use the no-scroll option on your terminal, because for all requested hits the program will show you the protein in which the hit was found, the sequence numbers of the hit in this protein, the actual sequence, and the secondary structure of this stretch as determined by DSSP (see appendix D). If you are looking at a group of fragments created by one of the DG*** options, you will also get the RMS deviation between the C-alpha coordinates of the hit and the stretch on top of which it was fitted. However, DG*** groups cannot yet be fully mixed with 'normal' groups.

Resetting the groups (INIGRP)

If you want a fresh start, or for any other reason want to get rid of all the groups you have created so far, you can use the command INIGRP. Be careful with this command: it works immediately, and it is irreversible. You can, of course, always create all removed groups over again.

Displaying hits

There are several ways to display hits.

Displaying hits one by one (SCNGRA)

The command SCNGRA will cause WHAT IF to prompt you for the number of a group. Thereafter you will be asked how many hits you want to see; at present you can give at most 100 hits. These hits will be coloured red, all superimposed on the first structure (which will sometimes look strange), centered at the present center of the PS300, and placed in ridge. If you now rotate the lower right dial of the dial box (the one labeled ridge), you will flip through the hits.

Display all hits on top of each other (SCNGRL)

The command SCNGRL will cause WHAT IF to prompt you for the number of a group. Thereafter you will be asked how many hits you want to see. At present you can give as many hits as you wish, but strange things will happen if all these hits together contain more than 2500 amino acids. These hits will be coloured from blue to red as a function of their position in the hit list, all superimposed on the first structure (which will sometimes look strange), centered at the present center of the PS300, and placed in a MOL-item. You will be prompted for the number of the MOL-object and the name of the MOL-item.

Display all hit environments (SCNGRN)

WARNING! This option only works as expected when the middle residues of all hits in a group are the same amino acid type (all alanines, or all cysteines, etc.).

The command SCNGRN will cause WHAT IF to prompt you for the number of a group. Thereafter you will be asked how many hits you want to see. At present you can give as many hits as you wish, but strange things will happen if all these hits together contain more than 2500 amino acids. These hits will be coloured from blue to red as a function of their position in the hit list, all superimposed on the first structure (which will sometimes look strange), centered at the present center of the PS300, and placed in a MOL-item. In order to superimpose the structures, you will be asked which atoms to use for superpositioning. The parameter setting menu can be used to determine which parts of the hit and its environment will be shown.

The option needs a lot of input. You will be prompted for the atoms in the central residue that should make the contact, for the atoms in the central residue that should be used for superpositioning, for the neighbouring residues, and for the atoms in the neighbours that make the contact. This may seem somewhat redundant, because you already typed all this for the previously run SCNCON option, but I have great plans for these options in the future, and when those are ready, you will understand why.

After roughly 10 seconds of CPU time on a VAX workstation, you will be prompted for the number of the MOL-object and the name of the MOL-item.

Display the environment of a hit (SCNGRE)

The command SCNGRE will cause WHAT IF to prompt you for a group and a range of hits. It will also prompt you for the atoms in the central residue to be used for superpositioning at the screen, for the atoms in the central residue to be shown, etc. This is the same procedure as for the SCNGRN option.

After all the input, all hits will be shown on the PS300 screen, with their entire environments present.

This option is not yet tested.

Other commands

Changing the group length (SETLEN)

The length of the amino acid stretches searched for is always a fixed number. I know that this is not optimally flexible for the user, but I could not work out a method with flexible stretch lengths that would work just as fast as it does now. If you do not like the length of these stretches, you can use the SETLEN command. Executing this command takes several seconds, because WHAT IF has to read many new pointer files.

It is possible to do logical operations on groups that contain stretches of different lengths. The program only looks at the first amino acid: if these are the same, meaning that it is the same amino acid at the same location in the same protein, then the stretches are the same for WHAT IF.

Defining an extra self made amino acid (SETEAA)

The command SETEAA allows you to create one extra self made amino acid. This one is called the user defined self made amino acid. You will be prompted for a three-letter code under which this self made amino acid should be known. This three-letter code should of course not be one of the existing real or self made amino acids. After the name is accepted, you will be prompted for the names of the amino acids which make up this user defined self made amino acid. Here you can only give the twenty real amino acids. If you later would like to change this user defined self made amino acid, you can just use the SETEAA command again. The first thing SETEAA does is remove the old user defined self made amino acid, if it exists.

Listing the self made amino acids (SHOEAA)

The command SHOEAA shows you all presently set self made amino acids. It also shows the user defined self made amino acid if there is one defined already. The following self made amino acids are predefined:

BIG  TRP + TYR + PHE + HIS + ARG + LYS + MET
SML  GLY + ALA + SER
POS  ARG + LYS
NEG  GLU + ASP
POL  ARG + LYS + GLU + ASP + GLN + ASN + HIS

If you want more, you should ask Gert Vriend, but ask very nicely, because it means at least an hour of work.
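The self made amino acids behave like named residue sets that SEQUEN expands before searching. The sketch below models the five predefined sets listed above plus one user defined set, including the rule that SETEAA first removes the old user definition. The class and method names are hypothetical, not WHAT IF internals.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

REAL = frozenset(
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL".split()
)

PREDEFINED: Dict[str, FrozenSet[str]] = {
    "BIG": frozenset({"TRP", "TYR", "PHE", "HIS", "ARG", "LYS", "MET"}),
    "SML": frozenset({"GLY", "ALA", "SER"}),
    "POS": frozenset({"ARG", "LYS"}),
    "NEG": frozenset({"GLU", "ASP"}),
    "POL": frozenset({"ARG", "LYS", "GLU", "ASP", "GLN", "ASN", "HIS"}),
}


class SelfMade:
    """Predefined self made amino acids plus at most one user defined one."""

    def __init__(self) -> None:
        self.user: Optional[Tuple[str, FrozenSet[str]]] = None

    def seteaa(self, code: str, members: Iterable[str]) -> None:
        self.user = None  # the old user defined one is removed first
        code = code.upper()
        if code in REAL or code in PREDEFINED:
            raise ValueError(f"{code} is already an amino acid name")
        group = frozenset(m.upper() for m in members)
        if not group or not group <= REAL:
            raise ValueError("only the twenty real amino acids are allowed")
        self.user = (code, group)

    def expand(self, code: str) -> FrozenSet[str]:
        """Real residues that a (self made) three-letter code stands for."""
        code = code.upper()
        if code in REAL:
            return frozenset({code})
        if code in PREDEFINED:
            return PREDEFINED[code]
        if self.user is not None and self.user[0] == code:
            return self.user[1]
        raise KeyError(code)


s = SelfMade()
s.seteaa("ARO", ["PHE", "TYR", "TRP"])
print(sorted(s.expand("aro")))  # ['PHE', 'TRP', 'TYR']
```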

Getting hits in the soup (SCNUSE)

The command SCNUSE will cause WHAT IF to prompt you for a group and a hit number. It will then lift this hit from the database, and store it as a separate molecule at the end of the protein range of your soup.

Look alike contacts (DGCONT)

If you want to lift all contacts from the database that look like a certain contact in your protein, use the option DGCONT in the DGLOOP menu.

Screening with the Dayhoff matrix (DOSCAN)

The command DOSCAN can be used to get hits out of the database that give a minimal score against a stretch of residues in the soup when compared using the Dayhoff matrix.

You will be prompted for a range and a minimal score. All stretches in that range will be compared with all stretches in the database. Every time a hit is found that, when compared with the stretch in your molecule, gives no substitutions scoring below the given Dayhoff cutoff, one is added to the count for the protein in which that hit was found. At the end, a list with the number of hits per protein is shown.
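The DOSCAN counting scheme can be sketched as a nested loop over query stretches and database stretches. To keep the block self-contained, it takes the scoring function as a parameter; `toy_score` below is a hypothetical stand-in for the Dayhoff matrix, and all names are illustrative.

```python
from collections import Counter
from typing import Callable, Dict


def doscan(soup_range: str, length: int, db: Dict[str, str],
           score: Callable[[str, str], int], cutoff: int) -> Counter:
    """Count, per protein, database stretches with no pair scoring below cutoff."""
    queries = [soup_range[i:i + length] for i in range(len(soup_range) - length + 1)]
    counts: Counter = Counter()
    for code, seq in db.items():
        for j in range(len(seq) - length + 1):
            window = seq[j:j + length]
            for query in queries:
                if all(score(a, b) >= cutoff for a, b in zip(query, window)):
                    counts[code] += 1
    return counts


def toy_score(a: str, b: str) -> int:
    """Hypothetical scoring: identity scores 5, anything else -1."""
    return 5 if a == b else -1


print(doscan("ACD", 3, {"P1": "ACDACD", "P2": "WWWW"}, toy_score, 0))  # Counter({'P1': 2})
```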

Saving all groups (SAVGRP)

The command SAVGRP will save all present groups in a file. You will be prompted for the name of the file in which to store the groups.

Restoring groups from file (RESGRP)

The command RESGRP will restore the groups from a file. You will be prompted for the name of the file from which to restore the groups. This file must of course be created earlier with the SAVGRP option.

Parameters (PARAMS)

The command PARAMS will, as usual, bring you to the menu from which you can change the parameters for the SCAN3D related options.

Statistics on groups (SCNSTS)

The command SCNSTS brings you into the menu that allows for the evaluation of groups. The following options are available in this menu:

Show for one position in a group all statistics (ONEPRF)

The command ONEPRF will cause WHAT IF to prompt you for a group number, a hit range, and the position in the group (the first, second, etc., position in the fragments in the group). For all hits the frequency of the twenty residues at the given position is listed. Behind the frequency for each residue type, the total frequency in the database and the frequencies in the database in Helix, Strand, Turn and Coil are given. The last number is the preference parameter (that is, the natural logarithm of the expected frequency divided by the observed frequency). This preference parameter is NOT normalized for secondary structure, accessibility, etc. It is just a comparison between the frequency of occurrence of the residue type at this position in this group and the frequency that is expected from the frequency of residue types in the whole database, assuming a random distribution.
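The preference parameter can be sketched directly from the definition given above, ln(expected / observed). With this convention a residue that occurs more often than expected gets a negative value. The database frequencies in the example are hypothetical, residue types absent at the position are skipped because the logarithm is undefined for them, and the function name is illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def preference(column: List[str], db_freq: Dict[str, float]) -> Dict[str, float]:
    """ln(expected / observed) for every residue type seen at one position.

    column: the residue at the chosen position in every hit.
    db_freq: fraction of each residue type in the whole database.
    """
    counts = Counter(column)
    total = len(column)
    return {aa: math.log(db_freq[aa] / (n / total)) for aa, n in counts.items()}


prefs = preference(["A", "A", "A", "G"], {"A": 0.5, "G": 0.5})
print(round(prefs["A"], 3), round(prefs["G"], 3))  # -0.405 0.693
```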

Show minimal statistics for all positions in a group (ALLPRF)

The command ALLPRF will cause WHAT IF to prompt you for a group number and a hit range. For all hits the frequency of the twenty residues at all positions in the group is listed. The last column shows the total frequency of this residue summed over all positions in the group.
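The ALLPRF listing can be sketched as one residue counter per position plus a total column. This illustrative version uses one-letter codes and lists all twenty residue types; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_profile(hits: List[str]) -> Dict[str, List[int]]:
    """Per residue type: its count at every position, then the total."""
    length = len(hits[0])
    columns = [Counter(h[p] for h in hits) for p in range(length)]
    table: Dict[str, List[int]] = {}
    for aa in AMINO_ACIDS:
        per_position = [col[aa] for col in columns]
        table[aa] = per_position + [sum(per_position)]  # last column: total
    return table


profile = position_profile(["ACD", "AGD"])
print(profile["A"], profile["D"])  # [2, 0, 0, 2] [0, 0, 2, 2]
```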

Show the Chi-1 statistics for one position in a group (ONECHI)

Documentation to be typed.

Show the Chi-1 statistics for all positions in a group (ALLCHI)

Documentation to be typed.

Put three dimensional Ramachandran plot in a MOL-item (GRACHI)

Documentation to be typed.

List torsion angles for residues at one position (LSTCHI)

The command LSTCHI will cause WHAT IF to prompt you for a group number, for the range of hits to be used, and for a position in that group. For every hit, the protein 4-letter identifier, the number of the residue in this protein and all torsion angles will be listed. The order of the torsion angles is: Phi, Psi, Omega, and then, when present, Chi-1 to Chi-5.

Show statistics about pairs in one group (PRFPAI)

The command PRFPAI will cause WHAT IF to prompt you for a group number, for the range of hits to be used, and for two positions in that group.

For each residue its frequency at every position in the hit will be listed (and the sum over all positions in all hits).

For the two residue positions a 20 * 20 residue distribution table will be produced. The 10 most frequent pairs in this table will be listed separately.
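The pair table of PRFPAI is a joint count of the residues at two positions, from which the most frequent pairs can be read off. This sketch stores only the non-empty cells of the 20 * 20 table in a `Counter`, uses 0-based positions and one-letter codes, and has hypothetical function names.

```python
from collections import Counter
from typing import List, Tuple


def pair_table(hits: List[str], pos1: int, pos2: int) -> Counter:
    """Sparse 20 * 20 table: count of each (residue at pos1, residue at pos2)."""
    return Counter((h[pos1], h[pos2]) for h in hits)


def top_pairs(hits: List[str], pos1: int, pos2: int, n: int = 10) -> List[Tuple[Tuple[str, str], int]]:
    """The n most frequent residue pairs, as in the PRFPAI summary."""
    return pair_table(hits, pos1, pos2).most_common(n)


print(top_pairs(["ACD", "ACD", "GCE"], 0, 2, 1))  # [(('A', 'D'), 2)]
```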

Bring you into the SCAN3D/SCNSTS parameter menu (PARAMS)

These parameters are probably totally useless for the SCNSTS menu. This PARAMS command was added to ease future parameter additions.