Atomic parameter correlation rows (SEARCH)

Introduction.

WHAT IF allows you to correlate atomic parameters. This can be handy if you want to get the following types of questions answered:

How many potential hydrogen bond donors are buried, but not involved in a hydrogen bond?

How often is an internal water molecule located between two acidic groups in my molecule?

The principle is that the atomic parameters are converted into rows of logicals. The elements of these rows are true if the corresponding atom meets the requirements set by the user (e.g. involved in a hydrogen bond with donor-acceptor distance shorter than 3.2 Angstrom and the angle not deviating more than 45 degrees; or the atom is in a residue with total surface accessibility less than 5.0 square Angstrom). These rows can be logically combined. The results can be analyzed in many different ways.
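
The idea of combining rows can be illustrated with a few lines of Python. This is only a sketch of the principle, not WHAT IF code: the atom names, the flag values, and the helper functions `row_and` and `row_inv` are invented for illustration. The three input rows loosely correspond to what ROWPDA, ROWACC (with limits 0.0 and 0.0) and ROWHBO might produce.

```python
# Hypothetical toy data: one entry per atom, same atom order in every row.
atoms = ["N", "CA", "C", "O", "OG"]

# One logical per atom; all values are made up for this illustration.
potential_donor_acceptor = [True, False, False, True, True]   # cf. ROWPDA
buried                   = [True, True, False, False, True]   # cf. ROWACC 0.0 0.0
in_hydrogen_bond         = [True, False, False, True, False]  # cf. ROWHBO


def row_and(a, b):
    """True where both input rows are true."""
    return [x and y for x, y in zip(a, b)]


def row_inv(a):
    """Flip every logical in the row."""
    return [not x for x in a]


# Buried potential donors/acceptors that are NOT involved in a hydrogen bond.
buried_polar = row_and(potential_donor_acceptor, buried)
unsatisfied = row_and(buried_polar, row_inv(in_hydrogen_bond))

hits = [name for name, flag in zip(atoms, unsatisfied) if flag]
print(hits)  # ['OG']
```

Because every row has exactly one element per atom, any number of criteria can be chained this way as long as the atom order stays the same.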

An article written on these parameter correlation rows has been added as an appendix.

Many options in this menu will work poorly or even incorrectly if you have overlapping molecules in the soup (e.g. superposed molecules, or a set of NMR structures).

See the LSCRIP option if you want to repeat a query over a large set of molecules (e.g. the whole PDB, or a set of NMR structures).

Rows as database search tool

The option LSCRIP (see the chapter on SCRIPT) can be used to make a script run over a whole set of PDB files. The SEARCH menu allows you to do more complicated queries over the whole PDB than you can do for example with the SCAN3D menu, or with other database systems. It will be slow, but it can do everything...

Options to generate rows

Hydrogen bonds (ROWHBO)

The command ROWHBO will calculate all hydrogen bonds for all residues in the soup. All atoms potentially involved in at least one hydrogen bond will be marked 'true'. See the HBONDS chapter about the difference between potential and real hydrogen bonds. You will be asked if hydrogen bonds with water should be included in the calculations. If you answer YES, all experimentally determined water molecules will be included. If you also want to include bulk water, you should use the ROWACC option and mark all atoms with non-zero accessibility.

Salt bridges (ROWSBR)

The command ROWSBR will cause WHAT IF to search for salt bridges in your protein. Only salt bridges between amino acids will be searched for. All atoms involved in a salt bridge will get a flag set in a row. A salt bridge is defined as an acidic oxygen and a basic nitrogen being within a certain distance. This distance has a default of 5.0 Angstrom. See the chapter on parameter setting if you want to change this. WHAT IF will ask you whether you want histidines to be considered basic or not. If you call them basic then both side chain nitrogens will be considered basic at the same time.
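
The distance criterion above can be sketched as follows. This is a minimal illustration, not the WHAT IF implementation: the atom labels `acidic_O` and `basic_N`, the coordinates, and the choice to treat a distance exactly equal to the cutoff as a hit are all assumptions.

```python
import math

SALT_BRIDGE_CUTOFF = 5.0  # Angstrom; the default quoted in the text


def salt_bridge_row(atoms, cutoff=SALT_BRIDGE_CUTOFF):
    """atoms: list of (kind, (x, y, z)); returns one logical per atom."""
    row = [False] * len(atoms)
    for i, (kind_i, pos_i) in enumerate(atoms):
        if kind_i != "acidic_O":
            continue
        for j, (kind_j, pos_j) in enumerate(atoms):
            # Assumption: "within" is taken as less than or equal to the cutoff.
            if kind_j == "basic_N" and math.dist(pos_i, pos_j) <= cutoff:
                row[i] = True
                row[j] = True
    return row


# Hypothetical coordinates, chosen only to show one hit and one miss.
atoms = [
    ("acidic_O", (0.0, 0.0, 0.0)),
    ("basic_N", (3.0, 0.0, 0.0)),   # 3.0 Angstrom from the oxygen: hit
    ("basic_N", (9.0, 0.0, 0.0)),   # 9.0 Angstrom from the oxygen: no hit
    ("other", (1.0, 0.0, 0.0)),     # neither acidic nor basic: ignored
]
row = salt_bridge_row(atoms)
print(row)  # [True, True, False, False]
```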

Polar atoms (ROWPOL)

The command ROWPOL will create a row in which all polar atoms (those are the nitrogens and oxygens in the sidechains of Arg, Lys, His, Asp, Asn, Glu and Gln) are set to TRUE.

Surface accessibility (ROWACC)

The command ROWACC will cause WHAT IF to prompt you for ATOMS or ACIDS. This determines whether the limits on the surface accessibility apply to individual atoms or to the sum over all atoms in the residue. You will thereafter be prompted for the surface accessibility limits. Here you have to give two numbers (the defaults are 0.0 and 0.0, meaning completely buried). These limits are the lower and upper accessibility values between which the row element for that atom will be set to true. In case ATOMS were selected, the limits are applied straightforwardly to every atom. In case ACIDS (meaning amino acids) were selected, WHAT IF first calculates the total surface accessibility for the whole residue; if that value is between the limits given, the row elements are set to true for all atoms in the residue.

Remember that surface accessibility is defined as the area of the sphere on which the center of a water molecule that touches the atom can be found. See the chapter on surface area calculations for the algorithm used to calculate these areas.

Using the default parameters, an accessible surface of 0.1 square Angstrom is already enough to allow for a hydrogen bond with bulk water.
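
The difference between the ATOMS and ACIDS modes of ROWACC can be sketched like this. The function name `accessibility_row`, the residue numbers and the accessibility values are hypothetical. The limits are treated as inclusive, which seems to be implied by the defaults 0.0 and 0.0 selecting completely buried atoms, but that is an assumption.

```python
def accessibility_row(residue_ids, accessibility, low, high, mode="ATOMS"):
    """Return one logical per atom for the given accessibility limits."""
    if mode == "ATOMS":
        # Limits are tested against every atom individually.
        return [low <= value <= high for value in accessibility]
    # ACIDS: sum the accessibility per residue first, then test the sum.
    totals = {}
    for res, value in zip(residue_ids, accessibility):
        totals[res] = totals.get(res, 0.0) + value
    return [low <= totals[res] <= high for res in residue_ids]


# Hypothetical soup: residue 1 has three atoms, residue 2 has two atoms.
residue_ids = [1, 1, 1, 2, 2]
accessibility = [0.0, 0.0, 4.5, 0.0, 0.0]  # square Angstrom, invented values

atoms_row = accessibility_row(residue_ids, accessibility, 0.0, 0.0, "ATOMS")
acids_row = accessibility_row(residue_ids, accessibility, 0.0, 0.0, "ACIDS")
print(atoms_row)  # [True, True, False, True, True]
print(acids_row)  # [False, False, False, True, True]
```

In ATOMS mode the two buried atoms of residue 1 are tagged; in ACIDS mode none of them are, because one exposed atom makes the whole residue fail the limits.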

Surface accessibility (ROWGAC)

The command ROWGAC will cause WHAT IF to prompt you for residues. You can then give real residues, or self made residues (see SHOEAA, SETEAA etc.) or combinations thereof. All atoms in every copy of these residues (or in every real residue that is part of a requested self made residue) will get their row element set to true.

So, for example, if you answer the question about which residues to label with `SER ALA BIG`, all atoms in all ala, phe, his, lys, met, arg, ser, trp and tyr residues will be marked with true.

If you give only one kind of residue, you can select individual atoms. If you use more residue types at the same time, you can only select ALL, BACK, or SIDE for all atoms, backbone atoms or sidechain atoms respectively.

You will thereafter be prompted for the surface accessibility limits. Here you have to give two numbers (the defaults are 0.0 and 0.0, meaning completely buried). These are the limits on the combined accessibility (per residue) for all atoms that you selected. If the accessibilities have not yet been determined, WHAT IF will activate the SETACC command in the ACCESS menu directly upon starting this option.

Remember that surface accessibility is defined as the area of the sphere on which the center of a water molecule that touches the atom can be found. See the chapter on surface area calculations for the algorithm used to calculate these areas.

Using the default parameters, an accessible surface of 0.1 square Angstrom is already enough to allow for a hydrogen bond with bulk water.

Proximity to a cavity (ROWCAV)

The command ROWCAV will cause WHAT IF to prompt you for ATOMS or ACIDS. If you give ATOMS, all row elements belonging to an atom that forms part of the wall of a cavity will be set. Giving ACIDS will cause WHAT IF to set all row elements for a residue if at least one of its atoms forms part of the wall of a cavity. You will also be prompted for the probe radius used while making the cavity map with the CAVITY option in the MAP menu.

This option is not foolproof. If you did not run the CAVITY option in the MAP menu beforehand, this option will not execute correctly.

Potential hydrogen bond donors and acceptors (ROWPDA)

The option ROWPDA sets all row elements belonging to nitrogen or oxygen atoms to true. ROWPDA stands for ROW Potential Donor or Acceptor.

Residue types (ROW1AA)

If you give the command ROW1AA, WHAT IF will prompt you for residues. You can then give real residues, or self made residues (see SHOEAA, SETEAA etc.) or combinations thereof. All atoms in every copy of these residues (or in every real residue that is part of a requested self made residue) will get their row element set to true.

So, for example, if you answer the question about which residues to label with `SER ALA BIG`, all atoms in all ala, phe, his, lys, met, arg, ser, trp and tyr residues will be marked with true.

If you give only one kind of residue, you can select individual atoms. If you use more residue types at the same time, you can only select ALL, BACK, or SIDE for all atoms, backbone atoms or sidechain atoms respectively.

Atom types (ROW1AT)

The command ROW1AT will cause WHAT IF to first execute the ROW1AA option. Thereafter you will be prompted for the atom names. If you only gave one single residue (not a self-made residue) you can give individual atom names. If you gave several residues, you can now only give ALL (meaning all atoms in all residues, which makes this option identical to ROW1AA), BACK to only use the backbone atoms, or SIDE to only use side chain atoms. A row will be created in which the flags are set to true for every atom that was given.

Proximity to water molecules (ROWNOH)

If coordinates for H2O molecules are present in the soup you can use ROWNOH to set a row for atoms that are closer than a certain distance to a water molecule. Nothing is done with the orientation of the water molecule with respect to the residue, or the atoms. The distance used is the distance between the Van der Waals' surfaces. The default distance is 0.25 Angstrom.

Proximity to cofactors (ROWNCF)

If coordinates for co-factors are present in the soup you can use ROWNCF to create a row for atoms that are closer than a certain distance to a co-factor. Nothing is done with the orientation of the atom with respect to the co-factor. The distance between the Van der Waals' surfaces is used. The default distance is 0.25 Angstrom. All single atoms that are not water (e.g. metal ions) and all co-factors in the soup are used.

Crystallographic B-factors (ROWBFT)

The command ROWBFT will cause WHAT IF to prompt you for a range of crystallographic B-factors. All atoms that have their B-factors within this range will get the corresponding logical in the row set to TRUE.

Getting values from files (GETVAL)

Sometimes other programs can calculate things that you would like WHAT IF to know about, because you could use the extra information in the parameter correlation searches. The GETVAL option allows for that. Just let the other program write a file (N lines, F10.0 each) with one value per atom. The atom order should of course be the same (IUPAC atom order) as WHAT IF uses. If you now use the GETVAL option, you will be prompted for the name of this value file and for the range of values. WHAT IF will then read one value from that file for every atom in the soup, and set the corresponding logical in the row to TRUE for every atom for which the value read falls within the given range.
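
If the other program happens to be a Python script, a value file of this kind could be written roughly as shown below. This is a sketch under assumptions: the file name `values.dat`, the helper names and the three decimals are arbitrary choices. What matters is that every line holds one number in a field of 10 characters, which a Fortran F10.0 read accepts when the decimal point is written explicitly.

```python
import os
import tempfile


def write_value_file(path, values):
    """Write one value per line, right-justified in a 10-character field."""
    with open(path, "w") as handle:
        for value in values:
            handle.write(f"{value:10.3f}\n")


def read_value_file(path):
    """Read the values back, using only the first 10 characters of each line."""
    with open(path) as handle:
        return [float(line[:10]) for line in handle if line.strip()]


# Hypothetical per-atom values; they must follow the atom order of the soup.
values = [1.5, 0.0, 12.25]
path = os.path.join(tempfile.mkdtemp(), "values.dat")
write_value_file(path, values)

with open(path) as handle:
    lines = handle.read().splitlines()
print(lines)
```

Values that need more than 10 characters would overflow the field, so rescale them before writing if necessary.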

Helix capping rows (ROWDIP)

Residues sitting in one of the three N- or C-terminal positions of a helix are called helix capping residues. Normally you want GLU (or ASP) at the N-terminal site, and ARG or LYS at the C-terminal site. This way you make use of the helix dipole. That is the reason for the name of this option, ROWDIP, where DIP stands for dipole. You are prompted for N-caps and for C-caps. If you answer those questions with YES then the residues in N- or C-cap position will get all their atoms tagged in the row. If you answer twice with NO, no row will be generated. You will afterwards be prompted for the number of the row.
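
One possible reading of "the three N- or C-terminal positions" is sketched below on a per-residue secondary structure string. This is an interpretation, not the WHAT IF algorithm: it assumes that the caps are the first three and last three residues inside each helix, and the string of `H` and `C` characters is a made-up input format.

```python
def helix_caps(sec_struct, n_caps=True, c_caps=True, width=3):
    """Return one logical per residue: True for helix capping positions."""
    flags = [False] * len(sec_struct)
    i = 0
    while i < len(sec_struct):
        if sec_struct[i] != "H":
            i += 1
            continue
        # Find the end of this helix; it spans residues i .. j-1.
        j = i
        while j < len(sec_struct) and sec_struct[j] == "H":
            j += 1
        if n_caps:
            for k in range(i, min(i + width, j)):
                flags[k] = True
        if c_caps:
            for k in range(max(j - width, i), j):
                flags[k] = True
        i = j
    return flags


# Hypothetical 12-residue chain with one helix from position 2 to 9.
sec_struct = "CCHHHHHHHHCC"
both = [i for i, f in enumerate(helix_caps(sec_struct)) if f]
n_only = [i for i, f in enumerate(helix_caps(sec_struct, c_caps=False)) if f]
print(both)    # [2, 3, 4, 7, 8, 9]
print(n_only)  # [2, 3, 4]
```

In a real row every atom of a flagged residue would be tagged, as described above.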

Secondary structure rows (ROWHST)

The command ROWHST will cause WHAT IF to evaluate the secondary structure if that has not been done yet (see commands SETHST or SHOHST). Thereafter you will be prompted for a secondary structure element (Helix, Sheet, Turn, or Coil). All atoms in all residues in the main sequence (the one for which you have coordinates) that are determined to have that type of secondary structure are set to true.

Mutability rows (ROWHSP)

The command ROWHSP will cause WHAT IF to prompt you for the name of the HSSP file that corresponds to the present contents of the soup. From this file it will read the mutability factor. You will be prompted for a lower and an upper mutability value. All residues for which the mutability falls within this range will get all their atoms tagged TRUE.

Manual setting of rows (ROWMAN)

The command ROWMAN will cause WHAT IF to keep prompting you for residue ranges till you give 0 (zero). All atoms in all residues within these ranges will be set to TRUE.

Row as function of contacts (ROWCON)

The command ROWCON will cause WHAT IF to prompt you for some information. First you will be prompted for the ranges with which the contact should take place. Just give one or more ranges, finish with zero. Second, you will be asked if intra range contacts should be used too. If you say NO then no atom in the given range will be tagged at all. If you say YES, then probably many atoms in the given range will be tagged too, because most residues have some contacts with their covalent neighbours. Atomic contacts between covalently linked atoms are never taken into account. Then you will be prompted for ATOMS or ACIDS. If you say ATOMS then only atoms that make a contact with the given range will be tagged. If you say ACIDS, then a whole residue will be tagged as soon as one atom in it makes a contact with an atom in the given range. Finally you will be prompted for the contact distance. Two atoms are considered to be in contact if their distance minus the two Van der Waals radii minus the given cutoff is less than zero.
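
The contact criterion in the last sentence can be written down directly. The sketch below is only an illustration: the function name `in_contact`, the coordinates, the radii of 1.7 and 1.5 Angstrom and the cutoff of 0.25 Angstrom are example values, not the radii or defaults that WHAT IF uses for this option.

```python
import math


def in_contact(pos_a, radius_a, pos_b, radius_b, cutoff):
    """Contact if distance minus both radii minus the cutoff is below zero."""
    gap = math.dist(pos_a, pos_b) - radius_a - radius_b
    return gap - cutoff < 0.0


# Illustrative radii (Angstrom); assumptions, not WHAT IF parameters.
carbon_radius = 1.7
oxygen_radius = 1.5
cutoff = 0.25

# Centers 3.4 Angstrom apart: surfaces are 0.2 apart, within the cutoff.
near = in_contact((0.0, 0.0, 0.0), carbon_radius,
                  (3.4, 0.0, 0.0), oxygen_radius, cutoff)

# Centers 4.0 Angstrom apart: surfaces are 0.8 apart, beyond the cutoff.
far = in_contact((0.0, 0.0, 0.0), carbon_radius,
                 (4.0, 0.0, 0.0), oxygen_radius, cutoff)

print(near, far)  # True False
```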

Row as function of selected contacts (ROWCNR)

The command ROWCNR will cause WHAT IF to prompt you for some information. First you will be prompted for the ranges with which the contact should take place. Just give one or more ranges, finish with zero. Second, you will be asked if intra range contacts should be used too. If you say NO then no atom in the given range will be tagged at all. If you say YES, then probably many atoms in the given range will be tagged too, because most residues have some contacts with their covalent neighbours. Atomic contacts between covalently linked atoms are never taken into account. Third you will be prompted for the row that holds the constraints on the given range. This means that only those atoms in the given range that are tagged TRUE in the row you give will be looked at. So a contact between an atom somewhere and an atom in the given range that is not tagged will simply not be considered a contact. Then you will be prompted for ATOMS or ACIDS. If you say ATOMS then only atoms that make a contact with the given range will be tagged. If you say ACIDS, then a whole residue will be tagged as soon as one atom in it makes a contact with an atom in the given range. Finally you will be prompted for the contact distance. Two atoms are considered to be in contact if their distance minus the two Van der Waals radii minus the given cutoff is less than zero.

Operations on atomic parameter rows

Once several rows of atomic parameter flags have been generated, the nice part of working with rows comes. They can be logically combined just as groups can. The way rows can be combined is slightly different from the way this is done with groups, because of the different nature of what is in them. The following options can be used to logically combine rows: ROWAND, ROWOR, ROWNOT, and ROWXOR.

Several other operations that only operate on one row are available too.

Logical and (ROWAND)

ROWAND will prompt you for two rows. It will then generate a row having the elements set to true for every atom that has its elements set to true in both input rows. You will then be prompted for the output row number. Giving zero here will cause WHAT IF to throw the row away.

Logical or (ROWOR)

ROWOR will prompt you for two rows. It will then generate a row having the elements set to true for every atom that has its elements set to true in either one of the two input rows. You will then be prompted for the output row number. Giving zero here will cause WHAT IF to throw the row away.

Logical not (ROWNOT)

ROWNOT will prompt you for two rows. It will then generate a row having the elements set to true for every atom that has its element set to true in the first row, but not in the second. You will then be prompted for the output row number. Giving zero here will cause WHAT IF to throw the row away.

Exclusive or (ROWXOR)

ROWXOR will prompt you for two rows. It will then generate a row having the elements set to true for every atom that has its element set to true in the one row, but not in the other. So the element should be set to true either in the first input row or in the second, but not in both. You will then be prompted for the output row number. Giving zero here will cause WHAT IF to throw the row away.

Inverting a row (ROWINV)

ROWINV changes every .true. in a row into a .false. and vice versa. You will be prompted for the output row number. This can be the same as the input row number. Giving zero will cause WHAT IF to do nothing.

Row subset operation (ROW1TA)

The command ROW1TA will cause WHAT IF to prompt you for one row, and a residue range. The resultant row will have the same atoms tagged outside the given range as the input row. Within the given range all atoms will be set true in every residue that has at least one atom true.

Row subset operation (ROW1TO)

The command ROW1TO will cause WHAT IF to prompt you for one row, and a residue range. The resultant row will have the same atoms tagged outside the given range as the input row. Within the given range all atoms will be set false in every residue that has at least one atom false.

Row subset operation (ROW0TA)

The command ROW0TA will cause WHAT IF to prompt you for one row, and a residue range. The resultant row will have the same atoms tagged outside the given range as the input row. Within the given range all atoms will be set true in every residue that has all atoms false.
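
The three row subset operations ROW1TA, ROW1TO and ROW0TA differ only in the per-residue test and in the value that is written. The sketch below tries to make that explicit. The function names are invented, and residues are identified by plain integers, which is a simplification of how WHAT IF handles residue ranges.

```python
def _apply(row, residue_ids, first, last, test, value):
    """Set all atoms of a residue to `value` if `test` holds for its flags."""
    groups = {}
    for index, res in enumerate(residue_ids):
        groups.setdefault(res, []).append(index)
    out = list(row)  # atoms outside the range keep their input value
    for res, indices in groups.items():
        if first <= res <= last and test([row[i] for i in indices]):
            for i in indices:
                out[i] = value
    return out


def row_1ta(row, residue_ids, first, last):
    # At least one atom true: the whole residue becomes true.
    return _apply(row, residue_ids, first, last, any, True)


def row_1to(row, residue_ids, first, last):
    # At least one atom false: the whole residue becomes false.
    return _apply(row, residue_ids, first, last,
                  lambda flags: not all(flags), False)


def row_0ta(row, residue_ids, first, last):
    # All atoms false: the whole residue becomes true.
    return _apply(row, residue_ids, first, last,
                  lambda flags: not any(flags), True)


# Hypothetical soup: three residues of two atoms each; range is residues 1-2.
residue_ids = [1, 1, 2, 2, 3, 3]
row = [True, False, False, False, True, True]

print(row_1ta(row, residue_ids, 1, 2))  # [True, True, False, False, True, True]
print(row_1to(row, residue_ids, 1, 2))  # [False, False, False, False, True, True]
print(row_0ta(row, residue_ids, 1, 2))  # [True, False, True, True, True, True]
```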

Inspecting rows

There are several ways to inspect the results of the above mentioned operations:

Looking which rows exist (ROWSHO)

The command ROWSHO will cause WHAT IF to show you all presently active rows. For every row it will show you the number of the row, the number of elements set to true in this row, and the way in which this row has been created.

Looking at the contents of rows (ROWHIT)

The command ROWHIT does almost the same as the command LISTA. It shows all amino acids with all their atoms for a user defined range of amino acids at the terminal (and, with the log option switched on (see DOLOG and NOLOG), also in the log file). The only difference is that the seventh column will now show two exclamation marks ( !! ) for every atom for which the flag is set in the requested row.

Making a table of the hits in a row (ROWTAB)

The command ROWTAB will cause WHAT IF to prompt you for a table number, a residue range and a row number. Every residue with a hit for at least one atom will be written in the table. Residues without hits in them are written as blanks.

Looking at residues with hits in them (ROWHTO)

If you want to see only the residues that contain a hit you can use the command ROWHTO. You will be prompted for the row number, and the range of amino acids. WHAT IF will then go over all amino acids in that range and show those that have at least one hit in them.

Counting hits in residues (ROWHPR)

The command ROWHPR will cause WHAT IF to prompt you for a row and a residue range. For every residue in the range it will give one line of output consisting of the residue and its type and its name, followed by the number of hits in this residue, and the maximal number of hits in this residue type. The latter is of course equal to the number of atoms in this residue.

Counting hits in residues (ROWHP1)

The command ROWHP1 will cause WHAT IF to prompt you for a row and a residue range. For every residue in the range that has at least one atom marked true in the requested row it will give one line of output consisting of the residue and its type and its name, followed by the number of hits in this residue, and the maximal number of hits in this residue type. The latter is of course equal to the number of atoms in this residue.
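
The difference between ROWHPR and ROWHP1 is only whether residues without hits are listed. A small sketch of the counting, with an invented function name and residues represented by plain integers rather than the residue number, type and name that WHAT IF prints:

```python
def hits_per_residue(row, residue_ids, only_hits=False):
    """Return (residue, hits, atoms in residue) for every residue.

    With only_hits=True, residues without any hit are skipped (cf. ROWHP1).
    """
    counts = {}
    for res, flag in zip(residue_ids, row):
        hits, total = counts.get(res, (0, 0))
        counts[res] = (hits + int(flag), total + 1)
    return [(res, hits, total)
            for res, (hits, total) in counts.items()
            if hits > 0 or not only_hits]


# Hypothetical soup: three residues of two atoms each.
residue_ids = [1, 1, 2, 2, 3, 3]
row = [True, False, False, False, True, True]

print(hits_per_residue(row, residue_ids))
# [(1, 1, 2), (2, 0, 2), (3, 2, 2)]
print(hits_per_residue(row, residue_ids, only_hits=True))
# [(1, 1, 2), (3, 2, 2)]
```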

(Re-)initializing the rows (ROWINI)

As there are only ten rows to work with, you might need to reset all rows in order to create space to generate new, other rows. The command ROWINI causes WHAT IF to irreversibly wipe out all previously generated rows, and all information about them. Be aware that you do not have to empty a row before you can write in it. After all rows have been filled, WHAT IF by default overwrites the tenth row, but you can by hand overwrite any row you want.

Saving and restoring rows

Introduction

WHAT IF has two ways of backing up rows. The one way is row by row, the other is all rows together. This feature allows CPU intensive search results to be stored for future sessions. One should be aware however that very strange things can happen if rows are restored at a moment that the soup contents are different from the moment that the rows were backed up.

Saving one row (MAKROW)

The command MAKROW will cause WHAT IF to prompt you for the number of a row, and a file name. The row given will be saved in that file. You can later retrieve the row with the GETROW command.

Retrieving one row (GETROW)

The command GETROW will cause WHAT IF to prompt you for the name of a row file. This file must be created with the MAKROW command. It will read the file, and store the row in the first available free row (or overwrite row 10 if no free rows are available). Be aware that very strange things can happen in case the row that you read does not belong with the present soup contents.
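
The rule for where a retrieved row ends up can be sketched as follows. The function name `slot_for_new_row` and the use of a set of occupied row numbers are assumptions made for the illustration; only the limit of ten rows and the fallback to row 10 come from the text.

```python
MAX_ROWS = 10  # the text states that there are only ten rows to work with


def slot_for_new_row(occupied):
    """Return the first free row number, or row 10 if all rows are in use."""
    for number in range(1, MAX_ROWS + 1):
        if number not in occupied:
            return number
    return MAX_ROWS


print(slot_for_new_row({1, 2, 4}))          # 3
print(slot_for_new_row(set(range(1, 11))))  # 10
```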

Saving all rows (SAVROW)

The command SAVROW will cause WHAT IF to prompt you for a file name. It will then store all information presently held about rows in this file. Use the command RESROW to retrieve the data later.

Restoring all rows from file (RESROW)

The command RESROW will cause WHAT IF to prompt you for the name of a file created with the SAVROW command. It will then initialize all information about rows presently in memory, and read all information from this file. Be aware that very strange things can happen if the soup is different now from the moment that the file was written with the SAVROW command.