Input and output of coordinates (SOUP)

Introduction.

WHAT IF needs coordinates. Without coordinates the program is still a nice database handler, and it can tell you what time it is, but without good coordinates there is not much need for using WHAT IF.

WHAT IF can read and write PDB-files (Brookhaven protein data bank format) and GROMOS files and it can read DIANA files.

The central data structure in WHAT IF is the so-called 'SOUP'. The SOUP is an assembly of water with all molecules in it. WHAT IF knows five kinds of molecules:
     1) protein;
     2) drugs/co-factors;
     3) DNA/RNA; 
     4) single atomic molecules;
     5) (groups of) water molecules.

Because WHAT IF can only work with a finite number of molecules at one time, water molecules are taken together as one molecule, consisting of all the water molecules that came from one source (e.g. one input file, or one water position prediction).

The SOUP owes its name to the fact that it consists of molecules floating around in water. However, water molecules do not necessarily have to be present.

The menu that is activated with the SOUP command allows you to manipulate the SOUP. The WATER menu performs the addition or deletion of water in case you want to add or delete them by number. Special operations on water molecules (like automatic addition or deletion) are also performed from the WATER menu (see chapter WATER).

This writeup often refers to residues as the input to an option. In many instances, however, the input can also be drugs, and sometimes also solvent, and this is not always clearly documented. As a rule of thumb, if it is chemically sensible, WHAT IF will allow it. In any case, just try it: WHAT IF will not crash if you try something that is not allowed.

Reading/writing coordinates

Unfortunately the entire 'Who is who' in biomolecular computing, crystallography and biophysics has once written the one and only universal standard for coordinates. We therefore need an almost infinite number of options to read or write coordinate files. Most of these options have to do with interfacing to specific programs. These options are described in the chapters that deal with these interfaces. It is envisaged that a general coordinate reader will be provided for WHAT IF before version 6.0 is ready (December 95).

The command GETMOL is the general way of getting coordinates from a PDB file into memory. (With GETGRO you can read GROMOS formatted coordinate files.) This is a command from the general menu, which means that you can execute it from every menu. You will be prompted for the name of the PDB file and thereafter the symmetry matrices, and ALL coordinates are read from this file and ADDED to the soup. If you want to start with an empty soup, you should first execute the INISOU command from the SOUP menu.

There are many ways to write coordinates to a file. Many options do so automatically (e.g. SHOHST, SPLINE, REFI, etc.). The generic command, however, is MAKMOL in the soup menu. This command writes a PDB file.

Where are the options?

Most coordinate related options are present in the SOUP menu. The commands GETGRO and GETMOL can be executed from ALL menus.

The set name

Whenever WHAT IF adds coordinates to the soup these coordinates need a set name. This set name is very handy if you want to remember which molecule in the soup came from which input file. If you are prompted for the set name and you hit just return, the set name will be made identical to the file name.

Reading coordinates from PDB file (GETMOL)

The command GETMOL will cause WHAT IF to prompt you for a PDB file. It will then read all coordinates from this file, and add them to the soup.

If the file is not found in your local directory, but it exists in the central PDB directory on your machine, you will be asked if you want to use this PDB file instead. If WHAT IF cannot find the standard PDB directory, make your local WHAT IF manager aware of the notes on the configuration files. (The standard PDB directory must be put in the CCONFI.FIG file.)

Reading GROMOS coordinates (GETGRO)

The command GETGRO will cause WHAT IF to prompt you for a formatted GROMOS coordinate file. It will then read all coordinates from this file, and add them to the soup.

Writing coordinates in a PDB file (MAKMOL)

The command MAKMOL is the only correct way to write PDB files. You will be prompted for a template coordinate file. The header of this template file will be copied to the output PDB-file. Thereafter you will be prompted for the name of the PDB-file to be created. Last you will be prompted for the residue ranges.

Saving the soup contents in a file (SAVSOU)

The command SAVSOU will cause WHAT IF to prompt you for a save-file number. It will then create a file (numbered as requested) and put all presently available data in the soup (molecules, residues, atoms, secondary structure, accessibilities, etc.) in this file. You can later use RESSOU to restore the soup from this file.

Restoring the soup from a file (RESSOU)

If you have previously saved the soup in a save-file with the SAVSOU command, you can use the RESSOU command to restore the soup from that save-file. Be aware that RESSOU will first destroy ALL data presently in the soup.

Saving the soup status in a file (SAVSTA)

The command SAVSTA will cause WHAT IF to prompt you for a save-file number. It will then create a file (numbered as requested) and put all presently available data in the soup (molecules, residues, atoms, secondary structure, accessibilities, etc.) in this file. So far this is the same as SAVSOU, but SAVSTA additionally tries to save the interactive status (scale, translation, view, labels, objects on/off, etc.). You can later use RESSTA to restore the status from this file.

Restoring the soup status from a file (RESSTA)

If you have previously saved the status in a save-file with the SAVSTA command, you can use the RESSTA command to restore the status from that save-file. Be aware that RESSTA will first destroy ALL data presently in the soup.

Using protons

WHAT IF was originally designed to work without explicit protons. We are presently adapting the program to accept protons as independent atoms. This cannot be done overnight. Many options can presently deal with explicit protons correctly; several options cannot yet. If you want to use explicit protons, give the following magical command as the first command in a WHAT IF session:
SETICO 29 1
Be aware, however, that several options will not (yet) treat the protons correctly, and some options will even create a stack-dump if used with the proton option active.

The protonisation is expected to be finished by mid 1996.

See the command ADDHYD in the refine menu for `dreaming` proton coordinates.

The soup

The command SOUP brings you to the menu from which you can manipulate the SOUP. At present the SOUP consists of water with molecules in it. These molecules can be protein, DNA/RNA, non-water solvent, or drug. Everything not recognized by WHAT IF will be called drug. So, co-factors like FAD, or complex solvent molecules like MPD, will be called drugs. Ions like Cu2+, Ca2+, etc. will be called non-water solvent molecules.

The commands in the SOUP menu can be logically grouped as follows:

1) look at the SOUP;

2) cut or paste proteins;

3) delete or insert molecules or residues;

4) save or restore amino acids;

5) cys-cys bridge related options;

6) other options.

In the SOUP menu you will find the command MORE. This command can be used to increase the number of options in the SOUP menu. Normally only the most used commands in this menu are visible, but MORE will also make the less frequently used options visible in the menu.

Looking at the contents of the soup

Listing the soup (SHOSOU)

The command SHOSOU will cause WHAT IF to show you the contents of the SOUP. The number of molecules will be shown, as well as their names. The molecules will be divided in the following classes: -1 = undefined; 0 = indicative of a program bug; 1 = protein; 2 = drug; 3 = DNA/RNA; 4 = solvent, non-water; 5 = water. The ranges of residues spanned by molecules and the total content per molecule class are also shown.
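
As an illustration only, the molecule class codes listed above can be written down as a small lookup table. The sketch below is Python and is not part of WHAT IF; the names MoleculeClass and count_per_class are invented for this example, and only the numeric codes are taken from the text above.

```python
from enum import IntEnum


class MoleculeClass(IntEnum):
    """Class codes as listed by SHOSOU (member names are hypothetical)."""
    UNDEFINED = -1
    PROGRAM_BUG = 0          # indicative of a program bug
    PROTEIN = 1
    DRUG = 2
    DNA_RNA = 3
    NON_WATER_SOLVENT = 4
    WATER = 5


def count_per_class(class_codes):
    """Count how many molecules fall in each class (a 'total content' table)."""
    counts = {}
    for code in class_codes:
        molecule_class = MoleculeClass(code)
        counts[molecule_class] = counts.get(molecule_class, 0) + 1
    return counts
```

For example, count_per_class([1, 1, 2, 5]) reports two proteins, one drug and one water group.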

Cutting and pasting proteins

WHAT IF decides whether two residues are covalently bound by looking at the distance between the alpha carbon coordinates. Sometimes it makes multiple molecules out of one protein when you don't want that. The cut and paste commands are available to overrule WHAT IF's ideas about this. Also it is nice to fool WHAT IF sometimes by telling that all proteins are one big molecule shortly before you run an option that can only work on one molecule at a time.
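
The distance idea can be sketched in a few lines of Python. This is not WHAT IF's actual code: the function name split_into_molecules is invented, and the cutoff of 4.2 Angstrom is an assumption chosen for the example (consecutive alpha carbons in a normal peptide are roughly 3.8 Angstrom apart); the real threshold used by WHAT IF is not given in this writeup.

```python
import math

# ASSUMPTION: illustrative cutoff in Angstrom, not the value WHAT IF uses.
CA_CUTOFF = 4.2


def split_into_molecules(ca_coords, cutoff=CA_CUTOFF):
    """Group consecutive residues into molecules by alpha carbon distance.

    ca_coords is a list of (x, y, z) tuples, one per residue, in soup order.
    Returns a list of molecules, each a list of residue indices.
    """
    molecules = []
    current = []
    for index, coord in enumerate(ca_coords):
        # Start a new molecule when the previous alpha carbon is too far away.
        if current and math.dist(ca_coords[index - 1], coord) > cutoff:
            molecules.append(current)
            current = []
        current.append(index)
    if current:
        molecules.append(current)
    return molecules
```

Only neighbours in soup order are compared, so a gap in the coordinates splits one protein in two. That is the kind of situation in which the cut and paste commands are useful.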

Pasting proteins (PASTE)

The command PASTE will cause WHAT IF to prompt you for the C-terminal residue of a molecule. It will then paste this residue and the N-terminal residue of the next molecule in the soup, thereby making one molecule out of the two. If you try to paste at a position where you previously placed a cut-mark (see CUT), only this cut-mark will be removed and WHAT IF will automatically determine whether there will be a chain break or not. If you want to be sure that a paste-flag is set in such a case, you should paste at the same place twice.

Pasting all proteins (PASTAL)

The command PASTAL will cause WHAT IF to execute the PASTE command (see above) automatically for all proteins in the SOUP. PASTAL will first execute the INIPAS command (see below), so all previously set cut-flags and paste-flags are removed first.

Cutting molecules (CUT)

The command CUT will cause WHAT IF to prompt you for a residue number. It will then act like a protease at the C-terminal side of this residue. Thus if this was not the C-terminal residue of a molecule, the molecule you are cutting will change into two molecules. If you try to cut at a position where you previously placed a paste-mark (see PASTE), only this paste-mark will be removed and WHAT IF will automatically determine whether there will be a chain break or not. If you want to be sure that a cut-flag is set in such a case, you should cut at the same place twice.
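
The interplay between cut-marks and paste-marks described for PASTE and CUT can be summarised in a small Python sketch. It only models the bookkeeping described in the text; the function name apply_mark and the dictionary of flags are invented for this example and say nothing about how WHAT IF stores the flags internally.

```python
def apply_mark(flags, position, action):
    """Apply a 'cut' or 'paste' at a residue position.

    flags maps residue positions to 'cut' or 'paste'. If the opposite mark is
    already present it is only removed (the program then decides by itself);
    otherwise the requested mark is set.
    """
    if action not in ("cut", "paste"):
        raise ValueError("action must be 'cut' or 'paste'")
    opposite = "paste" if action == "cut" else "cut"
    if flags.get(position) == opposite:
        del flags[position]        # first call: only remove the old mark
    else:
        flags[position] = action   # no opposite mark present: set the flag
    return flags
```

Calling apply_mark twice with the same action at the same position always leaves that flag set, which is why the text advises to cut (or paste) at the same place twice if you want to be sure.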

Undoing cuts and pastes (INIPAS)

The command INIPAS will cause WHAT IF to remove all manually set cut and paste flags. It will thereafter re-determine what it thinks are independent molecules and what not. Hereby it uses solely distance criteria. Also two molecules that are in the soup separated from each other by a third one can never become one molecule, no matter how close they are in space.

Listing the cut and paste flags (SHOPAS)

The command SHOPAS can be used to list all presently set cut and paste flags.

Saving and restoring residues

If you want to try mutations (see mutating residues) you might often want to go back to the original situation later. You can of course write intermediate PDB-files every time, but there is also the possibility to save and later restore residues. This is a much faster procedure, and it costs less disk space.

Saving a residue (SAVAA)

The command SAVAA will cause WHAT IF to prompt you for the number of a residue. It will then write the residue in a file. You can later restore this residue with the RESAA command.

Restoring saved residues (RESAA)

The command RESAA will cause WHAT IF to prompt you for the number of a residue. You will also be prompted for the type of residue you want to insert. This must be the type that was used during the SAVAA operation. It will then add this residue from its file into the soup immediately after a residue for which you will be prompted. If you want to replace the residue in the soup with the restored residue, you should delete that residue in the soup, and insert the saved residue after the residue N-terminal of the one you are replacing. You can either first restore the previously saved residue after residue N in the soup, and then delete residue N, or first delete residue N, and then insert after N-1.

The real WHAT IF hackers can abuse the SAVAA and RESAA options to do rather complicated modifications of molecules.....

Deleting, inserting, mutating, correcting

There are many ways to correct, delete, insert, or mutate amino acids, from many menus throughout WHAT IF. Direct correction, deletion and insertion operations can only be performed from the soup menu.

WARNING: many parameters are no longer correct after changes have been made in the soup. These parameters involve ROWS, H-BONDS, CUT and PASTE flags, DGLOOP groups, SALT BRIDGES, or more general, all information that depends on (pointers to) amino acids.

The following commands are available:

Initialize the soup (INISOU)

This command removes all molecules from the soup. Other parameters like groups, matrices, maps, etc. will remain untouched. The INISOU command is irreversible!

Delete a molecule (DELMOL)

This command causes WHAT IF to perform the SHOSOU command first, and then prompt you for the number of the molecule to be deleted. If you give molecule 0 nothing will be deleted.

Delete multiple molecules (DELMLS)

This command causes WHAT IF to perform the SHOSOU command first, and then prompt you for the numbers of the molecules to be deleted. If you give molecule 0 nothing will be deleted.

Deleting a residue (DELETE)

The command DELETE will cause WHAT IF to prompt you for a residue number. That residue will then be deleted from the soup, without any structural corrections in the environment.

Correcting a residue range (CORAA)

The command CORAA will cause WHAT IF to prompt you for a residue range. All atoms in this range that are missing will be created by WHAT IF, provided that at least the backbone N, C-alpha and C are present. You will be asked by WHAT IF if you also want to correct bad interatomic distances. If you answer with YES, WHAT IF will move atoms around until the bad interatomic distances are better. However, this option will also displace some atoms that are actually placed correctly, and that might not be desired.

Don't worry about the various error messages this option produces. They are caused by errors that are fatal when they occur elsewhere in WHAT IF, but that do not matter much here. Be aware that this option only accepts amino acids.

Correcting all residues (CORALL)

The command CORALL will cause WHAT IF to execute the CORAA option without asking for the range, because it assumes that all amino acids in the soup should be corrected (at least those that are wrong). All missing atoms will be created by WHAT IF, provided that at least the backbone N, C-alpha and C are present. You will be asked by WHAT IF if you also want to correct bad interatomic distances. If you answer with YES, WHAT IF will move atoms around until the bad interatomic distances are better. However, this option will also displace some atoms that are actually placed correctly, and that might not be desired.

Don't worry about the various error messages this option produces. They are caused by errors that are fatal when they occur elsewhere in WHAT IF, but that do not matter much here. Be aware that this option only works on amino acids.

Listing bad residues (CNTBAD)

The command CNTBAD will cause WHAT IF to look at all residues in the soup. It will count all residues that it thinks are perfect, and all that it thinks are bad. It will list all bad residues.

Cys-cys bridge commands

WHAT IF normally determines which cysteines are bridged by simple distance criteria. Every pair of cysteine S-gammas closer than 2.5 Angstrom triggers a cys-cys bridge. There are a few commands to manipulate this.
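
As a minimal sketch of the distance criterion, the following Python fragment pairs up S-gamma atoms that are closer than 2.5 Angstrom. It is an illustration, not WHAT IF's own routine: the function name find_cys_bridges and the input format (a dictionary from residue number to S-gamma coordinates) are assumptions made for this example.

```python
import math
from itertools import combinations

SG_CUTOFF = 2.5  # Angstrom, the criterion quoted in the text


def find_cys_bridges(sg_atoms, cutoff=SG_CUTOFF):
    """Return the pairs of cysteines whose S-gammas are closer than cutoff.

    sg_atoms maps a residue number to the (x, y, z) of its S-gamma atom.
    """
    bridges = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(sg_atoms.items()), 2):
        if math.dist(xyz_a, xyz_b) < cutoff:
            bridges.append((res_a, res_b))
    return bridges
```

Note that a plain distance test like this one does not prevent one cysteine from appearing in two pairs. The sketch leaves that case alone.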

Listing cys-cys bridges (SHOCYS)

The command SHOCYS will cause WHAT IF to list all cysteine bridges presently known to it. This includes the self determined ones, and the user set cysteine bridges.

Setting cys-cys bridges (SETCYS)

The command SETCYS will cause WHAT IF to prompt you for the first and for the second cysteine in a cys-cys bridge. This can of course only be done if there are at least two unpaired cysteines available.

Initialization of cys-cys bridges (INICYS)

The command INICYS will cause WHAT IF to remove all flags for manually set cys-cys bridges, and set all cys-cys bridges according to distance criteria again.

Other soup commands

The following commands are also available from the soup menu:

Adding C-terminal oxygens (ADDOXT)

At present WHAT IF treats C-terminal oxygens still as single atomic individual molecules. This will be changed in version 6.0. However, till that time, you can use the ADDOXT command to add C-terminal oxygens where needed. This is for example needed after you remove one or more residues, and create new C-termini.

Reading proteins from the database (GETDBF)

The command GETDBF can be used to get a protein from WHAT IF's relational structure database into the soup. The command GETDBF will cause WHAT IF to prompt you for the number of a database file. You can use the INDEX command in the SCAN3D menu to see which proteins are available. You will be asked if you want to initialize the soup first. If you answer with YES, the command INISOU (see above) will automatically be executed first. If you answer with NO, the requested protein will be added to the soup.

Creating a DNA molecule (MAKDNA)

The command MAKDNA will cause WHAT IF to display a mini menu that allows you to create a DNA molecule. Further information will be provided as soon as this option is bug free. Till that time, use MAKDNA with great care.

Renumbering residues (NEWUNQ)

The command NEWUNQ will cause WHAT IF to renumber the unique identifiers (=PDB identifiers) for the residues in your soup. They will be numbered 1, 2, 3, ... etc. You can use RENUMB if you want alternative numbering schemes.

Changing or setting chain-identifiers (SETCHA)

The command SETCHA will cause WHAT IF to prompt you for a range(s) of residues and for a (new) chain identifier. A chain identifier must be a single character. It will give all selected residues the chosen chain identifier.

Be aware that this option can get you in deep trouble....

If you give the first half of a chain a different chain identifier from the second half, you have actually converted that one chain into two chains. Every character is allowed as chain identifier. WHAT IF has no problems with that, but the official PDB nomenclature only allows for capital A-Z, and several other programs might count on you using only those chain identifiers. If you give two disconnected chains the same chain identifier, then a few WHAT IF options might start giving funny results, and other programs will become unpredictable.

In summary, this is an option that requires some thinking....
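
If you want to check a chain identifier before you hand your file to another program, a test along the following lines might help. This Python sketch is not part of WHAT IF; the function name check_chain_id is invented, and the only rules it encodes are the two stated above (a single character, and capital A-Z for official PDB nomenclature).

```python
import string


def check_chain_id(chain_id):
    """Return True if chain_id is a capital A-Z, False for any other character.

    Raises ValueError if chain_id is not exactly one character, because a
    chain identifier must be a single character.
    """
    if len(chain_id) != 1:
        raise ValueError("a chain identifier must be a single character")
    return chain_id in string.ascii_uppercase
```

A return value of False does not mean WHAT IF will refuse the identifier; it only warns that other programs might.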

Making a copy of part of the soup (SOUCOP)

The command SOUCOP will cause WHAT IF to prompt you for a range of residues. It will then make an exact copy of this range after the last protein in the soup. This is a nice option for rearranging your soup without the usual edit procedures. It is also a useful option for loop transplants.

Hidden options

The following options are so-called hidden options:

Remove double molecules from the soup (CLNSOU)

The command CLNSOU removes all drugs, co-factors, water, ions, etc. from the soup. Also, in case proteins and/or DNA/RNA overlap severely in space, the molecule with the highest number in the soup gets deleted. This is a rather harsh and irreversible option. Use SAVSOU before you use this option!

Reading coordinates from a PDB file (GETUS3)

One of the most common errors in the residue nomenclature in PDB-like files is the addition of a fourth character to it (e.g. HISA, ASPH). The GETUS3 command can be used to overcome this problem. The command GETUS3 will cause WHAT IF to prompt you for a PDB file. It will then read all coordinates from this file, and add them to the soup. The fourth character of the residue name will be skipped upon reading.
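
What skipping the fourth character amounts to can be shown with a short Python sketch. It is an illustration under stated assumptions and not the GETUS3 code: the function name strip_fourth_residue_char is invented, and it assumes that the extra character sits in column 21 of an ATOM or HETATM record, directly after the three-letter residue name in columns 18-20 of the fixed-column PDB layout.

```python
def strip_fourth_residue_char(line):
    """Blank out column 21 of an ATOM/HETATM record (e.g. HISA becomes HIS).

    ASSUMPTION: the stray fourth character is in column 21, right after the
    residue name in columns 18-20. Other record types are returned unchanged.
    """
    if line.startswith(("ATOM", "HETATM")) and len(line) > 20:
        # Python index 20 is PDB column 21; keep the line length the same.
        return line[:20] + " " + line[21:]
    return line
```

The record keeps its length, so all other columns (chain identifier, residue number, coordinates) stay where they were.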

Looking at the pointers in the soup (STATUS)

If WHAT IF gets confused it sometimes starts spitting incomprehensible messages at you such as "Soup out of sync". These messages are mainly meant for us, but that does not help you much, because your session is about to crash. The best thing to do in such cases is to run the STATUS command. That produces a lot of seemingly useless output, but it might rescue your session. After STATUS, try to use MAKMOL to save your soup, kill WHAT IF, and start again.

This is mainly a debug routine. The very experienced user might read the comments in the routine MOL010 to see which pointers are listed.

Fixing DNA molecules (INVERT)

Sometimes DNA molecules are present in the PDB file in the wrong order (i.e. the last residue is given first). In these cases INVERT can be used to invert the order of the bases in the molecule. WHAT IF is not very clever when dealing with DNA (mainly because I never work with DNA), so if WHAT IF gets confused about DNA molecules, try this option.

Alternatively, use the FIXDNA option.

By the way, you can also use this option (without any guarantees) on stretches of protein....

Merging drugs (MERGED)

The command MERGED allows you to merge multiple drugs into one single drug molecule. This is a handy option if you run out of possible molecules in the soup because of billions of single ions or something similar.

Deleting complete base pairs (DELDNA)

If you want to delete a base pair from the soup, that might be rather cumbersome work because you have to do a lot of residue number calculations. With the DELDNA option you can delete an entire base pair by providing the residue number of just one of the bases.

Fixing DNA molecules (FIXDNA)

Sometimes DNA molecules are present in the PDB file with the wrong residues (i.e. the O3* sits at the wrong base). In these cases FIXDNA can be used to correct the positions of O3* atoms in the molecule. WHAT IF is not very clever when dealing with DNA (mainly because I never work with DNA), so if WHAT IF gets confused about DNA molecules, try this option.

Alternatively, use the INVERT option.

Displaying the topology file (SHOTOP)

The command SHOTOP will cause WHAT IF to show you most of the information that it obtained from the last topology file that was read in. This is normally the topology file that gets read automatically upon starting WHAT IF.

Forcing WHAT IF to neglect errors (DVADOM)

The command DVADOM will force WHAT IF to overrule its internal determination of which atoms are bad, and to treat them all as OK. You can see if atoms are bad when you type LISTA. The AT OK column has + for good atoms and - for bad atoms.