General notes (GENERAL)

Introduction

WHAT IF is written by G. Vriend, R.W.W. Hooft and D. van Aalten as a tool for protein engineers, drug designers, molecular dynamics fans, NMR spectroscopists, and crystallographers. A long list of people who donated code is given at the end of this writeup. Stephan Schnabel, Brigitte Altenberg, and especially Jolanta Stouten contributed significantly to this writeup and helped chase the bugs from the program. WHAT IF can be used on a Silicon Graphics IRIS (all the way from the INDY to the N processor VGX machines), on Bruker machines, on IBM-Pc (clones) under DOS or under LINUX, on the DEC alpha under OSF, on DEC Ultrix workstations, on HP under HP-UNIX and on SUN workstations. A generalized X11 interface is used on most of these platforms. WHAT IF can also be used without a graphics device, in which case it will probably run on all computers with decent FORTRAN and C compilers. WHAT IF costs $US 250,- (or DM 400,-) for academics and $US 5000,- for profit making institutions. It is delivered with source code, with databases, and with this 500 page writeup, but without guarantees. There are no monthly fees, and FTP-based updates are free of cost and can be obtained as often as desired.

The full WHAT IF provides more than 2000 options to the user. A few options work on only one or a few of the aforementioned machines, because those machines provide special hardware features that the others do not have.

WHAT IF allows the molecular engineer to sit in front of a computer terminal or, better, a graphics workstation, and ask questions that start with "What if ...." and then continue for example with "...I mutated that valine into an isoleucine?". The program can help the user by calculating the consequences of such a mutation. To do so it can use a three dimensional relational protein database in one, two, or three dimensions. It allows for quick evaluations of mutations in terms of occupied space, Van der Waals contacts, hydrogen bridges, accessible surfaces etc. The very fast access to the graphics system stimulates human inspection of results. The program is set up in a very transparent way, using many easy-to-use menus. The user only needs to know the very few basic options, plus the options he or she wants to use. So, although WHAT IF offers more than two thousand options to the user, one only needs to know very few of those in order to answer even elaborate questions.

A graphics device can be used to continuously monitor answers to questions. Contacts, hydrogen bonds, salt bridges, accessible surfaces, etc. can be shown easily and quickly. The usage of a graphics device allows for interactive manipulation of structures. Structures can be shown with respect to one or more maps (e.g. potential energy maps, or electron density maps). The option to color atoms or residues as a function of their properties (like temperature factor, atom type, residue type, charge, hydrophobic moment, etc.) facilitates quick evaluation of these properties.


The WHAT IF writeup is regularly updated on a World Wide Web (WWW) server. The URL of the WHAT IF homepage is:
http://swift.EMBL-Heidelberg.DE/whatif/
This homepage does not use any fancy WWW tools, and can be read by all known versions of Mosaic, Netscape, etc.

The program can at all times generate plot files. These can be postscript files, HP-plot files or just general files with draws and moves in them. In case a laser printer and the postscript software are present, WHAT IF can send screen pictures immediately to the laser printer in postscript format, either in black and white or in color. The orientation matrix, scale factor, translation and slab-value (=clipping value) provided by the graphics system will be passed on to the general plot files.

The enormous flexibility of WHAT IF guarantees that new options can be added quickly and easily.

WHAT IF uses much less memory than comparable programs. Its memory requirements are very machine dependent. On all machines a swap file of 40 Mbyte is for all practical purposes adequate. For several machines we know the minimal and optimal memory requirements. These are listed below. Be aware, however, that the same program normally requires more memory after an update of the operating system. In my opinion, the only reason that operating systems are updated is to make them bigger, so that you need to buy more memory :-).

                                   Memory (in Mbytes)
Machine        Operating system    Minimal   Optimal
DEC            Ultrix              ?         ?
DEC Alpha      OSF 2.*             64        64      *1
SG             IRIX 4.*            16        48
SG             IRIX 5.*            32        64      *2
IBM-Pc (clone) DOS                 12        16
IBM-Pc (clone) LINUX               8         16

1) See chapter 96 for notes on swap-file usage.

2) Some WHAT IF users have experienced problems with WHAT IF on SG Indys with 32 Mbytes of memory and operating system 5.1 or 5.2.


The disk requirements are less humble. All databases together will occupy 140 megabytes. However, not all databases need to be present on disk at the same time, and the software to (re-)generate the databases at any desired size is part of WHAT IF.

How to get started

How to get started totally depends on how your system manager has set up the WHAT IF account. Contact your local WHAT IF manager.

If this is the first time you use WHAT IF for a certain project, you should create a new subdirectory.


IMPORTANT. KEEP EVERY PROJECT IN ITS OWN SUBDIRECTORY!

WHAT IF starts directly after typing whatif. Be aware that WHAT IF takes up to two minutes to start on IBM-Pc (clones) under MS/DOS. Thereafter you are all set to go.

On UNIX machines your WHAT IF manager normally has defined a logical called `whatif`. If not, try typing /usr/people/vriend/DO_WHATIF.COM

When ready, a menu and the WHAT IF prompt:


WHAT IF>

will appear on the screen. This means that the program is ready to receive your commands. Whenever you see this prompt, you can get a list of options available at that moment by hitting the return key. The options show up on the screen in an order whose logic will only become clear to you after you have worked with the program a couple of times. If you want to know what a certain option does, you can either type:

HELP OPTION

in which OPTION stands for the option of your choice, or look in the chapter in the paper copy or computer-readable copy of this writeup that has the same name as the menu you are in, or just use the alphabetical index.

The command SHORT will cause WHAT IF to show you all options available in this menu with a one line explanation for that option.

The command INFO can in most menus be used to get very extensive HELP on a topic. INFO uses the same syntax as HELP. If no INFO pages are found, INFO will do the same as HELP.

The most important thing to do is:


Go through the TUTORIAL!!!!! 

That takes a day or three, but you win that back in less than no time. You can also visit us at the EMBL for a one week user course.

How to use WHAT IF.

Using menus

WHAT IF is menu driven. This means that for most options you first have to enter the menu that holds this option before you can execute it. There is a set of general options that can always be executed, no matter which menu you are in. These options normally fill the upper two thirds of the terminal window or the screen of the separate terminal. The options that are specific to the menu you are presently in are on the lower lines. Among the general commands you will see a few lines called menus. These are the commands that activate menus. You can always leave a menu, and go back to where you came from, by typing END. There is no need to always go back to the main menu before you go to another menu. The route you have taken through the menus is listed in the rightmost column of the terminal window or the screen of the separate terminal.

In some menus you will find the command MORE. The execution of this command will add new options to this menu. Normally only the most used options are directly visible. That is done in order not to overload the user with options. MORE cannot be undone. MORE only needs to be executed once per menu.

In several menus there are even more commands available after you typed MORE. If you type HIDDEN you get a short list of hidden commands in about ten menus. These commands are normally not documented further than by the text supplied by the HIDDEN command.

For the experienced user the possibility is built in to use most commands from every menu. To do so, you need to know the command's name, be able to use it without help, and understand the way WHAT IF works. You activate this possibility by starting the command line with a percent sign. E.g. %SHOSOU will execute the soup menu command SHOSOU no matter which menu you are in.
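
The menu behaviour described in this section can be pictured as a stack of menu names plus one extra rule for the percent sign. The Python sketch below is only an illustration of that idea and says nothing about how WHAT IF is really programmed. The class `MenuSession`, its methods, the label "MAIN" and the small option table are all hypothetical; only the names SOUP, GRAFIC, SHOSOU and END come from this writeup.

```python
class MenuSession:
    """Toy model of menu navigation; not WHAT IF's real code."""

    def __init__(self, menu_options):
        # menu_options maps a menu name to the set of options it holds.
        self.menu_options = menu_options
        # Route taken through the menus ("MAIN" is a made-up label).
        self.path = ["MAIN"]

    def enter(self, menu):
        """Go to another menu; no need to return to the main menu first."""
        if menu not in self.menu_options:
            raise ValueError(f"unknown menu: {menu}")
        self.path.append(menu)

    def end(self):
        """END leaves the current menu and goes back to where you came from."""
        if len(self.path) > 1:
            self.path.pop()

    def execute(self, line):
        """Return (menu, option) for the option that this line would run."""
        word = line.strip().upper()
        if word.startswith("%"):
            # Leading percent sign: search every menu for the option.
            name = word[1:]
            for menu, options in self.menu_options.items():
                if name in options:
                    return (menu, name)
            raise ValueError(f"no menu holds {name}")
        current = self.path[-1]
        if word in self.menu_options.get(current, set()):
            return (current, word)
        raise ValueError(f"{word} is not available in menu {current}")
```

With `session = MenuSession({"SOUP": {"SHOSOU"}, "GRAFIC": set()})` and `session.enter("GRAFIC")`, the call `session.execute("%SHOSOU")` finds the option in the SOUP menu, while `session.execute("SHOSOU")` without the percent sign is refused.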

User interaction

There are a few things about interaction with WHAT IF that everyone should know before starting to work with WHAT IF.

Whenever you have activated an option which requires additional user input, you can cancel the option by typing 0 (zero) as answer to any of the follow-up questions. If zero is not acceptable to WHAT IF, it will tell you so; do not worry, because more questions will follow, and at least one of them allows you to enter 0 (zero) to bail out. This always applies when you are prompted for a file name, for a residue, for a residue range, for a group, or for a row. In case you are prompted for something else, try 0 (zero) as input; there is no way that this can crash the program.

Input residue numbers

If you are prompted for a residue or a residue range, you can respond in several ways. The first possibility is to just type the residue number(s) which WHAT IF has assigned to your residue(s) (or drugs, or waters). These are just sequential numbers, starting with 1 for the first residue encountered, etc.

Use the PDB names

If your input file used a different scheme for the numbering of residues, you can give your own number(s) by typing O (the character O, not the digit zero) followed by the original residue number(s). In contrast to the strict PDB rules, these do not need to be numerical; WHAT IF will also accept names like 17A. Use O as the first character of the line only, not in front of every residue or drug you give. This holds for all options throughout WHAT IF. The original names are always listed by WHAT IF in brackets.

Residue input via picking

If you give just P, you will be asked to pick the residue(s) on the screen. In this case you can pick any atom in the residue(s) you want. Be aware that this option is not implemented everywhere in WHAT IF. You had better test whether certain options function with P input the day BEFORE you have to give this demonstration to the director general of your company...

Input all residues

If you want to input all residues (protein and DNA/RNA) as a range, you can just type ALL.

Input the total soup

If you want to input all amino acids, DNA/RNA, co-factors, and water you can type TOT.

Input by molecule number

In case you want to enter one entire molecule you can give M followed by the molecule number (as assigned to the molecule by WHAT IF).

Separating between identical molecules with U

In case you have multiple copies of one molecule (for example before and after a Molecular Dynamics run) you can type U followed by first the molecule number and then the two original residue names. U3 17A 123 will use the residues 17A till 123 (according to the original numbering scheme) from the third molecule.

Separating between identical molecules with S

In case you have multiple copies of one molecule (for example before and after a Molecular Dynamics run) you can type S followed by first the molecule number and then the two sequential residue numbers. S3 18 23 will use the 18th through 23rd residue from the third molecule.
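
The input forms described in the sections above (sequential numbers, O, P, ALL, TOT, M, U and S, plus 0 to cancel) can be summarized in a small parser. This Python sketch is an illustration only, written under several assumptions: the names `Selection` and `parse_selection` are hypothetical, O is assumed to be typed as a separate word, a range is assumed to consist of at most two residues, and family or cluster names are not handled. WHAT IF's own input handling may well differ in the details.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Selection:
    """Result of interpreting one line of residue input (hypothetical)."""
    kind: str                       # sequential, original, pick, all, total, molecule
    molecule: Optional[int] = None  # molecule number for M, U and S input
    first: Optional[str] = None     # first residue of the range
    last: Optional[str] = None      # last residue of the range


def parse_selection(text: str) -> Optional[Selection]:
    """Interpret a residue or range answer; None means 'cancelled with 0'."""
    tokens = text.split()
    if not tokens:
        raise ValueError("empty input")
    head, rest = tokens[0].upper(), tokens[1:]
    if head == "0":
        return None  # zero bails out of the option
    if head in ("P", "ALL", "TOT"):
        kind = {"P": "pick", "ALL": "all", "TOT": "total"}[head]
        return Selection(kind)
    if head[0] in "MUS" and head[1:].isdigit():
        molecule = int(head[1:])
        if head[0] == "M":
            return Selection("molecule", molecule)
        if len(rest) != 2:
            raise ValueError("U and S need a first and a last residue")
        # U uses the original residue names, S the sequential numbers.
        kind = "original" if head[0] == "U" else "sequential"
        return Selection(kind, molecule, rest[0], rest[1])
    if head == "O" and 1 <= len(rest) <= 2:
        return Selection("original", None, rest[0], rest[-1])
    if all(t.isdigit() for t in tokens) and len(tokens) <= 2:
        return Selection("sequential", None, tokens[0], tokens[-1])
    raise ValueError(f"cannot interpret {text!r}")
```

For instance, `parse_selection("U3 17A 123")` yields a selection of kind "original" in molecule 3 running from 17A to 123, and `parse_selection("S3 18 23")` yields the sequential range 18 to 23 in the same molecule.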

Addressing groups of residues, families

A family is defined as a group of one or more amino acids consecutively located in the sequence. Families are nothing very intelligent; they are just a way of giving names to stretches of amino acids. One can, for example, give all major secondary structure elements their own name. Families can be used as input for options at several stages. It is, for example, possible to give families a color, or to delete all residues in a family.

Commands that are related to usage of families are easily recognized because they have the three letter combination FAM in their name. The CLUFAM option brings you to the menu that deals with families and clusters.

Whenever you are prompted for one or more ranges you can also enter a family name.

Addressing groups of residues, clusters

A cluster is a group of residues that do not need to sit next to each other in the sequence. In a way clusters are sets of families.

Commands that are related to usage of clusters are easily recognized because they have the three letter combination CLU in their name. The CLUFAM option brings you to the menu that deals with families and clusters.

Whenever you are prompted for multiple ranges you can also enter a cluster name.
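
The difference between a family and a cluster is easy to see in a small data model. In this Python sketch a family is one named stretch of consecutive residues, and a cluster is a named set of families, so its residues need not be neighbours in the sequence. The classes `Family` and `Cluster`, and the example names in the usage note, are hypothetical and chosen only for illustration; sequential residue numbers are assumed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Family:
    """A named stretch of consecutive residues (sequential numbering)."""
    name: str
    first: int
    last: int

    def __post_init__(self):
        if self.first > self.last:
            raise ValueError("first residue must not come after last residue")

    def residues(self):
        """All residue numbers in the stretch, both ends included."""
        return list(range(self.first, self.last + 1))


@dataclass(frozen=True)
class Cluster:
    """A named group of residues, modelled here as a set of families."""
    name: str
    families: Tuple[Family, ...]

    def residues(self):
        """Sorted residue numbers of all member families, without duplicates."""
        seen = set()
        for family in self.families:
            seen.update(family.residues())
        return sorted(seen)
```

A cluster built from `Family("HELIX1", 5, 8)` and `Family("STRAND1", 20, 21)` contains residues 5 to 8 and 20 to 21, which is a group that no single family could describe.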

Long input lines and type ahead

WHAT IF allows for type ahead. So you can type a command and all its input on one line. If you use this feature, you should provide ALL requested input, because WHAT IF is not smart enough to guess where you want to use defaults. You can also often (but not always) put multiple commands on one line. You cannot always type ahead beyond the GRAFIC command, beyond a zero, beyond a file name, or beyond a YES/NO question. It is not guaranteed that WHAT IF will work error free if you type ahead past a complicated command with much additional input.

If you use type ahead, always give the first AND the last residue of any range, also if the first and the last residue are the same.
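
One way to picture type ahead is as a queue of words: every question the program asks takes the next word from the line, and the user is only prompted when the queue is empty. The Python sketch below illustrates this under that assumption. It is not taken from WHAT IF, and the class `TypeAhead`, its methods and the command name DEMO are hypothetical. The model also suggests why all requested input must be given: a skipped answer shifts every later word to the wrong question.

```python
from collections import deque


class TypeAhead:
    """Toy model of type ahead: answers are consumed strictly in order."""

    def __init__(self, ask=input):
        self.pending = deque()  # words typed ahead but not yet used
        self.ask = ask          # called only when no typed-ahead word is left

    def feed(self, line):
        """Store all words of an input line for later questions."""
        self.pending.extend(line.split())

    def next_answer(self, prompt):
        """Answer a question from the queue, or prompt the user."""
        if self.pending:
            return self.pending.popleft()
        return self.ask(prompt)
```

After `feed("DEMO 17 17")` the command and both ends of the range are answered from the line itself. After `feed("DEMO 17")` the question for the last residue finds the queue empty, so the user is prompted, or, if more words follow on the line, the next command name would be taken as the last residue.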

Command nomenclature

Many commands in WHAT IF are constructed from two groups of three characters. The following three character codes always have the same or a very similar meaning (this table is not yet complete):
AA     Amino Acid (Often also used for DNA/RNA...)
AAS    Amino Acid Sidechain (Often also used for DNA/RNA...)
ACC    Has to do with ACCessibility
ALI    ALIgnment
ANA    ANAlyse
AND    Logical AND operation
ATM    AToM
BAD    BAD (is not good)
BFT    B-FacTor
BLD    BuiLD (mainly protein)
BND    BoND between atoms
CAV    CAVity
CEL    Crystallographic CELl
CEN    CENter
CHK    CHecK
CHI    Torsion angle
CLU    CLUster of 3D related residues
COL    COLour
CON    CONtact
COR    CORrect
CPK    Solid spheres
CYS    CYSteine or cysteine bridge
DBL    DouBLe
DEB    DEBump, remove bumps
DEF    DEFault or DEFine
DEL    DELete or remove
DG     Distance Geometry rotamer and loop search
DIF    DIFference
DNA    DNA (or RNA!)
DST    DiSTance
EDT    EDiT
ENV    ENVironment, molecules to be taken into account
ETM    Energy TErm
EVA    EVAluate
FAM    FAMily, range of covalently connected residues
FLP    FLiP, or turn around
FPO    Phi-psi-omega, backbone torsion angles
GET    Read from a formatted file (see MAK, SAV, RES)
GRA    GRAphics
GRL    Superpose all frames/hits/etc. in one MOL-item
GRI    GRIn and grid
GRO    GROmos
GRP    GRouP of database hits
H2O    Water
HBO    Hydrogen BOnd
HEL    HELix
HID    HIDden (as in hidden, or invisible options)
HIT    HIT as in a hit in a database search
HSP    HSsP (multiple sequence alignment files)
HST    Helix, Strand, Turn (in other words: secondary structure)
HYD    HYDrogens
INI    INItializes something.
INV    INVerse (normally used for TRUE <--> FALSE inversions)
LAB    LABel (not picked label, but label in MOL-item)
LIN    LINe
LOG    LOGfile in which options/commands/results are written
MAK    Write in a formatted file (see GET, SAV, RES)
MAP    3D electron density, property or probability distribution MAP
MAT    MATrix
MLS    MoLeculeS (see MOL)
MOL    MOLecule (unless MOL-object is meant) (see MLS)
MOM    MOMent as in hydrophobic MOMent
MUT    MUTate
NAM    NAMe
NEU    NEUral net
NEW    Replace something by an improved copy
NMR    Nuclear Magnetic Resonance
OPT    OPTimize (sometimes OPTionally...)
OR     Logical OR operation
PCK    PiCKed labels
PAR    PARameters
PAS    PASte
PCT    PerCenT
PHI    Backbone torsion angle PHI
PRF    PReFerred or PReFerence or PRoFile
PRP    PRoPerty (sometimes PRePare)
PSI    Backbone torsion angle PSI
QUA    QUAlity of packing
REF    REFine (=regularize) a protein structure
RES    REStore results from a WHAT IF specific file (see SAV, MAK, GET)
RNG    RaNGe of residues
ROT    ROTamer
SAV    SAVe results in a WHAT IF specific file (see RES, MAK, GET)
SCN    SCAN3D; relational structure sequence database
SDB    Show something from the DataBase
SHL    SHelL
SEQ    SEQuence (see SQS)
SET    Calculates something, without showing the results
SHO    Lists results, and displays them if applicable
SMC    SyMmetry Contact
SML    SMalL
SOU    SOUp (all WHAT IF's data)
SPC    SPeCial
SPH    SPHere in space
SQS    SeQuenceS (see SEQ)
SRF    SuRFace
STA    STAtus
STS    STatisticS
SUP    SUPerposition
SYM    SYMmetry
TAB    TABles (internal molecular spread sheet)
TRA    TRAjectory (sometimes TRAnslate)
TST    TeST
USE    USE or activate or incorporate
VAC    VACuum
VAL    VALue
VDD    Van der Waals (surface)
VDW    Van der Waals (radii)
VOL    VOLume
WAL    What if ALignment
WAT    WATer
ZON    ZONe of residues (see ZNS)
ZNS    Multiple ZoNeS (see ZON)
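
Because many command names consist of two three-character codes, a name can often be decoded by splitting it in the middle and looking up both halves. The Python sketch below does this for a small excerpt of the table above. The function `explain_command` and the dictionary `CODES` are hypothetical helpers for illustration; they are not part of WHAT IF, and commands that do not follow the two-times-three pattern are simply rejected.

```python
# Small excerpt of the code table above; extend as needed.
CODES = {
    "CLU": "CLUster of 3D related residues",
    "FAM": "FAMily, range of covalently connected residues",
    "SHO": "Lists results, and displays them if applicable",
    "SOU": "SOUp (all WHAT IF's data)",
}


def explain_command(name, codes=CODES):
    """Split a six-character command into two codes and describe each."""
    name = name.strip().upper()
    if len(name) != 6:
        raise ValueError("only six-character commands can be split this way")
    halves = (name[:3], name[3:])
    # Codes missing from the table are reported as unknown, not guessed.
    return [(half, codes.get(half, "unknown code")) for half in halves]
```

`explain_command("SHOSOU")` returns the descriptions for SHO and SOU, which together read as "show the soup"; `explain_command("CLUFAM")` returns those for CLU and FAM.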

Graphical items and objects

WHAT IF uses MOL-items wherever possible to represent graphics objects. You can find in the programmers manual what a MOL-item looks like. But in practice it is just a list of vectors or dots. The user preferably should give a unique name to every MOL-item that is created. This name stays attached to this MOL-item. The name can later be used to toggle MOL-items on and off, to delete them, to plot them, etc.

MOL-items are grouped in MOL-objects. A MOL-object can hold many MOL-items. If you do not use too many different colors per MOL-item, then an object can easily hold 20 of them (89 on S.G. machines). This might sound rather limited, but don't worry, I still have to meet the first user who managed to run into overflow problems while creating pictures at the screen.
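
The relation between MOL-items and MOL-objects can be sketched as a named container with limited room. The Python model below is a loose illustration and not the layout described in the programmers manual. The classes `MolItem` and `MolObject` are hypothetical, and the default capacity of 20 merely echoes the rough figure mentioned above, which in reality depends on the number of colors used and on the machine.

```python
from dataclasses import dataclass, field


@dataclass
class MolItem:
    """A named list of vectors or dots (contents left abstract here)."""
    name: str
    vectors: list = field(default_factory=list)
    visible: bool = True


class MolObject:
    """Holds MOL-items by unique name, up to an assumed capacity."""

    def __init__(self, capacity=20):
        self.capacity = capacity
        self.items = {}

    def add(self, item):
        if item.name in self.items:
            raise ValueError(f"MOL-item name already in use: {item.name}")
        if len(self.items) >= self.capacity:
            raise OverflowError("MOL-object is full")
        self.items[item.name] = item

    def toggle(self, name):
        """Switch a MOL-item on or off; return its new visibility."""
        item = self.items[name]
        item.visible = not item.visible
        return item.visible

    def delete(self, name):
        del self.items[name]
```

The unique name is what makes the later operations possible: toggling and deleting both address a MOL-item purely by the name it was given when it was created.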

Using protons

WHAT IF was originally designed to work without explicit protons. We are presently adapting the program to accept protons as independent atoms. This cannot be done overnight. Many options can presently deal with explicit protons correctly. Several options cannot yet. If you want to use explicit protons, give the following magical command as the first command in a WHAT IF session:
SETICO 29 1
Be aware, however, that several options will not yet treat the protons correctly, and some options will even create a stack-dump if used with the proton option active.

The protonisation is expected to be finished by mid 1996.

Program parameters

In many places in the program distances, cut-off radii, or other parameters are used. In all cases there are default values which are sensibly chosen. However, one still might want to change these parameters. In most menus there is an option called PARAMS. This option will bring you into a small sub menu in which you will find all parameters. These parameter menus are sometimes organized slightly differently from other menus. The most important difference is that you HAVE to give END to get out of this parameter setting menu before you can do anything else again; the general commands often do not work in parameter menus.

It is also possible to change parameters that are needed by the general commands. This can be done by typing SETPAR. This brings you into a menu which has many sub-menus in it. Each of these sub-menus allows for the changing of a group of parameters that logically belong together (e.g. all parameters that influence the way accessibilities are being calculated, or 3-D fragment database parameters, etc.).

Release of WHAT IF 5.0

The next version of WHAT IF (version 5.0) is planned to be released sometime in 1996. The following options are planned for this release:
- A clicker-dee-click user interface that requires virtually no typing.
- Completely automatic mutant prediction module.
- Interactive torsion angle motion for all angles, not only side-chain.
- Adding co-factors from the PDB files to WHAT IF's database.
- User defined program lay-out. E.g. Menu color, window size, etc. will
  be read from a user edit-able parameter file.
- Some small molecule operations will be added.
- An extensive protein structure verification module will be added.
  etc.
Users who plan to make large extensions to WHAT IF are urged to contact me first. I can then tell them if their plans are already being implemented by somebody else. This might save many weeks of programming effort. To start programming one of the above options is of course certainly a waste of time, because in less than half a year's time they will be available.

Known bugs

There are still some bugs in WHAT IF. Several of them have not even been detected yet. A few of them are known to me, and are not worth fixing. Several others are easy to predict for several reasons, but despite the near certainty of their presence I have not yet found them. Some of the known bugs are double calculations. In many places things are simply always calculated. The program is already very big (over 300,000 lines), and I do not mind having to recalculate things if this does not cost measurable CPU time but saves many program lines. Sometimes this results in double or triple messages about something being measured or calculated.

Sorry for this, but it is just Rob and me doing the everyday programming, and not a team of ten or twenty scientists and programmers as for some of the extremely expensive, and just as buggy, commercial programs.

Overview of the program

The program WHAT IF offers the user over thirteen hundred options. The writeup needed to describe all options is more than 500 pages long. It is therefore impossible to summarize all options in a few paragraphs. To summarize WHAT IF I will simply list some of the menus alphabetically:

ACCESS    Van der Waals and accessible surface options.
ANACON    Analysis, evaluation and visualization of contacts.
ANATRA    Analysis of Molecular Dynamics trajectories.
BUILD     Building proteins, adding residues.
CHIANG    Torsion angle evaluation, manipulation, analysis.
CHKMDF    Evaluation of (H,K,L,F,sigma) files.
CHECK     To check protein structures.
COLOUR    Coloring atoms, residues, molecules, objects.
CONOLY    Interface to Connolly's programs.
CLUFAM    To group residues in 3D.
DGLOOP    Structure fragment database.
DRUG      A few options to manipulate small molecules.
GRAEXT    Special graphics. Arrows, ball and stick models etc.
GRAFIC    General 3D graphics menu.
GRATWO    2D Graphics menu. (Phi-Psi plot, B-factor plot, etc.)
GRID      Interface to Goodford's GRID program.
GROMOS    Interface to GROMOS.
HBONDS    Hydrogen bond determination, evaluation and display.
HSSP      Interface to HSSP program (mutability prediction).
ITMADM    To manipulate graphical objects.
LABEL     Labeling atoms, residues, etc.
MAP       Administration and display of maps.
MASMAP    Manipulation, editing of maps.
MAPEDT    Crystallographic electron density envelope editor.
NEURAL    Activates a toy neural network.
NMR       NMR related commands.
NOTES     Is a protein specific notebook.
PIRPSQ    Sequence options (alignment, model by homology etc.)
PLOTIT    Plot options.
PORNO     To do molecular pornography (=really beautiful pictures)
QUALTY    Structure quality evaluation, mutant prediction.
REFINE    Structure regularisation.
SCAN3D    Relational protein structure database handler.
SCNSTS    Does statistics on relational database query results.
SEARCH    Interactive search for structure characteristics.
SETPAR    Parameter (re-)setting.
SOUP      Molecular administration (read/write/delete).
STEREO    Switch several stereo modes on/off.
SUPPOS    Superposition of molecules, residues, fragments.
SYMTRY    Symmetry matrix administration/application.
TABLES    Spread sheet for atomic data.
WALIGN    Multi sequence alignment.
WATER     Manipulation of water molecules.
XRAY      Holds a few Xray specific options.
3SSP      Automatic multiple structure superposition.