WHAT IF also contains tools to aid the elucidation of protein structures. For example, X-ray density maps can be displayed, and map-fitting quality can be evaluated.
In the near to medium-term future, the main topics of improvement for WHAT IF will be:
user friendliness
drug design tools (docking, CCDB interface)
improved modeling tools
more fully automatically written reports
First, three conceptually difficult, but very important, aspects of WHAT IF will be discussed. These are:
Dual control from either a text window or a graphics window
The use of MOL-items as three-dimensional photos of molecules
Residue numbering
After that you will learn some of the general commands, how to navigate through the menus, and how to get help or information about options.
Thereafter you will learn how to display molecules, residues and atoms.
The first part of the tutorial closes with some exercises.
The second part of the tutorial starts with some more complicated general options, such as mutating residues, cutting and pasting molecules, etc. You will also learn how to understand the names of commands. After that, several normal everyday options are discussed in a rather arbitrary order.
The third part of the tutorial consists of exercises, from which you should choose 10 to 15.
Part four contains three exercises that can only be done if additional software (GROMOS, RIBBON, GRID, and PLUTON) is installed.
Part five contains exercises for more difficult or less general options. If there is time left, you can try to complete those that you find interesting.
Before you start the tutorial, copy all files from the directory ..../whatif/tutorial to your directory.
Let's start with an example. Start WHAT IF, and type (literally):
GETMOL 1CRN.PDB   (The file 1CRN.PDB is crambin. 1CRN must be in capital letters)
Crambin
GRAPHIC
SHOALL 1 PICT
CENTER
GO                (Click on the green box CHAT after a second or so)
LISTA 13
%INISOU
LISTA 13
GO                (Click on the green box CHAT after a second or so)
Forget all the details of the commands. Just try to understand that you read in a molecule (crambin) and displayed it in the graphics window. GETMOL reads PDB files, and 1CRN.PDB is the PDB file for crambin. The molecule was stored in the so-called SOUP. You listed the coordinates of residue 13 with LISTA. Thereafter you deleted the molecule from the soup (with the %INISOU command; more about that later). The second LISTA command told you that there is nothing in the soup. However, the molecule was still happily sitting in the graphics window. That is because with SHOALL 1 PICT you made a 3D photo of the molecule. This 3D photo is called a MOL-item; you told WHAT IF that its name is PICT, and that it should be stored in MOL-object 1. MOL-objects are little buttons at the bottom of the screen in which you can store photos called MOL-items. You saw that the button labeled MOL1 turned yellow when the photo called PICT was stored in it.
This separation between a soup in which the molecules are manipulated, and a graphics screen that holds static photos of the soup is the second of the three complicated, but important aspects of WHAT IF. Your understanding of this separation is essential for your work with WHAT IF.
Use the command INIALL to restart WHAT IF.
Give the command SOUP. Approximately the following menu should pop up:
HELP    INFO    SHELL   SHORTT  END     $..     %..     !..     SCRIPT
SOUP    GRAFIC  GRATWO  GRAEXT  COLOUR  PLOTIT  PORNO   ITMADM  LABEL
SOUP    ACCESS  ANACON  HBONDS  SYMTRY  SUPPOS  REFINE  QUALTY
DGLOOP  SCAN3D  TABLES  BUILD   HSSP    WATER   CHIANG  SETVDW
GROMOS  ANATRA  ESSDYN  GRID    3SSP    SEARCH  PIRPSQ  WALIGN
NMR     MAP     XRAY    CONOLY  MASMAP  CHKMDF  MAPEDT
SETPAR  SPCIAL  EXTRA   CLUFAM  NOTES   CHECK
MUTATE  DEBUMP  LISTA   LISTR   SHOHST  RENUMB  HISTOR  GETMOL
DOLOG   NOLOG   TRAROT  CHARGE  MINMAX  DIST    GO      FULLSTOP
---------------------------------------------------------
SHOSOU  INISOU  DELMOL  MAKMOL  DELETE  DELMLS  PASTE   CUT  INIPAS  SAVAA  RESAA  SHOCYS  SETCYS  MORE
WHAT IF>
You see two sets of commands. Those above the line are always active. Those below the line are only active in the SOUP menu. We will explain the commands above the line later. Type:
GETMOL 1CRN   (WHAT IF knows that you mean .PDB at the end; you do not have to type that)
Crambin
The full dialog will look like:
WHAT IF> getmol
Give the name of the coordinate file : 1CRN   (be careful about uppercase)
Give the set-name : Crambin
  1 - 10  THR THR CYS CYS PRO SER ILE VAL ALA ARG
 11 - 20  SER ASN PHE ASN VAL CYS ARG LEU PRO GLY
 21 - 30  THR PRO GLU ALA ILE CYS ALA THR TYR THR
 31 - 40  GLY CYS ILE ILE ILE PRO GLY ALA THR CYS
 41 - 46  PRO GLY ASP TYR ALA ASN
Other atoms found in file      1
This "other atom" is the C-terminal oxygen.
WHAT IF first does some checks on the file. However, since crambin is virtually free from errors, nothing gets reported. The residues are read, and listed.
Let's now try some of the soup commands.
Type SHOSOU. Approximately the following text should show up:
Contents of the SOUP:
Protein .................... :  1
Drug, ligand or co-factor .. :  0
DNA or RNA ................. :  0
Solvent not water or ion ... :  0
(Groups of) water .......... :  0

Molecule   Range                    Type      Set name
   1        1 (1  )   46 (46 )      Protein   Crambin
   2       47 (OXT)   47 (OXT)      O2 <--    Crambin
The number of molecules in each class is counted. In the case of water, all waters that were read from one PDB file are together called one group-of-waters molecule. The second part of the SHOSOU output is a list of the molecules. Here you see that each residue has two numbers: the sequential number in the SOUP, and, between brackets, the number that the residue has in the PDB file. You also see that each molecule has the set name (the one you gave when you were asked for it) attached to it. The last part just lists which molecule types exist in the soup.
WHAT IF knows 6 classes of molecules.
1) Proteins.
2) Drugs, co-factors.
3) Nucleic acids.
4) Single atomic entities (e.g. metal ions).
5) Water.
6) Attached groups.
OXT is the second oxygen on the C-terminal residue. Normally a residue has only one backbone oxygen, but the C-terminal residue has two. This second oxygen is called an attached group. It is attached to the backbone C of the C-terminal residue.
Type:
LISTA 46
and you see that two atoms are labeled with an arrow. These arrows indicate either that an attached group is bound there, or that the attached group is bound via that atom.
To see the use of the set name, type
GETMOL 1CRN
Copy2
SHOSOU
You have read crambin a second time, so there are now two crambin molecules in the SOUP. The set name is the only way to see which is which.
The third important concept of WHAT IF is residue numbering.
I assume you are still in the SOUP menu. Type:
INISOU
GETMOL 1CRN
Crambin
DELETE 1      (to delete the first residue)
SHOSOU
LISTA 1
With DELETE 1 you deleted the first residue from the SOUP. You see that the second residue has now become the first one, but its number between brackets is still the original number two.
Type:
LISTA 4
LISTA O4      (character O, not number zero)
You see that, as expected, LISTA 4 (LISTA stands for LIST Amino acid, but can be used to list the atomic information of everything) lists the fourth residue in the SOUP. However, LISTA O4 lists the residue that has the number four in the PDB file. To take it a bit further, type:
DELETE 17     (forget the warnings about split molecules for now)
LISTA 18
LISTA O18
And now, try to explain this...
Let's return to the warnings at the DELETE 17 command. WHAT IF realized that taking a residue out of the middle of a molecule causes a problem. To avoid making a 5 Angstrom bond between residues 16 and 17 (numbers after cutting; their PDB numbers, given in brackets, are 17 and 19), it makes two molecules out of the one it had before.
The same problem that applies to residues applies, of course, to molecules. If you delete the first molecule, the second becomes the first, etc.
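The two numbering schemes can be modelled in a few lines. The following Python sketch is a toy model, not WHAT IF code: the class name Soup and the methods delete and lookup are hypothetical, and real PDB numbers can carry insertion codes or names such as OXT, which are ignored here. It replays the DELETE 1 / DELETE 17 session above.

```python
class Soup:
    """Toy model of the SOUP: soup position i+1 remembers its original PDB number."""

    def __init__(self, pdb_numbers):
        self.pdb_numbers = list(pdb_numbers)

    def delete(self, seq):
        """Remove the residue with sequential number seq (1-based), like DELETE."""
        del self.pdb_numbers[seq - 1]

    def lookup(self, ref):
        """Return (sequential, pdb) for '4' (soup order) or 'O4' (PDB number).

        Returns None if no residue carries the requested PDB number.
        """
        if ref.upper().startswith("O"):
            pdb = int(ref[1:])
            if pdb not in self.pdb_numbers:
                return None
            return self.pdb_numbers.index(pdb) + 1, pdb
        seq = int(ref)
        return seq, self.pdb_numbers[seq - 1]


soup = Soup(range(1, 47))   # crambin: residues 1..46
soup.delete(1)              # DELETE 1
print(soup.lookup("4"), soup.lookup("O4"))     # (4, 5) (3, 4)
soup.delete(17)             # DELETE 17 removes the residue with PDB number 18
print(soup.lookup("18"), soup.lookup("O18"))   # (18, 20) None
```

In this model the sequential numbers close up after every deletion, while the PDB numbers never change, which is exactly why LISTA 18 and LISTA O18 no longer point at the same residue.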
Type:
DELMOL 1
SHOSOU
Understand...?
HELP SHORTT   (indeed with two Ts)
You get some help about the topic SHORTT (the command SHORT gives short help for the commands below the line, SHORTT for the commands above the line). The odd number at the top left is actually the chapter in the writeup where the command SHORTT is described. Look it up. Also try SHORTT. Now hit RETURN and type:
INFO MAKMOL
You get so much information that it scrolls off the screen. INFO gives you the introductory paragraph of the SOUP chapter in the writeup, and the paragraph that deals with the MAKMOL command. (If you are in another menu, you will of course not get the introductory paragraph for the SOUP menu, but for the menu you are in.) Now type SHORT. You get a list with one-line explanations of all commands in the SOUP menu. You now know the four levels of HELP in WHAT IF:
1) Just hit RETURN and WHAT IF will tell you what to do, or show the menu.
2) Type SHORT (or SHORTT) and you get a one-line explanation.
3) Type HELP *** and you get help for command ***.
4) Type INFO *** and you get some background information plus help for ***.
Type:
COLOUR
SHOSOU
%SHOSOU
END
With COLOUR you went into the COLOUR menu. This is not the SOUP menu, and therefore the SHOSOU command does not work. However, by starting a command with a % sign you tell WHAT IF that the command exists somewhere, and that it should search all menus to find it. The % sign can be used for all commands with a unique name. So, for example, SHORT cannot be used after a % sign, since WHAT IF would not know which of the 60 SHORT commands to take.
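The % lookup can be pictured as a search over all menus that only succeeds when exactly one menu contains the command. The Python sketch below is a hypothetical model, not WHAT IF's implementation; the function name find_menu and the menu contents are a small made-up subset.

```python
def find_menu(command, menus):
    """Return the one menu that contains command, mimicking a %-prefixed command."""
    name = command.upper().lstrip("%")
    hits = sorted(menu for menu, commands in menus.items() if name in commands)
    if not hits:
        raise KeyError(f"{name} is not known in any menu")
    if len(hits) > 1:
        raise ValueError(f"{name} is ambiguous, found in {hits}")
    return hits[0]


# Hypothetical subset of two menus; SHORT exists in every menu.
MENUS = {
    "SOUP": {"SHOSOU", "INISOU", "DELMOL", "SHORT"},
    "COLOUR": {"COLMOL", "COLBFT", "COLBIN", "SHORT"},
}
print(find_menu("%SHOSOU", MENUS))   # SOUP
```

Calling find_menu("%SHORT", MENUS) raises ValueError, which mirrors why SHORT cannot be combined with the % sign.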
LISTA
13
..            (The real UNIX freaks can use !! instead of ..)
14
LISTA 12
..
What you see is that WHAT IF always stores complete input lines that start with a menu command. So if you type LISTA, it gets memorized, but 13 is not stored. However, if you type LISTA 12, that complete command gets stored, and if you repeat the command with .. or !!, WHAT IF recalls the command LISTA 12 from its memory; this is a complete command, and thus gets executed. So, don't use type ahead if you plan on using the .. mechanism.
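The recall rule can be modelled as follows: a line is remembered only if its first word is a menu command, and .. or !! replays the last remembered line. This Python sketch is a hypothetical model of the behaviour described above, not WHAT IF code; the class name History is made up.

```python
class History:
    """Hypothetical model of the .. / !! recall mechanism."""

    def __init__(self, menu_commands):
        self.menu_commands = {c.upper() for c in menu_commands}
        self.last = None   # last complete line that started with a menu command

    def feed(self, line):
        """Return the line that will actually be executed for this input."""
        text = line.strip()
        if text in ("..", "!!"):
            return self.last
        words = text.split()
        if words and words[0].upper() in self.menu_commands:
            self.last = text   # "LISTA" or "LISTA 12" is kept; an answer like "13" is not
        return text


h = History(["LISTA"])
for typed in ["LISTA", "13", "..", "14", "LISTA 12", ".."]:
    print(f"{typed!r:12} -> {h.feed(typed)!r}")
```

The first .. only brings back the bare LISTA, so WHAT IF prompts for a residue again; the second .. brings back the complete LISTA 12.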
$ ls          (Use $ DIR on PC-DOS machines)
Commands that start with a $ sign are sent to the operating system.
TABLES
HBONDS
SUPPOS
REFINE
and look at the rightmost column on the screen. You see the path that you took through the menus. Continue your path deeper into WHAT IF with:
TABLES
ACCESS
BUILD
COLOUR
And look again at the menu-path column. WHAT IF tells you that you went in too deep. Don't worry, WHAT IF will only crash after you go 73 more menus deep. Type SOUP, and check that you can really execute the SOUP commands (e.g. SHOSOU).
With the command END (HALT, STOP or EXIT will also work) you go back to the previous menu. Type END a couple of times. You see how you slowly eat your way back up the menu tree. Hit RETURN in between to see which menu you are really in at every moment.
For the computer programmers among you: You see that the menu TABLES was entered recursively in the above example. WHAT IF knows how to deal with this problem.
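A stack is the natural way to handle this: entering a menu pushes its name, END pops it, and the same menu may appear on the stack more than once. The following Python sketch only models the behaviour described above; the root name MAIN and the depth limit are hypothetical and do not reflect WHAT IF's real values.

```python
class MenuPath:
    """Hypothetical model of the menu path: a stack of menu names."""

    def __init__(self, root="MAIN", max_depth=80):
        self.path = [root]
        self.max_depth = max_depth   # made-up limit, not WHAT IF's real one

    def enter(self, menu):
        """Typing a menu name pushes it, even if it is already on the path."""
        if len(self.path) >= self.max_depth:
            raise RuntimeError("menus nested too deeply")
        self.path.append(menu)

    def end(self):
        """END returns to the previous menu (the root is never popped)."""
        if len(self.path) > 1:
            self.path.pop()
        return self.path[-1]


p = MenuPath()
for menu in ["TABLES", "HBONDS", "SUPPOS", "REFINE", "TABLES", "ACCESS", "BUILD", "COLOUR"]:
    p.enter(menu)
print(" > ".join(p.path))   # TABLES appears twice: it was entered recursively
print(p.end(), p.end())     # BUILD ACCESS
```

Because each END simply pops the top entry, a menu that occurs twice on the path causes no confusion.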
WHAT IF has some 40 large and 20 small menus. During the rest of this course we will inspect roughly half of them.
INISOU
GETMOL 1CRN
Crambin
GRAFIC        (or GRAPHIC if you like that better)
SHOALL 1 A    (You now see the picture of crambin in the upper right corner of the screen)
CENTER
GO
With this picture we are going to play for a while. First, push down the left mouse button, and move the mouse back and forth, and up and down. Then push the middle mouse button, and move the mouse again. After you have typed GO, all interaction with WHAT IF goes via the mouse.
Put the cursor exactly on top of an atom, and push either the left or the right mouse button. You see that the atom gets labeled.
Putting the cursor on an object, and pushing one of the two extreme mouse buttons is called picking.
See what happens if you push the left two mouse buttons at the same time, and move the mouse either horizontally or vertically.
There are seven combinations of pushed mouse buttons. They all do something different if you combine them with mouse motion. More about this later in this tutorial.
The file MOUSE.FIG (to be found in the dbdata directory) can be altered if you want WHAT IF to react differently to pushed mouse buttons. You find more information about this in the installation notes.
Pick the box labeled SOUP.
A so-called pull-down menu pulls down. The commands in there should by now look familiar to you, because they are all from the SOUP menu.
Pick HELP in the lower right corner of the screen.
You now get SHORT help for the commands in the soup menu.
Pick SHOSOU in the SOUP pull-down menu.
The text window comes back up, and the same text is shown as if you asked for INFO on SHOSOU in the normal text window. You have to hit RETURN to get rid of the text window again.
Pick HELP again to switch off the HELP facility.
Pick SHOSOU in the SOUP pull-down menu. The text port pops up, and this time you see the same as if you used the SHOSOU command in the SOUP menu. Hit RETURN again to get rid of the text window.
Now either pick the green box labeled SOUP, or double-click anywhere in empty space and the SOUP pull-down menu disappears.
We will now concentrate on the vertical menu at the right side of the screen. Here you find 37 boxes. Why don't we let WHAT IF do the explaining? Pick HELP at the bottom right again. After that, pick one after the other the whole row of menu boxes, starting with WAIT, NOID, etc. You get a little text box explaining each of these menu boxes. Don't pick the CHAR menu box (if there is one). That one is only meant to fix a bug in some of the SG-VGX operating system versions. At the end, pick HELP again to switch off the help mode. Let's try a few options in the real world. If all is OK, you still have a molecule on the screen.
Pick one or two atoms.
Pick DIST     (In the top bar the text "Pick atom one" pops up)
Pick an atom  (The text changes to "Pick atom two")
Pick another atom
Several things happened:
The distance shows up in the top bar
A dashed line is drawn between the atoms
The distance pops up as a label halfway between the atoms
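The DIST read-out is the Euclidean distance between the two picked atoms, and the label sits at their midpoint. A minimal Python sketch, assuming atoms are given as (x, y, z) tuples in Angstrom; the function name is hypothetical.

```python
import math


def dist_and_label_position(atom_a, atom_b):
    """Distance between two picked atoms and the midpoint where the label goes."""
    distance = math.dist(atom_a, atom_b)
    midpoint = tuple((a + b) / 2.0 for a, b in zip(atom_a, atom_b))
    return distance, midpoint


d, mid = dist_and_label_position((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
print(f"{d:.2f} A, label at {mid}")   # 5.00 A, label at (1.5, 2.0, 0.0)
```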
Pick NOID     (Look what got removed from the screen)
Pick NOID     (Look what more got removed from the screen)
So, picking NOID once removes the distance indicators from the screen, and picking it a second time removes the atom labels.
The two most important boxes are probably WAIT and CHAT. CHAT was discussed before (on SG systems you do not need to pick CHAT; you can also hit the ESCape key). Let's see what we can do with WAIT.
Pick DIST     (You are asked to pick an atom)
Pick an atom
But now you find out that it is the wrong atom, or you want to do something else. Either way, you will be asked for a second atom. To get out of this mess, pick WAIT. WAIT stands for "Wait a minute, I did not want that". You will see the DIST box change back to white, and you can continue with the next exercise.
INIGRA
After that, the graphics window is empty. Now type:
ZONES         (You will be prompted for a zone of residues to be displayed)
1 10          (You will be prompted again)
0             (That is generally how you tell WHAT IF to stop prompting)
1             (Tell WHAT IF to put the graphics vectors in something called MOL1)
A             (That is the name of the graphical ITEM that holds the ten residues)
ZONES
11 20
0
2
B             (Now we have twenty residues on the screen)
GRACA         (For part of the molecule we only want to see alpha carbons)
21 46         (Here also you are continuously prompted)
0             (Again, zero to tell WHAT IF "that's all")
3
C             (This set of vectors on the screen is called C, and stored in MOL3)
CENTER
GO
Now see what happens if you pick the menu boxes labeled MOL1, MOL2 and MOL3 in the lowest row of the menu at the bottom of the screen. These are toggle switches for the MOL-objects. Pick CHAT and use the command GRASCH (to show side chains) for residues 31 to 46. Call the MOL-item D, and put it in MOL-object 3. Type GO again; you will see that the menu box MOL3 now toggles two items at the same time.
So, whenever you send something to the graphics display to look at it, it needs a name (the MOL-item) and a location (the MOL-object). Please choose unique names, use only letters and digits, and preferably start with a letter. Don't use blanks in MOL-item names. Displaying will still work, but plotting, deleting, and recalling old MOL-items in a later session will be impossible.
There are two more things that you need to know about the menu at the bottom. If you pick MENU, the menu disappears. If you pick MENU twice, the bottom menu is also gone. You can then only get the menus back by picking with the right mouse button at a location where there is nothing on the screen. This option is useful when you want to take photographs of the screen.
Type:
GRACA ALL 0
1 Q
ACON 1
GO
If you now rotate the molecule, you see that the rotation is centered on the alpha carbon of the first residue. Pick CHAT and type:
INIGRA
DBLBND
SHOALL 1 Q
GO            (Pick CHAT after you have seen this)
You see that DBLBND tells WHAT IF to draw double bonds where applicable. Type DBLBND again if you want all bonds drawn single again in the future. DSHBND tells WHAT IF that you want all bonds to be dashed. DBLBND and DSHBND can be used together if you want. To try DSHBND, type:
INIGRA        (To clear the screen)
DSHBND 1      (To switch the dash-bond mode on. 1 will become the dash length)
SHOALL 1 A
DSHBND        (To switch dashing off for future MOL-items)
GO            (And pick CHAT again once you have seen this)
Now we will make a fancy plot. Type:
SHOHST        (WHAT IF will use DSSP and show the result)
%COLHST ALL 0 (Colouring options will be discussed on the next page)
INIGRA
SPLINE ALL    (SPLINE only accepts one range, so no zero needed at the end)
N             (For now the defaults are OK)
1 A
CENTER
GO            (Put the molecule in a good view)
PLOTIT
PSTPLT
N N A 1 0 0 N
Now type your name
Y             (And now walk to the black and white laser writer)
This option is not yet completely ready, but you can see where it is going...
This extremely beautiful plotting option was a kind donation to WHAT IF by David Thomas.
GRAFIC
INIGRA
COLOUR
COLMOL        (Colour a whole molecule)
1             (Number of the molecule)
120           (Colour number)
END
SHOALL 1 Q
CENTER
GO            (And pick CHAT after you have seen the red molecule)
The correspondence between numbers and colours is:
  1  Blue
 30  Blue-ish purple
 60  Purple
 90  Red-ish purple
120  Red
150  Orange
180  Yellow
190  Light brown
220  Soft green
240  Green
270  Funny green
300  White-ish green
330  Light blue
360  Blue
Whenever WHAT IF prompts you for a colour, you have to give a number between 1 and 360. Instead of a number you may also type one of the following colour names in English: RED, GREEN, YELLOW, BLUE, PURPLE, ORANGE, CYAN, MAGENTA.
DIRECT        (Direct mode is switched on)
SHOALL        (Funny, no MOL-object and MOL-item?)
COLOUR
COLBFT ALL 0  (Colour all atoms as a function of the crystallographic B-factor)
%CENTER
GO
Sometimes, after long sessions, DIRECT gets confused, especially in the X11 version. If you don't see anything on the screen, kill the program with control-C and start again (read crambin in, and repeat the above commands).
We saw two things. After switching on DIRECT mode, you are not asked for a MOL-object or a MOL-item; things are shown DIRECTly, which gave this option its name. Also, if you change the colours, they are updated immediately on the screen, without the need to make a new MOL-item. You see that all residues but one are more or less blue. That is because crambin has one tyrosine with a much higher B-factor than all other residues, and the B-factor range is mapped linearly onto the colour range from blue to red. Type:
COLBIN ALL
GO
With the COLBIN option you have made a non-linear mapping of the B-factors onto the colour range. The mapping is such that there are equally many atoms in every colour bin. It still holds that the redder the colour, the higher the B-factor, but you can no longer recalculate the B-factor from the colour. Now type:
COLRNG 300 100
COLBFT ALL 0
COLBIN ALL
DIRECT        (Direct mode is switched off again)
You now see that the famous high-B-factor tyrosine is reddish, but the lower-B-factor atoms are green. The colour-to-B-factor mapping runs backwards, from 300 to 100, i.e. from greenish to reddish.
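The difference between COLBFT and COLBIN can be illustrated with a small calculation. The Python sketch below is an assumption-laden illustration, not WHAT IF's code: the default range 1 (blue) to 120 (red) is read off the colour table above, the number of bins is a made-up parameter, and atoms with equal B-factors may end up in different bins here.

```python
def linear_colours(bfactors, start=1.0, end=120.0):
    """COLBFT-like: map the B-factor range linearly onto colours start..end."""
    lo, hi = min(bfactors), max(bfactors)
    if hi == lo:
        return [start] * len(bfactors)
    return [start + (b - lo) / (hi - lo) * (end - start) for b in bfactors]


def binned_colours(bfactors, start=1.0, end=120.0, nbins=10):
    """COLBIN-like: rank the atoms and give each of nbins equally populated bins one colour."""
    n = len(bfactors)
    order = sorted(range(n), key=lambda i: bfactors[i])   # atom indices, lowest B first
    colours = [0.0] * n
    for rank, atom in enumerate(order):
        bin_index = min(rank * nbins // n, nbins - 1)
        fraction = bin_index / (nbins - 1) if nbins > 1 else 0.0
        colours[atom] = start + fraction * (end - start)
    return colours


# Four ordinary atoms and one "high B-factor tyrosine" atom.
b = [5.0, 6.0, 7.0, 8.0, 60.0]
print([round(c) for c in linear_colours(b)])            # [1, 3, 5, 7, 120]: almost all blue
print([round(c) for c in binned_colours(b, nbins=5)])   # [1, 31, 60, 90, 120]
print(binned_colours(b, start=300, end=100, nbins=5))   # [300.0, 250.0, 200.0, 150.0, 100.0]
```

The last line passes the extremes in reverse order, which is the effect COLRNG 300 100 has in the exercise above.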
The most useful colouring commands are:
COLATM  Set default atom colours.
COLZNS  Colour zones of residues.
COLBFT  Colour atoms by B-factor.
COLHST  Colour residues by secondary structure.
COLPRP  Colour residues by property.
COLSPC  Colour residues according to predefined schemes.
COLBB   Colours the backbone.
COLSCH  Colours side chains.
COLTAB  Colour residues as a function of a table value.
COLTYP  Colour residues of certain type(s).
COLBIN  Divides colours over equally populated bins.
COLRNG  Set the extremes of colour ranges.
See what you can do with the COLSPC command. It is very useful!
Display crambin, however, with the N- and C-terminal residues in red, tyrosine 29 coloured as a function of its B-factor, all cysteine side chains completely yellow, and the rest coloured by atom type.
Display all atoms, but: from 20 to 25 display only alpha carbons, and from 11 to 15 and 17 to 19 display only the backbone.
Good luck, you will need it, and if there are more people in the course, just look around and try to prevent other participants from killing themselves or the course teacher....
ACCESS  Van der Waals and accessible surface options.
ANACON  Analysis, evaluation and visualization of contacts.
BUILD   Building proteins, adding residues.
CHIANG  Torsion angle evaluation, manipulation, analysis.
CHECK   Check if a molecule has errors of any kind.
COLOUR  Colouring atoms, residues, molecules, objects.
DGLOOP  Structure fragment database.
DRUG    For drug design related options.
GRAEXT  Special graphics: arrows, ball-and-stick models, etc.
GRAFIC  General 3D graphics menu.
GRATWO  2D graphics menu (Phi-Psi plot, B-factor plot, etc.).
HBONDS  Hydrogen bond determination, evaluation and display.
LABEL   Labeling atoms, residues, etc.
MAP     Administration and display of maps.
NMR     NMR related commands.
PIRPSQ  Sequence options (alignment, model by homology, etc.).
PLOTIT  Plot options.
QUALTY  Structure quality evaluation, mutant prediction.
REFINE  Structure regularisation.
SCAN3D  Relational protein structure database handler.
SEARCH  Interactive search for structure characteristics.
SETPAR  Parameter (re-)setting.
SETVDW  To alter Van der Waals radii.
SOUP    Molecular administration (read/write/delete).
SUPPOS  Superposition of molecules, residues, fragments.
SYMTRY  Symmetry matrix administration/application.
TABLES  Spreadsheet for atomic data.
WALIGN  Multiple sequence alignment.
WATER   Manipulation of water molecules.
3SSP    Automatic multiple structure superposition.
Additionally, there are menus that provide an interface to external programs:
CONOLY  Interface to Connolly's programs.
GRID    Interface to Goodford's GRID program.
GROMOS  Interface to GROMOS.
HSSP    Interface to HSSP files (mutability prediction).
PORNO   To do molecular pornography (= really beautiful pictures).
RIBBON  (In PORNO) interfaces to M. Carson's RIBBONS program.
PLUTON  (In PORNO) interfaces to T. Spek's PLUTON program.
By now you should have realized that many commands are a combination of two groups of three characters. These groups of three characters always have the same meaning. E.g. GRA-ACC, TAB-GRA, GRA-HSP, etc. all send something to the graphics window. Try to understand what the following three-letter codes do:
3SP AA ACC ALL ANA AT CHI CHK CLU COL CON CYS DEL DGL EDT EVA FAM GRA GRI GRO HBO HSP HST HYD INI ITM LAB LST MAP MOL NEU NEW NMR PAR PIR PLT PST QUA REF RES RIB RNG SAV SCN SET SHO SOU SRF STR SUP SYM TAB VAC WAL WAT WRE XRA etc.
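As a study aid, the decomposition can be automated. The Python sketch below splits a six-character command into its two three-letter groups and looks each group up in a small glossary. The glossary entries are guesses inferred from commands used in this tutorial (SHOSOU, INISOU, COLBFT, SETACC, GRAACC, SHOHST, and so on), not official WHAT IF definitions, and the function names are hypothetical.

```python
# Guessed meanings, inferred from commands used in this tutorial (not official).
GLOSSARY = {
    "ACC": "accessibility",
    "BFT": "B-factor",
    "COL": "colour",
    "GRA": "graphics",
    "HBO": "hydrogen bonds",
    "HST": "secondary structure",
    "INI": "initialise",
    "MOL": "molecule",
    "SET": "set/calculate",
    "SHO": "show",
    "SOU": "soup",
    "TAB": "tables",
}


def split_command(name):
    """Split a six-character command name into two three-letter groups."""
    name = name.upper()
    if len(name) != 6:
        raise ValueError(f"{name!r} is not a six-character command name")
    return name[:3], name[3:]


def gloss(name):
    """Rough English reading of a command; '?' marks groups not in the glossary."""
    return " + ".join(GLOSSARY.get(group, "?") for group in split_command(name))


for cmd in ["SHOSOU", "INISOU", "COLBFT", "GRAACC", "DELMOL", "SHOHBO"]:
    print(cmd, "=", gloss(cmd))
```

Filling in the question marks (DEL, for instance) is a good way to work through the list of codes above.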
INISOU
GETMOL 1CRN
Crambin
MUTATE 13
N             (The experimental version does a better job, but is much slower)
ARG           (You can also use the one-letter code R)
GRAFIC
SHOALL 1 A
CENTER
GO
It will probably take you a while to find the arginine at position 13. However, when you find it you will see that it is not modelled very intelligently. So, let's fix it. Pick CHAT, and type:
DEBUMP 13
0.25          (You can just hit RETURN, because 0.25 is the default)
3             (We will remove bumps by rotating Chi-1,2,3,4 in 120 degree steps)
SHOALL 2 B
GO
You now have the situation before debumping in MOL-object 1, and the one after debumping in MOL-object 2. What do you think about this?
DEBUMP tries all conformations of a side-chain till it finds a conformation that is free from Van der Waals clashes (bumps).
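The search described above can be sketched as a brute-force loop over a grid of chi angles. With 120 degree steps, the four chi angles of arginine give 3^4 = 81 candidate conformations. The Python below is an illustration under stated assumptions: the grid start of 60 degrees and the has_bump callback (which in a real program would test for Van der Waals clashes) are hypothetical, and WHAT IF's actual search order is not documented here.

```python
from itertools import product


def candidate_rotamers(n_chi, step=120.0, start=60.0):
    """All combinations of n_chi torsion angles on a regular grid of step degrees."""
    grid = [(start + k * step) % 360.0 for k in range(round(360.0 / step))]
    return product(grid, repeat=n_chi)


def debump(n_chi, has_bump, step=120.0):
    """Return the first chi combination for which has_bump() is False, else None."""
    for chis in candidate_rotamers(n_chi, step):
        if not has_bump(chis):
            return chis
    return None


print(sum(1 for _ in candidate_rotamers(4)))     # 81 conformations for Chi-1..4
# Toy clash test (hypothetical): every conformation with Chi-1 = 60 bumps.
print(debump(4, lambda chis: chis[0] == 60.0))   # (180.0, 60.0, 60.0, 60.0)
```

Accepting the first clash-free conformation, rather than the best one, also explains why the debumped side chain is not necessarily the most plausible one.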
Type:
%SHOSOU
CUT           (Tell WHAT IF to split the molecule)
2             (Make the split after residue 2)
SHOSOU
You see that after a CUT, the molecule is split into two parts. Now type:
DELETE 16
SHOSOU
Now we have three molecules. The first CUT is of course only administrative; the second CUT, made by deleting a full residue, is a real gap in the real molecule. Type:
PASTE 2
PASTE 15
You see that WHAT IF correctly detects that the first CUT was not real, because it mumbles something like "CUT flag removed". The CUT around residue 15 is real, and now PASTE joins the pieces in a non-chemical way. If you were to issue the SHOALL command and look at this in the graphics window, you would see a strange long straight line from residue 15 to 16 (formerly 17...).
INIPAS
SHOSOU
So, the command INIPAS resets all CUT and PASTE flags. Residues 2 and 3 are normally connected, but 15 and 16 (17) are not.
GETMOL 1CRN
Crambin
GRAFIC
Now go to the accessibility menu with the ACCESS command and type:
SETACC ALL 0
For now, just hit RETURN at the environment question (if there is one).
WHAT IF needs to know which molecules should be taken into account when accessibilities are calculated. E.g., if you have a dimer and want to calculate the surface area of the contact interface, you would normally calculate the accessible surface areas of the two monomers and subtract the accessible surface area of the intact dimer. This would give twice the requested interface area.
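The tutorial does not say how WHAT IF decides whether a CUT is real. One plausible criterion, assumed here and not taken from the WHAT IF source, is the distance between the backbone C of residue i and the N of residue i+1: a peptide bond is about 1.33 A long, while the gap left by DELETE is several Angstrom. The 2.0 A cutoff and the function name below are hypothetical.

```python
import math

PEPTIDE_BOND = 1.33   # typical C-N peptide bond length in Angstrom
MAX_BONDED = 2.0      # hypothetical cutoff, not taken from WHAT IF


def classify_cut(c_atom, n_atom, cutoff=MAX_BONDED):
    """Classify a break between residue i (backbone C) and residue i+1 (backbone N)."""
    gap = math.dist(c_atom, n_atom)
    return "administrative" if gap <= cutoff else "real gap"


print(classify_cut((0.0, 0.0, 0.0), (PEPTIDE_BOND, 0.0, 0.0)))   # administrative
print(classify_cut((0.0, 0.0, 0.0), (5.0, 0.0, 0.0)))            # real gap
```

In the exercise above, the CUT after residue 2 would be classified as administrative and the gap left by DELETE 16 as real, matching what PASTE reported.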
WHAT IF is smarter than that. To calculate the accessibility of a monomer you would only put the monomer itself in the environment. Thereafter you repeat the calculation but now you tell WHAT IF not to forget the other half of the dimer by putting both monomer molecules in the environment. The difference is the interface surface area.
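The bookkeeping behind both approaches is simple subtraction. The Python sketch below uses made-up per-atom accessibilities (in square Angstrom) purely for illustration; the function names are hypothetical.

```python
def buried_area(asa_alone, asa_in_complex):
    """Area of one monomer buried by its partner: ASA(alone) - ASA(in complex)."""
    return sum(asa_alone) - sum(asa_in_complex)


def buried_both_sides(asa_a, asa_b, asa_dimer):
    """Classical estimate ASA(A) + ASA(B) - ASA(AB): buried area of both monomers."""
    return sum(asa_a) + sum(asa_b) - sum(asa_dimer)


# Made-up per-atom accessibilities (A^2), for illustration only.
a_alone, a_in_dimer = [30.0, 20.0, 10.0], [30.0, 5.0, 0.0]
b_alone, b_in_dimer = [25.0, 15.0], [25.0, 0.0]

print(buried_area(a_alone, a_in_dimer))                               # 25.0
print(buried_area(b_alone, b_in_dimer))                               # 15.0
print(buried_both_sides(a_alone, b_alone, a_in_dimer + b_in_dimer))   # 40.0
```

The classical value is the sum of what both monomers bury, so for a symmetric dimer it is twice the interface area of one monomer; computing one monomer with and without its partner in the environment gives the single-sided value directly.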
There are several things you can do with the calculated accessibilities. Type:
SHOACC ALL 0
ANASRF ALL 0
You see several ways to evaluate, summarize, etc. the calculated accessibility values. The other options are listed below (there are more under MORE, but those are less useful):
SETACC  Calculates solvent accessibilities.
VACACC  Calculates the accessibility for residues in vacuum.
INIACC  Resets solvent accessibilities to zero.
SHOACC  Does some accessibility statistics.
ANASRF  Analyses buried and accessible surface.
INIENV  Cleans the environment information.
PARAMS  Brings you in the accessibility parameter menu.
MORE    Activates more commands in this menu.
Type MORE. What happened? Well, several menus have a MORE, but those extra options are meant for experienced users. So, quickly type LESS to get rid of those extra options, and continue with this exercise.
To visualize the surface type:
GRAFIC
SHOALL 1 Q
GRAACC ALL
2 W
CENTER
GO
Study the colour of the dots. Conclusions?
With WHAT IF you can make quasi electron density. This is often used in WHAT IF without you knowing it. For example, density distributions of database hits are displayed by making quasi electron density maps out of them; in that case the grid points in the density map represent probabilities. It is also possible to put a function of the distance to the nearest atom into the map. In that case you can contour the map at a level that relates to the radius of the probe for which you want to see the accessible surface. Type:
MAP
SRFMAP        (To start the surface map generation option)
N             (This question is the result of my stupidity, sorry, Gert.)
TEST          (The map will be stored in a file called TEST.WMP)
TEST SURFACE MAP   (Title to recognize the map later)
ALL 0         (We take the whole molecule)
1.4           (We want probes from 0.0001 to 1.4 A radius)
3.0           (Use the maximal probe radius + 1.0 + a little bit. WHAT IF will display a table that correlates probe radius with the contour level to be used later)
0.6           (That means that the precision will be 0.3 A)
PARMAP        (We are going to tell WHAT IF what to contour)
1             (We only have one map. There can be 10 in memory)
Y             (We will center on residue 1)
1 15 15 15    (We will look at a small box only)
30            (We contour for probes with 1.0 A radius)
60            (The map will be purple)
GRAMAP
3 E           (Finally we do something real...)
GO
Pick the GRAFIC pull-down menu.
Pick CENTER in this pull-down menu.
Double-click anywhere to remove the pull-down menu.
After a while, pick CHAT.
Type:
BUILD
PARAMS
HELIX
END
INIBLD
PHE
CBLDS 1
ARG SER LEU LEU GLU CYS LEU ILE LYS GLY
0
GRAFIC
SHOALL 1 A
CENTER
GO
You see that you have created a short, ideal helix. To attach a piece of strand to it, type:
END           (To go back to the BUILD menu)
PARAMS
SHEET
END           (To go back from the parameter menu to the BUILD menu)
CBLDS 11      (We want to start building after the C-terminal residue)
GLY THR ILE ASP CYS THR ILE GLU
0
GRAFIC
INIGRA
SHOALL 1 A
CENTER
GO
Later we will learn how to make this administratively correct molecule a bit more plausible from a chemical point of view.
GRAFIC
SHOALL 1 A
CENTER
HBONDS
GRAHYD        (To display the polar hydrogens)
ALL 0         (For all atoms)
Y             (Cones are potential H positions that are not spatially fixed)
2 B
GO
Toggle MOL1 and MOL2 on and off a couple of times.
You have displayed all polar hydrogens. These are not in the SOUP; that will only be possible in version 5.1 or higher. They are also not pickable. You see that WHAT IF can calculate fixed positions for many hydrogens, but not for those at Ser-Og, Thr-Og, Tyr-Oh, Lys-Nz, and the N-terminal backbone N. Be aware that in cases where the proton can have two positions, both positions are drawn, although only one of them can be occupied at a time.
Type:
SHOPAR           (The maximal allowed angular errors and distances)
SHOHBO ALL ALL   (Try to understand the H-bond list output. Now make your text window a bit wider...)
3 E
%ACON 30         (Set the screen center on the C-alpha of residue 30)
GO
When placing the hydrogens that are involved in H-bonds, WHAT IF tries to make the hydrogen-acceptor distance as short as possible, and tries to make the angles over the acceptor and over the hydrogen as close as possible to 0 or 180 degrees.
Analyze the bunch of lines around residue 30. You see that WHAT IF shows all possible H-bonds. It cannot yet decide on a most probable consistent subset. That requires the HB2*** options, but those are not meant for novices. With the HB2*** options the best possible hydrogen bonding network will be determined. If you are interested, ask your course teacher.
MAKSCR           (Normally you would make a script with the editor, but for the tutorial, one script has been prepared for you)
$more SCRIPT.BLD (You see the script to build a small protein on the screen)
There are some restrictions on what you can do with script files:
Don't put too many graphics commands in a script file.
Be careful with the GO command.
Don't put overly complicated type-ahead lines in a script file.
Make sure you have a clean and empty WHAT IF. Type:
SCRIPT SCRIPT.BLD   (Capitals obligatory, as this is a file name)
GRAFIC
INIGRA
SHOALL 1 A
CENTER
GO
And if all went well, you have the same molecule on the screen as in the previous session. Now, get the SCRIPT into the editor with the command
EDT SCRIPT.BLD
and add at the end:
GRAFIC
INIGRA
SHOALL 1 A
CENTER
GO
END
Execute this script. You see that now the graphics is also done by the script. But BE AWARE: you are still in the script, which implies that you cannot use all the options that are normally available. For example, use the SOUP pull-down menu and try to use the SHOSOU command. You see that it does not work. Pick CHAT and you see that the script will end.
Make sure you have a 'clean' crambin in the soup (%INISOU, INIGRA, and GETMOL 1CRN Crambin or GETMOL 1CRN etc). Type:
COLOUR COLMOL 1 120 GRAFIC SHOALL 1 A %RANDOM 1 (Adds a little random translation to all atoms in molecule 1) ALL 0 %RANDOM 1 (Adds a bit more random translation to all atoms in molecule 1) ALL 0 %COLMOL 1 180 SHOALL 2 B REFINE (That is the regularisation menu) REFI ALL (We want to 'correct' the geometry of the whole molecule) END %COLMOL 1 240 SHOALL 3 C CENTER GOThe command RANDOM will of course never be used in real life. Why would you want to make a molecule bad? RANDOM was only made to teach you how REFI works!
Now carefully compare the three models. Red is what it should be, yellow is messed up, and green is messed up and corrected again. Clearly REFI has improved the model a lot, but it has not made it perfect. Be aware that REFI cleans up the geometry, but not the energetics.
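To put a number on such a comparison, one can compute the RMS deviation between corresponding atoms of two models. The Python sketch below is only an illustration, not WHAT IF code. It assumes each model is a list of (x, y, z) tuples in the same atom order and the same coordinate frame (no superposition is done), and its randomize function merely imitates the idea of %RANDOM with a uniform shift, which is an assumption about how WHAT IF perturbs the atoms.

```python
import math
import random


def rmsd(model_a, model_b):
    """RMS deviation (Angstrom) between two equally long lists of (x, y, z) atoms.

    Atoms are compared in place, without superposition, so both models must
    share the same coordinate frame and atom order (an assumption).
    """
    if len(model_a) != len(model_b) or not model_a:
        raise ValueError("need two non-empty models with the same number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(model_a, model_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(model_a))


def randomize(model, max_shift, rng):
    """Shift every coordinate by a uniform random amount in [-max_shift, max_shift].

    Hypothetical stand-in for %RANDOM; WHAT IF's own perturbation may differ.
    """
    return [tuple(c + rng.uniform(-max_shift, max_shift) for c in atom) for atom in model]


if __name__ == "__main__":
    rng = random.Random(13)
    original = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    messed_up = randomize(original, 0.3, rng)
    print(f"RMS deviation after randomizing: {rmsd(original, messed_up):.3f} A")
```

Computing rmsd(red, yellow) and rmsd(red, green) in this way would quantify how much of the random damage REFI removed.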
Make sure you have a 'clean' crambin in the soup. Type:
%COLMOL 1 120
SOUP
DELETE 36 (This way we make a big gap in the molecule)
GRAFIC
SHOALL 1 A
REFINE
CRUDE (This does some very crude things to close the 35-36 gap)
ALL 10
REFI ALL (Now we do some finer geometry fixing)
END
%COLATM ALL 0
SHOALL 2 B
CENTER
GO
You see that the geometry around the closed gap is not brilliant, but at least it is good enough to feed to a molecular dynamics and energy minimization program. For that purpose WHAT IF has an interface to GROMOS. In one of the next paragraphs we will use it to further clean up this molecule, so don't lose it.
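Whether a gap has really been closed can also be checked numerically: in a continuous chain the C atom of one residue is about 1.33 A from the N atom of the next (the peptide bond). The Python sketch below flags consecutive residues where that distance is clearly too long. The input layout (a list of per-residue dicts with 'N' and 'C' coordinates) and the 2.0 A cut-off are assumptions chosen for illustration.

```python
import math


def find_chain_breaks(backbone, max_c_n=2.0):
    """Return (i, i + 1, distance) for consecutive residues whose C(i)-N(i+1)
    distance exceeds max_c_n Angstrom; a real peptide bond is about 1.33 A.

    backbone: list of dicts with 'N' and 'C' (x, y, z) coordinates per residue,
    in sequence order (a hypothetical layout for this sketch).
    """
    breaks = []
    for i in range(len(backbone) - 1):
        distance = math.dist(backbone[i]["C"], backbone[i + 1]["N"])
        if distance > max_c_n:
            breaks.append((i, i + 1, round(distance, 2)))
    return breaks


if __name__ == "__main__":
    chain = [
        {"N": (0.0, 0.0, 0.0), "C": (1.0, 0.0, 0.0)},
        {"N": (2.33, 0.0, 0.0), "C": (3.5, 0.0, 0.0)},  # bonded: C-N = 1.33 A
        {"N": (9.0, 0.0, 0.0), "C": (10.0, 0.0, 0.0)},  # gap: C-N = 5.5 A
    ]
    print(find_chain_breaks(chain))  # prints [(1, 2, 5.5)]
```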
INIGRA
ANACON (Brings you into the menu to ANAlyse CONtacts)
%COLHST ALL 0 (Colour by secondary structure)
CONRES (Contact plot at residue basis)
ALL ALL 1 (If there is less than 1.0 A between the Van der Waals radii, it is a contact)
1 A
GO
Now pick the SEQUNC button at the bottom of the screen. After some translation you will see the picture given below on the screen. The bottom bar shows the sequence, coloured as a function of the secondary structure (blue = helix, orange = strand, green = turn + loop).
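The contact criterion used by CONRES (and by CONTAC below, with 0.0 instead of 1.0) compares the gap between two Van der Waals spheres with a tolerance. A minimal Python sketch of that test follows. It assumes atoms are given as ((x, y, z), radius) pairs, that a residue pair counts as a contact when any pair of their atoms does, and that a gap exactly equal to the tolerance does not count; WHAT IF's exact boundary handling is not described in this tutorial.

```python
import math


def in_contact(pos_a, radius_a, pos_b, radius_b, tolerance):
    """True if the gap between the two Van der Waals spheres is below tolerance (A).

    gap = centre distance - (radius_a + radius_b). Tolerance 1.0 mirrors the
    CONRES example, 0.0 the CONTAC example ("radii just touch").
    Strict '<' at the boundary is an assumption.
    """
    gap = math.dist(pos_a, pos_b) - (radius_a + radius_b)
    return gap < tolerance


def residues_in_contact(atoms_a, atoms_b, tolerance):
    """Assumed residue-level rule: any atom pair in contact makes a contact.

    atoms_a, atoms_b: lists of ((x, y, z), radius) pairs.
    """
    return any(
        in_contact(pa, ra, pb, rb, tolerance)
        for pa, ra in atoms_a
        for pb, rb in atoms_b
    )


if __name__ == "__main__":
    # Two carbon-like atoms (radius 1.7 A) 4.0 A apart leave a 0.6 A gap
    print(in_contact((0, 0, 0), 1.7, (4.0, 0, 0), 1.7, 1.0))  # True
    print(in_contact((0, 0, 0), 1.7, (4.0, 0, 0), 1.7, 0.0))  # False
```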
In versions below 4.9 the menu at the right side of the screen changes at this point; in higher versions it stays the same. You can pick a box in the two dimensional contact plot by picking its lower left corner. Try a few. You can also pop up the local structure around a contact: pick NEIM. You are now prompted to pick something, much like the normal NEIM option that you tried in one of the first examples. Pick a box in the contact plot. You now see the local structure in the molecule that gave rise to this box in the contact plot; the contacts are drawn as dashed lines. If you pick CONT, followed by picking this box, the screen gets centered on this box.
These pick possibilities always exist in two dimensional graphics, for example also in the Ramachandran plot on the next page.
There is another way to analyze contacts. Make sure that you are still in the ANACON menu and type:
GRAFIC
INIGRA
%COLATM ALL 0
SHOALL 1 A
END
CONTAC (Now we will analyze individual atomic contacts in 3D)
ALL ALL 0.0 (That means that the Van der Waals radii just touch)
N (We will not use symmetry)
2 B
GRAFIC
CENTER
GO
You saw a list of all contacts running over the screen. This is typically something to get on paper. Let's try:
DOLOG (Tell WHAT IF to create a log file)
TEST.LOG
Test 1 (We write the comment `Test 1` in the log file)
0 (No more comments to be written in the file)
%CONTAC ALL ALL 0.0 N
0 (We don't want to display it again, do we?)
NOLOG (Tell WHAT IF to stop logging output)
Now you have a file called TEST.LOG with all contacts in it. On my machine at the EMBL you have to type:
$ lpr -Pps17a TEST.LOG
to get this file printed, but this depends completely on your system setup.
INIGRA
GRATWO (Brings us into a special menu for two dimensional graphics)
%COLHST ALL 0
PHIPSI 1 (We can only use one molecule at a time)
1 A
2 B (Most people like these rather arbitrary lines)
GO
And now control is automatically passed to the graphics window. The same pick possibilities exist as in the previous example. First pick the four outliers. Think about the colouring scheme. Can you think of more informative colouring schemes? What about colouring by accessibility? Just try it.
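PHIPSI plots the backbone torsion angles phi and psi of each residue. For readers who want to see how such angles follow from coordinates, here is a small Python sketch of the standard torsion-angle formula (not WHAT IF's own code). phi(i) uses atoms C(i-1), N(i), CA(i), C(i), and psi(i) uses N(i), CA(i), C(i), N(i+1). The per-residue dict layout is an assumption made for this example.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in [-180, 180], IUPAC sign convention."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(residues):
    """(phi, psi) per residue, None where a neighbouring residue is missing.

    residues: list of dicts with 'N', 'CA' and 'C' coordinates in chain order
    (a hypothetical layout for this sketch).
    """
    angles = []
    for i, res in enumerate(residues):
        phi = psi = None
        if i > 0:
            phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(residues) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
        angles.append((phi, psi))
    return angles
```

The first residue has no phi and the last has no psi, which is why a plotting program can only show the residues in between for both angles.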
%COLATM ALL 0 (To recover from the previous exercise)
SUPPOS
RANGE1 6 15 (This is the range on which to superpose)
0
RANGE2 22 31 (This is the range to be superposed)
DOSUP (Calculate the superposition matrix and the RMS of the result)
APPLY 22 31 (Apply the matrix to the range)
GRAFIC
%COLZON (We want to colour the superposed range)
22 31 0 180
GRACA (Let's only look at alpha carbons for clarity)
6 15 22 31 0
1 A
ACON 10
GO
You see that the green trace is a bit longer. That is because WHAT IF always draws half bonds to the previous or next residue if possible. The stretch 22-31 has been forcibly moved away from its normal position: residue 22 no longer has an N-terminal neighbour, and 31 has lost its C-terminal friend.
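DOSUP computes a rotation plus translation that best fits one range onto another, and reports the RMS of the fit. A classic way to do such a least-squares fit is the Kabsch algorithm. The numpy sketch below shows it, together with the inverse transformation that UNDO applies later. Whether WHAT IF uses exactly this algorithm internally is not stated in this tutorial, so treat the code as an illustration of the idea; the function names are chosen for this example.

```python
import numpy as np


def superpose(mobile, target):
    """Kabsch least-squares fit of `mobile` onto `target` (illustrative only).

    Both are (n, 3) arrays of matching atoms, for example the CA atoms of
    residues 22-31 and 6-15. Returns rotation r, translation t (such that
    mobile @ r.T + t best matches target) and the RMS after fitting.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mobile_centre = mobile.mean(axis=0)
    target_centre = target.mean(axis=0)
    # 3x3 covariance matrix of the centred coordinates
    h = (mobile - mobile_centre).T @ (target - target_centre)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that r is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = target_centre - mobile_centre @ r.T
    fitted = mobile @ r.T + t
    rms = float(np.sqrt(((fitted - target) ** 2).sum(axis=1).mean()))
    return r, t, rms


def apply_fit(coords, r, t):
    """Move coordinates with the fitted transformation (the idea behind APPLY)."""
    return np.asarray(coords, dtype=float) @ r.T + t


def undo_fit(coords, r, t):
    """Inverse transformation (the idea behind UNDO): x = (y - t) @ r."""
    return (np.asarray(coords, dtype=float) - t) @ r
```

Because r is orthogonal, its inverse is simply its transpose, so undoing a superposition never needs a general matrix inversion.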
Let's fix crambin again. Type:
SUPPOS
UNDO 22 31 (UNDO does the inverse of APPLY)
GRAFIC
INIGRA
%COLATM ALL 0
SHOALL 1 A (And that should look very normal again)
GO
There is also a way to superpose molecules or fragments without first telling WHAT IF which ranges to superpose. Type:
SUPPOS
%INISOU
INIGRA
GETMOL (We need another molecule for this example)
1rhd (Don't use capitals here, it is a file)
Y (If 1rhd is not found, skip to the next paragraph)
WHATEVER
SHOHST
PARAMS
MINLEN 21 (This is not really needed, but it goes much faster this way)
END
MOTIVS (We will look for common motifs in two ranges)
Y (This speeds things up. Say no if no answer is found)
Y (We are not interested in a log file)
1 150 (The N-terminal domain of this molecule)
151 292 (The C-terminal domain)
Y (So we can see which motifs WHAT IF recognized)
0
%APPLY 151 292
%COLHST ALL
GRAFIC
GRACA ALL 0
2 B
CENTER
GO
And now you see in MOL-object 2 how beautifully WHAT IF superposed the two domains, even though there is NO detectable homology between them.
CHECK
FULCHK
1CRN
Crambin (Be ready to use NO-SCROLL...)
FULLSTOP (You have to get out of WHAT IF, otherwise you do not always get a report)
Y
If LaTeX is available on this computer, run it on the file 'pdbout.tex' and look at the results. Otherwise, look at the text file 'pdbout.txt'.
You can also run individual checks, for example quality control.
The most powerful model checking tool is packing normality analysis, or quality control. Type:
%INISOU
GETMOL 1CRN Crambin
CHECK
QUACHK (To start quality control over the whole soup)
FULLSTOP (The same as above)
Y
WHAT IF will tell you that the score is -0.435, and that is good. The figure below shows what the quality control numbers mean. For individual residues the rule of thumb is that -5.0 or worse means that something is rotten. That can be a modeling or X-ray error, or the residue may sit in an active site, at a crystal contact, or in some other special position.
-3.0  Guaranteed wrong structure
      Bad structure or poor model
-2.0  Probably bad structure or unrefined model
      Doubtful structure or model
-1.0  Structure OK or good model
      Good structures
 0.0  Good structures
PIRPSQ (Brings you into a sequence oriented menu)
GETPIR (To read a sequence in PIR format)
old.seq (That sequence must be EXACTLY the one of crambin)
.. (To repeat the previous command: GETPIR)
new.seq (The sequence to be modeled, aligned on old.seq)
BLDPIR (To start the model building by homology)
1 (The old sequence was read first)
2 (The sequence to be modeled second)
ALL (The structure that corresponds to old.seq)
Y (We will use the good method: slower, but better)
After two minutes or so, crambin in the soup has been replaced by the new model.
The model sometimes contains some bumps that can be resolved with very small rotations (at most two degrees per torsion angle) around the side chain torsion angles. To do so, type:
DEBALL (DEBump ALL (or many) residues)
N (We normally do not have/need such a file)
ALL
Y (Otherwise torsion angles can rotate up to 120 degrees)
0.25 (Default, normally OK)
If you see any residue with a bump value above 1.0, you can repeat the DEBALL cycle once more. If that still does not help, manual intervention might be required. If you type %SHOSOU you see that you now have two molecules. Use the REFINE menu to solve this problem. Once you have succeeded, determine the Quality Control score. If you did not do too badly, the score could look like:
1 THR (1 ) : 1.055
2 SER (2 ) : -1.635
3 CYS (3 ) : 4.034
4 CYS (4 ) : 5.378
5 PRO (5 ) : -1.556
6 SER (6 ) : 5.072
7 ILE (7 ) : 1.929
8 VAL (8 ) : 1.829
9 ALA (9 ) : 1.093
10 GLU (10 ) : 1.369
11 SER (11 ) : 1.454
12 ASN (12 ) : 1.864
13 TYR (13 ) : 1.919
14 ASN (14 ) : 1.365
15 VAL (15 ) : 0.344
16 CYS (16 ) : -0.800
17 ARG (17 ) : -4.696
18 LEU (18 ) : -4.229
19 PRO (19 ) : -2.343
20 GLY (20 ) : -2.731
21 THR (21 ) : -1.881
22 PRO (22 ) : 2.328
23 GLU (23 ) : 0.643
24 ALA (24 ) : 0.692
25 LEU (25 ) : 1.668
26 CYS (26 ) : 1.604
27 ALA (27 ) : -0.835
28 THR (28 ) : -2.305
29 TYR (29 ) : -4.846
30 THR (30 ) : -3.806
31 GLY (31 ) : -0.790
32 CYS (32 ) : 1.882
33 ILE (33 ) : 1.377
34 ILE (34 ) : -2.244
35 ILE (35 ) : -2.212
36 GLY (36 ) : -0.841
37 ALA (37 ) : -1.595
38 THR (38 ) : -4.241
39 CYS (39 ) : -0.440
40 PRO (40 ) : 1.764
41 ASN (41 ) : -2.338
42 ASP (42 ) : -3.507
43 TYR (43 ) : -2.748
44 ALA (44 ) : -3.366
45 ASN (45 ) : -1.116
Average for range 1 - 45 : -0.502 (Which is not bad for a model!)
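If you want to post-process such a listing, for instance to find the worst residues, a few lines of Python are enough. The sketch below assumes the 'number TYPE (number ) : score' layout shown above; the default threshold of -5.0 is the rule of thumb from the text. The plain arithmetic mean computed here need not agree with the average WHAT IF prints (for the listing above the plain mean is about -0.37, not -0.502), because how WHAT IF averages is not described in this tutorial.

```python
import re

# One entry looks like "17 ARG (17 ) : -4.696" (layout assumed from the listing)
SCORE_ENTRY = re.compile(r"(\d+)\s+([A-Z]{3})\s+\(\s*\d+\s*\)\s*:\s*(-?\d+\.\d+)")


def parse_scores(text):
    """Return (sequence number, residue type, score) tuples found in the listing."""
    return [(int(num), res, float(score)) for num, res, score in SCORE_ENTRY.findall(text)]


def summarize(scores, threshold=-5.0):
    """Plain mean of all scores plus the residues scoring at or below threshold."""
    if not scores:
        raise ValueError("no scores to summarize")
    mean = sum(score for _, _, score in scores) / len(scores)
    flagged = [entry for entry in scores if entry[2] <= threshold]
    return mean, flagged


if __name__ == "__main__":
    listing = "17 ARG (17 ) : -4.696 18 LEU (18 ) : -4.229 22 PRO (22 ) : 2.328"
    mean, worst = summarize(parse_scores(listing), threshold=-4.5)
    print(f"mean {mean:.3f}; at or below -4.5: {worst}")
```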
We need this model later in the GROMOS exercise. Type:
%MAKMOL (We will save our model in a PDB file. MAKMOL sits in SOUP)
(Hit RETURN when prompted for the header file)
BAD.MODEL (That will be the file name)
0 (We write no remarks in this file)
ALL
0
SCAN3D (To go to the menu)
SETLEN 7 (We will search for database fragments of 7 residues long)
HELSHT (We will put secondary structure constraints on the H stretches. The first 4 residues should be helical, the fifth H can be anything, and the last one should be in a strand)
H H * S S
0 (No errors are allowed in the search)
After 0.25 seconds WHAT IF tells you that it scanned about 300 proteins and found close to 30 hits. You are prompted for the number of a group. Just use group 1. To see what we have, type:
SHOHIT 1 (That is the group number you just gave)
1 10 (These numbers are installation dependent)
The first part of the result could, for example, look like:
Hit # 1 in database entry : Aatn 291 - 297 : LYS ASP LEU TYR ALA ASN ASN H H H H T S
Hit # 2 in database entry : 1bbt 63 - 69 : GLY GLY LEU LEU ARG ALA SER H H H H H T S
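The HELSHT constraint is essentially a position-by-position match of a secondary structure pattern against each database fragment, with '*' meaning 'anything'. A minimal Python sketch of that idea follows. It is not how WHAT IF implements SCAN3D; the one-letter codes (H helix, S strand, T turn, as in the hits above) are taken as given, and the 7-residue pattern 'HHHH**S' and the demo entries are hypothetical.

```python
def fits(ss, pattern):
    """True if the secondary structure string fits the pattern; '*' matches anything."""
    return len(ss) == len(pattern) and all(p in ("*", s) for s, p in zip(ss, pattern))


def scan(entries, pattern):
    """Yield (entry name, start index) for every window that fits the pattern.

    entries: iterable of (name, secondary structure string) pairs (hypothetical input).
    """
    n = len(pattern)
    for name, ss in entries:
        for start in range(len(ss) - n + 1):
            if fits(ss[start:start + n], pattern):
                yield name, start


if __name__ == "__main__":
    database = [("demo1", "CCHHHHHTSC"), ("demo2", "SSSSTTHHHH")]
    print(list(scan(database, "HHHH**S")))  # prints [('demo1', 2)]
```

The second hit shown above, with secondary structure HHHHHTS, would fit this hypothetical pattern.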
1) Read in the molecule 5TIM. Calculate the loss of accessible surface of each monomer as a result of the dimerization.
2) Which proteases have been solved with an R-factor better than 25.0 and a resolution better than 2.5 A? (Hint: SELECT menu; proteases are often called hydrolases)
3) Which of these proteases sit in the SCAN3D database?
4) Generate (graphically) the cell for crambin, and display all atoms that belong in this cell. (Hint: CELL in GRAEXT menu and GRACEL in SYM menu, plus perhaps a few more options)
5) It has often been written that positive residues prefer to sit near the C-terminal end of helices and negative residues near the N-terminal end. Is this true? (Hint: SETLEN and HELSHT in SCAN3D, and ALLPRF in SCNSTS in SCAN3D)
6) Find all buried unsatisfied hydrogen bond donors and acceptors in a 5TIM monomer. (Hint: Use several options in the SEARCH menu)
7) Repeat the previous search, but now with inclusion of waters.
8) Read crambin, and determine the quality of torsion angles.
9) Do the same for 5TIM. Which is the better structure?
10) Invent your own exercise.
11) Read the TIM dimer. Mutate 2 small interface residues to TRPs. Go to the CHECK menu and check for bumps.
12) Go to the ANACON menu and check for bumps.
13) Read crambin. Draw an alpha carbon trace and add the side chain of Phe 13. Label the Cd1 in this Phe. Plot this including the label. (Hint: PSTPLT in the PLOTIT menu)
14) Now use a command in the label menu to position the label a bit more 'intelligently'. Plot it again.
15) Make a script that reads crambin and puts it on the screen coloured by B-factor.
16) Read 5TIM. List the waters that are stuck between the two monomers. (Hint: look in WATER menu).
17) Read crambin. Determine its quality with the RNGQUA option in the QUALTY menu.
18) Now mutate Phe 13 to a Tyr, using the experimental method. Determine the quality again. Any conclusions?
19) Read 5TIM. Superpose the two monomers. Now colour by fitting error. (Hint: RANGE1, RANGE2, DOSUP, APPLY, COLDIF, all in SUPPOS menu).
20) Calculate the accessibilities in crambin. Write the LISTA output for the first 10 residues to a file. (Hint: SETACC, DOLOG, LISTA, NOLOG).
21) Use the SETVDW menu to make all Van der Waals radii the same (e.g. 1.7 A). Repeat exercise 20, and compare the results. Any conclusions?
22) Produce a printed output of tables with: residue type, accessibility, phi, psi, omega, and secondary structure. (Hint: TABAA, TABCHI, TABHST, DOLOG, TABSHO, all in TABLES)
23) A few residues in crambin have non-perfect backbone torsion angles. Use PHIPSI in GRATWO to find out which ones those are.
24) Read the writeup about the commands SRFMAP, PARMAP and GRAMAP. Make a surface map of crambin.
25) Make a cavity map using AACAVI and a probe size of 0.6A. Conclusions?
26) Read crambin. Delete residues 7-9. Reinsert these 3 residues with the DGINS option in the DGLOOP database.
27) Get all alanines from the database (SCAN3D). Make a 3D phi-psi plot. (Hint: SETLEN=1, SEQUEN and STATS in SCAN3D. GRACHI in STATS)
28) Repeat this for proline. Any conclusions?
So, unless this is a course at the EMBL, continue with part 5.
GETMOL (Re-read the original crude model)
BAD.MODEL
bad-model
GROMOS (If there are some files from another GROMOS session, kill them: "Y")
PARAMS (To modify EM parameters)
STEPS 5000 (5000 steps EM)
END
FASTEM (Does EM automatically. This will take some minutes)
Now we are ready to do some molecular dynamics. Type:
%INISOU
GETGRO (Now we read the last coordinates in the EM run)
WRE-EMGRO10.DAT
X
Let's now compare the last MD structure with the original model that we stored in the PDB file BAD.MODEL. Type:
GETMOL (Re-read the original crude model)
BAD.MODEL
bad-model
GRAFIC
%COLMOL 2 120 (We make the bad model red,)
%COLMOL 1 240 (and the 'good' model green.)
INIGRA
SHOALL 1 A
CENTER
GO
GRAFIC
SHOALL 1 A
ACON 13
MUTATE 13 N ALA (We remove the aromatic ring of Phe 13)
GRID (If you or someone else has already used this option once, you are asked here whether old files should be deleted. If so, answer with Y)
MAKGRN (Ignore error messages; they are only warnings)
RUNGRN ALL
LSTGRN (To see if GRIN has found errors. This option brings the GRIN output file into the editor. Quit from the editor (e.g. ":q!" for vi) to continue with the rest of this tutorial)
MAKGRD (Hit RETURN to get a list of allowed probe types)
C1=
RUNGRD (This takes half a minute)
GRIDTEST (That will be the name of the potential energy map)
TEST
GETGRD (Read the GRID energy map as if it is electron density)
1 (That is the electron density map header for this case)
MAP (We go to the MAP menu to contour this potential energy map)
SHOMAP (Look under 'extremes' for the extreme energy (-4.25))
PARMAP 1 Y 13 20 20 20 -2 (To contour at -2.0 kcal/mol)
60 (Purple)
GRAMAP 2 W
GRAFIC
GO
You see that the phenylalanine 13 side chain (which is still visible because we made MOL-object 1 before we mutated it to alanine) sticks reasonably well into the density that indicates a high potential for aromatic groups.
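Contouring a GRID map at -2.0 kcal/mol essentially separates the grid points where the probe energy is at or below that level from the rest, and the 'extreme' that SHOMAP reports is the lowest value in the map. The numpy sketch below illustrates this on a toy map. The array layout (index order x, y, z), a single grid spacing and the origin argument are assumptions made for the example, not the GRID file format.

```python
import numpy as np


def favourable_points(energy, origin, spacing, level=-2.0):
    """Cartesian coordinates (A) of grid points with energy <= level (kcal/mol).

    energy: 3D array indexed as [ix, iy, iz]; origin: (x, y, z) of point [0, 0, 0];
    spacing: distance between neighbouring grid points. This layout is assumed.
    """
    energy = np.asarray(energy, dtype=float)
    indices = np.argwhere(energy <= level)  # one (ix, iy, iz) row per selected point
    return np.asarray(origin, dtype=float) + indices * spacing


if __name__ == "__main__":
    toy_map = np.zeros((3, 3, 3))
    toy_map[1, 2, 0] = -4.25  # the most favourable point
    toy_map[0, 0, 1] = -2.0   # exactly at the contour level
    print("extreme:", toy_map.min())
    print(favourable_points(toy_map, origin=(10.0, 10.0, 10.0), spacing=0.5))
```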
Make sure that you have a clean crambin in the SOUP and type:
PORNO
PLUTON (To start the PLUTON interface)
13 0 (We only want to look at one phenylalanine)
TEST
TEST
Once you are in PLUTON, recognized by the >> prompt, type:
ROD
SHADE
PLOT (You cannot rotate this plot, it is static)
QUIT
There is HELP in PLUTON if you want more info.
Now we will test RIBBONS. Type:
RIBCPK
ALL (We want a ribbon of the whole molecule)
ALL (We also want the whole molecule space filled)
0
0
You now get a new window. Install it and put the mouse in it. Push the right mouse button, drag down to MODELS and then to the right. In the MODELS pop-up menu, drag to 'rib10.model' and release the mouse button. With the middle mouse button you can rotate this figure; with the left mouse button you can scale it. To finish, push the right mouse button, drag down to EXIT and release the mouse button.
RIBBONS comes with a large write-up, but just picking around a bit should be enough to figure out all the options. If you want RIBBONS, you can buy it from Mike Carson, and I will then deliver an altered version of RIBBONS that works well together with WHAT IF.
NEURAL (To get into the neurotic menu)
EXAMPL (To copy some example files to the local directory)
GETSET (To read a dataset)
TRAIN.NEU (Capitals required, it is a file)
NETWRK (To define the neural network architecture)
2 5 2.5 5.0
With NETWRK, 2, 5, 2.5, 5.0 you created a network architecture consisting of 2 hidden layers of 5 nodes each. WHAT IF will try to keep the values of the junctions between -2.5 and 2.5, and junctions outside the range -5.0 to 5.0 are forbidden. Continue typing:
TRAIN
N
200
SHOSET
With TRAIN and 200 you told WHAT IF to do 200 rounds of network optimization. This will take a couple of minutes on an INDIGO workstation. The error will probably converge to around 0.2, which is a little bit bigger than the error that I put into this dataset (0.14). (Try more and wider hidden layers overnight, and you will see that the error can get smaller. This is called over-training: the network learns the data by heart, rather than extracting the hidden correlations.) The `SHOSET` command gives two sets of output. The first half shows the input values, the observed results, the calculated results, and the error in the calculated results; the second half also displays the tolerance of the net (see below). Below you see a dataset with the answers given. The file without the answers is called TEST.NEU. So, type:
GETSET (To read another data set made with the same function and noise)
TEST.NEU (Capitals, it is a file)
SHOSET (Apply the net to the test dataset)
The second SHOSET command does the same as the first, but now the errors are of course irrelevant; just look at the calculated answers. The true answers are given below. If you took the trouble to calculate the RMS between the expected and calculated values in the test set, you would probably find an RMS around 0.7. That nicely illustrates one of the problems of neural nets: they are black boxes, very deep-black black boxes.....
1.823 1.311 3.633 | 0.424 0.140 0.549 | 0.906 1.296 2.603
0.129 0.690 0.605 | 1.472 0.419 1.728 | 1.013 0.226 1.155
1.202 0.733 1.836 | 0.409 1.550 2.984 | 0.681 1.092 2.003
1.511 1.764 4.697 | 1.397 1.096 2.740 | 1.462 1.560 3.916
1.772 0.221 1.949 | 0.146 0.777 0.907 | 0.871 1.240 2.530
0.959 0.482 1.267 | 0.274 0.907 1.185 | 0.453 1.726 3.545
1.355 0.504 1.620 | 0.782 0.658 1.283 | 1.076 1.002 2.194
0.515 0.201 0.712 | 1.666 0.574 2.175 | 0.140 0.430 0.330
1.565 0.476 1.839 | 0.778 1.875 4.439 | 1.266 0.920 2.299
1.222 1.545 3.663 | 0.473 0.609 0.874 | 1.982 0.616 2.367
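To check the RMS figure yourself, you need the calculated answers from SHOSET next to the true answers above. In the small Python sketch below, parse_triplets reads rows like those above into groups of three numbers, and rms computes the root-mean-square difference. Treating the third number of each group as the answer is an assumption, and the calculated values in the example are hypothetical numbers of the kind you would copy by hand from the SHOSET output.

```python
import math


def parse_triplets(text):
    """Split rows such as '1.823 1.311 3.633 | 0.424 0.140 0.549' into 3-tuples."""
    values = [float(v) for v in text.replace("|", " ").split()]
    if len(values) % 3:
        raise ValueError("expected the numbers to come in groups of three")
    return [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]


def rms(expected, calculated):
    """Root-mean-square difference between two equally long sequences of numbers."""
    if len(expected) != len(calculated) or not expected:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(expected, calculated)) / len(expected))


if __name__ == "__main__":
    rows = "1.823 1.311 3.633 | 0.424 0.140 0.549 | 0.906 1.296 2.603"
    true_answers = [group[2] for group in parse_triplets(rows)]  # assumed: 3rd number = answer
    calculated = [3.1, 0.9, 2.2]  # hypothetical values read from SHOSET
    print(f"RMS: {rms(true_answers, calculated):.3f}")
```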