NAOMI USER GUIDE (Version 2.4)
The NAOMI 2.4 User Guide

by Simon M. Brocklehurst


Contents
  • Advanced side-chain entropy methods
  • Tertiary structure prediction
  • Graphical output and interfaces to molecular graphics software
  • Miscellaneous
  • Known bugs


    What is NAOMI?

    NAOMI is a sophisticated computer program system for studying the three-dimensional structure of proteins at the atomic level.
    • General structural analysis
    • Dealing with problematic pdb files
    • Simulating dynamic properties of proteins
    • Prediction of protein function from 3-D structure
    • Protein Engineering and Design
    • NMR structure refinement
    • Tertiary structure prediction
    • Prediction of protein folding pathways

    Why use NAOMI?
    Many of the features offered in NAOMI are presently unique, but subsets of its features (particularly in the general structural analysis section) are offered in varying ways by other software. In developing this program, high on the list of priorities were: making the program easy to set up, learn and use (making the output intuitive to understand); incorporation of high quality (in terms of results and efficiency), novel algorithms. NAOMI is currently used by structural biology and chemistry laboratories throughout North America, Europe and Japan.


    Getting up and running with NAOMI

    Installation

    NAOMI has a built-in license manager. You need to set an environment variable so that NAOMI can find the license file it requires in order to run. Put
    setenv NAOMI_LM directory-name
    
    into your .cshrc file so that it will be set each time you log on. The directory name should be the name of a directory on your system that contains a file called NAOMI_lm.lic.

    NB You need a terminal "/" on the end of the directory name.

    Typical contents of NAOMI_lm.lic might be:

    Licenses for my computer
    FEATURE 1 F6483F345229CDE
    FEATURE 2 953B34483FF
    FEATURE 3 38BC82F342E
    
    Feature license keys are available from the author. Feature 1 is the main key that allows the program to work. Feature numbers higher than 1 "switch on" other features that are not available by default. For example, FEATURE 2 allows the Protein Engineering and Design module to work.
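
    The two installation requirements above (a directory name ending in "/" and a file called NAOMI_lm.lic inside that directory) can be checked before running the program. The following is a minimal sketch in Python, not part of NAOMI itself; the function name check_license_dir is hypothetical, and the contents of the license file are not validated in any way.

```python
import os


def check_license_dir(value):
    """Return a list of problems with a candidate NAOMI_LM value (empty if none)."""
    problems = []
    if not value:
        problems.append("NAOMI_LM is not set")
        return problems
    # The guide requires a terminal "/" on the directory name.
    if not value.endswith("/"):
        problems.append("directory name lacks the terminal '/'")
    # NAOMI_lm.lic is the license file name given in this guide.
    if not os.path.isfile(os.path.join(value, "NAOMI_lm.lic")):
        problems.append("NAOMI_lm.lic not found in " + value)
    return problems


if __name__ == "__main__":
    for problem in check_license_dir(os.environ.get("NAOMI_LM", "")):
        print("WARNING:", problem)
```

    Run as a script, this prints one warning per problem found and nothing if the setting looks usable.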

    Files that you need

    The basic requirement for running NAOMI is a file that contains the "script" specifying what you want to do with your protein structure. This must contain instructions giving an estimate of the number of residues in the protein. Typically, you might have a "standard" memory allocation of 500 residues (you will probably never need more than 7000 residues). Don't routinely use a residue limit much larger than you need; it wastes system resources. It's a good idea to make a file called "alloc_memory.inp" that you can include in all your NAOMI input files by using the @ symbol. So create a file called "alloc_memory.inp", which should contain something analogous to:
    ! Standard dynamic memory allocation include file for medium sized proteins
    residue limit 500
    residue library size 20 /usr/local/lib/naomi/lib/res_lib1.pdb
    
    with the pathname for res_lib1.pdb set appropriately for your system (Check with your local NAOMI expert where the default residue library has been installed).

    A basic input file is given below.

    An example input script file to get NAOMI up and running (this is a simple example, you'd probably not actually want to ever do it for real)

    Edit a file, perhaps called "test.inp" (the name doesn't really matter). Put in the following lines to make a NAOMI script.
     @alloc_memory.inp
    
     set report dir ./reports/
     
     set pdb dir /nike/old_eve/smb18/pdb/
     
     read pdb 1lyz.pdb
    
     zone 1 5
     calc phi psi
     write rnum rname phi psi nl
    
    This tells NAOMI to do the following things:

    Include the contents of the file alloc_memory.inp into the script. (See above)

    Set the directory where some output will be directed. NB THIS DIRECTORY MUST EXIST BEFORE YOU RUN THE PROGRAM (use the UNIX mkdir command)

    Set the directory from which you wish to read the pdb file

    Tell NAOMI to read the pdb file 1lyz.pdb (the coordinates of lysozyme)

    zone 1 5 means work on residues 1 through 5 only

    then calculate the main chain dihedral angles phi psi for these residues

    Finally, for the active zone, for each residue write out the residue number, residue name, phi psi. The nl command says put a new line character here. The reason for this is explained in the documentation of the write command.

    Running naomi, using the example file

    The executable must be in a directory on your path.

    Then typing

        naomi < file1  > file2
    
    will read the input commands from file1 and place the output in file2. So type
             naomi < test.inp
    
    The following output (or something similar) should appear on your screen:
    |-----------------------------------------------------------|
    |--------------------   N A O M I   ------------------------|
    |                 Simon M. Brocklehurst
    |***********************************************************|
    |*                     Version 2.0                         *|
    |***********************************************************|
    |* It is a condition of use of this software that you cite *|
    |* the reference(s) given below in any published work.     *|
    |*                                                         *|
    |*  (1) Simon M. Brocklehurst & Richard N. Perham (1993)   *|
    |*      Protein Science 2, 626-639                         *|
    |*                                                         *|
    |***********************************************************|
    |___________________________________________________________|
    Including file alloc_memory.inp
    |***********************************************************|
        This version was last updated on Aug  9 1994,10:22:05
       This output file was produced on Wed Aug 17 13:57:06 1994
    |***********************************************************|
    
    1lyz.pdb
      1  LYS   n/c   122.5  
      2  VAL  -98.7   126.9  
      3  PHE  -73.0   166.2  
      4  GLY -111.1   159.0  
      5  ARG  -56.4   -77.0  
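
    If the output is redirected to a file, the residue rows shown above are easy to read back into another program. The following Python sketch is illustrative only: the function name parse_phi_psi is hypothetical, it assumes rows of exactly four whitespace-separated fields produced by "write rnum rname phi psi nl", it assumes plain numeric residue numbers (no chain identifier), and it treats "n/c" as "no value calculated", which is an interpretation of the example output rather than documented behaviour.

```python
def _angle(text):
    """Convert an angle field to float; 'n/c' is assumed to mean no value."""
    return None if text == "n/c" else float(text)


def parse_phi_psi(lines):
    """Parse 'rnum rname phi psi' rows, skipping banner and file-name lines."""
    rows = []
    for line in lines:
        fields = line.split()
        # Keep only rows that look like: number, name, phi, psi.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        rows.append((int(fields[0]), fields[1], _angle(fields[2]), _angle(fields[3])))
    return rows


if __name__ == "__main__":
    sample = ["1lyz.pdb", "  1  LYS   n/c   122.5  ", "  2  VAL  -98.7   126.9  "]
    for row in parse_phi_psi(sample):
        print(row)
```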
    

    If you have access to a WWW browser with FORMS support, then you can use the NAOMI HTML program launcher.


    Command Syntax

    Many of the things you can do with NAOMI don't require a lot of complicated commands to be typed in. So in order to make things as clear as possible in these instructions, I've organised things in sections, like: Calculating hydrogen bonds.


    Including another NAOMI script, within a script

    Use the command

       
        @filename
    
    to include a previously prepared script into the current script. Of course, you can have as many "scripts within scripts" as the memory of your computer will allow.


    Reading a pdb file into NAOMI

    This is done in two parts i) allocating memory, and directory setup, ii) reading in the file.

    i) Allocating memory and Directory Setup


    Allocating memory

    Before you read in the file, you must allocate some memory for the protein. Although NAOMI allocates memory dynamically, it does need an upper limit of the number of residues in the protein. Usually imposing an upper limit of 2000 residues will suffice (rarely do you need more than 7000). If RAM is tight on your computer, you can set the limit to a much lower value. The command to use is:
    	      residue limit integer
    
    where integer would be say 2000.

    NOTE this memory allocation should only be done once at the start of a script - if you read several pdb files in a script, NAOMI will deal with it. In other words, the residue limit command just says that in a given script, you will never be looking at a single pdb file with more than 2000 residues in it.

    If you are doing any modelling, you should also allocate some memory for the amino-acid library which contains examples of standard amino acids. The command to use is

         residue library size integer pathname
    
         e.g.
    
         residue library size 20 ~/naomi/lib/res_lib.pdb
    

    (see section constructing a residue library)


    Directory setup

    NAOMI requires access to two directories. The first is the so-called "report" directory. This is where some output will be directed, e.g. some error messages that are not output directly to your specified output file. You must have the line
    	set report dir directory
    
    where directory must have the terminating "/" character e.g.
    	set report dir /usr/people/smb/naomi/
    
    or
    	set report dir ./
    
    to simply set the current directory as the reports directory.

    The second directory is where the pdb file(s) you wish to work on are held. This might be a personal directory, or the root directory of your local Brookhaven distribution (see the idcodes command).

    Use the command

    	set pdb dir directory
    
    to set this analogously to the set report dir command, i.e. include the terminal "/".

    ii) Reading a pdb file

    After having performed the initial set up, use the command
    	read pdb filename
    
    to read in the coordinates for the structure (an approximation to Brookhaven or X-PLOR file formats is expected). NAOMI is reasonably intelligent about this; you will find that many files that other programs "choke" on can be read. You can make NAOMI write out a "proper" pdb file, using the "correct" and "validate" commands.


    Repeating the same commands on several PDB files

    For some kinds of analysis, you will want to be able to repeat a set of commands on several pdb files e.g. analysing ensembles of nmr structures, doing some analysis of the whole protein databank etc.

    In NAOMI, a special kind of loop is available to let you do this. First create a file containing a list of the PDB files that you want to use. Then any commands inside the curly braces will be performed on each of the files listed in the file called filename (the files should be found in the directory specified with "set pdb dir").

    	        for pdb list filename
    	        {
    	
    
    	        }
    
    The "for pdb list" effectively replaces the "read pdb" command that is used for operations on single files.


    Brookhaven directory structures and idcodes

    The current release of the Brookhaven Protein Databank (1994) has a directory hierarchy to make it quicker to browse the databank. The command
                     idcodes on
    
    means that you should use the Brookhaven idcode instead of a filename. This allows you to analyse the whole pdb easily. NAOMI will then look for the file in the appropriate place. You should set the directory where your Brookhaven files are with the "set pdb dir" command, e.g.
    	        set pdb dir /brookhaven/distr/
    	 
    	        idcodes on
    
    	        read pdb 6PTI
    
    will read the file /brookhaven/distr/pt/pdb6pti.ent. The command "idcodes off" means that actual filenames should be used.
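
    The mapping from idcode to file name in the example above can be sketched as follows. This is an illustration in Python, not NAOMI code: the function name idcode_to_path is hypothetical, and the rule used (a subdirectory named after the two middle characters of the lower-cased idcode, and a file named "pdb" + idcode + ".ent") is inferred from the single 6PTI example given here.

```python
def idcode_to_path(pdb_dir, idcode):
    """Build the file path implied by the 6PTI example for a four-character idcode."""
    code = idcode.lower()
    if len(code) != 4:
        raise ValueError("expected a four-character idcode, got %r" % idcode)
    # pdb_dir is expected to carry its terminal "/", as for "set pdb dir".
    return pdb_dir + code[1:3] + "/pdb" + code + ".ent"


if __name__ == "__main__":
    print(idcode_to_path("/brookhaven/distr/", "6PTI"))
```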


    Selecting "active residues" of a protein

    Any residues in the protein may be selected for use in analysis and computation. By default, all the residues are selected. Two commands are provided to make selections. If you want only to use a single contiguous range of residues, use the command

    	zone  res1 res2
    
    to select a specified contiguous subset of residues in the protein for further work. This erases any other selection you have made earlier on in the script

    NB if the residue has a chain identifier, then this must be concatenated to the residue id, e.g.

     	zone C154 C198
    
    To (re)select the entire protein for further work, use
    	zone all
    
    If you want to build up a more complex residue selection, use the "select" command viz:
    	select res1, res2, res3:res4, res5 etc
    
    This allows both the selection of individual residues and ranges of residues (res3:res4 selects a range from res3 to res4). You must place ","s in between the items in the selection list. The select command does not erase previous selections. The following example shows how you could select residues 1, 5, 20, 31, 32, 33, 34 and 40
    	zone none
    	select 1, 5, 20
    	select 31:34, 40
    
    Note the use of "zone none" to initially deselect all residues.
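
    To make the meaning of the selection list concrete, the following Python sketch expands the arguments of the two select commands above into the residues they name. It only mimics the syntax described in this section and is not NAOMI's parser; the function name expand_select is hypothetical, and the sketch assumes plain numeric residue ids (no chain identifiers or wildcards).

```python
def expand_select(selection):
    """Expand a list such as '31:34, 40' into [31, 32, 33, 34, 40]."""
    residues = []
    for item in selection.split(","):
        item = item.strip()
        if not item:
            continue
        if ":" in item:
            # res3:res4 selects the inclusive range from res3 to res4.
            first, last = (int(part) for part in item.split(":"))
            residues.extend(range(first, last + 1))
        else:
            residues.append(int(item))
    return residues


if __name__ == "__main__":
    selected = []
    # Successive select commands add to the selection rather than replacing it.
    for arguments in ("1, 5, 20", "31:34, 40"):
        selected.extend(expand_select(arguments))
    print(selected)
```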

    Wildcards are also available with the select (and select2) commands e.g.

    	zone none
    	select A*
    
    would select all residues in chain A.


    "calc" and "write"

    Simple calculations, relevant at the residue level, are performed by using the calc command which has a syntax

          calc param1 param2 param3... etc
    
          e.g.
    
          calc phi psi short
    
    would calculate the dihedral angles phi and psi, and also a short hand nomenclature which characterises the conformation of a residue based on phi and psi. Calculations are performed for the selected zone of the protein (by default the whole molecule).

    The write command allows the output of the results. The syntax is similar, except you need to indicate where new lines (nl) should be placed in the output e.g.

           write rnum rname phi psi nl
    
    would produce a list of residue number, name and phi/psi values for the active zone of the protein (by default the whole input coordinate file).
    calc:
    	phi
    	psi
    	omega
    	chi1
    	chi2
    	chi3
    	chi4
    	chi5
    
    	short
    
    	curvature   - 	how tightly coiled the polypeptide chain is
    	 	 	at a given residue position...
    
    	hydrophobic -	calculated details of neighbours making
    			hydrophobic contacts in 3-D.
    	error	    -	residue averages for B-factors (X-ray) and
    			r.m.s.d. (NMR)
    write: as for calc, but also
    
    	rnum
    	rname
    	grid
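
    The phi, psi, omega and chi parameters listed above are all dihedral (torsion) angles, each defined by four consecutive atoms; for example, phi is defined by C(i-1), N, CA and C, and psi by N, CA, C and N(i+1). The Python sketch below shows one standard way of computing such an angle from four sets of coordinates. It is an illustration of the geometry only and makes no claim about how NAOMI performs the calculation; the function name dihedral and its helpers are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


if __name__ == "__main__":
    # Four points arranged so that the angle about the central bond is 90 degrees.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```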
    
    "protwrite"

    You may output information about the overall protein with the command

    	protwrite
    
    NB It is the user's responsibility to ensure that they have previously invoked any relevant calculations with particular "use" commands.

    Possible parameters are shown below grouped into related sections:

    General (requires no calculations)

    	NumberOfResidues
    

    Solvent accessibility (requires "use solvent access")

    	TotalAccessAbsolute
    	TotalAccessHydrophobicAbsolute
    	TotalAccessHydrophobicPercent
    	TotalAccessPureHydrophobicAbsolute
    	TotalAccessPureHydrophobicPercent
    
    So an example script might be:
    	use solvent access
    	protwrite NumberOfResidues TotalAccessAbsolute
    
    The following output might be produced
    	124 2345.345
    
    which would indicate that the protein had 124 residues with a total solvent accessible surface area of 2345.345 Square Angstroms. (Note that this figure is not accurate to 3 d.p.; rather, it reflects the generic floating point precision output by the protwrite command.)


    Program sub-sections within NAOMI

    Some commands within NAOMI will be recognised only if you have entered a particular subsection of the program. For example some parts of database construction and simulation facilities are within a subsection called TYRA. The command to go into a sub-section is the name of the sub-section. To return to the main level, type end. That is,

         tyra
            command 1
            command 2 etc
    
         end
    
    This documentation will always tell you when a specific command is part of the TYRA section of the program.


    Making a side-chain entropy database

    Sorry, this section of the documentation is not currently available


    Constructing a residue library

    Sorry, this section of the documentation is not currently available


    Building in "missing atoms" to a protein

    Often, a PDB file will have some atoms missing, for example if a side-chain is not visible in an electron density map, it will usually be modelled as an alanine residue. To build in missing atoms, use the command,

                  repair side-chains
    
    within the TYRA level of commands. This will keep the conformation of the old side-chain where atoms are available, and will set newly placed regions of the side-chain in an extended conformation, or in an appropriate conformation if a ring is involved.

    If you prefer, you can place the missing atoms and energy refine the new side-chains (rather than keeping as much of the old side-chain as possible). Again within the TYRA level do,

                  repair and refine side-chains
    

    Identifying chain-breaks

    3-D protein structures sometimes have incomplete main-chains. In the case of X-ray structures, this is often because electron density for particular residues is either too weak to interpret or completely absent.

    It is convenient for some types of structural analysis to know where such "missing" parts of a structure are. The commands:

    	use chain-breaks
    	table chain-breaks
    
    provide a list of main-chain breaks, in the form of a list of pairs of residues falling on either side of a break in a polypeptide chain. Example output for a protein with two breaks in the chain (in chain C between residues 56 and 62, and in chain C again between residues 73 and 78) is shown below:
    
    NAOMI>OUTPUT List of residues at ends of breaks in main-chain of rec_C.pdb
    NAOMI>OUTPUT C  56 HIS - C  62 GLY
    NAOMI>OUTPUT C  73 THR - C  78 GLN
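
    Output in the format shown above can be turned into a list of residue pairs for further processing. The Python sketch below is illustrative only: the function name parse_chain_breaks is hypothetical, and it assumes that every break line carries the "NAOMI>OUTPUT" prefix and has a chain identifier, residue number and residue name on each side of " - ", exactly as in the example. The layout for structures without chain identifiers is not covered.

```python
PREFIX = "NAOMI>OUTPUT"


def parse_chain_breaks(lines):
    """Return [((chain, number, name), (chain, number, name)), ...] for each break."""
    breaks = []
    for line in lines:
        line = line.strip()
        # The header line has the prefix but no " - " separator, so it is skipped.
        if not line.startswith(PREFIX) or " - " not in line:
            continue
        left, right = line[len(PREFIX):].split(" - ", 1)
        ends = []
        for side in (left, right):
            fields = side.split()
            if len(fields) != 3:
                break
            chain, number, name = fields
            ends.append((chain, int(number), name))
        if len(ends) == 2:
            breaks.append(tuple(ends))
    return breaks


if __name__ == "__main__":
    sample = [
        "NAOMI>OUTPUT List of residues at ends of breaks in main-chain of rec_C.pdb",
        "NAOMI>OUTPUT C  56 HIS - C  62 GLY",
        "NAOMI>OUTPUT C  73 THR - C  78 GLN",
    ]
    for start, end in parse_chain_breaks(sample):
        print(start, end)
```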
    
    
    Note you may wish to pipe the output from these commands through egrep e.g.
    	naomi < rec_C.inp | egrep -v "WARNING"
    
    Note: expected breaks in multi-chain proteins are ignored, provided that chain identifiers are properly used in the PDB file.


    Cyclic permutations

    Reconstruction of some symmetric protein structures from their supplied coordinates requires cyclic permutation of the coordinates. The command:

    		cyclic permute
    
    transforms a set of x,y,z coordinates to y,z,x. Thus, three applications of the command will restore the original coordinates i.e.
    	first time:   x,y,z  ->  y,z,x
    	second time:  y,z,x  ->  z,x,y
    	third time:   z,x,y  ->  x,y,z
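
    The three-step cycle above can be checked with a few lines of Python. This sketch only illustrates the coordinate transformation described here; the function name cyclic_permute is hypothetical and the code is not part of NAOMI.

```python
def cyclic_permute(coords):
    """Map every (x, y, z) to (y, z, x), as described for 'cyclic permute'."""
    return [(y, z, x) for (x, y, z) in coords]


if __name__ == "__main__":
    original = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
    current = original
    for step in range(3):
        current = cyclic_permute(current)
        print(step + 1, current)
    # After three applications the original coordinates are restored.
    print(current == original)
```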
    
    Use this command in conjunction with the pdb_write command to write out coordinates at each stage of the manipulation.


    Alterations to atom and residue numbering

    You can renumber residues in a PDB file to be consecutively numbered starting at a given number with the command

          reset_resnum resnum
    
    e.g.
          reset_resnum 4
    
    would number all the residues in the protein so that the first residue is 4, the second is 5 etc.


    Writing out a pdb file

    The command

        pdb_write
    
    writes out the current zone to the reports directory, with the SAME filename as the input pdb file. NB if the pdb directory, and the reports directory are the same - the original file will be overwritten.

    At present this command writes out calculated amide proton data, for use with the "molscript hbonds" command.


    Renaming atoms according to IUPAC nomenclature

    The correct atom names for some side-chain atoms in PDB files depend on the conformation of the side-chain. For many computations (e.g. calculation of three-dimensional structure from NMR derived data) the atoms in a protein need to be named before the conformation of the residue is known.

    Thus it is common to find errors in the naming of some atoms. To rename atoms correctly, use the command

    		validate
    
    which correctly renames atoms in the protein. If you wish to write out the corrected structure, use a script as shown in the example below. This particular example shows how to rename the atoms in an ensemble of structures, but you can do the same operation on a single file.
    	correct
    	for pdb list ensem.lis
    	{
    	validate
            }
    
    You could equally well use the pdb_write command within the bracketed loop
    	for pdb list ensem.lis
    	{
    	validate
    	pdb_write
            }
    
    The correct command is uniquely linked with the validate command. The pdb_write command is a more general command. If you are using X-PLOR format files, you should place the "terminii X-PLOR" command immediately before the validate command, as in the script:
    	for pdb list ensem.lis
    	{
    	terminii X-PLOR
    	validate
    	pdb_write
    	}


    Predicting protein-protein interactions

    Sorry, this section of the documentation is not currently available


    Graphical output

    Interfaces to the programs MOLSCRIPT (P. J Kraulis), INSIGHT II (Biosym), and QUANTA (MSI) are provided to make scientific visualization, and preparation of figures for presentation and publication time-efficient. Intrinsic graphical output, in PostScript format, is also produced for some commands.


    MOLSCRIPT Interface

    Use the command,

        molscript parameter
    
    where parameter can be:
        contacts        - input file showing both intra- and inter-chain
                          hydrophobic interactions schematically
                          (do "calc hydrophobic" before this command)

        hbonds          - hydrogen bonds (needs calculated HN positions,
                          so use the pdb_write command; do "use hbonds"
                          first)

        sec_struc       - input file for cartoon plot
                          (do "use sec_struc" before using this command)

        sec_struc_col1  - input file for cartoon plot
                          (do "use sec_struc" before using this command),
                          rainbow coloured from N (violet) to C (red)
                          termini, with distance depth-cueing

        sec_struc_col2  - input file for cartoon plot
                          (do "use sec_struc" before using this command),
                          rainbow coloured from N (violet) to C (red)
                          termini, with NO distance depth-cueing
    	
    
    
    In addition, the commands
    	molscript on
    	molscript off
    
    switch on and off production of MOLSCRIPT input files when particular commands are executed e.g.
    	predict possible binding sites
    
    can produce an input file to visualize the results, by using the program MOLSCRIPT.


    RASMOL Interface

    You need to configure your account to use the RASMOL interface. Set the environment variable NAOMI_RASMOL_PATH to the pathname of the version of RASMOL you want to use with NAOMI (some sites have different versions of RASMOL compiled for different machines, e.g. with 8-bit or 24-bit (32-bit) colour).

    In your .cshrc file, put a line similar to the one below, substituting the "/usr/people..." with the pathname for RASMOL at your site (don't forget to source this file and rehash after doing this).

    setenv NAOMI_RASMOL_PATH /usr/people/smb/smb-bin/bin/rasmol.24
    
    At any point in your script after you read in a structure, you can automatically start up a RASMOL interface by using the command
    	start rasmol
    
    The backbone of the structure will be displayed, with each chain coloured differently. The residues that are currently selected within NAOMI will be coloured purple. For example, the following script will read a structure into NAOMI and start up the NAOMI-RASMOL interface, with the A chain coloured purple:
    	read pdb struc.pdb
    	zone none
    	select A*
    	start rasmol
    
    If you have licenses for various FEATURE modules, you can also use the commands
    	rasmol on
    	rasmol off
    
    to start up RASMOL with more complicated representations, as described at the relevant places in the documentation for these modules.


    INSIGHT II Interface

    The commands

    	insight on
    	insight off
    
    switch on and off the production of BIOSYM COMMAND LANGUAGE (BCL) files. These files appear in the reports directory, and when read into INSIGHT II at the command line, set up new commands in the program which appear as options on pull-down menus.


    QUANTA

    Use the command

    	quanta parameter
    

    Intrinsic NAOMI graphical output


    Executing UNIX shell commands within NAOMI

    Use the system command:
        system string
    
    where string is passed to the shell that naomi was started from.

    Within a NAOMI script, if you wanted for example to compress a file called naomi.dat, you could have the command:

      system compress naomi.dat &
    

    Known bugs

    • Non-exhaustive exception handling of the command language at present: if you make an error in the input script, NAOMI will sometimes ignore the command without warning you, and may crash as a result of an input error. This is a rare problem, especially if you don't make typos!
    • Non-exhaustive trapping of memory allocation problems. This will not be a problem unless you are working right at the limits of your machine's capacity (this is likely only to happen if you are working with very large systems i.e. several thousand residues on a machine with less than 64 MB Memory available).

    Assessing bad contacts in a protein

    Within the TYRA level of commands, use the command
             bad contacts
    
    to assess the extent to which atoms are making bad contacts, i.e. the extent to which atoms are too close to each other. It returns a number describing the whole protein.


    Setting dihedral angles to specific values

    Within the TYRA level of commands, use
               set torsion res tor angle 
    
               e.g. set torsion A5 chi1 -120.0
    
    would set chi1 for residue 5 (chain A) to -120.0 degrees. tor can be phi, psi, omega, chi1, chi2, chi3, chi4 or chi5. Note if you try to set a degree of freedom that is not appropriate (e.g. a ring-opening torsion), the command will fail. NAOMI should warn you if this happens.

    When main-chain modelling, all residues that are currently selected (with the zone or select commands) and on the C-terminal side of the dihedral angle that is being changed will move.


    Setting dihedral angles to random values

    Please note, the commands:
    	random seed
    	randomise
    
    are available only if you have a license for the NMR structure refinement module.

    In some cases, when calculating NMR structures from primary data, bias is introduced into the system by using the same starting structure for each calculated structure (especially in regions of structure that are poorly defined by restraints). A similar problem of bias has been noted in other applications where three-dimensional structures are calculated from distance restraints (e.g. comparative modelling, protein folding pathway modelling).

    To minimize the effect of bias introduced into such systems, a facility for generating random families of structures, but having good covalent geometry is provided. Within the TYRA level of commands, use:

    	randomise param family-name n
    
    where
    
      param can be: main, side or both
      family-name the name of the family of structures to be generated
      n is number of members required for the family
    
    For example.
    	randomise main s_ 10
    
    would generate 10 structures with names "s_1.pdb", "s_2.pdb", "s_3.pdb" etc., having the values of phi and psi set to random values for all currently selected residues (omega values are left unchanged from the structure that was read in; if you want to change these, they must be set manually using the set torsion command). That is, main specifies that main-chain dihedrals should be randomised, side specifies that "allowed" side-chain dihedrals should be randomised, and both randomises both main-chain and side-chain dihedral angles.

    The family of structures is created in the "report" directory you specify in your input script.
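
    When a family is passed on to another program, it can be handy to know in advance which files to expect in the report directory. The Python sketch below lists the expected names, following the "s_1.pdb", "s_2.pdb" pattern described above. The function names are hypothetical, and the numbering from 1 to n is taken from the example rather than from any further specification.

```python
import os


def family_filenames(family_name, n):
    """Names expected from 'randomise param family-name n', e.g. s_1.pdb ... s_n.pdb."""
    return [family_name + str(i) + ".pdb" for i in range(1, n + 1)]


def missing_members(report_dir, family_name, n):
    """Return the expected family members that are not present in report_dir."""
    return [name for name in family_filenames(family_name, n)
            if not os.path.isfile(os.path.join(report_dir, name))]


if __name__ == "__main__":
    print(family_filenames("s_", 10))
```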

    To obtain reproducible results, and/or generate several different families of structures, you must specifically "seed" the random number generator. Within the TYRA level of commands, you may do this by using the command

    	random seed integer
    
    For example
    	random seed 12345
    
    NB you should place the "random seed" command before the "randomise" command in your input script.

    So, if you are using X-PLOR, you might use the following input script to generate a family of 40 random structures:

    	read pdb template.pdb
    	terminii X-PLOR
    	tyra
    	random seed 145625
    	randomise both s_ 40
    	end
    

    'Mutating' residues

    Within the TYRA level of commands, use
               mutate res code
    
    where code is either a 1 or 3 letter amino acid code. NB upper or lower case is acceptable for the code.
        e.g. mutate 7 e
          or mutate 7 glu
    
    would make residue 7 a glutamate residue. Conformations are from the library. You can change the conformation of the new side-chains by using the "set torsion" command, or the "remove bad contacts" command.

    NB if the residue has a chain identifier, then this must be concatenated to the residue id, e.g.

     	mutate A56 TYR
    

    Automatically moving a side-chain to minimize bad contacts

    Within the TYRA level of commands, use the command
    	minimize bad contacts res
    
    where res should be a concatenation of chain_id and residue id e.g.
    	minimize bad contacts A26
    
    would move the side-chain of residue 26 in chain A to a position making the minimum number of bad contacts with the rest of the protein. This is a useful command to use after using the 'mutate' command.


    Disulphide bond manipulation

    Manipulating disulphide bonds by interactive graphics is awkward, and sometimes disulphide bonds are modelled into inappropriately strained conformations in both X-ray and NMR protein structures. In some cases the strained conformations arise because of errors in the potential functions of structure refinement programs. You may wish to see if the conformation can be improved, either to obtain a better fit with experimental data or, if experimental data is poor or not available, to obtain a less strained conformation.

    If you have a license for the PROTEIN ENGINEERING/DESIGN module of NAOMI, you can use the disulphide engineering commands to investigate possible alternative models for disulphide bonds (an exhaustive conformational search of energetically favourable disulphide bond conformations is performed).


    Generating side-chain ensembles - Type I

    It is often useful to get a picture of how "constrained" the side-chains in a protein are by their surroundings. Within the TYRA level of commands,
    	make side-chain ensemble-1 string
    
    where string is a list of one-letter codes
    	e.g.  IAFYW
    
    will generate an ensemble of 35 pdb files based on the input structure. The ensemble will have the selected side-chains randomly moved such that they make no bad contacts with each other. This is useful when one wants to make no prior assumptions about the behaviour of amino acid side-chains according to their tertiary environments.


    Automatic design of stabilizing mutants

    Sorry, this section of the documentation is not currently available


    Engineering unstrained disulphide bonds

    Sorry, this section of the documentation is not currently available

    (ii) Within the TYRA level of commands, use the command

    Sorry, this section of the documentation is not currently available


    Automatic identification and classification of secondary structure

    NAOMI uses a fuzzy logic algorithm to recognize secondary structural motifs in proteins. Decisions are made as to whether possible segments are, or are not, complete secondary structural elements. This is different from the approach used by some other programs that identify repeating patterns (of, for example, chain conformation or hydrogen bonds).

    The command

    	use sec_struc
    
    explicitly tells NAOMI that secondary structural information will be required later on in the script. It will automatically invoke calculations of other properties, e.g. hydrogen bonds, if these have not already been calculated elsewhere in the script. Use this command, followed by
    	table sec_struc
    
    to provide a list of secondary structural elements in the protein. The output takes the form of an overall summary, followed by details of residue numbers involved in helices, strands (forming part of sheets), and beta turns. Example output is shown below.
    
    ..bbbbbbbb.bbbbbbbbbbaaaaaaaaaaaaaaaa....bbbbbb...
    bbbbbb
    
    Beta strands
         3  -  10
         12  -  21
         42  -  47
         51  -  56
    Helices 310, regular, pi
         22  -  37
    Beta-turns
         47  -  50  Type IV AA
    
    
    A novel algorithm, making use of hydrogen bonding information and polypeptide chain conformation parameters, is used to recognize the secondary structural motifs.
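    The relationship between the one-line summary and the residue ranges listed under it can be illustrated with a short Python sketch. This is not part of NAOMI; it assumes that 'b' marks strand residues, 'a' marks helix residues, '.' marks everything else, that wrapped summary lines are simply joined, and that residues are numbered consecutively from 1. The function name segments is hypothetical.

```python
from itertools import groupby


def segments(per_residue, first_residue=1):
    """Collapse a per-residue string into (code, start, end) runs, skipping '.'."""
    runs = []
    position = first_residue
    for code, group in groupby(per_residue):
        length = len(list(group))
        if code != ".":
            runs.append((code, position, position + length - 1))
        position += length
    return runs


# The two wrapped lines of the example summary, joined into one string
# (built from run lengths so that the counts are explicit).
summary = ("." * 2 + "b" * 8 + "." + "b" * 10 + "a" * 16
           + "." * 4 + "b" * 6 + "." * 3 + "b" * 6)

for code, start, end in segments(summary):
    print(code, start, end)
```

    For the example above this reproduces the strand ranges 3-10, 12-21, 42-47 and 51-56 and the helix 22-37. Two elements of the same type with no gap between them would be merged by this sketch, so the residue tables remain the authoritative listing.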
    Hydrogen bonds

    To obtain a list of hydrogen bonds, along with calculated energies (from a model using explicitly calculated lone-pair positions, and quantifying both electrostatic effects and quality of orbital overlap), first use the command:
    	use hbonds
    
    to tell NAOMI to calculate all information about hydrogen bonds in the protein.

    Then use any combination of the following commands to output information about hydrogen bonds in the protein.

    	table hbonds_da
    	table hbonds_ad
    	table side-chain_hbonds
    
    The "da" in the first command stands for donor-acceptor listing, so a list of all main-chain donors is output, along with the partnering main-chain and side-chain acceptors. The calculated energies are useful in deciding which is the major contributor to bifurcated hydrogen bonds, and also in analysing secondary structure in detail, e.g. under- or over-winding of helices, or missing hydrogen bonds due to helix bends. The command "molscript hbonds" may also be used to produce a graphical representation (see the examples section on the NAOMI Web Site).

    Example output is shown below:

    
               Table of hydrogen bonds: Donor to acceptors
    
          for  1rnb.pdb  resolution  0.00 Angstroms
    
    (NB remember to validate the structure with the VALIDATE and CORRECT
    options before calculating H-bonds for the best results
    
     |Donors |---------------------  Acceptors -------------------------|
     | Main  |      Main                  |         Side chain          |
     | Chain |      Chain                 |                             |
        86 D       #              #            #           #           #
        87 R  99 T  -8.41         #            #           #           #
        88 I       #              #            #           #           #
        89 L  97 Y -10.23         #            #           #           #
        90 Y       #              #            #           #           #
        91 S  95 L  -6.39         #            #           #           #
        92 S       #              #            #           #           #
        93 D       #              #         93 D OD1 -5.08 #           #
        94 W  91 S  -5.10         #            #           #           #
        95 L       #              #         91 S OG  -7.76 #           #
        96 I       #              #            #           #           #
    
    
    The "table hbonds_ad" command lists the hydrogen bonds analogously, but as acceptor to donor.

    The "table side-chain_hbonds" command lists possible side-chain - side-chain hydrogen bonds in the protein. Example output is shown below:

    
          Table of side-chain - side-chain hydrogen bonds
          for  rec_B.pdb  resolution  0.00 Angstroms
    
    Format is donor - acceptor, with chain, residue number, residue and atom
    given for both.  NB At present the Energy is actually the donor-acceptor
    distance.
    
    B   34  K NZ  - B   51  T OG1 (E =   3.19)
    B   34  K NZ  - B   53  E OE2 (E =   2.98)
    B   39  R NH1 - B  132  D OD1 (E =   3.62)
    B   39  R NH2 - B  132  D OD1 (E =   2.93)
    B   40  S OG  - B   42  E OE1 (E =   3.73)
    B   45  T OG1 - B   42  E OE1 (E =   3.32)
    B   45  T OG1 - B   42  E OE2 (E =   2.91)
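    Because each line of this table has a fixed donor - acceptor layout, it is straightforward to post-process. The Python sketch below is an illustration only: the regular expression is inferred from the example lines above, it assumes every residue carries a one-character chain identifier, and the function name is hypothetical. As the table header notes, the value labelled E is at present a donor-acceptor distance, so it is stored under that name.

```python
import re

# donor: chain, number, residue, atom - acceptor: chain, number, residue, atom (E = value)
LINE = re.compile(
    r"^\s*(\S)\s+(-?\d+)\s+([A-Z])\s+(\S+)\s+-\s+"
    r"(\S)\s+(-?\d+)\s+([A-Z])\s+(\S+)\s+\(E\s*=\s*(-?\d+\.\d+)\)"
)


def parse_sidechain_hbonds(text):
    """Return one record per side-chain - side-chain hydrogen bond line."""
    bonds = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # title, blank and explanatory lines are skipped
        dc, dn, dr, da, ac, an, ar, aa, value = match.groups()
        bonds.append({
            "donor": (dc, int(dn), dr, da),
            "acceptor": (ac, int(an), ar, aa),
            "distance": float(value),
        })
    return bonds


sample = """\
B   34  K NZ  - B   51  T OG1 (E =   3.19)
B   34  K NZ  - B   53  E OE2 (E =   2.98)
B   39  R NH1 - B  132  D OD1 (E =   3.62)
"""

bonds = parse_sidechain_hbonds(sample)
short = [b for b in bonds if b["distance"] <= 3.2]
print(len(bonds), len(short))
```

    The 3.2 Angstrom cut-off used in the usage lines is an arbitrary value chosen for the illustration, not a NAOMI parameter.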
    
    

    Solvent accessibility calculations


    Relevant command summary:
    	use solvent access
    	table residue_access
    	table total_access
    	zone 
    	select
    
    (The following commands require a license for the protein function module)
    	zone2
    	select2
    

    Commands are provided for calculating the solvent accessible surface area of atoms and residues in a protein, using a fast numerical integration algorithm. The solvent accessible surface is taken as that defined by Lee and Richards, i.e. the locus of the centre of a probe sphere (representing a water molecule) rolled over the entire van der Waals surface of the protein.

    First, use the command:

    	use solvent access
    
    to tell NAOMI that solvent accessibility calculations are required in this script. Remember to issue "use" commands only after you have made your residue selection, e.g.
    	zone 10 15
    	use solvent access
    

    This command invokes calculation of both absolute (in units of square Angstroms) and percentage (100% accessibility corresponding to the accessibility of residue X in the three-residue peptide G-X-G, where G, X and G are in extended main-chain and/or side-chain conformations) accessibilities for all selected atoms and residues. The calculations are rapid compared with many solvent accessibility algorithms - accessibilities for 1000 atoms can generally be calculated in approximately 10 seconds on a MIPS R4600SC Workstation.

    The following commands then allow output of the results of the calculations:

    	table residue_access param1 param2
    
    param1 controls whether main-chain, side-chain, both main-chain and side-chain, total, or side-chain carbon residue accessibilities are output. It can take the values:
    	main
    	side
    	both
    	total
    	carbon
    
    param2 controls the units of the calculation, i.e. whether absolute accessibilities (in square Angstroms) or percentage accessibilities are output. It can take the values:
    	absolute
    	percent
    	both
    
    For example, the script:
    	use solvent access
    	table residue_access both both
    
    might produce the following output:
    
    NAOMI>Calculating solvent accessibile surface areas...
    NAOMI>OUTPUT Residue solvent accessibilities:
    NAOMI>OUTPUT main-chain, side-chain (in square Angstroms and percentage) 
    NAOMI>OUTPUT
    NAOMI>OUTPUT     1  K    25 A^2     71 % ,     89 A^2     45 % 
    NAOMI>OUTPUT     2  V    17 A^2     50 % ,     71 A^2     54 % 
    NAOMI>OUTPUT     3  F     4 A^2     12 % ,      8 A^2      4 % 
    NAOMI>OUTPUT     4  G    28 A^2     34 % ,      0 A^2      0 % 
    NAOMI>OUTPUT     5  R     1 A^2      4 % ,     72 A^2     31 % 
    NAOMI>OUTPUT     6  C     2 A^2      6 % ,     40 A^2     38 % 
    NAOMI>OUTPUT     7  E     5 A^2     15 % ,     60 A^2     41 % 
    NAOMI>OUTPUT     8  L     0 A^2      0 % ,      0 A^2      0 % 
    NAOMI>OUTPUT     9  A     0 A^2      0 % ,      0 A^2      0 % 
    NAOMI>OUTPUT    10  A     2 A^2      4 % ,     41 A^2     55 % 
    NAOMI>OUTPUT    11  A     6 A^2     16 % ,     16 A^2     21 % 
    NAOMI>OUTPUT    12  M     0 A^2      0 % ,      0 A^2      0 % 
    NAOMI>OUTPUT    13  K    11 A^2     31 % ,     68 A^2     35 % 
    NAOMI>OUTPUT    14  R    27 A^2     78 % ,    154 A^2     67 % 
    NAOMI>OUTPUT    15  H     9 A^2     25 % ,     23 A^2     16 % 
    NAOMI>OUTPUT    16  G    40 A^2     48 % ,      0 A^2      0 % 
    NAOMI>OUTPUT    17  L     0 A^2      0 % ,      0 A^2      0 % 
    NAOMI>OUTPUT    18  D     7 A^2     18 % ,     31 A^2     27 % 
    NAOMI>OUTPUT    19  N     7 A^2     19 % ,     89 A^2     73 % 
    NAOMI>OUTPUT    20  Y     5 A^2     13 % ,     62 A^2     33 % 
    NAOMI>OUTPUT    21  R    28 A^2     80 % ,    112 A^2     49 % 
    NAOMI>OUTPUT    22  G    69 A^2     84 % ,      0 A^2      0 % 
    NAOMI>OUTPUT    23  Y     0 A^2      0 % ,     42 A^2     22 % 
    NAOMI>OUTPUT    24  S     1 A^2      2 % ,     31 A^2     33 % 
    
    NB glycine residue side-chains take 0 values for all accessibilities (because glycine residues don't have side-chains!)
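    The residue-level listing can be read into a small table for further filtering. The following Python sketch is an illustration rather than a NAOMI feature: the pattern is inferred from the "both both" example output above, it assumes no chain identifier is printed, and the function and key names are hypothetical.

```python
import re

# number, residue, main-chain A^2, main-chain %, side-chain A^2, side-chain %
ROW = re.compile(
    r"^\s*NAOMI>OUTPUT\s+(-?\d+)\s+([A-Z])\s+"
    r"(\d+) A\^2\s+(\d+) %\s*,\s*(\d+) A\^2\s+(\d+) %"
)


def parse_residue_access(text):
    """Return one record per residue line of 'table residue_access both both'."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # banner and header lines do not match
        number, residue, main_abs, main_pct, side_abs, side_pct = match.groups()
        rows.append({
            "number": int(number),
            "residue": residue,
            "main_abs": int(main_abs),
            "main_pct": int(main_pct),
            "side_abs": int(side_abs),
            "side_pct": int(side_pct),
        })
    return rows


sample = """\
NAOMI>OUTPUT Residue solvent accessibilities:
NAOMI>OUTPUT     1  K    25 A^2     71 % ,     89 A^2     45 %
NAOMI>OUTPUT     4  G    28 A^2     34 % ,      0 A^2      0 %
NAOMI>OUTPUT     8  L     0 A^2      0 % ,      0 A^2      0 %
NAOMI>OUTPUT     9  A     0 A^2      0 % ,      0 A^2      0 %
"""

rows = parse_residue_access(sample)
# Glycine is excluded because its side-chain values are always 0.
buried = [r["number"] for r in rows if r["residue"] != "G" and r["side_pct"] == 0]
print(buried)
```

    Excluding glycine when looking for fully buried side-chains follows directly from the note above.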

    NB residues near termini and chain-breaks may apparently have greater than 100% accessibilities because 100% is calculated within a 3-residue segment.

    The command

    	table total_access
    
    outputs information on the total solvent accessible surface of the protein. Example output is:
    
    NAOMI>OUTPUT Total Solvent Accessible Surface of Protein
    NAOMI>OUTPUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    NAOMI>OUTPUT Total               6242 A^2
    NAOMI>OUTPUT    Main-chain       1393 A^2     22 %
    NAOMI>OUTPUT    Side-chain       4849 A^2     78 %
    NAOMI>OUTPUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    NAOMI>OUTPUT Total hydrophobic   3318 A^2     53 %
    NAOMI>OUTPUT    Main-chain        677 A^2     20 %
    NAOMI>OUTPUT    Side-chain       2640 A^2     80 %
    NAOMI>OUTPUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    NAOMI>OUTPUT Total hydrophilic   2924 A^2     47 %
    NAOMI>OUTPUT    Main-chain        715 A^2     24 %
    NAOMI>OUTPUT    Side-chain       2209 A^2     76 %
    NAOMI>OUTPUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    NB The percentages here are not relative solvent accessibilities as they are in the residue-level output. Rather, they are percentages of totals, e.g. the Total hydrophobic percentage of 53 in the table above is relative to the Total surface accessibility of 6242 A^2.
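    The arithmetic behind these percentages can be checked in a few lines of Python. The sketch below uses the areas from the example table; the helper name share is hypothetical. From the numbers, the Main-chain and Side-chain percentages under each heading appear to be relative to that heading's own total rather than to the overall total.

```python
def share(part, whole):
    """Percentage of 'whole' represented by 'part', rounded to an integer."""
    return round(100 * part / whole)


total = 6242             # Total accessible surface, A^2
hydrophobic = 3318       # Total hydrophobic, A^2
hydrophobic_main = 677   # hydrophobic main-chain, A^2
hydrophobic_side = 2640  # hydrophobic side-chain, A^2

print(share(hydrophobic, total))             # relative to the overall total
print(share(hydrophobic_main, hydrophobic))  # relative to the hydrophobic total
print(share(hydrophobic_side, hydrophobic))  # relative to the hydrophobic total
```

    These three values come out as 53, 20 and 80, matching the table.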

    More complex solvent accessibility options
    You may wish to calculate the solvent accessibility of a "molecule" with some parts effectively "missing". For example, suppose you had a protein system with two chains, A and B. You may wish to calculate accessibilities in chain A in the presence and absence of chain B. To do this, a second level of residue selection is provided (but only in the protein function module). Some examples should make things clear:

    	zone none
    	select A10:A30,A40
    	zone2 none
    	select2 A1:A100,B1:B100
    	use solvent access
    
    This script will calculate accessibilities for residues A10 through A30, and A40, in the presence of all the atoms in residues A1:A100 and B1:B100. If "select2 A1:A100" had been used instead, the calculations would proceed as though the atoms in chain B were not present.
    Interior/Exterior Residue estimation

    Visual inspection of globular protein folds shows that some residues may be regarded as being interior to the fold whilst some are on the protein surface.

    Use the commands

    	table exterior_residues
    	table interior_residues
    
    Example output is shown below
    
    NAOMI>OUTPUT_____________________________________
    NAOMI>OUTPUT Exterior residues...
    NAOMI>OUTPUT
    NAOMI>OUTPUT      1  V
    NAOMI>OUTPUT      2  I
    NAOMI>OUTPUT      4  M
    NAOMI>OUTPUT      5  P
    NAOMI>OUTPUT      6  S
    NAOMI>OUTPUT      8  R
    ...
    output deleted for reasons of space
    ...
    NAOMI>OUTPUT_____________________________________
    NAOMI>OUTPUT_____________________________________
    NAOMI>OUTPUT Interior residues...
    NAOMI>OUTPUT
    NAOMI>OUTPUT      3  A
    NAOMI>OUTPUT      7  V
    NAOMI>OUTPUT     10  Y
    NAOMI>OUTPUT     11  A
    NAOMI>OUTPUT     16  V
    
    
    These commands provide an automatic classification of residues as interior or exterior.


    Salt bridges

    Use the command
    	table salt-bridges
    
    to output a list of possible salt-bridges in a protein. In addition to residue information (chain, number and type), the closest approach of atoms in the side-chains [r(min) in Angstroms] and side-chain - side-chain hydrogen bonding information are indicated [(HB) indicates a hydrogen bond, (__) indicates no hydrogen bond].

    Example output is shown below

    
    A  38 LYS and A  33 GLU :  r(min) =   6.7 (__)
    A  38 LYS and B 127 GLU :  r(min) =   7.9 (__)
    A  41 LYS and A  32 GLU :  r(min) =   3.2 (HB)
    A  41 LYS and B 127 GLU :  r(min) =   3.1 (HB)
    A  64 ARG and A  65 GLU :  r(min) =   6.5 (__)
    
    

    Supersecondary structure:

    Automatic identification and classification of beta hairpin loops

    To identify and classify beta hairpins in a structure according to the nomenclature of Wilmot and Thornton, use the commands

    	use hairpins
    	table hairpins
    
    Example output is shown below for the homodimeric protein glutathione reductase (the calculation for this c. 1000 residue protein took less than 1 minute of cpu time on an R4000SC INDIGO2):
    
    A 120  - A 121  2:2  FVD AK TLE  Wide left bulge
    A 126  - A 127  2:2  LEV NG ETI  Regular
    B 120  - B 121  2:2  FVD AK TLE  Wide left bulge
    B 126  - B 127  2:2  LEV NG ETI  Regular
    A 237  - A 241  3:5  VVK NTDGS LTL  G1 bulge
    A 246  - A 250  3:5  TLE LEDGR SET  G1 bulge
    B 237  - B 241  3:5  VVK NTDGS LTL  G1 bulge
    B 246  - B 250  3:5  TLE LEDGR SET  G1 bulge
    A 344  - A 348  5:5  TVV FSHPP IGT  Regular
    B 344  - B 348  5:5  TVV FSHPP IGT  Regular
    A 377  - A 387  9:11  SFT AMYTAVTTHRQ PCR  Regular
    B 377  - B 387  9:11  SFT AMYTAVTTHRQ PCR  Regular
    
    
    The residue numbers and sequence of the loops are output, along with the Sibanda and Thornton classification (3:5 etc). The sequence of the three residues flanking the loop on each side, and the details of the secondary structure of these "flanking" residues (regular, bulge etc) are also provided. The output is ordered by increasing loop size.
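    As an illustration of how the fields of each line fit together, the Python sketch below splits a hairpin line into its parts and checks that the loop sequence has one letter per residue in the stated range. It is not part of NAOMI: the pattern is inferred from the example output, the function name is hypothetical, and the length check assumes consecutive residue numbering without insertion codes.

```python
import re

# chain start - chain end  class  flank  loop  flank  description
HAIRPIN = re.compile(
    r"^\s*(\S)\s+(-?\d+)\s+-\s+(\S)\s+(-?\d+)\s+(\d+:\d+)\s+"
    r"([A-Z]+)\s+([A-Z]+)\s+([A-Z]+)\s+(.+?)\s*$"
)


def parse_hairpins(text):
    """Return one record per hairpin line; other lines are ignored."""
    hairpins = []
    for line in text.splitlines():
        match = HAIRPIN.match(line)
        if match is None:
            continue
        chain, start, _, end, klass, before, loop, after, description = match.groups()
        hairpins.append({
            "chain": chain,
            "start": int(start),
            "end": int(end),
            "class": klass,
            "flank_before": before,
            "loop": loop,
            "flank_after": after,
            "description": description,
        })
    return hairpins


sample = """\
A 120  - A 121  2:2  FVD AK TLE  Wide left bulge
A 237  - A 241  3:5  VVK NTDGS LTL  G1 bulge
A 377  - A 387  9:11  SFT AMYTAVTTHRQ PCR  Regular
"""

hairpins = parse_hairpins(sample)
for h in hairpins:
    # One loop letter per residue between start and end, inclusive.
    consistent = len(h["loop"]) == h["end"] - h["start"] + 1
    print(h["chain"], h["start"], h["end"], h["class"], h["description"], consistent)
```

    In the examples shown, the loop length also equals the second number of the X:Y class.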

    If you want to have the conformation of each residue in the loop output as well (A = alpha, B = beta etc), then do:

    	calc phi psi short
    	use hairpins
    	table hairpins
    
    which gives the following output. The conformation of each residue is shown under the amino acid sequence.
    
    A 120  - A 121  2:2  FVD AK TLE  Wide left bulge
                         PAB AA BBB
    A 126  - A 127  2:2  LEV NG ETI  Regular
                         BBB L+ PBB
    B 120  - B 121  2:2  FVD AK TLE  Wide left bulge
                         PAB AA BBB
    B 126  - B 127  2:2  LEV NG ETI  Regular
                         BBB LG PBB
    A 237  - A 241  3:5  VVK NTDGS LTL  G1 bulge
                         BBP BAAGP BBP
    A 246  - A 250  3:5  TLE LEDGR SET  G1 bulge
                         BPB BAAGP BBB
    B 237  - B 241  3:5  VVK NTDGS LTL  G1 bulge
                         BBP PAAGP BBP
    B 246  - B 250  3:5  TLE LEDGR SET  G1 bulge
                         BPB BAAGP BBB
    A 397  - A 400  4:4  VCV GSEE KIV  Narrow right bulge
                         BBB EAAL PPA
    B 397  - B 400  4:4  VCV GSEE KIV  Narrow right bulge
                         BBB EAAL PPA
    A 344  - A 348  5:5  TVV FSHPP IGT  Regular
                         BBB APBPP BBB
    B 344  - B 348  5:5  TVV FSHPP IGT  Regular
                         BBB BPBPP BBB
    
    
    Thanks to Y. J. K. Edwards for granting permission to incorporate a modified version of the TURNPIN beta hairpin recognition algorithm into NAOMI.
    Hydrophobic Interaction Analysis

    Identifying close attractive van der Waals interactions between pairs of non-polar groups is a useful way of identifying the roles that particular residues play in the hydrophobic cores of proteins. Frequently, this type of analysis is more information-rich than calculation of solvent accessible surface areas because details of intramolecular interactions are revealed (for example, if you wanted to know how a helix was interacting with a beta sheet). See the examples section on the NAOMI Web Site.

    Use the commands

    	calc hydrophobic
    	table hydrophobic
    
    to obtain such an analysis. Example output is shown below:
    
    Residues whose side chains make hydrophobic contacts
     10    T -  11    P
     11    P -  10    T
     13    V -  32    K
     13    V -  34    V
     14    T -  16    Y
     14    T -  63    K
     15    T -  30    T
     16    Y -  14    T
     16    Y -  33    A
     16    Y -  36    A
     16    Y -  39    A
     16    Y -  43    F
     16    Y -  58    Y
     16    Y -  63    K
     16    Y -  65    F
     17    K -  30    T
     17    K -  64    T
     18    L -  20    I
    
    
    Each of these pairs of residues contains methyl, methylene or methine groups that are interacting with each other.
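    A quick way to see which residues sit at the centre of a hydrophobic cluster is to count the distinct partners of each residue in this listing. The Python sketch below is illustrative only; the pattern is inferred from the example output (which carries no chain identifiers), and the function name is hypothetical. Each pair is recorded in both directions, so it does not matter whether the listing itself repeats a contact in reverse order.

```python
import re
from collections import defaultdict

# number residue - number residue
PAIR = re.compile(r"^\s*(-?\d+)\s+([A-Z])\s+-\s+(-?\d+)\s+([A-Z])\s*$")


def contact_partners(text):
    """Map each (number, residue) to the set of residues it contacts."""
    partners = defaultdict(set)
    for line in text.splitlines():
        match = PAIR.match(line)
        if match is None:
            continue  # the title line is skipped
        i, res_i, j, res_j = match.groups()
        first, second = (int(i), res_i), (int(j), res_j)
        partners[first].add(second)
        partners[second].add(first)
    return partners


sample = """\
Residues whose side chains make hydrophobic contacts
 14    T -  16    Y
 14    T -  63    K
 16    Y -  14    T
 16    Y -  33    A
 16    Y -  43    F
"""

partners = contact_partners(sample)
for residue in sorted(partners, key=lambda r: -len(partners[r])):
    print(residue, len(partners[residue]))
```

    In the full example above, residue 16 Y has the most partners, which is consistent with a central role in the core.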

    A graphical representation of this can be obtained if you have access to the program molscript:

    	calc hydrophobic
    	molscript contacts
    
    It is usually a good idea to use this option in conjunction with the "molscript sec_struc" command (see the examples section on the Web Site).
    Identification of Key Residues in a fold

    The so-called 'key' residues in a fold are defined as those residues that: make a significant contribution to the hydrophobic core(s) of a protein; and/or have main-chain conformations that are energetically favourable for only a small subset of the 20 naturally occurring amino acid residues.

    Use the commands

    	calc phi psi short hydrophobic
    	table key_residues
    
    to perform the analysis. Example output is shown below:
    
    1lyz.pdb  resolution  2.00 Angstroms
    
     17 321741352 1 5  53 5 5  85223333  6 31 2
    KvfGrcelaaamkRhglDnyrgySlGNwvcaakfeSnfNtqAtNRNTDgs
                   G  + +L              LG          ++
    
    1 5 6614  2243    21  1 22 412 243   7 2 4 213 45
    tDygilqiNSrwwcNDgRtpGSrNlcnipcSallSSDiTaSvNcakKivS
       E  +         G         L
    
        6 161272 132   1 1541 2 6
    DGNgmNawvawrNrckgTDvQawirgcRl
       E            L        G
    
    The sequence of the protein is shown in one letter codes. Potential key residues are shown as lower case. The reason for the classification is shown above and/or below each potential key residue. Above the residue is shown what is effectively a weight on its contribution to the hydrophobic core(s): the higher the number, the more important the residue (this number is actually the "contact number" [Brocklehurst & Perham, 1993] for a residue). Usually, it's best to ignore those residues with a contact number of 1. This analysis will identify residues involved in all types of hydrophobic cluster (e.g. interior and exterior).

    Below a residue, the "short-hand" nomenclature for the main-chain conformation is shown; only residues with a positive value of phi are indicated (either +, L, G or E). Obviously proline residues are important in a fold, but these can be identified from the sequence alone (as opposed to analysing a structure).
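    The three-row layout is column-aligned: the character directly above a residue letter is its contact number and the character directly below is its conformation code. The Python sketch below illustrates reading one such block. It is not a NAOMI feature; the function name is hypothetical, it assumes contact numbers are printed as single digits (as in the example), and "position" is simply the position within the printed sequence, which need not equal the residue number in the pdb file.

```python
def key_residues(weights, sequence, conformation, first_position=1):
    """Return one record per lower-case (potential key) residue in a block."""
    width = len(sequence)
    weights = weights.ljust(width)            # rows may be shorter than the sequence
    conformation = conformation.ljust(width)
    records = []
    for offset, letter in enumerate(sequence):
        if not letter.islower():
            continue  # upper case: not flagged as a potential key residue
        weight = weights[offset]
        shape = conformation[offset]
        records.append({
            "position": first_position + offset,
            "residue": letter.upper(),
            "contact_number": int(weight) if weight.isdigit() else None,
            "conformation": shape if shape != " " else None,
        })
    return records


# First 22 columns of the 1lyz.pdb example above.
weights = " 17 321741352 1 5  53"
sequence = "KvfGrcelaaamkRhglDnyrg"
conformation = " " * 15 + "G  + +L"

records = key_residues(weights, sequence, conformation)
for record in records:
    print(record)
```

    A residue can be flagged for either reason or both: in this fragment the glycine at position 16 has no contact number but a G conformation, while the arginine at position 21 has both a contact number of 3 and a + conformation.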


    Covalent bonds and CONECT records

    Some computer programs require as input, information on all covalent bonds in a protein provided in the form of PDB format CONECT records. Use the script
    		use covalent_bonds
    		table conect_records
    
    to produce these. Example output is shown below:
    
    CONECT    1    2
    CONECT    2    3    5
    CONECT    3    4    7
    CONECT    6    7
    CONECT    7    8
    CONECT    8    9   11
    CONECT    9   10   15
    CONECT   11   12   13
    CONECT   14   15
    CONECT   15   16
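    For readers who want to consume these records in their own software, the Python sketch below turns CONECT lines into a table of bonded neighbours. It is an illustration under stated assumptions: fields are split on whitespace, which works for the example above but not for files where five-digit atom serial numbers fill their columns and run together; the function name is hypothetical.

```python
from collections import defaultdict


def bonds_from_conect(text):
    """Map each atom serial number to the set of atoms bonded to it."""
    neighbours = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "CONECT":
            continue  # need the record name, an atom and at least one partner
        atom, *partners = (int(field) for field in fields[1:])
        for partner in partners:
            # Record the bond in both directions.
            neighbours[atom].add(partner)
            neighbours[partner].add(atom)
    return neighbours


sample = """\
CONECT    1    2
CONECT    2    3    5
CONECT    3    4    7
"""

neighbours = bonds_from_conect(sample)
for atom in sorted(neighbours):
    print(atom, sorted(neighbours[atom]))
```

    Storing each bond in both directions means the table can be queried from either atom, whichever way round the record was written.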
    

    Disulphide bonds and SSBOND records

    You can automatically locate all disulphide bonds in a protein from the coordinates, and generate PDB format SSBOND records for them, by using the command:
    	use disulphides
    
    Typical output is:
    
    SSBOND   1 CYS A    3    CYS A   18 
    SSBOND   2 CYS A   12    CYS A   24 
    SSBOND   3 CYS A   17    CYS A   31 
    SSBOND   4 CYS A   35    CYS A   40 
    SSBOND   5 CYS A   46    CYS A   61 
    SSBOND   6 CYS A   55    CYS A   67 
    SSBOND   7 CYS A   60    CYS A   74 
    SSBOND   8 CYS A   78    CYS A   83 
    SSBOND   9 CYS A   89    CYS A  104 
    SSBOND  10 CYS A   98    CYS A  110 
    SSBOND  11 CYS A  103    CYS A  117 
    SSBOND  12 CYS A  121    CYS A  126 
    SSBOND  13 CYS A  132    CYS A  147 
    SSBOND  14 CYS A  141    CYS A  153 
    SSBOND  15 CYS A  146    CYS A  160 
    SSBOND  16 CYS A  164    CYS A  169 
    
    The format is "number of disulphide bond, residue name, chain identifier, residue number, residue name, chain identifier, residue number"

    NB Molscript format input files can be produced by using the "molscript disulphides" command after "use disulphides".


    NMR structure refinement: Identification of main-chain and side-chain hydrogen bond partners from ensembles of structures

    Hydrogen bond restraints are important in defining the three-dimensional structure of proteins in many NMR structure determinations. But it is difficult (and often impossible) to identify hydrogen bonding partners by direct observation using current NMR experiments.

    One of the best ways to attack this problem is to analyse ensembles of structures calculated without hydrogen bond restraints, to see where donor-acceptor pairs can be identified unambiguously. In combination with hydrogen exchange NMR experiments, this approach can, in favourable cases, allow unique identification of both donor and acceptor partners of:

    • both regular and distorted secondary structural hydrogen bonds
    • main-chain tertiary hydrogen bonds
    • side-chain - main-chain hydrogen bonds

    The energy-based analysis (using a realistic hydrogen bond potential function) also allows relative "strengths" of hydrogen bonds involving shared donors to be postulated.

    NB The analysis can now handle homo and hetero multi-chain proteins (as well as any type of residue identifiers in the pdb file).

    This analysis is highly recommended as a way of determining hydrogen bonding partners in NMR structure determinations, and is preferable to simply assuming for example that helices consist solely of i,i+4 hydrogen bonds etc. It is also recommended that information regarding an ensemble calculated without hydrogen bond restraints be presented in published work.

    The commands:

    	for pdb list filename
    	{
    	  use hbonds
    	  table hbonds_dump
    	  table hbonds_sidedump
    	}
    	analyse ensemble
    
    
    allow such an analysis (including statistics on calculated energies) to be performed within NAOMI.

    Please note, if you do not use precisely this script, the behaviour of NAOMI is undefined. Always check that your reports directory does not contain any rogue "tmp" files before starting this analysis. This analysis can place heavy demands on system resources. If you are working on the structure of a protein of more than 200 residues, you will need a special version of the program; please contact the author in this case (the system resources required by these commands are unaffected by memory allocation commands, so you cannot change them yourself).

    Example output for main-chain - main-chain hydrogen bonds is given below. Each possible donor is shown, along with possible acceptors, the number of times the hydrogen bond occurs in the ensemble, and statistics on the calculated energies of the hydrogen bonds. NB this is not the exact format, because chain identifiers are now output.

    
      don  acc  no.  mean   adev   sdev   svar   skew   curt
       4    2    1  -3.94
       6    3    1  -3.47
       6    4    5  -4.77   0.37   0.52   0.27   0.66  -1.40
       7    3    1  -1.95
       7    4    9  -4.52   0.57   0.74   0.54   0.30  -1.23
       7    5    1  -5.15
       8    3    1  -5.74
       8    4    1  -3.92
      10    6    1  -4.22
      10    7    8  -1.17   0.75   0.92   0.84  -0.24  -1.66
      11    7    8  -4.76   0.13   0.17   0.03  -0.68  -0.94
      11    8    1  -0.01
      12    8   19  -5.40   0.10   0.12   0.02  -0.02  -0.88
      12    9    1   0.47
      13    9   19  -5.28   0.14   0.19   0.03   1.02   0.76
      13   10   19  -1.21   0.26   0.32   0.10   0.64  -0.85
      14   10   19  -4.62   0.35   0.42   0.18   0.19  -1.00
      14   11   16  -0.74   0.56   0.63   0.40  -0.22  -1.68
      15   11   19  -3.13   0.51   0.68   0.47   0.36   0.37
      15   12   18  -1.51   0.59   0.90   0.81   0.95   1.31
      16   11   18  -4.37   0.74   0.88   0.77  -0.05  -1.34
      20   17   18   0.00   0.28   0.39   0.15  -0.78   0.28
      21   17   18  -4.29   0.27   0.32   0.11  -0.14  -1.44
      21   18   19  -1.04   0.55   0.64   0.42  -0.05  -1.44
      26   24    1  -4.21
      27   25    6  -4.34   1.36   1.60   2.55   0.56  -1.87
      29   26    1  -3.14
      30   27   12  -2.78   0.51   0.56   0.31  -0.30  -1.85
      34   31    3   0.14   0.42   0.62   0.38   0.06  -2.33
      36   32   18  -4.93   0.37   0.43   0.19   0.30  -1.36
      37   33   19  -5.09   0.28   0.33   0.11   0.72  -1.03
      37   34    1  -1.82
      38   34    3  -3.96   0.15   0.23   0.05   0.01  -2.33
      38   35   16   1.15   0.24   0.29   0.09  -0.40  -1.04
      39   35    6  -4.70   0.03   0.04   0.00   0.27  -1.62
      40   36    7  -3.86   0.35   0.45   0.20   0.56  -1.52
      41   37    3   0.02   2.35   3.05   9.33  -0.38  -2.33
      41   38    1  -1.64
      41   39    1  -2.98
      42   37    1  -2.39
      42   38    2  -3.23   1.60   2.26   5.09   0.00  -2.75
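    The statistics columns are not defined in this guide. The Python sketch below shows one conventional set of definitions that is consistent with the table: mean, mean absolute deviation (adev), sample standard deviation and variance with an n-1 denominator (sdev, svar), skewness (skew) and excess kurtosis (curt). Under these assumed definitions any two-member row gives skew 0.00 and curt -2.75, and any three-member row gives curt -2.33, exactly as seen above. The function name and the input energies are hypothetical.

```python
import math


def moments(values):
    """Mean, adev, sdev, svar, skew and curt under the assumed definitions."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    adev = sum(abs(d) for d in deviations) / n
    svar = sum(d * d for d in deviations) / (n - 1)  # sample variance
    sdev = math.sqrt(svar)
    if sdev == 0.0:
        raise ValueError("skew and curt are undefined for identical values")
    skew = sum((d / sdev) ** 3 for d in deviations) / n
    curt = sum((d / sdev) ** 4 for d in deviations) / n - 3.0
    return {"mean": mean, "adev": adev, "sdev": sdev,
            "svar": svar, "skew": skew, "curt": curt}


# Two hypothetical hydrogen bond energies for one donor-acceptor pair.
stats = moments([-4.83, -1.63])
print({name: round(value, 2) for name, value in stats.items()})
```

    Because skew and curt are fixed by the definitions for two or three occurrences, they carry information only for pairs that occur more often in the ensemble. Rows with a single occurrence list just the energy, since none of the spread statistics is defined for one value.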
    
    
    Side-chain analyses are similar, but the acceptor atoms are also shown. This allows the user to see if a particular acceptor atom is uniquely involved in a hydrogen bond in cases where this may be ambiguous (e.g. in aspartate residues, atoms OD1 and OD2). Remember to use the "correct" and "validate" commands beforehand to create correctly named atoms where the atom name depends on residue side-chain conformation (you may also need to use the "terminii X-PLOR" command if you are using X-PLOR format files).
    
      don  acc  atom no.  mean   adev   sdev   svar   skew   curt
       6    6  OG     1  -0.94
      18   17  OD1    3  -0.79   0.35   0.47   0.22   0.27  -2.33
      19   17  OD1   19  -4.38   1.59   1.77   3.13   0.36  -1.72
      20   17  OD1    5  -4.01   1.75   2.01   4.04  -0.30  -2.23
      23   34  OD1   17  -1.72   1.70   2.01   4.04  -0.28  -1.20
      23   34  OD2    4  -3.30   1.72   2.31   5.36   0.71  -1.72
      24   24  OG1   19  -2.58   0.50   0.63   0.39   0.79  -0.73
      24   34  OD2    7  -4.63   1.21   1.47   2.16   0.80  -1.45
      25   24  OG1   18  -5.27   0.05   0.06   0.00   0.71  -0.83
      26   34  OD2    2  -4.05   2.26   3.20  10.26   0.00  -2.75
      28   27  OD1    1  -1.16
      29   27  OD1   16  -6.41   0.57   0.82   0.67   1.66   1.24
      31   34  OD2   19  -5.73   1.26   1.61   2.60   1.22  -0.40
      36   36  OD1    2  -0.06   0.17   0.25   0.06   0.00  -2.75
    
    
    These commands require the NMR structure refinement module to be licensed.
    Prediction of NOEs from structure
    To predict the structurally relevant NOEs that one might expect to observe for a given three-dimensional structure (including multimeric proteins), and that would be expected to appear in the region of a NOESY spectrum with F1 0-12 ppm and F2 5-12 ppm, use the command:
    	predict noes lower upper
    
    where lower and upper represent bounds on inter-proton distances. Information on expected intra-residue, inter-residue and inter-chain NOEs is output. No chemical shift degeneracy of protons is assumed (unfortunately, not even for methyl groups at present).

    For example,

    	predict noes 1.8 7.0
    
    would report all relevant inter-proton distances between 1.8 and 7.0 Angstroms in a structure. Effectively then, NOEs between pairs of protons where one of the pair is either an amide proton (main-chain or side-chain) or a ring proton are predicted. Intra-residue, and medium and long range NOE predictions are detailed separately (see the example output below).

    NB if you wish to explicitly investigate inter-chain NOEs on multi-chain proteins, make sure that the chains have different chain identifiers (different segment identifiers are not sufficient) in the coordinate file.

    Example output (shortened for reasons of space) is given below.

    !Possible NOEs for residue _   7  VAL, forward in sequence
    !Intra residue NOEs
    INTRA_RES         atom  HN  res _   7  VAL - atom  HA  res _   7  VAL dist  3.0
    INTRA_RES         atom  HN  res _   7  VAL - atom  HB  res _   7  VAL dist  2.6
    INTRA_RES         atom  HN  res _   7  VAL - atom HG11 res _   7  VAL dist  4.7
    INTRA_RES         atom  HN  res _   7  VAL - atom HG12 res _   7  VAL dist  4.9
    INTRA_RES         atom  HN  res _   7  VAL - atom HG13 res _   7  VAL dist  4.4
    INTRA_RES         atom  HN  res _   7  VAL - atom HG21 res _   7  VAL dist  3.0
    INTRA_RES         atom  HN  res _   7  VAL - atom HG22 res _   7  VAL dist  4.3
    INTRA_RES         atom  HN  res _   7  VAL - atom HG23 res _   7  VAL dist  3.4
    !Inter residue NOEs
    INTER_RES_(i,i+1) atom  HN  res _   7  VAL - atom  HN  res _   8  ARG dist  2.5
    INTER_RES_(i,i+1) atom  HN  res _   7  VAL - atom  HA  res _   8  ARG dist  5.0
    INTER_RES_(i,i+1) atom  HN  res _   7  VAL - atom  HG2 res _   8  ARG dist  6.7
    INTER_RES_(i,i+1) atom  HA  res _   7  VAL - atom  HN  res _   8  ARG dist  3.6
    INTER_RES_(i,i+1) atom  HB  res _   7  VAL - atom  HN  res _   8  ARG dist  2.0
    INTER_RES_(i,i+2) atom  HN  res _   7  VAL - atom  HE1 res _   9  LYS dist  6.9
    INTER_RES_(i,i+2) atom  HN  res _   7  VAL - atom  HE2 res _   9  LYS dist  5.7
    INTER_RES_(i,i+2) atom HG12 res _   7  VAL - atom  HN  res _   9  LYS dist  6.4
    INTER_RES_(i,i+2) atom HG13 res _   7  VAL - atom  HN  res _   9  LYS dist  5.2
    INTER_RES_(i,i+3) atom  HB  res _   7  VAL - atom  HN  res _  10  TYR dist  5.3
    INTER_RES_(i,i+3) atom  HB  res _   7  VAL - atom  HD1 res _  10  TYR dist  6.9
    INTER_RES_(i,i+3) atom HG21 res _   7  VAL - atom  HN  res _  10  TYR dist  6.4
    INTER_RES_(i,i+3) atom HG23 res _   7  VAL - atom  HN  res _  10  TYR dist  5.9
    INTER_RES_(i,i+3) atom HG23 res _   7  VAL - atom  HD1 res _  10  TYR dist  6.9
    INTER_RES_(i,i+4) atom  HN  res _   7  VAL - atom  HN  res _  11  ALA dist  5.9
    INTER_RES_(i,i+4) atom  HA  res _   7  VAL - atom  HN  res _  11  ALA dist  4.8
    INTER_RES_(i,i+4) atom  HB  res _   7  VAL - atom  HN  res _  11  ALA dist  6.3
    INTER_RES_(long)  atom  HN  res _   7  VAL - atom HD11 res _  18  ILE dist  6.6
    INTER_RES_(long)  atom  HN  res _   7  VAL - atom HD12 res _  18  ILE dist  6.9
    INTER_RES_(long)  atom HG12 res _   7  VAL - atom  HN  res _  29  ARG dist  7.0
    INTER_RES_(long)  atom  HB  res _   7  VAL - atom  HN  res _  30  VAL dist  5.8
    INTER_RES_(long)  atom HG11 res _   7  VAL - atom  HN  res _  30  VAL dist  5.5
    INTER_RES_(long)  atom HG12 res _   7  VAL - atom  HN  res _  30  VAL dist  4.2
    INTER_RES_(long)  atom HG13 res _   7  VAL - atom  HN  res _  30  VAL dist  4.7
    INTER_RES_(long)  atom HG22 res _   7  VAL - atom  HN  res _  30  VAL dist  6.3
    
    NB This command is useful for sorting out ambiguous NOEs in spectra by analysing calculated structures. You should expect, however, that some peaks predicted from this simple analysis of structures may be missing from your spectra, and that some extra peaks may be present, due to other physical effects (e.g. spin diffusion).
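    When comparing predictions with a peak list it is often convenient to load the predicted NOEs into a script. The Python sketch below is an illustration only: it assumes the whitespace-separated layout of the example output above (including the "_" placeholder for a missing chain identifier), and the function name and the 5.0 Angstrom cut-off are hypothetical choices.

```python
from collections import Counter


def parse_noes(text):
    """Return one record per predicted NOE line; '!' comment lines are skipped."""
    records = []
    for line in text.splitlines():
        f = line.split()
        # Expected tokens: category atom NAME res CHAIN NUM RES - atom NAME res CHAIN NUM RES dist VALUE
        if len(f) != 16 or f[1] != "atom" or f[8] != "atom" or f[14] != "dist":
            continue
        records.append({
            "category": f[0],
            "first": (f[4], int(f[5]), f[6], f[2]),      # chain, number, residue, atom
            "second": (f[11], int(f[12]), f[13], f[9]),
            "distance": float(f[15]),
        })
    return records


sample = """\
!Intra residue NOEs
INTRA_RES         atom  HN  res _   7  VAL - atom  HA  res _   7  VAL dist  3.0
!Inter residue NOEs
INTER_RES_(i,i+1) atom  HN  res _   7  VAL - atom  HN  res _   8  ARG dist  2.5
INTER_RES_(i,i+1) atom  HN  res _   7  VAL - atom  HG2 res _   8  ARG dist  6.7
INTER_RES_(long)  atom HG12 res _   7  VAL - atom  HN  res _  30  VAL dist  4.2
"""

noes = parse_noes(sample)
close = [n for n in noes if n["distance"] <= 5.0]
print(Counter(n["category"] for n in close))
```

    Shorter predicted distances are the ones most likely to correspond to observable peaks, subject to the caveats in the note above.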
    Assembling and disassembling multiple model PDB files
    To make a single pdb file consisting of an ensemble of selected single structure pdb files, use the "pack structures" command. An example of a script is:
    	!instruct NAOMI to produce an ensemble file
    	pack structures
    
    	!use structures listed in the file "strucs.lis"
    	for pdb list strucs.lis
    	{
    
    	!make sure atoms are named correctly
    	validate                     
    
    	!only use residues 30 to 50 in chain A
    	zone none
    	select A30:A50
    
    	!write each structure to the ensemble file
    	pdb_write
    	}
    
    The ensemble pdb file is called "ensemble.pdb" and will be placed in the "report" directory that you specified at the beginning of your complete NAOMI input script.

    Selecting only regions of a family is useful for some aspects of structural analysis. For example, you might want to omit disordered termini when analysing an ensemble of structures using PROCHECK_NMR.
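
    An ensemble file can also be taken apart again outside NAOMI. The following is a minimal Python sketch, assuming that the ensemble file delimits each structure with the standard PDB MODEL and ENDMDL records. That layout is an assumption about "ensemble.pdb" and is not confirmed by this guide. The function name split_models is hypothetical and the coordinates in the sample are invented for illustration.

```python
def split_models(pdb_text):
    """Split PDB text into a list of models, each a list of coordinate lines.

    Assumes models are delimited by standard MODEL / ENDMDL records.
    Only ATOM, HETATM and TER records inside a model are kept.
    """
    models = []
    current = None
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # PDB record name occupies columns 1-6
        if record == "MODEL":
            current = []
        elif record == "ENDMDL":
            if current is not None:
                models.append(current)
            current = None
        elif current is not None and record in ("ATOM", "HETATM", "TER"):
            current.append(line)
    return models


# Invented two-model fragment, for illustration only.
sample = """\
MODEL        1
ATOM      1  N   VAL A  30      11.104   6.134  -6.504
ATOM      2  CA  VAL A  30      11.639   6.071  -5.147
ENDMDL
MODEL        2
ATOM      1  N   VAL A  30      10.987   6.201  -6.430
ENDMDL
"""

models = split_models(sample)
for number, lines in enumerate(models, start=1):
    print("model", number, "has", len(lines), "coordinate lines")
```

    Each entry of the returned list can then be written to its own file, for example with "\n".join(lines) followed by an END record.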


    Predicting the 3-D structure of protein folding intermediates
    Sorry, this section of the documentation is not currently available.
    Analysis of protein-protein complexes - favourable interactions

    Sorry, this section of the documentation is not currently available.
    Machine-parsable 3-D information

    In order to allow results of structure analyses performed by NAOMI to be simply interfaced to bioinformatics software, the command
    	table structure_info
    
    is provided. The output is designed to be machine parsable and includes information that can usefully be included in multiple sequence alignments in cases where the alignment contains sequence(s) of proteins with known three-dimensional structure.

    Information output includes the sequence of the protein for which coordinates were found, secondary structure, disulphide bridging information, key residue contact number (see elsewhere in the User Guide), and whether the side-chain is buried or exposed.

    Sample output is given below (the word "3D_info" flags the start of the information, and "end 3D_info" flags the end of the output).

    
    3D_info
    !Amino acid sequence of protein for which coordinates are available
    pdbsequence
    >1rnb.pdb
    QVINTFDGVADYLQTYHKLPNDYITKSEAQALGWVASKGNLADVAPGKSI
    GGDIFSNREGKLPGKSGRTWREADINYTSGFRNSDRILYSSDWLIYKTTD
    HYQTFTKIR
    // end sequence
    !Residue level information follows...
    !pdb chain_id is output if there is one, '-' if not
    !Syntax for sec_struc is:
    !sec_struc < helix |  strand | loop >
    !Syntax for disulphide is:
    !disulphide < absolute residue number >
    !Syntax for key_residue contact number is:
    !contact < contact number >
    !Syntax side-chain solvent accessibility:
    !access < buried | exposed >
    residue 1
            chain -
            type Q
            pdb residue id   2 
            sec_struc loop
            access exposed
    end residue
    
    residue 2
            chain -
            type V
            pdb residue id   3 
            sec_struc loop
            access exposed
    end residue
    ...
      (information for most residues not shown in the User Guide for
       reasons of space)
    ...
    
    residue 107
            chain -
            type K
            pdb residue id 108 
            sec_struc strand
            contact 2
            access exposed
    end residue
    
    residue 108
            chain -
            type I
            pdb residue id 109 
            sec_struc strand
            contact 6
            access buried
    end residue
    
    residue 109
            chain -
            type R
            pdb residue id 110 
            sec_struc strand
            contact 3
            access exposed
    end residue
    
    end 3D_info
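
    Output in this form can be read with a short script. The following is a minimal Python sketch, not part of NAOMI, written against the sample output above rather than against a full format specification. The function name parse_3d_info and the dictionary keys it returns are hypothetical. Records not present in the sample (for example "disulphide") are stored as plain strings under their own keyword.

```python
def parse_3d_info(text):
    """Parse the block between "3D_info" and "end 3D_info" into a dict."""
    name = None
    sequence_parts = []
    residues = []
    inside = False        # True between "3D_info" and "end 3D_info"
    in_sequence = False   # True between "pdbsequence" and "// end sequence"
    current = None        # residue record being filled in

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank lines and comment lines carry no data
        if line == "3D_info":
            inside = True
            continue
        if line == "end 3D_info":
            break
        if not inside:
            continue
        if line == "pdbsequence":
            in_sequence = True
            continue
        if in_sequence:
            if line.startswith("//"):
                in_sequence = False
            elif line.startswith(">"):
                name = line[1:]
            else:
                sequence_parts.append(line)
            continue
        if line == "end residue":
            if current is not None:
                residues.append(current)
            current = None
        elif line.startswith("residue "):
            current = {"index": int(line.split()[1])}
        elif current is not None:
            if line.startswith("pdb residue id"):
                # Kept as a string so that non-numeric ids are not lost.
                current["pdb_residue_id"] = line[len("pdb residue id"):].strip()
            else:
                key, _, value = line.partition(" ")
                value = value.strip()
                current[key] = int(value) if key == "contact" else value
    return {"name": name, "sequence": "".join(sequence_parts), "residues": residues}


# Abbreviated copy of the sample output above (two residues only).
sample = """\
3D_info
!Amino acid sequence of protein for which coordinates are available
pdbsequence
>1rnb.pdb
QVINTFDGVADYLQTYHKLPNDYITKSEAQALGWVASKGNLADVAPGKSI
GGDIFSNREGKLPGKSGRTWREADINYTSGFRNSDRILYSSDWLIYKTTD
HYQTFTKIR
// end sequence
residue 1
        chain -
        type Q
        pdb residue id   2
        sec_struc loop
        access exposed
end residue

residue 108
        chain -
        type I
        pdb residue id 109
        sec_struc strand
        contact 6
        access buried
end residue

end 3D_info
"""

info = parse_3d_info(sample)
buried = [r["index"] for r in info["residues"] if r.get("access") == "buried"]
print(info["name"], len(info["sequence"]), buried)
```

    The per-residue dictionaries can then be mapped onto the columns of a multiple sequence alignment, for instance to mark buried positions.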
    

    Author: Simon M. Brocklehurst