NAOMI USER GUIDE (Version 2.4)

The NAOMI 2.4 User Guide

by Simon M. Brocklehurst

Contents

Introduction
Protein structure input and output
Reading a pdb file into NAOMI

i) Allocating memory and directory setup
ii) Reading a pdb file

Working with many PDB files simultaneously (ensembles, selected sets of proteins, the entire databank etc)
Brookhaven directory structures and idcodes
Writing out a pdb file
Renaming atoms according to IUPAC nomenclature (where atom names depend on side-chain conformation)
General structural analysis

RESIDUE SELECTION - Selecting "active zones" of a protein
"calc" and "write" (deals with phi, psi, chi1 thru' chi5, plus several other residue-level features of a protein [including tertiary structural information])
"protwrite" (deals with information at the level of the whole protein e.g. number of residues)
Assessing bad contacts in a protein
HYDROGEN BONDS - identification and assessment of quality (energy)
SECONDARY STRUCTURE - Automatic identification and classification (strands, helices, turns)
SUPERSECONDARY STRUCTURE

Automatic identification/classification of beta-hairpin loops

COVALENT BONDS and PDB CONECT records
DISULPHIDE BONDS and PDB SSBOND records
HYDROPHOBIC INTERACTION analysis
Identification of KEY RESIDUES in a fold
SOLVENT ACCESSIBILITY calculations
INTERIOR/EXTERIOR residue estimation
SALT BRIDGES
CHAIN-BREAKS (finding gaps in a protein main-chain e.g. absence due to missing or weak electron density)
Interface to bioinformatics software

machine-parsable 3-D info for improvement of sequence alignments which include sequences of proteins with known structure
Symmetry Operations

Cyclic permutations
Analysis of protein-protein complexes

Detail favourable inter-protein hydrophobic, hydrogen bonding and electrostatic interactions
Detail unfavourable inter-protein steric interactions (e.g. due to difficulties in modelling structure using poor density).
Correcting/dealing with errors, omissions and problematic features in coordinate files

Building in "missing atoms" to a protein
Alterations/corrections to atom and residue numbering
Writing out a pdb file
Main-chain modelling

Manually setting main-chain dihedral angles to specific values
Setting main-chain dihedral angles to random values
Side-chain modelling

'Mutating' residues
Constructing a residue library (how to deal with unusual residues)
Manually setting dihedral angles to specific values
Setting side-chain dihedral angles to random values
Automatically moving a side-chain to minimize bad contacts
Disulphide bond manipulation
Modelling and simulation of dynamic properties of proteins

Generating side-chain ensembles - Type I
See also Predicting protein-protein interactions
Prediction of protein function from 3-D structure

Predicting protein-protein interactions
Protein Engineering/Design

Automatic design of new protein surfaces (solubilizing buried protein domains)
Improving solution properties of proteins
Automatic design of stabilizing mutants

unstrained disulphide bonds
unstrained X -> PRO mutations

Putative stabilities of designed/redesigned proteins
Prediction of protein folding pathways

Predicting the 3-D structure of protein folding intermediates
NMR structure refinement

Identification of main-chain and side-chain hydrogen bond partners from ensembles of structures
Prediction of NOEs from structure
Renaming atoms according to IUPAC nomenclature (where atom names depend on side-chain conformation)
Generating families with random structure, but good covalent geometry
Assembling and disassembling multiple model PDB files
see also disulphide modelling commands (also useful in X-ray refinement).

Advanced side-chain entropy methods

Making a side-chain entropy database

Tertiary structure prediction

Graphical output and interfaces to molecular graphics software

Graphical output

Miscellaneous

Executing UNIX shell commands from within a NAOMI input script

Known bugs

What is NAOMI?
NAOMI is a sophisticated computer program system for studying the three-dimensional structure of proteins at the atomic level.

General structural analysis
Dealing with problematic pdb files
Simulating dynamic properties of proteins
Prediction of protein function from 3-D structure
Protein Engineering and Design
NMR structure refinement
Tertiary structure prediction
Prediction of protein folding pathways

Why use NAOMI?
Many of the features offered in NAOMI are presently unique, but subsets of its features (particularly in the general structural analysis section) are offered in varying ways by other software. In developing this program, high on the list of priorities were: making the program easy to set up, learn and use (making the output intuitive to understand); incorporation of high quality (in terms of results and efficiency), novel algorithms. NAOMI is currently used by structural biology and chemistry laboratories throughout North America, Europe and Japan.

Getting up and running with NAOMI

Installation

NAOMI has a built-in license manager. You need to set an environment variable so that NAOMI can find the license file it requires in order to run. Put

setenv NAOMI_LM directory-name

into your .cshrc file so that it will be set each time you log on. The directory name should be the name of a directory on your system that contains a file called NAOMI_lm.lic.

NB You need a terminal "/" on the end of the directory name.

Typical contents of NAOMI_lm.lic might be:

Licenses for my computer
FEATURE 1 F6483F345229CDE
FEATURE 2 953B34483FF
FEATURE 3 38BC82F342E

Feature license keys are available from the author. Feature 1 is the main key to allow the program to work. Feature numbers higher than 1 "switch on" other features that are not available by default. For example, FEATURE 2 allows the Protein Engineering and Design module to work.

Files that you need

The basic requirement for running NAOMI is a file that contains the "script" specifying what you want to do with your protein structure. This must contain instructions giving an estimate of the number of residues in the protein. Typically, you might have a "standard" memory allocation of 500 residues (probably you will never need more than 7000 residues). Don't routinely use a residue limit much larger than you need - it wastes system resources. It's a good idea to make a file called "alloc_memory.inp" that you can include in all your NAOMI input files by using the @ symbol. So create a file called "alloc_memory.inp" which should contain something analogous to:

! Standard dynamic memory allocation include file for medium sized proteins
residue limit 500
residue library size 20 /usr/local/lib/naomi/lib/res_lib1.pdb

with the pathname for res_lib1.pdb set appropriately for your system (Check with your local NAOMI expert where the default residue library has been installed).

A basic input file is given below.

An example input script file to get NAOMI up and running (this is a simple example, you'd probably not actually want to ever do it for real)

Edit a file, perhaps you could call it "test.inp" (but it doesn't really matter). Put in the following lines to make a NAOMI script.

 @alloc_memory.inp

 set report dir ./reports/
 
 set pdb dir /nike/old_eve/smb18/pdb/
 
 read pdb 1lyz.pdb

 zone 1 5
 calc phi psi
 write rnum rname phi psi nl

This tells NAOMI to do the following things:

Include the contents of the file alloc_memory.inp into the script. (See above)

Set the directory where some output will be directed. NB THIS DIRECTORY MUST EXIST BEFORE YOU RUN THE PROGRAM (use the UNIX mkdir command)

Set the directory from which you wish to read the pdb file

Tell NAOMI to read the pdb file 1lyz.pdb (the coordinates of lysozyme)

zone 1 5 means work on residues 1 through 5 only

then calculate the main chain dihedral angles phi psi for these residues

Finally, for the active zone, for each residue write out the residue number, residue name, phi psi. The nl command says put a new line character here. The reason for this is explained in the documentation of the write command.

Running naomi, using the example file

The executable must be in a directory on your path.

Then typing

    naomi < file1  > file2

will read the input commands from file1 and place the output in file2. So type

         naomi < test.inp

The following output (or something similar) should appear on your screen:

|-----------------------------------------------------------|
|--------------------   N A O M I   ------------------------|
|                 Simon M. Brocklehurst
|***********************************************************|
|*                     Version 2.0                         *|
|***********************************************************|
|* It is a condition of use of this software that you cite *|
|* the reference(s) given below in any published work.     *|
|*                                                         *|
|*  (1) Simon M. Brocklehurst & Richard N. Perham (1993)   *|
|*      Protein Science 2, 626-639                         *|
|*                                                         *|
|***********************************************************|
|___________________________________________________________|
Including file alloc_memory.inp
|***********************************************************|
    This version was last updated on Aug  9 1994,10:22:05
   This output file was produced on Wed Aug 17 13:57:06 1994
|***********************************************************|

1lyz.pdb
  1  LYS   n/c   122.5  
  2  VAL  -98.7   126.9  
  3  PHE  -73.0   166.2  
  4  GLY -111.1   159.0  
  5  ARG  -56.4   -77.0

iv) If you have access to a WWW browser with FORMS, then you can use the NAOMI HTML program launcher.

Command Syntax

Many of the things you can do with NAOMI don't require a lot of complicated commands to be typed in. So in order to make things as clear as possible in these instructions, I've organised things in sections, like: Calculating hydrogen bonds.

Including another NAOMI script, within a script

Use the command

   
    @filename

to include a previously prepared script into the current script. Of course, you can have as many "scripts within scripts" as the memory of your computer will allow.

Reading a pdb file into NAOMI

This is done in two parts i) allocating memory, and directory setup, ii) reading in the file.

i) Allocating memory and Directory Setup

Allocating memory

Before you read in the file, you must allocate some memory for the protein. Although NAOMI allocates memory dynamically, it does need an upper limit of the number of residues in the protein. Usually imposing an upper limit of 2000 residues will suffice (rarely do you need more than 7000). If RAM is tight on your computer, you can set the limit to a much lower value. The command to use is:

	      residue limit integer

where integer would be say 2000.

NOTE this memory allocation should only be done once at the start of a script - if you read several pdb files in a script, NAOMI will deal with it. In other words, the residue limit command just says that in a given script, you will never be looking at a single pdb file with more than 2000 residues in it.

If you are doing any modelling, you should also allocate some memory for the amino-acid library which contains examples of standard amino acids. The command to use is

     residue library size integer pathname

     e.g.

     residue library size 20 ~/naomi/lib/res_lib.pdb

(see section constructing a residue library)

Directory setup

NAOMI requires access to two directories. The first is the so-called "report" directory. This is where some output will be directed e.g. some error messages that are not output directly to your specified output file. You must have the line

	set report dir directory

where directory must have the terminating "/" character e.g.

	set report dir /usr/people/smb/naomi/

or

	set report dir ./

to simply set the current directory as the reports directory.

The second directory is where the pdb file(s) you wish to work on are held. This might be a personal directory, or the root directory of your local Brookhaven distribution (see the idcodes command).

Use the command

	set pdb dir directory

to set this analogously to the set report dir command i.e. include the terminal "/".

ii) Reading a pdb file

After having performed the initial set up, use the command

	read pdb filename

to read in the coordinates for the structure (an approximation to Brookhaven or X-PLOR file formats is expected). NAOMI is reasonably intelligent about this; you will find that many files that other programs "choke" on can be read. You can make NAOMI write out a "proper" pdb file, using the "correct" and "validate" commands.

Repeating the same commands on several PDB files

For some kinds of analysis, you will want to be able to repeat a set of commands on several pdb files e.g. analysing ensembles of nmr structures, doing some analysis of the whole protein databank etc.

In NAOMI, a special kind of loop is available to let you do this. You need to create a file containing a list of the PDB files that you want to use. Then any commands inside the curly braces will be performed on all of the files listed in the file called filename (the files should be found in the directory specified with "set pdb dir").

	        for pdb list filename
	        {
	

	        }

The "for pdb list" command effectively replaces the "read pdb" command that is used for operations on single files.

Brookhaven directory structures and idcodes

The current release of the Brookhaven Protein Databank (1994) has a directory hierarchy to make it quicker to browse the databank. The command

                 idcodes on

means that you should use the Brookhaven idcode instead of a filename. This allows you to analyse the whole pdb easily. NAOMI will then look for the file in the appropriate place. You should set the directory where your Brookhaven files are with the "set pdb dir" command, e.g.

	        set pdb dir /brookhaven/distr/
	 
	        idcodes on

	        read pdb 6PTI

will read the file /brookhaven/distr/pt/pdb6pti.ent. The command "idcodes off" means that actual filenames should be used.
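The idcode lookup can be pictured with a short Python sketch. This is not NAOMI's code; it is a guess at the directory layout based only on the single 6PTI example above, and it assumes that the subdirectory is named after the middle two characters of the idcode and that files are called pdbXXXX.ent in lower case. The function name idcode_to_path is hypothetical.

```python
from pathlib import PurePosixPath


def idcode_to_path(pdb_dir: str, idcode: str) -> str:
    """Map a four-character idcode to a file path in the assumed layout."""
    code = idcode.strip().lower()
    if len(code) != 4:
        raise ValueError(f"expected a four-character idcode, got {idcode!r}")
    # Assumption: the subdirectory is the middle two characters of the idcode
    # (6pti -> pt), as suggested by the 6PTI example in the text.
    subdir = code[1:3]
    return str(PurePosixPath(pdb_dir) / subdir / f"pdb{code}.ent")


print(idcode_to_path("/brookhaven/distr/", "6PTI"))
```

If your local distribution is organised differently, only the line that builds subdir would need to change.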

Selecting "active residues" of a protein

Any residues in the protein may be selected for use in analysis and computation. By default, all the residues are selected. Two commands are provided to make selections. If you want only to use a single contiguous range of residues, use the command

	zone  res1 res2

to select a specified contiguous subset of residues in the protein for further work. This erases any other selection you have made earlier on in the script.

NB if the residue has a chain identifier, then this must be concatenated to the residue id e.g.

 	zone C154 C198

To (re)select the entire protein for further work, use

	zone all

If you want to build up a more complex residue selection, use the "select" command viz:

	select res1, res2, res3:res4, res5 etc

This allows both the selection of individual residues and ranges of residues (res3:res4 selects a range from res3 to res4). You must place ","s in between the items in the selection list. The select command does "not" erase previous selections. The following example shows how you could select residues 1, 5, 20, 31, 32, 33, 34 and 40

	zone none
	select 1, 5, 20
	select 31:34, 40

Note the use of "zone none" to initially deselect all residues.

Wildcards are also available with the select (and select2) commands e.g.

	zone none
	select A*

would select all residues in chain A.
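The way "zone none" followed by repeated "select" commands accumulates a selection can be sketched in Python as below. This is only an illustration of the documented behaviour, not NAOMI's parser: the function expand_selection is hypothetical, and it assumes plain integer residue numbers, so chain-prefixed ids such as C154 and wildcards such as A* are deliberately not handled.

```python
def expand_selection(spec: str) -> list[int]:
    """Expand a comma-separated list such as '31:34, 40' into residue numbers."""
    selected = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if ":" in item:
            # res3:res4 selects the inclusive range from res3 to res4
            start, end = (int(part) for part in item.split(":"))
            selected.extend(range(start, end + 1))
        else:
            selected.append(int(item))
    return selected


active = set()                                 # zone none
active.update(expand_selection("1, 5, 20"))    # select 1, 5, 20
active.update(expand_selection("31:34, 40"))   # select 31:34, 40
print(sorted(active))  # [1, 5, 20, 31, 32, 33, 34, 40]
```

Using a set mirrors the statement that select adds to, rather than replaces, the existing selection.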

"calc" and "write"

Simple calculations, relevant at the residue level, are performed by using the calc command which has a syntax

      calc param1 param2 param3... etc

      e.g.

      calc phi psi short

would calculate the dihedral angles phi and psi, and also a short hand nomenclature which characterises the conformation of a residue based on phi and psi. Calculations are performed for the selected zone of the protein (by default the whole molecule).

The write command allows the output of the results. The syntax is similar, except you need to indicate where new lines (nl) should be placed in the output e.g.

       write rnum rname phi psi nl

would produce a list of residue number, name and phi/psi values for the active zone of the protein (by default the whole input coordinate file).
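For readers who want to check phi and psi values by hand, the following Python sketch computes a dihedral angle from four atom positions using the standard vector formula. It is not taken from NAOMI and makes no claim about how NAOMI implements "calc phi psi"; the function name dihedral is hypothetical. By the usual definition, phi uses the atoms C(i-1), N(i), CA(i), C(i) and psi uses N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Return the dihedral angle p0-p1-p2-p3 in degrees, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond p1 -> p2
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# A cis arrangement gives 0 degrees, a trans arrangement gives 180 degrees
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Phi needs an atom from the preceding residue, which is why it cannot be computed for the first residue of a chain.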

calc:
	phi
	psi
	omega
	chi1
	chi2
	chi3
	chi4
	chi5

	short

	curvature   - 	how tightly coiled the polypeptide chain is
	 	 	at a given residue position...

	hydrophobic -	calculated details of neighbours making
			hydrophobic contacts in 3-D.
	error	    -	residue averages for B-factors (X-ray) and
			r.m.s.d. (NMR)
write: as for calc, but also

	rnum
	rname
	grid

"protwrite"

You may output information about the overall protein with the command

	protwrite

NB It is the user's responsibility to ensure that they have previously invoked any relevant calculations with particular "use" commands.

Possible parameters are shown below grouped into related sections:

General (requires no calculations)

	NumberOfResidues

Solvent accessibility (requires "use solvent access")

	TotalAccessAbsolute
	TotalAccessHydrophobicAbsolute
	TotalAccessHydrophobicPercent
	TotalAccessPureHydrophobicAbsolute
	TotalAccessPureHydrophobicPercent

So an example script might be:

	use solvent access
	protwrite NumberOfResidues TotalAccessAbsolute

The following output might be produced

	124 2345.345

which would indicate that the protein had 124 residues with a total solvent accessible surface area of 2345.345 Square Angstroms (Note that this figure is not accurate to 3 d.p., rather it is the generic floating point precision output by the protwrite command)

Program sub-sections within NAOMI

Some commands within NAOMI will be recognised only if you have entered a particular subsection of the program. For example some parts of database construction and simulation facilities are within a subsection called TYRA. The command to go into a sub-section is the name of the sub-section. To return to the main level, type end. That is,

     tyra
        command 1
        command 2 etc

     end

This documentation will always tell you when a specific command is part of the TYRA section of the program.

Making a side-chain entropy database

Sorry, this section of the documentation is not currently available

Constructing a residue library

Sorry, this section of the documentation is not currently available

Building in "missing atoms" to a protein

Often, a PDB file will have some atoms missing, for example if a side-chain is not visible in an electron density map, it will usually be modelled as an alanine residue. To build in missing atoms, use the command,

              repair side-chains

within the TYRA level of commands. This will keep the conformation of the old side-chain where atoms are available, and will set newly placed regions of the side-chain in an extended conformation, or in an appropriate conformation if a ring is involved.

If you prefer, you can place the missing atoms and energy refine the new side-chains (rather than keeping as much of the old side-chain as possible). Again within the TYRA level do,

              repair and refine side-chains

Identifying chain-breaks

3-D protein structures sometimes have incomplete main-chains. In the case of X-ray structures, this is often because electron density for particular residues is either too weak to interpret or completely absent.

It is convenient for some types of structural analysis to know where such "missing" parts of a structure are. The commands:

	use chain-breaks
	table chain-breaks

provide a list of main-chain breaks, in the form of a list of pairs of residues falling on either side of a break in a polypeptide chain. Example output for a protein with two breaks in the chain (in chain C between residues 56 and 62, and in chain C again between residues 73 and 78) is shown below:


NAOMI>OUTPUT List of residues at ends of breaks in main-chain of rec_C.pdb
NAOMI>OUTPUT C  56 HIS - C  62 GLY
NAOMI>OUTPUT C  73 THR - C  78 GLN

Note you may wish to pipe the output from these commands through egrep e.g.

	naomi < rec_C.inp | egrep -v "WARNING"

Note: expected breaks in multi-chain proteins are ignored, provided that chain identifiers are properly used in the PDB file.
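If the chain-break table is to be used by another program, it can be parsed with a few lines of Python. The sketch below is based solely on the two example output lines shown above. It assumes that every residue carries a one-character chain identifier and a three-letter residue name, and the names BREAK_LINE and parse_chain_breaks are hypothetical.

```python
import re

# Matches lines such as "NAOMI>OUTPUT C  56 HIS - C  62 GLY".
# Assumption: chain id, residue number and three-letter name on each side.
BREAK_LINE = re.compile(
    r"^NAOMI>OUTPUT\s+(\S)\s+(-?\d+)\s+(\w{3})\s+-\s+(\S)\s+(-?\d+)\s+(\w{3})\s*$"
)


def parse_chain_breaks(text):
    """Return a list of ((chain, resnum, resname), (chain, resnum, resname))."""
    breaks = []
    for line in text.splitlines():
        match = BREAK_LINE.match(line.strip())
        if match:
            c1, n1, r1, c2, n2, r2 = match.groups()
            breaks.append(((c1, int(n1), r1), (c2, int(n2), r2)))
    return breaks


example = """\
NAOMI>OUTPUT List of residues at ends of breaks in main-chain of rec_C.pdb
NAOMI>OUTPUT C  56 HIS - C  62 GLY
NAOMI>OUTPUT C  73 THR - C  78 GLN
"""

for before, after in parse_chain_breaks(example):
    # Number of residue positions missing between the two ends of the break
    print(before, after, after[1] - before[1] - 1)
```

The header line is skipped automatically because it does not fit the pattern.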

Cyclic permutations

Reconstruction of some symmetric protein structures from their supplied coordinates requires cyclic permutation of the coordinates. The command:

		cyclic permute

transforms a set of x,y,z coordinates to y,z,x. Thus, three applications of the command will restore the original coordinates i.e.

	first time:   x,y,z  ->  y,z,x
	second time:  y,z,x  ->  z,x,y
	third time:   z,x,y  ->  x,y,z

Use this command in conjunction with the pdb_write command to write out coordinates at each stage of the manipulation.
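The effect of the permutation on a coordinate list can be illustrated with a short Python sketch. It simply applies the x,y,z -> y,z,x mapping described above to a list of tuples and is not NAOMI's implementation; the function name cyclic_permute is hypothetical.

```python
def cyclic_permute(coords):
    """Map every (x, y, z) coordinate to (y, z, x)."""
    return [(y, z, x) for (x, y, z) in coords]


atoms = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]

once = cyclic_permute(atoms)
twice = cyclic_permute(once)
thrice = cyclic_permute(twice)

print(once)             # [(2.0, 3.0, 1.0), (5.0, 6.0, 4.0)]
print(twice)            # [(3.0, 1.0, 2.0), (6.0, 4.0, 5.0)]
print(thrice == atoms)  # True
```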

Alterations to atom and residue numbering

You can renumber residues in a PDB file to be consecutively numbered starting at a given number with the command

      reset_resnum resnum

e.g.

      reset_resnum 4

would number all the residues in the protein so that the first residue is 4, the second is 5 etc.
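The renumbering can be pictured as follows. This Python sketch works on a list of per-atom residue identifiers rather than on a real PDB file, and it assumes that residues are numbered in order of first appearance; the function name reset_resnum is borrowed from the command purely for illustration and says nothing about NAOMI's internals.

```python
def reset_resnum(atom_residue_ids, start):
    """Renumber residues consecutively from start, in order of first appearance.

    atom_residue_ids holds one hashable residue identifier per atom,
    for example (chain, old_number) tuples.
    """
    mapping = {}
    renumbered = []
    for old in atom_residue_ids:
        if old not in mapping:
            mapping[old] = start + len(mapping)
        renumbered.append(mapping[old])
    return renumbered


# Four atoms from residues 10, 10, 11 and 15 of chain A
atoms = [("A", 10), ("A", 10), ("A", 11), ("A", 15)]
print(reset_resnum(atoms, 4))  # [4, 4, 5, 6]
```

Note that gaps in the original numbering (11 to 15 here) are closed up, which is what consecutive numbering implies.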

Writing out a pdb file

The command

    pdb_write

writes out the current zone to the reports directory, with the SAME filename as the input pdb file. NB if the pdb directory, and the reports directory are the same - the original file will be overwritten.

At present this command writes out calculated amide proton data, for use with the "molscript hbonds" command.

Renaming atoms according to IUPAC nomenclature

The correct atom names for some side-chain atoms in PDB files depend on the conformation of the side-chain. For many computations (e.g. calculation of three-dimensional structure from NMR derived data) the atoms in a protein need to be named before the conformation of the residue is known.

Thus it is common to find errors in the naming of some atoms. To rename atoms correctly, use the command

		validate

which correctly renames atoms in the protein. If you wish to write out the corrected structure, use a script as shown in the example below. This particular example shows how to rename the atoms in an ensemble of structures, but you can do the same operation on a single file.

	correct
	for pdb list ensem.lis
	{
	validate
        }

You could equally well use the pdb_write command within the bracketed loop

	for pdb list ensem.lis
	{
	validate
	pdb_write
        }

The correct command is uniquely linked with the validate command. The pdb_write command is a more general command. If you are using X-PLOR format files, you should use the command "terminii X-PLOR" immediately before the validate command, as in the script:

	for pdb list ensem.lis
	{
	terminii X-PLOR
	validate
	pdb_write
	}

Predicting protein-protein interactions

Sorry, this section of the documentation is not currently available

Graphical output

Interfaces to the programs MOLSCRIPT (P. J. Kraulis), INSIGHT II (Biosym), and QUANTA (MSI) are provided to make scientific visualization, and preparation of figures for presentation and publication, time-efficient. Intrinsic graphical output, in PostScript format, is also produced for some commands.

MOLSCRIPT Interface

Use the command,

    molscript parameter

where parameter can be:

	contacts       - input file showing both intra- and inter-chain
	                 hydrophobic interactions schematically
	                 (do "calc hydrophobic" before this command)

	hbonds         - hydrogen bonds (needs calculated HN positions,
	                 thus use the pdb_write command; do "use hbonds" first)

	sec_struc      - produce input file for cartoon plot
	                 (do "use sec_struc" before using this command)

	sec_struc_col1 - produce input file for cartoon plot
	                 (do "use sec_struc" before using this command),
	                 rainbow coloured from N (violet) to C (red)
	                 termini and distance depth-cued

	sec_struc_col2 - input file for cartoon plot
	                 (do "use sec_struc" before using this command),
	                 rainbow coloured from N (violet) to C (red)
	                 termini and NO distance depth-cueing

In addition, the commands

	molscript on
	molscript off

switch on and off production of MOLSCRIPT input files when particular commands are executed e.g.

	predict possible binding sites

can produce an input file to visualize the results, by using the program MOLSCRIPT.

RASMOL Interface

You need to configure your account to use the RASMOL interface. Set the environment variable NAOMI_RASMOL_PATH to the pathname of the version of RASMOL you want to use with NAOMI (some sites have different versions of RASMOL compiled for different machines e.g. with 8-bit or 24-bit (32-bit) colour).

In your .cshrc file, put a line similar to the one below, substituting the "/usr/people..." with the pathname for RASMOL at your site (don't forget to source this file and rehash after doing this).

setenv NAOMI_RASMOL_PATH /usr/people/smb/smb-bin/bin/rasmol.24

At any point in your script after you read in a structure, you can automatically start up a RASMOL interface by using the command

	start rasmol

The backbone of the structure will be displayed, with each chain coloured differently. The residues that are currently selected within NAOMI will be coloured purple. For example, the following script will read a structure into NAOMI and start up the NAOMI-RASMOL interface, with the A chain coloured purple:

	read pdb struc.pdb
	zone none
	select A*
	start rasmol

If you have licenses for various FEATURE modules, you can also use the commands

	rasmol on
	rasmol off

to start up RASMOL with more complicated representations, as described at the relevant places in the documentation for these modules.

INSIGHT II Interface

The commands

	insight on
	insight off

switch on and off the production of BIOSYM COMMAND LANGUAGE (BCL) files. These files appear in the reports directory, and when read into INSIGHT II at the command line, set up new commands in the program which appear as options on pull-down menus.

QUANTA

Use the command

	quanta parameter

Intrinsic NAOMI graphical output

Executing UNIX shell commands within NAOMI

Use the system command:

    system string

where string is passed to the shell that naomi was started from.

Within a NAOMI script, if you wanted for example to compress a file called naomi.dat, you could have the command:

  system compress naomi.dat &

Known bugs

Non-exhaustive exception handling of the command language at present - if you make an error in the input script, NAOMI will sometimes ignore the command without warning you, and may crash as a result of an input error. This is a rare problem - especially if you don't make typos!
Non-exhaustive trapping of memory allocation problems. This will not be a problem unless you are working right at the limits of your machine's capacity (this is likely only to happen if you are working with very large systems i.e. several thousand residues on a machine with less than 64 MB Memory available).

Assessing bad contacts in a protein

Within the TYRA level of commands, use the command

         bad contacts

to assess the extent to which atoms are making bad contacts i.e. the extent to which atoms are too close to each other. It returns a number describing the whole protein.

Setting dihedral angles to specific values

Within the TYRA level of commands, use

           set torsion res tor angle 

           e.g. set torsion A5 chi1 -120.0

would set chi1 for residue 5 (chain A) to -120.0 degrees. tor can be phi, psi, omega, chi1, chi2, chi3, chi4 or chi5. Note if you try to set a degree of freedom that is not appropriate (e.g. a ring-opening torsion), the command will fail. NAOMI should warn you if this happens.

When main-chain modelling, all residues that are currently selected (with the zone or select commands) and on the C-terminal side of the dihedral angle that is being changed will move.

Setting dihedral angles to random values

Please note, the commands:

	random seed
	randomise

are available only if you have a license for the NMR structure refinement module.

In some cases, when calculating NMR structures from primary data, bias is introduced into the system by using the same starting structure for each calculated structure (especially in regions of structure that are poorly defined by restraints). A similar problem of bias has been noted in other applications where three-dimensional structures are calculated from distance restraints (e.g. comparative modelling, protein folding pathway modelling).

To minimize the effect of bias introduced into such systems, a facility for generating random families of structures, but having good covalent geometry is provided. Within the TYRA level of commands, use:

	randomise param family-name n

where


  param can be: main, side or both
  family-name the name of the family of structures to be generated
  n is number of members required for the family

For example,

	randomise main s_ 10

would generate 10 structures with names "s_1.pdb", "s_2.pdb", "s_3.pdb" etc., with the values of phi and psi set to random values for all currently selected residues (omega values are left unchanged from the structure that was read in; if you want to change these, they must be set manually using the set torsion command). That is, main specifies that main-chain dihedrals should be randomised, side specifies that "allowed" side-chain dihedrals should be randomised, and both randomises both main-chain and side-chain dihedral angles.

The family of structures is created in the "report" directory you specify in your input script.

To obtain reproducible results, and/or generate several different families of structures, you must specifically "seed" the random number generator. Within the TYRA level of commands, you may do this by using the command

	random seed integer

For example

	random seed 12345

NB obviously you should place the "random seed" command before using the "randomise" command in your input script

So, if you are using X-PLOR, you might use the following input script to generate a family of 40 random structures:

	read pdb template.pdb
	terminii X-PLOR
	tyra
	random seed 145625
	randomise both s_ 40
	end
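The role of the seed can be illustrated outside NAOMI with Python's standard random module. The sketch below draws phi and psi uniformly between -180 and 180 degrees, which is an assumption made for illustration only; the distribution NAOMI actually samples from is not documented here, and the function name random_main_chain is hypothetical.

```python
import random


def random_main_chain(n_residues, n_members, seed):
    """Return n_members lists of random (phi, psi) pairs, one pair per residue."""
    rng = random.Random(seed)  # a private generator seeded explicitly
    family = []
    for _ in range(n_members):
        member = [
            (rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0))
            for _ in range(n_residues)
        ]
        family.append(member)
    return family


first = random_main_chain(n_residues=5, n_members=3, seed=145625)
again = random_main_chain(n_residues=5, n_members=3, seed=145625)
other = random_main_chain(n_residues=5, n_members=3, seed=12345)

print(first == again)  # True: the same seed reproduces the same family
print(first == other)  # False: a different seed gives a different family
```

The same reasoning explains why the seed must be set before the family is generated, and why a new seed is needed for each additional family.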

'Mutating' residues

Within the TYRA level of commands, use

           mutate res code

where code is either a 1 or 3 letter amino acid code. NB upper or lower case is acceptable for the code.

    e.g. mutate 7 e
      or mutate 7 glu

would make residue 7 a glutamate residue. Conformations are from the library. You can change the conformation of the new side-chains by using the "set torsion" command, or the "remove bad contacts" command.

NB if the residue has a chain identifier, then this must be concatenated to the residue id e.g.

 	mutate A56 TYR

Automatically moving a side-chain to minimize bad contacts

Within the TYRA level of commands, use the command

	minimize bad contacts res

where res should be a concatenation of chain_id and residue id e.g.

	minimize bad contacts A26

would move the side-chain of residue 26 in chain A to a position making the minimum number of bad contacts with the rest of the protein. This is a useful command to use after using the 'mutate' command.

Disulphide bond manipulation

Manipulating disulphide bonds by interactive graphics is awkward, and sometimes disulphide bonds are modelled into inappropriately strained conformations in both X-ray and NMR protein structures. In some cases the strained conformations arise because of errors in the potential functions of structure refinement programs. You may wish to see if the conformation can be improved - either to obtain a better fit with experimental data, or if experimental data is poor or not available, to obtain a less strained conformation.

If you have a license for the PROTEIN ENGINEERING/DEISGN module of NAOMI, you can use the disulphide engineering commands to investigate possible alternative models for disulphide bonds (an exhaustive conformational search of energetically favourable disulphide bond conformations is performed).

Generating side-chain ensembles - Type I
It is often useful to get a picture of how "constrained" the side-chains in a protein are by their surroundings. Within the TYRA level of commands,

	make side-chain ensemble-1 string

where string is a list of one-letter codes

	e.g.  IAFYW

will generate an ensemble of 35 pdb files based on the input structure. The ensemble will have the selected side-chains randomly moved such that they make no bad contacts with each other. This is useful when one wants to make no prior assumptions about the behaviour of amino acid side-chains according to their tertiary environments.

Automatic design of stabilizing mutants
Sorry, this section of the documentation is not currently available

Engineering unstrained disulphide bonds

Sorry, this section of the documentation is not currently available

(ii) Within the TYRA level of commands, use the command

Sorry, this section of the documentation is not currently available

Automatic identification and classification of secondary structure

NAOMI uses a fuzzy logic algorithm to recognize secondary structural motifs in proteins. Decisions are made as to whether possible segments are, or are not, complete secondary structural elements. This is different from the approach used by some other programs, which identify repeating patterns (for example of chain conformation, or of hydrogen bonds).

The command

	use sec_struc

explicitly tells NAOMI that secondary structural information will be required later on in the script. It will automatically invoke calculations of other properties, e.g. hydrogen bonds, if these have not already been calculated elsewhere in the script. Use this command, followed by

	table sec_struc

to provide a list of secondary structural elements in the protein. The output takes the form of an overall summary, followed by details of residue numbers involved in helices, strands (forming part of sheets), and beta turns. Example output is shown below.


..bbbbbbbb.bbbbbbbbbbaaaaaaaaaaaaaaaa....bbbbbb...
bbbbbb

Beta strands
     3  -  10
     12  -  21
     42  -  47
     51  -  56
Helices 310, regular, pi
     22  -  37
Beta-turns
     47  -  50  Type IV AA

A novel algorithm, making use of hydrogen bonding information and polypeptide chain conformation parameters, is used to recognize the secondary structural motifs.
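The summary string at the top of the output can be turned back into residue ranges with a few lines of code. The Python sketch below is an illustration only. It assumes that the wrapped lines of the summary have been joined into one string, that "b" marks strand residues, "a" helix residues and "." everything else (as the example output suggests), and that residues are numbered consecutively from first_residue with no chain-breaks. The function name is hypothetical.

```python
from itertools import groupby


def segments(summary, first_residue=1):
    """Return (symbol, start, end) for every run of a symbol other than '.'."""
    found = []
    position = first_residue
    for symbol, run in groupby(summary):
        length = len(list(run))
        if symbol != ".":
            found.append((symbol, position, position + length - 1))
        position += length
    return found
```

Applied to a string laid out like the example, this gives strand runs such as ("b", 3, 10) and a helix run such as ("a", 22, 37), which can be checked against the tabulated ranges.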

Hydrogen bonds

To obtain a list of hydrogen bonds, along with calculated energies (from a model using explicitly calculated lone-pair positions, and quantifying both electrostatic effects and quality of orbital overlap), use the command:

	use hbonds

to tell NAOMI to calculate all information about hydrogen bonds in the protein.

Then use any combination of the following commands to output information about hydrogen bonds in the protein.

	table hbonds_da
	table hbonds_ad
	table side-chain_hbonds

The "da" in the first command stands for donor-acceptor listing, so a list of all main-chain donors is output, along with the partnering main-chain and side-chain acceptors. The calculated energies are useful in deciding which is the major contributor in bifurcated hydrogen bonds, and also in analysing secondary structure in detail, e.g. under- or over-winding of helices, or missing hydrogen bonds due to helix bends etc. The command "molscript hbonds" may also be used to produce a graphical representation (see the examples section on the NAOMI Web Site).

Example output is shown below:


           Table of hydrogen bonds: Donor to acceptors

      for  1rnb.pdb  resolution  0.00 Angstroms

(NB remember to validate the structure with the VALIDATE and CORRECT
options before calculating H-bonds for the best results

 |Donors |---------------------  Acceptors -------------------------|
 | Main  |      Main                  |         Side chain          |
 | Chain |      Chain                 |                             |
    86 D       #              #            #           #           #
    87 R  99 T  -8.41         #            #           #           #
    88 I       #              #            #           #           #
    89 L  97 Y -10.23         #            #           #           #
    90 Y       #              #            #           #           #
    91 S  95 L  -6.39         #            #           #           #
    92 S       #              #            #           #           #
    93 D       #              #         93 D OD1 -5.08 #           #
    94 W  91 S  -5.10         #            #           #           #
    95 L       #              #         91 S OG  -7.76 #           #
    96 I       #              #            #           #           #

The "table hbonds_ad" command lists the hydrogen bonds analogously, but as acceptor to donor.

The "table side-chain_hbonds" command lists possible side-chain - side-chain hydrogen bonds in the protein. Example output is shown below:


      Table of side-chain - side-chain hydrogen bonds
      for  rec_B.pdb  resolution  0.00 Angstroms

Format is donor - acceptor, with chain, residue number, residue and atom
given for both.  NB At present the Energy is actually the donor-acceptor
distance.

B   34  K NZ  - B   51  T OG1 (E =   3.19)
B   34  K NZ  - B   53  E OE2 (E =   2.98)
B   39  R NH1 - B  132  D OD1 (E =   3.62)
B   39  R NH2 - B  132  D OD1 (E =   2.93)
B   40  S OG  - B   42  E OE1 (E =   3.73)
B   45  T OG1 - B   42  E OE1 (E =   3.32)
B   45  T OG1 - B   42  E OE2 (E =   2.91)
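If you save this listing to a file, it is straightforward to filter it. The Python sketch below is a minimal illustration, assuming the line layout shown in the example above (including a one-character chain identifier for both partners). The function names and dictionary keys are hypothetical, and the value in brackets is treated as a donor-acceptor distance in Angstroms, as the output header states.

```python
import re

# donor chain, number, residue, atom - acceptor chain, number, residue, atom (E = value)
HBOND = re.compile(
    r"^(\S)\s+(\d+)\s+([A-Z])\s+(\S+)\s+-\s+"
    r"(\S)\s+(\d+)\s+([A-Z])\s+(\S+)\s+\(E =\s*(-?\d+\.\d+)\)"
)


def parse_side_chain_hbonds(text):
    """Return one dict per line that matches the example layout."""
    bonds = []
    for line in text.splitlines():
        match = HBOND.match(line.strip())
        if match is None:
            continue  # headers, blank lines, anything in another layout
        dc, dn, dr, da, ac, an, ar, aa, value = match.groups()
        bonds.append({
            "donor": (dc, int(dn), dr, da),
            "acceptor": (ac, int(an), ar, aa),
            "distance": float(value),
        })
    return bonds


def short_contacts(bonds, max_distance=3.2):
    """Keep only pairs whose donor-acceptor distance is within max_distance."""
    return [b for b in bonds if b["distance"] <= max_distance]
```

The cut-off of 3.2 Angstroms is an arbitrary value for the sketch; choose whatever is appropriate for your analysis.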

Solvent accessibility calculations

Relevant command summary:

	use solvent access
	table residue_access
	table total_access
	zone 
	select

(The following commands require a license for the protein function module)
	zone2
	select2

Commands are provided for calculation of the solvent accessible surface area of atoms and residues in a protein, using a fast numerical integration algorithm. The solvent accessible surface is taken as that defined by Lee and Richards, i.e. the locus of the centre of a probe sphere (representing a water molecule) rolled over the entire van der Waals surface of the protein.

First, use the command:

	use solvent access

to tell NAOMI that solvent accessibility calculations are required in this script. Remember to only do "use" commands after you have made your residue selection e.g.

	zone 10 15
	use solvent access

This command invokes calculation of both absolute (in units of square Angstroms) and percentage (100% accessibility corresponding to the accessibility of residue X in the three-residue peptide G-X-G, where G, X and G are in extended main-chain and/or side-chain conformations) accessibilities for all selected atoms and residues. The calculations are rapid compared with many solvent accessibility algorithms - accessibilities for 1000 atoms can generally be calculated in approximately 10 seconds on a MIPS R4600SC Workstation.

The following commands then allow output of the results of the calculations:

	table residue_access param1 param2

param1 controls whether main-chain, side-chain, both main-chain and side-chain, total, or side-chain carbon residue accessibilities are output. It can take the values:

	main
	side
	both
	total
	carbon

param2 controls the units of the output, i.e. whether absolute accessibilities (in square Angstroms) or percentage accessibilities are output. It can take the values:

	absolute
	percent
	both

For example, the script:

	use solvent access
	table residue_access both both

might produce the following output:


NAOMI>Calculating solvent accessibile surface areas...
NAOMI>OUTPUT Residue solvent accessibilities:
NAOMI>OUTPUT main-chain, side-chain (in square Angstroms and percentage) 
NAOMI>OUTPUT
NAOMI>OUTPUT     1  K    25 A^2     71 % ,     89 A^2     45 % 
NAOMI>OUTPUT     2  V    17 A^2     50 % ,     71 A^2     54 % 
NAOMI>OUTPUT     3  F     4 A^2     12 % ,      8 A^2      4 % 
NAOMI>OUTPUT     4  G    28 A^2     34 % ,      0 A^2      0 % 
NAOMI>OUTPUT     5  R     1 A^2      4 % ,     72 A^2     31 % 
NAOMI>OUTPUT     6  C     2 A^2      6 % ,     40 A^2     38 % 
NAOMI>OUTPUT     7  E     5 A^2     15 % ,     60 A^2     41 % 
NAOMI>OUTPUT     8  L     0 A^2      0 % ,      0 A^2      0 % 
NAOMI>OUTPUT     9  A     0 A^2      0 % ,      0 A^2      0 % 
NAOMI>OUTPUT    10  A     2 A^2      4 % ,     41 A^2     55 % 
NAOMI>OUTPUT    11  A     6 A^2     16 % ,     16 A^2     21 % 
NAOMI>OUTPUT    12  M     0 A^2      0 % ,      0 A^2      0 % 
NAOMI>OUTPUT    13  K    11 A^2     31 % ,     68 A^2     35 % 
NAOMI>OUTPUT    14  R    27 A^2     78 % ,    154 A^2     67 % 
NAOMI>OUTPUT    15  H     9 A^2     25 % ,     23 A^2     16 % 
NAOMI>OUTPUT    16  G    40 A^2     48 % ,      0 A^2      0 % 
NAOMI>OUTPUT    17  L     0 A^2      0 % ,      0 A^2      0 % 
NAOMI>OUTPUT    18  D     7 A^2     18 % ,     31 A^2     27 % 
NAOMI>OUTPUT    19  N     7 A^2     19 % ,     89 A^2     73 % 
NAOMI>OUTPUT    20  Y     5 A^2     13 % ,     62 A^2     33 % 
NAOMI>OUTPUT    21  R    28 A^2     80 % ,    112 A^2     49 % 
NAOMI>OUTPUT    22  G    69 A^2     84 % ,      0 A^2      0 % 
NAOMI>OUTPUT    23  Y     0 A^2      0 % ,     42 A^2     22 % 
NAOMI>OUTPUT    24  S     1 A^2      2 % ,     31 A^2     33 %

NB glycine residue side-chains take 0 values for all accessibilities (because glycine residues don't have side-chains!)
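The residue-level listing is regular enough to be read by a short program, for example to pick out residues with buried side-chains. The Python sketch below assumes the "both both" layout shown above, with residue numbers that carry no chain identifier. The function names, dictionary keys and the 5% threshold are hypothetical choices. Glycine is skipped because its side-chain values are always 0.

```python
import re

# number, one-letter code, main-chain A^2 and %, side-chain A^2 and %
ROW = re.compile(
    r"^NAOMI>OUTPUT\s+(\d+)\s+([A-Z])\s+"
    r"(\d+)\s+A\^2\s+(\d+)\s+%\s*,\s*"
    r"(\d+)\s+A\^2\s+(\d+)\s+%"
)


def parse_residue_access(text):
    """Return one dict per residue line; header lines are ignored."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue
        number, residue, main_abs, main_pct, side_abs, side_pct = match.groups()
        rows.append({
            "number": int(number),
            "residue": residue,
            "main_absolute": int(main_abs),
            "main_percent": int(main_pct),
            "side_absolute": int(side_abs),
            "side_percent": int(side_pct),
        })
    return rows


def buried_side_chains(rows, max_percent=5):
    """Residue numbers whose side-chain percentage is at or below max_percent."""
    return [
        row["number"]
        for row in rows
        if row["residue"] != "G" and row["side_percent"] <= max_percent
    ]
```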

NB residues near termini and chain-breaks may apparently have greater than 100% accessibilities because 100% is calculated within a 3-residue segment.

The command

	table total_access

outputs information on the total solvent accessible surface of the protein. Example output is:


NAOMI>OUTPUT Total Solvent Accessible Surface of Protein
NAOMI>OUTPUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NAOMI>OUTPUT Total               6242 A^2
NAOMI>OUTPUT    Main-chain       1393 A^2     22 %
NAOMI>OUTPUT    Side-chain       4849 A^2     78 %
NAOMI>OUTPUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NAOMI>OUTPUT Total hydrophobic   3318 A^2     53 %
NAOMI>OUTPUT    Main-chain        677 A^2     20 %
NAOMI>OUTPUT    Side-chain       2640 A^2     80 %
NAOMI>OUTPUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NAOMI>OUTPUT Total hydrophilic   2924 A^2     47 %
NAOMI>OUTPUT    Main-chain        715 A^2     24 %
NAOMI>OUTPUT    Side-chain       2209 A^2     76 %
NAOMI>OUTPUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

NB The percentages here are not relative solvent accessibilities as they are in the residue-level output. Rather, they are percentages of totals, e.g. the Total hydrophobic percentage of 53 in the table above is relative to the Total surface accessibility of 6242 A^2, and the hydrophobic Main-chain percentage of 20 is relative to the Total hydrophobic area of 3318 A^2.
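The arithmetic behind these percentages can be checked directly from the areas in the example table. The short Python sketch below is a worked illustration using those numbers. The dictionary keys and function names are hypothetical, and rounding to the nearest whole number is an assumption that happens to reproduce the values shown.

```python
# Areas in square Angstroms, copied from the example table above
AREAS = {
    "total": 6242, "main": 1393, "side": 4849,
    "hydrophobic": 3318, "hydrophobic_main": 677, "hydrophobic_side": 2640,
    "hydrophilic": 2924, "hydrophilic_main": 715, "hydrophilic_side": 2209,
}


def percent_of(part, whole):
    """Percentage of a total, rounded to the nearest whole number."""
    return round(100.0 * part / whole)


def table_percentages(areas):
    """Each class is relative to the total; each split is relative to its class."""
    result = {}
    for name in ("main", "side", "hydrophobic", "hydrophilic"):
        result[name] = percent_of(areas[name], areas["total"])
    for group in ("hydrophobic", "hydrophilic"):
        for part in ("main", "side"):
            key = f"{group}_{part}"
            result[key] = percent_of(areas[key], areas[group])
    return result
```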

More complex solvent accessibility options
You may wish to calculate the solvent accessibility of a "molecule" with some parts effectively "missing". For example, suppose you had a protein system with two chains, A and B. You may wish to calculate accessibilities in chain A in the presence and absence of chain B. To do this, a second level of residue selection is provided (but only in the protein function module). Some examples should make things clear:

	zone none
	select A10:A30,A40
	zone2 none
	select2 A1:A100,B1:B100
	use solvent access

This script will calculate accessibilities for residues A10 through A30, and A40, in the presence of all the atoms in residues A1:A100 and B1:B100. If "select2 A1:A100" had been used instead, the calculations would proceed as though the atoms in chain B were not present.

Interior/Exterior Residue estimation

Visual inspection of globular protein folds shows that some residues may be regarded as being interior to the fold whilst some are on the protein surface.

Use the commands

	table exterior_residues
	table interior_residues

to provide automatic classifications of these. Example output is shown below:


NAOMI>OUTPUT_____________________________________
NAOMI>OUTPUT Exterior residues...
NAOMI>OUTPUT
NAOMI>OUTPUT      1  V
NAOMI>OUTPUT      2  I
NAOMI>OUTPUT      4  M
NAOMI>OUTPUT      5  P
NAOMI>OUTPUT      6  S
NAOMI>OUTPUT      8  R
...
output deleted for reasons of space
...
NAOMI>OUTPUT_____________________________________
NAOMI>OUTPUT_____________________________________
NAOMI>OUTPUT Interior residues...
NAOMI>OUTPUT
NAOMI>OUTPUT      3  A
NAOMI>OUTPUT      7  V
NAOMI>OUTPUT     10  Y
NAOMI>OUTPUT     11  A
NAOMI>OUTPUT     16  V

Salt bridges

Use the command

	table salt-bridges

to output a list of possible salt-bridges in a protein. In addition to residue information (chain, number and type), the closest approach of atoms in the side-chains [r(min) in Angstroms] and side-chain - side-chain hydrogen bonding information are indicated [(HB) indicates a hydrogen bond, (__) indicates no hydrogen bond].

Example output is shown below


A  38 LYS and A  33 GLU :  r(min) =   6.7 (__)
A  38 LYS and B 127 GLU :  r(min) =   7.9 (__)
A  41 LYS and A  32 GLU :  r(min) =   3.2 (HB)
A  41 LYS and B 127 GLU :  r(min) =   3.1 (HB)
A  64 ARG and A  65 GLU :  r(min) =   6.5 (__)
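A listing like this is easy to filter, for example to keep only the hydrogen-bonded salt-bridges, or only those that span two chains. The Python sketch below assumes the line layout of the example above, with a one-character chain identifier on both residues. The function names and dictionary keys are hypothetical.

```python
import re

# chain, number, residue "and" chain, number, residue : r(min) = value (HB or __)
BRIDGE = re.compile(
    r"^(\S)\s+(\d+)\s+([A-Z]{3})\s+and\s+(\S)\s+(\d+)\s+([A-Z]{3})\s+:"
    r"\s+r\(min\)\s+=\s+(\d+\.\d+)\s+\((HB|__)\)"
)


def parse_salt_bridges(text):
    """Return one dict per salt-bridge line that matches the example layout."""
    bridges = []
    for line in text.splitlines():
        match = BRIDGE.match(line.strip())
        if match is None:
            continue
        c1, n1, r1, c2, n2, r2, r_min, flag = match.groups()
        bridges.append({
            "first": (c1, int(n1), r1),
            "second": (c2, int(n2), r2),
            "r_min": float(r_min),
            "hydrogen_bonded": flag == "HB",
        })
    return bridges


def inter_chain(bridges):
    """Salt-bridges whose two residues have different chain identifiers."""
    return [b for b in bridges if b["first"][0] != b["second"][0]]
```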

Supersecondary structure:

Automatic identification and classification of beta hairpin loops

To identify and classify beta hairpins in a structure according to the nomenclature of Wilmot and Thornton, use the commands

	use hairpins
	table hairpins

Example output is shown below for the homodimeric protein glutathione reductase (the calculation for this c. 1000 residue protein took less than 1 minute of cpu time on an R4000SC INDIGO2):


A 120  - A 121  2:2  FVD AK TLE  Wide left bulge
A 126  - A 127  2:2  LEV NG ETI  Regular
B 120  - B 121  2:2  FVD AK TLE  Wide left bulge
B 126  - B 127  2:2  LEV NG ETI  Regular
A 237  - A 241  3:5  VVK NTDGS LTL  G1 bulge
A 246  - A 250  3:5  TLE LEDGR SET  G1 bulge
B 237  - B 241  3:5  VVK NTDGS LTL  G1 bulge
B 246  - B 250  3:5  TLE LEDGR SET  G1 bulge
A 344  - A 348  5:5  TVV FSHPP IGT  Regular
B 344  - B 348  5:5  TVV FSHPP IGT  Regular
A 377  - A 387  9:11  SFT AMYTAVTTHRQ PCR  Regular
B 377  - B 387  9:11  SFT AMYTAVTTHRQ PCR  Regular

The residue numbers and sequence of the loops are output, along with the Sibanda and Thornton classification (3:5 etc). The sequence of the three residues flanking the loop on each side, and the details of the secondary structure of these "flanking" residues (regular, bulge etc), are also provided. The output is ordered by increasing loop size.
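The hairpin listing can be read into a program for further sorting or counting. The Python sketch below assumes the layout of the example output above (without the extra conformation lines) and one-character chain identifiers. The function names and dictionary keys are hypothetical. The check that the loop sequence has as many letters as there are residues between the two residue numbers holds for the examples shown, and is used here only as a sanity test.

```python
import re

# chain start - chain end  X:Y  flank loop flank  description
HAIRPIN = re.compile(
    r"^(\S)\s+(\d+)\s+-\s+(\S)\s+(\d+)\s+(\d+):(\d+)\s+"
    r"([A-Z]{3})\s+([A-Z]+)\s+([A-Z]{3})\s+(.+?)\s*$"
)


def parse_hairpins(text):
    """Return one dict per hairpin line; other lines are ignored."""
    hairpins = []
    for line in text.splitlines():
        match = HAIRPIN.match(line.strip())
        if match is None:
            continue
        c1, start, c2, end, x, y, before, loop, after, kind = match.groups()
        hairpins.append({
            "chain": c1,
            "start": int(start),
            "end": int(end),
            "class": f"{x}:{y}",
            "flank_before": before,
            "loop": loop,
            "flank_after": after,
            "description": kind,
        })
    return hairpins


def loop_length_consistent(hairpin):
    """True when the loop sequence spans exactly start..end."""
    return len(hairpin["loop"]) == hairpin["end"] - hairpin["start"] + 1
```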

If you want to have the conformation of each residue in the loop output as well (A = alpha, B = beta etc), then do:

	calc phi psi short
	use hairpins
	table hairpins

which gives the following output. The conformation of each residue is shown under the amino acid sequence.


A 120  - A 121  2:2  FVD AK TLE  Wide left bulge
                     PAB AA BBB
A 126  - A 127  2:2  LEV NG ETI  Regular
                     BBB L+ PBB
B 120  - B 121  2:2  FVD AK TLE  Wide left bulge
                     PAB AA BBB
B 126  - B 127  2:2  LEV NG ETI  Regular
                     BBB LG PBB
A 237  - A 241  3:5  VVK NTDGS LTL  G1 bulge
                     BBP BAAGP BBP
A 246  - A 250  3:5  TLE LEDGR SET  G1 bulge
                     BPB BAAGP BBB
B 237  - B 241  3:5  VVK NTDGS LTL  G1 bulge
                     BBP PAAGP BBP
B 246  - B 250  3:5  TLE LEDGR SET  G1 bulge
                     BPB BAAGP BBB
A 397  - A 400  4:4  VCV GSEE KIV  Narrow right bulge
                     BBB EAAL PPA
B 397  - B 400  4:4  VCV GSEE KIV  Narrow right bulge
                     BBB EAAL PPA
A 344  - A 348  5:5  TVV FSHPP IGT  Regular
                     BBB APBPP BBB
B 344  - B 348  5:5  TVV FSHPP IGT  Regular
                     BBB BPBPP BBB

Thanks to Y. J. K. Edwards for granting permission to incorporate a modified version of the TURNPIN beta hairpin recognition algorithm into NAOMI.

Hydrophobic Interaction Analysis

Identifying close attractive van der Waals interactions between pairs of non-polar groups is a useful way of identifying the roles that particular residues play in the hydrophobic cores of proteins. Frequently, this type of analysis is more information-rich than calculation of solvent accessible surface areas because details of intramolecular interactions are revealed (for example, if you wanted to know how a helix was interacting with a beta sheet). See the examples section on the NAOMI Web Site.

Use the commands

	calc hydrophobic
	table hydrophobic

to obtain such an analysis. Example output is shown below:


Residues whose side chains make hydrophobic contacts
 10    T -  11    P
 11    P -  10    T
 13    V -  32    K
 13    V -  34    V
 14    T -  16    Y
 14    T -  63    K
 15    T -  30    T
 16    Y -  14    T
 16    Y -  33    A
 16    Y -  36    A
 16    Y -  39    A
 16    Y -  43    F
 16    Y -  58    Y
 16    Y -  63    K
 16    Y -  65    F
 17    K -  30    T
 17    K -  64    T
 18    L -  20    I

Each of these pairs of residues contains methyl, methylene or methine groups that are interacting with each other.
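Because a contact can be listed from both sides (for example 10 T - 11 P and 11 P - 10 T), it is convenient to collect the partners of each residue into a set before counting. The Python sketch below assumes the pair layout of the example above, without chain identifiers. The function names are hypothetical.

```python
import re
from collections import defaultdict

# number, one-letter code - number, one-letter code
PAIR = re.compile(r"^\s*(\d+)\s+([A-Z])\s+-\s+(\d+)\s+([A-Z])\s*$")


def contact_partners(text):
    """Map each (number, code) residue to the set of residues it contacts."""
    partners = defaultdict(set)
    for line in text.splitlines():
        match = PAIR.match(line)
        if match is None:
            continue  # the heading line and blank lines are skipped
        n1, r1, n2, r2 = match.groups()
        first, second = (int(n1), r1), (int(n2), r2)
        partners[first].add(second)
        partners[second].add(first)  # store both directions once only
    return dict(partners)


def most_connected(partners, top=5):
    """Residues ordered by number of distinct contact partners, largest first."""
    ranked = sorted(partners.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(residue, len(found)) for residue, found in ranked[:top]]
```

In the example output above, residue 16 Y is listed with eight different partners, which is the kind of residue such a ranking brings to the top.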

A graphical representation of this can be obtained if you have access to the program molscript:

	calc hydrophobic
	molscript contacts

It is usually a good idea to use this option in conjunction with the "molscript sec_struc" command (see the examples section on the Web Site).

Identification of Key Residues in a fold

The so-called 'key' residues in a fold are defined as those residues that make a significant contribution to the hydrophobic core(s) of a protein, and/or those that have main-chain conformations that are energetically favourable for only a small subset of the 20 naturally occurring amino acid residues.

Use the commands

	calc phi psi short hydrophobic
	table key_residues

to perform the analysis. Example output is shown below:


1lyz.pdb  resolution  2.00 Angstroms

 17 321741352 1 5  53 5 5  85223333  6 31 2
KvfGrcelaaamkRhglDnyrgySlGNwvcaakfeSnfNtqAtNRNTDgs
               G  + +L              LG          ++

1 5 6614  2243    21  1 22 412 243   7 2 4 213 45
tDygilqiNSrwwcNDgRtpGSrNlcnipcSallSSDiTaSvNcakKivS
   E  +         G         L

    6 161272 132   1 1541 2 6
DGNgmNawvawrNrckgTDvQawirgcRl
   E            L        G

The sequence of the protein is shown in one letter codes. Potential key residues are shown as lower case. Above and below such a potential key residue, is shown the reason for the classification. Above the residue is shown what is effectively a weight on its contribution to the hydrophobic core(s) - the higher the number, the more important the residue (this number is actually the "contact number" [Brocklehurst & Perham, 1993] for a residue). Usually, it's best to ignore those residues with a contact number of 1. This analysis will identify residues involved in all types of hydrophobic cluster (e.g. interior and exterior).

Below a residue, the "short-hand" nomenclature for the main-chain conformation is shown - only residues with a positive value of phi are indicated (either +, L, G or E). Obviously proline residues are important in a fold, but these can be identified from the sequence alone (as opposed to analysing a structure).
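Since the three lines of each block are aligned column by column, the annotations can be paired with the residues by position. The Python sketch below is one way to do this. It assumes that the whitespace of the output has been preserved exactly, that each contact number is a single digit sitting directly above its residue, and that numbering starts at first_residue. The function name is hypothetical.

```python
def key_residues(contact_line, sequence_line, conformation_line, first_residue=1):
    """Return (number, code, contact number or None, conformation or None)
    for every lower-case (potential key) residue in one block of output."""
    width = len(sequence_line)
    # pad the annotation lines so that every column of the sequence exists
    contacts = contact_line.ljust(width)
    conformations = conformation_line.ljust(width)
    found = []
    for offset, letter in enumerate(sequence_line):
        if not letter.islower():
            continue
        digit = contacts[offset]
        shape = conformations[offset]
        found.append((
            first_residue + offset,
            letter.upper(),
            int(digit) if digit.isdigit() else None,
            shape if shape != " " else None,
        ))
    return found
```

For the second and third blocks of the example, pass first_residue=51 and first_residue=101 respectively, assuming 50 residues per block. Following the advice above, you may then wish to discard entries whose only annotation is a contact number of 1.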

Covalent bonds and CONECT records

Some computer programs require as input, information on all covalent bonds in a protein provided in the form of PDB format CONECT records. Use the script

		use covalent_bonds
		table conect_records

to produce these. Example output is shown below:


CONECT    1    2
CONECT    2    3    5
CONECT    3    4    7
CONECT    6    7
CONECT    7    8
CONECT    8    9   11
CONECT    9   10   15
CONECT   11   12   13
CONECT   14   15
CONECT   15   16
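If you need to check or post-process these records, they can be read back into a set of bonded atom pairs. The Python sketch below assumes the standard PDB layout, in which the atom serial numbers occupy consecutive five-character fields after the record name, and that the whitespace of the records has been preserved. The function names are hypothetical.

```python
from collections import defaultdict


def parse_conect(text):
    """Return the set of bonds as (lower serial, higher serial) tuples."""
    bonds = set()
    for line in text.splitlines():
        if not line.startswith("CONECT"):
            continue
        line = line.rstrip()
        # five-character fields start straight after the 6-character record name
        fields = [line[i:i + 5] for i in range(6, len(line), 5)]
        serials = [int(field) for field in fields if field.strip()]
        if len(serials) < 2:
            continue
        first = serials[0]
        for partner in serials[1:]:
            bonds.add((min(first, partner), max(first, partner)))
    return bonds


def neighbours(bonds):
    """Map each atom serial number to the sorted list of atoms bonded to it."""
    table = defaultdict(set)
    for a, b in bonds:
        table[a].add(b)
        table[b].add(a)
    return {atom: sorted(found) for atom, found in table.items()}
```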

Disulphide bonds and SSBOND records

You can automatically locate all disulphide bonds in a protein from the coordinates, and generate PDB format SSBOND records for them, by using the command:

	use disulphides

Typical output is:


SSBOND   1 CYS A    3    CYS A   18 
SSBOND   2 CYS A   12    CYS A   24 
SSBOND   3 CYS A   17    CYS A   31 
SSBOND   4 CYS A   35    CYS A   40 
SSBOND   5 CYS A   46    CYS A   61 
SSBOND   6 CYS A   55    CYS A   67 
SSBOND   7 CYS A   60    CYS A   74 
SSBOND   8 CYS A   78    CYS A   83 
SSBOND   9 CYS A   89    CYS A  104 
SSBOND  10 CYS A   98    CYS A  110 
SSBOND  11 CYS A  103    CYS A  117 
SSBOND  12 CYS A  121    CYS A  126 
SSBOND  13 CYS A  132    CYS A  147 
SSBOND  14 CYS A  141    CYS A  153 
SSBOND  15 CYS A  146    CYS A  160 
SSBOND  16 CYS A  164    CYS A  169

The format is "number of disulphide bond, residue name, chain identifier, residue number, residue name, chain identifier, residue number"
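The fields described above can be read back with a simple whitespace split, as long as every record carries a chain identifier as in the example. The Python sketch below makes that assumption (records without a chain identifier would need column-based parsing instead). The function names are hypothetical.

```python
def parse_ssbond(text):
    """Return (bond number, (chain, residue number), (chain, residue number))."""
    bonds = []
    for line in text.splitlines():
        fields = line.split()
        # SSBOND, number, CYS, chain, residue number, CYS, chain, residue number
        if len(fields) != 8 or fields[0] != "SSBOND":
            continue
        number = int(fields[1])
        first = (fields[3], int(fields[4]))
        second = (fields[6], int(fields[7]))
        bonds.append((number, first, second))
    return bonds


def partner_of(bonds, chain, residue_number):
    """Return the residue bridged to the given cysteine, or None if unbridged."""
    for _, first, second in bonds:
        if first == (chain, residue_number):
            return second
        if second == (chain, residue_number):
            return first
    return None
```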

NB Molscript format input files can be produced by using the "molscript disulphides" command after "use disulphides".

NMR structure refinement: Identification of main-chain and side-chain hydrogen bond partners from ensembles of structures

Hydrogen bond restraints are important in defining the three-dimensional structure of proteins in many NMR structure determinations. But it is difficult (and often impossible) to identify hydrogen bonding partners by direct observation by using current NMR experiments.

One of the best ways to attack this problem is to analyse ensembles of structures calculated without hydrogen bond restraints, to see where donor-acceptor pairs can be identified unambiguously. In combination with hydrogen exchange NMR experiments, this approach can, in favourable cases, allow unique identification of both donor and acceptor partners of:

both regular and distorted secondary structural hydrogen bonds.
main-chain tertiary hydrogen bonds
side-chain - main-chain hydrogen bonds

The energy-based analysis (using a realistic hydrogen bond potential function) also allows relative "strengths" of hydrogen bonds involving shared donors to be postulated.
NB The analysis can now handle homo and hetero multi-chain proteins (as well as any type of residue identifiers in the pdb file)

This analysis is highly recommended as a way of determining hydrogen bonding partners in NMR structure determinations, and is preferable to simply assuming for example that helices consist solely of i,i+4 hydrogen bonds etc. It is also recommended that information regarding an ensemble calculated without hydrogen bond restraints be presented in published work.

The commands:

 	for pdb list filename
	{
	  use hbonds
	  table hbonds_dump
   	  table hbonds_sidedump
	}
        analyse ensemble

allow such an analysis (including statistics on calculated energies) to be performed within NAOMI.

Please note, if you do not use precisely this script, the behaviour of NAOMI is undefined. Always check that your reports directory does not contain any rogue "tmp" files before starting this analysis. There are potential problems with system resources for this analysis. If you are working on the structure of a protein of more than 200 residues, you will need a special version of the program - please contact the author in this case (the system resources required by these commands are unaffected by memory allocation commands, so you cannot change them yourself).

Example output for main-chain - main-chain hydrogen bonds is given below. Each possible donor is shown, along with possible acceptors, the number of times the hydrogen bond occurs in the ensemble, and statistics on the calculated energies of the hydrogen bonds. NB, this is not the exact format, because chain identifiers are now output


  don  acc  no.  mean   adev   sdev   svar   skew   curt
   4    2    1  -3.94
   6    3    1  -3.47
   6    4    5  -4.77   0.37   0.52   0.27   0.66  -1.40
   7    3    1  -1.95
   7    4    9  -4.52   0.57   0.74   0.54   0.30  -1.23
   7    5    1  -5.15
   8    3    1  -5.74
   8    4    1  -3.92
  10    6    1  -4.22
  10    7    8  -1.17   0.75   0.92   0.84  -0.24  -1.66
  11    7    8  -4.76   0.13   0.17   0.03  -0.68  -0.94
  11    8    1  -0.01
  12    8   19  -5.40   0.10   0.12   0.02  -0.02  -0.88
  12    9    1   0.47
  13    9   19  -5.28   0.14   0.19   0.03   1.02   0.76
  13   10   19  -1.21   0.26   0.32   0.10   0.64  -0.85
  14   10   19  -4.62   0.35   0.42   0.18   0.19  -1.00
  14   11   16  -0.74   0.56   0.63   0.40  -0.22  -1.68
  15   11   19  -3.13   0.51   0.68   0.47   0.36   0.37
  15   12   18  -1.51   0.59   0.90   0.81   0.95   1.31
  16   11   18  -4.37   0.74   0.88   0.77  -0.05  -1.34
  20   17   18   0.00   0.28   0.39   0.15  -0.78   0.28
  21   17   18  -4.29   0.27   0.32   0.11  -0.14  -1.44
  21   18   19  -1.04   0.55   0.64   0.42  -0.05  -1.44
  26   24    1  -4.21
  27   25    6  -4.34   1.36   1.60   2.55   0.56  -1.87
  29   26    1  -3.14
  30   27   12  -2.78   0.51   0.56   0.31  -0.30  -1.85
  34   31    3   0.14   0.42   0.62   0.38   0.06  -2.33
  36   32   18  -4.93   0.37   0.43   0.19   0.30  -1.36
  37   33   19  -5.09   0.28   0.33   0.11   0.72  -1.03
  37   34    1  -1.82
  38   34    3  -3.96   0.15   0.23   0.05   0.01  -2.33
  38   35   16   1.15   0.24   0.29   0.09  -0.40  -1.04
  39   35    6  -4.70   0.03   0.04   0.00   0.27  -1.62
  40   36    7  -3.86   0.35   0.45   0.20   0.56  -1.52
  41   37    3   0.02   2.35   3.05   9.33  -0.38  -2.33
  41   38    1  -1.64
  41   39    1  -2.98
  42   37    1  -2.39
  42   38    2  -3.23   1.60   2.26   5.09   0.00  -2.75
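One way to turn such a table into candidate restraints is to keep only those donor-acceptor pairs that occur in most members of the ensemble and have a favourable mean energy. The Python sketch below assumes the main-chain layout shown above (without chain identifiers, so it will need adapting to the current output format). The function names, the 80% occupancy threshold and the energy cut-off are hypothetical choices, and the size of the ensemble must be supplied by you.

```python
def parse_ensemble_table(text):
    """Return one dict per data line: donor, acceptor, count, mean and sdev."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # data lines start with three integers: donor, acceptor, occurrences
        if len(fields) < 4 or not all(f.isdigit() for f in fields[:3]):
            continue
        rows.append({
            "donor": int(fields[0]),
            "acceptor": int(fields[1]),
            "count": int(fields[2]),
            "mean": float(fields[3]),
            # statistics are absent when the bond occurs only once
            "sdev": float(fields[5]) if len(fields) >= 6 else None,
        })
    return rows


def consistent_hbonds(rows, ensemble_size, min_fraction=0.8, max_mean=-2.0):
    """Pairs present in at least min_fraction of the structures with a
    mean energy at or below max_mean."""
    return [
        (row["donor"], row["acceptor"])
        for row in rows
        if row["count"] / ensemble_size >= min_fraction and row["mean"] <= max_mean
    ]
```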

Side-chain analyses are similar, but the acceptor atoms are also shown. This allows the user to see if a particular acceptor atom is uniquely involved in a hydrogen bond in cases where this may be ambiguous (e.g. in aspartate residues, atoms OD1 and OD2). Remember to use the "correct" and "validate" commands beforehand to create correctly named atoms where the atom name depends on residue side-chain conformation (you may also need to use the "terminii X-PLOR" command if you are using X-PLOR format files).


  don  acc  atom no.  mean   adev   sdev   svar   skew   curt
   6    6  OG     1  -0.94
  18   17  OD1    3  -0.79   0.35   0.47   0.22   0.27  -2.33
  19   17  OD1   19  -4.38   1.59   1.77   3.13   0.36  -1.72
  20   17  OD1    5  -4.01   1.75   2.01   4.04  -0.30  -2.23
  23   34  OD1   17  -1.72   1.70   2.01   4.04  -0.28  -1.20
  23   34  OD2    4  -3.30   1.72   2.31   5.36   0.71  -1.72
  24   24  OG1   19  -2.58   0.50   0.63   0.39   0.79  -0.73
  24   34  OD2    7  -4.63   1.21   1.47   2.16   0.80  -1.45
  25   24  OG1   18  -5.27   0.05   0.06   0.00   0.71  -0.83
  26   34  OD2    2  -4.05   2.26   3.20  10.26   0.00  -2.75
  28   27  OD1    1  -1.16
  29   27  OD1   16  -6.41   0.57   0.82   0.67   1.66   1.24
  31   34  OD2   19  -5.73   1.26   1.61   2.60   1.22  -0.40
  36   36  OD1    2  -0.06   0.17   0.25   0.06   0.00  -2.75

These commands require the NMR structure refinement module to be licensed.

Prediction of NOEs from structure
To predict the structurally relevant NOEs one might expect to observe for a given three-dimensional structure (including multimeric proteins), and which would be expected to appear in the region F1 (0 - 12 ppm), F2 (5 - 12 ppm) of a NOESY spectrum, use the command:

	predict noes lower upper

where lower and upper represent bounds on inter-proton distances. Information is output on expected intra-residue, inter-residue and inter-chain NOEs. No chemical shift degeneracy of protons is assumed (even for methyl groups at present, unfortunately).

For example,

	predict noes 1.8 7.0

would report all relevant inter-proton distances between 1.8 and 7.0 Angstroms in a structure. Effectively then, NOEs between pairs of protons where one of the pair is either an amide proton (main-chain or side-chain) or a ring proton are predicted. Intra-residue, and medium and long range NOE predictions are detailed separately (see the example output below).

NB if you wish to explicitly investigate inter-chain NOEs on multi-chain proteins, make sure that the chains have different chain identifiers (different segment identifiers are not sufficient) in the coordinate file.

Example output (shortened for reasons of space) is given below.

!Possible NOEs for residue _   7  VAL, forward in sequence
!Intra residue NOEs
INTRA_RES         atom  HN  res _   7  VAL - atom  HA  res _   7  VAL dist  3.0
INTRA_RES         atom  HN  res _   7  VAL - atom  HB  res _   7  VAL dist  2.6
INTRA_RES         atom  HN  res _   7  VAL - atom HG11 res _   7  VAL dist  4.7
INTRA_RES         atom  HN  res _   7  VAL - atom HG12 res _   7  VAL dist  4.9
INTRA_RES         atom  HN  res _   7  VAL - atom HG13 res _   7  VAL dist  4.4
INTRA_RES         atom  HN  res _   7  VAL - atom HG21 res _   7  VAL dist  3.0
INTRA_RES         atom  HN  res _   7  VAL - atom HG22 res _   7  VAL dist  4.3
INTRA_RES         atom  HN  res _   7  VAL - atom HG23 res _   7  VAL dist  3.4
!Inter residue NOEs
INTER_RES_(i,i+1) atom  HN  res _   7  VAL - atom  HN  res _   8  ARG dist  2.5
INTER_RES_(i,i+1) atom  HN  res _   7  VAL - atom  HA  res _   8  ARG dist  5.0
INTER_RES_(i,i+1) atom  HN  res _   7  VAL - atom  HG2 res _   8  ARG dist  6.7
INTER_RES_(i,i+1) atom  HA  res _   7  VAL - atom  HN  res _   8  ARG dist  3.6
INTER_RES_(i,i+1) atom  HB  res _   7  VAL - atom  HN  res _   8  ARG dist  2.0
INTER_RES_(i,i+2) atom  HN  res _   7  VAL - atom  HE1 res _   9  LYS dist  6.9
INTER_RES_(i,i+2) atom  HN  res _   7  VAL - atom  HE2 res _   9  LYS dist  5.7
INTER_RES_(i,i+2) atom HG12 res _   7  VAL - atom  HN  res _   9  LYS dist  6.4
INTER_RES_(i,i+2) atom HG13 res _   7  VAL - atom  HN  res _   9  LYS dist  5.2
INTER_RES_(i,i+3) atom  HB  res _   7  VAL - atom  HN  res _  10  TYR dist  5.3
INTER_RES_(i,i+3) atom  HB  res _   7  VAL - atom  HD1 res _  10  TYR dist  6.9
INTER_RES_(i,i+3) atom HG21 res _   7  VAL - atom  HN  res _  10  TYR dist  6.4
INTER_RES_(i,i+3) atom HG23 res _   7  VAL - atom  HN  res _  10  TYR dist  5.9
INTER_RES_(i,i+3) atom HG23 res _   7  VAL - atom  HD1 res _  10  TYR dist  6.9
INTER_RES_(i,i+4) atom  HN  res _   7  VAL - atom  HN  res _  11  ALA dist  5.9
INTER_RES_(i,i+4) atom  HA  res _   7  VAL - atom  HN  res _  11  ALA dist  4.8
INTER_RES_(i,i+4) atom  HB  res _   7  VAL - atom  HN  res _  11  ALA dist  6.3
INTER_RES_(long)  atom  HN  res _   7  VAL - atom HD11 res _  18  ILE dist  6.6
INTER_RES_(long)  atom  HN  res _   7  VAL - atom HD12 res _  18  ILE dist  6.9
INTER_RES_(long)  atom HG12 res _   7  VAL - atom  HN  res _  29  ARG dist  7.0
INTER_RES_(long)  atom  HB  res _   7  VAL - atom  HN  res _  30  VAL dist  5.8
INTER_RES_(long)  atom HG11 res _   7  VAL - atom  HN  res _  30  VAL dist  5.5
INTER_RES_(long)  atom HG12 res _   7  VAL - atom  HN  res _  30  VAL dist  4.2
INTER_RES_(long)  atom HG13 res _   7  VAL - atom  HN  res _  30  VAL dist  4.7
INTER_RES_(long)  atom HG22 res _   7  VAL - atom  HN  res _  30  VAL dist  6.3

NB This command is useful for sorting out ambiguous NOEs in spectra by analysing calculated structures. You should expect, however, that some peaks predicted from this simple analysis of structures may be missing from your spectra, or that some extra peaks may be present, due to other physical effects (e.g. spin diffusion).
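The prediction listing is line-oriented, so it can be filtered with a short program, for instance to keep only the shorter distances or to group predictions by type. The Python sketch below assumes the line layout of the example output above, in which "_" stands in the chain position. The function names, dictionary keys and the 5.0 Angstrom cut-off are hypothetical choices.

```python
import re
from collections import defaultdict

# category atom NAME res CHAIN NUMBER TYPE - atom NAME res CHAIN NUMBER TYPE dist VALUE
NOE = re.compile(
    r"^(\S+)\s+atom\s+(\S+)\s+res\s+(\S)\s+(\d+)\s+(\S+)\s+-\s+"
    r"atom\s+(\S+)\s+res\s+(\S)\s+(\d+)\s+(\S+)\s+dist\s+(\d+\.\d+)"
)


def parse_noes(text):
    """Return one dict per prediction line; '!' comment lines are skipped."""
    records = []
    for line in text.splitlines():
        match = NOE.match(line.strip())
        if match is None:
            continue
        kind, a1, c1, n1, t1, a2, c2, n2, t2, dist = match.groups()
        records.append({
            "category": kind,
            "first": (c1, int(n1), t1, a1),
            "second": (c2, int(n2), t2, a2),
            "distance": float(dist),
        })
    return records


def by_category(records, max_distance=5.0):
    """Group the predictions within max_distance by their category label."""
    groups = defaultdict(list)
    for record in records:
        if record["distance"] <= max_distance:
            groups[record["category"]].append(record)
    return dict(groups)
```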

Assembling and disassembling multiple model PDB files
To make a single pdb file consisting of an ensemble of selected single structure pdb files, use the "pack structures" command. An example of a script is:

	!instruct NAOMI to produce an ensemble file
	pack structures

	!use structures listed in the file "strucs.lis"
	for pdb list strucs.lis
	{

	!make sure atoms are named correctly
	validate                     

	!only use residues 30 to 50 in chain A
	zone none
	select A30:A50

	!write each structure to the ensemble file
	pdb_write
	}

The ensemble pdb file is called "ensemble.pdb" and will be placed in the "report" directory that you specified at the beginning of your complete NAOMI input script.

Selecting only certain regions of a family is useful for some aspects of structural analysis; for example, you might want to omit disordered termini when analysing an ensemble of structures using PROCHECK_NMR.

Predicting the 3-D structure of protein folding intermediates
Sorry, this section of the documentation is not currently available.

Analysis of protein-protein complexes - favourable interactions

Sorry, this section of the documentation is not currently available.

Machine-parsable 3-D information

In order to allow results of structure analyses performed by NAOMI to be simply interfaced to bioinformatics software, the command

	table structure_info

is provided. The output is designed to be machine-parsable and includes information that can usefully be included in multiple sequence alignments in cases where the alignment contains sequence(s) of proteins with known three-dimensional structure.

Information output includes the sequence of the protein for which coordinates were found, secondary structure, disulphide bridging information, key residue contact number (see elsewhere in the User Guide), and whether the side-chain is buried or exposed.

Sample output is given below (the word "3D_info" flags the start of the information; "end 3D_info" flags the end of the output).


3D_info
!Amino acid sequence of protein for which coordinates are available
pdbsequence
>1rnb.pdb
QVINTFDGVADYLQTYHKLPNDYITKSEAQALGWVASKGNLADVAPGKSI
GGDIFSNREGKLPGKSGRTWREADINYTSGFRNSDRILYSSDWLIYKTTD
HYQTFTKIR
// end sequence
!Residue level information follows...
!pdb chain_id is output if there is one, '-' if not
!Syntax for sec_struc is:
!sec_struc < helix |  strand | loop >
!Syntax for disulphide is:
!disulphide < absolute residue number >
!Syntax for key_residue contact number is:
!contact < contact number >
!Syntax side-chain solvent accessibility:
!access < buried | exposed >
residue 1
        chain -
        type Q
        pdb residue id   2 
        sec_struc loop
        access exposed
end residue

residue 2
        chain -
        type V
        pdb residue id   3 
        sec_struc loop
        access exposed
end residue
...
  (information for most residues not shown in the User Guide for
   reasons of space)
...

residue 107
        chain -
        type K
        pdb residue id 108 
        sec_struc strand
        contact 2
        access exposed
end residue

residue 108
        chain -
        type I
        pdb residue id 109 
        sec_struc strand
        contact 6
        access buried
end residue

residue 109
        chain -
        type R
        pdb residue id 110 
        sec_struc strand
        contact 3
        access exposed
end residue

end 3D_info
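
Because the output is intended to be read by other programs, a short reader may be a useful starting point. The Python sketch below is not part of NAOMI; it assumes the layout is exactly as in the sample above (one keyword per line inside each "residue ... end residue" block, comment lines starting with "!"), and the function name parse_3d_info and the dictionary keys it returns are hypothetical choices made for illustration.

```python
from typing import Any, Dict, List, Optional


def parse_3d_info(text: str) -> Dict[str, Any]:
    """Read one 3D_info ... end 3D_info block into plain Python data."""
    sequence_id: Optional[str] = None
    seq_lines: List[str] = []
    residues: List[Dict[str, Any]] = []
    current: Optional[Dict[str, Any]] = None
    in_block = False
    in_sequence = False

    for raw in text.splitlines():
        line = raw.strip()
        if not in_block:
            # Ignore everything before the opening flag.
            if line == "3D_info":
                in_block = True
            continue
        if line == "end 3D_info":
            break
        if not line or line.startswith("!"):
            continue  # blank lines and comments carry no data

        if in_sequence:
            if line.startswith("//"):
                in_sequence = False          # "// end sequence"
            elif line.startswith(">"):
                sequence_id = line[1:].strip()
            else:
                seq_lines.append(line)
            continue

        if line == "pdbsequence":
            in_sequence = True
        elif line == "end residue":
            if current is not None:
                residues.append(current)
            current = None
        elif line.startswith("residue "):
            current = {"index": int(line.split()[1])}
        elif current is not None:
            if line.startswith("pdb residue id"):
                # Kept as text: assumed that ids are not always plain integers.
                current["pdb_residue_id"] = line[len("pdb residue id"):].strip()
            else:
                key, _, value = line.partition(" ")
                current[key] = value.strip()

    return {
        "sequence_id": sequence_id,
        "sequence": "".join(seq_lines),
        "residues": residues,
    }
```

Values are kept as strings (apart from the running residue index) so that nothing is lost if a field is absent or has an unexpected form; for instance, the "contact" key is simply missing for residues where NAOMI prints no contact number, as for residues 1 and 2 in the sample.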

Author: Simon M. Brocklehurst