Why use NAOMI?
NB You need a terminal "/" on the end of the directory name.
Typical contents of NAOMI_lm.lic might be:
A basic input file is given below.
Include the contents of the file alloc_memory.inp into the script.
(See above)
Set the directory where some output will be directed. NB THIS DIRECTORY
MUST EXIST BEFORE YOU RUN THE PROGRAM (use the UNIX mkdir command)
Set the directory from which you wish to read the pdb file
Tell NAOMI to read the pdb file 1lyz.pdb (the coordinates of lysozyme)
zone 1 5 means work on residues 1 through 5 only
then calculate the main chain dihedral angles phi psi for these residues
Finally, for the active zone, for each residue write out the residue
number, residue name, phi psi. The nl command says put a new line
character here. The reason for this is explained in the documentation
of the write command.
Then typing
Many of the things you can do with NAOMI don't require a lot
of complicated commands to be typed in. So in order to make
things as clear as possible in these instructions, I've organised
things in sections, like: Calculating hydrogen bonds.
Use the command
This is done in two parts i) allocating memory, and directory setup,
ii) reading in the file.
NOTE this memory allocation should only be done once at the start of
a script - if you read several pdb files in a script, NAOMI will deal
with it. In other words, the residue limit command just says that in
a given script, you will never be looking at a single pdb file with
more than 2000 residues in it.
If you are doing any modelling, you should also allocate some memory
for the amino-acid library which contains examples of standard amino
acids. The command to use is
(see section constructing a residue library)
The second directory is where the pdb file(s) you wish to work on
are held. This might be a personal directory, or the root directory
of your local Brookhaven distribution (see the idcodes command).
Use the command
For some kinds of analysis, you will want to be able to repeat
a set of commands on several pdb files e.g. analysing ensembles of
nmr structures, doing some analysis of the whole protein databank etc.
In NAOMI, a special kind of loop is available to let you do this. First
create a file containing a list of the PDB files that you want
to use. Then any commands inside the curly braces will be performed on
all of the files listed in the file called filename (the files should
be found in the directory specified with "set pdb dir").
Any residues in the protein may be selected for use in analysis and
computation. By default, all the residues are selected.
Two commands are provided to make selections.
If you want only to use a single contiguous range of residues, use
the command
NB if the residue has a chain identifier, then this must
be concatenated to the residue id, e.g.
Wildcards are also available with the select (and select2)
commands e.g.
Simple calculations, relevant at the residue level, are performed by using
the calc command which has a syntax
The write command allows the output of the results. The syntax is
similar, except you need to indicate where new lines (nl) should be
placed in the output e.g.
You may output information about the overall protein with the command
Possible parameters are shown below grouped into related sections:
General (requires no calculations)
Solvent accessibility (requires "use solvent access")
Some commands within NAOMI will be recognised only if you have
entered a particular subsection of the program. For example
some parts of database construction and simulation facilities
are within a subsection called TYRA. The command to go into
a sub-section is the name of the sub-section. To return to
the main level, type end. That is,
Sorry, this section of the documentation is not currently available
Sorry, this section of the documentation is not currently available
Often, a PDB file will have some atoms missing, for example if
a side-chain is not visible in an electron density map, it will
usually be modelled as an alanine residue. To build in missing atoms,
use the command,
If you prefer, you can place the missing atoms and energy refine the
new side-chains (rather than keeping as much of the old side-chain
as possible). Again within the TYRA level do,
3-D protein structures sometimes have incomplete main-chains. In the
case of X-ray structures, this is often because electron density for
particular residues is either too weak to interpret or completely absent.
It is convenient for some types of structural analysis to know where
such "missing" parts of a structure are.
The commands:
Reconstruction of some symmetric protein structures from their supplied
coordinates requires cyclic permutation of the coordinates. The command:
You can renumber residues in a PDB file to be consecutively numbered
starting at a given number with the command
The command
At present this command writes out calculated amide proton data,
for use with the "molscript hbonds" command.
The correct atom names for some side-chain atoms in PDB files depend
on the conformation of the side-chain. For many computations (e.g.
calculation of three-dimensional structure from NMR derived data) the
atoms in a protein need to be named before the conformation of the residue
is known.
Thus it is common to find errors in the naming of some atoms. To rename
atoms correctly, use the command
Sorry, this section of the documentation is not currently available
Interfaces to the programs MOLSCRIPT (P. J. Kraulis), INSIGHT II
(Biosym), and QUANTA (MSI) are provided to make scientific visualization,
and preparation of figures for presentation and publication
time-efficient. Intrinsic graphical output, in PostScript format, is
also produced for some commands.
Use the command,
You need to configure your account to use the RASMOL interface. Set
the environment variable NAOMI_RASMOL_PATH to the pathname of the
version of RASMOL you want to use with NAOMI (some sites have different
versions of RASMOL compiled for different machines, e.g. with 8-bit
or 24-bit (32-bit) colour).
In your .cshrc file, put a line similar to the one below, substituting
the "/usr/people..." with the pathname for RASMOL at your site (don't
forget to source this file and rehash after doing this).
The commands
Use the command
Within a NAOMI script, if you wanted for example to compress a file called
naomi.dat, you could have the command:
When main-chain modelling, all residues that are currently selected
(with the zone or select commands) and on the C-terminal side of the
dihedral angle that is being changed will move.
In some cases, when calculating NMR structures from primary data, bias is
introduced into the system by using the same starting structure for each
calculated structure (especially in regions of structure that are poorly
defined by restraints). A similar problem of bias has been noted in other
applications where three-dimensional structures are calculated from distance
restraints (e.g. comparative modelling, protein folding pathway modelling).
To minimize the effect of bias introduced into such systems, a facility
for generating random families of structures, but having good covalent
geometry is provided. Within the TYRA level of commands, use:
The family of structures is created in the "report" directory you specify
in your input script.
To obtain reproducible results, and/or generate several different
families of structures, you must specifically "seed" the random
number generator. Within the TYRA level of commands, you may do
this by using the command
So, if you are using X-PLOR, you might use the following input script
to generate a family of 40 random structures:
NB if the residue has a chain identifier, then this must
be concatenated to the residue id, e.g.
If you have a license for the PROTEIN ENGINEERING/DESIGN
module of NAOMI, you can use the disulphide engineering commands
to investigate possible alternative models for disulphide bonds (an
exhaustive conformational search of energetically favourable disulphide
bond conformations is performed).
Sorry, this section of the documentation is not currently available
Sorry, this section of the documentation is not currently available
The command
Then use any combination of the following commands to output information
about hydrogen bonds in the protein.
Example output is shown below:
The "table side-chain_hbonds" command lists possible side-chain - side-chain
hydrogen bonds in the protein. Example output is shown below:
First, use the command:
This command invokes calculation of both
absolute (in units of square Angstroms) and percentage
(100% accessibility corresponding to the accessibility of residue X in the
three-residue peptide G-X-G, where G, X and G are in extended main-chain and/or
side-chain conformations) accessibilities for all selected atoms and
residues. The calculations are rapid compared with many solvent
accessibility algorithms - accessibilities for 1000 atoms can generally
be calculated in approximately 10 seconds on a MIPS R4600SC Workstation.
The following commands then allow output of the results of the
calculations:
NB residues near termini and chain-breaks may apparently have
greater than 100% accessibilities because 100% is calculated within
a 3-residue segment.
More complex solvent accessibility options
Use the commands
Example output is shown below
To identify and classify beta hairpins in a structure according to the
nomenclature of Wilmot and Thornton, use the commands
If you want to have the conformation of each residue in the loop output
as well (A = alpha, B = beta etc), then do:
Identifying close attractive van der Waals interactions between pairs of
non-polar groups is a useful way of identifying the roles that particular
residues play in the hydrophobic cores of proteins. Frequently, this type
of analysis is more information-rich than calculation of solvent
accessible surface areas because details of intramolecular interactions
are revealed (for example if you wanted to know how a helix was
interacting with a beta sheet). See the examples section on the NAOMI
Web Site.
Use the commands
A graphical representation of this can be obtained if you have access to
the program molscript:
The so-called 'key' residues in a fold are defined as those residues that:
make a significant contribution to the hydrophobic core(s) of a protein;
and/or those which have main-chain conformations that are energetically
favourable for only a small subset of the 20 naturally occurring amino acid
residues.
Use the commands
Below a residue, the "short-hand" nomenclature for the main-chain
conformation is shown - only residues with a positive value of phi are
indicated (either +, L, G or E). Obviously proline residues are
important in a fold, but these can be identified from
the sequence alone (as opposed to analysing a structure).
NB Molscript format input files can be produced by using the
"molscript disulphides" command after "use disulphides".
One of the best ways to attack this problem is to analyse ensembles of
structures calculated without hydrogen bond restraints, to see where
donor-acceptor pairs can be identified unambiguously. In combination
with hydron exchange NMR experiments, this approach can, in favourable
cases allow unique identification of both donor and acceptor partners of:
This analysis is highly recommended as a way of determining hydrogen
bonding partners in NMR structure determinations, and is preferable
to simply assuming for example that helices consist solely of i,i+4
hydrogen bonds etc. It is also recommended that information regarding
an ensemble calculated without hydrogen bond restraints be presented in
published work.
The commands:
Please note, if you do not use precisely this script, the behaviour
of NAOMI is undefined. Always be careful that your reports directory does
not contain any rogue "tmp" files before starting this analysis. There
are potentially problems with system resources for this analysis. If you
are working on the structure of a protein of more than 200 residues, you will
need a special version of the program - please contact the author in this
case (the system resources required by these commands are unaffected by
memory allocation commands, so you cannot change them yourself).
Example output for main-chain - main-chain hydrogen bonds is given
below. Each possible donor is shown, along with possible acceptors,
the number of times the hydrogen bond occurs in the ensemble, and
statistics on the calculated energies of the hydrogen bonds.
NB, this is not the exact format, because chain identifiers
are now output
For example,
NB if you wish to explicitly
investigate inter-chain NOEs on multi-chain proteins, make sure that
the chains have different chain identifiers (different segment identifiers
are not sufficient) in the coordinate file.
Example output (shortened for reasons of space) is given below.
Selecting only regions of a family is useful for some aspects of structural
analysis, for example you might want to omit disordered termini for
analysing an ensemble of structures using PROCHECK_NMR.
Information output includes the sequence of the protein for which
coordinates were found, secondary structure, disulphide bridging
information, key residue contact number (see elsewhere in the User
Guide), and whether the side-chain is buried or exposed.
Sample output is given below (The word "3D_info" flags start of information,
"end 3D_info" flags end of output)
Many of the features offered in NAOMI are presently unique, but subsets of
its features (particularly in the general structural analysis section) are
offered in varying ways by other software. In developing this program,
high on the list of priorities were: making the program easy to set up,
learn and use (making the output intuitive to understand); incorporation
of high quality (in terms of results and efficiency), novel algorithms.
NAOMI is currently used by structural biology and chemistry laboratories
throughout North America, Europe and Japan.
Getting up and running with NAOMI
Installation
NAOMI has a built-in license manager. You need to set
an environment variable so that NAOMI can find the license
file it requires in order to run. Put
setenv NAOMI_LM directory-name
into your .cshrc file so that it will be set each time you log
on. The directory name should be the name of a directory
on your system that contains a file called NAOMI_lm.lic.
Licenses for my computer
FEATURE 1 F6483F345229CDE
FEATURE 2 953B34483FF
FEATURE 3 38BC82F342E
Feature license keys are available from the author. Feature 1
is the main key to allow the program to work. Feature numbers
higher than 1 "switch on" other features that are not available
by default. For example FEATURE 2, allows the Protein Engineering
and Design module to work.
Files that you need
The basic requirement for running NAOMI is a file that contains
the "script" specifying what you want to do with your protein structure.
This must contain instructions giving an estimate of the number of residues
in the protein. Typically, you might have a "standard" memory allocation
of 500 residues (probably you will never need more than 7000 residues).
Don't routinely use a residue limit much larger than you need - it
wastes system resources.
It's a good idea to make a file called "alloc_memory.inp" that you can
include in all your NAOMI input files by using the @ symbol.
So create a file called "alloc_memory.inp" which should contain
something analogous to:
! Standard dynamic memory allocation include file for medium sized proteins
residue limit 500
residue library size 20 /usr/local/lib/naomi/lib/res_lib1.pdb
with the pathname for res_lib1.pdb set appropriately for your system
(Check with your local NAOMI expert where the default residue library
has been installed).
An example input script file to get NAOMI up and running (this
is a simple example, you'd probably not actually want to ever do
it for real)
Edit a file, perhaps you could call it "test.inp" (but it doesn't
really matter). Put in the following lines to make a NAOMI script.
@alloc_memory.inp
set report dir ./reports/
set pdb dir /nike/old_eve/smb18/pdb/
read pdb 1lyz.pdb
zone 1 5
calc phi psi
write rnum rname phi psi nl
This tells NAOMI to do the following things:
Running naomi, using the example file
The executable must be in a directory on your path.
naomi < file1 > file2
will read the input commands from file1 and place the output in file2.
So type
naomi < test.inp
The following output (or something similar) should appear on your
screen:
|-----------------------------------------------------------|
|-------------------- N A O M I ------------------------|
| Simon M. Brocklehurst
|***********************************************************|
|* Version 2.0 *|
|***********************************************************|
|* It is a condition of use of this software that you cite *|
|* the reference(s) given below in any published work. *|
|* *|
|* (1) Simon M. Brocklehurst & Richard N. Perham (1993) *|
|* Protein Science 2, 626-639 *|
|* *|
|***********************************************************|
|___________________________________________________________|
Including file alloc_memory.inp
|***********************************************************|
This version was last updated on Aug 9 1994,10:22:05
This output file was produced on Wed Aug 17 13:57:06 1994
|***********************************************************|
1lyz.pdb
1 LYS n/c 122.5
2 VAL -98.7 126.9
3 PHE -73.0 166.2
4 GLY -111.1 159.0
5 ARG -56.4 -77.0
iv) If you have access to a WWW browser with FORMS, then you can
use the NAOMI HTML program launcher.
Command Syntax
Including another NAOMI script, within a script
@filename
to include a previously prepared script into the current script.
Of course, you can have as many "scripts within scripts" as
the memory of your computer will allow.
Reading a pdb file into NAOMI
i) Allocating memory and Directory Setup
Allocating memory
Before you read in the file, you must allocate some memory for
the protein. Although NAOMI allocates memory dynamically, it does
need an upper limit of the number of residues in the protein. Usually
imposing an upper limit of 2000 residues will suffice (rarely do you
need more than 7000). If RAM is tight on your computer, you can set
the limit to a much lower value. The command to use is:
residue limit integer
where integer would be say 2000.
residue library size integer pathname
e.g.
residue library size 20 ~/naomi/lib/res_lib.pdb
Directory setup
NAOMI requires access to two directories. The first is the
so-called "report" directory. This is where some output will
be directed e.g. some error messages that are not output directly
to your specified output file. You must have the line
set report dir directory
where directory must have the terminating "/" character e.g.
set report dir /usr/people/smb/naomi/
or
set report dir ./
to simply set the current directory as the reports directory.
set pdb dir directory
to set this analogously to the set report dir command i.e. include
the terminal "/".
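Both directory rules above (the directory must already exist, and its name must end in "/") are easy to get wrong when scripts are generated automatically. The following Python sketch shows one way to prepare such a directory before running NAOMI. The helper name prepare_naomi_dir is hypothetical and is not part of NAOMI.

```python
import os


def prepare_naomi_dir(path):
    """Create the directory if it is missing and return its name with a
    trailing '/', as required by "set report dir" and "set pdb dir".

    Hypothetical helper for preparing input scripts; not a NAOMI command.
    """
    os.makedirs(path, exist_ok=True)  # same effect as the UNIX mkdir -p
    return path if path.endswith("/") else path + "/"
```

The returned string can then be written directly after "set report dir" in a generated input script.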
ii) Reading a pdb file
After having performed the initial set up, use the command
read pdb filename
to read in the coordinates for the structure (an approximation to
Brookhaven or X-PLOR file formats is expected). NAOMI is reasonably
intelligent about this, you will find that many files that other programs
"choke" on, can be read. You can make NAOMI write out a "proper" pdb file,
using the "correct" and "validate" commands.
Repeating the same commands on several PDB files
for pdb list filename
{
}
The "for pdb list" effectively replaces the "read pdb" command that is
used for operations on single files
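The list file used by "for pdb list" can be produced by any means. As a sketch only, the Python below writes one filename per line for every ".pdb" file in a directory. The one-name-per-line layout is an assumption (the format of the list file is not specified here), and the function name write_pdb_list is hypothetical.

```python
import os


def write_pdb_list(pdb_dir, list_path, suffix=".pdb"):
    """Write the names of all files in pdb_dir ending in suffix to list_path.

    Assumption: the list file holds one filename per line, without the
    directory part (NAOMI finds the files via "set pdb dir").
    """
    names = sorted(n for n in os.listdir(pdb_dir) if n.endswith(suffix))
    with open(list_path, "w") as handle:
        for name in names:
            handle.write(name + "\n")
    return names
```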
Brookhaven directory structures and idcodes
The current release of the Brookhaven Protein Databank (1994) has
a directory hierarchy to make it quicker to browse the databank.
The command
idcodes on
means that you should use the Brookhaven idcode instead of a filename.
This allows you to analyse the whole PDB easily. NAOMI will then look
for the file in the appropriate place. You should set
the directory where your Brookhaven files are with the "set pdb dir" command.
e.g.
set pdb dir /brookhaven/distr/
idcodes on
read pdb 6PTI
will read the file /brookhaven/distr/pt/pdb6pti.ent. The command
"idcodes off" means that actual filenames should be used.
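The mapping from idcode to file location in the example above can be sketched in Python as follows. The rule used (subdirectory named after the two middle characters of the lower-case idcode, file named pdbXXXX.ent) is inferred from the single 6PTI example and is an assumption about the general layout. The function name idcode_to_path is hypothetical.

```python
def idcode_to_path(pdb_dir, idcode):
    """Return the expected path of a Brookhaven entry below pdb_dir.

    Assumption: entries live in a subdirectory named after characters
    2 and 3 of the lower-case idcode, as in the 6PTI example.
    pdb_dir must end in '/', as for "set pdb dir".
    """
    code = idcode.lower()
    if len(code) != 4:
        raise ValueError("a Brookhaven idcode has four characters")
    return pdb_dir + code[1:3] + "/pdb" + code + ".ent"
```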
Selecting "active residues" of a protein
zone res1 res2
to select a specified contiguous subset of residues in the protein
for further work. This erases any other selection you have made
earlier on in the script
zone C154 C198
To (re)select the entire protein for further work, use
zone all
If you want to build up a more complex residue selection, use the
"select" command viz:
select res1, res2, res3:res4, res5 etc
This allows both the selection of individual residues and ranges
of residues (res3:res4 selects a range from res3 to res4).
You must place ","s in between the items in the selection list.
The select command does "not" erase previous selections. The following
example shows how you could select residues 1, 5, 20, 31,32,33,34 and 40
zone none
select 1, 5, 20
select 31:34, 40
Note the use of "zone none" to initially deselect all residues.
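To make the cumulative behaviour of "select" concrete, the Python sketch below expands selection lists of the kind shown above into explicit residue numbers. It handles only plain numeric residue ids (no chain identifiers or wildcards) and only illustrates the documented syntax; it is not NAOMI's own parser, and the name expand_selection is hypothetical.

```python
def expand_selection(spec):
    """Expand a list such as "31:34, 40" into [31, 32, 33, 34, 40].

    Items are separated by commas; a:b denotes the inclusive range a to b.
    Only numeric residue ids are handled in this sketch.
    """
    residues = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if ":" in item:
            start, end = (int(part) for part in item.split(":"))
            residues.extend(range(start, end + 1))
        else:
            residues.append(int(item))
    return residues


# Two successive "select" commands add to the same selection.
selected = expand_selection("1, 5, 20") + expand_selection("31:34, 40")
```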
zone none
select A*
would select all residues in chain A.
"calc" and "write"
calc param1 param2 param3... etc
e.g.
calc phi psi short
would calculate the dihedral angles phi and psi, and also a short
hand nomenclature which characterises the conformation of a residue
based on phi and psi. Calculations are performed for the selected
zone of the protein (by default the whole molecule).
write rnum rname phi psi nl
would produce a list of residue number, name and phi/psi values for
the active zone of the protein (by default the whole input coordinate
file).
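For readers unfamiliar with how a dihedral angle such as phi or psi is obtained from coordinates, the Python sketch below computes the signed torsion angle defined by four points. It is a standard textbook construction and is not taken from NAOMI's source. For phi the four atoms would be C(i-1), N(i), CA(i), C(i), and for psi N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

The first residue of a chain has no preceding C atom, which is consistent with phi being reported as "n/c" for residue 1 in the example output.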
calc:
phi
psi
omega
chi1
chi2
chi3
chi4
chi5
short
curvature - how tightly coiled the polypeptide chain is
at a given residue position...
hydrophobic - calculated details of neighbours making
hydrophobic contacts in three dimensions.
error - residue averages for B-factors (X-ray) and
r.m.s.d. (NMR)
write: as for calc, but also
rnum
rname
grid
"protwrite"
protwrite
NB It is the user's responsibility to ensure that they have
previously invoked any relevant calculations with particular "use"
commands.
NumberOfResidues
TotalAccessAbsolute
TotalAccessHydrophobicAbsolute
TotalAccessHydrophobicPercent
TotalAccessPureHydrophobicAbsolute
TotalAccessPureHydrophobicPercent
So an example script might be:
use solvent access
protwrite NumberOfResidues TotalAccessAbsolute
The following output might be produced
124 2345.345
which would indicate that the protein had 124 residues with a
total solvent accessible surface area of 2345.345 Square Angstroms (Note
that this figure is not accurate to 3 d.p., rather it is the
generic floating point precision output by the protwrite command)
Program sub-sections within NAOMI
tyra
command 1
command 2 etc
end
This documentation will always tell you when a specific command is part
of the TYRA section of the program.
Making a side-chain entropy database
Constructing a residue library
Building in "missing atoms" to a protein
repair side-chains
within the TYRA level of commands. This will keep the conformation
of the old side-chain where atoms are available, and will set
newly placed regions of the side-chain in an extended conformation,
or in an appropriate conformation if a ring is involved.
repair and refine side-chains
Identifying chain-breaks
use chain-breaks
table chain-breaks
provide a list of main-chain-breaks, in the form of a list of pairs of
residues falling on either side of a break in a polypeptide chain.
Example output for a protein with two breaks in the chain (in chain C
between residues 56 and 62, and in chain C again between residues 73 and
78) is shown below:
NAOMI>OUTPUT List of residues at ends of breaks in main-chain of rec_C.pdb
NAOMI>OUTPUT C 56 HIS - C 62 GLY
NAOMI>OUTPUT C 73 THR - C 78 GLN
Note you may wish to pipe the output from these commands through
egrep e.g.
naomi < rec_C.inp | egrep -v "WARNING"
Note: expected breaks in multi-chain proteins are ignored, provided
that chain identifiers are properly used in the PDB file.
Cyclic permutations
cyclic permute
transforms a set of x,y,z coordinates to y,z,x. Thus, three applications
of the command will restore the original coordinates i.e.
first time: x,y,z -> y,z,x
second time: y,z,x -> z,x,y
third time: z,x,y -> x,y,z
Use this command in conjunction with the pdb_write command to write out
coordinates at each stage of the manipulation.
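The permutation described above can be checked with a few lines of Python. This is an illustration of the coordinate mapping only, not NAOMI code, and the function name is hypothetical.

```python
def cyclic_permute(coords):
    """Map every (x, y, z) triple to (y, z, x)."""
    return [(y, z, x) for (x, y, z) in coords]


atoms = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
once = cyclic_permute(atoms)
twice = cyclic_permute(once)
thrice = cyclic_permute(twice)  # back to the starting coordinates
```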
Alterations to atom and residue numbering
reset_resnum resnum
e.g.
reset_resnum 4
would number all the residues in the protein so that the first residue
is 4, the second is 5 etc.
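The effect of consecutive renumbering can be sketched in Python as below. The sketch takes the residue number carried by each atom, in file order, and assigns new consecutive numbers. It assumes a single chain and is only a model of the behaviour described above; the function name is hypothetical.

```python
def renumber_consecutively(residue_numbers, start):
    """Return new per-atom residue numbers, consecutive from start.

    residue_numbers holds the original residue number of every atom in
    file order. Atoms sharing an original number stay in one residue.
    Assumption: a single chain, so original numbers are unique per residue.
    """
    mapping = {}
    for old in residue_numbers:
        if old not in mapping:
            mapping[old] = start + len(mapping)
    return [mapping[old] for old in residue_numbers]
```

For example, atoms belonging to residues 10, 10, 11 and 15 become residues 4, 4, 5 and 6 when the starting number is 4.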
Writing out a pdb file
pdb_write
writes out the current zone to the reports directory, with the
SAME filename as the input pdb file. NB if the pdb directory,
and the reports directory are the same - the original file will be
overwritten.
Renaming atoms according to IUPAC nomenclature
validate
which correctly renames atoms in the protein. If you wish to write out the
correct structure, use a script as shown in the example below. This
particular example shows how to rename the atoms in an ensemble of structures, but
you can do the same operation on a single file.
correct
for pdb list ensem.lis
{
validate
}
You could equally well use the pdb_write command within the bracketed loop
for pdb list ensem.lis
{
validate
pdb_write
}
The correct command is uniquely linked with the validate command. The
pdb_write command is a more general command. If you are using X-PLOR format files, you
should use the "terminii X-PLOR" command immediately before the validate command,
as in the script:
for pdb list ensem.lis
{
terminii X-PLOR
validate
pdb_write
}
Predicting protein-protein interactions
Graphical output
MOLSCRIPT Interface
molscript parameter
where parameters can be:
contacts - input file showing both intra and inter chain
hydrophobic interactions schematically
(do "calc hydrophobic" before this command)
hbonds - hydrogen bonds (needs calculated HN positions,
thus use the pdb_write command) (do "use hbonds" first)
sec_struc - produce input file for cartoon plot
(do "use sec_struc" before using this command)
sec_struc_col1 - produce input file for cartoon plot
(do "use sec_struc" before using this command)
rainbow coloured from N (violet) - C (red)
termini and distance depth-cued
sec_struc_col2 - input file for cartoon plot
(do "use sec_struc" before using this command)
rainbow coloured from N (violet) - C (red)
termini and NO distance depth-cueing
In addition, the commands
molscript on
molscript off
switch on and off production of MOLSCRIPT input files when particular
commands are executed e.g.
predict possible binding sites
can produce an input file to visualize the results, by using the program
MOLSCRIPT.
RASMOL Interface
setenv NAOMI_RASMOL_PATH /usr/people/smb/smb-bin/bin/rasmol.24
At any point in your script after you read in a structure, you can
automatically start up a RASMOL interface by using the command
start rasmol
The backbone of the structure will be displayed, with each chain coloured
differently. The residues that are currently selected within NAOMI
will be coloured purple.
For example, the following script will read a structure into NAOMI
and start up the NAOMI-RASMOL interface, with the A chain coloured
purple:
read pdb struc.pdb
zone none
select A*
start rasmol
If you have licenses for various FEATURE modules, you can also use the
commands
rasmol on
rasmol off
to start up RASMOL with more complicated representations, as described
at the relevant places in the documentation for these modules.
INSIGHT II Interface
insight on
insight off
switch on and off the production of BIOSYM COMMAND LANGUAGE (BCL) files.
These files appear in the reports directory, and when read into
INSIGHT II at the command line, set up new commands in the program
which appear as options on pull-down menus.
QUANTA
quanta parameter
Intrinsic NAOMI graphical output
Executing UNIX shell commands within NAOMI
Use the system command:
system string
where string is passed to the shell that naomi was started from.
system compress naomi.dat &
Known bugs
Assessing bad contacts in a protein
Within the TYRA level of commands, use the command
bad contacts
to assess the extent to which atoms are making bad contacts i.e. the extent
to which atoms are too close to each other. It returns a number describing
the whole protein.
Setting dihedral angles to specific values
Within the TYRA level of commands, use
set torsion res tor angle
e.g. set torsion A5 chi1 -120.0
would set chi1 for residue 5 (chain A) to -120.0 degrees. tor can be
phi, psi, omega, chi1, chi2, chi3, chi4 or chi5. Note if you try to set
a degree of freedom that is not appropriate (e.g. a ring-opening torsion),
the command will fail. NAOMI should warn you if this happens.
Setting dihedral angles to random values
Please note, the commands:
random seed
randomise
are available only if you have a license for the NMR structure refinement
module.
randomise param family-name n
where
param can be: main, side or both
family-name the name of the family of structures to be generated
n is number of members required for the family
For example.
randomise main s_ 10
would generate 10 structures with names "s_1.pdb", "s_2.pdb", "s_3.pdb"
etc... having the values of phi and psi set to random values for
all currently selected residues (omega values are left unchanged
from the structure that was read in - if you want to change these,
they must be set manually using the set torsion command). That is,
main specifies that main-chain dihedrals should be randomised,
side specifies that "allowed" side-chain dihedrals should be
randomised - both randomises both main-chain and side-chain
dihedral angles.
random seed integer
For example
random seed 12345
NB obviously you should place the "random seed" command before using
the "randomise" command in your input script
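The reason for seeding can be illustrated with Python's standard random module. The sketch below draws random phi/psi pairs from a generator created with an explicit seed: the same seed reproduces the same values, while a different seed gives a different family. NAOMI's own generator and the distribution it samples are not documented here, so the uniform range of -180 to 180 degrees and the function name are assumptions made purely for illustration.

```python
import random


def random_main_chain(n_residues, seed):
    """Return n_residues (phi, psi) pairs drawn with a seeded generator.

    Assumption: angles are drawn uniformly from -180 to 180 degrees;
    this models the idea of seeding, not NAOMI's actual sampling.
    """
    rng = random.Random(seed)
    return [(rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0))
            for _ in range(n_residues)]


family_a = random_main_chain(5, 12345)
family_b = random_main_chain(5, 12345)   # same seed: identical values
family_c = random_main_chain(5, 145625)  # different seed: different values
```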
read pdb template.pdb
terminii X-PLOR
tyra
random seed 145625
randomise both s_ 40
end
'Mutating' residues
Within the TYRA level of commands, use
mutate res code
where code is either a 1 or 3 letter amino acid code. NB
upper or lower case acceptable for the code.
e.g. mutate 7 e
or mutate 7 glu
would make residue 7 a glutamate residue. Conformations
are from the library. You can change the conformation of the
new side-chains by using the "set torsion" command, or the
"remove bad contacts" command.
mutate A56 TYR
Automatically moving a side-chain to minimize bad
contacts
Within the TYRA level of commands, use the command
minimize bad contacts res
where res should be a concatenation of chain_id and residue id
e.g.
minimize bad contacts A26
would move the side-chain of residue 26 in chain A to a position making
the minimum number of bad contacts with the rest of the protein.
This is a useful command to use after using the 'mutate' command.
Disulphide bond manipulation
Manipulating disulphide bonds by interactive graphics is awkward, and
sometimes disulphide bonds are modelled into inappropriately strained
conformations in both X-ray and NMR protein structures. In some cases the
strained conformations arise because of errors in the potential functions of
structure refinement programs. You may wish to see if the
conformation can be improved - either to obtain a better fit with
experimental data, or if experimental data is poor or not available, to
obtain a less strained conformation.
Generating side-chain ensembles - Type I
It is often useful to get a picture of how "constrained" the
side-chains in a protein are by their surroundings. Within the TYRA
level of commands,
make side-chain ensemble-1 string
where string is a list of one-letter codes
e.g. IAFYW
will generate an ensemble of 35 pdb files based on the input structure.
The ensemble will have the selected side-chains randomly moved such that they
make no bad contacts with each other. This is useful when one wants to
make no prior assumptions about the behaviour of amino acid side-chains
according to their tertiary environments.
Automatic design of stabilizing mutants
Sorry, this section of the documentation is not currently available
Engineering unstrained disulphide bonds (ii)
Within the TYRA level of commands, use the
command
Sorry, this section of the documentation is not currently available
Automatic identification and classification of
secondary structure
NAOMI uses a fuzzy logic algorithm to recognize secondary structural
motifs in proteins. Decisions are made as to whether possible segments
are, or are not, complete secondary structural elements. This is different
to the approach used by some other programs that identify repeating
patterns (of for example chain conformation, or hydrogen bonds).
use sec_struc
explicitly tells NAOMI that secondary structural information will
be required later on in the script. It will automatically invoke
calculations of other properties e.g. hydrogen bonds, if these have
not already been calculated elsewhere in the script. Use this command,
followed by
table sec_struc
to provide a list of secondary structural elements in the protein. The
output takes the form of an overall summary, followed by details
of residue numbers involved in helices, strands (forming part of sheets),
and beta turns. Example output is shown below.
..bbbbbbbb.bbbbbbbbbbaaaaaaaaaaaaaaaa....bbbbbb...
bbbbbb
Beta strands
3 - 10
12 - 21
42 - 47
51 - 56
Helices 310, regular, pi
22 - 37
Beta-turns
47 - 50 Type IV AA
A novel algorithm, making use of hydrogen bonding information and
polypeptide chain conformation parameters, is used to recognize
the secondary structural motifs.
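The one-character-per-residue summary in the example output can be turned back into residue ranges. The Python sketch below is an illustration, not part of NAOMI. It assumes that "a" marks helix, "b" marks strand, "." marks anything else, and that the first character corresponds to residue 1. The function name segments is hypothetical.

```python
from itertools import groupby

def segments(summary, first_residue=1):
    """Return (symbol, start, end) for each run of 'a' or 'b' characters."""
    found = []
    position = first_residue
    for symbol, run in groupby(summary):
        length = len(list(run))
        if symbol in "ab":
            found.append((symbol, position, position + length - 1))
        position += length
    return found

# The two summary lines from the example output, joined into one string.
lines = [
    "..bbbbbbbb.bbbbbbbbbbaaaaaaaaaaaaaaaa....bbbbbb...",
    "bbbbbb",
]
for symbol, start, end in segments("".join(lines)):
    kind = "helix" if symbol == "a" else "strand"
    print(kind, start, "-", end)
```

Two elements of the same type that follow each other with no gap would be merged by this approach, so the tabulated ranges remain the more reliable source.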
Hydrogen bonds
To obtain a list of hydrogen bonds, along with calculated energies (from
a model using explicitly calculated lone-pair positions, and quantifying
both electrostatic effects and quality of orbital overlap), use the
command:
use hbonds
to tell NAOMI to calculate all information about hydrogen bonds in the
protein. Tables are then produced with the commands
table hbonds_da
table hbonds_ad
table side-chain_hbonds
The "da" in the first command stands for donor-acceptor listing, so a list
of all main-chain donors is output, along with the partnering main-chain and side-chain
acceptors. The calculated energies are useful in deciding which is
the major contributor in bifurcated hydrogen bonds, and also in
analysing secondary structure in detail, e.g. under- or over-winding of
helices, or missing hydrogen bonds due to helix bends etc.
(The command "molscript hbonds" may also be used to produce a graphical
representation; see the examples section on the NAOMI Web Site.)
Table of hydrogen bonds: Donor to acceptors
for 1rnb.pdb resolution 0.00 Angstroms
(NB remember to validate the structure with the VALIDATE and CORRECT
options before calculating H-bonds for the best results
|Donors |--------------------- Acceptors -------------------------|
| Main | Main | Side chain |
| Chain | Chain | |
86 D # # # # #
87 R 99 T -8.41 # # # #
88 I # # # # #
89 L 97 Y -10.23 # # # #
90 Y # # # # #
91 S 95 L -6.39 # # # #
92 S # # # # #
93 D # # 93 D OD1 -5.08 # #
94 W 91 S -5.10 # # # #
95 L # # 91 S OG -7.76 # #
96 I # # # # #
The "table hbonds_ad" command lists the same information analogously, but as acceptor to donor.
Table of side-chain - side-chain hydrogen bonds
for rec_B.pdb resolution 0.00 Angstroms
Format is donor - acceptor, with chain, residue number, residue and atom
given for both. NB At present the Energy is actually the donor-acceptor
distance.
B 34 K NZ - B 51 T OG1 (E = 3.19)
B 34 K NZ - B 53 E OE2 (E = 2.98)
B 39 R NH1 - B 132 D OD1 (E = 3.62)
B 39 R NH2 - B 132 D OD1 (E = 2.93)
B 40 S OG - B 42 E OE1 (E = 3.73)
B 45 T OG1 - B 42 E OE1 (E = 3.32)
B 45 T OG1 - B 42 E OE2 (E = 2.91)
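Lines of this table are regular enough to be read by a script. The Python sketch below is an illustration only: it assumes the layout shown in the example above (chain, residue number, residue, atom on each side of the dash) and the names LINE and parse_hbond are hypothetical. Remember that the bracketed value is currently the donor-acceptor distance, not an energy.

```python
import re

# One table line: donor fields, a dash, acceptor fields, then "(E = value)".
LINE = re.compile(
    r"(\S+)\s+(\d+)\s+(\S+)\s+(\S+)\s+-\s+"
    r"(\S+)\s+(\d+)\s+(\S+)\s+(\S+)\s+\(E\s*=\s*(-?\d+\.\d+)\)"
)

def parse_hbond(line):
    """Return (donor, acceptor, value) or None if the line does not match."""
    m = LINE.search(line)
    if m is None:
        return None
    donor = (m.group(1), int(m.group(2)), m.group(3), m.group(4))
    acceptor = (m.group(5), int(m.group(6)), m.group(7), m.group(8))
    return donor, acceptor, float(m.group(9))

example = [
    "B 34 K NZ - B 51 T OG1 (E = 3.19)",
    "B 34 K NZ - B 53 E OE2 (E = 2.98)",
    "B 39 R NH2 - B 132 D OD1 (E = 2.93)",
]
# Keep only the pairs closer than 3.0 Angstroms.
short = [h for h in map(parse_hbond, example) if h and h[2] < 3.0]
for donor, acceptor, distance in short:
    print(donor, acceptor, distance)
```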
Solvent accessibility calculations
Relevant command summary:
use solvent access
table residue_access
table total_access
zone
select
(The following commands require a license for the protein function module)
zone2
select2
Commands are provided for calculating the solvent accessible surface
area of atoms and residues in a protein (using a fast numerical
integration algorithm). The solvent accessible surface is taken as that
defined by Lee and Richards, i.e. the locus of the centre of a probe sphere
(representing a water molecule) rolled over the entire van der Waals
surface of the protein. Use the command
use solvent access
to tell NAOMI that solvent accessibility calculations are required
in this script. Remember to only do "use" commands after
you have made your residue selection e.g.
zone 10 15
use solvent access
table residue_access param1 param2
param1 controls whether main-chain, side-chain, both
main-chain and side-chain, total,
or side-chain carbon residue accessibilities are output. It can take the
values:
main
side
both
total
carbon
param2 controls the units of the calculation i.e. whether
absolute accessibilities (in square Angstroms) or percentage
accessibilities are output. It can take the values:
absolute
percent
both
For example, the script:
use solvent access
table residue_access both both
might produce the following output:
NAOMI>Calculating solvent accessibile surface areas...
NAOMI>OUTPUT Residue solvent accessibilities:
NAOMI>OUTPUT main-chain, side-chain (in square Angstroms and percentage)
NAOMI>OUTPUT
NAOMI>OUTPUT 1 K 25 A^2 71 % , 89 A^2 45 %
NAOMI>OUTPUT 2 V 17 A^2 50 % , 71 A^2 54 %
NAOMI>OUTPUT 3 F 4 A^2 12 % , 8 A^2 4 %
NAOMI>OUTPUT 4 G 28 A^2 34 % , 0 A^2 0 %
NAOMI>OUTPUT 5 R 1 A^2 4 % , 72 A^2 31 %
NAOMI>OUTPUT 6 C 2 A^2 6 % , 40 A^2 38 %
NAOMI>OUTPUT 7 E 5 A^2 15 % , 60 A^2 41 %
NAOMI>OUTPUT 8 L 0 A^2 0 % , 0 A^2 0 %
NAOMI>OUTPUT 9 A 0 A^2 0 % , 0 A^2 0 %
NAOMI>OUTPUT 10 A 2 A^2 4 % , 41 A^2 55 %
NAOMI>OUTPUT 11 A 6 A^2 16 % , 16 A^2 21 %
NAOMI>OUTPUT 12 M 0 A^2 0 % , 0 A^2 0 %
NAOMI>OUTPUT 13 K 11 A^2 31 % , 68 A^2 35 %
NAOMI>OUTPUT 14 R 27 A^2 78 % , 154 A^2 67 %
NAOMI>OUTPUT 15 H 9 A^2 25 % , 23 A^2 16 %
NAOMI>OUTPUT 16 G 40 A^2 48 % , 0 A^2 0 %
NAOMI>OUTPUT 17 L 0 A^2 0 % , 0 A^2 0 %
NAOMI>OUTPUT 18 D 7 A^2 18 % , 31 A^2 27 %
NAOMI>OUTPUT 19 N 7 A^2 19 % , 89 A^2 73 %
NAOMI>OUTPUT 20 Y 5 A^2 13 % , 62 A^2 33 %
NAOMI>OUTPUT 21 R 28 A^2 80 % , 112 A^2 49 %
NAOMI>OUTPUT 22 G 69 A^2 84 % , 0 A^2 0 %
NAOMI>OUTPUT 23 Y 0 A^2 0 % , 42 A^2 22 %
NAOMI>OUTPUT 24 S 1 A^2 2 % , 31 A^2 33 %
NB glycine residue side-chains take 0 values for all accessibilities
(because glycine residues don't have side-chains!)
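If you want to post-process this table, the lines can be read with a short script. The Python sketch below is an illustration only. It assumes the "both both" layout shown above with plain numeric residue numbers; the names ROW and parse_access, and the 40 % cut-off used in the example, are hypothetical choices.

```python
import re

# residue number, one-letter type, then main-chain and side-chain area and percent
ROW = re.compile(
    r"NAOMI>OUTPUT\s+(\d+)\s+([A-Z])\s+"
    r"(\d+)\s+A\^2\s+(\d+)\s+%\s*,\s*(\d+)\s+A\^2\s+(\d+)\s+%"
)

def parse_access(text):
    """Return (number, type, main_A2, main_pct, side_A2, side_pct) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            values = [int(v) for v in m.groups()[2:]]
            rows.append((int(m.group(1)), m.group(2), *values))
    return rows

output = """\
NAOMI>OUTPUT 1 K 25 A^2 71 % , 89 A^2 45 %
NAOMI>OUTPUT 2 V 17 A^2 50 % , 71 A^2 54 %
NAOMI>OUTPUT 3 F 4 A^2 12 % , 8 A^2 4 %
NAOMI>OUTPUT 4 G 28 A^2 34 % , 0 A^2 0 %
"""
rows = parse_access(output)
# Residues whose side-chain is at least 40 % accessible.
exposed = [number for number, _, _, _, _, side_pct in rows if side_pct >= 40]
print(exposed)  # [1, 2]
```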
table total_access
outputs information on the total solvent accessible surface of the
protein. Example output is:
NAOMI>OUTPUT Total Solvent Accessible Surface of Protein
NAOMI>OUTPUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NAOMI>OUTPUT Total 6242 A^2
NAOMI>OUTPUT Main-chain 1393 A^2 22 %
NAOMI>OUTPUT Side-chain 4849 A^2 78 %
NAOMI>OUTPUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NAOMI>OUTPUT Total hydrophobic 3318 A^2 53 %
NAOMI>OUTPUT Main-chain 677 A^2 20 %
NAOMI>OUTPUT Side-chain 2640 A^2 80 %
NAOMI>OUTPUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NAOMI>OUTPUT Total hydrophilic 2924 A^2 47 %
NAOMI>OUTPUT Main-chain 715 A^2 24 %
NAOMI>OUTPUT Side-chain 2209 A^2 76 %
NAOMI>OUTPUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NB The percentages here are not relative solvent accessibilities
as they are in the residue-level output. Rather they are percentages
of totals, e.g. the Total hydrophobic percentage of 53 in the
table above is relative to the Total surface accessibility of 6242 A^2
etc.
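The arithmetic behind these percentages can be checked directly from the example figures. The short Python sketch below is an illustration only. It assumes, consistently with the numbers in the table, that the Main-chain and Side-chain percentages within each block are taken relative to that block's own total. The helper name percent is hypothetical.

```python
# Areas in square Angstroms, copied from the example output above.
total = 6242
hydrophobic = {"total": 3318, "main": 677, "side": 2640}
hydrophilic = {"total": 2924, "main": 715, "side": 2209}

def percent(part, whole):
    """Percentage of whole represented by part, rounded to an integer."""
    return round(100.0 * part / whole)

print(percent(hydrophobic["total"], total))                # 53
print(percent(hydrophilic["total"], total))                # 47
print(percent(hydrophobic["main"], hydrophobic["total"]))  # 20
print(percent(hydrophilic["side"], hydrophilic["total"]))  # 76
```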
You may wish to calculate the solvent accessibility of a "molecule"
with some parts effectively "missing". For example, suppose you had a
protein system with two chains, A and B. You may wish to calculate
accessibilities in chain A in the presence and absence of chain B.
To do this, a second level of residue selection is provided (but only
in the protein function module). Some examples should make things clear:
zone none
select A10:A30,A40
zone2 none
select2 A1:A100,B1:B100
use solvent access
This script will calculate accessibilities for residues A10 through
A30, and A40 in the presence of all the atoms in residues A1:A100 and
B1:B100. If "select2 A1:A100" had been used instead, the calculations would
proceed as though the atoms in chain B were not present.
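To make the idea of "presence and absence" concrete, the Python sketch below estimates the accessible area of one atom with and without a neighbouring atom. It is an illustration only: this guide does not say which numerical integration algorithm NAOMI uses, so the sphere-point sampling method of Shrake and Rupley is used here as a stand-in. The atom radii, coordinates and function names are hypothetical; the 1.4 Angstrom probe radius is the usual choice for water.

```python
import math

def sphere_points(n):
    """Return n roughly evenly spread points on the unit sphere."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return points

def accessible_area(atoms, index, probe=1.4, n_points=500):
    """Accessible area of atoms[index]; each atom is (x, y, z, radius)."""
    x, y, z, radius = atoms[index]
    big_r = radius + probe  # the probe centre lies on this expanded sphere
    exposed = 0
    for ux, uy, uz in sphere_points(n_points):
        p = (x + big_r * ux, y + big_r * uy, z + big_r * uz)
        # A point is buried if it falls inside any other expanded sphere.
        buried = any(
            math.dist(p, (ox, oy, oz)) < orad + probe
            for j, (ox, oy, oz, orad) in enumerate(atoms)
            if j != index
        )
        if not buried:
            exposed += 1
    return 4.0 * math.pi * big_r ** 2 * exposed / n_points

atom_a = (0.0, 0.0, 0.0, 1.8)
atom_b = (3.0, 0.0, 0.0, 1.8)
alone = accessible_area([atom_a], 0)             # "chain B" absent
together = accessible_area([atom_a, atom_b], 0)  # "chain B" present
print(round(alone, 1), round(together, 1))
```

The area reported for the first atom drops when the second atom is included in the environment, which is the effect that the second level of selection lets you switch on and off.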
Interior/Exterior Residue estimation
Visual inspection of globular protein folds shows that some residues
may be regarded as being interior to the fold whilst some are on the
protein surface. Use the commands
table exterior_residues
table interior_residues
to provide automatic classifications of these.
Example output is shown below
NAOMI>OUTPUT_____________________________________
NAOMI>OUTPUT Exterior residues...
NAOMI>OUTPUT
NAOMI>OUTPUT 1 V
NAOMI>OUTPUT 2 I
NAOMI>OUTPUT 4 M
NAOMI>OUTPUT 5 P
NAOMI>OUTPUT 6 S
NAOMI>OUTPUT 8 R
...
output deleted for reasons of space
...
NAOMI>OUTPUT_____________________________________
NAOMI>OUTPUT_____________________________________
NAOMI>OUTPUT Interior residues...
NAOMI>OUTPUT
NAOMI>OUTPUT 3 A
NAOMI>OUTPUT 7 V
NAOMI>OUTPUT 10 Y
NAOMI>OUTPUT 11 A
NAOMI>OUTPUT 16 V
Salt bridges
Use the command
table salt-bridges
to output a list of possible salt-bridges in a protein. In addition
to residue information (chain, number and type), the closest approach of
atoms in the side-chains [r(min) in Angstroms] and side-chain - side-chain hydrogen
bonding information are indicated [(HB) indicates a hydrogen bond, (__)
indicates no hydrogen bond].
A 38 LYS and A 33 GLU : r(min) = 6.7 (__)
A 38 LYS and B 127 GLU : r(min) = 7.9 (__)
A 41 LYS and A 32 GLU : r(min) = 3.2 (HB)
A 41 LYS and B 127 GLU : r(min) = 3.1 (HB)
A 64 ARG and A 65 GLU : r(min) = 6.5 (__)
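A list like this is easy to filter with a script, for example to keep only the salt-bridges that are also hydrogen bonded. The Python sketch below is an illustration only; it assumes the line layout shown above, and the names BRIDGE and parse_salt_bridge are hypothetical.

```python
import re

# chain, number, residue "and" chain, number, residue ": r(min) = value (HB or __)"
BRIDGE = re.compile(
    r"(\S+)\s+(\d+)\s+(\w{3})\s+and\s+(\S+)\s+(\d+)\s+(\w{3})\s*:\s*"
    r"r\(min\)\s*=\s*(\d+\.\d+)\s+\((HB|__)\)"
)

def parse_salt_bridge(line):
    """Return a dict describing one salt-bridge line, or None."""
    m = BRIDGE.search(line)
    if m is None:
        return None
    return {
        "first": (m.group(1), int(m.group(2)), m.group(3)),
        "second": (m.group(4), int(m.group(5)), m.group(6)),
        "r_min": float(m.group(7)),
        "hbond": m.group(8) == "HB",
    }

example = [
    "A 38 LYS and A 33 GLU : r(min) = 6.7 (__)",
    "A 41 LYS and A 32 GLU : r(min) = 3.2 (HB)",
    "A 41 LYS and B 127 GLU : r(min) = 3.1 (HB)",
]
bonded = [b for b in map(parse_salt_bridge, example) if b and b["hbond"]]
for bridge in bonded:
    print(bridge["first"], bridge["second"], bridge["r_min"])
```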
Supersecondary structure:
Automatic identification and classification of
beta hairpin loops
Use the commands
use hairpins
table hairpins
to identify and classify the beta hairpin loops in a protein.
Example output is shown below for the homodimeric protein glutathione
reductase (the calculation for this c. 1000 residue protein took less
than 1 minute cpu time on an R4000SC INDIGO2):
A 120 - A 121 2:2 FVD AK TLE Wide left bulge
A 126 - A 127 2:2 LEV NG ETI Regular
B 120 - B 121 2:2 FVD AK TLE Wide left bulge
B 126 - B 127 2:2 LEV NG ETI Regular
A 237 - A 241 3:5 VVK NTDGS LTL G1 bulge
A 246 - A 250 3:5 TLE LEDGR SET G1 bulge
B 237 - B 241 3:5 VVK NTDGS LTL G1 bulge
B 246 - B 250 3:5 TLE LEDGR SET G1 bulge
A 344 - A 348 5:5 TVV FSHPP IGT Regular
B 344 - B 348 5:5 TVV FSHPP IGT Regular
A 377 - A 387 9:11 SFT AMYTAVTTHRQ PCR Regular
B 377 - B 387 9:11 SFT AMYTAVTTHRQ PCR Regular
The residue numbers and sequence of the loops are output, along with
the Sibanda and Thornton classification (3:5 etc). The sequence of the
three residues flanking the loop, and the details of the secondary
structure of these "flanking" residues (regular, bulge etc) are also
provided. The output is ordered for increasing loop size.
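The hairpin table can also be read by a script. The Python sketch below is an illustration only. It assumes the layout of the example lines above (start and end residue, the X:Y class, the flanking, loop and flanking sequences, then a free-text description); the names HAIRPIN and parse_hairpin are hypothetical.

```python
import re

HAIRPIN = re.compile(
    r"(\S+)\s+(\d+)\s+-\s+(\S+)\s+(\d+)\s+(\d+):(\d+)\s+"
    r"([A-Z]+)\s+([A-Z]+)\s+([A-Z]+)\s+(.+)"
)

def parse_hairpin(line):
    """Return a dict for one hairpin line, or None if it does not match."""
    m = HAIRPIN.match(line.strip())
    if m is None:
        return None
    return {
        "chain": m.group(1),
        "start": int(m.group(2)),
        "end": int(m.group(4)),
        "class": (int(m.group(5)), int(m.group(6))),
        "before": m.group(7),
        "loop": m.group(8),
        "after": m.group(9),
        "description": m.group(10).strip(),
    }

example = [
    "A 120 - A 121 2:2 FVD AK TLE Wide left bulge",
    "A 237 - A 241 3:5 VVK NTDGS LTL G1 bulge",
    "A 377 - A 387 9:11 SFT AMYTAVTTHRQ PCR Regular",
]
hairpins = [parse_hairpin(line) for line in example]
for h in hairpins:
    # The loop sequence should span exactly the residues start..end.
    print(h["start"], h["end"], h["loop"], len(h["loop"]) == h["end"] - h["start"] + 1)
```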
To add conformational information to the table, use the script
calc phi psi short
use hairpins
table hairpins
which gives the following output. The conformation of each residue
is shown under the amino acid sequence.
A 120 - A 121 2:2 FVD AK TLE Wide left bulge
PAB AA BBB
A 126 - A 127 2:2 LEV NG ETI Regular
BBB L+ PBB
B 120 - B 121 2:2 FVD AK TLE Wide left bulge
PAB AA BBB
B 126 - B 127 2:2 LEV NG ETI Regular
BBB LG PBB
A 237 - A 241 3:5 VVK NTDGS LTL G1 bulge
BBP BAAGP BBP
A 246 - A 250 3:5 TLE LEDGR SET G1 bulge
BPB BAAGP BBB
B 237 - B 241 3:5 VVK NTDGS LTL G1 bulge
BBP PAAGP BBP
B 246 - B 250 3:5 TLE LEDGR SET G1 bulge
BPB BAAGP BBB
A 397 - A 400 4:4 VCV GSEE KIV Narrow right bulge
BBB EAAL PPA
B 397 - B 400 4:4 VCV GSEE KIV Narrow right bulge
BBB EAAL PPA
A 344 - A 348 5:5 TVV FSHPP IGT Regular
BBB APBPP BBB
B 344 - B 348 5:5 TVV FSHPP IGT Regular
BBB BPBPP BBB
Thanks to Y. J. K. Edwards for granting permission to incorporate a
modified version of the TURNPIN beta hairpin recognition algorithm into
NAOMI.
Hydrophobic Interaction Analysis
Use the commands
calc hydrophobic
table hydrophobic
to obtain such an analysis. Example output is shown below:
Residues whose side chains make hydrophobic contacts
10 T - 11 P
11 P - 10 T
13 V - 32 K
13 V - 34 V
14 T - 16 Y
14 T - 63 K
15 T - 30 T
16 Y - 14 T
16 Y - 33 A
16 Y - 36 A
16 Y - 39 A
16 Y - 43 F
16 Y - 58 Y
16 Y - 63 K
16 Y - 65 F
17 K - 30 T
17 K - 64 T
18 L - 20 I
Each of these pairs of residues contains methyl, methylene or methine
groups that are interacting with each other.
calc hydrophobic
molscript contacts
It is usually a good idea to use this option in conjunction with the
"molscript sec_struc" command (see the examples section on the Web Site).
Identification of Key Residues in a fold
Use the commands
calc phi psi short hydrophobic
table key_residues
to perform the analysis. Example output is shown below:
1lyz.pdb resolution 2.00 Angstroms
17 321741352 1 5 53 5 5 85223333 6 31 2
KvfGrcelaaamkRhglDnyrgySlGNwvcaakfeSnfNtqAtNRNTDgs
G + +L LG ++
1 5 6614 2243 21 1 22 412 243 7 2 4 213 45
tDygilqiNSrwwcNDgRtpGSrNlcnipcSallSSDiTaSvNcakKivS
E + G L
6 161272 132 1 1541 2 6
DGNgmNawvawrNrckgTDvQawirgcRl
E L G
The sequence of the protein is shown in one letter codes. Potential
key residues are shown as lower case. Above and below such a potential
key residue, is shown the reason for the classification. Above the residue
is shown what is effectively a weight on its contribution to the
hydrophobic core(s) - the higher the number, the more important the
residue (this number is actually the "contact number" [Brocklehurst &
Perham, 1993] for a residue). Usually, it's best to ignore those
residues with a contact number of 1. This analysis will identify
residues involved in all types of hydrophobic cluster (e.g. interior
and exterior).
Covalent bonds and CONECT records
Some computer programs require as input, information on all covalent bonds
in a protein provided in the form of PDB format CONECT records.
Use the script
use covalent_bonds
table conect_records
to produce these. Example output is shown below:
CONECT 1 2
CONECT 2 3 5
CONECT 3 4 7
CONECT 6 7
CONECT 7 8
CONECT 8 9 11
CONECT 9 10 15
CONECT 11 12 13
CONECT 14 15
CONECT 15 16
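Programs that read these records usually turn them into a list of bonded atom pairs. The Python sketch below is an illustration only, with a hypothetical function name. It splits each record on whitespace, which works for the example above. In a full PDB file the atom serial numbers occupy fixed 5-character columns, and large serial numbers can run together, so a column-based reader would be safer there.

```python
def parse_conect(lines):
    """Return the set of bonded (atom, atom) pairs, smaller serial first."""
    bonds = set()
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "CONECT":
            continue
        atom = int(fields[1])
        for partner in fields[2:]:
            bonds.add(tuple(sorted((atom, int(partner)))))
    return bonds

records = [
    "CONECT 1 2",
    "CONECT 2 3 5",
    "CONECT 3 4 7",
]
for bond in sorted(parse_conect(records)):
    print(bond)
```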
Disulphide bonds and SSBOND records
You can automatically locate all disulphide bonds in a protein from
the coordinates, and generate PDB format SSBOND records for them by
using the command:
use disulphides
Typical output is:
SSBOND 1 CYS A 3 CYS A 18
SSBOND 2 CYS A 12 CYS A 24
SSBOND 3 CYS A 17 CYS A 31
SSBOND 4 CYS A 35 CYS A 40
SSBOND 5 CYS A 46 CYS A 61
SSBOND 6 CYS A 55 CYS A 67
SSBOND 7 CYS A 60 CYS A 74
SSBOND 8 CYS A 78 CYS A 83
SSBOND 9 CYS A 89 CYS A 104
SSBOND 10 CYS A 98 CYS A 110
SSBOND 11 CYS A 103 CYS A 117
SSBOND 12 CYS A 121 CYS A 126
SSBOND 13 CYS A 132 CYS A 147
SSBOND 14 CYS A 141 CYS A 153
SSBOND 15 CYS A 146 CYS A 160
SSBOND 16 CYS A 164 CYS A 169
The format is "number of disulphide bond, residue name, chain identifier,
residue number, residue name, chain identifier, residue number"
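Using the field order just described, the records can be read by a short script. The Python sketch below is an illustration only and the function name is hypothetical. It assumes that every record carries a chain identifier, as in the example above; records with a blank chain identifier would have fewer fields and are skipped here.

```python
def parse_ssbond(line):
    """Return (bond_number, (name, chain, number), (name, chain, number)) or None."""
    fields = line.split()
    if len(fields) != 8 or fields[0] != "SSBOND":
        return None
    first = (fields[2], fields[3], int(fields[4]))
    second = (fields[5], fields[6], int(fields[7]))
    return int(fields[1]), first, second

records = [
    "SSBOND 1 CYS A 3 CYS A 18",
    "SSBOND 2 CYS A 12 CYS A 24",
]
for record in records:
    number, first, second = parse_ssbond(record)
    # Sequence separation between the two half-cystines.
    print(number, first, second, second[2] - first[2])
```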
NMR structure refinement: Identification of main-chain and
side-chain hydrogen bond partners from ensembles of structures
Hydrogen bond restraints are important in defining the three-dimensional
structure of proteins in many NMR structure determinations. But it is
difficult (and often impossible) to identify hydrogen bonding partners
by direct observation by using current NMR experiments.
The energy-based analysis (using a realistic hydrogen bond potential
function) also allows relative "strengths" of hydrogen bonds
involving shared donors to be postulated.
NB The analysis can now handle homo and hetero multi-chain
proteins (as well as any type of residue identifiers in the pdb file)
for pdb list filename
{
use hbonds
table hbonds_dump
table hbonds_sidedump
}
analyse ensemble
This script allows such an analysis (including statistics on calculated energies) to be
performed within NAOMI.
don acc no. mean adev sdev svar skew curt
4 2 1 -3.94
6 3 1 -3.47
6 4 5 -4.77 0.37 0.52 0.27 0.66 -1.40
7 3 1 -1.95
7 4 9 -4.52 0.57 0.74 0.54 0.30 -1.23
7 5 1 -5.15
8 3 1 -5.74
8 4 1 -3.92
10 6 1 -4.22
10 7 8 -1.17 0.75 0.92 0.84 -0.24 -1.66
11 7 8 -4.76 0.13 0.17 0.03 -0.68 -0.94
11 8 1 -0.01
12 8 19 -5.40 0.10 0.12 0.02 -0.02 -0.88
12 9 1 0.47
13 9 19 -5.28 0.14 0.19 0.03 1.02 0.76
13 10 19 -1.21 0.26 0.32 0.10 0.64 -0.85
14 10 19 -4.62 0.35 0.42 0.18 0.19 -1.00
14 11 16 -0.74 0.56 0.63 0.40 -0.22 -1.68
15 11 19 -3.13 0.51 0.68 0.47 0.36 0.37
15 12 18 -1.51 0.59 0.90 0.81 0.95 1.31
16 11 18 -4.37 0.74 0.88 0.77 -0.05 -1.34
20 17 18 0.00 0.28 0.39 0.15 -0.78 0.28
21 17 18 -4.29 0.27 0.32 0.11 -0.14 -1.44
21 18 19 -1.04 0.55 0.64 0.42 -0.05 -1.44
26 24 1 -4.21
27 25 6 -4.34 1.36 1.60 2.55 0.56 -1.87
29 26 1 -3.14
30 27 12 -2.78 0.51 0.56 0.31 -0.30 -1.85
34 31 3 0.14 0.42 0.62 0.38 0.06 -2.33
36 32 18 -4.93 0.37 0.43 0.19 0.30 -1.36
37 33 19 -5.09 0.28 0.33 0.11 0.72 -1.03
37 34 1 -1.82
38 34 3 -3.96 0.15 0.23 0.05 0.01 -2.33
38 35 16 1.15 0.24 0.29 0.09 -0.40 -1.04
39 35 6 -4.70 0.03 0.04 0.00 0.27 -1.62
40 36 7 -3.86 0.35 0.45 0.20 0.56 -1.52
41 37 3 0.02 2.35 3.05 9.33 -0.38 -2.33
41 38 1 -1.64
41 39 1 -2.98
42 37 1 -2.39
42 38 2 -3.23 1.60 2.26 5.09 0.00 -2.75
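The column headings are not defined in this guide. One reading that is consistent with the table is: number of structures in which the bond is found, then the mean, average deviation, standard deviation, variance, skewness and kurtosis of the calculated energies. The Python sketch below is an illustration under that assumption, with a hypothetical function name; it is not NAOMI's own code. With two values it reproduces the pattern seen in the rows where no. is 2 (skew 0.00, curt -2.75), and with one value only the mean is defined, as in the rows where no. is 1.

```python
import math

def moments(values):
    """Return the statistics for a list of energies as a dict."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return {"no.": n, "mean": mean}
    deviations = [v - mean for v in values]
    adev = sum(abs(d) for d in deviations) / n
    svar = sum(d * d for d in deviations) / (n - 1)  # sample variance
    sdev = math.sqrt(svar)
    result = {"no.": n, "mean": mean, "adev": adev, "sdev": sdev, "svar": svar}
    if svar > 0.0:
        result["skew"] = sum((d / sdev) ** 3 for d in deviations) / n
        result["curt"] = sum((d / sdev) ** 4 for d in deviations) / n - 3.0
    return result

# Two hypothetical energies chosen to give a mean of -3.23 and adev of 1.60.
stats = moments([-4.83, -1.63])
for name, value in stats.items():
    print(name, round(value, 2))
```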
Side-chain analyses are similar, but the acceptor atoms are shown
also. This allows the user to see if a particular acceptor atom is uniquely
involved in a hydrogen bond in cases where this may be ambiguous (e.g.
in aspartate residues, atoms OD1 and OD2). Remember to use the "correct" and
"validate" commands beforehand to create correctly named atoms where the atom name
depends on residue side-chain conformation (you may also need to use the
"terminii X-PLOR" command if you are using X-PLOR format files).
don acc atom no. mean adev sdev svar skew curt
6 6 OG 1 -0.94
18 17 OD1 3 -0.79 0.35 0.47 0.22 0.27 -2.33
19 17 OD1 19 -4.38 1.59 1.77 3.13 0.36 -1.72
20 17 OD1 5 -4.01 1.75 2.01 4.04 -0.30 -2.23
23 34 OD1 17 -1.72 1.70 2.01 4.04 -0.28 -1.20
23 34 OD2 4 -3.30 1.72 2.31 5.36 0.71 -1.72
24 24 OG1 19 -2.58 0.50 0.63 0.39 0.79 -0.73
24 34 OD2 7 -4.63 1.21 1.47 2.16 0.80 -1.45
25 24 OG1 18 -5.27 0.05 0.06 0.00 0.71 -0.83
26 34 OD2 2 -4.05 2.26 3.20 10.26 0.00 -2.75
28 27 OD1 1 -1.16
29 27 OD1 16 -6.41 0.57 0.82 0.67 1.66 1.24
31 34 OD2 19 -5.73 1.26 1.61 2.60 1.22 -0.40
36 36 OD1 2 -0.06 0.17 0.25 0.06 0.00 -2.75
These commands require the NMR structure refinement module to be licensed.
Prediction of NOEs from structure
To predict the structurally relevant NOEs that one might expect
to observe for a given three-dimensional structure (including
multimeric proteins), and which would be expected to appear in the region
F1 (0 - 12 ppm), F2 (5 - 12 ppm) of a NOESY spectrum, use the command:
predict noes lower upper
where lower and upper represent
bounds on inter-proton distances. Information on expected intra-residue,
inter-residue and inter-chain NOEs is output. No chemical shift degeneracy of protons
is assumed (unfortunately, not even for methyl groups at present).
predict noes 1.8 7.0
would report all relevant inter-proton distances between 1.8 and 7.0
Angstroms in a structure.
Effectively then, NOEs between pairs of protons where one of the pair
is either an amide proton (main-chain or side-chain) or a ring proton
are predicted. Intra-residue, and medium and long range NOE predictions
are detailed separately (see the example output below).
!Possible NOEs for residue _ 7 VAL, forward in sequence
!Intra residue NOEs
INTRA_RES atom HN res _ 7 VAL - atom HA res _ 7 VAL dist 3.0
INTRA_RES atom HN res _ 7 VAL - atom HB res _ 7 VAL dist 2.6
INTRA_RES atom HN res _ 7 VAL - atom HG11 res _ 7 VAL dist 4.7
INTRA_RES atom HN res _ 7 VAL - atom HG12 res _ 7 VAL dist 4.9
INTRA_RES atom HN res _ 7 VAL - atom HG13 res _ 7 VAL dist 4.4
INTRA_RES atom HN res _ 7 VAL - atom HG21 res _ 7 VAL dist 3.0
INTRA_RES atom HN res _ 7 VAL - atom HG22 res _ 7 VAL dist 4.3
INTRA_RES atom HN res _ 7 VAL - atom HG23 res _ 7 VAL dist 3.4
!Inter residue NOEs
INTER_RES_(i,i+1) atom HN res _ 7 VAL - atom HN res _ 8 ARG dist 2.5
INTER_RES_(i,i+1) atom HN res _ 7 VAL - atom HA res _ 8 ARG dist 5.0
INTER_RES_(i,i+1) atom HN res _ 7 VAL - atom HG2 res _ 8 ARG dist 6.7
INTER_RES_(i,i+1) atom HA res _ 7 VAL - atom HN res _ 8 ARG dist 3.6
INTER_RES_(i,i+1) atom HB res _ 7 VAL - atom HN res _ 8 ARG dist 2.0
INTER_RES_(i,i+2) atom HN res _ 7 VAL - atom HE1 res _ 9 LYS dist 6.9
INTER_RES_(i,i+2) atom HN res _ 7 VAL - atom HE2 res _ 9 LYS dist 5.7
INTER_RES_(i,i+2) atom HG12 res _ 7 VAL - atom HN res _ 9 LYS dist 6.4
INTER_RES_(i,i+2) atom HG13 res _ 7 VAL - atom HN res _ 9 LYS dist 5.2
INTER_RES_(i,i+3) atom HB res _ 7 VAL - atom HN res _ 10 TYR dist 5.3
INTER_RES_(i,i+3) atom HB res _ 7 VAL - atom HD1 res _ 10 TYR dist 6.9
INTER_RES_(i,i+3) atom HG21 res _ 7 VAL - atom HN res _ 10 TYR dist 6.4
INTER_RES_(i,i+3) atom HG23 res _ 7 VAL - atom HN res _ 10 TYR dist 5.9
INTER_RES_(i,i+3) atom HG23 res _ 7 VAL - atom HD1 res _ 10 TYR dist 6.9
INTER_RES_(i,i+4) atom HN res _ 7 VAL - atom HN res _ 11 ALA dist 5.9
INTER_RES_(i,i+4) atom HA res _ 7 VAL - atom HN res _ 11 ALA dist 4.8
INTER_RES_(i,i+4) atom HB res _ 7 VAL - atom HN res _ 11 ALA dist 6.3
INTER_RES_(long) atom HN res _ 7 VAL - atom HD11 res _ 18 ILE dist 6.6
INTER_RES_(long) atom HN res _ 7 VAL - atom HD12 res _ 18 ILE dist 6.9
INTER_RES_(long) atom HG12 res _ 7 VAL - atom HN res _ 29 ARG dist 7.0
INTER_RES_(long) atom HB res _ 7 VAL - atom HN res _ 30 VAL dist 5.8
INTER_RES_(long) atom HG11 res _ 7 VAL - atom HN res _ 30 VAL dist 5.5
INTER_RES_(long) atom HG12 res _ 7 VAL - atom HN res _ 30 VAL dist 4.2
INTER_RES_(long) atom HG13 res _ 7 VAL - atom HN res _ 30 VAL dist 4.7
INTER_RES_(long) atom HG22 res _ 7 VAL - atom HN res _ 30 VAL dist 6.3
NB This command is useful for sorting out ambiguous NOEs in spectra by
analysing calculated structures. You should expect, however, that some
peaks predicted from this simple analysis of structures may be missing
from your spectra, and that some extra peaks may be present, due to
other physical effects (e.g. spin diffusion).
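The distance window used by "predict noes lower upper" can be pictured with a short script. The Python sketch below is an illustration only: the proton names and coordinates are made up, the function name is hypothetical, and treating both bounds as inclusive is an assumption. The restriction to pairs involving an amide or ring proton is left out for brevity.

```python
import math
from itertools import combinations

def proton_pairs(protons, lower, upper):
    """Return (name_a, name_b, distance) for pairs inside the distance window."""
    pairs = []
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(protons, 2):
        distance = math.dist(xyz_a, xyz_b)
        if lower <= distance <= upper:
            pairs.append((name_a, name_b, round(distance, 1)))
    return pairs

# Hypothetical proton positions in Angstroms.
protons = [
    ("7 VAL HN", (0.0, 0.0, 0.0)),
    ("7 VAL HA", (3.0, 0.0, 0.0)),
    ("8 ARG HN", (0.0, 2.5, 0.0)),
    ("30 VAL HN", (0.0, 0.0, 9.0)),
]
for name_a, name_b, distance in proton_pairs(protons, 1.8, 7.0):
    print(name_a, "-", name_b, "dist", distance)
```

Only the three pairs closer than 7.0 Angstroms are reported; the fourth proton is more than 9 Angstroms from all of the others and so gives no predicted NOE.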
Assembling and disassembling multiple model
PDB files
To make a single pdb file consisting of an ensemble of selected
single structure pdb files, use the "pack structures" command.
An example of a script is:
!instruct NAOMI to produce an ensemble file
pack structures
!use structures listed in the file "strucs.lis"
for pdb list strucs.lis
{
!make sure atoms are named correctly
validate
!only use residues 30 to 50 in chain A
zone none
select A30:A50
!write each structure to the ensemble file
pdb_write
}
The ensemble pdb file is called "ensemble.pdb" and will be placed in the
"report" directory that you specified at the beginning of your complete
NAOMI input script.
Predicting the 3-D structure of protein folding
intermediates
Sorry, this section of the documentation is not currently available
Analysis of protein-protein complexes - favourable
interactions
Sorry, this section of the documentation is not currently available
Machine-parsable 3-D information
In order to allow results of structure analyses performed by NAOMI
to be simply interfaced to bioinformatics software, the command
table structure_info
is provided. The output is designed to be machine parsable and includes
information that can usefully be included in multiple sequence alignments
in cases where the alignment contains sequence(s) of proteins with
known three-dimensional structure.
3D_info
!Amino acid sequence of protein for which coordinates are available
pdbsequence
>1rnb.pdb
QVINTFDGVADYLQTYHKLPNDYITKSEAQALGWVASKGNLADVAPGKSI
GGDIFSNREGKLPGKSGRTWREADINYTSGFRNSDRILYSSDWLIYKTTD
HYQTFTKIR
// end sequence
!Residue level information follows...
!pdb chain_id is output if there is one, '-' if not
!Syntax for sec_struc is:
!sec_struc < helix | strand | loop >
!Syntax for disulphide is:
!disulphide < absolute residue number >
!Syntax for key_residue contact number is:
!contact < contact number >
!Syntax side-chain solvent accessibility:
!access < buried | exposed >
residue 1
chain -
type Q
pdb residue id 2
sec_struc loop
access exposed
end residue
residue 2
chain -
type V
pdb residue id 3
sec_struc loop
access exposed
end residue
...
(information for most residues not shown in the User Guide for
reasons of space)
...
residue 107
chain -
type K
pdb residue id 108
sec_struc strand
contact 2
access exposed
end residue
residue 108
chain -
type I
pdb residue id 109
sec_struc strand
contact 6
access buried
end residue
residue 109
chain -
type R
pdb residue id 110
sec_struc strand
contact 3
access exposed
end residue
end 3D_info
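As an indication of how this output might be read by other software, the Python sketch below collects the "residue ... end residue" blocks into dictionaries. It is an illustration only, based on the example above; the function name is hypothetical, and the header and sequence lines are simply skipped.

```python
def parse_structure_info(text):
    """Return one dict per 'residue ... end residue' block."""
    residues = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank lines and comment lines
        if line == "end residue":
            if current is not None:
                residues.append(current)
            current = None
        elif line.startswith("residue "):
            current = {"residue": int(line.split()[1])}
        elif current is not None:
            if line.startswith("pdb residue id "):
                current["pdb residue id"] = line[len("pdb residue id "):]
            else:
                key, _, value = line.partition(" ")
                current[key] = value.strip()
    return residues

example = """\
3D_info
residue 108
chain -
type I
pdb residue id 109
sec_struc strand
contact 6
access buried
end residue
end 3D_info
"""
for record in parse_structure_info(example):
    print(record)
```

Values are kept as strings, apart from the sequential residue number, because the pdb residue id need not be a plain integer.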
Author: Simon M. Brocklehurst