If an expression selects no atoms or residues, then there is generally no error; the command simply does nothing. The exception to this is the position vector specification.
Note that atom selections and residue selections cannot be freely mixed in 'and' and 'or' expressions. The selection expressions are strongly typed: all terms in one 'and' or 'or' expression must be of the same type, either atom or residue. However, there are operators that convert an atom selection into the corresponding residues (contains) and vice versa (in).
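A minimal Python sketch of the two conversion operators, under a hypothetical data model (not MolScript's internal one): an atom selection is a set of Atom records, and a residue selection is a set of residue names. Atom, contains and in_residues are names invented for illustration; in_residues stands in for in, which is a reserved word in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str     # atom name, e.g. "CA"
    residue: str  # name of the residue the atom belongs to, e.g. "A12"


def contains(atom_selection: set[Atom]) -> set[str]:
    """Atom selection -> residue selection: residues holding any selected atom."""
    return {atom.residue for atom in atom_selection}


def in_residues(all_atoms: list[Atom], residue_selection: set[str]) -> set[Atom]:
    """Residue selection -> atom selection: every atom of the selected residues."""
    return {atom for atom in all_atoms if atom.residue in residue_selection}


atoms = [Atom("N", "A1"), Atom("CA", "A1"), Atom("N", "A2"), Atom("CA", "A2"), Atom("O", "W1")]
ca_atoms = {a for a in atoms if a.name == "CA"}
print(sorted(contains(ca_atoms)))                          # ['A1', 'A2']
print(sorted(a.name for a in in_residues(atoms, {"A2"})))  # ['CA', 'N']
```

Keeping the two selection kinds as different Python types makes the strong typing visible: the only way to combine an atom term with a residue term is to pass one of them through a conversion first.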
require exp1, exp2, exp3, ... and expn
The 'and' operator means that all of the expressions exp1, exp2, exp3, ..., expn must be true for an atom or residue to be selected. All the expressions must be of one type, either atom or residue selection. Note the comma ',' character: it is required between the expressions, except before the keyword and, where it must not occur.
either exp1, exp2, exp3, ... or expn
The 'or' operator means that if any one of the expressions exp1, exp2, exp3, ..., expn is true for an atom or residue, then that atom or residue is selected. All the expressions must be of one type, either atom or residue selection. Note the comma ',' character: it is required between the expressions, except before the keyword or, where it must not occur.
not exp
This operator inverts the value of exp for each atom or residue: items selected by exp are excluded, and items not selected by exp are included.
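One way to picture the three operators is as combinators over per-item predicates. The Python sketch below is illustrative only: require, either and not_ mirror the keywords, atom() does exact name matching (no wildcards), items are plain dicts, and the strong typing between atom and residue selections is not enforced.

```python
from typing import Callable

# A selection term is modelled as a predicate over one item (atom or residue).
Predicate = Callable[[dict], bool]


def require(*terms: Predicate) -> Predicate:
    """'and': the item is selected only if every term is true."""
    return lambda item: all(term(item) for term in terms)


def either(*terms: Predicate) -> Predicate:
    """'or': the item is selected if at least one term is true."""
    return lambda item: any(term(item) for term in terms)


def not_(term: Predicate) -> Predicate:
    """'not': inverts the value of the term for each item."""
    return lambda item: not term(item)


def atom(name: str) -> Predicate:
    # Exact name match only; wildcards and regular expressions are not modelled.
    return lambda item: item["name"] == name


atoms = [{"name": n} for n in ("N", "CA", "C", "O", "CB")]
main_chain = either(atom("N"), atom("CA"), atom("C"), atom("O"))
print([a["name"] for a in atoms if main_chain(a)])                              # ['N', 'CA', 'C', 'O']
print([a["name"] for a in atoms if not_(main_chain)(a)])                        # ['CB']
print([a["name"] for a in atoms if require(main_chain, not_(atom("CA")))(a)])   # ['N', 'C', 'O']
```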
atom string
Selects all atoms with the given name. The name may contain X-PLOR type wildcards or be a regular expression.
occupancy number number
Selects all atoms with an occupancy value within the given range.
b-factor number number
Selects all atoms with a B-factor value within the given range.
in residue-selection
Selects all atoms within the selected residue(s). This expression is often used with the commands ball-and-stick and cpk, which need an atom selection as argument.
sphere vector number
Selects all atoms within a sphere centred at the given vector and with the given radius.
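A brute-force Python sketch of the sphere test, assuming coordinates are plain (x, y, z) tuples and that an atom lying exactly on the surface counts as inside (the text does not say how the boundary is treated). The function name is hypothetical.

```python
import math


def sphere(coords, centre, radius):
    """Indices of atoms within `radius` of `centre` (boundary included; an assumption)."""
    return [i for i, xyz in enumerate(coords) if math.dist(xyz, centre) <= radius]


coords = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (3.0, 0.0, 0.0)]
print(sphere(coords, (0.0, 0.0, 0.0), 2.0))   # [0, 1]
```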
close atom-selection number
Selects all atoms closer than the given distance to any of the given atoms. The atoms given as argument are not part of the final selected set; that is, this expression selects only the neighbours of those atoms, excluding the atoms themselves.
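The exclusion rule is the part that is easiest to get wrong, so here is a brute-force Python sketch (hypothetical function, plain (x, y, z) tuples, strict "closer than" comparison as the text states). It compares every atom with every reference atom, which is fine for illustration but slow for large structures.

```python
import math


def close(coords, reference, distance):
    """Indices of atoms closer than `distance` to any reference atom,
    excluding the reference atoms themselves."""
    ref = set(reference)
    return [
        i for i, xyz in enumerate(coords)
        if i not in ref and any(math.dist(xyz, coords[j]) < distance for j in ref)
    ]


coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 2.0, 0.0), (6.0, 0.0, 0.0)]
print(close(coords, [0], 2.5))   # [1, 2]
```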
backbone
This atom selection is shorthand for the following expression:
either require in amino-acids and either atom N, atom CA, atom C or atom O or require not in amino-acids and either atom *', atom O%P or atom P
That is, if a residue is an amino acid, then its N, CA, C and O atoms are selected. If it is not an amino acid, then the atoms with names appropriate for the nucleic acid phosphate and (deoxy)ribose groups are selected. In the latter case an expression that selects all primed atoms is used.
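The same rule written as a Python predicate for a single atom, as a sketch only: it assumes the caller already knows whether the atom's residue is an amino acid, and it hand-translates the wildcard terms (*' as "ends with a prime", O%P as "O, any one character, P") instead of using a general wildcard matcher. The function name is hypothetical.

```python
import re

AMINO_ACID_BACKBONE = {"N", "CA", "C", "O"}


def is_backbone(atom_name: str, in_amino_acid: bool) -> bool:
    """Mirror of the 'backbone' shorthand for a single atom."""
    if in_amino_acid:
        # either atom N, atom CA, atom C or atom O
        return atom_name in AMINO_ACID_BACKBONE
    # either atom *', atom O%P or atom P
    return (
        atom_name.endswith("'")                          # *'  : any primed atom
        or re.fullmatch(r"O.P", atom_name) is not None   # O%P : O1P, O2P, ...
        or atom_name == "P"                              # P
    )


print([n for n in ("N", "CA", "CB", "O") if is_backbone(n, True)])              # ['N', 'CA', 'O']
print([n for n in ("P", "O1P", "C1'", "N9", "O3'") if is_backbone(n, False)])   # ['P', 'O1P', "C1'", "O3'"]
```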
hydrogens
This atom selection is shorthand for the following expression:
either atom H*, atom 1H*, atom 2H* or atom 3H*
That is, all atoms having the names commonly given to hydrogen atoms in a PDB file are selected. Note that this selection is currently not based on the element specified for the atom in the new (v2.0) PDB file format. It may be in a future version.
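Since each term in the expression is a literal prefix followed by *, the whole selection reduces to a prefix test. A Python sketch (hypothetical function name):

```python
HYDROGEN_PREFIXES = ("H", "1H", "2H", "3H")


def is_hydrogen(atom_name: str) -> bool:
    # H*, 1H*, 2H*, 3H*: each pattern is a literal prefix followed by '*'.
    return atom_name.startswith(HYDROGEN_PREFIXES)


names = ["N", "H", "CA", "HA", "1HB", "2HB", "CG", "HG1", "OH"]
print([n for n in names if is_hydrogen(n)])   # ['H', 'HA', '1HB', '2HB', 'HG1']
```

Because the test looks only at names, it shares the limitation noted above: an atom named HG is selected whether it is a gamma hydrogen or a mercury ion.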
molecule string
Selects all residues within the given molecule. The molecule name is the one given when the coordinate file was read. The name may contain X-PLOR type wildcards or be a regular expression.
model integer
Selects the model with the given number. Protein structures determined from NMR data are almost always computed as ensembles of coordinate data sets, where the degree of variability between the sets is related to the number of experimental constraints available.
In the new (v2.0) PDB coordinate file format, the different coordinate sets from an NMR structure determination are given sequential model numbers, starting with 1.
from string to string
Selects the stretch of residues between and including the given residues. The names may contain X-PLOR type wildcards or be a regular expression. If more than one stretch of residues matches, then all such stretches are selected. For example, if a coordinate file contains amino acids numbered 1 to 100, and waters also numbered 1 to 57 (as may occur in PDB files), then the specification "from 5 to 15" will pick both the amino-acid residues 5 to 15 and the waters 5 to 15.
This is usually not a problem for commands such as helix or coil, since these simply ignore any selected non-amino-acid residues. The behaviour can even be advantageous when dealing with symmetrical subunits: the name comparison feature can then be used to pick the corresponding strands (or other elements) in both chains with a single command.
If a stretch of residues is not finished when the last residue in the currently loaded coordinates is reached, then MolScript issues a warning, but does not produce an error. An error should arguably be the proper response, but there are PDB files where the residue names are such that this particular condition is difficult to avoid.
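A Python sketch of the stretch logic described above, under stated assumptions: residues are scanned in file order as (name, type) pairs, names are compared exactly (no wildcards), a stretch opens at a residue matching the from name and closes at the next residue matching the to name, and an unfinished stretch at the end only triggers a warning. The function name from_to is hypothetical.

```python
import warnings


def from_to(residues, start, end):
    """Collect every stretch from a residue named `start` through the next
    residue named `end`, inclusive. `residues` is a list of (name, type) pairs."""
    selected, in_stretch = [], False
    for name, rtype in residues:
        if not in_stretch and name == start:
            in_stretch = True
        if in_stretch:
            selected.append((name, rtype))
            if name == end:
                in_stretch = False
    if in_stretch:
        # Mirrors the documented behaviour: a warning, not an error.
        warnings.warn("residue stretch not finished at last residue")
    return selected


protein = [(str(i), "ALA") for i in range(1, 101)]
waters = [(str(i), "HOH") for i in range(1, 58)]
picked = from_to(protein + waters, "5", "15")
print(len(picked))                              # 22
print(sorted({rtype for _, rtype in picked}))   # ['ALA', 'HOH']
```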
residue string
Selects the residues with the given name (or number). The name may contain X-PLOR type wildcards or be a regular expression. Note that the residue name is left-shifted, with blanks squeezed out, when the coordinate file is read. This means that the chain identifier and insertion code, if any, are part of the residue name, even if they were separate in the input coordinate file.
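To make the squeezing concrete, here is a Python sketch that builds such a name from a PDB ATOM record. It assumes the standard PDB column layout (chain identifier in column 22, residue number in columns 23-26, insertion code in column 27) and that exactly these fields are concatenated; both helper names and that assumption are illustrative, not taken from MolScript's source.

```python
def residue_name(pdb_line: str) -> str:
    """Chain ID + residue number + insertion code, with all blanks removed."""
    field = pdb_line[21:27]        # columns 22-27 (0-based slice 21:27)
    return "".join(field.split())  # squeeze out blanks, leaving it left-shifted


def atom_record(chain: str, res_seq: int, icode: str = " ") -> str:
    # Hypothetical helper: an ATOM record truncated after column 27.
    return f"ATOM  {1:5d} {' CA ':4s} {'ALA':3s} {chain:1s}{res_seq:4d}{icode:1s}"


print(residue_name(atom_record("A", 12)))         # A12
print(residue_name(atom_record(" ", 100, "A")))   # 100A
print(residue_name(atom_record("B", 52, "C")))    # B52C
```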
type string
Selects the residues with the given type. The type may contain X-PLOR type wildcards or be a regular expression.
chain string
Selects the residues with the given chain identifier. Note that this identifier is currently a single character, if present at all. The new (v2.0) PDB format segment identifiers have not been implemented yet.
contains atom-selection
Selects the residues that contain the given atoms.
amino-acids
This residue selection is shorthand for the following selection expression:
either type ALA, type SER, type THR, type GLY, type PRO, type CPR, type ASN, type GLN, type ASP, type GLU, type ASX, type GLX, type ARG, type LYS, type HIS, type PHE, type TYR, type TRP, type TRY, type VAL, type ILE, type LEU, type MET, type CYS, type CSH, type CYH or type CSM
All standard three-letter codes for amino acid residues are recognized, as well as some non-standard ones: CPR for cis-proline, ASX for undetermined ASN or ASP, GLX for undetermined GLN or GLU, TRY for tryptophan, and CSH, CYH and CSM for cysteine.
waters
This residue selection is shorthand for the following selection expression:
either type H2O, type HHO, type OHH, type HOH, type OH2, type SOL or type WAT
At least some of the commonly occurring residue type designations for water molecules are covered by this expression.
nucleotides
This residue selection is shorthand for the following selection expression:
either residue A, residue +A, residue C, residue +C, residue I, residue +I, residue G, residue +G, residue T, residue +T, residue U or residue +U
This covers the common nucleotide bases as well as modified variants of these bases designated according to the PDB conventions.
ligands
This residue selection is shorthand for the following selection expression:
not either amino-acids, waters or nucleotides
All residues which are neither amino acids, waters nor nucleotides are selected by this expression.
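The four shorthands can be mirrored as plain set lookups. The Python sketch below follows the expressions above literally, including the fact that amino-acids and waters test the residue type while nucleotides tests the residue name; exact matching is assumed, and the function name and example residues are hypothetical.

```python
AMINO_ACIDS = {
    "ALA", "SER", "THR", "GLY", "PRO", "CPR", "ASN", "GLN", "ASP", "GLU",
    "ASX", "GLX", "ARG", "LYS", "HIS", "PHE", "TYR", "TRP", "TRY", "VAL",
    "ILE", "LEU", "MET", "CYS", "CSH", "CYH", "CSM",
}
WATERS = {"H2O", "HHO", "OHH", "HOH", "OH2", "SOL", "WAT"}
NUCLEOTIDES = {"A", "+A", "C", "+C", "I", "+I", "G", "+G", "T", "+T", "U", "+U"}


def is_ligand(res_name: str, res_type: str) -> bool:
    """not either amino-acids, waters or nucleotides"""
    return not (res_type in AMINO_ACIDS or res_type in WATERS or res_name in NUCLEOTIDES)


residues = [("A5", "ALA"), ("W1", "HOH"), ("G", "G"), ("A301", "HEM")]
print([name for name, rtype in residues if is_ligand(name, rtype)])   # ['A301']
```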
Comparisons between the given atom names, residue types and names, and molecule names in the various selection expressions and those present in the coordinate data follow certain rules:
The comparison is case-sensitive: Tyr is not equal to TYR.
If the parameter controlling regular expressions is off, then MolScript allows X-PLOR (Brünger 1992) type wildcard characters in the given strings. If the value is on, then the given string is treated as a proper regular expression.
Examples of X-PLOR type wildcards:
atom *      all atoms
atom N*     all nitrogen atoms (and sodium, neon, niobium, ...)
atom %G*    all gamma (G) atoms: CG, OG, OG1, SG (and possibly others)
type T*     residue types THR, TRP and TYR (and possibly others)
type T%R    residue types THR and TYR
If the coordinate file contains '*' in atom names (nucleic acids in PDB files), then these are converted into single quotes (') while the file is read. If your coordinate file contains '*' in residue names or types, or '%', '#' or '+' characters anywhere, then you must use a proper regular expression.
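The examples above are consistent with '*' matching any run of characters and '%' matching exactly one character. Under that assumption (the '#' and '+' wildcards are not modelled), a Python sketch of case-sensitive, whole-name matching can translate a pattern into an anchored regular expression. The function names are hypothetical.

```python
import re


def xplor_to_regex(pattern: str) -> str:
    # Assumption: '*' = any run of characters, '%' = exactly one character.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "%":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return "".join(parts)


def xplor_match(pattern: str, name: str) -> bool:
    # Whole-name, case-sensitive comparison: "TYR" does not match "Tyr".
    return re.fullmatch(xplor_to_regex(pattern), name) is not None


print([t for t in ["THR", "TRP", "TYR", "Tyr"] if xplor_match("T%R", t)])   # ['THR', 'TYR']
print([n for n in ["CA", "CG", "OG1", "SG", "G"] if xplor_match("%G*", n)]) # ['CG', 'OG1', 'SG']
print(xplor_match("TYR", "Tyr"))                                           # False
```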
The regular expression syntax follows that of regexp, except that the "r{m,n}" feature is not supported:
^        beginning of line
$        end of line
.        any character
\<       beginning of word
\>       end of word
[str]    any character in str
[^str]   any character not in str
[x-y]    any character between x and y (ASCII order)
*        any number of the preceding expression
c        the character c, where c is not special
\(r\)    the regular expression r
Caveat: The above description may contain errors, since the source code used for this feature was not well documented. It has also not been tested properly.