SFALL (CCP4: Supported Program)
NAME
sfall
- Structure factor calculation and X-ray refinement using forward and reverse FFT
SYNOPSIS
sfall
[
XYZIN
foo.brk
] [
MAPIN
foo.map
] [
HKLIN
foo.mtz
] [
MAPOUT
foo_out.map
] [
HKLOUT
foo_out.mtz
] [
XYZOUT
foo_out.brk
] [
GRADMAT
foo.sfmat
]
[Keyworded input]
CONTENTS
- DESCRIPTION
- Generate atom map
- Calculate structure factors
- Generate X-ray gradients for
refinement
- Space Groups
- KEYWORDED INPUT
- MEMORY ALLOCATION
- INPUT AND OUTPUT FILES
- Notes on Formfactors
- Notes on XYZIN, Atom names, etc
- Notes on HKLIN
- Notes on MAPIN
- Output from
Structure Factor Calculation
- Notes on XYZOUT
- Notes on HKLOUT
- Notes on MAPOUT
- Output from the Gradients Calculation
- notes on GRADMAT
- Output from
the Calculation of the Optimum Shifts
- Notes on stepsize
- COMMON PROBLEMS
- EXAMPLES
- Calculate map
from atom coordinates
- Read
in coordinates to calculate structure factors
- Read in
map to calculate structure factors
- Iteration cycles to do
back-transformation and phase
combination cycles in solvent flattening procedure
- Unrestrained refinement
- Prolsq refinement cycles Please note: PROLSQ
is no longer available as refinement program in the CCP4 Suite.
- Simple Runnable Examples
- AUTHORS
- PROGRAM FUNCTION
- Modelling of the electron density
and calculation of
structure factors
- Calculation of the gradients
- The Normal Matrix
- Calculation of the shifts
- REFERENCES
- SEE ALSO
DESCRIPTION
This program calculates structure factors and X-ray gradients for refinement
using inverse and forward fast Fourier techniques.
If coordinates are input,
it builds an `atom map' on a suitable grid over the asymmetric unit of
the spacegroup. Atomic density is distributed as a
Gaussian centred on the atom position. The value of the Gaussian
depends on the distance from the atom centre, the atom type, and
temperature factor. Its value
is calculated at each grid point within a given radius for
each atom, and the values are summed to give an `atom map'. This
`map' can then be used for an inverse fast Fourier transform, which
gives the structure factor. It is important that the grid is fine
enough to sample the atomic Gaussians accurately - the accuracy of the
structure factor estimates depends on this. I recommend using a grid
which gives a sampling of ~0.5Å.
X-ray gradients are estimated by `differential synthesis'. A difference map
is calculated, and this is convoluted with the atomic density.
By default, SFALL scales Fobs to Fcalc. This then lets you do
maps on (or close to) an absolute scale if the map comes from
coordinates. If you are back-transforming a map, e.g. in
solvent-flattening or averaging, then the map isn't on an absolute
scale anyway, and so this scaling is not particularly sensible
(although not usually harmful). Note that in iterative procedures, the
Fobs you bring in to SFALL on each cycle should be the original Fobs,
not the one carried through from the previous cycle (which may have been
scaled, selected, or otherwise corrupted).
In this case, it is probably more sensible to use the NOSCALE option
in SFALL, then either (i) use SIGMAA to generate Partial or Combined
Fourier coefficients for the next map (if you want to weight them);
or (ii) use RSTATS to scale Fcalc to Fobs.
For a description of the original method see [1]. A modification in
calculating the gradients suggested by Alain Lifchitz is used.
A list of common problems (with possible
solutions) can now be found below.
1) Generate atom map
Input:
i) XYZIN
-
Coordinates (Brookhaven format with CRYST1 and SCALEi lines).
Output:
i) MAPOUT
-
Atom map. Various modifications can be made to this map:
- a)
-
The `solvent' map. This map has zero at all points
where there is atomic density, and a constant value
for all points where there was no atomic density.
It is used for calculating structure factors for
`bulk solvent' (see ICOEFL for ideas on using this);
or for use as a mask.
- b)
-
A tagged `atom map' where each grid point is tagged
with a flag to say which residue, or which atom
number, contributed most to it. This is used for
analysing map correlations, and real space R-factors.
2) Calculate structure factors
By default a scale factor is calculated between Fobs and Fc and is applied
to the output Fobs. If this is not wanted then the keyword NOSCALE must be
used.
Input:
i) either coordinates or a map (compulsory)
-
If map is input it can be `truncated' to eliminate very high or low density.
ii) HKLIN (optional)
-
This must contain: H K L FP SIGFP [ FREE FPART PHIP ...]
It MUST be sorted on h k l, and be in the CCP4 chosen
asymmetric unit. See NOTES ON HKLIN for further details.
-
The FREE value is generated by FREERFLAG or in X-PLOR and transferred to MTZ
format using F2MTZ. See the FREERFLAG documentation. (Note that X-PLOR 3.1
only assigns flags of 0 or 1.) The keyword FREERFLAG allows reflexions with a
given FREE value to be excluded from refinement and Fc calculations.
-
IF FPART and PHIP have been assigned, the structure factor output
will equal the scaled FPART vector plus the contribution calculated
from the input plus the scaled FPART vector.
Output:
i) HKLOUT containing: H K L ... FC PHIC (WANGWT)
-
(If you are preparing the 'averaged map' for generating the Wang envelope,
(flagged by keyword AVMODE) the WANGWT will be calculated as well as the
structure factor modified for use in solvent flattening.)
ii) MAPOUT - The atom `map' (optional).
-
You can also keep the atom map if you want to.
3) Generate X-ray gradients for refinement
These are used for minimisation of X-ray differences between FP and
FC. This calculation uses the forward FFT.
Input:
i) XYZIN
-
Coordinates (Brookhaven format with header).
ii) HKLIN
-
-
Containing: H K L FP SIGFP [ FREE ]
When FREE is assigned then a subset of reflections is excluded from the
gradient calculation (see above for details). Hence the agreement between
them and Fc plays no part of the refinement procedure. The Free R factor
calculated for these reflections is a useful indicator of the quality of the
refinement [4]. It must be stressed that during a particular refinement
procedure only one freeR set of reflections should be used. It is meaningless
to change the freeR set in the middle of refinement. However, with several
freeR sets, it is possible to compare the same refinement procedure against
different freeR sets. It is still possible for all the freeR sets to be
unsuitable e.g. if the NCS and the crystal symmetry are related, as they often
are, there may be special relationships between classes of reflections, and
this may mean that the FREE set is not truly independent of the other
observations.
Output:
i) XYZOUT
-
A coordinate file in which the least squares shifts have been applied.
ii) GRADMAT
-
A direct access file containing information about the X-ray gradients,
ready for input into PROLSQ (no longer available in the CCP4 Suite).
Further details of the specification of input and output files is given in
section INPUT AND OUTPUT FILES.
See description of keyword MODE and examples for uses for the various options.
Space Groups
SFALL has special subroutines to handle the following spacegroups:
P1 P21(4) P21212(1018 - alt. origin)
P212121(19) P4122/P4322(91/95) P41212/P43212 (92/96)
P31/P32(144/145) P3/R3(143/146) P3121/P3221(152/154)
P61/P65 (169/170)
It is possible to use the P1 version (or any suitable lower symmetry)
for the calculations. In particular, any non-primitive spacegroup can
be run in the appropriate primitive one. See keyword SFSGROUP
for details.
BUT BEWARE: To use SFSG P1 you must have your reflections
sorted with l>=0.
This is NOT the default for P3121/P3221 or P6i22.
You would be sensible to run CAD with the keyword:
`OUTLIM SPACEGROUP P1' before calculating structure factors.
The crystal symmetry will be used to generate atom positions from a
unique molecule.
See NOTES ON HKLIN for special requirements for the HKLIN file when using
a lower symmetry version.
KEYWORDED INPUT
All input is keyworded. Only the first 4 characters of a keyword are
significant.
The keyworded records can be in any order.
Anything on a line after an ! or # is ignored and
lines can be continued by using a minus sign.
The possible keywords are:
- Compulsory
MODE
- Optional
AVMODE,
BADD,
BINS,
BRESET,
CELL,
CHAIN,
EDEN,
END,
FORMFACTOR,
FREERFLAG,
GRID,
H2OBG,
LABIN,
LABOUT,
NAME,
NOSCALE,
OMIT,
RATIO,
RESOLUTION,
RMSSHIFT,
RSCB,
SCALE,
SFSGROUP,
SIGMA,
STEPSIZE,
SYMMETRY,
TITLE,
TRUNCATE,
VDWR,
VERBOSE,
WEIGHT.
N.B. Some of these are compulsory for some MODEs.
DESCRIPTION OF KEYWORDS
MODE [ ATMMAP ... | SFCALC ... | SFREF ... ]
This keyword controls the course of the calculation.
- ATMMAP [ RESMOD ] [ ATMMOD ] [ SOLVMAP ]
-
Calculating `atom maps' from coordinates. The map extends over
the asymmetric unit of the appropriate space group (which may be
different from the true spacegroup, see keyword SFSGROUP for
possible groups). N.B. XYZIN and MAPOUT must be assigned.
-
Subsidiary keywords:
- RESMOD
-
output `atom map' values tagged by a residue number flag.
- ATMMOD
-
output `atom map' values tagged by a atom number flag.
- SOLVMAP
-
output `solvent map' with value specified by the H2OBG keyword
(default=0.03) at all points not occupied by atom density.
All points occupied by atom density set to 0.0.
- SFCALC ...
-
Structure factor calculation.
-
Subsidiary keywords:
- XYZIN or MAPIN (one essential)
-
Define input type, either coordinates or a map.
N.B. XYZIN or MAPIN must be assigned.
- followed by:
-
- HKLIN (optional)
-
Only reflections which exist in the HKLIN file
are output to HKLOUT.
The output FC PHIC are appended to input H K L FP SIGFP...
See keywords LABIN and LABOUT for further details.
N.B. HKLIN and HKLOUT must be assigned. See NOTES ON
HKLIN for the requirements placed on the HKLIN file.
- ATMMAP or SOLVMAP (optional)
-
Either an `atom map' or a `solvent map' is output. The map extends over
the asymmetric unit of the space group used in the structure factor
calculation (see keyword SFSGROUP). Note that for non-primitive
space groups, only density for symmetry mates corresponding to primitive
symmetry operations is output; use MODE ATMMAP to obtain a complete map.
N.B. MAPOUT must be assigned.
- SFREF
-
Coordinate and/or B-factor refinement; will calculate least squares
gradient and shifts. N.B. XYZIN and HKLIN must be assigned.
-
Subsidiary keywords:
- MAPIN
-
(This option is exceedingly unlikely to be needed.)
Gradients will be taken from an input difference map.
Usually this difference map is generated internally from the structure
factors calculated from the atomic coordinates. However, in some
special situations, e.g. if you had averaged a difference map earlier,
you may want to input it rather than use the internally generated one.
- RESTRAINED
-
Atom parameter gradients calculated from the fit of the
X-ray data are output to GRADMAT, to be combined with
geometric restraints calculated by PROTIN and PROLSQ (no longer available in the CCP4 Suite).
- UNRESTRAINED
-
Parameter shifts to be applied without restraints;
a file will be written to XYZOUT.
-
If followed by:
- XYZ
-
refinement of the X Y and Z parameters will be done.
- XYZB
-
refinement of the XYZ and B parameters will be done.
- B
-
refinement of the B parameters will be done.
Only possible with the UNRESTRAINED option.
- Example 1: MODE ATMMAP
-
An electron density map is created from the atomic coordinates.
- Example 2: MODE ATMMAP RESMOD
-
An electron density map is created from the atomic coordinates
with an extra residue identifying flag added to the atomic density;
this is 100*IRES+MCHFLG (If there are several chains in your protein you
will need to describe the chain limits using the keyword CHAIN to avoid
the same flag being given to atoms from different chains). This allows
OVERLAPMAP to analyse the correlation coefficient between 2 maps
or the `real space R-Factor' for contributions from each residue.
- Example 3: MODE ATMMAP ATMMOD
-
An electron density map is created from the atomic coordinates
with an extra atom identifying flag added to the atomic density;
this is 100*atom_number.
This allows OVERLAPMAP to analyse the correlation coefficient between 2 maps
or the `real space R-Factor' for contributions from each atom
(This was used to see if the hydrogen or deuterium
atoms in a neutron study could be distinguished in the electron density map).
- Example 4: MODE SFCALC XYZIN HKLIN [ ATMMAP ]
-
A structure factor calculation is performed using the input coordinates.
HKLIN is read, and the FP are scaled to FC.
HKLOUT contains at least H K L scaled_FP scaled_SIGFP FC PHIC
(You can output unscaled FP by using the keyword NOSCALE).
An R-factor is calculated.
If keyword ATMMAP is given the electron density map created from the atomic
coordinates is saved on MAPOUT.
- Example 5: MODE SFCALC MAPIN HKLIN
-
A structure factor calculation is performed using the input map.
The map MUST cover the asymmetric unit expected for the SFSGROUP.
This will be true if you have calculated the map using FFT with the
FFTSPACEGROUP the same as the SFSGROUP and have NOT reset the AXIS
order. Details are given below.
HKLIN is read, and the FP are scaled to FC.
HKLOUT contains at least H K L scaled_FP scaled_SIGFP FC PHIC
(You can output unscaled FP by setting keyword NOSCALE).
An R-factor is calculated.
- Example 6: MODE SFCALC XYZIN (ATMMAP)
-
A structure factor calculation is performed using the input coordinates.
Since no HKLIN is read, the SYMMETRY you wish to use for the structure
will have to be given. The default is to use the symmetry of the SFSGROUP.
Output HKLOUT containing h k l FC PHIC (WANGWT if AVMODE set)
for all reflections within the requested resolution limits.
-
If keyword ATMMAP is given the electron density map created from the atomic
coordinates is saved on MAPOUT.
- Example 7: MODE SFCALC MAPIN
-
A structure factor calculation is performed using the input map.
The map MUST cover the asymmetric unit expected for the SFSGROUP.
This will be true if you have calculated the map using FFT with the
FFTSPACEGROUP the same as the SFSGROUP and have NOT reset the AXIS
order.
- Example 8: MODE SFREF XYZ (or XYZB) RESTRAINED
-
Used for restrained refinement of XYZ coordinates or XYZ coordinates and
B factors. It is used in conjunction with PROTIN and PROLSQ (no longer available in the CCP4 Suite).
First a structure factor calculation is done, the FP scaled and an R-factor
calculated.
Then the program calculates the normal matrix and gradient contributions for
the parameters to be refined.
Gradients of atomic shifts are output to GRADMAT.
Reference: R. Agarwal, A. Lifchitz, J. Konnert, W.A Hendrickson [6].
- Example 9: MODE SFREF XYZ (or XYZB or B) UNRESTRAINED
-
Used for unrestrained refinement of XYZ coordinates or XYZ coordinates and
B factors or Bfactors alone.
First a structure factor calculation is done, the FP scaled and an R-factor
calculated.
Then the program calculates the normal matrix, gradient contributions and
shifts for the parameters to be refined.
The shifts are analysed using arguments from STEPSIZE, RATIO and a set of
modified coordinates output to XYZOUT.
Reference: R. Agarwal, A. Lifchitz [6].
SFSGROUP <sfsgroup>
<sfsgroup> is the spacegroup number or name of the spacegroup used for the
calculation. You don't normally need to specify this as a sensible
default will be used (the highest symmetry one the program can use
which is compatible with your data) as printed in the output.
N.B. - This IS NOT NECESSARILY the spacegroup of your structure. E.g., you may
well be calculating structure factors in P1 for a structure with spacegroup
F43212. The program will check whether the requested <sfsgroup> is compatible
with your data.
The possibilities for <sfsgroup> are: P1, P21 (4), P21212 (1018 -
alternative origin), P212121 (19), P4122/P4322 (91/95), P41212/P43212
(92/96), P31/P32 (144/145), P3/R3 (143/146), P3121/P3221 (152/154), P61/P65
(169/170).
Any spacegroup can be run in an appropriate one whose symmetry
operators are a subset of the one of interest (P1 can always be used
as long as the hkl data are sorted with l>=0).
By judiciously shifting origins many spacegroups become supergroups.
$CLIBD/symop.lib contains the modified symmetry operators for some spacegroups.
You may move the origin of your coordinates with PDBSET.
For example:
-
SYMGEN X + 1/4, Y + 1/4, Z
LABIN <program label>=<file label> ...
Note: This is essential for all refinement and if HKLIN is assigned
for a structure factor calculation.
The following labels can be assigned.
H K L FP SIGFP FREE FPART PHIP
PHIP F0 F1 F2 F3 F4 F5 F6
F7 F8 F9 F10 F11 F12 F13 F14
F15 F16 F17 F18 F19
Assignments for FP and SIGFP are always required.
If the free R flag is assigned to FREE, a subset of
reflections can be excluded from the calculation or refinement. See the
keyword FREERFLAG.
If you wish to add a known FPART to the structure factors, assign FPART
and PHIP. This is useful if you want to calculate hydrogen contributions,
add in an anisotropic heavy atom, add in a bulk solvent correction, which
has been calculated somewhere else.
The FPART can be scaled relative to the FC by inputting a SCALE.
If you wish to copy some but not all the other columns from your input
to your output file,
you may assign F0 F1 ... F19. These will be transcribed to the output file.
If you want to copy everything over, use ALLIN on the LABOUT line.
LABOUT <program label>=<file label> ... [ ALLIN ]
<program labels> can be given for FC PHIC (and WANGWT, if required).
Any column assigned on LABIN will be copied to the output file.
You can change that file label with LABOUT if you absolutely have to,
but it is unwise!
ALLIN will copy all columns in the input file to the output file.
Beware: the program may get confused if a file label included in ALLIN is the
same as a program label.
OPTIONAL KEYWORDS:
AVMODE [ RADIUS <radius> ] [ WEIGHT <modewt> ]
When you are preparing the 'averaged map' for generating the Wang envelope,
this sets the radius of the map sphere and the type of weighting.
Subsidiary keywords:
- RADIUS <radius>
-
Specify the <radius> of the averaging sphere (angstroms)
- WEIGHT <modewt>
-
- <modewt> = 1
-
Use weighting scheme w=1-(r/R) (Wang's method)
- <modewt> = 2
-
Use weighting scheme w=1-(r/R)**2
BADD <badd>
A parameter added to all atomic B values. It will smear the Gaussians used to
generate the atomic density over a larger volume. L.Ten Eyck suggests this
should allow a coarser sampling grid. G. Bricogne is sceptical and certainly
it introduces serious errors into the B12 coenzyme structure factors at 1Å
resolution (ref Hugh Savage). EJD shares the scepticism.
The default value (0.0) gives reasonable results for most problems.
BINS <nbins>
<nbins> (default 50) is the number of bins for the analysis of
R-factor v. resolution.
Ranges of 4sin**2/Lambda**2 are of width 1.0/<nbins>.
For high resolution data this needs to be reset.
BRESET <bmin>
B-factors less than <bmin> (default bmin=5.0) are set equal to <bmin>.
For high resolution data this needs to be reset.
CELL <a> <b> <c> [ <alpha> <beta> <gamma> ]
Input cell parameters explicitly.
This is checked against the cell read from the MTZ file and that read from the
PDB file. The program will continue with a warning if there is a small
inconsistency and stop if there is a serious one. <alpha> <beta>
<gamma> default to 90 degrees.
CHAIN <chnlab> <iresn> <iresc>
Only needed for MODE ATMMAP RESMOD if your coordinate file has
more than one chain. You will need one CHAIN card for each chain.
- <chnlab>
-
single character label (e.g. A)
- <iresn>
-
first residue number
- <iresc>
-
last residue number
EDEN <escale> <cmp> <cmn>
Done for Cecil Tate.
The input map is scaled by:
xscal=escale/nx*ny*nz
and truncated.
if density(x,y,z)<0
density(x,y,z)=xscal*cmn*tanh(density(x,y,z)/cmn))
if density(x,y,z)>0
density(x,y,z)=xscal*cmp*tanh(density(x,y,z)/cmp))
END
This ends input.
FORMFACTOR [ NEUTRON | NGAUSS <ngauss> ] <atnam1> <atnam2> ...
See the NOTES ON FORMFACTORS.
SFALL has hardcoded X-ray and neutron formfactors available for
atom types H C N O S.
(Default CuKa X-ray formfactors.)
Other atom types are tabulated in $CLIBD/atomsf.lib (X-ray)
and $CLIBD/atomsf_neutron.lib (neutron).
If your atom identifier is not recognised (i.e. it is not C, N, O, or S)
the program will read atomsf.lib and take the FIRST table which matches
your atom ID. This may not be satisfactory if your atom type is Fe+3, say,
and you may wish to specify the formfactor yourself.
Subsidiary keywords:
- NEUTRON
-
Use neutron formfactors.
Remember to assign ATOMSF $CLIBD/atomsf_neutron.lib to read the
correct library of formfactors.
or
- NGAUSS <ngauss>
-
<ngauss> (default 2) is the number of Gaussian terms requested for
the default atoms H C N O S. The only possible values are 2 or 5.
- <atnam1> <atnam2> ...
-
Atom names for any other atom types used in XYZIN.
The atom names MUST exactly match the entries in
ATOMSF library.
Example:
FORM NEUTRON P Co+3 D
FORM NGAUSS 5 Fe+3
FREERFLAG <freeflag>
Reflections where the value of the column FREE, taken from the MTZ
file, equals <freeflag> (default 0) are omitted from the refinement or
FC calculation and their R-factor is monitored separately. If you do
not wish a free-R set to be used don't assign FREE.
GRID <nx> <ny> <nz>
<nx>, <ny>, <nz> are integers giving the number of divisions along each whole
unit cell edge.
The atomic density is sampled on this grid, so if it is too coarse you
will introduce errors into the structure factor calculation.
These default to values near to celledge*2, which defines
a grid spacing of 0.5Å.
Different spacegroups have special requirements for factorisation.
No prime factors higher than 19 are permitted
(The `atom map' generated on this grid has the asymmetric unit defined for
this spacegroup. See table below.).
The following general restrictions must be observed:
<nx> >= 2 * HMAX + 1
<ny> >= 2 * KMAX + 1
<nz> >= 2 * LMAX + 1
In addition there are further space group dependent
conditions as follows (where `n' is an integer, i.e. <nz> must be
even, for instance):
<nx> <ny> <nz>
P1 - 2n 2n
P21 2n 4n 2n
P21212 4n 4n 4n
P21212a '' '' ''
P212121 '' '' ''
P4122/P4322 4n 4n 8n
P41212/P43212 '' '' ''
P31/P32 6n 6n 6n
P3/R3 '' '' ''
P3121/p3221 '' '' ''
P61/P65 6n 6n 12n
H2OBG <h2obg>
Bulk water scattering to fill in all empty regions. <h2obg>
(default 0.0) is the value
which will fill the SOLVMAP.
ICOEFL uses this.
NAME PROJECT <pname> CRYSTAL <xname> DATASET <dname>
When a new MTZ file is being created, i.e. for MODE SFCALC with no HKLIN assigned,
this keyword is used to set up an appropriate project/crystal/dataset for the
output MTZ file. This is strongly recommended, otherwise the LABOUT columns
will be assigned to a default dataset with meaningless (for you) names.
NOSCALE
Scale between FP and FC is calculated but not applied to output
FP and SIGFP.
OMIT <th>
Reflections are not included in the refinement if mod(Fo/Fc)><th> or
mod(Fc/Fo)><th>. This provides a means of excluding data with bad
agreement from the refinement and it should be used with
care (Default: 1000).
RATIO <ratio>
Large xyz and B shifts are truncated to <ratio>*rms shift.
This is only used for UNRESTRAINED refinement.
At the beginning of a refinement, the value of ratio should be kept fairly low
(1.5 - 2.0) as shifts tend to be large when the errors in the parameters are
large. When most atoms are well refined, then the ratio may be increased to 3
or 4, to allow for the refinement of these atoms which are poorly placed.
RESOLUTION <dmin> [ <dmax> ]
<dmin>, <dmax> (default 1, 1000) are resolution limits in Angstroms
(or in 4*sin**2/l**2 if both are <1.0. They can be given in either order.
If only one value is given, it is assumed to be the high resolution cutoff.
The SFCALC option outputs FC for all reflections to the outer limit
(You usually calculate SFs to make maps and it is best to calculate
all SFs and then limit resolution etc. in FFT).
Default values are overwritten from HKLIN to include all reflections.
RMSSHIFT [XYZ] <rmsxyz> [B] <rmsb>
The shifts are scaled to limit the rms XYZ shift and B shift to be equal to the
set value, or to the calculated one, whichever is smaller. Only useful during
unrestrained refinement. Equivalent to setting STEPSIZE. If RMSSHIFT is given
STEPSIZE will be ignored.
Default values: 0.0 0.0.
RSCB <dsmin> <dsmax>
in Angstroms (either order) or 4s**2/lambda**2. Default values: 4.5Å
and 1.0Å.
If appropriate the program refines the value of the scale and
overall B value to fit <Fo**2> to <Fc**2> using reflections within
this resolution range. The default resolution range is chosen to use
only those reflections where the <Fo**2> distribution fit Wilson statistics
reasonably well. A standard protein distribution shows a sharp 5Å dip,
which can distort the scale factors. <Fc**2> is often unrealistically
high for low resolution data, until the solvent is adequately modelled.
The lack of `solvent contrast' causes this.
Users are often worried because they get lower R-Factors if
this scale is fitted for all data. This does not necessarily mean such a
scale is more correct!
The default HKLOUT will have:
-
H K L Scale*FP Scale*SIGFP ... FC*exp(-overall_B*ss) PHIC
If NOSCALE is specified HKLOUT will have:
-
H K L FP SIGFP ... FC*exp(-overall_B*ss) PHIC
After the end of each structure factor calculation a new scale factor
is calculated from the following expression, where
SCNEW is the new scale factor and SCALE is the previous scale factor.
Sum {(SCALE * Fcalc) * (Fobs)}
SCNEW = SCALE * --------------------------------------
Sum {(SCALE * Fcalc) * (SCALE * Fcalc)}
SCALE [FPART] <scale> <bov>
These parameters are applied to the Fpart as <scale>*exp(-<bov>*ssq/ll)
(Defaults: <scale>=1, <bov>=0).
Some structure factor outputs (notably SHELX) include an Fcalc scaled
as 10*absolute.
SFALL calculates FC on the absolute scale. If you wanted to combine these
values you would need
SCALE FPART 0.1
SIGMA <sigma>
Data with Fobs < sigma * sd(Fobs) will not be included in refinement
calculations. This cutoff has no effect on structure factor calculations.
Do not be alarmed when your R-Factor appears to be higher for them than when
you are refining; you will have more reflections in the structure factor sum.
Default value: -1.0, i.e. no cutoff.
STEPSIZE <szo> <szb> <szfrc>
This is only used during UNRESTRAINED refinement.
XYZ shifts are multiplied by <szo>, B shifts are multiplied by <szb>.
<szfrac> is the fractional change in step size above which a second
calculation is done.
NOT used if RMSSHIFT is given. Default values: 1.0 1.0 0.3.
The program tries to estimate these using an algorithm of R. Agarwal's.
See NOTES ON STEPSIZE.
If you are doing unrestrained refinement, structure factor calculation
will be repeated if abs(initial step - optimum step)/(initial step).
SYMMETRY <sg>
(Defaults to the symmetry of the SFSGROUP).
<sg> is the spacegroup number or name.
If HKLIN is assigned the symmetry will be picked up from the MTZ file
header and should NOT be specified with this keyword.
If HKLIN is not assigned you may need to give this.
See examples below.
TITLE <title>
Title (up to 80 characters) used on the printer output.
The title information after the keyword is used as a heading for map sections
and also as title information for the output reflection data file if present.
TRUNCATE <rhmin> <rhmax>
The input map is truncated - all values <<rhmin> are set to <rhmin>,
and all values ><rhmax> set to <rhmax>.
This should be used for the solvent flattening procedure.
VDWR <dlim>
<dlim> helps specify the maximum radius for which an atom is considered to
contribute to the electron density while building the `atom map'.
The initial value is modified by a factor Sqrt(B+25)/2PI to "spread" the density
for atoms with high B factors.
Default: 2.5.
The speed and accuracy of the calculation depend to some extent on the
values chosen. For 1.5Å data a value of 2.5 is sufficient (a maximum
value of 3.0 is allowed). For neutron data, or for atoms with low B
values, it could safely be reset to 1Å.
VERBOSE
Lots and lots of output - useful if debugging. Please note that this
may produce a message telling you that atom names have been set to ZZZ
or residue names to DUM if you have non-standard names. This is not
important as the atomic number is determined from the original atom
type. See Notes on Formfactors and Notes on Atom Names below.
WEIGHT <w>
The weight (default 0.0) applied to each structure factor is given by
(2sintheta/lambda)**<w>
(See [1]). At the beginning of the refinement use <w>=-1.5 or -1.0
to put most weight on the low angle data. As the refinement proceeds increase
<w> to 0.0 (But actually we never use it...).
MEMORY ALLOCATION
The program allocates a large work array, whose size may be changed
from the default by setting the logical name MEMSIZE to an appropriate
integer value. The current value is printed in the output (look for
`Memory allocation'). There are other dimensions which can currently
only be changed by recompilation and the error messages are currently
not very clear about changing the MEMSIZE.
INPUT AND OUTPUT FILES
The following input and output files are used by the program (in addition to
some scratch files which are deleted at the end of program execution):
Input Files:
- XYZIN
-
Input Coordinates File (Brookhaven format)
- HKLIN
-
Input Reflection Data File (MTZ format)
- MAPIN
-
Input Map File (standard map file format)
Output Files:
- XYZOUT
-
Output Coordinate File (Brookhaven format)
- HKLOUT
-
Output reflection data file in standard MTZ format
- MAPOUT
-
Output map file (Standard map file format)
- GRADMAT
-
Output matrix file for Hendrickson Konnert program
The files used in a single run of the program depend on the options requested
via the data control keywords.
NOTES ON FORMFACTORS
The form factors are read from the file assigned to ATOMSF.
This defaults to $CLIBD/atomsf.lib.
Coefficients for analytical approximation to the scattering factors for atom
identifier at a given resolution sintheta/lambda (=S) are given as:
a1*exp(-b1*s*s) + a2*exp(-b2*s*s) - 2 Gaussian approximation
(Agarwal, 1978)
or as:
a1*exp(-b1*s*s) + a2*exp(-b2*s*s) + a3*exp(-b3*s*s)
+ a4*exp(-b4*s*s) + c
- 5 Gaussian approximation
See [2] pp. 99-101 (Table 2.2B) or
[5] p. 500ff, table 6.1.1.4 for values a1-a4, b1-b4, c for X-rays.
For neutron scattering see [2] pp. 99-101 (Table 2.2B).
The atom name can be used to assign a formfactor to it.
- WARNING 1)
-
The program distinguishes CA - the metal - from CA - main
chain atom in a protein by its position on the coordinate line... And older
versions of FRODO get this wrong! The rules on defining atom names are the
same as those used by PDB. The chemical symbol is specified in columns 13 and
14 and the symbol is RIGHT justified.
0 1
e.g. column 1 5 0 34
ATOM ... ZN is Zinc
ATOM ... U is Uranium
For the common atom types the program includes tables for form factors for the
required value of NGAUSS. They are C N O H and S. All others must be read from
the library files atomsf.lib or atomsf_neutron.lib which are automatically
assigned. These contain tables of formfactors for every known atom (Kim
Henrick monument).
atomsf.lib contains the five Gaussian form for X-ray
data and the two Gaussian form for atoms H C N O and S.
Unless you are working at very high resolution (>1.5Å) the two Gaussian
approximations are adequate and speed up the calculations somewhat.
atomsf_neutron.lib has the constant value needed for neutron scattering.
NOTES ON XYZIN, ATOM NAMES ETC.
Coordinates are assigned to XYZIN in Brookhaven format.
The CRYST1 card (which gives the cell dimensions) and the SCALEi cards (which
give the required transform for turning the coordinates into fractional
ones) should be included in XYZIN.
Cell parameters from the HKLIN file or CELL keyword (if either are present) will
be used in the absence of CRYST1 and SCALEi cards, but otherwise the results of
omitting the cards are somewhat unpredictable - in some cases the program may fail,
in others it may proceed but using random or inappropriate values for the cell
parameters.
If an atom has occupancy set =0.0 it will be copied to the output file if
appropriate, but not used in the calculation. This means it is easy to exclude
suspect atoms from any calculation without deleting them from the file.
The program does not make use of anisotropic U factors in the calculation
of electron density and structure factors, and so ANISOU records are ignored.
Example of part of XYZIN:
REMARK 2ZN refined coordinates
CRYST1 82.500 82.500 34.000 90.00 90.00 120.0 1 R3
SCALE1 0.01212 0.00700 0.00000 0.00000
SCALE2 0.00000 0.01400 0.00000 0.00000
SCALE3 0.00000 0.00000 0.02941 0.00000
ATOM 1 N GLY A 201 1 -8.863 16.944 14.289 1.00 21.88
.................................
ATOM 826 CA ALA D 130 4 -5.022 21.709 -11.876 1.00 15.58
ATOM 830 CA WAT A 249 1 -0.002 -0.004 7.891 0.33 10.40
ATOM 831 ZN2 WAT A 249 1 0.000 0.000 -8.039 0.33 11.00
ATOM 832 OW1 WAT A 249 1 0.000 0.000 -8.039 0.00 11.00
.................................
Atom 1 is Nitrogen, 826 is Carbon, 830 is Calcium, 831 is Zinc and
832 is Oxygen.
NOTES ON HKLIN
List of H K L .. FP SIGFP ( FPART PHIP F0 .. F19 )
An MTZ file with column labels corresponding to
H K L .. FP SIGFP [FREE FPART PHIP F0 .. F19]. The header will
contain the cell dimensions and spacegroup
of the structure (if you want to change these, use MTZUTILS).
For SFCALC this file MUST BE SORTED on h k l (i.e. so h is the slowest varying
index, then k and l is the fastest), and be within the asymmetric unit
expected by the program; see below for expected limits (CAD will reorder
your file if necessary). It is sensible to do this if your data has not been
processed by CCP4 programs. XDS, MADNESS etc. sometimes have different
default asymmetric units to the CCP4 suite.
# an example of sorting native data and checking for correct
# asymmetric unit
cad HKLIN1 $CEXAM/toxd/toxd HKLOUT toxd_sorted <<EOF
TITLE Toxd data sorted - default symmetry P212121
LABIN FILE 1 E1=FTOXD3 E2=SIGFTOXD3
CTYPE FILE 1 E1=F E2=Q
EOF
For SFREF this file MUST BE SORTED on h k l (i.e. so h is the slowest varying
index, then k and l is the fastest), and be extended if necessary to cover
the whole asymmetric unit expected by the program for the SF spacegroup; see
below for expected limits. If you are working at a lower symmetry than optimal
the data must have been extended and sorted (CAD will do this too).
There is a reason - sorting data is quite slow, and you will probably use
this file many times during the course of a refinement.
# an example of extending and sorting native data to P1 +-h,+-k,+l
cad HKLIN1 $CEXAM/toxd/toxd HKLOUT toxd_p1 << EOF
OUTLIM SPACEGROUP P1
TITLE Toxd data extended to P1 cell
LABIN FILE 1 E1=FTOXD3 E2=SIGFTOXD3
CTYPE FILE 1 E1=F E2=Q
EOF
The agreement factor analysis is carried out against the values of FP.
Sigma values are assumed to be in column SIGFP.
IF FPART and PHIP have been defined, the FC vector equals the FPART vector
plus the FC contribution calculated from the input. FPART can be scaled by
entering a suitable value on the SCALE input.
Other columns in the input file can be copied to the output file by assigning
them to F0= F1= ... Or all columns can be copied over if the ALLIN subkeyword
is given in LABOUT.
Note however that if the input file contains columns with labels that match the
program labels then SFALL will stop with a fatal error.
Limits for Indices for the various space groups (these are the same as those
used in the data reduction programs and FFT):
P1 h k l : l >= 0
h k 0 : h >= 0
0 k 0 : k >= 0
P21 h k l : k >= 0, l >= 0
h k 0 : h >= 0
P21212 h k l : h >= 0, k >= 0, l >= 0
P212121 h k l : h >= 0, k >= 0, l >= 0
P41212 h k l : h >= 0, k >= 0, l >= 0, h >= k
(+ other tetragonal)
P31/P32 h k l : h >= 0, k > 0
0 0 l : l > 0
P3/R3 h k l : h >= 0, k > 0
0 0 l : l > 0
P3121/P3221 h k l : h >= 0, k >= 0, k <= h (all l)
h h l : l >= 0
P61/P65 h k l : h >= 0, k >= 0, k <= h (all l)
h h l : l >= 0
NOTES ON MAPIN
A standard MAP file. The header will contain the cell dimensions and
spacegroup of the structure.
If you wish to generate structure factors from a input map it is essential that
the map is in an appropriate format for the SFALL spacegroup you request.
Common errors are:
1) using a different grid for the FFT calculation and the SFALL calculation.
FATAL!
2) Changing the axis order in the FFT calculation - both have the same defaults
for the same spacegroup! FATAL!
3) Choosing a different asymmetric unit for the FFT and the SFALL calculation.
You must make sure your map has the one which covers the SFALL requirements.
There is no flexibility here.
Limits for axes for the various space groups (these are the same as those
used as defaults in FFT):
In space group P1, P21 and P21212a, 'b' is taken as the unique axis.
In space group P21212a, 1/4 is subtracted from the X and Y values of the
equivalent positions given in International Tables.
X1 X2 X3 Range of X3 Axis order
P1 Z X Y 0 to Y Z X Y
P21 Z X Y 0 to Y/2-1 Z X Y
P21212a Z X Y 0 to Y/4 Z X Y
P212121 X Y Z 0 to Z/4 Y X Z
P4122 X Y Z 0 to Z/8 Y X Z
P41212 X Y Z 0 to Z/8 Y X Z
P4322 X Y Z 0 to Z/8 Y X Z
P43212 X Y Z 0 to Z/8 Y X Z
P31 X Y Z 0 to Z/3-1 Y X Z
P32 X Y Z 0 to Z/3-1 Y X Z
P3 X Y Z 0 to Z-1 Y X Z
R3 X Y Z 0 to Z/3-1 Y X Z
P3121 X Y Z 0 to Z/6 Y X Z
P3221 X Y Z 0 to Z/6 Y X Z
P61 X Y Z 0 to Z/6-1 Y X Z
P65 X Y Z 0 to Z/6-1 Y X Z
OUTPUT from Structure Factor Calculation
Note: The logfile has been flagged with XLOGGRAPH symbols to allow
selected output to be displayed on an Xterminal.
- (a)
-
The limits of the box used in modelling the electron density.
- (b)
-
A list, with coordinates, of the first 10 atoms included in
generating the model electron density map.
- (c)
-
The number of atoms input, the number of atoms in the sorted list of atoms,
and the number of atoms used in generating the electron density.
- (d)
-
The minimum, average and maximum values of the temperature factors for the
atoms.
- (e)
-
Minimum and maximum electron density values are printed for each section
of the generated density. The number of slabs
used depends on the space allocated for the work arrays in the program.
- (f)
-
The number of reflections used and the number of reflections on file.
- (g)
-
Old and new scale factors (with an overall temperature factor BOV if
temperature factor dependent re-scaling was applied).
- (h)
-
A structure factor summary table in ranges of 4sin**2theta/lambda**2
giving, for each range, the number of reflections, the average values of Fobs
and Fcalc, the reliability index and an average value of the weighted errors.
- (i)
-
The overall reliability index after re-scaling.
NOTES ON XYZOUT
Coordinates in Brookhaven format with the XYZ or B shifts applied - output of
unrestrained refinement.
The proportion of the shift to be applied can be modified (see the
description of keyword STEPSIZE).
If an input atom has occupancy of 0.0 it will be copied to the output file.
NOTES ON HKLOUT
The output file contains FC and PHIC
only for those reflections which exist in the HKLIN file, even if the structure
factor calculation has covered more of reciprocal space. The list will include
ALL reflections in HKLIN within the outer resolution limit. There is no inner
resolution cutoff, or sigma cutoff. Hence there may well be more reflections
included than during refinement and your R-factor may well be higher than the
one output during refinement. All columns specified in LABIN will be written
to HKLOUT. If ALLIN is specified then all input columns will be in HKLOUT.
If no HKLIN has been assigned the program outputs a list of H K L FC PHIC for
all possible reflections within the reciprocal asymmetric unit for the SFSGROUP
that are within the requested resolution limits.
FC and PHIC can be modified for use in solvent flattening (see keyword AVMODE).
NOTES ON MAPOUT
Several types of maps can be output - see keyword MODE for how to produce each
one.
- a)
-
The `atom map'. All density assigned at each grid point due to a
neighbouring atom.
- b)
-
The `atom map' modified to include a residue flag for each main chain and
side chain atom which has contributed most to this grid point. This allows the
map correlation program to produce a set of correlations residue by residue
between an ideal map (the `atom map') and any other map - part of real space
R-factor stuff.
- c)
-
The `atom map' modified to include an atom number flag. This can be used to
look at hydrogen density in a neutron map.
- d)
-
A `solvent map' with zero at all points within the protein and some constant
(H2obg) at all other points. This is a step on the way to adding in bulk
solvent to calculated structure factors.
OUTPUT from the Gradients Calculation - NOTES ON GRADMAT
A direct access file of gradient information is output ready for
input to PROLSQ (no longer available in the CCP4 Suite).
The following items are output in this section:
- (a)
-
The value of the minimisation function (PII). Plus the number of reflections
included in the calculation of PII.
- (b)
-
The average and peak values of the gradients and shifts.
- (c)
-
The slope of the search direction and the initial step size used in the
quadratic approximation to find the optimum step size.
- (d)
-
For unrestrained refinement a table of estimated error against B-factor is
given. For each temperature range there is the number of atoms and the
estimated error (standard deviation).
OUTPUT from the Calculation of the Optimum Shifts
This section gives PII (minimisation function) values, relative slopes,
calculated optimum step sizes and, when determined, the actual step size used
in applying the shifts.
NOTES ON STEPSIZE
R. Agarwal attempted to estimate the best step size to use during UNRESTRAINED
refinement.
A displacement of the form alphaDelta(u), which minimises the minimisation
function of the least squares P(u + alphaDelta(u)), will be the optimum where
alpha is the optimum step size. As P as a function of alpha approximates to
a quadratic function, it is possible to calculate the optimum step size,
alphaopt, from the values of P(alpha) at alpha = 0 and alpha = alpha1 and from
the slope of P(alpha) at alpha = 0. The initial step size alpha1 used in the
calculation is chosen as 1.0 or the value input in the control data. If the
fractional change in step size is greater than SZFRC (by default 0.3) then
a second calculation is done with alpha1 taken as the optimum step size just
calculated.
COMMON PROBLEMS
- Problem: Warning after opening the input reflection file HKLIN
about the sort order not being H K L.
Solution: This may not crash the program (sometimes it does) but even so
some of the output may be unreliable if the reflections in HKLIN have not been
sorted with h as the slowest, k as the intermediate, and l as the fastest varying
index. See NOTES ON HKLIN above for the
requirements placed on HKLIN.
- Problem: Program stops with the message:
MTZ file label duplicates program label name
Solution: SFALL should warn you before stopping, which labels are
duplicated, e.g.:
Warning: this may fail!
This column label FC is also a program label
The suggested workaround at present is to rename the offending file labels,
for example using a utility program such as SFTOOLS,
before rerunning the program.
- Problem: No conversion from orthogonal to fractional atom co-ordinates
(ie listed fractional co-ordinates have no value, or the same values as the
orthogonal co-ordinates).
or
Problem: Program stops with the message:
**** No Cell Input to subroutine rbfro1?? ****
after reading in XYZIN.
Solution: Check that CRYST1 and SCALE cards are present in the input
co-ordinate file XYZIN.(See NOTES ON XYZIN above).
- Problem: Program crashes or stops after reporting statistics on B's in
MODE SFCALC. (May also occur in other MODEs for the same reason.)
Solution: Check that the input reflection file has the correct asymmetric
unit and that the reflections have been sorted so that h is the slowest changing
index, k is the next, and l varies the fastest. (See NOTES
ON HKLIN above).
Nb: the program should now generate an explicit error message suggesting that this
is the problem.
- Problem: Program fails after printing the title and the line
"Chain Names - First Residue - Last Residue - serial Kcount" (possibly also
with error messages concerning array sizes).
Solution: You may have to increase the value of MEMSIZE (see
MEMORY ALLOCATION above).
N.B.: the program should now generate an explicit error message suggesting that this
is the problem.
- Common problems associated with the input map can be found in the section
NOTES ON MAPIN.
EXAMPLES
Calculate map from atom coordinates
sfall
XYZIN /f/scratch/swift/pa_model1_2.pdb
XYZOUT $SCRATCH/junk.pdb
MAPOUT $SCRATCH/junk.map
<< END-sfall
TITL Phasing on initial PA structure (refined twice)
GRID 52 64 76
MODE ATMMAP
RESO 30 2.1
BINS 60
RSCB 5.0 2.1
SFSG 1
END
END-sfall
Read in coordinates to calculate structure factors
sfall
HKLIN $SCRATCH/pt2_pt4_shhg2_os2_nat.mtz
HKLOUT $SCRATCH/pa_model1_2_phase.mtz
XYZIN /f/scratch/swift/pa_model1_2.pdb
<< END-sfall
TITL Phasing on initial PA structure (refined twice)
GRID 52 64 76
MODE SFCALC XYZIN HKLIN
RESO 30 2.1
BINS 60
RSCB 5.0 2.1
SFSG 1
LABI FP=F_V4 SIGFP=SIGF_V4 FREE=FreeRflag
LABO FC=FC PHIC=AC
END
END-sfall
Read in map to calculate structure factors
sfall
HKLIN $SCRATCH/pt2_pt4_shhg2_os2_nat.mtz
HKLOUT $SCRATCH/pa_model1_2_phase.mtz
MAPIN $SCRATCH/junk.map
<< END-sfall
TITL Phasing on initial PA structure (refined twice)
GRID 52 64 76
MODE SFCALC MAPIN HKLIN
RESO 30 2.1
BINS 60
RSCB 8.0 2.1
SFSG 1
LABI FP=F_V4 SIGFP=SIGF_V4
LABO FC=FC PHIC=AC
END
END-sfall
Iteration_cycles to do the back-transformation and phase combination cycles in solvent flattening procedure
# - back-transform map with sfall to obtain Fcs and phases, scale
# and calculate R-factor
# - combine the phases with sigmaa
# - run fft to calculate a new map with Fobs and combined phases as
# input to flatmap
#
sfall HKLIN ../mtz/fvb18_sigmaa.mtz
HKLOUT fvb_flat_14.mtz
MAPIN fvb_flattened_14.map
<< EOF-sfall
titl back transformation for cyc 14
mode SFCALC MAPIN HKLIN
grid 104 128 160
reso 200.0 3.0
BINS 40
rscb 10.0 3.0
sfsg 19
AVMODE RADIUS 8
badd 0
FORM NGAUSS 5 C H N O S
LABI FP=fvb_Fnp SIGFP=fvb_SIGFnp -
F0=PHIB_I3 F1=FOM_I3 -
F2=A F3=B F4=C F5=D
LABO FC=FCWANG PHIC=PHICWANG WANGWT=WC8
END
EOF-sfall
Unrestrained refinement
sfall
HKLIN ../data/taka_fx_trunc.mtz
XYZIN ../data/takaxp_model8_b.pdb
XYZOUT $SCRATCH/junk.pdb
<< END-sfall
TITL TAKAXP_MODEL8_9 SFS
GRID 104 136 264
MODE SFREF XYZ UNRESTRAINED
FREE 3.0
RESO 20 2.1 60
RSCB 8.0 2.1
SFSG 19
FORM NGAU 2 Ca Cr Fe+2 P
LABI FP=FP SIG=SIGFP FREE=FreeRflag
END-sfall
Prolsq refinement cycles
#!/bin/csh -f
#
# REFINE consists of sfall, protin and prolsq
#
set LastCycle=146
set Number_of_Cycles=8
set CycleCounter=0
#
Begin:
#
@ CurrentCycle=$LastCycle + 1
#
# ========== SFALL ==========
#
set DATE=`date`
echo ' *************************** '$CurrentCycle' **************'
echo ' '
echo ' Running SFALL for Cycle Number :' $CurrentCycle' on '$DATE
echo ' '
echo ' *************************** '$CurrentCycle' **************'
#
sfall HKLIN s91a.mtz
XYZIN s91a_cyc${LastCycle}.brk
GRADMAT s91a_GRADMAT${CurrentCycle}.tmp
<< END-sfall
TITL Barnase s91a Mutant
MODE SFREF XYZB RESTRAINED ! for prolsq
GRID 90 90 120 !div CELL by these should give .=. 0.7 A
BINS 60
RESO 10 2.2 !last no. is nstep
RATI 3 !truncate xyz and B shrifts to RATI*RMS
BADD 0 !smear the gaussians to gen atomic density [10.0]
RSCB 4 1.5 !limits for refining scale and B, same as RESO ?
SFSG 145 ! fft space group
LABIN FP=S91A_F SIGFP=S91A_SIGF FREE=FreeRflag
END
END-sfall
#
/bin/rm ${SCRATCH}s91a_junk.brk ${SCRATCH}sfall*
#
# ========== PROTIN ==========
#
set DATE=`date`
echo ' *************************** '$CurrentCycle' **************'
echo ' '
echo ' Running PROTIN for Cycle Number :' $CurrentCycle' on '$DATE
echo ' Running PROTIN for Cycle Number :' $CurrentCycle' on '$DATE
echo ' '
echo ' *************************** '$CurrentCycle' **************'
#
protin XYZIN s91a_cyc${LastCycle}.brk
PROTOUT s91a_protout.out
PROTCOUNTS s91a_protcounts.out
DICTPROTN ${LIBD}protin.dict
<< END-protin
TITLE Barnase Ser91 -> Ala
SYMMETRY 145
!
CHNNAME ID A CHNTYP 1 ROFFSET 0
CHNNAME ID B CHNTYP 2 ROFFSET 0
CHNNAME ID C CHNTYP 3 ROFFSET 0
chnname id E chntyp 4 roffset 0
!
CHNTYP 1 NTER 3 VAL 3 CTER 110 ARG 2
CHNTYP 2 NTER 3 VAL 3 CTER 110 ARG 2
CHNTYP 3 NTER 2 GLN 3 CTER 110 ARG 2
chntyp 4 wat
!
LIST FEW ![few]/some (for >20A)/all
PEPP 5 !no. of atoms restrained to be in a plane [5]
VDWR 1 CPR 8 3.4
END
END-protin
#
# ========== PROLSQ ==========
#
set DATE=`date`
echo ' *************************** '$CurrentCycle' **************'
echo ' '
echo ' Running PROLSQ for Cycle Number :' $CurrentCycle' on '$DATE
echo ' '
echo ' *************************** '$CurrentCycle' **************'
#
prolsq XYZIN s91a_cyc${LastCycle}.brk
PROTOUT s91a_protout.out
PROTCOUNTS s91a_protcounts.out
GRADMAT s91a_GRADMAT${CurrentCycle}.tmp
XYZOUT s91a_cyc${CurrentCycle}.brk
<< END-prol
TITLE Barnase s91a Mutant
CGCYCLES 20 100 1 !max no. cyc for conjugate gradient soln [50]
ORIGIN 0 0 0 !origin constraint flags
RESOLUTION 10 2.2 !select Fobs
OUTPUT XYZ !o/p XYZ(.brk)/SF(.mtz)/SHIFTS(shifts file)
RTEST -1 1 1 1 1 1 1 1 !-1 -> +1 when R factor .=25%
NOUPDATE !not update atomic coord in PROTIN o/p file
MATRIX .30 .85 .85 !1st param adjusts SF/geom contribution
!start=.08;.12;.2;.25;.3;.35; final=.08
!2nd&3rd:shift factors for position and B
BATOM !(+) when refining indiv B
!
! ***** Restraints *****
DISTANCE 1 0.02 0.04 0.05 0.01 0.1
PLANE 1 0.02
CHIRAL 1 0.12
TEMPERATURE 1.0 1.0 1.5 1.5 2.0 0.1 !start
!TEMPERATURE 1.0 1.5 2.0 2.0 2.5 3.0 !when R factor <25%
HOLD 0.1 1.0 0.1
!TORSION 1 20 20 30 50 !when R factor >25%
TORSION 1 10 10 15 45 !when R factor <25%
VANDERWAAL 1 0.2 -0.3 0.0 -0.2 0.0
!
! ***** Monitor *****
MONITOR DISTAN 3. VANDERWAAL .5 TORSION 3
MONITOR PLANE 3 CHIRAL 3 !for R factor<25%
END
END-prol
#
Cleanup:
#
/bin/rm s91a_protout.out s91a_protcounts.out s91a_GRADMAT${CurrentCycle}.tmp
#
@ CycleCounter=$CycleCounter + 1
@ LastCycle=$LastCycle + 1
#
# Test to see whether Number_of_cycles done
#
if ($CycleCounter<$Number_of_Cycles) goto Begin
#
Simple Runnable Examples
examples/unix/runnable/sfall.exam
AUTHORS
Eleanor Dodson and Ted Baker, based on a program by Ramesh Agarwal
and Neil Isaacs.
Documented for the Daresbury Laboratory protein crystallography system by John
Campbell.
PROGRAM FUNCTION
The program is based on a least-squares refinement technique using fast Fourier
transform (FFT) methods as devised by Ramesh Agarwal. His paper [1] on the
technique should be studied to get a detailed understanding of the method and
only a broad description of the method is given in this section.
The program may be used for
- calculating and outputting the matrix and gradient terms required by the
restrained least squares refinement program PROLSQ (no longer available in the CCP4 Suite). The calculated shifts can
be applied directly to the atomic coordinates/B-factor, unrestrained
refinement.
- The generation and calculation of an electron density map from an input
set of atomic coordinates and its transformation, using FFT techniques, to
obtain a set of calculated structure factors.
The program SFALL is used to calculate structure factors from an electron
density map using Fast Fourier Transform techniques. Whenever possible, the
crystallographic symmetry is exploited to increase the efficiency of the
calculation. Care must be taken by the user to ensure that the map has been
calculated on a sufficiently fine grid if significant errors in the calculated
structure factors are to be avoided. As a rough rule a grid size of
approximately three times the maximum index of the data in each direction is
appropriate. The use of the Fast Fourier Transform in structure factor
calculations for large molecules has been described and discussed by L.F. Ten
Eyck [3].
- The calculation of structure factors from an electron density map. The
same cautionary measures must be applied as those explained in (b).
Modelling of the Electron Density and Calculation of Structure Factors
The contribution of an atom to a structure factor is the product of an atomic
form factor function and a Gaussian temperature factor function. If the form
factor function is approximated by a sum of Gaussian functions, then this
contribution may be represented by the sum of Gaussian terms and so may the
modelled electron density. The amount of computation required in setting up
the electron density map is proportional to the number of Gaussian terms used
in approximating the form factor. The use of a two term Gaussian is probably
adequate in most cases where the resolution of the data does not extend beyond
1.5Å. The program does, however, allow for the input of other numbers of
terms. Non default atoms defined in FORMFACTOR are usually represented as five
Gaussians since heavy atom scattering factors are more complex.
Tables of coefficients for a five term Gaussian approximation to the atomic
form factors are given in the International Tables for X-ray Crystallography
Vol.IV [2]. Some values for two and one term Gaussian approximations are
given in the Agarwal paper.
The radius of the atoms (beyond which the electron density is considered to be
negligible) is also required in the calculation. The value for this radius must
be chosen to balance the requirements of accuracy (large radius) and speed
(small radius). The tables below indicate the radii required for carbon atoms,
for different temperature factors (Bm), to give a given fractional loss of
electron density (DeltaZm/Zm).
DeltaZm/Zm 0.02 0.01 0.005
Two term Gaussian Bm
5 1.89 2.06 2.22
10 2.02 2.21 2.38
20 2.26 2.47 2.68
More values are given in the Agarwal paper.
The electron density map is built up using the contributions from each atom
within, or near (within the atom radius) to, the part of the map being
calculated. A three dimensional FFT of the electron density map is used to
calculate the structure factors. In this calculation the space group symmetries
are used whenever possible.
If the sampling grid is too coarse, significant errors may occur in the
calculation of the structure factors. These errors may be reduced [3]
either by using a finer grid for the calculation or by adding an artificial
extra temperature factor to all the atoms when modelling the density (not
recommended). This factor has also to be taken into account when the calculated
structure factors are compared with the observed structure factors.
Calculation of the Gradients
The gradients are calculated by convoluting a modified difference density
function with the electron densities of the individual atoms being refined.
The density function is calculated by calculating the Fourier transform (again
using the FFT) of a difference function based on the weighted values of the
errors in the structure factors. A three dimensional difference FFT is
calculated and convoluted with the derivatives of the atomic density for x y
and z, or B. The convolution is carried out by summing, over all the grid
points enclosed by the atom (radius defined as described above), the product of
the electron density of the atom and the value of the modified difference
density function. Thus, the calculation of the gradients is similar to the
calculation of the structure factors with the steps being reversed.
The Normal Matrix
Only the diagonal terms (interacting between parameters of the same atom) are
calculated for the normal matrix. The method described by Agarwal is
used but is generalised to allow for a variable number of terms in the Gaussian
approximations to the atomic form factors. Also the function A'xx(bi) is
tabulated for each value of bi from 1 to 150 and the nearest values, rather
than interpolated values, are selected from the table when forming the diagonal
terms of the matrix.
A set of terms, H, is only calculated for one parameter as the other diagonal
terms may be calculated to a good degree of accuracy from this initial set. An
inverse matrix is set up for calculating the orthogonalised contributions of
the gradient.
Calculation of the Shifts
The shifts, calculated as described above, may not in fact be the optimum ones
to use due to the approximate nature and the non-linearity of the method used.
In practice, it is unwise to apply large shifts as these may give rise to
instability in the refinement or to `oscillating atoms' and thus provision is
made to truncate the larger shifts. The program allows for the input of a
ratio, RATIO, which may be used to truncate the maximum value of a shift to
RATIO * r.m.s. value of the shifts
At the beginning of a refinement, the value of RATIO should be kept fairly low
(1.5-2.0) as shifts tend to be large when the errors in the parameters are
large. When most atoms are well refined, then the ratio may be increased to 3
or 4, to allow for the refinement of these atoms which are poorly placed.
Even these truncated shifts may not be the optimum shifts to be applied. A
displacement of the form alphaDelta(u), which minimises the minimisation
function of the least squares P(u + alphaDelta(u)), will be the optimum where
alpha is the optimum step size. As P as a function of alpha approximates to a
quadratic function, it is possible to calculate the optimum step size,
alphaopt, from the values of P(alpha) at alpha = 0 and alpha = alpha1 and from
the slope of P(alpha) at alpha = 0. The initial step size alpha1 used in the
calculation is chosen as 1.0 or the value input in the control data. If the
fractional change in step size is greater than SZFRC (by default 0.3) then a
second calculation is done with alpha1 taken as the optimum step size just
calculated.
REFERENCES
- Agarwal, R.C., Acta Cryst., (1978), A34, 791-809.
- International Tables for X-ray Crystallography, Vol.IV, (1974), Kynoch Press.
- Ten Eyck, L.F., Acta Cryst., (1977), A33, 486.
- Bruenger, A.T., Nature 355, 472-4 (1992)
- International Tables for Crystallography, vol. C, (1995), Kluwer.
- "Refinement of protein structures", Proceedings of the
Daresbury Study Weekend 15-16 November, 1980 (Compiled by P.S. Machin, J.W. Campbell and M. Elder).
SEE ALSO
cad,
freerflag,
fft,
icoefl,
mtzutils,
overlapmap,
pdbset,
prolsq (no longer available in the CCP4 Suite),
protin,
sigmaa,
watersearch (1),
xloggraph,
X-PLOR manual