sfcheck [HKLIN in.mtz] [XYZIN in.pdb]
[HKLOUT out.mtz] [MAPOUT map.ccp4]
[PATH_OUT path_out] [PATH_SCR path_scr]
[Keyworded input]
Authors: A.A.Vagin, J.Richelle, S.J.Wodak.
email: alexei@ysbl.york.ac.uk
A.A.Vaguine, J.Richelle, S.J.Wodak. SFCHECK: a unified set of
procedures for evaluating the quality of macromolecular structure-factor
data and their agreement with the atomic model.
Acta Cryst. (1999). D55, 191-205
1. Data deposited in PDB file:
1.1 Crystal:
cell parameters and space group
1.2 Model:
number of atoms
number of water molecules
solvent content
<B> for model
Matthews coefficient and corresponding solvent %
reported resolution
reported R-factor
1.3 Refinement:
refinement program
resolution range for refinement
reported sigma cut-off for refinement
reported R-factor
reported Rfree
2. Data computed by Sfcheck:
2.1 Structure factors:
number of reflections
number of reflections with I > sigma
number of reflections with I > 3sigma
resolution range
completeness
R-standard computed by Sfcheck (sum(sigma)/sum(F))
Wilson plot (amplitudes vs. resolution)
overall B-factor by Patterson origin peak and by Wilson plot
optical resolution
expected minimal error in coordinates
Anisotropic distribution of structure factors - ratio of eigenvalues
2.2 Model vs. structure factors:
R-factor
Correlation coefficient
R-factor for reported resolution range and sigma cut-off
Rfree
Luzzati plot (R-factor vs. resolution)
coordinate error from Luzzati plot
expected maximal error in coordinates
diffraction-data precision indicator (DPI)
Patterson scaling - scale, Badd
Anisothermal scaling - betas: b11,b22,b33,b12,b13,b23
Solvent correction - Ks,Bs
Optical resolution
Optical resolution is defined as the expected minimum distance
between two resolved peaks in the electron density map.
With a single-Gaussian approximation of the shape of an atomic peak,
the minimum distance between two resolved peaks is twice the standard
deviation "sigma", i.e. the width of the atomic peak W (W = 2 sigma).
The expected width of the atomic peak W is computed as
W = sqrt( 2 (sigma_patt^2 + sigma_res^2) )
where sigma_patt - standard deviation of the Gaussian corresponding
to the Patterson origin peak.
sigma_res - standard deviation of the Gaussian corresponding
to the origin peak of the spherical interference function,
which is the Fourier transform of a sphere in
reciprocal space with radius 1/d_min.
sigma_res = 0.356 d_min.
d_min is the minimum d-spacing, the "nominal resolution".
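As an illustration of the formula above, the following minimal Python sketch evaluates W for a given sigma_patt and d_min. The function names and the sample values are hypothetical and are not part of Sfcheck; in practice sigma_patt would come from a Gaussian fit to the Patterson origin peak.

```python
import math


def sigma_res(d_min):
    """Standard deviation of the series-termination Gaussian (0.356 * d_min)."""
    return 0.356 * d_min


def peak_width(sigma_patt, d_min):
    """Expected atomic peak width W = sqrt(2 * (sigma_patt^2 + sigma_res^2))."""
    return math.sqrt(2.0 * (sigma_patt ** 2 + sigma_res(d_min) ** 2))


if __name__ == "__main__":
    # Hypothetical inputs: sigma_patt = 0.5 A, d_min = 2.0 A
    print("W =", peak_width(0.5, 2.0))
```

Setting sigma_patt to zero in this sketch leaves only the sigma_res term, which depends on d_min alone.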
The "expected optical resolution for complete data set" is
calculated as above but using all reflections, with values for
missing reflection being the average value in the corresponding
resolution shell.
The plot of optical resolution for an atom with B=0 shows
the part of the optical resolution that is due to
series termination.
Patterson scaling
Scaling in SFCHECK is based on the Patterson origin peak which is
approximated as a Gaussian. Compared to the conventional scaling
by the Wilson plot, this method is particularly advantageous when
only low resolution data are available.
The program gives overall B-factors estimated by both methods.
Low resolution cut-off
Disordered solvent contributes to diffraction at low resolution.
However, removal of low resolution data from calculations results
in a series termination effect which is noticeable in the electron
density at the surface of the molecule. To reduce the influence of
low resolution terms, SFCHECK applies a "soft" low resolution
cut-off to structure factors according to the formula:
Fnew = Fold * (1 - exp(-Boff*s^2)), where Boff = 4*dmax^2
Sfcheck uses Boff = 256. This corresponds to the low resolution
cut-off at 8 A.
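The soft cut-off can be sketched in a few lines of Python. This is only an illustration of the formula above: the function names are hypothetical, and it assumes that s means sin(theta)/lambda = 1/(2d), which the text does not state explicitly.

```python
import math


def boff_from_dmax(d_max):
    """Boff = 4 * dmax^2 (dmax in Angstrom)."""
    return 4.0 * d_max ** 2


def soft_cutoff(f_old, s, boff=256.0):
    """Fnew = Fold * (1 - exp(-Boff * s^2)).

    s is assumed to be sin(theta)/lambda = 1/(2d); this is an assumption
    of the sketch, not something stated in the text.
    """
    return f_old * (1.0 - math.exp(-boff * s * s))


if __name__ == "__main__":
    print("Boff for dmax = 8 A:", boff_from_dmax(8.0))
    for d in (40.0, 16.0, 8.0, 4.0, 2.0):
        print(d, soft_cutoff(1.0, 1.0 / (2.0 * d)))
```

Under that assumption the weight rises smoothly from 0 at very low resolution towards 1 at high resolution, rather than switching abruptly at 8 A.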
Scaling
Sfcheck scales Fobs and Fcalc by the Patterson origin peak using all
data applying Boff.
First, it computes the overall B-factors (Boverall) for the observed and calculated amplitudes.
Second, it makes the width of the calculated peak equal to the
observed, i.e. computes an additional thermal factor Badd:
Badd = Boverall_obs - Boverall_calc
Third, Sfcheck computes the scale factor for Fcalc:
scale = sqrt( sum(Fobs^2*(1-exp(-Boff*s^2))) / sum(Fcalc^2*exp(-Badd*s^2)*(1-exp(-Boff*s^2))) )
Finally we have:
Fcalc_scaled = Fcalc * scale * exp(-Badd*s^2)
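The three scaling steps can be sketched as follows. This is a minimal illustration that follows the formulas exactly as written above; the function name and argument layout are hypothetical, the overall B-factors are taken as given inputs (their estimation from the Patterson origin peak is not shown), and s is assumed to be sin(theta)/lambda.

```python
import math


def scale_fcalc(fobs, fcalc, s, b_obs, b_calc, boff=256.0):
    """Return (scale, Badd, scaled Fcalc list) following the text's formulas.

    fobs, fcalc, s are equal-length lists, one entry per reflection.
    b_obs, b_calc are the overall B-factors of Fobs and Fcalc (inputs here).
    """
    badd = b_obs - b_calc
    num = 0.0
    den = 0.0
    for fo, fc, si in zip(fobs, fcalc, s):
        soft = 1.0 - math.exp(-boff * si * si)      # soft low resolution cut-off
        num += fo * fo * soft
        den += fc * fc * math.exp(-badd * si * si) * soft
    scale = math.sqrt(num / den)
    scaled = [fc * scale * math.exp(-badd * si * si) for fc, si in zip(fcalc, s)]
    return scale, badd, scaled


if __name__ == "__main__":
    # Hypothetical amplitudes for three reflections
    print(scale_fcalc([20.0, 35.0, 12.0], [10.0, 18.0, 7.0],
                      [0.05, 0.10, 0.20], 25.0, 20.0))
```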
Sfcheck computes the R-factor and Correlation coefficient for all
data applying the soft low resolution cut-off as described above.
Sfcheck computes the R-factor and Correlation coefficient for
the reported resolution range and reported sigma cut-off without
applying Boff. If the Fobs file contains reflections marked with
the Rfree flag, the program computes Rfree.
Completeness
Missing data are restored by using the average values of
intensities for the corresponding resolution shell.
The program produces a plot of completeness vs. resolution and
a plot of the average radial completeness in polar coordinates
theta and phi.
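A minimal sketch of the restoration of missing data by shell averages is given below. The data layout (one list per resolution shell, with None marking a missing reflection) and the function names are assumptions made for this example only; how Sfcheck bins reflections into shells is not described here.

```python
def completeness(shell):
    """Fraction of reflections in a shell that were actually measured."""
    return sum(1 for i in shell if i is not None) / len(shell)


def fill_missing(shells):
    """Replace missing intensities (None) by the mean of their shell.

    A shell with no measured reflections is left unchanged, since there
    is nothing to average (a choice made for this sketch).
    """
    filled = []
    for shell in shells:
        observed = [i for i in shell if i is not None]
        if not observed:
            filled.append(list(shell))
            continue
        mean_i = sum(observed) / len(observed)
        filled.append([mean_i if i is None else i for i in shell])
    return filled


if __name__ == "__main__":
    shells = [[120.0, None, 80.0], [15.0, 25.0, None, None]]
    print([completeness(sh) for sh in shells])
    print(fill_missing(shells))
```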
Expected minimal error
The minimal coordinate error is estimated using the experimental
sigmas(F). The standard deviation of an atomic coordinate is
given by:
sig_min(r) = sqrt(3)*sigma(slope)/curvature
where sigma(slope) is the standard deviation of the slope of the electron density in the
x direction (along A).
curvature is the average curvature of the electron
density at the atomic peak centre.
They are computed as:
sigma(slope) = (2*pi*sqrt(sum(h^2*(sigF)^2)))/(VOL*A)
VOL - volume of cell
A - cell parameter
h - Miller index
summation over all reflections
( Cruickshank, D.W.J. (1949) Acta Cryst. 2, 65.)
curvature = (2*pi^2*sum(h^2*F))/(VOL*A^2)
( Murshudov et al. (1997) Acta Cryst. D53, 240.)
If there is no experimental sigma Sfcheck
uses sigma = Fobs * 0.04 for all reflections.
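The estimate above can be written out as a short Python sketch. The function names and the list-based inputs are hypothetical; the code simply evaluates the formulas as given, including the 0.04 * Fobs fallback when no experimental sigmas are available.

```python
import math


def sigma_slope(h, sig_f, vol, a):
    """sigma(slope) = 2*pi*sqrt(sum(h^2 * sigF^2)) / (VOL * A)."""
    total = sum(hi * hi * sf * sf for hi, sf in zip(h, sig_f))
    return 2.0 * math.pi * math.sqrt(total) / (vol * a)


def curvature(h, f, vol, a):
    """curvature = 2*pi^2*sum(h^2 * F) / (VOL * A^2)."""
    total = sum(hi * hi * fi for hi, fi in zip(h, f))
    return 2.0 * math.pi ** 2 * total / (vol * a * a)


def sig_min(h, f, vol, a, sig_f=None):
    """sig_min(r) = sqrt(3) * sigma(slope) / curvature.

    If no sigmas are supplied, sigma = 0.04 * Fobs is used, as in the text.
    """
    if sig_f is None:
        sig_f = [0.04 * fi for fi in f]
    return math.sqrt(3.0) * sigma_slope(h, sig_f, vol, a) / curvature(h, f, vol, a)


if __name__ == "__main__":
    # Hypothetical reflections: Miller index h, amplitude F
    print(sig_min([1, 2, 3], [100.0, 50.0, 20.0], vol=1000.0, a=10.0))
```

Replacing the sigF values by the differences (Fobs - Fcalc) in the same functions gives the corresponding maximal-error estimate.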
Expected maximal error
The expected maximal error in coordinates is estimated
by the difference between |Fobs| and |Fcalc|:
sig_max(r) = sqrt(3)*sigma(slope)/curvature
sigma(slope) = (2*pi*sqrt(sum(h^2*(Fobs-Fcalc)^2)))/(VOL*A)
curvature = (2*pi^2*sum(h^2*F))/(VOL*A^2)
For missing reflections the program uses the average value of
sigma(Fobs) for the corresponding resolution shell instead
of (Fobs-Fcalc).
DPI - diffraction-data precision indicator
The Cruickshank method of estimating the coordinate error
(Acta Cryst. (1999), D55, pp 583-601):
sig(x) = sqrt(Natoms/(Nobs - 4*Natoms)) * C^(-1/3) * dmin * Rfact
where C - fractional completeness
Rfact - conventional crystallographic R-factor
Natoms - number of atoms
Nobs - number of reflections
dmin - maximal resolution
If Rfree flags are specified, the program uses the Murshudov approach
to calculate DPI:
(Newsletter on protein crystallography, Daresbury
Laboratory, (1997) 33, pp 25-30.)
sig(x) = sqrt(Natoms/Nobs) * C^(-1/3) * dmin * Rfree
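Both DPI expressions are easy to evaluate directly. The sketch below is illustrative only; the function names and the sample numbers are hypothetical.

```python
import math


def dpi_rfactor(n_atoms, n_obs, completeness, d_min, r_fact):
    """Cruickshank DPI: sqrt(Natoms/(Nobs - 4*Natoms)) * C^(-1/3) * dmin * Rfact."""
    if n_obs <= 4 * n_atoms:
        raise ValueError("Nobs must exceed 4*Natoms for the R-factor based DPI")
    return (math.sqrt(n_atoms / (n_obs - 4 * n_atoms))
            * completeness ** (-1.0 / 3.0) * d_min * r_fact)


def dpi_rfree(n_atoms, n_obs, completeness, d_min, r_free):
    """Rfree-based DPI: sqrt(Natoms/Nobs) * C^(-1/3) * dmin * Rfree."""
    return (math.sqrt(n_atoms / n_obs)
            * completeness ** (-1.0 / 3.0) * d_min * r_free)


if __name__ == "__main__":
    # Hypothetical structure: 1000 atoms, 8000 reflections, 95% complete, 2.0 A
    print(dpi_rfactor(1000, 8000, 0.95, 2.0, 0.20))
    print(dpi_rfree(1000, 8000, 0.95, 2.0, 0.25))
```

The R-factor form is undefined when Nobs does not exceed 4*Natoms, which is why the sketch raises an error in that case; the Rfree form has no such restriction.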
Luzzati plot (R-factor vs. resolution)
The program computes the average radial error <delta> in coordinates
from the Luzzati plot.
<delta(r)> = 1.6 sig(x)
Solvent content
The solvent content is the fraction of the unit cell volume not occupied
by the model. The model consists of ALL atoms present in the coordinate
file including ordered solvent.
Residual factor Rmerge
Rmerge(I) = sum_i (sum_j |Ij - <I>|) / sum_i (sum_j (<I>))
Ij = the intensity of the jth observation of reflection i
<I> = the mean of the intensities of all observations of
reflection i
sum_i is taken over all reflections
sum_j is taken over all observations of each reflection
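The definition of Rmerge above translates into the following minimal Python sketch. The dictionary layout (reflection index mapped to a list of observed intensities) is an assumption made for the example.

```python
def rmerge(observations):
    """Rmerge(I) = sum_i sum_j |Ij - <I>| / sum_i sum_j <I>.

    observations maps each reflection (e.g. an (h, k, l) tuple) to the
    list of intensities measured for it.
    """
    num = 0.0
    den = 0.0
    for intensities in observations.values():
        mean_i = sum(intensities) / len(intensities)
        num += sum(abs(i - mean_i) for i in intensities)
        den += mean_i * len(intensities)   # <I> summed once per observation
    return num / den


if __name__ == "__main__":
    # Hypothetical multiply-measured reflections
    data = {(1, 0, 0): [90.0, 110.0], (0, 1, 0): [200.0, 200.0]}
    print(rmerge(data))
```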
Local error estimation (plotted for each residue, for the backbone
and for the side chain):
1. Amplitude of displacement of atoms from electron density
2. Density correlation coefficient
3. Density index
4. B-factor
5. Index of connectivity
Displacement
Displacement of atoms from electron density is estimated from the
difference (Fobs - Fcalc) map. The displacement vector is the ratio of
the gradient of difference density to the curvature. The amplitude of
the displacement vector is an indicator of the positional error.
Correlation coefficient
The density correlation coefficient is calculated for each residue
from the atomic densities of the (2Fobs-Fcalc) map - "Robs" and the model
map (Fcalc) - "Rcalc":
D_corr = <Robs><Rcalc>/sqrt(<Robs^2><Rcalc^2>)
where <Robs> is the mean of "observed" densities of atoms
of the residue (backbone or side chain).
<Rcalc> is the mean of "calculated" densities of atoms
of the residue.
The value of density for an atom from the map R(x) is given by:
Dens = sum_i ( R(xi) * Ratom(xi - xa) ) / sum_i ( Ratom(xi - xa) )
where Ratom(x): the atomic electron density at grid point x.
xa : vector of the centre of atom.
xi : vector of the i-th point of grid.
Sum is taken over all grid points which have distance
from the centre of atom less than Radius_limit.
For all atoms Radius_limit = 2.5 A.
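The density value of an atom is a weighted average of the map over nearby grid points. The sketch below illustrates this with explicit lists of grid coordinates and map values; the function names, the data layout and the Gaussian atomic shape are hypothetical choices for the example, not the shape function used by Sfcheck.

```python
import math


def gaussian_atom(r, width=0.7):
    """Hypothetical atomic density profile as a function of distance r."""
    return math.exp(-(r * r) / (2.0 * width * width))


def atom_density_value(grid_points, map_values, atom_centre,
                       atom_density=gaussian_atom, radius_limit=2.5):
    """Dens = sum_i R(xi)*Ratom(xi - xa) / sum_i Ratom(xi - xa).

    Only grid points closer than radius_limit to the atom centre are used.
    """
    num = 0.0
    den = 0.0
    for xi, rho in zip(grid_points, map_values):
        dist = math.dist(xi, atom_centre)
        if dist < radius_limit:
            weight = atom_density(dist)
            num += rho * weight
            den += weight
    return num / den


if __name__ == "__main__":
    points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    values = [2.0, 1.0, 10.0]
    print(atom_density_value(points, values, (0.0, 0.0, 0.0)))
```

In this example the third grid point lies beyond the 2.5 A limit and does not contribute.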
Index of density and index of connectivity
The index of connectivity is computed from the product of the (2Fobs-Fcalc)
electron density values for the backbone atoms N, CA and C, i.e. it is the geometric
mean value for these atoms. Low values of this index indicate breaks
in the backbone electron density which may be due to flexibility of
the chain or incorrect tracing. The index of density is a similar
indicator which is calculated for all atoms of a given residue.
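Read as a geometric mean, both indices can be sketched with one small function. The function name is hypothetical, and returning 0 when any density value is zero or negative is an assumption of this sketch; the text does not say how such values are treated.

```python
import math


def geometric_mean(densities):
    """Geometric mean of per-atom density values.

    Assumption of this sketch: if any value is zero or negative, the
    index is reported as 0.0 (a geometric mean is otherwise undefined).
    """
    if any(d <= 0.0 for d in densities):
        return 0.0
    return math.prod(densities) ** (1.0 / len(densities))


if __name__ == "__main__":
    # Hypothetical density values at N, CA and C of one residue
    print("index of connectivity:", geometric_mean([1.2, 1.5, 0.9]))
    # Hypothetical density values for all atoms of the same residue
    print("index of density:", geometric_mean([1.2, 1.5, 0.9, 0.7, 0.4]))
```

Because the values are multiplied, one weak atom pulls the index down much more than it would in an arithmetic mean, which is what makes the index sensitive to breaks in the chain.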
An omit map procedure is a means of reducing the model bias in
the electron density calculated with model phases. SFCHECK produces
the so called total omit map by an automatic procedure. First, the
initial (Fobs, PHImodel) map is divided into N boxes. For each
box, the electron density in it is set to zero and new phases are
calculated from this modified map. A new map is calculated using
these phases and Fobs. This map contains the omit map for the
given box which is stored until the procedure is repeated for
all boxes. At the end, all the boxes with omit maps are used
to assemble a complete omit map. Phases calculated from this complete
omit map are combined with the initial phases. The whole procedure may
be repeated (keyword NOMIT). Note: it is time consuming!
Sfcheck can optionally create an output file with omit phases
(see HKLOUT).
Sfcheck can also be run with only one input file, either coordinates or structure
factors. In such cases Sfcheck produces only a limited analysis of
the coordinates or the data.
Sfcheck checks for merohedral twinning
(only if the program is run with a single input MTZ file of structure factors).
Perfect twinning test: <I^2>/<I>^2
Sfcheck will compute a Partial Twinning test:
H = |I(h1)-I(h2)|/(I(h1)+I(h2))
for the following space groups:
P3 P31 P32 R3
P4 P41 P42 P43 I4 I41
P6 P61 P62 P63 P64 P65
P312 P321 R32
P23 F23 I23 P213 I213
Alpha (twinning fraction) = 1/2 - <H>
If 0.05 < Alpha < 0.45, Sfcheck can create an output MTZ file
with detwinned data (see HKLOUT)
For details see: Yeates,T.O. (1997) Methods in Enzymology 276, 344-358.
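The two twinning statistics can be illustrated with a short Python sketch. The function names and the input layout (a list of intensities, and a list of intensity pairs for twin-related reflections h1 and h2) are hypothetical; identifying the twin-related pairs for a given space group is not shown.

```python
def perfect_twin_ratio(intensities):
    """Perfect twinning test: <I^2> / <I>^2."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)


def twin_fraction(pairs):
    """Alpha = 1/2 - <H>, with H = |I(h1) - I(h2)| / (I(h1) + I(h2)).

    Pairs whose intensities sum to zero or less are skipped
    (a choice made for this sketch).
    """
    h_values = [abs(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0]
    return 0.5 - sum(h_values) / len(h_values)


def can_detwin(alpha):
    """True when alpha lies in the range 0.05 < Alpha < 0.45 given in the text."""
    return 0.05 < alpha < 0.45


if __name__ == "__main__":
    pairs = [(100.0, 100.0), (150.0, 50.0), (80.0, 40.0)]
    alpha = twin_fraction(pairs)
    print(alpha, can_detwin(alpha))
```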
Information is output to a PostScript file:
sfcheck_<identifier>.ps
Sfcheck can create:
LABIN, NOMIT
Specify input column labels. Only needed if an MTZ file is input.
Sfcheck labels defined are: F, SIGF, F(-), SIGF(-), I, SIGI, I(-), SIGI(-), FREE
| Label | Meaning |
|---|---|
| F | label of F or F(+) |
| SIGF | label of sigma F or sigma F(+) |
| F(-) | label of F(-) |
| SIGF(-) | label of sigma F(-) |
| FREE | label of the Rfree flag |
| I | label of the intensity of hkl |
| SIGI | label of the standard deviation of I |
| I(-) | label of the intensity of -h -k -l |
| SIGI(-) | label of the standard deviation of I(-) |
The NOMIT keyword is only required if the omit procedure is needed.
<nomit> is the number of cycles of the omit procedure. Default is 0, i.e. no omit procedure. <nomit> = 2 is the recommended value.
There are two input files: MTZ and PDB. Sfcheck will assess the agreement between the atomic model and X-ray data.
# --------------------------------
sfcheck HKLIN test.mtz XYZIN 2sar.pdb << eor
# --------------------------------
LABIN F=F SIGF=SIGF FREE=FreeR_flag
eor
There are two input files: MTZ and PDB. For the assessment, Sfcheck will use the phases after the omit procedure. A new MTZ file with omit phases will be created. Because PATH_OUT is specified, the output PostScript file will be placed in the directory /y/people/alexei/
# --------------------------------
sfcheck HKLIN test.mtz XYZIN 2sar.pdb \
HKLOUT new.mtz PATH_OUT /y/people/alexei/ \
<< eor
# --------------------------------
LABIN F=F SIGF=SIGF FREE=FreeR_flag
NOMIT 2
eor