sfcheck [HKLIN in.mtz] [XYZIN in.pdb]
[HKLOUT out.mtz] [MAPOUT map.ccp4]
[PATH_OUT path_out] [PATH_SCR path_scr]
[Keyworded input]
Authors: A.A.Vagin, J.Richelle, S.J.Wodak. email: alexei@ysbl.york.ac.uk
A.A.Vaguine, J.Richelle, S.J.Wodak. SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Cryst. (1999). D55, 191-205
1. Data deposited in PDB file:

1.1 Crystal:
    cell parameters and space group

1.2 Model:
    number of atoms
    number of water molecules
    solvent content
    <B> for model
    Matthews coefficient and corresponding solvent %
    reported resolution
    reported R-factor

1.3 Refinement:
    refinement program
    resolution range for refinement
    reported sigma cut-off for refinement
    reported R-factor
    reported Rfree

2. Data computed by Sfcheck:

2.1 Structure factors:
    number of reflections
    number of reflections with I > sigma
    number of reflections with I > 3sigma
    resolution range
    completeness
    R-standard computed by Sfcheck (sum(sigma)/sum(F))
    Wilson plot (amplitudes vs. resolution)
    overall B-factor by Patterson origin peak and by Wilson plot
    optical resolution
    expected minimal error in coordinates
    anisotropic distribution of structure factors - ratio of eigenvalues

2.2 Model vs. structure factors:
    R-factor
    Correlation coefficient
    R-factor for reported resolution range and sigma cut-off
    Rfree
    Luzzati plot (R-factor vs. resolution)
    coordinate error from Luzzati plot
    expected maximal error in coordinates
    diffraction-data precision indicator (DPI)
    Patterson scaling - scale, Badd
    Anisothermal scaling - betas: b11,b22,b33,b12,b13,b23
    Solvent correction - Ks,Bs

Optical resolution

Optical resolution is defined as the expected minimum distance between two resolved peaks in the electron density map. With a single-Gaussian approximation of the shape of an atomic peak, the minimum distance between two resolved peaks is twice the standard deviation "sigma", i.e. the width of the atomic peak W (W = 2 sigma).

The expected width of the atomic peak W is computed as

    W = sqrt( 2 * (sigma_patt^2 + sigma_res^2) )

where
    sigma_patt - standard deviation of the Gaussian corresponding to the Patterson origin peak.
    sigma_res  - standard deviation of the Gaussian corresponding to the origin peak of the spherical interference function, which is the Fourier transform of a sphere in reciprocal space with radius 1/d_min.

    sigma_res = 0.356 d_min.
d_min is the minimum d-spacing, the "nominal resolution".

The "expected optical resolution for complete data set" is calculated as above but using all reflections, with each missing reflection replaced by the average value in the corresponding resolution shell.

The plot of optical resolution for an atom with B=0 shows the part of the optical resolution that is due to series termination.

Patterson scaling

Scaling in SFCHECK is based on the Patterson origin peak, which is approximated as a Gaussian. Compared to conventional scaling by the Wilson plot, this method is particularly advantageous when only low resolution data are available. The program gives overall B-factors estimated by both methods.

Low resolution cut-off

Disordered solvent contributes to diffraction at low resolution. However, removing low resolution data from the calculations results in a series termination effect which is noticeable in the electron density at the surface of the molecule. To reduce the influence of low resolution terms, SFCHECK applies a "soft" low resolution cut-off to structure factors according to the formula:

    Fnew = Fold * (1 - exp(-Boff*s^2)),   where Boff = 4*dmax^2

Sfcheck uses Boff = 256. This corresponds to a low resolution cut-off at 8 A.

Scaling

Sfcheck scales Fobs and Fcalc by the Patterson origin peak, using all data and applying Boff.

First, it computes Boverall for the observed and for the calculated amplitudes.

Second, it makes the width of the calculated peak equal to that of the observed peak, i.e. it computes an additional thermal factor Badd:

    Badd = Boverall_obs - Boverall_calc

Third, Sfcheck computes the scale factor for Fcalc:

    scale = sqrt( sum(Fobs^2 * (1 - exp(-Boff*s^2))) / sum(Fcalc^2 * exp(-Badd*s^2) * (1 - exp(-Boff*s^2))) )

Finally we have:

    Fcalc_scaled = Fcalc * scale * exp(-Badd*s^2)

Sfcheck computes the R-factor and correlation coefficient for all data, applying the soft low resolution cut-off as described above.
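The peak-width, soft cut-off and scaling formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as written in this document, not the Sfcheck source: the function names are hypothetical, the overall B-factors are taken as given inputs, and s is assumed to be sin(theta)/lambda = 1/(2d), which the text does not state explicitly.

```python
import math


def optical_width(sigma_patt, d_min):
    """W = sqrt(2 * (sigma_patt^2 + sigma_res^2)) with sigma_res = 0.356 * d_min."""
    sigma_res = 0.356 * d_min
    return math.sqrt(2.0 * (sigma_patt ** 2 + sigma_res ** 2))


def soft_cutoff(d, b_off=256.0):
    """Soft low resolution weight 1 - exp(-Boff * s^2).

    Assumption: s = 1 / (2 * d), so s^2 = 1 / (4 * d^2).
    """
    s2 = 1.0 / (4.0 * d * d)
    return 1.0 - math.exp(-b_off * s2)


def scale_fcalc(f_obs, f_calc, d_spacings, b_obs, b_calc, b_off=256.0):
    """Return (scale, Badd, scaled Fcalc list) following the three steps in the text.

    b_obs and b_calc are the overall B-factors of the observed and calculated
    amplitudes; how they are derived from the Patterson origin peak is not
    reproduced here.
    """
    b_add = b_obs - b_calc
    numerator = 0.0
    denominator = 0.0
    damping = []
    for fo, fc, d in zip(f_obs, f_calc, d_spacings):
        s2 = 1.0 / (4.0 * d * d)
        weight = soft_cutoff(d, b_off)
        damp = math.exp(-b_add * s2)
        damping.append(damp)
        numerator += fo * fo * weight
        denominator += fc * fc * damp * weight
    scale = math.sqrt(numerator / denominator)
    f_scaled = [fc * scale * damp for fc, damp in zip(f_calc, damping)]
    return scale, b_add, f_scaled
```

With the default Boff = 256, `soft_cutoff(8.0)` evaluates to 1 - exp(-1), which is consistent with the stated 8 A cut-off under the assumed definition of s.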
Sfcheck also computes the R-factor and correlation coefficient for the reported resolution range and reported sigma cut-off, without applying Boff. If the Fobs file contains reflections marked with the Rfree flag, the program computes Rfree.

Completeness

Missing data are restored by using the average values of intensities for the corresponding resolution shell. The program produces a plot of completeness vs. resolution and a plot of the average radial completeness in polar coordinates theta and phi.

Expected minimal error

The minimal coordinate error is estimated using the experimental sigma(F). The standard deviation of an atomic coordinate is given by:

    sig_min(r) = sqrt(3)*sigma(slope)/curvature

where
    sigma(slope) is the standard deviation of the slope of the electron density in the x direction (along A).
    curvature is the average curvature of the electron density at the atomic peak centre.

They are computed as:

    sigma(slope) = (2*pi*sqrt(sum(h^2*sigF^2)))/(VOL*A)

    VOL - volume of cell
    A   - cell parameter
    h   - Miller index
    summation over all reflections
    (Cruickshank, D.W.J. (1949) Acta Cryst. 2, 65.)

    curvature = (2*pi^2*sum(h^2*F))/(VOL*A^2)

    (Murshudov et al. (1997) Acta Cryst. D53, 240.)

If there is no experimental sigma, Sfcheck uses sigma = Fobs * 0.04 for all reflections.

Expected maximal error

The expected maximal error in coordinates is estimated from the difference between |Fobs| and |Fcalc|:

    sig_max(r) = sqrt(3)*sigma(slope)/curvature

    sigma(slope) = (2*pi*sqrt(sum(h^2*(Fobs-Fcalc)^2)))/(VOL*A)

    curvature = (2*pi^2*sum(h^2*F))/(VOL*A^2)

For missing reflections the program uses the average value of sigma(Fobs) for the corresponding resolution shell instead of (Fobs-Fcalc).

DPI - diffraction-data precision indicator

The Cruickshank method of estimating the coordinate error (Acta Cryst. (1999), D55, pp 583-601):

    sig(x) = sqrt(Natoms/(Nobs - 4*Natoms)) * C^(-1/3) * dmin * Rfact

where
    C     - fractional completeness
    Rfact - conventional crystallographic R-factor
    Nobs  - number of reflections
    dmin  - maximal resolution

If Rfree flags are specified, the program uses the Murshudov approach to calculate DPI (Newsletter on protein crystallography, Daresbury Laboratory, (1997) 33, pp 25-30):

    sig(x) = sqrt(Natoms/Nobs) * C^(-1/3) * dmin * Rfree

Luzzati plot (R-factor vs. resolution)

The program computes the average radial error <delta> in coordinates from the Luzzati plot.

    <delta(r)> = 1.6 sig(x)

Solvent content

The solvent content is the fraction of the unit cell volume not occupied by the model. The model consists of ALL atoms present in the coordinate file, including ordered solvent.

Residual factor Rmerge

    Rmerge(I) = sum_i( sum_j |Ij - <I>| ) / sum_i( sum_j <I> )

    Ij    = the intensity of the jth observation of reflection i
    <I>   = the mean of the intensities of all observations of reflection i
    sum_i is taken over all reflections
    sum_j is taken over all observations of each reflection
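As a worked illustration of the DPI and Rmerge formulas above, the following Python sketch evaluates them directly. The function names and the input layout (one list of observed intensities per unique reflection) are assumptions made for this example and do not reflect how Sfcheck stores its data.

```python
import math


def dpi(n_atoms, n_obs, completeness, d_min, r_value, use_rfree=False):
    """Coordinate error sig(x) from the DPI formulas in the text.

    With use_rfree=False:  sqrt(Natoms / (Nobs - 4*Natoms)) * C^(-1/3) * dmin * Rfact
    With use_rfree=True:   sqrt(Natoms / Nobs)              * C^(-1/3) * dmin * Rfree
    """
    denominator = n_obs if use_rfree else n_obs - 4 * n_atoms
    if denominator <= 0:
        raise ValueError("too few reflections for this number of atoms")
    return math.sqrt(n_atoms / denominator) * completeness ** (-1.0 / 3.0) * d_min * r_value


def radial_error(sig_x):
    """Average radial error <delta(r)> = 1.6 * sig(x), as given in the text."""
    return 1.6 * sig_x


def r_merge(observations):
    """Rmerge(I) for a list of per-reflection lists of observed intensities."""
    numerator = 0.0
    denominator = 0.0
    for obs in observations:
        if not obs:
            continue  # skip reflections without any observation
        mean_i = sum(obs) / len(obs)
        numerator += sum(abs(i - mean_i) for i in obs)
        denominator += mean_i * len(obs)
    return numerator / denominator
```

For example, `dpi(1000, 14000, 1.0, 2.0, 0.2)` uses 14000 - 4000 = 10000 in the denominator, so the square-root term is sqrt(0.1).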
Local error estimation (plotted for each residue, for the backbone and for the side chain):

    1. Amplitude of displacement of atoms from electron density
    2. Density correlation coefficient
    3. Density index
    4. B-factor
    5. Index of connectivity

Displacement

Displacement of atoms from the electron density is estimated from the difference (Fobs - Fcalc) map. The displacement vector is the ratio of the gradient of the difference density to the curvature. The amplitude of the displacement vector is an indicator of the positional error.

Correlation coefficient

The density correlation coefficient is calculated for each residue from the atomic densities of the (2Fobs-Fcalc) map, "Robs", and of the model map (Fcalc), "Rcalc":

    D_corr = <Robs><Rcalc>/sqrt(<Robs^2><Rcalc^2>)

where
    <Robs>  is the mean of the "observed" densities of the atoms of the residue (backbone or side chain).
    <Rcalc> is the mean of the "calculated" densities of the atoms of the residue.

The value of the density for an atom from the map R(x) is given by:

    Dens = sum_i( R(xi) * Ratom(xi - xa) ) / sum_i( Ratom(xi - xa) )

where
    Ratom(x) : the atomic electron density at grid point x.
    xa       : vector of the centre of the atom.
    xi       : vector of the i-th grid point.

The sum is taken over all grid points whose distance from the centre of the atom is less than Radius_limit. For all atoms Radius_limit = 2.5 A.

Index of density and index of connectivity

The index of connectivity is computed from the product of the (2Fobs-Fcalc) electron density values for the backbone atoms N, CA and C, i.e. it is the geometric mean value for these atoms. Low values of this index indicate breaks in the backbone electron density, which may be due to flexibility of the chain or incorrect tracing. The index of density is a similar indicator which is calculated for all atoms of a given residue.
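The per-atom density, the density correlation coefficient and the index of connectivity described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Sfcheck implementation: the grid is passed as a plain list of (position, map value) pairs, the atomic shape function is supplied by the caller, D_corr is coded literally as the formula is written in this document, and returning 0 for non-positive densities in the geometric mean is a choice made only for this example.

```python
import math


def atom_density(grid, centre, atom_shape, radius_limit=2.5):
    """Dens = sum_i(R(xi) * Ratom(xi - xa)) / sum_i(Ratom(xi - xa)).

    grid       : iterable of ((x, y, z), map_value) pairs (hypothetical layout)
    centre     : (x, y, z) of the atom centre xa
    atom_shape : function of the distance |xi - xa| returning Ratom (caller-supplied)
    """
    numerator = 0.0
    denominator = 0.0
    for position, value in grid:
        distance = math.dist(position, centre)
        if distance < radius_limit:
            weight = atom_shape(distance)
            numerator += value * weight
            denominator += weight
    return numerator / denominator


def density_correlation(r_obs, r_calc):
    """D_corr = <Robs><Rcalc> / sqrt(<Robs^2><Rcalc^2>), coded as written in the text."""
    n = len(r_obs)
    mean_obs = sum(r_obs) / n
    mean_calc = sum(r_calc) / n
    mean_obs2 = sum(v * v for v in r_obs) / n
    mean_calc2 = sum(v * v for v in r_calc) / n
    return mean_obs * mean_calc / math.sqrt(mean_obs2 * mean_calc2)


def connectivity_index(dens_n, dens_ca, dens_c):
    """Geometric mean of the densities at N, CA and C.

    Assumption for this sketch: any non-positive density gives an index of 0.
    """
    if min(dens_n, dens_ca, dens_c) <= 0.0:
        return 0.0
    return (dens_n * dens_ca * dens_c) ** (1.0 / 3.0)
```

A single weak backbone atom pulls the geometric mean down sharply, which is why low values of the index flag breaks in the main-chain density.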
An omit map procedure is a means of reducing the model bias in the electron density calculated with model phases. SFCHECK produces the so-called total omit map by an automatic procedure. First, the initial (Fobs, PHImodel) map is divided into N boxes. For each box, the electron density in it is set to zero and new phases are calculated from this modified map. A new map is calculated using these phases and Fobs. This map contains the omit map for the given box, which is stored while the procedure is repeated for all boxes. At the end, all the boxes with omit maps are assembled into the complete omit map. Phases calculated from this complete omit map are combined with the initial phases. The whole procedure may be repeated (keyword NOMIT). Note: it is time consuming! Sfcheck can optionally create an output file with omit phases (see HKLOUT).
Sfcheck can also run with a single input file, either coordinates or structure factors. In that case it produces only a limited analysis, of the coordinates or of the data respectively.
Sfcheck checks for merohedral twinning (only when the program uses a single input MTZ file of structure factors).

Perfect twinning test:

    <I^2>/<I>^2

Sfcheck will compute a partial twinning test:

    H = |I(h1)-I(h2)|/(I(h1)+I(h2))

for the following space groups:

    P3 P31 P32 R3
    P4 P41 P42 P43 I4 I41
    P6 P61 P62 P63 P64 P65
    P312 P321 R32
    P23 F23 I23 P213 I213

    Alpha (twinning fraction) = 1/2 - <H>

If 0.05 < Alpha < 0.45, Sfcheck can create an output MTZ file with detwinned data (see HKLOUT).

For details see: Yeates, T.O. (1997) Methods in Enzymology 276, 344-358.
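The two twinning statistics above are simple averages and can be sketched as below. The function names and the input format (a flat list of intensities, and a list of intensity pairs for twin-related reflections h1 and h2) are assumptions for illustration; finding the twin-related pairs for a given space group is not shown.

```python
def perfect_twin_moment(intensities):
    """Second moment <I^2> / <I>^2 used in the perfect twinning test."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)


def twin_fraction(pairs):
    """Alpha = 1/2 - <H>, with H = |I(h1) - I(h2)| / (I(h1) + I(h2)).

    pairs: list of (I(h1), I(h2)) for twin-related reflections.
    Pairs whose intensities sum to zero or less are skipped (assumption).
    """
    h_values = [abs(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0]
    mean_h = sum(h_values) / len(h_values)
    return 0.5 - mean_h


def can_detwin(alpha):
    """True when alpha lies in the range where detwinned output is offered."""
    return 0.05 < alpha < 0.45
```

For instance, the pairs (3, 1) and (1, 1) give H values of 0.5 and 0, so <H> = 0.25 and Alpha = 0.25, which falls inside the 0.05 to 0.45 window.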
Information is output to a PostScript file:
sfcheck_<identifier>.ps
Sfcheck can create:
LABIN, NOMIT
Specify input column labels. Only needed if an MTZ file is input.
Sfcheck labels defined are: F, SIGF, F(-), SIGF(-), I, SIGI, I(-), SIGI(-), FREE
Label | Meaning |
---|---|
F | label of F or F(+) |
SIGF | label of sigma F or sigma F(+) |
F(-) | label of F(-) |
SIGF(-) | label of sigma F(-) |
FREE | label of flag of Rfree factor |
I | Structure Intensity of hkl |
SIGI | Standard deviation of the above |
I(-) | Structure Intensity of -h -k -l |
SIGI(-) | Standard deviation of the above |
Only required if the omit procedure is needed.
<nomit> is the number of cycles of the omit procedure. The default is 0, i.e. no omit procedure. <nomit> = 2 is the recommended value.
There are two input files: MTZ and PDB. Sfcheck will assess the agreement between the atomic model and the X-ray data.

    # --------------------------------
    sfcheck HKLIN test.mtz XYZIN 2sar.pdb << eor
    # --------------------------------
    LABIN F=F SIGF=SIGF FREE=FreeR_flag
    eor
There are two input files: MTZ and PDB. For the assessment Sfcheck will use the phases obtained after the omit procedure. A new MTZ file with omit phases will be created. Because PATH_OUT is given, the output PostScript file will be placed in the directory /y/people/alexei/

    # --------------------------------
    sfcheck HKLIN test.mtz XYZIN 2sar.pdb \
            HKLOUT new.mtz PATH_OUT /y/people/alexei/ \
            << eor
    # --------------------------------
    LABIN F=F SIGF=SIGF FREE=FreeR_flag
    NOMIT 2
    eor