SFCHECK (CCP4: Supported Program)

NAME

sfcheck - A program for assessing the agreement between the atomic model and X-ray data.

SYNOPSIS

sfcheck [HKLIN in.mtz] [XYZIN in.pdb] [HKLOUT out.mtz] [MAPOUT map.ccp4]
[PATH_OUT path_out] [PATH_SCR path_scr]
[Keyworded input]

DESCRIPTION

Version 6.0.3 (26.02.2002) - Features

A program for assessing the agreement between the atomic model and X-ray data. The program requires one or two input files: the coordinates of the model (in PDB format) and the structure factors (in MTZ or CIF format). It runs completely automatically and reports the R-factor, correlation coefficient, Luzzati plot, Wilson plot, overall B-factor (Boverall), pseudo-translation, twinning test ..., and local error estimation by residue. Sfcheck can compute omit phases and use these instead of the model phases. For output Sfcheck generates a PostScript file. Sfcheck can also create a new MTZ output file with omit phases or with detwinned data.

Reference
Output information produced by SFCHECK
Local error estimation
Omit procedure
Partial information
Twinning test
Input output files
Keywords
Command file examples

Reference

   Authors:      A.A.Vagin, J.Richelle, S.J.Wodak. 
                email: alexei@ysbl.york.ac.uk
    A.A.Vaguine, J.Richelle, S.J.Wodak. SFCHECK: a unified set of 
    procedures for evaluating the quality of macromolecular structure-factor
    data and their agreement with the atomic model.
    Acta Cryst.(1999). D55, 191-205

Output information produced by SFCHECK

    1. Data deposited in PDB file: 

      1.1 Crystal: 
          cell parameters and space group
 
      1.2 Model: 
          number of atoms
          number of water molecules
          solvent content  
          <B> for model
          Matthews coefficient and corresponding solvent %
          reported resolution 
          reported R-factor
 
      1.3 Refinement:
          refinement program
          resolution range for refinement
          reported sigma cut-off for refinement
          reported R-factor
          reported Rfree
 
    2. Data computed by Sfcheck:

      2.1 Structure factors:
          number of reflections
          number of reflections with I > sigma
          number of reflections with I > 3sigma
          resolution range
          completeness
          R-standard computed by Sfcheck (sum(sigma)/sum(F))
          Wilson plot (amplitudes vs. resolution)
          overall B-factor by Patterson origin peak and by Wilson plot
          optical resolution 
          expected minimal error in coordinates
          anisotropic distribution of structure factors (ratio of eigenvalues)
 
      2.2 Model vs. structure factors:
          R-factor
          Correlation coefficient
          R-factor for reported resolution range and sigma cut-off
          Rfree    
          Luzzati plot (R-factor vs. resolution)
          coordinate error from Luzzati plot
          expected maximal error in coordinates
          diffraction-data precision indicator (DPI)
          Patterson scaling    - scale , Badd
          Anisothermal scaling - betas: b11,b22,b33,b12,b13,b23
          Solvent correction - Ks,Bs
 
    Optical resolution

      Optical resolution is defined as an expected minimum distance
      between two resolved peaks in the electron density map.

      With a single-Gaussian approximation of the shape of atomic peak
      the minimum distance between two resolved peaks is twice the standard 
      deviation "sigma" or the width of atomic peak W (W = 2 sigma).
      Expected width of atomic peak W is computed as

       W = sqrt ( 2 (sigma_patt² + sigma_res²) )

       where  sigma_patt - standard deviation of the Gaussian corresponding 
                         to the Patterson origin peak. 

            sigma_res  - standard deviation of the Gaussian corresponding
                         to the origin peak of the spherical interference
                         function, which is the Fourier transform of a sphere
                         in reciprocal space with radius 1/d_min.

                         sigma_res = 0.356 d_min.         
                        
                         d_min is minimum d-spacing, "nominal resolution".

      The "expected optical resolution for complete data set" is  
      calculated as above but using all reflections, with values for
      missing reflection being the average value in the corresponding
      resolution shell.
      The plot of optical resolution for an atom with B=0 shows the
      behaviour of the part of the optical resolution that is due to
      series termination.
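
      The width formula above can be evaluated directly. The following is
      a minimal sketch; the function name and the sample values of
      sigma_patt and d_min are hypothetical, and in practice sigma_patt
      would come from a Gaussian fit to the Patterson origin peak.

```python
import math


def optical_resolution(sigma_patt, d_min):
    """Expected width W of an atomic peak, in the same units as d_min.

    sigma_patt : standard deviation of the Gaussian fitted to the
                 Patterson origin peak
    d_min      : minimum d-spacing ("nominal resolution")
    """
    sigma_res = 0.356 * d_min  # series-termination contribution
    return math.sqrt(2.0 * (sigma_patt ** 2 + sigma_res ** 2))


# Hypothetical values: sigma_patt = 0.5 A at 2.0 A nominal resolution.
print(optical_resolution(0.5, 2.0))
```

      Because sigma_res grows linearly with d_min, the optical resolution
      worsens as the nominal resolution gets lower, even for an unchanged
      Patterson peak width.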
  
    Patterson scaling
 
      Scaling in SFCHECK is based on the Patterson origin peak which is
      approximated as a Gaussian. Compared to the conventional scaling 
      by the Wilson plot, this method is particularly advantageous when
      only low resolution data are available.
      The program gives overall B-factors estimated by both methods.
 
    Low resolution cut-off
 
      Disordered solvent contributes to diffraction at low resolution.
      However, removal of low resolution data from calculations results
      in a series termination effect which is noticeable in the electron
      density at the surface of the molecule. To reduce the influence of
      low resolution terms, SFCHECK applies a "soft" low resolution 
      cut-off to structure factors according to the formula:
 
        Fnew = Fold (1-exp(-Boff*s²)) , where Boff = 4dmax²
  
      Sfcheck uses Boff = 256. This corresponds to the low resolution 
      cut-off at 8 A.
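
      The relation between Boff and the cut-off can be checked numerically.
      This sketch assumes that s means sin(theta)/lambda = 1/(2d), which
      the text above does not state explicitly. Under that assumption the
      weight at d = dmax is 1 - exp(-1). The function names are
      hypothetical.

```python
import math


def boff_from_dmax(d_max):
    """Boff = 4 * dmax^2, as given in the formula above."""
    return 4.0 * d_max ** 2


def soft_cutoff_weight(d, boff=256.0):
    """Weight (1 - exp(-Boff * s^2)) for a reflection at d-spacing d.

    ASSUMPTION: s = sin(theta)/lambda = 1/(2d).
    """
    s = 1.0 / (2.0 * d)
    return 1.0 - math.exp(-boff * s * s)


def apply_soft_cutoff(f_old, d, boff=256.0):
    """Fnew = Fold * (1 - exp(-Boff * s^2))."""
    return f_old * soft_cutoff_weight(d, boff)


print(boff_from_dmax(8.0))  # 4 * 8^2
for d in (20.0, 8.0, 4.0, 2.0):
    print(d, soft_cutoff_weight(d))
```

      The weight falls smoothly towards zero for d-spacings well above
      8 A and approaches one at high resolution, which is what makes the
      cut-off "soft" rather than a sharp truncation.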
 
    Scaling
      
      Sfcheck scales Fobs and Fcalc by the Patterson origin peak using all
      data applying Boff.
      First, it computes Boveralls for observed and calculated amplitudes.
      Second, it makes the width of the calculated peak equal to the 
      observed, i.e. computes an additional thermal factor Badd:
        
            Badd = Boverall_obs - Boverall_calc
  
      Third, Sfcheck computes the scale factor for Fcalc:
 
                                  sum(Fobs²*(1-exp(-Boff*s²)))
            scale = sqrt ( --------------------------------------------- )
                           sum(Fcalc²*exp(-Badd*s²)*(1-exp(-Boff*s²)))
 
      Finally we have:
 
            Fcalc_scaled = Fcalc * scale * exp(-Badd*s²)   
 
      Sfcheck computes the R-factor and Correlation coefficient for all
      data applying the soft low resolution cut-off as described above. 
      Sfcheck computes the R-factor and Correlation coefficient for
      the reported resolution range and reported sigma cut-off without
      applying Boff. If the Fobs file contains reflections marked with
      the Rfree flag, the program computes Rfree.
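
      The three scaling steps can be sketched as follows. This is an
      illustration of the formulas as written above, not Sfcheck's
      implementation. The two Boverall values are taken as inputs because
      their derivation from the Patterson origin peak is not described
      here, and the function name and sample data are hypothetical.

```python
import math


def patterson_scale(f_obs, f_calc, s2, b_obs, b_calc, boff=256.0):
    """Return (scale, Badd, scaled Fcalc list).

    f_obs, f_calc : observed and calculated amplitudes
    s2            : s^2 for each reflection
    b_obs, b_calc : Boverall of observed and calculated amplitudes
    """
    b_add = b_obs - b_calc  # additional thermal factor
    num = 0.0
    den = 0.0
    for fo, fc, x in zip(f_obs, f_calc, s2):
        w = 1.0 - math.exp(-boff * x)  # soft low resolution cut-off
        num += fo * fo * w
        den += fc * fc * math.exp(-b_add * x) * w
    scale = math.sqrt(num / den)
    f_scaled = [fc * scale * math.exp(-b_add * x) for fc, x in zip(f_calc, s2)]
    return scale, b_add, f_scaled


# Hypothetical toy data: Fobs is exactly twice Fcalc and the B values agree.
f_calc = [10.0, 20.0, 5.0]
f_obs = [2.0 * f for f in f_calc]
s2 = [0.01, 0.02, 0.05]
scale, b_add, f_scaled = patterson_scale(f_obs, f_calc, s2, 20.0, 20.0)
print(scale, b_add, f_scaled)
```

      With equal Boveralls Badd is zero, so the procedure reduces to a
      single scale factor and the scaled Fcalc reproduce Fobs in this toy
      case.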
  
    Completeness 
 
      Missing data are restored by using the average values of 
      intensities for the corresponding resolution shell.
      The program produces a plot of completeness vs. resolution and
      a plot of the average radial completeness in polar coordinates
      theta and phi.
 
    Expected minimal error  
 
      The minimal coordinate error is estimated using the experimental 
      sigmas(F). The standard deviation of an atomic coordinate is 
      given by:
      
         sig_min(r) = sqrt(3)*sigma(slope)/curvature
 
              where  sigma(slope) is the standard deviation of the slope of the
                                  electron density in the x direction (along A).
                     curvature is an average curvature of the electron 
                                  density at the atomic peak centre.
         
      and computed as:
  
       sigma(slope) = (2pi*sqrt(sum(h²*(sigF)²)))/(VOL*A)
 
                     VOL - volume of cell
                     A   - cell parameter
                     h   - Miller index        
                     summation over all reflections
 
                    ( Cruickshank,D.W.J. (1949) Acta.Cryst 2, 65.) 
 
       curvature  = (2pi²*sum(h²*F))/(VOL*A²)
         
                    ( Murshudov et al., (1997) Acta.Cryst D53, 240.) 
  
      If there is no experimental sigma Sfcheck
      uses  sigma = Fobs * 0.04 for all reflections.
            
    Expected maximal error
 
      The expected maximal error in coordinates is estimated 
      by the difference between |Fobs| and |Fcalc|:
 
       sig_max(r) = sqrt(3)*sigma(slope)/curvature
 
       sigma(slope) = (2pi*sqrt(sum(h²*(Fobs-Fcalc)²)))/(VOL*A)
 
       curvature  = (2pi²*sum(h²*F))/(VOL*A²)
  
      For missing reflections the program uses the average value of 
      sigma(Fobs) for the corresponding resolution shell instead 
      of (Fobs-Fcalc).
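
      The minimal and maximal error estimates above share the same form and
      differ only in the quantity placed under the square root: sigma(F) for
      the minimal error, (Fobs-Fcalc) for the maximal error. The following is
      a minimal sketch of that arithmetic, not Sfcheck's own code. The
      function names, the sample numbers and the use of Fobs for "F" in the
      curvature sum are assumptions made for illustration.

```python
import math


def default_sigma(f_obs):
    """Fallback used when no experimental sigma is available: 0.04 * Fobs."""
    return 0.04 * f_obs


def coordinate_error(h, f, delta, volume, a):
    """sqrt(3) * sigma(slope) / curvature for one cell axis.

    h      : Miller index h of each reflection
    f      : amplitude F of each reflection (ASSUMPTION: Fobs is used here)
    delta  : sigma(F) for the minimal error, (Fobs - Fcalc) for the maximal
    volume : unit cell volume (VOL)
    a      : cell parameter A
    """
    slope_sum = sum(hi * hi * di * di for hi, di in zip(h, delta))
    sigma_slope = 2.0 * math.pi * math.sqrt(slope_sum) / (volume * a)
    curv_sum = sum(hi * hi * fi for hi, fi in zip(h, f))
    curvature = 2.0 * math.pi ** 2 * curv_sum / (volume * a ** 2)
    return math.sqrt(3.0) * sigma_slope / curvature


# Hypothetical toy data, chosen only to exercise the formulas.
h = [1, 2, 3]
f_obs = [120.0, 80.0, 40.0]
f_calc = [110.0, 85.0, 38.0]
sig_f = [default_sigma(f) for f in f_obs]
diff = [fo - fc for fo, fc in zip(f_obs, f_calc)]

sig_min = coordinate_error(h, f_obs, sig_f, 50000.0, 40.0)
sig_max = coordinate_error(h, f_obs, diff, 50000.0, 40.0)
print(sig_min, sig_max)
```

      Note that VOL cancels in the ratio, so the estimate depends only on
      the cell parameter and on the two sums over reflections.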
 
    DPI - diffraction-data precision indicator
      
      The Cruickshank method of estimation of coordinate error.
                   (Acta Cryst.(1999), D55, pp 583-601)
                 
        sig(x) = sqrt(Natoms/(Nobs-4Natoms)) * C^(-1/3) * dmin * Rfact
 
                where  C      - fractional completeness
                       Rfact  - conventional crystallographic R-factor
                       Natoms - number of atoms
                       Nobs   - number of reflections 
                       dmin   - maximal resolution (minimum d-spacing)
        
       If Rfree flags are specified, the program uses the Murshudov approach 
       to calculate DPI: 
                   (Newsletter on protein crystallography., Daresbury
                    Laboratory, (1997) 33, pp 25-30.)
 
        sig(x) = sqrt(Natoms/Nobs) * C^(-1/3) * dmin * Rfree
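
       Both DPI expressions reduce to a few arithmetic steps. The sketch
       below evaluates them, reading "C-1/3" as C to the power -1/3. The
       function names and the sample numbers are hypothetical and serve
       only as an illustration.

```python
import math


def dpi_rfactor(n_atoms, n_obs, completeness, d_min, r_fact):
    """Cruickshank DPI based on the conventional R-factor.

    Requires n_obs > 4 * n_atoms, otherwise the square root is undefined.
    """
    if n_obs <= 4 * n_atoms:
        raise ValueError("Nobs must exceed 4*Natoms for the R-factor based DPI")
    return (math.sqrt(n_atoms / (n_obs - 4 * n_atoms))
            * completeness ** (-1.0 / 3.0) * d_min * r_fact)


def dpi_rfree(n_atoms, n_obs, completeness, d_min, r_free):
    """DPI variant based on Rfree, used when Rfree flags are present."""
    return (math.sqrt(n_atoms / n_obs)
            * completeness ** (-1.0 / 3.0) * d_min * r_free)


# Hypothetical example: 1000 atoms, 95% complete data to 2.0 A.
print(dpi_rfactor(1000, 12000, 0.95, 2.0, 0.20))
print(dpi_rfree(1000, 12000, 0.95, 2.0, 0.25))
```

       The R-factor form has no real value when Nobs is not larger than
       4*Natoms, which is why the sketch rejects that case explicitly.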
    
    Luzzati plot (R-factor vs. resolution)
 
       The program computes the average radial error <delta> in coordinates 
       from the Luzzati plot.
                          <delta(r)> = 1.6 sig(x)
 
    Solvent content  
 
       The solvent content is the fraction of the unit cell volume not occupied
       by the model. The model consists of ALL atoms present in the coordinate 
       file including ordered solvent.
 
 
    Residual factor Rmerge 
 
                            sum_i (sum_j |Ij - <I>|)
                Rmerge(I) = --------------------------
                                 sum_i (sum_j (<I>))
 
                Ij  = the intensity of the jth observation of reflection i
                <I> = the mean of the intensities of all observations of
                       reflection i
 
                sum_i is taken over all reflections
                sum_j is taken over all observations of each reflection
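
       The Rmerge definition above translates directly into a double sum.
       The sketch below assumes that the observations are grouped per
       reflection in a dictionary; that layout, the function name and the
       sample intensities are hypothetical.

```python
def r_merge(observations):
    """Rmerge(I) for a mapping {reflection: [I_1, I_2, ...]}."""
    num = 0.0
    den = 0.0
    for intensities in observations.values():
        mean_i = sum(intensities) / len(intensities)
        num += sum(abs(i - mean_i) for i in intensities)  # sum_j |Ij - <I>|
        den += mean_i * len(intensities)                  # sum_j <I>
    return num / den


# Hypothetical data: two reflections, each measured twice.
obs = {
    (1, 0, 0): [90.0, 110.0],
    (0, 1, 0): [200.0, 200.0],
}
print(r_merge(obs))
```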

Local error estimation

    Local error estimation (plotted for each residue, for the backbone
    and for the side chain):
       1. Amplitude of displacement of atoms from electron density
       2. Density correlation coefficient
       3. Density index 
       4. B-factor
       5. Index of connectivity
 
    Displacement
 
      Displacement of atoms from electron density is estimated from the
      difference (Fobs - Fcalc) map. The displacement vector is the ratio of
      the gradient of difference density to the curvature. The amplitude of
      the displacement vector is an indicator of the positional error.
 
    Correlation coefficient
 
      The density correlation coefficient is calculated for each residue
      from the atomic densities of (2Fobs-Fcalc) map - "Robs" and the model
      map (Fcalc) - "Rcalc" :
 
      D_corr =  <Robs><Rcalc>/sqrt(<Robs²><Rcalc²>)
 
          where <Robs> is the mean of "observed" densities of atoms 
                of the residue (backbone or side chain).
  
                <Rcalc> is the mean of "calculated" densities of atoms 
                of the residue.
 
          The value of density for an atom from the map R(x) is given by:
 
                   sum_i ( R(xi) * Ratom(xi - xa) )
          Dens =  ---------------------------------- 
                       sum_i ( Ratom(xi - xa) ) 
 
            where  Ratom(x): the atomic electron density at grid point x.
                   xa      : vector of the centre of atom.
                   xi      : vector of the i-th point of grid.

            Sum is taken over all grid points which have distance
            from the centre of atom less than Radius_limit.
            For all atoms Radius_limit = 2.5 A.
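
          The two formulas above can be sketched as follows: a weighted
          average of map values around each atom, then D_corr from the
          per-atom densities of a residue. This is an illustration only.
          The function names are hypothetical, and treating Ratom as a
          spherically symmetric function of distance (here a Gaussian) is
          an assumption.

```python
import math


def atom_density(grid_points, map_values, atom_centre, atom_shape,
                 radius_limit=2.5):
    """Weighted mean of map values within radius_limit of the atom centre.

    atom_shape(dist) plays the role of Ratom(xi - xa).
    ASSUMPTION: Ratom depends only on the distance from the atom centre.
    """
    num = 0.0
    den = 0.0
    for xi, rho in zip(grid_points, map_values):
        dist = math.dist(xi, atom_centre)
        if dist < radius_limit:
            w = atom_shape(dist)
            num += rho * w
            den += w
    return num / den


def density_correlation(dens_obs, dens_calc):
    """D_corr = <Robs><Rcalc> / sqrt(<Robs^2><Rcalc^2>), as written above."""
    n = len(dens_obs)
    mean_obs = sum(dens_obs) / n
    mean_calc = sum(dens_calc) / n
    mean_obs2 = sum(d * d for d in dens_obs) / n
    mean_calc2 = sum(d * d for d in dens_calc) / n
    return mean_obs * mean_calc / math.sqrt(mean_obs2 * mean_calc2)


def gaussian_atom(dist):
    """Hypothetical atomic shape used only for this example."""
    return math.exp(-dist * dist)


# Toy grid: the third point lies outside the 2.5 A limit and is ignored.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
values = [2.0, 2.0, 5.0]
dens = atom_density(points, values, (0.0, 0.0, 0.0), gaussian_atom)
print(dens)
print(density_correlation([1.0, 1.2, 0.9], [1.1, 1.3, 1.0]))
```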
 
    Index of density and index of connectivity
 
      The index of connectivity is the geometric mean of the (2Fobs-Fcalc)
      electron density values for the backbone atoms N, CA and C, i.e. the
      cube root of their product. Low values of this index indicate breaks 
      in the backbone electron density which may be due to flexibility of 
      the chain or incorrect tracing.  The index of density is a similar 
      indicator which is calculated for all atoms of a given residue.
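
      A geometric mean drops sharply when any one value is small, which is
      why it is sensitive to breaks in the backbone density. The sketch
      below shows this with hypothetical density values. It takes the
      geometric mean to be the n-th root of the product, and any further
      normalisation Sfcheck may apply is not covered.

```python
import math


def geometric_mean(values):
    """n-th root of the product of positive values."""
    return math.prod(values) ** (1.0 / len(values))


# Hypothetical (2Fobs-Fcalc) densities at N, CA and C of two residues.
well_ordered = [1.1, 1.0, 0.9]
broken_chain = [1.1, 1.0, 0.05]  # almost no density at C

print(geometric_mean(well_ordered))
print(geometric_mean(broken_chain))
```

      The arithmetic means of the two residues differ by less than a
      factor of two, while the geometric means differ by more than a
      factor of two, so the weak atom dominates the index.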

Omit procedure

 
      An omit map procedure is a means of reducing the model bias in 
      the electron density calculated with model phases. SFCHECK produces 
      the so called total omit map by an automatic procedure. First, the
      initial (Fobs, PHImodel) map is divided into N boxes. For each
      box, the electron density in it is set to zero and new phases are
      calculated from this modified map. A new map is calculated using
      these phases and Fobs. This map contains the omit map for the
      given box which is stored until the procedure is repeated for
      all boxes. At the end, all the boxes with omit maps are used 
      to assemble complete omit map. Phases calculated from this complete
      omit map are combined with the initial phases. The whole procedure may
      be repeated (keyword NOMIT). Note: it is time consuming!
      Sfcheck can optionally create an output file with omit phases 
      (see HKLOUT).

Partial information

 
      Sfcheck can also run with only one input file, either coordinates or
      structure factors. In such cases it produces a limited analysis of
      the coordinates or of the data alone.

Twinning test

 
     Sfcheck checks for merohedral twinning.
     (only if the program uses one input MTZ file of structure factors)

     Perfect twinning test: <I²>/<I>² 

     Sfcheck will compute a Partial Twinning test:

          H = |I(h1)-I(h2)|/(I(h1)+I(h2))

     for the following space groups:

                           P3 P31 P32 R3           
                           P4 P41 P42 P43 I4 I41   
                           P6 P61 P62 P63 P64 P65  
                           P312 P321 R32           
                           P23 F23 I23 P213 I213   
        
     Alpha (twinning fraction) = 1/2 - <H>

     If  0.05 <Alpha< 0.45 Sfcheck can create an output MTZ file
     with detwinned data (see HKLOUT)

     For details see: Yeates,T.O. (1997) Methods in Enzymology 276, 344-358.
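
     The twinning statistics above are simple averages over the data. The
     sketch below computes them for hypothetical intensities. The function
     names are illustrative, pairing of twin-related reflections h1 and h2
     is assumed to have been done already, and skipping pairs with a
     non-positive intensity sum is an assumption added to avoid division
     by zero.

```python
def perfect_twin_ratio(intensities):
    """<I^2> / <I>^2 over a set of intensities."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)


def twin_fraction(pairs):
    """Alpha = 1/2 - <H>, with H = |I(h1)-I(h2)| / (I(h1)+I(h2)).

    pairs: iterable of (I(h1), I(h2)) for twin-related reflections.
    ASSUMPTION: pairs with I(h1)+I(h2) <= 0 are skipped.
    """
    h_values = [abs(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0]
    mean_h = sum(h_values) / len(h_values)
    return 0.5 - mean_h


def can_detwin(alpha):
    """Range in which Sfcheck can write detwinned data."""
    return 0.05 < alpha < 0.45


# Hypothetical twin-related intensity pairs.
pairs = [(100.0, 100.0), (150.0, 50.0)]
alpha = twin_fraction(pairs)
print(alpha, can_detwin(alpha))
print(perfect_twin_ratio([1.0, 3.0]))
```

     For acentric data the second moment <I²>/<I>² is expected to be close
     to 2.0 without twinning and close to 1.5 for a perfect twin.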

Input output files

HKLIN: Input MTZ file with the reflection data.
See the LABIN keyword for details.
A CIF file of structure factors is also accepted.

XYZIN: Input PDB file with the coordinates.
Must contain a CRYST1 card with the unit cell and the space group name. Sfcheck can use the information from HEADER, SCALE, MTRIX and REMARK cards.

Output files

Information is output to a PostScript file:

sfcheck_<identifier>.ps

Sfcheck can create:

HKLOUT: Only required if an output MTZ file is needed. The new MTZ file contains the reflection data (F, sigF), OMIT phases (Fobs, Ph, FOM) or detwinned Fobs (F, sigF).

MAPOUT: Only required if an output map file is needed. The output is a CCP4 map file with the density extracted around the model.

PATH_SCR: You can use this variable to redirect all scratch files to a specific directory.
The default is the value of the TEMP1 or CCP4_SCR variable.
With 'PATH_SCR .' all scratch files are written to the current directory.

PATH_OUT: This variable redirects the output PostScript file to the given directory.

Keywords

The available keywords are:

LABIN, NOMIT

LABIN <program label>=<file label>...

Specify input column labels. Only needed if an MTZ file is input.

Sfcheck labels defined are: F, SIGF, F(-), SIGF(-), I, SIGI, I(-), SIGI(-), FREE

F	label of F or F(+)
SIGF	label of sigma F or sigma F(+)
F(-)	label of F(-)
SIGF(-)	label of sigma F(-)
FREE	label of the Rfree flag
I	label of the intensity of hkl
SIGI	label of the standard deviation of the above
I(-)	label of the intensity of -h -k -l
SIGI(-)	label of the standard deviation of the above

NOMIT <nomit>

Only required if the omit procedure is needed.

<nomit> is the number of cycles of the omit procedure. The default is 0, i.e. no omit procedure. The recommended value is <nomit>=2.

Command file examples

Example of a typical command file:

  There are two input files: MTZ and PDB.
  Sfcheck will assess the agreement between the atomic 
  model and X-ray data. 

# --------------------------------
sfcheck HKLIN test.mtz XYZIN 2sar.pdb << eor
# --------------------------------
LABIN F=F SIGF=SIGF FREE=FreeR_flag 
eor

Example for omit procedure :

  There are two input files: MTZ and PDB.
  For the assessment Sfcheck will use the phases from the omit procedure. 
  A new MTZ file with OMIT phases will be created. Because PATH_OUT is
  set, the output PostScript file will be placed in the directory: 
  /y/people/alexei/

# --------------------------------
sfcheck HKLIN test.mtz XYZIN 2sar.pdb \
HKLOUT new.mtz PATH_OUT /y/people/alexei/ \
 << eor
# --------------------------------
LABIN F=F SIGF=SIGF FREE=FreeR_flag 
NOMIT 2
eor