Crystallographic Error Analysis

A method to detect problem residues in a crystallographic refinement is presented. Each protein residue is analyzed in terms of its mainchain (mc) and sidechain (sc) atoms. Poorly modeled regions in the structure are reliably identified. The results may be plotted or viewed with ribbons.

Background

Reference

For background on the method and for purposes of citation the following reference is given:

M. Carson, T.W. Buckner, Z. Yang, S.V.L. Narayana and C.E. Bugg (1994) Error Detection in Crystallographic Models. Acta Cryst. D 50:900-909.

Abstract

A variety of criteria were tested for identifying errors in protein crystal coordinates. Statistical analysis was based on comparisons of a highly refined crystal structure and several preliminary models derived from molecular replacement. A protocol employing temperature factors, real-space fit residuals, geometric strains, dihedral angles, and shifts from the previous refinement cycle is developed. These results are generally applicable to the detection of errors in partially refined protein crystal structures.

Key Points

Crystallographic data are required to reliably assess the quality of a coordinate file. Statistical analysis implies that a linear model of 5 independent variables (see abstract) is required to fit the error. The error model was taken as the deviation between the preliminary coordinates and the final refined coordinates. Grossly incorrect residues (1.0 Å deviation) are identified with approximately 90% accuracy for the mainchain and 70% for the sidechains. Real-space fit using maps calculated with individual B-factors was the single criterion having the highest per-residue correlation with coordinate error (about 0.7).

Executing the Analysis

The Full Protocol: ribbon-errors

usage:
ribbon-errors < errors.in > diagnostic.output

X-PLOR and ribbons must be set up as expected on your system (see Program Environment later in this chapter).

Required Data

The following 5 files must be present to begin (see Data Preparation for Analysis later in this chapter):

final.pdb -- your final coordinates, with individual B-factors
prev.pdb -- the coordinates before the last round of simulated annealing (SA) refinement
Fobs.map -- your best `observed' map, FRODO format
Fcal.map -- a purely calculated map, same scale as above
Xplor.inp -- a short list of X-PLOR commands to set parameters

Input File

The input file consists of a name tag and the list of files above in the order above. The sample input `errors.in' is shown below. (notes in parentheses are not actually in the file):

 fdm_may94		(name_tag)
 fdm_bref.pdb		(final.pdb)
 fdm_prep.pdb		(prev.pdb)
 fdm_2fo.dsn6		(Fobs = 2Fo-Fc.map)
 fdm_calc.dsn6		(Fcal.map)
 fdm_xa_ribbons.inp	(Xplor.inp)

The script may also be run interactively by answering the prompts for input.
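For those who prefer to generate the input file rather than type it, here is a minimal sketch in Python. It assumes one entry per line with no embedded whitespace, which is how the sample above reads but is not something the script itself is known to enforce. The function name `build_errors_input` is hypothetical and not part of ribbons.

```python
def build_errors_input(name_tag, final_pdb, prev_pdb, fobs_map, fcal_map, xplor_inp):
    """Return input-file text: the name tag, then the five files in the required order."""
    entries = [name_tag, final_pdb, prev_pdb, fobs_map, fcal_map, xplor_inp]
    for entry in entries:
        # Assumption: each entry is a single token on its own line.
        if not entry or any(ch.isspace() for ch in entry):
            raise ValueError(f"entry must be a non-empty name without whitespace: {entry!r}")
    return "\n".join(entries) + "\n"


# The sample values from the listing above.
text = build_errors_input(
    "fdm_may94",
    "fdm_bref.pdb",
    "fdm_prep.pdb",
    "fdm_2fo.dsn6",
    "fdm_calc.dsn6",
    "fdm_xa_ribbons.inp",
)
```

Writing `text` to `errors.in' gives a file in the same order as the sample, without the parenthetical notes.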

Output Files

The `diagnostic.output' file is mostly an echo of prompts and input, useful only if something goes wrong (see Problems? at the end of the chapter). The last line should read: ` Successful completion of 'ribbon-errors'.'

The script produces 11 ASCII files of per-residue information, with each file name prefixed with the `name_tag' set on input:

name_tag_rsr.list -- Real Space Residual fit of model to map
name_tag_shift.list -- Shift (rms) from previous position
name_tag_bf.list -- Temperature (B) factors
name_tag_geom.list -- Geometric strain of bonds, angles, planes
name_tag_dihe.list -- Dihedral sensibilities of mainchain and sidechain
name_tag_rsr.log -- Additional diagnostics for the RSR step
name_tag_xa.hbss -- Mainchain H-bonding used to get secondary structure
name_tag_xa.rama -- Phi/Psi/Omega angles, ribbons format
name_tag_xa.chis -- Sidechain Chi angles, ribbons format
name_tag_xa.ss -- Xtal error Analysis ribbons *_xa.ss file
name_tag_error.list -- Crystallographic error analysis summary list
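
After a run it can be handy to confirm that all 11 files were written. The following Python sketch builds the expected names from the list above; the helper names `expected_outputs` and `missing_outputs` are hypothetical, and the check only looks for the files, not at their contents.

```python
import os

# Suffixes taken from the list of output files above.
OUTPUT_SUFFIXES = [
    "_rsr.list", "_shift.list", "_bf.list", "_geom.list", "_dihe.list",
    "_rsr.log", "_xa.hbss", "_xa.rama", "_xa.chis", "_xa.ss", "_error.list",
]


def expected_outputs(name_tag):
    """Return the 11 output file names for a given name tag."""
    return [name_tag + suffix for suffix in OUTPUT_SUFFIXES]


def missing_outputs(name_tag, directory="."):
    """Return the expected output files that are not present in `directory`."""
    return [
        name for name in expected_outputs(name_tag)
        if not os.path.isfile(os.path.join(directory, name))
    ]
```

An empty list from `missing_outputs("fdm_may94")` means every expected file exists in the current directory.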

The Best Single Criterion: ribbons-rsr

usage:
ribbons-rsr final.pdb Fobs.map Fcal.map rsr.list > diagnostics

ribbons must be set up as expected on your system (see Program Environment later in this chapter).

Required Data

The following 3 files must be present to begin (see Data Preparation for Analysis later in this chapter):

final.pdb -- your final coordinates, with individual B-factors
Fobs.map -- your best `observed' map, FRODO format
Fcal.map -- a purely calculated map, same scale as above

Output Files

The `diagnostics' file is mostly an echo of input data, useful only if something goes wrong (see Problems? at the end of the chapter).

The script produces a single ASCII file of per-residue information, `rsr.list', with the Real Space Residual fit of model to map.

Interpreting the Results

The method averages the rsr (R), shift (S), bf (B), geom (G), and dihe (D) results for mainchain (mc) and sidechain (sc) atoms to determine an overall error factor (E), expressed in standard deviation (sigma) units relative to the mean. Residues with E-values > 1.0 are most probably in error. Results are kept for each criterion and finally combined into a summary file.
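
To make "standard deviation units relative to the mean" concrete, here is a small Python sketch of the idea. It is not the code ribbons runs. It assumes a population standard deviation (the manual does not say which form is used), and the raw numbers are invented purely for illustration.

```python
from statistics import mean, pstdev


def sigma_scores(values):
    """Express each value as (value - mean) / standard deviation over all residues."""
    center = mean(values)
    spread = pstdev(values)  # assumption: population form of the standard deviation
    if spread == 0:
        return [0.0 for _ in values]
    return [(v - center) / spread for v in values]


def average_scores(*criteria):
    """Average several per-residue sigma-score lists, residue by residue."""
    return [mean(column) for column in zip(*criteria)]


# Invented raw values for four residues (not taken from any real structure).
rsr_raw = [0.15, 0.28, 0.14, 0.16]
bf_raw = [18.0, 41.0, 20.0, 17.0]
combined = average_scores(sigma_scores(rsr_raw), sigma_scores(bf_raw))
```

In this toy data the second residue has both the worst fit and the highest B-factor, so it receives the largest combined score.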

The Individual ribbons *.list files

Some sample lines from the RSR output `neu_dec93_rsr.list' follow. Raw data for each criterion are maintained. These *.list files may be converted into PostScript plots.

 res# aa  R-all  R-mc  R-sc
   84  D   0.281 0.220 0.320
   85  F   0.147 0.140 0.148
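
If you want to work with these numbers outside of ribbons, the listing is easy to read. The Python sketch below assumes five whitespace-separated columns exactly as in the sample lines above; lines that do not fit (such as the header) are skipped. The names `RsrRecord` and `parse_rsr_list` are hypothetical.

```python
from typing import NamedTuple


class RsrRecord(NamedTuple):
    resnum: int
    aa: str
    r_all: float
    r_mc: float
    r_sc: float


def parse_rsr_list(text):
    """Parse 'res# aa R-all R-mc R-sc' lines, skipping anything that does not fit."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        try:
            records.append(RsrRecord(int(fields[0]), fields[1],
                                     float(fields[2]), float(fields[3]), float(fields[4])))
        except ValueError:
            continue  # header or other non-numeric line
    return records


SAMPLE = """\
 res# aa  R-all  R-mc  R-sc
   84  D   0.281 0.220 0.320
   85  F   0.147 0.140 0.148
"""
records = parse_rsr_list(SAMPLE)
```

Real files may contain residues whose lines differ from this layout (I have only the two sample lines to go on here), so treat skipped lines as something to inspect rather than ignore.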

The ribbons *_error.list file

Some sample lines from the summary `neu_dec93_error.list' follow. Phe 85 is OK, while Asp 84 is not. The latter has significantly poor rsr, B-factor, and shift values for its mainchain atoms (2.8 to 3.7 sigma above the mean), while its adherence to ideal geometry and allowed dihedral angles is good. The biggest problem with Phe 85 is its mainchain B-factor, which is 1.0 sigma above the mean. The data:

res#  aa    Eave   Emc   Esc   Rmc  Bmc  Smc  Gmc  Dmc  Rsc  Bsc  Ssc  Gsc  Dsc
  84   D    1.64  1.86  1.43   3.7  2.8  3.7 -0.4 -0.5  3.8  2.9  1.1 -0.3 -0.4
  85   F    0.05  0.20 -0.11   0.7  1.0  0.6 -0.9 -0.5  0.1  0.2 -0.2 -0.3 -0.4
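
The summary lines can be read and screened with a few lines of Python. This sketch assumes the 15 whitespace-separated columns shown in the sample and applies the E > 1.0 cutoff described under Interpreting the Results; the function names are hypothetical.

```python
from statistics import mean

COLUMNS = ["Eave", "Emc", "Esc", "Rmc", "Bmc", "Smc", "Gmc", "Dmc",
           "Rsc", "Bsc", "Ssc", "Gsc", "Dsc"]

SAMPLE = """\
res#  aa    Eave   Emc   Esc   Rmc  Bmc  Smc  Gmc  Dmc  Rsc  Bsc  Ssc  Gsc  Dsc
  84   D    1.64  1.86  1.43   3.7  2.8  3.7 -0.4 -0.5  3.8  2.9  1.1 -0.3 -0.4
  85   F    0.05  0.20 -0.11   0.7  1.0  0.6 -0.9 -0.5  0.1  0.2 -0.2 -0.3 -0.4
"""


def parse_error_list(text):
    """Return one dict per residue line; header and malformed lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 15 or not fields[0].lstrip("-").isdigit():
            continue
        row = {"resnum": int(fields[0]), "aa": fields[1]}
        row.update(zip(COLUMNS, (float(f) for f in fields[2:])))
        rows.append(row)
    return rows


def flagged(rows, cutoff=1.0):
    """Residue numbers whose average error factor exceeds the cutoff."""
    return [row["resnum"] for row in rows if row["Eave"] > cutoff]


def mean_of(row, part):
    """Mean of the five criteria (R, B, S, G, D) for part 'mc' or 'sc'."""
    return mean(row[letter + part] for letter in "RBSGD")


rows = parse_error_list(SAMPLE)
```

For the two sample lines, `flagged(rows)` picks out residue 84 only, and `mean_of(row, "mc")` reproduces the Emc column to within the rounding of the printed values.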

The ribbons *_xa.ss file

The corresponding `neu_dec93_xa.ss' file assigns a letter grade, and can be used directly with ribbons to visualize problem areas. The data:

   84   D   c   c   x   C   C   C   E   D   E   A   A   E   D   C   A   A
   85   F   c   c   x   B   B   A   B   C   B   A   A   B   B   A   A   A

Scores are based on the value of an error criterion in standard deviation units relative to the mean for all the residues. These scores are assigned colors for visualization with ribbons.
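
The grade boundaries are not spelled out here, but comparing the two sample lines of the summary list with the two sample lines of the *_xa.ss file suggests one grade per sigma unit. The Python sketch below encodes that guess: A below 0, B from 0 up to 1, C from 1 up to 2, D from 2 up to 3, and E at 3 and above. These thresholds are inferred from those four lines only and may not match what ribbons does at the boundaries or beyond E.

```python
import math


def letter_grade(sigma):
    """Map a sigma score to a letter, using thresholds inferred from the sample lines."""
    if sigma < 0:
        return "A"
    # One letter per whole sigma unit, capped at E (assumption: no grade beyond E).
    return "ABCDE"[min(int(math.floor(sigma)) + 1, 4)]


# The 13 sigma values for each residue, copied from the summary sample.
ASP84 = [1.64, 1.86, 1.43, 3.7, 2.8, 3.7, -0.4, -0.5, 3.8, 2.9, 1.1, -0.3, -0.4]
PHE85 = [0.05, 0.20, -0.11, 0.7, 1.0, 0.6, -0.9, -0.5, 0.1, 0.2, -0.2, -0.3, -0.4]

grades_84 = "".join(letter_grade(s) for s in ASP84)
grades_85 = "".join(letter_grade(s) for s in PHE85)
```

With these thresholds the computed grades agree with the last 13 letters on each sample *_xa.ss line. The first three fields after the residue name on those lines are not grades and are not interpreted here.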

Display of Results

Bar Graph Plots

The rsr (R), shift (S), bf (B), geom (G), dihe (D), and error summary (E) *.list files can all be converted into individual per-residue PostScript plots. See the documentation in PostScript Plots for options and examples. For default plots, the commands are:

rsr-ps your_rsr.list > your_rsr.ps
rms-ps your_shift.list > your_shift.ps
bf-ps your_bf.list > your_bf.ps
geom-ps your_geom.list > your_geom.ps
dihe-ps your_dihe.list > your_dihe.ps
sig-ps your_error.list > your_error.ps
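
When all six lists come from one run, the commands above differ only in the name tag. This Python sketch prints them for a given tag. It assumes the name_tag_*.list naming from the full protocol and writes each plot next to its list; the function name `plot_commands` is hypothetical.

```python
import shlex

# List type -> plotting program, as given in the commands above.
LIST_PLOTTERS = {
    "rsr": "rsr-ps",
    "shift": "rms-ps",
    "bf": "bf-ps",
    "geom": "geom-ps",
    "dihe": "dihe-ps",
    "error": "sig-ps",
}


def plot_commands(name_tag):
    """Return one shell command line per *.list file for the given name tag."""
    commands = []
    for kind, program in LIST_PLOTTERS.items():
        source = f"{name_tag}_{kind}.list"
        target = f"{name_tag}_{kind}.ps"
        commands.append(f"{program} {shlex.quote(source)} > {shlex.quote(target)}")
    return commands


for command in plot_commands("fdm_may94"):
    print(command)
```

The output can be pasted into a shell or saved as a script; nothing is executed by the sketch itself.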

Ramachandran-like plots can also be created:

rama-ps < your_xa.rama > rama.ps
chis-ps < your_xa.chis > chis.ps

Viewing with ribbons (on supported workstations)

For a standard ribbon drawing of the single protein chain analyzed with ribbon-errors, your final *_xa.ss file and your final *.pdb file are required. Issue the commands to set up ribbons (note: this will create the files `final.model', `final.coords', and `final.ribbons' in your current directory). Then invoke the display program:

pdb-ss-model final.pdb final_xa.ss "Optional title"
ribbons -n final

Hot spots in the structure should be obvious. The default coloring scheme for grading is as follows:

ribbons version 2.5 or later employs an X Windows/Motif interface. Choose pop-up menus by pressing the LeftMouseButton on the labeled Menubar at the top of the screen. Use the Ribbon Style Panel Sequence Color Widget to select any analysis feature for display by selecting the `Ribbon Style' choice of `Edit' from the Menubar.

Pressing the ALT key while the cursor is over a residue will display information about the error criteria. Press ALT with the cursor over the background to clear the message.

Ramachandran-like plots can be viewed directly on SGI machines as follows:

rama-plot < your_xa.rama
chis-plot < your_xa.chis

Data Preparation for Analysis

Coordinate Files

The RSR protocol requires only one set of coordinates. Two sets of coordinates in PDB format *.pdb are required for the full protocol: the current best model and a comparison set. (I use the coordinates prior to the last round of simulated annealing refinement with X-PLOR for comparison.) You must split the PDB coordinate file of your crystal model into pieces, unless you have a monomer with no co-factors or waters. For example, if your current model contains a dimer, a co-factor, and some waters, you must create 3 files: one for each monomer and one for the non-protein atoms. This splitting must be repeated for the comparison set. This is similar to the data preparation required for X-PLOR before the `generate' step.

Each *.pdb file must have as its last line the `END' record. The current best model must have the polar hydrogens used by X-PLOR. Each protein chain subjected to the full analysis should contain only standard amino acids with standard atom names, else the results will be suspect for the non-standard residues. The programs cannot distinguish between `mainchain' and `sidechain' or look up the residue's preferred conformations for the `hetero' atom files; thus, less information is output for these.
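
Two of the requirements above, the trailing `END' record and the use of standard amino acids only, are easy to check before starting. The Python sketch below does so using the standard fixed PDB columns for the residue name. The helpers `atom_line` and `check_pdb_text` are hypothetical, and the demonstration records are invented.

```python
# The 20 standard amino acids by their three-letter PDB residue names.
STANDARD_AA = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def atom_line(serial, name, resname, resseq, x, y, z, b=20.0):
    """Format one fixed-column PDB ATOM record (chain A, occupancy 1.00)."""
    return "ATOM  %5d %-4s %3s A%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, resname, resseq, x, y, z, 1.0, b)


def check_pdb_text(text):
    """Return (ends_with_END, sorted list of non-standard residue names)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ends_with_end = bool(lines) and lines[-1].split()[0] == "END"
    nonstandard = sorted({
        ln[17:20].strip() for ln in lines
        if ln.startswith(("ATOM", "HETATM")) and ln[17:20].strip() not in STANDARD_AA
    })
    return ends_with_end, nonstandard


# Invented records: one acceptable file, one with a leading ACE and no END.
GOOD = "\n".join([atom_line(1, "N", "ASP", 84, 0, 0, 0),
                  atom_line(2, "CA", "PHE", 85, 1, 0, 0), "END"])
BAD = "\n".join([atom_line(1, "C", "ACE", 1, 0, 0, 0),
                 atom_line(2, "N", "ASP", 2, 1, 0, 0)])
```

A file reported with non-standard names is not necessarily wrong, but per the note above its results for those residues should be treated as suspect.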

Map Calculation

Two electron density maps in FRODO *.dsn6 format are required: the best observed and a purely calculated map. (I use 2Fo-Fc coefficients with calculated phases for the `observed' map. Our results show omit maps are not significantly better.) The calculated map must be produced at the same scale, with calculated amplitudes and phases. Each map must completely cover all atoms to be analyzed, so please add a cushion of at least 3 Å. The two maps must be exactly the same size in grid points. To create the maps with X-PLOR, copy the sample input file into your directory:

cp $RIBBONS_HOME/analysis/xplor/rsr_maps.inp .

Edit this file to incorporate your data and set the file names for the output maps. The sample data is from the monoclinic crystal form of Factor D. Here is a shell script ( $RIBBONS_HOME/analysis/xplor/rsr_maps.csh ) to produce the maps (I usually do this interactively):


#!/bin/csh
#
#	'xplor'  executes Brunger's X-PLOR.
#	'xmappage'  executes his utility to create FRODO maps.
#
#  calculate the maps and create the output X-PLOR *.map files.
#  you can save the output to create Luzzati plots.
#
#  this file is: $RIBBONS_HOME/analysis/xplor/rsr_maps.inp
#
xplor < rsr_maps.inp > rsr_maps.out

#  convert the *.map files to FRODO binary format
#
xmappage << END
fdm_2fo.map
fdm_2fo.dsn6
END

xmappage << END
fdm_calc.map
fdm_calc.dsn6
END

#   pitch the big ascii *.map files
#
rm fdm_2fo.map fdm_calc.map

Of course you can use any appropriate program to create the maps, e.g., Wm. Furey's PHASES package.
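
Whatever program makes the maps, the requirement that they cover all atoms plus a cushion can be estimated from the coordinates first. The Python sketch below reads the x, y, z columns of ATOM and HETATM records and returns the padded Cartesian limits in Angstroms. Converting those limits into fractional or grid extents for your cell is left to the map program and is not shown. The helper names and the two demonstration atoms are invented.

```python
def atom_line(serial, name, resname, resseq, x, y, z, b=20.0):
    """Format one fixed-column PDB ATOM record (chain A, occupancy 1.00)."""
    return "ATOM  %5d %-4s %3s A%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, resname, resseq, x, y, z, 1.0, b)


def padded_bounds(pdb_text, cushion=3.0):
    """Return ((xmin, xmax), (ymin, ymax), (zmin, zmax)) widened by `cushion`."""
    xs, ys, zs = [], [], []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Standard PDB columns 31-38, 39-46, 47-54 hold x, y, z.
            xs.append(float(line[30:38]))
            ys.append(float(line[38:46]))
            zs.append(float(line[46:54]))
    if not xs:
        raise ValueError("no ATOM/HETATM records found")
    return tuple((min(v) - cushion, max(v) + cushion) for v in (xs, ys, zs))


# Two invented atoms, at (0, 0, 0) and (10, 5, -2).
DEMO = "\n".join([atom_line(1, "N", "ASP", 84, 0, 0, 0),
                  atom_line(2, "CA", "ASP", 84, 10, 5, -2), "END"])
bounds = padded_bounds(DEMO)
```

The default cushion of 3.0 matches the minimum suggested above; a larger value does no harm beyond a bigger map.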

X-PLOR Input Script

The supplied ribbon-errors command script that runs the full error analysis protocol relies on X-PLOR and the X-PLOR shell language. It is assumed that you are reasonably comfortable with X-PLOR. A local X-PLOR command file must be copied and edited to incorporate your data and limit the atoms of interest:

cp $RIBBONS_HOME/analysis/xplor/xplor_ribbons.inp .

Edit the file in the three places where comments draw your attention. You must: 1) set your X-PLOR *.psf file name, 2) include any required parameter files, and 3) most importantly, set the selection statement at the end to include only the inclusive residue numbers of the atoms of the particular protein chain or set of atoms being analyzed.

But I Don't Use X-PLOR!

The X-PLOR shell language is used as a convenience (instead of writing more auxiliary programs). For each protein chain or group of heteroatoms, the script: 1) averages temperature factors for each residue and its mc and sc atoms; 2) determines the rms shift from the previous model in the same fashion; 3) evaluates the geometric strain energy due to bond, angle, and planar deviations from ideality for each residue and its mc and sc atoms. Each of the 3 outputs is a simple formatted ASCII list. Only the last calculation really uses X-PLOR.

Alternative refinement programs such as PROLSQ or TNT could likely give an equivalent list of geometric strain. Anyone interested in adapting the protocol to a different refinement program should contact me (who is counting on your being an expert on the refinement program of choice!).
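
As an illustration that steps 1 and 2 need nothing from X-PLOR, here is a Python sketch of both. It assumes a single chain per file (as required above), takes N, CA, C, and O as the mainchain atoms (my assumption for this sketch; the script's own selection is not stated here), and matches atoms between the two models by residue number and atom name. All names and demonstration values are invented, and the output is not in the *.list format.

```python
from collections import defaultdict
from math import sqrt

MAINCHAIN = {"N", "CA", "C", "O"}  # assumption: everything else counts as sidechain


def atom_line(serial, name, resname, resseq, x, y, z, b=20.0):
    """Format one fixed-column PDB ATOM record (chain A, occupancy 1.00)."""
    return "ATOM  %5d %-4s %3s A%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, resname, resseq, x, y, z, 1.0, b)


def read_atoms(pdb_text):
    """Map (residue number, atom name) -> (x, y, z, B) for ATOM records."""
    atoms = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            key = (int(line[22:26]), line[12:16].strip())
            atoms[key] = (float(line[30:38]), float(line[38:46]),
                          float(line[46:54]), float(line[60:66]))
    return atoms


def part_of(atom_name):
    return "mc" if atom_name in MAINCHAIN else "sc"


def mean_b(atoms):
    """Average B-factor per (residue, 'mc' or 'sc')."""
    groups = defaultdict(list)
    for (res, name), (_, _, _, b) in atoms.items():
        groups[(res, part_of(name))].append(b)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def rms_shift(final_atoms, prev_atoms):
    """Rms displacement per (residue, part) over atoms present in both models."""
    groups = defaultdict(list)
    for key, (x, y, z, _) in final_atoms.items():
        if key in prev_atoms:
            px, py, pz, _ = prev_atoms[key]
            groups[(key[0], part_of(key[1]))].append(
                (x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2)
    return {key: sqrt(sum(vals) / len(vals)) for key, vals in groups.items()}


# Invented three-atom residue in a "final" and a "previous" model.
FINAL = read_atoms("\n".join([atom_line(1, "N", "ASP", 84, 0, 0, 0, 10.0),
                              atom_line(2, "CA", "ASP", 84, 1, 0, 0, 20.0),
                              atom_line(3, "CB", "ASP", 84, 2, 0, 0, 30.0)]))
PREV = read_atoms("\n".join([atom_line(1, "N", "ASP", 84, 0, 0, 0),
                             atom_line(2, "CA", "ASP", 84, 1, 0, 2),
                             atom_line(3, "CB", "ASP", 84, 2, 3, 4)]))
```

Calling `mean_b(FINAL)` and `rms_shift(FINAL, PREV)` gives one value each for the mainchain and sidechain of residue 84. The geometric strain of step 3 is the part that still needs a refinement program.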

Program Environment

The default script to execute the error analysis depends heavily on Axel Brunger's X-PLOR program (see above if you don't have X-PLOR).

You are expected to have a command named `xplor' that executes the X-PLOR program. (You could change this by editing the definition of `run_xplor' in the `ribbon-errors' script in $RIBBONS_HOME/bin.) Additionally, you should set the environment variable `TOPPAR' to point to the X-PLOR topology and parameter directory.

ribbons must be installed correctly on your system (see Installation Notes). You must: 1) have at least the main ribbons directory and the /analysis, /bin, /data, and /help subdirectories, 2) have the environment variable `RIBBONS_HOME' set to point to the root of the ribbons directory tree, 3) have `$RIBBONS_HOME/bin' added to your command path.
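
The requirements in this section can be checked in one pass before a run. The Python sketch below is one way to do it. The function name `environment_problems` is hypothetical, the path comparison is a plain string match (so a trailing slash in PATH would defeat it), and the `isdir` and `which` arguments exist only so the check can be exercised without a real installation.

```python
import os
import shutil

REQUIRED_SUBDIRS = ("analysis", "bin", "data", "help")


def environment_problems(env=None, isdir=os.path.isdir, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []

    home = env.get("RIBBONS_HOME")
    if not home:
        problems.append("RIBBONS_HOME is not set")
    else:
        for sub in REQUIRED_SUBDIRS:
            path = os.path.join(home, sub)
            if not isdir(path):
                problems.append(f"missing directory: {path}")
        if os.path.join(home, "bin") not in env.get("PATH", "").split(os.pathsep):
            problems.append("$RIBBONS_HOME/bin is not on the command path")

    if not env.get("TOPPAR"):
        problems.append("TOPPAR is not set")

    # Both programs are expected to be reachable by these command names.
    for program in ("xplor", "ribbons"):
        if which(program, path=env.get("PATH")) is None:
            problems.append(f"command not found: {program}")
    return problems
```

Calling `environment_problems()` with no arguments inspects the current process environment and prints nothing by itself; report the returned list however suits you.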

See the VMS section if you are running on a VAX.

Problems?

The script tries to detect whether each required program and input file is available. It does not check file types. Did you enter the input files in the right order? Are xplor and ribbons installed as expected on your system? (see Program Environment). Did you supply all required information for X-PLOR? (see X-PLOR Input Script).

X-PLOR produces voluminous and generally useless diagnostics during the procedures. This output and the temporary input are generally deleted. For debugging purposes, copy the error script to your current directory:

cp $RIBBONS_HOME/bin/ribbon-errors .

Change the 27th line of the file: #set DebugXplor = "Yes"
Simply remove the leading comment character `#'. (For VAX, see the VMS section of the manual). Re-run the script and study the generated files `xa_NNN.inp' and `xa_NNN.out', where NNN is the process number generated by your operating system.

Do you have non-standard amino acids in your protein *.pdb file? (A leading ACE residue may seriously confuse ribbons.) If the RSR results are suspicious, make sure there are no associated warnings in the `your_rsr.log' file for grid points out of range. If still baffled, send me e-mail: carson@cmc.uab.edu.

VMS???

E-mail me if you are serious: carson@cmc.uab.edu.

Bibliography

FactorD
Ribbons
X-PLOR
FRODO
O
Phases
PROLSQ
TNT: "TNT" stands for "Ten Eyck 'n' Tronrud" ("'n'" = "and"). As far as I know, TNT is still being maintained by Dale Tronrud. His e-mail address is "dale@uoxray.uoregon.edu".

Ribbons User Manual / UAB-CMC / carson@cmc.uab.edu