2
Methodology

This chapter provides a brief account of the basic ideas, practical consequences and implementation of the new features in this release of the X-PLOR software for X-ray crystallographic refinement and NMR structure determination.

X-ray crystallographic refinement
Torsion angle dynamics for structure refinement
Background
Conventional simulated annealing refinement treats each individual atom as an independent entity that moves subject to the forces acting upon it. However, the force constants that govern bond lengths and angles are relatively large and permit little variation from ideal values. Thus, it is natural to consider torsion angles as appropriate degrees of freedom with which to productively search the main conformational space available to the molecule and divide the structure into a set of completely rigid torsion groups.
By eliminating bond lengths and angles as degrees of freedom from the molecule it is also possible to use much larger time steps for integrating the equations of motion with torsion angle dynamics than is possible with conventional molecular dynamics. The most useful consequence of the ability to run stable refinements with large time steps is that difficult refinement problems, which require simulated annealing at high temperatures in order to escape from incorrect conformations, can be tackled. For conventional molecular dynamics the computational cost of high temperature refinements is prohibitively high because very small time steps are required to control vibrations along the bond lengths and angles.
The application of torsion angle dynamics to X-ray crystallographic refinement has been shown to give somewhat improved results over conventional simulated annealing methods at a given temperature (Rice and Brünger 1994). More significantly, torsion angle refinements starting at very high temperatures give converged refinements for poor starting models -- a result that can not be achieved using conventional simulated annealing protocols confined to lower temperatures.
The code for the torsion angle dynamics algorithm in X-PLOR (Rice and Brünger 1994), released in interim form in X-PLOR 3.851, has now been optimized to give better performance. For a 120-amino-acid protein, the integration of the equations of motion is approximately 40% faster than in the original implementation.
Implementation
Example scripts for X-ray crystallographic refinement with torsion angle dynamics are found in the subdirectory /tutorial/xtal_torsion/.
The parameters that control the torsion angle dynamics are contained in a dynamic torsion command block, which is terminated with an end statement. The available parameters can be classified into two types (i) parameters that control the initial setup of the torsion angle topology and (ii) parameters that control the dynamics run and which are analogous to parameters available for conventional molecular dynamics.
Specification of topology
The parameters that control the specification of the torsion topology are contained within a topology sub-block which is terminated with an end statement.
The available parameters are:
- FIX TORSion <selection> which constrains selected torsions.
- FIX GROUp <selection> which constrains all torsions in <selection>.
- FREE TORSion <selection> which removes a torsion constraint.
- FREE BOND <selection> which ignores a connection for generating the topology.
- KDIHmax = <real> which sets a threshold energy for establishing free torsions.
- MAXTree = <integer> which sets the maximum number of trees in the structure.
- MAXChain = <integer> which sets the maximum number of chains per tree.
- MAXLength= <integer> which sets the maximum length of a chain in a tree.
- MAXBond = <integer> which sets the maximum number of bonds allowed per atom.
- MAXDihe = <integer> which sets the maximum number of torsions allowed per atom.
- MAXJoint = <integer> which sets the maximum number of branches per atomic group.
- RESET is used to eliminate any previous topology specifications.
Molecular dynamics control
Once the topology of the structure is set, any number of dynamic torsion command blocks can be added to control the torsion angle dynamics application.
The following parameters are supported for specifying the simulated annealing protocol:
- ASCIi=<logical> writes a formatted (ASCII) coordinate trajectory file if ASCIi is TRUE; if ASCIi is FALSE an unformatted (binary) file is written.In the former case, a scale factor and an offset are applied to the co-ordinates and the result is converted to an integer number.
- CSEL=<selection> selects which atoms are written to the coordinate trajectory file.
- ILBFrq=<integer> updates the Langevin boundary between normal and Langevin particles in conjunction with RBUF and ORIGin.
- NSTEp=<integer> sets the number of dynamics steps.
- NPRInt=<integer> determines the frequency with which the energy information is printed.
- NSAVC=<integer> determines the frequency with which the coordinates are written to the trajectory file.
- NSAVV=<integer> determines the frequency with which the coordinates are written to the coordinate trajectory file.
- OFFSet=<real> provides the offset for the ASCIi=TRUE option.
- ORIGin=<vector> is the origin of the sphere of the Langevin boundary.
- RBUF=<real> specifies the radius of the spherical boundary in for the normal and Langevin particles.
- REASsign=<logical> reassigns the velocities if TRUE.
- SCALe=<real> provides a scale factor for the ASCIi=TRUE option.
- TBATh=<real> specifies the temperature of the heat bath or the target temperature for the temperature coupling.
- TIMEstep=<real> sets the time step for the dynamics.
- TRAJectory=<filename> provides the filename of the coordinate trajectory file.
- VSEL=<selection> selects atoms that are written to the velocity trajectory file.
- VELOcity=<filename> names the velocity trajectory file.
- VASCii=<logical> writes a formatted (ASCII) coordinate trajectory file if ASCIi is TRUE; if ASCIi is FALSE an unformatted (binary) file is written.In the former case, a scale factor and an offset are applied to the velocities and the result is converted to an integer number.
- VOFFset=<real> provides the offset for the ASCIi=TRUE option.
- VSCALe=<real> provides a scale factor for the ASCIi=TRUE option.
Maximum likelihood targets for structure refinement
Notes provided by N.S. Pannu and R.J. Read were used in the preparation of this section of text.
Background
In the past, crystallographic refinement with the X-PLOR program usually optimized the parameters of the atomic model (positions and temperature factors) to minimize the residual target
- E =
w(h)(|Fobs(h)|-k|Fcalc(h)|)2
where w(h) is a weighting factor and k is the scale factor between the observed and calculated structure factor amplitudes.
This target would give correct results if (i) the deviation between |Fobs(h)| and k|Fcalc(h) was Gaussian, (ii) the mean deviation was zero and (iii) the standard deviation of the Gaussian was independent of the parameters of the atomic model. This latter point is clearly untrue since the errors have a changing phase component which depends on the atomic model.
To overcome these fundamental difficulties, new targets for crystallographic refinement have been derived from first principles using maximum likelihood statistics (Pannu and Read 1996). The resulting target takes the form
- E =
wml(h) (|Fobs(h)|-<|Fobs(h)|>)2
for refinement against structure factor amplitudes and an analogous expression is obtained for refinement against intensities. Although this target is superficially similar to the least-squares residual, the weight, wml, and expectation value of the observed structure factor, <|Fobs(h)|>, are now calculated via estimates of
A (Read 1986) taken from a set of cross-validation test data.
Practical tests involving minimizations of misfit molecular models with the maximum likelihood targets and the conventional residual target show that better convergence (lower Rfree) and reduced bias (smaller difference between R and Rfree) is obtained with the maximum likelihood targets (Pannu and Read 1996). Tests in which a maximum likelihood target was used in conjunction with torsion angle dynamics showed that a greater radius of convergence could be achieved than with other refinement methods (Adams et al. 1997).
More recently, a new maximum likelihood target, the structure factor amplitude with Hendrickson-Lattman phase probability coefficients, has been developed and this target is also available in the X-PLOR program. This target makes optimal use of all the experimental information that is available since the phase probability coefficients correctly model the uncertainty in the experimentally determined values of the phase angle (that is., as determined from MIR or MAD data). Preliminary tests of the maximum likelihood target with phase probability information (Pannu and Read, unpublished results) give very promising results.
Implementation
The maximum likelihood targets are specified using the target keyword within the xrefin parameter block (example: target=MLF1). The three maximum likelihood targets currently supported are:
The least-squares residual (target=resid) is no longer the default method for refinement with the X-PLOR program. The new default target is MLF2, the intensity based maximum likelihood target. This target may be used for all types of positional refinement and for the refinement of individual atomic thermal parameters. Note that the maximum likelihood target is not used for the special case of an overall temperature factor refinement or for refinements that make use of explicit phase values, which are flagged by a value of the phase weight parameter, wp> 0.0. For these two cases the default target is automatically reset to resid.
There are two other parameters within the xrefin target block, siga and mbins, which can alter the behavior of maximum likelihood refinement.
The siga parameter provides the ability to update the
A estimate that is used in the calculation of the maximum likelihood target. The siga parameter may be set to fix, next, or refi=<integer>.
The siga fix option (the default) fixes the estimates for
A at the values used at the beginning of the refinement. To update the estimates for
A after some cycles of refinement the siga next option may be used. The siga refi = <integer> option is used to update the estimates for the
A values after the given number of structure factor calculations. This latter option should be used with caution because frequent updating sometimes causes an increase in Exref which can cause the line search to be abandoned in minimization.
The mbins parameter sets the number of resolution bins used for estimation of the
A values. The default value of mbins for other calculations with X-PLOR that use this parameter (for example, resolution-dependent tabulation of R-values -- see page 164 of the X-PLOR 3.1 manual) is 8. If maximum likelihood refinement is performed the default value for this parameter will automatically change to the number of reflections divided by 1000 or the number of cross-validation reflections divided by 50, whichever is the smaller. This new default will usually be a good estimate for the optimal value of mbins for the refinement. Any value of mbins that is explicitly set in the X-PLOR script will override these default modes of operation.
As in refinements against the other targets, a weight parameter, wa, is needed to scale the gradients in the X-ray energy to the gradients in the chemical energy. The value of the wa can be estimated in the same way as for the other targets, by using a script (/tutorial/xtalrefine/check.inp) which carries out a short free dynamics run and then calculates the ratios of the chemical and X-ray energy gradients. Since the maximum likelihood refinements make use of (internally) normalized data, the correct value of wa is usually orders of magnitude smaller than the value that would be used for a refinement with the least-squares residual. Note that none of the maximum likelihood targets uses the phase weight parameter, wp, (the MLHL refinement target deals directly with phase probabilities, not any explicit value for a phase angle) so wp should either be set to zero or omitted from the refinement script.
It should be noted that a set of cross-validation data is an absolute requirement for maximum likelihood refinement. If the cross-validation reflections are not flagged in the input data file and are not explicitly set up in the refinement script, 10% of the data will be automatically used for cross-validation purposes.
Example scripts for refinement using the maximum likelihood targets are /tutorial/xtaltorsion/torsion_slow_ml.inp for co-ordinate refinements and /tutorial/xtalrefine/brefinement_ml.inp for temperature factor refinements.
Andersen thermal coupling
Background
Simulated annealing calculations with the X-PLOR program have previously used the Berendsen thermal coupling method to control the temperature of the system (see pages 130-131 of the X-PLOR 3.1 manual). Temperature control with the Berendsen method is obtained by adding a force to each atom that is proportional to the individual atomic velocity. The overall scale constant for this force depends on the difference between the current temperature of the system and the desired temperature.
X-PLOR now also offers the option of temperature control by the Andersen thermal coupling method (Andersen 1980). In the Andersen method a set of atoms is randomly selected from the molecule and their velocities are replaced by randomly generated velocities selected from a Boltzmann velocity distribution at the desired temperature. The physical process that is modeled by the Andersen thermal coupling algorithm is a collision of a particle in the system with a particle in a heat bath at the desired temperature.
In contrast to the Berendsen thermal coupling method, which simply modifies the trajectories of all atoms in the system by rescaling along their current paths, the Andersen thermal coupling involves a stochastic process which alters the velocities and directions of the selected atoms at each time step. The general effect of Andersen thermal coupling is to break up correlated motions involving groups of atoms. This behavior increases local sampling of the conformational space but inhibits more global changes in the protein model.
Practical tests of the Andersen thermal coupling method reflect these theoretical expectations. For atomic models that are fairly close to convergence the Andersen thermal coupling method usually gives final structures with slightly lower values of Rfree than structures obtained from equivalent calculations with the Berendsen thermal coupling method. Models that still require large concerted movements to bring them into agreement with the X-ray data tend to be less convergent (unpublished results). These findings suggest that the Andersen thermal coupling method might be usefully employed in the final stages of a simulated annealing refinement. For example, in a slow cooling refinement protocol, where the temperature is reduced from 3000 to 300K, one could switch to the Andersen thermal coupling method once the system reached 1000K.
Implementation
The Andersen thermal coupling option may be invoked by setting two parameters within the velocity verlet command block.
The Andersen thermal coupling option is turned on by setting andersen = true (default is false). A control parameter, nequ=<integer>, specifies the number of time steps needed to reset the velocities of Natom atoms, where Natom is the number of atoms in the molecule. As a rough guide, for a cooling stage involving N time steps, nequ should be set to about N/2 to achieve full temperature control. A typical value for nequ would be 20.
Note that the Berendsen and Andersen thermal coupling options are mutually exclusive so the option to turn on Berendsen thermal coupling, tcoupling=true, should not be used if the Andersen thermal coupling option is used.
MAD phasing
This portion of text is adapted from program notes written by Axel T. Brünger.
Background
The following examples refer to a four-wavelength experiment but they are easily modified to accommodate between 2 and 5 wavelengths. Anomalous scatterer parameters are refined against the MAD data and phase probability distributions are computed. The algorithm consists of a combination of the Phillips and Hodgson (1980) method with a maximum-likelihood refinement using an error model similar to that of Terwilliger and Eisenberg (1987). Cross-validation is included in order to be able to assess the quality of the refinement and phasing process.
Sequence of jobs to be run
mad_merge.inp
- Merges reflection files.
mad_scale.inp
- Scales data.
mad_analyse.inp
- Computes dispersive and anomalous diffraction ratios.
mad_histogram.inp
- Computes histograms of dispersive and anomalous differences.
mad_refine.inp
- MAD heavy atom refinement and computation of phase probability distributions.
mad_map.inp
- Compute fom-weighted electron density map from MAD phases.
mad_diff_map.inp
- Compute fom-weighted dispersive difference Fourier map between wavelengths.
mad_diff_as_map.inp
- Compute fom-weighted anomalous difference Fourier map at a wavelength.
Tutorial
1. Edit and run the mad_merge.inp file in order to merge the data sets.
- You should restrict the resolution limits of the merged data to the range that you want to use in heavy atom searches and refinements. Writing all data without actually using it will cause additional CPU expense although the result will be identical.
- In the output reflection file produced by mad_merge.inp the structure factor amplitudes are named F_W1, F_W2, F_W3,.... and the corresponding standard deviations are named S_W1, S_W2,S_W3,...
2. Create a co-ordinate file with the anomalous scatterer xyz coordinates in orthogonal angstrom units. Set the B values to reasonable values (usually ~20 Å2) and set the occupancies to one.
3. Edit and run the mad_scale.inp file.
- Modify the number of atoms according to the expected asymmetric unit contents. Also modify the spacegroup, unit cell, etc. Then run mad_scale.inp.
4. Edit and run the mad_analyse.inp file.
- The anomalous and dispersive diffraction ratios will be listed in the output file, mad_analyse.dat.
5. Edit and run the mad_histogram.inp file.
- The anomalous and dispersive histograms will be listed in the output file, mad_histogram.dat.
6. Edit and run the mad_refine.ino file.
Macrocycles, cycles, and refinement steps
The refinement proceeds for the specified number of macrocycles ($macrocycle). Each macrocycle loops through the turned-on ($wave_<i>) LOCs in the specified sequence (#order) and carries out the specified number of cycles ($ncycle). Each cycle performs the specified number of refinement steps ($xstep, $bstep, $fstep, $fdstep).
The first macrocycle is always carried out without refinement steps and scaling. Refinement steps and scaling start in the second macrocycle. This is done in order to build up an initial combined phase probability distribution using all available information before performing the refinement steps and scaling. At the end of each cycle of a particular LOC, the overall combined probability phase distribution is updated.
LOCs
There are two LOCs for each wavelength except for the reference wavelength:
F->Fref and Friedel(F)->Fref.
For the reference wavelength only the Friedel(Fref)->Fref LOC is used. Refinement of the LOCs is carried out independently constrained by the overall phase probability distribution. Differences in refinement parameters among the various LOCs can be used to assess the convergence of the refinement.
Phasing and refinement selections
Phase probability distributions are obtained for all wavelengths which have been measured at the reference wavelength regardless of completeness at the other wavelengths and of the completeness of the Bijvoet pairs. In other words, phase probability distributions will be produced for reflections for which only a subset of wavelengths and Bijvoet mates was observed. For the refinement and scaling targets, however, a subset of the reflections are used. The selection is stored in the array "w_sel". The example in mad_refine.inp selects those reflections for which all amplitudes satisfy the low amplitude, dispersive and anomalous difference and outlier cutoffs. The selection can be custom-tailored if necessary.
Cross-validation
The LOC R value, LOC value, and phasing power are computed for both the selected subset of reflections used to define the target ("working set") and the remaining reflections ("test set"). The test set can be used for cross-validation. According to Brünger, the cross-validated values are a more sensitive indicator for the quality of the refinement and the correctness of the refined sites.
Compressing the log file
Search for the word "TAB" in the X-PLOR log file. This produces a more concise summary of the progress of the refinement process. Look at the mad_refine.summary file to get a compact summary of the final refinement parameters and statistical indicators.
Assessing convergence
The program runs through macrocycles ($macrocycle), that is, the prescribed sequence of LOCs. For each LOC the program runs through a series of microcycles ($ncycle). Search for the word "shift" in the log file that X-PLOR produces to assess convergence.
Note that in the log file the "cycle" number refers to the number of microcycles. Note that the fp and fdp shifts are identical for all sites at a particular wavelength whereas B values and coordinates are individually refined. Some experimentation with $xstep, $bstep, $fpstep, $fdpstep, $macrocycle, $ncycle, $phistep, $tolerance should be performed to obtain a reasonable compromise between computer time and accuracy.
Fp and fdp shifts are the most sensitive indicators of convergence. One can get these shifts by using a pipeline of grep commands.
For example:
grep TAB mad_long.out | grep shift | grep "F_W1" | grep "1 SITE" | grep fp
If the shifts are below 0.1 the corresponding phases should have converged and one can stop the refinement.
Other indicators for convergence are the difference in f' and f'' values for the F->Fref and Friedel(F)->Fref LOCs (cf. file mad_refine.fp), the difference in coordinate and B-factor values for the various LOCs (cf. file mad_refine.pdb), reasonable final B values and coordinates and shifts with respect to the initial parameters (cf. file mad_refine.summary).
Choice of refinement and phasing parameters
Convergence is affected by the number of refinement steps carried out ($xstep, $bstep, $fpstep, $fdpstep, $macrocycle, $ncycle). Accuracy of the phase probability distribution is affected by the integration step size ($phistep, in degrees). Depending on the resolution of the diffraction data, anisotropic B scaling (parameter $bscale) should be tried.
The order in which the LOCs are computed should produce only a small influence on the result (the "list" parameter in the macro call to master). The choice of reference wavelength ($ref_wave) can produce slightly different phases. Thus, the phase differences between different choices of reference wavelengths are a lower bound of the phase error. It is sometimes also of interest to determine the influence of omitting a particular wavelength ($wave_1, $wave_2,... flags).
Lack-of-closure expressions
Individual wavelengths can be turned off and on for phasing and refinement by changing the $wave_1, $wave_2, ... flags. In general, two LOCs will be produced per wavelength: the F->F LOC and the Friedel(Fref)->Fref LOC, both relative to the reference wavelength. For the reference wavelength, only the Friedel(Fref)->Fref LOC is computed.
The individual LOCs can be custom-tailored (that is, different selections for the reflections used for the target and scale calculations, refinement parameters) by changing the entries in the call to the master macro.
Absolute scale of structure factors and scattering factors
The probabilistic approach produces f' and f'' values that depend on the absolute scale of the structure factors. In addition, only differences between f' values are used in the probabilistic approach. If reliable estimates of f' and f'' values are available (for example, obtained by a theoretical computation of f'' at a "remote" wavelength with a small anomalous signal), it can be used to rescale and offset the scattering factors such that the resulting f' and f'' values will match the initial value for the specified wavelength $abs_scale_wave. The scattering factors and diffraction data are mapped as follows:
- f' (new) = scale * f'(refined) + offset
- f'' (new) = scale * f''(refined)
- fobs(new) = scale * fobs(refined)
- sigma(new) = scale * sigma(refined)
- f_w1(new) = scale * f_w1(refined)
- s_w1(new) = scale * s_w1(refined)
- ...
- scale = f''(initial) / f''(refined)
- for "scaling" wavelength Friedel(F)->Fref LOC
- offset= f'(initial) - scale*f'(refined)
- for "scaling" wavelength Friedel(F)->Fref LOC
-
As a result, f'(new) and f''(new) (for the Friedel(F)->Fref LOC) are equal to the initial values for the "scaling" wavelength. In the provided example, the remote wavelength is 4; that is, $abs_scale_wave is set to 4.
Interpretation of the output files produces by mad_refine.inp
1. mad_refine.fp
This file contains the refined f' and f'' values. Note that the probabilistic approach cannot produce absolute values for f' and f''. The scale of f' and f'' is dependent on the scale of the diffraction data (the scale factor that was applied to the reference wavelength by mad_scale.inp). In addition, f' is always relative to a baseline which has to be theoretically estimated from the observed fluorescence signal. If the "scaling" wavelength is specified, f' and f'' and offset and re-scaled to match the scaling wavelength. The refinement is performed independently for the F -> Fref and Friedel(F) -> Fref LOCs. This provides a way of assessing the convergence of the refinement. According to Brünger, the corresponding values should normally be close (within 10% of the maximum f' and f'' values).
Note:
- The Fref-Fref LOC is not meaningful, thus no refined f', f'' values are given.
- If no "scaling" wavelength is available, the f' value for the reference wavelength will be zero and the scale of f' and f'' will be determined by the scaling performed in mad_scale.inp.
- The scaling wavelength does not have to be equal to the reference wavelength.
2. mad_refine.hkl
This file contains the phase probability distributions (stored in arrays PA, PB, PC, PD), the corresponding phase centroids (stored in the phases of the FOBS array), and the corresponding figures-of-merit. This file is ready to be read by subsequent jobs to produce experimental electron density maps, perform density modification, phase combination, and refinement with or without phase information.
3. mad_refine.pdb
This file contains the refined coordinates of the anomalous sites. There is one coordinate set for each LOC, that is, each LOC is treated independently. The SEGIDs are encoded according to the particular LOC, for example:
The SEGID TO+2 refers to the F->Fref LOC and the SEGID TO-2 refers to the Friedel(F)->Fref LOC.
Convergence of the coordinate sets and B factors for corresponding sites can be used to assess the quality of the refinement. According to Brünger, the coordinates are very similar for the different LOCs. The B-values show more variation since they may compensate for systematic errors between different wavelengths.
4. mad_refine.summary
This file lists the LOC R value, LOC value, phasing power, initial and final coordinates and other refined parameters. The overall figure-of-merit distribution is listed at the end.
Statistical indicators for the working set
The following notes can be used as guidelines to interpret the various statistical indicators:
- Phasing power values > 1.0: excellent LOC
- R-Cullis < 0.6: excellent LOC;
R-Cullis < 0.9: usable LOC
How to obtain phases for the non-anomalous structure factor
Use X-PLOR to read the reflection file with centroid phases output from mad_refine.inp and use an X-PLOR script with the xrefin statement:
- write reflection fobs sigma
to write a reflection file containing the phased non-anomalous structure factor.
Release notes and warnings
- At the current stage of the program, individual heavy atom directions cannot be fixed during refinement: the heavy atom is either refined or kept fixed. This may cause drifting of the heavy atom in cases where one of the directions is arbitrary (for example, for one heavy atom in P21, the y-coordinate is arbitrary).
- At the current stage of the program the Hendrickson-Lattman coefficients are treated as real arrays. This means that if the data set is expanded or mapped into a different asymmetric unit the corresponding phase probability distributions may be wrong. To expand and reduce Hendrickson-Lattman coefficient data sets use the expandprobability and reduceprobability macros.
- You must reduce all your data sets in anomalous mode: Friedel mates are kept as separate reflections. X-PLOR will phase both Friedel mates. To obtain phases for the non-anomalous structure factor follow the note below.
- Please make sure you run enough cycles to get good convergence (see notes below).
- Note that the individual FOMs listed in the output file (for example, when using the grep command to get the lines marked by "TAB") represent the FOMs for each lack-of-closure expression. The FOMs of the combined phase probability distributions are listed in the mad_refine_summary file.
- Note that the notion of a "test" set in the macros should not be confused with a randomly selected set. In this context it refers to the reflections that are not used for scaling and refinement but that are phased.
- The refinement can be very slow because of the generality of the method.
Data conversion
Background
The crystallographic diffraction data that you will use in calculations with the X-PLOR program could initially be in any one of a variety of formats, depending on the software that was used for data processing or other calculations through which it may have been passed. A small utility program has been provided in order to reformat this data into the X-PLOR reflection data format. (See Appendix B, File Formats, for a description of the items supported in the reflection data model and the tags used to identify them.)
This utility program is able to read a variety of data types and arrangements. If the data is initially in a binary format (for example, as in the CCP4 data formats) you should dump these to an ASCII format (CCP4 provides a facility for this operation) and then select the appropriate format for this utility program.
Implementation
A utility program, xplor_data_convert, is provided for carrying out data conversions. This program may be run using a small input script from the UNIX command line. Diagnostic information should be directed to a log file. For example:
xplor_data_convert < data_convert.inp > data_convert.log
The conversion of data to the X-PLOR format usually only takes a few seconds.
The input script contains the following information:
- Crystal cell dimensions (a, b, c,
,
,
)
- Lower and upper resolution limits in angstroms.
- Codes to specify the input data items. Reflection records are assumed to begin with Miller indices h,k,l. The data items are read as free-format items:
- FSFS Fobs(+), SD(+), Fobs(-), SD(-)
- FSD Fobs, SD, Anomalous difference
- FSFP Fobs, SD, Fcalc, Phasecalc
- FSPM Fobs, SD, Phase, Figure-of-merit
- FSMP Fobs, SD, Figure-of-merit, Phase
- FMP Fobs, Figure-of-merit, Phase
- FS Fobs, SD
- IS Intensity, SD
- ISIS Intensity(+), SD(+), Intensity(-), SD(-)
- FMP4 Fobs, Figure-of-merit, Phase, A, B, C, D
- FSMP4 Fobs, SD, Figure-of-merit, Phase, A, B, C, D
- An option (yes/no) to use a FORTRAN-style format description for the input reflection data. This option provides the capability of reading files that do not precisely match one of the pre-set format descriptions on line 3. For example, the format descriptor could be used to bypass unwanted items.
- A FORTRAN-style format descriptor (read but not interpreted if "no" is entered on line 4).
- Specification (yes/no) as to whether the input data set contains Bijvoet pairs as unique items or whether these data have been merged.
An example input script, data_convert.inp, is provided in the tutorial/dataconversion subdirectory.

NMR structure determination
Torsion angle dynamics for NMR structure determination
Background
The basic rationale for using torsion angle dynamics for structure determination using distance restraints are similar to those for X-ray crystallographic refinement -- elimination of unwanted degrees of freedom and the ability to achieve stable runs at very high temperature without incurring large computational cost. Applications of torsion angle dynamics to NMR structure determination were described by Stein et al. (1997). For moderately sized proteins (~100 amino acids) a simple protocol using torsion angle dynamics was found to have very high (>85%) success rates for structure determination.
In summary, this protocol consists of the following stages:
- `Cooking' at 50,000K for 15ps with the energy constant for the Van der Waals parameters scaled by 0.1.
- Cooling to 1000K for 15ps with ramping of the Van der Waals parameters to full scale.
- Cooling to 300K for 6ps using conventional molecular dynamics.
- 1000 steps of conjugate gradient minimization.
- The weights for the NOE restraints are set to 150 kcal/mol for stages 1-3 and 50 kcal/mol for stage 4. The functional form for the NOE distance restraints is a flat-bottomed parabolic function with a soft asymptote. The weights for the dihedral angle restraints are set to 100 kcal/mol for stages 1-3 and 300 kcal/mol in the final stage.
We have developed a script duplicating this protocol and a related script that omits use of conventional molecular dynamics. The performance of these two scripts is very similar in terms of success rate and computer time. We believe that the length of the initial "cooking" stage is the most critical factor in achieving correctly folded structures. For very large proteins a reasonable success rate can be obtained by increasing the length of this stage.
Implementation
Two scripts for structure determination using torsion angle dynamics are available in the tutorial/nmr directory. The first script, sa_tad_stein.inp, contains the protocol described by Stein et al. (1997) and the second script, sa_tad_all.inp, contains a slightly modified protocol that omits any use of conventional molecular dynamics.
Structure determination using ambiguous restraints
This portion of text is adapted from program notes written by M. Nilges.
Introduction
ARIA (for Ambiguous Restraints for Iterative Assignment) is a fully automated iterative method that performs a series of tasks, tasks that are typically necessary in the calculation and refinement of structures from NOE data (Nilges et al. 1997).
Each task performed by ARIA has a corresponding subdirectory containing example scripts in /tutorial/aria/):
- calibrate peak volumes to distance restraints, based on reference volumes and distances, or calculated structures. Several proton classes can be used.
- assign ambiguous NOEs.
- clean spectra from problematic NOEs (noise, artifacts), based on violation statistics. These NOEs can be printed for inspection, and their bounds can be re-set automatically, or they can be automatically excluded.
- merge several independent data sets (different temperatures, solvents, homo- and heteronuclear).
- calculate structures with X-PLOR, using either ab initio simulated annealing or a hybrid distance-geometry simulated annealing approach. Only the former allows full use of ambiguous NOEs in all stages of the calculation. Prochiral groups are treated with a floating chirality approach that is fully integrated into the treatment and assignment of ambiguous NOEs.
- analyse and select structures for a next iteration of assignment/structure calculation.
One of the most important goals of using ARIA is bookkeeping. Any interactive work is performed only on the peak lists themselves.
Each data set is interpreted (calibrated, assigned, and cleaned) separately, before the different sets are merged to give one distance restraint list. Different types of data sets can be mixed; the operations performed by ARIA are determined by simple flags. Thus, one can use a list of peaks from an automatic peak picker, where calibration, assignment, and cleaning have to be performed. A single file (called project.xplor) needs to be edited to set all flags and parameters for the spectra assignment and the calculation.
Overview
Iterative assignment scheme
In general, the method bases all operations in iteration i on the structures of iteration i-1. Iteration 0 is different from other iterations in that no previous trial structures are yet available. There exist several possibilities for iteration 0:
- 1. generate a series of random structures and treat these as the basis of all operations (calibration, assignment, violation analysis and merging);
- 2. calculate a set of structures based on manually assigned, `hard' restraints;
- 3. generate starting structures from sequence homology.
Each following iteration begins with ordering the structures from the previous iteration with respect to total energy and the best are selected as the basis for interpreting the spectra. A list of the structures ordered by energy is created in the file file.list. The spectra are first calibrated, using distances calculated from the structures as reference. For every NOE whose chemical shift coordinates correspond to proton resonances, an ambiguous distance restraint is added to the list. All the restraints extracted from the spectra in this way are analyzed for systematic violations. Any restraint that is systematically violated is removed from the list. The restraints are then partially assigned, that is to say, possibilities that correspond to large distances in the converged structures are removed. This procedure is applied to each spectrum (for example, taken at different mixing times, temperatures, etc.) and a separate restraint list for each spectrum is created in the same directory as the structures that will be calculated from them. The restraint lists derived from the different spectra are then merged to avoid duplication of information. A new set of structures is calculated.
Organization of the project
A structure determination project is organized in a directory tree. The root directory (project) contains files project.xplor and initialize.xplor and the subdirectories data, structures, assign, analysis and protocol.
data This directory contains a subdirectory for the sequence related data, and for each dataset used (NOEs, and other data such as dihedral angles, etc.). It also contains a list of the names of all assigned spectra to be merged (spectra_ass.list), which is created at present by hand, for example:
- sequence This subdirectory contains all sequence-related information such as the psf file, the template file for the generation of random starting structures, and a list of prochiral groups for which floating chirality assignment is performed. It also contains the file read_data_xplor, which needs to specify all data files read in for the structure calculation.
- noe1 Each of the noe subdirectories contains the original peak file, the chemical shift list valid for this spectrum, and the converted restrained lists readable by X-PLOR. The allowed frequency ranges are determined during conversion, and cannot be changed later.
- noe2 A different peak or restraint list.
- hbonds This directory contains a list of hydrogen bonding restraints, which can be assigned or ambiguous.
- torsion Standard torsion angle restraints.
structures Contains a subdirectory for the results of each iteration: the assigned spectra, and the structures calculated with these spectra. In addition ARIA creates a file, file.list, that contains a list of the filenames, ordered by total energy.
assign Contains input files and X-PLOR include files to perform the partial assignment, cleaning and merging.
- calib.inp uses the include files calibrate.xplor to perform calibration and errset.xplor is used to set error bounds. It performs a violation statistic to reset bounds of or remove certain restraints, and assigns ambiguous NOEs and writes the resulting restraints in the current iteration directory.
- merge.inp takes the resulting restraints files of all NOE spectra and merges them to remove multiple entries arising from multiple picking of the connectivity.
analysis A series of analysis scripts.
protocol The read-only files of the simulated annealing protocol.
toppar The energy parameters used for the structure determination.
The subdirectories analysis, protocol and assign are (mostly) read-only and usually need not be modified. toppar also should not have to be modified, but it seems advantageous to keep exactly the parameters used with the project for documentation.
Start of a project
Structure: Create as usual the psf file and template file in a directory data/sequence.
Floating chirality: Define the groups for which floating assignment is to be performed in the file setup_swap.xplor in data/sequence.
NOE data: The NOE data is organized by using a separate directory for each data set.
- 1. Create a separate directory for each NOE list to be used in the structure calculation.
- 2. Copy the original peak list and chemical shift information into this directory.
- 3. Edit an appropriate script from the assign directory to apply appropriate
ppm values in the two (three) frequency dimensions, and translate the peak files and chemical shift lists to the X-PLOR readable format.
project.xplor: Modify project.xplor to define file names and parameters.
spectra_ass.list: Create a file in directory data that contains a list of all file names of assigned spectra (the $restraintfile variable in project.xplor, prefixed with "@@NEWIT".
initialize.xplor
This file contains only initialization commands for a few file names.
project.xplor
The file project.xplor contains all parameters, file names, etc., specific to the project. The first section contains all the parameters of the simulated annealing protocol (temperature, number of steps, etc.).
The second section contains file names of the structure file, the template file, and the file root for all coordinate output files.
The third section contains parameters for each dataset used in the refinement.
The fourth section contains parameters for each iteration (any of the above parameters can be put here if they are to be varied from iteration to iteration).
The last section contains some modifications of parameters defined above in order to specify correct directories (via environment variables).
X-PLOR/ARIA commands
ANALYse-restraints <analysis-restraints-statement> is invoked from the ARIA level of X-PLOR.
<ANALysis-restraints-statement>:==
- HELP
- EQUIvalent <group> { <equivalent-statement> } END marks protons with unresolved chemical shifts as a single spin. The distance to an unresolved group is calculated as <r-av>1/av where the exponent av is set with the AVERage command. The main use of this command is for methyl groups and aromatic rings.
- <equivalent-statement>:==
- 1-2 treats each methylene group as an unresolved group of protons. A methylene group is recognized by X-PLOR as three hydrogens bound to a heavy atom. At present not working.
- METHyl treats each methyl group as an unresolved group of protons. A methyl group is recognized by X-PLOR as three hydrogens bound to a heavy atom. At present not working.
- NONE deletes all previously defined unresolved groups.
- SELECT=<selection> treats the selected atoms as an unresolved group of protons.
- SINGle reads in distances from the main coordinate set and overwrites any existing data.
- ACCumulate adds distances from the main coordinate set, calculating a <r-6>-1/6 average.
- AVERage adds distances from the main coordinate set, calculating an arithmetic average.
- MINImum adds distances from the main coordinate set, calculating the minimum value.
- MODE=INTRA-residue | ALL INTRA will count as ambiguous only NOEs that involve more that one residue pair; ALL all ambiguities (default: ALL).
- LEVEl=<real> Assignment criterion. Possibilities that have an estimated contribution to the peak of less than LEVEl will be removed.
- CUTOff=<real> Assignment criterion. Possibilities that correspond to distances larger than CUTOff will be removed.
- FRQUpdate <class> Updates frequencies (in RMSD array) to the weighted average of the NOE peak positions.
- CHECk <class> Removes duplicate entries on restraint list. All restraints are compared to each other. If two restraints have the same assignment, the one with narrower bounds will be kept.
- PURGe <class> <class> Remove an unambiguously assigned assignment possibility in the first class from all restraints in the second class.
- TRANsfer not implemented.
- MINN=<integer> Minimum number of possibilities to be written. If a restraint has less than the specified number of possibilities, it will not be written.
- MAXN=<integer> Maximum number of possibilities to be written. If a restraint has more than the specified number of possibilities, it will not be written.
- FROM=<selection>...TO=<selection> Write only restraints involving specified atom pairs.
- LIST<class> Write out restraints as a human readable list in atom order.
- SORT<class> Write out restraints as human readable list in order of estimated contribution.
- RESTraints <class> Write out X-PLOR restraint list using standard atom selection to specify ambiguities.
- OR-Restraints <class> Write out X-PLOR restraint list specifying ambiguities with OR between double selections.
- RESET Reinitialize database.
COUNt-violations <count-violations-statement> is invoked from the ARIA level of X-PLOR.
- HELP
- RESEt Reinitialize database.
- THREshold <real> Count the number of times a restraint is violated in an ensemble of structures.
- EXCLude <class><real> Exclude all restraints in a class that are violated in more than the fraction of structures specified.
- LIST <class><class> List all restraints in a class that are violated in more than the fraction of structures specified.
- SET <class> <real> <real> <real> Set limits for all restraints in a class that are violated in more than the fraction of structures specified to the specified lower and upper bound.
CALIbrate <calibrate-statement> is invoked from the ARIA level of X-PLOR.
- HELP
- RESEt Reinitialize database.
- MODE (ANISotrop | ISOTrop) By default, a calibration factor is calculated for each pair of proton groups separately. The ISOtropic mode re-averages these calibration factors such that CIJ=(CII.CIJ)1/2. This is at the moment somewhat unreliable.
- CUTOff <real> Exclude all distances in trial structures that exceed cutoff.
- VECTor <sele> <sele> ( AUTO | DRED | VREF ). If AUTO is specified, the calibration distances are taken as averages over all interproton distances in the trial structures for which there is an NOE. Explicit reference distances and volumes can be specified with DREF and VREF commands. The missing terms in the matrix of calibration factors are then automatically calculated using the isotropic approximation.
- DREFerence <real> Specify an explicit reference distance for the last specified pair of proton groups.
- VREFerence <real> Specify an explicit reference volume for the last specified proton groups.
- EXPOnent <real> Specify an explicit reference volume for all protons.
DO <do-statement> is invoked from the ARIA level of X-PLOR. It manipulates the value of data arrays. The operations are carried out component-by-component for all atoms.
J-coupling and proton and 13C13 carbon chemical shift NMR refinement
Direct coupling constant refinement
The direct coupling constant refinement is described in Garret et al. (1994).
The syntax required for direct coupling refinement is a set of statements of the form COUP E<coupling statement>L END
with
<COUPLINGS-statement>:==
- ASSIgn<slctn><slctn><slctn><slctn>[<slctn><slctn><slctn><slctn>]
- <real><real>[<real><real>]
- E* atom i , atom j , atom k, atom l, J-obs, J-err *L.
- The optional second set of slctns and obss is for degenerate classes
- CLASs <name>
- Starts a new class. Applies to all ASSIgn, TYPE, FORCe and FLAT entries until another CLASS entry is issued.
- COEFFicients <real><real><real><real>
- Set the Karplus curve coefficients A,B,C and the Karplus curve phase P (in that order) for this class.
- CV=<integer>
- Select partition number for cross-validation.
- DEGEneracy <1 | 2>
- Set the number of couplings to be averaged for each observation in this class (Edefault = 1L).
- FORCe <real><real>
- Force constant(s) for all assignments in the current class (Edefault = 50, 10L).
- NREStraints <integer>
- Number of slots for J coupling.
- PARTition=<integer>
- Number of partitions for complete cross-validation.
- POTential <SQUAre | HARMonic >
- Use J-err or not.
- PRINt THREshold <real> <ALL | CLASs <name>>
- Prints J-coupling violations greater than the specified value (in Hz).
- RESet
- Erases the J coupling assignment table.
Direct secondary carbon chemical shift refinement
The direct secondary carbon chemical shift refinement is described in the paper by Kuszewski et al. (1995b).
The syntax required for direct coupling refinement is a set of statements of the form CARB {<chemical-shift-statement>} END
with
<CHEMSHIFT-statement>:==
- ASSIgn <sel><sel><sel><sel><sel><real><real><real><real>
- {* Ci-1, Ni, Cai, Ci, Ni+1, Ca-obs, Cb-obs (in ppm) *}.
- CLASsification <name>
- Starts a new class. Applies to all ASSIgn and FORCe entries until another CLASS entry is issued.
- EXPEctation <integer> <integer> <real> <real> <real> <real>
- Sets (psi-pos, psi-pos, CAvalue, CAerr, CBvalue, CBerr ) for expectation values.
- FORCeconstant <real>
- Force constant for all assignments in the current class (default = 50).
- NREStraints <integer>
- Number of slots for chemical shift restraints to allocate in memory (default = 200).
- PHIStep <real>
- Number of steps in the phi dimension of the expectation array.
- PSIStep <real>
- Number of steps in the psi dimension of the expectation array.
- POTEntial <SQUAre | HARMonic>
- Use shift errors or not.
- PRINt THREshold <real>
- Prints secondary shift violations of either Ca and Cb greater than the specified value (in ppm).
- RCOIl <sel> <real> <real>
- Set the random coil a and b 13C shifts for the selected atoms.
- RESEt
- Erases the chemical shift assignment table.
- ZERO
- Zero out the expectation value arrays.
Direct 1H chemical shift refinement
The direct 1H chemical shift refinement is described in two papers by Kuszewski et al. (1995a, 1996).
The syntax required for direct 1H chemical shift refinement is a set of statements of the form PROTONSHIFTS {proton-shift-statement} END
with
<PROTONSHIFTS-statement>:==
- OBSErved <sel> [<sel>] <real> [<real>]
- Set the observed proton shift or shifts for the protons in the selection(s) to the real value(s).
- RCOIl <sel> <real>
- Set the random coil proton shift for all the protons in the selection to the real value(s).
- ANISotropy <sel> <sel> <sel> <CO|CN> <isCOOH> <SC | BB>
- Select CA, CO,O atoms (one in each selection), say if it is a CO or CN bond, if it is part of a carboxyl group (which are averaged together), and whether it is a backbone or sidechain CO.
- AMIDes <sel>
- Select all the backbone amide protons.
- CARBons <sel>
- Select all the BB carbonyl carbon atoms.
- NITRogens <sel>
- Select all the BB nitrogen atoms.
- OXYGens <sel>
- Select all the BB oxygen atoms.
- RINGatoms <PHE|TYR|HIS|TRP5|TRP6|ADE6|ADE5|GUA6|GUA5|THY| CYT|URA> <sel> <sel> <sel> <sel> <sel> [<sel>]
- Specify how many atoms are in the ring, and select each one in turn. The last selection is only for six-member rings.
- ALPHasandamides <sel>
- Select both alpha and amide protons.
- CLASsification <name>
- Starts a new class. Applies to all ASSIgn and FORCe entries until another CLASS entry is issued.
- ERROr <real>
- Specify the error to be used by the current class in square well calculations (default = 0.3).
- DEGEneracy <1 | 2>
- Set the number of shifts to be averaged for each observation in this class (default = 1).
- FORCeconstant <real> [<real>]
- Specify the force constant and Constantine force constant for all assignments in the current class (defaults = 0.1).
- POTEntial <SQUAre | HARMonic>
- Specify whether to use se shift errors or not.
- PRINt THREshold <real> <ALL | CLASs <name>> <RMSD|NORMsd>
- Print proton shift violations greater than the specified value (in ppm) for all classes or for the specified class. May also use to prints the results to the RMSD array
- RESEt
- Erases the proton shift assignment table.
Fast refinement using direct NOEs
Background
Refinement of NMR structures typically employs distance restraints derived from a semi-quantitative interpretation of the NOE intensities. In cases where one has good quality experimental data, it behooves one to use the NOE intensities directly to assist in the refinement. NOE intensities are the raw experimental data (much like the reflection data in X-ray crystallography). They contain more information than the semi-quantitative distance restraints and are less susceptible to bias in interpretations.
The technology of the calculation of NOE intensities and the gradients with respect to the atomic coordinates have improved over the last few years. The method that is implemented in X-PLOR is based on the ideas of matrix doubling and Gaussian quadrature (Yip 1993). This present implementation is significantly faster than the earlier implementation based on Yip and Case (1989). To date, accelerations of one to two orders of magnitudes have been obtained, depending on the system studied.
Implementation
The new method is implemented as a new option in the relaxation/cutoff parameter group. Within the mode specifier, the new keyword MATD is added as an option.
MODE=<NONE,GRADient,ALL,MATD>
applies no cutoff, cutoff only for the gradient calculation, cutoff for both intensity and gradient calculation, and cutoff employed for both intensity and gradient calculation but using the new matrix doubling method (default: none).
When MATD is selected, one must specify the value of the cutoff by the parameter VALUe,. The RELAtive_value parameter is inoperative. A value of 6-8 is recommended for VALUe when MATD is chosen.
Last updated May 05, 1998 at 11:51AM Pacific Daylight Time.
Copyright © 1997, 1998, Molecular Simulations, Inc. All rights
reserved.