What Direction to Shift?

How does one calculate the shift direction, $\mathbf{s}$? TNT offers several options. The simplest method starts from the recognition that, at the very least, $\mathbf{s}$ must point downhill. Since the gradient of a function points in the direction of steepest ascent, the negative of the gradient points in the direction of steepest descent. Setting $\mathbf{s}$ equal to the negative gradient of the function $f$ being minimized ($\mathbf{s} = -\nabla f$) gives the steepest descent method.
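A minimal sketch of the steepest descent direction. It assumes a hypothetical toy function $f(\mathbf{x}) = \tfrac{1}{2}\sum_i h_i (x_i - m_i)^2$, a separable quadratic with curvatures $h_i$ and minimum $m_i$, standing in for TNT's real refinement function; the function names are illustrative only.

```python
def gradient(x, h, m):
    """Gradient of the toy function f(x) = 0.5 * sum(h_i * (x_i - m_i)**2)."""
    return [hi * (xi - mi) for xi, hi, mi in zip(x, h, m)]


def steepest_descent_direction(g):
    """Shift direction s = -grad f: the steepest downhill direction."""
    return [-gi for gi in g]


# Hypothetical curvatures and minimum for three parameters.
h = [1.0, 4.0, 9.0]
m = [0.0, 0.0, 0.0]
s = steepest_descent_direction(gradient([1.0, 1.0, 1.0], h, m))
print(s)  # [-1.0, -4.0, -9.0]
```

Note that the direction depends only on the gradient, so the parameter with the largest curvature receives the largest shift component.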

The steepest descent method works very well as long as the curvature is the same for every parameter, but this is far from true in crystallographic refinement. Because the curvatures differ, the path to the minimum curves away from the direction of the gradient, and many, many successive cycles of refinement are required to reach the minimum. Some account must be taken of the curvature.
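To illustrate why unequal curvatures hurt, the sketch below uses the same hypothetical separable quadratic toy, with an exact line search as an assumed stand-in for however TNT chooses the length of each shift, and counts steepest descent cycles until the gradient is negligible. With equal curvatures the negative gradient points straight at the minimum and one cycle suffices; with a 100-fold difference the path zigzags.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def steepest_descent_cycles(h, m, x0, tol=1e-8, max_cycles=100000):
    """Count steepest-descent cycles on f(x) = 0.5*sum(h_i*(x_i - m_i)**2).

    Each cycle shifts along s = -g by the step length that exactly
    minimizes f along s (possible because the toy f is quadratic).
    """
    x = list(x0)
    for cycle in range(max_cycles):
        g = [hi * (xi - mi) for xi, hi, mi in zip(x, h, m)]
        if dot(g, g) ** 0.5 < tol:
            return cycle
        s = [-gi for gi in g]
        # Exact line search: alpha = (g.g) / (s.H.s) with H = diag(h).
        alpha = dot(g, g) / sum(hi * si * si for hi, si in zip(h, s))
        x = [xi + alpha * si for xi, si in zip(x, s)]
    return max_cycles


start = [1.0, 0.01]
print(steepest_descent_cycles([1.0, 1.0], [0.0, 0.0], start))    # 1
print(steepest_descent_cycles([1.0, 100.0], [0.0, 0.0], start))  # far more cycles
```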

The conjugate gradient method (Fletcher, R., Reeves, C. M., Computer Journal (1964) 7, 149) uses the difference between the gradient of the last cycle and the current gradient to infer the curvature and derive a better shift direction. In effect, it learns the curvature of the function as more and more cycles are run and compensates for it. The conjugate gradient method requires the current gradient and the shift direction from the last cycle. This means that it cannot be used in the first cycle of a series, because in the first cycle there is no previous shift direction. Also, nothing about the refinement may change between two cycles: the weights, the modules used, and the scale factors must remain the same. If you need to change the weights, or anything else, you must restart with a steepest descent cycle.
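A sketch of conjugate gradient on the same kind of hypothetical toy quadratic. It uses the Fletcher-Reeves formula for the mixing factor beta and an exact line search, both assumptions for illustration; TNT's implementation may differ. The direction function falls back to steepest descent when no previous shift direction is supplied, which mirrors the rule that a series must start, or restart after any change, with a steepest descent cycle.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient_direction(g, g_old=None, s_old=None):
    """Fletcher-Reeves shift direction.

    Without history (first cycle, or after any change to the refinement)
    this is plain steepest descent, s = -g.
    """
    if g_old is None or s_old is None:
        return [-gi for gi in g]
    beta = dot(g, g) / dot(g_old, g_old)
    return [-gi + beta * si for gi, si in zip(g, s_old)]


def conjugate_gradient_cycles(h, m, x0, tol=1e-8, max_cycles=1000):
    """Count CG cycles on f(x) = 0.5*sum(h_i*(x_i - m_i)**2), exact line search."""
    x = list(x0)
    g_old = s_old = None
    for cycle in range(max_cycles):
        g = [hi * (xi - mi) for xi, hi, mi in zip(x, h, m)]
        if dot(g, g) ** 0.5 < tol:
            return cycle
        s = conjugate_gradient_direction(g, g_old, s_old)
        alpha = -dot(g, s) / sum(hi * si * si for hi, si in zip(h, s))
        x = [xi + alpha * si for xi, si in zip(x, s)]
        g_old, s_old = g, s
    return max_cycles


print(conjugate_gradient_cycles([1.0, 100.0], [0.0, 0.0], [1.0, 0.01]))
```

On a quadratic with n parameters and exact line searches, conjugate gradient reaches the minimum in at most n cycles in exact arithmetic, so this two-parameter problem with a 100-fold curvature difference finishes in about two cycles.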

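To summarize, with conjugate gradient minimization:

  - the shift direction is calculated from the current gradient and the shift direction of the previous cycle;
  - the first cycle of a series must be a steepest descent cycle;
  - nothing (weights, modules used, scale factors) may be changed between successive cycles, and any change requires restarting with steepest descent.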

Because the conjugate gradient method deduces the curvature from previous cycles, it sometimes takes quite a few cycles before it can overcome the problems introduced by large differences in curvature. For the positional parameters x, y, and z, the curvatures are of similar magnitude for all atoms except when a particular atom has many more electrons than the rest. Such a heavy atom tends to be overshifted each cycle, which results in an oscillation. With steepest descent the oscillation is very bad and persists for many cycles. With conjugate gradient the oscillations damp out with time, but you must monitor the individual shifts of the heavy atoms to determine when convergence has occurred.
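The heavy-atom oscillation can be reproduced with a hypothetical toy: three light coordinates with curvature 1 and one heavy coordinate with curvature 10, using the same quadratic stand-in and an exact line search as an assumed substitute for TNT's choice of shift length. Because one shift length is applied to every parameter, and in this toy it is always longer than the ideal 1/10 for the heavy coordinate, the heavy coordinate overshoots and changes sign every cycle while the light coordinates approach the minimum from one side.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def steepest_descent_path(h, m, x0, cycles):
    """Parameter values after each steepest-descent cycle (exact line search)."""
    x = list(x0)
    path = [x]
    for _ in range(cycles):
        g = [hi * (xi - mi) for xi, hi, mi in zip(x, h, m)]
        # One step length for all parameters, chosen to minimize f along -g.
        alpha = dot(g, g) / sum(hi * gi * gi for hi, gi in zip(h, g))
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        path.append(x)
    return path


# Hypothetical curvatures: three light coordinates (1.0), one heavy (10.0).
h = [1.0, 1.0, 1.0, 10.0]
path = steepest_descent_path(h, [0.0] * 4, [1.0] * 4, cycles=6)
heavy = [x[3] for x in path]
light = [x[0] for x in path]
print(["%+.3f" % v for v in heavy])  # sign alternates every cycle
```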

The B factors of heavy atoms also oscillate during refinement; however, a more serious problem occurs with atoms that have large B factors. The curvature for the B factor of these atoms is smaller than average, which causes these parameters to be undershifted. The B factors effectively lock up: they stay smaller than optimal if they started too small, and larger than optimal if they started too large. Since B factors usually start small, the large B factors of a model refined with steepest descent or conjugate gradient are systematically underestimated.

The solution to these problems is to include the curvature of each parameter explicitly. The counterpart of steepest descent that uses the curvature is the gradient/curvature method. In this method the shift direction, $\mathbf{s}$, is defined as the negative of the ratio of the gradient to the diagonal of the curvature matrix. In other words (so to speak), for each parameter $x_i$ of the function $f$ being minimized,

$$ s_i = -\frac{\partial f / \partial x_i}{\partial^2 f / \partial x_i^2} $$

For the same gradient, parameters with large curvatures are shifted by smaller amounts, while those with small curvatures are shifted more. Calculating the shift vector with this method requires the gradient of the function and the diagonal of the curvature matrix.
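A sketch of the gradient/curvature direction, assuming the gradient and the diagonal of the curvature matrix are already available as plain lists (the function name is hypothetical). Two parameters with the same gradient but a 100-fold difference in curvature receive shifts that differ 100-fold.

```python
def gradient_curvature_direction(g, diag_curv):
    """Shift direction s_i = -g_i / H_ii (gradient over diagonal curvature)."""
    return [-gi / hii for gi, hii in zip(g, diag_curv)]


# Hypothetical values: same gradient, curvatures differing by a factor of 100.
g = [2.0, 2.0]
diag_curv = [100.0, 1.0]
print(gradient_curvature_direction(g, diag_curv))  # [-0.02, -2.0]
```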

Besides improving the rate of convergence, the explicit incorporation of the curvature allows the positional and thermal parameters to be refined at the same time. With steepest descent and conjugate gradient, the large difference in curvature between the two types of parameter means that each must be refined separately: first the XYZ parameters are refined with the B factors held constant, and then the B factors are refined with the positions held fixed. This process must be alternated several times to let each class of parameter adjust to the changes in the other. With gradient/curvature, all parameters may be varied in every cycle.
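The benefit for mixed parameter classes can be illustrated with a hypothetical two-parameter toy in which one parameter's curvature is 100 times the other's. The starting point is chosen so that both parameters contribute equally to the gradient, which is the worst case for two-parameter steepest descent. With an exact line search (an assumption standing in for TNT's shift-length calculation), gradient/curvature lands on the minimum of this separable quadratic in one cycle, because each parameter is scaled by its own curvature.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def count_cycles(h, m, x0, use_curvature, tol=1e-8, max_cycles=10000):
    """Cycles to converge on f(x) = 0.5*sum(h_i*(x_i - m_i)**2)."""
    x = list(x0)
    for cycle in range(max_cycles):
        g = [hi * (xi - mi) for xi, hi, mi in zip(x, h, m)]
        if dot(g, g) ** 0.5 < tol:
            return cycle
        if use_curvature:
            s = [-gi / hi for gi, hi in zip(g, h)]  # gradient/curvature
        else:
            s = [-gi for gi in g]                   # steepest descent
        # Exact line search along s (the toy f is quadratic).
        alpha = -dot(g, s) / sum(hi * si * si for hi, si in zip(h, s))
        x = [xi + alpha * si for xi, si in zip(x, s)]
    return max_cycles


h = [100.0, 1.0]   # hypothetical curvatures of two parameter classes
m = [0.0, 0.0]
x0 = [0.01, 1.0]   # both parameters give gradient components of 1.0
print(count_cycles(h, m, x0, use_curvature=False))  # many cycles
print(count_cycles(h, m, x0, use_curvature=True))   # 1
```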

While incorporating the diagonal part of the curvature improves convergence a great deal, the off-diagonal part is still unused. Explicit use of these elements of the normal matrix would consume a great deal of computer time and require extensive modifications to the structure of TNT. If there were a method analogous to conjugate gradient that used the gradient/curvature method as its base instead of steepest descent, the off-diagonal elements would be "learned" from the history of the refinement. Such a method has been devised (Axelsson, O., Barker, V. A., Finite Element Solution of Boundary Value Problems (1984) Chapter 1, Academic Press, Inc.; Tronrud, D. E., Acta Cryst. (1992) A48, 912-916) and is called preconditioned conjugate gradient. Calculating the shift vector requires the gradient and curvature of the function as well as the shift vector from the last cycle. As with conjugate gradient, nothing may be changed between successive cycles of preconditioned conjugate gradient refinement; if the weights are to be changed, refinement must begin again with gradient/curvature.
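A sketch of a generic diagonally preconditioned conjugate gradient, in which the diagonal of the curvature matrix serves as the preconditioner. Its beta uses a Fletcher-Reeves-style ratio of gradient·(gradient/curvature) products, and the shift length comes from an exact line search; both are assumptions, and TNT's own equations may differ in detail. The hypothetical toy function $f(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x}-\mathbf{m})^{T} A (\mathbf{x}-\mathbf{m})$ has off-diagonal coupling that the diagonal alone cannot capture; the method recovers it from the previous shift direction and, in exact arithmetic, reaches the minimum of an n-parameter quadratic in at most n cycles.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(A, v):
    return [dot(row, v) for row in A]


def pcg_cycles(A, m, x0, tol=1e-8, max_cycles=1000):
    """Diagonally preconditioned CG on f(x) = 0.5*(x-m)^T A (x-m).

    Only diag(A) is used explicitly; off-diagonal coupling is "learned"
    through the previous shift direction.
    """
    diag = [A[i][i] for i in range(len(A))]
    x = list(x0)
    gz_old = s_old = None
    for cycle in range(max_cycles):
        g = matvec(A, [xi - mi for xi, mi in zip(x, m)])
        if dot(g, g) ** 0.5 < tol:
            return cycle, x
        z = [gi / di for gi, di in zip(g, diag)]  # gradient/curvature
        gz = dot(g, z)
        if s_old is None:
            s = [-zi for zi in z]  # first cycle: plain gradient/curvature
        else:
            beta = gz / gz_old
            s = [-zi + beta * si for zi, si in zip(z, s_old)]
        # Exact line search along s.
        alpha = -dot(g, s) / dot(s, matvec(A, s))
        x = [xi + alpha * si for xi, si in zip(x, s)]
        gz_old, s_old = gz, s
    return max_cycles, x


# Hypothetical symmetric positive-definite curvature matrix with coupling.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
m = [1.0, 2.0, 3.0]
cycles, x = pcg_cycles(A, m, [0.0, 0.0, 0.0])
print(cycles, x)
```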

The equations for the shift in TNT's version of preconditioned conjugate gradient are

Dale Edwin Tronrud
Thu Jul 6 23:24:57 PDT 2000