CCP4 Interface: Experimental Phasing Module


Solution Files (HA files)

Merge Datasets - CAD

Scale and Analyse Datasets - SCALEIT and FHSCAL: Scale and Analyse Datasets - Task Window Layout

Prepare Data for HA Search - Revise, Ecalc, MTZ2various

ACORN - ab initio Phasing

SHELX - Heavy Atom Search

RANTAN - Direct Methods

Professs - NCS from HA

Oasis - SAD/SIR phasing

Generate Patterson Map: Excluding Large Intensity Differences; Generate Patterson Map - Task Window Layout

Real Space Patterson Search - RSPS

Run MLPHARE: Data Harvesting; Maps

This module contains the following tasks: Merge Datasets (CAD); Scale and Analyse Datasets; Prepare Data for HA Search; ACORN - ab initio Phasing; SHELX - Heavy Atom Search; RANTAN - Direct Methods; Professs - NCS from HA; Oasis - SAD/SIR phasing; Generate Patterson Map; Real Space Patterson Search; Run MLPHARE

Specialist Help is available on: ScaleChoose - choosing the right scaling program for your datasets

The layout of each task window, i.e. the number of folders present, and whether these folders are open or closed by default, depends on the choices made in the Protocol folder of the task (see Introduction). Although certain folders are closed by default, there are specific reasons why you should or may want to look at them. These reasons are described in the Task Window Layout sections below.

Solution Files

Heavy Atom (.ha) Files

Heavy atom (HA) files are short files which keep a record of the proposed heavy atom sites in a structure. They are analogous to the MR files of the Molecular Replacement module. The format of the file is similar to the ATOM input line for the MLPHARE heavy atom refinement program. There is one line per atom site, and each line is free format, beginning with the word ATOM:

ATOM atom_name x y z occupancy anomalous_occupancy BFAC B-factor

The interface to MLPHARE can use an HA file as input and HA files are output by:

PEAKMAX program: using the OUTPUT FRAC option
RANTAN task: the files are actually generated by the PEAKMAX program which searches the maps generated from RANTAN's output phases
ACORN task: the files are actually generated by the PEAKMAX program which searches the maps generated from ACORN's output phases
Generate Patterson Map task: when the PEAKSEARCH option is on
Real Space Patterson Search (RSPS) task: when the site analysis option is used. The script for the task reads the RSPS program log file and extracts the sites information to HA files
MLPHARE task: after each refinement run the current refined coordinates are extracted from the log file to an HA file

HA files are generated with a default file name of the form project_jobid_n.ha, where n = 1, 2, 3, ... If you select an HA file from the menu under the View Files from Job button, it will be displayed in an HA file viewer which is similar to the MR file viewer and which has some simple functionality to edit the file. Picking a line in the file will put a # character at the beginning of that line, and the line will then be ignored on input to MLPHARE. A second pick will remove the # character. There is a Change All button at the bottom of the viewer which will add or remove the # characters from all ATOM lines. There is also an Edit Columns button which presents options to set the atom name, occupancy, anomalous occupancy and B-factor for all the atoms in the file.
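As an illustration of the file format and of the # convention described above, the following is a minimal Python sketch of a reader for HA files. It assumes that every active line follows the token order of the ATOM template exactly (name, x, y, z, occupancy, anomalous occupancy, the keyword BFAC, then the B-factor). The class name HeavyAtomSite, the function names and the example coordinates are invented for this sketch and are not part of CCP4i.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HeavyAtomSite:
    name: str
    x: float
    y: float
    z: float
    occupancy: float
    anomalous_occupancy: float
    b_factor: float


def parse_ha_lines(lines: List[str]) -> List[HeavyAtomSite]:
    """Return the active ATOM records, skipping lines disabled with '#'."""
    sites = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a site switched off in the viewer
        tokens = line.split()
        if tokens[0].upper() != "ATOM":
            continue  # anything that is not an ATOM line is ignored here
        # Assumed layout: ATOM name x y z occ anom_occ BFAC b
        if len(tokens) < 9 or tokens[7].upper() != "BFAC":
            raise ValueError(f"unexpected ATOM line layout: {raw!r}")
        x, y, z, occ, aocc = (float(t) for t in tokens[2:7])
        sites.append(
            HeavyAtomSite(tokens[1], x, y, z, occ, aocc, float(tokens[8]))
        )
    return sites


def toggle_line(line: str) -> str:
    """Mimic one pick in the viewer: add a leading '#' or remove it."""
    return line[1:] if line.startswith("#") else "#" + line


# Made-up example content: one active site and one disabled site.
example = [
    "ATOM HG 0.125 0.250 0.375 1.0 0.5 BFAC 20.0",
    "#ATOM HG 0.500 0.100 0.900 0.8 0.4 BFAC 25.0",
]
sites = parse_ha_lines(example)
```

Only the first example line is returned as a site; applying toggle_line to the second line would make it active again.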

Merge Datasets - CAD

This task interfaces to the CAD program which can be used to:

Delete columns from an MTZ file
Merge data from two or more MTZ files
Reset the resolution range in the MTZ file(s)
Change the sort order or HKL limits or space group

To input more than one MTZ file, click on the Add input MTZ file button. By default all the data in the input MTZ file are put into the output file, but you can change the Input option from 'all columns' to 'selected columns' and then select the columns using the Add column button. If you want to keep the majority of the columns in the file, click on the List All Columns button and then delete the columns you do not require using the Delete selected item option under the Edit list menu button. To delete a column, you need to select it by clicking on one of the fields for that column with the right mouse button. See also Extending Frames and Toggle Frames.

CAD cannot deal with more than 29 columns.

Do not include columns H, K and L in input. These are transferred to output automatically, and only upset the program.

Two special data types are used to signal that you are preparing data for translation functions of various types. They are:

U: partial FC
V: partial PHIC

There must be only one FCpart/PHICpart pair per input file, and these must be the last items specified for LABIN. CAD generates equivalent reflections using only the ROTATIONAL part of the primitive symmetry operators (i.e. if the space group is P212121, these reflections are analysed as though the space group were P222). This is allowed for in the TFFC and RSEARCH programs.
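The restrictions listed above can be checked before a job is submitted. The following Python sketch is only an illustration: the function check_cad_selection and its data layout are hypothetical, and it assumes that the 29-column limit applies to the total number of selected columns over all input files (not counting H, K and L).

```python
from typing import Dict, List, Tuple

MAX_CAD_COLUMNS = 29            # limit quoted in the text above
INDEX_LABELS = {"H", "K", "L"}  # transferred automatically, never selected
PARTIAL_TYPES = {"U", "V"}      # U: partial FC, V: partial PHIC

Column = Tuple[str, str]        # (column label, MTZ column type)


def check_cad_selection(files: Dict[str, List[Column]]) -> List[str]:
    """Return a list of problems found in a proposed column selection."""
    problems: List[str] = []
    total = 0
    for filename, columns in files.items():
        total += len(columns)
        types = [ctype.upper() for _, ctype in columns]
        for label, _ in columns:
            if label.upper() in INDEX_LABELS:
                problems.append(f"{filename}: remove index column {label}")
        if types.count("U") > 1 or types.count("V") > 1:
            problems.append(f"{filename}: more than one FCpart/PHICpart pair")
        # The partial FC/PHIC columns must occupy the end of the selection.
        n_partial = sum(1 for t in types if t in PARTIAL_TYPES)
        if n_partial and any(t not in PARTIAL_TYPES for t in types[-n_partial:]):
            problems.append(f"{filename}: partial FC/PHIC columns must come last")
    if total > MAX_CAD_COLUMNS:
        problems.append(f"{total} columns selected; limit is {MAX_CAD_COLUMNS}")
    return problems


# Hypothetical selection: labels are examples only.
selection = {
    "native.mtz": [("FP", "F"), ("SIGFP", "Q"), ("FCpart", "U"), ("PHICpart", "V")],
}
problems = check_cad_selection(selection)
```

An empty list means that none of the restrictions checked here is violated; it does not guarantee that CAD will accept the input.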

CAD - Task Window Layout

Features to look out for in the CAD Task are:

Folder title	Importance	Comment
Files Folder	Add input MTZ file	To include more than one MTZ file
Define MTZ Output	override space group, cell dimensions, sort order, hkl limits etc.	can also be done with SFTOOLS

See program documentation: CAD, SFTOOLS

Scale and Analyse Datasets - SCALEIT and FHSCAL

For the scaling of derivative to native datasets, two CCP4 programs are available: SCALEIT and FHSCAL. The tutorial on isomorphous replacement by I. Tickle describes the strengths and weaknesses of those programs. Note that there is no unique solution to the problem of scaling together two different datasets. Various problems can arise from:

Scale Datasets with Anomalous Dispersion Data

The Scale Datasets task will run SCALEIT to scale together all the DPHn (the dispersive difference for the nth wavelength).

It will optionally do a cross-comparison of the anomalous data sets - this involves rerunning SCALEIT with the input:

LABIN FP=FPHn SIGFP=SIGFPHn FPH1=FPHm SIGFPH1=SIGFPHm DPH1=DPHm SIGDPH1=SIGDPHm

for all possible pairwise combinations of wavelengths n and m. From these runs, the cross-comparison Rfactor and normal probability for the acentric data are extracted.
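To make the set of reruns concrete, the following Python sketch generates one LABIN line per pair of wavelengths by substituting n and m into the template above. It is an illustration only: the function name is hypothetical, the pairs are taken as unordered (n < m), which is an assumption about what the Interface does, and the column labels in a real MTZ file need not follow the FPHn pattern literally.

```python
from itertools import combinations
from typing import List


def cross_comparison_labin(n_wavelengths: int) -> List[str]:
    """Build a LABIN line for every unordered pair of wavelengths (n, m)."""
    lines = []
    for n, m in combinations(range(1, n_wavelengths + 1), 2):
        lines.append(
            f"LABIN FP=FPH{n} SIGFP=SIGFPH{n} "
            f"FPH1=FPH{m} SIGFPH1=SIGFPH{m} "
            f"DPH1=DPH{m} SIGDPH1=SIGDPH{m}"
        )
    return lines


# Three wavelengths give the pairs (1,2), (1,3) and (2,3).
labin_lines = cross_comparison_labin(3)
```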

It is also optional to perform analysis of dispersive differences by rerunning SCALEIT with the input:

LABIN FP=FPH(+)n SIGFP=SIGFPH(+)n FPH1=FPH(-)n SIGFPH1=SIGFPH(-)n

From this analysis, the normal probabilities for the acentric and centric data and the Rfactor are extracted. The input MTZ file must contain the FPH(+)n and FPH(-)n. If you do not have data in this form, you should run the mtzMADmod program which converts DPHn to the appropriate form. This program is not interfaced. A better solution is to use the latest version of the TRUNCATE program which retains the FPH(+)n and FPH(-)n on output.

The results of both these analyses are tabulated in a summary file called project_jobid_scaleit.summary.

Scale Datasets - Task Window Layout

In the Protocol folder of the Scale Datasets task, you can choose:

analysis only - use SCALEIT without refinement
scale refinement using SCALEIT - use SCALEIT refinement
scale refinement using FHSCAL (Kraut's method) - use FHSCAL refinement
FHSCAL scale refinement & SCALEIT analysis - in effect, a combination of options 1 and 3
apply input scale factors - use SCALEIT with externally determined scale factors - not usually used

Features to look out for in the Scale Datasets Task are:

Protocol option	Folder title	Importance	Comment
1	Analysis	Graphs of differences between datasets	Analysis against resolution always performed.
2	Refinement Parameters	Apply Wilson scaling	Final Wilson scaling (affects scale factor only) after least-squares scaling (scale and temperature factors). See also Wilson.
3	Fhscal Scaling Parameters		Perform Kraut scaling with FHSCAL. In extreme cases, namely if the high resolution limit of the native dataset is lower than that of (one of) the derivatives, certain reflections may not get output. See also Caveat in FHSCAL program documentation.
4	Analysis	Analysis of FHSCAL results	SCALEIT ANALYSE is performed after scaling using FHSCAL (see protocol options 1 and 3).
5	Input Scaling Factors		Externally determined scales applied and analysis performed. No refinement. See also SCALE.

See program documentation: SCALEIT, FHSCAL.

Prepare Data for HA Search - Revise, Ecalc, MTZ2various

You will need to run this task for the following cases:

Input Data	Phasing Method
MAD	RANTAN or ACORN, SHELX, RSPS, Anomalous Difference Patterson Maps
SAD	RANTAN or ACORN, SHELX
SIR	RANTAN or ACORN, SHELX

In the Prepare Data for HA Search task window you should only need to identify the type of your data and which phasing program you intend to run, and the interface will make the necessary conversions described below.

MAD data is rescaled by the REVISE program to give an estimate of the normalised anomalous scattering magnitude (given the column label FM by RANTAN and ACORN but sometimes referred to as FA in the literature). The input data can be in the form of F(+) and F(-) for each wavelength or be anomalous differences Dano for each wavelength. The output FM can then be used in similar fashion to a single anomalous difference (Dano) or isomorphous difference (Diso). The theory behind this is described in the REVISE program documentation.

Data conversion

Direct methods programs such as SHELX, RANTAN and ACORN usually work with normalised structure amplitudes rather than the structure factors which are normally used in macromolecular crystallography, so structure factor data must be converted before they can be used in direct methods programs. The SHELX program has an internal procedure to do this conversion, but data intended for RANTAN and ACORN must go through the ECALC program, which calculates normalised structure amplitudes (usually given the column label E).
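The idea behind normalisation can be sketched in a few lines of Python. This is a simplified illustration and not the ECALC algorithm: reflections are grouped into resolution shells of roughly equal population, and each amplitude is divided by the root mean square amplitude of its shell, so that the mean of E squared is 1 in every shell. Symmetry enhancement (epsilon) factors and any smoothing between shells are ignored. The function name and the example data are invented.

```python
import math
from typing import List, Tuple


def normalise_amplitudes(
    reflections: List[Tuple[float, float]], n_shells: int = 10
) -> List[float]:
    """Convert (d_spacing, F) pairs to E values, returned in input order."""
    n = len(reflections)
    if n == 0:
        return []
    # Sort reflection indices by 1/d^2 so that shells run from low to high resolution.
    order = sorted(range(n), key=lambda i: 1.0 / reflections[i][0] ** 2)
    shell_size = math.ceil(n / n_shells)
    e_values = [0.0] * n
    for start in range(0, n, shell_size):
        shell = order[start:start + shell_size]
        mean_f2 = sum(reflections[i][1] ** 2 for i in shell) / len(shell)
        for i in shell:
            f = reflections[i][1]
            e_values[i] = f / math.sqrt(mean_f2) if mean_f2 > 0 else 0.0
    return e_values


# Made-up data: (d spacing in Angstrom, amplitude F).
data = [(4.0, 100.0), (3.5, 80.0), (3.0, 60.0), (2.5, 30.0), (2.2, 20.0), (2.0, 10.0)]
e_values = normalise_amplitudes(data, n_shells=2)
```

Although F falls off steeply with resolution in this example, the E values in the two shells are on the same scale, which is the property direct methods rely on.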

RANTAN, ACORN and all other CCP4 programs work with experimental data in MTZ file format but SHELX requires the data in an ASCII format described in the SHELX documentation. The Prepare Data for HA Search task will use MTZ2VARIOUS to convert an MTZ file to SHELX format.

See program documentation: REVISE, MTZ2VARIOUS, ECALC.

ACORN - ab initio Phasing at Atomic Resolution

ACORN is an ab initio procedure to solve a protein structure when atomic resolution data is available. In case of a structure containing heavy atoms, its procedures can be used for determination of anomalous scatterers from anomalous data where the resolution can be as low as 3Å to 4Å.

MAD data for ACORN must be preprocessed by the REVISE program (see above) which generates estimates of FM which is the normalised anomalous scattering factor. The input to REVISE is the FP and FPH(+)n and FPH(-)n for dataset n. These data should have been scaled by the SCALEIT program. REVISE also needs to know the wavelength, f' and f'' for each wavelength.

Acorn - Task Window Layout

Features to look out for in the Acorn Task are:

Protocol option	Folder title	Importance	Comment
search and phase with starting coordinates	ACORN-MR Parameters	Choose between a limited search with a POSItioned fragment, or a full ROTation Function and TRANslation function search
determine small molecule structure	General Acorn Parameters	Choose appropriate grid sampling	Grid sampling defaults to 1/3 of the high resolution limit which, in case of small molecule structures, is commonly around 1Å
search for heavy atom(s) at lower resolution		Separate window opens to 'Prepare Data for Experimental Phasing Programs'
search for heavy atom(s) at lower resolution	Selecting Data	Choose appropriate resolution limits

See program documentation: ACORN, ECALC, REVISE, SCALEIT.

SHELX - Heavy Atom Search

The SHELX program can be obtained from THE SHELX HOMEPAGE. The CCP4i interface is for SHELXS-97. To ensure that CCP4i scripts can find the SHELX program, the full path name of the program needs to be entered in the Configure Interface window which is accessed from a button in the System Administration menu on the right hand side of the Main Window.

For more information on the SHELX program, see THE SHELX HOMEPAGE. This has references to various FAQs: The SHELX Homepage; Frequently asked questions (macromolecules), and Thomas Schneider's FAQs.

RANTAN - Direct Methods

The RANTAN Direct Methods program can be applied to solving MAD data or isomorphous replacement data. The Interface will set the key input parameters appropriately for the type of data.

For isomorphous data, RANTAN works optimally with the input in the form of normalised amplitudes rather than structure factors so the Interface will usually run the ECALC program to convert SFs to normalised amplitudes. The Interface will alternatively allow input of either precalculated normalised amplitudes or normalised amplitudes and initial phases.

MAD data for RANTAN will be preprocessed by the REVISE program (see above) which generates estimates of FM which is the normalised anomalous scattering factor. The input to REVISE is the FP and FPH(+)n and FPH(-)n for dataset n. These data should have been scaled by the SCALEIT program. REVISE also needs to know the wavelength, f' and f'' for each wavelength.

See program documentation: RANTAN, ECALC, REVISE, SCALEIT.

Professs - NCS from HA

PROFESSS is a tool to help in the identification of NCS related atoms from a list of heavy atom positions. At the moment, PROFESSS only works with 'traditional' PDB files. HA files as produced by ACORN or RANTAN (for instance) can not be fed into PROFESSS - the HA file needs to be converted through the Convert Coordinate Formats task in the Coordinate Utilities module.

Professs - reading the output

The program first lists the triangles of atoms which it has found, then it analyses each pair of triangles as a possible NCS match. For each possible operator, a list of all matching atoms is given. For each pair of atoms, a 'loop factor' is listed. If the NCS operator is an N-fold rotation, the atom will be part of a 'loop' of N atoms (unless one is missing). This, along with an appropriate 3rd polar angle, can confirm the existence of a proper NCS operator.

Atoms are described by the atom serial number from the input PDB, along with 4 numbers listed in square brackets. The first of these is the number of the crystallographic symmetry operator, and the other three are the unit cell translations applied after the symmetry operator.
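The meaning of the four bracketed numbers can be illustrated with a short Python sketch that applies a symmetry operator and then the unit cell translations to a fractional coordinate. The function name, the operator list and the example values are invented for illustration, and the sketch assumes that operators are numbered from 1 in the order in which they are listed for the space group; check the program output for the actual convention.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
# A symmetry operator as (3x3 rotation matrix, translation), fractional coordinates.
SymOp = Tuple[Sequence[Sequence[int]], Vec]


def place_atom(x: Vec, ops: List[SymOp], descriptor: Tuple[int, int, int, int]) -> Vec:
    """Apply operator number descriptor[0], then add the cell translations."""
    op_number, ta, tb, tc = descriptor
    rot, trans = ops[op_number - 1]  # assumption: 1-based operator numbering
    moved = [
        sum(rot[i][j] * x[j] for j in range(3)) + trans[i] for i in range(3)
    ]
    return (moved[0] + ta, moved[1] + tb, moved[2] + tc)


# Example operators for space group P21: x,y,z and -x,y+1/2,-z.
P21_OPS: List[SymOp] = [
    ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], (0.0, 0.0, 0.0)),
    ([[-1, 0, 0], [0, 1, 0], [0, 0, -1]], (0.0, 0.5, 0.0)),
]

# Descriptor [2 1 0 1]: second operator, then one cell along a and one along c.
position = place_atom((0.1, 0.3, 0.2), P21_OPS, (2, 1, 0, 1))
```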

Professs - beware

When calculating the distance between a pair of atoms, all symmetry equivalents are considered, but only the cell repeat giving the least distance is considered. In a very few cases of low order crystallographic symmetry this may cause atoms to be missed.

Oasis - SAD/SIR phasing

OASIS is a computer program for breaking phase ambiguity in One-wavelength Anomalous Scattering or Single Isomorphous Replacement (Substitution) protein data. The phase problem is reduced to a sign problem once the anomalous-scatterer or the replacing-heavy-atom sites are located. OASIS applies a direct method procedure to break the phase ambiguity intrinsic to OAS or SIR data.

Generate Patterson Map

The Generate Patterson Map Task performs the following:

Run SCALEIT to find an optimal cutoff for excluding reflections with suspiciously large differences
Run FFT PATTERSON in default sectioning mode to get first direction of map sections
Run MAPMASK to resection output map, to produce all necessary Harker sections
Run PEAKMAX to search maps for peaks and write these to the "Peak coord" file and to an HA file (see above)
Plot Harker sections with NPO

Optionally:

The user can give the coordinates of points to be plotted on the Patterson map
The user can give the coordinates of putative heavy atom sites and the VECTORS program is run to determine the predicted cross-vectors which are then plotted on the Patterson map

Excluding Large Intensity Differences

Erroneously large intensity differences can affect a Patterson map disproportionately because the parameter used, the intensity, is the square of the structure factor, and the square of a large number is a very large number. The effect seen in the Patterson map is ridges.

It is therefore usually a good idea to exclude the reflections with very high differences: FPH-FP from the difference Patterson and FPH+-FPH- from the anomalous difference Patterson. By default the Interface will run the SCALEIT program to analyse the data and use the value of 4.1*RMS(FPH-FP) which is a reasonable first estimate of a suitable cutoff. It may be worthwhile to try different cutoff values and look at the resultant Patterson map - the value used can be set at the top of the Exclude Reflections folder. Excluding 'good' reflections tends to degrade the map so it is not good to over-estimate the cutoff value. For very good data it may be unnecessary to exclude any data. The SCALEIT log file also has a table of Isomorphous and (if appropriate) Anomalous differences which show the number of reflections with given differences as a function of resolution shell.
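The default cutoff described above can be sketched in Python as follows. This is an illustration only, not the SCALEIT implementation: it assumes a single RMS taken over all reflections, and the function names and test data are invented. Changing the factor argument corresponds to trying a different cutoff value.

```python
import math
from typing import List, Tuple


def rms_cutoff(differences: List[float], factor: float = 4.1) -> float:
    """Return factor * RMS of the supplied differences."""
    rms = math.sqrt(sum(d * d for d in differences) / len(differences))
    return factor * rms


def exclude_large_differences(
    pairs: List[Tuple[float, float]], factor: float = 4.1
) -> Tuple[List[Tuple[float, float]], float]:
    """Keep (FP, FPH) pairs whose |FPH - FP| does not exceed the cutoff."""
    diffs = [fph - fp for fp, fph in pairs]
    cutoff = rms_cutoff(diffs, factor)
    kept = [pair for pair, d in zip(pairs, diffs) if abs(d) <= cutoff]
    return kept, cutoff


# Made-up data: 99 reflections differing by 1 unit and one gross outlier.
pairs = [(100.0, 101.0)] * 99 + [(100.0, 150.0)]
kept, cutoff = exclude_large_differences(pairs)
```

Because an outlier also inflates the RMS, a single reflection can only exceed 4.1 times the RMS when the data set contains more than about 17 reflections, which is never a restriction in practice.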

Generate Patterson Map - Task Window Layout

Features to look out for in the Generate Patterson Map Task are:

Protocol option	Folder title	Importance	Comment
difference Patterson	Exclude Reflections	Exclude reflections with erroneously large (intensity) differences between F1 and F2 (i.e. FPH and FP)	see Excluding Large Intensity Differences
anomalous difference Patterson	Exclude Reflections	Exclude reflections with erroneously large (intensity) differences between F1 and F2 (i.e. FPH+ and FPH-)	see Excluding Large Intensity Differences

See program documentation: SCALEIT, FFT, MAPMASK, PEAKMAX, NPO, VECTORS, HAVECS.

Real Space Patterson Search - RSPS

RSPS is a grid search program that provides search options (to solve heavy atom derivatives) as well as interactive options for examining potential solutions (as a fit of potential sites to the difference Patterson map). All options operate in real and vector space. Searches can be performed to locate either heavy atom positions, or, under certain conditions, to locate the position of molecules with internal (NCS) symmetry. The goal of RSPS is not to generate a complete solution to the heavy atom difference Patterson, but rather to find enough sites to allow initial phases to be calculated for difference Fourier analysis.

Searches are carried out by assigning trial positions on a grid covering the asymmetric unit of the crystal, and then computing a score for each trial position, based on the Patterson densities at the positions corresponding to the predicted vectors for each position. From the symmetry operators (crystallographic and/or non-crystallographic) all unique transformations that map a point in real (crystal) space to a point in vector (Patterson) space are generated. In other words, these transformations map a point in real space to the Patterson vectors associated with that point.
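The search described above can be sketched in Python for the simplest case of a single site. This is a toy illustration, not the RSPS implementation: the score is a plain sum of Patterson densities at the predicted self-vectors (RSPS offers its own scoring options), the Patterson map is replaced by an analytic function with one peak, and all names and values are invented. The example uses space group P21, whose only non-identity operator is -x, y+1/2, -z, and a toy map whose peak corresponds to a site at x = 0.1, z = 0.2.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float, float]
# A symmetry operator as (3x3 rotation matrix, translation), fractional coordinates.
SymOp = Tuple[Sequence[Sequence[int]], Vec]


def apply_op(op: SymOp, x: Vec) -> Vec:
    rot, trans = op
    return tuple(sum(rot[i][j] * x[j] for j in range(3)) + trans[i] for i in range(3))


def self_vectors(x: Vec, non_identity_ops: List[SymOp]) -> List[Vec]:
    """Map a real-space point to its Patterson self-vectors S(x) - x."""
    vectors = []
    for op in non_identity_ops:
        sx = apply_op(op, x)
        vectors.append(tuple((sx[i] - x[i]) % 1.0 for i in range(3)))
    return vectors


def grid_search(
    patterson: Callable[[Vec], float], non_identity_ops: List[SymOp], n_grid: int
) -> Tuple[Vec, float]:
    """Score every grid point by the summed Patterson density at its vectors."""
    best_pos, best_score = (0.0, 0.0, 0.0), -math.inf
    for i in range(n_grid):
        for j in range(n_grid):
            for k in range(n_grid):
                x = (i / n_grid, j / n_grid, k / n_grid)
                score = sum(patterson(v) for v in self_vectors(x, non_identity_ops))
                if score > best_score:
                    best_pos, best_score = x, score
    return best_pos, best_score


P21_SCREW: SymOp = ([[-1, 0, 0], [0, 1, 0], [0, 0, -1]], (0.0, 0.5, 0.0))


def toy_patterson(v: Vec, peak: Vec = (0.8, 0.5, 0.6)) -> float:
    """Stand-in for a map: one Gaussian peak, periodic in fractional coordinates."""
    d2 = 0.0
    for a, b in zip(v, peak):
        d = abs(a - b)
        d = min(d, 1.0 - d)
        d2 += d * d
    return math.exp(-d2 / 0.001)


best_pos, best_score = grid_search(toy_patterson, [P21_SCREW], 10)
```

In this toy case several grid points score equally well: the self-vector (-2x, 1/2, -2z) is unchanged when 1/2 is added to x or z and does not depend on y at all, so the search returns only one of the equivalent positions. This reflects the choice of origin in P21 and is not a fault of the search.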

Run MLPHARE

MLPHARE can be used to refine either isomorphous or anomalous data. Check the 'Use anomalous difference data' box at the top of the MLPHARE interface if appropriate. The initial default interface only provides for describing one derivative or wavelength; click on the Add Another Derivative button under the 'MTZ in' section to open space for additional data.

The minimal input then required is some initial heavy atom definitions in the folder Describe Derivatives & Refinement. For each derivative enter a name, and the name of the HA file containing the data for that derivative. Alternatively, enter the atoms explicitly by changing the Use data 'from file' menu option to Use data 'entered below' and then typing in the information. The Cut and Paste tool may be useful. For anomalous data you will need to enter the same HA file for each wavelength.

It is possible to edit the HA files 'on line' by clicking the View button on the file selection line. The HA file viewer has some simple editing tools but more complex changes may need to be done in an editor.

The output MTZ file contains columns PHIB_mlphare1, FOM_mlphare1 etc. If you use this file as input to another MLPHARE run, set a new unique column name extension, for instance by changing the parameter 'Output label identifier' from mlphare1 to mlphare2. Each run of MLPHARE within the Interface also outputs one HA file for each derivative. These HA files can be used as input to the next MLPHARE run.

The SCALEIT documentation states: "MLPHARE has a built in weighting scheme which means that it doesn't do much harm to include less good data in phasing. After all the poor hkl should get low FOMs, and then DM can use the few reflections with reasonable phases to help in the phase extension procedure."

The MLPHARE program documentation has several helpful hints, e.g.: "NB: If an occupancy becomes near to 0.0 the coordinate shifts will possibly be meaningless", and a whole section of Notes on usage.

Suggested input numbers for Estimated Lack of Closure:

The program documentation suggests no input at all for the very first run.
The Interface has default 0.0 for all the numbers, even in the very first run.
Some people 'always' use a certain number (10% of F?!) in the very first run.

Data Harvesting

MLPHARE is one of the Data Harvesting programs. See Data Harvesting in CCP4i for implications for the Interface.

Maps

The MLPHARE interface has the option to output double difference maps which can be used to search for further heavy atoms. In this case the PEAKMAX program will also be run to list the peaks to a PDB file and to an HA file with the name project_jobid_label_peaks.ha where label is the MTZ column label of the derivative FPH. If you wish to do any other analysis on the map, it can be input to the 'Generate Patterson Map' task when the 'Run FFT ...' option at the top of the task window has been toggled off.

It is easiest to create maps by running the FFT task inside the Run Mlphare task. Do this by toggling on the option to 'Generate double difference maps files ...'.

In some cases it may be necessary to (re)create maps independently from the MLPHARE task. It is not possible to do this through the Create Task-Specific Maps task in the Map & Mask Utilities module. And only if you know exactly what you are doing should you attempt to do this through the Run FFT - Create Map task in the Map & Mask Utilities module.

See program documentation: MLPHARE, PEAKMAX, FFT.

See also MIRTutorial(Bath) (the HTML equivalent of $CDOC/Iso_repl_itickle_tut.bath.ps),
Isomorphous Replacement (Birkbeck),
LLNL - Bernhard Rupp's Crystallographic Web Applets (containing an applet which calculates expected anomalous dispersion ratios),
Chooch (a program for calculating Anomalous Scattering Factors from X-ray fluorescence data).
