Data formats for SOLVE

Data formats for automated structure determination with SOLVE

Should you merge your data to the asymmetric unit before running SOLVE?

SOLVE can read unmerged data or data merged to the asymmetric unit.
- PREMERGED data is best if your data is already well scaled
- UNMERGED data is best if your data has not been thoroughly scaled already

Can you input more than one data file for a native, derivative, or wavelength?

For each native, derivative, or wavelength dataset, you can input one or more separate data files.
- If a dataset has just one data file, read in that data file
- If a dataset consists of several data files, read them in one after another

You will need to tell SOLVE about your data format:

if you have DENZO/SCALEPACK output as your raw data...
- ...and the data is NOT MERGED to the asymmetric unit, you will use the flags:
  - READDENZO
  - UNMERGED
  - READ_INTENSITIES
- if the data is ALREADY MERGED to the asymmetric unit, replace UNMERGED with the flag:
  - PREMERGED
if you have FREE-FORMAT intensities or amplitudes as your raw data...
- ...and the data looks like: H K L I SIGMA, use the flags
  - READFORMATTED
  - UNMERGED
  - READ_INTENSITIES
- if the data looks like: H K L I+ SIGMA+ I- SIGMA-, replace UNMERGED with the flag:
  - PREMERGED
- if you have free-format F(hkl) instead of intensities:
  - replace READ_INTENSITIES with the flag READ_AMPLITUDES
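
The flag choices above for DENZO/SCALEPACK and free-format input can be summarized in a short Python sketch. The function `solve_read_flags` and its arguments are hypothetical helpers for illustration and are not part of SOLVE; only the flag names come from the text above. The sketch assumes that amplitudes are read only from free-format files, since that is the only case described here.

```python
def solve_read_flags(source, merged, amplitudes=False):
    """Return the SOLVE input flags for a data file.

    source     -- "denzo" or "formatted" (hypothetical labels for this sketch)
    merged     -- True if the data is already merged to the asymmetric unit
    amplitudes -- True if the file holds F(hkl) rather than intensities
    """
    readers = {"denzo": "READDENZO", "formatted": "READFORMATTED"}
    if source not in readers:
        raise ValueError(f"unknown source: {source!r}")
    if amplitudes and source != "formatted":
        # Assumption: the text only describes amplitudes for free-format input.
        raise ValueError("amplitudes are only described for free-format input")
    return [
        readers[source],
        "PREMERGED" if merged else "UNMERGED",
        "READ_AMPLITUDES" if amplitudes else "READ_INTENSITIES",
    ]


# Unmerged DENZO/SCALEPACK output, one flag per line as in a SOLVE script
print("\n".join(solve_read_flags("denzo", merged=False)))
```
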
if you have a CCP4 MTZ file with amplitudes scaled and reduced to the asymmetric unit as your raw data...
- Make sure that this mtz file contains only the data columns you want, without lots of other columns of data
- Note what you have called your data columns
- The column names that SOLVE will want assigned are:
  - MAD data:
    - FPH1 (amplitude at wavelength 1)
    - SIGFPH1 (sigma of FPH1)
    - DPH1 (anomalous difference wavelength 1)
    - SIGDPH1 (sigma of DPH1)
    - FPH2 (etc for wavelength 2, 3 ...)
  - MIR data:
    - FP (amplitude for native)
    - SIGFP (sigma of FP)
    - FPH1 (amplitude for deriv 1)
    - SIGFPH1 (sigma of FPH1)
    - DPH1 (anomalous difference deriv 1)
    - SIGDPH1 (sigma of DPH1)
    - FPH2 (etc for derivs 2, 3 ...)
- use the flags LABIN and HKLIN to tell SOLVE how to read your mtz file. You can use multiple LABIN statements if the assignments do not fit on one line. For example, suppose the input file is input.mtz and its columns are named FP (native F), SIG (sigma of native F), FHG (derivative F), SIGHG (sigma of derivative F), DELHG (anomalous difference for the derivative), and SIGDELHG (sigma of the anomalous difference). A sample set of statements is:
  - LABIN FP=FP SIGFP=SIG FPH1=FHG SIGFPH1=SIGHG
  - LABIN DPH1=DELHG SIGDPH1=SIGDELHG
  - HKLIN input.mtz
  - NOTE: use uppercase letters (unless your column names are lowercase) because case matters here
- SOLVE figures out if this is MIR or MAD data based on whether or not you define FP and SIGFP.
- When SOLVE reads the HKLIN statement it will read in the file using the information in all previous LABIN statements. HKLIN can be specified only once in a SOLVE run.
- You do not need to input cell dimensions or space group if you use HKLIN. The values read from the mtz file are used unless you change them with a keyword after the HKLIN statement. SOLVE also writes a symmetry file to the local directory, based on the symmetry information in the mtz file, that you can use later if you wish. The file is named after the space group.
- NOTE: remove the SCALE_MAD command from your script file as your data is assumed to be scaled already
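
As an illustration of the LABIN and HKLIN rules above, here is a minimal Python sketch that writes the statements from a table of column assignments. The helper `labin_statements` and its `per_line` parameter are hypothetical and not part of SOLVE. The sketch relies only on what the text states: assignments have the form NAME=LABEL, they may be split over several LABIN statements, and the single HKLIN statement comes after all of them.

```python
def labin_statements(columns, hklin, per_line=4):
    """Build LABIN/HKLIN lines for a SOLVE script.

    columns  -- dict mapping SOLVE column names (e.g. "FP") to the
                labels used in your mtz file (case is preserved)
    hklin    -- name of the mtz file
    per_line -- assignments per LABIN statement (arbitrary choice here)
    """
    if per_line < 1:
        raise ValueError("per_line must be at least 1")
    pairs = [f"{solve_name}={mtz_label}" for solve_name, mtz_label in columns.items()]
    lines = [
        "LABIN " + " ".join(pairs[i:i + per_line])
        for i in range(0, len(pairs), per_line)
    ]
    # HKLIN goes last so that every LABIN statement precedes it.
    lines.append(f"HKLIN {hklin}")
    return lines


# Column labels from the sample statements above
columns = {
    "FP": "FP", "SIGFP": "SIG",
    "FPH1": "FHG", "SIGFPH1": "SIGHG",
    "DPH1": "DELHG", "SIGDPH1": "SIGDELHG",
}
for line in labin_statements(columns, "input.mtz"):
    print(line)
```

With these inputs the sketch reproduces the three sample statements shown above.
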
if you have a CCP4 MTZ file with unmerged intensities or amplitudes as your raw data...
- use mtzutils to get an mtz file with just h k l I+ sig I- sig (or amplitudes and sigmas)
- use mtzdump to dump the entire file to an ascii file
- edit the file to delete the first and last few lines of the file which are not reflections and to replace any occurrences of "?" in the file with "0.0"
- use the flags:
  - READFORMATTED
  - PREMERGED
  - READ_INTENSITIES (or READ_AMPLITUDES)
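
The hand-editing step above can also be scripted. The following Python sketch is one possible approach, not a tool supplied with SOLVE or CCP4. It assumes that each reflection in the dump is a single line of whitespace-separated values that starts with three integers (h k l), and that every other line is header or trailer text. The layout of a real mtzdump listing may differ, so inspect the cleaned file before using it. The function names are hypothetical.

```python
def _is_int(token):
    try:
        int(token)
        return True
    except ValueError:
        return False


def _is_number(token):
    try:
        float(token)
        return True
    except ValueError:
        return False


def clean_dump(lines, missing="0.0"):
    """Keep only reflection lines and replace "?" entries with `missing`."""
    cleaned = []
    for line in lines:
        tokens = line.split()
        if len(tokens) < 4:
            continue  # too short to be h k l plus data
        if not all(_is_int(t) for t in tokens[:3]):
            continue  # does not start with integer h k l
        if not all(t == "?" or _is_number(t) for t in tokens[3:]):
            continue  # remaining columns are not all numeric or "?"
        cleaned.append(" ".join(missing if t == "?" else t for t in tokens))
    return cleaned


# Made-up sample lines for illustration only (not real mtzdump output)
sample = [
    "some header text",
    "   1   0   2   105.3   4.1   ?   ?",
    "  -1   3   4    50.0   2.0  48.5  2.1",
    "some trailer text",
]
for row in clean_dump(sample):
    print(row)
```

To process a file, read it with `open(path).read().splitlines()`, pass the lines to `clean_dump`, and write the result to a new file for use with READFORMATTED.
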
if you have a d*TREK file with intensities as your raw data...
- use the flag READTREK (just one flag needed)