Data formats for automated structure determination with SOLVE
Should you merge your data to the asymmetric unit before running SOLVE?
- SOLVE can read unmerged data or data merged to the asymmetric unit.
- merged (PREMERGED) data is best if your data has already been well scaled
- unmerged (UNMERGED) data is best if your data has not yet been thoroughly scaled
Can you input more than one data file for a native, derivative, or wavelength?
- Yes. Each native, derivative, or wavelength dataset can consist of one or more separate data files.
- If a dataset has just one data file, read in that file.
- If a dataset consists of several data files, read them in one after another.
You will need to tell SOLVE about your data format:
- if you have DENZO/SCALEPACK output as your raw data...
- ...and the data is NOT MERGED to the asymmetric unit, you will use the flags:
- READDENZO
- UNMERGED
- READ_INTENSITIES
- if the data is ALREADY MERGED to the asymmetric unit, substitute the flag PREMERGED for UNMERGED
- if you have FREE-FORMAT intensities or amplitudes as your raw data...
- ...and the data looks like: H K L I SIGMA, use the flags
- READFORMATTED
- UNMERGED
- READ_INTENSITIES
- if the data looks like: H K L I+ SIGMA+ I- SIGMA-, substitute the flag PREMERGED for UNMERGED
- if you have free-format amplitudes F(hkl) instead of intensities:
- substitute the flag READ_AMPLITUDES for READ_INTENSITIES
- if you have a CCP4 MTZ file with amplitudes scaled and reduced to the asymmetric unit as your raw data...
- You will have to make sure that this MTZ file contains only the data you want and not lots of other columns of data
- Note what you have called your data columns
- The column names that SOLVE will want assigned are:
- MAD data:
- FPH1 (amplitude at wavelength 1)
- SIGFPH1 (sigma of FPH1)
- DPH1 (anomalous difference wavelength 1)
- SIGDPH1 (sigma of DPH1)
- FPH2 (etc. for wavelengths 2, 3, ...)
- MIR data:
- FP (amplitude for native)
- SIGFP (sigma of FP)
- FPH1 (amplitude for deriv 1)
- SIGFPH1 (sigma of FPH1)
- DPH1 (anomalous difference deriv 1)
- SIGDPH1 (sigma of DPH1)
- FPH2 (etc. for derivatives 2, 3, ...)
- use the keywords LABIN and HKLIN to tell SOLVE how to read your MTZ file. You can use multiple LABIN statements if the assignments do not fit on one line. For example, if the native F is called FP and its sigma SIG, the derivative F is called FHG and its sigma SIGHG, the derivative anomalous difference is called DELHG and its sigma SIGDELHG, and the input file is input.mtz, the statements are:
- LABIN FP=FP SIGFP=SIG FPH1=FHG SIGFPH1=SIGHG
- LABIN DPH1=DELHG SIGDPH1=SIGDELHG
- HKLIN input.mtz
- NOTE: case matters here, so use uppercase letters unless your column names are lowercase
- SOLVE decides whether the data are MIR or MAD based on whether you define FP and SIGFP (defining the native FP and SIGFP means MIR).
- When SOLVE reads the HKLIN statement, it reads the file using the information in all previous LABIN statements. HKLIN can be specified only once in a SOLVE run.
- You do not need to input cell dimensions or space group if you use HKLIN. The values read from the MTZ file are used unless you change them with a keyword after the HKLIN statement. SOLVE also writes a symmetry file, named after the space group, to the local directory, based on the symmetry information in the MTZ file; you can use it later if you wish.
- NOTE: remove the SCALE_MAD command from your script file, as your data is assumed to be scaled already
- if you have a CCP4 MTZ file with unmerged intensities or amplitudes as your raw data...
- use mtzutils to get an MTZ file with just h k l I+ sig I- sig (or amplitudes and sigmas)
- use mtzdump to dump the entire file to an ASCII file
- edit the file to delete the lines at the beginning and end that are not reflections, and replace any occurrences of "?" in the file with "0.0"
- use the flags:
- READFORMATTED
- PREMERGED
- READ_INTENSITIES (or READ_AMPLITUDES)
- if you have a d*TREK file with intensities as your raw data...
- use the flag READTREK (just one flag needed)
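The flag choices above for DENZO/SCALEPACK, free-format, and d*TREK input can be summarized as a small lookup. This is a minimal Python sketch, not part of SOLVE: the names `solve_input_flags` and `merged_from_columns` are hypothetical, PREMERGED is assumed to replace UNMERGED for data merged to the asymmetric unit (as in the mtzdump recipe), and guessing the layout from the column count (5 or 7) only covers the two free-format layouts described here.

```python
# Hypothetical helpers: summarize the SOLVE data-format flags described above.
# They are not part of SOLVE; they only encode the rules in this section.


def merged_from_columns(line: str) -> bool:
    """Guess the free-format layout from one reflection line.

    5 columns: H K L I SIGMA (read as UNMERGED).
    7 columns: H K L I+ SIGMA+ I- SIGMA- (read as PREMERGED).
    """
    ncols = len(line.split())
    if ncols == 5:
        return False
    if ncols == 7:
        return True
    raise ValueError(f"expected 5 or 7 columns, found {ncols}")


def solve_input_flags(source: str, merged: bool = False,
                      amplitudes: bool = False) -> list[str]:
    """Return the format flags for one raw data source.

    source is "denzo" (DENZO/SCALEPACK), "formatted" (free-format) or "dtrek".
    """
    if source == "dtrek":
        # d*TREK intensities: a single flag is enough.
        return ["READTREK"]
    if source == "denzo":
        if amplitudes:
            # This section only describes intensities for DENZO/SCALEPACK.
            raise ValueError("DENZO/SCALEPACK input is read as intensities here")
        read_flag = "READDENZO"
    elif source == "formatted":
        read_flag = "READFORMATTED"
    else:
        raise ValueError(f"unknown source: {source!r}")
    # Assumption: PREMERGED replaces UNMERGED for data merged to the asymmetric unit.
    merge_flag = "PREMERGED" if merged else "UNMERGED"
    value_flag = "READ_AMPLITUDES" if amplitudes else "READ_INTENSITIES"
    return [read_flag, merge_flag, value_flag]
```

For example, `solve_input_flags("formatted", merged=merged_from_columns("1 2 3 1520.3 41.2 1498.7 40.9"))` returns `['READFORMATTED', 'PREMERGED', 'READ_INTENSITIES']`.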
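Writing the LABIN/HKLIN statements by hand gets error-prone with many wavelengths or derivatives. The Python sketch below uses hypothetical helper names (`labin_lines`, `data_type`): it turns a mapping from SOLVE labels to your MTZ column names into LABIN statements, starts a new LABIN statement when a line would exceed `max_len` (an assumed limit; the real SOLVE line length is not stated in this section), puts the single HKLIN statement last, and reports MIR or MAD according to whether FP and SIGFP are assigned.

```python
import re

# SOLVE labels listed in this section; the digits are the wavelength or derivative number.
SOLVE_LABEL = re.compile(r"FP|SIGFP|FPH\d+|SIGFPH\d+|DPH\d+|SIGDPH\d+")


def labin_lines(assignments: dict[str, str], hklin: str,
                max_len: int = 72) -> list[str]:
    """Build LABIN statements followed by one HKLIN statement.

    assignments maps SOLVE labels (e.g. "FPH1") to MTZ column names, which are
    copied unchanged because case matters. max_len is an assumed limit.
    """
    for label in assignments:
        if not SOLVE_LABEL.fullmatch(label):
            raise ValueError(f"not a SOLVE label: {label}")
    lines: list[str] = []
    current = "LABIN"
    for label, column in assignments.items():
        item = f" {label}={column}"
        # Start a new LABIN statement rather than overflow the line.
        if current != "LABIN" and len(current) + len(item) > max_len:
            lines.append(current)
            current = "LABIN"
        current += item
    lines.append(current)
    # HKLIN goes last: SOLVE applies all previous LABIN statements when reading it.
    lines.append(f"HKLIN {hklin}")
    return lines


def data_type(assignments: dict[str, str]) -> str:
    """SOLVE treats the data as MIR if FP and SIGFP are defined, else MAD."""
    return "MIR" if "FP" in assignments and "SIGFP" in assignments else "MAD"
```

With the column names from the LABIN example (FP=FP, SIGFP=SIG, FPH1=FHG, SIGFPH1=SIGHG, DPH1=DELHG, SIGDPH1=SIGDELHG) and `max_len=50`, `labin_lines` produces the same two LABIN lines shown there, followed by `HKLIN input.mtz`, and `data_type` reports MIR.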
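The hand edit of the mtzdump listing described above can also be scripted. This Python sketch is an assumed alternative to editing by hand, not a CCP4 tool: the name `clean_mtzdump` is hypothetical, and it assumes a reflection line has exactly seven whitespace-separated tokens (h k l I+ sig I- sig), the first three integers and the rest numbers or "?". It keeps only such lines and writes each "?" as "0.0"; check the result against the original listing.

```python
def _is_number(token: str) -> bool:
    """True if the token parses as a floating-point number."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def clean_mtzdump(lines: list[str], ncols: int = 7) -> list[str]:
    """Keep only reflection records from an mtzdump listing.

    A line counts as a reflection if it has exactly ncols tokens, the first
    three are integers (h k l) and the rest are numbers or "?". Each "?"
    (a missing value) is written as "0.0".
    """
    records = []
    for line in lines:
        tokens = line.split()
        if len(tokens) != ncols:
            continue  # header, column table, or trailer text
        if not all(t.lstrip("-").isdigit() for t in tokens[:3]):
            continue  # first three fields are not integer indices
        if not all(t == "?" or _is_number(t) for t in tokens[3:]):
            continue
        records.append(" ".join("0.0" if t == "?" else t for t in tokens))
    return records
```

For example, pass `open("dump.txt").read().splitlines()` (dump.txt being a hypothetical name for the mtzdump output), write the returned records one per line, and read that file with the READFORMATTED flags listed above.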