SOLVE Examples

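This page gives three examples of what SOLVE can do. Each is an automatic structure determination carried out starting from raw data and a minimal amount of information from the user. In each case the top solution was correct and had the correct handedness (anomalous differences were used in each structure determination). The examples are gene 5 protein (3-wavelength MAD), the armadillo repeat region of beta-catenin (4-wavelength MAD), and granulocyte-macrophage colony stimulating factor (MIR).

For each structure determination, this page shows a summary of the structure solution, the solve.setup file, the input script file used to run SOLVE, summary information from the solve.prt output file, and the end of the solve.status file.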

 

Gene 5 protein

Summary of this structure solution:

This is a dataset with 3 wavelengths of MAD data, 2800 reflections to 2.6 A, 87 amino acids, and 2 selenium sites (Met-1, Met-77). SOLVE found both selenium sites in 6 minutes on a DEC Alpha 500 MHz workstation. The Met-1 site has a very high thermal factor.

Solve.setup file listing basic information about the crystals:

CELL 76.08 27.97 42.36 90 103.2 90        ! cell params
symfile /usr/local/lib/solve/c2.sym       ! space group symmetry
resolution 2.6 20.0                       ! resolution limits
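
The setup file is a plain list of keywords followed by values, with "!" starting a comment. As a rough illustration of that layout (not of how SOLVE itself reads the file), the Python sketch below splits the listing into keyword/value pairs. The helper name parse_setup is hypothetical. Lowercasing keywords and sorting the two resolution limits are assumptions suggested by the listings on this page, which use both CELL and cell and give the resolution limits in either order.

```python
# Hypothetical helper, not part of SOLVE: split a setup file into keyword/value pairs.
SETUP_TEXT = """\
CELL 76.08 27.97 42.36 90 103.2 90        ! cell params
symfile /usr/local/lib/solve/c2.sym       ! space group symmetry
resolution 2.6 20.0                       ! resolution limits
"""


def parse_setup(text):
    """Return {keyword: [values]} with '!' comments removed and keywords lowercased."""
    settings = {}
    for line in text.splitlines():
        line = line.split("!", 1)[0].strip()  # drop the trailing comment, if any
        if not line:
            continue
        keyword, *values = line.split()
        settings[keyword.lower()] = values
    return settings


setup = parse_setup(SETUP_TEXT)
cell = [float(v) for v in setup["cell"]]
# Assumption: the two limits may appear in either order, so sort them.
d_limits = sorted(float(v) for v in setup["resolution"])

print("cell a b c alpha beta gamma:", cell)
print("high and low resolution limits (A):", d_limits)
```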

Input script file used to run SOLVE on gene 5 protein

#  command file to run solve on gvp data
solve <<***
!command file to read in raw MAD data, scale, analyze and solve it----
checksolve                     ! compare solution with known h.a. sites
comparisonfile gvp.fft         ! FFT map using FCALC from refined model
@solve.setup                   ! get our standard information read in 
logfile mad.logfile            ! write out most information to this file.
                               ! summary info will be written to solve.prt
nobayes
readformatted                  ! alternatives are readdenzo, readtrek
premerged                      ! alternative is unmerged
read_intensities               ! alternative is read_amplitudes
refscattfactors                ! alternative is fixscattfactors

mad_atomname se                ! anomalously scattering atom is Se

lambda 1                      ! info on wavelength #1 follows 
label Wavelength #  1         ! a label for this wavelength
rawmadfile test_wva.fmt       ! datafile with h k l Intensity sigma or
                              ! h k l I+ sigma+ I- sigma-
wavelength 0.9000             ! wavelength value
fprimv_mad  -1.6              ! f' value at this wavelength
fprprv_mad  3.4               ! f doubleprime value at this wavelength

! input refined h.a. coordinates (used only for comparison in "checksolve")
atomname se
 XYZ   0.4813319      0.9972169      9.4140753E-02 
atomname se
 XYZ   0.9731338      0.2875228      0.9446641

lambda 2
rawmadfile test_wvb.fmt
wavelength 0.9794
fprimv_mad  -8.5
fprprv_mad  4.8

lambda 3
rawmadfile test_wvc.fmt
wavelength 0.9797
fprimv_mad  -9.85
fprprv_mad  2.86
premerged                  
readformatted
nres 100                  [approx # of residues in protein molecule]
nanomalous 2              [approx # of anomalously scattering atoms per protein]
SCALE_MAD                 ! read in and localscale the data
ANALYZE_MAD               ! run MADMRG and MADBST and analyze all the Pattersons
SOLVE                     ! Solve the structure
***
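
To put the three wavelengths in the script into context, the following Python sketch converts them to photon energies with E = hc/lambda and compares them with the selenium K absorption edge. The edge energy of about 12.658 keV is a tabulated value that is not given on this page, and the variable names are illustrative only. The result is consistent with the f' and f" values in the script: wavelengths 2 and 3 lie within a few eV of the edge, where the scattering factors change rapidly, while wavelength 1 is roughly 1.1 keV above it.

```python
HC_KEV_ANGSTROM = 12.39842  # h*c in keV * Angstrom
SE_K_EDGE_KEV = 12.658      # assumed: approximate tabulated Se K edge, not from this page

# wavelength number -> (wavelength in A, f', f"), copied from the gene 5 protein script
wavelengths = {
    1: (0.9000, -1.6, 3.4),
    2: (0.9794, -8.5, 4.8),
    3: (0.9797, -9.85, 2.86),
}


def energy_kev(wavelength):
    """Photon energy in keV for a wavelength in Angstrom."""
    return HC_KEV_ANGSTROM / wavelength


for n, (lam, fprime, fdoubleprime) in wavelengths.items():
    e = energy_kev(lam)
    offset_ev = (e - SE_K_EDGE_KEV) * 1000.0
    print(f"lambda {n}: {lam:.4f} A = {e:.3f} keV "
          f"({offset_ev:+.0f} eV from edge), f' = {fprime}, f\" = {fdoubleprime}")
```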

Summary information from the "solve.prt" output file produced after completion of the automated structure determination

Selenium atom occupancy, coordinates, and thermal factors, and
Cross-validation fouriers calculated with all heavy atoms in
all derivs except the site being evaluated and any sites equivalent to it.

(Peak height is height of peak at this position/rms of map)

  Site    x       y       z      occ       B      -- PEAK  HEIGHT --
    1   0.985   0.497   0.094   0.691  50.105              4.01
    2   0.030   0.286   0.056   0.365  60.000              3.75
Figure of merit versus resolution
 DMIN:           TOTAL    8.81   5.75   4.55   3.88   3.44   3.12   2.88   2.68
 N:                2544    146    224    297    337    380    401    436    323
 MEAN FIG MERIT:   0.62   0.74   0.74   0.78   0.69   0.64   0.53   0.51   0.47
List of sites analyzed for compatibility with difference Patterson
(Height is 1000 x height of peak in Patterson/rms of map.  Predicted height 
is expected height based on occupancy of sites)

   PEAK         X         Y         Z     OPTIMIZED
                                             RELATIVE OCCUPANCY
      1     0.984     0.500     0.090      88.492
      2     0.031     0.292     0.056      32.693

 Evaluation of this test soln with    2 sites after optimizing
 occupancy of each site

 Cross-vectors for sites  1 and  1 (excluding origin):
  #      U        V      W      HEIGHT   PRED HEIGHT      SYMM#
   1   -1.969   0.000  -0.181   15395.6     15661.5          2

 Cross-vectors for sites  2 and  1 (excluding origin):
  #      U        V      W      HEIGHT   PRED HEIGHT      SYMM#
   1   -0.953  -0.208  -0.035   3690.03     2893.06          1
   2   -1.016  -0.208  -0.146   4402.75     2893.06          1

 Cross-vectors for sites  2 and  2 (excluding origin):
  #      U        V      W      HEIGHT   PRED HEIGHT      SYMM#
   1   -0.063   0.000  -0.111   476.426     2137.67          2

Overall quality of this Patterson soln =  6963.16 (weighted sum of peak heights)
Overall quality of the fit to patterson = 1.07189 (agreement of PRED and HEIGHT)
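
The U, V, W values in the cross-vector lists above can be reproduced from the two site positions. The Python sketch below assumes that the general positions of C2 (b unique) are x,y,z and -x,y,-z, and forms the difference between each symmetry copy of one site and the other site, skipping the zero vector at the origin. The function name cross_vectors is illustrative, and because it starts from the rounded coordinates printed above, the last digit can differ by 0.001 from the SOLVE listing.

```python
# Assumed general positions of C2 (b unique). The C-centering translation is
# omitted because it adds no new interatomic vectors modulo the lattice.
C2_OPS = [
    lambda x, y, z: (x, y, z),
    lambda x, y, z: (-x, y, -z),
]

# Site coordinates copied from the "List of sites analyzed" table above
sites = {1: (0.984, 0.500, 0.090), 2: (0.031, 0.292, 0.056)}


def cross_vectors(site_a, site_b, ops=C2_OPS):
    """Vectors op(site_a) - site_b for every operator, excluding the origin."""
    vectors = []
    for op in ops:
        u, v, w = (round(p - q, 3) for p, q in zip(op(*site_a), site_b))
        if (u, v, w) != (0.0, 0.0, 0.0):
            vectors.append((u, v, w))
    return vectors


for a, b in [(1, 1), (2, 1), (2, 2)]:
    for u, v, w in cross_vectors(sites[a], sites[b]):
        print(f"sites {a} and {b}: {u:7.3f} {v:7.3f} {w:7.3f}")
```

A site paired with itself gives a single Harker vector (2x, 0, 2z up to sign), and two different sites give two cross-vectors, which matches the number of rows in each list.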
The summary of scoring for this solution
 Summary of scoring for this solution:
                           -- over many solutions--    -- this solution --
 Criteria                       MEAN          SD         VALUE        Z-SCORE
 Pattersons:                   2.03         1.80         5.19         1.76
 Cross-validation Fourier:     4.43         1.02         6.47         1.98
 NatFourier CCx100:            26.0         4.42         30.5         1.01
 Mean figure of meritx100:    0.000E+00     10.9         61.5         5.62
 Correction for Z-scores:                                            -2.05

 Overall Z-score value:                                               8.33

Note that the Patterson and cross-validation Fourier scores are about 2 standard deviations above the mean for the starting solutions (Z-scores of 1.76 and 1.98), but the native Fourier score is just 1 standard deviation above it (Z-score of 1.01). This is because the asymmetric unit is small and the map is fairly noisy.
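
The numbers in the scoring summary above appear to be consistent with a simple rule: each Z-score is (VALUE - MEAN)/SD, and the overall score is the sum of the four Z-scores plus the correction term. This is inferred from the listing, not taken from the SOLVE source, and the Python sketch below only checks the arithmetic. Because the printed values are rounded to three figures, the results agree with the listing to within a few hundredths.

```python
# (MEAN, SD, VALUE) for each criterion, copied from the scoring summary above
criteria = {
    "Pattersons": (2.03, 1.80, 5.19),
    "Cross-validation Fourier": (4.43, 1.02, 6.47),
    "NatFourier CCx100": (26.0, 4.42, 30.5),
    "Mean figure of meritx100": (0.0, 10.9, 61.5),
}
correction = -2.05  # the "Correction for Z-scores" line


def z_score(mean, sd, value):
    """Number of standard deviations by which value exceeds mean."""
    return (value - mean) / sd


z = {name: z_score(*row) for name, row in criteria.items()}
overall = sum(z.values()) + correction

for name, score in z.items():
    print(f"{name:28s} Z = {score:5.2f}")
print(f"{'Overall Z-score value':28s}     {overall:5.2f}")
```

The same arithmetic applied to the beta-catenin and granulocyte-macrophage colony stimulating factor summaries further down this page gives about 60.3 and 20.2, in line with the scores listed there.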

The end of the solve.status file:
 ***************************************************************************
                    SOLVE STATUS      29-dec-98 13:46:31

 TIME ELAPSED:     6 MIN

 ---------------------------------------------------------------------------
 CURRENT STEP:SOLVE MAIN PROGRAM
 STATUS:   DONE
 ---------------------------------------------------------------------------
 ---------------------------------------------------------------------------
     ---TOP SOLUTION FOUND BY SOLVE  (<m> = 0.62; score =   8.33) ---

           X        Y        Z         OCCUP     B          HEIGHT/SIGMA

   2     0.985    0.497    0.094     0.691     50.1              4.0
   2     0.030    0.286    0.056     0.365     60.0              3.7

        TIME REQUIRED TO OBTAIN THIS SOLUTION:     6 MIN
 ---------------------------------------------------------------------------
 CURRENT RESOLUTION:   2.6 A.    FINAL RESOLUTION:   2.6 A.
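
The refined selenium coordinates supplied in the script for comparison do not look like the sites in the top solution at first glance, because the two sets are related by space group symmetry. The Python sketch below, which assumes the C2 operators x,y,z and -x,y,-z together with the C-centering translation (1/2, 1/2, 0), finds the symmetry copy of each known site that lies closest to each site found by SOLVE. The function names are illustrative. For these particular numbers no origin shift or change of hand is needed, although a general comparison would have to allow for both.

```python
# Refined coordinates from the input script and sites from the top solution
known = [(0.4813319, 0.9972169, 0.094140753), (0.9731338, 0.2875228, 0.9446641)]
found = [(0.985, 0.497, 0.094), (0.030, 0.286, 0.056)]


def c2_equivalents(site):
    """Symmetry copies of a site in C2 (b unique), reduced to the unit cell."""
    x, y, z = site
    for px, py, pz in [(x, y, z), (-x, y, -z)]:
        for tx, ty in [(0.0, 0.0), (0.5, 0.5)]:  # identity and C-centering
            yield ((px + tx) % 1.0, (py + ty) % 1.0, pz % 1.0)


def wrapped_diff(a, b):
    """Shortest distance between two fractional coordinates modulo 1."""
    d = (a - b) % 1.0
    return min(d, 1.0 - d)


def deviation(found_site, known_site):
    """Smallest maximum coordinate difference over all symmetry copies."""
    return min(
        max(wrapped_diff(f, e) for f, e in zip(found_site, equiv))
        for equiv in c2_equivalents(known_site)
    )


for i, f in enumerate(found, 1):
    best = min(range(len(known)), key=lambda k: deviation(f, known[k]))
    print(f"found site {i}: closest to known site {best + 1}, "
          f"max fractional deviation {deviation(f, known[best]):.4f}")
```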

Armadillo repeat region of beta-catenin (data courtesy of Andy Huber and Bill Weis)

Summary of this structure solution:

This is a dataset with 4 wavelengths of MAD data, 17000 reflections to 2.7 A, 537 amino acids, and 15 selenium sites. SOLVE found 11 selenium sites in 2 hours on a DEC Alpha 500 MHz workstation. The remaining 5 sites (one selenium has 2 positions, giving 16 positions in all) are very weak and were not included by SOLVE. (Note: version 1.04 actually found one more site, but the overall solution was of about the same quality.)

Solve.setup file listing basic information about the beta-catenin crystals:

resolution 2.7 20
symfile /usr/local/lib/solve/c2221.sym
cell 64.1 102.0 187.0 90 90 90

Input script file used to run SOLVE on beta-catenin

solve <<***
!command file to read in raw MAD data, scale, analyze and solve it----
title armadillo repeat of beta catenin 4-wavelength MAD data
@solve.setup                   ! get our standard information read in 
logfile mad.logfile            ! write out most information to this file.
                               ! summary info will be written to "solve.prt"
readfor
unmerg
mad_atom se 
refscattfactors               ! refine scattering factors (use fixscattfactors
                              ! instead to hold them fixed)

lambda 1                      ! info on wavelength #1 follows 
label Wavelength #  1         ! a label for this wavelength
rawmadfile l1.int
wavelength 0.9000             ! wavelength value
fprimv_mad  -1.6              ! f' value at this wavelength
fprprv_mad  3.4               ! f" value at this wavelength

lambda 2
rawmadfile l2.int
wavelength 0.9794
fprimv_mad  -11.44
fprprv_mad  8.74

lambda 3
rawmadfile l3.int
wavelength 0.9797
fprimv_mad  -12.83
fprprv_mad  2.56

lambda 4
rawmadfile l4.int
wavelength 0.9897
fprimv_mad -2.42
fprprv_mad 1.13

nres 700                  [approx # of residues in protein molecule]
nanomalous 15              [approx # of anomalously scattering atoms per protein]
acceptance 0.10
SCALE_MAD                 ! read in and localscale the data
ANALYZE_MAD               ! run MADMRG and MADBST and analyze all the Pattersons
SOLVE                     ! Solve the structure
***

Summary information from the "solve.prt" output file produced after completion of the automated structure determination of beta-catenin

Selenium atom occupancy, coordinates, and thermal factors, and
Cross-validation fouriers calculated with all heavy atoms in
all derivs except the site being evaluated and any sites equivalent to it.

(Peak height is height of peak at this position/rms of map)

  Site    x       y       z      occ       B      -- PEAK  HEIGHT --

    1   0.056   0.164   0.076   0.959  48.509             22.13
    2   0.169   0.385   0.231   0.988  43.723             27.41
    3   0.325   0.144   0.162   0.908  60.000             18.21
    4   0.577   0.376   0.102   0.679  42.390             23.23
    5   0.958   0.259   0.027   0.750  45.770             14.80
    6   0.594   0.417   0.119   0.726  31.859             17.94
    7   0.133   0.005   0.063   0.723  15.264             21.13
    8   0.031   0.272   0.025   0.981  60.000             15.82
    9   0.616   0.217   0.127   0.505  15.000             12.76
   10   0.713   0.177   0.123   0.348  32.061             10.00
   11   0.640   0.272   0.091   0.427  15.000             10.37

Figure of merit versus resolution

 DMIN:           TOTAL    9.09   5.96   4.72   4.03   3.57   3.24   2.99   2.79
 N:               17155    946   1466   1815   2122   2386   2623   2798   2999
 MEAN FIG MERIT:   0.77   0.87   0.89   0.86   0.81   0.79   0.75   0.71   0.66
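
The TOTAL column of this table can be checked against the resolution shells. In the Python sketch below the shell counts add up to the total number of reflections, and the mean figure of merit weighted by the number of reflections in each shell reproduces the total value. Treating TOTAL as a reflection-weighted mean is an assumption that happens to fit the numbers; the variable names are illustrative.

```python
# Reflections and mean figure of merit per resolution shell, from the table above
n_refl = [946, 1466, 1815, 2122, 2386, 2623, 2798, 2999]
fom = [0.87, 0.89, 0.86, 0.81, 0.79, 0.75, 0.71, 0.66]

total_refl = sum(n_refl)
# Assumed: TOTAL is the mean over all reflections, i.e. shells weighted by count.
weighted_fom = sum(n * m for n, m in zip(n_refl, fom)) / total_refl

print(f"total reflections: {total_refl}")
print(f"reflection-weighted mean figure of merit: {weighted_fom:.2f}")
```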

   
List of sites analyzed for compatibility with difference Patterson

(Height is 1000 x height of peak in Patterson/rms of map.  Predicted height 
is expected height based on occupancy of sites)

 Cross-vectors for sites  1 and  1 (excluding origin):
  #      U        V      W      HEIGHT   PRED HEIGHT      SYMM#
   1   -0.111  -0.326   0.500   4571.23     6301.67          2
   2   -0.111   0.000   0.347   5552.65     6301.67          2
   3    0.000  -0.326  -0.153   4060.29     6301.67          2

 Cross-vectors for sites  2 and  1 (excluding origin):
  #      U        V      W      HEIGHT   PRED HEIGHT      SYMM#
   1    0.111   0.222   0.155   6118.04     3635.08          1
   2   -0.222  -0.549   0.655   5193.63     3635.08          1
   3   -0.222   0.222   0.192   3745.71     3635.08          1
   4    0.111  -0.549  -0.308   4992.29     3635.08          1

 Cross-vectors for sites  2 and  2 (excluding origin):
  #      U        V      W      HEIGHT   PRED HEIGHT      SYMM#
   1   -0.333  -0.771   0.500   6818.22     8387.49          2
   2   -0.333   0.000   0.037   6247.64     8387.49          2
   3    0.000  -0.771  -0.463   5827.51     8387.49          2

    (.... etc, for a total of some 200 Patterson vectors)
The scoring summary for this solution:
 Summary of scoring for this solution:
                           -- over many solutions--    -- this solution --
 Criteria                       MEAN          SD         VALUE        Z-SCORE
 Pattersons:                   4.82        0.668         15.3         15.7
 Cross-validation Fourier:     19.4         3.87         146.         32.7
 NatFourier CCx100:            11.1         5.43         53.5         7.81
 Mean figure of meritx100:    0.000E+00     5.00         70.7         14.1
 Correction for Z-scores:                                            -10.0

 Overall Z-score value:                                               60.3

Note that the score for this solution with 11 selenium sites (60.3) is much higher than for the gene 5 protein case with 2 sites (8.3). In general, the more sites there are in the structure, the higher the final score will be. This means that a correct solution might have a low score (if it has few sites) or a high score (if it has many sites).

End of the solve.status file:
 ***************************************************************************
                    SOLVE STATUS      29-dec-98 22:41:20

 DATASET TITLE: armadillo repeat of beta catenin 4-wavelength MAD data
 TIME ELAPSED:     2 HR

 ---------------------------------------------------------------------------
 CURRENT STEP:SOLVE MAIN PROGRAM
 STATUS:   DONE
 ---------------------------------------------------------------------------
 ---------------------------------------------------------------------------
     ---TOP SOLUTION FOUND BY SOLVE  (<m> = 0.71; score =  60.26) ---

           X        Y        Z         OCCUP     B          HEIGHT/SIGMA

   2     0.056    0.164    0.076     0.959     48.5             22.1
   2     0.169    0.385    0.231     0.988     43.7             27.4
   2     0.325    0.144    0.162     0.908     60.0             18.2
   2     0.577    0.376    0.102     0.679     42.4             23.2
   2     0.958    0.259    0.027     0.750     45.8             14.8
   2     0.594    0.417    0.119     0.726     31.9             17.9
   2     0.133    0.005    0.063     0.723     15.3             21.1
   2     0.031    0.272    0.025     0.981     60.0             15.8
   2     0.616    0.217    0.127     0.505     15.0             12.8
   2     0.713    0.177    0.123     0.348     32.1             10.0
   2     0.640    0.272    0.091     0.427     15.0             10.4

        TIME REQUIRED TO OBTAIN THIS SOLUTION:     2 HR
 ---------------------------------------------------------------------------
 CURRENT RESOLUTION:   2.7 A.    FINAL RESOLUTION:   2.7 A.

Granulocyte-macrophage colony stimulating factor

Summary of this structure solution:

This is an MIR dataset with 4800 reflections to 3.5 A, 4 derivatives, and 254 amino acids. The data is courtesy of Kay Diederichs. The derivatives are not very good and the overall figure of merit of the structure is only 0.51 to 3.5 A. Using all the data and including anomalous differences, SOLVE took 1 hour to solve this MIR problem on a DEC Alpha 500 MHz workstation.

 

Solve.setup file listing basic information about the crystals:

cell 47.6 59.1 126.7 90 90 90
symfile p212121.sym
resolution 20 3.5

Input script file used to run SOLVE on Granulocyte-macrophage colony stimulating factor

solve <<***
! solve.com for gmf 7-25-97
! include known h.a. sites for comparison and fft map as well

title gm native + 4 derivatives
@solve.setup
logfile mir.logfile ! write out most information to this file.
comparisonfile gmf.fft    ! fft file for comparison
checksolve
readformatted
premerged
rawnativefile gmnat.fmt
noanorefine
derivative 1
!inano
label deriv 1  gm18 pcmbs
rawderivfile gm18.fmt
 !  ----------- ATOM Hg   ----------
 ATOMNAME Hg
 XYZ   0.9035385      0.8977038      0.8083244
 !  ----------- ATOM Hg   ----------
 ATOMNAME Hg
 XYZ   0.4264956      7.3942125E-02  0.8033463
 !  ----------- ATOM Hg   ----------
 ATOMNAME Hg
 XYZ   0.8826839      2.6450584E-02  5.8744203E-02
derivative 2
!inano
label deriv 2 gmPt(EtNH2)2Cl2 derivative #40
rawderivfile gm40.fmt
 !  ----------- ATOM Pt   ----------
 ATOMNAME Pt
 XYZ   0.6877714      5.1501989E-03  0.1508604
 !  ----------- ATOM Pt   ----------
 ATOMNAME Pt
 XYZ   0.3656436      0.7006647      0.1530347
 !  ----------- ATOM Pt   ----------
 ATOMNAME Pt
 XYZ   0.4678949      0.7330396      9.7665032E-03
derivative 3
!inano
label mersalyl acid # 52
rawderivfile gm52.fmt
 !  ----------- ATOM Hg   ----------
 ATOMNAME Hg
 XYZ  -0.5778725     -0.9445465     -0.1966822
 !  ----------- ATOM Hg   ----------
 ATOMNAME Hg
 XYZ  -9.4626509E-02 -9.6770093E-02 -0.1956767

derivative 4
!inano
label HgI2 #57
rawderivfile gm57.fmt
 !  ----------- ATOM Hg   ----------
 ATOMNAME Hg
 XYZ   0.3547886      0.5863904      0.1859446
 !  ----------- ATOM Hg   ----------
 ATOMNAME Hg
 XYZ   0.9711263      0.4772473      0.2101191

acceptance 0.35       ! accept new sites with ~35% of height of avg
scale_native
scale_mir
analyze_mir
solve
***

Summary information from the "solve.prt" output file produced after completion of the automated structure determination

Heavy atom occupancy, coordinates, and thermal factors, and Cross-validation fouriers calculated with all heavy atoms in all derivs except the site being evaluated and any sites equivalent to it.
(Peak height is height of peak at this position/rms of map)

  Site    x       y       z       occ       B     -- PEAK  HEIGHT --

Deriv 1:
    1   0.405   0.599   0.191   0.149  60.000             17.80
    2   0.924   0.433   0.199   0.103  60.000             12.36
    3   0.860   0.100   0.153   0.004  15.000              4.79

Deriv 2:
    1   0.846   0.565   0.247   0.242  33.987              6.61
    2   0.064   0.979   0.212   0.244  60.000              4.82
    3   0.418   0.711   0.154   0.168  52.783              4.95

Deriv 3:
    1   0.908   0.443   0.197   0.231  60.000             22.66
    2   0.424   0.600   0.195   0.186  60.000             19.56

Deriv 4:
    1   0.340   0.588   0.187   0.105  60.000              7.99
    2   0.973   0.478   0.211   0.141  60.000             10.12
    3   0.388   0.651   0.249   0.015  15.000              5.24

Figure of merit versus resolution
 DMIN:           TOTAL   11.17   7.54   6.04   5.18   4.61   4.19   3.87   3.61
 N:                4801    297    418    525    589    663    713    778    818
 MEAN FIG MERIT:   0.51   0.64   0.66   0.64   0.55   0.47   0.44   0.44   0.42
List of sites analyzed for compatibility with difference Patterson

(Height is 1000 x height of peak in Patterson/rms of map. Predicted height is expected height based on occupancy of sites)

Derivative 1:

 Cross-vectors for sites  1 and  1 (excluding origin):
  #      U        V      W      HEIGHT   PRED HEIGHT      SYMM#
   1   -0.313  -1.194   0.500   4539.24     4107.37          2
   2   -0.813   0.500   0.118   2537.75     4107.37          2
   3    0.500  -0.694  -0.382   4848.71     4107.37          2

 Cross-vectors for sites  2 and  1 (excluding origin):
  #      U        V      W      HEIGHT   PRED HEIGHT      SYMM#
   1    0.521  -0.167   0.007   4650.46     3557.64          1
   2   -0.833  -1.028   0.507   4684.56     3557.64          1
   3   -1.333   0.333   0.111   3131.17     3557.64          1
   4    1.021  -0.528  -0.389   1961.24     3557.64          1

 Cross-vectors for sites  2 and  2 (excluding origin):
  #      U        V      W      HEIGHT   PRED HEIGHT      SYMM#

 Cross-vectors for sites  3 and  1 (excluding origin):
  #      U        V      W      HEIGHT   PRED HEIGHT      SYMM#
   1    0.458  -0.500  -0.038   4330.50     1516.82          2
   2   -0.771  -0.694   0.462   1682.41     758.408          1
   3   -1.271   0.000   0.156   776.870     1516.82          2
   4    0.958  -0.194  -0.344   294.740     758.408          1

 Cross-vectors for sites  3 and  2 (excluding origin):
  #      U        V      W      HEIGHT   PRED HEIGHT      SYMM#
   1   -0.063  -0.333  -0.045   1541.17     1313.81          1
   2   -1.292  -0.528   0.455   1081.26     1313.81          1
   3   -1.792   0.167   0.149   607.953     1313.81          1
   4    0.438  -0.028  -0.351   1389.95     1313.81          1

 Cross-vectors for sites  3 and  3 (excluding origin):
  #      U        V      W      HEIGHT   PRED HEIGHT      SYMM#
   1   -1.729   0.500   0.194  -1289.57     560.147          2
   2    0.500   0.306  -0.306   43.4893     560.147          2
 Total of            4 of           22 patterson peaks used more than once.

 

(... etc for derivatives 2, 3, and 4.) Note that no cross-vectors are listed for site 2 vs site 2. This is because site 2 has the same Harker vectors as site 1, and vectors are listed only if they are unique.

Summary of scoring for this solution:
 Summary of scoring for this solution:
                           -- over many solutions--    -- this solution --
 Criteria                       MEAN          SD         VALUE        Z-SCORE
 Pattersons:                   1.57        0.500         4.08         5.03
 Cross-validation Fourier:     6.28         3.73         40.0         9.04
 NatFourier CCx100:            11.7         6.50         26.8         2.33
 Mean figure of meritx100:    0.000E+00     7.62         50.8         6.66
 Correction for Z-scores:                                            -2.82

 Overall Z-score value:                                               20.2

Tail end of the solve.status file:
 ***************************************************************************
                    SOLVE STATUS      29-dec-98 13:40:14

 DATASET TITLE: gm native + 4 derivatives
 TIME ELAPSED:     1 HR

 ---------------------------------------------------------------------------
 CURRENT STEP:SOLVE MAIN PROGRAM
 STATUS:   DONE
 ---------------------------------------------------------------------------
 ---------------------------------------------------------------------------
     ---TOP SOLUTION FOUND BY SOLVE  (<m> = 0.51; score =  20.24) ---

 Deriv     X        Y        Z         OCCUP     B          HEIGHT/SIGMA


   1     0.405    0.599    0.191     0.149     60.0             17.8
   1     0.924    0.433    0.199     0.103     60.0             12.4
   1     0.860    0.100    0.153     0.004     15.0              4.8

   2     0.846    0.565    0.247     0.242     34.0              6.6
   2     0.064    0.979    0.212     0.244     60.0              4.8
   2     0.418    0.711    0.154     0.168     52.8              5.0

   3     0.908    0.443    0.197     0.231     60.0             22.7
   3     0.424    0.600    0.195     0.186     60.0             19.6

   4     0.340    0.588    0.187     0.105     60.0              8.0
   4     0.973    0.478    0.211     0.141     60.0             10.1
   4     0.388    0.651    0.249     0.015     15.0              5.2

        TIME REQUIRED TO OBTAIN THIS SOLUTION:     1 HR
 ---------------------------------------------------------------------------
 CURRENT RESOLUTION:   3.5 A.    FINAL RESOLUTION:   3.5 A.