SOLVE Examples
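This page gives three examples of what SOLVE can do. Each is an automatic structure determination carried out starting from raw data and a minimal amount of information from the user. In each case the top solution was correct and had the correct handedness (anomalous differences were used in each structure determination). The examples are gene 5 protein (3-wavelength MAD data), the armadillo repeat region of beta-catenin (4-wavelength MAD data), and granulocyte-macrophage colony stimulating factor (MIR data with 4 derivatives).
For each structure determination, this page shows a summary of the structure solution, the solve.setup file with basic information about the crystals, the input script file used to run SOLVE, summary information from the solve.prt output file, and the end of the solve.status file.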
Gene 5 protein
Summary of this structure solution:
This is a dataset with 3 wavelengths of MAD data, 2800 reflections to 2.6 A, 87 amino acids, and 2 selenium sites (Met-1, Met-77). SOLVE found both selenium sites in 6 minutes on a DEC Alpha 500 MHz workstation. The Met-1 site has a very high thermal factor.
Solve.setup file listing basic information about the crystals:
CELL 76.08 27.97 42.36 90 103.2 90       ! cell params
symfile /usr/local/lib/solve/c2.sym      ! space group symmetry
resolution 2.6 20.0                      ! resolution limits
Input script file used to run SOLVE on gene 5 protein
# command file to run solve on gvp data
solve <<***
!command file to read in raw MAD data, scale, analyze and solve it----
checksolve                   ! compare solution with known h.a. sites
comparisonfile gvp.fft       ! FFT map using FCALC from refined model
@solve.setup                 ! get our standard information read in
logfile mad.logfile          ! write out most information to this file.
                             ! summary info will be written to solve.prt
nobayes
readformatted                ! alternatives are readdenzo, readtrek
premerged                    ! alternative is unmerged
read_intensities             ! alternative is read_amplitudes
refscattfactors              ! alternative is fixscattfactors
mad_atomname se              ! anomalously scattering atom is Se
lambda 1                     ! info on wavelength #1 follows
label Wavelength # 1         ! a label for this wavelength
rawmadfile test_wva.fmt      ! datafile with h k l Intensity sigma or
                             ! h k l I+ sigma+ I- sigma-
wavelength 0.9000            ! wavelength value
fprimv_mad -1.6              ! f' value at this wavelength
fprprv_mad 3.4               ! f doubleprime value at this wavelength
! input refined h.a. coordinates (used only for comparison in "checksolve")
atomname se
XYZ 0.4813319 0.9972169 9.4140753E-02
atomname se
XYZ 0.9731338 0.2875228 0.9446641
lambda 2
rawmadfile test_wvb.fmt
wavelength 0.9794
fprimv_mad -8.5
fprprv_mad 4.8
lambda 3
rawmadfile test_wvc.fmt
wavelength 0.9797
fprimv_mad -9.85
fprprv_mad 2.86
premerged
readformatted
nres 100                     [approx # of residues in protein molecule]
nanomalous 2                 [approx # of anomalously scattering atoms per protein]
SCALE_MAD                    ! read in and localscale the data
ANALYZE_MAD                  ! run MADMRG and MADBST and analyze all the Pattersons
SOLVE                        ! Solve the structure
***
Summary information from the "solve.prt" output file produced after completion of the automated structure determination
Selenium atom occupancy, coordinates, and thermal factors, and cross-validation Fouriers calculated with all heavy atoms in all derivs except the site being evaluated and any sites equivalent to it.
(Peak height is height of peak at this position/rms of map)

Site     x      y      z      occ      B      -- PEAK HEIGHT --
  1    0.985  0.497  0.094   0.691   50.105        4.01
  2    0.030  0.286  0.056   0.365   60.000        3.75

Figure of merit versus resolution

DMIN:            TOTAL   8.81   5.75   4.55   3.88   3.44   3.12   2.88   2.68
N:                2544    146    224    297    337    380    401    436    323
MEAN FIG MERIT:   0.62   0.74   0.74   0.78   0.69   0.64   0.53   0.51   0.47

List of sites analyzed for compatibility with difference Patterson
(Height is 1000 x height of peak in Patterson/rms of map. Predicted height is expected height based on occupancy of sites)

PEAK     X      Y      Z     OPTIMIZED RELATIVE OCCUPANCY
  1    0.984  0.500  0.090        88.492
  2    0.031  0.292  0.056        32.693

Evaluation of this test soln with 2 sites after optimizing occupancy of each site

Cross-vectors for sites 1 and 1 (excluding origin):
  #      U       V       W      HEIGHT    PRED HEIGHT   SYMM#
  1   -1.969   0.000  -0.181   15395.6     15661.5        2

Cross-vectors for sites 2 and 1 (excluding origin):
  #      U       V       W      HEIGHT    PRED HEIGHT   SYMM#
  1   -0.953  -0.208  -0.035   3690.03     2893.06        1
  2   -1.016  -0.208  -0.146   4402.75     2893.06        1

Cross-vectors for sites 2 and 2 (excluding origin):
  #      U       V       W      HEIGHT    PRED HEIGHT   SYMM#
  1   -0.063   0.000  -0.111   476.426     2137.67        2

Overall quality of this Patterson soln = 6963.16 (weighted sum of peak heights)
Overall quality of the fit to patterson = 1.07189 (agreement of PRED and HEIGHT)

Summary of scoring for this solution:

                             -- over many solutions--   -- this solution --
Criteria                        MEAN         SD           VALUE     Z-SCORE
Pattersons:                     2.03         1.80          5.19       1.76
Cross-validation Fourier:       4.43         1.02          6.47       1.98
NatFourier CCx100:              26.0         4.42          30.5       1.01
Mean figure of meritx100:       0.000E+00    10.9          61.5       5.62
Correction for Z-scores:                                             -2.05
Overall Z-score value:                                                8.33

Note that the Patterson and cross-validation Fourier scores are about 2 sigma above the mean of the starting solutions, but the native Fourier score is only about 1 sigma above. This is because the asymmetric unit is small and the map is fairly noisy.

The end of the solve.status file:
***************************************************************************
SOLVE STATUS 29-dec-98 13:46:31
TIME ELAPSED: 6 MIN
---------------------------------------------------------------------------
CURRENT STEP:SOLVE MAIN PROGRAM STATUS: DONE
---------------------------------------------------------------------------
---------------------------------------------------------------------------
---TOP SOLUTION FOUND BY SOLVE (= 0.62; score = 8.33) ---
          X      Y      Z     OCCUP     B     HEIGHT/SIGMA
  2     0.985  0.497  0.094   0.691    50.1       4.0
  2     0.030  0.286  0.056   0.365    60.0       3.7
TIME REQUIRED TO OBTAIN THIS SOLUTION: 6 MIN
---------------------------------------------------------------------------
CURRENT RESOLUTION: 2.6 A. FINAL RESOLUTION: 2.6 A.
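The cross-vectors in the Patterson listing for this example can be reproduced by hand from the two site positions. The sketch below is only an illustration of that arithmetic, not SOLVE's own code. It assumes the two point operators of space group C2 with unique axis b (x, y, z and -x, y, -z) and takes each vector as op(site a) minus site b. The names SITES, C2_OPS, cross_vector and predicted_vectors are hypothetical.

```python
from itertools import product

# Fractional coordinates of the two Se sites, copied from the
# "List of sites analyzed" table in the gene 5 protein output.
SITES = {1: (0.984, 0.500, 0.090), 2: (0.031, 0.292, 0.056)}

# Assumption: point operators of C2 (unique axis b). The C-centring
# translation (1/2, 1/2, 0) is left out because the Patterson has the
# same centring, so it only produces equivalent vectors.
C2_OPS = {
    1: lambda p: (p[0], p[1], p[2]),     # x, y, z
    2: lambda p: (-p[0], p[1], -p[2]),   # -x, y, -z
}


def cross_vector(site_a, site_b, op):
    """Return op(site_a) - site_b as a (u, v, w) tuple, rounded to 3 places."""
    moved = op(site_a)
    return tuple(round(m - b, 3) for m, b in zip(moved, site_b))


def predicted_vectors(sites, ops):
    """Yield (a, b, symm, uvw) for each site pair a >= b, skipping the origin."""
    for a, b in product(sites, repeat=2):
        if a < b:
            continue
        for symm, op in ops.items():
            if a == b and symm == 1:
                continue  # identity applied to the same site is the origin peak
            yield a, b, symm, cross_vector(sites[a], sites[b], op)


if __name__ == "__main__":
    for a, b, symm, uvw in predicted_vectors(SITES, C2_OPS):
        print(f"sites {a} and {b}, symm {symm}: {uvw}")
```

Because the site coordinates above are already rounded to three decimals, the computed vectors agree with the U, V, W columns in the listing only to within about 0.001.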
Armadillo repeat region of beta-catenin (data courtesy of Andy Huber and Bill Weis)
Summary of this structure solution:
This is a dataset with 4 wavelengths of MAD data, 17000 reflections to 2.7 A, 537 amino acids, and 15 selenium sites. SOLVE found 11 selenium sites in 2 hours on a DEC Alpha 500 MHz workstation. The remaining 5 sites (one selenium has 2 positions) are very weak and were not included by SOLVE. (Note: version 1.04 actually found one more site, but the overall solution was of about the same quality.)
Solve.setup file listing basic information about the beta-catenin crystals:
resolution 2.7 20
symfile /usr/local/lib/solve/c2221.sym
cell 64.1 102.0 187.0 90 90 90
Input script file used to run SOLVE on beta-catenin
solve <<***
!command file to read in raw MAD data, scale, analyze and solve it----
title armadillo repeat of beta catenin 4-wavelength MAD data
@solve.setup                 ! get our standard information read in
logfile mad.logfile          ! write out most information to this file.
                             ! summary info will be written to "solve.prt"
readfor
unmerg
mad_atom se
refscattfactors              ! do not refine scattering factors (you can if
                             ! you want though)
lambda 1                     ! info on wavelength #1 follows
label Wavelength # 1         ! a label for this wavelength
rawmadfile l1.int
wavelength 0.9000            ! wavelength value
fprimv_mad -1.6              ! f' value at this wavelength
fprprv_mad 3.4               ! f" value at this wavelength
lambda 2
rawmadfile l2.int
wavelength 0.9794
fprimv_mad -11.44
fprprv_mad 8.74
lambda 3
rawmadfile l3.int
wavelength 0.9797
fprimv_mad -12.83
fprprv_mad 2.56
lambda 4
rawmadfile l4.int
wavelength 0.9897
fprimv_mad -2.42
fprprv_mad 1.13
nres 700                     [approx # of residues in protein molecule]
nanomalous 15                [approx # of anomalously scattering atoms per protein]
acceptance 0.10
SCALE_MAD                    ! read in and localscale the data
ANALYZE_MAD                  ! run MADMRG and MADBST and analyze all the Pattersons
SOLVE                        ! Solve the structure
***
Summary information from the "solve.prt" output file produced after completion of the automated structure determination of beta-catenin
Selenium atom occupancy, coordinates, and thermal factors, and cross-validation Fouriers calculated with all heavy atoms in all derivs except the site being evaluated and any sites equivalent to it.
(Peak height is height of peak at this position/rms of map)

Site     x      y      z      occ      B      -- PEAK HEIGHT --
  1    0.056  0.164  0.076   0.959   48.509       22.13
  2    0.169  0.385  0.231   0.988   43.723       27.41
  3    0.325  0.144  0.162   0.908   60.000       18.21
  4    0.577  0.376  0.102   0.679   42.390       23.23
  5    0.958  0.259  0.027   0.750   45.770       14.80
  6    0.594  0.417  0.119   0.726   31.859       17.94
  7    0.133  0.005  0.063   0.723   15.264       21.13
  8    0.031  0.272  0.025   0.981   60.000       15.82
  9    0.616  0.217  0.127   0.505   15.000       12.76
 10    0.713  0.177  0.123   0.348   32.061       10.00
 11    0.640  0.272  0.091   0.427   15.000       10.37

Figure of merit versus resolution

DMIN:            TOTAL   9.09   5.96   4.72   4.03   3.57   3.24   2.99   2.79
N:               17155    946   1466   1815   2122   2386   2623   2798   2999
MEAN FIG MERIT:   0.77   0.87   0.89   0.86   0.81   0.79   0.75   0.71   0.66

List of sites analyzed for compatibility with difference Patterson
(Height is 1000 x height of peak in Patterson/rms of map. Predicted height is expected height based on occupancy of sites)

Cross-vectors for sites 1 and 1 (excluding origin):
  #      U       V       W      HEIGHT    PRED HEIGHT   SYMM#
  1   -0.111  -0.326   0.500   4571.23     6301.67        2
  2   -0.111   0.000   0.347   5552.65     6301.67        2
  3    0.000  -0.326  -0.153   4060.29     6301.67        2

Cross-vectors for sites 2 and 1 (excluding origin):
  #      U       V       W      HEIGHT    PRED HEIGHT   SYMM#
  1    0.111   0.222   0.155   6118.04     3635.08        1
  2   -0.222  -0.549   0.655   5193.63     3635.08        1
  3   -0.222   0.222   0.192   3745.71     3635.08        1
  4    0.111  -0.549  -0.308   4992.29     3635.08        1

Cross-vectors for sites 2 and 2 (excluding origin):
  #      U       V       W      HEIGHT    PRED HEIGHT   SYMM#
  1   -0.333  -0.771   0.500   6818.22     8387.49        2
  2   -0.333   0.000   0.037   6247.64     8387.49        2
  3    0.000  -0.771  -0.463   5827.51     8387.49        2

(.... etc, for a total of some 200 Patterson vectors)

Summary of scoring for this solution:

                             -- over many solutions--   -- this solution --
Criteria                        MEAN         SD           VALUE     Z-SCORE
Pattersons:                     4.82         0.668         15.3       15.7
Cross-validation Fourier:       19.4         3.87          146.       32.7
NatFourier CCx100:              11.1         5.43          53.5       7.81
Mean figure of meritx100:       0.000E+00    5.00          70.7       14.1
Correction for Z-scores:                                             -10.0
Overall Z-score value:                                                60.3
Note that the score for this solution with 11 selenium sites (60.3) is much higher than for the gene 5 protein case with 2 sites (8.3). In general, the more sites there are in the structure, the higher the final score will be. This means that a correct solution might have a low score (if it has few sites) or a high score (if it has many sites).
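The printed scoring tables are numerically consistent with a simple reading: each Z-score is (VALUE - MEAN) / SD, and the overall value is the sum of the four Z-scores plus the printed correction. The sketch below checks that reading against the gene 5 protein and beta-catenin tables. It is inferred from the printed numbers only and is not a statement of how SOLVE computes its score internally; the names GENE5, BETA_CATENIN, z_score and overall_score are hypothetical.

```python
# Rows copied from the two scoring summaries: (mean, sd, value).
GENE5 = {
    "Pattersons": (2.03, 1.80, 5.19),
    "Cross-validation Fourier": (4.43, 1.02, 6.47),
    "NatFourier CCx100": (26.0, 4.42, 30.5),
    "Mean figure of merit x100": (0.0, 10.9, 61.5),
}
BETA_CATENIN = {
    "Pattersons": (4.82, 0.668, 15.3),
    "Cross-validation Fourier": (19.4, 3.87, 146.0),
    "NatFourier CCx100": (11.1, 5.43, 53.5),
    "Mean figure of merit x100": (0.0, 5.00, 70.7),
}


def z_score(mean, sd, value):
    """Number of standard deviations by which value exceeds mean."""
    return (value - mean) / sd


def overall_score(criteria, correction):
    """Assumed combination: sum of per-criterion Z-scores plus the correction."""
    return sum(z_score(*row) for row in criteria.values()) + correction


if __name__ == "__main__":
    # Corrections are the "Correction for Z-scores" lines, taken as printed.
    print(f"gene 5 protein: {overall_score(GENE5, -2.05):.2f}")
    print(f"beta-catenin:   {overall_score(BETA_CATENIN, -10.0):.2f}")
```

The results match the printed overall values (8.33 and 60.3) to within the rounding of the table entries. Most of the difference between the two examples comes from the cross-validation Fourier and Patterson terms, which grow as more sites contribute.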
End of the solve.status file:
***************************************************************************
SOLVE STATUS 29-dec-98 22:41:20
DATASET TITLE: armadillo repeat of beta catenin 4-wavelength MAD data
TIME ELAPSED: 2 HR
---------------------------------------------------------------------------
CURRENT STEP:SOLVE MAIN PROGRAM STATUS: DONE
---------------------------------------------------------------------------
---------------------------------------------------------------------------
---TOP SOLUTION FOUND BY SOLVE (= 0.71; score = 60.26) ---
          X      Y      Z     OCCUP     B     HEIGHT/SIGMA
  2     0.056  0.164  0.076   0.959    48.5      22.1
  2     0.169  0.385  0.231   0.988    43.7      27.4
  2     0.325  0.144  0.162   0.908    60.0      18.2
  2     0.577  0.376  0.102   0.679    42.4      23.2
  2     0.958  0.259  0.027   0.750    45.8      14.8
  2     0.594  0.417  0.119   0.726    31.9      17.9
  2     0.133  0.005  0.063   0.723    15.3      21.1
  2     0.031  0.272  0.025   0.981    60.0      15.8
  2     0.616  0.217  0.127   0.505    15.0      12.8
  2     0.713  0.177  0.123   0.348    32.1      10.0
  2     0.640  0.272  0.091   0.427    15.0      10.4
TIME REQUIRED TO OBTAIN THIS SOLUTION: 2 HR
---------------------------------------------------------------------------
CURRENT RESOLUTION: 2.7 A. FINAL RESOLUTION: 2.7 A.
Granulocyte-macrophage colony stimulating factor
Summary of this structure solution:
This is an MIR dataset with 4800 reflections to 3.5 A, 4 derivatives, and 254 amino acids. The data is courtesy of Kay Diederichs. The derivatives are not very good and the overall figure of merit of the structure is only 0.51 to 3.5 A. Using all the data and including anomalous differences, SOLVE took 1 hour to solve this MIR problem on a DEC Alpha 500 MHz workstation.
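As a rough check on the overall figure of merit quoted above, the TOTAL value in the figure-of-merit table of the solve.prt summary for this example is consistent with a mean of the per-shell values weighted by the number of reflections in each shell. The sketch below shows that arithmetic using the shell counts and shell figures of merit from that table. It is an illustration only, and the names SHELL_COUNTS, SHELL_FOM and weighted_mean_fom are hypothetical.

```python
# Per-shell reflection counts (N) and mean figures of merit, copied from the
# "Figure of merit versus resolution" table for this MIR example.
SHELL_COUNTS = [297, 418, 525, 589, 663, 713, 778, 818]
SHELL_FOM = [0.64, 0.66, 0.64, 0.55, 0.47, 0.44, 0.44, 0.42]


def weighted_mean_fom(counts, foms):
    """Mean figure of merit over all shells, weighted by reflection count."""
    if len(counts) != len(foms):
        raise ValueError("counts and foms must have the same length")
    total = sum(counts)
    return sum(n * m for n, m in zip(counts, foms)) / total


if __name__ == "__main__":
    print(f"reflections: {sum(SHELL_COUNTS)}")
    print(f"overall figure of merit: {weighted_mean_fom(SHELL_COUNTS, SHELL_FOM):.2f}")
```

The same calculation applied to the shell tables of the two MAD examples also reproduces their printed TOTAL values (0.62 and 0.77) after rounding.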
Solve.setup file listing basic information about the crystals:
cell 47.6 59.1 126.7 90 90 90
symfile p212121.sym
resolution 20 3.5
Input script file used to run SOLVE on Granulocyte-macrophage colony stimulating factor
solve <<***
! solve.com for gmf 7-25-97
! include known h.a. sites for comparison and fft map as well
title gm native + 4 derivatives
@solve.setup
logfile mir.logfile          ! write out most information to this file.
comparisonfile gmf.fft       ! fft file for comparison
checksolve
readformatted
premerged
rawnativefile gmnat.fmt
noanorefine
derivative 1
!inano
label deriv 1 gm18 pcmbs
rawderivfile gm18.fmt
! ----------- ATOM Hg ----------
ATOMNAME Hg
XYZ 0.9035385 0.8977038 0.8083244
! ----------- ATOM Hg ----------
ATOMNAME Hg
XYZ 0.4264956 7.3942125E-02 0.8033463
! ----------- ATOM Hg ----------
ATOMNAME Hg
XYZ 0.8826839 2.6450584E-02 5.8744203E-02
derivative 2
!inano
label deriv 2 gmPt(EtNH2)2Cl2 derivative #40
rawderivfile gm40.fmt
! ----------- ATOM Pt ----------
ATOMNAME Pt
XYZ 0.6877714 5.1501989E-03 0.1508604
! ----------- ATOM Pt ----------
ATOMNAME Pt
XYZ 0.3656436 0.7006647 0.1530347
! ----------- ATOM Pt ----------
ATOMNAME Pt
XYZ 0.4678949 0.7330396 9.7665032E-03
derivative 3
!inano
label mersalyl acid # 52
rawderivfile gm52.fmt
! ----------- ATOM Hg ----------
ATOMNAME Hg
XYZ -0.5778725 -0.9445465 -0.1966822
! ----------- ATOM Hg ----------
ATOMNAME Hg
XYZ -9.4626509E-02 -9.6770093E-02 -0.1956767
derivative 4
!inano
label HgI2 #57
rawderivfile gm57.fmt
! ----------- ATOM Hg ----------
ATOMNAME Hg
XYZ 0.3547886 0.5863904 0.1859446
! ----------- ATOM Hg ----------
ATOMNAME Hg
XYZ 0.9711263 0.4772473 0.2101191
acceptance 0.35              ! accept new sites with ~35% of height of avg
scale_native
scale_mir
analyze_mir
solve
***
Summary information from the "solve.prt" output file produced after completion of the automated structure determination
Heavy atom occupancy, coordinates, and thermal factors, and cross-validation Fouriers calculated with all heavy atoms in all derivs except the site being evaluated and any sites equivalent to it.
(Peak height is height of peak at this position/rms of map)

Site     x      y      z      occ      B      -- PEAK HEIGHT --
Deriv 1:
  1    0.405  0.599  0.191   0.149   60.000       17.80
  2    0.924  0.433  0.199   0.103   60.000       12.36
  3    0.860  0.100  0.153   0.004   15.000        4.79
Deriv 2:
  1    0.846  0.565  0.247   0.242   33.987        6.61
  2    0.064  0.979  0.212   0.244   60.000        4.82
  3    0.418  0.711  0.154   0.168   52.783        4.95
Deriv 3:
  1    0.908  0.443  0.197   0.231   60.000       22.66
  2    0.424  0.600  0.195   0.186   60.000       19.56
Deriv 4:
  1    0.340  0.588  0.187   0.105   60.000        7.99
  2    0.973  0.478  0.211   0.141   60.000       10.12
  3    0.388  0.651  0.249   0.015   15.000        5.24

Figure of merit versus resolution

DMIN:            TOTAL  11.17   7.54   6.04   5.18   4.61   4.19   3.87   3.61
N:                4801    297    418    525    589    663    713    778    818
MEAN FIG MERIT:   0.51   0.64   0.66   0.64   0.55   0.47   0.44   0.44   0.42

List of sites analyzed for compatibility with difference Patterson
(Height is 1000 x height of peak in Patterson/rms of map. Predicted height is expected height based on occupancy of sites)
Derivative 1:

Cross-vectors for sites 1 and 1 (excluding origin):
  #      U       V       W      HEIGHT    PRED HEIGHT   SYMM#
  1   -0.313  -1.194   0.500   4539.24     4107.37        2
  2   -0.813   0.500   0.118   2537.75     4107.37        2
  3    0.500  -0.694  -0.382   4848.71     4107.37        2

Cross-vectors for sites 2 and 1 (excluding origin):
  #      U       V       W      HEIGHT    PRED HEIGHT   SYMM#
  1    0.521  -0.167   0.007   4650.46     3557.64        1
  2   -0.833  -1.028   0.507   4684.56     3557.64        1
  3   -1.333   0.333   0.111   3131.17     3557.64        1
  4    1.021  -0.528  -0.389   1961.24     3557.64        1

Cross-vectors for sites 2 and 2 (excluding origin):
  #      U       V       W      HEIGHT    PRED HEIGHT   SYMM#

Cross-vectors for sites 3 and 1 (excluding origin):
  #      U       V       W      HEIGHT    PRED HEIGHT   SYMM#
  1    0.458  -0.500  -0.038   4330.50     1516.82        2
  2   -0.771  -0.694   0.462   1682.41     758.408        1
  3   -1.271   0.000   0.156   776.870     1516.82        2
  4    0.958  -0.194  -0.344   294.740     758.408        1

Cross-vectors for sites 3 and 2 (excluding origin):
  #      U       V       W      HEIGHT    PRED HEIGHT   SYMM#
  1   -0.063  -0.333  -0.045   1541.17     1313.81        1
  2   -1.292  -0.528   0.455   1081.26     1313.81        1
  3   -1.792   0.167   0.149   607.953     1313.81        1
  4    0.438  -0.028  -0.351   1389.95     1313.81        1

Cross-vectors for sites 3 and 3 (excluding origin):
  #      U       V       W      HEIGHT    PRED HEIGHT   SYMM#
  1   -1.729   0.500   0.194  -1289.57     560.147        2
  2    0.500   0.306  -0.306   43.4893     560.147        2

Total of 4 of 22 patterson peaks used more than once.
(... etc for derivatives 2, 3, and 4). Note that no cross-vectors are listed for site 2 vs site 2. This is because site 2 has the same Harker vectors as site 1, and vectors are listed only if they are unique.
Summary of scoring for this solution:

                             -- over many solutions--   -- this solution --
Criteria                        MEAN         SD           VALUE     Z-SCORE
Pattersons:                     1.57         0.500         4.08       5.03
Cross-validation Fourier:       6.28         3.73          40.0       9.04
NatFourier CCx100:              11.7         6.50          26.8       2.33
Mean figure of meritx100:       0.000E+00    7.62          50.8       6.66
Correction for Z-scores:                                             -2.82
Overall Z-score value:                                                20.2

Tail end of the solve.status file:
***************************************************************************
SOLVE STATUS 29-dec-98 13:40:14
DATASET TITLE: gm native + 4 derivatives
TIME ELAPSED: 1 HR
---------------------------------------------------------------------------
CURRENT STEP:SOLVE MAIN PROGRAM STATUS: DONE
---------------------------------------------------------------------------
---------------------------------------------------------------------------
---TOP SOLUTION FOUND BY SOLVE (= 0.51; score = 20.24) ---
Deriv     X      Y      Z     OCCUP     B     HEIGHT/SIGMA
  1     0.405  0.599  0.191   0.149    60.0      17.8
  1     0.924  0.433  0.199   0.103    60.0      12.4
  1     0.860  0.100  0.153   0.004    15.0       4.8
  2     0.846  0.565  0.247   0.242    34.0       6.6
  2     0.064  0.979  0.212   0.244    60.0       4.8
  2     0.418  0.711  0.154   0.168    52.8       5.0
  3     0.908  0.443  0.197   0.231    60.0      22.7
  3     0.424  0.600  0.195   0.186    60.0      19.6
  4     0.340  0.588  0.187   0.105    60.0       8.0
  4     0.973  0.478  0.211   0.141    60.0      10.1
  4     0.388  0.651  0.249   0.015    15.0       5.2
TIME REQUIRED TO OBTAIN THIS SOLUTION: 1 HR
---------------------------------------------------------------------------
CURRENT RESOLUTION: 3.5 A. FINAL RESOLUTION: 3.5 A.