Setting Things Up With arp_warp_setup.sh

This is the most important script on which all the others rely. It consists of a series of relatively simple questions about your protein and about how you would like to run the refinement. Based on your answers and package defaults a parameter file called warp.par is set up. This file is then fed into all subsequent applications.

To give an impression, here is a typical example of an arp_warp_setup.sh session.

For this example, which is with data provided by Martin Walsh, we have a set of amplitudes and phases from a MAD experiment, extending to 2.0 Å resolution, and a native dataset extending to 1.7 Å. We want to run multiple model averaging first to get a new improved map and then use this map for auto-building and refinement.

    % arp_warp_setup.sh
    ===========================================================================
    This is the setup procedure for ARP/wARP version 5.0
    Applications (see documentation for details)
    ===========================================================================
    1:Refinement of MR solutions                            (arp_molrep.sh)
    2:Improvement of MIRAS,MAD, etc phases                  (warp.sh)
    3:Averaging of multiple refinement                      (warp.sh)
    4:Automatic tracing of density map and model building   (warpNtrace.sh)
    5:Building of the solvent structure                     (arp_solvent.sh)
    6:Ab initio structure for metalloproteins               (warp_solve.sh)

    Enter the name of the mtz file:test.mtz
    --------------------
    Here are the data included in that file.

    OVERALL FILE STATISTICS for resolution range 0.000 - 0.346
    =======================

    Col Sort     Min     Max     Num        %    Mean    Mean   Resolution  Type Column
    num order                Missing complete            abs.    Low  High       label

      1 ASC        0      36       0   100.00    13.6    13.6  45.01  1.70  H    H
      2 NONE       0      38       0   100.00    14.1    14.1  45.01  1.70  H    K
      3 NONE       0      43       0   100.00    16.2    16.2  45.01  1.70  H    L
      4 NONE     6.3  1068.6    9890    41.71  162.59  162.59  45.01  1.70  F    FSE1
      5 NONE     0.8    24.3    9890    41.71    4.09    4.09  45.01  1.70  Q    SIGFSE1
      6 NONE     0.0   360.0    9796    42.26  171.93  171.93  45.01  1.70  P    PHIB_123p
      7 NONE   0.000   1.000    9796    42.26   0.560   0.560  45.01  1.70  W    FOM_123p
      8 NONE     0.0   360.0    9674    42.98  169.49  169.49  45.01  1.70  P    PHIDM_123p
      9 NONE   0.000   1.000    9674    42.98   0.736   0.736  45.01  1.70  W    FOMDM_123p
     10 NONE     1.3   930.8    1567    90.76  109.03  109.03  45.01  1.70  F    F17
     11 NONE     0.6    36.2    1567    90.76    3.15    3.15  45.01  1.70  Q    SIGF17
     12 NONE     9.9  1048.5    5126    69.79  152.04  152.04  45.01  1.70  F    FP
     13 NONE     0.9    46.8    5126    69.79    3.85    3.85  45.01  1.70  Q    SIGFP
     14 NONE     0.0    19.0       0   100.00    9.51    9.51  45.01  1.70  I    FreeR_flag

    No. of reflections used in FILE STATISTICS    16967
    -----------------------

A nice report of all the file's contents has appeared! Let's go on.
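
The '% complete' column can be cross-checked against the 'Num Missing' column and the total number of reflections printed at the bottom of the listing. The short Python sketch below is not part of ARP/wARP, and the function names are invented for illustration. It reproduces a few of the percentages and also shows that the range 0.000 - 0.346 in the header is consistent with 1/d² for the 45.01 - 1.70 Å limits.

```python
def percent_complete(n_missing, n_total):
    """Percentage of reflections for which a column holds a value."""
    return round(100.0 * (n_total - n_missing) / n_total, 2)


def inv_d_squared(d_angstrom):
    """Resolution expressed as 1/d^2, rounded as in the file header."""
    return round(1.0 / d_angstrom ** 2, 3)


N_TOTAL = 16967  # "No. of reflections used in FILE STATISTICS"

# (column label, Num Missing) pairs copied from the listing above
for label, missing in [("FSE1", 9890), ("PHIDM_123p", 9674),
                       ("F17", 1567), ("FP", 5126)]:
    print(label, percent_complete(missing, N_TOTAL))

# Low and high resolution limits in 1/d^2
print(inv_d_squared(45.01), inv_d_squared(1.70))
```

A column with a low completeness, such as the MAD phases here, simply does not cover all reflections of the 1.7 Å native set.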

    You need to specify some labels from above
    Native data amplitude: F17
    Native data sigma amplitude: SIGF17

Double-click on the relevant label with the left mouse button, paste it by pressing the middle mouse button and then press 'Enter'. This helps to avoid typos. As native we choose the 1.7 Å data; these are the columns labelled F17 and SIGF17.

Let's go on further.

    Now enter the size of the protein in RESIDUES / AU: 145
    Protein size estimated at about 1117 atoms
    Average B factor from Wilson Plot estimated to be 31

One important point: if your Wilson plot looks funny or you are missing too many strong low-resolution reflections, the ARP/wARP originators have a very good suggestion: GO BACK TO THE LAB, GROW CRYSTALS AND MEASURE THE DATA AGAIN, BUT PROPERLY! In the end it will be much faster and far more efficient than playing around with crappy data.

    Do you plan to use experimental phases as input (applications #2,#3 and #4) (Y/N) ? Y
    Amplitude (weighted) for initial map calculation: FSE1
    Phase for initial map calculation: PHIDM_123p
    FOM. Press <Enter> if amplitude is already weighted : FOMDM_123p

Since we want to start from experimental phases, the answer was Y. Answering N means that you are interested in starting from a molecular replacement solution (#1), building the solvent of a refined structure (#5) or trying the ab initio option (#6).

    Do you want to use multiple models averaging (application #3) (Y/N) ? Y
    Do you want single free atom model density modification (application #2) (Y/N) ? Y
    How many models do you plan to use for averaging ? 6
    You will be now asked how many processors you can use at the SAME time
    for running arp/warp jobs. Remember that these machines should share
    a common home directory. If you are not sure of what you are doing
    please consult the local System manager.
    How many processors can you use simultaneously ? 3
    Processor 1 is in a machine named: reggae
    Machine reggae is OK.
    Processor 2 is in a machine named: reggae
    Machine reggae is OK.
    Processor 3 is in a machine named: reggae
    Machine reggae is OK.

The first question was whether you would like to use multiple models for the density modification (#3). The use of multiple models is described extensively in wARP97. Given the power of maximum likelihood refinement, we recommend this (time-consuming) option if your data are worse than 1.6 - 1.8 Å, which, to be honest, is not unlikely. Single unrestrained ARP jobs (#2) are perfectly reasonable provided your data extend to higher resolution than 1.8 Å. Here the answer was Y, since the data extend to 1.7 Å. If you answer N, the setup assumes that you will be using a single model and goes on to 'advanced parameters' (see below).

Since the answer was Y, you are asked how many models you want to use, the machine names, how many cycles of wARP you would like to run, etc. In this case we decided to average 6 models. Averaging 2 models is pointless. Averaging 3, 4 or 5 models is possible but not recommended; 6 is a much better number. Then you are asked how many processors you can use. Suppose you have a 4-processor machine. Using all 4 is not very wise and may be impolite to others. With 6 refinement runs requested, the script would first run 4 of them and then the remaining 2, which takes 2 'job cycles' in total. If we choose 3 processors instead, the script runs 3+3 jobs, also in 2 job cycles, while leaving the fourth processor free for something else.
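
The arithmetic behind the 'job cycles' can be sketched in a few lines of Python. This is only an illustration, assuming that the runs are started batch by batch with at most one run per processor, as described above; the actual scheduling inside the ARP/wARP scripts may differ, and the function name is invented.

```python
def job_batches(n_models, n_procs):
    """Split n_models refinement runs into batches of at most n_procs runs."""
    if n_models < 1 or n_procs < 1:
        raise ValueError("need at least one model and one processor")
    full, rest = divmod(n_models, n_procs)
    # 'full' complete batches, plus one smaller batch for the leftover runs
    return [n_procs] * full + ([rest] if rest else [])


# The two cases discussed above: 6 models on 4 or on 3 processors
for procs in (4, 3):
    batches = job_batches(6, procs)
    print(procs, "processors:", batches, "->", len(batches), "job cycles")
```

Both choices need two job cycles, so in this example the fourth processor buys nothing.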

    wARP makes 3 big iterations before averaging
    You will be asked how many refinement cycles you want in each
    Typical values are 10-30, read the manual for details
    How many refinement cycles for 1st wARP iteration ? 15
    How many refinement cycles for 2nd wARP iteration ? 15
    How many refinement cycles for 3rd wARP iteration ? 15

We now have to decide how many cycles we need per iteration. In one iteration each model is refined with unrestrained ARP. After it has finished, it rejects lots of bad atoms, limits B factors and randomises coordinates a bit to escape from local minima. The higher the resolution, the fewer cycles you need. In the last iteration before averaging it is recommended to use a few more cycles to let the models converge a bit better.

    Do you like to set advanced parameters yourself (Y/N) ? N

Advanced users are recommended to customise the advanced parameters. These parameters are REFMAC specific (see CCP4 documentation); if you don't set them, standard default values will be chosen, which should work well but maybe not optimally. Before setting up advanced parameters on your own, please at least make sure you understand the following points; otherwise don't bother.

If resolution is lower than 2.0 Å, minimisation with low DAMPING may work better.
The lower the resolution, the lower the DAMPING should be (defaults are OK).
CGMAT should perform as well as CDIR, but CGMAT often starts hanging; CDIR seems a bit more robust.
If the starting phases are very poor, fixing bulk scaling is a brilliant idea.
At resolution lower than 2.0 Å it is worth using the 'phased' REFMAC refinement.
If you insist on using Rfree feel free to do so, it's an option. It won't really help but it doesn't hurt either, if you have enough reflections!

If you don't ask for the advanced parameter setup, here is what you get:

    You can choose between five protocols:
    F: A fast protocol that works with good data (AP/VL favourite).
    S: A considerably slower one which might work better in difficult cases.
    R: The slow protocol together with Rfree (EJD favourite).
    H: Optimised parameters for starting from heavy atoms alone.
    W: Water building optimal parameters
    What is your choice ? (F/S/R/H/W) F
    Advanced parameters set to default values for mode F

These questions are basically specific to the use of REFMAC.

The Fast protocol sets up the job to run CGMAT minimisation, applying full shifts to all atoms. It does not use an Rfree factor for monitoring refinement progress. Lots of people like using Rfree (and in general they do well to do so!) and you are right to get suspicious if this is not done: BUT THIS IS FREE ATOMS REFINEMENT. There is no real point in using Rfree in free atoms refinement. The ARP/wARP authors had this argument with the referees of many wARP papers and managed to convince a few. Just to clarify things: the ARP/wARP authors believe that Rfree is essential in restrained model refinement to validate the protocol (unless the protocol has already been proven valid under the conditions used). However, if no geometry is present, there is certainly no danger of over-weighting or down-weighting X-ray data against geometry terms, which is basically what Rfree tells you ...

The Slow protocol emerged after a long, constructive (and still ongoing) discussion with Eleanor Dodson (EJD) and Garib Murshudov (GM). In lots of cases it can be preferred. To be honest, it tends to be my favourite lately (AP). The job will run CDIR minimisation, applying 0.3 of the calculated shifts. It will also run 4 internal REFMAC cycles before the model is updated by ARP. The only difference from what EJD and GM recommend is the absence of a free R factor.

The Rfree protocol is the slow one plus the use of Rfree, not only as a test set but, most importantly, for calculating $\sigma_A$ weights based on the free set. Although theoretically more sound, it often fails with very bad starting models. But it is worth a try ... Needless to say, this is EJD's favourite.

The Heavy protocol is optimised for starting from very few atoms. It runs lots of REFMAC cycles, fixes solvent scaling parameters, etc. We must say that we do not have much experience with it. The parameters chosen work with rubredoxin. If you have some high resolution data on a metalloprotein and this protocol does not work, we strongly encourage you to contact us.

The Water protocol is basically the same as the Rfree one, i.e. it DOES use Rfree, since it refers to serious model building and you should monitor Rfree to see whether you are doing anything sensible. It also assumes that the model is in a good state, so it applies full shifts (DAMP 1.0) and does only two cycles in reciprocal space.
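
For quick reference, the five protocol descriptions above can be collected in a small lookup table. The Python sketch below is only a reading aid: the class and field names are invented, 'full shifts' is written as a damping of 1.0 following the Water description, and anything the text does not state is left as None. The warp.par written by the setup script remains the authoritative record of what was actually chosen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Protocol:
    name: str
    method: Optional[str]       # REFMAC minimiser; None if not stated in the text
    damping: Optional[float]    # fraction of the calculated shifts applied
    uses_rfree: Optional[bool]  # None if not stated in the text
    note: str


# Values transcribed from the prose descriptions, not from the script itself
PROTOCOLS = {
    "F": Protocol("Fast", "CGMAT", 1.0, False, "full shifts, for good data"),
    "S": Protocol("Slow", "CDIR", 0.3, False,
                  "4 internal REFMAC cycles before each ARP update"),
    "R": Protocol("Rfree", "CDIR", 0.3, True,
                  "Slow protocol plus sigma_A weights from the free set"),
    "H": Protocol("Heavy", None, None, None,
                  "many REFMAC cycles, fixed solvent scaling"),
    "W": Protocol("Water", None, 1.0, True,
                  "two cycles in reciprocal space, for good models"),
}

for key, p in PROTOCOLS.items():
    print(key, p.name, p.method, p.damping, p.uses_rfree)
```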

We now get back to finishing arp_warp_setup.sh. The next questions are

    Would you like to setup restrained arp/warp (applications 1,4,5) (Y/N) ? Y
    How many total cycles of restrained arp/warp (applications 1,4,5)? 100
    How many refinement cycles between rebuilding (application 4 only) ? 10
    How many molecules per asymmetric unit (application 4 only) ? 1
    A proper weight must be set for Xray/geometry contributions.
    Matrix suggested Default 0.5
    -Decrease to tighten geometry
    -Increase to increase X-ray terms contribution
    Enter an appropriate number (Enter for default)

The first parameter (total cycles) applies to any of arp_molrep, warpNtrace or arp_solvent (applications #1, #4, #5) and is the total number of cycles to be run in any of these restrained ARP/wARP applications. The following two questions refer exclusively to the newest ARP/wARP mode, warpNtrace. The first one is how often rebuilding takes place in warpNtrace, while the centre of gravity is the approximate centre of gravity for a 'real' molecule. The last parameter is the weight between the X-ray terms and geometry, as explained in the prompt. The default will do for most cases.

The setup script has now finished and if nothing went too wrong there should now be a file named warp.par in your directory that looks like this:
    set datafile = /full_path/test.mtz
    set fp = F17
    set sigfp = SIGF17
    set fbest = F17
    set phibest = PHIDM_123p
    set fom = FOMDM_123p
    set protsize = 1117
    set wilsonb = 31
    set models = 6
    set procs = 3
    set machine1 = reggae
    set machine2 = reggae
    set machine3 = reggae
    set machine4 = DUMMY
    set machine5 = DUMMY
    set machine6 = DUMMY
    set rc1 = 15
    set rc2 = 15
    set rc3 = 20
    set sym = 20
    set cell = ' 62.59 64.76 74.30 90.00 90.00 90.00 '
    set xyzlim = ' 0 0.5 0 0.5 0 0.5 '
    set refmeth = CDIR
    set freer = N
    set freerml = N
    set damp = '0.99 0.99'
    set rrcyc = 2
    set bulkls = 'SBULk -0.75 BBULk 150'
    set bulkml = 'SBULk -0.25 BBULk 80'
    set fixbulk = ' \ #'
    set scale = BULK
    set eresol = 2.5
    set phaseres = N
    set phasblur = UNK
    set refmax = MLKF
    set restrcyc = 100
    set restrref = 10
    set cgr = ' 25 25 -8 '
    set resol = ' 20 1.700 '
    set wmat = 0.5

If you want to change some of the parameters without going through the setup once again - just edit the file, but make sure you know what you are doing!
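
Before editing warp.par by hand it can help to read it back and check a few values. The Python sketch below is not part of ARP/wARP. It assumes the file contains only csh-style `set name = value` lines like those shown above; the function name `parse_warp_par` and the shortened sample are made up for illustration.

```python
import re

# One csh-style assignment per line: set <name> = <value>
SET_LINE = re.compile(r"^\s*set\s+(\w+)\s*=\s*(.*?)\s*$")


def parse_warp_par(text):
    """Return a dict of parameter names to string values."""
    params = {}
    for line in text.splitlines():
        match = SET_LINE.match(line)
        if not match:
            continue  # skip blank lines and anything that is not an assignment
        name, value = match.groups()
        # Drop surrounding single quotes and the padding inside them
        if len(value) >= 2 and value[0] == value[-1] == "'":
            value = value[1:-1].strip()
        params[name] = value
    return params


SAMPLE = """\
set datafile = /full_path/test.mtz
set fp = F17
set models = 6
set procs = 3
set resol = ' 20 1.700 '
set wmat = 0.5
"""

params = parse_warp_par(SAMPLE)
low, high = (float(x) for x in params["resol"].split())
print(params["fp"], int(params["models"]), int(params["procs"]), low, high)
```

All values are kept as strings, so any conversion to numbers is left to the caller, who knows which parameters are numeric.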

VL AP RM
1998-09-03
