To give an impression of how it works, here is a typical
arp_warp_setup.sh session.
For this example, which uses data provided by Martin Walsh, we have a set of amplitudes and phases from a MAD experiment, extending to 2.0 Å resolution, and a native dataset extending to 1.7 Å. We want to run multiple-model averaging first to obtain an improved map and then use this map for automated building and refinement.
% arp_warp_setup.sh
===========================================================================
This is the setup procedure for ARP/wARP version 5.0
Applications (see documentation for details)
===========================================================================
1:Refinement of MR solutions | (arp_molrep.sh) |
2:Improvement of MIRAS,MAD, etc phases | (warp.sh) |
3:Averaging of multiple refinement | (warp.sh) |
4:Automatic tracing of density map and model building | (warpNtrace.sh) |
5:Building of the solvent structure | (arp_solvent.sh) |
6:Ab initio structure for metalloproteins | (warp_solve.sh) |
Col | Sort | Min   | Max    | Num missing | % complete | Mean   | Mean abs. | Res. low | Res. high | Type | Column label
1   | ASC  | 0     | 36     | 0           | 100.00     | 13.6   | 13.6      | 45.01    | 1.70      | H    | H
2   | NONE | 0     | 38     | 0           | 100.00     | 14.1   | 14.1      | 45.01    | 1.70      | H    | K
3   | NONE | 0     | 43     | 0           | 100.00     | 16.2   | 16.2      | 45.01    | 1.70      | H    | L
4   | NONE | 6.3   | 1068.6 | 9890        | 41.71      | 162.59 | 162.59    | 45.01    | 1.70      | F    | FSE1
5   | NONE | 0.8   | 24.3   | 9890        | 41.71      | 4.09   | 4.09      | 45.01    | 1.70      | Q    | SIGFSE1
6   | NONE | 0.0   | 360.0  | 9796        | 42.26      | 171.93 | 171.93    | 45.01    | 1.70      | P    | PHIB_123p
7   | NONE | 0.000 | 1.000  | 9796        | 42.26      | 0.560  | 0.560     | 45.01    | 1.70      | W    | FOM_123p
8   | NONE | 0.0   | 360.0  | 9674        | 42.98      | 169.49 | 169.49    | 45.01    | 1.70      | P    | PHIDM_123p
9   | NONE | 0.000 | 1.000  | 9674        | 42.98      | 0.736  | 0.736     | 45.01    | 1.70      | W    | FOMDM_123p
10  | NONE | 1.3   | 930.8  | 1567        | 90.76      | 109.03 | 109.03    | 45.01    | 1.70      | F    | F17
11  | NONE | 0.6   | 36.2   | 1567        | 90.76      | 3.15   | 3.15      | 45.01    | 1.70      | Q    | SIGF17
12  | NONE | 9.9   | 1048.5 | 5126        | 69.79      | 152.04 | 152.04    | 45.01    | 1.70      | F    | FP
13  | NONE | 0.9   | 46.8   | 5126        | 69.79      | 3.85   | 3.85      | 45.01    | 1.70      | Q    | SIGFP
14  | NONE | 0.0   | 19.0   | 0           | 100.00     | 9.51   | 9.51      | 45.01    | 1.70      | I    | FreeR_flag
Double-click the relevant label with the left mouse button, paste it by pressing the middle mouse
button and then press 'Enter'. This helps avoid typos. As the native we choose the 1.7 Å data,
i.e. the columns labelled F17 and SIGF17.
Let's continue.
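The Type column is what makes a listing like this easy to read: amplitudes (F) are normally followed by their standard deviations (Q), and phases (P) by their weights or figures of merit (W). The short Python sketch below is not part of ARP/wARP; it copies the labels and types from the table above and, assuming the usual adjacent F/SIGF and PHI/FOM ordering seen in this particular file, pairs them up. It only illustrates how F17/SIGF17 and PHIDM_123p/FOMDM_123p can be spotted; pair_columns is a hypothetical helper.

```python
# Column labels and MTZ column types copied from the listing above.
# MTZ type codes used here: H = Miller index, F = amplitude, Q = standard
# deviation, P = phase (degrees), W = weight (e.g. FOM), I = integer (e.g. FreeR flag).
COLUMNS = [
    ("H", "H"), ("K", "H"), ("L", "H"),
    ("FSE1", "F"), ("SIGFSE1", "Q"),
    ("PHIB_123p", "P"), ("FOM_123p", "W"),
    ("PHIDM_123p", "P"), ("FOMDM_123p", "W"),
    ("F17", "F"), ("SIGF17", "Q"),
    ("FP", "F"), ("SIGFP", "Q"),
    ("FreeR_flag", "I"),
]


def pair_columns(columns, first_type, second_type):
    """Pair each column of first_type with the column directly after it,
    provided that column has second_type (assumes adjacent ordering)."""
    pairs = []
    for (label, ctype), (next_label, next_type) in zip(columns, columns[1:]):
        if ctype == first_type and next_type == second_type:
            pairs.append((label, next_label))
    return pairs


amplitudes = pair_columns(COLUMNS, "F", "Q")
phases = pair_columns(COLUMNS, "P", "W")
print("Amplitude/sigma pairs:", amplitudes)
print("Phase/weight pairs:", phases)
```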
Now enter the size of the protein in RESIDUES / AU: 145
Protein size estimated at about 1117 atoms
Average B factor from Wilson Plot estimated to be 31
An important point: if your Wilson plot looks funny or you are missing
too many strong low-resolution reflections, the ARP/wARP originators have a very
good suggestion: GO BACK TO THE LAB, GROW CRYSTALS AND MEASURE THE DATA AGAIN, BUT PROPERLY!
In the end it will be much faster and far more efficient than playing around with
these crappy data.
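The "Average B factor from Wilson Plot" reported above rests on the Wilson relation <I> ≈ K Σf² exp(-2B (sinθ/λ)²): a plot of ln(<I>/Σf²) against (sinθ/λ)² should be a straight line with slope -2B, and a "funny" Wilson plot is one that departs badly from that line. The Python sketch below fits the line to synthetic, noise-free shells generated with B = 31; it illustrates the principle only and is not the routine arp_warp_setup.sh uses. The shell values are hypothetical, and Σf² is held constant for simplicity, whereas real programs compute it per resolution shell.

```python
import math


def wilson_b(resolutions, mean_intensities, sum_f2):
    """Estimate an overall B factor and scale K from a Wilson plot.

    Fits ln(<I> / sum f^2) = ln K - 2 B s^2 by ordinary least squares,
    where s = sin(theta)/lambda = 1/(2d) for a shell at resolution d.
    """
    xs = [1.0 / (4.0 * d * d) for d in resolutions]  # s^2 for each shell
    ys = [math.log(i / f2) for i, f2 in zip(mean_intensities, sum_f2)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope / 2.0, math.exp(intercept)


# Hypothetical shells (resolution in Å) generated with B = 31 and K = 1.
shells = [4.0, 3.5, 3.0, 2.6, 2.3, 2.0, 1.85, 1.7]
sum_f2 = [1.0] * len(shells)
intensities = [math.exp(-2.0 * 31.0 / (4.0 * d * d)) for d in shells]

b, k = wilson_b(shells, intensities, sum_f2)
print(f"Wilson B = {b:.1f}, K = {k:.2f}")
```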
Do you plan to use experimental phases as input (applications #2,#3 and #4) (Y/N) ? Y
Amplitude (weighted) for initial map calculation: FSE1
Phase for initial map calculation: PHIDM_123p
FOM. Press <Enter> if amplitude is already weighted : FOMDM_123p
Since we want to start from experimental phases,
the answer was Y. Answering N means that
you are interested in either starting from a molecular replacement solution (#1),
building the solvent of a refined structure (#5) or trying the ab initio option (#6).
Do you want to use multiple models averaging (application #3) (Y/N) ? Y
Do you want single free atom model density modification (application #2) (Y/N) ? Y
How many models do you plan to use for averaging ? 6
You will be now asked how many processors you can use at the SAME time
for running arp/warp jobs. Remember that these machines should share
a common home directory.
If you are not sure of what you are doing please consult the local System manager.
How many processors can you use simultaneously ? 3
Processor 1 is in a machine named: reggae
Machine reggae is OK.
Processor 2 is in a machine named: reggae
Machine reggae is OK.
Processor 3 is in a machine named: reggae
Machine reggae is OK.
The initial question was whether you would like to use multiple models for
the density modification (#3).
The use of multiple models is described extensively
in wARP97. Given the power of maximum likelihood refinement, we recommend exploiting this
(time-consuming) option if your data are worse than 1.6-1.8 Å, which, to be honest,
is quite likely.
Single unrestrained ARP jobs (#2) are perfectly reasonable provided your
data extend beyond 1.8 Å.
Anyhow, here the answer was Y,
since the data extend to 1.7 Å.
If you answer N, the setup assumes that you will be using a single model and goes
on to 'advanced parameters' (see below).
Since the answer was Y,
you are asked for some details: how many models you want to use,
the machine names, how many cycles of wARP you would like to run, etc.
In this case we have decided to average 6 models. Averaging 2 models is pointless;
averaging 3, 4 or 5 models is possible but not recommended, and 6 is a much better number.
Then you are asked how many processors you can use. Suppose you have a 4-processor machine.
Using all 4 is not very wise and may be impolite to others.
With 4 processors, the 6 requested refinement runs would be executed as 4 first and then the remaining 2,
i.e. in 2 'job cycles'. If we choose to use 3 processors instead, the script runs 3+3 jobs,
which also takes 2 job cycles, while leaving the fourth processor free for something else.
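The scheduling arithmetic above is a ceiling division: job cycles = ceil(models / processors). The Python sketch below reproduces it and also pads the machine list with DUMMY entries up to one slot per model, which is what the machine1 to machine6 lines of the warp.par shown further down appear to do. All three helpers are hypothetical illustrations; the actual scheduling inside warp.sh may differ.

```python
import math


def job_cycles(models, processors):
    """Number of rounds needed when at most `processors` refinements run at once."""
    return math.ceil(models / processors)


def batches(models, processors):
    """Sizes of the successive batches of refinement runs."""
    sizes = []
    remaining = models
    while remaining > 0:
        sizes.append(min(processors, remaining))
        remaining -= processors
    return sizes


def machine_slots(machines, models):
    """Pad the machine list with DUMMY up to one slot per model
    (assumed to mirror the machine1..machine6 entries in warp.par)."""
    return list(machines) + ["DUMMY"] * (models - len(machines))


print(batches(6, 4), job_cycles(6, 4))  # [4, 2] 2
print(batches(6, 3), job_cycles(6, 3))  # [3, 3] 2
print(machine_slots(["reggae"] * 3, 6))
```

Both choices finish in 2 job cycles, which is why using 3 processors costs nothing here compared with using all 4.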
wARP makes 3 big iterations before averaging
You will be asked how many refinement cycles you want in each
Typical values are 10-30, read the manual for details
How many refinement cycles for 1st wARP iteration ? 15
How many refinement cycles for 2nd wARP iteration ? 15
How many refinement cycles for 3rd wARP iteration ? 15
We now have to decide how many cycles we need per iteration. In one iteration each model
is refined with unrestrained ARP. After refinement has finished, the program rejects lots of bad atoms,
limits B factors and randomises the coordinates slightly, to escape from local minima.
The higher the resolution, the fewer cycles you need. In the last iteration before averaging
it is recommended to use a few more cycles to let the models converge better.
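As a rough illustration of what these answers cost, the Python sketch below adds up the refinement cycles per model and over all models, and estimates elapsed time in units of refinement cycles from the number of job cycles. It assumes that all runs take about the same time per cycle and ignores the averaging step itself; cycle_budget is a hypothetical helper, not part of ARP/wARP.

```python
def cycle_budget(cycles_per_iteration, models, processors):
    """Return (cycles per model, cycles over all models, elapsed time in cycles).

    Assumes every run takes about the same time per cycle and that runs are
    executed in rounds of at most `processors` jobs.
    """
    per_model = sum(cycles_per_iteration)
    total = per_model * models
    rounds = -(-models // processors)  # ceiling division
    elapsed = per_model * rounds
    return per_model, total, elapsed


print(cycle_budget([15, 15, 15], models=6, processors=3))  # (45, 270, 90)
print(cycle_budget([15, 15, 20], models=6, processors=3))  # (50, 300, 100)
```

The second call gives the last iteration 20 cycles, following the advice above (it is also the rc3 value in the warp.par listing further down); it adds 5 cycles per model and 10 cycles of elapsed time.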
Do you like to set advanced parameters yourself (Y/N) ? N
We recommend that advanced users customise the advanced parameters.
These parameters are REFMAC-specific (see the CCP4 documentation); if
you don't set them, standard default values will be chosen. These should
work well, though maybe not optimally.
Before setting up advanced parameters on your own, please at least make sure you
understand the following points; otherwise don't bother.
The Fast protocol sets up the job to run CGMAT minimisation applying
full shifts to all atoms. It does not use Rfree to monitor
refinement progress. Lots of people like using Rfree (and in general they
do well to do so!), and you are right to get suspicious if it is not used:
BUT THIS IS FREE ATOMS REFINEMENT. There is no real point in using Rfree in
free atoms refinement. The ARP/wARP authors have had this argument with many referees of wARP papers
and managed to convince a few. Just to clarify things:
the ARP/wARP authors believe that Rfree is essential in restrained model refinement
to validate the protocol (unless the protocol has already been proven to be valid under
the conditions used). However, if no geometry terms are present, there is certainly no danger of
over-weighting or under-weighting the X-ray data against the geometry terms,
which is basically what Rfree tells you.
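For reference, Rfree is simply the conventional R factor, R = Σ|Fo - Fc| / ΣFo, evaluated over the test-set reflections that are kept out of refinement; in the MTZ file above they are marked by the FreeR_flag column (values 0 to 19). The Python sketch below assumes the common CCP4 convention that flag 0 marks the free set and that Fc is already scaled to Fo; the amplitudes and flags are made up, and r_work_and_free is a hypothetical helper rather than anything ARP/wARP or REFMAC provides.

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum |Fo - Fc| / sum Fo (Fc assumed already scaled to Fo)."""
    return sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)


def r_work_and_free(f_obs, f_calc, free_flags, free_value=0):
    """Split reflections by FreeR flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, flag in zip(f_obs, f_calc, free_flags) if flag != free_value]
    free = [(fo, fc) for fo, fc, flag in zip(f_obs, f_calc, free_flags) if flag == free_value]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in free], [fc for _, fc in free])
    return r_work, r_free


# Hypothetical amplitudes and flags for six reflections.
f_obs = [100.0, 200.0, 150.0, 80.0, 120.0, 90.0]
f_calc = [90.0, 210.0, 140.0, 100.0, 110.0, 70.0]
flags = [0, 3, 7, 0, 12, 5]

r_work, r_free = r_work_and_free(f_obs, f_calc, flags)
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```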
The Slow protocol emerged from a long, constructive (and still ongoing) discussion
with Eleanor Dodson (EJD) and Garib Murshudov (GM). In many cases it may be preferable.
To be honest, it has become my favourite lately (AP).
The job runs CDIR minimisation applying 0.3 of the calculated shifts.
It also runs 4 internal REFMAC cycles before the model is updated by ARP.
The only difference from what EJD and GM recommend is the absence of a free R factor.
The Rfree protocol is the Slow one plus the use of
Rfree, not only as a test set but,
most importantly, for calculating weights based on the free set.
Although theoretically more sound, it often fails with very bad starting models.
Still, it is worth a try. Needless to say, this is EJD's favourite.
The Heavy protocol is optimised for starting from very few atoms.
It runs lots of REFMAC cycles, fixes the solvent scaling parameters, etc.
We must say that we do not have much experience with it. The parameters chosen
work with rubredoxin.
If you have high-resolution data on a metalloprotein and this protocol does not work,
we strongly encourage you to contact us.
The Water protocol is basically the same as the Rfree one,
i.e. it DOES use Rfree, since it is meant
for serious model building and you should monitor Rfree to make sure you are
doing something sensible. It also assumes that the model is already in a good state, so
it applies full shifts (DAMP 1.0) and does only two cycles in reciprocal space.
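To keep the five protocols apart, the differences stated above can be collected into one small table. The Python sketch below does that; the Protocol class and protocols_with_rfree are purely illustrative (not an ARP/wARP or REFMAC structure), and None marks values the text does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Protocol:
    minimiser: Optional[str]         # REFMAC minimisation method
    shift_fraction: Optional[float]  # fraction of calculated shifts applied
    refmac_cycles: Optional[int]     # reciprocal-space cycles before ARP updates the model
    uses_rfree: Optional[bool]       # whether Rfree is used


PROTOCOLS = {
    "Fast": Protocol("CGMAT", 1.0, None, False),
    "Slow": Protocol("CDIR", 0.3, 4, False),
    "Rfree": Protocol("CDIR", 0.3, 4, True),   # Slow plus Rfree-based weights
    "Heavy": Protocol(None, None, None, None),  # "lots of REFMAC cycles"; values not given
    "Water": Protocol(None, 1.0, 2, True),      # minimiser not stated explicitly
}


def protocols_with_rfree():
    """Names of the protocols that use Rfree."""
    return sorted(name for name, p in PROTOCOLS.items() if p.uses_rfree is True)


print(protocols_with_rfree())  # ['Rfree', 'Water']
```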
We now return to finishing arp_warp_setup.sh. The next questions are:
Would you like to setup restrained arp/warp (applications 1,4,5) (Y/N) ? Y
How many total cycles of restrained arp/warp (applications 1,4,5)? 100
How many refinement cycles between rebuilding (application 4 only) ? 10
How many molecules per asymmetric unit (application 4 only) ? 1
A proper weight must be set for Xray/geometry contributions.
Matrix suggested Default 0.5
-Decrease to tighten geometry
-Increase to increase X-ray terms contribution
Enter an appropriate number (Enter for default)
The first parameter (total cycles)
applies to any of
arp_molrep, warpNtrace or arp_solvent (applications #1, #4, #5)
and is the total number of cycles to be run in any of these restrained ARP/wARP applications.
The following two questions refer exclusively to
the newest ARP/wARP mode, warpNtrace.
The first one sets how often rebuilding takes place in warpNtrace; the second gives the number
of molecules in the asymmetric unit. The centre of gravity (cgr in warp.par) is the approximate
centre of gravity of a 'real' molecule.
The last parameter is the weight between the X-ray and geometry terms, as
explained in the prompt. The default will do in most cases.
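With the answers given above (100 cycles in total, rebuilding every 10 cycles; stored as restrcyc and restrref in the warp.par below), warpNtrace goes through roughly ten building rounds. The Python sketch below lists them, assuming rebuilding happens after every restrref-th refinement cycle; that is our simplification rather than a documented rule, and rebuild_cycles is a hypothetical helper.

```python
def rebuild_cycles(total_cycles, cycles_between_rebuilds):
    """Cycle numbers after which the model would be rebuilt, assuming rebuilding
    happens after every `cycles_between_rebuilds`-th refinement cycle."""
    return list(range(cycles_between_rebuilds, total_cycles + 1, cycles_between_rebuilds))


schedule = rebuild_cycles(100, 10)
print(len(schedule), schedule[:3], schedule[-1])  # 10 [10, 20, 30] 100
```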
The setup script has now finished and, if nothing went too wrong, there should be
a file named warp.par in your directory that looks like this:
set datafile = /full_path/test.mtz
set fp = F17
set sigfp = SIGF17
set fbest = F17
set phibest = PHIDM_123p
set fom = FOMDM_123p
set protsize = 1117
set wilsonb = 31
set models = 6
set procs = 3
set machine1 = reggae
set machine2 = reggae
set machine3 = reggae
set machine4 = DUMMY
set machine5 = DUMMY
set machine6 = DUMMY
set rc1 = 15
set rc2 = 15
set rc3 = 20
set sym = 20
set cell = ' 62.59 64.76 74.30 90.00 90.00 90.00 '
set xyzlim = ' 0 0.5 0 0.5 0 0.5 '
set refmeth = CDIR
set freer = N
set freerml = N
set damp = '0.99 0.99'
set rrcyc = 2
set bulkls = 'SBULk -0.75 BBULk 150'
set bulkml = 'SBULk -0.25 BBULk 80'
set fixbulk = '
#'
set scale = BULK
set eresol = 2.5
set phaseres = N
set phasblur = UNK
set refmax = MLKF
set restrcyc = 100
set restrref = 10
set cgr = ' 25 25 -8 '
set resol = ' 20 1.700 '
set wmat = 0.5
If you want to change some of the parameters without going
through the setup again, just edit the file, but make sure you know what you are doing!
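If you prefer to check or change warp.par from a script, a minimal Python sketch like the one below can read and update the simple one-line `set name = value` entries. It is not an ARP/wARP tool: read_params and set_param are hypothetical helpers, they do not interpret csh, and they deliberately skip multi-line values such as fixbulk.

```python
import re

SET_LINE = re.compile(r"^set\s+(\w+)\s*=\s*(.*)$")


def read_params(text):
    """Parse one-line `set name = value` entries of a csh-style warp.par.
    Surrounding single quotes are stripped; multi-line quoted values are skipped."""
    params = {}
    for line in text.splitlines():
        m = SET_LINE.match(line.strip())
        if not m:
            continue
        name, value = m.group(1), m.group(2).strip()
        if value.startswith("'"):
            if len(value) < 2 or not value.endswith("'"):
                continue  # opening quote only: value continues on the next line
            value = value[1:-1].strip()
        params[name] = value
    return params


def set_param(text, name, new_value):
    """Return text with the value of one existing one-line `set name = ...` entry replaced."""
    pattern = re.compile(rf"^(set\s+{re.escape(name)}\s*=\s*).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + new_value, text)
    if count != 1:
        raise KeyError(f"expected exactly one 'set {name}' line, found {count}")
    return new_text


example = """set rc1 = 15
set rc3 = 20
set cell = ' 62.59 64.76 74.30 90.00 90.00 90.00 '
set fixbulk = '
#'
set wmat = 0.5
"""
params = read_params(example)
print(params["cell"])  # 62.59 64.76 74.30 90.00 90.00 90.00
edited = set_param(example, "wmat", "0.3")
print(read_params(edited)["wmat"])  # 0.3
```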