EDPDB: A Multi-Functional Tool for Protein Structure Analysis

		Xue-jun Zhang and Brian W. Matthews

  Institute of Molecular Biology, Howard Hughes Medical Institute
		and Department of Physics
		University of Oregon
		Eugene, Oregon  97403

(A shortened version of this paper was published in J. Appl. Cryst. (1995) 28, 624-630.)

ABSTRACT

EDPDB is a FORTRAN program that simplifies the analysis of protein structure and makes it easy to extract various types of geometrical and biologically relevant information, both for the molecule in isolation and in its crystallographic context. EDPDB offers a large set of functions by which the user can evaluate, select and manipulate the coordinates of protein structures. Types of calculation available include the determination of solvent accessibility, bond lengths and torsion angles, determination of the van der Waals volume of a group of atoms, determination of the best-fit plane through a set of points, evaluation of crystal contacts between a molecule in a crystal and all symmetry-related molecules, and the determination of hinge-bending motion between protein domains. It is also possible to compare different structures, to perform coordinate manipulations and to edit coordinate files. The program augments the graphic analysis of protein structure by allowing the user to construct a simple set of commands that will rapidly screen an entire structure. It may also make special purpose analyses feasible without complicated programming.

INTRODUCTION
GENERAL ORGANIZATION
EDPDB FUNCTIONS AND COMMAND CATEGORIES
EXAMPLES
EPILOGUE
ACKNOWLEDGEMENTS
REFERENCES
APPENDIX A

INTRODUCTION

A protein structure solved by X-ray crystallography is usually stored as a set of coordinates of its component atoms. The three-dimensional structure can be displayed and analyzed by interactive graphics programs such as FRODO (Jones, 1978) or its descendant O. Such graphics program packages are very well developed and are widely used. When quantitative analysis is required, text-based structure analysis is preferred. Such calculations are usually performed with a variety of algorithms implemented in separate programs (e.g. Lee & Richards, 1971; Connolly, 1983; Laskowski, MacArthur, Moss & Thornton, 1993; Moras, Podjarny & Thierry, 1991 and references therein). Currently, numerous program packages are in use for structure determination and analysis. For historical or technical reasons, these programs use different file formats for protein structure coordinates, so that still more programs are necessary to convert coordinates among the different formats. To simplify protein structure analysis, a general purpose, multi-functional, programmable, text-based program is highly desirable. In this paper, we describe a program, EDPDB, which has been designed with this goal in mind.

EDPDB was named to reflect its original conception, namely that it should be a general purpose editor for coordinate files in the format of the Brookhaven Protein Data Bank (PDB) (Bernstein et al., 1977). The PDB format is widely accepted and is often used as an intermediate to convert coordinates of protein structures from one format to another. The Brookhaven Protein Data Bank is also the main source for protein structure coordinates. For this reason, the PDB format was chosen for use by EDPDB. However, it would not be difficult to have EDPDB accept other formats if the present PDB format were superseded by another.

As a multi-functional line mode editor, EDPDB does not require any graphics hardware. This makes EDPDB easy to implement on different computer systems.

The main feature of EDPDB is its large collection of functions, which permit selection, editing and calculation. About one hundred commands are available in the current version of the program (V94A). None of these is conceptually complicated and many of the single functions might be performed with a relatively simple program. The combination, however, significantly reduces the amount of input/output, compared to the use of separate programs, and thereby speeds up the calculations. Furthermore, a single program with multiple functions gives the user more freedom to follow his/her own logic. In contrast to a typical database program (Morris, MacArthur, Hutchinson & Thornton, 1992), which usually uses a master file containing precalculated data, EDPDB reads the structural coordinates and calculates all the data required for the task at hand.

Another major goal in designing EDPDB was that it be general. The same program should be useable for calculations as simple as counting atoms or as complex as determining fractional solvent accessibility. The various calculations have been optimized for speed. Keyword leading, free format input and an on-line help service make EDPDB user-friendly. The construction of the command interpreter makes EDPDB programmable, thereby providing the user ways to extend the potential of the program, for example, by defining macros.

Another feature of the program is to allow the full use of symmetry operators. This makes EDPDB an especially useful tool for crystallographers.

In the following sections, the general organization of EDPDB is discussed, followed by a brief description of the available commands. These are classified into seven categories according to their functions. A few examples are included to illustrate the potential of EDPDB.

GENERAL ORGANIZATION

Input/Output Fields and Selection of Records
Calling Functions
Input/Output Files
Conventions
Programming

Input/Output Fields and Selection of Records

EDPDB is designed to handle any number of PDB coordinate files. The only restrictions come from the dimensions of the predefined arrays in the FORTRAN program, which can be easily modified, if necessary. In a PDB format file, coordinates are stored with one atom per record. For each input atom record, there are several fields: ATOM keyword, entry number, atom name (e.g. CA, OG1), residue type (e.g. ALA, PHE, SOL), chain marker (blank space or single character), residue number, coordinates (X,Y,Z), crystallographic occupancy or weight (W), and thermal factor (B). For convenience, and following the convention of other popular crystallographic programs (e.g. FRODO; Jones, 1978), EDPDB also defines the chain marker followed by the residue number as a residue identification. Records can be selected by matching the appropriate input fields with a selection criterion. Similarly, for each atom record, EDPDB defines output fields as a displayed text string and two real number fields (i.e. a weight and a thermal factor). The displayed text string, which usually includes the atom name, residue type and coordinates, can be modified by using editing commands or updated as the result of various calculations. Therefore, fields in the displayed text string may contain information different from the internal fields. For example, an atom name can be overwritten in the displayed text, but will not be changed in the internal atom-name array, unless the modified file is written out and reread in.
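
The field layout described above can be illustrated with a short Python sketch. This is not EDPDB's FORTRAN code: the names AtomRecord and parse_atom_record are hypothetical, and the column positions are those of the standard fixed-column PDB ATOM record, which is an assumption about the files being read rather than something stated in this paper.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    entry: int            # entry (serial) number
    atom_name: str        # e.g. CA, OG1
    residue_type: str     # e.g. ALA, PHE, SOL
    chain: str            # chain marker (blank or a single character)
    residue_number: int
    x: float
    y: float
    z: float
    w: float              # occupancy or weight
    b: float              # thermal factor

    @property
    def residue_id(self) -> str:
        # Chain marker followed by residue number, e.g. "A99"
        return f"{self.chain.strip()}{self.residue_number}"


def parse_atom_record(line: str) -> AtomRecord:
    """Split one fixed-column ATOM line into its fields (standard PDB columns assumed)."""
    return AtomRecord(
        entry=int(line[6:11]),
        atom_name=line[12:16].strip(),
        residue_type=line[17:20].strip(),
        chain=line[21],
        residue_number=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        w=float(line[54:60]),
        b=float(line[60:66]),
    )


# Build a made-up, column-accurate example line piece by piece.
example = (
    "ATOM  " + f"{2:>5}" + " " + " CA " + " " + "MET" + " " + "A"
    + f"{1:>4}" + "    "
    + f"{43.546:8.3f}{-2.912:8.3f}{9.231:8.3f}"
    + f"{1.0:6.2f}{25.1:6.2f}"
)
record = parse_atom_record(example)
print(record.residue_id, record.atom_name, record.b)
```

Keeping the parsed fields separate from any displayed text mirrors the distinction drawn above between internal fields and the editable text string.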

During execution, EDPDB opens buffers to store the pointers (addresses) of the selected records. Records selected in the main buffer are called ON atoms. The remaining records are called OFF atoms. Most of the editing and calculation commands affect the ON atoms, while selection commands change the ON/OFF status of records.

Calling Functions

Each function in EDPDB is activated by a one-line input statement (a command or an input card). An input card starts with a keyword (i.e. the name of the function) and includes any parameters that may be required by the function. The leading keyword can be abbreviated as long as there is no ambiguity with other leading keywords. For example, the command RESET may be abbreviated as RESE, but cannot be abbreviated further (e.g. RES) because of the ambiguity with the command RESIDUE. The input parameters are separated from each other by spaces or a comma (not a space followed by a comma). At the position where a parameter is expected, a comma may be used to indicate the default value for that parameter, if it has been defined. In most cases the order of the input parameters is organized so that required parameters are input first and optional ones later. Once an input card is entered the function is called immediately. For example, by typing ANALYZE, the average, minimum, maximum, range and standard deviation of the coordinates, occupancies and B-factors of the current ON atoms will be listed. The order of the input cards, therefore, will determine the behavior of EDPDB.
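
The abbreviation rule can be sketched as an unambiguous-prefix lookup. The following Python fragment is only an illustration of that rule, not the EDPDB command interpreter. The function name resolve_keyword is hypothetical, the keyword set is a small subset chosen for the example, and giving an exact match priority over longer keywords is an assumption.

```python
def resolve_keyword(token, keywords):
    """Expand an abbreviated leading keyword, rejecting ambiguous prefixes."""
    token = token.upper()
    if token in keywords:
        # Assumption: an exact match wins even if longer keywords share the prefix.
        return token
    matches = sorted(k for k in keywords if k.startswith(token))
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown keyword: {token}")
    raise ValueError(f"ambiguous keyword {token}: could be {', '.join(matches)}")


# A small subset of command names taken from this paper.
KEYWORDS = {"RESET", "RESIDUE", "ANALYZE", "QUIT"}

print(resolve_keyword("RESE", KEYWORDS))   # unique prefix of RESET
print(resolve_keyword("rese", KEYWORDS))   # the sketch ignores case
```

With this subset, RES is rejected because it is a prefix of both RESET and RESIDUE, which is the behavior described in the text.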

Input/Output Files

When being used in an editing mode, EDPDB uses PDB files as its direct input. New files in the PDB format can be output at any time.

At the commencement of running EDPDB, two new files are opened. Each file has the same name as the input PDB file, but with file extensions EDP and SCR. For example, if the input PDB file name is pdb4lzm.pdb, the newly created files will be called pdb4lzm.edp and pdb4lzm.scr. The EDP file contains a record of commands completed during the current execution. This file can be used subsequently, e.g. to repeat a set of commands, or can be modified into a macro for other similar calculations. The SCR file contains intermediate results which may or may not be shown on the terminal screen depending on the user's request. Usually the SCR file is used only for storing intermediate results. At a normal termination both the EDP and SCR files will be deleted by default, unless an optional save is requested (e.g. using the SAVE option in the QUIT command).

Most of the geometry calculations are based on the Cartesian coordinates as input from the PDB file. Some calculations, however, require additional information. For example, calculation of the area of solvent-accessible surface requires the assignment of a van der Waals radius to each of the atoms involved. An external file called acc.dat is used to input these radii. The SORT command may also require a file called pdbstd.dat which is used to specify the standard order of atoms within each residue and the labeling convention. The standard follows the normal atom order in a PDB file and the IUPAC-IUB convention (1970). It can, however, be modified for special purposes. Since EDPDB uses an event-driven strategy, this kind of additional information is not read unless it is needed by a specific function.

Conventions

EDPDB uses the Angstrom (1 A = 10^-10m) as the unit of length and the degree as the angular unit. For most functions EDPDB assumes that the input coordinates are orthogonal. The coordinates can, however, be transformed between different coordinate systems. In the case of a crystal structure in a non-orthogonal space group, the alignment of the Cartesian coordinate system (XYZ) relative to the crystallographic coordinate system (abc) is read from the header of the PDB file if present. The user can override this information explicitly through a CELL card. Crystallographic symmetry operators are input in the format of the International Tables for Crystallography. The conventions for the polar angles (phi, psi and kappa) are that phi is the angle between the X axis and the projection of the rotation axis on the X-Y plane; psi is the angle between the rotation axis and the Z axis; and kappa is the rotation angle about the rotation axis. Two different definitions of Eulerian angles can be used in EDPDB, and are explained explicitly in the program documentation. For all angles, the positive direction is defined as counterclockwise when looking down the rotation axis.
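
As an illustration of the polar-angle convention, the sketch below builds a rotation matrix from phi, psi and kappa using the standard axis-angle (Rodrigues) formula. It is a minimal sketch, not code taken from EDPDB. The function names are hypothetical, and the positive sense of rotation is assumed to be the usual right-handed one.

```python
import math


def polar_rotation_matrix(phi_deg, psi_deg, kappa_deg):
    """Rotation by kappa about the axis whose direction is given by phi and psi."""
    phi, psi, kappa = (math.radians(a) for a in (phi_deg, psi_deg, kappa_deg))
    # psi is measured from the Z axis; phi is measured from the X axis
    # to the projection of the rotation axis on the X-Y plane.
    ux = math.sin(psi) * math.cos(phi)
    uy = math.sin(psi) * math.sin(phi)
    uz = math.cos(psi)
    c, s = math.cos(kappa), math.sin(kappa)
    t = 1.0 - c
    return [
        [c + ux * ux * t, ux * uy * t - uz * s, ux * uz * t + uy * s],
        [uy * ux * t + uz * s, c + uy * uy * t, uy * uz * t - ux * s],
        [uz * ux * t - uy * s, uz * uy * t + ux * s, c + uz * uz * t],
    ]


def rotate(matrix, vector):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# psi = 0 places the rotation axis along Z; kappa = 90 turns X towards Y.
about_z = polar_rotation_matrix(0.0, 0.0, 90.0)
print([round(v, 6) for v in rotate(about_z, [1.0, 0.0, 0.0])])
```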

Programming

Like many other general purpose programs, EDPDB is programmable. The simplest method is to use a macro file that contains a sequence of EDPDB commands. Parameters can be passed to such a macro file, and other macro files can be called from within it. Loop and GOTO program structures are also available within a macro. An EDPDB command, ALIAS, can also be used to create user-defined commands, which always have higher priority than the built-in commands. EDPDB is a module-based program. This provides an open framework for new functions (commands) to be added. Therefore the capabilities of EDPDB can be extended at different programming levels.

EDPDB FUNCTIONS AND COMMAND CATEGORIES

There are a hundred or so EDPDB commands that are classified into seven categories, namely: Selection, Calculation, Editing, Input/Output, Control, Definition and Miscellaneous. A summary of the currently available commands with a brief description is given in Appendix A. Partly for historical reasons, and partly for convenience of typing, not all of the keywords used in EDPDB are real English words.

In the following, we will first discuss how to start and quit EDPDB as well as how to get on-line help. Then a brief discussion and a list of command names is given for each of the seven categories. Note that some of the commands can fit into different categories, although they are listed in only one.

Starting EDPDB, Getting Help and Quitting

One impediment to learning a new program is that unfamiliar data files may need to be prepared and pre-programs run as a prerequisite to the desired calculation. After starting the program, it may be difficult to get help when needed. Sometimes it may not be apparent how to terminate the program in an appropriate manner. EDPDB is set up to minimize problems of this sort.

To start, one simply types the program name, EDPDB, followed by the file name of the PDB format coordinate file to be read. For example, if the user's PDB file is called pdb4lzm.pdb, the following command typed in a system command line (e.g. DCL/VMS) will start the EDPDB program by reading the PDB file.

	$ EDPDB PDB4LZM.PDB

The input PDB format coordinate file will be opened as a read-only file and will not be modified or overwritten (unless the user insists on this under a unix system).

If EDPDB is being used interactively an on-line help utility is available to describe functions and command syntax. Examples are included in the on-line help menu to illustrate the use of the command and its relation to other commands. Typing HELP at any time during the execution of EDPDB will call the on-line help utility. Another way to get on-line help for a particular command is to type the leading keyword of the command and to end the input line with "/?" (without the quotation marks). This is convenient when the user knows the command-leading keyword but is unsure about the detailed parameters.

There are several ways to terminate EDPDB. A standard way is via the QUIT command. A quick way is to input control-Z (or control-D on a unix computer, i.e. end-of-file). The difference is that the QUIT command will do housekeeping before exiting from the program, while control-Z will leave the .edp and .scr files created by the program. Saving these files is sometimes desirable, since they contain execution history.

Selection Commands

The currently available selection commands are as follows:
ATOM , B , CA , CHAIN , EXCLUDE , EXTRACT , GROUP , INCLUDE , INITIAL , LOAD , MAIN , MATCH , MMI , MMIR , MORE , NAYB , NAYBR , RESIDUE , SIDE , SWAP , W , X,Y,Z and ZONE .

Selection commands define the ON/OFF status of the records, as necessary to prepare groups of atoms for other EDPDB functions. These commands are followed by an argument specifying the atoms, chains, molecules, etc., to be selected. E.g. ATOM CA selects all the alpha-carbon atoms. The commands shown in bold face can also be used without an argument to show on the terminal the current atoms, chains, etc., that are active. EDPDB selects records (i.e. atoms) by matching any one of the PDB fields (except entry number) with selection criteria defined by the user (see below). Intramolecular and intermolecular distances between symmetry-related molecules can also be used as selection criteria.

Different selection strategies can be constructed with EDPDB. For instance, two consecutive selection commands work independently and select the records which satisfy either of the selection criteria. Thus, the following commands select both the backbone atoms and the C-beta atoms.

	MAIN
	ATOM CB

A nested command can be used to perform a logical AND selection. For example, the following one-line nested command will select all polar atoms (i.e. nitrogen or oxygen) which are within a 4.0 A sphere centered at the OH atom of residue 24.

	NAYB 4.0   24   OH FROM {ATOM O* N*}

The asterisk "*" is a wild card for any character(s) in the atom name. In this selection command, the order of the criteria does not affect the result. It works the same as the following command.

	ATOM O* N* FROM {NAYB 4.0   24   OH}

A multiply nested command can be used for a more complicated AND selection.

There are several ways to make a logical NOT selection. The EXCLUDE command followed by another selection command is a convenient way to turn the selected records OFF. For example, the following two commands can be used to select records which have a B-factor between 20 and 30 (A^2).

	B > 19.9999
	EXCLUDE B > 30.0

The first command selects all records with a thermal factor equal to or larger than 20 A^2. The second command then turns off those records with thermal factors larger than 30 A^2. SWAP is another command that performs a logical NOT selection. In some selection commands an EXCEPT option is also available. For example, the following two commands select glycine residues from molecule A (i.e. CHAIN A).

	CHAIN A
	EXCLUDE RESIDUE EXCEPT GLY
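
The selection strategies described above behave like set operations on the ON records: consecutive selections form a union, a nested FROM {...} selection forms an intersection, and EXCLUDE forms a difference. The Python sketch below illustrates this reading with made-up records. The helper names select, atom and b_greater are hypothetical, and the fixed set standing in for a NAYB result is invented for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical records: (id, atom name, residue type, chain, B-factor)
ATOMS = [
    (0, "N", "GLY", "A", 18.0),
    (1, "CA", "GLY", "A", 22.0),
    (2, "CB", "ALA", "A", 25.0),
    (3, "OG1", "THR", "A", 31.0),
    (4, "O", "ALA", "B", 29.0),
    (5, "OH", "TYR", "B", 20.0),
]


def select(predicate):
    """Return the ids of all records for which predicate(record) is true."""
    return {rec[0] for rec in ATOMS if predicate(rec)}


def atom(*patterns):
    """Match atom names against patterns in which * is a wild card."""
    return select(lambda r: any(fnmatchcase(r[1], p) for p in patterns))


def b_greater(limit):
    return select(lambda r: r[4] > limit)


# OR: consecutive selections accumulate (MAIN, then ATOM CB).
on = set()
on |= atom("N", "CA", "C", "O")
on |= atom("CB")
backbone_plus_cb = set(on)

# AND: a nested selection is an intersection, so the order does not matter.
near = {3, 4, 5}  # stand-in for the result of a NAYB distance search
polar_near = atom("O*", "N*") & near

# NOT: EXCLUDE removes the matching records from the ON set.
b_20_to_30 = b_greater(19.9999) - b_greater(30.0)

# EXCEPT inverts a criterion: CHAIN A, then EXCLUDE RESIDUE EXCEPT GLY.
chain_a = select(lambda r: r[3] == "A")
gly_in_a = chain_a - select(lambda r: r[2] != "GLY")

print(sorted(backbone_plus_cb), sorted(polar_near), sorted(b_20_to_30), sorted(gly_in_a))
```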

The command GROUP defines a set of selection criteria that can be used subsequently as a group. The following sequence illustrates the use of the GROUP command.

	...(selection commands)
	GROUP TMP
	INITIALIZE
	RESIDUE ASP FROM TMP

The GROUP command defines the pre-selected records as a group named TMP. The INITIALIZE command reinitializes the buffer used to store the ON atoms and the last line selects as the ON atoms only the Asp residues from the TMP group.

Calculation Commands

The currently available calculation commands are
AB , ABC , ABCD , ACCESS , ANALYZE , AVB , AXIS , BRIDGE , DIFF , DISTANCE , EULER , HARKER , CLIQUE , CLOSER , CORRELATION , JIGGLE , MMIG , MOMENTINERTIA , MOVECENTER , NEWXYZ , OVERLAY , PLANAR , POLAR , PV , RATIO , RMSW , RTN , SHAPE , SORT , SUMW , VOLUME , VP and VV .

EDPDB can carry out a variety of structural analyses and coordinate rotation and translation operations.

Rotation and translation matrices can be determined, stored, manipulated (e.g. inverted, multiplied, etc.) and applied to selected atoms later. For example, the command OVERLAY calculates the matrix that optimizes the superposition between two sets of coordinates (McLachlan, 1979). The command MOMENTINERTIA determines the matrix that brings the three principal axes of rotation of the model (i.e. the selected atoms) into coincidence with the X,Y,Z coordinate axes. The MOVECENTER command defines the matrix that brings the center of mass of the current model as close as possible to the origin (or to any user specified position) consistent with all possible crystallographic symmetry operators.

With the RTN command, one can determine and/or apply various coordinate transformations. These include

(a) apply a transformation matrix read from a file
(b) apply a given crystallographic symmetry operation
(c) apply a rotation specified in Eulerian angles (either ZY'Z" convention or ZX'Z" convention)
(d) apply a rotation specified in polar angles
(e) transform from an orthogonal to a non-orthogonal coordinate system or vice versa
(f) apply a rotation about a given bond
(g) center a molecule on the origin of coordinates
(h) apply a rotation to reset a torsion angle from any arbitrary angle to a given value (e.g. for setting side-chain torsion angles to standard values)
(i) make a coordinate transformation based on a three-atom superposition (e.g. for mutating an amino acid)
(j) determine and apply the transformation that will superimpose two atoms consistent with a given rotation (e.g. for applying non-crystallographic symmetry). The two atoms might represent heavy atoms bound to two protein molecules and the rotation might be determined from a self-rotation function search.

With the calculation commands HARKER, EULER and POLAR, the X,Y,Z input fields can contain crystallographic coordinates or angular values rather than coordinates in the standard Cartesian PDB framework. In such cases, the PDB format is simply used as a vehicle to input necessary data to EDPDB.

The result of a calculation will be written to the standard output device (e.g. the terminal if the program is being used interactively) and/or to a scratch file (input_file_name.scr). Some results may overwrite the field(s) of related records, e.g. the ACCESS command (used for solvent accessibility calculation) will overwrite the occupancy (W) field of each ON atom with the solvent-accessible area of that atom. These calculation commands can therefore be considered as editing commands as well.

Similarly, some calculation commands can also be considered as selection commands.

EDPDB uses the Lee-Richards algorithm (Lee & Richards, 1971) to calculate solvent-accessible area (command ACCESS) and McLachlan's algorithm (McLachlan, 1979) to perform least-squares structural superposition (command OVERLAY).
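
To make the idea of a solvent-accessible area concrete, the sketch below estimates it by point sampling in the manner of Shrake and Rupley. This is a different and simpler numerical scheme than the Lee-Richards algorithm used by EDPDB, and it is shown only as an illustration. The function names, the probe radius of 1.4 A and the atomic radii are assumptions, since the contents of acc.dat are not given here.

```python
import math


def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return points


def accessible_areas(coords, radii, probe=1.4, n_points=480):
    """Per-atom accessible area (A^2): the fraction of test points on the
    probe-expanded sphere of each atom that lies outside all other expanded spheres."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, radii)):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            p = (ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz)
            buried = any(
                j != i and math.dist(p, cj) < rj + probe
                for j, (cj, rj) in enumerate(zip(coords, radii))
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


# Two hypothetical atoms 2.0 A apart, each with an assumed radius of 1.8 A.
print(accessible_areas([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.8, 1.8]))
```

The brute-force neighbor test is quadratic in the number of atoms. It is adequate for a demonstration but would need a neighbor list for a whole protein.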

Editing Commands

The currently available editing commands are
BLANK , PERMUTE , SET , SETA , SETB , SETC , SETE , SETI , SETR , SETT , SETW , SWITCHWB and UPDATE .

Editing commands modify the output fields, i.e. the text strings, and occupancy or thermal factor values of selected records. In principle, every character in the text string can be modified. The editing commands do not change either the internal coordinates (which may still be used for geometry calculations), or the internal atom type, residue type (e.g. Ala) or residue name (e.g. A99) (which are used for selection criteria).

For a typical PDB coordinate file, the occupancy (W) field is usually less informative than the coordinate and B factor fields. Therefore, with some calculations (e.g. ACCESS and DISTANCE), the W field is overwritten with the results. In other calculations (e.g. OVERLAY and MOMENTINERTIA) the values in the W field may be used as weights. In this case, the weight value should be defined appropriately before performing the calculation.

Some commonly used programs create and accept non-standard "PDB format" files in which, for example, a field may be shifted by one or two columns. This can cause difficulties in sharing coordinate files among different programs. The PERMUTE command allows inconsistencies of this sort to be easily corrected.

EDPDB also provides an easy way to interchange labels so as to change from one convention to another. In the following example, the OH atoms of residue type SOL are renamed as HOH, while any OH atoms that might be associated with another residue type (e.g. TYR) are not affected.

	ATOM OH FROM {RESIDUE SOL}
	SETA HOH

Input/Output Commands

The input/output commands are APPEND , EXIT , LIST , READ , SEQUENCE and WRITE .

As an editor program, EDPDB can output an edited file to the terminal or create a new file in the PDB format. The user also has the ability to reformat the output by (1) using the editing command PERMUTE to rearrange the text string, and (2) using a user-defined format in an output statement. By using the calculation command SORT, one can rearrange the records in a PDB file in many different ways. The APPEND command provides a simple way to merge blocks of coordinates in a user-preferred sequence.

Additional PDB files including those created within the current EDPDB execution can be accessed with the READ command. Peptide chains or molecules from such files can be distinguished by using additional chain identifiers.

A combination of a selection command with a LIST command provides a fast way to search for particular records in a PDB file. The following command sequence, for example, will list the solvent molecules with thermal factors less than 10 A^2.

	INITIALIZE
	B < 10 FROM {RESIDUE SOL}
	LIST

Control Commands

The control commands and related options include
@macro_file , CLOSE , GOTO , label_statement: , MAXERR , PARAMETER , PAUSE , QUIT , RESET , RETURN , REWIND and SYSTEM .

Control commands are used to change the status of either the program or the input/output files. Many commands in this category were designed to enhance the programming ability of EDPDB. It is also possible for a user to call a system command (e.g. a DCL command in VMS, or a shell command in unix) or to create a sub-process without terminating the EDPDB program.

Definition Commands

The current definition commands include ALIAS , CELL , DFAB , DFABC , DFABCD , DFBRG , DFCA , DFNEWXYZ , DFMAIN , DFRES and SYMMETRY .

Each command is followed by an argument that defines sets of atoms or templates or parameters that are used in subsequent operations. When used without an argument the command will show on the terminal the definition, if any, that is currently in effect. One philosophy of EDPDB is that it should be as flexible and as general-purpose as possible. For example, the command DFABCD is used to define a template for the calculation of torsion angles. The program is not, however, restricted to the standard backbone and side-chain torsion angles. It is possible to use DFABCD to specify any type of pseudo torsion angle, e.g. the torsion angle formed by four sequential C-alpha atoms.

As a general purpose program, EDPDB allows the user to overwrite the default definitions, which might be too specific in some cases. For example the main-chain atoms are usually defined by the default definition DFMAIN N CA C O but this can be replaced by the command statement DFMAIN N CA C O CB.

The ALIAS command allows the user to define his/her own keywords in terms of meaningful EDPDB command(s). For example, a user-defined keyword DIRE (using ALIAS DIRE SYSTEM WAIT DIRECTORY) might be used to call the VMS/DCL command DIRECTORY. Thus, by typing DIRE the user can, for example, list the file names in the current directory without terminating EDPDB.

Miscellaneous Commands

These commands are designed to make EDPDB more user friendly and/or to provide the user with help when needed.

They include comment , C , FILE , HELP , PROMPT , SETENV and SHDF .

EXAMPLES

In this section, several examples are given to show the use, the flexibility and the potential of EDPDB. Appendix A can be consulted for additional information on the individual commands. Any text in an input line following a semicolon is regarded as a comment and is ignored by the command interpreter of EDPDB.

(1) Preliminary inspection and statistical analysis of a file.

	INITIALIZE	; initialize working buffer space for ON atoms
	ZONE ALL 	; select all the atom records
	ZONE		; show the zone information
	RESIDUE 	; show the number of residues for each type of residue
	ATOM 		; show the number of atoms for each type of atom
	ANALYZE 	; give statistical information for X, Y, Z, W and B

(see Appendix A)

(2) Backbone zeta angle calculations. The angle zeta of an amino acid residue is the pseudo torsion angle defined by connecting in sequence the atoms C-alpha, N, C and C-beta. For a well-refined protein structure, this angle should equal approximately +33.5 degrees for every residue. A negative zeta for an L-amino acid residue indicates a chirality error.

	DFABCD CA N C CB 1 1 1 1
		; define the template for a torsion angle calculation
		; where the numbers indicate that all the atoms are within the same
		; residue 
	INITIALIZE
	ATOM N CA C CB	; select the atoms needed for the calculation
	ABCD           	; calculate all the torsion angles that fit the above template
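
The quantity reported for each template match is an ordinary four-atom torsion angle. The Python sketch below shows one standard way to compute it. It is not EDPDB's implementation, the function name torsion_angle is hypothetical, and the sign is assumed to follow the usual IUPAC convention. The coordinates are an idealized test geometry, not a real residue.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_angle(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 in degrees, in the range -180 to +180."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# For zeta the four points would be the CA, N, C and CB atoms of one residue.
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
# Reflecting the fourth atom through the X-Z plane reverses the sign,
# which is why a negative zeta signals inverted chirality.
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, -1, 1)))
```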

(3) Search for all interactions between nitrogen and oxygen (i.e. potential hydrogen bonds) within a protein.

	INITIALIZE
	ATOM N* O*
		; select nitrogen and oxygen atoms
		; * indicates a wild card to include all types of N and O atoms
	GROUP P ; define these atoms as group P (i.e. "polar")
	DISTANCE P 2.0 3.5 3
		; list every polar atom pair with a distance
		; between 2.0 and 3.5 A and separated by at least three residues
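
The search in example (3) amounts to a filtered pairwise distance scan. The sketch below is a brute-force illustration in Python rather than the algorithm used by the DISTANCE command. The function name close_pairs and the record layout are hypothetical, a single chain is assumed, and "separated by at least three residues" is interpreted as a residue-number difference of three or more.

```python
import math
from itertools import combinations


def close_pairs(atoms, dmin=2.0, dmax=3.5, min_sep=3):
    """List atom pairs whose distance lies in [dmin, dmax] and whose
    residue numbers differ by at least min_sep."""
    pairs = []
    for a, b in combinations(atoms, 2):
        if abs(a["resnum"] - b["resnum"]) < min_sep:
            continue
        d = math.dist(a["xyz"], b["xyz"])
        if dmin <= d <= dmax:
            pairs.append((a["label"], b["label"], round(d, 2)))
    return pairs


# Made-up polar atoms, i.e. the members of group P.
polar_atoms = [
    {"label": "N 10", "resnum": 10, "xyz": (0.0, 0.0, 0.0)},
    {"label": "O 11", "resnum": 11, "xyz": (0.0, 3.0, 0.0)},
    {"label": "O 14", "resnum": 14, "xyz": (2.9, 0.0, 0.0)},
]
print(close_pairs(polar_atoms))
```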

(4) Structural superposition or overlay of molecules A and B, each assumed to contain 164 residues.

	INITIALIZE
	GROUP A FROM {MAIN A1-A164}
		; define the backbone atoms of molecule A as group A
	GROUP B FROM {MAIN B1-B164}
		; define the backbone atoms of molecule B as group B
	LOAD B
		; select group B which contains the backbone atoms of molecule B
	OVERLAY A rtn.dat
		; calculate the matrix that gives the optimum least-squares superposition
		; of molecule B on molecule A and write the matrix to a file named
		; rtn.dat
	INITIALIZE
	AXIS RTN.DAT
		; analysis of the rotation-translation matrix written in the file rtn.dat
	CHAIN B ; select the whole of molecule B
	RTN FILE RTN.DAT
		; apply the rotation-translation matrix of rtn.dat to the currently selected
		; atoms, i.e. to the whole of molecule B

(5) Check and resequence a coordinate file according to the standard dictionary file pdbstd.dat supplied with the program.

	INITIALIZE
	ZONE ALL
	SORT DFRES
		; read file pdbstd.dat by default
		; check the atom order, side-chain chirality and labeling
	SET ENTRY
		; reset the entry number consistent with the new order of the records
	EXIT SORTED.PDB HEADER
		; create a new PDB coordinate file named sorted.pdb with the old file
		; headers but atoms reordered

(6) Check all possible intermolecular contacts in a crystal with two molecules, A and B, each of 164 amino acid residues, per asymmetric unit.

	CELL 80.0, 80.0, 50.0, 90.0, 90.0, 120.0, 6
		; input cell parameters and the alignment convention which in this case is
		; #6 X//a, Y//b*, Z//(a x b*)
	@symmetry R32                         
		; input symmetry operators from the file symmetry.edp (see below)
	INITIALIZE
	GROUP A FROM {ZONE A1-A164}
		; define molecule A as group A
	GROUP B FROM {ZONE B1-B164}
		; define molecule B as group B
	LOAD A	; select molecule A
	MMIG A 3.5 
		; check crystal contacts of atoms within 3.5 A, between symmetry-related
		; A molecules
	MMIG B 3.5 
		; between all A molecules and all B molecules
	INITIALIZE
	LOAD B
	MMIG B 3.5 
		; between symmetry-related B molecules

The file symmetry.edp contains the symmetry operators for most of the commonly used space groups. Simple translations are not necessary since they are handled automatically.

	; symmetry operators for space group R32
	SYMMETRY  X, Y, Z
	SYMMETRY  -Y, X-Y, +Z
	SYMMETRY  -X+Y, -X, +Z
	SYMMETRY  Y, X, -Z
	SYMMETRY  X-Y, -Y, -Z
	SYMMETRY  -X, -X+Y, -Z
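
Operators written in this notation map directly onto a 3x3 matrix plus a translation acting on fractional coordinates. The Python sketch below parses such strings. It is an illustration only and not the parser used by the SYMMETRY command. The names parse_symop and apply_symop are hypothetical, and only unit coefficients and simple fractions such as 1/2 are handled.

```python
import re
from fractions import Fraction

# One term is an optional sign followed by either a fraction or an axis letter.
TERM = re.compile(r"([+-]?)(?:(\d+)/(\d+)|([XYZ]))")


def parse_symop(text):
    """Return (rotation rows, translation) for an operator such as '-X+Y, -X, +Z'."""
    rotation, translation = [], []
    for component in text.upper().replace(" ", "").split(","):
        row, shift = [0, 0, 0], Fraction(0)
        for sign, num, den, axis in TERM.findall(component):
            factor = -1 if sign == "-" else 1
            if axis:
                row["XYZ".index(axis)] += factor
            else:
                shift += factor * Fraction(int(num), int(den))
        rotation.append(row)
        translation.append(shift)
    return rotation, translation


def apply_symop(op, frac):
    """Apply a parsed operator to fractional coordinates."""
    rotation, translation = op
    return [
        sum(r * f for r, f in zip(row, frac)) + t
        for row, t in zip(rotation, translation)
    ]


threefold = parse_symop("-Y, X-Y, +Z")
point = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10)]
print(apply_symop(threefold, point))
```

Exact fractions are used so that repeated application of an operator, for instance three applications of the threefold rotation, returns the starting point without rounding error.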

(7) Construct a file that contains the average thermal factor, B, for each residue, where the averaging is (a) over the whole residue, (b) over the backbone atoms, and (c) over the side-chain atoms of each residue.

	INITIALIZE
	CA 	; select C-alpha atoms only
	BLANK	; blank the X, Y and Z fields of the C-alpha atom records
		; these will be used to write the output
	MORE 
		; extend the selection to all residues which have C-alpha atoms, i.e. select all
		; amino acid residues
	AVB X   ; calculate average B for each residue
		; write the result to the X field of the C-alpha atom
	EXCLUDE MAIN
		; keep side-chain atoms only
	AVB Z   ; calculate side-chain average B for each residue
		; write the result to the Z field of the C-alpha atom
		; for glycine the field is left blank
	INITIALIZE
	MAIN    ; select main-chain atoms
	AVB Y	; calculate main-chain average B for each residue
		; write the result to the Y field of the C-alpha atom
	INITIALIZE
	CA  	; recall the modified C-alpha atom records
	WRITE AVB.LIS
		; output the result to a file named avb.lis
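
The bookkeeping performed in example (7) can be paraphrased outside EDPDB as a per-residue grouping of B values. The Python sketch below is such a paraphrase. The function name average_b and the input tuples are hypothetical, and the backbone is taken to be the default N, CA, C and O atoms.

```python
from collections import defaultdict

MAIN_CHAIN = {"N", "CA", "C", "O"}  # default backbone definition


def average_b(atoms):
    """atoms: iterable of (residue id, atom name, B).
    Returns {residue id: {"all": ..., "main": ..., "side": ...}}."""
    groups = defaultdict(lambda: {"all": [], "main": [], "side": []})
    for res_id, name, b in atoms:
        groups[res_id]["all"].append(b)
        groups[res_id]["main" if name in MAIN_CHAIN else "side"].append(b)

    def mean(values):
        # None plays the role of the blank field left for glycine.
        return sum(values) / len(values) if values else None

    return {
        res_id: {key: mean(values) for key, values in parts.items()}
        for res_id, parts in groups.items()
    }


# Made-up B values for one glycine and one alanine.
records = [
    ("A1", "N", 10.0), ("A1", "CA", 12.0), ("A1", "C", 14.0), ("A1", "O", 16.0),
    ("A2", "N", 10.0), ("A2", "CA", 10.0), ("A2", "C", 10.0), ("A2", "O", 10.0),
    ("A2", "CB", 20.0),
]
print(average_b(records))
```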

(8) Calculate the angle and the shortest distance between the axes of two alpha-helices in a protein. Assume that the helical regions include residues 115-122 and 126-134.

	INITIALIZE
	GROUP TGT FROM {MAIN 115-121}
	MAIN 116-122
	OVERLAY TGT V1.DAT
		; the axis of helix 115-122 will be defined as the rotation axis that
		; superimposes the backbone atoms of residues 115-121 on the partially
		; overlapping residues 116-122
		; calculate this transformation and store in the file v1.dat
	INITIALIZE
	GROUP TGT FROM {MAIN 126-133}
	MAIN 127-134
	OVERLAY TGT V2.DAT
		; similarly, determine the transformation that will define the axis of helix
		; 126-134
	INITIALIZE
	AXIS V1.DAT V1
		; extract the vector, v1, which corresponds to the rotation axis in the
		; matrix v1.dat
	AXIS V2.DAT V2
		; extract the vector, v2, which corresponds to the rotation axis in the
		; matrix v2.dat
	VV V1 V2
		; calculate the angle and distance between the two axes
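The geometry behind the last step can be sketched in Python under the assumption that each axis is given as a point on the line plus a direction vector. How EDPDB stores an axis internally is not stated here, so the function name and argument layout below are hypothetical.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _norm(a):
    return math.sqrt(_dot(a, a))


def angle_and_distance(p1, d1, p2, d2):
    """Angle (degrees) and shortest distance between two infinite lines.

    Each line is given by a point p and a non-zero direction d.
    """
    cos_angle = _dot(d1, d2) / (_norm(d1) * _norm(d2))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    w = (p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2])
    n = _cross(d1, d2)
    n_len = _norm(n)
    if n_len < 1e-12:
        # Parallel lines: distance from p2 to the first line.
        distance = _norm(_cross(w, d1)) / _norm(d1)
    else:
        # Skew or intersecting lines: project w on the common normal.
        distance = abs(_dot(w, n)) / n_len
    return angle, distance


# Two perpendicular axes separated by 3 units along z.
angle, distance = angle_and_distance((0, 0, 0), (1, 0, 0), (0, 0, 3), (0, 1, 0))
```

The sketch treats the axes as infinite lines, so the reported distance may refer to points that lie beyond the ends of the helices, and the angle depends on the sense chosen for each direction vector.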

(9) Search for candidate sites at which a disulfide bond could be introduced by mutation. The C-beta - C-beta distance should be between 4.0 and 6.5 A, the C-alpha - C-beta - C-gamma angle should be larger than 80.0 degrees, and the loop formed by the disulfide bond should, for example, be longer than 50 residues (e.g. Sowdhamini et al., 1989).

	INITIALIZE
	DFBRG CA CB CB CA X 0 Y 0 ,,,, 4.0, 6.5, 80.0, 180.0, 0.0, 360.0 WXYZ 50
		; create a template for the search
	ATOM CA CB
		; select the atoms needed for the search, define the search range
	BRIDGE 	; perform the search
		; list all the candidates that fit the template
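A simplified version of this screen can be sketched in Python. The sketch applies only the distance and loop-length criteria and omits the angle test. It also assumes that the loop length can be approximated by the difference in residue numbers. The function name and input layout are hypothetical and do not describe how BRIDGE works internally.

```python
import math
from itertools import combinations


def disulfide_candidates(cb_atoms, d_min=4.0, d_max=6.5, min_loop=50):
    """cb_atoms: list of (residue_number, (x, y, z)) for C-beta atoms.

    Returns (residue_i, residue_j, distance) for every pair whose C-beta
    separation lies in [d_min, d_max] and whose residue numbers differ by
    more than min_loop.
    """
    hits = []
    for (i, pi), (j, pj) in combinations(cb_atoms, 2):
        d = math.dist(pi, pj)
        if d_min <= d <= d_max and abs(j - i) > min_loop:
            hits.append((i, j, d))
    return hits


# Made-up coordinates: only the pair (1, 60) satisfies both criteria.
example_cb = [(1, (0.0, 0.0, 0.0)), (60, (5.0, 0.0, 0.0)), (10, (5.5, 0.0, 0.0))]
candidates = disulfide_candidates(example_cb)
```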

(10) Calculate the correlation between the solvent-accessible area (SAA) and the average thermal factor, including all residues in the protein.

	INITIALIZE
	MORE FROM {CA}
		; select all amino acid residues, i.e. the protein part of the PDB file
	ACCESS	; calculate the solvent-accessible area for each atom
		; default van der Waals radii are assumed
		; store the value in the W field
	AVB X	; store the average B in the X field of the C-alpha atom
	SUMW Y	; store the summation of SAA in the Y field of the C-alpha atom
	INITIALIZE
	CA	; select the CA atom records in which the X and Y fields store the
		; average B and summation of SAA, respectively
	GROUP TMP
	CORRELATION TMP X Y
		; calculate the correlation between fields X and Y
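The precise statistic reported by CORRELATION is not specified above. Assuming it is the ordinary linear (Pearson) correlation coefficient, the calculation can be sketched in Python as follows; the function name is an assumption made for this illustration.

```python
import math


def pearson(xs, ys):
    """Linear correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up per-residue values: average B (X field) and summed SAA (Y field).
average_b = [10.0, 20.0, 30.0]
summed_saa = [5.0, 10.0, 15.0]
r = pearson(average_b, summed_saa)
```

The sketch divides by zero if either sequence is constant, which a production routine would need to guard against.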

(11) Calculate the solvent-accessible area (SAA) of molecule A in the presence and absence of molecule B. The difference is the solvent-accessible area of molecule A that is buried in the interface.

	INITIALIZE
	GROUP MOLB FROM {CHAIN B}
	CHAIN A
	ACCESS MOLB
		; calculate the SAA of molecule A in the presence of molecule B
	ACCESS	; calculate the SAA of molecule A in the absence of molecule B

(12) Assume that a protein contains N-terminal and C-terminal domains, defined respectively by the amino acids 1-60 and 80-162. Assume also that the structure of this protein is determined in two different situations, A and B (e.g. in two crystal forms). Calculate the change of the interdomain hinge-bending angle between the N-terminal and C-terminal domains in the two independent protein structures, A and B.

	INITIALIZE
	GROUP A FROM {MAIN A1-A60}
	MAIN B1-B60
	OVERLAY A OVERLAY_N_DOMAIN.DAT
		; calculate the overlay matrix between the two N-terminal domains
		; starting from an arbitrary position
		; store in the file overlay_n_domain.dat
	CHAIN B	; select the molecule B
	RTN FILE OVERLAY_N_DOMAIN.DAT
		; superimpose molecule B on molecule A
		; by applying the matrix stored in overlay_n_domain.dat which
		; will superimpose the N-terminal domains
	INITIALIZE
	GROUP A FROM {MAIN A80-A162}
	MAIN B80-B162
	OVERLAY A OVERLAY_C_DOMAIN.DAT
		; now calculate the matrix that will superimpose the C-terminal domains
		; starting from the position where the two N-terminal domains are
		; already superimposed
		; this is the desired transformation
	INITIALIZE
	AXIS OVERLAY_C_DOMAIN.DAT
		; extract the rotation axis and the rotation angle from the transformation
		; matrix; the angle is the change of the hinge-bending angle
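The extraction of an axis and an angle from a rotation matrix can be sketched in Python with the standard trace formula. This is a generic illustration rather than the algorithm used by AXIS; the function name and the nested-list matrix layout are assumptions.

```python
import math


def rotation_axis_angle(r):
    """r: 3x3 proper rotation matrix as nested lists.

    Returns (axis, angle_in_degrees), with the axis as a unit vector.
    """
    trace = r[0][0] + r[1][1] + r[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    angle = math.acos(cos_angle)
    s = 2.0 * math.sin(angle)
    if abs(s) < 1e-9:
        # The antisymmetric part vanishes, so this formula cannot
        # recover the axis when the angle is 0 or 180 degrees.
        raise ValueError("axis cannot be recovered for 0 or 180 degrees")
    axis = ((r[2][1] - r[1][2]) / s,
            (r[0][2] - r[2][0]) / s,
            (r[1][0] - r[0][1]) / s)
    return axis, math.degrees(angle)


# A rotation of 30 degrees about the z axis.
c, s30 = math.cos(math.radians(30.0)), math.sin(math.radians(30.0))
axis, angle_deg = rotation_axis_angle([[c, -s30, 0.0], [s30, c, 0.0], [0.0, 0.0, 1.0]])
```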

The above examples are by no means exhaustive; they are intended only to illustrate the potential of EDPDB.

EPILOGUE

We take it as self-evident that efficiency is desirable. For example, when preparing a manuscript, one likes to use a word processing program, because such a program combines useful editing utilities and makes a complex task much simpler. Structural biologists in general, and protein crystallographers in particular, are constantly dealing with structure coordinate files. Attempting to edit or evaluate such files by sitting in front of a terminal and repeatedly typing a group of keys is tedious and time-consuming. There is a need for a crystallographic or structural "word processor", so that more time is available to concentrate on the underlying science. It is hoped that in this sense, EDPDB can be a useful tool for structural biologists. In summary, EDPDB has the following attributes:

- it works like a line mode editor.
- it is easy to use.
- on-line help is available.
- specific structural parameters that may be required for a given task are input from simple database files, rather than coded in the program.
- the program is very versatile. It has many features that were developed for macromolecular crystallography but it also permits detailed structural analysis of a single molecule.
- one can write macros to extend its capabilities. It is programmable.

EDPDB is available from the author, who can be reached by e-mail at cai-zhang@omrf.ouhsc.edu.

ACKNOWLEDGEMENTS

We thank numerous colleagues in the laboratories of B.W.M. and Dr. S.J. Remington (University of Oregon), particularly Andrew Morton and Dr. Dirk Heinz, for their continuous encouragement, stimulating discussions and help in proofreading the on-line help menu. This work was supported in part by NIH grant GM20066.

REFERENCES

Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F. Jr., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). J. Mol. Biol. 112, 535-542.

Connolly, M. L. (1983). Science 221, 709-713.

Grindley, H. M., Artymiuk, P. J., Rice, D. W. & Willett, P. (1993). J. Mol. Biol. 229, 707-721.

IUPAC-IUB Commission on Biochemical Nomenclature. Abbreviations and Symbols for Description of the Conformation of Polypeptide Chains (1970). J. Mol. Biol. 52, 1-17.

Jones, T. A. (1978). J. Appl. Cryst. 11, 268-272.

Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. (1993). J. Appl. Cryst. 26, 283-291.

Lee, B. & Richards, F. M. (1971). J. Mol. Biol. 55, 379-400.

McLachlan, A. D. (1979). J. Mol. Biol. 128, 49-79.

Moras, D., Podjarny, A. D. & Thierry, J. C., Eds. (1991). "Crystallographic Computing 5: From Chemistry to Biology", International Union of Crystallography, Oxford University Press.

Morris, A. L., MacArthur, M. W., Hutchinson, E. G. & Thornton, J. M. (1992). Prot.: Struct. Funct. Genet. 12, 345-364.

Sowdhamini, R., Srinivasan, N., Shoichet, B., Santi, D. V., Ramakrishnan, C. & Balaram, P. (1989). Prot. Engin. 3, 95-103.

APPENDIX A

A Brief Summary of the Commands of EDPDB (V94A)

In the following, the currently available EDPDB commands are summarized in alphabetical order. A more detailed account is given in the on-line help menu.

(Omitted ...)