Structure verification (CHECK)

Introduction.

The command CHECK will bring you to the so-called CHECK menu. This menu holds options that each check one or more aspects of protein structures. Most checks detect exceptional situations, such as a contact that is seldom seen in the database, but hard errors, such as a wrong SCALE matrix in a PDB file, can also be detected.

Several of the commands in this menu can also be executed from another menu. For example, CHICHK evaluates and checks torsion angles. This option can also be called as EVACHI from the CHIANG (torsion angle) menu.

Several options in the check menu are so-called 'terminal' options. That means that they can destroy the status of the soup, and will definitely leave WHAT IF in an undefined state after the option has finished.

Completely checking a protein (FULCHK)

The command FULCHK will cause WHAT IF to write a complete report about a protein structure. You will get the output in LaTeX format in a file "pdbout.tex", and in plain text format in "pdbout.txt". Obviously pictures can only be given in the LaTeX output. If you want to use the LaTeX output, you will need the latex program and some others. For your convenience suitable versions of these programs are archived on our anonymous ftp site "swift.embl-heidelberg.de" in the directory "/whatif/support".

To use the LaTeX output, you can type:


latex pdbout           (to reformat the file)
xdvi pdbout            (to preview the output)
dvips pdbout           (to make postscript output)
lpr pdbout.ps          (or a similar command, to print the postscript file)

A maximum of 100 lines will be given in any table. If more than 100 problems should be listed, the table is truncated at 50 lines, and the total number of lines is written at the bottom. Since most tables are sorted such that the worst numbers are at the top, this should not be a problem. If you want to see the whole list anyway, you can get it by running the individual check while creating a logfile (see DOLOG).

FULCHK is a terminal option. That means that you cannot run FULCHK in the middle of a WHAT IF session. You run FULCHK on one molecule, preferably in a "fresh" WHAT IF. After FULCHK has finished, you are immediately asked to terminate the session with FULLSTOP.

The FULCHK option writes a human-readable text file, and also a TeX style file that contains several kinds of graphics. If you want to get the programs required for some of these plots, please see the chapter on licenses.

Running only a subset of the checks (FSTCHK)

The command FSTCHK does the same as the FULCHK option. However, rather than running all checks, only a subset of all checks is executed. You can control which options are skipped and which are executed with the TODO.CHK file (of which there is an example in your dbdata directory of the WHAT IF account). In this file the first three characters of each line are the Check-Id, and columns 4-6 are either 'YES' or 'NO'. The rest of each line is free; in the example file you can find out what the check does and how long it normally takes.
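The fixed-column layout described above can be read with a few lines of Python. This is only a sketch of the format as stated in this section, not WHAT IF's own parser. The Check-Ids in the example (BMP, BND, CHI) are hypothetical; consult the example TODO.CHK file for the real ones.

```python
def parse_todo_chk(text):
    """Map each three-character Check-Id to True (run) or False (skip)."""
    selection = {}
    for line in text.splitlines():
        if len(line) < 5:
            continue  # too short to hold an id plus a YES/NO flag
        check_id = line[:3]                   # columns 1-3
        flag = line[3:6].strip().upper()      # columns 4-6, 'NO' is blank padded
        if flag == "YES":
            selection[check_id] = True
        elif flag == "NO":
            selection[check_id] = False
        # Lines with any other flag are ignored in this sketch.
    return selection


# Hypothetical file content; the text after column 6 is free-form.
example = (
    "BMPYES bump check\n"
    "BNDNO  bond lengths\n"
    "CHIYES torsion angles\n"
)
for check_id, run in parse_todo_chk(example).items():
    print(check_id, "run" if run else "skip")
```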

Verifying normality of surfaces (ACCCHK)

The command ACCCHK will calculate and evaluate accessible surfaces. It will indicate whether the distribution of polar and apolar accessible and buried atoms looks normal or not. At present I am not yet sure how to interpret the numbers. This option is not yet finished, and is therefore not incorporated in the FULCHK report.

Checking distance of atoms to symmetry axes (AXACHK)

The command AXACHK will verify for each atom in the structure whether it has a distance bigger than 0.7 Angstrom to all proper symmetry axes. Any atom coming closer than this distance must form a "bump" to a symmetry related copy of itself. The only exception is a water molecule that is exactly on an axis; therefore WHAT IF will not complain in such a case.
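The geometric test behind this check is a point-to-line distance. The sketch below illustrates it under simple assumptions: the axis is given as a point plus a direction vector, and "exactly on an axis" is taken to mean within a small tolerance (ON_AXIS_TOL is an assumed value, not one from WHAT IF). The function names are illustrative only.

```python
import math

AXIS_CUTOFF = 0.7    # Angstrom, the limit quoted in the text
ON_AXIS_TOL = 1e-3   # assumed tolerance for "exactly on an axis"


def distance_to_axis(atom, axis_point, axis_direction):
    """Perpendicular distance from a point to an infinite line."""
    v = [a - p for a, p in zip(atom, axis_point)]
    d = axis_direction
    # |v x d| / |d| is the distance from the point to the line.
    cross = (
        v[1] * d[2] - v[2] * d[1],
        v[2] * d[0] - v[0] * d[2],
        v[0] * d[1] - v[1] * d[0],
    )
    norm_cross = math.sqrt(sum(c * c for c in cross))
    norm_d = math.sqrt(sum(c * c for c in d))
    return norm_cross / norm_d


def too_close_to_axis(atom, axis_point, axis_direction, is_water=False):
    """True if the atom would be reported; waters on the axis are exempt."""
    dist = distance_to_axis(atom, axis_point, axis_direction)
    if is_water and dist < ON_AXIS_TOL:
        return False
    return dist < AXIS_CUTOFF


# An atom 0.5 Angstrom away from a symmetry axis running along z.
print(too_close_to_axis((0.3, 0.4, 2.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```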

Checking for unusual short distances (BMPCHK)

The command BMPCHK activates a bump check that is rather different from the bump functions used by e.g. the DEBUMP option.

From a study of WHAT IF's database of high quality structures it was determined that no pair of non-hydrogen-bonded atoms should have an inter-atomic distance more than 0.4 Angstrom shorter than the sum of the two Van der Waals radii. For hydrogen bonded atoms this limit was found to be 0.55 Angstrom.

In BMPCHK, all interatomic distances between non-bonded atoms are calculated and verified against these rules. If two atoms do come closer, the amount by which the contact is too short is printed in a table. The table indicates whether the bump is between symmetry relatives (inter) or within the given asymmetric unit (intra).

A bump will never be reported between two atoms for which the sum of their atomic occupancies is less than 1.0.
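The rules above can be summarised in a small sketch. It assumes that the Van der Waals radii are supplied by the caller; the radii in the example are illustrative values, not WHAT IF's own, and the function names are hypothetical.

```python
import math

LIMIT_NORMAL = 0.40   # Angstrom, non-hydrogen-bonded pairs
LIMIT_HBOND = 0.55    # Angstrom, hydrogen-bonded pairs


def bump_overlap(pos_a, pos_b, radius_a, radius_b, h_bonded=False):
    """Amount (Angstrom) by which a contact is too short, or 0.0 if it is fine."""
    limit = LIMIT_HBOND if h_bonded else LIMIT_NORMAL
    shortest_allowed = radius_a + radius_b - limit
    distance = math.dist(pos_a, pos_b)
    return max(0.0, shortest_allowed - distance)


def is_reported(overlap, occupancy_a, occupancy_b):
    """A bump is only listed if the occupancies sum to at least 1.0."""
    return overlap > 0.0 and (occupancy_a + occupancy_b) >= 1.0


# Two atoms 3.0 Angstrom apart, each with an illustrative radius of 1.8.
overlap = bump_overlap((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), 1.8, 1.8)
print(round(overlap, 2), is_reported(overlap, 1.0, 1.0))
```

Note that the same pair would pass if the two atoms were hydrogen bonded, because the larger 0.55 Angstrom allowance then applies.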

Checking bond lengths (BNDCHK)

The command BNDCHK does not require any additional input. It will perform a number of checks on the chemical bonds in the structure.

First it will check whether all atoms in all protein and nucleic acid residues are present.

After that it will compare each bond in protein residues with the Engh and Huber distance parameters [See Engh and Huber, Acta Cryst. A47, 392-400 (1991)] and print a table of all bonds that differ by more than 4 standard deviations from the expected values.

As a third check, the RMS deviation from the mean Engh and Huber parameters is determined (expressed in standard deviations). This RMS value is expected to be around 1.0. If it is bigger than 1.5 or smaller than 0.666 WHAT IF will complain.
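The second and third checks can be sketched as follows. Each bond is represented as an (observed, expected, sigma) triple in Angstrom; the numbers in the example are illustrative only and should not be taken as the Engh and Huber parameters. The function names are hypothetical.

```python
import math


def bond_z_scores(bonds):
    """Deviation of each bond from its expected length, in standard deviations."""
    return [(observed - expected) / sigma for observed, expected, sigma in bonds]


def outliers(z_scores, cutoff=4.0):
    """Indices of bonds that deviate by more than `cutoff` standard deviations."""
    return [i for i, z in enumerate(z_scores) if abs(z) > cutoff]


def rms(values):
    """Root mean square of a non-empty list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def rms_is_suspicious(rms_z, low=0.666, high=1.5):
    """True if the RMS Z-score falls outside the range quoted in the text."""
    return rms_z < low or rms_z > high


# Illustrative (observed, expected, sigma) triples.
bonds = [(1.340, 1.330, 0.010), (1.520, 1.530, 0.020), (1.300, 1.230, 0.010)]
z = bond_z_scores(bonds)
print(outliers(z), rms_is_suspicious(rms(z)))
```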

Lastly, BNDCHK will determine whether the deviation from the Engh and Huber bond lengths is significantly correlated with the direction of the bond in the crystallographic unit cell. If such a correlation is found, a new unit cell is calculated in which the correlation is gone. If this message appears, the cell used during refinement is probably not accurate enough. We do not have any experience with what to do about it, though.

Evaluate unsatisfied buried H-bond donors and acceptors (BPOCHK)

The command BPOCHK will cause WHAT IF to list all buried unsatisfied hydrogen bond donors or acceptors. This check uses a very straightforward definition of a hydrogen bond. A more sophisticated check of unsatisfied hydrogen bond potential is part of the HNQCHK.

Checking torsion angles (CHICHK)

The command CHICHK is equivalent to the EVACHI command in the CHIANG menu.

All torsion angles in the molecule will be compared with the distribution of the same torsion angle in 150 of the 300 best refined proteins from the PDB. You will get a score for 'normality' and not for 'correctness' or energetics. In this score 0.0 means that this torsion angle value is as normal as it can be, and negative values represent less common conformations. Residue values below -2.0 warrant investigation, below -3.0 something strange must be happening.

For this analysis all torsion angles in the residue except omega are used.

Another part of the CHICHK verifies the phi/psi combination versus a Ramachandran plot. Residues that are in forbidden areas of the Ramachandran plot will be listed. Also, a separate check on omega values will be performed (for PRO and non-PRO residues), and residues with unusual values are listed.

Checking chain names (CHNCHK)

This check verifies the chain names in the PDB file. All residues with a certain chain name should be consecutive in the file, otherwise an error message will be given.
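The consecutiveness rule can be illustrated with a short sketch that takes the chain name of every residue in file order. It is an illustration of the rule as described here, not WHAT IF's implementation, and the function name is hypothetical.

```python
def non_consecutive_chains(chain_names):
    """Return chain names that reappear after another chain has started."""
    seen = set()
    problems = []
    previous = None
    for chain in chain_names:
        if chain != previous:
            # A new run starts here; it is an error if this name was used before.
            if chain in seen and chain not in problems:
                problems.append(chain)
            seen.add(chain)
            previous = chain
    return problems


# Chain A is interrupted by chain B and then continues.
print(non_consecutive_chains(["A", "A", "B", "B", "A"]))
```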

Checking for peptide plane flips (FLPCHK)

The command FLPCHK causes WHAT IF to compare all local backbone conformations (5 residue stretches) with similar conformations in the database (RMSD on alpha carbons less than 0.5 Angstrom). The RMSD between the backbone oxygen in the structure and the database positions is given. If this value for a residue is above 1.5, manual inspection of the peptide plane is advisable. The number of hits in the database is listed in brackets. This number should normally be 80, as that is the maximal number of hits WHAT IF looks for. If this number is considerably less than 80, the RMSD value for the oxygen position becomes a less sensitive measure of quality.

Checking validity of water molecules (H2OCHK)

The option H2OCHK will perform two checks on all water molecules in the soup.

For all clusters of water molecules H2OCHK will verify whether they are free-floating in the unit-cell, or touch the protein somewhere. If a cluster is free-floating this is reported as a problem: it is very unlikely that such clusters can be seen in the X-ray density, so the listed water molecules are probably refinement artefacts.

For all water molecules the closest protein molecule is located. If this is a molecule that is symmetry related to the ones given in the input file, a warning is given. For optimum usability of the file the listed waters should be moved such that they are closest to the untransformed protein molecule. See the MOVWAT option for this.

Checking the hand of chiral atoms (HNDCHK)

The command HNDCHK can be used to check for wrong handedness of chiral atoms in the twenty naturally occurring residues. All atoms with the wrong chirality will be listed.

Hydrogen bond network checks (HNQCHK)

HNQCHK runs, in a row, a set of commands from the HBONDS menu that relate to the HB2 options. For this, a complete calculation of the optimal hydrogen bond network in the protein is performed. A number of warnings can be generated from the result.

The optimization of the hydrogen bond network considers two possibilities for the side-chain conformations of HIS, ASN and GLN residues. The X-ray experiment cannot see the difference between the two conformations. If the orientation of the side chain of one of these residues in the optimized H-bond network is different from the orientation in the input file, a warning is given.

If any buried hydrogen bond donors do not have an acceptor, they are listed. In high resolution structures these do not occur, because it is energetically highly unfavourable!

If any polar side chain acceptor does not accept a hydrogen bond, the atom is listed.

From the optimized hydrogen bond network the protonation state of the HIS residues (HISD, HISE or HISH) can be deduced. Also, from the geometry of the HIS ring it is often possible to see which Engh and Huber parameters have been used for refinement. All these assignments are printed in a table. If the two assignments for a residue differ it is good to verify whether the correct parameters have been used for the refinement.

Nomenclature checks (NAMCHK)

The command NAMCHK allows you to check the names of atoms. All atoms with non-IUPAC names will be listed. This involves simple torsion angle calculations (as for the PHE side chain) as well as checks for the exchange of atoms (such as CG2 and OG1 in the THR side chain).

Starting a new summary file (NEWCHK)

Most checking options write a summary to a file that can be inspected by, for example, a simple Perl script such as the one used in our WWW version of the CHECK procedures. The file is called 'check.db'. WHAT IF keeps adding its results to the end of this file. The command NEWCHK closes the old copy of this file if it exists. It also closes any TEX files that were already made. If you want to keep those files you should rename them BEFORE you run any other check option, because the check options will not hesitate for even a millisecond before overwriting the old files.

Side Chain planarity check

The planarity of side chains of protein residues is verified against a database distribution. If any side chain deviates more than 3.0 standard deviations from planarity, this fact is reported.

Planarity check for atoms connected to aromatic rings

For each atom connected to an aromatic ring system the distance of the atom to the least squares plane of the ring is calculated, and compared with a database distribution. If any value deviates more than 3.0 standard deviations from the plane, this fact is reported.

Checking packing quality (QUACHK)

The command QUACHK is similar to the RNGQUA option in the QUALITY menu. It activates the packing quality control. See the chapter on QUALITY control for an explanation. In short:

Every residue with a quality value below -5.0 is suspicious. A sequence of residues with low quality scores is "interesting".

Every molecule with a global quality below -2.7 is guaranteed wrong. A molecule with a quality below -2.0 might be misfolded or poorly refined. Every molecule with a global quality below -1.2 does not belong in a database of reliable structures.
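The thresholds above can be written down as a simple lookup. This sketch only restates the cutoffs from this section; the function names and the wording of the verdicts are illustrative, and the scores themselves have to come from QUACHK.

```python
def judge_global_quality(score):
    """Translate a global packing quality score into the verdicts given above."""
    if score < -2.7:
        return "guaranteed wrong"
    if score < -2.0:
        return "possibly misfolded or poorly refined"
    if score < -1.2:
        return "does not belong in a database of reliable structures"
    return "no complaint"


def suspicious_residues(residue_scores, cutoff=-5.0):
    """Indices of residues whose quality value is below the cutoff."""
    return [i for i, score in enumerate(residue_scores) if score < cutoff]


print(judge_global_quality(-2.3))
print(suspicious_residues([-1.0, -5.5, -0.3, -6.1]))
```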

Looking for rotamer normality (ROTCHK)

The command ROTCHK will compare for all residues their chi-1 rotamer with the distribution of observed rotamers for the same residue type in a similar local backbone conformation in the database. A normality index will be listed. If this index is lower than 0.5 a warning will be given. A few values are expected to appear for every structure, but normality values lower than 0.2 should occur only extremely sparingly!

Verifying symmetry information (SYMCHK)

The command SYMCHK is a killer command. That means that it starts by wiping out the soup. It will then prompt you for the name of the PDB file for which the symmetry information should be checked. This file will be read and checked.

This option checks the internal consistency of the SCALE and CRYST cards in the PDB file, and it checks whether the crystal can be reconstructed from the atomic coordinates and the provided symmetry information. It also checks whether the cell complies with the rules set by the IUCr, and whether there is extra symmetry between so-called independent molecules.

Atomic occupancy check (WGTCHK)

WGTCHK checks whether all atomic occupancies are between 0 and 1.
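A comparable check is easy to sketch outside WHAT IF. The example below assumes standard fixed-column PDB ATOM/HETATM records, in which the occupancy occupies columns 55-60, and treats the limits 0 and 1 as inclusive (an assumption). The helper demo_atom_line exists only to build test records and is not part of any real tool.

```python
def occupancy_problems(pdb_lines):
    """Return (serial, occupancy) for atoms whose occupancy lies outside 0..1."""
    problems = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        serial = int(line[6:11])         # columns 7-11
        occupancy = float(line[54:60])   # columns 55-60
        if occupancy < 0.0 or occupancy > 1.0:
            problems.append((serial, occupancy))
    return problems


def demo_atom_line(serial, occupancy):
    """Build a minimal fixed-column ATOM record, for demonstration only."""
    return (
        f"ATOM  {serial:5d}  CA  ALA A   1    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{occupancy:6.2f}{20.0:6.2f}"
    )


lines = [demo_atom_line(1, 1.00), demo_atom_line(2, 1.50), demo_atom_line(3, -0.10)]
for serial, occupancy in occupancy_problems(lines):
    print(f"atom {serial}: occupancy {occupancy:.2f} is out of range")
```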

B-factor check (XBFCHK)

XBFCHK verifies the B-factors in the structure. If many buried atoms have a B-factor below 5.0, a warning is given. This means either that the structure has been determined at low temperature, or that there are problems in the refinement. If the average B-factor for buried atoms is very high or very low, another warning is given. Finally, the distribution of B-factors (basically the differences between B-factors of bonded atoms) is analyzed. If the result is very strange, a warning is printed. If this warning appears, the B-factors should probably be constrained during the refinement. Because these strange observed differences cannot be caused by thermal motion, adding constraints could improve the behaviour of the refinement.