Molecular System Database



Introduction

A molecular system database is created whenever a begin, reset, or readFile command is encountered in a BTCL script (.inp file). This database is a collection of tables containing information describing a molecular system. Unlike general-purpose databases, a Discover database is stored in memory rather than on a peripheral device, which is essential for performance.

The purpose of a molecular system database is to provide the user with access to important molecular system data. Any quantity stored in the database can be recovered via BTCL commands and manipulated or output as desired. For example, you might want to change the values of atom coordinates or add columns or tables for your own data.

Molecular system databases are deleted automatically when the Discover program is finished running or when a reset or begin command is issued.

The Discover program maintains a table of all databases known to it, and another table of "current" databases of every type (the current database of a given type is the one used by default when no database is explicitly specified). A molecular system database is of type System in the CurrentDatabase table. A molecular system created with a begin command becomes the current system database, but one created with a readFile command does not.


Contents of the Molecular System Database Common to All Molecular Systems

These tables are always created in a molecular system database, regardless of periodicity or symmetry considerations:

   Atom
   Monomer
   Molecule
   NonbondGroup
   Bond
The names Atom and Bond are not the actual names of the tables in question. The real names are MainCell/Atom and MainCell/Bond. However, Atom and Bond are aliases and can be used as if they were the actual names. The aliases are used in the remainder of this discussion.

Atom Table

The Atom table always has the following twelve columns, regardless of periodicity or symmetry considerations:

   Monomer        (type rid)
   Name           (type string)
   NonbondGroup   (type rid)
   Type           (type string)
   Charge         (type double)
   Chirality      (type byte)
   AtomicNumber   (type byte)
   FormalCharge   (type short)
   IsotopeNumber  (type short)
   OutOfPlane     (type byte)
   Coord          (type double(3))
   Mass           (type double)
In addition, an optional Velocity column, containing velocities that are read from history files, may be present. This column is created when a readFile history command is issued.

A row of the Atom table corresponds to an atom.

The entry in the Monomer column specifies the monomer to which the atom belongs. It is a reference to the Monomer table.

The entry in the Name column is the atom name appearing in the .car file.

The entry in the NonbondGroup column specifies the nonbond group to which the atom belongs. It is a reference to the NonbondGroup table.

The entry in the Type column is the forcefield atom type appearing in the .car file.

The entry in the Charge column is the forcefield-dependent partial charge.

The entry in the Chirality column has the following interpretation:

0 - neither chiral nor prochiral
1 - prochiral, lightest (or lowest priority) bonded atoms are equivalent
2 - prochiral, intermediate size (or priority) bonded atoms are equivalent
3 - prochiral, heaviest (or highest priority) bonded atoms are equivalent
4 - chiral
8 - not determined
9 - unable to determine

In the prochiral cases, priority is established on the basis of atom mass, and on the basis of the masses of connected atoms and groups when the initial masses are the same.
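
The codes listed above can be kept in a small lookup table when post-processing values read from the Chirality column. The following is a minimal Python sketch, not Discover or BTCL code; the names CHIRALITY_CODES, describe_chirality, and is_prochiral are hypothetical and exist only for this illustration.

```python
# Meanings of the Chirality codes as listed in this section.
# The dictionary and function names are illustrative assumptions.
CHIRALITY_CODES = {
    0: "neither chiral nor prochiral",
    1: "prochiral, lightest bonded atoms equivalent",
    2: "prochiral, intermediate bonded atoms equivalent",
    3: "prochiral, heaviest bonded atoms equivalent",
    4: "chiral",
    8: "not determined",
    9: "unable to determine",
}


def describe_chirality(code):
    """Return the documented meaning of a code, or flag an undocumented one."""
    return CHIRALITY_CODES.get(code, "undocumented code %d" % code)


def is_prochiral(code):
    """Codes 1 to 3 are the three prochiral cases."""
    return code in (1, 2, 3)
```

Codes 5 to 7 are not listed in this section, so the sketch reports them as undocumented rather than guessing a meaning.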

The entry in the AtomicNumber column is the number of protons in the atom.

The entry in the FormalCharge column is the formal charge on the atom.

The entry in the IsotopeNumber column is either 0 (for the most common isotope) or the mass number of the isotope (number of protons plus number of neutrons).

The entry in the OutOfPlane column has the following interpretation:

For non-ESFF forcefields:

0: no oop energy
1: oop energy

For ESFF:
0: no oop energy, not an axial atom
1: oop energy, not an axial atom
2: no oop energy, axial atom
3: oop energy, axial atom
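
One way to read the ESFF values is as two independent flags: the low bit says whether an out-of-plane (oop) energy is present, and the next bit says whether the atom is axial. This bit interpretation is an observation about the four listed values, not something the documentation states, and the function name below is hypothetical. A minimal Python sketch:

```python
def decode_out_of_plane(value, esff=False):
    """Split an OutOfPlane entry into (has_oop_energy, is_axial).

    Assumes the listed values: 0 to 1 for non-ESFF forcefields, 0 to 3 for ESFF.
    """
    limit = 3 if esff else 1
    if not 0 <= value <= limit:
        raise ValueError("OutOfPlane value %d is outside the listed range" % value)
    has_oop_energy = bool(value & 1)
    is_axial = bool(value & 2)  # always False for non-ESFF forcefields
    return has_oop_energy, is_axial
```

For example, decode_out_of_plane(2, esff=True) reports no oop energy on an axial atom, which matches the list above.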

The Coord column contains the atom coordinates, and the Mass column contains the atomic mass.

Monomer Table

A monomer (also known as a residue in proteins) is a sequence of atoms within a molecule. For many crystalline systems (e.g., zeolites) where this breakdown does not make sense, the entire molecule is considered to be one monomer. That is, the Monomer table contains one entry, and all the P1 atoms in the MainCell/Atom table refer to this single "monomer".

The Monomer table always has the following columns:

   Molecule       (type rid)
   Number         (type string)
   Type           (type string)
A row of the Monomer table corresponds to a monomer.

The entry in the Molecule column specifies the molecule to which the monomer belongs. It is a reference to the Molecule table.

The entry in the Number column is the monomer number appearing in the .car file.

The entry in the Type column is the monomer type appearing in the .car file--e.g., ALA for an alanine amino acid.

Molecule Table

The Molecule table always has the following columns:

   Name           (type string)
   Type           (type string)
A row of the Molecule table corresponds to a molecule.

The entry in the Name column is the molecule name appearing in the .mdf file.

The Type column is not currently used.
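
The Atom, Monomer, and Molecule tables are linked by rid references: an atom points to its monomer, and a monomer points to its molecule. The Python sketch below models each table as a list of rows and a rid as a row index. This is only an assumption made for illustration, the row values are made up, and locate_atom is a hypothetical helper rather than a BTCL command.

```python
# Toy tables: a "rid" is modeled here as an index into the referenced list.
molecule_table = [{"Name": "ACENM", "Type": None}]
monomer_table = [
    {"Molecule": 0, "Number": "1", "Type": "ACE"},
    {"Molecule": 0, "Number": "2", "Type": "N-M"},
]
atom_table = [
    {"Monomer": 0, "Name": "CA"},
    {"Monomer": 0, "Name": "HA1"},
    {"Monomer": 1, "Name": "N"},
]


def locate_atom(atom_rid):
    """Follow Atom -> Monomer -> Molecule and return the names along the way."""
    atom = atom_table[atom_rid]
    monomer = monomer_table[atom["Monomer"]]
    molecule = molecule_table[monomer["Molecule"]]
    return molecule["Name"], monomer["Type"], monomer["Number"], atom["Name"]
```

Because each atom stores only a monomer reference, the molecule is never duplicated in the Atom table; it is recovered by following two references.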

NonbondGroup Table

The NonbondGroup table always has the following columns:

   Monomer        (type rid)
   Name           (type string)
   SwitchingAtom  (type rid)
A row of the NonbondGroup table corresponds to a neutral group to be used in a group-based calculation of nonbond energies.

The entry in the Monomer column specifies the monomer to which the nonbond group belongs. It is a reference to the Monomer table. A nonbond group can be assigned to a monomer because all the atoms of a nonbond group will belong to the same monomer.

The entry in the Name column is the nonbond group name appearing in the .mdf file.

The entry in the SwitchingAtom column specifies the atom at which the group begins. It is a reference to the Atom table. The switching atom is used to construct nonbond neighbor lists.

For non-protein systems, no use is made of NonbondGroups.

Bond Table

The Bond table always has the following columns:

   Atom-1         (type rid)
   Atom-2         (type rid)
   Order          (type float)
   Bibond         (type rid)
A row of the Bond table corresponds to a bond or connection between a pair of atoms in a molecule.

The entry in the Atom-1 column specifies one atom of the pair. It is a reference to the Atom table.

The entry in the Atom-2 column specifies the other atom of the pair. It is a reference to the Atom table.

The entry in the Order column specifies the bond order (e.g., single or double).

The entry in the Bibond column specifies the row in the Bond table that contains the same bond with the identities of Atom-1 and Atom-2 reversed. Each bond is stored twice in the Bond table to facilitate selection. Thus, to find all bonds involving a particular atom, you need only identify the rows of the Bond table that have the given atom in the Atom-1 column.
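
The double-storage scheme can be pictured with a short Python sketch. It is a toy model and not the Discover implementation: rows are dictionaries, rids are list indices, and build_bond_table and bonded_atoms are hypothetical names.

```python
def build_bond_table(bonds):
    """Store every bond twice and cross-link the two rows through Bibond.

    bonds: iterable of (atom_a, atom_b, order) tuples.
    """
    table = []
    for atom_a, atom_b, order in bonds:
        forward = len(table)   # row index of the a -> b copy
        reverse = forward + 1  # row index of the b -> a copy
        table.append({"Atom-1": atom_a, "Atom-2": atom_b,
                      "Order": order, "Bibond": reverse})
        table.append({"Atom-1": atom_b, "Atom-2": atom_a,
                      "Order": order, "Bibond": forward})
    return table


def bonded_atoms(table, atom):
    """Find all bond partners of an atom by looking at the Atom-1 column only."""
    return [row["Atom-2"] for row in table if row["Atom-1"] == atom]


# Three bonds among four atoms, identified here by integer rids.
bond_table = build_bond_table([(0, 1, 1.0), (0, 2, 2.0), (2, 3, 1.0)])
```

With this layout, bonded_atoms(bond_table, 2) finds both partners of atom 2 without also having to search the Atom-2 column.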

One could store bonded atoms in additional columns of the Atom table rather than providing a separate table for bonds. For some operations, e.g., operations involving a fixed number of columns, this design would be slightly more efficient (however, the Discover program uses acceleration algorithms to make table searches quite efficient). For one-to-many relationships, however, representation of relationships in a separate table is preferable. This keeps table and column structure simple and also simplifies pattern matching.


Contents of the Molecular System Database for Certain Molecular Systems

Depending on the molecular system, the database may contain the following additional tables.

Subset Tables

A molecular system database may also contain subset information. When this is so, the database has a Subset table. A subset is an arbitrary collection of one or more objects. The objects can be of many different types, e.g., atoms, distances, angles, etc. A Subset table has the following columns:

   Context        (type rid)
   Name           (type string)
A row of the Subset table corresponds to a subset.

The Context column specifies the context in which the subset is defined, i.e., a particular molecule or monomer. That is, if a subset is defined using atoms in one monomer, it is in the monomer context; if the atoms are in several monomers, it is in the molecule context.

The Name column specifies the subset name. Subsets in different contexts may have the same name. If two subsets in the same context have the same name, subset select commands for that context and name retrieve the contents of both. They are separate subsets only in the sense that they correspond to different rows of the Subset table.

Subset contents are stored in the SubItem table. The SubItem table has the following columns:

   Subset         (type rid)
   Item           (type rid)
A row of the SubItem table corresponds to an item in a subset.

The Subset column specifies the row number in the Subset table of the subset which contains the item.

The Item column specifies the item itself. It could be an atom or a distance, angle, torsion, or out-of-plane object created in the course of the subset definition.
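
The relationship between the Subset and SubItem tables, including the case of two subsets that share a context and a name, can be sketched in Python as follows. The tables are toy lists, contexts and items are written as strings instead of rids for readability, and select_subset is a hypothetical stand-in for a subset select command.

```python
subset_table = [
    {"Context": "monomer:0", "Name": "backbone"},
    {"Context": "monomer:0", "Name": "backbone"},  # same context and name, separate row
    {"Context": "monomer:1", "Name": "backbone"},  # same name, different context
]
subitem_table = [
    {"Subset": 0, "Item": "atom:0"},
    {"Subset": 0, "Item": "atom:1"},
    {"Subset": 1, "Item": "atom:2"},
    {"Subset": 2, "Item": "atom:7"},
]


def select_subset(context, name):
    """Collect the items of every Subset row matching the context and name."""
    rows = {index for index, subset in enumerate(subset_table)
            if subset["Context"] == context and subset["Name"] == name}
    return [item["Item"] for item in subitem_table if item["Subset"] in rows]
```

Selecting "backbone" in the context "monomer:0" returns the items of both matching rows, while the same name in "monomer:1" returns only that subset's items.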

PseudoAtom Table

A molecular system database may also contain pseudoatom information. When this is so, the database has a PseudoAtom table. A pseudoatom does not correspond to a real atom, but rather is just a point specified explicitly or by the weighted-average coordinates of other atoms. It is called a pseudoatom because it can be used in many operations that take real atoms, e.g., distance measurements and energy restraints. A PseudoAtom table has the following columns:

   Context        (type rid)
   Name           (type string)
   Atom           (type rid)
   Subset         (type rid)
   Weight         (type OBJ_ARRAY or string)
A row of the PseudoAtom table corresponds to a pseudoatom.

The Context column specifies the context in which the pseudoatom is defined, i.e., a particular molecule or monomer. That is, if a pseudoatom is defined using atoms in one monomer, it is in the monomer context; if the atoms are in several monomers, it is in the molecule context.

The Name column specifies the pseudoatom name. Multiple pseudoatoms can share a given name and context.

The Atom column identifies the row of the Atom table corresponding to the pseudoatom.

The Subset column identifies a subset containing the component atoms of the pseudoatom. "Fixed" pseudoatoms, which are defined by fixed coordinates, have no component atoms and hence no entry in the Subset column.

The Weight column can take several forms:

NULL - pseudoatom coordinates are the geometric centroid of the component atom coordinates

OBJ_ARRAY - object array of weights used in computing pseudoatom coordinates (should have a weight for each component atom)

string - name of a column in the Atom table whose values are used as weights in computing pseudoatom coordinates, e.g., mass (for a center-of-mass pseudoatom), charge, or a user-defined column. For the center of mass, $\mathbf{r} = \sum_i w_i \mathbf{r}_i / \sum_i w_i$, where i runs over all atoms in the pseudoatom, $w_i$ are the atomic weights, and $\mathbf{r}_i$ are the atom coordinates.
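
The three forms of the Weight entry all lead to a weighted average of the component atom coordinates. The Python sketch below illustrates this under stated assumptions: atoms are plain dictionaries, None stands in for NULL, a list stands in for an OBJ_ARRAY, and pseudoatom_coord is a hypothetical function, not part of Discover.

```python
def pseudoatom_coord(atoms, weight=None):
    """Weighted-average coordinates of the component atoms.

    atoms:  list of dicts with a "Coord" entry (x, y, z) and optional weight columns.
    weight: None (geometric centroid), a list with one weight per atom,
            or the name of a column holding the weights.
    """
    if weight is None:
        weights = [1.0] * len(atoms)
    elif isinstance(weight, str):
        weights = [atom[weight] for atom in atoms]
    else:
        weights = list(weight)
        if len(weights) != len(atoms):
            raise ValueError("need one weight per component atom")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return tuple(
        sum(w * atom["Coord"][axis] for w, atom in zip(weights, atoms)) / total
        for axis in range(3)
    )


# Two made-up component atoms on the x axis.
atoms = [
    {"Coord": (0.0, 0.0, 0.0), "Mass": 12.0},
    {"Coord": (2.0, 0.0, 0.0), "Mass": 4.0},
]
```

With these two atoms, the centroid lies midway between them, and weighting by the Mass column pulls the pseudoatom toward the heavier atom.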


Internal Energy Exclusion

For some applications it may be desirable to exclude the energy computation between specific atoms. For example, you may want to calculate a special bond energy for a specific bond or set of bonds and supply the contribution to the Discover program via interprocess communication (IPC). In this case, you can set the appropriate internal energy exclusion flags to tell the Discover program not to calculate such a bond. Note that this is different from the atomMovability flags which affect which atoms are movable: a set of atoms can be excluded from the energy computation (presumably because the values are calculated elsewhere) and still be movable for minimization and dynamics.

Exclusion Flags

To exclude specific atoms from the internal energy computation, create a column of type short named InternalEnergyExclude in the Atom table of the System database. A non-zero value means that the atom is excluded. Note that an internal energy term is excluded only if all the atoms involved in that term are excluded. For instance, both atoms of a bond must be excluded before the bond term is skipped.
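
The all-atoms rule can be stated compactly in code. The following Python sketch is an illustration only; the flag dictionary and the function term_is_excluded are hypothetical and do not correspond to any Discover routine.

```python
def term_is_excluded(term_atoms, exclude_flags):
    """An internal term is skipped only if every atom in it has a non-zero flag.

    term_atoms:    atoms taking part in the term (2 for a bond, 3 for an angle, ...).
    exclude_flags: mapping from atom name to its InternalEnergyExclude value;
                   atoms missing from the mapping are treated as 0 (included).
    """
    return all(exclude_flags.get(atom, 0) != 0 for atom in term_atoms)


# Made-up flags: CA and HA1 are excluded, HA2 is not.
exclude_flags = {"CA": 1, "HA1": 1, "HA2": 0}
```

Here the CA-HA1 bond term would be skipped, but the CA-HA2 bond and the HA1-CA-HA2 angle would still be calculated because HA2 is not excluded.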

BTCL Commands

To facilitate setting the internal energy exclusion flags, a BTCL procedure called System_SetExcludeInternalEnergy has been provided. This procedure checks if the Atom.InternalEnergyExclude column exists, creates it if necessary, and sets the value in the column to be 1 (exclude) or 0 (include) for the atoms specified. The procedure is contained in the file $BIOSYM/data/discover/script/systemAtom.tcl.

Example

#BIOSYM btcl 3
#testing acenm for internal energy exclusion functionalities

set PROJECT acenm
begin

database handle dbh System.

#energy for the whole system (output)
energy print energies = 1

#exclude some of the atoms for internal energy calculations
System_SetExcludeInternalEnergy "ACENM:ACE_1:(CA,HA1)"

#energy for the whole system with 2 atoms excluded (output)
energy print energies = 1

#exclude these atoms
atomMovability set excluded ex_1 "ACENM:ACE_1:(HA2,HA3,C,O), ACENM:N-M_2:*"

#include the atoms previously excluded back in the system
System_SetExcludeInternalEnergy "ACENM:ACE_1:(CA,HA1)" Include

#energy for the system with only 2 atoms (output)
energy print energies = 1
The output from calculating the energies of the system is:

Energy components             kcal/mol
Total:                      -18.530128
  Internal:                   5.607232
    Bond:                     2.758628
    Angle:                    2.848604
    Torsion:                  0.000000
    OutOfPlane:               0.000000
  Nonbond:                  -24.137360
    Vdw:                      3.727446
      Repulsive:              8.964310
      Dispersive:            -5.236864
    Electrostatic:          -27.864806
    Hydrogenbond:             0.000000
The output after excluding two of the atoms and recalculating the energies is:

Energy components             kcal/mol
Total:                      -18.743014
  Internal:                   5.394346
    Bond:                     2.545742
    Angle:                    2.848604
    Torsion:                  0.000000
    OutOfPlane:               0.000000
  Nonbond:                  -24.137360
    Vdw:                      3.727446
      Repulsive:              8.964310
      Dispersive:            -5.236864
    Electrostatic:          -27.864806
    Hydrogenbond:             0.000000
The output after excluding all but two atoms and recalculating the energies is:

Energy components             kcal/mol
Total:                        0.212886
  Internal:                   0.212886
    Bond:                     0.212886
    Angle:                    0.000000
    Torsion:                  0.000000
    OutOfPlane:               0.000000
  Nonbond:                    0.000000
    Vdw:                      0.000000
      Repulsive:              0.000000
      Dispersive:             0.000000
    Electrostatic:            0.000000
    Hydrogenbond:             0.000000
Note that the internal energies from the second and third calculations add up to the internal energy from the first calculation.
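
The additivity can be checked directly from the printed values. The short Python check below uses only numbers copied from the three outputs above; the variable names are illustrative.

```python
# Internal energies (kcal/mol) copied from the three outputs above.
internal_full = 5.607232      # first calculation: whole system
internal_excluded = 5.394346  # second calculation: CA and HA1 excluded
internal_two_atoms = 0.212886 # third calculation: only CA and HA1 remain

# The corresponding Bond components show the same split.
bond_full = 2.758628
bond_excluded = 2.545742
bond_two_atoms = 0.212886

internal_gap = abs(internal_excluded + internal_two_atoms - internal_full)
bond_gap = abs(bond_excluded + bond_two_atoms - bond_full)

# Allow for floating-point rounding at the printed precision.
consistent = internal_gap < 1e-6 and bond_gap < 1e-6
```

The entire difference is in the Bond component, which is what would be expected if the only term involving exclusively CA and HA1 is the bond between them.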


Nonbond Energy Exclusion

Exclusion Flags

To exclude the energies of some nonbond interaction pairs, create a column of flags of type short, called NonbondEnergyExclude, in the MainCell/Atom table of the System database. Each flag should be set to either 0 or 1. The nonbond interaction energy between any two atoms that are both flagged 1 is not calculated. This column has no effect on whether the interaction between an atom flagged 0 and an atom flagged 1 is calculated. This is similar to the fixed-atom model, but is used for excluding only some nonbond interactions.
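
The pair rule can be sketched in a few lines of Python. This is a toy illustration that considers only the NonbondEnergyExclude flags; it ignores cutoffs, bonded-neighbor exclusions, and any other setting that decides whether a pair is evaluated. The function names are hypothetical.

```python
def pair_is_excluded(flag_i, flag_j):
    """A pair is skipped only when both atoms carry the flag 1."""
    return flag_i == 1 and flag_j == 1


def count_evaluated_pairs(flags):
    """Count the unique atom pairs that this column does not exclude."""
    n = len(flags)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if not pair_is_excluded(flags[i], flags[j])
    )
```

For four atoms flagged 1, 1, 0, 0 there are six unique pairs, and only the pair formed by the two flagged atoms is dropped.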

Example

#BIOSYM btcl 3
#testing acenm for nonbond energy exclusion functionalities

set PROJECT acenm
begin

database handle dbh System.

#energy for the whole system
energy print energies = 1

database handle en1 Energy.
$en1 select "Nonbond" Values.Name e1
$en1 get nb1 Values.Value $e1

#exclude atoms in residue 1
System_SetExcludeNonbondEnergy "ACENM:1:atom;*"

#energy for the whole system with the atoms in residue 1 excluded
energy print energies = 1

database handle en2 Energy.
$en2 select "Nonbond" Values.Name e2
$en2 get nb2 Values.Value $e2

$dbh print Atom.NonbondEnergyExclude
$dbh print Atom.Movability

#include back atoms in residue 1
System_SetExcludeNonbondEnergy "ACENM:1:atom;*" Include

$dbh print Atom.NonbondEnergyExclude
$dbh print Atom.Movability

#exclude atoms in residue 2
atomMovability set excluded ex_1 "ACENM:2:atom;*"

#energy for the system with only the atoms in residue 1
energy print energies = 1

database handle en3 Energy.
$en3 select "Nonbond" Values.Name e3
$en3 get nb3 Values.Value $e3

$dbh print Atom.NonbondEnergyExclude
$dbh print Atom.Movability

echo nonbond energies for the full system= [vector nb1]
echo nonbond energies for the system with excluded atoms= [vector nb2]
echo nonbond energies for the excluded atoms alone= [vector nb3]

Consensus Dynamics Example

To do consensus dynamics, you need to create a column of flags of type integer, called Consensus, in the MainCell/Atom table of the System database.



Copyright Biosym/MSI