Tables - Introduction to the Concept

Jump to Introduction, Illustrations, or Data Types for Columns; explanation of how keywords are documented.

Introduction

What you most need to know about Discover 95 tables can be summarized in two points:

All the data representing atoms, bonds, molecules, and so on, are stored in tables: a separate table for each kind of thing to be described. For example, each atom is one row of the Atom table, and each characteristic of atoms is a column of the Atom table: atom type, formal charge, Cartesian coordinate, and so on.
There are straightforward BTCL script commands--the same commands for all tables--that enable you to print, retrieve, search, or change the information stored in the tables.

Why should you care that Discover 95 stores its data in tables?

Traditionally, software:

Stores its data interwoven with the program.
Can operate on data only in the ways that are programmed in.
Requires writing of new C or C++ code to do anything new or different.

By contrast, Discover 95:

Stores its data separately from the program, not interwoven with it.
Provides a comprehensive script language for printing, extracting, selecting, or storing any data in a table.
Enables you to analyze your data or control the simulation in new ways by retrieving or modifying the data in tables.

To illustrate, here are a few examples of tables and script commands. Below is part of the Atom table for a simple molecule, a trimer of alanine, cysteine, and glycine, with hydrogens deleted to make the table shorter for the example. Only four of the columns are shown (the first column is the table row number of each atom).

     Atom
     {
         Row  Name  Monomer  Charge   Coord
         ---  ----  -------  -------  ------------------------
          0)  N           0  -0.5000  {  0.039 -0.028  0.000 }
          1)  CA          0   0.1200  {  1.499 -0.043  0.000 }
          2)  C           0   0.3800  {  2.055  1.361  0.000 }
          3)  O           0  -0.3800  {  1.321  2.356  0.011 }
          4)  CB          0  -0.3000  {  1.956 -0.866 -1.217 }
          5)  N           1  -0.5000  {  3.367  1.462  0.000 }
          6)  CA          1   0.1200  {  4.043  2.755  0.013 }
          7)  C           1   0.3800  {  5.543  2.582  0.010 }
          8)  O           1  -0.3800  {  6.075  1.462 -0.002 }
          9)  CB          1  -0.3000  {  3.546  3.556  1.233 }
         10)  SG          1   0.0000  {  4.209  5.237  1.208 }
         11)  N           2  -0.5000  {  6.253  3.691  0.022 }
         12)  CA          2  -0.1000  {  7.712  3.675  0.020 }
         13)  C           2   0.3800  {  8.268  5.079  0.035 }
         14)  O           2  -0.3800  {  7.540  6.067  0.046 }
     }

This listing of the Atom table is produced by the BTCL script command:

     $mol print Atom.(Name,Monomer,Charge,Coord)

All the table commands are prefixed by a previously defined database handle to a collection (database) of interrelated tables; here, $mol specifies the collection that contains the Atom table. The database handle print command lists information from all columns of a table (when only the table name is given), from a single column (table and column names joined by a period are given), or from selected columns (table name, period, parenthesis, and comma-separated list of column names). An optional array variable can be used at the end to specify particular rows of the table; when it is not present all rows of the table are printed.

Note the different types of data that columns can hold: Name is a string, Charge is a floating-point number, and the (Cartesian) Coord is a triplet of floating-point numbers in a single column, but two dimensional: each entry is a three-element array, as signified by { } braces.

To retrieve data from a table into a BTCL array variable, you simply use a command like:

     $mol get coord Atom.Coord

The database handle get command creates a new array variable (third word in the command) and retrieves information from a column of a table (fourth word, table and column names joined by a period). An optional fifth word can be used to specify particular rows of the table; when it is not present the entire column (that is, all rows of it) are retrieved.

Now look at the Monomer column of the Atom table. To take advantage of well-researched and powerful relational database techniques, we express many-to-one relationships such as atoms in a monomer by storing the row number of the ``one'' in a column of the ``many'' table. Thus, there is a column in the Atom table to store which monomer each atom is part of, but the Monomer table does not mention atoms:

     Monomer
     {
         Row  Molecule  Number  Type
         ---  --------  ------  ----
          0)         0  1       ALAn
          1)         0  1B      CYSH
          2)         0  1C      GLYN
     }

It may seem backwards initially, but you'll find that the BTCL script language makes it easy to look at relationships from either point of view, whether you want to know which monomer an atom is in or to find all the atoms of a monomer. For example:

     $mol select 1 Atom.Monomer atomList

creates a new array variable called atomList, which contains { 5 6 7 8 9 10 }--these are the row numbers (in the Atom table) of all the atoms in monomer 1.

By the same many-to-one principle, the Bond table stores the two atoms that make up each bond. Atoms can have one or more bonds, but each bond has exactly one atom at each end:

     Bond
     {
         Row  Atom-1  Atom-2  Order    Bibond
         ---  ------  ------  -------  ------
          0)       0       1   1.0000       1
          1)       1       0   1.0000       0
          2)       1       2   1.0000       4
          3)       1       4   1.0000       8
          4)       2       1   1.0000       2
          5)       2       3   2.0000       7
          6)       2       5   1.5000      10
                       .
                       .
     }

Each bond is stored twice, in both directions, to make it possible to find all the bonds of an atom by selecting only the Atom-1 column. For example, to get all the bonds of atom 2:

     $mol select 2 Bond.Atom-1 bondList

This command creates the variable bondList, which contains { 4 5 6 }--the row numbers in the Bond table. We can use the result of a database handle select or get command as a row-list specification for a chained get. For example, to get the names of the atoms bonded to atom 2:

     $mol get bondedAtom Bond.Atom-2 $bondList
     $mol get bondedName Atom.Name   $bondedAtom

These commands create the variables bondedAtom, which contains { 1 3 5 }, and bondedName, which contains { CA O N }.

Lastly, note that the Bond row number of the complementary bond is stored in the last column of the Bond table itself. This is an example of rows of a table being related to each other. Same-table relationships can be one-to-one as they are here, or many-to-one, such as might be used to express a derivation or inheritance hierarchy of rows within a single table.

Data Types for Columns

This table of BTCL Object Types lists the allowable data types for table columns (object attributes)

Special notes:

In addition to the scalar data types (data types with one element), vector data types, which are collections of n values with the same type, can be used as the data type of a column. The most common example of this is Cartesian coordinates, where each row entry in a coordinates column is really a 3-vector of doubles.

The rid data type is specific to the database itself and has no meaning outside a data base. A rid is a reference to an object somewhere in the database. That is, it refers to one row in one table in the database. Which table and row this is, is encoded in the value of the rid. This information is used by the database and script language to resolve references across tables. A rid is a global reference within a single database, but it cannot be used to refer to rows (objects) in another database.

The OBJ_ARRAY data type is an n-vector of values of some type. In fact, coordinates, which are described as 3-vectors of doubles, are OBJ_ARRAYs of type double and length three. The term OBJ_ARRAY is used because the type of an OBJ_ARRAY can also be rid, which means that a valid type for a column in a table is a list or array of objects (rows) in the database.

Main access page

Advanced-Use access.

BTCL Databases access