Selecting database proteins (SELECT)

Introduction

If you are interested in SELECTing residues before executing an option, look at the command CHOOSE.

If your only interest is searching the structure database, you don't need WHAT IF. The program SRS (URL=http://www.embl-heidelberg.de/srs/srsc) uses the same PDBFINDER as WHAT IF, and it searches faster and is more flexible than WHAT IF. However, if you want to select files as a pre-filter for the WHAT IF relational structure-sequence database SCAN3D, you should use this SELECT menu.

The selection is made in a relational way. This means that you can make requests such as: "I want to use all proteases with a resolution better than 2.5 Angstrom and an R-factor less than 20.0".

This relational behaviour is obtained, just as in the SEARCH menu and in the SCAN3D menu, by creating arrays of logicals or pointers. In the SELECT menu these arrays are called columns. The selection mentioned above could be made in five steps:

Step 1) Use the SELTXT command to select all proteases. Make this column 1.

Step 2) Use the SELNMB command to select proteins solved with a resolution better than 2.5 Angstrom. Make this column 2.

Step 3) Use the SELNMB command to select all proteins for which the R-factor from the crystallographic refinement is better than 20.0. Make this column 3.

Step 4) Use the command SELAND to do a logical AND on the columns 1 and 2. Make this column 4.

Step 5) Do another AND on the columns 3 and 4 to get the final answer. Make this column 5.

After that you can use several commands, e.g. SELSHO or SELHIT, to look at the results. You can use the SELUSE command to make the result permanent, i.e. to force SCAN3D to look only in those proteins that are tagged true in a certain column in the SELECT menu.
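The five steps can be pictured as operations on columns of booleans, with one boolean per database entry. The Python sketch below only illustrates that idea and is not WHAT IF code: the three entries, their identifiers, field names and values are invented, and the R-factor is written as a percentage only to match the threshold of 20.0 used in the text.

```python
# Illustrative only: invented entries, not real PDBFIND.TXT content.
entries = [
    {"id": "AAAA", "header": "HYDROLASE (SERINE PROTEASE)", "resolution": 1.8, "r_factor": 17.0},
    {"id": "BBBB", "header": "HYDROLASE (SERINE PROTEASE)", "resolution": 2.8, "r_factor": 19.0},
    {"id": "CCCC", "header": "OXYGEN TRANSPORT", "resolution": 1.5, "r_factor": 15.0},
]

columns = {}  # column number -> one boolean per database entry

# Step 1 (the role of SELTXT): text selection, stored as column 1
columns[1] = ["PROTEASE" in e["header"] for e in entries]
# Step 2 (the role of SELNMB): resolution better (lower) than 2.5
columns[2] = [e["resolution"] < 2.5 for e in entries]
# Step 3 (the role of SELNMB): R-factor lower than 20.0
columns[3] = [e["r_factor"] < 20.0 for e in entries]
# Step 4 (the role of SELAND): column 1 AND column 2
columns[4] = [a and b for a, b in zip(columns[1], columns[2])]
# Step 5 (the role of SELAND): column 3 AND column 4
columns[5] = [a and b for a, b in zip(columns[3], columns[4])]

# The entries tagged true in the final column
hits = [e["id"] for e, keep in zip(entries, columns[5]) if keep]
print(hits)
```

Every intermediate column stays available, so a later selection can reuse column 1 or column 2 without repeating the search.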


The data used by WHAT IF in the SELECT menu is all stored in the file PDBFIND.TXT. You can find this file in the dbdata directory. We update this file continuously, so a real entry is likely to contain more information than the example listed below. A typical entry in this file looks roughly like this:

ID           : 1CRN
Header       : PLANT SEED PROTEIN
 Date        : 1981-04-30
Compound     : Crambin
Source       : Abyssinian Cabbage (Crambe Abyssinica) Seed
Author       : W.A.Hendrickson
Author       : M.M.Teeter
Exp-Method   : X
 Resolution  : 1.50
 R-Factor    : 0.11
HSSP-N-Align : 8
T-Frac-Helix : 0.48
T-Frac-Beta  : 0.09
T-Nres-Prot  : 46
Chain        : _
 Sec-Struc   : 46
  Helix      : 22
   i,i+3     : 3
  Beta       : 4
   Anti-Hb   : 4
 Amino-Acids : 46
  CYSS       : 6
 Sequence    : TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN

But don't count on this too much, because more fields will be added in the future. Examples of planned extensions are: quality of the entry; symmetry information (SCALE and CRYST info, but corrected); pointers to other databases, etc.
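If you want to read such an entry with your own scripts, it is safest not to depend on a fixed list of fields. The Python sketch below is one possible way to do that, not the parser used by WHAT IF: it assumes only that every data line has the form "keyword : value", it keeps repeated keywords (such as Author) as lists, and it ignores the indentation, so nested fields are flattened. The function name parse_entry is made up for this example.

```python
from collections import defaultdict

# A few lines copied from the example entry above.
SAMPLE = """\
ID           : 1CRN
Header       : PLANT SEED PROTEIN
 Date        : 1981-04-30
Author       : W.A.Hendrickson
Author       : M.M.Teeter
Exp-Method   : X
 Resolution  : 1.50
 R-Factor    : 0.11
"""

def parse_entry(text):
    """Collect 'keyword : value' lines into {keyword: [values]}."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'keyword : value' line, skip it
        fields[key.strip()].append(value.strip())
    return dict(fields)

entry = parse_entry(SAMPLE)
print(entry["ID"], entry["Author"], float(entry["Resolution"][0]))
```

Because unknown keywords are simply stored, fields added to the file later do not break this kind of reader.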

Searching in the database at protein level

Several options exist to search in the database. You can search for numerical values, text strings, sequence patterns, etc.

Searching for numerical values (SELNMB)

Many entries in PDBFIND.TXT hold numerical information (R-factor, Resolution, Number of amino acids, etc.). The SELNMB command will allow you to select the keyword of a numerical value, and a range for this value. All entries that have the requested value in the requested range will be set to true in the corresponding column. You will be prompted for the column number and the column name.
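The effect of a numerical selection can be sketched in a few lines of Python. This is an illustration under assumptions, not the WHAT IF implementation: the function name select_number and the entries are invented, the range is taken to include both limits, and an entry without the keyword is treated as a non-hit.

```python
def select_number(entries, keyword, low, high):
    """Return one boolean per entry: True if low <= value <= high."""
    column = []
    for entry in entries:
        raw = entry.get(keyword)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            column.append(False)  # missing or non-numeric value: no hit
            continue
        column.append(low <= value <= high)
    return column

# Invented entries; the third one has no resolution at all.
entries = [
    {"ID": "AAAA", "Resolution": "1.50"},
    {"ID": "BBBB", "Resolution": "2.80"},
    {"ID": "CCCC"},
]
resolution_ok = select_number(entries, "Resolution", 0.0, 2.5)
print(resolution_ok)
```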

Searching for text (SELTXT)

Many entries in PDBFIND.TXT hold textual information (authors, experimental method, compound, source, het atom groups). The SELTXT command will allow you to select the keyword of a text value, and the text to search for. All entries that have the requested text as a subtext in the line that starts with the requested keyword will be set to true in the corresponding column. You will be prompted for the column number and the column name.
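A text selection is a subtext test on the lines that start with the chosen keyword. The Python sketch below illustrates this; it is not WHAT IF code. The function name select_text is invented, the second and third entries are made up, and ignoring upper and lower case is an assumption of this example.

```python
def select_text(entries, keyword, text):
    """True where any line stored under `keyword` contains `text`."""
    needle = text.upper()  # assumption: the comparison ignores case
    return [
        any(needle in value.upper() for value in entry.get(keyword, []))
        for entry in entries
    ]

# Each keyword maps to a list, because a keyword such as Author
# can occur on several lines of one entry.
entries = [
    {"ID": ["1CRN"], "Author": ["W.A.Hendrickson", "M.M.Teeter"]},
    {"ID": ["BBBB"], "Author": ["A.N.Other"]},   # invented entry
    {"ID": ["CCCC"]},                            # invented, no authors
]
by_author = select_text(entries, "Author", "teeter")
print(by_author)
```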

Logically combining selected columns

The commands SELAND and SELOR can be used to combine selected columns. Other combination possibilities can be added on request.

Doing an AND on two columns (SELAND)

The command SELAND will cause WHAT IF to prompt you for two input columns and one output column (which may be the same as one of the input columns). It will then do a logical AND on the two input columns, and store the result in the output column.

Doing an OR on two columns (SELOR)

The command SELOR will cause WHAT IF to prompt you for two input columns and one output column (which may be the same as one of the input columns). It will then do a logical OR on the two input columns, and store the result in the output column.

Inverting a column (SELINV)

The command SELINV will cause WHAT IF to prompt you for one input and one output column. The output column may be the same as the input column. It will then copy the input column to the output column, and toggle all TRUEs to FALSEs and vice versa in this output column.
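The three column operations are small enough to sketch in Python. The helper names sel_and, sel_or and sel_inv are invented for this illustration and say nothing about how WHAT IF stores its columns. Each helper builds the complete result before storing it, which is why the output column may be the same as an input column.

```python
columns = {
    1: [True, True, False, False],
    2: [True, False, True, False],
}

def sel_and(cols, in_a, in_b, out):
    # The new list is complete before it is stored, so `out`
    # may equal in_a or in_b without corrupting the inputs.
    cols[out] = [a and b for a, b in zip(cols[in_a], cols[in_b])]

def sel_or(cols, in_a, in_b, out):
    cols[out] = [a or b for a, b in zip(cols[in_a], cols[in_b])]

def sel_inv(cols, src, out):
    cols[out] = [not flag for flag in cols[src]]

sel_and(columns, 1, 2, 3)   # column 3 = column 1 AND column 2
sel_or(columns, 1, 2, 4)    # column 4 = column 1 OR column 2
sel_inv(columns, 4, 4)      # invert column 4 in place

# "In column 1 but NOT in column 2", composed from the same steps:
sel_inv(columns, 2, 5)
sel_and(columns, 1, 5, 5)
print(columns[3], columns[4], columns[5])
```

The last two calls show that a combination without a command of its own, such as AND NOT, can be built from an inversion followed by an AND.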

Inspecting columns

Showing which columns exist (SELSHO)

The command SELSHO will cause WHAT IF to show you the number of hits in every created column. It will also show for each column how it was created.

Showing the hits in a column (SELHIT)

The command SELHIT will cause WHAT IF to prompt you for a column number. It will then show part of the information from the entries in PDBFIND.TXT that have a TRUE in the given column. See also SELLST for a similar option that gives less output.

Showing the hits in a column (SELLST)

The command SELLST will cause WHAT IF to prompt you for a column number. It will then show a small part of the information from the entries in PDBFIND.TXT that have a TRUE in the given column. See also SELHIT for a similar option that gives more output.
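The difference between a per-column overview and a list of hits can be illustrated as follows. This Python sketch is an assumption about the kind of bookkeeping involved, not the actual WHAT IF output: the column records, the identifiers and the function names summary and hit_ids are all invented.

```python
ids = ["AAAA", "BBBB", "CCCC"]  # invented entry identifiers

# Each column remembers its name, how it was made, and its flags.
columns = {
    1: {"name": "proteases", "origin": "text search on Header",
        "flags": [True, True, False]},
    2: {"name": "good resolution", "origin": "numeric range on Resolution",
        "flags": [True, False, True]},
}

def summary(cols):
    """One line per column: number, name, hit count, origin."""
    return [
        f"{number}: {col['name']} ({sum(col['flags'])} hits), from {col['origin']}"
        for number, col in sorted(cols.items())
    ]

def hit_ids(cols, number):
    """Identifiers of the entries that are TRUE in one column."""
    return [i for i, flag in zip(ids, cols[number]["flags"]) if flag]

for line in summary(columns):
    print(line)
print(hit_ids(columns, 2))
```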

Other commands

Several other commands exist:

Initializing columns (SELINI)

The command SELINI will cause WHAT IF to initialize all parameters and variables that are related to the SELECT menu operations. This initialization is irreversible.

Using SELECT columns for SCAN3D (SELUSE)

The command SELUSE will cause WHAT IF to prompt you for a column. It will mark as active all entries in the SCAN3D database that are true in this column. That means that future searches with SCAN3D will no longer look in the whole database, but only in the subset of the SCAN3D database that was selected with this option. Use column 0 (zero) if you want to (re-)activate all SCAN3D entries (so, issue the SELUSE 0 command).
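The role of column 0 can be made concrete with a small Python sketch. It is only a model of the behaviour described above, with invented identifiers and an invented function name sel_use; it does not show how WHAT IF or SCAN3D store the active flags.

```python
ids = ["AAAA", "BBBB", "CCCC"]          # invented entry identifiers
columns = {5: [True, False, False]}     # a final selection column

def sel_use(cols, column_number, n_entries):
    """Return the active flags; column 0 re-activates every entry."""
    if column_number == 0:
        return [True] * n_entries
    return list(cols[column_number])    # copy, so later edits do not leak

# Restrict later searches to the entries that are true in column 5.
active = sel_use(columns, 5, len(ids))
subset = [i for i, on in zip(ids, active) if on]

# Column 0 switches the whole database back on.
active = sel_use(columns, 0, len(ids))
everything = [i for i, on in zip(ids, active) if on]
print(subset, everything)
```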