You are unlikely to be able to generate HSSP files yourself if you are not working at EMBL, but you can download them from the EMBL file server. Just send an e-mail containing the word HELP to `NETSERV@EMBL-Heidelberg.DE` and you will receive all information about the available data.
You do not have to type the extension of PDB or HSSP files if it is brk or pdb (PDB files) or hssp (HSSP files). Nor do you have to type the path (i.e. the directory) if the file is in the standard PDB or HSSP directory of your system.
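The file-name shortcuts above amount to a simple lookup rule. The Python sketch below illustrates it under stated assumptions: the standard directories are hypothetical placeholders (the real ones are set per installation), and the search order (current directory before the standard one, the name as typed before the standard extensions) is a guess for illustration, not documented WHAT IF behaviour.

```python
import os

# Hypothetical standard directories; the real ones are site-specific.
DEFAULT_DIRS = {"pdb": "/data/pdb", "hssp": "/data/hssp"}
# Extensions that may be omitted, as listed in the text above.
DEFAULT_EXTS = {"pdb": [".brk", ".pdb"], "hssp": [".hssp"]}


def resolve(name, kind, default_dir=None):
    """Return the first existing file matching `name`, or None.

    If `name` has no directory part, the standard directory for `kind`
    is searched after the current one.  If it has no extension, the
    standard extensions are tried after the name as typed.
    (The search order is an assumption made for this sketch.)"""
    if os.path.dirname(name):
        dirs = [""]
    else:
        dirs = ["", default_dir or DEFAULT_DIRS[kind]]
    if os.path.splitext(name)[1]:
        exts = [""]
    else:
        exts = [""] + DEFAULT_EXTS[kind]
    for d in dirs:
        for ext in exts:
            path = os.path.join(d, name + ext)  # join("", x) is just x
            if os.path.isfile(path):
                return path
    return None
```

With these placeholder settings, `resolve("1crn", "hssp")` would find `/data/hssp/1crn.hssp` if that file exists.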
An HSSP file consists of four parts:
1) Header
2) List of aligned sequences
3) Sequence alignment
4) Profile
The insertion fragments at the bottom of the file are presently not used.
Part 1) The header.
HSSP HOMOLOGY DERIVED SECONDARY STRUCTURE OF PROTEINS , VERSION 1.0 1991
PDBID 1crn
DATE file generated on 19-Mar-93
SEQBASE RELEASE 24.0 OF EMBL/SWISS-PROT WITH 28154 SEQUENCES
PARAMETER SMIN: -0.5 SMAX: 1.0
PARAMETER gap-open: 3.0 gap-elongation: 0.1
PARAMETER conservation weights
PARAMETER no insertions/deletions in secondary structure allowed
PARAMETER alignments sorted according to:DISTANCE
THRESHOLD according to t(L)=(290.15 * L ** -0.562) + 5
REFERENCE Sander C., Schneider R. : Database of homology-derived protein
... structures. Proteins, 9:56-68 (1991).
CONTACT e-mail (INTERNET) Schneider@EMBL-Heidelberg.DE or Sander@EMBL-Heidelberg.DE
... / phone +49-6221-387361 / fax +49-6221-387306
AVAILABLE Free academic use. Commercial users must apply for license.
HEADER PLANT SEED PROTEIN
COMPND CRAMBIN
SOURCE ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED
AUTHOR W.A.HENDRICKSON,M.M.TEETER
SEQLENGTH 46
NCHAIN 1 chain(s) in 1crn.DSSP data set
NALIGN 8
NOTATION : ID: EMBL/SWISSPROT identifier of the aligned (homologous) protein
NOTATION : STRID: if the 3-D structure of the aligned protein is known, then
... STRID is the Protein Data Bank identifier as taken
NOTATION : from the database reference or DR-line of the EMBL/SWISSPROT entry
NOTATION : %IDE: percentage of residue identity of the alignment
NOTATION : %SIM (%WSIM): (weighted) similarity of the alignment
NOTATION : IFIR/ILAS: first and last residue of the alignment in the test sequence
NOTATION : JFIR/JLAS: first and last residue of the alignment in the aligned protein
NOTATION : LALI: length of the alignment excluding insertions and deletions
NOTATION : NGAP: number of insertions and deletions in the alignment
NOTATION : LGAP: total length of all insertions and deletions
NOTATION : LSEQ2: length of the entire sequence of the aligned protein
NOTATION : ACCNUM: SwissProt accession number
NOTATION : PROTEIN: one-line description of aligned protein
NOTATION : SeqNo,PDBNo,AA,STRUCTURE,BP1,BP2,ACC: sequential and PDB residue
NOTATION : numbers, amino acid (lower case = Cys), secondary structure,
NOTATION : bridge partners, solvent exposure as in DSSP (Kabsch and Sander,
NOTATION : Biopolymers 22, 2577-2637, 1983)
NOTATION : VAR: sequence variability on a scale of 0-100 as derived from
NOTATION : the NALIGN alignments. A pair of lower case characters (AvaK) in
NOTATION : the aligned sequence brackets a point of insertion in this sequence;
NOTATION : dots (....) in the aligned sequence indicate points of deletion
NOTATION : in this sequence
NOTATION : SEQUENCE PROFILE: relative frequency of an amino acid type at
NOTATION : each position. Asx and Glx are in their acid/amide form in
NOTATION : proportion to their database frequencies
NOTATION : NOCC: number of aligned sequences spanning this position
NOTATION : NDEL: number of sequences with a deletion in the test protein
NOTATION : at this position
NOTATION : NINS: number of sequences with an insertion in the test protein
NOTATION : at this position
NOTATION : ENTROPY: entropy measure of sequence variability at this position
NOTATION : RELENT: relative entropy. entropy normalized to the range 0-100
NOTATION : WEIGHT: conservation weight
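The THRESHOLD line gives a cut-off that depends on alignment length. Reading t(L) as a percent-identity threshold, which is how HSSP uses it, and assuming that L is the LALI column of part 2, the sketch below evaluates the formula and checks the eight crambin alignments against it. All of them lie above the threshold; the weakest, thn6_horvu, has 40 % identity over 45 residues against a threshold of roughly 39 %.

```python
def hssp_threshold(lali):
    """t(L) from the THRESHOLD header line, in percent identity:
    t(L) = 290.15 * L**-0.562 + 5, with L the alignment length."""
    return 290.15 * lali ** -0.562 + 5


def passes_threshold(ide, lali):
    """True if an alignment reaches the threshold.  `ide` is the %IDE
    column, which the example file writes as a fraction (0.53 = 53 %)."""
    return 100.0 * ide >= hssp_threshold(lali)


# (%IDE, LALI) pairs copied from the protein list of the 1crn example.
crambin_hits = [(1.00, 46), (0.53, 45), (0.53, 43), (0.49, 45),
                (0.47, 45), (0.44, 45), (0.44, 45), (0.40, 45)]
for ide, lali in crambin_hits:
    print(f"LALI {lali}: t = {hssp_threshold(lali):.1f} %, "
          f"%IDE = {100 * ide:.0f} %, pass = {passes_threshold(ide, lali)}")
```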
Part 2) The list of aligned sequences. In real HSSP files the name of each
aligned protein follows the accession number; it has been left out here.
## PROTEINS : EMBL/SWISSPROT identifier and alignment statistics
NR. ID         STRID  %IDE %WSIM IFIR ILAS JFIR JLAS LALI NGAP LGAP LSEQ2 ACCNUM
  1:cram_craab 1CRN   1.00  1.00    1   46    1   46   46    0    0    46 P01542
  2:thn_pyrpu         0.53  0.59    2   46    2   47   45    1    1    47 P07504
  3:thn_dencl         0.53  0.69    2   44    2   44   43    0    0    46 P01541
  4:thn3_visal        0.49  0.66    2   46   28   72   45    0    0   111 P01538
  5:thn_pholi         0.47  0.61    2   46    2   46   45    0    0    46 P01540
  6:thnl_horvu        0.44  0.61    2   46   30   74   45    0    0   137 P09617
  7:thnb_visal        0.44  0.61    2   46    8   52   45    0    0   103 P08943
  8:thn6_horvu        0.40  0.57    2   46    2   46   45    0    0    46 P09618
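Because STRID is blank for most rows, splitting a row on whitespace does not give a fixed number of fields. The Python sketch below parses a row with a regular expression instead, assuming whitespace-separated fields as printed in this example (real HSSP files are fixed-column, so a column-based reader may be more robust). It also checks a consistency rule that follows from the NOTATION definitions if LGAP counts every gap position, which is an assumption: each aligned pair adds one residue to both spans and each gap position to only one, so (ILAS-IFIR+1) + (JLAS-JFIR+1) = 2*LALI + LGAP. All eight rows above satisfy it; thn_pyrpu, for instance, gives 45 + 46 = 2*45 + 1.

```python
import re
from dataclasses import dataclass

# NR:ID, optional STRID (a PDB code such as 1CRN), %IDE, %WSIM,
# eight integer columns (IFIR ... LSEQ2) and ACCNUM.  Anything after
# ACCNUM (the protein name in real files) is ignored.
ROW = re.compile(
    r"^\s*(\d+)\s*:\s*(\S+)\s+(?:(\d\w{3})\s+)?"
    r"([\d.]+)\s+([\d.]+)((?:\s+\d+){8})\s+(\S+)"
)


@dataclass
class AlignedProtein:
    nr: int
    ident: str
    strid: str      # empty when no 3-D structure is known
    ide: float      # fraction of identical residues
    wsim: float
    ifir: int
    ilas: int
    jfir: int
    jlas: int
    lali: int
    ngap: int
    lgap: int
    lseq2: int
    accnum: str


def parse_row(line):
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"not a protein-list row: {line!r}")
    nr, ident, strid, ide, wsim, ints, acc = m.groups()
    return AlignedProtein(int(nr), ident, strid or "", float(ide), float(wsim),
                          *map(int, ints.split()), acc)


def spans_consistent(p):
    """Aligned pairs count in both spans, gap positions in only one
    (assumes LGAP counts every gap position)."""
    test_span = p.ilas - p.ifir + 1
    other_span = p.jlas - p.jfir + 1
    return test_span + other_span == 2 * p.lali + p.lgap


row = parse_row("  2:thn_pyrpu  0.53  0.59  2  46  2  47  45  1  1  47 P07504")
print(row.accnum, row.strid == "", spans_consistent(row))  # P07504 True True
```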
Part 3) The aligned sequences. Residues 10-39 have been removed to save
space.
## ALIGNMENTS 1 - 8
 SeqNo PDBNo AA STRUCTURE BP1 BP2   ACC NOCC  VAR  ....:....1....:....2....:
     1     1 T              0   0    77    2    0  T
     2     2 T  E     -A   34   0A   21    9   23  TSSSSSSS
     3     3 a  E     -A   33   0A    0    9    0  CCCCCCCC
     4     4 b        -     0   0     0    9    0  CCCCCCCC
     5     5 P  S    S+     0   0    52    9   28  PRPPPKPK
     6     6 S  S  > S-     0   0    48    9   35  SNTNSNND
     7     7 I  H  > S+     0   0   123    9   25  ITTTTTTT
     8     8 V  H  > S+     0   0    98    9   47  VWATTTTL
     9     9 A  H  > S+     0   0     6    9   15  AAAGAGGA
    40    40 a        -     0   0    46    9    0  CCCCCCCC
    41    41 P    >   -     0   0    53    9   11  PPPPBPPP
    42    42 G  G >  S+     0   0    75    9   32  GSPSSRSS
    43    43 D  G 3  S+     0   0   116    9   12  DDGDGDDD
    44    44 Y  G <  S+     0   0    66    9    3  YYYYWYYY
    45    45 A    <         0   0    70    8   31  AP PBPPP
    46    46 N              0   0    76    8   31  NK KHKKK
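The alignment block is read column-wise: the k-th character of the alignment string on each row belongs to aligned protein k of the list in part 2. Column 1 therefore reproduces crambin itself, and column 3 (thn_dencl) is blank at positions 45 and 46 because its alignment ends at ILAS 44. NOCC is then the test sequence plus every protein with a non-blank character at that position, giving 2 at position 1 and 8 at positions 45-46. The sketch below assumes each row has already been reduced to its SeqNo and alignment string (in a real file the alignment starts at a fixed column), that trailing blanks may have been stripped, and that a '.' deletion still counts as spanning the position; the function names are hypothetical.

```python
def protein_column(rows, k):
    """Residues of aligned protein k (1-based NR) keyed by SeqNo.

    rows: (SeqNo, alignment string) pairs from the ALIGNMENTS block.
    Positions where the string is blank, or too short because trailing
    blanks were stripped, are not covered by protein k and are left out."""
    out = {}
    for seqno, ali in rows:
        ch = ali[k - 1] if k <= len(ali) else " "
        if ch != " ":
            out[seqno] = ch
    return out


def nocc(ali):
    """NOCC: the test sequence plus every aligned protein with a
    non-blank character here ('.' deletions are counted as spanning)."""
    return 1 + sum(1 for ch in ali if ch != " ")


# A few rows of the crambin example.
rows = [(1, "T"), (2, "TSSSSSSS"), (45, "AP PBPPP"), (46, "NK KHKKK")]
print(protein_column(rows, 3))                 # {2: 'S'}
print([nocc(ali) for _, ali in rows])          # [2, 9, 8, 8]
```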
Part 4) The profile. To save space, the columns for 16 of the 20 amino acid
types have been removed, and, as in part 3, residues 10-39 are not shown.
## SEQUENCE PROFILE AND ENTROPY
SeqNo PDBNo   V   L   N   D  NOCC NDEL NINS ENTROPY RELENT WEIGHT
    1     1   0   0   0   0     2    0    0   0.000      0   1.00
    2     2   0   0   0   0     9    0    0   0.530     24   1.00
    3     3   0   0   0   0     9    0    0   0.000      0   1.33
    4     4   0   0   0   0     9    0    0   0.000      0   1.33
    5     5   0   0   0   0     9    0    0   0.849     39   0.79
    6     6   0   0  44  11     9    0    0   1.215     55   0.75
    7     7   0   0   0   0     9    0    0   0.530     24   0.98
    8     8  22  11   0   0     9    0    0   1.427     65   0.47
    9     9   0   0   0   0     9    0    0   0.637     29   1.01
   40    40   0   0   0   0     9    0    0   0.000      0   1.33
   41    41   0   0   0  11     9    0    0   0.349     16   1.11
   42    42   0   0   0   0     9    0    0   1.149     52   0.84
   43    43   0   0   0  78     9    0    0   0.530     24   1.14
   44    44   0   0   0   0     9    0    0   0.349     16   1.27
   45    45   0   0   0  13     8    0    0   0.900     43   0.81
   46    46   0   0  25   0     8    0    0   0.900     43   0.80
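The ENTROPY and RELENT columns can be recomputed from the alignment block. The example values are reproduced by taking the natural-log Shannon entropy of the residue types at a position (the test residue plus all aligned residues, with B counted as a type of its own) and, for RELENT, dividing by ln(NOCC). For position 2 (test residue T, aligned TSSSSSSS) this gives -(2/9 ln 2/9 + 7/9 ln 7/9) = 0.530 and 100 * 0.530 / ln 9 = 24. The ln(NOCC) normalisation is inferred from these numbers rather than taken from HSSP documentation, and skipping '.' deletions is an assumption, since the example contains none.

```python
import math
from collections import Counter


def column_entropy(test_aa, ali):
    """ENTROPY at one position: Shannon entropy (natural log) of the
    residue types of the test residue plus all aligned residues.
    A lowercase test residue is a Cys; lowercase aligned residues are
    insertion borders and are counted as their uppercase type.
    Blanks (not aligned) and '.' (deletion) are skipped, an assumption."""
    counts = Counter()
    counts["C" if test_aa.islower() else test_aa] += 1
    for ch in ali:
        if ch not in " .":
            counts[ch.upper()] += 1
    n = sum(counts.values())
    return -sum(c / n * math.log(c / n) for c in counts.values())


def relative_entropy(entropy, nocc):
    """RELENT on a 0-100 scale, normalising by ln(NOCC)
    (inferred from the example values)."""
    if nocc < 2:
        return 0
    return round(100 * entropy / math.log(nocc))


# Position 2 of the crambin example: test residue T, NOCC 9.
e = column_entropy("T", "TSSSSSSS")
print(f"{e:.3f}", relative_entropy(e, 9))  # 0.530 24
```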
Part 5) The list of insertions is not used by WHAT IF.
All HSSP files are terminated with two slashes.
SHOHSP will cause WHAT IF to prompt you for the name of the HSSP file, and will then ask, in turn, whether you want to see the header, the names of the aligned sequences, the alignment, and the derived sequence profile. Just answer each question with Y or N, depending on what you want to see.
WHAT IF will also ask whether you want exact matches. You should normally answer this question with YES. However, if something went wrong somewhere along the line, you could try NO, but be aware that in that case WHAT IF will not check anything at all.
WARNING: insertions in the aligned sequences are neglected.
The command PIRHSP will cause WHAT IF to prompt you for the name of an HSSP file. You will then be asked what to do with the two residues that border each (absent) insertion: you can either keep them or remove them, which is equivalent to turning them both into deletions.
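The choice PIRHSP offers can be pictured for a single aligned protein. In the alignment block an insertion is invisible except for the pair of lowercase residues around it (the AvaK notation in the header), so a sequence built from one protein's column either keeps those two residues, in upper case, or turns them into deletions. The sketch below is a hypothetical illustration: the function name and the '-' gap symbol are assumptions, and the PIR files WHAT IF writes may represent gaps differently.

```python
def pir_sequence(column, keep_borders=True):
    """Turn one aligned protein's column (one character per test-sequence
    position) into a sequence string.

    - uppercase letter: aligned residue, copied as is
    - lowercase letter: residue bordering an (ignored) insertion; kept in
      upper case, or made a deletion when keep_borders is False
    - '.' or blank: deletion or uncovered position, written as '-'
    """
    out = []
    for ch in column:
        if ch in " .":
            out.append("-")
        elif ch.islower():
            out.append(ch.upper() if keep_borders else "-")
        else:
            out.append(ch)
    return "".join(out)


print(pir_sequence("AvaK"))                      # AVAK
print(pir_sequence("AvaK", keep_borders=False))  # A--K
```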
If your HSSP file is called, for example, 1xyz.hssp, the PIR format sequence files will be called 1XYZ.101, 1XYZ.102, etc. In other words, the directory part, any leading non-alphanumeric characters and the extension are removed from the HSSP file name, and the rest, in upper case, is used as the basis for the PIR file names.
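The naming rule can be sketched as follows. Two points are assumptions, not documented behaviour: the stem is taken as the first alphanumeric run of the file name once the directory is removed, and aligned sequence k is given the extension 100 + k, which matches the 1XYZ.101, 1XYZ.102 example but has not been checked against WHAT IF. The directory in the usage line is a hypothetical placeholder.

```python
import os
import re


def pir_names(hssp_path, n_aligned):
    """PIR file names for an HSSP file, following the 1xyz.hssp example.

    Assumptions: the stem is the first alphanumeric run of the file name
    once the directory is removed (so leading non-alphanumeric characters
    and the extension are dropped), and aligned sequence k gets the
    extension 100 + k."""
    base = os.path.basename(hssp_path)
    m = re.search(r"[A-Za-z0-9]+", base)
    stem = (m.group(0) if m else base).upper()
    return [f"{stem}.{100 + k}" for k in range(1, n_aligned + 1)]


print(pir_names("/data/hssp/1xyz.hssp", 3))  # ['1XYZ.101', '1XYZ.102', '1XYZ.103']
```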
You are also asked what to do with insertion borders: you can either have them modeled or have them deleted. There is something to be said for both options. The reason to delete them is that they are next to an insertion and are therefore guaranteed to be modeled incorrectly. The reason to keep them is that otherwise their direct neighbours would incorrectly become surface residues. I guess you can come up with a thousand other good reasons yourself.
You will also see a question like: Do you only want to build those models for which a structure exists? If you answer YES, WHAT IF will only build models for those sequences that have an entry in the STRID column. You can use this as a trick to build only a couple of models.
The structures will be generated completely automatically. Be aware of a couple of things:
1) Insertions are not modeled.
2) This option can be extremely time-consuming. Try it out on a small case first. Use BLDFST if you want quick-and-dirty models.
3) Never trust any automatically generated models.
You can use GO in the GRAFIC menu and click MOV+ and MOV- to flip through the models.