You are unlikely to be able to generate HSSP files yourself if you are not working at EMBL, but you can download these files from the EMBL file server. Just send a message containing HELP to the internet address `NETSERV@EMBL-Heidelberg.DE` and you will receive all information about the available data.
You do not have to type the extension of PDB or HSSP files if the extension is brk or pdb for PDB files and hssp for HSSP files. You do not have to type the path (i.e. the directory) if it is the standard PDB or HSSP directory of your system.
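The naming rule above can be sketched as a small function that lists the file names a short name could stand for. This is only an illustration: `candidate_paths`, the `EXTENSIONS` table, the POSIX-style paths and the example directories are assumptions made for this sketch, and WHAT IF itself may resolve names differently.

```python
from pathlib import PurePosixPath

# Default extensions named in the text above (hypothetical lookup table).
EXTENSIONS = {"pdb": ("brk", "pdb"), "hssp": ("hssp",)}


def candidate_paths(name, kind, default_dir):
    """Return the full file names that a possibly abbreviated name may refer to.

    kind is "pdb" or "hssp"; default_dir stands in for the standard
    PDB or HSSP directory of your system (an assumed value).
    """
    path = PurePosixPath(name)
    # No extension typed: try each default extension for this kind of file.
    if path.suffix:
        names = [path.name]
    else:
        names = [f"{path.name}.{ext}" for ext in EXTENSIONS[kind]]
    # No directory typed: fall back to the standard directory.
    if str(path.parent) == ".":
        directory = PurePosixPath(default_dir)
    else:
        directory = path.parent
    return [str(directory / n) for n in names]


print(candidate_paths("1crn", "pdb", "/data/pdb"))
print(candidate_paths("/tmp/1crn.hssp", "hssp", "/data/hssp"))
```

The function only builds names; it does not touch the file system, so checking which candidate actually exists is left to the caller.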
1) Header
2) List of aligned sequences
3) Sequence alignment part
4) Profile

The insertion fragments at the bottom are presently not yet used.

Part 1) The header:

HSSP       HOMOLOGY DERIVED SECONDARY STRUCTURE OF PROTEINS , VERSION 1.0 1991
PDBID      1crn
DATE       file generated on 19-Mar-93
SEQBASE    RELEASE 24.0 OF EMBL/SWISS-PROT WITH 28154 SEQUENCES
PARAMETER  SMIN: -0.5 SMAX: 1.0
PARAMETER  gap-open: 3.0 gap-elongation: 0.1
PARAMETER  conservation weights
PARAMETER  no insertions/deletions in secondary structure allowed
PARAMETER  alignments sorted according to:DISTANCE
THRESHOLD  according to t(L)=(290.15 * L ** -0.562) + 5
REFERENCE  Sander C., Schneider R. : Database of homology-derived protein ...
           structures. Proteins, Proteins, 9:56-68 (1991).
CONTACT    e-mail (INTERNET) Schneider@EMBL-Heidelberg.DE or Sander@EMBL-Heidelberg.DE ...
           / phone +49-6221-387361 / fax +49-6221-387306
AVAILABLE  Free academic use. Commercial users must apply for license.
HEADER     PLANT SEED PROTEIN
COMPND     CRAMBIN
SOURCE     ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED
AUTHOR     W.A.HENDRICKSON,M.M.TEETER
SEQLENGTH  46
NCHAIN     1 chain(s) in 1crn.DSSP data set
NALIGN     8
NOTATION : ID: EMBL/SWISSPROT identifier of the aligned (homologous) protein
NOTATION : STRID: if the 3-D structure of the aligned protein is known, then ...
           STRID is the Protein Data Bank identifier as taken
NOTATION : from the database reference or DR-line of the EMBL/SWISSPROT entry
NOTATION : %IDE: percentage of residue identity of the alignment
NOTATION : %SIM (%WSIM): (weighted) similarity of the alignment
NOTATION : IFIR/ILAS: first and last residue of the alignment in the test sequence
NOTATION : JFIR/JLAS: first and last residue of the alignment in the alignend protein
NOTATION : LALI: length of the alignment excluding insertions and deletions
NOTATION : NGAP: number of insertions and deletions in the alignment
NOTATION : LGAP: total length of all insertions and deletions
NOTATION : LSEQ2: length of the entire sequence of the aligned protein
NOTATION : ACCNUM: SwissProt accession number
NOTATION : PROTEIN: one-line description of aligned protein
NOTATION : SeqNo,PDBNo,AA,STRUCTURE,BP1,BP2,ACC: sequential and PDB residue
NOTATION : numbers, amino acid (lower case = Cys), secondary structure,
NOTATION : bridge partners, solvent exposure as in DSSP (Kabsch and Sander,
NOTATION : Biopolymers 22, 2577-2637(1983)
NOTATION : VAR: sequence variability on a scale of 0-100 as derived from
NOTATION : the NALIGN alignments pair of lower case characters (AvaK) in
NOTATION : the alignend sequence bracket a point of insertion in this sequence
NOTATION : dots (....) in the alignend sequence indicate points of deletion
NOTATION : in this sequence
NOTATION : SEQUENCE PROFILE: relative frequency of an amino acid type at
NOTATION : each position. Asx and Glx are in their acid/amide form in
NOTATION : proportion to their database frequencies
NOTATION : NOCC: number of aligned sequences spanning this position
NOTATION : NDEL: number of sequences with a deletion in the test protein
NOTATION : at this position
NOTATION : NINS: number of sequences with an insertion in the test protein
NOTATION : at this position
NOTATION : ENTROPY: entropy measure of sequence variability at this position
NOTATION : RELENT: relative entropy. entropy normalized to the range 0-100
NOTATION : WEIGHT: conservation weight

Part 2) The list of aligned sequences. In real HSSP files the name of the sequence is written after the accession number.

## PROTEINS : EMBL/SWISSPROT identifier and alignment statistics
NR. ID        STRID  %IDE %WSIM IFIR ILAS JFIR JLAS LALI NGAP LGAP LSEQ2 ACCNUM
1:cram_craab  1CRN   1.00  1.00    1   46    1   46   46    0    0    46 P01542
2:thn_pyrpu          0.53  0.59    2   46    2   47   45    1    1    47 P07504
3:thn_dencl          0.53  0.69    2   44    2   44   43    0    0    46 P01541
4:thn3_visal         0.49  0.66    2   46   28   72   45    0    0   111 P01538
5:thn_pholi          0.47  0.61    2   46    2   46   45    0    0    46 P01540
6:thnl_horvu         0.44  0.61    2   46   30   74   45    0    0   137 P09617
7:thnb_visal         0.44  0.61    2   46    8   52   45    0    0   103 P08943
8:thn6_horvu         0.40  0.57    2   46    2   46   45    0    0    46 P09618

Part 3) The aligned sequences. Residues 10-39 have been deleted to save some space.

## ALIGNMENTS 1 - 8
SeqNo PDBNo AA STRUCTURE BP1 BP2  ACC NOCC VAR  ....:....1....:....2....:
    1     1 T              0   0   77    2   0  T
    2     2 T E -A        34   0A  21    9  23  TSSSSSSS
    3     3 a E -A        33   0A   0    9   0  CCCCCCCC
    4     4 b -            0   0    0    9   0  CCCCCCCC
    5     5 P S S+         0   0   52    9  28  PRPPPKPK
    6     6 S S > S-       0   0   48    9  35  SNTNSNND
    7     7 I H > S+       0   0  123    9  25  ITTTTTTT
    8     8 V H > S+       0   0   98    9  47  VWATTTTL
    9     9 A H > S+       0   0    6    9  15  AAAGAGGA
   40    40 a -            0   0   46    9   0  CCCCCCCC
   41    41 P > -          0   0   53    9  11  PPPPBPPP
   42    42 G G > S+       0   0   75    9  32  GSPSSRSS
   43    43 D G 3 S+       0   0  116    9  12  DDGDGDDD
   44    44 Y G < S+       0   0   66    9   3  YYYYWYYY
   45    45 A <            0   0   70    8  31  AP PBPPP
   46    46 N              0   0   76    8  31  NK KHKKK

Part 4) The profile. The profile values for 16 residues have been removed to save some space.

## SEQUENCE PROFILE AND ENTROPY
SeqNo PDBNo   V   L   N   D NOCC NDEL NINS ENTROPY RELENT WEIGHT
    1     1   0   0   0   0    2    0    0   0.000      0   1.00
    2     2   0   0   0   0    9    0    0   0.530     24   1.00
    3     3   0   0   0   0    9    0    0   0.000      0   1.33
    4     4   0   0   0   0    9    0    0   0.000      0   1.33
    5     5   0   0   0   0    9    0    0   0.849     39   0.79
    6     6   0   0  44  11    9    0    0   1.215     55   0.75
    7     7   0   0   0   0    9    0    0   0.530     24   0.98
    8     8  22  11   0   0    9    0    0   1.427     65   0.47
    9     9   0   0   0   0    9    0    0   0.637     29   1.01
   40    40   0   0   0   0    9    0    0   0.000      0   1.33
   41    41   0   0   0  11    9    0    0   0.349     16   1.11
   42    42   0   0   0   0    9    0    0   1.149     52   0.84
   43    43   0   0   0  78    9    0    0   0.530     24   1.14
   44    44   0   0   0   0    9    0    0   0.349     16   1.27
   45    45   0   0   0  13    8    0    0   0.900     43   0.81
   46    46   0   0  25   0    8    0    0   0.900     43   0.80

Part 5) The list of insertions is not used by WHAT IF. All HSSP files are terminated with two slashes.
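Some of the numbers in the sample listing can be checked by hand, which may help you read the THRESHOLD, ENTROPY and RELENT fields. The sketch below is not the HSSP program's code. It assumes that t(L) is a percentage identity as a function of alignment length, that ENTROPY is a Shannon entropy with natural logarithm over the test residue plus the aligned residues at a position, and that RELENT is that entropy divided by ln(NOCC) on a 0-100 scale. The last two assumptions are inferred from the values in the listing above, and all function names are made up for this example.

```python
import math
from collections import Counter


def threshold(length):
    """t(L) as written on the THRESHOLD header line (assumed to be in percent)."""
    return 290.15 * length ** -0.562 + 5


def column_entropy(test_residue, aligned_column):
    """Shannon entropy (natural log) of one alignment position.

    The test residue is counted together with the aligned residues;
    blanks and dots (no residue at this position) are skipped.
    """
    residues = [test_residue.upper()]
    residues += [c.upper() for c in aligned_column if c.isalpha()]
    total = len(residues)
    counts = Counter(residues)
    return -sum(n / total * math.log(n / total) for n in counts.values())


def relative_entropy(entropy, nocc):
    """Entropy scaled to 0-100 by ln(NOCC); an inferred normalization."""
    if nocc < 2:
        return 0
    return round(100 * entropy / math.log(nocc))


# Position 2 of the sample: test residue T, aligned column TSSSSSSS, NOCC 9.
e2 = column_entropy("T", "TSSSSSSS")
print(round(e2, 3), relative_entropy(e2, 9))

# Position 45: one aligned sequence has ended, so NOCC is 8.
e45 = column_entropy("A", "AP PBPPP")
print(round(e45, 3), relative_entropy(e45, 8))

# Threshold for the alignment length (LALI) of most sequences in the sample.
print(round(threshold(45), 1))
```

Under these assumptions positions 2 and 45 reproduce the ENTROPY and RELENT values shown in the profile, and the threshold for LALI 45 comes out just below the lowest %IDE in the list (0.40, if read as 40 percent).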
SHOHSP will cause WHAT IF to prompt you for the name of the HSSP file. It will thereafter ask you, one by one, whether you want to see the header, the aligned sequence file names, the alignment, and the derived sequence profile. Just answer each of these questions with Y or N depending on what you want to see.
WHAT IF will also ask you whether you want exact matches. You should normally answer this question with YES. However, if something went wrong along the way, you could try NO, but be aware that WHAT IF will in that case not check anything at all.
WARNING: insertions in the aligned sequences are neglected.
The command PIRHSP will cause WHAT IF to prompt you for the name of an HSSP file. Thereafter you will be asked what to do with the two residues that border the (absent) insertion. You can either keep them or remove them; removing them is equivalent to making them both deletions.
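In terms of the alignment strings, the two choices could look like the sketch below. It relies on the HSSP notation quoted in the header (a pair of lower case characters brackets a point of insertion, dots mark deletions). The function name is hypothetical and this is not how WHAT IF is implemented; it only illustrates the effect of the two answers.

```python
import re


def resolve_insertion_borders(aligned, keep):
    """Treat the lower case residue pair that brackets an insertion.

    keep=True  -> the two border residues stay, as ordinary residues.
    keep=False -> both border residues are turned into deletions (dots).
    """
    if keep:
        return re.sub(r"[a-z]{2}", lambda m: m.group().upper(), aligned)
    return re.sub(r"[a-z]{2}", "..", aligned)


print(resolve_insertion_borders("AvaK", keep=True))   # AVAK
print(resolve_insertion_borders("AvaK", keep=False))  # A..K
```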
If your HSSP file is, for example, called 1xyz.hssp, then the PIR format sequence files will be called 1XYZ.101, 1XYZ.102, etc. That is, the directory part and the leading non-alphanumeric part of the HSSP file name are removed, and the rest of the name is used as the basis for the PIR file names.
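The naming rule can be written down as a few lines of code. This is a sketch only: it assumes that the extension is dropped, that the base name is upper-cased as in the 1XYZ example, that numbering starts at 101, and that paths are POSIX-style. `pir_file_names` is a name invented for this illustration.

```python
import re
from pathlib import PurePosixPath


def pir_file_names(hssp_path, n_sequences):
    """Derive PIR file names from an HSSP file name (illustrative only)."""
    name = PurePosixPath(hssp_path).name            # drop the directory part
    stem = name.rsplit(".", 1)[0]                   # drop the extension, if any
    stem = re.sub(r"^[^A-Za-z0-9]+", "", stem)      # drop leading non-alphanumerics
    return [f"{stem.upper()}.{100 + i}" for i in range(1, n_sequences + 1)]


print(pir_file_names("/data/hssp/1xyz.hssp", 2))  # ['1XYZ.101', '1XYZ.102']
```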
You are also prompted about what to do with insertion borders. You can either have them modeled or have them deleted. There is something to be said for both of these options. The reason to delete them would be that they are next to an insertion, and are thus guaranteed to be modeled wrong. The reason to keep them is that otherwise their direct neighbours would incorrectly become surface residues. I guess you can come up with a thousand good reasons yourself too.
You will also see a question like: Do you only want to build those models for which a structure exists? If you say YES, WHAT IF will only build those models for which text is found in the STRID column. You can use this as a trick to build only a couple of models.
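As an illustration of what answering YES selects, the sketch below filters the eight aligned sequences of the 1crn example on their STRID column. The records are typed in by hand from that example, and the dictionary layout and function name are assumptions made for this sketch, not WHAT IF's internal representation.

```python
# ID and STRID columns of the aligned sequences in the 1crn example.
aligned_sequences = [
    {"id": "cram_craab", "strid": "1CRN"},
    {"id": "thn_pyrpu", "strid": ""},
    {"id": "thn_dencl", "strid": ""},
    {"id": "thn3_visal", "strid": ""},
    {"id": "thn_pholi", "strid": ""},
    {"id": "thnl_horvu", "strid": ""},
    {"id": "thnb_visal", "strid": ""},
    {"id": "thn6_horvu", "strid": ""},
]


def with_known_structure(records):
    """Keep only the sequences that have text in the STRID column."""
    return [r["id"] for r in records if r["strid"].strip()]


print(with_known_structure(aligned_sequences))  # ['cram_craab']
```

For this file only one of the eight models would be built.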
The structures will be generated completely automatically. Be aware of a couple of things:
1) Insertions are not modeled.
2) This option can become extremely time consuming. Try it out on a small case. Use BLDFST if you want quick-and-dirty models.
3) Never trust any automatically generated models.
You can use GO in the GRAFIC menu and click MOV+ and MOV- to flip through the models.