LALNVIEW : A graphical viewer for pairwise alignments

!!!!!!!!!!!!!!!!!!!!!!!!!! BETA VERSION !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Please report bugs to duret@dim.hcuge.ch
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Version 2.2, July 1996

Laurent Duret
Medical Biochemistry Department
Centre Medical Universitaire
1211 Geneva 4
Switzerland

Electronic mail address: duret@dim.hcuge.ch
WWW server: http://expasy.hcuge.ch/

Acknowledgments: I thank the GLAXO Institute for Molecular Biology (Geneva) for hardware and support, and Nicolas Guex and Anne Danckaert for their help in porting LALNVIEW to Mac and PC.

----------------
Availability:

Anonymous FTP at: expasy.hcuge.ch
Directory: /pub/lalnview

----------------
1- Introduction

LALNVIEW is a graphical program for visualizing local alignments between two sequences (protein or nucleic acid). Sequences are represented by colored rectangles to give an overall picture of the similarities between them. Blocks of similarity between the two sequences are colored according to the degree of identity between the two segments. See the picture lalnview_1.4.gif (GIF format) to get an idea of what it looks like.

By clicking on a block, the user can display the corresponding local alignment. A block can be repeated in a sequence: clicking repeatedly on one block successively displays all the similar blocks that occur in the other sequence.

----------------
2- Running LALNVIEW

LALNVIEW does not calculate the alignments itself: it uses the output of local alignment programs. LALNVIEW can read alignments produced by three widely used programs: LFASTA, LALIGN and SIM.

LFASTA uses the FASTA heuristic to quickly find local regions of similarity between two sequences [Pearson & Lipman 1988]. SIM and LALIGN are two different implementations of the rigorous algorithm of Huang and Miller [Huang and Miller 1991].
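A small worked computation helps explain what "rigorous" means here. The sketch below is not the Huang and Miller algorithm and is not code taken from SIM or LALIGN. It shows the classic Smith-Waterman dynamic-programming recurrence for the single best local alignment, which is the kind of exhaustive computation that rigorous methods such as Huang and Miller's build on. The scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions only.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, int, int]:
    """Best local alignment score of a and b, with its end position (i, j).

    Scores are illustrative assumptions, not SIM/LALIGN/LFASTA defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j]: best score of a local alignment ending at a[i-1] and b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best = (0, 0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # residue of a aligned to a gap
            left = h[i][j - 1] + gap    # residue of b aligned to a gap
            # Clamping at 0 makes the alignment local: a segment whose
            # running score turns negative is dropped and restarted.
            h[i][j] = max(0, diag, up, left)
            if h[i][j] > best[0]:
                best = (h[i][j], i, j)
    return best


print(smith_waterman("XXACGTXX", "ACGT"))  # (8, 6, 4)
```

The two nested loops fill an (m+1) x (n+1) table. The cost therefore grows with the product of the two sequence lengths, which is why exhaustive methods become slow on long sequences. Reporting the N best non-intersecting alignments, as SIM and LALIGN do, requires more machinery than this sketch shows.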
The Huang and Miller algorithm is guaranteed to find the N best local alignments between two sequences. Its main drawback is that it is much slower than LFASTA (more than 100-fold slower for the comparison of two sequences of 3000 residues each). If the sequences are relatively short (less than 1000 residues) or if calculation time is not limiting, the Huang and Miller algorithm should be preferred to LFASTA.

SIM sources, executables and documentation are distributed along with LALNVIEW. LFASTA and LALIGN are available by anonymous FTP at:

ftp.virginia.edu
Directory /pub/fasta

or

ftp.bio.indiana.edu
Directory /molbio/search

The first step is to run a local alignment program (SIM, LALIGN or LFASTA). All three programs read sequences in FASTA format:

>seq-name annotation of any sort
ATCGGAGTCGATGGTCACCGNTGGCAC
GTACGTACCGTTGTCCAAACTGTGCAY
...

For example, to find with SIM the 7 best local alignments between segments of the two sequences stored in files A and B, using the default parameter values, use the command:

sim 7 A B > search.out

Or with LALIGN:

lalign A B 7 > search.out

Or with LFASTA:

lfasta -b 7 A B > search.out

Then run LALNVIEW:

lalnview search.out &

NB: lalnview needs the two sequence files "A" and "B" to be located in the same directory as "search.out".

NB: Mac users simply have to drag and drop the search output file onto the LALNVIEW icon.

NB: SIM has not been compiled for Macintosh or PC. To run the alignments, use the ExPASy WWW server (protein sequences):

http://expasy.hcuge.ch/sprot/sim-prot.html
or http://expasy.hcuge.ch/sprot/sim-nucl.html

or the ACNUC WWW server (nucleic acid sequences):

http://acnuc.univ-lyon1.fr/lfasta.html

and declare LALNVIEW as a helper application in your WWW browser.

----------------
3- Changing the similarity score threshold

LALNVIEW displays all the local alignments with a similarity score greater than a given threshold value. Change the value in the "Similarity Score Threshold" box and type <Return> to validate the new value.
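The threshold rule just described amounts to a comparison on the score of each alignment. As a rough illustration only, the sketch below is not LALNVIEW's own code. It reads the "a { ... }" blocks of a SIM output written in the lav layout shown in the example of section 8, takes the score from each block's "s" line, and keeps the blocks whose score is strictly greater than the threshold. The function name filter_alignments is hypothetical. The parser also assumes that every item and every closing brace sits on its own line.

```python
def filter_alignments(lav_text: str, threshold: float) -> list[list[str]]:
    """Keep the 'a { ... }' blocks whose 's' score is greater than threshold.

    Hypothetical helper: assumes one item per line, as in the lav
    example of section 8 ('a {', 's 168.0', 'b ...', ..., '}').
    """
    kept: list[list[str]] = []
    block: list[str] = []
    in_block = False
    for raw in lav_text.splitlines():
        line = raw.strip()
        if line == "a {":                 # start of one local alignment
            in_block, block = True, [line]
        elif in_block:
            block.append(line)
            if line == "}":               # end of that alignment
                in_block = False
                # the 's' line carries the similarity score
                score = next((float(item.split()[1]) for item in block
                              if item.startswith("s ")), None)
                if score is not None and score > threshold:
                    kept.append(block)
    return kept


example = """a {
  s 168.0
  l 11 13 52 54 52.4
}
a {
  s 24.0
  l 1 1 24 24 33.3
}"""
print(len(filter_alignments(example, 50)))  # 1: only the 168.0 block passes
print(len(filter_alignments(example, 0)))   # 2: both blocks pass
```

Top-level blocks such as "d {", "s {" and "k {" are ignored because the parser only collects lines after an "a {" line.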
Increase this value to display only the most significant blocks of similarity. If you set the threshold to 0, LALNVIEW displays all the local alignments reported by the local alignment search program.

----------------
4- Saving the alignments in a file

Click on the "Save Alignments" button to write the alignments to a file. By default, this file is named after the input file with the extension ".aln", for example "search.aln".

WARNING: only the local alignments with a similarity score greater than the current threshold value are saved.

----------------
5- Creating a PostScript picture

Click on the "Create Post-Script" button to save the current picture to a file in PostScript format. By default, this file is named after the input file with the extension ".ps", for example "search.ps". You can choose to save either a color or a grey-scale picture.

NB: Mac users can use the freeware PS2EPS+ to translate the PostScript file into a Mac-readable picture (EPS, PICT, TIFF, PCX). PS2EPS+ is distributed along with LALNVIEW.

----------------
6- Adding "Features"

The "Add Features" button allows you to read a list of segment coordinates from a file. These segments are then displayed on the screen as colored boxes. Each line of your features file should be in this format:

SeqNumber Begin End Height Color Label

SeqNumber: 1 for the bottom sequence, 2 for the upper one
Begin: beginning of the feature
End: end of the feature. If "Begin" and "End" are equal, only a line is displayed.
Height: height of the box representing the feature; any value between 0.0 and 2.2
Color: color of the box representing the feature. 135 colors are available (see the list at the end of this file).
Label: any annotation shorter than 80 characters

The file should contain a line "# FEATURES BEGIN" and a line "# FEATURES END". Only the lines between these two are considered.
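Because a malformed line is easy to miss, it can help to check a features file before loading it. The sketch below is a hypothetical helper and is not part of LALNVIEW. It keeps only the lines between "# FEATURES BEGIN" and "# FEATURES END", splits each one into the six fields described above, and treats everything after the color as the label, so the label may contain spaces. It applies only the constraints stated in this section. It does not check color names against the list of 135 colors, and it assumes nothing about how LALNVIEW itself reacts to an invalid line.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    seq: int        # 1 = bottom sequence, 2 = upper sequence
    begin: int
    end: int        # begin == end is drawn as a single line
    height: float   # documented range: 0.0 to 2.2
    color: str
    label: str      # documented limit: fewer than 80 characters


def read_features(text: str) -> list[Feature]:
    """Parse the lines between '# FEATURES BEGIN' and '# FEATURES END'."""
    features: list[Feature] = []
    inside = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if line == "# FEATURES BEGIN":
            inside = True
        elif line == "# FEATURES END":
            inside = False
        elif inside and line:
            # maxsplit=5 keeps a multi-word label (e.g. "Alu repeat") intact
            parts = line.split(None, 5)
            if len(parts) < 6:
                raise ValueError(f"line {lineno}: expected 6 fields, got {len(parts)}")
            f = Feature(int(parts[0]), int(parts[1]), int(parts[2]),
                        float(parts[3]), parts[4], parts[5])
            if f.seq not in (1, 2):
                raise ValueError(f"line {lineno}: SeqNumber must be 1 or 2")
            if not 0.0 <= f.height <= 2.2:
                raise ValueError(f"line {lineno}: Height must be between 0.0 and 2.2")
            if len(f.label) >= 80:
                raise ValueError(f"line {lineno}: Label must be shorter than 80 characters")
            features.append(f)
    return features


sample = """# FEATURES BEGIN
1 1 149 1.0 blue Exon
2 850 950 1.0 yellow Alu repeat
# FEATURES END"""
for feat in read_features(sample):
    print(feat.seq, feat.begin, feat.end, feat.color, feat.label)
```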
Example of a features file:

# FEATURES BEGIN
1 1 149 1.0 blue Exon
1 1 1 1.0 blue Transcription start site
2 700 800 1.0 green Exon
2 800 1000 0.2 blue Intron
2 850 950 1.0 yellow Alu repeat
2 1000 1100 1.0 green Exon
2 1100 1500 0.2 blue Intron
2 1500 1600 1.0 green Exon
# FEATURES END

Click on any "feature" box in the picture to see the corresponding annotation.

WHEN USING LALNVIEW THROUGH THE EXPASY OR ACNUC WWW SERVERS, SEQUENCE FEATURES ARE AUTOMATICALLY EXTRACTED FROM DATABASE ANNOTATIONS.

ExPASy WWW server (protein sequences):
http://expasy.hcuge.ch/sprot/sim-prot.html
or http://expasy.hcuge.ch/sprot/sim-nucl.html

ACNUC WWW server (nucleic acid sequences):
http://acnuc.univ-lyon1.fr/lfasta.html

----------------
7- Changing the colors of the similarity scale

Click on any colored rectangle in the similarity scale to change the color assigned to the corresponding value.

----------------
8- Reading alignments, sequences and features from a single file

LALNVIEW reads data from 3 (or 4) files:
- SIM (or LFASTA or LALIGN) output file
- first sequence file
- second sequence file
- features description file [optional]

In some cases, it may be convenient to gather all these data in a single file.
It is possible to concatenate these 3 (or 4) files into a single file:
- the file must begin with the SIM (or LFASTA or LALIGN) output data
- the first sequence must be preceded by the line: "# SEQUENCES BEGIN 1"
- the second sequence must be preceded by the line: "# SEQUENCES BEGIN 2"
- the second sequence must be followed by the line: "# SEQUENCES END"

On UNIX, this can be done with the following commands:

cat sim_output_file > my_new_file
echo "# SEQUENCES BEGIN 1" >> my_new_file
cat first_seq_file >> my_new_file
echo "# SEQUENCES BEGIN 2" >> my_new_file
cat second_seq_file >> my_new_file
echo "# SEQUENCES END" >> my_new_file
cat feature_file >> my_new_file    # optional: only if you have a features file

Then run LALNVIEW with:

lalnview my_new_file &

Example:

--------------- start of file ------------------------------------
#:lav
d {
  "SIM output with parameters: Used PAM200 matrix O = 12, E = 4"
}
s {
  "seq_file1" 1 110
  "seq_file2" 1 115
}
k {
  "% match"
}
a {
  s 168.0
  b 11 13
  e 110 115
  l 11 13 52 54 52.4
  l 53 56 64 67 25.0
  l 65 69 87 91 4.3
  l 88 93 110 115 78.3
}
a {
  s 24.0
  b 1 1
  e 24 24
  l 1 1 24 24 33.3
}
# SEQUENCES BEGIN 1
>INS_HUMAN
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFY
TPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSIC
SLYQLENYCN
# SEQUENCES BEGIN 2
>INS_MYXGL
MALSPFLAAVIPLVLLLSRAPPSADTRTTGHLCGKDLVNALYIACGVRGF
FYDPTKMKRDTGALAAFLPLAYAEDNESQDDESIGINEVLKSKRGIVEQC
CHKRCSIYDLENYCN
# SEQUENCES END
# FEATURES BEGIN
1 1 24 0.50 yellow SIGNAL
1 25 54 0.50 yellow CHAIN B CHAIN.
1 57 87 0.50 yellow PROPEP C PEPTIDE.
2 96 97 0.10 black TURN
2 102 108 0.50 blue HELIX
2 109 109 0.50 pink STRAND
# FEATURES END
--------------- end of file ------------------------------------

##########################################################################
Known Bugs:

UNIX: LALNVIEW first opens a window named "Lost Window". This window is only used to get the characteristics of the screen, so that the size of the LALNVIEW window opened afterward can be adjusted. The "Lost Window" remains on the screen until you exit LALNVIEW.
HP: for unknown reasons, LALNVIEW produces many error messages when running on HP. To avoid these useless messages, run lalnview with the following command:

lalnview search.out >& /dev/null

##########################################################################

################################
#                              #
#       Available Colors       #
#                              #
################################
#
# aliceblue       green                 navajowhite
# antiquewhite    greenyellow           navy
# aquamarine      grey                  navyblue
# azure           honeydew              oldlace
# beige           hotpink               olivedrab
# bisque          indianred             orange
# black           ivory                 orangered
# blanchedalmond  khaki                 orchid
# blue            lavender              palegoldenrod
# blueviolet      lavenderblush         palegreen
# brown           lawngreen             paleturquoise
# burlywood       lemonchiffon          palevioletred
# cadetblue       lightblue             papayawhip
# chartreuse      lightcoral            peachpuff
# chocolate       lightcyan             pink
# coral           lightgoldenrod        plum
# cornflowerblue  lightgoldenrodyellow  powderblue
# cornsilk        lightgray             purple
# cyan            lightgrey             red
# darkgoldenrod   lightpink             rosybrown
# darkgreen       lightsalmon           royalblue
# darkkhaki       lightseagreen         saddlebrown
# darkolivegreen  lightskyblue          salmon
# darkorange      lightslateblue        sandybrown
# darkorchid      lightslategray        seagreen
# darksalmon      lightslategrey        seashell
# darkseagreen    lightsteelblue        sienna
# darkslateblue   lightyellow           skyblue
# darkslategray   limegreen             slateblue
# darkslategrey   linen                 slategray
# darkturquoise   magenta               slategrey
# darkviolet      maroon                snow
# deeppink        mediumaquamarine      springgreen
# deepskyblue     mediumblue            steelblue
# dimgray         mediumorchid          tan
# dimgrey         mediumpurple          thistle
# dodgerblue      mediumseagreen        tomato
# firebrick       mediumslateblue       turquoise
# floralwhite     mediumspringgreen     violet
# forestgreen     mediumturquoise       violetred
# gainsboro       mediumvioletred       wheat
# ghostwhite      midnightblue          white
# gold            mintcream             whitesmoke
# goldenrod       mistyrose             yellow
# gray            moccasin              yellowgreen
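On systems without the UNIX cat and echo commands, for example Mac or PC, the single-file layout described in section 8 can also be assembled with a short script. This is a hypothetical sketch and not a tool distributed with LALNVIEW. The function names combine and build_single_file are placeholders. The file names in the usage line below simply mirror the UNIX example.

```python
from pathlib import Path
from typing import Optional


def _ended(text: str) -> str:
    # Guarantee a trailing newline so the next marker starts on its own line.
    return text if text.endswith("\n") else text + "\n"


def combine(alignment: str, seq1: str, seq2: str,
            features: Optional[str] = None) -> str:
    """Return the single-file layout of section 8 (hypothetical helper)."""
    parts = [
        _ended(alignment),            # SIM/LFASTA/LALIGN output must come first
        "# SEQUENCES BEGIN 1\n", _ended(seq1),
        "# SEQUENCES BEGIN 2\n", _ended(seq2),
        "# SEQUENCES END\n",
    ]
    if features is not None:          # optional; carries its own FEATURES markers
        parts.append(_ended(features))
    return "".join(parts)


def build_single_file(alignment: str, seq1: str, seq2: str,
                      out: str, features: Optional[str] = None) -> None:
    """File-based wrapper: read the input files and write the combined file."""
    feats = Path(features).read_text() if features else None
    Path(out).write_text(combine(Path(alignment).read_text(),
                                 Path(seq1).read_text(),
                                 Path(seq2).read_text(), feats))
```

Usage: build_single_file("sim_output_file", "first_seq_file", "second_seq_file", "my_new_file", features="feature_file"). Unlike a plain cat, the helper adds a missing final newline to each part. Without that newline, a marker line such as "# SEQUENCES BEGIN 2" could otherwise be glued onto the last residue line.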