Exercise 7 Bioinformatics
Glenda Z. Doblas
I. INTRODUCTION
II. OBJECTIVES
III. MATERIALS/EQUIPMENT
IV. PROCEDURE
c. Paste the unknown DNA sequences in FASTA format (your instructor will provide the
sequences) and click the BLAST button at the bottom of the page. BLAST then searches the
database for DNA sequences that are similar to the query sequence.
d. The results page displays the closest matches. A color-coded graph shows the regions of
similarity, with red indicating the highest similarity and black the lowest.
e. A table of sequences with significant alignments is also shown. The higher the
“Score” value, the closer the match. The lower the “E” value, the more statistically meaningful
the match. What is the best match for the sequence assigned to you?
f. Retrieve human and bacterial nucleotide sequences from NCBI and run the appropriate sequence
through each of the online tools below. Interpret your results.
GeneFinder http://rulai.cshl.edu/tools/genefinder/
GLIMMER http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi
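The reading rule in step e (lower “E” value first, higher “Score” next) can be sketched in
Python. This is only an illustration: the hit descriptions and numbers are made-up
placeholders, not real BLAST output, and the names Hit and rank_hits are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    description: str  # title of the matched sequence (placeholder text here)
    score: float      # value from the "Score" column
    e_value: float    # value from the "E" column


def rank_hits(hits):
    """Order hits best-first: lowest E value, ties broken by highest score."""
    return sorted(hits, key=lambda h: (h.e_value, -h.score))


# Illustrative values only; these are NOT real BLAST results.
example_hits = [
    Hit("hypothetical subject A", 180.0, 3e-45),
    Hit("hypothetical subject B", 412.0, 1e-112),
    Hit("hypothetical subject C", 52.0, 0.002),
]

ranked = rank_hits(example_hits)
best = ranked[0]
print("Best match:", best.description)
```

In the BLAST results table the same ordering is normally already applied, so the best match
is usually the first row; the sketch only makes the comparison explicit.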
c. Retrieve the protein sequence in FASTA format of the first displayed sequence.
e. Paste the sequence into MS Word and edit it, leaving only the name of the organism and the
sequence. Example:
>Prochlorococcus marinus str. MIT 9301
MKIFKFLFVIPLITLIIIFQTSLQNRYLMASDIRDGETIFRNVCAGCHVRGGSVVLKGSKSLKLSDL
EKRGIADVNSITIIANEGIGFMKGYKNKLNDGEDKVLAQWIIQNAEKGWK
The letters correspond to the specific amino acids comprising the cytochrome C protein. What
amino acid is coded by each letter?
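The header edit in step e can also be done with a short script. The Python sketch below
assumes that the organism name is the last part of the FASTA header enclosed in square
brackets; check this against the records you actually download. The accession
"XX_000000.1" and the function name simplify_fasta are made up for illustration.

```python
import re


def simplify_fasta(text):
    """Rewrite each FASTA header as '>organism' and keep sequence lines as-is.

    Assumption: the organism name is the last [bracketed] part of the header.
    Headers without brackets are left unchanged.
    """
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            names = re.findall(r"\[([^\[\]]+)\]", line)
            out.append((">" + names[-1]) if names else line)
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical header; the sequence is the example shown in step e.
raw = (
    ">XX_000000.1 cytochrome c [Prochlorococcus marinus str. MIT 9301]\n"
    "MKIFKFLFVIPLITLIIIFQTSLQNRYLMASDIRDGETIFRNVCAGCHVRGGSVVLKGSKSLKLSDL\n"
    "EKRGIADVNSITIIANEGIGFMKGYKNKLNDGEDKVLAQWIIQNAEKGWK\n"
)

cleaned = simplify_fasta(raw)
print(cleaned)
```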
g. Retrieve sequences from other organisms using the same steps. Edit these sequences and
paste them into the same MS Word text file. To speed up this process, your instructor has
already prepared a text file containing a set of protein sequences from different organisms
representing bacteria, animals, yeasts, molds, algae, and plants.
h. Now, we will align the sequences. Launch the program “ClustalX”. From the File menu, select
Load Sequences and load the text file of protein sequences you saved previously.
i. The sequences are displayed in their original, unaligned state. Note how the sequences
differ in length. You can scroll through the sequences from left to right or right to left.
Notice the different colors; they correspond to the different amino acids. Notice also the gray
bars below the ruler. The higher the bar, the more conserved the sequences are at that position.
j. From the Alignment menu, choose Do Complete Alignment and select the folder where you want
to save the .aln and .dnd files.
k. The sequences are now aligned. What do you notice about the gray bars and the positions of
the sequences? Did you observe symbols above the sequences? A “*” indicates a single,
fully conserved residue; “:” indicates that amino acids with ‘strong’ R-group similarity are
conserved; and “.” indicates that amino acids with ‘weak’ R-group similarity are conserved.
l. Now, calculate the genetic distances between the aligned sequences. Go to the Trees menu
and select Bootstrap N-J Tree. A .phb file containing genetic distance data will be written. This
file will be used to construct the phylogenetic tree.
m. Now, go back to the folder where you saved the files and check that the set is complete.
Three of these files are written by ClustalX. The .aln file is the sequence alignment file. The
.dnd and .phb files are genetic distance files, which can also be used by other software to build
phylogenetic trees.
n. Now, let’s build the phylogenetic tree using the N-J Plot software. Launch the N-J Plot
application. Click on the File menu and open the .phb file written in step l.
o. A phylogenetic tree is now displayed. Examine the branches. Are they of the same length?
Interpret the tree.
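The ideas behind steps k and l can be illustrated with a small Python sketch. It marks fully
conserved columns with “*” and computes a simple proportion of differing sites (p-distance)
for each pair of sequences. The toy alignment and the function names are made up for
illustration, and ClustalX's own distance calculation may differ (for example, in how gaps
and corrections are handled), so treat the numbers as a teaching aid only.

```python
from itertools import combinations


def conserved_marks(aligned):
    """Return '*' for columns where all sequences share one residue, else ' '."""
    marks = []
    for column in zip(*aligned.values()):
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (gap-free) columns")
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy aligned sequences (equal length, '-' marks a gap); not real data.
toy_alignment = {
    "org1": "MKIF-KGC",
    "org2": "MKLFAKGC",
    "org3": "MRIF-KGC",
}

marks = conserved_marks(toy_alignment)
distances = {
    (m, n): p_distance(toy_alignment[m], toy_alignment[n])
    for m, n in combinations(toy_alignment, 2)
}

for name, seq in toy_alignment.items():
    print(f"{name}  {seq}")
print(f"      {marks}")
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Pairs with smaller distances are expected to sit closer together in the tree, which is one
way to check your reading of the branch lengths in step o.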
REFERENCES:
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990). Basic local alignment
search tool. J. Mol. Biol., 215: 403-410.
Perrière, G. & Gouy, M. (1996). WWW-Query: An on-line retrieval system for biological
sequence banks. Biochimie, 78: 364-369.
EXERCISE 7
BIOINFORMATICS: EXPLORING GENETIC DATABASES
AND DATA MINING TOOLS
DATA SHEET
V. DATA/OBSERVATION
Paste screenshots of the procedure and of the output from BLAST, GeneFinder, and GLIMMER.
Use separate sheet/s.
VI. CONCLUSION
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
_________
VII. QUESTIONS
1. How important is bioinformatics in the field of genetics? _____________________________
___________________________________________________________________________
___________________________________________________________________________
4. Aside from protein sequences, what other sequences can be used in constructing phylogenetic
trees?
___________________________________________________________________________
5. What other software/applications are used for sequence alignment and construction of
phylogenetic trees?
___________________________________________________________________________
Enumerate the 20 amino acids and the single letter that symbolizes each.
Table 20. Single-letter symbols of 20 amino acids
Amino Acid Code Amino Acid Code Amino Acid Code Amino Acid Code