Exercise 7 Bioinformatics

This document describes an exercise on exploring genetic databases and data mining tools in bioinformatics. The exercise introduces students to key concepts like searching online databases, using BLAST for gene identification and sequence analysis, and constructing phylogenetic trees. Students will use tools like BLAST, GeneFinder, GLIMMER and ClustalX to analyze unknown DNA sequences, predict genes, align protein sequences and build phylogenetic trees to study taxonomic relationships between species.
EXERCISE 7

BIOINFORMATICS: EXPLORING GENETIC DATABASES
AND DATA MINING TOOLS

Glenda Z. Doblas

I. INTRODUCTION

Bioinformatics is the branch of biology that links computer technologies to the
biological sciences, particularly genetics and molecular biology. This exercise
introduces students to key concepts and practical skills in bioinformatics. Specifically,
students will search online databases and use data mining tools for sequence analysis,
gene prediction and identification, and construction of phylogenetic trees.

II. OBJECTIVES

At the end of the activity, the student must be able to:
1. Search the web for online databases used in genetic studies.
2. Submit sequences to BLAST and retrieve the results.
3. Use BLAST for gene identification.
4. Construct phylogenetic trees.

III. MATERIALS/EQUIPMENT

Computer with internet connection
Unknown DNA sequences in FASTA format (provided by the instructor)
Amino acid sequences for cytochrome c of different species (provided by the instructor)

IV. PROCEDURE

1. Gene Prediction and Identification

a. Open your browser and go to http://www.ncbi.nlm.nih.gov. This leads to the Entrez search
and retrieval system of NCBI (National Center for Biotechnology Information) of the
National Library of Medicine at the National Institutes of Health. Here, you will see several
links to different databases and data mining tools. Click on BLAST.

b. Select nucleotide blast.

c. Paste the unknown DNA sequence in FASTA format (your instructor will provide the
sequences) and click the BLAST button at the bottom of the page. This searches for DNA
sequences that are similar to the query sequence.
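
Before pasting a query into BLAST, it can help to confirm that the text really is in FASTA format: a header line starting with ">" followed by one or more lines of sequence. The sketch below is a minimal illustration using only the Python standard library. The function names and the example sequence are invented for illustration and are not part of NCBI's tools.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_nucleotide(sequence):
    """Return True if the sequence uses only the DNA letters A, C, G, T and N."""
    return bool(sequence) and set(sequence.upper()) <= set("ACGTN")


# Hypothetical query, invented for illustration only.
example = """>unknown_sequence_1
ATGGCGTACGTTAGC
GGATCCTAA
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence), is_nucleotide(sequence))
```

A sequence that fails the `is_nucleotide` check is probably a protein sequence and belongs in a protein search rather than nucleotide blast.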

d. The results page displays the closest matches. A color-coded graph shows the regions of
similarity, with red indicating the highest similarity and black the lowest.

e. A table of sequences producing significant alignments is also shown. The higher the
“Score” value, the closer the match. The lower the “E” value, the more statistically meaningful
the match. What is the best match for the sequence assigned to you?
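
The rule in step e (high Score, low E value) can be sketched in a few lines of Python. The hit descriptions and numbers below are invented placeholders, not real BLAST output. The second function uses the standard relation between a bit score S' and the E value, E = m × n × 2^(-S'), where m and n are the query and database lengths. BLAST itself uses effective lengths, so treat this as an approximation that only shows why a higher score gives a lower E value.

```python
from collections import namedtuple

Hit = namedtuple("Hit", ["description", "score", "e_value"])


def best_hit(hits):
    """Return the hit with the lowest E value, breaking ties by highest score."""
    return min(hits, key=lambda h: (h.e_value, -h.score))


def expected_chance_hits(bit_score, query_length, database_length):
    """Approximate E value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Invented placeholder hits, for illustration only.
hits = [
    Hit("hypothetical hit B", 420.0, 3e-115),
    Hit("hypothetical hit A", 950.0, 0.0),
    Hit("hypothetical hit C", 51.0, 0.002),
]

top = best_hit(hits)
print(top.description, top.score, top.e_value)
```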

f. Retrieve human and bacterial nucleotide sequences from NCBI and run each sequence
through the appropriate online tool below. Interpret your results.
GeneFinder http://rulai.cshl.edu/tools/genefinder/
GLIMMER http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi
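
To get a feel for what a gene prediction tool looks for, the sketch below scans a DNA sequence for open reading frames (ORFs): a start codon (ATG) followed in the same reading frame by a stop codon. This is a deliberately simple stand-in and is not the method used by GeneFinder or GLIMMER, which rely on statistical models of coding sequence. It only checks the forward strand, and the function name, the `min_codons` parameter and the example sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Find ORFs on the forward strand.

    Returns a list of (start, end, frame) tuples, where start is the 0-based
    index of the A in ATG and end is the index just past the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it is long enough (stop codon included).
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Invented example sequence, for illustration only.
sequence = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(sequence):
    print(frame, start, end, sequence[start:end])
```

Real bacterial genes are usually hundreds of codons long, so a realistic run would use a much larger `min_codons` value and would also scan the reverse complement.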

2. Constructing Phylogenetic Trees


Using the amino acid sequence of cytochrome c, a protein found in the mitochondria of
eukaryotes and also in bacteria, you will determine the taxonomic position of cyanobacteria in
relation to other species. Your instructor will provide the amino acid sequences for
cytochrome c of different species.
a. Open http://www.ncbi.nlm.nih.gov and select the Protein database.
b. In the Entrez search box, type the following: cyanobacteria “cytochrome c” [prot]

c. Retrieve the protein sequence in FASTA format of the first displayed sequence.

d. Highlight and copy the sequence.

e. Paste the sequence into MS Word and edit it, leaving only the name of the organism and the
sequence. Example:
>Prochlorococcus marinus str. MIT 9301
MKIFKFLFVIPLITLIIIFQTSLQNRYLMASDIRDGETIFRNVCAGCHVRGGSVVLKGSKSLKLSDL
EKRGIADVNSITIIANEGIGFMKGYKNKLNDGEDKVLAQWIIQNAEKGWK

The letters correspond to the specific amino acids that make up the cytochrome c protein. What
amino acid does each letter represent?

f. Save the MS Word file as a plain text (.txt) file.

g. Retrieve sequences from other organisms using the same steps. Edit these sequences and
paste them into the same text file. To speed up this process, your instructor has
already prepared a text file containing a set of protein sequences from different organisms
representing bacteria, animals, yeasts, molds, algae and plants.
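
The manual editing in steps e to g can also be sketched as a short script. The sketch assumes that each downloaded header ends with the organism name in square brackets, which is the usual layout of NCBI protein FASTA headers; headers without brackets are left unchanged. The accession in the example header is a made-up placeholder, and the function names are hypothetical.

```python
import re


def organism_from_header(header):
    """Return the text inside the trailing [...] of a header, or the header itself."""
    match = re.search(r"\[([^\[\]]+)\]\s*$", header)
    return match.group(1) if match else header.strip()


def simplify_fasta(text):
    """Rewrite every header line so that it holds only the organism name."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            out.append(">" + organism_from_header(line[1:]))
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# The accession "ABC00000.1" is a placeholder; the sequence is the example from step e.
example = """>ABC00000.1 cytochrome c [Prochlorococcus marinus str. MIT 9301]
MKIFKFLFVIPLITLIIIFQTSLQNRYLMASDIRDGETIFRNVCAGCHVRGGSVVLKGSKSLKLSDL
EKRGIADVNSITIIANEGIGFMKGYKNKLNDGEDKVLAQWIIQNAEKGWK
"""

simplified = simplify_fasta(example)
print(simplified)
```

Running `simplify_fasta` on each downloaded record and writing the results one after another into a single .txt file gives the multi-sequence input that the alignment step expects.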

h. Now, we will align the sequences. Launch the program “ClustalX”. In the File menu, select Load
Sequences and load the text file of protein sequences you saved previously.

i. The sequences are displayed in their original, unaligned state. Note how the lengths of the
sequences differ from one another. You can scroll through the sequences from left to right or
right to left. Notice the different colors, which correspond to the different amino acids. Notice
also the gray bars below the ruler. The higher the bar, the more conserved the sequences are at
that position.

j. From the alignment menu, choose Do Complete Alignment and select the folder where you want
to save the .aln and .dnd files.

k. The sequences are now aligned. What do you notice about the gray bars and the positions of
the sequences? Do you see symbols above the sequences? A “*” indicates a single
fully conserved residue, “:” indicates that one of the ‘strong’ amino acid R groups is conserved,
and “.” indicates that one of the ‘weak’ amino acid R groups is conserved.
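
The “*” symbol can be reproduced with a short sketch: a column is fully conserved when every aligned sequence has the same residue there and none has a gap. The sketch covers only the “*” case and does not attempt the ‘strong’ and ‘weak’ group symbols. The three short aligned sequences are invented for illustration.

```python
def conserved_columns(aligned):
    """Return a string with '*' at fully conserved columns and ' ' elsewhere."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        # Fully conserved: one distinct residue, and it is not a gap.
        fully_conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if fully_conserved else " ")
    return "".join(marks)


# Invented aligned fragments; '-' marks a gap inserted by the alignment.
aligned = [
    "MK-CAGCH",
    "MKLCSGCH",
    "MRLCAGCH",
]

print(conserved_columns(aligned))
for seq in aligned:
    print(seq)
```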

l. Now, calculate the genetic distances between the aligned sequences. Go to the Trees menu
and select Bootstrap N-J Tree. A .phb file containing genetic distance data will be written. This
will be used to construct the phylogenetic tree.
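
As a rough picture of what a genetic distance is, the sketch below computes the proportion of differing positions (often called the p-distance) between each pair of aligned sequences, skipping any column where either sequence has a gap. This is one simple definition chosen for illustration; ClustalX's own calculation may differ in details such as gap handling and optional corrections. The sequence names and fragments are invented.

```python
def p_distance(a, b):
    """Proportion of differing residues between two aligned sequences, ignoring gaps."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(aligned):
    """Return {(name1, name2): distance} for every pair of named sequences."""
    names = list(aligned)
    return {(p, q): p_distance(aligned[p], aligned[q]) for p in names for q in names}


# Invented aligned fragments keyed by made-up species labels.
aligned = {
    "species_A": "MK-CAGCH",
    "species_B": "MKLCSGCH",
    "species_C": "MRLCAGCH",
}

matrix = distance_matrix(aligned)
for (p, q), d in sorted(matrix.items()):
    print(p, q, round(d, 3))
```

Pairs with smaller distances are joined first when a neighbor-joining (N-J) tree is built, which is why closely related species end up on neighboring branches.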

m. Now, go back to the folder where you saved the files and check that all the files are
there. Three of these files are written by ClustalX. The .aln file is the sequence alignment file. The
.dnd and .phb files are genetic distance files, which can also be used by other software to build
phylogenetic trees.

n. Now, let’s build the phylogenetic tree using the N-J Plot software. Launch the N-J Plot
application. Click on the File menu and open the .phb file written in step l.

o. A phylogenetic tree is now displayed. Examine the branches. Are they of the same length?
Interpret the tree.
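
The tree file written by ClustalX is plain text in Newick format, in which nested parentheses describe the branching and each `name:number` pair gives a leaf and its branch length. The sketch below pulls out the leaf names and branch lengths so they can be compared as numbers. The tree string is an invented example, not output from this exercise, and the function name is hypothetical.

```python
import re

# A leaf follows "(" or "," and has the form name:length.
# Internal branches follow ")" and are therefore not matched.
LEAF_PATTERN = re.compile(
    r"[(,]\s*([^\s(),:;]+)\s*:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def leaf_branch_lengths(newick):
    """Return {leaf_name: branch_length} for every named leaf in a Newick string."""
    return {name: float(length) for name, length in LEAF_PATTERN.findall(newick)}


# Invented example tree with made-up names and branch lengths.
tree = "((human:0.01,mouse:0.05):0.20,yeast:0.45,cyano:0.80);"

lengths = leaf_branch_lengths(tree)
for name, length in sorted(lengths.items(), key=lambda item: item[1]):
    print(name, length)
```

Longer branches indicate more sequence change along that lineage, which is one way to approach the question of whether the branches are the same length.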

REFERENCES:

Altschul, S.F., W. Gish, W. Miller, E.W. Myers & D.J. Lipman. (1990). Basic local alignment
search tool. J. Mol. Biol. 215: 403-410.

Larkin, M.A., G. Blackshields, N.P. Brown, R. Chenna, P.A. McGettigan, H. McWilliam, F.
Valentin, I.M. Wallace, A. Wilm, R. Lopez, J.D. Thompson, T.J. Gibson & D.G. Higgins.
(2007). Clustal W and Clustal X version 2.0. Bioinformatics 23 (21): 2947-2948.

Perrière, G. & M. Gouy. (1996). WWW-Query: An on-line retrieval system for biological
sequence banks. Biochimie 78: 364-369.

EXERCISE 7
BIOINFORMATICS: EXPLORING GENETIC DATABASES
AND DATA MINING TOOLS

DATA SHEET

Name: ______________________________ Date:__________ Rating: ____


Group No: ____ Lab. Schedule: _________ Lab. Instructor: ________________

V. DATA/OBSERVATION
Paste screenshots of the procedure and the output from BLAST, GeneFinder and GLIMMER.
Use separate sheet/s.

VI. CONCLUSION
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
_________

VII. QUESTIONS
1. How important is bioinformatics in the field of genetics? _____________________________
___________________________________________________________________________
___________________________________________________________________________

2. What is the best match of the sequence assigned to you? Why?


___________________________________________________________________________

3. What are the uses of a phylogenetic tree?


___________________________________________________________________________
___________________________________________________________________________

4. Aside from protein sequences, what other sequences can be used in constructing phylogenetic
trees?
___________________________________________________________________________

5. What other software/applications are used for sequence alignment and construction of
phylogenetic trees?
___________________________________________________________________________

Enumerate the 20 amino acids and the single-letter symbol for each.
Table 20. Single-letter symbols of 20 amino acids
Amino Acid Code Amino Acid Code Amino Acid Code Amino Acid Code

6. Paste your phylogenetic tree. Interpret.
