Lab Report 03
EXPERIMENT – 03
ABSTRACT
BLAST is one of the most widely used programs in bioinformatics research. Its main function
is to compare a sequence of interest, the query sequence, with the sequences in a large
database. BLAST then reports the best matches, or hits, found in the database.
This simple program has two primary applications. First, if the function of the query
sequence is not known, it can be inferred from the accepted functions of
similar sequences. Second, if the researcher has a query sequence with a known function,
they can identify sequences in the database that have similar functions. The program
compares nucleotide or protein sequences with sequence databases and calculates the statistical
significance of the matches. BLAST can be used to infer functional and evolutionary
relationships between sequences as well as to identify members of gene families. The
usefulness of BLAST therefore depends on the researcher's query sequence and the database
searched. Furthermore, BLAST calculates an expected value (E-value), which estimates the
number of matches between two sequences that would be expected by chance. It is based on
local alignment of sequences. This report describes how to use BLAST for a homology search
with a nucleotide sequence retrieved from GenBank, and how to view and use nucleotide
BLAST for a nucleotide sequence similarity search.
INTRODUCTION
Sequence alignment is one of the most common tasks in biotechnology. It is found in almost
all research and development activities across the biological sector, including
academia, biotechnology, services, software companies, pharmaceutical
companies and hospitals.
The National Center for Biotechnology Information (NCBI) developed a widely used version
of the Basic Local Alignment Search Tool (BLAST).
Of all the sequence alignment algorithms, BLAST is the most commonly used. BLAST is a group of
programs designed to perform similarity searches against a database of sequences. BLAST
can be used to gain an understanding of evolutionary relationships and to
predict the function and biological significance of gene products (BLAST: Basic Local
Alignment Search Tool, 2020). Typically, a query nucleotide or protein sequence is compared
with a sequence database to detect similarities and matches. Its
success and popularity come from its speed, its sensitivity, and its statistical
assessment of matches.
The best alignment between the query sequence and a subject sequence is called a hit. The
E-value (expected value) is the number of hits that would be expected to occur purely by
chance when the query sequence is searched against a database of random sequences
(E-value & Bit-score - Metagenomics, 2020). In general, a low E-value (less than 0.1)
means that the hit is significant.
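As a minimal sketch of this rule of thumb, the Python snippet below keeps only hits whose E-value is under 0.1 and orders them from most to least significant. The `Hit` type, the function name, and the example accessions and values are hypothetical placeholders, not results from this experiment.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST hit; the field names are illustrative, not a BLAST format."""
    accession: str
    e_value: float


def significant_hits(hits, threshold=0.1):
    """Return hits with an E-value below the threshold, lowest E-value first."""
    kept = [h for h in hits if h.e_value < threshold]
    return sorted(kept, key=lambda h: h.e_value)


# Hypothetical hits, used only to illustrate the filtering step.
example = [Hit("HYPO_A", 2e-50), Hit("HYPO_B", 3.0), Hit("HYPO_C", 1e-5)]
for hit in significant_hits(example):
    print(hit.accession, hit.e_value)
```

The 0.1 cut-off is the one quoted in the text; a stricter threshold can be passed through the `threshold` argument.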
OBJECTIVES
1. To learn how to use BLAST for a homology search with a nucleotide sequence
retrieved from GenBank.
2. To view and use nucleotide BLAST for nucleotide sequence similarity search.
MATERIALS
1. Computer.
2. Internet connection.
3. NCBI website.
METHODS AND RESULTS
1. First, the BLAST homepage was accessed using the following link:
https://blast.ncbi.nlm.nih.gov/.
2. Nucleotide BLAST was opened, and the example sequence was inserted from the data set at
https://digitalworldbiology.com/BLAST/sequences
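The same submission can also be scripted instead of using the web form. The sketch below only builds, and does not send, a request for the NCBI BLAST URL API using the `CMD`, `PROGRAM`, `DATABASE` and `QUERY` parameters. The database name `nt`, the function name, and the short placeholder query are assumptions for illustration and were not part of this experiment.

```python
from urllib.parse import urlencode
from urllib.request import Request

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blastn_request(sequence, database="nt"):
    """Build (but do not send) a request that submits a blastn search."""
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": "blastn",   # nucleotide-to-nucleotide search
        "DATABASE": database,  # assumed database name
        "QUERY": sequence,     # the query sequence
    }
    # Passing data makes this a POST request when it is eventually sent.
    return Request(BLAST_URL, data=urlencode(params).encode("ascii"))


# Placeholder query, not the example sequence from the data set.
request = build_blastn_request("ACGTACGTACGT")
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Sending the request with `urllib.request.urlopen` would return a page containing a request ID that is then used to fetch the results; that step is left out here so the sketch runs without a network connection.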
In the experiment, the sequence labelled Example Sequence was obtained from
the website (https://digitalworldbiology.com/BLAST/sequences). The entire
sequence was then copied and pasted into the query box on the BLAST page
(http://blast.ncbi.nlm.nih.gov/). All other settings were left at their default values, the
BLAST button was clicked, and within a few seconds the blastn results were displayed
(Nucleotide to Nucleotide BLAST (blastn) | KitBase, 2020). Scrolling through the BLAST
results, the following sections were identified:
Header
Description
Graphic summary
Alignments
Taxonomy
The header section included details about the query, such as the query ID and query
length. It also described the database, the nucleotide collection, including its title, brief
description, molecular type, date of update, and number of sequences (2020). The
header also listed the type of program used, blastn. blastn is a general-purpose nucleotide
search and alignment program that can be used to align tRNA
or rRNA sequences as well as mRNA or genomic DNA sequences containing a mixture of
coding and noncoding regions (BLAST: Basic Local Alignment Search Tool, 2020).
The description section contained various search parameters and statistics. The graphic
summary showed the sequences that most closely matched the query
sequence, together with their details. The alignment section showed the aligned regions
between the similar sequences and the query sequence. The taxonomy section offered three
views: Organism Report, Lineage Report, and Taxonomy Report.
The alignment section included the definition line, the length of the matched sequence, the
sequence identities, the score, and the E-value. The Expect value (E) is a parameter
that describes the number of hits that can be expected by chance when a
database of a particular size is searched. It decreases exponentially as the score (S) of
the match increases (E-value & Bit-score - Metagenomics, 2020).
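The exponential dependence can be made concrete with the standard relation between bit score and E-value, E = m * n * 2^(-S'), where m is the query length, n is the total database length, and S' is the bit score. The sketch below is a simplification: BLAST itself uses effective lengths that are corrected for edge effects, and the function name and the search-space numbers are assumptions for illustration only.

```python
def expected_hits(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2 ** (-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search space: a 400-base query against a 1e9-base database.
for score in (40, 50, 60):
    print(score, expected_hits(score, 400, 1_000_000_000))
```

Each additional bit of score halves the expected number of chance hits, which is why high-scoring alignments have E-values very close to zero.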
The statistics line of each alignment also reported the identical residues in the alignment
(identities) and the number of gaps used in the alignment. Finally, the actual alignment was
shown, with the query sequence on top and the database sequence below it. The numbers on
either side of the alignment indicated the positions of the nucleotides in each sequence.
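To illustrate how the identities and gaps figures relate to the displayed alignment, the sketch below counts them from two aligned rows. The function name and the short example rows are hypothetical and are not taken from the BLAST output of this experiment; `-` is assumed to be the gap character.

```python
def alignment_summary(query_row, subject_row):
    """Return (identities, gaps, percent identity) for two aligned rows."""
    if len(query_row) != len(subject_row) or not query_row:
        raise ValueError("aligned rows must be non-empty and of equal length")
    pairs = list(zip(query_row, subject_row))
    # A column is an identity when both rows hold the same residue (not a gap).
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    # A column is a gap when either row holds the gap character.
    gaps = sum(1 for q, s in pairs if "-" in (q, s))
    return identities, gaps, 100.0 * identities / len(pairs)


# Hypothetical aligned rows: query on top, subject below.
identities, gaps, percent = alignment_summary("ACGT-ACGT", "ACGTTACCT")
print(f"Identities = {identities}/9 ({percent:.0f}%), Gaps = {gaps}/9")
```

Both counts are expressed relative to the alignment length, including gap columns, which matches the way the fractions are laid out in the alignment view.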
REFERENCES
Blast.ncbi.nlm.nih.gov. 2020. BLAST: Basic Local Alignment Search Tool. [online]
Available at: <https://blast.ncbi.nlm.nih.gov/Blast.cgi> [Accessed 9 November 2020].
2020. [online] Available at: <https://pseudomonas.com/blast/setnblast> [Accessed 10
November 2020].
Kitbase.ucdavis.edu. 2020. Nucleotide To Nucleotide BLAST (Blastn) | Kitbase.
[online] Available at: <https://kitbase.ucdavis.edu/blast/nucleotide/nucleotide>
[Accessed 10 November 2020].
Metagenomics.wiki. 2020. E-Value & Bit-Score - Metagenomics. [online] Available
at: <http://www.metagenomics.wiki/tools/blast/evalue> [Accessed 10 November
2020].
Wheeler, D. and Bhagwat, M. 2020. BLAST QuickStart. [online] Ncbi.nlm.nih.gov.
Available at: <https://www.ncbi.nlm.nih.gov/books/NBK1734/> [Accessed 10
November 2020].
POST-LAB QUESTIONS
1. Perform a sequence similarity search using nucleotide BLAST with unknown sequences 2
and 3 from the sequence data set at
http://www.digitalworldbiology.com/BLAST/62000sequences.html
Sequence 2
a) How long is the sequence that was used to search the database?
395
Raphanus sativus
Figure 10: The description section of sequence 2
c) Which hit was statistically more significant? Explain
Raphanus sativus antifungal protein 1 preprotein (Rs-AFP1) mRNA, complete cds
The most significant hit must have the lowest E-value, because significance
decreases as the E-value increases. Furthermore, the maximum percentage
identity corresponds to the match to the subject sequence with the highest percentage
of identical bases.
a) How long is the sequence that was used to search the database?
420
Danio rerio
The most significant hit must have the lowest E-value, because significance decreases
as the E-value increases. Furthermore, the maximum percentage identity
corresponds to the match to the subject sequence with the highest percentage of
identical bases.
AJ001596.1
e) What is the possible function of the protein that is specified by the unknown
DNA sequence?