Lab Report 03

1. BLAST is a program that compares a query sequence to sequence databases and identifies sequences that are similar above a threshold. It provides hits that are the best matches in the database along with statistics like E values. 2. The experiment used BLAST to search a nucleotide sequence query against the nr database. It identified a 100% match to the Homo sapiens tyrosinase gene with an E value of 0, indicating high significance. 3. BLAST allows identification of evolutionary relationships between sequences and prediction of gene function based on sequence similarity. It works by finding local rather than global alignments between query and database sequences.

Uploaded by

Dew

EXPERIMENT – 03

SEQUENCE SIMILARITY SEARCH - 01

EB3233: BIOINFORMATICS LABORATORY

ABSTRACT
BLAST is one of the most widely used programs in bioinformatics research. Its main function
is to compare a sequence of interest, the query sequence, with the sequences in a large
database. BLAST then reports the best matches, or hits, found in the database.
The program has two primary applications. First, if the function of the query
sequence is not known, it can be inferred from the accepted functions of
similar sequences. Second, if the researcher has a query sequence with a known function,
they can identify sequences in the database that are likely to have similar functions. The program
compares nucleotide or protein sequences with sequence databases and calculates the statistical
significance of the matches. BLAST can be used to infer functional and evolutionary
relationships between sequences as well as to identify members of gene families. The
usefulness of a BLAST search therefore depends on the researcher's query sequence and the
chosen database. Furthermore, BLAST calculates an expect value (E-value), which estimates the
number of hits of similar quality that would be expected by chance. The search is based on local
alignment of sequences. This report describes how to use BLAST for a homology search with a
nucleotide sequence retrieved from GenBank, and how to view and use nucleotide BLAST for
nucleotide sequence similarity searches.

INTRODUCTION
Sequence alignment is one of the most common tasks in bioinformatics. It is used in the
research and development activities of many organisations in the field of
biology, including academia, biotechnology, service and software companies, pharmaceutical
companies, and hospitals.

The National Center for Biotechnology Information (NCBI) developed a widely used version
of the Basic Local Alignment Search Tool (BLAST).

Of all the sequence alignment tools, BLAST is the most commonly used. BLAST is a group of
programs designed to perform similarity searches against a database of sequences. It
can be used to gain an understanding of evolutionary relationships and to
predict the function and biological significance of gene products (BLAST: Basic Local
Alignment Search Tool, 2020). Typically, a nucleotide or protein query sequence is compared
with a sequence database to detect similar sequences. Its
success and popularity come from its speed, its sensitivity, and the statistical assessment
of its results.

The BLAST programs are as follows:

• blastn - Nucleotide query vs Nucleotide database

• blastp - Protein query vs Protein database

• blastx - Translated nucleotide (6 frames) query vs Protein database

• tblastn - Protein query vs Translated nucleotide (6 frames) database

• tblastx - Translated nucleotide (6 frames) query vs Translated nucleotide (6 frames) database
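
The choice of program in the list above depends only on the type of the query and the type of the database. As a minimal sketch, the lookup can be written as a small table; the function name and the type labels used here are illustrative and are not part of BLAST itself.

```python
# Map (query type, database type) to the BLAST program from the list above.
# "translated_nucleotide" means a nucleotide sequence translated in all six
# reading frames. The labels are illustrative, not BLAST terminology.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("translated_nucleotide", "protein"): "blastx",
    ("protein", "translated_nucleotide"): "tblastn",
    ("translated_nucleotide", "translated_nucleotide"): "tblastx",
}


def choose_blast_program(query_type: str, database_type: str) -> str:
    """Return the BLAST program name for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"No BLAST program for {query_type!r} vs {database_type!r}"
        ) from None


# The experiment in this report compares a nucleotide query with a
# nucleotide database, which selects blastn.
print(choose_blast_program("nucleotide", "nucleotide"))
```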

The best alignment between the query sequence and a subject sequence is called a hit. The
E-value (expect value) is the number of hits with an equal or better score that would be
expected to occur purely by chance when the query sequence is searched against a database of
random sequences of the same size (E-value & Bit-score - Metagenomics, 2020). Overall, a low
E-value (less than 0.1) means that the hit is unlikely to have arisen by chance, that is, the
hit is significant.
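
The report does not state how the E-value is computed. For reference, the standard Karlin-Altschul form is sketched below. Here m is the query length, n is the total length of the database, S is the raw alignment score, and K and λ are statistical parameters that depend on the scoring system. These symbols are introduced for illustration, and BLAST applies further corrections (effective lengths) that are omitted here.

```latex
E = K \, m \, n \, e^{-\lambda S},
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2}
\quad\Longrightarrow\quad
E = m \, n \, 2^{-S'}
```

S' is the bit score reported next to each hit. Because E is proportional to m n, the same alignment is less significant in a larger database.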

OBJECTIVES
1. To learn how to use BLAST for a homology search with a nucleotide sequence retrieved
from GenBank.
2. To view and use nucleotide BLAST for nucleotide sequence similarity search.

MATERIALS
1. Computer.
2. Internet connection.
3. NCBI website.

METHODS AND RESULTS
1. The BLAST homepage was accessed using the following link:
https://blast.ncbi.nlm.nih.gov/.

Figure 1_The BLAST homepage

2. Nucleotide BLAST was opened and the example sequence from the data set at
https://digitalworldbiology.com/BLAST/sequences was inserted.

Figure 2_The example sequence data set


3. The database was left at its default, nr, and all other settings were also left at their
defaults; the search was then started using the standard blastn program.

Figure 3_The Query box in the BLAST page


4. After that, the results were obtained from the search.

Figure 4_Search results page


5. The graphic summary was accessed; it showed the extent and significance of the hits
against the query sequence, together with information about the alignments.

Figure 5_The graphic summary of the sequence


6. The hits were listed in order of increasing E value. The top hits had the highest
percentage identity, that is, they matched subject sequences with the highest
percentage of identical bases.

Figure 6_The description section


7. As Homo sapiens tyrosinase (TYR), mRNA had the highest Max score (1712)
and the lowest E value (0.0), its description and sequence alignment were
viewed.

Figure 7_The sequence alignment of Homo sapiens tyrosinase (TYR), mRNA


8. The GenBank record was obtained by clicking the GenBank link.


Figure 8_GenBank record of Homo sapiens tyrosinase (TYR), mRNA
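
The search in steps 2 and 3 was run through the web form. As a hedged sketch, the same submission can be expressed as a set of form parameters for NCBI's BLAST URL interface. The parameter names CMD, PROGRAM, DATABASE and QUERY follow that interface as commonly documented, the database name "nt" is an assumption standing in for the web form's default nucleotide collection, and the function names are hypothetical. The code only builds the request body and does not contact the server.

```python
from urllib.parse import urlencode

# Address of the BLAST service used in the Methods section.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_submit_params(sequence: str, program: str = "blastn",
                        database: str = "nt") -> dict:
    """Build the form fields for submitting a bare nucleotide sequence.

    Whitespace and line breaks are stripped, so the input must be a bare
    sequence without a FASTA header. The default database name is an
    assumption, not a value taken from the report.
    """
    cleaned = "".join(sequence.split())
    if not cleaned:
        raise ValueError("the query sequence is empty")
    return {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,    # blastn: nucleotide query vs nucleotide database
        "DATABASE": database,
        "QUERY": cleaned,
    }


def build_submit_body(sequence: str, **kwargs) -> str:
    """URL-encode the form fields as they would be sent in a POST request."""
    return urlencode(build_submit_params(sequence, **kwargs))


# Illustrative placeholder sequence, not the example sequence from the lab.
print(BLAST_URL)
print(build_submit_body("ACGTACGT\nTTGACCA"))
```

A real client would POST this body, read the request ID (RID) from the response, and poll for the results; those steps are left out here.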


DISCUSSION
BLAST is a group of programs designed to perform similarity searches against a database of
sequences. BLAST can be used to gain an understanding of evolutionary relationships and
to predict the function and biological significance of gene products. BLAST uses an
algorithm that searches for local alignments (aligning parts of two sequences) rather than
global alignments (aligning two sequences over their full length). By
searching for local alignments, BLAST is able to identify regions of similarity between two
sequences (BLAST: Basic Local Alignment Search Tool, 2020).

In the experiment, the sequence labelled Example Sequence was obtained from
the website (https://digitalworldbiology.com/BLAST/sequences). The entire
sequence was then copied and pasted into the query box on the BLAST page
(http://blast.ncbi.nlm.nih.gov/). All other settings were left at their default values, and
the blastn results were displayed within a few seconds of clicking the BLAST button (Nucleotide
to Nucleotide BLAST (blastn) | KitBase, 2020). Scrolling through the BLAST results, the
following information was identified:

- a unique request ID (RID)
- query information
- database information
- a link to taxonomy reports
- a graphical display showing alignments to the query sequence
- descriptions of sequences producing significant alignments
- pairwise alignments between the query sequence and each BLAST hit sequence

This information can be found in the following sections:

• Header
• Description
• Graphic summary
• Alignments
• Taxonomy

The header section included details about the query, such as the query ID and query
length. It also contained information on the nucleotide collection database, such as its title,
brief description, molecule type, date of update, and number of sequences (2020). The
header also listed the program used, blastn. blastn is a general-purpose nucleotide search
and alignment program that can be used to align tRNA
or rRNA sequences as well as mRNA or genomic DNA sequences containing a mixture of
coding and non-coding regions (BLAST: Basic Local Alignment Search Tool, 2020).

The description section listed the sequences producing significant alignments, along with
search statistics such as the Max score and E value. The graphic
summary showed the sequences that most closely matched the query
sequence. The alignment section showed the aligned positions between the similar
sequences and the query sequence. The taxonomy section offered three views:
Organism Report, Lineage Report, and Taxonomy Report.

For each hit, the alignment section gave the definition line, the length of the matched
sequence, the score, and the E-value. The Expect value (E) is a parameter
that specifies the number of hits that can be expected to be seen by chance when a
database of a specific size is searched. As the score (S) of a match increases, the E-value
decreases exponentially (E-value & Bit-score - Metagenomics, 2020).
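
The exponential relationship can be illustrated numerically. The sketch below assumes the commonly quoted approximation E = m × n × 2^(−S'), where S' is the bit score, m is the query length and n is the total database length. It ignores the effective-length corrections that BLAST applies, and the lengths used are hypothetical.

```python
def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2 ** (-S')."""
    if query_length <= 0 or database_length <= 0:
        raise ValueError("lengths must be positive")
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search space, chosen only for illustration.
m = 400              # query length in nucleotides
n = 1_000_000_000    # total database length in nucleotides

# Each additional bit of score halves the expected number of chance hits.
for bits in (40, 41, 50, 100):
    print(f"bit score {bits:>3}: E = {expect_value(bits, m, n):.3g}")
```

Under this approximation, doubling the database size doubles E for the same alignment, which is why an E-value should be read together with the database that was searched.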

The same line also reported the number of identical residues in the alignment (identities)
and the number of gaps introduced into the alignment. Finally, the actual alignment was shown,
with the query sequence on top and the database (subject) sequence below it. The numbers on
either side of the alignment indicated the positions of the nucleotides in each sequence.
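
As a minimal sketch of how the identities and gaps figures relate to the displayed alignment, the function below counts them from two aligned strings of equal length. The function name and the example strings are hypothetical, and percent identity is taken over the full alignment length, gap columns included.

```python
def alignment_stats(query_aln: str, subject_aln: str, gap_char: str = "-") -> dict:
    """Count identities and gaps in a pairwise alignment.

    Both strings must be the aligned rows (same length), with gaps written
    as gap_char.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have the same length")

    length = len(query_aln)
    pairs = list(zip(query_aln, subject_aln))
    # A column is an identity when both rows carry the same residue (not a gap).
    identities = sum(1 for q, s in pairs if q == s and q != gap_char)
    # A column is a gap when either row carries the gap character.
    gaps = sum(1 for q, s in pairs if q == gap_char or s == gap_char)
    percent_identity = 100.0 * identities / length if length else 0.0

    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": percent_identity,
    }


# Illustrative alignment with one gap and one mismatch.
stats = alignment_stats("ACGT-ACGT", "ACGTTACCT")
print(stats)
```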

REFERENCES
• Blast.ncbi.nlm.nih.gov. 2020. BLAST: Basic Local Alignment Search Tool. [online]
Available at: <https://blast.ncbi.nlm.nih.gov/Blast.cgi> [Accessed 9 November 2020].
• 2020. [online] Available at: <https://pseudomonas.com/blast/setnblast> [Accessed 10
November 2020].
• Kitbase.ucdavis.edu. 2020. Nucleotide To Nucleotide BLAST (Blastn) | Kitbase.
[online] Available at: <https://kitbase.ucdavis.edu/blast/nucleotide/nucleotide>
[Accessed 10 November 2020].
• Metagenomics.wiki. 2020. E-Value & Bit-Score - Metagenomics. [online] Available
at: <http://www.metagenomics.wiki/tools/blast/evalue> [Accessed 10 November
2020].
• Wheeler, D. and Bhagwat, M. 2020. BLAST QuickStart. [online] Ncbi.nlm.nih.gov.
Available at: <https://www.ncbi.nlm.nih.gov/books/NBK1734/> [Accessed 10
November 2020].

POST-LAB QUESTIONS
1. Perform a sequence similarity search using nucleotide BLAST with unknown sequences 2
and 3 from the sequence data set at
http://www.digitalworldbiology.com/BLAST/62000sequences.html
Sequence 2

a) How long is the sequence that was used to search the database?

395

Figure 9_ Search results of sequence 2

b) What organism was the most likely source of sequence?

Raphanus sativus

Figure 10_The description section of sequence 2
c) Which hit was statistically more significant? Explain
• Raphanus sativus antifungal protein 1 preprotein (Rs-AFP1) mRNA, complete cds
• The most significant hit has the lowest E-value, because significance decreases as
the E-value increases. Furthermore, the maximum percentage
identity corresponds to the subject sequence with the highest percentage
of bases identical to the query.

d) What is the accession number for the best matching sequence?


U18557.1

Figure 11_Accession number for the best match


e) What is the possible function of the protein that is specified by the 2 unknown
DNA sequences?
Antifungal and fungistatic


Figure 12_Function of the protein


Sequence 3

a) How long is the sequence that was used to search the database?

420

Figure 13_Search results of sequence 3

b) What organism was the most likely source of sequence?

Danio rerio

Figure 14_The description section of sequence 3


c) Which hit was statistically more significant? Explain
• Danio rerio mRNA opioid receptor homologue

The most significant hit has the lowest E value, because significance decreases as the
E value increases. Furthermore, the maximum percentage identity
corresponds to the subject sequence with the highest percentage of
bases identical to the query.

d) What is the accession number for the best matching sequence?

AJ001596.1

Figure 15_Accession number for the best match (Sequence 3)

e) What is the possible function of the protein that is specified by the unknown
DNA sequence?

Opioid receptor homologue

Figure 16_Function of the protein (Sequence 3)
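
The rule used to answer question (c) for both sequences, choosing the hit with the lowest E value, can be sketched as follows. The hit names and numbers are hypothetical placeholders rather than values from the searches above, and the tie-break on bit score is an assumption added for cases where several hits report an E value of 0.0.

```python
from typing import NamedTuple, Sequence


class Hit(NamedTuple):
    description: str
    bit_score: float
    e_value: float


def most_significant(hits: Sequence[Hit]) -> Hit:
    """Return the hit with the lowest E value; ties go to the higher bit score."""
    if not hits:
        raise ValueError("no hits to rank")
    return min(hits, key=lambda h: (h.e_value, -h.bit_score))


# Hypothetical hits, for illustration only.
hits = [
    Hit("hypothetical hit A", 250.0, 3e-60),
    Hit("hypothetical hit B", 710.0, 0.0),
    Hit("hypothetical hit C", 705.0, 0.0),
]

best = most_significant(hits)
print(best.description, best.e_value)
```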
