NCBI BLAST
Name: Rohith ND
Roll No: 20054
Introduction
The Basic Local Alignment Search Tool (BLAST) is a program that can detect
sequence similarity between a query sequence and sequences within a
database. The ability to detect sequence homology allows us to identify
putative genes in a novel sequence. It also allows us to determine if a gene or
a protein is related to other known genes or proteins.
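BLAST is popular because it can quickly identify regions of local
similarity between two sequences. More importantly, BLAST uses a robust
statistical framework to estimate whether an alignment between two
sequences is statistically significant, reporting this as an expect value
(E-value): the number of alignments with at least that score expected to
occur by chance in a search of that size.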
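BLAST's significance estimates rest on Karlin-Altschul statistics. For an ungapped local alignment with raw score S between a query of length m and a database of total length n, the expected number of chance alignments scoring at least S is E = K m n e^(-λS), where λ and K depend on the scoring system and the residue composition (for gapped alignments BLAST uses empirically estimated values). Equivalently, with the bit score S' = (λS - ln K) / ln 2, E = m n 2^(-S'). A minimal Python sketch of both forms follows. The λ = 0.3 and K = 0.1 values are hypothetical placeholders, and the sketch omits the effective-length (edge-effect) correction that BLAST applies to m and n.

```python
import math


def evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul E = K * m * n * exp(-lambda * S).

    m is the query length and n the total database length in residues;
    lam (lambda) and k depend on the scoring system. No effective-length
    correction is applied here (a simplification).
    """
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """The same E-value expressed through the bit score: E = m * n * 2**(-S')."""
    return m * n * 2.0 ** (-bits)


# Hypothetical statistical parameters, for illustration only.
LAM, K = 0.3, 0.1
e_direct = evalue(50, m=1_000, n=10**9, lam=LAM, k=K)
e_bits = evalue_from_bits(bit_score(50, LAM, K), m=1_000, n=10**9)
```

Because E grows linearly with n, doubling the database size doubles the E-value of the same raw score, so a hit that is significant against a small database may not be significant against a large one.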
The NCBI BLAST web interface
Before we begin the analysis, we should first familiarize ourselves with the
NCBI BLAST web interface. Open a new web browser window and navigate
to the NCBI BLAST main page. In this walkthrough, we will only use a few of
the tools available on the NCBI BLAST website. To learn about the more
advanced options available (such as setting up My NCBI accounts), click on
the “Help” link on the main navigation bar to access the NCBI BLAST
documentation.
All NCBI BLAST pages share the same header, which contains four links.
Besides the main toolbar, two other sections of the NCBI BLAST web
interface are of interest: the “Web BLAST” section contains links to the
common BLAST programs, and the “Specialized searches” section contains
links to additional tools for performing sequence searches (e.g., CD-Search,
which identifies conserved domains within a query sequence). The type of
BLAST search you need depends primarily on the type of query sequence and
the database you would like to search.
Four of the five common BLAST programs are available through the “Web
BLAST” section of the NCBI BLAST home page. The program tblastx, which
translates both the nucleotide query and the nucleotide database (in all six
reading frames) when it performs the sequence comparisons, is not listed
under the “Web BLAST” section. However, you can access this program by
clicking on any of the BLAST programs in the “Web BLAST” section and then
clicking on the “tblastx” tab in the NCBI BLAST search form.
The basic BLAST programs are summarized below.
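The choice among the five common programs follows from the molecule types of the query and the database, and from whether either side is translated. A minimal Python sketch of that lookup (the BLAST_PROGRAMS table and the programs_for helper are illustrative names, not part of any NCBI tool):

```python
# Query type, database type, and translation step for the five common programs.
BLAST_PROGRAMS = {
    "blastn":  ("nucleotide", "nucleotide", "no translation"),
    "blastp":  ("protein", "protein", "no translation"),
    "blastx":  ("nucleotide", "protein", "query translated in six frames"),
    "tblastn": ("protein", "nucleotide", "database translated in six frames"),
    "tblastx": ("nucleotide", "nucleotide",
                "query and database translated in six frames"),
}


def programs_for(query_type: str, db_type: str) -> list[str]:
    """Return every program that accepts this query/database combination."""
    return [name for name, (q, d, _) in BLAST_PROGRAMS.items()
            if q == query_type and d == db_type]
```

For a nucleotide query against a nucleotide database both blastn and tblastx apply; tblastx compares the sequences at the protein level, which is slower but can detect more distant similarity in coding regions.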
Instead of searching a query sequence against sequences in a database, you
can also align two (or more) sequences by selecting the “Align two or more
sequences” checkbox at the bottom of the “Enter Query Sequence” section.
This feature is also known as BLAST 2 Sequences (bl2seq).
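Aligning two sequences directly is a pairwise local alignment problem. BLAST solves it heuristically; the exhaustive dynamic-programming method it approximates is Smith-Waterman. For comparison, a minimal score-only Smith-Waterman sketch in Python, with an illustrative scoring scheme (match +2, mismatch -1, linear gap -2) that is an assumption, not a BLAST default:

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous DP row (row i - 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap
            # Local alignment: a cell never drops below 0 (start afresh).
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, smith_waterman("TTACGTTT", "GGACGTGG") scores only the shared ACGT core: 4 matches × 2 = 8.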
The Algorithm
The algorithm itself is straightforward; its central concept is the segment
pair. Given two sequences, a segment pair is a pair of sub-sequences of the
same length that form an ungapped alignment. BLAST looks for segment pairs
between the query and the database sequences that score above a threshold.
The algorithm first searches for short, fixed-length word hits, which are
then extended in both directions until the score falls too far below the
best score seen so far. The resulting high-scoring segment pairs (HSPs) form
the basis of the ungapped alignments that characterize BLAST output. A
modification of the algorithm was later introduced to generate gapped
alignments. The new algorithm seeks only one, rather than all, of the
ungapped alignments that make up a significant match, and hence speeds up
the initial database search. Dynamic programming is then used to extend a
central pair of aligned residues in both directions to yield the final
gapped alignment. Having dropped the requirement to find all ungapped
alignments independently, the new algorithm is about three times faster
than its predecessor.
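The seed-and-extend idea can be sketched in a few lines of Python. This is a simplified illustration, not NCBI's implementation: it uses exact word matches of length w as seeds (as nucleotide searches do; protein BLAST instead uses neighborhood words scoring above a threshold), extends each seed without gaps using an X-drop rule (stop once the running score falls more than xdrop below the best score reached), and keeps extensions that reach min_score. The scoring values (match +1, mismatch -3, xdrop 5, w 4, min_score 6) are illustrative assumptions, and the gapped dynamic-programming stage is omitted.

```python
def find_seeds(query: str, subject: str, w: int) -> list[tuple[int, int]]:
    """(query_pos, subject_pos) of every exact word match of length w."""
    index: dict[str, list[int]] = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    return [(i, j)
            for j in range(len(subject) - w + 1)
            for i in index.get(subject[j:j + w], [])]


def extend_ungapped(query, subject, qi, sj, w, match=1, mismatch=-3, xdrop=5):
    """Extend a seed right, then left, without gaps, using an X-drop stop rule.

    Returns (score, q_start, q_end, s_start); q_end is exclusive.
    """
    def s(a, b):
        return match if a == b else mismatch

    best = sum(s(query[qi + k], subject[sj + k]) for k in range(w))
    # Right extension: stop once the running score falls more than xdrop below best.
    run, right, k = best, 0, 0
    while qi + w + k < len(query) and sj + w + k < len(subject):
        run += s(query[qi + w + k], subject[sj + w + k])
        k += 1
        if run > best:
            best, right = run, k
        elif best - run > xdrop:
            break
    # Left extension starts from the best score reached on the right.
    run, left, k = best, 0, 0
    while qi - k - 1 >= 0 and sj - k - 1 >= 0:
        run += s(query[qi - k - 1], subject[sj - k - 1])
        k += 1
        if run > best:
            best, left = run, k
        elif best - run > xdrop:
            break
    return best, qi - left, qi + w + right, sj - left


def hsps(query, subject, w=4, min_score=6, **scoring):
    """Seed, extend, and keep ungapped extensions scoring at least min_score."""
    found = set()
    for qi, sj in find_seeds(query, subject, w):
        hit = extend_ungapped(query, subject, qi, sj, w, **scoring)
        if hit[0] >= min_score:
            found.add(hit)  # seeds on one diagonal often yield the same HSP
    return sorted(found, reverse=True)
```

With CCCCACGTACGTCCCC as the query and GGGGACGTACGTGGGG as the subject, the five seeds on the main diagonal all extend to the same HSP covering ACGTACGT, while the two off-diagonal seeds score below min_score and are discarded.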
Applications of BLAST in Biological Sciences
BLAST is used in a wide range of biological applications, some of which
include:
• identification of homologous gene candidates across diverse genomes
• species comparison by identifying similar genes in different organisms
• comparative gene prediction, which involves conducting a search
between two genome sequences to provide both sensitive and specific
gene predictions
• functional annotation of genomes for the identification of functional
properties and biological roles of the genes in the genomes
• contig mapping for efficient gap-closure of prokaryotic genome
sequence assemblies
• pseudogene identification for understanding the evolutionary history
of genes and genomes
• building datasets for phylogenetic analysis
• constructing phylogenetic dendrograms/trees from protein sequences
• designing target-specific primers for the polymerase chain reaction
(PCR)
Performing the BLAST run
BLAST searches can be performed on the NCBI website at
http://www.ncbi.nlm.nih.gov/Blast. Various BLAST options are available on
this home page.
>gi|7558|emb|Z00030.1| D. melanogaster alcohol dehydrogenase structural gene and flanks (composite sequence)
CATCCTCGCCCGTTTCCACGCCGTCGTCCTCCTCATCATCGGCGAGAGCTGATTGCGTGGTGGTCAGAGG
CGAACCAGCGGTCTTCGTGGAGCTGGGACCCAGATCAAGGCTGCTCAACAGATTGCCTGCCGACTGGGAA
GACGTTAGGGTGTCCTTGTGATAGGAGCTGTGCCGATTGCCCAGCTTAGTGGATAGTGTTAGGTCGCCGT
TGCTCGTTGGGCGTAGACTGCCCACCACCTGACCACCGGGCAGGGTGGCGCTTCTCTTGTGGCGACCCTT
CGACTTGGGAAAGGCAGCCAGGATGTTGAGCCACCACTGGGATTCCTCTGAACTGGTGCCCTTCACAAAG
GTCACGCGCTCGGGAGCGGTTATGGCGATGGAGTTGGGGTGACCTGTCACCTCCACGGCGCTGGTAACCT
CCAGCACTTTGGTCATATCAACGCACGCCTGCGGTATGGTTTCGGGCTATAGAAAATATATGTAAATTAA
AGAGTAAACAAGTTGTATTTTAAGATTTTAATTAGGAGAATTAATTAATCGGTAATCAAATGAACTCGGC
CTATCGCGTAATAATATACATTTTTTAATTTAATGACTAATAAATAATATAAAATCTAATTAATAGTTCA
GTAAGTTAGTAAAAGTAAATCAATCTGGTGGTAATTTAAGAAGCCACTTTAATTCTTCCACTTCATAAAT
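This query is in FASTA format: one header line beginning with “>” (here holding the gi number 7558, the EMBL accession Z00030.1 and a description), followed by the sequence split over several lines. A minimal Python reader for such text, assuming plain sequence letters and no comment lines (parse_fasta is a hypothetical helper, not an NCBI tool):

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs; the header excludes the leading '>'."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line.upper())  # join wrapped lines, normalize case
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```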
BLAST window with the query sequence pasted in it and the selected databases
Graphical Summary Output of BLAST showing the homology coverage between query and the Hits
Description section in the BLAST report showing one-line summaries of sequences producing significant
alignments
Alignment section from a BLAST report showing pair-wise sequence alignment between a query sequence and a
database sequence.
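The description section summarizes each significant hit on one line. Outside the web page, the same kind of summary is often handled as tabular text. A minimal Python sketch that parses rows in the 12-column layout of BLAST+ -outfmt 6 (an assumption: check the column order of whatever file you download) and keeps hits at or below an E-value cutoff; the 1e-5 cutoff is an arbitrary illustrative value, not an NCBI default, and the helper names are hypothetical.

```python
# Column order of BLAST+ "-outfmt 6" tabular output (default 12 fields).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOAT_COLS = ("pident", "evalue", "bitscore")
INT_COLS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")


def parse_hits(text: str) -> list[dict]:
    """Parse tab-separated hit rows into dicts with numeric fields converted."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as in "-outfmt 7")
        hit = dict(zip(COLUMNS, line.split("\t")))
        for key in FLOAT_COLS:
            hit[key] = float(hit[key])
        for key in INT_COLS:
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits


def significant(hits: list[dict], max_evalue: float = 1e-5) -> list[dict]:
    """Keep hits at or below the E-value cutoff, highest bit score first."""
    kept = [h for h in hits if h["evalue"] <= max_evalue]
    return sorted(kept, key=lambda h: h["bitscore"], reverse=True)
```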
For a BLAST search, the query sequence should be in FASTA format. To
perform the run, we first open the NCBI home page and click on BLAST; the
type of BLAST search can then be selected from this window. In this case,
we selected the nucleotide BLAST option under Basic BLAST. Next, the query
sequence in FASTA format is pasted into the “Enter Query Sequence” section
of the window, and the database to search is selected from the drop-down
menu (in our case, the nr database). Under “Program Selection”, we chose to
optimize for blastn. Finally, we click the “BLAST” button; after a short
wait (a few seconds in our case), the BLAST output window appears showing
the results of the search.
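The same submission can also be scripted. NCBI documents a URL API for Blast.cgi in which a CMD=Put request submits the query and returns a request ID (RID) plus an estimated time to completion (RTOE), a CMD=Get request with FORMAT_OBJECT=SearchInfo reports the search status, and a CMD=Get request with FORMAT_TYPE=Text returns the report. A minimal Python sketch using only the standard library, assuming those parameter names and the “RID = ...” response layout described in NCBI's URL API documentation. The helper names are hypothetical, and DATABASE=nt is an assumed API name for the nucleotide collection that may not match the label shown in the web form.

```python
import re
import urllib.parse
import urllib.request

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def put_body(query_fasta: str, program: str = "blastn", database: str = "nt") -> bytes:
    """Form-encoded body for a CMD=Put (submit) request; "nt" is an assumption."""
    return urllib.parse.urlencode({
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query_fasta,
    }).encode("ascii")


def parse_rid(response_text: str) -> tuple[str, int]:
    """Pull the request ID (RID) and estimated run time in seconds (RTOE)."""
    rid = re.search(r"^\s*RID = (\S+)", response_text, re.MULTILINE)
    rtoe = re.search(r"^\s*RTOE = (\d+)", response_text, re.MULTILINE)
    if rid is None or rtoe is None:
        raise ValueError("no RID/RTOE block in response")
    return rid.group(1), int(rtoe.group(1))


def status_url(rid: str) -> str:
    """URL whose response reports Status=WAITING, READY, FAILED or UNKNOWN."""
    return BLAST_URL + "?" + urllib.parse.urlencode(
        {"CMD": "Get", "FORMAT_OBJECT": "SearchInfo", "RID": rid})


def parse_status(response_text: str) -> str:
    """Extract the status word from a SearchInfo response."""
    match = re.search(r"Status=(\w+)", response_text)
    return match.group(1) if match else "UNKNOWN"


def results_url(rid: str) -> str:
    """URL for the finished report as plain text."""
    return BLAST_URL + "?" + urllib.parse.urlencode(
        {"CMD": "Get", "FORMAT_TYPE": "Text", "RID": rid})


def submit(query_fasta: str, **options) -> tuple[str, int]:
    """Network call: POST the query and return (RID, RTOE)."""
    with urllib.request.urlopen(BLAST_URL, data=put_body(query_fasta, **options)) as resp:
        return parse_rid(resp.read().decode("utf-8", errors="replace"))
```

A script would call submit, wait at least RTOE seconds, fetch status_url until parse_status returns READY, and then download results_url. NCBI's usage guidelines ask clients to poll a given RID sparingly (on the order of once a minute), so check the current guidelines before running this in a loop.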
Conclusion
The parallel development of large-scale sequencing projects and
bioinformatics tools like BLAST has enabled scientists to study the genetic
blueprint of life across many species and has helped bridge the gap between
biology and computer science in the maturing field of bioinformatics. As
biological sequence data are generated at an ever-increasing rate, the role
of bioinformatics in biological research will continue to grow.
Bioinformatics tools that allow scientists to explore genome sequence data
have therefore become a cornerstone of current biological research and
should be included in any modern biology curriculum; no science curriculum
can remain current without a bioinformatics component. Undergraduate
students increasingly need training in methods for finding and retrieving
information stored in vast databases.
References
https://blast.ncbi.nlm.nih.gov/Blast.cgi
https://www.researchgate.net/publication/267332265_BLAST_An_introductory_tool_for_students_to_Bioinformatics_Applications
https://community.gep.wustl.edu/repository/course_materials_WU/annotation/Introduction_NCBI_BLAST.pdf
https://blast.ncbi.nlm.nih.gov/BLAST_guide.pdf
https://en.wikipedia.org/wiki/BLAST_(biotechnology)