Blast

BLAST is a widely used program for comparing a query sequence to sequence databases and identifying homologous sequences. It works by searching databases for local alignments and high-scoring pairs between the query and target sequences. Several versions of BLAST are available for comparing nucleotides to nucleotides, proteins to proteins, or across domains. BLAST is a fast and scalable tool that is useful for tasks like identifying species, domains, phylogeny, and annotating sequences.

Uploaded by

Jhilik Pathak


BLAST

Introduction
 BLAST (Basic Local Alignment Search Tool)
is the standard tool for sequence alignment
searches [Altschul et al., 1990].
 BLAST is far from basic, despite what its name
suggests: it is a highly advanced algorithm
that has become very popular due to its
speed, availability & accuracy.
 Many researchers use BLAST as an initial
screening tool for an unknown sequence.
 BLAST identifies homologous sequences by
searching one or more databases.
 The BLAST source code is freely available, so
anyone can download & modify the
program (there are many BLAST derivatives,
e.g. WU-BLAST).
 BLAST is highly scalable & runs on a
number of different computer platforms,
which makes it usable on small desktops
as well as large computer clusters.
BLAST USE
 Identifying species.
 Locating domains.
 Establishing phylogeny.
 Mapping DNA to a known
chromosome.
 Annotation.
 Searching for homology.
How Does BLAST Work?
 To run, BLAST requires a query sequence to
search for, and a sequence or sequence database
to search against.
 The main idea of BLAST is that a statistically
significant alignment often contains
high-scoring segment pairs (HSPs).
 BLAST searches for high-scoring sequence
alignments between the query sequence and
sequences in the database using a heuristic
approach that approximates the Smith-Waterman
algorithm.
BLAST ALGORITHM
 Remove low-complexity regions or sequence
repeats in the query sequence.
 A low-complexity region is a region of a sequence that is
composed of few kinds of elements. Such regions might
give high scores that make it harder for the program to find the
actually significant sequences in the database, so they
should be filtered out.
 The SEG program is used for protein sequences and the
DUST program is used for DNA sequences. The XNU
program, in turn, is used to mask tandem
repeats in protein sequences.
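
The filtering idea in this step can be sketched with a sliding-window Shannon entropy mask. This is only an illustration of what "few kinds of elements" means: it is not the SEG, DUST or XNU algorithm, and the function names, the window size of 12, the entropy cutoff of 2.2 bits and the example query are all assumptions made for the sketch.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per letter) of the letters in a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12,
                        min_entropy: float = 2.2, mask_char: str = "X") -> str:
    """Replace every letter that lies in a low-entropy window with mask_char.

    window and min_entropy are illustrative values, not SEG/DUST defaults.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


# Hypothetical protein query with a poly-alanine stretch in the middle.
query = "MKTAYIAKQRAAAAAAAAAAAAAAGLSDEVHNWP"
print(mask_low_complexity(query))
```

Masked letters are simply skipped when the word list is built in the next step, so a repetitive stretch cannot seed spurious hits.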
• Make a k-letter word list of the query
sequence.
Taking k=3 as an example, we list the words
of length 3 in the query protein sequence
sequentially, shifting by one letter at a time,
until the last letter of the query sequence is
included (k is usually 11 for a DNA sequence).
The method is illustrated in the figure.
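
As a minimal sketch of the word-list step, the overlapping k-letter words can be produced with a sliding window. The function name `word_list` and the example query `PQGEFG` are assumptions made for illustration.

```python
def word_list(query: str, k: int = 3) -> list[tuple[int, str]]:
    """Return (position, word) for every overlapping k-letter word in query."""
    return [(i, query[i:i + k]) for i in range(len(query) - k + 1)]


# Hypothetical 6-letter protein query: it yields 6 - 3 + 1 = 4 words.
for position, word in word_list("PQGEFG", k=3):
    print(position, word)
```

A query of length L yields L - k + 1 words, and the last word ends on the last letter of the query.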
 List the possible matching words.
 This step is one of the main differences between
BLAST and FASTA. FASTA considers all of the
common words in the database and query
sequences that are listed in step 2; BLAST, however,
considers only the high-scoring words.
 The scores are created by comparing each word in the
list from step 2 with all possible 3-letter words, using a
scoring matrix (substitution matrix) to score the
comparison.
 The words whose scores are greater than the
threshold T remain in the list of possible matching
words, while those with lower scores are
discarded.
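
The scoring and thresholding of candidate words can be sketched as follows. To keep the example short it uses a five-letter alphabet and a small hand-written scoring table; these values, the function names and the threshold of 12 are assumptions for illustration, and a real search would use a full substitution matrix over all 20 amino acids.

```python
from itertools import product

# Toy alphabet and symmetric pair scores, for illustration only.
ALPHABET = "AEGPQ"
_PAIR_SCORES = {
    "AA": 4, "EE": 5, "GG": 6, "PP": 7, "QQ": 5,
    "AE": -1, "AG": 0, "AP": -1, "AQ": -1,
    "EG": -2, "EP": -1, "EQ": 2,
    "GP": -2, "GQ": -2, "PQ": -1,
}


def pair_score(a: str, b: str) -> int:
    """Look up the score of aligning letters a and b (order-independent)."""
    return _PAIR_SCORES["".join(sorted(a + b))]


def word_score(w1: str, w2: str) -> int:
    """Sum of the pair scores of two equal-length words."""
    return sum(pair_score(a, b) for a, b in zip(w1, w2))


def neighborhood(query_word: str, threshold: int) -> dict[str, int]:
    """All words over ALPHABET scoring greater than threshold against query_word."""
    hits = {}
    for letters in product(ALPHABET, repeat=len(query_word)):
        candidate = "".join(letters)
        score = word_score(query_word, candidate)
        if score > threshold:  # "greater than the threshold T", as stated above
            hits[candidate] = score
    return hits


print(neighborhood("PEG", threshold=12))
```

Lowering the threshold lengthens the list of matching words, which raises sensitivity at the cost of more seeds to examine.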
 Organize the remaining high-scoring words into an
efficient search tree.
 This allows the program to rapidly compare
the high-scoring words to the database sequences.
 Repeat steps 3 and 4 for each 3-letter word in the
query sequence.
 Scan the database sequences for exact matches with
the remaining high-scoring words.
 The BLAST program scans the database sequences for each
remaining high-scoring word, such as PEG, at each position. If
an exact match is found, this match is used to seed a possible
ungapped alignment between the query and database
sequences.
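
A minimal sketch of the scanning step is given below. It stores the high-scoring words in a Python dictionary as a stand-in for the search tree mentioned above (the lookup structure differs, the idea of a fast exact-match lookup is the same). The function name, the word table and the database sequence are assumptions for illustration.

```python
def find_seeds(query_words: dict[str, list[int]], subject: str,
               k: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) for every exact word match in subject.

    query_words maps each high-scoring word to the query positions it came from.
    """
    seeds = []
    for s_pos in range(len(subject) - k + 1):
        word = subject[s_pos:s_pos + k]
        for q_pos in query_words.get(word, ()):
            seeds.append((q_pos, s_pos))
    return seeds


# Hypothetical word table: both words were derived from query position 0.
high_scoring_words = {"PEG": [0], "PQG": [0]}
print(find_seeds(high_scoring_words, "AAPQGAPEG", k=3))
```

Each returned pair marks one seed, that is, a position in the query and a position in the database sequence from which an alignment can be extended.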
• Extend the exact matches to high-scoring
segment pairs (HSPs).
o The original version of BLAST stretches a
longer alignment between the query and the
database sequence in the left and right
directions, from the position where the exact
match occurred. The extension does not
stop until the accumulated total score of the
HSP begins to decrease. A simplified
example is presented in figure 2.
o To save time, a newer version of BLAST,
called BLAST2 or gapped BLAST, has been
developed.
o BLAST2 adopts a lower neighborhood word score
threshold to maintain the same level of sensitivity for
detecting sequence similarity.
o Therefore, the list of possible matching words in step 3
becomes longer.
o Next, exact matched regions within distance A of
each other on the same diagonal in the figure are
joined into a longer new region.
o Finally, the new regions are extended by the
same method as in the original version of BLAST, and
the scores of the HSPs (high-scoring segment pairs) of the
extended regions are computed using a
substitution matrix as before.
[Figure: the positions of the exact matches]
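
The ungapped extension of a seed can be sketched as below. The sketch assumes a simple +1 match / -1 mismatch score instead of a substitution matrix, and it adds a hypothetical `drop_off` parameter: with the default of 0 the extension stops as soon as the accumulated score falls below the best score seen, which mirrors the simplified description above. All names and sequences are assumptions for illustration.

```python
def simple_score(a: str, b: str) -> int:
    """Toy scoring: +1 for a match, -1 for a mismatch (an assumption)."""
    return 1 if a == b else -1


def extend_seed(query, subject, q_pos, s_pos, k, score=simple_score, drop_off=0):
    """Extend a k-letter seed without gaps in both directions.

    Returns (query_start, subject_start, length, score) of the best HSP found.
    """
    best = sum(score(query[q_pos + i], subject[s_pos + i]) for i in range(k))

    # Extend to the right, remembering the best-scoring end point.
    running, best_right, i = best, 0, 0
    while q_pos + k + i < len(query) and s_pos + k + i < len(subject):
        running += score(query[q_pos + k + i], subject[s_pos + k + i])
        i += 1
        if running > best:
            best, best_right = running, i
        elif best - running > drop_off:
            break

    # Extend to the left from the best right-extended score.
    running, best_left, j = best, 0, 0
    while q_pos - 1 - j >= 0 and s_pos - 1 - j >= 0:
        running += score(query[q_pos - 1 - j], subject[s_pos - 1 - j])
        j += 1
        if running > best:
            best, best_left = running, j
        elif best - running > drop_off:
            break

    return (q_pos - best_left, s_pos - best_left, best_left + k + best_right, best)


# Seed "ACG" found at position 2 of both hypothetical sequences.
print(extend_seed("TTACGTGG", "CCACGTGA", q_pos=2, s_pos=2, k=3))
```

A larger `drop_off` lets the extension ride through a short run of mismatches before giving up, at the cost of more computation per seed.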
 List all of the HSPs in the database whose
score is high enough to be considered.
 We list the HSPs whose scores are greater than the
empirically determined cutoff score S. By examining the
distribution of the alignment scores obtained by comparing
random sequences, a cutoff score S can be determined
such that its value is large enough to guarantee the
significance of the remaining HSPs.
 Evaluate the significance of the HSP score.
 BLAST next assesses the statistical
significance of each HSP score by exploiting
the Gumbel extreme value distribution
(EVD). (It has been proved that the distribution of
Smith-Waterman local alignment scores
between two random sequences follows the
Gumbel EVD when gaps are not allowed; for
alignments with gaps this has not been proved, but
simulations indicate that the same distribution applies.)
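
For reference, the usual way this significance is written down is sketched below. The symbols are not defined in the slides and are introduced here as assumptions: m and n are the lengths of the query and of the database, S is the raw HSP score, and K and \lambda are constants that depend on the scoring system.

```latex
% Expected number of HSPs with score at least S arising by chance
E = K\,m\,n\,e^{-\lambda S}

% Normalized (bit) score and the same E-value expressed in bits
S' = \frac{\lambda S - \ln K}{\ln 2},
\qquad
E = m\,n\,2^{-S'}

% Probability of finding at least one such HSP by chance
P = 1 - e^{-E}
```

For small E the two quantities P and E are nearly equal, which is why the E-value alone is usually reported.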
 Make two or more HSP regions into a longer
alignment.
 Sometimes, we find two or more HSP regions in one database
sequence that can be made into a longer alignment. This
provides additional evidence of the relation between the query
and database sequence.
 There are two methods, the Poisson method and the sum-of-scores
method, to compare the significance of the newly
combined HSP regions.
 Suppose that there are two combined HSP regions with the
sets of scores (65, 40) and (52, 45), respectively.
 The Poisson method gives more significance to the set whose
lowest score is higher, i.e. the second set (45 > 40).
 However, the sum-of-scores method prefers the first set,
because 65+40 (105) is greater than 52+45 (97).
 The original BLAST uses the Poisson method; gapped BLAST
and WU-BLAST use the sum-of-scores method.
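
The two ranking rules in the example above can be checked with a few lines of code. This sketch reproduces only the comparison as it is stated here (highest minimum versus highest sum); it does not compute the underlying Poisson or sum statistics, and the function names are assumptions.

```python
def poisson_preference(score_sets):
    """Prefer the set whose lowest HSP score is the highest."""
    return max(score_sets, key=min)


def sum_of_scores_preference(score_sets):
    """Prefer the set whose HSP scores add up to the largest total."""
    return max(score_sets, key=sum)


score_sets = [(65, 40), (52, 45)]
print("Poisson method prefers:", poisson_preference(score_sets))
print("Sum-of-scores method prefers:", sum_of_scores_preference(score_sets))
```

The two rules disagree on this input, which is exactly the point of the example: one rewards the weakest HSP being strong, the other rewards the combined total.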
 Show the gapped Smith-Waterman local
alignments of the query and each of the
matched database sequences.
 The original BLAST only generates ungapped alignments
that include the initially found HSPs individually, even when
more than one HSP is found in one database
sequence.
 BLAST2 versions produce a single alignment with gaps
that can include all of the initially found HSP regions.
Note that the computation of the score and its
corresponding E-value takes appropriate gap
penalties into account.
 Report every match whose expect score (E-value) is
lower than the threshold parameter E.
 BLAST is actually a family of programs (all included in
the blastall executable). These include:
 Nucleotide-nucleotide BLAST (blastn)
 This program, given a DNA query, returns the most similar DNA
sequences from the DNA database that the user specifies.
 Protein-protein BLAST (blastp)
 This program, given a protein query, returns the most similar protein
sequences from the protein database that the user specifies.
 Position-Specific Iterative BLAST (PSI-BLAST)
 This program is used to find distant relatives of a protein. First, a list
of all closely related proteins is created. These proteins are then
combined into a general "profile" sequence, which summarises the
significant features present in these sequences.
 Nucleotide 6-frame translation-protein (blastx)
 This program compares the six-frame conceptual translation products of a
nucleotide query sequence (both strands) against a protein sequence database.
 Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx)
 This program is the slowest of the BLAST family. It translates the query
nucleotide sequence in all six possible frames and compares it against the
six-frame translations of a nucleotide sequence database. The purpose of tblastx is
to find very distant relationships between nucleotide sequences.
 Protein-nucleotide 6-frame translation (tblastn)
 This program compares a protein query against all six-frame translations of
a nucleotide sequence database.
 Large numbers of query sequences (megablast)
 When comparing large numbers of input sequences via the command-line
BLAST, "megablast" is much faster than running BLAST multiple times. It
concatenates many input sequences together to form a large sequence before
searching the BLAST database, then post-analyzes the search results to glean
individual alignments and statistical values.
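
The six-frame conceptual translation used by the translated searches above (blastx, tblastx, tblastn) can be sketched as follows. The sketch assumes the standard genetic code and an input containing only A, C, G and T; the function names and the frame labels "+1" to "-3" are assumptions for illustration, and incomplete trailing codons are simply dropped.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict[str, str]:
    """Translate both strands in all three reading frames each."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for frame, protein in six_frame_translations("ATGGCCTAA").items():
    print(frame, protein)
```

Because every nucleotide sequence expands into six protein sequences, a search that translates both the query and the database has far more comparisons to make, which is consistent with tblastx being described as the slowest program of the family.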
 Alternative versions
 An extremely fast but considerably less sensitive alternative to BLAST
that compares nucleotide sequences to the genome is BLAT (BLAST-Like
Alignment Tool). A version designed for comparing multiple large
genomes or chromosomes is BLASTZ.
 Accelerated versions
 There are two main field-programmable gate array (FPGA)
implementations of the BLAST algorithm. The Progeniq implementation is
reported to be up to 100x faster than a software implementation running
on the same processor. TimeLogic offers an FPGA BLAST package called
Tera-BLAST.
 The Mitrion-C Open Bio Project is an ongoing effort to port BLAST to run
on Mitrion FPGAs. It is available on SourceForge.
