Week 3 LocalAlignment

Uploaded by

derinergin3


Sequence Alignment

Local Sequence Alignment

Local Alignment

• So far we have discussed global alignment, where we are looking for the best match between sequences from one end to the other

• More commonly, we want a local alignment, the best match between subsequences of two sequences x and y.
Local Alignment DP Algorithm

• Original formulation: Smith & Waterman, Journal of Molecular Biology, 1981.

• interpretation of array values is somewhat different

– F(i, j) = score of the best alignment of a suffix of x[1...i] and a suffix of y[1...j].
Local Alignment DP Algorithm
• The recurrence relation is slightly different than for the global algorithm

$$
F(i, j) = \max
\begin{cases}
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) + g \\
F(i, j-1) + g \\
0
\end{cases}
$$
Local Alignment Example

• The recurrence relation is slightly different than for the global algorithm

• initialization: first row and first column initialized with 0’s

• traceback:
– find maximum value of F(i, j); can be anywhere in matrix
– stop when we get to a cell with value 0
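
The initialization, recurrence, and traceback rules above can be put together in a short Python sketch. It is an illustration only, not the lecture's reference code: the function name `smith_waterman` and the scoring values (`match`, `mismatch`, and a negative linear gap score `g`) are assumptions chosen for the example, and s(x_i, y_j) is reduced to a simple match/mismatch score.

```python
def smith_waterman(x, y, match=1, mismatch=-1, g=-2):
    """Return (best_score, aligned_x, aligned_y) for one best local alignment.

    Scoring values are illustrative assumptions; g is a (negative) gap score.
    """
    n, m = len(x), len(y)
    # Initialization: first row and first column are all zeros.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix with the local-alignment recurrence (note the 0 option).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + g,
                          F[i][j - 1] + g,
                          0)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Traceback: start at the maximum cell, stop at a cell with value 0.
    i, j = best_pos
    ax, ay = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + s:
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + g:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("AAAGGGTTT", "CCGGGCC"))  # (3, 'GGG', 'GGG')
```

Only the shared GGG region is reported; the unrelated flanking residues are ignored, which is exactly what distinguishes this from a global alignment.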
Heuristic Methods
• The algorithms we learned take O(nm) time to align sequences, which is too
slow for searching large databases.
• imagine an internet search engine, but where queries and results are protein sequences

• Heuristic methods do fast approximation to dynamic programming

• example: BLAST [Altschul et al., 1990; Altschul et al., 1997]
– break sequence into small (e.g. 3-residue) “words”
– scan database for word matches
– extend all matches to seek high-scoring alignments
– tradeoff: some sensitivity is given up for speed
Basic Local Alignment Search Tool
(BLAST®)
• The main NCBI tool for comparing a protein or DNA sequence to other
sequences in various databases
• BLAST searching is one of the fundamental ways of learning about a protein or
gene:
• the search reveals what related sequences are present in the same organism and other
organisms.

• BLAST searching allows the user to select one sequence (termed the query) and
perform pairwise sequence alignments between the query and an entire
database (termed the target).
Basic Local Alignment Search Tool
(BLAST®)
• Tens of millions of sequences are evaluated in a BLAST search and only the most closely related matches are returned.

• The Smith–Waterman local alignment algorithm finds optimal pairwise alignments, but we cannot use it for database searches generally because it is too computationally intensive.

• BLAST offers a local alignment strategy having both speed and sensitivity.
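
To make "too computationally intensive" concrete, here is a rough back-of-the-envelope sketch in Python. All numbers are assumptions for illustration: a hypothetical 300-residue query, a database of about 6.9 billion residues, and an assumed rate of one billion matrix-cell updates per second.

```python
def dp_cells(query_len, database_residues):
    """Number of F(i, j) cells a full dynamic-programming search must fill."""
    return query_len * database_residues


# Hypothetical search: 300-residue query against ~6.9 billion residues.
cells = dp_cells(query_len=300, database_residues=6_900_000_000)

# Assumed throughput: 1e9 cell updates per second (illustrative only).
seconds = cells / 1e9
print(f"{cells:.2e} cells, roughly {seconds / 60:.1f} minutes per query")
```

Under these assumptions a single query needs about two trillion cell updates, which is why a heuristic that skips most of the matrix is attractive for interactive database searches.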
BLAST uses
BLAST searching has a wide variety of uses:

1. Determining what orthologs and paralogs are known for a particular protein or nucleic acid sequence.
• When a new bacterial genome is sequenced and several thousand proteins are identified, how many of these proteins are paralogous?

2. Determining what proteins or genes are present in a particular organism.
• Are there any reverse transcriptase genes (such as HIV‐1 Pol gene) in fish?

3. Determining the identity of a DNA or protein sequence.

• For example, you may perform an RNAseq experiment and learn that a particular
RNA sequence is dramatically regulated under the experimental conditions that you
are using. This sequence may be searched against a protein database to learn what
proteins are most related to the protein encoded by your nucleotide sequence.

4. Discovering new genes.

• For example, a BLAST search of genomic DNA may reveal that the DNA encodes a
protein that has not been described before.
5. Determining what variants have been described for a particular gene or
protein.
• For example, many viruses are extremely mutable; what HIV‐1 Pol variants are known?

6. Investigating expressed sequence tags (ESTs) that may exhibit alternative splicing.
• There is an EST database that can be explored by BLAST searching.

7. Exploring amino acid residues that are important in the function and/or
structure of a protein.
• The results of a BLAST search can be multiply aligned to reveal conserved residues such as
cysteines that are likely to have important biological roles.
BLAST search
(4 steps)
1. Selecting a sequence of interest and pasting, typing, or uploading it into the BLAST
input box.

2. Selecting a BLAST program (most commonly BLASTP, BLASTN, BLASTX, TBLASTX, or TBLASTN).

3. Selecting a database to search. A common choice is the nonredundant (nr) database, but there are many other databases.

4. Selecting optional parameters, both for the search and for the format of the output.
These options include choosing a substitution matrix, filtering of low‐complexity
sequences, and restricting the search to a particular set of organisms.
BLAST search
(4 steps)
Select the link “Standard protein‐protein BLAST [blastp].” You will see a box to enter the query sequence; enter the sequence of human beta globin (NP_000509.1), then click the “BLAST” button (Fig. 4.1). The result lists the proteins that are most closely related to beta globin. We now describe the practical aspects of BLAST searching in detail.

Note: the nonredundant database currently has ∼30 million sequences and ∼88 billion letters. If you search with a query such as NP_000509 without specifying a version number then, by default, the most recent version will be used.

Main page for a BLASTP search at NCBI. The sequence can be input as an accession number, GI identifier, or FASTA‐formatted sequence as shown here (arrow 1). The database must be selected (arrow 2) if the default setting is not selected (as here, in which the database is set to RefSeq proteins); the choice is highlighted in yellow. The search can be restricted to a particular organism or taxonomic group, and Entrez queries can be used to further focus the search (arrow 3); here we limit the search to entries including the author Max Perutz. We discuss the BLASTP algorithm in this chapter (arrow 4), and PSI‐BLAST, PHI‐BLAST, and DELTA‐BLAST in Chapter 5. Many of the search parameters can be modified (arrow 5).
BLAST search
(Output)
• Top portion of a BLAST output (Figure 4.6) describes the search that was performed, including the query (arrow 1), the query length (arrow 2), the database that was searched (arrow 3), and the program that was employed (BLASTP 2.2.28 in this case; arrow 4). At the bottom, additional links include a search summary showing details of the search statistics (arrow 5) and taxonomy reports of the results (arrow 6).
Source: BLAST, NCBI.
BLAST search
(Output)
• The lower portion of a BLAST search output consists of a series of pairwise sequence alignments, such as the one in the figure. Here, the pairwise match between the query (input sequence) and the subject (i.e., the particular database match that is aligned to the query) can be inspected. Four scoring measures are provided: the bit score, the expect score, the percent identity, and the positives (percent similarity).
Figure panels: (a) Default: conditional compositional score matrix adjustment; (b) No adjustment (by default, filter low complexity regions).

BLAST search summary

The upper portion shows the search parameters (e.g., the program that was used, the expect value (arrow 1), the scoring matrix (arrow 2), any filters that were applied, the threshold (arrow 3)). The middle portion describes the database; in this example it includes about 6.9 billion amino acid residues (arrow 4), and the output has been restricted to txid10090 (i.e., mouse). The bottom portion shows Karlin–Altschul statistics including lambda, K, and H.
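
The Karlin–Altschul parameters lambda and K connect a raw alignment score S to the bit score and the expect value: the bit score is S' = (lambda·S − ln K) / ln 2, and the expect value is E = K·m·n·e^(−lambda·S), equivalently m·n·2^(−S'), where m and n are the query and database lengths. The Python sketch below only evaluates these formulas; the function names and the numeric values of lambda, K, m, and n are placeholders for illustration, and real values should be read from the search summary of an actual run (which also applies length corrections not modelled here).

```python
import math


def bit_score(raw_score, lam, K):
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def expect_value(raw_score, lam, K, m, n):
    """Expected number of chance hits with score >= S: E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * raw_score)


# Placeholder values (assumptions): take lambda and K from your own search summary.
lam, K = 0.267, 0.041
m, n = 147, 6_900_000_000  # hypothetical query length and database size in residues

for raw in (50, 100, 200):
    print(raw, round(bit_score(raw, lam, K), 1), f"{expect_value(raw, lam, K, m, n):.2e}")
```

The loop shows the pattern noted in the output tables: as the raw score grows, the bit score grows linearly and the E value shrinks exponentially.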
BLAST search
(Output)
• A typical BLASTP output (Figure 4.9) includes a list of database sequences that match the query. Links are provided to that database entry (e.g., an NCBI Protein entry) and to the pairwise alignment to the query. The bit score and E value for each alignment are also provided. Note that the best matches at the top of the list have large bit scores and small E values.
Source: BLASTP, NCBI.
• … (when that information is available). For example, a search of human beta globin DNA (NM_000518.4) against human RefSeq nucleotide sequences includes a match to epsilon globin (NM_005330.3). That alignment includes information about the corresponding proteins (Fig. 4.11).
How Blast works
1. BLAST identifies homologous sequences using a heuristic method which
initially finds short matches between two sequences; thus, the method does
not take the entire sequence space into account.

2. After initial match, BLAST attempts to start local alignments from these initial
matches. This also means that BLAST does not guarantee the optimal
alignment, thus some sequence hits may be missed.

3. In order to find optimal alignments, the Smith-Waterman algorithm should be used.
How Blast works
1. Parse the query sequence into “words”. If a query sequence contains QWRTG, the searched words are QWR, WRT, RTG.

2. For each word in the query sequence, generate neighborhood words: words whose score against the query word, under the selected scoring matrix, is at least the threshold T.

3. Use the exact words and neighborhood words to find database sequences that have the words in common.

4. After the initial finding of words (seeding), extend the (only 3 residues long) alignment in both directions. Each time the alignment is extended, the alignment score increases or decreases. When the alignment score drops below a predefined threshold, the extension of the alignment stops.

5. Report the hit in the search results if it meets or exceeds the BLAST cutoffs for a statistically significant match.
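
Steps 1 and 3 (parsing words and finding database sequences that share them) can be sketched in Python as follows. This is a simplified illustration using exact words only; the function names `words`, `build_index`, and `find_seeds` and the toy database are hypothetical, and real BLAST uses far more compact lookup structures.

```python
from collections import defaultdict


def words(seq, w=3):
    """All overlapping words of length w with their start positions."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def build_index(database, w=3):
    """Map each word to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos, word in words(seq, w):
            index[word].append((name, pos))
    return index


def find_seeds(query, index, w=3):
    """Exact word hits between the query and the indexed database sequences."""
    seeds = []
    for qpos, word in words(query, w):
        for name, dpos in index.get(word, []):
            seeds.append((word, qpos, name, dpos))
    return seeds


# Toy database (made up for this example).
database = {"s1": "AAQWRAA", "s2": "CCCC"}
index = build_index(database)

print(words("QWRTG"))              # [(0, 'QWR'), (1, 'WRT'), (2, 'RTG')]
print(find_seeds("QWRTG", index))  # [('QWR', 0, 's1', 2)]
```

Because the index is built once and each query word is a dictionary lookup, the scan touches only positions that share a word with the query instead of every cell of a DP matrix.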
How Blast works
… that the two sequences have in common. A word is simply defined as a number of … In blastp the default word size is 3 (W=3). If a query sequence contains QWRTG, the searched words are QWR, WRT, RTG. See figure 1 for an illustration of words in a protein sequence.

Figure 1: Generation of exact BLAST words with a word size of W=3.

In the initial BLAST seeding, the algorithm finds all common words between the query sequence and the hit sequence(s). Only regions with a word hit will be used to build an alignment.
How Blast works
A neighborhood word is a word obtaining a score of at least T when compared with the query word using a selected scoring matrix (see figure 2). The default scoring matrix for blastp is BLOSUM62 (for explanation of scoring matrices, see www.clcbio.com/be). The compilation of exact words and neighborhood words is then used to match against the database sequences.

Figure 2: Neighborhood BLAST words based on the BLOSUM62 matrix. Only words whose score meets the threshold T (13 in this figure) are included in the initial seeding.
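
A minimal Python sketch of neighborhood-word generation follows. To stay short it does not use BLOSUM62: `toy_score` and its similarity groups are a made-up stand-in scoring scheme, so the scores and the threshold used here are not comparable to the T = 13 of the figure. Only the mechanism (score every candidate word against the query word, keep those scoring at least T) reflects the text.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical similarity groups, standing in for a real substitution matrix.
GROUPS = ["ILVM", "KRH", "DE", "NQ", "ST", "FWY"]


def toy_score(a, b):
    """Toy substitution score: identical 4, same group 1, otherwise -2."""
    if a == b:
        return 4
    if any(a in g and b in g for g in GROUPS):
        return 1
    return -2


def neighborhood(word, T, score=toy_score):
    """All words of the same length whose score against `word` is at least T."""
    result = []
    for cand in product(AMINO_ACIDS, repeat=len(word)):
        total = sum(score(a, b) for a, b in zip(word, cand))
        if total >= T:
            result.append(("".join(cand), total))
    # Highest score first, ties in alphabetical order.
    return sorted(result, key=lambda pair: (-pair[1], pair[0]))


print(neighborhood("QWR", T=9))
print(len(neighborhood("QWR", T=12)))  # only the exact word remains
```

Raising T shrinks the list toward the exact word alone, which is the search-space trade-off controlled by the threshold.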
How Blast works
After initial finding of words (seeding), the BLAST algorithm will extend the (only 3 residues long) alignment in both directions (see figure 3). Each time the alignment is extended, the alignment score increases or decreases. When the alignment score drops below a predefined threshold, the extension of the alignment stops. This ensures that the alignment is not extended to regions where only very poor alignment between the query and hit sequence is possible. If the obtained alignment receives a score above a certain threshold, it will be included in the final BLAST result.

HSP: High-scoring segment pair

Figure 3: Blast aligning in both directions. The initial word match is marked green.

By tweaking the word size W and the neighborhood word threshold T, it is possible to limit the search space. E.g. by increasing T, the number of neighboring words will drop and thus limit the search space as shown in figure 4.
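
The extension step can be sketched in Python as an ungapped walk away from the seed in each direction. The stopping rule is an assumption made for this example: extension halts once the running score falls more than a hypothetical `drop` below the best score seen so far, and the segment is trimmed back to that best point. The name `extend_seed` and the match/mismatch scores are likewise illustrative.

```python
def extend_seed(query, subject, qpos, spos, w=3, match=1, mismatch=-1, drop=2):
    """Extend a word hit without gaps; return (score, query_segment, subject_segment)."""

    def score(a, b):
        return match if a == b else mismatch

    seed_score = sum(score(query[qpos + k], subject[spos + k]) for k in range(w))

    def extend(step, qi, si):
        """Walk in one direction; return (best gain, length giving that gain)."""
        best = running = 0
        best_len = length = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            running += score(query[qi], subject[si])
            length += 1
            if running > best:
                best, best_len = running, length
            elif best - running > drop:
                break  # score has dropped too far below the best seen
            qi += step
            si += step
        return best, best_len

    right_gain, right_len = extend(+1, qpos + w, spos + w)
    left_gain, left_len = extend(-1, qpos - 1, spos - 1)

    total = seed_score + left_gain + right_gain
    q_seg = query[qpos - left_len:qpos + w + right_len]
    s_seg = subject[spos - left_len:spos + w + right_len]
    return total, q_seg, s_seg


# Seed QWR found at position 2 in both toy sequences.
print(extend_seed("CCQWRTGHH", "DDQWRTGKK", qpos=2, spos=2))  # (5, 'QWRTG', 'QWRTG')
```

The returned pair of segments plays the role of a high-scoring segment pair (HSP); whether it is reported would then depend on a separate score cutoff.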
Which BLAST program should I use?
Depending on the nature of the sequence, it is possible to use different BLAST programs for the database search. There are five versions of the BLAST program: blastn, blastp, blastx, tblastn, and tblastx.

Option    Query type    DB type       Comparison               Note
blastn    nucleotide    nucleotide    nucleotide-nucleotide
blastp    protein       protein       protein-protein
tblastn   protein       nucleotide    protein-protein          The database is translated into protein
blastx    nucleotide    protein       protein-protein          The queries are translated into protein
tblastx   nucleotide    nucleotide    protein-protein          The queries and database are translated into protein
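
The table can be read as a simple lookup from query type and database type to program name. The Python sketch below encodes it; the function `choose_program` and its `compare_as_protein` flag are hypothetical helpers for this example, not part of any BLAST distribution.

```python
def choose_program(query_type, db_type, compare_as_protein=False):
    """Pick a BLAST program name from the query and database types.

    For nucleotide vs nucleotide, compare_as_protein=True selects tblastx
    (both sides translated) instead of blastn.
    """
    table = {
        ("protein", "protein"): "blastp",
        ("protein", "nucleotide"): "tblastn",    # database is translated
        ("nucleotide", "protein"): "blastx",     # query is translated
        ("nucleotide", "nucleotide"): "tblastx" if compare_as_protein else "blastn",
    }
    return table[(query_type, db_type)]


print(choose_program("nucleotide", "protein"))                                # blastx
print(choose_program("nucleotide", "nucleotide", compare_as_protein=True))    # tblastx
```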
