PCB Lect02 Pairwise Allign

Pairwise sequence alignment involves comparing two biological sequences to identify regions of similarity and evolutionary relationships. Dynamic programming is used to compute the highest scoring alignment between two sequences. A scoring matrix assigns values to aligned pairs based on factors like mutation rates, and gap penalties discourage insertions and deletions. The dynamic programming algorithm fills a table to keep track of the best score for aligning prefixes of the two sequences.

Lecture 2

Pairwise sequence alignment.

Principles Computational Biology

Teresa Przytycka, PhD
Assumptions:
•  Biological sequences are the product of evolution.
•  Micro scale changes: For short sequences (e.g. one-domain proteins) we usually assume that evolution proceeds by:
–  Substitutions
–  Insertions/Deletions
Example:
Human   MSLICSISNEVPEHPCVSPVS …
Protist MSIICTISGQTPEEPVIS-KT …
•  Macro scale changes: For large sequences (e.g. whole genomes) we additionally allow:
–  Duplications
–  Reversals
–  Reuse of protein segments known as domains by different proteins (via various mechanisms)
Importance of sequence comparison

Discovering functional and evolutionary relationships in biological sequences:
–  Similar sequences ⇒ evolutionary relationship
–  Evolutionary relationship ⇒ related function
–  Orthologs ⇒ same (or almost the same) function in different organisms.

“⇒” should be read as “usually implies”.

Discovering sequence similarity by dot plots
Given are two sequences of lengths n and m respectively. Do they share a similarity and, if so, in which region?

Dot-plot method: make an n x m matrix D and set D(i,j) = 1 if the amino acid (or nucleotide) at position i in the first sequence is the same as (or similar to, as described later) the amino acid (nucleotide) at position j in the second sequence.
Display the matrix graphically, printing a dot for 1 and a space for 0.
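The dot-plot construction can be sketched in a few lines of Python. This is only an illustration: the function names `dot_plot` and `render` are made up for this example, positions are 0-based, and "similar" is simplified to exact identity.

```python
def dot_plot(seq1, seq2):
    """Return an n x m matrix D with D[i][j] = 1 when seq1[i] == seq2[j]."""
    return [[1 if a == b else 0 for b in seq2] for a in seq1]


def render(D):
    """Text rendering of D: a dot for 1 and a space for 0."""
    return "\n".join("".join("." if cell else " " for cell in row) for row in D)


# Rows follow the first sequence, columns the second one.
D = dot_plot("ACTCATTAC", "TTACTCAAT")
print(render(D))
```

Identical regions show up as runs of dots along a top-left to bottom-right diagonal of the printed matrix.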
Dot plot illustration
[Figure: dot plot of TTACTCAAT (horizontal) against ACTCATTAC (vertical).]
Diagonals from top left to bottom right correspond to regions that are identical in both sequences.
The diagonals in the perpendicular direction correspond to reverse matches.
Annotation on the plot: Deletion? or Mutation?
An example of a dot plot where the relation between the sequences is not obvious
(In an obvious case we would see a long diagonal line.)
Figure drawn with Dotter: www.cgb.ki.se/cgb/sonnhammer/Dotter.html
Removing noise in dot plots
•  Most dots in a dot plot arise by chance and introduce a lot of noise.
•  Removing the noise: put a dot ONLY if, in addition to the similarity at the given position, there is also similarity at the surrounding positions (we look at a “window” whose size is given as a parameter).
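Dot plot with window 3
[Figure: the dot plot of TTACTCAAT (horizontal) against ACTCATTAC (vertical), filtered with a window of size 3.]
A dot is kept only if there were dots on both sides of it on the corresponding diagonal.
[Figure: dot plot with window size W = 10.]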
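A possible sketch of the window filter in Python. The name `windowed_dot_plot` is hypothetical, the window is assumed to be odd and centred on the tested position, similarity is again exact identity, and positions too close to the border to fit a full window are simply left empty (the slides do not say how borders are treated).

```python
def windowed_dot_plot(seq1, seq2, window=3):
    """Keep a dot at (i, j) only if the whole window around it matches
    along the diagonal through (i, j)."""
    half = window // 2
    n, m = len(seq1), len(seq2)
    D = [[0] * m for _ in range(n)]
    for i in range(half, n - half):
        for j in range(half, m - half):
            if all(seq1[i + d] == seq2[j + d] for d in range(-half, half + 1)):
                D[i][j] = 1
    return D


W = windowed_dot_plot("ACTCATTAC", "TTACTCAAT", window=3)
```

With window 3 this keeps exactly the dots that have a dot on each side on the same diagonal; larger windows remove more of the chance matches.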
EXAMPLE: Genomic dot plots
In these comparisons, each dot corresponds to a pair of orthologous genes. The key feature of these plots is a distinct X-shaped pattern. This suggests that large chromosomal inversions reversed the genomic sequence symmetrically around the origin of replication; such symmetrical inversions appear to be a common feature of bacterial genome evolution.
[Figure: two genomic dot plots with Vpa Chr I on the vertical axis, plotted against Vvu Chr I (left panel) and Vch Chr I (right panel); axis ticks run from 0 to 3000 vertically and from 0 to 2500 horizontally.]
OWEN: aligning long collinear regions of genomes
OWEN is an interactive tool for aligning two long DNA sequences that represents similarity
between them by a chain of collinear local similarities. OWEN employs several methods for
constructing and editing local similarities and for resolving conflicts between them.
Sequence alignment
•  Write one sequence along the other so as to expose any similarity between the sequences. Each element of a sequence is placed alongside either the corresponding element in the other sequence or a special “gap” character.
•  Example: TGKGI and AGKVGL can be aligned as
TGK-GI
AGKVGL
•  Is there a better alignment? How can we compare the “goodness” of two alignments?
•  We need to have:
–  A way of scoring an alignment
–  A way of computing a maximum-score alignment
Identity score
Let (x,y) be an aligned pair of elements of two sequences (at least one of x,y must not be a gap).

$$
\mathrm{id}(x, y) = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{if } x \neq y \end{cases}
$$

Score of an alignment = sum of scores of aligned pairs

TGK-G
AGKVG
0+1+1+0+1 = 3 (3 of 5 aligned pairs, i.e. 60 % identical)
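The identity score of an alignment is easy to compute directly. The sketch below uses made-up names (`identity`, `identity_score`) and assumes the alignment is given as two equal-length strings with "-" as the gap character.

```python
def identity(x, y):
    """id(x, y): 1 for two equal residues, 0 otherwise (a gap never matches)."""
    return 1 if x == y and x != "-" else 0


def identity_score(top, bottom):
    """Sum of id(x, y) over the aligned pairs of two equal-length rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    return sum(identity(x, y) for x, y in zip(top, bottom))


score = identity_score("TGK-G", "AGKVG")
percent = 100 * score / len("TGK-G")
print(score, percent)
```

For the example alignment this gives a score of 3 over 5 columns.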
Gap penalties
Consider two pairs of alignments:

ATCG         AT-CG
ATTG   and   ATT-G

They have the same identity score, but the alignment on the left is more likely to be correct.

ATC--TA         AT-C-TA
ATTTTTA   and   ATTTTTA

•  The first problem is corrected by introducing a “gap penalty”.
•  The second problem is corrected by introducing an additional penalty for opening a gap.
Example
Score the above alignments using the identity score, gap penalty = 1 (per gap position) and gap opening penalty = 2 (charged once per gap).

ATCG      1+1+0+1 = 3
ATTG

AT-CG     1+1-2-1-2-1+1 = -3
ATT-G

ATC--TA   1+1+0-2-1-1+1+1 = 0
ATTTTTA

AT-C-TA   1+1-2-1+0-2-1+1+1 = -2
ATTTTTA
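The four scores in this example can be checked with a small scoring routine. This is a sketch under stated assumptions: the name `score_alignment` is invented here, "-" is the gap character, and a gap that switches from one row to the other is counted as a new gap (which is what the -3 for the second alignment requires).

```python
def score_alignment(top, bottom, gap=1, gap_open=2):
    """Identity score with a per-position gap penalty and a gap opening penalty."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    prev_gap_row = None  # row that held a gap in the previous column, if any
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot consist of two gaps")
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            if row != prev_gap_row:
                score -= gap_open      # a new gap starts here
            score -= gap               # every gap position is charged
            prev_gap_row = row
        else:
            score += 1 if x == y else 0
            prev_gap_row = None
    return score


scores = [
    score_alignment("ATCG", "ATTG"),
    score_alignment("AT-CG", "ATT-G"),
    score_alignment("ATC--TA", "ATTTTTA"),
    score_alignment("AT-C-TA", "ATTTTTA"),
]
print(scores)
```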
Problems with identity score
•  In the two pairs of aligned sequences below there are mutations at the 1st and 6th positions and an insertion (or deletion) at the 4th position. However, while V and A share significant biophysical similarity and we often see mutations between them, W and A do not often substitute one for the other.
VGK-GI…      WGK-GI…
AGKVGL…      AGKVGL…
•  What if I (isoleucine) mutated to V and then back to I? Should this have the same score as when I was unchanged? If we would like to use the score to estimate evolutionary distances, it would be wrong to consider the two cases identical.
Scoring Matrices
An amino-acid scoring matrix is a 20x20 table indexed by amino acids, so that position X,Y in the table gives the score of aligning amino acid X with amino acid Y.

Identity matrix – exact matches receive one score and non-exact matches a different score (1 on the diagonal, 0 everywhere else).
Mutation data matrix – a scoring matrix compiled from observed protein mutation rates: some mutations are observed more often than others (PAM, BLOSUM).
Not used:
Physical properties matrix – amino acids with similar biophysical properties receive a high score.
Genetic code matrix – amino acids are scored based on similarities in the coding triplet.

(Scoring matrices will be discussed during the next class.)

Principles of Dynamic programming
•  Need to figure out how to use solutions to smaller problems for solving a larger problem.
•  We need to keep a reasonable bound on how many sub-problems we solve.
•  Make sure that each sub-problem is solved only once.
Dynamic programming algorithm for computing the score of the best alignment
For a sequence S = a1, a2, …, an let Sj = a1, a2, …, aj
S, S’ – two sequences
Align(Si,S’j) = the score of the highest scoring alignment between Si and S’j
s(ai, a’j) = similarity score between amino acids ai and a’j given by a scoring matrix like PAM, BLOSUM
g – gap penalty

$$
\mathrm{Align}(S_i, S'_j) = \max \begin{cases}
\mathrm{Align}(S_{i-1}, S'_{j-1}) + s(a_i, a'_j) \\
\mathrm{Align}(S_i, S'_{j-1}) - g \\
\mathrm{Align}(S_{i-1}, S'_j) - g
\end{cases}
$$
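Organizing the computation – dynamic programming table
Store Align(i,j) = Align(Si,S’j) in row i and column j of a table Align:

$$
\mathrm{Align}(i, j) = \max \begin{cases}
\mathrm{Align}(i-1, j-1) + s(a_i, a'_j) \\
\mathrm{Align}(i-1, j) - g \\
\mathrm{Align}(i, j-1) - g
\end{cases}
$$

[Figure: cell (i,j) takes the max over three neighbouring cells: the diagonal neighbour (i-1,j-1) with +s(ai,a’j) added, and the cells (i-1,j) and (i,j-1), each with g subtracted.]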
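Example of DP computation with g = 0; match = 1; mismatch = 0
With these parameters the optimal score is the length of a maximal (longest) common subsequence.
Initialization: row 0 and column 0 are all 0. The table is shown partially filled:

          A   T   T   G   C   G   C   G   C   A   T
      0   0   0   0   0   0   0   0   0   0   0   0
  A   0   1   1   1   1   1   1   1   1   1   1   1
  T   0   1   2   2   2   2   2   2   2
  G   0   1   2
  C   0   1
  T   0   1
  T   0   1
  A   0   1
  A   0   1
  C   0   1
  C   0   1
  A   0   1

Each cell is the max over its three neighbours; the diagonal move adds +1 if the two letters match, else 0.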
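Example of DP computation with g = 2; match = 2; mismatch = -1
Initialization (penalty for starting with a gap): row 0 and column 0 hold 0, -2, -4, …, -22. The table is shown partially filled:

           A    T    T    G    C    G    C    G    C    A    T
      0   -2   -4   -6   -8  -10  -12  -14  -16  -18  -20  -22
  A  -2    2    0   -2
  T  -4    0    4
  G  -6   -2    2    3
  C  -8
  T -10
  T -12
  A -14
  A -16
  C -18
  C -20
  A -22

Each cell is the max over its three neighbours; the diagonal move adds +2 if the two letters match and -1 otherwise, and the horizontal and vertical moves each add -2.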
The iterative algorithm
m = |S|; n = |S’|
for i ← 0 to m do A[i,0] ← -i * g
for j ← 0 to n do A[0,j] ← -j * g
for i ← 1 to m do
    for j ← 1 to n do
        A[i,j] ← max(
            A[i-1,j] - g,
            A[i-1,j-1] + s(i,j),
            A[i,j-1] - g
        )
return A[m,n]

Here s(i,j) stands for s(ai, a’j).
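A direct Python transcription of the iterative algorithm, given as a minimal sketch. The function name `global_alignment_score` and the simple match/mismatch scoring (standing in for a scoring matrix such as PAM or BLOSUM) are assumptions of this example; the default parameters are those of the second worked table (g = 2, match = 2, mismatch = -1).

```python
def global_alignment_score(S, T, g=2, match=2, mismatch=-1):
    """Score of the best global alignment of S and T with linear gap penalty g."""
    m, n = len(S), len(T)
    A = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        A[i][0] = -i * g           # S[:i] aligned against gaps only
    for j in range(n + 1):
        A[0][j] = -j * g           # T[:j] aligned against gaps only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if S[i - 1] == T[j - 1] else mismatch
            A[i][j] = max(A[i - 1][j] - g,       # S[i-1] against a gap
                          A[i - 1][j - 1] + s,   # S[i-1] against T[j-1]
                          A[i][j - 1] - g)       # T[j-1] against a gap
    return A[m][n]


print(global_alignment_score("ATG", "ATTG"))
```

Calling it with `g=0, match=1, mismatch=0` gives the length of a longest common subsequence, as in the first worked table.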
Complexity of the algorithm
•  Time O(nm); space O(nm), where n, m are the lengths of the two sequences.
•  When only the score is needed, space complexity can be reduced to O(n) by not storing the entries of the dynamic programming table that are no longer needed for the computation (keep the current row and the previous row only).
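The two-row idea can be sketched as follows. As before, the function name and the match/mismatch scoring are illustrative assumptions; only the score is returned, not the alignment.

```python
def global_score_linear_space(S, T, g=2, match=2, mismatch=-1):
    """Global alignment score keeping only the previous and the current row."""
    prev = [-j * g for j in range(len(T) + 1)]       # row 0 of the table
    for i in range(1, len(S) + 1):
        cur = [-i * g] + [0] * len(T)                # column 0 of row i
        for j in range(1, len(T) + 1):
            s = match if S[i - 1] == T[j - 1] else mismatch
            cur[j] = max(prev[j] - g, prev[j - 1] + s, cur[j - 1] - g)
        prev = cur                                   # row i-1 is no longer needed
    return prev[-1]


print(global_score_linear_space("ATG", "ATTG"))
```

The memory used is proportional to the length of the second sequence, whichever sequence is longer, so it pays to pass the shorter sequence as `T`.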
From computing the score to computing the alignment
Desired output: the sequence of substitutions/insertions/deletions leading to the optimal score.
ATTGCGTTATAT
AT-GCG-TATAT

[Figure: a table cell with three incoming arrows, one for each term of the max.]
Red direction (diagonal, +s(ai,a’j)) = match
Blue direction = gap in horizontal sequence
Green direction = gap in vertical sequence

The three directions correspond to three ways in which the alignment can end:
a1, a2, …, aj          a1, a2, …, aj -          a1, a2, …, aj
a’1, a’2, …, a’j       a’1, a’2, …, a’j         a’1, a’2, …, a’j -
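Recovering the path
[Figure: DP table for ATTG (horizontal) against ATGC (vertical); the path starts from the bottom-right cell (“Start path from here!”) and follows the recorded directions back to the top-left cell.]
ATTG-
AT-GC
If at some position several choices lead to the same max value, the path need not be unique.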
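One way to recover a path is to re-test, at each cell, which of the three terms produced the max, instead of storing arrows. The sketch below does that; the function name, the scoring parameters (match = 2, mismatch = -1, g = 2) and the tie-breaking order (diagonal first) are assumptions of this example, so when several optimal paths exist it may return a different one than the slide shows.

```python
def global_alignment(S, T, g=2, match=2, mismatch=-1):
    """Return (score, aligned S, aligned T) for one optimal global alignment."""
    m, n = len(S), len(T)

    def s(i, j):
        return match if S[i - 1] == T[j - 1] else mismatch

    A = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        A[i][0] = -i * g
    for j in range(n + 1):
        A[0][j] = -j * g
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            A[i][j] = max(A[i - 1][j - 1] + s(i, j), A[i - 1][j] - g, A[i][j - 1] - g)

    # Trace back from the bottom-right cell to the top-left cell.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and A[i][j] == A[i - 1][j - 1] + s(i, j):
            top.append(S[i - 1]); bottom.append(T[j - 1]); i -= 1; j -= 1
        elif i > 0 and A[i][j] == A[i - 1][j] - g:
            top.append(S[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(T[j - 1]); j -= 1
    return A[m][n], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = global_alignment("ATTG", "ATGC")
print(score)
print(top)
print(bottom)
```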
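Extra information – not obligatory
Reducing space complexity in the global alignment
Recall: computing the score in linear space is easy. Leaving a “trace” for finding the optimal alignment is harder. Why?

Let $\mathrm{OPT}\begin{bmatrix} x \\ y \end{bmatrix}$ be an optimal alignment between sequences x and y.

Fix i; then there exists j such that $\mathrm{OPT}\begin{bmatrix} x \\ y \end{bmatrix}$ can be obtained as

$$
\mathrm{OPT}\begin{bmatrix} x[1,\dots,i-1] \\ y[1,\dots,j-1] \end{bmatrix}
+ \begin{bmatrix} x[i] \\ y[j] \end{bmatrix}
+ \mathrm{OPT}\begin{bmatrix} x[i+1,\dots,m] \\ y[j+1,\dots,n] \end{bmatrix}
$$

OR

$$
\mathrm{OPT}\begin{bmatrix} x[1,\dots,i-1] \\ y[1,\dots,j] \end{bmatrix}
+ \begin{bmatrix} x[i] \\ - \end{bmatrix}
+ \mathrm{OPT}\begin{bmatrix} x[i+1,\dots,m] \\ y[j+1,\dots,n] \end{bmatrix}
$$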
Extra information – not obligatory
Computing which of the two cases holds and for what value of j:
1.  Use dynamic programming to compute the scores a[i,j] for fixed i = m/2 and all j. O(nm/2) time; linear space.
2.  Do the same for the suffixes. O(nm/2) time; linear space.
3.  Find out which of the two cases from the previous slide applies and for which value of j.
4.  Apply steps 1–3 recursively to the sequences to the left of (i,j) and to the right of (i,j) (see the figure on the previous slide).
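The core of steps 1 to 3 can be sketched in Python. Note the assumptions: this example uses a common simplified variant that cuts x between its two halves (x[:i] and x[i:]) rather than treating x[i] separately as in the two cases above, the helper names `last_row` and `split_point` are invented here, and scoring is match = 2, mismatch = -1, g = 2. The recursion of step 4 is left out.

```python
def last_row(x, y, g=2, match=2, mismatch=-1):
    """Scores of aligning all of x with every prefix of y, in linear space."""
    prev = [-j * g for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [-i * g] + [0] * len(y)
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] - g, cur[j - 1] - g)
        prev = cur
    return prev


def split_point(x, y, g=2, match=2, mismatch=-1):
    """Find j so that some optimal alignment pairs x[:i] with y[:j] and
    x[i:] with y[j:], where i = len(x) // 2. Returns (i, j, total score)."""
    i, n = len(x) // 2, len(y)
    fwd = last_row(x[:i], y, g, match, mismatch)                # prefixes
    bwd = last_row(x[i:][::-1], y[::-1], g, match, mismatch)    # suffixes, reversed
    j = max(range(n + 1), key=lambda c: fwd[c] + bwd[n - c])
    return i, j, fwd[j] + bwd[n - j]


i, j, total = split_point("ATTG", "ATG")
print(i, j, total)
```

The returned total equals the optimal global score, which gives a convenient sanity check against a full-table computation.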
Ignoring initial and final gaps – semiglobal comparison

CAGCA-CTTGGATTCTCGG
---CAGCGTGG--------
No penalties for the gaps at the two ends.

Recall the initialization step for the dynamic programming table:
A[i,0] = -ig; A[0,j] = -jg – these are responsible for the penalties on initial gaps. Set them to zero!

How to ignore final gaps?
Take the largest value in the last row/column and trace back from there.
Example of DP computation ignoring flanking gaps by assigning 0 to initial gap penalties
The table is shown partially filled:

          A   T   T   G   C   G   C   G   C   A   T
      0   0   0   0   0   0   0   0   0   0   0   0
  A   0   1  -1
  T   0   1   2
  G   0
  C   0
  T   0
  T   0
  A   0
  A   0
  C   0
  C   0
  A   0

Figure labels: the diagonal move adds +s(ai,aj); the horizontal and vertical moves add -2; each cell is the max of the three.
To ignore final gap penalties, choose the highest scoring entry in the last column or last row (highlighted in red in the figure) and trace the path back from there.
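A sketch of the semiglobal score in Python. The function name is hypothetical and the scoring parameters (match = 1, mismatch = -1, g = 2) are assumptions chosen for illustration; the slides do not state the parameters used in the table.

```python
def semiglobal_score(S, T, g=2, match=1, mismatch=-1):
    """Best alignment score when initial and final gaps are free."""
    m, n = len(S), len(T)
    A = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if S[i - 1] == T[j - 1] else mismatch
            A[i][j] = max(A[i - 1][j - 1] + s, A[i - 1][j] - g, A[i][j - 1] - g)
    best_in_last_row = max(A[m])
    best_in_last_column = max(row[n] for row in A)
    return max(best_in_last_row, best_in_last_column)


print(semiglobal_score("GGGGACGTGGGG", "ACGT"))
print(semiglobal_score("CAGCACTTGGATTCTCGG", "CAGCGTGG"))
```

In the first call the short sequence occurs exactly inside the long one, so all four letters match and the flanking gaps cost nothing.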
Compressing the gaps
The two alignments below have the same score. The first alignment, with a single gap of length 2, is better.

ATTTTAGTAC      ATTTTAGTAC
ATT--AGTAC      A-T-TAGTAC

Solution: have an additional penalty for opening a gap.

Affine gap penalty

w(k) = h + gk ; h, g constants

Interpretation: cost of starting a gap: h+g; extending a gap: +g
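Naïve extension of the previous algorithm
[Figure: table Align with the row through (i,j) highlighted in green and the column through (i,j) highlighted in blue; i’ marks an earlier row.]
Rather than checking for the best of three values, we have to check the whole green row and blue column to consider all possible gap lengths.

That is, find the max over the following:
–  max over i’ < i of Align(i’,j) – (gap opening penalty) – g·(gap length), with gap length i – i’
–  Align(i-1,j-1) + s(ai,a’j)
–  max over j’ < j of Align(i,j’) – (gap opening penalty) – g·(gap length), with gap length j – j’

Complexity O(n^3)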
General gap penalty

$$
a[i,j] = \max \begin{cases}
a[i-1,j-1] + s(i,j) \\
\max_{1 \le k \le j} \; a[i,j-k] - w(k) \\
\max_{1 \le k \le i} \; a[i-k,j] - w(k)
\end{cases}
$$

w(k) – any gap penalty function (not necessarily affine)

k – size of a gap.
O(n^2) algorithm for affine gap penalty
Let w(k) = h + gk
S, S’ – compared sequences

We will have 3 dynamic programming tables:
•  a[i,j] – score of the best possible alignment of Si and S’j
•  b[i,j] – score of the best possible alignment of Si and S’j that ends with a gap in S
•  c[i,j] – score of the best possible alignment of Si and S’j that ends with a gap in S’

In the notation of the textbook by Jones & Pevzner:
[Figure (Jones & Pevzner): three levels a, b and c. Edges within a correspond to matching; edges from a into b or c open a new gap and insert a gap of length one; edges within b or c continue a gap; edges from b or c back to a close the gap.]
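Initialization
Assume that we charge for initial and terminal gaps.
-infinity is assigned where no alignment is possible.

a[0,0] = 0;  a[i,0] = -(h+gi) (i ≥ 1);  a[0,j] = -(h+gj) (j ≥ 1)
b[i,0] = -infinity;  b[0,j] = -(h+gj) (j ≥ 1)
c[0,j] = -infinity;  c[i,0] = -(h+gi) (i ≥ 1)

The boundary values of a follow from its definition: the only alignment of a prefix with the empty prefix is a single gap, whose cost is w(k) = h + gk.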
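Affine gap penalty function - cont
w(k) = h + gk ; h, g constants
Interpretation: cost of starting a gap: h+g; extending a gap: +g
Let a, b, c be as before. Now they can be computed as follows:

$$
a[i,j] = \max \begin{cases}
a[i-1,j-1] + s(i,j) \\
b[i,j] \\
c[i,j]
\end{cases}
$$

$$
b[i,j] = \max \begin{cases}
a[i,j-1] - (h+g) & \text{start a new gap in the first sequence} \\
b[i,j-1] - g & \text{extend the gap in the first sequence by one}
\end{cases}
$$

$$
c[i,j] = \max \begin{cases}
a[i-1,j] - (h+g) & \text{start a new gap in the second sequence} \\
c[i-1,j] - g & \text{extend the gap in the second sequence by one}
\end{cases}
$$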
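The three-table scheme can be sketched in Python as follows. Assumptions of this example: the function name is invented, scoring is the identity score (match = 1, mismatch = 0) with h = 2 and g = 1 as in the earlier gap penalty example, and the boundary entries a[i,0] and a[0,j] are set to the cost of a single all-gap alignment, -(h+gk), which is what the definition of a as the best score over all alignments implies.

```python
NEG_INF = float("-inf")


def affine_global_score(S, T, h=2, g=1, match=1, mismatch=0):
    """Best global alignment score with affine gap penalty w(k) = h + g*k."""
    m, n = len(S), len(T)
    a = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # best overall
    b = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # ends with a gap in S
    c = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # ends with a gap in T
    a[0][0] = 0
    for j in range(1, n + 1):
        b[0][j] = -(h + g * j)
        a[0][j] = b[0][j]
    for i in range(1, m + 1):
        c[i][0] = -(h + g * i)
        a[i][0] = c[i][0]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # b and c must be filled before a, since a[i][j] uses both.
            b[i][j] = max(a[i][j - 1] - (h + g), b[i][j - 1] - g)
            c[i][j] = max(a[i - 1][j] - (h + g), c[i - 1][j] - g)
            s = match if S[i - 1] == T[j - 1] else mismatch
            a[i][j] = max(a[i - 1][j - 1] + s, b[i][j], c[i][j])
    return a[m][n]


print(affine_global_score("ATCTA", "ATTTTTA"))
```

For ATCTA against ATTTTTA the best score is 0, reached by the alignment with one gap of length 2 from the gap penalty example.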
More sophisticated gap penalties
•  The gap penalty can be made to depend non-linearly on the gap length (e.g. as a log function).

Let the gap penalty be given by a function w(k), where k is the gap length.

•  If w(k) is an arbitrary function – O(n^3) algorithm

•  w(k) = log k (and other concave or convex functions) – O(n^2 log n) algorithm (non-trivial)
Comparing similar sequences
Similar sequences – the optimal alignment has a small number of gaps.

The “alignment path” stays close to the diagonal.

[Figure from the book by Setubal and Meidanis, “Introduction to Computational Molecular Biology”.]

Speeding up dynamic programming under the assumption of a small number of gaps
Idea: use only a strip of width 2k+1 along the diagonal. The rest of the array remains unused (and is not initialized).

Modify the “max” expressions so that cells outside the strip are not considered.

Time complexity O(kn)

Space complexity – if you store only the cells that are used – O(kn)
Identifying diagonals:

Number the diagonals as follows:

0 – main diagonal
+i ith diagonal above 0 diagonal
-i ith diagonal below 0 diagonal

Simple test to find the number of diagonal for element a(i,j):

j-i
k-band alignment
n = |S| = |S’|
for i ← 0 to k do A[i,0] ← -i * g
for j ← 0 to k do A[0,j] ← -j * g
for i ← 1 to n do
    for d ← -k to k do
        j = i + d
        if 1 <= j <= n then:
            A[i,j] ← max(
                if inside_strip(i-1,j,k) then A[i-1,j] - g else -infinity,
                A[i-1,j-1] + s(i,j),
                if inside_strip(i,j-1,k) then A[i,j-1] - g else -infinity
            )
return A[n,n]

where inside_strip(i,j,k) tests whether cell A[i,j] is inside the strip, that is, whether |i-j| <= k.
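A runnable sketch of the k-band idea in Python. To store only the cells that are used, this example keeps the table in a dictionary keyed by (i, j); that storage choice, the function name and the match/mismatch scoring (match = 2, mismatch = -1, g = 2) are assumptions of the example rather than part of the slides.

```python
NEG_INF = float("-inf")


def banded_score(S, T, k, g=2, match=2, mismatch=-1):
    """Global alignment score restricted to the strip |i - j| <= k."""
    n = len(S)
    if len(T) != n:
        raise ValueError("this sketch assumes sequences of equal length")
    A = {}  # only cells inside the strip are ever stored

    def get(i, j):
        return A.get((i, j), NEG_INF)  # cells outside the strip count as -infinity

    for i in range(0, min(k, n) + 1):
        A[(i, 0)] = -i * g
    for j in range(0, min(k, n) + 1):
        A[(0, j)] = -j * g
    for i in range(1, n + 1):
        for j in range(max(1, i - k), min(n, i + k) + 1):
            s = match if S[i - 1] == T[j - 1] else mismatch
            A[(i, j)] = max(get(i - 1, j) - g,
                            get(i - 1, j - 1) + s,
                            get(i, j - 1) - g)
    return A[(n, n)]


print(banded_score("GATT", "ATTG", k=0))
print(banded_score("GATT", "ATTG", k=1))
```

With k = 0 only the gap-free alignment is considered; widening the band to k = 1 lets the algorithm find the better alignment that shifts one sequence by one position.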
Local alignment
•  The alignment techniques considered so far work well for sequences that are similar over their whole length.
•  This does not need to be the case. Example: genes from the hox family have a very short but highly conserved subsequence – the so-called hox domain.
•  The global alignment methods considered so far (that is, algorithms that try to find the best alignment over the whole length) can miss such a region of local similarity.
[Figure: a global alignment compared with a local alignment.]
Local alignment (Smith, Waterman)
So far we have been dealing with global alignment.
Local alignment – alignment between substrings.
Main idea: if the alignment becomes too bad – drop it.

Set the penalties p and g so that the alignment of random strings gives a negative score.

$$
a[i,j] = \max \begin{cases}
a[i-1,j-1] + s(a_i, a'_j) \\
a[i-1,j] - g \\
a[i,j-1] - g \\
0
\end{cases}
$$

Here g is the gap penalty, subtracted as in the global recurrence.

Finding the alignment: find the highest scoring cell and trace back from it until a cell with score 0 is reached.
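A minimal Python sketch of the local alignment score. The function name and the scoring parameters (match = 2, mismatch = -1, g = 2) are assumptions made for this example; the function returns the best score and the cell where it occurs, which is where a trace-back would start.

```python
def local_alignment_score(S, T, g=2, match=2, mismatch=-1):
    """Return (best local score, (i, j)) where (i, j) is the highest scoring cell."""
    m, n = len(S), len(T)
    a = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 are 0
    best, best_cell = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if S[i - 1] == T[j - 1] else mismatch
            a[i][j] = max(a[i - 1][j - 1] + s,
                          a[i - 1][j] - g,
                          a[i][j - 1] - g,
                          0)                     # drop an alignment that got too bad
            if a[i][j] > best:
                best, best_cell = a[i][j], (i, j)
    return best, best_cell


print(local_alignment_score("GGGACGTGGG", "TTACGTTT"))
```

Here the two sequences share only the substring ACGT, so the best local alignment consists of those four matches and ends at that substring's last letter in both sequences.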
Example
Global/local comparison

Global alignment gives lower score.

Pairwise alignment: a combination of
local and global alignments

alignment

• semiglobal

• global
Step 2 of FASTA
Locate best diagonal runs (gapless alignments):
–  Give a positive score for each hot spot
–  Give a negative score for each space between hot spots
–  Find the best scoring runs
–  Score the alignments from the runs and find the ones above a threshold. These are possible “sub-alignments”.
Step 3 of FASTA
•  Combine sub-alignments into one alignment.
•  We need to solve a problem known as the chaining problem: find a collection of non-contradicting sub-alignments that maximizes some scoring function.
•  The problem reduces to a problem close to maximum common subsequence.
