
Dynamic Programming

This document discusses dynamic programming algorithms for pairwise sequence alignment. It introduces global and local alignment approaches. Global alignment finds the best alignment across the entire sequences, using the Needleman-Wunsch algorithm. Local alignment finds the best matching subsections, using the Smith-Waterman algorithm. Both use dynamic programming to compute the optimal alignment score in quadratic time and space. The document also discusses techniques to reduce the space complexity to linear.

Uploaded by

infamous0218
Copyright
© Attribution Non-Commercial (BY-NC)

Alignment II

Dynamic Programming
2
Pair-wise sequence alignments
A: C A T - T C A - C
   |   |     | |   |
B: C - T C G C A G C

Idea: Display one sequence above
another with spaces inserted in both
to reveal similarity
3
Two types of alignment
S = CTGTCGCTGCACG
T = TGCCGTG
Global alignment:
CTGTCG-CTGCACG
-TGC-CG-TG----
Local alignment:
CTGTCGCTGCACG--
-------TGC-CGTG
4
Global alignment: Scoring
CTGTCG-CTGCACG
-TGC-CG-TG----
Reward for matches: $\alpha$
Mismatch penalty: $\beta$
Space penalty: $\gamma$

$\text{score}(A) = \alpha w - \beta x - \gamma y$
w = #matches x = #mismatches y = #spaces
5
Global alignment: Scoring
  C   T   G   T   C   G   -   C   T   G   C
  -   T   G   C   -   C   G   -   T   G   -
 -5  10  10  -2  -5  -2  -5  -5  10  10  -5
Total = 11
Reward for matches: 10
Mismatch penalty: 2
Space penalty: 5
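
The scoring rule above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `alignment_score` and its parameter names are illustrative assumptions, and the two gapped rows are the first eleven columns of the global alignment CTGTCG-CTGCACG / -TGC-CG-TG---- shown on the earlier scoring slide. Penalties are passed as positive numbers and subtracted, following score(A) = (match reward) x w - (mismatch penalty) x x - (space penalty) x y.

```python
def alignment_score(a, b, match=10, mismatch=2, space=5):
    """Score two equal-length gapped rows column by column.

    match is a reward; mismatch and space are positive penalties
    that are subtracted (names are illustrative, not from the slides).
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    w = x = y = 0  # number of matches, mismatches, spaces
    for ca, cb in zip(a, b):
        if ca == "-" or cb == "-":
            y += 1
        elif ca == cb:
            w += 1
        else:
            x += 1
    return match * w - mismatch * x - space * y


# 4 matches, 2 mismatches, 5 spaces: 40 - 4 - 25
print(alignment_score("CTGTCG-CTGC", "-TGC-CG-TG-"))  # 11
```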
6
Optimum Alignment
The score of an alignment is a measure of
its quality
Optimum alignment problem: Given a pair
of sequences X and Y, find an alignment
(global or local) with maximum score
The similarity between X and Y, denoted
sim(X,Y), is the maximum score of an
alignment of X and Y
7
Alignment algorithms
Global: Needleman-Wunsch
Local: Smith-Waterman
NW and SW use dynamic
programming
Variations:
Gap penalty functions
Scoring matrices
8
Global Alignment: Algorithm
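$C(i,j)$ = Cost of optimum alignment of $S_{1..i}$ and $T_{1..j}$
$S_{1..i}$ = Prefix of length $i$ of $S$
$T_{1..j}$ = Prefix of length $j$ of $T$

$$w(a,b) = \begin{cases} \alpha & \text{if } a = b \\ -\beta & \text{if } a \neq b \end{cases}$$
where $\alpha$ is the match reward and $\beta$ is the mismatch penalty.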
9



Theorem. $C(i,j)$ satisfies the following relationships:
Initial conditions:
$$C(i,0) = -\gamma i, \qquad C(0,j) = -\gamma j$$
Recurrence relation: For $1 \le i \le n$, $1 \le j \le m$:
$$C(i,j) = \max \begin{cases} C(i-1,j-1) + w(S_i,T_j) \\ C(i-1,j) - \gamma \\ C(i,j-1) - \gamma \end{cases}$$
where $\gamma$ is the space penalty.
10
Justification
$S_1\ S_2\ \dots\ S_{i-1}\ S_i$
$T_1\ T_2\ \dots\ T_{j-1}\ T_j$
$C(i-1,j-1) + w(S_i,T_j)$

$S_1\ S_2\ \dots\ S_{i-1}\ S_i$
$T_1\ T_2\ \dots\ T_j\ \ -$
$C(i-1,j) - \gamma$

$S_1\ S_2\ \dots\ S_i\ \ -$
$T_1\ T_2\ \dots\ T_{j-1}\ T_j$
$C(i,j-1) - \gamma$
11
Example
Case 1: Line up $S_i$ with $T_j$
           i-1 i
S: C A T T C A C
T: C - T T C A G
           j-1 j

Case 2: Line up $S_i$ with space
           i-1   i
S: C A T T C A - C
T: C - T T C A G -
               j

Case 3: Line up $T_j$ with space
               i
S: C A T T C A C -
T: C - T T C A - G
           j-1   j
12
Computation Procedure
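Fill the table from $C(0,0)$ to $C(n,m)$. Each entry $C(i,j)$ depends only on $C(i-1,j-1)$, $C(i-1,j)$ and $C(i,j-1)$:
$$C(i,j) = \max\{\, C(i-1,j-1) + w(S_i,T_j),\ C(i-1,j) - \gamma,\ C(i,j-1) - \gamma \,\}$$
where $\gamma$ is the space penalty.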
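
The computation procedure can be sketched in Python as a row-by-row table fill. This is a hedged illustration rather than the slides' own code: `nw_table` and its parameter names are assumptions, and here mismatch and space are passed as negative scores (+10, -2, -5, the values used in the worked table that follows). The sequences are S = CATTCAC and T = CTCGCAGC, the pair from the first alignment example in the deck.

```python
def nw_table(s, t, match=10, mismatch=-2, space=-5):
    """Fill the global alignment table C, where C[i][j] is the best
    score for aligning the prefixes s[:i] and t[:j]."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    # Initial conditions: a prefix aligned against nothing costs one space per symbol.
    for i in range(1, n + 1):
        C[i][0] = i * space
    for j in range(1, m + 1):
        C[0][j] = j * space
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(
                C[i - 1][j - 1] + w,   # S_i lined up with T_j
                C[i - 1][j] + space,   # S_i lined up with a space
                C[i][j - 1] + space,   # T_j lined up with a space
            )
    return C


table = nw_table("CATTCAC", "CTCGCAGC")
print(table[-1][-1])  # 33
```

The bottom-right entry is the optimum global alignment score; the fill takes O(nm) time and O(nm) space.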
13
+10 for match, -2 for mismatch, -5 for space

          C    T    C    G    C    A    G    C
     0   -5  -10  -15  -20  -25  -30  -35  -40
C   -5   10    5
A  -10
T  -15
T  -20
C  -25
A  -30
C  -35

14
          C    T    C    G    C    A    G    C
     0   -5  -10  -15  -20  -25  -30  -35  -40
C   -5   10    5    0   -5  -10  -15  -20  -25
A  -10    5    8    3   -2   -7    0   -5  -10
T  -15    0   15   10    5    0   -5   -2   -7
T  -20   -5   10   13    8    3   -2   -7   -4
C  -25  -10    5   20   15   18   13    8    3
A  -30  -15    0   15   18   13   28   23   18
C  -35  -20   -5   10   13   28   23   26   33

Traceback can yield both optimum alignments
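
A traceback can be sketched as follows. This is an illustrative Python version under stated assumptions: the name `global_align` is hypothetical, scores are +10 / -2 / -5 as in the table, and ties are broken by trying the diagonal move first, then "S_i with space", then "T_j with space". A different tie-break order can return a different alignment with the same score.

```python
def global_align(s, t, match=10, mismatch=-2, space=-5):
    """Return (score, aligned_s, aligned_t) for one optimum global alignment."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        C[i][0] = i * space
    for j in range(1, m + 1):
        C[0][j] = j * space
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + w, C[i - 1][j] + space, C[i][j - 1] + space)

    # Walk back from C[n][m] to C[0][0], re-deriving which case produced each entry.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            w = match if s[i - 1] == t[j - 1] else mismatch
            if C[i][j] == C[i - 1][j - 1] + w:
                a.append(s[i - 1]); b.append(t[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and C[i][j] == C[i - 1][j] + space:
            a.append(s[i - 1]); b.append("-")
            i -= 1
        else:
            a.append("-"); b.append(t[j - 1])
            j -= 1
    return C[n][m], "".join(reversed(a)), "".join(reversed(b))


score, top, bottom = global_align("CATTCAC", "CTCGCAGC")
print(score)
print(top)
print(bottom)
```

With the diagonal-first tie-break, this run reproduces the alignment from the first slide of the deck (CAT-TCA-C over C-TCGCAGC), with score 33.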
15
End-gap free alignment
Gaps at the start or end of alignment
are not penalized
Match: +2 Mismatch and space: -1
Best global: Score = 1
Best end-gap free: Score = 9
16
Motivation: Shotgun assembly
Shotgun assembly produces a large set of
partially overlapping subsequences from
many copies of one unknown DNA sequence.
Problem: Use the overlapping sections to
paste the subsequences together.
Overlapping pairs will have a low global
alignment score, but a high end-gap free
score because of the overlap.
18
Algorithm
Same as global alignment, except:
Initialize with zeros (free gaps at start)
Locate max in the last row/column (free
gaps at end)
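
The two changes listed above can be sketched in Python as follows. The function name `end_gap_free_score` and the parameter names are assumptions made for illustration; the scores (+10, -2, -5) and the sequences S = CATTCAG, T = CTCGCAGC are the ones used in the worked table on the next slide.

```python
def end_gap_free_score(s, t, match=10, mismatch=-2, space=-5):
    """Best end-gap free alignment score of s and t."""
    n, m = len(s), len(t)
    # Row 0 and column 0 stay at zero: gaps at the start are free.
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + w, C[i - 1][j] + space, C[i][j - 1] + space)
    # Gaps at the end are free: take the best entry in the last row or last column.
    best_in_last_row = max(C[n])
    best_in_last_col = max(row[m] for row in C)
    return max(best_in_last_row, best_in_last_col)


print(end_gap_free_score("CATTCAG", "CTCGCAGC"))  # 38
```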
19
+10 for match, -2 for mismatch, -5 for gap

          C    T    C    G    C    A    G    C
     0    0    0    0    0    0    0    0    0
C    0   10    5   10    5   10    5    0   10
A    0    5    8    5    8    5   20   15   10
T    0    0   15   10    5    6   15   18   13
T    0   -2   10   13    8    3   10   13   16
C    0   10    5   20   15   18   13    8   23
A    0    5    8   15   18   13   28   23   18
G    0    0    3   10   25   20   23   38   33
20
Local Alignment: Motivation
Ignoring stretches of non-coding DNA:
Non-coding regions are more likely to be
subjected to mutations than coding regions.
Local alignment between two sequences is likely
to be between two exons.
Locating protein domains:
Proteins of different kinds and from different
species often exhibit local similarities
Local similarities may indicate functional
subunits.
21
Local alignment: Example
S = g g t c t g a g
T = a a a c g a
Match: +2 Mismatch and space: -1
Best local alignment (c t g a aligned with c - g a):
g g t c t g a g
a a a c - g a -
Score = 2 - 1 + 2 + 2 = 5
22
Local Alignment: Algorithm
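Initialize top row and leftmost column to
zero.
$$C[i,j] = \max \begin{cases} C[i-1,j-1] + \text{score}(s[i],t[j]) \\ C[i-1,j] - \gamma \\ C[i,j-1] - \gamma \\ 0 \end{cases}$$
where $\gamma$ is the space penalty.
$C[i,j]$ = Score of optimally aligning a
suffix of $s[1..i]$ with a suffix of $t[1..j]$.
The best local alignment score is the largest entry in the table.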
23
+1 for a match, -1 for a mismatch, -5 for a space

      C  T  C  G  C  A  G  C
   0  0  0  0  0  0  0  0  0
C  0  1  0  1  0  1  0  0  1
A  0  0  0  0  0  0  2  0  0
T  0  0  1  0  0  0  0  1  0
T  0  0  1  0  0  0  0  0  0
C  0  1  0  2  0  1  0  0  1
A  0  0  0  0  1  0  2  0  0
C  0  1  0  1  0  2  0  1  1
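
A score-only sketch of the local alignment recurrence, in Python. The name `local_alignment_score` is an assumption for illustration, and mismatch and space are passed as negative scores. The defaults (+1, -1, -5) follow the table above with S = CATTCAC and T = CTCGCAGC; the second call uses the scores from the earlier local alignment example (+2, -1, -1).

```python
def local_alignment_score(s, t, match=1, mismatch=-1, space=-5):
    """Largest entry of the local alignment table for s and t."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]  # top row and left column are zero
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(
                0,                     # start a fresh local alignment here
                C[i - 1][j - 1] + w,
                C[i - 1][j] + space,
                C[i][j - 1] + space,
            )
            best = max(best, C[i][j])
    return best


print(local_alignment_score("CATTCAC", "CTCGCAGC"))                # 2
print(local_alignment_score("ggtctgag", "aaacga", 2, -1, -1))      # 5
```

The zero option is what makes the alignment local: a negative running score is discarded instead of being carried forward.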
24
Some Results
Most pairwise sequence alignment problems
can be solved in O(mn) time.
Space requirement can be reduced to
O(m+n), while keeping the run time at O(mn)
[Myers88].
Highly similar sequences can be aligned in
O(dn) time, where d measures the distance
between the sequences [Landau86].
25
Reducing space requirements
O(mn) tables are often the limiting
factor in computing large alignments
There is a linear space technique that
only doubles the time required
[Hirschberg77]
26
IDEA: We only need the previous row to calculate the next

          C    T    C    G    C    A    G    C
     0    0    0    0    0    0    0    0    0
C    0   10    5   10    5   10    5    0   10
A    0    5    8    5    8    5   20   15   10
T
T
C
A
G
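
The "previous row only" idea can be sketched in Python for the end-gap free score. This is a minimal illustration, assuming the same scores as the table (+10, -2, -5); the function name `end_gap_free_score_two_rows` is hypothetical. Only two rows of length m+1 are kept, so space is O(m) while time stays O(nm). The price is that the score is recovered but not the alignment itself.

```python
def end_gap_free_score_two_rows(s, t, match=10, mismatch=-2, space=-5):
    """End-gap free score using two rows instead of the full table."""
    m = len(t)
    prev = [0] * (m + 1)          # row 0: free gaps at the start
    best_last_col = prev[m]       # running maximum over the last column
    for ch in s:
        curr = [0] * (m + 1)      # column 0 is zero as well
        for j in range(1, m + 1):
            w = match if ch == t[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + w, prev[j] + space, curr[j - 1] + space)
        best_last_col = max(best_last_col, curr[m])
        prev = curr               # the old row is no longer needed
    # prev now holds the last row; combine with the last-column maximum.
    return max(best_last_col, max(prev))


print(end_gap_free_score_two_rows("CATTCAG", "CTCGCAGC"))  # 38
```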
27
Linear-space Alignments
$$mn + \tfrac{1}{2}mn + \tfrac{1}{4}mn + \tfrac{1}{8}mn + \tfrac{1}{16}mn + \dots = 2mn$$
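
The sum above is the total work of a divide-and-conquer scheme in the style of [Hirschberg77]: split S in half, use two linear-space score passes to find where an optimum alignment crosses the middle row, then recurse on the two halves. The Python below is a hedged sketch of that idea, not the slides' own code; `last_row` and `linear_space_align` are illustrative names, and the scores (+10, -2, -5) are the ones from the global alignment example.

```python
def last_row(s, t, match=10, mismatch=-2, space=-5):
    """Last row of the global alignment table for s vs t, keeping two rows."""
    prev = [j * space for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * space] + [0] * len(t)
        for j in range(1, len(t) + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + w, prev[j] + space, curr[j - 1] + space)
        prev = curr
    return prev


def linear_space_align(s, t, match=10, mismatch=-2, space=-5):
    """Return one optimum global alignment as two gapped strings."""
    if not s:
        return "-" * len(t), t
    if not t:
        return s, "-" * len(s)
    if len(s) == 1:
        # Base case: pair the single symbol with its best partner in t,
        # unless leaving it unpaired (two extra spaces) scores higher.
        k = max(range(len(t)), key=lambda j: match if t[j] == s else mismatch)
        w = match if t[k] == s else mismatch
        if w >= 2 * space:
            return "-" * k + s + "-" * (len(t) - k - 1), t
        return s + "-" * len(t), "-" + t
    mid = len(s) // 2
    # Scores of the top half against every prefix of t ...
    left = last_row(s[:mid], t, match, mismatch, space)
    # ... and of the bottom half against every suffix of t (via reversal).
    right = last_row(s[mid:][::-1], t[::-1], match, mismatch, space)
    n = len(t)
    split = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    a1, b1 = linear_space_align(s[:mid], t[:split], match, mismatch, space)
    a2, b2 = linear_space_align(s[mid:], t[split:], match, mismatch, space)
    return a1 + a2, b1 + b2


top, bottom = linear_space_align("CATTCAC", "CTCGCAGC")
print(top)
print(bottom)
```

Each level of recursion fills about half as many table cells as the level above it, which gives the geometric series bounded by 2mn.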
28
Affine Gap Penalty Functions
Gap penalty = h + gk

where

k = length of gap
h = gap opening penalty
g = gap continuation penalty
Can also be solved
in O(nm) time
using dynamic
programming
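
One common way to reach O(nm) with affine gaps is to keep three tables, one per "last column type" (a formulation often attributed to Gotoh; the slides do not say which method they have in mind). The Python sketch below is an assumption-laden illustration: the name `affine_global_score`, the table names M, X, Y, and the default values h = 4, g = 1 are all hypothetical. A gap of length k costs h + gk, as defined above.

```python
NEG_INF = float("-inf")


def affine_global_score(s, t, match=10, mismatch=-2, h=4, g=1):
    """Global alignment score where a gap of length k costs h + g*k."""
    n, m = len(s), len(t)
    # M: last column pairs S_i with T_j
    # X: last column pairs S_i with a space
    # Y: last column pairs T_j with a space
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(h + g * i)
    for j in range(1, m + 1):
        Y[0][j] = -(h + g * j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = w + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs h + g; extending an existing one costs g.
            X[i][j] = max(M[i - 1][j] - h - g, Y[i - 1][j] - h - g, X[i - 1][j] - g)
            Y[i][j] = max(M[i][j - 1] - h - g, X[i][j - 1] - h - g, Y[i][j - 1] - g)
    return max(M[n][m], X[n][m], Y[n][m])


# One gap of length 2 (cost 4 + 2) beats two gaps of length 1 (cost 5 + 5).
print(affine_global_score("AAAA", "AA"))  # 14
```

Setting h = 0 reduces this to the constant per-space penalty used earlier, which gives a quick sanity check against the linear-gap table.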
