Lecture # 15 - New
Decide on: algorithm design techniques, etc.
Design an algorithm
Prove correctness
Read Chapter 8
Dynamic Programming for Solving Optimization Problems
(Edit Distance)
[Figure: edit script applied to MICHAEL JACKSON. Operations: S D S S S D D D D I. Cumulative distance: 0 1 2 3 4 5 6 7 8 9 10]
Department of Computer Science
What is Edit Distance?
What is the edit distance between FOOD and MONEY?
The edit distance between FOOD and MONEY is at most four:
FOOD -> MOOD -> MOND -> MONED -> MONEY
(substitute F with M, substitute O with N, insert E, substitute D with Y)
How to measure Edit Distance?
The first word has a gap for every insertion (I) and
the second word has a gap for every deletion (D).
Columns with two different characters correspond to
substitutions (S). Matches (M) do not count.
Alignment: a 2 x k matrix (k >= m, n)
S1 = SNOWY, n = 5
S2 = SUNNY, m = 5
- S N O W - Y
S U N - - N Y
Cost = 5
S - N O W Y
S U N N - Y
Cost = 3
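The cost of an alignment is the number of columns that are not matches. A minimal Python sketch of that count is shown below; the function name `alignment_cost` and the use of "-" as the gap symbol are assumptions made for illustration, not part of the lecture.

```python
def alignment_cost(top: str, bottom: str, gap: str = "-") -> int:
    """Count the columns of an alignment that are not matches."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    cost = 0
    for a, b in zip(top, bottom):
        if a == gap and b == gap:
            raise ValueError("a column cannot contain two gaps")
        if a != b:
            # gap in the top row    -> insertion (I)
            # gap in the bottom row -> deletion (D)
            # two different letters -> substitution (S)
            # each of these costs 1; matches cost 0
            cost += 1
    return cost


print(alignment_cost("-SNOW-Y", "SUN--NY"))  # 5
print(alignment_cost("S-NOWY", "SUNN-Y"))    # 3
```

The edit distance is the smallest cost over all possible alignments, so the second alignment shows that the distance between SNOWY and SUNNY is at most 3.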
Formulation of Edit Distance?
UP-SHOT
E(i, j) is the edit distance between the first i characters of the first string and the first j characters of the second string.
Keep (match):
E(6,5) is computed from E(5,4)
In this case: E(i, j) = E(i - 1, j - 1)
Insertion:
E(5,4) is computed from E(5,3)
In this case: E(i, j) = E(i, j - 1) + 1
Deletion:
E(5,4) is computed from E(4,4)
In this case: E(i, j) = E(i - 1, j) + 1
Substitution:
E(5,4) is computed from E(4,3)
In this case: E(i, j) = E(i - 1, j - 1) + 1
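The four cases can be combined into a single recurrence. The form below is a sketch under two assumptions that the slides do not state explicitly: the symbols s_i and t_j stand for the i-th character of the first string and the j-th character of the second, and the base cases are E(i, 0) = i and E(0, j) = j (delete or insert everything).

```latex
\[
E(i, j) = \min
\begin{cases}
E(i-1,\, j) + 1 & \text{deletion} \\
E(i,\, j-1) + 1 & \text{insertion} \\
E(i-1,\, j-1) + d(i, j) & \text{keep or substitution}
\end{cases}
\qquad
d(i, j) =
\begin{cases}
0 & \text{if } s_i = t_j \\
1 & \text{if } s_i \neq t_j
\end{cases}
\]
\[
E(i, 0) = i, \qquad E(0, j) = j
\]
```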
[Figure: edit distance table with rows B, E, E, T, H, partially filled in; each cell is annotated with the operation that produced it. K = Keep, I = Insert, D = Delete, S = Substitute]
         B  E  A  T  L  E  S
      0  1  2  3  4  5  6  7
   B  1  0  1  2  3  4  5  6
   E  2  1  0  1  2  3  4  5
   E  3  2  1  1  2  3  3  4
   T  4  3  2  2  1  2  3  4
   H  5  4  3  3  2  2  3  4
Path: Keep B (1), Keep E (2), Subst E->A (3), Keep T (4), Ins L (5), Ins E (6), Subst H->S (7)
Operations: Keep, Insert, Delete, Substitute
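The table for BEETH and BEATLES can be filled row by row from the three cases (deletion, insertion, keep or substitution), and one optimal path can be read off by walking back from the bottom-right cell. The following is a minimal sketch, not the lecture's own code: the names `edit_table` and `edit_script` are hypothetical, and the base cases E(i, 0) = i and E(0, j) = j are assumed from the first row and column of the table.

```python
def edit_table(source: str, target: str) -> list[list[int]]:
    """Fill the (len(source)+1) x (len(target)+1) table of values E(i, j)."""
    n, m = len(source), len(target)
    E = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        E[i][0] = i  # assumed base case: delete all i characters
    for j in range(1, m + 1):
        E[0][j] = j  # assumed base case: insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = 0 if source[i - 1] == target[j - 1] else 1
            E[i][j] = min(
                E[i - 1][j] + 1,         # deletion
                E[i][j - 1] + 1,         # insertion
                E[i - 1][j - 1] + diff,  # keep (diff = 0) or substitution
            )
    return E


def edit_script(source: str, target: str) -> list[str]:
    """Walk back from E(n, m) to E(0, 0) and collect one optimal script."""
    E = edit_table(source, target)
    i, j = len(source), len(target)
    script = []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diff = 0 if source[i - 1] == target[j - 1] else 1
            if E[i][j] == E[i - 1][j - 1] + diff:
                a, b = source[i - 1], target[j - 1]
                script.append(f"Keep {a}" if diff == 0 else f"Subst {a}->{b}")
                i, j = i - 1, j - 1
                continue
        if j > 0 and E[i][j] == E[i][j - 1] + 1:
            script.append(f"Ins {target[j - 1]}")
            j -= 1
        else:
            script.append(f"Del {source[i - 1]}")
            i -= 1
    script.reverse()
    return script


for row in edit_table("BEETH", "BEATLES"):
    print(row)
print(edit_script("BEETH", "BEATLES"))
```

The walk-back prefers the diagonal step when several predecessors are optimal; a different tie-breaking rule could return another script of the same cost.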
Recursive Algorithm for Edit Distance?
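A direct recursive version of the formulation might look like the sketch below. It is an illustration under stated assumptions rather than the lecture's algorithm: the function name `edit_distance` is hypothetical, the base cases E(i, 0) = i and E(0, j) = j are assumed, and memoization via `functools.lru_cache` is added so that each E(i, j) is computed only once (without it, the plain recursion recomputes the same subproblems many times).

```python
from functools import lru_cache


def edit_distance(source: str, target: str) -> int:
    """Edit distance computed top-down from the three recursive cases."""

    @lru_cache(maxsize=None)
    def E(i: int, j: int) -> int:
        if i == 0:
            return j  # assumed base case: insert the first j target characters
        if j == 0:
            return i  # assumed base case: delete the first i source characters
        diff = 0 if source[i - 1] == target[j - 1] else 1
        return min(
            E(i - 1, j) + 1,         # deletion
            E(i, j - 1) + 1,         # insertion
            E(i - 1, j - 1) + diff,  # keep (diff = 0) or substitution
        )

    return E(len(source), len(target))


print(edit_distance("SNOWY", "SUNNY"))    # 3
print(edit_distance("BEETH", "BEATLES"))  # 4
```

For long strings the recursion depth grows with len(source) + len(target), so a bottom-up table is usually the safer choice in Python.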
Application of Edit Distance
Edit-distance-like algorithms are used to compute a distance between
DNA sequences (strings over A, C, G, T) or protein sequences (strings over
an alphabet of 20 amino acids) for various purposes, e.g.:
- to find genes or proteins that may have shared functions or properties
- to infer family relationships and evolutionary trees over different
  organisms.
In biological applications, we often want to compare the DNA of two (or
more) different organisms.
Finding sequence similarities with genes of known function is a common
approach to inferring a newly sequenced gene’s function.
A normal growth gene switched on at the wrong time can cause cancer!