CS 466

Introduction to Bioinformatics
Lecture 2
Mohammed El-Kebir
August 30, 2024
Course Announcements
Instructor:
• Mohammed El-Kebir (melkebir)
• Office hours: Wednesdays, 3:30-4:30pm in Siebel 3216

Piazza / Gradescope:
• https://piazza.com/illinois/fall2024/cs466
• Gradescope entry code: ‘VDEE52’

TA:
• Nicole (nsdong2): Mondays 4-5pm in CS Tutoring Center (Siebel basement)
• Claire (czchou2): Tuesdays 2-3pm in CS Tutoring Center (Siebel basement)
• Mrinmoy (mroddur2): Thursdays 2-3pm in Siebel 2219

Outline
1. Change problem
2. Review of running time analysis
3. Edit distance
4. Review elementary graph theory
5. Manhattan Tourist problem
6. Longest/shortest paths in DAGs

Reading:
• Jones and Pevzner. Chapters 2.7-2.9 and 6.1-6.4
• Lecture notes
The Change Problem
Change Problem: Given amount $M \in \mathbb{N} \setminus \{0\}$ and coins $\mathbf{c} = (c_1, \ldots, c_n) \in \mathbb{N}^n$
s.t. $c_n = 1$ and $c_i \ge c_{i+1}$ for all $i \in [n-1] = \{1, \ldots, n-1\}$,
find $\mathbf{d} = (d_1, \ldots, d_n) \in \mathbb{N}^n$ s.t. (i) $M = \sum_{i=1}^{n} c_i d_i$ and (ii) $\sum_{i=1}^{n} d_i$ is minimum.

• Suppose we have $n = 3$ coins: $\mathbf{c} = (7 \text{ cent}, 3 \text{ cent}, 1 \text{ cent})$
• What is the minimum number of coins needed to make change for $M = 9$ cents?
• Answer: $(d_1, \ldots, d_n) = (1, 0, 2)$, thus $1 + 0 + 2 = 3$ coins.

The Change Problem – Four Algorithms
GreedyChange(M, c_1, …, c_n)
1. for i ← 1 to n
2.     d_i ← ⌊M / c_i⌋
3.     M ← M − d_i c_i

ExhaustiveChange(M, c_1, …, c_n)
1. for (d_1, …, d_n) ∈ {0, …, ⌊M / c_1⌋} × … × {0, …, ⌊M / c_n⌋}
2.     if Σ_{i=1}^{n} c_i d_i = M
3.         return (d_1, …, d_n)

RecursiveChange(M, c_1, …, c_n)
1. if M = 0
2.     return 0
3. bestNumCoins ← ∞
4. for i ← 1 to n
5.     if M ≥ c_i
6.         numCoins ← RecursiveChange(M − c_i, c_1, …, c_n)
7.         if numCoins + 1 < bestNumCoins
8.             bestNumCoins ← numCoins + 1
9. return bestNumCoins

DPChange(M, c_1, …, c_n)
1. for m ← 1 to M
2.     minNumCoins[m] ← ∞
3. for i ← 1 to n
4.     minNumCoins[c_i] ← 1
5. for m ← 1 to M
6.     for i ← 1 to n
7.         if m > c_i
8.             minNumCoins[m] ← min(1 + minNumCoins[m − c_i], minNumCoins[m])
9. return minNumCoins[M]

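The greedy and dynamic programming pseudocode above can be sketched in Python roughly as follows. This is a minimal sketch, not the lecture's implementation: the function names `greedy_change` and `dp_change` are illustrative, the DP uses the base case `min_num_coins[0] = 0` instead of initializing `minNumCoins[c_i]` to 1, and the coin set (4, 3, 1) in the usage note is a hypothetical example that does not appear on the slides.

```python
import math


def greedy_change(M, coins):
    """Greedy: take as many of each coin as fit, largest coin first.

    Assumes coins are in non-increasing order and the last coin is 1.
    Returns the list of coin counts d; the result may not be optimal.
    """
    d = []
    for c in coins:
        count = M // c
        d.append(count)
        M -= count * c
    return d


def dp_change(M, coins):
    """Dynamic programming: min_num_coins[m] is the fewest coins that make m."""
    min_num_coins = [0] + [math.inf] * M
    for m in range(1, M + 1):
        for c in coins:
            if m >= c:
                min_num_coins[m] = min(min_num_coins[m], 1 + min_num_coins[m - c])
    return min_num_coins[M]


# Slide example: c = (7, 3, 1) and M = 9.
print(greedy_change(9, [7, 3, 1]), dp_change(9, [7, 3, 1]))
```

With the hypothetical coins (4, 3, 1) and M = 6, `greedy_change` uses three coins (4 + 1 + 1) while `dp_change` finds two (3 + 3), which illustrates why the greedy technique is not correct in general.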
Four Different Algorithms

Technique                                     Correct?   Efficient?
Greedy algorithm [GreedyChange]               no         yes
Exhaustive enumeration [ExhaustiveChange]     yes        no
Recursive algorithm [RecursiveChange]         yes        no
Dynamic programming [DPChange]                yes        yes

Question: How to assess efficiency?

Running Time Analysis
• The running time of an algorithm $A$ for problem $\Pi$ is the maximum number of steps that $A$ will take on any instance $X$ of size $n = |X|$
• Asymptotic running time ignores constant factors using Big O notation

$f(n)$ is $O(g(n))$ provided there exist $c > 0$ and $n_0 \ge 0$ such that $f(n) \le c\,g(n)$ for all $n \ge n_0$.

[Figure: plot of f(n) and g(n)]

Running Time Analysis – Example
$f(n)$ is $O(g(n))$ provided there exist $c > 0$ and $n_0 \ge 0$ such that $f(n) \le c\,g(n)$ for all $n \ge n_0$.

𝑓 𝑛 = 10000 + 500𝑛! 𝑓 𝑛
𝑔 𝑛 = 𝑛"/2 1000 𝑔 𝑛

Pick $c = 1000$ and $n_0 = 3$. Then, $f(n) \le c\,g(n)$ for all $n \ge n_0$.

The Change Problem – Running Time Analysis
GreedyChange(M, c_1, …, c_n)
1. for i ← 1 to n
2.     d_i ← ⌊M / c_i⌋
3.     M ← M − d_i c_i

Number of operations:
• Line 2: 3 = $O(1)$
• Line 3: 3 = $O(1)$
• Total: $6n = O(n)$

DPChange(M, c_1, …, c_n)
1. for m ← 1 to M
2.     minNumCoins[m] ← ∞
3. for i ← 1 to n
4.     minNumCoins[c_i] ← 1
5. for m ← 1 to M
6.     for i ← 1 to n
7.         if m > c_i
8.             minNumCoins[m] ← min(1 + minNumCoins[m − c_i], minNumCoins[m])
9. return minNumCoins[M]

Number of operations:
• Lines 1-2: $O(M)$
• Lines 3-4: $O(n)$
• Lines 5-8: $O(Mn)$
• Total: $O(M) + O(n) + O(Mn) = O(Mn)$

Running Time Analysis – Guidelines
• $O(n^a) \subset O(n^b)$ for any positive constants $a < b$

• For any constants $a, b > 0$ and $c > 1$,
  $O(a) \subset O(\log n) \subset O(n^b) \subset O(c^n)$

• We can multiply to learn about other functions. For any constants $a, b > 0$ and $c > 1$,
  $O(an) = O(n) \subset O(n \log n) \subset O(n \cdot n^b) = O(n^{b+1}) \subset O(n c^n)$

• Base of the logarithm is a constant and can be ignored. For any constants $a, b > 1$,
  $O(\log_a n) = O(\log_b n / \log_b a) = O((1 / \log_b a) \log_b n) = O(\log_b n)$

Big Oh                             Name
$O(1)$                             Constant
$O(\log n)$                        Logarithmic
$O(n)$                             Linear
$O(n^2)$                           Quadratic
$O(n^b) = O(\mathrm{poly}(n))$     Polynomial
$O(2^{\mathrm{poly}(n)})$          Exponential

Running Time Analysis – More Examples
Question: What is $O\left(\binom{n}{k}\right)$?
• For constant $k > 0$ it holds that $\binom{n}{k} = O(n^k)$

• Recall that $n! = \prod_{i=1}^{n} i$. Question: What is $O(n!)$?

Stirling’s approximation: $n! \approx \sqrt{2\pi n} \left(\frac{n}{e}\right)^n = \sqrt{2\pi}\, \frac{\sqrt{n}}{e^n}\, n^n \overset{(*)}{=} O(n^n) = O(2^{n \log n})$
(*): $\sqrt{n} / \exp(n) < 1$ for all $n > 0$

Question: Is $n^n = O(n!)$?

Question: What is $O(\log(n!))$?

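One way to explore the last question numerically is to compare $\log_2(n!)$ with $n \log_2 n$. The sketch below is only an illustration: the helper name `log2_factorial` is hypothetical, and it relies on the standard identity $\ln(n!) = \ln \Gamma(n + 1)$ to avoid computing huge factorials.

```python
import math


def log2_factorial(n):
    """Return log2(n!) via the log-gamma function, since ln(n!) = lgamma(n + 1)."""
    return math.lgamma(n + 1) / math.log(2)


# Ratio of log2(n!) to n * log2(n) for growing n.
for n in (10, 100, 1000, 10000):
    ratio = log2_factorial(n) / (n * math.log2(n))
    print(n, round(ratio, 3))
```

The ratio stays below 1 and increases with n, which is consistent with the bound $n! = O(n^n)$ from Stirling's approximation and suggests that $\log(n!)$ grows like $n \log n$.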
Course Topic #1: Sequence Alignment
“Thus, although the FOXP2 protein is
extremely conserved among mammals, it
acquired two amino-acid changes on the
human lineage, at least one of which may have
functional consequences. This is an intriguing
finding, because FOXP2 is the first gene known
to be involved in the development of speech
and language.”
Nature (2002)

Question: How do we align sequences to identify similarities/differences?


Alignment
An alignment between two strings v (of m characters) and w (of n characters)
is a two-row matrix where the first row contains the characters of v in order,
the second row contains the characters of w in order, and spaces may be
interspersed throughout each.

Input:
v: KITTEN (m = 6)
w: SITTING (n = 7)

Output:
v: K - I T T E N -
w: S I - T T I N G

Question: Is this a good alignment?

Answer: Count the number of insertions, deletions, and substitutions.

Edit Distance [Levenshtein, 1966]
Elementary operations: insertions, deletions and substitutions of single characters

Edit Distance problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$, compute the
minimum number $d(\mathbf{v}, \mathbf{w})$ of elementary operations to transform $\mathbf{v}$ into $\mathbf{w}$.

$d(\text{cat}, \text{car}) = 1$     $d(\text{cat}, \text{ate}) = 2$     $d(\text{cat}, \text{are}) = 3$

Computing Edit Distance
Edit Distance problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$, compute the
minimum number $d(\mathbf{v}, \mathbf{w})$ of elementary operations to transform $\mathbf{v}$ into $\mathbf{w}$.

v: ATGTTAT...
w: AGCGTAC...

Each column of an alignment is a deletion, an insertion, a mismatch or a match.

Prefix of $\mathbf{v}$ of length $i$, $\mathbf{v}_i$: A T - G T T T
Prefix of $\mathbf{w}$ of length $j$, $\mathbf{w}_j$: A G C G T - C

Optimal substructure:
The edit distance of two prefixes is obtained from the edit distances of shorter prefixes.

Computing Edit Distance – Optimal Substructure
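$d[i, j]$ is the edit distance of $\mathbf{v}_i$ and $\mathbf{w}_j$,
where $\mathbf{v}_i$ is the prefix of $\mathbf{v}$ of length $i$ and $\mathbf{w}_j$ is the prefix of $\mathbf{w}$ of length $j$

Deletion: $d[i, j] = d[i-1, j] + 1$
Extend by a character in $\mathbf{v}$ (last column: $v_i$ over -)

Insertion: $d[i, j] = d[i, j-1] + 1$
Extend by a character in $\mathbf{w}$ (last column: - over $w_j$)

Mismatch: $d[i, j] = d[i-1, j-1] + 1$
Extend by a character in $\mathbf{v}$ and $\mathbf{w}$ (last column: $v_i$ over $w_j$, with $v_i \neq w_j$)

Match: $d[i, j] = d[i-1, j-1]$
Extend by a character in $\mathbf{v}$ and $\mathbf{w}$ (last column: $v_i$ over $w_j$, with $v_i = w_j$)
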
Computing Edit Distance – Recurrence
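$d[i, j]$ is the edit distance of $\mathbf{v}_i$ and $\mathbf{w}_j$,
where $\mathbf{v}_i$ is the prefix of $\mathbf{v}$ of length $i$ and $\mathbf{w}_j$ is the prefix of $\mathbf{w}$ of length $j$

$$
d[i, j] = \min
\begin{cases}
0, & \text{if } i = 0 \text{ and } j = 0,\\
d[i-1, j] + 1, & \text{if } i > 0,\\
d[i, j-1] + 1, & \text{if } j > 0,\\
d[i-1, j-1] + 1, & \text{if } i > 0,\ j > 0 \text{ and } v_i \neq w_j,\\
d[i-1, j-1], & \text{if } i > 0,\ j > 0 \text{ and } v_i = w_j.
\end{cases}
$$
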
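The recurrence translates fairly directly into a table-filling routine. The following is a minimal Python sketch under the assumption that strings are 0-indexed, so the character $v_i$ of the recurrence is `v[i - 1]`; the function name `edit_distance` is illustrative and not taken from the lecture.

```python
def edit_distance(v, w):
    """Fill d[i][j] = edit distance between the prefixes v[:i] and w[:j]."""
    m, n = len(v), len(w)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue  # base case: d[0][0] = 0
            candidates = []
            if i > 0:
                candidates.append(d[i - 1][j] + 1)  # deletion
            if j > 0:
                candidates.append(d[i][j - 1] + 1)  # insertion
            if i > 0 and j > 0:
                cost = 0 if v[i - 1] == w[j - 1] else 1  # match or mismatch
                candidates.append(d[i - 1][j - 1] + cost)
            d[i][j] = min(candidates)
    return d[m][n]


print(edit_distance("cat", "car"), edit_distance("cat", "ate"), edit_distance("cat", "are"))
```

Each case of the recurrence appears as one candidate, guarded by the same conditions on $i$ and $j$, so the first row and first column need no separate initialization.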
Computing Edit Distance – Dynamic Programming
The table of values $d[i, j]$ is filled in entry by entry for v = ATGT (rows, $i = 0, \ldots, 4$) and w = ATCG (columns, $j = 0, \ldots, 4$) using the recurrence

$$
d[i, j] = \min
\begin{cases}
0, & \text{if } i = 0 \text{ and } j = 0,\\
d[i-1, j] + 1, & \text{if } i > 0,\\
d[i, j-1] + 1, & \text{if } j > 0,\\
d[i-1, j-1] + 1, & \text{if } i > 0,\ j > 0 \text{ and } v_i \neq w_j,\\
d[i-1, j-1], & \text{if } i > 0,\ j > 0 \text{ and } v_i = w_j.
\end{cases}
$$

Each entry $d[i, j]$ depends on three neighboring entries:
• $d[i-1, j-1]$ (diagonal): cost 0 for a match or 1 for a mismatch
• $d[i-1, j]$ (vertical): cost 1 for a deletion
• $d[i, j-1]$ (horizontal): cost 1 for an insertion

Completed table:

          w:       A   T   C   G
          j:   0   1   2   3   4
v    i
     0         0   1   2   3   4
A    1         1   0   1   2   3
T    2         2   1   0   1   2
G    3         3   2   1   1   1
T    4         4   3   2   2   2

Two optimal alignments with $d[4, 4] = 2$ edits, found by tracing back from entry $(4, 4)$ to entry $(0, 0)$:

v: A T - G T        v: A T G T
w: A T C G -        w: A T C G

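The traceback that recovers an alignment from the completed table can be sketched as follows. This is an illustrative sketch, not the lecture's code: the names `edit_table` and `traceback` are hypothetical, and the tie-breaking order (diagonal first, then deletion, then insertion) is an assumption, so only one of possibly several optimal alignments is returned.

```python
def edit_table(v, w):
    """Return the full table d, where d[i][j] is the edit distance of v[:i] and w[:j]."""
    m, n = len(v), len(w)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # i deletions
    for j in range(1, n + 1):
        d[0][j] = j  # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if v[i - 1] == w[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d


def traceback(v, w, d):
    """Walk from (m, n) back to (0, 0), emitting one alignment column per step."""
    i, j = len(v), len(w)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if v[i - 1] == w[j - 1] else 1
            if d[i][j] == d[i - 1][j - 1] + cost:  # match or mismatch
                top.append(v[i - 1])
                bottom.append(w[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and d[i][j] == d[i - 1][j] + 1:  # deletion
            top.append(v[i - 1])
            bottom.append("-")
            i -= 1
        else:  # insertion
            top.append("-")
            bottom.append(w[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


print(traceback("ATGT", "ATCG", edit_table("ATGT", "ATCG")))  # ('ATGT', 'ATCG')
```

With this tie-breaking, the example returns the gap-free alignment with two mismatches; preferring deletions or insertions over the diagonal would return a different alignment with the same number of edits.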
Computing Edit Distance – Running Time
For each of the $(m + 1) \times (n + 1)$ entries:
• 3 addition operations
• 1 comparison operation
• 1 minimum operation
Running time: $O(mn)$

Computing Edit Distance – Your turn!
Complete the table $d[i, j]$ for v = CAT (rows, $i = 0, \ldots, 3$) against each of w = CAR, w = ATE and w = ARE (columns, $j = 0, \ldots, 3$), using the recurrence for $d[i, j]$.

$d(\text{cat}, \text{car}) = \ ?$     $d(\text{cat}, \text{ate}) = \ ?$     $d(\text{cat}, \text{are}) = \ ?$


Change Problem and Edit Distance
Change problem: Make M cents using the minimum number of 1, 3 and 5 cent coins.

Value          1   2   3   4   5   6   7
Min # coins    1   2   1   2   1   2   3

Edit distance: table $d[i, j]$ for v = ATGT (rows) and w = ATCG (columns).

          w:       A   T   C   G
          j:   0   1   2   3   4
v    i
     0         0   1   2   3   4
A    1         1   0   1   2   3
T    2         2   1   0   1   2
G    3         3   2   1   1   1
T    4         4   3   2   2   2

• Both have optimal substructure and can be solved using dynamic programming
• These are examples of a more general problem!

Review of Graph Theory
• Graph $G = (V, E)$
• Vertices $V = \{v_1, \ldots, v_n\}$
• Edges $E = \{(v_i, v_j), \ldots\}$

[Figure: graph whose vertices are Chicago, Bloomington, Champaign-Urbana, Indianapolis and St. Louis]

Review of Graph Theory
• Directed graph $G = (V, E)$
• Vertices $V = \{v_1, \ldots, v_n\}$
• Directed edges $E = \{(v_i, v_j), \ldots\}$
• Path is a sequence of vertices and edges that connect them
• Edges can be weighted

[Figure: directed graph on Chicago, Bloomington, Champaign-Urbana, Indianapolis and St. Louis, with edge weights 130, 140, 50, 180, 170 and 150]

Manhattan Tourist Problem

A tourist in Manhattan wants to visit the maximum number of attractions (*) by
traveling on a path (only eastward and southward) from start to end.

[Figure: street grid from Begin to End with attractions (*) on the streets]

Manhattan Tourist Problem
A tourist in Manhattan wants to visit the maximum number of attractions (*) by
traveling on a path (only eastward and southward) from start to end.

There may be more than one attraction on a street. Add weights!

[Figure: street grid from Begin to End with each street weighted by its number of attractions]

Manhattan Tourist Problem
Manhattan Tourist Problem (MTP): Given a weighted, directed grid graph G
with two vertices “begin” and “end”, find the maximum weight path in
G from “begin” to “end”.

[Figure: weighted 5 × 5 grid graph with i coordinate (rows 0 to 4) and j coordinate (columns 0 to 4); vertex scores shown from begin to end include 0, 3, 5, 9, 13, 15, 19, 20 and 23]

Manhattan Tourist Problem – Exhaustive Algorithm
Check all paths.

Question: How many paths?

[Figure: weighted 5 × 5 grid graph from begin to end, with i coordinate (rows 0 to 4) and j coordinate (columns 0 to 4)]

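To get a feel for the question, one can count paths directly. The sketch below assumes a complete grid in which every begin-to-end path takes exactly `rows` southward and `cols` eastward steps (4 and 4 for the 5 × 5 grid of vertices); the function names are hypothetical.

```python
import math


def count_grid_paths(rows, cols):
    """Choose which of the rows + cols steps go south: C(rows + cols, rows)."""
    return math.comb(rows + cols, rows)


def count_grid_paths_dp(rows, cols):
    """Same count by DP: a path into (i, j) arrives from (i - 1, j) or (i, j - 1)."""
    paths = [[1] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            paths[i][j] = paths[i - 1][j] + paths[i][j - 1]
    return paths[rows][cols]


print(count_grid_paths(4, 4))  # 70
```

The count grows quickly with the grid size (for example, `count_grid_paths(20, 20)` already exceeds 10^11), which is why checking all paths is not efficient.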
Manhattan Tourist Problem – Greedy Algorithm
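[Figure: weighted 4 × 4 grid graph from begin to end. Horizontal street weights, row by row: (1, 2, 5), (2, 1, 5), (2, 3, 4), (0, 0, 0). Vertical street weights, row by row: (5, 3, 10, 5), (3, 5, 3, 1), (0, 0, 5, 2).]

• Greedy path: promising start, but leads to bad choices! It reaches end with score 18.
• Better path: reaches end with score 22.
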
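The greedy behavior on this grid can be reproduced with a short sketch. The encoding is an assumption made for illustration: `right[i][j]` is the weight of the street from (i, j) to (i, j + 1), `down[i][j]` is the weight of the street from (i, j) to (i + 1, j), and ties are broken in favor of going south. The function name `greedy_tourist` is hypothetical.

```python
def greedy_tourist(down, right):
    """At every intersection take the heavier outgoing street (ties go south)."""
    m, n = len(down) + 1, len(right[0]) + 1  # numbers of vertex rows and columns
    i = j = score = 0
    while i < m - 1 or j < n - 1:
        go_down = down[i][j] if i < m - 1 else float("-inf")
        go_right = right[i][j] if j < n - 1 else float("-inf")
        if go_down >= go_right:
            score += go_down
            i += 1
        else:
            score += go_right
            j += 1
    return score


# Street weights read off the slide's 4 x 4 grid.
right = [[1, 2, 5], [2, 1, 5], [2, 3, 4], [0, 0, 0]]
down = [[5, 3, 10, 5], [3, 5, 3, 1], [0, 0, 5, 2]]
print(greedy_tourist(down, right))  # 18
```

The greedy walk commits to the weight-5 street at the start and never reaches the weight-10 street, ending at 18 instead of the best score 22.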
Manhattan Tourist Problem – Optimal Substructure
$s[i, j]$ is the best score for a path to coordinate $(i, j)$

$$
s[i, j] = \max
\begin{cases}
0, & \text{if } i = 0 \text{ and } j = 0,\\
s[i-1, j] + w[(i-1, j), (i, j)], & \text{if } i > 0,\\
s[i, j-1] + w[(i, j-1), (i, j)], & \text{if } j > 0.
\end{cases}
$$

• $w[(i-1, j), (i, j)]$: weight of the street between $(i-1, j)$ and $(i, j)$
• $w[(i, j-1), (i, j)]$: weight of the street between $(i, j-1)$ and $(i, j)$

[Figure: weighted 4 × 4 grid graph from begin to end, annotated “best score to this point” (20 and 18) and “best score to end” (22)]

MTP – Solving Recurrence using Dynamic Programming
$s[i, j]$ is the best score for a path to coordinate $(i, j)$; the source is $(0, 0)$.

$$
s[i, j] = \max
\begin{cases}
0, & \text{if } i = 0 \text{ and } j = 0,\\
s[i-1, j] + w[(i-1, j), (i, j)], & \text{if } i > 0,\\
s[i, j-1] + w[(i, j-1), (i, j)], & \text{if } j > 0.
\end{cases}
$$

• $w[(i-1, j), (i, j)]$: weight of the street between $(i-1, j)$ and $(i, j)$
• $w[(i, j-1), (i, j)]$: weight of the street between $(i, j-1)$ and $(i, j)$

Street weights of the 4 × 4 example grid:
• Horizontal streets, row by row: (1, 2, 5), (2, 1, 5), (2, 3, 4), (0, 0, 0)
• Vertical streets, row by row: (5, 3, 10, 5), (3, 5, 3, 1), (0, 0, 5, 2)

Scores $s[i, j]$, computed one anti-diagonal at a time starting from the source:

 i \ j    0    1    2    3
   0      0    1    3    8
   1      5    7   13   18
   2      8   12   16   20
   3      8   12   21   22

Let $m$ be the number of rows and $n$ be the number of columns.
Running time: $O(mn)$
Question: Implementation?
$s[3, 3] = 22$

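One possible answer to the implementation question is sketched below in Python. The encoding of the grid is an assumption made for illustration: `right[i][j]` is the weight of the street from (i, j) to (i, j + 1) and `down[i][j]` is the weight of the street from (i, j) to (i + 1, j). The function name `manhattan_tourist` is hypothetical, and the table is filled row by row rather than by anti-diagonals, which respects the same dependencies.

```python
def manhattan_tourist(down, right):
    """Return the table s, where s[i][j] is the best score of a path from (0, 0) to (i, j)."""
    m = len(right)          # number of vertex rows
    n = len(right[0]) + 1   # number of vertex columns
    s = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue  # base case: s[0][0] = 0
            best = float("-inf")
            if i > 0:
                best = max(best, s[i - 1][j] + down[i - 1][j])    # arrive from the north
            if j > 0:
                best = max(best, s[i][j - 1] + right[i][j - 1])   # arrive from the west
            s[i][j] = best
    return s


# Street weights read off the slide's 4 x 4 grid.
right = [[1, 2, 5], [2, 1, 5], [2, 3, 4], [0, 0, 0]]
down = [[5, 3, 10, 5], [3, 5, 3, 1], [0, 0, 5, 2]]
s = manhattan_tourist(down, right)
print(s[3][3])  # 22
```

Every entry is computed once from at most two previously computed entries, which matches the $O(mn)$ running time stated above.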
Manhattan Is Not a Perfect Grid
What about diagonals?

[Figure: vertex B with incoming edges from vertices A1, A2 and A3]

$$
s[B] = \max
\begin{cases}
s[A_1] + w[A_1, B],\\
s[A_2] + w[A_2, B],\\
s[A_3] + w[A_3, B].
\end{cases}
$$

Manhattan Is Not a Perfect Grid, It’s a Directed Graph

$G = (V, E)$ is a directed acyclic graph (DAG) with nonnegative edge weights $w : E \to \mathbb{R}_{\ge 0}$.

$$
s[0, 0] = 0
$$
$$
s[i, j] = \max_{(i', j') \in \mathrm{pred}(i, j)} \{ s[i', j'] + w[(i', j'), (i, j)] \}
$$

Each edge is evaluated once: $O(|E|)$ time.

[Figure: vertex (i, j) with incoming edges from its predecessors pred(i, j)]

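For a general DAG, the same recurrence can be evaluated by visiting vertices in a topological order. The sketch below makes several assumptions that are not from the lecture: the graph is given as a vertex list plus `(u, v, weight)` triples, the topological order is computed with Kahn's algorithm, and scores are pushed forward along outgoing edges, which yields the same maximum over predecessors. The name `longest_path_dag` and the example weights are hypothetical.

```python
from collections import defaultdict, deque


def longest_path_dag(vertices, edges, source):
    """Return s, where s[v] is the maximum weight of a path from source to v."""
    succ = defaultdict(list)
    indegree = {v: 0 for v in vertices}
    for u, v, weight in edges:
        succ[u].append((v, weight))
        indegree[v] += 1

    # Kahn's algorithm: repeatedly remove vertices with no remaining incoming edges.
    queue = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(vertices):
        raise ValueError("graph contains a directed cycle")

    s = {v: float("-inf") for v in vertices}  # -inf marks vertices not reachable from source
    s[source] = 0
    for u in order:
        if s[u] == float("-inf"):
            continue
        for v, weight in succ[u]:  # each edge is evaluated exactly once
            s[v] = max(s[v], s[u] + weight)
    return s


# Hypothetical weights for a small DAG with three predecessors of B.
edges = [("A1", "A2", 1), ("A2", "A3", 2), ("A1", "B", 2), ("A2", "B", 4), ("A3", "B", 1)]
print(longest_path_dag(["A1", "A2", "A3", "B"], edges, "A1")["B"])  # 5
```

Processing vertices in topological order guarantees that `s[u]` is final before any edge leaving `u` is used, which is what makes a single pass over the edges sufficient.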
Dynamic Programming as a Graph Problem
Manhattan Tourist Problem:
Every path in the directed graph is a possible tourist path. Find the maximum weight path.
Running time: $O(mn) = O(|E|)$

[Figure: street grid from Begin to End with attractions (*) on the streets]

Change Problem: Make M cents using the minimum number of coins $\mathbf{c} = (5, 3, 1)$.
Every path in the directed graph is a possible change. Find the shortest path.
Running time: $O(Mn) = O(|E|)$

What About the Edit Distance Problem?
Edit Distance problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$,
compute the minimum number $d(\mathbf{v}, \mathbf{w})$ of elementary operations
to transform $\mathbf{v}$ into $\mathbf{w}$.

Edit graph is a weighted, directed grid graph $G = (V, E)$ with source vertex
$(0, 0)$ and target vertex $(m, n)$. Each edge has a weight corresponding
to its edit cost: deletion (1), insertion (1), mismatch (1) and match (0).

Alignment is a path from $(0, 0)$ to $(m, n)$.

Edit Distance problem (graph formulation): Given edit graph $G = (V, E)$ with edge
weights $c : E \to \{0, 1\}$, find a shortest path from $(0, 0)$ to $(m, n)$.

[Figure: grid of vertices (i, j) for v = ATGT (rows, i = 0 to 4) and w = ATCG (columns, j = 0 to 4); vertical edges are deletions, horizontal edges are insertions, diagonal edges are mismatches or matches]

Shortest Path vs Longest Path
• Change graph, edit graph and the MTP grid are directed graphs G.

• Change problem and Edit Distance problem are minimization problems.


• Find shortest path in G from source to sink.

• Manhattan Tourist problem is a maximization problem.


• Find longest path in G from source to sink.

Shortest Path vs Longest Path
• Shortest path in directed graphs can be found efficiently (Dijkstra, Bellman-Ford, Floyd-Warshall algorithms).
• Longest path in general directed graphs is NP-hard, so no efficient algorithm is known.

• Change graph, edit graph and MTP grid graph are directed acyclic graphs (DAGs).
    • No directed cycles.

• Longest path problem in a DAG can be solved efficiently by dynamic programming.

Question: What’s the relation between absence of directed cycles and optimal substructure?

Weighted Edit Distance
$d[i, j]$ is the edit distance of $\mathbf{v}_i$ and $\mathbf{w}_j$,
where $\mathbf{v}_i$ is the prefix of $\mathbf{v}$ of length $i$ and $\mathbf{w}_j$ is the prefix of $\mathbf{w}$ of length $j$

$$
d[i, j] = \min
\begin{cases}
d[i-1, j] + 1, & \text{(deletion)}\\
d[i, j-1] + 1, & \text{(insertion)}\\
d[i-1, j-1] + 1, & \text{if } v_i \neq w_j \text{ (mismatch)},\\
d[i-1, j-1], & \text{if } v_i = w_j \text{ (match)}.
\end{cases}
$$

Replace +1 with different penalties for different types of edits.

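A weighted variant can be sketched by turning the three +1 terms into parameters. The parameter names `delete`, `insert` and `mismatch` and the penalty values in the usage line are hypothetical, chosen only to illustrate the idea; with the default penalties of 1 the function reduces to the unweighted edit distance.

```python
def weighted_edit_distance(v, w, delete=1, insert=1, mismatch=1):
    """Edit distance where each type of edit carries its own penalty."""
    m, n = len(v), len(w)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + delete   # delete the first i characters of v
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + insert   # insert the first j characters of w
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diagonal = d[i - 1][j - 1] + (0 if v[i - 1] == w[j - 1] else mismatch)
            d[i][j] = min(d[i - 1][j] + delete, d[i][j - 1] + insert, diagonal)
    return d[m][n]


# With an expensive mismatch, deleting "t" and inserting "r" is cheaper than substituting.
print(weighted_edit_distance("cat", "car", delete=2, insert=2, mismatch=5))  # 4
```

Changing the penalties can change which alignment is optimal, not just its score, as the usage line shows.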
Summary
1. Change problem
2. Review of running time analysis
3. Edit distance
4. Review elementary graph theory
5. Manhattan Tourist problem
6. Longest/shortest paths in DAGs

Reading:
• Jones and Pevzner. Chapters 2.7-2.9 and 6.1-6.4
• Lecture notes
Sources
• CS 362 by Layla Oesper (Carleton College)
• CS 1810 by Ben Raphael (Brown/Princeton University)
• An Introduction to Bioinformatics Algorithms book (Jones and Pevzner)
• http://bioalgorithms.info/
