Lecture 2
Introduction to Bioinformatics
Mohammed El-Kebir
August 30, 2024
Course Announcements
Instructor:
• Mohammed El-Kebir (melkebir)
• Office hours: Wednesdays, 3:30-4:30pm in Siebel 3216
Piazza / Gradescope:
• https://piazza.com/illinois/fall2024/cs466
• Gradescope entry code: ‘VDEE52’
TA:
• Nicole (nsdong2): Mondays 4-5pm in CS Tutoring Center (Siebel basement)
• Claire (czchou2): Tuesdays 2-3pm in CS Tutoring Center (Siebel basement)
• Mrinmoy (mroddur2): Thursdays 2-3pm in Siebel 2219
Outline
1. Change problem
2. Review of running time analysis
3. Edit distance
4. Review elementary graph theory
5. Manhattan Tourist problem
6. Longest/shortest paths in DAGs
Reading:
• Jones and Pevzner. Chapters 2.7-2.9 and 6.1-6.4
• Lecture notes
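The Change Problem
Change Problem: Given amount $M \in \mathbb{N} \setminus \{0\}$ and coins $\mathbf{c} = (c_1, \dots, c_n) \in \mathbb{N}^n$
s.t. $c_n = 1$ and $c_i \ge c_{i+1}$ for all $i \in [n-1] = \{1, \dots, n-1\}$,
find $\mathbf{d} = (d_1, \dots, d_n) \in \mathbb{N}^n$ s.t. (i) $M = \sum_{i=1}^{n} c_i d_i$ and (ii) $\sum_{i=1}^{n} d_i$ is minimum
Example: $\mathbf{c} = (7 \text{ cent}, 3 \text{ cent}, 1 \text{ cent})$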
Running Time Analysis – Example
$f(n)$ is $O(g(n))$ provided there exist $c > 0$ and $n_0 \ge 0$ such that
$f(n) \le c \, g(n)$ for all $n \ge n_0$
$f(n) = 10000 + 500 n^2$
$g(n) = n^3 / 2$
[Figure: plot of $f(n)$ and $g(n)$, with the value 1000 marked on the axis]
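The definition can be sanity-checked numerically. The sketch below assumes the example functions are $f(n) = 10000 + 500 n^2$ and $g(n) = n^3 / 2$; the witnesses $c = 1$ and $n_0 = 1001$ are illustrative choices, not values stated in the lecture. A finite check is not a proof; the bound holds for all larger $n$ because $n^3 - 1000 n^2 = n^2 (n - 1000)$ keeps growing.

```python
def f(n):
    return 10000 + 500 * n**2


def two_g(n):
    # 2 * g(n) for g(n) = n^3 / 2, kept as an integer to avoid rounding.
    return n**3


def holds(n, c=1):
    # f(n) <= c * g(n), rewritten as 2 * f(n) <= c * (2 * g(n)).
    return 2 * f(n) <= c * two_g(n)


n0 = 1001  # assumed witness; g has not yet overtaken f at n = 1000
print(holds(1000), all(holds(n) for n in range(n0, 20000)))
```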
The Change Problem – Running Time Analysis
GreedyChange($M, c_1, \dots, c_n$)
1. for $i \leftarrow 1$ to $n$
2.     $d_i \leftarrow \lfloor M / c_i \rfloor$
3.     $M \leftarrow M - d_i c_i$
Number of operations:
• Line 2: 3 = $O(1)$
• Line 3: 3 = $O(1)$
• Total: $6n = O(n)$
DPChange($M, c_1, \dots, c_n$)
1. for $m \leftarrow 1$ to $M$
2.     minNumCoins[$m$] $\leftarrow \infty$
3. for $i \leftarrow 1$ to $n$
4.     minNumCoins[$c_i$] $\leftarrow 1$
5. for $m \leftarrow 1$ to $M$
6.     for $i \leftarrow 1$ to $n$
7.         if $m > c_i$
8.             minNumCoins[$m$] $\leftarrow \min(1 +$ minNumCoins[$m - c_i$], minNumCoins[$m$])
9. return minNumCoins[$M$]
Number of operations:
• Lines 1-2: $O(M)$
• Lines 3-4: $O(n)$
• Lines 5-8: $O(Mn)$
• Total: $O(M) + O(n) + O(Mn) = O(Mn)$
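A minimal Python sketch of the two procedures, for comparison. It is not a line-by-line transcription: the dynamic program uses a base case `min_num_coins[0] = 0` in place of lines 3-4, which is an assumption made here so that a coin larger than $M$ never indexes past the table. The coin set (4, 3, 1) is a hypothetical example chosen because the greedy choice is not optimal for it.

```python
import math


def greedy_change(M, coins):
    """Take as many of each coin as fits, largest coin first. Returns d."""
    d = []
    for c in coins:
        d.append(M // c)
        M -= d[-1] * c
    return d


def dp_change(M, coins):
    """Minimum number of coins summing to M, in O(M * n) time."""
    min_num_coins = [0] + [math.inf] * M  # base case: amount 0 needs 0 coins
    for m in range(1, M + 1):
        for c in coins:
            if m >= c:
                min_num_coins[m] = min(min_num_coins[m], 1 + min_num_coins[m - c])
    return min_num_coins[M]


coins = (4, 3, 1)  # hypothetical coin set, sorted in decreasing order
print(sum(greedy_change(6, coins)), dp_change(6, coins))
```

For amount 6 the greedy procedure picks 4 + 1 + 1, while the table finds 3 + 3.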
Running Time Analysis – Guidelines
• $O(n^a) \subset O(n^b)$ for any positive constants $a < b$
• For any constants $a, b > 0$ and $c > 1$,
  $O(a) \subset O(\log n) \subset O(n^b) \subset O(c^n)$
• We can multiply to learn about other functions. For any constants $a, b > 0$ and $c > 1$, multiplying the chain above by $n$ gives, for example,
  $O(an) \subset O(n \log n) \subset O(n^{b+1}) \subset O(n c^n)$
• Base of the logarithm is a constant and can be ignored. For any constants $a, b > 1$,
  $O(\log_a n) = O(\log_b n / \log_b a) = O((1 / \log_b a) \log_b n) = O(\log_b n)$

Big Oh                               Name
$O(1)$                               Constant
$O(\log n)$                          Logarithmic
$O(n)$                               Linear
$O(n^2)$                             Quadratic
$O(n^k) = O(\mathrm{poly}(n))$       Polynomial
$O(2^{\mathrm{poly}(n)})$            Exponential
Running Time Analysis – More Examples
Question: What is $O\left(\binom{n}{k}\right)$?
• For constant $k > 0$ it holds that $\binom{n}{k} = O(n^k)$
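One way to see this bound, written out as a short sketch (the intermediate steps are standard and are not taken from the slides; $k$ is assumed to be a positive integer constant):

```latex
\binom{n}{k}
  = \frac{n (n-1) \cdots (n-k+1)}{k!}
  \le \frac{n^k}{k!}
  \le n^k
\quad\Longrightarrow\quad
\binom{n}{k} = O(n^k) \text{ with witnesses } c = 1,\ n_0 = 0 .
```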
Question: Is 𝑛" = 𝑂 𝑛! ?
Input:
v: KITTEN (m = 6)
w: SITTING (n = 7)
Output:
v: K - I T T E N -
w: S I - T T I N G
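Computing Edit Distance
Edit Distance problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$, compute the
minimum number $d(\mathbf{v}, \mathbf{w})$ of elementary operations to transform $\mathbf{v}$ into $\mathbf{w}$.
w: AGCGTAC...
[Figure: alignment of two prefixes, with positions $i-1$, $i$ marked on the $\mathbf{v}$ row and $j-1$, $j$ on the $\mathbf{w}$ row]
prefix of $\mathbf{v}$ of length $i$, $\mathbf{v}_i$: A T - G T T T
prefix of $\mathbf{w}$ of length $j$, $\mathbf{w}_j$: A G C G T - C
Optimal substructure:
The edit distance is obtained from the edit distances of prefixes of the strings.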
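Computing Edit Distance – Optimal Substructure
$d[i, j]$ is the edit distance of $\mathbf{v}_i$ and $\mathbf{w}_j$,
where $\mathbf{v}_i$ is the prefix of $\mathbf{v}$ of length $i$ and $\mathbf{w}_j$ is the prefix of $\mathbf{w}$ of length $j$
Deletion: $d[i, j] = d[i-1, j] + 1$. Extend by a character in $\mathbf{v}$ (column: $v_i$ over -)
Insertion: $d[i, j] = d[i, j-1] + 1$. Extend by a character in $\mathbf{w}$ (column: - over $w_j$)
Mismatch: $d[i, j] = d[i-1, j-1] + 1$. Extend by a character in $\mathbf{v}$ and $\mathbf{w}$ (column: $v_i$ over $w_j$, $v_i \ne w_j$)
Match: $d[i, j] = d[i-1, j-1]$. Extend by a character in $\mathbf{v}$ and $\mathbf{w}$ (column: $v_i$ over $w_j$, $v_i = w_j$)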
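Computing Edit Distance – Recurrence
$d[i, j]$ is the edit distance of $\mathbf{v}_i$ and $\mathbf{w}_j$,
where $\mathbf{v}_i$ is the prefix of $\mathbf{v}$ of length $i$ and $\mathbf{w}_j$ is the prefix of $\mathbf{w}$ of length $j$
$$d[i, j] = \min \begin{cases}
d[i-1, j] + 1, & \text{(deletion: } v_i \text{ over -)} \\
d[i, j-1] + 1, & \text{(insertion: - over } w_j \text{)} \\
d[i-1, j-1] + 1, & \text{if } v_i \ne w_j \text{ (mismatch)}, \\
d[i-1, j-1], & \text{if } v_i = w_j \text{ (match)}.
\end{cases}$$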
Computing Edit Distance – Recurrence
$d[i, j]$ is the edit distance of $\mathbf{v}_i$ and $\mathbf{w}_j$,
where $\mathbf{v}_i$ is the prefix of $\mathbf{v}$ of length $i$ and $\mathbf{w}_j$ is the prefix of $\mathbf{w}$ of length $j$
$$d[i, j] = \min \begin{cases}
0, & \text{if } i = 0 \text{ and } j = 0, \\
d[i-1, j] + 1, & \text{if } i > 0, \\
d[i, j-1] + 1, & \text{if } j > 0, \\
d[i-1, j-1] + 1, & \text{if } i > 0,\ j > 0 \text{ and } v_i \ne w_j, \\
d[i-1, j-1], & \text{if } i > 0,\ j > 0 \text{ and } v_i = w_j.
\end{cases}$$
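The recurrence with its boundary cases translates almost directly into a table-filling loop. The following is a minimal sketch, assuming unit costs and Python strings indexed from 0 (so the character $v_i$ is `v[i - 1]`); the function name `edit_distance_table` is made up for this illustration.

```python
def edit_distance_table(v, w):
    """Return d with d[i][j] = edit distance between v[:i] and w[:j]."""
    m, n = len(v), len(w)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue  # base case d[0][0] = 0
            candidates = []
            if i > 0:
                candidates.append(d[i - 1][j] + 1)  # deletion
            if j > 0:
                candidates.append(d[i][j - 1] + 1)  # insertion
            if i > 0 and j > 0:
                # mismatch adds 1, match adds 0
                candidates.append(d[i - 1][j - 1] + int(v[i - 1] != w[j - 1]))
            d[i][j] = min(candidates)
    return d


for row in edit_distance_table("ATGT", "ATCG"):
    print(row)
```

Rows are filled top to bottom and left to right, so every entry the recurrence refers to is already available when it is needed.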
Computing Edit Distance – Dynamic Programming
Columns: deletion ($v_i$ over -), insertion (- over $w_j$), mismatch ($v_i$ over $w_j$, $v_i \ne w_j$), match ($v_i$ over $w_j$, $v_i = w_j$)
Cell dependencies: $(i-1, j-1) \to (i, j)$ with cost 0 or 1; $(i-1, j) \to (i, j)$ with cost 1; $(i, j-1) \to (i, j)$ with cost 1
Filled table for v = ATGT (rows, index i) and w = ATCG (columns, index j):
w:          A  T  C  G
j:       0  1  2  3  4
   i=0:  0  1  2  3  4
A  i=1:  1  0  1  2  3
T  i=2:  2  1  0  1  2
G  i=3:  3  2  1  1  1
T  i=4:  4  3  2  2  2
Computing Edit Distance – Dynamic Programming
Two alignments of v = ATGT and w = ATCG read off the filled table, each with cost $d[4, 4] = 2$:
v: A T - G T
w: A T C G -
and
v: A T G T
w: A T C G
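An alignment can be recovered by walking back through the table from $(m, n)$ to $(0, 0)$ and asking which case of the recurrence produced each entry. This is a sketch under the same unit-cost assumption; the helper names are hypothetical, and the tie-breaking order (diagonal first, then deletion, then insertion) is an arbitrary choice, so only one of possibly several optimal alignments is returned.

```python
def edit_distance_table(v, w):
    """d[i][j] = edit distance between v[:i] and w[:j] (unit costs)."""
    m, n = len(v), len(w)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            options = []
            if i > 0:
                options.append(d[i - 1][j] + 1)
            if j > 0:
                options.append(d[i][j - 1] + 1)
            if i > 0 and j > 0:
                options.append(d[i - 1][j - 1] + int(v[i - 1] != w[j - 1]))
            d[i][j] = min(options) if options else 0
    return d


def traceback(v, w, d):
    """Return one optimal alignment as a pair of gapped strings."""
    i, j = len(v), len(w)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + int(v[i - 1] != w[j - 1]):
            top.append(v[i - 1]); bottom.append(w[j - 1])  # match or mismatch
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            top.append(v[i - 1]); bottom.append("-")       # deletion
            i -= 1
        else:
            top.append("-"); bottom.append(w[j - 1])       # insertion
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


v, w = "ATGT", "ATCG"
print(traceback(v, w, edit_distance_table(v, w)))
```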
Computing Edit Distance – Running Time
For each of the $(m + 1) \times (n + 1)$ entries:
• 3 addition operations
• 1 comparison operation
• 1 minimum operation
Running time: $O(mn)$
Computing Edit Distance – Your turn!
Fill in the table $d[i, j]$ (rows $i = 0, \dots, 3$; columns $j = 0, \dots, 3$) for each pair:
• v = CAT, w = CAR
• v = CAT, w = ATE
• v = CAT, w = ARE

Change problem table:
Value:        1  2  3  4  5  6  7
Min # coins:  1  2  1  2  1  2  3
Edit distance table (v = ATGT, w = ATCG):
A  i=1:  1  0  1  2  3
T  i=2:  2  1  0  1  2
G  i=3:  3  2  1  1  1
T  i=4:  4  3  2  2  2
• Both have optimal substructure and can be solved using dynamic programming
• These are examples of a more general problem!
Review of Graph Theory
• Graph $G = (V, E)$
• Vertices $V = \{v_1, \dots, v_n\}$
• Edges $E = \{(v_i, v_j), \dots\}$
[Figure: graph on the cities Chicago, Bloomington, Champaign-Urbana, Indianapolis and St. Louis]
Review of Graph Theory
• Directed graph $G = (V, E)$
• Vertices $V = \{v_1, \dots, v_n\}$
• Directed edges $E = \{(v_i, v_j), \dots\}$
• A path is a sequence of vertices and the edges that connect them
• Edges can be weighted
[Figure: directed graph on the cities Chicago, Bloomington, Champaign-Urbana, Indianapolis and St. Louis, with edge weights 130, 140, 50, 180, 170 and 150]
Manhattan Tourist Problem
A tourist in Manhattan wants to visit the maximum number of attractions (*) by traveling on a path (only eastward and southward) from start to end.
There may be more than one attraction on a street. Add weights!
[Figure: street grid from Begin (top left) to End (bottom right); attractions (*) on streets are replaced by street weights]
Manhattan Tourist Problem
Manhattan Tourist Problem (MTP): Given a weighted, directed grid graph $G$ with two vertices “begin” and “end”, find the maximum weight path in $G$ from “begin” to “end”.
[Figure: weighted grid graph with i coordinate (rows) and j coordinate (columns) running from 0 to 4, with “begin” at the top left and “end” at the bottom right]
Manhattan Tourist Problem – Exhaustive Algorithm
Check all paths
Question:
[Figure: weighted grid graph with i coordinate (rows) and j coordinate (columns) running from 0 to 4, from “begin” to “end”]
Manhattan Tourist Problem – Greedy Algorithm
[Figure: 4 × 4 weighted grid from begin to end. The greedy path has a promising start, but leads to bad choices (score 18). A better path scores 22.]
Manhattan Tourist Problem – Optimal Substructure
$s[i, j]$ is the best score for a path to coordinate $(i, j)$
Question: What is the recurrence?
$$s[i, j] = \max \begin{cases}
0, & \text{if } i = 0 \text{ and } j = 0, \\
s[i-1, j] + w[(i-1, j), (i, j)], & \text{if } i > 0, \\
s[i, j-1] + w[(i, j-1), (i, j)], & \text{if } j > 0.
\end{cases}$$
[Figure: 4 × 4 weighted grid from begin to end, annotated with the best score to intermediate points (18 and 20) and the best score to end (22)]
MTP – Solving Recurrence using Dynamic Programming
$s[i, j]$ is the best score for a path to coordinate $(i, j)$
$$s[i, j] = \max \begin{cases}
0, & \text{if } i = 0 \text{ and } j = 0, \\
s[i-1, j] + w[(i-1, j), (i, j)], & \text{if } i > 0, \\
s[i, j-1] + w[(i, j-1), (i, j)], & \text{if } j > 0.
\end{cases}$$
• $w[(i-1, j), (i, j)]$: weight of street between $(i-1, j)$ and $(i, j)$
• $w[(i, j-1), (i, j)]$: weight of street between $(i, j-1)$ and $(i, j)$
Street weights in the example grid (source at $(0, 0)$):
• eastward streets, rows $i = 0, \dots, 3$: (1, 2, 5), (2, 1, 5), (2, 3, 4), (0, 0, 0)
• southward streets, from row $i$ to row $i + 1$ for $i = 0, 1, 2$: (5, 3, 10, 5), (3, 5, 3, 1), (0, 0, 5, 2)
Filled table of $s[i, j]$:
        j=0  j=1  j=2  j=3
i=0:      0    1    3    8
i=1:      5    7   13   18
i=2:      8   12   16   20
i=3:      8   12   21   22
$s[3, 3] = 22$
Question: Implementation?
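One possible implementation, as a sketch. It assumes the grid is given as two lists of lists, `down[i][j]` for the street from $(i, j)$ to $(i + 1, j)$ and `right[i][j]` for the street from $(i, j)$ to $(i, j + 1)$; this representation and the names are choices made for the illustration. The weights are those of the 4 × 4 example grid as read from the slide figure.

```python
def manhattan_tourist(down, right):
    """Return the table s, where s[i][j] is the best score of a path to (i, j)."""
    rows = len(right)      # number of vertex rows
    cols = len(down[0])    # number of vertex columns
    s = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue  # base case s[0][0] = 0
            candidates = []
            if i > 0:
                candidates.append(s[i - 1][j] + down[i - 1][j])   # arrive from the north
            if j > 0:
                candidates.append(s[i][j - 1] + right[i][j - 1])  # arrive from the west
            s[i][j] = max(candidates)
    return s


# Example grid weights (read from the slide figure).
right = [[1, 2, 5], [2, 1, 5], [2, 3, 4], [0, 0, 0]]
down = [[5, 3, 10, 5], [3, 5, 3, 1], [0, 0, 5, 2]]

s = manhattan_tourist(down, right)
for row in s:
    print(row)
```

Each street is looked at once, so the running time is proportional to the number of edges of the grid.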
Manhattan Is Not a Perfect Grid
[Figure: vertex $B$ with incoming edges from $A_1$, $A_2$ and $A_3$]
$$s[B] = \max \begin{cases}
s[A_1] + w[A_1, B], \\
s[A_2] + w[A_2, B], \\
s[A_3] + w[A_3, B].
\end{cases}$$
Manhattan Is Not a Perfect Grid, It’s a Directed Graph
$G = (V, E)$ is a directed acyclic graph (DAG) with nonnegative edge weights $w : E \to \mathbb{R}_{\ge 0}$
$$s[0, 0] = 0$$
$$s[i, j] = \max_{(i', j') \in \mathrm{pred}(i, j)} \{ s[i', j'] + w[(i', j'), (i, j)] \}$$
Each edge is evaluated once: $O(|E|)$ time
[Figure: vertex $(i, j)$ with incoming edges from its predecessors $\mathrm{pred}(i, j)$]
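The same recurrence can be evaluated on any DAG by visiting vertices in a topological order, so that all predecessors of a vertex are scored before the vertex itself. A minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+); the vertex `S`, the edge weights and the function name are hypothetical, chosen only to mirror the $A_1, A_2, A_3 \to B$ picture.

```python
from graphlib import TopologicalSorter


def longest_path_scores(pred, weight, source):
    """pred maps each vertex to its predecessors; weight maps (u, v) to a number."""
    s = {}
    # static_order() yields each vertex only after all of its predecessors.
    for v in TopologicalSorter(pred).static_order():
        if v == source:
            s[v] = 0
            continue
        scores = [s[u] + weight[(u, v)] for u in pred.get(v, []) if s[u] is not None]
        s[v] = max(scores) if scores else None  # None: not reachable from source
    return s


# Hypothetical example graph: S -> A1, A2, A3 -> B.
pred = {"A1": ["S"], "A2": ["S"], "A3": ["S"], "B": ["A1", "A2", "A3"]}
weight = {
    ("S", "A1"): 1, ("S", "A2"): 4, ("S", "A3"): 2,
    ("A1", "B"): 5, ("A2", "B"): 1, ("A3", "B"): 6,
}
scores = longest_path_scores(pred, weight, "S")
print(scores["B"])
```

Every edge is read exactly once inside the list comprehension, which matches the $O(|E|)$ count above (plus the cost of the topological sort).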
Dynamic Programming as a Graph Problem
Manhattan Tourist Problem:
Every path in the directed graph is a possible tourist path. Find a maximum weight path.
Running time: $O(mn) = O(|E|)$
[Figure: Manhattan street grid from Begin to End with attractions (*)]
What About the Edit Distance Problem?
Edit Distance problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$, compute the minimum number $d(\mathbf{v}, \mathbf{w})$ of elementary operations to transform $\mathbf{v}$ into $\mathbf{w}$.
Edit graph is a weighted, directed grid graph $G = (V, E)$ with source vertex $(0, 0)$ and target vertex $(m, n)$. Each edge into vertex $(i, j)$ has a weight corresponding to its edit cost: deletion (1), insertion (1), mismatch (1) and match (0).
Edit Distance problem (as a graph problem): Given edit graph $G = (V, E)$ with edge weights $c : E \to \{0, 1\}$, find a shortest path from $(0, 0)$ to $(m, n)$.
An alignment is a path from $(0, 0)$ to $(m, n)$.
[Figure: grid of vertices for v = ATGT (rows $i = 0, \dots, 4$) and w = ATCG (columns $j = 0, \dots, 4$); edge types: deletion ($v_i$ over -), insertion (- over $w_j$), mismatch and match ($v_i$ over $w_j$)]
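To make the graph view concrete, the sketch below builds the edit graph explicitly and computes a shortest path from $(0, 0)$ to $(m, n)$. The edge-dictionary representation and the function names are assumptions made for this illustration; in practice the table-filling version does the same work without materialising the graph.

```python
def edit_graph(v, w):
    """Return {((i, j), (i2, j2)): cost} for the edit graph of v and w."""
    m, n = len(v), len(w)
    edges = {}
    for i in range(m + 1):
        for j in range(n + 1):
            if i < m:
                edges[((i, j), (i + 1, j))] = 1  # deletion
            if j < n:
                edges[((i, j), (i, j + 1))] = 1  # insertion
            if i < m and j < n:
                # mismatch costs 1, match costs 0
                edges[((i, j), (i + 1, j + 1))] = int(v[i] != w[j])
    return edges


def shortest_path_length(edges, source, target):
    """Shortest path in the grid DAG, relaxing each edge exactly once."""
    dist = {source: 0}
    # Sorting edges by tail vertex (i, j) is a topological order of the grid:
    # every edge into a vertex has a lexicographically smaller tail.
    for (u, x), cost in sorted(edges.items()):
        if u in dist:
            dist[x] = min(dist.get(x, float("inf")), dist[u] + cost)
    return dist[target]


v, w = "ATGT", "ATCG"
print(shortest_path_length(edit_graph(v, w), (0, 0), (len(v), len(w))))
```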
Shortest Path vs Longest Path
• Change graph, edit graph and the MTP grid are directed graphs $G$.
• Shortest path in directed graphs can be found efficiently (Dijkstra, Bellman-Ford, Floyd-Warshall algorithms)
• Longest path in general directed graphs is NP-hard, so no efficient algorithm is known.
• Change graph, edit graph and MTP grid graph are directed acyclic graphs (DAGs).
• No directed cycles.
Question: What’s the relation between absence of directed cycles and optimal substructure?
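Weighted Edit Distance
$d[i, j]$ is the edit distance of $\mathbf{v}_i$ and $\mathbf{w}_j$,
where $\mathbf{v}_i$ is the prefix of $\mathbf{v}$ of length $i$ and $\mathbf{w}_j$ is the prefix of $\mathbf{w}$ of length $j$
$$d[i, j] = \min \begin{cases}
d[i-1, j] + 1, & \text{(deletion: } v_i \text{ over -)} \\
d[i, j-1] + 1, & \text{(insertion: - over } w_j \text{)} \\
\dots
\end{cases}$$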
Sources
• CS 362 by Layla Oesper (Carleton College)
• CS 1810 by Ben Raphael (Brown/Princeton University)
• An Introduction to Bioinformatics Algorithms book (Jones and Pevzner)
• http://bioalgorithms.info/