0% found this document useful (0 votes)
8 views

Lecture 19

Uploaded by

Burim Baftijari
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views

Lecture 19

Uploaded by

Burim Baftijari
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

Lecture 19: Edit Distance

Agenda:

• LCS

• Edit distance

Reading:

• Text: (partly covered) 350-355

1
Lecture 19: Dynamic Programming

Longest common subsequence (LCS) problem:

Definitions: – Sequence/string:
dynamicprogramming is a sequence over the English alpha-
bet
– Base/letter/character
– Subsequence:
the given sequence with zero or more letters deleted
e.g., dog is a subsequence of dynamicprogramming
WARNing: bases appear in the same order, but not nec-
essarily consecutive
– Common subsequence
– LCS problem: given two sequences X = x1 x2 . . . xn and
Y = y1 y2 . . . ym , find a maximum-length common subse-
quence of them.

• The LCS problem has the “optimal substructure” ...


– if xn is NOT in the LCS (to be computed), then we only
need to compute an LCS of x1 x2 . . . xn−1 and y1 y2 . . . ym
...
– similarly, if ym is NOT in the LCS (to be computed),
then we only need to compute an LCS of x1 x2 . . . xn and
y1 y2 . . . ym−1 ...
– if xn and ym are both in the LCS (to be computed), then
xn = ym and we need to compute an LCS of x1 x2 . . . xn−1
and y1 y2 . . . ym−1 ;
and then adding xn to the end to form an LCS for the
original problem

2
Lecture 19: Dynamic Programming

Longest common subsequence (LCS) problem (cont’d):

• Therefore,
Letting DP [n, m] to denote the length of an LCS of X and
Y , then it is equal to
LCS(x1 x2 . . . xn−1 , y1 y2 . . . ym ),

max length of LCS(x1 x2 . . . xn , y1 y2 . . . ym−1 ),
LCS(x1 x2 . . . xn−1 , y1 y2 . . . ym−1 ) + ‘x0n , if xn = ym

• Correctness

• In general, let DP [i, j] denote the length of an LCS of x1 x2 . . . xi


and y1 y2 . . . yj .

• Recurrence:
If i = 0 or j = 0 then clearly DP [i, j] = 0.
Else
DP [i − 1, j],

DP [i, j] = max DP [i, j − 1],
DP [i − 1, j − 1] + 1, if xi = yj

3
Lecture 19: Dynamic Programming

Longest common subsequence (LCS) problem (cont’d)


— solving the recurrence:

• Pseudocode

procedure dpLCS(X, Y )

n ← length[X]
m ← length[Y ]
for i ← 1 to m do
DP [i, 0] ← 0
for j ← 0 to n do
DP [0, j] ← 0
for i ← 1 to n do
for j ← 1 to m do
if xi = yj then
DP [i, j] ← DP [i − 1, j − 1] + 1
else if DP [i − 1, j] ≥ DP [i, j − 1] then
DP [i, j] ← DP [i − 1, j]
else
DP [i, j] ← DP [i, j − 1]
return DP [n, m]

• Running time: Θ(n × m)


There are n × m entries each takes constant time to compute.

• Step 4: to find the LCS, store extra information and trace


back!

4
Lecture 19: Edit Distance

Editing Sequences:

• Given two sequences S and T over alphabets Σ = {a, b, c}.

• Starting with S, at each step we can either:


– delete a letter from S,
– insert a letter from Σ into S (at any place),
– exchange a letter of S with another one in Σ.

• each of these operations has a cost (given).

• If we use δ to denote “null” then:


– cost of exchanging a with b is: Cost[a, b], and
– cost of deleting c (i.e. replacing it with δ) is Cost[c, δ],
and
– cost of inserting letter b (i.e. replacing δ with b) is
Cost[δ, b].

• Given S, T , and cost table Cost, want to find a sequence of


operations that transform S into T with minimum cost.

• We assume that Cost is metric:


– Cost[α, α] = 0 for α ∈ Σ,
– Cost[α, β] = Cost[β, α], for α, β ∈ Σ ∪ {δ},
– Cost[α, β] ≤ Cost[α, σ] + Cost[σ, β], for α, β, σ ∈ Σ ∪ {δ}.

5
Lecture 19: Edit Distance

Edit Distance:

• Let’s assume S = s1 s2 . . . sn and T = t1 t2 . . . tm .

• Subproblems will be: best edit distance of s1 . . . si and t1 . . . tj


with 1 ≤ i ≤ n and 1 ≤ j ≤ m.

• Let Edit[i, j] denote the minimum cost of editing s1 . . . si into


t1 . . . tj .

• In the optimal solution, either:


– si is deleted and the rest of s1 . . . si−1 is transformed into
t1 . . . tj , or
– s1 . . . si is transformed into t1 . . . tj−1 and we insert tj at
the end, or
– si is changed into tj and the rest of s1 . . . si−1 is trans-
formed into t1 . . . tj−1 .

• Thus:
Edit[i − 1, j] + Cost[si , δ],

Edit[i, j] = min Edit[i, j − 1] + Cost[δ, tj ],
Edit[i − 1, j − 1] + Cost[si , tj ],

Pi Pj
• with Edit[i, 0] = k=1
Cost[sk , δ] and Edit[0, j] = k=1
Cost[δ, tj ].

6
Lecture 19: Dynamic Programming

Edit Distance:

• Pseudocode

procedure Edit-Dist(S, T, Cost)

n ← length[S]
m ← length[T ]
Edit[0, 0] ← 0
for i ← 1 to n do
Edit[i, 0] ← Edit[i − 1, 0] + Cost[S[i], δ]
for j ← 1 to n do
Edit[0, j] ← Edit[0, j − 1] + Cost[δ, T [j]]
for i ← 1 to n do
for j ← 1 to m do
Op1 ← Edit[i − 1, j] + Cost[S[i], δ]
Op2 ← Edit[i, j − 1] + Cost[δ, T [j]]
Op3 ← Edit[i − 1, j − 1] + Cost[S[i], T [j]]
Edit[i, j] = min{Op1, Op2, Op3}
return Edit[n, m]

• Running time: Θ(n × m)


There are n × m entries each takes constant time to compute.

• To find the sequence of operations: keep extra information


and trace back!

7
Lecture 19: Edit Distance

Have you understood the lecture contents?


well ok not-at-all topic

   LCS

   edit distance

   DP for edit distance

You might also like