Theory I Algorithm Design and Analysis: (13 - Edit Distance and Approximate String Matching)

The document discusses the problem of computing the edit distance between two strings and finding the optimal sequence of edit operations to transform one string into the other. It defines the edit operations of replacement, deletion and insertion with associated costs. It presents a dynamic programming algorithm that uses a recurrence relation to compute the edit distance matrix in quadratic time and traces back through the matrix to recover the optimal edit sequence.

Uploaded by

David Deza Veliz

Theory I

Algorithm Design and Analysis


(13 - Edit distance and approximate string matching)

Prof. Dr. Th. Ottmann

Problem: similarity of strings


Edit distance
For two given strings A and B, compute, as efficiently as possible, the edit
distance D(A,B) and a minimal sequence of edit operations which
transforms A into B.
Example alignment of "informatik" and "interpolation":
i n f - - - o r m a t i k -
i n t e r p o l - a t i o n

Edit distance
Given: two strings A = a1 a2 ... am and B = b1 b2 ... bn
Wanted: minimal cost D(A,B) for a sequence of edit operations
to transform A into B.
Edit operations:
1. Replace one character in A by a character from B
2. Delete one character from A
3. Insert one character from B

Edit distance
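Cost model:

\[
c(a,b) =
\begin{cases}
1 & \text{if } a \neq b \\
0 & \text{if } a = b
\end{cases}
\]

$a = \varepsilon$ or $b = \varepsilon$ is possible (insertion or deletion).
We assume the triangle inequality holds for c:
$c(a,c) \le c(a,b) + c(b,c)$
Consequently, each character is changed at most once.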
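
The unit cost model can be sketched in a few lines of Python. This is only an illustration: the name `EPS` and the choice of the empty string to stand for the empty character ε are assumptions made here, not part of the slides. The brute-force check confirms the triangle inequality on a small alphabet.

```python
from itertools import product

EPS = ""  # assumption: the empty string stands for the empty character (epsilon)

def c(a: str, b: str) -> int:
    """Unit cost: 0 if the two symbols are equal, 1 otherwise."""
    return 0 if a == b else 1

def triangle_inequality_holds(symbols) -> bool:
    """Check c(x, z) <= c(x, y) + c(y, z) for every triple of symbols."""
    return all(c(x, z) <= c(x, y) + c(y, z)
               for x, y, z in product(symbols, repeat=3))

print(c("a", "b"), c("a", "a"), c("a", EPS))  # replacement, match, deletion
print(triangle_inequality_holds(["a", "b", "c", EPS]))
```

With this encoding, `c(a, EPS)` is the cost of deleting `a` and `c(EPS, b)` is the cost of inserting `b`.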

Edit distance
Trace as representation of edit sequences
A = b a a c a a b c
B = a b a c b c a c
or using indels
A= - b a a c a - a b c
B= a b a - c b c a - c
Edit distance (cost): 5
Dividing an optimal trace yields two optimal sub-traces,
so dynamic programming can be used.
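
The cost of an alignment written with indels can be checked column by column. The sketch below assumes unit costs and uses `-` as the gap symbol, as in the two rows above; the function name `alignment_cost` is chosen here for illustration.

```python
GAP = "-"

def alignment_cost(row_a: str, row_b: str) -> int:
    """Sum the unit costs of all columns of an alignment with indels."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows of an alignment must have the same length")
    cost = 0
    for x, y in zip(row_a, row_b):
        if x == GAP and y == GAP:
            raise ValueError("a column may not consist of two gaps")
        if x != y:
            # gap over a character: insertion; character over a gap: deletion;
            # two different characters: replacement. Each costs 1.
            cost += 1
    return cost

row_a = "-baaca-abc"
row_b = "aba-cbca-c"
print(alignment_cost(row_a, row_b))  # 5
```

The five costly columns are two insertions, two deletions and one replacement.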

Computation of the edit distance


Let Ai = a1...ai and Bj = b1....bj
Di,j = D(Ai,Bj)


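OPTIMAL SUBSTRUCTURE
Let Au = a1...au (u < i) and Bt = b1...bt (t < j) be prefixes of A and B,
respectively.
Suppose the best solution is D(Au, Bt) = D(Au-1, Bt-1) + add(Au, Bt).

If there were a solution D'(Au-1, Bt-1) < D(Au-1, Bt-1) for the subproblem,
then substituting it would give a solution better than the optimal one
already obtained, which is a contradiction.

Hence the optimal solution D(Au, Bt) = D(Au-1, Bt-1) + add(Au, Bt)
is built from an optimal solution of the subproblem.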

OVERLAPPING SUBPROBLEMS
[Figure: recursion tree rooted at subproblem (4,4), with children (3,4), (3,3) and (4,3). Subproblems such as (3,3), (3,2) and (2,3) occur several times in the tree.]
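
How often subproblems repeat can be made concrete by counting calls. The following sketch compares the plain recursion with a cached one for the subproblem (4,4); the function names are illustrative, and the use of `functools.lru_cache` as the cache is a choice made here, not something the slides prescribe.

```python
from functools import lru_cache

def naive_calls(i: int, j: int) -> int:
    """Count the calls made by the plain recursion started at subproblem (i, j)."""
    if i == 0 or j == 0:
        return 1  # base case, answered without further recursion
    return (1
            + naive_calls(i - 1, j - 1)   # replace branch
            + naive_calls(i - 1, j)       # delete branch
            + naive_calls(i, j - 1))      # insert branch

def memoized_calls(i: int, j: int) -> int:
    """Count the distinct subproblems evaluated when results are cached."""
    @lru_cache(maxsize=None)
    def visit(p: int, q: int) -> None:
        if p > 0 and q > 0:
            visit(p - 1, q - 1)
            visit(p - 1, q)
            visit(p, q - 1)
    visit(i, j)
    return visit.cache_info().currsize

print(naive_calls(4, 4), memoized_calls(4, 4))  # 481 25
```

The plain recursion makes 481 calls, while only (4+1)(4+1) = 25 distinct subproblems exist, which is why tabulating each D(i,j) once pays off.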

Computation of the edit distance


Three possibilities of ending a trace:
1. am is replaced by bn :
Dm,n = Dm-1,n-1 + c(am, bn)
2. am is deleted: Dm,n = Dm-1,n + 1
3. bn is inserted: Dm,n = Dm,n-1 + 1

Computation of the edit distance


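Recurrence relation, if $m, n \ge 1$:

\[
D_{m,n} = \min \left\{
\begin{array}{l}
D_{m-1,n-1} + c(a_m, b_n), \\
D_{m-1,n} + 1, \\
D_{m,n-1} + 1
\end{array}
\right\}
\]

Computation of all $D_{i,j}$ is required, $0 \le i \le m$, $0 \le j \le n$.

[Figure: cell D_{i,j} is reached from D_{i-1,j-1} with cost +d, and from D_{i-1,j} and D_{i,j-1} with cost +1 each.]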

Recurrence relation for the edit distance


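Base cases:
$D_{0,0} = D(\varepsilon, \varepsilon) = 0$
$D_{0,j} = D(\varepsilon, B_j) = j$
$D_{i,0} = D(A_i, \varepsilon) = i$

Recurrence equation:

\[
D_{i,j} = \min \left\{
\begin{array}{l}
D_{i-1,j-1} + c(a_i, b_j), \\
D_{i-1,j} + 1, \\
D_{i,j-1} + 1
\end{array}
\right\}
\]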
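
The base cases and the recurrence translate directly into a bottom-up table fill. This is a minimal sketch assuming unit costs; the function name `edit_distance_matrix` is chosen here, and Python strings are 0-indexed, so `a[i - 1]` plays the role of a_i.

```python
def edit_distance_matrix(a: str, b: str) -> list[list[int]]:
    """Fill D[i][j] = D(A_i, B_j) for all 0 <= i <= m, 0 <= j <= n."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i                      # delete all of a_1 ... a_i
    for j in range(1, n + 1):
        D[0][j] = j                      # insert all of b_1 ... b_j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            replace_cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + replace_cost,  # replace or match
                          D[i - 1][j] + 1,                 # delete a_i
                          D[i][j - 1] + 1)                 # insert b_j
    return D

D = edit_distance_matrix("baacaabc", "abacbcac")
print(D[-1][-1])  # 5
```

Each of the (m+1)(n+1) entries is computed in constant time, so the table costs O(mn) time and space.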

Example

[Table: example edit distance matrix D; only the corner entry 0 and a label b are recoverable.]
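
As an illustration (not necessarily the example shown on the original slide), the matrix for A = baac and B = abac under unit costs can be worked out by hand from the base cases and the recurrence. Rows are indexed by the prefixes of A, columns by the prefixes of B.

```latex
\[
\begin{array}{c|ccccc}
D_{i,j} & \varepsilon & a & b & a & c \\ \hline
\varepsilon & 0 & 1 & 2 & 3 & 4 \\
b & 1 & 1 & 1 & 2 & 3 \\
a & 2 & 1 & 2 & 1 & 2 \\
a & 3 & 2 & 2 & 2 & 2 \\
c & 4 & 3 & 3 & 3 & 2
\end{array}
\]
```

The bottom-right entry gives D(baac, abac) = 2: replace the leading b by a and the following a by b.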

Computation of the edit operations


Algorithm edit_operations(i, j)
Input: matrix D (computed)
if i = 0 and j = 0 then return
if i ≠ 0 and D[i, j] = D[i − 1, j] + 1
    then delete a[i]
         edit_operations(i − 1, j)
else if j ≠ 0 and D[i, j] = D[i, j − 1] + 1
    then insert b[j]
         edit_operations(i, j − 1)
else /* D[i, j] = D[i − 1, j − 1] + c(a[i], b[j]) */
    replace a[i] by b[j]
    edit_operations(i − 1, j − 1)
Initial call: edit_operations(m, n)
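
A runnable version of the traceback might look as follows. It is a sketch under unit costs that checks the three cases in the same order as the algorithm above, but it walks the matrix iteratively instead of recursively and labels a cost-0 replacement as "keep". Both of these, and the tuple format of the result, are choices made here.

```python
def edit_distance_matrix(a: str, b: str) -> list[list[int]]:
    """Bottom-up computation of the edit distance matrix D (unit costs)."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i
    for j in range(1, n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(D[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
                          D[i - 1][j] + 1,
                          D[i][j - 1] + 1)
    return D

def edit_operations(a: str, b: str, D: list[list[int]]) -> list[tuple[str, str, str]]:
    """Walk back from D[m][n] to D[0][0]; return (operation, char of a, char of b)."""
    i, j = len(a), len(b)
    ops = []
    while i > 0 or j > 0:
        if i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append(("delete", a[i - 1], ""))
            i -= 1
        elif j > 0 and D[i][j] == D[i][j - 1] + 1:
            ops.append(("insert", "", b[j - 1]))
            j -= 1
        else:
            # Here D[i][j] == D[i-1][j-1] + c(a_i, b_j) must hold.
            kind = "keep" if a[i - 1] == b[j - 1] else "replace"
            ops.append((kind, a[i - 1], b[j - 1]))
            i -= 1
            j -= 1
    ops.reverse()  # operations were collected from the end of the strings
    return ops

a, b = "baacaabc", "abacbcac"
D = edit_distance_matrix(a, b)
for op in edit_operations(a, b, D):
    print(op)
```

The number of operations other than "keep" equals D[m][n], and reading off the third component of each tuple reproduces B.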
