DP and Edit Dist

This document discusses dynamic programming approaches to calculating edit distance between strings. It introduces memoization, which caches previously computed subproblem solutions so they are not recomputed, reducing the recursive edit distance algorithm from exponential time to O(mn) time for strings of lengths n and m. A bottom-up dynamic programming algorithm is then presented that fills in the edit distance values in a 2D table.

Dynamic programming and edit distance

Ben Langmead

Department of Computer Science

You are free to use these slides. If you do, please sign the
guestbook (www.langmead-lab.org/teaching-materials), or email
me ([email protected]) and tell me briefly how you’re
using them. For original Keynote files, email me.
Beyond approximate matching: sequence similarity
In many settings, Hamming and edit distance are too simple. Biologically-relevant
distances require algorithms. We will expand our tool set accordingly.

Score = 248 bits (129), Expect = 1e-63

Identities = 213/263 (80%), Gaps = 34/263 (12%)
Strand = Plus / Plus

Query: 161 atatcaccacgtcaaaggtgactccaactcca---ccactccattttgttcagataatgc 217
           |||||||||||||||||||||||||||||  |     | |   || ||||||||||||||
Sbjct: 481 atatcaccacgtcaaaggtgactccaact-tattgatagtgttttatgttcagataatgc 539

Query: 218 ccgatgatcatgtcatgcagctccaccgattgtgagaacgacagcgacttccgtcccagc 277
           |||||||   ||||||||||||||||||||| || |            ||||||||||||
Sbjct: 540 ccgatgactttgtcatgcagctccaccgattttg-g------------ttccgtcccagc 586

Query: 278 c-gtgcc--aggtgctgcctcagattcaggttatgccgctcaattcgctgcgtatatcgc 334
           |  || |  | ||||||||||||||||||||||||||||||||||||||| |||||||||
Sbjct: 587 caatgacgta-gtgctgcctcagattcaggttatgccgctcaattcgctgggtatatcgc 645

Query: 335 ttgctgattacgtgcagctttcccttcaggcggga------------ccagccatccgtc 382
           |||||||||||||||||||||||||||||||||||            |||||||||||||
Sbjct: 646 ttgctgattacgtgcagctttcccttcaggcgggattcatacagcggccagccatccgtc 705

Query: 383 ctccatatc-accacgtcaaagg 404
            |||||||| |||||||||||||
Sbjct: 706 atccatatcaaccacgtcaaagg 728

Example BLAST alignment
Approximate string matching

A mismatch is a single-character substitution:

X: GTAGCGGCG
   ||| |||||
Y: GTAACGGCG

An edit is a single-character substitution or gap (insertion or deletion):

X: GTAGCGGCG
   ||| |||||
Y: GTAACGGCG

Gap in X (AKA insertion in Y or deletion in X):

X: GTAGC-GCG
   ||||| |||
Y: GTAGCGGCG

Gap in Y (AKA insertion in X or deletion in Y):

X: GTAGCGGCG
   || ||||||
Y: GT-GCGGCG
Alignment

X: GCGTATGAGGCTA-ACGC
   || |||| ||||| ||||
Y: GC-TATGCGGCTATACGC

Above is an alignment: a way of lining up the characters of x and y

Could include mismatches, gaps or both

Vertical lines are drawn where opposite characters match
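
The rule for drawing the vertical lines can be sketched in a few lines of Python. This is an illustrative helper rather than code from the slides: the name matchLine and the use of '-' as the gap character are assumptions.

```python
def matchLine(xAln, yAln):
    """ Given an alignment as two gapped strings of equal length, return
        the middle line: '|' where opposite characters match, ' ' elsewhere.
        Assumes '-' marks a gap. """
    assert len(xAln) == len(yAln)
    return ''.join('|' if a == b and a != '-' else ' '
                   for a, b in zip(xAln, yAln))

# The alignment shown above
xAln = 'GCGTATGAGGCTA-ACGC'
yAln = 'GC-TATGCGGCTATACGC'
print('X: ' + xAln)
print('   ' + matchLine(xAln, yAln))
print('Y: ' + yAln)
```

Columns containing a mismatch or a gap are left blank, which is how the mismatches and gaps in an alignment can be spotted at a glance.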

Hamming and edit distance

Finding Hamming distance between 2 strings is easy:

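def hammingDistance(x, y):
    """ Return number of mismatches between equal-length strings x and y """
    assert len(x) == len(y)
    nmm = 0  # number of mismatches
    for i in range(len(x)):
        if x[i] != y[i]:
            nmm += 1
    return nmm

Example (4 mismatches):

GAGGTAGCGGCGTTTAAC
| |||| ||| |||||||
GTGGTAACGGGGTTTAAC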

Edit distance is harder:

def editDistance(x, y):
    ???

GCGTATGCGGCTA-ACGC
|| |||||||||| ||||
GC-TATGCGGCTATACGC
Edit distance

If strings x and y are same length, what can we say about editDistance(x, y) relative to hammingDistance(x, y)?

editDistance(x, y) ≤ hammingDistance(x, y)

If strings x and y are different lengths, what can we say about editDistance(x, y)?

editDistance(x, y) ≥ | |x| - |y| |

Python example: http://bit.ly/CG_DP_EditDist
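
Both bounds can be checked on small inputs. The sketch below is illustrative only: editDistanceSketch is a hypothetical stand-in (a compact row-by-row dynamic programming routine) used just so the inequalities can be evaluated, and the test strings are borrowed from the alignment examples.

```python
def hammingDistance(x, y):
    """ Number of mismatched positions between equal-length strings """
    assert len(x) == len(y)
    return sum(1 for a, b in zip(x, y) if a != b)

def editDistanceSketch(x, y):
    """ Stand-in edit distance, computed one matrix row at a time """
    prev = list(range(len(y) + 1))
    for i in range(1, len(x) + 1):
        cur = [i]
        for j in range(1, len(y) + 1):
            delt = 1 if x[i-1] != y[j-1] else 0
            cur.append(min(prev[j-1] + delt, prev[j] + 1, cur[j-1] + 1))
        prev = cur
    return prev[len(y)]

# Same length: substituting every mismatched position is one valid way
# to edit x into y, so edit distance cannot exceed Hamming distance
x, y = 'GCGTATGCGGCTAACGC', 'GCTATGCGGCTATACGC'
assert editDistanceSketch(x, y) <= hammingDistance(x, y)

# Different lengths: each insertion or deletion changes the length by
# exactly 1, so at least | |x| - |y| | edits are needed
x, y = 'Shakespeare', 'shake'
assert editDistanceSketch(x, y) >= abs(len(x) - len(y))
```

For the first pair the gap between the two distances is large: the strings differ at many positions, yet one deletion and one insertion are enough to turn x into y.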

Edit distance
Can think of edits as being introduced by an optimal editor working left-to-right. Edit transcript describes how editor turns x into y.

x: GCGTATGCGGCTAACGC
y: GCTATGCGGCTATACGC

Operations: M = match, R = replace, I = insert into x, D = delete from x

x: GCG...
   ||
y: GC-...
Transcript so far: MMD

x: GCGTATGCGGCTA-...
   || ||||||||||
y: GC-TATGCGGCTAT...
Transcript so far: MMDMMMMMMMMMMI

x: GCGTATGCGGCTA-ACGC
   || |||||||||| ||||
y: GC-TATGCGGCTATACGC
Transcript: MMDMMMMMMMMMMIMMMM
Edit distance

Alignments, and edit transcripts with respect to x:

x: GCGTATGCGGCTA-ACGC
   || |||||||||| ||||
y: GC-TATGCGGCTATACGC
Transcript: MMDMMMMMMMMMMIMMMM, Distance = 2

x: GCGTATGAGGCTA-ACGC
   || |||| ||||| ||||
y: GC-TATGCGGCTATACGC
Transcript: MMDMMMMRMMMMMIMMMM, Distance = 3

x: the longest----
       |||||||
y: ----longest day
Transcript: DDDDMMMMMMMIIII, Distance = 8
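
The link between an alignment, its transcript and its distance can be sketched as follows. The function names are hypothetical and '-' is assumed to mark a gap; the distance returned is the cost of the given alignment, which equals the edit distance only when the alignment is optimal.

```python
def transcriptFromAlignment(xAln, yAln):
    """ Build the edit transcript (with respect to x) for an alignment
        given as two gapped strings of equal length. """
    assert len(xAln) == len(yAln)
    ops = []
    for a, b in zip(xAln, yAln):
        if a == '-':
            ops.append('I')   # gap in x: character inserted into x
        elif b == '-':
            ops.append('D')   # gap in y: character deleted from x
        elif a == b:
            ops.append('M')   # match
        else:
            ops.append('R')   # replace
    return ''.join(ops)

def transcriptDistance(transcript):
    """ Every operation other than a match costs 1 """
    return sum(1 for op in transcript if op != 'M')

t = transcriptFromAlignment('the longest----', '----longest day')
print('%s Distance = %d' % (t, transcriptDistance(t)))
```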
Edit distance
D[i, j]: edit distance between length-i prefix of x and length-j prefix of y
[Figure: x with its length-i prefix marked, and y with its length-j prefix marked]

Think in terms of edit transcript. Optimal transcript for D[i, j] can be built
by extending a shorter one by 1 operation. Only 3 options:

Append D to transcript for D[i-1, j]

Append I to transcript for D[i, j-1]
Append M or R to transcript for D[i-1, j-1]

D[i, j] is minimum of the three, and D[|x|, |y|] is the overall edit distance
Edit distance

Let D[0, j] = j, and let D[i, 0] = i

Otherwise, let

\[
D[i, j] = \min
\begin{cases}
D[i-1, j] + 1 & \text{(D)} \\
D[i, j-1] + 1 & \text{(I)} \\
D[i-1, j-1] + \delta(x[i-1], y[j-1]) & \text{(M or R)}
\end{cases}
\]

δ(a, b) is 0 if a = b, 1 otherwise
Edit distance
A simple recursive algorithm:

def edDistRecursive(x, y):
    # x, y: prefixes of x and y currently under consideration
    if len(x) == 0: return len(y)
    if len(y) == 0: return len(x)
    delt = 1 if x[-1] != y[-1] else 0
    # Recursively solve smaller problems
    diag = edDistRecursive(x[:-1], y[:-1]) + delt
    vert = edDistRecursive(x[:-1], y) + 1
    horz = edDistRecursive(x, y[:-1]) + 1
    return min(diag, vert, horz)

Python example: http://bit.ly/CG_DP_EditDist

Edit distance
>>> import datetime as d
>>> st = d.datetime.now(); \
... edDistRecursive("Shakespeare", "shake spear"); \
... print((d.datetime.now()-st).total_seconds())
3
31.498284

Simple, but takes >30 seconds for a small problem
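
One way to see why the recursion is slow is to count how many calls it makes. The sketch below adds a call counter to the same recursion; the name edDistRecursiveCounted and the one-element list used as a counter are illustrative assumptions, not part of the slides.

```python
def edDistRecursiveCounted(x, y, counter):
    """ Same recursion as edDistRecursive; counter[0] tallies the calls """
    counter[0] += 1
    if len(x) == 0: return len(y)
    if len(y) == 0: return len(x)
    delt = 1 if x[-1] != y[-1] else 0
    diag = edDistRecursiveCounted(x[:-1], y[:-1], counter) + delt
    vert = edDistRecursiveCounted(x[:-1], y, counter) + 1
    horz = edDistRecursiveCounted(x, y[:-1], counter) + 1
    return min(diag, vert, horz)

# Number of calls for two strings of length n, for growing n
for n in range(1, 8):
    counter = [0]
    edDistRecursiveCounted('A' * n, 'C' * n, counter)
    print('%d %d' % (n, counter[0]))
```

Every call on two non-empty prefixes spawns three more calls, so the count grows exponentially with string length, even though there are only (|x|+1)(|y|+1) distinct pairs of prefix lengths. The same subproblems are being solved over and over.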

Edit distance: dynamic programming
Subproblems (D[i, j]s) can be reused instead of being recalculated. Reusing solutions to subproblems is memoization:

def edDistRecursiveMemo(x, y, memo=None):
    if memo is None: memo = {}
    if len(x) == 0: return len(y)
    if len(y) == 0: return len(x)
    # Return memoized answer, if available
    if (len(x), len(y)) in memo:
        return memo[(len(x), len(y))]
    delt = 1 if x[-1] != y[-1] else 0
    diag = edDistRecursiveMemo(x[:-1], y[:-1], memo) + delt
    vert = edDistRecursiveMemo(x[:-1], y, memo) + 1
    horz = edDistRecursiveMemo(x, y[:-1], memo) + 1
    ans = min(diag, vert, horz)
    # Memoize D[i, j]
    memo[(len(x), len(y))] = ans
    return ans

Python example: http://bit.ly/CG_DP_EditDist

Edit distance: dynamic programming
>>> import datetime as d
>>> st = d.datetime.now(); \
... edDistRecursiveMemo("Shakespeare", "shake spear"); \
... print((d.datetime.now()-st).total_seconds())
3
0.000593

Much better
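
The same top-down idea can also be written with the standard library's functools.lru_cache, keyed on the prefix lengths (i, j) instead of a hand-managed dictionary. This is a sketch under stated assumptions: it requires Python 3, the name edDistMemoIdx is hypothetical, and it is a variation on the slide's code rather than the slide's own implementation.

```python
from functools import lru_cache

def edDistMemoIdx(x, y):
    """ Top-down edit distance, memoized on prefix lengths (i, j) """
    @lru_cache(maxsize=None)
    def d(i, j):
        # d(i, j) plays the role of D[i, j]
        if i == 0: return j
        if j == 0: return i
        delt = 1 if x[i-1] != y[j-1] else 0
        return min(d(i-1, j-1) + delt, d(i-1, j) + 1, d(i, j-1) + 1)
    return d(len(x), len(y))

print(edDistMemoIdx("Shakespeare", "shake spear"))
```

Passing indices avoids copying string slices on every call. The recursion can still go about |x| + |y| levels deep, so very long strings may run into Python's recursion limit.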
Edit distance: dynamic programming

edDistRecursiveMemo is a top-down dynamic programming approach

Alternative is bottom-up. Here, the bottom-up approach is pretty intuitive and interpretable, so this is how the edit distance algorithm is usually explained. It fills in a table (matrix) of D[i, j]s:

import numpy  # numpy: package for matrices, etc

def edDistDp(x, y):
    """ Calculate edit distance between sequences x and y using
        matrix dynamic programming.  Return distance. """
    D = numpy.zeros((len(x)+1, len(y)+1), dtype=int)
    # Fill 1st row, col
    D[0, 1:] = range(1, len(y)+1)
    D[1:, 0] = range(1, len(x)+1)
    # Fill rest of matrix
    for i in range(1, len(x)+1):
        for j in range(1, len(y)+1):
            delt = 1 if x[i-1] != y[j-1] else 0
            D[i, j] = min(D[i-1, j-1]+delt, D[i-1, j]+1, D[i, j-1]+1)
    return D[len(x), len(y)]
Edit distance: dynamic programming

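Let n = |x|, m = |y|

D: (n+1) x (m+1) matrix

D[i, j] = edit distance between length-i prefix of x and length-j prefix of y

Rows are labeled ϵ followed by the characters of x = GCGTATGCACGC; columns are labeled ϵ followed by the characters of y = GCTATGCCACGC. ϵ is the empty string.

[Figure: empty matrix D with rows labeled by x and columns labeled by y]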
Edit distance: dynamic programming
Value in a cell depends upon its upper, left, and upper-left neighbors

For example, D[6, 6] depends on D[5, 6] (upper), D[6, 5] (left) and D[5, 5] (upper-left)

\[
D[i, j] = \min
\begin{cases}
D[i-1, j] + 1 & \text{(upper)} \\
D[i, j-1] + 1 & \text{(left)} \\
D[i-1, j-1] + \delta(x[i-1], y[j-1]) & \text{(upper-left)}
\end{cases}
\]
Edit distance: dynamic programming
First few lines of edDistDp:

    D = numpy.zeros((len(x)+1, len(y)+1), dtype=int)
    D[0, 1:] = range(1, len(y)+1)
    D[1:, 0] = range(1, len(x)+1)

Initialize D[0, j] to j, D[i, 0] to i:

   ϵ  G  C  T  A  T  G  C  C  A  C  G  C
ϵ  0  1  2  3  4  5  6  7  8  9 10 11 12
G  1
C  2
G  3
T  4
A  5
T  6
G  7
C  8
A  9
C 10
G 11
C 12
Edit distance: dynamic programming
Loop from edDistDp:

    for i in range(1, len(x)+1):
        for j in range(1, len(y)+1):
            delt = 1 if x[i-1] != y[j-1] else 0
            D[i, j] = min(D[i-1, j-1]+delt, D[i-1, j]+1, D[i, j-1]+1)

Fill remaining cells from top row to bottom and from left to right
Edit distance: dynamic programming
Fill remaining cells from top row to bottom and from left to right

What goes in the cell at i=1, j=1 (row G, column G)?

x[i-1] = y[j-1] = 'G', so delt = 0

D[i, j] = min(D[i-1, j-1]+delt, D[i-1, j]+1, D[i, j-1]+1)
        = min(0 + 0, 1 + 1, 1 + 1)
        = 0
Edit distance: dynamic programming
After filling remaining cells from top row to bottom and from left to right:

   ϵ  G  C  T  A  T  G  C  C  A  C  G  C
ϵ  0  1  2  3  4  5  6  7  8  9 10 11 12
G  1  0  1  2  3  4  5  6  7  8  9 10 11
C  2  1  0  1  2  3  4  5  6  7  8  9 10
G  3  2  1  1  2  3  3  4  5  6  7  8  9
T  4  3  2  1  2  2  3  4  5  6  7  8  9
A  5  4  3  2  1  2  3  4  5  5  6  7  8
T  6  5  4  3  2  1  2  3  4  5  6  7  8
G  7  6  5  4  3  2  1  2  3  4  5  6  7
C  8  7  6  5  4  3  2  1  2  3  4  5  6
A  9  8  7  6  5  4  3  2  2  2  3  4  5
C 10  9  8  7  6  5  4  3  2  3  2  3  4
G 11 10  9  8  7  6  5  4  3  3  3  2  3
C 12 11 10  9  8  7  6  5  4  4  3  3  2

Edit distance for x, y is the bottom-right cell: D[12, 12] = 2
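
The filled matrix can be reproduced with a short script. The sketch below uses plain Python lists instead of numpy so that it has no dependencies; the function name edDistMatrix is hypothetical, and the letter e stands in for the empty string ϵ in the printed labels.

```python
def edDistMatrix(x, y):
    """ Return the full (len(x)+1) x (len(y)+1) matrix as a list of lists """
    D = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for j in range(len(y) + 1):
        D[0][j] = j           # first row
    for i in range(len(x) + 1):
        D[i][0] = i           # first column
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            delt = 1 if x[i-1] != y[j-1] else 0
            D[i][j] = min(D[i-1][j-1] + delt, D[i-1][j] + 1, D[i][j-1] + 1)
    return D

x, y = 'GCGTATGCACGC', 'GCTATGCCACGC'
D = edDistMatrix(x, y)
# Header row, then one labeled row per prefix of x ('e' = empty string)
print(' ' + ''.join('%3s' % c for c in 'e' + y))
for label, row in zip('e' + x, D):
    print(label + ''.join('%3d' % v for v in row))
```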
Edit distance: dynamic programming
Could we have filled the cells in a different order?

Yes: e.g. invert the loops (switched):

    for j in range(1, len(y)+1):
        for i in range(1, len(x)+1):
            delt = 1 if x[i-1] != y[j-1] else 0
            D[i, j] = min(D[i-1, j-1]+delt, D[i-1, j]+1, D[i, j-1]+1)

Or by anti-diagonal

Or blocked
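
As an illustration of the anti-diagonal order, here is a minimal sketch. The name edDistAntiDiag is hypothetical and plain lists are used instead of numpy. Any order works as long as a cell's upper, left and upper-left neighbors are filled before the cell itself.

```python
def edDistAntiDiag(x, y):
    """ Edit distance, filling the matrix one anti-diagonal at a time """
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        D[0][j] = j
    for i in range(n + 1):
        D[i][0] = i
    # Cells on anti-diagonal k satisfy i + j == k.  Their three neighbors
    # lie on anti-diagonals k-1 and k-2, which are already filled.
    for k in range(2, n + m + 1):
        for i in range(max(1, k - m), min(n, k - 1) + 1):
            j = k - i
            delt = 1 if x[i-1] != y[j-1] else 0
            D[i][j] = min(D[i-1][j-1] + delt, D[i-1][j] + 1, D[i][j-1] + 1)
    return D[n][m]

print(edDistAntiDiag('GCGTATGCACGC', 'GCTATGCCACGC'))
```

Cells on the same anti-diagonal do not depend on each other, which is one reason this order is of interest: they could in principle be computed independently.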
Edit distance: getting the alignment

Full backtrace path corresponds to an optimal alignment / edit transcript:

Start at end; at each step, ask: which predecessor gave the minimum?

   ϵ  G  C  T  A  T  G  C  C  A  C  G  C
ϵ  0  1  2  3  4  5  6  7  8  9 10 11 12
G  1  0  1  2  3  4  5  6  7  8  9 10 11
C  2  1  0  1  2  3  4  5  6  7  8  9 10
G  3  2  1  1  2  3  3  4  5  6  7  8  9
T  4  3  2  1  2  2  3  4  5  6  7  8  9
A  5  4  3  2  1  2  3  4  5  5  6  7  8
T  6  5  4  3  2  1  2  3  4  5  6  7  8
G  7  6  5  4  3  2  1  2  3  4  5  6  7
C  8  7  6  5  4  3  2  1  2  3  4  5  6
A  9  8  7  6  5  4  3  2  2  2  3  4  5
C 10  9  8  7  6  5  4  3  2  3  2  3  4
G 11 10  9  8  7  6  5  4  3  3  3  2  3
C 12 11 10  9  8  7  6  5  4  4  3  3  2

Q: How did I get here? A: From the neighboring cell (upper, left, or upper-left) that gave the minimum.
Edit distance: getting the alignment

Backtracing from D[12, 12] in the matrix for x = GCGTATGCACGC and y = GCTATGCCACGC gives:

Alignment:

GCGTATG-CACGC
|| |||| |||||
GC-TATGCCACGC

Edit transcript:

MMDMMMMIMMMMM
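
The backtrace can be sketched as below. This is an illustrative implementation under stated assumptions, not the slides' own code: the name edDistBacktrace is hypothetical, plain lists replace numpy, and when several predecessors tie the sketch prefers the diagonal, so it may return a different but equally optimal alignment than the one shown.

```python
def edDistBacktrace(x, y):
    """ Fill the matrix, then backtrace from D[len(x)][len(y)].
        Returns (distance, aligned x, aligned y, transcript). """
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        D[0][j] = j
    for i in range(n + 1):
        D[i][0] = i
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            delt = 1 if x[i-1] != y[j-1] else 0
            D[i][j] = min(D[i-1][j-1] + delt, D[i-1][j] + 1, D[i][j-1] + 1)
    i, j = n, m
    xAln, yAln, ops = [], [], []
    while i > 0 or j > 0:
        delt = 1 if i > 0 and j > 0 and x[i-1] != y[j-1] else 0
        if i > 0 and j > 0 and D[i][j] == D[i-1][j-1] + delt:
            # Came from upper-left: match or replace
            xAln.append(x[i-1]); yAln.append(y[j-1])
            ops.append('M' if delt == 0 else 'R')
            i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i-1][j] + 1:
            # Came from above: delete from x
            xAln.append(x[i-1]); yAln.append('-'); ops.append('D')
            i -= 1
        else:
            # Came from the left: insert into x
            xAln.append('-'); yAln.append(y[j-1]); ops.append('I')
            j -= 1
    # The path was collected end-to-start, so reverse it
    return (D[n][m], ''.join(reversed(xAln)),
            ''.join(reversed(yAln)), ''.join(reversed(ops)))

dist, xa, ya, transcript = edDistBacktrace('GCGTATGCACGC', 'GCTATGCCACGC')
print(xa)
print(ya)
print(transcript)
```

Each loop iteration moves at least one step up or left, so the backtrace takes at most |x| + |y| steps.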
Edit distance: summary

Matrix-filling dynamic programming algorithm is O(mn) time and space

Filling the matrix is O(mn) space and time, and yields the edit distance

Backtrace is O(m + n) time, yields optimal alignment / edit transcript
