2 EditDistance 2022
Minimum Edit Distance
An alignment:
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
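Initialization:
D(i,0) = i
D(0,j) = j
Recurrence Relation:
For each i = 1…N
  For each j = 1…M
    D(i,j) = min of:
      D(i-1,j) + 1
      D(i,j-1) + 1
      D(i-1,j-1) + 2; if X(i) ≠ Y(j)
      D(i-1,j-1) + 0; if X(i) = Y(j)
Termination:
D(N,M) is distance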
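The table-filling computation can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function name and the default costs (1 for insertion and deletion, 2 for substitution) are assumptions, chosen because they reproduce the value 8 in the top-right cell of the INTENTION/EXECUTION table in this section.

```python
def min_edit_distance_table(x, y, ins_cost=1, del_cost=1, sub_cost=2):
    """Return the table D, where D[i][j] is the distance between x[:i] and y[:j].

    The default costs are assumptions (see the lead-in above).
    """
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]

    # Initialization: distance between a prefix and the empty string.
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + del_cost
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins_cost

    # Recurrence: each cell takes the cheapest of deletion, insertion,
    # and substitution (free when the two characters match).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else sub_cost
            D[i][j] = min(
                D[i - 1][j] + del_cost,
                D[i][j - 1] + ins_cost,
                D[i - 1][j - 1] + sub,
            )
    return D


D = min_edit_distance_table("INTENTION", "EXECUTION")
# Termination: the distance is in the last cell, D(N,M).
print(D[-1][-1])  # → 8
```

Note that the slides draw row 0 at the bottom of the table, so `D[-1]` here corresponds to the top row of the printed table.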
The Edit Distance Table
N 9
O 8
I 7
T 6
N 5
E 4
T 3
N 2
I 1
# 0 1 2 3 4 5 6 7 8 9
# E X E C U T I O N
The Edit Distance Table
N  9  8  9 10 11 12 11 10  9  8
O  8  7  8  9 10 11 10  9  8  9
I  7  6  7  8  9 10  9  8  9 10
T  6  5  6  7  8  9  8  9 10 11
N  5  4  5  6  7  8  9 10 11 10
E  4  3  4  5  6  7  8  9 10  9
T  3  4  5  6  7  8  7  8  9  8
N  2  3  4  5  6  7  8  7  8  7
I  1  2  3  4  5  6  7  6  7  8
#  0  1  2  3  4  5  6  7  8  9
   #  E  X  E  C  U  T  I  O  N
Minimum Edit Distance
Computing Minimum Edit Distance
Minimum Edit Distance
Backtrace for Computing Alignments
Computing alignments
Edit distance isn’t sufficient
◦ We often need to align each character of the two strings to each other
We do this by keeping a “backtrace”
Every time we enter a cell, remember where we came from
When we reach the end,
◦ Trace back the path from the upper right corner to read off the alignment
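The backtrace idea above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `align`, the pointer labels `DIAG`/`DOWN`/`LEFT`, the costs (1 for insertion and deletion, 2 for substitution), and the tie-breaking order are all choices made for this example. Other tie-breaking rules give different alignments with the same distance.

```python
def align(x, y, ins_cost=1, del_cost=1, sub_cost=2):
    """Return (distance, aligned_x, aligned_y), using '-' for gaps."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]  # where each cell came from

    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + del_cost
        ptr[i][0] = "DOWN"   # came from row i-1 (drawn below in the slides)
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins_cost
        ptr[0][j] = "LEFT"   # came from column j-1

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else sub_cost
            choices = [
                (D[i - 1][j - 1] + sub, "DIAG"),
                (D[i - 1][j] + del_cost, "DOWN"),
                (D[i][j - 1] + ins_cost, "LEFT"),
            ]
            # min() returns the first minimal item, so ties prefer DIAG.
            D[i][j], ptr[i][j] = min(choices, key=lambda c: c[0])

    # Trace back from the final cell to the origin, building the
    # alignment in reverse. This loop runs at most n + m times.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "DIAG":
            top.append(x[i - 1]); bottom.append(y[j - 1])
            i -= 1; j -= 1
        elif move == "DOWN":
            top.append(x[i - 1]); bottom.append("-")
            i -= 1
        else:  # LEFT
            top.append("-"); bottom.append(y[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))


distance, top, bottom = align("INTENTION", "EXECUTION")
print(distance)
print(top)
print(bottom)
```

Storing one pointer per cell keeps the example short. Storing all tied pointers instead would allow every optimal alignment to be enumerated.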
MinEdit with Backtrace
Adding Backtrace to Minimum Edit Distance
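Every non-decreasing path through the table, from the origin cell to the final cell, corresponds to an alignment of the two sequences
An optimal alignment is composed of optimal subalignments
[Figure: the distance matrix, with one axis labeled y0 … yM]
SLIDE ADAPTED FROM SERAFIM BATZOGLOU WITH PERMISSION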
Result of Backtrace
Two strings and their alignment:
Performance
Time: O(nm)
Space: O(nm)
Backtrace: O(n+m)
Minimum Edit Distance
Weighted Minimum Edit Distance
Weighted Edit Distance
Why would we add weights to the computation?
◦ Spell Correction: some letters are more likely to be mistyped than others
◦ Biology: certain kinds of deletions or insertions are more likely than others
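One way to add weights is to replace the fixed costs with per-character cost functions. The sketch below assumes this design. The function name, the `CONFUSABLE` set, and all numeric weights are hypothetical values made up for illustration, not figures from any real confusion matrix.

```python
def weighted_edit_distance(x, y, ins_cost, del_cost, sub_cost):
    """Edit distance where each operation's cost depends on the characters.

    ins_cost(c) and del_cost(c) take one character.
    sub_cost(a, b) takes the source and target characters.
    """
    n, m = len(x), len(y)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + del_cost(x[i - 1])
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins_cost(y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j] + del_cost(x[i - 1]),
                D[i][j - 1] + ins_cost(y[j - 1]),
                D[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]),
            )
    return D[n][m]


# Hypothetical weights: a few letter pairs are treated as easy to confuse.
CONFUSABLE = {("a", "e"), ("e", "a"), ("m", "n"), ("n", "m")}

def unit(ch):
    return 1.0

def sub(a, b):
    if a == b:
        return 0.0
    return 0.5 if (a, b) in CONFUSABLE else 2.0


print(weighted_edit_distance("than", "then", unit, unit, sub))  # → 0.5
print(weighted_edit_distance("than", "thin", unit, unit, sub))  # → 2.0
```

With these made-up weights, "then" is a closer candidate for "than" than "thin" is, even though both differ from it by one letter.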
Confusion matrix for spelling errors
Minimum Edit Distance
Minimum Edit Distance in Computational Biology
Sequence Alignment
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Why sequence alignment?
Comparing genes or regions from different species
◦ to find important regions
◦ determine function
◦ uncover evolutionary forces
Assembling fragments to sequence DNA
Comparing individuals to look for mutations
Alignments in two fields