04 Dynamic Programming 2 Editdistance

The document discusses identifying the cystic fibrosis gene. It describes how in the late 1980s, biologists narrowed the search for the gene to a region of chromosome 7 containing many genes. They then searched for genes in this region that were similar to known genes responsible for secretion, leading to the identification of a gene similar to ATP binding proteins as the cystic fibrosis gene.

Uploaded by

KINGS entertainment KHAN

Dynamic Programming:

String Comparison

Pavel Pevzner
Department of Computer Science and Engineering
University of California, San Diego

Algorithmic Design and Techniques

Algorithms and Data Structures
Cystic Fibrosis
Cystic fibrosis (CF): An often fatal disease which affects the respiratory system and produces an abnormally large amount of mucus.
Mucus is a slimy material that coats epithelial surfaces and is secreted into fluids such as saliva.
Approximately 1 in 25 Humans Carry a Faulty CF Gene
When both parents carry a faulty gene, there is a 25% chance that their child will have cystic fibrosis.
In the early 1980s biologists hypothesized that CF is caused by mutations in an unidentified gene.
Where Is the Cystic Fibrosis Gene?
In the late 1980s, biologists narrowed the search for the CF gene to a million-nucleotide-long region on chromosome 7.
However, this region contained many genes and it was unclear which of them was responsible for CF.
Hint 1: Cystic fibrosis involves sweat secretion with abnormally high sodium levels.
Hint 2: By that time biologists already knew the sequences of some genes responsible for secretion, e.g., ATP binding proteins act as transport channels responsible for secretion.
Hint 3: Should we search for genes in this region that are similar to known genes responsible for secretion?
Identifying the Cystic Fibrosis Gene
BINGO: One of the genes in this region was similar to ATP binding proteins that act as transport channels responsible for secretion.
Outline

1 The Alignment Game

2 Computing Edit Distance

3 Reconstructing an Optimal Alignment

The Alignment Game

A T G T T A T A
A T C G T C C

Alignment game: remove all symbols from two strings in such a way that the number of points is maximized:
Remove the 1st symbol from both strings: 1 point if the symbols match, 0 points if they don’t match
Remove the 1st symbol from one of the strings: 0 points
The Alignment Game

A   T   -   G   T   T   A   T   A
A   T   C   G   T   -   C   -   C
+1  +1  0   +1  +1  0   0   0   0   = 4
Sequence Alignment

A T - G T T A T A
A T C G T - C - C

Alignment of two strings is a two-row matrix:
1st row: symbols of the 1st string (in order) interspersed with “–”
2nd row: symbols of the 2nd string (in order) interspersed with “–”
Sequence Alignment

A T - G T T A T A
A T C G T - C - C

Columns are of four types: matches (A/A, T/T, G/G, T/T), insertions (-/C), deletions (T/-, T/-), and mismatches (A/C, A/C).
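
The two-row definition can be checked mechanically. The sketch below is illustrative only: the function name `is_alignment`, the ASCII `-` as the gap symbol, and the rule that no column may hold two gaps are assumptions that the slides do not state explicitly.

```python
GAP = "-"  # assumed gap symbol; the slides draw it as a dash


def is_alignment(top, bottom, a, b):
    """Return True if rows `top` and `bottom` form an alignment of strings a and b."""
    # Both rows of the matrix must have the same number of columns.
    if len(top) != len(bottom):
        return False
    # Assumed convention: a column made of two gaps is not allowed.
    if any(x == GAP and y == GAP for x, y in zip(top, bottom)):
        return False
    # Removing the gaps from each row must give back the original string, in order.
    return top.replace(GAP, "") == a and bottom.replace(GAP, "") == b


print(is_alignment("AT-GTTATA", "ATCGT-C-C", "ATGTTATA", "ATCGTCC"))
```

The check only confirms that the rows are a legal alignment; it says nothing about whether the alignment is a good one.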
Alignment Score

A   T   -   G   T   T   A   T   A
A   T   C   G   T   -   C   -   C
+1  +1  -1  +1  +1  -1  +0  -1  +0  = 1

Alignment score: premium for every match (+1) and penalty for every mismatch (−𝜇) and indel (−𝜎).

Example: 𝜇 = 0 and 𝜎 = 1
Alignment Score

#matches − 𝜇 · #mismatches − 𝜎 · #indels

Optimal alignment
Input: Two strings, mismatch penalty 𝜇, and indel penalty 𝜎.
Output: An alignment of the strings maximizing the score.
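
As a small illustration of the scoring formula, the following sketch scores a given alignment column by column. The function name `alignment_score` and the ASCII `-` gap symbol are assumptions made for this example.

```python
def alignment_score(top, bottom, mu, sigma, gap="-"):
    """Score = #matches - mu * #mismatches - sigma * #indels."""
    score = 0
    for x, y in zip(top, bottom):
        if x == gap or y == gap:
            score -= sigma   # indel column
        elif x == y:
            score += 1       # match column
        else:
            score -= mu      # mismatch column
    return score


# The example from the slides: mu = 0, sigma = 1 gives 4 matches - 3 indels.
print(alignment_score("AT-GTTATA", "ATCGT-C-C", mu=0, sigma=1))  # 1
```

This only evaluates one alignment; finding the alignment that maximizes the score is the optimization problem stated above.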
Common Subsequence

A T - G T T A T A
A T C G T - C - C

Matches in an alignment of two strings (ATGT) form their common subsequence.

Longest common subsequence
Input: Two strings.
Output: A longest common subsequence of these strings.

Maximizing the length of a common subsequence corresponds to maximizing the score of an alignment with 𝜇 = 𝜎 = 0.
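
With 𝜇 = 𝜎 = 0 only matches contribute to the score, so the best score equals the length of a longest common subsequence. A minimal sketch using the standard dynamic programming table for this length follows; the function name `lcs_length` is an assumption, and the slides themselves do not give this code.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of strings a and b."""
    # L[i][j] = LCS length of the prefixes a[:i] and b[:j].
    L = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Matching last symbols extend a common subsequence by one.
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last symbol of one of the prefixes.
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[len(a)][len(b)]


print(lcs_length("ATGTTATA", "ATCGTCC"))  # 4, e.g. ATGT
```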
Edit distance
Input: Two strings.
Output: The minimum number of operations (insertions, deletions, and substitutions of symbols) to transform one string into another.

Equivalently: the minimum number of insertions, deletions, and mismatches in an alignment of two strings (among all possible alignments).
Example

E D I − T I N G −
− D I S T A N C E

Matches: D/D, I/I, T/T, N/N. Mismatches: I/A, G/C. Deletion: E/−. Insertions: −/S, −/E.

the total number of symbols in two strings
  = 2·#matches + 2·#mismatches + 1·#insertions + 1·#deletions
  = (2·#matches − 1·#insertions − 1·#deletions) + (2·#mismatches + 2·#insertions + 2·#deletions)
  = 2 · AlignmentScore (𝜇 = 0, 𝜎 = 1/2) + 2 · EditDistance

The total number of symbols is fixed, so minimizing edit distance = maximizing alignment score.
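
The counting identity can be verified numerically on the EDITING/DISTANCE alignment. This is only a sanity check under assumed names (`column_counts`, and `-` as the gap symbol), not part of the original slides.

```python
def column_counts(top, bottom, gap="-"):
    """Count the four column types of an alignment."""
    counts = {"matches": 0, "mismatches": 0, "insertions": 0, "deletions": 0}
    for x, y in zip(top, bottom):
        if x == gap:
            counts["insertions"] += 1   # gap in the first row
        elif y == gap:
            counts["deletions"] += 1    # gap in the second row
        elif x == y:
            counts["matches"] += 1
        else:
            counts["mismatches"] += 1
    return counts


top, bottom = "EDI-TING-", "-DISTANCE"
c = column_counts(top, bottom)
indels = c["insertions"] + c["deletions"]
score = c["matches"] - 0.5 * indels     # alignment score with mu = 0, sigma = 1/2
cost = c["mismatches"] + indels         # number of edit operations in this alignment
total_symbols = len(top.replace("-", "")) + len(bottom.replace("-", ""))
print(c)
print(total_symbols, 2 * score + 2 * cost)
```

For this alignment both printed numbers are 15: 7 symbols in EDITING plus 8 in DISTANCE on one side, 2 · 2.5 + 2 · 5 on the other.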
Computing Edit Distance
Given strings A[1 . . . n] and B[1 . . . m], what is an optimal alignment (an alignment that results in minimum edit distance) of an i-prefix A[1 . . . i] of the first string and a j-prefix B[1 . . . j] of the second string?
The last column of an optimal alignment of A[1 . . . i] and B[1 . . . j] is either an insertion, a deletion, a mismatch, or a match:

case       condition      first row                  second row                 cost
insertion                 A[1 . . . i]      −         B[1 . . . j − 1]  B[j]     +1
deletion                  A[1 . . . i − 1]  A[i]      B[1 . . . j]      −        +1
mismatch   A[i] ≠ B[j]    A[1 . . . i − 1]  A[i]      B[1 . . . j − 1]  B[j]     +1
match      A[i] = B[j]    A[1 . . . i − 1]  A[i]      B[1 . . . j − 1]  B[j]     +0

What is left (after the removal of the last column) is an optimal alignment of the corresponding two prefixes.

Let D(i, j) be the edit distance of an i-prefix A[1 . . . i] and a j-prefix B[1 . . . j].

D(i, j) = min { D(i, j − 1) + 1,
                D(i − 1, j) + 1,
                D(i − 1, j − 1) + 1   if A[i] ≠ B[j],
                D(i − 1, j − 1)       if A[i] = B[j] }
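
The recurrence can be transcribed almost literally as a memoized recursive function. The sketch below is one possible rendering: the name `edit_distance_recursive` is an assumption, the base cases D(i, 0) = i and D(0, j) = j are the usual boundary values, and Python strings are 0-indexed, so A[i] becomes `a[i - 1]`.

```python
from functools import lru_cache


def edit_distance_recursive(a, b):
    """Edit distance of a and b, following the recurrence for D(i, j)."""

    @lru_cache(maxsize=None)
    def D(i, j):
        if i == 0:
            return j          # align an empty prefix of a with b[:j]: j insertions
        if j == 0:
            return i          # align a[:i] with an empty prefix of b: i deletions
        # Diagonal term: +1 for a mismatch, +0 for a match.
        diagonal = D(i - 1, j - 1) + (0 if a[i - 1] == b[j - 1] else 1)
        return min(D(i, j - 1) + 1, D(i - 1, j) + 1, diagonal)

    return D(len(a), len(b))


print(edit_distance_recursive("EDITING", "DISTANCE"))  # 5
```

Memoization keeps the number of distinct calls at (n + 1)(m + 1), but the recursion depth grows with the string lengths, so for long strings an iterative table is the safer choice.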
Comparing A[1 . . . n] = EDITING and B[1 . . . m] = DISTANCE: rows are indexed by i (prefixes of A), columns by j (prefixes of B), and cell (i, j) holds D(i, j).

Base cases, D(i, 0) = i and D(0, j) = j:

        D  I  S  T  A  N  C  E
     0  1  2  3  4  5  6  7  8
  0  0  1  2  3  4  5  6  7  8
E 1  1
D 2  2
I 3  3
T 4  4
I 5  5
N 6  6
G 7  7

The first column is filled from top to bottom:
D(1, 1) = min{D(1, 0) + 1, D(0, 1) + 1, D(0, 0) + 1} = min{2, 2, 1} = 1
D(2, 1) = min{D(2, 0) + 1, D(1, 1) + 1, D(1, 0)} = min{3, 2, 1} = 1 (A[2] = B[1] = D, so the diagonal term carries no +1)
D(3, 1) = min{D(3, 0) + 1, D(2, 1) + 1, D(2, 0) + 1} = min{4, 2, 3} = 2

The second column starts with:
D(1, 2) = min{D(1, 1) + 1, D(0, 2) + 1, D(0, 1) + 1} = min{2, 3, 2} = 2

Continuing column by column gives the complete table:

        D  I  S  T  A  N  C  E
     0  1  2  3  4  5  6  7  8
  0  0  1  2  3  4  5  6  7  8
E 1  1  1  2  3  4  5  6  7  7
D 2  2  1  2  3  4  5  6  7  8
I 3  3  2  1  2  3  4  5  6  7
T 4  4  3  2  2  2  3  4  5  6
I 5  5  4  3  3  3  3  4  5  6
N 6  6  5  4  4  4  4  3  4  5
G 7  7  6  5  5  5  5  4  4  5
EditDistance(A[1 . . . n], B[1 . . . m])
  D(i, 0) ← i and D(0, j) ← j for all i, j
  for j from 1 to m:
    for i from 1 to n:
      insertion ← D(i, j − 1) + 1
      deletion ← D(i − 1, j) + 1
      match ← D(i − 1, j − 1)
      mismatch ← D(i − 1, j − 1) + 1
      if A[i] = B[j]:
        D(i, j) ← min(insertion, deletion, match)
      else:
        D(i, j) ← min(insertion, deletion, mismatch)
  return D(n, m)
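
A possible Python rendering of the pseudocode above is sketched here. It returns the whole table rather than only D(n, m), which is a choice made for this example so that the table can be printed and reused; the name `edit_distance_table` is likewise an assumption.

```python
def edit_distance_table(a, b):
    """Fill the table D column by column, as in the EditDistance pseudocode."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            insertion = D[i][j - 1] + 1
            deletion = D[i - 1][j] + 1
            # Python strings are 0-indexed, so A[i] is a[i - 1].
            diagonal = D[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1)
            D[i][j] = min(insertion, deletion, diagonal)
    return D


D = edit_distance_table("EDITING", "DISTANCE")
for row in D:
    print(*row)
print("edit distance:", D[-1][-1])
```

Each cell is computed once from three neighbours, so the running time and the memory are both proportional to n · m.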
Reconstructing an Optimal Alignment
Optimal Alignment
We have computed the edit distance, but how can we find an optimal alignment?
The backtracking pointers that we stored will help us to reconstruct an optimal alignment.
Any path from (0, 0) to (i, j) in the grid spells an alignment of prefixes A[1 . . . i] and B[1 . . . j]. For example, one path from (0, 0) to (7, 8) spells

E − − D I T I N − G
D I S − − T A N C E

The constructed path corresponds to distance 8 and is not optimal (the edit distance is 5). To construct an optimal alignment we will use the backtracking pointers.
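
To make the path-to-alignment correspondence concrete, the sketch below turns a sequence of moves into the two alignment rows and counts the non-match columns. The move encoding (`d` for diagonal, `v` for down, `r` for right) and the function name `spell_alignment` are assumptions for this example; down moves consume a symbol of A and right moves consume a symbol of B, matching the grid in which rows follow A and columns follow B.

```python
def spell_alignment(a, b, moves):
    """Spell the alignment of a path given as a string of 'd', 'v', 'r' moves."""
    i = j = 0
    top, bottom, cost = [], [], 0
    for move in moves:
        if move == "d":        # diagonal: match or mismatch
            top.append(a[i])
            bottom.append(b[j])
            cost += a[i] != b[j]
            i += 1
            j += 1
        elif move == "v":      # down: deletion (symbol of a over a gap)
            top.append(a[i])
            bottom.append("-")
            cost += 1
            i += 1
        elif move == "r":      # right: insertion (gap over a symbol of b)
            top.append("-")
            bottom.append(b[j])
            cost += 1
            j += 1
        else:
            raise ValueError(f"unknown move: {move!r}")
    return "".join(top), "".join(bottom), cost


# The non-optimal path from the slides.
print(spell_alignment("EDITING", "DISTANCE", "drrvvdddrd"))
```

For the path shown on the slides this reproduces the two rows above with a cost of 8, three more than the edit distance.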
In the filled table, D(7, 8) = 5: the edit distance is 5. We arrived at the bottom right cell by moving along the backtracking pointers, so following them backwards from (7, 8) to (0, 0) spells an optimal alignment from right to left.

There exists an optimal alignment whose last column is a mismatch and an optimal alignment whose last column is an insertion. Let’s consider a mismatch, and continue in a similar fashion. The columns are produced in this order:

G/E (mismatch), −/C (insertion), N/N (match), I/A (mismatch), T/T (match), −/S (insertion), I/I (match), D/D (match), E/− (deletion)

The resulting optimal alignment:

E D I − T I N − G
− D I S T A N C E
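Each print below emits one alignment column, written as (first-row symbol over second-row symbol).

OutputAlignment(i, j)
  if i = 0 and j = 0:
    return
  if backtrack(i, j) = ↓:
    OutputAlignment(i − 1, j)
    print column (A[i] over −)
  else if backtrack(i, j) = →:
    OutputAlignment(i, j − 1)
    print column (− over B[j])
  else:
    OutputAlignment(i − 1, j − 1)
    print column (A[i] over B[j])

The same procedure without stored pointers, reading the moves off the table D:

OutputAlignment(i, j)
  if i = 0 and j = 0:
    return
  if i > 0 and D(i, j) = D(i − 1, j) + 1:
    OutputAlignment(i − 1, j)
    print column (A[i] over −)
  else if j > 0 and D(i, j) = D(i, j − 1) + 1:
    OutputAlignment(i, j − 1)
    print column (− over B[j])
  else:
    OutputAlignment(i − 1, j − 1)
    print column (A[i] over B[j])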
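
The pointer-free variant can be sketched in Python as follows. This version is iterative rather than recursive (a substitution made here to avoid deep recursion on long strings), collects the columns from right to left, and reverses them at the end; the name `optimal_alignment` and the `-` gap symbol are assumptions.

```python
def optimal_alignment(a, b):
    """Return (first row, second row, edit distance) of one optimal alignment."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i][j - 1] + 1,
                          D[i - 1][j] + 1,
                          D[i - 1][j - 1] + (a[i - 1] != b[j - 1]))

    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and D[i][j] == D[i - 1][j] + 1:      # deletion
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        elif j > 0 and D[i][j] == D[i][j - 1] + 1:    # insertion
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
        else:                                         # match or mismatch
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), D[n][m]


top, bottom, distance = optimal_alignment("EDITING", "DISTANCE")
print(top)
print(bottom)
print(distance)
```

Several optimal alignments may exist; which one is returned depends on the order in which the three cases are tested, so the output need not coincide with the alignment traced on the slides.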
Comparing Genes, Proteins, and Genomes MOOC (a part of Bioinformatics Specialization on Coursera)
Bioinformatics Algorithms textbook at bioinformaticsalgorithms.org (2nd two-volume edition was published in 2015)
