
04 Dynamic Programming 2 Editdistance

The document discusses identifying the cystic fibrosis gene. It describes how in the late 1980s, biologists narrowed the search for the gene to a region of chromosome 7 containing many genes. They then searched for genes in this region that were similar to known genes responsible for secretion, leading to the identification of a gene similar to ATP binding proteins as the cystic fibrosis gene.

Dynamic Programming:

String Comparison

Pavel Pevzner
Department of Computer Science and Engineering
University of California, San Diego

Algorithmic Design and Techniques


Algorithms and Data Structures
Cystic Fibrosis
Cystic fibrosis (CF): An often
fatal disease which affects the
respiratory system and
produces an abnormally large
amount of mucus.
Mucus is a slimy material
that coats epithelial
surfaces and is secreted
into fluids such as saliva.
Approximately 1 in 25 Humans Carry a Faulty CF Gene
When BOTH parents carry a faulty gene, there is a 25% chance that their child will have cystic fibrosis.
In the early 1980s, biologists hypothesized that CF is caused by mutations in an unidentified gene.
Where Is the Cystic Fibrosis Gene?
In the late 1980s, biologists narrowed the search for the CF gene to a million-nucleotide-long region on chromosome 7.
However, this region contained many genes, and it was unclear which of them was responsible for CF.
Hint 1: Cystic fibrosis involves sweat secretion with abnormally high sodium levels.
Hint 2: By that time, biologists already knew the sequences of some genes responsible for secretion; e.g., ATP binding proteins act as transport channels responsible for secretion.
Hint 3: Should we search for genes in this region that are similar to known genes responsible for secretion?

Identifying the Cystic Fibrosis Gene
BINGO: One of the genes in this region was similar to ATP binding proteins that act as transport channels responsible for secretion.
Outline

1 The Alignment Game

2 Computing Edit Distance

3 Reconstructing an Optimal Alignment


The Alignment Game

A T G T T A T A
A T C G T C C

Alignment game: remove all symbols from two strings in such a way that the number of points is maximized:
Remove the 1st symbol from both strings: 1 point if the symbols match, 0 points if they don't match.
Remove the 1st symbol from one of the strings: 0 points.
The Alignment Game

A  T  -  G  T  T  A  T  A
A  T  C  G  T  -  C  -  C
+1 +1  0 +1 +1  0  0  0  0  = 4
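As a quick check of the scoring rule, the sketch below treats each alignment column as one move of the game and counts only the columns where two equal symbols are removed together. The helper `game_score` is hypothetical (not from the slides), and it assumes an alignment is given as two equal-length Python strings with `-` for gaps.

```python
def game_score(top: str, bottom: str) -> int:
    """Score an alignment under the alignment-game rules.

    Each column is one move: removing the first symbol from both strings
    earns 1 point if the symbols match and 0 otherwise; a column containing
    '-' removes a symbol from only one string and earns 0 points.
    """
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(1 for a, b in zip(top, bottom) if a != "-" and a == b)


print(game_score("AT-GTTATA", "ATCGT-C-C"))  # 4
```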
Sequence Alignment

A T - G T T A T A
A T C G T - C - C

An alignment of two strings is a two-row matrix:
1st row: the symbols of the 1st string (in order), interspersed with “–”
2nd row: the symbols of the 2nd string (in order), interspersed with “–”
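A minimal sketch of this definition, assuming the two rows are Python strings with `-` as the gap symbol. The helper name `is_alignment` is hypothetical. The extra check that no column consists of two gaps is a standard convention added here as an assumption, since the slide does not state it.

```python
def is_alignment(top: str, bottom: str, A: str, B: str) -> bool:
    """Check that the two rows (top, bottom) form an alignment of strings A and B."""
    if len(top) != len(bottom):          # a two-row matrix: both rows have equal length
        return False
    # Assumed convention: a column made of two gaps carries no symbol and is not allowed.
    if any(a == "-" and b == "-" for a, b in zip(top, bottom)):
        return False
    # Deleting the gaps must give back each string with its symbols in order.
    return top.replace("-", "") == A and bottom.replace("-", "") == B


print(is_alignment("AT-GTTATA", "ATCGT-C-C", "ATGTTATA", "ATCGTCC"))  # True
```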
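Sequence Alignment

A T - G T T A T A
A T C G T - C - C

Each column is a match (columns 1, 2, 4, and 5), an insertion (column 3, - over C), a deletion (columns 6 and 8, T over -), or a mismatch (columns 7 and 9, A over C).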
Alignment Score

Alignment score: premium for every match (+1) and penalty for every mismatch (−𝜇) and every indel (−𝜎).

Example: 𝜇 = 0 and 𝜎 = 1

A  T  -  G  T  T  A  T  A
A  T  C  G  T  -  C  -  C
+1 +1 -1 +1 +1 -1 +0 -1 +0 = 1
Alignment Score

#matches − 𝜇 · #mismatches − 𝜎 · #indels

Optimal alignment
Input: Two strings, mismatch penalty 𝜇,
and indel penalty 𝜎.
Output: An alignment of the strings
maximizing the score.
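The score formula translates directly into a column-by-column sum. This is a sketch with a hypothetical `alignment_score` helper, again assuming two equal-length rows with `-` for gaps; with 𝜇 = 0 and 𝜎 = 1 it reproduces the example score of 1.

```python
def alignment_score(top: str, bottom: str, mu: float, sigma: float) -> float:
    """Return #matches - mu * #mismatches - sigma * #indels for a two-row alignment."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    score = 0.0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score -= sigma      # indel column (insertion or deletion)
        elif a == b:
            score += 1          # match
        else:
            score -= mu         # mismatch
    return score


print(alignment_score("AT-GTTATA", "ATCGT-C-C", mu=0, sigma=1))  # 1.0
```

Changing 𝜇 and 𝜎 changes which alignment is optimal, which is why both are inputs of the problem.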
Common Subsequence

A T - G T T A T A
A T C G T - C - C

The matches in an alignment of two strings (here, ATGT) form a common subsequence of the two strings.
Longest common subsequence
Input: Two strings.
Output: A longest common subsequence of
these strings.

Maximizing the length of a common subsequence corresponds to maximizing the score of an alignment with 𝜇 = 𝜎 = 0, since the score then simply counts the matches.
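To make the correspondence concrete, here is a sketch of the standard dynamic program for the length of a longest common subsequence, viewed as the 𝜇 = 𝜎 = 0 alignment score. The name `lcs_length` is hypothetical, and the table layout anticipates the prefix-based approach used for edit distance.

```python
def lcs_length(A: str, B: str) -> int:
    """Length of a longest common subsequence of A and B (best score with mu = sigma = 0)."""
    n, m = len(A), len(B)
    L = [[0] * (m + 1) for _ in range(n + 1)]  # L[i][j]: best score for A[:i] and B[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Indel columns score 0, so we may drop the last symbol of either prefix.
            L[i][j] = max(L[i - 1][j], L[i][j - 1])
            # A mismatch column also scores 0 and can never beat the options above,
            # so only a matching last column (+1) needs to be checked.
            if A[i - 1] == B[j - 1]:
                L[i][j] = max(L[i][j], L[i - 1][j - 1] + 1)
    return L[n][m]


print(lcs_length("ATGTTATA", "ATCGTCC"))  # 4 (for example, ATGT)
```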
Edit distance
Input: Two strings.
Output: The minimum number of
operations (insertions, deletions,
and substitutions of symbols) to
transform one string into another.

Equivalently, the edit distance is the minimum number of insertions, deletions, and mismatches in an alignment of the two strings (among all possible alignments).
Example

E D I − T I N G −
− D I S T A N C E
In this alignment, the columns D/D, I/I, T/T, and N/N are matches; I/A and G/C are mismatches; E over − is a deletion; and − over S and − over E are insertions. The alignment therefore uses 2 + 1 + 2 = 5 edit operations.

The total number of symbols in the two strings can be counted column by column: a match or mismatch column holds two symbols, and an insertion or deletion column holds one.

$$
\begin{aligned}
\text{total number of symbols} &= 2\cdot\#\text{matches} + 2\cdot\#\text{mismatches} + 1\cdot\#\text{insertions} + 1\cdot\#\text{deletions} \\
&= \underbrace{\left(2\cdot\#\text{matches} - 1\cdot\#\text{insertions} - 1\cdot\#\text{deletions}\right)}_{2\,\cdot\,\text{AlignmentScore}\;(\mu = 0,\ \sigma = 1/2)}
+ \underbrace{\left(2\cdot\#\text{mismatches} + 2\cdot\#\text{insertions} + 2\cdot\#\text{deletions}\right)}_{2\,\cdot\,\text{EditDistance}}
\end{aligned}
$$

The total number of symbols does not depend on the alignment, so minimizing the edit distance is the same as maximizing the alignment score with 𝜇 = 0 and 𝜎 = 1/2.
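The identity can be checked numerically on the EDITING/DISTANCE alignment. The sketch below uses a hypothetical `column_counts` helper that classifies columns as the example does (an insertion is a gap in the top row, a deletion a gap in the bottom row) and confirms that 15 = 2 · 2.5 + 2 · 5.

```python
def column_counts(top: str, bottom: str) -> dict:
    """Count matches, mismatches, insertions (gap on top) and deletions (gap at bottom)."""
    counts = {"matches": 0, "mismatches": 0, "insertions": 0, "deletions": 0}
    for a, b in zip(top, bottom):
        if a == "-":
            counts["insertions"] += 1
        elif b == "-":
            counts["deletions"] += 1
        elif a == b:
            counts["matches"] += 1
        else:
            counts["mismatches"] += 1
    return counts


top, bottom = "EDI-TING-", "-DISTANCE"  # the EDITING / DISTANCE example, '-' for gaps
c = column_counts(top, bottom)
mu, sigma = 0, 0.5

total_symbols = len(top.replace("-", "")) + len(bottom.replace("-", ""))
score = c["matches"] - mu * c["mismatches"] - sigma * (c["insertions"] + c["deletions"])
edit_ops = c["mismatches"] + c["insertions"] + c["deletions"]

print(total_symbols, score, edit_ops)  # 15 2.5 5
assert total_symbols == 2 * score + 2 * edit_ops
```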
Computing Edit Distance

Given strings A[1 . . . n] and B[1 . . . m], what is an optimal alignment (an alignment that results in the minimum edit distance) of an i-prefix A[1 . . . i] of the first string and a j-prefix B[1 . . . j] of the second string?
The last column of an optimal alignment of A[1 . . . i] and B[1 . . . j] is either
an insertion (− over B[j]), which costs +1 and leaves an alignment of A[1 . . . i] and B[1 . . . j − 1],
a deletion (A[i] over −), which costs +1 and leaves an alignment of A[1 . . . i − 1] and B[1 . . . j],
a mismatch (A[i] over B[j] with A[i] ≠ B[j]), which costs +1 and leaves an alignment of A[1 . . . i − 1] and B[1 . . . j − 1],
or a match (A[i] over B[j] with A[i] = B[j]), which costs nothing and leaves an alignment of A[1 . . . i − 1] and B[1 . . . j − 1].
What is left (after the removal of the last column) is an optimal alignment of the corresponding two prefixes.

Let D(i, j) be the edit distance of an i-prefix A[1 . . . i] and a j-prefix B[1 . . . j].




$$
D(i, j) = \min
\begin{cases}
D(i, j - 1) + 1 \\
D(i - 1, j) + 1 \\
D(i - 1, j - 1) + 1 & \text{if } A[i] \neq B[j] \\
D(i - 1, j - 1) & \text{if } A[i] = B[j]
\end{cases}
$$
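The recurrence can also be evaluated directly, top-down, by memoizing D(i, j). This is a sketch rather than the slides' method: it assumes ordinary 0-indexed Python strings (so the recurrence's A[i] is `A[i - 1]` here) and uses the base cases D(i, 0) = i and D(0, j) = j, since aligning a prefix with an empty string takes one indel per symbol.

```python
from functools import lru_cache


def edit_distance(A: str, B: str) -> int:
    """Edit distance via the recurrence for D(i, j), evaluated top-down with memoization."""

    @lru_cache(maxsize=None)
    def D(i: int, j: int) -> int:
        if i == 0:
            return j  # j insertions
        if j == 0:
            return i  # i deletions
        diagonal = D(i - 1, j - 1) + (0 if A[i - 1] == B[j - 1] else 1)  # match / mismatch
        return min(
            D(i, j - 1) + 1,  # insertion: last column is (-, B[j])
            D(i - 1, j) + 1,  # deletion: last column is (A[i], -)
            diagonal,
        )

    return D(len(A), len(B))


print(edit_distance("EDITING", "DISTANCE"))  # 5
```

The recursion can go about len(A) + len(B) calls deep, so for long strings an iterative table fill avoids Python's default recursion limit.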
We fill in the table D(i, j) comparing A[1 . . . n] = EDITING (rows, i = 0, . . . , 7) and B[1 . . . m] = DISTANCE (columns, j = 0, . . . , 8).
First, D(i, 0) = i and D(0, j) = j: aligning a prefix of length k with the empty string takes k indels, so column 0 reads 0, 1, . . . , 7 and row 0 reads 0, 1, . . . , 8.
D(1, 1) = min{D(1, 0) + 1, D(0, 1) + 1, D(0, 0) + 1} = min{2, 2, 1} = 1 (A[1] = E ≠ D = B[1], so the diagonal term carries +1).
D(2, 1) = min{D(2, 0) + 1, D(1, 1) + 1, D(1, 0)} = min{3, 2, 1} = 1 (A[2] = B[1] = D, so the diagonal term carries no +1).
D(3, 1) = min{D(3, 0) + 1, D(2, 1) + 1, D(2, 0) + 1} = min{4, 2, 3} = 2.
Continuing down the first column gives D(4, 1) = 3, D(5, 1) = 4, D(6, 1) = 5, and D(7, 1) = 6.
D(1, 2) = min{D(1, 1) + 1, D(0, 2) + 1, D(0, 1) + 1} = min{2, 3, 2} = 2.
Filling the second column the same way gives D(i, 2) = 2, 2, 1, 2, 3, 4, 5 for i = 1, . . . , 7.
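The completed table (rows labeled by the letters of EDITING, columns by the letters of DISTANCE; the unlabeled row and column are i = 0 and j = 0):

        D  I  S  T  A  N  C  E
     0  1  2  3  4  5  6  7  8
E    1  1  2  3  4  5  6  7  7
D    2  1  2  3  4  5  6  7  8
I    3  2  1  2  3  4  5  6  7
T    4  3  2  2  2  3  4  5  6
I    5  4  3  3  3  3  4  5  6
N    6  5  4  4  4  4  3  4  5
G    7  6  5  5  5  5  4  4  5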
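A sketch that rebuilds and prints the whole table, filling it column by column in the same order as the walkthrough. The function names `edit_distance_table` and `print_table` are hypothetical; strings are 0-indexed Python strings.

```python
def edit_distance_table(A: str, B: str) -> list:
    """Return the full table D, where D[i][j] is the edit distance of A[:i] and B[:j]."""
    n, m = len(A), len(B)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i                      # first column
    for j in range(m + 1):
        D[0][j] = j                      # first row
    for j in range(1, m + 1):            # column by column, as in the walkthrough
        for i in range(1, n + 1):
            diagonal = D[i - 1][j - 1] + int(A[i - 1] != B[j - 1])  # +1 only on a mismatch
            D[i][j] = min(D[i][j - 1] + 1, D[i - 1][j] + 1, diagonal)
    return D


def print_table(A: str, B: str, D: list) -> None:
    """Print D with the letters of B across the top and the letters of A down the side."""
    print("   " + " ".join(f"{ch:>2}" for ch in " " + B))
    for label, row in zip(" " + A, D):
        print(f"{label:>2} " + " ".join(f"{x:>2}" for x in row))


A, B = "EDITING", "DISTANCE"
D = edit_distance_table(A, B)
print_table(A, B, D)
print("edit distance:", D[len(A)][len(B)])  # edit distance: 5
```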
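def edit_distance(A, B):
    # A and B are Python strings (0-indexed): the slides' A[i] is A[i - 1] here.
    n, m = len(A), len(B)
    # D[i][j] is the edit distance of the prefixes A[1 . . . i] and B[1 . . . j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            insertion = D[i][j - 1] + 1
            deletion = D[i - 1][j] + 1
            match = D[i - 1][j - 1]
            mismatch = D[i - 1][j - 1] + 1
            if A[i - 1] == B[j - 1]:
                D[i][j] = min(insertion, deletion, match)
            else:
                D[i][j] = min(insertion, deletion, mismatch)
    return D[n][m]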
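The EditDistance procedure above returns only D(n, m), while reconstructing an alignment relies on backtracking pointers. The sketch below is one way (an assumption, not the slides' code) to record a pointer per cell while filling the table: `↓` for a deletion, `→` for an insertion, and `↘` for a match or mismatch (the `↘` label is our own name for the diagonal case). When several predecessors tie, it keeps the first in the order insertion, deletion, diagonal, so it stores just one of possibly several valid pointers. The names `edit_distance_with_pointers` and `follow_pointers` are hypothetical.

```python
def edit_distance_with_pointers(A: str, B: str):
    """Fill the edit-distance table D and a table of backtracking pointers."""
    n, m = len(A), len(B)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    backtrack = [[""] * (m + 1) for _ in range(n + 1)]  # cell (0, 0) has no pointer
    for i in range(1, n + 1):
        D[i][0], backtrack[i][0] = i, "↓"  # first column: deletions only
    for j in range(1, m + 1):
        D[0][j], backtrack[0][j] = j, "→"  # first row: insertions only
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            options = [
                (D[i][j - 1] + 1, "→"),                               # insertion
                (D[i - 1][j] + 1, "↓"),                               # deletion
                (D[i - 1][j - 1] + int(A[i - 1] != B[j - 1]), "↘"),  # match / mismatch
            ]
            # min() returns the first of several equal minima, which fixes the tie order.
            D[i][j], backtrack[i][j] = min(options, key=lambda option: option[0])
    return D, backtrack


def follow_pointers(A: str, B: str, backtrack) -> tuple:
    """Walk the pointers from (len(A), len(B)) back to (0, 0) and return the two rows."""
    i, j = len(A), len(B)
    top, bottom = [], []
    while i > 0 or j > 0:
        step = backtrack[i][j]
        if step == "↓":
            top.append(A[i - 1])
            bottom.append("-")
            i -= 1
        elif step == "→":
            top.append("-")
            bottom.append(B[j - 1])
            j -= 1
        else:
            top.append(A[i - 1])
            bottom.append(B[j - 1])
            i, j = i - 1, j - 1
    return "".join(reversed(top)), "".join(reversed(bottom))


D, backtrack = edit_distance_with_pointers("EDITING", "DISTANCE")
print(D[7][8], backtrack[7][8])  # 5 →
print(*follow_pointers("EDITING", "DISTANCE", backtrack), sep="\n")  # EDI-TING- then -DISTANCE
```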
Reconstructing an Optimal Alignment

We have computed the edit distance, but how can we find an optimal alignment?
Backtracking pointers, which record for every cell (i, j) a neighboring cell that attains the minimum in the recurrence, will help us reconstruct an optimal alignment.
In the table, every move corresponds to one alignment column: a move down, from (i − 1, j) to (i, j), adds the column A[i] over − (a deletion); a move right, from (i, j − 1) to (i, j), adds the column − over B[j] (an insertion); and a diagonal move, from (i − 1, j − 1) to (i, j), adds the column A[i] over B[j] (a match or a mismatch). Hence any path from (0, 0) to (i, j) spells an alignment of the prefixes A[1 . . . i] and B[1 . . . j].

For example, the path (0, 0) → (1, 1) → (1, 2) → (1, 3) → (2, 3) → (3, 3) → (4, 4) → (5, 5) → (6, 6) → (6, 7) → (7, 8) spells

E − − D I T I N − G
D I S − − T A N C E
This path corresponds to an alignment with 8 edit operations (every column except T/T and N/N costs 1), so it is not optimal: the edit distance is 5.
To construct an optimal alignment, we will use the backtracking pointers.
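Counting the cost of the alignment spelled by a path takes one pass over its columns: every column except a match costs one operation. The sketch below uses a hypothetical `edit_cost` helper and compares the path alignment E − − D I T I N − G over D I S − − T A N C E with the optimal alignment E D I − T I N − G over − D I S T A N C E (written with `-` for gaps).

```python
def edit_cost(top: str, bottom: str) -> int:
    """Number of edit operations in an alignment: every column that is not a match costs 1."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(1 for a, b in zip(top, bottom) if a == "-" or b == "-" or a != b)


print(edit_cost("E--DITIN-G", "DIS--TANCE"))  # 8: the arbitrary path
print(edit_cost("EDI-TIN-G", "-DISTANCE"))    # 5: an optimal alignment
```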
The edit distance is D(7, 8) = 5, the value in the bottom-right cell of the completed table.
We arrived at the bottom-right cell by moving along backtracking pointers. For cell (7, 8), two pointers apply: D(6, 7) + 1 = 5 (a mismatch, since G ≠ E) and D(7, 7) + 1 = 5 (an insertion). So there exists an optimal alignment whose last column is a mismatch and an optimal alignment whose last column is an insertion.
Let's consider the mismatch and continue in a similar fashion, adding one column per step to the left end of the alignment:
(7, 8) → (6, 7): mismatch, G over E
(6, 7) → (6, 6): insertion, − over C
(6, 6) → (5, 5): match, N over N
(5, 5) → (4, 4): mismatch, I over A
(4, 4) → (3, 3): match, T over T
(3, 3) → (3, 2): insertion, − over S
(3, 2) → (2, 1): match, I over I
(2, 1) → (1, 0): match, D over D
(1, 0) → (0, 0): deletion, E over −
Reading the columns from left to right gives an optimal alignment with 5 edit operations:

E D I − T I N − G
− D I S T A N C E
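The walk above can be automated without stored pointers by re-deriving each step from D. In this sketch (hypothetical `trace_back` helper, written iteratively so long strings do not hit Python's recursion limit), ties are broken by preferring the diagonal, then an insertion, then a deletion; with that order it happens to reproduce the alignment traced above. A different tie-breaking order can return a different alignment with the same edit distance 5.

```python
def edit_distance_table(A: str, B: str) -> list:
    """D[i][j] = edit distance of the prefixes A[:i] and B[:j]."""
    n, m = len(A), len(B)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            diagonal = D[i - 1][j - 1] + int(A[i - 1] != B[j - 1])
            D[i][j] = min(D[i][j - 1] + 1, D[i - 1][j] + 1, diagonal)
    return D


def trace_back(A: str, B: str, D: list) -> tuple:
    """Rebuild one optimal alignment from D alone, walking from (n, m) to (0, 0).

    Tie-breaking order: diagonal (match/mismatch) first, then insertion, then deletion.
    """
    i, j = len(A), len(B)
    top, bottom = [], []  # columns are collected right to left
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + int(A[i - 1] != B[j - 1]):
            top.append(A[i - 1])   # match or mismatch: A[i] over B[j]
            bottom.append(B[j - 1])
            i, j = i - 1, j - 1
        elif j > 0 and D[i][j] == D[i][j - 1] + 1:
            top.append("-")        # insertion: gap over B[j]
            bottom.append(B[j - 1])
            j -= 1
        else:
            top.append(A[i - 1])   # deletion: A[i] over a gap
            bottom.append("-")
            i -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


A, B = "EDITING", "DISTANCE"
top, bottom = trace_back(A, B, edit_distance_table(A, B))
print(top)     # EDI-TIN-G
print(bottom)  # -DISTANCE
```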
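def output_alignment(i, j, A, B, backtrack):
    # Returns the two rows of an alignment of A[1 . . . i] and B[1 . . . j] instead of
    # printing it column by column. backtrack[i][j] is "↓" (deletion) or "→" (insertion);
    # any other value means a diagonal step. Strings are 0-indexed: A[i] is A[i - 1].
    if i == 0 and j == 0:
        return "", ""
    if backtrack[i][j] == "↓":
        top, bottom = output_alignment(i - 1, j, A, B, backtrack)
        return top + A[i - 1], bottom + "-"
    elif backtrack[i][j] == "→":
        top, bottom = output_alignment(i, j - 1, A, B, backtrack)
        return top + "-", bottom + B[j - 1]
    else:
        top, bottom = output_alignment(i - 1, j - 1, A, B, backtrack)
        return top + A[i - 1], bottom + B[j - 1]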
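def output_alignment(i, j, A, B, D):
    # Same reconstruction without stored pointers: each step is re-derived from D.
    if i == 0 and j == 0:
        return "", ""
    if i > 0 and D[i][j] == D[i - 1][j] + 1:        # deletion: A[i] over a gap
        top, bottom = output_alignment(i - 1, j, A, B, D)
        return top + A[i - 1], bottom + "-"
    elif j > 0 and D[i][j] == D[i][j - 1] + 1:      # insertion: a gap over B[j]
        top, bottom = output_alignment(i, j - 1, A, B, D)
        return top + "-", bottom + B[j - 1]
    else:                                           # match or mismatch
        top, bottom = output_alignment(i - 1, j - 1, A, B, D)
        return top + A[i - 1], bottom + B[j - 1]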
Comparing Genes, Proteins, and Genomes MOOC (a part of the Bioinformatics Specialization on Coursera)
Bioinformatics Algorithms textbook at bioinformaticsalgorithms.org (the 2nd, two-volume edition was published in 2015)
