
What is dynamic programming?

Sean R. Eddy
Howard Hughes Medical Institute & Department of Genetics,
Washington University School of Medicine
4444 Forest Park Blvd., Box 8510
Saint Louis, Missouri 63108 USA
[email protected]
June 8, 2004

Sequence alignment methods often use something called a “dynamic programming”
algorithm. What is dynamic programming, and how does it work?

Dynamic programming algorithms are a very good place to start understanding
what’s really going on inside computational biology software. The heart of
many well-known programs is a dynamic programming algorithm, or a fast
approximation of one, including sequence database search programs like BLAST
and FASTA, multiple sequence alignment programs like CLUSTALW, profile
search programs like HMMER, genefinding programs like GENSCAN, and even
RNA folding programs like MFOLD and phylogenetic inference programs like
PHYLIP.
Don’t expect much enlightenment from the etymology of the term “dynamic
programming”, though. Dynamic programming was formalized in the early 1950s
by mathematician Richard Bellman, who was working at RAND Corporation on
optimal decision processes. He wanted to concoct an impressive name that would
shield his work from U.S. Secretary of Defense Charles Wilson, a man known
to be hostile to mathematics research. His work involved time series and
planning – hence “dynamic” and “programming” (note, nothing particularly to do
with computer programming). Bellman especially liked “dynamic” because “it’s
impossible to use the word dynamic in a pejorative sense”; he figured dynamic
programming was “something not even a Congressman could object to” [1].

The best way to understand how dynamic programming works is to see an
example. Conveniently, optimal sequence alignment provides an example that is
both simple and biologically relevant.

The biological problem: pairwise sequence alignment

We have two DNA or protein sequences, and we want to infer if they are
homologous or not. To do this, we will calculate a score that reflects how similar
the two sequences are (that is, how likely they are to be derived from a common
ancestor). Since sequences will differ not just by substitution, but also by
insertion and deletion, we want to optimally align the two sequences to maximize
their similarity.
Why do we need a fancy algorithm? Can we just score all possible alignments
and pick the best one? This isn’t practical, because there are about
$2^{2N} / \sqrt{2\pi N}$ different alignments for two sequences of length $N$;
for two sequences of length 300, that’s about $10^{179}$ different alignments.
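As a quick sanity check on that figure, here is a minimal sketch that evaluates the approximation in log space, reading the formula as 2^(2N) / sqrt(2*pi*N). The function name `log10_alignment_count` is hypothetical and chosen only for this illustration.

```python
import math

def log10_alignment_count(n: int) -> float:
    """log10 of the approximation 2^(2N) / sqrt(2*pi*N) quoted in the text.

    Working in log space avoids building an astronomically large number.
    """
    return 2 * n * math.log10(2) - 0.5 * math.log10(2 * math.pi * n)

# For two sequences of length 300 the exponent comes out close to 179.
print(round(log10_alignment_count(300)))  # 179
```

Even at one alignment scored per nanosecond, a search space of this size could not be enumerated, which is why the recursive formulation that follows matters.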
Let’s set up the problem with some notation. Call the two sequences $x$ and
$y$. They are of length $M$ and $N$ residues, respectively. Write $x_i$ for the
$i$’th residue of $x$, and $y_j$ for the $j$’th residue of $y$. We need some
parameters for how to score alignments: we’ll use a scoring matrix $\sigma(a, b)$
for aligning two residues $a, b$ to each other (e.g. a 4×4 matrix for scoring any
pair of aligned DNA nucleotides, or simply a match and a mismatch score), and a
gap penalty $\gamma$ for every time we introduce a gap character.
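To make the notation concrete, the sketch below scores one given gapped alignment column by column. It assumes the simple match/mismatch form of σ with the values used in the Figure 1 legend (+5, −2) and a gap penalty γ of −6; the names `sigma`, `GAMMA`, and `score_alignment` are hypothetical.

```python
GAMMA = -6  # gap penalty per gap character (value from the Figure 1 legend)

def sigma(a: str, b: str, match: int = 5, mismatch: int = -2) -> int:
    """Simplest possible scoring 'matrix': one match and one mismatch score."""
    return match if a == b else mismatch

def score_alignment(row_x: str, row_y: str) -> int:
    """Sum the per-column scores of an alignment written with '-' for gaps."""
    if len(row_x) != len(row_y):
        raise ValueError("both alignment rows must have the same length")
    total = 0
    for a, b in zip(row_x, row_y):
        if a == "-" or b == "-":
            total += GAMMA          # residue aligned to a gap character
        else:
            total += sigma(a, b)    # residue aligned to residue
    return total

# The alignment shown in Figure 1: +5 -6 -6 +5 +5 -2 +5 +5
print(score_alignment("T--TCATA", "TGCTCGTA"))  # 11
```

Because the total is a plain sum over columns, each column can be scored without looking at its neighbours, which is the property the recursion relies on.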
A dynamic programming algorithm consists of four parts: a recursive
definition of the optimal score; a dynamic programming matrix for remembering
optimal scores of subproblems; a bottom-up approach of filling the matrix by
solving the smallest subproblems first; and a traceback of the matrix to recover
the structure of the optimal solution that gave the optimal score. For pairwise
alignment, those steps are the following:

Recursive definition of the optimal alignment score.

There are only three ways the alignment can possibly end: 1) residues $x_M$ and
$y_N$ are aligned to each other; 2) residue $x_M$ is aligned to a gap character,
and $y_N$ appeared somewhere earlier in the alignment; or 3) residue $y_N$ is
aligned to a gap character and $x_M$ appeared earlier in the alignment. The
optimal alignment will be the highest scoring of these three cases.

Crucially, our scoring system allows us to define the score of these three cases
recursively, in terms of optimal alignments of the preceding subsequences. Let
$S(i, j)$ be the score of the optimum alignment of sequence prefix $x_1..x_i$ to
prefix $y_1..y_j$. The score for case (1) above is the score $\sigma(x_M, y_N)$
for aligning $x_M$ to $y_N$, plus the score $S(M-1, N-1)$ for an optimal
alignment of everything else up to this point. Case (2) is the gap penalty
$\gamma$ plus the score $S(M-1, N)$; case (3) is the gap penalty $\gamma$ plus
the score $S(M, N-1)$.
This works because the problem breaks into independently optimizable pieces,
since the scoring system is strictly local to one aligned column at a time. That
is, for instance, the optimal alignment of $x_1..x_{M-1}$ to $y_1..y_{N-1}$ is
unaffected by adding on the aligned residue pair $x_M, y_N$, and likewise, the
score $\sigma(x_M, y_N)$ we add on is independent of the previous optimal
alignment.
So, to calculate the score of the three cases, we will need to know three more
alignment scores for three smaller problems:
$S(M-1, N-1)$, $S(M-1, N)$, $S(M, N-1)$.
And to calculate those, we need the solutions for nine still smaller problems:
$S(M-2, N-2)$, $S(M-2, N-1)$, $S(M-1, N-2)$,
$S(M-2, N-1)$, $S(M-2, N)$, $S(M-1, N-1)$,
$S(M-1, N-2)$, $S(M-1, N-1)$, $S(M, N-2)$,
and so on, until we reach tiny alignment subproblems with obvious solutions (the
score $S(0, 0)$ for aligning nothing to nothing is zero).
Thus, we can write a general recursive definition of all our partial optimal
alignment scores $S(i, j)$:

$$
S(i, j) = \max \begin{cases}
S(i-1, j-1) + \sigma(x_i, y_j) \\
S(i-1, j) + \gamma \\
S(i, j-1) + \gamma
\end{cases}
\tag{1}
$$
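The recursive definition can be transcribed almost literally into code. The following is a minimal sketch, not the author's C program: it assumes the +5/−2/−6 scoring from the Figure 1 legend, handles the prefix boundaries by paying one gap penalty per unaligned residue, and uses hypothetical names (`sigma`, `GAMMA`, `S`). It is deliberately naive and recomputes subproblems.

```python
GAMMA = -6  # gap penalty (illustrative value from the Figure 1 legend)

def sigma(a: str, b: str) -> int:
    """Match/mismatch score (+5 / -2, as in the Figure 1 legend)."""
    return 5 if a == b else -2

def S(x: str, y: str, i: int, j: int) -> int:
    """Optimal score for aligning prefix x[:i] to prefix y[:j], per equation (1)."""
    if i == 0:
        return GAMMA * j   # nothing left in x: y[:j] aligns to j gap characters
    if j == 0:
        return GAMMA * i   # nothing left in y: x[:i] aligns to i gap characters
    return max(
        S(x, y, i - 1, j - 1) + sigma(x[i - 1], y[j - 1]),  # case 1: x_i aligned to y_j
        S(x, y, i - 1, j) + GAMMA,                          # case 2: x_i aligned to a gap
        S(x, y, i, j - 1) + GAMMA,                          # case 3: y_j aligned to a gap
    )

x, y = "TTCATA", "TGCTCGTA"
print(S(x, y, len(x), len(y)))  # 11
```

Python strings are 0-indexed, so residue $x_i$ of the text is `x[i - 1]` here. This version is fine for the short sequences of Figure 1 but becomes impractical quickly as the sequences grow.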

The dynamic programming matrix.

The problem with a purely recursive alignment algorithm may already be
obvious, if you looked carefully at that list of nine smaller subproblems we’d be
solving in the second round of the top-down recursion. Some subproblems are
already occurring more than once, and this wastage gets exponentially worse as
we recurse deeper. Clearly, the sensible thing to do is to somehow keep track of
which subproblems we have already solved. This is the key difference between
dynamic programming and simple recursion: a dynamic programming algorithm
memorizes the optimal solutions of subproblems in an organized, tabular form (a
dynamic programming matrix), so that each subproblem is solved just once.
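The cost of the repeated subproblems can be seen by counting how often the recursion body runs with and without a table of remembered answers. This is an illustrative sketch only: it uses Python's `functools.lru_cache` as the "table" rather than an explicit matrix, assumes the Figure 1 scoring values, and the name `optimal_score` is hypothetical.

```python
from functools import lru_cache

GAMMA = -6  # gap penalty (illustrative value from the Figure 1 legend)

def sigma(a: str, b: str) -> int:
    return 5 if a == b else -2

def optimal_score(x: str, y: str, memoize: bool):
    """Return (optimal score, number of times the recursion body executed)."""
    calls = 0

    def S(i: int, j: int) -> int:
        nonlocal calls
        calls += 1
        if i == 0:
            return GAMMA * j
        if j == 0:
            return GAMMA * i
        return max(S(i - 1, j - 1) + sigma(x[i - 1], y[j - 1]),
                   S(i - 1, j) + GAMMA,
                   S(i, j - 1) + GAMMA)

    if memoize:
        # Rebinding S makes the recursive calls above go through the cache,
        # so each distinct (i, j) is computed only once.
        S = lru_cache(maxsize=None)(S)
    return S(len(x), len(y)), calls

x, y = "TTCATA", "TGCTCGTA"
naive_score, naive_calls = optimal_score(x, y, memoize=False)
memo_score, memo_calls = optimal_score(x, y, memoize=True)
print(naive_score, memo_score)   # same optimal score either way
print(naive_calls, memo_calls)   # memoized count is (M + 1) * (N + 1) = 63
```

With the cache, the body runs once per cell of the (M + 1) by (N + 1) table; without it, the count is far larger even for these short sequences.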
For the pairwise sequence alignment algorithm, the optimal scores $S(i, j)$ are
tabulated in a two-dimensional matrix, with $i$ running from $0..M$ and $j$
running from $0..N$, as shown in Figure 1. As we calculate solutions to
subproblems $S(i, j)$, their optimal alignment scores are stored in the
appropriate $(i, j)$ cell of the matrix.

A bottom-up calculation to get the optimal score.

Once the dynamic programming matrix $S(i, j)$ is laid out – either on a
napkin or in your computer’s memory – it is easy to fill it in in a “bottom-up” way,
from smallest problems to progressively bigger problems. We know the boundary
conditions in the leftmost column and topmost row ($S(0, 0) = 0$; $S(i, 0) = \gamma i$;
$S(0, j) = \gamma j$): for example, the optimum alignment of the first $i$ residues of
sequence $x$ to nothing in sequence $y$ has only one possible solution, which is to
align $x_1..x_i$ to gap characters and pay $i$ gap penalties. Once we’ve initialized the
top row and left column, we can fill in the rest of the matrix by using the recursive
definition of $S(i, j)$ to calculate any cell where we already know the values we
need for the three adjoining cells to the upper left $(i-1, j-1)$, above $(i-1, j)$
and to the left $(i, j-1)$. There are several different ways we can do this; one is to
iterate two nested loops, $i = 1$ to $M$ and $j = 1$ to $N$, so we’re filling in the matrix
left to right, top to bottom.
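The bottom-up fill described above can be sketched as follows. This is a minimal illustration under the Figure 1 scoring (+5 match, −2 mismatch, −6 gap), not the author's C implementation; `fill_matrix` and its parameter names are hypothetical.

```python
def fill_matrix(x: str, y: str, match: int = 5, mismatch: int = -2, gap: int = -6):
    """Return the (M + 1) x (N + 1) matrix of optimal prefix-alignment scores."""
    M, N = len(x), len(y)
    S = [[0] * (N + 1) for _ in range(M + 1)]

    # Boundary conditions: S(0, 0) = 0, S(i, 0) = gap * i, S(0, j) = gap * j.
    for i in range(1, M + 1):
        S[i][0] = gap * i
    for j in range(1, N + 1):
        S[0][j] = gap * j

    # Two nested loops: every cell only needs its upper-left, upper,
    # and left neighbours, which are already filled in by the time we get here.
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s,   # x_i aligned to y_j
                          S[i - 1][j] + gap,     # x_i aligned to a gap
                          S[i][j - 1] + gap)     # y_j aligned to a gap
    return S

S = fill_matrix("TTCATA", "TGCTCGTA")
for row in S:
    print(" ".join(f"{v:4d}" for v in row))
print(S[-1][-1])  # 11, the score in cell (M, N)
```

Swapping the two loops (columns outermost) would work equally well, since each cell's three dependencies are still computed before the cell itself.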

A traceback to get the optimal alignment.

Once we’re done filling in the matrix, the score of the optimal alignment of the
complete sequences is the last score we calculate, $S(M, N)$. We still don’t know
the optimal alignment itself, though. This, we recover by a recursive “traceback”
of the matrix. We start in cell $(M, N)$, determine which of the three cases we used
to get here (by repeating the same three calculations, for example), record that
choice as part of the alignment, and then follow the appropriate path for that case
back into the previous cell on the optimum path. We keep doing that, one cell in
the optimal path at a time, until we reach cell $(0, 0)$, at which point the optimal
alignment is fully reconstructed.
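A traceback along these lines might look like the sketch below. It refills the matrix and then walks back from cell (M, N) by repeating the three calculations, as the text suggests, instead of storing pointers. The names `align` and `fill_matrix` are hypothetical, the scoring values are those of Figure 1, and when several cases tie the code arbitrarily prefers the diagonal, then the gap in y, then the gap in x.

```python
def fill_matrix(x, y, match=5, mismatch=-2, gap=-6):
    """Bottom-up fill of the score matrix (same recursion as equation (1))."""
    M, N = len(x), len(y)
    S = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        S[i][0] = gap * i
    for j in range(1, N + 1):
        S[0][j] = gap * j
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s, S[i - 1][j] + gap, S[i][j - 1] + gap)
    return S

def align(x, y, match=5, mismatch=-2, gap=-6):
    """Return (aligned x, aligned y, optimal score) for a global alignment."""
    S = fill_matrix(x, y, match, mismatch, gap)
    i, j = len(x), len(y)
    ax, ay = [], []
    while i > 0 or j > 0:
        s = (match if x[i - 1] == y[j - 1] else mismatch) if i > 0 and j > 0 else None
        if s is not None and S[i][j] == S[i - 1][j - 1] + s:
            ax.append(x[i - 1]); ay.append(y[j - 1])   # case 1: residue to residue
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-")        # case 2: x_i to a gap
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])        # case 3: y_j to a gap
            j -= 1
    # Columns were collected from the end of the alignment, so reverse them.
    return "".join(reversed(ax)), "".join(reversed(ay)), S[len(x)][len(y)]

print(align("TTCATA", "TGCTCGTA"))  # ('T--TCATA', 'TGCTCGTA', 11)
```

Storing a traceback pointer per cell during the fill is a common alternative; it trades extra memory for not having to redo the three comparisons.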

Fine. But what do I really need to know?
Dynamic programming is guaranteed to give you a mathematically optimal
(highest scoring) solution. Whether that corresponds to the biologically correct
alignment is a problem for your scoring system, not for the algorithm.
Similarly, the dynamic programming algorithm will happily align unrelated
sequences. (The two sequences in Figure 1 might look well-aligned; but in fact,
they are unrelated, randomly generated DNA sequences!) The question of when a
score is statistically significant is also a separate problem, requiring clever
statistical theory.
Dynamic programming is surprisingly computationally demanding. You can
see that filling in the matrix takes time proportional to $MN$. Alignment of two
200-mers will take four times as long as two 100-mers. This is why there is so
much research devoted to finding good, fast approximations to dynamic
programming alignment, like the venerable workhorses BLAST and FASTA, and newer
programs like BLAT and FLASH.
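The factor of four follows directly from counting inner-loop iterations of the matrix fill. The short sketch below counts them instead of timing anything, so the result does not depend on the machine; `cells_filled` is a hypothetical helper name.

```python
def cells_filled(m: int, n: int) -> int:
    """Count inner-loop iterations of the nested i = 1..M, j = 1..N fill."""
    count = 0
    for _ in range(1, m + 1):
        for _ in range(1, n + 1):
            count += 1   # one application of the recursion per cell
    return count

small = cells_filled(100, 100)   # two 100-mers
large = cells_filled(200, 200)   # two 200-mers
print(small, large, large / small)  # 10000 40000 4.0
```

Doubling both sequence lengths quadruples the work, and memory grows the same way if the whole matrix is kept for the traceback.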
Only certain scoring systems are amenable to dynamic programming. The
scoring system has to allow the optimal solution to be broken up into independent
parts, or else it can’t be dealt with recursively. The reason that programs use
simple alignment scoring systems is that we’re striking a reasonable compromise
between biological realism and efficient computation.

Further study.
To study a working example, you can download a small, bare-bones C
implementation of this algorithm from http://blah-blah.blah. I used this C
program to generate Figure 1.

References
[1] R. E. Bellman. Eye of the Hurricane: An Autobiography. World Scientific,
1984.

Figure legend
The filled dynamic programming matrix for two randomly generated sequences,
x = TTCATA and y = TGCTCGTA, for a scoring system of +5 for a match, −2
for a mismatch, and −6 for each insertion or deletion. The cells in the optimum
path are shown in red. Arrowheads are “traceback pointers”, indicating which of
the three cases were optimal for reaching each cell. (Some cells can be reached by
two or three different optimal paths of equal score: whenever two or more cases
are equally optimal, dynamic programming implementations usually choose one
case arbitrarily. In this example, though, the optimal path is unique.)

Figure 1. Dynamic programming matrix (rows: i, sequence x; columns: j, sequence y):

               0    1    2    3    4    5    6    7    8 = N
                    T    G    C    T    C    G    T    A
i     0        0   -6  -12  -18  -24  -30  -36  -42  -48
      1  T    -6    5   -1   -7  -13  -19  -25  -31  -37
      2  T   -12   -1    3   -3   -2   -8  -14  -20  -26
      3  C   -18   -7   -3    8    2    3   -3   -9  -15
      4  A   -24  -13   -9    2    6    0    1   -5   -4
      5  T   -30  -19  -15   -4    7    4   -2    6    0
M =   6  A   -36  -25  -21  -10    1    5    2    0   11

optimum alignment scores 11:

x:   T   -   -   T   C   A   T   A
y:   T   G   C   T   C   G   T   A
    +5  -6  -6  +5  +5  -2  +5  +5
