Dynamic Programming
C Patvardhan
Professor, Electrical Engineering
Dayalbagh Educational Institute, Agra
Algorithm types
Algorithm types we will consider include:
Simple recursive algorithms
Backtracking algorithms
Divide and conquer algorithms
Dynamic programming algorithms
Greedy algorithms
Branch and bound algorithms
Brute force algorithms
Randomized algorithms
Counting coins
To find the minimum number of US coins to make any amount,
the greedy method always works
At each step, just choose the largest coin that does not overshoot the
desired amount: 31¢ = 25¢ + 5¢ + 1¢
The greedy method would not work if we did not have 5¢ coins
For 31 cents, the greedy method gives seven coins (25+1+1+1+1+1+1),
but we can do it with four (10+10+10+1)
The greedy method also would not work if we had a 21¢ coin
For 63 cents, the greedy method gives six coins (25+25+10+1+1+1), but
we can do it with three (21+21+21)
How can we find the minimum number of coins for any given
coin set?
Coin set for examples
For the following examples, we will assume coins in the following
denominations:
1¢ 5¢ 10¢ 21¢ 25¢
We’ll use 63¢ as our goal
This example is taken from:
Data Structures & Problem Solving using Java by Mark Allen Weiss
A simple solution
We always need a 1¢ coin, otherwise no solution exists for
making one cent
To make K cents:
If there is a K-cent coin, then that one coin is the minimum
Otherwise, for each value i < K,
Find the minimum number of coins needed to make i cents
Find the minimum number of coins needed to make K − i cents
Choose the i that minimizes this sum
This algorithm can be viewed as divide-and-conquer, or as brute force
This solution is very recursive
It requires exponential work
It is infeasible to solve for 63¢
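To make the cost concrete, here is a minimal Python sketch of this exponential recursion; the coin set is the one from the slides, and the function name min_coins_naive is chosen for illustration:

COINS = {1, 5, 10, 21, 25}

def min_coins_naive(k):
    # If there is a k-cent coin, that single coin is the minimum.
    if k in COINS:
        return 1
    # Otherwise try every split k = i + (k - i) and keep the best sum.
    return min(min_coins_naive(i) + min_coins_naive(k - i)
               for i in range(1, k))

The number of calls grows roughly like 3^k, so even amounts in the low twenties already take a very long time; 63¢ is hopeless, exactly as the slide says.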
Another solution
We can reduce the problem recursively by choosing the
first coin, and solving for the amount that is left
For 63¢:
One 1¢ coin plus the best solution for 62¢
One 5¢ coin plus the best solution for 58¢
One 10¢ coin plus the best solution for 53¢
One 21¢ coin plus the best solution for 42¢
One 25¢ coin plus the best solution for 38¢
Choose the best solution from among the 5 given above
Instead of solving 62 recursive problems, we solve 5
This is still a very expensive algorithm
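A sketch of this 5-way recursion (the function name min_coins_by_first_coin is illustrative, not from the slides):

COINS = [1, 5, 10, 21, 25]

def min_coins_by_first_coin(k):
    # Base case: nothing left to make.
    if k == 0:
        return 0
    # Branch on the first coin chosen: at most 5 branches per call.
    return min(1 + min_coins_by_first_coin(k - c) for c in COINS if c <= k)

The branching factor drops from up to 62 to 5, but without saving answers the running time is still exponential.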
A dynamic programming solution
Idea: Solve first for one cent, then two cents, then three cents,
etc., up to the desired amount
Save each answer in an array!
For each new amount N, compute all the possible pairs of
previous answers which sum to N
For example, to find the solution for 13¢,
First, solve for all of 1¢, 2¢, 3¢, ..., 12¢
Next, choose the best solution among:
Solution for 1¢ + solution for 12¢
Solution for 2¢ + solution for 11¢
Solution for 3¢ + solution for 10¢
Solution for 4¢ + solution for 9¢
Solution for 5¢ + solution for 8¢
Solution for 6¢ + solution for 7¢
Example
Suppose coins are 1¢, 3¢, and 4¢
There’s only one way to make 1¢ (one coin)
To make 2¢, try 1¢+1¢ (one coin + one coin = 2 coins)
To make 3¢, just use the 3¢ coin (one coin)
To make 4¢, just use the 4¢ coin (one coin)
To make 5¢, try
1¢ + 4¢ (1 coin + 1 coin = 2 coins)
2¢ + 3¢ (2 coins + 1 coin = 3 coins)
The first solution is better, so best solution is 2 coins
To make 6¢, try
1¢ + 5¢ (1 coin + 2 coins = 3 coins)
2¢ + 4¢ (2 coins + 1 coin = 3 coins)
3¢ + 3¢ (1 coin + 1 coin = 2 coins) – best solution
Etc.
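Here is a minimal bottom-up sketch of the table-filling just traced, following the "pairs of previous answers" idea from the slides; the 1¢/3¢/4¢ coin set is the example's, and the helper name min_coins_table is illustrative:

def min_coins_table(goal, coins=(1, 3, 4)):
    # Assumes a 1-cent coin is present, as the slides note.
    best = [0] * (goal + 1)    # best[n] = fewest coins making n cents
    for n in range(1, goal + 1):
        if n in coins:
            best[n] = 1        # a single n-cent coin is clearly best
        else:
            # combine two previously solved amounts that sum to n
            best[n] = min(best[i] + best[n - i]
                          for i in range(1, n // 2 + 1))
    return best

min_coins_table(6) returns [0, 1, 2, 1, 1, 2, 2], matching the worked example: 2¢ takes 2 coins, 5¢ takes 2, and 6¢ takes 2 (via 3¢ + 3¢).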
How good is the algorithm?
The first algorithm is recursive, with a branching factor
of up to 62
Possibly the average branching factor is somewhere around
half of that (31)
The algorithm takes exponential time, with a large base
The second algorithm is much better: it has a branching factor of 5
This is exponential time, with base 5
The dynamic programming algorithm is O(N·K), where
N is the desired amount and K is the number of different
kinds of coins
Comparison with divide-and-conquer
Divide-and-conquer algorithms split a problem into separate
subproblems, solve the subproblems, and combine the results for
a solution to the original problem
Example: Quicksort
Example: Mergesort
Example: Binary search
Divide-and-conquer algorithms can be thought of as top-down
algorithms
In contrast, a dynamic programming algorithm proceeds by
solving small problems, then combining them to find the solution
to larger problems
Dynamic programming can be thought of as bottom-up
Example 2: Binomial Coefficients
(x + y)² = x² + 2xy + y², coefficients are 1,2,1
(x + y)³ = x³ + 3x²y + 3xy² + y³, coefficients are 1,3,3,1
(x + y)⁴ = x⁴ + 4x³y + 6x²y² + 4xy³ + y⁴, coefficients are 1,4,6,4,1
(x + y)⁵ = x⁵ + 5x⁴y + 10x³y² + 10x²y³ + 5xy⁴ + y⁵, coefficients are 1,5,10,10,5,1
The n+1 coefficients of (x + y)ⁿ can be computed according to
the formula c(n, i) = n! / (i! · (n − i)!)
for each of i = 0..n
The repeated computation of all the factorials gets to be expensive
We can use dynamic programming to save earlier results as we go
Solution by dynamic programming
n c(n,0) c(n,1) c(n,2) c(n,3) c(n,4) c(n,5) c(n,6)
0 1
1 1 1
2 1 2 1
3 1 3 3 1
4 1 4 6 4 1
5 1 5 10 10 5 1
6 1 6 15 20 15 6 1
Each row depends only on the preceding row
Only linear space and quadratic time are needed
This algorithm is known as Pascal’s Triangle
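The row-by-row computation is easy to write down; this is a small illustrative sketch (the function name binomial_row is not from the slides):

def binomial_row(n):
    # Build Pascal's triangle one row at a time; each row depends
    # only on the preceding row, so linear space suffices.
    row = [1]
    for _ in range(n):
        row = [1] + [row[k] + row[k + 1] for k in range(len(row) - 1)] + [1]
    return row

binomial_row(6) returns [1, 6, 15, 20, 15, 6, 1], the bottom row of the table above, in quadratic time with no factorials at all.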
The principle of optimality, I
Dynamic programming is a technique for finding an
optimal solution
The principle of optimality applies if the optimal
solution to a problem always contains optimal solutions
to all subproblems
Example: Consider the problem of making N¢ with the
fewest number of coins
Either there is an N¢ coin, or
The set of coins making up an optimal solution for N¢ can be
divided into two nonempty subsets, n1¢ and n2¢
If either subset, n1¢ or n2¢, can be made with fewer coins, then clearly
N¢ can be made with fewer coins, hence the solution was not optimal
The principle of optimality, II
The principle of optimality holds if
Every optimal solution to a problem contains...
...optimal solutions to all subproblems
The principle of optimality does not say
If you have optimal solutions to all subproblems...
...then you can combine them to get an optimal solution
Example: In US coinage,
The optimal solution to 7¢ is 5¢ + 1¢ + 1¢, and
The optimal solution to 6¢ is 5¢ + 1¢, but
The optimal solution to 13¢ is not 5¢ + 1¢ + 1¢ + 5¢ + 1¢
But there is some way of dividing up 13¢ into subsets with
optimal solutions (say, 11¢ + 2¢) that will give an optimal
solution for 13¢
Hence, the principle of optimality holds for this problem
Longest simple path
Consider the following graph:
[figure: a weighted graph on vertices A, B, C, D with edge weights 1, 2, 3, 1, 4]
The longest simple path (path not containing a cycle) from A
to D is A B C D
However, the subpath A B is not the longest simple path
from A to B (A C B is longer)
The principle of optimality is not satisfied for this problem
Hence, the longest simple path problem cannot be solved by
a dynamic programming approach
The 0-1 knapsack problem
A thief breaks into a house, carrying a knapsack...
He can carry up to 25 pounds of loot
He has to choose which of N items to steal
Each item has some weight and some value
“0-1” because each item is stolen (1) or not stolen (0)
He has to select the items to steal in order to maximize the value of his
loot, but cannot exceed 25 pounds
A greedy algorithm does not find an optimal solution
A dynamic programming algorithm works well
This is similar to, but not identical to, the coins problem
In the coins problem, we had to make an exact amount of change
In the 0-1 knapsack problem, we can’t exceed the weight limit, but the
total weight of the optimal solution may be less than the weight limit
The dynamic programming solution is similar to that of the coins problem
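One common bottom-up formulation is sketched below; it is an illustration, not necessarily the exact table the slides intend, and the item weights/values in the usage line are made up. For each weight limit w it keeps the best value achievable:

def knapsack(items, capacity=25):
    # items: list of (weight, value) pairs; capacity in pounds.
    # best[w] = maximum value achievable with total weight <= w.
    best = [0] * (capacity + 1)
    for weight, value in items:
        # scan weights downward so each item is used at most once ("0-1")
        for w in range(capacity, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + value)
    return best[capacity]

knapsack([(10, 60), (20, 100), (12, 70)]) returns 130 (take the 10- and 12-pound items). Note that best[w] only requires total weight ≤ w, not exactly w; that is the difference from the coins problem.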
Comments
Dynamic programming relies on working “from the bottom up”
and saving the results of solving simpler problems
These solutions to simpler problems are then used to compute the solution
to more complex problems
Dynamic programming solutions can often be quite complex and
tricky
Dynamic programming is used for optimization problems,
especially ones that would otherwise take exponential time
Only problems that satisfy the principle of optimality are suitable for
dynamic programming solutions
Since exponential time is unacceptable for all but the smallest
problems, dynamic programming is sometimes essential
Longest Common Subsequence
Problem: Given 2 sequences, X = 〈x1,...,xm〉 and
Y = 〈y1,...,yn〉, find a common subsequence whose
length is maximum.
springtime ncaa tournament basketball
printing north carolina krzyzewski
Subsequence need not be consecutive, but must be in order.
Other sequence questions
Edit distance: Given 2 sequences, X = 〈x1,...,xm〉
and Y = 〈y1,...,yn〉, what is the minimum number of
deletions, insertions, and changes that you must do
to change one to another?
Protein sequence alignment: Given a score matrix
on amino acid pairs, s(a,b) for a,b ∈ {Λ} ∪ A,
and 2 amino acid sequences, X = 〈x1,...,xm〉 ∈ Aᵐ
and Y = 〈y1,...,yn〉 ∈ Aⁿ, find the alignment with
lowest score…
More problems
Optimal BST: Given sequence K = k1 < k2 <··· < kn
of n sorted keys, with a search probability pi for
each key ki, build a binary search tree (BST) with
minimum expected search cost.
Matrix chain multiplication: Given a sequence of
matrices A1 A2 … An, with Ai of dimension mi×ni,
insert parentheses to minimize the total number of
scalar multiplications.
Minimum convex decomposition of a polygon,
hydrogen placement in protein structures, …
Dynamic Programming
Dynamic Programming is an algorithm design technique for
optimization problems: often minimizing or maximizing.
Like divide and conquer, DP solves problems by combining
solutions to subproblems.
Unlike divide and conquer, subproblems are not independent.
» Subproblems may share subsubproblems,
» However, solution to one subproblem may not affect the solutions to other
subproblems of the same problem. (More on this later.)
DP reduces computation by
» Solving subproblems in a bottom-up fashion.
» Storing solution to a subproblem the first time it is solved.
» Looking up the solution when subproblem is encountered again.
Key: determine structure of optimal solutions
Steps in Dynamic Programming
1. Characterize structure of an optimal solution.
2. Define value of optimal solution recursively.
3. Compute optimal solution values either top-down
with caching or bottom-up in a table.
4. Construct an optimal solution from computed
values.
We’ll study these with the help of examples.
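Step 3's two strategies can both be sketched on the coins problem from earlier; this is an illustration added here, not part of the slides, with the coin set assumed as before:

from functools import lru_cache

COINS = (1, 5, 10, 21, 25)

@lru_cache(maxsize=None)          # top-down with caching (memoization)
def min_coins(k):
    if k == 0:
        return 0
    return min(1 + min_coins(k - c) for c in COINS if c <= k)

min_coins(63) returns 3 (three 21¢ coins) instantly: each of the 64 subproblems is computed once and then looked up. A bottom-up version would instead fill a table best[0..63] in increasing order of amount.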
Longest Common Subsequence
Problem: Given 2 sequences, X = 〈x1,...,xm〉 and
Y = 〈y1,...,yn〉, find a common subsequence whose
length is maximum.
springtime ncaa tournament basketball
printing north carolina snoeyink
Subsequence need not be consecutive, but must be in order.
Naïve Algorithm
For every subsequence of X, check whether it’s a
subsequence of Y .
Time: Θ(n·2ᵐ).
» 2ᵐ subsequences of X to check.
» Each subsequence takes Θ(n) time to check:
scan Y for the first letter, for the second, and so on.
Optimal Substructure
Theorem
Let Z = 〈z1, . . . , zk〉 be any LCS of X and Y .
1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If xm ≠ yn, then either zk ≠ xm and Z is an LCS of Xm-1 and Y ,
3. or zk ≠ yn and Z is an LCS of X and Yn-1.
Notation:
prefix Xi = 〈x1,...,xi〉 is the first i letters of X.
This says what any longest common subsequence must look like;
do you believe it?
Optimal Substructure
Theorem
Let Z = 〈z1, . . . , zk〉 be any LCS of X and Y .
1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If xm ≠ yn, then either zk ≠ xm and Z is an LCS of Xm-1 and Y ,
3. or zk ≠ yn and Z is an LCS of X and Yn-1.
Proof: (case 1: xm = yn)
Any sequence Z’ that does not end in xm = yn can be made longer by adding xm = yn
to the end. Therefore,
(1) the longest common subsequence (LCS) Z must end in xm = yn,
(2) Zk-1 is a common subsequence of Xm-1 and Yn-1, and
(3) there is no longer CS of Xm-1 and Yn-1, or Z would not be an LCS.
Optimal Substructure
Theorem
Let Z = 〈z1, . . . , zk〉 be any LCS of X and Y .
1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1.
2. If xm ≠ yn, then either zk ≠ xm and Z is an LCS of Xm-1 and Y ,
3. or zk ≠ yn and Z is an LCS of X and Yn-1.
Proof: (case 2: xm ≠ yn, and zk ≠ xm)
Since Z does not end in xm,
(1) Z is a common subsequence of Xm-1 and Y, and
(2) there is no longer CS of Xm-1 and Y, or Z would not be an LCS.
Recursive Solution
Define c[i, j] = length of LCS of Xi and Yj .
We want c[m,n].
The theorem gives the recurrence:
c[i, j] = 0                              if i = 0 or j = 0
c[i, j] = c[i−1, j−1] + 1                if i, j > 0 and xi = yj
c[i, j] = max(c[i−1, j], c[i, j−1])      if i, j > 0 and xi ≠ yj
This gives a recursive algorithm and solves the problem.
But does it solve it well?
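Transcribing the recurrence directly into code shows the problem: the same (i, j) pairs are recomputed over and over. A minimal sketch (names are illustrative):

def lcs_length_naive(x, y):
    def c(i, j):
        # Direct transcription of the recurrence above.
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1
        return max(c(i - 1, j), c(i, j - 1))
    return c(len(x), len(y))

lcs_length_naive("springtime", "printing") returns 6, but only after an exponential number of calls, as the recursion tree on the next slide suggests.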
Recursive Solution
c[springtime, printing]
├── c[springtim, printing]
│     ├── c[springti, printing]
│     └── c[springtim, printin]
└── c[springtime, printin]
      ├── c[springtim, printin]   ← the same subproblem appears twice
      └── c[springtime, printi]
(next level: [springt, printing], [springti, printin], [springtim, printi], [springtime, print], ...)
Recursive Solution
Keep track of c[α, β] in a table of nm entries, with rows indexed
by the letters of “springtime” and columns by the letters of “printing”.
The table can be filled top-down (recursion with caching) or bottom-up.
Computing the length of an LCS
LCS-LENGTH (X, Y)
1. m ← length[X]
2. n ← length[Y]
3. for i ← 1 to m
4.     do c[i, 0] ← 0
5. for j ← 0 to n
6.     do c[0, j ] ← 0
7. for i ← 1 to m
8.     do for j ← 1 to n
9.         do if xi = yj
10.            then c[i, j ] ← c[i−1, j−1] + 1
11.                 b[i, j ] ← “↖”
12.            else if c[i−1, j ] ≥ c[i, j−1]
13.                 then c[i, j ] ← c[i−1, j ]
14.                      b[i, j ] ← “↑”
15.                 else c[i, j ] ← c[i, j−1]
16.                      b[i, j ] ← “←”
17. return c and b

b[i, j ] points to the table entry whose subproblem we used in solving the LCS of Xi and Yj.
c[m,n] contains the length of an LCS of X and Y.
Time: O(mn)
Constructing an LCS
PRINT-LCS (b, X, i, j)
1. if i = 0 or j = 0
2.     then return
3. if b[i, j ] = “↖”
4.     then PRINT-LCS(b, X, i−1, j−1)
5.          print xi
6. elseif b[i, j ] = “↑”
7.     then PRINT-LCS(b, X, i−1, j)
8. else PRINT-LCS(b, X, i, j−1)

Initial call is PRINT-LCS(b, X, m, n).
When b[i, j ] = “↖”, we have extended the LCS by one character. So the LCS consists of the entries with “↖” in them.
Time: O(m+n)
Steps in Dynamic Programming
1. Characterize structure of an optimal solution.
2. Define value of optimal solution recursively.
3. Compute optimal solution values either top-down
with caching or bottom-up in a table.
4. Construct an optimal solution from computed
values.
We’ll study these with the help of examples.
Optimal Binary Search Trees
Problem
» Given sequence K = k1 < k2 <··· < kn of n sorted keys,
with a search probability pi for each key ki.
» Want to build a binary search tree (BST)
with minimum expected search cost.
» Actual cost = # of items examined.
» For key ki, cost = depthT(ki) + 1, where depthT(ki) = depth of ki in
BST T .
Expected Search Cost
E[search cost in T ]
= Σi=1..n (depthT(ki) + 1) · pi
= Σi=1..n depthT(ki) · pi + Σi=1..n pi
= 1 + Σi=1..n depthT(ki) · pi          (15.16)
(since the sum of the probabilities is 1)
Example
Consider 5 keys with these search probabilities:
p1 = 0.25, p2 = 0.2, p3 = 0.05, p4 = 0.2, p5 = 0.3.
[figure: BST with root k2; k1 and k4 at depth 1; k3 and k5 as children of k4]

i    depthT(ki)    depthT(ki)·pi
1    1             0.25
2    0             0
3    2             0.1
4    1             0.2
5    2             0.6
     total:        1.15

Therefore, E[search cost] = 2.15.
Example
p1 = 0.25, p2 = 0.2, p3 = 0.05, p4 = 0.2, p5 = 0.3.
[figure: BST with root k2; k1 and k5 at depth 1; k4 as child of k5; k3 as child of k4]

i    depthT(ki)    depthT(ki)·pi
1    1             0.25
2    0             0
3    3             0.15
4    2             0.4
5    1             0.3
     total:        1.10

Therefore, E[search cost] = 2.10.
This tree turns out to be optimal for this set of keys.
Example
Observations:
» Optimal BST may not have smallest height.
» Optimal BST may not have highest-probability key at
the root.
Build by exhaustive checking?
» Construct each n-node BST.
» For each,
assign keys and compute expected search cost.
» But there are Ω(4ⁿ/n^(3/2)) different BSTs with n nodes.
Optimal Substructure
Any subtree of a BST contains keys in a contiguous range
ki, ..., kj for some 1 ≤ i ≤ j ≤ n.
If T is an optimal BST and
T contains subtree T′ with keys ki, ..., kj ,
then T′ must be an optimal BST for keys ki, ..., kj.
Proof: Cut and paste.
Optimal Substructure
One of the keys in ki, …,kj, say kr, where i ≤ r ≤ j,
must be the root of an optimal subtree for these keys.
Left subtree of kr contains ki,...,kr−1.
Right subtree of kr contains kr+1, ...,kj.
[figure: kr as root, with subtrees over ki..kr−1 and kr+1..kj]
To find an optimal BST:
» Examine all candidate roots kr , for i ≤ r ≤ j
» Determine all optimal BSTs containing ki,...,kr−1 and
containing kr+1,...,kj
Recursive Solution
Find optimal BST for ki,...,kj, where i ≥ 1, j ≤ n, j ≥ i−1.
When j = i−1, the tree is empty.
Define e[i, j ] = expected search cost of optimal BST for ki,...,kj.
If j = i−1, then e[i, j ] = 0.
If j ≥ i,
» Select a root kr, for some i ≤ r ≤ j .
» Recursively make optimal BSTs
• for ki,..,kr−1 as the left subtree, and
• for kr+1,..,kj as the right subtree.
Recursive Solution
When the OPT subtree becomes a subtree of a node:
» Depth of every node in OPT subtree goes up by 1.
» Expected search cost increases by w(i, j) = Σl=i..j pl, from (15.16).
If kr is the root of an optimal BST for ki,..,kj :
» e[i, j ] = pr + (e[i, r−1] + w(i, r−1)) + (e[r+1, j] + w(r+1, j))
= e[i, r−1] + e[r+1, j] + w(i, j)
(because w(i, j) = w(i, r−1) + pr + w(r + 1, j))
But we don’t know kr. Hence,
e[i, j ] = min over i ≤ r ≤ j of { e[i, r−1] + e[r+1, j] + w(i, j) }   (for j ≥ i)
Computing an Optimal Solution
For each subproblem (i,j), store:
expected search cost in a table e[1..n+1, 0..n]
» Will use only entries e[i, j ], where j ≥ i−1.
root[i, j ] = root of subtree with keys ki,..,kj, for 1 ≤ i ≤ j ≤ n.
w[1..n+1, 0..n] = sum of probabilities
» w[i, i−1] = 0 for 1 ≤ i ≤ n.
» w[i, j ] = w[i, j-1] + pj for 1 ≤ i ≤ j ≤ n.
Pseudo-code
OPTIMAL-BST(p, q, n)
1. for i ← 1 to n + 1
2.     do e[i, i−1] ← 0
3.        w[i, i−1] ← 0
4. for l ← 1 to n                          ▷ consider all trees with l keys
5.     do for i ← 1 to n−l+1               ▷ fix the first key
6.         do j ← i + l−1                  ▷ fix the last key
7.            e[i, j ] ← ∞
8.            w[i, j ] ← w[i, j−1] + pj
9.            for r ← i to j               ▷ determine the root of the
10.               do t ← e[i, r−1] + e[r+1, j ] + w[i, j ]   ▷ optimal (sub)tree
11.                  if t < e[i, j ]
12.                      then e[i, j ] ← t
13.                           root[i, j ] ← r
14. return e and root

Time: O(n³)
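A direct Python transcription of the pseudo-code, using the probabilities from the earlier example; the q dummy-key probabilities are dropped here, a simplification relative to the textbook version:

import math

def optimal_bst(p):
    # p[1..n] are the key search probabilities; p[0] is unused.
    n = len(p) - 1
    e = [[0.0] * (n + 2) for _ in range(n + 2)]   # e[i][j], j = i-1..n
    w = [[0.0] * (n + 2) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 1)]
    for l in range(1, n + 1):                     # all subtrees with l keys
        for i in range(1, n - l + 2):             # fix the first key...
            j = i + l - 1                         # ...and the last key
            e[i][j] = math.inf
            w[i][j] = w[i][j - 1] + p[j]
            for r in range(i, j + 1):             # try every root
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j] = t
                    root[i][j] = r
    return e[1][n], root

optimal_bst([0, 0.25, 0.2, 0.05, 0.2, 0.3])[0] returns 2.1 (up to floating-point rounding), the expected search cost of the optimal tree from the example.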
Elements of Dynamic Programming
Optimal substructure
Overlapping subproblems
Optimal Substructure
Show that a solution to a problem consists of making a
choice, which leaves one or more subproblems to solve.
Suppose that you are given this last choice that leads to an
optimal solution.
Given this choice, determine which subproblems arise and
how to characterize the resulting space of subproblems.
Show that the solutions to the subproblems used within
the optimal solution must themselves be optimal. Usually
use cut-and-paste.
Need to ensure that a wide enough range of choices and
subproblems are considered.
Optimal Substructure
Optimal substructure varies across problem domains:
» 1. How many subproblems are used in an optimal solution.
» 2. How many choices in determining which subproblem(s) to
use.
Informally, running time depends on (# of subproblems
overall) × (# of choices).
How many subproblems and choices do the examples
considered contain?
Dynamic programming uses optimal substructure bottom-up.
» First find optimal solutions to subproblems.
» Then choose which to use in optimal solution to the problem.
Optimal Substructure
Does optimal substructure apply to all optimization
problems? No.
Applies to determining the shortest path but NOT the
longest simple path of an unweighted directed graph.
Why?
» Shortest path has independent subproblems.
» Solution to one subproblem does not affect solution to another
subproblem of the same problem.
» Subproblems are not independent in longest simple path.
• Solution to one subproblem affects the solutions to other subproblems.
» Example:
Overlapping Subproblems
The space of subproblems must be “small”.
The total number of distinct subproblems is a polynomial
in the input size.
» A recursive algorithm is exponential because it solves the same
problems repeatedly.
» If divide-and-conquer is applicable, then each problem solved
will be brand new.
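The overlap is easy to measure: instrument the naive LCS recursion and compare distinct subproblems against total calls. This is an illustrative experiment added here, not from the slides:

from collections import Counter

def count_lcs_calls(x, y):
    calls = Counter()                 # how often each (i, j) is visited
    def c(i, j):
        calls[(i, j)] += 1
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1
        return max(c(i - 1, j), c(i, j - 1))
    c(len(x), len(y))
    return calls

calls = count_lcs_calls("springtime", "printing")
# len(calls) is at most (m+1)(n+1) = 99 distinct subproblems, while
# sum(calls.values()) is far larger; that gap is exactly the repeated
# work that storing solutions eliminates.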