DP Rod Cutting Problem
1 Rod cutting
Suppose you have a rod of length n, and you want to cut up the rod and sell the pieces in
a way that maximizes the total amount of money you get. A piece of length i is worth p_i dollars.
For example, suppose you have a rod of length 4, and (using the price table from CLRS) p_1 = 1, p_2 = 5, p_3 = 8, and p_4 = 9. There are eight different ways to cut the rod, and the best strategy is cutting it into two pieces of length 2, which gives you 5 + 5 = 10 dollars.
Exercise: How many ways are there to cut up a rod of length n?
Answer: 2^(n−1), because there are n − 1 places where we can choose to make cuts, and at each place, we either make a cut or we do not.
Despite the exponentially large possibility space, we can use dynamic programming to write an algorithm that runs in Θ(n^2) time.
First we ask “what is the maximum amount of money we can get?” And later we can extend
the algorithm to give us the actual rod decomposition that leads to that maximum value.
Let r_i be the maximum amount of money you can get with a rod of size i. We can view the
problem recursively as follows:
• First, cut a piece off the left end of the rod, and sell it.
• Then, find the optimal way to cut the remainder of the rod.
Now we don’t know how large a piece we should cut off. So we try all possible cases. First
we try cutting a piece of length 1, and combining it with the optimal way to cut a rod of
length n − 1. Then we try cutting a piece of length 2, and combining it with the optimal
way to cut a rod of length n − 2. We try all the possible lengths and then pick the best one.
We end up with
r_n = max_{1 ≤ i ≤ n} (p_i + r_{n−i})
(Note that by allowing i to be n, we handle the case where the rod is not cut at all.)
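Translated directly into code, the recurrence gives a naive recursive algorithm. Here is a minimal sketch in Python (an assumption on representation: the prices are given as a list p where p[i] is the price of a piece of length i, with p[0] unused):

    def cut_rod(p, n):
        # Naive recursion: try every length i for the leftmost piece,
        # then cut the remaining rod of length n - i optimally.
        if n == 0:
            return 0
        best = float('-inf')
        for i in range(1, n + 1):
            best = max(best, p[i] + cut_rod(p, n - i))
        return best

For the example above, cut_rod([0, 1, 5, 8, 9], 4) returns 10.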
However, the computation time is ridiculous, because the recursion generates an exponential number of calls. If you draw the recursion tree, you will see that we are doing a lot of extra work, because we compute the same subproblems over and over again. For example, in the computation for n = 4, we compute the optimal solution for n = 1 four times!
It is much better to compute it once, and then refer to it in future recursive calls.
One way we can do this is by writing the recursion as normal, but storing the results of the recursive calls, so that if we need a result in a future recursive call, we can use the precomputed value. The answer will be stored in r[n].
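A memoized version might look like the following sketch (same assumptions about p as above; helper is a hypothetical name):

    def memoized_cut_rod(p, n):
        r = [None] * (n + 1)   # r[j] caches the best revenue for a rod of length j
        def helper(j):
            if r[j] is not None:
                return r[j]    # reuse the precomputed value
            if j == 0:
                best = 0
            else:
                best = max(p[i] + helper(j - i) for i in range(1, j + 1))
            r[j] = best        # store the result for future recursive calls
            return best
        return helper(n)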
Runtime: Θ(n^2). Each subproblem is solved exactly once, and to solve a subproblem of size i, we run through i iterations of the for loop. So the total number of iterations of the for loop, over all recursive calls, forms an arithmetic series, giving Θ(n^2) iterations in total.
In the bottom-up approach, we instead proactively compute the solutions for smaller rods first, knowing that they will later be used to compute the solutions for larger rods. The answer will once again be stored in r[n].
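A bottom-up sketch, under the same assumptions about p:

    def bottom_up_cut_rod(p, n):
        r = [0] * (n + 1)            # r[0] = 0: a rod of length 0 is worth nothing
        for j in range(1, n + 1):    # solve subproblems in increasing order of size
            # Every r[j - i] below was already computed in an earlier iteration.
            r[j] = max(p[i] + r[j - i] for i in range(1, j + 1))
        return r[n]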
Often the bottom-up approach is simpler to write and has less overhead, because you don't have to maintain a recursive call stack. Most people write the bottom-up procedure when they implement a dynamic programming algorithm.
Runtime: Θ(n^2), because of the double for loop.
If we want to actually find the optimal way to split the rod, instead of just finding the maximum profit we can get, we can create another array s, and let s[j] = i if we determine that the best thing to do when we have a rod of length j is to cut off a piece of length i.
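A sketch of this extension (the function names are made up; same assumptions about p):

    def extended_bottom_up_cut_rod(p, n):
        r = [0] * (n + 1)
        s = [0] * (n + 1)   # s[j] = length of the first piece in an optimal cut of a rod of length j
        for j in range(1, n + 1):
            best = float('-inf')
            for i in range(1, j + 1):
                if p[i] + r[j - i] > best:
                    best = p[i] + r[j - i]
                    s[j] = i          # remember which first cut achieved this value
            r[j] = best
        return r, s

    def print_cut_rod_solution(p, n):
        # Print the piece lengths in one optimal decomposition.
        r, s = extended_bottom_up_cut_rod(p, n)
        while n > 0:
            print(s[n])
            n = n - s[n]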
To solve an optimization problem using dynamic programming, we must first characterize the
structure of an optimal solution. Specifically, we must prove that we can create an optimal
solution to a problem using optimal solutions to subproblems. (Then we can store all the
optimal solutions in an array and compute later elements in the array from earlier elements
in the array.)
We can't really use dynamic programming if an optimal solution to the problem might not be built from optimal solutions to subproblems. This often happens when the subproblems are not independent of each other.
For dynamic programming to be useful, the recursive algorithm should require us to compute
optimal solutions to the same subproblems over and over again, because in this case, we
benefit from just computing them once and then using the results later.
In total, there should be a small number of distinct subproblems (i.e. polynomial in the
input size), even if there is an exponential number of total subproblems.
Note that in the divide-and-conquer algorithms we saw earlier in the class, the number of
subproblems gets exponentially larger on each level of the recursion tree, but the size of
the subproblems gets exponentially smaller. In these dynamic programming algorithms, the
number of distinct subproblems should be polynomial, but the size of the subproblems might
decrease by 1 every time.
2 Longest common subsequence

In the longest common subsequence (LCS) problem, we are given two sequences X and Y, and we want to find a longest sequence of elements that appears, in order but not necessarily contiguously, in both. The first step to solving this using dynamic programming is to say that we can create an optimal solution to this problem using optimal solutions to subproblems. The hardest part is to decide what the subproblems are.
Here there are two possible cases:
1. The last elements of X and Y are equal. Then this element must be the last element of a longest common subsequence. So we can chop it off the ends of both sequences (adding it to our common subsequence) and find the longest common subsequence of the smaller sequences.
2. The last elements are different. Then either the last element of X or the last element of Y cannot be part of the longest common subsequence. So now we can find the longest common subsequence of X and a smaller version of Y, or the longest common subsequence of Y and a smaller version of X.
Formally, let X = x_1, . . . , x_m and Y = y_1, . . . , y_n be sequences, and let Z = z_1, . . . , z_k be any longest common subsequence of X and Y. Let X_i refer to the first i elements of X, and Y_i refer to the first i elements of Y, etc.
Then

1. If x_m = y_n, then z_k = x_m = y_n and Z_{k−1} is a longest common subsequence of X_{m−1} and Y_{n−1}.

2. If x_m ≠ y_n, then z_k ≠ x_m implies that Z is a longest common subsequence of X_{m−1} and Y.

3. If x_m ≠ y_n, then z_k ≠ y_n implies that Z is a longest common subsequence of X and Y_{n−1}.
Using this theorem, we can show that the longest common subsequence problem can always be solved by finding longest common subsequences of smaller sequences, and then combining the solutions.
Proof:

1. Suppose z_k ≠ x_m. Then we could append x_m = y_n to Z to obtain a common subsequence of X and Y of length k + 1, which contradicts the fact that Z is a longest common subsequence. So z_k = x_m = y_n. Now Z_{k−1} is a common subsequence of X_{m−1} and Y_{n−1} of length k − 1. It must be a longest common subsequence, because if W were a common subsequence of X_{m−1} and Y_{n−1} with length greater than k − 1, then appending x_m = y_n to W would produce a common subsequence of X and Y whose length is greater than k, which is a contradiction.
2. If z_k ≠ x_m, then Z does not use x_m, so Z is a common subsequence of X_{m−1} and Y. It is a longest one, because if there were a common subsequence W of X_{m−1} and Y with length greater than k, then W would also be a common subsequence of X_m and Y, so Z would not be a longest common subsequence. Contradiction.
3. Same as the proof for part (2), with the roles of X and Y swapped.
To store the solutions to subproblems, this time we use a 2D table instead of a 1D array. We want to compute all the values c[i, j], which are the lengths of the longest common subsequences between the first i elements of X and the first j elements of Y. At the end, the answer will be stored in c[m, n]. The case analysis above gives the recurrence

c[i, j] = 0                               if i = 0 or j = 0
c[i, j] = c[i − 1, j − 1] + 1             if i, j > 0 and x_i = y_j
c[i, j] = max(c[i − 1, j], c[i, j − 1])   if i, j > 0 and x_i ≠ y_j
Using this recurrence, we can write the actual pseudocode. Observe that it is necessary to
populate the table in a certain order, because some elements of the table depend on other
elements of the table having already been computed.
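Here is one way that procedure might look in Python (a sketch; the auxiliary table b records which case of the recurrence produced each cell, so that the solution can be reconstructed later; note that the Python list X is 0-indexed, so X[i - 1] plays the role of x_i):

    def lcs_length(X, Y):
        m, n = len(X), len(Y)
        # c[i][j] = length of an LCS of the first i elements of X and the first j of Y.
        c = [[0] * (n + 1) for _ in range(m + 1)]
        b = [[None] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):          # row by row, so row i - 1 is always ready
            for j in range(1, n + 1):
                if X[i - 1] == Y[j - 1]:   # last elements match: extend the diagonal
                    c[i][j] = c[i - 1][j - 1] + 1
                    b[i][j] = 'diag'       # marks X[i - 1] as part of the LCS
                elif c[i - 1][j] >= c[i][j - 1]:
                    c[i][j] = c[i - 1][j]
                    b[i][j] = 'up'
                else:
                    c[i][j] = c[i][j - 1]
                    b[i][j] = 'left'
        return c, b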
To reconstruct the solution, we just print the elements that we marked as part of the longest
common subsequence.
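Continuing the sketch above, the reconstruction follows the b table backwards from c[m][n], printing an element whenever it sees a 'diag' mark:

    def print_lcs(b, X, i, j):
        # Walk back from (i, j); 'diag' cells correspond to elements of the LCS.
        if i == 0 or j == 0:
            return
        if b[i][j] == 'diag':
            print_lcs(b, X, i - 1, j - 1)
            print(X[i - 1])    # print after recursing, so output comes out in order
        elif b[i][j] == 'up':
            print_lcs(b, X, i - 1, j)
        else:
            print_lcs(b, X, i, j - 1)

Calling c, b = lcs_length(X, Y) and then print_lcs(b, X, len(X), len(Y)) prints one longest common subsequence.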
4.1 Correctness of Dijkstra's algorithm
Let s be the start node/source node, v.d be the “distance estimate” of a vertex v, and δ(u, v)
be the true distance from u to v. We want to prove two statements:
1. At any point in time, v.d ≥ δ(s, v).
2. When v is extracted from the queue, v.d = δ(s, v). (Distance estimates never increase,
so once v.d = δ(s, v), it stays that way.)
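For reference, here is a minimal sketch of the algorithm being analyzed, assuming the graph is represented as a dict mapping each vertex to a list of (neighbor, weight) pairs (the heap-based queue and the stale-entry check are implementation choices, not part of the proof):

    import heapq

    def dijkstra(graph, s):
        d = {v: float('inf') for v in graph}   # distance estimates
        d[s] = 0
        S = set()                              # processed vertices
        pq = [(0, s)]
        while pq:
            dist, u = heapq.heappop(pq)        # extract the vertex with minimum estimate
            if u in S:
                continue                       # stale queue entry; skip it
            S.add(u)
            for v, w in graph[u]:              # relax every edge out of u
                if d[u] + w < d[v]:
                    d[v] = d[u] + w
                    heapq.heappush(pq, (d[v], v))
        return d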
To prove statement 1, we show that at any point in time, if v.d < ∞, then v.d is the weight of some path from s to v (not necessarily a shortest path). Since δ(s, v) is the weight of a shortest path from s to v, the conclusion follows immediately.
We induct on the number of Relax operations. As our base case, we know that s.d = 0 = δ(s, s), and all other distance estimates are ∞, which is greater than or equal to their true distances.
As our inductive step, assume that at some point in time, every finite distance estimate is the weight of some path from s to the corresponding vertex. Now when a Relax operation is performed on an edge (x, u), the distance estimate u.d may be changed to x.d + w(x, u). We know x.d is the weight of some path from s to x. If we add the edge (x, u) to the end of that path, the weight of the resulting path is x.d + w(x, u), which is the new value of u.d.
Alternatively, the distance estimate u.d may not change at all during the Relax step. In that
case we already know (from the inductive hypothesis) that u.d is the weight of some path
from s to u, so the inductive step is still satisfied.
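For concreteness, the Relax operation being counted in this induction might be sketched as follows (an assumption: d is a dict of distance estimates and w maps edges to weights):

    def relax(x, u, w, d):
        # Try to improve u's distance estimate using the edge (x, u).
        if d[x] + w[(x, u)] < d[u]:
            d[u] = d[x] + w[(x, u)]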
To prove statement 2, we induct on the order in which we add nodes to S. For the base case, s is added to S when s.d = δ(s, s) = 0, so the claim holds.
For the inductive step, assume that the claim holds for all nodes that are currently in S,
and let x be the node in Q that currently has the minimum distance estimate. (This is the
node that is about to be extracted from the queue.) We will show that x.d = δ(s, x).
Suppose p is a shortest path from s to x. Suppose z is the node on p closest to x for which
z.d = δ(s, z). (We know z exists because there is at least one such node, namely s, where
s.d = δ(s, s).) This means for every node y on the path p between z (not inclusive) and x
(inclusive), we have y.d > δ(s, y).
If z = x, then x.d = δ(s, x), so we are done.
So suppose z ≠ x. Then there is a node z′ after z on p (which might equal x). We argue that z.d = δ(s, z) ≤ δ(s, x) ≤ x.d.
• δ(s, x) ≤ x.d from the previous lemma, and z.d = δ(s, z) by assumption.
• δ(s, z) ≤ δ(s, x) because subpaths of shortest paths are also shortest paths. That is, if s → · · · → z → · · · → x is a shortest path to x, then the subpath s → · · · → z is a shortest path to z. To see this, suppose there were a shorter path from s to z. Then we could "glue that path into" the path from s to x, producing a shorter path to x and contradicting the fact that the s-to-x path was a shortest path.
Now we want to collapse these inequalities into equalities, by proving that z.d = x.d. Assume (by way of contradiction) that z.d < x.d. Because x has the minimum distance estimate out of all the unprocessed vertices, z cannot still be in the queue, so z has already been added to S. This means that all of the edges coming out of z have already been relaxed by our algorithm, which means that z′.d ≤ δ(s, z) + w(z, z′) = δ(s, z′). (This last equality holds because z precedes z′ on the shortest path to x, so z is on the shortest path to z′.) Combined with statement 1, which says z′.d ≥ δ(s, z′), this gives z′.d = δ(s, z′).

However, this contradicts the assumption that z is the closest node to x on the path with a correct distance estimate, since z′ is closer to x than z is. Thus z.d = x.d, so the chain of inequalities collapses into equalities and x.d = δ(s, x), completing the induction.