CS 161 Lecture 12 – Dynamic Programming Jessica Su (some parts copied from CLRS)

Dynamic programming is a problem-solving method that is applicable to many different
types of problems. I think it is best learned by example, so we will mostly do examples
today.

1 Rod cutting
Suppose you have a rod of length n, and you want to cut up the rod and sell the pieces in
a way that maximizes the total amount of money you get. A piece of length i is worth p_i
dollars.

For example, if you have a rod of length 4 (using the sample prices from CLRS, where
p_1 = 1, p_2 = 5, p_3 = 8, and p_4 = 9), there are eight different ways to cut it, and the
best strategy is cutting it into two pieces of length 2, which gives you 10 dollars.
Exercise: How many ways are there to cut up a rod of length n?
Answer: 2^(n−1), because there are n − 1 places where we can choose to make cuts, and at
each place, we either make a cut or we do not make a cut.
Despite the exponentially large possibility space, we can use dynamic programming to write
an algorithm that runs in Θ(n²) time.


1.1 Basic approach

First we ask “what is the maximum amount of money we can get?” And later we can extend
the algorithm to give us the actual rod decomposition that leads to that maximum value.
Let r_i be the maximum amount of money you can get with a rod of size i. We can view the
problem recursively as follows:
• First, cut a piece off the left end of the rod, and sell it.
• Then, find the optimal way to cut the remainder of the rod.
Now we don’t know how large a piece we should cut off. So we try all possible cases. First
we try cutting a piece of length 1, and combining it with the optimal way to cut a rod of
length n − 1. Then we try cutting a piece of length 2, and combining it with the optimal
way to cut a rod of length n − 2. We try all the possible lengths and then pick the best one.
We end up with the recurrence

    r_n = max_{1 ≤ i ≤ n} (p_i + r_{n−i}),    with r_0 = 0.

(Note that by allowing i to be n, we handle the case where the rod is not cut at all.)

1.1.1 Naive algorithm

This formula immediately translates into a recursive algorithm.
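For concreteness, here is a minimal Python sketch of that recursion. It assumes the prices
are given as a list p where p[i] is the price of a piece of length i (p[0] is unused); the
function name and details are illustrative, not the notes' exact pseudocode.

```python
def cut_rod(p, n):
    """Return the maximum revenue obtainable from a rod of length n.

    p[i] is the price of a piece of length i; p[0] is unused.
    This naive recursion takes exponential time because it re-solves
    the same subproblems over and over.
    """
    if n == 0:
        return 0
    best = float("-inf")
    for i in range(1, n + 1):          # try every length for the first piece
        best = max(best, p[i] + cut_rod(p, n - i))
    return best
```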


However, the computation time is exponential, because there are so many recursive calls. If you
draw the recursion tree, you will see that we are actually doing a lot of extra work, because
we are computing the same things over and over again. For example, in the computation for
n = 4, we compute the optimal solution for n = 1 four times!
It is much better to compute it once, and then refer to it in future recursive calls.

1.1.2 Memoization (top down approach)

One way we can do this is by writing the recursion as normal, but storing the results of the
recursive calls; if we need a result again in a future recursive call, we can use the precomputed
value. The answer will be stored in r[n].
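A possible top-down sketch in Python (the array r follows the text; the helper function and
other names are illustrative):

```python
def memoized_cut_rod(p, n):
    """Top-down rod cutting with memoization; the answer ends up in r[n]."""
    r = [None] * (n + 1)               # r[i] caches the best revenue for length i
    r[0] = 0

    def helper(j):
        if r[j] is not None:           # already computed: reuse the stored value
            return r[j]
        best = float("-inf")
        for i in range(1, j + 1):      # try every length for the first piece
            best = max(best, p[i] + helper(j - i))
        r[j] = best
        return best

    helper(n)
    return r[n]
```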


Runtime: Θ(n²). Each subproblem is solved exactly once, and to solve a subproblem of
size i, we run through i iterations of the for loop. So the total number of iterations of the
for loop, over all recursive calls, forms an arithmetic series, which produces Θ(n²) iterations
in total.

1.1.3 Bottom up approach

Here we proactively compute the solutions for smaller rods first, knowing that they will later
be used to compute the solutions for larger rods. The answer will once again be stored in
r[n].
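A bottom-up sketch in Python, under the same assumptions about p as above:

```python
def bottom_up_cut_rod(p, n):
    """Bottom-up rod cutting; r[j] is the best revenue for a rod of length j."""
    r = [0] * (n + 1)
    for j in range(1, n + 1):          # solve subproblems in order of increasing size
        best = float("-inf")
        for i in range(1, j + 1):      # best choice of first piece for length j
            best = max(best, p[i] + r[j - i])
        r[j] = best
    return r[n]
```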

Often the bottom up approach is simpler to write, and has less overhead, because you don’t
have to keep a recursive call stack. Most people will write the bottom up procedure when
they implement a dynamic programming algorithm.
Runtime: Θ(n²), because of the double for loop.

1.1.4 Reconstructing a solution

If we want to actually find the optimal way to split the rod, instead of just finding the
maximum profit we can get, we can create another array s, and let s[j] = i if we determine
that the best thing to do when we have a rod of length j is to cut off a piece of length i.
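One way to do this, extending the bottom-up sketch above (names are illustrative):

```python
def extended_bottom_up_cut_rod(p, n):
    """Like bottom_up_cut_rod, but also records the optimal first cut.

    Returns (r, s), where r[j] is the best revenue for a rod of length j
    and s[j] is the length of the first piece to cut off in an optimal
    solution for length j.
    """
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    for j in range(1, n + 1):
        best = float("-inf")
        for i in range(1, j + 1):
            if p[i] + r[j - i] > best:
                best = p[i] + r[j - i]
                s[j] = i               # remember the best first cut for length j
        r[j] = best
    return r, s
```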

Using these values s[j], we can reconstruct a rod decomposition as follows:
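A sketch of this reconstruction loop, using the arrays computed above:

```python
def print_cut_rod_solution(p, n):
    """Print an optimal list of piece lengths for a rod of length n."""
    _, s = extended_bottom_up_cut_rod(p, n)
    while n > 0:
        print(s[n])                    # cut off the best first piece...
        n -= s[n]                      # ...and repeat on the remainder
```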

1.1.5 Answer to example problem

In our example, the program produces the maximum revenue together with an optimal list of cuts.
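As an illustration, assuming the sample price table from CLRS (lengths 1 through 10 priced at
1, 5, 8, 9, 10, 17, 17, 20, 24, 30), the sketches above could be used like this:

```python
p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]   # CLRS sample prices; p[0] is unused

print(bottom_up_cut_rod(p, 4))   # 10: the best split for length 4 is 2 + 2
print_cut_rod_solution(p, 4)     # prints 2, then 2
```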

2 General dynamic programming remarks


2.0.1 Optimal substructure

To solve an optimization problem using dynamic programming, we must first characterize the
structure of an optimal solution. Specifically, we must prove that we can create an optimal
solution to a problem using optimal solutions to subproblems. (Then we can store all the
optimal solutions in an array and compute later elements in the array from earlier elements
in the array.)


We can’t really use dynamic programming if the optimal solution to a problem might not
require subproblem solutions to be optimal. This often happens when the subproblems are
not independent of each other.

2.0.2 Overlapping subproblems

For dynamic programming to be useful, the recursive algorithm should require us to compute
optimal solutions to the same subproblems over and over again, because in this case, we
benefit from just computing them once and then using the results later.
In total, there should be a small number of distinct subproblems (i.e., polynomial in the
input size), even if the naive recursion ends up solving an exponential number of subproblems overall.
Note that in the divide-and-conquer algorithms we saw earlier in the class, the number of
subproblems gets exponentially larger on each level of the recursion tree, but the size of
the subproblems gets exponentially smaller. In these dynamic programming algorithms, the
number of distinct subproblems should be polynomial, but the size of the subproblems might
decrease by 1 every time.

3 Longest common subsequence


Suppose we have a sequence of letters ACCGGTC. Then a subsequence of this sequence would
be like ACCG or ACTC or CCC. To get ACCG, we pick the first four letters. To get ACTC, we pick
letters 1, 2, 6, and 7. To get CCC, we pick letters 2, 3, and 7, etc.
Formally, given a sequence X = x_1, x_2, . . . , x_m, another sequence Z = z_1, . . . , z_k is a
subsequence of X if there exists a strictly increasing sequence i_1, i_2, . . . , i_k of indices of X
such that for all j = 1, 2, . . . , k, we have x_{i_j} = z_j.
In the longest-common-subsequence problem, we are given two sequences X and Y , and
want to find the longest possible sequence that is a subsequence of both X and Y .
For example, if X = ABCBDAB and Y = BDCABA, the sequence BCA is a common subsequence of
both X and Y. However, it is not a longest common subsequence, because BCBA is a longer
sequence that is also common to both X and Y. Both BCBA and BDAB are longest common
subsequences, since there are no common subsequences of length 5 or greater.

3.0.1 Optimal substructure

The first step to solving this using dynamic programming is to say that we can create an
optimal solution to this problem using optimal solutions to subproblems. The hardest part
is to decide what the subproblems are.
Here there are two possible cases:


1. The last elements of X and Y are equal. Then this element must be part of the longest
common subsequence. So we can chop it off the end of both sequences (adding it to the
common subsequence) and find the longest common subsequence of the smaller sequences.
2. The last elements are different. Then either the last element of X or the last element
of Y cannot be part of the longest common subsequence. So now we can find the
longest common subsequence of X and a smaller version of Y , or the longest common
subsequence of Y and a smaller version of X.
Formally, let X = x_1, . . . , x_m and Y = y_1, . . . , y_n be sequences, and let Z = z_1, . . . , z_k be any
longest common subsequence of X and Y. Let X_i refer to the first i elements of X, and Y_i
refer to the first i elements of Y, etc.
Then
1. If x_m = y_n, then z_k = x_m = y_n and Z_{k−1} is a longest common subsequence of X_{m−1}
and Y_{n−1}.
2. If x_m ≠ y_n, then z_k ≠ x_m implies that Z is a longest common subsequence of X_{m−1}
and Y.
3. If x_m ≠ y_n, then z_k ≠ y_n implies that Z is a longest common subsequence of X and
Y_{n−1}.
Using this theorem, we show that the longest common subsequence problem can always be
solved by finding the longest common subsequence of smaller problems, and then combining
the solutions.
Proof:
1. If z_k ≠ x_m, then we could append x_m = y_n to Z to obtain a common subsequence of
X and Y of length k + 1, which contradicts the fact that Z is a longest common
subsequence. So z_k = x_m = y_n. Now Z_{k−1} is a common subsequence of X_{m−1} and Y_{n−1}
of length k − 1. It must be a longest common subsequence, because if W were a common
subsequence of X_{m−1} and Y_{n−1} with length greater than k − 1, then appending x_m = y_n
to W would produce a common subsequence of X and Y whose length is greater than k, which is
a contradiction.
2. If z_k ≠ x_m, then Z does not use the last element of X, so Z is a common subsequence of
X_{m−1} and Y. It must be a longest one: if there were a common subsequence W of X_{m−1} and Y
with length greater than k, then W would also be a common subsequence of X and Y, so Z would
not be a longest common subsequence. Contradiction.
3. Same as the proof for part (2), with the roles of X and Y swapped.


3.0.2 A recursive solution

To store the solutions to subproblems, this time we use a 2D matrix, instead of a 1D array.
We want to compute all the values c[i, j], which are the lengths of the longest common
subsequences between the first i elements of X and the first j elements of Y . At the end,
the answer will be stored in c[m, n].
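Written out in the notation above, the recurrence (the standard CLRS formulation) is

    c[i, j] = 0                               if i = 0 or j = 0
    c[i, j] = c[i − 1, j − 1] + 1             if i, j > 0 and x_i = y_j
    c[i, j] = max(c[i − 1, j], c[i, j − 1])   if i, j > 0 and x_i ≠ y_j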

3.0.3 Dynamic programming algorithm

Using this recurrence, we can write the actual pseudocode. Observe that it is necessary to
populate the table in a certain order, because some elements of the table depend on other
elements of the table having already been computed.
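A Python sketch of the table-filling procedure (the marker array b, which records which case
produced each entry, is one common way to support the reconstruction step in the next section;
all names here are illustrative):

```python
def lcs_length(X, Y):
    """Fill in c, where c[i][j] is the length of an LCS of the first i
    elements of X and the first j elements of Y.

    Also fills in b, which records which case produced each entry so that
    a longest common subsequence can be reconstructed afterwards.
    """
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):              # fill row by row...
        for j in range(1, n + 1):          # ...so the entries we need already exist
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "diag"           # X[i-1] is part of the LCS
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "up"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "left"
    return c, b
```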

The algorithm fills in the table row by row; each entry c[i, j] depends only on c[i − 1, j − 1], c[i − 1, j], and c[i, j − 1], all of which have already been computed.


3.0.4 Reconstructing the solution

To reconstruct the solution, we just print the elements that we marked as part of the longest
common subsequence.
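A sketch of that reconstruction, building on the lcs_length sketch above:

```python
def print_lcs(b, X, i, j):
    """Print an LCS by following the markers in b back from entry (i, j)."""
    if i == 0 or j == 0:
        return
    if b[i][j] == "diag":                  # X[i-1] was marked as part of the LCS
        print_lcs(b, X, i - 1, j - 1)
        print(X[i - 1], end="")
    elif b[i][j] == "up":
        print_lcs(b, X, i - 1, j)
    else:
        print_lcs(b, X, i, j - 1)

# The example from earlier: one LCS of ABCBDAB and BDCABA.
c, b = lcs_length("ABCBDAB", "BDCABA")
print_lcs(b, "ABCBDAB", 7, 6)              # prints BCBA (one valid answer)
print()
```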


4 Correctness proof for Dijkstra’s algorithm (if there is time)
While breadth-first search computes shortest paths in an unweighted graph, Dijkstra’s
algorithm is a way of computing shortest paths in a weighted graph. Specifically, Dijkstra’s
algorithm computes the shortest paths from a source node s to every other node in the graph.
The idea is that we keep “distance estimates” for every node in the graph (which are always
at least the true distance from the start node). On each iteration of the algorithm we
process the (unprocessed) vertex with the smallest distance estimate. (We can prove that by
the time we get around to processing a vertex, its distance estimate equals the true distance
to that vertex. This is nontrivial and must be proven.)
Whenever we process a vertex, we update the distance estimates of its neighbors, to account
for the possibility that we may be reaching those neighbors through that vertex. Specifically,
if we are processing u, and there is an edge from u → v with weight w, we change v’s distance
estimate v.d to be the minimum of its current value and u.d+w. (It’s possible that v.d doesn’t
change at all, for example, if the shortest path from s to v was through a different vertex.)
If we did lower the estimate v.d, we set v’s parent to be u, to signify that (we think) the
best way to reach v is through u. The parent may change multiple times through the course
of the algorithm, and at the end, the parent-child relations form a shortest path tree, where
the path (along tree edges) from s to any node in the tree is a shortest path to that node.
Note that the shortest path tree is very much like the breadth first search tree.
Important: Dijkstra’s algorithm does not handle graphs with negative edge weights! For
that you would need to use a different algorithm, such as Bellman-Ford.
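For reference (this is not the notes’ pseudocode), a heap-based Python sketch of the algorithm
being analyzed below, assuming graph[u] is a list of (v, weight) pairs and all weights are
non-negative:

```python
import heapq

def dijkstra(graph, s):
    """Single-source shortest paths from s in a graph with non-negative weights.

    Returns (dist, parent): dist[v] is the distance estimate v.d (equal to the
    true distance when the algorithm finishes), and parent records the
    shortest path tree.
    """
    dist = {v: float("inf") for v in graph}
    parent = {v: None for v in graph}
    dist[s] = 0
    processed = set()                       # the set S in the proof below
    heap = [(0, s)]
    while heap:
        _, u = heapq.heappop(heap)          # unprocessed vertex with smallest estimate
        if u in processed:
            continue                        # stale heap entry; skip it
        processed.add(u)
        for v, w in graph[u]:               # relax every edge leaving u
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w       # found a shorter way to reach v
                parent[v] = u               # (we think) the best way to reach v is through u
                heapq.heappush(heap, (dist[v], v))
    return dist, parent
```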


4.1 Correctness

Let s be the start node/source node, v.d be the “distance estimate” of a vertex v, and δ(u, v)
be the true distance from u to v. We want to prove two statements:
1. At any point in time, v.d ≥ δ(s, v).
2. When v is extracted from the queue, v.d = δ(s, v). (Distance estimates never increase,
so once v.d = δ(s, v), it stays that way.)


4.1.1 v.d ≥ δ(s, v)

We want to show that at any point in time, if v.d < ∞ then v.d is the weight of some path
from s to v (not necessarily the shortest path). Since δ(s, v) is the weight of the shortest
path from s to v, the conclusion will follow immediately.
We induct on the number of Relax operations. As our base case, we know that s.d = 0 = δ(s, s),
and all other distance estimates are ∞, which is greater than or equal to their true distance.
As our inductive step, assume that at some point in time, every finite distance estimate corresponds
to the weight of some path from s to that vertex. Now when a Relax operation is performed,
the distance estimate of some neighbor u.d may be changed to x.d + w(x, u), for some vertex
x. We know x.d is the weight of some path from s to x. If we add the edge (x, u) at the end
of that path, the weight of the resulting path is x.d + w(x, u), which is u.d.
Alternatively, the distance estimate u.d may not change at all during the Relax step. In that
case we already know (from the inductive hypothesis) that u.d is the weight of some path
from s to u, so the inductive step is still satisfied.

4.1.2 v.d = δ(s, v) when v is extracted from the queue

We induct on the order in which we add nodes to S (the set of processed vertices). For the base
case, s is added to S when s.d = δ(s, s) = 0, so the claim holds.
For the inductive step, assume that the claim holds for all nodes that are currently in S,
and let x be the node in Q that currently has the minimum distance estimate. (This is the
node that is about to be extracted from the queue.) We will show that x.d = δ(s, x).
Suppose p is a shortest path from s to x. Suppose z is the node on p closest to x for which
z.d = δ(s, z). (We know z exists because there is at least one such node, namely s, where
s.d = δ(s, s).) This means for every node y on the path p between z (not inclusive) and x
(inclusive), we have y.d > δ(s, y).
If z = x, then x.d = δ(s, x), so we are done.
So suppose z ≠ x. Then there is a node z′ after z on p (which might equal x). We argue
that z.d = δ(s, z) ≤ δ(s, x) ≤ x.d.


• δ(s, x) ≤ x.d from the previous lemma, and z.d = δ(s, z) by assumption.
• δ(s, z) ≤ δ(s, x) because subpaths of shortest paths are also shortest paths. That is, if
s → · · · → z → · · · → x is a shortest path to x, then the subpath s → · · · → z is a
shortest path to z. To see why, suppose there were a shorter path from s to z. Then we
could “glue that path into” the path from s to x, producing a shorter path and contradicting
the fact that the s-to-x path was a shortest path.
Now we want to collapse these inequalities into equalities, by proving that z.d = x.d. Assume
(by way of contradiction) that z.d < x.d. Because x has the minimum distance estimate
out of all the unprocessed vertices, it follows that z has already been added to S. This
means that all of the edges coming out of z have already been relaxed by our algorithm,
which means that z′.d ≤ z.d + w(z, z′) = δ(s, z) + w(z, z′) = δ(s, z′). (The last equality holds
because z precedes z′ on the shortest path to x, so z is on a shortest path to z′.) Combined
with the previous lemma, which says z′.d ≥ δ(s, z′), this gives z′.d = δ(s, z′).
However, this contradicts the fact that z is the closest node on the path to x with a correct
distance estimate. Thus z.d = x.d, so all of the inequalities above are equalities, and
x.d = δ(s, x), as desired.
