Chapter 3
Dynamic Programming
This chapter introduces basic ideas and methods of dynamic programming. 1 It sets
out the basic elements of a recursive optimization problem, describes the functional
equation (the Bellman equation), presents three methods for solving the Bellman
equation, and gives the Benveniste-Scheinkman formula for the derivative of the op-
timal value function. Let’s dive in.
To get started, consider the problem of choosing an infinite sequence of controls $\{u_t\}_{t=0}^{\infty}$ to maximize
$$\sum_{t=0}^{\infty} \beta^t r(x_t, u_t), \qquad (3.1.1)$$
subject to $x_{t+1} = g(x_t, u_t)$, with $x_0$ given.
1 This chapter is written in the hope of getting the reader to start using the methods
quickly. We hope to promote demand for further and more rigorous study of the
subject. In particular see Bertsekas (1976), Bertsekas and Shreve (1978), Stokey and
Lucas (with Prescott) (1989), Bellman (1957), and Chow (1981). This chapter covers
much of the same material as Sargent (1987b, chapter 1).
where the maximization is subject to x̃ = g(x, u) with x given, and x̃ denotes the
state next period. Thus, we have exchanged the original problem of finding an infinite
sequence of controls that maximizes expression (3.1.1 ) for the problem of finding the
optimal value function V (x) and a function h that solves the continuum of maximum
problems (3.1.4 )—one maximum problem for each value of x. This exchange doesn’t
look like progress, but we shall see that it often is.
Our task has become jointly to solve for V (x), h(x), which are linked by the
Bellman equation
$$V(x) = \max_{u}\, \{ r(x, u) + \beta V[g(x, u)] \}. \qquad (3.1.5)$$
The maximizer of the right side of equation (3.1.5 ) is a policy function h(x) that
satisfies
V (x) = r [x, h (x)] + βV {g [x, h (x)]}. (3.1.6)
Equation (3.1.5 ) or (3.1.6 ) is a functional equation to be solved for the pair of un-
known functions V (x), h(x).
Methods for solving the Bellman equation are based on mathematical structures
that vary in their details depending on the precise nature of the functions r and g . 2
2 There are alternative sets of conditions that make the maximization (3.1.4 ) well
behaved. One set of conditions is as follows: (1) r is concave and bounded, and
(2) the constraint set generated by g is convex and compact, that is, the set of
{(xt+1 , xt ) : xt+1 ≤ g(xt , ut )} for admissible ut is convex and compact. See Stokey,
Lucas, and Prescott (1989), and Bertsekas (1976) for further details of convergence
results. See Benveniste and Scheinkman (1979) and Stokey, Lucas, and Prescott
(1989) for the results on differentiability of the value function. In an appendix on
functional analysis, chapter A, we describe the mathematics for one standard set of
assumptions about (r, g). In chapter 5, we describe it for another set of assumptions
about (r, g).
All of these structures contain versions of the following four findings. Under various
particular assumptions about r and g , it turns out that
1. The functional equation (3.1.5 ) has a unique strictly concave solution.
2. This solution is approached in the limit as $j \to \infty$ by iterations on
$$V_{j+1}(x) = \max_{u}\, \{ r(x, u) + \beta V_j[g(x, u)] \}, \qquad (3.1.7)$$
subject to $\tilde{x} = g(x, u)$, $x$ given, starting from any bounded and continuous initial $V_0$ (a numerical sketch of these iterations appears below).
3. There is a unique and time invariant optimal policy of the form ut = h(xt ),
where h is chosen to maximize the right side of (3.1.5 ). 3
4. Off corners, the limiting value function $V$ is differentiable with
$$V'(x) = \frac{\partial r}{\partial x}[x, h(x)] + \beta\, \frac{\partial g}{\partial x}[x, h(x)]\, V'\{g[x, h(x)]\}. \qquad (3.1.8)$$
This is a version of the Benveniste-Scheinkman formula for the derivative of the value function. In the special case in which $\partial g/\partial x \equiv 0$, formula (3.1.8) becomes
$$V'(x) = \frac{\partial r}{\partial x}[x, h(x)]. \qquad (3.1.9)$$
3 The time invariance of the policy function $u_t = h(x_t)$ is very convenient econometrically, because we can impose a single decision rule for all periods. This lets us pool data across periods to estimate the free parameters of the return and transition functions that underlie the decision rule.
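To make the first of these methods, iteration on the Bellman equation, concrete, here is a minimal numerical sketch in Python. The state and control grids, the return function, and the transition law are illustrative assumptions chosen only so that the code runs; they are not taken from the text.

    import numpy as np

    # Illustrative primitives (assumptions): one-dimensional state x, control u equal to
    # next period's state, return r(x, u) = ln(x - u + 1), transition g(x, u) = u.
    beta = 0.95
    x_grid = np.linspace(0.1, 10.0, 200)       # discretized state space
    V = np.zeros_like(x_grid)                  # any bounded continuous V_0 will do; here V_0 = 0

    for j in range(1000):
        V_new = np.empty_like(V)
        policy = np.empty_like(V)
        for i, x in enumerate(x_grid):
            u = x_grid[x_grid < x + 1.0]       # feasible controls keep the return well defined
            values = np.log(x - u + 1.0) + beta * np.interp(u, x_grid, V)
            V_new[i] = values.max()
            policy[i] = u[values.argmax()]
        if np.max(np.abs(V_new - V)) < 1e-8:   # stop when the iterates settle down
            break
        V = V_new

    # V approximates the unique fixed point of the Bellman operator, and `policy`
    # approximates the time-invariant optimal decision rule h(x).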
Guess and verify. A second method involves guessing and verifying a solution
V to equation (3.1.5 ). This method relies on the uniqueness of the solution to the
equation, but because it relies on luck in making a good guess, it is not generally
available.
Howard's improvement algorithm. A third method iterates on policies rather than on value functions; it consists of the following steps (a numerical sketch follows the list):
1. Pick a feasible policy, $u = h_0(x)$, and compute the value associated with operating forever with that policy:
$$V_{h_j}(x) = \sum_{t=0}^{\infty} \beta^t r[x_t, h_j(x_t)],$$
where $x_{t+1} = g[x_t, h_j(x_t)]$, for each $x$.
2. Generate a new policy $u = h_{j+1}(x)$ that solves the two-period problem
$$\max_{u}\, \{ r(x, u) + \beta V_{h_j}[g(x, u)] \},$$
for each $x$.
3. Iterate over j to convergence on steps 1 and 2.
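A minimal sketch of these three steps in Python, using the same illustrative primitives as the value-iteration sketch above (grids, return, and transition are assumptions, not from the text). Because the control is a grid point, step 1 reduces to solving a linear system and step 2 to a row-wise maximization.

    import numpy as np

    # Illustrative primitives (assumptions): control u = next period's state,
    # r(x, u) = ln(x - u + 1), g(x, u) = u, on a common grid.
    beta = 0.95
    x_grid = np.linspace(0.1, 10.0, 200)
    n = len(x_grid)

    X, U = np.meshgrid(x_grid, x_grid, indexing="ij")      # X[i, a] = state, U[i, a] = control
    feasible = U < X + 1.0
    R = np.where(feasible, np.log(np.where(feasible, X - U + 1.0, 1.0)), -np.inf)

    policy = np.zeros(n, dtype=int)                        # h_0: always move to the lowest grid point
    for j in range(500):
        # Step 1: evaluate V_{h_j} by solving V = r_h + beta * P_h V for the fixed policy.
        P = np.zeros((n, n))
        P[np.arange(n), policy] = 1.0
        r_h = R[np.arange(n), policy]
        V = np.linalg.solve(np.eye(n) - beta * P, r_h)
        # Step 2: improve the policy by maximizing r(x, u) + beta * V[g(x, u)] over u.
        new_policy = np.argmax(R + beta * V[None, :], axis=1)
        # Step 3: iterate until the policy stops changing.
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy

    # x_grid[policy] approximates the optimal decision rule h(x), and V approximates V(x).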
4 See the appendix on functional analysis for what it means for a sequence of
functions to converge.
5 A proof of the uniform convergence of iterations on equation (3.1.10 ) is contained
in the appendix on functional analysis, chapter A.
As an example, consider the problem of choosing sequences of consumption and capital $\{c_t, k_{t+1}\}_{t=0}^{\infty}$ to maximize
$$\sum_{t=0}^{\infty} \beta^t \ln(c_t)$$
subject to $c_t + k_{t+1} \leq A k_t^{\alpha}$, with $k_0$ given, $A > 0$, and $0 < \alpha < 1$.
This problem can be solved “by hand,” using any of our three methods. We begin
with iteration on the Bellman equation. Start with v0 (k) = 0 , and solve the one-
period problem: choose c to maximize ln(c) subject to c + k̃ = Ak α . The solution
is evidently to set $c = Ak^{\alpha}$, $\tilde k = 0$, which produces an optimized value $v_1(k) = \ln A + \alpha \ln k$. At the second step, we find $c = \frac{1}{1+\beta\alpha} Ak^{\alpha}$, $\tilde k = \frac{\beta\alpha}{1+\beta\alpha} Ak^{\alpha}$, and
$$v_2(k) = \ln \frac{A}{1+\alpha\beta} + \beta \ln A + \alpha\beta \ln \frac{\alpha\beta A}{1+\alpha\beta} + \alpha(1+\alpha\beta) \ln k.$$
Continuing, and using the algebra of geometric series, gives the limiting policy functions $c = (1-\beta\alpha)Ak^{\alpha}$, $\tilde k = \beta\alpha A k^{\alpha}$, and the value function
$$v(k) = (1-\beta)^{-1} \left\{ \ln[A(1-\beta\alpha)] + \frac{\beta\alpha}{1-\beta\alpha} \ln(A\beta\alpha) \right\} + \frac{\alpha}{1-\beta\alpha} \ln k.$$
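As a quick numerical check of these limiting objects (a sketch only; the parameter values are arbitrary assumptions), the following code verifies at a few values of $k$ that the claimed $v(k)$ satisfies the Bellman equation $v(k) = \max_{\tilde k}\{\ln(Ak^{\alpha}-\tilde k) + \beta v(\tilde k)\}$ and that the maximizer is $\tilde k = \beta\alpha Ak^{\alpha}$.

    import numpy as np

    A, alpha, beta = 1.0, 0.33, 0.95            # arbitrary parameter values for the check

    def v(k):
        """The claimed limiting value function of the log / Cobb-Douglas example."""
        E = (1 / (1 - beta)) * (np.log(A * (1 - beta * alpha))
                                + (beta * alpha / (1 - beta * alpha)) * np.log(A * beta * alpha))
        return E + (alpha / (1 - beta * alpha)) * np.log(k)

    for k in [0.1, 0.5, 1.0, 2.0]:
        y = A * k ** alpha
        k_next = np.linspace(1e-6, y - 1e-6, 200_000)      # fine grid of candidate k~
        rhs = np.log(y - k_next) + beta * v(k_next)        # ln(c) + beta * v(k~)
        assert abs(rhs.max() - v(k)) < 1e-3                # the Bellman equation holds
        assert abs(k_next[rhs.argmax()] - beta * alpha * y) < 1e-3   # maximizer is beta*alpha*A*k**alpha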
Here is how the guess-and-verify method applies to this problem. Since we already
know the answer, we’ll guess a function of the correct form, but leave its coefficients
undetermined. 8 Thus, we make the guess
v (k) = E + F ln k, (3.1.12)
where E and F are undetermined constants. The left and right sides of equation
(3.1.12 ) must agree for all values of k . For this guess, the first-order necessary
condition for the maximum problem on the right side of equation (3.1.10 ) implies the
following formula for the optimal policy k̃ = h(k), where k̃ is next period’s value and
k is this period’s value of the capital stock:
$$\tilde k = \frac{\beta F}{1+\beta F}\, A k^{\alpha}. \qquad (3.1.13)$$
Substitute equation (3.1.13) into the Bellman equation and equate the result to the right side of equation (3.1.12). Solving the resulting equation for $E$ and $F$ gives $F = \alpha/(1-\alpha\beta)$ and $E = (1-\beta)^{-1}\left[\ln A(1-\alpha\beta) + \frac{\beta\alpha}{1-\alpha\beta} \ln A\beta\alpha\right]$. It follows that
$$\tilde k = \beta\alpha A k^{\alpha}. \qquad (3.1.14)$$
Note that the term $F = \alpha/(1-\alpha\beta)$ can be interpreted as a geometric sum $\alpha[1 + \alpha\beta + (\alpha\beta)^2 + \cdots]$.
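A small arithmetic check of the last two steps (the parameter values are arbitrary assumptions): $F$ matches its geometric-sum interpretation, and with this $F$ the policy (3.1.13) collapses to (3.1.14).

    alpha, beta = 0.33, 0.95                     # arbitrary parameter values for the check

    F = alpha / (1 - alpha * beta)

    # F as the geometric sum alpha * [1 + (alpha*beta) + (alpha*beta)**2 + ...].
    partial_sum = alpha * sum((alpha * beta) ** n for n in range(200))
    assert abs(F - partial_sum) < 1e-12

    # With this F, equation (3.1.13) reads k~ = [beta*F / (1 + beta*F)] * A * k**alpha,
    # and beta*F / (1 + beta*F) equals beta*alpha, which is equation (3.1.14).
    assert abs(beta * F / (1 + beta * F) - beta * alpha) < 1e-12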
Equation (3.1.14) shows that the optimal policy is to have capital move according to the difference equation $k_{t+1} = A\beta\alpha k_t^{\alpha}$, or $\ln k_{t+1} = \ln A\beta\alpha + \alpha \ln k_t$. That $\alpha$ is less than 1 implies that $k_t$ converges as $t$ approaches infinity for any positive initial value $k_0$. The stationary point is given by the solution of $k_\infty = A\beta\alpha k_\infty^{\alpha}$, or $k_\infty^{\alpha-1} = (A\beta\alpha)^{-1}$, that is, $k_\infty = (A\beta\alpha)^{1/(1-\alpha)}$.
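A few lines of Python confirm the convergence claim by iterating the difference equation from an arbitrary positive $k_0$ (parameter values are again assumptions):

    A, alpha, beta = 1.0, 0.33, 0.95                  # assumed parameter values
    k_inf = (A * beta * alpha) ** (1 / (1 - alpha))   # stationary point

    k = 5.0                                           # any positive initial value k_0
    for t in range(200):
        k = A * beta * alpha * k ** alpha             # k_{t+1} = A*beta*alpha*k_t**alpha

    assert abs(k - k_inf) < 1e-12                     # k_t has converged to the stationary point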
an Euler equation that is exploited extensively in the theories of finance, growth, and
real business cycles.
subject to
$$x_{t+1} = g(x_t, u_t, \epsilon_{t+1}), \qquad (3.2.2)$$
with $x_0$ given, where $\epsilon_{t+1}$ is a random shock. The Bellman equation becomes
$$V(x) = \max_{u}\, \left\{ r(x, u) + \beta E\big[ V[g(x, u, \epsilon)] \,\big|\, x \big] \right\}, \qquad (3.2.3)$$
and the value function can again be computed by iterating on
$$V_{j+1}(x) = \max_{u}\, \left\{ r(x, u) + \beta E\big[ V_j[g(x, u, \epsilon)] \,\big|\, x \big] \right\},$$
starting from any bounded continuous initial $V_0$. Under various particular regularity conditions, there obtain versions of the same four properties listed earlier. 9
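A minimal sketch of these stochastic iterations in Python, using a growth technology like the one in the exercise below but with an illustrative two-point i.i.d. shock in place of the lognormal one; all specific numbers are assumptions.

    import numpy as np

    # Illustrative primitives (assumptions): c + k' = A*k**alpha*theta, log utility,
    # i.i.d. shock theta taking two values with equal probability.
    A, alpha, beta = 1.0, 0.33, 0.95
    theta_vals = np.array([0.9, 1.1])
    k_grid = np.linspace(0.01, 0.6, 150)

    V = np.zeros((len(k_grid), len(theta_vals)))          # V_0 = 0
    for j in range(2000):
        EV = V.mean(axis=1)                               # E[V_j(k', theta')] for i.i.d. theta'
        V_new = np.empty_like(V)
        for i, k in enumerate(k_grid):
            for s, theta in enumerate(theta_vals):
                y = A * k ** alpha * theta
                kp = k_grid[k_grid < y]                   # feasible k' keeps consumption positive
                V_new[i, s] = np.max(np.log(y - kp) + beta * np.interp(kp, k_grid, EV))
        if np.max(np.abs(V_new - V)) < 1e-8:
            break
        V = V_new

    # The maximizing k' at each (k, theta) approximates the decision rule h(k, theta);
    # for this technology it is close to k' = alpha*beta*A*k**alpha*theta.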
The first-order necessary condition for the problem on the right side of equation
(3.2.3 ) is
$$\frac{\partial r}{\partial u}(x, u) + \beta E\left\{ \frac{\partial g}{\partial u}(x, u, \epsilon)\, V'[g(x, u, \epsilon)] \,\Big|\, x \right\} = 0,$$
which we obtained simply by differentiating the right side of equation (3.2.3 ), passing
the differentiation operation under the E (an integration) operator. Off corners, the
value function satisfies
$$V'(x) = \frac{\partial r}{\partial x}[x, h(x)] + \beta E\left\{ \frac{\partial g}{\partial x}[x, h(x), \epsilon]\, V'(g[x, h(x), \epsilon]) \,\Big|\, x \right\}.$$
In the special case in which $\partial g/\partial x \equiv 0$, the formula for $V'(x)$ becomes
$$V'(x) = \frac{\partial r}{\partial x}[x, h(x)].$$
Substituting this formula into the first-order necessary condition for the problem gives
the stochastic Euler equation
$$\frac{\partial r}{\partial u}(x, u) + \beta E\left\{ \frac{\partial g}{\partial u}(x, u, \epsilon)\, \frac{\partial r}{\partial x}(\tilde x, \tilde u) \,\Big|\, x \right\} = 0,$$
where $\tilde x = g(x, u, \epsilon)$ denotes next period's state and $\tilde u = h(\tilde x)$ next period's control under the policy $h$,
9 See Stokey and Lucas (with Prescott) (1989), or the framework presented in the
appendix on functional analysis, chapter A.
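As an illustration of the stochastic Euler equation, the following sketch checks it by simulation for the log / Cobb-Douglas technology of the exercise below. The policy $c = (1-\alpha\beta)Ak^{\alpha}\theta$ used here is an assumption for the check (it is the natural guess by analogy with the deterministic example above), and the parameter values are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    A, alpha, beta, sigma = 1.0, 0.33, 0.95, 0.1          # assumed parameter values

    k, theta = 0.2, 1.05                                   # an arbitrary current state
    c = (1 - alpha * beta) * A * k ** alpha * theta        # assumed consumption policy
    k_next = A * k ** alpha * theta - c                    # implied k' = alpha*beta*A*k**alpha*theta

    theta_next = np.exp(sigma * rng.standard_normal(100_000))   # ln(theta') ~ N(0, sigma**2)
    c_next = (1 - alpha * beta) * A * k_next ** alpha * theta_next

    # Euler equation for this example: 1/c = beta * E[ alpha*A*k'**(alpha-1)*theta' / c' ].
    rhs = beta * np.mean(alpha * A * k_next ** (alpha - 1) * theta_next / c_next)
    print(1 / c, rhs)    # the two sides agree (exactly here, since theta' cancels under log utility)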
Exercise
Consider the problem of choosing $\{c_t, k_{t+1}\}_{t=0}^{\infty}$ to maximize $E_0 \sum_{t=0}^{\infty} \beta^t \ln c_t$ subject to $c_t + k_{t+1} \leq A k_t^{\alpha} \theta_t$, $k_0$ given, $A > 0$, $1 > \alpha > 0$, where $\{\theta_t\}$ is an i.i.d. sequence with $\ln \theta_t$ distributed according to a normal distribution with mean zero and variance $\sigma^2$.
Consider the following algorithm. Guess at a policy of the form $k_{t+1} = h_0(A k_t^{\alpha} \theta_t)$ for any constant $h_0 \in (0, 1)$. Then form
$$J_0(k_0, \theta_0) = E_0 \sum_{t=0}^{\infty} \beta^t \ln\left(A k_t^{\alpha} \theta_t - h_0 A k_t^{\alpha} \theta_t\right).$$
Next, form a new policy by maximizing
$$\ln\left(A k^{\alpha} \theta - \tilde k\right) + \beta E J_0(\tilde k, \tilde\theta),$$