THE BELLMAN PRINCIPLE OF OPTIMALITY
IOANID ROSU
As I understand it, there are two approaches to dynamic optimization: the Pontrjagin (Hamiltonian) approach, and the Bellman approach. I saw several clear discussions of the Hamiltonian approach (Barro & Sala-i-Martin, Blanchard & Fischer, D. Romer), but to my surprise, I didn’t see any clear treatment of the Bellman principle. What I have seen so far (Duffie, Chapters 3 and 9, and the Appendix of Mas-Colell, Whinston & Green) is confusing to me. I guess I should take a look at some dynamic optimization textbook, but I’m too lazy for that. Instead, I’m going to try to figure it out on my own, hoping that my freshness on the subject can be put to use.
The first four sections are only about local conditions for having a (finite) optimum. In the
last section I will discuss global conditions for optima in Bellman’s framework, and give an
example where I solve the problem completely. As a bonus, in Section 5, I use the Bellman
method to derive the Euler–Lagrange equation of variational calculus.
In addition, we impose a budget constraint, which for many examples is the restriction that k_t be eventually positive (i.e. lim inf_t k_t ≥ 0). This budget constraint excludes explosive solutions
for ct , so that we can apply the Bellman method. I won’t mention the budget constraint until
the last section, but we should keep in mind that without it (or some constraint like it), we
might have no solution.
The usual names for the variables involved are: c_t is the control variable (because it is under the control of the decision maker), and k_t is the state variable (because it describes the state of the system at the beginning of t, when the agent makes the decision). In this paper, I call the equation k_{t+1} = g(t, k_t, c_t) the “state equation”; I don’t know what it is called in the literature.
To get some intuition about the problem, think of kt as capital available for production
at time t, and of ct as consumption at t. At time 0, for a starting level of capital k0 , the
consumer chooses the level of consumption c0 . This determines the level of capital available
for the next period, k1 = g(0, k0 , c0 ). So at time 1, the consumer decides on the level of c1 ,
which together with k_1 determines k_2, and the cycle is repeated on and on. The infinite sum Σ_{t=0}^{∞} f(t, k_t, c_t) is to be thought of as the total “utility” of the consumer, which the latter is supposed to maximize at time 0.
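To make this timing concrete, here is a minimal Python sketch of the recursion. The specific choices (log felicity, the state equation k_{t+1} = k_t + r k_t + e_t − c_t as in Example 1.2 below, a constant endowment, the parameter values, and the naive consumption rule) are illustrative assumptions, and the rule is not claimed to be optimal.

```python
import math

# Illustrative parameters (assumed for this sketch, not taken from the text)
beta, r, e = 0.95, 0.05, 1.0           # discount factor, interest rate on capital, endowment

def g(k, c):
    """State equation k_{t+1} = k_t + phi(k_t) + e_t - c_t, with phi(k) = r*k."""
    return k + r * k + e - c

def total_utility(k0, policy, T=200):
    """Accumulate sum_t beta**t * u(c_t) along the path generated by the rule c_t = policy(k_t)."""
    k, total = k0, 0.0
    for t in range(T):
        c = policy(k)                       # control chosen at time t, given the state k_t
        total += beta ** t * math.log(c)    # felicity u(c) = log c (an assumption)
        k = g(k, c)                         # capital available at the beginning of t + 1
    return total

# A naive (not optimal) rule: consume the endowment plus interest, keeping capital constant.
print(total_utility(k0=10.0, policy=lambda k: e + r * k))
```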
Bellman’s idea for solving (1) is to define a value function V at each t = 0, 1, 2, . . .
V(t, k_t) = max_{(c_s)} Σ_{s=t}^{∞} f(s, k_s, c_s)   s.t.   k_{s+1} = g(s, k_s, c_s),
which represents the consumer’s maximum “utility” given the initial level of kt . Then we have
the following obvious result
Theorem 1.1. (Bellman’s principle of optimality)
For each t = 0, 1, 2, . . .
(2)   V(t, k_t) = max_{c_t} [ f(t, k_t, c_t) + V(t + 1, g(t, k_t, c_t)) ].
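The recursion (2) also suggests a direct way to compute V numerically. Below is a minimal value-iteration sketch in Python for a stationary special case; the felicity u(c) = log c, the state equation of Example 1.2 below with φ(k) = rk and a constant endowment, the parameter values, and the capital grid are all illustrative assumptions.

```python
import numpy as np

# Stationary special case (all assumptions): u(c) = log c, phi(k) = r*k, constant endowment e.
# Writing V(t, k) = beta**t * v(k), equation (2) becomes v(k) = max_c [ u(c) + beta * v(k + r*k + e - c) ].
beta, r, e = 0.95, 0.05, 1.0
grid = np.linspace(0.1, 20.0, 400)       # discretized capital grid

v = np.zeros_like(grid)
for _ in range(1000):                    # iterate the Bellman operator until it is (approximately) a fixed point
    v_new = np.empty_like(v)
    for i, k in enumerate(grid):
        c = k + r * k + e - grid         # consumption implied by moving from k to each next-period grid point
        values = np.where(c > 0, np.log(np.maximum(c, 1e-12)) + beta * v, -np.inf)
        v_new[i] = values.max()          # the maximum over c_t in (2)
    diff = np.max(np.abs(v_new - v))
    v = v_new
    if diff < 1e-8:
        break
# v now approximates the stationary value function, i.e. V(t, k) is approximately beta**t * v(k) on the grid
```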
Example 1.2. In a typical dynamic optimization problem, the consumer has to maximize
intertemporal utility, for which the instantaneous “felicity” is u(c), with u a von Neumann–
Morgenstern utility function. Therefore, f(t, k_t, c_t) = β^t u(c_t), where 0 < β < 1 is the discount factor. The state equation k_{t+1} = g(t, k_t, c_t) is typically given by
(9)   e_t + φ(k_t) = c_t + (k_{t+1} − k_t),
where et is endowment (e.g. labor income), and φ(kt ) is the production function (technology).
As an example of the technology function, we have φ(k_t) = r k_t. The derivative φ′(k_t) = r is then, as expected, the interest rate on capital. Notice that with the above description we have

k_{t+1} = g(t, k_t, c_t) = k_t + φ(k_t) + e_t − c_t.
So we get the following formulas: ∂f_t/∂c_t = β^t u′(c_t), ∂g_t/∂c_t = −1, ∂f_t/∂k_t = 0, ∂g_t/∂k_t = 1 + r. The Bellman–Euler equation (7) becomes

u′(c_t) = β (1 + φ′(k_t)) u′(c_{t+1}).
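For the record, here is the substitution behind this step (my own filling-in; I assume that equation (7), not reproduced here, has the same form as the deterministic version of equation (12) below, and I use φ′(k_t) = r from this example):

0 − (β^{t+1} u′(c_{t+1}) / (−1)) · (1 + r) = − β^t u′(c_t) / (−1),

which simplifies to β^{t+1} (1 + r) u′(c_{t+1}) = β^t u′(c_t), i.e. u′(c_t) = β(1 + r) u′(c_{t+1}).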
As usual, we denote by Et the expectation given information available at time t. Then we can
define the value function
V(t, k_t) = max_{(c_s)} E_t Σ_{s=t}^{∞} f(s, k_s, c_s)   s.t.   k_{s+1} = g(s, k_s, c_s).
Now in order to derive the Euler equation with uncertainty, all we have to do is replace
V (t + 1) in the formulas of the previous section by Et V (t + 1) (using, of course, the fact
that differentiation commutes with expectation). We arrive at the following Bellman–Euler
equation
(12)   E_t [ f_k(t+1) − (f_c(t+1)/g_c(t+1)) · g_k(t+1) ] = − f_c(t)/g_c(t).
For our particular Example 1.2, we get
u′(c_t) = β(1 + r) E_t u′(c_{t+1}).
In general, in order to solve this, notice that we can rewrite (21) as dc_t/dt = λ(t, k_t, c_t), so the optimum is given by the following system of ODEs:

(22)   dc_t/dt = λ(t, k_t, c_t),    dk_t/dt = h(t, k_t, c_t).
Example 3.1. Applying the above analysis to our favorite example, we have f(t, k_t, c_t) = e^{−ρt} u(c_t) and dk_t/dt = h(t, k_t, c_t) = e_t + φ(k_t) − c_t. The Euler equation (20) becomes

− d/dt [ e^{−ρt} u′(c_t) ] = e^{−ρt} u′(c_t) φ′(k_t),
or equivalently
(23)   − (u″(c_t) c_t / u′(c_t)) · (dc_t/dt) / c_t = φ′(k_t) − ρ.
Notice that we get the same equation as (7’) from Blanchard and Fischer (Chapter 2, p. 40),
so we are on the right track.
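As a quick numerical illustration of the system (22) in this example, here is a minimal forward-Euler sketch in Python. The CRRA felicity u(c) = c^{1−γ}/(1−γ) (so that −u″(c)c/u′(c) = γ in (23)), the linear technology φ(k) = rk, the constant endowment, the parameter values, and the initial conditions are all assumptions made for the sake of the example.

```python
# Forward-Euler integration of (22) for Example 3.1 under illustrative assumptions:
# with u(c) = c**(1 - gamma) / (1 - gamma) we have -u''(c)c/u'(c) = gamma, so (23) gives
#   dc/dt = c * (phi'(k) - rho) / gamma,     dk/dt = e + phi(k) - c.
rho, gamma, r, e = 0.03, 2.0, 0.05, 1.0   # assumed parameter values
phi = lambda k: r * k                     # technology phi(k) = r*k (an assumption)
dphi = lambda k: r                        # phi'(k)

k, c, dt = 10.0, 1.2, 0.01                # initial capital, a guessed initial consumption, time step
for _ in range(int(50 / dt)):             # integrate on t in [0, 50]
    dc = c * (dphi(k) - rho) / gamma
    dk = e + phi(k) - c
    c, k = c + dc * dt, k + dk * dt
print(c, k)
```

Choosing c(0) so that the resulting path also respects the budget (transversality) condition is a separate shooting problem; the snippet only illustrates the local dynamics (22)-(23).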
Now V(t + dt, k_{t+dt}) = V(t + dt, k_t + α dt + β dW_t) = V(t, k_t) + V_t dt + V_k (α dt + β dW_t) + ½ V_kk β² dt, where the last equality comes from Itô’s lemma. Taking the expectation at t, it follows that

(30)   E_t V(t + dt, k_{t+dt}) = V(t, k_t) + V_t dt + V_k α dt + ½ V_kk β² dt.
The Bellman principle (29) can be written equivalently as
sup_{c_t} [ −V(t, k_t) + f(t, k_t, c_t) dt + E_t V(t + dt, k_{t+dt}) ] ≤ 0,
with equality at the optimum ct . Putting this together with (30), we get the following
(31)   sup_a [ D^a V(t, y) + f(t, y, a) ] = 0,
where D^a is a partial differential operator defined by

(32)   D^a V(t, y) = V_t(t, y) + V_y(t, y) α(t, y, a) + ½ V_yy(t, y) β(t, y, a)².
Equation (31) is also known as the Hamilton–Jacobi–Bellman equation. (We got the same
equation as Duffie, chapter 9A, so we’re fine.) This is not quite a PDE yet, because we have
a supremum operator before it. However, the first order condition for (31) does give a PDE.
The boundary condition comes from some transversality condition that we have to impose on
V at infinity (see Duffie).
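To see concretely how the first order condition for (31) produces a PDE, here is a sketch (my own, assuming the supremum in (31) is attained at an interior point a*(t, y) where we can differentiate). The first order condition in a reads

f_a(t, y, a*) + V_y(t, y) α_a(t, y, a*) + V_yy(t, y) β(t, y, a*) β_a(t, y, a*) = 0,

which in principle pins down a*(t, y) in terms of V_y and V_yy; substituting a*(t, y) back into (31) then gives a genuine PDE for V,

D^{a*(t, y)} V(t, y) + f(t, y, a*(t, y)) = 0.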
Notice that for the stochastic case we took a different route than before. This is because
now we cannot eliminate the value function anymore (the reason is that we get an extra term
in the first order condition coming from the dWt -term, and that term depends on V as well).
So this approach first looks for a value function that satisfies the Hamilton–Jacobi–Bellman equation, and then derives the optimal consumption c_t and capital k_t.
Suppose we know the optimum curve c only up to some point x = c(t). Then we define the
value function
V(t, x) = max_{c(s)} ∫_t^b F(s, c, ċ) ds   s.t.   c(t) = x and c(b) = Q.
In the discussion that follows, we denote by λ a direction in R^n (the curve c also has values in R^n). The Bellman principle of optimality says that

(35)   V(t, x) = max_λ [ ∫_t^{t+dt} F(t, x, λ) dt + V_{t+dt}(x + λ dt) ].
The last term comes from the identity c(t + dt) = c(t) + ċ(t)dt = x + λdt.
The first order condition for this maximum is (after dividing through by dt)
dV_{t+dt}/dc = −F_ċ .
We assume that the endowment is always positive and has a finite present value Σ_t β^t e_t < ∞.
We impose the requirement that the capital k_t be eventually positive, i.e. that lim inf_t k_t ≥ 0.
For simplicity, assume also that β(1 + r) = 1.
Start with an optimum sequence of consumption (c_t). Such a sequence clearly exists, since we are looking for a (c_t) that achieves the maximum utility at time 0 (finite or not). Note that
we don’t assume anything particular about it; for example we don’t assume that the value
function corresponding to it is finite, or that it attains an interior optimum. Now we start to
use our assumptions and show that (ct ) does satisfy all those properties.
Using the state equation, we deduce that for t sufficiently large (so that k_{t+1} is, say, bigger than −1),

c_t ≤ (1 + r)k_t + e_t + 1.
Write this inequality for all s ≥ t and multiply through by β^{s−t}. Denote by PV_t(c) = Σ_{s≥t} β^{s−t} c_s the present value of consumption at time t, and similarly for k and e. Adding up the above inequalities for s ≥ t, we get

(36)   PV_t(c) ≤ (1 + r) PV_t(k) + PV_t(e) + 1/(1 − β).
We assumed that PV_t(e) < ∞. Let’s show that PV_t(c) < ∞. If PV_t(k) < ∞, we are done. If PV_t(k) = ∞, then we can consume out of this capital until PV_t(k) becomes finite. Certainly, by doing this, PV_t(c) will increase, yet it will still satisfy (36). Now all the terms on the right hand side of (36) are finite, hence so is PV_t(c), at the new level of consumption. That
means that our original PV_t(c) must be finite. Moreover, it also follows that PV_t(k) is finite.
Suppose it isn’t. Then, we can proceed as above and increase PV_t(c), while still keeping it finite. But this is in contradiction with the fact that (c_t) was chosen to be an optimum. (I have to admit that this part is tricky, but we can’t avoid it if we want to be rigorous!) Since u is monotone concave, PV_t(u(c)) is also finite. That means that the Bellman value function V(t, k_t) is finite, so our first concern is resolved.
We now look at the Bellman principle of optimality
V(t, k_t) = max_{c_t} [ β^t u(c_t) + V(t + 1, (1 + r)k_t + e_t − c_t) ].
Here’s a subtle idea: if we save one small quantity ε today, tomorrow it becomes (1 + r)ε, and we can either consume it, or leave it for later. This latter decision has to be made in an optimal fashion (since this is the definition of V_{t+1}), so we are indifferent between consuming (1 + r)ε tomorrow and saving it for later on. Thus, we might as well assume that we are consuming it tomorrow, so when we calculate V_{t+1}, only tomorrow’s utility is affected. Therefore, for marginal purposes we can regard V_{t+1} as equal to β^{t+1} u(c_{t+1}). Then, analyzing the Bellman optimum locally is the same as analyzing
max_ε [ β^t u(c_t − ε) + β^{t+1} u(c_{t+1} + (1 + r)ε) ]
locally around the optimum ε = 0. Since β(1 + r) = 1, the Bellman–Euler equation is

u′(c_t) = u′(c_{t+1}),   or equivalently   c_t = c_{t+1} = c.
Because u is concave, it is easy to see that ε = 0 is a local maximum. That takes care of our second concern.
We notice that consumption should be completely smoothed, so that c_t = c for all t.
Recall the state equation: e_t − c_t = k_{t+1} − (1 + r)k_t. Multiplying this by β^t = 1/(1 + r)^t and summing over all t, we get

Σ_t β^t (e_t − c_t) = Σ_t (k_{t+1} − (1 + r)k_t) / (1 + r)^t = lim_{t→∞} β^t k_{t+1} − (1 + r)k_0 .
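For completeness, here is my own sketch of the last step, under the assumption (suggested by optimality together with the budget constraint) that lim_{t→∞} β^t k_{t+1} = 0: since c_t = c for all t, the left hand side equals PV_0(e) − c/(1 − β), so

c/(1 − β) = (1 + r)k_0 + PV_0(e),   i.e.   c = (1 − β) [ (1 + r)k_0 + PV_0(e) ].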