THE BELLMAN PRINCIPLE OF OPTIMALITY

IOANID ROSU

As I understand it, there are two approaches to dynamic optimization: the Pontrjagin (Hamiltonian) approach, and the Bellman approach. I have seen several clear discussions of the Hamiltonian approach (Barro & Sala-i-Martin, Blanchard & Fischer, D. Romer), but to my surprise, I didn't see any clear treatment of the Bellman principle. What I have seen so far (Duffie, Chapters 3 and 9, and the Appendix of Mas-Colell, Whinston & Green) is confusing to me. I guess I should take a look at some dynamic optimization textbook, but I'm too lazy for that. Instead, I'm going to try to figure it out on my own, hoping that my freshness on the subject can be put to use.
The first four sections are only about local conditions for having a (finite) optimum. In the
last section I will discuss global conditions for optima in Bellman’s framework, and give an
example where I solve the problem completely. As a bonus, in Section 5, I use the Bellman
method to derive the Euler–Lagrange equation of variational calculus.

1. Discrete time, certainty


We start in discrete time, and we assume perfect foresight (so no expectation will be involved). The general problem we want to solve is


(1)    $\max_{(c_t)} \sum_{t=0}^{\infty} f(t, k_t, c_t)$   s.t. $k_{t+1} = g(t, k_t, c_t)$.

In addition, we impose a budget constraint, which for many examples is the restriction that $k_t$ be eventually positive (i.e. $\liminf_t k_t \ge 0$). This budget constraint excludes explosive solutions for $c_t$, so that we can apply the Bellman method. I won't mention the budget constraint until the last section, but we should keep in mind that without it (or some constraint like it), we might have no solution.
The usual names for the variables involved are: $c_t$ is the control variable (because it is under the control of the choice maker), and $k_t$ is the state variable (because it describes the state of the system at the beginning of $t$, when the agent makes the decision). In this paper, I call the equation $k_{t+1} = g(t, k_t, c_t)$ the "state equation"; I don't know what it is called in the literature.
To get some intuition about the problem, think of $k_t$ as capital available for production at time $t$, and of $c_t$ as consumption at $t$. At time 0, for a starting level of capital $k_0$, the consumer chooses the level of consumption $c_0$. This determines the level of capital available for the next period, $k_1 = g(0, k_0, c_0)$. So at time 1, the consumer decides on the level of $c_1$, which together with $k_1$ determines $k_2$, and the cycle is repeated on and on. The infinite sum $\sum_{t=0}^{\infty} f(t, k_t, c_t)$ is to be thought of as the total "utility" of the consumer, which the latter is supposed to maximize at time 0.
Bellman's idea for solving (1) is to define a value function $V$ at each $t = 0, 1, 2, \ldots$

$$V(t, k_t) = \max_{(c_s)} \sum_{s=t}^{\infty} f(s, k_s, c_s) \quad \text{s.t.} \quad k_{s+1} = g(s, k_s, c_s),$$

Date: April 8, 2002.



which represents the consumer's maximum "utility" given the initial level of $k_t$. Then we have the following obvious result.

Theorem 1.1. (Bellman's principle of optimality) For each $t = 0, 1, 2, \ldots$

(2)    $V(t, k_t) = \max_{c_t} \Big[ f(t, k_t, c_t) + V\big(t+1,\, g(t, k_t, c_t)\big) \Big].$

This in principle reduces an infinite-period optimization problem to a two-period optimization problem. But is this the whole story? How do we actually solve the optimization problem (1)? Here is where the textbooks I mentioned above are less clear¹. Well, let's try to squeeze all we can out of Bellman's equation (2).
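Before doing that, here is a minimal numerical sketch of the recursion (2), computed by backward induction over a finite horizon $T$ (which approximates the infinite sum). The primitives are assumptions made only for illustration: $f(t,k,c) = \beta^t \log c$, $g(t,k,c) = (1+r)k - c$, and a coarse capital grid.

```python
# A minimal backward-induction sketch of Bellman's recursion (2).
# Assumed primitives (not from the text): f(t,k,c) = beta**t * log(c),
# g(t,k,c) = (1+r)*k - c, and a coarse grid over capital.
import numpy as np

beta = 0.95
r = 1 / beta - 1                 # so that beta*(1+r) = 1
T = 200                          # finite horizon approximating the infinite sum
kgrid = np.linspace(0.1, 10.0, 300)

V = np.zeros(kgrid.size)         # terminal condition V(T, k) = 0
for t in range(T - 1, -1, -1):
    newV = np.empty_like(V)
    for i, k in enumerate(kgrid):
        c = np.linspace(1e-6, (1 + r) * k - kgrid[0], 200)  # feasible controls
        k_next = (1 + r) * k - c                            # state equation
        # equation (2): today's reward plus the continuation value
        vals = beta**t * np.log(c) + np.interp(k_next, kgrid, V)
        newV[i] = vals.max()
    V = newV                     # V now approximates V(0, k) on the grid
```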
We denote partial derivatives by using subscripts. A star superscript denotes the optimum.
Then the first order condition from (2) reads
(3)    $f_c(t, k_t, c_t^*) + V_k\big(t+1,\, g(t, k_t, c_t^*)\big) \cdot g_c(t, k_t, c_t^*) = 0.$
Looking at this formula, it is clear that we would like to be able to compute the derivative
$V_k(t+1, k_{t+1})$. We can try to do that using formula (2) again. Since we are differentiating a maximum operator, we apply the envelope theorem² and obtain

(4)    $V_k(t, k_t) = f_k(t, k_t, c_t^*) + V_k\big(t+1,\, g(t, k_t, c_t^*)\big) \cdot g_k(t, k_t, c_t^*).$
From (3), we can calculate $V_k\big(t+1,\, g(t, k_t, c_t^*)\big)$, and substituting it into (4), we get

(5)    $V_k(t, k_t) = \Big[ f_k - \frac{f_c}{g_c} \cdot g_k \Big](t, k_t, c_t^*).$
Finally, substitute this formula into (3) and obtain a condition which does not depend on the
value function anymore:
 
∗ ∗ fc
· gk t + 1, g(kt , c∗t ), c∗t+1 = 0 .

(6) fc (t, kt , ct ) + gc (t, kt , ct ) · fk −
gc
Notice that this formula is true for any $k_t$, not necessarily only for the optimal one up to that point. But in that case, $c_t^*$ and $c_{t+1}^*$ are the optimal choices given $k_t$. In any case, from now on we are only going to work at the optimum $(t, k_t^*, c_t^*)$. The previous formula can be written as follows:

(7)    $f_k(t+1) - \frac{f_c(t+1)}{g_c(t+1)} \cdot g_k(t+1) = -\frac{f_c(t)}{g_c(t)}.$
This is the key equation that allows us to compute the optimum $c_t^*$, using only the initial data ($f_t$ and $g_t$). I guess equation (7) should be called the Bellman equation, although in particular cases it goes by the name of the Euler equation (see the next example). I am going to compromise and call it the Bellman–Euler equation.
For the purposes of comparison with the continuous-time version, write $g(t, k_t, c_t) = k_t + h(t, k_t, c_t)$. Denote $\Delta_t \phi = \phi(t+1) - \phi(t)$. Then we can rewrite the Bellman–Euler equation (7) as

(8)    $\Delta_t \Big( \frac{f_c(t)}{h_c(t)} \Big) = f_k(t+1) - \frac{f_c(t+1)}{h_c(t+1)} \cdot h_k(t+1).$
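As a quick consistency check, (8) really is just (7) rewritten using $g_c = h_c$ and $g_k = 1 + h_k$. A small symbolic sketch (the evaluated derivatives are treated as free symbols, which is all the check needs):

```python
# Symbolic check that (8) is (7) rewritten with g = k + h, i.e.
# g_c = h_c and g_k = 1 + h_k.  The evaluated derivatives are free symbols.
import sympy as sp

fc0, fc1, hc0, hc1, fk1, hk1 = sp.symbols('fc0 fc1 hc0 hc1 fk1 hk1')

# (6)/(7): fc(t) + gc(t) * [fk - (fc/gc)*gk](t+1) = 0, with gc = hc, gk = 1 + hk
eq7 = fc0 + hc0 * (fk1 - (fc1 / hc1) * (1 + hk1))
# (8): Delta_t(fc/hc) = fk(t+1) - (fc(t+1)/hc(t+1)) * hk(t+1)
eq8 = (fc1 / hc1 - fc0 / hc0) - (fk1 - (fc1 / hc1) * hk1)

print(sp.simplify(eq7 + hc0 * eq8))   # 0: the two equations are equivalent
```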
¹MWG only discusses the existence and uniqueness of a value function, while Duffie treats only the Example mentioned above, and leaves a crucial Lemma as an exercise at the end of the chapter.
²To state the envelope theorem, start with a function of two variables $f(x, \theta)$, such that for every $x$, the maximum $\max_\theta f(x, \theta)$ is achieved at a point $\theta = \theta^*(x)$ in the interior of the $\theta$-interval. Then
$$\frac{d}{dx} \max_\theta f(x, \theta) = \frac{\partial f}{\partial x}\big(x, \theta^*(x)\big).$$

Example 1.2. In a typical dynamic optimization problem, the consumer has to maximize intertemporal utility, for which the instantaneous "felicity" is $u(c)$, with $u$ a von Neumann–Morgenstern utility function. Therefore, $f(t, k_t, c_t) = \beta^t u(c_t)$, where $0 < \beta < 1$ is the discount factor. The state equation $k_{t+1} = g(t, k_t, c_t)$ is typically given by

(9)    $e_t + \phi(k_t) = c_t + (k_{t+1} - k_t),$

where $e_t$ is endowment (e.g. labor income), and $\phi(k_t)$ is the production function (technology). As an example of the technology function, we have $\phi(k_t) = r k_t$. The derivative $\phi'(k_t) = r$ is then, as expected, the interest rate on capital. Notice that with the above description we have $k_{t+1} = g(t, k_t, c_t) = k_t + \phi(k_t) + e_t - c_t$.
So we get the following formulas: $\partial f_t / \partial c_t = \beta^t u'(c_t)$, $\partial g_t / \partial c_t = -1$, $\partial f_t / \partial k_t = 0$, $\partial g_t / \partial k_t = 1 + \phi'(k_t)$. The Bellman–Euler equation (7) becomes

$$u'(c_t) = \beta\big(1 + \phi'(k_{t+1})\big)\, u'(c_{t+1}),$$

which is the usual Euler equation.
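Here is a quick numerical illustration of this Euler equation. With CRRA felicity $u(c) = c^{1-s}/(1-s)$ (an assumption for the sketch, with $s = 2$) and constant interest rate $r$, the equation pins down the growth rate of consumption, $c_{t+1}/c_t = (\beta(1+r))^{1/s}$:

```python
# Numeric illustration of the Euler equation u'(c_t) = beta*(1+r)*u'(c_{t+1})
# with CRRA felicity u(c) = c**(1-s)/(1-s), so u'(c) = c**(-s).  The value
# s = 2 is an assumption made only for the sketch.
beta, r, s = 0.95, 0.08, 2.0
growth = (beta * (1 + r)) ** (1 / s)   # implied consumption growth factor

c = 1.0
for t in range(3):
    c_next = growth * c
    lhs = c ** (-s)                          # u'(c_t)
    rhs = beta * (1 + r) * c_next ** (-s)    # beta*(1+r)*u'(c_{t+1})
    assert abs(lhs - rhs) < 1e-12            # Euler equation holds on the path
    c = c_next
```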

2. Discrete time, uncertainty


Now we assume everything to be stochastic, and the agent solves the problem


(10)    $\max_{(c_t)} E_0 \sum_{t=0}^{\infty} f(t, k_t, c_t)$   s.t. $k_{t+1} = g(t, k_t, c_t)$.

As usual, we denote by $E_t$ the expectation given information available at time $t$. Then we can define the value function

$$V(t, k_t) = \max_{(c_s)} E_t \sum_{s=t}^{\infty} f(s, k_s, c_s) \quad \text{s.t.} \quad k_{s+1} = g(s, k_s, c_s).$$

The Bellman principle of optimality (2) becomes

(11)    $V(t, k_t) = \max_{c_t} \Big[ f(t, k_t, c_t) + E_t V\big(t+1,\, g(t, k_t, c_t)\big) \Big].$

Now in order to derive the Euler equation with uncertainty, all we have to do is replace
V (t + 1) in the formulas of the previous section by Et V (t + 1) (using, of course, the fact
that differentiation commutes with expectation). We arrive at the following Bellman–Euler
equation
 
(12)    $E_t \Big[ f_k(t+1) - \frac{f_c(t+1)}{g_c(t+1)} \cdot g_k(t+1) \Big] = -\frac{f_c(t)}{g_c(t)}.$

For our particular Example 1.2, we get

$$u'(c_t) = \beta(1+r)\, E_t\, u'(c_{t+1}).$$
3. Continuous time, certainty


This is a bit trickier, but the same derivation as in discrete time can be used. The difference is that instead of the interval $[t, t+1]$ we now look at $[t, t+dt]$. The problem that the decision maker has to solve is

(13)    $\max_{c_t} \int_0^{\infty} f(t, k_t, c_t)\, dt$   s.t. $\frac{dk_t}{dt} = h(t, k_t, c_t)$.


The constraint can be rewritten in differential notation

(14)    $k_{t+dt} = k_t + h(t, k_t, c_t)\, dt,$
so we have a problem similar in form to (1), and we can solve it by an analogous method.
Define the value function

$$V(t, k_t) = \max_{(c_s)} \int_t^{\infty} f(s, k_s, c_s)\, ds \quad \text{s.t.} \quad \frac{dk_s}{ds} = h(s, k_s, c_s).$$
The Bellman principle of optimality states that

(15)    $V(t, k_t) = \max_{c_t} \Big[ \int_t^{t+dt} f(s, k_s, c_s)\, ds + V\big(t+dt,\; k_t + h(t, k_t, c_t)\, dt\big) \Big].$

We know that $\int_t^{t+dt} f(s, k_s, c_s)\, ds = f(t, k_t, c_t)\, dt$. The first order condition for a maximum is

(16)    $f_c(t)\, dt + V_k(t+dt, k_{t+dt}) \cdot h_c(t)\, dt = 0.$
This is equivalent to

(17)    $V_k(t+dt) = -\frac{f_c(t)}{h_c(t)}.$
As we did in the first section, apply the envelope theorem to derive

(18)    $V_k(t) = f_k(t)\, dt + V_k(t+dt) \cdot \big(1 + h_k(t)\, dt\big).$
Substitute (17) into (18) to obtain

$$V_k(t) = -\frac{f_c(t)}{h_c(t)} + \Big[ f_k(t) - \frac{f_c(t)}{h_c(t)} \cdot h_k(t) \Big] dt.$$
If $\phi$ is any differentiable function, then $\phi(t+dt)\, dt = \phi(t)\, dt$, so we get the formula

(19)    $V_k(t+dt) = -\frac{f_c(t+dt)}{h_c(t+dt)} + \Big[ f_k(t) - \frac{f_c(t)}{h_c(t)} \cdot h_k(t) \Big] dt.$
Putting together equations (17) and (19), we get

$$\frac{f_c(t+dt)}{h_c(t+dt)} - \frac{f_c(t)}{h_c(t)} = \Big[ f_k(t) - \frac{f_c(t)}{h_c(t)} \cdot h_k(t) \Big] dt.$$
Using the formula $\phi(t+dt) - \phi(t) = \frac{d\phi}{dt}\, dt$, we can rewrite the above formula as

(20)    $\frac{d}{dt}\Big( \frac{f_c(t)}{h_c(t)} \Big) = f_k(t) - \frac{f_c(t)}{h_c(t)} \cdot h_k(t).$
This is the Bellman–Euler equation in continuous time. One can see that it is pretty similar to our equation (8) in discrete time.
We now have to be more careful with our notation. By $f_c(t)$ in the above formula, what we really mean is $f_c(t, k_t, c_t)$ (as usual, calculated at the optimum, but I'm going to omit the star superscript). Then we calculate

$$\frac{d}{dt}(f_c) = f_{tc} + f_{kc} \cdot h + f_{cc} \cdot \frac{dc}{dt}.$$
So we can rewrite the Bellman–Euler equation (20) as follows:

(21)    $-\Big( f_{tc} + f_{kc} \cdot h + f_{cc} \cdot \frac{dc}{dt} \Big) = -\frac{f_c}{h_c}\Big( h_{tc} + h_{kc} \cdot h + h_{cc} \cdot \frac{dc}{dt} \Big) + f_c \cdot h_k - h_c \cdot f_k.$

In general, in order to solve this, notice that we can rewrite (21) as $dc_t/dt = \lambda(t, k_t, c_t)$, so the optimum is given by the following system of ODEs:

(22)    $\frac{dc_t}{dt} = \lambda(t, k_t, c_t), \qquad \frac{dk_t}{dt} = h(t, k_t, c_t).$
Example 3.1. Applying the above analysis to our favorite example, we have $f(t, k_t, c_t) = e^{-\rho t} u(c_t)$ and $dk_t/dt = h(t, k_t, c_t) = e_t + \phi(k_t) - c_t$. The Euler equation (20) becomes

$$\frac{d}{dt}\big[ -e^{-\rho t} u'(c_t) \big] = e^{-\rho t} u'(c_t)\, \phi'(k_t),$$

or equivalently

(23)    $-\frac{u''(c_t)\, c_t}{u'(c_t)} \cdot \frac{dc_t/dt}{c_t} = \phi'(k_t) - \rho.$
Notice that we get the same equation as (7’) from Blanchard and Fischer (Chapter 2, p. 40),
so we are on the right track.
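To make the system (22) concrete, here is a small numerical sketch. The assumptions are mine, not from the text: CRRA utility with $-u''(c)c/u'(c) = s$, so (23) reads $dc/dt = c\,(\phi'(k) - \rho)/s$; technology $\phi(k) = rk$; zero endowment; and an arbitrary initial consumption level (a complete solution would pin $c(0)$ down through a transversality condition).

```python
# Integrating the optimal system (22) for the Example under illustrative
# assumptions: CRRA utility with coefficient s, phi(k) = r*k, e_t = 0.
import numpy as np
from scipy.integrate import solve_ivp

r, rho, s = 0.06, 0.04, 2.0

def system(t, y):
    k, c = y
    dk = r * k - c              # state equation dk/dt = h(t, k, c)
    dc = c * (r - rho) / s      # Bellman-Euler equation (23)
    return [dk, dc]

# c(0) = 0.05 is arbitrary here; it would be pinned down by transversality.
sol = solve_ivp(system, (0.0, 50.0), [1.0, 0.05])
k_path, c_path = sol.y          # candidate paths of capital and consumption
```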

4. Continuous time, uncertainty


First, we assume that the uncertainty comes from the function h (for example, if h depends
on an uncertain endowment et ). In the second part of this section, we are going to assume
that the constraint is stochastic. But for now, the agent solves the problem
(24)    $\max_{c_t} E_0 \int_0^{\infty} f(t, k_t, c_t)\, dt$   s.t. $\frac{dk_t}{dt} = h(t, k_t, c_t)$.

The value function takes the form

$$V(t, k_t) = \max_{(c_s)} E_t \int_t^{\infty} f(s, k_s, c_s)\, ds \quad \text{s.t.} \quad \frac{dk_s}{ds} = h(s, k_s, c_s),$$
and the Bellman principle of optimality (15) becomes

(25)    $V(t, k_t) = \max_{c_t} \Big[ \int_t^{t+dt} f(s, k_s, c_s)\, ds + E_t V\big(t+dt,\; k_t + h(t, k_t, c_t)\, dt\big) \Big].$
We arrive at the following Bellman–Euler equation

(26)    $E_t\, \frac{d}{dt}\Big( \frac{f_c}{h_c} \Big) = f_k - \frac{f_c}{h_c} \cdot h_k.$
For our particular Example 1.2, we get

(27)    $-\frac{u''(c_t)\, c_t}{u'(c_t)} \cdot E_t\Big[ \frac{dc_t/dt}{c_t} \Big] = \phi'(k_t) - \rho.$
Now we assume everything is stochastic, and the agent solves the problem

(28)    $\max_{c_t} E_0 \int_0^{\infty} f(t, k_t, c_t)\, dt$   s.t. $dk_t = \alpha(t, k_t, c_t)\, dt + \beta(t, k_t, c_t)\, dW_t,$

where $W_t$ is a one-dimensional Wiener process (Brownian motion), and $\alpha, \beta$ are deterministic functions. If we define the value function $V(t, k_t)$ as above, the Bellman principle of optimality implies that

(29)    $V(t, k_t) = \max_{c_t} \big[ f(t, k_t, c_t)\, dt + E_t V(t+dt, k_{t+dt}) \big].$

Now $V(t+dt, k_{t+dt}) = V(t+dt,\; k_t + \alpha\, dt + \beta\, dW_t) = V(t, k_t) + V_t\, dt + V_k(\alpha\, dt + \beta\, dW_t) + \frac{1}{2} V_{kk} \beta^2\, dt$, where the last equality comes from Itô's lemma. Taking expectation at $t$, it follows that

(30)    $E_t V(t+dt, k_{t+dt}) = V(t, k_t) + V_t\, dt + V_k\, \alpha\, dt + \tfrac{1}{2} V_{kk} \beta^2\, dt.$
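The expansion (30) is easy to check by simulation. Here is a sketch with an illustrative test function $V(t,k) = k^2$ and constant coefficients (both assumptions made only for the check):

```python
# Monte Carlo check of the expansion (30) for a concrete test function.
# Assumptions (illustrative only): V(t, k) = k**2, constant alpha and beta.
import numpy as np

rng = np.random.default_rng(1)
k, alpha, beta_, dt = 1.5, 0.3, 0.4, 1e-3

dW = rng.normal(0.0, np.sqrt(dt), size=2_000_000)    # Wiener increments
lhs = np.mean((k + alpha * dt + beta_ * dW) ** 2)    # E_t V(t+dt, k_{t+dt})

# (30): V + V_t*dt + V_k*alpha*dt + (1/2)*V_kk*beta**2*dt,
# with V = k**2: V_t = 0, V_k = 2k, V_kk = 2.
rhs = k**2 + 2 * k * alpha * dt + beta_**2 * dt
print(lhs, rhs)                                      # agree up to O(dt**2)
```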
The Bellman principle (29) can be written equivalently as

$$\sup_{c_t} \big[ -V(t, k_t) + f(t, k_t, c_t)\, dt + E_t V(t+dt, k_{t+dt}) \big] \le 0,$$

with equality at the optimum $c_t$. Putting this together with (30), we get the following:

(31)    $\sup_a \big[ D^a V(t, y) + f(t, y, a) \big] = 0,$

where $D^a$ is a partial differential operator defined by

(32)    $D^a V(t, y) = V_t(t, y) + V_y(t, y)\, \alpha(t, y, a) + \tfrac{1}{2} V_{yy}(t, y)\, \beta(t, y, a)^2.$
Equation (31) is also known as the Hamilton–Jacobi–Bellman equation. (We got the same equation as Duffie, Chapter 9A, so we're fine.) This is not quite a PDE yet, because we have a supremum operator in front of it. However, the first order condition for (31) does give a PDE. The boundary condition comes from some transversality condition that we have to impose on $V$ at infinity (see Duffie).
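To see the pieces of (31)-(32) in action, here is a small symbolic sketch. The primitives $\alpha$, $\beta$, $f$ below are placeholders I made up for illustration; the point is just the mechanical step from the HJB equation to the first order condition:

```python
# Sketch of the operator (32) and the HJB first-order condition.
# The primitives below are assumptions for the sketch: drift r*y - a,
# constant volatility sigma, discounted log felicity exp(-rho*t)*log(a).
import sympy as sp

t, y, a, r, rho, sigma = sp.symbols('t y a r rho sigma')
V = sp.Function('V')(t, y)

alpha = r * y - a
f = sp.exp(-rho * t) * sp.log(a)

# (32): D^a V = V_t + V_y * alpha + (1/2) * V_yy * beta**2
DaV = sp.diff(V, t) + sp.diff(V, y) * alpha + sp.Rational(1, 2) * sp.diff(V, y, 2) * sigma**2
hjb = DaV + f                      # sup over a of this expression is 0, eq. (31)

foc = sp.Eq(sp.diff(hjb, a), 0)    # first order condition turns (31) into a PDE
print(sp.solve(foc, a))            # optimal control: a* = exp(-rho*t)/V_y
```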
Notice that for the stochastic case we took a different route than before. This is because
now we cannot eliminate the value function anymore (the reason is that we get an extra term
in the first order condition coming from the dWt -term, and that term depends on V as well).
So this approach first looks at a value function which satisfies the Hamilton–Jacobi–Bellman
equation, and then derives the optimal consumption ct and capital kt .

5. The Euler–Lagrange equation


The reason I'm treating this variational problem here is that the Bellman method seems very well suited to solve it. The classical Euler–Lagrange equation

(33)    $\frac{d}{dt}\Big( \frac{\partial F}{\partial \dot{c}} \Big) = \frac{\partial F}{\partial c}$

solves the following problem:

(34)    $\max_{c(t)} \int_a^b F(t, c, \dot{c})\, dt$   s.t. $c(a) = P$ and $c(b) = Q$.

Suppose we know the optimum curve $c$ only up to some point $x = c(t)$. Then we define the value function

$$V(t, x) = \max_{c(s)} \int_t^b F(s, c, \dot{c})\, ds \quad \text{s.t.} \quad c(t) = x \text{ and } c(b) = Q.$$

In the discussion that follows, we denote by $\lambda$ a direction in $\mathbb{R}^n$ (the curve $c$ also has values in $\mathbb{R}^n$). The Bellman principle of optimality says that

(35)    $V(t, x) = \max_{\lambda} \Big[ \int_t^{t+dt} F(t, x, \lambda)\, dt + V_{t+dt}(x + \lambda\, dt) \Big].$

The last term comes from the identity $c(t+dt) = c(t) + \dot{c}(t)\, dt = x + \lambda\, dt$.
The first order condition for this maximum is (after dividing through by $dt$)

$$\frac{dV_{t+dt}}{dc} = -F_{\dot{c}}.$$

The envelope theorem applied to equation (35) yields

$$\frac{dV_t}{dc} = F_c\, dt + \frac{dV_{t+dt}}{dc} = F_c\, dt - F_{\dot{c}}.$$

Replace $t$ by $t+dt$ in the above equation to get

$$\frac{dV_{t+dt}}{dc} = F_c\, dt - F_{\dot{c}}(t+dt).$$

Putting together the two formulas we have for $dV_{t+dt}/dc$, we obtain

$$F_{\dot{c}}(t+dt) - F_{\dot{c}}(t) = F_c\, dt.$$
This implies

$$\Big( \frac{d}{dt} F_{\dot{c}} \Big) dt = F_c\, dt,$$

which after canceling $dt$ is the desired Euler–Lagrange equation (33).
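As a check on (33), here is a sketch using sympy's built-in Euler–Lagrange routine on a classical example, $F = \dot{c}^2/2 - c^2/2$ (the harmonic oscillator Lagrangian, chosen only for illustration):

```python
# Checking the Euler-Lagrange equation (33) on a classical example:
# F(t, c, c') = c'**2/2 - c**2/2, whose extremals satisfy c'' = -c.
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
c = sp.Function('c')(t)
F = sp.diff(c, t)**2 / 2 - c**2 / 2

# (33): d/dt(dF/dc') = dF/dc; sympy returns the oscillator equation c'' = -c
print(euler_equations(F, c, t))
```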

6. Conditions for a global optimum


There are two main issues concerning the existence of a solution. One is whether the value function is finite or not. This will be taken care of by the budget constraint, as we will see below. The
other issue is whether or not the maximum in the Bellman principle of optimality is attained
at an interior point (in order to get the first order condition to hold). This last issue should
be resolved once we analyze the solution, and see that we have indeed an interior maximum.
Of course, after we know that the Bellman–Euler equation holds, we need to do a little bit of
extra work to see which of the possible solutions fits the initial data of the problem.
Since I don't want to develop the general theory for this (one can look it up in a textbook), I will just analyze our initial Example 1.2. Recall that we are solving the problem


$\max_{(c_t)} \sum_{t=0}^{\infty} \beta^t u(c_t)$   s.t. $k_{t+1} = (1+r)k_t + e_t - c_t$.

We assume that the endowment is always positive and has a finite present value, $\sum_t \beta^t e_t < \infty$. We impose the requirement that the capital $k_t$ be eventually positive, i.e. that $\liminf_t k_t \ge 0$. For simplicity, assume also that $\beta(1+r) = 1$.
Start with an optimum sequence of consumption $(c_t)$. Such a sequence clearly exists, since we are looking for a $(c_t)$ that achieves the maximum utility at time zero (finite or not). Note that we don't assume anything particular about it; for example, we don't assume that the value function corresponding to it is finite, or that it attains an interior optimum. Now we start to use our assumptions and show that $(c_t)$ does satisfy all those properties.
Using the state equation, we deduce that for $t$ sufficiently large (so that $k_{t+1}$ becomes positive, hence bigger than $-1$),

$$c_t \le (1+r)k_t + e_t + 1.$$

Write this inequality for all $s > t$ and multiply through by $\beta^{s-t}$. Denote by $PV_t(c) = \sum_{s \ge t} \beta^{s-t} c_s$ the present value of consumption at time $t$, and similarly for $k$ and $e$. Adding up the above inequalities for $s \ge t$, we get

(36)    $PV_t(c) \le (1+r)\, PV_t(k) + PV_t(e) + \frac{1}{1-\beta}.$
We assumed that $PV_t(e) < \infty$. Let's show that $PV_t(c) < \infty$. If $PV_t(k) < \infty$, we are done. If $PV_t(k) = \infty$, then we can consume out of this capital until $PV_t(k)$ becomes finite. Certainly, by doing this, $PV_t(c)$ will increase, yet it will still satisfy (36). Now all the terms on the right hand side of (36) are finite, hence so is $PV_t(c)$, at the new level of consumption. That means that our original $PV_t(c)$ must be finite. Moreover, it also follows that $PV_t(k)$ is finite. Suppose it isn't. Then we can proceed as above and increase $PV_t(c)$, while still keeping it finite. But this contradicts the fact that $(c_t)$ was chosen to be an optimum. (I have to admit that this part is tricky, but we can't avoid it if we want to be rigorous!) Since $u$ is monotone concave, $PV_t(u(c))$ is also finite. That means that the Bellman value function $V(t, k_t)$ is finite, so our first concern is resolved.
We now look at the Bellman principle of optimality

$$V(t, k_t) = \max_{c_t} \Big[ \beta^t u(c_t) + V\big(t+1,\; (1+r)k_t + e_t - c_t\big) \Big].$$
Here's a subtle idea: if we save one small quantity $\epsilon$ today, tomorrow it becomes $(1+r)\epsilon$, and we can either consume it or leave it for later. This latter decision has to be made in an optimal fashion (since this is the definition of $V_{t+1}$), so we are indifferent between consuming $(1+r)\epsilon$ tomorrow and saving it for later on. Thus, we might as well assume that we are consuming it tomorrow, so when we calculate $V_{t+1}$, only tomorrow's utility is affected. Therefore, for marginal purposes we can regard $V_{t+1}$ as equal to $\beta^{t+1} u(c_{t+1})$. Then, to analyze the Bellman optimum locally is the same as analyzing

$$\max_{\epsilon} \Big[ \beta^t u(c_t - \epsilon) + \beta^{t+1} u\big(c_{t+1} + (1+r)\epsilon\big) \Big]$$

locally around the optimum $\epsilon = 0$. Since $\beta(1+r) = 1$, the Bellman–Euler equation is

$$u'(c_t) = u'(c_{t+1}), \quad \text{or equivalently} \quad c_t = c_{t+1} = c.$$
Because $u$ is concave, it is easy to see that $\epsilon = 0$ is a local maximum. That takes care of our second concern.
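Here is a quick numerical confirmation of this local argument, under illustrative assumptions (log utility, $t = 0$, $c = 1$):

```python
# Numeric check that eps = 0 maximizes the two-period tradeoff
# u(c - eps) + beta * u(c + (1+r)*eps) when beta*(1+r) = 1.
# Assumptions for the sketch: log utility, t = 0, c = 1.
import numpy as np

beta = 0.95
r = 1 / beta - 1                      # so that beta*(1+r) = 1
c = 1.0

def tradeoff(eps):
    return np.log(c - eps) + beta * np.log(c + (1 + r) * eps)

eps = np.linspace(-0.5, 0.5, 1001)
vals = tradeoff(eps)
assert np.argmax(vals) == 500         # the maximum sits exactly at eps = 0
```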
We notice that consumption should be smoothed completely, so that $c_t = c$ for all $t$.
Recall the state equation: $e_t - c_t = k_{t+1} - (1+r)k_t$. Multiplying this by $\beta^t = 1/(1+r)^t$ and summing over all $t$, we get

$$\sum_t \beta^t (e_t - c_t) = \sum_t \Big[ \frac{k_{t+1}}{(1+r)^t} - \frac{k_t}{(1+r)^{t-1}} \Big] = \lim_{t \to \infty} \beta^t k_{t+1} - (1+r)k_0.$$

During the discussion of the finiteness of the value function, we noticed that $PV_t(k) < \infty$. In particular, this implies that $\lim_t \beta^t k_t = 0$. So we get the formula $\sum_t \beta^t e_t - \frac{1}{1-\beta}\, c = -(1+r)k_0$, from which we deduce that

(37)    $c = r\beta \sum_{t=0}^{\infty} \beta^t e_t + r k_0.$
Since there is only one optimum, it must be a global optimum, so this ends the solution of
our example.
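Finally, formula (37) is easy to verify by simulation. With a constant endowment $e_t = 1$ (an assumption made only for the sketch), $PV(e) = 1/(1-\beta)$, and consumption from (37) keeps capital exactly at $k_0$, so $\beta^t k_t \to 0$ as the budget constraint requires:

```python
# Numeric check of (37) with constant endowment e_t = 1 (an assumption for
# the sketch), so that PV(e) = 1/(1-beta).
import numpy as np

beta = 0.95
r = 1 / beta - 1                       # so that beta*(1+r) = 1
k0, T = 2.0, 200

c = r * beta / (1 - beta) + r * k0     # formula (37) with PV(e) = 1/(1-beta)

k = k0
for t in range(T):
    k = (1 + r) * k + 1.0 - c          # state equation with e_t = 1
print(k, beta**T * k)                  # k stays at k0, so beta**t * k_t -> 0
```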
