(1 − ρ dt)[v(s) + v'(s) ds] ≈ v(s) + v'(s) ṡ dt − ρ v(s) dt ,

where we have ignored higher order terms of dt. Therefore,

v(s) = max_x { u(x, s) dt + v(s) + v'(s) ṡ dt − ρ v(s) dt } ,   (25)

with ṡ = φ(x, s). The FOC for this problem is
u_x(x, s) + φ_x(x, s) v'(s) = 0 ,

and if x* = x*(s) denotes the maximizing choice of x, then

ρ v(s) = u(x*, s) + φ(x*, s) v'(s) ,
which is a first-order differential equation. The solution to this ODE, subject to the appropriate boundary condition (often v(0) = u(0, 0)/ρ), is the value function for the problem, and x*(s) is the optimal policy.

Example: Cake-eating with log utility
An agent has a stock s of a good which she consumes at rate x, so that ṡ = −x, and her flow utility is u(x) = ln x. The Bellman equation then gives

ρ v(s) = max_x { ln x − x v'(s) } .
The FOC is

1/x = v'(s) ,   so   x* = 1/v'(s) ,

and so the Bellman equation becomes

ρ v(s) = −ln v'(s) − 1 .

Use the exponential function, and rearrange to arrive at

v'(s) e^{ρ v(s) + 1} = 1 .

Since the LHS of this is (1/ρ) d/ds e^{ρ v(s) + 1}, we see that the solution is

e^{ρ v(s) + 1} = ρ s + k ,

i.e.

v(s) = (1/ρ)[ln(ρ s + k) − 1]

for some constant k. The boundary condition is v(0) = u(0)/ρ = −∞, i.e. ln k = −∞ or k = 0. So the solution is

v(s) = (1/ρ)[ln(ρ s) − 1] ,

with v'(s) = 1/(ρ s) and hence optimal consumption x*(s) = ρ s.
Therefore, in this example, it is optimal to consume a constant fraction of the remaining
stock at each point in time. (More precisely, the agent should consume at a rate that
is proportional to her remaining stock.) The rate of consumption is increasing in the
discount rate ρ, as one would expect.
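This solution is straightforward to verify symbolically. The following minimal sympy sketch (the use of sympy and the variable names are incidental choices, not part of the notes) checks that v(s) = (1/ρ)[ln(ρ s) − 1] satisfies ρ v(s) = max_x { ln x − x v'(s) } with maximizer x* = ρ s.

```python
import sympy as sp

s, x, rho = sp.symbols('s x rho', positive=True)

v = (sp.log(rho * s) - 1) / rho            # candidate value function
v_prime = sp.diff(v, s)                    # = 1/(rho*s)

# FOC of max_x {ln x - x v'(s)}:  1/x = v'(s)
x_star = sp.solve(sp.Eq(1 / x, v_prime), x)[0]
print(x_star)                              # rho*s

# Bellman equation: rho*v(s) = ln(x*) - x* v'(s)
print(sp.simplify(rho * v - (sp.log(x_star) - x_star * v_prime)))   # 0
```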
Exercise: Cake-eating with CRRA utility
Generalize the above problem to one where the agent has utility function u(x) = x^{1−R}/(1−R), where 0 < R < 1. Show that the value function is

v(s) = (1/ρ) · R^R (ρ s)^{1−R}/(1−R) ,

with optimal consumption given by

x*(s) = ρ s/R .

What is the solution if the agent has utility function u(x) = (x^{1−R} − 1)/(1−R)? Check if this solution converges to that of the previous problem when R → 1. What about the case R > 1?
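As a check on the claimed answer, the following sympy sketch fixes R = 1/2 (an arbitrary illustrative value in (0, 1), chosen only so the simplification stays simple) and verifies both the policy and the Bellman equation.

```python
import sympy as sp

s, x, rho = sp.symbols('s x rho', positive=True)
R = sp.Rational(1, 2)                                 # illustrative value of R in (0, 1)

u = x**(1 - R) / (1 - R)                              # CRRA utility
v = R**R * (rho * s)**(1 - R) / (rho * (1 - R))       # claimed value function
v_prime = sp.diff(v, s)

# FOC of max_x {u(x) - x v'(s)}:  x**(-R) = v'(s)
x_star = sp.solve(sp.Eq(x**(-R), v_prime), x)[0]
print(sp.simplify(x_star - rho * s / R))              # 0, i.e. x*(s) = rho*s/R

# Bellman equation: rho*v(s) = u(x*) - x* v'(s)
print(sp.simplify(rho * v - (u.subs(x, x_star) - x_star * v_prime)))   # 0
```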
Example: Consumption-savings with CRRA utility and a known interest rate
Consider an agent with utility function u(x) = x^{1−R}/(1−R) and wealth s. Wealth evolves according to ṡ = rs − x, where r is known and fixed over time. The value function for this problem satisfies

ρ v(s) = max_{0 ≤ x ≤ rs} { u(x) + [rs − x] v'(s) } ,   (26)
the FOC is the same as in the above exercise, and the Bellman equation becomes

ρ v(s) = rs v'(s) + (R/(1−R)) v'(s)^{−(1−R)/R} .
This is not an especially easy ODE to solve, but given the solution when r = 0 it is natural to guess a solution of the form v(s) = B s^{1−R}/(1−R). This satisfies the BC v(0) = u(0)/ρ = 0, and also satisfies the ODE for a value of B leading to

v(s) = [ R/(ρ − (1 − R)r) ]^R · s^{1−R}/(1−R) .
(Notice that this solution collapses to the cake-eating problem when r = 0.) For this to be a sensible solution we need ρ > (1 − R)r, so that B and consumption are positive. Since x*(s) = v'(s)^{−1/R}, we obtain

x*(s) = ( (ρ − (1 − R)r)/R ) s ,
and again the agent consumes at a rate proportional to her wealth at each point in time.
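A mechanical check is again possible. The sketch below picks illustrative values R = 1/2, ρ = 1/10 and r = 1/20 (arbitrary, except that ρ > (1 − R)r holds) and verifies that the guess solves the ODE and yields the consumption rule above.

```python
import sympy as sp

s = sp.symbols('s', positive=True)
# Illustrative parameter values, chosen only so that rho > (1 - R)*r:
R, rho, r = sp.Rational(1, 2), sp.Rational(1, 10), sp.Rational(1, 20)

B = (R / (rho - (1 - R) * r))**R           # claimed coefficient
v = B * s**(1 - R) / (1 - R)               # guessed value function
v_prime = sp.diff(v, s)

# Bellman ODE: rho*v(s) = r*s*v'(s) + R/(1-R) * v'(s)**(-(1-R)/R)
residual = rho * v - (r * s * v_prime + R / (1 - R) * v_prime**(-(1 - R) / R))
print(sp.simplify(residual))               # 0: the guess solves the ODE

x_star = v_prime**(-1 / R)                 # optimal consumption from the FOC
print(sp.simplify(x_star - (rho - (1 - R) * r) / R * s))   # 0
```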
Notice that, since x*(s) is the maximizing choice in (26), differentiating the Bellman equation with respect to s and using the envelope theorem gives

ρ v'(s) = r v'(s) + (rs − x*(s)) v''(s) ,

or

(r − ρ) v'(s) = (rs − x*(s)) (−v''(s)) .

Since we can show directly that v is both increasing and concave, i.e. both v'(s) and −v''(s) are positive, savings rs − x*(s) must have the same sign as r − ρ: the agent's wealth rises over time when r > ρ and falls when r < ρ.
Example: A firm sells output at rate x, earning revenue R(x), and its unit cost of production c(s) falls with its cumulative past output s (learning by doing), so that c'(s) < 0 and ṡ = x. The Bellman equation for the firm's discounted profit is

ρ v(s) = max_x { R(x) − c(s)x + v'(s)x }
       = max_x { R(x) − [c(s) − v'(s)]x } .
A myopic firm would choose x simply to maximize R(x) − c(s)x. Since it is clear that v'(s) > 0 (because c'(s) < 0), it is easy to see that the firm produces more at each point in time than its short-run incentive would dictate: in effect its relevant marginal costs are lower than c(s), the intuition being that producing more now has a positive effect on future profits.
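To see the comparison at the level of first-order conditions, assume (as the discussion suggests) that R(·) is concave, so that R' is decreasing, and that the optimum is interior. Then

myopic firm:  R'(x) = c(s) ,        forward-looking firm:  R'(x*) = c(s) − v'(s) < c(s) ,

and since R' is decreasing the forward-looking firm indeed chooses a larger output rate x*.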
Example: The Ramsey growth model
The Bellman equation for the Ramsey growth model is given by
ρ v(s) = max_x { u(x) + v'(s)[f(s) − x − (n + δ)s] } .

There is no explicit solution in general, but the optimal steady state (s*, x*) is easily characterized. Differentiating the Bellman equation with respect to s, using the envelope theorem, and evaluating at (s*, x*) gives

ρ v'(s*) = v''(s*) [f(s*) − x* − (n + δ)s*] + v'(s*) [f'(s*) − (n + δ)] .

However, from the condition that ṡ = 0, this pair must also satisfy f(s*) − x* − (n + δ)s* = 0. Therefore

ρ v'(s*) = v'(s*) [f'(s*) − (n + δ)] ,

and by cancelling v'(s*) we obtain

f'(s*) = n + δ + ρ .
(Compare equation (17).) Stability properties and asymptotic behaviour are covered in
the section on Optimal Control.
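As a purely illustrative numerical check of what the modified golden rule implies, the sketch below assumes a Cobb-Douglas technology f(s) = s^α and hypothetical parameter values; neither the functional form nor the numbers come from the notes.

```python
# Purely illustrative: assume f(s) = s**alpha and hypothetical parameters.
alpha, n, delta, rho = 0.3, 0.01, 0.05, 0.03

# Modified golden rule: f'(s*) = alpha * s***(alpha - 1) = n + delta + rho
s_star = (alpha / (n + delta + rho)) ** (1 / (1 - alpha))
# Steady-state consumption from s_dot = 0: x* = f(s*) - (n + delta) * s*
x_star = s_star ** alpha - (n + delta) * s_star
print(s_star, x_star)    # roughly 5.6 and 1.3
```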
One problem that does allow an explicit solution is the following (see Dixit, Exercise
11.2), which takes a slightly different form to our previous problems.
Example: A firm's research project involves ongoing R&D. The R&D intensity at time t is x(t) and the stock of research is s(t), where this evolves according to ṡ = f(x), with f being a concave function. As soon as the stock reaches the level s = s̄ the project is completed and the firm receives a payoff R. If the time taken to reach this target is T, the firm's discounted payoff is

e^{−ρT} R − ∫_0^T e^{−ρt} x(t) dt .
Let v(s) denote the maximum discounted profit starting with an initial research stock s. Then the Bellman equation is

ρ v(s) = max_x { −x + f(x) v'(s) } .   (27)

In the case where f(x) = 2√x this becomes

ρ v(s) = (v'(s))^2 .

Solving this subject to the boundary condition v(s̄) = R gives the solution

√(ρ v(s)) = √(ρ R) − ρ(s̄ − s)/2 .
Therefore, starting with an initial stock s = 0, it is worth pursuing the project only if R > ρ s̄^2/4. From (27) the optimal choice of x as a function of the current stock s is given by

x*(s) = (v'(s))^2 = ρ v(s) .
Therefore, the optimal time-path of R&D intensity is increasing over time.
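A short sympy sketch (just one convenient way to confirm the algebra) verifies that this expression satisfies both the ODE ρ v = (v')^2 and the boundary condition.

```python
import sympy as sp

s, rho, Rpay, s_bar = sp.symbols('s rho R sbar', positive=True)

# Candidate solution: sqrt(rho*v(s)) = sqrt(rho*R) - rho*(sbar - s)/2
v = (sp.sqrt(rho * Rpay) - rho * (s_bar - s) / 2)**2 / rho

print(sp.simplify(rho * v - sp.diff(v, s)**2))   # 0: satisfies rho*v = (v')^2
print(sp.simplify(v.subs(s, s_bar) - Rpay))      # 0: boundary condition v(sbar) = R
# Starting from s = 0, v(0) > 0 (the project is worth pursuing) iff R > rho*sbar**2/4.
```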
5.2.1 Asset equations
Assume that you can be in one of two states, X and Y. In state X you receive flow utility of w_X, and with probability q_X dt an event occurs resulting in instantaneous utility of u_X and a transit to state Y; with probability 1 − q_X dt the event does not occur and you simply remain in state X. Similarly for being in state Y. Then, with v(·) denoting the value function,
v(X) ≈ w_X dt + (1 − ρ dt) [ q_X dt (u_X + v(Y)) + (1 − q_X dt) v(X) ]
     = w_X dt + (1 − ρ dt) [ q_X (u_X + v(Y) − v(X)) dt + v(X) ]
     ≈ w_X dt + q_X [u_X + v(Y) − v(X)] dt + v(X) − ρ v(X) dt .
So
ρ v(X) = w_X + q_X [u_X + v(Y) − v(X)]

and similarly

ρ v(Y) = w_Y + q_Y [u_Y + v(X) − v(Y)] .

These are known as asset equations, and are of the form

discount rate × asset value = dividend + expected capital gain,

and subtracting one from the other gives us a single equation determining v(X) − v(Y).
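For completeness, carrying out the subtraction explicitly gives

(ρ + q_X + q_Y) [v(X) − v(Y)] = w_X − w_Y + q_X u_X − q_Y u_Y ,

i.e.

v(X) − v(Y) = (w_X − w_Y + q_X u_X − q_Y u_Y) / (ρ + q_X + q_Y) .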
Example: Diamond's model of search (Journal of Political Economy, 1982)
Any agent can be in one of two states: employed (E), in possession of a good to barter; unemployed (U), searching for a good. In state E an agent meets another in state E with probability b dt, they exchange, consume the good receiving utility y, and become unemployed. In state U an agent finds a production opportunity with probability a dt; this production opportunity has a cost c which is distributed according to some CDF G, and she uses a cut-off rule: accept the opportunity if c ≤ c̄ and reject otherwise.
The agent's objective is to maximize discounted life-time utility ∫_0^∞ e^{−ρt} u(y_t) dt, where u(y) = y.
If v(·) denotes the value function for this problem, show that

ρ v(E) = b [y + v(U) − v(E)]

and

ρ v(U) = a ∫_0^{c̄} [−c + v(E) − v(U)] dG(c) .
Writing V(c̄) for v(E) − v(U) when using cut-off c̄, show that

V(c̄) = ( b y + a ∫_0^{c̄} c dG(c) ) / ( ρ + b + a G(c̄) )

and find the optimal cut-off c*.
The rest of the model:
The probability of two agents in state E meeting depends on the fraction of agents
employed e, i.e. b = b(e), with b(e) increasing in e. The fraction of agents employed then evolves according to ė = a(1 − e)G(c*) − e b(e), and we can determine steady-state values of e as a function of c*.

… x, find by direct methods the value function v(s, T). Show that this function satisfies the Bellman equation (28).
Exercise: In the cake-eating problem with a finite horizon T and no discounting (i.e. ρ = 0), it is intuitive (when utility u(x) is concave) that the agent will consume the cake
steadily at the rate s/T for the whole time, which yields total utility v(s, T) = Tu(s/T).
Verify that for any concave utility function u(x), this value function does indeed satisfy
the Bellman equation (28).
6 Optimal Control and Dynamic Programming: the connections
In the optimal control approach, the first equation in (13) shows that x*(t) maximizes

H(x, s(t)) = u(x, s(t)) + λ(t) φ(x, s(t)) ,

whereas from (25) we see that in the dynamic programming approach x*(t) maximizes

u(x, s(t)) + v'(s(t)) φ(x, s(t)) .

These two approaches are consistent only if

λ(t) ≡ v'(s(t)) ,

so that the multiplier λ(t) represents the marginal benefit of having fractionally more of the state variable s at time t. (This is a general property of Lagrange multipliers.)
The second equation in (13) can be rewritten as

ρ λ = (∂/∂s) u(x*, s) + λ (∂/∂s) φ(x*, s) + dλ/dt ,

and substituting λ(t) = v'(s(t)) this becomes

ρ v'(s) = (∂/∂s) u(x*, s) + v'(s) (∂/∂s) φ(x*, s) + (d/dt) v'(s)
        = (∂/∂s) u(x*, s) + v'(s) (∂/∂s) φ(x*, s) + v''(s) ṡ
        = (∂/∂s) u(x*, s) + v'(s) (∂/∂s) φ(x*, s) + v''(s) φ(x*, s)
        = (∂/∂s) u(x*, s) + (∂/∂s) [ v'(s) φ(x*, s) ] .
If we consider the differential equation that comes from the Bellman equation (25), namely

ρ v(s) = u(x*, s) + v'(s) φ(x*, s) ,

and differentiate it with respect to s, using the envelope theorem, then we obtain exactly the same equality.
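As a concrete illustration of this equivalence, take the log-utility cake-eating example above, where u = ln x and φ(x, s) = −x; under the current-value convention used here, the costate equation reduces to dλ/dt = ρλ, since neither u nor φ depends on s. The Hamiltonian ln x − λx is maximized at x = 1/λ, while the dynamic programming solution gives v'(s) = 1/(ρ s). Along the optimal path ṡ = −ρ s, so s(t) = s(0) e^{−ρt} and

λ(t) = v'(s(t)) = e^{ρt} / (ρ s(0)) ,

which does satisfy dλ/dt = ρλ, and x*(t) = 1/λ(t) = ρ s(t), exactly the policy found before.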