3 Recursive Dynamic Programming
Dmitry Mukhin
[email protected]
The previous lecture showed that dynamic optimization problems can be much more complex than static ones because of the large number of periods and states. Such problems, however, also have one important advantage that allows one to solve them in a completely different way from static problems: the variables are naturally ordered by time. As a result, one can use backward induction and apply recursive methods.
Consider again the consumption-savings problem from the previous lecture:
$$\max_{\{C_t, B_{t+1}\}} \sum_{t=0}^{T} \beta^t u(C_t)$$
subject to the budget constraint $B_{t+1} = R B_t + Y_t - C_t$ and the terminal condition $B_{T+1} \geq 0$.
These optimality conditions, i.e. the Euler equations from the previous lecture and the terminal condition, together with the budget constraints constitute a system of $2(T+1)$ equations with $2(T+1)$ unknowns. As the horizon $T$ grows, the dimensionality of the problem and the computational burden increase as well.
The key idea of recursive methods is to solve the problem sequentially, period by period, starting with the very last one and then moving backward to period zero. In this case, there is no need to solve a large system of equations. Instead, the problem reduces to multiple steps, each of which is computationally very simple. To see this, we solve the model backward, assuming for simplicity log utility $u(C) = \log C$ and zero income $Y_t = 0$:
1. Period $T$: the optimal decision is trivial, as the agent consumes all of her wealth, leaving zero bequest:
$$B_{T+1} = 0, \qquad C_T = R B_T.$$
It follows that the utility the household gets in the last period equals
$$V_T = \log(R B_T).$$
2. Period $T-1$: the agent starts the period with wealth $R B_{T-1}$ and chooses saving to maximize utility in the two remaining periods of life:
$$\max_{B_T} \Big\{ \log(C_{T-1}) + \beta V_T(B_T) \Big\} \quad \text{s.t.} \quad B_T = R B_{T-1} - C_{T-1}.$$
Solving the first-order condition gives
$$B_T = \frac{\beta}{1+\beta} R B_{T-1}, \qquad C_{T-1} = \frac{1}{1+\beta} R B_{T-1},$$
so that
$$V_{T-1} = \log C_{T-1} + \beta V_T = \log \frac{R B_{T-1}}{1+\beta} + \beta \log \frac{\beta R \cdot R B_{T-1}}{1+\beta} = (1+\beta) \log \frac{R B_{T-1}}{1+\beta} + \beta \log(\beta R).$$
3. Period $T-2$: the agent starts the period with wealth $R B_{T-2}$ and chooses saving to maximize utility in the current and all remaining periods:
$$\max_{B_{T-1}} \Big\{ \log(C_{T-2}) + \beta V_{T-1}(B_{T-1}) \Big\} \quad \text{s.t.} \quad B_{T-1} = R B_{T-2} - C_{T-2}.$$
Take the first-order condition
$$\frac{1}{R B_{T-2} - B_{T-1}} = \frac{\beta(1+\beta)}{B_{T-1}}$$
to get
$$B_{T-1} = \frac{\beta(1+\beta)}{1+\beta(1+\beta)} R B_{T-2}, \qquad C_{T-2} = \frac{1}{1+\beta(1+\beta)} R B_{T-2}.$$
It follows that the total welfare from the last three periods equals
$$V_{T-2} = \log C_{T-2} + \beta V_{T-1} = (1 + \beta + \beta^2) \log \frac{R B_{T-2}}{1+\beta+\beta^2} + \beta(1+2\beta) \log(\beta R).$$
4. ...
5. Period $t$: although we can follow the same steps up until period zero, this problem is simple enough to solve using the “guess and verify” method. Indeed, the only thing we need to know about $V_{t+1}$ to solve the optimization problem in period $t$ is how it depends on $B_{t+1}$. The previous steps suggest that
$$V_{t+1}(B_{t+1}) = (1 + \beta + \dots + \beta^{T-t-1}) \log B_{t+1} + \text{const}_{t+1},$$
where $\text{const}_{t+1}$ is some constant that depends only on the parameters $\beta$ and $R$. Using this “educated guess”, the optimization problem of period $t$ can be written as
$$\max_{B_{t+1}} \Big\{ \log(R B_t - B_{t+1}) + \beta (1 + \beta + \dots + \beta^{T-t-1}) \log B_{t+1} + \beta\, \text{const}_{t+1} \Big\}.$$
The first-order condition
$$\frac{1}{R B_t - B_{t+1}} = \frac{\beta(1 + \beta + \dots + \beta^{T-t-1})}{B_{t+1}}$$
implies
$$B_{t+1} = \frac{\beta + \beta^2 + \dots + \beta^{T-t}}{1 + \beta + \dots + \beta^{T-t}} R B_t, \qquad C_t = \frac{1}{1 + \beta + \dots + \beta^{T-t}} R B_t.$$
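Since the finite-horizon policy has a closed form, it is easy to check numerically. The following Python sketch, with illustrative parameter values that are not from the lecture, simulates the rule $C_t = R B_t / (1 + \beta + \dots + \beta^{T-t})$ forward and verifies that it satisfies the Euler equation $C_{t+1} = \beta R\, C_t$ and exhausts wealth, $B_{T+1} = 0$:

```python
import numpy as np

# Illustrative parameter values (assumed, not from the lecture)
beta, R, T, B0 = 0.95, 1.04, 10, 1.0

B = B0
for t in range(T + 1):
    share = 1.0 / sum(beta**j for j in range(T - t + 1))  # 1/(1 + beta + ... + beta^{T-t})
    C = share * R * B                  # closed-form consumption rule C_t
    B_next = R * B - C                 # budget constraint with Y_t = 0
    if t > 0:
        # For log utility the Euler equation u'(C_{t-1}) = beta*R*u'(C_t)
        # reduces to C_t = beta*R*C_{t-1}
        assert np.isclose(C, beta * R * C_prev)
    C_prev, B = C, B_next

assert np.isclose(B, 0.0)              # terminal condition B_{T+1} = 0
```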
Of course, not every problem can be solved using the guess-and-verify method. In some cases, it is not even possible to solve the optimization problem in a given period analytically. However, implemented numerically, the method is extremely powerful and can be applied to a large class of models. Moreover, increasing the horizon $T$ to infinity actually simplifies the problem. This surprising result can easily be seen from the expressions above: while $V_t(B)$ depends on $t$ when $T$ is finite, taking the limit $T \to \infty$ results in
$$V(B) = \frac{1}{1-\beta} \log B + \text{const},$$
where $\text{const}$ is independent of $t$ and $B_t$. Thus, the continuation value $V(B_t)$ depends only on the level of wealth at the beginning of the corresponding period. The optimization problem in an arbitrary period of an infinitely-lived agent is given by
$$\max_{B'} \Big\{ \log(R B - B') + \frac{\beta}{1-\beta} \log B' + \beta\, \text{const} \Big\},$$
where a prime denotes the next period’s variable. The optimality condition
$$\frac{1}{R B - B'} = \frac{\beta}{1-\beta} \frac{1}{B'}$$
implies
$$B_{t+1} = \beta R B_t, \qquad C_t = (1-\beta) R B_t$$
and
$$V(B) = \log C_t + \beta V(B_{t+1}) = \frac{1}{1-\beta} \log B_t + \log(1-\beta)R + \frac{\beta}{1-\beta} \log \beta R + \beta\, \text{const}.$$
Substituting the guess for V (B) on the left hand side of the equation, we can solve for the
constant term:
$$\text{const} = \frac{1}{1-\beta} \Big[ \log(1-\beta)R + \frac{\beta}{1-\beta} \log \beta R \Big].$$
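As a quick sanity check, one can verify numerically that this value function and policy indeed solve the Bellman equation. A minimal sketch in Python, with illustrative (assumed) parameter values:

```python
import numpy as np
from scipy.optimize import minimize_scalar

beta, R = 0.95, 1.04   # illustrative values (assumed, not from the lecture)

# Closed-form constant and value function derived above
const = (np.log((1 - beta) * R) + beta / (1 - beta) * np.log(beta * R)) / (1 - beta)
V = lambda B: np.log(B) / (1 - beta) + const

for B in [0.5, 1.0, 2.0]:
    # Maximize the right-hand side of the Bellman equation over B'
    res = minimize_scalar(lambda Bp: -(np.log(R * B - Bp) + beta * V(Bp)),
                          bounds=(1e-9, R * B - 1e-9), method="bounded")
    assert np.isclose(V(B), -res.fun)                  # V(B) equals the maximized RHS
    assert np.isclose(res.x, beta * R * B, atol=1e-4)  # maximizer is B' = beta*R*B
```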
The theory of dynamic programming generalizes these insights. Instead of focusing on the optimal sequences $\{C_t, B_{t+1}\}$, it looks for a time-invariant value function $V(\cdot)$ and policy function $B_{t+1} = g(B_t)$ that solve the Bellman equation:
$$V(B) = \max_{B'} \Big\{ u(R B - B') + \beta V(B') \Big\}.$$
Intuitively, the policy function allows one to recover the optimal path as $B_1 = g(B_0)$, $B_2 = g(B_1) = g(g(B_0))$, etc. To find the policy function, we need to solve the optimization problem in some period $t$. The state variable summarizes the past, i.e. the wealth accumulated in previous periods, while the value function summarizes the future, i.e. the maximum utility attainable in the next periods. Knowing these two objects is sufficient to resolve the trade-off between current and future consumption.
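For instance, with the log-utility policy $g(B) = \beta R B$ derived above, the whole optimal path can be generated by iterating the policy function (parameter values are again illustrative):

```python
beta, R, B0 = 0.95, 1.04, 1.0     # illustrative values (assumed)

B_path = [B0]
for t in range(5):
    B_path.append(beta * R * B_path[-1])          # B_{t+1} = g(B_t)
C_path = [(1 - beta) * R * B for B in B_path]     # C_t = (1 - beta)*R*B_t
```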
Optimality conditions Although this is hardly the main application of the Bellman equation, the recursive methods can be used to derive the same optimality conditions that we previously obtained following the sequential approach. Indeed, the first-order condition for the Bellman equation is
$$u'(R B - B') = \beta V'(B') \quad \Rightarrow \quad u'(C) = \beta V'(B'),$$
while the envelope condition for the value function, also called the Benveniste-Scheinkman equation, implies
$$V'(B) = R\, u'(R B - B') = R\, u'(C).$$
Combining these conditions, we get the Euler equation from the previous lecture:
$$u'(C) = \beta R\, u'(C').$$
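As a quick symbolic check of the envelope condition for the log-utility solution above, a minimal Python/sympy sketch (the library is used only for illustration):

```python
import sympy as sp

beta, R, B = sp.symbols('beta R B', positive=True)
const = sp.Symbol('const')

# Closed-form solution from above: V(B) = log(B)/(1-beta) + const, C = (1-beta)*R*B
V = sp.log(B) / (1 - beta) + const
C = (1 - beta) * R * B

# Benveniste-Scheinkman condition V'(B) = R*u'(C) with u = log, i.e. V'(B) = R/C
assert sp.simplify(sp.diff(V, B) - R / C) == 0
```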
Computation How can we solve the Bellman equation when simple analytical expressions for $V(\cdot)$ do not exist? Luckily, the contraction mapping theorem can be used to show that, under some regularity conditions, the Bellman equation has a unique solution, and iterating the equation starting from an arbitrary guess for the value function will eventually lead to the right solution. This formalizes the idea that $V_T(\cdot) \to V(\cdot)$ as the horizon goes to infinity, $T \to \infty$. This theoretical result suggests the following numerical algorithm:
1. Start with an arbitrary function $V_0(\cdot)$.
2. Given $V_j(\cdot)$, compute $V_{j+1}(B) = \max_{B'} \big\{ u(R B - B') + \beta V_j(B') \big\}$ for every value of $B$.
3. Repeat the previous step with the updated function $V_{j+1}(\cdot)$.
4. Stop when the value function converges, i.e. the functions $V_{j+1}(\cdot)$ and $V_j(\cdot)$ are “close” for all values of $B$.
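A minimal sketch of this value function iteration in Python, on a discrete grid for wealth and with illustrative (assumed) parameter values; the closed-form solution derived above serves only as a benchmark:

```python
import numpy as np

beta, R = 0.95, 1.04                         # illustrative values (assumed)
grid = np.linspace(0.1, 10.0, 500)           # discrete grid for wealth B

# One-period utility u(RB - B') for every pair (B, B');
# -inf rules out choices with non-positive consumption
C = R * grid[:, None] - grid[None, :]
util = np.where(C > 0, np.log(np.where(C > 0, C, 1.0)), -np.inf)

V = np.zeros(len(grid))                      # step 1: arbitrary guess V_0
for j in range(5000):
    V_new = np.max(util + beta * V[None, :], axis=1)   # step 2: Bellman operator
    if np.max(np.abs(V_new - V)) < 1e-8:     # step 4: stop at convergence
        V = V_new
        break
    V = V_new                                # step 3: iterate with V_{j+1}

policy = grid[np.argmax(util + beta * V[None, :], axis=1)]  # B' = g(B)
# On interior gridpoints, policy is close to the closed form g(B) = beta*R*B
```

Note that the maximization is over a finite grid, so no derivatives are required, which is exactly why the method extends to the discrete problems and binding constraints discussed next.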
One more advantage of dynamic programming over the sequential approach is that it can solve discrete problems. Indeed, the computation does not involve taking derivatives and does not require the functions to be differentiable. Similarly, the method can easily accommodate occasionally binding constraints, which give rise to additional complementary slackness conditions under the sequential approach.
Theory We briefly discuss the main theoretical result about dynamic programming. Define
the value function as a solution to a general dynamic problem
$$V(B_t) = \max_{\{B_{t+j+1}\}_{j=0}^{\infty}} \sum_{j=0}^{\infty} \beta^j u(B_{t+j}, B_{t+j+1}) \quad \text{s.t.} \quad B_{t+j+1} \in \Gamma(B_{t+j}), \tag{1}$$
where $B$ is a vector, $\Gamma(\cdot)$ is a non-empty correspondence, and $B_t$ is given. The next result shows that this problem can be restated in terms of a functional equation, i.e. an equation with an unknown function rather than an unknown variable.
$$\begin{aligned}
V(B_t) &= \max_{\{B_{t+j+1}\}_{j=0}^{\infty}} \Big\{ u(B_t, B_{t+1}) + \sum_{j=1}^{\infty} \beta^j u(B_{t+j}, B_{t+j+1}) \Big\} \\
&= \max_{B_{t+1}} \Big\{ u(B_t, B_{t+1}) + \max_{\{B_{t+j+1}\}_{j=1}^{\infty}} \sum_{j=1}^{\infty} \beta^j u(B_{t+j}, B_{t+j+1}) \Big\} \\
&= \max_{B_{t+1}} \Big\{ u(B_t, B_{t+1}) + \max_{\{B_{(t+1)+j+1}\}_{j=0}^{\infty}} \sum_{j=0}^{\infty} \beta^{j+1} u(B_{(t+1)+j}, B_{(t+1)+j+1}) \Big\} \\
&= \max_{B_{t+1}} \Big\{ u(B_t, B_{t+1}) + \beta V(B_{t+1}) \Big\},
\end{aligned}$$
where the last step follows from the definition of V (Bt+1 ) and the constraint Bt+j+1 ∈ Γ(Bt+j )
is suppressed to simplify notation.
HJB equation Just like the sequential approach, the recursive methods can also be applied to solve continuous-time problems. To see this, consider again the deterministic savings problem with zero income $Y_t = 0$ and assume an arbitrary period length $\Delta$. Note that the value function is a stock, as it reflects the continuation value at a given point in time. Given that $\beta \equiv e^{-\rho \Delta}$,
the Bellman equation (2) becomes
$$V(B_t) = \max_{C_t} \Big\{ u(C_t) \Delta + e^{-\rho \Delta} V(B_{t+\Delta}) \Big\}.$$
Because V (Bt ) does not affect the optimal choice, we can subtract it from both sides of the
equation to obtain
$$\max_{C_t} \Big\{ u(C_t) \Delta + e^{-\rho \Delta} V(B_{t+\Delta}) - V(B_t) \Big\} = 0.$$
Use the approximation $e^{-\rho \Delta} \approx 1 - \rho \Delta$ for small values of $\Delta$ and rewrite the problem in terms of the control variable:
$$\max_{C_t} \Big\{ u(C_t) \Delta - \rho V(B_{t+\Delta}) \Delta + V(B_{t+\Delta}) - V(B_t) \Big\} = 0.$$
Lastly, divide by $\Delta$, use $V(B_{t+\Delta}) - V(B_t) \approx V'(B_t)(r B_t - C_t)\Delta$ implied by the continuous-time budget constraint from the previous lecture, drop time subscripts, and take the limit $\Delta \to 0$ to get the Hamilton-Jacobi-Bellman (HJB) equation:
$$\rho V(B) = \max_{C} \Big\{ u(C) + V'(B)(r B - C) \Big\}.$$
The first-order condition of the HJB equation implies
$$u'(C_t) = V'(B_t)$$
in every period of time. Therefore, differentiating this equation with respect to $t$, we get
$$u''(C_t) \dot{C}_t = V''(B_t) \dot{B}_t.$$
At the same time, the envelope condition for the HJB equation, evaluated at the optimum, is
$$\rho V'(B_t) = V''(B_t)(r B_t - C_t) + r V'(B_t) \quad \Rightarrow \quad V''(B_t) \dot{B}_t = (\rho - r) V'(B_t).$$
Combining these expressions, we get $u''(C_t) \dot{C}_t = (\rho - r)\, u'(C_t)$, which can be rewritten as the Euler equation from the previous lecture:
$$-\frac{u''(C_t)}{u'(C_t)} \dot{C}_t = r - \rho.$$
Example Consider again the case of log utility $u(C) = \log C$ and no labor income $Y_t = 0$, so that the HJB equation is given by
$$\rho V(B) = \max_{C} \Big\{ \log C + V'(B)(r B - C) \Big\}.$$
Following our results for the discrete-time problem, conjecture that $V(B) = \alpha + \gamma \log B$, where $\alpha$ and $\gamma$ are some unknown constants. Substitute this conjecture into the HJB, take the FOC, and express the consumption function as $C = B/\gamma$. Substitute this result together with the conjectured value function into the HJB:
$$\rho \alpha + \rho \gamma \log B = \log \frac{B}{\gamma} + \frac{\gamma}{B} \Big( r B - \frac{B}{\gamma} \Big).$$
Matching the coefficients on $\log B$ and the constant terms yields
$$\gamma = \frac{1}{\rho}, \qquad \alpha = \frac{\log \rho + r/\rho - 1}{\rho}.$$
Finally, notice that this solution can also be obtained by writing the discrete-time value func-
tion from above for an arbitrary period length ∆ and taking the limit ∆ → 0.
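As with the discrete-time problem, the closed form can be checked numerically. A short Python sketch under assumed illustrative parameter values:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rho, r = 0.05, 0.04                  # illustrative values (assumed)

# Closed-form coefficients derived above
gamma = 1.0 / rho
alpha = (np.log(rho) + r / rho - 1.0) / rho
V  = lambda B: alpha + gamma * np.log(B)
dV = lambda B: gamma / B             # V'(B)

for B in [0.5, 1.0, 2.0]:
    # Maximize the right-hand side of the HJB over consumption C
    res = minimize_scalar(lambda C: -(np.log(C) + dV(B) * (r * B - C)),
                          bounds=(1e-9, 10.0 * B), method="bounded")
    assert np.isclose(rho * V(B), -res.fun)          # HJB holds with equality
    assert np.isclose(res.x, B / gamma, atol=1e-4)   # optimal C = B/gamma = rho*B
```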