
EC400: Mathematical Methods for Macroeconomics Fall 2023

Dynamic Programming
Dmitry Mukhin
[email protected]

The previous lecture shows that dynamic optimization problems can be much more complex than static ones because of the large number of periods and states. Such problems, however, also have one important advantage that allows one to solve them in a completely different way: the variables are naturally ordered by time. As a result, one can use backward induction and apply recursive methods.

Recursive approach   Consider again a finite-period deterministic savings problem:

$$\max_{\{C_t, B_{t+1}\}} \sum_{t=0}^{T} \beta^t u(C_t)$$

$$\text{s.t.} \quad B_{t+1} = R B_t + Y_t - C_t,$$

$$B_{T+1} \geq 0.$$

Following a standard sequential approach, we get the Euler equations

$$u'(C_t) = \beta R\, u'(C_{t+1}), \quad \forall t < T,$$

and the transversality condition

$$B_{T+1} = 0.$$

These optimality conditions together with the budget constraint constitute a system of $2(T+1)$ equations in $2(T+1)$ unknowns. As the horizon $T$ increases, the dimensionality of the problem and the computational burden increase as well.
The key idea of the recursive methods is to solve the problem sequentially, period by period, starting with the very last one and then moving backwards to period zero. In this case, there is no need to solve a large system of equations. Instead, the problem reduces to multiple steps, each computationally very simple. To see this, we solve the model backwards, assuming for simplicity log utility $u(C) = \log C$ and zero income $Y_t = 0$:

1. Period $T$: the optimal decision is trivial, as the agent consumes all of her wealth, leaving zero bequest:

$$B_{T+1} = 0, \quad C_T = R B_T.$$

It follows that the utility the household gets in the last period equals

$$V_T = \log(R B_T).$$

2. Period $T-1$: the agent starts the period with wealth $R B_{T-1}$ and chooses savings to maximize utility in the two remaining periods of life:

$$\max_{B_T} \Big\{ \log(C_{T-1}) + \beta V_T(B_T) \Big\}$$

$$\text{s.t.} \quad B_T = R B_{T-1} - C_{T-1}.$$

Substitute $V_T$ and the budget constraint into the objective function:

$$\max_{B_T} \Big\{ \log(R B_{T-1} - B_T) + \beta \log(R B_T) \Big\}.$$

The first-order condition

$$\frac{1}{R B_{T-1} - B_T} = \frac{\beta}{B_T}$$

implies

$$B_T = \frac{\beta}{1+\beta} R B_{T-1}, \quad C_{T-1} = \frac{1}{1+\beta} R B_{T-1},$$

and the total utility from the last two periods equals

$$V_{T-1} = \log C_{T-1} + \beta V_T = \log \frac{R B_{T-1}}{1+\beta} + \beta \log \frac{\beta R}{1+\beta} R B_{T-1} = (1+\beta) \log \frac{R B_{T-1}}{1+\beta} + \beta \log(\beta R).$$

3. Period $T-2$: the agent starts the period with wealth $R B_{T-2}$ and chooses savings to maximize utility in the current and all remaining periods:

$$\max_{B_{T-1}} \Big\{ \log(C_{T-2}) + \beta V_{T-1}(B_{T-1}) \Big\}$$

$$\text{s.t.} \quad B_{T-1} = R B_{T-2} - C_{T-2}.$$

Substitute $V_{T-1}$ and the budget constraint into the objective function:

$$\max_{B_{T-1}} \Big\{ \log(R B_{T-2} - B_{T-1}) + \beta(1+\beta) \log \frac{R B_{T-1}}{1+\beta} + \beta^2 \log(\beta R) \Big\}.$$

Take the first-order condition

$$\frac{1}{R B_{T-2} - B_{T-1}} = \frac{\beta(1+\beta)}{B_{T-1}}$$

and solve it to obtain

$$B_{T-1} = \frac{\beta(1+\beta)}{1+\beta(1+\beta)} R B_{T-2}, \quad C_{T-2} = \frac{1}{1+\beta(1+\beta)} R B_{T-2}.$$

It follows that the total welfare from the last three periods equals

$$V_{T-2} = \log C_{T-2} + \beta V_{T-1} = (1+\beta+\beta^2) \log \frac{R B_{T-2}}{1+\beta+\beta^2} + \beta(1+2\beta) \log(\beta R).$$

4. . . .

5. Period $t$: although we could follow the same steps all the way to period zero, this problem is simple enough to solve using the "guess and verify" method. Indeed, the only thing we need to know about $V_{t+1}$ to solve the optimization problem in period $t$ is how it depends on $B_{t+1}$. The previous steps suggest that

$$V_{t+1} = (1 + \beta + \cdots + \beta^{T-t-1}) \log B_{t+1} + \text{const}_{t+1},$$

where $\text{const}_{t+1}$ is some constant that depends only on the parameters $\beta$ and $R$. Using this "educated guess", the optimization problem of period $t$ can be written as

$$\max_{B_{t+1}} \Big\{ \log(R B_t - B_{t+1}) + \beta(1 + \beta + \cdots + \beta^{T-t-1}) \log B_{t+1} + \beta\, \text{const}_{t+1} \Big\}.$$

Take the first-order condition

$$\frac{1}{R B_t - B_{t+1}} = \frac{\beta(1 + \beta + \cdots + \beta^{T-t-1})}{B_{t+1}}$$

and solve it to get

$$B_{t+1} = \frac{\beta + \beta^2 + \cdots + \beta^{T-t}}{1 + \beta + \cdots + \beta^{T-t}} R B_t, \quad C_t = \frac{1}{1 + \beta + \cdots + \beta^{T-t}} R B_t.$$

The resulting total welfare equals

$$V_t = \log C_t + \beta V_{t+1} = (1 + \beta + \cdots + \beta^{T-t}) \log B_t + \text{const}_t,$$

confirming the guess above. (A quick numerical check of this recursion appears after this list.)
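The recursion above can be checked numerically. Writing $V_t = a_t \log B_t + \text{const}_t$, the steps imply $a_T = 1$ and $a_t = 1 + \beta a_{t+1}$, so that $C_t = R B_t / a_t$. The following is a minimal Python sketch (the values of $\beta$ and $T$ are illustrative assumptions) confirming that this backward recursion reproduces the geometric sums in the formulas above:

    # Backward recursion for the finite-horizon log-utility problem:
    # V_t = a_t * log B_t + const_t with a_T = 1 and a_t = 1 + beta * a_{t+1},
    # so that C_t = R * B_t / a_t.  Values of beta and T are assumptions.
    beta, T = 0.95, 10

    a = [0.0] * (T + 1)
    a[T] = 1.0
    for t in range(T - 1, -1, -1):
        a[t] = 1.0 + beta * a[t + 1]          # a_t = 1 + beta + ... + beta^(T-t)

    # Compare with the geometric sum 1 + beta + ... + beta^(T-t):
    geometric = [sum(beta**j for j in range(T - t + 1)) for t in range(T + 1)]
    print(all(abs(x - y) < 1e-12 for x, y in zip(a, geometric)))   # True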

Of course, not every problem can be solved using the guess-and-verify method. In some cases, it is not even possible to solve the optimization problem in a given period analytically. However, implemented numerically, the method is extremely powerful and can be applied to a large class of models. Moreover, increasing the horizon $T$ to infinity actually simplifies the problem. This surprising result can be easily seen from the expressions above: while $V_t(B)$ depends on $t$ when $T$ is finite, taking the limit $T \to \infty$ results in

$$V(B) = \frac{1}{1-\beta} \log B + \text{const},$$

where $\text{const}$ is independent of $t$ and $B_t$. Thus, the continuation value $V(B_t)$ depends only on the level of wealth at the beginning of the corresponding period. The optimization problem in an arbitrary period of an infinitely-lived agent is given by

$$\max_{B'} \Big\{ \log(R B - B') + \frac{\beta}{1-\beta} \log B' + \beta\, \text{const} \Big\},$$

where a prime denotes the next period's variable. The optimality condition

$$\frac{1}{R B - B'} = \frac{\beta}{1-\beta} \frac{1}{B'}$$

implies

$$B_{t+1} = \beta R B_t, \quad C_t = (1-\beta) R B_t$$

and

$$V(B) = \log C_t + \beta V(B_{t+1}) = \frac{1}{1-\beta} \log B_t + \log(1-\beta)R + \frac{\beta}{1-\beta} \log \beta R + \beta\, \text{const}.$$

Substituting the guess for $V(B)$ on the left-hand side of the equation, we can solve for the constant term:

$$\text{const} = \frac{1}{1-\beta} \Big[ \log(1-\beta)R + \frac{\beta}{1-\beta} \log \beta R \Big].$$
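As a quick numerical sanity check, a minimal Python sketch (parameter values are illustrative assumptions) confirms that $B' = \beta R B$ indeed maximizes the objective of the infinite-horizon problem:

    import numpy as np
    from scipy.optimize import minimize_scalar

    beta, R, B = 0.95, 1.02, 2.0   # illustrative values (assumptions)

    # Objective of the infinite-horizon problem: log(RB - B') + beta/(1-beta) * log B'
    obj = lambda Bp: -(np.log(R * B - Bp) + beta / (1 - beta) * np.log(Bp))
    res = minimize_scalar(obj, bounds=(1e-9, R * B - 1e-9), method='bounded',
                          options={'xatol': 1e-10})

    print(abs(res.x - beta * R * B) < 1e-6)                   # savings B' = beta*R*B
    print(abs((R * B - res.x) - (1 - beta) * R * B) < 1e-6)   # consumption C = (1-beta)*R*B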
The theory of dynamic programming generalizes these insights. Instead of focusing on the optimal sequences $\{C_t, B_{t+1}\}$, it looks for a time-invariant value function $V(\cdot)$ and policy function $B_{t+1} = g(\cdot)$ that solve the Bellman equation:

$$V(B) = \max_{B'} \Big\{ u(R B - B') + \beta V(B') \Big\}.$$

Intuitively, the policy function allows one to recover the optimal path as $B_1 = g(B_0)$, $B_2 = g(B_1) = g(g(B_0))$, etc. To find the policy function, we need to solve the optimization problem in some period $t$. The state variable summarizes the past, i.e. the wealth accumulated in previous periods, while the value function summarizes the future, i.e. the maximum utility attainable in the next periods. Knowing these two objects is sufficient to resolve the trade-off between current and future consumption.
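For instance, a minimal sketch of this forward recursion in Python, using the log-utility policy $g(B) = \beta R B$ derived above (parameter values are illustrative assumptions):

    # Recover the optimal path by iterating the policy function B_{t+1} = g(B_t),
    # here g(B) = beta*R*B from the log-utility example (parameter values assumed).
    beta, R, B0, T = 0.95, 1.02, 1.0, 5

    B = [B0]
    for t in range(T):
        B.append(beta * R * B[-1])             # B_1 = g(B_0), B_2 = g(g(B_0)), ...
    C = [(1 - beta) * R * b for b in B[:-1]]   # C_t = (1 - beta) * R * B_t
    print(B)
    print(C)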

Optimality conditions   Although this is hardly the main application of the Bellman equation, the recursive methods can be used to derive the same optimality conditions that we previously obtained following the sequential approach. Indeed, the first-order condition for the Bellman equation is

$$u'(R B - B') = \beta V'(B') \quad \Rightarrow \quad u'(C) = \beta V'(B'),$$

while the envelope condition for the value function, also called the Benveniste-Scheinkman equation, implies

$$V'(B) = u'(R B - B') R \quad \Rightarrow \quad V'(B) = u'(C) R.$$

Combining these conditions, we get the Euler equation from the previous lecture:

$$u'(C) = \beta R\, u'(C').$$
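For log utility, for example, the Euler equation gives $1/C = \beta R / C'$, i.e. $C' = \beta R\, C$, which is consistent with the policy $B_{t+1} = \beta R B_t$, $C_t = (1-\beta) R B_t$ derived above.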

Computation   How can we solve the Bellman equation when simple analytical expressions for $V(\cdot)$ do not exist? Luckily, the contraction mapping theorem can be used to show that, under some regularity conditions, the Bellman equation has a unique solution and iterating the equation starting from an arbitrary guess for the value function will eventually lead to the right solution. This formalizes the idea that $V_T(\cdot) \to V(\cdot)$ as the horizon goes to infinity, $T \to \infty$. This theoretical result suggests the following numerical algorithm:
1. Start with an arbitrary function $V_0(\cdot)$.

2. For every value of $B$, solve the Bellman equation to obtain $V_1(\cdot)$:

$$V_1(B) = \max_{B'} \Big\{ u(R B - B') + \beta V_0(B') \Big\}.$$

3. Iterate this procedure, updating the value function at each step:

$$V_{j+1}(B) = \max_{B'} \Big\{ u(R B - B') + \beta V_j(B') \Big\}.$$

4. Stop when the value function converges, i.e. the functions $V_{j+1}(\cdot)$ and $V_j(\cdot)$ are "close" for all values of $B$.

5. Solve for the policy function $B' = g(B)$. (A sketch of this algorithm in code appears below.)
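The following is a minimal Python sketch of this value-function-iteration algorithm for the log-utility savings problem. The parameter values and the wealth grid are illustrative assumptions; the analytic policy $B' = \beta R B$ derived above serves as a check:

    import numpy as np

    # Value function iteration for V(B) = max_{0 < B' < RB} { log(RB - B') + beta*V(B') }.
    # A minimal sketch: beta, R, and the wealth grid are assumed, illustrative values.
    beta, R = 0.95, 1.02
    grid = np.linspace(0.1, 10.0, 500)        # grid for current wealth B (and for B')
    V = np.zeros_like(grid)                   # step 1: arbitrary initial guess V_0

    # Consumption implied by every (B, B') pair; infeasible pairs get -inf utility.
    C = R * grid[:, None] - grid[None, :]
    utility = np.where(C > 0, np.log(np.maximum(C, 1e-300)), -np.inf)

    for _ in range(2000):                     # steps 2-3: iterate the Bellman operator
        V_new = np.max(utility + beta * V[None, :], axis=1)
        done = np.max(np.abs(V_new - V)) < 1e-8
        V = V_new
        if done:                              # step 4: stop once V_{j+1} and V_j are close
            break

    # Step 5: recover the policy function B' = g(B) and compare with beta*R*B.
    policy = grid[np.argmax(utility + beta * V[None, :], axis=1)]
    print(np.max(np.abs(policy - beta * R * grid)))  # small, up to the grid spacing

Note that on a finite grid the maximization in steps 2-3 is a simple row-wise maximum over candidate choices, which is why no derivatives or differentiability are needed.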


This algorithm works surprisingly well for basic macroeconomic models and can be improved in many ways to accommodate more complex problems. An important advantage of dynamic programming over the sequential approach is that it can solve discrete problems: the computation does not involve taking derivatives and does not require the functions to be differentiable. Similarly, the method can easily accommodate occasionally binding constraints, which give rise to additional complementary slackness conditions under the sequential approach.

Theory   We briefly discuss the main theoretical result about dynamic programming. Define the value function as a solution to a general dynamic problem

$$V(B_t) = \max_{\{B_{t+j+1}\}_{j=0}^{\infty}} \sum_{j=0}^{\infty} \beta^j u(B_{t+j}, B_{t+j+1}) \quad (1)$$

$$\text{s.t.} \quad B_{t+j+1} \in \Gamma(B_{t+j}),$$

where $B$ is a vector, $\Gamma(\cdot)$ is a non-empty correspondence, and $B_t$ is given. The next result shows that this problem can be restated in terms of a functional equation, i.e. an equation whose unknown is a function rather than a variable.

Theorem 1 (Principle of optimality)   Solving the sequential problem is equivalent to solving the Bellman equation

$$V(B) = \max_{B' \in \Gamma(B)} \Big\{ u(B, B') + \beta V(B') \Big\}. \quad (2)$$

Proof: Partition the sum into two terms:

$$V(B_t) = \max_{\{B_{t+j+1}\}_{j=0}^{\infty}} \Big\{ u(B_t, B_{t+1}) + \sum_{j=1}^{\infty} \beta^j u(B_{t+j}, B_{t+j+1}) \Big\}$$

$$= \max_{B_{t+1}} \Big\{ u(B_t, B_{t+1}) + \max_{\{B_{t+j+1}\}_{j=1}^{\infty}} \sum_{j=1}^{\infty} \beta^j u(B_{t+j}, B_{t+j+1}) \Big\}$$

$$= \max_{B_{t+1}} \Big\{ u(B_t, B_{t+1}) + \max_{\{B_{(t+1)+j+1}\}_{j=0}^{\infty}} \sum_{j=0}^{\infty} \beta^{j+1} u(B_{(t+1)+j}, B_{(t+1)+j+1}) \Big\}$$

$$= \max_{B_{t+1}} \Big\{ u(B_t, B_{t+1}) + \beta V(B_{t+1}) \Big\},$$

where the last step follows from the definition of $V(B_{t+1})$, and the constraint $B_{t+j+1} \in \Gamma(B_{t+j})$ is suppressed to simplify notation. □

HJB equation   Just like the sequential approach, the recursive methods can also be applied to solve continuous-time problems. To see this, consider again the deterministic savings problem with zero income $Y_t = 0$ and assume an arbitrary period length $\Delta$. Note that the value function is a stock, as it reflects the continuation value at a given point in time. Given that $\beta \equiv e^{-\rho\Delta}$, the Bellman equation (2) becomes

$$V(B_t) = \max_{C_t} \Big\{ u(C_t)\Delta + e^{-\rho\Delta} V(B_{t+\Delta}) \Big\}.$$

Because $V(B_t)$ does not affect the optimal choice, we can subtract it from both sides of the equation to obtain

$$\max_{C_t} \Big\{ u(C_t)\Delta + e^{-\rho\Delta} V(B_{t+\Delta}) - V(B_t) \Big\} = 0.$$

Use the approximation $e^{-\rho\Delta} \approx 1 - \rho\Delta$ for small values of $\Delta$ and rewrite the problem in terms of the control variable:

$$\max_{C_t} \Big\{ u(C_t)\Delta - \rho V(B_{t+\Delta})\Delta + V(B_{t+\Delta}) - V(B_t) \Big\} = 0.$$

Divide the equation by $\Delta$ and express it as follows:

$$\max_{C_t} \Big\{ u(C_t) - \rho V(B_{t+\Delta}) + \frac{V(B_{t+\Delta}) - V(B_t)}{B_{t+\Delta} - B_t} \cdot \frac{B_{t+\Delta} - B_t}{\Delta} \Big\} = 0.$$

Note that as $\Delta \to 0$, $B_{t+\Delta} \to B_t$ and hence $\frac{V(B_{t+\Delta}) - V(B_t)}{B_{t+\Delta} - B_t} \to V'(B_t)$ and $\frac{B_{t+\Delta} - B_t}{\Delta} \to \dot{B}_t$:

$$\max_{C_t} \Big\{ u(C_t) - \rho V(B_t) + V'(B_t) \dot{B}_t \Big\} = 0.$$

Lastly, substitute in the continuous-time budget constraint from the previous lecture, drop time subscripts, and rearrange terms to get the Hamilton-Jacobi-Bellman (HJB) equation:

$$\rho V(B) = \max_{C} \Big\{ u(C) + V'(B)(rB - C) \Big\}.$$

There are well-developed numerical tools to solve such differential equations.


Like the Bellman equation in discrete time, the HJB equation can be used to derive the Euler equation. The first-order optimality condition requires

$$u'(C_t) = V'(B_t)$$

at every point in time. Therefore, differentiating this equation with respect to $t$, we get

$$u''(C_t) \dot{C}_t = V''(B_t) \dot{B}_t.$$

On the other hand, the envelope condition implies

$$\rho V'(B_t) = r V'(B_t) + V''(B_t) \dot{B}_t.$$

Combining these expressions, we get

$$\rho u'(C_t) = r u'(C_t) + u''(C_t) \dot{C}_t,$$

which can be rewritten as the Euler equation from the previous lecture:

$$-\frac{u''(C_t)}{u'(C_t)} \dot{C}_t = r - \rho.$$
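For log utility, for instance, $-u''(C)/u'(C) = 1/C$, so consumption grows at the constant rate $\dot{C}_t / C_t = r - \rho$.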

Example   Consider again the case of log utility $u(C) = \log C$ and no labor income $Y_t = 0$, so that the HJB equation is given by

$$\rho V(B) = \max_{C} \Big\{ \log C + V'(B)(rB - C) \Big\}.$$

Following our results for the discrete-time problem, conjecture that $V(B) = \alpha + \gamma \log B$, where $\alpha$ and $\gamma$ are some unknown constants. Substitute this conjecture into the HJB equation, take the FOC, and express the consumption function as $C = \frac{B}{\gamma}$. Substitute this result together with the conjectured value function into the HJB equation:

$$\rho\alpha + \rho\gamma \log B = \log \frac{B}{\gamma} + \frac{\gamma}{B} \Big( rB - \frac{B}{\gamma} \Big).$$

Combining constant terms and terms with $\log B$, we get

$$\gamma = \frac{1}{\rho}, \quad \alpha = \frac{\log\rho + r/\rho - 1}{\rho}.$$

Finally, notice that this solution can also be obtained by writing the discrete-time value function from above for an arbitrary period length $\Delta$ and taking the limit $\Delta \to 0$.
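As a final sanity check, a minimal Python sketch (the values of $\rho$ and $r$ are illustrative assumptions) confirms numerically that the conjectured $V(B)$ satisfies the HJB equation at a few points:

    import numpy as np
    from scipy.optimize import minimize_scalar

    rho, r = 0.05, 0.04            # illustrative values (assumptions)
    gamma = 1 / rho
    alpha = (np.log(rho) + r / rho - 1) / rho

    def hjb_rhs(B):
        # max over C of log C + V'(B)*(rB - C), with V'(B) = gamma/B for the conjecture
        res = minimize_scalar(lambda C: -(np.log(C) + (gamma / B) * (r * B - C)),
                              bounds=(1e-9, 100.0 * B), method='bounded',
                              options={'xatol': 1e-12})
        return -res.fun

    for B in [0.5, 1.0, 5.0]:
        lhs = rho * (alpha + gamma * np.log(B))    # rho * V(B) for the conjectured V
        print(np.isclose(lhs, hjb_rhs(B)))         # True at each point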
