
BYU-MCL Boot Camp

Dynamic Programming
Professor Richard W. Evans

1 Introduction
It is hard to think of any decision that is not dynamic and does not involve some trade-off between the current period and some future period. Examples of dynamic decisions include the classical consumption-savings decision, marriage, education, and the labor-leisure choice. Dynamic decision making is ubiquitous. In fact, static models are often simply an approximation of the more realistic dynamic setting.
The term "dynamic programming" was initially the name for the program of research that studied how to solve dynamic models (see Dreyfus, 2002). Dynamic programming has since come to be associated with the particular dynamic solution method of value function iteration pioneered by Richard Bellman.
Although dynamic modeling is very rich and realistic, it is also very hard. Take,
for example, the law of motion for the capital stock.

k_{t+1} = k_t − δk_t + i_t    (1)

This equation says that the value of the capital stock tomorrow k_{t+1} equals the value of the capital stock today k_t, minus the portion of the capital stock that depreciates between today and tomorrow δk_t, plus however much value (investment) you put back into the capital stock today i_t. This equation is dynamic because it has current-period variables with a t subscript and next-period variables with a t+1 subscript.
How many sequences of capital stock values and investment values {k_t, i_t}_{t=0}^∞ satisfy the law of motion for capital (1)? One potential sequence is to have the initial capital stock be k_0 > 0 and have investment be i_t = 0 for all t. This would make the capital stock get smaller and smaller every period, k_{t+1} < k_t, until the capital stock gets very close to zero, \lim_{t→∞} k_t = 0. A different sequence of capital stocks and investment amounts that satisfies (1) is the investment amount that exactly offsets the depreciation, thereby keeping the capital stock constant: i_t = δk_t for all t. This is termed the steady state. You could also come up with infinitely many other sequences that satisfy (1), where capital k_t and investment i_t fluctuate. More structure on the model, such as household optimization or firm profit maximization conditions, helps pin down the set of possible sequences satisfying (1).
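As a quick illustration, here is a minimal Python sketch (with assumed parameter values, since the chapter does not fix any) that simulates two of these sequences satisfying (1):

```python
import numpy as np

# A minimal sketch: two capital sequences that both satisfy equation (1),
# k_{t+1} = k_t - delta*k_t + i_t. The values below are assumed for illustration.
delta, k0, T = 0.1, 5.0, 50

# Sequence 1: zero investment, so the capital stock decays toward zero.
k_decay = np.empty(T + 1)
k_decay[0] = k0
for t in range(T):
    i_t = 0.0
    k_decay[t + 1] = k_decay[t] - delta * k_decay[t] + i_t

# Sequence 2: investment exactly offsets depreciation (the steady state).
k_steady = np.empty(T + 1)
k_steady[0] = k0
for t in range(T):
    i_t = delta * k_steady[t]
    k_steady[t + 1] = k_steady[t] - delta * k_steady[t] + i_t

print(k_decay[-1])   # close to zero
print(k_steady[-1])  # still equal to k0
```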
In this chapter, we are going to look at a very particular class of dynamic problems
with “nice” properties. These will be models that can be written in recursive form and
that can be transformed into a contraction mapping.[1] We will learn to solve them
using a particular solution method: value function iteration. But, as you already
know, there are many solution methods to dynamic models. The best one depends
[1] Good references for how to solve these problems are Stokey and Lucas (1989) and Adda and Cooper (2003).

on what model you are using and what your research question is. Value function
iteration is a nonlinear solution method for dynamic models.
A general form of the Bellman equation of a recursive model is the following,

V(x, y) = \max_{y'} σ(x, y, y') + βE[V(x', y')]    (2)

where x is the exogenous state (set of exogenous variables), y is the endogenous state (the set of endogenous variables determined by past choices), y' is the control or choice variable, σ(·) is the period objective
function, and V (x, y) is the value function. The value function tells you the value to
the agent of showing up in a given period with state (x, y), and the value function
accounts for all expected future benefits in addition to the current period benefits.

2 The Sequence Problem


Assume individuals have perfect foresight (no uncertainty) and that remaining lifetime
utility U of an agent who lives T periods is given by the following equation,
U = \sum_{t=1}^{T} β^{t−1} u(c_t)

where β ∈ (0, 1) is the discount factor and u(·) is a period utility function that is
increasing, continuous, continuously differentiable, and concave (u' > 0, u'' < 0).
Using the notation and example of Adda and Cooper (2003), we assume that
individuals are choosing how much to consume each period c_t of a cake that starts out with size W_1 > 0. So the law of motion for the size of the cake at the end of period t is the following.

W_{t+1} = W_t − c_t    (3)

You can think of the size of the cake at the beginning of a period W_t as being given.
The optimization problem for the individual is to choose consumption in each period c_t in order to maximize lifetime utility U subject to the constraint on the size of the cake (the law of motion).

\max_{c_t ∈ [0, W_t]} \sum_{t=1}^{T} β^{t−1} u(c_t)   s.t.   W_{t+1} = W_t − c_t

This problem can be rewritten in the following way by substituting the law of motion
for the cake size into the utility function.
\max_{W_{t+1} ∈ [0, W_t]} \sum_{t=1}^{T} β^{t−1} u(W_t − W_{t+1})
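Before turning to the exercises, here is a minimal numerical sketch of this sequence problem. The horizon T = 5 and the values W_1 = 1 and β = 0.9 are assumed for illustration only, and SciPy's general-purpose optimizer stands in for the analytical first-order conditions asked for below:

```python
import numpy as np
from scipy.optimize import minimize

# A minimal sketch (assumed illustrative values): choose the whole consumption
# path at once to maximize discounted lifetime utility subject to the cake size.
beta, W1, T = 0.9, 1.0, 5

def neg_lifetime_utility(c):
    t = np.arange(1, T + 1)
    return -np.sum(beta ** (t - 1) * np.log(c))

# Feasibility: consumption is positive and the path uses no more cake than W1.
constraints = ({"type": "ineq", "fun": lambda c: W1 - np.sum(c)},)
bounds = [(1e-8, W1)] * T
c0 = np.full(T, W1 / T)                 # evenly split cake as a starting guess

res = minimize(neg_lifetime_utility, c0, bounds=bounds, constraints=constraints)
c_path = res.x
W_path = W1 - np.concatenate(([0.0], np.cumsum(c_path)))   # W_1, ..., W_{T+1}
print(np.round(c_path, 4))   # with log utility, c_t falls at roughly rate beta
print(np.round(W_path, 4))
```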

Exercise 1. If the individual lives for one period T = 1, what is the condition that
characterizes the optimal amount of cake to eat in period 1? Write the problem in
the equivalent way of showing what the condition is for the optimal amount of cake
to save for the next period W_{T+1} or W_2.

Exercise 2. If the individual lives for two periods T = 2, what is the condition that characterizes the optimal amount of cake to leave for the next period W_3 in period 2? What is the condition that characterizes the optimal amount of cake to leave for the next period W_2 in period 1?
Exercise 3. If the individual lives for three periods T = 3, what are the conditions that characterize the optimal amount of cake to leave for the next period in each period {W_2, W_3, W_4}? Now assume that the initial cake size is W_1 = 1, the discount factor is β = 0.9, and the period utility function is log(c_t). Show how {c_t}_{t=1}^{3} and {W_t}_{t=1}^{4} evolve over the three periods.

3 The Recursive Problem, Finite Horizon


Now we want to define a general function, called a value function V_T(W_T), that represents the value to an individual of entering the last period of his life T with a cake of size W_T.[2]

V_T(W_T) ≡ \max_{W_{T+1}} u(W_T − W_{T+1})    (4)

The solution to this problem is a policy function for W_{T+1} that is a function of the state, W_{T+1} = ψ_T(W_T), that maximizes the value of entering the period with state W_T, and the corresponding value function V_T(W_T). The state refers to all the variables that are known by the individual at the time of the decision and that are relevant to the decision (W_T in this case). So one way to think of this problem is as a policy function W_{T+1} = ψ_T(W_T) that satisfies a value function condition.
So the value function for the last period of an individual’s life can be rewritten as
the utility of choosing the optimal amount of cake to save for the next period,
 
V_T(W_T) = u(W_T − ψ_T(W_T))    (5)

where W_{T+1} = ψ_T(W_T) = 0, the smallest feasible amount of saving, maximizes the period-T value function.


This solution is equivalent to the one-period problem from Exercise 1. Again, this
means that the optimal cake-saving policy in the last period of one’s life is to save no
cake. So the value to an individual of entering the last period of his life with cake of
size W_T is equal to the utility of eating W_T of cake.

V_T(W_T) = u(W_T)    (6)
The problem of the individual in period T − 1 becomes more interesting. One way
to write the value function of entering period T−1 with a cake of size W_{T−1} is to characterize it as a function of the discounted sum of utilities in which W_T and W_{T+1} are chosen optimally.

V_{T−1}(W_{T−1}) ≡ \max_{W_T, W_{T+1}} u(W_{T−1} − W_T) + βu(W_T − W_{T+1})

[2] Note that the notation for the value function V_t and policy function ψ_t with a time subscript is different from the notation in chapter 1 of Adda and Cooper (2003). Here, the time subscript denotes a different function for each period of time, whereas the subscript T in Adda and Cooper (2003) denotes a function in which the total number of periods is T.

If we make an envelope theorem assumption that W_{T+1} will be chosen optimally in period T according to (5), we can rewrite the problem in the following way,

V_{T−1}(W_{T−1}) ≡ \max_{W_T} u(W_{T−1} − W_T) + βV_T(W_T)    (7)

where V_T(W_T) is defined in (4). This is the finite horizon version of the famous recursive workhorse, the Bellman equation.[3] The assumption that future choices will be made optimally is called the principle of optimality.
Exercise 4. Using the envelope theorem that says W_{T+1} will be chosen optimally in the next period, show the condition that characterizes the optimal choice (the policy function) in period T−1 for W_T = ψ_{T−1}(W_{T−1}). Show the value function V_{T−1} in terms of ψ_{T−1}(W_{T−1}).
 
Exercise 5. Let u(c) = log(c). Show that V_{T−1}(W̄) does not equal V_T(W̄) and that ψ_{T−1}(W̄) does not equal ψ_T(W̄) for a cake size of W̄ when T < ∞ represents the last period of an individual's life.
Exercise 6. Using u(c) = log(c), write the finite horizon Bellman equation for the value function at time T−2. Characterize the solution for the period T−2 policy function for how much cake to save for the next period, W_{T−1} = ψ_{T−2}(W_{T−2}), using the envelope theorem (the principle of optimality) and write its analytical solution. Also, write the analytical solution for V_{T−2}.
Exercise 7. Using u(c) = log(c) and the answers to Exercises 5 and 6, write down the expressions for the analytical solutions for ψ_{T−s}(W_{T−s}) and V_{T−s}(W_{T−s}) for a general integer s ≥ 1 using induction. Show that \lim_{s→∞} V_{T−s}(W_{T−s}) = V(W_{T−s}) and that \lim_{s→∞} ψ_{T−s}(W_{T−s}) = ψ(W_{T−s}). That is, as the horizon becomes further and further away (infinite), the value function and policy function become independent of time. Another way of saying this is the following: the value of entering a period t with a certain amount of cake, when the end of your life is far enough away, depends only on how much cake there is W_t, not on what period you have that amount of cake in.
Exercise 8. Write the Bellman equation for the cake eating problem with a general
utility function u(c) when the horizon is infinite (i.e., either T = ∞ or s = ∞).

4 The Recursive Problem, Infinite Horizon


You showed in Exercise 7 in Section 3 that the value function and policy function in
the Bellman equation for the infinite horizon problem are independent of time. So
everything can now be written in terms of variables today and variables tomorrow. We will denote variables tomorrow with a prime (').

V(W) = \max_{W' ∈ [0, W]} u(W − W') + βV(W')    (8)

[3] Dreyfus (2002) is a good reference for the origin of the Bellman equation.

Note that the value function V on the left-hand-side of (8) and on the right-hand-side
are the same function. This is what you showed in Exercise 7.
Because the problem now has an infinite horizon, the nature of the solution is a
little different. The solution to (8) is a policy function W' = ψ(W) that creates a fixed
point in V . In other words, the solution is a policy function ψ(W ) that makes the
function V on the left-hand-side of (8) equal the function V on the right-hand-side.
Another way of thinking of the problem is that the Bellman equation is one equation with two unknowns: the value function V and the policy function ψ. The condition that renders this problem identified is that the Bellman equation must be a contraction mapping. In a sense, the contraction mapping condition pins down the value function V.
Define C as an operator on any value function V_t(W). Let C perform the following operation.[4]

C[V_t(W)] ≡ \max_{W' ∈ [0, W]} u(W − W') + βV_t(W')    (9)

Note that the value function on the right-hand-side of (9) and on the left-hand-side are the same function V_t, but evaluated at a different cake size: W versus W'. For a reason that will become apparent in a moment, define the resulting function from the C operator as V_{t−1}.

V_{t−1}(W) ≡ C[V_t(W)]    (10)

The value function that results from the C operation, V_{t−1}, is not necessarily the same as the value function that the system began with, V_t. The solution, then, is the fixed point in V.

C[V_t(W)] = V_{t−1}(W) = V_t(W) = V(W)

Definition 1 (Contraction mapping). Let (S, ρ) be a metric space and C : S → S be a function mapping S into itself. C is a contraction mapping (with modulus β) if for some β ∈ (0, 1), ρ(Cx, Cy) ≤ βρ(x, y) for all x, y ∈ S.

The operator C(·) is called a contraction mapping if applying it over and over again to an arbitrary value function V_t converges to a fixed point. One way to characterize a contraction mapping is:

\lim_{s→∞} C^s[V_t(W)] = V(W)

[4] I use a subscript t here to denote the iteration number. I have the contraction operator C(·) advance the iteration number backward to t−1 to maintain the backwards induction analogy from the previous exercises of solving for value functions from some terminal period T.

Theorem 1 (Contraction mapping theorem). If (S, ρ) is a complete metric space and C : S → S is a contraction mapping with modulus β, then
• C has exactly one fixed point v in S, and
• for any v_0 ∈ S, ρ(C^n v_0, v) ≤ β^n ρ(v_0, v) for n = 0, 1, 2, ....

A set of sufficient conditions for C(·) to be a contraction mapping are due to Blackwell (1965) and are that C be monotonic and that it have the property of discounting.[5] Adda and Cooper (2003) outline one set of sufficient conditions that ensure our problem is a contraction mapping and therefore has a unique solution. These conditions are that the period utility function u(·) must be real-valued, continuous, and bounded, that β ∈ (0, 1), and that the constraint set W' ∈ [0, W] be nonempty, compact-valued, and continuous.

Theorem 2 (Blackwell's sufficient conditions for a contraction). Let X ⊆ R^l, and let B(X) be a space of bounded functions f : X → R, with the sup norm. Let C : B(X) → B(X) be an operator satisfying the following two conditions.
• (monotonicity) f, g ∈ B(X) and f(x) ≤ g(x) for all x ∈ X implies (Cf)(x) ≤ (Cg)(x) for all x ∈ X;
• (discounting) there exists some β ∈ (0, 1) such that

[C(f + a)](x) ≤ (Cf)(x) + βa   for all f ∈ B(X), a ≥ 0, x ∈ X.

[Here (f + a)(x) is the function defined by (f + a)(x) = f(x) + a.] Then C is a contraction with modulus β.

Before moving on to the exercises in which you will solve for the value function
V and the policy function ψ by utilizing the contraction mapping theorem, it is
important to communicate one last important reason why it works. In the recursive
finite horizon problem from Section 3, you could always solve for V and ψ. The
Bellman equation was a second-order difference equation in Wt . This is easily seen in
the period T − 2 problem from Exercise 6.
VT −2 (WT −2 ) = max u WT −2 − WT −1 + βu WT −1 − WT + β 2 u WT
  
(11)
WT −1

This is a difference equation with cake sizes in three different periods: W_{T−2}, W_{T−1}, and W_T. We could solve for the value function V and policy function ψ in this second-order difference equation because we had an initial condition W_{T−2}, an Euler equation, and an ending condition W_{T+1} = 0.[6]
[5] A more formal definition of a contraction mapping and the corresponding sufficient conditions is given in Stokey and Lucas (1989, pp. 49-55). The conditions for a contraction mapping are due to Blackwell (1965) and are often called "Blackwell's sufficient conditions for a contraction."
[6] For a first-order difference equation, you just need an initial condition or an ending condition and an Euler equation.

But how do we solve for these objects V and ψ in the infinite horizon in which we
do not have an ending condition? The answer is that we do have an ending condition
in the infinite horizon problem, and it is called the transversality condition.

\lim_{t→∞} β^t E_0[W_t u'(W_t)] = 0    (12)

The transversality condition simply states that the present value of the state W_t goes to zero far off into the future (an ending condition). This ensures that people do not save cake forever, which would mean consuming nothing in every period. In value function iteration, this is analogous to starting with V_0 = 0. All recursive (infinite horizon) problems have a transversality condition in the background.
Exercise 9. Let the maximum size of the cake be W_max = 1. Approximate the continuum of possible cake sizes by a column vector called W that ranges from a number very close to 0 up to 1.[7] Let the number of possible cake values be N = 100 so that the increment between each value is 0.01. So W_min = 0.01.
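A minimal sketch of this grid in Python (assuming NumPy):

```python
import numpy as np

# Grid of N = 100 possible cake sizes from W_min = 0.01 to W_max = 1.0.
N, W_min, W_max = 100, 0.01, 1.0
W = np.linspace(W_min, W_max, N)   # vector of cake sizes, increment 0.01
```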
Exercise 10. As in the previous problem sets, assume that period T is the final period of an individual's life. So V_{T+1}(W'), the value of entering period T+1 with a cake of size W', is a column vector of zeros of length N, where V_{T+1} : R → R and W, W' ∈ [W_min, W_max]. Assume that the period utility function has the logarithmic functional form u(c) = log(c), and that the discount factor is β = 0.9. What is the resulting policy function W' = ψ_T(W) and value function V_T(W) when V_T is defined as the contraction in equations (9) and (10)?[8] See Appendix A-1 for a fast way to compute this exercise.
Exercise 11. Generate a norm δ_T = ‖V_T(W) − V_{T+1}(W')‖ that measures the distance between the two value functions. Define the distance metric as the sum of the squared differences,

δ_T ≡ ‖V_T(W) − V_{T+1}(W')‖ = (V_T − V_{T+1})' (V_T − V_{T+1})    (13)

where (V_T − V_{T+1})' is the transpose of the difference of the two vectors. Defined in this way, δ_T ∈ R_+.
Exercise 12. Take the resulting V_T from Exercise 10, and perform the same contraction on it to generate V_{T−1} and ψ_{T−1}. That is, generate

V_{T−1}(W) = C[V_T(W)] = \max_{W' ∈ [0, W]} u(W − W') + βV_T(W')

and the accompanying policy function W' = ψ_{T−1}(W). Calculate the accompanying distance measure δ_{T−1} using the formula from (13) with the updated period subscripts. Compare δ_{T−1} with δ_T from Exercise 11.
[7] We use a number close to zero rather than zero because the log utility function that we will use in the rest of the problems is undefined at zero.
[8] HINT: The policy function should be a vector of length N of optimal future values of the cake W' given the current value of the cake W, and V_T should be an N-length vector representing the value of entering a period with cake size W.

Exercise 13. Repeat Exercise 12 and generate V_{T−2} and ψ_{T−2} by performing the contraction on V_{T−1}. Compare δ_{T−2} to δ_{T−1} and δ_T.

Exercise 14. Write a while loop in Python that performs the contraction operation from Exercises 10, 12, and 13 iteratively until the distance measure is very small, δ_{T−s} < 10^{-9}. How many iterations did it take (s + 1)? Congratulations, you've just completed your first solution by value function iteration. The distance measure δ_{T−s} being arbitrarily close to zero means you have converged to the fixed point V_t = V_{t−1} = V. (For fun, you can show that the policy function converges to the same function regardless of your initial guess for the value function.)
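A minimal sketch of such a loop, assuming the grid from Exercise 9, the log utility and β = 0.9 from Exercise 10, and a large negative penalty on infeasible choices W' > W in the spirit of Appendix A-1:

```python
import numpy as np

# A minimal sketch: iterate the contraction C until successive value functions
# are closer than 1e-9 under the distance metric from (13).
N, beta = 100, 0.9
W = np.linspace(0.01, 1.0, N)                  # cake grid (state and control)
C = W[:, None] - W[None, :]                    # consumption W - W' for each pair
U = np.where(C > 0, np.log(np.maximum(C, 1e-10)), -1e10)  # block infeasible W' >= W

V = np.zeros(N)                                # initial guess V_{T+1} = 0
dist, iters = 1.0, 0
while dist > 1e-9:
    V_new = (U + beta * V[None, :]).max(axis=1)   # one application of C
    dist = ((V_new - V) ** 2).sum()               # distance metric from (13)
    V = V_new
    iters += 1

policy = W[(U + beta * V[None, :]).argmax(axis=1)]  # converged policy W' = psi(W)
print(iters, "iterations")
```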

Exercise 15. Using the matplotlib library, plot the policy function for the converged problem W' = ψ_{T−s}(W) = ψ(W), which gives the value of the cake tomorrow (y-axis) as a function of the cake today (x-axis).

5 Value Function Iteration versus Policy Function Iteration
Exercises 9 through 15 took you through the steps of the value function iteration
(VFI) solution technique to dynamic programming problems. As you can see, VFI is
a very direct application of the contraction mapping theorem described in Theorem
1. A similar approach, which is often faster than VFI, is policy function iteration
(PFI).
[See Jeff’s policy function iteration lab.]

6 Infinite Horizon, Stochastic, i.i.d.


Now assume that the individual's preferences fluctuate each period according to some i.i.d. shock ε. The Bellman equation can easily be rewritten in the following way to incorporate the uncertainty,

V(W, ε) = \max_{W' ∈ [0, W]} εu(W − W') + βE_{ε'}[V(W', ε')]   where ε ∼ N(µ, σ²)

where E is the unconditional expectations operator over all values in the support of ε, µ is the mean of ε, and σ² is the variance of ε.

Exercise 16. Approximate the support of ε by generating a row vector of possible values for ε. Let the maximum value be three standard deviations above the mean, ε_max = µ + 3σ, and let the minimum value be three standard deviations below, ε_min = µ − 3σ, where σ² = 0.25 and µ = 4σ.[9] And let there be M = 7 equally spaced points in the support (make M an odd number so the mean is included in the support), so that ε is an M-length row vector. Generate the probability distribution Γ(ε) such that ε ∼ N(µ, σ²). Thus, Γ(ε) represents the probability of a particular realization Pr(ε = ε_m). (Hint: This is essentially the Newton-Cotes method of approximating an integral that you used in your numerical integration labs.)
[9] I set µ = 4σ so that the entire support of ε is positive. This is important because the utility function u(c) = log(c) can be negative, and a negative shock would give a premium to negative utility.
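A minimal sketch of this discretization, assuming SciPy's normal density is used to build the Newton-Cotes-style weights:

```python
import numpy as np
from scipy.stats import norm

# Support of the taste shock: M = 7 equally spaced points within 3 std. deviations.
sigma = np.sqrt(0.25)
mu = 4 * sigma
M = 7
eps = np.linspace(mu - 3 * sigma, mu + 3 * sigma, M)

# Newton-Cotes-style weights: normal density at each node, normalized to sum to one.
Gamma = norm.pdf(eps, loc=mu, scale=sigma)
Gamma = Gamma / Gamma.sum()     # Gamma[m] approximates Pr(eps = eps[m])
```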
Exercise 17. As in Exercise 9 from Section 4, assume that the vector of possible cake sizes is W with N = 100 equally spaced values between 0.01 and 1. As in Exercise 10 from Section 4, assume a value function V_{T+1}(W', ε') for entering the period after the last period of life with cake size W' and taste shock realization ε'. This value function will be a matrix with each row corresponding to a different value of W' and each column corresponding to a different value of ε', so each element in the matrix is V_{T+1}(W'_n, ε'_m). Let your initial guess for the value function V_{T+1} be a matrix of zeros with N rows and M columns. Assume that the period utility function has the logarithmic functional form u(c) = log(c), and that the discount factor is β = 0.9. What is the resulting policy function W' = ψ_T(W, ε) and value function V_T(W, ε) when V_T is defined as in (14) below? See Appendix A-2 for a fast way to compute this exercise.

V_{t−1}(W, ε) ≡ C[V_t(W, ε)] ≡ \max_{W' ∈ [0, W]} εu(W − W') + βE_{ε'}[V_t(W', ε')]    (14)

Exercise 18. Generate a norm δ_T = ‖V_T(W, ε) − V_{T+1}(W', ε')‖ that measures the distance between the two value functions. Define the distance metric as the sum of the squares of each corresponding element in the two value functions,

δ_T ≡ ‖V_T(W, ε) − V_{T+1}(W', ε')‖ ≡ vec(V_T − V_{T+1})' vec(V_T − V_{T+1})    (15)

where vec(V_T − V_{T+1})' is the transpose of the column-vectorized version of V_T − V_{T+1}. Defined in this way, δ_T ∈ R_+.
Exercise 19. Take the resulting V_T from Exercise 17, and perform the same contraction on it to generate V_{T−1} and ψ_{T−1}. That is, generate

V_{T−1}(W, ε) = C[V_T(W, ε)] = \max_{W' ∈ [0, W]} εu(W − W') + βE_{ε'}[V_T(W', ε')]

and the accompanying policy function W' = ψ_{T−1}(W, ε). Calculate the accompanying distance measure δ_{T−1} using the formula from (15) with the updated period subscripts. Compare δ_{T−1} with δ_T from Exercise 18.
Exercise 20. Repeat Exercise 19 and generate V_{T−2} and ψ_{T−2} by performing the contraction on V_{T−1}. Compare δ_{T−2} to δ_{T−1} and δ_T.
Exercise 21. Write a while loop in Python that performs the contraction operation from Exercises 17, 19, and 20 iteratively until the distance measure is very small, δ_{T−s} < 10^{-9}. How many iterations did it take (s + 1)? Congratulations, you've just completed your first solution to a stochastic problem by value function iteration. The distance measure δ_{T−s} being arbitrarily close to zero means you have converged to the fixed point V_t = V_{t−1} = V.

Exercise 22. Use Python's matplotlib library to make a 3-D surface plot of the policy function for the converged problem W' = ψ_{T−s}(W, ε) = ψ(W, ε), which gives the value of the cake tomorrow (vertical axis) as a function of the cake today (x1-axis) and the taste shock today (x2-axis).

7 Infinite Horizon, Stochastic, AR(1)


Now assume that the taste shock is persistent. Let the persistence be characterized
by the following AR(1) process.

ε' = (1 − ρ)µ + ρε + ν'   where ρ ∈ (0, 1) and ν ∼ N(0, σ²)    (16)

Then the Bellman equation becomes the following, in which the only change from
the problems in Section 6 is that the expectations operator is now a conditional
expectation because of the persistent shock process,

V(W, ε) = \max_{W' ∈ [0, W]} εu(W − W') + βE_{ε'|ε}[V(W', ε')]

where ε' is distributed according to (16). Let Γ(ε'|ε) = Pr(ε'_j | ε_i), where ε'_j is the shock in the next period and ε_i is the value of the shock in the current period.

Exercise 23. Use the method described by Tauchen and Hussey (1991) to approximate the AR(1) process for ε from (16) as a first-order Markov process. The Python function file "tauchenhussey.py" will produce a vector of length M for the support of ε and an M × M transition matrix Γ(ε'|ε) = Pr(ε'_j | ε_i), where each element in row i and column j represents the probability of ε'_j tomorrow given ε_i today. As inputs, let M = 7, the mean of the process µ = 4σ, ρ = 1/2, σ = √(σ²) = 1/2, and

basesigma = (0.5 + ρ/4)σ + (0.5 − ρ/4) · σ/√(1 − ρ²).

Exercise 24. As in Exercise 17 from Section 6, assume that the vector of possible cake sizes is W with N = 100 equally spaced values between 0.01 and 1, and assume a value function V_{T+1}(W', ε') for entering the period after the last period of life with cake size W' and taste shock realization ε'. This value function will be a matrix with each row corresponding to a different value of W' and each column corresponding to a different value of ε', so each element in the matrix is V_{T+1}(W'_n, ε'_m). Let your initial guess for the value function V_{T+1} be a matrix of zeros with N rows and M columns. Assume that the period utility function has the logarithmic functional form u(c) = log(c), and that the discount factor is β = 0.9. What is the resulting policy function W' = ψ_T(W, ε) and value function V_T(W, ε) when V_T is defined as in (17) below? See Appendix A-3 for a fast way to compute this exercise.

V_{t−1}(W, ε) ≡ C[V_t(W, ε)] ≡ \max_{W' ∈ [0, W]} εu(W − W') + βE_{ε'|ε}[V_t(W', ε')]    (17)

Exercise 25. Generate a norm δ_T = ‖V_T(W, ε) − V_{T+1}(W', ε')‖ that measures the distance between the two value functions. Define the distance metric as the sum of the squares of each corresponding element in the two value functions,

δ_T ≡ ‖V_T(W, ε) − V_{T+1}(W', ε')‖ ≡ vec(V_T − V_{T+1})' vec(V_T − V_{T+1})    (18)

where vec(V_T − V_{T+1})' is the transpose of the column-vectorized version of V_T − V_{T+1}. Defined in this way, δ_T ∈ R_+.
Exercise 26. Take the resulting V_T from Exercise 24, and perform the same contraction on it to generate V_{T−1} and ψ_{T−1}. That is, generate

V_{T−1}(W, ε) = C[V_T(W, ε)] = \max_{W' ∈ [0, W]} εu(W − W') + βE_{ε'|ε}[V_T(W', ε')]

and the accompanying policy function W' = ψ_{T−1}(W, ε). Calculate the accompanying distance measure δ_{T−1} using the formula from (18) with the updated period subscripts. Compare δ_{T−1} with δ_T from Exercise 25.
Exercise 27. Repeat Exercise 26 and generate V_{T−2} and ψ_{T−2} by performing the contraction on V_{T−1}. Compare δ_{T−2} to δ_{T−1} and δ_T.
Exercise 28. Write a while loop in Python that performs the contraction operation from Exercises 24, 26, and 27 iteratively until the distance measure is very small, δ_{T−s} < 10^{-9}. How many iterations did it take (s + 1)? Congratulations, you've just completed your first solution to a stochastic AR(1) problem by value function iteration. The distance measure δ_{T−s} being arbitrarily close to zero means you have converged to the fixed point V_t = V_{t−1} = V.
Exercise 29. Make a 3-D surface plot of the policy function for the converged problem W' = ψ_{T−s}(W, ε) = ψ(W, ε), which gives the value of the cake tomorrow (vertical axis) as a function of the cake today (x1-axis) and the taste shock today (x2-axis).

8 Discrete Choice (Threshold) Problems


One powerful application of dynamic programming that illustrates its versatility as
a dynamic solution method is to models that have both an extensive and intensive
margin. Adda and Cooper (2003) refer to these models as discrete choice problems
or optimal stopping problems. They are also sometimes called threshold problems
because the discrete choice policy function is determined by the state variable being
above or below certain threshold values. Examples include models of employment
that involve both the choice of whether to work and how much to work, models of
firm entry and exit that involve the choice of both whether to produce and how much
to produce, and models of marriage that involve the choice of whether to date (get
married or keep dating) and how much to date.
In this problem set, we follow a simple version of a standard job search model.[10] Assume that workers are infinitely lived. Let the value of entering a period with most recent wage w, current job offer wage w', and employment status s be given by the following value function,

V(w, w', s) = V^E(w)        if s = E
            = V^U(w, w')    if s = U        (19)

[10] See Rogerson et al. (2005) and Adda and Cooper (2003, pp. 257-263).

where employment status s ∈ {E, U} is either employed or unemployed.


If an individual's job status is employed (s = E) in a given period, the net present value of expected utility is the period utility of consumption plus the discounted expected value of entering the next period with wage w, job offer wage w'', and employment status s'.

V^E(w) = u(w) + βE_{w'',s'}[V(w, w'', s')]    (20)

The period utility function is u(w), and the argument w implies a simplified budget constraint c = w that abstracts from any ability to borrow or save. The discount factor is β, the expectations operator E_{w'',s'} is over the job offer wage and employment status in the next period, and next period's value function is simply (19) with the future value of the employment status s'.
The joint probability distribution over w'' and s' is characterized in the following simple way. If the worker stays employed in the next period (s' = E), then next period's wage equals the current period's wage. If the worker becomes unemployed in the next period (s' = U), then the worker's unemployment benefits will be a percentage of his current wage, αw. Any worker who is unemployed will receive one wage offer w' per period, which the worker would earn in the following period if accepted, drawn from the cumulative distribution function F(w') or probability density function f(w'), which is independent of the worker's previous wage (for simplicity). Lastly, let γ represent the probability that an employed worker becomes unemployed in the next period. So (20) can be rewritten in the following way.

V^E(w) = u(w) + β[(1 − γ)V^E(w) + γE_{w''}[V^U(w, w'')]]    (21)

The value of being unemployed in a given period is a function of both the wage at the most recent job w and the wage of the current job offer w',

V^U(w, w') = u(αw) + β \max_{s' ∈ {E,U}} { V^E(w'), E_{w''}[V^U(w, w'')] }    (22)
s ∈{E,U }

where α ∈ (0, 1) is the fraction of the worker's previous wage paid in unemployment insurance benefits. It is only in the unemployed state s = U that the worker makes a decision. Once the job offer w' is received, drawn from the independent cumulative distribution function F(w') or probability density function f(w'), the worker can choose whether to accept or reject the offer. The expectation in (22) is, therefore, not over w' but over the possible job offers in the following period w'' if the worker chooses to reject the current job offer (s' = U).
The policy function for the decision of the unemployed worker whether to accept a job (s' = E) or reject a job (s' = U) will be a function of both the most recent wage w and the current job offer: s' = ψ(w, w'). These discrete choice problems are often called threshold problems because the policy choice depends on whether the state variable is greater than or less than some threshold level. In the labor search model, the threshold level is called the "reservation wage" w'_R. The reservation wage w'_R is defined as the wage offer such that the worker is indifferent between accepting the job (s' = E) and staying unemployed (s' = U).

w'_R ≡ {w' : V^E(w') = E_{w''}[V^U(w, w'')]}    (23)

Note that the reservation wage w'_R is a function of the wage at the most recent job w. The policy function then takes the form of accepting the job if w' ≥ w'_R or rejecting the job offer and staying unemployed if w' < w'_R.

s' = ψ(w, w') = E   if w' ≥ w'_R
              = U   if w' < w'_R        (24)

In summary, the labor search discrete choice problem is characterized by the value functions (19), (21), and (22), the reservation wage (23), and the policy function (24). Because wage offers are distributed according to the cdf F(w') and because the policy function takes the form of (24), the probability that the unemployed worker receives a wage offer that he will reject is F(w'_R) and the probability that he receives a wage offer that he will accept is 1 − F(w'_R). Just like the continuous choice cake eating problems in problem sets 1 through 5, this problem can be solved by value function iteration, which is similar to starting at the "final" period of an individual's life and solving for an infinite series of solutions by backward induction.

1. Assume that workers only live a finite number of periods T and assume that the utility of consumption is log utility u(c) = log(c). The value of entering the last period of life with most recent wage w and employment status s is the following.

   V_T(w, w', s) = V^E_T(w) = log(w)         if s = E
                 = V^U_T(w, w') = log(αw)    if s = U

   Solve analytically for the value of entering the second-to-last period of life with most recent wage, current job offer, and employment status V_{T−1}(w, w', s) (which includes V^E_{T−1}(w) and V^U_{T−1}(w, w')), the reservation wage w'_{R,T−1}, and the policy function s' = ψ_{T−1}(w, w').
2. Given the solutions for V_{T−1}, w'_{R,T−1}, and s' = ψ_{T−1}(w, w') from the previous exercise, solve analytically for the value of entering the third-to-last period of life with most recent wage, current job offer, and employment status V_{T−2}(w, w', s) (which includes V^E_{T−2}(w) and V^U_{T−2}(w, w')), the reservation wage w'_{R,T−2}, and the policy function s' = ψ_{T−2}(w, w'). [NOTE: This operation of solving for the new value function V_t(w, s) is a contraction.]

The value function iteration solution method for the equilibrium in the labor search problem is analogous to the value function iteration we did in problem sets 3, 4, and 5. The only difference is that two value functions must converge to a fixed point in this problem instead of just one value function converging in the previous problems. A minimal sketch of this two-value-function iteration appears after the numbered exercises below.
For the following exercises, you will use Python. Assume that the probability of becoming unemployed in a given period is γ = 0.10, the fraction of wages paid in unemployment benefits is α = 0.5, and the discount factor is β = 0.9. Assume that wage offers to unemployed workers are distributed lognormally, w' ∼ LogN(µ, σ), where m = 20 is the mean wage, v = 400 is the variance of the wage, µ is the mean of log(w'), and σ is the standard deviation of log(w'). Denote the cdf of the lognormal distribution as F(w') and the pdf of the distribution as f(w').
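A minimal sketch of a wage grid and a discretized offer density under these assumptions (N = 500 points and the m = 20, v = 400 values above; this hand-rolled discretization is only a stand-in for whatever routine you actually use):

```python
import numpy as np
from scipy.stats import lognorm

# Assumed illustrative discretization of the lognormal offer distribution.
m, v = 20.0, 400.0                                # mean and variance of the offer wage
mu = np.log(m ** 2 / np.sqrt(v + m ** 2))         # mean of log(w')
sigma = np.sqrt(np.log(v / m ** 2 + 1))           # std. dev. of log(w')

N = 500
w = np.linspace(0.2, 100.0, N)                    # grid of possible wages
f = lognorm.pdf(w, s=sigma, scale=np.exp(mu))     # lognormal pdf on the grid
f = f / f.sum()                                   # normalize so f[n] ~ Pr(w' = w_n)
```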

[The following exercises require Python.]

3. Approximate the support of w ∈ (0, ∞) by generating a column vector of possible values for w. Let the maximum value be w_max = 100, let the minimum value be w_min = 0.2, and let the number of equally spaced points in the vector be N = 500 (an increment of 0.2). Let the wage of a job offer in any period be lognormally distributed, w' ∼ LogN(µ, σ), where µ = E[log(w')] and σ = √(var[log(w')]). So if the mean job offer wage w' is m = 20 and the variance of job offer wages w' is v = 200, then the corresponding mean µ and standard deviation σ for the lognormal distribution are µ = log(m²/√(v + m²)) and σ = √(log((v/m²) + 1)). Generate the discrete approximation of the lognormal probability density function f(w') such that w' ∼ LogN(µ, σ). Thus, f(w') represents the probability of a particular realization Pr(w' = w_n). (Hint: This problem is very easy if you use the function discretelognorm in the function file discretelognorm.py, available upon request. You're welcome.)

4. Write Python code that solves for the equilibrium optimal policy function s' = ψ(w, w'), the reservation wage w'_R as a function of the current wage w, and the value functions V^E(w) and V^U(w, w') using value function iteration.

5. Plot the equilibrium reservation wage w'_R of the converged problem as a function of the current wage w, with the current wage on the x-axis and the reservation wage w'_R on the y-axis. This is the most common way to plot discrete choice policy functions. The reservation wage represents the wage that makes the unemployed worker indifferent between taking a job offer and rejecting it. So any wage above the reservation wage line represents s' = E and any wage below the reservation wage line represents s' = U.
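Here is the minimal sketch of the two-value-function iteration referenced above. The grid, parameters, and stand-in offer distribution repeat the assumptions of the earlier sketch; the loop structure and the reservation-wage extraction are one possible arrangement, not the required one:

```python
import numpy as np
from scipy.stats import lognorm

# Sketch of the job search VFI (assumed grid; f is a stand-in discretization of
# the lognormal offer density, normalized to sum to one over the grid).
gamma, alpha, beta = 0.10, 0.5, 0.9
m, v = 20.0, 400.0
mu = np.log(m ** 2 / np.sqrt(v + m ** 2))
sigma = np.sqrt(np.log(v / m ** 2 + 1))

N = 500
w = np.linspace(0.2, 100.0, N)                      # wage grid
f = lognorm.pdf(w, s=sigma, scale=np.exp(mu))
f = f / f.sum()                                     # Pr(w' = w_n)

VE = np.zeros(N)                                    # value of being employed at w
VU = np.zeros((N, N))                               # rows: last wage w, cols: offer w'
dist = 1.0
while dist > 1e-9:
    EVU = VU @ f                                    # E_{w''}[V^U(w, w'')] for each w
    VE_new = np.log(w) + beta * ((1 - gamma) * VE + gamma * EVU)      # eq. (21)
    VU_new = np.log(alpha * w)[:, None] \
             + beta * np.maximum(VE[None, :], EVU[:, None])           # eq. (22)
    dist = ((VE_new - VE) ** 2).sum() + ((VU_new - VU) ** 2).sum()
    VE, VU = VE_new, VU_new

# Reservation wage: for each most-recent wage w, the smallest offer the worker accepts.
EVU = VU @ f
accept = VE[None, :] >= EVU[:, None]                # accept offer w' (cols) given w (rows)
w_R = np.where(accept.any(axis=1), w[np.argmax(accept, axis=1)], np.inf)
```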

References

Adda, Jérôme and Russell Cooper, Dynamic Economics: Quantitative Methods and Applications, The MIT Press: Cambridge, Massachusetts, 2003.

Blackwell, David, "Discounted Dynamic Programming," Annals of Mathematical Statistics, February 1965, 36 (1), 226–235.

Dreyfus, Stuart, "Richard Bellman on the Birth of Dynamic Programming," Operations Research, January-February 2002, 50 (1), 48–51.

Rogerson, Richard, Robert Shimer, and Randall Wright, "Search-Theoretic Models of the Labor Market: A Survey," Journal of Economic Literature, December 2005, 43 (4), 959–988.

Stokey, Nancy L. and Robert E. Lucas Jr., Recursive Methods in Economic Dynamics, Harvard University Press, 1989.

Tauchen, George and Robert Hussey, "Quadrature-based Methods for Obtaining Approximate Solutions to Nonlinear Asset Pricing Models," Econometrica, March 1991, 59 (2), 371–396.

APPENDIX

A-1 Computation of the value function and policy function using discretized state and control space with perfect foresight
In Exercise 10 from Section 4, the finite horizon Bellman equation is:

V_T(W) = \max_{W' ∈ [W_min, W_max]} u(W − W') + βV_{T+1}(W')    (A.1.1)

A simple approach to take in calculating these kinds of problems is to put the problem into the computer without the \max_{W'} operator, using the entire set of utility values and value function values for each possible point in the state space W and each possible point in the control space W'. In this case, I put values associated with W on the rows (1st dimension) and values associated with W' in the columns (2nd dimension). The computational geometry is shown below.
[Figure 1: Computational geometry of perfect foresight value function iteration]

It is straightforward to calculate u(W − W') for every value of W and W'. You make sure the constraint c ≥ 0 is satisfied by replacing the negative entries in W − W' with a number that is very close to zero (e.g., 10^{-10}), because negative entries do not satisfy the constraint that W' ≤ W. I also then replace any entries in the value function V_{T+1}(W') that correspond to negative W − W' with a very big negative number (e.g., -10^{10}).
The V_{T+1}(W') function is a column vector (similar to V_T). But in (A.1.1) it is only a function of W', which is measured along the column dimension (2nd dimension). So I simply take the transpose of V_{T+1} so that it is a row vector and then copy it down N rows. This copying represents the fact that V_{T+1} is not a function of W.

As mentioned in the previous paragraph, you'll need to replace all the entries of the V_{T+1} matrix that correspond to values for which W' > W with a very large negative number (e.g., -10^{10} or even -1000) so that those values of W' will not be picked in the maximization.
Now you just add your u(W − W') matrix to your N × N matrix βV_{T+1}, and you have an N × N matrix V_T(W, W') representing the period-T value function for any W and any W'. The last step is to maximize over the W' dimension (2nd dimension). The policy function ψ_T(W) will be an N × 1 column vector that represents the W' value that maximizes the value function for a given W.
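A minimal sketch of this matrix construction in Python, assuming NumPy and the grid from Exercise 9:

```python
import numpy as np

# A minimal sketch of the geometry described above (assumed grid from Exercise 9).
N, beta = 100, 0.9
W = np.linspace(0.01, 1.0, N)          # states W on the rows, controls W' on the columns
C = W[:, None] - W[None, :]            # consumption W - W' for every (row, column) pair

U = np.log(np.where(C > 0, C, 1e-10))  # replace infeasible entries with ~0 consumption
V_next = np.zeros(N)                   # V_{T+1}(W'), a vector over W'

EV = np.tile(V_next[None, :], (N, 1))  # transpose/copy V_{T+1} down the N rows
EV[C < 0] = -1e10                      # entries with W' > W can never be chosen

V_T = (U + beta * EV).max(axis=1)            # maximize over the W' (column) dimension
psi_T = W[(U + beta * EV).argmax(axis=1)]    # policy: optimal W' for each W
```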

A-2 Computation of the value function and policy function using discretized state and control space with i.i.d. shock
In Exercise 17 from Section 6, the finite horizon Bellman equation is:

V_T(W, ε) = \max_{W' ∈ [W_min, W]} εu(W − W') + βE_{ε'}[V_{T+1}(W', ε')]    (A.2.1)

The approach I take in calculating these kinds of problems is to put the problem into the computer without the \max_{W'} operator, using the entire set of utility values and value function values for each possible point in the state space (W, ε) and each possible point in the control space W'. In this case, I put values associated with W on the rows (1st dimension), values associated with ε on the columns (2nd dimension), and values associated with W' in the depth (3rd dimension). The computational geometry is shown below.

[Figure 2: Computational geometry of stochastic i.i.d. value function iteration]

It is straightforward to calculate εu(W − W') for every value of W, ε, and W'. I simplify this process by replacing the negative entries in W − W' with a number that is very close to zero (e.g., 10^{-10}), because negative entries do not satisfy the constraint that W' ≤ W. I also then replace any entries in the value function V_{T+1}(W', ε') that correspond to negative W − W' with a very big negative number (e.g., -10^{10}).
The V_{T+1}(W', ε') function is an N × M matrix (similar to V_T). But in (A.2.1) it is only a function of W', because the expectations operator integrates out the ε' dimension, and W' is measured along the depth dimension (3rd dimension). So I simply reshape the vector E_{ε'}[V_{T+1}] so that it is a (1 × 1 × N)-dimensional array and then copy it down N rows and M columns. This copying represents the fact that V_{T+1} is not a function of W or ε. As in the previous paragraph, you'll need to replace all the entries of the V_{T+1} array that correspond to values for which W' > W with a very large negative number (e.g., -10^{10} or even -100) so that those values of W' will not be picked in the maximization.
Now you just add your εu(W − W') array to your N × M × N array βE_{ε'}[V_{T+1}], and you have an N × M × N array V_T(W, ε, W') representing the period-T value function for any W, ε, and W'. The last step is to maximize over the W' dimension. The policy function ψ_T(W, ε) will be an N × M matrix that represents the W' value that maximizes the value function for a given W and ε.

Here is a summary of the process.

1. Take the expectation E_{ε'}[V_{T+1}(W', ε')] by integrating out the ε' dimension of the value function using the probability distribution of the taste shock Γ(ε) from Exercise 16. Do this even though the answer is trivially a vector of zeros for the case in Exercise 17. It will not be trivial in future cases. So E_{ε'}[V_{T+1}(W', ε')] becomes a column vector of length N that is only a function of W'.

2. Then change the shape of E_{ε'}[V_{T+1}(W', ε')] so that it is a (1 × 1 × N)-dimensional array.

3. Then copy the reshaped E_{ε'}[V_{T+1}(W', ε')] to N rows and M columns so that you have an array of dimension N × M × N that is only a function of W', which is represented in the third dimension of the array.

4. Then create an array that represents all the possible values of εu(W − W') in which the (row, column, depth) dimensions correspond to the values of (W, ε, W').

5. Lastly, the new value function is obtained by adding the two three-dimensional arrays together (multiplying the second array by the discount factor) and maximizing over the third dimension W'. An argmax along the 3rd dimension of the array returns a matrix of index numbers that represent the optimal value of W', from which you can create the policy function matrix ψ(W, ε).
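A minimal sketch of these five steps, assuming NumPy/SciPy and the grids from Exercises 9 and 16:

```python
import numpy as np
from scipy.stats import norm

# A minimal sketch of the array layout summarized above (assumed grids):
# rows index W, columns index eps, depth indexes W'.
N, M, beta = 100, 7, 0.9
W = np.linspace(0.01, 1.0, N)
sigma = np.sqrt(0.25)
mu = 4 * sigma
eps = np.linspace(mu - 3 * sigma, mu + 3 * sigma, M)
Gamma = norm.pdf(eps, mu, sigma)
Gamma = Gamma / Gamma.sum()

V_next = np.zeros((N, M))                       # V_{T+1}(W', eps'), N x M

# Step 1: integrate out eps' -> a length-N vector in W' only.
EV = V_next @ Gamma
# Steps 2-3: reshape to 1 x 1 x N and broadcast over the N x M state space.
EV = np.broadcast_to(EV.reshape(1, 1, N), (N, M, N)).copy()

# Step 4: eps * u(W - W') with (row, column, depth) = (W, eps, W').
C = W[:, None, None] - W[None, None, :]          # W - W' for every (W, W') pair
U = eps[None, :, None] * np.log(np.where(C > 0, C, 1e-10))
EV[np.broadcast_to(C < 0, (N, M, N))] = -1e10    # infeasible W' > W never chosen

# Step 5: add, discount, and take max/argmax over the W' (3rd) dimension.
V_T = (U + beta * EV).max(axis=2)                # N x M value function V_T(W, eps)
psi_T = W[(U + beta * EV).argmax(axis=2)]        # N x M policy W' = psi_T(W, eps)
```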

A-3 Computation of the value function and policy function using discretized state and control space with AR(1) shock
In Exercise 24 from Section 7, the finite horizon Bellman equation is:

V_T(W, ε) = \max_{W' ∈ [W_min, W]} εu(W − W') + βE_{ε'|ε}[V_{T+1}(W', ε')]    (A.3.1)

A simple approach to take in calculating these kinds of problems is to put the problem into the computer without the \max_{W'} operator, using the entire set of utility values and value function values for each possible point in the state space (W, ε) and each possible point in the control space W'. In this case, I put values associated with W on the rows (1st dimension), values associated with ε on the columns (2nd dimension), and values associated with W' in the depth (3rd dimension). The computational geometry is shown below.

[Figure 3: Computational geometry of stochastic AR(1) value function iteration]

It is straightforward to calculate εu(W − W') for every value of W, ε, and W'. I simplify this process by replacing the negative entries in W − W' with a number that is very close to zero (e.g., 10^{-10}), because negative entries do not satisfy the constraint that W' ≤ W. I also then replace any entries in the value function V_{T+1}(W', ε') that correspond to negative W − W' with a very big negative number (e.g., -10^{10}).
The V_{T+1}(W', ε') function is an N × M matrix (similar to V_T). But in (A.3.1) it is rather a function of W' and ε because of the conditional expectations operator. The ε' dimension can be integrated out by matrix multiplying V_{T+1}(W', ε') Γ(ε'|ε)', where values of W' correspond to the rows (1st dimension) and values of ε correspond to the columns (2nd dimension). The matrix must then be reshaped into a (1 × M × N)-array so that values of W' correspond to the depth of the array (3rd dimension). If the original N × M matrix were called "VTp1", it could be reshaped in MatLab by writing the following code: "EVTp1array = reshape(VTp1', [1, M, N])", where you make sure to use the transpose "VTp1'" in the reshape command.
Then copy the (1 × M × N)-array E_{ε'|ε}[V_{T+1}(W', ε')] down N rows. This copying can easily be done using the "repmat" command in MatLab and represents the fact that V_{T+1} is not a function of W. As with the εu(W − W') array, you will need to replace all the entries of the V_{T+1} array that correspond to values for which W' ≥ W with a very large negative number (e.g., -10^{10} or even -100) so that those values of W' will not be picked in the maximization.
Now you just add your εu(W − W') array to your N × M × N array βE_{ε'|ε}[V_{T+1}], and you have an N × M × N array V_T(W, ε, W') representing the period-T value function for any W, ε, and W'. The last step is to maximize over the W' dimension. The policy function ψ_T(W, ε) will be an N × M matrix that represents the W' value that maximizes the value function for a given W and ε.

Here is a summary of the process.

1. Take the expectation E_{ε'|ε}[V_{T+1}(W', ε')] by integrating out the ε' dimension of the value function using the Markov transition matrix for the taste shock Γ(ε'|ε) from Exercise 23. Do this even though the answer is trivially a matrix of zeros for the case in Exercise 24. It will not be trivial in future cases. So E_{ε'|ε}[V_{T+1}(W', ε')] becomes an N × M matrix that is now a function of W' and ε.

2. Then change the shape of E_{ε'|ε}[V_{T+1}(W', ε')] so that it is a (1 × M × N)-dimensional array.

3. Then copy the reshaped E_{ε'|ε}[V_{T+1}(W', ε')] to N rows so that you have an array of dimension N × M × N that is only a function of W' (represented in the third dimension of the array) and ε (represented in the second dimension of the array).

4. Then create an array that represents all the possible values of εu(W − W') in which the (row, column, depth) dimensions correspond to the values of (W, ε, W').

5. Lastly, the new value function is obtained by adding the two three-dimensional arrays together (multiplying the second array by the discount factor) and maximizing over the third dimension W'. An argmax along the 3rd dimension of the array returns a matrix of index numbers that represent the optimal value of W', from which you can create the policy function matrix ψ(W, ε).
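A minimal sketch of these five steps in Python. The shock support and transition matrix below are placeholders standing in for the Tauchen and Hussey (1991) output from Exercise 23:

```python
import numpy as np

# A minimal sketch of the AR(1) layout summarized above. Gamma is an assumed M x M
# Markov matrix Pr(eps'|eps) and eps an assumed support vector; both are placeholders.
# Rows index W, columns index eps, depth indexes W'.
N, M, beta = 100, 7, 0.9
W = np.linspace(0.01, 1.0, N)
eps = np.linspace(0.5, 3.5, M)                   # placeholder support, all positive
Gamma = np.full((M, M), 1.0 / M)                 # placeholder transition matrix

V_next = np.zeros((N, M))                        # V_{T+1}(W', eps'), N x M

# Step 1: conditional expectation E_{eps'|eps}[V_{T+1}(W', eps')] = V_{T+1} Gamma'.
EV = V_next @ Gamma.T                            # N x M, indexed by (W', eps)
# Steps 2-3: put W' in the depth dimension and copy down the N rows of the state W.
EV = np.broadcast_to(EV.T.reshape(1, M, N), (N, M, N)).copy()

# Step 4: eps * u(W - W') with (row, column, depth) = (W, eps, W').
C = W[:, None, None] - W[None, None, :]
U = eps[None, :, None] * np.log(np.where(C > 0, C, 1e-10))
EV[np.broadcast_to(C < 0, (N, M, N))] = -1e10    # rule out infeasible W' > W

# Step 5: add, discount, and take max/argmax over the W' (3rd) dimension.
V_T = (U + beta * EV).max(axis=2)                # N x M value function V_T(W, eps)
psi_T = W[(U + beta * EV).argmax(axis=2)]        # N x M policy W' = psi_T(W, eps)
```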

