Dynamic Programming
Professor Richard W. Evans
1 Introduction
It is hard to think of any decision that is not dynamic and does not involve some trade-
off between the current period and some future period. Examples of dynamic decisions
include the classical consumption-savings decision, marriage, education, and labor-leisure choices.
Dynamic decision making is ubiquitous. In fact, static models are often simply an
approximation of the more realistic dynamic setting.
The term “Dynamic programming” was initially the name for the program of
research that studied how to solve dynamic models (see Dreyfus, 2002). Dynamic
programming has since come to be associated with the particular dynamic solution
method of value function iteration pioneered by Richard Bellman.
Although dynamic modeling is very rich and realistic, it is also very hard. Take,
for example, the law of motion for the capital stock,

k_{t+1} = (1 − δ)k_t + i_t    (1)

This equation says that the value of the capital stock tomorrow, k_{t+1}, equals the value
of the capital stock today, k_t, minus the portion of the capital stock that depreciates
between today and tomorrow, δk_t, plus however much value (investment) you put back
into the capital stock today, i_t. This equation is dynamic because it has current period
variables that have a t subscript and next period variables that have a t + 1 subscript.
How many sequences of capital stock values and investment values {k_t, i_t}_{t=0}^{∞} satisfy
the law of motion for capital (1)? One potential sequence is to have the initial
capital stock be k_0 > 0 and have investment be i_t = 0 for all t. This would make the
capital stock get smaller and smaller every period (k_{t+1} < k_t) until the capital stock
gets arbitrarily close to zero: lim_{t→∞} k_t = 0. A different sequence of capital stocks and
investment amounts that satisfies (1) is the investment amount that exactly offsets
the depreciation, thereby keeping the capital stock constant it = δkt for all t. This is
termed the steady state. You could also come up with infinitely many other sequences
that satisfy (1), where capital kt and investment it are fluctuating. More structure on
the model—like household optimization or firm profit maximization conditions—helps
pin down the set of possible sequences satisfying (1).
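As a quick numerical illustration of the law of motion (1), the sketch below simulates both of the sequences just described (the values of δ, k_0, and the horizon are placeholder choices of mine):

```python
import numpy as np

delta, k0, T = 0.05, 10.0, 200   # placeholder depreciation rate, initial capital, horizon

# Sequence 1: zero investment every period, so the capital stock decays toward zero
k_zero = np.empty(T + 1)
k_zero[0] = k0
for t in range(T):
    k_zero[t + 1] = (1 - delta) * k_zero[t] + 0.0

# Sequence 2: investment exactly offsets depreciation, i_t = delta * k_t (the steady state)
k_ss = np.empty(T + 1)
k_ss[0] = k0
for t in range(T):
    k_ss[t + 1] = (1 - delta) * k_ss[t] + delta * k_ss[t]

print(k_zero[-1])   # close to zero
print(k_ss[-1])     # remains at the steady state k0 (up to floating-point rounding)
```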
In this chapter, we are going to look at a very particular class of dynamic problems
with “nice” properties. These will be models that can be written in recursive form and
that can be transformed into a contraction mapping.1 We will learn to solve them
using a particular solution method: value function iteration. But, as you already
know, there are many solution methods to dynamic models. The best one depends
on what model you are using and what your research question is. Value function
iteration is a nonlinear solution method for dynamic models.

1 Good references for how to solve these problems are Stokey and Lucas (1989) and Adda and
Cooper (2003).
A general form of the Bellman equation of a recursive model is the following,
V(x, y) = max_{y′} σ(x, y, y′) + β E[V(x′, y′)]    (2)
where x is the exogenous state (set of exogenous variables), y is the endogenous state
(set of choice variables), y′ is the control or choice variable, σ(·) is the period objective
function, and V(x, y) is the value function. The value function tells you the value to
the agent of showing up in a given period with state (x, y), and it accounts for all
expected future benefits in addition to the current period benefits.
Throughout, β ∈ (0, 1) is the discount factor and u(·) is a period utility function that is
increasing, continuous, continuously differentiable, and concave (u′ > 0, u′′ < 0).
Using the notation and example of Adda and Cooper (2003), we assume that
individuals are choosing how much to consume each period ct of a cake that starts
out with size W1 > 0. So the law of motion for the size of the cake at the end of
period t is the following.
W_{t+1} = W_t − c_t    (3)
You can think of the size of the cake at the beginning of a period Wt as being given.
The optimization problem for the individual is to choose consumption in each
period ct in order to maximize lifetime utility Ut subject to the constraint on the size
of the cake (law of motion).
max_{c_t ∈ [0, W_t]}  Σ_{t=1}^{T} β^{t−1} u(c_t)    s.t.  W_{t+1} = W_t − c_t
This problem can be rewritten in the following way by substituting the law of motion
for the cake size into the utility function.
max_{W_{t+1} ∈ [0, W_t]}  Σ_{t=1}^{T} β^{t−1} u(W_t − W_{t+1})
Exercise 1. If the individual lives for one period T = 1, what is the condition that
characterizes the optimal amount of cake to eat in period 1? Write the problem in
the equivalent way of showing what the condition is for the optimal amount of cake
to save for the next period, W_{T+1} or W_2.
Exercise 2. If the individual lives for two periods T = 2, what is the condition that
characterizes the optimal amount of cake to leave for the next period W_3 in period
2? What is the condition that characterizes the optimal amount of cake to leave for
the next period W_2 in period 1?
Exercise 3. If the individual lives for three periods T = 3, what are the conditions
that characterize the optimal amount of cake to leave for the next period in each
period {W_2, W_3, W_4}? Now assume that the initial cake size is W_1 = 1, the discount
factor is β = 0.9, and the period utility function is log(c_t). Show how {c_t}_{t=1}^{3} and
{W_t}_{t=1}^{4} evolve over the three periods.
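A minimal numerical sketch that can be used to check the analytical answer to Exercise 3 by brute force: it evaluates lifetime utility on a grid of candidate (W_2, W_3) pairs (the grid size and lower bound are my own choices, not part of the exercise):

```python
import numpy as np

beta, W1 = 0.9, 1.0
grid = np.linspace(1e-6, 1.0, 501)              # candidate cake sizes to carry forward
W2, W3 = np.meshgrid(grid, grid, indexing="ij")

# consumption in each of the three periods; all remaining cake is eaten in period 3 (W4 = 0)
c1, c2, c3 = W1 - W2, W2 - W3, W3
feasible = (c1 > 0) & (c2 > 0) & (c3 > 0)
U = np.where(feasible,
             np.log(np.clip(c1, 1e-12, None))
             + beta * np.log(np.clip(c2, 1e-12, None))
             + beta**2 * np.log(np.clip(c3, 1e-12, None)),
             -np.inf)

i, j = np.unravel_index(np.argmax(U), U.shape)
print("approximately optimal: W2 = %.3f, W3 = %.3f" % (grid[i], grid[j]))
```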
2 Note that the notation for the value function V_t and policy function ψ_t with a time subscript
is different from the notation in chapter 1 of Adda and Cooper (2003). Here, the time subscript
denotes a different function for each period of time, whereas the subscript T in Adda and Cooper
(2003) denotes a function in which the total number of periods is T.
If we make an envelope theorem assumption that W_{T+1} will be chosen optimally in
period T according to (5), we can rewrite the problem in the following way,

V_{T−1}(W_{T−1}) = max_{W_T ∈ [0, W_{T−1}]} u(W_{T−1} − W_T) + β V_T(W_T)

where V_T(W_T) is defined in (4). This is the finite horizon version of the famous
recursive workhorse, the Bellman equation.3 The assumption that future choices will
be made optimally is called the principle of optimality.
Exercise 4. Using the envelope theorem that says W_{T+1} will be chosen optimally in
the next period, show the condition that characterizes the optimal choice (the policy
function) in period T − 1 for W_T = ψ_{T−1}(W_{T−1}). Show the value function V_{T−1} in
terms of ψ_{T−1}(W_{T−1}).
Exercise 5. Let u(c) = log(c). Show that V_{T−1}(W̄) does not equal V_T(W̄) and that
ψ_{T−1}(W̄) does not equal ψ_T(W̄) for a cake size of W̄ when T < ∞ represents the
last period of an individual’s life.
Exercise 6. Using u(c) = log(c), write the finite horizon Bellman equation for the
value function at time T − 2. Characterize the solution for the period T − 2 policy
function for how much cake to save for the next period, W_{T−1} = ψ_{T−2}(W_{T−2}), using
the envelope theorem (the principle of optimality) and write its analytical solution.
Also, write the analytical solution for V_{T−2}.
Exercise 7. Using u(c) = log(c) and the answers to Exercises 5 and 6, write down
the expressions for the analytical solutions for ψ_{T−s}(W_{T−s}) and V_{T−s}(W_{T−s}) for a
general integer s ≥ 1 using induction. Show that lim_{s→∞} V_{T−s}(W_{T−s}) = V(W_{T−s})
and that lim_{s→∞} ψ_{T−s}(W_{T−s}) = ψ(W_{T−s}). That is, as the horizon becomes further
and further away (infinite), the value function and policy function become independent
of time. Another way of saying this is the following: the value of entering a period t
with a certain amount of cake, when the end of your life is far enough away, depends
only on how much cake there is, W_t, not on the period in which you have that amount
of cake.
Exercise 8. Write the Bellman equation for the cake eating problem with a general
utility function u(c) when the horizon is infinite (i.e., either T = ∞ or s = ∞).
V(W) = max_{W′ ∈ [0, W]} u(W − W′) + β V(W′)    (8)
3 Dreyfus (2002) is a good reference for the origin of the Bellman equation.
Note that the value function V on the left-hand-side of (8) and on the right-hand-side
are the same function. This is what you showed in Exercise 7.
Because the problem now has an infinite horizon, the nature of the solution is a
little different. The solution to (8) is a policy function W′ = ψ(W) that creates a fixed
point in V. In other words, the solution is a policy function ψ(W) that makes the
function V on the left-hand-side of (8) equal the function V on the right-hand-side.
Another way of thinking of the problem is that the Bellman equation is one equa-
tion with two unknowns—the value function V and the policy function ψ. The
condition that renders this problem identified is that the Bellman equation must be
a contraction mapping. In a sense, the contraction mapping condition pins down the
value function V .
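To see concretely what a fixed point of (8) looks like, here is a sketch of the standard guess-and-verify argument for the log-utility case. (It anticipates the limiting answer to Exercise 7, so skip it if you want to derive that result yourself.)

```latex
% Guess-and-verify sketch for u(c) = \log(c) in equation (8):
% guess V(W) = A + B\log W for constants A and B.
\begin{align*}
  \text{FOC of (8):}\quad & \frac{1}{W - W'} = \frac{\beta B}{W'}
      \;\Longrightarrow\; W' = \frac{\beta B}{1 + \beta B}\,W \\
  \text{Matching the coefficient on } \log W \text{:}\quad & B = 1 + \beta B
      \;\Longrightarrow\; B = \frac{1}{1 - \beta} \\
  \text{So the fixed point is:}\quad & \psi(W) = \beta W, \qquad
      V(W) = A + \frac{\log W}{1 - \beta}.
\end{align*}
```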
Define C as an operator on any value function Vt (W ). Let C perform the following
operation.4
C(V_t(W)) ≡ max_{W′ ∈ [0, W]} u(W − W′) + β V_t(W′)    (9)
Note that the value function on the right-hand-side of (9) and on the left-hand-side
are the same function V_t, but evaluated at different cake sizes—W versus W′. For a
reason that will become apparent in a moment, define the resulting function from the
C operator as V_{t−1}:

V_{t−1}(W) ≡ C(V_t(W))    (10)
The value function that results from the C operation, V_{t−1}, is not necessarily the same
as the value function that the system began with, V_t. The solution, then, is the fixed
point in V:

C(V_t(W)) = V_{t−1}(W) = V_t(W) = V(W)
The operator C(·) is called a contraction mapping if applying it over and over again
to an arbitrary value function Vt converges to a fixed point. One way to characterize
a contraction mapping is:
lim_{s→∞} C^s(V_t(W)) = V(W)
4 I use a subscript t here to denote the iteration number. I have the contraction operator C(·)
advance the iteration number backward to t − 1 to maintain the backward induction analogy from the
previous exercises of solving for value functions from some terminal period T.
Theorem 1 (Contraction mapping theorem). If (S, ρ) is a complete metric space
and C : S → S is a contraction mapping with modulus β, then
• C has exactly one fixed point v in S, and
• for any v_0 ∈ S, ρ(C^n v_0, v) ≤ β^n ρ(v_0, v) for n = 0, 1, 2, ….
Before moving on to the exercises in which you will solve for the value function
V and the policy function ψ by utilizing the contraction mapping theorem, it is
important to communicate one last reason why it works. In the recursive
finite horizon problem from Section 3, you could always solve for V and ψ. The
Bellman equation was a second-order difference equation in W_t. This is easily seen in
the period T − 2 problem from Exercise 6.
V_{T−2}(W_{T−2}) = max_{W_{T−1}} u(W_{T−2} − W_{T−1}) + β u(W_{T−1} − W_T) + β² u(W_T)    (11)
But how do we solve for these objects V and ψ in the infinite horizon in which we
do not have an ending condition? The answer is that we do have an ending condition
in the infinite horizon problem, and it is called the transversality condition.
The transversality condition simply states that the present value of the state W_t goes
to zero far off into the future (an ending condition). This rules out saving the cake
forever and thereby consuming nothing in every period. In value function iteration, this
is analogous to starting with V_0 = 0. All recursive (infinite horizon) problems have a
transversality condition in the background.
Exercise 9. Let the maximum size of the cake be W_max = 1. Approximate the
continuum of possible cake sizes by a column vector called W that ranges from a
number very close to 0 up to 1. Let the number of possible cake values be N = 100 so
that the increment between each value is 0.01. So W_min = 0.01.
Exercise 10. As in the previous problem sets, assume that period T is the final
period of an individual’s life. So V_{T+1}(W′), the value of entering period T + 1 with a
cake of size W′, is a column vector of zeros of length N, where V_{T+1} : R → R and
W, W′ ∈ [W_min, W_max]. Assume that the period utility function has the logarithmic
functional form u(c) = log(c), and that the discount factor is β = 0.9. What is the
resulting policy function W′ = ψ_T(W) and value function V_T(W) when V_T is defined
as the contraction in equations (9) and (10)? See Appendix A-1 for a fast way to
compute this exercise.
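A minimal sketch of the computation described in Appendix A-1 (the variable names and the penalty value are my own choices; W′ = W, i.e. zero consumption, is kept feasible but made very unattractive through a small floor on consumption):

```python
import numpy as np

beta, N = 0.9, 100
W = np.linspace(0.01, 1.0, N)        # cake sizes today (rows) and tomorrow (columns)
V_Tp1 = np.zeros(N)                  # V_{T+1}(W') = 0

# flow utility for every (W, W') pair; W' > W gets a large penalty so it is never chosen
C = W[:, None] - W[None, :]
U = np.where(C >= 0, np.log(np.maximum(C, 1e-12)), -1e10)

# period-T value of each (W, W') pair, then maximize over W' (the 2nd dimension)
V_all = U + beta * V_Tp1[None, :]
V_T = V_all.max(axis=1)              # value function V_T(W)
psi_T = W[V_all.argmax(axis=1)]      # policy function W' = psi_T(W)
```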
Exercise 11. Generate a norm δ_T = ‖V_T(W) − V_{T+1}(W′)‖ that measures the distance
between the two value functions. Define the distance metric as the sum of the
squared differences,

δ_T ≡ ‖V_T(W) − V_{T+1}(W′)‖ ≡ (V_T − V_{T+1})′ (V_T − V_{T+1})

where (V_T − V_{T+1})′ is the transpose of the difference of the two vectors. Defined in
this way, δ_T ∈ R_+.
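In NumPy, assuming V_T and V_Tp1 are the arrays from the sketch after Exercise 10, this is simply:

```python
diff = V_T - V_Tp1             # element-wise difference of the two value functions
delta_T = float(diff @ diff)   # sum of squared differences, equivalently np.sum(diff**2)
```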
Exercise 12. Take the resulting VT from Exercise 10, and perform the same contrac-
tion on it to generate VT −1 and ψT −1 . That is, generate,
V_{T−1}(W) = C(V_T(W)) = max_{W′ ∈ [0, W]} u(W − W′) + β V_T(W′)
Exercise 13. Repeat Exercise 12 and generate VT −2 and ψT −2 by performing the
contraction on VT −1 . Compare δT −2 to δT −1 and δT .
Exercise 14. Write a while loop in Python that performs the contraction operation
from Exercises 10, 12, and 13 iteratively until the distance measure is very small,
δ_{T−s} < 10^{−9}. How many iterations did it take (s + 1)? Congratulations, you’ve
just completed your first solution by value function iteration. The distance measure
δ_{T−s} being arbitrarily close to zero means you have converged to the fixed point
V_t = V_{t−1} = V. (For fun, you can show that the value function, and hence the policy
function, converges to the same fixed point regardless of what you use as your initial
guess for the value function.)
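A sketch of the full iteration, building on the one-step computation shown after Exercise 10 (variable names, the consumption floor, and the penalty are again my own choices):

```python
import numpy as np

beta, N, tol = 0.9, 100, 1e-9
W = np.linspace(0.01, 1.0, N)

# flow utility for every (W, W') pair; W' > W is ruled out with a large penalty,
# and W' = W (zero consumption) is allowed but made very unattractive via a floor
C = W[:, None] - W[None, :]
U = np.where(C >= 0, np.log(np.maximum(C, 1e-12)), -1e10)

V = np.zeros(N)                        # initial guess: V_{T+1} = 0
delta, its = 1.0, 0
while delta > tol:
    V_all = U + beta * V[None, :]
    V_new = V_all.max(axis=1)
    policy = W[V_all.argmax(axis=1)]   # W' = psi(W) at the current iteration
    diff = V_new - V
    delta = float(diff @ diff)         # sum of squared differences, as in Exercise 11
    V = V_new
    its += 1

print("converged after", its, "iterations")
```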
Exercise 15. Using the matplotlib library, plot the policy function for the converged
problem, W′ = ψ_{T−s}(W) = ψ(W), which gives the value of the cake tomorrow (y-axis)
as a function of the cake today (x-axis).
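A minimal plotting sketch, assuming W and policy are the arrays from the while-loop sketch above:

```python
import matplotlib.pyplot as plt

plt.plot(W, policy, label="W' = psi(W)")
plt.xlabel("cake size today, W")
plt.ylabel("cake size tomorrow, W'")
plt.title("Converged policy function")
plt.legend()
plt.show()
```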
V(W, ε) = max_{W′ ∈ [0, W]} ε u(W − W′) + β E_{ε′}[V(W′, ε′)],    where ε ∼ N(µ, σ²)

where E is the unconditional expectations operator over all values in the support of
ε, µ is the mean of ε, and σ² is the variance of ε.
ε is an M-length row vector. Generate the probability distribution Γ(ε) such
that ε ∼ N(0, σ²). Thus, Γ(ε) represents the probability of a particular realization,
Pr(ε = ε_m). (Hint: This is essentially the Newton-Cotes method of approximating
an integral that you used in your numerical integration labs.)
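One way to build Γ(ε) is to assign to each grid point the normal probability mass of the bin around it (a midpoint-bin sketch; the number of nodes, the grid bounds, and σ below are placeholder choices of mine):

```python
import numpy as np
from scipy.stats import norm

M, sigma = 7, 0.5                               # assumed number of nodes and std. dev.
eps = np.linspace(-3 * sigma, 3 * sigma, M)     # row vector of support points for eps

# probability of each node = normal mass of the surrounding bin
edges = np.concatenate(([-np.inf], 0.5 * (eps[1:] + eps[:-1]), [np.inf]))
Gamma = np.diff(norm.cdf(edges, loc=0.0, scale=sigma))
print(Gamma.sum())    # equals 1 up to floating-point error
```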
Exercise 17. As in Exercise 9 from Section 4, assume that the vector of possible cake
sizes is W with N = 100 equally spaced values between 0.01 and 1. As in Exercise
10 from Section 4, assume a value function VT +1 (W 0 , ε0 ) for entering the period after
the last period of life with cake size W 0 and taste shock realization ε0 . This value
function will be a matrix with each row corresponding to different values of W 0 and
each column corresponding to different values of ε0 . So each element in the matrix
is VT +1 (Wn0 , ε0m ). Let your initial guess for the value function VT +1 be a matrix of
zeros with N rows and M columns. Assume that the period utility function has the
logarithmic functional form u(c) = log(c), and that the discount factor is β = 0.9.
What is the resulting policy function W 0 = ψT (W, ε) and value function VT (W, ε)
when VT is defined as in (14) below? See Appendix A-2 for a fast way to compute
this exercise.
V_{t−1}(W, ε) ≡ C(V_t(W, ε)) ≡ max_{W′ ∈ [0, W]} ε u(W − W′) + β E_{ε′}[V_t(W′, ε′)]    (14)
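A sketch of one application of (14) on the discretized grids, following the three-dimensional layout in Appendix A-2 (W on the rows, ε on the columns, W′ on the depth); the shock grid, the penalty, and all names are assumptions of mine:

```python
import numpy as np
from scipy.stats import norm

beta, N, M, sig = 0.9, 100, 7, 0.5          # sig is an assumed shock standard deviation
W = np.linspace(0.01, 1.0, N)
eps = np.linspace(-3 * sig, 3 * sig, M)     # taste-shock grid, as in the earlier sketch
edges = np.concatenate(([-np.inf], 0.5 * (eps[1:] + eps[:-1]), [np.inf]))
Gamma = np.diff(norm.cdf(edges, loc=0.0, scale=sig))   # probabilities of each eps node

V_Tp1 = np.zeros((N, M))                    # V_{T+1}(W', eps') = 0

# E_{eps'}[V_{T+1}(W', eps')] is a length-N vector; put it on the 3rd (W') dimension
EV = V_Tp1 @ Gamma                          # (N,)
EV3 = EV.reshape((1, 1, N))

# eps * u(W - W') on an (N, M, N) array, with infeasible W' > W heavily penalized
C = W[:, None] - W[None, :]                 # (N, N) consumption for each (W, W') pair
logC = np.log(np.maximum(C, 1e-12))
U3 = eps.reshape((1, M, 1)) * logC[:, None, :]
U3 = np.where((C >= 0)[:, None, :], U3, -1e10)

V_all = U3 + beta * EV3
V_T = V_all.max(axis=2)                     # (N, M) value function V_T(W, eps)
psi_T = W[V_all.argmax(axis=2)]             # (N, M) policy function W' = psi_T(W, eps)
```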
Exercise 18. Generate a norm δ_T = ‖V_T(W, ε) − V_{T+1}(W′, ε′)‖ that measures the
distance between the two value functions. Define the distance metric as the sum of
the squares of each corresponding element in the two value functions,

δ_T ≡ ‖V_T(W, ε) − V_{T+1}(W′, ε′)‖ ≡ vec(V_T − V_{T+1})′ vec(V_T − V_{T+1})    (15)

where vec(V_T − V_{T+1})′ is the transpose of the column vectorized version of V_T − V_{T+1}.
Defined in this way, δ_T ∈ R_+.
Exercise 19. Take the resulting VT from Exercise 17, and perform the same contrac-
tion on it to generate VT −1 and ψT −1 . That is, generate,
V_{T−1}(W, ε) = C(V_T(W, ε)) = max_{W′ ∈ [0, W]} ε u(W − W′) + β E_{ε′}[V_T(W′, ε′)]
and the accompanying policy function W 0 = ψT −1 (W, ε). Calculate the accompany-
ing distance measure for δT −1 using the formula from (15) with the updated period
subscripts. Compare δT −1 with δT from Exercise 18.
Exercise 20. Repeat Exercise 19 and generate VT −2 and ψT −2 by performing the
contraction on VT −1 . Compare δT −2 to δT −1 and δT .
Exercise 21. Write a while loop in Python that performs the contraction operation
from Exercises 17, 19, and 20 iteratively until the distance measure is very small,
δ_{T−s} < 10^{−9}. How many iterations did it take (s + 1)? Congratulations, you’ve just
completed your first solution to a stochastic problem by value function iteration. The
distance measure δ_{T−s} being arbitrarily close to zero means you have converged to
the fixed point V_t = V_{t−1} = V.
Exercise 22. Use Python’s matplotlib library to make a 3-D surface plot of the
policy function for the converged problem, W′ = ψ_{T−s}(W, ε) = ψ(W, ε), which gives
the value of the cake tomorrow (y-axis) as a function of the cake today (x1-axis) and
the taste shock today (x2-axis).
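A minimal 3-D plotting sketch, assuming W, eps, and the converged N × M policy array psi_T from the previous sketches are already in memory:

```python
import numpy as np
import matplotlib.pyplot as plt

Wg, Eg = np.meshgrid(W, eps, indexing="ij")   # grids shaped like psi_T (N, M)

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(Wg, Eg, psi_T, cmap="viridis")
ax.set_xlabel("cake today, W")
ax.set_ylabel("taste shock today, eps")
ax.set_zlabel("cake tomorrow, W'")
plt.show()
```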
Then the Bellman equation becomes the following, in which the only change from
the problems in Section 6 is that the expectations operator is now a conditional
expectation because of the persistent shock process,
V(W, ε) = max_{W′ ∈ [0, W]} ε u(W − W′) + β E_{ε′|ε}[V(W′, ε′)]

where ε′ is distributed according to (16). Let Γ(ε′|ε) = Pr(ε′_j | ε_i), where ε′_j is the
shock in the next period and ε_i is the value of the shock in the current period.
Exercise 23. Use the method described by Tauchen and Hussey (1991) to approximate
the AR(1) process for ε from (16) as a first order Markov process. The
Python function file “tauchenhussey.py” will produce a vector of length M for the
support of ε and an M × M transition matrix Γ(ε′|ε) = Pr(ε′_j | ε_i), where each element
in row i and column j represents the probability of ε′_j tomorrow given ε_i today. As
inputs, let M = 7, the mean of the process µ = 4σ, ρ = 1/2, σ = √(σ²) = 1/2, and
basesigma = (0.5 + ρ/4)·σ + (0.5 − ρ/4)·σ/√(1 − ρ²).
Exercise 24. As in Exercise 17 from Section 6, assume that the vector of possible
cake sizes is W with N = 100 equally spaced values between 0.01 and 1 and assume
a value function VT +1 (W 0 , ε0 ) for entering the period after the last period of life with
cake size W 0 and taste shock realization ε0 . This value function will be a matrix with
each row corresponding to different values of W 0 and each column corresponding to
different values of ε0 . So each element in the matrix is VT +1 (Wn0 , ε0m ). Let your
initial guess for the value function VT +1 be a matrix of zeros with N rows and M
columns. Assume that the period utility function has the logarithmic functional form
u(c) = log(c), and that the discount factor is β = 0.9. What is the resulting policy
function W 0 = ψT (W, ε) and value function VT (W, ε) when VT is defined as in (17)
below? See Appendix A-3 for a fast way to compute this exercise.
V_{t−1}(W, ε) ≡ C(V_t(W, ε)) ≡ max_{W′ ∈ [0, W]} ε u(W − W′) + β E_{ε′|ε}[V_t(W′, ε′)]    (17)
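The only computational difference from the i.i.d. case is that the expectation is now conditional on the current shock: for current shock ε_i, weight the columns of V_t(W′, ε′) by row i of the transition matrix Γ. A short sketch (the arrays below are stand-ins for the objects produced in Exercise 24 and by tauchenhussey.py):

```python
import numpy as np

N, M = 100, 7
V_Tp1 = np.zeros((N, M))            # stand-in for V_{T+1}(W'_n, eps'_j)
Gamma = np.full((M, M), 1.0 / M)    # stand-in for the Tauchen-Hussey transition matrix

EV = V_Tp1 @ Gamma.T                # (N, M): EV[n, i] = E[ V_{T+1}(W'_n, eps') | eps_i ]
EV3 = EV.T[None, :, :]              # (1, M, N): ready to broadcast over the (W, eps, W') array
```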
Exercise 25. Generate a norm δ_T = ‖V_T(W, ε) − V_{T+1}(W′, ε′)‖ that measures the
distance between the two value functions. Define the distance metric as the sum of
the squares of each corresponding element in the two value functions,

δ_T ≡ ‖V_T(W, ε) − V_{T+1}(W′, ε′)‖ ≡ vec(V_T − V_{T+1})′ vec(V_T − V_{T+1})    (18)

where vec(V_T − V_{T+1})′ is the transpose of the column vectorized version of V_T − V_{T+1}.
Defined in this way, δ_T ∈ R_+.
Exercise 26. Take the resulting VT from Exercise 24, and perform the same contrac-
tion on it to generate VT −1 and ψT −1 . That is, generate,
V_{T−1}(W, ε) = C(V_T(W, ε)) = max_{W′ ∈ [0, W]} ε u(W − W′) + β E_{ε′|ε}[V_T(W′, ε′)]
and the accompanying policy function W 0 = ψT −1 (W, ε). Calculate the accompany-
ing distance measure for δT −1 using the formula from (18) with the updated period
subscripts. Compare δT −1 with δT from Exercise 25.
Exercise 27. Repeat Exercise 26 and generate VT −2 and ψT −2 by performing the
contraction on VT −1 . Compare δT −2 to δT −1 and δT .
Exercise 28. Write a while loop in Python that performs the contraction operation
from Exercises 24, 26, and 27 iteratively until the distance measure is very small,
δ_{T−s} < 10^{−9}. How many iterations did it take (s + 1)? Congratulations, you’ve
just completed your first solution to a stochastic AR(1) problem by value function
iteration. The distance measure δ_{T−s} being arbitrarily close to zero means you have
converged to the fixed point V_t = V_{t−1} = V.
Exercise 29. Make a 3-D surface plot of the policy function for the converged
problem, W′ = ψ_{T−s}(W, ε) = ψ(W, ε), which gives the value of the cake tomorrow (y-axis)
as a function of the cake today (x1-axis) and the taste shock today (x2-axis).
recent wage w, current job offer wage w′, and employment status s be given by the
following value function,

V(w, w′, s) = V^E(w)       if s = E
V(w, w′, s) = V^U(w, w′)   if s = U        (19)
The value of being unemployed in a given period is a function of both the wage
at the most recent job w as well as the wage of the current job offer w′,

V^U(w, w′) = u(αw) + β max_{s′ ∈ {E, U}} { V^E(w′), E_{w′′}[V^U(w, w′′)] }    (22)
where α ∈ (0, 1) is the fraction of the worker’s previous wage paid in unemployment
insurance benefits. It is only in the unemployed state s = U that the worker makes
a decision. Once the job offer w′ is received, drawn independently from the cumulative
distribution function F(w′) (with corresponding probability density function f(w′)),
the worker can choose whether to accept or reject the offer. The expectation in (22)
is, therefore, not over w′ but over the possible job offers in the following period w′′ if
the worker chooses to reject the current job offer (s′ = U).
The policy function for the decision of the unemployed worker whether to accept a
job (s′ = E) or whether to reject a job (s′ = U) will be a function of both the
most recent wage w and the current job offer: s′ = ψ(w, w′). These
discrete choice problems are often called threshold problems because the policy choice
depends on whether the state variable is greater than or less than some threshold level.
In the labor search model, the threshold level is called the “reservation wage” w′_R.
The reservation wage w′_R is defined as the wage offer such that the worker is indifferent
between accepting the job (s′ = E) and staying unemployed (s′ = U):

w′_R ≡ w′ : V^E(w′) = E_{w′′}[V^U(w, w′′)]        (23)

Note that the reservation wage w′_R is a function of the wage at the most recent job
w. The policy function will then take the form of accepting the job if w′ ≥ w′_R or
rejecting the job offer and staying unemployed if w′ < w′_R:

s′ = ψ(w, w′) = E   if w′ ≥ w′_R
s′ = ψ(w, w′) = U   if w′ < w′_R        (24)
In summary, the labor search discrete choice problem is characterized by the value
functions (19), (21), and (22), the reservation wage (23), and the policy function (24).
Because wage offers are distributed according to the cdf F(w′) and because the policy
function takes the form of (24), the probability that the unemployed worker receives
a wage offer that he will reject is F(w′_R) and the probability that he receives a wage
offer that he will accept is 1 − F(w′_R). Just like the continuous choice cake eating
problems in problem sets 1 through 5, this problem can be solved by value function
iteration, which is similar to starting at the “final” period of an individual’s life and
solving for an infinite series of solutions by backward induction.
1. Assume that workers only live a finite number of periods T and assume that
the utility of consumption is log utility u(c) = log(c). The value of entering
the last period of life with most recent wage w and employment status s is the
following.

   V_T(w, w′, s) = V^E_T(w) = log(w)        if s = E
   V_T(w, w′, s) = V^U_T(w, w′) = log(αw)   if s = U

Solve analytically for the value of entering the second-to-last period of life
with most recent wage, current job offer, and employment status V_{T−1}(w, w′, s)
(which includes V^E_{T−1}(w) and V^U_{T−1}(w, w′)), the reservation wage w′_{R,T−1}, and
the policy function s′ = ψ_{T−1}(w, w′).
2. Given the solutions for V_{T−1}, w′_{R,T−1}, and s′ = ψ_{T−1}(w, w′) from the previous
exercise, solve analytically for the value of entering the third-to-last period of life
with most recent wage, current job offer, and employment status V_{T−2}(w, w′, s)
(which includes V^E_{T−2}(w) and V^U_{T−2}(w, w′)), the reservation wage w′_{R,T−2}, and
the policy function s′ = ψ_{T−2}(w, w′). [NOTE: This operation of solving for the
new value function V_t(w, s) is a contraction.]
The value function iteration solution method for the equilibrium in the labor
search problem is analogous to the value function iteration we did in problem sets 3,
4, and 5. The only difference is that two value functions must converge to a fixed
point in this problem instead of just one value function converging in the previous
problems.
For the following exercises, you will use Python. Assume that the probability
of becoming unemployed in a given period is γ = 0.10, the fraction of wages paid
in unemployment benefits is α = 0.5, and the discount factor is β = 0.9. Assume
that wage offers to unemployed workers are distributed lognormally, w′ ∼ LogN(µ, σ),
where m = 20 is the mean wage, v = 400 is the variance of the wage, µ is the mean
of log(w′), and σ is the standard deviation of log(w′). Denote the cdf of the lognormal
distribution as F(w′) and the pdf of the distribution as f(w′).
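The parameters µ and σ can be recovered from the mean m and variance v of the wage level with the standard lognormal moment formulas, σ² = log(1 + v/m²) and µ = log(m) − σ²/2. A quick check in Python:

```python
import numpy as np

m, v = 20.0, 400.0
sigma2 = np.log(1.0 + v / m**2)    # variance of log(w')
sigma = np.sqrt(sigma2)            # standard deviation of log(w')
mu = np.log(m) - sigma2 / 2.0      # mean of log(w')

# sanity check: mean and variance of the implied lognormal distribution
print(np.exp(mu + sigma2 / 2))                          # approximately 20
print((np.exp(sigma2) - 1) * np.exp(2 * mu + sigma2))   # approximately 400
```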
4. Write Python code that solves for the equilibrium optimal policy function s′ =
ψ(w, w′), the reservation wage w′_R as a function of the current wage w, and the
value functions V^E(w) and V^U(w, w′) using value function iteration. (A rough
sketch of one possible implementation appears after this list.)
5. Plot the equilibrium reservation wage w′_R of the converged problem as a function
of the current wage w with the current wage on the x-axis and the reservation
wage w′_R on the y-axis. This is the most common way to plot discrete choice
policy functions. The reservation wage represents the wage that makes the
unemployed worker indifferent between taking a job offer and rejecting it. So
any wage above the reservation wage line represents s′ = E and any wage below
the reservation wage line represents s′ = U.
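A rough sketch of the value function iteration referenced in item 4. Equation (21) for the employed worker is not reproduced in this excerpt, so the sketch assumes the standard form V^E(w) = u(w) + β[(1 − γ)V^E(w) + γ E_{w′′}V^U(w, w′′)]; the wage grid, its bounds, the convergence check, and all variable names are likewise my own choices:

```python
import numpy as np
from scipy.stats import lognorm

beta, gamma, alpha = 0.9, 0.10, 0.5
m, v = 20.0, 400.0
sigma = np.sqrt(np.log(1.0 + v / m**2))
mu = np.log(m) - sigma**2 / 2.0

Nw = 200
w = np.linspace(0.5, 100.0, Nw)                    # hypothetical wage grid
offer_dist = lognorm(s=sigma, scale=np.exp(mu))    # distribution of wage offers w'
edges = np.concatenate(([0.0], 0.5 * (w[1:] + w[:-1]), [np.inf]))
p = np.diff(offer_dist.cdf(edges))                 # Pr(offer falls in each grid bin)
p /= p.sum()

VE = np.zeros(Nw)            # V^E(w)
VU = np.zeros((Nw, Nw))      # V^U(w, w'): rows index w, columns index w'
dist = 1.0
while dist > 1e-9:
    EVU = VU @ p             # E_{w''}[V^U(w, w'')] for each most recent wage w
    # assumed employed Bellman equation (equation (21) is not shown in this excerpt)
    VE_new = np.log(w) + beta * ((1 - gamma) * VE + gamma * EVU)
    # unemployed Bellman equation (22): accept (V^E(w')) versus reject (E[V^U(w, w'')])
    VU_new = np.log(alpha * w)[:, None] + beta * np.maximum(VE[None, :], EVU[:, None])
    dist = np.max(np.abs(VE_new - VE)) + np.max(np.abs(VU_new - VU))
    VE, VU = VE_new, VU_new

# reservation wage: lowest offer w' at which accepting weakly dominates rejecting
EVU = VU @ p
accept = VE[None, :] >= EVU[:, None]               # rows: most recent wage w, columns: offer w'
wR = np.where(accept.any(axis=1), w[np.argmax(accept, axis=1)], np.nan)
```

A natural follow-up for item 5 is to plot w against wR with matplotlib, as in the earlier policy-function plots.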
References
Adda, Jérôme and Russell Cooper, Dynamic Economics: Quantitative Methods
and Applications, The MIT Press: Cambridge, Massachusetts, 2003.
APPENDIX
A-1 Computation of the value function and policy
function using discretized state and control
space with perfect foresight
In Exercise 10 from Section 4, the finite horizon Bellman equation is:

V_T(W) = max_{W′ ∈ [0, W]} u(W − W′) + β V_{T+1}(W′)

A simple approach to take in calculating these kinds of problems is to put the problem
into the computer without the max_{W′} operator, using the entire set of utility values
and value function values for each possible point in the state space W and each possible
point in the control space W′. In this case, I put values associated with W on the rows
(1st dimension) and values associated with W′ in the columns (2nd dimension). The
computational geometry is shown below.
As mentioned in the previous paragraph, you’ll need to replace all the entries of the
V_{T+1} matrix that correspond to values for which W′ > W with a very large negative
number (e.g., −10^{10} or even −1000) so that those values of W′ will not be picked in
the maximization.
Now you just add your u(W − W′) matrix to your N × N βV_{T+1} matrix and you
have an N × N matrix V_T(W, W′) representing the period-T value function for any
W and any W′. The last step is to maximize over the W′ dimension (2nd dimension).
The policy function ψ_T(W) will be an N × 1 column vector that represents the W′
value that maximizes the value function for a given W.
A-2 Computation of the value function and policy
function using discretized state and control
space with i.i.d. shock
In Exercise 17 from Section 6, the finite horizon Bellman equation is:

V_T(W, ε) = max_{W′ ∈ [0, W]} ε u(W − W′) + β E_{ε′}[V_{T+1}(W′, ε′)]

The approach I take in calculating these kinds of problems is to put the problem into
the computer without the max_{W′} operator, using the entire set of utility values and
value function values for each possible point in the state space (W, ε) and each possible
point in the control space W′. In this case, I put values associated with W on the
rows (1st dimension), the values associated with ε on the columns (2nd dimension),
and values associated with W′ in the depth (3rd dimension). The computational
geometry is shown below.
As in Appendix A-1, you will need to replace all the entries that correspond to values
for which W′ ≥ W with a very large negative number (e.g., −10^{10} or even −100) so
that those values of W′ will not be picked in the maximization.
Now you just add your ε u(W − W′) array to your N × M × N βE_{ε′}[V_{T+1}] array,
and you have an N × M × N array V_T(W, ε, W′) representing the period-T value
function for any W, ε, and W′. The last step is to maximize over the W′ dimension.
The policy function ψ_T(W, ε) will be an N × M matrix that represents the W′ value
that maximizes the value function for a given W and ε.
3. Then copy the reshaped E_{ε′}[V_{T+1}(W′, ε′)] across the N rows and M columns so that
you have an array that has dimension N × M × N that is only a function of W′,
which is represented in the third dimension of the array.

5. Lastly, the new value function is obtained by adding the two three-dimensional
arrays together (multiplying the second array by the discount factor) and maximizing
over the third dimension W′. A max command along the 3rd dimension
of the array can return a matrix of index numbers that represent the optimal
value of W′, from which you can create the policy function matrix ψ(W, ε).
A-3 Computation of the value function and policy
function using discretized state and control
space with AR(1) shock
In Exercise 24 from Section 7, the finite horizon Bellman equation is:

V_T(W, ε) = max_{W′ ∈ [0, W]} ε u(W − W′) + β E_{ε′|ε}[V_{T+1}(W′, ε′)]
described by writing the following code: “EVTp1array = reshape(VTp1′, [1, M, N])”,
where you make sure to use the transpose “VTp1′” in the reshape command.
Then copy the 1 × M × N array E_{ε′|ε}[V_{T+1}(W′, ε′)] down N rows. This copying can
be easily done using the “repmat” command in MATLAB (or np.tile and broadcasting
in NumPy) and represents the fact that V_{T+1} is not a function of W. As with the
ε u(W − W′) array, you will need to replace all the entries of the V_{T+1} array that
correspond to values for which W′ ≥ W with a very large negative number (e.g., −10^{10}
or even −100) so that those values of W′ will not be picked in the maximization.
Now you just add your ε u(W − W′) array to your N × M × N βE_{ε′|ε}[V_{T+1}] array,
and you have an N × M × N array V_T(W, ε, W′) representing the period-T value
function for any W, ε, and W′. The last step is to maximize over the W′ dimension.
The policy function ψ_T(W, ε) will be an N × M matrix that represents the W′ value
that maximizes the value function for a given W and ε.
3. Then copy the reshaped E_{ε′|ε}[V_{T+1}(W′, ε′)] down the N rows so that you have an
array that has dimension N × M × N that is only a function of W′ (represented in
the third dimension of the array) and ε (represented in the second dimension
of the array).

5. Lastly, the new value function is obtained by adding the two three-dimensional
arrays together (multiplying the second array by the discount factor) and maximizing
over the third dimension W′. A max command along the 3rd dimension
of the array can return a matrix of index numbers that represent the optimal
value of W′, from which you can create the policy function matrix ψ(W, ε).