Lecture SM 1 DP
The basic framework
• An agent, given state s_t ∈ S, takes an optimal action a_t ∈ A(s_t) that determines current utility u(s_t, a_t) and affects the distribution of next period's state s_{t+1} via a Markov chain p(s_{t+1}|s_t, a_t).
• The difficulty is that we are not looking for a set of numbers a = {a_1, ..., a_T} but for a set of functions α = {α_1, ..., α_T}.
The DP problem
• DP simplifies the MDP problem, allowing us to find α = {α_1, ..., α_T} using a recursive procedure.
• We are going to focus on infinite horizon problems, where V is the unique solution to the Bellman equation V = Γ(V).
The Bellman operator and the Bellman equation
• The Bellman operator Γ maps a value function V into

\Gamma(V)(s) = \max_{a \in A(s)} \left\{ u(s, a) + \beta \int V(s') \, p(ds'|s, a) \right\}

and the Bellman equation is the fixed-point condition V = Γ(V).
• This will allow us to use some numerical procedures to find the solution to the Bellman equation recursively.
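For a discrete MDP, one application of this operator is a few lines of code. A minimal sketch in Python/NumPy, assuming the primitives are stored as an |S| x |A| utility array u and an |S| x |A| x |S| transition array p (these array names are illustrative, not from the lecture):

```python
import numpy as np

def bellman_operator(V, u, p, beta):
    """One application of Gamma: (Gamma V)(s) = max_a { u(s,a) + beta * sum_s' p(s'|s,a) V(s') }.

    V    : (S,) current value function
    u    : (S, A) flow utility u(s, a)
    p    : (S, A, S) transition probabilities p(s'|s, a)
    beta : discount factor in (0, 1)
    """
    EV = p @ V                     # expected continuation value, shape (S, A)
    Q = u + beta * EV              # action values
    return Q.max(axis=1), Q.argmax(axis=1)   # updated value function and greedy policy
```

Iterating V ← Γ(V) from any bounded starting guess converges to the fixed point at rate β.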
Discrete vs. continuous MDPs
• Difference between discrete MDPs, whose state and control variables can only take a finite number of values, and continuous MDPs, whose state and control variables can take a continuum of values.
• Value functions for discrete MDPs belong to a subset of the finite-dimensional Euclidean space R^{#S}.
• Value functions for continuous MDPs belong to a subset of the infinite-dimensional Banach space
B (S) of bounded, measurable real-valued functions on S.
• Therefore, we can solve discrete MDPs exactly (up to rounding error), while we can only approximate the solution to continuous MDPs.
• Discrete MDPs arise naturally in IO/labor-type applications, while continuous MDPs arise naturally in Macro.
Computation: speed vs. accuracy
• The approximation error ϵ introduces a trade-off: better accuracy (lower ϵ) versus shorter time to find the solution (higher ϵ).
• The time needed to find the solution also depends on the dimension of the problem: d.
• Hence, we will have to solve the Bellman equation for various values of the “structural” parameters
defining β, u, and p.
Approximation to continuous DPs
• Discrete.
• Smooth.
• Discrete solves an equivalent discrete problem that approximates the original continuous DP.
• Smooth treats the value function V and the decision rule α as smooth functions of s and a finite set of coefficients θ.
Smooth approximation to continuous DPs
• Then we will try to find θ̂ such that the approximated value function V_θ̂ and decision rule α_θ̂ are close to V and α in some metric.
• Example:
1. Let S = [−1, 1].
2. Consider V_θ(s) = Σ_{i=1}^{k} θ_i p_i(s) and let p_i(s) = s^i.
• Another example is p_i(s) = cos(i · cos^{−1}(s)). These are called the Chebyshev polynomials of the first kind.
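A small sketch of the second example in code: build the basis p_i(s) = cos(i · arccos(s)) on S = [−1, 1] and evaluate V_θ(s) = Σ_{i=1}^{k} θ_i p_i(s) for a given coefficient vector θ (function names are illustrative):

```python
import numpy as np

def chebyshev_basis(s, k):
    """Return the k x len(s) matrix with rows p_i(s) = cos(i * arccos(s)), i = 1..k."""
    s = np.atleast_1d(s)
    i = np.arange(1, k + 1)[:, None]          # polynomial orders 1..k
    return np.cos(i * np.arccos(s)[None, :])  # Chebyshev polynomials of the first kind

def V_theta(s, theta):
    """Evaluate the smooth approximation V_theta(s) = sum_i theta_i * p_i(s)."""
    return theta @ chebyshev_basis(s, len(theta))
```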
The Stone-Weierstrass approximation theorem
• Let ε > 0 and V be a continuous function on [−1, 1]; then there exists a polynomial V_θ such that ‖V − V_θ‖ < ε.
• Hence we can look for θ that makes the Bellman residuals small at a set of grid points s_1, ..., s_N, for instance by minimizing

\left( \sum_{i=1}^{N} \left[ V_\theta(s_i) - \widehat{\Gamma}(V_\theta)(s_i) \right]^2 \right)^{1/2}

where \widehat{\Gamma}(V_\theta) is an approximation to the Bellman operator. Why is it an approximation?
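One way to operationalize this, as a sketch: pick grid points s_i and choose θ to minimize the sum of squared residuals V_θ(s_i) − Γ̂(V_θ)(s_i). Here Gamma_hat is a user-supplied, model-specific approximation of the Bellman operator (its implementation is assumed, not given in the lecture), V_theta is the polynomial approximation from the previous sketch, and scipy.optimize.least_squares does the minimization:

```python
import numpy as np
from scipy.optimize import least_squares

def bellman_residuals(theta, s_grid, Gamma_hat):
    """Residuals V_theta(s_i) - Gamma_hat(V_theta)(s_i) at the grid points.

    Gamma_hat takes a callable V and returns a callable Gamma_hat(V) (assumed given).
    """
    V = lambda s: V_theta(s, theta)          # V_theta from the previous sketch
    return V(s_grid) - Gamma_hat(V)(s_grid)

def fit_theta(theta0, s_grid, Gamma_hat):
    """Choose theta to minimize the sum of squared Bellman residuals."""
    sol = least_squares(bellman_residuals, theta0, args=(s_grid, Gamma_hat))
    return sol.x
```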
MDP definitions
• If T < ∞ (the problem has a finite horizon), DP is equivalent to backward induction. In the terminal period, α_T is

\alpha_T(s_T) = \arg\max_{a_T \in A(s_T)} u(s_T, a_T)

and, for earlier periods, the value functions satisfy

V_t(s_t) = u(s_t, \alpha_t(s_t)) + \beta \int V_{t+1}(s_{t+1}) \, p(ds_{t+1}|s_t, \alpha_t(s_t))

• It could be the case that a_t = α_t(s_t, a_{t−1}, s_{t−1}, ...) depends on the whole history, but it can be shown that separability and the Markovian property of p imply that a_t = α_t(s_t).
The Bellman equation in the infinite horizon problem I
• On the other hand, the separability and the Markovian property of p imply that at = α (st ), that is,
the problem has a stationary Markovian structure.
• Blackwell (1965) and Denardo (1967) show that the Bellman operator is a contraction mapping: for W, V in B(S),

\|\Gamma(V) - \Gamma(W)\| \le \beta \|V - W\|

• At the fixed point, the value function satisfies

V(s) = u(s, \alpha(s)) + \beta \int V(s') \, p(ds'|s, \alpha(s))

• Consider u(s, a) = 1.
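For instance, guessing a constant value function V(s) = v, the Bellman equation with u(s, a) = 1 gives

v = 1 + \beta v \quad \Longrightarrow \quad V(s) = \frac{1}{1 - \beta} \quad \text{for all } s,

which is the unique fixed point promised by the contraction property.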
Phelps’ (1972) example I
• The state variable, w, is the wealth of the agent and the decision variable, c, is how much to consume.
• Savings are invested in a single risky asset with i.i.d. return R_t with distribution F.
Phelps’ (1972) example II
• If we substitute V_∞ into the Bellman equation and we look for f_∞ and g_∞, we get:

f_\infty = \frac{1}{1 - \beta}
Finite time
• Examples:
1. Life cycle.
3. Finite games.
Infinite time
• Examples:
1. Industry dynamics.
2. Business cycles.
3. Infinite games.
Discrete state space
1. ε-equilibria.
2. Estimation.
Infinite state space
• Bounds?
• Interaction of bounds?
Different strategies
3. Projection.
4. Perturbation.
• Many other strategies are actually particular cases of the previous ones.
Value function iteration
1. We come back to our two distinctions: finite versus infinite time and discrete versus continuous state
space.
• Initialization.
• Discretization.
Value function iteration in finite time
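A minimal sketch of backward induction for a discrete finite-horizon problem, reusing bellman_operator from the earlier sketch (array names are again illustrative):

```python
import numpy as np

def backward_induction(u, p, beta, T):
    """Solve a T-period discrete MDP by backward induction.

    Returns value functions V[t] and policies alpha[t] for t = 0, ..., T.
    """
    S, A = u.shape
    V = np.zeros((T + 1, S))
    alpha = np.zeros((T + 1, S), dtype=int)
    # Terminal period: no continuation value
    alpha[T] = u.argmax(axis=1)
    V[T] = u.max(axis=1)
    # Earlier periods: V_t = max_a { u + beta * E[V_{t+1}] }
    for t in range(T - 1, -1, -1):
        V[t], alpha[t] = bellman_operator(V[t + 1], u, p, beta)
    return V, alpha
```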
Value function iteration in infinite time
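A minimal sketch of value function iteration in infinite time for the same discrete representation: iterate V ← Γ(V) until the sup-norm change is below a tolerance; the contraction property bounds the remaining error.

```python
import numpy as np

def value_function_iteration(u, p, beta, tol=1e-8, max_iter=10_000):
    """Iterate V <- Gamma(V) until the sup-norm change is below tol."""
    V = np.zeros(u.shape[0])
    for _ in range(max_iter):
        TV, policy = bellman_operator(V, u, p, beta)   # Gamma from the earlier sketch
        err = np.max(np.abs(TV - V))
        V = TV
        # By the contraction property, ||V - V*|| <= beta / (1 - beta) * err
        if err < tol:
            break
    return V, policy
```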
Policy function iteration
• With infinite time, we can also apply policy function iteration (also known as the Howard improvement algorithm), sketched below.
• Under some conditions, it can be faster than value function iteration (more on this later).
• Most of the next slides apply to policy function iteration without any (material) change.
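A minimal sketch of the Howard improvement algorithm for the discrete representation used above: alternate exact policy evaluation (a linear solve) with policy improvement (one max step):

```python
import numpy as np

def policy_iteration(u, p, beta, max_iter=1_000):
    """Howard improvement algorithm on a discrete MDP."""
    S, A = u.shape
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve (I - beta * P_sigma) V = u_sigma exactly
        P_sigma = p[np.arange(S), policy]          # (S, S) transitions under the policy
        u_sigma = u[np.arange(S), policy]          # (S,) flow utility under the policy
        V = np.linalg.solve(np.eye(S) - beta * P_sigma, u_sigma)
        # Policy improvement: one application of the max operator
        Q = u + beta * (p @ V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):     # policy unchanged: optimum found
            return V, policy
        policy = new_policy
    return V, policy
```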
Normalization
• Three advantages:
2. Stability properties.
Initial value in finite time problems
Initial guesses for infinite time problems
• That does not mean we should not be smart in picking our initial guess.
1. Steady state of the problem (if one exists). Usually saves at least one iteration.
2. Perturbation approximation.
Discretization
• In the case where we have a continuous state space, we need to discretize it into a grid.
• How do we do that?
New approximated problem
• Exact problem:

V(s) = \max_{a \in A(s)} \left\{ (1 - \beta)\, u(s, a) + \beta \int V(s') \, p(ds'|s, a) \right\}

• Approximated problem:

\widehat{V}(s) = \max_{a \in \widehat{A}(s)} \left\{ (1 - \beta)\, u(s, a) + \beta \sum_{k=1}^{N} \widehat{V}(s_k') \, p_N(s_k'|s, a) \right\}
Grid generation
2. How to approximate p by p_N.
Uniform grid
Non-uniform grid
Discretizing stochastic process
• Recall that E[z] = μ_z and Var[z] = σ_z² = σ_ε² / (1 − ρ²).
• Set the endpoints as z_N = μ_z + m σ_z and z_1 = μ_z − m σ_z.
• z_2, z_3, ..., z_{N−1} are equispaced over the interval [z_1, z_N], with z_k < z_{k+1} for any k ∈ {1, 2, ..., N − 1}.
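A standard way to assign transition probabilities to this grid is the Tauchen (1986) construction, which gives each interval around a grid point its conditional normal probability. A sketch, assuming the process is z′ = (1 − ρ)μ_z + ρz + ε′ with ε′ ∼ N(0, σ_ε²) (the intercept is an assumption, chosen so that E[z] = μ_z):

```python
import numpy as np
from scipy.stats import norm

def tauchen(N, rho, sigma_eps, mu_z=0.0, m=3):
    """Discretize z' = (1 - rho) * mu_z + rho * z + eps', eps' ~ N(0, sigma_eps^2),
    on an equispaced grid of N points spanning m unconditional standard deviations."""
    sigma_z = sigma_eps / np.sqrt(1.0 - rho**2)       # unconditional std of z
    z = np.linspace(mu_z - m * sigma_z, mu_z + m * sigma_z, N)
    w = z[1] - z[0]                                   # distance between grid points
    P = np.empty((N, N))
    for i in range(N):
        cond_mean = (1.0 - rho) * mu_z + rho * z[i]   # E[z' | z = z_i]
        upper = norm.cdf((z + w / 2 - cond_mean) / sigma_eps)
        lower = norm.cdf((z - w / 2 - cond_mean) / sigma_eps)
        P[i, :] = upper - lower                       # mass of the interval around each point
        P[i, 0] = upper[0]                            # left tail goes to z_1
        P[i, -1] = 1.0 - lower[-1]                    # right tail goes to z_N
    return z, P
```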
Example: State Space (Con’t)
Transition I
Transition II
Transition probability
VAR(1) case: transition probability
• Consider a transition from z_i = (z^1_{i_1}, z^2_{i_2}, ..., z^K_{i_K}) to z_j = (z^1_{j_1}, z^2_{j_2}, ..., z^K_{j_K}).
• Therefore, π_{i,j} = ∏_{k=1}^{K} π^k_{i_k, j_k}.
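When each component k is discretized separately into a marginal chain with transition matrix Π^k, the product formula above is exactly a Kronecker product. A short sketch, assuming the joint state index orders the components with the last one varying fastest:

```python
import numpy as np
from functools import reduce

def joint_transition(marginals):
    """Joint transition matrix pi_{i,j} = prod_k pi^k_{i_k, j_k}.

    marginals : list of K marginal transition matrices, one per component.
    The joint index i corresponds to the tuple (i_1, ..., i_K) with i_K varying fastest.
    """
    return reduce(np.kron, marginals)
```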
Example
Quadrature grid
• Gaussian quadrature: we require the previous equation to be exact for all polynomials of degree less than or equal to 2N − 1.
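A sketch of how such nodes and weights are used for a normal shock ε ∼ N(0, σ²): Gauss-Hermite quadrature with N nodes, after the change of variable ε = √2 σ x, integrates exactly all polynomials of degree up to 2N − 1:

```python
import numpy as np

def expectation_normal(f, sigma, N=10):
    """Approximate E[f(eps)] for eps ~ N(0, sigma^2) with N-node Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(N)      # nodes and weights for weight e^{-x^2}
    # Change of variable eps = sqrt(2) * sigma * x; 1/sqrt(pi) normalizes the weights
    return np.sum(w * f(np.sqrt(2.0) * sigma * x)) / np.sqrt(np.pi)
```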
Rouwenhorst (1995) Method
• Consider again z′ = ρz + ε′ with ε′ ∼ N(0, σ_ε²), i.i.d.
• Transition probability matrix Θ_N.
• Set the endpoints as z_N = σ_z √(N − 1) ≡ ψ and z_1 = −ψ.
1. For n = 2, define Θ_2.
State and transition probability
• Define p = q = (1 + ρ)/2 (under the assumption of a symmetric distribution) and

\Theta_2 = \begin{bmatrix} p & 1-p \\ 1-q & q \end{bmatrix}

• Compute Θ_n by:

\Theta_n = p \begin{bmatrix} \Theta_{n-1} & 0 \\ 0' & 0 \end{bmatrix}
+ (1 - p) \begin{bmatrix} 0 & \Theta_{n-1} \\ 0 & 0' \end{bmatrix}
+ (1 - q) \begin{bmatrix} 0' & 0 \\ \Theta_{n-1} & 0 \end{bmatrix}
+ q \begin{bmatrix} 0 & 0' \\ 0 & \Theta_{n-1} \end{bmatrix}
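A sketch of the recursion in code. One detail the displayed recursion leaves implicit: after summing the four blocks, every row of Θ_n except the first and the last adds up to 2, so the standard construction rescales those interior rows by 1/2 to keep Θ_n a proper transition matrix:

```python
import numpy as np

def rouwenhorst(N, rho, sigma_eps):
    """Rouwenhorst (1995) discretization of z' = rho * z + eps', eps' ~ N(0, sigma_eps^2)."""
    p = q = (1.0 + rho) / 2.0
    psi = sigma_eps / np.sqrt(1.0 - rho**2) * np.sqrt(N - 1)   # sigma_z * sqrt(N - 1)
    z = np.linspace(-psi, psi, N)
    Theta = np.array([[p, 1.0 - p],
                      [1.0 - q, q]])
    for n in range(3, N + 1):
        T = np.zeros((n, n))
        T[:-1, :-1] += p * Theta          # p * [Theta 0; 0' 0]
        T[:-1, 1:] += (1.0 - p) * Theta   # (1-p) * [0 Theta; 0 0']
        T[1:, :-1] += (1.0 - q) * Theta   # (1-q) * [0' 0; Theta 0]
        T[1:, 1:] += q * Theta            # q * [0 0'; 0 Theta]
        T[1:-1, :] /= 2.0                 # interior rows sum to 2: rescale them
        Theta = T
    return z, Theta
```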
Invariant distribution
• The distribution generated by Θ_N converges to the invariant distribution λ^{(N)} = (λ_1^{(N)}, ..., λ_N^{(N)}) with

\lambda_i^{(N)} = \binom{N-1}{i-1} \, s^{\,i-1} (1 - s)^{N-i}

where

s = \frac{1 - p}{2 - (p + q)}

• From this invariant distribution, we can compute moments associated with Θ_N analytically.
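A short numerical check of the formula, using math.comb for the binomial coefficient; λ^{(N)} should be a left eigenvector of Θ_N (the rouwenhorst sketch above can supply Θ_N):

```python
import numpy as np
from math import comb

def rouwenhorst_invariant(N, p, q):
    """Binomial invariant distribution: lambda_i = C(N-1, i-1) * s^(i-1) * (1-s)^(N-i)."""
    s = (1.0 - p) / (2.0 - (p + q))
    return np.array([comb(N - 1, i - 1) * s**(i - 1) * (1.0 - s)**(N - i)
                     for i in range(1, N + 1)])

# Usage sketch: lam should satisfy lam @ Theta = lam (up to floating-point error)
# z, Theta = rouwenhorst(5, rho=0.95, sigma_eps=0.01)
# lam = rouwenhorst_invariant(5, p=(1 + 0.95) / 2, q=(1 + 0.95) / 2)
# print(np.max(np.abs(lam @ Theta - lam)))
```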
Which method is better?
• Kopecky and Suen (2010) argue that the Rouwenhorst method is the best approximation, especially for high persistence (ρ → 1).
• Test bed:

V(k, a) = \max_{c,\, k' \ge 0} \left\{ \log(c) + \beta \int V(k', a') \, dF(a'|a) \right\}
Stochastic grid
• Why?
Interpolation
• Problem: in more than one dimension, linear interpolation may not preserve concavity.
[Figure: value function V(k_t) plotted against the capital grid k_t]
Multigrid algorithms
• Basic idea: solve the problem first on a coarser grid and use the result as a guess for the more refined solution (see the sketch after the examples).
• Examples:
1. Differential equations.
2. Projection methods.
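A sketch of the idea for value function iteration on a capital grid, assuming a user-supplied solver solve_vfi(k_grid, V0) that returns the converged value function on a given grid:

```python
import numpy as np

def multigrid_guess(solve_vfi, k_coarse, k_fine):
    """Solve on a coarse grid, then interpolate the result as a starting guess on the fine grid."""
    V_coarse = solve_vfi(k_coarse, np.zeros(len(k_coarse)))
    V0_fine = np.interp(k_fine, k_coarse, V_coarse)    # linear interpolation of the coarse solution
    return solve_vfi(k_fine, V0_fine)
```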
Applying the algorithm
• After deciding initialization and discretization, we still need to implement each step:

V^{T}(s) = \max_{a \in A(s)} \left\{ u(s, a) + \beta \int V^{T-1}(s') \, p(ds'|s, a) \right\}

1. Maximization.
2. Integral.
Maximization
• Brute force (always works): check all the possible choices in the grid.
Brute force
• Sometimes we do not have any alternative. Examples: problems with discrete choices, non-differentiabilities, non-convex constraints, etc.
• Still, the search can often be sped up by exploiting (see the sketch below):
1. Previous solution.
2. Monotonicity of choices.
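A sketch of how the monotonicity idea speeds up brute force: if the optimal choice index is nondecreasing in the state (as in typical savings problems), the search at state i can start from the optimizer found at state i − 1. The action-value array Q(i, j) is an illustrative input:

```python
import numpy as np

def monotone_grid_search(Q):
    """Brute-force maximization row by row, restarting each search at the previous optimizer.

    Q : (S, A) array of action values; assumes the optimal choice index is nondecreasing in the state.
    """
    S, A = Q.shape
    policy = np.zeros(S, dtype=int)
    V = np.empty(S)
    j_start = 0
    for i in range(S):
        j = j_start + np.argmax(Q[i, j_start:])   # search only to the right of the previous optimum
        policy[i], V[i] = j, Q[i, j]
        j_start = j                               # monotonicity: next optimum is at least here
    return V, policy
```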
Newton or Quasi-Newton
• Much quicker.
• However:
Generalized policy iteration
• Often, the value function keeps being updated across iterations while the optimal choices are no longer changing.
• This suggests a simple strategy, sketched below: apply the max operator only from time to time.
• How do we choose the optimal timing of the max operator (i.e., the relative sweeps of value and
policy)?
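A sketch of this strategy, often called modified policy iteration: apply the max operator once, then perform k cheap policy-evaluation sweeps before re-optimizing (array conventions as in the earlier sketches):

```python
import numpy as np

def modified_policy_iteration(u, p, beta, k=10, tol=1e-8, max_iter=10_000):
    """Interleave one max step with k policy-evaluation sweeps that skip the max operator."""
    S = u.shape[0]
    V = np.zeros(S)
    for _ in range(max_iter):
        # Policy improvement: one application of the max operator
        Q = u + beta * (p @ V)
        policy = Q.argmax(axis=1)
        TV = Q[np.arange(S), policy]
        # k policy-evaluation sweeps without re-optimizing
        for _ in range(k):
            TV = u[np.arange(S), policy] + beta * (p[np.arange(S), policy] @ TV)
        if np.max(np.abs(TV - V)) < tol:
            return TV, policy
        V = TV
    return V, policy
```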
How do we integrate?
• Exact integration.
• Quadrature.
• Monte Carlo.
Convergence assessment
Non-local accuracy test
• Given the approximated decision rules c^i(k_t, z_t) and k^i(k_t, z_t), we can define:

EE^i(k_t, z_t) \equiv 1 - c^i(k_t, z_t) \, E_t \left[ \frac{\alpha e^{z_{t+1}} \, k^i(k_t, z_t)^{\alpha - 1}}{c^i\left(k^i(k_t, z_t),\, z_{t+1}\right)} \right]
• Units of reporting.
• Interpretation.
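A sketch of how the error above can be computed and reported on a grid, taking the approximated decision rules c^i and k^i as given (vectorized) functions and a discretized shock process for the conditional expectation. Errors are commonly reported as log10 |EE|, so that, for example, a value of −3 means a one-dollar mistake per thousand dollars of consumption. Depending on how the model's Euler equation is written, a discount factor β may multiply the expectation; the code follows the definition displayed above:

```python
import numpy as np

def euler_errors(c_pol, k_pol, k_grid, z_grid, P, alpha):
    """log10 absolute Euler equation errors EE^i(k, z) on a grid.

    c_pol(k, z), k_pol(k, z) : approximated decision rules, vectorized over k
    z_grid, P                : discretized shock values and transition matrix P[j, j']
    """
    errors = np.empty((len(k_grid), len(z_grid)))
    for j, z in enumerate(z_grid):
        kp = k_pol(k_grid, z)                               # k^i(k_t, z_t)
        # E_t[ alpha * e^{z'} * kp^(alpha-1) / c^i(kp, z') ] via the discrete transition
        expec = sum(P[j, jp] * alpha * np.exp(zp) * kp**(alpha - 1.0) / c_pol(kp, zp)
                    for jp, zp in enumerate(z_grid))
        EE = 1.0 - c_pol(k_grid, z) * expec
        errors[:, j] = np.log10(np.abs(EE) + 1e-16)         # avoid log10(0)
    return errors
```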
Error analysis
• How?
• Advantages of procedure.
• Problems.
The endogenous grid method
• It is actually easier to understand with a concrete example: a basic stochastic neoclassical growth
model.
Changing state variables
• We use a capital Y_t to denote total market resources and a lower-case y_t for the production function.
• More general point: changes of variables are often key in solving our problems.
• Rewriting the Bellman equation in terms of Y_t, we get:

V(Y_t, z_t) = \max_{k_{t+1}} \left\{ \frac{(Y_t - k_{t+1})^{1-\tau}}{1 - \tau} + \tilde{V}(k_{t+1}, z_t) \right\}
Backing up consumption
• In the standard approach, the first-order condition is a nonlinear equation in k*_{t+1} for each point in a grid for k_t, which has to be solved with a root-finder.
• The key difference is, thus, that the endogenous grid method defines a fixed grid over the values of k_{t+1} instead of over the values of k_t.
• This implies that we already know what values the policy function for next period's capital takes and, thus, we can skip the root-finding.
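A minimal sketch of one endogenous grid step for the problem above, assuming CRRA utility with parameter τ and that the derivative of Ṽ with respect to k_{t+1} is available on the k_{t+1} grid (for example, from the previous iteration). The first-order condition (Y_t − k_{t+1})^{−τ} = ∂Ṽ/∂k_{t+1} is then inverted analytically, which is exactly why no root-finding is needed:

```python
import numpy as np

def egm_step(k_next_grid, dVtilde_dk, tau):
    """One endogenous grid step: from a fixed grid over k_{t+1} and the derivative of
    Vtilde(k_{t+1}, z_t) on that grid, back out consumption and the endogenous grid of
    market resources Y_t, with no root-finding.

    dVtilde_dk : array of dVtilde/dk_{t+1} evaluated at k_next_grid (for a given z_t)
    """
    c = dVtilde_dk ** (-1.0 / tau)        # invert the FOC: (Y - k')^(-tau) = dVtilde/dk'
    Y_endog = c + k_next_grid             # endogenous grid of market resources: Y_t = c_t + k_{t+1}
    return c, Y_endog

# Usage sketch: interpolate the pairs (Y_endog, c) to get the consumption policy c(Y, z_t)
# on any exogenous grid of market resources, e.g. np.interp(Y_grid, Y_endog, c).
```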