Quantecon Python Advanced
II LQ Control
5 Information and Consumption Smoothing
5.1 Overview
5.2 Two Representations of One Nonfinancial Income Process
5.3 Application of Kalman filter
5.4 News Shocks and Less Informative Shocks
5.5 Representation of 𝜖𝑡 Shock in Terms of Future 𝑦𝑡
5.6 Representation in Terms of 𝑎𝑡 Shocks
5.7 Permanent Income Consumption-Smoothing Model
5.8 State Space Representations
5.9 Computations
5.10 Simulating Income Process and Two Associated Shock Processes
5.11 Calculating Innovations in Another Way
5.12 Another Invertibility Issue
11.4 Better Representation of Roll-Over Risk
17.8 Cattle Cycles
17.9 Models of Occupational Choice and Pay
17.10 Permanent Income Models
17.11 Gorman Heterogeneous Households
17.12 Non-Gorman Heterogeneous Households
26 Etymology of Entropy
26.1 Information Theory
26.2 A Measure of Unpredictability
26.3 Mathematical Properties of Entropy
26.4 Conditional Entropy
26.5 Independence as Maximum Conditional Entropy
26.6 Thermodynamics
26.7 Statistical Divergence
26.8 Continuous distributions
26.9 Relative entropy and Gaussian distributions
26.10 Von Neumann Entropy
26.11 Backus-Chernov-Zin Entropy
26.12 Wiener-Kolmogorov Prediction Error Formula as Entropy
26.13 Multivariate Processes
26.14 Frequency Domain Robust Control
26.15 Relative Entropy for a Continuous Random Variable
27 Robustness
27.1 Overview
27.2 The Model
27.3 Constructing More Robust Policies
27.4 Robustness as Outcome of a Two-Person Zero-Sum Game
27.5 The Stochastic Case
27.6 Implementation
27.7 Application
27.8 Appendix
32.1 Overview
32.2 A Control Problem
32.3 Finite Horizon Theory
32.4 Infinite Horizon Limit
32.5 Undiscounted Problems
32.6 Implementation
32.7 Exercises
37.6 Adding Views
37.7 Bayesian Interpretation
37.8 Curve Decolletage
37.9 Black-Litterman Recommendation as Regularization
37.10 A Robust Control Operator
37.11 A Robust Mean-Variance Portfolio Model
37.12 Appendix
37.13 Special Case – IID Sample
37.14 Dependence and Sampling Frequency
37.15 Frequency and the Mean Estimator
42.6 A Gradient Descent Algorithm
42.7 A More Structured ML Algorithm
42.8 Continuation Values
42.9 Adding Some Human Intelligence
42.10 What has Machine Learning Taught Us?
48.1 Overview
48.2 The Economy
48.3 Long Simulation
48.4 Asymptotic Mean and Rate of Convergence
IX Other
51 Troubleshooting
51.1 Fixing Your Local Environment
51.2 Reporting an Issue
52 References
Bibliography
Index
Advanced Quantitative Economics with Python
– Estimation of Spectra
– Additive and Multiplicative Functionals
– Classical Control with Linear Algebra
– Classical Prediction and Filtering With Linear Algebra
– Knowing the Forecasts of Others
• Asset Pricing and Finance
– Asset Pricing II: The Lucas Asset Pricing Model
– Elementary Asset Pricing Theory
– Two Modifications of Mean-Variance Portfolio Theory
– Irrelevance of Capital Structures with Complete Markets
– Equilibrium Capital Structures with Incomplete Markets
• Dynamic Programming Squared
– Optimal Unemployment Insurance
– Stackelberg Plans
– Machine Learning a Ramsey Plan
– Time Inconsistency of Ramsey Plans
– Sustainable Plans for a Calvo Model
– Optimal Taxation with State-Contingent Debt
– Optimal Taxation without State-Contingent Debt
– Fluctuating Interest Rates Deliver Fiscal Insurance
– Fiscal Risk and Government Debt
– Competitive Equilibria of a Model of Chang
– Credible Government Policies in a Model of Chang
• Other
– Troubleshooting
– References
– Execution Statistics
Part I
CHAPTER
ONE
1.1 Overview
Orthogonal projection is a cornerstone of vector space methods, with many diverse applications.
These include
• Least squares projection, also known as linear regression
• Conditional expectations for multivariate normal (Gaussian) distributions
• Gram–Schmidt orthogonalization
• QR decomposition
• Orthogonal polynomials
• etc
In this lecture, we focus on
• key ideas
• least squares regression
We’ll require the following imports:
import numpy as np
from scipy.linalg import qr
For background and foundational concepts, see our lecture on linear algebra.
For more proofs and greater theoretical detail, see A Primer in Econometric Theory.
For a complete set of proofs in a general setting, see, for example, [Roman, 2005].
For an advanced treatment of projection in the context of least squares prediction, see this book chapter.
Assume $x, z \in \mathbb{R}^n$.
Define $\langle x, z \rangle = \sum_i x_i z_i$.
Recall $\|x\|^2 = \langle x, x \rangle$.
The law of cosines states that ⟨𝑥, 𝑧⟩ = ‖𝑥‖‖𝑧‖ cos(𝜃) where 𝜃 is the angle between the vectors 𝑥 and 𝑧.
When ⟨𝑥, 𝑧⟩ = 0, then cos(𝜃) = 0 and 𝑥 and 𝑧 are said to be orthogonal and we write 𝑥 ⟂ 𝑧.
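As a small numerical illustration of these definitions, the following sketch computes the inner product, the norms and the cosine of the angle for two vectors. The vectors `x` and `z` are hypothetical values chosen for illustration and do not come from the lecture.

```python
import numpy as np

# Hypothetical vectors in R^3, chosen so that they are orthogonal
x = np.array([1.0, 2.0, 0.0])
z = np.array([-2.0, 1.0, 3.0])

inner = x @ z                    # <x, z> = sum_i x_i z_i
norm_x = np.sqrt(x @ x)          # ||x|| = sqrt(<x, x>)
norm_z = np.sqrt(z @ z)
cos_theta = inner / (norm_x * norm_z)

print(inner, cos_theta)          # both are zero for these vectors, so x ⟂ z
```

The same norms can be obtained with `np.linalg.norm`, which is usually the more convenient choice in practice.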
$$
\hat y := \operatorname*{arg\,min}_{z \in S} \|y - z\|
$$
Hence $\|y - z\| \geq \|y - \hat y\|$, which completes the proof.
For a linear space 𝑌 and a fixed linear subspace 𝑆, we have a functional relationship
Orthogonal Complement
Let 𝑆 ⊂ ℝ𝑛 .
The orthogonal complement of 𝑆 is the linear subspace 𝑆 ⟂ that satisfies 𝑥1 ⟂ 𝑥2 for every 𝑥1 ∈ 𝑆 and 𝑥2 ∈ 𝑆 ⟂ .
Let 𝑌 be a linear space with linear subspace 𝑆 and its orthogonal complement 𝑆 ⟂ .
We write
𝑌 = 𝑆 ⊕ 𝑆⟂
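to indicate that for every $y \in Y$ there is a unique $x_1 \in S$ and a unique $x_2 \in S^\perp$ such that $y = x_1 + x_2$.
Moreover, $x_1 = \hat E_S y$ and $x_2 = y - \hat E_S y$.
This amounts to another version of the OPT:
Theorem. If $S$ is a linear subspace of $\mathbb{R}^n$, $\hat E_S y = P y$ and $\hat E_{S^\perp} y = M y$, then
$$
P y \perp M y \quad \text{and} \quad y = P y + M y \quad \text{for all } y \in \mathbb{R}^n
$$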
To see this, observe that since 𝑥 ∈ span{𝑢1 , … , 𝑢𝑘 }, we can find scalars 𝛼1 , … , 𝛼𝑘 that verify
$$
x = \sum_{j=1}^k \alpha_j u_j \tag{1.1}
$$
When we have an orthonormal basis for the subspace onto which we project, computing the projection simplifies:
Theorem If {𝑢1 , … , 𝑢𝑘 } is an orthonormal basis for 𝑆, then
$$
P y = \sum_{i=1}^k \langle y, u_i \rangle u_i, \quad \forall \, y \in \mathbb{R}^n \tag{1.2}
$$
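The formula for projecting onto the span of an orthonormal basis can be checked numerically. The sketch below uses a hypothetical orthonormal pair `u1`, `u2` in $\mathbb{R}^3$ and a hypothetical vector `y`; none of these values come from the lecture.

```python
import numpy as np

# Hypothetical orthonormal basis of a two-dimensional subspace S of R^3
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)
y = np.array([2.0, 3.0, -1.0])

# Py = sum_i <y, u_i> u_i
Py = (y @ u1) * u1 + (y @ u2) * u2

# The residual y - Py should be orthogonal to every basis vector
residual = y - Py
print(Py, residual @ u1, residual @ u2)
```

No matrix inversion is needed here, which is the practical payoff of working with an orthonormal basis.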
𝐸𝑆̂ 𝑦 = 𝑃 𝑦
𝑃 = 𝑋(𝑋 ′ 𝑋)−1 𝑋 ′
An expression of the form 𝑋𝑎 is precisely a linear combination of the columns of 𝑋 and hence an element of 𝑆.
Claim 2 is equivalent to the statement
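It is common in applications to start with an $n \times k$ matrix $X$ with linearly independent columns and let
$$
S := \operatorname{span} X := \operatorname{span}\{\operatorname{col}_1 X, \ldots, \operatorname{col}_k X\}
$$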
$$
P y = U (U' U)^{-1} U' y
$$
We have recovered our earlier result about projecting onto the span of an orthonormal basis.
$$
\hat \beta := (X' X)^{-1} X' y
$$
$$
X \hat \beta = X (X' X)^{-1} X' y = P y
$$
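To make the relationship between $\hat \beta$ and the projection matrix concrete, here is a minimal sketch on randomly generated data. The matrix `X`, the vector `y` and the seed are hypothetical; the sketch solves linear systems with `np.linalg.solve` instead of forming the inverse explicitly, which is a numerical choice and not part of the lecture's derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
X = rng.standard_normal((n, k))   # hypothetical n x k matrix with independent columns
y = rng.standard_normal(n)        # hypothetical vector to be projected

# beta_hat = (X'X)^{-1} X'y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# P = X (X'X)^{-1} X' and the annihilator M = I - P
P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P

print(np.allclose(X @ beta_hat, P @ y))               # X beta_hat equals Py
print(np.allclose(P @ P, P), np.allclose(P, P.T))     # P is idempotent and symmetric
```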
Because 𝑋𝑏 ∈ span(𝑋)
If probabilities and hence 𝔼 are unknown, we cannot solve this problem directly.
However, if a sample is available, we can estimate the risk with the empirical risk:
$$
\min_{f \in \mathcal{F}} \frac{1}{N} \sum_{n=1}^N (y_n - f(x_n))^2
$$
1.6.2 Solution
$$
\hat \beta := (X' X)^{-1} X' y
$$
$$
\hat y := X \hat \beta = P y
$$
$$
\hat u := y - \hat y = y - P y = M y
$$
Let’s return to the connection between linear independence and orthogonality touched on above.
A result of much interest is a famous algorithm for constructing orthonormal sets from linearly independent sets.
The next section gives details.
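Theorem For each linearly independent set $\{x_1, \ldots, x_k\} \subset \mathbb{R}^n$, there exists an orthonormal set $\{u_1, \ldots, u_k\}$ with
$$
\operatorname{span}\{x_1, \ldots, x_i\} = \operatorname{span}\{u_1, \ldots, u_i\} \quad \text{for } i = 1, \ldots, k
$$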
1.7.2 QR Decomposition
The following result uses the preceding algorithm to produce a useful decomposition.
Theorem If 𝑋 is 𝑛 × 𝑘 with linearly independent columns, then there exists a factorization 𝑋 = 𝑄𝑅 where
• 𝑅 is 𝑘 × 𝑘, upper triangular, and nonsingular
• 𝑄 is 𝑛 × 𝑘 with orthonormal columns
Proof sketch: Let
• 𝑥𝑗 ∶= col𝑗 (𝑋)
• {𝑢1 , … , 𝑢𝑘 } be orthonormal with the same span as {𝑥1 , … , 𝑥𝑘 } (to be constructed using Gram–Schmidt)
• 𝑄 be formed from cols 𝑢𝑖
Since 𝑥𝑗 ∈ span{𝑢1 , … , 𝑢𝑗 }, we have
$$
x_j = \sum_{i=1}^j \langle u_i, x_j \rangle u_i \quad \text{for } j = 1, \ldots, k
$$
For matrices 𝑋 and 𝑦 that overdetermine 𝛽 in the linear equation system 𝑦 = 𝑋𝛽, we found the least squares approximator
$$
\hat \beta = (X' X)^{-1} X' y.
$$
Using the QR decomposition $X = QR$ gives
$$
\begin{aligned}
\hat \beta &= (R' Q' Q R)^{-1} R' Q' y \\
           &= (R' R)^{-1} R' Q' y \\
           &= R^{-1} (R')^{-1} R' Q' y = R^{-1} Q' y
\end{aligned}
$$
Numerical routines would in this case use the alternative form 𝑅𝛽 ̂ = 𝑄′ 𝑦 and back substitution.
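A minimal sketch of the back substitution approach is shown below, using `scipy.linalg.qr` and `scipy.linalg.solve_triangular`. The data are the `X` and `y` from Exercise 1.8.3; comparing against the normal equations is an illustrative check, not part of the lecture.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

X = np.array([[1.0, 0.0],
              [0.0, -6.0],
              [2.0, 2.0]])
y = np.array([1.0, 3.0, -3.0])

# Economy-size QR: Q is n x k with orthonormal columns, R is k x k upper triangular
Q, R = qr(X, mode='economic')

# Solve R beta = Q'y by back substitution (R is upper triangular)
beta_qr = solve_triangular(R, Q.T @ y)

# Normal equations, for comparison only
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(beta_qr, beta_ne))
```

Working with $R$ avoids forming $X'X$, whose condition number is the square of that of $X$.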
1.8 Exercises
Exercise 1.8.1
Show that, for any linear subspace $S \subset \mathbb{R}^n$, $S \cap S^\perp = \{0\}$.
Exercise 1.8.2
Let $P = X (X' X)^{-1} X'$ and let $M = I - P$. Show that $P$ and $M$ are both idempotent and symmetric. Can you give any intuition as to why they should be idempotent?
Exercise 1.8.3
Using Gram-Schmidt orthogonalization, produce a linear projection of $y$ onto the column space of $X$ and verify this using the projection matrix $P := X (X' X)^{-1} X'$ and also using QR decomposition, where:
$$
y := \begin{pmatrix} 1 \\ 3 \\ -3 \end{pmatrix}
\quad \text{and} \quad
X := \begin{pmatrix} 1 & 0 \\ 0 & -6 \\ 2 & 2 \end{pmatrix}
$$
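def gram_schmidt(X):
    """
    Implements Gram-Schmidt orthogonalization.

    Parameters
    ----------
    X : an n x k array with linearly independent columns

    Returns
    -------
    U : an n x k array with orthonormal columns
    """
    # Set up
    n, k = X.shape
    U = np.empty((n, k))
    I = np.eye(n)

    # The first column of U is the normalized first column of X
    v1 = X[:, 0]
    U[:, 0] = v1 / np.sqrt(np.sum(v1 * v1))

    # Remaining columns (standard Gram-Schmidt step, reconstructed here)
    for i in range(1, k):
        b = X[:, i]        # The vector we are going to project
        Z = X[:, 0:i]      # The first i columns of X

        # Project b onto the orthogonal complement of the column span of Z
        M = I - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
        u = M @ b

        # Normalize
        U[:, i] = u / np.sqrt(np.sum(u * u))

    return U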
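y = [1, 3, -3]
X = [[1,  0],
     [0, -6],
     [2,  2]]
# Convert to NumPy arrays so that X.shape and the @ operator work below
X, y = [np.asarray(z) for z in (X, y)]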
First, let’s try projection of 𝑦 onto the column space of 𝑋 using the ordinary matrix expression:
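A minimal sketch of this first computation is given below. The name `Py1` is chosen here to parallel `Py2` and `Py3`, which appear later in the solution; it is an assumption about naming and not taken from the text.

```python
import numpy as np

X = np.array([[1, 0],
              [0, -6],
              [2, 2]])
y = np.array([1, 3, -3])

# Projection of y onto the column space of X via P = X (X'X)^{-1} X'
Py1 = X @ np.linalg.inv(X.T @ X) @ X.T @ y
print(Py1)
```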
Now let’s do the same using an orthonormal basis created from our gram_schmidt function
U = gram_schmidt(X)
U
Py2 = U @ U.T @ y
Py2
This is the same answer. So far so good. Finally, let’s try the same thing but with the basis obtained via QR decomposition:
Q, R = qr(X, mode='economic')
Q
array([[-0.4472136 , -0.13187609],
[-0. , -0.98907071],
[-0.89442719, 0.06593805]])
Py3 = Q @ Q.T @ y
Py3
CHAPTER
TWO
In addition to what’s in Anaconda, this lecture will need the following libraries:
!pip install --upgrade quantecon
2.1 Overview
In a previous lecture, we learned about finite Markov chains, a relatively elementary class of stochastic dynamic models.
The present lecture extends this analysis to continuous (i.e., uncountable) state Markov chains.
Most stochastic dynamic models studied by economists either fit directly into this class or can be represented as continuous
state Markov chains after minor modifications.
In this lecture, our focus will be on continuous Markov models that
• evolve in discrete-time
• are often nonlinear
The fact that we accommodate nonlinear models here is significant, because linear stochastic models have their own highly
developed toolset, as we’ll see later on.
The question that interests us most is: Given a particular stochastic dynamic model, how will the state of the system
evolve over time?
In particular,
• What happens to the distribution of the state variables?
• Is there anything we can say about the “average behavior” of these variables?
• Is there a notion of “steady state” or “long-run equilibrium” that’s applicable to the model?
– If so, how can we compute it?
Answering these questions will lead us to revisit many of the topics that occupied us in the finite state case, such as
simulation, distribution dynamics, stability, ergodicity, etc.
Note: For some people, the term “Markov chain” always refers to a process with a finite or discrete state space. We
follow the mainstream mathematical literature (e.g., [Meyn and Tweedie, 2009]) in using the term to refer to any discrete
time Markov process.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm, beta
from quantecon import LAE
from scipy.stats import norm, gaussian_kde
You are probably aware that some distributions can be represented by densities and some cannot.
(For example, distributions on the real numbers ℝ that put positive probability on individual points have no density
representation)
We are going to start our analysis by looking at Markov chains where the one-step transition probabilities have density
representations.
The benefit is that the density case offers a very direct parallel to the finite case in terms of notation and intuition.
Once we’ve built some intuition we’ll cover the general case.
In our lecture on finite Markov chains, we studied discrete-time Markov chains that evolve on a finite state space 𝑆.
In this setting, the dynamics of the model are described by a stochastic matrix — a nonnegative square matrix 𝑃 = 𝑃 [𝑖, 𝑗]
such that each row 𝑃 [𝑖, ⋅] sums to one.
The interpretation of 𝑃 is that 𝑃 [𝑖, 𝑗] represents the probability of transitioning from state 𝑖 to state 𝑗 in one unit of time.
In symbols,
ℙ{𝑋𝑡+1 = 𝑗 | 𝑋𝑡 = 𝑖} = 𝑃 [𝑖, 𝑗]
Equivalently,
• 𝑃 can be thought of as a family of distributions 𝑃 [𝑖, ⋅], one for each 𝑖 ∈ 𝑆
• 𝑃 [𝑖, ⋅] is the distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑖
(As you probably recall, when using NumPy arrays, 𝑃 [𝑖, ⋅] is expressed as P[i,:])
In this section, we’ll allow 𝑆 to be a subset of ℝ, such as
• ℝ itself
• the positive reals (0, ∞)
• a bounded interval (𝑎, 𝑏)
The family of discrete distributions 𝑃 [𝑖, ⋅] will be replaced by a family of densities 𝑝(𝑥, ⋅), one for each 𝑥 ∈ 𝑆.
Analogous to the finite state case, 𝑝(𝑥, ⋅) is to be understood as the distribution (density) of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥.
More formally, a stochastic kernel on 𝑆 is a function 𝑝 ∶ 𝑆 × 𝑆 → ℝ with the property that
1. 𝑝(𝑥, 𝑦) ≥ 0 for all 𝑥, 𝑦 ∈ 𝑆
2. ∫ 𝑝(𝑥, 𝑦)𝑑𝑦 = 1 for all 𝑥 ∈ 𝑆
$$
p_w(x, y) := \frac{1}{\sqrt{2 \pi}} \exp \left\{ - \frac{(y - x)^2}{2} \right\} \tag{2.1}
$$
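As a quick check that $p_w$ is a stochastic kernel, the sketch below codes (2.1) directly and integrates it over $y$ for one value of $x$. The evaluation points `1.5`, `0.3` and `1.1` are arbitrary, and the comparison with `scipy.stats.norm` uses the fact that $p_w(x, \cdot)$ is the $N(x, 1)$ density.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def p_w(x, y):
    "Kernel (2.1): the N(x, 1) density evaluated at y."
    return np.exp(-(y - x)**2 / 2) / np.sqrt(2 * np.pi)

# Property 2 of a stochastic kernel: p_w(x, .) integrates to one
total, _ = quad(lambda y: p_w(1.5, y), -np.inf, np.inf)
print(total)

# Same value as the normal density with mean x and unit variance
print(np.isclose(p_w(0.3, 1.1), norm.pdf(1.1, loc=0.3)))
```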
In the previous section, we made the connection between stochastic difference equation (2.2) and stochastic kernel (2.1).
In economics and time-series analysis we meet stochastic difference equations of all different shapes and sizes.
It will be useful for us if we have some systematic methods for converting stochastic difference equations into stochastic
kernels.
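To this end, consider the generic (scalar) stochastic difference equation given by
$$
X_{t+1} = \mu(X_t) + \sigma(X_t) \, \xi_{t+1} \tag{2.3}
$$
Here $\{\xi_t\}$ is IID with density $\phi$, and $\mu$ and $\sigma$ are given functions on $S$ with $\sigma(x) > 0$.
Example 2: The equation $X_{t+1} = \alpha X_t + (\beta + \gamma X_t^2)^{1/2} \xi_{t+1}$ is a special case of (2.3) with $\mu(x) = \alpha x$ and $\sigma(x) = (\beta + \gamma x^2)^{1/2}$.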
Example 3: With stochastic production and a constant savings rate, the one-sector neoclassical growth model leads to a law of motion for capital per worker such as
$$
k_{t+1} = s A_{t+1} f(k_t) + (1 - \delta) k_t \tag{2.6}
$$
Here
• 𝑠 is the rate of savings
• 𝐴𝑡+1 is a production shock
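$$
p(x, y) = \frac{1}{\sigma(x)} \phi \left( \frac{y - \mu(x)}{\sigma(x)} \right) \tag{2.7}
$$
For the growth model of Example 3, where $\mu(x) = (1 - \delta) x$, $\sigma(x) = s f(x)$ and $\phi$ is the density of the production shock, this becomes
$$
p(x, y) = \frac{1}{s f(x)} \phi \left( \frac{y - (1 - \delta) x}{s f(x)} \right) \tag{2.8}
$$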
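Formula (2.7) translates almost line for line into code. The sketch below is one way to build a kernel from given functions `mu` and `sigma` and a shock density `phi`; the helper name `make_kernel` and the parameter values `a`, `b`, `c` are hypothetical and used only for illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def make_kernel(mu, sigma, phi=norm.pdf):
    "Return p(x, y) = phi((y - mu(x)) / sigma(x)) / sigma(x), as in (2.7)."
    def p(x, y):
        return phi((y - mu(x)) / sigma(x)) / sigma(x)
    return p

# Hypothetical parameters for mu(x) = a x and sigma(x) = (b + c x^2)^{1/2}
a, b, c = 0.5, 1.0, 0.3
p_example = make_kernel(lambda x: a * x,
                        lambda x: np.sqrt(b + c * x**2))

# For each fixed x, p(x, .) should integrate to one
total, _ = quad(lambda y: p_example(2.0, y), -np.inf, np.inf)
print(total)
```

The division by `sigma(x)` outside `phi` is the change-of-variables factor; omitting it is a common source of kernels that do not integrate to one.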
In this section of our lecture on finite Markov chains, we asked the following question: If
1. {𝑋𝑡 } is a Markov chain with stochastic matrix 𝑃
2. the distribution of 𝑋𝑡 is known to be 𝜓𝑡
then what is the distribution of 𝑋𝑡+1 ?
Letting $\psi_{t+1}$ denote the distribution of $X_{t+1}$, the answer we gave was that
$$
\psi_{t+1}[j] = \sum_{i \in S} P[i, j] \, \psi_t[i]
$$
This intuitive equality states that the probability of being at 𝑗 tomorrow is the probability of visiting 𝑖 today and then
going on to 𝑗, summed over all possible 𝑖.
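In the density case, we just replace the sum with an integral and probability mass functions with densities, yielding
$$
\psi_{t+1}(y) = \int p(x, y) \psi_t(x) \, dx, \qquad \forall \, y \in S \tag{2.9}
$$
It is convenient to express this update through an operator $P$ that maps a density $\psi$ on $S$ into a new density $\psi P$, defined by
$$
(\psi P)(y) = \int p(x, y) \psi(x) \, dx \tag{2.10}
$$
This operator is usually called the Markov operator corresponding to $p$.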
Note: Unlike most operators, we write 𝑃 to the right of its argument, instead of to the left (i.e., 𝜓𝑃 instead of 𝑃 𝜓).
This is a common convention, with the intention being to maintain the parallel with the finite case — see here
With this notation, we can write (2.9) more succinctly as 𝜓𝑡+1 (𝑦) = (𝜓𝑡 𝑃 )(𝑦) for all 𝑦, or, dropping the 𝑦 and letting
“=” indicate equality of functions,
𝜓𝑡+1 = 𝜓𝑡 𝑃 (2.11)
Equation (2.11) tells us that if we specify a distribution for 𝜓0 , then the entire sequence of future distributions can be
obtained by iterating with 𝑃 .
It’s interesting to note that (2.11) is a deterministic difference equation.
Thus, by converting a stochastic difference equation such as (2.3) into a stochastic kernel 𝑝 and hence an operator 𝑃 , we
convert a stochastic difference equation into a deterministic one (albeit in a much higher dimensional space).
Note: Some people might be aware that discrete Markov chains are in fact a special case of the continuous Markov
chains we have just described. The reason is that probability mass functions are densities with respect to the counting
measure.
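The finite-state analogue of iterating on $\psi_{t+1} = \psi_t P$ is easy to sketch and may help fix ideas. The two-state stochastic matrix below is a hypothetical example, not one used in the lecture.

```python
import numpy as np

# Hypothetical two-state stochastic matrix (each row sums to one)
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

psi = np.array([1.0, 0.0])   # initial distribution: all mass on state 0

# Deterministic iteration psi_{t+1} = psi_t P, with psi as a row vector
for t in range(50):
    psi = psi @ P

print(psi)   # close to the stationary distribution of P
```

Writing `psi @ P` with `psi` on the left mirrors the convention of placing the operator to the right of its argument.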
2.2.4 Computation
To learn about the dynamics of a given process, it’s useful to compute and study the sequences of densities generated by
the model.
One way to do this is to try to implement the iteration described by (2.10) and (2.11) using numerical integration.
However, to produce 𝜓𝑃 from 𝜓 via (2.10), you would need to integrate at every 𝑦, and there is a continuum of such 𝑦.
Another possibility is to discretize the model, but this introduces errors of unknown size.
A nicer alternative in the present setting is to combine simulation with an elegant estimator called the look-ahead estimator.
Let’s go over the ideas with reference to the growth model discussed above, the dynamics of which we repeat here for convenience:
$$
k_{t+1} = s A_{t+1} f(k_t) + (1 - \delta) k_t \tag{2.12}
$$
Our aim is to compute the sequence {𝜓𝑡 } associated with this model and fixed initial condition 𝜓0 .
To approximate 𝜓𝑡 by simulation, recall that, by definition, 𝜓𝑡 is the density of 𝑘𝑡 given 𝑘0 ∼ 𝜓0 .
If we wish to generate observations of this random variable, all we need to do is
1. draw 𝑘0 from the specified initial condition 𝜓0
$$
\psi_t^n(y) = \frac{1}{n} \sum_{i=1}^n p(k_{t-1}^i, y) \tag{2.13}
$$
$$
\frac{1}{n} \sum_{i=1}^n p(k_{t-1}^i, y) \to \mathbb{E} \, p(k_{t-1}^i, y) = \int p(x, y) \psi_{t-1}(x) \, dx = \psi_t(y)
$$
2.2.5 Implementation
A class called LAE for estimating densities by this technique can be found in lae.py.
Given our use of the __call__ method, an instance of LAE acts as a callable object, which is essentially a function that
can store its own data (see this discussion).
This function returns the right-hand side of (2.13) using
• the data and stochastic kernel that it stores as its instance data
• the value 𝑦 as its argument
The function is vectorized, in the sense that if psi is such an instance and y is an array, then the call psi(y) acts
elementwise.
(This is the reason that we reshaped X and y inside the class — to make vectorization work)
Because the implementation is fully vectorized, it is about as efficient as it would be in C or Fortran.
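To show how such a callable object can be organized, here is a minimal sketch of a look-ahead estimator class. It is an illustration written for this passage, with the hypothetical name `LookAhead`, and is not a copy of the `LAE` source in `lae.py`; the linear Gaussian kernel and the sample used to exercise it are likewise assumptions.

```python
import numpy as np
from scipy.stats import norm

class LookAhead:
    "Sketch of a look-ahead estimator: psi(y) = (1/n) sum_i p(X_i, y)."

    def __init__(self, p, X):
        self.p = p
        self.X = np.asarray(X).reshape(-1, 1)     # n x 1 column of observations

    def __call__(self, y):
        y = np.asarray(y).reshape(1, -1)          # 1 x k row of evaluation points
        # Broadcasting gives an n x k array; average over the n observations
        return self.p(self.X, y).mean(axis=0)

# Hypothetical kernel for X' = 0.5 X + xi with xi ~ N(0, 1)
def p(x, y):
    return norm.pdf(y, loc=0.5 * x, scale=1.0)

rng = np.random.default_rng(0)
X = rng.normal(0.0, np.sqrt(4 / 3), size=5000)    # draws from the stationary N(0, 4/3)

psi_hat = LookAhead(p, X)
ygrid = np.linspace(-3, 3, 7)
vals = psi_hat(ygrid)
print(vals)
```

Reshaping the observations into a column and the evaluation points into a row is what lets a single call to `p` produce every pairwise value at once.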
2.2.6 Example
The following code is an example of usage for the stochastic growth model described above
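# == Define parameters == #
s = 0.2
δ = 0.1
a_σ = 0.4                    # A = exp(B) where B ~ N(0, a_σ)
α = 0.4                      # We set f(k) = k**α
ψ_0 = beta(5, 5, scale=0.5)  # Initial distribution
ϕ = lognorm(a_σ)

def p(x, y):
    "Stochastic kernel (2.8) with f(k) = k**α; x and y must be strictly positive."
    d = s * x**α
    return ϕ.pdf((y - (1 - δ) * x) / d) / d

n = 10000    # Number of observations at each date t (assumed value)
T = 30       # Number of dates (assumed value)

# == Generate matrix s.t. t-th column is n observations of k_t == #
k = np.empty((n, T))
A = ϕ.rvs((n, T))
k[:, 0] = ψ_0.rvs(n)  # Draw first column from initial distribution
for t in range(T-1):
    k[:, t+1] = s * A[:, t] * k[:, t]**α + (1 - δ) * k[:, t]

# == Generate T instances of LAE using this data, one for each date t == #
laes = [LAE(p, k[:, t]) for t in range(T)]

# == Plot == #
fig, ax = plt.subplots()
ygrid = np.linspace(0.01, 4.0, 200)
greys = [str(g) for g in np.linspace(0.0, 0.8, T)]
greys.reverse()
for ψ, g in zip(laes, greys):
    ax.plot(ygrid, ψ(ygrid), color=g, lw=2, alpha=0.6)
ax.set_xlabel('capital')
ax.set_title(f'Density of $k_1$ (lighter) to $k_T$ (darker) for $T={T}$')
plt.show()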
The figure shows part of the density sequence {𝜓𝑡 }, with each density computed via the look-ahead estimator.
Notice that the sequence of densities shown in the figure seems to be converging — more on this in just a moment.
Another quick comment is that each of these distributions could be interpreted as a cross-sectional distribution (recall
this discussion).
Up until now, we have focused exclusively on continuous state Markov chains where all conditional distributions 𝑝(𝑥, ⋅)
are densities.
As discussed above, not all distributions can be represented as densities.
If the conditional distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥 cannot be represented as a density for some 𝑥 ∈ 𝑆, then we need
a slightly different theory.
The ultimate option is to switch from densities to probability measures, but not all readers will be familiar with measure
theory.
We can, however, construct a fairly general theory using distribution functions.
To illustrate the issues, recall that Hopenhayn and Rogerson [Hopenhayn and Rogerson, 1993] study a model of firm
dynamics where individual firm productivity follows the exogenous process
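$$
X_{t+1} = a + \rho X_t + \xi_{t+1}, \quad \text{where} \quad \{\xi_t\} \stackrel{\text{IID}}{\sim} N(0, \sigma^2)
$$
As written, this fits the density case treated above. In the model, however, the process is restricted to $[0, 1]$, with boundaries added at the endpoints, so that any value of the right-hand side below 0 is mapped to 0 and any value above 1 is mapped to 1.
If you think about it, you will see that for any given $x \in [0, 1]$, the conditional distribution of $X_{t+1}$ given $X_t = x$ then puts positive probability mass on 0 and 1.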
Hence it cannot be represented as a density.
What we can do instead is use cumulative distribution functions (cdfs).
To this end, set
This family of cdfs 𝐺(𝑥, ⋅) plays a role analogous to the stochastic kernel in the density case.
The distribution dynamics in (2.9) are then replaced by
Here 𝐹𝑡 and 𝐹𝑡+1 are cdfs representing the distribution of the current state and next period state.
The intuition behind (2.14) is essentially the same as for (2.9).
2.3.2 Computation
If you wish to compute these cdfs, you cannot use the look-ahead estimator as before.
Indeed, you should not use any density estimator, since the objects you are estimating/computing are not densities.
One good option is simulation as before, combined with the empirical distribution function.
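A minimal sketch of this approach follows, assuming the restriction to $[0, 1]$ is implemented by clipping the right-hand side of the productivity equation. The helper `ecdf`, the parameter values `a`, `rho`, `sigma` and the uniform initial distribution are all hypothetical choices for illustration.

```python
import numpy as np

def ecdf(sample, y):
    "Empirical distribution function of sample, evaluated at the points y."
    sample = np.sort(np.asarray(sample))
    return np.searchsorted(sample, y, side='right') / sample.size

rng = np.random.default_rng(0)
a, rho, sigma = 0.2, 0.6, 0.3        # hypothetical parameter values
n = 10_000

x0 = rng.uniform(0, 1, n)            # hypothetical initial cross-section
shocks = sigma * rng.standard_normal(n)

# One step of the process, with values outside [0, 1] moved to the boundary
x1 = np.clip(a + rho * x0 + shocks, 0.0, 1.0)

F1 = ecdf(x1, np.array([0.0, 0.5, 1.0]))
print(F1)   # F1[0] > 0 reflects the probability mass sitting exactly at 0
```

The jump of the empirical cdf at 0 is exactly the feature that a density estimator cannot represent.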
2.4 Stability
In our lecture on finite Markov chains, we also studied stationarity, stability and ergodicity.
Here we will cover the same topics for the continuous case.
We will, however, treat only the density case (as in this section), where the stochastic kernel is a family of densities.
The general case is relatively similar — references are given below.
Analogous to the finite case, given a stochastic kernel 𝑝 and corresponding Markov operator as defined in (2.10), a density
𝜓∗ on 𝑆 is called stationary for 𝑃 if it is a fixed point of the operator 𝑃 .
In other words,
$$
\psi^*(y) = \int p(x, y) \psi^*(x) \, dx, \qquad \forall \, y \in S \tag{2.15}
$$
As with the finite case, if 𝜓∗ is stationary for 𝑃 , and the distribution of 𝑋0 is 𝜓∗ , then, in view of (2.11), 𝑋𝑡 will have
this same distribution for all 𝑡.
Hence 𝜓∗ is the stochastic equivalent of a steady state.
In the finite case, we learned that at least one stationary distribution exists, although there may be many.
When the state space is infinite, the situation is more complicated.
Even existence can fail very easily.
For example, the random walk model has no stationary density (see, e.g., EDTC, p. 210).
However, there are well-known conditions under which a stationary density 𝜓∗ exists.
With additional conditions, we can also get a unique stationary density (𝜓 ∈ 𝒟 and 𝜓 = 𝜓𝑃 ⟹ 𝜓 = 𝜓∗ ), and also
global convergence in the sense that
$$
\forall \, \psi \in \mathcal{D}, \quad \psi P^t \to \psi^* \quad \text{as } t \to \infty \tag{2.16}
$$
This combination of existence, uniqueness and global convergence in the sense of (2.16) is often referred to as global
stability.
Under very similar conditions, we get ergodicity, which means that
$$
\frac{1}{n} \sum_{t=1}^n h(X_t) \to \int h(x) \psi^*(x) \, dx \quad \text{as } n \to \infty \tag{2.17}
$$
for any (measurable) function ℎ ∶ 𝑆 → ℝ such that the right-hand side is finite.
Note that the convergence in (2.17) does not depend on the distribution (or value) of 𝑋0 .
This is actually very important for simulation — it means we can learn about 𝜓∗ (i.e., approximate the right-hand side of
(2.17) via the left-hand side) without requiring any special knowledge about what to do with 𝑋0 .
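The ergodicity property can be illustrated with a model whose stationary distribution is known in closed form. The sketch below assumes the linear Gaussian model $X_{t+1} = \rho X_t + \xi_{t+1}$ with $\xi_t \sim N(0, 1)$, whose stationary distribution is $N(0, 1/(1 - \rho^2))$, and takes $h(x) = x^2$; the value of `rho`, the sample size and the deliberately distant initial condition are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.5, 200_000
shocks = rng.standard_normal(n)

x = 10.0          # start far from the bulk of the stationary distribution
total = 0.0
for t in range(n):
    x = rho * x + shocks[t]
    total += x**2                 # accumulate h(X_t) with h(x) = x^2

time_avg = total / n                       # left-hand side of (2.17)
stationary_moment = 1 / (1 - rho**2)       # integral of x^2 against N(0, 1/(1 - rho^2))

print(time_avg, stationary_moment)
```

The initial condition affects only the first few terms of the sum, so its influence on the time average vanishes as `n` grows.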
So what are these conditions we require to get global stability and ergodicity?
In essence, it must be the case that
1. Probability mass does not drift off to the “edges” of the state space.
2. Sufficient “mixing” obtains.
For one such set of conditions see theorem 8.2.14 of EDTC.
In addition
• [Stokey et al., 1989] contains a classic (but slightly outdated) treatment of these topics.
• From the mathematical literature, [Lasota and MacKey, 1994] and [Meyn and Tweedie, 2009] give outstanding
in-depth treatments.
• Section 8.1.2 of EDTC provides detailed intuition, and section 8.3 gives additional references.
• EDTC, section 11.3.4 provides a specific treatment for the growth model we considered in this lecture.
As stated above, the growth model treated here is stable under mild conditions on the primitives.
• See EDTC, section 11.3.4 for more details.
We can see this stability in action — in particular, the convergence in (2.16) — by simulating the path of densities from
various initial conditions.
Here is such a figure.
All sequences are converging towards the same limit, regardless of their initial condition.
The details regarding initial conditions and so on are given in this exercise, where you are asked to replicate the figure.
In the preceding figure, each sequence of densities is converging towards the unique stationary density 𝜓∗ .
Even from this figure, we can get a fair idea what 𝜓∗ looks like, and where its mass is located.
However, there is a much more direct way to estimate the stationary density, and it involves only a slight modification of
the look-ahead estimator.
Let’s say that we have a model of the form (2.3) that is stable and ergodic.
Let 𝑝 be the corresponding stochastic kernel, as given in (2.7).
To approximate the stationary density 𝜓∗ , we can simply generate a long time-series 𝑋0 , 𝑋1 , … , 𝑋𝑛 and estimate 𝜓∗ via
$$
\psi_n^*(y) = \frac{1}{n} \sum_{t=1}^n p(X_t, y) \tag{2.18}
$$
This is essentially the same as the look-ahead estimator (2.13), except that now the observations we generate are a single
time-series, rather than a cross-section.
The justification for (2.18) is that, with probability one as 𝑛 → ∞,
$$
\frac{1}{n} \sum_{t=1}^n p(X_t, y) \to \int p(x, y) \psi^*(x) \, dx = \psi^*(y)
$$
where the convergence is by (2.17) and the equality on the right is by (2.15).
The right-hand side is exactly what we want to compute.
On top of this asymptotic result, it turns out that the rate of convergence for the look-ahead estimator is very good.
The first exercise helps illustrate this point.
2.5 Exercises
Exercise 2.5.1
Consider the simple threshold autoregressive model
$$
X_{t+1} = \theta |X_t| + (1 - \theta^2)^{1/2} \xi_{t+1} \quad \text{where} \quad \{\xi_t\} \stackrel{\text{IID}}{\sim} N(0, 1) \tag{2.19}
$$
This is one of those rare nonlinear stochastic models where an analytical expression for the stationary density is available.
In particular, provided that |𝜃| < 1, there is a unique stationary density 𝜓∗ given by
$$
\psi^*(y) = 2 \, \phi(y) \, \Phi \left[ \frac{\theta y}{(1 - \theta^2)^{1/2}} \right] \tag{2.20}
$$
Here 𝜙 is the standard normal density and Φ is the standard normal cdf.
As an exercise, compute the look-ahead estimate of 𝜓∗ , as defined in (2.18), and compare it with 𝜓∗ in (2.20) to see
whether they are indeed close for large 𝑛.
In doing so, set 𝜃 = 0.8 and 𝑛 = 500.
The next figure shows the result of such a computation
The additional density (black line) is a nonparametric kernel density estimate, added to the solution for illustration.
(You can try to replicate it before looking at the solution if you want to)
As you can see, the look-ahead estimator is a much tighter fit than the kernel density estimator.
If you repeat the simulation you will see that this is consistently the case.
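ϕ = norm()
n = 500
θ = 0.8
# == Frequently used constants == #
d = np.sqrt(1 - θ**2)
δ = θ / d

def ψ_star(y):
    "True stationary density of the TAR Model"
    return 2 * norm.pdf(y) * norm.cdf(δ * y)

def p(x, y):
    "Stochastic kernel for the TAR model (2.19)"
    return ϕ.pdf((y - θ * np.abs(x)) / d) / d

Z = ϕ.rvs(n)
X = np.empty(n)
X[0] = 0.0  # Initial condition (assumed); np.empty leaves X[0] uninitialized
for t in range(n-1):
    X[t+1] = θ * np.abs(X[t]) + d * Z[t]
ψ_est = LAE(p, X)
k_est = gaussian_kde(X)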
Exercise 2.5.2
Replicate the figure on global convergence shown above.
The densities come from the stochastic growth model treated at the start of the lecture.
Begin with the code found above.
Use the same parameters.
For the four initial distributions, use the shifted beta distributions
# == Define parameters == #
s = 0.2
δ = 0.1
a_σ = 0.4 # A = exp(B) where B ~ N(0, a_σ)
α = 0.4 # f(k) = k**α
ϕ = lognorm(a_σ)
for i in range(4):
    ax = axes[i]
    ax.set_xlim(0, xmax)
    ψ_0 = beta(5, 5, scale=0.5, loc=i*2)  # Initial distribution
Exercise 2.5.3
A common way to compare distributions visually is with boxplots.
To illustrate, let’s generate three artificial data sets and compare them with a boxplot.
The three data sets we will use are:
{𝑋1 , … , 𝑋𝑛 } ∼ 𝐿𝑁 (0, 1), {𝑌1 , … , 𝑌𝑛 } ∼ 𝑁 (2, 1), and {𝑍1 , … , 𝑍𝑛 } ∼ 𝑁 (4, 1),
n = 500
x = np.random.randn(n) # N(0, 1)
x = np.exp(x) # Map x to lognormal
y = np.random.randn(n) + 2.0 # N(2, 1)
z = np.random.randn(n) + 4.0 # N(4, 1)
Each data set is represented by a box, where the top and bottom of the box are the third and first quartiles of the data,
and the red line in the center is the median.
The boxes give some indication as to
• the location of probability mass for each sample
• whether the distribution is right-skewed (as is the lognormal distribution), etc
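The quantities that a boxplot displays can also be computed directly. The sketch below regenerates the three samples with a seeded generator and extracts the quartiles and median of each; the seed, the helper name `box_stats`, and the crude skewness indicator `skew_gap` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1234)   # seed chosen arbitrarily
n = 500
samples = {
    "X": np.exp(rng.standard_normal(n)),   # LN(0, 1)
    "Y": rng.standard_normal(n) + 2.0,     # N(2, 1)
    "Z": rng.standard_normal(n) + 4.0,     # N(4, 1)
}

def box_stats(data):
    "Return (first quartile, median, third quartile) of a sample."
    q1, med, q3 = np.percentile(data, [25, 50, 75])
    return q1, med, q3

stats = {name: box_stats(data) for name, data in samples.items()}

# Crude right-skew indicator: upper half of the box minus lower half
skew_gap = {name: (q3 - med) - (med - q1)
            for name, (q1, med, q3) in stats.items()}
```

A clearly positive `skew_gap` for the lognormal sample corresponds to the longer upper half of its box in the figure.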
Now let’s put these ideas to use in a simulation.
Consider the threshold autoregressive model in (2.19).
We know that the distribution of 𝑋𝑡 will converge to (2.20) whenever |𝜃| < 1.
Let’s observe this convergence from different initial conditions using boxplots.
In particular, the exercise is to generate J boxplot figures, one for each initial condition 𝑋0 in
initial_conditions = np.linspace(8, 0, J)
Note the way we use vectorized code to simulate the 𝑘 time series for one boxplot all at once
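import numpy as np
import matplotlib.pyplot as plt

n = 20
k = 5000
J = 8

θ = 0.9
d = np.sqrt(1 - θ**2)
δ = θ / d

fig, axes = plt.subplots(J, 1, figsize=(10, 4*J))
initial_conditions = np.linspace(8, 0, J)
X = np.empty((k, n))

for j in range(J):
    axes[j].set_ylim(-4, 8)
    axes[j].set_title(f'time series from t = {initial_conditions[j]}')

    Z = np.random.randn(k, n)
    X[:, 0] = initial_conditions[j]
    for t in range(1, n):
        X[:, t] = θ * np.abs(X[:, t-1]) + d * Z[:, t]
    axes[j].boxplot(X)

plt.show()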
2.6 Appendix
THREE
This lecture uses the Kalman filter to reformulate John F. Muth’s first paper [Muth, 1960] about rational expectations.
Muth used classical prediction methods to reverse engineer a stochastic process that renders optimal Milton Friedman’s
[Friedman, 1956] “adaptive expectations” scheme.
Milton Friedman [Friedman, 1956] posited that consumers forecast their future disposable income with the adaptive expectations scheme
$$
y^*_{t+i,t} = K \sum_{j=0}^\infty (1 - K)^j y_{t-j} \tag{3.1}
$$
where $K \in (0, 1)$ and $y^*_{t+i,t}$ is a forecast of future $y$ over horizon $i$.
Milton Friedman justified the exponential smoothing forecasting scheme (3.1) informally, noting that it seemed a plausible way to use past income to forecast future income.
In his first paper about rational expectations, John F. Muth [Muth, 1960] reverse-engineered a univariate stochastic process $\{y_t\}_{t=-\infty}^\infty$ for which Milton Friedman’s adaptive expectations scheme gives linear least-squares forecasts of $y_{t+i}$ for any horizon $i$.
Muth sought a setting and a sense in which Friedman’s forecasting scheme is optimal.
That is, Muth asked for what optimal forecasting question is Milton Friedman’s adaptive expectation scheme the answer.
Muth (1960) used classical prediction methods based on lag-operators and 𝑧-transforms to find the answer to his question.
Please see lectures Classical Control with Linear Algebra and Classical Filtering and Prediction with Linear Algebra for an
introduction to the classical tools that Muth used.
Rather than using those classical tools, in this lecture we apply the Kalman filter to express the heart of Muth’s analysis
concisely.
The lecture First Look at Kalman Filter describes the Kalman filter.
We’ll use limiting versions of the Kalman filter corresponding to what are called stationary values in that lecture.
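Suppose that an observable $y_t$ is the sum of an unobserved random walk $x_t$ and an IID shock $\epsilon_{2,t}$:
$$
\begin{aligned}
x_{t+1} &= x_t + \sigma_x \epsilon_{1,t+1} \\
y_t &= x_t + \sigma_y \epsilon_{2,t}
\end{aligned} \tag{3.2}
$$
where
$$
\begin{bmatrix} \epsilon_{1,t+1} \\ \epsilon_{2,t} \end{bmatrix} \sim \mathcal{N}(0, I)
$$
is an IID process.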
Note: A property of the state-space representation (3.2) is that in general neither $\epsilon_{1,t}$ nor $\epsilon_{2,t}$ is in the space spanned by square-summable linear combinations of $y_t, y_{t-1}, \ldots$.
In general $\begin{bmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{bmatrix}$ has more information about future $y_{t+j}$’s than is contained in $y_t, y_{t-1}, \ldots$.
We can use the asymptotic or stationary values of the Kalman gain and the one-step-ahead conditional state covariance
matrix to compute a time-invariant innovations representation
$$
\begin{aligned}
\hat x_{t+1} &= \hat x_t + K a_t \\
y_t &= \hat x_t + a_t
\end{aligned} \tag{3.3}
$$
where $\hat x_t = E[x_t | y_{t-1}, y_{t-2}, \ldots]$ and $a_t = y_t - E[y_t | y_{t-1}, y_{t-2}, \ldots]$.
Note: A key property about an innovations representation is that 𝑎𝑡 is in the space spanned by square summable linear
combinations of 𝑦𝑡 , 𝑦𝑡−1 , ….
For more ramifications of this property, see the lectures Shock Non-Invertibility and Recursive Models of Dynamic Linear
Economies.
Later we’ll stack these state-space systems (3.2) and (3.3) to display some classic findings of Muth.
But first, let’s create an instance of the state-space system (3.2), apply the quantecon Kalman class to it, and then use the result to construct the associated “innovations representation”.
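As a library-free illustration of what the stationary values are for system (3.2), the sketch below iterates the scalar Riccati equation for the one-step-ahead conditional variance and reads off the stationary Kalman gain. The parameter values, the function name `stationary_kalman`, and the closed-form check are assumptions made for illustration; this is not the lecture's quantecon-based code.

```python
import numpy as np

σ_x, σ_y = 1.0, 5.0     # illustrative standard deviations (assumed)

def stationary_kalman(σ_x, σ_y, tol=1e-12, max_iter=10_000):
    """
    Iterate Σ' = Σ - Σ**2 / (Σ + σ_y**2) + σ_x**2 to a fixed point and
    return the stationary variance Σ and gain K = Σ / (Σ + σ_y**2).
    """
    Σ = 1.0
    for _ in range(max_iter):
        Σ_new = Σ - Σ**2 / (Σ + σ_y**2) + σ_x**2
        if abs(Σ_new - Σ) < tol:
            Σ = Σ_new
            break
        Σ = Σ_new
    K = Σ / (Σ + σ_y**2)
    return Σ, K

Σ_inf, K = stationary_kalman(σ_x, σ_y)

# The fixed point solves Σ**2 - σ_x**2 Σ - σ_x**2 σ_y**2 = 0
Σ_closed = 0.5 * (σ_x**2 + np.sqrt(σ_x**4 + 4 * σ_x**2 * σ_y**2))
```

The gain `K` computed this way is the coefficient on $a_t$ in the innovations representation (3.3) for these parameter values.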
Now we want to map the time-invariant innovations representation (3.3) and the original state-space system (3.2) into a convenient form for deducing the impulse responses from the original shocks to $x_t$ and $\hat x_t$.
Putting both of these representations into a single state-space system is yet another application of the insight that “finding
the state is an art”.
We’ll define a state vector and appropriate state-space matrices that allow us to represent both systems in one fell swoop.
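Note that
$$
a_t = x_t + \sigma_y \epsilon_{2,t} - \hat x_t
$$
so that
$$
\begin{aligned}
\hat x_{t+1} &= \hat x_t + K (x_t + \sigma_y \epsilon_{2,t} - \hat x_t) \\
&= (1 - K) \hat x_t + K x_t + K \sigma_y \epsilon_{2,t}
\end{aligned}
$$
Therefore,
$$
\begin{aligned}
\begin{bmatrix} x_{t+1} \\ \hat x_{t+1} \\ \epsilon_{2,t+1} \end{bmatrix}
&= \begin{bmatrix} 1 & 0 & 0 \\ K & (1 - K) & K \sigma_y \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_t \\ \hat x_t \\ \epsilon_{2,t} \end{bmatrix}
+ \begin{bmatrix} \sigma_x & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \epsilon_{1,t+1} \\ \epsilon_{2,t+1} \end{bmatrix} \\
\begin{bmatrix} y_t \\ a_t \end{bmatrix}
&= \begin{bmatrix} 1 & 0 & \sigma_y \\ 1 & -1 & \sigma_y \end{bmatrix}
\begin{bmatrix} x_t \\ \hat x_t \\ \epsilon_{2,t} \end{bmatrix}
\end{aligned}
$$
is a state-space system that tells us how the shocks $\begin{bmatrix} \epsilon_{1,t+1} \\ \epsilon_{2,t+1} \end{bmatrix}$ affect states $\hat x_{t+1}, x_t$, the observable $y_t$, and the innovation $a_t$.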
With this tool at our disposal, let’s form the composite system and simulate it
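A minimal NumPy sketch of such a simulation follows. It builds the composite matrices directly rather than using quantecon's state-space class; the parameter values, the initial state, and the seed are assumptions, and the array names `xf` and `yf` are chosen only so that rows line up with $x_t$, $\hat x_t$ and $y_t$, $a_t$.

```python
import numpy as np

σ_x, σ_y = 1.0, 5.0     # illustrative values (assumed)
T = 200
rng = np.random.default_rng(0)

# Stationary gain from the scalar Riccati fixed point for system (3.2)
Σ = 0.5 * (σ_x**2 + np.sqrt(σ_x**4 + 4 * σ_x**2 * σ_y**2))
K = Σ / (Σ + σ_y**2)

# Composite system with state (x_t, x̂_t, ε_{2,t})
A = np.array([[1.0, 0.0,   0.0],
              [K,   1 - K, K * σ_y],
              [0.0, 0.0,   0.0]])
C = np.array([[σ_x, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])
G = np.array([[1.0,  0.0, σ_y],
              [1.0, -1.0, σ_y]])

xf = np.empty((3, T))
xf[:, 0] = [10.0, 10.0, rng.standard_normal()]   # x_0, x̂_0, ε_{2,0}
for t in range(T - 1):
    shocks = rng.standard_normal(2)              # (ε_{1,t+1}, ε_{2,t+1})
    xf[:, t+1] = A @ xf[:, t] + C @ shocks

yf = G @ xf          # row 0 is y_t, row 1 is a_t
x, x_hat, y, a = xf[0], xf[1], yf[0], yf[1]
```

By construction the simulated innovation satisfies `a = y - x_hat`, and `x_hat` updates according to (3.3).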
Now that we have simulated our joint system, we have $x_t$, $\hat x_t$, and $y_t$.
We can now investigate how these variables are related by plotting some key objects.
First, let’s plot the hidden state $x_t$ and the filtered version $\hat x_t$ that is the linear least-squares projection of $x_t$ on the history $y_{t-1}, y_{t-2}, \ldots$
fig, ax = plt.subplots()
ax.plot(xf[0, :], label="$x_t$")
ax.plot(xf[1, :], label="Filtered $x_t$")
ax.legend()
ax.set_xlabel("Time")
ax.set_title(r"$x$ vs $\hat{x}$")
plt.show()
fig, ax = plt.subplots()
ax.plot(yf[0, :], label="y")
ax.plot(xf[0, :], label="x")
ax.legend()
ax.set_title(r"$x$ and $y$")
ax.set_xlabel("Time")
plt.show()
We see above that 𝑦 seems to look like white noise around the values of 𝑥.
3.5.1 Innovations
Recall that we wrote down the innovation representation that depended on 𝑎𝑡 . We now plot the innovations {𝑎𝑡 }:
fig, ax = plt.subplots()
ax.plot(yf[1, :], label="a")
ax.legend()
ax.set_title(r"Innovation $a_t$")
ax.set_xlabel("Time")
plt.show()
fig, ax = plt.subplots(2)
ax[0].plot(coefs_ma_array, label="MA")
ax[0].legend()
ax[1].plot(coefs_var_array, label="VAR")
plt.show()
The moving average coefficients in the top panel show tell-tale signs of $y_t$ being a process whose first difference is a first-order moving average: from (3.3), $y_t - y_{t-1} = a_t - (1 - K) a_{t-1}$.
The autoregressive coefficients decline geometrically with decay rate (1 − 𝐾).
These are exactly the target outcomes that Muth (1960) aimed to reverse engineer
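The two coefficient patterns can be checked numerically from (3.3) alone. In the sketch below, the moving average coefficients of $y_t$ on $a_t, a_{t-1}, \ldots$ are $1, K, K, \ldots$, and inverting that lag polynomial term by term recovers autoregressive coefficients that should equal $K(1-K)^{j-1}$. The value of `K` and the truncation `J` are illustrative assumptions.

```python
import numpy as np

K = 0.2      # illustrative Kalman gain (assumed)
J = 20       # number of lags to compute

# MA coefficients of y_t on a_t, a_{t-1}, ...: 1, K, K, ...
ψ = np.full(J + 1, K)
ψ[0] = 1.0

# Invert ψ(L) term by term: find c(L) with c(L) ψ(L) = 1
c = np.zeros(J + 1)
c[0] = 1.0
for j in range(1, J + 1):
    c[j] = -np.dot(ψ[1:j+1], c[j-1::-1])

# a_t = y_t - Σ_j π_j y_{t-j}, so the AR coefficients are π_j = -c_j
ar_coefs = -c[1:]
```

The ratio of successive entries of `ar_coefs` is $1 - K$, which is the geometric decay described above.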
FOUR
In addition to what’s in Anaconda, this lecture will need the following libraries:
4.1 Overview
In this lecture we discuss a family of dynamic programming problems with the following features:
1. a discrete state space and discrete choices (actions)
2. an infinite horizon
3. discounted rewards
4. Markov state transitions
We call such problems discrete dynamic programs or discrete DPs.
Discrete DPs are the workhorses in much of modern quantitative economics, including
• monetary economics
• search and labor economics
• household savings and consumption theory
• investment theory
• asset pricing
• industrial organization, etc.
When a given model is not inherently discrete, it is common to replace it with a discretized version in order to use discrete
DP techniques.
This lecture covers
• the theory of dynamic programming in a discrete setting, plus examples and applications
• a powerful set of routines for solving discrete DPs from the QuantEcon code library
Let’s start with some imports:
import numpy as np
import matplotlib.pyplot as plt
import quantecon as qe
import scipy.sparse as sparse
4.1.2 Code
4.1.3 References
For background reading on dynamic programming and additional applications, see, for example,
• [Ljungqvist and Sargent, 2018]
• [Hernandez-Lerma and Lasserre, 1996], section 3.5
• [Puterman, 2005]
• [Stokey et al., 1989]
• [Rust, 1996]
• [Miranda and Fackler, 2002]
• EDTC, chapter 5
Loosely speaking, a discrete DP is a maximization problem with an objective function of the form
$$
\mathbb{E} \sum_{t=0}^\infty \beta^t r(s_t, a_t) \tag{4.1}
$$
where
• 𝑠𝑡 is the state variable
• 𝑎𝑡 is the action
• 𝛽 is a discount factor
• 𝑟(𝑠𝑡 , 𝑎𝑡 ) is interpreted as a current reward when the state is 𝑠𝑡 and the action chosen is 𝑎𝑡
Each pair (𝑠𝑡 , 𝑎𝑡 ) pins down transition probabilities 𝑄(𝑠𝑡 , 𝑎𝑡 , 𝑠𝑡+1 ) for the next period state 𝑠𝑡+1 .
Thus, actions influence not only current rewards but also the future time path of the state.
The essence of dynamic programming problems is to trade off current rewards vs favorable positioning of the future state
(modulo randomness).
Examples:
• consuming today vs saving and accumulating assets
• accepting a job offer today vs seeking a better one in the future
• exercising an option now vs waiting
4.2.1 Policies
The most fruitful way to think about solutions to discrete DP problems is to compare policies.
In general, a policy is a randomized map from past actions and states to current action.
In the setting formalized below, it suffices to consider so-called stationary Markov policies, which consider only the current
state.
In particular, a stationary Markov policy is a map 𝜎 from states to actions
• 𝑎𝑡 = 𝜎(𝑠𝑡 ) indicates that 𝑎𝑡 is the action to be taken in state 𝑠𝑡
It is known that, for any arbitrary policy, there exists a stationary Markov policy that dominates it at least weakly.
• See section 5.5 of [Puterman, 2005] for discussion and proofs.
In what follows, stationary Markov policies are referred to simply as policies.
The aim is to find an optimal policy, in the sense of one that maximizes (4.1).
Let’s now step through these ideas more carefully.
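Formally, a discrete dynamic program consists of the following components:
1. A finite set of states $S = \{0, \ldots, n-1\}$.
2. A finite set of feasible actions $A(s)$ for each state $s \in S$, and a corresponding set of feasible state-action pairs
$$
\mathit{SA} := \{(s, a) \mid s \in S, \; a \in A(s)\}
$$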
3. A reward function 𝑟 ∶ SA → ℝ.
4. A transition probability function 𝑄 ∶ SA → Δ(𝑆), where Δ(𝑆) is the set of probability distributions over 𝑆.
5. A discount factor 𝛽 ∈ [0, 1).
We also use the notation 𝐴 ∶= ⋃𝑠∈𝑆 𝐴(𝑠) = {0, … , 𝑚 − 1} and call this set the action space.
A policy is a function 𝜎 ∶ 𝑆 → 𝐴.
A policy is called feasible if it satisfies 𝜎(𝑠) ∈ 𝐴(𝑠) for all 𝑠 ∈ 𝑆.
Denote the set of all feasible policies by Σ.
If a decision-maker uses a policy 𝜎 ∈ Σ, then
• the current reward at time 𝑡 is 𝑟(𝑠𝑡 , 𝜎(𝑠𝑡 ))
• the probability that 𝑠𝑡+1 = 𝑠′ is 𝑄(𝑠𝑡 , 𝜎(𝑠𝑡 ), 𝑠′ )
For each 𝜎 ∈ Σ, define
• 𝑟𝜎 by 𝑟𝜎 (𝑠) ∶= 𝑟(𝑠, 𝜎(𝑠))
• 𝑄𝜎 by 𝑄𝜎 (𝑠, 𝑠′ ) ∶= 𝑄(𝑠, 𝜎(𝑠), 𝑠′ )
Notice that 𝑄𝜎 is a stochastic matrix on 𝑆.
It gives transition probabilities of the controlled chain when we follow policy 𝜎.
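If we think of $r_\sigma$ as a column vector, then so is $Q_\sigma^t r_\sigma$, and the $s$-th row of the latter has the interpretation
$$
(Q_\sigma^t r_\sigma)(s) = \mathbb{E} [ r(s_t, \sigma(s_t)) \mid s_0 = s ]
\quad \text{when } \{s_t\} \sim Q_\sigma \tag{4.2}
$$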
Comments
• {𝑠𝑡 } ∼ 𝑄𝜎 means that the state is generated by stochastic matrix 𝑄𝜎 .
• See this discussion on computing expectations of Markov chains for an explanation of the expression in (4.2).
Notice that we’re not really distinguishing between functions from 𝑆 to ℝ and vectors in ℝ𝑛 .
This is natural because they are in one to one correspondence.
Let 𝑣𝜎 (𝑠) denote the discounted sum of expected reward flows from policy 𝜎 when the initial state is 𝑠.
To calculate this quantity we pass the expectation through the sum in (4.1) and use (4.2) to get
$$
v_\sigma(s) = \sum_{t=0}^\infty \beta^t (Q_\sigma^t r_\sigma)(s) \qquad (s \in S)
$$
This function is called the policy value function for the policy 𝜎.
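Because the series for $v_\sigma$ is a discounted geometric sum in $\beta Q_\sigma$, it can equivalently be obtained by solving the linear system $(I - \beta Q_\sigma) v_\sigma = r_\sigma$. The sketch below compares the two calculations on a hypothetical three-state chain; the matrix `Q_σ`, the vector `r_σ`, and the truncation length `T` are made-up illustrative values.

```python
import numpy as np

β = 0.95
# A hypothetical controlled chain and reward vector under some policy σ
Q_σ = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.6, 0.2],
                [0.0, 0.3, 0.7]])
r_σ = np.array([1.0, 0.5, 2.0])

# Truncated series: sum over t = 0, ..., T-1 of β^t Q_σ^t r_σ
T = 2000
v_series = np.zeros(3)
term = r_σ.copy()
for t in range(T):
    v_series += term
    term = β * (Q_σ @ term)

# Linear-system form: v_σ = (I - β Q_σ)^{-1} r_σ
v_solve = np.linalg.solve(np.eye(3) - β * Q_σ, r_σ)
```

The linear solve is usually preferable in practice, since the series converges slowly when $\beta$ is close to one.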
The optimal value function, or simply value function, is the function $v^* \colon S \to \mathbb{R}$ defined by
$$
v^*(s) = \max_{\sigma \in \Sigma} v_\sigma(s) \qquad (s \in S)
$$
(We can use max rather than sup here because the domain is a finite set)
A policy 𝜎 ∈ Σ is called optimal if 𝑣𝜎 (𝑠) = 𝑣∗ (𝑠) for all 𝑠 ∈ 𝑆.
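Given any $w \colon S \to \mathbb{R}$, a policy $\sigma \in \Sigma$ is called $w$-greedy if
$$
\sigma(s) \in \operatorname*{arg\,max}_{a \in A(s)}
\left\{ r(s, a) + \beta \sum_{s' \in S} w(s') Q(s, a, s') \right\} \qquad (s \in S)
$$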
As discussed in detail below, optimal policies are precisely those that are 𝑣∗ -greedy.
Now that the theory has been set out, let’s turn to solution methods.
The code for solving discrete DPs is available in ddp.py from the QuantEcon.py code library.
It implements the three most important solution methods for discrete dynamic programs, namely
• value function iteration
• policy function iteration
• modified policy function iteration
Let’s briefly review these algorithms and their implementation.
Perhaps the most familiar method for solving all manner of dynamic programs is value function iteration.
This algorithm uses the fact that the Bellman operator 𝑇 is a contraction mapping with fixed point 𝑣∗ .
Hence, iterative application of 𝑇 to any initial function 𝑣0 ∶ 𝑆 → ℝ converges to 𝑣∗ .
The details of the algorithm can be found in the appendix.
This routine, also known as Howard’s policy improvement algorithm, exploits more closely the particular structure of a
discrete DP problem.
Each iteration consists of
1. A policy evaluation step that computes the value 𝑣𝜎 of a policy 𝜎 by solving the linear equation 𝑣 = 𝑇𝜎 𝑣.
2. A policy improvement step that computes a 𝑣𝜎 -greedy policy.
In the current setting, policy iteration computes an exact optimal policy in finitely many iterations.
• See theorem 10.2.6 of EDTC for a proof.
The details of the algorithm can be found in the appendix.
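The two steps of policy iteration can be sketched in a few lines of NumPy. This is an illustrative implementation under simplifying assumptions (every action is feasible in every state, and the test problem is randomly generated); it is not the routine in QuantEcon.py, and the function name `policy_iteration` is hypothetical.

```python
import numpy as np

def policy_iteration(R, Q, β, max_iter=100):
    """
    R is an n x m reward array, Q an n x m x n transition array.
    Returns the value and policy at which the policy stops changing.
    """
    n, m = R.shape
    σ = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve v = r_σ + β Q_σ v
        r_σ = R[np.arange(n), σ]
        Q_σ = Q[np.arange(n), σ, :]
        v = np.linalg.solve(np.eye(n) - β * Q_σ, r_σ)
        # Policy improvement: compute a v-greedy policy
        σ_new = (R + β * Q @ v).argmax(axis=1)
        if np.array_equal(σ_new, σ):
            break
        σ = σ_new
    return v, σ

# A small randomly generated problem (illustrative only)
rng = np.random.default_rng(0)
n, m, β = 5, 3, 0.9
R = rng.uniform(size=(n, m))
Q = rng.uniform(size=(n, m, n))
Q /= Q.sum(axis=2, keepdims=True)   # each Q[s, a, :] is a distribution

v, σ = policy_iteration(R, Q, β)
```

When the loop stops because the policy no longer changes, `v` satisfies the Bellman equation exactly up to floating point error.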
Modified policy iteration replaces the policy evaluation step in policy iteration with “partial policy evaluation”.
The latter computes an approximation to the value of a policy 𝜎 by iterating 𝑇𝜎 for a specified number of times.
This approach can be useful when the state space is very large and the linear system in the policy evaluation step of policy
iteration is correspondingly difficult to solve.
The details of the algorithm can be found in the appendix.
𝑠′ = 𝑎 + 𝑈 where 𝑈 ∼ 𝑈 [0, … , 𝐵]
This information will be used to create an instance of DiscreteDP by passing the following information
1. An 𝑛 × 𝑚 reward array 𝑅.
2. An 𝑛 × 𝑚 × 𝑛 transition probability array 𝑄.
3. A discount factor 𝛽.
For 𝑅 we set 𝑅[𝑠, 𝑎] = 𝑢(𝑠 − 𝑎) if 𝑎 ≤ 𝑠 and −∞ otherwise.
For 𝑄 we follow the rule in (4.3).
Note:
• The feasibility constraint is embedded into 𝑅 by setting 𝑅[𝑠, 𝑎] = −∞ for 𝑎 ∉ 𝐴(𝑠).
• Probability distributions for (𝑠, 𝑎) with 𝑎 ∉ 𝐴(𝑠) can be arbitrary.
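class SimpleOG:

    def __init__(self, B=10, M=5, α=0.5, β=0.9):
        "Set up R, Q and β, the three elements that define a DiscreteDP."
        self.B, self.M, self.α, self.β = B, M, α, β
        self.n = B + M + 1   # Number of states
        self.m = M + 1       # Number of actions
        self.R = np.empty((self.n, self.m))
        self.Q = np.zeros((self.n, self.m, self.n))
        self.populate_Q()
        self.populate_R()

    def u(self, c):
        return c**self.α

    def populate_R(self):
        """
        Populate the R matrix, with R[s, a] = -np.inf for infeasible
        state-action pairs.
        """
        for s in range(self.n):
            for a in range(self.m):
                self.R[s, a] = self.u(s - a) if a <= s else -np.inf

    def populate_Q(self):
        """
        Populate the Q matrix by setting Q[s, a, s'] = 1 / (B + 1)
        if a <= s' <= a + B and zero otherwise.
        """
        for a in range(self.m):
            self.Q[:, a, a:(a + self.B + 1)] = 1.0 / (self.B + 1)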
g = SimpleOG()  # Instance with default parameters
ddp = qe.markov.DiscreteDP(g.R, g.Q, g.β)
results = ddp.solve(method='policy_iteration')
dir(results)
(In IPython version 4.0 and above you can also type results. and hit the tab key)
The most important attributes are v, the value function, and sigma, the optimal policy
results.v
results.sigma
array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 5])
Since we’ve used policy iteration, these results will be exact unless we hit the iteration bound max_iter.
Let’s make sure this didn’t happen
results.max_iter
250
results.num_iter
Another interesting object is results.mc, which is the controlled chain defined by 𝑄𝜎∗ , where 𝜎∗ is the optimal policy.
In other words, it gives the dynamics of the state when the agent follows the optimal policy.
Since this object is an instance of MarkovChain from QuantEcon.py (see this lecture for more discussion), we can easily
simulate it, compute its stationary distribution and so on.
results.mc.stationary_distributions
If we look at the bar graph we can see the rightward shift in probability mass
def u(c):
    return c**α

s_indices = []
a_indices = []
Q = []
R = []
b = 1.0 / (B + 1)

for s in range(n):
    for a in range(min(M, s) + 1):  # All feasible a at this s
        s_indices.append(s)
        a_indices.append(a)
        q = np.zeros(n)
        q[a:(a + B + 1)] = b  # b on these values, otherwise 0
        Q.append(q)
        R.append(u(s - a))
For larger problems, you might need to write this code more efficiently by vectorizing or using Numba.
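One possible vectorized version of the construction above is sketched here. It builds the same feasible state-action pairs with array operations instead of nested loops; the parameter values for `B`, `M`, and `α` are illustrative assumptions, and a dense `Q` is used for simplicity.

```python
import numpy as np

B, M, α = 10, 5, 0.5          # illustrative parameter values (assumed)
n = B + M + 1                 # number of states

def u(c):
    return c**α

# All (s, a) pairs with 0 <= a <= min(M, s), in the same order as the loop
s_grid, a_grid = np.meshgrid(np.arange(n), np.arange(M + 1), indexing="ij")
feasible = a_grid <= s_grid
s_indices = s_grid[feasible]
a_indices = a_grid[feasible]

# Rewards for the feasible pairs
R = u(s_indices - a_indices)

# Q[i, s'] = 1 / (B + 1) if a_i <= s' <= a_i + B, and 0 otherwise
s_next = np.arange(n)
Q = ((s_next >= a_indices[:, None]) &
     (s_next <= a_indices[:, None] + B)) / (B + 1)
```

Boolean masking flattens the grids in row-major order, so the pairs come out sorted by state and then by action, as in the loop version.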
4.5 Exercises
In the stochastic optimal growth lecture from our introductory lecture series, we solve a benchmark model that has an
analytical solution.
The exercise is to replicate this solution using DiscreteDP.
4.6 Solutions
4.6.1 Setup
α = 0.65
f = lambda k: k**α
u = np.log
β = 0.95
Here we want to solve a finite state version of the continuous state model above.
We discretize the state space into a grid of size grid_size=500, from $10^{-6}$ to grid_max=2
grid_max = 2
grid_size = 500
grid = np.linspace(1e-6, grid_max, grid_size)
We choose the action to be the amount of capital to save for the next period (the state is the capital stock at the beginning
of the period).
Thus the state indices and the action indices are both 0, …, grid_size-1.
Action (indexed by) a is feasible at state (indexed by) s if and only if grid[a] < f(grid[s]) (zero consumption is not allowed because of the log utility).
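Thus the Bellman equation is:
$$
v(k) = \max_{0 < k' < f(k)} u(f(k) - k') + \beta v(k')
$$
where $k'$ is the capital stock in the next period.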
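# Consumption matrix, with nonpositive consumption included
C = f(grid).reshape(grid_size, 1) - grid.reshape(1, grid_size)
# State-action indices
s_indices, a_indices = np.where(C > 0)
# Number of state-action pairs
L = len(s_indices)
print(L)
print(s_indices)
print(a_indices)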
118841
[ 0 1 1 ... 499 499 499]
[ 0 0 1 ... 389 390 391]
R = u(C[s_indices, a_indices])
(Degenerate) transition probability matrix Q (of shape (L, grid_size)), where we choose the scipy.sparse.lil_matrix
format, while any format will do (internally it will be converted to the csr format):
Q = sparse.lil_matrix((L, grid_size))
Q[np.arange(L), a_indices] = 1
(If you are familiar with the data structure of scipy.sparse.csr_matrix, the following is the most efficient way to create the
Q matrix in the current case)
# data = np.ones(L)
# indptr = np.arange(L+1)
# Q = sparse.csr_matrix((data, a_indices, indptr), shape=(L, grid_size))
Notes
Here we intensively vectorized the operations on arrays to simplify the code.
As noted, however, vectorization is memory consumptive, and it can be prohibitively so for grids with large size.
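# DiscreteDP instance in the state-action pairs formulation
ddp = qe.markov.DiscreteDP(R, Q, β, s_indices, a_indices)
res = ddp.solve(method='policy_iteration')
v, σ, num_iter = res.v, res.sigma, res.num_iter
num_iter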
10
Note that sigma contains the indices of the optimal capital stocks to save for the next period. The following translates
sigma to the corresponding consumption vector.
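c = f(grid) - grid[σ]  # Optimal consumption in the discrete version
ab = α * β             # Constants of the exact continuous-state solution
c1 = (np.log(1 - ab) + np.log(ab) * ab / (1 - ab)) / (1 - β)
c2 = α / (1 - ab)

def v_star(k):
    return c1 + c2 * np.log(k)

def c_star(k):
    return (1 - ab) * k**α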
Let us compare the solution of the discrete model with that of the original continuous model
np.abs(v - v_star(grid)).max()
121.49819147053378
np.abs(v - v_star(grid))[1:].max()
0.012681735127500815
np.abs(c - c_star(grid)).max()
0.003826523100010082
In fact, the optimal consumption obtained in the discrete version is not really monotone, but the decrements are quite
small:
diff = np.diff(c)
(diff >= 0).all()
False
dec_ind = np.where(diff < 0)[0]
len(dec_ind)
174
np.abs(diff[dec_ind]).max()
0.001961853339766839
True
Value Iteration
ddp.epsilon = 1e-4
ddp.max_iter = 500
res1 = ddp.solve(method='value_iteration')
res1.num_iter
294
np.array_equal(σ, res1.sigma)
True
res2 = ddp.solve(method='modified_policy_iteration')
res2.num_iter
16
np.array_equal(σ, res2.sigma)
True
Speed Comparison
%timeit ddp.solve(method='value_iteration')
%timeit ddp.solve(method='policy_iteration')
%timeit ddp.solve(method='modified_policy_iteration')
94.9 ms ± 360 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
9.34 ms ± 16.9 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.3 ms ± 59.9 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
As is often the case, policy iteration and modified policy iteration are much faster than value iteration.
Let us first visualize the convergence of the value iteration algorithm as in the lecture, where we use ddp.bellman_operator implemented as a method of DiscreteDP
plt.show()
We next plot the consumption policies along with the value iteration
/home/runner/miniconda3/envs/quantecon/lib/python3.12/site-packages/quantecon/_compute_fp.py:152: RuntimeWarning: max_iter attained before convergence in compute_fixed_point
  warnings.warn(_non_convergence_msg, RuntimeWarning)
Finally, let us work on Exercise 2, where we plot the trajectories of the capital stock for three different discount factors,
0.9, 0.94, and 0.98, with initial condition 𝑘0 = 0.1.
sample_size = 25
fig, ax = plt.subplots(figsize=(8,5))
ax.set_xlabel("time")
ax.set_ylabel("capital")
ax.set_ylim(0.10, 0.30)
ax.legend(loc='lower right')
plt.show()
This appendix covers the details of the solution algorithms implemented for DiscreteDP.
We will make use of the following notions of approximate optimality:
• For 𝜀 > 0, 𝑣 is called an 𝜀-approximation of 𝑣∗ if ‖𝑣 − 𝑣∗ ‖ < 𝜀.
• A policy 𝜎 ∈ Σ is called 𝜀-optimal if 𝑣𝜎 is an 𝜀-approximation of 𝑣∗ .
The DiscreteDP value iteration method implements value function iteration as follows
1. Choose any 𝑣0 ∈ ℝ𝑛 , and specify 𝜀 > 0; set 𝑖 = 0.
2. Compute 𝑣𝑖+1 = 𝑇 𝑣𝑖 .
3. If ‖𝑣𝑖+1 − 𝑣𝑖 ‖ < [(1 − 𝛽)/(2𝛽)]𝜀, then go to step 4; otherwise, set 𝑖 = 𝑖 + 1 and go to step 2.
4. Compute a 𝑣𝑖+1 -greedy policy 𝜎, and return 𝑣𝑖+1 and 𝜎.
Given 𝜀 > 0, the value iteration algorithm
• terminates in a finite number of iterations
• returns an 𝜀/2-approximation of the optimal value function and an 𝜀-optimal policy function (unless max_iter is reached)
(While not explicit, in the actual implementation each algorithm is terminated if the number of iterations reaches max_iter)
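The four steps of value function iteration listed above can be sketched as follows. This is an illustrative NumPy version, not the DiscreteDP implementation; it assumes the sup norm, assumes every action is feasible in every state, and uses a randomly generated test problem. The function name `value_iteration` is hypothetical.

```python
import numpy as np

def value_iteration(R, Q, β, ε=1e-4, max_iter=1000):
    """
    R is an n x m reward array, Q an n x m x n transition array.
    Stops when successive iterates differ by less than ε (1 - β) / (2 β).
    """
    n, m = R.shape
    v = np.zeros(n)                               # step 1: initial guess
    tol = ε * (1 - β) / (2 * β)
    for i in range(max_iter):
        v_new = (R + β * Q @ v).max(axis=1)       # step 2: v_{i+1} = T v_i
        if np.abs(v_new - v).max() < tol:         # step 3: stopping rule
            break
        v = v_new
    σ = (R + β * Q @ v_new).argmax(axis=1)        # step 4: greedy policy
    return v_new, σ

# A small randomly generated problem (illustrative only)
rng = np.random.default_rng(0)
n, m, β = 5, 3, 0.9
R = rng.uniform(size=(n, m))
Q = rng.uniform(size=(n, m, n))
Q /= Q.sum(axis=2, keepdims=True)   # each Q[s, a, :] is a distribution

v, σ = value_iteration(R, Q, β)
```

The returned `v` is only approximately a fixed point of the Bellman operator, with the residual controlled by the tolerance in step 3.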
LQ Control
CHAPTER
FIVE
5.1 Overview
In the linear-quadratic permanent income of consumption smoothing model described in this quantecon lecture, a scalar
parameter 𝛽 ∈ (0, 1) plays two roles:
• it is a discount factor that the consumer applies to future utilities from consumption
• it is the reciprocal of the gross interest rate on risk-free one-period loans
That 𝛽 plays these two roles is essential in delivering the outcome that, regardless of the stochastic process that describes
his non-financial income, the consumer chooses to make consumption follow a random walk (see [Hall, 1978]).
In this lecture, we assign a third role to 𝛽:
• it describes a first-order moving average process for the growth in non-financial income
We study two consumers who have exactly the same nonfinancial income process and who both conform to the linear-
quadratic permanent income of consumption smoothing model described here.
The two consumers have different information about their future nonfinancial incomes.
A better informed consumer each period receives news in the form of a shock that simultaneously affects both today’s
nonfinancial income and the present value of future nonfinancial incomes in a particular way.
A less informed consumer each period receives a shock that equals the part of today’s nonfinancial income that could not
be forecast from past values of nonfinancial income.
Even though they receive exactly the same nonfinancial incomes each period, our two consumers behave differently
because they have different information about their future nonfinancial incomes.
The second consumer receives less information about future nonfinancial incomes in a sense that we shall make precise.
This difference in their information sets manifests itself in their responding differently to what they regard as time 𝑡
information shocks.
Thus, although at each date they receive exactly the same histories of nonfinancial income, our two consumers receive
different shocks or news about their future nonfinancial incomes.
We study consequences of endowing a consumer with one of two alternative representations for the change in the consumer’s nonfinancial income $y_{t+1} - y_t$.
For both types of consumer, a parameter 𝛽 ∈ (0, 1) plays three roles.
It appears
• as a discount factor applied to future expected one-period utilities,
• as the reciprocal of a gross interest rate on one-period loans, and
• as a parameter in a first-order moving average that equals the increment in a consumer’s non-financial income
The first representation, which we shall sometimes refer to as the more informative representation, is
$$
y_{t+1} - y_t = -\beta^{-1} \epsilon_t + \epsilon_{t+1} \tag{5.1}
$$
where $\{\epsilon_t\}$ is an i.i.d. normally distributed scalar process with mean zero and contemporaneous variance $\sigma_\epsilon^2$.
This representation of the process is used by a consumer who at time 𝑡 knows both 𝑦𝑡 and the shock 𝜖𝑡 and can use both
of them to forecast future 𝑦𝑡+𝑗 ’s.
As we’ll see below, representation (5.1) has the peculiar property that a positive shock $\epsilon_{t+1}$ leaves the discounted present value of the consumer’s nonfinancial income at time $t+1$ unaltered.
The second representation of the same $\{y_t\}$ process is
$$
y_{t+1} - y_t = -\beta a_t + a_{t+1} \tag{5.2}
$$
where $\{a_t\}$ is another i.i.d. normally distributed scalar process, with mean zero and now variance $\sigma_a^2 > \sigma_\epsilon^2$.
The i.i.d. shock variances are related by
$$
\sigma_a^2 = \beta^{-2} \sigma_\epsilon^2 > \sigma_\epsilon^2
$$
so that the variance of the innovation exceeds the variance of the original shock by a multiplicative factor $\beta^{-2}$.
Representation (5.2) is the innovations representation of equation (5.1) associated with Kalman filtering theory.
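To see how this works, note that equating representations (5.1) and (5.2) for $y_{t+1} - y_t$ implies $\epsilon_{t+1} - \beta^{-1} \epsilon_t = a_{t+1} - \beta a_t$, which in turn implies
$$
a_{t+1} = \beta a_t + \epsilon_{t+1} - \beta^{-1} \epsilon_t .
$$
Solving this difference equation backwards for $a_{t+1}$ gives, after a few lines of algebra,
$$
a_{t+1} = \epsilon_{t+1} + (\beta - \beta^{-1}) \sum_{j=0}^\infty \beta^j \epsilon_{t-j} \tag{5.3}
$$
which we can also write as
$$
a_{t+1} = \sum_{j=0}^\infty h_j \epsilon_{t+1-j} \equiv h(L) \epsilon_{t+1}
$$
where $L$ is the one-period lag operator, $h(L) = \sum_{j=0}^\infty h_j L^j$, $I$ is the identity operator, and
$$
h(L) = \frac{I - \beta^{-1} L}{I - \beta L}
$$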
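Let $g_j \equiv E a_t a_{t-j}$ be the $j$th autocovariance of the $\{a_t\}$ process.
Using calculations in the quantecon lecture, where $z \in C$ is a complex variable, the covariance generating function $g(z) = \sum_{j=-\infty}^\infty g_j z^j$ of the $\{a_t\}$ process equals
$$
g(z) = \sigma_\epsilon^2 h(z) h(z^{-1}) = \beta^{-2} \sigma_\epsilon^2 > \sigma_\epsilon^2 ,
$$
which confirms that $\{a_t\}$ is a serially uncorrelated process with variance
$$
\sigma_a^2 = \beta^{-2} \sigma_\epsilon^2 .
$$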
To verify these claims, just notice that 𝑔(𝑧) = 𝛽 −2 𝜎𝜖2 implies that
• 𝑔0 = 𝛽 −2 𝜎𝜖2 , and
• 𝑔𝑗 = 0 for 𝑗 ≠ 0.
Alternatively, if you are uncomfortable with covariance generating functions, note that we can directly calculate $\sigma_a^2$ from formula (5.3) according to
$$
\sigma_a^2 = \sigma_\epsilon^2 \left[ 1 + (\beta - \beta^{-1})^2 \sum_{j=0}^\infty \beta^{2j} \right] = \beta^{-2} \sigma_\epsilon^2 .
$$
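This calculation is easy to confirm numerically. The sketch below builds the coefficients that formula (5.3) places on $\epsilon_{t+1}, \epsilon_t, \epsilon_{t-1}, \ldots$, sums their squares, and also checks that the implied autocovariances of $\{a_t\}$ at nonzero lags vanish. The values of `β` and `σ_ε` and the truncation point `J` are illustrative assumptions.

```python
import numpy as np

β, σ_ε = 0.95, 1.0     # illustrative values (assumed)
J = 2000               # truncation point for the infinite sums

# Coefficients of a_{t+1} on ε_{t+1}, ε_t, ε_{t-1}, ... from (5.3)
h = np.empty(J + 1)
h[0] = 1.0
h[1:] = (β - 1 / β) * β**np.arange(J)

# Variance of a_t implied by (5.3)
σ_a2 = σ_ε**2 * np.sum(h**2)

# Autocovariances of {a_t} at lags 1, ..., 5
autocov = np.array([σ_ε**2 * np.dot(h[:-k], h[k:]) for k in range(1, 6)])
```

With these values `σ_a2` agrees with $\beta^{-2} \sigma_\epsilon^2$ and the entries of `autocov` are zero up to rounding error.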
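We can also use the Kalman filter to obtain representation (5.2) from representation (5.1).
Thus, from equations associated with the Kalman filter, it can be verified that the steady-state Kalman gain is $K = \beta^2$ and the steady-state conditional covariance is
$$
\Sigma = E[(\epsilon_t - \hat \epsilon_t)^2 | y_{t-1}, y_{t-2}, \ldots] = (1 - \beta^2) \sigma_\epsilon^2 .
$$
In a little more detail, let $z_t = y_t - y_{t-1}$ and form the state-space representation
$$
\begin{aligned}
\epsilon_{t+1} &= 0 \epsilon_t + \sigma_\epsilon v_{t+1} \\
z_{t+1} &= -\beta^{-1} \epsilon_t + \sigma_\epsilon v_{t+1}
\end{aligned}
$$
where $\{v_t\}$ is an i.i.d. sequence of standardized normal random variables.
The associated steady-state innovations representation is
$$
\begin{aligned}
\hat \epsilon_{t+1} &= 0 \hat \epsilon_t + K a_{t+1} \\
z_{t+1} &= -\beta a_t + a_{t+1}
\end{aligned}
$$
By applying formulas for the steady-state Kalman filter, it is possible to verify by hand that $K = \beta^2$, $\sigma_a^2 = \beta^{-2} \sigma_\epsilon^2$ (which equals $\beta^{-2}$ under the normalization $\sigma_\epsilon = 1$), and $\Sigma = (1 - \beta^2) \sigma_\epsilon^2$.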
Alternatively, we can obtain these formulas via the classical filtering theory described in this lecture.
Representation (5.1) is cast in terms of a news shock 𝜖𝑡+1 that represents a shock to nonfinancial income coming from
taxes, transfers, and other random sources of income changes known to a well-informed person who perhaps has all sorts
of information about the income process.
Representation (5.2) for the same income process is driven by shocks 𝑎𝑡 that contain less information than the news shock
𝜖𝑡 .
Representation (5.2) is called the innovations representation for the {𝑦𝑡 − 𝑦𝑡−1 } process.
It is cast in terms of what time series statisticians call the innovation or fundamental shock that emerges from apply-
ing the theory of optimally predicting nonfinancial income based solely on the information in past levels of growth in
nonfinancial income.
Fundamental for the 𝑦𝑡 process means that the shock 𝑎𝑡 can be expressed as a square-summable linear combination of
𝑦𝑡 , 𝑦𝑡−1 , ….
The shock 𝜖𝑡 is not fundamental because it has more information about the future of the {𝑦𝑡 − 𝑦𝑡−1 } process than is
contained in 𝑎𝑡 .
Representation (5.3) reveals the important fact that the original shock $\epsilon_t$ contains more information about future $y$’s than is contained in the semi-infinite history $y^t = [y_t, y_{t-1}, \ldots]$.
Staring at representation (5.3) for $a_{t+1}$ shows that it consists both of new news $\epsilon_{t+1}$ as well as a long moving average $(\beta - \beta^{-1}) \sum_{j=0}^\infty \beta^j \epsilon_{t-j}$ of old news.
The more informative representation (5.1) asserts that a shock $\epsilon_t$ results in an impulse response to nonfinancial income of $\epsilon_t$ times the sequence
$$
1, \; 1 - \beta^{-1}, \; 1 - \beta^{-1}, \; \ldots
$$
so that a shock that increases nonfinancial income 𝑦𝑡 by 𝜖𝑡 at time 𝑡 is followed by a change in future 𝑦 of 𝜖𝑡 times
1 − 𝛽 −1 < 0 in all subsequent periods.
Because 1 − 𝛽 −1 < 0, this means that a positive shock of 𝜖𝑡 today raises income at time 𝑡 by 𝜖𝑡 and then permanently
decreases all future incomes by (𝛽 −1 − 1)𝜖𝑡 .
This pattern precisely describes the following mental experiment:
• The consumer receives a government transfer of 𝜖𝑡 at time 𝑡.
• The government finances the transfer by issuing a one-period bond on which it pays a gross one-period risk-free
interest rate equal to 𝛽 −1 .
• In each future period, the government rolls over the one-period bond and so continues to borrow 𝜖𝑡 forever.
• The government imposes a lump-sum tax on the consumer in order to pay just the current interest on the original
bond and its rolled over successors.
• Thus, in periods $t+1, t+2, \ldots$, the government levies a lump-sum tax on the consumer of $(\beta^{-1} - 1) \epsilon_t$ that is just enough to pay the interest on the bond.
The present value of the impulse response or moving average coefficients equals $d_\epsilon(\beta) = \frac{0}{1-\beta} = 0$, a fact that we’ll see again below.
Representation (5.2), i.e., the innovations representation, asserts that a shock 𝑎𝑡 results in an impulse response to nonfi-
nancial income of 𝑎𝑡 times
1, 1 − 𝛽, 1 − 𝛽, …
so that a shock that increases income 𝑦𝑡 by 𝑎𝑡 at time 𝑡 can be expected to be followed by an increase in 𝑦𝑡+𝑗 of 𝑎𝑡 times
1 − 𝛽 > 0 in all future periods 𝑗 = 1, 2, ….
The present value of the impulse response or moving average coefficients for representation (5.2) is $d_a(\beta) = \frac{1-\beta^2}{1-\beta} = (1 + \beta)$, another fact that will be important below.
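Both present values can be checked by discounting the two impulse response sequences directly. In the sketch below the discount factor applied at horizon $j$ is $\beta^j$; the value of `β` and the truncation horizon `J` are illustrative assumptions.

```python
import numpy as np

β = 0.95       # illustrative value (assumed)
J = 5000       # truncation horizon for the infinite sums

discount = β**np.arange(J + 1)

# Impulse responses of y_{t+j}, j = 0, 1, ..., to a unit shock at time t
irf_ε = np.full(J + 1, 1 - 1 / β)   # representation (5.1)
irf_ε[0] = 1.0
irf_a = np.full(J + 1, 1 - β)       # representation (5.2)
irf_a[0] = 1.0

pv_ε = np.sum(discount * irf_ε)     # should be (close to) 0
pv_a = np.sum(discount * irf_a)     # should be (close to) 1 + β
```

The zero present value under (5.1) is the numerical counterpart of the bond-financed transfer experiment described above.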
Notice that representation (5.1), namely, $y_{t+1} - y_t = -\beta^{-1} \epsilon_t + \epsilon_{t+1}$, implies the linear difference equation
$$
\epsilon_t = \beta \epsilon_{t+1} - \beta (y_{t+1} - y_t).
$$
This equation shows that $\epsilon_t$ equals $\beta$ times the one-step-backwards error in optimally backcasting $y_t$ based on the semi-infinite future $y_+^t \equiv [y_{t+1}, y_{t+2}, \ldots]$ via the optimal backcasting formula
$$
E[y_t | y_+^t] = (1 - \beta) \sum_{j=0}^\infty \beta^j y_{t+j+1}
$$
Thus, $\epsilon_t$ exactly reveals the gap between $y_t$ and $E[y_t | y_+^t]$.
Next notice that representation (5.2), namely, $y_{t+1} - y_t = -\beta a_t + a_{t+1}$, implies the linear difference equation
$$a_{t+1} = \beta a_t + y_{t+1} - y_t.$$
Solving this equation backward establishes that the one-step-prediction error $a_{t+1}$ is
$$a_{t+1} = y_{t+1} - (1-\beta) \sum_{j=0}^{\infty} \beta^j y_{t-j}.$$
Here the information set is $y^t = [y_t, y_{t-1}, \ldots]$ and a one-step-ahead optimal prediction is
$$E[y_{t+1} \mid y^t] = (1-\beta) \sum_{j=0}^{\infty} \beta^j y_{t-j}$$
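The prediction formula can be illustrated with a short simulation. This is only a sketch: it assumes β = 0.95, unit-variance innovations, and (purely for convenience) that income and innovations are zero before date 0, so that the infinite sum over past incomes has finitely many nonzero terms.

```python
import numpy as np

β = 0.95                   # assumed discount factor
T = 200
rng = np.random.default_rng(1)

# Simulate the innovations representation y_{t+1} - y_t = -β a_t + a_{t+1},
# assuming (for illustration only) that y and a are zero before date 0, so y_0 = a_0
a = rng.standard_normal(T)
y = np.empty(T)
y[0] = a[0]
for t in range(T - 1):
    y[t + 1] = y[t] + a[t + 1] - β * a[t]

# One-step-ahead predictions E[y_{t+1} | y^t] = (1 - β) sum_j β^j y_{t-j}
forecast = np.empty(T - 1)
for t in range(T - 1):
    past = y[t::-1]                                  # y_t, y_{t-1}, ..., y_0
    forecast[t] = (1 - β) * np.sum(β ** np.arange(t + 1) * past)

a_hat = y[1:] - forecast               # prediction errors a_{t+1}
print(np.max(np.abs(a_hat - a[1:])))   # numerically zero under the zero pre-sample assumption
```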
When we computed optimal consumption-saving policies for our two representations (5.1) and (5.2) by using formulas
obtained with the difference equation approach described in quantecon lecture, we obtained:
for a consumer having the information assumed in the news representation (5.1):
$$\begin{aligned} c_{t+1} - c_t &= 0 \\ b_{t+1} - b_t &= -\beta^{-1} \epsilon_t \end{aligned}$$
for a consumer having the more limited information associated with the innovations representation (5.2):
$$\begin{aligned} c_{t+1} - c_t &= (1-\beta^2) a_{t+1} \\ b_{t+1} - b_t &= -\beta a_t \end{aligned}$$
These formulas agree with outcomes from Python programs below that deploy state-space representations and dynamic
programming.
Evidently, although they receive exactly the same histories of nonfinancial income, the two consumers behave differently.
The better informed consumer who has the information sets associated with representation (5.1) responds to each shock
𝜖𝑡+1 by leaving his consumption unaltered and saving all of 𝜖𝑡+1 in anticipation of the permanently increased taxes that he
will bear in order to service the permanent interest payments on the risk-free bonds that the government has presumably
issued to pay for the one-time addition 𝜖𝑡+1 to his time 𝑡 + 1 nonfinancial income.
The less well informed consumer who has information sets associated with representation (5.2) responds to a shock $a_{t+1}$ by increasing his consumption by what he perceives to be the permanent part of the increase in income and by increasing his saving by what he perceives to be the temporary part.
The behavior of the better informed consumer sharply illustrates the behavior predicted in a classic Ricardian equivalence
experiment.
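We now cast our representations (5.1) and (5.2), respectively, in terms of the following two state space systems:
$$\begin{aligned}
\begin{bmatrix} y_{t+1} \\ \epsilon_{t+1} \end{bmatrix} &= \begin{bmatrix} 1 & -\beta^{-1} \\ 0 & 0 \end{bmatrix} \begin{bmatrix} y_t \\ \epsilon_t \end{bmatrix} + \begin{bmatrix} \sigma_\epsilon \\ \sigma_\epsilon \end{bmatrix} v_{t+1} \\
y_t &= \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} y_t \\ \epsilon_t \end{bmatrix}
\end{aligned} \tag{5.4}$$
and
$$\begin{aligned}
\begin{bmatrix} y_{t+1} \\ a_{t+1} \end{bmatrix} &= \begin{bmatrix} 1 & -\beta \\ 0 & 0 \end{bmatrix} \begin{bmatrix} y_t \\ a_t \end{bmatrix} + \begin{bmatrix} \sigma_a \\ \sigma_a \end{bmatrix} u_{t+1} \\
y_t &= \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} y_t \\ a_t \end{bmatrix}
\end{aligned} \tag{5.5}$$
where $\{v_t\}$ and $\{u_t\}$ are both i.i.d. sequences of univariate standardized normal random variables.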
These two alternative income processes are ready to be used in the framework presented in the section “Comparison with the Difference Equation Approach” in this quantecon lecture.
All the code that we shall use below is presented in that lecture.
5.9 Computations
We shall use Python to form two state-space representations (5.4) and (5.5).
We set the following parameter values 𝜎𝜖 = 1, 𝜎𝑎 = 𝛽 −1 𝜎𝜖 = 𝛽 −1 where 𝛽 is the same value as the discount factor in
the household’s problem in the LQ savings problem in the lecture.
For these two representations, we use the code in this lecture to
• compute optimal decision rules for 𝑐𝑡 , 𝑏𝑡 for the two types of consumers associated with our two representations
of nonfinancial income
• use the value function objects $P, d$ returned by the code to compute optimal values for the two representations when evaluated at the initial condition $x_0 = \begin{bmatrix} 10 \\ 0 \end{bmatrix}$ for each representation.
• create instances of the LinearStateSpace class for the two representations of the {𝑦𝑡 } process and use them to
obtain impulse response functions of 𝑐𝑡 and 𝑏𝑡 to the respective shocks 𝜖𝑡 and 𝑎𝑡 for the two representations.
• run simulations of {𝑦𝑡 , 𝑐𝑡 , 𝑏𝑡 } of length 𝑇 under both of the representations
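We formulate the problem:
$$\min \sum_{t=0}^{\infty} \beta^t (c_t - \gamma)^2$$
subject to
$$\begin{bmatrix} y_{t+1} \\ \epsilon_{t+1} \\ b_{t+1} \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & -\beta^{-1} & 0 \\ 0 & 0 & 0 \\ -(1+r) & 0 & 1+r \end{bmatrix}}_{\equiv A_1} \begin{bmatrix} y_t \\ \epsilon_t \\ b_t \end{bmatrix} + \underbrace{\begin{bmatrix} 0 \\ 0 \\ 1+r \end{bmatrix}}_{\equiv B_1} \begin{bmatrix} c_t \end{bmatrix} + \underbrace{\begin{bmatrix} \sigma_\epsilon \\ \sigma_\epsilon \\ 0 \end{bmatrix}}_{\equiv C_1} \nu_{t+1},$$
and
$$\begin{bmatrix} y_{t+1} \\ a_{t+1} \\ b_{t+1} \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & -\beta & 0 \\ 0 & 0 & 0 \\ -(1+r) & 0 & 1+r \end{bmatrix}}_{\equiv A_2} \begin{bmatrix} y_t \\ a_t \\ b_t \end{bmatrix} + \underbrace{\begin{bmatrix} 0 \\ 0 \\ 1+r \end{bmatrix}}_{\equiv B_2} \begin{bmatrix} c_t \end{bmatrix} + \underbrace{\begin{bmatrix} \sigma_a \\ \sigma_a \\ 0 \end{bmatrix}}_{\equiv C_2} u_{t+1}.$$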
import numpy as np
import quantecon as qe
import matplotlib.pyplot as plt
# Set parameters
β, σϵ = 0.95, 1
σa = σϵ / β
R = 1 / β
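Evidently, optimal consumption and debt decision rules for the consumer having news representation (5.1) are
$$\begin{aligned}
c_t^* &= y_t - \epsilon_t - (1-\beta) b_t, \\
b_{t+1}^* &= \beta^{-1} c_t^* + \beta^{-1} b_t - \beta^{-1} y_t \\
&= \beta^{-1} y_t - \beta^{-1} \epsilon_t - (\beta^{-1} - 1) b_t + \beta^{-1} b_t - \beta^{-1} y_t \\
&= b_t - \beta^{-1} \epsilon_t .
\end{aligned}$$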
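A short simulation can confirm that these rules imply constant consumption and the stated law of motion for debt. This is a sketch under assumed values (β = 0.95, σ_ϵ = 1, a hypothetical initial income of 100 and zero initial debt), not the lecture's own code.

```python
import numpy as np

β, σϵ = 0.95, 1.0             # assumed parameter values
T = 50
rng = np.random.default_rng(2)

ϵ = σϵ * rng.standard_normal(T)
y = np.empty(T)
b = np.empty(T)
c = np.empty(T)
y[0], b[0] = 100.0, 0.0       # hypothetical initial income and debt

for t in range(T):
    c[t] = y[t] - ϵ[t] - (1 - β) * b[t]            # consumption rule
    if t < T - 1:
        b[t + 1] = (c[t] + b[t] - y[t]) / β        # budget constraint
        y[t + 1] = y[t] - ϵ[t] / β + ϵ[t + 1]      # income process (5.1)

print(np.ptp(c))                                   # spread of consumption, numerically zero
print(np.max(np.abs(np.diff(b) + ϵ[:-1] / β)))     # b_{t+1} - b_t = -ϵ_t / β holds
```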
# Innovations representation
ALQ2 = np.array([[1, -β, 0],
[0, 0, 0],
[-R, 0, R]])
BLQ2 = np.array([[0, 0, R]]).T
CLQ2 = np.array([[σa, σa, 0]]).T
-F2
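For a consumer having access only to the information associated with the innovations representation (5.2), the optimal decision rules, which follow from the budget constraint together with $b_{t+1} - b_t = -\beta a_t$, are $c_t^* = y_t - \beta^2 a_t - (1-\beta) b_t$ and $b_{t+1}^* = b_t - \beta a_t$.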
Now we construct two Linear State Space models that emerge from using optimal policies of the form 𝑢𝑡 = −𝐹 𝑥𝑡 .
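Take the more informative original representation (5.1) as an example:
$$\begin{bmatrix} y_{t+1} \\ \epsilon_{t+1} \\ b_{t+1} \end{bmatrix} = (A_1 - B_1 F_1) \begin{bmatrix} y_t \\ \epsilon_t \\ b_t \end{bmatrix} + C_1 \nu_{t+1}$$
$$\begin{bmatrix} c_t \\ b_t \end{bmatrix} = \begin{bmatrix} -F_1 \\ S_b \end{bmatrix} \begin{bmatrix} y_t \\ \epsilon_t \\ b_t \end{bmatrix}$$
To have the Linear State Space model be of an innovations representation form (5.2), we can simply replace the corresponding matrices.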
The above two impulse response functions show that when the consumer has the information assumed in the more informative representation (5.1), his response to receiving a positive shock of $\epsilon_t$ is to leave his consumption unchanged and to save the entire amount of his extra income and then forever roll over the extra bonds that he holds.
To see this, notice that starting from next period on, his debt permanently decreases by $\beta^{-1}$.
plt.title("innovations representation")
plt.plot(range(J), c_res2 / σa, label="c impulse response function")
plt.plot(range(J), b_res2 / σa, label="b impulse response function")
plt.plot([0, J-1], [0, 0], '--', color='k')
plt.legend()
The above impulse responses show that when the consumer has only the information that is assumed to be available
under the innovations representation (5.2) for {𝑦𝑡 − 𝑦𝑡−1 }, he responds to a positive 𝑎𝑡 by permanently increasing his
consumption.
He accomplishes this by consuming a fraction (1 − 𝛽 2 ) of the increment 𝑎𝑡 to his nonfinancial income and saving the
rest, thereby lowering 𝑏𝑡+1 in order to finance the permanent increment in his consumption.
The preceding computations confirm what we had derived earlier using paper and pencil.
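The innovations-representation impulse responses can also be traced out by hand with a closed-loop matrix. The sketch below assumes β = 0.95 and the decision rule $c_t = y_t - \beta^2 a_t - (1-\beta) b_t$, which is consistent with the budget constraint and $b_{t+1} - b_t = -\beta a_t$; the matrix `F2` here is written down from that rule rather than computed by an LQ solver, and responses are per unit of the shock.

```python
import numpy as np

β = 0.95                       # assumed discount factor
R = 1 / β
J = 20                         # hypothetical number of periods to trace out

# State x_t = [y_t, a_t, b_t]'; law of motion from the innovations representation
A2 = np.array([[1, -β, 0],
               [0,  0, 0],
               [-R, 0, R]])
B2 = np.array([[0, 0, R]]).T

# Assumed decision rule c_t = -F2 x_t = y_t - β^2 a_t - (1 - β) b_t
F2 = np.array([[-1, β**2, 1 - β]])

A_cl = A2 - B2 @ F2            # closed-loop transition matrix
x = np.array([1.0, 1.0, 0.0])  # unit shock a_0 = 1 hits y_0 and a_0, with b_0 = 0

c_res, b_res = np.empty(J), np.empty(J)
for j in range(J):
    c_res[j] = (-F2 @ x)[0]
    b_res[j] = x[2]
    x = A_cl @ x

print(c_res[:3])   # each entry equals 1 - β^2
print(b_res[:3])   # 0, then -β from period 1 onward
```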
Now let’s simulate some paths of consumption and debt for our two types of consumers while always presenting both
types with the same {𝑦𝑡 } path.
x1, y1 = LSS1.simulate(ts_length=T)
plt.plot(range(T), y1[0, :], label="c")
plt.plot(range(T), x1[2, :], label="b")
plt.plot(range(T), x1[0, :], label="y")
plt.title("more informative representation")
plt.legend()
x2, y2 = LSS2.simulate(ts_length=T)
plt.plot(range(T), y2[0, :], label="c")
plt.plot(range(T), x2[2, :], label="b")
plt.plot(range(T), x2[0, :], label="y")
plt.title("innovations representation")
plt.legend()
We now form a single $\{y_t\}_{t=0}^{T}$ realization that we will use to simulate decisions associated with our two types of consumer.
We accomplish this in the following steps.
1. We form a $\{y_t, \epsilon_t\}$ realization by drawing a long simulation of $\{\epsilon_t\}_{t=0}^{T}$, where $T$ is a big integer, $\epsilon_t = \sigma_\epsilon v_t$, $v_t$ is a standard normal scalar, $y_0 = 100$, and
$$y_{t+1} - y_t = -\beta^{-1} \epsilon_t + \epsilon_{t+1}.$$
2. We take the $\{y_t\}$ realization generated in step 1 and form an innovation process $\{a_t\}$ from the formulas
$$a_0 = 0$$
$$a_t = \sum_{j=0}^{t-1} \beta^j (y_{t-j} - y_{t-j-1}) + \beta^t a_0, \quad t \geq 1$$
3. We throw away the first $S$ observations and form a sample $\{y_t, \epsilon_t, a_t\}_{S+1}^{T}$ as the realization that we’ll use in the following steps.
4. We use the step 3 realization to evaluate and simulate the decision rules for 𝑐𝑡 , 𝑏𝑡 that Python has computed for
us above.
The above steps implement the experiment of comparing decisions made by two consumers having identical incomes at
each date but at each date having different information about their future incomes.
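The four steps can be sketched in plain NumPy as follows. This is an illustration only: the values of `T` and `S`, the zero initial debt after burn-in, and the innovations consumer's rule $c_t = y_t - \beta^2 a_t - (1-\beta) b_t$ are assumptions made here, and the decision rules are written out directly rather than taken from an LQ solver.

```python
import numpy as np

β, σϵ = 0.95, 1.0              # assumed parameter values
T, S = 1_000, 200              # hypothetical sample length and burn-in
rng = np.random.default_rng(3)

# Step 1: simulate {ϵ_t} and {y_t} from the news representation (5.1)
ϵ = σϵ * rng.standard_normal(T + 1)
y = np.empty(T + 1)
y[0] = 100.0
for t in range(T):
    y[t + 1] = y[t] - ϵ[t] / β + ϵ[t + 1]

# Step 2: build {a_t} recursively from a_{t+1} = β a_t + y_{t+1} - y_t with a_0 = 0
a = np.zeros(T + 1)
for t in range(T):
    a[t + 1] = β * a[t] + y[t + 1] - y[t]

# Step 3: discard the first S observations
y_s, ϵ_s, a_s = y[S + 1:], ϵ[S + 1:], a[S + 1:]

# Step 4: feed the common income path to both decision rules
n = len(y_s)
b1, b2 = np.zeros(n), np.zeros(n)     # both consumers start with zero debt (an assumption)
c1, c2 = np.empty(n), np.empty(n)
for t in range(n):
    c1[t] = y_s[t] - ϵ_s[t] - (1 - β) * b1[t]           # news consumer
    c2[t] = y_s[t] - β**2 * a_s[t] - (1 - β) * b2[t]    # innovations consumer
    if t < n - 1:
        b1[t + 1] = (c1[t] + b1[t] - y_s[t]) / β
        b2[t + 1] = (c2[t] + b2[t] - y_s[t]) / β
```

Under these assumptions `c1` is constant while `c2` changes by $(1-\beta^2) a_{t+1}$ each period, even though both consumers face the same income path `y_s`.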
Here we use formula (5.3) above to compute 𝑎𝑡+1 as a function of the history 𝜖𝑡+1 , 𝜖𝑡 , 𝜖𝑡−1 , …
Thus, we compute
We can verify that we recover the same {𝑎𝑡 } sequence computed earlier.
This quantecon lecture contains another example of a shock-invertibility issue that is endemic to the LQ permanent income
or consumption smoothing model.
The technical issue discussed there is ultimately the source of the shock-invertibility issues discussed by Eric Leeper,
Todd Walker, and Susan Yang [Leeper et al., 2013] in their analysis of fiscal foresight.
SIX
6.1 Overview
import numpy as np
import quantecon as qe
import matplotlib.pyplot as plt
import scipy.linalg as la
6.2 Background
We’ll study a complete markets model adapted to a setting with a continuous Markov state like that in the first lecture on
the permanent income model.
In that model
• a consumer can trade only a single risk-free one-period bond bearing gross one-period risk-free interest rate equal
to 𝛽 −1 .
• a consumer’s exogenous nonfinancial income is governed by a linear state space model driven by Gaussian shocks,
the kind of model studied in an earlier lecture about linear state space models.
Let’s write down a complete markets counterpart of that model.
Suppose that nonfinancial income is governed by the state space system
where 𝜙(⋅ | 𝜇, Σ) is a multivariate Gaussian distribution with mean vector 𝜇 and covariance matrix Σ.
With the pricing kernel $q_{t+1}(x_{t+1} \mid x_t)$ in hand, we can price claims to time $t+1$ consumption that pay off when $x_{t+1} \in S$ at time $t+1$:
$$\int_S q_{t+1}(x_{t+1} \mid x_t) \, d x_{t+1}$$
where $S$ is a subset of $\mathbb{R}^n$.
The price ∫𝑆 𝑞𝑡+1 (𝑥𝑡+1 | 𝑥𝑡 )𝑑𝑥𝑡+1 of such a claim depends on state 𝑥𝑡 because the prices of the 𝑥𝑡+1 -contingent securities
depend on 𝑥𝑡 through the pricing kernel 𝑞(𝑥𝑡+1 | 𝑥𝑡 ).
Let 𝑏(𝑥𝑡+1 ) be a vector of state-contingent debt due at 𝑡 + 1 as a function of the 𝑡 + 1 state 𝑥𝑡+1 .
Using the pricing kernel assumed in (6.1), the value at 𝑡 of 𝑏(𝑥𝑡+1 ) is evidently
In our complete markets setting, the consumer faces a sequence of budget constraints
or
which verifies that 𝛽𝐸𝑡 𝑏𝑡+1 is the value of time 𝑡 + 1 state-contingent claims on time 𝑡 + 1 consumption issued by the
consumer at time 𝑡
We can solve the time $t$ budget constraint forward to obtain
$$b_t = \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j (y_{t+j} - c_{t+j})$$
In the incomplete markets version of the model, we assumed that $u(c_t) = -(c_t - \gamma)^2$, so that the above utility functional became
$$-\sum_{t=0}^{\infty} \beta^t (c_t - \gamma)^2, \quad 0 < \beta < 1$$
But in the complete markets version, it is tractable to assume a more general utility function that satisfies 𝑢′ > 0 and
𝑢″ < 0.
First-order conditions for the consumer’s problem with complete markets and our assumption about Arrow securities
prices are
or
$$b_t = S_y (I - \beta A)^{-1} x_t - \frac{1}{1-\beta} \bar c \tag{6.2}$$
where $\bar c$ satisfies
$$\bar b_0 = S_y (I - \beta A)^{-1} x_0 - \frac{1}{1-\beta} \bar c \tag{6.3}$$
where $\bar b_0$ is an initial level of the consumer’s debt due at time $t = 0$, specified as a parameter of the problem.
Thus, in the complete markets version of the consumption-smoothing model, $c_t = \bar c, \forall t \geq 0$ is determined by (6.3) and the consumer’s debt is the fixed function of the state $x_t$ described by (6.2).
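Formulas (6.2) and (6.3) translate directly into a few lines of NumPy. The sketch below uses a hypothetical two-dimensional state $x_t = [1, y_t]'$ with AR(1) income and assumed parameter values; it is not the lecture's own example, whose state vector and parameters are set elsewhere.

```python
import numpy as np

β = 0.95                       # assumed discount factor
α, ρ, σ = 10.0, 0.9, 1.0       # hypothetical AR(1) income parameters
b0_bar = 0.0                   # assumed initial debt

# State x_t = [1, y_t]' with y_{t+1} = α + ρ y_t + σ w_{t+1}
A = np.array([[1.0, 0.0],
              [α,   ρ]])
C = np.array([0.0, σ])
S_y = np.array([0.0, 1.0])     # selects income from the state

rm = np.linalg.inv(np.eye(2) - β * A)         # (I - βA)^{-1}

x0 = np.array([1.0, α / (1 - ρ)])             # start income at its mean
c_bar = (1 - β) * (S_y @ rm @ x0 - b0_bar)    # solve (6.3) for c_bar

# Simulate the state and read debt off (6.2)
T = 80
rng = np.random.default_rng(4)
x_hist = np.empty((2, T))
x_hist[:, 0] = x0
for t in range(T - 1):
    x_hist[:, t + 1] = A @ x_hist[:, t] + C * rng.standard_normal()

b_hist = S_y @ rm @ x_hist - c_bar / (1 - β)
print(c_bar, b_hist[0])        # b_hist[0] reproduces the assumed initial debt
```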
Please recall that in the LQ permanent income model studied in permanent income model, the state is 𝑥𝑡 , 𝑏𝑡 , where 𝑏𝑡 is
a complicated function of past state vectors 𝑥𝑡−𝑗 .
Notice that in contrast to that incomplete markets model, at time 𝑡 the state vector is 𝑥𝑡 alone in our complete markets
model.
Here’s an example that shows how in this setting the availability of insurance against fluctuating nonfinancial income
allows the consumer completely to smooth consumption across time and across states of the world
# Debt
x_hist, y_hist = lss.simulate(T)
b_hist = np.squeeze(S_y @ rm @ x_hist - cbar / (1 - β))
# Define parameters
N_simul = 80
α, ρ1, ρ2 = 10.0, 0.9, 0.0
σ = 1.0
# Consumption plots
ax[0].set_title('Consumption and income')
ax[0].plot(np.arange(N_simul), c_hist_com, label='consumption')
ax[0].plot(np.arange(N_simul), y_hist_com, label='income', alpha=.6, linestyle='--')
ax[0].legend()
ax[0].set_xlabel('Periods')
ax[0].set_ylim([80, 120])
# Debt plots
ax[1].set_title('Debt and income')
ax[1].plot(np.arange(N_simul), b_hist_com, label='debt')
ax[1].plot(np.arange(N_simul), y_hist_com, label='Income', alpha=.6, linestyle='--')
ax[1].legend()
ax[1].axhline(0, color='k')
ax[1].set_xlabel('Periods')
plt.show()
The incomplete markets version of the model with nonfinancial income being governed by a linear state space system is
described in permanent income model.
In that incomplete markets setting, consumption follows a random walk and the consumer’s debt follows a process with
a unit root.
We now turn to a finite-state Markov version of the model in which the consumer’s nonfinancial income is an exact
function of a Markov state that takes one of 𝑁 values.
We’ll start with a setting in which in each version of our consumption-smoothing model, nonfinancial income is governed
by a two-state Markov chain (it’s easy to generalize this to an 𝑁 state Markov chain).
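In particular, the state $s_t \in \{1, 2\}$ follows a Markov chain with transition probability matrix
$$P_{ij} = \mathbb{P}\{s_{t+1} = j \mid s_t = i\}$$
Nonfinancial income obeys
$$y_t = \begin{cases} \bar y_1 & \text{if } s_t = 1 \\ \bar y_2 & \text{if } s_t = 2 \end{cases}$$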
Our complete and incomplete markets models differ in how thoroughly the market structure allows a consumer to transfer
resources across time and Markov states, there being more transfer opportunities in the complete markets setting than in
the incomplete markets setting.
Watch how these differences in opportunities affect
• how smooth consumption is across time and Markov states
• how the consumer chooses to make his levels of indebtedness behave over time and across Markov states
At each date 𝑡 ≥ 0, the consumer trades a full array of one-period ahead Arrow securities.
We assume that prices of these securities are exogenous to the consumer.
Exogenous means that they are unaffected by the consumer’s decisions.
In Markov state 𝑠𝑡 at time 𝑡, one unit of consumption in state 𝑠𝑡+1 at time 𝑡 + 1 costs 𝑞(𝑠𝑡+1 | 𝑠𝑡 ) units of the time 𝑡
consumption good.
The prices 𝑞(𝑠𝑡+1 | 𝑠𝑡 ) are given and can be organized into a matrix 𝑄 with 𝑄𝑖𝑗 = 𝑞(𝑗|𝑖)
At time 𝑡 = 0, the consumer starts with an inherited level of debt due at time 0 of 𝑏0 units of time 0 consumption goods.
The consumer’s budget constraint at 𝑡 ≥ 0 in Markov state 𝑠𝑡 is
where 𝑏𝑡 is the consumer’s one-period debt that falls due at time 𝑡 and 𝑏𝑡+1 (𝑗 | 𝑠𝑡 ) are the consumer’s time 𝑡 sales of the
time 𝑡 + 1 consumption good in Markov state 𝑗.
Thus
• 𝑞(𝑗 | 𝑠𝑡 )𝑏𝑡+1 (𝑗 | 𝑠𝑡 ) is a source of time 𝑡 financial income for the consumer in Markov state 𝑠𝑡
• 𝑏𝑡 ≡ 𝑏𝑡 (𝑗 | 𝑠𝑡−1 ) is a source of time 𝑡 expenditures for the consumer when 𝑠𝑡 = 𝑗
Remark: We are ignoring an important technicality here, namely, that the consumer’s choice of $b_{t+1}(j \mid s_t)$ must respect so-called natural debt limits that assure that it is feasible for the consumer to repay debts due even if he consumes zero forevermore. We shall discuss such debt limits in another lecture.
A natural analog of Hall’s assumption that the one-period risk-free gross interest rate is $\beta^{-1}$ is
$$q(j \mid i) = \beta P_{ij} \tag{6.6}$$
To understand how this is a natural analogue, observe that in state 𝑖 it costs ∑𝑗 𝑞(𝑗 | 𝑖) to purchase one unit of consumption
next period for sure, i.e., meaning no matter what Markov state 𝑗 occurs at 𝑡 + 1.
Hence the implied price of a risk-free claim on one unit of consumption next period is
$$\sum_j q(j \mid i) = \sum_j \beta P_{ij} = \beta$$
This confirms the sense in which (6.6) is a natural counterpart to Hall’s assumption that the risk-free one-period gross
interest rate is 𝑅 = 𝛽 −1 .
It is useful to recall that the gross one-period risk-free interest rate is the reciprocal of the price at time $t$ of a risk-free claim on one unit of consumption tomorrow.
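The pricing assumption is easy to check numerically. The sketch below assumes β = 0.96 and the two-state transition matrix that appears as a default in the code later in this lecture; the variable names are illustrative.

```python
import numpy as np

β = 0.96                        # assumed discount factor
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])      # transition matrix used as a default later in the lecture

Q = β * P                       # Arrow security prices q(j | i) = β P_ij
riskfree_price = Q.sum(axis=1)  # cost in state i of one unit of consumption next period for sure
R = 1 / riskfree_price          # implied gross risk-free rate in each state

print(riskfree_price)   # every entry equals β
print(R)                # every entry equals 1 / β
```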
First-order necessary conditions for maximizing the consumer’s expected utility subject to the sequence of budget constraints (6.5) are
$$\beta \frac{u'(c_{t+1})}{u'(c_t)} \mathbb{P}\{s_{t+1} \mid s_t\} = q(s_{t+1} \mid s_t)$$
for all $s_t, s_{t+1}$ or, under our assumption (6.6) about Arrow security prices,
$$c_{t+1} = c_t \tag{6.7}$$
Thus, our consumer sets $c_t = \bar c$ for all $t \geq 0$ for some value $\bar c$ that it is our job now to determine along with values for $b_{t+1}(j \mid s_t = i)$ for $i = 1, 2$ and $j = 1, 2$.
We’ll use a guess and verify method to determine these objects
Guess: We’ll make the plausible guess that
$$b_{t+1}(s_{t+1} = j \mid s_t = i) = b(j), \quad i = 1, 2; \; j = 1, 2 \tag{6.8}$$
so that the amount borrowed today depends only on tomorrow’s Markov state. (Why is this a plausible guess?)
To determine 𝑐,̄ we shall deduce implications of the consumer’s budget constraints in each Markov state today and our
guess (6.8) about the consumer’s debt level choices.
For 𝑡 ≥ 1, these imply
where 𝑏0 is the (exogenous) debt the consumer is assumed to bring into period 0
If we substitute (6.10) into the first equation of (6.9) and rearrange, we discover that
𝑏(1) = 𝑏0 (6.11)
We can then use the second equation of (6.9) to deduce the restriction
The preceding calculations indicate that in the complete markets version of our model, we obtain the following striking
results:
• The consumer chooses to make consumption perfectly constant across time and across Markov states.
• State-contingent debt purchases 𝑏𝑡+1 (𝑠𝑡+1 = 𝑗|𝑠𝑡 = 𝑖) depend only on 𝑗
• If the initial Markov state is 𝑠0 = 𝑗 and initial consumer debt is 𝑏0 , then debt in Markov state 𝑗 satisfies 𝑏(𝑗) = 𝑏0
To summarize what we have achieved up to now, we have computed the constant level of consumption $\bar c$ and indicated how that level depends on the underlying specifications of preferences, Arrow securities prices, the stochastic process of exogenous nonfinancial income, and the initial debt level $b_0$.
• The consumer’s debt neither accumulates, nor decumulates, nor drifts – instead, the debt level each period is an
exact function of the Markov state, so in the two-state Markov case, it switches between two values.
• We have verified guess (6.8).
• When the state 𝑠𝑡 returns to the initial state 𝑠0 , debt returns to the initial debt level.
• Debt levels in all other states depend on virtually all remaining parameters of the model.
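These results can be reproduced by solving one small linear system. The function below is an illustrative sketch, not the lecture's own `consumption_complete()`: the name `solve_complete` and the way the equations are stacked are choices made here. It combines the time-0 budget constraint with the budget constraint in each Markov state under prices $Q = \beta P$.

```python
import numpy as np

def solve_complete(β, P, y, b0, init=0):
    """
    Illustrative solver (not the lecture's own function): returns constant
    consumption c_bar and the state-contingent debt vector b.
    """
    P, y = np.asarray(P, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    Q = β * P                                   # Arrow security prices

    # Unknowns z = [c_bar, b(1), ..., b(n)]
    M = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)

    # Time-0 budget constraint: c_bar + b0 = y(init) + sum_j Q[init, j] b(j)
    M[0, 0] = 1.0
    M[0, 1:] = -Q[init]
    rhs[0] = y[init] - b0

    # Budget constraint in each state i: c_bar + b(i) = y(i) + sum_j Q[i, j] b(j)
    M[1:, 0] = 1.0
    M[1:, 1:] = np.eye(n) - Q
    rhs[1:] = y

    z = np.linalg.solve(M, rhs)
    return z[0], z[1:]

c_bar, b = solve_complete(β=0.96, P=[[0.8, 0.2], [0.4, 0.6]], y=[2, 1.5], b0=3)
print(c_bar, b)     # b[0] equals the initial debt b0
```

Subtracting the time-0 equation from the equation for the initial state shows why debt in the initial state equals $b_0$.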
6.4.2 Code
Here’s some code that, among other things, contains a function called consumption_complete().
This function computes $\{b(i)\}_{i=1}^{N}, \bar c$ as outcomes given a set of parameters for the general case with $N$ Markov states under the assumption of complete markets
class ConsumptionProblem:
"""
The data for a consumption problem, including some default values.
"""
def __init__(self,
β=.96,
y=[2, 1.5],
b0=3,
P=[[.8, .2],
[.4, .6]],
init=0):
"""
Parameters
----------
β : discount factor
y : list containing the two income levels
b0 : debt in period 0 (= initial state debt level)
return s_path
def consumption_complete(cp):
"""
Computes endogenous values for the complete market case.
Parameters
----------
cp : instance of ConsumptionProblem
Returns
-------
Q = β * P
"""
β, P, y, b0, init = cp.β, cp.P, cp.y, cp.b0, cp.init # Unpack
A = np.zeros((n, n))
A[:, 0] = 1
A[1:, 1:] = np.eye(n-1)
c_bar = x[0, 0]
b = x[1:, 0]
return c_bar, b
Parameters
----------
cp : instance of ConsumptionProblem
s_path : the path of states
"""
β, P, y, b0 = cp.β, cp.P, cp.y, cp.b0 # Unpack
N_simul = len(s_path)
# Useful variables
n = len(y)
y.shape = (n, 1)
v = np.linalg.inv(np.eye(n) - β * P) @ y
for i, s in enumerate(s_path):
c_path[i] = (1 - β) * (v - np.full((n, 1), b_path[i]))[s, 0]
b_path[i + 1] = b_path[i] + db[s, 0]
cp = ConsumptionProblem()
c_bar, b = consumption_complete(cp)
np.isclose(c_bar + b[1] - cp.y[1] - (cp.β * cp.P)[1, :] @ b, 0)
True
Below, we’ll take the outcomes produced by this code – in particular the implied consumption and debt paths – and
compare them with outcomes from an incomplete markets model in the spirit of Hall [Hall, 1978]
This is a version of the original model of Hall (1978) in which the consumer’s ability to substitute intertemporally is
constrained by his ability to buy or sell only one security, a risk-free one-period bond bearing a constant gross interest
rate that equals 𝛽 −1 .
Given an initial debt 𝑏0 at time 0, the consumer faces a sequence of budget constraints
𝑐𝑡 + 𝑏𝑡 = 𝑦𝑡 + 𝛽𝑏𝑡+1 , 𝑡≥0
where $\beta$ is the price at time $t$ of a risk-free claim on one unit of consumption at time $t + 1$.
First-order conditions for the consumer’s problem are
which for our finite-state Markov setting is Hall’s (1978) conclusion that consumption follows a random walk.
As we saw in our first lecture on the permanent income model, this leads to
$$b_t = \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j} - (1-\beta)^{-1} c_t \tag{6.14}$$
and
$$c_t = (1-\beta) \left[ \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j} - b_t \right] \tag{6.15}$$
Equation (6.15) expresses $c_t$ as a net interest rate factor $1 - \beta$ times the sum of the expected present value of nonfinancial income $\mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j}$ and financial wealth $-b_t$.
Substituting (6.15) into the one-period budget constraint and rearranging leads to
$$b_{t+1} - b_t = \beta^{-1} \left[ (1-\beta) \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j} - y_t \right] \tag{6.16}$$
Now let’s calculate the key term $\mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j}$ in our finite Markov chain setting.
Define the expected discounted present value of non-financial income
$$v_t := \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j y_{t+j}$$
which satisfies the recursion
$$v_t = y_t + \beta \mathbb{E}_t v_{t+1}$$
In our two-state Markov chain setting, 𝑣𝑡 = 𝑣(1) when 𝑠𝑡 = 1 and 𝑣𝑡 = 𝑣(2) when 𝑠𝑡 = 2.
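Therefore, we can write our Bellman equation as
$$\begin{aligned} v(1) &= y(1) + \beta P_{11} v(1) + \beta P_{12} v(2) \\ v(2) &= y(2) + \beta P_{21} v(1) + \beta P_{22} v(2) \end{aligned}$$
or
$$\vec v = \vec y + \beta P \vec v$$
where $\vec v = \begin{bmatrix} v(1) \\ v(2) \end{bmatrix}$ and $\vec y = \begin{bmatrix} y(1) \\ y(2) \end{bmatrix}$.
We can also write the last expression as
$$\vec v = (I - \beta P)^{-1} \vec y$$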
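As a quick numerical illustration, the vector of present values can be computed by solving the linear system and compared with a truncated version of the infinite sum. The parameter values below are the defaults that appear in this lecture's code; the truncation at 2,000 terms is an arbitrary choice made here.

```python
import numpy as np

β = 0.96
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])
y = np.array([2.0, 1.5])

# Solve (I - βP) v = y rather than forming the inverse explicitly
v = np.linalg.solve(np.eye(2) - β * P, y)

# Check against a truncated version of the infinite sum sum_j β^j P^j y
v_sum, term = np.zeros(2), y.copy()
for j in range(2_000):
    v_sum += β**j * term
    term = P @ term

print(v, v_sum)   # the two vectors agree up to truncation error
```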
In our finite Markov chain setting, from expression (6.15), consumption at date 𝑡 when debt is 𝑏𝑡 and the Markov state
today is 𝑠𝑡 = 𝑖 is evidently
In contrast to outcomes in the complete markets model, in the incomplete markets model
• consumption drifts over time as a random walk; the level of consumption at time 𝑡 depends on the level of debt that
the consumer brings into the period as well as the expected discounted present value of nonfinancial income at 𝑡.
• the consumer’s debt drifts upward over time in response to low realizations of nonfinancial income and drifts
downward over time in response to high realizations of nonfinancial income.
• the drift over time in the consumer’s debt and the dependence of current consumption on today’s debt level account
for the drift over time in consumption.
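The drift described in the list above can be seen in a small simulation. This is a sketch, not the lecture's `consumption_incomplete()`: it uses (6.15) for consumption and (6.16) for the change in debt, the default parameter values from the lecture's code, and a plain NumPy draw of the Markov chain in place of a Markov chain library.

```python
import numpy as np

β, b0 = 0.96, 3.0
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])
y = np.array([2.0, 1.5])
N_simul = 80
rng = np.random.default_rng(5)

# Simulate the Markov state with plain NumPy (a stand-in for a Markov chain library)
s_path = np.empty(N_simul, dtype=int)
s_path[0] = 0
for t in range(N_simul - 1):
    s_path[t + 1] = rng.choice(2, p=P[s_path[t]])

v = np.linalg.solve(np.eye(2) - β * P, y)    # expected present value of income by state
db = ((1 - β) * v - y) / β                   # change in debt implied by (6.16)

c_path = np.empty(N_simul)
b_path = np.empty(N_simul + 1)
b_path[0] = b0
for t, s in enumerate(s_path):
    c_path[t] = (1 - β) * (v[s] - b_path[t])     # consumption from (6.15)
    b_path[t + 1] = b_path[t] + db[s]            # debt accumulates with a unit root

y_path = y[s_path]
```

By construction each period satisfies the budget constraint $c_t + b_t = y_t + \beta b_{t+1}$, which is a convenient sanity check on the simulated paths.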
The code above also contains a function called consumption_incomplete() that uses (6.17) and (6.18) to
• simulate paths of 𝑦𝑡 , 𝑐𝑡 , 𝑏𝑡+1
• plot these against values of $\bar c, b(s_1), b(s_2)$ found in a corresponding complete markets economy
Let’s try this, using the same parameters in both complete and incomplete markets economies
cp = ConsumptionProblem()
s_path = cp.simulate()
N_simul = len(s_path)
ax[0].set_title('Consumption paths')
ax[0].plot(np.arange(N_simul), c_path, label='incomplete market')
ax[0].plot(np.arange(N_simul), np.full(N_simul, c_bar),
ax[1].set_title('Debt paths')
ax[1].plot(np.arange(N_simul), debt_path, label='incomplete market')
ax[1].plot(np.arange(N_simul), debt_complete[s_path],
label='complete market')
ax[1].plot(np.arange(N_simul), y_path, label='income', alpha=.6, ls='--')
ax[1].legend()
ax[1].axhline(0, color='k', ls='--')
ax[1].set_xlabel('Periods')
plt.show()
In the graph on the left, for the same sample path of nonfinancial income 𝑦𝑡 , notice that
• consumption is constant when there are complete markets, but takes a random walk in the incomplete markets
version of the model.
• the consumer’s debt oscillates between two values that are functions of the Markov state in the complete markets
model, while the consumer’s debt drifts in a “unit root” fashion in the incomplete markets economy.
6.5.3 A sequel
In tax smoothing with complete and incomplete markets, we reinterpret the mathematics and Python code presented in this
lecture in order to construct tax-smoothing models in the incomplete markets tradition of Barro [Barro, 1979] as well as
in the complete markets tradition of Lucas and Stokey [Lucas and Stokey, 1983].
SEVEN
7.1 Overview
This lecture describes tax-smoothing models that are counterparts to consumption-smoothing models in Consumption
Smoothing with Complete and Incomplete Markets.
• one is in the complete markets tradition of Lucas and Stokey [Lucas and Stokey, 1983].
• the other is in the incomplete markets tradition of Barro [Barro, 1979].
Complete markets allow a government to buy or sell claims contingent on all possible Markov states.
Incomplete markets allow a government to buy or sell only a limited set of securities, often only a single risk-free security.
Barro [Barro, 1979] worked in an incomplete markets tradition by assuming that the only asset that can be traded is a
risk-free one period bond.
In his consumption-smoothing model, Hall [Hall, 1978] had assumed an exogenous stochastic process of nonfinancial
income and an exogenous gross interest rate on one period risk-free debt that equals 𝛽 −1 , where 𝛽 ∈ (0, 1) is also a
consumer’s intertemporal discount factor.
Barro [Barro, 1979] made an analogous assumption about the risk-free interest rate in a tax-smoothing model that turns
out to have the same mathematical structure as Hall’s consumption-smoothing model.
To get Barro’s model from Hall’s, all we have to do is to rename variables.
We maintain Hall’s and Barro’s assumption about the interest rate when we describe an incomplete markets version of
our model.
In addition, we extend their assumption about the interest rate to an appropriate counterpart to create a “complete markets”
model in the style of Lucas and Stokey [Lucas and Stokey, 1983].
For each version of a consumption-smoothing model, a tax-smoothing counterpart can be obtained simply by relabeling
• consumption as tax collections
• a consumer’s one-period utility function as a government’s one-period loss function from collecting taxes that impose deadweight welfare losses
• a consumer’s nonfinancial income as a government’s purchases
• a consumer’s debt as a government’s assets
Thus, we can convert the consumption-smoothing models in lecture Consumption Smoothing with Complete and Incomplete Markets into tax-smoothing models by setting $c_t = T_t$, $y_t = G_t$, and $-b_t = a_t$, where $T_t$ is total tax collections, $\{G_t\}$ is an exogenous government expenditures process, and $a_t$ is the government’s holdings of one-period risk-free bonds maturing at the beginning of time $t$.
For elaborations on this theme, please see Optimal Savings II: LQ Techniques and later parts of this lecture.
We’ll spend most of this lecture studying a finite-state Markov specification, but will also treat the linear state space specification.
Link to History
For those who love history, President Thomas Jefferson’s Secretary of Treasury Albert Gallatin (1807) [Gallatin, 1837]
seems to have prescribed policies that come from Barro’s model [Barro, 1979]
Let’s start with some standard imports:
import numpy as np
import quantecon as qe
import matplotlib.pyplot as plt
To exploit the isomorphism between consumption-smoothing and tax-smoothing models, we simply use code from Consumption Smoothing with Complete and Incomplete Markets
7.1.2 Code
class ConsumptionProblem:
    """
    The data for a consumption problem, including some default values.
    """

    def __init__(self,
                 β=.96,
                 y=[2, 1.5],
                 b0=3,
                 P=[[.8, .2],
                    [.4, .6]],
                 init=0):
        """
        Parameters
        ----------
        β : discount factor
        y : list containing the two income levels
        b0 : debt in period 0 (= initial state debt level)
        P : 2x2 transition matrix
        init : index of initial state s0
        """
        self.β = β
        self.y = np.asarray(y)
        self.b0 = b0
        self.P = np.asarray(P)
        self.init = init
return s_path
def consumption_complete(cp):
"""
Computes endogenous values for the complete market case.
Parameters
----------
cp : instance of ConsumptionProblem
Returns
-------
Q = β * P
"""
β, P, y, b0, init = cp.β, cp.P, cp.y, cp.b0, cp.init # Unpack
A = np.zeros((n, n))
A[:, 0] = 1
A[1:, 1:] = np.eye(n-1)
c_bar = x[0, 0]
b = x[1:, 0]
return c_bar, b
Parameters
----------
cp : instance of ConsumptionProblem
s_path : the path of states
"""
β, P, y, b0 = cp.β, cp.P, cp.y, cp.b0 # Unpack
N_simul = len(s_path)
# Useful variables
n = len(y)
y.shape = (n, 1)
v = np.linalg.inv(np.eye(n) - β * P) @ y
for i, s in enumerate(s_path):
c_path[i] = (1 - β) * (v - np.full((n, 1), b_path[i]))[s, 0]
b_path[i + 1] = b_path[i] + db[s, 0]
The code above also contains a function called consumption_incomplete() that uses (6.17) and (6.18) to
• simulate paths of 𝑦𝑡 , 𝑐𝑡 , 𝑏𝑡+1
• plot these against values of $\bar c, b(s_1), b(s_2)$ found in a corresponding complete markets economy
Let’s try this, using the same parameters in both complete and incomplete markets economies
cp = ConsumptionProblem()
s_path = cp.simulate()
N_simul = len(s_path)
ax[0].set_title('Consumption paths')
ax[0].plot(np.arange(N_simul), c_path, label='incomplete market')
ax[0].plot(np.arange(N_simul), np.full(N_simul, c_bar), label='complete market')
ax[0].plot(np.arange(N_simul), y_path, label='income', alpha=.6, ls='--')
ax[0].legend()
ax[0].set_xlabel('Periods')
ax[1].set_title('Debt paths')
ax[1].plot(np.arange(N_simul), debt_path, label='incomplete market')
ax[1].plot(np.arange(N_simul), debt_complete[s_path], label='complete market')
ax[1].plot(np.arange(N_simul), y_path, label='income', alpha=.6, ls='--')
ax[1].legend()
ax[1].axhline(0, color='k', ls='--')
ax[1].set_xlabel('Periods')
plt.show()
In the graph on the left, for the same sample path of nonfinancial income 𝑦𝑡 , notice that
• consumption is constant when there are complete markets.
• consumption takes a random walk in the incomplete markets version of the model.
• the consumer’s debt oscillates between two values that are functions of the Markov state in the complete markets
model.
• the consumer’s debt drifts because it contains a unit root in the incomplete markets economy.
As indicated above, we relabel variables to acquire tax-smoothing interpretations of the complete markets and incomplete
markets consumption-smoothing models.
plt.show()
$$T_i + b_i = G_i + \sum_j Q_{ij} b_j$$
where
$$Q_{ij} = \beta P_{ij}$$
is the price today of one unit of goods in Markov state $j$ tomorrow when the Markov state is $i$ today.
𝑏𝑖 is the government’s level of assets when it arrives in Markov state 𝑖.
That is, 𝑏𝑖 equals one-period state-contingent claims owed to the government that fall due at time 𝑡 when the Markov state
is 𝑖.
Thus, if 𝑏𝑖 < 0, it means the government is owed 𝑏𝑖 or owes −𝑏𝑖 when the economy arrives in Markov state 𝑖 at time 𝑡.
In our examples below, this happens when in a previous war-time period the government has sold an Arrow security paying off $-b_i$ in peacetime Markov state $i$.
It can be enlightening to express the government’s budget constraint in Markov state $i$ as
$$T_i = G_i + \left( \sum_j Q_{ij} b_j - b_i \right)$$
in which the term $\left( \sum_j Q_{ij} b_j - b_i \right)$ equals the net amount that the government spends to purchase one-period Arrow securities that will pay off next period in Markov states $j = 1, \ldots, N$ after it has received payments $b_i$ this period.
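The decomposition can be checked with the numbers printed in this lecture's peace-and-war example. In the sketch below, the values of `b` and `T_bar` are copied (rounded) from that printed output, with `b` taken as the negative of the reported "Govt debts in two states"; the variable names are illustrative.

```python
import numpy as np

β = 0.96
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])
Q = β * P
G = np.array([1.0, 2.0])            # government purchases in peace and war
b = np.array([1.0, 2.62337662])     # b_i implied by the output reported in this lecture (rounded)
T_bar = 1.2716883116883118          # constant tax collections reported in this lecture

net_purchases = Q @ b - b           # sum_j Q_ij b_j - b_i for each state i
print(net_purchases)
print(G + net_purchases)            # approximately T_bar in both states
```

The net purchase term is positive in peace and negative in war, which is how a constant level of tax collections can finance fluctuating government purchases.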
The cumulative return earned from putting 1 unit of time 𝑡 goods into the government portfolio of state-contingent
securities at time 𝑡 and then rolling over the proceeds into the government portfolio each period thereafter is
Here is some code that computes one-period and cumulative returns on the government portfolio in the finite-state Markov
version of our complete markets model.
Convention: In this code, when 𝑃𝑖𝑗 = 0, we arbitrarily set 𝑅(𝑗|𝑖) to be 0.
values = Q @ b
n = len(b)
R = np.zeros((n, n))
for i in range(n):
ind = cp.P[i, :] != 0
R[i, ind] = b[ind] / values[i]
return R
RT_path = np.empty(T)
RT_path[0] = 1
RT_path[1:] = np.cumprod([R[s_path[t], s_path[t+1]] for t in range(T-1)])
return RT_path
# Parameters
β = .96
cp = ConsumptionProblem(β, g, b0, P)
print(f"P \n {P}")
print(f"Q \n {Q}")
print(f"Govt expenditures in peace and war = {g}")
print(f"Constant tax collections = {T_bar}")
print(f"Govt debts in two states = {-b}")
msg = """
Now let's check the government's budget constraint in peace and war.
Our assumptions imply that the government always purchases 0 units of the
Arrow peace security.
"""
print(msg)
AS1 = Q[0, :] @ b
# spending on Arrow securities in peace;
# spending on the Arrow peace security is no longer 0 once b0 is set to 1
print(f"Spending on Arrow security in peace = {AS1}")
AS2 = Q[1, :] @ b
print(f"Spending on Arrow security in war = {AS2}")
print("")
# tax collections minus debt levels
print("Government tax collections minus debt levels in peace and war")
TB1 = T_bar + b[0]
print(f"T+b in peace = {TB1}")
TB2 = T_bar + b[1]
print(f"T+b in war = {TB2}")
print("")
print("Total government spending in peace and war")
G1 = g[0] + AS1
G2 = g[1] + AS2
print(f"Peace = {G1}")
print(f"War = {G2}")
print("")
print("Let's see ex-post and ex-ante returns on Arrow securities")
Π = np.reciprocal(Q)
exret = Π
print(f"Ex-post returns to purchase of Arrow securities = \n {exret}")
exant = Π * P
print(f"Ex-ante returns to purchase of Arrow securities \n {exant}")
print("")
print("The Ex-post one-period gross return on the portfolio of government assets")
print(R)
print(RT_path[-1])
P
[[0.8 0.2]
[0.4 0.6]]
Q
[[0.768 0.192]
[0.384 0.576]]
Govt expenditures in peace and war = [1, 2]
Constant tax collections = 1.2716883116883118
Govt debts in two states = [-1. -2.62337662]
Now let's check the government's budget constraint in peace and war.
Our assumptions imply that the government always purchases 0 units of the
Arrow peace security.
The cumulative return earned from holding 1 unit market portfolio of government bonds
2.0860704239993675
7.3.2 Explanation
In this example, the government always purchases 1 unit of the Arrow security that pays off in peace time (Markov state
1).
And it purchases a higher amount of the security that pays off in war time (Markov state 2).
Thus, this is an example in which
• during peacetime, the government purchases insurance against the possibility that war breaks out next period
• during wartime, the government purchases insurance against the possibility that war continues another period
• so long as peace continues, the ex post return on insurance against war is low
• when war breaks out or continues, the ex post return on insurance against war is high
• given the history of states that we assumed, the value of one unit of the portfolio of government assets roughly
doubles by the end of the path because of high returns during wartime.
We recommend plugging the quantities computed above into the government budget constraints in the two Markov states
and staring.
Exercise 7.3.1
Try changing the Markov transition matrix so that
$$P = \begin{bmatrix} 1 & 0 \\ .2 & .8 \end{bmatrix}$$
Also, start the system in Markov state 2 (war) with initial government assets −10, so that the government starts the war
in debt and 𝑏2 = −10.
To interpret some episodes in the fiscal history of the United States, we find it interesting to study a few more examples.
We compute examples in an 𝑁 state Markov setting under both complete and incomplete markets.
These examples differ in how the Markov state jumps between peace and war.
To wrap procedures for solving models, relabeling graphs so that we record government debt rather than government
assets, and displaying results, we construct a Python class.
class TaxSmoothingExample:
    """
    construct a tax-smoothing example, by relabeling consumption problem class.
    """
    def __init__(self, g, P, b0, states, β=.96,
                 init=0, s_path=None, N_simul=80, random_state=1):

    def display(self):
        # plot graphs
        N = len(self.T_path)

        plt.figure()
        plt.title('Tax collection paths')
        plt.plot(np.arange(N), self.T_path, label='incomplete market')
        plt.plot(np.arange(N), np.full(N, self.T_bar), label='complete market')
        plt.plot(np.arange(N), self.g_path, label='govt expenditures', alpha=.6, ls='--')
        plt.legend()
        plt.xlabel('Periods')
        plt.show()

        plt.legend()
        plt.axhline(0, color='k', ls='--')
        plt.xlabel('Periods')
        plt.show()

        fig, ax = plt.subplots()
        ax.set_title('Cumulative return path (complete markets)')
        line1 = ax.plot(np.arange(N), self.RT_path, color='blue')[0]
        c1 = line1.get_color()
        ax.set_xlabel('Periods')
        ax.set_ylabel('Cumulative return', color=c1)

        ax_ = ax.twinx()
        line2 = ax_.plot(np.arange(N), self.g_path, ls='--', color='green')[0]
        c2 = line2.get_color()
        ax_.set_ylabel('Government expenditures', color=c2)
        plt.show()

        # Arrow security prices, Q = βP
        Q = self.cp.β * self.cp.P

        print(f"P \n {self.cp.P}")
        print(f"Q \n {Q}")
        print(f"Govt expenditures in {', '.join(self.states)} = {self.cp.y.flatten()}")
        print(f"Constant tax collections = {self.T_bar}")
        print(f"Govt debt in {len(self.states)} states = {-self.b}")
        print("")
        print(f"Government tax collections minus debt levels in {', '.join(self.states)}")
        for i in range(len(self.states)):
            TB = self.T_bar + self.b[i]
            print(f" T+b in {self.states[i]} = {TB}")
        print("")
        print(f"Total government spending in {', '.join(self.states)}")
        for i in range(len(self.states)):
            G = self.cp.y[i, 0] + Q[i, :] @ self.b
            print(f" {self.states[i]} = {G}")
        print("")
        print("Let's see ex-post and ex-ante returns on Arrow securities \n")
        print("")
        exant = 1 / self.cp.β
        print(f"Ex-ante returns to purchase of Arrow securities = {exant}")
        print("")
        print("The Ex-post one-period gross return on the portfolio of government assets")
        print(self.R)
        print("")
        print("The cumulative return earned from holding 1 unit market portfolio of government bonds")
        print(self.RT_path[-1])
7.4.1 Parameters
γ = .1
λ = .1
ϕ = .1
θ = .1
ψ = .1
g_L = .5
g_M = .8
g_H = 1.2
β = .96
7.4.2 Example 1
This example is designed to produce some stylized versions of tax, debt, and deficit paths followed by the United States
during and after the Civil War and also during and after World War I.
We set the Markov chain to have three states
$$P = \begin{bmatrix} 1-\lambda & \lambda & 0 \\ 0 & 1-\phi & \phi \\ 0 & 0 & 1 \end{bmatrix}$$
P
[[0.9 0.1 0. ]
[0. 0.9 0.1]
[0. 0. 1. ]]
Q
[[0.864 0.096 0. ]
[0. 0.864 0.096]
[0. 0. 0.96 ]]
Govt expenditures in peace, war, postwar = [0.5 1.2 0.8]
Constant tax collections = 0.7548096885813149
Govt debt in 3 states = [-1. -4.07093426 -1.12975779]
The cumulative return earned from holding 1 unit market portfolio of government bonds
0.17908622141460231
# The following shows the use of the wrapper class when a specific state path is given
s_path = [0, 0, 1, 1, 2]
ts_s_path = TaxSmoothingExample(g_ex1, P_ex1, b0_ex1, states_ex1, s_path=s_path)
ts_s_path.display()
P
[[0.9 0.1 0. ]
[0. 0.9 0.1]
[0. 0. 1. ]]
Q
[[0.864 0.096 0. ]
[0. 0.864 0.096]
[0. 0. 0.96 ]]
Govt expenditures in peace, war, postwar = [0.5 1.2 0.8]
Constant tax collections = 0.7548096885813149
Govt debt in 3 states = [-1. -4.07093426 -1.12975779]
The cumulative return earned from holding 1 unit market portfolio of government bonds
0.9045311615620277
7.4.3 Example 2
This example captures a peace followed by a war, eventually followed by a permanent peace.
Here we set
$$P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1-\gamma & \gamma \\ \phi & 0 & 1-\phi \end{bmatrix}$$
P
[[1. 0. 0. ]
[0. 0.9 0.1]
[0.1 0. 0.9]]
Q
[[0.96 0. 0. ]
[0. 0.864 0.096]
[0.096 0. 0.864]]
Govt expenditures in peace, temporary peace, war = [0.5 0.5 1.2]
Constant tax collections = 0.6053287197231834
Govt debt in 3 states = [ 2.63321799 -1. -2.51384083]
Government tax collections minus debt levels in peace, temporary peace, war
T+b in peace = -2.0278892733564
T+b in temporary peace = 1.6053287197231834
T+b in war = 3.1191695501730106
The cumulative return earned from holding 1 unit market portfolio of government bonds
-9.368991732594216
7.4.4 Example 3
This example features a situation in which one of the states is a war state with no hope of peace next period, while another
state is a war state with a positive probability of peace next period.
The Markov chain is:
$$P = \begin{bmatrix} 1-\lambda & \lambda & 0 & 0 \\ 0 & 1-\phi & \phi & 0 \\ 0 & 0 & 1-\psi & \psi \\ \theta & 0 & 0 & 1-\theta \end{bmatrix}$$
with government expenditure levels for the four states being [𝑔𝐿 𝑔𝐿 𝑔𝐻 𝑔𝐻 ] where 𝑔𝐿 < 𝑔𝐻 .
We start with 𝑏0 = 1 and 𝑠0 = 1.
P
[[0.9 0.1 0. 0. ]
[0. 0.9 0.1 0. ]
[0. 0. 0.9 0.1]
[0.1 0. 0. 0.9]]
Q
[[0.864 0.096 0. 0. ]
[0. 0.864 0.096 0. ]
[0. 0. 0.864 0.096]
[0.096 0. 0. 0.864]]
Govt expenditures in peace1, peace2, war1, war2 = [0.5 0.5 1.2 1.2]
Constant tax collections = 0.6927944572748268
Govt debt in 4 states = [-1. -3.42494226 -6.86027714 -4.43533487]
Government tax collections minus debt levels in peace1, peace2, war1, war2
T+b in peace1 = 1.6927944572748268
T+b in peace2 = 4.117736720554273
T+b in war1 = 7.553071593533488
T+b in war2 = 5.128129330254041
The cumulative return earned from holding 1 unit market portfolio of government bonds
0.02371440178864222
7.4.5 Example 4
with government expenditure levels for the five states being [𝑔𝐿 𝑔𝐿 𝑔𝐻 𝑔𝐻 𝑔𝐿 ] where 𝑔𝐿 < 𝑔𝐻 .
We assume that 𝑏0 = 1 and 𝑠0 = 1.
P
[[0.9 0.1 0. 0. 0. ]
[0. 0.9 0.1 0. 0. ]
[0. 0. 0.9 0.1 0. ]
[0. 0. 0. 0.9 0.1]
[0. 0. 0. 0. 1. ]]
Q
[[0.864 0.096 0. 0. 0. ]
[0. 0.864 0.096 0. 0. ]
[0. 0. 0.864 0.096 0. ]
[0. 0. 0. 0.864 0.096]
[0. 0. 0. 0. 0.96 ]]
Govt expenditures in peace1, peace2, war1, war2, permanent peace = [0.5 0.5 1.2 1.2 0.5]
Government tax collections minus debt levels in peace1, peace2, war1, war2, permanent peace
The cumulative return earned from holding 1 unit market portfolio of government bonds
-11.132109773063616
7.4.6 Example 5
The example captures a case when the system follows a deterministic path from peace to war, and back to peace again.
Since there is no randomness, the outcomes in the complete markets setting should be the same as in the incomplete markets
setting.
The Markov chain is:
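$$
P = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
$$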
with government expenditure levels for the seven states being [𝑔𝐿 𝑔𝐿 𝑔𝐻 𝑔𝐻 𝑔𝐻 𝑔𝐻 𝑔𝐿 ] where 𝑔𝐿 < 𝑔𝐻 .
Assume 𝑏0 = 1 and 𝑠0 = 1.
ts_ex5.display()
P
[[0 1 0 0 0 0 0]
[0 0 1 0 0 0 0]
[0 0 0 1 0 0 0]
[0 0 0 0 1 0 0]
[0 0 0 0 0 1 0]
[0 0 0 0 0 0 1]
[0 0 0 0 0 0 1]]
Q
[[0. 0.96 0. 0. 0. 0. 0. ]
[0. 0. 0.96 0. 0. 0. 0. ]
[0. 0. 0. 0.96 0. 0. 0. ]
[0. 0. 0. 0. 0.96 0. 0. ]
[0. 0. 0. 0. 0. 0.96 0. ]
[0. 0. 0. 0. 0. 0. 0.96]
[0. 0. 0. 0. 0. 0. 0.96]]
Govt expenditures in peace1, peace2, war1, war2, war3, permanent peace = [0.5 0.5 1.2 1.2 1.2 1.2 0.5]
Government tax collections minus debt levels in peace1, peace2, war1, war2, war3, permanent peace
Total government spending in peace1, peace2, war1, war2, war3, permanent peace
peace1 = 1.5571895472128
peace2 = 1.6584286588928003
war1 = 1.7638860668928
war2 = 1.1445708668928003
war3 = 0.49945086689280027
permanent peace = -0.17254913310719933
The cumulative return earned from holding 1 unit market portfolio of government bonds
1.2775343959060068
To construct a tax-smoothing version of the complete markets consumption-smoothing model with a continuous state
space that we presented in the lecture consumption smoothing with complete and incomplete markets, we simply relabel
variables.
Thus, a government faces a sequence of budget constraints
where 𝑇𝑡 is tax revenues, 𝑏𝑡 are receipts at 𝑡 from contingent claims that the government had purchased at time 𝑡 − 1, and
which states that the present value of government purchases equals the value of government assets at 𝑡 plus the present
value of tax receipts.
With these relabelings, examples presented in consumption smoothing with complete and incomplete markets can be
interpreted as tax-smoothing models.
Returns: In the continuous state version of our incomplete markets model, the ex post one-period gross rate of return
on the government portfolio equals
$$R(x_{t+1} \mid x_t) = \frac{b(x_{t+1})}{\beta E[\, b(x_{t+1}) \mid x_t \,]}$$
Related Lectures
Throughout this lecture, we have taken one-period interest rates and Arrow security prices as exogenous objects
determined outside the model and specified them in ways designed to align our models closely with the tax-smoothing
model of Barro [Barro, 1979].
Other lectures make these objects endogenous and describe how a government optimally manipulates prices of
government debt, albeit indirectly via effects distorting taxes have on equilibrium prices and allocations.
In optimal taxation in an LQ economy and recursive optimal taxation, we study complete-markets models in which the
government recognizes that it can manipulate Arrow securities prices.
Linear-quadratic versions of the Lucas-Stokey tax-smoothing model are described in Optimal Taxation in an LQ Economy.
That lecture is a warm-up for the non-linear-quadratic model of tax smoothing described in Optimal Taxation with State-
Contingent Debt.
In both Optimal Taxation in an LQ Economy and Optimal Taxation with State-Contingent Debt, the government recognizes
that its decisions affect prices.
In optimal taxation with incomplete markets, we study an incomplete-markets model in which the government also
manipulates prices of government debt.
EIGHT
In addition to what’s in Anaconda, this lecture will need the following libraries:
8.1 Overview
This lecture describes Markov jump linear quadratic dynamic programming, an extension of the method described
in the first LQ control lecture.
Markov jump linear quadratic dynamic programming is described and analyzed in [Do Val et al., 1999] and the references
cited there.
The method has been applied to problems in macroeconomics and monetary economics by [Svensson et al., 2008] and
[Svensson and Williams, 2009].
The periodic models of seasonality described in chapter 14 of [Hansen and Sargent, 2013] are a special case of Markov
jump linear quadratic problems.
Markov jump linear quadratic dynamic programming combines advantages of
• the computational simplicity of linear quadratic dynamic programming, with
• the ability of finite state Markov chains to represent interesting patterns of random variation.
The idea is to replace the constant matrices that define a linear quadratic dynamic programming problem with 𝑁 sets
of matrices that are fixed functions of the state of an 𝑁 state Markov chain.
The state of the Markov chain together with the continuous 𝑛 × 1 state vector 𝑥𝑡 form the state of the system.
For the class of infinite horizon problems being studied in this lecture, we obtain 𝑁 interrelated matrix Riccati equations
that determine 𝑁 optimal value functions and 𝑁 linear decision rules.
One of these value functions and one of these decision rules apply in each of the 𝑁 Markov states.
That is, when the Markov state is in state 𝑗, the value function and the decision rule for state 𝑗 prevail.
Advanced Quantitative Economics with Python
𝑢𝑡 = −𝐹 𝑥𝑡
− (𝑥′𝑡 𝑃 𝑥𝑡 + 𝜌)
𝜌 = 𝛽 (𝜌 + trace(𝑃 𝐶𝐶 ′ ))
With the preceding formulas in mind, we are ready to approach Markov Jump linear quadratic dynamic programming.
The key idea is to make the matrices 𝐴, 𝐵, 𝐶, 𝑅, 𝑄, 𝑊 fixed functions of a finite state 𝑠 that is governed by an 𝑁 state
Markov chain.
This makes decision rules depend on the Markov state, and so fluctuate through time in limited ways.
In particular, we use the following extension of a discrete-time linear quadratic dynamic programming problem.
We let 𝑠𝑡 ∈ [1, 2, … , 𝑁 ] be a time 𝑡 realization of an 𝑁 -state Markov chain with transition matrix Π having typical
element Π𝑖𝑗 .
We’ll switch between labeling today’s state as 𝑠𝑡 and 𝑖 and between labeling tomorrow’s state as 𝑠𝑡+1 or 𝑗.
The decision-maker solves the minimization problem:
$$\min_{\{u_t\}_{t=0}^\infty} E \sum_{t=0}^\infty \beta^t r(x_t, s_t, u_t)$$
with
subject to linear laws of motion with matrices (𝐴, 𝐵, 𝐶) each possibly dependent on the Markov-state-𝑠𝑡 :
𝑢𝑡 = −𝐹𝑠𝑡 𝑥𝑡
or equivalently
−𝑥′𝑡 𝑃𝑖 𝑥𝑡 − 𝜌𝑖
The optimal value functions −𝑥′ 𝑃𝑖 𝑥 − 𝜌𝑖 for 𝑖 = 1, … , 𝑁 satisfy the 𝑁 interrelated Bellman equations
$$-x' P_i x - \rho_i = \max_u -\Big[ x' R_i x + u' Q_i u + 2 u' W_i x + \beta \sum_j \Pi_{ij} E\big( (A_i x + B_i u + C_i w)' P_j (A_i x + B_i u + C_i w) + \rho_j \big) \Big]$$
The matrices 𝑃𝑠𝑡 = 𝑃𝑖 and the scalars 𝜌𝑠𝑡 = 𝜌𝑖 , 𝑖 = 1, … , 𝑁 satisfy the following stacked system of algebraic matrix
Riccati equations:
8.4 Applications
import numpy as np
import quantecon as qe
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
8.5 Example 1
This example is a version of a classic problem of optimally adjusting a variable 𝑘𝑡 to a target level in the face of costly
adjustment.
This provides a model of gradual adjustment.
Given 𝑘0 , the objective function is
$$\max_{\{k_t\}_{t=1}^\infty} E_0 \sum_{t=0}^\infty \beta^t r(s_t, k_t)$$
𝐸0 is a mathematical expectation conditioned on time 0 information 𝑥0 , 𝑠0 and the transition law for continuous state
variable 𝑘𝑡 is
𝑘𝑡+1 − 𝑘𝑡 = 𝑢𝑡
We can think of 𝑘𝑡 as the decision-maker’s capital and 𝑢𝑡 as the adjustment to the level of capital, which is costly to make.
We assume that 𝑓1 (𝑠𝑡 ) > 0, 𝑓2 (𝑠𝑡 ) > 0, and 𝑑 (𝑠𝑡 ) > 0.
Denote the state transition matrix for Markov state 𝑠𝑡 ∈ {1, 2} as Π:
Pr (𝑠𝑡+1 = 𝑗 ∣ 𝑠𝑡 = 𝑖) = Π𝑖𝑗
Let $x_t = \begin{bmatrix} k_t \\ 1 \end{bmatrix}$
We can represent the one-period payoff function 𝑟 (𝑠𝑡 , 𝑘𝑡 ) as
Rs = np.zeros((m, n, n))
Qs = np.zeros((m, k, k))
for i in range(m):
    Rs[i, 0, 0] = f2_vals[i]
    Rs[i, 1, 0] = - f1_vals[i] / 2
    Rs[i, 0, 1] = - f1_vals[i] / 2
    Qs[i, 0, 0] = d_vals[i]
The continuous part of the state 𝑥𝑡 consists of two variables, namely, 𝑘𝑡 and a constant term.
We start with a Markov transition matrix that makes the Markov state be strictly periodic:
$$\Pi_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},$$
We set 𝑓1,𝑠𝑡 and 𝑓2,𝑠𝑡 to be independent of the Markov state 𝑠𝑡
𝑓1,1 = 𝑓1,2 = 1,
𝑓2,1 = 𝑓2,2 = 1
In contrast to 𝑓1,𝑠𝑡 and 𝑓2,𝑠𝑡 , we make the adjustment cost 𝑑𝑠𝑡 vary across Markov states 𝑠𝑡 .
We set the adjustment cost to be lower in Markov state 2
𝑑1 = 1, 𝑑2 = 0.5
The following code forms a Markov switching LQ problem and computes the optimal value functions and optimal decision
rules for each Markov state
# Construct matrices
Qs, Rs, Ns, As, Bs, Cs, k_star = construct_arrays1(d_vals=[1., 0.5])
Let’s look at the value function matrices and the decision rules for each Markov state
# P(s)
ex1_a.Ps
[[ 1.37424214, -0.68712107],
[-0.68712107, -4.65643947]]])
array([0., 0.])
# F(s)
ex1_a.Fs
[[ 0.74848427, -0.37424214]]])
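The coupled Riccati equations can be solved by simple fixed-point iteration. The sketch below is a stand-alone illustration in plain NumPy, not the `LQMarkov` implementation: it assumes no cross-product term (W = 0), no shocks (C = 0), and a discount factor β = 0.95, which is an assumption because the discount factor for this example is not shown here. The function name `solve_markov_jump_lq` is hypothetical.

```python
import numpy as np

def solve_markov_jump_lq(Π, Rs, Qs, As, Bs, β, tol=1e-10, max_iter=10_000):
    """
    Iterate on coupled Riccati equations for a Markov jump LQ problem
    with loss x'R_i x + u'Q_i u, law of motion x' = A_i x + B_i u,
    and decision rules u = -F_i x.
    """
    N = len(Rs)
    Ps = [np.zeros_like(R) for R in Rs]
    Fs = [None] * N
    for _ in range(max_iter):
        Ps_new, Fs = [], []
        for i in range(N):
            # Expected continuation matrix given today's Markov state i
            EP = sum(Π[i, j] * Ps[j] for j in range(N))
            A, B = As[i], Bs[i]
            F = np.linalg.solve(Qs[i] + β * B.T @ EP @ B, β * B.T @ EP @ A)
            P = Rs[i] + β * A.T @ EP @ A - β * A.T @ EP @ B @ F
            Ps_new.append(P)
            Fs.append(F)
        diff = max(np.max(np.abs(Pn - Po)) for Pn, Po in zip(Ps_new, Ps))
        Ps = Ps_new
        if diff < tol:
            break
    return Ps, Fs

β = 0.95                                   # assumed discount factor
Π1 = np.array([[0., 1.], [1., 0.]])        # strictly periodic chain
f1, f2, d = [1., 1.], [1., 1.], [1., 0.5]
Rs = [np.array([[f2[i], -f1[i] / 2], [-f1[i] / 2, 0.]]) for i in range(2)]
Qs = [np.array([[d[i]]]) for i in range(2)]
As = [np.eye(2)] * 2                       # k_{t+1} = k_t + u_t, constant stays 1
Bs = [np.array([[1.], [0.]])] * 2

Ps, Fs = solve_markov_jump_lq(Π1, Rs, Qs, As, Bs, β)
print(Fs[1])   # decision rule in Markov state 2
print(Ps[1])   # value function matrix in Markov state 2
```

Under these assumptions, the state-2 matrices can be compared with the P(s) and F(s) output shown above.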
Now we’ll plot the decision rules and see if they make sense
fig, ax = plt.subplots()
ax.plot(k_grid, k_grid + u1_star, label=r"$\overline{s}_1$ (high)")
ax.plot(k_grid, k_grid + u2_star, label=r"$\overline{s}_2$ (low)")
# The optimal k*
ax.scatter([0.5, 0.5], [0.5, 0.5], marker="*")
ax.plot([k_star[0], k_star[0]], [0., 1.0], '--')
# 45 degree line
ax.plot([0., 1.], [0., 1.], '--', label="45 degree line")
ax.set_xlabel("$k_t$")
ax.set_ylabel("$k_{t+1}$")
ax.legend()
plt.show()
The above graph plots 𝑘𝑡+1 = 𝑘𝑡 + 𝑢𝑡 = 𝑘𝑡 − 𝐹 𝑥𝑡 as an affine (i.e., linear in 𝑘𝑡 plus a constant) function of 𝑘𝑡 for both
Markov states 𝑠𝑡 .
It also plots the 45 degree line.
Notice that the two 𝑠𝑡 -dependent closed loop functions that determine 𝑘𝑡+1 as functions of 𝑘𝑡 share the same rest point
(also called a fixed point) at 𝑘𝑡 = 0.5.
Evidently, the optimal decision rule in Markov state 2, in which the adjustment cost is lower, makes 𝑘𝑡+1 a flatter function
of 𝑘𝑡 .
This happens because when 𝑘𝑡 is not at its fixed point, |𝑢𝑡,2 | > |𝑢𝑡,1 |, so that the decision-maker adjusts toward the fixed
point faster when the Markov state 𝑠𝑡 takes a value that makes adjustment cheaper.
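The rest point and the slope of the closed-loop map can be read directly off a decision rule. As a small sketch, the code below takes the state-2 rule from the F(s) output above and iterates the map from a hypothetical starting value of capital.

```python
import numpy as np

# Decision rule for Markov state 2, copied from the F(s) output above
F2 = np.array([0.74848427, -0.37424214])

# With x_t = (k_t, 1)', k_{t+1} = k_t - F2 @ x_t = slope * k_t + intercept
slope = 1 - F2[0]
intercept = -F2[1]
rest_point = intercept / (1 - slope)

# Iterate the closed-loop map from an arbitrary (hypothetical) starting value
k = 0.1
for _ in range(20):
    k = slope * k + intercept

print(slope, intercept, rest_point, k)
```

A slope between 0 and 1 means that a constant fraction of the gap between capital and the rest point is closed each period; the smaller the slope, the faster the adjustment.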
Now we’ll depart from the preceding transition matrix that made the Markov state be strictly periodic.
We’ll begin with symmetric transition matrices of the form
$$\Pi_2 = \begin{bmatrix} 1-\lambda & \lambda \\ \lambda & 1-\lambda \end{bmatrix}.$$
λ = 0.8 # high λ
Π2 = np.array([[1-λ, λ],
[λ, 1-λ]])
[[ 0.74434525, -0.37217263]]])
λ = 0.2 # low λ
Π2 = np.array([[1-λ, λ],
[λ, 1-λ]])
[[ 0.72818728, -0.36409364]]])
for i, λ in enumerate(λ_vals):
    Π2 = np.array([[1-λ, λ],
                   [λ, 1-λ]])
ax.set_xlabel(r"$\lambda$")
ax.set_ylabel("$F_{s_t}$")
ax.set_title(f"Coefficient on {state_var}")
ax.legend()
plt.show()
Notice how the decision rules’ constants and slopes behave as functions of 𝜆.
Evidently, as the Markov chain becomes more nearly periodic (i.e., as 𝜆 → 1), the dynamic program adjusts capital faster
in the low adjustment cost Markov state to take advantage of what is only temporarily a more favorable time to invest.
Now let’s study situations in which the Markov transition matrix Π is asymmetric
$$\Pi_3 = \begin{bmatrix} 1-\lambda & \lambda \\ \delta & 1-\delta \end{bmatrix}.$$
λ, δ = 0.8, 0.2
Π3 = np.array([[1-λ, λ],
[δ, 1-δ]])
[[ 0.72749075, -0.36374537]]])
for i, λ in enumerate(λ_vals):
    λ_grid[i, :] = λ
    δ_grid[i, :] = δ_vals
    for j, δ in enumerate(δ_vals):
        Π3 = np.array([[1-λ, λ],
                       [δ, 1-δ]])
The following code defines a wrapper function that computes optimal decision rules for cases with different Markov
transition matrices
# Symmetric Π
# Notice that pure periodic transition is a special case
# when λ=1
print("symmetric Π case:\n")
λ_vals = np.linspace(0., 1., 10)
F1 = np.empty((λ_vals.size, len(state_vec)))
F2 = np.empty((λ_vals.size, len(state_vec)))
for i, λ in enumerate(λ_vals):
    Π2 = np.array([[1-λ, λ],
                   [λ, 1-λ]])
ax.set_xlabel(r"$\lambda$")
ax.set_ylabel(r"$F(\overline{s}_t)$")
ax.set_title(f"coefficient on {state_var}")
ax.legend()
plt.show()
ax.set_xlabel(r"$\lambda$")
ax.set_ylabel("$k$")
ax.set_title("Optimal k levels and k targets")
ax.text(0.5, min(k_star)+(max(k_star)-min(k_star))/20, r"$\lambda=0.5$")
# Asymmetric Π
print("asymmetric Π case:\n")
δ_vals = np.linspace(0., 1., 10)
for i, λ in enumerate(λ_vals):
    λ_grid[i, :] = λ
    δ_grid[i, :] = δ_vals
    for j, δ in enumerate(δ_vals):
        Π3 = np.array([[1-λ, λ],
                       [δ, 1-δ]])
To illustrate the code with another example, we shall set 𝑓2,𝑠𝑡 and 𝑑𝑠𝑡 as constant functions and
Thus, the sole role of the Markov jump state 𝑠𝑡 is to identify times in which capital is very productive and other times in
which it is less productive.
The example below reveals much about the structure of the optimum problem and optimal policies.
Only 𝑓1,𝑠𝑡 varies with 𝑠𝑡 .
So there are different 𝑠𝑡 -dependent optimal static 𝑘 levels in different states, $k^*_{s_t} = \frac{f_{1,s_t}}{2 f_{2,s_t}}$, the values of 𝑘 that
maximize the one-period payoff functions in each state.
We denote a target 𝑘 level as $k^{target}_{s_t}$, the fixed point of the optimal policies in each state, given the value of 𝜆.
We call $k^{target}_{s_t}$ a “target” because in each Markov state 𝑠𝑡 , optimal policies are contraction mappings and will push 𝑘𝑡
towards a fixed point $k^{target}_{s_t}$.
When 𝜆 → 0, each Markov state becomes close to an absorbing state and consequently $k^{target}_{s_t} \to k^*_{s_t}$.
But when 𝜆 → 1, the Markov transition matrix becomes more nearly periodic, so the optimum decision rules target more
at the optimal 𝑘 level in the other state in order to enjoy higher expected payoff in the next period.
The switch happens at 𝜆 = 0.5 when both states are equally likely to be reached.
Below we plot an additional figure that shows optimal 𝑘 levels in the two Markov jump states and also how the
targeted 𝑘 levels change as 𝜆 changes.
symmetric Π case:
asymmetric Π case:
symmetric Π case:
asymmetric Π case:
8.6 Example 2
We now add to the example 1 setup another state variable 𝑤𝑡 that follows the evolution law
We think of 𝑤𝑡 as a rental rate or tax rate that the decision maker pays each period for 𝑘𝑡 .
To capture this idea, we add to the decision-maker’s one-period payoff function the product of 𝑤𝑡 and 𝑘𝑡
We now let the continuous part of the state at time 𝑡 be $x_t = \begin{bmatrix} k_t \\ 1 \\ w_t \end{bmatrix}$ and continue to set the control 𝑢𝑡 = 𝑘𝑡+1 − 𝑘𝑡 .
We can write the one-period payoff function 𝑟 (𝑠𝑡 , 𝑘𝑡 , 𝑤𝑡 ) as
$$
\begin{aligned}
r(s_t, k_t, w_t) &= f_1(s_t) k_t - f_2(s_t) k_t^2 - d(s_t) (k_{t+1} - k_t)^2 - w_t k_t \\
&= -\left( x_t' \underbrace{\begin{bmatrix} f_2(s_t) & -\frac{f_1(s_t)}{2} & \frac{1}{2} \\ -\frac{f_1(s_t)}{2} & 0 & 0 \\ \frac{1}{2} & 0 & 0 \end{bmatrix}}_{\equiv R(s_t)} x_t + \underbrace{d(s_t)}_{\equiv Q(s_t)} u_t^2 \right)
\end{aligned}
$$
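The quadratic-form representation can be verified numerically. The following sketch compares the payoff written out term by term with the matrix expression, using hypothetical parameter and state values chosen only to exercise the identity; the function names are illustrative.

```python
import numpy as np

def payoff_direct(f1, f2, d, k, k_next, w):
    """One-period payoff written out term by term."""
    return f1 * k - f2 * k**2 - d * (k_next - k)**2 - w * k

def payoff_quadratic(f1, f2, d, k, k_next, w):
    """Same payoff as -(x' R x + Q u^2) with x = (k, 1, w)' and u = k_next - k."""
    R = np.array([[f2,      -f1 / 2, 1 / 2],
                  [-f1 / 2,  0.0,    0.0],
                  [1 / 2,    0.0,    0.0]])
    Q = d
    x = np.array([k, 1.0, w])
    u = k_next - k
    return -(x @ R @ x + Q * u**2)

# Hypothetical values chosen only to exercise the identity
args = dict(f1=1.0, f2=1.0, d=0.5, k=0.3, k_next=0.45, w=0.2)
print(payoff_direct(**args), payoff_quadratic(**args))
```

The off-diagonal entries of R are halved because each cross term appears twice in the quadratic form x′Rx.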
m = len(f1_vals)
n, k, j = 3, 1, 1
Rs = np.zeros((m, n, n))
Qs = np.zeros((m, k, k))
As = np.zeros((m, n, n))
Bs = np.zeros((m, n, k))
Cs = np.zeros((m, n, j))
for i in range(m):
    Rs[i, 0, 0] = f2_vals[i]
    Rs[i, 1, 0] = - f1_vals[i] / 2
    Qs[i, 0, 0] = d_vals[i]
    As[i, 0, 0] = 1
    As[i, 1, 1] = 1
    As[i, 2, 1] = α0_vals[i]
    As[i, 2, 2] = ρ_vals[i]
Ns = None
k_star = None
symmetric Π case:
asymmetric Π case:
symmetric Π case:
asymmetric Π case:
symmetric Π case:
asymmetric Π case:
symmetric Π case:
asymmetric Π case:
symmetric Π case:
asymmetric Π case:
symmetric Π case:
asymmetric Π case:
The following lectures describe how Markov jump linear quadratic dynamic programming can be used to extend the
[Barro, 1979] model of optimal tax-smoothing and government debt in several interesting directions
1. How to Pay for a War: Part 1
2. How to Pay for a War: Part 2
3. How to Pay for a War: Part 3
NINE
9.1 Overview
This lecture uses the method of Markov jump linear quadratic dynamic programming that is described in lecture
Markov Jump LQ dynamic programming to extend the [Barro, 1979] model of optimal tax-smoothing and government
debt in a particular direction.
This lecture has two sequels that offer further extensions of the Barro model
1. How to Pay for a War: Part 2
2. How to Pay for a War: Part 3
The extensions are modified versions of his 1979 model suggested by [Barro, 1999] and [Barro and McCleary, 2003].
The [Barro, 1979] model is about a government that borrows and lends in order to minimize an intertemporal measure of
distortions caused by taxes.
Technical tractability induced [Barro, 1979] to assume that
• the government trades only one-period risk-free debt, and
• the one-period risk-free interest rate is constant
By using Markov jump linear quadratic dynamic programming we can allow interest rates to move over time in empirically
interesting ways.
Also, by expanding the dimension of the state, we can add a maturity composition decision to the government’s problem.
By doing these two things we extend [Barro, 1979] along lines he suggested in [Barro, 1999] and [Barro and McCleary,
2003].
[Barro, 1979] assumed
• that a government faces an exogenous sequence of expenditures that it must finance by a tax collection sequence
whose expected present value equals the initial debt it owes plus the expected present value of those expenditures.
• that the government wants to minimize a measure of tax distortions that is proportional to $E_0 \sum_{t=0}^\infty \beta^t T_t^2$, where
𝑇𝑡 are total tax collections and 𝐸0 is a mathematical expectation conditioned on time 0 information.
• that the government trades only one asset, a risk-free one-period bond.
• that the gross interest rate on the one-period bond is constant and equal to 𝛽 −1 , the reciprocal of the factor 𝛽 at
which the government discounts future tax distortions.
Barro’s model can be mapped into a discounted linear quadratic dynamic programming problem.
Partly inspired by [Barro, 1999] and [Barro and McCleary, 2003], our generalizations of [Barro, 1979] assume
• that the government borrows or saves in the form of risk-free bonds of maturities 1, 2, … , 𝐻.
• that interest rates on those bonds are time-varying and in particular, governed by a jointly stationary stochastic
process.
Our generalizations are designed to fit within a generalization of an ordinary linear quadratic dynamic programming
problem in which matrices that define the quadratic objective function and the state transition function are time-varying
and stochastic.
This generalization, known as a Markov jump linear quadratic dynamic program, combines
• the computational simplicity of linear quadratic dynamic programming, and
• the ability of finite state Markov chains to represent interesting patterns of random variation.
We want the stochastic time variation in the matrices defining the dynamic programming problem to represent variation
over time in
• interest rates
• default rates
• roll over risks
As described in Markov Jump LQ dynamic programming, the idea underlying Markov jump linear quadratic dynamic
programming is to replace the constant matrices defining a linear quadratic dynamic programming problem with
matrices that are fixed functions of an 𝑁 state Markov chain.
For infinite horizon problems, this leads to 𝑁 interrelated matrix Riccati equations that pin down 𝑁 value functions and
𝑁 linear decision rules, applying to the 𝑁 Markov states.
import quantecon as qe
import numpy as np
import matplotlib.pyplot as plt
We begin by solving a version of [Barro, 1979] by mapping it into the original LQ framework.
As mentioned in this lecture, the Barro model is mathematically isomorphic with the LQ permanent income model.
Let
• 𝑇𝑡 denote tax collections
• 𝛽 be a discount factor
• 𝑏𝑡,𝑡+1 be time 𝑡 + 1 goods that at 𝑡 the government promises to deliver to time 𝑡 buyers of one-period government
bonds
• 𝐺𝑡 be government purchases
• 𝑝𝑡,𝑡+1 be the number of time 𝑡 goods received per time 𝑡 + 1 goods promised to one-period bond purchasers.
Evidently, 𝑝𝑡,𝑡+1 is inversely related to appropriate corresponding gross interest rates on government debt.
In the spirit of [Barro, 1979], the stochastic process of government expenditures is exogenous.
The government’s problem is to choose a plan for taxation and borrowing $\{b_{t+1}, T_t\}_{t=0}^\infty$ to minimize
$$E_0 \sum_{t=0}^\infty \beta^t T_t^2$$
𝐺𝑡 = 𝑈𝑔 𝑧𝑡
𝑧𝑡+1 = 𝐴22 𝑧𝑡 + 𝐶2 𝑤𝑡+1
where 𝑤𝑡+1 ∼ 𝑁 (0, 𝐼)
The variables 𝑇𝑡 , 𝑏𝑡,𝑡+1 are control variables chosen at 𝑡, while 𝑏𝑡−1,𝑡 is an endogenous state variable inherited from the
past at time 𝑡 and 𝑝𝑡,𝑡+1 is an exogenous state variable at time 𝑡.
To begin, we assume that 𝑝𝑡,𝑡+1 is constant (and equal to 𝛽)
• later we will extend the model to allow 𝑝𝑡,𝑡+1 to vary over time
To map into the LQ framework, we use $x_t = \begin{bmatrix} b_{t-1,t} \\ z_t \end{bmatrix}$ as the state vector, and $u_t = b_{t,t+1}$ as the control variable.
Therefore, the (𝐴, 𝐵, 𝐶) matrices are defined by the state-transition law:
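$$x_{t+1} = \begin{bmatrix} 0 & 0 \\ 0 & A_{22} \end{bmatrix} x_t + \begin{bmatrix} 1 \\ 0 \end{bmatrix} u_t + \begin{bmatrix} 0 \\ C_2 \end{bmatrix} w_{t+1}$$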
To find the appropriate (𝑅, 𝑄, 𝑊 ) matrices, we note that 𝐺𝑡 and 𝑏𝑡−1,𝑡 can be written as appropriately defined functions
of the current state:
𝐺𝑡 = 𝑆𝐺 𝑥𝑡 , 𝑏𝑡−1,𝑡 = 𝑆1 𝑥𝑡
If we define 𝑀𝑡 = −𝑝𝑡,𝑡+1 , and let 𝑆 = 𝑆𝐺 + 𝑆1 , then we can write taxation as a function of the states and control using
the government’s budget constraint:
𝑇𝑡 = 𝑆𝑥𝑡 + 𝑀𝑡 𝑢𝑡
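To do this, we set $z_t = \begin{bmatrix} 1 \\ G_t \end{bmatrix}$, and consequently:
$$A_{22} = \begin{bmatrix} 1 & 0 \\ \bar G & \rho \end{bmatrix}, \quad C_2 = \begin{bmatrix} 0 \\ \sigma \end{bmatrix}$$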
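# Model parameters
β, Gbar, ρ, σ = 0.95, 5, 0.8, 1
# Basic model matrices
A22 = np.array([[1, 0],
                [Gbar, ρ]])
C2 = np.array([[0],
               [σ]])
Ug = np.array([[0, 1]])
# LQ framework matrices
A_t = np.zeros((1, 3))
A_b = np.hstack((np.zeros((2, 1)), A22))
A = np.vstack((A_t, A_b))
B = np.zeros((3, 1))
B[0, 0] = 1
C = np.vstack((np.zeros((1, 1)), C2))
# S = S_G + S_1 selects G_t + b_{t-1,t} from the state
Sg = np.hstack((np.zeros((1, 1)), Ug))
S1 = np.zeros((1, 3))
S1[0, 0] = 1
S = S1 + Sg
M = np.array([[-β]])
R = S.T @ S
Q = M.T @ M
W = M.T @ S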
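As a sanity check on this mapping, the sketch below confirms at one hypothetical state and control that 𝑆𝑥𝑡 + 𝑀𝑢𝑡 reproduces taxes implied by the budget constraint, and that the quadratic form built from (𝑅, 𝑄, 𝑊) equals squared taxes. The selector rows are written out by hand, and the numbers for debt, spending, and new borrowing are illustrative assumptions.

```python
import numpy as np

β = 0.95                                  # p_{t,t+1} held constant at β

# State x_t = (b_{t-1,t}, 1, G_t)', control u_t = b_{t,t+1}
SG = np.array([[0.0, 0.0, 1.0]])          # G_t = SG x_t
S1 = np.array([[1.0, 0.0, 0.0]])          # b_{t-1,t} = S1 x_t
S = SG + S1
M = np.array([[-β]])                      # M_t = -p_{t,t+1}

R, Q, W = S.T @ S, M.T @ M, M.T @ S

# Hypothetical state and control, chosen only for illustration
x = np.array([[2.0], [1.0], [25.0]])      # inherited debt 2, spending 25
u = np.array([[3.0]])                     # newly issued one-period debt

T = (S @ x + M @ u).item()
T_budget = 25.0 + 2.0 - β * 3.0           # G_t + b_{t-1,t} - p_{t,t+1} b_{t,t+1}
T_squared = (x.T @ R @ x + u.T @ Q @ u + 2 * u.T @ W @ x).item()
print(T, T_budget, T_squared)
```

This is why minimizing the discounted sum of squared taxes fits the LQ objective with these three matrices.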
We can see the isomorphism by noting that consumption is a martingale in the permanent income model and that taxation
is a martingale in Barro’s model.
We can check this using the 𝐹 matrix of the LQ model.
Because 𝑢𝑡 = −𝐹 𝑥𝑡 , we have
𝑇𝑡 = 𝑆𝑥𝑡 + 𝑀 𝑢𝑡 = (𝑆 − 𝑀 𝐹 )𝑥𝑡
and
(𝑆 − 𝑀 𝐹 )(𝐴 − 𝐵𝐹 ) = (𝑆 − 𝑀 𝐹 ),
S - M @ F, (S - M @ F) @ (A - B @ F)
This explains the fanning out of the conditional empirical distribution of taxation across time, computed by simulating
the Barro model many times and averaging over simulated paths:
T = 500
for i in range(250):
    x, u, w = LQBarro.compute_sequence(x0, ts_length=T)
    plt.plot(list(range(T+1)), ((S - M @ F) @ x)[0, :])
plt.xlabel('Time')
plt.ylabel('Taxation')
plt.show()
We can see a similar but smoother pattern if we plot government debt over time.
T = 500
for i in range(250):
    x, u, w = LQBarro.compute_sequence(x0, ts_length=T)
    plt.plot(list(range(T+1)), x[0, :])
plt.xlabel('Time')
plt.ylabel('Govt Debt')
plt.show()
To implement the extension to the Barro model in which 𝑝𝑡,𝑡+1 varies over time, we must allow the M matrix to be
time-varying.
Our 𝑄 and 𝑊 matrices must also vary over time.
We can solve such a model using the LQMarkov class that solves Markov jump linear quadratic control problems as
described above.
The code for the class can be viewed here.
The class takes lists of matrices that correspond to the 𝑁 Markov states.
The value and policy functions are then found by iterating on a coupled system of matrix Riccati difference equations.
Optimal 𝑃𝑠 , 𝐹𝑠 , 𝑑𝑠 are stored as attributes.
The class also contains a method that simulates a model.
We can use the above class to implement a version of the Barro model with a time-varying interest rate.
A simple way to extend the model is to allow the interest rate to take two possible values.
We set:
$$p^1_{t,t+1} = \beta + 0.02 = 0.97$$
$$p^2_{t,t+1} = \beta - 0.017 = 0.933$$
Thus, the first Markov state has a low interest rate and the second Markov state has a high interest rate.
We must also specify a transition matrix for the Markov state.
We use:
$$\Pi = \begin{bmatrix} 0.8 & 0.2 \\ 0.2 & 0.8 \end{bmatrix}$$
Here, each Markov state is persistent, and there is an equal chance of moving from one state to the other.
The choice of parameters means that the unconditional expectation of 𝑝𝑡,𝑡+1 is 0.9515, higher than 𝛽(= 0.95).
If we were to set 𝑝𝑡,𝑡+1 = 0.9515 in the version of the model with a constant interest rate, government debt would
explode.
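The unconditional expectation quoted above can be reproduced by weighting the two prices by the stationary distribution of Π. The snippet below is a minimal sketch; solving for the stationary distribution by least squares is a choice made here, not something taken from the lecture.

```python
import numpy as np

β = 0.95
Π = np.array([[0.8, 0.2],
              [0.2, 0.8]])
prices = np.array([β + 0.02, β - 0.017])   # p_{t,t+1} in Markov states 1 and 2

# Stationary distribution ψ solves ψ Π = ψ with sum(ψ) = 1.
# Stack both conditions and solve the (consistent) linear system.
n = Π.shape[0]
lhs = np.vstack((Π.T - np.eye(n), np.ones((1, n))))
rhs = np.append(np.zeros(n), 1.0)
ψ, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)

mean_price = ψ @ prices
print(ψ, mean_price, mean_price > β)
```

Because Π is symmetric, the stationary distribution puts equal weight on the two states, so the mean price is the simple average of 0.97 and 0.933.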
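# Transition matrix for the Markov state
Π = np.array([[0.8, 0.2],
              [0.2, 0.8]])
# Lists of matrices, one entry per Markov state
As = [A, A]
Bs = [B, B]
Cs = [C, C]
Rs = [R, R]
M1 = np.array([[-β - 0.02]])
M2 = np.array([[-β + 0.017]])
Q1 = M1.T @ M1
Q2 = M2.T @ M2
Qs = [Q1, Q2]
W1 = M1.T @ S
W2 = M2.T @ S
Ws = [W1, W2]
# Create and solve the Markov jump LQ problem
lqm = qe.LQMarkov(Π, Qs, Rs, As, Bs, Cs=Cs, Ns=Ws, beta=β)
lqm.stationary_values()
lqm.Fs[0]
lqm.Fs[1]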
Simulating a large number of such economies over time reveals interesting dynamics.
Debt tends to stay low and stable but recurrently surges.
T = 2000
x0 = np.array([[1000, 1, 25]])
for i in range(250):
    x, u, w, s = lqm.compute_sequence(x0, ts_length=T)
    plt.plot(list(range(T+1)), x[0, :])
plt.xlabel('Time')
plt.ylabel('Govt Debt')
plt.show()
CHAPTER
TEN
10.1 Overview
This lecture presents another application of Markov jump linear quadratic dynamic programming and constitutes a sequel
to an earlier lecture.
We use a method introduced in lecture Markov Jump LQ dynamic programming to implement suggestions by [Barro, 1999]
and [Barro and McCleary, 2003] for extending his classic 1979 model of tax smoothing.
The [Barro, 1979] model is about a government that borrows and lends in order to help it minimize an intertemporal measure
of distortions caused by taxes.
Technically, the [Barro, 1979] model looks a lot like a consumption-smoothing model.
Our generalizations of [Barro, 1979] will also look like souped-up consumption-smoothing models.
Wanting tractability induced [Barro, 1979] to assume that
• the government trades only one-period risk-free debt, and
• the one-period risk-free interest rate is constant
In our earlier lecture, we relaxed the second of these assumptions but not the first.
In particular, we used Markov jump linear quadratic dynamic programming to allow the exogenous interest rate to vary
over time.
In this lecture, we add a maturity composition decision to the government’s problem by expanding the dimension of the
state.
We assume
• that the government borrows or saves in the form of risk-free bonds of maturities 1, 2, … , 𝐻.
• that interest rates on those bonds are time-varying and in particular are governed by a jointly stationary stochastic
process.
In addition to what’s in Anaconda, this lecture deploys the quantecon library:
import quantecon as qe
import numpy as np
import matplotlib.pyplot as plt
Let
• 𝑇𝑡 denote tax collections
• 𝛽 be a discount factor
• 𝑏𝑡,𝑡+1 be time 𝑡 + 1 goods that the government promises to pay at 𝑡
• 𝑏𝑡,𝑡+2 be time 𝑡 + 2 goods that the government promises to pay at time 𝑡
• 𝐺𝑡 be government purchases
• 𝑝𝑡,𝑡+1 be the number of time 𝑡 goods received per time 𝑡 + 1 goods promised
• 𝑝𝑡,𝑡+2 be the number of time 𝑡 goods received per time 𝑡 + 2 goods promised.
Evidently, 𝑝𝑡,𝑡+1 , 𝑝𝑡,𝑡+2 are inversely related to appropriate corresponding gross interest rates on government debt.
In the spirit of [Barro, 1979], government expenditures are governed by an exogenous stochastic process.
Given initial conditions $b_{-2,0}, b_{-1,0}, z_0, i_0$, where $i_0$ is the initial Markov state, the government chooses a contingency
plan for $\{b_{t,t+1}, b_{t,t+2}, T_t\}_{t=0}^\infty$ to maximize
$$-E_0 \sum_{t=0}^\infty \beta^t \left[ T_t^2 + c_1 (b_{t,t+1} - b_{t,t+2})^2 \right]$$
Here
• 𝑤𝑡+1 ∼ 𝑁 (0, 𝐼) and Π𝑖𝑗 is the probability that the Markov state moves from state 𝑖 to state 𝑗 in one period
• 𝑇𝑡 , 𝑏𝑡,𝑡+1 , 𝑏𝑡,𝑡+2 are control variables chosen at time 𝑡
• variables 𝑏𝑡−1,𝑡 , 𝑏𝑡−2,𝑡 are endogenous state variables inherited from the past at time 𝑡
• 𝑝𝑡,𝑡+1 , 𝑝𝑡,𝑡+2 are exogenous state variables at time 𝑡
The parameter 𝑐1 imposes a penalty on the government’s issuing different quantities of one and two-period debt.
This penalty deters the government from taking large “long-short” positions in debt of different maturities.
An example below will show the penalty in action.
As well as extending the model to allow for a maturity decision for government debt, we can also in principle allow the
matrices 𝑈𝑔,𝑠𝑡 , 𝐴22,𝑠𝑡 , 𝐶2,𝑠𝑡 to depend on the Markov state 𝑠𝑡 .
Below, we will often adopt the convention that for matrices appearing in a linear state space, 𝐴𝑡 ≡ 𝐴𝑠𝑡 , 𝐶𝑡 ≡ 𝐶𝑠𝑡 and
so on, so that dependence on 𝑡 is always intermediated through the Markov state 𝑠𝑡 .
First, define
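$$\bar b_t = \begin{bmatrix} \hat b_t \\ b_{t-1,t+1} \end{bmatrix}$$
$$x_t = \begin{bmatrix} \bar b_t \\ z_t \end{bmatrix}$$
$$\begin{bmatrix} \hat b_{t+1} \\ b_{t,t+2} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \hat b_t \\ b_{t-1,t+1} \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} b_{t,t+1} \\ b_{t,t+2} \end{bmatrix}$$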
or
𝐺𝑡 = 𝑆𝐺,𝑡 𝑥𝑡 , 𝑏̂𝑡 = 𝑆1 𝑥𝑡
and
𝑀𝑡 = [−𝑝𝑡,𝑡+1 −𝑝𝑡,𝑡+2 ]
where 𝑝𝑡,𝑡+1 is the discount on one period loans in the discrete Markov state at time 𝑡 and 𝑝𝑡,𝑡+2 is the discount on
two-period loans in the discrete Markov state.
Define
𝑆𝑡 = 𝑆𝐺,𝑡 + 𝑆1
𝑇𝑡 = 𝑀𝑡 𝑢𝑡 + 𝑆𝑡 𝑥𝑡
It follows that
or
where
Because the payoff function also includes the penalty parameter on issuing debt of different maturities, we have:
where $Q^c = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$.
Therefore, the appropriate 𝑄 matrix in the Markov jump LQ problem is:
$$Q^c_t = Q_t + c_1 Q^c$$
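The mapping from the budget constraint 𝑇𝑡 = 𝑀𝑡𝑢𝑡 + 𝑆𝑡𝑥𝑡 and the maturity penalty into quadratic-form matrices can be checked numerically. The sketch below assumes 𝑧𝑡 = (1, 𝐺𝑡) so that the state is 𝑥𝑡 = (𝑏̂𝑡, 𝑏𝑡−1,𝑡+1, 1, 𝐺𝑡); the prices and the random test vectors are illustrative values chosen here.

```python
import numpy as np

rng = np.random.default_rng(0)
β, c1 = 0.95, 0.01
M = np.array([[-β, -(β**2 - 0.02)]])     # M_t = [-p_{t,t+1}, -p_{t,t+2}]
S = np.array([[1.0, 0.0, 0.0, 1.0]])     # S_t = S_1 + S_G picks out b̂_t + G_t
Qc = np.array([[1, -1], [-1, 1]])

# Quadratic-form matrices, with the penalty folded into Q
R, Q, W = S.T @ S, M.T @ M + c1 * Qc, M.T @ S

x = rng.normal(size=(4, 1))              # arbitrary state
u = rng.normal(size=(2, 1))              # arbitrary control (b_{t,t+1}, b_{t,t+2})

T = (M @ u + S @ x).item()               # taxes implied by the budget constraint
direct = T**2 + c1 * (u[0, 0] - u[1, 0])**2
quad = (x.T @ R @ x + u.T @ Q @ u + 2 * u.T @ W @ x).item()
print(direct, quad)
```

The two numbers agree because $u' Q^c u = (b_{t,t+1} - b_{t,t+2})^2$, so adding $c_1 Q^c$ to $Q_t$ is all that the penalty requires.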
where
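$$A_t = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22,t} \end{bmatrix}, \quad B = \begin{bmatrix} B_1 \\ 0 \end{bmatrix}, \quad C_t = \begin{bmatrix} 0 \\ C_{2,t} \end{bmatrix}$$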
Thus, in this problem all the matrices apart from 𝐵 may depend on the Markov state at time 𝑡.
As shown in the previous lecture, when provided with appropriate 𝐴, 𝐵, 𝐶, 𝑅, 𝑄, 𝑊 matrices for each Markov state the
LQMarkov class can solve Markov jump LQ problems.
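The function below maps the primitive matrices and parameters from the above two-period model into the matrices that
the LQMarkov class requires:
def LQ_markov_mapping(A22, C2, Ug, p1, p2, c1):
    """
    Function which takes A22, C2, Ug, p_{t, t+1}, p_{t, t+2} and penalty
    parameter c1, and returns the required matrices for the LQMarkov
    model: A, B, C, R, Q, W.
    This version uses the condensed version of the endogenous state.
    """
    # Make sure all inputs can be treated as 2D arrays
    A22, C2, Ug = np.atleast_2d(A22), np.atleast_2d(C2), np.atleast_2d(Ug)
    p1, p2 = np.atleast_2d(p1), np.atleast_2d(p2)
    # Number of exogenous states (z) and shocks (w)
    nz, nw = C2.shape
    # Create A11, B1, S1, Sg, S matrices
    A11 = np.array([[0., 1.], [0., 0.]])
    B1 = np.eye(2)
    S1 = np.hstack((np.eye(1), np.zeros((1, nz + 1))))
    Sg = np.hstack((np.zeros((1, 2)), Ug))
    S = S1 + Sg
    # Create M matrix
    M = np.hstack((-p1, -p2))
    # Create A, B, C matrices
    A_T = np.hstack((A11, np.zeros((2, nz))))
    A_B = np.hstack((np.zeros((nz, 2)), A22))
    A = np.vstack((A_T, A_B))
    B = np.vstack((B1, np.zeros((nz, 2))))
    C = np.vstack((np.zeros((2, nw)), C2))
    # Create Q^c matrix for the maturity penalty
    Qc = np.array([[1, -1], [-1, 1]])
    # Create R, Q, W matrices
    R = S.T @ S
    Q = M.T @ M + c1 * Qc
    W = M.T @ S
    return A, B, C, R, Q, W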
With the above function, we can proceed to solve the model in two steps:
1. Use LQ_markov_mapping to map 𝑈𝑔,𝑡 , 𝐴22,𝑡 , 𝐶2,𝑡 , 𝑝𝑡,𝑡+1 , 𝑝𝑡,𝑡+2 into the 𝐴, 𝐵, 𝐶, 𝑅, 𝑄, 𝑊 matrices for each
of the 𝑛 Markov states.
2. Use the LQMarkov class to solve the resulting n-state Markov jump LQ problem.
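To implement a simple example of the two-period model, we assume that 𝐺𝑡 follows an AR(1) process:
$$G_{t+1} = \bar G + \rho G_t + \sigma w_{t+1}$$
To do this, we set $z_t = \begin{bmatrix} 1 \\ G_t \end{bmatrix}$, and consequently:
$$A_{22} = \begin{bmatrix} 1 & 0 \\ \bar G & \rho \end{bmatrix}, \quad C_2 = \begin{bmatrix} 0 \\ \sigma \end{bmatrix}, \quad U_g = \begin{bmatrix} 0 & 1 \end{bmatrix}$$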
We first solve the model with no penalty parameter on different issuance across maturities, i.e. 𝑐1 = 0.
We specify that the transition matrix for the Markov state is
$$\Pi = \begin{bmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{bmatrix}$$
Thus, each Markov state is persistent, and there is an equal chance of moving from one to the other.
# Model parameters
β, Gbar, ρ, σ, c1 = 0.95, 5, 0.8, 1, 0
p1, p2, p3, p4 = β, β**2 - 0.02, β, β**2 + 0.02
# Basic model matrices for the AR(1) process of G_t
A22 = np.array([[1, 0], [Gbar, ρ]])
C_2 = np.array([[0], [σ]])
Ug = np.array([[0, 1]])
A1, B1, C1, R1, Q1, W1 = LQ_markov_mapping(A22, C_2, Ug, p1, p2, c1)
A2, B2, C2, R2, Q2, W2 = LQ_markov_mapping(A22, C_2, Ug, p3, p4, c1)
Π = np.array([[0.9, 0.1],
              [0.1, 0.9]])
The above simulations show that when no penalty is imposed on different issuances across maturities, the government has
an incentive to take large “long-short” positions in debt of different maturities.
To prevent such outcomes, we set 𝑐1 = 0.01.
This penalty is big enough to motivate the government to issue positive quantities of both one- and two-period debt:
# Put a small penalty on different issuance across maturities
c1 = 0.01
A1, B1, C1, R1, Q1, W1 = LQ_markov_mapping(A22, C_2, Ug, p1, p2, c1)
A2, B2, C2, R2, Q2, W2 = LQ_markov_mapping(A22, C_2, Ug, p3, p4, c1)
To map this into the Markov Jump LQ framework, we define state and control variables.
Let:
$$\bar b_t = \begin{bmatrix} b^{t-1}_t \\ b^{t-1}_{t+1} \\ \vdots \\ b^{t-1}_{t+H-1} \end{bmatrix}, \quad u_t = \begin{bmatrix} b^t_{t+1} \\ b^t_{t+2} \\ \vdots \\ b^t_{t+H} \end{bmatrix}$$
Thus, 𝑏̄𝑡 is the endogenous state (debt issued last period) and 𝑢𝑡 is the control (debt issued today).
As before, we will also have the exogenous state 𝑧𝑡 , which determines government spending.
$$x_t = \begin{bmatrix} \bar b_t \\ z_t \end{bmatrix}$$
We also define a vector 𝑝𝑡 that contains the time 𝑡 price of goods in period 𝑡 + 𝑗:
$$p_t = \begin{bmatrix} p_{t,t+1} \\ p_{t,t+2} \\ \vdots \\ p_{t,t+H} \end{bmatrix}$$
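$$\begin{bmatrix} p_{t,t+1} \\ p_{t,t+2} \\ \vdots \\ p_{t,t+H-1} \end{bmatrix} = S_s p_t \quad \text{where} \quad S_s = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$
$$\begin{bmatrix} b^{t-1}_{t+1} \\ b^{t-1}_{t+2} \\ \vdots \\ b^{t-1}_{t+H-1} \end{bmatrix} = S_x \bar b_t \quad \text{where} \quad S_x = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}$$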
or
𝑇𝑡 = 𝑆𝑡 𝑥𝑡 − 𝑝𝑡′ 𝑢𝑡
Therefore
where
where, to economize on notation, we adopt the convention that for the linear state matrices $R_t \equiv R_{s_t}$, $Q_t \equiv Q_{s_t}$ and so
on.
We’ll use this convention for the linear state matrices 𝐴, 𝐵, 𝑊 and so on below.
Because the payoff function also includes the penalty parameter for rescheduling, we have:
$$T_t^2 + \sum_{j=0}^{H-1} c_2 (b^{t-1}_{t+j} - b^t_{t+j+1})^2 = T_t^2 + c_2 (\bar b_t - u_t)' (\bar b_t - u_t)$$
Because the complete state is 𝑥𝑡 and not 𝑏̄𝑡 , we rewrite this as:
where 𝑆𝑐 = [𝐼 0]
Multiplying this out gives:
Therefore, with the cost term, we must amend our 𝑅, 𝑄, 𝑊 matrices as follows:
$$R^c_t = R_t + c_2 S_c' S_c$$
$$Q^c_t = Q_t + c_2 I$$
$$W^c_t = W_t - c_2 S_c$$
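The three amendments can be verified by expanding the rescheduling cost on a random state and control. The following is a small sketch; the dimensions, the cost value `c2 = 0.5` and the test vectors are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(1)
H, nz, c2 = 3, 2, 0.5
Sc = np.hstack((np.eye(H), np.zeros((H, nz))))   # S_c = [I 0] picks b̄_t out of x_t

x = rng.normal(size=(H + nz, 1))                 # arbitrary full state
u = rng.normal(size=(H, 1))                      # arbitrary control
b_bar = Sc @ x

# Rescheduling cost computed directly from its definition
direct = c2 * ((b_bar - u).T @ (b_bar - u)).item()

# The same cost written as additions to R, Q and W
dR, dQ, dW = c2 * Sc.T @ Sc, c2 * np.eye(H), -c2 * Sc
quad = (x.T @ dR @ x + u.T @ dQ @ u + 2 * u.T @ dW @ x).item()
print(direct, quad)
```

The minus sign in the 𝑊 amendment comes from the cross term $-2 c_2 u_t' S_c x_t$ in the expansion.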
To finish mapping into the Markov jump LQ setup, we need to construct the law of motion for the full state.
This is simpler than in the previous setup, as we now have 𝑏̄𝑡+1 = 𝑢𝑡 .
Therefore:
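$$x_{t+1} \equiv \begin{bmatrix} \bar b_{t+1} \\ z_{t+1} \end{bmatrix} = A_t x_t + B u_t + C_t w_{t+1}$$
where
$$A_t = \begin{bmatrix} 0 & 0 \\ 0 & A_{22,t} \end{bmatrix}, \quad B = \begin{bmatrix} I \\ 0 \end{bmatrix}, \quad C_t = \begin{bmatrix} 0 \\ C_{2,t} \end{bmatrix}$$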
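We can define a function that maps the primitives of the model with restructuring into the matrices required by the
LQMarkov class:
def LQ_markov_mapping_restruct(A22, C2, Ug, T, p_t, c):
    """
    Function which takes A22, C2, T, p_t, c and returns the
    required matrices for the LQMarkov model: A, B, C, R, Q, W
    Note, p_t should be a T by 1 matrix
    """
    # Make sure all inputs can be treated as 2D arrays
    A22, C2, Ug, p_t = (np.atleast_2d(m) for m in (A22, C2, Ug, p_t))
    nz, nw = C2.shape  # number of exogenous states and shocks
    # Create Sx, tSx, Ss, S_t matrices (tSx stands for \tilde S_x)
    Ss = np.hstack((np.eye(T-1), np.zeros((T-1, 1))))
    Sx = np.hstack((np.zeros((T-1, 1)), np.eye(T-1)))
    tSx = np.zeros((1, T))
    tSx[0, 0] = 1
    S_t = np.hstack((tSx + p_t.T @ Ss.T @ Sx, Ug))
    # Create A, B, C matrices
    A_T = np.hstack((np.zeros((T, T)), np.zeros((T, nz))))
    A_B = np.hstack((np.zeros((nz, T)), A22))
    A = np.vstack((A_T, A_B))
    B = np.vstack((np.eye(T), np.zeros((nz, T))))
    C = np.vstack((np.zeros((T, nw)), C2))
    # Create R, Q, W matrices, including the rescheduling cost c
    Sc = np.hstack((np.eye(T), np.zeros((T, nz))))
    R_c = S_t.T @ S_t + c * Sc.T @ Sc
    Q_c = p_t @ p_t.T + c * np.eye(T)
    W_c = -p_t @ S_t - c * Sc
    return A, B, C, R_c, Q_c, W_c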
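The selection matrices Ss and Sx built in the function can be tried on a small case to see what they pick out. This sketch uses 𝐻 = 3 with the state-1 prices from this lecture's 𝐻 = 3 example; the inherited debt vector `b_bar` is a hypothetical value chosen only for illustration.

```python
import numpy as np

H = 3
Ss = np.hstack((np.eye(H - 1), np.zeros((H - 1, 1))))   # keeps the first H-1 entries
Sx = np.hstack((np.zeros((H - 1, 1)), np.eye(H - 1)))   # keeps the last H-1 entries

p_t = np.array([[0.9695], [0.902], [0.8369]])   # prices for maturities 1, 2, 3
b_bar = np.array([[10.0], [20.0], [30.0]])      # hypothetical debt issued at t-1

print((Ss @ p_t).ravel())     # prices of claims maturing at t+1 and t+2
print((Sx @ b_bar).ravel())   # inherited promises maturing at t+1 and t+2

# Time-t market value of inherited debt that has not yet matured
value = (p_t.T @ Ss.T @ Sx @ b_bar).item()
print(value)
```

Pairing the shortened price vector with the shifted debt vector prices each outstanding promise at its remaining maturity.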
As an example let 𝐻 = 3.
Assume that there are two Markov states, one with a flatter yield curve, the other with a steeper yield curve.
In state 1, prices are:
$$p^1_{t,t+1} = 0.9695, \quad p^1_{t,t+2} = 0.902, \quad p^1_{t,t+3} = 0.8369$$
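To see the shape of the yield curve implied by these prices, each price can be converted into a per-period yield. The conversion below assumes per-period compounding, 𝑝 = (1 + 𝑦)^(−𝑗) for a 𝑗-period bond; that convention is an assumption made here for illustration.

```python
# State-1 bond prices for maturities 1, 2, 3 (from the text)
prices = [0.9695, 0.902, 0.8369]

# Per-period net yield on a j-period zero-coupon bond: p = (1 + y)^(-j)
yields = [p ** (-1 / j) - 1 for j, p in enumerate(prices, start=1)]

for j, y in enumerate(yields, start=1):
    print(f"maturity {j}: yield {y:.4f}")
```

Under this convention the yields rise with maturity, so the state-1 curve slopes upward.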
We specify the same transition matrix and 𝐺𝑡 process that we used earlier.
A1, B1, C1, R1, Q1, W1 = LQ_markov_mapping_restruct(A22, C_2, Ug, H, p1, c2)
A2, B2, C2, R2, Q2, W2 = LQ_markov_mapping_restruct(A22, C_2, Ug, H, p2, c2)
fig, ax = plt.subplots()
ax.plot((u[0, :] / (u[0, :] + u[1, :] + u[2, :])))
ax.set_title('One-period debt issuance share')
ax.set_xlabel('Time')
plt.show()
CHAPTER
ELEVEN
11.1 Overview
This lecture presents another application of Markov jump linear quadratic dynamic programming and constitutes a sequel
to an earlier lecture.
We again use a method introduced in lecture Markov Jump LQ dynamic programming to implement some ideas of [Barro,
1999] and [Barro and McCleary, 2003] that extend the classic [Barro, 1979] model of tax smoothing.
[Barro, 1979] is about a government that borrows and lends in order to help it minimize an intertemporal measure of
distortions caused by taxes.
Technically, [Barro, 1979] looks a lot like a consumption-smoothing model.
Our generalization will also look like a souped-up consumption-smoothing model.
In this lecture, we describe a tax-smoothing problem of a government that faces roll-over risk.
In addition to what’s in Anaconda, this lecture deploys the quantecon library:
import quantecon as qe
import numpy as np
import matplotlib.pyplot as plt
Let $T_t$ denote tax collections, $\beta$ a discount factor, $b_{t,t+1}$ time $t+1$ goods that the government promises to pay at $t$, $G_t$
government purchases, and $p^t_{t+1}$ the number of time $t$ goods received per time $t+1$ goods promised.
The stochastic process of government expenditures is exogenous.
The government’s problem is to choose a plan for borrowing and tax collections $\{b_{t+1}, T_t\}_{t=0}^\infty$ to minimize
$$E_0 \sum_{t=0}^\infty \beta^t T_t^2$$
𝐺𝑡 = 𝑈𝑔,𝑡 𝑧𝑡
in Markov state 2.
Consequently, in the second Markov state, the government is unable to borrow, and the budget constraint becomes 𝑇𝑡 =
𝐺𝑡 + 𝑏𝑡−1,𝑡 .
However, if this is the only adjustment we make in our linear-quadratic model, the government will not set 𝑏𝑡,𝑡+1 = 0,
which is the outcome we want to express roll-over risk in period 𝑡.
Instead, the government would have an incentive to set 𝑏𝑡,𝑡+1 to a large negative number in state 2: it would accumulate
large amounts of assets to bring into period 𝑡 + 1 because that is cheap (the Riccati equations will tell us this).
Thus, we must represent “roll-over risk” some other way.
To force the government to set 𝑏𝑡,𝑡+1 = 0, we can instead extend the model to have four Markov states:
1. Good today, good yesterday
2. Good today, bad yesterday
3. Bad today, good yesterday
4. Bad today, bad yesterday
where good is a state in which effectively the government can issue debt and bad is a state in which effectively the
government can’t issue debt.
We’ll explain what effectively means shortly.
We now set
$$p^t_{t+1} = \beta$$
in all states.
In addition – and this is important because it defines what we mean by effectively – we put a large penalty on the 𝑏𝑡−1,𝑡
element of the state vector in states 2 and 4.
This will prevent the government from wishing to issue any debt in states 3 or 4 because it would experience a large
penalty from doing so in the next period.
The transition matrix for this formulation is:
$$\Pi = \begin{bmatrix} 0.95 & 0 & 0.05 & 0 \\ 0.95 & 0 & 0.05 & 0 \\ 0 & 0.9 & 0 & 0.1 \\ 0 & 0.9 & 0 & 0.1 \end{bmatrix}$$
This transition matrix ensures that the Markov state cannot move, for example, from state 3 to state 1.
Because state 3 is “bad today”, the next period cannot have “good yesterday”.
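The zero pattern of this matrix follows mechanically from the requirement that tomorrow's "yesterday" equals today's condition. The sketch below rebuilds the matrix from a two-state chain for market access; the matrix `P_access` is read off from the entries of Π and is an assumption introduced here, not an object defined in the lecture.

```python
import numpy as np

# Two-state chain for market access: index 0 = good, 1 = bad.
# Rows give today's condition, columns give tomorrow's.
P_access = np.array([[0.95, 0.05],
                     [0.90, 0.10]])

# Composite states in the order used in the text: (today, yesterday)
states = [("good", "good"), ("good", "bad"), ("bad", "good"), ("bad", "bad")]
idx = {"good": 0, "bad": 1}

Π = np.zeros((4, 4))
for i, (today, _) in enumerate(states):
    for j, (tomorrow, tomorrows_yesterday) in enumerate(states):
        # A transition is feasible only if tomorrow's "yesterday" equals today
        if tomorrows_yesterday == today:
            Π[i, j] = P_access[idx[today], idx[tomorrow]]

print(Π)
```

Rows 1 and 2 coincide, as do rows 3 and 4, because transition probabilities depend only on today's condition.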
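# Model parameters
β, Gbar, ρ, σ = 0.95, 5, 0.8, 1
# Basic model matrices (same G_t process as before)
A22 = np.array([[1, 0], [Gbar, ρ]])
Ug = np.array([[0, 1]])
# LQ framework matrices
A_t = np.zeros((1, 3))
A_b = np.hstack((np.zeros((2, 1)), A22))
A = np.vstack((A_t, A_b))
B = np.zeros((3, 1))
B[0, 0] = 1
# T_t = S x_t + M u_t with x_t = (b_{t-1,t}, z_t), so S = [1, Ug]
S = np.hstack((np.ones((1, 1)), Ug))
R = S.T @ S
M = np.array([[-β]])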
Using the same process for 𝐺𝑡 as in this lecture, we shall simulate our model with roll-over risk.
When $p^t_{t+1} = \beta$, government debt fluctuates around zero.
The spikes in the tax collection series indicate periods when the government is unable to access financial markets:
• positive spikes occur when debt is positive and the government must urgently raise tax revenues now
• negative spikes occur when the government has positive asset holdings
An inability to use financial markets in the next period means that the government uses those assets to lower taxation
today.
x0 = np.array([[0, 1, 25]])
T = 300
x, u, w, state = lqm.compute_sequence(x0, ts_length=T)
# Calculate taxation each period from the budget constraint and the Markov state
tax = np.zeros([T, 1])
for i in range(T):
    tax[i, :] = S @ x[:, i] + M @ u[:, i]
We can adjust parameters so that, rather than debt fluctuating around zero, the government is a debtor in every period
that it can borrow.
To accomplish this, we simply raise $p^t_{t+1}$ to $\beta + 0.02 = 0.97$.
M = np.array([[-β - 0.02]])
Q = M.T @ M
W = M.T @ S
# Calculate taxation each period from the budget constraint and the
# Markov state
tax = np.zeros([T, 1])
for i in range(T):
    tax[i, :] = S @ x[:, i] + M @ u[:, i]
With a lower interest rate, the government has an incentive to increase debt over time.
However, with “roll-over risk”, debt is recurrently reset to zero and tax collections spike up.
In this model, high costs of a “sudden stop” make the government wary about letting its debt get too high.
CHAPTER
TWELVE
In addition to what’s in Anaconda, this lecture will need the following libraries:
12.1 Overview
import sys
import numpy as np
import matplotlib.pyplot as plt
We begin by outlining the key assumptions regarding technology, households and the government sector.
12.2.1 Technology
12.2.2 Households
Consider a representative household who chooses a path {ℓ𝑡 , 𝑐𝑡 } for labor and consumption to maximize
$$-\mathbb{E} \, \frac{1}{2} \sum_{t=0}^\infty \beta^t \left[ (c_t - b_t)^2 + \ell_t^2 \right] \qquad (12.1)$$
Here
• 𝛽 is a discount factor in (0, 1).
• $p^0_t$ is a scaled Arrow-Debreu price at time 0 of history contingent goods at time $t$.
• 𝑏𝑡 is a stochastic preference parameter.
• 𝑑𝑡 is an endowment process.
• 𝜏𝑡 is a flat tax rate on labor income.
$$\frac{\beta^t p^0_t}{\pi^0_t(x^t)}$$
Thus, our scaled Arrow-Debreu price is the ordinary Arrow-Debreu price multiplied by the discount factor 𝛽 𝑡 and divided
by an appropriate probability.
The budget constraint (12.2) requires that the present value of consumption be restricted to equal the present value of
endowments, labor income and coupon payments on bond holdings.
12.2.3 Government
The government imposes a linear tax on labor income, fully committing to a stochastic path of tax rates at time zero.
The government also issues state-contingent debt.
Given government tax and borrowing plans, we can construct a competitive equilibrium with distorting government taxes.
Among all such competitive equilibria, the Ramsey plan is the one that maximizes the welfare of the representative
consumer.
Endowments, government expenditure, the preference shock process 𝑏𝑡 , and promised coupon payments on initial
government debt 𝑠𝑡 are all exogenous, and given by
• 𝑑𝑡 = 𝑆𝑑 𝑥𝑡
• 𝑔𝑡 = 𝑆𝑔 𝑥𝑡
• 𝑏𝑡 = 𝑆𝑏 𝑥𝑡
• 𝑠𝑡 = 𝑆𝑠 𝑥𝑡
The matrices 𝑆𝑑 , 𝑆𝑔 , 𝑆𝑏 , 𝑆𝑠 are primitives and {𝑥𝑡 } is an exogenous stochastic process taking values in ℝ𝑘 .
We consider two specifications for {𝑥𝑡 }.
1. Discrete case: {𝑥𝑡 } is a discrete state Markov chain with transition matrix 𝑃 .
2. VAR case: {𝑥𝑡 } obeys 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1 where {𝑤𝑡 } is independent zero-mean Gaussian with identity
covariance matrix.
12.2.5 Feasibility
𝑐𝑡 + 𝑔𝑡 = 𝑑𝑡 + ℓ𝑡 (12.3)
Where 𝑝𝑡0 is again a scaled Arrow-Debreu price, the time zero government budget constraint is
$$\mathbb{E} \sum_{t=0}^\infty \beta^t p^0_t (s_t + g_t - \tau_t \ell_t) = 0 \qquad (12.4)$$
12.2.7 Equilibrium
An equilibrium is a feasible allocation {ℓ𝑡 , 𝑐𝑡 }, a sequence of prices {𝑝𝑡0 }, and a tax system {𝜏𝑡 } such that
1. The allocation {ℓ𝑡 , 𝑐𝑡 } is optimal for the household given {𝑝𝑡0 } and {𝜏𝑡 }.
2. The government’s budget constraint (12.4) is satisfied.
The Ramsey problem is to choose the equilibrium {ℓ𝑡 , 𝑐𝑡 , 𝜏𝑡 , 𝑝𝑡0 } that maximizes the household’s welfare.
If {ℓ𝑡 , 𝑐𝑡 , 𝜏𝑡 , 𝑝𝑡0 } solves the Ramsey problem, then {𝜏𝑡 } is called the Ramsey plan.
The solution procedure we adopt is
1. Use the first-order conditions from the household problem to pin down prices and allocations given {𝜏𝑡 }.
2. Use these expressions to rewrite the government budget constraint (12.4) in terms of exogenous variables and
allocations.
3. Maximize the household’s objective function (12.1) subject to the constraint constructed in step 2 and the feasibility
constraint (12.3).
The solution to this maximization problem pins down all quantities of interest.
12.2.8 Solution
Step one is to obtain the first-order conditions for the household’s problem, taking taxes and prices as given.
Letting 𝜇 be the Lagrange multiplier on (12.2), the first-order conditions are 𝑝𝑡0 = (𝑐𝑡 − 𝑏𝑡 )/𝜇 and ℓ𝑡 = (𝑐𝑡 − 𝑏𝑡 )(1 − 𝜏𝑡 ).
Rearranging and normalizing at 𝜇 = 𝑏0 − 𝑐0 , we can write these conditions as
$$p^0_t = \frac{b_t - c_t}{b_0 - c_0} \quad \text{and} \quad \tau_t = 1 - \frac{\ell_t}{b_t - c_t} \qquad (12.5)$$
The Ramsey problem now amounts to maximizing (12.1) subject to (12.6) and (12.3).
The associated Lagrangian is
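$$\mathcal{L} = \mathbb{E} \sum_{t=0}^\infty \beta^t \left\{ -\frac{1}{2} \left[ (c_t - b_t)^2 + \ell_t^2 \right] + \lambda \left[ (b_t - c_t)(\ell_t - s_t - g_t) - \ell_t^2 \right] + \mu_t \left[ d_t + \ell_t - c_t - g_t \right] \right\} \qquad (12.7)$$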
and
ℓ𝑡 − 𝜆[(𝑏𝑡 − 𝑐𝑡 ) − 2ℓ𝑡 ] = 𝜇𝑡
Combining these last two equalities with (12.3) and working through the algebra, one can show that
where
• 𝜈 ∶= 𝜆/(1 + 2𝜆)
• $\bar \ell_t := (b_t - d_t + g_t)/2$
• $\bar c_t := (b_t + d_t - g_t)/2$
• 𝑚𝑡 ∶= (𝑏𝑡 − 𝑑𝑡 − 𝑠𝑡 )/2
Apart from 𝜈, all of these quantities are expressed in terms of exogenous variables.
To solve for 𝜈, we can use the government’s budget constraint again.
The term inside the brackets in (12.6) is (𝑏𝑡 − 𝑐𝑡 )(𝑠𝑡 + 𝑔𝑡 ) − (𝑏𝑡 − 𝑐𝑡 )ℓ𝑡 + ℓ𝑡2 .
Using (12.8), the definitions above and the fact that $\bar \ell = b - \bar c$, this term can be rewritten as
𝑏0 + 𝑎0 (𝜈 2 − 𝜈) = 0
for 𝜈.
Provided that 4𝑏0 < 𝑎0 , there is a unique solution 𝜈 ∈ (0, 1/2), and a unique corresponding 𝜆 > 0.
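Solving this quadratic is a one-line computation once 𝑎0 and 𝑏0 are known. The sketch below uses a hypothetical helper `solve_nu` with illustrative inputs; it additionally assumes 𝑏0 > 0, which is needed for the smaller root to be positive, and it recovers 𝜆 by inverting 𝜈 = 𝜆/(1 + 2𝜆).

```python
from math import sqrt

def solve_nu(a0, b0):
    """
    Hypothetical helper: return the root ν in (0, 1/2) of
    b0 + a0 * (ν**2 - ν) = 0 and the implied multiplier λ = ν / (1 - 2ν).
    """
    if not 0 < 4 * b0 < a0:
        raise ValueError("need 0 < 4 * b0 < a0 for a root in (0, 1/2)")
    disc = a0**2 - 4 * a0 * b0
    ν = 0.5 * (a0 - sqrt(disc)) / a0    # smaller of the two roots
    λ = ν / (1 - 2 * ν)
    return ν, λ

ν, λ = solve_nu(a0=10.0, b0=1.0)        # illustrative values only
print(ν, λ)
```

The two roots of the quadratic sum to one, so taking the smaller root is what places 𝜈 below 1/2.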
Let’s work out how to compute mathematical expectations in (12.10).
For the first one, the random variable $(b_t - \bar c_t)(g_t + s_t)$ inside the summation can be expressed as
$$\frac{1}{2} x_t' (S_b - S_d + S_g)' (S_g + S_s) x_t$$
For the second expectation in (12.10), the random variable $2 m_t^2$ can be written as
$$\frac{1}{2} x_t' (S_b - S_d - S_s)' (S_b - S_d - S_s) x_t$$
It follows that both objects of interest are special cases of the expression
$$q(x_0) = \mathbb{E} \sum_{t=0}^\infty \beta^t x_t' H x_t \qquad (12.11)$$
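For the VAR case, a sum of this form can be evaluated by guessing 𝑞(𝑥0) = 𝑥0′𝑄𝑥0 + 𝑣 and matching coefficients, which gives 𝑄 = 𝐻 + 𝛽𝐴′𝑄𝐴 and 𝑣 = 𝛽 trace(𝐶′𝑄𝐶)/(1 − 𝛽). The sketch below iterates on the first equation; the function name `var_quadratic_sum_sketch` is hypothetical and this is not the library routine used later in the lecture. It assumes the iteration converges, which requires 𝛽 times the squared spectral radius of 𝐴 to be below one.

```python
import numpy as np

def var_quadratic_sum_sketch(A, C, H, β, x0, tol=1e-12, max_iter=100_000):
    """
    Hypothetical helper: evaluate q(x0) = E Σ β^t x_t' H x_t for
    x_{t+1} = A x_t + C w_{t+1}, w ~ N(0, I), via q(x0) = x0' Q x0 + v,
    where Q = H + β A' Q A and v = β trace(C' Q C) / (1 - β).
    """
    Q = np.zeros_like(H, dtype=float)
    for _ in range(max_iter):
        Q_new = H + β * A.T @ Q @ A
        converged = np.max(np.abs(Q_new - Q)) < tol
        Q = Q_new
        if converged:
            break
    v = β * np.trace(C.T @ Q @ C) / (1 - β)
    return (x0.T @ Q @ x0).item() + v

# Scalar check: x_{t+1} = ρ x_t + σ w_{t+1} with H = 1
β, ρ, σ, x0 = 0.95, 0.8, 0.5, 2.0
q = var_quadratic_sum_sketch(np.array([[ρ]]), np.array([[σ]]),
                             np.array([[1.0]]), β, np.array([[x0]]))
print(q)
```

In the scalar case the answer can be confirmed by hand, since 𝑄 = 1/(1 − 𝛽𝜌²).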
Next, suppose that {𝑥𝑡 } is the discrete Markov process described above.
Suppose further that each 𝑥𝑡 takes values in the state space {𝑥1 , … , 𝑥𝑁 } ⊂ ℝ𝑘 .
Let ℎ ∶ ℝ𝑘 → ℝ be a given function, and suppose that we wish to evaluate
$$q(x_0) = \mathbb{E} \sum_{t=0}^\infty \beta^t h(x_t) \quad \text{given} \quad x_0 = x^j$$
Here
• 𝑃 𝑡 is the 𝑡-th power of the transition matrix 𝑃 .
• ℎ is, with some abuse of notation, the vector (ℎ(𝑥1 ), … , ℎ(𝑥𝑁 )).
• (𝑃 𝑡 ℎ)[𝑗] indicates the 𝑗-th element of 𝑃 𝑡 ℎ.
It can be shown that (12.12) is in fact equal to the 𝑗-th element of the vector $(I - \beta P)^{-1} h$.
This last fact is applied in the calculations below.
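The matrix-inverse formula can be checked against a long truncated sum. The sketch below uses the three-state transition matrix from the discrete example in this lecture; the vector `h` holds hypothetical payoff values chosen only for illustration.

```python
import numpy as np

β = 1 / 1.05
P = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
h = np.array([1.0, 2.0, 3.0])        # hypothetical values h(x^1), h(x^2), h(x^3)

# q[j] = E Σ β^t h(x_t) given x_0 = x^j, computed as (I - βP)^{-1} h
q = np.linalg.solve(np.eye(3) - β * P, h)

# Cross-check with a truncated version of Σ β^t P^t h
q_trunc, term = np.zeros(3), h.copy()
for t in range(2000):
    q_trunc += β**t * term
    term = P @ term

print(q, q_trunc)
```

Solving the linear system is preferable to forming the inverse explicitly, and the third entry is easy to verify by hand because state 3 is absorbing.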
We are interested in tracking several other variables besides the ones described above.
To prepare the way for this, we define
$$p^t_{t+j} = \frac{b_{t+j} - c_{t+j}}{b_t - c_t}$$
as the scaled Arrow-Debreu time 𝑡 price of a history contingent claim on one unit of consumption at time 𝑡 + 𝑗.
These are prices that would prevail at time 𝑡 if markets were reopened at time 𝑡.
These prices are constituents of the present value of government obligations outstanding at time 𝑡, which can be expressed
as
$$B_t := \mathbb{E}_t \sum_{j=0}^\infty \beta^j p^t_{t+j} (\tau_{t+j} \ell_{t+j} - g_{t+j}) \qquad (12.13)$$
Using our expression for prices and the Ramsey plan, we can also write 𝐵𝑡 as
$$B_t = \mathbb{E}_t \sum_{j=0}^\infty \beta^j \frac{(b_{t+j} - c_{t+j})(\ell_{t+j} - g_{t+j}) - \ell_{t+j}^2}{b_t - c_t}$$
and
$$B_t = (\tau_t \ell_t - g_t) + \beta E_t p^t_{t+1} B_{t+1} \qquad (12.14)$$
Define
$$R_t^{-1} := \mathbb{E}_t \beta p^t_{t+1} \qquad (12.15)$$
12.2.12 A Martingale
where $\tilde E_t$ is the conditional mathematical expectation taken with respect to a one-step transition density that has been
formed by multiplying the original transition density with the likelihood ratio
$$m^t_{t+1} = \frac{p^t_{t+1}}{E_t p^t_{t+1}}$$
which asserts that {𝜋𝑡+1 } is a martingale difference sequence under the distorted probability measure, and that {Π𝑡 } is
a martingale under the distorted probability measure.
In the tax-smoothing model of Robert Barro [Barro, 1979], government debt is a random walk.
In the current model, government debt {𝐵𝑡 } is not a random walk, but the excess payoff {Π𝑡 } on it is.
12.3 Implementation
Parameters
===========
T: int
Length of the simulation
Returns
========
path: a namedtuple of type 'Path', containing
g - Govt spending
d - Endowment
b - Utility shift parameter
s - Coupon payment on existing debt
c - Consumption
l - Labor
p - Price
τ - Tax rate
rvn - Revenue
B - Govt debt
R - Risk-free gross return
π - One-period risk-free interest rate
Π - Cumulative rate of return, adjusted
ξ - Adjustment factor for Π
"""
# Simplify names
β, Sg, Sd, Sb, Ss = econ.β, econ.Sg, econ.Sd, econ.Sb, econ.Ss
if econ.discrete:
    P, x_vals = econ.proc
else:
    A, C = econ.proc
return path
def gen_fig_1(path):
    """
    The parameter is the path namedtuple returned by compute_paths(). See
    the docstring of that function for details.
    """
    T = len(path.c)
    # Prepare axes
    num_rows, num_cols = 2, 2
    fig, axes = plt.subplots(num_rows, num_cols, figsize=(14, 10))
    plt.subplots_adjust(hspace=0.4)
    for i in range(num_rows):
        for j in range(num_cols):
            axes[i, j].grid()
            axes[i, j].set_xlabel('Time')
    bbox = (0., 1.02, 1., .102)
    legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
    p_args = {'lw': 2, 'alpha': 0.7}
    plt.show()
def gen_fig_2(path):
    """
    The parameter is the path namedtuple returned by compute_paths(). See
    the docstring of that function for details.
    """
    T = len(path.c)
    # Prepare axes
    num_rows, num_cols = 2, 1
    fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 10))
    plt.subplots_adjust(hspace=0.5)
    plt.show()
The function var_quadratic_sum imported from quadsums is for computing the value of (12.11) when the
exogenous process {𝑥𝑡 } is of the VAR type described above.
Below the definition of the function, you will see definitions of two namedtuple objects, Economy and Path.
The first is used to collect all the parameters and primitives of a given LQ economy, while the second collects output of
the computations.
In Python, a namedtuple is a popular data type from the collections module of the standard library that replicates
the functionality of a tuple, but also allows you to assign a name to each tuple element.
These elements can then be referenced via dotted attribute notation; see for example the use of path in the functions
gen_fig_1() and gen_fig_2().
The benefits of using namedtuples:
• Keeps content organized by meaning.
• Helps reduce the number of global variables.
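A short standalone illustration of the namedtuple features just described follows. The type name `Point` and its fields are made up for this example and are unrelated to the Economy and Path objects of the lecture.

```python
from collections import namedtuple

# A small record type; the field names here are illustrative only
Point = namedtuple("Point", ["time", "tax_rate", "debt"])

obs = Point(time=0, tax_rate=0.2, debt=1.5)

print(obs.tax_rate)           # access by name
print(obs[1])                 # access by position, as with an ordinary tuple
time, tax_rate, debt = obs    # unpacking also works

# namedtuples are immutable; _replace returns a modified copy
obs2 = obs._replace(debt=2.0)
print(obs2)
```

Because instances are immutable, a collection of parameters stored this way cannot be changed by accident inside a function that receives it.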
Other than that, our code is long but relatively straightforward.
12.4 Examples
# == Parameters == #
β = 1 / 1.05
ρ, mg = .7, .35
A = np.eye(2)
A[0, :] = ρ, mg * (1-ρ)
C = np.zeros((2, 1))
C[0, 0] = np.sqrt(1 - ρ**2) * mg / 10
Sg = np.array((1, 0)).reshape(1, 2)
Sd = np.array((0, 0)).reshape(1, 2)
Sb = np.array((0, 2.135)).reshape(1, 2)
Ss = np.array((0, 0)).reshape(1, 2)
# Collect the primitives; field names follow the attributes used in compute_paths()
economy = Economy(β=β, Sg=Sg, Sd=Sd, Sb=Sb, Ss=Ss,
                  discrete=False, proc=(A, C))
T = 50
path = compute_paths(T, economy)
gen_fig_1(path)
gen_fig_2(path)
Our second example adopts a discrete Markov specification for the exogenous process.
# == Parameters == #
β = 1 / 1.05
P = np.array([[0.8, 0.2, 0.0],
[0.0, 0.5, 0.5],
[0.0, 0.0, 1.0]])
Sg = np.array((1, 0, 0, 0, 0)).reshape(1, 5)
Sd = np.array((0, 1, 0, 0, 0)).reshape(1, 5)
Sb = np.array((0, 0, 1, 0, 0)).reshape(1, 5)
Ss = np.array((0, 0, 0, 1, 0)).reshape(1, 5)
T = 15
path = compute_paths(T, economy)
gen_fig_1(path)
gen_fig_2(path)
12.5 Exercises
Exercise 12.5.1
Modify the VAR example given above, setting
# == Parameters == #
β = 1 / 1.05
ρ, mg = .95, .35
A = np.array([[0, 0, 0, ρ, mg*(1-ρ)],
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 1]])
C = np.zeros((5, 1))
C[0, 0] = np.sqrt(1 - ρ**2) * mg / 8
Sg = np.array((1, 0, 0, 0, 0)).reshape(1, 5)
Sd = np.array((0, 0, 0, 0, 0)).reshape(1, 5)
# Chosen st. (Sc + Sg) * x0 = 1
Sb = np.array((0, 0, 0, 0, 2.135)).reshape(1, 5)
Ss = np.array((0, 0, 0, 0, 0)).reshape(1, 5)
T = 50
path = compute_paths(T, economy)
gen_fig_1(path)
gen_fig_2(path)
CHAPTER
THIRTEEN
In addition to what’s in Anaconda, this lecture will need the following libraries:
13.1 Overview
This lecture computes versions of Arellano’s [Arellano, 2008] model of sovereign default.
The model describes interactions among default risk, output, and an equilibrium interest rate that includes a premium for
endogenous default risk.
The decision maker is a government of a small open economy that borrows from risk-neutral foreign creditors.
The foreign lenders must be compensated for default risk.
The government borrows and lends abroad in order to smooth the consumption of its citizens.
The government repays its debt only if it wants to, but declining to pay has adverse consequences.
The interest rate on government debt adjusts in response to the state-dependent default probability chosen by government.
The model yields outcomes that help interpret sovereign default experiences, including
• countercyclical interest rates on sovereign debt
• countercyclical trade balances
• high volatility of consumption relative to output
Notably, long recessions caused by bad draws in the income process increase the government’s incentive to default.
This can lead to
• spikes in interest rates
• temporary losses of access to international credit markets
• large drops in output, consumption, and welfare
• large capital outflows during recessions
Such dynamics are consistent with experiences of many countries.
Let’s start with some imports:
13.2 Structure
A small open economy is endowed with an exogenous stochastically fluctuating potential output stream {𝑦𝑡 }.
Potential output is realized only in periods in which the government honors its sovereign debt.
The output good can be traded or consumed.
The sequence {𝑦𝑡 } is described by a Markov process with stochastic density kernel 𝑝(𝑦, 𝑦′ ).
Households within the country are identical and rank stochastic consumption streams according to
∞
𝔼 ∑ 𝛽 𝑡 𝑢(𝑐𝑡 ) (13.1)
𝑡=0
Here
• 0 < 𝛽 < 1 is a time discount factor
• 𝑢 is an increasing and strictly concave utility function
Consumption sequences enjoyed by households are affected by the government’s decision to borrow or lend internationally.
The government is benevolent in the sense that its aim is to maximize (13.1).
The government is the only domestic actor with access to foreign credit.
Because households are averse to consumption fluctuations, the government will try to smooth consumption by borrowing
from (and lending to) foreign creditors.
The only credit instrument available to the government is a one-period bond traded in international credit markets.
The bond market has the following features
• The bond matures in one period and is not state contingent.
• A purchase of a bond with face value 𝐵′ is a claim to 𝐵′ units of the consumption good next period.
• To purchase 𝐵′ next period costs 𝑞𝐵′ now, or, what is equivalent, for selling −𝐵′ units of next period goods the
seller earns −𝑞𝐵′ of today’s goods.
– If 𝐵′ < 0, then −𝑞𝐵′ units of the good are received in the current period, for a promise to repay −𝐵′ units
next period.
– There is an equilibrium price function 𝑞(𝐵′ , 𝑦) that makes 𝑞 depend on both 𝐵′ and 𝑦.
Earnings on the government portfolio are distributed (or, if negative, taxed) lump sum to households.
When the government is not excluded from financial markets, the one-period national budget constraint is
$$c = y + B - q(B', y) B' \qquad (13.2)$$
Here and below, a prime denotes a next period value or a claim maturing next period.
To rule out Ponzi schemes, we also require that 𝐵 ≥ −𝑍 in every period.
• 𝑍 is chosen to be sufficiently large that the constraint never binds in equilibrium.
Foreign creditors
• are risk neutral
• know the domestic output stochastic process {𝑦𝑡 } and observe 𝑦𝑡 , 𝑦𝑡−1 , … , at time 𝑡
• can borrow or lend without limit in an international credit market at a constant international interest rate 𝑟
• receive full payment if the government chooses to pay
• receive zero if the government defaults on its one-period debt due
When a government is expected to default next period with probability 𝛿, the expected value of a promise to pay one unit
of consumption next period is 1 − 𝛿.
Therefore, the discounted expected value of a promise to pay 𝐵 next period is
$$q = \frac{1 - \delta}{1 + r} \qquad (13.3)$$
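Formula (13.3) makes it easy to see how default risk feeds into the interest rate the government pays. The sketch below uses a hypothetical helper `bond_price`; the default value 𝑟 = 0.017 is the lending rate used in the code later in this lecture, and the default probabilities are illustrative.

```python
def bond_price(δ, r=0.017):
    """Price today of a promise to pay one unit next period: q = (1 - δ) / (1 + r)."""
    return (1 - δ) / (1 + r)

for δ in (0.0, 0.05, 0.20):
    q = bond_price(δ)
    implied_rate = 1 / q - 1      # net interest rate paid if the debt is honored
    print(f"δ = {δ:.2f}: q = {q:.4f}, interest rate = {implied_rate:.4f}")
```

With no default risk the implied rate equals 𝑟, and it rises with 𝛿 because lenders must be compensated for expected losses.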
Next we turn to how the government in effect chooses the default probability 𝛿.
While in a state of default, the economy regains access to foreign credit in each subsequent period with probability 𝜃.
13.3 Equilibrium
Informally, an equilibrium is a sequence of interest rates on its sovereign debt, a stochastic sequence of government default
decisions and an implied flow of household consumption such that
1. Consumption and assets satisfy the national budget constraint.
2. The government maximizes household utility taking into account
• the resource constraint
• the effect of its choices on the price of bonds
• consequences of defaulting now for future net output and future borrowing and lending opportunities
3. The interest rate on the government’s debt includes a risk-premium sufficient to make foreign creditors expect on
average to earn the constant risk-free international interest rate.
To express these ideas more precisely, consider first the choices of the government, which
1. enters a period with initial assets 𝐵, or what is the same thing, initial debt to be repaid now of −𝐵
2. observes current output 𝑦, and
3. chooses either
1. to default, or
2. to pay −𝐵 and set next period’s debt due to −𝐵′
In a recursive formulation,
• state variables for the government comprise the pair (𝐵, 𝑦)
• 𝑣(𝐵, 𝑦) is the optimum value of the government’s problem when at the beginning of a period it faces the choice of
whether to honor or default
• 𝑣𝑐 (𝐵, 𝑦) is the value of choosing to pay obligations falling due
• 𝑣𝑑 (𝑦) is the value of choosing to default
𝑣𝑑 (𝑦) does not depend on 𝐵 because, when access to credit is eventually regained, net foreign assets equal 0.
Expressed recursively, the value of choosing to pay is
$$v_c(B, y) = \max_{B' \geq -Z} \left\{ u(y - q(B', y) B' + B) + \beta \int v(B', y') p(y, y') \, dy' \right\}$$
Given zero profits for foreign creditors in equilibrium, we can combine (13.3) and (13.4) to pin down the bond price
function:
$$q(B', y) = \frac{1 - \delta(B', y)}{1 + r} \tag{13.5}$$
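To make the pricing rule concrete on a discrete grid, here is a minimal pure-Python sketch. It assumes that the government defaults next period whenever 𝑣𝑐 < 𝑣𝑑, which is the rule used in the simulation code later in this lecture. The function names and all numbers in the toy example are hypothetical.

```python
def default_probabilities(v_c, v_d, P):
    """
    delta[b][y] = sum over y' of P[y][y'] * 1{v_c[b][y'] < v_d[y']}.

    v_c is indexed [B' index][income index], v_d is indexed [income index],
    and P is the income transition matrix.
    """
    n_B, n_y = len(v_c), len(v_d)
    delta = [[0.0] * n_y for _ in range(n_B)]
    for b in range(n_B):
        for y in range(n_y):
            delta[b][y] = sum(P[y][yp] for yp in range(n_y)
                              if v_c[b][yp] < v_d[yp])
    return delta


def bond_prices(delta, r):
    """Apply q = (1 - delta) / (1 + r) at every grid point."""
    return [[(1.0 - d) / (1.0 + r) for d in row] for row in delta]


# Toy inputs: two debt levels and two income states
P = [[0.9, 0.1],
     [0.2, 0.8]]
v_d = [-5.0, -4.0]
v_c = [[-6.0, -3.5],   # high debt: default preferred in the low-income state
       [-4.5, -3.0]]   # low debt: repayment preferred in both states

delta = default_probabilities(v_c, v_d, P)
q = bond_prices(delta, r=0.017)
print(delta)
print(q)
```

In the toy example the high-debt row carries a positive default probability and therefore a lower bond price, while the low-debt row is priced at the risk-free value 1/(1 + r).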
An equilibrium is
• a pricing function 𝑞(𝐵′ , 𝑦),
• a triple of value functions (𝑣𝑐 (𝐵, 𝑦), 𝑣𝑑 (𝑦), 𝑣(𝐵, 𝑦)),
• a decision rule telling the government when to default and when to pay as a function of the state (𝐵, 𝑦), and
• an asset accumulation rule that, conditional on choosing not to default, maps (𝐵, 𝑦) into 𝐵′
such that
• The three Bellman equations for (𝑣𝑐 (𝐵, 𝑦), 𝑣𝑑 (𝑦), 𝑣(𝐵, 𝑦)) are satisfied
• Given the price function 𝑞(𝐵′ , 𝑦), the default decision rule and the asset accumulation decision rule attain the
optimal value function 𝑣(𝐵, 𝑦), and
• The price function 𝑞(𝐵′ , 𝑦) satisfies equation (13.5)
13.4 Computation
class Arellano_Economy:
    " Stores data and creates primitives for the Arellano economy. "

    def __init__(self,
                 B_grid_size=251,     # Grid size for bonds
                 B_grid_min=-0.45,    # Smallest B value
                 B_grid_max=0.45,     # Largest B value
                 y_grid_size=51,      # Grid size for income
                 β=0.953,             # Time discount parameter
                 γ=2.0,               # Utility parameter
                 r=0.017,             # Lending rate
                 ρ=0.945,             # Persistence in the income process
                 η=0.025,             # Standard deviation of the income process
                 θ=0.282,             # Prob of re-entering financial markets
                 def_y_param=0.969):  # Parameter governing income in default

        # Save parameters
        self.β, self.γ, self.r = β, γ, r
        self.ρ, self.η, self.θ = ρ, η, θ

        self.y_grid_size = y_grid_size
        self.B_grid_size = B_grid_size
        self.B_grid = np.linspace(B_grid_min, B_grid_max, B_grid_size)
        mc = qe.markov.tauchen(y_grid_size, ρ, η, 0, 3)
        self.y_grid, self.P = np.exp(mc.state_values), mc.P

        # The index at which B_grid is (close to) zero
        self.B0_idx = np.searchsorted(self.B_grid, 1e-10)

        # Output received while in default, with the same shape as y_grid
        self.def_y = np.minimum(def_y_param * np.mean(self.y_grid), self.y_grid)

    def params(self):
        return self.β, self.γ, self.r, self.ρ, self.η, self.θ

    def arrays(self):
        return self.P, self.y_grid, self.B_grid, self.def_y, self.B0_idx
Notice how the class returns the data it stores as simple numerical values and arrays via the methods params and
arrays.
We will use this data in the Numba-jitted functions defined below.
Jitted functions prefer simple arguments, since type inference is easier.
Here is the utility function.
@njit
def u(c, γ):
    return c**(1-γ)/(1-γ)
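The formula above is undefined at γ = 1, where the CRRA family is conventionally represented by log utility. The following is a minimal pure-Python sketch of a version that handles that case and rejects nonpositive consumption. The name `crra_utility` is ours, and this variant is not part of the lecture's jitted code.

```python
import math


def crra_utility(c, gamma):
    """CRRA utility, using log utility at gamma = 1."""
    if c <= 0:
        raise ValueError("consumption must be positive")
    if abs(gamma - 1.0) < 1e-12:
        return math.log(c)
    return c ** (1.0 - gamma) / (1.0 - gamma)


for gamma in (0.5, 1.0, 2.0):
    print(gamma, crra_utility(2.0, gamma))
```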
Here is a function to compute the bond price at each state, given 𝑣𝑐 and 𝑣𝑑 .
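@njit
def compute_q(v_c, v_d, q, params, arrays):
    """
    Compute the bond price function q(b, y) at each (b, y) pair.
    The result is written to the array q passed in as an argument.
    """
    # Unpack
    β, γ, r, ρ, η, θ = params
    P, y_grid, B_grid, def_y, B0_idx = arrays

    for B_idx in range(len(B_grid)):
        for y_idx in range(len(y_grid)):
            # Probability of default next period, then the price from (13.5)
            delta = P[y_idx, v_c[B_idx, :] < v_d].sum()
            q[B_idx, y_idx] = (1 - delta) / (1 + r)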
@njit
def T_d(y_idx, v_c, v_d, params, arrays):
    """
    The RHS of the Bellman equation when income is at index y_idx and
    the country has chosen to default. Returns an update of v_d.
    """
    # Unpack
    β, γ, r, ρ, η, θ = params
    P, y_grid, B_grid, def_y, B0_idx = arrays

    current_utility = u(def_y[y_idx], γ)
    v = np.maximum(v_c[B0_idx, :], v_d)
    cont_value = np.sum((θ * v + (1 - θ) * v_d) * P[y_idx, :])

    return current_utility + β * cont_value
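@njit
def T_c(B_idx, y_idx, v_c, v_d, q, params, arrays):
    """
    The RHS of the Bellman equation when the country is not in a
    defaulted state on their debt. Returns a value that corresponds to
    v_c[B_idx, y_idx], as well as the optimal level of bond sales B'.
    """
    # Unpack
    β, γ, r, ρ, η, θ = params
    P, y_grid, B_grid, def_y, B0_idx = arrays
    B = B_grid[B_idx]
    y = y_grid[y_idx]

    # Step through choices of next period B' and keep the best one
    current_max = -1e10
    Bp_star_idx = 0
    for Bp_idx, Bp in enumerate(B_grid):
        c = y + B - q[Bp_idx, y_idx] * Bp
        if c > 0:
            v = np.maximum(v_c[Bp_idx, :], v_d)
            val = u(c, γ) + β * np.sum(v * P[y_idx, :])
            if val > current_max:
                current_max = val
                Bp_star_idx = Bp_idx
    return current_max, Bp_star_idx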
Here is a fast function that calls these operators in the right sequence.
@njit(parallel=True)
def update_values_and_prices(v_c, v_d,      # Current guess of value functions
                             B_star, q,     # Arrays to be written to
                             params, arrays):
    # Unpack
    β, γ, r, ρ, η, θ = params
    P, y_grid, B_grid, def_y, B0_idx = arrays
    y_grid_size = len(y_grid)
    B_grid_size = len(B_grid)

    # Allocate memory
    new_v_c = np.empty_like(v_c)
    new_v_d = np.empty_like(v_d)
We can now write a function that will use the Arellano_Economy class and the functions defined above to compute
the solution to our model.
We do not need to JIT compile this function since it only consists of outer loops (and JIT compiling makes almost zero
difference).
In fact, one of the jobs of this function is to take an instance of Arellano_Economy, which is hard for the JIT
compiler to handle, and strip it down to more basic objects, which are then passed out to jitted functions.
# Allocate memory
q = np.empty_like(v_c)
B_star = np.empty_like(v_c, dtype=int)

current_iter = 0
dist = np.inf
while (current_iter < max_iter) and (dist > tol):
    if current_iter % 100 == 0:
        print(f"Entering iteration {current_iter}.")
Finally, we write a function that will allow us to simulate the economy once we have the policy functions
"""
# Unpack elements of the model
B0_idx = model.B0_idx
y_grid = model.y_grid
B_grid, y_grid, P = model.B_grid, model.y_grid, model.P
# Perform simulation
t = 0
while t < T:
    # if in default:
    if v_c[B_idx, y_idx] < v_d[y_idx] or in_default:
        y_a_sim[t] = model.def_y[y_idx]
        d_sim[t] = 1
        Bp_idx = B0_idx
        # Re-enter financial markets next period with prob θ
        in_default = False if np.random.rand() < model.θ else True
    else:
        y_a_sim[t] = y_sim[t]
        d_sim[t] = 0
        Bp_idx = B_star[B_idx, y_idx]
13.5 Results
The grid used to compute this figure was relatively fine (y_grid_size, B_grid_size = 51, 251), which
explains the minor differences between this and Arellano’s figure.
The figure shows that
• Higher levels of debt (larger −𝐵′ ) induce larger discounts on the face value, which correspond to higher interest
rates.
• Lower income also causes more discounting, as foreign creditors anticipate greater likelihood of default.
The next figure plots value functions and replicates the right hand panel of Figure 4 of [Arellano, 2008].
We can use the results of the computation to study the default probability 𝛿(𝐵′ , 𝑦) defined in (13.4).
The next plot shows these default probabilities over (𝐵′ , 𝑦) as a heat map.
As anticipated, the probability that the government chooses to default in the following period increases with indebtedness
and falls with income.
Next let’s run a time series simulation of {𝑦𝑡 }, {𝐵𝑡 } and 𝑞(𝐵𝑡+1 , 𝑦𝑡 ).
The grey vertical bars correspond to periods when the economy is excluded from financial markets because of a past
default.
One notable feature of the simulated data is the nonlinear response of interest rates.
Periods of relative stability are followed by sharp spikes in the discount rate on government debt.
13.6 Exercises
Exercise 13.6.1
To the extent that you can, replicate the figures shown above
• Use the parameter values listed as defaults in Arellano_Economy.
• The time series will of course vary depending on the shock draws.
ae = Arellano_Economy()
Entering iteration 0.
# Create "Y High" and "Y Low" values as 5% devs from mean
high, low = np.mean(y_grid) * 1.05, np.mean(y_grid) * .95
iy_high, iy_low = (np.searchsorted(y_grid, x) for x in (high, low))
# Create figure
fig, ax = plt.subplots(figsize=(10, 6.5))
hm = ax.pcolormesh(xx, yy, zz.T)
cax = fig.add_axes([.92, .1, .02, .8])
fig.colorbar(hm, cax=cax)
ax.axis([xx.min(), 0.05, yy.min(), yy.max()])
ax.set(xlabel="$B'$", ylabel="$y$", title="Probability of Default")
plt.show()
T = 250
np.random.seed(42)
y_sim, y_a_sim, B_sim, q_sim, d_sim = simulate(ae, T, v_c, v_d, q, B_star)
plt.show()
FOURTEEN
14.1 Overview
In this lecture, we review the paper Globalization and Synchronization of Innovation Cycles by Kiminori Matsuyama,
Laura Gardini and Iryna Sushko.
This model helps us understand several interesting stylized facts about the world economy.
One of these is synchronized business cycles across different countries.
Most existing models that generate synchronized business cycles do so by assumption, since they tie output in each country
to a common shock.
They also fail to explain certain features of the data, such as the fact that the degree of synchronization tends to increase
with trade ties.
By contrast, in the model we consider in this lecture, synchronization is both endogenous and increasing with the extent
of trade integration.
In particular, as trade costs fall and international competition increases, innovation incentives become aligned and
countries synchronize their innovation cycles.
Let’s start with some imports:
import numpy as np
import matplotlib.pyplot as plt
from numba import jit
from ipywidgets import interact
14.1.1 Background
The model builds on work by Judd [Judd, 1985], Deneckere and Judd [Deneckere and Judd, 1992] and Helpman and
Krugman [Helpman and Krugman, 1985] by developing a two-country model with trade and innovation.
On the technical side, the paper introduces the concept of coupled oscillators to economic modeling.
As we will see, coupled oscillators arise endogenously within the model.
Below we review the model and replicate some of the results on synchronization of innovation across countries.
As discussed above, two countries produce and trade with each other.
In each country, firms innovate, producing new varieties of goods and, in doing so, receiving temporary monopoly power.
Imitators follow and, after one period of monopoly, what had previously been new varieties now enter competitive
production.
Firms have incentives to innovate and produce new goods when the mass of varieties of goods currently in production is
relatively low.
In addition, there are strategic complementarities in the timing of innovation.
Firms have incentives to innovate in the same period, so as to avoid competing with substitutes that are competitively
produced.
This leads to temporal clustering in innovations in each country.
After a burst of innovation, the mass of goods currently in production increases.
However, goods also become obsolete, so that not all survive from period to period.
This mechanism generates a cycle, where the mass of varieties increases through simultaneous innovation and then falls
through obsolescence.
14.2.2 Synchronization
In the absence of trade, the timing of innovation cycles in each country is decoupled.
This will be the case when trade costs are prohibitively high.
If trade costs fall, then goods produced in each country penetrate each other’s markets.
As illustrated below, this leads to synchronization of business cycles across the two countries.
14.3 Model
Here $X^o_{k,t}$ is a homogeneous input which can be produced from labor using a linear, one-for-one technology.
It is freely tradeable, competitively supplied, and homogeneous across countries.
By choosing the price of this good as numeraire and assuming both countries find it optimal to always produce the
homogeneous good, we can set 𝑤1,𝑡 = 𝑤2,𝑡 = 1.
The good 𝑋𝑘,𝑡 is a composite, built from many differentiated goods via
$$[X_{k,t}]^{1 - \frac{1}{\sigma}} = \int_{\Omega_t} [x_{k,t}(\nu)]^{1 - \frac{1}{\sigma}} \, d\nu$$
Here 𝑥𝑘,𝑡 (𝜈) is the total amount of a differentiated good 𝜈 ∈ Ω𝑡 that is produced.
The parameter 𝜎 > 1 is the direct partial elasticity of substitution between a pair of varieties and Ω𝑡 is the set of varieties
available in period 𝑡.
We can split the varieties into those which are supplied competitively and those supplied monopolistically; that is,
$\Omega_t = \Omega^c_t + \Omega^m_t$.
14.3.1 Prices
The price of a variety also depends on the origin, 𝑗, and destination, 𝑘, of the goods because shipping varieties between
countries incurs an iceberg trade cost 𝜏𝑗,𝑘 .
Thus the effective price in country 𝑘 of a variety 𝜈 produced in country 𝑗 becomes 𝑝𝑘,𝑡 (𝜈) = 𝜏𝑗,𝑘 𝑝𝑗,𝑡 (𝜈).
Using these expressions, we can derive the total demand for each variety, which is
where
$$A_{j,t} := \sum_k \frac{\rho_{j,k} L_k}{(P_{k,t})^{1-\sigma}} \quad \text{and} \quad \rho_{j,k} = (\tau_{j,k})^{1-\sigma} \leq 1$$
It is assumed that 𝜏1,1 = 𝜏2,2 = 1 and 𝜏1,2 = 𝜏2,1 = 𝜏 for some 𝜏 > 1, so that
Monopolists will have the same marked-up price, so, for all $\nu \in \Omega^m$,
$$p_{j,t}(\nu) = p^m_{j,t} := \frac{\psi}{1 - \frac{1}{\sigma}} \quad \text{and} \quad D_{j,t} = y^m_{j,t} := \alpha A_{j,t} (p^m_{j,t})^{-\sigma}$$
Define
$$\theta := \frac{p^c_{j,t} y^c_{j,t}}{p^m_{j,t} y^m_{j,t}} = \left(1 - \frac{1}{\sigma}\right)^{1-\sigma}$$
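The closed form for 𝜃 can be checked numerically. The sketch below assumes that competitively supplied varieties are priced at marginal cost 𝜓 and that demand for every variety is proportional to price raised to the power −𝜎, consistent with the demand expression for monopolists above. The function names and the parameter values are hypothetical.

```python
def monopoly_price(psi, sigma):
    """Marked-up price psi / (1 - 1/sigma)."""
    return psi / (1.0 - 1.0 / sigma)


def revenue_ratio(psi, sigma, alpha_A=1.0):
    """Ratio of competitive to monopolistic revenue, p_c * y_c / (p_m * y_m),
    with demand alpha_A * p**(-sigma) and the competitive price equal to psi."""
    p_c = psi
    p_m = monopoly_price(psi, sigma)
    y_c = alpha_A * p_c ** (-sigma)
    y_m = alpha_A * p_m ** (-sigma)
    return (p_c * y_c) / (p_m * y_m)


def theta(sigma):
    """Closed form (1 - 1/sigma)**(1 - sigma)."""
    return (1.0 - 1.0 / sigma) ** (1.0 - sigma)


psi, sigma = 1.3, 2.5
print(revenue_ratio(psi, sigma), theta(sigma))
```

The two printed numbers agree up to rounding, and the ratio does not depend on 𝜓 or on the demand shifter.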
Using the preceding definitions and some algebra, the price indices can now be rewritten as
$$\left(\frac{P_{k,t}}{\psi}\right)^{1-\sigma} = M_{k,t} + \rho M_{j,t} \quad \text{where} \quad M_{j,t} := N^c_{j,t} + \frac{N^m_{j,t}}{\theta}$$
The symbols $N^c_{j,t}$ and $N^m_{j,t}$ will denote the measures of $\Omega^c$ and $\Omega^m$ respectively.
To introduce a new variety, a firm must hire 𝑓 units of labor per variety in each country.
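Monopolist profits must be less than or equal to zero in expectation, so
$$N^m_{j,t} \geq 0, \quad \pi^m_{j,t} := (p^m_{j,t} - \psi) y^m_{j,t} - f \leq 0 \quad \text{and} \quad \pi^m_{j,t} N^m_{j,t} = 0$$
$$N^m_{j,t} = \theta (M_{j,t} - N^c_{j,t}) \geq 0, \quad \frac{1}{\sigma} \left[ \frac{\alpha L_j}{\theta (M_{j,t} + \rho M_{k,t})} + \frac{\alpha L_k}{\theta (M_{j,t} + M_{k,t} / \rho)} \right] \leq f$$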
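With 𝛿 as the exogenous probability that a variety survives to the next period (so that 1 − 𝛿 is the probability of
becoming obsolete), the dynamic equation for the measure of firms becomes
$$N^c_{j,t+1} = \delta (N^c_{j,t} + N^m_{j,t}) = \delta (N^c_{j,t} + \theta (M_{j,t} - N^c_{j,t}))$$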
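Here
$$\begin{aligned}
D_{LL} &:= \{(n_1, n_2) \in \mathbb{R}^2_+ \mid n_j \leq s_j(\rho)\} \\
D_{HH} &:= \{(n_1, n_2) \in \mathbb{R}^2_+ \mid n_j \geq h_j(n_k)\} \\
D_{HL} &:= \{(n_1, n_2) \in \mathbb{R}^2_+ \mid n_1 \geq s_1(\rho) \text{ and } n_2 \leq h_2(n_1)\} \\
D_{LH} &:= \{(n_1, n_2) \in \mathbb{R}^2_+ \mid n_1 \leq h_1(n_2) \text{ and } n_2 \geq s_2(\rho)\}
\end{aligned}$$
while
$$s_1(\rho) = 1 - s_2(\rho) = \min \left\{ \frac{s_1 - \rho s_2}{1 - \rho}, 1 \right\}$$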
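The mapping from trade costs to 𝜌 and then to 𝑠1(𝜌) can be sketched in a few lines. The function names and the parameter values below are hypothetical; the formulas are 𝜌 = 𝜏^(1−𝜎) and the expression for 𝑠1(𝜌) given above.

```python
def rho_from_tau(tau, sigma):
    """Trade-cost index rho = tau**(1 - sigma); tau = 1 gives rho = 1."""
    return tau ** (1.0 - sigma)


def s1_rho(s1, s2, rho):
    """s_1(rho) = min{(s1 - rho * s2) / (1 - rho), 1}, for rho in [0, 1)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    return min((s1 - rho * s2) / (1.0 - rho), 1.0)


sigma = 3.0
for tau in (1.2, 1.5, 3.0):
    rho = rho_from_tau(tau, sigma)
    print(f"tau={tau:.1f}  rho={rho:.3f}  s1(rho)={s1_rho(0.6, 0.4, rho):.3f}")
```

Lower trade costs push 𝜌 toward one, and for asymmetric countries the larger country's 𝑠1(𝜌) eventually hits the cap of one.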
14.4 Simulation
@jit(nopython=True)
def _hj(j, nk, s1, s2, θ, δ, ρ):
    """
    If we expand the implicit function for h_j(n_k) then we find that
    it is quadratic. We know that h_j(n_k) > 0 so we can get its
    value by using the quadratic form
    """
    # Find out whose h we are evaluating
    if j == 1:
        sj = s1
        sk = s2
    else:
        sj = s2
        sk = s1

    return root
@jit(nopython=True)
def DLL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
    "Determine whether (n1, n2) is in the set DLL"
    return (n1 <= s1_ρ) and (n2 <= s2_ρ)


@jit(nopython=True)
def DHH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
    "Determine whether (n1, n2) is in the set DHH"
    return (n1 >= _hj(1, n2, s1, s2, θ, δ, ρ)) and \
           (n2 >= _hj(2, n1, s1, s2, θ, δ, ρ))


@jit(nopython=True)
def DHL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
    "Determine whether (n1, n2) is in the set DHL"
    return (n1 >= s1_ρ) and (n2 <= _hj(2, n1, s1, s2, θ, δ, ρ))


@jit(nopython=True)
def DLH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
    "Determine whether (n1, n2) is in the set DLH"
    return (n1 <= _hj(1, n2, s1, s2, θ, δ, ρ)) and (n2 >= s2_ρ)
@jit(nopython=True)
def one_step(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
    """
    Takes a current value for (n_{1, t}, n_{2, t}) and returns the
    values (n_{1, t+1}, n_{2, t+1}) according to the law of motion.
    """
    # Depending on where we are, evaluate the right branch
    if DLL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
        n1_tp1 = δ * (θ * s1_ρ + (1 - θ) * n1)
        n2_tp1 = δ * (θ * s2_ρ + (1 - θ) * n2)
    elif DHH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
        n1_tp1 = δ * n1
        n2_tp1 = δ * n2
    elif DHL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
        n1_tp1 = δ * n1
        n2_tp1 = δ * (θ * _hj(2, n1, s1, s2, θ, δ, ρ) + (1 - θ) * n2)
    elif DLH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
        n1_tp1 = δ * (θ * _hj(1, n2, s1, s2, θ, δ, ρ) + (1 - θ) * n1)
        n2_tp1 = δ * n2

    return n1_tp1, n2_tp1
@jit(nopython=True)
def _pers_till_sync(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ, maxiter, npers):
    """
    Takes initial values and iterates forward to see whether
    the histories eventually end up in sync.

    If countries are symmetric then as soon as the two countries have the
    same measure of firms then they will be synchronized -- However, if
    they are not symmetric then it is possible they have the same measure
    of firms but are not yet synchronized. To address this, we check whether
    firms stay synchronized for `npers` periods with Euclidean norm

    Parameters
    ----------
    n1_0 : scalar(Float)
        Initial normalized measure of firms in country one
    n2_0 : scalar(Float)
        Initial normalized measure of firms in country two
    maxiter : scalar(Int)
        Maximum number of periods to simulate
    npers : scalar(Int)
        Number of periods we would like the countries to have the
        same measure for

    Returns
    -------
    synchronized : scalar(Bool)
        Did the two economies end up synchronized
    pers_2_sync : scalar(Int)
        The number of periods required until they synchronized
    """
    # Initialize the status of synchronization
    synchronized = False
    pers_2_sync = maxiter
    iters = 0

    # Initialize generator
    n_gen = n_generator(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ)
@jit(nopython=True)
def _create_attraction_basis(s1_ρ, s2_ρ, s1, s2, θ, δ, ρ,
                             maxiter, npers, npts):
    # Create unit range with npts
    synchronized, pers_2_sync = False, 0
    unit_range = np.linspace(0.0, 1.0, npts)

    return time_2_sync
class MSGSync:
    """
    The paper "Globalization and Synchronization of Innovation Cycles" presents
    a two-country model with endogenous innovation cycles. Combines elements
    from Deneckere and Judd (1992) and Helpman and Krugman (1985) to allow for a
    model with trade that has firms who can introduce new varieties into
    the economy.

    Parameters
    ----------
    s1 : scalar(Float)

    def _unpack_params(self):
        return self.s1, self.s2, self.θ, self.δ, self.ρ

    def _calc_s1_ρ(self):
        # Unpack params
        s1, s2, θ, δ, ρ = self._unpack_params()

        # s_1(ρ) = min(val, 1)
        val = (s1 - ρ * s2) / (1 - ρ)
        return min(val, 1)
Parameters
----------
n1_0 : scalar(Float)
Initial normalized measure of firms in country one
n2_0 : scalar(Float)
Initial normalized measure of firms in country two
T : scalar(Int)
Number of periods to simulate
Returns
-------
n1 : Array(Float64, ndim=1)
A history of normalized measures of firms in country one
n2 : Array(Float64, ndim=1)
A history of normalized measures of firms in country two
"""
# Unpack parameters
s1, s2, θ, δ, ρ = self._unpack_params()
s1_ρ, s2_ρ = self.s1_ρ, self.s2_ρ
# Allocate space
n1 = np.empty(T)
n2 = np.empty(T)
# Store in arrays
n1[t] = n1_tp1
n2[t] = n2_tp1
return n1, n2
If countries are symmetric then as soon as the two countries have the
same measure of firms then they will be synchronized -- However, if
they are not symmetric then it is possible they have the same measure
of firms but are not yet synchronized. To address this, we check whether
firms stay synchronized for `npers` periods with Euclidean norm
Parameters
----------
n1_0 : scalar(Float)
Initial normalized measure of firms in country one
n2_0 : scalar(Float)
Initial normalized measure of firms in country two
maxiter : scalar(Int)
Maximum number of periods to simulate
npers : scalar(Int)
Number of periods we would like the countries to have the
same measure for
Returns
-------
synchronized : scalar(Bool)
Did the two economies end up synchronized
pers_2_sync : scalar(Int)
The number of periods required until they synchronized
"""
# Unpack parameters
s1, s2, θ, δ, ρ = self._unpack_params()
s1_ρ, s2_ρ = self.s1_ρ, self.s2_ρ
return ab
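The synchronization test described in the docstrings above (the two measures must stay equal for `npers` consecutive periods) can be sketched in pure Python as follows. This is only an illustration: the function name `periods_until_sync`, the tolerance `tol`, and the convention of reporting the first period of the synchronized run are our assumptions, not the lecture's implementation.

```python
def periods_until_sync(n1, n2, npers, tol=1e-8):
    """
    Scan two equal-length histories and report whether they stay within
    tol of each other for npers consecutive periods.

    Returns (synchronized, pers_2_sync), where pers_2_sync is the first
    period of the synchronized run, or len(n1) if no such run exists.
    """
    run = 0
    for t, (a, b) in enumerate(zip(n1, n2)):
        if abs(a - b) < tol:
            run += 1
            if run == npers:
                return True, t - npers + 1
        else:
            run = 0   # a gap resets the count of consecutive matches
    return False, len(n1)


# Toy histories: the paths cross once at t = 1 but only lock in from t = 3
n1 = [0.30, 0.50, 0.40, 0.40, 0.40]
n2 = [0.60, 0.50, 0.45, 0.40, 0.40]
print(periods_until_sync(n1, n2, npers=2))
print(periods_until_sync(n1, n2, npers=3))
```

Requiring a run of matches rather than a single match guards against paths that merely cross, which is the concern raised for asymmetric countries.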
We write a short function below that exploits the preceding code and plots two time series.
Each time series gives the dynamics for the two countries.
The time series share parameters but differ in their initial condition.
Here’s the function
ax.legend()
ax.set(title=title, ylim=(0.15, 0.8))
return ax
# Create figure
fig, ax = plt.subplots(2, 1, figsize=(10, 8))
fig.tight_layout()
plt.show()
In the first case, innovation in the two countries does not synchronize.
In the second case, different initial conditions are chosen, and the cycles become synchronized.
Next, let’s study the initial conditions that lead to synchronized cycles more systematically.
We generate time series from a large collection of different initial conditions and mark those conditions with different
colors according to whether synchronization occurs or not.
The next display shows exactly this for four different parameterizations (one for each subfigure).
Dark colors indicate synchronization, while light colors indicate failure to synchronize.
As you can see, larger values of 𝜌 translate to more synchronization.
You are asked to replicate this figure in the exercises.
In the solution to the exercises, you’ll also find a figure with sliders, allowing you to experiment with different parameters.
Here’s one snapshot from the interactive figure
14.5 Exercises
Exercise 14.5.1
Replicate the figure shown above by coloring initial conditions according to whether or not synchronization occurs from
those conditions.
return ab, cf
Additionally, instead of just seeing 4 plots at once, we might want to change 𝜌 manually and see how it affects
the plot in real time. Below we use an interactive plot to do this.
Note, interactive plotting requires the ipywidgets module to be installed and enabled.
Note: This interactive plot is disabled on this static webpage. To use it, we recommend running this notebook
locally.
fig = interact(interact_attraction_basis,
ρ=(0.0, 1.0, 0.05),
maxiter=(50, 5000, 50),
npts=(25, 750, 25))
FIFTEEN
15.1 Overview
In 1937, Ronald Coase wrote a brilliant essay on the nature of the firm [Coase, 1937].
Coase was writing at a time when the Soviet Union was rising to become a significant industrial power.
At the same time, many free-market economies were afflicted by a severe and painful depression.
This contrast led to an intensive debate on the relative merits of decentralized, price-based allocation versus top-down
planning.
In the midst of this debate, Coase made an important observation: even in free-market economies, a great deal of
top-down planning does in fact take place.
This is because firms form an integral part of free-market economies and, within firms, allocation is by planning.
In other words, free-market economies blend both planning (within firms) and decentralized production coordinated by
prices.
The question Coase asked is this: if prices and free markets are so efficient, then why do firms even exist?
Couldn’t the associated within-firm planning be done more efficiently by the market?
We’ll use the following imports:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import fminbound
On top of asking a deep and fascinating question, Coase also supplied an illuminating answer: firms exist because of
transaction costs.
Here’s one example of a transaction cost:
Suppose agent A is considering setting up a small business and needs a web developer to construct and help run an online
store.
She can use the labor of agent B, a web developer, by writing up a freelance contract for these tasks and agreeing on a
suitable price.
But contracts like this can be time-consuming and difficult to verify:
• How will agent A be able to specify exactly what she wants, to the finest detail, when she herself isn’t sure how the
business will evolve?
• And what if she isn’t familiar with web technology? How can she specify all the relevant details?
• And, if things go badly, will failure to comply with the contract be verifiable in court?
In this situation, perhaps it will be easier to employ agent B under a simple labor contract.
The cost of this contract is far smaller because such contracts are simpler and more standard.
The basic agreement in a labor contract is: B will do what A asks him to do for the term of the contract, in return for a
given salary.
Making this agreement is much easier than trying to map every task out in advance in a contract that will hold up in a
court of law.
So agent A decides to hire agent B and a firm of nontrivial size appears, due to transaction costs.
15.1.2 A Trade-Off
15.1.3 Summary
15.2.1 Subcontracting
The subcontracting scheme by which tasks are allocated across firms is illustrated in the figure below
In this example,
• Firm 1 receives a contract to sell one unit of the completed good to a final buyer.
• Firm 1 then forms a contract with firm 2 to purchase the partially completed good at stage 𝑡1 , with the intention of
implementing the remaining 1 − 𝑡1 tasks in-house (i.e., processing from stage 𝑡1 to stage 1).
• Firm 2 repeats this procedure, forming a contract with firm 3 to purchase the good at stage 𝑡2 .
• Firm 3 decides to complete the chain, selecting 𝑡3 = 0.
At this point, production unfolds in the opposite direction (i.e., from upstream to downstream).
• Firm 3 completes processing stages from 𝑡3 = 0 up to 𝑡2 and transfers the good to firm 2.
• Firm 2 then processes from 𝑡2 up to 𝑡1 and transfers the good to firm 1,
• Firm 1 processes from 𝑡1 to 1 and delivers the completed good to the final buyer.
The length of the interval of stages (range of tasks) carried out by firm 𝑖 is denoted by ℓ𝑖 .
Each firm chooses only its upstream boundary, treating its downstream boundary as given.
The benefit of this formulation is that it implies a recursive structure for the decision problem for each firm.
In choosing how many processing stages to subcontract, each successive firm faces essentially the same decision problem
as the firm above it in the chain, with the only difference being that the decision space is a subinterval of the decision
space for the firm above.
We will exploit this recursive structure in our study of equilibrium.
15.2.2 Costs
15.3 Equilibrium
We assume that all firms are ex-ante identical and act as price takers.
As price takers, they face a price function 𝑝, which is a map from [0, 1] to ℝ+ , with 𝑝(𝑡) interpreted as the price of the
good at processing stage 𝑡.
There is a countable infinity of firms indexed by 𝑖 and no barriers to entry.
The cost of supplying the initial input (the good processed up to stage zero) is set to zero for simplicity.
Free entry and the infinite fringe of competitors rule out positive profits for incumbents, since any incumbent could be
replaced by a member of the competitive fringe filling the same role in the production chain.
Profits are never negative in equilibrium because firms can freely exit.
An equilibrium in this setting is an allocation of firms and a price function such that
1. all active firms in the chain make zero profits, including suppliers of raw materials
2. no firm in the production chain has an incentive to deviate, and
3. no inactive firms can enter and extract positive profits
In particular, 𝑡𝑖−1 is the downstream boundary of firm 𝑖 and 𝑡𝑖 is its upstream boundary.
As transaction costs are incurred only by the buyer, its profits are
$$\pi_i = p(t_{i-1}) - c(\ell_i) - \delta p(t_i)$$
Formally, an equilibrium requires that
1. 𝑝(0) = 0,
2. 𝜋𝑖 = 0 for all 𝑖, and
3. 𝑝(𝑠) − 𝑐(𝑠 − 𝑡) − 𝛿𝑝(𝑡) ≤ 0 for any pair 𝑠, 𝑡 with 0 ≤ 𝑡 ≤ 𝑠 ≤ 1.
The rationale behind these conditions was given in our informal definition of equilibrium above.
We have defined an equilibrium but does one exist? Is it unique? And, if so, how can we compute it?
To address these questions, we introduce the operator 𝑇 mapping a nonnegative function 𝑝 on [0, 1] to 𝑇 𝑝 via
$$T p(s) = \min_{t \leq s} \{ c(s - t) + \delta p(t) \} \quad \text{for all } s \in [0, 1] \tag{15.3}$$
By definition, 𝑡∗ (𝑠) is the cost-minimizing upstream boundary for a firm that is contracted to deliver the good at stage 𝑠
and faces the price function 𝑝∗ .
Since 𝑝∗ lies in 𝒫 and since 𝑐 is strictly convex, it follows that the right-hand side of (15.4) is continuous and strictly
convex in 𝑡.
Hence the minimizer 𝑡∗ (𝑠) exists and is uniquely defined.
We can use 𝑡∗ to construct an equilibrium allocation as follows:
Since firm 1 sells the completed good at stage 𝑠 = 1, its optimal upstream boundary is 𝑡∗ (1).
Hence firm 2’s optimal upstream boundary is 𝑡∗ (𝑡∗ (1)).
Continuing in this way produces the sequence {𝑡∗𝑖 } defined by
$$t^*_0 = 1 \quad \text{and} \quad t^*_i = t^*(t^*_{i-1}) \tag{15.5}$$
The sequence ends when a firm chooses to complete all remaining tasks.
We label this firm (and hence the number of firms in the chain) as
$$n^* := \inf \{ i \in \mathbb{N} : t^*_i = 0 \} \tag{15.6}$$
The task allocation corresponding to (15.5) is given by ℓ𝑖∗ ∶= 𝑡∗𝑖−1 − 𝑡∗𝑖 for all 𝑖.
In [Kikuchi et al., 2018] it is shown that
1. the value 𝑛∗ in (15.6) is well-defined and finite,
2. the allocation {ℓ𝑖∗ } is feasible, and
3. the price function 𝑝∗ and this allocation together form an equilibrium for the production chain.
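While the proofs are too long to repeat here, much of the insight can be obtained by observing that, as a fixed point of
𝑇 , the equilibrium price function must satisfy
$$p^*(s) = \min_{t \leq s} \{ c(s - t) + \delta p^*(t) \} = c(s - t^*(s)) + \delta p^*(t^*(s)) \quad \text{for all } s \tag{15.7}$$
From this equation, it is clear that profits are zero for all incumbent firms.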
We can develop some additional insights on the behavior of firms by examining marginal conditions associated with the
equilibrium.
As a first step, let ℓ∗ (𝑠) ∶= 𝑠 − 𝑡∗ (𝑠).
This is the cost-minimizing range of in-house tasks for a firm with downstream boundary 𝑠.
In [Kikuchi et al., 2018] it is shown that 𝑡∗ and ℓ∗ are increasing and continuous, while 𝑝∗ is continuously differentiable
at all 𝑠 ∈ (0, 1) with
$$(p^*)'(s) = c'(\ell^*(s)) \tag{15.8}$$
Equation (15.8) follows from 𝑝∗ (𝑠) = min𝑡≤𝑠 {𝑐(𝑠 − 𝑡) + 𝛿𝑝∗ (𝑡)} and the envelope theorem for derivatives.
A related equation is the first order condition for 𝑝∗ (𝑠) = min𝑡≤𝑠 {𝑐(𝑠 − 𝑡) + 𝛿𝑝∗ (𝑡)}, the minimization problem for a
firm with downstream boundary 𝑠, which is
$$\delta (p^*)'(t^*(s)) = c'(s - t^*(s)) \tag{15.9}$$
This condition matches the marginal condition expressed verbally by Coase that we stated above:
“A firm will tend to expand until the costs of organizing an extra transaction within the firm become equal
to the costs of carrying out the same transaction by means of an exchange on the open market…”
Combining (15.8) and (15.9) and evaluating at 𝑠 = 𝑡𝑖 , we see that active firms that are adjacent satisfy
$$\delta \, c'(\ell^*_{i+1}) = c'(\ell^*_i) \tag{15.10}$$
In other words, the marginal in-house cost per task at a given firm is equal to that of its upstream partner multiplied by
gross transaction cost.
This expression can be thought of as a Coase–Euler equation, which determines inter-firm efficiency by indicating how
two costly forms of coordination (markets and management) are jointly minimized in equilibrium.
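As a worked example of (15.10), suppose the cost function is 𝑐(ℓ) = exp(𝑎ℓ) − 1 with 𝑎 = 10, which is the default cost function in the implementation later in this lecture. Then 𝑐′(ℓ) = 𝑎 exp(𝑎ℓ), and (15.10) reduces to ℓ𝑖+1 = ℓ𝑖 − ln(𝛿)/𝑎, so each upstream firm is smaller than its downstream partner by a constant amount. The sketch below checks this; the function names and the starting value of ℓ are hypothetical, and the relation applies only to adjacent firms whose choices are interior.

```python
import math


def c_prime(ell, a=10.0):
    """Derivative of c(ell) = exp(a * ell) - 1."""
    return a * math.exp(a * ell)


def upstream_range(ell, delta, a=10.0):
    """Range of tasks of the upstream partner implied by
    delta * c'(ell_next) = c'(ell) under the exponential cost function."""
    return ell - math.log(delta) / a


delta = 1.05
ell_i = 0.10
ell_next = upstream_range(ell_i, delta)
print(ell_next)
print(delta * c_prime(ell_next), c_prime(ell_i))
```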
15.5 Implementation
For most specifications of primitives, there is no closed-form solution for the equilibrium as far as we are aware.
However, we know that we can compute the equilibrium corresponding to a given transaction cost parameter 𝛿 and a cost
function 𝑐 by applying the results stated above.
In particular, we can
1. fix initial condition 𝑝 ∈ 𝒫,
2. iterate with 𝑇 until 𝑇 𝑛 𝑝 has converged to 𝑝∗ , and
3. recover firm choices via the choice function (15.3)
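The three steps above can be sketched in pure Python as follows. This is a simplified illustration rather than the lecture's implementation: instead of interpolation and a continuous minimizer, it restricts the upstream boundary 𝑡 to grid points, and the names `solve_prices` and `optimal_boundary` as well as the grid size are our assumptions.

```python
import math


def solve_prices(delta, c, n=101, tol=1e-10, max_iter=1000):
    """Iterate p -> T p on an evenly spaced grid over [0, 1], from p = c."""
    grid = [i / (n - 1) for i in range(n)]
    p = [c(s) for s in grid]                      # step 1: initial condition
    for _ in range(max_iter):                     # step 2: iterate with T
        new_p = []
        for i, s in enumerate(grid):
            # T p(s): minimize in-house cost plus cost of the purchased
            # input over upstream boundaries t = grid[j] with t <= s
            new_p.append(min(c(s - grid[j]) + delta * p[j]
                             for j in range(i + 1)))
        error = max(abs(a - b) for a, b in zip(p, new_p))
        p = new_p
        if error < tol:
            break
    return grid, p


def optimal_boundary(grid, p, i, delta, c):
    """Step 3: the grid point t <= grid[i] attaining the minimum at grid[i]."""
    j_best = min(range(i + 1),
                 key=lambda j: c(grid[i] - grid[j]) + delta * p[j])
    return grid[j_best]


delta = 1.05
c = lambda t: math.exp(10 * t) - 1
grid, p_star = solve_prices(delta, c)
t1 = optimal_boundary(grid, p_star, len(grid) - 1, delta, c)
print(f"price of the final good: {p_star[-1]:.3f}")
print(f"upstream boundary of firm 1: {t1:.2f}")
```

Restricting choices to grid points keeps the sketch dependency-free, at the cost of some accuracy relative to interpolation with a continuous minimizer.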
At each iterate, we will use continuous piecewise linear interpolation of functions.
To begin, here’s a class to store primitives and a grid:
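class ProductionChain:

    def __init__(self,
                 n=1000,
                 delta=1.05,
                 c=lambda t: np.exp(10 * t) - 1):
        # Store the primitives
        self.n, self.delta, self.c = n, delta, c
        # Grid of stages; the lower endpoint 1e-04 is an assumption
        self.grid = np.linspace(1e-04, 1, n)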
* pc is an instance of ProductionChain
* The initial condition is p = c
"""
delta, c, n, grid = pc.delta, pc.c, pc.n, pc.grid
p = c(grid) # Initial condition is c(s), as an array
new_p = np.empty_like(p)
error = tol + 1
i = 0
if i < max_iter:
    print(f"Iteration converged in {i} steps")
else:
    print(f"Warning: iteration hit upper bound {max_iter}")
The next function computes the optimal choice of upstream boundary and range of tasks implemented for a firm facing price
function p_function and with downstream boundary 𝑠.
"""
delta, c = pc.delta, pc.c
f = lambda t: delta * p_function(t) + c(s - t)
t_star = max(fminbound(f, -1, s), 0)
ell_star = s - t_star
return t_star, ell_star
The allocation of firms can be computed by recursively stepping through firms’ choices of their respective upstream
boundary, treating the previous firm’s upstream boundary as their own downstream boundary.
In doing so, we start with firm 1, who has downstream boundary 𝑠 = 1.
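The recursion just described can be sketched as follows. The function below takes any choice function mapping a downstream boundary to an upstream boundary; both the name `compute_stages_sketch` and the toy choice rule used to exercise it are hypothetical and stand in for the optimal choices computed from 𝑝∗.

```python
def compute_stages_sketch(t_star, s0=1.0, max_firms=10_000):
    """
    Step through the chain: firm 1 has downstream boundary s0, and each
    firm's upstream boundary becomes the next firm's downstream boundary.
    Returns the list of boundaries, ending when a boundary reaches 0.
    """
    stages = [s0]
    s = s0
    for _ in range(max_firms):
        if s <= 0:
            break
        s = t_star(s)
        stages.append(s)
    return stages


# Toy choice rule: every firm performs at most 0.3 of the tasks in-house
toy_t_star = lambda s: max(s - 0.3, 0.0)

stages = compute_stages_sketch(toy_t_star)
print(stages)
print("number of firms:", len(stages) - 1)
```

In this sketch the number of firms equals the number of intervals between consecutive boundaries.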
pc = ProductionChain()
p_star = compute_prices(pc)
fig, ax = plt.subplots()
ax.plot(pc.grid, p_star(pc.grid))
ax.set_xlim(0.0, 1.0)
ax.set_ylim(0.0)
for s in transaction_stages:
    ax.axvline(x=s, c="0.5")
plt.show()
Here’s the function ℓ∗ , which shows how large a firm with downstream boundary 𝑠 chooses to be
ell_star = np.empty(pc.n)
for i, s in enumerate(pc.grid):
    t, e = optimal_choices(pc, p_star, s)
    ell_star[i] = e
fig, ax = plt.subplots()
ax.plot(pc.grid, ell_star, label=r"$\ell^*$")
ax.legend(fontsize=14)
plt.show()
15.6 Exercises
Exercise 15.6.1
The number of firms is endogenously determined by the primitives.
What do you think will happen in terms of the number of firms as 𝛿 increases? Why?
Check your intuition by computing the number of firms at delta in (1.01, 1.05, 1.1).
for delta in (1.01, 1.05, 1.1):
    pc = ProductionChain(delta=delta)
    p_star = compute_prices(pc)
    transaction_stages = compute_stages(pc, p_star)
    num_firms = len(transaction_stages)
    print(f"When delta={delta} there are {num_firms} firms")
Exercise 15.6.2
The value added of firm 𝑖 is 𝑣𝑖 ∶= 𝑝∗ (𝑡𝑖−1 ) − 𝑝∗ (𝑡𝑖 ).
One of the interesting predictions of the model is that value added is increasing with downstreamness, as are several other
measures of firm size.
Can you give any intuition?
Try to verify this phenomenon (value added increasing with downstreamness) using the code above.
pc = ProductionChain()
p_star = compute_prices(pc)
stages = compute_stages(pc, p_star)
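# Value added v_i = p*(t_{i-1}) - p*(t_i); this assumes that `stages` lists
# the boundaries t_0 = 1, t_1, ... from downstream to upstream
va = [p_star(stages[i]) - p_star(stages[i + 1])
      for i in range(len(stages) - 1)]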
fig, ax = plt.subplots()
ax.plot(va, label="value added by firm")
ax.set_xticks((5, 25))
ax.set_xticklabels(("downstream firms", "upstream firms"))
plt.show()
SIXTEEN
COMPOSITE SORTING
16.1 Overview
Optimal transport theory studies how one (marginal) probability measure can be related to another (marginal) probability
measure in an ideal way.
The output of such a theory is a coupling of the two probability measures, i.e., a joint probability measure having those
two marginal probability measures.
This lecture describes how Job Boerma, Aleh Tsyvinski, Ruodu Wang, and Zhenyuan Zhang [Boerma et al., 2024] used
optimal transport theory to formulate and solve an equilibrium of a model in which wages and allocations of workers
across jobs adjust to match measures of different types of workers with measures of different types of occupations.
Production technologies allow firms to affect the shape of costs of mismatch, with the consequence that costs of mismatch can
be concave.
That means that in equilibrium there may be neither positive assortative nor negative assortative matching, an
outcome that [Boerma et al., 2024] call composite assortative matching.
For example, in an equilibrium with composite matching, identical workers can sort into different occupations, some
positively and some negatively.
[Boerma et al., 2024] show how this can generate distinct distributions of labor earnings within and across occupations.
This lecture describes the [Boerma et al., 2024] model and presents Python code for computing equilibria.
The lecture applies the code to the [Boerma et al., 2024] model of labor markets.
As with an earlier QuantEcon lecture on optimal transport, a key tool will be linear programming.
16.2 Setup
𝑋 and 𝑌 are finite sets that represent two distinct types of people to be matched.
For each 𝑥 ∈ 𝑋, let a positive integer 𝑛𝑥 be the number of agents of type 𝑥.
Similarly, let a positive integer 𝑚𝑦 be the number of agents of type 𝑦 ∈ 𝑌 .
We refer to these two measures as marginals.
We assume that
$$
\sum_{x \in X} n_x = \sum_{y \in Y} m_y =: N
$$
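and consider the optimal transport problem

$$
\begin{aligned}
\min_{\mu \geq 0} \ & \sum_{(x,y) \in X \times Y} \mu_{xy} c_{xy} \\
\text{s.t. } & \sum_{y \in Y} \mu_{xy} = n_x \quad \text{for all } x \in X \\
& \sum_{x \in X} \mu_{xy} = m_y \quad \text{for all } y \in Y
\end{aligned}
$$

where $c_{xy}$ is the cost of matching an agent of type 𝑥 with an agent of type 𝑦.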
Given our discreteness assumptions about 𝑋 and 𝑌 , the problem admits an integer solution $\mu \in \mathbb{Z}_+^{X \times Y}$, i.e., 𝜇𝑥𝑦 is a
non-negative integer for each 𝑥 ∈ 𝑋, 𝑦 ∈ 𝑌 .
We will study integer solutions.
Two points about restricting ourselves to integer solutions are worth mentioning:
• it is without loss of generality for computational purposes, since every problem with float marginals can be
transformed into an equivalent problem with integer marginals;
• although the mathematical structure that we present actually works for arbitrary real marginals, some of our Python
implementations would fail to work with float arithmetic.
We focus on a specific instance of an optimal transport problem:
We assume that 𝑋 and 𝑌 are finite subsets of ℝ and that the cost function satisfies 𝑐𝑥𝑦 = ℎ(|𝑥 − 𝑦|) for all 𝑥, 𝑦 ∈ ℝ, for
an ℎ ∶ ℝ+ → ℝ+ that is strictly concave and strictly increasing and grounded (i.e., ℎ(0) = 0).
Such an ℎ satisfies the following
Lemma. If ℎ ∶ ℝ+ → ℝ+ is strictly concave and grounded, then ℎ is strictly subadditive, i.e., for all 𝑥, 𝑦 ∈ ℝ+ with 0 < 𝑥 < 𝑦
we have

$$
h(x + y) < h(x) + h(y)
$$

Proof. For 𝛼 ∈ (0, 1) and 𝑥 > 0 we have, by strict concavity and groundedness, $h(\alpha x) > \alpha h(x) + (1 - \alpha) h(0) = \alpha h(x)$.
Now fix 𝑥, 𝑦 ∈ ℝ+ with 0 < 𝑥 < 𝑦 and let $\alpha = \frac{x}{x+y}$; the previous observation gives $h(x) = h(\alpha (x + y)) > \alpha h(x + y)$ and
$h(y) = h((1 - \alpha)(x + y)) > (1 - \alpha) h(x + y)$; summing these inequalities delivers the result. □
In the following implementation we assume that the cost function is $c_{xy} = |x - y|^{1/\zeta}$ for $\zeta > 1$, i.e., $h(z) = z^{1/\zeta}$ for
$z \in \mathbb{R}_+$.
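Hence, our problem is

$$
\begin{aligned}
\min_{\mu \in \mathbb{Z}_+^{X \times Y}} \ & \sum_{(x,y) \in X \times Y} \mu_{xy} |x - y|^{1/\zeta} \\
\text{s.t. } & \sum_{y \in Y} \mu_{xy} = n_x \quad \text{for all } x \in X \\
& \sum_{x \in X} \mu_{xy} = m_y \quad \text{for all } y \in Y
\end{aligned}
$$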
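As a rough illustration of the problem just stated, the following sketch hands a tiny instance to a linear programming solver. The helper name `solve_ot_lp` and the example types are assumptions made for illustration only; they are not part of the lecture's class.

```python
import numpy as np
from scipy.optimize import linprog

def solve_ot_lp(X, Y, n, m, ζ=2.0):
    """Minimize sum_xy mu_xy |x - y|^(1/ζ) subject to the two marginal constraints."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    nx, ny = len(X), len(Y)
    # Cost matrix with (x, y)-entry |x - y|^(1/ζ)
    c = np.abs(X[:, None] - Y[None, :]) ** (1 / ζ)
    # mu is flattened row by row, so entry (x, y) sits at position x * ny + y
    A_rows = np.kron(np.eye(nx), np.ones((1, ny)))   # sum over y of mu_xy = n_x
    A_cols = np.kron(np.ones((1, nx)), np.eye(ny))   # sum over x of mu_xy = m_y
    res = linprog(c.ravel(),
                  A_eq=np.vstack([A_rows, A_cols]),
                  b_eq=np.concatenate([n, m]),
                  bounds=(0, None), method="highs")
    return res.x.reshape(nx, ny), res.fun

# Hypothetical instance: two types per side, one agent of each type
mu, value = solve_ot_lp([0.0, 1.0], [0.9, 2.0], [1, 1], [1, 1])
print(np.round(mu), value)
```

In this instance the solver matches the first 𝑋 type with the second 𝑌 type and vice versa, because the concave cost rewards one very short displacement.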
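import numpy as np
from scipy.optimize import linprog
from itertools import chain
import pandas as pd
from collections import namedtuple
# Plotting imports used by the figures in this lecture
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib import cm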
The following Python class takes as inputs sets of types 𝑋, 𝑌 ⊂ ℝ, marginals 𝑛, 𝑚 with positive integer entries such that
∑𝑥∈𝑋 𝑛𝑥 = ∑𝑦∈𝑌 𝑚𝑦 and cost parameter 𝜁 > 1.
The cost function is stored as an |𝑋| × |𝑌 | matrix with (𝑥, 𝑦)-entry equal to $|x - y|^{1/\zeta}$, i.e., the cost of matching an agent
of type 𝑥 ∈ 𝑋 with an agent of type 𝑦 ∈ 𝑌 .
class ConcaveCostOT():
    def __init__(self, X_types=None, Y_types=None, n_x=None, m_y=None, ζ=2):
        # Sets of types
        self.X_types, self.Y_types = X_types, Y_types
        # Marginals
        if X_types is not None and Y_types is not None:
            non_empty_types = True
            self.n_x = np.ones(len(X_types), dtype=int) if n_x is None else n_x
            self.m_y = np.ones(len(Y_types), dtype=int) if m_y is None else m_y
        else:
            non_empty_types = False
            self.n_x, self.m_y = n_x, m_y
Let’s consider a random instance with given numbers of types |𝑋| and |𝑌 | and a given number of agents.
First, we generate random types 𝑋 and 𝑌 .
Then we generate random quantities for each type so that there are 𝑁 agents for each side.
number_of_x_types = 20
number_of_y_types = 20
N_agents_per_side = 60
np.random.seed(1)
# generate types
X_types_example = np.random.choice(random_support,
ConcaveCostOT.assign_random_marginals = assign_random_marginals
We use 𝐹 (resp. 𝐺) to denote the cumulative distribution function associated with the measure 𝑛 (resp. 𝑚).
Thus, $F(z) = \sum_{x \leq z \,:\, n_x > 0} n_x$ and $G(z) = \sum_{y \leq z \,:\, m_y > 0} m_y$ for $z \in \mathbb{R}$.
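The two cumulative distribution functions can be tabulated directly from the types and their masses. The sketch below uses only the standard library; `make_cdf` and the example marginals are hypothetical names and values chosen for illustration.

```python
from bisect import bisect_right
from itertools import accumulate

def make_cdf(types, masses):
    """Return the function z -> total mass of the types that are <= z."""
    pairs = sorted(zip(types, masses))
    support = [t for t, _ in pairs]
    cumulative = list(accumulate(q for _, q in pairs))

    def cdf(z):
        k = bisect_right(support, z)          # number of support points <= z
        return cumulative[k - 1] if k > 0 else 0

    return cdf

# Hypothetical marginals with N = 3 agents on each side
F = make_cdf([0.0, 3.0], [2, 1])
G = make_cdf([1.0, 2.0, 4.0], [1, 1, 1])
print([F(z) - G(z) for z in (0.0, 1.0, 2.0, 3.0, 4.0)])
```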
plt.figure(figsize=figsize)
ConcaveCostOT.plot_marginals = plot_marginals
example_pb.plot_marginals()
We can verify that reassigning the minimum of such quantities to the pairs (𝑧, 𝑧) and (𝑥, 𝑦) improves upon the current
matching since

$$
h(|x - y|) \leq h(|x - z| + |z - y|) < h(|x - z|) + h(|z - y|)
$$

where the first inequality follows from the triangle inequality and the fact that ℎ is increasing, and the strict inequality follows from
strict subadditivity.
We can then repeat the operation for any other analogous pair of matches involving 𝑧, while improving the value, until
we have mass min{𝑛𝑧 , 𝑚𝑧 } on match (𝑧, 𝑧).
Viewing the matching 𝜇 as a measure on 𝑋 × 𝑌 with marginals 𝑛 and 𝑚, this property says that in any optimal 𝜇 we
have 𝜇𝑧𝑧 = 𝑛𝑧 ∧ 𝑚𝑧 for (𝑧, 𝑧) in the diagonal {(𝑥, 𝑦) ∈ 𝑋 × 𝑌 ∶ 𝑥 = 𝑦} of ℝ × ℝ.
The following method finds perfect pairs and returns the on-diagonal matchings as well as the residual off-diagonal
marginals.
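The idea of matching perfect pairs first can be sketched with plain dictionaries. This is a simplified stand-in for the method below, not its implementation; the function name and the example data are assumptions.

```python
def match_perfect_pairs_sketch(X, n, Y, m):
    """Put mass min(n_z, m_z) on (z, z) for every type z present on both sides."""
    n_res, m_res = dict(zip(X, n)), dict(zip(Y, m))
    diagonal = {}
    for z in set(X) & set(Y):
        mass = min(n_res[z], m_res[z])
        diagonal[z] = mass
        n_res[z] -= mass
        m_res[z] -= mass
    # Keep only the types that still carry mass (the off-diagonal residuals)
    n_res = {x: q for x, q in n_res.items() if q > 0}
    m_res = {y: q for y, q in m_res.items() if q > 0}
    return diagonal, n_res, m_res

# Hypothetical instance: types 2 and 3 appear on both sides
diagonal, n_res, m_res = match_perfect_pairs_sketch(
    [1, 2, 3], [2, 1, 1], [2, 3, 5], [1, 2, 1])
print(diagonal, n_res, m_res)
```

After this step the residual supports are disjoint and the residual masses on the two sides still add up to the same total.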
def match_perfect_pairs(self):
    m_y_off_diag = self.m_y.copy()
    m_y_off_diag[perfect_pairs_y] -= Δ_q
ConcaveCostOT.match_perfect_pairs = match_perfect_pairs
On-diagonal matches: 15
Residual types in X: 14
Residual types in Y: 16
We can therefore create a new instance with the residual marginals that will feature no perfect pairs.
Later we shall add the on-diagonal matching to the solution of this new instance.
We refer to this instance as “off-diagonal” since the product measure of the residual marginals 𝑛 ⊗ 𝑚 features zero mass
on the diagonal of ℝ × ℝ.
In the rest of this section, we will focus on this instance.
We create a subclass to study the residual off-diagonal problem.
The subclass inherits the attributes and the methods from the original class.
We let 𝑍 ∶= 𝑋 ⊔ 𝑌 , where ⊔ denotes the union of disjoint sets. We will
• index types 𝑋 as {0, … , |𝑋| − 1} and types 𝑌 as {|𝑋|, … , |𝑋| + |𝑌 | − 1};
• store the cost function as a |𝑍| × |𝑍| matrix with entry (𝑧, 𝑧 ′ ) equal to 𝑐𝑥𝑦 if 𝑧 = 𝑥 ∈ 𝑋 and 𝑧 ′ = 𝑦 ∈ 𝑌 or
𝑧 = 𝑦 ∈ 𝑌 and 𝑧 ′ = 𝑥 ∈ 𝑋 or equal to +∞ if 𝑧 and 𝑧 ′ belong to the same side
– (the latter is just customary, since these “infinitely penalized” entries are actually never accessed in the
implementation);
• let 𝑞 be a vector of size |𝑍| whose 𝑧-th entry equals 𝑛𝑥 if type 𝑥 is the 𝑧-th smallest type in 𝑍 and −𝑚𝑦 if type 𝑦
is the 𝑧-th smallest type in 𝑍; hence 𝑞 encodes capacities of both sides on the (ascending) sorted set of types.
Finally, we add a method to flexibly add a pair (𝑖, 𝑗) with 𝑖 ∈ {0, … , |𝑋| − 1}, 𝑗 ∈ {|𝑋|, … , |𝑋| + |𝑌 | − 1} or
𝑗 ∈ {0, … , |𝑋| − 1}, 𝑖 ∈ {|𝑋|, … , |𝑋| + |𝑌 | − 1} to a matching matrix of size |𝑋| × |𝑌 |.
class OffDiagonal(ConcaveCostOT):
    def __init__(self, X_types, Y_types, n_x, m_y, ζ):
        super().__init__(X_types, Y_types, n_x, m_y, ζ)
        # Types (unsorted)
        self.types_list = np.concatenate((X_types,Y_types))
        # upper-right block
        self.cost_z_z[:len(self.X_types), len(self.X_types):] = self.cost_x_y
        # lower-left block
        self.cost_z_z[len(self.X_types):, :len(self.X_types)] = self.cost_x_y.T
        ## Distributions of types
        # sorted types and index identifier for each z in support
        self.type_z = np.argsort(self.types_list)
        self.support_z = self.types_list[self.type_z]
We add a function that returns an instance of the off-diagonal subclass as well as the on-diagonal matching and the indices
of the residual off-diagonal types.
These indices will come in handy for adding the off-diagonal matching matrix to the diagonal matching matrix we just found,
since the former will have a smaller size if there are perfect pairs in the original problem.
def generate_offD_onD_matching(self):
# Match perfect pairs and compute on-diagonal matching
n_x_off_diag, m_y_off_diag , matching_diag = self.match_perfect_pairs()
ConcaveCostOT.generate_offD_onD_matching = generate_offD_onD_matching
example_off_diag, _ = example_pb.generate_offD_onD_matching()
Let’s plot the residual marginals to verify visually that there are no overlaps between types from distinct sides in the
off-diagonal instance.
|𝑥 − 𝑦′ | + |𝑥′ − 𝑦| = |𝑥 − 𝑦| + |𝑥′ − 𝑦′ |
Letting $\alpha := \frac{|x - y'| - |x' - y'|}{|x - y| - |x' - y'|} \in (0, 1)$, we have $|x - y'| = \alpha |x - y| + (1 - \alpha) |x' - y'|$ and $|x' - y| = (1 - \alpha) |x - y| + \alpha |x' - y'|$.
Hence, by strict concavity of ℎ,

$$
h(|x - y'|) + h(|x' - y|) > \alpha h(|x - y|) + (1 - \alpha) h(|x' - y'|) + (1 - \alpha) h(|x - y|) + \alpha h(|x' - y'|) = h(|x - y|) + h(|x' - y'|)
$$
Therefore, as in the first case, we can strictly improve the cost among 𝑥, 𝑦, 𝑥′ , 𝑦′ by uncrossing the pairs.
Finally, it remains to argue that in both cases uncrossing operations do not increase the number of intersections with other
matched pairs.
It can indeed be shown on a case-by-case basis that, in both of the above cases, for any other matched pair (𝑥″ , 𝑦″ ) the
number of intersections between pairs (𝑥, 𝑦), (𝑥′ , 𝑦′ ) and the pair (𝑥″ , 𝑦″ ) (i.e., after uncrossing) is not larger than the
number of intersections between pairs (𝑥, 𝑦′ ), (𝑥′ , 𝑦) and the pair (𝑥″ , 𝑦″ ) (i.e., before uncrossing), hence the uncrossing
operations above reduce the number of intersections.
We conclude that if a matching features intersecting pairs, it can be modified via a sequence of uncrossing operations
into a matching without intersecting pairs while improving on the value.
(Layering) Recall that there are 2𝑁 individual agents, each agent 𝑖 having type 𝑧𝑖 ∈ 𝑋 ⊔ 𝑌 .
We use the disjoint-union notation ⊔, introduced with the off-diagonal matching, to stress that the sets of types are now disjoint.
To simplify our explanation of this property, assume for now that each agent has its own distinct type (i.e., |𝑋| = |𝑌 | = 𝑁
and 𝑛 = 𝑚 = 1𝑁 ), in which case the optimal transport problem is also referred to as assignment problem.
Let’s index agents according to their types:

$$
z_1 < z_2 < \dots < z_{2N}
$$
Suppose that agents 𝑖 of type 𝑧𝑖 and 𝑗 of type 𝑧𝑗 , with 𝑧𝑖 < 𝑧𝑗 , are matched in a particular optimal solution.
Then there is an equal number of agents from each side in {𝑖 + 1, … , 𝑗 − 1}, if this set is not empty.
Indeed, if this were not the case, then some agent 𝑘 ∈ {𝑖 + 1, … , 𝑗 − 1} would be matched with some agent ℓ with
ℓ ∉ {𝑖, … , 𝑗}, i.e., there would be types

$$
z_i < z_k < z_j < z_\ell \quad \text{or} \quad z_\ell < z_i < z_k < z_j
$$

with matches (𝑧𝑖 , 𝑧𝑗 ) and (𝑧𝑘 , 𝑧ℓ ), violating the no intersecting pairs property.
We conclude that we can define a binary relation on [𝑁 ] such that 𝑖 ∼ 𝑗 if there is an equal number of agents of each
side in {𝑖, 𝑖 + 1, … , 𝑗} (or if this set is empty).
This is an equivalence relation, so we can find associated equivalence classes that we call layers.
By the reasoning above, in an optimal solution all pairs 𝑖, 𝑗 (of opposite sides) which are matched belong to the same
layer, hence we can solve the assignment problem associated to each layer and then add up the solutions.
In terms of distributions, 𝑖 and 𝑗, of types 𝑥 ∈ 𝑋 and 𝑦 ∈ 𝑌 respectively, belong to the same layer (i.e., 𝑥 ∼ 𝑦) if and
only if 𝐹 (𝑦−) − 𝐹 (𝑥) = 𝐺(𝑦−) − 𝐺(𝑥).
If 𝐹 and 𝐺 were continuous, then 𝐹 (𝑦) − 𝐹 (𝑥) = 𝐺(𝑦) − 𝐺(𝑥) ⟺ 𝐹 (𝑥) − 𝐺(𝑥) = 𝐹 (𝑦) − 𝐺(𝑦).
This suggests that the following quantity plays an important role:

$$
H(z) := F(z) - G(z), \quad z \in \mathbb{R}
$$
# Plot H(z)
plt.figure(figsize=figsize)
plt.axhline(0, color='black', linewidth=1)
OffDiagonal.plot_H_z = plot_H_z
example_off_diag.plot_H_z()
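The difference 𝐹 − 𝐺 can be tabulated on the sorted joint support by cumulating signed masses, positive for types in 𝑋 and negative for types in 𝑌. The following is a minimal sketch that assumes the two supports are disjoint; `H_values` and the example masses are hypothetical.

```python
from itertools import accumulate

def H_values(n_res, m_res):
    """Return the sorted joint support and F - G evaluated at each support point.

    n_res, m_res: dicts mapping types to masses, with disjoint sets of keys.
    """
    # Signed masses: +n_x for types in X, -m_y for types in Y
    q = {x: mass for x, mass in n_res.items()}
    q.update({y: -mass for y, mass in m_res.items()})
    support = sorted(q)
    return support, list(accumulate(q[z] for z in support))

# Hypothetical off-diagonal instance
support, H = H_values({0.0: 2, 3.0: 1}, {1.0: 1, 2.0: 1, 4.0: 1})
print(support, H)
```

Since both sides carry the same total mass, the last entry is always zero.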
Moreover, each layer 𝐿ℓ contains an even number of types 𝑁ℓ ∈ 2ℕ, which are alternating, i.e., ordering them as
𝑧1 < 𝑧2 ⋯ < 𝑧𝑁ℓ −1 < 𝑧𝑁ℓ all odd (or even, respectively) indexed types belong to the same side.
The following method finds the layers associated with distributions 𝐹 and 𝐺.
Again, types in 𝑋 are indexed with {0, … , |𝑋| − 1} and types in 𝑌 with {|𝑋|, … , |𝑋| + |𝑌 | − 1}.
Using these indices (instead of the types themselves) to represent the layers lets us keep track of the side of each type in a
layer without storing an extra bit of information identifying the side of the first type in the layer (which,
because a layer is alternating, would pin down the sides of all types in the layer).
In addition, using indices will let us extract the cost function within a layer from the cost function $c_{zz'}$ computed beforehand.
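To make the notion of a layer concrete, here is a small sketch that decomposes an off-diagonal instance into layers. It assumes that a type belongs to the layer between two consecutive values taken by 𝐹 − 𝐺 exactly when the jump at that type spans that interval; the function name, the representation of a layer as a `(mass, indices)` pair, and the data are illustrative choices.

```python
def find_layers_sketch(q):
    """q[k] is the signed mass of the k-th smallest type (+ for X, - for Y).

    Returns a list of (mass, indices) pairs, one per layer, where indices
    refer to positions in the sorted joint support.
    """
    H, jumps = 0, []
    for mass in q:
        # Interval swept by the running sum when it jumps at this type
        jumps.append((min(H, H + mass), max(H, H + mass)))
        H += mass
    levels = sorted({h for jump in jumps for h in jump})
    layers = []
    for low, high in zip(levels[:-1], levels[1:]):
        members = [k for k, (a, b) in enumerate(jumps) if a <= low and high <= b]
        layers.append((high - low, members))
    return layers

# Hypothetical instance: sorted types 0, 1, 2, 3, 4 with sides X, Y, Y, X, Y
layers = find_layers_sketch([2, -1, -1, 1, -1])
print(layers)
```

Each type's mass is split across the layers it belongs to, and the sides alternate within each layer.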
def find_layers(self):
# Compute H(z) on the joint support
H_z = np.concatenate([[0], np.cumsum(self.q_z)])
# Compute layers
# the following |H(R)|x|Z| matrix has entry (z,l) equal to 1 iff type z belongs to layer l
OffDiagonal.find_layers = find_layers
[array([23, 10]),
 array([27, 3, 23, 10]),
 array([16, 2, 21, 3, 25, 8, 23, 12]),
 array([16, 2, 21, 3, 25, 12]),
 array([22, 0, 16, 2, 21, 3, 18, 12]),
 array([15, 0, 16, 2, 14, 5, 21, 3, 18, 9]),
 array([20, 0, 16, 2, 14, 5, 21, 3, 19, 11, 24, 1, 18, 9]),
 array([ 2, 26, 5, 21, 3, 19, 4, 18]),
 array([ 2, 26, 7, 21, 3, 19, 4, 17, 6, 18]),
 array([13, 26, 7, 21, 3, 19, 6, 18]),
 array([ 6, 18]),
 array([ 6, 28]),
 array([ 6, 29])]
plt.figure(figsize=figsize)
# Plot H(z)
# np.ptp(a) is used instead of a.ptp(), which NumPy 2.0 removed
step = np.concatenate(([self.support_z.min() - .05 * np.ptp(self.support_z)],
                       self.support_z,
                       [self.support_z.max() + .05 * np.ptp(self.support_z)]))
height = np.concatenate((H_z, [0]))
plt.step(step, height, where='post', color='black', label='CDF', zorder=1)
# Plot layers
colors = cm.viridis(np.linspace(0, 1, len(layers)))
for ell, layer in enumerate(layers):
    plt.vlines(self.types_list[layer], layers_height[ell],
               layers_height[ell] + layers_mass[ell],
               color=colors[ell], linewidth=2)
    plt.scatter(self.types_list[layer],
                np.ones(len(layer)) * layers_height[ell]
                + .5 * layers_mass[ell],
                color=colors[ell], s=50)
    plt.axhline(layers_height[ell], color=colors[ell],
                linestyle=':', linewidth=1.5, zorder=0)
OffDiagonal.plot_layers = plot_layers
example_off_diag.plot_layers()
which is alternating.
The problem within a layer is unitary.
Hence we can solve the problem with unit masses and later rescale the solution by the layer’s mass 𝑀ℓ .
Let us select a layer from the example above (we pick the one with maximum number of types) and plot the types on the
real line
plt.figure(figsize=figsize)
ConcaveCostOT.plot_layer_types = plot_layer_types
example_off_diag.plot_layer_types(layer_example,
layers_mass_example[layer_id_example])
Given the structure of a layer and the no intersecting pairs property, the optimal matching and value of the layer can be
found recursively.
Indeed, if in certain optimal matching 1 and 𝑗 ∈ [𝑁ℓ ], 𝑗 − 1 odd, are paired, then there is no matching between agents
in [2, 𝑗 − 1] and those in [𝑗 + 1, 𝑁ℓ ] (if both are non empty, i.e., 𝑗 is not 2 or 𝑁ℓ ).
Hence in such optimal solution agents in [2, 𝑗 − 1] are matched among themselves.
Since [𝑧2 , 𝑧𝑗−1 ] (as well as [𝑧𝑗+1 , 𝑧𝑁ℓ ]) is alternating, we can reason recursively.
Let 𝑉𝑖𝑗 be the optimal value of matching agents in [𝑖, 𝑗] with 𝑖, 𝑗 ∈ [𝑁ℓ ], 𝑗 − 𝑖 ∈ {1, 3, … , 𝑁ℓ − 1}.
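Suppose that we computed the value 𝑉𝑖𝑗 for all 𝑖, 𝑗 ∈ [𝑁ℓ ] with 𝑗 − 𝑖 ∈ {1, 3, … , 𝑡 − 2} for some odd natural number 𝑡.
Then, for 𝑖, 𝑗 ∈ [𝑁ℓ ] with 𝑗 − 𝑖 = 𝑡 we have

$$
V_{ij} = \min_{k \in \{i+1, i+3, \dots, j\}} \left\{ c_{ik} + V_{i+1,k-1} + V_{k+1,j} \right\}
$$

with the convention that the value of an empty segment is zero, i.e., $V_{i+1,i} = 0$.
The following method takes as input the layer types indices and computes the value function as a matrix
$[V_{ij}]_{i \in [N_\ell + 1],\, j \in [N_\ell]}$.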
In order to distinguish entries that are relevant for our computations from those that are never accessed, we initialize this
matrix as full of NaN values.
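The recursive reasoning above can be sketched independently of the class. The version below assumes the recursion in which the first agent of a segment is matched with a partner 𝑘 and the two remaining sub-segments are solved separately; it uses memoization instead of a NaN-initialized matrix, and the function name and example layer are hypothetical.

```python
from functools import lru_cache

def layer_value(z, ζ=2.0):
    """Unscaled optimal cost of an alternating layer with sorted types z (0-based)."""

    def cost(i, j):
        return abs(z[i] - z[j]) ** (1 / ζ)

    @lru_cache(maxsize=None)
    def V(i, j):
        # An empty segment [i, i-1] has value zero
        if j < i:
            return 0.0
        # Match i with k = i+1, i+3, ..., j; this leaves [i+1, k-1] and [k+1, j]
        return min(cost(i, k) + V(i + 1, k - 1) + V(k + 1, j)
                   for k in range(i + 1, j + 1, 2))

    return V(0, len(z) - 1)

# Hypothetical layer with sides X, Y, X, Y
value = layer_value([0.0, 0.9, 1.0, 2.0])
print(value)
```

Here the nested matching, pairing the two outer types and the two inner types, is cheaper than pairing neighbours.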
def solve_bellman_eqs(self,layer):
# Recover cost function within the layer
cost_i_j = self.cost_z_z[layer[:,None],layer[None,:]]
t = 1
while t < len(layer):
# Select agents i in [n_L-t] (with potential partners j's in [t,n_L])
i_t = np.arange(len(layer)-t)
return V_i_j
OffDiagonal.solve_bellman_eqs = solve_bellman_eqs
Having computed the value function, we can proceed to compute the optimal matching as the policy that attains the value
function that solves the Bellman equation (policy evaluation).
We start from agent 1 and match it with the 𝑘 that achieves the minimum in the equation associated with $V_{1,N_\ell}$.
Then we store segments [2, 𝑘 − 1] and $[k + 1, N_\ell]$ (if not empty).
In general, given a segment [𝑖, 𝑗], we match 𝑖 with 𝑘 that achieves the minimum in the equation associated with 𝑉𝑖𝑗 and
store the segments [𝑖 + 1, 𝑘 − 1] and [𝑘 + 1, 𝑗] (if not empty).
The algorithm proceeds until there are no segments left.
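The segment-by-segment procedure can be sketched as follows. This is a self-contained illustration with 0-based indices rather than the lecture's method; it recomputes the values with a small memoized recursion, and all names and example data are assumptions.

```python
def layer_matching(z, ζ=2.0):
    """Return the optimal pairs (i, k) of an alternating layer with sorted types z."""

    def cost(i, j):
        return abs(z[i] - z[j]) ** (1 / ζ)

    V = {}

    def value(i, j):
        if j < i:                      # empty segment
            return 0.0
        if (i, j) not in V:
            V[i, j] = min(cost(i, k) + value(i + 1, k - 1) + value(k + 1, j)
                          for k in range(i + 1, j + 1, 2))
        return V[i, j]

    matching, segments = [], [(0, len(z) - 1)]
    while segments:
        i, j = segments.pop()
        if j < i:
            continue
        # Partner of i: the k attaining the minimum in the equation for V_ij
        k = min(range(i + 1, j + 1, 2),
                key=lambda k: cost(i, k) + value(i + 1, k - 1) + value(k + 1, j))
        matching.append((i, k))
        segments += [(i + 1, k - 1), (k + 1, j)]
    return sorted(matching)

nested = layer_matching([0.0, 0.9, 1.0, 2.0])
adjacent = layer_matching([0.0, 0.1, 1.0, 2.0])
print(nested, adjacent)
```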
while segments_to_process:
# Pick i, first agent of the segment
# and potential partners i+1,i+3,..., in the segment
segment = segments_to_process[0]
i_0 = segment[0]
potential_matches = np.arange(i_0, segment[-1], 2) + 1
return matching
OffDiagonal.find_layer_matching = find_layer_matching
Let’s apply this method to our example to find the matching within the layer and then rescale it by 𝑀ℓ .
Note that the unscaled value equals 𝑉1,𝑁ℓ .
matching_layer = example_off_diag.find_layer_matching(V_i_j,layer_example)
print(f"Value of the layer (unscaled): {(matching_layer * example_off_diag.cost_x_y).sum()}")
plt.show()
ConcaveCostOT.plot_layer_matching = plot_layer_matching
example_off_diag.plot_layer_matching(layer_example, matching_layer)
We now present two key results in the context of OT with concave type costs.
We refer to [Boerma et al., 2024] and [Delon et al., 2011] for proofs.
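Consider the problem faced within a layer, i.e., types from 𝑌 ⊔ 𝑋 ordered as $z_1 < z_2 < \dots < z_{N_\ell}$.
The values satisfy

$$
V_{ij} = \min \left\{ c_{ij} + V_{i+1,j-1}, \; V_{i+2,j} + V_{i,j-2} - V_{i+2,j-2} \right\}
$$

for 𝑖, 𝑗 ∈ [𝑁ℓ ], 𝑗 − 𝑖 odd, with boundary conditions $V_{i+1,i} = 0$ for $i \in [0, N_\ell]$ and $V_{i+2,i-1} = -c_{i,i+1}$ for $i \in [N_\ell - 1]$.
The following method uses these equations to compute the value function that is stored as a matrix $[V_{ij}]_{i \in [N_\ell + 1],\, j \in [N_\ell + 1]}$.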
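As a cross-check, the sketch below computes the value of a layer in two ways: with the segment-splitting recursion and with a two-branch recursion whose left branch is $c_{ij} + V_{i+1,j-1}$. The form of the right branch, $V_{i,j-2} + V_{i+2,j} - V_{i+2,j-2}$, is an assumption of this sketch, chosen to be consistent with the boundary conditions stated above. Names and data are hypothetical.

```python
from functools import lru_cache

def layer_values(z, ζ=2.0):
    """Return (value from segment splitting, value from the two-branch recursion)."""
    N = len(z)

    def cost(i, j):
        return abs(z[i] - z[j]) ** (1 / ζ)

    @lru_cache(maxsize=None)
    def split(i, j):
        if j < i:                                   # empty segment
            return 0.0
        return min(cost(i, k) + split(i + 1, k - 1) + split(k + 1, j)
                   for k in range(i + 1, j + 1, 2))

    V = {}

    def get(i, j):
        return V[i, j] if j > i else 0.0            # empty segments are worth zero

    for t in range(1, N, 2):                        # t = j - i = 1, 3, 5, ...
        for i in range(N - t):
            j = i + t
            if t == 1:
                V[i, j] = cost(i, j)
            else:
                left = cost(i, j) + get(i + 1, j - 1)
                right = get(i, j - 2) + get(i + 2, j) - get(i + 2, j - 2)
                V[i, j] = min(left, right)
    return split(0, N - 1), V[0, N - 1]

# Hypothetical alternating layer with six types
v_split, v_two_branch = layer_values([0.0, 0.4, 1.0, 2.5, 2.6, 4.0])
print(v_split, v_two_branch)
```

The two-branch recursion touches a constant number of entries per pair (𝑖, 𝑗), whereas the segment-splitting recursion scans all candidate partners.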
def solve_bellman_eqs_DSS(self,layer):
# Recover cost function within the layer
cost_i_j = self.cost_z_z[layer[:,None],layer[None,:]]
t = 1
while t < len(layer):
# Select agents i in [n_l-t] and potential partner j=i+t for each i
i_t = np.arange(len(layer)-t)
j_t = i_t + t +1
return V_i_j
OffDiagonal.solve_bellman_eqs_DSS = solve_bellman_eqs_DSS
Let’s apply the algorithm to our example and compare outcomes with those attained with the Bellman equations above.
V_i_j_DSS = example_off_diag.solve_bellman_eqs_DSS(layer_example)
print('##########################')
print(f"Difference with previous Bellman equations: \
{(V_i_j_DSS[:,1:] - V_i_j)[V_i_j >= 0].sum()}")
We can actually compute the optimal matching within the layer simultaneously with computing the value function, rather
than sequentially.
The key idea is that, if at some step of the computation of the values the left branch of the minimum above achieves the
minimum, say 𝑉𝑖𝑗 = 𝑐𝑖𝑗 + 𝑉𝑖+1,𝑗−1 , then (𝑖, 𝑗) are optimally matched on [𝑖, 𝑗] and by the theorem above we get that a
matching on [𝑖 + 1, 𝑗 − 1] which achieves 𝑉𝑖+1,𝑗−1 belongs to an optimal matching on the whole layer (since it is covered
by the arc (𝑖, 𝑗) in [𝑖, 𝑗]).
We can therefore proceed as follows
We initialize an empty matching and a list with all the agents in the layer (representing the agents which are not matched
yet).
Then whenever the left branch of the minimum is achieved for some (𝑖, 𝑗) in the computation of 𝑉 , we take the collections
of agents 𝑘1 , … , 𝑘𝑀 in [𝑖 + 1, 𝑗 − 1] (in ascending order, i.e. with 𝑧𝑘𝑝 < 𝑧𝑘𝑝+1 ) that are not matched yet (if any) and
add to the matching the pairs (𝑘1 , 𝑘2 ), (𝑘3 , 𝑘4 ), … , (𝑘𝑀−1 , 𝑘𝑀 ).
Thus, we match each unmatched agent 𝑘𝑝 in [𝑖 + 1, 𝑗 − 1] with the closest unmatched right neighbour 𝑘𝑝+1 (starting from
𝑘1 ).
Intuitively, if 𝑘𝑝 were optimally matched with some 𝑘𝑞 in [𝑖 + 1, 𝑗 − 1] and not with 𝑘𝑝+1 , then 𝑘𝑝+1 would have already
been hidden by the match (𝑘𝑝 , 𝑘𝑞 ) from some previous computation (because |𝑘𝑝 − 𝑘𝑞 | < |𝑖 − 𝑗|) and it would therefore
be matched.
Finally, if the process above leaves some unmatched agents, we proceed by matching each of these agents with the closest
unmatched right neighbour, starting again from the leftmost of this collection.
To gain understanding, note that this situation happens when the left branch is achieved only for pairs 𝑖, 𝑗 with |𝑖 − 𝑗| = 1,
which leads to the optimal matching $(1, 2), (3, 4), \dots, (N_\ell - 1, N_\ell)$.
def find_layer_matching_DSS(self,layer):
    # Recover cost function within the layer
    cost_i_j = self.cost_z_z[layer[:,None],layer[None,:]]
    t = 1
    while t < len(layer):
        # Compute optimal value for pairs with |i-j| = t
        i_t = np.arange(len(layer)-t)
        j_t = i_t + t + 1
        # Select each i for which left branch achieves minimum in the V_{i,i+t}
        left_branch_achieved = i_t[left_branch == V_i_j[i_t, j_t]]
        # Update matching
        for i in left_branch_achieved:
            # for each agent k in [i+1,i+t-1]
            for k in np.arange(i+1,i+t)[unmatched[range(i+1,i+t)]]:
                # if k is unmatched
                if unmatched[k]:
                    # find unmatched right neighbour
                    j_k = np.arange(k+1,len(layer))[unmatched[k+1:]][0]
                    # add pair to matching
                    self.add_pair_to_matching(layer[[k, j_k]], matching)
                    # remove pair from unmatched agents list
                    unmatched[[k, j_k]] = False
    return matching
OffDiagonal.find_layer_matching_DSS = find_layer_matching_DSS
matching_layer_DSS = example_off_diag.find_layer_matching_DSS(layer_example)
print(f" Value of layer with DSS recursive equations \
{(matching_layer_DSS * example_off_diag.cost_x_y).sum()}")
print(f" Value of layer with Bellman equations \
{(matching_layer * example_off_diag.cost_x_y).sum()}")
example_off_diag.plot_layer_matching(layer_example, matching_layer_DSS)
The following method assembles our components in order to solve the primal problem.
First, if there are perfect pairs, we store the on-diagonal matching and create an off-diagonal instance with the residual
marginals.
Then we compute the set of layers of the residual distributions.
Finally, we solve each layer and put together matchings within each layer with the on-diagonal matchings.
We then return the full matching, the off-diagonal matching, and the off-diagonal instance.
def solve_primal_pb(self):
    # Compute on-diagonal matching, create new instance with residual types
    off_diagoff_diagonal, match_tuple = self.generate_offD_onD_matching()
    nonzero_id_x, nonzero_id_y, matching_diag = match_tuple
    # Compute layers
ConcaveCostOT.solve_primal_pb = solve_primal_pb
def solve_primal_DSS(self):
    # Compute on-diagonal matching, create new instance with residual types
    off_diagoff_diagonal, match_tuple = self.generate_offD_onD_matching()
    nonzero_id_x, nonzero_id_y, matching_diag = match_tuple
    # Find layers
    layers, layers_mass, _, _ = off_diagoff_diagonal.find_layers()
ConcaveCostOT.solve_primal_DSS = solve_primal_DSS
DSS_tuple = example_pb.solve_primal_DSS()
matching_DSS, matching_off_diag_DSS, off_diagoff_diagonal_DSS = DSS_tuple
By drawing semicircles joining the matched agents (with distinct types), we can visualize the off-diagonal matching.
In the following figure, widths and colors of semicircles indicate relative numbers of agents that are “transported” along
an arc.
# Add labels
for i, x in enumerate(self.X_types):
    ax.annotate(f'$x_{{{i }}}$', (x, 0), textcoords="offset points",
                xytext=(0, -15), ha='center', color='blue', fontsize=12)
for j, y in enumerate(self.Y_types):
    ax.annotate(f'$y_{{{j }}}$', (y, 0), textcoords="offset points",
                xytext=(0, -15), ha='center', color='red', fontsize=12)
count = matching_off_diag[matched_types]
colors = plt.cm.Greys(np.linspace(0.5, 1.5, count.max() + 1))
max_height = 0
# k indexes matched pairs (renamed from `iter`, which shadows a builtin)
for k in range(len(count)):
    width = abs(matched_types_x[k] - matched_types_y[k])
    center = (matched_types_x[k] + matched_types_y[k]) / 2
    height = width
    max_height = max(max_height, height)
    semicircle = patches.Arc((center, 0), width, height,
                             theta1=0, theta2=180,
                             color=colors[count[k]],
                             lw=count[k] * (2.2 / count.max()))
    ax.add_patch(semicircle)
# np.ptp(a) is used instead of a.ptp(), which NumPy 2.0 removed
step = np.concatenate(([self.support_z.min()
                        - .02 * np.ptp(self.support_z)],
                       self.support_z,
                       [self.support_z.max()
                        + .02 * np.ptp(self.support_z)]))
# Set the y-limit to keep H_z and maximum circle size in the plot
ax.set_ylim(np.min(H_z) - np.ptp(H_z) *.01,
            np.maximum(np.max(H_z), max_height / 2) + np.ptp(H_z) *.01)
plt.show()
ConcaveCostOT.plot_matching = plot_matching
off_diagoff_diagonal.plot_matching(matching_off_diag,
title='Optimal Matching (off-diagonal)', plot_H_z=True)
off_diagoff_diagonal_DSS.plot_matching(matching_off_diag_DSS,
title='Optimal Matching (off-diagonal) with DSS algorithm')
# Constraint matrix
M_z_a = np.vstack([np.kron(np.eye(n), np.ones(m)),
np.kron(np.ones(n), np.eye(m))])
# Constraint vector
q = np.concatenate((n_x, m_y))
if return_dual:
return (np.round(result.x).astype(int).reshape([n, m]),
result.eqlin.marginals)
else:
return np.round(result.x).astype(int).reshape([n, m])
mu_x_y_LP = solve_1to1(example_pb.cost_x_y,
example_pb.n_x,
example_pb.m_y)
print(f"Value of LP (scipy): {(mu_x_y_LP * example_pb.cost_x_y).sum()}")
print(f"Value (plain Bellman equations): {(matching * example_pb.cost_x_y).sum()}")
print(f"Value (DSS): {(matching_DSS * example_pb.cost_x_y).sum()}")
16.5 Examples
16.5.1 Example 1
We study optimal transport problems on the real line with cost 𝑐(𝑥, 𝑦) = ℎ(|𝑥 − 𝑦|) for a strictly concave and increasing
function ℎ ∶ ℝ+ → ℝ+ .
The outcome is called composite sorting.
Here, we will focus on $c(x, y) = |x - y|^{1/\zeta}$ for $\zeta > 1$.
To appreciate differences with positive assortative matching (PAM) note that the latter is induced by a cost of the form
ℎ(𝑥 − 𝑦) for some strictly convex ℎ ∶ ℝ → ℝ+ .
See Santambrogio 2015, Ch. 2.2.
For example, the cost function $|x - y|^p$, $p > 1$, induces PAM.
On the other hand, negative assortative matching (NAM) arises if 𝑐(𝑥, 𝑦) = ℎ(𝑥 − 𝑦) with ℎ ∶ ℝ → ℝ+ strictly concave.
For example, the cost function $-|x - y|^p$, $p > 1$, induces NAM.
Thus, NAM corresponds to a matching that maximizes a transport problem criterion with gain function $g(x, y) = |x - y|^p$.
Note how PAM and NAM differ from composite sorting.
Composite sorting is induced by a cost that is the composition of a strictly concave increasing function ℎ and a convex
function | ⋅ | applied to displacement 𝑥 − 𝑦.
Different functions ℎ potentially induce different matchings.
The following example shows that composite matching can feature both positive and negative assortative patterns.
Suppose that there are two agents per side and types

$$
x_0 < y_0 < x_1 < y_1
$$
There are two feasible matchings, one corresponding to PAM, the other to NAM.
• The first features two medium-sized displacements |𝑥0 − 𝑦0 |, |𝑥1 − 𝑦1 |.
• The second features a large displacement |𝑥0 − 𝑦1 | and a small displacement |𝑥1 − 𝑦0 |.
Evidently,
• PAM corresponds to the matching with two medium-sized displacements because the corresponding cost is strictly
convex and increasing in the displacement.
• NAM corresponds to the matching with a small displacement and a large displacement because the gain is strictly
convex and increasing in the displacement.
In this example, composite sorting ends up coinciding with NAM, but this is something of a coincidence.
• Note that in composite matching the cost function is strictly concave and increasing in the displacement.
N = 2
p = 2
ζ = 2
To explore the coincidental resemblance to a NAM outcome, let’s shift left type 𝑦0 while keeping it in between 𝑥0 and
𝑥1 .
PAM and NAM are invariant to any such shift.
However, for a large enough shift, composite sorting now coincides with PAM.
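The effect of the shift can be checked by hand on a two-by-two instance. The sketch below compares the cost of the two feasible matchings under a concave cost with 𝜁 = 2 and a convex cost with 𝑝 = 2; the numerical types are hypothetical and are not the ones used in the lecture's example.

```python
def cheapest_matching(x0, y0, x1, y1, h):
    """Compare the two feasible matchings when x0 < y0 < x1 < y1 under cost h(|x - y|)."""
    pam = h(abs(x0 - y0)) + h(abs(x1 - y1))   # x0-y0 and x1-y1
    nam = h(abs(x0 - y1)) + h(abs(x1 - y0))   # x0-y1 and x1-y0
    return "PAM" if pam < nam else "NAM"

def concave(d):
    return d ** (1 / 2)      # composite sorting cost with ζ = 2

def convex(d):
    return d ** 2            # cost that induces PAM, p = 2

# y0 close to x1: composite sorting picks the NAM pattern
print(cheapest_matching(0.0, 0.9, 1.0, 2.0, concave))
# y0 shifted left towards x0: composite sorting switches to the PAM pattern
print(cheapest_matching(0.0, 0.1, 1.0, 2.0, concave))
# the convex cost selects PAM in both configurations
print(cheapest_matching(0.0, 0.9, 1.0, 2.0, convex),
      cheapest_matching(0.0, 0.1, 1.0, 2.0, convex))
```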
N = 2
ζ = 2
p = 2
Finally, notice that the Monge problem cost function |𝑥 − 𝑦| equals the limit of the composite sorting cost $|x - y|^{1/\zeta}$ as
𝜁 ↓ 1 and also the limit of $|x - y|^p$ as 𝑝 ↓ 1.
Evidently, the Monge problem is solved by both the PAM and the composite sorting assignment that arises for 𝜁 ↓ 1.
In the following example, the Monge cost of the composite sorting assignment equals the Monge cost of PAM.
Consequently, it is optimal for the Monge problem.
N = 10
ζ = 1.01
p = 2
np.random.seed(1)
X_types = np.random.uniform(0,10, size=N)
Y_types = np.random.uniform(0,10, size=N)
matching_CS, _ ,_ = example_1.solve_primal_DSS()
example_1.plot_matching(matching_CS,
title=f'Composite Sorting: $|x-y|^{{1/{ζ}}}$', figsize=(5,5))
example_1.plot_matching(matching_PAM, title = 'PAM', figsize=(5,5))
16.5.2 Example 2
N = 5
ζ = 2
p = 2
matching_CS, _ ,_ = example_2.solve_primal_DSS()
16.5.3 Example 3
X_types_example_3 = np.array([0,5,9])
Y_types_example_3 = np.array([1,6,10])
n_x_example_3 = np.array([2,1,1], dtype= int)
m_y_example_3 = np.array([1,1,2], dtype= int)
In the case of positive assortative matching (PAM), the two agents with lowest value 𝑥0 are matched with the lowest
valued agents on the other side 𝑦0 , 𝑦1 .
Similarly, the agents with highest value 𝑦2 are matched with the highest valued types on the other side, 𝑥1 and 𝑥2 .
Composite sorting features both negative and positive sorting patterns: agents of type 𝑥0 are matched with both the
bottom 𝑦0 and the top 𝑦2 of the distribution.
matching_CS, _ ,_ = example_3.solve_primal_DSS()
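$$
\begin{aligned}
V_P = \min_{\mu \geq 0} \ & \sum_{(x,y) \in X \times Y} \mu_{xy} c_{xy} \\
\text{s.t. } & \sum_{y \in Y} \mu_{xy} = n_x \quad \text{for all } x \in X \\
& \sum_{x \in X} \mu_{xy} = m_y \quad \text{for all } y \in Y
\end{aligned}
$$

$$
\begin{aligned}
V_D = \max_{\phi, \psi} \ & \sum_{x \in X} n_x \phi_x + \sum_{y \in Y} m_y \psi_y \\
\text{s.t. } & \phi_x + \psi_y \leq c_{xy} \quad \text{for all } (x, y) \in X \times Y
\end{aligned}
$$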
where (𝜙, 𝜓) are dual variables, which can be interpreted as shadow cost of agents in 𝑋 and 𝑌 , respectively.
Since the dual is feasible and bounded, 𝑉𝑃 = 𝑉𝐷 (strong duality prevails).
Assume now that 𝑦𝑥𝑦 = 𝛼𝑥 + 𝛾𝑦 − 𝑐𝑥𝑦 is the output generated by matching 𝑥 and 𝑦.
It includes the sum of 𝑥 and 𝑦 specific amenities/outputs minus the cost 𝑐𝑥𝑦 .
Then we can formulate the following problem and its dual
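$$
\begin{aligned}
W_P = \max_{\mu \geq 0} \ & \sum_{(x,y) \in X \times Y} \mu_{xy} y_{xy} \\
\text{s.t. } & \sum_{y \in Y} \mu_{xy} = n_x \quad \text{for all } x \in X \\
& \sum_{x \in X} \mu_{xy} = m_y \quad \text{for all } y \in Y
\end{aligned}
$$

$$
\begin{aligned}
W_D = \min_{u, v} \ & \sum_{x \in X} n_x u_x + \sum_{y \in Y} m_y v_y \\
\text{s.t. } & u_x + v_y \geq y_{xy} \quad \text{for all } (x, y) \in X \times Y
\end{aligned}
$$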
Given the constraints, the primal problem 𝑊𝑃 does not depend on 𝛼, 𝛾 and it has the same solutions as the cost
minimization problem 𝑉𝑃 .
The values are related by 𝑊𝑃 = ∑𝑥∈𝑋 𝑛𝑥 𝛼𝑥 + ∑𝑦∈𝑌 𝑚𝑦 𝛾𝑦 − 𝑉𝑃 .
The dual solutions of 𝑉𝐷 and 𝑊𝐷 are related by 𝑢𝑥 = 𝛼𝑥 − 𝜙𝑥 and 𝑣𝑦 = 𝛾𝑦 − 𝜓𝑦 .
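The relation between the two primal values can be checked by brute force on a tiny assignment problem with unit marginals. The cost matrix and the amenity vectors below are made-up numbers used only for illustration.

```python
from itertools import permutations

def assignment_values(c, alpha, gamma):
    """Brute-force V_P and W_P when n = m = 1, so matchings are permutations."""
    N = len(c)
    V_P = min(sum(c[x][s[x]] for x in range(N))
              for s in permutations(range(N)))
    W_P = max(sum(alpha[x] + gamma[s[x]] - c[x][s[x]] for x in range(N))
              for s in permutations(range(N)))
    return V_P, W_P

# Hypothetical data: 2 types per side
c = [[1.0, 3.0], [2.0, 0.5]]
alpha, gamma = [0.3, 0.7], [1.0, 2.0]
V_P, W_P = assignment_values(c, alpha, gamma)
print(V_P, W_P, sum(alpha) + sum(gamma) - V_P)
```

Because every feasible matching uses each agent exactly once, the amenity terms add the same constant to every matching, which is why the optimal matchings coincide.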
The dual solution (𝑢, 𝑣) of 𝑊𝐷 can be interpreted as equilibrium utilities of the agents, which include the individual
specific amenities and equilibrium shadow costs.
[Boerma et al., 2024] propose an efficient method to compute the dual variables from the optimal matching (primal
solution) in the case of composite sorting.
Let’s generate an instance and compute the optimal matching.
num_agents = 8
np.random.seed(1)
# Plot matching
add_labels = True if num_agents < 16 else False
exam_assign_OD.plot_matching(assignment_OD, title = f'Composite Sorting',
figsize=(10,10), add_labels=add_labels)
Having computed the optimal matching, we say that a pair (𝑥0 , 𝑦0 ) is a subpair of a matched pair (𝑥, 𝑦) if 𝑥0 , 𝑦0 are in
the open interval between 𝑥 and 𝑦 and the pair (𝑥0 , 𝑦0 ) is not nested.
The following method computes the subpairs of the optimal matching of the off-diagonal instance.
The output of this method is a dictionary with keys corresponding to matched pairs and an “artificial pair” which collects
all arcs which are visible from above.
Values of each key (𝑥0 , 𝑦0 ) are the subpairs ordered so that the first subpair is the subpair with the 𝑥 type closest to 𝑥0
and the last subpair is the subpair with the 𝑦 type closest to 𝑦0 .
ConcaveCostOT.sort_subpairs = sort_subpairs
# Find subpairs (both nested and non-nested) for each matched pair
for matched_pair in matched_pairs | {'artificial_pair'}:
    # Determine the interval of the matched pair
    if matched_pair != 'artificial_pair':
        min_type, max_type = sorted([self.X_types[matched_pair[0]],
                                     self.Y_types[matched_pair[1]]])
    else:
        min_type, max_type = (-np.inf, np.inf)
if return_pairs_between:
    return subpairs, pairs_between
return subpairs
OffDiagonal.find_subpairs = find_subpairs
The algorithm to compute the dual variables has a hierarchical structure: it starts from the matched pairs with no subpairs
and then moves to those pairs whose subpairs have been already processed.
We can visualize the hierarchical structure by computing the order in which the pairs will be processed and plotting the
matching with the color of each arc corresponding to its level in the hierarchy.
## Compute Hierarchies
def find_hierarchies(subpairs):
# Find new ready_to_process pairs that have all their subpairs processed
ready_to_process = {
pair for pair in pairs_to_process
if all(subpair in processed_pairs for subpair in subpairs[pair])}
return hierarchies
## Plot Hierarchies
# Plot types on the real line (blue for X_types, red for Y_types)
size_marker = 20 if scatter else 0
ax.scatter(self.X_types, np.zeros_like(self.X_types), color='blue',
s=size_marker, zorder=5, label='X_types')
ax.scatter(self.Y_types, np.zeros_like(self.Y_types), color='red',
s=size_marker, zorder=5, label='Y_types')
# Plot arcs
# Create a colormap ('viridis' or 'coolwarm', 'plasma')
cmap = plt.colormaps['plasma']
for level, hierarchy in enumerate(hierarchies):
color = (cmap(level / (len(hierarchies) - 1))
if len(hierarchies) > 1 else cmap(0))
for pair in hierarchy:
if pair == 'artificial_pair':
continue
plt.show()
OffDiagonal.plot_hierarchies = plot_hierarchies
exam_assign_OD.plot_hierarchies(subpairs)
We proceed to describe and implement the algorithm to compute the dual solution.
As already mentioned, the algorithm starts from the matched pairs $(x_0, y_0)$ with no subpairs and assigns the (temporary)
values $\phi_{x_0} = c_{x_0 y_0}$ and $\psi_{y_0} = 0$, i.e. the $x$ type sustains the whole cost of matching.
The algorithm then proceeds sequentially by processing any matched pair whose subpairs have already been processed.
After picking any such matched pair (𝑥0 , 𝑦0 ), the dual variables already computed for the processed subpairs need to be
made “comparable”.
Indeed, for any subpair $(x_1, y_1)$ of $(x_0, y_0)$, the dual variables of all the types between $x_1$ and $y_1$ satisfy dual feasibility
and complementary slackness locally, i.e. $\phi_x + \psi_y \leq c_{xy}$, with equality if $(x, y)$ is a matched pair, for all types $x, y$ between
$x_1$ and $y_1$.
But dual feasibility is not satisfied globally in general, for instance it might not be satisfied for two subpairs (𝑥1 , 𝑦1 ) and
(𝑥2 , 𝑦2 ) of (𝑥0 , 𝑦0 ).
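Therefore, letting $(x_1, y_1), \dots, (x_p, y_p)$ be the subpairs of $(x_0, y_0)$, we compute the solution $(\beta_2, \dots, \beta_p)$ of the linear
system

$$
\max(c_{x_0 y_0} - c_{x_0 y_i} - c_{x_j y_0}, \, -c_{x_j y_i}) + c_{x_i y_i}
\leq \sum_{k=i+1}^{j} \beta_k \leq
\min(c_{x_0 y_j} + c_{x_i y_0} - c_{x_0 y_0}, \, c_{x_i y_j}) - c_{x_j y_j},
\quad \text{for all } 1 \leq i < j \leq p.
$$

Then for all $i \in [p]$ compute the adjustment $\Delta_i = \sum_{k=i+1}^{p} \beta_k + \phi_{x_p} - \phi_{x_1}$ and modify the dual variables

$$
\phi_x \leftarrow \phi_x + \Delta_i, \qquad \psi_y \leftarrow \psi_y - \Delta_i
$$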
After this step, the dual variables of the types between 𝑥0 and 𝑦0 satisfy dual feasibility and complementary slackness;
we can then proceed to compute the dual variables for 𝑥0 and 𝑦0 by setting
$$
\psi_{y_0} = \min_{i \in [p]} \{ c_{x_i y_0} - \phi_{x_i} \}
$$
sum_tensor -= sum_tensor.transpose(1, 0, 2)
beta = result.x
beta[0] = 0
return beta
OffDiagonal.compute_betas = compute_betas
The following method iteratively processes the matched pairs of the off-diagonal matching as explained above.
Δ_subpair = (beta[np.arange(i+1,len(subpairs[pair]))].sum()
+ ϕ_x[subpairs[pair][-1][0]]
- ϕ_x[subpair[0]])
if pair != 'artificial_pair':
if pair[0] == subpairs_x[0]:
ψ_y[pair[1]] = np.min(self.cost_x_y[pair[0], subpairs_y]
- ψ_y[subpairs_y]) + self.cost_x_y[pair]
else:
ψ_y[pair[1]] = np.min(self.cost_x_y[subpairs_x,
pair[1]] - ϕ_x[subpairs_x] )
OffDiagonal.compute_dual_off_diagonal = compute_dual_off_diagonal
We apply the algorithm to our example and check that dual feasibility (𝜙𝑥 + 𝜓𝑦 ≤ 𝑐𝑥𝑦 for all 𝑥 ∈ 𝑋 and 𝑦 ∈ 𝑌 ) as well
as strong duality (𝑉𝑃 = 𝑉𝐷 ) are satisfied.
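The two checks named above can be illustrated on a small stand-alone instance. The sketch below does not use the lecture's `OffDiagonal` class; it solves a hypothetical 3-by-3 problem with `scipy.optimize.linprog` (the types, masses and square-root cost are illustrative assumptions) and then verifies dual feasibility and strong duality numerically.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy instance (not the lecture's data): three types on each side
X_types = np.array([0.0, 1.0, 3.0])
Y_types = np.array([0.5, 2.0, 2.5])
n_x = np.ones(3)
m_y = np.ones(3)
cost = np.sqrt(np.abs(X_types[:, None] - Y_types[None, :]))  # concave in distance

nx, ny = cost.shape

# Primal: min sum_xy mu_xy c_xy  s.t. row sums = n_x, column sums = m_y, mu >= 0
A_eq = np.zeros((nx + ny, nx * ny))
for i in range(nx):
    A_eq[i, i * ny:(i + 1) * ny] = 1.0   # sum over y of mu[i, y]
for j in range(ny):
    A_eq[nx + j, j::ny] = 1.0            # sum over x of mu[x, j]
b_eq = np.concatenate([n_x, m_y])
primal = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                 bounds=(0, None), method="highs")
V_P = primal.fun

# Dual: max n.phi + m.psi  s.t. phi_x + psi_y <= c_xy  (variables are free)
dual = linprog(-b_eq, A_ub=A_eq.T, b_ub=cost.ravel(),
               bounds=(None, None), method="highs")
phi, psi = dual.x[:nx], dual.x[nx:]
V_D = -dual.fun

slack = cost - (phi[:, None] + psi[None, :])
dual_feasible = bool(np.all(slack >= -1e-9))
strong_duality = bool(abs(V_P - V_D) < 1e-8)
print(dual_feasible, strong_duality)
```

The same two conditions, `slack >= 0` elementwise and equality of the two objective values, are what one would test on the dual variables returned by the hierarchical algorithm.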
Having computed the dual variables of the off-diagonal types, we compute the dual variables for perfectly matched pairs
by setting

$$
\phi_x = \min_{y \in Y^{OD}} \{ c_{xy} - \psi_y \}, \qquad
\psi_y = \min_{x \in X^{OD}} \{ c_{xy} - \phi_x \}
$$
where 𝑋 𝑂𝐷 and 𝑌 𝑂𝐷 are the types of the off-diagonal instance, for which the dual variables have already been computed.
The following method computes the full dual solution from the primal solution.
ϕ_x[nonzero_id_x] = ϕ_x_off_diag
ψ_y[nonzero_id_y] = ψ_x_off_diag
ConcaveCostOT.compute_dual_solution = compute_dual_solution
16.7 Application
16.7.1 Data
We now replicate the empirical analysis carried out by [Boerma et al., 2024].
The dataset is obtained from the American Community Survey and contains individual level data on income, age and
occupation.
The occupation of each individual consists of a Standard Occupational Classification (SOC) code.
There are 497 codes in total.
We consider only employed (civilian) individuals with ages between 25 and 60 from 2010 to 2017.
To visualize log-wage dispersion, we group the individuals by occupation and compute the mean and standard deviation
of the wages within each occupation.
data_path = '_static/lecture_specific/match_transport/'
occupation_df = pd.read_csv(data_path + 'acs_data_summary.csv')
# Polynomial interpolation
x = np.arange(len(occupation_df))
y = occupation_df['std_Earnings']
degree = 5
p = np.poly1d(np.polyfit(x, y, degree) )
plt.plot(x, p(x), color='red')
plt.show()
We also plot the average wages for each occupation (SOC code). Again, occupations are ordered by increasing average
wage.
# Polynomial interpolation
x = np.arange(len(occupation_df))
y = occupation_df['mean_Earnings']
degree = 5
p = np.poly1d(np.polyfit(x, y, degree) )
Fig. 16.1: Average wage for each Standard Occupational Classification (SOC) code. The codes are sorted by average
wage on the horizontal axis. In red, a polynomial of degree 5 is fitted to the data. The size of the marker is proportional
to the number of individuals in the occupation.
plt.show()
16.7.2 Model
parameters_1980 = namedtuple('Params_Jobs', [
'mean_1', 'var_1', 'mean_2', 'var_2', 'mixing_weight', 'var_workers'
])(
mean_1=0.38,
var_1=0.06,
mean_2=0.0,
var_2=0.75,
mixing_weight=0.36,
var_workers=0.2
)
Fig. 16.2: Average wage for each Standard Occupational Classification (SOC) code. The codes are sorted by average
wage on the horizontal axis. In red, a polynomial of degree 5 is fitted to the data.
num_agents=1500
np.random.seed(random_seed)
# Job types
job_types = np.where(np.random.rand(num_agents) < mixing_weight,
np.random.lognormal(mean_1, var_1, num_agents),
np.random.lognormal(mean_2, var_2, num_agents))
# Worker types
mean_workers = - var_workers/ 2
worker_types = np.random.lognormal(mean_workers, var_workers, num_agents)
ConcaveCostOT.generate_types_application = generate_types_application
Since we will consider examples with a large number of agents, it will be convenient to visualize the distributions as
histograms approximating the pdfs.
plt.figure(figsize=figsize)
plt.show()
ConcaveCostOT.plot_marginals_pdf = plot_marginals_pdf
We plot the histograms and the measure of underqualification for the worker types and job types. We then compute the
primal solution and plot the matching.
# Plot pdf
range_x_axis = (0, 4)
model_1980.plot_marginals_pdf(figsize=(8, 5),
bins=300, range_x_axis=range_x_axis)
# Plot H_z
model_OD_1980 , _ = model_1980.generate_offD_onD_matching()
model_OD_1980.plot_H_z(figsize=(8, 5), range_x_axis=range_x_axis, scatter=False)
We then find the dual solution (𝜙, 𝜓) and compute the wages as 𝑤𝑥 = 𝑔(𝑥) − 𝜙𝑥 , assuming that the type-specific
productivity of type 𝑥 is 𝑔(𝑥) = 𝑥.
Let’s plot average wages and wage dispersion generated by the model.
def plot_wages_application(wages):
plot_wages_application(wage_worker_x_1980)
plot_wage_dispersion_model(wage_worker_x_1980, bins=100)
CHAPTER
SEVENTEEN
“Mathematics is the art of giving the same name to different things” – Henri Poincare
“Complete market economies are all alike” – Robert E. Lucas, Jr., (1989)
“Every partial equilibrium model can be reinterpreted as a general equilibrium model.” – Anonymous
This lecture presents a class of linear-quadratic-Gaussian models of general economic equilibrium designed by Lars Peter
Hansen and Thomas J. Sargent [Hansen and Sargent, 2013].
The class of models is implemented in a Python class DLE that is part of quantecon.
Subsequent lectures use the DLE class to implement various instances that have appeared in the economics literature
1. Growth in Dynamic Linear Economies
2. Lucas Asset Pricing using DLE
3. IRFs in Hall Model
4. Permanent Income Using the DLE class
5. Rosen schooling model
6. Cattle cycles
7. Shock Non Invertibility
In saying that “complete markets are all alike”, Robert E. Lucas, Jr. was noting that all of them have
• a commodity space.
• a space dual to the commodity space in which prices reside.
• endowments of resources.
• people’s preferences over goods.
• physical technologies for transforming resources into goods.
• random processes that govern shocks to technologies and preferences and associated information flows.
• a single budget constraint per person.
• the existence of a representative consumer even when there are many people in the model.
17.1.2 Forecasting?
A consequence of a single budget constraint per person plus the Hicks-Arrow tricks is that households and firms need not
forecast.
But there exist equivalent structures called recursive competitive equilibria in which they do appear to need to forecast.
In these structures, to forecast, households and firms use:
• equilibrium pricing functions, and
• knowledge of the Markov structure of the economy’s state vector.
For an application of the [Hansen and Sargent, 2013] class of models, the outcome of theorizing is a stochastic process,
i.e., a probability distribution over sequences of prices and quantities, indexed by parameters describing preferences,
technologies, and information flows.
Another name for that object is a likelihood function, a key object of both frequentist and Bayesian statistics.
There are two important uses of an equilibrium stochastic process or likelihood function.
The first is to solve the direct problem.
The direct problem takes as inputs values of the parameters that define preferences, technologies, and information flows
and as an output characterizes or simulates random paths of quantities and prices.
The second use of an equilibrium stochastic process or likelihood function is to solve the inverse problem.
The inverse problem takes as an input a time series sample of observations on a subset of prices and quantities determined
by the model and from them makes inferences about the parameters that define the model’s preferences, technologies,
and information flows.
A [Hansen and Sargent, 2013] economy consists of lists of matrices that describe people’s household technologies, their
preferences over consumption services, their production technologies, and their information sets.
There are complete markets in history-contingent commodities.
Competitive equilibrium allocations and prices
• satisfy equations that are easy to write down and solve
• have representations that are convenient econometrically
Different example economies manifest themselves simply as different settings for various matrices.
[Hansen and Sargent, 2013] use these tools:
• A theory of recursive dynamic competitive economies
• Linear optimal control theory
• Recursive methods for estimating and interpreting vector autoregressions
The models are flexible enough to express alternative senses of a representative household
• A single ‘stand-in’ household of the type used to good effect by Edward C. Prescott.
• Heterogeneous households satisfying conditions for Gorman aggregation into a representative household.
• Heterogeneous household technologies that violate conditions for Gorman aggregation but are still susceptible to
aggregation into a single representative household via ‘non-Gorman’ or ‘mongrel’ aggregation.
These three alternative types of aggregation have different consequences in terms of how prices and allocations can be
computed.
In particular, can prices and an aggregate allocation be computed before the equilibrium allocation to individual hetero-
geneous households is computed?
• Answers are “Yes” for Gorman aggregation, “No” for non-Gorman aggregation.
In summary, the insights and practical benefits from economics to be introduced in this lecture are
• Deeper understandings that come from recognizing common underlying structures.
• Speed and ease of computation that comes from unleashing a common suite of Python programs.
We’ll use the following mathematical tools
• Stochastic Difference Equations (Linear).
• Duality: LQ Dynamic Programming and Linear Filtering are the same things mathematically.
• The Spectral Factorization Identity (for understanding vector autoregressions and non-Gorman aggregation).
So here is our roadmap.
We’ll describe sets of matrices that pin down
• Information
• Technologies
• Preferences
Then we’ll describe
• Equilibrium concept and computation
• Econometric representation and estimation
We’ll use stochastic linear difference equations to describe information flows and equilibrium outcomes.
The sequence {𝑤𝑡 ∶ 𝑡 = 1, 2, …} is said to be a martingale difference sequence adapted to {𝐽𝑡 ∶ 𝑡 = 0, 1, …} if
𝐸(𝑤𝑡+1 |𝐽𝑡 ) = 0 for 𝑡 = 0, 1, … .
The sequence $\{w_t : t = 1, 2, \dots\}$ is said to be conditionally homoskedastic if $E(w_{t+1} w_{t+1}' \mid J_t) = I$ for $t = 0, 1, \dots$.
We assume that the {𝑤𝑡 ∶ 𝑡 = 1, 2, …} process is conditionally homoskedastic.
Let {𝑥𝑡 ∶ 𝑡 = 1, 2, …} be a sequence of 𝑛-dimensional random vectors, i.e. an 𝑛-dimensional stochastic process.
The process $\{x_t : t = 1, 2, \dots\}$ is constructed recursively using an initial random vector $x_0 \sim \mathcal{N}(\hat x_0, \Sigma_0)$ and a time-invariant law of motion:

$$
x_{t+1} = A x_t + C w_{t+1}, \quad t = 0, 1, \dots
$$
Let 𝐽0 be generated by 𝑥0 and 𝐽𝑡 be generated by 𝑥0 , 𝑤1 , … , 𝑤𝑡 , which means that 𝐽𝑡 consists of the set of all measurable
functions of {𝑥0 , 𝑤1 , … , 𝑤𝑡 }.
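$$
E(x_{t+1} \mid J_t) = A x_t
$$

$$
\begin{aligned}
x_t &= A x_{t-1} + C w_t \\
    &= A^2 x_{t-2} + A C w_{t-1} + C w_t \\
    &= \left[ \sum_{\tau=0}^{t-1} A^\tau C w_{t-\tau} \right] + A^t x_0
\end{aligned}
$$

$$
E_t x_{t+j} = A^j x_t
$$

$$
\begin{aligned}
\upsilon_1 &= C C' \\
\upsilon_j &= C C' + A \upsilon_{j-1} A', \quad j \geq 2
\end{aligned}
$$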
To decompose these covariances into parts attributable to the individual components of $w_t$, we let $i_\tau$ be an $N$-dimensional
column vector of zeroes except in position $\tau$, where there is a one. Define a matrix $\upsilon_{j,\tau}$

$$
\upsilon_{j,\tau} = \sum_{k=0}^{j-1} A^k C i_\tau i_\tau' C' {A'}^{k} .
$$

Note that $\sum_{\tau=1}^{N} i_\tau i_\tau' = I$, so that we have

$$
\sum_{\tau=1}^{N} \upsilon_{j,\tau} = \upsilon_j
$$
Evidently, the matrices {𝜐𝑗,𝜏 , 𝜏 = 1, … , 𝑁 } give an orthogonal decomposition of the covariance matrix of 𝑗-step-ahead
prediction errors into the parts attributable to each of the components 𝜏 = 1, … , 𝑁 .
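As a numerical illustration of this decomposition, the sketch below builds the $j$-step-ahead prediction error covariance with the recursion $CC' + A\upsilon_{j-1}A'$ and checks that the shock-by-shock pieces add up to it. The matrices `A` and `C` are arbitrary illustrative values, not taken from any model in the lecture.

```python
import numpy as np

# Illustrative (hypothetical) law of motion x_{t+1} = A x_t + C w_{t+1}
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])
C = np.array([[1.0, 0.0],
              [0.3, 0.7]])


def pred_error_cov(A, C, j):
    """Covariance of the j-step-ahead prediction error, built recursively."""
    v = C @ C.T
    for _ in range(j - 1):
        v = C @ C.T + A @ v @ A.T
    return v


def pred_error_cov_component(A, C, j, tau):
    """Part of the j-step covariance attributable to shock number tau (0-based)."""
    N = C.shape[1]
    i_tau = np.zeros((N, 1))
    i_tau[tau, 0] = 1.0
    A_k = np.eye(A.shape[0])          # holds A^k
    out = np.zeros((A.shape[0], A.shape[0]))
    for _ in range(j):
        out += A_k @ C @ i_tau @ i_tau.T @ C.T @ A_k.T
        A_k = A @ A_k
    return out


j = 4
total = pred_error_cov(A, C, j)
parts = [pred_error_cov_component(A, C, j, tau) for tau in range(C.shape[1])]
print(np.allclose(sum(parts), total))
```

Dividing the diagonal of each component by the diagonal of the total gives the share of each variable's forecast error variance attributable to each shock.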
𝑈𝑏 and 𝑈𝑑 are matrices that select entries of 𝑧𝑡 . The law of motion for {𝑧𝑡 ∶ 𝑡 = 0, 1, …} is
𝑧𝑡+1 = 𝐴22 𝑧𝑡 + 𝐶2 𝑤𝑡+1 for 𝑡 = 0, 1, …
where 𝑧0 is a given initial condition. The eigenvalues of the matrix 𝐴22 have absolute values that are less than or equal
to one.
Thus, in summary, our model of information and shocks is
$$
\begin{aligned}
z_{t+1} &= A_{22} z_t + C_2 w_{t+1} \\
b_t &= U_b z_t \\
d_t &= U_d z_t .
\end{aligned}
$$
We can now briefly summarize other components of our economies, in particular
• Production technologies
• Household technologies
• Household preferences
Where $c_t$ is a vector of consumption rates, $k_t$ is a vector of physical capital goods, $g_t$ is a vector of intermediate production
goods, and $d_t$ is a vector of technology shocks, the production technology is

$$
\begin{aligned}
\Phi_c c_t + \Phi_g g_t + \Phi_i i_t &= \Gamma k_{t-1} + d_t \\
k_t &= \Delta_k k_{t-1} + \Theta_k i_t \\
g_t \cdot g_t &= \ell_t^2
\end{aligned}
$$
Here Φ𝑐 , Φ𝑔 , Φ𝑖 , Γ, Δ𝑘 , Θ𝑘 are all matrices conformable to the vectors they multiply and ℓ𝑡 is a disutility generating
resource supplied by the household.
For technical reasons that facilitate computations, we make the following.
Assumption: [Φ𝑐 Φ𝑔 ] is nonsingular.
Households confront a technology that allows them to devote consumption goods to construct a vector ℎ𝑡 of household
capital goods and a vector 𝑠𝑡 of utility generating house services
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
where Λ, Π, Δℎ , Θℎ are matrices that pin down the household technology.
We make the following
Assumption: The absolute values of the eigenvalues of Δℎ are less than or equal to one.
Below, we’ll outline further assumptions that we shall occasionally impose.
17.1.12 Preferences
Where $b_t$ is a stochastic process of preference shocks that will play the role of demand shifters, the representative household
orders stochastic processes of consumption services $s_t$ according to

$$
-\left(\frac{1}{2}\right) E \sum_{t=0}^{\infty} \beta^t \left[ (s_t - b_t) \cdot (s_t - b_t) + \ell_t^2 \right] \Bigm| J_0 , \quad 0 < \beta < 1
$$
We now give examples of production and household technologies that appear in various models in the literature.
First, we give examples of production technologies
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
∣ 𝑔𝑡 ∣≤ ℓ𝑡
so we’ll be looking for specifications of the matrices Φ𝑐 , Φ𝑔 , Φ𝑖 , Γ, Δ𝑘 , Θ𝑘 that define them.
𝑐𝑡 + 𝑖𝑡 = 𝑑1𝑡
𝑔𝑡 = 𝜙1 𝑖𝑡
where 𝜙1 is a small positive number.
To implement this version, we set Δ𝑘 = Θ𝑘 = 0 and
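$$
\Phi_c = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
\Phi_i = \begin{bmatrix} 1 \\ \phi_1 \end{bmatrix}, \quad
\Phi_g = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad
\Gamma = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad
d_t = \begin{bmatrix} d_{1t} \\ 0 \end{bmatrix}
$$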
We can use this specification to create a linear-quadratic version of Lucas’s (1978) asset pricing model.
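To see that these matrices satisfy the rank assumption and reproduce the two scalar equations, here is a small numerical check. The value of $\phi_1$ and the numbers used for $d_{1t}$ and $i_t$ are illustrative assumptions.

```python
import numpy as np

phi_1 = 1e-5                       # "small positive number" (illustrative value)
Phi_c = np.array([[1.0], [0.0]])
Phi_i = np.array([[1.0], [phi_1]])
Phi_g = np.array([[0.0], [-1.0]])
Gamma = np.zeros((2, 1))

# Rank assumption: [Phi_c  Phi_g] must be nonsingular
Phi_cg = np.hstack([Phi_c, Phi_g])
nonsingular = abs(np.linalg.det(Phi_cg)) > 0

# Given i_t, k_{t-1} and d_t, recover (c_t, g_t) from the technology constraint
d_t = np.array([[2.0], [0.0]])     # d_t = (d_1t, 0)'
i_t = np.array([[0.5]])
k_lag = np.array([[0.0]])
c_t, g_t = np.linalg.solve(Phi_cg, Gamma @ k_lag + d_t - Phi_i @ i_t).ravel()

print(nonsingular, c_t, g_t)
```

The solution satisfies $c_t + i_t = d_{1t}$ and $g_t = \phi_1 i_t$, which is the scalar form of the technology stated above.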
There is a single consumption good, a single intermediate good, and a single investment good.
The technology is described by
Set
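$$
\Phi_c = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
\Phi_g = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad
\Phi_i = \begin{bmatrix} 0 \\ \phi_1 \end{bmatrix}
$$

$$
\Gamma = \begin{bmatrix} \gamma \\ 0 \end{bmatrix}, \quad
\Delta_k = \delta_k, \quad \Theta_k = 1
$$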
We set 𝐴22 , 𝐶2 and 𝑈𝑑 to make (𝑑1𝑡 , 𝑑2𝑡 )′ = 𝑑𝑡 follow a desired stochastic process.
Now we describe some examples of preferences, which as we have seen are ordered by
$$
-\left(\frac{1}{2}\right) E \sum_{t=0}^{\infty} \beta^t \left[ (s_t - b_t) \cdot (s_t - b_t) + (\ell_t)^2 \right] \Bigm| J_0 , \quad 0 < \beta < 1
$$
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
and we make
Assumption: The absolute values of the eigenvalues of Δℎ are less than or equal to one.
Later we shall introduce canonical household technologies that satisfy an ‘invertibility’ requirement relating sequences
{𝑠𝑡 } of services and {𝑐𝑡 } of consumption flows.
And we’ll describe how to obtain a canonical representation of a household technology from one that is not canonical.
Here are some examples of household preferences.
Time Separable preferences
$$
-\frac{1}{2} E \sum_{t=0}^{\infty} \beta^t \left[ (c_t - b_t)^2 + \ell_t^2 \right] \Bigm| J_0 , \quad 0 < \beta < 1
$$
Consumer Durables
Services at 𝑡 are related to the stock of durables at the beginning of the period:
𝑠𝑡 = 𝜆ℎ𝑡−1 , 𝜆 > 0
$$
-\frac{1}{2} E \sum_{t=0}^{\infty} \beta^t \left[ (\lambda h_{t-1} - b_t)^2 + \ell_t^2 \right] \Bigm| J_0
$$
Set Δℎ = 𝛿ℎ , Θℎ = 1, Λ = 𝜆, Π = 0.
Habit Persistence
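$$
-\left(\frac{1}{2}\right) E \sum_{t=0}^{\infty} \beta^t \left[ \Bigl( c_t - \lambda (1 - \delta_h) \sum_{j=0}^{\infty} \delta_h^j c_{t-j-1} - b_t \Bigr)^2 + \ell_t^2 \right] \Bigm| J_0
$$

$$
h_t = (1 - \delta_h) \sum_{j=0}^{t} \delta_h^j c_{t-j} + \delta_h^{t+1} h_{-1}
$$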
𝑠𝑡 = −𝜆ℎ𝑡−1 + 𝑐𝑡 , 𝜆 > 0
To implement, set Λ = −𝜆, Π = 1, Δℎ = 𝛿ℎ , Θℎ = 1 − 𝛿ℎ .
Seasonal Habit Persistence
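$$
-\left(\frac{1}{2}\right) E \sum_{t=0}^{\infty} \beta^t \left[ \Bigl( c_t - \lambda (1 - \delta_h) \sum_{j=0}^{\infty} \delta_h^j c_{t-4j-4} - b_t \Bigr)^2 + \ell_t^2 \right]
$$

$$
h_t = \begin{bmatrix} \tilde h_t \\ \tilde h_{t-1} \\ \tilde h_{t-2} \\ \tilde h_{t-3} \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 & \delta_h \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} \tilde h_{t-1} \\ \tilde h_{t-2} \\ \tilde h_{t-3} \\ \tilde h_{t-4} \end{bmatrix}
+ \begin{bmatrix} (1 - \delta_h) \\ 0 \\ 0 \\ 0 \end{bmatrix} c_t
$$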
Adjustment Costs.
Recall
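$$
-\left(\frac{1}{2}\right) E \sum_{t=0}^{\infty} \beta^t \left[ (c_t - b_{1t})^2 + \lambda^2 (c_t - c_{t-1})^2 + \ell_t^2 \right] \Bigm| J_0 ,
\quad 0 < \beta < 1 , \quad \lambda > 0
$$

To capture adjustment costs, set

$$
h_t = c_t
$$

$$
s_t = \begin{bmatrix} 0 \\ -\lambda \end{bmatrix} h_{t-1} + \begin{bmatrix} 1 \\ \lambda \end{bmatrix} c_t
$$

so that

$$
s_{1t} = c_t
$$

$$
\Lambda = \begin{bmatrix} 0 \\ -\lambda \end{bmatrix}, \quad \Pi = \begin{bmatrix} 1 \\ \lambda \end{bmatrix}
$$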
Multiple Consumption Goods
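$$
\Lambda = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \text{and} \quad
\Pi = \begin{bmatrix} \pi_1 & 0 \\ \pi_2 & \pi_3 \end{bmatrix}
$$

$$
-\frac{1}{2} \beta^t (\Pi c_t - b_t)' (\Pi c_t - b_t)
$$

$$
\mu_t = -\beta^t \left[ \Pi' \Pi \, c_t - \Pi' b_t \right]
$$

$$
c_t = -(\Pi' \Pi)^{-1} \beta^{-t} \mu_t + (\Pi' \Pi)^{-1} \Pi' b_t
$$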
This is called the Frisch demand function for consumption.
We can think of the vector 𝜇𝑡 as playing the role of prices, up to a common factor, for all dates and states.
The scale factor is determined by the choice of numeraire.
Notions of substitutes and complements can be defined in terms of these Frisch demand functions.
Two goods can be said to be substitutes if the cross-price effect is positive and to be complements if this effect is
negative.
Hence this classification is determined by the off-diagonal element of −(Π′ Π)−1 , which is equal to 𝜋2 𝜋3 / det(Π′ Π).
If 𝜋2 and 𝜋3 have the same sign, the goods are substitutes.
If they have opposite signs, the goods are complements.
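The sign rule can be verified numerically. The helper below is a hypothetical illustration (its name and the parameter values are not from the lecture): it computes the off-diagonal element of $-(\Pi'\Pi)^{-1}$ and compares it with the closed form $\pi_2\pi_3/\det(\Pi'\Pi)$.

```python
import numpy as np


def cross_price_effect(pi_1, pi_2, pi_3):
    """Off-diagonal element of -(Pi'Pi)^{-1} and its closed-form counterpart."""
    Pi = np.array([[pi_1, 0.0],
                   [pi_2, pi_3]])
    PtP = Pi.T @ Pi
    numeric = (-np.linalg.inv(PtP))[0, 1]
    closed_form = pi_2 * pi_3 / np.linalg.det(PtP)
    return numeric, closed_form


same_sign = cross_price_effect(1.0, 0.5, 2.0)       # pi_2, pi_3 both positive
opposite_sign = cross_price_effect(1.0, -0.5, 2.0)  # pi_2, pi_3 of opposite sign

print(same_sign, opposite_sign)
```

With `pi_2` and `pi_3` of the same sign the cross effect is positive (substitutes); flipping the sign of one of them makes it negative (complements).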
To summarize, our economic structure consists of the matrices that define the following components:
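Information and shocks

$$
\begin{aligned}
z_{t+1} &= A_{22} z_t + C_2 w_{t+1} \\
b_t &= U_b z_t \\
d_t &= U_d z_t
\end{aligned}
$$

Production Technology

$$
\begin{aligned}
\Phi_c c_t + \Phi_g g_t + \Phi_i i_t &= \Gamma k_{t-1} + d_t \\
k_t &= \Delta_k k_{t-1} + \Theta_k i_t \\
g_t \cdot g_t &= \ell_t^2
\end{aligned}
$$

Household Technology

$$
\begin{aligned}
s_t &= \Lambda h_{t-1} + \Pi c_t \\
h_t &= \Delta_h h_{t-1} + \Theta_h c_t
\end{aligned}
$$

Preferences

$$
-\left(\frac{1}{2}\right) E \sum_{t=0}^{\infty} \beta^t \left[ (s_t - b_t) \cdot (s_t - b_t) + \ell_t^2 \right] \Bigm| J_0 , \quad 0 < \beta < 1
$$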
$$
-(1/2) E \sum_{t=0}^{\infty} \beta^t \left[ (s_t - b_t) \cdot (s_t - b_t) + g_t \cdot g_t \right] \Bigm| J_0
$$
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡 ,
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡 ,
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡 ,
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡 ,
𝑧𝑡+1 = 𝐴22 𝑧𝑡 + 𝐶2 𝑤𝑡+1 , 𝑏𝑡 = 𝑈𝑏 𝑧𝑡 , and 𝑑𝑡 = 𝑈𝑑 𝑧𝑡
Define:
$$
L_0^2 = \left[ \{y_t\} : y_t \ \text{is a random variable in } J_t \ \text{and} \ E \sum_{t=0}^{\infty} \beta^t y_t^2 \mid J_0 < +\infty \right]
$$
Thus, we require that each component of ℎ𝑡 and each component of 𝑘𝑡 belong to 𝐿20 .
We shall compare and utilize two approaches to solving the planning problem
• Lagrangian formulation
• Dynamic programming
The planner maximizes $\mathcal{L}$ with respect to the quantities $\{c_t, i_t, g_t\}_{t=0}^{\infty}$ and minimizes with respect to the Lagrange
multipliers $M_t^d, M_t^k, M_t^h, M_t^s$.
First-order necessary conditions for maximization with respect to 𝑐𝑡 , 𝑔𝑡 , ℎ𝑡 , 𝑖𝑡 , 𝑘𝑡 , and 𝑠𝑡 , respectively, are:
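$$
\begin{aligned}
-\Phi_c' M_t^d + \Theta_h' M_t^h + \Pi' M_t^s &= 0, \\
-g_t - \Phi_g' M_t^d &= 0, \\
-M_t^h + \beta E (\Delta_h' M_{t+1}^h + \Lambda' M_{t+1}^s) \mid J_t &= 0, \\
-\Phi_i' M_t^d + \Theta_k' M_t^k &= 0, \\
-M_t^k + \beta E (\Delta_k' M_{t+1}^k + \Gamma' M_{t+1}^d) \mid J_t &= 0, \\
-s_t + b_t - M_t^s &= 0
\end{aligned}
$$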
for 𝑡 = 0, 1, ….
In addition, we have the complementary slackness conditions (these recover the original transition equations) and also
transversality conditions
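$$
\begin{aligned}
\lim_{t \to \infty} \beta^t E [ M_t^{k\prime} k_t ] \mid J_0 &= 0 \\
\lim_{t \to \infty} \beta^t E [ M_t^{h\prime} h_t ] \mid J_0 &= 0
\end{aligned}
$$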
The system formed by the FONCs and the transition equations can be handed over to Python.
Python will solve the planning problem for fixed parameter values.
Here are the Python Ready Equations
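$$
\begin{aligned}
-\Phi_c' M_t^d + \Theta_h' M_t^h + \Pi' M_t^s &= 0, \\
-g_t - \Phi_g' M_t^d &= 0, \\
-M_t^h + \beta E (\Delta_h' M_{t+1}^h + \Lambda' M_{t+1}^s) \mid J_t &= 0, \\
-\Phi_i' M_t^d + \Theta_k' M_t^k &= 0, \\
-M_t^k + \beta E (\Delta_k' M_{t+1}^k + \Gamma' M_{t+1}^d) \mid J_t &= 0, \\
-s_t + b_t - M_t^s &= 0
\end{aligned}
$$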
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡 ,
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡 ,
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡 ,
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡 ,
𝑧𝑡+1 = 𝐴22 𝑧𝑡 + 𝐶2 𝑤𝑡+1 , 𝑏𝑡 = 𝑈𝑏 𝑧𝑡 , and 𝑑𝑡 = 𝑈𝑑 𝑧𝑡
The Lagrange multipliers or shadow prices satisfy
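$$
M_t^s = b_t - s_t
$$

$$
M_t^h = E \left[ \sum_{\tau=1}^{\infty} \beta^\tau (\Delta_h')^{\tau-1} \Lambda' M_{t+\tau}^s \mid J_t \right]
$$

$$
M_t^d = \begin{bmatrix} \Phi_c' \\ \Phi_g' \end{bmatrix}^{-1}
\begin{bmatrix} \Theta_h' M_t^h + \Pi' M_t^s \\ -g_t \end{bmatrix}
$$

$$
M_t^k = E \left[ \sum_{\tau=1}^{\infty} \beta^\tau (\Delta_k')^{\tau-1} \Gamma' M_{t+\tau}^d \mid J_t \right]
$$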
Φ𝑐 𝑐0 + Φ𝑔 𝑔0 + Φ𝑖 𝑖0 = Γ𝑘−1 + 𝑑0 ,
𝑘0 = Δ𝑘 𝑘−1 + Θ𝑘 𝑖0 ,
ℎ0 = Δℎ ℎ−1 + Θℎ 𝑐0 ,
𝑠0 = Λℎ−1 + Π𝑐0 ,
𝑧1 = 𝐴22 𝑧0 + 𝐶2 𝑤1 , 𝑏0 = 𝑈𝑏 𝑧0 and 𝑑0 = 𝑈𝑑 𝑧0
Because this is a linear-quadratic dynamic programming problem, it turns out that the value function has the form
𝑉 (𝑥) = 𝑥′ 𝑃 𝑥 + 𝜌
Thus, we want to solve an instance of the following linear-quadratic dynamic programming problem:
Choose a contingency plan for $\{x_{t+1}, u_t\}_{t=0}^{\infty}$ to maximize

$$
-E \sum_{t=0}^{\infty} \beta^t \left[ x_t' R x_t + u_t' Q u_t + 2 u_t' W' x_t \right], \quad 0 < \beta < 1
$$
subject to
𝑉 (𝑥𝑡 ) = −𝑥′𝑡 𝑃 𝑥𝑡 − 𝜌
𝑃 satisfies
The optimum decision rule for 𝑢𝑡 is independent of the parameters 𝐶, and so of the noise statistics.
Iterating on the Bellman operator leads to
𝑉𝑗 (𝑥𝑡 ) = −𝑥′𝑡 𝑃𝑗 𝑥𝑡 − 𝜌𝑗
where 𝑃𝑗 and 𝜌𝑗 satisfy the equations
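A minimal sketch of such an iteration follows. It assumes the transition law $x_{t+1} = A x_t + B u_t + C w_{t+1}$ with $E w_{t+1} w_{t+1}' = I$ and uses the textbook discounted Riccati recursion with a cross-product term; the function name and the numerical matrices are illustrative assumptions, and this is not the code inside quantecon's `DLE` class.

```python
import numpy as np


def solve_discounted_lq(A, B, C, R, Q, W, beta, tol=1e-12, max_iter=10_000):
    """Iterate on (P_j, rho_j) for V_j(x) = -x'P_j x - rho_j, assuming E[w w'] = I."""
    n = A.shape[0]
    P, rho = np.zeros((n, n)), 0.0
    for _ in range(max_iter):
        S1 = Q + beta * B.T @ P @ B          # k x k
        S2 = beta * B.T @ P @ A + W.T        # k x n
        P_new = R + beta * A.T @ P @ A - S2.T @ np.linalg.solve(S1, S2)
        rho_new = beta * (rho + np.trace(P @ C @ C.T))
        converged = (np.max(np.abs(P_new - P)) < tol
                     and abs(rho_new - rho) < tol)
        P, rho = P_new, rho_new
        if converged:
            break
    # Decision rule u_t = -F x_t implied by the limiting P
    F = np.linalg.solve(Q + beta * B.T @ P @ B, beta * B.T @ P @ A + W.T)
    return P, rho, F


# Illustrative (hypothetical) problem data
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.1], [0.2]])
R = np.eye(2)
Q = np.array([[1.0]])
W = np.zeros((2, 1))
beta = 0.95

P, rho, F = solve_discounted_lq(A, B, C, R, Q, W, beta)
P3, rho3, F3 = solve_discounted_lq(A, B, 3 * C, R, Q, W, beta)
print(np.allclose(F, F3), rho3 / rho)
```

Tripling `C` leaves `P` and `F` unchanged and only rescales `rho` (here by a factor of nine), which illustrates the statement that the decision rule does not depend on the noise statistics.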
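$$
x_t = \begin{bmatrix} h_{t-1} \\ k_{t-1} \\ z_t \end{bmatrix}, \qquad u_t = i_t
$$

where

$$
A = \begin{bmatrix}
\Delta_h & \Theta_h U_c [\Phi_c \ \Phi_g]^{-1} \Gamma & \Theta_h U_c [\Phi_c \ \Phi_g]^{-1} U_d \\
0 & \Delta_k & 0 \\
0 & 0 & A_{22}
\end{bmatrix}
$$

$$
B = \begin{bmatrix} -\Theta_h U_c [\Phi_c \ \Phi_g]^{-1} \Phi_i \\ \Theta_k \\ 0 \end{bmatrix}, \qquad
C = \begin{bmatrix} 0 \\ 0 \\ C_2 \end{bmatrix}
$$

$$
\begin{bmatrix} x_t \\ u_t \end{bmatrix}' S \begin{bmatrix} x_t \\ u_t \end{bmatrix}
= \begin{bmatrix} x_t \\ u_t \end{bmatrix}' \begin{bmatrix} R & W \\ W' & Q \end{bmatrix} \begin{bmatrix} x_t \\ u_t \end{bmatrix}
$$

$$
S = (G' G + H' H) / 2
$$

$$
H = [\Lambda \ \vdots \ \Pi U_c [\Phi_c \ \Phi_g]^{-1} \Gamma \ \vdots \ \Pi U_c [\Phi_c \ \Phi_g]^{-1} U_d - U_b \ \vdots \ -\Pi U_c [\Phi_c \ \Phi_g]^{-1} \Phi_i]
$$

$$
G = U_g [\Phi_c \ \Phi_g]^{-1} [0 \ \vdots \ \Gamma \ \vdots \ U_d \ \vdots \ -\Phi_i] .
$$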
Lagrange multipliers as gradient of value function
A useful fact is that Lagrange multipliers equal gradients of the planner’s value function
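$\mathcal{M}_t^k = M_k x_t$ and $\mathcal{M}_t^h = M_h x_t$ where

$$
\begin{aligned}
M_k &= 2 \beta [0 \ I \ 0] P A^o \\
M_h &= 2 \beta [I \ 0 \ 0] P A^o
\end{aligned}
$$

$\mathcal{M}_t^s = M_s x_t$ where $M_s = (S_b - S_s)$ and $S_b = [0 \ 0 \ U_b]$

$$
\mathcal{M}_t^d = M_d x_t \quad \text{where} \quad
M_d = \begin{bmatrix} \Phi_c' \\ \Phi_g' \end{bmatrix}^{-1}
\begin{bmatrix} \Theta_h' M_h + \Pi' M_s \\ -S_g \end{bmatrix}
$$

$\mathcal{M}_t^c = M_c x_t$ where $M_c = \Theta_h' M_h + \Pi' M_s$

$\mathcal{M}_t^i = M_i x_t$ where $M_i = \Theta_k' M_k$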
We will use this fact and these equations to compute competitive equilibrium prices.
Let’s start with describing the commodity space and pricing functional for our competitive equilibrium.
For the commodity space, we use
$$
L_0^2 = \left[ \{y_t\} : y_t \ \text{is a random variable in } J_t \ \text{and} \ E \sum_{t=0}^{\infty} \beta^t y_t^2 \mid J_0 < +\infty \right]
$$
The representative household owns the endowment process and initial stocks of $h$ and $k$ and chooses stochastic processes for
$\{c_t, s_t, h_t, \ell_t\}_{t=0}^{\infty}$, each element of which is in $L_0^2$, to maximize

$$
-\frac{1}{2} E_0 \sum_{t=0}^{\infty} \beta^t \left[ (s_t - b_t) \cdot (s_t - b_t) + \ell_t^2 \right]
$$

subject to

$$
E \sum_{t=0}^{\infty} \beta^t p_t^0 \cdot c_t \mid J_0
= E \sum_{t=0}^{\infty} \beta^t (w_t^0 \ell_t + \alpha_t^0 \cdot d_t) \mid J_0 + v_0 \cdot k_{-1}
$$
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡 , ℎ−1 , 𝑘−1 given
We now describe the problems faced by two types of firms called type I and type II.
A type I firm rents capital and labor and endowments and produces 𝑐𝑡 , 𝑖𝑡 .
It chooses stochastic processes for $\{c_t, i_t, k_t, \ell_t, g_t, d_t\}$, each element of which is in $L_0^2$, to maximize

$$
E_0 \sum_{t=0}^{\infty} \beta^t (p_t^0 \cdot c_t + q_t^0 \cdot i_t - r_t^0 \cdot k_{t-1} - w_t^0 \ell_t - \alpha_t^0 \cdot d_t)
$$
subject to
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
$-\ell_t^2 + g_t \cdot g_t = 0$
A firm of type II acquires capital via investment and then rents stocks of capital to the 𝑐, 𝑖-producing type I firm.
A type II firm is a price taker facing the vector 𝑣0 and the stochastic processes {𝑟𝑡0 , 𝑞𝑡0 }.
The firm chooses $k_{-1}$ and stochastic processes for $\{k_t, i_t\}_{t=0}^{\infty}$ to maximize

$$
E \sum_{t=0}^{\infty} \beta^t (r_t^0 \cdot k_{t-1} - q_t^0 \cdot i_t) \mid J_0 - v_0 \cdot k_{-1}
$$
subject to
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
$$
\begin{aligned}
w_t^0 &= \mid S_g x_t \mid / \mu_0^w \\
v_0 &= \Gamma' M_0^d / \mu_0^w + \Delta_k' M_0^k / \mu_0^w
\end{aligned}
$$
Verification: With this price system, values can be assigned to the Lagrange multipliers for each of our three classes of
agents that cause all first-order necessary conditions to be satisfied at these prices and at the quantities associated with
the optimum of the planning problem.
17.2 Econometrics
Up to now, we have described how to solve the direct problem that maps model parameters into an (equilibrium)
stochastic process of prices and quantities.
Recall the inverse problem of inferring model parameters from a single realization of a time series of some of the prices
and quantities.
Another name for the inverse problem is econometrics.
An advantage of the [Hansen and Sargent, 2013] structure is that it comes with a self-contained theory of econometrics.
It is really just a tale of two state-space representations.
Here they are:
Original State-Space Representation:
𝑥𝑡+1 = 𝐴𝑜 𝑥𝑡 + 𝐶𝑤𝑡+1
𝑦𝑡 = 𝐺𝑥𝑡 + 𝑣𝑡
where 𝑣𝑡 is a martingale difference sequence of measurement errors that satisfies 𝐸𝑣𝑡 𝑣𝑡′ = 𝑅, 𝐸𝑤𝑡+1 𝑣𝑠′ = 0 for all
𝑡 + 1 ≥ 𝑠 and
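$$
x_0 \sim \mathcal{N}(\hat x_0, \Sigma_0)
$$

Innovations Representation:

$$
\begin{aligned}
\hat x_{t+1} &= A^o \hat x_t + K_t a_t \\
y_t &= G \hat x_t + a_t ,
\end{aligned}
$$

$$
\begin{aligned}
K_t &= A^o \Sigma_t G' (G \Sigma_t G' + R)^{-1} \\
\Sigma_{t+1} &= A^o \Sigma_t A^{o\prime} + C C' - A^o \Sigma_t G' (G \Sigma_t G' + R)^{-1} G \Sigma_t A^{o\prime}
\end{aligned}
$$

$$
\begin{aligned}
a_t &= y_t - G \hat x_t \\
\hat x_{t+1} &= A^o \hat x_t + K_t a_t
\end{aligned}
$$

can be used recursively to construct a record of innovations $\{a_t\}_{t=0}^{T}$ from an $(\hat x_0, \Sigma_0)$ and a record of observations
$\{y_t\}_{t=0}^{T}$.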
Limiting Time-Invariant Innovations Representation
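$$
\begin{aligned}
\Sigma &= A^o \Sigma A^{o\prime} + C C' - A^o \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma A^{o\prime} \\
K &= A^o \Sigma G' (G \Sigma G' + R)^{-1}
\end{aligned}
$$

$$
\begin{aligned}
\hat x_{t+1} &= A^o \hat x_t + K a_t \\
y_t &= G \hat x_t + a_t
\end{aligned}
$$

where $E a_t a_t' \equiv \Omega = G \Sigma G' + R$.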
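One simple way to obtain the limiting $\Sigma$, $K$ and $\Omega$ is to iterate the time-varying recursion until it stops changing. The sketch below does this with plain NumPy; the function name and the state-space matrices are illustrative assumptions rather than part of any lecture model.

```python
import numpy as np


def stationary_kalman(A_o, C, G, R, tol=1e-12, max_iter=10_000):
    """Iterate the Riccati difference equation for Sigma_t to its fixed point."""
    Sigma = np.eye(A_o.shape[0])
    for _ in range(max_iter):
        Omega = G @ Sigma @ G.T + R
        K = A_o @ Sigma @ G.T @ np.linalg.inv(Omega)
        Sigma_new = A_o @ Sigma @ A_o.T + C @ C.T - K @ G @ Sigma @ A_o.T
        done = np.max(np.abs(Sigma_new - Sigma)) < tol
        Sigma = Sigma_new
        if done:
            break
    # Recompute gain and innovation covariance at the limiting Sigma
    Omega = G @ Sigma @ G.T + R
    K = A_o @ Sigma @ G.T @ np.linalg.inv(Omega)
    return Sigma, K, Omega


# Illustrative (hypothetical) state-space matrices
A_o = np.array([[0.9, 0.1], [0.0, 0.5]])
C = np.array([[1.0, 0.0], [0.3, 0.7]])
G = np.array([[1.0, 0.0]])
R = np.array([[0.5]])

Sigma, K, Omega = stationary_kalman(A_o, C, G, R)
spectral_radius = np.max(np.abs(np.linalg.eigvals(A_o - K @ G)))
print(K.ravel(), Omega, spectral_radius)
```

The eigenvalues of $A^o - KG$ lying inside the unit circle is the property that lets the innovations $a_t$ be recovered from current and past $y_t$.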
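$$
\begin{aligned}
f(y_T, y_{T-1}, \dots, y_0) &= f_T(y_T \mid y_{T-1}, \dots, y_0) f_{T-1}(y_{T-1} \mid y_{T-2}, \dots, y_0) \cdots f_1(y_1 \mid y_0) f_0(y_0) \\
&= g_T(a_T) g_{T-1}(a_{T-1}) \dots g_1(a_1) f_0(y_0) .
\end{aligned}
$$

Gaussian Log-Likelihood:

$$
-.5 \sum_{t=0}^{T} \left\{ n_y \ln(2\pi) + \ln |\Omega_t| + a_t' \Omega_t^{-1} a_t \right\}
$$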
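The log-likelihood can be accumulated in the same pass that constructs the innovations. The following sketch assumes the state-space form and Kalman recursions written above; the function name, the scalar example and the three data points are hypothetical.

```python
import numpy as np


def gaussian_loglik(y, A_o, C, G, R, x0_hat, Sigma0):
    """Run the innovations recursion over y (T x n_y) and sum the Gaussian log density."""
    x_hat, Sigma = x0_hat.copy(), Sigma0.copy()
    n_y = G.shape[0]
    loglik, innovations = 0.0, []
    for y_t in y:
        Omega = G @ Sigma @ G.T + R                   # covariance of a_t
        a = y_t - G @ x_hat                           # innovation
        loglik += -0.5 * (n_y * np.log(2 * np.pi)
                          + np.log(np.linalg.det(Omega))
                          + a @ np.linalg.solve(Omega, a))
        K = A_o @ Sigma @ G.T @ np.linalg.inv(Omega)  # Kalman gain
        x_hat = A_o @ x_hat + K @ a
        Sigma = A_o @ Sigma @ A_o.T + C @ C.T - K @ G @ Sigma @ A_o.T
        innovations.append(a)
    return loglik, np.array(innovations)


# Illustrative scalar example (hypothetical numbers)
A_o = np.array([[0.9]])
C = np.array([[1.0]])
G = np.array([[1.0]])
R = np.array([[0.5]])
x0_hat = np.array([0.0])
Sigma0 = np.array([[1.0]])
y = np.array([[1.0], [0.5], [-0.2]])

loglik, innovations = gaussian_loglik(y, A_o, C, G, R, x0_hat, Sigma0)
print(loglik, innovations.ravel())
```

Wrapping `gaussian_loglik` in a function of the model's free parameters and handing it to a numerical optimizer is one way to approach the inverse problem described above.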
Key Insight: The zeros of the polynomial det[𝐺(𝑧𝐼 − 𝐴𝑜 )−1 𝐾 + 𝐼] all lie inside the unit circle, which means that 𝑎𝑡
lies in the space spanned by square summable linear combinations of 𝑦𝑡 .
𝐻(𝑎𝑡 ) = 𝐻(𝑦𝑡 )
𝐿𝑥𝑡 ≡ 𝑥𝑡−1
𝐿−1 𝑥𝑡 ≡ 𝑥𝑡+1
A Wold moving average representation for {𝑦𝑡 } is
Applying the inverse of the operator on the right side and using
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
𝑏𝑡 = 𝑈 𝑏 𝑧 𝑡
Definition: A household service technology (Δℎ , Θℎ , Π, Λ, 𝑈𝑏 ) is said to be canonical if
• Π is nonsingular, and
• the absolute values of the eigenvalues of $(\Delta_h - \Theta_h \Pi^{-1} \Lambda)$ are strictly less than $1/\sqrt{\beta}$.
Key invertibility property: A canonical household service technology maps a service process $\{s_t\}$ in $L_0^2$ into a corresponding
consumption process $\{c_t\}$ for which the implied household capital stock process $\{h_t\}$ is also in $L_0^2$.
An inverse household technology:
The restriction on the eigenvalues of the matrix $(\Delta_h - \Theta_h \Pi^{-1} \Lambda)$ keeps the household capital stock $\{h_t\}$ in $L_0^2$.
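The two conditions in the definition are easy to test numerically. The helper below is a hypothetical illustration (it is not a method of the `DLE` class); it is applied to the habit persistence and consumer durables technologies described earlier, with illustrative parameter values.

```python
import numpy as np


def is_canonical(Delta_h, Theta_h, Pi, Lambda, beta):
    """Check that Pi is nonsingular and the eigenvalue bound 1/sqrt(beta) holds."""
    Delta_h, Theta_h, Pi, Lambda = (
        np.atleast_2d(np.asarray(mat, dtype=float))
        for mat in (Delta_h, Theta_h, Pi, Lambda)
    )
    if Pi.shape[0] != Pi.shape[1] or np.linalg.matrix_rank(Pi) < Pi.shape[0]:
        return False
    inv_map = Delta_h - Theta_h @ np.linalg.solve(Pi, Lambda)
    return bool(np.max(np.abs(np.linalg.eigvals(inv_map))) < 1 / np.sqrt(beta))


beta, delta_h = 0.95, 0.9

# Habit persistence: Lambda = -lam, Pi = 1, Delta_h = delta_h, Theta_h = 1 - delta_h
habit_mild = is_canonical(delta_h, 1 - delta_h, 1.0, -0.5, beta)
habit_strong = is_canonical(delta_h, 1 - delta_h, 1.0, -2.0, beta)

# Consumer durables: Pi = 0 is singular, so the technology is not canonical
durables = is_canonical(delta_h, 1.0, 0.0, 1.0, beta)

print(habit_mild, habit_strong, durables)
```

For habit persistence the relevant eigenvalue is $\delta_h + (1 - \delta_h)\lambda$, so a large enough $\lambda$ violates the bound even though $\Pi$ is nonsingular.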
𝑠𝑖,𝑡 = Λℎ𝑖,𝑡−1
ℎ𝑖,𝑡 = Δℎ ℎ𝑖,𝑡−1
where ℎ𝑖,−1 = ℎ−1 .
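$$
W_0 = E_0 \sum_{t=0}^{\infty} \beta^t (w_t^0 \ell_t + \alpha_t^0 \cdot d_t) + v_0 \cdot k_{-1}
$$

$$
\mu_0^w = \frac{E_0 \sum_{t=0}^{\infty} \beta^t \rho_t^0 \cdot (b_t - s_{i,t}) - W_0}{E_0 \sum_{t=0}^{\infty} \beta^t \rho_t^0 \cdot \rho_t^0}
$$

$$
\begin{aligned}
c_t &= -\Pi^{-1} \Lambda h_{t-1} + \Pi^{-1} b_t - \Pi^{-1} \mu_0^w E_t \{ \Pi'^{-1} - \Pi'^{-1} \Theta_h'
[I - (\Delta_h' - \Lambda' \Pi'^{-1} \Theta_h') \beta L^{-1}]^{-1} \Lambda' \Pi'^{-1} \beta L^{-1} \} p_t^0 \\
h_t &= \Delta_h h_{t-1} + \Theta_h c_t
\end{aligned}
$$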
This system expresses consumption demands at date $t$ as functions of: (i) time-$t$ conditional expectations of future scaled
Arrow-Debreu prices $\{p_{t+s}^0\}_{s=0}^{\infty}$; (ii) the stochastic process for the household’s endowment $\{d_t\}$ and preference shock
$\{b_t\}$, as mediated through the multiplier $\mu_0^w$ and wealth $W_0$; and (iii) past values of consumption, as mediated through
the state variable $h_{t-1}$.
We shall explore how the dynamic demand schedule for consumption goods opens up the possibility of satisfying Gorman’s
(1953) conditions for aggregation in a heterogeneous consumer model.
The first equation of our demand system is an Engel curve for consumption that is linear in the marginal utility $\mu_0^w$ of
individual wealth with a coefficient on $\mu_0^w$ that depends only on prices.
The multiplier $\mu_0^w$ depends on wealth in an affine relationship, so that consumption is linear in wealth.
In a model with multiple consumers who have the same household technologies (Δℎ , Θℎ , Λ, Π) but possibly different
preference shock processes and initial values of household capital stocks, the coefficient on the marginal utility of wealth
is the same for all consumers.
Gorman showed that when Engel curves satisfy this property, there exists a unique community or aggregate preference
ordering over aggregate consumption that is independent of the distribution of wealth.
𝑠𝑖,𝑡 = Λℎ𝑖,𝑡−1
ℎ𝑖,𝑡 = Δℎ ℎ𝑖,𝑡−1 ,
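$$
\mu_t^w = \frac{E_t \sum_{j=0}^{\infty} \beta^j \rho_{t+j}^t \cdot (b_{t+j} - s_{i,t+j}) - W_t}{E_t \sum_{j=0}^{\infty} \beta^j \rho_{t+j}^t \cdot \rho_{t+j}^t}
$$

$$
\begin{aligned}
c_t &= -\Pi^{-1} \Lambda h_{t-1} + \Pi^{-1} b_t - \Pi^{-1} \mu_t^w E_t \{ \Pi'^{-1} - \Pi'^{-1} \Theta_h'
[I - (\Delta_h' - \Lambda' \Pi'^{-1} \Theta_h') \beta L^{-1}]^{-1} \Lambda' \Pi'^{-1} \beta L^{-1} \} p_t^t \\
h_t &= \Delta_h h_{t-1} + \Theta_h c_t
\end{aligned}
$$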
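$$
\begin{aligned}
& [\Pi + \beta^{1/2} L^{-1} \Lambda (I - \beta^{1/2} L^{-1} \Delta_h)^{-1} \Theta_h]' [\Pi + \beta^{1/2} L \Lambda (I - \beta^{1/2} L \Delta_h)^{-1} \Theta_h] \\
& \quad = [\hat\Pi + \beta^{1/2} L^{-1} \hat\Lambda (I - \beta^{1/2} L^{-1} \Delta_h)^{-1} \Theta_h]' [\hat\Pi + \beta^{1/2} L \hat\Lambda (I - \beta^{1/2} L \Delta_h)^{-1} \Theta_h]
\end{aligned}
$$

The factorization identity guarantees that the $[\hat\Lambda, \hat\Pi]$ representation satisfies both requirements for a canonical representation.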
Now we’ll provide quick overviews of examples of economies that fit within our framework
We provide details for a number of these examples in subsequent lectures
1. Growth in Dynamic Linear Economies
2. Lucas Asset Pricing using DLE
3. IRFs in Hall Model
4. Permanent Income Using the DLE class
5. Rosen schooling model
6. Cattle cycles
From material described earlier in this lecture, we know how to reverse engineer preferences that generate this demand
system
• note how the demand equations are cast in terms of the matrices in our standard preference representation
Now let’s turn to supply.
A representative firm takes as given and beyond its control the stochastic process $\{p_t\}_{t=0}^{\infty}$.

$$
E_0 \sum_{t=0}^{\infty} \beta^t \{ p_t \cdot c_t - g_t \cdot g_t / 2 \}
$$
Φ𝑐 𝑐𝑡 + Φ𝑖 𝑖𝑡 + Φ𝑔 𝑔𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡 .
𝑐𝑡 = 𝛾𝑘𝑡−1
𝑘𝑡 = 𝛿𝑘 𝑘𝑡−1 + 𝑖𝑡
𝑔𝑡 = 𝑓1 𝑖𝑡 + 𝑓2 𝑑𝑡
where 𝑑𝑡 is a cost shifter, 𝛾 > 0, and 𝑓1 > 0 is a cost parameter and 𝑓2 = 1. Demand is governed by
𝑝𝑡 = 𝛼0 − 𝛼1 𝑐𝑡 + 𝑢𝑡
where 𝑢𝑡 is a demand shifter with mean zero and 𝛼0 , 𝛼1 are positive parameters.
Assume that 𝑢𝑡 , 𝑑𝑡 are uncorrelated first-order autoregressive processes.
𝑅𝑡 = 𝑏𝑡 + 𝛼ℎ𝑡
$$
p_t = E_t \sum_{\tau=0}^{\infty} (\beta \delta_h)^\tau R_{t+\tau}
$$

where $h_t$ is the stock of housing at time $t$, $R_t$ is the rental rate for housing, $p_t$ is the price of new houses, and $b_t$ is a
demand shifter; $\alpha < 0$ is a demand parameter, and $\delta_h$ is a depreciation factor for houses.
We cast this demand specification within our class of models by letting the stock of houses ℎ𝑡 evolve according to
ℎ𝑡 = 𝛿ℎ ℎ𝑡−1 + 𝑐𝑡 , 𝛿ℎ ∈ (0, 1)
where $c_t$ is the rate of production of new houses.
Houses produce services $s_t$ according to $s_t = \bar\lambda h_t$ or $s_t = \lambda h_{t-1} + \pi c_t$, where $\lambda = \bar\lambda \delta_h$, $\pi = \bar\lambda$.
We can take $\bar\lambda \rho_t^0 = R_t$ as the rental rate on housing at time $t$, measured in units of time $t$ consumption (housing).
Demand for housing services is
$$
s_t = b_t - \mu_0 \rho_t^0
$$
where the price of new houses $p_t$ is related to $\rho_t^0$ by $\rho_t^0 = \pi^{-1} [p_t - \beta \delta_h E_t p_{t+1}]$.
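The link between the present-value formula for the price of new houses and the one-period relation $p_t - \beta \delta_h E_t p_{t+1} = R_t$ can be checked numerically. The following is a minimal sketch that assumes, purely for illustration, that the rental rate follows a first-order autoregression with persistence `φ`, so that $E_t R_{t+\tau} = \phi^\tau R_t$; the parameter values are hypothetical and are not taken from the lecture.

```python
import numpy as np

# Illustrative (hypothetical) parameter values, not taken from the lecture
β, δ_h, φ = 0.95, 0.9, 0.8

def house_price(R, horizon=2_000):
    """Truncated present value Σ_τ (β δ_h)^τ E_t R_{t+τ}, using E_t R_{t+τ} = φ^τ R."""
    τ = np.arange(horizon)
    return np.sum((β * δ_h) ** τ * φ ** τ * R)

R_t = 2.0
p_t = house_price(R_t)
p_next_expected = house_price(φ * R_t)       # E_t p_{t+1} under the AR(1) assumption

closed_form = R_t / (1 - β * δ_h * φ)        # geometric sum in closed form
implied_rent = p_t - β * δ_h * p_next_expected
print(p_t, closed_form, implied_rent)
```

Under these assumptions the truncated sum should agree with the closed form, and `implied_rent` should recover `R_t`.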
Rosen, Murphy, and Scheinkman (1994). Let $p_t$ be the price of freshly slaughtered beef, $m_t$ the feeding cost of preparing an animal for slaughter, $\tilde h_t$ the one-period holding cost for a mature animal, $\gamma_1 \tilde h_t$ the one-period holding cost for a yearling, and $\gamma_0 \tilde h_t$ the one-period holding cost for a calf.
The cost processes $\{\tilde h_t, m_t\}_{t=0}^\infty$ are exogenous, while the stochastic process $\{p_t\}_{t=0}^\infty$ is determined by a rational expectations equilibrium. Let $\tilde x_t$ be the breeding stock, and $\tilde y_t$ be the total stock of animals.
The law of motion for cattle stocks is
$$
\tilde x_t = (1 - \delta) \tilde x_{t-1} + g \tilde x_{t-3} - c_t
$$
where $c_t$ is a rate of slaughtering. The total head-count of cattle
$$
\tilde y_t = \tilde x_t + g \tilde x_{t-1} + g \tilde x_{t-2}
$$
is the sum of adults, calves, and yearlings, respectively.
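To make the timing of the two cattle equations concrete, here is a small sketch that iterates them for a given slaughter path. The function name, the default values of `δ` and `g`, and the constant slaughter path are all illustrative assumptions rather than part of the lecture.

```python
import numpy as np

def simulate_cattle(c, x_init, δ=0.1, g=0.85):
    """
    Iterate x_t = (1 - δ) x_{t-1} + g x_{t-3} - c_t and
    y_t = x_t + g x_{t-1} + g x_{t-2}.

    c      : slaughter rates c_0, ..., c_{T-1}
    x_init : (x_{-3}, x_{-2}, x_{-1}), the initial breeding stocks
    """
    x = list(x_init)                  # x[-1] is always the most recent stock
    y = []
    for c_t in c:
        x_t = (1 - δ) * x[-1] + g * x[-3] - c_t
        y.append(x_t + g * x[-1] + g * x[-2])   # adults + calves + yearlings
        x.append(x_t)
    return np.array(x[3:]), np.array(y)

# Hypothetical experiment: constant slaughter of 10 head per period
x_path, y_path = simulate_cattle(np.full(5, 10.0), (100.0, 100.0, 100.0))
print(x_path, y_path)
```

The three-period lag on $g \tilde x_{t-3}$ is what gives the model its "time-to-grow" structure: calves born today only join the breeding stock three periods later.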
A representative farmer chooses {𝑐𝑡 , 𝑥𝑡̃ } to maximize
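$$
E_0 \sum_{t=0}^\infty \beta^t \{ p_t c_t - \tilde h_t \tilde x_t - (\gamma_0 \tilde h_t)(g \tilde x_{t-1}) - (\gamma_1 \tilde h_t)(g \tilde x_{t-2}) - m_t c_t - \Psi(\tilde x_t, \tilde x_{t-1}, \tilde x_{t-2}, c_t) \}
$$
where
$$
\Psi = \frac{\psi_1}{2} \tilde x_t^2 + \frac{\psi_2}{2} \tilde x_{t-1}^2 + \frac{\psi_3}{2} \tilde x_{t-2}^2 + \frac{\psi_4}{2} c_t^2
$$
Demand is governed by
$$
c_t = \alpha_0 - \alpha_1 p_t + \tilde d_t
$$
where $\alpha_0 > 0$, $\alpha_1 > 0$, and $\{\tilde d_t\}_{t=0}^\infty$ is a stochastic process with mean zero representing a demand shifter.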
We’ll describe the following pair of schooling models that view education as a time-to-build process:
• Rosen schooling model for engineers
• Two-occupation model
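Ryoo and Rosen's (2004) [Ryoo and Rosen, 2004] model consists of the following equations:
first, a demand curve for engineers
$$
w_t = -\alpha_d N_t + \epsilon_{1t};
$$
second, a time-to-build structure of the education process
$$
N_{t+k} = \delta_N N_{t+k-1} + n_t;
$$
third, a definition of the discounted present value of each new engineering student
$$
v_t = \beta^k E_t \sum_{j=0}^\infty (\beta \delta_N)^j w_{t+k+j};
$$
and fourth, a supply curve for new students
$$
n_t = \alpha_s v_t + \epsilon_{2t}, \quad \alpha_s > 0
$$
Here $\{\epsilon_{1t}, \epsilon_{2t}\}$ are stochastic processes of labor demand and supply shocks.
Definition: A partial equilibrium is a stochastic process $\{w_t, N_t, v_t, n_t\}_{t=0}^\infty$ satisfying these four equations, and initial conditions $N_{-1}, n_{-s}, s = 1, \ldots, k$.
We sweep the time-to-build structure and the demand for engineers into the household technology and put the supply of new engineers into the technology for producing goods.
$$
s_t = \begin{bmatrix} \lambda_1 & 0 & \ldots & 0 \end{bmatrix}
\begin{bmatrix} h_{1t-1} \\ h_{2t-1} \\ \vdots \\ h_{k+1,t-1} \end{bmatrix} + 0 \cdot c_t
$$
$$
\begin{bmatrix} h_{1t} \\ h_{2t} \\ \vdots \\ h_{k,t} \\ h_{k+1,t} \end{bmatrix} =
\begin{bmatrix}
\delta_N & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & \cdots & 0 & 1 \\
0 & 0 & 0 & \cdots & 0
\end{bmatrix}
\begin{bmatrix} h_{1t-1} \\ h_{2t-1} \\ \vdots \\ h_{k,t-1} \\ h_{k+1,t-1} \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} c_t
$$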
This specification sets Rosen’s 𝑁𝑡 = ℎ1𝑡−1 , 𝑛𝑡 = 𝑐𝑡 , ℎ𝜏+1,𝑡−1 = 𝑛𝑡−𝜏 , 𝜏 = 1, … , 𝑘, and uses the home-produced service
to capture the demand for labor. Here 𝜆1 embodies Rosen’s demand parameter 𝛼𝑑 .
• The supply of new workers becomes our consumption.
• The dynamic demand curve becomes Rosen’s dynamic supply curve for new workers.
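The time-to-build structure embedded in the household technology can be illustrated by feeding a single unit of new entrants into the law of motion $h_t = \Delta_h h_{t-1} + \Theta_h c_t$ and watching when it reaches the first component of $h_t$. This is only a sketch: the helper name `time_to_build_matrices` and the values of `k` and `δ_N` are assumptions made for the example.

```python
import numpy as np

def time_to_build_matrices(k, δ_N):
    """Build Δ_h ((k+1) x (k+1)) and Θ_h ((k+1) x 1) with the shift structure in the text."""
    Δ_h = np.zeros((k + 1, k + 1))
    Δ_h[0, 0] = δ_N                               # survival rate of the existing stock
    Δ_h[np.arange(k), np.arange(1, k + 1)] = 1    # ones on the superdiagonal
    Θ_h = np.zeros((k + 1, 1))
    Θ_h[-1, 0] = 1                                # new entrants join the last cohort
    return Δ_h, Θ_h

k, δ_N = 4, 0.95                                  # illustrative values
Δ_h, Θ_h = time_to_build_matrices(k, δ_N)

# Feed in one unit of entrants at t = 0 and track the first component h_{1t}
h = np.zeros((k + 1, 1))
h1_path = []
for t in range(k + 3):
    c_t = 1.0 if t == 0 else 0.0
    h = Δ_h @ h + Θ_h * c_t
    h1_path.append(h[0, 0])
print(h1_path)
```

In this sketch the pulse of entrants takes `k` periods to move up the cohort ladder before it shows up in the first component, after which it decays at rate `δ_N`.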
Remark: This has an Imai-Keane flavor.
For more details and Python code see Rosen schooling model.
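$$
\begin{bmatrix} w_{ut} \\ w_{st} \end{bmatrix} = \alpha_d \begin{bmatrix} N_{ut} \\ N_{st} \end{bmatrix} + \epsilon_{1t}
$$
where $\alpha_d$ is a $(2 \times 2)$ matrix of demand parameters and $\epsilon_{1t}$ is a vector of demand shifters; second, time-to-train specifications for skilled and unskilled labor, respectively: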
where 𝑁𝑠𝑡 , 𝑁𝑢𝑡 are stocks of the two types of labor, and 𝑛𝑠𝑡 , 𝑛𝑢𝑡 are entry rates into the two occupations.
third, definitions of discounted present values of new entrants to the skilled and unskilled occupations, respectively:
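$$
v_{st} = E_t \beta^k \sum_{j=0}^\infty (\beta \delta_N)^j w_{st+k+j}
$$
$$
v_{ut} = E_t \sum_{j=0}^\infty (\beta \delta_N)^j w_{ut+j}
$$
where $w_{ut}, w_{st}$ are wage rates for the two occupations; and fourth, supply curves for new entrants:
$$
\begin{bmatrix} n_{st} \\ n_{ut} \end{bmatrix} = \alpha_s \begin{bmatrix} v_{ut} \\ v_{st} \end{bmatrix} + \epsilon_{2t}
$$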
Short Cut
As an alternative, Siow simply used the equalizing differences condition
𝑣𝑢𝑡 = 𝑣𝑠𝑡
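$$
k_{t-1} = \beta \sum_{j=0}^\infty \beta^j E (\phi_c \cdot c_{t+j} - e_{t+j}) \mid J_t \quad \Rightarrow
$$
$$
\sum_{j=0}^\infty \beta^j (\phi_c)' \chi_j = \sum_{j=0}^\infty \beta^j \epsilon_j
$$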
and
For more details see Permanent Income Using the DLE class.
Testing Permanent Income Models:
We have two types of implications of permanent income models:
• Equality of present values of moving average coefficients.
• Martingale ℳ𝑘𝑡 .
These have been tested in work by Hansen, Sargent, and Roberts (1991) [Sargent et al., 1991] and by Attanasio and
Pavoni (2011) [Attanasio and Pavoni, 2011].
We now assume that there is a finite number of households, each with its own household technology and preferences over
consumption services.
Household 𝑗 orders preferences over consumption processes according to
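$$
-\left(\frac{1}{2}\right) E \sum_{t=0}^\infty \beta^t [(s_{jt} - b_{jt}) \cdot (s_{jt} - b_{jt}) + \ell_{jt}^2] \mid J_0
$$
$$
b_{jt} = U_{bj} z_t
$$
$$
E \sum_{t=0}^\infty \beta^t p_t^0 \cdot c_{jt} \mid J_0 = E \sum_{t=0}^\infty \beta^t (w_t^0 \ell_{jt} + \alpha_t^0 \cdot d_{jt}) \mid J_0 + v_0 \cdot k_{j,-1},
$$
where $k_{j,-1}$ is given. The $j$th consumer owns an endowment process $d_{jt}$, governed by the stochastic process $d_{jt} = U_{dj} z_t$.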
We refer to this as a setting with Gorman heterogeneous households.
This specification confines heterogeneity among consumers to:
• differences in the preference processes {𝑏𝑗𝑡 }, represented by different selections of 𝑈𝑏𝑗
• differences in the endowment processes {𝑑𝑗𝑡 }, represented by different selections of 𝑈𝑑𝑗
• differences in ℎ𝑗,−1 and
• differences in 𝑘𝑗,−1
The matrices Λ, Π, Δℎ , Θℎ do not depend on 𝑗.
This makes everybody's demand system have the form described earlier, with different $\mu_{0j}^w$'s (reflecting different wealth levels) and different $b_{jt}$ preference shock processes and initial conditions for household capital stocks.
Punchline: there exists a representative consumer.
We can use the representative consumer to compute a competitive equilibrium aggregate allocation and price system.
With the equilibrium aggregate allocation and price system in hand, we can then compute allocations to each household.
Computing Allocations to Individuals:
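Set
$$
\ell_{jt} = (\mu_{0j}^w / \mu_{0a}^w) \ell_{at}
$$
$$
\mu_{0j}^w E_0 \sum_{t=0}^\infty \beta^t \{ \rho_t^0 \cdot \rho_t^0 + (w_t^0 / \mu_{0a}^w) \ell_{at} \} = E_0 \sum_{t=0}^\infty \beta^t \{ \rho_t^0 \cdot (b_{jt} - s_{jt}^i) - \alpha_t^0 \cdot d_{jt} \} - v_0 k_{j,-1}
$$
$$
s_{jt} - b_{jt} = \mu_{0j}^w \rho_t^0
$$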
We now describe a less tractable type of heterogeneity across households that we dub Non-Gorman heterogeneity.
Here is the specification:
Preferences and Household Technologies:
$$
-\frac{1}{2} E \sum_{t=0}^\infty \beta^t [(s_{it} - b_{it}) \cdot (s_{it} - b_{it}) + \ell_{it}^2] \mid J_0
$$
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
𝑑𝑖𝑡 = 𝑈𝑑𝑖 𝑧𝑡 , 𝑖 = 1, 2
Pareto Problem:
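$$
-\frac{1}{2} \lambda E_0 \sum_{t=0}^\infty \beta^t [(s_{1t} - b_{1t}) \cdot (s_{1t} - b_{1t}) + \ell_{1t}^2]
- \frac{1}{2} (1 - \lambda) E_0 \sum_{t=0}^\infty \beta^t [(s_{2t} - b_{2t}) \cdot (s_{2t} - b_{2t}) + \ell_{2t}^2]
$$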
$$
p_t = \mu_0^{-1} \Pi' b_t - \mu_0^{-1} \Pi' \Pi c_t
$$
Integrating the marginal utility vector shows that preferences can be taken to be
$$
\mu_0^{-1} \Pi' \Pi = (\mu_{01} \Pi_1^{-1} \Pi_1^{-1\prime} + \mu_{02} \Pi_2^{-1} \Pi_2^{-1\prime})^{-1}
$$
Dynamic Analogue:
We now describe how to extend mongrel aggregation to a dynamic setting.
The key comparison is
• Static: factor a covariance matrix-like object
• Dynamic: factor a spectral-density matrix-like object
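The static case can be sketched in a few lines: build the "covariance matrix-like object" from two households' $\Pi_i$ matrices and factor it to obtain a mongrel $\Pi$. The matrices and multipliers below are hypothetical, and using a Cholesky factor with the normalization $\mu_0 = 1$ is just one convenient choice of factorization, not necessarily the one used in [Hansen and Sargent, 2013].

```python
import numpy as np

# Hypothetical two-household example with two consumption goods
Π_1 = np.array([[1.0, 0.0],
                [0.5, 1.0]])
Π_2 = np.array([[2.0, 0.3],
                [0.0, 1.5]])
μ_01, μ_02 = 1.0, 2.0

def inv_gram(Π):
    """Return Π^{-1} Π^{-1}', which equals (Π'Π)^{-1}."""
    Π_inv = np.linalg.inv(Π)
    return Π_inv @ Π_inv.T

# Right-hand side of the aggregation formula: μ_0^{-1} Π'Π
M = np.linalg.inv(μ_01 * inv_gram(Π_1) + μ_02 * inv_gram(Π_2))

# Normalize μ_0 = 1 and factor M = Π'Π with a Cholesky decomposition
L = np.linalg.cholesky(M)       # M = L L', with L lower triangular
Π = L.T                         # hence Π'Π = L L' = M
print(Π)
```

The dynamic analogue replaces this matrix factorization with a factorization of a spectral-density-like object, which is where the factorization identity comes in.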
Programming Problem for Dynamic Mongrel Aggregation:
Our strategy for deducing the mongrel preference ordering over 𝑐𝑡 = 𝑐1𝑡 + 𝑐2𝑡 is to solve the programming problem:
choose {𝑐1𝑡 , 𝑐2𝑡 } to maximize the criterion
$$
\sum_{t=0}^\infty \beta^t [\lambda (s_{1t} - b_{1t}) \cdot (s_{1t} - b_{1t}) + (1 - \lambda)(s_{2t} - b_{2t}) \cdot (s_{2t} - b_{2t})]
$$
subject to
subject to (ℎ1,−1 , ℎ2,−1 ) given and {𝑏1𝑡 }, {𝑏2𝑡 }, {𝑐𝑡 } being known and fixed sequences.
Substituting the {𝑐1𝑡 , 𝑐2𝑡 } sequences that solve this problem as functions of {𝑏1𝑡 , 𝑏2𝑡 , 𝑐𝑡 } into the objective determines
a mongrel preference ordering over {𝑐𝑡 } = {𝑐1𝑡 + 𝑐2𝑡 }.
In solving this problem, it is convenient to proceed by using Fourier transforms. For details, please see [Hansen and
Sargent, 2013] where they deploy a
Secret Weapon: Another application of the spectral factorization identity.
Concluding remark: The [Hansen and Sargent, 2013] class of models described in this lecture are all complete markets
models. We have exploited the fact that complete market models are all alike to allow us to define a class that gives the
same name to different things in the spirit of Henri Poincare.
Could we create such a class for incomplete markets models?
That would be nice, but before trying it would be wise to contemplate the remainder of a statement by Robert E. Lucas,
Jr., with which we began this lecture.
“Complete market economies are all alike but each incomplete market economy is incomplete in its own
individual way.” Robert E. Lucas, Jr., (1989)
EIGHTEEN
This is another member of a suite of lectures that use the quantecon DLE class to instantiate models within the [Hansen
and Sargent, 2013] class of models described in detail in Recursive Models of Dynamic Linear Economies.
In addition to what’s included in Anaconda, this lecture uses the quantecon library.
This lecture describes several complete market economies having a common linear-quadratic-Gaussian structure.
Three examples of such economies show how the DLE class can be used to compute equilibria of such economies in
Python and to illustrate how different versions of these economies can or cannot generate sustained growth.
We require the following imports
import numpy as np
import matplotlib.pyplot as plt
from quantecon import DLE
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
𝑑𝑡 = 𝑈𝑑 𝑧𝑡
• Consumption and physical investment goods are produced using the following technology
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
$g_t \cdot g_t = l_t^2$
where 𝑐𝑡 is a vector of consumption goods, 𝑔𝑡 is a vector of intermediate goods, 𝑖𝑡 is a vector of investment goods,
𝑘𝑡 is a vector of physical capital goods, and 𝑙𝑡 is the amount of labor supplied by the representative household.
Advanced Quantitative Economics with Python
$$
-\frac{1}{2} \mathbb{E} \sum_{t=0}^\infty \beta^t [(s_t - b_t) \cdot (s_t - b_t) + l_t^2], \quad 0 < \beta < 1
$$
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
{𝐴22 , 𝐶2 , 𝑈𝑏 , 𝑈𝑑 , Φ𝑐 , Φ𝑔 , Φ𝑖 , Γ, Δ𝑘 , Θ𝑘 , Λ, Π, Δℎ , Θℎ }
The first welfare theorem asserts that a competitive equilibrium allocation solves the following planning problem.
Choose $\{c_t, s_t, i_t, h_t, k_t, g_t\}_{t=0}^\infty$ to maximize
$$
-\frac{1}{2} \mathbb{E} \sum_{t=0}^\infty \beta^t [(s_t - b_t) \cdot (s_t - b_t) + g_t \cdot g_t]
$$
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
and
𝑏𝑡 = 𝑈 𝑏 𝑧 𝑡
𝑑𝑡 = 𝑈𝑑 𝑧𝑡
The DLE class in Python maps this planning problem into a linear-quadratic dynamic programming problem and then
solves it by using QuantEcon’s LQ class.
(See Section 5.5 of Hansen & Sargent (2013) [Hansen and Sargent, 2013] for a full description of how to map these
economies into an LQ setting, and how to use the solution to the LQ problem to construct the output matrices in order to
simulate the economies)
The state for the LQ problem is
$$
x_t = \begin{bmatrix} h_{t-1} \\ k_{t-1} \\ z_t \end{bmatrix}
$$
$$
x_{t+1} = A^o x_t + C w_{t+1}
$$
Each of the example economies shown here will share a number of components. In particular, for each we will consider
preferences of the form
$$
-\frac{1}{2} \mathbb{E} \sum_{t=0}^\infty \beta^t [(s_t - b_t)^2 + l_t^2], \quad 0 < \beta < 1
$$
𝑠𝑡 = 𝜆ℎ𝑡−1 + 𝜋𝑐𝑡
ℎ𝑡 = 𝛿ℎ ℎ𝑡−1 + 𝜃ℎ 𝑐𝑡
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
Technology of the form
𝑐𝑡 + 𝑖𝑡 = 𝛾1 𝑘𝑡−1 + 𝑑1𝑡
𝑘𝑡 = 𝛿𝑘 𝑘𝑡−1 + 𝑖𝑡
𝑔𝑡 = 𝜙1 𝑖𝑡 , 𝜙1 > 0
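$$
\begin{bmatrix} d_{1t} \\ 0 \end{bmatrix} = U_d z_t
$$
And information of the form
$$
z_{t+1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.8 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} z_t + \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} w_{t+1}
$$
$$
U_b = \begin{bmatrix} 30 & 0 & 0 \end{bmatrix}
$$
$$
U_d = \begin{bmatrix} 5 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$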
We shall vary {𝜆, 𝜋, 𝛿ℎ , 𝜃ℎ , 𝛾1 , 𝛿𝑘 , 𝜙1 } and the initial state 𝑥0 across the three economies.
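Before turning to the individual economies, it may help to see what the shared information structure implies for the exogenous processes. The sketch below simulates $z_t$ with plain NumPy and recovers $b_t = U_b z_t$ and $d_t = U_d z_t$; the seed, the sample length, and the variable names are arbitrary choices made for this illustration, and the code does not use the DLE class.

```python
import numpy as np

A22 = np.array([[1, 0, 0],
                [0, 0.8, 0],
                [0, 0, 0.5]])
C2 = np.array([[0, 0],
               [1, 0],
               [0, 1]])
U_b = np.array([[30, 0, 0]])
U_d = np.array([[5, 1, 0],
                [0, 0, 0]])

rng = np.random.default_rng(0)       # arbitrary seed
T = 200
z = np.empty((3, T))
z[:, 0] = [1, 0, 0]                  # constant term, with both AR(1) components at zero
for t in range(T - 1):
    w = rng.standard_normal(2)       # w_{t+1}
    z[:, t + 1] = A22 @ z[:, t] + C2 @ w

b = U_b @ z      # bliss point: constant at 30
d = U_d @ z      # first row: endowment d_{1t} = 5 + AR(1) with coefficient 0.8
print(b[0, :3], d[0, :3])
```

With these matrices the bliss point is a constant and the endowment fluctuates around 5; the second shock (with persistence 0.5) does not enter either process.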
First, we set parameters such that consumption follows a random walk. In particular, we set
$$
\lambda = 0, \quad \pi = 1, \quad \gamma_1 = 0.1, \quad \phi_1 = 0.00001, \quad \delta_k = 0.95, \quad \beta = \frac{1}{1.05}
$$
(In this economy $\delta_h$ and $\theta_h$ are arbitrary as household capital does not enter the equation for consumption services. We set them to values that will become useful in Example 3.)
It is worth noting that this choice of parameter values ensures that 𝛽(𝛾1 + 𝛿𝑘 ) = 1.
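The arithmetic behind this condition is easy to verify. The helper below is a hypothetical convenience function, not part of quantecon; the second call uses $\beta = 0.94$, a value that appears in a later experiment in this lecture, to show a case where the product falls below one.

```python
def growth_condition(β, γ_1, δ_k):
    """Return β (γ_1 + δ_k), which equals 1 in the random-walk case discussed in the text."""
    return β * (γ_1 + δ_k)

hall = growth_condition(1 / 1.05, 0.10, 0.95)     # parameter values of this example
low_beta = growth_condition(0.94, 0.10, 0.95)     # same technology, lower discount factor
print(hall, low_beta)
```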
For simulations of this economy, we choose an initial condition of
$$
x_0 = \begin{bmatrix} 5 & 150 & 1 & 0 & 0 \end{bmatrix}'
$$
# Parameter Matrices
γ_1 = 0.1
ϕ_1 = 1e-5
# Initial condition
x0 = np.array([[5], [150], [1], [0], [0]])
These parameter values are used to define an economy of the DLE class.
We can then simulate the economy for a chosen length of time, from our initial state vector 𝑥0
econ1.compute_sequence(x0, ts_length=300)
The economy stores the simulated values for each variable. Below we plot consumption and investment
Inspection of the plot shows that the sample paths of consumption and investment drift in ways that suggest that each has
or nearly has a random walk or unit root component.
This is confirmed by checking the eigenvalues of 𝐴𝑜
econ1.endo, econ1.exo
The endogenous eigenvalue that appears to be unity reflects the random walk character of consumption in Hall’s model.
• Actually, the largest endogenous eigenvalue is very slightly below 1.
• This outcome comes from the small adjustment cost 𝜙1 .
econ1.endo[1]
0.9999999999904767
The fact that the largest endogenous eigenvalue is strictly less than unity in modulus means that it is possible to compute
the non-stochastic steady state of consumption, investment and capital.
econ1.compute_steadystate()
np.set_printoptions(precision=3, suppress=True)
print(econ1.css, econ1.iss, econ1.kss)
However, the near-unity endogenous eigenvalue means that these steady state values are of little relevance.
We generate our next economy by making two alterations to the parameters of Example 1.
• First, we raise 𝜙1 from 0.00001 to 1.
– This will lower the endogenous eigenvalue that is close to 1, causing the economy to head more quickly to
the vicinity of its non-stochastic steady-state.
• Second, we raise 𝛾1 from 0.1 to 0.15.
– This has the effect of raising the optimal steady-state value of capital.
We also start the economy off from an initial condition with a lower capital stock
$$
x_0 = \begin{bmatrix} 5 & 20 & 1 & 0 & 0 \end{bmatrix}'
$$
γ2 = 0.15
γ22 = np.array([[γ2], [0]])
ϕ_12 = 1
ϕ_i2 = np.array([[1], [-ϕ_12]])
Creating the DLE class and then simulating gives the following plot for consumption and investment
econ2.compute_sequence(x02, ts_length=300)
plt.plot(econ2.c[0], label='Cons.')
plt.plot(econ2.i[0], label='Inv.')
plt.legend()
plt.show()
Simulating our new economy shows that consumption grows quickly in the early stages of the sample.
However, it then settles down around the new non-stochastic steady-state level of consumption of 17.5, which we find as
follows
econ2.compute_steadystate()
print(econ2.css, econ2.iss, econ2.kss)
The economy converges faster to this level than in Example 1 because the largest endogenous eigenvalue of 𝐴𝑜 is now
significantly lower than 1.
econ2.endo, econ2.exo
For our third economy, we choose parameter values with the aim of generating sustained growth in consumption, invest-
ment and capital.
To do this, we set parameters so that Jones and Manuelli’s “growth condition” is just satisfied.
In our notation, just satisfying the growth condition is actually equivalent to setting 𝛽(𝛾1 + 𝛿𝑘 ) = 1, the condition that
was necessary for consumption to be a random walk in Hall’s model.
Thus, we lower 𝛾1 back to 0.1.
In our model, this is a necessary but not sufficient condition for growth.
To generate growth we set preference parameters to reflect habit persistence.
In particular, we set 𝜆 = −1, 𝛿ℎ = 0.9 and 𝜃ℎ = 1 − 𝛿ℎ = 0.1.
This makes preferences assume the form
$$
-\frac{1}{2} \mathbb{E} \sum_{t=0}^\infty \beta^t \Big[ \Big( c_t - b_t - (1 - \delta_h) \sum_{j=0}^\infty \delta_h^j c_{t-j-1} \Big)^2 + l_t^2 \Big]
$$
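The step from the recursive household technology to this distributed-lag form can be checked numerically. The sketch below compares services $s_t$ computed from $s_t = \lambda h_{t-1} + \pi c_t$, $h_t = \delta_h h_{t-1} + \theta_h c_t$ with the expression $c_t - (1 - \delta_h) \sum_j \delta_h^j c_{t-j-1}$. It assumes $h_{-1} = 0$ (equivalently, zero consumption before date 0) and an arbitrary consumption path, both of which are choices made only for this example.

```python
import numpy as np

δ_h, θ_h, λ, π = 0.9, 0.1, -1.0, 1.0      # values stated in the text
rng = np.random.default_rng(1)            # arbitrary seed
c = rng.uniform(1, 2, size=50)            # an arbitrary consumption path

# Recursive form, starting from h_{-1} = 0
h_prev = 0.0
s_recursive = np.empty_like(c)
for t, c_t in enumerate(c):
    s_recursive[t] = λ * h_prev + π * c_t
    h_prev = δ_h * h_prev + θ_h * c_t

# Distributed-lag form: s_t = c_t - (1 - δ_h) Σ_j δ_h^j c_{t-j-1}
s_lag = np.array([
    c[t] - (1 - δ_h) * sum(δ_h**j * c[t - j - 1] for j in range(t))
    for t in range(len(c))
])
print(np.max(np.abs(s_recursive - s_lag)))
```

The habit stock is a geometrically weighted average of past consumption, so services are high only when current consumption exceeds that average.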
l_λ2 = np.array([[-1]])
pref2 = (β, l_λ2, π_h, δ_h, θ_h)
econ3.compute_sequence(x0, ts_length=300)
Thus, adding habit persistence to the Hall model of Example 1 is enough to generate sustained growth in our economy.
The eigenvalues of 𝐴𝑜 in this new economy are
econ3.endo, econ3.exo
We now have two unit endogenous eigenvalues. One stems from satisfying the growth condition (as in Example 1).
The other unit eigenvalue results from setting 𝜆 = −1.
To show the importance of both of these for generating growth, we consider the following experiments.
l_λ3 = np.array([[-0.7]])
pref3 = (β, l_λ3, π_h, δ_h, θ_h)
econ4.compute_sequence(x0, ts_length=300)
plt.plot(econ4.c[0], label='Cons.')
plt.plot(econ4.i[0], label='Inv.')
plt.legend()
plt.show()
econ4.endo, econ4.exo
β_2 = np.array([[0.94]])
pref4 = (β_2, l_λ, π_h, δ_h, θ_h)
econ5.compute_sequence(x0, ts_length=300)
plt.plot(econ5.c[0], label='Cons.')
plt.plot(econ5.i[0], label='Inv.')
plt.legend()
plt.show()
econ5.endo, econ5.exo
NINETEEN
This is one of a suite of lectures that use the quantecon DLE class to instantiate models within the [Hansen and Sargent,
2013] class of models described in detail in Recursive Models of Dynamic Linear Economies.
In addition to what’s in Anaconda, this lecture uses the quantecon library
This lecture uses the DLE class to price payout streams that are linear functions of the economy’s state vector, as well as
risk-free assets that pay out one unit of the first consumption good with certainty.
We assume basic knowledge of the class of economic environments that fall within the domain of the DLE class.
Many details about the basic environment are contained in the lecture Growth in Dynamic Linear Economies.
We’ll also need the following imports
import numpy as np
import matplotlib.pyplot as plt
from quantecon import DLE
We use a linear-quadratic version of an economy that Lucas (1978) [Lucas, 1978] used to develop an equilibrium theory
of asset prices:
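Preferences
$$
-\frac{1}{2} \mathbb{E} \sum_{t=0}^\infty \beta^t [(c_t - b_t)^2 + l_t^2] \mid J_0
$$
$$
s_t = c_t
$$
$$
b_t = U_b z_t
$$
Technology
$$
c_t = d_{1t}
$$
$$
k_t = \delta_k k_{t-1} + i_t
$$
$$
g_t = \phi_1 i_t, \quad \phi_1 > 0
$$
$$
\begin{bmatrix} d_{1t} \\ 0 \end{bmatrix} = U_d z_t
$$
Information
$$
z_{t+1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.8 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} z_t + \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} w_{t+1}
$$
$$
U_b = \begin{bmatrix} 30 & 0 & 0 \end{bmatrix}
$$
$$
U_d = \begin{bmatrix} 5 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$
$$
x_0 = \begin{bmatrix} 5 & 150 & 1 & 0 & 0 \end{bmatrix}'
$$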
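[Hansen and Sargent, 2013] show that the time $t$ value of a permanent claim to a stream $y_s = U_a x_s$, $s \geq t$, is:
$$
a_t = (x_t' \mu_a x_t + \sigma_a) / (\bar e_1 M_c x_t)
$$
with
$$
\mu_a = \sum_{\tau=0}^\infty \beta^\tau (A^{o\prime})^\tau Z_a A^{o\tau}
$$
$$
\sigma_a = \frac{\beta}{1 - \beta} \, \text{trace} \Big( Z_a \sum_{\tau=0}^\infty \beta^\tau (A^o)^\tau C C' (A^{o\prime})^\tau \Big)
$$
where
$$
Z_a = U_a' M_c
$$
The use of $\bar e_1$ indicates that the first consumption good is the numeraire.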
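The two infinite sums in the pricing formula can be approximated by straightforward truncation. The following is a sketch under stated assumptions, not the routine used inside the DLE class: the function name is made up, and the matrices `A`, `C`, `U_a` and `M_c` below are small hypothetical stand-ins for $A^o$, $C$, $U_a$ and $M_c$.

```python
import numpy as np

def asset_pricing_terms(A, C, Z, β, tol=1e-12, max_iter=100_000):
    """
    Approximate by truncated sums
        μ_a = Σ_τ β^τ (A')^τ Z A^τ
        σ_a = β / (1 - β) * trace(Z Σ_τ β^τ A^τ C C' (A')^τ)
    The sums converge when β times the squared spectral radius of A is below 1.
    """
    μ = np.zeros_like(Z, dtype=float)
    S = np.zeros_like(Z, dtype=float)
    A_pow = np.eye(A.shape[0])                 # holds A^τ
    CC = C @ C.T
    for τ in range(max_iter):
        dμ = β**τ * A_pow.T @ Z @ A_pow
        dS = β**τ * A_pow @ CC @ A_pow.T
        μ += dμ
        S += dS
        if max(np.abs(dμ).max(), np.abs(dS).max()) < tol:
            break
        A_pow = A_pow @ A
    σ = β / (1 - β) * np.trace(Z @ S)
    return μ, σ

# Hypothetical inputs used only to exercise the function
β = 1 / 1.05
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.8, 0.0],
              [0.0, 0.0, 0.5]])
C = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
U_a = np.array([[5.0, 1.0, 0.0]])
M_c = np.array([[1.0, -0.1, 0.0]])
Z_a = U_a.T @ M_c

μ_a, σ_a = asset_pricing_terms(A, C, Z_a, β)
print(μ_a, σ_a)
```

A useful sanity check is that `μ_a` satisfies the fixed-point equation $\mu_a = Z_a + \beta A' \mu_a A$, which is the recursive form of the first sum.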
gam = 0
γ = np.array([[gam], [0]])
ϕ_c = np.array([[1], [0]])
ϕ_g = np.array([[0], [1]])
ϕ_1 = 1e-4
ϕ_i = np.array([[0], [-ϕ_1]])
δ_k = np.array([[.95]])
θ_k = np.array([[1]])
β = np.array([[1 / 1.05]])
ud = np.array([[5, 1, 0],
[0, 0, 0]])
a22 = np.array([[1, 0, 0],
[0, 0.8, 0],
[0, 0, 0.5]])
c2 = np.array([[0, 1, 0],
[0, 0, 1]]).T
l_λ = np.array([[0]])
π_h = np.array([[1]])
δ_h = np.array([[.9]])
θ_h = np.array([[1]]) - δ_h
ub = np.array([[30, 0, 0]])
x0 = np.array([[5, 150, 1, 0, 0]]).T
The graph below plots the price of this claim over time:
The next plot displays the realized gross rate of return on this “Lucas tree” as well as on a risk-free one-period bond:
array([[ 1. , -0.45097342],
[-0.45097342, 1. ]])
Above we have also calculated the correlation coefficient between these two returns.
To give an idea of how the term structure of interest rates moves in this economy, the next plot displays the net rates of
return on one-period and five-period risk-free bonds:
From the above plot, we can see the tendency of the term structure to slope up when rates are low and to slope down
when rates are high.
Comparing it to the previous plot of the price of the “Lucas tree”, we can also see that net rates of return are low when
the price of the tree is high, and vice versa.
We now plot the realized gross rate of return on a “Lucas tree” as well as on a risk-free one-period bond when the
autoregressive parameter for the endowment process is reduced to 0.4:
array([[ 1. , -0.63164195],
[-0.63164195, 1. ]])
The correlation between these two gross rates is now more negative.
Next, we again plot the net rates of return on one-period and five-period risk-free bonds:
We can see the tendency of the term structure to slope up when rates are low (and down when rates are high) has been
accentuated relative to the first instance of our economy.
TWENTY
This is another member of a suite of lectures that use the quantecon DLE class to instantiate models within the [Hansen
and Sargent, 2013] class of models described in detail in Recursive Models of Dynamic Linear Economies.
In addition to what’s in Anaconda, this lecture uses the quantecon library.
import numpy as np
import matplotlib.pyplot as plt
from quantecon import DLE
This lecture shows how the DLE class can be used to create impulse response functions for three related economies,
starting from Hall (1978) [Hall, 1978].
Knowledge of the basic economic environment is assumed.
See the lecture “Growth in Dynamic Linear Economies” for more details.
γ_1 = 0.1
γ = np.array([[γ_1], [0]])
ϕ_c = np.array([[1], [0]])
ϕ_g = np.array([[0], [1]])
These parameter values are used to define an economy of the DLE class.
We can then simulate the economy for a chosen length of time, from our initial state vector 𝑥0 .
The economy stores the simulated values for each variable. Below we plot consumption and investment:
The DLE class can be used to create impulse response functions for each of the endogenous variables:
{𝑐𝑡 , 𝑠𝑡 , ℎ𝑡 , 𝑖𝑡 , 𝑘𝑡 , 𝑔𝑡 }.
If no selector vector for the shock is specified, the default choice is to give IRFs to the first shock in 𝑤𝑡+1 .
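The logic of an impulse response in a linear state-space system can be sketched independently of the DLE class: hit the state with a one-time shock and propagate it through the transition matrix. The function below is an illustration only and is not the implementation of the `irf` method; it is exercised on the exogenous endowment block from the growth lecture ($A_{22}$, $C_2$, $U_d$) rather than on a full economy.

```python
import numpy as np

def impulse_response(A, C, S, shock, ts_length):
    """
    Responses of y_t = S x_t to a one-time shock w_1 = shock in
    x_{t+1} = A x_t + C w_{t+1}, starting from x_0 = 0.
    Row j of the result holds S A^j C shock.
    """
    x = C @ shock
    out = []
    for _ in range(ts_length):
        out.append((S @ x).ravel())
        x = A @ x
    return np.array(out)

A22 = np.array([[1, 0, 0],
                [0, 0.8, 0],
                [0, 0, 0.5]])
C2 = np.array([[0, 0],
               [1, 0],
               [0, 1]])
U_d = np.array([[5, 1, 0],
                [0, 0, 0]])

first_shock = np.array([[1], [0]])        # selector for the first element of w_{t+1}
d_irf = impulse_response(A22, C2, U_d, first_shock, ts_length=10)
print(d_irf[:, 0])
```

Here the response of the endowment to the first shock decays geometrically at rate 0.8, which is the input that the endogenous responses of consumption and investment are built on.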
Below we plot the impulse response functions of investment and consumption to an endowment innovation (the first
shock) in the Hall model:
econ1.irf(ts_length=40, shock=None)
# This is the left panel of Fig 5.7.1 from p.105 of HS2013
plt.plot(econ1.c_irf, label='Cons.')
plt.plot(econ1.i_irf, label='Inv.')
plt.legend()
plt.show()
It can be seen that the endowment shock has permanent effects on the level of both consumption and investment, consistent
with the endogenous unit eigenvalue in this economy.
Investment is much more responsive to the endowment shock at shorter time horizons.
We generate our next economy by making only one change to the parameters of Example 1: we raise the parameter associated with the cost of adjusting capital, $\phi_1$, from 0.00001 to 0.2.
This will lower the endogenous eigenvalue that is unity in Example 1 to a value slightly below 1.
ϕ_12 = 0.2
ϕ_i2 = np.array([[1], [-ϕ_12]])
tech2 = (ϕ_c, ϕ_g, ϕ_i2, γ, δ_k, θ_k)
econ2.irf(ts_length=40,shock=None)
# This is the left panel of Fig 5.8.1 from p.106 of HS2013
plt.plot(econ2.c_irf,label='Cons.')
plt.plot(econ2.i_irf,label='Inv.')
plt.legend()
plt.show()
econ2.endo
array([0.9 , 0.99657126])
econ2.compute_steadystate()
print(econ2.css, econ2.iss, econ2.kss)
The first graph shows that there seems to be a downward trend in both consumption and investment.
This is a consequence of the decrease in the largest endogenous eigenvalue from unity in the earlier economy, caused by
the higher adjustment cost.
The present economy has a nonstochastic steady state value of 5 for consumption and 0 for both capital and investment.
Because the largest endogenous eigenvalue is still close to 1, the economy heads only slowly towards these mean values.
The impulse response functions now show that an endowment shock does not have a permanent effect on the levels of
either consumption or investment.
We generate our third economy by raising 𝜙1 further, to 1.0. We also raise the production function parameter from 0.1
to 0.15 (which raises the non-stochastic steady state value of capital above zero).
We also change the specification of preferences to make the consumption good durable.
Specifically, we allow for a single durable household good obeying:
Services are related to the stock of durables at the beginning of the period:
𝑠𝑡 = 𝜆ℎ𝑡−1 , 𝜆 > 0
$$
-\frac{1}{2} \mathbb{E} \sum_{t=0}^\infty \beta^t [(\lambda h_{t-1} - b_t)^2 + l_t^2] \mid J_0
$$
To implement this, we set $\lambda = 0.1$ and $\pi = 0$ (we have already set $\theta_h = 1$ and $\delta_h = 0.9$).
We start from an initial condition that makes consumption begin near its non-stochastic steady state.
ϕ_13 = 1
ϕ_i3 = np.array([[1], [-ϕ_13]])
γ_12 = 0.15
γ_2 = np.array([[γ_12], [0]])
l_λ2 = np.array([[0.1]])
π_h2 = np.array([[0]])
In contrast to Hall’s original model of Example 1, it is now investment that is much smoother than consumption.
This illustrates how making consumption goods durable tends to undo the strong consumption smoothing result that Hall
obtained.
econ3.irf(ts_length=40, shock=None)
# This is the left panel of Fig 5.11.1 from p.111 of HS2013
plt.plot(econ3.c_irf, label='Cons.')
plt.plot(econ3.i_irf, label='Inv.')
plt.legend()
plt.show()
The impulse response functions confirm that consumption is now much more responsive to an endowment shock (and
investment less so) than in Example 1.
As in Example 2, the endowment shock has permanent effects on neither variable.
TWENTYONE
This lecture is part of a suite of lectures that use the quantecon DLE class to instantiate models within the [Hansen and
Sargent, 2013] class of models described in detail in Recursive Models of Dynamic Linear Economies.
In addition to what’s included in Anaconda, this lecture uses the quantecon library.
This lecture adds a third solution method for the linear-quadratic-Gaussian permanent income model with 𝛽𝑅 = 1, com-
plementing the other two solution methods described in Optimal Savings I: The Permanent Income Model and Optimal
Savings II: LQ Techniques and this Jupyter notebook.
The additional solution method uses the DLE class.
In this way, we map the permanent income model into the framework of Hansen & Sargent (2013) “Recursive Models
of Dynamic Linear Economies” [Hansen and Sargent, 2013].
We’ll also require the following imports
import numpy as np
import matplotlib.pyplot as plt
from quantecon import DLE
np.set_printoptions(suppress=True, precision=4)
where 𝐸𝑡 is the mathematical expectation conditioned on the consumer’s time 𝑡 information, 𝑐𝑡 is time 𝑡 consumption,
𝑢(𝑐) is a strictly concave one-period utility function, and 𝛽 ∈ (0, 1) is a discount factor.
The LQ model gets its name partly from assuming that the utility function 𝑢 is quadratic:
The consumer maximizes the utility functional (21.1) by choosing a consumption, borrowing plan $\{c_t, b_{t+1}\}_{t=0}^\infty$ subject to the sequence of budget constraints
where 𝑦𝑡 is an exogenous stationary endowment process, 𝑅 is a constant gross risk-free interest rate, 𝑏𝑡 is one-period
risk-free debt maturing at 𝑡, and 𝑏0 is a given initial condition.
We shall assume that 𝑅−1 = 𝛽.
Equation (21.2) is linear.
We use another set of linear equations to model the endowment process.
In particular, we assume that the endowment process has the state-space representation
where 𝑤𝑡+1 is an IID process with mean zero and identity contemporaneous covariance matrix, 𝐴22 is a stable matrix,
its eigenvalues being strictly below unity in modulus, and 𝑈𝑦 is a selection vector that identifies 𝑦 with a particular linear
combination of the 𝑧𝑡 .
We impose the following condition on the consumption, borrowing plan:
$$
E_0 \sum_{t=0}^\infty \beta^t b_t^2 < +\infty \tag{21.4}
$$
$$
x_t = \begin{bmatrix} z_t \\ b_t \end{bmatrix}
$$
where 𝑏𝑡 is its one-period debt falling due at the beginning of period 𝑡 and 𝑧𝑡 contains all variables useful for forecasting
its future endowment.
We assume that {𝑦𝑡 } follows a second order univariate autoregressive process:
One way of solving this model is to map the problem into the framework outlined in Section 4.8 of [Hansen and Sargent,
2013] by setting up our technology, information and preference matrices as follows:
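Technology: $\phi_c = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\phi_g = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $\phi_i = \begin{bmatrix} -1 \\ -0.00001 \end{bmatrix}$, $\Gamma = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$, $\Delta_k = 0$, $\Theta_k = R$.
Information: $A_{22} = \begin{bmatrix} 1 & 0 & 0 \\ \alpha & \rho_1 & \rho_2 \\ 0 & 1 & 0 \end{bmatrix}$, $C_2 = \begin{bmatrix} 0 \\ \sigma \\ 0 \end{bmatrix}$, $U_b = \begin{bmatrix} \gamma & 0 & 0 \end{bmatrix}$, $U_d = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$.
Preferences: $\Lambda = 0$, $\Pi = 1$, $\Delta_h = 0$, $\Theta_h = 0$.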
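The information matrices encode the second-order autoregression for the endowment in companion form, with state $z_t = (1, y_t, y_{t-1})'$. The sketch below simulates that process directly with NumPy. The parameter values, the selector `U_y` (the first row of $U_d$), and the function name are illustrative assumptions.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration
α, ρ_1, ρ_2, σ = 10.0, 0.9, 0.0, 1.0

A22 = np.array([[1, 0, 0],
                [α, ρ_1, ρ_2],
                [0, 1, 0]])
C2 = np.array([[0], [σ], [0]])
U_y = np.array([[0, 1, 0]])        # picks y_t out of z_t = (1, y_t, y_{t-1})'

def simulate_endowment(z0, shocks):
    """Iterate z_{t+1} = A22 z_t + C2 w_{t+1} and return the path of y_t = U_y z_t."""
    z = np.asarray(z0, dtype=float).reshape(3, 1)
    y = [(U_y @ z).item()]
    for w in shocks:
        z = A22 @ z + C2 * w
        y.append((U_y @ z).item())
    return np.array(y)

# With shocks switched off, y_t converges to α / (1 - ρ_1 - ρ_2)
y_det = simulate_endowment([1, 0, 0], np.zeros(500))

# One stochastic path started at that mean
rng = np.random.default_rng(0)     # arbitrary seed
y_path = simulate_endowment([1, 100, 100], rng.standard_normal(150))
print(y_det[-1], y_path[:3])
```

The leading 1 in $z_t$ carries the constant $\alpha$, and the third row of $A_{22}$ simply shifts $y_t$ into the lagged position.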
We set parameters
$$
c_t + k_{t-1} = i_t + y_t
$$
$$
\frac{k_t}{R} = i_t
$$
$$
l_t^2 = (0.00001)^2 i_t^2
$$
Combining the first two of these gives the budget constraint of the permanent income model, where 𝑘𝑡 = 𝑏𝑡+1 .
The third equation is a very small penalty on debt-accumulation to rule out Ponzi schemes.
We set up this instance of the DLE class below:
γ = np.array([[-1], [0]])
ϕ_c = np.array([[1], [0]])
ϕ_g = np.array([[0], [1]])
ϕ_1 = 1e-5
ϕ_i = np.array([[-1], [-ϕ_1]])
δ_k = np.array([[0]])
θ_k = np.array([[1 / β]])
β = np.array([[β]])
l_λ = np.array([[0]])
π_h = np.array([[1]])
δ_h = np.array([[0]])
θ_h = np.array([[0]])
To check the solution of this model with that from the LQ problem, we select the 𝑆𝑐 matrix from the DLE class.
The solution to the DLE economy has:
𝑐𝑡 = 𝑆𝑐 𝑥𝑡
econ1.Sc
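# ax1 and ax2 are not defined in the surviving text; a two-panel layout is assumed here
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 5))

# First axes: 25 simulated paths of consumption and income
for i in range(25):
    econ1.compute_sequence(x0, ts_length=150)
    ax1.plot(econ1.c[0], c='g')
    ax1.plot(econ1.d[0], c='b')
ax1.plot(econ1.c[0], label='Consumption', c='g')
ax1.plot(econ1.d[0], label='Income', c='b')
ax1.legend()

# Second axes: 25 simulated paths of debt
for i in range(25):
    econ1.compute_sequence(x0, ts_length=150)
    ax2.plot(econ1.k[0], color='r')
ax2.plot(econ1.k[0], label='Debt', c='r')
ax2.legend()
plt.show()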
CHAPTER
TWENTYTWO
This lecture is yet another part of a suite of lectures that use the quantecon DLE class to instantiate models within the
[Hansen and Sargent, 2013] class of models described in detail in Recursive Models of Dynamic Linear Economies.
In addition to what’s included in Anaconda, this lecture uses the quantecon library
import numpy as np
import matplotlib.pyplot as plt
from collections import namedtuple
from quantecon import DLE
Ryoo and Rosen’s (2004) [Ryoo and Rosen, 2004] partial equilibrium model determines
• a stock of “Engineers” 𝑁𝑡
• a number of new entrants in engineering school, 𝑛𝑡
• the wage rate of engineers, 𝑤𝑡
It takes k periods of schooling to become an engineer.
The model consists of the following equations:
• a demand curve for engineers:
𝑤𝑡 = −𝛼𝑑 𝑁𝑡 + 𝜖𝑑𝑡
• a time-to-build structure of the education process:
𝑁𝑡+𝑘 = 𝛿𝑁 𝑁𝑡+𝑘−1 + 𝑛𝑡
• a definition of the discounted present value of each new engineering student:
$$
v_t = \beta^k \mathbb{E} \sum_{j=0}^\infty (\beta \delta_N)^j w_{t+k+j}
$$
22.2.1 Preferences
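$$
\Pi = 0, \quad \Lambda = \begin{bmatrix} \alpha_d & 0 & \cdots & 0 \end{bmatrix}, \quad
\Delta_h = \begin{bmatrix}
\delta_N & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & \cdots & 0 & 1 \\
0 & 0 & 0 & \cdots & 0
\end{bmatrix}, \quad
\Theta_h = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}
$$
where $\Lambda$ is a $1 \times (k+1)$ matrix, $\Delta_h$ is a $(k+1) \times (k+1)$ matrix, and $\Theta_h$ is a $(k+1) \times 1$ matrix.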
This specification sets 𝑁𝑡 = ℎ1𝑡−1 , 𝑛𝑡 = 𝑐𝑡 , ℎ𝜏+1,𝑡−1 = 𝑛𝑡−(𝑘−𝜏) for 𝜏 = 1, ..., 𝑘.
Below we set things up so that the number of years of education, 𝑘, can be varied.
22.2.2 Technology
To capture Ryoo and Rosen’s [Ryoo and Rosen, 2004] supply curve, we use the physical technology:
𝑐𝑡 = 𝑖𝑡 + 𝑑1𝑡
𝜓1 𝑖𝑡 = 𝑔𝑡
where 𝜓1 is inversely proportional to 𝛼𝑠 .
22.2.3 Information
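$$
A_{22} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \rho_s & 0 \\ 0 & 0 & \rho_d \end{bmatrix}, \quad
C_2 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
U_b = \begin{bmatrix} 30 & 0 & 1 \end{bmatrix}, \quad
U_d = \begin{bmatrix} 10 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$
where $\rho_s$ and $\rho_d$ describe the persistence of the supply and demand shocks.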
β = np.array([[1 / 1.05]])
α_d = np.array([[0.1]])
α_s = 1
ε_1 = 1e-7
λ_1 = np.full((1, k), ε_1)
# Use of ε_1 is a trick to acquire detectability, see HS2013 p. 228 footnote 4
l_λ = np.hstack((α_d, λ_1))
π_h = np.array([[0]])
δ_n = np.array([[0.95]])
d1 = np.vstack((δ_n, np.zeros((k - 1, 1))))
d2 = np.hstack((d1, np.eye(k)))
δ_h = np.vstack((d2, np.zeros((1, k + 1))))
ψ_1 = 1 / α_s
δ_k = np.array([[0]])
θ_k = np.array([[0]])
ρ_s = 0.8
ρ_d = 0.8
1. Raising 𝛼𝑑 to 2
2. Raising 𝑘 to 7
3. Raising 𝑘 to 10
α_d = np.array([[2]])
l_λ = np.hstack((α_d, λ_1))
pref2 = Preferences(β, l_λ, π_h, δ_h, θ_h)
econ2 = DLE(info1, tech1, pref2)
α_d = np.array([[0.1]])
k = 7
λ_1 = np.full((1, k), ε_1)
l_λ = np.hstack((α_d, λ_1))
d1 = np.vstack((δ_n, np.zeros((k - 1, 1))))
d2 = np.hstack((d1, np.eye(k)))
δ_h = np.vstack((d2, np.zeros((1, k+1))))
θ_h = np.vstack((np.zeros((k, 1)),
np.ones((1, 1))))
k = 10
λ_1 = np.full((1, k), ε_1)
l_λ = np.hstack((α_d, λ_1))
d1 = np.vstack((δ_n, np.zeros((k - 1, 1))))
d2 = np.hstack((d1, np.eye(k)))
δ_h = np.vstack((d2, np.zeros((1, k + 1))))
θ_h = np.vstack((np.zeros((k, 1)),
np.ones((1, 1))))
econ1.irf(ts_length=25, shock=shock_demand)
econ2.irf(ts_length=25, shock=shock_demand)
econ3.irf(ts_length=25, shock=shock_demand)
econ4.irf(ts_length=25, shock=shock_demand)
The first figure plots the impulse response of 𝑛𝑡 (on the left) and 𝑁𝑡 (on the right) to a positive demand shock, for 𝛼𝑑 = 0.1
and 𝛼𝑑 = 2.
When 𝛼𝑑 = 2, the number of new students 𝑛𝑡 rises initially, but the response then turns negative.
A positive demand shock raises wages, drawing new students into the profession.
However, these new students raise 𝑁𝑡 .
The higher is 𝛼𝑑 , the larger the effect of this rise in 𝑁𝑡 on wages.
This counteracts the demand shock’s positive effect on wages, reducing the number of new students in subsequent periods.
Consequently, when 𝛼𝑑 is lower, the effect of a demand shock on 𝑁𝑡 is larger.
The next figure plots the impulse response of 𝑛𝑡 (on the left) and 𝑁𝑡 (on the right) to a positive demand shock, for 𝑘 = 4,
𝑘 = 7 and 𝑘 = 10 (with 𝛼𝑑 = 0.1).
ax2.plot(econ1.h_irf[:,0], label='$k=4$')
ax2.plot(econ3.h_irf[:,0], label='$k=7$')
ax2.plot(econ4.h_irf[:,0], label='$k=10$')
ax2.legend()
ax2.set_title('Response of $N_t$ to a demand shock')
plt.show()
Both panels in the above figure show that raising 𝑘 lowers the effect of a positive demand shock on entry into the engineering profession.
Increasing the number of periods of schooling lowers the number of new students in response to a demand shock.
This occurs because with longer required schooling, new students ultimately benefit less from the impact of that shock
on wages.
TWENTYTHREE
CATTLE CYCLES
This is another member of a suite of lectures that use the quantecon DLE class to instantiate models within the [Hansen
and Sargent, 2013] class of models described in detail in Recursive Models of Dynamic Linear Economies.
In addition to what’s in Anaconda, this lecture uses the quantecon library.
This lecture uses the DLE class to construct instances of the “Cattle Cycles” model of Rosen, Murphy and Scheinkman
(1994) [Rosen et al., 1994].
That paper constructs a rational expectations equilibrium model to understand sources of recurrent cycles in US cattle
stocks and prices.
We make the following imports:
import numpy as np
import matplotlib.pyplot as plt
from collections import namedtuple
from quantecon import DLE
from math import sqrt
The model features a static linear demand curve and a “time-to-grow” structure for cattle.
Let 𝑝𝑡 be the price of slaughtered beef, 𝑚𝑡 the cost of preparing an animal for slaughter, ℎ𝑡 the holding cost for a mature
animal, 𝛾1 ℎ𝑡 the holding cost for a yearling, and 𝛾0 ℎ𝑡 the holding cost for a calf.
The cost processes $\{h_t, m_t\}_{t=0}^\infty$ are exogenous, while the price process $\{p_t\}_{t=0}^\infty$ is determined within a rational expectations equilibrium.
Let 𝑥𝑡 be the breeding stock, and 𝑦𝑡 be the total stock of cattle.
The law of motion for the breeding stock is
𝑥𝑡 = (1 − 𝛿)𝑥𝑡−1 + 𝑔𝑥𝑡−3 − 𝑐𝑡
where 𝑔 < 1 is the number of calves that each member of the breeding stock has each year, and 𝑐𝑡 is the number of cattle
slaughtered.
The total headcount of cattle is
𝑦𝑡 = 𝑥𝑡 + 𝑔𝑥𝑡−1 + 𝑔𝑥𝑡−2
This equation states that the total number of cattle equals the sum of adults, calves and yearlings, respectively.
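The two laws of motion above can be traced out by hand in a few lines of plain Python. This is only a sketch: the function name `simulate_cattle`, the slaughter path and the initial stocks are made-up illustrative inputs, not values from the lecture, while the defaults for `δ` and `g` are the parameter values used in the lecture's code.

```python
def simulate_cattle(x_init, c_path, δ=0.1, g=0.85):
    """
    Iterate x_t = (1 - δ) x_{t-1} + g x_{t-3} - c_t and
    y_t = x_t + g x_{t-1} + g x_{t-2}.

    x_init = [x_{-3}, x_{-2}, x_{-1}] holds the initial breeding stocks.
    """
    x = list(x_init)               # x[-1] is always the latest breeding stock
    y = []
    for c in c_path:
        x_new = (1 - δ) * x[-1] + g * x[-3] - c
        x.append(x_new)
        # adults + calves (born to x_{t-1}) + yearlings (born to x_{t-2})
        y.append(x[-1] + g * x[-2] + g * x[-3])
    return x[3:], y


# Hypothetical inputs chosen only for illustration
x_path, y_path = simulate_cattle(x_init=[100.0, 100.0, 100.0],
                                 c_path=[70.0, 75.0, 80.0])
```

Here the first breeding stock is 0.9 × 100 + 0.85 × 100 − 70 = 105, and the first total stock adds the 85 calves and 85 yearlings born to earlier breeding stocks.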
A representative farmer chooses {𝑐𝑡 , 𝑥𝑡 } to maximize:
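$$
\mathbb{E}_0 \sum_{t=0}^\infty \beta^t \left\{ p_t c_t - h_t x_t - \gamma_0 h_t (g x_{t-1}) - \gamma_1 h_t (g x_{t-2}) - m_t c_t - \frac{\psi_1}{2} x_t^2 - \frac{\psi_2}{2} x_{t-1}^2 - \frac{\psi_3}{2} x_{t-3}^2 - \frac{\psi_4}{2} c_t^2 \right\}
$$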
subject to the law of motion for 𝑥𝑡 , taking as given the stochastic laws of motion for the exogenous processes, the equilibrium price process, and the initial state [𝑥−1 , 𝑥−2 , 𝑥−3 ].
Remark: The 𝜓𝑗 parameters are very small quadratic costs that are included for technical reasons. They make the linear quadratic dynamic programming problem solved by the fictitious planner, who in effect chooses equilibrium quantities and shadow prices, well posed and well behaved.
Demand for beef is governed by $c_t = a_0 - a_1 p_t + \tilde d_t$, where $\tilde d_t$ is a stochastic process with mean zero that represents a demand shifter.
23.2.1 Preferences
We set $\Lambda = 0$, $\Delta_h = 0$, $\Theta_h = 0$, $\Pi = \alpha_1^{-1/2}$ and $b_t = \Pi \tilde d_t + \Pi \alpha_0$.
With these settings, the FOC for the household’s problem becomes the demand curve of the “Cattle Cycles” model.
23.2.2 Technology
$$
\Delta_k = \begin{bmatrix} (1 - \delta) & 0 & g \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad
\Theta_k = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
$$
(where 𝑖𝑡 = −𝑐𝑡 ).
To capture the production of cattle, we set
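$$
\Phi_c = \begin{bmatrix} 1 \\ f_1 \\ 0 \\ 0 \\ -f_7 \end{bmatrix}, \quad
\Phi_g = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
\Phi_i = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
\Gamma = \begin{bmatrix} 0 & 0 & 0 \\ f_1 (1 - \delta) & 0 & g f_1 \\ f_3 & 0 & 0 \\ 0 & f_5 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$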
23.2.3 Information
We set
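$$
A_{22} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \rho_1 & 0 & 0 \\ 0 & 0 & \rho_2 & 0 \\ 0 & 0 & 0 & \rho_3 \end{bmatrix}, \quad
C_2 = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 15 \end{bmatrix}, \quad
U_b = \begin{bmatrix} \Pi \alpha_0 & 0 & 0 & \Pi \end{bmatrix}, \quad
U_d = \begin{bmatrix} 0 \\ f_2 U_h \\ f_4 U_h \\ f_6 U_h \\ f_8 U_h \end{bmatrix}
$$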
To map this into our class, we set $f_1^2 = \frac{\Psi_1}{2}$, $f_2^2 = \frac{\Psi_2}{2}$, $f_3^2 = \frac{\Psi_3}{2}$, $2 f_1 f_2 = 1$, $2 f_3 f_4 = \gamma_0 g$, $2 f_5 f_6 = \gamma_1 g$.
β = np.array([[0.909]])
lλ = np.array([[0]])
a1 = 0.5
πh = np.array([[1 / (sqrt(a1))]])
δh = np.array([[0]])
θh = np.array([[0]])
δ = 0.1
g = 0.85
f1 = 0.001
f3 = 0.001
f5 = 0.001
f7 = 0.001
ϕg = np.array([[0, 0, 0, 0],
[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 0, 1,0],
[0, 0, 0, 1]])
γ = np.array([[ 0, 0, 0],
[f1 * (1 - δ), 0, g * f1],
[ f3, 0, 0],
[ 0, f5, 0],
[ 0, 0, 0]])
δk = np.array([[1 - δ, 0, g],
[ 1, 0, 0],
[ 0, 1, 0]])
ρ1 = 0
ρ2 = 0
ρ3 = 0.6
a0 = 500
γ0 = 0.4
γ1 = 0.7
f2 = 1 / (2 * f1)
f4 = γ0 * g / (2 * f3)
f6 = γ1 * g / (2 * f5)
f8 = 1 / (2 * f7)
c2 = np.array([[0, 0, 0],
[1, 0, 0],
[0, 1, 0],
[0, 0, 15]])
πh_scalar = πh.item()
ub = np.array([[πh_scalar * a0, 0, 0, πh_scalar]])
uh = np.array([[50, 1, 0, 0]])
um = np.array([[100, 0, 1, 0]])
ud = np.vstack(([0, 0, 0, 0],
f2 * uh, f4 * uh, f6 * uh, f8 * um))
Notice that we have set 𝜌1 = 𝜌2 = 0, so ℎ𝑡 and 𝑚𝑡 consist of a constant and a white noise component.
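To see why, here is a minimal sketch of the holding cost process under the reading that ℎ𝑡 equals the row `uh` times the exogenous state, so that ℎ𝑡 = 50 + 𝑧𝑡 with 𝑧𝑡+1 = 𝜌1 𝑧𝑡 + 𝑤𝑡+1. The function name, the seed and the sample length are illustrative assumptions, not part of the lecture's code.

```python
import random


def simulate_holding_cost(ρ1, T=5, mean=50.0, seed=0):
    """
    Simulate h_t = mean + z_t with z_{t+1} = ρ1 * z_t + w_{t+1},
    where w is standard normal white noise and z starts at 0.
    """
    rng = random.Random(seed)
    z, h, shocks = 0.0, [], []
    for _ in range(T):
        w = rng.gauss(0.0, 1.0)
        z = ρ1 * z + w
        shocks.append(w)
        h.append(mean + z)
    return h, shocks


h, w = simulate_holding_cost(ρ1=0.0)
# With ρ1 = 0, the deviation h_t - 50 is exactly the current shock w_t
deviations = [h_t - 50.0 for h_t in h]
```

With a nonzero 𝜌1 the deviations would instead accumulate past shocks with geometrically declining weights.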
We set up the economy using tuples for information, technology and preference matrices below.
We also construct two extra information matrices, corresponding to cases when 𝜌3 = 1 and 𝜌3 = 0 (as opposed to the
baseline case of 𝜌3 = 0.6).
ρ3_2 = 1
a22_2 = np.array([[1, 0, 0, 0],
[0, ρ1, 0, 0],
[0, 0, ρ2, 0],
[0, 0, 0, ρ3_2]])
ρ3_3 = 0
a22_3 = np.array([[1, 0, 0, 0],
[0, ρ1, 0, 0],
[0, 0, ρ2, 0],
[0, 0, 0, ρ3_3]])
# Example of how we can look at the matrices associated with a given namedtuple
info1.a22
array([[1. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.6]])
# Calculate steady-state in baseline case and use to set the initial condition
econ1.compute_steadystate(nnc=4)
x0 = econ1.zz
econ1.compute_sequence(x0, ts_length=100)
[Rosen et al., 1994] use the model to understand the sources of recurrent cycles in total cattle stocks.
Plotting 𝑦𝑡 for a simulation of their model shows its ability to generate cycles in quantities.
# Calculation of y_t
totalstock = econ1.k[0] + g * econ1.k[1] + g * econ1.k[2]
fig, ax = plt.subplots()
ax.plot(totalstock)
ax.set_xlim((-1, 100))
ax.set_title('Total number of cattle')
plt.show()
In their Figure 3, [Rosen et al., 1994] plot the impulse response functions of consumption and the breeding stock of cattle
to the demand shock, 𝑑𝑡̃ , under the three different values of 𝜌3 .
We replicate their Figure 3 below.
econ1.irf(ts_length=25, shock=shock_demand)
econ2.irf(ts_length=25, shock=shock_demand)
econ3.irf(ts_length=25, shock=shock_demand)
The above figures show how consumption patterns differ markedly, depending on the persistence of the demand shock:
• If it is purely transitory (𝜌3 = 0) then consumption rises immediately but is later reduced to build stocks up again.
• If it is permanent (𝜌3 = 1), then consumption falls immediately, in order to build up stocks to satisfy the permanent
rise in future demand.
In Figure 4 of their paper, [Rosen et al., 1994] plot the response to a demand shock of the breeding stock and the total
stock, for 𝜌3 = 0 and 𝜌3 = 0.6.
We replicate their Figure 4 below.
The fact that 𝑦𝑡 is a weighted moving average of 𝑥𝑡 creates a hump-shaped response of the total stock to demand shocks, contributing to the cyclicality seen in the first graph of this lecture.
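The moving-average mechanism can be illustrated without solving the model. The sketch below feeds a hypothetical breeding-stock response (a unit jump that decays geometrically at rate 0.9, chosen only for illustration and not taken from the model's impulse responses) through 𝑦𝑡 = 𝑥𝑡 + 𝑔𝑥𝑡−1 + 𝑔𝑥𝑡−2.

```python
def total_stock_response(x_irf, g=0.85):
    """Map a breeding-stock response x_t into y_t = x_t + g x_{t-1} + g x_{t-2}."""
    y_irf = []
    for t, x_t in enumerate(x_irf):
        x_lag1 = x_irf[t - 1] if t >= 1 else 0.0
        x_lag2 = x_irf[t - 2] if t >= 2 else 0.0
        y_irf.append(x_t + g * x_lag1 + g * x_lag2)
    return y_irf


# Hypothetical breeding-stock response: a unit jump that decays at rate 0.9
x_irf = [0.9 ** t for t in range(10)]
y_irf = total_stock_response(x_irf)
peak = y_irf.index(max(y_irf))   # period in which the total stock peaks
```

Although the assumed breeding-stock response declines monotonically from period 0, the total-stock response keeps rising while calves and yearlings are added, and only then declines.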
TWENTYFOUR
24.1 Overview
This is another member of a suite of lectures that use the quantecon DLE class to instantiate models within the [Hansen
and Sargent, 2013] class of models described in Recursive Models of Dynamic Linear Economies.
In addition to what’s in Anaconda, this lecture uses the quantecon library.
import numpy as np
import quantecon as qe
import matplotlib.pyplot as plt
from quantecon import DLE
from math import sqrt
This lecture describes an early contribution to what is now often called a news and noise issue.
In particular, it analyzes a shock-invertibility issue that is endemic within a class of permanent income models.
Technically, the invertibility problem indicates a situation in which histories of the shocks in an econometrician’s autoregressive or Wold moving average representation span a smaller information space than do the shocks that are seen by the agents inside the econometrician’s model.
An econometrician who is unaware of the problem would misinterpret shocks and likely responses to them.
A shock-invertibility issue that is technically close to the one studied here is discussed by Eric Leeper, Todd Walker, and Susan Yang [Leeper et al., 2013] in their analysis of fiscal foresight.
A distinct shock-invertibility issue is present in the special LQ consumption smoothing model in this quantecon lecture
Information and Consumption Smoothing.
24.2 Model
We consider the following modification of Robert Hall’s (1978) model [Hall, 1978] in which the endowment process is
the sum of two orthogonal autoregressive processes:
Preferences
$$
-\frac{1}{2} \mathbb{E} \sum_{t=0}^\infty \beta^t \left[ (c_t - b_t)^2 + l_t^2 \right] \Big| J_0
$$
𝑠𝑡 = 𝑐𝑡
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
Technology
$$
c_t + i_t = \gamma k_{t-1} + d_t
$$
$$
k_t = \delta_k k_{t-1} + i_t
$$
$$
g_t = \phi_1 i_t , \quad \phi_1 > 0
$$
$$
g_t \cdot g_t = l_t^2
$$
Information
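$$
z_{t+1} = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.9 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix} z_t +
\begin{bmatrix}
0 & 0 \\
1 & 0 \\
0 & 4 \\
0 & 0 \\
0 & 0 \\
0 & 0
\end{bmatrix} w_{t+1}
$$
$$
U_b = \begin{bmatrix} 30 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}
$$
$$
U_d = \begin{bmatrix} 5 & 1 & 1 & 0.8 & 0.6 & 0.4 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}
$$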
The preference shock is constant at 30, while the endowment process is the sum of a constant and two orthogonal processes.
Specifically:
𝑑𝑡 = 5 + 𝑑1𝑡 + 𝑑2𝑡
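The two components can be read off the information matrices: the second state follows an AR(1) with coefficient 0.9 driven by the first innovation, while the second innovation enters scaled by 4 and is carried forward for three further periods, weighted by 1, 0.8, 0.6 and 0.4 in the first row of 𝑈𝑑. The following sketch encodes that reading directly; the function name and horizon are illustrative assumptions.

```python
def endowment_irf(shock, horizon=8):
    """
    Response of d_t to a unit innovation at t = 0.

    shock = 1: innovation w_1 feeding the AR(1) component d_1
    shock = 2: innovation w_2 feeding the moving-average component d_2
    """
    ma_weights = [1.0, 0.8, 0.6, 0.4]      # entries 3-6 of the first row of U_d
    response = []
    for t in range(horizon):
        if shock == 1:
            response.append(0.9 ** t)       # d_1 decays geometrically
        else:
            # w_2 enters scaled by 4 and lives for four periods
            response.append(4.0 * ma_weights[t] if t < 4 else 0.0)
    return response


irf_w1 = endowment_irf(shock=1)
irf_w2 = endowment_irf(shock=2)
```

The first innovation has a long-lived but decaying effect on the endowment, while the second has a larger effect that vanishes completely after four periods.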
γ_1 = 0.05
γ = np.array([[γ_1], [0]])
ϕ_c = np.array([[1], [0]])
ϕ_g = np.array([[0], [1]])
ϕ_1 = 0.00001
ϕ_i = np.array([[1], [-ϕ_1]])
δ_k = np.array([[1]])
θ_k = np.array([[1]])
$$
\begin{bmatrix} c_t \\ c_t - d_t \end{bmatrix} = \begin{bmatrix} \sigma_1(L) \\ \sigma_2(L) \end{bmatrix} w_t
$$
Thus, the econometrician’s news 𝑢𝑡 typically responds belatedly to the consumer’s news 𝑤𝑡 .
24.3 Code
We will construct Figures from Chapter 8 Appendix E of [Hansen and Sargent, 2013] to illustrate these ideas:
econ1.irf(ts_length=40, shock=None)
ax2.plot(econ1.c_irf, label='Consumption')
ax2.plot(econ1.c_irf - econ1.d_irf[:,0].reshape(40, 1), label='Deficit')
ax2.legend()
ax2.set_title('Response to $w_{2t}$')
plt.show()
The above figure displays the impulse response of consumption and the net-of-interest deficit to the innovations 𝑤𝑡 to the
consumer’s non-financial income or endowment process.
Consumption displays the characteristic “random walk” response with respect to each innovation.
Each endowment innovation leads to a temporary surplus followed by a permanent net-of-interest deficit.
The temporary surplus just offsets the permanent deficit in terms of expected present value.
hs_kal = qe.Kalman(lss_hs)
w_lss = hs_kal.whitener_lss()
ma_coefs = hs_kal.stationary_coefficients(50, 'ma')
jj = 50
y1_w1 = np.empty(jj)
y2_w1 = np.empty(jj)
y1_w2 = np.empty(jj)
y2_w2 = np.empty(jj)
for t in range(jj):
y1_w1[t] = ma_coefs[t][0, 0]
y1_w2[t] = ma_coefs[t][0, 1]
y2_w1[t] = ma_coefs[t][1, 0]
y2_w2[t] = ma_coefs[t][1, 1]
ax2.plot(y1_w2, label='Consumption')
ax2.plot(y2_w2, label='Deficit')
ax2.legend()
ax2.set_title('Response to $u_{2t}$')
plt.show()
The above figure displays the impulse response of consumption and the deficit to the innovations in the econometrician’s
Wold representation
• this is the object that would be recovered from a high order vector autoregression on the econometrician’s observations.
Consumption responds only to the first innovation
• this is indicative of the Granger causality imposed on the [𝑐𝑡 , 𝑐𝑡 −𝑑𝑡 ] process by Hall’s model: consumption Granger
causes 𝑐𝑡 − 𝑑𝑡 , with no reverse causality.
jj = 20
irf_wlss = w_lss.impulse_response(jj)
ycoefs = irf_wlss[1]
# Pull out the shocks
a1_w1 = np.empty(jj)
a1_w2 = np.empty(jj)
a2_w1 = np.empty(jj)
a2_w2 = np.empty(jj)
for t in range(jj):
a1_w1[t] = ycoefs[t][0, 0]
a1_w2[t] = ycoefs[t][0, 1]
a2_w1[t] = ycoefs[t][1, 0]
a2_w2[t] = ycoefs[t][1, 1]
While the responses of the innovations to consumption are concentrated at lag zero for both components of 𝑤𝑡 , the
responses of the innovations to (𝑐𝑡 − 𝑑𝑡 ) are spread over time (especially in response to 𝑤1𝑡 ).
Thus, the innovations to (𝑐𝑡 − 𝑑𝑡 ) as revealed by the vector autoregression depend on what the economic agent views as
“old news”.
CHAPTER
TWENTYFIVE
25.1 Overview
As an introduction to one possible approach to modeling Knightian uncertainty, this lecture describes static representations of five classes of preferences over risky prospects.
These preference specifications allow us to distinguish risk from uncertainty along lines proposed by [Knight, 1921].
All five preference specifications incorporate risk aversion, meaning displeasure from risks governed by well known
probability distributions.
Two of them also incorporate uncertainty aversion, meaning dislike of not knowing a probability distribution.
The preference orderings are
• Expected utility preferences
• Constraint preferences
• Multiplier preferences
• Risk-sensitive preferences
• Ex post Bayesian expected utility preferences
This labeling scheme is taken from [Hansen and Sargent, 2001].
Constraint and multiplier preferences express aversion to not knowing a unique probability distribution that describes
random outcomes.
Expected utility, risk-sensitive, and ex post Bayesian expected utility preferences all attribute a unique known probability
distribution to a decision maker.
We present things in a simple before-and-after one-period setting.
In addition to learning about these preference orderings, this lecture also describes some interesting code for computing
and graphing some representations of indifference curves, utility functions, and related objects.
Staring at these indifference curves provides insights into the different preferences.
Watch for the presence of a kink at the 45 degree line for the constraint preference indifference curves.
We begin with some imports that we’ll use to create some graphs.
# Package imports
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import rc, cm
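$$
\mathrm{ent}(\pi, \hat\pi) = \sum_{i=1}^I \hat\pi_i \log\left(\frac{\hat\pi_i}{\pi_i}\right) = \sum_{i=1}^I \pi_i \left(\frac{\hat\pi_i}{\pi_i}\right) \log\left(\frac{\hat\pi_i}{\pi_i}\right)
$$
or
$$
\mathrm{ent}(\pi, \hat\pi) = \sum_{i=1}^I \pi_i m_i \log m_i .
$$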
Remark: A likelihood ratio $m_i$ is a discrete random variable. For any discrete random variable $\{x_i\}_{i=1}^I$, the expected value of $x$ under the $\hat\pi_i$ distribution can be represented as the expected value under the $\pi$ distribution of the product of $x_i$ times the ‘shock’ $m_i$:
$$
\hat E x = \sum_{i=1}^I x_i \hat\pi_i = \sum_{i=1}^I m_i x_i \pi_i = E m x ,
$$
where 𝐸̂ is the mathematical expectation under the 𝜋̂ distribution and 𝐸 is the expectation under the 𝜋 distribution.
Evidently,
$$
\hat E 1 = E m = 1
$$
$$
E m \log m = \hat E \log m .
$$
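These identities are easy to check numerically. The sketch below uses only the standard library; the function names and the two distributions are illustrative assumptions, and the convention 0 log 0 = 0 is applied when a twisted probability is zero.

```python
from math import log


def likelihood_ratio(π, π_hat):
    """m_i = π_hat_i / π_i for a baseline π with strictly positive entries."""
    return [p_hat / p for p, p_hat in zip(π, π_hat)]


def relative_entropy(π, π_hat):
    """ent(π, π_hat) = Σ π_i m_i log m_i, using the convention 0 log 0 = 0."""
    m = likelihood_ratio(π, π_hat)
    return sum(p * m_i * log(m_i) for p, m_i in zip(π, m) if m_i > 0)


π = [0.5, 0.5]           # baseline distribution (illustrative)
π_hat = [0.3, 0.7]       # twisted distribution (illustrative)
m = likelihood_ratio(π, π_hat)

E_m = sum(p * m_i for p, m_i in zip(π, m))                       # E m
E_m_log_m = relative_entropy(π, π_hat)                           # E m log m
E_hat_log_m = sum(ph * log(m_i) for ph, m_i in zip(π_hat, m))    # Ê log m
```

Here `E_m` equals one and the two entropy expressions coincide, as the identities assert.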
In the three figures below, we plot relative entropy from several perspectives.
Our first figure depicts entropy as a function of 𝜋1̂ when 𝐼 = 2 and 𝜋1 = .5.
When $\pi_1 \in (0, 1)$, entropy is finite for both $\hat\pi_1 = 0$ and $\hat\pi_1 = 1$ because $\lim_{x \to 0} x \log x = 0$.
However, when 𝜋1 = 0 or 𝜋1 = 1, entropy is infinite.
The heat maps in the next two figures vary both 𝜋1̂ and 𝜋1 .
The following figure plots entropy.
A decision maker is said to have expected utility preferences when he ranks plans 𝑐 by their expected utilities
$$
\sum_{i=1}^I u(c_i) \pi_i , \tag{25.1}
$$
where 𝑢 is a unique utility function and 𝜋 is a unique probability measure over states.
• A known 𝜋 expresses risk.
• Curvature of 𝑢 expresses risk aversion.
A decision maker is said to have constraint preferences when he ranks plans 𝑐 according to
$$
\min_{\{m_i \geq 0\}_{i=1}^I} \sum_{i=1}^I m_i \pi_i u(c_i) \tag{25.2}
$$
subject to
$$
\sum_{i=1}^I \pi_i m_i \log m_i \leq \eta \tag{25.3}
$$
and
$$
\sum_{i=1}^I \pi_i m_i = 1 . \tag{25.4}
$$
In (25.3), 𝜂 ≥ 0 defines an entropy ball of probability distributions 𝜋̂ = 𝑚𝜋 that surround a baseline distribution 𝜋.
As noted earlier, $\sum_{i=1}^I m_i \pi_i u(c_i)$ is the expected value of $u(c)$ under a twisted probability distribution $\{\hat\pi_i\}_{i=1}^I = \{m_i \pi_i\}_{i=1}^I$.
Larger values of the entropy constraint $\eta$ indicate more apprehension about the baseline probability distribution $\{\pi_i\}_{i=1}^I$.
Following [Hansen and Sargent, 2001] and [Hansen and Sargent, 2008], we call minimization problem (25.2) subject to
(25.3) and (25.4) a constraint problem.
To find minimizing probabilities, we form a Lagrangian
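$$
L = \sum_{i=1}^I m_i \pi_i u(c_i) + \tilde\theta \left[ \sum_{i=1}^I \pi_i m_i \log m_i - \eta \right] \tag{25.5}
$$
The minimizing probability distortions are
$$
\tilde m_i(c; \tilde\theta) = \frac{\exp(-u(c_i)/\tilde\theta)}{\sum_j \pi_j \exp(-u(c_j)/\tilde\theta)} . \tag{25.6}
$$
To compute the Lagrange multiplier, we solve $\sum_i \pi_i \tilde m_i(c; \tilde\theta) \log \tilde m_i(c; \tilde\theta) = \eta$, or
$$
\sum_i \pi_i \frac{\exp(-u(c_i)/\tilde\theta)}{\sum_j \pi_j \exp(-u(c_j)/\tilde\theta)} \log\left[ \frac{\exp(-u(c_i)/\tilde\theta)}{\sum_j \pi_j \exp(-u(c_j)/\tilde\theta)} \right] = \eta \tag{25.7}
$$
for $\tilde\theta = \tilde\theta(c; \eta)$.
For a fixed $\eta$, the $\tilde\theta$ that solves equation (25.7) is evidently a function of the consumption plan $c$.
With $\tilde\theta(c; \eta)$ in hand we can obtain worst-case probabilities as functions $\pi_i \tilde m_i(c; \eta)$ of $\eta$.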
where the last term is 𝜃 ̃ times the entropy of the worst-case probability distribution.
A decision maker is said to have multiplier preferences when he ranks consumption plans 𝑐 according to
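$$
\mathsf{T} u(c) \doteq \min_{\{m_i \geq 0\}_{i=1}^I} \sum_{i=1}^I \pi_i m_i \left[ u(c_i) + \theta \log m_i \right] \tag{25.11}
$$
Here $\theta \in (\underline\theta, +\infty)$ is a ‘penalty parameter’ that governs a ‘cost’ to an ‘evil alter ego’ who distorts probabilities by choosing $\{m_i\}_{i=1}^I$.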
Lower values of the penalty parameter 𝜃 express more apprehension about the baseline probability distribution 𝜋.
Following [Hansen and Sargent, 2001] and [Hansen and Sargent, 2008], we call the minimization problem on the right
side of (25.11) a multiplier problem.
The minimizing probability distortion that solves the multiplier problem is
$$
\hat m_i(c; \theta) = \frac{\exp(-u(c_i)/\theta)}{\sum_j \pi_j \exp(-u(c_j)/\theta)} . \tag{25.12}
$$
We can solve
$$
\sum_i \pi_i \frac{\exp(-u(c_i)/\theta)}{\sum_j \pi_j \exp(-u(c_j)/\theta)} \log\left[ \frac{\exp(-u(c_i)/\theta)}{\sum_j \pi_j \exp(-u(c_j)/\theta)} \right] = \tilde\eta \tag{25.13}
$$
$$
\mathsf{T} u(c) \doteq -\theta \log \sum_{i=1}^I \pi_i \exp(-u(c_i)/\theta) . \tag{25.15}
$$
Here T𝑢 in (25.15) is the risk-sensitivity operator of [Jacobson, 1973], [Whittle, 1981], and [Whittle, 1990].
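The link between the multiplier problem (25.11), the minimizing distortion (25.12) and the closed form (25.15) can be checked numerically. This is a minimal sketch with illustrative choices (two equally likely states, logarithmic utility at 𝑐 = (2, 1), and 𝜃 = 1); the function names are ours.

```python
from math import exp, log


def worst_case_m(u, π, θ):
    """Minimizing distortion (25.12): m_i ∝ exp(-u_i / θ), normalized so Σ π_i m_i = 1."""
    weights = [exp(-u_i / θ) for u_i in u]
    norm = sum(p * w for p, w in zip(π, weights))
    return [w / norm for w in weights]


def multiplier_objective(m, u, π, θ):
    """Σ π_i m_i [u_i + θ log m_i], the criterion in (25.11)."""
    return sum(p * m_i * (u_i + θ * log(m_i)) for p, m_i, u_i in zip(π, m, u))


def risk_sensitive_T(u, π, θ):
    """Closed form (25.15): -θ log Σ π_i exp(-u_i / θ)."""
    return -θ * log(sum(p * exp(-u_i / θ) for p, u_i in zip(π, u)))


π = [0.5, 0.5]
u = [log(2.0), log(1.0)]       # u(c) = ln c at c = (2, 1)
θ = 1.0                        # illustrative penalty parameter

m_hat = worst_case_m(u, π, θ)
value_at_m_hat = multiplier_objective(m_hat, u, π, θ)
value_closed_form = risk_sensitive_T(u, π, θ)
value_no_distortion = multiplier_objective([1.0, 1.0], u, π, θ)
```

Evaluating the criterion at the minimizing distortion reproduces the closed form, and leaving probabilities undistorted gives a strictly higher value of the criterion.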
For large values of 𝜃, T𝑢(𝑐) is approximately linear in the probability 𝜋1 , but for lower values of 𝜃, T𝑢(𝑐) has considerable
curvature as a function of 𝜋1 .
Under expected utility, i.e., 𝜃 = +∞, T𝑢(𝑐) is linear in 𝜋1 , but it is convex as a function of 𝜋1 when 𝜃 < +∞.
The two panels in the next figure below can help us to visualize the extra adjustment for risk that the risk-sensitive operator
entails.
This will help us understand how the T transformation works by envisioning what function is being averaged.
The panel on the right portrays how the transformation $\exp\left(\frac{-u(c)}{\theta}\right)$ sends $u(c)$ to a new function by (i) flipping the sign, and (ii) increasing curvature in proportion to $\theta^{-1}$.
In the left panel, the red line is our tool for computing the mathematical expectation for different values of 𝜋.
The green dot indicates the mathematical expectation of $\exp\left(\frac{-u(c)}{\theta}\right)$ when $\pi = .5$.
Notice that the distance between the green dot and the curve is greater in the transformed space than the original space as a result of additional curvature.
The inverse transformation $-\theta \log E\left[\exp\left(\frac{-u(c)}{\theta}\right)\right]$ generates the green dot on the left panel that constitutes the risk-sensitive utility index.
The gap between the green dot and the red line on the left panel measures the additional adjustment for risk that risk-
sensitive preferences make relative to plain vanilla expected utility preferences.
is evidently a moment generating function for the random variable 𝑢(𝑐𝑖 ), while
$$
g(\theta^{-1}) \doteq \log \sum_{i=1}^I \pi_i \exp(-u(c_i)/\theta)
$$
$$
\mathsf{T} u(c) = \mu_u - \frac{1}{2\theta} \sigma_u^2 , \tag{25.16}
$$
which becomes expected utility $\mu_u$ when $\theta^{-1} = 0$.
The right side of equation (25.16) is a special case of stochastic differential utility preferences in which consumption plans are ranked not just by their expected utilities $\mu_u$ but also by the variances $\sigma_u^2$ of their utilities.
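For a discrete distribution of utilities, the right side of (25.16) agrees with the risk-sensitive operator only approximately, and the gap shrinks as 𝜃 grows. The sketch below illustrates this for a two-state example of our own choosing (equally likely states, logarithmic utility at 𝑐 = (3, 1)); the function names are illustrative.

```python
from math import exp, log


def risk_sensitive_T(u, π, θ):
    """Exact T u(c) = -θ log Σ π_i exp(-u_i / θ)."""
    return -θ * log(sum(p * exp(-u_i / θ) for p, u_i in zip(π, u)))


def mean_variance_approx(u, π, θ):
    """Right side of (25.16): μ_u - σ_u² / (2θ)."""
    μ = sum(p * u_i for p, u_i in zip(π, u))
    σ2 = sum(p * (u_i - μ) ** 2 for p, u_i in zip(π, u))
    return μ - σ2 / (2 * θ)


π = [0.5, 0.5]
u = [log(3.0), log(1.0)]            # u(c) = ln c at c = (3, 1)

# Absolute gap between the exact operator and the mean-variance formula
gaps = {θ: abs(risk_sensitive_T(u, π, θ) - mean_variance_approx(u, π, θ))
        for θ in (1.0, 10.0, 100.0)}
```

The remaining gap reflects higher-order cumulants of 𝑢(𝑐), which receive less weight as 𝜃 increases.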
A decision maker is said to have ex post Bayesian preferences when he ranks consumption plans according to the
expected utility function
where $\hat\pi(c^*)$ is the worst-case probability distribution associated with multiplier or constraint preferences evaluated at a particular consumption plan $c^* = \{c_i^*\}_{i=1}^I$.
At $c^*$, an ex post Bayesian’s indifference curves are tangent to those for multiplier and constraint preferences with appropriately chosen $\theta$ and $\eta$, respectively.
For the special case in which 𝐼 = 2, 𝑐1 = 2, 𝑐2 = 1, 𝑢(𝑐) = ln 𝑐, and 𝜋1 = .5, the following two figures depict how
worst-case probabilities are determined under constraint and multiplier preferences, respectively.
The first figure graphs entropy as a function of $\hat\pi_1$.
It also plots expected utility under the twisted probability distribution, namely, $\hat E u(c) = u(c_2) + \hat\pi_1 (u(c_1) - u(c_2))$, which is evidently a linear function of $\hat\pi_1$.
The entropy constraint $\sum_{i=1}^I \pi_i m_i \log m_i \leq \eta$ implies a convex set $\hat\Pi_1$ of $\hat\pi_1$’s that constrains the adversary who chooses $\hat\pi_1$, namely, the set of $\hat\pi_1$’s for which the entropy curve lies below the horizontal dotted line at an entropy level of $\eta = .25$.
Unless $u(c_1) = u(c_2)$, the $\hat\pi_1$ that minimizes $\hat E u(c)$ is at the boundary of the set $\hat\Pi_1$.
The next figure shows the function $\sum_{i=1}^I \pi_i m_i [u(c_i) + \theta \log m_i]$ that is to be minimized in the multiplier problem.
The argument of the function is 𝜋1̂ = 𝑚1 𝜋1 .
Evidently, from this figure and also from formula (25.12), lower values of 𝜃 lead to lower, and thus more distorted,
minimizing values of 𝜋1̂ .
The figure indicates how one can construct a Lagrange multiplier 𝜃 ̃ associated with a given entropy constraint 𝜂 and a
given consumption plan.
Thus, to draw the figure, we set the penalty parameter for multiplier preferences 𝜃 so that the minimizing 𝜋1̂ equals the
minimizing 𝜋1̂ for the constraint problem from the previous figure.
The penalty parameter $\theta = .42$ also equals the Lagrange multiplier $\tilde\theta$ on the entropy constraint for the constraint preferences depicted in the previous figure because the $\hat\pi_1$ that minimizes the asymmetric curve associated with penalty parameter $\theta = .42$ is the same $\hat\pi_1$ associated with the intersection of the entropy curve and the entropy constraint dashed vertical line.
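One way to recover that multiplier numerically is sketched below, assuming the worst-case distortion has the exponential form (25.6) and using bisection on 𝜃 (our choice of method, not necessarily the one used to draw the figures). The bracket `[lo, hi]`, the tolerance and the function names are illustrative assumptions.

```python
from math import exp, log


def worst_case_entropy(θ, u, π):
    """Entropy Σ π_i m_i log m_i of the exponentially twisted distortion m_i ∝ exp(-u_i / θ)."""
    weights = [exp(-u_i / θ) for u_i in u]
    norm = sum(p * w for p, w in zip(π, weights))
    m = [w / norm for w in weights]
    return sum(p * m_i * log(m_i) for p, m_i in zip(π, m) if m_i > 0)


def solve_theta(η, u, π, lo=0.05, hi=100.0, tol=1e-12):
    """Bisection on θ: worst-case entropy is above η at lo and below η at hi."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if worst_case_entropy(mid, u, π) > η:
            lo = mid            # too much distortion: raise θ
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


π = [0.5, 0.5]
u = [log(2.0), log(1.0)]        # u(c) = ln c at (c_1, c_2) = (2, 1)
θ_tilde = solve_theta(η=0.25, u=u, π=π)
```

For this example the solution is roughly 0.42, in line with the penalty parameter reported in the text.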
Formula (25.6) asserts that the decision maker acts as if he is pessimistic relative to an approximating model 𝜋.
It expresses what [Bucklew, 2004] [p. 27] calls a statistical version of Murphy’s law:
The probability of anything happening is in inverse ratio to its desirability.
The minimizing likelihood ratio 𝑚̂ slants worst-case probabilities 𝜋̂ exponentially to increase probabilities of events that
give lower utilities.
As expressed by the value function bound (25.19) to be displayed below, the decision maker uses pessimism instrumentally to protect himself against model misspecification.
The penalty parameter $\theta$ for multiplier preferences or the entropy level $\eta$ that determines the Lagrange multiplier $\tilde\theta$ for constraint preferences controls how adversely the decision maker exponentially slants probabilities.
A decision rule is said to be undominated in the sense of Bayesian decision theory if there exists a probability distribution
𝜋 for which it is optimal.
A decision rule is said to be admissible if it is undominated.
[Hansen and Sargent, 2008] use ex post Bayesian preferences to show that robust decision rules are undominated and
therefore admissible.
Indifference curves illuminate how concerns about robustness affect asset pricing and utility costs of fluctuations. For
𝐼 = 2, the slopes of the indifference curves for our five preference specifications are
• Expected utility:
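$$
\frac{d c_2}{d c_1} = - \frac{\pi_1}{\pi_2} \frac{u'(c_1)}{u'(c_2)}
$$
• Constraint preferences:
$$
\frac{d c_2}{d c_1} = - \frac{\hat\pi_1}{\hat\pi_2} \frac{u'(c_1)}{u'(c_2)}
$$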
where 𝜋1̂ , 𝜋2̂ are the minimizing probabilities computed from the worst-case distortions (25.6) from the constraint
problem at (𝑐1 , 𝑐2 ).
• Multiplier and risk-sensitive preferences:
When 𝑐1 > 𝑐2 , the exponential twisting formula (25.12) implies that 𝜋1̂ < 𝜋1 , which in turn implies that the indifference
curves through (𝑐1 , 𝑐2 ) for both constraint and multiplier preferences are flatter than the indifference curve associated
with expected utility preferences.
As we shall see soon when we discuss state price deflators, this gives rise to higher estimates of prices of risk.
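The flattening can be seen in a small numerical sketch. It assumes logarithmic utility, and it computes the multiplier slope by replacing 𝜋 with the twisted probabilities from (25.12), which is what differentiating (25.15) with respect to 𝑐1 and 𝑐2 delivers. The point (3, 1) and 𝜃 = 1 are illustrative choices, and the function names are ours.

```python
from math import exp, log


def slope_expected_utility(c1, c2, π1):
    """dc2/dc1 = -(π1 u'(c1)) / (π2 u'(c2)) with u(c) = ln c, so u'(c) = 1 / c."""
    return -(π1 / c1) / ((1 - π1) / c2)


def slope_multiplier(c1, c2, π1, θ):
    """Same formula with π replaced by the twisted probabilities from (25.12)."""
    w1 = π1 * exp(-log(c1) / θ)
    w2 = (1 - π1) * exp(-log(c2) / θ)
    π1_hat = w1 / (w1 + w2)
    return -(π1_hat / c1) / ((1 - π1_hat) / c2)


s_eu = slope_expected_utility(3.0, 1.0, 0.5)
s_mult = slope_multiplier(3.0, 1.0, 0.5, θ=1.0)   # θ = 1 is an illustrative choice
```

At this point the twisted probability of the high-consumption state is 0.25 rather than 0.5, so the slope falls in absolute value from 1/3 to 1/9.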
For an example with 𝑢(𝑐) = ln 𝑐, 𝐼 = 2, and 𝜋1 = .5, the next two figures show indifference curves for expected utility,
multiplier, and constraint preferences.
The following figure shows indifference curves going through a point along the 45 degree line.
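– Multiplier preferences: solve
$$
\bar u - \sum_i \pi_i \frac{\exp\left(\frac{-u(c_i)}{\theta}\right)}{\sum_j \pi_j \exp\left(\frac{-u(c_j)}{\theta}\right)} \left( u(c_i) + \theta \log\left( \frac{\exp\left(\frac{-u(c_i)}{\theta}\right)}{\sum_j \pi_j \exp\left(\frac{-u(c_j)}{\theta}\right)} \right) \right) = 0
$$
numerically
– Constraint preference: solve
$$
\bar u - \sum_i \pi_i \frac{\exp\left(\frac{-u(c_i)}{\theta^*}\right)}{\sum_j \pi_j \exp\left(\frac{-u(c_j)}{\theta^*}\right)} u(c_i) = 0
$$
numerically, where $\theta^*$ solves
$$
\sum_i \pi_i \frac{\exp\left(\frac{-u(c_i)}{\theta^*}\right)}{\sum_j \pi_j \exp\left(\frac{-u(c_j)}{\theta^*}\right)} \log\left( \frac{\exp\left(\frac{-u(c_i)}{\theta^*}\right)}{\sum_j \pi_j \exp\left(\frac{-u(c_j)}{\theta^*}\right)} \right) - \eta = 0
$$
numerically.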
Remark: It seems that the constraint problem is hard to solve in its original form, i.e. by finding the distorting measure
that minimizes the expected utility.
It seems that viewing equation (25.7) as a root finding problem works much better.
But notice that equation (25.7) does not always have a solution.
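Under $u = \log$, $c_1 = c_2 = 1$, we have:
$$
\sum_i \pi_i \frac{\exp\left(\frac{-u(c_i)}{\tilde\theta}\right)}{\sum_j \pi_j \exp\left(\frac{-u(c_j)}{\tilde\theta}\right)} \log\left( \frac{\exp\left(\frac{-u(c_i)}{\tilde\theta}\right)}{\sum_j \pi_j \exp\left(\frac{-u(c_j)}{\tilde\theta}\right)} \right) = 0
$$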
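A quick numerical check of this case, as a sketch with illustrative names: when utilities are equal across states, the exponentially twisted distortion is identically one for every multiplier, so the left side of (25.7) is zero and cannot equal any strictly positive 𝜂.

```python
from math import exp, log


def worst_case_entropy(θ, u, π):
    """Left side of (25.7): Σ π_i m_i log m_i with m_i ∝ exp(-u_i / θ)."""
    weights = [exp(-u_i / θ) for u_i in u]
    norm = sum(p * w for p, w in zip(π, weights))
    m = [w / norm for w in weights]
    return sum(p * m_i * log(m_i) for p, m_i in zip(π, m))


π = [0.5, 0.5]
u_flat = [log(1.0), log(1.0)]    # u = log and c_1 = c_2 = 1

# The distortion is m = (1, 1) for every θ, so the left side is always zero
entropies = [worst_case_entropy(θ, u_flat, π) for θ in (0.1, 1.0, 10.0)]
```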
Conjecture: when our numerical method fails, it is because the derivative of the objective doesn’t exist for our choice of parameters.
Remark: It is tricky to get the algorithm to work properly for all values of 𝑐1 . In particular, parameters were chosen
with graduate student descent.
Tangent indifference curves off 45 degree line
For a given $\eta$ and a given allocation $(c_1, c_2)$ off the 45 degree line, by solving equations (25.7) and (25.13), we can find $\tilde\theta(\eta, c)$ and $\tilde\eta(\theta, c)$ that make indifference curves for multiplier and constraint preferences tangent to one another.
The following figure shows indifference curves for multiplier and constraint preferences through a point off the 45 degree
line, namely, (𝑐(1), 𝑐(2)) = (3, 1), at which 𝜂 and 𝜃 are adjusted to render the indifference curves for constraint and
multiplier preferences tangent.
Note that all three lines of the left graph intersect at (1, 3). While the intersection at (3, 1) is hard-coded, the intersection
at (1,3) arises from the computation, which confirms that the code seems to be working properly.
As we move along the (kinked) indifference curve for the constraint preferences for a given $\eta$, the worst-case probabilities remain constant, but the Lagrange multiplier $\tilde\theta$ on the entropy constraint $\sum_{i=1}^I \pi_i m_i \log m_i \leq \eta$ varies with $(c_1, c_2)$.
As we move along the (smooth) indifference curve for the multiplier preferences for a given penalty parameter 𝜃, the
implied entropy 𝜂 ̃ from equation (25.13) and the worst-case probabilities both change with (𝑐1 , 𝑐2 ).
For constraint preferences, there is a kink in the indifference curve.
For ex post Bayesian preferences, there are effectively two sets of indifference curves depending on which side of the 45
degree line the (𝑐1 , 𝑐2 ) endowment point sits.
There are two sets of indifference curves because, while the worst-case probabilities differ above and below the 45 degree
line, the idea of ex post Bayesian preferences is to use a single probability distribution to compute expected utilities for
all consumption bundles.
Indifference curves through point (𝑐1 , 𝑐2 ) = (3, 1) for expected logarithmic utility (less curved smooth line), multiplier
(more curved line), constraint (solid line kinked at 45 degree line), and ex post Bayesian (dotted lines) preferences. The
worst-case probability 𝜋1̂ < .5 when 𝑐1 = 3 > 𝑐2 = 1 and 𝜋1̂ > .5 when 𝑐1 = 1 < 𝑐2 = 3.
Concerns about model uncertainty boost prices of risk that are embedded in state-price deflators. With complete markets,
let 𝑞𝑖 be the price of consumption in state 𝑖.
The budget set of a representative consumer having endowment $\bar c = \{\bar c_i\}_{i=1}^I$ is expressed by $\sum_i q_i (c_i - \bar c_i) \leq 0$.
When a representative consumer has multiplier preferences, the state prices are
$$
q_i = \pi_i \hat m_i u'(\bar c_i) = \pi_i \left( \frac{\exp(-u(\bar c_i)/\theta)}{\sum_j \pi_j \exp(-u(\bar c_j)/\theta)} \right) u'(\bar c_i) . \tag{25.18}
$$
The worst-case likelihood ratio 𝑚̂ 𝑖 operates to increase prices 𝑞𝑖 in relatively low utility states 𝑖.
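Formula (25.18) can be evaluated directly. The sketch below assumes logarithmic utility and a three-state endowment of our own choosing; the function name `state_prices` and the value 𝜃 = 1 are illustrative assumptions.

```python
from math import exp, log


def state_prices(c_bar, π, θ=None):
    """
    q_i = π_i m_i u'(c_i) with u(c) = ln c.

    θ=None gives expected utility (m_i = 1); a finite θ applies the
    exponential twisting in (25.18).
    """
    if θ is None:
        m = [1.0] * len(c_bar)
    else:
        weights = [exp(-log(c) / θ) for c in c_bar]
        norm = sum(p * w for p, w in zip(π, weights))
        m = [w / norm for w in weights]
    return [p * m_i / c for p, m_i, c in zip(π, m, c_bar)]


π = [0.3, 0.4, 0.3]          # illustrative probabilities
c_bar = [1.0, 2.0, 3.0]      # illustrative endowment
q_eu = state_prices(c_bar, π)
q_mult = state_prices(c_bar, π, θ=1.0)
ratios = [qm / qe for qm, qe in zip(q_mult, q_eu)]   # equals the distortion m_i
```

The ratio of multiplier to expected utility state prices exceeds one in the lowest-consumption state and falls below one in the highest-consumption state.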
State prices agree under multiplier and constraint preferences when 𝜂 and 𝜃 are adjusted according to (25.7) or (25.13)
to make the indifference curves tangent at the endowment point.
The next figure can help us think about state-price deflators under our different preference orderings.
The figure shows the budget line and indifference curves through point $(c_1, c_2) = (3, 1)$ for expected logarithmic utility, multiplier, constraint (kinked at 45 degree line), and ex post Bayesian (dotted lines) preferences.
Figure 2.7:
Because budget constraints are linear, asset prices are identical under multiplier and constraint preferences for which $\theta$ and $\eta$ are adjusted to verify (25.7) or (25.13) at a given consumption endowment $\{c_i\}_{i=1}^I$.
However, as we note next, though they are tangent at the endowment point, the fact that indifference curves differ for
multiplier and constraint preferences means that certainty equivalent consumption compensations of the kind that [Lucas,
1987], [Hansen et al., 1999], [Tallarini, 2000], and [Barillas et al., 2009] used to measure the costs of business cycles
must differ.
For each of our five types of preferences, the following figure allows us to construct a certainty equivalent point (𝑐∗ , 𝑐∗ )
on the 45 degree line that renders the consumer indifferent between it and the risky point (𝑐(1), 𝑐(2)) = (3, 1).
Figure 2.8:
The figure indicates that the certainty equivalent level 𝑐∗ is higher for the consumer with expected utility preferences than
for the consumer with multiplier preferences, and that it is higher for the consumer with multiplier preferences than for
the consumer with constraint preferences.
The gap between these certainty equivalents measures the uncertainty aversion of the multiplier preferences or constraint
preferences consumer.
The gap between the expected value .5𝑐(1) + .5𝑐(2) at point A and the certainty equivalent for the expected utility
decision maker at point B is a measure of his risk aversion.
The gap between points 𝐵 and 𝐶 measures the multiplier preference consumer’s aversion to model uncertainty.
The gap between points B and D measures the constraint preference consumer’s aversion to model uncertainty.
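The ordering of points A through D can be reproduced with a short calculation. This sketch assumes logarithmic utility, equally likely states, and an illustrative penalty parameter 𝜃 = 1; for the constraint preferences it assumes that 𝜂 is set so that 𝜃 is the Lagrange multiplier at the risky point, so the two preferences share the same worst-case probabilities there.

```python
from math import exp, log

π = [0.5, 0.5]
c = [3.0, 1.0]                 # the risky point (c(1), c(2)) = (3, 1)
u = [log(c_i) for c_i in c]    # logarithmic utility
θ = 1.0                        # illustrative penalty parameter

# Point A: expected consumption
expected_c = sum(p * c_i for p, c_i in zip(π, c))

# Point B: expected utility certainty equivalent, u(c*) = Σ π_i u(c_i)
ce_expected_utility = exp(sum(p * u_i for p, u_i in zip(π, u)))

# Point C: multiplier preferences, u(c*) = -θ log Σ π_i exp(-u_i / θ)
T_u = -θ * log(sum(p * exp(-u_i / θ) for p, u_i in zip(π, u)))
ce_multiplier = exp(T_u)

# Point D: constraint preferences with η set so that θ is its Lagrange
# multiplier at c, so u(c*) is expected utility under the worst-case π_hat
weights = [p * exp(-u_i / θ) for p, u_i in zip(π, u)]
π_hat = [w / sum(weights) for w in weights]
ce_constraint = exp(sum(ph * u_i for ph, u_i in zip(π_hat, u)))
```

With these choices the four values are 2, about 1.73, 1.5 and about 1.32, so each additional concern lowers the certainty equivalent.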
The following figures show iso-entropy and iso-utility lines for the special case in which $I = 3$, $\pi_1 = .3$, $\pi_2 = .4$, and the utility function is $u(c) = \frac{c^{1-\alpha}}{1-\alpha}$ with $\alpha = 0$ and $\alpha = 3$, respectively, for the fixed plan $c(1) = 1$, $c(2) = 2$, $c(3) = 3$.
The iso-utility lines are the level curves of
m = m_unnormalized / (π * m_unnormalized).sum()
The inequality in the last line just asserts that minimizers minimize.
Therefore, we have the following useful bound:
$$
\sum_{i=1}^I m_i \pi_i u(c_i) \geq \mathsf{T}_\theta u(c) - \theta \sum_{i=1}^I \pi_i m_i \log m_i . \tag{25.19}
$$
The left side is expected utility under the probability distribution {𝑚𝑖 𝜋𝑖 }.
The right side is a lower bound on expected utility under all distributions expressed as an affine function of relative entropy $\sum_{i=1}^I \pi_i m_i \log m_i$.
The intercept in the bound is the risk-sensitive criterion $\mathsf{T}_\theta u(c)$, while the slope is $-\theta$, the negative of the penalty parameter.
Lowering 𝜃 does two things:
• it lowers the intercept T𝜃 𝑢(𝑐), which makes the bound less informative for small values of entropy; and
• it lowers the absolute value of the slope, which makes the bound more informative for larger values of relative entropy $\sum_{i=1}^I \pi_i m_i \log m_i$.
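The bound (25.19) can be verified on a grid of twisted distributions. The sketch below is our own two-state illustration (equally likely states, logarithmic utility at 𝑐 = (3, 1), and 𝜃 = 1); `bound_slack` is a hypothetical helper that returns the left side of (25.19) minus its right side.

```python
from math import exp, log

π = [0.5, 0.5]
u = [log(3.0), log(1.0)]       # u(c) = ln c at c = (3, 1), an illustrative plan
θ = 1.0                        # illustrative penalty parameter

T_u = -θ * log(sum(p * exp(-u_i / θ) for p, u_i in zip(π, u)))


def bound_slack(π1_hat):
    """Left side of (25.19) minus its right side for the distortion m = π_hat / π."""
    π_hat = [π1_hat, 1 - π1_hat]
    m = [ph / p for ph, p in zip(π_hat, π)]
    expected_u = sum(p * m_i * u_i for p, m_i, u_i in zip(π, m, u))
    entropy = sum(p * m_i * log(m_i) for p, m_i in zip(π, m))
    return expected_u - (T_u - θ * entropy)


# Slack on a grid of twisted probabilities strictly inside (0, 1)
grid = [k / 100 for k in range(1, 100)]
slacks = [bound_slack(π1_hat) for π1_hat in grid]
```

The slack is nonnegative everywhere on the grid and vanishes at the worst-case probability, which is 0.25 for the high-consumption state in this example.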
The following figure reports best-case and worst-case expected utilities.
We calculate the lines in this figure numerically by solving optimization problems with respect to the change of measure.
In this figure, expected utility is on the co-ordinate axis while entropy is on the ordinate axis.
The lower curved line depicts expected utility under the worst-case model associated with each value of entropy $\eta$ recorded on the ordinate axis, i.e., it is $\sum_{i=1}^I \pi_i \tilde m_i(\tilde\theta(c, \eta)) u(c_i)$, where $\tilde m_i(\tilde\theta(\eta)) \propto \exp\left(\frac{-u(c_i)}{\tilde\theta}\right)$ and $\tilde\theta$ is the Lagrange multiplier associated with the constraint that entropy cannot exceed the value on the ordinate axis.
The higher curved line depicts expected utility under the best-case model indexed by the value of the Lagrange multiplier $\check\theta > 0$ associated with each value of entropy less than or equal to $\eta$ recorded on the ordinate axis, i.e., it is $\sum_{i=1}^I \pi_i \check m_i(\check\theta(\eta)) u(c_i)$ where $\check m_i(\check\theta(c, \eta)) \propto \exp\left(\frac{u(c_i)}{\check\theta}\right)$.
Thus, as 𝜃 is lowered, T𝜃 𝑢(𝑐) becomes a more conservative estimate of expected utility under the approximating model
𝜋.
However, as 𝜃 is lowered, the robustness bound (25.19) becomes more informative for sufficiently large values of entropy.
The slope of the straight line depicting a bound is $-\theta$, and the projection onto the entropy axis of the point of tangency with the curve depicting
the lower bound of expected utility is the entropy associated with that $\theta$ when it is interpreted as a Lagrange multiplier
on the entropy constraint in the constraint problem.
This is an application of the envelope theorem.
Beyond the helpful mathematical fact that it leads directly to convenient exponential twisting formulas (25.6) and (25.12)
for worst-case probability distortions, there are two related justifications for using entropy to measure discrepancies
between probability distributions.
One arises from the role of entropy in statistical tests for discriminating between models.
The other comes from axioms.
Robust control theory starts with a decision maker who has constructed a good baseline approximating model whose free
parameters he has estimated to fit historical data well.
The decision maker recognizes that actual outcomes might be generated by one of a vast number of other models that fit
the historical data nearly as well as his.
Therefore, he wants to evaluate outcomes under a set of alternative models that are plausible in the sense of being statis-
tically close to his model.
He uses relative entropy to quantify what close means.
[Anderson et al., 2003] and [Barillas et al., 2009] describe links between entropy and large deviations bounds on test
statistics for discriminating between models, in particular, statistics that describe the probability of making an error in
applying a likelihood ratio test to decide whether model A or model B generated a data record of length 𝑇 .
For a given sample size, an informative bound on the detection error probability is a function of the entropy parameter 𝜂
in constraint preferences. [Anderson et al., 2003] and [Barillas et al., 2009] use detection error probabilities to calibrate
reasonable values of 𝜂.
[Anderson et al., 2003] and [Hansen and Sargent, 2008] also use detection error probabilities to calibrate reasonable
values of the penalty parameter 𝜃 in multiplier preferences.
For a fixed sample size and a fixed 𝜃, they would calculate the worst-case 𝑚̂ 𝑖 (𝜃), an associated entropy 𝜂(𝜃), and an
associated detection error probability. In this way they build up a detection error probability as a function of 𝜃.
They then invert this function to calibrate 𝜃 to deliver a reasonable detection error probability.
To indicate outcomes from this approach, the following figure plots the histogram for U.S. quarterly consumption growth
along with a representative agent’s approximating density and a worst-case density that [Barillas et al., 2009] show imply
high measured market prices of risk even when a representative consumer has the unit coefficient of relative risk aversion
associated with a logarithmic one-period utility function.
The density for the approximating model is log 𝑐𝑡+1 − log 𝑐𝑡 = 𝜇 + 𝜎𝑐 𝜖𝑡+1 where 𝜖𝑡+1 ∼ 𝑁 (0, 1) and 𝜇 and 𝜎𝑐 are
estimated by maximum likelihood from the U.S. quarterly data in the histogram over the period 1948.I-2006.IV.
The consumer’s value function under logarithmic utility implies that the worst-case model is
$\log c_{t+1} - \log c_t = (\mu + \sigma_c w) + \sigma_c \tilde\epsilon_{t+1}$, where $\{\tilde\epsilon_{t+1}\}$ is also a normalized Gaussian random sequence and where $w$ is calculated by setting a
detection error probability to .05.
The worst-case model appears to fit the histogram nearly as well as the approximating model.
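To make the calibration step concrete, here is a minimal sketch for the simplest case, in which the approximating and worst-case models are IID Gaussian growth processes that differ only by a mean shift of $\sigma_c w$. Under that assumption, and with equal prior weights on the two models, the detection error probability of a likelihood ratio test on $T$ observations is $\Phi(-\sqrt{T}|w|/2)$. The function names are hypothetical, and the sample length $T = 236$ is simply the number of quarters in 1948.I-2006.IV; this is an illustration of the idea, not a reproduction of the calculations in [Barillas et al., 2009].

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def detection_error_prob(w, T):
    """
    Average of the two error probabilities of a likelihood ratio test between
    N(mu, sigma^2) and N(mu + sigma * w, sigma^2) IID growth rates, given T
    observations and equal prior weights on the two models.
    """
    return norm.cdf(-0.5 * np.sqrt(T) * abs(w))

def calibrate_w(p_target, T):
    """Find the negative mean distortion w that delivers p_target."""
    return brentq(lambda w: detection_error_prob(w, T) - p_target,
                  -10.0, -1e-12)

T = 236                       # quarters in 1948.I-2006.IV
w = calibrate_w(0.05, T)      # distortion in units of sigma_c
print(w, detection_error_prob(w, T))
```

A negative `w` lowers mean consumption growth, which is the direction a pessimistic consumer would worry about; the sign convention is an assumption of this sketch.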
Multiplier and constraint preferences are both special cases of what [Maccheroni et al., 2006] call variational preferences.
They provide an axiomatic foundation for variational preferences and describe how they express ambiguity aversion.
Constraint preferences are particular instances of the multiple priors model of [Gilboa and Schmeidler, 1989].
TWENTYSIX
ETYMOLOGY OF ENTROPY
In information theory [Shannon and Weaver, 1949], entropy is a measure of the unpredictability of a random variable.
To illustrate things, let 𝑋 be a discrete random variable taking values 𝑥1 , … , 𝑥𝑛 with probabilities 𝑝𝑖 = Prob(𝑋 = 𝑥𝑖 ) ≥
0, ∑𝑖 𝑝𝑖 = 1.
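Claude Shannon’s [Shannon and Weaver, 1949] definition of entropy is
$$
H(p) = \sum_i p_i \log\left(\frac{1}{p_i}\right), \tag{26.1}
$$
where states with $p_i = 0$ contribute nothing to the sum because
$$
\lim_{p \downarrow 0} p \log p = \lim_{p \downarrow 0} \frac{\log p}{p^{-1}} = \lim_{p \downarrow 0} (-p) = 0.
$$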
For a discrete random variable $X$ with probability density $p = \{p_i\}_{i=1}^n$, the surprisal for state $i$ is $s_i = \log\left(\frac{1}{p_i}\right)$.
The quantity $\log\left(\frac{1}{p_i}\right)$ is called the surprisal because it is inversely related to the likelihood that state $i$ will occur.
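As a small illustration, the sketch below computes surprisals and then entropy as the probability-weighted average of surprisals. The probability vector is a made-up example, and natural logarithms are used; any other base only rescales the numbers.

```python
import numpy as np

def surprisal(p):
    """Surprisal log(1 / p_i) of each state; assumes all p_i > 0."""
    p = np.asarray(p, dtype=float)
    return np.log(1.0 / p)

def entropy(p):
    """Expected surprisal, skipping zero-probability states (0 log 0 = 0)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(1.0 / p[nz])))

p = np.array([0.5, 0.25, 0.125, 0.125])   # hypothetical distribution
s = surprisal(p)
H = entropy(p)
print(s, H)
```

Less likely states carry larger surprisal, and the entropy of this particular distribution equals $1.75 \log 2$.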
26.2.1 Example
26.2.2 Example
Take an $n$-sided possibly unfair die with a probability distribution $\{p_i\}_{i=1}^n$. The die is fair if $p_i = \frac{1}{n} \ \forall i$.
For a discrete random variable with probability vector 𝑝, entropy 𝐻(𝑝) is a function that satisfies
• 𝐻 is continuous.
• 𝐻 is symmetric: 𝐻(𝑝1 , 𝑝2 , … , 𝑝𝑛 ) = 𝐻(𝑝𝑟1 , … , 𝑝𝑟𝑛 ) for any permutation 𝑟1 , … , 𝑟𝑛 of 1, … , 𝑛.
• A uniform distribution maximizes $H(p)$: $H(p_1, \ldots, p_n) \leq H(\frac{1}{n}, \ldots, \frac{1}{n})$.
• Maximum entropy increases with the number of states: $H(\frac{1}{n}, \ldots, \frac{1}{n}) \leq H(\frac{1}{n+1}, \ldots, \frac{1}{n+1})$.
• Entropy is not affected by events of zero probability.
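The properties listed above can be spot-checked for Shannon's entropy. The sketch below draws one random probability vector (an arbitrary illustration, not a proof) and verifies symmetry, the maximizing role of the uniform distribution, growth of maximum entropy with the number of states, and irrelevance of zero-probability events.

```python
import numpy as np

def entropy(p):
    """Shannon entropy with natural logs and the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))

rng = np.random.default_rng(1)
n = 5
p = rng.dirichlet(np.ones(n))               # a random point on the simplex
uniform_n = np.full(n, 1 / n)
uniform_n1 = np.full(n + 1, 1 / (n + 1))

symmetric = np.isclose(entropy(p), entropy(rng.permutation(p)))
uniform_max = entropy(p) <= entropy(uniform_n)
grows_with_n = entropy(uniform_n) <= entropy(uniform_n1)
zero_event_irrelevant = np.isclose(entropy(p), entropy(np.append(p, 0.0)))

print(symmetric, uniform_max, grows_with_n, zero_event_irrelevant)
```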
Let $(X, Y)$ be a bivariate discrete random vector with outcomes $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$, respectively, occurring with
probability density $p(x_i, y_j)$.
Conditional entropy $H(X|Y)$ is defined as
$$
H(X|Y) = \sum_{i,j} p(x_i, y_j) \log \frac{p(y_j)}{p(x_i, y_j)}. \tag{26.2}
$$
Here $\frac{p(y_j)}{p(x_i, y_j)}$, the reciprocal of the conditional probability of $x_i$ given $y_j$, can be defined as the conditional surprisal.
$$
\frac{p(x_i, y_j)}{p(y_j)} = \sum_j p(x_i, y_j) = p(x_i) \ \forall i.
$$
Thus, among all joint distributions with identical marginal distributions, the joint distribution that maximizes conditional entropy
makes $x$ and $y$ independent.
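The independence result can be illustrated with a two-by-two example. The sketch below evaluates (26.2) for an independent joint distribution and for a dependent one with the same marginals; the particular numbers are hypothetical.

```python
import numpy as np

def conditional_entropy(joint):
    """H(X|Y) from (26.2), where joint[i, j] = p(x_i, y_j)."""
    joint = np.asarray(joint, dtype=float)
    p_y = joint.sum(axis=0)                        # marginal of Y
    p_y_grid = np.broadcast_to(p_y, joint.shape)   # p(y_j) in every row
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(p_y_grid[nz] / joint[nz])))

p_x = np.array([0.2, 0.8])
p_y = np.array([0.5, 0.5])

independent = np.outer(p_x, p_y)
dependent = np.array([[0.15, 0.05],     # same marginals, but x and y
                      [0.35, 0.45]])    # are no longer independent

H_ind = conditional_entropy(independent)
H_dep = conditional_entropy(dependent)
H_x = float(-np.sum(p_x * np.log(p_x)))  # unconditional entropy of X

print(H_ind, H_dep, H_x)
```

Under independence, conditioning on $Y$ conveys nothing about $X$, so $H(X|Y)$ equals the unconditional entropy of $X$; the dependent joint distribution yields a strictly smaller value.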
26.6 Thermodynamics
Let $X$ be a discrete state space $x_1, \ldots, x_n$ and let $p$ and $q$ be two discrete probability distributions on $X$.
Assume that $\frac{p_i}{q_i} \in (0, \infty)$ for all $i$ for which $p_i > 0$.
Then the Kullback-Leibler statistical divergence, also called relative entropy, is defined as
$$
D(p|q) = \sum_i p_i \log\left(\frac{p_i}{q_i}\right) = \sum_i q_i \left(\frac{p_i}{q_i}\right) \log\left(\frac{p_i}{q_i}\right). \tag{26.4}
$$
Evidently,
For a continuous random variable, Kullback-Leibler divergence between two densities $p$ and $q$ is defined as
$$
D(p|q) = \int p(x) \log\left(\frac{p(x)}{q(x)}\right) dx.
$$
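For the discrete definition (26.4), a short sketch shows three familiar features: the divergence is nonnegative, it is zero when the two distributions coincide, and it is not symmetric in its arguments. The two distributions are arbitrary examples.

```python
import numpy as np

def kl_divergence(p, q):
    """D(p|q) = sum_i p_i log(p_i / q_i); requires q_i > 0 wherever p_i > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0                       # states with p_i = 0 contribute nothing
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))

p = np.array([0.5, 0.3, 0.2])
q = np.array([1 / 3, 1 / 3, 1 / 3])

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
d_pp = kl_divergence(p, p)

print(d_pq, d_qp, d_pp)
```

When $q$ is uniform on $n$ states, $D(p|q)$ equals $\log n$ minus the Shannon entropy of $p$, which links relative entropy back to the earlier definition.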
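We want to compute relative entropy for two continuous densities $\phi$ and $\hat\phi$ when $\phi$ is $N(0, I)$ and $\hat\phi$ is $N(w, \Sigma)$, where
the covariance matrix $\Sigma$ is nonsingular.
We seek a formula for
$$
\text{ent} = \int (\log \hat\phi(\varepsilon) - \log \phi(\varepsilon)) \hat\phi(\varepsilon) \, d\varepsilon.
$$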
Claim
$$
\text{ent} = -\frac{1}{2} \log \det \Sigma + \frac{1}{2} w'w + \frac{1}{2} \text{trace}(\Sigma - I). \tag{26.5}
$$
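Before turning to the proof, the claim can be checked numerically in the scalar case. The sketch below compares the right side of (26.5) with a direct numerical integration of the definition of ent. The values of `w` and `var` are hypothetical, and the integration range $[-20, 20]$ is an assumption that is harmless here because the distorted density has negligible mass outside it.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def ent_formula(w, Sigma):
    """Right side of (26.5) for mean vector w and covariance matrix Sigma."""
    w = np.atleast_1d(w).astype(float)
    Sigma = np.atleast_2d(Sigma).astype(float)
    n = len(w)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * logdet + 0.5 * w @ w + 0.5 * np.trace(Sigma - np.eye(n))

w, var = 0.7, 1.8                       # hypothetical scalar example
phi = norm(0, 1)                        # baseline density
phi_hat = norm(w, np.sqrt(var))         # distorted density

def integrand(e):
    """(log phi_hat - log phi) weighted by phi_hat."""
    return (phi_hat.logpdf(e) - phi.logpdf(e)) * phi_hat.pdf(e)

ent_numeric, _ = quad(integrand, -20, 20, epsabs=1e-12, epsrel=1e-12)
ent_closed = ent_formula(w, var)

print(ent_numeric, ent_closed)
```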
Proof
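The log likelihood ratio is
$$
\log \hat\phi(\varepsilon) - \log \phi(\varepsilon) = \frac{1}{2} \left[ -(\varepsilon - w)' \Sigma^{-1} (\varepsilon - w) + \varepsilon'\varepsilon - \log \det \Sigma \right]. \tag{26.6}
$$
Observe that
$$
-\int \frac{1}{2} (\varepsilon - w)' \Sigma^{-1} (\varepsilon - w) \hat\phi(\varepsilon) \, d\varepsilon = -\frac{1}{2} \text{trace}(I).
$$
Applying the identity $\varepsilon = w + (\varepsilon - w)$ gives
$$
\frac{1}{2} \varepsilon'\varepsilon = \frac{1}{2} w'w + \frac{1}{2} (\varepsilon - w)'(\varepsilon - w) + w'(\varepsilon - w).
$$
Taking mathematical expectations
$$
\frac{1}{2} \int \varepsilon'\varepsilon \, \hat\phi(\varepsilon) \, d\varepsilon = \frac{1}{2} w'w + \frac{1}{2} \text{trace}(\Sigma).
$$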
Combining terms gives
𝑆 = −trace(𝑃 ln 𝑃 )
After flipping signs, [Backus et al., 2014] use Kullback-Leibler relative entropy as a measure of volatility of stochastic
discount factors that they assert is useful for characterizing features of both the data and various theoretical models of
stochastic discount factors.
Where $p_{t+1}$ is the physical or true measure, $p^*_{t+1}$ is the risk-neutral measure, and $E_t$ denotes conditional expectation
under the $p_{t+1}$ measure, [Backus et al., 2014] define entropy as
$$
L_t(p^*_{t+1}/p_{t+1}) = -E_t \log(p^*_{t+1}/p_{t+1}). \tag{26.9}
$$
𝐸𝑡 (𝑚𝑡+1 𝑟𝑡+1 ) = 1
which they propose as a complement to a Hansen-Jagannathan [Hansen and Jagannathan, 1991] bound.
Let $\{x_t\}_{t=-\infty}^\infty$ be a covariance stationary stochastic process with mean zero and spectral density $S_x(\omega)$.
The variance of $x$ is
$$
\sigma_x^2 = \left(\frac{1}{2\pi}\right) \int_{-\pi}^{\pi} S_x(\omega) \, d\omega.
$$
As described in chapter XIV of [Sargent, 1987], the Wiener-Kolmogorov formula for the one-period ahead prediction
error is
$$
\sigma_\epsilon^2 = \exp\left[ \left(\frac{1}{2\pi}\right) \int_{-\pi}^{\pi} \log S_x(\omega) \, d\omega \right]. \tag{26.11}
$$
Occasionally the logarithm of the one-step-ahead prediction error 𝜎𝜖2 is called entropy because it measures unpredictabil-
ity.
Consider the following problem reminiscent of one described earlier.
Problem:
Among all covariance stationary univariate processes with unconditional variance 𝜎𝑥2 , find a process with maximal one-
step-ahead prediction error.
The maximizer is a process with a flat spectral density which, under the normalization used in the variance formula above, is
$$
S_x(\omega) = \sigma_x^2 .
$$
Thus, among all univariate covariance stationary processes with variance 𝜎𝑥2 , a process with a flat spectral density is the
most uncertain, in the sense of one-step-ahead prediction error variance.
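The variance formula and (26.11) can be illustrated with a first-order autoregression $x_t = \rho x_{t-1} + \sigma \epsilon_t$, whose spectral density under the $\frac{1}{2\pi}$ normalization used above is $S_x(\omega) = \sigma^2 / |1 - \rho e^{-i\omega}|^2$. The AR(1) example and its parameter values are assumptions made for illustration. The sketch approximates both integrals by averages over a uniform frequency grid and then repeats the prediction-error calculation for a flat spectral density with the same variance.

```python
import numpy as np

def spectral_density_ar1(omega, rho, sigma):
    """Spectral density of x_t = rho * x_{t-1} + sigma * eps_t."""
    return sigma**2 / np.abs(1 - rho * np.exp(-1j * omega))**2

rho, sigma = 0.8, 1.5                      # hypothetical parameters

# Averaging over a uniform grid on [-pi, pi) approximates (1/2pi) * integral
omega = np.linspace(-np.pi, np.pi, 20_000, endpoint=False)
S = spectral_density_ar1(omega, rho, sigma)

variance = np.mean(S)                          # variance formula
pred_error_var = np.exp(np.mean(np.log(S)))    # Wiener-Kolmogorov (26.11)

# A flat spectral density with the same variance
S_flat = np.full_like(omega, variance)
pred_error_flat = np.exp(np.mean(np.log(S_flat)))

print(variance, pred_error_var, pred_error_flat)
```

For the autoregression, the one-step-ahead prediction error variance is $\sigma^2$, well below the unconditional variance $\sigma^2/(1-\rho^2)$; for the flat spectrum the two coincide, which is the sense in which white noise is the least predictable process with a given variance.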
This no-patterns-across-time outcome for a temporally dependent process resembles the no-pattern-across-states outcome
for the static entropy maximizing coin or die in the classic information theoretic analysis described above.
Let
$$
y_t = D(L) \epsilon_t \equiv \sum_{j=0}^\infty D_j \epsilon_{t-j}
$$
be a Wold representation for $y$, where $D(0)\epsilon_t$ is a vector of one-step-ahead errors in predicting $y_t$ conditional on the
infinite history $y^{t-1} = [y_{t-1}, y_{t-2}, \ldots]$ and $\epsilon_t$ is an $n \times 1$ vector of serially uncorrelated random disturbances with mean
zero and identity contemporaneous covariance matrix $E \epsilon_t \epsilon_t' = I$.
Linear-least-squares predictors have one-step-ahead prediction error $D(0)D(0)'$ that satisfies
$$
\log \det[D(0)D(0)'] = \left(\frac{1}{2\pi}\right) \int_{-\pi}^{\pi} \log \det[S_y(\omega)] \, d\omega. \tag{26.12}
$$
Being a measure of the unpredictability of an 𝑛×1 vector covariance stationary stochastic process, the left side of (26.12)
is sometimes called entropy.
Chapter 8 of [Hansen and Sargent, 2008] adapts work in the control theory literature to define a frequency domain
entropy criterion for robust control as
where $\theta \in (\underline\theta, +\infty)$ is a positive robustness parameter and $G_F(\zeta)$ is a $\zeta$-transform of the objective function.
Hansen and Sargent [Hansen and Sargent, 2008] show that criterion (26.13) can be represented as
Let $x$ be a continuous random variable with density $\phi(x)$, and let $g(x)$ be a nonnegative random variable satisfying
$\int g(x) \phi(x) dx = 1$.
The relative entropy of the distorted density $\hat\phi(x) = g(x)\phi(x)$ is defined as
$$
\text{ent}(g) = \int g(x) \log g(x) \phi(x) \, dx.
$$
Fig. 26.2 plots the functions 𝑔 log 𝑔 and 𝑔 − 1 over the interval 𝑔 ≥ 0.
That relative entropy ent(𝑔) ≥ 0 can be established by noting (a) that 𝑔 log 𝑔 ≥ 𝑔 − 1 (see Fig. 26.2) and (b) that under
𝜙, 𝐸𝑔 = 1.
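Both steps of this argument can be checked numerically. The sketch below first verifies the pointwise inequality $g \log g \geq g - 1$ on a grid, and then computes $E g$ and $E g \log g$ under a baseline density when $g$ is the ratio of a $\mathcal{N}(0, 1)$ density to a $\mathcal{N}(0, 1.5)$ density. Reading 1.5 as a variance is an assumption of this sketch, as is the integration range $[-15, 15]$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# (a) Pointwise inequality g log g >= g - 1 on a grid of positive g
g_grid = np.linspace(1e-6, 5, 1000)
pointwise_ok = np.all(g_grid * np.log(g_grid) >= g_grid - 1 - 1e-12)

# (b) Likelihood ratio of N(0, 1) to N(0, 1.5), with 1.5 taken as a variance
phi = norm(0, np.sqrt(1.5))     # baseline (denominator) density
phi_hat = norm(0, 1)            # distorted (numerator) density

def g(x):
    """Likelihood ratio of the distorted to the baseline density."""
    return phi_hat.pdf(x) / phi.pdf(x)

E_g, _ = quad(lambda x: g(x) * phi.pdf(x), -15, 15,
              epsabs=1e-12, epsrel=1e-12)
ent_g, _ = quad(lambda x: g(x) * np.log(g(x)) * phi.pdf(x), -15, 15,
                epsabs=1e-12, epsrel=1e-12)

print(pointwise_ok, E_g, ent_g)
```

Since $E g = 1$ under the baseline density, taking expectations of both sides of the pointwise inequality gives $E g \log g \geq 0$, and the computed relative entropy is indeed positive.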
Fig. 26.3 and Fig. 26.4 display aspects of relative entropy visually for a continuous random variable 𝑥 for two densities
with likelihood ratio 𝑔 ≥ 0.
Where the numerator density is 𝒩(0, 1), for two denominator Gaussian densities 𝒩(0, 1.5) and 𝒩(0, .95), respectively,
Fig. 26.3 and Fig. 26.4 display the functions 𝑔 log 𝑔 and 𝑔 − 1 as functions of 𝑥.
Fig. 26.2: The function 𝑔 log 𝑔 for 𝑔 ≥ 0. For a random variable 𝑔 with 𝐸𝑔 = 1, 𝐸𝑔 log 𝑔 ≥ 0.
Fig. 26.3: Graphs of 𝑔 log 𝑔 and 𝑔 − 1 where 𝑔 is the ratio of the density of a 𝒩(0, 1) random variable to the density of
a 𝒩(0, 1.5) random variable. Under the 𝒩(0, 1.5) density, 𝐸𝑔 = 1.
Fig. 26.4: Graphs of 𝑔 log 𝑔 and 𝑔 − 1 where 𝑔 is the ratio of the density of a 𝒩(0, 1) random variable to the density of a 𝒩(0, .95)
random variable. Under the 𝒩(0, .95) density, 𝐸𝑔 = 1.
TWENTYSEVEN
ROBUSTNESS
In addition to what’s in Anaconda, this lecture will need the following libraries:
27.1 Overview
This lecture modifies a Bellman equation to express a decision-maker’s doubts about transition dynamics.
His specification doubts make the decision-maker want a robust decision rule.
Robust means insensitive to misspecification of transition dynamics.
The decision-maker has a single approximating model of the transition dynamics.
He calls it approximating to acknowledge that he doesn’t completely trust it.
He fears that transition dynamics are actually determined by another model that he cannot describe explicitly.
All that he knows is that the actual data-generating model is in some (uncountable) set of models that surrounds his
approximating model.
He quantifies the discrepancy between his approximating model and the genuine data-generating model by using a quantity
called entropy.
(We’ll explain what entropy means below)
He wants a decision rule that will work well enough no matter which of those other models actually governs outcomes.
This is what it means for his decision rule to be “robust to misspecification of an approximating model”.
This may sound like too much to ask for, but ….
… a secret weapon is available to design robust decision rules.
The secret weapon is max-min control theory.
A value-maximizing decision-maker enlists the aid of an (imaginary) value-minimizing model chooser to construct bounds
on the value attained by a given decision rule under different models of the transition dynamics.
The original decision-maker uses those bounds to construct a decision rule with an assured performance level, no matter
which model actually governs outcomes.
Note: In reading this lecture, please don’t think that our decision-maker is paranoid when he conducts a worst-case
analysis. By designing a rule that works well against a worst-case, his intention is to construct a rule that will work well
across a set of models.
import pandas as pd
import numpy as np
from scipy.linalg import eig
import matplotlib.pyplot as plt
import quantecon as qe
Our “robust” decision-maker wants to know how well a given rule will work when he does not know a single transition
law ….
… he wants to know sets of values that will be attained by a given decision rule 𝐹 under a set of transition laws.
Ultimately, he wants to design a decision rule 𝐹 that shapes the set of values in ways that he prefers.
With this in mind, consider the following graph, which relates to a particular decision problem to be explained below
• Value refers to a sum of discounted rewards obtained by applying the decision rule 𝐹 when the state starts at some
fixed initial state 𝑥0 .
• Entropy is a non-negative number that measures the size of a set of models surrounding the decision-maker’s ap-
proximating model.
– Entropy is zero when the set includes only the approximating model, indicating that the decision-maker com-
pletely trusts the approximating model.
– Entropy is bigger, and the set of surrounding models is bigger, the less the decision-maker trusts the approx-
imating model of the transition dynamics.
The shaded region indicates that for all models having entropy less than or equal to the number on the horizontal axis,
the value obtained will be somewhere within the indicated set of values.
Now let’s compare sets of values associated with two different decision rules, 𝐹𝑟 and 𝐹𝑏 .
In the next figure,
• The red set shows the value-entropy correspondence for decision rule 𝐹𝑟 .
• The blue set shows the value-entropy correspondence for decision rule 𝐹𝑏 .
Notice that the less robust rule 𝐹𝑟 promises higher values for small misspecifications (small entropy).
(But it is more fragile in the sense that it is more sensitive to perturbations of the approximating model)
Below we’ll explain in detail how to construct these sets of values for a given 𝐹 , but for now ….
Here is a hint about the secret weapons we’ll use to construct these sets
• We’ll use some min problems to construct the lower bounds
• We’ll use some max problems to construct the upper bounds
We will also describe how to choose 𝐹 to shape the sets of values.
This will involve crafting a skinnier set at the cost of a lower level (at least for low values of entropy).
If you want to understand more about why one serious quantitative researcher is interested in this approach, we recom-
mend Lars Peter Hansen’s Nobel lecture.
For simplicity, we present ideas in the context of a class of problems with linear transition laws and quadratic objective
functions.
To fit in with our earlier lecture on LQ control, we will treat loss minimization rather than value maximization.
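To begin, recall the infinite horizon LQ problem, where an agent chooses a sequence of controls $\{u_t\}$ to minimize
$$
\sum_{t=0}^\infty \beta^t \{ x_t' R x_t + u_t' Q u_t \} \tag{27.1}
$$
subject to the linear law of motion
$$
x_{t+1} = A x_t + B u_t + C w_{t+1}, \qquad t = 0, 1, 2, \ldots \tag{27.2}
$$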
As before,
• 𝑥𝑡 is 𝑛 × 1, 𝐴 is 𝑛 × 𝑛
• 𝑢𝑡 is 𝑘 × 1, 𝐵 is 𝑛 × 𝑘
• 𝑤𝑡 is 𝑗 × 1, 𝐶 is 𝑛 × 𝑗
• 𝑅 is 𝑛 × 𝑛 and 𝑄 is 𝑘 × 𝑘
We also allow for model uncertainty on the part of the agent solving this optimization problem.
In particular, the agent takes 𝑤𝑡 = 0 for all 𝑡 ≥ 0 as a benchmark model but admits the possibility that this model might
be wrong.
As a consequence, she also considers a set of alternative models expressed in terms of sequences {𝑤𝑡 } that are more or
less “close” to the zero sequence.
She seeks a policy that will do well enough for a set of alternative models whose members are pinned down by sequences
{𝑤𝑡 }.
A sequence {𝑤𝑡 } might represent
• nonlinearities absent from the approximating model
• time variations in parameters of the approximating model
• omitted state variables in the approximating model
• neglected history dependencies …
• and other potential sources of misspecification
Soon we’ll quantify the quality of a model specification in terms of the maximal size of the discounted sum
$\sum_{t=0}^\infty \beta^{t+1} w_{t+1}' w_{t+1}$.
If our agent takes {𝑤𝑡 } as a given deterministic sequence, then, drawing on ideas in earlier lectures on dynamic program-
ming, we can anticipate Bellman equations such as
where
and 𝐼 is a 𝑗 × 𝑗 identity matrix. Substituting this expression for the maximum into (27.3) yields
𝑃 = ℬ(𝒟(𝑃 ))
The operator ℬ is the standard (i.e., non-robust) LQ Bellman operator, and 𝑃 = ℬ(𝑃 ) is the standard matrix Riccati
equation coming from the Bellman equation — see this discussion.
Under some regularity conditions (see [Hansen and Sargent, 2008]), the operator ℬ ∘ 𝒟 has a unique positive definite
fixed point, which we denote below by 𝑃 ̂ .
A robust policy, indexed by 𝜃, is 𝑢 = −𝐹 ̂ 𝑥 where
We also define
The interpretation of $\hat K$ is that $w_{t+1} = \hat K x_t$ on the worst-case path of $\{x_t\}$, in the sense that this vector is the maximizer
of (27.4) evaluated at the fixed rule $u = -\hat F x$.
Note that 𝑃 ̂ , 𝐹 ̂ , 𝐾̂ are all determined by the primitives and 𝜃.
Note also that if 𝜃 is very large, then 𝒟 is approximately equal to the identity mapping.
Hence, when 𝜃 is large, 𝑃 ̂ and 𝐹 ̂ are approximately equal to their standard LQ values.
Furthermore, when 𝜃 is large, 𝐾̂ is approximately equal to zero.
Conversely, smaller 𝜃 is associated with greater fear of model misspecification and greater concern for robustness.
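The role of $\theta$ can be seen directly in the operator $\mathcal{D}$. The sketch below assumes the standard form $\mathcal{D}(P) = P + PC(\theta I - C'PC)^{-1}C'P$, which is consistent with the worst-case covariance matrix that appears later in this lecture but is not displayed in this passage. The function name `d_operator_sketch` and the matrices `P` and `C` are hypothetical.

```python
import numpy as np

def d_operator_sketch(P, C, theta):
    """D(P) = P + P C (theta I - C' P C)^{-1} C' P  (assumed standard form)."""
    j = C.shape[1]
    M = theta * np.eye(j) - C.T @ P @ C      # must be positive definite
    return P + P @ C @ np.linalg.solve(M, C.T @ P)

P = np.array([[2.0, 0.5],
              [0.5, 1.0]])     # hypothetical positive definite matrix
C = np.array([[0.0],
              [1.0]])          # hypothetical volatility matrix

# Distance between D(P) and P for increasing values of theta
thetas = (2.0, 20.0, 2000.0)
gaps = [np.linalg.norm(d_operator_sketch(P, C, theta) - P)
        for theta in thetas]

print(gaps)
```

As $\theta$ grows, the distance between $\mathcal{D}(P)$ and $P$ shrinks toward zero, so the robust Bellman operator $\mathcal{B} \circ \mathcal{D}$ approaches the ordinary one. In this example $C'PC = 1$, so $\theta$ must exceed 1 for the inverse to be well behaved.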
What we have done above can be interpreted in terms of a two-person zero-sum game in which 𝐹 ̂ , 𝐾̂ are Nash equilibrium
objects.
Agent 1 is our original agent, who seeks to minimize loss in the LQ program while admitting the possibility of misspec-
ification.
Agent 2 is an imaginary malevolent player.
Agent 2’s malevolence helps the original agent to compute bounds on his value function across a set of models.
We begin with agent 2’s problem.
Agent 2
1. knows a fixed policy 𝐹 specifying the behavior of agent 1, in the sense that 𝑢𝑡 = −𝐹 𝑥𝑡 for all 𝑡
2. responds by choosing a shock sequence {𝑤𝑡 } from a set of paths sufficiently close to the benchmark sequence
{0, 0, 0, …}
A natural way to say “sufficiently close to the zero sequence” is to restrict the summed inner product $\sum_{t=1}^\infty w_t' w_t$ to be
small.
However, to obtain a time-invariant recursive formulation, it turns out to be convenient to restrict a discounted inner
product
$$
\sum_{t=1}^\infty \beta^t w_t' w_t \leq \eta \tag{27.9}
$$
Now let 𝐹 be a fixed policy, and let 𝐽𝐹 (𝑥0 , w) be the present-value cost of that policy given sequence w ∶= {𝑤𝑡 } and
initial condition 𝑥0 ∈ ℝ𝑛 .
Substituting $-F x_t$ for $u_t$ in (27.1), this value can be written as
$$
J_F(x_0, \mathbf{w}) := \sum_{t=0}^\infty \beta^t x_t' (R + F'QF) x_t \tag{27.10}
$$
where
or, equivalently,
$$
\min_{\mathbf{w}} \sum_{t=0}^\infty \beta^t \{ -x_t' (R + F'QF) x_t + \beta\theta w_{t+1}' w_{t+1} \} \tag{27.12}
$$
subject to (27.11).
What’s striking about this optimization problem is that it is once again an LQ discounted dynamic programming problem,
with w = {𝑤𝑡 } as the sequence of controls.
The expression for the optimal policy can be found by applying the usual LQ formula (see here).
We denote it by 𝐾(𝐹 , 𝜃), with the interpretation 𝑤𝑡+1 = 𝐾(𝐹 , 𝜃)𝑥𝑡 .
The remaining step for agent 2’s problem is to set $\theta$ to enforce the constraint (27.9), which can be done by choosing
$\theta = \theta_\eta$ such that
$$
\beta \sum_{t=0}^\infty \beta^t x_t' K(F, \theta_\eta)' K(F, \theta_\eta) x_t = \eta \tag{27.13}
$$
Here 𝑥𝑡 is given by (27.11) — which in this case becomes 𝑥𝑡+1 = (𝐴 − 𝐵𝐹 + 𝐶𝐾(𝐹 , 𝜃))𝑥𝑡 .
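Agent 2's problem can be sketched with plain NumPy. The code below treats (27.12) as an ordinary discounted LQ problem with state matrix $A - BF$, control matrix $C$, state loss $-(R + F'QF)$ and control loss $\beta\theta I$, iterates on the associated Riccati equation, and then evaluates the discounted entropy on the left side of (27.13) by direct summation. The function names, the scalar example and all parameter values are hypothetical; this is an illustration of the logic and not the QuantEcon implementation.

```python
import numpy as np

def F_to_K_sketch(F, A, B, C, R, Q, beta, theta, tol=1e-12, max_iter=10_000):
    """Riccati iteration for agent 2's LQ problem given the policy u = -F x."""
    A_F = A - B @ F                          # state transition under u = -F x
    R2 = -(R + F.T @ Q @ F)                  # agent 2's loss on the state
    Q2 = beta * theta * np.eye(C.shape[1])   # agent 2's loss on the shock w
    P = np.zeros_like(A)
    for _ in range(max_iter):
        G = np.linalg.solve(Q2 + beta * C.T @ P @ C, beta * C.T @ P @ A_F)
        P_new = R2 + beta * A_F.T @ P @ A_F - beta * A_F.T @ P @ C @ G
        converged = np.max(np.abs(P_new - P)) < tol
        P = P_new
        if converged:
            break
    G = np.linalg.solve(Q2 + beta * C.T @ P @ C, beta * C.T @ P @ A_F)
    return -G, P                             # w_{t+1} = K x_t with K = -G

def deterministic_entropy(K, F, A, B, C, beta, x0, T=5_000):
    """beta * sum_t beta^t x_t' K' K x_t along x_{t+1} = (A - B F + C K) x_t."""
    A_K = A - B @ F + C @ K
    x, ent = x0, 0.0
    for t in range(T):
        w = K @ x
        ent += beta**(t + 1) * (w.T @ w).item()
        x = A_K @ x
    return ent

# Hypothetical scalar example, stored as 1 x 1 arrays
A, B, C = np.array([[0.9]]), np.array([[1.0]]), np.array([[1.0]])
R, Q, F = np.array([[1.0]]), np.array([[1.0]]), np.array([[0.5]])
beta, x0 = 0.95, np.array([[1.0]])

K_hi, _ = F_to_K_sketch(F, A, B, C, R, Q, beta, theta=50.0)
K_lo, _ = F_to_K_sketch(F, A, B, C, R, Q, beta, theta=5.0)
ent_hi = deterministic_entropy(K_hi, F, A, B, C, beta, x0)
ent_lo = deterministic_entropy(K_lo, F, A, B, C, beta, x0)

print(K_hi, K_lo, ent_hi, ent_lo)
```

Lowering $\theta$ makes distortions cheaper for agent 2, so the worst-case feedback `K` and the associated entropy both rise. Inverting this relationship numerically is one way to find the $\theta_\eta$ that satisfies (27.13) for a target $\eta$.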
Define the minimized object on the right side of problem (27.12) as 𝑅𝜃 (𝑥0 , 𝐹 ).
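Because “minimizers minimize” we have
$$
R_\theta(x_0, F) \leq \sum_{t=0}^\infty \beta^t \{ -x_t' (R + F'QF) x_t \} + \beta\theta \sum_{t=0}^\infty \beta^t w_{t+1}' w_{t+1},
$$
for any sequence w and its associated path $x_{t+1} = (A - BF) x_t + C w_{t+1}$ from the given initial condition $x_0$.
Rearranging gives
$$
R_\theta(x_0, F) - \theta \, \text{ent} \leq \sum_{t=0}^\infty \beta^t \{ -x_t' (R + F'QF) x_t \} \tag{27.14}
$$
where
$$
\text{ent} := \beta \sum_{t=0}^\infty \beta^t w_{t+1}' w_{t+1}
$$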
The left side of inequality (27.14) is a straight line with slope −𝜃.
Technically, it is a “separating hyperplane”.
At a particular value of entropy, the line is tangent to the lower bound of values as a function of entropy.
In particular, the lower bound on the left side of (27.14) is attained when
$$
\text{ent} = \beta \sum_{t=0}^\infty \beta^t x_t' K(F, \theta)' K(F, \theta) x_t \tag{27.15}
$$
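To construct the lower bound on the set of values associated with all perturbations w satisfying the entropy constraint
(27.9) at a given entropy level, we proceed as follows:
• For a given $\theta$, solve the minimization problem (27.12).
• Compute the minimized value $R_\theta(x_0, F)$ and the associated entropy using (27.15).
• Compute the lower bound on the value function $R_\theta(x_0, F) - \theta \, \text{ent}$ and plot it against ent.
• Repeat the preceding three steps for a range of values of $\theta$ to trace out the lower bound.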
Note: This procedure sweeps out a set of separating hyperplanes indexed by different values for the Lagrange multiplier
𝜃.
where
$$
\text{ent} \equiv \beta \sum_{t=0}^\infty \beta^t w_{t+1}' w_{t+1}
$$
The left side of inequality (27.17) is a straight line with slope $\tilde\theta$.
The upper bound on the left side of (27.17) is attained when
$$
\text{ent} = \beta \sum_{t=0}^\infty \beta^t x_t' K(F, \tilde\theta)' K(F, \tilde\theta) x_t \tag{27.18}
$$
To construct the upper bound on the set of values associated with all perturbations w with a given entropy, we proceed much
as we did for the lower bound:
• For a given $\tilde\theta$, solve the maximization problem (27.16).
• Compute the maximized value $V_{\tilde\theta}(x_0, F)$ and the associated entropy using (27.18).
• Compute the upper bound on the value function $V_{\tilde\theta}(x_0, F) + \tilde\theta \, \text{ent}$ and plot it against ent.
• Repeat the preceding three steps for a range of values of $\tilde\theta$ to trace out the upper bound.
Now in the interest of reshaping these sets of values by choosing 𝐹 , we turn to agent 1’s problem.
subject to
Once again, the expression for the optimal policy can be found here — we denote it by 𝐹 ̃ .
Clearly, the 𝐹 ̃ we have obtained depends on 𝐾, which, in agent 2’s problem, depended on an initial policy 𝐹 .
Holding all other parameters fixed, we can represent this relationship as a mapping Φ, where
𝐹 ̃ = Φ(𝐾(𝐹 , 𝜃))
Now we turn to the stochastic case, where the sequence {𝑤𝑡 } is treated as an IID sequence of random vectors.
In this setting, we suppose that our agent is uncertain about the conditional probability distribution of 𝑤𝑡+1 .
The agent takes the standard normal distribution 𝑁 (0, 𝐼) as the baseline conditional distribution, while admitting the
possibility that other “nearby” distributions prevail.
These alternative conditional distributions of 𝑤𝑡+1 might depend nonlinearly on the history 𝑥𝑠 , 𝑠 ≤ 𝑡.
To implement this idea, we need a notion of what it means for one distribution to be near another one.
Here we adopt a very useful measure of closeness for distributions known as the relative entropy, or Kullback-Leibler
divergence.
For densities $p, q$, the Kullback-Leibler divergence of $q$ from $p$ is defined as
$$
D_{KL}(p, q) := \int \ln\left[ \frac{p(x)}{q(x)} \right] p(x) \, dx
$$
Using this notation, we replace (27.3) with the stochastic analog
$$
J(x) = \min_u \max_{\psi \in \mathcal{P}} \left\{ x'Rx + u'Qu + \beta \left[ \int J(Ax + Bu + Cw) \, \psi(dw) - \theta D_{KL}(\psi, \phi) \right] \right\} \tag{27.22}
$$
Here 𝒫 represents the set of all densities on ℝ𝑛 and 𝜙 is the benchmark distribution 𝑁 (0, 𝐼).
The distribution $\psi$ is chosen as the least desirable conditional distribution in terms of next period outcomes, while taking
into account the penalty term $\theta D_{KL}(\psi, \phi)$.
This penalty term plays a role analogous to the one played by the deterministic penalty 𝜃𝑤′ 𝑤 in (27.3), since it discourages
large deviations from the benchmark.
The maximization problem in (27.22) appears highly nontrivial — after all, we are maximizing over an infinite dimen-
sional space consisting of the entire set of densities.
However, it turns out that the solution is tractable, and in fact also falls within the class of normal distributions.
First, we note that 𝐽 has the form 𝐽 (𝑥) = 𝑥′ 𝑃 𝑥 + 𝑑 for some positive definite matrix 𝑃 and constant real number 𝑑.
Moreover, it turns out that if (𝐼 − 𝜃−1 𝐶 ′ 𝑃 𝐶)−1 is nonsingular, then
where
Substituting the expression for the maximum into Bellman equation (27.22) and using $J(x) = x'Px + d$ gives
$$
x'Px + d = \min_u \left\{ x'Rx + u'Qu + \beta (Ax + Bu)' \mathcal{D}(P) (Ax + Bu) + \beta [d + \kappa(\theta, P)] \right\} \tag{27.25}
$$
Since constant terms do not affect minimizers, the solution is the same as (27.6), leading to
To solve this Bellman equation, we take 𝑃 ̂ to be the positive definite fixed point of ℬ ∘ 𝒟.
In addition, we take $\hat d$ as the real number solving $d = \beta [d + \kappa(\theta, P)]$, which is
$$
\hat d := \frac{\beta}{1 - \beta} \kappa(\theta, P) \tag{27.26}
$$
The robust policy in this stochastic case is the minimizer in (27.25), which is once again 𝑢 = −𝐹 ̂ 𝑥 for 𝐹 ̂ given by (27.7).
Substituting the robust policy into (27.24) we obtain the worst-case shock distribution:
$$
w_{t+1} \sim N(\hat K x_t, (I - \theta^{-1} C' \hat P C)^{-1})
$$
Before turning to implementation, we briefly outline how to compute several other quantities of interest.
One thing we will be interested in doing is holding a policy fixed and computing the discounted loss associated with that
policy.
So let 𝐹 be a given policy and let 𝐽𝐹 (𝑥) be the associated loss, which, by analogy with (27.22), satisfies
Writing 𝐽𝐹 (𝑥) = 𝑥′ 𝑃𝐹 𝑥 + 𝑑𝐹 and applying the same argument used to derive (27.23) we get
and
$$
d_F := \frac{\beta}{1 - \beta} \kappa(\theta, P_F) = \frac{\beta}{1 - \beta} \theta \ln[\det(I - \theta^{-1} C' P_F C)^{-1}] \tag{27.27}
$$
If you skip ahead to the appendix, you will be able to verify that −𝑃𝐹 is the solution to the Bellman equation in agent 2’s
problem discussed above — we use this in our computations.
27.6 Implementation
The QuantEcon.py package provides a class called RBLQ for implementation of robust LQ optimal control.
The code can be found on GitHub.
Here is a brief description of the methods of the class
• d_operator() and b_operator() implement 𝒟 and ℬ respectively
• robust_rule() and robust_rule_simple() both solve for the triple 𝐹 ̂ , 𝐾,̂ 𝑃 ̂ , as described in equations
(27.7) – (27.8) and the surrounding discussion
– robust_rule() is more efficient
– robust_rule_simple() is more transparent and easier to follow
• K_to_F() and F_to_K() solve the decision problems of agent 1 and agent 2 respectively
• compute_deterministic_entropy() computes the left-hand side of (27.13)
• evaluate_F() computes the loss and entropy associated with a given policy — see this discussion
27.7 Application
Let us consider a monopolist similar to this one, but now facing model uncertainty.
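The inverse demand function is $p_t = a_0 - a_1 y_t + d_t$, where
$$
d_{t+1} = \rho d_t + \sigma_d w_{t+1}, \quad \{w_t\} \stackrel{\text{IID}}{\sim} N(0, 1)
$$
The monopolist’s period return function is
$$
r_t = p_t y_t - \gamma \frac{(y_{t+1} - y_t)^2}{2} - c y_t
$$
Its objective is to maximize expected discounted profits, or, equivalently, to minimize $\mathbb{E} \sum_{t=0}^\infty \beta^t (-r_t)$.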
To form a linear regulator problem, we take the state and control to be
$$
x_t = \begin{bmatrix} 1 \\ y_t \\ d_t \end{bmatrix} \quad \text{and} \quad u_t = y_{t+1} - y_t
$$
Setting $b := (a_0 - c)/2$ we define
$$
R = -\begin{bmatrix} 0 & b & 0 \\ b & -a_1 & 1/2 \\ 0 & 1/2 & 0 \end{bmatrix} \quad \text{and} \quad Q = \gamma/2
$$
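As a quick sanity check on these definitions, the sketch below confirms that the quadratic form $x_t' R x_t + Q u_t^2$ reproduces $-r_t$. The parameter values match those used in the code for this application, while the test point `(y, y_next, d)` is arbitrary and the helper names `profit` and `lq_loss` are hypothetical.

```python
import numpy as np

a_0, a_1, c, γ = 100.0, 0.5, 2.0, 50.0
b = (a_0 - c) / 2

R = -np.array([[0., b, 0.],
               [b, -a_1, 0.5],
               [0., 0.5, 0.]])
Q = γ / 2

def profit(y, y_next, d):
    """Period return r_t computed directly from its definition."""
    p = a_0 - a_1 * y + d
    return p * y - γ * (y_next - y)**2 / 2 - c * y

def lq_loss(y, y_next, d):
    """Period loss x' R x + Q u^2 with x = (1, y, d) and u = y_next - y."""
    x = np.array([1.0, y, d])
    u = y_next - y
    return x @ R @ x + Q * u**2

y, y_next, d = 3.0, 3.5, 0.2      # arbitrary test point
print(profit(y, y_next, d), lq_loss(y, y_next, d))
```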
The standard normal distribution for 𝑤𝑡 is understood as the agent’s baseline, with uncertainty parameterized by 𝜃.
We compute value-entropy correspondences for two policies
1. The no concern for robustness policy 𝐹0 , which is the ordinary LQ loss minimizer.
2. A “moderate” concern for robustness policy 𝐹𝑏 , with 𝜃 = 0.02.
The code for producing the graph shown above, with blue being for the robust policy, is as follows
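# Model parameters
a_0 = 100
a_1 = 0.5
ρ = 0.9
σ_d = 0.05
β = 0.95
c = 2
γ = 50.0
θ = 0.02
ac = (a_0 - c) / 2.0  # The constant b in the text
# Define LQ matrices
R = np.array([[0., ac, 0.],
              [ac, -a_1, 0.5],
              [0., 0.5, 0.]])
R = -R  # For minimization
Q = γ / 2
# Transition matrices implied by x_t = (1, y_t, d_t)' and u_t = y_{t+1} - y_t
A = np.diag([1., 1., ρ])
B = np.array([[0.], [1.], [0.]])
C = np.array([[0.], [0.], [σ_d]])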
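# ----------------------------------------------------------------------- #
# Functions
# ----------------------------------------------------------------------- #
def evaluate_policy(θ, F):
    """
    Given θ (scalar, dtype=float) and policy F (array_like), returns the
    value associated with that policy under the worst case path for {w_t},
    as well as the entropy level.
    """
    # Body reconstructed from the docstring; it assumes that
    # RBLQ.evaluate_F returns (K_F, P_F, d_F, O_F, o_F) and that x_0 = (1, 0, 0)'
    rlq = qe.RBLQ(Q, R, A, B, C, β, θ)
    K_F, P_F, d_F, O_F, o_F = rlq.evaluate_F(F)
    x0 = np.array([[1.], [0.], [0.]])
    value = -x0.T @ P_F @ x0 - d_F
    entropy = x0.T @ O_F @ x0 + o_F
    # Extract scalars explicitly to avoid NumPy's array-to-scalar warning
    return [np.asarray(value).item(), np.asarray(entropy).item()]


# The def line is missing from the source; the function name and the default
# grid_size are assumptions
def value_and_entropy(emax, F, bw, grid_size=1000):
    """
    Compute the value function and entropy levels for a θ path
    increasing until it reaches the specified target entropy value.

    Parameters
    ==========
    emax: scalar
        The target entropy value
    F: array_like
        The policy function to be evaluated
    bw: str
        A string specifying whether the implied shock path follows best
        or worst assumptions. The only acceptable values are 'best' and
        'worst'.

    Returns
    =======
    df: pd.DataFrame
        A pandas DataFrame containing the value function and entropy
        values up to the emax parameter. The columns are 'value' and
        'entropy'.
    """
    if bw == 'worst':
        θs = 1 / np.linspace(1e-8, 1000, grid_size)
    else:
        θs = -1 / np.linspace(1e-8, 1000, grid_size)
    # Empty table indexed by θ, filled row by row (reconstructed line)
    df = pd.DataFrame(index=θs, columns=('value', 'entropy'))
    for θ in θs:
        df.loc[θ] = evaluate_policy(θ, F)
        if df.loc[θ, 'entropy'] >= emax:
            break
    df = df.dropna(how='any')
    return df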
# ------------------------------------------------------------------------ #
# Main
# ------------------------------------------------------------------------ #
emax = 1.6e6
fig, ax = plt.subplots()
ax.set_xlim(0, emax)
ax.set_ylabel("Value")
ax.set_xlabel("Entropy")
ax.grid()
class Curve:
plt.show()
27.8 Appendix
We sketch the proof only of the first claim in this section, which is that, for any given $\theta$, $K(\hat F, \theta) = \hat K$, where $\hat K$ is as
given in (27.8).
This is the content of the next lemma.
Lemma. If 𝑃 ̂ is the fixed point of the map ℬ ∘ 𝒟 and 𝐹 ̂ is the robust policy as given in (27.7), then
Proof: As a first step, observe that when 𝐹 = 𝐹 ̂ , the Bellman equation associated with the LQ problem (27.11) – (27.12)
is
(revisit this discussion if you don’t know where (27.29) comes from) and the optimal policy is
Using the definition of 𝒟, we can rewrite the right-hand side more simply as
Although it involves a substantial amount of algebra, it can be shown that the latter is just 𝑃 ̂ .
TWENTYEIGHT
In addition to what’s in Anaconda, this lecture will need the following libraries:
28.1 Overview
import numpy as np
import quantecon as qe
from scipy.linalg import solve
import matplotlib.pyplot as plt
Decisions of two agents affect the motion of a state vector that appears as an argument of payoff functions of both agents.
As described in Markov perfect equilibrium, when decision-makers have no concerns about the robustness of their de-
cision rules to misspecifications of the state dynamics, a Markov perfect equilibrium can be computed via backward
recursion on two sets of equations
• a pair of Bellman equations, one for each agent.
• a pair of equations that express linear decision rules for each agent as functions of that agent’s continuation value
function as well as parameters of preferences and state transition matrices.
This lecture shows how a similar equilibrium concept and similar computational procedures apply when we impute con-
cerns about robustness to both decision-makers.
A Markov perfect equilibrium with robust agents will be characterized by
• a pair of Bellman equations, one for each agent.
• a pair of equations that express linear decision rules for each agent as functions of that agent’s continuation value
function as well as parameters of preferences and state transition matrices.
• a pair of equations that express linear decision rules for worst-case shocks for each agent as functions of that agent’s
continuation value function as well as parameters of preferences and state transition matrices.
Below, we’ll construct a robust firms version of the classic duopoly model with adjustment costs analyzed in Markov
perfect equilibrium.
As we saw in Markov perfect equilibrium, the study of Markov perfect equilibria in dynamic games with two players
leads us to an interrelated pair of Bellman equations.
In linear quadratic dynamic games, these “stacked Bellman equations” become “stacked Riccati equations” with a tractable
mathematical structure.
We consider a general linear quadratic regulator game with two players, each of whom fears model misspecifications.
We often call the players agents.
The agents share a common baseline model for the transition dynamics of the state vector
• this is a counterpart of a ‘rational expectations’ assumption of shared beliefs
But now one or more agents doubt that the baseline model is correctly specified.
The agents express the possibility that their baseline specification is incorrect by adding a contribution 𝐶𝑣𝑖𝑡 to the time
𝑡 transition law for the state
• 𝐶 is the usual volatility matrix that appears in stochastic versions of optimal linear regulator problems.
• 𝑣𝑖𝑡 is a possibly history-dependent vector of distortions to the dynamics of the state that agent 𝑖 uses to represent
misspecification of the original model.
For convenience, we’ll start with a finite horizon formulation, where 𝑡0 is the initial date and 𝑡1 is the common terminal
date.
Player 𝑖 takes a sequence {𝑢−𝑖𝑡 } as given and chooses a sequence {𝑢𝑖𝑡 } to minimize and {𝑣𝑖𝑡 } to maximize
$$
\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' R_i x_t + u_{it}' Q_i u_{it} + u_{-it}' S_i u_{-it} + 2 x_t' W_i u_{it} + 2 u_{-it}' M_i u_{it} - \theta_i v_{it}' v_{it} \right\} \tag{28.1}
$$
Here
• 𝑥𝑡 is an 𝑛 × 1 state vector, 𝑢𝑖𝑡 is a 𝑘𝑖 × 1 vector of controls for player 𝑖, and
subject to
where
• $\Lambda_{it} := A - B_{-i} F_{-it}$
• $\Pi_{it} := R_i + F_{-it}' S_i F_{-it}$
• $\Gamma_{it} := W_i' - M_i' F_{-it}$
This is an LQ robust dynamic programming problem of the type studied in the Robustness lecture, which can be solved
by working backward.
Maximization with respect to distortion 𝑣1𝑡 leads to the following version of the 𝒟 operator from the Robustness lecture,
namely
The matrix 𝐹1𝑡 in the policy rule 𝑢1𝑡 = −𝐹1𝑡 𝑥𝑡 that solves agent 1’s problem satisfies
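$$
F_{1t} = (Q_1 + \beta B_1' \mathcal{D}_1(P_{1t+1}) B_1)^{-1} (\beta B_1' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} + \Gamma_{1t}) \tag{28.6}
$$
$$
P_{1t} = \Pi_{1t} - (\beta B_1' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} + \Gamma_{1t})' (Q_1 + \beta B_1' \mathcal{D}_1(P_{1t+1}) B_1)^{-1} (\beta B_1' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} + \Gamma_{1t}) + \beta \Lambda_{1t}' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} \tag{28.7}
$$
$$
F_{2t} = (Q_2 + \beta B_2' \mathcal{D}_2(P_{2t+1}) B_2)^{-1} (\beta B_2' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} + \Gamma_{2t}) \tag{28.8}
$$
$$
P_{2t} = \Pi_{2t} - (\beta B_2' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} + \Gamma_{2t})' (Q_2 + \beta B_2' \mathcal{D}_2(P_{2t+1}) B_2)^{-1} (\beta B_2' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} + \Gamma_{2t}) + \beta \Lambda_{2t}' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} \tag{28.9}
$$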
As in Markov perfect equilibrium, a key insight here is that equations (28.6) and (28.8) are linear in 𝐹1𝑡 and 𝐹2𝑡 .
After these equations have been solved, we can take 𝐹𝑖𝑡 and solve for 𝑃𝑖𝑡 in (28.7) and (28.9).
Notice how 𝑗’s control law 𝐹𝑗𝑡 is a function of {𝐹𝑖𝑠 , 𝑠 ≥ 𝑡, 𝑖 ≠ 𝑗}.
Thus, agent 𝑖’s choice of {𝐹𝑖𝑡 ; 𝑡 = 𝑡0 , … , 𝑡1 − 1} influences agent 𝑗’s choice of control laws.
However, in the Markov perfect equilibrium of this game, each agent is assumed to ignore the influence that his choice
exerts on the other agent’s choice.
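As a rough illustration of one backward step for agent 1, the sketch below takes the rival's rule 𝐹2 and the continuation matrix as given and applies (28.6) and (28.7) as printed. The form of the 𝒟 operator, 𝒟(𝑃) = 𝑃 + 𝑃𝐶(𝜃𝐼 − 𝐶′𝑃𝐶)⁻¹𝐶′𝑃, is the one that appears in the nnash_robust listing later in this lecture. The helper names `d_operator` and `agent1_step`, and all numerical inputs, are assumptions made for this example only.

```python
import numpy as np

def d_operator(P, C, θ):
    """D(P) = P + P C (θ I - C' P C)^{-1} C' P (hypothetical helper)."""
    I = np.eye(C.shape[1])
    return P + P @ C @ np.linalg.solve(θ * I - C.T @ P @ C, C.T @ P)

def agent1_step(P_next, F2, A, B1, B2, C, R1, Q1, S1, W1, M1, θ1, β):
    """One backward step for agent 1, taking the rival's rule F2 as given."""
    D = d_operator(P_next, C, θ1)
    Λ = A - B2 @ F2                      # Λ_1t
    Π = R1 + F2.T @ S1 @ F2              # Π_1t
    Γ = W1.T - M1.T @ F2                 # Γ_1t
    N = β * B1.T @ D @ Λ + Γ             # term shared by (28.6) and (28.7)
    F1 = np.linalg.solve(Q1 + β * B1.T @ D @ B1, N)   # (28.6)
    P1 = Π - N.T @ F1 + β * Λ.T @ D @ Λ               # (28.7)
    return F1, P1

# Illustrative inputs (assumed values, chosen only so that the step runs)
a0, a1, γ, β, θ1 = 10.0, 2.0, 12.0, 0.96, 0.02
A = np.eye(3)
B1 = np.array([[0.0], [1.0], [0.0]])
B2 = np.array([[0.0], [0.0], [1.0]])
C = np.array([[0.0], [0.01], [0.01]])
R1 = np.array([[0.0, -a0 / 2, 0.0],
               [-a0 / 2, a1, a1 / 2],
               [0.0, a1 / 2, 0.0]])
Q1 = np.array([[γ]])
S1 = np.zeros((1, 1))
W1 = np.zeros((3, 1))
M1 = np.zeros((1, 1))
F2 = np.zeros((1, 3))      # rival's rule, taken as given
P_next = np.eye(3)         # continuation matrix, taken as given

F1, P1 = agent1_step(P_next, F2, A, B1, B2, C, R1, Q1, S1, W1, M1, θ1, β)
print(F1)
```

When 𝜃 is very large the correction term in `d_operator` vanishes, so the step collapses to the ordinary (non-robust) Riccati step.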
After these equations have been solved, we can also deduce associated sequences of worst-case shocks.
𝑣𝑖𝑡 = 𝐾𝑖𝑡 𝑥𝑡
where
We often want to compute the solutions of such games for infinite horizons, in the hope that the decision rules 𝐹𝑖𝑡 settle
down to be time-invariant as 𝑡1 → +∞.
In practice, we usually fix 𝑡1 and compute the equilibrium of an infinite horizon game by driving 𝑡0 → −∞.
This is the approach we adopt in the next section.
28.2.6 Implementation
We use the function nnash_robust to compute a Markov perfect equilibrium of the infinite horizon linear quadratic
dynamic game with robust players in the manner described above.
28.3 Application
Without concerns for robustness, the model is identical to the duopoly model from the Markov perfect equilibrium lecture.
To begin, we briefly review the structure of that model.
Two firms are the only producers of a good the demand for which is governed by a linear inverse demand function
𝑝 = 𝑎0 − 𝑎1 (𝑞1 + 𝑞2 ) (28.10)
Here 𝑝 = 𝑝𝑡 is the price of the good, 𝑞𝑖 = 𝑞𝑖𝑡 is the output of firm 𝑖 = 1, 2 at time 𝑡 and 𝑎0 > 0, 𝑎1 > 0.
In (28.10) and what follows,
• the time subscript is suppressed when possible to simplify notation
• 𝑥̂ denotes a next period value of variable 𝑥
Each firm recognizes that its output affects total output and therefore the market price.
The one-period payoff function of firm 𝑖 is price times quantity minus adjustment costs:
Substituting the inverse demand curve (28.10) into (28.11) lets us express the one-period payoff as
The objective of the firm is to maximize $\sum_{t=0}^{\infty} \beta^t \pi_{it}$.
Firm 𝑖 chooses a decision rule that sets next period quantity 𝑞𝑖̂ as a function 𝑓𝑖 of the current state (𝑞𝑖 , 𝑞−𝑖 ).
This completes our review of the duopoly model without concerns for robustness.
Now we activate robustness concerns of both firms.
To map a robust version of the duopoly model into coupled robust linear-quadratic dynamic programming problems, we
again define the state and controls as
$$
x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix}
\quad \text{and} \quad
u_{it} := q_{i,t+1} - q_{it}, \quad i = 1, 2
$$
If we write
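where $Q_1 = Q_2 = \gamma$,
$$
R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix}
\quad \text{and} \quad
R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix}
$$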
then we recover the one-period payoffs (28.11) for the two firms in the duopoly model.
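A quick numerical check of this mapping for firm 1 may help. The sketch assumes that the adjustment cost is quadratic, 𝛾(𝑞̂1 − 𝑞1)², and uses arbitrary illustrative numbers. Because the LQ problem is written as a minimization, the quadratic form equals minus the one-period payoff. The function names are hypothetical.

```python
import numpy as np

a0, a1, γ = 10.0, 2.0, 12.0          # illustrative parameter values

R1 = np.array([[0.0,     -a0 / 2, 0.0],
               [-a0 / 2,  a1,     a1 / 2],
               [0.0,      a1 / 2, 0.0]])

def profit_firm1(q1, q2, q1_next):
    """Price times quantity minus an assumed quadratic adjustment cost."""
    p = a0 - a1 * (q1 + q2)
    return p * q1 - γ * (q1_next - q1) ** 2

def lq_loss_firm1(q1, q2, q1_next):
    """x' R1 x + u1' Q1 u1 with x = (1, q1, q2)' and u1 = q1_next - q1."""
    x = np.array([1.0, q1, q2])
    u = q1_next - q1
    return x @ R1 @ x + γ * u ** 2

q1, q2, q1_next = 1.2, 0.7, 1.5
payoff = profit_firm1(q1, q2, q1_next)
loss = lq_loss_firm1(q1, q2, q1_next)
print(payoff, -loss)                 # the two numbers should coincide
```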
The law of motion for the state 𝑥𝑡 is 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵1 𝑢1𝑡 + 𝐵2 𝑢2𝑡 where
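$$
A := \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
B_1 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
B_2 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
$$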
A robust decision rule of firm 𝑖 will take the form 𝑢𝑖𝑡 = −𝐹𝑖 𝑥𝑡 , inducing the following closed-loop system for the
evolution of 𝑥 in the Markov perfect equilibrium:
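import numpy as np
import quantecon as qe
# Parameters
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0
# In LQ form
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])
# Payoff matrices R1 and R2 as defined above
R1 = [[0., -a0 / 2, 0.],
      [-a0 / 2, a1, a1 / 2],
      [0., a1 / 2, 0.]]
R2 = [[0., 0., -a0 / 2],
      [0., 0., a1 / 2],
      [-a0 / 2, a1 / 2, a1]]
Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0
# Solve for the (non-robust) MPE using QuantEcon's nnash function
F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1, Q2,
                          S1, S2, W1, W2, M1, M2, beta=β)
# Display policies
print("Computed policies for firm 1 and firm 2:\n")
print(f"F1 = {F1}")
print(f"F2 = {F2}")
print("\n")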
We add robustness concerns to the Markov Perfect Equilibrium model by extending the function qe.nnash (link) into
a robustness version by adding the maximization operator 𝒟(𝑃 ) into the backward induction.
The MPE with robustness function is nnash_robust.
The function’s code is as follows
def nnash_robust(A, C, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2,
θ1, θ2, beta=1.0, tol=1e-8, max_iter=1000):
r"""
Compute the limit of a Nash linear quadratic dynamic game with
robustness concern.
Parameters
----------
A : scalar(float) or array_like(float)
Corresponds to the MPE equations, should be of size (n, n)
C : scalar(float) or array_like(float)
As above, size (n, c), c is the size of w
B1 : scalar(float) or array_like(float)
As above, size (n, k_1)
B2 : scalar(float) or array_like(float)
As above, size (n, k_2)
R1 : scalar(float) or array_like(float)
As above, size (n, n)
R2 : scalar(float) or array_like(float)
As above, size (n, n)
Q1 : scalar(float) or array_like(float)
As above, size (k_1, k_1)
Q2 : scalar(float) or array_like(float)
As above, size (k_2, k_2)
S1 : scalar(float) or array_like(float)
As above, size (k_1, k_1)
S2 : scalar(float) or array_like(float)
As above, size (k_2, k_2)
W1 : scalar(float) or array_like(float)
As above, size (n, k_1)
W2 : scalar(float) or array_like(float)
As above, size (n, k_2)
M1 : scalar(float) or array_like(float)
As above, size (k_2, k_1)
M2 : scalar(float) or array_like(float)
As above, size (k_1, k_2)
θ1 : scalar(float)
Robustness parameter of player 1
θ2 : scalar(float)
Robustness parameter of player 2
Returns
-------
F1 : array_like, dtype=float, shape=(k_1, n)
Feedback law for agent 1
F2 : array_like, dtype=float, shape=(k_2, n)
Feedback law for agent 2
P1 : array_like, dtype=float, shape=(n, n)
The steady-state solution to the associated discrete matrix
Riccati equation for agent 1
P2 : array_like, dtype=float, shape=(n, n)
The steady-state solution to the associated discrete matrix
Riccati equation for agent 2
"""
# Initial values
n = A.shape[0]
k_1 = B1.shape[1]
k_2 = B2.shape[1]
v1 = np.eye(k_1)
v2 = np.eye(k_2)
P1 = np.eye(n) * 1e-5
P2 = np.eye(n) * 1e-5
F1 = np.random.randn(k_1, n)
F2 = np.random.randn(k_2, n)
for it in range(max_iter):
# Update
F10 = F1
F20 = F2
I = np.eye(C.shape[1])
# D1(P1)
# Note: INV1 may not be solved if the matrix is singular
INV1 = solve(θ1 * I - C.T @ P1 @ C, I)
D1P1 = P1 + P1 @ C @ INV1 @ C.T @ P1
Λ1 = A - B2 @ F2
Λ2 = A - B1 @ F1
Π1 = R1 + F2.T @ S1 @ F2
Π2 = R2 + F1.T @ S2 @ F1
Γ1 = W1.T - M1.T @ F2
Γ2 = W2.T - M2.T @ F1
# Compute P1 and P2
P1 = Π1 - (B1.T @ D1P1 @ Λ1 + Γ1).T @ F1 + \
Λ1.T @ D1P1 @ Λ1
P2 = Π2 - (B2.T @ D2P2 @ Λ2 + Γ2).T @ F2 + \
Λ2.T @ D2P2 @ Λ2
else:
raise ValueError(f'No convergence: Iteration limit of {max_iter} \
reached in nnash')
where
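$$
x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix}
\quad \text{and} \quad
u_{it} := q_{i,t+1} - q_{it}, \quad i = 1, 2
$$
and
$$
R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix}, \quad
R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix}, \quad
Q_1 = Q_2 = \gamma, \quad S_1 = S_2 = 0, \quad W_1 = W_2 = 0, \quad M_1 = M_2 = 0
$$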
# Parameters
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0
# In LQ form
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])
Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0
Consistency Check
We first conduct a comparison test to check if nnash_robust agrees with qe.nnash in the non-robustness case in
which each 𝜃𝑖 ≈ +∞
We can see that the results are consistent across the two functions.
We want to compare the dynamics of price and output under the baseline MPE model with those under the baseline
model under the robust decision rules within the robust MPE.
This means that we simulate the state dynamics under the MPE equilibrium closed-loop transition matrix
𝐴𝑜 = 𝐴 − 𝐵 1 𝐹1 − 𝐵 2 𝐹2
where 𝐹1 and 𝐹2 are the firms’ robust decision rules within the robust Markov perfect equilibrium
• by simulating under the baseline model transition dynamics and the robust MPE rules we are in effect assuming that,
at the end of the day, firms’ concerns about misspecification of the baseline model do not materialize.
• a short way of saying this is that misspecification fears are all ‘just in the minds’ of the firms.
• simulating under the baseline model is a common practice in the literature.
• note that some assumption about the model that actually governs the data has to be made in order to create a
simulation.
• later we will describe the (erroneous) beliefs of the two firms that justify their robust decisions as best responses
to transition laws that are distorted relative to the baseline model.
After simulating 𝑥𝑡 under the baseline transition dynamics and robust decision rules 𝐹𝑖 , 𝑖 = 1, 2, we extract and plot
industry output 𝑞𝑡 = 𝑞1𝑡 + 𝑞2𝑡 and price 𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡 .
Here we set the robustness and volatility matrix parameters as follows:
• 𝜃1 = 0.02
• 𝜃2 = 0.04
• $C = \begin{pmatrix} 0 \\ 0.01 \\ 0.01 \end{pmatrix}$
Because we have set 𝜃1 < 𝜃2 < +∞ we know that
• both firms fear that the baseline specification of the state transition dynamics is incorrect.
• firm 1 fears misspecification more than firm 2.
The following code prepares graphs that compare market-wide output 𝑞1𝑡 + 𝑞2𝑡 and the price of the good 𝑝𝑡 under
equilibrium decision rules 𝐹𝑖 , 𝑖 = 1, 2 from an ordinary Markov perfect equilibrium and the decision rules under a
Markov perfect equilibrium with robust firms with multiplier parameters 𝜃𝑖 , 𝑖 = 1, 2 set as described above.
Both industry output and price are under the transition dynamics associated with the baseline model; only the decision
rules 𝐹𝑖 differ across the two equilibrium objects presented.
ax = axes[0]
ax.plot(q, 'g-', lw=2, alpha=0.75, label='MPE output')
ax.plot(qr, 'm-', lw=2, alpha=0.75, label='RMPE output')
ax.set(ylabel="output", xlabel="time", ylim=(2, 4))
ax.legend(loc='upper left', frameon=0)
ax = axes[1]
ax.plot(p, 'g-', lw=2, alpha=0.75, label='MPE price')
ax.plot(pr, 'm-', lw=2, alpha=0.75, label='RMPE price')
ax.set(ylabel="price", xlabel="time")
ax.legend(loc='upper right', frameon=0)
plt.show()
Under the dynamics associated with the baseline model, the price path is higher with the Markov perfect equilibrium
robust decision rules than it is with decision rules for the ordinary Markov perfect equilibrium.
Because 𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡 with 𝑎1 > 0, the industry output path is correspondingly lower.
To dig a little beneath the forces driving these outcomes, we want to plot 𝑞1𝑡 and 𝑞2𝑡 in the Markov perfect equilibrium
with robust firms and to compare them with corresponding objects in the Markov perfect equilibrium without robust firms
ax = axes[0]
ax.plot(q1, 'g-', lw=2, alpha=0.75, label='firm 1 MPE output')
ax.plot(qr1, 'b-', lw=2, alpha=0.75, label='firm 1 RMPE output')
ax.set(ylabel="output", xlabel="time", ylim=(1, 2))
ax.legend(loc='upper left', frameon=0)
ax = axes[1]
Evidently, firm 1’s output path is substantially lower when firms are robust firms while firm 2’s output path is virtually the
same as it would be in an ordinary Markov perfect equilibrium with no robust firms.
Recall that we have set 𝜃1 = .02 and 𝜃2 = .04, so that firm 1 fears misspecification of the baseline model substantially
more than does firm 2
• but also please notice that firm 2’s behavior in the Markov perfect equilibrium with robust firms responds to the
decision rule 𝐹1 𝑥𝑡 employed by firm 1.
• thus it is something of a coincidence that its output is almost the same in the two equilibria.
Larger concerns about misspecification induce firm 1 to be more cautious than firm 2 in predicting market price and the
output of the other firm.
To explore this, we study next how ex-post the two firms’ beliefs about state dynamics differ in the Markov perfect
equilibrium with robust firms.
(by ex-post we mean after extremization of each firm’s intertemporal objective)
Heterogeneous Beliefs
As before, let $A^o = A - B_1 F_1^r - B_2 F_2^r$, where in a robust MPE, $F_i^r$ is a robust decision rule for firm $i$.
Worst-case forecasts of 𝑥𝑡 starting from 𝑡 = 0 differ between the two firms.
This means that worst-case forecasts of industry output 𝑞1𝑡 + 𝑞2𝑡 and price 𝑝𝑡 also differ between the two firms.
To find these worst-case beliefs, we compute the following three “closed-loop” transition matrices
• $A^o$
• $A^o + C K_1$
• $A^o + C K_2$
We call the first transition law, namely, 𝐴𝑜 , the baseline transition under firms’ robust decision rules.
We call the second and third worst-case transitions under robust decision rules for firms 1 and 2.
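The three transition laws just listed can be formed and simulated along the following lines. This is only a sketch: the matrices `F1r`, `F2r`, `K1` and `K2` below are made-up placeholders, not equilibrium objects, and in practice they would come from solving the robust game. The helper `simulate` is likewise hypothetical.

```python
import numpy as np

a0, a1 = 10.0, 2.0
A = np.eye(3)
B1 = np.array([[0.0], [1.0], [0.0]])
B2 = np.array([[0.0], [0.0], [1.0]])
C = np.array([[0.0], [0.01], [0.01]])

# Placeholder rules and worst-case shock matrices (NOT equilibrium values)
F1r = np.array([[-0.6, 0.3, 0.05]])
F2r = np.array([[-0.6, 0.05, 0.3]])
K1 = np.array([[0.5, -0.2, -0.2]])
K2 = np.array([[0.25, -0.1, -0.1]])

Ao = A - B1 @ F1r - B2 @ F2r
transitions = {"baseline": Ao,
               "worst case, firm 1": Ao + C @ K1,
               "worst case, firm 2": Ao + C @ K2}

def simulate(M, x0, n=20):
    """Iterate x_{t+1} = M x_t for n periods and return the path."""
    path = np.empty((n, len(x0)))
    path[0] = x0
    for t in range(n - 1):
        path[t + 1] = M @ path[t]
    return path

x0 = np.array([1.0, 1.0, 1.0])
results = {}
for name, M in transitions.items():
    x = simulate(M, x0)
    q = x[:, 1] + x[:, 2]            # industry output q_1t + q_2t
    p = a0 - a1 * q                  # price from the inverse demand curve
    results[name] = (x, q, p)
```

Each entry of `results` holds one state path together with the implied output and price sequences, which is the information the plots below compare.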
From {𝑥𝑡 } paths generated by each of these transition laws, we pull off the associated price and total output sequences.
The following code plots them
# == Plot == #
fig, axes = plt.subplots(2, 1, figsize=(9, 9))
ax = axes[0]
ax.plot(qrp1, 'b--', lw=2, alpha=0.75,
label='RMPE worst-case belief output player 1')
ax.plot(qrp2, 'r:', lw=2, alpha=0.75,
label='RMPE worst-case belief output player 2')
ax = axes[1]
ax.plot(prp1, 'b--', lw=2, alpha=0.75,
label='RMPE worst-case belief price player 1')
ax.plot(prp2, 'r:', lw=2, alpha=0.75,
label='RMPE worst-case belief price player 2')
ax.plot(pr, 'm-', lw=2, alpha=0.75, label='RMPE price')
ax.set(ylabel="price", xlabel="time")
ax.legend(loc='upper right', frameon=0)
plt.show()
We see from the above graph that under robustness concerns, player 1 and player 2 have heterogeneous beliefs about total
output and the goods price even though they share the same baseline model and information
• firm 1 thinks that total output will be higher and price lower than does firm 2
• this leads firm 1 to produce less than firm 2
These beliefs justify (or rationalize) the Markov perfect equilibrium robust decision rules.
This means that the robust rules are the unique optimal rules (or best responses) to the indicated worst-case transition
dynamics.
([Hansen and Sargent, 2008] discuss how this property of robust decision rules is connected to the concept of admissibility
in Bayesian statistical decision theory)
CHAPTER
TWENTYNINE
In addition to what’s in Anaconda, this lecture will need the following libraries:
29.1 Overview
In this lecture we study covariance stationary linear stochastic processes, a class of models routinely used to study eco-
nomic and financial time series.
This class has the advantage of being
1. simple enough to be described by an elegant and comprehensive theory
2. relatively broad in terms of the kinds of dynamics it can represent
We consider these models in both the time and frequency domain.
We will focus much of our attention on linear covariance stationary models with a finite number of parameters.
In particular, we will study stationary ARMA processes, which form a cornerstone of the standard theory of time series
analysis.
Every ARMA process can be represented in linear state space form.
However, ARMA processes have some important structure that makes it valuable to study them separately.
import numpy as np
import matplotlib.pyplot as plt
import quantecon as qe
29.2 Introduction
29.2.1 Definitions
Perhaps the simplest class of covariance stationary processes is the white noise processes.
A process {𝜖𝑡 } is called a white noise process if
1. 𝔼𝜖𝑡 = 0
2. $\gamma(k) = \sigma^2 \mathbb{1}\{k = 0\}$ for some $\sigma > 0$
(Here 1{𝑘 = 0} is defined to be 1 if 𝑘 = 0 and zero otherwise)
White noise processes play the role of building blocks for processes with more complicated dynamics.
From the simple building block provided by white noise, we can construct a very flexible family of covariance stationary
processes — the general linear processes
$$
X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}, \qquad t \in \mathbb{Z} \tag{29.1}
$$
where
• {𝜖𝑡 } is white noise
• $\{\psi_t\}$ is a square summable sequence in $\mathbb{R}$ (that is, $\sum_{t=0}^{\infty} \psi_t^2 < \infty$)
The sequence {𝜓𝑡 } is often called a linear filter.
Equation (29.1) is said to present a moving average process or a moving average representation.
With some manipulations, it is possible to confirm that the autocovariance function for (29.1) is
$$
\gamma(k) = \sigma^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+k} \tag{29.2}
$$
By the Cauchy-Schwarz inequality, together with the square summability of {𝜓𝑡 }, the sum in (29.2) is finite, so 𝛾(𝑘) is well defined.
Evidently, 𝛾(𝑘) does not depend on 𝑡.
Remarkably, the class of general linear processes goes a long way towards describing the entire class of zero-mean
covariance stationary processes.
In particular, Wold’s decomposition theorem states that every zero-mean covariance stationary process {𝑋𝑡 } can be
written as
$$
X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j} + \eta_t
$$
where
• {𝜖𝑡 } is white noise
• {𝜓𝑡 } is square summable
• 𝜓0 𝜖𝑡 is the one-step ahead prediction error in forecasting 𝑋𝑡 as a linear least-squares function of the infinite history
𝑋𝑡−1 , 𝑋𝑡−2 , …
• 𝜂𝑡 can be expressed as a linear function of 𝑋𝑡−1 , 𝑋𝑡−2 , … and is perfectly predictable over arbitrarily long horizons
For the method of constructing a Wold representation, intuition, and further discussion, see [Sargent, 1987], p. 286.
29.2.5 AR and MA
$$
\gamma(k) = \phi^k \frac{\sigma^2}{1 - \phi^2}, \qquad k = 0, 1, \ldots \tag{29.4}
$$
The next figure plots an example of this function for 𝜙 = 0.8 and 𝜙 = −0.8 with 𝜎 = 1.
num_rows, num_cols = 2, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 8))
plt.subplots_adjust(hspace=0.4)
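As a numerical cross-check, (29.4) can be compared with the general formula (29.2). The sketch below assumes the standard fact that a stationary AR(1) with coefficient 𝜙 has moving average coefficients 𝜓𝑗 = 𝜙ʲ, and truncates the infinite sum at 200 terms. The function name is hypothetical.

```python
import numpy as np

def autocov_from_filter(ψ, σ, k):
    """σ² Σ_j ψ_j ψ_{j+k} for a finite (truncated) filter ψ, as in (29.2)."""
    ψ = np.asarray(ψ, dtype=float)
    if k >= len(ψ):
        return 0.0
    return σ**2 * np.sum(ψ[:len(ψ) - k] * ψ[k:])

ϕ, σ = 0.8, 1.0
ψ = ϕ ** np.arange(200)          # truncated MA(∞) coefficients of the AR(1)

approx = [autocov_from_filter(ψ, σ, k) for k in range(6)]
exact = [ϕ**k * σ**2 / (1 - ϕ**2) for k in range(6)]
print(np.max(np.abs(np.array(approx) - np.array(exact))))
```

The truncation error is of order 𝜙⁴⁰⁰ here, so the two lists agree to machine precision.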
Another very simple process is the MA(1) process (here MA means “moving average”)
𝑋𝑡 = 𝜖𝑡 + 𝜃𝜖𝑡−1
The AR(1) can be generalized to an AR(𝑝) and likewise for the MA(1).
Putting all of this together, we get the
A stochastic process {𝑋𝑡 } is called an autoregressive moving average process, or ARMA(𝑝, 𝑞), if it can be written as
𝐿0 𝑋𝑡 − 𝜙1 𝐿1 𝑋𝑡 − ⋯ − 𝜙𝑝 𝐿𝑝 𝑋𝑡 = 𝐿0 𝜖𝑡 + 𝜃1 𝐿1 𝜖𝑡 + ⋯ + 𝜃𝑞 𝐿𝑞 𝜖𝑡 (29.6)
In what follows we always assume that the roots of the polynomial 𝜙(𝑧) lie outside the unit circle in the complex plane.
This condition is sufficient to guarantee that the ARMA(𝑝, 𝑞) process is covariance stationary.
In fact, it implies that the process falls within the class of general linear processes described above.
That is, given an ARMA(𝑝, 𝑞) process {𝑋𝑡 } satisfying the unit circle condition, there exists a square summable sequence
{𝜓𝑡 } with $X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}$ for all 𝑡.
The sequence {𝜓𝑡 } can be obtained by a recursive procedure outlined on page 79 of [Cryer and Chan, 2008].
The function 𝑡 ↦ 𝜓𝑡 is often called the impulse response function.
Autocovariance functions provide a great deal of information about covariance stationary processes.
In fact, for zero-mean Gaussian processes, the autocovariance function characterizes the entire joint distribution.
Even for non-Gaussian processes, it provides a significant amount of information.
It turns out that there is an alternative representation of the autocovariance function of a covariance stationary process,
called the spectral density.
At times, the spectral density is easier to derive, easier to manipulate, and provides additional intuition.
Before discussing the spectral density, we invite you to recall the main properties of complex numbers (or skip to the next
section).
It can be helpful to remember that, in a formal sense, complex numbers are just points (𝑥, 𝑦) ∈ ℝ² endowed with a
specific notion of multiplication.
When (𝑥, 𝑦) is regarded as a complex number, 𝑥 is called the real part and 𝑦 is called the imaginary part.
The modulus or absolute value of a complex number 𝑧 = (𝑥, 𝑦) is just its Euclidean norm in ℝ², but is usually written as
|𝑧| instead of ‖𝑧‖.
The product of two complex numbers (𝑥, 𝑦) and (𝑢, 𝑣) is defined to be (𝑥𝑢 − 𝑣𝑦, 𝑥𝑣 + 𝑦𝑢), while addition is standard
pointwise vector addition.
When endowed with these notions of multiplication and addition, the set of complex numbers forms a field — addition
and multiplication play well together, just as they do in ℝ.
The complex number (𝑥, 𝑦) is often written as 𝑥 + 𝑖𝑦, where 𝑖 is called the imaginary unit and is understood to obey
$i^2 = -1$.
The 𝑥 + 𝑖𝑦 notation provides an easy way to remember the definition of multiplication given above, because, proceeding
naively,
$$
(x + iy)(u + iv) = xu - yv + i(xv + yu)
$$
Converted back to our first notation, this becomes (𝑥𝑢 − 𝑣𝑦, 𝑥𝑣 + 𝑦𝑢) as promised.
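The pair-based definition can be checked against Python's built-in complex type. The helper `multiply` below is a hypothetical name used only for this illustration.

```python
def multiply(z, w):
    """Product of complex numbers stored as pairs (x, y) and (u, v)."""
    x, y = z
    u, v = w
    return (x * u - v * y, x * v + y * u)

z, w = (1.0, 2.0), (3.0, -1.0)
pair_product = multiply(z, w)
builtin_product = complex(*z) * complex(*w)

print(pair_product)       # (5.0, 5.0)
print(builtin_product)    # (5+5j)
```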
Complex numbers can be represented in the polar form $r e^{i\omega}$ where
Let {𝑋𝑡 } be a covariance stationary process with autocovariance function 𝛾 satisfying $\sum_k \gamma(k)^2 < \infty$.
The spectral density 𝑓 of {𝑋𝑡 } is defined as the discrete time Fourier transform of its autocovariance function 𝛾.
(Some authors normalize the expression on the right by constants such as 1/𝜋 — the convention chosen makes little
difference provided you are consistent).
Using the fact that 𝛾 is even, in the sense that 𝛾(𝑡) = 𝛾(−𝑡) for all 𝑡, we can show that
It is an exercise to show that the MA(1) process 𝑋𝑡 = 𝜃𝜖𝑡−1 + 𝜖𝑡 has a spectral density
With a bit more effort, it’s possible to show (see, e.g., p. 261 of [Sargent, 1987]) that the spectral density of the AR(1)
process 𝑋𝑡 = 𝜙𝑋𝑡−1 + 𝜖𝑡 is
$$
f(\omega) = \frac{\sigma^2}{1 - 2\phi\cos(\omega) + \phi^2} \tag{29.11}
$$
More generally, it can be shown that the spectral density of the ARMA process (29.5) is
$$
f(\omega) = \left| \frac{\theta(e^{i\omega})}{\phi(e^{i\omega})} \right|^2 \sigma^2 \tag{29.12}
$$
where
• 𝜎 is the standard deviation of the white noise process {𝜖𝑡 }.
• the polynomials 𝜙(⋅) and 𝜃(⋅) are as defined in (29.7).
The derivation of (29.12) uses the fact that convolutions become products under Fourier transformations.
The proof is elegant and can be found in many places — see, for example, [Sargent, 1987], chapter 11, section 4.
It’s a nice exercise to verify that (29.10) and (29.11) are indeed special cases of (29.12).
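One way to carry out this exercise numerically is sketched below. It assumes the usual polynomial conventions 𝜙(𝑧) = 1 − 𝜙1𝑧 − ⋯ − 𝜙𝑝𝑧ᵖ and 𝜃(𝑧) = 1 + 𝜃1𝑧 + ⋯ + 𝜃𝑞𝑧^𝑞, and the standard MA(1) density 𝜎²(1 + 2𝜃 cos(𝜔) + 𝜃²). The function name is hypothetical and is not part of QuantEcon.

```python
import numpy as np

def arma_spectral_density(ω, ϕ, θ, σ=1.0):
    """Evaluate |θ(e^{iω}) / ϕ(e^{iω})|² σ² for coefficient lists ϕ and θ."""
    z = np.exp(1j * np.asarray(ω))
    ar = 1 - sum(c * z**(j + 1) for j, c in enumerate(ϕ))
    ma = 1 + sum(c * z**(j + 1) for j, c in enumerate(θ))
    return σ**2 * np.abs(ma / ar) ** 2

ω = np.linspace(0, np.pi, 50)

# AR(1) with ϕ = 0.8: general formula versus the closed form (29.11)
ar1_general = arma_spectral_density(ω, [0.8], [])
ar1_closed = 1.0 / (1 - 2 * 0.8 * np.cos(ω) + 0.8**2)

# MA(1) with θ = 0.5: general formula versus the assumed MA(1) closed form
ma1_general = arma_spectral_density(ω, [], [0.5])
ma1_closed = 1 + 2 * 0.5 * np.cos(ω) + 0.5**2

print(np.allclose(ar1_general, ar1_closed), np.allclose(ma1_general, ma1_closed))
```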
Plotting (29.11) reveals the shape of the spectral density for the AR(1) model when 𝜙 takes the values 0.8 and -0.8
respectively.
These spectral densities correspond to the autocovariance functions for the AR(1) process shown above.
Informally, we think of the spectral density as being large at those 𝜔 ∈ [0, 𝜋] at which the autocovariance function seems
approximately to exhibit big damped cycles.
To see the idea, let’s consider why, in the lower panel of the preceding figure, the spectral density for the case 𝜙 = −0.8
is large at 𝜔 = 𝜋.
Recall that the spectral density can be expressed as
When we evaluate this at 𝜔 = 𝜋, we get a large number because cos(𝜋𝑘) is large and positive when $(-0.8)^k$ is positive,
and large in absolute value and negative when $(-0.8)^k$ is negative.
Hence the product is always large and positive, and hence the sum of the products on the right-hand side of (29.13) is
large.
These ideas are illustrated in the next figure, which has 𝑘 on the horizontal axis.
ϕ = -0.8
times = list(range(16))
y1 = [ϕ**k / (1 - ϕ**2) for k in times]
y2 = [np.cos(np.pi * k) for k in times]
y3 = [a * b for a, b in zip(y1, y2)]
num_rows, num_cols = 3, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 8))
plt.subplots_adjust(hspace=0.25)
# Cycles at frequency π
ax = axes[1]
ax.plot(times, y2, 'bo-', alpha=0.6, label=r'$\cos(\pi k)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), yticks=(-1, 0, 1))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
# Product
ax = axes[2]
ax.stem(times, y3, label=r'$\gamma(k) \cos(\pi k)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), ylim=(-3, 3), yticks=(-1, 0, 1, 2, 3))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
ax.set_xlabel("k")
plt.show()
On the other hand, if we evaluate 𝑓(𝜔) at 𝜔 = 𝜋/3, then the cycles are not matched, the sequence 𝛾(𝑘) cos(𝜔𝑘) contains
both positive and negative terms, and hence the sum of these terms is much smaller.
ϕ = -0.8
times = list(range(16))
y1 = [ϕ**k / (1 - ϕ**2) for k in times]
y2 = [np.cos(np.pi * k/3) for k in times]
y3 = [a * b for a, b in zip(y1, y2)]
num_rows, num_cols = 3, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 8))
plt.subplots_adjust(hspace=0.25)
# Cycles at frequency π/3
ax = axes[1]
ax.plot(times, y2, 'bo-', alpha=0.6, label=r'$\cos(\pi k/3)$')
# Product
ax = axes[2]
ax.stem(times, y3, label=r'$\gamma(k) \cos(\pi k/3)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), ylim=(-3, 3), yticks=(-1, 0, 1, 2, 3))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
ax.set_xlabel("$k$")
plt.show()
In summary, the spectral density is large at frequencies 𝜔 where the autocovariance function exhibits damped cycles.
We have just seen that the spectral density is useful in the sense that it provides a frequency-based perspective on the
autocovariance structure of a covariance stationary process.
Another reason that the spectral density is useful is that it can be “inverted” to recover the autocovariance function via
the inverse Fourier transform.
In particular, for all 𝑘 ∈ ℤ, we have
$$
\gamma(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\omega) e^{i\omega k} \, d\omega \tag{29.14}
$$
This is convenient in situations where the spectral density is easier to calculate and manipulate than the autocovariance
function.
(For example, the expression (29.12) for the ARMA spectral density is much easier to work with than the expression for
the ARMA autocovariance)
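The inversion formula (29.14) can be tried out by direct numerical integration. The sketch below uses an MA(1) process, assuming its standard spectral density 𝜎²(1 + 2𝜃 cos(𝜔) + 𝜃²), for which the autocovariances are known to be 𝛾(0) = 𝜎²(1 + 𝜃²), 𝛾(1) = 𝜎²𝜃 and 𝛾(𝑘) = 0 for 𝑘 ≥ 2. The grid size and function name are arbitrary choices.

```python
import numpy as np

θ, σ = 0.5, 1.0
n = 4096
ω = -np.pi + 2 * np.pi * np.arange(n) / n        # uniform grid on [-π, π)
f = σ**2 * (1 + 2 * θ * np.cos(ω) + θ**2)        # assumed MA(1) spectral density

def autocov_by_quadrature(k):
    # Riemann sum for (1/2π) ∫ f(ω) e^{iωk} dω; the grid spacing is 2π/n,
    # so the sum reduces to a plain average over the grid points
    return np.mean(f * np.exp(1j * ω * k)).real

recovered = [autocov_by_quadrature(k) for k in range(4)]
expected = [σ**2 * (1 + θ**2), σ**2 * θ, 0.0, 0.0]
print(recovered)
```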
This section is loosely based on [Sargent, 1987], p. 249-253, and included for those who
• would like a bit more insight into spectral densities
• and have at least some background in Hilbert space theory
Others should feel free to skip to the next section — none of this material is necessary to progress to computation.
Recall that every separable Hilbert space 𝐻 has a countable orthonormal basis {ℎ𝑘 }.
The nice thing about such a basis is that every 𝑓 ∈ 𝐻 satisfies
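• $H = L_2$, where $L_2$ is the set of square integrable functions on the interval $[-\pi, \pi]$, with inner product
$\langle g, h \rangle = \int_{-\pi}^{\pi} g(\omega) h(\omega) \, d\omega$.
• $\{h_k\}$ = the orthonormal basis for $L_2$ given by the set of trigonometric functions
$$
h_k(\omega) = \frac{e^{i\omega k}}{\sqrt{2\pi}}, \quad k \in \mathbb{Z}, \quad \omega \in [-\pi, \pi]
$$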
Using the definition of 𝑇 from above and the fact that 𝑓 is even, we now have
$$
T\gamma = \sum_{k \in \mathbb{Z}} \gamma(k) \frac{e^{i\omega k}}{\sqrt{2\pi}} = \frac{1}{\sqrt{2\pi}} f(\omega) \tag{29.16}
$$
In other words, apart from a scalar multiple, the spectral density is just a transformation of 𝛾 ∈ ℓ2 under a certain linear
isometry — a different way to view 𝛾.
In particular, it is an expansion of the autocovariance function with respect to the trigonometric basis functions in 𝐿2 .
As discussed above, the Fourier coefficients of 𝑇 𝛾 are given by the sequence 𝛾, and, in particular, 𝛾(𝑘) = ⟨𝑇 𝛾, ℎ𝑘 ⟩.
Transforming this inner product into its integral expression and using (29.16) gives (29.14), justifying our earlier
expression for the inverse transform.
29.4 Implementation
Most code for working with covariance stationary models deals with ARMA models.
Python code for studying ARMA models can be found in the tsa submodule of statsmodels.
Since this code doesn’t quite cover our needs, particularly vis-a-vis spectral analysis, we’ve put together the module
arma.py, which is part of the QuantEcon.py package.
The module provides functions for mapping ARMA(𝑝, 𝑞) models into their
1. impulse response function
2. simulated time series
3. autocovariance function
4. spectral density
29.4.1 Application
Let’s use this code to replicate the plots on pages 68–69 of [Ljungqvist and Sargent, 2018].
Here are some functions to generate the plots
def quad_plot(arma):
    """
    Plots the impulse response, spectral_density, autocovariance,
    and one realization of the process.
    """
    num_rows, num_cols = 2, 2
    fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 7))
    plot_functions = [plot_impulse_response,
                      plot_spectral_density,
                      plot_autocovariance,
                      plot_simulation]
    for plot_func, ax in zip(plot_functions, axes.flatten()):
        plot_func(arma, ax)
    plt.tight_layout()
    plt.show()
ϕ = 0.0
θ = 0.0
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)
If we look carefully, things look good: the spectrum is the flat line at $10^0$ at the very top of the spectrum graphs, which
is as it should be.
Also
• the variance equals $1 = \frac{1}{2\pi} \int_{-\pi}^{\pi} 1 \, d\omega$ as it should.
• the covariogram and impulse response look as they should.
• it is actually challenging to visualize a time series realization of white noise – a sequence of surprises – but this too
looks pretty good.
To get some more examples, as our laboratory we’ll replicate quartets of graphs that [Ljungqvist and Sargent, 2018] use
to teach “how to read spectral densities”.
Ljungqvist and Sargent’s first model is 𝑋𝑡 = 1.3𝑋𝑡−1 − .7𝑋𝑡−2 + 𝜖𝑡
ϕ = 1.3, -.7
θ = 0.0
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)
ϕ = 0.9
θ = -0.0
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)
ϕ = .98
θ = -0.7
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)
29.4.2 Explanation
The call
arma = ARMA(ϕ, θ, σ)
creates an instance arma that represents the ARMA(𝑝, 𝑞) model
𝑋𝑡 = 𝜙𝑋𝑡−1 + 𝜖𝑡 + 𝜃𝜖𝑡−1
The two numerical packages most useful for working with ARMA models are scipy.signal and numpy.fft.
The package scipy.signal expects the parameters to be passed into its functions in a manner consistent with the
alternative ARMA notation (29.8).
For example, the impulse response sequence {𝜓𝑡 } discussed above can be obtained using scipy.signal.
dimpulse, and the function call should be of the form
times, ψ = dimpulse((ma_poly, ar_poly, 1), n=impulse_length)
where ma_poly and ar_poly correspond to the polynomials in (29.7) — that is,
• ma_poly is the vector (1, 𝜃1 , 𝜃2 , … , 𝜃𝑞 )
• ar_poly is the vector (1, −𝜙1 , −𝜙2 , … , −𝜙𝑝 )
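A minimal sketch of such a call for an ARMA(1, 1) model with illustrative coefficients follows. In this example the two coefficient arrays have equal length, which keeps the powers of 𝑧 in the numerator and denominator aligned. The comparison sequence uses the standard recursion 𝜓0 = 1, 𝜓1 = 𝜙 + 𝜃, 𝜓𝑗 = 𝜙𝜓𝑗−1 for an ARMA(1, 1).

```python
import numpy as np
from scipy.signal import dimpulse

ϕ, θ = 0.5, 0.3                      # illustrative ARMA(1, 1) coefficients
ma_poly = np.array([1.0, θ])         # (1, θ_1)
ar_poly = np.array([1.0, -ϕ])        # (1, -ϕ_1)

impulse_length = 8
times, out = dimpulse((ma_poly, ar_poly, 1), n=impulse_length)
ψ = out[0].flatten()                 # dimpulse returns a tuple of (n, 1) arrays

# Closed-form impulse response of an ARMA(1, 1), used only for comparison
expected = np.array([1.0] + [(ϕ + θ) * ϕ**(j - 1) for j in range(1, impulse_length)])
print(ψ)
```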
To this end, we also maintain the arrays ma_poly and ar_poly as instance data, with their values computed automat-
ically from the values of phi and theta supplied by the user.
If the user decides to change the value of either theta or phi ex-post by assignments such as arma.phi = (0.5,
0.2) or arma.theta = (0, -0.1),
then ma_poly and ar_poly should update automatically to reflect these new parameters.
This is achieved in our implementation by using descriptors.
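The general idea can be sketched with Python's built-in `property`, which is itself a descriptor. The class below is a hypothetical, stripped-down stand-in written for this illustration. It is not the QuantEcon implementation and omits everything except the parameter bookkeeping.

```python
import numpy as np

class TinyARMA:
    """Hypothetical stand-in that keeps ma_poly and ar_poly in sync."""

    def __init__(self, phi, theta):
        self._phi, self._theta = phi, theta
        self._set_polys()

    def _set_polys(self):
        # ma_poly = (1, θ_1, ..., θ_q) and ar_poly = (1, -ϕ_1, ..., -ϕ_p)
        self.ma_poly = np.insert(np.atleast_1d(self._theta).astype(float), 0, 1.0)
        self.ar_poly = np.insert(-np.atleast_1d(self._phi).astype(float), 0, 1.0)

    @property
    def phi(self):
        return self._phi

    @phi.setter
    def phi(self, value):
        self._phi = value
        self._set_polys()            # recompute on every assignment

    @property
    def theta(self):
        return self._theta

    @theta.setter
    def theta(self, value):
        self._theta = value
        self._set_polys()

arma = TinyARMA(0.5, 0.0)
arma.phi = (0.5, 0.2)
arma.theta = (0, -0.1)
print(arma.ar_poly, arma.ma_poly)
```

Routing every assignment through a setter means the derived arrays can never fall out of step with the parameters the user sees.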
As discussed above, for ARMA processes the spectral density has a simple representation that is relatively easy to calculate.
Given this fact, the easiest way to obtain the autocovariance function is to recover it from the spectral density via the
inverse Fourier transform.
Here we use NumPy’s Fourier transform package np.fft (older NumPy releases wrapped the Fortran-based FFTPACK; releases from 1.17 onward use the pocketfft library instead).
A look at the np.fft documentation shows that the inverse transform np.fft.ifft takes a given sequence 𝐴0 , 𝐴1 , … , 𝐴𝑛−1
and returns the sequence 𝑎0 , 𝑎1 , … , 𝑎𝑛−1 defined by
$$a_k = \frac{1}{n} \sum_{t=0}^{n-1} A_t e^{i k 2\pi t / n}$$
Thus, if we set 𝐴𝑡 = 𝑓(𝜔𝑡 ), where 𝑓 is the spectral density and 𝜔𝑡 ∶= 2𝜋𝑡/𝑛, then
$$a_k = \frac{1}{n} \sum_{t=0}^{n-1} f(\omega_t) e^{i \omega_t k} = \frac{1}{2\pi} \frac{2\pi}{n} \sum_{t=0}^{n-1} f(\omega_t) e^{i \omega_t k}, \qquad \omega_t := 2\pi t / n$$
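The recovery of autocovariances from the spectral density can be sketched for a simple case. The example below is a minimal illustration, not the QuantEcon implementation: it assumes an AR(1) process $X_t = \phi X_{t-1} + \sigma \epsilon_t$, whose spectral density (without a $1/(2\pi)$ factor) is $\sigma^2 / |1 - \phi e^{i\omega}|^2$ and whose autocovariances are $\sigma^2 \phi^k / (1 - \phi^2)$. The function name ar1_spectral_density and the parameter values are hypothetical.

```python
import numpy as np

def ar1_spectral_density(phi, sigma, omega):
    """Spectral density of X_t = phi X_{t-1} + sigma eps_t (no 1/(2 pi) factor)."""
    return sigma**2 / np.abs(1 - phi * np.exp(1j * omega))**2

phi, sigma, n = 0.5, 1.0, 512                 # hypothetical parameter values
omega = 2 * np.pi * np.arange(n) / n          # grid omega_t = 2 pi t / n
f_vals = ar1_spectral_density(phi, sigma, omega)

# np.fft.ifft returns a_k = (1/n) sum_t A_t exp(i k 2 pi t / n), with A_t = f(omega_t)
acov_fft = np.fft.ifft(f_vals).real

# Closed-form AR(1) autocovariances for comparison
k = np.arange(10)
acov_exact = sigma**2 * phi**k / (1 - phi**2)

print(np.max(np.abs(acov_fft[:10] - acov_exact)))
```

For a grid this fine, the two sets of autocovariances agree up to a negligible aliasing error.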
THIRTY
ESTIMATION OF SPECTRA
In addition to what’s in Anaconda, this lecture will need the following libraries:
30.1 Overview
In a previous lecture, we covered some fundamental properties of covariance stationary linear stochastic processes.
One objective for that lecture was to introduce spectral densities — a standard and very useful technique for analyzing
such processes.
In this lecture, we turn to the problem of estimating spectral densities and other related quantities from data.
Estimates of the spectral density are computed using what is known as a periodogram — which in turn is computed via
the famous fast Fourier transform.
Once the basic technique has been explained, we will apply it to the analysis of several key macroeconomic time series.
For supplementary reading, see [Sargent, 1987] or [Cryer and Chan, 2008].
Let’s start with some standard imports:
import numpy as np
import matplotlib.pyplot as plt
from quantecon import ARMA, periodogram, ar_periodogram
30.2 Periodograms
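Recall that the spectral density $f$ of a covariance stationary process with autocovariance function $\gamma$ can be written
$$f(\omega) = \gamma(0) + 2 \sum_{k \geq 1} \gamma(k) \cos(\omega k), \qquad \omega \in \mathbb{R}$$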
Now consider the problem of estimating the spectral density of a given time series, when 𝛾 is unknown.
In particular, let 𝑋0 , … , 𝑋𝑛−1 be 𝑛 consecutive observations of a single time series that is assumed to be covariance
stationary.
The most common estimator of the spectral density of this process is the periodogram of 𝑋0 , … , 𝑋𝑛−1 , which is defined
as
$$I(\omega) := \frac{1}{n} \left| \sum_{t=0}^{n-1} X_t e^{i t \omega} \right|^2, \qquad \omega \in \mathbb{R} \tag{30.1}$$
30.2.1 Interpretation
To interpret the periodogram, it is convenient to focus on its values at the Fourier frequencies
$$\omega_j := \frac{2\pi j}{n}, \qquad j = 0, \ldots, n-1$$
In what sense is 𝐼(𝜔𝑗 ) an estimate of 𝑓(𝜔𝑗 )?
The answer is straightforward, although it does involve some algebra.
With a bit of effort, one can show that for any integer 𝑗 > 0,
$$\sum_{t=0}^{n-1} e^{i t \omega_j} = \sum_{t=0}^{n-1} \exp\left\{ i 2\pi j \frac{t}{n} \right\} = 0$$
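Letting $\bar X$ denote the sample mean $n^{-1} \sum_{t=0}^{n-1} X_t$, we then have
$$n I(\omega_j) = \left| \sum_{t=0}^{n-1} (X_t - \bar X) e^{i t \omega_j} \right|^2 = \sum_{t=0}^{n-1} (X_t - \bar X) e^{i t \omega_j} \sum_{r=0}^{n-1} (X_r - \bar X) e^{-i r \omega_j}$$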
Now let
$$\hat\gamma(k) := \frac{1}{n} \sum_{t=k}^{n-1} (X_t - \bar X)(X_{t-k} - \bar X), \qquad k = 0, 1, \ldots, n-1$$
This is the sample autocovariance function, the natural “plug-in estimator” of the autocovariance function 𝛾.
(“Plug-in estimator” is an informal term for an estimator found by replacing expectations with sample means)
With this notation, we can now write
$$I(\omega_j) = \hat\gamma(0) + 2 \sum_{k=1}^{n-1} \hat\gamma(k) \cos(\omega_j k)$$
Recalling our expression for 𝑓 given above, we see that 𝐼(𝜔𝑗 ) is just a sample analog of 𝑓(𝜔𝑗 ).
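The identity between the periodogram and the sample autocovariances can be checked numerically. The following is a small sketch on arbitrary simulated data (the series X and the seed are hypothetical); it compares the FFT-based periodogram with the cosine sum at the Fourier frequencies with $j > 0$, which is where the identity is stated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
X = 0.1 * rng.standard_normal(n).cumsum() + rng.standard_normal(n)  # arbitrary data

# Periodogram at the Fourier frequencies via the FFT
I_fft = np.abs(np.fft.fft(X))**2 / n

# Sample autocovariances gamma_hat(k), k = 0, ..., n-1
Xc = X - X.mean()
gamma_hat = np.array([Xc[k:] @ Xc[:n - k] / n for k in range(n)])

# Cosine-sum representation evaluated at each omega_j
omega = 2 * np.pi * np.arange(n) / n
k = np.arange(1, n)
I_acov = gamma_hat[0] + 2 * np.cos(np.outer(omega, k)) @ gamma_hat[1:]

# The two agree for j = 1, ..., n-1 (at j = 0 the sample mean matters)
print(np.max(np.abs(I_fft[1:] - I_acov[1:])))
```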
30.2.2 Calculation
With numpy.fft.fft imported as fft and 𝑎0 , … , 𝑎𝑛−1 stored in NumPy array a, the function call fft(a) returns
the values 𝐴0 , … , 𝐴𝑛−1 as a NumPy array.
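It follows that when the data $X_0, \ldots, X_{n-1}$ are stored in array X, the values $I(\omega_j)$ at the Fourier frequencies, which are
given by
$$\frac{1}{n} \left| \sum_{t=0}^{n-1} X_t \exp\left\{ i 2\pi \frac{tj}{n} \right\} \right|^2, \qquad j = 0, \ldots, n-1$$
can be computed by np.abs(fft(X))**2 / len(X).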
Note: The NumPy function abs acts elementwise, and correctly handles complex numbers (by computing their modulus,
which is exactly what we need).
A function called periodogram that puts all this together is available in QuantEcon.py.
Let’s generate some data for this function using the ARMA class from QuantEcon.py (see the lecture on linear processes
for more details).
Here’s a code snippet that, once the preceding code has been run, generates data from the process
$$X_t = 0.5 X_{t-1} + \epsilon_t - 0.8 \epsilon_{t-2} \tag{30.2}$$
where $\{\epsilon_t\}$ is white noise with unit variance, and compares the periodogram to the actual spectral density
n = 40 # Data size
ϕ, θ = 0.5, (0, -0.8) # AR and MA parameters
lp = ARMA(ϕ, θ)
X = lp.simulation(ts_length=n)
fig, ax = plt.subplots()
x, y = periodogram(X)
ax.plot(x, y, 'b-', lw=2, alpha=0.5, label='periodogram')
x_sd, y_sd = lp.spectral_density(two_pi=False, res=120)
ax.plot(x_sd, y_sd, 'r-', lw=2, alpha=0.8, label='spectral density')
ax.legend()
plt.show()
This estimate looks rather disappointing, but the data size is only 40, so perhaps it’s not surprising that the estimate is
poor.
However, if we try again with n = 1200 the outcome is not much better.
The periodogram is far too irregular relative to the underlying spectral density.
This brings us to our next topic.
30.3 Smoothing
The standard way that smoothing is applied to periodograms is by taking local averages.
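In other words, the value $I(\omega_j)$ is replaced with a weighted average of the adjacent values $I(\omega_{j-p}), \ldots, I(\omega_{j+p})$,
$$I_S(\omega_j) := \sum_{\ell=-p}^{p} w(\ell) I(\omega_{j+\ell}) \tag{30.3}$$
where the weights $w(-p), \ldots, w(p)$ are a sequence of $2p + 1$ nonnegative values summing to one.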
In general, larger values of 𝑝 indicate more smoothing — more on this below.
The next figure shows the kind of sequence typically used.
Note the smaller weights towards the edges and larger weights in the center, so that more distant values from 𝐼(𝜔𝑗 ) have
less weight than closer ones in the sum (30.3).
def hanning_window(M):
    w = [0.5 - 0.5 * np.cos(2 * np.pi * n/(M-1)) for n in range(M)]
    return w
Our next step is to provide code that will not only estimate the periodogram but also provide smoothing as required.
Such functions have been written in estspec.py and are available once you’ve installed QuantEcon.py.
The GitHub listing displays three functions, smooth(), periodogram(), ar_periodogram(). We will discuss
the first two here and the third one below.
The periodogram() function returns a periodogram, optionally smoothed via the smooth() function.
Regarding the smooth() function, since smoothing adds a nontrivial amount of computation, we have applied a fairly
terse array-centric method based around np.convolve.
Readers are left either to explore or simply to use this code according to their interests.
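For readers who want to see the local-averaging idea in isolation, here is a minimal sketch built on np.convolve. It is not the smooth() function from estspec.py; the name smooth_local_average, the reflection padding at the ends, and the choice to drop the zero end points of the Hanning window are all assumptions made for illustration.

```python
import numpy as np

def smooth_local_average(I_vals, window_len=7):
    """Weighted local average of I_vals with normalized Hanning weights."""
    if window_len % 2 == 0:
        window_len += 1                      # force an odd length 2p + 1
    p = window_len // 2
    w = np.hanning(window_len + 2)[1:-1]     # drop the two zero end points
    w = w / w.sum()                          # weights sum to one
    # Reflect the data at both ends so the output has the same length as the input
    padded = np.concatenate([I_vals[p:0:-1], I_vals, I_vals[-2:-p - 2:-1]])
    return np.convolve(padded, w, mode='valid')

# Example on a raw periodogram of simulated white noise
rng = np.random.default_rng(1)
X = rng.standard_normal(400)
I_raw = np.abs(np.fft.fft(X))**2 / len(X)
I_smooth = smooth_local_average(I_raw, window_len=15)
print(I_raw.var(), I_smooth.var())
```

Larger values of window_len average over more neighboring frequencies and so produce a flatter estimate.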
The next three figures each show smoothed and unsmoothed periodograms, as well as the population or “true” spectral
density.
(The model is the same as before — see equation (30.2) — and there are 400 observations)
From the top figure to bottom, the window length is varied from small to large.
In looking at the figure, we can see that for this model and data size, the window length chosen in the middle figure
provides the best fit.
Relative to this value, the first window length provides insufficient smoothing, while the third gives too much smoothing.
Of course in real estimation problems, the true spectral density is not visible and the choice of appropriate smoothing
will have to be made based on judgement/priors or some other theory.
In the code listing, we showed three functions from the file estspec.py.
The third function in the file (ar_periodogram()) adds a pre-processing step to periodogram smoothing.
First, we describe the basic idea, and after that we give the code.
The essential idea is to
1. Transform the data in order to make estimation of the spectral density more efficient.
2. Compute the periodogram associated with the transformed data.
3. Reverse the effect of the transformation on the periodogram, so that it now estimates the spectral density of the
original process.
Step 1 is called pre-filtering or pre-whitening, while step 3 is called recoloring.
The first step is called pre-whitening because the transformation is usually designed to turn the data into something closer
to white noise.
Why would this be desirable in terms of spectral density estimation?
The reason is that we are smoothing our estimated periodogram based on estimated values at nearby points — recall
(30.3).
The underlying assumption that makes this a good idea is that the true spectral density is relatively regular — the value
of 𝐼(𝜔) is close to that of 𝐼(𝜔′ ) when 𝜔 is close to 𝜔′ .
This will not be true in all cases, but it is certainly true for white noise.
For white noise, 𝐼 is as regular as possible — it is a constant function.
In this case, values of $I(\omega')$ at points $\omega'$ near to $\omega$ provide the maximum possible amount of information about the value
$I(\omega)$.
Another way to put this is that if 𝐼 is relatively constant, then we can use a large amount of smoothing without introducing
too much bias.
Let’s examine this idea more carefully in a particular setting — where the data are assumed to be generated by an AR(1)
process.
(More general ARMA settings can be handled using similar techniques to those described below)
Suppose in particular that $\{X_t\}$ is covariance stationary and AR(1), with
$$X_{t+1} = \mu + \phi X_t + \epsilon_{t+1} \tag{30.4}$$
where $\mu$ and $\phi \in (-1, 1)$ are unknown parameters and $\{\epsilon_t\}$ is white noise.
It follows that if we regress 𝑋𝑡+1 on 𝑋𝑡 and an intercept, the residuals will approximate white noise.
Let
• 𝑔 be the spectral density of {𝜖𝑡 } — a constant function, as discussed above
• 𝐼0 be the periodogram estimated from the residuals — an estimate of 𝑔
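The three steps (pre-whiten, compute the periodogram, recolor) can be sketched for the AR(1) case as follows. This is an illustrative outline, not the ar_periodogram() function from QuantEcon.py. It relies on the standard result that, for an AR(1) process, the spectral density equals $|1 - \phi e^{i\omega}|^{-2}$ times the spectral density of the shock; the sample size, seed, and variable names are hypothetical, and no smoothing is applied to the residual periodogram here.

```python
import numpy as np

# Simulate an AR(1) process X_{t+1} = mu + phi X_t + eps_{t+1} (hypothetical values)
rng = np.random.default_rng(42)
n, mu, phi = 2000, 0.0, -0.9
X = np.empty(n)
X[0] = 0.0
for t in range(n - 1):
    X[t + 1] = mu + phi * X[t] + rng.standard_normal()

# Step 1 (pre-whitening): regress X_{t+1} on an intercept and X_t
Z = np.column_stack([np.ones(n - 1), X[:-1]])
(mu_hat, phi_hat), *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
resid = X[1:] - Z @ np.array([mu_hat, phi_hat])

# Step 2: periodogram I_0 of the residuals at the Fourier frequencies
n_r = len(resid)
omega = 2 * np.pi * np.arange(n_r) / n_r
I0 = np.abs(np.fft.fft(resid))**2 / n_r

# Step 3 (recoloring): undo the AR(1) filter in the frequency domain
f_hat = I0 / np.abs(1 - phi_hat * np.exp(1j * omega))**2

print(phi_hat)
```

In practice $I_0$ would be smoothed before recoloring, which is where the near-constant shape of a white noise spectrum pays off.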
30.4 Exercises
Exercise 30.4.1
Replicate this figure (modulo randomness).
The model is as in equation (30.2) and there are 400 observations.
For the smoothed periodogram, the window type is “hamming”.
## Data
n = 400
ϕ = 0.5
θ = 0, -0.8
lp = ARMA(ϕ, θ)
X = lp.simulation(ts_length=n)
x, y = periodogram(X)
ax[i].plot(x, y, 'b-', lw=2, alpha=0.5, label='periodogram')
ax[i].legend()
ax[i].set_title(f'window length = {wl}')
plt.show()
Exercise 30.4.2
Replicate this figure (modulo randomness).
The model is as in equation (30.4), with 𝜇 = 0, 𝜙 = −0.9 and 150 observations in each time series.
All periodograms are fit with the “hamming” window and window length of 65.
lp = ARMA(-0.9)
wl = 65
for i in range(3):
    X = lp.simulation(ts_length=150)
    ax[i].set_xlim(0, np.pi)
    ax[i].legend(loc='upper left')
plt.show()
THIRTYONE
In addition to what’s in Anaconda, this lecture will need the following libraries:
31.1 Overview
Many economic time series display persistent growth that prevents them from being asymptotically stationary and ergodic.
For example, outputs, prices, and dividends typically display irregular but persistent growth.
Asymptotic stationarity and ergodicity are key assumptions needed to make it possible to learn by applying statistical
methods.
But there are good ways to model time series that have persistent growth that still enable statistical learning based on a
law of large numbers for an asymptotically stationary and ergodic process.
Thus, [Hansen, 2012] described two classes of time series models that accommodate growth.
They are
1. additive functionals that display random “arithmetic growth”
2. multiplicative functionals that display random “geometric growth”
These two classes of processes are closely connected.
If a process {𝑦𝑡 } is an additive functional and 𝜙𝑡 = exp(𝑦𝑡 ), then {𝜙𝑡 } is a multiplicative functional.
In this lecture, we describe both additive functionals and multiplicative functionals.
We also describe and compute decompositions of additive and multiplicative processes into four components:
1. a constant
2. a trend component
3. an asymptotically stationary component
4. a martingale
We describe how to construct, simulate, and interpret these components.
More details about these concepts and algorithms can be found in Hansen [Hansen, 2012] and Hansen and Sargent [Hansen
and Sargent, 2024].
Let’s start with some imports:
import numpy as np
import scipy.linalg as la
import quantecon as qe
import matplotlib.pyplot as plt
from scipy.stats import norm, lognorm
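We construct our additive functional from two pieces, the first of which is a first-order vector autoregression (VAR)
$$x_{t+1} = A x_t + B z_{t+1} \tag{31.1}$$
Here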
• 𝑥𝑡 is an 𝑛 × 1 vector,
• 𝐴 is an 𝑛 × 𝑛 stable matrix (all eigenvalues lie within the open unit circle),
• 𝑧𝑡+1 ∼ 𝑁 (0, 𝐼) is an 𝑚 × 1 IID shock,
• 𝐵 is an 𝑛 × 𝑚 matrix, and
• 𝑥0 ∼ 𝑁 (𝜇0 , Σ0 ) is a random initial condition for 𝑥
The second piece is an equation that expresses increments of $\{y_t\}_{t=0}^\infty$ as linear functions of
• a scalar constant 𝜈,
• the vector 𝑥𝑡 , and
• the same Gaussian vector 𝑧𝑡+1 that appears in the VAR (31.1)
In particular,
$$y_{t+1} - y_t = \nu + D x_t + F z_{t+1} \tag{31.2}$$
A convenient way to represent our additive functional is to use a linear state space system.
To do this, we set up state and observation vectors
$$\hat x_t = \begin{bmatrix} 1 \\ x_t \\ y_t \end{bmatrix} \quad \text{and} \quad \hat y_t = \begin{bmatrix} x_t \\ y_t \end{bmatrix}$$
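$$\begin{bmatrix} 1 \\ x_{t+1} \\ y_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & A & 0 \\ \nu & D & 1 \end{bmatrix} \begin{bmatrix} 1 \\ x_t \\ y_t \end{bmatrix} + \begin{bmatrix} 0 \\ B \\ F \end{bmatrix} z_{t+1}$$
$$\begin{bmatrix} x_t \\ y_t \end{bmatrix} = \begin{bmatrix} 0 & I & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ x_t \\ y_t \end{bmatrix}$$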
This can be written as
$$\hat x_{t+1} = \hat A \hat x_t + \hat B z_{t+1}$$
$$\hat y_t = \hat D \hat x_t$$
31.3 Dynamics
$$\tilde x_{t+1} = \phi_1 \tilde x_t + \phi_2 \tilde x_{t-1} + \phi_3 \tilde x_{t-2} + \phi_4 \tilde x_{t-3} + \sigma z_{t+1} \tag{31.3}$$
$$\phi(z) = (1 - \phi_1 z - \phi_2 z^2 - \phi_3 z^3 - \phi_4 z^4)$$
31.3.1 Simulation
class AMF_LSS_VAR:
    """
    This class transforms an additive (multiplicative)
    functional into a QuantEcon linear state space system.
    """
        # Set F
        if not np.any(F):
            self.F = np.zeros((self.nk, 1))
        else:
            self.F = F
        # Set ν
        if not np.any(ν):
            self.ν = np.zeros((self.nm, 1))
        elif type(ν) == float:
            self.ν = np.asarray([[ν]])
        elif len(ν.shape) == 1:
            self.ν = np.expand_dims(ν, 1)
        else:
            self.ν = ν
        if self.ν.shape[0] != self.D.shape[0]:
            raise ValueError("The dimension of ν is inconsistent with D!")

    def construct_ss(self):
        # Auxiliary blocks with 0's and 1's to fill out the lss matrices
        nx0c = np.zeros((nx, 1))
        nx0r = np.zeros(nx)
        nx1 = np.ones(nx)
        nk0 = np.zeros(nk)
        ny0c = np.zeros((nm, 1))
        ny0r = np.zeros(nm)
        ny1m = np.eye(nm)
        ny0m = np.zeros((nm, nm))
        nyx0m = np.zeros_like(D)
        return lss

    def additive_decomp(self):
        """
        Return values for the martingale decomposition
        - ν : unconditional mean difference in Y
        - H : coefficient for the (linear) martingale component (κ_a)
        - g : coefficient for the stationary component g(x)
        - Y_0 : it should be the function of X_0 (for now set it to 0.0)
        """
        I = np.identity(self.nx)
        A_res = la.solve(I - self.A, I)
        g = self.D @ A_res
        H = self.F + self.D @ A_res @ self.B
        return self.ν, H, g

    def multiplicative_decomp(self):
        """
        Return values for the multiplicative decomposition (Example 5.4.4.)
        - ν_tilde : eigenvalue
        - H : vector for the Jensen term
        """
        ν, H, g = self.additive_decomp()
        ν_tilde = ν + (.5)*np.expand_dims(np.diag(H @ H.T), 1)
        return ν_tilde, H, g
return llh[-1]
Plotting
The code below adds some functions that generate plots for instances of the AMF_LSS_VAR class.
# Allocate space
trange = np.arange(T)
# Create figure
fig, ax = plt.subplots(2, 2, sharey=True, figsize=(15, 8))
return fig
"""
# Pull out right sizes so we know how to increment
add_figs = []
for ii in range(nm):
li, ui = npaths*(ii), npaths*(ii+1)
LI, UI = 2*(ii), 2*(ii+1)
add_figs.append(plot_given_paths(amf, T,
ypath[li:ui,:],
mpath[li:ui,:],
spath[li:ui,:],
tpath[li:ui,:],
mbounds[LI:UI,:],
sbounds[LI:UI,:],
return add_figs
"""
# Pull out right sizes so we know how to increment
nx, nk, nm = amf.nx, amf.nk, amf.nm
# Matrices for the multiplicative decomposition
ν_tilde, H, g = amf.multiplicative_decomp()
mult_figs = []
for ii in range(nm):
li, ui = npaths*(ii), npaths*(ii+1)
LI, UI = 2*(ii), 2*(ii+1)
mult_figs.append(plot_given_paths(amf,T,
ypath_mult[li:ui,:],
mpath_mult[li:ui,:],
spath_mult[li:ui,:],
tpath_mult[li:ui,:],
mbounds_mult[LI:UI,:],
sbounds_mult[LI:UI,:],
1,
show_trend=show_trend))
mult_figs[ii].suptitle(f'Multiplicative decomposition of \
$y_{ii+1}$', fontsize=14)
return mult_figs
# Create figure
fig, ax = plt.subplots(1, 1, figsize=(10, 6))
return fig
).item()
)
mbounds_mult[li:ui, t] = Mdist.ppf([.01, .99])
mart_figs = []
for ii in range(nm):
li, ui = npaths*(ii), npaths*(ii+1)
LI, UI = 2*(ii), 2*(ii+1)
mart_figs.append(plot_martingale_paths(amf, T, mpath_mult[li:ui, :],
mbounds_mult[LI:UI, :],
horline=1))
mart_figs[ii].suptitle(f'Martingale components for many paths of \
$y_{ii+1}$', fontsize=14)
return mart_figs
For now, we just plot 𝑦𝑡 and 𝑥𝑡 , postponing until later a description of exactly how we compute them.
# A matrix should be n x n
A = np.array([[ϕ_1, ϕ_2, ϕ_3, ϕ_4],
[ 1, 0, 0, 0],
[ 0, 1, 0, 0],
[ 0, 0, 1, 0]])
# B matrix should be n x k
B = np.array([[σ, 0, 0, 0]]).T
D = np.array([1, 0, 0, 0]) @ A
F = np.array([1, 0, 0, 0]) @ B
T = 150
x, y = amf.lss.simulate(T)
31.3.2 Decomposition
Hansen and Sargent [Hansen and Sargent, 2024] describe how to construct a decomposition of an additive functional into
four parts:
• a constant inherited from initial values 𝑥0 and 𝑦0
• a linear trend
• a martingale
• an (asymptotically) stationary component
To attain this decomposition for the particular class of additive functionals defined by (31.1) and (31.2), we first construct
the matrices
$$H := F + D(I - A)^{-1} B$$
$$g := D(I - A)^{-1}$$
Then the Hansen [Hansen, 2012], [Hansen and Sargent, 2024] decomposition is
$$y_t = \underbrace{t \nu}_{\text{trend component}} + \overbrace{\sum_{j=1}^t H z_j}^{\text{Martingale component}} - \underbrace{g x_t}_{\text{stationary component}} + \overbrace{g x_0 + y_0}^{\text{initial conditions}}$$
At this stage, you should pause and verify that 𝑦𝑡+1 − 𝑦𝑡 satisfies (31.2).
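The verification can also be done numerically. The sketch below assumes a scalar state and uses the recursions implied by the state space system above, namely $x_{t+1} = A x_t + B z_{t+1}$ and $y_{t+1} = y_t + \nu + D x_t + F z_{t+1}$. The parameter values, initial conditions, and seed are chosen only for illustration.

```python
import numpy as np

# Scalar example (illustrative parameter values)
A, B, D, F, nu = 0.8, 0.001, 1.0, 0.01, 0.005
T = 200
rng = np.random.default_rng(0)
z = rng.standard_normal(T + 1)       # z[j] plays the role of z_j; z[0] is unused

# Simulate x_t and y_t directly from the recursions
x = np.zeros(T + 1)
y = np.zeros(T + 1)
x[0], y[0] = 0.1, 0.0                # hypothetical initial conditions
for t in range(T):
    x[t + 1] = A * x[t] + B * z[t + 1]
    y[t + 1] = y[t] + nu + D * x[t] + F * z[t + 1]

# Decomposition coefficients
g = D / (1 - A)
H = F + g * B

# Rebuild y_t from trend, martingale, stationary and initial-condition terms
t_grid = np.arange(T + 1)
trend = nu * t_grid
martingale = np.concatenate([[0.0], np.cumsum(H * z[1:])])
y_decomp = trend + martingale - g * x + g * x[0] + y[0]

print(np.max(np.abs(y - y_decomp)))
```

The maximum discrepancy is at the level of floating point rounding, which is what the algebra predicts.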
It is convenient for us to introduce the following notation:
• 𝜏𝑡 = 𝜈𝑡 , a linear, deterministic trend
• $m_t = \sum_{j=1}^t H z_j$, a martingale with time $t + 1$ increment $H z_{t+1}$
• 𝑠𝑡 = 𝑔𝑥𝑡 , an (asymptotically) stationary component
We want to characterize and simulate components 𝜏𝑡 , 𝑚𝑡 , 𝑠𝑡 of the decomposition.
A convenient way to do this is to construct an appropriate instance of a linear state space system by using LinearStateSpace
from QuantEcon.py.
This will allow us to use the routines in LinearStateSpace to study dynamics.
To start, observe that, under the dynamics in (31.1) and (31.2) and with the definitions just given,
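$$\begin{bmatrix} 1 \\ t+1 \\ x_{t+1} \\ y_{t+1} \\ m_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & A & 0 & 0 \\ \nu & 0 & D & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ t \\ x_t \\ y_t \\ m_t \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ B \\ F \\ H \end{bmatrix} z_{t+1}$$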
and
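$$\begin{bmatrix} x_t \\ y_t \\ \tau_t \\ m_t \\ s_t \end{bmatrix} = \begin{bmatrix} 0 & 0 & I & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & \nu & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & -g & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ t \\ x_t \\ y_t \\ m_t \end{bmatrix}$$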
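With
$$\tilde x := \begin{bmatrix} 1 \\ t \\ x_t \\ y_t \\ m_t \end{bmatrix} \quad \text{and} \quad \tilde y := \begin{bmatrix} x_t \\ y_t \\ \tau_t \\ m_t \\ s_t \end{bmatrix}$$
we can write this as
$$\tilde x_{t+1} = \tilde A \tilde x_t + \tilde B z_{t+1}$$
$$\tilde y_t = \tilde D \tilde x_t$$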
31.4 Code
The class AMF_LSS_VAR mentioned above does all that we want to study our additive functional.
In fact, AMF_LSS_VAR does more because it allows us to study an associated multiplicative functional as well.
(A hint that it does more is the name of the class – here AMF stands for “additive and multiplicative functional” – the
code computes and displays objects associated with multiplicative functionals too.)
Let’s use this code (embedded above) to explore the example process described above.
If you run the code that first simulated that example again and then the method call you will generate (modulo randomness)
the plot
plot_additive(amf, T)
plt.show()
When we plot multiple realizations of a component in the 2nd, 3rd, and 4th panels, we also plot the population 95%
probability coverage sets computed using the LinearStateSpace class.
We have chosen to simulate many paths, all starting from the same non-random initial conditions 𝑥0 , 𝑦0 (you can tell this
from the shape of the 95% probability coverage shaded areas).
Notice tell-tale signs of these probability coverage shaded areas
• the purple one for the martingale component $m_t$ grows with $\sqrt{t}$
• the green one for the stationary component 𝑠𝑡 converges to a constant band
or
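$$\frac{M_t}{M_0} = \exp(\tilde\nu t) \left( \frac{\widetilde M_t}{\widetilde M_0} \right) \left( \frac{\tilde e(X_0)}{\tilde e(x_t)} \right)$$
where
$$\tilde\nu = \nu + \frac{H \cdot H}{2}, \qquad \widetilde M_t = \exp\left( \sum_{j=1}^t \left( H \cdot z_j - \frac{H \cdot H}{2} \right) \right), \qquad \widetilde M_0 = 1$$
and
$$\tilde e(x) = \exp[g(x)] = \exp[D(I - A)^{-1} x]$$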
An instance of class AMF_LSS_VAR (above) includes this associated multiplicative functional as an attribute.
Let’s plot this multiplicative functional for our example.
If you run the code that first simulated that example again and then the method call in the cell below you’ll obtain the graph
in the next cell.
plot_multiplicative(amf, T)
plt.show()
As before, when we plotted multiple realizations of a component in the 2nd, 3rd, and 4th panels, we also plotted population
95% confidence bands computed using the LinearStateSpace class.
Comparing this figure and the last also helps show how geometric growth differs from arithmetic growth.
The top right panel of the above graph shows a panel of martingales associated with the panel of 𝑀𝑡 = exp(𝑦𝑡 ) that we
have generated for a limited horizon 𝑇 .
It is interesting to see how the martingale behaves as $T \to +\infty$.
Let’s see what happens when we set 𝑇 = 12000 instead of 150.
Hansen and Sargent [Hansen and Sargent, 2024] (ch. 8) describe the following two properties of the martingale component
$\widetilde M_t$ of the multiplicative decomposition
• while $E_0 \widetilde M_t = 1$ for all $t \geq 0$, nevertheless …
• as $t \to +\infty$, $\widetilde M_t$ converges to zero almost surely
The first property follows from the fact that $\widetilde M_t$ is a multiplicative martingale with initial condition $\widetilde M_0 = 1$.
The second is a peculiar property noted and proved by Hansen and Sargent [Hansen and Sargent, 2024].
The following simulation of many paths of $\widetilde M_t$ illustrates both properties
np.random.seed(10021987)
plot_martingales(amf, 12000)
plt.show()
The dotted line in the above graph is the mean $E \widetilde M_t = 1$ of the martingale.
It remains constant at unity, illustrating the first property.
The purple 95 percent frequency coverage interval collapses around zero, illustrating the second property.
Let’s drill down and study the probability distribution of the multiplicative martingale $\{\widetilde M_t\}_{t=0}^\infty$ in more detail.
In particular, we want to simulate 5000 sample paths of length 𝑇 for the case in which 𝑥 is a scalar and [𝐴, 𝐵, 𝐷, 𝐹 ] =
[0.8, 0.001, 1.0, 0.01] and 𝜈 = 0.005.
After accomplishing this, we want to display and study histograms of $\widetilde M_T^i$ for various values of $T$.
Here is code that accomplishes these tasks.
We’ll do this by formulating the additive functional as a linear state space model and putting the LinearStateSpace class
to work.
class AMF_LSS_VAR:
    """
    This class is written to transform a scalar additive functional
    into a linear state space system.
    """
    def __init__(self, A, B, D, F=0.0, ν=0.0):
        # Unpack required elements
        self.A, self.B, self.D, self.F, self.ν = A, B, D, F, ν

    def construct_ss(self):
        return lss

    def additive_decomp(self):
        """
        Return values for the martingale decomposition (Proposition 4.3.3.)
        - ν : unconditional mean difference in Y
        - H : coefficient for the (linear) martingale component (kappa_a)
        - g : coefficient for the stationary component g(x)
        - Y_0 : it should be the function of X_0 (for now set it to 0.0)
        """
        A_res = 1 / (1 - self.A)
        g = self.D * A_res
        H = self.F + self.D * A_res * self.B

        # multiplicative_decomp() unpacks three values from this method
        return self.ν, H, g

    def multiplicative_decomp(self):
        """
        Return values for the multiplicative decomposition (Example 5.4.4.)
        - ν_tilde : eigenvalue
        - H : vector for the Jensen term
        """
        ν, H, g = self.additive_decomp()
        ν_tilde = ν + (.5) * H**2
        return ν_tilde, H, g
return llh[-1]
return x, y
# Allocate space
storeX = np.empty((I, T))
storeY = np.empty((I, T))
for i in range(I):
# Do specific simulation
x, y = simulate_xy(amf, T)
Now that we have these functions in our toolkit, let’s apply them to run some simulations.
# Allocate space
add_mart_comp = np.empty((I, T))
# Build model
amf_2 = AMF_LSS_VAR(0.8, 0.001, 1.0, 0.01,.005)
Note: scipy.stats.lognorm expects you to pass the standard deviation of the log first ($\sqrt{t H \cdot H}$) and then the exponent of
the mean of the log as a keyword argument scale (scale=np.exp(-t * H2 / 2)).
• See the scipy.stats.lognorm documentation.
This is peculiar, so make sure you are careful in working with the log normal distribution.
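A quick way to guard against mistakes in this parametrization is to check the implied moments. The sketch below uses the fact that a log normal variable whose log has mean $\mu$ and standard deviation $s$ has mean $\exp(\mu + s^2/2)$ and median $\exp(\mu)$. The value of H2 corresponds to $H = 0.015$, which is what the scalar example parameters above imply; the horizon t is arbitrary.

```python
import numpy as np
from scipy.stats import lognorm

H2 = 0.015**2            # H * H for the scalar example (H = F + D * B / (1 - A))
t = 100                  # arbitrary horizon for illustration

s = np.sqrt(t * H2)      # standard deviation of log M_t (first positional argument)
mu = -t * H2 / 2         # mean of log M_t (enters through scale = exp(mu))

mdist = lognorm(s, scale=np.exp(mu))

mean_M = mdist.mean()      # exp(mu + s**2 / 2), which is exp(0) here
median_M = mdist.median()  # exp(mu), which lies below one

print(mean_M, median_M)
```

The mean stays at one for every t while the median falls as t grows, which is the pattern the density plots are meant to display.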
# The distribution
mdist = lognorm(np.sqrt(t*H2), scale=np.exp(-t*H2/2))
x = np.linspace(xmin, xmax, npts)
pdf = mdist.pdf(x)
return x, pdf
# The distribution
lmdist = norm(-t*H2/2, np.sqrt(t*H2))
x = np.linspace(xmin, xmax, npts)
pdf = lmdist.pdf(x)
return x, pdf
plt.tight_layout()
plt.show()
These probability density functions help us understand mechanics underlying the peculiar property of our multiplicative
martingale
• As 𝑇 grows, most of the probability mass shifts leftward toward zero.
• For example, note that most mass is near 1 for $T = 10$ or $T = 100$ but most of it is near 0 for $T = 5000$.
• As $T$ grows, the tail of the density of $\widetilde M_T$ lengthens toward the right.
• Enough mass moves toward the right tail to keep $E \widetilde M_T = 1$ even as most mass in the distribution of $\widetilde M_T$ collapses
around 0.
THIRTYTWO
32.1 Overview
In an earlier lecture Linear Quadratic Dynamic Programming Problems, we have studied how to solve a special class
of dynamic optimization and prediction problems by applying the method of dynamic programming. In this class of
problems
• the objective function is quadratic in states and controls.
• the one-step transition function is linear.
• shocks are IID Gaussian or martingale differences.
In this lecture and a companion lecture Classical Filtering with Linear Algebra, we study the classical theory of linear-
quadratic (LQ) optimal control problems.
The classical approach does not use the two closely related methods – dynamic programming and Kalman filtering –
that we describe in other lectures, namely, Linear Quadratic Dynamic Programming Problems and A First Look at the
Kalman Filter.
Instead, it uses either
• 𝑧-transform and lag operator methods, or
• matrix decompositions applied to linear systems of first-order conditions for optimum problems.
In this lecture and the sequel Classical Filtering with Linear Algebra, we mostly rely on elementary linear algebra.
The main tool from linear algebra we’ll put to work here is LU decomposition.
We’ll begin with finite horizon problems.
Then we’ll view infinite horizon problems as appropriate limits of these finite horizon problems.
Later, we will examine the close connection between LQ control and least-squares prediction and filtering problems.
These classes of problems are connected in the sense that to solve each, essentially the same mathematics is used.
Let’s start with some standard imports:
import numpy as np
import matplotlib.pyplot as plt
32.1.1 References
Useful references include [Whittle, 1963], [Hansen and Sargent, 1980], [Orfanidis, 1988], [Athanasios and Pillai, 1991],
and [Muth, 1960].
Let $L$ be the lag operator, so that, for sequence $\{x_t\}$ we have $L x_t = x_{t-1}$.
More generally, let $L^k x_t = x_{t-k}$ with $L^0 x_t = x_t$ and
$$d(L) = d_0 + d_1 L + \ldots + d_m L^m$$
where
• ℎ is a positive parameter and 𝛽 ∈ (0, 1) is a discount factor.
• $\{a_t\}_{t \geq 0}$ is a sequence of exponential order less than $\beta^{-1/2}$, by which we mean $\lim_{t \to \infty} \beta^{t/2} a_t = 0$.
Maximization in (32.1) is subject to initial conditions for 𝑦−1 , 𝑦−2 … , 𝑦−𝑚 .
Maximization is over infinite sequences {𝑦𝑡 }𝑡≥0 .
32.2.1 Example
The formulation of the LQ problem given above is broad enough to encompass many useful models.
As a simple illustration, recall that in LQ Control: Foundations we consider a monopolist facing stochastic demand shocks
and adjustment costs.
Let’s consider a deterministic version of this problem, where the monopolist maximizes the discounted sum
$$\sum_{t=0}^{\infty} \beta^t \pi_t$$
and
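$$\begin{aligned} \frac{\partial J}{\partial y_t} &= 2\beta^t d_0\, d(L) y_t + 2\beta^{t+1} d_1\, d(L) y_{t+1} + \cdots + 2\beta^{t+m} d_m\, d(L) y_{t+m} \\ &= 2\beta^t \left( d_0 + d_1 \beta L^{-1} + d_2 \beta^2 L^{-2} + \cdots + d_m \beta^m L^{-m} \right) d(L) y_t \end{aligned}$$
$$\frac{\partial J}{\partial y_t} = 2\beta^t d(\beta L^{-1})\, d(L) y_t \tag{32.2}$$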
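$$\begin{aligned} \frac{\partial J}{\partial y_N} &= 2\beta^N d_0\, d(L) y_N \\ \frac{\partial J}{\partial y_{N-1}} &= 2\beta^{N-1} \left[ d_0 + \beta d_1 L^{-1} \right] d(L) y_{N-1} \\ &\ \ \vdots \\ \frac{\partial J}{\partial y_{N-m+1}} &= 2\beta^{N-m+1} \left[ d_0 + \beta L^{-1} d_1 + \cdots + \beta^{m-1} L^{-m+1} d_{m-1} \right] d(L) y_{N-m+1} \end{aligned} \tag{32.3}$$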
With these preliminaries under our belts, we are ready to differentiate (32.1).
Differentiating (32.1) with respect to 𝑦𝑡 for 𝑡 = 0, … , 𝑁 − 𝑚 gives the Euler equations
The system of equations (32.4) forms a 2 × 𝑚 order linear difference equation that must hold for the values of 𝑡 indicated.
In the finite 𝑁 problem, we want simultaneously to solve (32.4) subject to the 𝑚 initial conditions 𝑦−1 , … , 𝑦−𝑚 and the
𝑚 terminal conditions (32.5).
These conditions uniquely pin down the solution of the finite 𝑁 problem.
That is, for the finite 𝑁 problem, conditions (32.4) and (32.5) are necessary and sufficient for a maximum, by concavity
of the objective function.
Next, we describe how to obtain the solution using matrix methods.
Let’s look at how linear algebra can be used to tackle and shed light on the finite horizon LQ control problem.
$$\begin{aligned} \left[ h + d(\beta L^{-1})\, d(L) \right] y_t &= a_t, \qquad t = 0, 1, \ldots, N-1 \\ \beta^N \left[ a_N - h y_N - d_0\, d(L) y_N \right] &= 0 \end{aligned} \tag{32.6}$$
where 𝑑(𝐿) = 𝑑0 + 𝑑1 𝐿.
These equations are to be solved for 𝑦0 , 𝑦1 , … , 𝑦𝑁 as functions of 𝑎0 , 𝑎1 , … , 𝑎𝑁 and 𝑦−1 .
Let
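$$\begin{bmatrix} (\phi_0 - d_1^2) & \phi_1 & 0 & 0 & \ldots & \ldots & 0 \\ \beta\phi_1 & \phi_0 & \phi_1 & 0 & \ldots & \ldots & 0 \\ 0 & \beta\phi_1 & \phi_0 & \phi_1 & \ldots & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & \ldots & \ldots & \ldots & \beta\phi_1 & \phi_0 & \phi_1 \\ 0 & \ldots & \ldots & \ldots & 0 & \beta\phi_1 & \phi_0 \end{bmatrix} \begin{bmatrix} y_N \\ y_{N-1} \\ y_{N-2} \\ \vdots \\ y_1 \\ y_0 \end{bmatrix} = \begin{bmatrix} a_N \\ a_{N-1} \\ a_{N-2} \\ \vdots \\ a_1 \\ a_0 - \phi_1 y_{-1} \end{bmatrix} \tag{32.7}$$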
or
𝑊 𝑦 ̄ = 𝑎̄ (32.8)
1. The first element differs from the remaining diagonal elements, reflecting the terminal condition.
2. The sub-diagonal elements equal $\beta$ times the super-diagonal elements.
The solution of (32.8) can be expressed in the form
𝑦 ̄ = 𝑊 −1 𝑎̄ (32.9)
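To make the stacked system concrete, here is a small sketch for the case $d(L) = d_0 + d_1 L$. It is an illustration under stated assumptions rather than the lecture's own code: the coefficients phi0 and phi1 are obtained by expanding $h + d(\beta L^{-1}) d(L)$, the top-left entry of W is taken directly from the terminal condition in (32.6) (it coincides with the entry displayed in (32.7) when $\beta = 1$), and all numerical values are hypothetical.

```python
import numpy as np

# Hypothetical parameter values for the case d(L) = d0 + d1 L
N, h, d0, d1, beta = 10, 1.0, 1.0, -0.5, 0.95
y_m1 = 0.5                                   # initial condition y_{-1}
a = np.linspace(1.0, 2.0, N + 1)             # forcing sequence a_0, ..., a_N

# Expanding h + d(beta L^{-1}) d(L) gives phi0 + phi1 L + beta phi1 L^{-1}
phi0 = h + d0**2 + beta * d1**2
phi1 = d0 * d1

# Tridiagonal W with rows ordered y_N, y_{N-1}, ..., y_0
W = (phi0 * np.eye(N + 1)
     + phi1 * np.eye(N + 1, k=1)             # super-diagonal
     + beta * phi1 * np.eye(N + 1, k=-1))    # sub-diagonal
W[0, 0] = h + d0**2                          # terminal condition row from (32.6)

# Right-hand side in the same reverse-time order
a_bar = a[::-1].copy()
a_bar[-1] -= phi1 * y_m1

# Solve W y_bar = a_bar and flip back to forward time order
y_bar = np.linalg.solve(W, a_bar)
y = y_bar[::-1]                              # y[t] is y_t for t = 0, ..., N

print(y)
```

Using np.linalg.solve avoids forming $W^{-1}$ explicitly, which is usually preferable numerically.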
An Alternative Representation
An alternative way to express the solution to (32.7) or (32.8) is in so-called feedback-feedforward form.
The idea here is to find a solution expressing 𝑦𝑡 as a function of past 𝑦’s and current and future 𝑎’s.
To achieve this solution, one can use an LU decomposition of 𝑊 .
There always exists a decomposition of 𝑊 of the form 𝑊 = 𝐿𝑈 where
• 𝐿 is an (𝑁 + 1) × (𝑁 + 1) lower triangular matrix.
• 𝑈 is an (𝑁 + 1) × (𝑁 + 1) upper triangular matrix.
The factorization can be normalized so that the diagonal elements of 𝑈 are unity.
Using the LU representation in (32.9), we obtain
𝑈 𝑦 ̄ = 𝐿−1 𝑎̄ (32.10)
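$$\begin{bmatrix} 1 & U_{12} & 0 & 0 & \ldots & 0 & 0 \\ 0 & 1 & U_{23} & 0 & \ldots & 0 & 0 \\ 0 & 0 & 1 & U_{34} & \ldots & 0 & 0 \\ 0 & 0 & 0 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \ldots & 1 & U_{N,N+1} \\ 0 & 0 & 0 & 0 & \ldots & 0 & 1 \end{bmatrix} \begin{bmatrix} y_N \\ y_{N-1} \\ y_{N-2} \\ y_{N-3} \\ \vdots \\ y_1 \\ y_0 \end{bmatrix} = \begin{bmatrix} L^{-1}_{11} & 0 & 0 & \ldots & 0 \\ L^{-1}_{21} & L^{-1}_{22} & 0 & \ldots & 0 \\ L^{-1}_{31} & L^{-1}_{32} & L^{-1}_{33} & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ L^{-1}_{N,1} & L^{-1}_{N,2} & L^{-1}_{N,3} & \ldots & 0 \\ L^{-1}_{N+1,1} & L^{-1}_{N+1,2} & L^{-1}_{N+1,3} & \ldots & L^{-1}_{N+1,N+1} \end{bmatrix} \begin{bmatrix} a_N \\ a_{N-1} \\ a_{N-2} \\ \vdots \\ a_1 \\ a_0 - \phi_1 y_{-1} \end{bmatrix}$$
where $L^{-1}_{ij}$ is the $(i, j)$ element of $L^{-1}$ and $U_{ij}$ is the $(i, j)$ element of $U$.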
Note how the left side for a given 𝑡 involves 𝑦𝑡 and one lagged value 𝑦𝑡−1 while the right side involves all future values of
the forcing process 𝑎𝑡 , 𝑎𝑡+1 , … , 𝑎𝑁 .
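The structure just described can be seen in a small numerical sketch. Library LU routines typically normalize the diagonal of $L$ and may permute rows, so the example below uses a hand-written Crout factorization without pivoting, which puts ones on the diagonal of $U$ as in the text. The function name crout_lu, the tridiagonal matrix, and the right-hand side are hypothetical choices for illustration.

```python
import numpy as np

def crout_lu(W):
    """Factor W = L @ U with L lower triangular and U unit upper triangular.

    No pivoting is performed, so this assumes nonzero pivots L[j, j].
    """
    n = W.shape[0]
    L = np.zeros((n, n))
    U = np.eye(n)
    for j in range(n):
        for i in range(j, n):                # column j of L
            L[i, j] = W[i, j] - L[i, :j] @ U[:j, j]
        for k in range(j + 1, n):            # row j of U
            U[j, k] = (W[j, k] - L[j, :j] @ U[:j, k]) / L[j, j]
    return L, U

# A small tridiagonal matrix with the same pattern as W (hypothetical values)
N, beta, phi0, phi1 = 5, 0.95, 2.2375, -0.5
W = (phi0 * np.eye(N + 1)
     + phi1 * np.eye(N + 1, k=1)
     + beta * phi1 * np.eye(N + 1, k=-1))
W[0, 0] = 2.0
a_bar = np.linspace(2.0, 1.0, N + 1)         # stands in for the stacked forcing vector

L, U = crout_lu(W)

# Feedforward part: L^{-1} a_bar; feedback part: solve U y_bar = L^{-1} a_bar
rhs = np.linalg.solve(L, a_bar)
y_bar = np.linalg.solve(U, rhs)

print(np.round(U, 3))
```

Because W is tridiagonal, U has nonzero entries only on the diagonal and the first super-diagonal, which is why each row links $y_t$ to just one lagged value.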
We briefly indicate how this approach extends to the problem with 𝑚 > 1.
Assume that 𝛽 = 1 and let 𝐷𝑚+1 be the (𝑚 + 1) × (𝑚 + 1) symmetric matrix whose elements are determined from the
following formula:
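$$(D_{m+1} + h I_{m+1}) \begin{bmatrix} y_N \\ y_{N-1} \\ \vdots \\ y_{N-m} \end{bmatrix} = \begin{bmatrix} a_N \\ a_{N-1} \\ \vdots \\ a_{N-m} \end{bmatrix} + M \begin{bmatrix} y_{N-m-1} \\ y_{N-m-2} \\ \vdots \\ y_{N-2m} \end{bmatrix}$$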
where 𝑀 is (𝑚 + 1) × 𝑚 and
𝑈 𝑦 ̄ = 𝐿−1 𝑎̄ (32.11)
$$\sum_{j=0}^{t} U_{-t+N+1,\, -t+N+j+1}\, y_{t-j} = \sum_{j=0}^{N-t} L^{-1}_{-t+N+1,\, -t+N+1-j}\, \bar a_{t+j}, \qquad t = 0, 1, \ldots, N$$
where $L^{-1}_{t,s}$ is the element in the $(t, s)$ position of $L^{-1}$, and similarly for $U$.
The left side of equation (32.11) is the “feedback” part of the optimal control law for 𝑦𝑡 , while the right-hand side is the
“feedforward” part.
We note that there is a different control law for each 𝑡.
Thus, in the finite horizon case, the optimal control law is time-dependent.
It is natural to suspect that as 𝑁 → ∞, (32.11) becomes equivalent to the solution of our infinite horizon problem, which
below we shall show can be expressed as
so that as $N \to \infty$ we expect that for each fixed $t$, $U^{-1}_{t,t-j} \to c_j$ and $L_{t,t+j}$ approaches the coefficient on $L^{-j}$ in the
expansion of $c(\beta L^{-1})^{-1}$.
This suspicion is true under general conditions that we shall study later.
For now, we note that by creating the matrix 𝑊 for large 𝑁 and factoring it into the 𝐿𝑈 form, good approximations to
𝑐(𝐿) and 𝑐(𝛽𝐿−1 )−1 can be obtained.
For the infinite horizon problem, we propose to discover first-order necessary conditions by taking the limits of (32.4)
and (32.5) as 𝑁 → ∞.
This approach is valid, and the limits of (32.4) and (32.5) as 𝑁 approaches infinity are first-order necessary conditions
for a maximum.
However, for the infinite horizon problem with 𝛽 < 1, the limits of (32.4) and (32.5) are, in general, not sufficient for a
maximum.
That is, the limits of (32.5) do not provide enough information uniquely to determine the solution of the Euler equation
(32.4) that maximizes (32.1).
As we shall see below, a side condition on the path of 𝑦𝑡 that together with (32.4) is sufficient for an optimum is
$$
\sum_{t=0}^{\infty} \beta^t h y_t^2 < \infty \tag{32.12}
$$
All paths that satisfy the Euler equations, except the one that we shall select below, violate this condition and, therefore,
evidently lead to (much) lower values of (32.1) than does the optimal path selected by the solution procedure below.
Consider the characteristic equation associated with the Euler equation
where 𝑧0 is a constant.
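In (32.14), we substitute $(z - z_j) = -z_j \left(1 - \frac{1}{z_j} z\right)$ and $(z - \beta z_j^{-1}) = z \left(1 - \frac{\beta}{z_j} z^{-1}\right)$ for $j = 1, \ldots, m$ to get
$$
h + d(\beta z^{-1}) d(z) = (-1)^m (z_0 z_1 \cdots z_m) \left(1 - \frac{1}{z_1} z\right) \cdots \left(1 - \frac{1}{z_m} z\right) \left(1 - \frac{1}{z_1} \beta z^{-1}\right) \cdots \left(1 - \frac{1}{z_m} \beta z^{-1}\right)
$$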
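Now define $c(z) = \sum_{j=0}^{m} c_j z^j$ as
$$
c(z) = \left[(-1)^m z_0 z_1 \cdots z_m\right]^{1/2} \left(1 - \frac{z}{z_1}\right) \left(1 - \frac{z}{z_2}\right) \cdots \left(1 - \frac{z}{z_m}\right) \tag{32.15}
$$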
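$$
c(z) = c_0 (1 - \lambda_1 z) \cdots (1 - \lambda_m z) \tag{32.17}
$$
where
$$
c_0 = \left[(-1)^m z_0 z_1 \cdots z_m\right]^{1/2}; \quad \lambda_j = \frac{1}{z_j}, \quad j = 1, \ldots, m
$$
Since $|z_j| > \sqrt{\beta}$ for $j = 1, \ldots, m$ it follows that $|\lambda_j| < 1/\sqrt{\beta}$ for $j = 1, \ldots, m$.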
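Using (32.17), we can express the factorization (32.16) as
$$
h + d(\beta z^{-1}) d(z) = c_0^2 (1 - \lambda_1 z) \cdots (1 - \lambda_m z)(1 - \lambda_1 \beta z^{-1}) \cdots (1 - \lambda_m \beta z^{-1})
$$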
In sum, we have constructed a factorization (32.16) of the characteristic polynomial for the Euler equation in which the
zeros of 𝑐(𝑧) exceed 𝛽 1/2 in modulus, and the zeros of 𝑐 (𝛽𝑧 −1 ) are less than 𝛽 1/2 in modulus.
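The factorization just described can be checked numerically. The following is a minimal sketch, not the lecture's `LQFilter` implementation: the function name `factor_characteristic` and the parameter values are assumptions chosen for illustration. It forms the coefficients of $h + d(\beta z^{-1}) d(z)$, keeps the $m$ roots with the largest moduli, builds $c_0$ and $\lambda_j = 1/z_j$, and then compares $c(\beta z^{-1}) c(z)$ with the original polynomial at a test point.

```python
import numpy as np

def factor_characteristic(d, h, beta):
    """Return (c_0, lambdas, phi) with h + d(beta/z) d(z) = c(beta/z) c(z)."""
    d = np.asarray(d, dtype=float)
    m = len(d) - 1
    # phi[m + k] holds the coefficient on z^k, k = -m, ..., m
    phi = np.zeros(2 * m + 1)
    for i in range(m + 1):          # term d_i beta^i z^{-i} from d(beta z^{-1})
        for j in range(m + 1):      # term d_j z^{j} from d(z)
            phi[m + j - i] += d[i] * beta**i * d[j]
    phi[m] += h
    # z^m phi(z) is an ordinary polynomial of degree 2m;
    # np.roots expects the coefficient on the highest power first
    roots = np.roots(phi[::-1])
    roots = roots[np.argsort(-np.abs(roots))]   # descending modulus
    z_big = roots[:m]                           # the roots with |z_j| > sqrt(beta)
    z_0 = phi[-1]                               # leading coefficient of z^m phi(z)
    c_0 = np.sqrt(complex((-1)**m * z_0 * np.prod(z_big)))
    return c_0, 1 / z_big, phi

# Assumed illustrative parameters
d, h, beta = [1.0, -2.0], 1.0, 0.9
c_0, lam, phi = factor_characteristic(d, h, beta)

def c(z):
    return c_0 * np.prod(1 - lam * z)

z = 0.7 + 0.2j                                  # arbitrary test point
m = len(d) - 1
phi_at_z = sum(phi[m + k] * z**k for k in range(-m, m + 1))
print(np.isclose(c(beta / z) * c(z), phi_at_z))
print(np.abs(lam), 1 / np.sqrt(beta))
```

Sorting the roots by modulus and keeping the first $m$ relies on the roots being distinct and arriving in $\beta$-reciprocal pairs, as assumed in the text.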
Using (32.16), we now write the Euler equation as
𝑐(𝛽𝐿−1 ) 𝑐 (𝐿) 𝑦𝑡 = 𝑎𝑡
The unique solution of the Euler equation that satisfies condition (32.12) is
$$
c(L) y_t = c(\beta L^{-1})^{-1} a_t \tag{32.18}
$$
This can be established by using an argument paralleling that in chapter IX of [Sargent, 1987].
To exhibit the solution in a form paralleling that of [Sargent, 1987], we use (32.17) to write (32.18) as
$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L) y_t = \frac{c_0^{-2} a_t}{(1 - \beta \lambda_1 L^{-1}) \cdots (1 - \beta \lambda_m L^{-1})} \tag{32.19}
$$
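Using partial fractions, we can write the characteristic polynomial on the right side of (32.19) as
$$
\sum_{j=1}^{m} \frac{A_j}{1 - \lambda_j \beta L^{-1}} \quad \text{where} \quad A_j := \frac{c_0^{-2}}{\prod_{i \neq j} \left(1 - \frac{\lambda_i}{\lambda_j}\right)}
$$
Then (32.19) can be written
$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L) y_t = \sum_{j=1}^{m} \frac{A_j}{1 - \lambda_j \beta L^{-1}} a_t
$$
or
$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L) y_t = \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} (\lambda_j \beta)^k a_{t+k} \tag{32.20}
$$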
Equation (32.20) expresses the optimum sequence for 𝑦𝑡 in terms of 𝑚 lagged 𝑦’s, and 𝑚 weighted infinite geometric
sums of future 𝑎𝑡 ’s.
Furthermore, (32.20) is the unique solution of the Euler equation that satisfies the initial conditions and condition (32.12).
In effect, condition (32.12) compels us to solve the “unstable” roots of ℎ + 𝑑(𝛽𝑧 −1 )𝑑(𝑧) forward (see [Sargent, 1987]).
The step of factoring the polynomial $h + d(\beta z^{-1}) d(z)$ into $c(\beta z^{-1}) c(z)$, where the zeros of $c(z)$ all have modulus
exceeding $\sqrt{\beta}$, is central to solving the problem.
We note two features of the solution (32.20):
• Since $|\lambda_j| < 1/\sqrt{\beta}$ for all $j$, it follows that $|\lambda_j \beta| < \sqrt{\beta}$.
• The assumption that $\{a_t\}$ is of exponential order less than $1/\sqrt{\beta}$ is sufficient to guarantee that the geometric sums
of future $a_t$'s on the right side of (32.20) converge.
We immediately see that those sums will converge under the weaker condition that {𝑎𝑡 } is of exponential order less than
𝜙−1 where 𝜙 = max {𝛽𝜆𝑖 , 𝑖 = 1, … , 𝑚}.
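To see the feedback-feedforward solution (32.20) at work, here is a small sketch for the case $m = 1$. All parameter values, the forcing sequence $a_t = \rho^t$, and the initial condition are assumptions made for illustration. With this forcing sequence the forward-looking geometric sum has a closed form, so the path for $y_t$ can be generated by a one-line recursion and then substituted back into the Euler equation $[h + d(\beta L^{-1}) d(L)] y_t = a_t$ to confirm that the residuals vanish.

```python
import numpy as np

# Assumed illustrative parameters (not taken from the lecture)
d0, d1, h, beta, rho = 1.0, -2.0, 1.0, 0.9, 0.8

# h + d(beta L^{-1}) d(L) = phi_m1 * L^{-1} + phi_0 + phi_p1 * L
phi_m1 = beta * d0 * d1
phi_0 = h + d0**2 + beta * d1**2
phi_p1 = d0 * d1

# z * phi(z) is a quadratic; keep the root whose modulus exceeds sqrt(beta)
roots = np.roots([phi_p1, phi_0, phi_m1])
z1 = roots[np.argmax(np.abs(roots))]
lam = 1 / z1
c0_sq = -phi_p1 * z1          # c_0^2 = (-1)^m z_0 z_1 with m = 1 and z_0 = phi_p1
A = 1 / c0_sq                 # A_1 = c_0^{-2} when m = 1

# With a_t = rho^t the feedforward sum has the closed form
# sum_k (lam * beta)^k a_{t+k} = rho^t / (1 - lam * beta * rho)
T = 30
a = rho ** np.arange(T)
y = np.zeros(T)
y_prev = 0.5                  # assumed initial condition y_{-1}
for t in range(T):
    y[t] = lam * y_prev + A * a[t] / (1 - lam * beta * rho)
    y_prev = y[t]

# Euler equation residuals at interior dates t = 1, ..., T-2
resid = phi_0 * y[1:-1] + phi_p1 * y[:-2] + phi_m1 * y[2:] - a[1:-1]
print(np.max(np.abs(resid)))
```

The stable root is applied to lagged $y$ (feedback) and the unstable root is solved forward on future $a$ (feedforward), which is the pattern the text describes.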
Note that with 𝑎𝑡 identically zero, (32.20) implies that in general |𝑦𝑡 | eventually grows exponentially at a rate given by
max𝑖 |𝜆𝑖 |.
The condition $\max_i |\lambda_i| < 1/\sqrt{\beta}$ guarantees that condition (32.12) is satisfied.
In fact, $\max_i |\lambda_i| < 1/\sqrt{\beta}$ is a necessary condition for (32.12) to hold.
Were (32.12) not satisfied, the objective function would diverge to −∞, implying that the 𝑦𝑡 path could not be optimal.
For example, with $a_t = 0$ for all $t$, it is easy to describe a naive (nonoptimal) policy for $\{y_t, t \geq 0\}$ that gives a finite
value of (32.1).
We can simply let 𝑦𝑡 = 0 for 𝑡 ≥ 0.
This policy involves at most 𝑚 nonzero values of ℎ𝑦𝑡2 and [𝑑(𝐿)𝑦𝑡 ]2 , and so yields a finite value of (32.1).
Therefore it is easy to dominate a path that violates (32.12).
It is worthwhile focusing on a special case of the LQ problems above: the undiscounted problem that emerges when
𝛽 = 1.
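In this case, the Euler equation is
$$
\left(h + d(L^{-1}) d(L)\right) y_t = a_t
$$
The factorization of the characteristic polynomial becomes
$$
h + d(z^{-1}) d(z) = c(z^{-1})\, c(z)
$$
where
$$
\begin{aligned}
c(z) &= c_0 (1 - \lambda_1 z) \cdots (1 - \lambda_m z) \\
c_0 &= \left[(-1)^m z_0 z_1 \cdots z_m\right]^{1/2} \\
|\lambda_j| &< 1 \quad \text{for } j = 1, \ldots, m \\
\lambda_j &= \frac{1}{z_j} \quad \text{for } j = 1, \ldots, m \\
z_0 &= \text{constant}
\end{aligned}
$$
The solution of the problem becomes
$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L) y_t = \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} \lambda_j^k a_{t+k}
$$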
Discounted problems can always be converted into undiscounted problems via a simple transformation.
Consider problem (32.1) with 0 < 𝛽 < 1.
Define the transformed variables
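$$
\tilde a_t = \beta^{t/2} a_t, \quad \tilde y_t = \beta^{t/2} y_t \tag{32.21}
$$
Then notice that $\beta^t [d(L) y_t]^2 = [\tilde d(L) \tilde y_t]^2$ with $\tilde d(L) = \sum_{j=0}^{m} \tilde d_j L^j$ and $\tilde d_j = \beta^{j/2} d_j$.
Then the original criterion function (32.1) is equivalent to
$$
\lim_{N \to \infty} \sum_{t=0}^{N} \left\{ \tilde a_t \tilde y_t - \frac{1}{2} h \tilde y_t^2 - \frac{1}{2} [\tilde d(L) \tilde y_t]^2 \right\} \tag{32.22}
$$
which is to be maximized over sequences $\{\tilde y_t, t = 0, \ldots\}$ subject to $\tilde y_{-1}, \cdots, \tilde y_{-m}$ given and $\{\tilde a_t, t = 1, \ldots\}$ a known
bounded sequence.
The Euler equation for this problem is $[h + \tilde d(L^{-1}) \tilde d(L)] \tilde y_t = \tilde a_t$.
The solution is
$$
(1 - \tilde\lambda_1 L) \cdots (1 - \tilde\lambda_m L) \tilde y_t = \sum_{j=1}^{m} \tilde A_j \sum_{k=0}^{\infty} \tilde\lambda_j^k \tilde a_{t+k}
$$
or
$$
\tilde y_t = \tilde f_1 \tilde y_{t-1} + \cdots + \tilde f_m \tilde y_{t-m} + \sum_{j=1}^{m} \tilde A_j \sum_{k=0}^{\infty} \tilde\lambda_j^k \tilde a_{t+k}, \tag{32.23}
$$
where
$$
f_j = \tilde f_j \beta^{-j/2}, \quad A_j = \tilde A_j, \quad \lambda_j = \tilde\lambda_j \beta^{-1/2} \tag{32.24}
$$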
The transformations (32.21) and the inverse formulas (32.24) allow us to solve a discounted problem by first solving a
related undiscounted problem.
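The identity behind the transformation, $\beta^t [d(L) y_t]^2 = [\tilde d(L) \tilde y_t]^2$, is easy to confirm numerically. The sketch below uses an arbitrary path for $y_t$ and made-up coefficients; the helper `apply_lag_poly` is a hypothetical name introduced only for this check.

```python
import numpy as np

def apply_lag_poly(coeffs, series, t):
    """Evaluate sum_j coeffs[j] * series[t - j], i.e. coeffs(L) applied at date t."""
    return sum(c * series[t - j] for j, c in enumerate(coeffs))

# Assumed illustrative values
beta = 0.95
d = np.array([1.0, -0.5, 0.25])          # d_0, d_1, d_2, so m = 2
m = len(d) - 1

rng = np.random.default_rng(0)
T = 20
y = rng.standard_normal(T)               # arbitrary path; array index plays the role of t

# Transformed variables from (32.21) and the transformed coefficients
dates = np.arange(T)
y_tilde = beta ** (dates / 2) * y
d_tilde = beta ** (np.arange(m + 1) / 2) * d

lhs = np.array([beta**t * apply_lag_poly(d, y, t) ** 2 for t in range(m, T)])
rhs = np.array([apply_lag_poly(d_tilde, y_tilde, t) ** 2 for t in range(m, T)])
print(np.allclose(lhs, rhs))
```

The check works because $\tilde d_j \tilde y_{t-j} = \beta^{j/2} d_j \, \beta^{(t-j)/2} y_{t-j} = \beta^{t/2} d_j y_{t-j}$ for every $j$.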
32.6 Implementation
Here’s the code that computes solutions to the LQ problem using the methods described above.
import numpy as np
import scipy.stats as spst
import scipy.linalg as la
class LQFilter:
Parameters
----------
d : list or numpy.array (1-D or a 2-D column vector)
The order of the coefficients: [d_0, d_1, ..., d_m]
h : scalar
Parameter of the objective function (corresponding to the
quadratic term)
y_m : list or numpy.array (1-D or a 2-D column vector)
Initial conditions for y
r : list or numpy.array (1-D or a 2-D column vector)
The order of the coefficients: [r_0, r_1, ..., r_k]
(optional, if not defined -> deterministic problem)
β : scalar
Discount factor (optional, default value is one)
"""
self.h = h
self.d = np.asarray(d)
self.m = self.d.shape[0] - 1
self.y_m = np.asarray(y_m)
if self.m == self.y_m.shape[0]:
self.y_m = self.y_m.reshape(self.m, 1)
else:
raise ValueError(f"y_m must be of length m = {self.m:d}")
#---------------------------------------------
# Define the coefficients of ϕ upfront
#---------------------------------------------
ϕ = np.zeros(2 * self.m + 1)
for i in range(- self.m, self.m + 1):
ϕ[self.m - i] = np.sum(np.diag(self.d.reshape(self.m + 1, 1) \
@ self.d.reshape(1, self.m + 1),
k=-i
)
)
ϕ[self.m] = ϕ[self.m] + self.h
self.ϕ = ϕ
#-----------------------------------------------------
# If r is given calculate the vector ϕ_r
#-----------------------------------------------------
if r is None:
#-----------------------------------------------------
# If β is given, define the transformed variables
#-----------------------------------------------------
if β is None:
self.β = 1
else:
self.β = β
self.d = self.β**(np.arange(self.m + 1)/2) * self.d
self.y_m = self.y_m * (self.β**(- np.arange(1, self.m + 1)/2)) \
.reshape(self.m, 1)
m = self.m
d = self.d
W = np.zeros((N + 1, N + 1))
W_m = np.zeros((N + 1, m))
#---------------------------------------
# Terminal conditions
#---------------------------------------
for j in range(m):
for i in range(j + 1, m + 1):
M[i, j] = D_m1[i - j - 1, m]
#----------------------------------------------
# Euler equations for t = 0, 1, ..., N-(m+1)
#----------------------------------------------
ϕ = self.ϕ
for i in range(m):
W_m[N - i, :(m - i)] = ϕ[(m + 1 + i):]
return W, W_m
def roots_of_characteristic(self):
"""
This function calculates z_0 and the 2m roots of the characteristic
equation associated with the Euler equation (1.7)
Note:
------
numpy.poly1d(roots, True) defines a polynomial using its roots that can
be evaluated at any point. If x_1, x_2, ... , x_m are the roots then
p(x) = (x - x_1)(x - x_2)...(x - x_m)
"""
m = self.m
ϕ = self.ϕ
λ = 1 / z_1_to_m
def coeffs_of_c(self):
'''
This function computes the coefficients {c_j, j = 0, 1, ..., m} for
c(z) = sum_{j = 0}^{m} c_j z^j
return c_coeffs[::-1]
def solution(self):
"""
This function calculates {λ_j, j=1,...,m} and {A_j, j=1,...,m}
of the expression (1.15)
"""
λ = self.roots_of_characteristic()[2]
c_0 = self.coeffs_of_c()[-1]
A = np.zeros(self.m, dtype=complex)
for j in range(self.m):
denom = 1 - λ/λ[j]
A[j] = c_0**(-2) / np.prod(denom[np.arange(self.m) != j])
return λ, A
for i in range(N):
for j in range(N):
if abs(i-j) <= self.k:
V[i, j] = ϕ_r[self.k + abs(i-j)]
return V
return d.rvs()
N = np.asarray(a_hist).shape[0] - 1
a_hist = np.asarray(a_hist).reshape(N + 1, 1)
V = self.construct_V(N + 1)
return Ea_hist
Note:
------
scipy.linalg.lu normalizes L, U so that L has unit diagonal elements
To make things consistent with the lecture, we need an auxiliary
diagonal matrix D which renormalizes L and U
"""
N = np.asarray(a_hist).shape[0] - 1
W, W_m = self.construct_W_and_Wm(N)
L, U = la.lu(W, permute_l=True)
D = np.diag(1 / np.diag(U))
U = D @ U
L = L @ np.diag(1 / np.diag(D))
J = np.fliplr(np.eye(N + 1))
a_hist = J @ np.asarray(a_hist).reshape(N + 1, 1)
#--------------------------------------------
# Transform the 'a' sequence if β is given
#--------------------------------------------
if self.β != 1:
a_hist = a_hist * (self.β**(np.arange(N + 1) / 2))[::-1] \
.reshape(N + 1, 1)
#--------------------------------------------
# Transform the optimal sequence back if β is given
#--------------------------------------------
if self.β != 1:
y_hist = y_hist * (self.β**(- np.arange(-self.m, N + 1)/2)) \
.reshape(N + 1 + self.m, 1)
32.6.1 Example
d = γ * np.asarray([1, -1])
y_m = np.asarray(y_m).reshape(m, 1)
plot_simulation()
plot_simulation(γ=5)
And here’s 𝛾 = 10
plot_simulation(γ=10)
32.7 Exercises
Exercise 32.7.1
Consider solving a discounted version (𝛽 < 1) of problem (32.1), as follows.
Convert (32.1) to the undiscounted problem (32.22).
Let the solution of (32.22) in feedback form be
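$$
(1 - \tilde\lambda_1 L) \cdots (1 - \tilde\lambda_m L) \tilde y_t = \sum_{j=1}^{m} \tilde A_j \sum_{k=0}^{\infty} \tilde\lambda_j^k \tilde a_{t+k}
$$
or
$$
\tilde y_t = \tilde f_1 \tilde y_{t-1} + \cdots + \tilde f_m \tilde y_{t-m} + \sum_{j=1}^{m} \tilde A_j \sum_{k=0}^{\infty} \tilde\lambda_j^k \tilde a_{t+k} \tag{32.25}
$$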
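Here
• $h + \tilde d(z^{-1}) \tilde d(z) = \tilde c(z^{-1}) \tilde c(z)$
• $\tilde c(z) = [(-1)^m \tilde z_0 \tilde z_1 \cdots \tilde z_m]^{1/2} (1 - \tilde\lambda_1 z) \cdots (1 - \tilde\lambda_m z)$
where the $\tilde z_j$ are the zeros of $h + \tilde d(z^{-1}) \tilde d(z)$.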
Prove that (32.25) implies that the solution for $y_t$ in feedback form is
$$
y_t = f_1 y_{t-1} + \ldots + f_m y_{t-m} + \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} \beta^k \lambda_j^k a_{t+k}
$$
Exercise 32.7.2
Solve the optimal control problem, maximize
$$
\sum_{t=0}^{2} \left\{ a_t y_t - \frac{1}{2} [(1 - 2L) y_t]^2 \right\}
$$
Note: This problem differs from the problem in the text in one important way: instead of ℎ > 0 in (32.1), ℎ = 0. This
has an important influence on the solution.
Exercise 32.7.3
Solve the infinite time-optimal control problem to maximize
$$
\lim_{N \to \infty} \sum_{t=0}^{N} -\frac{1}{2} [(1 - 2L) y_t]^2,
$$
Exercise 32.7.4
Solve the infinite time problem, to maximize
$$
\lim_{N \to \infty} \sum_{t=0}^{N} (.0000001)\, y_t^2 - \frac{1}{2} [(1 - 2L) y_t]^2
$$
subject to 𝑦−1 given. Prove that the solution 𝑦𝑡 = 2𝑦𝑡−1 violates condition (32.12), and so is not optimal.
Prove that the optimal solution is approximately 𝑦𝑡 = .5𝑦𝑡−1 .
33 Classical Prediction and Filtering With Linear Algebra
33.1 Overview
This is a sequel to the earlier lecture Classical Control with Linear Algebra.
That lecture used linear algebra – in particular, the LU decomposition – to formulate and solve a class of linear-quadratic
optimal control problems.
In this lecture, we’ll be using a closely related decomposition, the Cholesky decomposition, to solve linear prediction and
filtering problems.
We exploit the useful fact that there is an intimate connection between two superficially different classes of problems:
• deterministic linear-quadratic (LQ) optimal control problems
• linear least squares prediction and filtering problems
The first class of problems involves no randomness, while the second is all about randomness.
Nevertheless, essentially the same mathematics solves both types of problem.
This connection, which is often termed “duality,” is present whether one uses “classical” or “recursive” solution procedures.
In fact, we saw duality at work earlier when we formulated control and prediction problems recursively in lectures LQ
dynamic programming problems, A first look at the Kalman filter, and The permanent income model.
A useful consequence of duality is that
• With every LQ control problem, there is implicitly affiliated a linear least squares prediction or filtering problem.
• With every linear least squares prediction or filtering problem there is implicitly affiliated a LQ control problem.
An understanding of these connections has repeatedly proved useful in cracking interesting applied problems.
For example, Sargent [Sargent, 1987] [chs. IX, XIV] and Hansen and Sargent [Hansen and Sargent, 1980] formulated
and solved control and filtering problems using 𝑧-transform methods.
In this lecture, we begin to investigate these ideas by using mostly elementary linear algebra.
This is the main purpose and focus of the lecture.
However, after showing matrix algebra formulas, we’ll summarize classic infinite-horizon formulas built on 𝑧-transform
and lag operator methods.
And we’ll occasionally refer to some of these formulas from the infinite dimensional problems as we present the finite
time formulas and associated linear algebra.
We’ll start with the following standard import:
Advanced Quantitative Economics with Python
import numpy as np
33.1.1 References
Useful references include [Whittle, 1963], [Hansen and Sargent, 1980], [Orfanidis, 1988], [Athanasios and Pillai, 1991],
and [Muth, 1960].
Let (𝑥1 , 𝑥2 , … , 𝑥𝑇 )′ = 𝑥 be a 𝑇 × 1 vector of random variables with mean 𝔼𝑥 = 0 and covariance matrix 𝔼𝑥𝑥′ = 𝑉 .
Here 𝑉 is a 𝑇 × 𝑇 positive definite matrix.
The 𝑖, 𝑗 component 𝐸𝑥𝑖 𝑥𝑗 of 𝑉 is the inner product between 𝑥𝑖 and 𝑥𝑗 .
We regard the random variables as being ordered in time so that 𝑥𝑡 is thought of as the value of some economic variable
at time 𝑡.
For example, 𝑥𝑡 could be generated by the random process described by the Wold representation presented in equation
(33.16) in the section below on infinite dimensional prediction and filtering.
In that case, $V_{ij}$ is given by the coefficient on $z^{|i-j|}$ in the expansion of $g_x(z) = d(z)\, d(z^{-1}) + h$, which equals
$h + \sum_{k=0}^{\infty} d_k d_{k+|i-j|}$.
We want to construct 𝑗 step ahead linear least squares predictors of the form
𝑉 = 𝐿−1 (𝐿−1 )′
and
𝐿 𝑉 𝐿′ = 𝐼
604 Chapter 33. Classical Prediction and Filtering With Linear Algebra
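$$
\begin{aligned}
L_{11} x_1 &= \varepsilon_1 \\
L_{21} x_1 + L_{22} x_2 &= \varepsilon_2 \\
&\ \, \vdots \\
L_{T1} x_1 + \ldots + L_{TT} x_T &= \varepsilon_T
\end{aligned} \tag{33.1}
$$
or
$$
\sum_{j=0}^{t-1} L_{t,t-j}\, x_{t-j} = \varepsilon_t, \quad t = 1, 2, \ldots, T \tag{33.2}
$$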
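$$
\begin{aligned}
x_1 &= L^{-1}_{11} \varepsilon_1 \\
x_2 &= L^{-1}_{22} \varepsilon_2 + L^{-1}_{21} \varepsilon_1 \\
&\ \, \vdots \\
x_T &= L^{-1}_{TT} \varepsilon_T + L^{-1}_{T,T-1} \varepsilon_{T-1} + \ldots + L^{-1}_{T,1} \varepsilon_1
\end{aligned} \tag{33.3}
$$
or
$$
x_t = \sum_{j=0}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} \tag{33.4}
$$
where $L^{-1}_{i,j}$ denotes the $i, j$ element of $L^{-1}$.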
To proceed, it is useful to drill down and note that for $t - 1 \geq m \geq 1$ we can rewrite (33.4) in the form of the moving
average representation
$$
x_t = \sum_{j=0}^{m-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} + \sum_{j=m}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} \tag{33.6}
$$
Representation (33.6) is an orthogonal decomposition of $x_t$ into a part $\sum_{j=m}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j}$ that lies in the space spanned
by $[x_{t-m}, x_{t-m-1}, \ldots, x_1]$ and an orthogonal component $\sum_{j=0}^{m-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j}$ that does not lie in that space but instead in
a linear space known as its orthogonal complement.
It follows that
$$
\hat{\mathbb{E}}[x_t \mid x_{t-m}, x_{t-m-1}, \ldots, x_1] = \sum_{j=m}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j}
$$
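The projection formula can be illustrated with a few lines of NumPy. This is a sketch under stated assumptions: the covariance matrix is the one implied by the process $x_t = (1 - 2L)\varepsilon_t$ with unit-variance shocks, and the variable names are chosen here for illustration. The code checks that $L V L' = I$ and that the weights obtained by zeroing the most recent $m$ innovations in the moving average row match the usual least squares projection weights computed directly from $V$.

```python
import numpy as np

T, m = 5, 1
# Covariance matrix of x_t = (1 - 2L) eps_t with unit-variance eps_t (assumed example)
V = 5 * np.eye(T) - 2 * np.eye(T, k=1) - 2 * np.eye(T, k=-1)

L_inv = np.linalg.cholesky(V)      # lower triangular, V = L^{-1} (L^{-1})'
L = np.linalg.inv(L_inv)           # L x = eps

print(np.allclose(L @ V @ L.T, np.eye(T)))

# Project the last element x_T on x_1, ..., x_{T-m}
t = T - 1                          # zero-based index of x_T
s = t - m                          # zero-based index of the last conditioning observation
row = np.zeros(T)
row[:s + 1] = L_inv[t, :s + 1]     # keep only the loadings on eps_1, ..., eps_{T-m}
weights = row @ L                  # implied weights on x_1, ..., x_T

# Direct least squares projection weights for comparison
direct = V[t, :s + 1] @ np.linalg.inv(V[:s + 1, :s + 1])
print(np.allclose(weights[:s + 1], direct))
print(np.allclose(weights[s + 1:], 0))
```

Because $L$ is lower triangular, the weights on observations dated after $T - m$ are exactly zero, so the predictor uses only the conditioning information.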
33.2.1 Implementation
Here’s the code that computes solutions to LQ control and filtering problems using the methods described here and in
Classical Control with Linear Algebra.
import numpy as np
import scipy.stats as spst
import scipy.linalg as la
class LQFilter:
Parameters
----------
d : list or numpy.array (1-D or a 2-D column vector)
The order of the coefficients: [d_0, d_1, ..., d_m]
h : scalar
Parameter of the objective function (corresponding to the
quadratic term)
y_m : list or numpy.array (1-D or a 2-D column vector)
Initial conditions for y
r : list or numpy.array (1-D or a 2-D column vector)
The order of the coefficients: [r_0, r_1, ..., r_k]
(optional, if not defined -> deterministic problem)
β : scalar
Discount factor (optional, default value is one)
"""
self.h = h
self.d = np.asarray(d)
self.m = self.d.shape[0] - 1
self.y_m = np.asarray(y_m)
if self.m == self.y_m.shape[0]:
self.y_m = self.y_m.reshape(self.m, 1)
else:
raise ValueError(f"y_m must be of length m = {self.m:d}")
#---------------------------------------------
# Define the coefficients of ϕ upfront
#---------------------------------------------
ϕ = np.zeros(2 * self.m + 1)
for i in range(- self.m, self.m + 1):
ϕ[self.m - i] = np.sum(np.diag(self.d.reshape(self.m + 1, 1) \
@ self.d.reshape(1, self.m + 1),
k=-i
)
)
ϕ[self.m] = ϕ[self.m] + self.h
self.ϕ = ϕ
#-----------------------------------------------------
# If r is given calculate the vector ϕ_r
#-----------------------------------------------------
#-----------------------------------------------------
# If β is given, define the transformed variables
#-----------------------------------------------------
if β is None:
self.β = 1
else:
self.β = β
self.d = self.β**(np.arange(self.m + 1)/2) * self.d
self.y_m = self.y_m * (self.β**(- np.arange(1, self.m + 1)/2)) \
.reshape(self.m, 1)
m = self.m
d = self.d
W = np.zeros((N + 1, N + 1))
W_m = np.zeros((N + 1, m))
#---------------------------------------
# Terminal conditions
#---------------------------------------
for j in range(m):
for i in range(j + 1, m + 1):
M[i, j] = D_m1[i - j - 1, m]
#----------------------------------------------
# Euler equations for t = 0, 1, ..., N-(m+1)
#----------------------------------------------
ϕ = self.ϕ
for i in range(m):
W_m[N - i, :(m - i)] = ϕ[(m + 1 + i):]
return W, W_m
def roots_of_characteristic(self):
"""
This function calculates z_0 and the 2m roots of the characteristic
equation associated with the Euler equation (1.7)
Note:
------
numpy.poly1d(roots, True) defines a polynomial using its roots that can
be evaluated at any point. If x_1, x_2, ... , x_m are the roots then
p(x) = (x - x_1)(x - x_2)...(x - x_m)
"""
m = self.m
ϕ = self.ϕ
λ = 1 / z_1_to_m
def coeffs_of_c(self):
'''
This function computes the coefficients {c_j, j = 0, 1, ..., m} for
c(z) = sum_{j = 0}^{m} c_j z^j
return c_coeffs[::-1]
def solution(self):
"""
This function calculates {λ_j, j=1,...,m} and {A_j, j=1,...,m}
of the expression (1.15)
"""
λ = self.roots_of_characteristic()[2]
c_0 = self.coeffs_of_c()[-1]
A = np.zeros(self.m, dtype=complex)
for j in range(self.m):
denom = 1 - λ/λ[j]
A[j] = c_0**(-2) / np.prod(denom[np.arange(self.m) != j])
return λ, A
for i in range(N):
for j in range(N):
if abs(i-j) <= self.k:
V[i, j] = ϕ_r[self.k + abs(i-j)]
return V
return d.rvs()
N = np.asarray(a_hist).shape[0] - 1
a_hist = np.asarray(a_hist).reshape(N + 1, 1)
V = self.construct_V(N + 1)
return Ea_hist
Note:
------
scipy.linalg.lu normalizes L, U so that L has unit diagonal elements
To make things consistent with the lecture, we need an auxiliary
diagonal matrix D which renormalizes L and U
"""
N = np.asarray(a_hist).shape[0] - 1
W, W_m = self.construct_W_and_Wm(N)
L, U = la.lu(W, permute_l=True)
D = np.diag(1 / np.diag(U))
U = D @ U
L = L @ np.diag(1 / np.diag(D))
J = np.fliplr(np.eye(N + 1))
a_hist = J @ np.asarray(a_hist).reshape(N + 1, 1)
#--------------------------------------------
# Transform the 'a' sequence if β is given
#--------------------------------------------
if self.β != 1:
a_hist = a_hist * (self.β**(np.arange(N + 1) / 2))[::-1] \
.reshape(N + 1, 1)
#--------------------------------------------
# Transform the optimal sequence back if β is given
#--------------------------------------------
if self.β != 1:
y_hist = y_hist * (self.β**(- np.arange(-self.m, N + 1)/2)) \
.reshape(N + 1 + self.m, 1)
33.2.2 Example 1
𝑥𝑡 = (1 − 2𝐿)𝜀𝑡
where 𝜀𝑡 is a serially uncorrelated random process with mean zero and variance unity.
If we were to use the tools associated with infinite dimensional prediction and filtering to be described below, we would
use the Wiener-Kolmogorov formula (33.21) to compute the linear least squares forecasts 𝔼[𝑥𝑡+𝑗 ∣ 𝑥𝑡 , 𝑥𝑡−1 , …], for
𝑗 = 1, 2.
But we can do everything we want by instead using our finite dimensional tools and setting 𝑑 = 𝑟, generating an instance
of LQFilter, then invoking pertinent methods of LQFilter.
m = 1
y_m = np.asarray([.0]).reshape(m, 1)
d = np.asarray([1, -2])
r = np.asarray([1, -2])
h = 0.0
example = LQFilter(d, h, y_m, r=d)
example.coeffs_of_c()
example.roots_of_characteristic()
Now let’s form the covariance matrix of a time series vector of length 𝑁 and put it in 𝑉 .
Then we’ll take a Cholesky decomposition of $V = L^{-1} (L^{-1})'$ and use it to form the vector of “moving average
representations” $x = L^{-1} \varepsilon$ and the vector of “autoregressive representations” $L x = \varepsilon$.
V = example.construct_V(N=5)
print(V)
[[ 5. -2. 0. 0. 0.]
[-2. 5. -2. 0. 0.]
[ 0. -2. 5. -2. 0.]
[ 0. 0. -2. 5. -2.]
[ 0. 0. 0. -2. 5.]]
Notice how the lower rows of the “moving average representations” are converging to the appropriate infinite-history
Wold representation to be described below when we study infinite-horizon prediction and filtering.
Li = np.linalg.cholesky(V)
print(Li)
[[ 2.23606798 0. 0. 0. 0. ]
[-0.89442719 2.04939015 0. 0. 0. ]
[ 0. -0.97590007 2.01186954 0. 0. ]
[ 0. 0. -0.99410024 2.00293902 0. ]
[ 0. 0. 0. -0.99853265 2.000733 ]]
Notice how the lower rows of the “autoregressive representations” are converging to the appropriate infinite-history
autoregressive representation to be described below when we study infinite-horizon prediction and filtering.
L = np.linalg.inv(Li)
print(L)
[[0.4472136 0. 0. 0. 0. ]
[0.19518001 0.48795004 0. 0. 0. ]
[0.09467621 0.23669053 0.49705012 0. 0. ]
[0.04698977 0.11747443 0.2466963 0.49926632 0. ]
[0.02345182 0.05862954 0.12312203 0.24917554 0.49981682]]
33.2.3 Example 2
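Consider the stochastic process $\{X_t\}$ with moving average representation
$$
X_t = (1 - \sqrt{2} L^2) \varepsilon_t
$$
where $\varepsilon_t$ is a serially uncorrelated random process with mean zero and variance unity.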
Let’s find a Wold moving average representation for 𝑥𝑡 that will prevail in the infinite-history context to be studied in
detail below.
To do this, we’ll use the Wiener-Kolmogorov formula (33.21) presented below to compute the linear least squares
forecasts $\hat{\mathbb{E}}[X_{t+j} \mid X_{t-1}, \ldots]$ for $j = 1, 2, 3$.
We proceed in the same way as in Example 1.
m = 2
y_m = np.asarray([.0, .0]).reshape(m, 1)
d = np.asarray([1, 0, -np.sqrt(2)])
r = np.asarray([1, 0, -np.sqrt(2)])
h = 0.0
example = LQFilter(d, h, y_m, r=d)
example.coeffs_of_c()
example.roots_of_characteristic()
V = example.construct_V(N=8)
print(V)
[[ 3. 0. -1.41421356 0. 0. 0.
0. 0. ]
[ 0. 3. 0. -1.41421356 0. 0.
0. 0. ]
[-1.41421356 0. 3. 0. -1.41421356 0.
0. 0. ]
[ 0. -1.41421356 0. 3. 0. -1.41421356
0. 0. ]
[ 0. 0. -1.41421356 0. 3. 0.
-1.41421356 0. ]
[ 0. 0. 0. -1.41421356 0. 3.
0. -1.41421356]
[ 0. 0. 0. 0. -1.41421356 0.
3. 0. ]
[ 0. 0. 0. 0. 0. -1.41421356
0. 3. ]]
Li = np.linalg.cholesky(V)
print(Li[-3:, :])
[[ 0. 0. 0. -0.9258201 0. 1.46385011
0. 0. ]
[ 0. 0. 0. 0. -0.96609178 0.
1.43759058 0. ]
[ 0. 0. 0. 0. 0. -0.96609178
0. 1.43759058]]
L = np.linalg.inv(Li)
print(L)
[[0.57735027 0. 0. 0. 0. 0.
0. 0. ]
[0. 0.57735027 0. 0. 0. 0.
0. 0. ]
[0.3086067 0. 0.65465367 0. 0. 0.
0. 0. ]
[0. 0.3086067 0. 0.65465367 0. 0.
0. 0. ]
[0.19518001 0. 0.41403934 0. 0.68313005 0.
0. 0. ]
[0. 0.19518001 0. 0.41403934 0. 0.68313005
0. 0. ]
[0.13116517 0. 0.27824334 0. 0.45907809 0.
0.69560834 0. ]
[0. 0.13116517 0. 0.27824334 0. 0.45907809
0. 0.69560834]]
33.2.4 Prediction
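It immediately follows from the “orthogonality principle” of least squares (see [Athanasios and Pillai, 1991] or [Sargent,
1987] [ch. X]) that
$$
\begin{aligned}
\hat{\mathbb{E}}[x_t \mid x_{t-m}, x_{t-m-1}, \ldots, x_1] &= \sum_{j=m}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} \\
&= [L^{-1}_{t,1}\ \ L^{-1}_{t,2}\ \ \ldots\ \ L^{-1}_{t,t-m}\ \ 0\ \ 0\ \ \ldots\ \ 0]\, L x
\end{aligned} \tag{33.7}
$$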
This can be interpreted as a finite-dimensional version of the Wiener-Kolmogorov 𝑚-step ahead prediction formula.
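We can use (33.7) to represent the linear least squares projection of the vector $x$ conditioned on the first $s$ observations
$[x_s, x_{s-1}, \ldots, x_1]$.
We have
$$
\hat{\mathbb{E}}[x \mid x_s, x_{s-1}, \ldots, x_1] = L^{-1} \begin{bmatrix} I_s & 0 \\ 0 & 0 \end{bmatrix} L x
$$
This formula will be convenient in representing the solution of control problems under uncertainty.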
Equation (33.4) can be recognized as a finite dimensional version of a moving average representation.
Equation (33.2) can be viewed as a finite dimension version of an autoregressive representation.
Notice that even if the 𝑥𝑡 process is covariance stationary, so that 𝑉 is such that 𝑉𝑖𝑗 depends only on |𝑖−𝑗|, the coefficients
in the moving average representation are time-dependent, there being a different moving average for each 𝑡.
If 𝑥𝑡 is a covariance stationary process, the last row of 𝐿−1 converges to the coefficients in the Wold moving average
representation for {𝑥𝑡 } as 𝑇 → ∞.
Further, if 𝑥𝑡 is covariance stationary, for fixed 𝑘 and 𝑗 > 0, 𝐿−1 −1
𝑇 ,𝑇 −𝑗 converges to 𝐿𝑇 −𝑘,𝑇 −𝑘−𝑗 as 𝑇 → ∞.
That is, the “bottom” rows of 𝐿−1 converge to each other and to the Wold moving average coefficients as 𝑇 → ∞.
This last observation gives one simple and widely-used practical way of forming a finite 𝑇 approximation to a Wold
moving average representation.
First, form the covariance matrix $\mathbb{E} x x' = V$, then obtain the Cholesky decomposition $L^{-1} (L^{-1})'$ of $V$, which can be
accomplished quickly on a computer.
The last row of 𝐿−1 gives the approximate Wold moving average coefficients.
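As a quick illustration of this recipe, the sketch below builds a larger covariance matrix for the process $x_t = (1 - 2L)\varepsilon_t$ used in Example 1 and reads off the last row of the Cholesky factor. The choice $T = 50$ is an assumption made here. For this process the Wold representation has coefficients $2$ and $-1$, since flipping the zero of $1 - 2z$ from $0.5$ to $2$ gives $2 - z$.

```python
import numpy as np

T = 50   # assumed sample length; larger T gives a closer approximation
# Covariance matrix implied by x_t = (1 - 2L) eps_t with unit-variance eps_t
V = 5 * np.eye(T) - 2 * np.eye(T, k=1) - 2 * np.eye(T, k=-1)

L_inv = np.linalg.cholesky(V)    # V = L^{-1} (L^{-1})'

# Last row, read from the diagonal backwards: coefficients on eta_t, eta_{t-1}, ...
wold_approx = L_inv[-1, ::-1][:3]
print(wold_approx)
```

The first two entries should be very close to $2$ and $-1$, and the third should be zero because the process is a first-order moving average.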
This method can readily be generalized to multivariate systems.
where $d(L) = d_0 + d_1 L + \ldots + d_m L^m$, $L$ is the lag operator, and $\bar a = [a_N, a_{N-1}, \ldots, a_1, a_0]'$ is a random vector with mean
zero and $\mathbb{E}\, \bar a \bar a' = V$.
The variables 𝑦−1 , … , 𝑦−𝑚 are given.
Maximization is over choices of 𝑦0 , 𝑦1 … , 𝑦𝑁 , where 𝑦𝑡 is required to be a linear function of {𝑦𝑡−𝑠−1 , 𝑡 + 𝑚 − 1 ≥
0; 𝑎𝑡−𝑠 , 𝑡 ≥ 𝑠 ≥ 0}.
We saw in the lecture Classical Control with Linear Algebra that the solution of this problem under certainty could be
represented in the feedback-feedforward form
$$
U \bar y = L^{-1} \bar a + K \begin{bmatrix} y_{-1} \\ \vdots \\ y_{-m} \end{bmatrix}
$$
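$$
\hat{\mathbb{E}}[\bar a \mid a_s, a_{s-1}, \ldots, a_0] = \tilde U^{-1} \begin{bmatrix} 0 & 0 \\ 0 & I_{(s+1)} \end{bmatrix} \tilde U \bar a
$$
where $I_{(s+1)}$ is the $(s + 1) \times (s + 1)$ identity matrix, and $V = \tilde U^{-1} (\tilde U^{-1})'$, where $\tilde U$ is the upper triangular Cholesky
factor of the covariance matrix $V$.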
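$$
U \bar y = L^{-1} \tilde U^{-1} \begin{bmatrix} 0 & 0 \\ 0 & I_{(t+1)} \end{bmatrix} \tilde U \bar a + K \begin{bmatrix} y_{-1} \\ \vdots \\ y_{-m} \end{bmatrix}
$$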
It is instructive to compare the finite-horizon formulas based on linear algebra decompositions of finite-dimensional
covariance matrices with classic formulas for infinite horizon and infinite history prediction and control problems.
These classic infinite horizon formulas used the mathematics of 𝑧-transforms and lag operators.
We’ll meet interesting lag operator and 𝑧-transform counterparts to our finite horizon matrix formulas.
We pose two related prediction and filtering problems.
We let 𝑌𝑡 be a univariate 𝑚th order moving average, covariance stationary stochastic process,
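$$
Y_t = d(L) u_t \tag{33.9}
$$
where $d(L) = \sum_{j=0}^{m} d_j L^j$, and $u_t$ is a serially uncorrelated stationary random process satisfying
$$
\begin{aligned}
\mathbb{E} u_t &= 0 \\
\mathbb{E} u_t u_s &= \begin{cases} 1 & \text{if } t = s \\ 0 & \text{otherwise} \end{cases}
\end{aligned} \tag{33.10}
$$
$$
X_t = Y_t + \varepsilon_t \tag{33.11}
$$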
where 𝜀𝑡 is a serially uncorrelated stationary random process with 𝔼𝜀𝑡 = 0 and 𝔼𝜀𝑡 𝜀𝑠 = 0 for all distinct 𝑡 and 𝑠.
We also assume that 𝔼𝜀𝑡 𝑢𝑠 = 0 for all 𝑡 and 𝑠.
The linear least squares prediction problem is to find the 𝐿2 random variable 𝑋̂ 𝑡+𝑗 among linear combinations of
{𝑋𝑡 , 𝑋𝑡−1 , …} that minimizes 𝔼(𝑋̂ 𝑡+𝑗 − 𝑋𝑡+𝑗 )2 .
That is, the problem is to find a $\gamma_j(L) = \sum_{k=0}^{\infty} \gamma_{jk} L^k$ such that $\sum_{k=0}^{\infty} |\gamma_{jk}|^2 < \infty$ and $\mathbb{E}[\gamma_j(L) X_t - X_{t+j}]^2$ is
minimized.
The linear least squares filtering problem is to find a $b(L) = \sum_{j=0}^{\infty} b_j L^j$ such that $\sum_{j=0}^{\infty} |b_j|^2 < \infty$ and $\mathbb{E}[b(L) X_t - Y_t]^2$ is minimized.
Interesting versions of these problems related to the permanent income theory were studied by [Muth, 1960].
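$$
\begin{aligned}
C_X(\tau) &= \mathbb{E} X_t X_{t-\tau} \\
C_Y(\tau) &= \mathbb{E} Y_t Y_{t-\tau}, \qquad \tau = 0, \pm 1, \pm 2, \ldots \\
C_{Y,X}(\tau) &= \mathbb{E} Y_t X_{t-\tau}
\end{aligned} \tag{33.12}
$$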
𝑦𝑡 = 𝐴(𝐿)𝑣1𝑡 + 𝐵(𝐿)𝑣2𝑡
𝑥𝑡 = 𝐶(𝐿)𝑣1𝑡 + 𝐷(𝐿)𝑣2𝑡
Then, as shown for example in [Sargent, 1987] [ch. XI], it is true that
$$
\begin{aligned}
g_Y(z) &= d(z) d(z^{-1}) \\
g_X(z) &= d(z) d(z^{-1}) + h \\
g_{YX}(z) &= d(z) d(z^{-1})
\end{aligned} \tag{33.15}
$$
The key step in obtaining solutions to our problems is to factor the covariance generating function 𝑔𝑋 (𝑧) of 𝑋.
The solutions of our problems are given by formulas due to Wiener and Kolmogorov.
These formulas utilize the Wold moving average representation of the 𝑋𝑡 process,
𝑋𝑡 = 𝑐 (𝐿) 𝜂𝑡 (33.16)
where $c(L) = \sum_{j=0}^{m} c_j L^j$, with
$$
c_0 \eta_t = X_t - \hat{\mathbb{E}}[X_t \mid X_{t-1}, X_{t-2}, \ldots] \tag{33.17}
$$
Therefore, we have already shown constructively how to factor the covariance generating function 𝑔𝑋 (𝑧) = 𝑑(𝑧) 𝑑 (𝑧 −1 )+
ℎ.
We now introduce the annihilation operator:
$$
\left[ \sum_{j=-\infty}^{\infty} f_j L^j \right]_+ \equiv \sum_{j=0}^{\infty} f_j L^j \tag{33.20}
$$
$$
\gamma_j(L) = \left[ \frac{c(L)}{L^j} \right]_+ c(L)^{-1} \tag{33.21}
$$
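For a finite-order $c(L)$, the annihilation operator in (33.21) simply discards the first $j$ coefficients of $c$, and multiplying by $c(L)^{-1}$ amounts to a power series division. The sketch below illustrates this for the Wold polynomial $c(L) = 2 - L$ associated with Example 1; the helper names `annihilate_after_shift` and `series_divide` are assumptions introduced here, and the division presumes that $c(L)$ is invertible (zeros of $c(z)$ outside the unit circle).

```python
import numpy as np

def annihilate_after_shift(c, j):
    """Coefficients of [c(L) / L^j]_+ for finite-order c(L), ascending powers of L."""
    c = np.asarray(c, dtype=float)
    return c[j:] if j < len(c) else np.zeros(1)

def series_divide(num, den, n_terms):
    """First n_terms power series coefficients of num(L) / den(L); needs den[0] != 0."""
    num = np.concatenate((num, np.zeros(max(0, n_terms - len(num)))))
    g = np.zeros(n_terms)
    for k in range(n_terms):
        acc = num[k]
        for i in range(1, min(k, len(den) - 1) + 1):
            acc -= den[i] * g[k - i]
        g[k] = acc / den[0]
    return g

c = np.array([2.0, -1.0])                      # c(L) = 2 - L
gamma_1 = series_divide(annihilate_after_shift(c, 1), c, 6)
print(gamma_1)                                 # weights on X_t, X_{t-1}, ...
```

The resulting one-step-ahead weights decline geometrically at rate one half, which lines up (with opposite sign) with the bottom row of the autoregressive matrix printed in Example 1.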
We have defined the solution of the filtering problem as $\hat{\mathbb{E}}[Y_t \mid X_t, X_{t-1}, \ldots] = b(L) X_t$.
The Wiener-Kolmogorov formula for $b(L)$ is
$$
b(L) = \left[ \frac{g_{YX}(L)}{c(L^{-1})} \right]_+ c(L)^{-1}
$$
or
$$
b(L) = \left[ \frac{d(L) d(L^{-1})}{c(L^{-1})} \right]_+ c(L)^{-1} \tag{33.22}
$$
Formulas (33.21) and (33.22) are discussed in detail in [Whittle, 1983] and [Sargent, 1987].
The interested reader can there find several examples of the use of these formulas in economics. Some classic examples
using these formulas are due to [Muth, 1960].
As an example of the usefulness of formula (33.22), we let $X_t$ be a stochastic process with Wold moving average
representation
$$
X_t = c(L) \eta_t
$$
Suppose that at time $t$, we wish to predict a geometric sum of future $X$’s, namely
$$
y_t \equiv \sum_{j=0}^{\infty} \delta^j X_{t+j} = \frac{1}{1 - \delta L^{-1}} X_t
$$
$$
b(L) = \left[ \frac{c(L)}{1 - \delta L^{-1}} \right]_+ c(L)^{-1} \tag{33.23}
$$
In order to evaluate the term in the annihilation operator, we use the following result from [Hansen and Sargent, 1980].
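Proposition Let
• $g(z) = \sum_{j=0}^{\infty} g_j z^j$ where $\sum_{j=0}^{\infty} |g_j|^2 < +\infty$.
and, alternatively,
$$
\left[ \frac{g(z)}{h(z^{-1})} \right]_+ = \sum_{j=1}^{n} B_j \left( \frac{z g(z) - \delta_j g(\delta_j)}{z - \delta_j} \right) \tag{33.25}
$$
where $B_j = 1 / \prod_{k=1,\, k \neq j}^{n} (1 - \delta_k / \delta_j)$.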
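Applying formula (33.25) of the proposition to evaluating (33.23) with $g(z) = c(z)$ and $h(z^{-1}) = 1 - \delta z^{-1}$ gives
$$
b(L) = \left[ \frac{L c(L) - \delta c(\delta)}{L - \delta} \right] c(L)^{-1}
$$
or
$$
b(L) = \left[ \frac{1 - \delta c(\delta) L^{-1} c(L)^{-1}}{1 - \delta L^{-1}} \right]
$$
Thus, we have
$$
\hat{\mathbb{E}} \left[ \sum_{j=0}^{\infty} \delta^j X_{t+j} \,\Big|\, X_t, X_{t-1}, \ldots \right] = \left[ \frac{1 - \delta c(\delta) L^{-1} c(L)^{-1}}{1 - \delta L^{-1}} \right] X_t \tag{33.26}
$$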
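The key step above is the evaluation of the annihilated term $[c(L)/(1 - \delta L^{-1})]_+$ as $(L c(L) - \delta c(\delta))/(L - \delta)$. A minimal numerical check is sketched below for a finite-order $c(L)$; the coefficients, the value of $\delta$, and the function names are assumptions made for illustration. The first function expands $1/(1 - \delta L^{-1})$ as a series in $L^{-1}$ and keeps nonnegative powers of $L$, while the second carries out the polynomial division in the closed-form expression.

```python
import numpy as np

def annihilate_direct(c, delta):
    """Coefficients (ascending powers of L) of [c(L) / (1 - delta L^{-1})]_+ ."""
    c = np.asarray(c, dtype=float)
    n = len(c)
    # Using 1/(1 - delta L^{-1}) = sum_i delta^i L^{-i}, the coefficient
    # on L^k for k >= 0 is sum_i delta^i c_{k+i}
    return np.array([sum(delta**i * c[k + i] for i in range(n - k))
                     for k in range(n)])

def annihilate_formula(c, delta):
    """The same object computed as (L c(L) - delta c(delta)) / (L - delta)."""
    c = np.asarray(c, dtype=float)
    c_at_delta = np.polyval(c[::-1], delta)              # polyval wants highest power first
    numer = np.concatenate(([-delta * c_at_delta], c))   # ascending powers of L
    quotient, remainder = np.polydiv(numer[::-1], np.array([1.0, -delta]))
    return quotient[::-1], remainder

c = [2.0, -1.0, 0.5]      # assumed Wold coefficients c_0, c_1, c_2
delta = 0.9               # assumed discount factor in the geometric sum

direct = annihilate_direct(c, delta)
via_formula, remainder = annihilate_formula(c, delta)
print(direct)
print(via_formula, remainder)
```

Both routes give the coefficients that multiply $\eta_t, \eta_{t-1}, \ldots$ in the forecast of the geometric sum; applying $c(L)^{-1}$ then expresses the forecast in terms of current and past $X$'s.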
This formula is useful in solving stochastic versions of problem 1 of lecture Classical Control with Linear Algebra in which
the randomness emerges because {𝑎𝑡 } is a stochastic process.
The problem is to maximize
$$
\mathbb{E}_0 \lim_{N \to \infty} \sum_{t=0}^{N} \beta^t \left[ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} [d(L) y_t]^2 \right] \tag{33.27}
$$
where 𝔼𝑡 is mathematical expectation conditioned on information known at 𝑡, and where {𝑎𝑡 } is a covariance stationary
stochastic process with Wold moving average representation
𝑎𝑡 = 𝑐(𝐿) 𝜂𝑡
where
$$
c(L) = \sum_{j=0}^{\tilde n} c_j L^j
$$
and
$$
\eta_t = a_t - \hat{\mathbb{E}}[a_t \mid a_{t-1}, \ldots]
$$
The problem is to maximize (33.27) with respect to a contingency plan expressing 𝑦𝑡 as a function of information known
at 𝑡, which is assumed to be (𝑦𝑡−1 , 𝑦𝑡−2 , … , 𝑎𝑡 , 𝑎𝑡−1 , …).
The solution of this problem can be achieved in two steps.
First, ignoring the uncertainty, we can solve the problem assuming that {𝑎𝑡 } is a known sequence.
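The solution is, from above,
$$
c(L) y_t = c(\beta L^{-1})^{-1} a_t
$$
or
$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L) y_t = \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} (\lambda_j \beta)^k a_{t+k} \tag{33.28}
$$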
Second, the solution of the problem under uncertainty is obtained by replacing the terms on the right-hand side of the
above expressions with their linear least squares predictors.
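Using (33.26) and (33.28), we have the following solution
$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L) y_t = \sum_{j=1}^{m} A_j \left[ \frac{1 - \beta \lambda_j c(\beta \lambda_j) L^{-1} c(L)^{-1}}{1 - \beta \lambda_j L^{-1}} \right] a_t
$$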
Blaschke factors
The following is a useful piece of mathematics underlying “root flipping”.
Let $\pi(z) = \sum_{j=0}^{m} \pi_j z^j$ and let $z_1, \ldots, z_k$ be the zeros of $\pi(z)$ that are inside the unit circle, $k < m$.
Then define

$$
\theta(z) = \pi(z) \left( \frac{z_1 z - 1}{z - z_1} \right) \left( \frac{z_2 z - 1}{z - z_2} \right) \cdots \left( \frac{z_k z - 1}{z - z_k} \right)
$$

It can then be shown that the zeros of $\theta(z)$ are not inside the unit circle.
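The root-flipping construction can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names `flip_inside_zeros` and `evaluate` are hypothetical, the example polynomial is made up, and the code assumes real coefficients with $\pi_0 \neq 0$. It also checks a standard property of such factors, namely that $\theta(z)\theta(z^{-1}) = \pi(z)\pi(z^{-1})$ on the unit circle.

```python
import numpy as np

def flip_inside_zeros(pi_coeffs):
    """Return coefficients of θ(z) given coefficients π_0, ..., π_m of π(z).

    Coefficients are ordered by ascending powers of z. Assumes real
    coefficients and π_0 ≠ 0, so that no zero sits at the origin.
    """
    pi_coeffs = np.asarray(pi_coeffs, dtype=float)
    zeros = np.roots(pi_coeffs[::-1])          # np.roots wants highest power first
    inside = np.abs(zeros) < 1
    # Multiplying by (z_k z - 1)/(z - z_k) swaps the zero z_k for 1/z_k
    # and rescales the polynomial by z_k
    new_zeros = np.where(inside, 1 / zeros, zeros)
    scale = pi_coeffs[-1] * np.prod(zeros[inside])
    theta_desc = scale * np.poly(new_zeros)    # highest power first
    return np.real_if_close(theta_desc[::-1])

def evaluate(coeffs, z):
    """Evaluate a polynomial with ascending-power coefficients at z."""
    return np.polyval(np.asarray(coeffs)[::-1], z)

# π(z) = (z - 2)(z - 0.5) = 1 - 2.5 z + z², with one zero inside the unit circle
pi_coeffs = [1.0, -2.5, 1.0]
theta_coeffs = flip_inside_zeros(pi_coeffs)

z = np.exp(1j * np.linspace(0, 2 * np.pi, 50))   # points on the unit circle
lhs = evaluate(theta_coeffs, z) * evaluate(theta_coeffs, 1 / z)
rhs = evaluate(pi_coeffs, z) * evaluate(pi_coeffs, 1 / z)

print("θ coefficients:", theta_coeffs)
print("θ(z)θ(1/z) matches π(z)π(1/z):", np.allclose(lhs, rhs))
print("moduli of zeros of θ:", np.abs(np.roots(theta_coeffs[::-1])))
```

In this example the zero at $z = 0.5$ is moved to $z = 2$, while the zero that was already outside the unit circle is left alone.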
33.5 Exercises
Exercise 33.5.1
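Let $Y_t = (1 - 2L) u_t$ where $u_t$ is a mean zero white noise with $\mathbb{E} u_t^2 = 1$. Let

$$
X_t = Y_t + \varepsilon_t
$$

where $\varepsilon_t$ is a serially uncorrelated white noise with $\mathbb{E} \varepsilon_t^2 = 9$, and $\mathbb{E} \varepsilon_t u_s = 0$ for all $t$ and $s$.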
Exercise 33.5.2
Multivariable Prediction: Let 𝑌𝑡 be an (𝑛 × 1) vector stochastic process with moving average representation
𝑌𝑡 = 𝐷(𝐿)𝑈𝑡
where $D(L) = \sum_{j=0}^{m} D_j L^j$, $D_j$ an $n \times n$ matrix, $U_t$ an $(n \times 1)$ vector white noise with $\mathbb{E} U_t = 0$ for all $t$, $\mathbb{E} U_t U_s' = 0$ for all $s \neq t$, and $\mathbb{E} U_t U_t' = I$ for all $t$.
Let 𝜀𝑡 be an 𝑛 × 1 vector white noise with mean 0 and contemporaneous covariance matrix 𝐻, where 𝐻 is a positive
definite matrix.
Let 𝑋𝑡 = 𝑌𝑡 + 𝜀𝑡 .
Define the covariograms as $C_X(\tau) = \mathbb{E} X_t X_{t-\tau}'$, $C_Y(\tau) = \mathbb{E} Y_t Y_{t-\tau}'$, $C_{YX}(\tau) = \mathbb{E} Y_t X_{t-\tau}'$.
Then define the matrix covariance generating function, as in (32.21), only interpret all the objects in (32.21) as matrices.
Show that the covariance generating functions are given by
$$
\begin{aligned}
g_Y(z) &= D(z) D(z^{-1})' \\
g_X(z) &= D(z) D(z^{-1})' + H \\
g_{YX}(z) &= D(z) D(z^{-1})'
\end{aligned}
$$
A factorization of 𝑔𝑋 (𝑧) can be found (see [Rozanov, 1967] or [Whittle, 1983]) of the form
$$
D(z) D(z^{-1})' + H = C(z) C(z^{-1})', \qquad C(z) = \sum_{j=0}^{m} C_j z^j
$$
where the zeros of |𝐶(𝑧)| do not lie inside the unit circle.
A vector Wold moving average representation of 𝑋𝑡 is then
𝑋𝑡 = 𝐶(𝐿)𝜂𝑡
where 𝜂𝑡 is an (𝑛 × 1) vector white noise that is “fundamental” for 𝑋𝑡 .
That is, 𝑋𝑡 − 𝔼̂ [𝑋𝑡 ∣ 𝑋𝑡−1 , 𝑋𝑡−2 …] = 𝐶0 𝜂𝑡 .
The optimum predictor of 𝑋𝑡+𝑗 is
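$$
\hat{\mathbb{E}} [X_{t+j} \mid X_t, X_{t-1}, \ldots] = \left[ \frac{C(L)}{L^j} \right]_+ \eta_t
$$

If $C(L)$ is invertible, i.e., if the zeros of $\det C(z)$ lie strictly outside the unit circle, then this formula can be written

$$
\hat{\mathbb{E}} [X_{t+j} \mid X_t, X_{t-1}, \ldots] = \left[ \frac{C(L)}{L^j} \right]_+ C(L)^{-1} X_t
$$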
CHAPTER
THIRTYFOUR
In addition to what’s in Anaconda, this lecture will need the following libraries:
34.1 Introduction
Robert E. Lucas, Jr. [Robert E. Lucas, 1975], Kenneth Kasa [Kasa, 2000], and Robert Townsend [Townsend, 1983]
showed that putting decision makers into environments in which they want to infer persistent hidden state variables from
equilibrium prices and quantities can elongate and amplify impulse responses to aggregate shocks.
This provides a promising way to think about amplification mechanisms in business cycle models.
Townsend [Townsend, 1983] noted that living in such environments makes decision makers want to forecast forecasts of
others.
This theme has been pursued for situations in which decision makers’ imperfect information forces them to pursue an
infinite recursion that involves forming beliefs about the beliefs of others (e.g., [Allen et al., 2002]).
Lucas [Robert E. Lucas, 1975] side stepped having decision makers forecast the forecasts of other decision makers by
assuming that they simply pool their information before forecasting.
A pooling equilibrium like Lucas’s plays a prominent role in this lecture.
Because he didn’t assume such pooling, [Townsend, 1983] confronted the problem of forecasting the forecasts of others.
To formulate the problem recursively required that Townsend define a decision maker’s state vector.
Townsend concluded that his original model required an intractable infinite dimensional state space.
Therefore, he constructed a more manageable approximating model in which a hidden Markov component of a demand
shock is revealed to all firms after a fixed, finite number of periods.
In this lecture, we illustrate again the theme that finding the state is an art by showing how to formulate Townsend’s
original model in terms of a low-dimensional state space.
We show that Townsend’s model shares equilibrium prices and quantities with those that prevail in a pooling equilibrium.
That finding emerged from a line of research about Townsend’s model that built on [Pearlman et al., 1986] and that
culminated in [Pearlman and Sargent, 2005] .
Rather than directly deploying the [Pearlman et al., 1986] machinery here, we shall instead implement a sneaky guess-
and-verify tactic.
• We first compute a pooling equilibrium and represent it as an instance of a linear state-space system provided by
the Python class quantecon.LinearStateSpace.
• Leaving the state-transition equation for the pooling equilibrium unaltered, we alter the observation vector for a
firm to match what it is in Townsend’s original model. So rather than directly observing the signal received by firms
in the other industry, a firm sees the equilibrium price of the good produced by the other industry.
• We compute a population linear least squares regression of the noisy signal at time 𝑡 that firms in the other industry
would receive in a pooling equilibrium on time 𝑡 information that a firm receives in Townsend’s original model.
• The 𝑅2 in this regression equals 1.
• That verifies that a firm’s information set in Townsend’s original model equals its information set in a pooling
equilibrium.
• Therefore, equilibrium prices and quantities in Townsend’s original model equal those in a pooling equilibrium.
We proceed by describing a sequence of models of two industries that are linked in a single way:
• shocks to the demand curves for their products have a common component.
The models are simplified versions of Townsend’s [Townsend, 1983].
Townsend’s is a model of a rational expectations equilibrium in which firms want to forecast forecasts of others.
In Townsend’s model, firms condition their forecasts on observed endogenous variables whose equilibrium laws of motion
are determined by their own forecasting functions.
We shall assemble model components progressively in ways that can help us to appreciate the structure of the pooling
equilibrium that ultimately interests us.
While keeping all other aspects of the model the same, we shall study consequences of alternative assumptions about
what decision makers observe.
Technically, this lecture deploys concepts and tools that appear in First Look at Kalman Filter and Rational Expectations
Equilibrium.
where 𝑃𝑡𝑖 is the price of good 𝑖 at 𝑡, 𝑌𝑡𝑖 = 𝑓𝐾𝑡𝑖 is output in market 𝑖, 𝜃𝑡 is a persistent component of a demand shock
that is common across the two industries, and 𝜖𝑖𝑡 is an industry specific component of the demand shock that is i.i.d. and
whose time 𝑡 marginal distribution is 𝒩(0, 𝜎𝜖2 ).
where {𝑣𝑡 } is an i.i.d. sequence of Gaussian shocks, each with mean zero and variance 𝜎𝑣2 .
To simplify notation, we’ll study a special case by setting ℎ = 𝑓 = 1.
Costs of adjusting their capital stocks impart to firms an incentive to forecast the price of the good that they sell.
Throughout, we use the rational expectations equilibrium concept presented in the lecture Rational Expectations Equilibrium.
We let capital letters denote market wide objects and lower case letters denote objects chosen by a representative firm.
In each industry, a competitive equilibrium prevails.
To rationalize the big 𝐾, little 𝑘 connection, we can think of there being a continuum of firms in industry 𝑖, with each firm being indexed by $\omega \in [0, 1]$ and $K^i = \int_0^1 k^i(\omega) \, d\omega$.
In equilibrium, 𝑘𝑡𝑖 = 𝐾𝑡𝑖 , but we must distinguish between 𝑘𝑡𝑖 and 𝐾𝑡𝑖 when we pose the firm’s optimization problem.
34.3 Tactics
We shall compute equilibrium laws of motion for capital in industry 𝑖 under a sequence of assumptions about what a
representative firm observes.
Successive members of this sequence make a representative firm’s information more and more obscure.
We begin with the most information, then gradually withdraw information in a way that approaches and eventually reaches
the Townsend-like information structure that we are ultimately interested in.
Thus, we shall compute equilibria under the following alternative information structures:
• Perfect foresight: future values of 𝜃𝑡 , 𝜖𝑖𝑡 are observed in industry 𝑖.
• Observed history of stochastic 𝜃𝑡 : {𝜃𝑡 , 𝜖𝑖𝑡 } are realizations from a stochastic process; current and past values of
each are observed at time 𝑡 but future values are not.
• One noise-ridden observation on 𝜃𝑡 : values of {𝜃𝑡 , 𝜖𝑖𝑡 } separately are never observed. However, at time 𝑡, a
history 𝑤𝑡 of scalar noise-ridden observations on 𝜃𝑡 is observed at time 𝑡.
• Two noise-ridden observations on 𝜃𝑡 : values of {𝜃𝑡 , 𝜖𝑖𝑡 } separately are never observed. However, at time 𝑡, a
history 𝑤𝑡 of two noise-ridden observations on 𝜃𝑡 is observed at time 𝑡.
Successive computations build on previous ones.
We proceed by first finding an equilibrium under perfect foresight.
To compute an equilibrium with current and past but not future values of 𝜃𝑡 observed, we use a certainty equivalence prin-
ciple to justify modifying the perfect foresight equilibrium by replacing future values of 𝜃𝑠 , 𝜖𝑖𝑠 , 𝑠 ≥ 𝑡 with mathematical
expectations conditioned on 𝜃𝑡 .
This provides the equilibrium when 𝜃𝑡 is observed at 𝑡 but future 𝜃𝑡+𝑗 and 𝜖𝑖𝑡+𝑗 are not observed.
To find an equilibrium when a history $w^t$ of observations of a single noise-ridden signal on $\theta_t$ is observed, we again apply a certainty equivalence principle and replace future values of the random variables $\theta_s, \epsilon_s^i$, $s \geq t$, with their mathematical expectations conditioned on $w^t$.
To find an equilibrium when a history 𝑤𝑡 of two noisy signals on 𝜃𝑡 is observed, we replace future values of the random
variables 𝜃𝑠 , 𝜖𝑖𝑠 , 𝑠 ≥ 𝑡 with their mathematical expectations conditioned on history 𝑤𝑡 .
We call the equilibrium with two noise-ridden observations on 𝜃𝑡 a pooling equilibrium.
• It corresponds to an arrangement in which at the beginning of each period firms in industries 1 and 2 somehow get
together and share information about current values of their noisy signals on 𝜃.
We want ultimately to compare outcomes in a pooling equilibrium with an equilibrium under the following alternative
information structure for a firm in industry 𝑖 that originally interested Townsend [Townsend, 1983]:
• Firm 𝑖’s noise-ridden signal on $\theta_t$ and the price in industry $-i$: a firm in industry 𝑖 observes a history $w^t$ of one noise-ridden signal on $\theta_t$ and a history of industry $-i$’s price. (Here $-i$ means “not 𝑖”.)
With this information structure, a representative firm 𝑖 sees the price as well as the aggregate endogenous state variable
𝑌𝑡𝑖 in its own industry.
That allows it to infer the total demand shock 𝜃𝑡 + 𝜖𝑖𝑡 .
However, at time 𝑡, the firm sees only $P_t^{-i}$ and does not see $Y_t^{-i}$, so that a firm in industry 𝑖 does not directly observe $\theta_t + \epsilon_t^{-i}$.
Nevertheless, it will turn out that equilibrium prices and quantities in this equilibrium equal their counterparts in a pooling
equilibrium because firms in industry 𝑖 are able to infer the noisy signal about the demand shock received by firms in
industry −𝑖.
We shall verify this assertion by using a guess and verify tactic that involves running a least squares regression and
inspecting its 𝑅2 .1
where $\{\phi_t^i\}$ is a sequence of Lagrange multipliers on the transition law $k_{t+1}^i = k_t^i + \mu_t^i$.
First order conditions for the nonstochastic problem are

$$
\begin{aligned}
\phi_t^i &= \beta \phi_{t+1}^i + \beta P_{t+1}^i \\
\mu_t^i &= \phi_t^i .
\end{aligned}
\tag{34.4}
$$
Substituting the demand function (34.2) for $P_t^i$, imposing the condition $k_t^i = K_t^i$ that makes the representative firm representative, and using definition (34.6) of $g_t^i$, the Euler equation (34.4) lagged by one period can be expressed as

$$
-b k_t^i + \theta_t + \epsilon_t^i + (k_{t+1}^i - k_t^i) - g_t^i = 0
$$

or

$$
k_{t+1}^i = (b + 1) k_t^i - \theta_t - \epsilon_t^i + g_t^i \tag{34.5}
$$
In addition, we have the law of motion for 𝜃𝑡 , (34.3), and the demand equation (34.2).
1 [Pearlman and Sargent, 2005] verified this assertion using a different tactic, namely, by constructing analytic formulas for an equilibrium under
the incomplete information structure and confirming that they match the pooling equilibrium formulas derived here.
In summary, with perfect foresight, equilibrium conditions for industry 𝑖 comprise the following system of difference
equations:
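$$
\begin{aligned}
k_{t+1}^i &= (1 + b) k_t^i - \epsilon_t^i - \theta_t + g_t^i \\
\theta_{t+1} &= \rho \theta_t + v_t \\
g_{t+1}^i &= \beta^{-1} (g_t^i - P_t^i) \\
P_t^i &= -b k_t^i + \epsilon_t^i + \theta_t
\end{aligned}
\tag{34.8}
$$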
Without perfect foresight, the same system prevails except that the following equation replaces the third equation of
(34.8):
$$
g_{t+1,t}^i = \beta^{-1} (g_t^i - P_t^i)
$$
where 𝑥𝑡+1,𝑡 denotes the mathematical expectation of 𝑥𝑡+1 conditional on information at time 𝑡.
Our first step is to compute the equilibrium law of motion for 𝑘𝑡𝑖 under perfect foresight.
Let 𝐿 be the lag operator.2
Equations (34.7) and (34.5) imply the second order difference equation in 𝑘𝑡𝑖 :3
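where $|\tilde\lambda| < 1$ is the smaller root and $\lambda$ is the larger root of $(\lambda - 1)(\lambda - 1/\beta) = b\lambda$.
Therefore, (34.9) can be expressed as

$$
\tilde\lambda^{-1} (L^{-1} - \tilde\lambda)(1 - \tilde\lambda \beta L^{-1}) k_t^i = \beta L^{-1} \epsilon_t^i + \beta L^{-1} \theta_t .
$$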
Solving the stable root backwards and the unstable root forwards gives

$$
k_{t+1}^i = \tilde\lambda k_t^i + \frac{\tilde\lambda \beta}{1 - \tilde\lambda \beta L^{-1}} (\epsilon_{t+1}^i + \theta_{t+1}) .
$$
Recall that we have already set 𝑘𝑖 = 𝐾 𝑖 at the appropriate point in the argument, namely, after having derived the
first-order necessary conditions for a representative firm in industry 𝑖.
Thus, under perfect foresight the equilibrium capital stock in industry 𝑖 satisfies
$$
k_{t+1}^i = \tilde\lambda k_t^i + \sum_{j=1}^{\infty} (\tilde\lambda \beta)^j (\epsilon_{t+j}^i + \theta_{t+j}) . \tag{34.10}
$$
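The perfect foresight solution (34.10) can be checked numerically. The following sketch is an illustration only: the parameter values and the made-up shock path are assumptions. It builds a capital path from (34.10) and confirms that it satisfies the second-order difference equation implied by the Euler equation $\phi_t^i = \beta\phi_{t+1}^i + \beta P_{t+1}^i$ with $\phi_t^i = k_{t+1}^i - k_t^i$ and $P_t^i = -b k_t^i + \epsilon_t^i + \theta_t$.

```python
import numpy as np

β, b = 0.9, 1.5                      # hypothetical parameter values

# Roots of (λ - 1)(λ - 1/β) = bλ, i.e. λ² - (1 + 1/β + b)λ + 1/β = 0
roots = np.roots([1.0, -(1 + 1 / β + b), 1 / β])
λ_tilde, λ = roots.min(), roots.max()

T = 60
rng = np.random.default_rng(0)
s = np.zeros(T + 3)                  # s_t stands for ε_t^i + θ_t
s[:T] = rng.normal(size=T)           # known in advance, zero from date T on

# f_t = Σ_{j≥1} (λ_tilde β)^j s_{t+j}, computed backwards from f_{T+2} = 0
f = np.zeros(T + 3)
for t in range(T + 1, -1, -1):
    f[t] = λ_tilde * β * (s[t + 1] + f[t + 1])

# Capital path from (34.10), starting from k_0 = 0
k = np.zeros(T + 3)
for t in range(T + 2):
    k[t + 1] = λ_tilde * k[t] + f[t]

# Residual of β k_{t+2} - (1 + β + βb) k_{t+1} + k_t + β s_{t+1} = 0
resid = β * k[2:] - (1 + β + β * b) * k[1:-1] + k[:-2] + β * s[1:-1]
max_resid = np.max(np.abs(resid))
print(λ_tilde, λ, max_resid)
```

The residual is zero up to floating point error, and the two roots satisfy $\tilde\lambda \lambda = 1/\beta$, which is the relation used below to define $\lambda$.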
Next, we shall investigate consequences of replacing future values of (𝜖𝑖𝑡+𝑗 + 𝜃𝑡+𝑗 ) in equation (34.10) with alternative
forecasting schemes.
In particular, we shall compute equilibrium laws of motion for capital under alternative assumptions about information
available to firms in market 𝑖.
2 See [Sargent, 1987], especially chapters IX and XIV, for principles that guide solving some roots backwards and others forwards.
3 As noted by [Sargent, 1987], this difference equation is the Euler equation for a planning problem that maximizes the discounted sum of consumer
plus producer surplus.
If future 𝜃’s are unknown at 𝑡, it is appropriate to replace all random variables on the right side of (34.10) with their
conditional expectations based on the information available to decision makers in market 𝑖.
For now, we assume that this information set is $I_t^p = \begin{bmatrix} \theta^t & \epsilon^{it} \end{bmatrix}$, where $z^t$ represents the semi-infinite history of variable $z_s$ up to time $t$.
Later we shall give firms less information.
To obtain an appropriate counterpart to (34.10) under our current assumption about information, we apply a certainty
equivalence principle.
In particular, it is appropriate to take (34.10) and replace each term $(\epsilon_{t+j}^i + \theta_{t+j})$ on the right side with $E[(\epsilon_{t+j}^i + \theta_{t+j}) \mid \theta^t]$.
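After using (34.3) and the i.i.d. assumption about $\{\epsilon_t^i\}$, this gives

$$
k_{t+1}^i = \tilde\lambda k_t^i + \frac{\tilde\lambda \beta \rho}{1 - \tilde\lambda \beta \rho} \theta_t
$$

or

$$
k_{t+1}^i = \tilde\lambda k_t^i + \frac{\rho}{\lambda - \rho} \theta_t \tag{34.11}
$$

where $\lambda \equiv (\beta \tilde\lambda)^{-1}$.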
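A small numerical check of the step from (34.10) to (34.11) is sketched below, assuming hypothetical parameter values. It sums the discounted forecasts $E[\theta_{t+j} \mid \theta^t] = \rho^j \theta_t$ term by term and compares the resulting coefficient on $\theta_t$ with the two closed forms above.

```python
import numpy as np

β, b, ρ = 0.9, 1.5, 0.8              # hypothetical parameter values

# Roots of (λ - 1)(λ - 1/β) = bλ
roots = np.roots([1.0, -(1 + 1 / β + b), 1 / β])
λ_tilde, λ = roots.min(), roots.max()

# Coefficient on θ_t from summing (λ_tilde β ρ)^j over j = 1, ..., J
J = 500
coef_sum = sum((λ_tilde * β * ρ) ** j for j in range(1, J + 1))

coef_closed = λ_tilde * β * ρ / (1 - λ_tilde * β * ρ)   # first closed form
coef_lambda = ρ / (λ - ρ)                               # form used in (34.11)
print(coef_sum, coef_closed, coef_lambda)
```

The three numbers coincide because the product of the two roots equals $1/\beta$, so $\lambda = (\beta\tilde\lambda)^{-1}$.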
For our purposes, it is convenient to represent the equilibrium {𝑘𝑡𝑖 }𝑡 process recursively as
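$$
\begin{aligned}
k_{t+1}^i &= \tilde\lambda k_t^i + \frac{1}{\lambda - \rho} \hat\theta_{t+1} \\
\hat\theta_{t+1} &= \rho \theta_t \\
\theta_{t+1} &= \rho \theta_t + v_t .
\end{aligned}
\tag{34.12}
$$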
34.5.1 Filtering
We get closer to the original Townsend model that interests us by now assuming that firms in market 𝑖 do not observe 𝜃𝑡 .
Instead they observe a history 𝑤𝑡 of noisy signals at time 𝑡.
In particular, assume that
$$
\begin{aligned}
w_t &= \theta_t + e_t \\
\theta_{t+1} &= \rho \theta_t + v_t
\end{aligned}
\tag{34.13}
$$
where 𝑒𝑡 and 𝑣𝑡 are mutually independent i.i.d. Gaussian shock processes with means of zero and variances 𝜎𝑒2 and 𝜎𝑣2 ,
respectively.
Define

$$
\hat\theta_{t+1} = E(\theta_{t+1} \mid w^t)
$$

where $w^t = [w_t, w_{t-1}, \ldots, w_0]$ denotes the history of the $w_s$ process up to and including $t$.
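Associated with the state-space representation (34.13) is the time-invariant innovations representation

$$
\begin{aligned}
\hat\theta_{t+1} &= \rho \hat\theta_t + \kappa a_t \\
w_t &= \hat\theta_t + a_t
\end{aligned}
\tag{34.14}
$$

where $a_t \equiv w_t - E(w_t \mid w^{t-1})$ is the innovations process in $w_t$ and the Kalman gain $\kappa$ is

$$
\kappa = \frac{\rho p}{p + \sigma_e^2} \tag{34.15}
$$

and where $p$ satisfies the Riccati equation

$$
p = \sigma_v^2 + \frac{p \rho^2 \sigma_e^2}{\sigma_e^2 + p} . \tag{34.16}
$$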
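The Riccati equation (34.16) is scalar, so it can be solved by simple fixed-point iteration or in closed form. The sketch below does both under hypothetical parameter values; the function name `solve_p` is an illustrative assumption and is not part of any library.

```python
import numpy as np

def solve_p(ρ, σ_v, σ_e, tol=1e-12, max_iter=10_000):
    """Iterate p ← σ_v² + p ρ² σ_e² / (σ_e² + p) until it settles."""
    p = σ_v ** 2
    for _ in range(max_iter):
        p_new = σ_v ** 2 + p * ρ ** 2 * σ_e ** 2 / (σ_e ** 2 + p)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    raise RuntimeError("Riccati iteration did not converge")

ρ, σ_v, σ_e = 0.8, 0.5, 0.6          # hypothetical parameter values

p = solve_p(ρ, σ_v, σ_e)
κ = ρ * p / (p + σ_e ** 2)           # Kalman gain (34.15)

# Closed form: multiplying (34.16) through by (σ_e² + p) gives
# p² + (σ_e² - σ_v² - ρ² σ_e²) p - σ_v² σ_e² = 0; take the positive root
c1 = σ_e ** 2 - σ_v ** 2 - ρ ** 2 * σ_e ** 2
p_closed = (-c1 + np.sqrt(c1 ** 2 + 4 * σ_v ** 2 * σ_e ** 2)) / 2

print(p, p_closed, κ)
```

The iteration starts from $p = \sigma_v^2$, the value that would prevail if $\theta_t$ were known exactly, and increases monotonically to the fixed point.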
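State-reconstruction error
Define the state reconstruction error $\tilde\theta_t$ by

$$
\tilde\theta_t = \theta_t - \hat\theta_t .
$$

Then $p = E \tilde\theta_t^2$.
Equations (34.13) and (34.14) imply

$$
\tilde\theta_{t+1} = (\rho - \kappa) \tilde\theta_t + v_t - \kappa e_t . \tag{34.17}
$$

Notice that we can express $\hat\theta_{t+1}$ as

$$
\hat\theta_{t+1} = \{ \rho \theta_t + v_t \} + \{ \kappa e_t - (\rho - \kappa) \tilde\theta_t - v_t \} \tag{34.18}
$$

where the first term in braces equals $\theta_{t+1}$ and the second term in braces equals $-\tilde\theta_{t+1}$.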
We can express (34.11) as
$$
k_{t+1}^i = \tilde\lambda k_t^i + \frac{1}{\lambda - \rho} E[\theta_{t+1} \mid \theta^t] . \tag{34.19}
$$
An application of a certainty equivalence principle asserts that when only 𝑤𝑡 is observed, a corresponding equilibrium
{𝑘𝑡𝑖 } process can be found by replacing the information set 𝜃𝑡 with 𝑤𝑡 in (34.19).
Making this substitution and using (34.18) leads to
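$$
k_{t+1}^i = \tilde\lambda k_t^i + \frac{\rho}{\lambda - \rho} \theta_t + \frac{\kappa}{\lambda - \rho} e_t - \frac{\rho - \kappa}{\lambda - \rho} \tilde\theta_t . \tag{34.20}
$$

Simplifying equation (34.18), we also have

$$
\hat\theta_{t+1} = \rho \theta_t + \kappa e_t - (\rho - \kappa) \tilde\theta_t . \tag{34.21}
$$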
Relative to (34.11), the equilibrium acquires a new state variable, namely, the 𝜃–reconstruction error, 𝜃𝑡̃ .
For a subsequent argument, by using (34.15), it is convenient to write (34.20) as
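$$
k_{t+1}^i = \tilde\lambda k_t^i + \frac{\rho}{\lambda - \rho} \theta_t + \frac{1}{\lambda - \rho} \frac{p \rho}{p + \sigma_e^2} e_t - \frac{1}{\lambda - \rho} \frac{\rho \sigma_e^2}{p + \sigma_e^2} \tilde\theta_t \tag{34.22}
$$

In summary, when decision makers in market 𝑖 observe a semi-infinite history $w^t$ of noisy signals $w_t$ on $\theta_t$ at $t$, an equilibrium law of motion for $k_t^i$ can be represented as

$$
\begin{aligned}
k_{t+1}^i &= \tilde\lambda k_t^i + \frac{1}{\lambda - \rho} \hat\theta_{t+1} \\
\hat\theta_{t+1} &= \rho \theta_t + \frac{\rho p}{p + \sigma_e^2} e_t - \frac{\rho \sigma_e^2}{p + \sigma_e^2} \tilde\theta_t \\
\tilde\theta_{t+1} &= \frac{\rho \sigma_e^2}{p + \sigma_e^2} \tilde\theta_t - \frac{p \rho}{p + \sigma_e^2} e_t + v_t \\
\theta_{t+1} &= \rho \theta_t + v_t .
\end{aligned}
\tag{34.23}
$$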
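To see the last three equations of (34.23) at work, one can simulate them directly. The sketch below uses hypothetical parameter values and plain NumPy. It checks that $\theta_t = \hat\theta_t + \tilde\theta_t$ along the simulated path and that the sample variance of the reconstruction error is close to $p$.

```python
import numpy as np

ρ, σ_v, σ_e = 0.8, 0.5, 0.6          # hypothetical parameter values

# Steady-state p: positive root of the quadratic implied by (34.16)
c1 = σ_e ** 2 - σ_v ** 2 - ρ ** 2 * σ_e ** 2
p = (-c1 + np.sqrt(c1 ** 2 + 4 * σ_v ** 2 * σ_e ** 2)) / 2
κ = ρ * p / (p + σ_e ** 2)           # note ρ - κ = ρ σ_e² / (p + σ_e²)

T = 100_000
rng = np.random.default_rng(1)
v = σ_v * rng.standard_normal(T)
e = σ_e * rng.standard_normal(T)

θ = np.zeros(T + 1)
θ_hat = np.zeros(T + 1)
θ_tilde = np.zeros(T + 1)
for t in range(T):
    θ_hat[t + 1] = ρ * θ[t] + κ * e[t] - (ρ - κ) * θ_tilde[t]
    θ_tilde[t + 1] = (ρ - κ) * θ_tilde[t] - κ * e[t] + v[t]
    θ[t + 1] = ρ * θ[t] + v[t]

identity_gap = np.max(np.abs(θ - θ_hat - θ_tilde))   # should be ~ 0
sample_var = θ_tilde[1000:].var()                    # should be close to p
print(identity_gap, sample_var, p)
```

All three processes start from zero here, which is a convenience of the sketch rather than a feature of the model.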
We now construct a pooling equilibrium by assuming that at time 𝑡 a firm in industry 𝑖 receives a vector 𝑤𝑡 of two noisy
signals on 𝜃𝑡 :
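$$
\begin{aligned}
\theta_{t+1} &= \rho \theta_t + v_t \\
w_t &= \begin{bmatrix} 1 \\ 1 \end{bmatrix} \theta_t + \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}
\end{aligned}
$$

with

$$
\begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix} = \begin{bmatrix} \epsilon_t^1 \\ \epsilon_t^2 \end{bmatrix}
$$

so that a firm in industry 𝑖 observes the noisy signals on $\theta_t$ presented to firms in both industries $i$ and $-i$.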
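The pertinent innovations representation now becomes

$$
\begin{aligned}
\hat\theta_{t+1} &= \rho \hat\theta_t + \kappa a_t \\
w_t &= \begin{bmatrix} 1 \\ 1 \end{bmatrix} \hat\theta_t + a_t
\end{aligned}
\tag{34.24}
$$

where $a_t \equiv w_t - E[w_t \mid w^{t-1}]$ is a $(2 \times 1)$ vector of innovations in $w_t$ and $\kappa$ is now a $(1 \times 2)$ vector of Kalman gains.
Formulas for the Kalman filter imply that

$$
\kappa = \frac{\rho p}{2p + \sigma_e^2} \begin{bmatrix} 1 & 1 \end{bmatrix} \tag{34.25}
$$

$$
p = \sigma_v^2 + \frac{p \rho^2 \sigma_e^2}{2p + \sigma_e^2} . \tag{34.26}
$$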
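Formulas (34.25) and (34.26) can be checked against a generic matrix Kalman filter recursion. The sketch below is an illustration with hypothetical parameter values. It assumes that the two signal noises are independent with common variance $\sigma_e^2$, iterates the covariance recursion to convergence, and compares the resulting gain with the closed form.

```python
import numpy as np

ρ, σ_v, σ_e = 0.8, 0.5, 0.6          # hypothetical parameter values

A = np.array([[ρ]])
G = np.array([[1.0], [1.0]])         # both signals load on θ_t
Q = np.array([[σ_v ** 2]])
R = σ_e ** 2 * np.eye(2)             # independent noises with equal variance

def gain(Σ):
    """Return the Kalman gain and innovation covariance for state covariance Σ."""
    S = G @ Σ @ G.T + R
    K = A @ Σ @ G.T @ np.linalg.inv(S)
    return K, S

Σ = Q.copy()
for _ in range(1_000):
    K, S = gain(Σ)
    Σ = A @ Σ @ A.T + Q - K @ S @ K.T        # Riccati update

K, S = gain(Σ)                               # gain at the converged Σ
p = Σ.item()

κ_formula = ρ * p / (2 * p + σ_e ** 2) * np.ones((1, 2))      # (34.25)
p_rhs = σ_v ** 2 + p * ρ ** 2 * σ_e ** 2 / (2 * p + σ_e ** 2) # (34.26)
print(K, κ_formula, p, p_rhs)
```

Because the two signals enter symmetrically, the filter weights them equally. This is also why a scalar Riccati equation with a single signal loading of $\sqrt{2}$ delivers the same $p$.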
Thus, when a representative firm in industry 𝑖 observes two noisy signals on 𝜃𝑡 , we can express the equilibrium law of
motion for capital recursively as
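$$
\begin{aligned}
k_{t+1}^i &= \tilde\lambda k_t^i + \frac{1}{\lambda - \rho} \hat\theta_{t+1} \\
\hat\theta_{t+1} &= \rho \theta_t + \frac{\rho p}{2p + \sigma_e^2} (e_{1t} + e_{2t}) - \frac{\rho \sigma_e^2}{2p + \sigma_e^2} \tilde\theta_t \\
\tilde\theta_{t+1} &= \frac{\rho \sigma_e^2}{2p + \sigma_e^2} \tilde\theta_t - \frac{p \rho}{2p + \sigma_e^2} (e_{1t} + e_{2t}) + v_t \\
\theta_{t+1} &= \rho \theta_t + v_t .
\end{aligned}
\tag{34.27}
$$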
Below, by using a guess-and-verify tactic, we shall show that outcomes in this pooling equilibrium equal those in an
equilibrium under the alternative information structure that interested Townsend [Townsend, 1983] but that originally
seemed too challenging to compute.4
As a preliminary step we shall take our recursive representation (34.23) of an equilibrium in industry 𝑖 with one noisy
signal on 𝜃𝑡 and perform the following steps:
• Compute 𝜆 and 𝜆̃ by posing a root-finding problem and solving it with numpy.roots
• Compute 𝑝 by forming the appropriate discrete Riccati equation and then solving it using quantecon.
solve_discrete_riccati
• Add a measurement equation for 𝑃𝑡𝑖 = 𝑏𝑘𝑡𝑖 + 𝜃𝑡 + 𝑒𝑡 , 𝜃𝑡 + 𝑒𝑡 , and 𝑒𝑡 to system (34.23).
• Write the resulting system in state-space form and encode it using quantecon.LinearStateSpace
• Use methods of the quantecon.LinearStateSpace to compute impulse response functions of 𝑘𝑡𝑖 with
respect to shocks 𝑣𝑡 , 𝑒𝑡 .
After analyzing the one-noisy-signal structure in this way, by making appropriate modifications we shall analyze the
two-noisy-signal structure.
We proceed to analyze first the one-noisy-signal structure and then the two-noisy-signal structure.
4 [Pearlman and Sargent, 2005] verify the same claim by applying machinery of [Pearlman et al., 1986].
Note that:
$$
A = [\rho], \quad B = [\sqrt{2}], \quad R = [\sigma_e^2], \quad Q = [\sigma_v^2], \quad N = [0]
$$
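$$
\underbrace{\begin{bmatrix} e_{t+1} \\ k_{t+1}^i \\ \tilde\theta_{t+1} \\ P_{t+1} \\ \theta_{t+1} \\ v_{t+1} \end{bmatrix}}_{x_{t+1}}
=
\underbrace{\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
\frac{\kappa}{\lambda - \rho} & \tilde\lambda & \frac{-1}{\lambda - \rho} \frac{\kappa \sigma_e^2}{p} & 0 & \frac{\rho}{\lambda - \rho} & 0 \\
-\kappa & 0 & \frac{\kappa \sigma_e^2}{p} & 0 & 0 & 1 \\
\frac{b\kappa}{\lambda - \rho} & b\tilde\lambda & \frac{-b}{\lambda - \rho} \frac{\kappa \sigma_e^2}{p} & 0 & \frac{b\rho}{\lambda - \rho} + \rho & 1 \\
0 & 0 & 0 & 0 & \rho & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} e_t \\ k_t^i \\ \tilde\theta_t \\ P_t \\ \theta_t \\ v_t \end{bmatrix}}_{x_t}
+
\underbrace{\begin{bmatrix}
\sigma_e & 0 \\
0 & 0 \\
0 & 0 \\
\sigma_e & 0 \\
0 & 0 \\
0 & \sigma_v
\end{bmatrix}}_{C}
\begin{bmatrix} z_{1,t+1} \\ z_{2,t+1} \end{bmatrix}
$$

$$
\underbrace{\begin{bmatrix} P_t \\ e_t + \theta_t \\ e_t \end{bmatrix}}_{y_t}
=
\underbrace{\begin{bmatrix}
0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}}_{G}
\underbrace{\begin{bmatrix} e_t \\ k_t^i \\ \tilde\theta_t \\ P_t \\ \theta_t \\ v_t \end{bmatrix}}_{x_t}
+
\underbrace{\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}}_{H} w_{t+1}
$$

$$
\begin{bmatrix} z_{1,t+1} \\ z_{2,t+1} \\ w_{t+1} \end{bmatrix} \sim \mathcal{N}(0, I)
$$

$$
\kappa = \frac{\rho p}{p + \sigma_e^2}
$$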
This representation includes extraneous variables such as 𝑃𝑡 in the state vector.
We formulate things in this way because it allows us easily to compute covariances of these variables with other components of the state vector (step 5 above) by using the stationary_distributions method of the LinearStateSpace class.
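As a rough sketch of what this state-space encoding delivers, the following NumPy-only code builds $A$ and $C$ for the one-noisy-signal system, computes the stationary covariance matrix by iterating $\Sigma \leftarrow A \Sigma A' + C C'$, and forms impulse responses $A^j C$. Parameter values are hypothetical, the roots are taken from the quadratic $(\lambda - 1)(\lambda - 1/\beta) = b\lambda$ stated in the text, and plain iteration is used here in place of the `quantecon.LinearStateSpace` methods that the lecture relies on.

```python
import numpy as np

β, ρ, b, σ_v, σ_e = 0.9, 0.8, 1.5, 0.5, 0.6      # hypothetical values

roots = np.roots([1.0, -(1 + 1 / β + b), 1 / β])
λ_tilde, λ = roots.min(), roots.max()
c1 = σ_e ** 2 - σ_v ** 2 - ρ ** 2 * σ_e ** 2
p = (-c1 + np.sqrt(c1 ** 2 + 4 * σ_v ** 2 * σ_e ** 2)) / 2
κ = ρ * p / (p + σ_e ** 2)
κ_prod = κ * σ_e ** 2 / p                         # equals ρ - κ

# State x_t = [e_t, k_t, θ_tilde_t, P_t, θ_t, v_t]
A = np.array([
    [0, 0, 0, 0, 0, 0],
    [κ / (λ - ρ), λ_tilde, -κ_prod / (λ - ρ), 0, ρ / (λ - ρ), 0],
    [-κ, 0, κ_prod, 0, 0, 1],
    [b * κ / (λ - ρ), b * λ_tilde, -b * κ_prod / (λ - ρ), 0,
     b * ρ / (λ - ρ) + ρ, 1],
    [0, 0, 0, 0, ρ, 1],
    [0, 0, 0, 0, 0, 0],
], dtype=float)
C = np.array([[σ_e, 0], [0, 0], [0, 0], [σ_e, 0], [0, 0], [0, σ_v]],
             dtype=float)

# Stationary covariance of x_t: iterate Σ ← A Σ A' + C C'
Σ_x = np.zeros((6, 6))
for _ in range(2_000):
    Σ_x = A @ Σ_x @ A.T + C @ C.T

# Impulse responses x_j = A^j C for j = 0, ..., J - 1
J = 21
irf = np.empty((J, 6, 2))
irf[0] = C
for j in range(1, J):
    irf[j] = A @ irf[j - 1]
k_response_to_e = irf[:, 1, 0]       # response of k to the noise shock
k_response_to_v = irf[:, 1, 1]       # response of k to the θ innovation

print(Σ_x[2, 2], p)                  # variance of θ_tilde should equal p
```

Because $v_t$ is carried as a state variable, a shock to $v$ reaches $k$ only after two steps in this encoding.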
import numpy as np
import quantecon as qe
import plotly.graph_objects as go
import plotly.offline as pyo
from statsmodels.regression.linear_model import OLS
from IPython.display import display, Latex, Image
pyo.init_notebook_mode(connected=True)
# Compute λ
poly = np.array([1, -(1 + β + b) / β, 1 / β])
roots_poly = np.roots(poly)
λ_tilde = roots_poly.min()
λ = roots_poly.max()
True
A_ricc = np.array([[ρ]])
B_ricc = np.array([[1.]])
R_ricc = np.array([[σ_e ** 2]])
Q_ricc = np.array([[σ_v ** 2]])
N_ricc = np.zeros((1, 1))
p = qe.solve_discrete_riccati(A_ricc, B_ricc, Q_ricc, R_ricc, N_ricc).item()
True
κ = ρ * p / (p + σ_e ** 2)
κ_prod = κ * σ_e ** 2 / p
ts_length = 100_000
x, y = lss.simulate(ts_length, random_state=1)
True
To compute impulse response functions of 𝑘𝑡𝑖 , we use the impulse_response method of the quantecon.
LinearStateSpace class and plot outcomes.
where Σ11 is the covariance matrix of dependent variables and Σ22 is the covariance matrix of independent variables.
Regression coefficients are $\beta = \Sigma_{21} \Sigma_{22}^{-1}$.
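The population regression formulas can be illustrated on a made-up covariance matrix. In the sketch below, the covariance matrix is a hypothetical example rather than one generated by the model. Population coefficients and $R^2$ are computed from the blocks $\Sigma_{11}, \Sigma_{21}, \Sigma_{22}$ and compared with least squares estimates from a long simulated sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical covariance matrix for (y, x_1, x_2, x_3), with y listed first
M = rng.standard_normal((4, 4))
Σ = M @ M.T + np.eye(4)

Σ_11 = Σ[0, 0]          # variance of the dependent variable
Σ_21 = Σ[1:, 0]         # covariances between regressors and dependent variable
Σ_22 = Σ[1:, 1:]        # covariance matrix of the regressors

# Population coefficients solve Σ_22 β = Σ_21
reg_coeffs = np.linalg.solve(Σ_22, Σ_21)
R_squared = reg_coeffs @ Σ_22 @ reg_coeffs / Σ_11

# Compare with least squares on a long zero-mean Gaussian sample
sample = rng.multivariate_normal(np.zeros(4), Σ, size=200_000)
ols_coeffs, *_ = np.linalg.lstsq(sample[:, 1:], sample[:, 0], rcond=None)
max_gap = np.max(np.abs(reg_coeffs - ols_coeffs))

print(reg_coeffs, ols_coeffs, R_squared, max_gap)
```

By a law of large numbers, the gap between the two sets of coefficients shrinks as the sample grows.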
To verify an instance of a law of large numbers computation, we construct a long simulation of the state vector and for the
resulting sample compute the ordinary least-squares estimator of 𝛽 that we shall compare with corresponding population
regression coefficients.
Σ_11 = Σ_x[0, 0]
Σ_12 = Σ_x[0, 1:4]
Σ_21 = Σ_x[1:4, 0]
Σ_22 = Σ_x[1:4, 1:4]
# Compute R squared
R_squared = reg_coeffs @ Σ_x[1:4, 1:4] @ reg_coeffs / Σ_x[0, 0]
R_squared
0.9649461170475461
# Verify that the computed coefficients are close to least squares estimates
model = OLS(x[0], x[1:4].T)
reg_res = model.fit()
np.max(np.abs(reg_coeffs - reg_res.params)) < 1e-2
True
True
True
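$$
\underbrace{\begin{bmatrix} e_{1,t+1} \\ e_{2,t+1} \\ k_{t+1}^i \\ \tilde\theta_{t+1} \\ P_{t+1}^1 \\ P_{t+1}^2 \\ \theta_{t+1} \\ v_{t+1} \end{bmatrix}}_{x_{t+1}}
=
\underbrace{\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\frac{\kappa}{\lambda - \rho} & \frac{\kappa}{\lambda - \rho} & \tilde\lambda & \frac{-1}{\lambda - \rho} \frac{\kappa \sigma_e^2}{p} & 0 & 0 & \frac{\rho}{\lambda - \rho} & 0 \\
-\kappa & -\kappa & 0 & \frac{\kappa \sigma_e^2}{p} & 0 & 0 & 0 & 1 \\
\frac{b\kappa}{\lambda - \rho} & \frac{b\kappa}{\lambda - \rho} & b\tilde\lambda & \frac{-b}{\lambda - \rho} \frac{\kappa \sigma_e^2}{p} & 0 & 0 & \frac{b\rho}{\lambda - \rho} + \rho & 1 \\
\frac{b\kappa}{\lambda - \rho} & \frac{b\kappa}{\lambda - \rho} & b\tilde\lambda & \frac{-b}{\lambda - \rho} \frac{\kappa \sigma_e^2}{p} & 0 & 0 & \frac{b\rho}{\lambda - \rho} + \rho & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & \rho & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} e_{1,t} \\ e_{2,t} \\ k_t^i \\ \tilde\theta_t \\ P_t^1 \\ P_t^2 \\ \theta_t \\ v_t \end{bmatrix}}_{x_t}
+
\underbrace{\begin{bmatrix}
\sigma_e & 0 & 0 \\
0 & \sigma_e & 0 \\
0 & 0 & 0 \\
0 & 0 & 0 \\
\sigma_e & 0 & 0 \\
0 & \sigma_e & 0 \\
0 & 0 & 0 \\
0 & 0 & \sigma_v
\end{bmatrix}}_{C}
\begin{bmatrix} z_{1,t+1} \\ z_{2,t+1} \\ z_{3,t+1} \end{bmatrix}
$$

$$
\underbrace{\begin{bmatrix} P_t^1 \\ P_t^2 \\ e_{1,t} + \theta_t \\ e_{2,t} + \theta_t \\ e_{1,t} \\ e_{2,t} \end{bmatrix}}_{y_t}
=
\underbrace{\begin{bmatrix}
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}}_{G}
\underbrace{\begin{bmatrix} e_{1,t} \\ e_{2,t} \\ k_t^i \\ \tilde\theta_t \\ P_t^1 \\ P_t^2 \\ \theta_t \\ v_t \end{bmatrix}}_{x_t}
+
\underbrace{\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}}_{H} w_{t+1}
$$

$$
\begin{bmatrix} z_{1,t+1} \\ z_{2,t+1} \\ z_{3,t+1} \\ w_{t+1} \end{bmatrix} \sim \mathcal{N}(0, I)
$$

$$
\kappa = \frac{\rho p}{2p + \sigma_e^2}
$$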
A_ricc = np.array([[ρ]])
B_ricc = np.array([[np.sqrt(2)]])
R_ricc = np.array([[σ_e ** 2]])
Q_ricc = np.array([[σ_v ** 2]])
N_ricc = np.zeros((1, 1))
p = qe.solve_discrete_riccati(A_ricc, B_ricc, Q_ricc, R_ricc, N_ricc).item()
True
κ = ρ * p / (2 * p + σ_e ** 2)
κ_prod = κ * σ_e ** 2 / p
ts_length = 100_000
x, y = lss.simulate(ts_length, random_state=1)
data = np.array([xcoef])[0, :, 2, :]
Σ_11 = Σ_x[1, 1]
Σ_12 = Σ_x[1, 2:5]
Σ_21 = Σ_x[2:5, 1]
Σ_22 = Σ_x[2:5, 2:5]
# Compute R squared
0.0
# Verify that the computed coefficients are close to least squares estimates
model = OLS(x[1], x[2:5].T)
reg_res = model.fit()
np.max(np.abs(reg_coeffs - reg_res.params)) < 1e-2
True
True
Σ_11 = Σ_x[1, 1]
Σ_12 = Σ_x[1, 2:6]
Σ_21 = Σ_x[2:6, 1]
Σ_22 = Σ_x[2:6, 2:6]
print('------------------------------')
print(r'k_t:', reg_coeffs[0])
print(r'\tilde{\theta_t}:', reg_coeffs[1])
print(r'P^{1}_t:', reg_coeffs[2])
print(r'P^{2}_t:', reg_coeffs[3])
# Compute R squared
R_squared = reg_coeffs @ Σ_x[2:6, 2:6] @ reg_coeffs / Σ_x[1, 1]
R_squared
0.9621171983721837
Now we come to the key step for verifying that equilibrium outcomes for prices and quantities are identical in the pooling equilibrium and in the original model that led Townsend to deduce an infinite-dimensional state space.
We accomplish this by computing a population linear least squares regression of the noisy signal that firms in the other
industry receive in a pooling equilibrium on time 𝑡 information that a firm would receive in Townsend’s original model.
Let’s compute the regression and stare at the 𝑅2 :
True
reg_res.rsquared
1.0
For purposes of comparison, it is useful to construct a model in which demand disturbances in both industries still share a common persistent component $\theta_t$, but in which the persistent component $\theta$ is observed each period.
In this case, firms share the same information immediately and have no need to deploy signal-extraction techniques.
Thus, consider a version of our model in which histories of both 𝜖𝑖𝑡 and 𝜃𝑡 are observed by a representative firm.
In this case, the firm’s optimal decision rule is described by
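$$
k_{t+1}^i = \tilde\lambda k_t^i + \frac{1}{\lambda - \rho} \hat\theta_{t+1}
$$

where $\hat\theta_{t+1} = E_t \theta_{t+1}$ is given by

$$
\hat\theta_{t+1} = \rho \theta_t
$$

so that

$$
k_{t+1}^i = \tilde\lambda k_t^i + \frac{\rho}{\lambda - \rho} \theta_t
$$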
Consequently, when a history 𝜃𝑠 , 𝑠 ≤ 𝑡 is observed without noise, the following state space system prevails:
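$$
\begin{bmatrix} \theta_{t+1} \\ k_{t+1}^i \end{bmatrix}
=
\begin{bmatrix} \rho & 0 \\ \frac{\rho}{\lambda - \rho} & \tilde\lambda \end{bmatrix}
\begin{bmatrix} \theta_t \\ k_t^i \end{bmatrix}
+
\begin{bmatrix} \sigma_v \\ 0 \end{bmatrix} z_{1,t+1}
$$

$$
\begin{bmatrix} \theta_t \\ k_t^i \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \theta_t \\ k_t^i \end{bmatrix}
+
\begin{bmatrix} 0 \\ 0 \end{bmatrix} z_{1,t+1}
$$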
In order once again to use the quantecon class quantecon.LinearStateSpace, let’s form pertinent state-space
matrices
Go_lss = np.identity(2)
Now let’s form and plot an impulse response function of 𝑘𝑡𝑖 to shocks 𝑣𝑡 to 𝜃𝑡+1
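For the noiseless system, the impulse response is simple enough to compute by hand with NumPy. The sketch below is a minimal illustration under hypothetical parameter values. The names `Ao` and `Co` are illustrative assumptions, and repeated matrix multiplication stands in for the `impulse_response` method of `quantecon.LinearStateSpace`.

```python
import numpy as np

β, ρ, b, σ_v = 0.9, 0.8, 1.5, 0.5                 # hypothetical values

# Roots of (λ - 1)(λ - 1/β) = bλ
roots = np.roots([1.0, -(1 + 1 / β + b), 1 / β])
λ_tilde, λ = roots.min(), roots.max()

# State x_t = [θ_t, k_t], driven by the single shock z_{1,t+1}
Ao = np.array([[ρ, 0.0],
               [ρ / (λ - ρ), λ_tilde]])
Co = np.array([[σ_v],
               [0.0]])

# Impulse responses x_j = Ao^j Co for j = 0, ..., J - 1
J = 21
irf = np.empty((J, 2))
irf[0] = Co[:, 0]
for j in range(1, J):
    irf[j] = Ao @ irf[j - 1]

θ_response, k_response = irf[:, 0], irf[:, 1]
print(k_response[:5])
```

The response of $\theta$ decays geometrically at rate $\rho$, while capital responds with a one-period lag and then inherits persistence from both $\rho$ and $\tilde\lambda$.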
It is enlightening to plot side by side the impulse response functions for capital for the two noisy-signal information structures and for the noiseless signal on 𝜃 that we have just presented.
Please remember that the two-signal structure corresponds to the pooling equilibrium and also Townsend’s original
model.
fig_comb = go.Figure(data=[
*fig1.data,
*fig2.update_traces(xaxis='x2', yaxis='y2').data,
*fig3.update_traces(xaxis='x3', yaxis='y3').data
]).set_subplots(1, 3,
subplot_titles=("One noisy-signal",
"Two noisy-signal",
"No Noise"),
horizontal_spacing=0.02,
shared_yaxes=True)
# Export to PNG file
Image(fig_comb.to_image(format="png"))
# fig_comb.show() # will provide interactive plot when running
# notebook locally
display(Latex('$\\textbf{Kalman Gains}$'))
display(Latex(f'One noisy-signal structure: {round(κ_one, 6)}'))
display(Latex(f'Two noisy-signals structure: {round(κ_two, 6)}'))
Kalman Gains
Two-noise structure: 0.324062
Another lesson that comes from the preceding three-panel graph is that the presence of i.i.d. noise $\epsilon_t^i$ in industry 𝑖 generates a response in $k_t^{-i}$ in the two-noisy-signal structure, but not in the one-noisy-signal structure.
To truncate what he saw as an intractable, infinite dimensional state space, Townsend constructed an approximating model
in which the common hidden Markov demand shock is revealed to all firms after a fixed number of periods.
Thus,
• Townsend wanted to assume that at time 𝑡 firms in industry 𝑖 observe 𝑘𝑡𝑖 , 𝑌𝑡𝑖 , 𝑃𝑡𝑖 , (𝑃 −𝑖 )𝑡 , where (𝑃 −𝑖 )𝑡 is the
history of prices in the other market up to time 𝑡.
• Because that turned out to be too challenging, Townsend made a sensible alternative assumption that eased his
calculations: that after a large number 𝑆 of periods, firms in industry 𝑖 observe the hidden Markov component of
the demand shock 𝜃𝑡−𝑆 .
Townsend argued that the more manageable model could do a good job of approximating the intractable model in which
the Markov component of the demand shock remains unobserved forever.
By applying technical machinery of [Pearlman et al., 1986], [Pearlman and Sargent, 2005] showed that there is a recursive
representation of the equilibrium of the perpetually and symmetrically uninformed model that Townsend wanted to solve
[Townsend, 1983].
A reader of [Pearlman and Sargent, 2005] will notice that their representation of the equilibrium of Townsend’s model
exactly matches that of the pooling equilibrium presented here.
We have structured our notation in this lecture to facilitate comparison of the pooling equilibrium constructed here with
the equilibrium of Townsend’s model reported in [Pearlman and Sargent, 2005].
The computational method of [Pearlman and Sargent, 2005] is recursive: it enlists the Kalman filter and invariant subspace
methods for solving systems of Euler equations5 .
5 See [Anderson et al., 1996] for an account of invariant subspace methods.
As [Singleton, 1987], [Kasa, 2000], and [Sargent, 1991] also found, the equilibrium is fully revealing: observed prices
tell participants in industry 𝑖 all of the information held by participants in market −𝑖 (−𝑖 means not 𝑖).
This means that higher-order beliefs play no role: observing equilibrium prices in effect lets decision makers pool their
information sets6 .
The disappearance of higher order beliefs means that decision makers in this model do not really face a problem of
forecasting the forecasts of others.
Because those forecasts are the same as their own, they know them.
Sargent [Sargent, 1991] proposed a way to compute an equilibrium without making Townsend’s approximation.
Extending the reasoning of [Muth, 1960], Sargent noticed that it is possible to summarize the relevant history with a low
dimensional object, namely, a small number of current and lagged forecasting errors.
Positing an equilibrium in a space of perceived laws of motion for endogenous variables that takes the form of a vector
autoregressive, moving average, Sargent described an equilibrium as a fixed point of a mapping from the perceived law
of motion to the actual law of motion of that form.
Sargent worked in the time domain and proceeded to guess and verify the appropriate orders of the autoregressive and
moving average pieces of the equilibrium representation.
By working in the frequency domain [Kasa, 2000] showed how to discover the appropriate orders of the autoregressive
and moving average parts, and also how to compute an equilibrium.
The [Pearlman and Sargent, 2005] recursive computational method, which stays in the time domain, also discovered
appropriate orders of the autoregressive and moving average pieces.
In addition, by displaying equilibrium representations in the form of [Pearlman et al., 1986], [Pearlman and Sargent,
2005] showed how the moving average piece is linked to the innovation process of the hidden persistent component of
the demand shock.
That scalar innovation process is the additional state variable contributed by the problem of extracting a signal from
equilibrium prices that decision makers face in Townsend’s model.
6 See [Allen et al., 2002] for a discussion of information assumptions needed to create a situation in which higher order beliefs appear in equilibrium
decision rules. A way to read our findings in light of [Allen et al., 2002] is that, relative to the number of signals agents observe, Townsend’s section 8
model has too few random shocks to get higher order beliefs to play a role.
CHAPTER
THIRTYFIVE
35.1 Overview
import numpy as np
from numba import njit, prange
from scipy.stats import lognorm
import matplotlib.pyplot as plt
Lucas studied a pure exchange economy with a representative consumer (or household), where
• Pure exchange means that all endowments are exogenous.
• Representative consumer means that either
– there is a single consumer (sometimes also referred to as a household), or
– all consumers have identical endowments and preferences
Either way, the assumption of a representative agent means that prices adjust to eradicate desires to trade.
This makes it very easy to compute competitive equilibrium prices.
Advanced Quantitative Economics with Python
Assets
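There is a single “productive unit” that costlessly generates a sequence of consumption goods $\{y_t\}_{t=0}^{\infty}$.
We will assume that this endowment is Markovian, following the exogenous process
$$ y_{t+1} = G(y_t, \xi_{t+1}) $$
where $\{\xi_t\}$ is an IID shock sequence with distribution $\phi$.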
Consumers
A representative consumer ranks consumption streams {𝑐𝑡 } according to the time separable utility functional
$$ \mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t) \tag{35.1} $$
Here
• 𝛽 ∈ (0, 1) is a fixed discount factor.
• 𝑢 is a strictly increasing, strictly concave, continuously differentiable period utility function.
• 𝔼 is a mathematical expectation.
650 Chapter 35. Asset Pricing II: The Lucas Asset Pricing Model
$$ c_t + \pi_{t+1} p_t \leq \pi_t y_t + \pi_t p_t $$
subject to
We can invoke the fact that utility is increasing to claim equality in (35.2) and hence eliminate the constraint, obtaining
$$ v(\pi, y) = \max_{\pi'} \left\{ u[\pi(y + p(y)) - \pi' p(y)] + \beta \int v(\pi', G(y, z)) \phi(dz) \right\} \tag{35.3} $$
The solution to this dynamic programming problem is an optimal policy expressing either 𝜋′ or 𝑐 as a function of the
state (𝜋, 𝑦).
• Each one determines the other, since 𝑐(𝜋, 𝑦) = 𝜋(𝑦 + 𝑝(𝑦)) − 𝜋′ (𝜋, 𝑦)𝑝(𝑦)
Next Steps
Equilibrium Constraints
Since the consumption good is not storable, in equilibrium we must have 𝑐𝑡 = 𝑦𝑡 for all 𝑡.
In addition, since there is one representative consumer (alternatively, since all consumers are identical), there should be
no trade in equilibrium.
In particular, the representative consumer owns the whole tree in every period, so 𝜋𝑡 = 1 for all 𝑡.
Prices must adjust to satisfy these two constraints.
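Now observe that the first-order condition for (35.3) can be written as
$$ u'(c) p(y) = \beta \int v_1'(\pi', G(y, z)) \phi(dz) $$
where $v_1'$ is the derivative of $v$ with respect to its first argument.
Differentiating the right-hand side of (35.3) with respect to $\pi$ gives
$$ v_1'(\pi, y) = u'(c)(y + p(y)) $$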
Next, we impose the equilibrium constraints while combining the last two equations to get
$$ p(y) = \beta \int \frac{u'[G(y, z)]}{u'(y)} [G(y, z) + p(G(y, z))] \phi(dz) \tag{35.4} $$
In sequential rather than functional notation, we can also write this as
$$ p_t = \mathbb{E}_t \left[ \beta \frac{u'(c_{t+1})}{u'(c_t)} (y_{t+1} + p_{t+1}) \right] \tag{35.5} $$
This is the famous consumption-based asset pricing equation.
Before discussing it further we want to solve out for prices.
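Instead of solving for it directly we’ll follow Lucas’ indirect approach, first setting
$$ f(y) := u'(y) p(y) $$
so that (35.4) becomes
$$ f(y) = h(y) + \beta \int f[G(y, z)] \phi(dz) $$
Here $h(y) := \beta \int u'[G(y, z)] G(y, z) \phi(dz)$ is a function that depends only on the primitives.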
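As a sanity check on this change of variables, here is a worked special case under an assumption that is not in the text above: log utility, $u(c) = \ln c$. With $f(y) := u'(y) p(y)$, pricing equation (35.4) reads $f(y) = h(y) + \beta \int f[G(y, z)] \phi(dz)$, and the problem can be solved by hand.

```latex
\begin{aligned}
u'(c) = \frac{1}{c}
  \quad &\Longrightarrow \quad
  h(y) = \beta \int \frac{1}{G(y, z)} \, G(y, z) \, \phi(dz) = \beta \\
f(y) = \beta + \beta \int f[G(y, z)] \, \phi(dz)
  \quad &\Longrightarrow \quad
  f^*(y) = \frac{\beta}{1 - \beta} \quad \text{(a constant function solves this)} \\
p^*(y) = \frac{f^*(y)}{u'(y)}
  \quad &\Longrightarrow \quad
  p^*(y) = \frac{\beta}{1 - \beta} \, y
\end{aligned}
```

So with log utility the price is proportional to the current endowment, whatever the law of motion $G$ and the shock distribution $\phi$.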
Note: If you find the mathematics heavy going you can take 1–2 as given and skip to the next section
≤ 𝛽 ∫ ‖𝑓 − 𝑔‖𝜙(𝑑𝑧)
= 𝛽‖𝑓 − 𝑔‖
Since the right-hand side is an upper bound, taking the sup over all 𝑦 on the left-hand side gives (35.9) with 𝛼 ∶= 𝛽.
The preceding discussion tells us that we can compute $f^*$ by picking an arbitrary $f \in cb\mathbb{R}_+$ and then iterating with $T$.
The equilibrium price function $p^*$ can then be recovered by $p^*(y) = f^*(y)/u'(y)$.
Let’s try this when $\ln y_{t+1} = \alpha \ln y_t + \sigma \epsilon_{t+1}$ where $\{\epsilon_t\}$ is IID and standard normal.
Utility will take the isoelastic form $u(c) = c^{1-\gamma}/(1-\gamma)$, where $\gamma > 0$ is the coefficient of relative risk aversion.
We will set up a LucasTree class to hold parameters of the model
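class LucasTree:
    """
    Class to store parameters of the Lucas tree model.
    """
    def __init__(self,
                 γ=2,            # CRRA utility parameter
                 β=0.95,         # Discount factor
                 α=0.90,         # Correlation coefficient
                 σ=0.1,          # Volatility coefficient
                 grid_size=100):
        self.γ, self.β, self.α, self.σ = γ, β, α, σ
        # Assumed: grid spans about ±4 stationary std devs of ln y
        ssd = self.σ / np.sqrt(1 - self.α**2)
        grid_min, grid_max = np.exp(-4 * ssd), np.exp(4 * ssd)
        self.grid = np.linspace(grid_min, grid_max, grid_size)
        self.grid_size = grid_size
        # Assumed: 500 lognormal shock draws for Monte Carlo integration
        self.ϕ = lognorm(σ)
        self.draws = self.ϕ.rvs(500)
        # h(y) = β E[(y^α z)^(1 - γ)], evaluated on the grid
        self.h = np.empty(self.grid_size)
        for i, y in enumerate(self.grid):
            self.h[i] = β * np.mean((y**α * self.draws)**(1 - γ))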
The following function takes an instance of the LucasTree and generates a jitted version of the Lucas operator
def operator_factory(tree):
    """
    Returns approximate Lucas operator, which computes and returns the
    updated function Tf on the grid points.
    """
    grid, h = tree.grid, tree.h
    α, β = tree.α, tree.β
    z_vec = tree.draws
    @njit(parallel=True)
    def T(f):
        # Turn f into a function by linear interpolation on the grid
        Af = lambda x: np.interp(x, grid, f)
        Tf = np.empty_like(f)
        # Apply the T operator to f using Monte Carlo integration
        for i in prange(len(grid)):
            y = grid[i]
            Tf[i] = h[i] + β * np.mean(Af(y**α * z_vec))
        return Tf
    return T
To solve the model, we write a function that iterates using the Lucas operator to find the fixed point.
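def solve_model(tree, tol=1e-6, max_iter=500):
    """
    Compute the equilibrium price function associated with the Lucas tree.
    The default values of tol and max_iter are assumed here.
    """
    # Simplify notation
    grid, grid_size = tree.grid, tree.grid_size
    γ = tree.γ
    T = operator_factory(tree)
    i = 0
    f = np.ones_like(grid)  # Initial guess of f
    error = tol + 1
    while error > tol and i < max_iter:
        Tf = T(f)
        error = np.max(np.abs(Tf - f))
        f = Tf
        i += 1
    # Back out prices: p(y) = f(y) / u'(y) = f(y) * y**γ
    price = f * grid**γ
    return price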
tree = LucasTree()
price_vals = solve_model(tree)
We see that the price is increasing, even if we remove all serial correlation from the endowment process.
The reason is that a larger current endowment reduces current marginal utility.
The price must therefore rise to induce the household to consume the entire endowment (and hence satisfy the resource
constraint).
What happens with a more patient consumer?
Here the orange line corresponds to the previous parameters and the green line is price when 𝛽 = 0.98.
We see that when consumers are more patient the asset becomes more valuable, and the price of the Lucas tree shifts up.
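The two comparative statics just described (prices rise with $y$ and shift up with $\beta$) can be checked with a plain NumPy sketch of the same fixed-point iteration. This is an illustration under stated assumptions, not the lecture's jitted implementation: the function name `solve_lucas_prices`, the grid bounds, the number of draws, the iteration cap and the seed are all choices made here.

```python
import numpy as np

def solve_lucas_prices(γ=2.0, β=0.95, α=0.90, σ=0.1, grid_size=100,
                       n_draws=500, tol=1e-6, max_iter=2000, seed=1234):
    """Iterate f -> h + β E f(y^α z) on a grid; return (grid, price)."""
    rng = np.random.default_rng(seed)
    # Assumed grid: about ±4 stationary standard deviations of ln y
    ssd = σ / np.sqrt(1 - α**2)
    grid = np.linspace(np.exp(-4 * ssd), np.exp(4 * ssd), grid_size)
    # Shock draws z = exp(σ ε) with ε ~ N(0, 1)
    z = np.exp(σ * rng.standard_normal(n_draws))
    # Next-period endowment for every (grid point, draw) pair
    y_next = grid[:, None]**α * z[None, :]
    # h(y) = β E[u'(y') y'] = β E[(y')^(1 - γ)] under CRRA utility
    h = β * np.mean(y_next**(1 - γ), axis=1)
    f = np.ones_like(grid)
    for _ in range(max_iter):
        # np.interp holds f constant outside the grid
        Tf = h + β * np.mean(np.interp(y_next, grid, f), axis=1)
        error = np.max(np.abs(Tf - f))
        f = Tf
        if error < tol:
            break
    # Recover prices from p(y) = f(y) / u'(y) = f(y) y^γ
    return grid, f * grid**γ

grid, price_base = solve_lucas_prices(β=0.95)
_, price_patient = solve_lucas_prices(β=0.98)
print("price increasing in y:", bool(np.all(np.diff(price_base) > 0)))
print("higher β raises price:", bool(np.all(price_patient > price_base)))
```

Using the same seed for both calls means the two price functions are computed from identical shock draws, so the comparison isolates the effect of $\beta$.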
Exercise 1 asks you to replicate this figure.
35.3 Exercises
Exercise 35.3.1
Replicate the figure to show how discount factors affect prices.
ax.legend(loc='upper left')
ax.set(xlabel='$y$', ylabel='price', xlim=(min(grid), max(grid)))
plt.show()
CHAPTER
THIRTYSIX
36.1 Overview
This lecture is about some implications of asset-pricing theories that are based on the equation 𝐸𝑚𝑅 = 1, where 𝑅 is
the gross return on an asset, 𝑚 is a stochastic discount factor, and 𝐸 is a mathematical expectation with respect to a joint
probability distribution of 𝑅 and 𝑚.
Instances of this equation occur in many models.
Note: Chapter 1 of [Ljungqvist and Sargent, 2018] describes the role that this equation plays in a diverse set of models
in macroeconomics, monetary economics, and public finance.
We aim to convey insights about empirical implications of this equation brought out in the work of Lars Peter Hansen
[Hansen and Richard, 1987] and Lars Peter Hansen and Ravi Jagannathan [Hansen and Jagannathan, 1991].
By following their footsteps, from that single equation we’ll derive
• a mean-variance frontier
• a single-factor model of excess returns
To do this, we use two ideas:
• the equation 𝐸𝑚𝑅 = 1 that is implied by an application of a law of one price
• a Cauchy-Schwarz inequality
In particular, we’ll apply a Cauchy-Schwarz inequality to a population linear least squares regression equation that is
implied by 𝐸𝑚𝑅 = 1.
We’ll also describe how practitioners have implemented the model using
• cross sections of returns on many assets
• time series of returns on various assets
For background and basic concepts about linear least squares projections, see our lecture orthogonal projections and their
applications.
As a sequel to the material here, please see our lecture two modifications of mean-variance portfolio theory.
𝐸𝑚𝑅𝑖 = 1 (36.1)
The random gross return 𝑅𝑖 for every asset 𝑖 and the scalar stochastic discount factor 𝑚 live in a common probability
space.
[Hansen and Richard, 1987] and [Hansen and Jagannathan, 1991] explain how existence of a scalar stochastic discount
factor that verifies equation (36.1) is implied by a law of one price that requires that all portfolios of assets that bring the
same payouts have the same price.
They also explain how the absence of an arbitrage opportunity implies that the stochastic discount factor 𝑚 ≥ 0.
In order to say something about the uniqueness of a stochastic discount factor, we would have to impose more theoretical
structure than we do in this lecture.
For example, in complete markets models like those illustrated in our lecture on equilibrium capital structures with incomplete markets, the stochastic discount factor is unique.
In incomplete markets models like those illustrated in our lecture on the Aiyagari model, the stochastic discount factor is not unique.
We combine key equation (36.1) with a remark of Lars Peter Hansen that “asset pricing theory is all about covariances”.
Note: Lars Hansen’s remark is a concise summary of ideas in [Hansen and Richard, 1987] and [Hansen and Jagannathan,
1991]. Important foundations of these ideas were set down by [Ross, 1976], [Ross, 1978], [Harrison and Kreps, 1979],
[Kreps, 1981], and [Chamberlain and Rothschild, 1983].
This remark of Lars Hansen refers to the fact that interesting restrictions can be deduced by recognizing that 𝐸𝑚𝑅𝑖 is a
component of the covariance between 𝑚 and 𝑅𝑖 and then using that fact to rearrange equation (36.1).
Let’s do this step by step.
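First note that the definition of a covariance $\operatorname{cov}(m, R^i) = E(m - Em)(R^i - ER^i)$ implies that
$$ E m R^i = E m \, E R^i + \operatorname{cov}(m, R^i) $$
Substituting this result into equation (36.1) gives
$$ 1 = E(m) E(R^i) + \operatorname{cov}(m, R^i) \tag{36.2} $$
Next note that for a risk-free asset with non-random gross return $R^f$, equation (36.1) becomes
$$ 1 = E R^f m = R^f E m. $$
This is true because we can pull the constant $R^f$ outside the mathematical expectation.
It follows that the gross return on a risk-free asset is
$$ R^f = 1/E(m) $$
Using this formula for $R^f$ in equation (36.2) and rearranging, it follows that
$$ R^f = E(R^i) + \frac{\operatorname{cov}(m, R^i)}{E(m)} $$
It follows that we can express an excess return $E R^i - R^f$ on asset $i$ relative to the risk-free rate as
$$ E(R^i) - R^f = -\frac{\operatorname{cov}(m, R^i)}{E(m)} \tag{36.3} $$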
Equation (36.3) can be rearranged to display important parts of asset pricing theory.
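We can obtain the celebrated expected-return-beta representation for gross return $R^i$ by simply rearranging excess return equation (36.3) to become
$$ E R^i = R^f + \underbrace{\left( \frac{\operatorname{cov}(R^i, m)}{\operatorname{var}(m)} \right)}_{\beta_{i,m} = \text{regression coefficient}} \underbrace{\left( -\frac{\operatorname{var}(m)}{E(m)} \right)}_{\lambda_m = \text{price of risk}} $$
or
$$ E R^i = R^f + \beta_{i,m} \lambda_m \tag{36.4} $$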
Here
• 𝛽𝑖,𝑚 is a (population) least squares regression coefficient of gross return 𝑅𝑖 on stochastic discount factor 𝑚
• 𝜆𝑚 is minus the variance of 𝑚 divided by the mean of 𝑚, an object that is sometimes called a price of risk.
Because 𝜆𝑚 < 0, equation (36.4) asserts that
• assets whose returns are positively correlated with the stochastic discount factor (SDF) 𝑚 have expected returns
lower than the risk-free rate 𝑅𝑓
• assets whose returns are negatively correlated with the SDF 𝑚 have expected returns higher than the risk-free
rate 𝑅𝑓
These patterns will be discussed more below.
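A small simulation can make representation (36.4) concrete. The sketch below is an illustration under assumptions that are not in the text: a lognormal SDF with made-up parameters, two made-up payoffs, and a hypothetical helper `to_return` that rescales a payoff so that the sample analogue of $E m R = 1$ holds exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical lognormal SDF draws (parameter values are illustrative)
m = np.exp(-0.03 + 0.25 * rng.standard_normal(n))
Em, var_m = m.mean(), m.var()
Rf = 1 / Em                  # risk-free return implied by E[m] R^f = 1
λ_m = -var_m / Em            # price of risk, negative by construction

def to_return(payoff):
    """Rescale a payoff so that the sample mean of m * R equals one."""
    return payoff / np.mean(m * payoff)

def expected_return_beta(R):
    """Return (sample mean of R, R^f + β λ_m) using sample moments."""
    β = np.mean((R - R.mean()) * (m - Em)) / var_m
    return R.mean(), Rf + β * λ_m

noise = 0.1 * rng.standard_normal(n)
R_pos = to_return(1.0 + 0.5 * (m - Em) + noise)   # moves with m
R_neg = to_return(1.0 - 0.5 * (m - Em) + noise)   # moves against m

for name, R in [("positively correlated", R_pos),
                ("negatively correlated", R_neg)]:
    ER, ER_beta = expected_return_beta(R)
    print(f"{name}: E[R]={ER:.4f}, beta formula={ER_beta:.4f}, Rf={Rf:.4f}")
```

Because the same sample moments are used on both sides, the beta formula reproduces the sample mean return exactly, and the return that moves with $m$ earns less than $R^f$ while the one that moves against $m$ earns more.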
In particular, we’ll see that returns that are perfectly negatively correlated with the SDF 𝑚 have a special status:
• they are on a mean-variance frontier
Before we dive into that more, we’ll pause to look at an example of an SDF.
To interpret representation (36.4), the following widely used example helps.
Example
Let 𝑐𝑡 be the logarithm of the consumption of a representative consumer or just a single consumer for whom we have
consumption data.
A popular model of 𝑚 is
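$$ m_{t+1} = \beta \frac{U'(C_{t+1})}{U'(C_t)} $$
where $C_t$ is consumption at time $t$, $\beta = \exp(-\rho)$ is a discount factor with $\rho$ being the discount rate, and $U(\cdot)$ is a concave, twice-differentiable utility function.
For a constant relative risk aversion (CRRA) utility function $U(C) = \frac{C^{1-\gamma}}{1-\gamma}$, marginal utility is $U'(C) = C^{-\gamma}$.
In this case, letting $c_t = \log(C_t)$, we can write $m_{t+1}$ as
$$ m_{t+1} = \exp(-\rho) \exp(-\gamma (c_{t+1} - c_t)) $$
Suppose that log consumption growth follows
$$ c_{t+1} - c_t = \mu + \sigma_c \epsilon_{t+1} $$
where $\epsilon_{t+1} \sim \mathcal{N}(0, 1)$.
In this case
$$ E m_{t+1} = \exp(-\rho) \exp\left( -\gamma \mu + \frac{\sigma_c^2 \gamma^2}{2} \right) $$
and
$$ \sigma(m_{t+1}) = E(m_{t+1}) \sqrt{\exp(\sigma_c^2 \gamma^2) - 1} $$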
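The closed form for $E m_{t+1}$ can be checked by simulation. The sketch below assumes standard normal shocks $\epsilon_{t+1}$ and uses illustrative parameter values chosen here; it also compares the simulated ratio $\sigma(m)/E(m)$ with its lognormal closed form $\sqrt{\exp(\sigma_c^2 \gamma^2) - 1}$, which is approximately $\gamma \sigma_c$ when $\gamma \sigma_c$ is small.

```python
import numpy as np

ρ, γ, μ, σ_c = 0.02, 2.0, 0.02, 0.05     # illustrative parameter values
rng = np.random.default_rng(0)

# Simulate log consumption growth and the implied CRRA discount factor
growth = μ + σ_c * rng.standard_normal(1_000_000)
m = np.exp(-ρ) * np.exp(-γ * growth)

# Closed forms implied by the lognormal distribution of m
Em_formula = np.exp(-ρ) * np.exp(-γ * μ + σ_c**2 * γ**2 / 2)
ratio_formula = np.sqrt(np.exp(σ_c**2 * γ**2) - 1)    # σ(m) / E(m)

print("E[m]: simulated", m.mean(), "formula", Em_formula)
print("σ(m)/E(m): simulated", m.std() / m.mean(),
      "formula", ratio_formula, "approximation γσ_c", γ * σ_c)
```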
Note: Methods of Hansen and Richard are described and used extensively by [Cochrane, 2005].
Their idea was to rearrange the key equation (36.1), namely, $E m R^i = 1$, and then to apply a Cauchy-Schwarz inequality.
A convenient way to remember the Cauchy-Schwarz inequality in our context is that it says that an $R^2$ in any regression has to be less than or equal to 1.
(Please note that here $R^2$ denotes the coefficient of determination in a regression, not a return on an asset!)
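Let’s apply that idea to deduce
$$ 1 = E(m R^i) = E(m) E(R^i) + \rho_{m,R^i} \sigma(m) \sigma(R^i) \tag{36.5} $$
where the correlation coefficient $\rho_{m,R^i}$ is defined as
$$ \rho_{m,R^i} \equiv \frac{\operatorname{cov}(m, R^i)}{\sigma(m) \sigma(R^i)} $$
and where $\sigma(\cdot)$ denotes the standard deviation of the variable in parentheses.
Equation (36.5) implies
$$ E R^i = R^f - \rho_{m,R^i} \frac{\sigma(m)}{E(m)} \sigma(R^i) $$
Because $|\rho_{m,R^i}| \leq 1$, it follows that
$$ |E R^i - R^f| \leq \frac{\sigma(m)}{E(m)} \sigma(R^i) \tag{36.6} $$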
sigmam = .25
Em = .99
# Plot y
ax.plot(x, y_values, label=r'$R^f + \frac{\sigma(m)}{E(m)} \sigma(R^i)$')
ax.plot(x, z_values, label=r'$R^f - \frac{\sigma(m)}{E(m)} \sigma(R^i)$')
The figure shows two straight lines, the blue upper one being the locus of $(\sigma(R^i), E(R^i))$ pairs that are on the mean-variance frontier or mean-standard-deviation frontier.
The green dot refers to a return 𝑅𝑗 that is not on the frontier and that has moments (𝜎(𝑅𝑗 ), 𝐸𝑅𝑗 ) = (.05, 1.015).
It is described by the statistical model
𝑅 𝑗 = 𝑅 𝑖 + 𝜖𝑗
where 𝑅𝑖 is a return that is on the frontier and 𝜖𝑗 is a random variable that has mean zero and that is orthogonal to 𝑅𝑖 .
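Then $E R^j = E R^i$ and, as a consequence of $R^j$ not being on the frontier,
$$ \sigma(R^j) = \sqrt{\sigma^2(R^i) + \sigma^2(\epsilon^j)} > \sigma(R^i) $$
The length of a horizontal line from the point $(\sigma(R^j), E(R^j)) = (.05, 1.015)$ to the frontier equals
$$ \sqrt{\sigma^2(R^i) + \sigma^2(\epsilon^j)} - \sigma(R^i) $$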
This is a measure of the part of the risk in 𝑅𝑗 that is not priced because it is uncorrelated with the stochastic discount
factor and so can be diversified away (i.e., averaged out to zero by holding a diversified portfolio).
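An asset’s Sharpe ratio is defined as
$$ \frac{E(R^i) - R^f}{\sigma(R^i)} $$
The above figure reminds us that all assets $R^i$ whose returns are on the mean-standard deviation frontier satisfy
$$ \frac{E(R^i) - R^f}{\sigma(R^i)} = \frac{\sigma(m)}{E m} $$
The ratio $\frac{\sigma(m)}{E m}$ is often called the market price of risk.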
Evidently it equals the maximum Sharpe ratio for any asset or portfolio of assets.
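The claim that $\sigma(m)/E(m)$ caps every Sharpe ratio can be illustrated numerically. The sketch below rests on assumptions made here rather than in the text: a lognormal SDF with made-up parameters, a payoff `3.0 - m` that is affine in $m$ with negative slope, and a second payoff that adds unpriced noise. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical SDF draws and the implied bound on Sharpe ratios
m = np.exp(-0.03 + 0.25 * rng.standard_normal(n))
Rf = 1 / m.mean()
bound = m.std() / m.mean()          # market price of risk σ(m) / E(m)

def to_return(payoff):
    """Rescale a payoff so that the sample mean of m * R equals one."""
    return payoff / np.mean(m * payoff)

def sharpe(R):
    """Sample Sharpe ratio relative to the implied risk-free return."""
    return (R.mean() - Rf) / R.std()

# A return perfectly negatively correlated with m sits on the frontier
R_frontier = to_return(3.0 - m)
# Adding noise that is unrelated to m pushes the return inside the frontier
R_inside = to_return(3.0 - m + 0.5 * rng.standard_normal(n))

print("bound σ(m)/E(m):", bound)
print("Sharpe ratio on the frontier:", sharpe(R_frontier))
print("Sharpe ratio inside the frontier:", sharpe(R_inside))
```

The frontier return attains the bound exactly in sample moments, while the noisy return has the same kind of exposure to $m$ but a strictly smaller Sharpe ratio.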
The mathematical structure of the mean-variance frontier described by inequality (36.6) implies that
• all returns on the frontier are perfectly correlated.
Thus,
– Let 𝑅𝑚 , 𝑅𝑚𝑣 be two returns on the frontier.
– Then for some scalar 𝑎, a return 𝑅𝑚𝑣 on the mean-variance frontier satisfies the affine equation 𝑅𝑚𝑣 =
𝑅𝑓 + 𝑎 (𝑅𝑚 − 𝑅𝑓 ) . This is an exact equation with no residual.
• each return $R^{mv}$ that is on the mean-variance frontier is perfectly (negatively) correlated with $m$
– $(\rho_{m,R^{mv}} = -1) \Rightarrow \begin{cases} m = a + b R^{mv} \\ R^{mv} = e + d m \end{cases}$ for some scalars $a, b, e, d$,
Therefore, any return on the mean-variance frontier is a legitimate stochastic discount factor
• for any mean-variance-efficient return 𝑅𝑚𝑣 that is on the frontier but that is not 𝑅𝑓 , there exists a single-beta
representation for any return 𝑅𝑖 that takes the form:
𝐸𝑅𝑖 = 𝑅𝑓 + 𝛽𝑖,𝑅𝑚𝑣 [𝐸 (𝑅𝑚𝑣 ) − 𝑅𝑓 ] (36.7)
• the regression coefficient 𝛽𝑖,𝑅𝑚𝑣 is often called asset 𝑖’s beta
• The special case of a single-beta representation (36.7) with 𝑅𝑖 = 𝑅𝑚𝑣 is
𝐸𝑅𝑚𝑣 = 𝑅𝑓 + 1 ⋅ [𝐸 (𝑅𝑚𝑣 ) − 𝑅𝑓 ]
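The single-beta representation (36.7) can also be verified on artificial data. This is a sketch under assumptions made here: a lognormal SDF, a frontier return built from the payoff `3.0 - m`, and an arbitrary test asset whose payoff mixes $m$ with noise. Names such as `to_return` and `cov` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical SDF draws and implied risk-free return
m = np.exp(-0.03 + 0.25 * rng.standard_normal(n))
Rf = 1 / m.mean()

def to_return(payoff):
    """Rescale a payoff so that the sample mean of m * R equals one."""
    return payoff / np.mean(m * payoff)

def cov(a, b):
    """Sample covariance with the same normalization as np.mean."""
    return np.mean((a - a.mean()) * (b - b.mean()))

# A mean-variance efficient return: affine in m with negative slope
R_mv = to_return(3.0 - m)
# An arbitrary asset that is not on the frontier
R_i = to_return(1.0 + 0.3 * m + 0.2 * rng.standard_normal(n))

# Regression coefficient of R_i on R_mv, then both sides of (36.7)
β_i = cov(R_i, R_mv) / cov(R_mv, R_mv)
lhs = R_i.mean()
rhs = Rf + β_i * (R_mv.mean() - Rf)
print("E[R_i]:", lhs, " single-beta formula:", rhs)
```

The two sides agree up to floating point error because $R^{mv}$ is an exact affine function of $m$, so regressing on $R^{mv}$ carries the same pricing information as regressing on $m$.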
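Multi-factor generalizations of the single-beta representation (36.7) take the form
$$ E R^i = \gamma + \beta_{i,a} \lambda_a + \beta_{i,b} \lambda_b + \cdots $$
where $\lambda_j$ is the price of being exposed to risk factor $f_t^j$ and $\beta_{i,j}$ is asset $i$’s exposure to that risk factor.
To uncover the $\beta_{i,j}$’s, one takes data on time series of the risk factors $f_t^j$ that are being priced and specifies the following least squares regression
$$ R_t^i = a_i + \beta_{i,a} f_t^a + \beta_{i,b} f_t^b + \cdots + \epsilon_t^i, \quad t = 1, 2, \ldots, T \tag{36.8} $$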
We briefly describe empirical implementations of multi-factor generalizations of the single-factor model described above.
Two representations of a multi-factor model play important roles in empirical applications.
One is the time series regression (36.8)
The other representation entails a cross-section regression of average returns 𝐸𝑅𝑖 for assets 𝑖 = 1, 2, … , 𝐼 on prices
of risk 𝜆𝑗 for 𝑗 = 𝑎, 𝑏, 𝑐, …
Here is the cross-section regression specification for a multi-factor model:
$$ E R^i = \gamma + \beta_{i,a} \lambda_a + \beta_{i,b} \lambda_b + \cdots $$
Testing strategies:
Time-series and cross-section regressions play roles in both estimating and testing beta representation models.
The basic idea is to implement the following two steps.
Step 1:
• Estimate 𝑎𝑖 , 𝛽𝑖,𝑎 , 𝛽𝑖,𝑏 , ⋯ by running a time series regression: 𝑅𝑡𝑖 on a constant and 𝑓𝑡𝑎 , 𝑓𝑡𝑏 , …
Step 2:
• take the 𝛽𝑖,𝑗 ’s estimated in step one as regressors together with data on average returns 𝐸𝑅𝑖 over some period and
then estimate the cross-section regression
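$$ \underbrace{E(R^i)}_{\text{average return over time series}} = \gamma + \underbrace{\beta_{i,a}}_{\text{regressor}} \underbrace{\lambda_a}_{\text{regression coefficient}} + \underbrace{\beta_{i,b}}_{\text{regressor}} \underbrace{\lambda_b}_{\text{regression coefficient}} + \cdots + \underbrace{\alpha_i}_{\text{pricing errors}}, \quad i = 1, \ldots, I; \quad \underbrace{\alpha_i \perp \beta_{i,j}, \ j = a, b, \ldots}_{\text{least squares orthogonality condition}} $$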
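The two steps above can be sketched on artificial data with plain NumPy least squares. Everything specific here is an assumption for illustration: a single priced factor with mean `factor_mean` and volatility `factor_vol`, five assets, and excess returns generated as exposure times factor plus idiosyncratic noise. In this data generating process the cross-section slope should be close to the factor mean.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 2000, 5
factor_mean, factor_vol = 0.06, 0.08            # illustrative values
β_true = np.array([0.2, 0.4, 0.6, 0.8, 1.0])    # true exposures
σ_idio = 0.04                                   # idiosyncratic volatility

# Artificial data: one priced factor and N excess returns
f = factor_mean + factor_vol * rng.standard_normal(T)
excess = β_true[:, None] * f + σ_idio * rng.standard_normal((N, T))

# Step 1: time series regression of each excess return on a constant and f
X = np.column_stack([np.ones(T), f])
coef, *_ = np.linalg.lstsq(X, excess.T, rcond=None)   # shape (2, N)
a_hat, β_hat = coef

# Step 2: cross-section regression of average excess returns on the betas
avg_excess = excess.mean(axis=1)
Z = np.column_stack([np.ones(N), β_hat])
cs_coef, *_ = np.linalg.lstsq(Z, avg_excess, rcond=None)
γ_hat, λ_hat = cs_coef
pricing_errors = avg_excess - Z @ cs_coef

print("estimated betas:", β_hat.round(3))
print("cross-section intercept:", γ_hat.round(4))
print("estimated price of risk:", λ_hat.round(4))
```

By construction of least squares, the pricing errors from step 2 are orthogonal to the estimated betas, which is the orthogonality condition displayed in the cross-section regression.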
In the following exercises, we illustrate aspects of these empirical strategies on artificial data.
Our basic tools are a random number generator that we shall use to create artificial samples that conform to the theory and
least squares regressions that let us watch aspects of the theory at work.
These exercises will further convince us that asset pricing theory is mostly about covariances and least squares regressions.
36.10 Exercises
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
Lots of our calculations will involve computing population and sample OLS regressions.
So we define a function for simple univariate OLS regression that calls the OLS routine from statsmodels.
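def simple_ols(X, Y, constant=False):
    # The default constant=False is assumed here
    if constant:
        X = sm.add_constant(X)
    model = sm.OLS(Y, X)
    res = model.fit()
    # Slope on the last regressor and residual standard deviation
    β_hat = res.params[-1]
    σ_hat = np.sqrt(res.resid @ res.resid / res.df_resid)
    return β_hat, σ_hat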
Exercise 36.10.1
Look at the equation,
𝐸 [𝜖𝑖,𝑡 ] = 0, 𝐸 [𝜖𝑖,𝑡 𝑢𝑡 ] = 0
It follows that
𝐸 [𝜎𝑖 𝜖𝑖,𝑡 (𝑅𝑡𝑚 − 𝑅𝑓 )] = 𝐸 [𝜎𝑖 𝜖𝑖,𝑡 (𝜉 + 𝜆𝑢𝑡 )]
= 𝜎𝑖 𝜉𝐸 [𝜖𝑖,𝑡 ] + 𝜎𝑖 𝜆𝐸 [𝜖𝑖,𝑡 𝑢𝑡 ]
=0
Exercise 36.10.2
Give a formula for the regression coefficient 𝛽𝑖,𝑅𝑚 .
Exercise 36.10.3
As in many sciences, it is useful to distinguish a direct problem from an inverse problem.
• A direct problem involves simulating a particular model with known parameter values.
• An inverse problem involves using data to estimate or choose a particular parameter vector from a manifold of
models indexed by a set of parameter vectors.
Please assume the parameter values provided below and then simulate 2000 observations from the theory specified above
for 5 assets, 𝑖 = 1, … , 5.
𝐸 [𝑅𝑓 ] = 0.02
𝜎𝑓 = 0.00
𝜉 = 0.06
𝜆 = 0.08
𝛽1,𝑅𝑚 = 0.2
𝜎1 = 0.04
𝛽2,𝑅𝑚 = .4
𝜎2 = 0.04
𝛽3,𝑅𝑚 = .6
𝜎3 = 0.04
𝛽4,𝑅𝑚 = .8
𝜎4 = 0.04
𝛽5,𝑅𝑚 = 1.0
𝜎5 = 0.04
More Exercises
Now come some even more fun parts!
Our theory implies that there exist values of two scalars, 𝑎 and 𝑏, such that a legitimate stochastic discount factor is:
𝑚𝑡 = 𝑎 + 𝑏𝑅𝑡𝑚
Now that we have a panel of data, we’d like to solve the inverse problem by assuming the theory specified above and
estimating the coefficients given above.
Inverse Problem:
We will solve the inverse problem by simple OLS regressions.
1. estimate 𝐸 [𝑅𝑓 ] and 𝜎𝑓
ERf_hat, σf_hat
(0.020000000000000046, 4.5114090308141905e-17)
ERf, σf
(0.02, 0.0)
2. 𝜉 and 𝜆
ξ_hat, λ_hat
(0.060225944676975, 0.07779632562028074)
ξ, λ
(0.06, 0.08)
3. 𝛽𝑖,𝑅𝑚 and 𝜎𝑖
βi_hat = np.empty(N)
σi_hat = np.empty(N)
for i in range(N):
    βi_hat[i], σi_hat[i] = simple_ols(Rm - Rf, Ri[i, :] - Rf)
βi_hat, σi_hat
βi, σi
(array([0.2, 0.4, 0.6, 0.8, 1. ]), array([0.04, 0.04, 0.04, 0.04, 0.04]))
Exercise 36.10.4
Using the equations above, find a system of two linear equations that you can solve for 𝑎 and 𝑏 as functions of the
parameters (𝜆, 𝜉, 𝐸[𝑅𝑓 ]).
Write a function that can solve these equations.
Please check the condition number of a key matrix that must be inverted to determine a, b
# Code here
def solve_ab(ERf, σf, λ, ξ):
    # Rows impose E[m R^m] = 1 and E[m R^f] = 1 with m = a + b R^m
    M = np.empty((2, 2))
    M[0, 0] = ERf + ξ
    M[0, 1] = (ERf + ξ) ** 2 + λ ** 2 + σf ** 2
    M[1, 0] = ERf
    M[1, 1] = ERf ** 2 + ξ * ERf + σf ** 2
    a, b = np.linalg.solve(M, np.ones(2))
    # Condition number of the matrix that must be inverted
    condM = np.linalg.cond(M)
    return a, b, condM
a, b, condM
Exercise 36.10.5
Using the estimates of the parameters that you generated above, compute the implied stochastic discount factor.
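One possible way to approach this, sketched under assumptions: take the population parameter values shown in the inverse-problem output ($E[R^f] = 0.02$, $\sigma_f = 0$, $\xi = 0.06$, $\lambda = 0.08$), solve the two moment conditions for $a$ and $b$, and form $m_t = a + b R_t^m$ on a simulated market return with standard normal shocks. The simulation design and sample size are choices made here.

```python
import numpy as np

ERf, σf, ξ, λ = 0.02, 0.0, 0.06, 0.08    # values from the output above

# Moment conditions E[m R^m] = 1 and E[m R^f] = 1 with m = a + b R^m
M = np.array([[ERf + ξ, (ERf + ξ)**2 + λ**2 + σf**2],
              [ERf,     ERf**2 + ξ * ERf + σf**2]])
a, b = np.linalg.solve(M, np.ones(2))

# Simulate the market return and the implied stochastic discount factor
rng = np.random.default_rng(0)
T = 200_000
Rm = ERf + ξ + λ * rng.standard_normal(T)
m = a + b * Rm

print("a, b:", a, b)
print("sample E[m R^m]:", np.mean(m * Rm))
print("sample E[m] R^f:", m.mean() * ERf)
```

Both sample moments should be close to one; replacing the population parameters with the estimates `ERf_hat`, `ξ_hat` and `λ_hat` gives the estimated counterpart.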
THIRTYSEVEN
37.1 Overview
This lecture describes extensions to the classical mean-variance portfolio theory summarized in our lecture Elementary
Asset Pricing Theory.
The classic theory described there assumes that a decision maker completely trusts the statistical model that he posits to
govern the joint distribution of returns on a list of available assets.
Both extensions described here put distrust of that statistical model into the mind of the decision maker.
One is a model of Black and Litterman [Black and Litterman, 1992] that imputes to the decision maker distrust of
historically estimated mean returns but still complete trust of estimated covariances of returns.
The second model also imputes to the decision maker doubts about his statistical model, but now by saying that, because
of that distrust, the decision maker uses a version of robust control theory described in this lecture Robustness.
The famous Black-Litterman (1992) [Black and Litterman, 1992] portfolio choice model was motivated by the finding
that with high frequency or moderately high frequency data, means are more difficult to estimate than variances.
A model of robust portfolio choice that we’ll describe below also begins from the same starting point.
To begin, we’ll take for granted that means are more difficult to estimate than covariances and will focus on how Black and
Litterman, on the one hand, and robust control theorists, on the other, would recommend modifying the mean-variance
portfolio choice model to take that into account.
At the end of this lecture, we shall use some rates of convergence results and some simulations to verify how means are
more difficult to estimate than variances.
Among the ideas in play in this lecture will be
• Mean-variance portfolio theory
• Bayesian approaches to estimating linear regressions
• A risk-sensitivity operator and its connection to robust control theory
In summary, we’ll describe two ways to modify the classic mean-variance portfolio choice model in ways designed to
make its recommendations more plausible.
Both of the adjustments that we describe are designed to confront a widely recognized embarrassment to mean-variance
portfolio theory, namely, that it usually implies taking very extreme long-short portfolio positions.
The two approaches build on a common and widespread hunch – that because it is much easier statistically to estimate
covariances of excess returns than it is to estimate their means, it makes sense to adjust investors’ subjective beliefs about
mean returns in order to render more plausible decisions.
Let’s start with some imports:
import numpy as np
import scipy.stats as stat
import matplotlib.pyplot as plt
from numba import jit
𝑟 ⃗ − 𝑟𝑓 1 ∼ 𝒩(𝜇, Σ)
or
𝑟 ⃗ − 𝑟𝑓 1 = 𝜇 + 𝐶𝜖
𝑤′ (𝑟 ⃗ − 𝑟𝑓 1) ∼ 𝒩(𝑤′ 𝜇, 𝑤′ Σ𝑤)
$$ U(\mu, \Sigma; w) = w'\mu - \frac{\delta}{2} w' \Sigma w \tag{37.1} $$
where 𝛿 > 0 is a risk-aversion parameter. The first-order condition for maximizing (37.1) with respect to the vector 𝑤 is
𝜇 = 𝛿Σ𝑤
𝑤 = (𝛿Σ)−1 𝜇 (37.2)
The key inputs into the portfolio choice model (37.2) are
• estimates of the parameters 𝜇, Σ of the random excess return vector(𝑟 ⃗ − 𝑟𝑓 1)
• the risk-aversion parameter 𝛿
A standard way of estimating 𝜇 is maximum-likelihood or least squares; that amounts to estimating 𝜇 by a sample mean
of excess returns and estimating Σ by a sample covariance matrix.
When estimates of 𝜇 and Σ from historical sample means and covariances have been combined with plausible values of
the risk-aversion parameter 𝛿 to compute an optimal portfolio from formula (37.2), a typical outcome has been 𝑤’s with
extreme long and short positions.
A common reaction to these outcomes is that they are so implausible that a portfolio manager cannot recommend them
to a customer.
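To see how rule (37.2) is applied to estimated moments, here is a minimal sketch. The "true" moments, the sample size, the seed and the value of $\delta$ are all made up for illustration and are not the lecture's calibration.

```python
import numpy as np

rng = np.random.default_rng(12)
N, T, δ = 10, 200, 2.0        # assets, sample size, risk aversion (illustrative)

# Hypothetical "true" moments of excess returns
μ_true = (rng.standard_normal(N) + 5) / 100
S = rng.standard_normal((N, N))
Σ_true = S @ S.T / 100

# Simulate a sample and form the usual estimates
sample = rng.multivariate_normal(μ_true, Σ_true, size=T)
μ_est = sample.mean(axis=0)
Σ_est = np.cov(sample.T)

# Portfolio rule (37.2): solve δ Σ w = μ rather than inverting Σ
w = np.linalg.solve(δ * Σ_est, μ_est)
print("portfolio weights:", np.round(w, 2))
print("largest long and short positions:", w.max().round(2), w.min().round(2))
```

Solving the linear system directly is numerically safer than forming $(\delta \Sigma)^{-1}$ explicitly, which matters when the estimated covariance matrix is close to singular.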
np.random.seed(12)
N = 10 # Number of assets
T = 200 # Sample size
# Estimate μ and Σ
μ_est = sample.mean(0).reshape(N, 1)
Σ_est = np.cov(sample.T)
𝑤𝑚 = (𝛿Σ)−1 𝜇𝐵𝐿
37.5 Details
Let’s define
$$ w_m' \mu \equiv (r_m - r_f) $$
$$ \sigma^2 = w_m' \Sigma w_m $$
Define
$$ \mathbf{SR}_m = \frac{r_m - r_f}{\sigma} $$
as the Sharpe-ratio on the market portfolio $w_m$.
Let $\delta_m$ be the value of the risk aversion parameter that induces an investor to hold the market portfolio in light of the optimal portfolio choice rule (37.2).
Evidently, portfolio rule (37.2) then implies that $r_m - r_f = \delta_m \sigma^2$ or
$$ \delta_m = \frac{r_m - r_f}{\sigma^2} $$
or
$$ \delta_m = \frac{\mathbf{SR}_m}{\sigma} $$
Following the Black-Litterman philosophy, our first step will be to back out a value of $\delta_m$ from
• an estimate of the Sharpe-ratio, and
• our maximum likelihood estimate of $\sigma$ drawn from our estimates of $w_m$ and $\Sigma$
The second key Black-Litterman step is then to use this value of 𝛿 together with the maximum likelihood estimate of Σ
to deduce a 𝜇BL that verifies portfolio rule (37.2) at the market portfolio 𝑤 = 𝑤𝑚
𝜇𝑚 = 𝛿𝑚 Σ𝑤𝑚
The starting point of the Black-Litterman portfolio choice model is thus a pair (𝛿𝑚 , 𝜇𝑚 ) that tells the customer to hold
the market portfolio.
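The two reverse-engineering steps can be written in a few lines. The function name `implied_market_view`, the covariance matrix, the market weights and the Sharpe ratio below are hypothetical values picked for illustration.

```python
import numpy as np

def implied_market_view(w_m, Σ, sharpe_ratio):
    """Return (δ_m, μ_m) such that rule (37.2) recommends holding w_m."""
    σ = np.sqrt(w_m @ Σ @ w_m)     # std dev of the market portfolio
    δ_m = sharpe_ratio / σ         # δ_m = SR_m / σ
    μ_m = δ_m * (Σ @ w_m)          # μ_m = δ_m Σ w_m
    return δ_m, μ_m

# Hypothetical covariance matrix and market weights
Σ = np.array([[0.040, 0.006, 0.002],
              [0.006, 0.090, 0.010],
              [0.002, 0.010, 0.060]])
w_m = np.array([0.5, 0.3, 0.2])

δ_m, μ_m = implied_market_view(w_m, Σ, sharpe_ratio=0.4)

# Feeding (δ_m, μ_m) back into rule (37.2) should return the market portfolio
w_back = np.linalg.solve(δ_m * Σ, μ_m)
print("δ_m:", δ_m)
print("μ_m:", μ_m)
print("recovered weights:", w_back)
```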
# Sharpe-ratio
sr_m = r_m / np.sqrt(σ_m)
x = np.arange(N) + 1
fig, ax = plt.subplots(figsize=(8, 5))
ax.set_title(r'Difference between $\hat{\mu}$ (estimate) and $\mu_{BL}$ (market implied)')
Black and Litterman start with a baseline customer who asserts that he or she shares the market’s views, which means
that he or she believes that excess returns are governed by
𝑟 ⃗ − 𝑟𝑓 1 ∼ 𝒩(𝜇𝐵𝐿 , Σ) (37.3)
Black and Litterman would advise that customer to hold the market portfolio of risky securities.
Black and Litterman then imagine a customer who would like to express a view that differs from the market’s.
The customer wants appropriately to mix his view with the market’s before using (37.2) to choose a portfolio.
Suppose that the customer’s view is expressed by a hunch that rather than (37.3), excess returns are governed by
𝑟 ⃗ − 𝑟𝑓 1 ∼ 𝒩(𝜇,̂ 𝜏 Σ)
where 𝜏 > 0 is a scalar parameter that determines how the decision maker wants to mix his view 𝜇̂ with the market’s
view 𝜇BL .
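Black and Litterman would then use a formula like the following one to mix the views $\hat\mu$ and $\mu_{BL}$
$$ \tilde\mu = (\Sigma^{-1} + (\tau\Sigma)^{-1})^{-1} (\Sigma^{-1} \mu_{BL} + (\tau\Sigma)^{-1} \hat\mu) \tag{37.4} $$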
Black and Litterman would then advise the customer to hold the portfolio associated with these views implied by rule
(37.2):
𝑤̃ = (𝛿Σ)−1 𝜇̃
This portfolio 𝑤̃ will deviate from the portfolio 𝑤𝐵𝐿 in amounts that depend on the mixing parameter 𝜏 .
If 𝜇̂ is the maximum likelihood estimator and 𝜏 is chosen heavily to weight this view, then the customer’s portfolio will
involve big short-long positions.
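A precision-weighted mixing rule of this kind can be sketched as follows. The function `mix_views` is a hypothetical helper written for this illustration; its argument order mirrors the call `black_litterman(1, μ_m, μ_est, Σ_est, τ * Σ_est)` used in the code here, but it is not claimed to be the lecture's own definition. The numbers are made up.

```python
import numpy as np

def mix_views(λ, μ1, μ2, Σ1, Σ2):
    """Combine views μ1 and μ2, weighting by the precisions Σ1⁻¹ and λ Σ2⁻¹."""
    P1 = np.linalg.inv(Σ1)
    P2 = np.linalg.inv(Σ2)
    return np.linalg.solve(P1 + λ * P2, P1 @ μ1 + λ * (P2 @ μ2))

# Hypothetical inputs
Σ = np.array([[0.04, 0.01],
              [0.01, 0.09]])
μ_BL = np.array([0.03, 0.05])      # market-implied view
μ_hat = np.array([0.10, -0.02])    # customer's estimated view
τ = 0.5

μ_tilde = mix_views(1, μ_BL, μ_hat, Σ, τ * Σ)
print("mixed view:", μ_tilde)
print("weighted average:", (τ * μ_BL + μ_hat) / (1 + τ))
```

When the second covariance matrix is proportional to the first, the mixture collapses to the scalar-weighted average $(\tau \mu_{BL} + \hat\mu)/(1 + \tau)$, so a small $\tau$ puts most of the weight on $\hat\mu$.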
τ = 1
μ_tilde = black_litterman(1, μ_m, μ_est, Σ_est, τ * Σ_est)
def BL_plot(τ):
μ_tilde = black_litterman(1, μ_m, μ_est, Σ_est, τ * Σ_est)
w_tilde = np.linalg.solve(δ * Σ_est, μ_tilde)
BL_plot(τ)
𝜇 ∼ 𝒩(𝜇𝐵𝐿 , Σ)
Given a particular realization of the mean excess returns 𝜇 one observes the average excess returns 𝜇̂ on the market
according to the distribution
𝜇̂ ∣ 𝜇, Σ ∼ 𝒩(𝜇, 𝜏 Σ)
where 𝜏 is typically small capturing the idea that the variation in the mean is smaller than the variation of the individual
random variable.
Given the realized excess returns one should then update the prior over the mean excess returns according to Bayes rule.
The corresponding posterior over mean excess returns is normally distributed with mean
Hence, the Black-Litterman recommendation is consistent with the Bayes update of the prior over the mean excess returns
in light of the realized average excess returns on the market.
𝑟𝑒⃗ ∼ 𝒩(𝜇𝐵𝐿 , Σ)
and
𝑟𝑒⃗ ∼ 𝒩(𝜇,̂ 𝜏 Σ)
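A special feature of the multivariate normal random variable $Z$ is that its density function depends only on the (Euclidean) length of its realization $z$.
Formally, let the $k$-dimensional random vector be
$$ Z \sim \mathcal{N}(\mu, \Sigma) $$
then
$$ \bar{Z} \equiv \Sigma^{-\frac{1}{2}} (Z - \mu) \sim \mathcal{N}(0, I) $$
and so the points where the density takes the same value can be described by the ellipse
$$ \bar{z} \cdot \bar{z} = (z - \mu)' \Sigma^{-1} (z - \mu) = \bar{d} $$
where $\bar{d} \geq 0$ indexes a particular density value.
This property is called spherical symmetry (see p. 81 in Leamer (1978) [Leamer, 1978]).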
In our specific example, we can use the pair (𝑑1̄ , 𝑑2̄ ) as being two “likelihood” values for which the corresponding iso-
likelihood ellipses in the excess return space are given by
Notice that for particular 𝑑1̄ and 𝑑2̄ values the two ellipses have a tangency point.
These tangency points, indexed by the pairs (𝑑1̄ , 𝑑2̄ ), characterize points 𝑟𝑒⃗ from which there exists no deviation where
one can increase the likelihood of one view without decreasing the likelihood of the other view.
The pairs $(\bar{d}_1, \bar{d}_2)$ for which there is such a point outline a curve in the excess return space. This curve is reminiscent of
the Pareto curve in an Edgeworth-box setting.
Dickey (1975) [Dickey, 1975] calls it a curve decolletage.
Leamer (1978) [Leamer, 1978] calls it an information contract curve and describes it by the following program: maximize
the likelihood of one view, say the Black-Litterman recommendation while keeping the likelihood of the other view at
least at a prespecified constant 𝑑2̄
$$ \vec{r}_e = (\Sigma^{-1} + \lambda (\tau\Sigma)^{-1})^{-1} (\Sigma^{-1} \mu_{BL} + \lambda (\tau\Sigma)^{-1} \hat\mu) \tag{37.6} $$
Note that if $\lambda = 1$, (37.6) is equivalent to (37.4) and it identifies one point on the information contract curve.
Furthermore, because 𝜆 is a function of the minimum likelihood 𝑑2̄ on the RHS of the constraint, by varying 𝑑2̄ (or 𝜆 ),
we can trace out the whole curve as the figure below illustrates.
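Tracing the curve by varying $\lambda$ in (37.6) takes only a few lines. The sketch below uses made-up two-asset inputs and a hypothetical helper `contract_curve_point`. It also compares two cases for the covariance matrix of the second view: proportional to $\Sigma$, as in (37.6), and equal to $\tau I$.

```python
import numpy as np

def contract_curve_point(λ, μ_BL, μ_hat, Σ1, Σ2):
    """Evaluate (37.6) with view covariances Σ1 (market) and Σ2 (customer)."""
    P1, P2 = np.linalg.inv(Σ1), np.linalg.inv(Σ2)
    return np.linalg.solve(P1 + λ * P2, P1 @ μ_BL + λ * (P2 @ μ_hat))

# Hypothetical two-asset inputs
τ = 0.8
Σ = np.array([[0.04, 0.018],
              [0.018, 0.09]])
μ_BL = np.array([0.03, 0.06])
μ_hat = np.array([0.08, 0.02])
λ_grid = np.logspace(-3, 3, 50)

# Second view with covariance proportional to Σ, and with covariance τ I
curve_prop = np.array([contract_curve_point(λ, μ_BL, μ_hat, Σ, τ * Σ)
                       for λ in λ_grid])
curve_eye = np.array([contract_curve_point(λ, μ_BL, μ_hat, Σ, τ * np.eye(2))
                      for λ in λ_grid])

def spanned_dimension(curve):
    """Dimension spanned by the curve's displacements from μ_BL."""
    return np.linalg.matrix_rank(curve - μ_BL, tol=1e-8)

print("proportional covariances:", spanned_dimension(curve_prop))
print("non-proportional covariances:", spanned_dimension(curve_eye))
```

A spanned dimension of one means the curve is a straight segment between $\mu_{BL}$ and $\hat\mu$; a dimension of two means it bends.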
np.random.seed(1987102)
N = 2 # Number of assets
T = 200 # Sample size
τ = 0.8
μ = (np.random.randn(N) + 5) / 100
S = np.random.randn(N, N)
V = S @ S.T
Σ = V * (w_m @ μ)**2 / (w_m @ V @ w_m)
excess_return = stat.multivariate_normal(μ, Σ)
sample = excess_return.rvs(T)
μ_est = sample.mean(0).reshape(N, 1)
Σ_est = np.cov(sample.T)
λ = 1
def decolletage(λ):
dist_r_BL = stat.multivariate_normal(μ_m.squeeze(), Σ_est)
dist_r_hat = stat.multivariate_normal(μ_est.squeeze(), τ * Σ_est)
X, Y = np.meshgrid(r1, r2)
XY = np.stack((X, Y), axis=-1)
Z_BL = dist_r_BL.pdf(XY)
Z_hat = dist_r_hat.pdf(XY)
decolletage(λ)
Note that the curve that connects the two points $\hat\mu$ and $\mu_{BL}$ is a straight line, which comes from the fact that the covariance matrices
of the two competing distributions (views) are proportional to each other.
To illustrate the fact that this is not necessarily the case, consider another example using the same parameter values,
except that the “second view” constituting the constraint has covariance matrix 𝜏 𝐼 instead of 𝜏 Σ.
This leads to the following figure, in which the curve connecting $\hat\mu$ and $\mu_{BL}$ bends
def decolletage(λ):
dist_r_BL = stat.multivariate_normal(μ_m.squeeze(), Σ_est)
dist_r_hat = stat.multivariate_normal(μ_est.squeeze(), τ * np.eye(N))
X, Y = np.meshgrid(r1, r2)
XY = np.stack((X, Y), axis=-1)
Z_BL = dist_r_BL.pdf(XY)
Z_hat = dist_r_hat.pdf(XY)
decolletage(λ)
$$ \hat\beta_{OLS} = (X'X)^{-1} X'y $$
From this decomposition, one can see that in order for the MSE to be small, both the bias and the variance terms must be small.
For example, consider the case when $X$ is a $T$-vector of ones (where $T$ is the sample size), so $\hat\beta_{OLS}$ is simply the sample average, while $\beta_0 \in \mathbb{R}$ is defined by the true mean of $y$.
In this example the MSE is
$$ \operatorname{mse}(\hat\beta_{OLS}, \beta_0) = \underbrace{\frac{1}{T^2} \mathbb{E} \left( \sum_{t=1}^{T} (y_t - \beta_0) \right)^2}_{\text{variance}} + \underbrace{0}_{\text{bias}} $$
However, because there is a trade-off between the estimator’s bias and variance, there are cases when by permitting a
small bias we can substantially reduce the variance so overall the MSE gets smaller.
A typical scenario when this proves to be useful is when the number of coefficients to be estimated is large relative to the
sample size.
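In these cases, one approach to handle the bias-variance trade-off is the so-called Tikhonov regularization.
A general form with regularization matrix $\Gamma$ can be written as
$$ \min_{\beta} \left\{ \|X\beta - y\|^2 + \|\Gamma(\beta - \tilde\beta)\|^2 \right\} $$
which yields the solution
$$ \hat\beta_{Reg} = (X'X + \Gamma'\Gamma)^{-1} (X'y + \Gamma'\Gamma \tilde\beta) $$
Substituting the value of $\hat\beta_{OLS}$ yields
$$ \hat\beta_{Reg} = (X'X + \Gamma'\Gamma)^{-1} (X'X \hat\beta_{OLS} + \Gamma'\Gamma \tilde\beta) $$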
Often, the regularization matrix takes the form Γ = 𝜆𝐼 with 𝜆 > 0 and 𝛽 ̃ = 0.
Then the Tikhonov regularization is equivalent to what is called ridge regression in statistics.
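To illustrate how this estimator addresses the bias-variance trade-off, we compute the MSE of the ridge estimator
$$ \operatorname{mse}(\hat\beta_{ridge}, \beta_0) = \underbrace{\frac{1}{(T + \lambda)^2} \mathbb{E} \left( \sum_{t=1}^{T} (y_t - \beta_0) \right)^2}_{\text{variance}} + \underbrace{\left( \frac{\lambda}{T + \lambda} \right)^2 \beta_0^2}_{\text{bias}} $$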
The ridge regression shrinks the coefficients of the estimated vector towards zero relative to the OLS estimates thus
reducing the variance term at the cost of introducing a “small” bias.
However, there is nothing special about the zero vector.
When 𝛽 ̃ ≠ 0 shrinkage occurs in the direction of 𝛽.̃
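The bias-variance trade-off in the scalar example can be checked numerically. The sketch below assumes IID observations with mean $\beta_0$ and standard deviation $\sigma$, and takes the shrunk estimator to be $\sum_t y_t / (T + \lambda)$, which is the estimator whose MSE is displayed above. The parameter values and the helper name `mse_shrunk_mean` are made up for illustration.

```python
import numpy as np

def mse_shrunk_mean(T, σ, β0, λ):
    """Analytic MSE of sum(y) / (T + λ) for IID y with mean β0 and std σ."""
    variance = T * σ**2 / (T + λ)**2
    bias_sq = (λ / (T + λ))**2 * β0**2
    return variance + bias_sq

T, σ, β0, λ = 20, 1.0, 0.5, 2.0       # illustrative values
rng = np.random.default_rng(0)

# Many artificial samples of size T
y = β0 + σ * rng.standard_normal((200_000, T))
ols = y.mean(axis=1)                  # sample average (λ = 0)
ridge = y.sum(axis=1) / (T + λ)       # shrunk towards zero

mse_ols_mc = np.mean((ols - β0)**2)
mse_ridge_mc = np.mean((ridge - β0)**2)

print("OLS   MSE: simulated", mse_ols_mc, "analytic", mse_shrunk_mean(T, σ, β0, 0))
print("ridge MSE: simulated", mse_ridge_mc, "analytic", mse_shrunk_mean(T, σ, β0, λ))
```

With these values the small squared bias is more than offset by the drop in variance, so the shrunk estimator has the lower MSE; a larger $\beta_0$ or $\lambda$ can reverse the ranking.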
Now, we can give a regularization interpretation of the Black-Litterman portfolio recommendation.
To this end, first simplify the equation (37.4) that characterizes the Black-Litterman recommendation
In our case, 𝜇̂ is the estimated mean excess returns of securities. This could be written as a vector autoregression where
• $y$ is the stacked vector of observed excess returns of size $(NT \times 1)$, for $N$ securities and $T$ observations.
• $X = \sqrt{T^{-1}} (I_N \otimes \iota_T)$ where $I_N$ is the identity matrix and $\iota_T$ is a column vector of ones.
Correspondingly, the OLS regression of $y$ on $X$ would yield the mean excess returns as coefficients.
With $\Gamma = \sqrt{\tau T^{-1}} (I_N \otimes \iota_T)$ we can write the regularized version of the mean excess return estimation
$$ \begin{aligned} \hat\beta_{Reg} &= (X'X + \Gamma'\Gamma)^{-1} (X'X \hat\beta_{OLS} + \Gamma'\Gamma \tilde\beta) \\ &= (1 + \tau)^{-1} X'X (X'X)^{-1} (\hat\beta_{OLS} + \tau \tilde\beta) \\ &= (1 + \tau)^{-1} (\hat\beta_{OLS} + \tau \tilde\beta) \\ &= (1 + \tau^{-1})^{-1} (\tau^{-1} \hat\beta_{OLS} + \tilde\beta) \end{aligned} $$
Given that 𝛽𝑂𝐿𝑆 = 𝜇̂ and 𝛽 ̃ = 𝜇𝐵𝐿 in the Black-Litterman model, we have the following interpretation of the model’s
recommendation.
The estimated (personal) view of the mean excess returns, $\hat\mu$, which would lead to extreme short-long positions, is “shrunk” towards the conservative market view, $\mu_{BL}$, which leads to the more conservative market portfolio.
So the Black-Litterman procedure results in a recommendation that is a compromise between the conservative market
portfolio and the more extreme portfolio that is implied by estimated “personal” views.
The Black-Litterman approach is partly inspired by the econometric insight that it is easier to estimate covariances of
excess returns than the means.
That is what gave Black and Litterman license to adjust investors’ perception of mean excess returns while not tampering
with the covariance matrix of excess returns.
The robust control theory is another approach that also hinges on adjusting mean excess returns but not covariances.
Associated with a robust control problem is what Hansen and Sargent [Hansen and Sargent, 2001], [Hansen and Sargent,
2008] call a T operator.
Let’s define the T operator as it applies to the problem at hand.
Let 𝑥 be an 𝑛 × 1 Gaussian random vector with mean vector 𝜇 and covariance matrix Σ = 𝐶𝐶 ′ . This means that 𝑥 can
be represented as
$$
x = \mu + C\epsilon
$$
$$
\tilde\phi(\epsilon) = m(\epsilon, \mu)\,\phi(\epsilon)
$$
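The next concept that we need is the entropy of the distorted distribution $\tilde\phi$ with respect to $\phi$.
Entropy is defined as
$$
\mathrm{ent} = \int \log m(\epsilon, \mu)\, m(\epsilon, \mu)\,\phi(\epsilon)\, d\epsilon
$$
or
$$
\mathrm{ent} = \int \log m(\epsilon, \mu)\,\tilde\phi(\epsilon)\, d\epsilon
$$
That is, relative entropy is the expected value of the log likelihood ratio $\log m$ where the expectation is taken with respect to the twisted density $\tilde\phi$.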
Relative entropy is non-negative. It is a measure of the discrepancy between two probability distributions.
As such, it plays an important role in governing the behavior of statistical tests designed to discriminate one probability
distribution from another.
$$
= -\theta \log \int \exp\left(\frac{-V(\mu + C\epsilon)}{\theta}\right)\phi(\epsilon)\, d\epsilon
$$
This asserts that T is an indirect utility function for a minimization problem in which an adversary chooses a distorted
probability distribution 𝜙 ̃ to lower expected utility, subject to a penalty term that gets bigger the larger is relative entropy.
Here the penalty parameter
$$
\theta \in [\underline{\theta}, +\infty]
$$
is a robustness parameter. When it is $+\infty$, there is no scope for the minimizing agent to distort the distribution, so no robustness to alternative distributions is acquired.
As 𝜃 is lowered, more robustness is achieved.
$$
\mathsf{T}(\vec r - r_f \mathbf{1}) = w'\mu + \zeta - \frac{1}{2\theta}\, w'\Sigma w
$$
and entropy is
$$
\frac{v'v}{2} = \frac{1}{2\theta^2}\, w'CC'w
$$
According to criterion (37.1), the mean-variance portfolio choice problem chooses $w$ to maximize
which equals
$$
w'\mu - \frac{\delta}{2}\, w'\Sigma w
$$
A robust decision maker can be modeled as replacing the mean return $E[w(\vec r - r_f \mathbf{1})]$ with the risk-sensitive criterion
$$
\mathsf{T}[w(\vec r - r_f \mathbf{1})] = w'\mu - \frac{1}{2\theta}\, w'\Sigma w
$$
that comes from replacing the mean $\mu$ of $\vec r - r_f \mathbf{1}$ with the worst-case mean
$$
\mu - \theta^{-1}\Sigma w
$$
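The closed form for the risk-sensitive criterion can be verified numerically. The sketch below is illustrative only: the values of `mu`, `C`, `w`, and `theta` are assumptions, and Gauss-Hermite quadrature is used as one convenient way to evaluate the Gaussian expectation inside the T operator. It also checks that the criterion equals the portfolio mean under the worst-case mean plus θ times entropy.

```python
import numpy as np

# Illustrative inputs (assumptions, not taken from the lecture)
mu = np.array([0.05, 0.03])                 # mean excess returns
C = np.array([[0.20, 0.00],
              [0.10, 0.15]])
Sigma = C @ C.T                             # covariance matrix
w = np.array([0.6, 0.4])                    # portfolio weights
theta = 2.0                                 # robustness penalty parameter

# w'C eps is N(0, w' Sigma w), so a one-dimensional quadrature suffices
s = np.sqrt(w @ Sigma @ w)
nodes, weights = np.polynomial.hermite.hermgauss(40)
z = np.sqrt(2) * s * nodes
expectation = np.sum(weights * np.exp(-(w @ mu + z) / theta)) / np.sqrt(np.pi)

T_numeric = -theta * np.log(expectation)
T_closed = w @ mu - (w @ Sigma @ w) / (2 * theta)

# Worst-case mean and the entropy of the associated distortion
worst_case_mean = mu - Sigma @ w / theta
entropy = (w @ Sigma @ w) / (2 * theta**2)

print(T_numeric, T_closed)
print(w @ worst_case_mean + theta * entropy)
```

Raising `theta` shrinks the gap between the worst-case mean and `mu`, which matches the statement that robustness disappears as θ goes to infinity.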
37.12 Appendix
We want to illustrate the “folk theorem” that with high or moderate frequency data, it is more difficult to estimate means
than variances.
In order to operationalize this statement, we take two analog estimators:
• sample average: $\bar X_N = \frac{1}{N}\sum_{i=1}^{N} X_i$
• sample variance: $S_N = \frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar X_N)^2$
to estimate the unconditional mean and unconditional variance of the random variable 𝑋, respectively.
To measure the “difficulty of estimation”, we use mean squared error (MSE), that is the average squared difference
between the estimator and the true value.
Assuming that the process $\{X_i\}$ is ergodic, both analog estimators are known to converge to their true values as the sample size 𝑁 goes to infinity.
More precisely for all 𝜀 > 0
and
A necessary condition for these convergence results is that the associated MSEs vanish as 𝑁 goes to infinity, or in other
words,
Even if the MSEs converge to zero, the associated rates might be different. Looking at the limit of the relative MSE (as
the sample size grows to infinity)
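$$
X_i \sim \mathcal{N}(\mu, \sigma^2)
$$
$$
\mathrm{MSE}(\bar X_N, \mu) = \frac{\sigma^2}{N}
$$
Taking $S_N$ to estimate the variance, the MSE is
$$
\mathrm{MSE}(S_N, \sigma^2) = \frac{2\sigma^4}{N-1}
$$
Both estimators are unbiased and hence the MSEs reflect the corresponding variances of the estimators.
Furthermore, both MSEs are $o(1)$ with a (multiplicative) factor of difference in their rates of convergence:
$$
\frac{\mathrm{MSE}(S_N, \sigma^2)}{\mathrm{MSE}(\bar X_N, \mu)} = \frac{N\, 2\sigma^2}{N-1} \;\underset{N\to\infty}{\longrightarrow}\; 2\sigma^2
$$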
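A quick Monte Carlo experiment illustrates the two closed-form MSEs in the IID Gaussian case. This is a sketch under assumed values for `mu`, `sigma`, the sample size `N`, and the number of replications `M`; none of these come from the lecture.

```python
import numpy as np

# Illustrative settings (assumptions)
rng = np.random.default_rng(1234)
mu, sigma = 0.0, 0.5
N, M = 50, 100_000            # sample size, number of simulated samples

X = rng.normal(mu, sigma, size=(M, N))

# Simulated MSEs of the sample average and the (unbiased) sample variance
mse_mean = np.mean((X.mean(axis=1) - mu) ** 2)
mse_var = np.mean((X.var(axis=1, ddof=1) - sigma**2) ** 2)

# Closed-form counterparts from the text
mse_mean_theory = sigma**2 / N
mse_var_theory = 2 * sigma**4 / (N - 1)

print("mean:     simulated", mse_mean, " theory", mse_mean_theory)
print("variance: simulated", mse_var, " theory", mse_var_theory)
print("relative MSE:", mse_var / mse_mean, " limit 2*sigma^2 =", 2 * sigma**2)
```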
We are interested in how this (asymptotic) relative rate of convergence changes as increasing sampling frequency puts
dependence into the data.
To investigate how sampling frequency affects relative rates of convergence, we assume that the data are generated by a
mean-reverting continuous time process of the form
where $\mu$ is the unconditional mean, $\kappa > 0$ is a persistence parameter, and $\{W_t\}$ is a standardized Brownian motion.
Observations arising from this system in particular discrete periods $\mathcal{T}(h) \equiv \{nh : n \in \mathbb{Z}\}$ with $h > 0$ can be described by the following process
where
$$
\epsilon_{t,h} \sim \mathcal{N}(0, \Sigma_h) \quad \text{with} \quad \Sigma_h = \frac{\sigma^2 (1 - \exp(-2\kappa h))}{2\kappa}
$$
We call ℎ the frequency parameter, whereas 𝑛 represents the number of lags between observations.
Hence, the effective distance between two observations 𝑋𝑡 and 𝑋𝑡+𝑛 in the discrete time notation is equal to ℎ ⋅ 𝑛 in
terms of the underlying continuous time process.
Straightforward calculations show that the autocorrelation function for the stochastic process {𝑋𝑡 }𝑡∈𝒯(ℎ) is
μ = .0
κ = .1
σ = .5
var_uncond = σ**2 / (2 * κ)
Consider again the AR(1) process generated by discrete sampling with frequency ℎ. Assume that we have a sample of
size 𝑁 and we would like to estimate the unconditional mean – in our case the true mean is 𝜇.
Again, the sample average is an unbiased estimator of the unconditional mean
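$$
\mathbb{E}[\bar X_N] = \frac{1}{N}\sum_{i=1}^{N} \mathbb{E}[X_i] = \mathbb{E}[X_0] = \mu
$$
The variance of the sample average is
$$
\begin{aligned}
\mathbb{V}(\bar X_N) &= \mathbb{V}\left(\frac{1}{N}\sum_{i=1}^{N} X_i\right) \\
&= \frac{1}{N^2}\left(\sum_{i=1}^{N} \mathbb{V}(X_i) + 2\sum_{i=1}^{N-1}\sum_{s=i+1}^{N} \mathrm{cov}(X_i, X_s)\right) \\
&= \frac{1}{N^2}\left(N\gamma(0) + 2\sum_{i=1}^{N-1} i \cdot \gamma(h \cdot (N-i))\right) \\
&= \frac{1}{N^2}\left(N\frac{\sigma^2}{2\kappa} + 2\sum_{i=1}^{N-1} i \cdot \exp(-\kappa h (N-i))\,\frac{\sigma^2}{2\kappa}\right)
\end{aligned}
$$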
It is explicit in the above equation that time dependence in the data inflates the variance of the mean estimator through
the covariance terms.
Moreover, as we can see, a higher sampling frequency—smaller ℎ—makes all the covariance terms larger, everything
else being fixed.
This implies a relatively slower rate of convergence of the sample average for high-frequency data.
Intuitively, stronger dependence across observations for high-frequency data reduces the “information content” of each
observation relative to the IID case.
We can upper bound the variance term in the following way
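$$
\begin{aligned}
\mathbb{V}(\bar X_N) &= \frac{1}{N^2}\left(N\frac{\sigma^2}{2\kappa} + 2\sum_{i=1}^{N-1} i \cdot \exp(-\kappa h (N-i))\,\frac{\sigma^2}{2\kappa}\right) \\
&\leq \frac{\sigma^2}{2\kappa N}\left(1 + 2\sum_{i=1}^{N-1} \exp(-\kappa h\, i)\right) \\
&\leq \underbrace{\frac{\sigma^2}{2\kappa N}}_{\text{IID case}}\left(1 + 2\,\frac{1 - \exp(-\kappa h)^{N-1}}{1 - \exp(-\kappa h)}\right)
\end{aligned}
$$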
IID case
Asymptotically, the term $\exp(-\kappa h)^{N-1}$ vanishes and the dependence in the data inflates the benchmark IID variance by a factor of
$$
\left(1 + 2\,\frac{1}{1 - \exp(-\kappa h)}\right)
$$
This long run factor is larger the higher is the frequency (the smaller is ℎ).
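The exact variance of the sample average and its upper bound can be compared directly. The sketch below reuses the lecture's values κ = 0.1 and σ = 0.5; the function names and the choices of `h` and `N` are illustrative assumptions.

```python
import numpy as np

kappa, sigma = 0.1, 0.5
gamma0 = sigma**2 / (2 * kappa)      # unconditional variance

def var_sample_mean(h, N):
    "Exact variance of the sample average of N observations spaced h apart."
    rho = np.exp(-kappa * h)
    i = np.arange(1, N)
    return (N * gamma0 + 2 * np.sum(i * rho ** (N - i) * gamma0)) / N**2

def upper_bound(h, N):
    "Upper bound: IID variance times the finite-sample inflation factor."
    rho = np.exp(-kappa * h)
    return gamma0 / N * (1 + 2 * (1 - rho ** (N - 1)) / (1 - rho))

N = 1000
for h in (0.1, 1.0, 10.0):
    exact, bound = var_sample_mean(h, N), upper_bound(h, N)
    print(f"h = {h:5.1f}: IID = {gamma0 / N:.6f}, "
          f"exact = {exact:.6f}, bound = {bound:.6f}")
```

For small `h` both the exact variance and the bound are far above the IID benchmark, while for large `h` they approach it.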
Therefore, we expect the asymptotic relative MSEs, 𝐵, to change with time-dependent data. We just saw that the mean estimator’s rate is roughly changing by a factor of
$$
\left(1 + 2\,\frac{1}{1 - \exp(-\kappa h)}\right)
$$
Unfortunately, the variance estimator’s MSE is harder to derive.
Nonetheless, we can approximate it by using (large sample) simulations, thus getting an idea about how the asymptotic relative MSE changes with the sampling frequency ℎ relative to the IID case that we compute in closed form.
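@jit
def sample_generator(h, N, M):
    # AR(1) coefficients implied by sampling the process every h units of time
    ϕ = (1 - np.exp(-κ * h)) * μ
    ρ = np.exp(-κ * h)
    s = σ**2 * (1 - np.exp(-2 * κ * h)) / (2 * κ)   # innovation variance Σ_h
    mean_uncond = μ
    std_uncond = np.sqrt(σ**2 / (2 * κ))
    # Assumed reconstruction: the lines creating ε_path and y_path are missing
    # in the source; shocks have variance s and paths start from the
    # unconditional distribution
    ε_path = np.empty((M, N))
    y_path = np.empty((M, N + 1))
    for j in range(M):
        y_path[j, 0] = np.random.normal(mean_uncond, std_uncond)
        for i in range(N):
            ε_path[j, i] = np.random.normal(0, np.sqrt(s))
    for i in range(N):
        y_path[:, i + 1] = ϕ + ρ * y_path[:, i] + ε_path[:, i]
    return y_path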
# h_grid, N_app and M_app are assumed to be defined earlier in the lecture
labels = []
mean_est_store = []
var_est_store = []
for h in h_grid:
    labels.append(h)
    sample = sample_generator(h, N_app, M_app)
    mean_est_store.append(np.mean(sample, 1))
    var_est_store.append(np.var(sample, 1))

var_est_store = np.array(var_est_store)
mean_est_store = np.array(mean_est_store)
The above figure illustrates the relationship between the asymptotic relative MSEs and the sampling frequency
• We can see that with low-frequency data – large values of ℎ – the ratio of asymptotic rates approaches the IID
case.
• As ℎ gets smaller – the higher the frequency – the relative performance of the variance estimator is better in the
sense that the ratio of asymptotic rates gets smaller. That is, as the time dependence gets more pronounced, the
rate of convergence of the mean estimator’s MSE deteriorates more than that of the variance estimator.
THIRTYEIGHT
In addition to what’s in Anaconda, this lecture will need the following libraries:
38.1 Introduction
This is a prolegomenon to another lecture Equilibrium Capital Structures with Incomplete Markets about a model with
incomplete markets authored by Bisin, Clementi, and Gottardi [Bisin et al., 2018].
We adopt specifications of preferences and technologies very close to Bisin, Clementi, and Gottardi’s but unlike them
assume that there are complete markets in one-period Arrow securities.
This simplification of BCG’s setup helps us by
• creating a benchmark economy to compare with outcomes in BCG’s incomplete markets economy
• creating a good guess for initial values of some equilibrium objects to be computed in BCG’s incomplete markets
economy via an iterative algorithm
• illustrating classic complete markets outcomes that include
– indeterminacy of consumers’ portfolio choices
– indeterminacy of firms’ financial structures that underlies a Modigliani-Miller theorem [Modigliani and
Miller, 1958]
• introducing Big K, little k issues in a simple context that will recur in the BCG incomplete markets
environment
A Big K, little k analysis also played roles in this quantecon lecture as well as here and here.
38.1.1 Setup
38.1.2 Endowments
There is a single consumption good in period 0 and at each random state 𝜖 in period 1.
Economy-wide endowments in periods 0 and 1 are
𝑤0
𝑤1 (𝜖) in state 𝜖
Soon we’ll explain how aggregate endowments are divided between type 𝑖 = 1 and type 𝑖 = 2 consumers.
We don’t need to do that in order to describe a social planning problem.
38.1.3 Technology:
38.1.4 Preferences:
A consumer of type 𝑖 orders period 0 consumption 𝑐0𝑖 and state 𝜖, period 1 consumption 𝑐1𝑖 (𝜖) by
38.1.5 Parameterizations
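$$
\epsilon \sim \mathcal{N}(\mu, \sigma^2)
$$
$$
u(c) = \frac{c^{1-\gamma}}{1-\gamma}
$$
$$
w_1^i(\epsilon) = e^{-\chi_i \mu - .5\chi_i^2\sigma^2 + \chi_i \epsilon}, \quad \chi_i \in [0, 1]
$$
Sometimes instead of assuming $\epsilon \sim g(\epsilon) = \mathcal{N}(0, \sigma^2)$, we’ll assume that $g(\cdot)$ is a probability mass function that serves as
a discrete approximation to a standardized normal density.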
obj = 𝜙1 𝑢1 + 𝜙2 𝑢2 , 𝜙𝑖 ≥ 0, 𝜙 1 + 𝜙2 = 1
+ 𝛽 ∫ 𝜆1 (𝜖) [𝑤11 (𝜖) + 𝑤12 (𝜖) + 𝑒𝜖 𝐴𝑘𝛼 − 𝑐11 (𝜖) − 𝑐12 (𝜖)] 𝑔(𝜖)𝑑𝜖
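$$
\begin{aligned}
c_0^1 &: \quad \phi_1 u'(c_0^1) - \lambda_0 = 0 \\
c_0^2 &: \quad \phi_2 u'(c_0^2) - \lambda_0 = 0 \\
c_1^1(\epsilon) &: \quad \phi_1 \beta u'(c_1^1(\epsilon)) g(\epsilon) - \beta\lambda_1(\epsilon) g(\epsilon) = 0 \\
c_1^2(\epsilon) &: \quad \phi_2 \beta u'(c_1^2(\epsilon)) g(\epsilon) - \beta\lambda_1(\epsilon) g(\epsilon) = 0
\end{aligned}
$$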
These together with the fifth first-order condition for the planner imply the following equation that determines an optimal
choice of capital
$$
1 = \beta\alpha A k^{\alpha-1} \int \frac{u'(c_1^i(\epsilon))}{u'(c_0^i)}\, e^{\epsilon} g(\epsilon)\, d\epsilon
$$
for 𝑖 = 1, 2.
Evidently,
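$$
u'(c) = c^{-\gamma}
$$
and
$$
\frac{u'(c^1)}{u'(c^2)} = \left(\frac{c^1}{c^2}\right)^{-\gamma} = \frac{\phi_2}{\phi_1}
$$
where it is to be understood that this equation holds for $c^1 = c_0^1$ and $c^2 = c_0^2$ and also for $c^1 = c^1(\epsilon)$ and $c^2 = c^2(\epsilon)$ for all $\epsilon$.
With the same understanding, it follows that
$$
\left(\frac{c^1}{c^2}\right) = \left(\frac{\phi_2}{\phi_1}\right)^{-\gamma^{-1}}
$$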
Let 𝑐 = 𝑐1 + 𝑐2 .
It follows from the preceding equation that
𝑐1 = 𝜂𝑐
𝑐2 = (1 − 𝜂)𝑐
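$$
\begin{aligned}
C_0 &= w_0 - K \\
C_1(\epsilon) &= w_1(\epsilon) + A K^{\alpha} e^{\epsilon} \\
c_0^1 &= \eta C_0 \\
c_0^2 &= (1 - \eta) C_0 \\
c_1^1(\epsilon) &= \eta C_1(\epsilon) \\
c_1^2(\epsilon) &= (1 - \eta) C_1(\epsilon)
\end{aligned}
$$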
where 𝜂 ∈ [0, 1] is the consumption share parameter mentioned above that is a function of the Pareto weight 𝜙1 and the
utility curvature parameter 𝛾.
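The mapping from the Pareto weight to the consumption share follows from the ratio of marginal utilities derived above. The sketch below is an illustration under stated assumptions: the function name `consumption_share` and the numerical values are hypothetical, and the closed form η = r / (1 + r) with r = (𝜙2/𝜙1)^(−1/𝛾) is obtained by combining the ratio condition with 𝑐 = 𝑐1 + 𝑐2.

```python
def consumption_share(phi1, gamma):
    """
    Share eta of aggregate consumption going to type 1 consumers,
    given Pareto weight phi1 (with phi2 = 1 - phi1) and CRRA curvature gamma.
    """
    phi2 = 1 - phi1
    ratio = (phi2 / phi1) ** (-1 / gamma)    # c^1 / c^2
    return ratio / (1 + ratio)

# Illustrative values (assumptions)
phi1, gamma, c = 0.7, 2.0, 3.0
eta = consumption_share(phi1, gamma)
c1, c2 = eta * c, (1 - eta) * c

# The planner's condition phi1 * u'(c1) = phi2 * u'(c2) should hold
lhs = phi1 * c1 ** (-gamma)
rhs = (1 - phi1) * c2 ** (-gamma)
print(eta, lhs, rhs)
```

Note that `eta` depends only on the Pareto weight and the curvature parameter, not on the level of aggregate consumption `c`.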
Remarks
The relative Pareto weight parameter 𝜂 does not appear in equation (38.1) that determines 𝐾.
Neither does it influence 𝐶0 or 𝐶1 (𝜖), which depend solely on 𝐾.
The role of 𝜂 is to determine how to allocate total consumption between the two types of consumers.
Thus, the planner’s choice of 𝐾 does not interact with how it wants to allocate consumption.
We now describe a competitive equilibrium for an economy that has specifications of consumer preferences, technology,
and aggregate endowments that are identical to those in the preceding planning problem.
While prices do not appear in the planning problem – only quantities do – prices play an important role in a competitive
equilibrium.
To understand how the planning economy is related to a competitive equilibrium, we now turn to the Big K, little
k distinction.
In the same spirit, let $\zeta \in [0, 1]$ index a particular firm. Then define Big $K$ as
$$
K = \int_0^1 k(\zeta)\, d\zeta
$$
The assumption that there are continua of our three types of agents plays an important role in making each individual agent
into a powerless price taker:
• an individual consumer chooses its own (infinitesimal) part $c^i(\omega)$ of $C^i$ taking prices as given
• an individual firm chooses its own (infinitesimal) part $k(\zeta)$ of $K$ taking prices as given
• equilibrium prices depend on the Big K, Big C objects 𝐾 and 𝐶
Nevertheless, in equilibrium, 𝐾 = 𝑘, 𝐶 𝑖 = 𝑐𝑖
The assumption about measures of agents is thus a powerful device for making a host of competitive agents take as given
equilibrium prices that are determined by the independent decisions of hosts of agents who behave just like they do.
Ownership
Consumers of type 𝑖 own the following exogenous quantities of the consumption good in periods 0 and 1:
$$
w_0^i, \quad i = 1, 2
$$
$$
w_1^i(\epsilon), \quad i = 1, 2
$$
where
$$
\sum_i w_0^i = w_0
$$
Consumers also own shares in a firm that operates the technology for converting nonnegative amounts of the time 0
consumption good one-for-one into a capital good 𝑘 that produces 𝐴𝑘𝛼 𝑒𝜖 units of the time 1 consumption good in time
1 state 𝜖.
Consumers of types 𝑖 = 1, 2 are endowed with 𝜃0𝑖 shares of a firm and
𝜃01 + 𝜃02 = 1
Asset markets
At time 0, consumers trade the following assets with other consumers and with firms:
• equities (also known as stocks) issued by firms
• one-period Arrow securities that pay one unit of consumption at time 1 when the shock 𝜖 assumes a particular value
Later, we’ll allow the firm to issue bonds too, but not now.
Let
• 𝑎𝑖 (𝜖) be consumer 𝑖 ’s purchases of claims on time 1 consumption in state 𝜖
• 𝑞(𝜖) be a pricing kernel for one-period Arrow securities
• $\theta_0^i \geq 0$ be consumer 𝑖’s initial share of the firm, $\sum_i \theta_0^i = 1$
• 𝜃𝑖 be the fraction of a firm’s shares purchased by consumer 𝑖 at time 𝑡 = 0
• 𝑉 be the value of the representative firm
• 𝑉 ̃ be the value of equity issued by the representative firm
• 𝐾, 𝐶0 be two scalars and 𝐶1 (𝜖) a function that we use to construct a guess about an equilibrium pricing kernel for
Arrow securities
We proceed to describe constrained optimum problems faced by consumers and a representative firm in a competitive
equilibrium.
𝑉 ̃ = ∫ 𝐴𝑘𝛼 𝑒𝜖 𝑞(𝜖)𝑑𝜖
𝑉 = −𝑘 + ∫ 𝐴𝑘𝛼 𝑒𝜖 𝑞(𝜖)𝑑𝜖
−1 + 𝛼𝐴𝑘𝛼−1 ∫ 𝑒𝜖 𝑞(𝜖)𝑑𝜖 = 0
𝑉 = −𝑘 + 𝑉 ̃
The right side equals the value of equity minus the cost of the time 0 goods that it purchases and uses as capital.
The quantity $-\bar a^i(\epsilon; \theta^i)$ is the maximum amount that it is feasible for consumer 𝑖 to repay to his Arrow security creditors
at time 1 in state 𝜖.
Notice that $-\bar a^i(\epsilon; \theta^i)$ defined in (38.2) depends on
• his endowment $w_1^i(\epsilon)$ at time 1 in state 𝜖
• his share $\theta^i$ of a representative firm’s dividends
These constitute two sources of collateral that back the consumer’s issues of Arrow securities that pay off in state 𝜖.
Consumer 𝑖 chooses a scalar 𝑐0𝑖 and a function 𝑐1𝑖 (𝜖) to maximize
Attach Lagrange multiplier 𝜆𝑖0 to the budget constraint at time 0 and scaled Lagrange multiplier 𝛽𝜆𝑖1 (𝜖)𝑔(𝜖) to the budget
constraint at time 1 and state 𝜖, then form the Lagrangian
Off corners, first-order necessary conditions for an optimum with respect to 𝑐0𝑖 , 𝑐1𝑖 (𝜖), and 𝑎𝑖 (𝜖) are
These equations imply that consumer 𝑖 adjusts its consumption plan to satisfy
$$
q(\epsilon) = \beta \left(\frac{u'(c_1^i(\epsilon))}{u'(c_0^i)}\right) g(\epsilon) \tag{38.3}
$$
To deduce a restriction on equilibrium prices, we solve the period 1 budget constraint to express 𝑎𝑖 (𝜖) as
then substitute the expression on the right side into the time 0 budget constraint and rearrange to get the single intertem-
poral budget constraint
𝑤0𝑖 + 𝜃0𝑖 𝑉 + ∫ 𝑤1𝑖 (𝜖)𝑞(𝜖)𝑑𝜖 + 𝜃𝑖 [𝐴𝑘𝛼 ∫ 𝑒𝜖 𝑞(𝜖)𝑑𝜖 − 𝑉 ̃ ] ≥ 𝑐0𝑖 + ∫ 𝑐1𝑖 (𝜖)𝑞(𝜖)𝑑𝜖 (38.4)
The right side of inequality (38.4) is the present value of consumer 𝑖’s consumption while the left side is the present value
of consumer 𝑖’s endowment when consumer 𝑖 buys 𝜃𝑖 shares of equity.
From inequality (38.4), we deduce two findings.
1. No arbitrage profits condition:
Unless
the consumer could afford an arbitrarily high present value of consumption by setting 𝜃𝑖 to an arbitrarily large negative
number.
If
the consumer could afford an arbitrarily high present value of consumption by setting 𝜃𝑖 to an arbitrarily large positive
number.
Since resources are finite, there can exist no such arbitrage opportunity in a competitive equilibrium.
Therefore, it must be true that the following no arbitrage condition prevails:
Equation (38.6) asserts that the value of equity equals the value of the state-contingent dividends 𝐴𝑘𝛼 𝑒𝜖 evaluated at the
Arrow security prices 𝑞(𝜖; 𝐾) that we have expressed as a function of 𝐾.
We’ll say more about this equation later.
2. Indeterminacy of portfolio
When the no-arbitrage pricing equation (38.6) prevails, a consumer of type 𝑖’s choice 𝜃𝑖 of equity is indeterminate.
Consumer of type 𝑖 can offset any choice of 𝜃𝑖 by setting an appropriate schedule 𝑎𝑖 (𝜖) for purchasing state-contingent
securities.
Having computed an allocation that solves the planning problem, we can readily compute a competitive equilibrium via
the following steps that, as we’ll see, rely heavily on the Big K, little k, Big C, little c logic mentioned
earlier:
• a competitive equilibrium allocation equals the allocation chosen by the planner
• competitive equilibrium prices and the value of a firm’s equity are encoded in shadow prices from the planning
problem that depend on Big 𝐾 and Big 𝐶.
To substantiate that this procedure is valid, we proceed as follows.
With 𝐾 in hand, we make the following guess for competitive equilibrium Arrow securities prices
−𝛾
𝑢′ (𝑤1 (𝜖) + 𝐴𝐾 𝛼 𝑒𝜖 )
𝑞(𝜖; 𝐾) = 𝛽 ( ) (38.7)
𝑢′ (𝑤0 − 𝐾)
To confirm the guess, we begin by considering its consequences for the firm’s choice of 𝑘.
With Arrow securities prices (38.7), the firm’s first-order necessary condition for choosing 𝑘 becomes
𝑘=𝐾
because by setting 𝑘 = 𝐾 equation (38.8) becomes equivalent with the planner’s first-order condition (38.1) for setting
𝐾.
To pose a consumer’s problem in a competitive equilibrium, we require not only the above guess for the Arrow securities
pricing kernel 𝑞(𝜖) but the value of equity 𝑉 ̃ :
Let 𝑉 ̃ be the value of equity implied by Arrow securities price function (38.7) and formula (38.9).
At the Arrow securities prices $q(\epsilon)$ given by (38.7) and equity value $\tilde V$ given by (38.9), consumers $i = 1, 2$ choose
consumption allocations and portfolios that satisfy the first-order necessary conditions
$$
\beta \left(\frac{u'(c_1^i(\epsilon))}{u'(c_0^i)}\right) g(\epsilon) = q(\epsilon; K)
$$
It can be verified directly that the following choices satisfy these equations
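$$
\begin{aligned}
c_0^1 + c_0^2 &= C_0 = w_0 - K \\
c_1^1(\epsilon) + c_1^2(\epsilon) &= C_1(\epsilon) = w_1(\epsilon) + A k^{\alpha} e^{\epsilon} \\
\frac{c_1^2(\epsilon)}{c_1^1(\epsilon)} &= \frac{c_0^2}{c_0^1} = \frac{1 - \eta}{\eta}
\end{aligned}
$$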
for an 𝜂 ∈ (0, 1) that depends on consumers’ endowments [𝑤01 , 𝑤02 , 𝑤11 (𝜖), 𝑤12 (𝜖), 𝜃01 , 𝜃02 ].
Remark: Multiple arrangements of endowments [𝑤01 , 𝑤02 , 𝑤11 (𝜖), 𝑤12 (𝜖), 𝜃01 , 𝜃02 ] are associated with the same distribution of
wealth 𝜂. Can you explain why?
𝑉 ̃ + 𝑏𝑝(𝑘, 𝑏)
where 𝑝(𝑘, 𝑏) is the price of one unit of the bond when a firm with 𝑘 units of physical capital issues 𝑏 bonds.
We continue to assume that there are complete markets in Arrow securities with pricing kernel 𝑞(𝜖).
A version of the no-arbitrage-in-equilibrium argument that we presented earlier implies that the value of equity and the
price of bonds are
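$$
\tilde V = A k^{\alpha} \int_{\epsilon^*}^{\infty} e^{\epsilon} q(\epsilon)\, d\epsilon - b \int_{\epsilon^*}^{\infty} q(\epsilon)\, d\epsilon
$$
$$
p(k, b) = \frac{A k^{\alpha}}{b} \int_{-\infty}^{\epsilon^*} e^{\epsilon} q(\epsilon)\, d\epsilon + \int_{\epsilon^*}^{\infty} q(\epsilon)\, d\epsilon
$$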
which is the same expression that we obtained above when we assumed that the firm issued only equity.
We thus obtain a version of the celebrated Modigliani-Miller theorem [Modigliani and Miller, 1958] about firms’ finance:
Modigliani-Miller theorem:
• The value of a firm is independent of the mix of equity and bonds that it uses to finance its physical capital.
• The firm’s decision about how much physical capital to purchase does not depend on whether it finances those
purchases by issuing bonds or equity
• The firm’s choice of whether to finance itself by issuing equity or bonds is indeterminate
Please note the role of the assumption of complete markets in Arrow securities in substantiating these claims.
In Equilibrium Capital Structures with Incomplete Markets, we will assume that markets are (very) incomplete – we’ll shut
down markets in almost all Arrow securities.
That will pull the rug from underneath the Modigliani-Miller theorem.
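The Modigliani-Miller result can be illustrated with a small numerical check. The sketch below uses a discrete set of states, in the spirit of the probability-mass-function approximation mentioned earlier; the state grid, the pricing kernel `q`, and the parameter values are hypothetical choices made only for illustration. It prices equity and bonds state by state and confirms that the sum $\tilde V + b\,p(k, b)$ does not depend on 𝑏.

```python
import numpy as np

# Illustrative technology parameters and capital (assumptions)
A, alpha, k = 2.5, 0.6, 1.0

# Discrete states and a hypothetical set of Arrow security prices
eps = np.linspace(-1.5, 1.5, 201)
q = np.exp(-0.5 * eps**2)
q = 0.95 * q / q.sum()

output = A * k**alpha * np.exp(eps)          # firm output in each state

def firm_value(b):
    "Value of equity plus market value of b bonds at state prices q."
    equity_value = np.sum(np.maximum(output - b, 0.0) * q)
    bond_price = np.sum(np.minimum(output / b, 1.0) * q)
    return equity_value + b * bond_price

values = [firm_value(b) for b in (0.5, 1.0, 2.0, 4.0)]
all_equity_value = np.sum(output * q)        # value when only equity is issued

print(values)
print(all_equity_value)
```

The equality holds state by state because equity receives max(output − 𝑏, 0) and bondholders receive min(output, 𝑏), which always add up to output.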
38.3 Code
We create a class object BCG_complete_markets to compute equilibrium allocations of the complete market BCG
model given a list of parameter values.
It consists of 4 functions that do the following things:
• opt_k computes the planner’s optimal capital 𝐾
– First, create a grid for capital.
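– Then for each value of capital stock in the grid, compute the left side of the planner’s first-order necessary
condition for 𝑘, that is,
$$
\beta\alpha A K^{\alpha-1} \int \left(\frac{w_1(\epsilon) + A K^{\alpha} e^{\epsilon}}{w_0 - K}\right)^{-\gamma} e^{\epsilon} g(\epsilon)\, d\epsilon - 1 = 0
$$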
where
𝐶0 = 𝑤0 − 𝐾
𝐶1 (𝜖) = 𝑤1 (𝜖) + 𝐴𝐾 𝛼 𝑒𝜖
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from numba import njit, prange
from quantecon.optimize import root_finding
# Other parameters
self.𝜓 = 𝜓
self.𝛼 = 𝛼
self.A = A
self.𝜇 = 𝜇
self.𝜎 = 𝜎
self.𝛽 = 𝛽
# Utility
# Production
self.f = njit(lambda k: A * (k ** 𝛼))
self.Y = lambda 𝜖, k: np.exp(𝜖) * self.f(k)
# Initial endowments
self.w10 = w10
self.w20 = w20
self.w0 = w10 + w20
# Initial holdings
self.𝜃10 = 𝜃10
self.𝜃20 = 𝜃20
# Endowments at t=1
w11 = njit(lambda 𝜖: np.exp(-𝜒1*𝜇 - 0.5*(𝜒1**2)*(𝜎**2) + 𝜒1*𝜖))
w21 = njit(lambda 𝜖: np.exp(-𝜒2*𝜇 - 0.5*(𝜒2**2)*(𝜎**2) + 𝜒2*𝜖))
self.w11 = w11
self.w21 = w21
# Normal PDF
self.g = lambda x: norm.pdf(x, loc=𝜇, scale=𝜎)
# Integration
x, self.weights = np.polynomial.hermite.hermgauss(nb_points_integ)
self.points_integral = np.sqrt(2) * 𝜎 * x + 𝜇
self.k_foc = k_foc_factory(self)
# Grid for k
kgrid = np.linspace(1e-4, w0-1e-4, 100)
return kk
if k is None:
k = self.opt_k()
𝜂 = num / denom
def k_foc_factory(model):
𝜓 = model.𝜓
f = model.f
𝛽 = model.𝛽
𝛼 = model.𝛼
A = model.A
𝜓 = model.𝜓
w0 = model.w0
𝜇 = model.𝜇
𝜎 = model.𝜎
weights = model.weights
points_integral = model.points_integral
@njit
def integrand(𝜖, 𝜒1, 𝜒2, k=1e-4):
fk = f(k)
return (w1(𝜖, 𝜒1, 𝜒2) + np.exp(𝜖) * fk) ** (-𝜓) * np.exp(𝜖)
@njit
def k_foc(k, 𝜒1, 𝜒2):
int_k = np.sum(weights * integrand(points_integral, 𝜒1, 𝜒2, k=k)) / np.sqrt(np.pi)
return val
return k_foc
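The planner's first-order condition that `opt_k` works with can also be solved in a few self-contained lines. This is a minimal sketch and not the lecture's implementation: it uses SciPy's `brentq` root finder instead of a grid, and every parameter value below is an assumption chosen for illustration. As in the class above, the integral is approximated with Gauss-Hermite quadrature.

```python
import numpy as np
from scipy.optimize import brentq

# Illustrative parameter values (assumptions)
beta, alpha, A, gamma = 0.96, 0.6, 2.5, 3.0
mu, sigma = -0.025, 0.4
chi1, chi2 = 0.0, 0.9
w0 = 2.0                                   # aggregate time-0 endowment

# Gauss-Hermite nodes rescaled for eps ~ N(mu, sigma^2)
nodes, weights = np.polynomial.hermite.hermgauss(30)
eps = np.sqrt(2) * sigma * nodes + mu

def w1(eps):
    "Aggregate time-1 endowment: sum of the two types' endowments."
    return sum(np.exp(-chi * mu - 0.5 * chi**2 * sigma**2 + chi * eps)
               for chi in (chi1, chi2))

def planner_foc(K):
    "Left side of the planner's first-order condition for capital."
    C0 = w0 - K
    C1 = w1(eps) + A * K**alpha * np.exp(eps)
    integral = np.sum(weights * (C1 / C0) ** (-gamma) * np.exp(eps)) / np.sqrt(np.pi)
    return beta * alpha * A * K ** (alpha - 1) * integral - 1

# The condition is positive near K = 0 and close to -1 near K = w0
K_star = brentq(planner_foc, 1e-6, w0 - 1e-6)
print(K_star, planner_foc(K_star))
```

A bracketing method is convenient here because the sign of the condition at the two ends of the interval is known in advance.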
38.3.1 Examples
1st example
Let’s plot the agents’ time-1 endowments with respect to shocks to see the difference in the two models:
fig, ax = plt.subplots(1,2,figsize=(14,6))
ax[0].plot(epsgrid, mdl1.w11(epsgrid), color='black', label="Agent 1's endowment")
ax[0].plot(epsgrid, mdl1.w21(epsgrid), color='blue', label="Agent 2's endowment")
ax[0].plot(epsgrid, mdl1.Y(epsgrid,1), color='red', label=r'Production with $k=1$')
ax[0].set_xlim([-1,1])
ax[0].set_ylim([0,7])
ax[0].set_xlabel(r'$\epsilon$',fontsize=12)
ax[0].set_title(r'Model with $\chi_1 = 0$, $\chi_2 = 0.9$')
ax[0].legend()
ax[0].grid()
plt.show()
Let’s also compare the optimal capital stock, 𝑘, and optimal time-0 consumption of agent 2, 𝑐02 , for the two models:
# Print optimal k
kk_1 = mdl1.opt_k()
kk_2 = mdl2.opt_k()
2nd example
In the second example, we illustrate how the optimal choice of 𝑘 is influenced by the correlation parameter 𝜒𝑖 .
We will need to install the plotly package for 3D illustration. See https://plotly.com/python/getting-started/ for further
instructions.
# Mesh grid of
N = 30
𝜒1grid, 𝜒2grid = np.meshgrid(np.linspace(-1,1,N),
np.linspace(-1,1,N))
w0 = mdl1.w0
@njit(parallel=True)
def fill_k_grid(kgrid):
# Loop: Compute optimal k and
for i in prange(N):
for j in prange(N):
X1 = 𝜒1grid[i, j]
X2 = 𝜒2grid[i, j]
k = root_finding.newton_secant(k_foc, 1e-2, args=(X1, X2)).root
kgrid[i, j] = k
%%time
fill_k_grid(kgrid)
%%time
# Second-run
fill_k_grid(kgrid)
CPU times: user 7.65 ms, sys: 985 μs, total: 8.64 ms
Wall time: 2.87 ms
# Plot optimal k
fig = go.Figure(data=[go.Surface(x=𝜒1grid, y=𝜒2grid, z=kgrid)])
fig.update_layout(scene = dict(xaxis_title='x - 𝜒1',
yaxis_title='y - 𝜒2',
zaxis_title='z - k',
aspectratio=dict(x=1,y=1,z=1)))
fig.update_layout(width=500,
height=500,
margin=dict(l=50, r=50, b=65, t=90))
fig.update_layout(scene_camera=dict(eye=dict(x=2, y=-2, z=1.5)))
THIRTYNINE
In addition to what’s in Anaconda, this lecture will need the following libraries:
39.1 Introduction
This is an extension of an earlier lecture Irrelevance of Capital Structure with Complete Markets about a complete markets
model.
In contrast to that lecture, this one describes an instance of a model authored by Bisin, Clementi, and Gottardi [Bisin et
al., 2018] in which financial markets are incomplete.
Instead of being able to trade equities and a full set of one-period Arrow securities as they can in Irrelevance of Capital
Structure with Complete Markets, here consumers and firms trade only equity and a bond.
It is useful to watch how outcomes differ in the two settings.
In the complete markets economy in Irrelevance of Capital Structure with Complete Markets
• there is a unique stochastic discount factor that prices all assets
• consumers’ portfolio choices are indeterminate
• firms’ financial structures are indeterminate, so the model embodies an instance of a Modigliani-Miller irrelevance
theorem [Modigliani and Miller, 1958]
• the aggregate of all firms’ financial structures is indeterminate, a consequence of there being redundant assets
In the incomplete markets economy studied here
• there is not a unique equilibrium stochastic discount factor
• different stochastic discount factors price different assets
• consumers’ portfolio choices are determinate
• while individual firms’ financial structures are indeterminate, thus conforming to part of a Modigliani-Miller
theorem [Modigliani and Miller, 1958], the aggregate of all firms’ financial structures is determinate.
A Big K, little k analysis played an important role in the previous lecture Irrelevance of Capital Structure with
Complete Markets.
A more subtle version of a Big K, little k features in the BCG incomplete markets environment here.
We use it to convey the heart of what BCG call a rational conjectures equilibrium in which conjectures are about
equilibrium pricing functions in regions of the state space that an average consumer or firm does not visit in equilibrium.
Note that the absence of complete markets means that now we cannot compute competitive equilibrium prices and
allocations by first solving the simple planning problem that we did in Irrelevance of Capital Structure with Complete
Markets.
Instead, we compute an equilibrium by solving a system of simultaneous inequalities.
(Here we do not address the interesting question of whether there is a different planning problem that we could use to
compute a competitive equilibrium allocation.)
39.1.1 Setup
We adopt specifications of preferences and technologies used by Bisin, Clementi, and Gottardi (2018) [Bisin et al., 2018]
and in our earlier lecture on a complete markets version of their model.
The economy lasts for two periods, 𝑡 = 0, 1.
There are two types of consumers named 𝑖 = 1, 2.
A scalar random variable 𝜖 affects both
• a representative firm’s physical return 𝑓(𝑘)𝑒𝜖 in period 1 from investing 𝑘 ≥ 0 in capital in period 0.
• period 1 endowments 𝑤1𝑖 (𝜖) of the consumption good for agents 𝑖 = 1 and 𝑖 = 2.
39.1.2 Ownership
A consumer of type 𝑖 is endowed with 𝑤0𝑖 units of the time 0 good and 𝑤1𝑖 (𝜖) of the time 1 good when the random variable
takes value 𝜖.
At the start of period 0, a consumer of type 𝑖 also owns 𝜃0𝑖 shares of a representative firm.
As in the companion lecture Irrelevance of Capital Structure with Complete Markets that studies a complete markets version
of the model, we follow BCG in assuming that there are unit measures of
• consumers of type 𝑖 = 1
• consumers of type 𝑖 = 2
• firms with access to a production technology that converts 𝑘 units of time 0 good into 𝐴𝑘𝛼 𝑒𝜖 units of the time 1
good in random state 𝜖
Thus, let 𝜔 ∈ [0, 1] index a particular consumer of type 𝑖.
Then define Big $C^i$ as
$$
C^i = \int_0^1 c^i(\omega)\, d\omega
$$
with components
$$
C_0^i = \int_0^1 c_0^i(\omega)\, d\omega
$$
$$
C_1^i(\epsilon) = \int_0^1 c_1^i(\epsilon; \omega)\, d\omega
$$
In the same spirit, let $\zeta \in [0, 1]$ index a particular firm and let firm $\zeta$ purchase $k(\zeta)$ units of capital and issue $b(\zeta)$ bonds.
Then define Big $K$ and Big $B$ as
$$
K = \int_0^1 k(\zeta)\, d\zeta, \quad B = \int_0^1 b(\zeta)\, d\zeta
$$
The assumption that there are equal measures of our three types of agents justifies our assumption that each individual
agent is a powerless price taker:
• an individual consumer chooses its own (infinitesimal) part 𝑐𝑖 (𝜔) of 𝐶 𝑖 taking prices as given
• an individual firm chooses its own (infinitesimal) part $k(\zeta)$ of $K$ and $b(\zeta)$ of $B$ taking pricing functions as given
• However, equilibrium prices depend on the Big K, Big B, Big C objects 𝐾, 𝐵, and 𝐶
The assumption about measures of agents is a powerful device for making a host of competitive agents take as given the
equilibrium prices that turn out to be determined by the decisions of hosts of agents who are just like them.
We call an equilibrium symmetric if
• all type 𝑖 consumers choose the same consumption profiles so that 𝑐𝑖 (𝜔) = 𝐶 𝑖 for all 𝜔 ∈ [0, 1]
• all firms choose the same levels of 𝑘 and 𝑏 so that 𝑘(𝜁) = 𝐾, 𝑏(𝜁) = 𝐵 for all 𝜁 ∈ [0, 1]
In this lecture, we restrict ourselves to describing symmetric equilibria.
39.1.4 Endowments
𝑤0 = 𝑤01 + 𝑤02
𝑤1 (𝜖) = 𝑤11 (𝜖) + 𝑤12 (𝜖) in state 𝜖
39.1.5 Feasibility:
39.1.6 Parameterizations
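$$
\epsilon \sim \mathcal{N}(\mu, \sigma^2)
$$
$$
u(c) = \frac{c^{1-\gamma}}{1-\gamma}
$$
$$
w_1^i(\epsilon) = e^{-\chi_i \mu - .5\chi_i^2\sigma^2 + \chi_i \epsilon}, \quad \chi_i \in [0, 1]
$$
Sometimes instead of assuming $\epsilon \sim g(\epsilon) = \mathcal{N}(0, \sigma^2)$, we’ll assume that $g(\cdot)$ is a probability mass function that serves as
a discrete approximation to a standardized normal density.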
39.1.7 Preferences:
A consumer of type 𝑖 orders period 0 consumption 𝑐0𝑖 and state 𝜖-period 1 consumption 𝑐𝑖 (𝜖) by
The two types of agents’ period 1 endowments have different correlations with the physical return on capital.
Endowment differences give agents incentives to trade risks that in the complete market version of the model showed up
in their demands for equity and in their demands and supplies of one-period Arrow securities.
In the incomplete-markets setting under study here, these differences show up in differences in the two types of consumers’
demands for a typical firm’s bonds and equity, the only two assets that agents can now trade.
Markets are incomplete: ex cathedra we the model builders declare that only equities and bonds issued by representative
firms can be traded.
Let 𝜃𝑖 and 𝜉 𝑖 be a consumer of type 𝑖’s post-trade holdings of equity and bonds, respectively.
A firm issues bonds promising to pay 𝑏 units of consumption at time 𝑡 = 1 and purchases 𝑘 units of physical capital at
time 𝑡 = 0.
When 𝑒𝜖 𝐴𝑘𝛼 < 𝑏 at time 1, the firm defaults and its output is divided equally among bondholders.
Evidently, when the productivity shock $\epsilon < \epsilon^* = \log\left(\frac{b}{A k^{\alpha}}\right)$, the firm defaults on its debt.
Payoffs to equity and debt at date 1 as functions of the productivity shock 𝜖 are thus
39.2.1 Consumers
Each consumer of type 𝑖 is endowed with 𝑤0𝑖 of the time 0 consumption good, 𝑤1𝑖 (𝜖) of the time 1, state 𝜖 consumption
good and also owns a fraction 𝜃0𝑖 ∈ (0, 1) of the initial value of a representative firm, where 𝜃01 + 𝜃02 = 1.
The initial value of a representative firm is 𝑉 (an object to be determined in a rational expectations equilibrium).
Consumer 𝑖 buys $\theta^i$ shares of equity and buys bonds worth $\check p\, \xi^i$ where $\check p$ is the bond price.
Being a price-taker, a consumer takes $V$, $\check q$, $\check p$, and $K, B$ as given.
Consumers know that equilibrium payoff functions for bonds and equities take the form
$$
d^e(K, B; \epsilon) = \max \left\{ e^\epsilon A K^\alpha - B, 0 \right\}
$$

$$
d^b(K, B; \epsilon) = \min \left\{ \frac{e^\epsilon A K^\alpha}{B}, 1 \right\}
$$
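The two payoff functions and the default threshold can be written down directly. In the sketch below the function names and the values of `K` and `B` are illustrative assumptions; `A` and `alpha` are set to the default values listed for the lecture's class.

```python
import numpy as np

def default_threshold(K, B, A, alpha):
    """eps* = log(B / (A K^alpha)); the firm defaults when eps < eps*."""
    return np.log(B / (A * K**alpha))

def d_equity(K, B, eps, A, alpha):
    """Equity payoff: max(e^eps A K^alpha - B, 0)."""
    return np.maximum(np.exp(eps) * A * K**alpha - B, 0.0)

def d_bond(K, B, eps, A, alpha):
    """Payoff per unit of bonds: min(e^eps A K^alpha / B, 1)."""
    return np.minimum(np.exp(eps) * A * K**alpha / B, 1.0)

A, alpha = 2.5, 0.6
K, B = 0.15, 0.5                       # illustrative capital and debt
eps = np.linspace(-1.0, 1.0, 5)
eps_star = default_threshold(K, B, A, alpha)

output = np.exp(eps) * A * K**alpha
total_paid = d_equity(K, B, eps, A, alpha) + B * d_bond(K, B, eps, A, alpha)
print(eps_star, np.allclose(total_paid, output))
```

In every state the equity payoff plus the payoff on all `B` bonds exhausts the firm's output, which is a convenient check on the two formulas.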
Consumer 𝑖’s optimization problem is
When individual firms solve their optimization problems, they take big 𝐶 𝑖 ’s as fixed objects that they don’t influence.
A representative firm faces a price function 𝑞(𝑘, 𝑏) for its equity and a price function 𝑝(𝑘, 𝑏) per unit of bonds that satisfy
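$$
q(k, b) = \max_i \beta \int \frac{u'(C_1^i(\epsilon))}{u'(C_0^i)} d^e(k, b; \epsilon) g(\epsilon) \, d\epsilon
$$

$$
p(k, b) = \max_i \beta \int \frac{u'(C_1^i(\epsilon))}{u'(C_0^i)} d^b(k, b; \epsilon) g(\epsilon) \, d\epsilon
$$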
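A minimal sketch of these two price functions on a discrete shock grid follows. It assumes CRRA utility with curvature `gamma`, takes the consumption plans `C0` and `C1` as given arrays, and replaces the integral by a probability-weighted sum; the function name and the numerical inputs are hypothetical.

```python
import numpy as np

def price_functions(k, b, C0, C1, eps, pmf, beta, gamma, A, alpha):
    """Equity and bond prices as the maximum valuation across the two agents.

    C0 : sequence of two period-0 consumptions
    C1 : array of shape (2, n) with period-1 consumption in each state
    """
    output = np.exp(eps) * A * k**alpha
    d_e = np.maximum(output - b, 0.0)
    d_b = np.minimum(output / b, 1.0)
    q_vals, p_vals = [], []
    for i in range(2):
        # beta * u'(C1)/u'(C0) under CRRA utility
        imrs = beta * (C1[i] / C0[i]) ** (-gamma)
        q_vals.append(np.sum(imrs * d_e * pmf))
        p_vals.append(np.sum(imrs * d_b * pmf))
    return max(q_vals), max(p_vals)

beta, gamma, A, alpha = 0.96, 3.0, 2.5, 0.6
eps = np.linspace(-1.2, 1.2, 201)
pmf = np.exp(-0.5 * (eps / 0.4) ** 2)
pmf /= pmf.sum()

# Illustrative plans: both agents consume one unit in every date and state
C0 = [1.0, 1.0]
C1 = np.ones((2, eps.size))
k, b = 0.15, 0.5
q, p = price_functions(k, b, C0, C1, eps, pmf, beta, gamma, A, alpha)
expected_output = np.sum(np.exp(eps) * A * k**alpha * pmf)
print(q, p)
```

With constant consumption the stochastic discount factor is just `beta`, so the value of all claims on the firm, `q + b*p`, equals discounted expected output.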
39.2.3 Firms
The firm chooses capital 𝑘 and debt 𝑏 to maximize its market value:
Attributing value maximization to the firm is a good idea because in equilibrium consumers of both types want a firm to
maximize its value.
In the special quantitative examples studied here
• consumers of types 𝑖 = 1, 2 both hold equity
• only consumers of type 𝑖 = 2 hold debt; consumers of type 𝑖 = 1 hold none.
These outcomes occur because we follow BCG and set parameters so that a type 2 consumer’s stochastic endowment of
the consumption good in period 1 is more correlated with the firm’s output than is a type 1 consumer’s.
This gives consumers of type 2 a motive to hedge their second period endowment risk by holding bonds (they also choose
to hold some equity).
These outcomes mean that the pricing functions end up satisfying
The firm’s first-order necessary conditions with respect to 𝑘 and 𝑏, respectively, are
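$$
k: \quad -1 + \frac{\partial q(k, b)}{\partial k} + b \frac{\partial p(k, b)}{\partial k} = 0
$$

$$
b: \quad \frac{\partial q(k, b)}{\partial b} + p(k, b) + b \frac{\partial p(k, b)}{\partial b} = 0
$$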
We use the Leibniz integral rule several times to arrive at the following derivatives:
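$$
\frac{\partial q(k, b)}{\partial k} = \beta \alpha A k^{\alpha - 1} \int_{\epsilon^*}^{\infty} \frac{u'(C_1^i(\epsilon))}{u'(C_0^i)} e^\epsilon g(\epsilon) \, d\epsilon, \quad i = 1, 2
$$

$$
\frac{\partial q(k, b)}{\partial b} = -\beta \int_{\epsilon^*}^{\infty} \frac{u'(C_1^i(\epsilon))}{u'(C_0^i)} g(\epsilon) \, d\epsilon, \quad i = 1, 2
$$

$$
\frac{\partial p(k, b)}{\partial k} = \beta \alpha \frac{A k^{\alpha - 1}}{b} \int_{-\infty}^{\epsilon^*} \frac{u'(C_1^2(\epsilon))}{u'(C_0^2)} e^\epsilon g(\epsilon) \, d\epsilon
$$

$$
\frac{\partial p(k, b)}{\partial b} = -\beta \frac{A k^\alpha}{b^2} \int_{-\infty}^{\epsilon^*} \frac{u'(C_1^2(\epsilon))}{u'(C_0^2)} e^\epsilon g(\epsilon) \, d\epsilon
$$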
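One way to gain confidence in such Leibniz-rule derivatives is to compare them with a finite difference. The sketch below does this for the derivative of the equity price with respect to debt, under the simplifying assumption (made only for this check) that the ratio of marginal utilities is identically one and that $g$ is an untruncated normal density with the default $\mu$ and $\sigma$. The boundary term of the Leibniz rule drops out because the equity payoff is zero at $\epsilon^*$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

beta, A, alpha = 0.96, 2.5, 0.6
mu, sigma = -0.025, 0.4
upper = mu + 10 * sigma          # effectively +infinity for this density

def g(eps):
    return norm.pdf(eps, loc=mu, scale=sigma)

def q(k, b):
    """Equity price when u'(C1)/u'(C0) = 1 (assumption for this check only)."""
    eps_star = np.log(b / (A * k**alpha))
    payoff = lambda e: (np.exp(e) * A * k**alpha - b) * g(e)
    return beta * quad(payoff, eps_star, upper, epsabs=1e-12)[0]

def dq_db(k, b):
    """Derivative from the Leibniz rule: -beta * integral of g over [eps*, inf)."""
    eps_star = np.log(b / (A * k**alpha))
    return -beta * quad(g, eps_star, upper, epsabs=1e-12)[0]

k, b, h = 0.15, 0.5, 1e-4
finite_diff = (q(k, b + h) - q(k, b - h)) / (2 * h)
analytic = dq_db(k, b)
print(finite_diff, analytic)
```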
Special case: We confine ourselves to a special case in which both types of consumer hold positive equities so that $\frac{\partial q(k,b)}{\partial k}$ and $\frac{\partial q(k,b)}{\partial b}$ are related to rates of intertemporal substitution for both agents.
Substituting these partial derivatives into the above first-order conditions for 𝑘 and 𝑏, respectively, we obtain the following
versions of those first order conditions:
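$$
k: \quad -1 + \beta \alpha A k^{\alpha - 1} \int_{-\infty}^{\infty} \frac{u'(C_1^2(\epsilon))}{u'(C_0^2)} e^\epsilon g(\epsilon) \, d\epsilon = 0 \tag{39.2}
$$

$$
b: \quad \int_{\epsilon^*}^{\infty} \left( \frac{u'(C_1^1(\epsilon))}{u'(C_0^1)} \right) g(\epsilon) \, d\epsilon = \int_{\epsilon^*}^{\infty} \left( \frac{u'(C_1^2(\epsilon))}{u'(C_0^2)} \right) g(\epsilon) \, d\epsilon \tag{39.3}
$$

where again recall that $\epsilon^*(k, b) \equiv \log\left(\frac{b}{A k^\alpha}\right)$.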
Taking 𝐶0𝑖 , 𝐶1𝑖 (𝜖) as given, these are two equations that we want to solve for the firm’s optimal decisions 𝑘, 𝑏.
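To make the two conditions concrete, here is a minimal sketch that evaluates the residuals of (39.2) and (39.3) on a discrete shock grid, taking consumption plans as given. The function name, the CRRA curvature `gamma`, and all numerical inputs are assumptions for illustration; this is not the lecture's solver.

```python
import numpy as np

def firm_foc_residuals(k, b, C0, C1, eps, pmf, beta, gamma, A, alpha):
    """Residuals of the firm's conditions (39.2) and (39.3).

    C0 : pair of period-0 consumptions; C1 : array of shape (2, n).
    """
    eps_star = np.log(b / (A * k**alpha))
    no_default = eps >= eps_star
    # Ratios u'(C1)/u'(C0) for each agent under CRRA utility
    ratio = [(C1[i] / C0[i]) ** (-gamma) for i in range(2)]
    k_res = beta * alpha * A * k**(alpha - 1) * np.sum(ratio[1] * np.exp(eps) * pmf) - 1.0
    b_res = (np.sum(ratio[0][no_default] * pmf[no_default])
             - np.sum(ratio[1][no_default] * pmf[no_default]))
    return k_res, b_res

beta, gamma, A, alpha = 0.96, 3.0, 2.5, 0.6
eps = np.linspace(-1.2, 1.2, 201)
pmf = np.exp(-0.5 * (eps / 0.4) ** 2)
pmf /= pmf.sum()

# Illustrative case: identical, constant consumption for both agents
C0 = [1.0, 1.0]
C1 = np.ones((2, eps.size))
mean_exp = np.sum(np.exp(eps) * pmf)
k_star = (beta * alpha * A * mean_exp) ** (1 / (1 - alpha))   # solves (39.2) in this case
k_res, b_res = firm_foc_residuals(k_star, 0.5, C0, C1, eps, pmf, beta, gamma, A, alpha)
print(k_res, b_res)
```

In this degenerate case the capital condition can be solved in closed form and the debt condition holds for any `b`, which is why both residuals vanish.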
Before displaying our Python code for computing a BCG incomplete markets equilibrium, we’ll sketch some pseudo code
that describes its logical flow.
Here goes:
1. Set upper and lower bounds for firm value as 𝑉ℎ and 𝑉𝑙 , for capital as 𝑘ℎ and 𝑘𝑙 , and for debt as 𝑏ℎ and 𝑏𝑙 .
2. Conjecture firm value $V = \frac{1}{2}(V_h + V_l)$.
3. Conjecture debt level $b = \frac{1}{2}(b_h + b_l)$.
4. Conjecture capital $k = \frac{1}{2}(k_h + k_l)$.
5. Compute the default threshold $\epsilon^* \equiv \log\left(\frac{b}{A k^\alpha}\right)$.
6. (In this step we abuse notation by freezing 𝑉 , 𝑘, 𝑏 and in effect temporarily treating them as Big 𝐾, 𝐵 values. Thus, in this step 6 little 𝑘, 𝑏 are frozen at the guessed values of 𝐾, 𝐵.) Fixing the values of 𝑉 , 𝑏 and 𝑘, compute optimal choices of consumption $c^i$ with consumers’ FOCs. Assume that only agent 2 holds debt, $\xi^2 = b$, and that both agents hold equity, $0 < \theta^i < 1$ for 𝑖 = 1, 2.
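7. Set high and low bounds for equity holdings for agent 1 as $\theta^1_h$ and $\theta^1_l$. Guess $\theta^1 = \frac{1}{2}(\theta^1_h + \theta^1_l)$, and $\theta^2 = 1 - \theta^1$. While $|\theta^1_h - \theta^1_l|$ is large:
• Compute agent 1’s valuation of the equity claim with a fixed-point iteration:
$$
q_1 = \beta \int \frac{u'(c_1^1(\epsilon))}{u'(c_0^1)} d^e(k, b; \epsilon) g(\epsilon) \, d\epsilon
$$
where
$$
c_1^1(\epsilon) = w_1^1(\epsilon) + \theta^1 d^e(k, b; \epsilon)
$$
and
$$
c_0^1 = w_0^1 + \theta_0^1 V - q_1 \theta^1
$$
• Compute agent 2’s valuation of the bond claim with a fixed-point iteration:
$$
p = \beta \int \frac{u'(c_1^2(\epsilon))}{u'(c_0^2)} d^b(k, b; \epsilon) g(\epsilon) \, d\epsilon
$$
where
$$
c_1^2(\epsilon) = w_1^2(\epsilon) + \theta^2 d^e(k, b; \epsilon) + b
$$
and
$$
c_0^2 = w_0^2 + \theta_0^2 V - q_1 \theta^2 - p b
$$
• Compute agent 2’s valuation of the equity claim with a fixed-point iteration:
$$
q_2 = \beta \int \frac{u'(c_1^2(\epsilon))}{u'(c_0^2)} d^e(k, b; \epsilon) g(\epsilon) \, d\epsilon
$$
where
$$
c_1^2(\epsilon) = w_1^2(\epsilon) + \theta^2 d^e(k, b; \epsilon) + b
$$
and
$$
c_0^2 = w_0^2 + \theta_0^2 V - q_2 \theta^2 - p b
$$
• If $q_1 > q_2$, set $\theta^1_l = \theta^1$; otherwise, set $\theta^1_h = \theta^1$.
• Repeat the bullet steps above until $|\theta^1_h - \theta^1_l|$ is small.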
8. Set bond price as 𝑝 and equity price as 𝑞 = max(𝑞1 , 𝑞2 ).
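9. Compute optimal choices of consumption:
$$
\begin{aligned}
c_0^1 &= w_0^1 + \theta_0^1 V - q \theta^1 \\
c_0^2 &= w_0^2 + \theta_0^2 V - q \theta^2 - p b \\
c_1^1(\epsilon) &= w_1^1(\epsilon) + \theta^1 d^e(k, b; \epsilon) \\
c_1^2(\epsilon) &= w_1^2(\epsilon) + \theta^2 d^e(k, b; \epsilon) + b
\end{aligned}
$$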
10. (Here we confess to abusing notation again, but now in a different way. In this step, we interpret the frozen $c^i$s as Big $C^i$. We do this to solve the firm’s problem.) Fixing the values of $c_0^i$ and $c_1^i(\epsilon)$, compute optimal choices of capital 𝑘 and debt level 𝑏 using the firm’s first-order necessary conditions.
11. Compute deviations from the firm’s FONC for capital 𝑘 as:
$$
k_{foc} = \beta \alpha A k^{\alpha - 1} \left( \int \frac{u'(c_1^2(\epsilon))}{u'(c_0^2)} e^\epsilon g(\epsilon) \, d\epsilon \right) - 1
$$
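The fixed-point problems in step 7 each involve a single unknown price, so they can be solved by bisection. The sketch below does this for agent 1's equity valuation on a discrete shock grid. The function name `solve_q1`, the grid, and the values of `k`, `b`, `V` and `theta1` are illustrative assumptions; `psi` plays the role of the CRRA curvature parameter.

```python
import numpy as np

def solve_q1(theta1, ww10, w11, d_e, pmf, beta, psi, tol=1e-10):
    """Bisect on q so that q = beta * E[u'(c1)/u'(c0) * d_e], with c0 = ww10 - q*theta1.

    ww10 is period-0 wealth w_0^1 + theta_0^1 * V; w11 and d_e are arrays over states.
    """
    c1 = w11 + theta1 * d_e
    const = beta * np.sum(c1 ** (-psi) * d_e * pmf)
    lo, hi = 0.0, ww10 / theta1          # c0 stays nonnegative on this interval
    while hi - lo > tol:
        q = 0.5 * (lo + hi)
        rhs = const * (ww10 - q * theta1) ** psi   # const / u'(c0)
        if rhs > q:
            lo = q                       # valuation exceeds the guess: raise the guess
        else:
            hi = q
    return 0.5 * (lo + hi)

beta, psi, A, alpha = 0.96, 3.0, 2.5, 0.6
mu, sigma = -0.025, 0.4
eps = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 201)
pmf = np.exp(-0.5 * ((eps - mu) / sigma) ** 2)
pmf /= pmf.sum()

k, b, V, theta1 = 0.15, 0.5, 0.1, 0.65   # illustrative guesses
ww10 = 0.9 + 0.5 * V
w11 = np.ones_like(eps)                  # chi_1 = 0 gives a constant endowment
d_e = np.maximum(np.exp(eps) * A * k**alpha - b, 0.0)

q1 = solve_q1(theta1, ww10, w11, d_e, pmf, beta, psi)
const = beta * np.sum((w11 + theta1 * d_e) ** (-psi) * d_e * pmf)
residual = const * (ww10 - q1 * theta1) ** psi - q1
print(q1, residual)
```

Bisection works here because the right side of the fixed-point equation is decreasing in `q` while the left side is increasing, so the two cross exactly once on the bracketing interval.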
39.5 Code
We create a Python class BCG_incomplete_markets to compute the equilibrium allocations of the incomplete
market BCG model, given a set of parameter values.
The class includes the following methods, i.e., functions:
• solve_eq: solves the BCG model and returns the equilibrium values of capital 𝑘, debt 𝑏 and firm value 𝑉 , as
well as
– agent 1’s equity holdings 𝜃1,∗
– prices 𝑞 ∗ , 𝑝∗
– consumption plans 𝐶01,∗ , 𝐶02,∗ , 𝐶11,∗ (𝜖), 𝐶12,∗ (𝜖).
• eq_valuation: inputs equilibrium consumption plans 𝐶 ∗ and outputs the following valuations for each pair of
(𝑘, 𝑏) in the grid:
– the firm 𝑉 (𝑘, 𝑏)
– the equity 𝑞(𝑘, 𝑏)
– the bond 𝑝(𝑘, 𝑏).
Parameters include:
• 𝜒1 , 𝜒2 : correlation parameter for agent 1 and 2. Default values are respectively 0 and 0.9.
• 𝑤01 , 𝑤02 : initial endowments. Default values are respectively 0.9 and 1.1.
• 𝜃01 , 𝜃02 : initial holding of the firm. Default values are 0.5.
• 𝜓: risk parameter. Default value is 3.
• 𝛼: Production function parameter. Default value is 0.6.
• 𝐴: Productivity of the firm. Default value is 2.5.
• 𝜇, 𝜎: Mean and standard deviation of the shock distribution. Default values are respectively -0.025 and 0.4.
• 𝛽: Discount factor. Default value is 0.96.
• bound: Bound for truncated normal distribution. Default value is 3.
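As a rough illustration of how the last three parameters fit together, the following sketch builds a truncated normal density on $[-\text{bound}, \text{bound}]$ with `scipy.stats.truncnorm`. The standardized limits `ta` and `tb` follow the formula that appears in the class listing; how the class itself stores the density is not shown in the listing, so the names `rv` and `g` here are assumptions.

```python
import numpy as np
from scipy.stats import truncnorm
from scipy.integrate import quad

mu, sigma, bound = -0.025, 0.4, 3

# truncnorm expects limits expressed in standard deviations from loc
ta, tb = (-bound - mu) / sigma, (bound - mu) / sigma
rv = truncnorm(ta, tb, loc=mu, scale=sigma)
g = rv.pdf                                # shock density on [-bound, bound]

total_mass = quad(g, -bound, bound, points=[mu])[0]
mean_shock = rv.mean()
print(total_mass, mean_shock)
```

Because the truncation points lie more than seven standard deviations from the mean, the truncated density is numerically very close to the untruncated normal.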
import numpy as np
from scipy.stats import truncnorm
from scipy.integrate import quad
from numba import njit
class BCG_incomplete_markets:
# Other parameters
self.𝜓1 = 𝜓1
self.𝜓2 = 𝜓2
self.𝛼 = 𝛼
self.A = A
self.𝜇 = 𝜇
self.𝜎 = 𝜎
self.𝛽 = 𝛽
self.bound = bound
# Utility
self.u = njit(lambda c: (c**(1-𝜓)) / (1-𝜓))
# Initial endowments
self.w10 = w10
self.w20 = w20
self.w0 = w10 + w20
# Initial holdings
self.𝜃10 = 𝜃10
self.𝜃20 = 𝜃20
# Endowments at t=1
self.w11 = njit(lambda 𝜖: np.exp(-𝜒1*𝜇 - 0.5*(𝜒1**2)*(𝜎**2) + 𝜒1*𝜖))
self.w21 = njit(lambda 𝜖: np.exp(-𝜒2*𝜇 - 0.5*(𝜒2**2)*(𝜎**2) + 𝜒2*𝜖))
self.w1 = njit(lambda 𝜖: self.w11(𝜖) + self.w21(𝜖))
# Truncated normal
ta, tb = (-bound - 𝜇) / 𝜎, (bound - 𝜇) / 𝜎
#*************************************************************
# Function: Solve for equilibrium of the BCG model
#*************************************************************
def solve_eq(self, print_crit=True):
# Load parameters
𝜓1 = self.𝜓1
𝜓2 = self.𝜓2
𝛼 = self.𝛼
A = self.A
𝛽 = self.𝛽
bound = self.bound
Vl = self.Vl
Vh = self.Vh
kbot = self.kbot
ktop = self.ktop
bbot = self.bbot
btop = self.btop
w10 = self.w10
w20 = self.w20
𝜃10 = self.𝜃10
𝜃20 = self.𝜃20
w11 = self.w11
w21 = self.w21
g = self.g
while V_crit>1e-4:
# We begin by adding the guess for the value of the firm to endowment
V = (Vl+Vh)/2
while b_crit>1e-5:
while k_crit>1e-5:
# Production
fk = A*(k**𝛼)
# Y = lambda 𝜖: np.exp(𝜖)*fk
#**************************************************************
# Compute the prices and allocations consistent with consumers'
# Euler equations
#**************************************************************
#========
# Agent 1
#========
# Holdings
𝜉1 = 0
𝜃1a = 0.3
𝜃1b = 1
𝜃1 = (𝜃1a + 𝜃1b) / 2
#========
# Agent 2
#========
𝜉2 = b - 𝜉1
𝜃2 = 1 - 𝜃1
qq2l = 0
qq2h = ww20
diff = 1
while diff > 1e-7:
qq2 = (qq2l+qq2h)/2
rhs = const_qq2/((ww20-qq2*𝜃2-p*b)**(-𝜓2))
if (rhs > qq2):
qq2l = qq2
else:
qq2h = qq2
diff = abs(qq2l-qq2h)
#================
# Update holdings
#================
if qq1 > qq2:
𝜃1a = 𝜃1
else:
𝜃1b = 𝜃1
#================
# Get consumption
#================
c10 = ww10 - q*𝜃1
c11 = lambda 𝜖: w11(𝜖) + 𝜃1*max(Y(𝜖, fk)-b,0)
c20 = ww20 - q*(1-𝜃1) - p*b
c21 = lambda 𝜖: w21(𝜖) + (1-𝜃1)*max(Y(𝜖, fk)-b,0) + min(Y(𝜖, fk), b)
#*************************************************
# Compute the first order conditions for the firm
#*************************************************
#===========
# Equity FOC
#===========
# Only agent 2's IMRS is relevant
# intk1 = lambda 𝜖: (w21(𝜖) + Y(𝜖, fk))**(-𝜓2)*np.exp(𝜖)*g(𝜖)
# intk2 = lambda 𝜖: (w21(𝜖) + 𝜃2*(Y(𝜖, fk)-b) + b)**(-𝜓2)*np.exp(𝜖)*g(𝜖)
if print_crit:
print("critical value of k: {:.5f}".format(k_crit))
#=========
# Bond FOC
#=========
# intB1 = lambda 𝜖: (w11(𝜖) + 𝜃1*(Y(𝜖, fk) - b))**(-𝜓1)*g(𝜖)
# intB2 = lambda 𝜖: (w21(𝜖) + 𝜃2*(Y(𝜖, fk) - b) + b)**(-𝜓2)*g(𝜖)
if print_crit:
print("#=== critical value of b: {:.5f}".format(b_crit))
if print_crit:
print("#====== critical value of V: {:.5f}".format(V_crit))
print('k,b,p,q,kfoc,bfoc,epstar,V,V_crit')
formattedList = ["%.3f" % member for member in [k,
b,
p,
q,
kfoc,
#*********************************
# Equilibrium values
#*********************************
return kss,bss,Vss,qss,pss,c10ss,c11ss,c20ss,c21ss,𝜃1ss
#*************************************************************
# Function: Equity and bond valuations by different agents
#*************************************************************
def valuations_by_agent(self,
c10, c11, c20, c21,
k, b):
# Load parameters
𝜓1 = self.𝜓1
𝜓2 = self.𝜓2
𝛼 = self.𝛼
A = self.A
𝛽 = self.𝛽
bound = self.bound
Vl = self.Vl
Vh = self.Vh
# Production
fk = A*(k**𝛼)
Y = lambda 𝜖: np.exp(𝜖)*fk
return Q1,Q2,P1,P2
#*************************************************************
# Function: equilibrium valuations for firm, equity, bond
#*************************************************************
def eq_valuation(self, c10, c11, c20, c21, N=30):
# Load parameters
𝜓1 = self.𝜓1
𝜓2 = self.𝜓2
𝛼 = self.𝛼
A = self.A
𝛽 = self.𝛽
bound = self.bound
Vl = self.Vl
Vh = self.Vh
# Create grids
kgrid, bgrid = np.meshgrid(np.linspace(kbot,ktop,N),
np.linspace(bbot,btop,N))
Vgrid = np.zeros_like(kgrid)
Qgrid = np.zeros_like(kgrid)
Pgrid = np.zeros_like(kgrid)
39.6 Examples
Below we show some examples computed with the class BCG_incomplete_markets.
In the first example, we set up an instance of the BCG incomplete markets model with default parameter values.
mdl = BCG_incomplete_markets()
kss,bss,Vss,qss,pss,c10ss,c11ss,c20ss,c21ss,𝜃1ss = mdl.solve_eq(print_crit=False)
print(-kss+qss+pss*bss)
print(Vss)
print(𝜃1ss)
0.10073912888808995
0.100830078125
0.98564453125
Python reports to us that the equilibrium firm value is 𝑉 = 0.101, with capital 𝑘 = 0.151 and debt 𝑏 = 0.484.
Let’s verify some things that have to be true if our algorithm has truly found an equilibrium.
Thus, let’s see if the firm is actually maximizing its firm value given the equilibrium pricing function 𝑞(𝑘, 𝑏) for equity
and 𝑝(𝑘, 𝑏) for bonds.
Up to the approximation involved in using a discrete grid, these numbers give us comfort that the firm does indeed seem
to be maximizing its value at the top of the value hill on the (𝑘, 𝑏) plane that it faces.
Below we will plot the firm’s value as a function of 𝑘, 𝑏.
We’ll also plot the equilibrium price functions 𝑞(𝑘, 𝑏) and 𝑝(𝑘, 𝑏).
# Firm Valuation
fig = go.Figure(data=[go.Scatter3d(x=[kss],
y=[bss],
z=[Vss],
mode='markers',
marker=dict(size=3, color='red')),
go.Surface(x=kgrid,
y=bgrid,
z=Vgrid,
colorscale='Greens',opacity=0.6)])
fig.update_layout(scene = dict(
xaxis_title='x - Capital k',
yaxis_title='y - Debt b',
zaxis_title='z - Firm Value V',
aspectratio = dict(x=1,y=1,z=1)),
width=700,
height=700,
margin=dict(l=50, r=50, b=65, t=90))
fig.update_layout(scene_camera=dict(eye=dict(x=1.5, y=-1.5, z=2)))
fig.update_layout(title='Equilibrium firm valuation for the grid of (k,b)')
A Modigliani-Miller theorem?
The red dot in the above graph is both an equilibrium (𝑏, 𝑘) chosen by a representative firm and the equilibrium 𝐵, 𝐾
pair chosen by the aggregate of all firms.
Thus, in equilibrium it is true that
(𝑏, 𝑘) = (𝐵, 𝐾)
But an individual firm named 𝜁 ∈ [0, 1] neither knows nor cares whether it sets (𝑏(𝜁), 𝑘(𝜁)) = (𝐵, 𝐾).
Indeed the above graph has a ridge of 𝑏(𝜁)’s that also maximize the firm’s value so long as it sets 𝑘(𝜁) = 𝐾.
Here it is important that the measure of firms that deviate from setting 𝑏 at the red dot is very small – measure zero – so
that 𝐵 remains at the red dot even while one firm 𝜁 deviates.
So within this equilibrium, there is a qualified Modigliani-Miller theorem that asserts that firm 𝜁’s value is independent of how it mixes its financing between equity and bonds (so long as the firm is atypical, so that its choice does not change what firms do on average).
Thus, while an individual firm 𝜁’s financial structure is indeterminate, the market’s financial structure is determinate and sits at the red dot in the above graph.
This contrasts sharply with the unqualified Modigliani-Miller theorem described in the complete markets model in the lecture Irrelevance of Capital Structure with Complete Markets.
There the market’s financial structure was indeterminate.
These subtle distinctions bear more thought and exploration.
So we will do some calculations to ferret out a sense in which the equilibrium (𝑘, 𝑏) = (𝐾, 𝐵) outcome at the red dot in
the above graph is stable.
In particular, we’ll explore the consequences of some choices of 𝑏 = 𝐵 that deviate from the red dot and ask whether
firm 𝜁 would want to remain at that 𝑏.
In more detail, here is what we’ll do:
1. Obtain equilibrium values of capital and debt as 𝑘∗ = 𝐾 and 𝑏∗ = 𝐵, the red dot above.
2. Now fix 𝑘∗ and let 𝑏∗∗ = 𝑏∗ − 𝑒 for some 𝑒 > 0. Conjecture that big 𝐾 = 𝑘∗ but big 𝐵 = 𝑏∗∗ .
3. Take 𝐾 and 𝐵 and compute intertemporal marginal rates of substitution (IMRS’s) as we did before.
4. Take the new IMRS to the firm’s problem. Plot a 3D surface for the valuations of the firm with this new IMRS.
5. Check if the value at 𝑘∗ , 𝑏∗∗ is at the top of this new 3D surface.
6. Repeat these calculations for 𝑏∗∗ = 𝑏∗ + 𝑒.
To conduct the above procedures, we create a function off_eq_check that inputs the BCG model instance parameters,
equilibrium capital 𝐾 = 𝑘∗ and debt 𝐵 = 𝑏∗ , and a perturbation of debt 𝑒.
The function outputs the fixed point firm values 𝑉 ∗∗ , prices 𝑞 ∗∗ , 𝑝∗∗ , and consumption choices 𝑐∗∗ .
Importantly, we relax the condition that only agent 2 holds bonds.
Now both agents can hold bonds, i.e., 0 ≤ 𝜉 1 ≤ 𝐵 and 𝜉 1 + 𝜉 2 = 𝐵.
That implies the consumers’ budget constraints are:
def off_eq_check(mdl,kss,bss,e=0.1):
# Big K and big B
k = kss
b = bss + e
# Load parameters
𝜓1 = mdl.𝜓1
𝜓2 = mdl.𝜓2
𝛼 = mdl.𝛼
A = mdl.A
𝛽 = mdl.𝛽
bound = mdl.bound
intpp1b = njit(lambda 𝜖, fk, 𝜃1, 𝜓1, 𝜉1, b: (w11(𝜖) + 𝜃1*(Y(𝜖, fk)-b) + 𝜉1)**(-𝜓1)*g(𝜖))
intpp2b = njit(lambda 𝜖, fk, 𝜃2, 𝜓2, 𝜉2, b: (w21(𝜖) + 𝜃2*(Y(𝜖, fk)-b) + 𝜉2)**(-𝜓2)*g(𝜖))
# We begin by adding the guess for the value of the firm to endowment
V = (Vl+Vh)/2
ww10 = w10 + 𝜃10*V
ww20 = w20 + 𝜃20*V
# Production
fk = A*(k**𝛼)
# Y = lambda 𝜖: np.exp(𝜖)*fk
#**************************************************************
# Compute the prices and allocations consistent with consumers'
# Euler equations
#**************************************************************
𝜉1 = (𝜉1a + 𝜉1b) / 2
𝜃1a = 0.3
𝜃1b = 1
𝜃1 = (𝜃1a + 𝜃1b) / 2
# const_qq1 = 𝛽 * quad(intqq1,epstar,bound)[0]
const_qq1 = 𝛽 * quad(intqq1,epstar,bound, args=(fk, 𝜃1, 𝜓1, 𝜉1, b))[0]
#========
# Agent 2
#========
𝜉2 = b - 𝜉1
𝜃2 = 1 - 𝜃1
#================
# Update holdings
#================
if qq1 > qq2:
𝜃1a = 𝜃1
else:
𝜃1b = 𝜃1
#print(p,q, 1, 1)
#================
# Get consumption
#================
c10 = ww10 - q*𝜃1 - p*𝜉1
c11 = lambda 𝜖: w11(𝜖) + 𝜃1*max(Y(𝜖, fk)-b,0) + 𝜉1*min(Y(𝜖, fk)/b,1)
c20 = ww20 - q*(1-𝜃1) - p*(b-𝜉1)
c21 = lambda 𝜖: w21(𝜖) + (1-𝜃1)*max(Y(𝜖, fk)-b,0) + (b-𝜉1)*min(Y(𝜖, fk)/b,1)
return V,k,b,p,q,c10,c11,c20,c21,𝜉1
Our hunch is that (𝑘∗ , 𝑏∗∗ ) is not at the top of the firm valuation 3D surface so that the firm is not maximizing its value
if it chooses 𝑘 = 𝐾 = 𝑘∗ and 𝑏 = 𝐵 = 𝑏∗∗ .
That indicates that (𝑘∗ , 𝑏∗∗ ) is not an equilibrium capital structure for the firm.
We first check the case in which 𝑏∗∗ = 𝑏∗ − 𝑒 where 𝑒 = 0.1:
# Firm Valuation
kgride1, bgride1, Vgride1, Qgride1, Pgride1 = mdl.eq_valuation(c10e1, c11e1, c20e1, c21e1,N=20)
fig = go.Figure(data=[go.Scatter3d(x=[ke1],
y=[be1],
z=[Ve1],
mode='markers',
marker=dict(size=3, color='red')),
go.Surface(x=kgride1,
y=bgride1,
z=Vgride1,
colorscale='Greens',opacity=0.6)])
fig.update_layout(scene = dict(
xaxis_title='x - Capital k',
yaxis_title='y - Debt b',
zaxis_title='z - Firm Value V',
aspectratio = dict(x=1,y=1,z=1)),
width=700,
height=700,
margin=dict(l=50, r=50, b=65, t=90))
fig.update_layout(scene_camera=dict(eye=dict(x=1.5, y=-1.5, z=2)))
fig.update_layout(title='Equilibrium firm valuation for the grid of (k,b)')
In the above 3D surface of prospective firm valuations, the perturbed choice (𝑘∗ , 𝑏∗ − 𝑒), represented by the red dot, is
not at the top.
The firm could issue more debt and attain a higher firm valuation from the market.
Therefore, (𝑘∗ , 𝑏∗ − 𝑒) would not be an equilibrium.
Next, we check for 𝑏∗∗ = 𝑏∗ + 𝑒.
# Firm Valuation
kgride2, bgride2, Vgride2, Qgride2, Pgride2 = mdl.eq_valuation(c10e2, c11e2, c20e2, c21e2,N=20)
fig = go.Figure(data=[go.Scatter3d(x=[ke2],
y=[be2],
z=[Ve2],
mode='markers',
marker=dict(size=3, color='red')),
go.Surface(x=kgride2,
y=bgride2,
z=Vgride2,
colorscale='Greens',opacity=0.6)])
fig.update_layout(scene = dict(
xaxis_title='x - Capital k',
yaxis_title='y - Debt b',
zaxis_title='z - Firm Value V',
aspectratio = dict(x=1,y=1,z=1)),
width=700,
height=700,
margin=dict(l=50, r=50, b=65, t=90))
fig.update_layout(scene_camera=dict(eye=dict(x=1.5, y=-1.5, z=2)))
fig.update_layout(title='Equilibrium firm valuation for the grid of (k,b)')
In contrast to (𝑘∗ , 𝑏∗ − 𝑒), the 3D surface for (𝑘∗ , 𝑏∗ + 𝑒) now indicates that a firm would want to decrease its debt issuance to attain a higher valuation.
That incentive to deviate means that (𝑘∗ , 𝑏∗ + 𝑒) is not an equilibrium capital structure for the firm.
Interestingly, if consumers were to anticipate that firms would over-issue debt, i.e. 𝐵 > 𝑏∗ , then both types of consumer
would want to hold corporate debt.
For example, 𝜉 1 > 0:
Our two stability experiments suggest that the equilibrium capital structure (𝑘∗ , 𝑏∗ ) is locally unique even though at the
equilibrium an individual firm would be willing to deviate from the representative firms’ equilibrium debt choice.
These experiments thus refine our discussion of the qualified Modigliani-Miller theorem that prevails in this example
economy.
It is also interesting to look at the equilibrium price functions 𝑞(𝑘, 𝑏) and 𝑝(𝑘, 𝑏) faced by firms in our rational expectations
equilibrium.
# Equity Valuation
fig = go.Figure(data=[go.Scatter3d(x=[kss],
y=[bss],
z=[qss],
mode='markers',
marker=dict(size=3, color='red')),
go.Surface(x=kgrid,
y=bgrid,
z=Qgrid,
colorscale='Blues',opacity=0.6)])
fig.update_layout(scene = dict(
xaxis_title='x - Capital k',
yaxis_title='y - Debt b',
zaxis_title='z - Equity price q',
aspectratio = dict(x=1,y=1,z=1)),
width=700,
height=700,
margin=dict(l=50, r=50, b=65, t=90))
fig.update_layout(scene_camera=dict(eye=dict(x=1.5, y=-1.5, z=2)))
fig.update_layout(title='Equilibrium equity valuation for the grid of (k,b)')
# Bond Valuation
fig = go.Figure(data=[go.Scatter3d(x=[kss],
y=[bss],
z=[pss],
mode='markers',
marker=dict(size=3, color='red')),
go.Surface(x=kgrid,
y=bgrid,
z=Pgrid,
colorscale='Oranges',opacity=0.6)])
fig.update_layout(scene = dict(
xaxis_title='x - Capital k',
yaxis_title='y - Debt b',
zaxis_title='z - Bond price p',
aspectratio = dict(x=1,y=1,z=1)),
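width=700,
height=700,
margin=dict(l=50, r=50, b=65, t=90))
fig.update_layout(scene_camera=dict(eye=dict(x=1.5, y=-1.5, z=2)))
fig.update_layout(title='Equilibrium bond valuation for the grid of (k,b)')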
The equilibrium pricing functions displayed above merit study and reflection.
They reveal the countervailing effects on a firm’s valuations of bonds and equities that lie beneath the Modigliani-Miller
ridge apparent in our earlier graph of an individual firm 𝜁’s value as a function of 𝑘(𝜁), 𝑏(𝜁).
We illustrate how the fraction of initial endowments held by agent 2, $w_0^2 / (w_0^1 + w_0^2)$, affects an equilibrium capital structure (𝑘, 𝑏) = (𝐾, 𝐵) as well as associated equilibrium allocations.
We are interested in how agents 1 and 2 value equity and bonds:
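$$
Q^i = \beta \int \frac{u'(C_1^{i,*}(\epsilon))}{u'(C_0^{i,*})} d^e(k^*, b^*; \epsilon) g(\epsilon) \, d\epsilon
$$

$$
P^i = \beta \int \frac{u'(C_1^{i,*}(\epsilon))}{u'(C_0^{i,*})} d^b(k^*, b^*; \epsilon) g(\epsilon) \, d\epsilon
$$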
# Save fraction
w10 = 0.9 - 0.05*i
w20 = 1.1 + 0.05*i
wlist.append(w20/(w10+w20))
# Plot
fig, ax = plt.subplots(3,2,figsize=(12,12))
ax[0,0].plot(wlist,klist)
ax[0,0].set_title('capital')
ax[0,1].plot(wlist,blist)
ax[0,1].set_title('debt')
ax[1,0].plot(wlist,qlist)
ax[1,0].set_title('equity price')
ax[1,1].plot(wlist,plist)
ax[1,1].set_title('bond price')
ax[2,0].plot(wlist,Vlist)
ax[2,0].set_title('firm value')
ax[2,0].set_xlabel('fraction of initial endowment held by agent 2',fontsize=13)
# Plot (cont.)
ax[2,1].plot(wlist,epslist)
ax[2,1].set_title(r'default threshold $\epsilon^*$')
ax[2,1].set_xlabel('fraction of initial endowment held by agent 2',fontsize=13)
plt.show()
ax[1].plot(wlist,p1list,label='agent 1',color='green')
ax[1].plot(wlist,p2list,label='agent 2',color='blue')
ax[1].plot(wlist,plist,label='bond price',color='red',linestyle='--')
ax[1].legend()
ax[1].set_title('bond valuations')
ax[1].set_xlabel('fraction of initial endowment held by agent 2',fontsize=11)
ax[2].plot(wlist,tlist,color='blue')
ax[2].set_title('equity holdings by agent 1')
ax[2].set_xlabel('fraction of initial endowment held by agent 2',fontsize=11)
plt.show()
CHAPTER
FORTY
40.1 Overview
This lecture describes a model of optimal unemployment insurance created by Shavell and Weiss (1979) [Shavell and
Weiss, 1979].
We use recursive techniques of Hopenhayn and Nicolini (1997) [Hopenhayn and Nicolini, 1997] to compute optimal
insurance plans for Shavell and Weiss’s model.
Hopenhayn and Nicolini’s model is a generalization of Shavell and Weiss’s along dimensions that we’ll soon describe.
An unemployed worker orders stochastic processes of consumption and search effort $\{c_t, a_t\}_{t=0}^{\infty}$ according to

$$
E \sum_{t=0}^{\infty} \beta^t \left[ u(c_t) - a_t \right] \tag{40.1}
$$
where 𝛽 ∈ (0, 1) and 𝑢(𝑐) is strictly increasing, twice differentiable, and strictly concave.
We assume that 𝑢(0) is well defined.
We require that 𝑐𝑡 ≥ 0 and 𝑎𝑡 ≥ 0.
All jobs are alike and pay wage 𝑤 > 0 units of the consumption good each period forever.
An unemployed worker searches with effort 𝑎 and with probability 𝑝(𝑎) receives a permanent job at the beginning of the
next period.
Furthermore, 𝑎 = 0 when the worker is employed.
The probability of finding a job is 𝑝(𝑎).
𝑝 is an increasing, strictly concave, and twice differentiable function of 𝑎 that satisfies 𝑝(𝑎) ∈ [0, 1] for 𝑎 ≥ 0, 𝑝(0) = 0.
Note: When we compute examples below, we’ll assume the same 𝑝(𝑎) function as [Hopenhayn and Nicolini, 1997],
namely, 𝑝(𝑎) = 1 − exp(−𝑟𝑎), where 𝑟 is a parameter that we’ll calibrate to hit the same target that [Hopenhayn and
Nicolini, 1997] did, namely, an empirical hazard rate of leaving unemployment.
The unemployed worker’s only source of consumption smoothing over time and across states is an insurance agency or
planner.
Once a worker has found a job, he is beyond the planner’s grasp.
• This is Shavell and Weiss’s assumption, but not Hopenhayn and Nicolini’s.
• Hopenhayn and Nicolini allow the unemployment insurance agency to impose history-dependent taxes on previously unemployed workers.
• Since there is no incentive problem after the worker has found a job, it is optimal for the agency to provide an
employed worker with a constant level of consumption.
• Hence, Hopenhayn and Nicolini’s insurance agency imposes a permanent per-period history-dependent tax on a
previously unemployed but presently employed worker.
40.2.1 Autarky
As a benchmark, we first study the fate of an unemployed worker who has no access to unemployment insurance.
Because employment is an absorbing state for the worker, we work backward from that state.
Let 𝑉 𝑒 be the expected sum of discounted one-period utilities of an employed worker.
Once the worker is employed, 𝑎 = 0, so his period utility is 𝑢(𝑐) − 𝑎 = 𝑢(𝑤) forever.
Therefore,
$$
V^e = \frac{u(w)}{(1 - \beta)} . \tag{40.2}
$$
Now let 𝑉 𝑢 be the expected discounted present value of utility for an unemployed worker who chooses consumption,
effort pair (𝑐, 𝑎) optimally.
Value 𝑉 𝑢 satisfies the Bellman equation
Another benchmark model helps set the stage for the model with private information that we ultimately want to study.
We temporarily assume that an unemployment insurance agency has full information about the unemployed worker.
We assume that the insurance agency can control both the consumption and the search effort of an unemployed worker.
The agency wants to design an unemployment insurance contract to give the unemployed worker expected discounted
utility 𝑉 > 𝑉aut .
The agency, i.e., the planner, wants to deliver value 𝑉 efficiently, meaning in a way that minimizes the expected present discounted value of costs, using 𝛽 as the discount factor.
We formulate the optimal insurance problem recursively.
Let 𝐶(𝑉 ) be the expected discounted cost of giving the worker expected discounted utility 𝑉 .
The cost function is strictly convex because a higher 𝑉 implies a lower marginal utility of the worker; that is, additional
expected utils can be awarded to the worker only at an increasing marginal cost in terms of the consumption good.
Given 𝑉 , the planner assigns first-period pair (𝑐, 𝑎) and promised continuation value 𝑉 𝑢 next period if the worker is
unlucky and does not find a job this period.
The planner sets (𝑐, 𝑎, 𝑉 𝑢 ) as functions of 𝑉 and to satisfy the following Bellman equation for associated cost function
𝐶(𝑉 ):
Here 𝑉 𝑒 is given by equation (40.2), which reflects the assumption that once the worker is employed, he is beyond the
reach of the unemployment insurance agency.
The right side of Bellman equation (40.5) is attained by policy functions 𝑐 = 𝑐(𝑉 ), 𝑎 = 𝑎(𝑉 ), and 𝑉 𝑢 = 𝑉 𝑢 (𝑉 ).
The promise-keeping constraint, equation (40.6), asserts that the 3-tuple (𝑐, 𝑎, 𝑉 𝑢 ) attains at least 𝑉 .
Let 𝜃 be a Lagrange multiplier on constraint (40.6).
At an interior solution, first-order conditions with respect to 𝑐, 𝑎, and 𝑉 𝑢 , respectively, are
$$
\begin{aligned}
\theta &= \frac{1}{u'(c)} , \\
C(V^u) &= \theta \left[ \frac{1}{\beta p'(a)} - (V^e - V^u) \right] , \\
C'(V^u) &= \theta .
\end{aligned}
\tag{40.7}
$$
The envelope condition 𝐶 ′ (𝑉 ) = 𝜃 and the third equation of (40.7) imply that 𝐶 ′ (𝑉 𝑢 ) = 𝐶 ′ (𝑉 ).
Strict convexity of 𝐶 then implies that 𝑉 𝑢 = 𝑉 .
Applied repeatedly over time, 𝑉 𝑢 = 𝑉 makes the continuation value remain constant during the entire spell of unemployment.
The first equation of (40.7) determines 𝑐, and the second equation of (40.7) determines 𝑎, both as functions of promised
value 𝑉 .
That 𝑉 𝑢 = 𝑉 then implies that 𝑐 and 𝑎 are held constant during the unemployment spell.
Thus, the unemployed worker’s consumption 𝑐 and search effort 𝑎 are both fully smoothed during the unemployment
spell.
But the worker’s consumption is not smoothed across states of employment and unemployment unless 𝑉 = 𝑉 𝑒 .
The preceding efficient insurance scheme assumes that the insurance agency controls both 𝑐 and 𝑎.
The insurance agency cannot simply provide 𝑐 and then allow the worker to choose 𝑎.
Here is why.
The agency delivers a value 𝑉 𝑢 higher than the autarky value 𝑉aut by doing two things.
It increases the unemployed worker’s consumption 𝑐 and decreases his search effort 𝑎.
The prescribed search effort is higher than what the worker would choose if he were to be guaranteed consumption level
𝑐 while he remains unemployed.
This follows from the first two equations of (40.7) and the fact that the insurance scheme is costly, 𝐶(𝑉 𝑢 ) > 0, which
imply [𝛽𝑝′ (𝑎)]−1 > (𝑉 𝑒 − 𝑉 𝑢 ).
Now look at the worker’s first-order condition (40.4) under autarky.
It implies that if search effort 𝑎 > 0, then [𝛽𝑝′ (𝑎)]−1 = [𝑉 𝑒 −𝑉 𝑢 ], which is inconsistent with the inequality [𝛽𝑝′ (𝑎)]−1 >
(𝑉 𝑒 − 𝑉 𝑢 ) that prevails when 𝑎 > 0 when the agency controls both 𝑎 and 𝑐.
If he were free to choose 𝑎, the worker would therefore want to fulfill (40.4), either at equality so long as 𝑎 > 0, or by
setting 𝑎 = 0 otherwise.
Starting from the 𝑎 associated with the full-information social insurance scheme in which the agency controls both 𝑐 and
𝑎, the worker would establish the desired equality in (40.4) by lowering 𝑎, thereby decreasing the term [𝛽𝑝′ (𝑎)]−1 (which
also lowers (𝑉 𝑒 − 𝑉 𝑢 ) when the value of being unemployed 𝑉 𝑢 increases).
If an equality can be established before 𝑎 reaches zero, this would be the worker’s preferred search effort; otherwise the
worker would find it optimal to accept the insurance payment, set 𝑎 = 0, and never work again.
Thus, since the worker does not take the cost of the insurance scheme into account, he would choose a search effort below
the socially optimal, full-information level.
The full-information contract thus relies on the agency’s ability to control both the unemployed worker’s consumption and
his search effort.
Following [Shavell and Weiss, 1979] and [Hopenhayn and Nicolini, 1997], now assume that the unemployment insurance
agency cannot observe or control 𝑎, though it can observe and control 𝑐.
The worker is free to choose 𝑎, which puts expression (40.4), the worker’s first-order condition under autarky, back in
the picture.
• We are assuming that the worker’s best response to the unemployment insurance arrangement is completely characterized by the first-order condition (40.4), an instance of the so-called first-order approach to incentive problems.
Given a contract, the individual will choose search effort according to first-order condition (40.4).
This fact motivates the insurance agency to design an unemployment insurance contract that respects this restriction.
Thus, the contract design problem is now to minimize the right side of equation (40.5) subject to expression (40.6) and
the incentive constraint (40.4).
Since the restrictions (40.4) and (40.6) are not linear and generally do not define a convex set, it becomes challenging to
provide conditions under which the solution to the dynamic programming problem results in a convex function 𝐶(𝑉 ).
• Sometimes this complication can be handled by convexifying the constraint set by introducing lotteries.
• A common finding is that optimal plans do not involve lotteries, because convexity of the constraint set is a sufficient
but not necessary condition for convexity of the cost function.
• In order to characterize the optimal solution, we follow Hopenhayn and Nicolini (1997) [Hopenhayn and Nicolini, 1997] by proceeding, optimistically, under the assumption that 𝐶(𝑉 ) is strictly convex.
Let 𝜂 be the multiplier on constraint (40.4), while 𝜃 continues to denote the multiplier on constraint (40.6).
But now we replace the weak inequality in (40.6) by an equality.
• We do this because the unemployment insurance agency cannot award a higher utility than 𝑉 because that might
violate an incentive-compatibility constraint for exerting the proper search effort in earlier periods.
At an interior solution, first-order conditions with respect to 𝑐, 𝑎, and 𝑉 𝑢 , respectively, are
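$$
\begin{aligned}
\theta &= \frac{1}{u'(c)} , \\
C(V^u) &= \theta \left[ \frac{1}{\beta p'(a)} - (V^e - V^u) \right] - \eta \frac{p''(a)}{p'(a)} (V^e - V^u) \\
&= -\eta \frac{p''(a)}{p'(a)} (V^e - V^u) , \\
C'(V^u) &= \theta - \eta \frac{p'(a)}{1 - p(a)} ,
\end{aligned}
\tag{40.8}
$$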
where the second equality in the second equation in (40.8) follows from strict equality of the incentive constraint (40.4)
when 𝑎 > 0.
As long as the insurance scheme is associated with costs, so that 𝐶(𝑉 𝑢 ) > 0, the first-order condition in the second
equation of (40.8) implies that the multiplier 𝜂 is strictly positive.
The third equation of (40.8) and the envelope condition 𝐶 ′ (𝑉 ) = 𝜃 together allow us to conclude that 𝐶 ′ (𝑉 𝑢 ) < 𝐶 ′ (𝑉 ).
Convexity of 𝐶 then implies that 𝑉 𝑢 < 𝑉 .
After we have also used the first equation of (40.8), it follows that in order to provide the proper incentives, the consump-
tion of the unemployed worker must decrease as the duration of the unemployment spell lengthens.
It also follows from (40.4) at equality that search effort 𝑎 rises as 𝑉 𝑢 falls, i.e., it rises with the duration of unemployment.
The dependence of benefits on the duration of unemployment is designed to provide the worker an incentive to search.
To understand this, from the third equation of (40.8), notice how the conclusion that consumption falls with the duration of
unemployment depends on the assumption that more search effort raises the prospect of finding a job, i.e., that 𝑝′ (𝑎) > 0.
If 𝑝′ (𝑎) = 0, then the third equation of (40.8) and the strict convexity of 𝐶 imply that 𝑉 𝑢 = 𝑉 .
Thus, when 𝑝′ (𝑎) = 0, there is no reason for the planner to make consumption fall with the duration of unemployment.
It is useful to note that there are natural lower and upper bounds to the set of continuation values 𝑉 𝑢 .
The lower bound is the expected lifetime utility in autarky, 𝑉aut .
To compute an upper bound, represent condition (40.4) as
$$
V^u \geq V^e - [\beta p'(a)]^{-1},
$$
so that consumption is
$$
a = \max \left\{ 0, \frac{\log[r \beta (V^e - V^u)]}{r} \right\}. \tag{40.11}
$$
Formulas (40.9) and (40.11) express (𝑐, 𝑎) as functions of 𝑉 and the continuation value 𝑉 𝑢 .
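As a small illustration of formula (40.11), the sketch below evaluates the effort rule for a few continuation values and checks that, at an interior solution, the incentive constraint $\beta p'(a)(V^e - V^u) = 1$ holds with $p(a) = 1 - \exp(-ra)$. The function name `search_effort` and all numerical values are hypothetical and chosen only for illustration; they are not the lecture's calibration.

```python
import numpy as np

def search_effort(Vu, Ve, r, β):
    """Effort implied by (40.11): a = max{0, log[r β (Ve - Vu)] / r}."""
    gap = Ve - Vu
    if r * β * gap <= 1.0:
        # Corner solution: the marginal gain from effort at a = 0 does not exceed its cost
        return 0.0
    return np.log(r * β * gap) / r

# Hypothetical parameter values (not the lecture's calibration)
r, β, Ve = 0.01, 0.999, 20000.0

for Vu in (19950.0, 19000.0, 17000.0):
    a = search_effort(Vu, Ve, r, β)
    # Left side of the incentive constraint β p'(a) (Ve - Vu), with p'(a) = r exp(-r a)
    foc = β * r * np.exp(-r * a) * (Ve - Vu)
    print(f"Vu = {Vu:8.1f}, a = {a:8.3f}, β p'(a)(Ve - Vu) = {foc:.4f}")
```

A lower continuation value $V^u$ widens the gap $V^e - V^u$ and therefore raises effort, which is the mechanism discussed above.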
Using these functions allows us to write the Bellman equation in 𝐶(𝑉 ) as
$$
C(V) = \min_{V^u} \left\{ c + \beta [1 - p(a)] C(V^u) \right\} \tag{40.12}
$$
We’ll approximate the planner’s optimal cost function with cubic splines.
To do this, we’ll load some useful modules
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
class params_instance:
    def __init__(self,
                 r,
                 β = 0.999,
                 σ = 0.500,
                 w = 100,
                 n_grid = 50):
        self.β,self.σ,self.w,self.r = β,σ,w,r
        self.n_grid = n_grid
        uw = self.w**(1-self.σ)/(1-self.σ) #Utility from consuming all wage
        self.Ve = uw/(1-β)
For the other parameters appearing in the above Python code, we’ll calibrate parameter 𝑟 that pins down the function
𝑝(𝑎) = 1 − exp(−𝑟𝑎) to match an observed hazard rate (the probability that an unemployed worker finds a job each period)
in US data.
In particular, we seek an 𝑟 so that in autarky p(a(r)) = 0.1, where a is the optimal search effort.
First, we create some helper functions.
def invp_prime(x,r):
    return -np.log(x/r)/r
def p_prime(a,r):
    return r*np.exp(-r*a)
def u_inv(self,x):
    return ((1-self.σ)*x)**(1/(1-self.σ))
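A quick consistency check on these helpers can be sketched as follows. It restates `p_prime` and `invp_prime`, adds `p` from the formula $p(a) = 1 - \exp(-ra)$ given in the text, and confirms that `invp_prime` inverts `p_prime` and that `p_prime` agrees with a finite-difference derivative of `p`. The value of `r` and the grid are hypothetical.

```python
import numpy as np

def p(a, r):
    # Job-finding probability p(a) = 1 - exp(-r a), as stated in the text
    return 1 - np.exp(-r * a)

def p_prime(a, r):
    return r * np.exp(-r * a)

def invp_prime(x, r):
    return -np.log(x / r) / r

r = 0.05                                # hypothetical value, for illustration only
a_grid = np.linspace(0.0, 50.0, 6)

# invp_prime should undo p_prime
recovered = invp_prime(p_prime(a_grid, r), r)
max_gap = np.max(np.abs(recovered - a_grid))

# p_prime should match a central finite difference of p
h = 1e-6
fd = (p(a_grid + h, r) - p(a_grid - h, r)) / (2 * h)
max_fd_gap = np.max(np.abs(fd - p_prime(a_grid, r)))

print(max_gap, max_fd_gap)
```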
Recall that under autarky the value for an unemployed worker satisfies the Bellman equation
3. Evaluate the difference between the LHS and RHS of the Bellman equation (40.13)
4. Update guess for 𝑉 𝑢 accordingly, then return to 2) and repeat until the Bellman equation is satisfied.
For a given 𝑟 and guess 𝑉 𝑢 , the function Vu_error calculates the error in the Bellman equation under the optimal
search intensity.
We’ll soon use this as an input to computing 𝑉 𝑢 .
a = invp_prime(1/(β*(Ve-Vu)),r)
error = u(self,0) -a + β*(p(a,r)*Ve + (1-p(a,r))*Vu) - Vu
return error
Since the calibration exercise is to match the hazard rate under autarky to the data, we must find a parameter 𝑟 to match
p(a,r) = 0.1.
The function r_error below calculates, for a given guess of 𝑟, the difference between the model-implied equilibrium
hazard rate and 0.1.
We’ll use this to compute a calibrated 𝑟∗ .
We want to compute an 𝑟 that is consistent with the hazard rate 0.1 in autarky.
To do so, we will use a bracketing root finder (Brent’s method, which builds on bisection).
r_calibrated = sp.optimize.brentq(r_error_Λ,1e-10,1-1e-10)
print(f"Parameter to match 0.1 hazard rate: r = {r_calibrated}")
print(f"Check p at r: {p(a_aut,r_calibrated)}")
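To illustrate how the bracketing root finder is used, here is a simplified sketch in which search effort is held fixed at a hypothetical value `a = 25.0`, so that the hazard rate depends on `r` only through $p(a) = 1 - \exp(-ra)$. In the lecture's calibration effort itself depends on `r`, so this is a stand-in for the mechanics of `brentq`, not a reproduction of `r_error`.

```python
import numpy as np
from scipy.optimize import brentq

def hazard_gap(r, a=25.0, target=0.1):
    """Difference between the hazard rate p(a) = 1 - exp(-r a) and the target.

    The fixed effort level a = 25.0 is a hypothetical simplification.
    """
    return 1 - np.exp(-r * a) - target

# brentq needs a bracket on which the function changes sign
r_lo, r_hi = 1e-10, 1 - 1e-10
assert hazard_gap(r_lo) < 0 < hazard_gap(r_hi)

r_star = brentq(hazard_gap, r_lo, r_hi)
print(f"r = {r_star:.6f}, implied hazard = {1 - np.exp(-r_star * 25.0):.4f}")
```

With effort fixed, the root is available in closed form as $r = -\log(0.9)/a$, which gives a direct check on the numerical answer.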
Now that we have calibrated the parameter 𝑟, we can continue with solving the model with private information.
Our approach to solving the full model follows ideas of Judd (1998) [Judd, 1998], who uses a polynomial to approximate
the value function and a numerical optimizer to perform the optimization at each iteration.
Note: For further details of the Judd (1998) [Judd, 1998] method, see [Ljungqvist and Sargent, 2018], Section 5.7.
We will use cubic splines to interpolate across a pre-set grid of points to approximate the value function.
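The interpolation step can be sketched with `scipy.interpolate.CubicSpline`. The grid and the convex function used here are hypothetical stand-ins for the grid of promised values and the cost function; the point is only to show how a function known on a grid is evaluated, and differentiated, between grid points.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical grid and convex "cost" values on that grid
V_grid = np.linspace(1.0, 10.0, 50)
C_grid = V_grid**2

# Fit the spline once, then evaluate it anywhere inside the grid
C_hat = CubicSpline(V_grid, C_grid)

V_test = 3.3333
approx_error = abs(C_hat(V_test) - V_test**2)       # level
slope_error = abs(C_hat(V_test, 1) - 2 * V_test)    # first derivative

print(approx_error, slope_error)
```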
Our strategy involves finding a function 𝐶(𝑉 ) – the expected cost of giving the worker value 𝑉 – that satisfies the Bellman
equation:
Notice that in equations (40.9) and (40.11), we have analytical solutions of 𝑐 and 𝑎 in terms of promised value 𝑉 and 𝑉 𝑢
(and other parameters).
We can substitute these equations for 𝑐 and 𝑎 and obtain the functional equation (40.12).
def calc_c(self,Vu,V,a):
'''
Calculates the optimal consumption choice coming from the constraint of the insurer's problem
def calc_a(self,Vu):
'''
Calculates the optimal effort choice coming from the worker's effort optimality condition.
'''
r,β,Ve = self.r,self.β,self.Ve
With these analytical solutions for optimal 𝑐 and 𝑎 in hand, we can reduce the minimization to (40.12) in the single
variable 𝑉 𝑢 .
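A single minimization of this kind can be sketched as follows. The flow cost `(V - β * Vu)**2` and the quadratic guess for the continuation cost are hypothetical stand-ins (the lecture's own objective uses the formulas for 𝑐 and 𝑎); the sketch only shows how a bounded scalar minimizer is combined with a spline-interpolated continuation cost.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import minimize_scalar

β = 0.95
Vu_min, Vu_max = 0.0, 10.0
grid = np.linspace(Vu_min, Vu_max, 50)

# Hypothetical current guess of the cost function, interpolated on the grid
C_old = CubicSpline(grid, 0.5 * grid**2)

def one_step_cost(Vu, V):
    # Hypothetical flow cost standing in for c + β[1 - p(a)] C(Vu)
    flow = (V - β * Vu)**2
    return flow + β * float(C_old(Vu))

V = 6.0
res = minimize_scalar(one_step_cost, args=(V,),
                      method='bounded', bounds=(Vu_min, Vu_max))
print(f"minimizing Vu = {res.x:.4f}, minimized cost = {res.fun:.4f}")
```

For this toy objective the first-order condition gives $V^u = 2V/(1 + 2\beta)$, which provides a check on the numerical minimizer.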
With this in hand, we have our algorithm.
40.3.5 Algorithm
# Operator iterate_C that calculates the next iteration of the cost function.
def iterate_C(self,C_old,Vu_grid):
'''
We solve the model by minimising the value function across a grid of possible promised values.
'''
β,r,n_grid = self.β,self.r,self.n_grid
C_new = np.zeros(n_grid)
cons_star = np.zeros(n_grid)
a_star = np.zeros(n_grid)
V_star = np.zeros(n_grid)
C_new2 = np.zeros(n_grid)
V_star2 = np.zeros(n_grid)
res = sp.optimize.minimize_scalar(C_Vi_temp_interp,method='bounded',bounds = (Vu_min,Vu_max))
V_star[V_i] = res.x
C_new[V_i] = res.fun
return C_new,V_star,cons_star,a_star
The following code executes steps 4 and 5 in the Algorithm until convergence to a function 𝐶 ∗ (𝑉 ).
C_init = np.ones(self.n_grid)*0
C_old = np.copy(C_init)
return C_new,V_new,cons_star,a_star
40.4 Outcomes
Using the above functions, we create another instance of the parameters with our calibrated parameter 𝑟.
#Set up grid
Vu_min = Vu_aut
Vu_max = params.Ve - 1/(params.β*p_prime(0,params.r))
Vu_grid = np.linspace(Vu_min,Vu_max,params.n_grid)
#Solve model
C_star,V_star,cons_star,a_star = solve_incomplete_info_model(params,Vu_grid,Vu_aut,tol = 1e-6,max_iter = 10000) #,cons_star,a_star
# Since we have the policy functions in grid form, we will interpolate them to be able to
Iteration: 0, error:72.95964854907824
Let’s graph the replacement ratio (𝑐/𝑤) and search effort 𝑎 as functions of the duration of unemployment.
We’ll do this for three levels of 𝑉0 , the lowest being the autarky value 𝑉aut .
We accomplish this by using the optimal policy functions V_star, cons_star and a_star computed above as well
as the following iterative procedure:
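One way such an iteration might look is sketched below: starting from an initial promised value, read off consumption and effort from the policy functions, then move to next period's promised value and repeat. The function `simulate_spell`, the use of linear interpolation via `np.interp`, and the linear policy arrays are all assumptions made for illustration; they are not the lecture's computed policies.

```python
import numpy as np

def simulate_spell(V0, Vu_grid, V_star, cons_star, a_star, T):
    """Iterate V_{t+1} = V_star(V_t), recording c_t and a_t along the way.

    Policies are given on Vu_grid and linearly interpolated (an assumption).
    """
    V_path, c_path, a_path = np.empty(T), np.empty(T), np.empty(T)
    V = V0
    for t in range(T):
        V_path[t] = V
        c_path[t] = np.interp(V, Vu_grid, cons_star)
        a_path[t] = np.interp(V, Vu_grid, a_star)
        V = np.interp(V, Vu_grid, V_star)   # continuation value if still unemployed
    return V_path, c_path, a_path

# Hypothetical policy functions on a hypothetical grid
Vu_grid = np.linspace(0.0, 10.0, 50)
V_star = 0.9 * Vu_grid          # promised value declines each period
cons_star = 0.5 * Vu_grid       # consumption increases in promised value
a_star = 10.0 - Vu_grid         # effort decreases in promised value

V_path, c_path, a_path = simulate_spell(8.0, Vu_grid, V_star, cons_star, a_star, T=20)
print(c_path[:3], a_path[:3])
```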
fontSize = 10
plt.rc('font', size=fontSize) # controls default text sizes
plt.rc('axes', titlesize=fontSize) # fontsize of the axes title
plt.rc('axes', labelsize=fontSize) # fontsize of the x and y labels
plt.rc('xtick', labelsize=fontSize) # fontsize of the tick labels
plt.rc('ytick', labelsize=fontSize) # fontsize of the tick labels
plt.rc('legend', fontsize=fontSize) # legend fontsize
f1 = plt.figure(figsize = (8,8))
plt.subplot(2,1,1)
plt.plot(range(T_max-1),cons_t[:,0]/params.w,label = '$V^u_0$ = 16759 (aut)',color = 'red')
plt.subplot(2,1,2)
plt.plot(range(T_max-1),a_t[:,0],color = 'red')
plt.plot(range(T_max-1),a_t[:,1],color = 'blue')
plt.plot(range(T_max-1),a_t[:,2],color = 'green')
plt.ylim(0,320)
plt.ylabel("Optimal search effort (a)")
plt.xlabel("Duration of unemployment")
plt.title("Optimal search effort")
plt.show()
For an initial promised value 𝑉 𝑢 = 𝑉aut , the planner chooses the autarky level of 0 for the replacement ratio and instructs
the worker to search at the autarky search intensity, regardless of the duration of unemployment.
But for 𝑉 𝑢 > 𝑉aut , the planner makes the replacement ratio decline and search effort increase with the duration of
unemployment.
40.4.2 Interpretations
The downward slope of the replacement ratio when 𝑉 𝑢 > 𝑉aut is a consequence of the planner’s limited information
about the worker’s search effort.
By providing the worker with a duration-dependent schedule of replacement ratios, the planner induces the worker in
effect to reveal his/her search effort to the planner.
We saw earlier that with full information, the planner would smooth consumption over an unemployment spell by keeping
the replacement ratio constant.
With private information, the planner can’t observe the worker’s search effort and therefore makes the replacement ratio
fall.
Evidently, search effort rises as the duration of unemployment increases, especially early in an unemployment spell.
There is a carrot-and-stick aspect to the replacement rate and search effort schedules:
• the carrot occurs in the forms of high compensation and low search effort early in an unemployment spell.
• the stick occurs in the low compensation and high effort later in the spell.
We shall encounter a related carrot-and-stick feature in our other lectures about dynamic programming squared.
The planner offers declining benefits and induces increased search effort as the duration of an unemployment spell rises in
order to provide an unemployed worker with proper incentives, not to punish an unlucky worker who has been unemployed
for a long time.
The planner believes that a worker who has been unemployed a long time is unlucky, not that he has done anything wrong
(e.g., that he has not lived up to the contract).
Indeed, the contract is designed to induce the unemployed workers to search in the way the planner expects.
The falling consumption and rising search effort of the unlucky ones with long unemployment spells are simply costs that
have to be paid in order to provide proper incentives.
41 Stackelberg Plans
In addition to what’s in Anaconda, this lecture will need the following libraries:
41.1 Overview
This lecture formulates and computes a plan that a Stackelberg leader uses to manipulate forward-looking decisions of a
Stackelberg follower that depend on continuation sequences of decisions made once and for all by the Stackelberg leader
at time 0.
To facilitate computation and interpretation, we formulate things in a context that allows us to apply dynamic programming
for linear-quadratic models.
Technically, our calculations are closely related to ones described in this lecture.
From the beginning, we carry along a linear-quadratic model of duopoly in which firms face adjustment costs that make
them want to forecast actions of other firms that influence future prices.
Let’s start with some standard imports:
import numpy as np
import numpy.linalg as la
import quantecon as qe
from quantecon import LQ
import matplotlib.pyplot as plt
41.2 Duopoly
𝑝𝑡 = 𝑎0 − 𝑎1 (𝑞1𝑡 + 𝑞2𝑡 )
where 𝑞𝑖𝑡 is output of firm 𝑖 at time 𝑡 and 𝑎0 and 𝑎1 are both positive.
𝑞10 , 𝑞20 are given numbers that serve as initial conditions at time 0.
By incurring a cost equal to
$$
\gamma v_{it}^2, \quad \gamma > 0,
$$
where the appearance behind the semi-colon indicates that 𝑞2⃗ is given.
Firm 1’s problem induces the best response mapping
𝑞1⃗ = 𝐵(𝑞2⃗ )
whose maximizer is a sequence 𝑞2⃗ that depends on the initial conditions 𝑞10 , 𝑞20 and the parameters of the model 𝑎0 , 𝑎1 , 𝛾.
This formulation captures key features of the model
• Both firms make once-and-for-all choices at time 0.
• This is true even though both firms are choosing sequences of quantities that are indexed by time.
• The Stackelberg leader chooses first within time 0, knowing that the Stackelberg follower will choose second
within time 0.
While our abstract formulation reveals the timing protocol and equilibrium concept well, it obscures details that must be
addressed when we want to compute and interpret a Stackelberg plan and the follower’s best response to it.
To gain insights about these things, we study them in more detail.
Firm 2 knows that firm 1 chooses second and takes this into account in choosing $\{q_{2t+1}\}_{t=0}^\infty$.
In the spirit of working backward, we study firm 1’s problem first, taking $\{q_{2t+1}\}_{t=0}^\infty$ as given.
We approach this problem using methods described in [Ljungqvist and Sargent, 2018], chapter 2, appendix A and
[Sargent, 1987], chapter IX.
First-order conditions for this problem are
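$$
\begin{aligned}
\frac{\partial L}{\partial q_{1t}} &= a_0 - 2 a_1 q_{1t} - a_1 q_{2t} + \lambda_t - \beta^{-1} \lambda_{t-1} = 0, \quad t \geq 1 \\
\frac{\partial L}{\partial v_{1t}} &= -2 \gamma v_{1t} + \lambda_t = 0, \quad t \geq 0
\end{aligned}
$$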
These first-order conditions and the constraint 𝑞1𝑡+1 = 𝑞1𝑡 + 𝑣1𝑡 can be rearranged to take the form
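$$
\begin{aligned}
v_{1t} &= \beta v_{1t+1} + \frac{\beta a_0}{2 \gamma} - \frac{\beta a_1}{\gamma} q_{1t+1} - \frac{\beta a_1}{2 \gamma} q_{2t+1} \\
q_{1t+1} &= q_{1t} + v_{1t}
\end{aligned}
$$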
We can substitute the second equation into the first equation to obtain
Equation (41.1) is a second-order difference equation in the sequence 𝑞1⃗ whose solution we want.
It satisfies two boundary conditions:
• an initial condition for $q_{1,0}$, which is given
• a terminal condition requiring that $\lim_{T \to +\infty} \beta^T q_{1T}^2 < +\infty$
Using the lag operators described in [Sargent, 1987], chapter IX, difference equation (41.1) can be written as
$$
\beta \left( 1 - \frac{1 + \beta + c_1}{\beta} L + \beta^{-1} L^2 \right) q_{1t+2} = -c_0 + c_2 q_{2t+1}
$$
The polynomial in the lag operator on the left side can be factored as
$$
\left( 1 - \frac{1 + \beta + c_1}{\beta} L + \beta^{-1} L^2 \right) = (1 - \delta_1 L)(1 - \delta_2 L) \tag{41.2}
$$
Because $\delta_2 > \frac{1}{\sqrt{\beta}}$, the operator $(1 - \delta_2 L)$ contributes an unstable component if solved backwards but a stable component
if solved forwards.
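The factorization in (41.2) can be checked numerically. Matching coefficients on $L$ and $L^2$ gives $\delta_1 + \delta_2 = (1 + \beta + c_1)/\beta$ and $\delta_1 \delta_2 = \beta^{-1}$, so $\delta_1, \delta_2$ are the roots of $\delta^2 - \frac{1+\beta+c_1}{\beta}\delta + \beta^{-1} = 0$. In the sketch below, the value of `c1` is a hypothetical positive number used only for illustration.

```python
import numpy as np

β = 0.96
c1 = 0.016      # hypothetical positive value, for illustration only

# Roots of δ² - ((1 + β + c1)/β) δ + 1/β = 0
coeffs = [1.0, -(1 + β + c1) / β, 1 / β]
δ1, δ2 = np.sort(np.roots(coeffs).real)

print(f"δ1 = {δ1:.5f}, δ2 = {δ2:.5f}, 1/sqrt(β) = {1 / np.sqrt(β):.5f}")

# Coefficient matching from (41.2)
print(δ1 + δ2, (1 + β + c1) / β)
print(δ1 * δ2, 1 / β)
```

Because the product of the roots is $\beta^{-1}$ and the roots are real and distinct when $c_1 > 0$, one root lies below $1/\sqrt{\beta}$ and the other above it.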
Mechanically, write
Operating on both sides of equation (41.2) with 𝛽 −1 times this inverse operator gives the follower’s decision rule for
setting 𝑞1𝑡+1 in the feedback-feedforward form
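$$
q_{1t+1} = \delta_1 q_{1t} - c_0 \delta_2^{-1} \beta^{-1} \frac{1}{1 - \delta_2^{-1}} + c_2 \delta_2^{-1} \beta^{-1} \sum_{j=0}^\infty \delta_2^j q_{2t+j+1}, \quad t \geq 0 \tag{41.3}
$$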
The problem of the Stackelberg leader firm 2 is to choose the sequence $\{q_{2t+1}\}_{t=0}^\infty$ to maximize its discounted profits
$$
\sum_{t=0}^\infty \beta^t \left\{ (a_0 - a_1 (q_{1t} + q_{2t})) q_{2t} - \gamma (q_{2t+1} - q_{2t})^2 \right\}
$$
This renders a direct attack on the problem in the space of sequences cumbersome.
Therefore, below we will formulate the Stackelberg leader’s problem recursively.
We’ll proceed by putting our duopoly model into a broader class of models with the same general structure.
We formulate a class of linear-quadratic Stackelberg leader-follower problems of which our duopoly model is an instance.
We use the optimal linear regulator (a.k.a. the linear-quadratic dynamic programming problem described in LQ Dynamic
Programming problems) to represent a Stackelberg leader’s problem recursively.
Let 𝑧𝑡 be an 𝑛𝑧 × 1 vector of natural state variables.
Let 𝑥𝑡 be an 𝑛𝑥 × 1 vector of endogenous forward-looking variables that are physically free to jump at 𝑡.
In our duopoly example 𝑥𝑡 = 𝑣1𝑡 , the time 𝑡 decision of the Stackelberg follower.
𝑟(𝑦, 𝑢) = 𝑦′ 𝑅𝑦 + 𝑢′ 𝑄𝑢
Subject to an initial condition for 𝑧0 , but not for 𝑥0 , the Stackelberg leader wants to maximize
$$
- \sum_{t=0}^\infty \beta^t r(y_t, u_t) \tag{41.5}
$$
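$$
\begin{bmatrix} I & 0 \\ G_{21} & G_{22} \end{bmatrix}
\begin{bmatrix} z_{t+1} \\ x_{t+1} \end{bmatrix}
= \begin{bmatrix} \hat A_{11} & \hat A_{12} \\ \hat A_{21} & \hat A_{22} \end{bmatrix}
\begin{bmatrix} z_t \\ x_t \end{bmatrix} + \hat B u_t \tag{41.6}
$$
We assume that the matrix $\begin{bmatrix} I & 0 \\ G_{21} & G_{22} \end{bmatrix}$ on the left side of equation (41.6) is invertible, so that we can multiply both
sides by its inverse to obtain
$$
\begin{bmatrix} z_{t+1} \\ x_{t+1} \end{bmatrix}
= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} z_t \\ x_t \end{bmatrix} + B u_t \tag{41.7}
$$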
or
The Stackelberg follower’s best response mapping is summarized by the second block of equations of (41.7).
In particular, these equations are the first-order conditions of the Stackelberg follower’s optimization problem (i.e., its
Euler equations).
These Euler equations summarize the forward-looking aspect of the follower’s behavior and express how its time 𝑡 decision
depends on the leader’s actions at times 𝑠 ≥ 𝑡.
When combined with a stability condition to be imposed below, the Euler equations summarize the follower’s best response
to the sequence of actions by the leader.
The Stackelberg leader maximizes (41.5) by choosing sequences $\{u_t, x_t, z_{t+1}\}_{t=0}^\infty$ subject to (41.8) and an initial condition for $z_0$.
Note that we have an initial condition for 𝑧0 but not for 𝑥0 .
𝑥0 is among the variables to be chosen at time 0 by the Stackelberg leader.
The Stackelberg leader uses its understanding of the responses restricted by (41.8) to manipulate the follower’s decisions.
Please remember that the follower’s system of Euler equations is embedded in the system of dynamic equations 𝑦𝑡+1 =
𝐴𝑦𝑡 + 𝐵𝑢𝑡 .
Note that the definition of Ω(𝑦0 ) treats 𝑦0 as given.
Although it is taken as given in Ω(𝑦0 ), eventually, the 𝑥0 component of 𝑦0 is to be chosen by the Stackelberg leader.
Subproblem 2
Now let’s map our duopoly model into the above setup.
We formulate a state vector
$$
y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}
$$
where for our duopoly model
$$
z_t = \begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \end{bmatrix}, \quad x_t = v_{1t},
$$
where 𝑥𝑡 = 𝑣1𝑡 is the time 𝑡 decision of the follower firm 1, 𝑢𝑡 is the time 𝑡 decision of the leader firm 2 and
𝑣1𝑡 = 𝑞1𝑡+1 − 𝑞1𝑡 , 𝑢𝑡 = 𝑞2𝑡+1 − 𝑞2𝑡 .
For our duopoly model, initial conditions for the natural state variables in 𝑧𝑡 are
$$
z_0 = \begin{bmatrix} 1 \\ q_{20} \\ q_{10} \end{bmatrix}
$$
while 𝑥0 = 𝑣10 = 𝑞11 − 𝑞10 is a choice variable for the Stackelberg leader firm 2, one that will ultimately be chosen
according to an optimal rule prescribed by (41.10) for subproblem 2 above.
That the Stackelberg leader firm 2 chooses 𝑥0 = 𝑣10 is subtle.
Of course, 𝑥0 = 𝑣10 emerges from the feedback-feedforward solution (41.3) of firm 1’s system of Euler equations, so
that it is actually firm 1 that sets $x_0$.
But firm 2 manipulates firm 1’s choice through firm 2’s choice of the sequence $\vec q_{2,1} = \{q_{2t+1}\}_{t=0}^\infty$.
Now we’ll proceed to cast our duopoly model within the framework of the more general linear-quadratic structure
described above.
That will allow us to compute a Stackelberg plan simply by enlisting a Riccati equation to solve a linear-quadratic dynamic
program.
As emphasized above, firm 1 acts as if firm 2’s decisions $\{q_{2t+1}, v_{2t}\}_{t=0}^\infty$ are given and beyond its control.
where
$$
y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}
$$
with $x_t = v_{1t}$ and
$$
R = \begin{bmatrix} R_1 & 0 \\ 0 & 0 \end{bmatrix}
$$
In order to attain an appropriate representation of the Stackelberg leader’s history-dependent plan, we will employ what
amounts to a version of the Big K, little k device often used in macroeconomics by distinguishing $z_t$, which depends
partly on decisions $x_t$ of the followers, from another vector $\check z_t$, which does not.
We will use $\check z_t$ and its history $\check z^t = [\check z_t, \check z_{t-1}, \ldots, \check z_0]$ to describe the sequence of the Stackelberg leader’s decisions that
the Stackelberg follower takes as given.
Thus, we let $\check y_t' = [\check z_t' \; \check x_t']$ with initial condition $\check z_0 = z_0$ given.
That we distinguish $\check z_t$ from $z_t$ is part and parcel of the Big K, little k device in this instance.
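We have demonstrated that a Stackelberg plan for $\{u_t\}_{t=0}^\infty$ has a recursive representation
$$
\begin{aligned}
\check x_0 &= -P_{22}^{-1} P_{21} z_0 \\
u_t &= -F \check y_t, \quad t \geq 0 \\
\check y_{t+1} &= (A - BF) \check y_t, \quad t \geq 0
\end{aligned}
$$
From this representation, we can deduce the sequence of functions $\sigma = \{\sigma_t(\check z^t)\}_{t=0}^\infty$ that comprise a Stackelberg plan.
For convenience, let $\check A \equiv A - BF$ and partition $\check A$ conformably to the partition $y_t = \begin{bmatrix} \check z_t \\ \check x_t \end{bmatrix}$ as
$$
\begin{bmatrix} \check A_{11} & \check A_{12} \\ \check A_{21} & \check A_{22} \end{bmatrix}
$$
Let $H_0^0 \equiv -P_{22}^{-1} P_{21}$ so that $\check x_0 = H_0^0 \check z_0$.
Then iterations on $\check y_{t+1} = \check A \check y_t$ starting from initial condition $\check y_0 = \begin{bmatrix} \check z_0 \\ H_0^0 \check z_0 \end{bmatrix}$ imply that for $t \geq 1$
$$
\check x_t = \sum_{j=1}^t H_j^t \check z_{t-j}
$$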
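where
$$
\begin{aligned}
H_1^t &= \check A_{21} \\
H_2^t &= \check A_{22} \check A_{21} \\
\vdots \quad & \quad \vdots \\
H_{t-1}^t &= \check A_{22}^{t-2} \check A_{21} \\
H_t^t &= \check A_{22}^{t-1} (\check A_{21} + \check A_{22} H_0^0)
\end{aligned}
$$
$$
u_t = -F \check y_t \equiv - \begin{bmatrix} F_z & F_x \end{bmatrix} \begin{bmatrix} \check z_t \\ x_t \end{bmatrix}
$$
or
$$
u_t = -F_z \check z_t - F_x \sum_{j=1}^t H_j^t z_{t-j} = \sigma_t(\check z^t) \tag{41.11}
$$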
Representation (41.11) confirms that whenever $F_x \neq 0$, the typical situation, the time $t$ component $\sigma_t$ of a Stackelberg
plan is history-dependent, meaning that the Stackelberg leader’s choice $u_t$ depends not just on $\check z_t$ but on components of
$\check z^{t-1}$.
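The history-dependent formula for $\check x_t$ can be verified numerically. The sketch below draws a random matrix to play the role of $\check A = A - BF$ and a random $H_0^0$ (both hypothetical, not computed from the duopoly model), iterates $\check y_{t+1} = \check A \check y_t$, and compares the simulated $\check x_t$ with $\sum_{j=1}^t H_j^t \check z_{t-j}$.

```python
import numpy as np

rng = np.random.default_rng(0)
nz, nx = 3, 1

# Hypothetical closed-loop matrix standing in for A - BF, and hypothetical H_0^0
A_check = 0.5 * rng.standard_normal((nz + nx, nz + nx))
H00 = rng.standard_normal((nx, nz))

A11, A12 = A_check[:nz, :nz], A_check[:nz, nz:]
A21, A22 = A_check[nz:, :nz], A_check[nz:, nz:]

# Simulate y_{t+1} = A_check y_t starting from x_0 = H00 z_0
T = 6
z = np.empty((T + 1, nz))
x = np.empty((T + 1, nx))
z[0] = np.array([1.0, 1.0, 1.0])
x[0] = H00 @ z[0]
for t in range(T):
    z[t + 1] = A11 @ z[t] + A12 @ x[t]
    x[t + 1] = A21 @ z[t] + A22 @ x[t]

def H(j, t):
    """Coefficient H_j^t on z_{t-j} for 1 <= j <= t."""
    power = np.linalg.matrix_power(A22, j - 1)
    if j < t:
        return power @ A21
    return power @ (A21 + A22 @ H00)     # j == t picks up the x_0 = H00 z_0 term

x_from_history = sum(H(j, T) @ z[T - j] for j in range(1, T + 1))
print(np.max(np.abs(x_from_history - x[T])))
```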
Because we set $\check z_0 = z_0$, it will turn out that $z_t = \check z_t$ for all $t \geq 0$.
Then why did we distinguish $\check z_t$ from $z_t$?
The answer is that if we want to present to the Stackelberg follower a history-dependent representation of the Stackelberg
leader’s sequence $\vec q_2$, we must use representation (41.11) cast in terms of the history $\check z^t$ and not a corresponding
representation cast in terms of $z^t$.
Given the sequence 𝑞2⃗ chosen by the Stackelberg leader in our duopoly model, it turns out that the Stackelberg follower’s
problem is recursive in the natural state variables that confront a follower at any time 𝑡 ≥ 0.
This means that the follower’s plan is time consistent.
To verify these claims, we’ll formulate a recursive version of a follower’s problem that builds on our recursive
representation of the Stackelberg leader’s plan and our use of the Big K, little k idea.
We now use what amounts to another “Big 𝐾, little 𝑘” trick (see rational expectations equilibrium) to formulate a recursive
version of a follower’s problem cast in terms of an ordinary Bellman equation.
Firm 1, the follower, faces $\{q_{2t}\}_{t=0}^\infty$ as a given quantity sequence chosen by the leader and believes that its output price
at $t$ satisfies
To do so, recall that under the Stackelberg plan, firm 2 sets output according to the $q_{2t}$ component of
$$
y_{t+1} = \begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \\ x_t \end{bmatrix}
$$
which is governed by
$$
y_{t+1} = (A - BF) y_t
$$
To obtain a recursive representation of a $\{q_{2t}\}$ sequence that is exogenous to firm 1, we define a state $\tilde y_t$
$$
\tilde y_t = \begin{bmatrix} 1 \\ q_{2t} \\ \tilde q_{1t} \\ \tilde x_t \end{bmatrix}
$$
that evolves according to
$$
\tilde y_{t+1} = (A - BF) \tilde y_t
$$
subject to the initial condition $\tilde q_{10} = q_{10}$ and $\tilde x_0 = x_0$ where $x_0 = -P_{22}^{-1} P_{21}$ as stated above.
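Firm 1’s state vector is
$$
X_t = \begin{bmatrix} \tilde y_t \\ q_{1t} \end{bmatrix}
$$
It follows that the follower firm 1 faces law of motion
$$
\begin{bmatrix} \tilde y_{t+1} \\ q_{1t+1} \end{bmatrix}
= \begin{bmatrix} A - BF & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \tilde y_t \\ q_{1t} \end{bmatrix}
+ \begin{bmatrix} 0 \\ 1 \end{bmatrix} x_t \tag{41.12}
$$
This specification assures that from the point of view of firm 1, $q_{2t}$ is an exogenous process.
Here
• $\tilde q_{1t}, \tilde x_t$ play the role of Big K
• $q_{1t}, x_t$ play the role of little k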
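The time $t$ component of firm 1’s objective is
$$
\tilde X_t' \tilde R \tilde X_t - x_t^2 \tilde Q =
\begin{bmatrix} 1 \\ q_{2t} \\ \tilde q_{1t} \\ \tilde x_t \\ q_{1t} \end{bmatrix}'
\begin{bmatrix}
0 & 0 & 0 & 0 & \frac{a_0}{2} \\
0 & 0 & 0 & 0 & -\frac{a_1}{2} \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
\frac{a_0}{2} & -\frac{a_1}{2} & 0 & 0 & -a_1
\end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t} \\ \tilde q_{1t} \\ \tilde x_t \\ q_{1t} \end{bmatrix}
- \gamma x_t^2
$$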
Firm 1’s optimal decision rule is
$$
x_t = -\tilde F X_t
$$
and its state evolves according to
$$
\tilde X_{t+1} = (\tilde A - \tilde B \tilde F) X_t
$$
under its optimal decision rule.
Later we shall compute $\tilde F$ and verify that when we set
$$
X_0 = \begin{bmatrix} 1 \\ q_{20} \\ q_{10} \\ x_0 \\ q_{10} \end{bmatrix}
$$
we recover
$$
x_0 = -\tilde F \tilde X_0,
$$
which will verify that we have properly set up a recursive representation of the follower’s problem facing the Stackelberg
leader’s $\vec q_2$.
The follower can solve its problem using dynamic programming because its problem is recursive in what for it are the
natural state variables, namely
$$
\begin{bmatrix} 1 \\ q_{2t} \\ \tilde q_{1t} \\ \tilde x_t \end{bmatrix}
$$
It follows that the follower’s plan is time consistent.
Here is our code to compute a Stackelberg plan via the linear-quadratic dynamic program described above.
Let’s use it to compute the Stackelberg plan.
# Parameters
a0 = 10
a1 = 2
β = 0.96
γ = 120
n = 300
tol0 = 1e-8
tol1 = 1e-16
tol2 = 1e-2
βs = np.ones(n)
βs[1:] = β
βs = βs.cumprod()
# In LQ form
Alhs = np.eye(4)
Arhs = np.eye(4)
Arhs[2, 3] = 1
Alhsinv = la.inv(Alhs)
A = Alhsinv @ Arhs
Q = np.array([[γ]])
# Simulate forward
π_leader = np.zeros(n)
z0 = np.array([[1, 1, 1]]).T
x0 = H_0_0 @ z0
y0 = np.vstack((z0, x0))
π_matrix = (R + F.T @ Q @ F)
for t in range(n):
    π_leader[t] = -(yt[:, t].T @ π_matrix @ yt[:, t])
# Display policies
print("Computed policy for Continuation Stackelberg leader\n")
print(f"F = {F}")
Now let’s use the code to compute and display outcomes as a Stackelberg plan unfolds.
The following code plots quantities chosen by the Stackelberg leader and follower, together with the equilibrium output
price.
We’ll compute the value 𝑤(𝑥0 ) attained by the Stackelberg leader, where 𝑥0 is given by the maximizer (41.10) of sub-
problem 2.
We’ll compute it two ways and get the same answer.
In addition to being a useful check on the accuracy of our coding, computing things in these two ways helps us think
about the structure of the problem.
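The idea of computing a value in two ways can be sketched on a small self-contained linear-quadratic problem. The matrices below are hypothetical and unrelated to the duopoly model, and the Riccati iteration is written out with plain NumPy rather than with the `quantecon` `LQ` class. The "direct" value is $-y_0' P y_0$; the "forward" value is the discounted sum of $-y_t'(R + F'QF)y_t$ along the path generated by $u_t = -F y_t$.

```python
import numpy as np

β = 0.96
# Hypothetical small LQ problem: minimize sum β^t (y'Ry + u'Qu), y' = Ay + Bu
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
R = np.eye(2)
Q = np.array([[1.0]])

def policy(P):
    # F = β (Q + β B'PB)^{-1} B'PA
    return β * np.linalg.solve(Q + β * B.T @ P @ B, B.T @ P @ A)

# Iterate on the Bellman equation until P (approximately) converges
P = np.zeros((2, 2))
for _ in range(2000):
    F = policy(P)
    A_cl = A - B @ F
    P = R + F.T @ Q @ F + β * A_cl.T @ P @ A_cl

F = policy(P)
A_cl = A - B @ F
y0 = np.array([1.0, 1.0])

# Way 1: direct evaluation of the value function
v_direct = -y0 @ P @ y0

# Way 2: forward simulation of discounted one-period payoffs
v_forward, y = 0.0, y0.copy()
for t in range(2000):
    v_forward -= β**t * (y @ (R + F.T @ Q @ F) @ y)
    y = A_cl @ y

print(f"direct = {v_direct:.6f}, forward = {v_forward:.6f}")
```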
# Display values
print("Computed values for the Stackelberg leader at t=0:\n")
print(f"v_leader_forward(forward sim) = {v_leader_forward:.4f}")
print(f"v_leader_direct (direct) = {v_leader_direct:.4f}")
True
True
yt_reset = yt.copy()
yt_reset[-1, :] = (H_0_0 @ yt[:3, :])
for t in range(n):
    vt_leader[t] = -yt[:, t].T @ P @ yt[:, t]
    vt_reset_leader[t] = -yt_reset[:, t].T @ P @ yt_reset[:, t]
plt.tight_layout()
plt.show()
We now formulate and compute the recursive version of the follower’s problem.
We check that the recursive Big 𝐾 , little 𝑘 formulation of the follower’s problem produces the same output path 𝑞1⃗ that
we computed when we solved the Stackelberg problem
A_tilde = np.eye(5)
A_tilde[:4, :4] = A - B @ F
Q_tilde = Q
B_tilde = np.array([[0, 0, 0, 0, 1]]).T
Note: Variables with _tilde are obtained from solving the follower’s problem – those without are from the Stackelberg
problem
4.440892098500626e-16
# x0 == x0_tilde
yt[:, 0][-1] - (yt_tilde[:, 1] - yt_tilde[:, 0])[-1] < tol0
True
If we inspect coefficients in the decision rule −𝐹 ̃ , we should be able to spot why the follower chooses to set 𝑥𝑡 = 𝑥𝑡̃
when it sets 𝑥𝑡 = −𝐹 ̃ 𝑋𝑡 in the recursive formulation of the follower problem.
Can you spot what features of 𝐹 ̃ imply this?
True
for i in range(1000):
    P_guess = ((R_tilde + F_tilde_star.T @ Q @ F_tilde_star) +
               β * (A_tilde - B_tilde @ F_tilde_star).T @ P_guess
               @ (A_tilde - B_tilde @ F_tilde_star))
112.65590740578115
112.65590740578136
for i in range(100):
    # Compute P_iter
    P_iter = np.zeros((5, 5))
    for j in range(1000):
        P_iter = ((R_tilde + F_iter.T @ Q @ F_iter) + β
                  * (A_tilde - B_tilde @ F_iter).T @ P_iter
                  @ (A_tilde - B_tilde @ F_iter))
    # Update F_iter
    F_iter = (β * la.inv(Q + β * B_tilde.T @ P_iter @ B_tilde)
              @ B_tilde.T @ P_iter @ A_tilde)
# Simulate the system using `F_tilde_star` and check that it gives the
# same result as the original solution
for t in range(n-1):
    yt_tilde_star[t+1, :] = (A_tilde - B_tilde @ F_tilde_star) \
        @ yt_tilde_star[t, :]
fig, ax = plt.subplots()
ax.plot(yt_tilde_star[:, 4], 'r', label="q_tilde")
ax.plot(yt_tilde[2], 'b', label="q")
ax.legend()
plt.show()
0.0
𝑧𝑡+1 = (𝐴 − 𝐵1 𝐹1 − 𝐵2 𝐹2 )𝑧𝑡
# In LQ form
A = np.eye(3)
B1 = np.array([[0], [0], [1]])
B2 = np.array([[0], [1], [0]])
Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0
# Simulate forward
AF = A - B1 @ F1 - B2 @ F2
z = np.empty((3, n))
z[:, 0] = 1, 1, 1
for t in range(n-1):
    z[:, t+1] = AF @ z[:, t]
# Display policies
print("Computed policies for firm 1 and firm 2:\n")
print(f"F1 = {F1}")
print(f"F2 = {F2}")
q1 = z[1, :]
q2 = z[2, :]
q = q1 + q2 # Total output, MPE
p = a0 - a1 * q # Price, MPE
# Computes the maximum difference between the two quantities of the two firms
np.max(np.abs(q1 - q2))
8.881784197001252e-16
# Compute values
u1 = (- F1 @ z).flatten()
u2 = (- F2 @ z).flatten()
π_1 = p * q1 - γ * (u1) ** 2
π_2 = p * q2 - γ * (u2) ** 2
# Display values
print("Computed values for firm 1 and firm 2:\n")
print(f"v1(forward sim) = {v1_forward:.4f}; v1 (direct) = {v1_direct:.4f}")
print(f"v2 (forward sim) = {v2_forward:.4f}; v2 (direct) = {v2_direct:.4f}")
# Sanity check
Λ1 = A - B2 @ F2
lq1 = qe.LQ(Q1, R1, Λ1, B1, beta=β)
P1_ih, F1_ih, d = lq1.stationary_values()
True
It is enlightening to compare equilibrium values for firms 1 and 2 under two alternative settings:
• A Markov perfect equilibrium like that described in this lecture
• A Stackelberg equilibrium
The following code performs the required computations, then plots the continuation values.
vt_MPE = np.zeros(n)
vt_follower = np.zeros(n)
for t in range(n):
    vt_MPE[t] = -z[:, t].T @ P1 @ z[:, t]
    vt_follower[t] = -yt_tilde[:, t].T @ P_tilde @ yt_tilde[:, t]
# Display values
print("Computed values:\n")
print(f"vt_leader(y0) = {vt_leader[0]:.4f}")
print(f"vt_follower(y0) = {vt_follower[0]:.4f}")
print(f"vt_MPE(y0) = {vt_MPE[0]:.4f}")
Computed values:
vt_leader(y0) = 150.0324
vt_follower(y0) = 112.6559
vt_MPE(y0) = 133.3296
# Compute the difference in total value between the Stackelberg and the MPE
vt_leader[0] + vt_follower[0] - 2 * vt_MPE[0]
-3.9709425620890784
FORTYTWO
42.1 Introduction
This lecture uses what we call a machine learning approach to compute a Ramsey plan for a version of a model
of Calvo [Calvo, 1978].
We use another approach to compute a Ramsey plan for Calvo’s model in another quantecon lecture Time Inconsistency
of Ramsey Plans.
The Time Inconsistency of Ramsey Plans lecture uses an analytic approach based on dynamic programming
squared to guide computations.
Dynamic programming squared provides information about the structure of mathematical objects in terms of which a
Ramsey plan can be represented recursively.
Using that information paves the way to computing a Ramsey plan efficiently.
Included in the structural information that dynamic programming squared provides in quantecon lecture Time Inconsis-
tency of Ramsey Plans are
• a state variable that confronts a continuation Ramsey planner, and
• two Bellman equations
– one that describes the behavior of the representative agent
– another that describes decision problems of a Ramsey planner and of a continuation Ramsey planner
In this lecture, we approach the Ramsey planner in a less sophisticated way that proceeds without knowing the mathe-
matical structure imparted by dynamic programming squared.
We simply choose a pair of infinite sequences of real numbers that maximizes a Ramsey planner’s objective function.
The pair consists of
• a sequence 𝜃 ⃗ of inflation rates
• a sequence $\vec \mu$ of money growth rates
Because it fails to take advantage of the structure recognized by dynamic programming squared and, relative to the
dynamic programming squared approach, proliferates parameters, we take the liberty of calling this a machine learning
approach.
This is similar to what other machine learning algorithms also do.
Comparing the calculations in this lecture with those in our sister lecture Time Inconsistency of Ramsey Plans provides us
with a laboratory that can help us appreciate promises and limits of machine learning approaches more generally.
In this lecture, we’ll actually deploy two machine learning approaches.
We study a linear-quadratic version of a model that Guillermo Calvo [Calvo, 1978] used to illustrate the time inconsis-
tency of optimal government plans.
Calvo’s model focuses on intertemporal tradeoffs between
• utility accruing from a representative agent’s anticipations of future deflation that lower the agent’s cost of holding
real money balances and prompt him to increase his liquidity, as measured by his stock of real money balances,
and
• social costs associated with the distorting taxes that a government levies to acquire the paper money that it destroys
in order to generate prospective deflation
The model features
• rational expectations
• costly government actions at all dates 𝑡 ≥ 1 that increase the representative agent’s utilities at dates before 𝑡
The model combines ideas from papers by Cagan [Cagan, 1956], [Sargent and Wallace, 1973], and Calvo [Calvo, 1978].
There is no uncertainty.
Let:
• 𝑝𝑡 be the log of the price level
• 𝑚𝑡 be the log of nominal money balances
• 𝜃𝑡 = 𝑝𝑡+1 − 𝑝𝑡 be the net rate of inflation between 𝑡 and 𝑡 + 1
• 𝜇𝑡 = 𝑚𝑡+1 − 𝑚𝑡 be the net rate of growth of nominal balances
The demand for real balances is governed by a perfect foresight version of a Cagan [Cagan, 1956] demand function for
real balances:
for 𝑡 ≥ 0.
Equation (42.1) asserts that the representative agent’s demand for real balances is inversely related to the representative
agent’s expected rate of inflation, which equals the actual rate of inflation because there is no uncertainty here.
(When there is no uncertainty, an assumption of rational expectations becomes equivalent to perfect foresight).
Subtracting the demand function (42.1) at time 𝑡 from the demand function at 𝑡 + 1 gives:
𝜇𝑡 − 𝜃𝑡 = −𝛼𝜃𝑡+1 + 𝛼𝜃𝑡
or
$$
\theta_t = \frac{\alpha}{1 + \alpha} \theta_{t+1} + \frac{1}{1 + \alpha} \mu_t \tag{42.2}
$$
Because $\alpha > 0$, $0 < \frac{\alpha}{1 + \alpha} < 1$.
Definition 42.3.1
For scalar $b_t$, let $L^2$ be the space of sequences $\{b_t\}_{t=0}^\infty$ that satisfy
$$
\sum_{t=0}^\infty b_t^2 < +\infty
$$
The government values a representative household’s utility of real balances at time 𝑡 according to the utility function
$$
U(m_t - p_t) = u_0 + u_1 (m_t - p_t) - \frac{u_2}{2} (m_t - p_t)^2, \quad u_0 > 0, \; u_1 > 0, \; u_2 > 0 \tag{42.4}
$$
The money demand function (42.1) and the utility function (42.4) imply that
$$
U(-\alpha \theta_t) = u_0 + u_1 (-\alpha \theta_t) - \frac{u_2}{2} (-\alpha \theta_t)^2. \tag{42.5}
$$
𝑢1
Note: The “bliss level” of real balances is 𝑢2 ; the inflation rate that attains it is − 𝑢𝑢1𝛼 .
2
We assume that the government incurs social costs $\frac{c}{2} \mu_t^2$ when it changes the stock of nominal money balances at rate $\mu_t$ at time $t$.
Therefore, the one-period welfare function of a benevolent government is
$$
s(\theta_t, \mu_t) = U(-\alpha \theta_t) - \frac{c}{2} \mu_t^2 .
$$
$$
\vec\theta \in L^2 \tag{42.7}
$$
where
$$
\lambda = \frac{\alpha}{1+\alpha} .
$$
Parameters:
• Demand for money parameter is $\alpha > 0$; we set its default value $\alpha = 1$
– Induced demand function for money parameter is $\lambda = \frac{\alpha}{1+\alpha}$
To prepare the way for our calculations, we’ll remind ourselves of the mathematical objects in play.
• sequences of inflation rates and money creation rates:
with $U(-\alpha\theta_t) = h_0 + h_1 \theta_t + h_2 \theta_t^2$, where
$$
h_0 = u_0, \quad h_1 = -\alpha u_1, \quad h_2 = -\frac{u_2 \alpha^2}{2}
$$
A Ramsey planner chooses 𝜇⃗ to maximize the government’s value function (42.9) subject to equations (42.8).
A solution 𝜇⃗ of this problem is called a Ramsey plan.
Following Calvo [Calvo, 1978], we assume that the government chooses the money growth sequence 𝜇⃗ once and for all
at, or before, time 0.
An optimal government plan under this timing protocol is an example of what is often called a Ramsey plan.
Notice that while the government is in effect choosing a bivariate time series $(\vec\mu, \vec\theta)$, the government’s problem is static in the sense that it treats that time series as a single object to be chosen at a single point in time.
We anticipate that under a Ramsey plan the sequences {𝜃𝑡 } and {𝜇𝑡 } both converge to stationary values.
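Thus, we guess that under the optimal policy $\lim_{t \to +\infty} \mu_t = \bar\mu$.
Convergence of $\mu_t$ to $\bar\mu$ together with formula (42.8) for the inflation rate then implies that $\lim_{t \to +\infty} \theta_t = \bar\mu$ as well.
We’ll guess a time $T$ large enough that $\mu_t$ has gotten very close to the limit $\bar\mu$.
Then we’ll approximate $\vec\mu$ by a truncated vector with the property that
$$
\mu_t = \bar\mu \quad \forall t \geq T
$$
$$
\theta_t = \bar\theta \quad \forall t \geq T
$$
$$
\tilde\mu = \begin{bmatrix} \mu_0 & \mu_1 & \cdots & \mu_{T-1} & \bar\mu \end{bmatrix}
$$
$$
\tilde\theta = \begin{bmatrix} \theta_0 & \theta_1 & \cdots & \theta_{T-1} & \bar\theta \end{bmatrix}
$$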
for 𝑡 = 0, 1, … , 𝑇 − 1.
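The truncated formula for $\tilde\theta$ that the text refers to as (42.10) can be recovered by iterating (42.2) forward. The following is a sketch of that step, assuming (as stated above) that $\mu_t = \bar\mu$ and $\theta_t = \bar\theta = \bar\mu$ for $t \geq T$, and writing $\lambda = \frac{\alpha}{1+\alpha}$:

$$
\theta_t = (1-\lambda)\mu_t + \lambda \theta_{t+1}
= (1-\lambda) \sum_{j=0}^{T-1-t} \lambda^j \mu_{t+j} + \lambda^{T-t} \bar\mu ,
\qquad t = 0, 1, \ldots, T-1 .
$$

The first equality is (42.2); the second substitutes it into itself $T - t$ times and uses $\theta_T = \bar\mu$.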
Formula for 𝑉
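Having specified a truncated vector $\tilde\mu$ and having computed $\tilde\theta$ by using formula (42.10), we shall write a Python function that computes
$$
\tilde V = \sum_{t=0}^\infty \beta^t \left( h_0 + h_1 \tilde\theta_t + h_2 \tilde\theta_t^2 - \frac{c}{2} \mu_t^2 \right) \tag{42.11}
$$
or more precisely
$$
\tilde V = \sum_{t=0}^{T-1} \beta^t \left( h_0 + h_1 \tilde\theta_t + h_2 \tilde\theta_t^2 - \frac{c}{2} \mu_t^2 \right) + \frac{\beta^T}{1-\beta} \left( h_0 + h_1 \bar\mu + h_2 \bar\mu^2 - \frac{c}{2} \bar\mu^2 \right)
$$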
We now describe code that maximizes the criterion function (42.9) subject to equations (42.8) by choice of the truncated vector $\tilde\mu$.
We use a brute force or machine learning approach that just hands our problem off to code that maximizes $V$ with respect to the components of $\tilde\mu$ by using gradient descent (applied to $-V$, since descent methods minimize).
We hope that answers will agree with those obtained by other more structured methods in this quantecon lecture Time Inconsistency of Ramsey Plans.
42.6.1 Implementation
We will implement the above in Python using JAX and Optax libraries.
We use the following imports in this lecture
We’ll eventually want to compare the results we obtain here to those obtained in this quantecon lecture Time Inconsistency of Ramsey Plans.
To enable us to do that, we copy the class ChangLQ used in that lecture.
We hide the cell that copies the class, but readers can find details of the class in this quantecon lecture Time Inconsistency
of Ramsey Plans.
Now we compute the value of 𝑉 under this setup, and compare it against those obtained in this section Outcomes under
Three Timing Protocols of the sister quantecon lecture Time Inconsistency of Ramsey Plans.
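import numpy as np
import jax.numpy as jnp
from jax import jit

@jit
def compute_θ(μ, α=1):
    λ = α / (1 + α)
    T = len(μ) - 1
    μbar = μ[-1]
    # Lines elided in the source; reconstructed here from formula (42.10)
    λ_powers = λ ** jnp.arange(T + 1)
    weighted_sums = jnp.array(
        [jnp.sum(λ_powers[:T-t] * μ[t:T]) for t in range(T)])
    θ = (1 - λ) * weighted_sums + λ**(T - jnp.arange(T)) * μbar
    θ = jnp.append(θ, μbar)  # θ_T equals the limiting value μbar
    return θ

@jit
def compute_hs(u0, u1, u2, α):
    h0 = u0
    h1 = -u1 * α
    h2 = -0.5 * u2 * α**2
    return h0, h1, h2

@jit
def compute_V(μ, β, c, α=1, u0=1, u1=0.5, u2=3):
    θ = compute_θ(μ, α)
    h0, h1, h2 = compute_hs(u0, u1, u2, α)
    T = len(μ) - 1
    t = np.arange(T)
    # Elided in the source; reconstructed from the truncated formula for V
    V_sum = jnp.sum(β**t * (h0 + h1 * θ[:T] + h2 * θ[:T]**2 - 0.5 * c * μ[:T]**2))
    V_final = (β**T / (1 - β)) * (h0 + h1 * μ[-1] + h2 * μ[-1]**2 - 0.5 * c * μ[-1]**2)
    V = V_sum + V_final
    return V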
deviation = 1.430511474609375e-06
if i % 100 == 0:
print(f"Iteration {i}, grad norm: {jnp.linalg.norm(grads)}")
return params
%%time
# Optimize μ
optimized_μ = adam_optimizer(grad_V, μ_init)
print(f"optimized μ = \n{optimized_μ}")
print(f"original μ = \n{clq.μ_series}")
original μ =
[-0.06450708 -0.09033982 -0.10068489 -0.10482772 -0.10648677 -0.10715115
-0.10741722 -0.10752377 -0.10756644 -0.10758352 -0.10759037 -0.10759311
-0.1075942 -0.10759464 -0.10759482 -0.10759489 -0.10759492 -0.10759493
-0.10759493 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494
-0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494
-0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494
-0.10759494 -0.10759494 -0.10759494 -0.10759494]
deviation = 2.308478030954575e-07
Array(6.8357825, dtype=float32)
Array(6.835783, dtype=float32)
We take a brief detour to solve a restricted version of the Ramsey problem defined above.
First, recall that a Ramsey planner chooses 𝜇⃗ to maximize the government’s value function (42.9) subject to equations
(42.8).
We now define a distinct problem in which the planner chooses 𝜇⃗ to maximize the government’s value function (42.9)
subject to equation (42.8) and the additional restriction that 𝜇𝑡 = 𝜇̄ for all 𝑡.
The solution of this problem is a time-invariant $\mu_t$ that this quantecon lecture Time Inconsistency of Ramsey Plans calls $\mu^{CR}$.
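Because a constant $\mu_t = \bar\mu$ implies $\theta_t = \bar\mu$, the restricted problem reduces to maximizing a scalar quadratic, so it can be cross-checked without any optimizer. The sketch below does that in plain NumPy. The calibration ($\alpha = 1$, $\beta = 0.85$, $c = 2$, $u_0 = 1$, $u_1 = 0.5$, $u_2 = 3$) and the helper name `V_const` are assumptions made for illustration; $\beta$ and $c$ are not stated in this passage.

```python
import numpy as np

# Assumed calibration (β and c are not stated in this passage)
α, β, c = 1.0, 0.85, 2.0
u0, u1, u2 = 1.0, 0.5, 3.0

h0, h1, h2 = u0, -u1 * α, -0.5 * u2 * α**2

def V_const(μ_bar):
    """Value of holding μ_t = θ_t = μ_bar forever: s(μ_bar, μ_bar) / (1 - β)."""
    return (h0 + h1 * μ_bar + h2 * μ_bar**2 - 0.5 * c * μ_bar**2) / (1 - β)

# First-order condition: h1 + (2 * h2 - c) * μ_bar = 0
μ_CR = h1 / (c - 2 * h2)
V_CR = V_const(μ_CR)

# Cross-check the first-order condition with a fine grid search
grid = np.linspace(-0.5, 0.5, 100_001)
μ_grid = grid[np.argmax(V_const(grid))]

print(μ_CR, V_CR, μ_grid)
```

Under these assumed parameters the value lines up with the `6.8333` figures printed for the restricted plan below.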
# Optimize μ
optimized_μ_CR = adam_optimizer(grad_V, μ_init)
print(f"optimized μ = \n{optimized_μ_CR}")
Comparing it to 𝜇𝐶𝑅 in Time Inconsistency of Ramsey Plans, we again obtained very close answers.
np.linalg.norm(clq.μ_CR - optimized_μ_CR)
3.7252903e-08
Array(6.8333354, dtype=float32)
Array(6.8333344, dtype=float32)
By thinking about the mathematical structure of the Ramsey problem and using some linear algebra, we can simplify the
problem that we hand over to a machine learning algorithm.
We start by recalling that the Ramsey problem chooses $\vec\mu$ to maximize the government’s value function (42.9) subject to equation (42.8).
This turns out to be an optimization problem with a quadratic objective function and linear constraints.
First-order conditions for this problem are a set of simultaneous linear equations in 𝜇.⃗
If we trust that the second-order conditions for a maximum are also satisfied (they are in our problem), we can compute
the Ramsey plan by solving these equations for 𝜇.⃗
We’ll apply this approach here and compare answers with what we obtained above with the gradient descent approach.
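$$
\mu_t = \mu_T \quad \forall t \geq T
$$
and that
$$
\theta_t = \theta_T = \mu_T \quad \forall t \geq T
$$
Again, define
$$
\vec\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_{T-1} \\ \theta_T \end{bmatrix}, \quad
\vec\mu = \begin{bmatrix} \mu_0 \\ \mu_1 \\ \vdots \\ \mu_{T-1} \\ \mu_T \end{bmatrix}
$$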
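Write the system of $T + 1$ equations (42.10) that relate $\vec\theta$ to a choice of $\vec\mu$ as the single matrix equation
$$
\frac{1}{(1-\lambda)}
\begin{bmatrix}
1 & -\lambda & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & -\lambda & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & -\lambda & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & -\lambda & 0 \\
0 & 0 & 0 & 0 & \cdots & 1 & -\lambda \\
0 & 0 & 0 & 0 & \cdots & 0 & 1-\lambda
\end{bmatrix}
\begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_{T-1} \\ \theta_T \end{bmatrix}
=
\begin{bmatrix} \mu_0 \\ \mu_1 \\ \mu_2 \\ \vdots \\ \mu_{T-1} \\ \mu_T \end{bmatrix}
$$
or
$$
A \vec\theta = \vec\mu
$$
or
$$
\vec\theta = B \vec\mu
$$
where
$$
B = A^{-1}
$$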
B = jnp.linalg.inv(A)
return A, B
A, B = construct_B(α=clq.α, T=T)
print(f'A = \n {A}')
A =
[[ 2. -1. 0. ... 0. 0. 0.]
[ 0. 2. -1. ... 0. 0. 0.]
[ 0. 0. 2. ... 0. 0. 0.]
...
[ 0. 0. 0. ... 2. -1. 0.]
[ 0. 0. 0. ... 0. 2. -1.]
[ 0. 0. 0. ... 0. 0. 1.]]
np.allclose(θs, B @ clq.μ_series)
True
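As a cross-check on the matrix equation above, the following NumPy sketch builds $A$ and $B = A^{-1}$ directly from the displayed pattern. The function name `construct_A_B` and its argument convention (here `T` is the last time index, so the matrices are $(T+1) \times (T+1)$) are assumptions for illustration and need not match the lecture's `construct_B`.

```python
import numpy as np

def construct_A_B(α=1.0, T=40):
    """Build the (T+1) x (T+1) matrix A with A θ = μ, and B = A^{-1}."""
    λ = α / (1 + α)
    # 1 on the diagonal, -λ on the superdiagonal, all scaled by 1 / (1 - λ)
    A = (np.eye(T + 1) - λ * np.eye(T + 1, k=1)) / (1 - λ)
    A[-1, -1] = 1.0          # last row encodes θ_T = μ_T
    B = np.linalg.inv(A)
    return A, B

A, B = construct_A_B(α=1.0, T=5)

# A constant money growth path should map into the same constant inflation path
μ_const = np.full(6, -0.1)
θ_const = B @ μ_const
print(A)
print(θ_const)
```

With $\alpha = 1$ the rows of `A` reproduce the `2, -1` pattern and the final `1` shown in the printed matrix above.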
$$
\vec\beta = \begin{bmatrix} 1 \\ \beta \\ \vdots \\ \beta^{T-1} \\ \frac{\beta^T}{1-\beta} \end{bmatrix}
$$
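Then we have:
$$
h_1 \sum_{t=0}^\infty \beta^t \theta_t = h_1 \cdot \vec\beta^T \vec\theta = (h_1 \cdot B^T \vec\beta)^T \vec\mu = g^T \vec\mu
$$
where $g = h_1 \cdot B^T \vec\beta$ is a $(T+1) \times 1$ vector,
$$
h_2 \sum_{t=0}^\infty \beta^t \theta_t^2 = \vec\mu^T (B^T (h_2 \cdot \vec\beta \cdot \mathbf{I}) B) \vec\mu = \vec\mu^T M \vec\mu
$$
where $M = B^T (h_2 \cdot \vec\beta \cdot \mathbf{I}) B$ is a $(T+1) \times (T+1)$ matrix, and
$$
\frac{c}{2} \sum_{t=0}^\infty \beta^t \mu_t^2 = \vec\mu^T \left( \frac{c}{2} \cdot \vec\beta \cdot \mathbf{I} \right) \vec\mu = \vec\mu^T F \vec\mu
$$
where $F = \frac{c}{2} \cdot \vec\beta \cdot \mathbf{I}$ is a $(T+1) \times (T+1)$ matrix.
Here $\vec\beta \cdot \mathbf{I}$ denotes the diagonal matrix with the entries of $\vec\beta$ on its diagonal.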
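It follows that
$$
\begin{aligned}
J = V - h_0 &= \sum_{t=0}^\infty \beta^t \left( h_1 \theta_t + h_2 \theta_t^2 - \frac{c}{2} \mu_t^2 \right) \\
&= g^T \vec\mu + \vec\mu^T M \vec\mu - \vec\mu^T F \vec\mu \\
&= g^T \vec\mu + \vec\mu^T (M - F) \vec\mu \\
&= g^T \vec\mu + \vec\mu^T G \vec\mu
\end{aligned}
$$
where $G = M - F$.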
To compute the optimal government plan we want to maximize $J$ with respect to $\vec\mu$.
We use linear algebra formulas for differentiating linear and quadratic forms to compute the gradient of $J$ with respect to $\vec\mu$
$$
\frac{\partial}{\partial \vec\mu} J = g + 2 G \vec\mu .
$$
Setting $\frac{\partial}{\partial \vec\mu} J = 0$, the maximizing $\vec\mu$ is
$$
\vec\mu^R = -\frac{1}{2} G^{-1} g
$$
The associated optimal inflation sequence is
$$
\vec\theta^R = B \vec\mu^R
$$
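The closed-form expression above can be evaluated directly with NumPy. The sketch below is a minimal, self-contained version under an assumed calibration ($\alpha = 1$, $\beta = 0.85$, $c = 2$, $u_0 = 1$, $u_1 = 0.5$, $u_2 = 3$, and a truncated vector with 40 entries); it is not the lecture's JAX code, and the variable names are chosen here for illustration.

```python
import numpy as np

# Assumed calibration; T is the last time index, so vectors have T + 1 = 40 entries
α, β, c, u0, u1, u2, T = 1.0, 0.85, 2.0, 1.0, 0.5, 3.0, 39
λ = α / (1 + α)
h0, h1, h2 = u0, -u1 * α, -0.5 * u2 * α**2

# A θ = μ and B = A^{-1}
A = (np.eye(T + 1) - λ * np.eye(T + 1, k=1)) / (1 - λ)
A[-1, -1] = 1.0
B = np.linalg.inv(A)

# Discount weights, with the tail weight β^T / (1 - β) on the last entry
β_vec = np.append(β ** np.arange(T), β**T / (1 - β))

g = h1 * B.T @ β_vec
M = B.T @ (h2 * np.diag(β_vec)) @ B
F = 0.5 * c * np.diag(β_vec)
G = M - F

# Ramsey plan: solve g + 2 G μ = 0 rather than forming G^{-1} explicitly
μ_R = -0.5 * np.linalg.solve(G, g)
θ_R = B @ μ_R

# V = h0 / (1 - β) + J, since the entries of β_vec sum to 1 / (1 - β)
V_R = h0 * β_vec.sum() + g @ μ_R + μ_R @ G @ μ_R
print(μ_R[:3], μ_R[-1], V_R)
```

Using `np.linalg.solve` instead of an explicit inverse is a design choice for numerical stability; the result is the same $\vec\mu^R$.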
With the more structured approach, we can update our gradient descent exercise with compute_J
_, B = construct_B(α, T+1)
β_vec = jnp.hstack([β**jnp.arange(T),
(β**T/(1-β))])
θ = B @ μ
βθ_sum = jnp.sum((β_vec * h1) * θ)
βθ_square_sum = β_vec * h2 * θ.T @ θ
βμ_square_sum = 0.5 * c * β_vec * μ.T @ μ
%%time
# Optimize μ
optimized_μ = adam_optimizer(grad_J, μ_init)
print(f"optimized μ = \n{optimized_μ}")
print(f"original μ = \n{clq.μ_series}")
original μ =
[-0.06450708 -0.09033982 -0.10068489 -0.10482772 -0.10648677 -0.10715115
-0.10741722 -0.10752377 -0.10756644 -0.10758352 -0.10759037 -0.10759311
-0.1075942 -0.10759464 -0.10759482 -0.10759489 -0.10759492 -0.10759493
-0.10759493 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494
-0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494
-0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494
-0.10759494 -0.10759494 -0.10759494 -0.10759494]
deviation = 2.3748542332668876e-07
Array(6.8357825, dtype=float32)
We find that by exploiting more knowledge about the structure of the problem, we can significantly speed up our computation.
We can also derive a closed-form solution for 𝜇⃗
_, B = construct_B(α, T+1)
β_vec = jnp.hstack([β**jnp.arange(T),
(β**T/(1-β))])
g = h1 * B.T @ β_vec
M = B.T @ (h2 * jnp.diag(β_vec)) @ B
F = c/2 * jnp.diag(β_vec)
G = M - F
closed-form μ =
[-0.0645071 -0.09033982 -0.1006849 -0.1048277 -0.10648677 -0.10715113
-0.10741723 -0.10752378 -0.10756643 -0.10758351 -0.10759034 -0.10759313
-0.10759421 -0.10759464 -0.10759482 -0.1075949 -0.10759489 -0.10759492
-0.10759492 -0.10759491 -0.10759495 -0.10759494 -0.10759495 -0.10759493
-0.10759491 -0.10759491 -0.10759494 -0.10759491 -0.10759491 -0.10759495
-0.10759498 -0.10759492 -0.10759494 -0.10759485 -0.10759497 -0.10759495
-0.10759493 -0.10759494 -0.10759498 -0.10759494]
deviation = 1.47137171779832e-07
Array(6.835783, dtype=float32)
deviation = 2.535387864099903e-07
We can check the gradient of the analytical solution against the JAX computed version
_, B = construct_B(α, T+1)
β_vec = jnp.hstack([β**jnp.arange(T),
(β**T/(1-β))])
g = h1 * B.T @ β_vec
M = (h2 * B.T @ jnp.diag(β_vec) @ B)
F = c/2 * jnp.diag(β_vec)
G = M - F
return g + (2*G @ μ)
closed_grad
- grad_J(jnp.ones(T))
deviation = 4.074267394571507e-07
Note that while 𝜃𝑡 is less than 𝜇𝑡 for low 𝑡’s, it eventually converges to the limit 𝜇̄ of 𝜇𝑡 as 𝑡 → +∞.
This pattern reflects how formula (42.3) makes 𝜃𝑡 be a weighted average of future 𝜇𝑡 ’s.
For subsequent analysis, it will be useful to compute a sequence $\{v_t\}_{t=0}^T$ of what we’ll call continuation values along a Ramsey plan.
To do so, we’ll start at date $T$ and compute
$$
v_T = \frac{1}{1-\beta} s(\bar\mu, \bar\mu).
$$
Then starting from $t = T - 1$, we’ll iterate backwards on the recursion
$$
v_t = s(\theta_t, \mu_t) + \beta v_{t+1}
$$
for $t = T - 1, T - 2, \ldots, 0$.
v_t = np.zeros(T)
μ_bar = μ[-1]
# Reduce parameters
s_p = lambda θ, μ: s(θ, μ,
u0=u0, u1=u1, u2=u2, α=α, c=c)
# Define v_T
v_t[T-1] = (1 / (1 - β)) * s_p(μ_bar, μ_bar)
# Backward iteration
for t in reversed(range(T-1)):
v_t[t] = s_p(θ[t], μ[t]) + β * v_t[t+1]
return v_t
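The backward recursion can be checked on a case whose answer is known in advance: for a constant path $\mu_t = \theta_t = \bar\mu$, every continuation value should equal $s(\bar\mu, \bar\mu)/(1-\beta)$. The sketch below is a stand-alone NumPy version; the function names `s` and `continuation_values` and the default parameter values are assumptions for illustration.

```python
import numpy as np

def s(θ, μ, u0=1.0, u1=0.5, u2=3.0, α=1.0, c=2.0):
    """One-period welfare U(-αθ) - (c/2) μ^2 (default parameters are assumed)."""
    m = -α * θ
    return u0 + u1 * m - 0.5 * u2 * m**2 - 0.5 * c * μ**2

def continuation_values(θ, μ, β=0.85):
    """Return v_0, ..., v_T for paths of length T + 1 that are constant from T on."""
    T = len(μ) - 1
    v = np.empty(T + 1)
    v[T] = s(μ[T], μ[T]) / (1 - β)        # terminal value
    for t in range(T - 1, -1, -1):         # backward recursion
        v[t] = s(θ[t], μ[t]) + β * v[t + 1]
    return v

# Sanity check on a constant path
μ_path = np.full(10, -0.1)
v = continuation_values(μ_path, μ_path)
print(v)
```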
The initial continuation value 𝑣0 should equal the optimized value of the Ramsey planner’s criterion 𝑉 defined in equation
(42.6).
Indeed, we find that the deviation is very small:
deviation = 4.76837158203125e-07
We can also verify approximate equality by inspecting a graph of $v_t$ against $t$ for $t = 0, \ldots, T$ along with the value attained by a restricted Ramsey planner $V^{CR}$ and the optimized value of the ordinary Ramsey planner $V^R$.
# Add labels
plt.text(max(Ts) + max(Ts)*0.07, V_CR, '$V^{CR}$', color='C1',
va='center', clip_on=False, fontsize=15)
plt.text(max(Ts) + max(Ts)*0.07, V_R, '$V^R$', color='C2',
va='center', clip_on=False, fontsize=15)
plt.xlabel(r'$t$')
plt.ylabel(r'$v_t$')
plt.tight_layout()
plt.show()
Note: The continuation value 𝑣𝑇 is what some researchers call the “value of a Ramsey plan under a time-less perspective.”
A more descriptive phrase is “the value of the worst continuation Ramsey plan.”
Note: When we eventually get around to trying to understand the regressions below, it will be worthwhile to study the reasoning that led Chang [Chang, 1998] to choose $\theta_t$ as his key state variable.
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
𝜇𝑡 = .0645 + 1.5995𝜃𝑡
Note: Of course, this means that a regression of 𝜃𝑡 on 𝜇𝑡 and a constant would also fit perfectly.
Let’s plot the regression line 𝜇𝑡 = .0645 + 1.5995𝜃𝑡 and the points (𝜃𝑡 , 𝜇𝑡 ) that lie on it for 𝑡 = 0, … , 𝑇 .
The time 0 pair (𝜃0 , 𝜇0 ) appears as the point on the upper right.
Points $(\theta_t, \mu_t)$ for succeeding times appear further and further to the lower left and eventually converge to $(\bar\mu, \bar\mu)$.
Next, we’ll run a linear regression of 𝜃𝑡+1 against 𝜃𝑡 and a constant.
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
We find that the regression line fits perfectly and thus discover the affine relationship
plt.xlabel(r'$\theta_t$')
plt.ylabel(r'$\theta_{t+1}$')
plt.legend()
plt.tight_layout()
plt.show()
Points for succeeding times appear further and further to the lower left and eventually converge to $(\bar\mu, \bar\mu)$.
Next we ask Python to regress continuation value 𝑣𝑡 against a constant, 𝜃𝑡 , and 𝜃𝑡2 .
𝑣𝑡 = 𝑔0 + 𝑔1 𝜃𝑡 + 𝑔2 𝜃𝑡2 .
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.5e+04. This might indicate that there are strong multicollinearity or other numerical problems.
np.corrcoef(θs, θs**2)
array([[ 1. , -0.99942156],
[-0.99942156, 1. ]])
plt.scatter(θs, v_t)
plt.plot(θ_grid, results3.predict(X3_grid), color='grey',
label=r'$\hat v_t$', linestyle='--')
plt.axhline(V_CR, color='C1', alpha=0.5)
plt.xlabel(r'$\theta_{t}$')
plt.ylabel(r'$v_t$')
plt.legend()
plt.tight_layout()
plt.show()
The highest continuation value $v_0$ at $t = 0$ appears at the peak of the quadratic function $g_0 + g_1 \theta_t + g_2 \theta_t^2$.
Subsequent values of 𝑣𝑡 for 𝑡 ≥ 1 appear to the lower left of the pair (𝜃0 , 𝑣0 ) and converge monotonically from above to
𝑣𝑇 at time 𝑇 .
The value 𝑉 𝐶𝑅 attained by the Ramsey plan that is restricted to be a constant 𝜇𝑡 = 𝜇𝐶𝑅 sequence appears as a horizontal
line.
Evidently, continuation values 𝑣𝑡 > 𝑉 𝐶𝑅 for 𝑡 = 0, 1, 2 while 𝑣𝑡 < 𝑉 𝐶𝑅 for 𝑡 ≥ 3.
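Our regressions tell us that along the Ramsey outcome $\vec\mu^R, \vec\theta^R$, the linear function
$$
\mu_t = .0645 + 1.5995 \theta_t
$$
fits perfectly, and so does the regression of $\theta_{t+1}$ on $\theta_t$ and a constant. Together these give the system
$$
\begin{aligned}
\theta_0 &= \theta_0^R \\
\mu_t &= b_0 + b_1 \theta_t \\
\theta_{t+1} &= d_0 + d_1 \theta_t
\end{aligned} \tag{42.12}
$$
where the initial value $\theta_0^R$ was computed along with other components of $\vec\mu^R, \vec\theta^R$ when we computed the Ramsey plan, and where $b_0, b_1, d_0, d_1$ are parameters whose values we estimated with our regressions.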
In addition, we learned that continuation values are described by the quadratic function
𝑣𝑡 = 𝑔0 + 𝑔1 𝜃𝑡 + 𝑔2 𝜃𝑡2
We discovered these relationships by running some carefully chosen regressions and staring at the results, noticing that
the 𝑅2 ’s of unity tell us that the fits are perfect.
We have learned much about the structure of the Ramsey problem.
However, by using the methods and ideas that we have deployed in this lecture, it is challenging to say more.
There are many other linear regressions among components of $\vec\mu^R, \vec\theta^R$ that would also have given us perfect fits.
For example, we could have regressed 𝜃𝑡 on 𝜇𝑡 and obtained the same 𝑅2 value.
Actually, wouldn’t that direction of fit have made more sense?
After all, the Ramsey planner chooses $\vec\mu$, while $\vec\theta$ is an outcome that reflects the representative agent’s response to the Ramsey planner’s choice of $\vec\mu$.
Isn’t it more natural then to expect that we’d learn more about the structure of the Ramsey problem from a regression of
components of 𝜃 ⃗ on components of 𝜇?⃗
To answer these questions, we’ll have to deploy more economic theory.
We do that in this quantecon lecture Time Inconsistency of Ramsey Plans.
There, we’ll discover that system (42.12) is actually a very good way to represent a Ramsey plan because it reveals many
things about its structure.
Indeed, in that lecture, we show how to compute the Ramsey plan using dynamic programming squared and provide
a Python class ChangLQ that performs the calculations.
We have deployed ChangLQ earlier in this lecture to compute a baseline Ramsey plan to which we have compared
outcomes from our application of the cruder machine learning approaches studied here.
Let’s use the code to compute the parameters $b_0, b_1$ for the decision rule for $\mu_t$ and the parameters $d_0, d_1$ in the updating rule for $\theta_{t+1}$ in representation (42.12).
First, we’ll again use ChangLQ to compute these objects (along with a number of others).
Now let’s print out the decision rule for 𝜇𝑡 uncovered by applying dynamic programming squared.
Now let’s print out the decision rule for 𝜃𝑡+1 uncovered by applying dynamic programming squared.
Evidently, these agree with the relationships that we discovered by running regressions on the Ramsey outcomes $\vec\mu^R, \vec\theta^R$ that we constructed with either of our machine learning algorithms.
We have set the stage for this quantecon lecture Time Inconsistency of Ramsey Plans.
We close this lecture by giving a hint about an insight of Chang [Chang, 1998] that underlies much of quantecon lecture
Time Inconsistency of Ramsey Plans.
Chang noticed how equation (42.3) shows that an equivalence class of continuation money growth sequences $\{\mu_{t+j}\}_{j=0}^\infty$ delivers the same $\theta_t$.
Consequently, equations (42.1) and (42.3) indicate that 𝜃𝑡 intermediates how the government’s choices of 𝜇𝑡+𝑗 , 𝑗 =
0, 1, … impinge on time 𝑡 real balances 𝑚𝑡 − 𝑝𝑡 = −𝛼𝜃𝑡 .
In lecture Time Inconsistency of Ramsey Plans, we’ll see how Chang [Chang, 1998] put this insight to work.
FORTYTHREE
In addition to what’s in Anaconda, this lecture will need the following libraries:
43.1 Overview
This lecture describes a linear-quadratic version of a model that Guillermo Calvo [Calvo, 1978] used to analyze the time
inconsistency of optimal government plans.
We use the model as a laboratory in which we explore consequences of different timing protocols for government decision
making.
The model focuses on intertemporal tradeoffs between
• benefits that anticipations of future deflation generate by decreasing costs of holding real money balances and
thereby increasing a representative agent’s liquidity, as measured by his or her holdings of real money balances, and
• costs associated with the distorting taxes that the government must levy in order to acquire the paper money that it
will destroy in order to generate anticipated deflation
Model features include
• rational expectations
• alternative possible timing protocols for government choices of a sequence of money growth rates
• costly government actions at all dates 𝑡 ≥ 1 that increase household utilities at dates before 𝑡
• alternative possible sets of Bellman equations, one set for each timing protocol
– for example, in a timing protocol used to pose a Ramsey plan, a government chooses an infinite sequence of
money supply growth rates once and for all at time 0.
– in this timing protocol, there are two value functions and associated Bellman equations, one that expresses a
representative private expectation of future inflation as a function of current and future government actions,
another that describes the value function of a Ramsey planner
– in other timing protocols, other Bellman equations and associated value functions will appear
A theme of this lecture is that timing protocols for government decisions affect outcomes.
We’ll use ideas from papers by Cagan [Cagan, 1956], Calvo [Calvo, 1978], and Chang [Chang, 1998] as well as from
chapter 19 of [Ljungqvist and Sargent, 2018].
In addition, we’ll use ideas from linear-quadratic dynamic programming described in Linear Quadratic Control as applied
to Ramsey problems in Stackelberg plans.
We specify model fundamentals in ways that allow us to use linear-quadratic discounted dynamic programming to compute an optimal government plan under each of our timing protocols.
A sister lecture Machine Learning a Ramsey Plan studies some of the same models but does not use dynamic programming.
Instead it uses a machine learning approach that does not explicitly recognize the recursive structure of the Ramsey problem that Chang [Chang, 1998] saw and that we exploit in this lecture.
In addition to what’s in Anaconda, this lecture will use the following libraries:
import numpy as np
from quantecon import LQ
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import pandas as pd
from IPython.display import display, Math
There is no uncertainty.
Let:
• 𝑝𝑡 be the log of the price level
• 𝑚𝑡 be the log of nominal money balances
• 𝜃𝑡 = 𝑝𝑡+1 − 𝑝𝑡 be the net rate of inflation between 𝑡 and 𝑡 + 1
• 𝜇𝑡 = 𝑚𝑡+1 − 𝑚𝑡 be the net rate of growth of nominal balances
The demand for real balances is governed by a discrete time version of Sargent and Wallace’s [Sargent and Wallace, 1973] perfect foresight version of a Cagan [Cagan, 1956] demand function for real balances:
$$
m_t - p_t = -\alpha (p_{t+1} - p_t), \quad \alpha > 0 \tag{43.1}
$$
for $t \geq 0$.
Equation (43.1) asserts that the demand for real balances is inversely related to the public’s expected rate of inflation,
which equals the actual rate of inflation because there is no uncertainty here.
Note: When there is no uncertainty, an assumption of rational expectations becomes equivalent to perfect foresight.
[Sargent, 1977] presents a rational expectations version of the model when there is uncertainty.
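Subtracting the demand function (43.1) at time $t$ from the time $t + 1$ version of this demand function gives
$$
\mu_t - \theta_t = -\alpha \theta_{t+1} + \alpha \theta_t
$$
or
$$
\theta_t = \frac{\alpha}{1+\alpha} \theta_{t+1} + \frac{1}{1+\alpha} \mu_t \tag{43.2}
$$
Because $\alpha > 0$, $0 < \frac{\alpha}{1+\alpha} < 1$.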
$$
\sum_{t=0}^\infty b_t^2 < +\infty
$$
Insight: Chang [Chang, 1998] noted that equations (43.1) and (43.3) show that $\theta_t$ intermediates how choices of $\mu_{t+j}, \ j = 0, 1, \ldots$ impinge on time $t$ real balances $m_t - p_t = -\alpha \theta_t$.
An equivalence class of continuation money growth sequences $\{\mu_{t+j}\}_{j=0}^\infty$ delivers the same $\theta_t$.
We shall use this insight to simplify our analysis of alternative government policy problems.
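A small numerical illustration of the equivalence-class idea may help. The sketch assumes that (43.3) is the bounded forward solution of (43.2), namely $\theta_t = (1-\lambda) \sum_{j \geq 0} \lambda^j \mu_{t+j}$ with $\lambda = \frac{\alpha}{1+\alpha}$, and it uses $\alpha = 1$ and three made-up continuation paths that are zero after three periods. The helper name `θ_from_μ` is hypothetical.

```python
import numpy as np

α = 1.0                 # assumed value of the money-demand parameter
λ = α / (1 + α)

def θ_from_μ(μ_tail):
    """θ_t = (1 - λ) Σ_j λ^j μ_{t+j} for a path that is zero after len(μ_tail)."""
    j = np.arange(len(μ_tail))
    return (1 - λ) * np.sum(λ**j * μ_tail)

# Three different continuation money growth paths
μ_a = np.array([0.10, 0.00, 0.00])
μ_b = np.array([0.00, 0.20, 0.00])   # money growth delayed one period, scaled by 1/λ
μ_c = np.array([0.05, 0.05, 0.10])

θ_a, θ_b, θ_c = θ_from_μ(μ_a), θ_from_μ(μ_b), θ_from_μ(μ_c)
print(θ_a, θ_b, θ_c)
```

All three paths imply the same current inflation rate, so they belong to the same equivalence class in the sense used above.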
That future rates of money creation influence earlier rates of inflation makes timing protocols matter for modeling optimal
government policies.
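We can represent restriction (43.3) as
$$
\begin{bmatrix} 1 \\ \theta_{t+1} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ 0 & \frac{1+\alpha}{\alpha} \end{bmatrix}
\begin{bmatrix} 1 \\ \theta_t \end{bmatrix}
+
\begin{bmatrix} 0 \\ -\frac{1}{\alpha} \end{bmatrix} \mu_t \tag{43.4}
$$
or
$$
x_{t+1} = A x_t + B \mu_t \tag{43.5}
$$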
Even though 𝜃0 is to be determined by our model and so is not an initial condition, as it ordinarily would be in the state-
space model described in our lecture on Linear Quadratic Control, we nevertheless write the model in the state-space
form (43.5).
We use form (43.5) because we want to apply an approach described in our lecture on Stackelberg plans.
Notice that $\frac{1+\alpha}{\alpha} > 1$ is an eigenvalue of transition matrix $A$ that threatens to destabilize the state-space system.
But the government planner will design a decision rule for 𝜇𝑡 that stabilizes the system and renders 𝜃 ⃗ square summable.
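The state-space matrices and the unstable eigenvalue can be inspected numerically. This is a minimal sketch assuming $\alpha = 1$; the test values of $\theta_t$ and $\mu_t$ are arbitrary.

```python
import numpy as np

α = 1.0   # assumed default value of the money-demand parameter
A = np.array([[1.0, 0.0],
              [0.0, (1 + α) / α]])
B = np.array([[0.0],
              [-1.0 / α]])

eigvals = np.sort(np.linalg.eigvals(A).real)

# One step of x_{t+1} = A x_t + B μ_t with x_t = [1, θ_t]'
θ_t, μ_t = -0.08, -0.06
x_next = A @ np.array([1.0, θ_t]) + B.flatten() * μ_t
θ_next = x_next[1]

# The same step must satisfy θ_t = λ θ_{t+1} + (1 - λ) μ_t
λ = α / (1 + α)
θ_check = λ * θ_next + (1 - λ) * μ_t
print(eigvals, θ_next, θ_check)
```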
The government values a representative household’s utility of real balances at time 𝑡 according to the utility function
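$$
U(m_t - p_t) = u_0 + u_1 (m_t - p_t) - \frac{u_2}{2} (m_t - p_t)^2, \quad u_0 > 0, \ u_1 > 0, \ u_2 > 0 \tag{43.6}
$$
The money demand function (43.1) and the utility function (43.6) imply that
$$
U(-\alpha \theta_t) = u_0 + u_1 (-\alpha \theta_t) - \frac{u_2}{2} (-\alpha \theta_t)^2 . \tag{43.7}
$$
$$
\theta_t = \theta^* = -\frac{u_1}{u_2 \alpha} \tag{43.8}
$$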
Milton Friedman recommended that the government withdraw and destroy money at a rate that implies an inflation rate
given by (43.8).
In our setting, that could be accomplished by setting
𝜇𝑡 = 𝜇∗ = 𝜃∗ , 𝑡 ≥ 0 (43.9)
The starting point of Calvo [Calvo, 1978] and Chang [Chang, 1998] is that lump sum taxes are not available.
Instead, the government acquires money by levying taxes that distort decisions and thereby impose costs on the representative consumer.
In the models of Calvo [Calvo, 1978] and Chang [Chang, 1998], the government takes those tax-distortion costs into
account.
The government balances the costs of imposing the distorting taxes needed to acquire the money that it destroys in order
to generate deflation against the benefits that expected deflation generates by raising the representative household’s real
money balances.
Let’s see how the government does that.
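Via equation (43.3), a government plan $\vec\mu = \{\mu_t\}_{t=0}^\infty$ leads to a sequence of inflation outcomes $\vec\theta = \{\theta_t\}_{t=0}^\infty$.
The government incurs social costs $\frac{c}{2} \mu_t^2$ at $t$ when it changes the stock of nominal money balances at rate $\mu_t$.
Therefore, the one-period welfare function of a benevolent government is:
$$
s(\theta_t, \mu_t) := -r(x_t, \mu_t) =
\begin{bmatrix} 1 \\ \theta_t \end{bmatrix}'
\begin{bmatrix} u_0 & -\frac{u_1 \alpha}{2} \\ -\frac{u_1 \alpha}{2} & -\frac{u_2 \alpha^2}{2} \end{bmatrix}
\begin{bmatrix} 1 \\ \theta_t \end{bmatrix}
- \frac{c}{2} \mu_t^2 = -x_t' R x_t - Q \mu_t^2 \tag{43.10}
$$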
Note: We define $r(x_t, \mu_t) := -s(\theta_t, \mu_t)$ in order to represent the government’s maximization problem in terms of our Python code for solving linear quadratic discounted dynamic programs. In the first LQ control lecture and some other quantecon lectures, we formulated these as loss minimization problems.
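To see that the matrix form in (43.10) reproduces $U(-\alpha\theta_t) - \frac{c}{2}\mu_t^2$, the following sketch evaluates both expressions at a few points. Parameter values ($\alpha = 1$, $c = 2$, $u_0 = 1$, $u_1 = 0.5$, $u_2 = 3$) and the helper names `s_direct` and `s_quadratic` are assumptions for illustration.

```python
import numpy as np

# Assumed parameter values
α, c = 1.0, 2.0
u0, u1, u2 = 1.0, 0.5, 3.0

# R and Q chosen so that s(θ, μ) = -x'Rx - Q μ^2 with x = [1, θ]'
R = -np.array([[u0,           -u1 * α / 2],
               [-u1 * α / 2,  -u2 * α**2 / 2]])
Q = c / 2

def s_direct(θ, μ):
    """U(-αθ) - (c/2) μ^2 evaluated from the utility function."""
    m = -α * θ
    return u0 + u1 * m - 0.5 * u2 * m**2 - 0.5 * c * μ**2

def s_quadratic(θ, μ):
    """The same object written as the quadratic form in (43.10)."""
    x = np.array([1.0, θ])
    return -x @ R @ x - Q * μ**2

points = [(-0.1, -0.1), (0.0, 0.0), (0.2, -0.3), (-0.08, -0.06)]
gaps = [abs(s_direct(θ, μ) - s_quadratic(θ, μ)) for θ, μ in points]
print(gaps)
```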
We can represent dependence of $v_0$ on $(\vec\theta, \vec\mu)$ recursively via the difference equation
$$
v_t = s(\theta_t, \mu_t) + \beta v_{t+1} \tag{43.13}
$$
It is useful to evaluate (43.13) under a time-invariant money growth rate $\mu_t = \bar\mu$ that according to equation (43.3) would bring forth a constant inflation rate equal to $\bar\mu$.
Under that policy,
$$
v_t = V(\bar\mu) = \frac{s(\bar\mu, \bar\mu)}{1-\beta} \tag{43.14}
$$
for all $t \geq 0$.
Values of 𝑉 (𝜇)̄ computed according to formula (43.14) for three different values of 𝜇̄ will play important roles below.
• $V(\mu^{MPE})$ is the value attained by the government in a Markov perfect equilibrium
• $V(\mu_\infty^R)$ is the value that a continuation Ramsey planner attains as $t \to +\infty$
43.5 Structure
The following structure is induced by a representative agent’s behavior as summarized by the demand function for money
(43.1) that leads to equation (43.3), which tells how future settings of 𝜇 affect the current value of 𝜃.
Equation (43.3) maps a policy sequence of money growth rates $\vec\mu = \{\mu_t\}_{t=0}^\infty \in L^2$ into an inflation sequence $\vec\theta = \{\theta_t\}_{t=0}^\infty \in L^2$.
These in turn induce a discounted value to a government sequence $\vec v = \{v_t\}_{t=0}^\infty \in L^2$ that satisfies recursion (43.13).
Criterion function (43.11) and the constraint system (43.5) exhibit the following structure:
• Setting the money growth rate $\mu_t \neq 0$ imposes costs $\frac{c}{2} \mu_t^2$ at time $t$ and at no other times; but
• The money growth rate 𝜇𝑡 affects the government’s one-period utilities at all dates 𝑠 = 0, 1, … , 𝑡.
This structure sets the stage for the emergence of a time-inconsistent optimal government plan under a Ramsey timing
protocol
• it is also called a Stackelberg timing protocol.
We’ll study outcomes under a Ramsey timing protocol.
We’ll also study outcomes under other timing protocols.
Note: In the quantecon lecture Sustainable Plans for a Calvo Model, we’ll study outcomes under another timing protocol
in which there is a sequence of separate policymakers. A time 𝑡 policymaker chooses only 𝜇𝑡 but believes that its choice
of 𝜇𝑡 shapes the representative agent’s beliefs about future rates of money creation and inflation, and through them, future
government actions. This is a model of a credible government policy, also called a sustainable plan. The relationship
between outcomes in the first (Ramsey) timing protocol and the Sustainable Plans for a Calvo Model timing protocol
and belief structure is the subject of a literature on sustainable or credible public policies (Chari and Kehoe [Chari and Kehoe, 1990] and Stokey [Stokey, 1989], [Stokey, 1991]).
We’ll begin with the timing protocol associated with a Ramsey plan and deploy an application of what we nickname
dynamic programming squared.
The nickname refers to the feature that a value satisfying one Bellman equation appears as an argument in a value function
associated with a second Bellman equation.
Thus, two Bellman equations appear:
We split this problem into two stages, as in the lecture Stackelberg plans and [Ljungqvist and Sargent, 2018] Chapter 19.
In the first stage, we take the initial inflation rate 𝜃0 as given and pose an ordinary discounted dynamic programming
problem that in our setting becomes an LQ discounted dynamic programming problem.
In the second stage, we choose an optimal initial inflation rate 𝜃0 .
Define a feasible set of $\{x_{t+1}, \mu_t\}_{t=0}^\infty$ sequences, with each sequence belonging to $L^2$:
$$
\Omega(x_0) = \left\{ \{x_{t+1}, \mu_t\}_{t=0}^\infty : x_{t+1} = A x_t + B \mu_t, \ \forall t \geq 0 \right\}
$$
43.8.1 Subproblem 1
subject to:
𝑥′ = 𝐴𝑥 + 𝐵𝜇
As in the lecture Stackelberg plans, we can map this problem into a linear-quadratic control problem and deduce an
optimal value function 𝐽 (𝑥).
Guessing that 𝐽 (𝑥) = −𝑥′ 𝑃 𝑥 and substituting into the Bellman equation gives rise to the algebraic matrix Riccati
equation:
𝜇𝑡 = −𝐹 𝑥𝑡
where
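$$
v_t = - \begin{bmatrix} 1 & \theta_t \end{bmatrix}
\begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}
\begin{bmatrix} 1 \\ \theta_t \end{bmatrix}
$$
or
$$
v_t = -P_{11} - 2 P_{21} \theta_t - P_{22} \theta_t^2
$$
or
$$
v_t = g_0 + g_1 \theta_t + g_2 \theta_t^2 \tag{43.17}
$$
where
$$
g_0 = -P_{11}, \quad g_1 = -2 P_{21}, \quad g_2 = -P_{22}
$$
$$
\mu_t = - \begin{bmatrix} F_1 & F_2 \end{bmatrix} \begin{bmatrix} 1 \\ \theta_t \end{bmatrix}
$$
or
$$
\mu_t = b_0 + b_1 \theta_t \tag{43.18}
$$
where $b_0 = -F_1$ and $b_1 = -F_2$.
$$
\theta_{t+1} = d_0 + d_1 \theta_t \tag{43.19}
$$
where $\begin{bmatrix} d_0 & d_1 \end{bmatrix}$ is the second row of the closed-loop matrix $A - BF$ computed in subproblem 1 above.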
The linear quadratic control problem (43.15) satisfies regularity conditions that guarantee that 𝐴 − 𝐵𝐹 is a stable matrix
(i.e., its maximum eigenvalue is strictly less than 1 in absolute value).
Consequently, we are assured that
$$
|d_1| < 1 \tag{43.20}
$$
43.8.2 Subproblem 2
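$$
V^R = \max_{x_0} J(x_0)
$$
We abuse notation slightly by writing $J(x)$ as $J(\theta)$ and rewrite the above equation as
$$
V^R = \max_{\theta_0} J(\theta_0)
$$
Note: Since $x = \begin{bmatrix} 1 \\ \theta \end{bmatrix}$, it follows that $\theta$ is the only component of $x$ that can possibly vary.
$$
J(\theta_0) = - \begin{bmatrix} 1 & \theta_0 \end{bmatrix}
\begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}
\begin{bmatrix} 1 \\ \theta_0 \end{bmatrix}
= -P_{11} - 2 P_{21} \theta_0 - P_{22} \theta_0^2
$$
The first-order necessary condition for maximizing $J(\theta_0)$ with respect to $\theta_0$ is
$$
-2 P_{21} - 2 P_{22} \theta_0 = 0
$$
which implies
$$
\theta_0 = \theta_0^R = -\frac{P_{21}}{P_{22}}
$$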
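The two subproblems can be sketched end to end in NumPy. The code below is an illustration under stated assumptions, not the lecture's `ChangLQ` class: it assumes the calibration $\alpha = 1$, $\beta = 0.85$, $c = 2$, $u_0 = 1$, $u_1 = 0.5$, $u_2 = 3$, and it uses the standard discounted LQ Riccati recursion $P = R + \beta A'PA - \beta^2 A'PB(Q + \beta B'PB)^{-1}B'PA$ with $F = \beta (Q + \beta B'PB)^{-1} B'PA$, iterated from $P = 0$.

```python
import numpy as np

# Assumed calibration
α, β, c = 1.0, 0.85, 2.0
u0, u1, u2 = 1.0, 0.5, 3.0

A = np.array([[1.0, 0.0], [0.0, (1 + α) / α]])
B = np.array([[0.0], [-1.0 / α]])
R = -np.array([[u0, -u1 * α / 2], [-u1 * α / 2, -u2 * α**2 / 2]])
Q = np.array([[c / 2]])

# Subproblem 1: iterate on the discounted Riccati equation starting from P = 0
P = np.zeros((2, 2))
for _ in range(10_000):
    S = Q + β * B.T @ P @ B
    P_new = (R + β * A.T @ P @ A
             - β**2 * A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A))
    converged = np.max(np.abs(P_new - P)) < 1e-13
    P = P_new
    if converged:
        break

F = β * np.linalg.solve(Q + β * B.T @ P @ B, B.T @ P @ A)   # μ_t = -F x_t
b0, b1 = -F[0, 0], -F[0, 1]
d0, d1 = (A - B @ F)[1]            # second row of the closed-loop matrix

# Subproblem 2: choose θ_0 to maximize J(θ_0) = -x_0' P x_0
θ0_R = -P[1, 0] / P[1, 1]
x0 = np.array([1.0, θ0_R])
V_R = -x0 @ P @ x0
print(b0, b1, d0, d1, θ0_R, V_R)
```

Under these assumptions the decision-rule coefficients agree with the regression line $\mu_t = .0645 + 1.5995\theta_t$ reported in the companion lecture, and the value agrees with the `6.8358` figures printed there.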
The preceding calculations indicate that we can represent a Ramsey plan $\vec\mu$ recursively with the following system created in the spirit of Chang [Chang, 1998]:
$$
\begin{aligned}
\theta_0 &= \theta_0^R \\
\mu_t &= b_0 + b_1 \theta_t \\
v_t &= g_0 + g_1 \theta_t + g_2 \theta_t^2 \\
\theta_{t+1} &= d_0 + d_1 \theta_t, \quad d_1 \in (0, 1)
\end{aligned} \tag{43.21}
$$
where $b_0, b_1, g_0, g_1, g_2$ are parameters that we shall compute with Python code below.
From condition (43.20), we know that |𝑑1 | < 1.
To interpret system (43.21), think of the sequence $\{\theta_t\}_{t=0}^\infty$ as a sequence of synthetic promised inflation rates.
For some purposes, we can think of these promised inflation rates just as computational devices for generating a sequence
𝜇⃗ of money growth rates that when substituted into equation (43.3) generate actual rates of inflation.
It can be verified that if we substitute a plan $\vec\mu = \{\mu_t\}_{t=0}^\infty$ that satisfies these equations into equation (43.3), we obtain the same sequence $\vec\theta$ generated by the system (43.21).
(Here an application of the Big 𝐾, little 𝑘 trick is again at work.)
Thus, within the Ramsey plan, promised inflation equals actual inflation.
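System (43.21) implies that under the Ramsey plan
$$
\theta_t = d_0 \left( \frac{1 - d_1^t}{1 - d_1} \right) + d_1^t \theta_0^R , \tag{43.22}
$$
$$
\lim_{t \to +\infty} \theta_t^R = \theta_\infty^R = \frac{d_0}{1 - d_1} . \tag{43.23}
$$
$$
\mu_t = b_0 + b_1 d_0 \left( \frac{1 - d_1^t}{1 - d_1} \right) + b_1 d_1^t \theta_0^R . \tag{43.24}
$$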
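The closed forms (43.22) and (43.23) can be checked against direct iteration of $\theta_{t+1} = d_0 + d_1\theta_t$. The numbers below are illustrative assumptions: $b_0$ and $b_1$ are taken from the regression line $\mu_t = .0645 + 1.5995\theta_t$ and $\mu_0$ from the first entry of the Ramsey plan printed in the companion lecture, with $\alpha = 1$; $d_0$, $d_1$ and $\theta_0$ are then backed out from $\theta_{t+1} = \frac{(1+\alpha)\theta_t - \mu_t}{\alpha}$.

```python
import numpy as np

# Illustrative inputs (assumed): regression coefficients and μ_0 from the companion lecture
α = 1.0
b0, b1 = 0.0645, 1.5995
μ0 = -0.06450708

# Substitute μ_t = b0 + b1 θ_t into θ_{t+1} = ((1 + α) θ_t - μ_t) / α
d0 = -b0 / α
d1 = (1 + α - b1) / α
θ0 = (μ0 - b0) / b1
T = 30

# Direct iteration
θ_iter = np.empty(T + 1)
θ_iter[0] = θ0
for t in range(T):
    θ_iter[t + 1] = d0 + d1 * θ_iter[t]

# Closed form (43.22) and limit (43.23)
t = np.arange(T + 1)
θ_closed = d0 * (1 - d1**t) / (1 - d1) + d1**t * θ0
θ_limit = d0 / (1 - d1)

# Implied money growth path (43.24)
μ_path = b0 + b1 * θ_closed
print(θ_closed[:3], θ_limit, μ_path[0])
```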
Variations of $\vec\mu^R, \vec\theta^R, \vec v^R$ over time are symptoms of time inconsistency.
• The Ramsey planner reaps immediate benefits from promising lower inflation later to be achieved by costly distorting taxes.
• These benefits are intermediated by reductions in expected inflation that precede the reductions in money creation
rates that rationalize them, as indicated by equation (43.3).
As discussed in Stackelberg plans and Optimal taxation with state-contingent debt, a continuation Ramsey plan is not a
Ramsey plan.
This is a concise way of characterizing the time inconsistency of a Ramsey plan.
In the present context, a symptom of time inconsistency is that the Ramsey planner chooses to make $\mu_t$ a non-constant function of time $t$ despite the fact that, other than time itself, there is no other state variable.
Thus, in our context, time-variation of 𝜇⃗ chosen by a Ramsey planner is the telltale sign of the Ramsey plan’s time
inconsistency.
We can use brute force to create a government plan that is time consistent, i.e., that is a time-invariant function of time.
We simply constrain a planner to choose a time-invariant money growth rate 𝜇̄ so that
𝜇𝑡 = 𝜇,̄ ∀𝑡 ≥ 0.
We assume that the government knows the perfect foresight outcome implied by equation (43.2) that 𝜃𝑡 = 𝜇̄ when 𝜇𝑡 = 𝜇̄
for all 𝑡 ≥ 0.
It follows that the value of such a plan is given by $V(\bar\mu)$ defined in equation (43.14).
To generate an alternative model of time-consistent government decision making, we assume another timing protocol.
In this one, there is a sequence of government policymakers.
A time $t$ government chooses $\mu_t$ and expects all future governments to set $\mu_{t+j} = \bar \mu$.
This assumption mirrors an assumption made in this QuantEcon lecture: Markov Perfect Equilibrium.
When it sets 𝜇𝑡 , the government at 𝑡 believes that 𝜇̄ is unaffected by its choice of 𝜇𝑡 .
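According to equation (43.3), the time $t$ rate of inflation is then
$$
\theta_t = \frac{1}{1+\alpha} \mu_t + \frac{\alpha}{1+\alpha} \bar \mu, \tag{43.28}
$$
which expresses inflation $\theta_t$ as a weighted average of money growth today $\mu_t$ and money growth from tomorrow onward $\bar \mu$.
Given $\bar \mu$, the time $t$ government chooses $\mu_t$ to maximize:
$$
H(\mu_t, \bar \mu) = U(-\alpha \theta_t) - \frac{c}{2} \mu_t^2 + \beta V(\bar \mu) \tag{43.29}
$$
where $V(\bar \mu)$ is given by formula (43.14) for the time 0 value $v_0$ of recursion (43.13) under a money supply growth rate that is forever constant at $\bar \mu$.
Substituting (43.28) into (43.29) and expanding gives:
$$
\begin{aligned}
H(\mu_t, \bar \mu) = \; & u_0 + u_1 \left( -\frac{\alpha^2}{1+\alpha} \bar \mu - \frac{\alpha}{1+\alpha} \mu_t \right) - \frac{u_2}{2} \left( -\frac{\alpha^2}{1+\alpha} \bar \mu - \frac{\alpha}{1+\alpha} \mu_t \right)^2 \\
& - \frac{c}{2} \mu_t^2 + \beta V(\bar \mu)
\end{aligned} \tag{43.30}
$$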
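The first-order necessary condition for maximizing $H(\mu_t, \bar \mu)$ with respect to $\mu_t$ is:
$$
-\frac{\alpha}{1+\alpha} u_1 - u_2 \left( -\frac{\alpha^2}{1+\alpha} \bar \mu - \frac{\alpha}{1+\alpha} \mu_t \right) \left( -\frac{\alpha}{1+\alpha} \right) - c \mu_t = 0
$$
Rearranging we get the time $t$ government’s best response map
$$
\mu_t = f(\bar \mu)
$$
where
$$
f(\bar \mu) = \frac{-u_1}{\frac{1+\alpha}{\alpha} c + \frac{\alpha}{1+\alpha} u_2} - \frac{\alpha^2 u_2}{\left[ \frac{1+\alpha}{\alpha} c + \frac{\alpha}{1+\alpha} u_2 \right] (1+\alpha)} \bar \mu
$$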
A Markov Perfect Equilibrium (MPE) outcome $\mu^{MPE}$ is a fixed point of the best response map:
$$
\mu^{MPE} = f(\mu^{MPE})
$$
The value of an MPE is
$$
V^{MPE} = \frac{s(\mu^{MPE}, \mu^{MPE})}{1 - \beta} \tag{43.32}
$$
or
$$
V^{MPE} = V(\mu^{MPE})
$$
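The fixed point of the best response map can also be found numerically. The sketch below iterates $\mu \leftarrow f(\mu)$ and compares the result with the expression obtained by solving $\mu = f(\mu)$ by hand. The parameter values are illustrative: `α`, `u1`, `u2` mirror the defaults of the `ChangLQ` class and `c = 2` is one of the values used in the experiments; plain fixed-point iteration is a choice made here for transparency, not necessarily how the lecture's code proceeds.

```python
# Illustrative parameter values (α, u1, u2 are the ChangLQ defaults)
α, u1, u2, c = 1.0, 0.5, 3.0, 2.0

# Common denominator of the best response map f
D = (1 + α) / α * c + α / (1 + α) * u2

def f(μ_bar):
    "Time t government's best response to an expected constant μ_bar."
    return -u1 / D - α**2 * u2 / (D * (1 + α)) * μ_bar

# Fixed-point iteration μ <- f(μ); it converges here because |slope of f| < 1
μ = 0.0
for _ in range(200):
    μ = f(μ)

# Closed form obtained by solving μ = f(μ) by hand
μ_closed = -α * u1 / (α**2 * u2 + (1 + α) * c)

print(μ, μ_closed)
```

Because $f$ is affine in $\bar \mu$, the fixed point is unique whenever the slope differs from one.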
We want to compare outcome sequences {𝜃𝑡 , 𝜇𝑡 } under three timing protocols associated with
• a standard Ramsey plan with its time-varying {𝜃𝑡 , 𝜇𝑡 } sequences
• a Markov perfect equilibrium, with its time-invariant {𝜃𝑡 , 𝜇𝑡 } sequences
• a nonstandard Ramsey plan in which the planner is restricted to choose a time-invariant 𝜇𝑡 = 𝜇 for all 𝑡 ≥ 0.
We have computed closed form formulas for several of these outcomes, which we find it convenient to repeat here.
In particular, the inflation outcome under the Ramsey plan constrained to a constant money growth rate is $\theta^{CR} = \mu^{CR}$, which according to equation (43.26) is
$$
\theta^{CR} = -\frac{\alpha u_1}{\alpha^2 u_2 + c}
$$
Equation (43.31) implies that the Markov perfect constant inflation rate is
$$
\theta^{MPE} = -\frac{\alpha u_1}{\alpha^2 u_2 + (1+\alpha) c}
$$
According to equation (43.8), the bliss level of inflation that we associated with a Friedman rule is
$$
\theta^* = -\frac{u_1}{u_2 \alpha}
$$
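class ChangLQ:
    """
    Class to solve LQ Chang model
    """
    def __init__(self, β, c, α=1, u0=1, u1=0.5, u2=3, T=1000, θ_n=200):
        # Record parameters
        self.α, self.u0, self.u1, self.u2 = α, u0, u1, u2
        self.β, self.c, self.T, self.θ_n = β, c, T, θ_n

        self.setup_LQ_matrices()
        self.solve_LQ_problem()
        self.compute_policy_functions()
        self.simulate_ramsey_plan()
        self.compute_θ_range()
        self.compute_value_and_policy()

    def setup_LQ_matrices(self):
        # LQ Matrices
        self.R = -np.array([[self.u0, -self.u1 * self.α / 2],
                            [-self.u1 * self.α / 2,
                             -self.u2 * self.α**2 / 2]])
        self.Q = -np.array([[-self.c / 2]])
        self.A = np.array([[1, 0], [0, (1 + self.α) / self.α]])
        self.B = np.array([[0], [-1 / self.α]])

    def solve_LQ_problem(self):
        # Solve LQ Problem (Subproblem 1)
        lq = LQ(self.Q, self.R, self.A, self.B, beta=self.β)
        self.P, self.F, self.d = lq.stationary_values()

        # Closed-loop matrix for the state (1, θ)
        # (assumed: this line is missing from the extracted listing)
        self.cl_mat = self.A - self.B @ self.F

        # Solve Subproblem 2
        self.θ_R = -self.P[0, 1] / self.P[1, 1]

        # Bliss level of θ from (43.8) (assumed, as above)
        self.θ_B = -self.u1 / (self.u2 * self.α)

    def J_θ(self, θ):
        # Ramsey value J(θ) = -x'Px with x = (1, θ) (assumed definition)
        x = np.array([1, θ])
        return -x @ self.P @ x

    def V_θ(self, θ):
        # Value of setting μ = θ forever, s(θ, θ) / (1 - β) (assumed definition)
        return (self.u0 + self.u1 * (-self.α * θ)
                - self.u2 / 2 * (-self.α * θ)**2
                - self.c / 2 * θ**2) / (1 - self.β)

    def compute_policy_functions(self):
        # Solve the Markov Perfect Equilibrium
        self.μ_MPE = -self.u1 / ((1 + self.α) / self.α * self.c
                                 + self.α / (1 + self.α)
                                 * self.u2 + self.α**2
                                 / (1 + self.α) * self.u2)
        self.θ_MPE = self.μ_MPE
        self.μ_CR = -self.α * self.u1 / (self.u2 * self.α**2 + self.c)
        self.θ_CR = self.μ_CR

        # Values under the MPE and the constrained Ramsey plan
        self.J_MPE = self.V_θ(self.μ_MPE)
        self.J_CR = self.V_θ(self.μ_CR)

    def simulate_ramsey_plan(self):
        # Simulate Ramsey plan for large number of periods
        θ_series = np.vstack((np.ones((1, self.T)), np.zeros((1, self.T))))
        μ_series = np.zeros(self.T)
        J_series = np.zeros(self.T)
        θ_series[1, 0] = self.θ_R
        [μ_series[0]] = -self.F.dot(θ_series[:, 0])
        J_series[0] = self.J_θ(θ_series[1, 0])

        # Iterate the closed-loop system forward
        # (assumed: this loop is missing from the extracted listing)
        for i in range(1, self.T):
            θ_series[:, i] = self.cl_mat @ θ_series[:, i - 1]
            [μ_series[i]] = -self.F @ θ_series[:, i]
            J_series[i] = self.J_θ(θ_series[1, i])

        self.J_series = J_series
        self.μ_series = μ_series
        self.θ_series = θ_series

    def compute_θ_range(self):
        # Find the range of θ in Ramsey plan
        θ_LB = min(min(self.θ_series[1, :]), self.θ_B)
        θ_UB = max(max(self.θ_series[1, :]), self.θ_MPE)
        θ_range = θ_UB - θ_LB
        self.θ_LB = θ_LB - 0.05 * θ_range
        self.θ_UB = θ_UB + 0.05 * θ_range
        self.θ_range = θ_range

    def compute_value_and_policy(self):
        # Create the θ_space
        self.θ_space = np.linspace(self.θ_LB, self.θ_UB, 200)

        # Ramsey policy μ = -F (1, θ)' on the grid
        # (assumed: the first assignment is missing from the extracted listing)
        self.μ_space = -self.F @ np.vstack((np.ones_like(self.θ_space),
                                            self.θ_space))
        self.μ_space = self.μ_space[0, :]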
The following code plots policy functions for a continuation Ramsey planner.
The green line shows a continuation Ramsey planner’s choice of $\mu_t = \mu$ as a function of an inherited $\theta_t = \theta$.
Dynamics under the Ramsey plan are confined to $\theta \in [\theta_\infty^R, \theta_0^R]$.
The blue and green lines intersect each other and the 45-degree line at $\theta = \theta_\infty^R$.
Notice that for $\theta \in (\theta_\infty^R, \theta_0^R]$
• $\theta' < \theta$ because the blue line is below the 45-degree line
• $\mu > \theta$ because the green line is above the 45-degree line
It follows that under the Ramsey plan $\{\theta_t\}$ and $\{\mu_t\}$ both converge monotonically from above to $\theta_\infty^R$.
The next code plots the Ramsey planner’s value function 𝐽 (𝜃).
We know that $J(\theta)$ is maximized at $\theta_0^R$, the best time 0 promised inflation rate.
The figure also plots $\theta_\infty^R$, the limiting value of the promised inflation rate $\theta_t$ under the Ramsey plan as $t \to +\infty$.
The figure also indicates an MPE inflation rate 𝜃𝑀𝑃 𝐸 , the inflation 𝜃𝐶𝑅 under a Ramsey plan constrained to a constant
money creation rate, and a bliss inflation 𝜃∗ .
In some subsequent calculations, we’ll use our Python code to study how gaps between these outcomes vary depending on
parameters such as the cost parameter 𝑐 and the discount factor 𝛽.
The next code plots the Ramsey Planner’s value function 𝐽 (𝜃) as well as the value function of a constrained Ramsey
planner who must choose a constant 𝜇.
Because a time-invariant 𝜇 implies a time-invariant 𝜃, we take the liberty of labeling this value function 𝑉 (𝜃).
We’ll use the code to plot 𝐽 (𝜃) and 𝑉 (𝜃) for several values of the discount factor 𝛽 and the cost parameter 𝑐 that multiplies
𝜇2𝑡 in the Ramsey planner’s one-period payoff function.
In all of the graphs below, we disarm the Proposition 1 equivalence results by setting 𝑐 > 0.
The graphs reveal interesting relationships among 𝜃’s associated with various timing protocols:
• $J(\theta) \geq V(\theta)$
• $J(\theta_\infty^R) = V(\theta_\infty^R)$
Before doing anything else, let’s write code to verify our claim that $J(\theta_\infty^R) = V(\theta_\infty^R)$.
Here is the code.
True
So we have verified our claim that $J(\theta_\infty^R) = V(\theta_\infty^R)$.
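The equality can also be checked without the `ChangLQ` class. The sketch below is an independent check under stated assumptions: it builds the same LQ matrices as `setup_LQ_matrices`, solves the discounted Riccati equation by plain iteration (a stand-in for `LQ.stationary_values`, chosen so that only NumPy is needed), and takes $J(\theta) = -x'Px$ with $x = (1, \theta)'$ and $V(\theta) = s(\theta, \theta)/(1-\beta)$. The values β = 0.85 and c = 2 are illustrative.

```python
import numpy as np

α, β, c = 1.0, 0.85, 2.0           # illustrative parameter values
u0, u1, u2 = 1.0, 0.5, 3.0         # ChangLQ defaults

# LQ matrices with state x = (1, θ) and control μ (as in ChangLQ)
R = -np.array([[u0, -u1 * α / 2],
               [-u1 * α / 2, -u2 * α**2 / 2]])
Q = np.array([[c / 2]])
A = np.array([[1.0, 0.0], [0.0, (1 + α) / α]])
B = np.array([[0.0], [-1 / α]])

def policy(P):
    "Feedback rule F (so that μ = -F x) implied by a value matrix P."
    return β * np.linalg.solve(Q + β * B.T @ P @ B, B.T @ P @ A)

# Plain Riccati iteration starting from P = 0
P = np.zeros((2, 2))
for _ in range(5000):
    P = R + β * A.T @ P @ A - β * A.T @ P @ B @ policy(P)
F = policy(P)

# Closed-loop law of motion θ' = d0 + d1 θ and its rest point
cl = A - B @ F
d0, d1 = cl[1, 0], cl[1, 1]
θ_inf = d0 / (1 - d1)

def J(θ):
    "Ramsey planner's value, J(θ) = -x'Px."
    x = np.array([1.0, θ])
    return -x @ P @ x

def V(θ):
    "Value of holding μ = θ forever."
    return (u0 - u1 * α * θ - u2 / 2 * (α * θ)**2 - c / 2 * θ**2) / (1 - β)

print(np.isclose(J(θ_inf), V(θ_inf)))
```

At $\theta_\infty^R$ the law of motion $\theta' = \frac{1+\alpha}{\alpha}\theta - \frac{1}{\alpha}\mu$ requires $\mu = \theta$, so the continuation Ramsey plan is itself constant there, which is why the two value functions touch at that point.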
Since $J(\theta_\infty^R) = V(\theta_\infty^R)$ occurs at a tangency point at which $J(\theta)$ is increasing in $\theta$, it follows that
$$
V(\theta_\infty^R) \leq J(\theta^{CR}) \tag{43.33}
$$
Now let’s present some graphs that teach us how outcomes change when we assume different values of 𝛽
# Increase c to 100
fig, axes = plt.subplots(1, 3, figsize=(12, 5))
c_values = [1, 10, 100]
generate_table(clqs, dig=4)
The above table and figures show how changes in $c$ alter $\theta_\infty^R$ and $\theta_0^R$ as well as $\theta^{CR}$ and $\theta^{MPE}$, but not $\theta^*$, again in accord with formulas (43.8), (43.26), and (43.31).
Notice that as $c$ gets larger and larger, $\theta_\infty^R$, $\theta_0^R$ and $\theta^{CR}$ all converge to $\theta^{MPE}$.
Now let’s watch what happens when we drive 𝑐 toward zero.
# Decrease c towards 0
fig, axes = plt.subplots(1, 3, figsize=(12, 5))
c_limits = [1, 0.1, 0.01]
The above graphs indicate that as $c$ approaches zero, $\theta_\infty^R$, $\theta_0^R$, $\theta^{CR}$, and $\theta^{MPE}$ all approach $\theta^*$.
This makes sense, because it was by adding costs of distorting taxes that Calvo [Calvo, 1978] drove a wedge between
Friedman’s optimal deflation rate and the inflation rates chosen by a Ramsey planner.
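For the two outcomes that have closed forms, the limit as $c$ goes to zero can be read directly off the formulas. The sketch below tabulates $\theta^{CR}$ and $\theta^{MPE}$ against $\theta^*$ for a range of $c$ values; it covers only these closed-form objects (not $\theta_\infty^R$ or $\theta_0^R$, which require solving the LQ problem), and the parameter values are the `ChangLQ` defaults, used here purely as an illustration.

```python
α, u1, u2 = 1.0, 0.5, 3.0   # defaults of the ChangLQ class

def θ_CR(c):
    "Constant inflation under the constrained Ramsey plan."
    return -α * u1 / (α**2 * u2 + c)

def θ_MPE(c):
    "Constant inflation in the Markov perfect equilibrium."
    return -α * u1 / (α**2 * u2 + (1 + α) * c)

# Bliss level of inflation; it does not depend on c
θ_star = -u1 / (u2 * α)

for c in [100, 10, 1, 0.1, 0.01, 0.0001]:
    print(f"c = {c:<8} θ_CR = {θ_CR(c): .5f}  "
          f"θ_MPE = {θ_MPE(c): .5f}  θ* = {θ_star: .5f}")
```

Setting $c = 0$ in either formula gives $-u_1/(\alpha u_2)$, which is exactly $\theta^*$; for $c > 0$ both outcomes lie strictly between $\theta^*$ and zero.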
The following code plots sequences 𝜇⃗ and 𝜃 ⃗ prescribed by a Ramsey plan as well as the constant levels 𝜇𝐶𝑅 and 𝜇𝑀𝑃 𝐸 .
The following graphs report values for the value function parameters 𝑔0 , 𝑔1 , 𝑔2 , and the Ramsey policy function parameters
𝑏0 , 𝑏1 , 𝑑0 , 𝑑1 associated with the indicated parameter pair 𝛽, 𝑐.
We’ll vary 𝛽 while keeping a small 𝑐.
After that we’ll study consequences of raising 𝑐.
We’ll watch how the decay rate 𝑑1 governing the dynamics of 𝜃𝑡𝑅 is affected by alterations in the parameters 𝛽, 𝑐.
for β in β_values:
    clq = ChangLQ(β=β, c=2)
    generate_param_table(clq)
    plot_ramsey_MPE(clq)

                       𝑔0       𝑔1       𝑔2       𝑏0       𝑏1       𝑑0      𝑑1
𝛽 = 0.7,  𝑐 = 2       3.39    −0.75    −4.54    −0.06    −1.52    −0.06    0.48
𝛽 = 0.8,  𝑐 = 2       5.1     −0.76    −4.65    −0.06    −1.58    −0.06    0.42
𝛽 = 0.99, 𝑐 = 2     102.47    −0.76    −4.81    −0.07    −1.65    −0.07    0.35
# Increase c to 100
for c in c_values:
    clq = ChangLQ(β=0.85, c=c)
    generate_param_table(clq)
    plot_ramsey_MPE(clq)

                        𝑔0      𝑔1        𝑔2       𝑏0       𝑏1       𝑑0      𝑑1
𝛽 = 0.85, 𝑐 = 1        6.84   −0.68     −3.19    −0.09    −1.69    −0.09    0.31
𝛽 = 0.85, 𝑐 = 10       6.72   −0.92    −16.16    −0.02    −1.47    −0.02    0.53
𝛽 = 0.85, 𝑐 = 100      6.67   −0.99   −143.29    −0.0     −1.42    −0.0     0.58
# Increase c to 100
for c in [10, 100]:
    clq = ChangLQ(α=4, β=0.85, c=c)
    generate_param_table(clq)
    plot_ramsey_MPE(clq)

                        𝑔0      𝑔1        𝑔2       𝑏0       𝑏1       𝑑0      𝑑1
𝛽 = 0.85, 𝑐 = 10       6.84   −4.62    −82.32    −0.05    −2.33    −0.01    0.67
𝛽 = 0.85, 𝑐 = 100      6.74   −8.03   −390.65    −0.01    −1.47    −0.0     0.88
The above panels for an 𝛼 = 4 setting indicate that 𝛼 and 𝑐 affect outcomes in interesting ways.
We leave it to the reader to explore consequences of other constellations of parameter values.
Many economists regard a time inconsistent plan as implausible because they question the plausibility of a timing protocol in which a plan for setting a sequence of policy variables is chosen once-and-for-all at time 0.
For that reason, the Markov perfect equilibrium concept attracts many economists.
• A Markov perfect equilibrium plan is constructed to ensure that a sequence of government policymakers who choose sequentially do not want to deviate from it.
Research by Abreu [Abreu, 1988], Chari and Kehoe [Chari and Kehoe, 1990], and Stokey [Stokey, 1989], [Stokey, 1991] described conditions under which a Ramsey plan can be rescued from the complaint that it is not credible.
They accomplished this by expanding the description of a plan to include expectations about adverse consequences of
deviating from it that can serve to deter deviations.
We turn to such theories in this quantecon lecture Sustainable Plans for a Calvo Model.
FORTYFOUR
44.1 Overview
Advanced Quantitative Economics with Python
for 𝑡 ≥ 0.
Equation (44.1) asserts that the demand for real balances is inversely related to the public’s expected rate of inflation,
which equals the actual rate of inflation because there is no uncertainty here.
(When there is no uncertainty, an assumption of rational expectations becomes equivalent to perfect foresight.)
Subtracting the demand function (44.1) at time 𝑡 from the demand function at 𝑡 + 1 gives:
𝜇𝑡 − 𝜃𝑡 = −𝛼𝜃𝑡+1 + 𝛼𝜃𝑡
or
$$
\theta_t = \frac{\alpha}{1+\alpha} \theta_{t+1} + \frac{1}{1+\alpha} \mu_t \tag{44.2}
$$
Because $\alpha > 0$, $0 < \frac{\alpha}{1+\alpha} < 1$.
Definition: For scalar $b_t$, let $L^2$ be the space of sequences $\{b_t\}_{t=0}^\infty$ satisfying
$$
\sum_{t=0}^\infty b_t^2 < +\infty
$$
Insight: In the spirit of Chang [Chang, 1998], equations (44.1) and (44.3) show that 𝜃𝑡 intermediates how choices of
𝜇𝑡+𝑗 , 𝑗 = 0, 1, … impinge on time 𝑡 real balances 𝑚𝑡 − 𝑝𝑡 = −𝛼𝜃𝑡 .
An equivalence class of continuation money growth sequences $\{\mu_{t+j}\}_{j=0}^\infty$ delivers the same $\theta_t$.
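To illustrate the equivalence class, the sketch below computes the inflation path implied by two different money growth paths by iterating (44.2) backward. It assumes, for tractability, that each path is held constant at its last value forever, so that $\theta = \mu$ in the tail; the function name `θ_path` and the two example paths are hypothetical.

```python
import numpy as np

α = 1.0
λ = α / (1 + α)

def θ_path(μ):
    """
    Inflation path implied by a money growth path μ[0], ..., μ[T-1] that
    stays at μ[-1] forever after, using θ_t = λ θ_{t+1} + (1 - λ) μ_t.
    """
    θ = np.empty(len(μ))
    θ_next = μ[-1]                      # constant tail implies θ = μ
    for t in reversed(range(len(μ))):
        θ[t] = λ * θ_next + (1 - λ) * μ[t]
        θ_next = θ[t]
    return θ

# Two different continuation sequences starting at time 0
μ_a = np.full(10, 0.1)                            # 0.1 forever
μ_b = np.concatenate(([0.0], np.full(9, 0.2)))    # 0 today, 0.2 afterwards

print(θ_path(μ_a)[0], θ_path(μ_b)[0])
```

Both sequences deliver the same time 0 inflation rate even though they differ at every date, which is the sense in which $\theta_t$ summarizes everything about future money growth that matters for time $t$ real balances.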
That future rates of money creation influence earlier rates of inflation makes timing protocols matter for modeling optimal
government policies.
Quantecon lecture Time Inconsistency of Ramsey Plans used this insight to simplify analysis of alternative government
policy problems.
The Quantecon lecture Time Inconsistency of Ramsey Plans considered three models of government policy making that
differ in
• what a policymaker chooses, either a sequence 𝜇⃗ or just 𝜇𝑡 in a single period 𝑡.
• when a policymaker chooses, either once and for all at time 0, or at some time or times 𝑡 ≥ 0.
• what a policymaker assumes about how its choice of 𝜇𝑡 affects the representative agent’s expectations about inflation
rates.
• If the government at $t$ disappoints private agents by setting $\mu_t \neq \tilde \mu_t$, private agents expect $\{\mu_j^A\}_{j=0}^\infty$ as the continuation policy for $t+1$, i.e., $\{\mu_{t+j+1}\}_{j=0}^\infty = \{\mu_j^A\}_{j=0}^\infty$, and therefore expect an associated $\theta_0^A$ for $t+1$. Here $\vec \mu^A = \{\mu_j^A\}_{j=0}^\infty$ is an alternative government plan to be described below.
The government’s one-period return function 𝑠(𝜃, 𝜇) described in equation (43.10) in quantecon lecture [Calvo, 1978]
has the property that for all 𝜃
𝑠(𝜃, 0) ≥ 𝑠(𝜃, 𝜇)
This inequality implies that whenever the policy calls for the government to set 𝜇 ≠ 0, the government could raise its
one-period payoff by setting 𝜇 = 0.
Disappointing private sector expectations in that way would increase the government’s current payoff but would have
adverse consequences for subsequent government payoffs because the private sector would alter its expectations about
future settings of 𝜇.
The temporary gain constitutes the government’s temptation to deviate from a plan.
If the government at $t$ is to resist the temptation to raise its current payoff, it is only because it forecasts adverse consequences that its setting of $\mu_t$ would bring for continuation government payoffs via alterations in the private sector’s expectations.
We call a plan 𝜇⃗ sustainable or credible if at each 𝑡 ≥ 0 the government chooses to confirm private agents’ prior
expectation of its setting for 𝜇𝑡 .
The government will choose to confirm prior expectations only if the long-term loss from disappointing private sector expectations – coming from the government’s understanding of the way the private sector adjusts its expectations in response to having its prior expectations at $t$ disappointed – outweighs the short-term gain from disappointing those expectations.
The theory of sustainable or credible plans assumes throughout that private sector expectations about what future governments will do are based on the assumption that governments at times $t \geq 0$ always act to maximize the continuation discounted utilities that describe those governments’ purposes.
This aspect of the theory means that credible plans always come in pairs:
• a credible (continuation) plan to be followed if the government at 𝑡 confirms private sector expectations
• a credible plan to be followed if the government at 𝑡 disappoints private sector expectations
That credible plans come in pairs threatens to bring an explosion of plans to keep track of:
• each credible plan itself consists of two credible plans
• therefore, the number of plans underlying one plan is unbounded
But Dilip Abreu showed how to render manageable the number of plans that must be kept track of.
The key is an object called a self-enforcing plan.
We’ll proceed to compute one.
In addition to what’s in Anaconda, this lecture will use the following libraries:
import numpy as np
from quantecon import LQ
import matplotlib.pyplot as plt
import pandas as pd
A plan $\vec \mu^A$ (here the superscript $A$ is for Abreu) is said to be self-enforcing if
• the consequence of disappointing the representative agent’s expectations at time $j$ is to restart plan $\vec \mu^A$ at time $j+1$
• the consequence of restarting the plan is sufficiently adverse that it forever deters all deviations from the plan
More precisely, a government plan $\vec \mu^A$ with equilibrium inflation sequence $\vec \theta^A$ is self-enforcing if
$$
\begin{aligned}
v_j^A & = s(\theta_j^A, \mu_j^A) + \beta v_{j+1}^A \\
& \geq s(\theta_j^A, 0) + \beta v_0^A \equiv v_j^{A,D}, \quad j \geq 0
\end{aligned} \tag{44.4}
$$
(Here it is useful to recall that setting $\mu = 0$ is the maximizing choice for the government’s one-period return function.)
The first line tells the consequences of confirming the representative agent’s expectations by following the plan, while the
second line tells the consequences of disappointing the representative agent’s expectations by deviating from the plan.
A consequence of the inequality stated in the definition is that a self-enforcing plan is credible.
Self-enforcing plans can be used to construct other credible plans, including ones with better values.
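Thus, where $\vec v^A$ is the value associated with a self-enforcing plan $\vec \mu^A$, a sufficient condition for another plan $\vec \mu$ associated with inflation $\vec \theta$ and value $\vec v$ to be credible is that
$$
\begin{aligned}
v_j & = s(\theta_j, \mu_j) + \beta v_{j+1} \\
& \geq s(\theta_j, 0) + \beta v_0^A \quad \forall j \geq 0
\end{aligned} \tag{44.5}
$$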
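Rearranging (44.5), this condition holds if and only if
$$
s(\theta_j, 0) - s(\theta_j, \mu_j) \leq \beta (v_{j+1} - v_0^A)
$$
The left side of this inequality is the government’s gain from deviating from the plan, while the right side is the government’s loss from deviating from the plan.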
A government never wants to deviate from a credible plan.
Abreu taught us that a key step in constructing a credible plan is first constructing a self-enforcing plan that has a low time 0 value.
The idea is to use the self-enforcing plan as a continuation plan whenever the government’s choice at time 𝑡 fails to confirm
private agents’ expectation.
We shall use a construction featured in Abreu ([Abreu, 1988]) to construct a self-enforcing plan with low time 0 value.
[Abreu, 1988] invented a way to create a self-enforcing plan with a low initial value.
Imitating his idea, we can construct a self-enforcing plan $\vec \mu$ with a low time 0 value to the government by insisting that future government decision makers set $\mu_t$ to a value yielding low one-period utilities to the household for a long time, after which government decisions yield high one-period utilities.
• Low one-period utilities early are a stick
• High one-period utilities later are a carrot
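Consider a candidate plan $\vec \mu^A$ that sets $\mu_t^A = \bar \mu$ (a high positive number) for $T_A$ periods, and then reverts to the Ramsey plan.
Denote this sequence by $\{\mu_t^A\}_{t=0}^\infty$.
The associated inflation sequence is
$$
\theta_t^A = \frac{1}{1+\alpha} \sum_{j=0}^\infty \left( \frac{\alpha}{1+\alpha} \right)^j \mu_{t+j}^A
$$
and the time 0 value of the plan is
$$
v_0^A = \sum_{t=0}^{T_A - 1} \beta^t s(\theta_t^A, \mu_t^A) + \beta^{T_A} J(\theta_0^R)
$$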
For an appropriate 𝑇𝐴 , this plan can be verified to be self-enforcing and therefore credible.
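The role of $T_A$ can be explored with a simplified stick-and-carrot plan. The sketch below is not the construction described above: to stay self-contained it replaces the reversion to the Ramsey plan with a constant carrot $\mu = 0$ forever (a swapped-in simplification), uses the quadratic one-period return with illustrative parameter values, and checks inequality (44.4) date by date. The function names and the stick level `μ_stick` are hypothetical.

```python
import numpy as np

α, β, c = 1.0, 0.85, 2.0          # illustrative parameter values
u0, u1, u2 = 1.0, 0.5, 3.0
λ = α / (1 + α)

def s(θ, μ):
    "One-period return U(-αθ) - (c/2) μ^2 with quadratic U."
    x = -α * θ
    return u0 + u1 * x - u2 / 2 * x**2 - c / 2 * μ**2

def plan_paths(μ_stick, T_A, μ_carrot):
    """
    θ_t and v_t for t = 0, ..., T_A under a plan that sets μ = μ_stick for
    T_A periods and μ = μ_carrot forever after (index T_A is the tail).
    """
    θ = np.empty(T_A + 1)
    v = np.empty(T_A + 1)
    θ[T_A] = μ_carrot                          # constant tail: θ = μ
    v[T_A] = s(μ_carrot, μ_carrot) / (1 - β)
    for t in reversed(range(T_A)):
        θ[t] = (1 - λ) * μ_stick + λ * θ[t + 1]
        v[t] = s(θ[t], μ_stick) + β * v[t + 1]
    return θ, v

def is_self_enforcing(μ_stick, T_A, μ_carrot, tol=1e-12):
    "Check inequality (44.4) at every date, including the constant tail."
    θ, v = plan_paths(μ_stick, T_A, μ_carrot)
    v_dev = s(θ, 0.0) + β * v[0]   # deviate to μ = 0, then restart the plan
    return bool(np.all(v >= v_dev - tol))

for T_A in [1, 2, 5, 10, 20]:
    print(T_A, is_self_enforcing(μ_stick=0.1, T_A=T_A, μ_carrot=0.0))
```

Whether a given $T_A$ passes depends on the trade-off in (44.4): the one-period gain from deviating is $\frac{c}{2} \bar \mu^2$, while the loss $\beta (v_{j+1} - v_0)$ shrinks when the carrot is far in the future.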
From quantecon lecture Time Inconsistency of Ramsey Plans, we’ll again bring in the Python class ChangLQ that constructs
equilibria under timing protocols studied in that lecture.
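class ChangLQ:
    """
    Class to solve LQ Chang model
    """
    def __init__(self, β, c, α=1, u0=1, u1=0.5, u2=3, T=1000, θ_n=200):
        # Record parameters
        self.α, self.u0, self.u1, self.u2 = α, u0, u1, u2
        self.β, self.c, self.T, self.θ_n = β, c, T, θ_n

        self.setup_LQ_matrices()
        self.solve_LQ_problem()
        self.compute_policy_functions()
        self.simulate_ramsey_plan()
        self.compute_θ_range()
        self.compute_value_and_policy()

    def setup_LQ_matrices(self):
        # LQ Matrices
        self.R = -np.array([[self.u0, -self.u1 * self.α / 2],
                            [-self.u1 * self.α / 2,
                             -self.u2 * self.α**2 / 2]])
        self.Q = -np.array([[-self.c / 2]])
        self.A = np.array([[1, 0], [0, (1 + self.α) / self.α]])
        self.B = np.array([[0], [-1 / self.α]])

    def solve_LQ_problem(self):
        # Solve LQ Problem (Subproblem 1)
        lq = LQ(self.Q, self.R, self.A, self.B, beta=self.β)
        self.P, self.F, self.d = lq.stationary_values()

        # Closed-loop matrix for the state (1, θ)
        # (assumed: this line is missing from the extracted listing)
        self.cl_mat = self.A - self.B @ self.F

        # Solve Subproblem 2
        self.θ_R = -self.P[0, 1] / self.P[1, 1]

        # Bliss level of θ, -u1 / (u2 α) (assumed, as above)
        self.θ_B = -self.u1 / (self.u2 * self.α)

    def J_θ(self, θ):
        # Ramsey value J(θ) = -x'Px with x = (1, θ) (assumed definition)
        x = np.array([1, θ])
        return -x @ self.P @ x

    def V_θ(self, θ):
        # Value of setting μ = θ forever, s(θ, θ) / (1 - β) (assumed definition)
        return (self.u0 + self.u1 * (-self.α * θ)
                - self.u2 / 2 * (-self.α * θ)**2
                - self.c / 2 * θ**2) / (1 - self.β)

    def compute_policy_functions(self):
        # Solve the Markov Perfect Equilibrium
        self.μ_MPE = -self.u1 / ((1 + self.α) / self.α * self.c
                                 + self.α / (1 + self.α)
                                 * self.u2 + self.α**2
                                 / (1 + self.α) * self.u2)
        self.θ_MPE = self.μ_MPE
        self.μ_CR = -self.α * self.u1 / (self.u2 * self.α**2 + self.c)
        self.θ_CR = self.μ_CR

        # Values under the MPE and the constrained Ramsey plan
        self.J_MPE = self.V_θ(self.μ_MPE)
        self.J_CR = self.V_θ(self.μ_CR)

    def simulate_ramsey_plan(self):
        # Simulate Ramsey plan for large number of periods
        θ_series = np.vstack((np.ones((1, self.T)), np.zeros((1, self.T))))
        μ_series = np.zeros(self.T)
        J_series = np.zeros(self.T)
        θ_series[1, 0] = self.θ_R
        [μ_series[0]] = -self.F.dot(θ_series[:, 0])
        J_series[0] = self.J_θ(θ_series[1, 0])

        # Iterate the closed-loop system forward
        # (assumed: this loop is missing from the extracted listing)
        for i in range(1, self.T):
            θ_series[:, i] = self.cl_mat @ θ_series[:, i - 1]
            [μ_series[i]] = -self.F @ θ_series[:, i]
            J_series[i] = self.J_θ(θ_series[1, i])

        self.J_series = J_series
        self.μ_series = μ_series
        self.θ_series = θ_series

    def compute_θ_range(self):
        # Find the range of θ in Ramsey plan
        θ_LB = min(min(self.θ_series[1, :]), self.θ_B)
        θ_UB = max(max(self.θ_series[1, :]), self.θ_MPE)
        θ_range = θ_UB - θ_LB
        self.θ_LB = θ_LB - 0.05 * θ_range
        self.θ_UB = θ_UB + 0.05 * θ_range
        self.θ_range = θ_range

    def compute_value_and_policy(self):
        # Create the θ_space
        self.θ_space = np.linspace(self.θ_LB, self.θ_UB, 200)

        # Ramsey policy μ = -F (1, θ)' on the grid
        # (assumed: the first assignment is missing from the extracted listing)
        self.μ_space = -self.F @ np.vstack((np.ones_like(self.θ_space),
                                            self.θ_space))
        self.μ_space = self.μ_space[0, :]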
To confirm that the plan $\vec \mu^A$ is self-enforcing, we plot an object that we call $V_t^{A,D}$, defined in the key inequality in the second line of equation (44.4) above.
$V_t^{A,D}$ is the value at $t$ of deviating from the self-enforcing plan $\vec \mu^A$ by setting $\mu_t = 0$ and then restarting the plan at $v_0^A$ at $t+1$:
True
check_ramsey(clq)
True
We can represent a sustainable plan recursively by taking the continuation value 𝑣𝑡 as a state variable.
We form the following 3-tuple of functions:
$$
\begin{aligned}
\hat \mu_t & = \nu_\mu(v_t) \\
\theta_t & = \nu_\theta(v_t) \\
v_{t+1} & = \nu_v(v_t, \mu_t)
\end{aligned} \tag{44.6}
$$
FORTYFIVE
In addition to what’s in Anaconda, this lecture will need the following libraries:
45.1 Overview
This lecture describes a celebrated model of optimal fiscal policy by Robert E. Lucas, Jr., and Nancy Stokey [Lucas and
Stokey, 1983].
The model revisits classic issues about how to pay for a war.
Here a war means a more or less temporary surge in an exogenous government expenditure process.
The model features
• a government that must finance an exogenous stream of government expenditures with either
– a flat rate tax on labor, or
– purchases and sales from a full array of Arrow state-contingent securities
• a representative household that values consumption and leisure
• a linear production function mapping labor into a single good
• a Ramsey planner who at time 𝑡 = 0 chooses a plan for taxes and trades of Arrow securities for all 𝑡 ≥ 0
After first presenting the model in a space of sequences, we shall represent it recursively in terms of two Bellman equations
formulated along lines that we encountered in Dynamic Stackelberg models.
As in Dynamic Stackelberg models, to apply dynamic programming we shall define the state vector artfully.
In particular, we shall include forward-looking variables that summarize optimal responses of private agents to a Ramsey
plan.
See Optimal taxation for analysis within a linear-quadratic setting.
Let’s start with some standard imports:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import root
from quantecon import MarkovChain
from quantecon.optimize.nelder_mead import nelder_mead
from numba import njit, prange, float64
from numba.experimental import jitclass
Output equals 𝑛𝑡 (𝑠𝑡 ) and can be divided between 𝑐𝑡 (𝑠𝑡 ) and 𝑔𝑡 (𝑠𝑡 )
$$
\sum_{t=0}^\infty \sum_{s^t} \beta^t \pi_t(s^t) u[c_t(s^t), \ell_t(s^t)] \tag{45.3}
$$
where the utility function 𝑢 is increasing, strictly concave, and three times continuously differentiable in both arguments.
The technology pins down a pre-tax wage rate to unity for all 𝑡, 𝑠𝑡 .
The government imposes a flat-rate tax 𝜏𝑡 (𝑠𝑡 ) on labor income at time 𝑡, history 𝑠𝑡 .
There are complete markets in one-period Arrow securities.
One unit of an Arrow security issued at time 𝑡 at history 𝑠𝑡 and promising to pay one unit of time 𝑡 + 1 consumption in
state 𝑠𝑡+1 costs 𝑝𝑡+1 (𝑠𝑡+1 |𝑠𝑡 ).
The government issues one-period Arrow securities each period.
The government has a sequence of budget constraints whose time 𝑡 ≥ 0 component is
$$
g_t(s^t) = \tau_t(s^t) n_t(s^t) + \sum_{s_{t+1}} p_{t+1}(s_{t+1}|s^t) b_{t+1}(s_{t+1}|s^t) - b_t(s_t|s^{t-1}) \tag{45.4}
$$
where
• 𝑝𝑡+1 (𝑠𝑡+1 |𝑠𝑡 ) is a competitive equilibrium price of one unit of consumption at date 𝑡 + 1 in state 𝑠𝑡+1 at date 𝑡
and history 𝑠𝑡 .
• 𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) is government debt falling due at time 𝑡, history 𝑠𝑡 .
Government debt 𝑏0 (𝑠0 ) is an exogenous initial condition.
The representative household has a sequence of budget constraints whose time 𝑡 ≥ 0 component is
$$
c_t(s^t) + \sum_{s_{t+1}} p_{t+1}(s_{t+1}|s^t) b_{t+1}(s_{t+1}|s^t) = [1 - \tau_t(s^t)] n_t(s^t) + b_t(s_t|s^{t-1}) \quad \forall t \geq 0 \tag{45.5}
$$
A price system is a sequence of Arrow security prices $\{p_{t+1}(s_{t+1}|s^t)\}_{t=0}^\infty$.
The household faces the price system as a price-taker and takes the government policy as given.
A competitive equilibrium with distorting taxes is a feasible allocation, a price system, and a government policy such
that
• Given the price system and the government policy, the allocation solves the household’s optimization problem.
• Given the allocation, government policy, and price system, the government’s budget constraint is satisfied for all
𝑡, 𝑠𝑡 .
We find it convenient sometimes to work with the Arrow-Debreu price system that is implied by a sequence of Arrow
securities prices.
Let $q_t^0(s^t)$ be the price at time 0, measured in time 0 consumption goods, of one unit of consumption at time $t$, history $s^t$.
The following recursion relates Arrow-Debreu prices $\{q_t^0(s^t)\}_{t=0}^\infty$ to Arrow securities prices $\{p_{t+1}(s_{t+1}|s^t)\}_{t=0}^\infty$
$$
q_{t+1}^0(s^{t+1}) = p_{t+1}(s_{t+1}|s^t) q_t^0(s^t) \quad \text{s.t.} \quad q_0^0(s^0) = 1 \tag{45.6}
$$
Arrow-Debreu prices are useful when we want to compress a sequence of budget constraints into a single intertemporal
budget constraint, as we shall find it convenient to do below.
We apply a popular approach to solving a Ramsey problem, called the primal approach.
The idea is to use first-order conditions for household optimization to eliminate taxes and prices in favor of quantities,
then pose an optimization problem cast entirely in terms of quantities.
After Ramsey quantities have been found, taxes and prices can then be unwound from the allocation.
The primal approach uses four steps:
1. Obtain first-order conditions of the household’s problem and solve them for $\{q_t^0(s^t), \tau_t(s^t)\}_{t=0}^\infty$ as functions of the allocation $\{c_t(s^t), n_t(s^t)\}_{t=0}^\infty$.
2. Substitute these expressions for taxes and prices in terms of the allocation into the household’s present-value budget
constraint.
• This intertemporal constraint involves only the allocation and is regarded as an implementability constraint.
3. Find the allocation that maximizes the utility of the representative household (45.3) subject to the feasibility constraints (45.1) and (45.2) and the implementability condition derived in step 2.
• This optimal allocation is called the Ramsey allocation.
4. Use the Ramsey allocation together with the formulas from step 1 to find taxes and prices.
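By sequential substitution of one one-period budget constraint (45.5) into another, we can obtain the household’s present-value budget constraint:
$$
\sum_{t=0}^\infty \sum_{s^t} q_t^0(s^t) c_t(s^t) = \sum_{t=0}^\infty \sum_{s^t} q_t^0(s^t) [1 - \tau_t(s^t)] n_t(s^t) + b_0 \tag{45.7}
$$
The household’s first-order conditions imply
$$
(1 - \tau_t(s^t)) = \frac{u_l(s^t)}{u_c(s^t)} \tag{45.8}
$$
and
$$
p_{t+1}(s_{t+1}|s^t) = \beta \pi(s_{t+1}|s^t) \left( \frac{u_c(s^{t+1})}{u_c(s^t)} \right) \tag{45.9}
$$
Together with recursion (45.6), equation (45.9) implies that Arrow-Debreu prices satisfy
$$
q_t^0(s^t) = \beta^t \pi_t(s^t) \frac{u_c(s^t)}{u_c(s^0)} \tag{45.10}
$$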
(The stochastic process {𝑞𝑡0 (𝑠𝑡 )} is an instance of what finance economists call a stochastic discount factor process.)
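The link between the one-period Arrow prices (45.9) and the Arrow-Debreu prices (45.10) can be checked along a single history. The sketch below assumes a two-state Markov chain and consumption that depends only on the current state; the transition matrix `Π`, the consumption vector `c`, the CRRA marginal utility, and the particular history are all hypothetical choices made for illustration.

```python
import numpy as np

β, σ = 0.9, 2.0
Π = np.array([[0.8, 0.2],      # hypothetical transition matrix
              [0.3, 0.7]])
c = np.array([1.0, 0.8])       # hypothetical consumption in each Markov state
u_c = c**(-σ)                  # CRRA marginal utility (an assumption)

history = [0, 1, 1, 0, 1]      # one particular history s^t with s_0 = 0
steps = list(zip(history[:-1], history[1:]))

# Recursion (45.6), with Arrow prices taken from (45.9)
q = [1.0]
for s, s_next in steps:
    p = β * Π[s, s_next] * u_c[s_next] / u_c[s]
    q.append(p * q[-1])

# Closed form (45.10) for the same history
t = len(steps)
π_t = np.prod([Π[s, s_next] for s, s_next in steps])
q_closed = β**t * π_t * u_c[history[-1]] / u_c[history[0]]

print(q[-1], q_closed)
```

The marginal utility ratios telescope along the history, which is why only $u_c(s^t)$ and $u_c(s^0)$ survive in (45.10).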
Using the first-order conditions (45.8) and (45.9) to eliminate taxes and prices from (45.7), we derive the implementability
condition
$$
\sum_{t=0}^\infty \sum_{s^t} \beta^t \pi_t(s^t) [u_c(s^t) c_t(s^t) - u_\ell(s^t) n_t(s^t)] - u_c(s^0) b_0 = 0 \tag{45.11}
$$
subject to (45.11).
$$
V[c_t(s^t), n_t(s^t), \Phi] = u[c_t(s^t), 1 - n_t(s^t)] + \Phi [u_c(s^t) c_t(s^t) - u_\ell(s^t) n_t(s^t)] \tag{45.13}
$$
where $\{\theta_t(s^t); \forall s^t\}_{t \geq 0}$ is a sequence of Lagrange multipliers on the feasibility conditions (45.2).
Given an initial government debt 𝑏0 , we want to maximize 𝐽 with respect to {𝑐𝑡 (𝑠𝑡 ), 𝑛𝑡 (𝑠𝑡 ); ∀𝑠𝑡 }𝑡≥0 and to minimize
with respect to Φ and with respect to {𝜃(𝑠𝑡 ); ∀𝑠𝑡 }𝑡≥0 .
The first-order conditions for the Ramsey problem for periods 𝑡 ≥ 1 and 𝑡 = 0, respectively, are
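$$
\begin{aligned}
c_t(s^t)\colon \; & (1+\Phi) u_c(s^t) + \Phi [u_{cc}(s^t) c_t(s^t) - u_{\ell c}(s^t) n_t(s^t)] - \theta_t(s^t) = 0, \quad t \geq 1 \\
n_t(s^t)\colon \; & -(1+\Phi) u_\ell(s^t) - \Phi [u_{c\ell}(s^t) c_t(s^t) - u_{\ell\ell}(s^t) n_t(s^t)] + \theta_t(s^t) = 0, \quad t \geq 1
\end{aligned} \tag{45.15}
$$
and
$$
\begin{aligned}
c_0(s^0, b_0)\colon \; & (1+\Phi) u_c(s^0, b_0) + \Phi [u_{cc}(s^0, b_0) c_0(s^0, b_0) - u_{\ell c}(s^0, b_0) n_0(s^0, b_0)] - \theta_0(s^0, b_0) \\
& - \Phi u_{cc}(s^0, b_0) b_0 = 0 \\
n_0(s^0, b_0)\colon \; & -(1+\Phi) u_\ell(s^0, b_0) - \Phi [u_{c\ell}(s^0, b_0) c_0(s^0, b_0) - u_{\ell\ell}(s^0, b_0) n_0(s^0, b_0)] + \theta_0(s^0, b_0) \\
& + \Phi u_{c\ell}(s^0, b_0) b_0 = 0
\end{aligned} \tag{45.16}
$$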
Notice that a counterpart to 𝑏0 does not appear in (45.17), so 𝑐 does not directly depend on it for 𝑡 ≥ 1.
But things are different for time 𝑡 = 0.
An analogous argument for the 𝑡 = 0 equations (45.16) leads to one equation that can be solved for 𝑐0 as a function of
the pair (𝑔(𝑠0 ), 𝑏0 ) and the Lagrange multiplier Φ.
These outcomes mean that the following statement would be true even when government purchases are history-dependent
functions 𝑔𝑡 (𝑠𝑡 ) of the history of 𝑠𝑡 .
Proposition: If government purchases are equal after two histories $s^t$ and $\tilde s^\tau$ for $t, \tau \geq 0$, i.e., if
$$
g_t(s^t) = g_\tau(\tilde s^\tau) = g
$$
then it follows from (45.17) that the Ramsey choices of consumption and leisure, $(c_t(s^t), \ell_t(s^t))$ and $(c_\tau(\tilde s^\tau), \ell_\tau(\tilde s^\tau))$, are identical.
The proposition asserts that the optimal allocation is a function of the currently realized quantity of government purchases
𝑔 only and does not depend on the specific history that preceded that realization of 𝑔.
Also, assume that government purchases 𝑔 are an exact time-invariant function 𝑔(𝑠) of 𝑠.
We maintain these assumptions throughout the remainder of this lecture.
We complete the Ramsey plan by computing the Lagrange multiplier Φ on the implementability constraint (45.11).
Government budget balance restricts Φ via the following line of reasoning.
The household’s first-order conditions imply
$$
(1 - \tau_t(s^t)) = \frac{u_l(s^t)}{u_c(s^t)} \tag{45.19}
$$
$$
p_{t+1}(s_{t+1}|s^t) = \beta \Pi(s_{t+1}|s^t) \frac{u_c(s^{t+1})}{u_c(s^t)} \tag{45.20}
$$
Substituting from (45.19), (45.20), and the feasibility condition (45.2) into the recursive version (45.5) of the household
budget constraint gives
$$
u_c(s^t) [n_t(s^t) - g_t(s^t)] + \beta \sum_{s_{t+1}} \Pi(s_{t+1}|s^t) u_c(s^{t+1}) b_{t+1}(s_{t+1}|s^t) = u_l(s^t) n_t(s^t) + u_c(s^t) b_t(s_t|s^{t-1}) \tag{45.21}
$$
Hence, with $x_t(s^t) = u_c(s^t) b_t(s_t|s^{t-1})$, the equation shares much of the structure of a simple asset pricing equation with $x_t$ being analogous to the price of the asset at time $t$.
We learned earlier that for a Ramsey allocation $c_t(s^t)$, $n_t(s^t)$, and $b_t(s_t|s^{t-1})$, and therefore also $x_t(s^t)$, are each functions of $s_t$ only, being independent of the history $s^{t-1}$ for $t \geq 1$.
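That means that we can express equation (45.21) as
$$
u_c(s) [n(s) - g(s)] + \beta \sum_{s'} \Pi(s'|s) x'(s') = u_l(s) n(s) + x(s) \tag{45.22}
$$
where $s'$ denotes a next period value of $s$ and $x'(s')$ denotes a next period value of $x$.
Given $n(s)$ for $s = 1, \ldots, S$, equation (45.22) is easy to solve for $x(s)$ for $s = 1, \ldots, S$.
If we let $\vec n, \vec g, \vec x$ denote $S \times 1$ vectors whose $i$th elements are the respective $n$, $g$, and $x$ values when $s = i$, and let $\Pi$ be the transition matrix for the Markov state $s$, then we can express (45.22) as the matrix equation
$$
\vec u_c (\vec n - \vec g) + \beta \Pi \vec x = \vec u_l \vec n + \vec x \tag{45.23}
$$
whose solution is
$$
\vec x = (I - \beta \Pi)^{-1} [\vec u_c (\vec n - \vec g) - \vec u_l \vec n] \tag{45.24}
$$
In these equations, by $\vec u_c \vec n$, for example, we mean element-by-element multiplication of the two vectors.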
After solving for $\vec x$, we can find $b(s_t|s^{t-1})$ in Markov state $s_t = s$ from $b(s) = \frac{x(s)}{u_c(s)}$ or the matrix equation
$$
\vec b = \frac{\vec x}{\vec u_c} \tag{45.25}
$$
where division here means an element-by-element division of the respective components of the $S \times 1$ vectors $\vec x$ and $\vec u_c$.
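The linear-algebra step can be sketched in a few lines of NumPy. The example assumes that the Markov version of (45.21) reads $u_c(s)[n(s) - g(s)] + \beta \sum_{s'} \Pi(s'|s) x(s') = u_l(s) n(s) + x(s)$, and it uses the CRRA-style marginal utilities $u_c = c^{-\sigma}$ and $u_l = n^{\gamma}$. The allocation `c` below is a hypothetical placeholder, not a solved Ramsey allocation, and `Π`, `g`, `σ`, `γ` are illustrative values.

```python
import numpy as np

β, σ, γ = 0.9, 2.0, 2.0
Π = np.array([[0.8, 0.2],        # hypothetical transition matrix
              [0.3, 0.7]])
g = np.array([0.1, 0.2])         # government purchases by Markov state
c = np.array([0.75, 0.70])       # hypothetical allocation (not a Ramsey solution)
n = c + g                        # feasibility: c + g = n

u_c = c**(-σ)                    # marginal utility of consumption
u_l = n**γ                       # marginal utility of leisure

# Solve (I - βΠ) x = u_c (n - g) - u_l n; products are element by element
rhs = u_c * (n - g) - u_l * n
x = np.linalg.solve(np.eye(len(g)) - β * Π, rhs)

# Recover state-contingent debt as in (45.25)
b = x / u_c

# Residual of the Markov version of (45.21); it should be numerically zero
resid = u_c * (n - g) + β * Π @ x - u_l * n - x
print(x, b, np.max(np.abs(resid)))
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a numerical convenience; both express the same solution of the $S$ linear equations.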
Here is a computational algorithm:
1. Start with a guess for the value for Φ, then use the first-order conditions and the feasibility conditions to compute
𝑐(𝑠𝑡 ), 𝑛(𝑠𝑡 ) for 𝑠 ∈ [1, … , 𝑆] and 𝑐0 (𝑠0 , 𝑏0 ) and 𝑛0 (𝑠0 , 𝑏0 ), given Φ.
• these are 2(𝑆 + 1) equations in 2(𝑆 + 1) unknowns.
2. Solve the 𝑆 equations (45.24) for the 𝑆 elements of 𝑥.⃗
• these depend on Φ.
3. Find a $\Phi$ that satisfies
$$
u_{c,0} b_0 = u_{c,0} (n_0 - g_0) - u_{l,0} n_0 + \beta \sum_{s=1}^S \Pi(s|s_0) x(s) \tag{45.26}
$$
by gradually raising $\Phi$ if the left side of (45.26) exceeds the right side and lowering $\Phi$ if the left side is less than the right side.
4. After computing a Ramsey allocation, recover the flat tax rate on labor from (45.8) and the implied one-period
Arrow securities prices from (45.9).
In summary, when 𝑔𝑡 is a time-invariant function of a Markov state 𝑠𝑡 , a Ramsey plan can be constructed by solving
3𝑆 + 3 equations for 𝑆 components each of 𝑐,⃗ 𝑛,⃗ and 𝑥⃗ together with 𝑛0 , 𝑐0 , and Φ.
A time 𝑡, history 𝑠𝑡 Ramsey plan is a Ramsey plan that starts from initial conditions 𝑠𝑡 , 𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ).
A time 𝑡, history 𝑠𝑡 continuation of a time 0, state 0 Ramsey plan is not a time 𝑡, history 𝑠𝑡 Ramsey plan.
This means that a Ramsey plan is not time consistent.
Another way to say the same thing is that a Ramsey plan is time inconsistent.
The reason is that a continuation Ramsey plan takes 𝑢𝑐𝑡 𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) as given, not 𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ).
We shall discuss this more below.
In our calculations below and in a subsequent lecture based on an extension of the Lucas-Stokey model by Aiyagari, Marcet,
Sargent, and Seppälä (2002) [Aiyagari et al., 2002], we shall modify the one-period utility function assumed above.
(We adopted the preceding utility specification because it was the one used in the original Lucas-Stokey paper [Lucas
and Stokey, 1983]. We shall soon revert to that specification in a subsequent section.)
We will modify their specification by instead assuming that the representative agent has utility function
$$
u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}
$$
where 𝜎 > 0, 𝛾 > 0.
We continue to assume that
$$
c_t + g_t = n_t
$$
With these understandings, equations (45.17) and (45.18) simplify in the case of the CRRA utility function.
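They become
$$
(1+\Phi)[u_c(c) + u_n(c+g)] + \Phi[c\,u_{cc}(c) + (c+g)\,u_{nn}(c+g)] = 0 \tag{45.27}
$$
and
$$
(1+\Phi)[u_c(c_0) + u_n(c_0+g_0)] + \Phi[c_0 u_{cc}(c_0) + (c_0+g_0) u_{nn}(c_0+g_0)] - \Phi u_{cc}(c_0) b_0 = 0 \tag{45.28}
$$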
In equation (45.27), it is understood that 𝑐 and 𝑔 are each functions of the Markov state 𝑠.
In addition, the time 𝑡 = 0 budget constraint is satisfied at 𝑐0 and initial government debt 𝑏0 :
$$
b_0 + g_0 = \tau_0 (c_0 + g_0) + \beta \sum_{s=1}^{S} \Pi(s|s_0) \frac{u_c(s)}{u_{c,0}} b_1(s) \tag{45.29}
$$
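The computational algorithm described above can be sketched compactly for the CRRA case. The block below is a minimal illustration, not the lecture's `SequentialLS` class: the parameter values, the helper `bisect`, and the function names are assumptions made for this sketch, and a hand-written bisection stands in for a library root finder. For a given Φ it solves the first-order conditions state by state, solves the linear system for 𝑥⃗, and then searches for the Φ at which the time 0 constraint (45.26) holds.

```python
import numpy as np

# Illustrative parameters (assumptions for this sketch)
β, σ, γ = 0.9, 2.0, 2.0
Π = np.full((2, 2), 0.5)          # IID two-state Markov chain
g = np.array([0.1, 0.2])          # government purchases by state
b0, s0 = 0.5, 0                   # initial debt and initial state

def u_c(c):  return c**(-σ)
def u_cc(c): return -σ * c**(-σ - 1)
def u_n(n):  return -n**γ         # marginal utility of labor (negative)
def u_nn(n): return -γ * n**(γ - 1)

def bisect(f, lo, hi, tol=1e-12, max_iter=500):
    "Root of f on [lo, hi]; assumes f(lo) and f(hi) have opposite signs."
    f_lo = f(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def foc(c, g_s, Φ, b=0.0):
    "Left side of the CRRA first-order condition; b=0 gives the t >= 1 case."
    n = c + g_s
    return ((1 + Φ) * (u_c(c) + u_n(n))
            + Φ * (c * u_cc(c) + n * u_nn(n))
            - Φ * u_cc(c) * b)

def time1_allocation(Φ):
    "Step 1 and step 2: allocation and x for t >= 1, given Φ."
    c = np.array([bisect(lambda c: foc(c, g_s, Φ), 1e-3, 3.0) for g_s in g])
    n = c + g
    I = u_c(c) * c + u_n(n) * n   # u_c c - u_l n, with u_l = -u_n
    x = np.linalg.solve(np.eye(len(g)) - β * Π, I)
    return c, n, x

def time0_residual(Φ):
    "Step 3: right side minus left side of the time 0 constraint."
    c, n, x = time1_allocation(Φ)
    c0 = bisect(lambda c: foc(c, g[s0], Φ, b0), 1e-3, 3.0)
    n0 = c0 + g[s0]
    return u_c(c0) * c0 + u_n(n0) * n0 + β * Π[s0] @ x - u_c(c0) * b0

Φ = bisect(time0_residual, 0.0, 0.5)
c, n, x = time1_allocation(Φ)
τ = 1 + u_n(n) / u_c(c)           # step 4: labor tax rate for t >= 1
print(f"Φ = {Φ:.4f}, τ = {τ}")
```

The search interval [0, 0.5] for Φ is specific to these parameter values; other parameterizations may need a different bracket.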
class SequentialLS:
'''
Class that takes a preference object, state transition matrix,
and state contingent government expenditure plan as inputs, and
solves the sequential allocation problem described above.
It returns optimal allocations about consumption and labor supply,
as well as the multiplier on the implementability constraint Φ.
'''
def __init__(self,
pref,
π=np.full((2, 2), 0.5),
g=np.array([0.1, 0.2])):
pref = self.pref
Uc, Ul = pref.Uc, pref.Ul
n = c + g
l = 1 - n
def find_first_best(self):
self.cFB = res.x
self.nFB = self.cFB + g
pref = self.pref
Uc, Ucc, Ul, Ull, Ulc = pref.Uc, pref.Ucc, pref.Ul, pref.Ull, pref.Ulc
n = c + g
l = 1 - n
return diff
c = res.x
n = c + g
l = 1 - n
# Compute x
I = pref.Uc(c, l) * c - pref.Ul(c, l) * n
x = np.linalg.solve(np.eye(S) - self.β * self.π, I)
return c, n, x
pref = self.pref
Ucc, Ulc = pref.Ucc, pref.Ulc
n0 = c0 + g0
l0 = 1 - n0
return diff
c, n, x = self.time1_allocation(Φ)
Φ = res.x[0]
return Φ, c0, n0
if sHist is None:
sHist = self.mc.simulate(T, s0)
# Time 0
Φ, cHist[0], nHist[0] = self.time0_allocation(b0, s0)
τHist[0] = self.τ(cHist[0], nHist[0])
Bhist[0] = b0
ΦHist[0] = Φ
# Time 1 onward
for t in range(1, T):
c, n, x = self.time1_allocation(Φ)
τ = self.τ(c, n)
u_c = Uc(c, 1-n)
s = sHist[t]
Eu_c = π[sHist[t-1]] @ u_c
cHist[t], nHist[t], Bhist[t], τHist[t] = c[s], n[s], x[s] / u_c[s], τ[s]
RHist[t-1] = Uc(cHist[t-1], 1-nHist[t-1]) / (β * Eu_c)
ΦHist[t] = Φ
gHist = self.g[sHist]
yHist = nHist
return [cHist, nHist, Bhist, τHist, gHist, yHist, sHist, ΦHist, RHist]
To express a Ramsey plan recursively, we imagine that a time 0 Ramsey planner is followed by a sequence of continuation
Ramsey planners at times 𝑡 = 1, 2, ….
A “continuation Ramsey planner” at time 𝑡 ≥ 1 has a different objective function and faces different constraints and state
variables than does the Ramsey planner at time 𝑡 = 0.
A key step in representing a Ramsey plan recursively is to regard the marginal utility scaled government debts 𝑥𝑡 (𝑠𝑡 ) =
𝑢𝑐 (𝑠𝑡 )𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) as predetermined quantities that continuation Ramsey planners at times 𝑡 ≥ 1 are obligated to attain.
Continuation Ramsey planners do this by choosing continuation policies that induce the representative household to make
choices that imply that 𝑢𝑐 (𝑠𝑡 )𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) = 𝑥𝑡 (𝑠𝑡 ).
A time 𝑡 ≥ 1 continuation Ramsey planner faces 𝑥𝑡 , 𝑠𝑡 as state variables.
A time 𝑡 ≥ 1 continuation Ramsey planner delivers 𝑥𝑡 by choosing a suitable 𝑛𝑡 , 𝑐𝑡 pair and a list of 𝑠𝑡+1 -contingent
continuation quantities 𝑥𝑡+1 to bequeath to a time 𝑡 + 1 continuation Ramsey planner.
While a time 𝑡 ≥ 1 continuation Ramsey planner faces 𝑥𝑡 , 𝑠𝑡 as state variables, the time 0 Ramsey planner faces 𝑏0 , not
𝑥0 , as a state variable.
Furthermore, the Ramsey planner cares about (𝑐0 (𝑠0 ), ℓ0 (𝑠0 )), while continuation Ramsey planners do not.
The time 0 Ramsey planner hands a state-contingent function that makes 𝑥1 a function of 𝑠1 to a time 1, state 𝑠1 continuation Ramsey planner.
These lines of delegated authorities and responsibilities across time express the continuation Ramsey planners’ obligations
to implement their parts of an original Ramsey plan that had been designed once-and-for-all at time 0.
After 𝑠𝑡 has been realized at time 𝑡 ≥ 1, the state variables confronting the time 𝑡 continuation Ramsey planner are
(𝑥𝑡 , 𝑠𝑡 ).
• Let 𝑉 (𝑥, 𝑠) be the value of a continuation Ramsey plan at 𝑥𝑡 = 𝑥, 𝑠𝑡 = 𝑠 for 𝑡 ≥ 1.
• Let 𝑊 (𝑏, 𝑠) be the value of a Ramsey plan at time 0 at 𝑏0 = 𝑏 and 𝑠0 = 𝑠.
We work backward by preparing a Bellman equation for 𝑉 (𝑥, 𝑠) first, then a Bellman equation for 𝑊 (𝑏, 𝑠).
where maximization over 𝑛 and the 𝑆 elements of 𝑥′ (𝑠′ ) is subject to the single implementability constraint for 𝑡 ≥ 1:
$$
\begin{aligned}
n_t &= f(x_t, s_t), \quad t \geq 1 \\
x_{t+1}(s_{t+1}) &= h(s_{t+1}; x_t, s_t), \quad s_{t+1} \in S, \; t \geq 1
\end{aligned} \tag{45.32}
$$
where maximization over 𝑛0 and the 𝑆 elements of 𝑥′ (𝑠1 ) is subject to the time 0 implementability constraint
$$
\begin{aligned}
n_0 &= f_0(b_0, s_0) \\
x_1(s_1) &= h_0(s_1; b_0, s_0)
\end{aligned} \tag{45.35}
$$
Notice the appearance of state variables (𝑏0 , 𝑠0 ) in the time 0 policy functions for the Ramsey planner as compared to
(𝑥𝑡 , 𝑠𝑡 ) in the policy functions (45.32) for the time 𝑡 ≥ 1 continuation Ramsey planners.
The value function $V(x_t, s_t)$ of the time 𝑡 continuation Ramsey planner equals $E_t \sum_{\tau=t}^{\infty} \beta^{\tau-t} u(c_\tau, l_\tau)$, where consumption and leisure processes are evaluated along the original time 0 Ramsey plan.
Attach a Lagrange multiplier Φ1 (𝑥, 𝑠) to constraint (45.31) and a Lagrange multiplier Φ0 to constraint (45.26).
Time 𝑡 ≥ 1: First-order conditions for the time 𝑡 ≥ 1 constrained maximization problem on the right side of the
continuation Ramsey planner’s Bellman equation (45.30) are
for 𝑛.
Given Φ1 , equation (45.37) is one equation to be solved for 𝑛 as a function of 𝑠 (or of 𝑔(𝑠)).
Equation (45.36) implies 𝑉𝑥 (𝑥′ , 𝑠′ ) = Φ1 , while an envelope condition is 𝑉𝑥 (𝑥, 𝑠) = Φ1 , so it follows that
$$
V_x(x', s') = V_x(x, s) = \Phi_1(x, s) \tag{45.38}
$$
Time 𝑡 = 0: For the time 0 problem on the right side of the Ramsey planner’s Bellman equation (45.33), first-order
conditions are
𝑉𝑥 (𝑥(𝑠1 ), 𝑠1 ) = Φ0 (45.39)
Notice similarities and differences between the first-order conditions for 𝑡 ≥ 1 and for 𝑡 = 0.
An additional term is present in (45.40) except in three special cases
• 𝑏0 = 0, or
• 𝑢𝑐 is constant (i.e., preferences are quasi-linear in consumption), or
• initial government assets are sufficiently large to finance all government purchases with interest earnings from those
assets so that Φ0 = 0
Except in these special cases, the allocation and the labor tax rate as functions of 𝑠𝑡 differ between dates 𝑡 = 0 and
subsequent dates 𝑡 ≥ 1.
Naturally, the first-order conditions in this recursive formulation of the Ramsey problem agree with the first-order conditions derived when we first formulated the Ramsey plan in the space of sequences.
𝑉𝑥 (𝑥𝑡 , 𝑠𝑡 ) = Φ0 (45.41)
for all 𝑡 ≥ 1.
When 𝑉 is concave in 𝑥, this implies state-variable degeneracy along a Ramsey plan in the sense that for 𝑡 ≥ 1, 𝑥𝑡 will
be a time-invariant function of 𝑠𝑡 .
Given Φ0 , this function mapping 𝑠𝑡 into 𝑥𝑡 can be expressed as a vector 𝑥⃗ that solves equation (45.34) for 𝑛 and 𝑐 as
functions of 𝑔 that are associated with Φ = Φ0 .
While the marginal utility adjusted level of government debt 𝑥𝑡 is a key state variable for the continuation Ramsey planners
at 𝑡 ≥ 1, it is not a state variable at time 0.
The time 0 Ramsey planner faces 𝑏0 , not 𝑥0 = 𝑢𝑐,0 𝑏0 , as a state variable.
The discrepancy in state variables faced by the time 0 Ramsey planner and the time 𝑡 ≥ 1 continuation Ramsey planners
captures the differing obligations and incentives faced by the time 0 Ramsey planner and the time 𝑡 ≥ 1 continuation
Ramsey planners.
• The time 0 Ramsey planner is obligated to honor government debt 𝑏0 measured in time 0 consumption goods.
• The time 0 Ramsey planner can manipulate the value of government debt as measured by 𝑢𝑐,0 𝑏0 .
• In contrast, time 𝑡 ≥ 1 continuation Ramsey planners are obligated not to alter values of debt, as measured by
𝑢𝑐,𝑡 𝑏𝑡 , that they inherit from a preceding Ramsey planner or continuation Ramsey planner.
When government expenditures 𝑔𝑡 are a time-invariant function of a Markov state 𝑠𝑡 , a Ramsey plan and associated
Ramsey allocation feature marginal utilities of consumption 𝑢𝑐 (𝑠𝑡 ) that, given Φ, for 𝑡 ≥ 1 depend only on 𝑠𝑡 , but that
for 𝑡 = 0 depend on 𝑏0 as well.
This means that 𝑢𝑐 (𝑠𝑡 ) will be a time-invariant function of 𝑠𝑡 for 𝑡 ≥ 1, but except when 𝑏0 = 0, a different function for
𝑡 = 0.
This in turn means that prices of one-period Arrow securities 𝑝𝑡+1 (𝑠𝑡+1 |𝑠𝑡 ) = 𝑝(𝑠𝑡+1 |𝑠𝑡 ) will be the same time-invariant
functions of (𝑠𝑡+1 , 𝑠𝑡 ) for 𝑡 ≥ 1, but a different function 𝑝0 (𝑠1 |𝑠0 ) for 𝑡 = 0, except when 𝑏0 = 0.
The differences between these time 0 and time 𝑡 ≥ 1 objects reflect the Ramsey planner’s incentive to manipulate Arrow
security prices and, through them, the value of initial government debt 𝑏0 .
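The difference between time 0 and time 𝑡 ≥ 1 Arrow security prices can be illustrated numerically. The sketch below assumes the standard complete-markets pricing formula 𝑝(𝑠′|𝑠) = 𝛽Π(𝑠′|𝑠)𝑢𝑐(𝑠′)/𝑢𝑐(𝑠), CRRA marginal utility, and made-up consumption values; none of the numbers come from a solved Ramsey plan.

```python
import numpy as np

β, σ = 0.9, 2.0
Π = np.full((2, 2), 0.5)

# Hypothetical allocations: a time-invariant c(s) for t >= 1 and a higher c0
c = np.array([0.62, 0.58])
c0, s0 = 0.70, 0

u_c = c**(-σ)

# p[s, s'] = β Π(s'|s) u_c(s') / u_c(s): the same matrix at every t >= 1
p = β * Π * u_c[None, :] / u_c[:, None]

# Time 0 prices use u_c(c0) in the denominator instead of u_c(s0)
p0 = β * Π[s0] * u_c / c0**(-σ)

# Gross risk-free rates are reciprocals of the sums of Arrow prices
R = 1 / p.sum(axis=1)
R0 = 1 / p0.sum()

print("p(.|s0) for t >= 1:", p[s0])
print("p0(.|s0):          ", p0)
print("R for t >= 1:", R, " R at t = 0:", R0)
```

Because 𝑐0 exceeds 𝑐(𝑠0) in this example, every time 0 Arrow price is higher than its 𝑡 ≥ 1 counterpart and the time 0 risk-free rate is lower.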
class RecursiveLS:
'''
Compute the planner's allocation by solving Bellman
equation.
'''
def __init__(self,
pref,
x_grid,
π=np.full((2, 2), 0.5),
g=np.array([0.1, 0.2])):
# bound for n
bounds[0] = 0, 1
self.bounds = bounds
# guess of n
z[:, :, 1] = 0.5
# guess of xprime
for s in range(S):
for i in range(S-1):
z[:, s, i+2] = x_grid
while True:
# value function iteration
V_new, z_new = T(V, z, pref, π, g, x_grid, bounds)
V = V_new
z = z_new
self.V = V_new
self.z1 = z_new
self.c1 = z_new[:, :, 0]
self.n1 = z_new[:, :, 1]
self.xprime1 = z_new[:, :, 2:]
if self.V is None:
self.time1_allocation()
x = 1. # x is arbitrary
res = nelder_mead(obj_V,
z1[0, s0, 1:-1],
args=(x, s0, V, pref, π, g, x_grid, b0),
bounds=bounds,
tol_f=1e-10)
self.z0 = z0
self.n0 = n0
self.c0 = n0 - g[s0]
self.xprime0 = xprime0
return z0
return 1 - ul / uc
if sHist is None:
sHist = self.mc.simulate(T, s0)
# Time 0
self.time0_allocation(b0, s0)
cHist[0], nHist[0], xHist[0] = self.c0, self.n0, self.xprime0[s0]
τHist[0] = self.τ(cHist[0], nHist[0])
Bhist[0] = b0
# Time 1 onward
for t in range(1, T):
s, x = sHist[t], xHist[t-1]
cHist[t] = np.interp(x, self.x_grid, self.c1[:, s])
nHist[t] = np.interp(x, self.x_grid, self.n1[:, s])
c, n = np.empty((2, self.S))
for sprime in range(self.S):
c[sprime] = np.interp(x, x_grid, self.c1[:, sprime])
n[sprime] = np.interp(x, x_grid, self.n1[:, sprime])
Euc = π[sHist[t-1]] @ Uc(c, 1-n)
RHist[t-1] = Uc(cHist[t-1], 1-nHist[t-1]) / (self.pref.β * Euc)
gHist = self.g[sHist]
yHist = nHist
if t < T-1:
sprime = sHist[t+1]
xHist[t] = np.interp(x, self.x_grid, self.xprime1[:, s, sprime])
# Helper functions
@njit(parallel=True)
def T(V, z, pref, π, g, x_grid, bounds):
'''
One step iteration of Bellman value function.
'''
S = len(π)
V_new = np.empty_like(V)
z_new = np.empty_like(z)
for i in prange(len(x_grid)):
x = x_grid[i]
for s in prange(S):
res = nelder_mead(obj_V,
z[i, s, 1:-1],
args=(x, s, V, pref, π, g, x_grid),
bounds=bounds,
tol_f=1e-10)
# optimal policy
n, xprime = IC(res.x, x, s, None, pref, π, g)
z_new[i, s, 0] = n - g[s] # c
z_new[i, s, 1] = n # n
z_new[i, s, 2:] = xprime # xprime
V_new[i, s] = res.fun
@njit
def obj_V(z_sub, x, s, V, pref, π, g, x_grid, b0=None):
'''
The objective on the right hand side of the Bellman equation.
z_sub contains guesses of n and xprime[:-1].
'''
S = len(π)
β, U = pref.β, pref.U
return obj
@njit
def IC(z_sub, x, s, b0, pref, π, g):
'''
Find xprime[-1] that satisfies the implementability condition
given the guesses of n and xprime[:-1].
'''
n = z_sub[0]
xprime = np.empty(len(π))
xprime[:-1] = z_sub[1:]
c, l = n-g[s], 1-n
uc = Uc(c, l)
ul = Ul(c, l)
if b0 is None:
diff = x
else:
diff = uc * b0
return n, xprime
45.4 Examples
This example illustrates in a simple setting how a Ramsey planner manages risk.
Government expenditures are known for sure in all periods except one:
• For 𝑡 < 3 and 𝑡 > 3 we assume that 𝑔𝑡 = 𝑔𝑙 = 0.1.
• At 𝑡 = 3 a war occurs with probability 0.5.
– If there is war, 𝑔3 = 𝑔ℎ = 0.2
– If there is no war 𝑔3 = 𝑔𝑙 = 0.1
We define the components of the state vector as the following six (𝑡, 𝑔) pairs: (0, 𝑔𝑙 ), (1, 𝑔𝑙 ), (2, 𝑔𝑙 ), (3, 𝑔𝑙 ), (3, 𝑔ℎ ), (𝑡 ≥ 4, 𝑔𝑙 ).
We think of these 6 states as corresponding to 𝑠 = 1, 2, 3, 4, 5, 6.
The transition matrix is
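$$
\Pi = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.5 & 0.5 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
$$
$$
g = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.2 \\ 0.1 \end{pmatrix}
$$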
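As a quick sanity check on this six-state encoding, the following sketch propagates the distribution over states forward from 𝑠 = 1 and reports expected government purchases at each date. The variable names are chosen for this illustration only.

```python
import numpy as np

Π = np.array([[0, 1, 0, 0,   0,   0],
              [0, 0, 1, 0,   0,   0],
              [0, 0, 0, 0.5, 0.5, 0],
              [0, 0, 0, 0,   0,   1],
              [0, 0, 0, 0,   0,   1],
              [0, 0, 0, 0,   0,   1]])
g = np.array([0.1, 0.1, 0.1, 0.1, 0.2, 0.1])

# Each row of a transition matrix must sum to one
row_sums = Π.sum(axis=1)

# Start in the first state with probability one and iterate ψ_{t+1} = ψ_t Π
ψ = np.zeros(6)
ψ[0] = 1.0
expected_g = []
for t in range(6):
    expected_g.append(ψ @ g)
    ψ = ψ @ Π

print(expected_g)
```

Expected purchases equal 0.1 at every date except 𝑡 = 3, where the two war outcomes are equally likely, and the chain is absorbed in the last state from 𝑡 = 4 onward.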
We assume that the representative agent has utility function
$$
u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}
$$
and set 𝜎 = 2, 𝛾 = 2, and the discount factor 𝛽 = 0.9.
Note: For convenience in terms of matching our code, we have expressed utility as a function of 𝑛 rather than leisure 𝑙.
crra_util_data = [
    ('β', float64),
    ('σ', float64),
    ('γ', float64)
]

@jitclass(crra_util_data)
class CRRAutility:

    def __init__(self,
                 β=0.9,
                 σ=2,
                 γ=2):

        self.β, self.σ, self.γ = β, σ, γ

    # Utility function
    def U(self, c, l):
        # Note: `l` should not be interpreted as labor, it is an auxiliary
        # variable used to conveniently match the code and the equations
        # in the lecture
        σ = self.σ
        if σ == 1.:
            U = np.log(c)
        else:
            U = (c**(1 - σ) - 1) / (1 - σ)
        return U - (1-l) ** (1 + self.γ) / (1 + self.γ)
π = np.array([[0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0.5, 0.5, 0],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1]])
plt.tight_layout()
plt.show()
Tax smoothing
• the tax rate is constant for all 𝑡 ≥ 1
– For 𝑡 ≥ 1, 𝑡 ≠ 3, this is a consequence of 𝑔𝑡 being the same at all those dates.
– For 𝑡 = 3, it is a consequence of the special one-period utility function that we have assumed.
– Under other one-period utility functions, the time 𝑡 = 3 tax rate could be either higher or lower than for dates
𝑡 ≥ 1, 𝑡 ≠ 3.
• the tax rate is the same at 𝑡 = 3 for both the high 𝑔𝑡 outcome and the low 𝑔𝑡 outcome
We have assumed that at 𝑡 = 0, the government owes positive debt 𝑏0 .
It sets the time 𝑡 = 0 tax rate partly with an eye to reducing the value 𝑢𝑐,0 𝑏0 of 𝑏0 .
It does this by increasing consumption at time 𝑡 = 0 relative to consumption in later periods.
This has the consequence of lowering the time 𝑡 = 0 value of the gross interest rate for risk-free loans between periods 𝑡 and 𝑡 + 1, which equals
$$
R_t = \frac{u_{c,t}}{\beta \mathbb{E}_t [u_{c,t+1}]}
$$
A tax policy that makes time 𝑡 = 0 consumption higher than time 𝑡 = 1 consumption evidently decreases the one-period risk-free interest rate, 𝑅𝑡 , at 𝑡 = 0.
Lowering the time 𝑡 = 0 risk-free interest rate makes time 𝑡 = 0 consumption goods cheaper relative to consumption
goods at later dates, thereby lowering the value 𝑢𝑐,0 𝑏0 of initial government debt 𝑏0 .
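A small numerical illustration of this mechanism follows. It assumes CRRA marginal utility 𝑢𝑐 = 𝑐^(−𝜎) and uses made-up consumption levels; the function `gross_rate` is a helper written for this sketch and is not part of the lecture's code.

```python
β, σ = 0.9, 2.0

def u_c(c):
    "CRRA marginal utility of consumption."
    return c**(-σ)

def gross_rate(c_today, c_tomorrow, probs):
    "R = u_c(today) / (β E[u_c(tomorrow)])."
    expected_uc = sum(p * u_c(c) for p, c in zip(probs, c_tomorrow))
    return u_c(c_today) / (β * expected_uc)

# Hypothetical consumption next period, equally likely across two states
c_next, probs = [0.6, 0.6], [0.5, 0.5]

R_flat = gross_rate(0.6, c_next, probs)      # flat consumption: R = 1/β
R_high_c0 = gross_rate(0.7, c_next, probs)   # higher consumption today

print(R_flat, R_high_c0)
```

With a flat consumption path the gross rate equals 1/𝛽, and raising current consumption while leaving future consumption unchanged pushes the rate below that level.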
We see this in a figure below that plots the time path for the risk-free interest rate under both realizations of the time
𝑡 = 3 government expenditure shock.
The following plot illustrates how the government lowers the interest rate at time 0 by raising consumption
We have seen that when 𝑏0 > 0, the Ramsey plan sets the time 𝑡 = 0 tax rate partly with an eye toward lowering a
risk-free interest rate for one-period loans between times 𝑡 = 0 and 𝑡 = 1.
By lowering this interest rate, the plan makes time 𝑡 = 0 goods cheap relative to consumption goods at later times.
By doing this, it lowers the value of time 𝑡 = 0 debt that it has inherited and must finance.
In the preceding example, the Ramsey tax rate at time 0 differs from its value at time 1.
To explore what is going on here, let’s simplify things by removing the possibility of war at time 𝑡 = 3.
The Ramsey problem then includes no randomness because 𝑔𝑡 = 𝑔𝑙 for all 𝑡.
The figure below plots the Ramsey tax rates and gross interest rates at time 𝑡 = 0 and time 𝑡 ≥ 1 as functions of the
initial government debt (using the sequential allocation solution and a CRRA utility function defined above)
n = 100
tax_policy = np.empty((n, 2))
interest_rate = np.empty((n, 2))
gov_debt = np.linspace(-1.5, 1, n)
for i in range(n):
    tax_policy[i] = tax_seq.simulate(gov_debt[i], 0, 2)[3]
    interest_rate[i] = tax_seq.simulate(gov_debt[i], 0, 3)[-1]
fig.tight_layout()
plt.show()
The figure indicates that if the government enters with positive debt, it sets a tax rate at 𝑡 = 0 that is less than all later tax
rates.
By setting a lower tax rate at 𝑡 = 0, the government raises consumption, which reduces the value 𝑢𝑐,0 𝑏0 of its initial debt.
It does this by increasing 𝑐0 and thereby lowering 𝑢𝑐,0 .
Conversely, if 𝑏0 < 0, the Ramsey planner sets the tax rate at 𝑡 = 0 higher than in subsequent periods.
A side effect of lowering time 𝑡 = 0 consumption is that it raises the one-period interest rate at time 𝑡 = 0 above that of
subsequent periods, because it raises 𝑢𝑐,0 relative to expected future marginal utility.
There are only two values of initial government debt at which the tax rate is constant for all 𝑡 ≥ 0.
The first is 𝑏0 = 0
• Here the government can’t use the 𝑡 = 0 tax rate to alter the value of the initial debt.
The second occurs when the government enters with sufficiently large assets that the Ramsey planner can achieve first
best and sets 𝜏𝑡 = 0 for all 𝑡.
It is only for these two values of initial government debt that the Ramsey plan is time-consistent.
Another way of saying this is that, except for these two values of initial government debt, a continuation of a Ramsey
plan is not a Ramsey plan.
To illustrate this, consider a Ramsey planner who starts with an initial government debt 𝑏1 associated with one of the
Ramsey plans computed above.
Call 𝜏1𝑅 the time 𝑡 = 0 tax rate chosen by the Ramsey planner confronting this value for initial government debt.
The figure below shows both the tax rate at time 1 chosen by our original Ramsey planner and what a new Ramsey planner
would choose for its time 𝑡 = 0 tax rate
n = 100
tax_policy = np.empty((n, 2))
τ_reset = np.empty((n, 2))
gov_debt = np.linspace(-1.5, 1, n)
for i in range(n):
    tax_policy[i] = tax_seq.simulate(gov_debt[i], 0, 2)[3]
    τ_reset[i] = tax_seq.simulate(gov_debt[i], 0, 1)[3]
fig.tight_layout()
plt.show()
The tax rates in the figure are equal for only two values of initial government debt.
The complete tax smoothing for 𝑡 ≥ 1 in the preceding example is a consequence of our having assumed CRRA preferences.
To see what is driving this outcome, we begin by noting that the Ramsey tax rate for 𝑡 ≥ 1 is a time-invariant function
𝜏 (Φ, 𝑔) of the Lagrange multiplier on the implementability constraint and government expenditures.
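For CRRA preferences, we can exploit the relations 𝑈𝑐𝑐 𝑐 = −𝜎𝑈𝑐 and 𝑈𝑛𝑛 𝑛 = 𝛾𝑈𝑛 to derive
$$
\frac{(1 + (1 - \sigma)\Phi)\, U_c}{(1 + (1 + \gamma)\Phi)\, U_n} = -1
$$
from the first-order conditions (here 𝑈𝑛 < 0 because labor enters utility negatively).
This equation immediately implies that the tax rate is constant, because the ratio −𝑈𝑛 /𝑈𝑐 = 1 − 𝜏 depends only on Φ.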
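The constancy of the tax rate under CRRA preferences can be checked numerically. The sketch below fixes an arbitrary illustrative multiplier Φ (it is not the solution of any particular Ramsey problem), solves the 𝑡 ≥ 1 first-order condition for several levels of 𝑔 with a hand-written bisection, and computes the implied tax rate 𝜏 = 1 + 𝑈𝑛 /𝑈𝑐. The helper `solve_c` and all parameter values are assumptions of this example.

```python
σ, γ = 2.0, 2.0
Φ = 0.1          # arbitrary illustrative multiplier

def foc(c, g):
    "t >= 1 first-order condition for u = c^(1-σ)/(1-σ) - n^(1+γ)/(1+γ)."
    n = c + g
    u_c, u_cc = c**(-σ), -σ * c**(-σ - 1)
    u_n, u_nn = -n**γ, -γ * n**(γ - 1)
    return (1 + Φ) * (u_c + u_n) + Φ * (c * u_cc + n * u_nn)

def solve_c(g, lo=1e-3, hi=3.0, tol=1e-13):
    "Bisection on c; foc is positive near lo and negative at hi."
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid, g) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

taxes = []
for g in [0.05, 0.1, 0.2, 0.3]:
    c = solve_c(g)
    n = c + g
    taxes.append(1 - n**γ / c**(-σ))   # τ = 1 + U_n / U_c

print(taxes)
```

Consumption and labor differ across the four values of 𝑔, yet the computed tax rates coincide up to numerical tolerance.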
For other preferences, the tax rate may not be constant.
For example, let the period utility function be
$$
u(c, n) = \log(c) + 0.69 \log(1 - n)
$$
log_util_data = [
    ('β', float64),
    ('ψ', float64)
]

@jitclass(log_util_data)
class LogUtility:

    def __init__(self,
                 β=0.9,
                 ψ=0.69):

        self.β, self.ψ = β, ψ

    # Utility function
    def U(self, c, l):
        return np.log(c) + self.ψ * np.log(l)
Also, suppose that 𝑔𝑡 follows a two-state IID process with equal probabilities attached to 𝑔𝑙 and 𝑔ℎ .
To compute the tax rate, we will use both the sequential and recursive approaches described above.
The figure below plots a sample path of the Ramsey tax rate
log_example = LogUtility()
# Solve sequential problem
seq_log = SequentialLS(log_example)
T_length = 20
sHist = np.array([0, 0, 0, 0, 0,
0, 0, 0, 1, 1,
0, 0, 0, 1, 1,
1, 1, 1, 1, 0])
# Simulate
sim_seq = seq_log.simulate(0.5, 0, T_length, sHist)
sim_rec = rec_log.simulate(0.5, 0, T_length, sHist)
axes.flatten()[0].legend(('Sequential', 'Recursive'))
fig.tight_layout()
plt.show()
As should be expected, the recursive and sequential solutions produce almost identical allocations.
Unlike outcomes with CRRA preferences, the tax rate is not perfectly smoothed.
Instead, the government raises the tax rate when 𝑔𝑡 is high.
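To see why the tax rate responds to 𝑔𝑡 under these preferences, the following sketch solves the 𝑡 ≥ 1 first-order condition for log utility at two levels of government purchases. It assumes the separable form (1 + Φ)(𝑢𝑐 − 𝑢𝑙) + Φ(𝑐 𝑢𝑐𝑐 + 𝑛 𝑢𝑙𝑙) = 0 and an arbitrary illustrative Φ rather than the multiplier from a solved Ramsey plan; `solve_n` is a helper written for this example.

```python
ψ = 0.69
Φ = 0.1          # arbitrary illustrative multiplier

def foc(n, g):
    "t >= 1 first-order condition for u = log(c) + ψ log(l), l = 1 - n."
    c, l = n - g, 1 - n
    u_c, u_cc = 1 / c, -1 / c**2
    u_l, u_ll = ψ / l, -ψ / l**2
    return (1 + Φ) * (u_c - u_l) + Φ * (c * u_cc + n * u_ll)

def solve_n(g, tol=1e-13):
    "Bisection on n in (g, 1); foc is positive near g and negative near 1."
    lo, hi = g + 1e-6, 1 - 1e-6
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid, g) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tax_rate(g):
    n = solve_n(g)
    c, l = n - g, 1 - n
    return 1 - (ψ / l) / (1 / c)       # τ = 1 - u_l / u_c

τ_low, τ_high = tax_rate(0.1), tax_rate(0.2)
print(τ_low, τ_high)
```

For the same Φ, the tax rate in the high-𝑔 state exceeds the one in the low-𝑔 state, in contrast to the CRRA case.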
A related lecture describes an extension of the Lucas-Stokey model by Aiyagari, Marcet, Sargent, and Seppälä (2002)
[Aiyagari et al., 2002].
In the AMSS economy, only a risk-free bond is traded.
That lecture compares the recursive representation of the Lucas-Stokey model presented in this lecture with one for an
AMSS economy.
By comparing these recursive formulations, we shall glean a sense in which the dimension of the state is lower in the
Lucas-Stokey model.
Accompanying that difference in dimension will be different dynamics of government debt.
FORTYSIX
In addition to what’s in Anaconda, this lecture will need the following libraries:
46.1 Overview
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import root
from interpolation.splines import eval_linear, UCGrid, nodes
from quantecon import optimize, MarkovChain
from numba import njit, prange, float64
from numba.experimental import jitclass
In an earlier lecture, we described a model of optimal taxation with state-contingent debt due to Robert E. Lucas, Jr., and
Nancy Stokey [Lucas and Stokey, 1983].
Aiyagari, Marcet, Sargent, and Seppälä [Aiyagari et al., 2002] (hereafter, AMSS) studied optimal taxation in a model
without state-contingent debt.
In this lecture, we
• describe assumptions and equilibrium concepts
• solve the model
• implement the model numerically
• conduct some policy experiments
• compare outcomes with those in a corresponding complete-markets model
We begin with an introduction to the model.
Many but not all features of the economy are identical to those of the Lucas-Stokey economy.
Let’s start with things that are identical.
For 𝑡 ≥ 0, a history of the state is represented by $s^t = [s_t, s_{t-1}, \ldots, s_0]$.
Government purchases 𝑔(𝑠) are an exact time-invariant function of 𝑠.
Let 𝑐𝑡 (𝑠𝑡 ), ℓ𝑡 (𝑠𝑡 ), and 𝑛𝑡 (𝑠𝑡 ) denote consumption, leisure, and labor supply, respectively, at history 𝑠𝑡 at time 𝑡.
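Each period a representative household is endowed with one unit of time that can be divided between leisure ℓ𝑡 and labor 𝑛𝑡 :
$$
n_t(s^t) + \ell_t(s^t) = 1 \tag{46.1}
$$
Output equals 𝑛𝑡 (𝑠𝑡 ) and can be divided between consumption 𝑐𝑡 (𝑠𝑡 ) and 𝑔(𝑠𝑡 ):
$$
c_t(s^t) + g(s_t) = n_t(s^t) \tag{46.2}
$$
The representative household's preferences over consumption and leisure are ordered by
$$
\sum_{t=0}^{\infty} \sum_{s^t} \beta^t \pi_t(s^t) u[c_t(s^t), \ell_t(s^t)] \tag{46.3}
$$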
where
• 𝜋𝑡 (𝑠𝑡 ) is a joint probability distribution over the sequence 𝑠𝑡 , and
• the utility function 𝑢 is increasing, strictly concave, and three times continuously differentiable in both arguments.
The government imposes a flat rate tax 𝜏𝑡 (𝑠𝑡 ) on labor income at time 𝑡, history 𝑠𝑡 .
Lucas and Stokey assumed that there are complete markets in one-period Arrow securities; also see smoothing models.
It is at this point that AMSS [Aiyagari et al., 2002] modify the Lucas and Stokey economy.
AMSS allow the government to issue only one-period risk-free debt each period.
Ruling out complete markets in this way is a step in the direction of making total tax collections behave more like that
prescribed in Robert Barro (1979) [Barro, 1979] than they do in Lucas and Stokey (1983) [Lucas and Stokey, 1983].
back to the private sector? It would not in an economy with state-contingent debt, since any such allocation could be improved by lowering distortionary
taxes rather than handing out lump-sum transfers. But, without state-contingent debt there can be circumstances when a government would like to make
lump-sum transfers to the private sector.
That 𝑏𝑡+1 (𝑠𝑡 ) is the same for all realizations of 𝑠𝑡+1 captures its risk-free character.
The market value at time 𝑡 of government debt maturing at time 𝑡 + 1 equals 𝑏𝑡+1 (𝑠𝑡 ) divided by 𝑅𝑡 (𝑠𝑡 ).
The government’s budget constraint in period 𝑡 at history 𝑠𝑡 is
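$$
\begin{aligned}
b_t(s^{t-1}) &= \tau^n_t(s^t) n_t(s^t) - g(s_t) - T_t(s^t) + \frac{b_{t+1}(s^t)}{R_t(s^t)} \\
&\equiv z_t(s^t) + \frac{b_{t+1}(s^t)}{R_t(s^t)}
\end{aligned} \tag{46.4}
$$
The household's first-order condition for holding the risk-free bond implies
$$
\frac{1}{R_t(s^t)} = \sum_{s^{t+1}|s^t} \beta \pi_{t+1}(s^{t+1}|s^t) \frac{u_c(s^{t+1})}{u_c(s^t)}
$$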
Substituting this expression into the government’s budget constraint (46.4) yields:
$$
b_t(s^{t-1}) = z_t(s^t) + \beta \sum_{s^{t+1}|s^t} \pi_{t+1}(s^{t+1}|s^t) \frac{u_c(s^{t+1})}{u_c(s^t)} b_{t+1}(s^t) \tag{46.5}
$$
Components of 𝑧𝑡 (𝑠𝑡 ) on the right side depend on 𝑠𝑡 , but the left side is required to depend only on 𝑠𝑡−1 .
This is what it means for one-period government debt to be risk-free.
Therefore, the right side of equation (46.5) also has to depend only on 𝑠𝑡−1 .
This requirement will give rise to measurability constraints on the Ramsey allocation to be discussed soon.
If we replace 𝑏𝑡+1 (𝑠𝑡 ) on the right side of equation (46.5) by the right side of next period’s budget constraint (associated
with a particular realization 𝑠𝑡 ) we get
After making similar repeated substitutions for all future occurrences of government indebtedness, and by invoking a
natural debt limit, we arrive at:
$$
b_t(s^{t-1}) = \sum_{j=0}^{\infty} \sum_{s^{t+j}|s^t} \beta^j \pi_{t+j}(s^{t+j}|s^t) \frac{u_c(s^{t+j})}{u_c(s^t)} z_{t+j}(s^{t+j}) \tag{46.6}
$$
Notice how the conditioning sets in equation (46.6) differ: they are $s^{t-1}$ on the left side and $s^t$ on the right side.
Now let’s
• substitute the resource constraint into the net-of-interest government surplus, and
• use the household’s first-order condition 1 − 𝜏𝑡𝑛 (𝑠𝑡 ) = 𝑢ℓ (𝑠𝑡 )/𝑢𝑐 (𝑠𝑡 ) to eliminate the labor tax rate
so that we can express the net-of-interest government surplus 𝑧𝑡 (𝑠𝑡 ) as
$$
z_t(s^t) = \left[1 - \frac{u_\ell(s^t)}{u_c(s^t)}\right] \left[c_t(s^t) + g(s_t)\right] - g(s_t) - T_t(s^t) \tag{46.7}
$$
If we substitute appropriate versions of the right side of (46.7) for 𝑧𝑡+𝑗 (𝑠𝑡+𝑗 ) into equation (46.6), we obtain a sequence
of implementability constraints on a Ramsey allocation in an AMSS economy.
Expression (46.6) at time 𝑡 = 0 and initial state 𝑠0 was also an implementability constraint on a Ramsey allocation in a
Lucas-Stokey economy:
$$
b_0(s^{-1}) = \mathbb{E}_0 \sum_{j=0}^{\infty} \beta^j \frac{u_c(s^j)}{u_c(s^0)} z_j(s^j) \tag{46.8}
$$
The expression on the right side of (46.9) in the Lucas-Stokey (1983) economy would equal the present value of a
continuation stream of government net-of-interest surpluses evaluated at what would be competitive equilibrium Arrow-
Debreu prices at date 𝑡.
In the Lucas-Stokey economy, that present value is measurable with respect to 𝑠𝑡 .
In the AMSS economy, the restriction that government debt be risk-free imposes that that same present value must be
measurable with respect to 𝑠𝑡−1 .
In a language used in the literature on incomplete markets models, it can be said that the AMSS model requires that at
each (𝑡, 𝑠𝑡 ) what would be the present value of continuation government net-of-interest surpluses in the Lucas-Stokey
model must belong to the marketable subspace of the AMSS model.
After we have substituted the resource constraint into the utility function, we can express the Ramsey problem as being
to choose an allocation that solves
$$
\max_{\{c_t(s^t),\, b_{t+1}(s^t)\}} \mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t u\left(c_t(s^t), 1 - c_t(s^t) - g(s_t)\right)
$$
and
$$
\mathbb{E}_t \sum_{j=0}^{\infty} \beta^j \frac{u_c(s^{t+j})}{u_c(s^t)} z_{t+j}(s^{t+j}) = b_t(s^{t-1}) \quad \forall\, t, s^t \tag{46.11}
$$
given 𝑏0 (𝑠−1 ).
Lagrangian Formulation
A negative multiplier 𝛾𝑡 (𝑠𝑡 ) < 0 means that if we could relax constraint (46.11), we would like to increase the beginning-of-period indebtedness for that particular realization of history 𝑠𝑡 .
That would let us reduce the beginning-of-period indebtedness for some other history (see footnote 2).
These features flow from the fact that the government cannot use state-contingent debt and therefore cannot allocate its
indebtedness efficiently across future states.
where
In (46.12), the second equality uses the law of iterated expectations and Abel’s summation formula (also called summation
by parts, see this page).
First-order conditions with respect to 𝑐𝑡 (𝑠𝑡 ) can be expressed as
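$$
\begin{aligned}
& u_c(s^t) - u_\ell(s^t) + \Psi_t(s^t) \left\{ \left[u_{cc}(s^t) - u_{c\ell}(s^t)\right] z_t(s^t) + u_c(s^t)\, z_c(s^t) \right\} \\
& \quad - \gamma_t(s^t) \left[u_{cc}(s^t) - u_{c\ell}(s^t)\right] b_t(s^{t-1}) = 0
\end{aligned} \tag{46.14}
$$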
2 From the first-order conditions for the Ramsey problem, there exists another realization $\tilde s^t$ with the same history up until the previous period, i.e., $\tilde s^{t-1} = s^{t-1}$, but where the multiplier on constraint (46.11) takes a positive value, so $\gamma_t(\tilde s^t) > 0$.
If we substitute 𝑧𝑡 (𝑠𝑡 ) from (46.7) and its derivative 𝑧𝑐 (𝑠𝑡 ) into the first-order condition (46.14), we find two differences
from the corresponding condition for the optimal allocation in a Lucas-Stokey economy with state-contingent government
debt.
1. The term involving 𝑏𝑡 (𝑠𝑡−1 ) in the first-order condition (46.14) does not appear in the corresponding expression
for the Lucas-Stokey economy.
• This term reflects the constraint that beginning-of-period government indebtedness must be the same across
all realizations of next period’s state, a constraint that would not be present if government debt could be
state-contingent.
2. The Lagrange multiplier Ψ𝑡 (𝑠𝑡 ) in the first-order condition (46.14) may change over time in response to realizations
of the state, while the multiplier Φ in the Lucas-Stokey economy is time-invariant.
We need some code from the sequential allocation implementation in an earlier lecture on optimal taxation with state-contingent debt:
class SequentialLS:
'''
Class that takes a preference object, state transition matrix,
and state contingent government expenditure plan as inputs, and
solves the sequential allocation problem described above.
It returns optimal allocations about consumption and labor supply,
as well as the multiplier on the implementability constraint Φ.
'''
def __init__(self,
pref,
π=np.full((2, 2), 0.5),
g=np.array([0.1, 0.2])):
pref = self.pref
Uc, Ul = pref.Uc, pref.Ul
n = c + g
l = 1 - n
self.cFB = res.x
self.nFB = self.cFB + g
pref = self.pref
Uc, Ucc, Ul, Ull, Ulc = pref.Uc, pref.Ucc, pref.Ul, pref.Ull, pref.Ulc
n = c + g
l = 1 - n
return diff
c = res.x
n = c + g
l = 1 - n
# Compute x
I = pref.Uc(c, l) * c - pref.Ul(c, l) * n
x = np.linalg.solve(np.eye(S) - self.β * self.π, I)
return c, n, x
pref = self.pref
Ucc, Ulc = pref.Ucc, pref.Ulc
n0 = c0 + g0
l0 = 1 - n0
return diff
c, n, x = self.time1_allocation(Φ)
return Φ, c0, n0
if sHist is None:
sHist = self.mc.simulate(T, s0)
# Time 0
Φ, cHist[0], nHist[0] = self.time0_allocation(b0, s0)
τHist[0] = self.τ(cHist[0], nHist[0])
Bhist[0] = b0
ΦHist[0] = Φ
# Time 1 onward
for t in range(1, T):
c, n, x = self.time1_allocation(Φ)
τ = self.τ(c, n)
u_c = Uc(c, 1-n)
s = sHist[t]
Eu_c = π[sHist[t-1]] @ u_c
cHist[t], nHist[t], Bhist[t], τHist[t] = c[s], n[s], x[s] / u_c[s], τ[s]
RHist[t-1] = Uc(cHist[t-1], 1-nHist[t-1]) / (β * Eu_c)
ΦHist[t] = Φ
gHist = self.g[sHist]
yHist = nHist
return [cHist, nHist, Bhist, τHist, gHist, yHist, sHist, ΦHist, RHist]
To analyze the AMSS model, we find it useful to adopt a recursive formulation using techniques like those in our lectures
on dynamic Stackelberg models and optimal taxation with state-contingent debt.
where 𝑅𝑡 (𝑠𝑡 ) is the gross risk-free rate of interest between 𝑡 and 𝑡 + 1 at history 𝑠𝑡 and 𝑇𝑡 (𝑠𝑡 ) are non-negative transfers.
Throughout this lecture, we shall set transfers to zero (for some issues about the limiting behavior of debt, this is possibly
an important difference from AMSS [Aiyagari et al., 2002], who restricted transfers to be non-negative).
In this case, the household faces a sequence of budget constraints
𝑏𝑡 (𝑠𝑡−1 ) + (1 − 𝜏𝑡 (𝑠𝑡 ))𝑛𝑡 (𝑠𝑡 ) = 𝑐𝑡 (𝑠𝑡 ) + 𝑏𝑡+1 (𝑠𝑡 )/𝑅𝑡 (𝑠𝑡 ) (46.16)
The household’s first-order conditions are 𝑢𝑐,𝑡 = 𝛽𝑅𝑡 𝔼𝑡 𝑢𝑐,𝑡+1 and (1 − 𝜏𝑡 )𝑢𝑐,𝑡 = 𝑢𝑙,𝑡 .
Using these to eliminate 𝑅𝑡 and 𝜏𝑡 from budget constraint (46.16) gives
𝑢𝑐,𝑡 (𝑠𝑡 )𝑏𝑡 (𝑠𝑡−1 ) + 𝑢𝑙,𝑡 (𝑠𝑡 )𝑛𝑡 (𝑠𝑡 ) = 𝑢𝑐,𝑡 (𝑠𝑡 )𝑐𝑡 (𝑠𝑡 ) + 𝛽(𝔼𝑡 𝑢𝑐,𝑡+1 )𝑏𝑡+1 (𝑠𝑡 ) (46.18)
Now define
$$
x_t \equiv \beta b_{t+1}(s^t) \mathbb{E}_t u_{c,t+1} = u_{c,t}(s^t) \frac{b_{t+1}(s^t)}{R_t(s^t)} \tag{46.19}
$$
for 𝑡 ≥ 1.
The right side of equation (46.21) expresses the time 𝑡 value of government debt in terms of a linear combination of
terms whose individual components are measurable with respect to 𝑠𝑡 .
The sum of terms on the right side of equation (46.21) must equal 𝑏𝑡 (𝑠𝑡−1 ).
That implies that it has to be measurable with respect to 𝑠𝑡−1 .
Equations (46.21) are the measurability constraints that the AMSS model adds to the single time 0 implementation constraint imposed in the Lucas and Stokey model.
Let Π(𝑠|𝑠− ) be a Markov transition matrix whose entries tell probabilities of moving from state 𝑠− to state 𝑠 in one
period.
Let
• 𝑉 (𝑥− , 𝑠− ) be the continuation value of a continuation Ramsey plan at 𝑥𝑡−1 = 𝑥− , 𝑠𝑡−1 = 𝑠− for 𝑡 ≥ 1
• 𝑊 (𝑏, 𝑠) be the value of the Ramsey plan at time 0 at 𝑏0 = 𝑏 and 𝑠0 = 𝑠
We distinguish between two types of planners:
For 𝑡 ≥ 1, the value function for a continuation Ramsey planner satisfies the Bellman equation
$$
\frac{u_c(s)\, x_-}{\beta \sum_{\tilde s} \Pi(\tilde s|s_-) u_c(\tilde s)} = u_c(s)(n(s) - g(s)) - u_l(s) n(s) + x(s) \tag{46.23}
$$
A continuation Ramsey planner at 𝑡 ≥ 1 takes (𝑥𝑡−1 , 𝑠𝑡−1 ) = (𝑥− , 𝑠− ) as given and before 𝑠 is realized chooses
(𝑛𝑡 (𝑠𝑡 ), 𝑥𝑡 (𝑠𝑡 )) = (𝑛(𝑠), 𝑥(𝑠)) for 𝑠 ∈ 𝑆.
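Constraint (46.23) pins down 𝑥(𝑠) once 𝑥− , 𝑠− , and a candidate labor plan 𝑛(𝑠) are given. The sketch below rearranges the constraint to compute the implied 𝑥(𝑠). It assumes CRRA utility with 𝑢𝑐 = 𝑐^(−𝜎) and 𝑢𝑙 = 𝑛^𝛾, and the transition matrix, labor plan, and function name `implied_x` are illustrative choices rather than objects from the lecture's `AMSS` class.

```python
import numpy as np

β, σ, γ = 0.9, 2.0, 2.0
Π = np.array([[0.7, 0.3],
              [0.4, 0.6]])
g = np.array([0.1, 0.2])

def implied_x(x_prev, s_prev, n):
    "Solve (46.23) for x(s), given x_-, s_- and a labor plan n(s)."
    c = n - g
    u_c = c**(-σ)
    u_l = n**γ                                # enters (46.23) as u_l(s)
    # Debt owed today: one number, the same for every realization of s
    b = x_prev / (β * (Π[s_prev] @ u_c))
    x = u_c * b - u_c * c + u_l * n
    return x, b

n = np.array([0.70, 0.75])                    # hypothetical labor plan
x, b = implied_x(1.0, 0, n)
print("b =", b, " x(s) =", x)
```

The scalar `b` is the risk-free debt implied by 𝑥− , which is why a single 𝑥− must be consistent with the budget constraint in every state 𝑠.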
The Ramsey planner takes (𝑏0 , 𝑠0 ) as given and chooses (𝑛0 , 𝑥0 ).
The value function 𝑊 (𝑏0 , 𝑠0 ) for the time 𝑡 = 0 Ramsey planner satisfies the Bellman equation
Let 𝜇(𝑠|𝑠− )Π(𝑠|𝑠− ) be a Lagrange multiplier on the constraint (46.23) for state 𝑠.
After forming an appropriate Lagrangian, we find that the continuation Ramsey planner’s first-order condition with respect
to 𝑥(𝑠) is
$$
V_x(x_-, s_-) = \sum_s \Pi(s|s_-) \mu(s|s_-) \frac{u_c(s)}{\beta \sum_{\tilde s} \Pi(\tilde s|s_-) u_c(\tilde s)} \tag{46.27}
$$
$$
V_x(x_-, s_-) = \sum_s \left( \Pi(s|s_-) \frac{u_c(s)}{\sum_{\tilde s} \Pi(\tilde s|s_-) u_c(\tilde s)} \right) V_x(x, s) \tag{46.28}
$$

$$
\check{\Pi}(s|s_-) \equiv \Pi(s|s_-) \frac{u_c(s)}{\sum_{\tilde s} \Pi(\tilde s|s_-) u_c(\tilde s)}
$$

Exercise 46.3.1
Please verify that $\check{\Pi}(s|s_-)$ is a valid Markov transition density, i.e., that its elements are all non-negative and that for each
𝑠− , the sum over 𝑠 equals unity.
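A numerical sanity check of the claim in the exercise can be sketched as follows. This is only an illustration, not a proof: the function name `twisted_transition` and the two-state numbers for `Π` and `u_c` are hypothetical, and the only assumption used is that marginal utilities are strictly positive.

```python
import numpy as np

def twisted_transition(Π, u_c):
    """Twisted matrix with entries Π(s|s_) u_c(s) / Σ_s̃ Π(s̃|s_) u_c(s̃)."""
    Π = np.asarray(Π, dtype=float)
    u_c = np.asarray(u_c, dtype=float)
    weighted = Π * u_c                      # scales column s by u_c(s)
    return weighted / weighted.sum(axis=1, keepdims=True)

# Hypothetical two-state example (illustrative numbers only)
Π = np.array([[0.8, 0.2],
              [0.3, 0.7]])
u_c = np.array([1.2, 1.5])                  # assumed strictly positive

Π_check = twisted_transition(Π, u_c)
print(Π_check)
print(Π_check.sum(axis=1))                  # each row should sum to one
```

Because each row is divided by its own weighted sum, the row sums equal one by construction, and non-negativity follows from the non-negativity of `Π` and the positivity of `u_c`.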
Along a Ramsey plan, the state variable 𝑥𝑡 = 𝑥𝑡 (𝑠𝑡 , 𝑏0 ) becomes a function of the history 𝑠𝑡 and initial government debt
𝑏0 .
In the Lucas-Stokey model, we found that
• a counterpart to 𝑉𝑥 (𝑥, 𝑠) is time-invariant and equal to the Lagrange multiplier on the Lucas-Stokey implementabil-
ity constraint
• time invariance of 𝑉𝑥 (𝑥, 𝑠) is the source of a key feature of the Lucas-Stokey model, namely, state variable
degeneracy in which 𝑥𝑡 is an exact time-invariant function of 𝑠𝑡 .
That 𝑉𝑥 (𝑥, 𝑠) varies over time according to a twisted martingale means that there is no state-variable degeneracy in the
AMSS model.
In the AMSS model, both 𝑥 and 𝑠 are needed to describe the state.
This property of the AMSS model transmits a twisted martingale component to consumption, employment, and the tax
rate.
For quasi-linear preferences, the first-order condition for maximizing (46.22) subject to (46.23) with respect to 𝑛(𝑠)
becomes
When 𝜇(𝑠|𝑠− ) = 𝛽𝑉𝑥 (𝑥(𝑠), 𝑠) converges to zero, in the limit 𝑢𝑙 (𝑠) = 1 = 𝑢𝑐 (𝑠), so that 𝜏 (𝑥(𝑠), 𝑠) = 0.
Thus, in the limit, if 𝑔𝑡 is perpetually random, the government accumulates sufficient assets to finance all expenditures
from earnings on those assets, returning any excess revenues to the household as non-negative lump-sum transfers.
46.3.7 Code
class AMSS:
# WARNING: THE CODE IS EXTREMELY SENSITIVE TO CHOICES OF PARAMETERS.
# DO NOT CHANGE THE PARAMETERS AND EXPECT IT TO WORK
self.V_solved = False
self.W_solved = False
T_v = self.T_v
self.success = False
V_new = np.zeros_like(V)
Δ = 1.0
for itr in range(maxitr):
T_v(V, V_new, σ_v_star, self.pref)
Δ = np.max(np.abs(V_new - V))
if Δ < tol_vfi:
self.V_solved = True
print('Successfully completed VFI after %i iterations'
% (itr+1))
break
if (itr + 1) % print_itr == 0:
print('Error at iteration %i : ' % (itr + 1), Δ)
V[:] = V_new[:]
self.V = V
self.σ_v_star = σ_v_star
return V, σ_v_star
self.W = W
self.σ_w_star = σ_w_star
self.W_solved = True
print('Successfully solved the time 0 problem.')
return W, σ_w_star
pref = self.pref
x_grid, g, β, S = self.x_grid, self.g, self.β, self.S
σ_v_star, σ_w_star = self.σ_v_star, self.σ_w_star
T = len(s_hist)
s_0 = s_hist[0]
# Pre-allocate
n_hist = np.zeros(T)
x_hist = np.zeros(T)
c_hist = np.zeros(T)
τ_hist = np.zeros(T)
b_hist = np.zeros(T)
g_hist = np.zeros(T)
# Compute t = 0
l_0, T_0 = σ_w_star[s_0]
c_0 = (1 - l_0) - g[s_0]
x_0 = (-pref.Uc(c_0, l_0) * (c_0 - T_0 - b_0) +
pref.Ul(c_0, l_0) * (1 - l_0))
# Compute t > 0
for t in range(T - 1):
x_ = x_hist[t]
s_ = s_hist[t]
l = np.zeros(S)
T = np.zeros(S)
for s in range(S):
x_arr = np.array([x_])
l[s] = eval_linear(x_grid, σ_v_star[s_, :, s], x_arr)
T[s] = eval_linear(x_grid, σ_v_star[s_, :, S+s], x_arr)
c = (1 - l) - g
u_c = pref.Uc(c, l)
Eu_c = Π[s_] @ u_c
c_next = c[s_hist[t+1]]
l_next = l[s_hist[t+1]]
x_hist[t+1] = x[s_hist[t+1]]
n_hist[t+1] = 1 - l_next
c_hist[t+1] = c_next
τ_hist[t+1] = 1 - pref.Ul(c_next, l_next) / pref.Uc(c_next, l_next)
b_hist[t+1] = x_ / (β * Eu_c)
g_hist[t+1] = g[s_hist[t+1]]
@njit
def obj_V(σ, state, V, pref):
# Unpack state
s_, x_ = state
l = σ[:S]
T = σ[S:]
c = (1 - l) - g
u_c = pref.Uc(c, l)
Eu_c = Π[s_] @ u_c
x = u_c * x_ / (β * Eu_c) - u_c * (c - T) + pref.Ul(c, l) * (1 - l)
V_next = np.zeros(S)
for s in range(S):
return out
@njit
def obj_W(σ, state, V, pref):
# Unpack state
s_, b_0 = state
l, T = σ
c = (1 - l) - g[s_]
x = -pref.Uc(c, l) * (c - T - b_0) + pref.Ul(c, l) * (1 - l)
return out
@njit(parallel=True)
def T_v(V, V_new, σ_star, pref):
for s_ in prange(S):
for x_i in prange(n):
state = (s_, x_nodes[x_i])
x0 = σ_star[s_, x_i]
res = optimize.nelder_mead(obj_V, x0, bounds=bounds_v,
args=(state, V, pref))
if res.success:
V_new[s_, x_i] = res.fun
σ_star[s_, x_i] = res.x
else:
print("Optimization routine failed.")
W[s_] = res.fun
σ_star[s_] = res.x
46.4 Examples
In our lecture on optimal taxation with state-contingent debt we studied how the government manages uncertainty in a
simple setting.
As in that lecture, we assume the one-period utility function
$$
u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}
$$
Note: For convenience in matching our computer code, we have expressed utility as a function of 𝑛 rather than leisure
𝑙.
We first consider a government expenditure process that we studied earlier in a lecture on optimal taxation with state-
contingent debt.
Government expenditures are known for sure in all periods except one.
• For 𝑡 < 3 or 𝑡 > 3 we assume that 𝑔𝑡 = 𝑔𝑙 = 0.1.
• At 𝑡 = 3 a war occurs with probability 0.5.
– If there is war, 𝑔3 = 𝑔ℎ = 0.2.
– If there is no war 𝑔3 = 𝑔𝑙 = 0.1.
A useful trick is to define components of the state vector as the following six (𝑡, 𝑔) pairs:
(0, 𝑔𝑙 ), (1, 𝑔𝑙 ), (2, 𝑔𝑙 ), (3, 𝑔𝑙 ), (3, 𝑔ℎ ), (𝑡 ≥ 4, 𝑔𝑙 )
We think of these 6 states as corresponding to 𝑠 = 1, 2, 3, 4, 5, 6.
The transition matrix is
$$
P = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.5 & 0.5 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
$$
The government expenditure at each state is
$$
g = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.2 \\ 0.1 \end{pmatrix}
$$
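The six-state encoding can be written down directly in NumPy. The sketch below is illustrative: the names `s_peace`, `s_war`, and `path_probability` are hypothetical, and states are indexed from 0 rather than 1. It confirms that the rows of the transition matrix sum to one and that the peace and war histories each occur with probability 0.5.

```python
import numpy as np

# Transition matrix over the six (t, g) states, in the order listed in the text
P = np.array([[0, 1, 0, 0.0, 0.0, 0],
              [0, 0, 1, 0.0, 0.0, 0],
              [0, 0, 0, 0.5, 0.5, 0],
              [0, 0, 0, 0.0, 0.0, 1],
              [0, 0, 0, 0.0, 0.0, 1],
              [0, 0, 0, 0.0, 0.0, 1]])

# Government expenditure in each state
g = np.array([0.1, 0.1, 0.1, 0.1, 0.2, 0.1])

# The two possible histories, using 0-based state indices (hypothetical names)
s_peace = np.array([0, 1, 2, 3, 5, 5, 5])   # no war at t = 3
s_war = np.array([0, 1, 2, 4, 5, 5, 5])     # war at t = 3

def path_probability(P, path):
    """Probability of a state path, conditional on its first state."""
    return np.prod(P[path[:-1], path[1:]])

print(P.sum(axis=1))                        # every row sums to one
print(g[s_peace], path_probability(P, s_peace))
print(g[s_war], path_probability(P, s_war))
```

Augmenting the state with calendar time in this way lets a time-homogeneous Markov chain represent a one-off risk at 𝑡 = 3.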
crra_util_data = [
('β', float64),
('σ', float64),
('γ', float64)
]
@jitclass(crra_util_data)
class CRRAutility:
def __init__(self,
β=0.9,
σ=2,
γ=2):
# Utility function
def U(self, c, l):
# Note: `l` should not be interpreted as labor, it is an auxiliary
# variable used to conveniently match the code and the equations
# in the lecture
σ = self.σ
if σ == 1.:
U = np.log(c)
else:
U = (c**(1 - σ) - 1) / (1 - σ)
return U - (1-l) ** (1 + self.γ) / (1 + self.γ)
The following figure plots Ramsey plans under complete and incomplete markets for both possible realizations of the
state at time 𝑡 = 3.
Ramsey outcomes and policies when the government has access to state-contingent debt are represented by black lines
and by red lines when there is only a risk-free bond.
Paths with circles are histories in which there is peace, while those with triangles denote war.
x_min = -1.5555
x_max = 17.339
x_num = 300
S = len(Π)
bounds_v = np.vstack([np.hstack([np.full(S, -10.), np.zeros(S)]),
np.hstack([np.ones(S) - g, np.full(S, 10.)])]).T
W = np.empty(len(Π))
b_0 = 1.0
σ_w_star = np.ones((S, 2))
σ_w_star[:, 0] = -0.05
%%time
===============
Solve time 1 problem
===============
plt.tight_layout()
plt.show()
How a Ramsey planner responds to war depends on the structure of the asset market.
If it is able to trade state-contingent debt, then at time 𝑡 = 2
• the government purchases an Arrow security that pays off when 𝑔3 = 𝑔ℎ
• the government sells an Arrow security that pays off when 𝑔3 = 𝑔𝑙
• the Ramsey planner designs these purchases and sales so that, regardless of whether or not there is a war
at 𝑡 = 3, the government begins period 𝑡 = 4 with the same government debt
This pattern facilitates smoothing tax rates across states.
The government without state-contingent debt cannot do this.
Instead, it must enter time 𝑡 = 3 with the same level of debt falling due whether there is peace or war at 𝑡 = 3.
The risk-free rate between time 2 and time 3 is unusually low because at time 2 consumption at time 3 is expected to be
unusually low.
A low risk-free rate of return on government debt between time 2 and time 3 allows the government to enter period 3
with lower government debt than it entered period 2.
To finance a war at time 3 it raises taxes and issues more debt to carry into perpetual peace that begins in period 4.
To service the additional debt burden, it raises taxes in all future periods.
The absence of state-contingent debt leads to an important difference in the optimal tax policy.
When the Ramsey planner has access to state-contingent debt, the optimal tax policy is history independent
• the tax rate is a function of the current level of government spending only, given the Lagrange multiplier on the
implementability constraint
History dependence occurs more dramatically in a case in which the government perpetually faces the prospect of war.
This case was studied in the final example of the lecture on optimal taxation with state-contingent debt.
There, each period the government faces a constant probability, 0.5, of war.
In addition, this example features the following preferences
log_util_data = [
('β', float64),
('ψ', float64)
]
@jitclass(log_util_data)
class LogUtility:
def __init__(self,
β=0.9,
ψ=0.69):
self.β, self.ψ = β, ψ
# Utility function
def U(self, c, l):
return np.log(c) + self.ψ * np.log(l)
With these preferences, Ramsey tax rates will vary even in the Lucas-Stokey model with state-contingent debt.
The figure below plots optimal tax policies for both the economy with state-contingent debt (circles) and the economy
with only a risk-free bond (triangles).
x_min = -3.4107
x_max = 3.709
x_num = 300
S = len(Π)
bounds_v = np.vstack([np.zeros(2 * S), np.hstack([1 - g, np.ones(S)]) ]).T
V = np.zeros((len(Π), x_num))
V[:] = -(nodes(x_grid).T + x_max) ** 2 / 14
W = np.empty(len(Π))
b_0 = 0.5
σ_w_star = 1 - np.full((S, 2), 0.55)
%%time
===============
Solve time 1 problem
===============
T = len(s_hist)
When the government experiences a prolonged period of peace, it is able to reduce government debt and set persistently
lower tax rates.
However, the government finances a long war by borrowing and raising taxes.
This results in a drift away from policies with state-contingent debt that depends on the history of shocks.
This is even more evident in the following figure that plots the evolution of the two policies over 200 periods.
This outcome reflects the presence of a force for precautionary saving that the incomplete markets structure imparts to
the Ramsey plan.
In two subsequent lectures, some ultimate consequences of that force are explored.
T = 200
s_0 = 0
mc = MarkovChain(Π)
FORTYSEVEN
In addition to what’s in Anaconda, this lecture will need the following libraries:
47.1 Overview
This lecture extends our investigations of how optimal policies for levying a flat-rate tax on labor income and issuing
government debt depend on whether there are complete markets for debt.
A Ramsey allocation and Ramsey policy in the AMSS [Aiyagari et al., 2002] model described in optimal taxation without
state-contingent debt generally differs from a Ramsey allocation and Ramsey policy in the Lucas-Stokey [Lucas and Stokey,
1983] model described in optimal taxation with state-contingent debt.
This is because the implementability restriction that a competitive equilibrium with a distorting tax imposes on allocations
in the Lucas-Stokey model is just one among a set of implementability conditions imposed in the AMSS model.
These additional constraints require that time 𝑡 components of a Ramsey allocation for the AMSS model be measurable
with respect to time 𝑡 − 1 information.
The measurability constraints imposed by the AMSS model are inherited from the restriction that only one-period risk-
free bonds can be traded.
Differences between the Ramsey allocations in the two models indicate that at least some of the implementability con-
straints of the AMSS model of optimal taxation without state-contingent debt are violated at the Ramsey allocation of a
corresponding [Lucas and Stokey, 1983] model with state-contingent debt.
Another way to say this is that differences between the Ramsey allocations of the two models indicate that some of the
measurability constraints imposed by the AMSS model are violated at the Ramsey allocation of the Lucas-Stokey
model.
Nonzero Lagrange multipliers on those constraints make the Ramsey allocation for the AMSS model differ from the
Ramsey allocation for the Lucas-Stokey model.
This lecture studies a special AMSS model in which
• The exogenous state variable 𝑠𝑡 is governed by a finite-state Markov chain.
• With an arbitrary budget-feasible initial level of government debt, the measurability constraints
– bind for many periods, but ….
– eventually, they stop binding evermore, so that …
– in the tail of the Ramsey plan, the Lagrange multipliers 𝛾𝑡 (𝑠𝑡 ) on the AMSS implementability constraints
(46.8) are zero.
• After the implementability constraints (46.8) no longer bind in the tail of the AMSS Ramsey plan
– history dependence of the AMSS state variable 𝑥𝑡 vanishes and 𝑥𝑡 becomes a time-invariant function of the
Markov state 𝑠𝑡 .
– the par value of government debt becomes constant over time so that 𝑏𝑡+1 (𝑠𝑡 ) = 𝑏̄ for 𝑡 ≥ 𝑇 for a sufficiently
large 𝑇 .
– 𝑏̄ < 0, so that the tail of the Ramsey plan instructs the government always to make a constant par value of
risk-free one-period loans to the private sector.
– the one-period gross interest rate 𝑅𝑡 (𝑠𝑡 ) on risk-free debt converges to a time-invariant function of the
Markov state 𝑠𝑡 .
• For a particular 𝑏0 < 0 (i.e., a positive level of initial government loans to the private sector), the measurability
constraints never bind.
• In this special case
– the par value 𝑏𝑡+1 (𝑠𝑡 ) = 𝑏̄ of government debt at time 𝑡 and Markov state 𝑠𝑡 is constant across time and
states, but ….
– the market value $\bar b / R_t(s_t)$ of government debt at time 𝑡 varies as a time-invariant function of the Markov state
𝑠𝑡 .
– fluctuations in the interest rate make gross earnings on government debt $\bar b / R_t(s_t)$ fully insure the
gross-of-gross-interest-payments government budget against fluctuations in government expenditures.
– the state variable 𝑥 in a recursive representation of a Ramsey plan is a time-invariant function of the Markov
state for 𝑡 ≥ 0.
• In this special case, the Ramsey allocation in the AMSS model agrees with that in a Lucas-Stokey [Lucas and
Stokey, 1983] complete markets model in which the same amount of state-contingent debt falls due in all states
tomorrow
– it is a situation in which the Ramsey planner loses nothing from not being able to trade state-contingent debt
and being restricted to exchange only risk-free debt.
• This outcome emerges only when we initialize government debt at a particular 𝑏0 < 0.
In a nutshell, the reason for this striking outcome is that at a particular level of risk-free government assets, fluctuations
in the one-period risk-free interest rate provide the government with complete insurance against stochastically varying
government expenditures.
Let’s start with some imports:
The forces driving asymptotic outcomes here are examples of dynamics present in a more general class of incomplete
markets models analyzed in [Bhandari et al., 2017] (BEGS).
BEGS provide conditions under which government debt under a Ramsey plan converges to an invariant distribution.
BEGS construct approximations to that asymptotically invariant distribution of government debt under a Ramsey plan.
BEGS also compute an approximation to a Ramsey plan’s rate of convergence to that limiting invariant distribution.
We shall use the BEGS approximating limiting distribution and their approximating rate of convergence to help interpret
outcomes here.
For a long time, the Ramsey plan puts a nontrivial martingale-like component into the par value of government debt as
part of the way that the Ramsey plan imperfectly smooths distortions from the labor tax rate across time and Markov
states.
But BEGS show that binding implementability constraints slowly push government debt in a direction designed to let the
government use fluctuations in equilibrium interest rates rather than fluctuations in par values of debt to insure against
shocks to government expenditures.
• This is a weak (but unrelenting) force that, starting from a positive initial debt level, for a long time is dominated
by the stochastic martingale-like component of debt dynamics that the Ramsey planner uses to facilitate imperfect
tax-smoothing across time and states.
• This weak force slowly drives the par value of government assets to a constant level at which the government can
completely insure against government expenditure shocks while shutting down the stochastic component of debt
dynamics.
• At that point, the tail of the par value of government debt becomes a trivial martingale: it is constant over time.
Although we are studying an AMSS [Aiyagari et al., 2002] economy, a Lucas-Stokey [Lucas and Stokey, 1983] economy
plays an important role in the reverse-engineering calculation to be described below.
For that reason, it is helpful to have key equations underlying a Ramsey plan for the Lucas-Stokey economy readily
available.
Recall first-order conditions for a Ramsey allocation for the Lucas-Stokey economy.
For 𝑡 ≥ 1, these take the form
There is one such equation for each value of the Markov state 𝑠𝑡 .
Given an initial Markov state, the time 𝑡 = 0 quantities 𝑐0 and 𝑏0 satisfy
$$
\begin{aligned}
&(1+\Phi) u_c(c, 1-c-g) + \Phi\left[c u_{cc}(c, 1-c-g) - (c+g) u_{\ell c}(c, 1-c-g)\right] \\
&\quad = (1+\Phi) u_\ell(c, 1-c-g) + \Phi\left[c u_{c\ell}(c, 1-c-g) - (c+g) u_{\ell\ell}(c, 1-c-g)\right] + \Phi (u_{cc} - u_{c,\ell}) b_0
\end{aligned} \tag{47.2}
$$
In addition, the time 𝑡 = 0 budget constraint is satisfied at 𝑐0 and initial government debt 𝑏0

$$
b_0 + g_0 = \tau_0 (c_0 + g_0) + \frac{\bar b}{R_0} \tag{47.3}
$$
where 𝑅0 is the gross interest rate for the Markov state 𝑠0 that is assumed to prevail at time 𝑡 = 0 and 𝜏0 is the time
𝑡 = 0 tax rate.
In equation (47.3), it is understood that
$$
\tau_0 = 1 - \frac{u_{l,0}}{u_{c,0}}
$$

$$
R_0^{-1} = \beta \sum_{s=1}^{S} \Pi(s|s_0) \frac{u_c(s)}{u_{c,0}}
$$
It is useful to transform some of the above equations to forms that are more natural for analyzing the case of a CRRA
utility specification that we shall use in our example economies.
As in lectures optimal taxation without state-contingent debt and optimal taxation with state-contingent debt, we assume
that the representative agent has utility function
$$
u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}
$$
and set 𝜎 = 2, 𝛾 = 2, and the discount factor 𝛽 = 0.9.
We eliminate leisure from the model and continue to assume that
$$
c_t + g_t = n_t
$$
The analysis of Lucas and Stokey prevails once we make the following replacements
𝑢ℓ (𝑐, ℓ) ∼ −𝑢𝑛 (𝑐, 𝑛)
𝑢𝑐 (𝑐, ℓ) ∼ 𝑢𝑐 (𝑐, 𝑛)
𝑢ℓ,ℓ (𝑐, ℓ) ∼ 𝑢𝑛𝑛 (𝑐, 𝑛)
𝑢𝑐,𝑐 (𝑐, ℓ) ∼ 𝑢𝑐,𝑐 (𝑐, 𝑛)
𝑢𝑐,ℓ (𝑐, ℓ) ∼ 0
With these understandings, equations (47.1) and (47.2) simplify in the case of the CRRA utility function.
They become
and
(1 + Φ)[𝑢𝑐 (𝑐0 ) + 𝑢𝑛 (𝑐0 + 𝑔0 )] + Φ[𝑐0 𝑢𝑐𝑐 (𝑐0 ) + (𝑐0 + 𝑔0 )𝑢𝑛𝑛 (𝑐0 + 𝑔0 )] − Φ𝑢𝑐𝑐 (𝑐0 )𝑏0 = 0 (47.5)
In equation (47.4), it is understood that 𝑐 and 𝑔 are each functions of the Markov state 𝑠.
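Under the same replacements, the expressions for the tax rate and the gross interest rate that accompany (47.3) can be evaluated directly for the CRRA specification, since 𝑢ℓ is replaced by −𝑢𝑛 and 𝑛 = 𝑐 + 𝑔. The following is a minimal sketch under those assumptions. The function names `tax_rate` and `gross_rate` and the consumption values are hypothetical and are not taken from the lecture's own code.

```python
import numpy as np

β, σ, γ = 0.9, 2.0, 2.0          # parameter values stated in the text

def u_c(c):
    """Marginal utility of consumption for the CRRA specification."""
    return c**(-σ)

def u_n(n):
    """Marginal utility of labor (negative: work is costly)."""
    return -n**γ

def tax_rate(c, g):
    """τ = 1 - u_ℓ/u_c with u_ℓ replaced by -u_n and n = c + g."""
    n = c + g
    return 1 + u_n(n) / u_c(c)

def gross_rate(c_today, c_next, Π_row):
    """R = u_c today / (β E[u_c tomorrow]); Π_row holds Π(s|s_0)."""
    return u_c(c_today) / (β * (Π_row @ u_c(c_next)))

# Hypothetical values, for illustration only
Π_row = np.array([0.5, 0.5])
c0, g0 = 0.9, 0.1
c_next = np.array([0.95, 0.85])

print(tax_rate(c0, g0))
print(gross_rate(c0, c_next, Π_row))
```

When consumption is the same today and in every state tomorrow, `gross_rate` returns 1/𝛽, which is a convenient check on the formula.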
The CRRA utility function is represented in the following class.
import numpy as np
class CRRAutility:
def __init__(self,
β=0.9,
σ=2,
γ=2,
π=np.full((2, 2), 0.5),
G=np.array([0.1, 0.2]),
Θ=np.ones(2),
transfers=False):
# Utility function
def U(self, c, n):
σ = self.σ
if σ == 1.:
U = np.log(c)
else:
U = (c**(1 - σ) - 1) / (1 - σ)
return U - n**(1 + self.γ) / (1 + self.γ)
𝛽 = .9
𝜎=2
𝛾=2
Here are several classes that do most of the work for us.
The code is mostly taken or adapted from the earlier lectures optimal taxation without state-contingent debt and optimal
taxation with state-contingent debt.
import numpy as np
from scipy.optimize import root
from quantecon import MarkovChain
class SequentialAllocation:
'''
Class that takes a CESutility or BGPutility object as input and returns
planner's allocation as a function of the multiplier on the
implementability constraint μ.
'''
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, G = self.S, self.Θ, self.G
Uc, Un = model.Uc, model.Un
def res(z):
c = z[:S]
n = z[S:]
return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
if not res.success:
raise Exception('Could not find first best')
self.cFB = res.x[:S]
self.nFB = res.x[S:]
def FOC(z):
c = z[:S]
n = z[S:2 * S]
Ξ = z[2 * S:]
# FOC of c
return np.hstack([Uc(c, n) - μ * (Ucc(c, n) * c + Uc(c, n)) - Ξ,
Un(c, n) - μ * (Unn(c, n) * n + Un(c, n)) \
+ Θ * Ξ, # FOC of n
Θ * n - c - G])
# Compute x
I = Uc(c, n) * c + Un(c, n) * n
x = np.linalg.solve(np.eye(S) - self.β * self.π, I)
return c, n, x, Ξ
# Find root
res = root(FOC, np.array(
[0, self.cFB[s_0], self.nFB[s_0], self.ΞFB[s_0]]))
if not res.success:
raise Exception('Could not find time 0 LS allocation.')
return res.x
if sHist is None:
sHist = self.mc.simulate(T, s_0)
# Time 0
μ, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
Bhist[0] = B_
μHist[0] = μ
# Time 1 onward
for t in range(1, T):
c, n, x, Ξ = self.time1_allocation(μ)
Τ = self.Τ(c, n)
u_c = Uc(c, n)
s = sHist[t]
Eu_c = π[sHist[t - 1]] @ u_c
cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x[s] / u_c[s], \
Τ[s]
RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (β * Eu_c)
μHist[t] = μ
import numpy as np
from scipy.optimize import fmin_slsqp
from scipy.optimize import root
from quantecon import MarkovChain
class RecursiveAllocationAMSS:
def solve_time1_bellman(self):
'''
Solve the time 1 Bellman equation for calibration model and
initial grid μgrid0
'''
model, μgrid0 = self.model, self.μgrid
π = model.π
S = len(model.π)
# Create xgrid
x = np.vstack(xgrid).T
xbar = [x.min(0).max(), x.max(0).min()]
xgrid = np.linspace(xbar[0], xbar[1], len(μgrid0))
self.xgrid = xgrid
print(diff)
Vf = Vfnew
if sHist is None:
sHist = simulate_markov(π, s_0, T)
# Time 1 onward
for t in range(1, T):
s_, x, s = sHist[t - 1], xHist[t - 1], sHist[t]
c, n, xprime, T = cf[s_, :](x), nf[s_, :](
x), xprimef[s_, :](x), Tf[s_, :](x)
Τ = self.Τ(c, n)[s]
u_c = Uc(c, n)
Eu_c = π[s_, :] @ u_c
μHist[t] = self.Vf[s](xprime[s])
class BellmanEquation:
'''
Bellman equation for the continuation of the Lucas-Stokey Problem
'''
self.z0 = {}
cf, nf, xprimef = policies0
for s_ in range(self.S):
for x in xgrid:
self.z0[x, s_] = np.hstack([cf[s_, :](x),
nf[s_, :](x),
xprimef[s_, :](x),
np.zeros(self.S)])
self.find_first_best()
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, Uc, Un, G = self.S, self.Θ, model.Uc, model.Un, self.G
def res(z):
c = z[:S]
n = z[S:]
return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
self.cFB = res.x[:S]
self.nFB = res.x[S:]
IFB = Uc(self.cFB, self.nFB) * self.cFB + \
Un(self.cFB, self.nFB) * self.nFB
self.zFB = {}
for s in range(S):
self.zFB[s] = np.hstack(
[self.cFB[s], self.nFB[s], self.π[s] @ self.xFB, 0.])
def objf(z):
c, n, xprime = z[:S], z[S:2 * S], z[2 * S:3 * S]
Vprime = np.empty(S)
for s in range(S):
Vprime[s] = Vf[s](xprime[s])
epsilon = 1e-7
x0 = np.asarray(x, dtype=float)  # np.asfarray was removed in NumPy 2.0
f0 = np.atleast_1d(objf(x0))
jac = np.zeros([len(x0), len(f0)])
dx = np.zeros(len(x0))
for i in range(len(x0)):
dx[i] = epsilon
jac[i] = (objf(x0+dx) - f0)/epsilon
dx[i] = 0.0
return jac.transpose()
def cons(z):
c, n, xprime, T = z[:S], z[S:2 * S], z[2 * S:3 * S], z[3 * S:]
u_c = Uc(c, n)
Eu_c = π[s_] @ u_c
return np.hstack([
x * u_c / Eu_c - u_c * (c - T) - Un(c, n) * n - β * xprime,
Θ * n - c - G])
if model.transfers:
bounds = [(0., 100)] * S + [(0., 100)] * S + \
[self.xbar] * S + [(0., 100.)] * S
else:
bounds = [(0., 100)] * S + [(0., 100)] * S + \
[self.xbar] * S + [(0., 0.)] * S
out, fx, _, imode, smode = fmin_slsqp(objf, self.z0[x, s_],
f_eqcons=cons, bounds=bounds,
fprime=objf_prime, full_output=True,
iprint=0, acc=self.tol, iter=self.maxiter)
if imode > 0:
raise Exception(smode)
def objf(z):
c, n, xprime = z[:-1]
def cons(z):
c, n, xprime, T = z
return np.hstack([
-Uc(c, n) * (c - B_ - T) - Un(c, n) * n - β * xprime,
if model.transfers:
bounds = [(0., 100), (0., 100), self.xbar, (0., 100.)]
else:
bounds = [(0., 100), (0., 100), self.xbar, (0., 0.)]
out, fx, _, imode, smode = fmin_slsqp(objf, self.zFB[s0], f_eqcons=cons,
bounds=bounds, full_output=True,
iprint=0)
if imode > 0:
raise Exception(smode)
import numpy as np
from scipy.interpolate import UnivariateSpline
class interpolate_wrapper:
def transpose(self):
self.F = self.F.transpose()
def __len__(self):
return len(self.F)
class interpolator_factory:
def fun_vstack(fun_list):
def fun_hstack(fun_list):
return sHist
We can reverse engineer a value 𝑏0 of initial debt due that renders the AMSS measurability constraints not binding from
time 𝑡 = 0 onward.
We accomplish this by recognizing that if the AMSS measurability constraints never bind, then the AMSS allocation
and Ramsey plan are equivalent to those for a Lucas-Stokey economy in which, for each period 𝑡 ≥ 0, the government
promises to pay the same state-contingent amount 𝑏̄ in each state tomorrow.
This insight tells us to find a 𝑏0 and other fundamentals for the Lucas-Stokey [Lucas and Stokey, 1983] model that make
the Ramsey planner want to borrow the same value 𝑏̄ next period for all states and all dates.
We accomplish this by using various equations for the Lucas-Stokey [Lucas and Stokey, 1983] model presented in optimal
taxation with state-contingent debt.
We use the following steps.
Step 1: Pick an initial Φ.
Step 2: Given that Φ, jointly solve two versions of equation (47.4) for 𝑐(𝑠), 𝑠 = 1, 2 associated with the two values for
𝑔(𝑠), 𝑠 = 1, 2.
Step 3: Solve the following equation for 𝑥⃗
Step 4: After solving for 𝑥,⃗ we can find 𝑏(𝑠𝑡 |𝑠𝑡−1 ) in Markov state 𝑠𝑡 = 𝑠 from $b(s) = \frac{x(s)}{u_c(s)}$ or the matrix equation

$$
\vec b = \frac{\vec x}{\vec u_c} \tag{47.7}
$$
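Steps 3 and 4 can be sketched in a few lines, following the way the `SequentialAllocation` listing above computes `x` (a linear solve against `np.eye(S) - β * π`) and then recovers debt as `x / u_c`. This is a sketch under stated assumptions: the function name `debt_from_consumption` is hypothetical, the consumption vector `c` is an arbitrary illustrative input rather than the solution of Step 2, and the parameter values are the defaults shown in the `CRRAutility` class.

```python
import numpy as np

β, σ, γ = 0.9, 2.0, 2.0
Π = np.full((2, 2), 0.5)          # default transition matrix in CRRAutility
g = np.array([0.1, 0.2])          # default government expenditures

def debt_from_consumption(c, g, Π, β):
    """Given state-by-state consumption, return (x, b) for Steps 3 and 4."""
    n = c + g                     # feasibility: c + g = n
    u_c = c**(-σ)
    u_n = -n**γ
    I = u_c * c + u_n * n         # per-period term, as in the listing above
    # Step 3: solve x = I + β Π x for the vector x
    x = np.linalg.solve(np.eye(len(c)) - β * Π, I)
    # Step 4: b(s) = x(s) / u_c(s)
    b = x / u_c
    return x, b

c = np.array([0.93, 0.89])        # hypothetical consumption levels
x, b = debt_from_consumption(c, g, Π, β)
print(x)
print(b)
```

In the reverse-engineering exercise, Φ would be adjusted until the two entries of `b` coincide; that common value plays the role of 𝑏̄.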
u = CRRAutility()
def min_Φ(Φ):
# Solve Φ(c)
def equations(unknowns, Φ):
c1, c2 = unknowns
# First argument of .Uc and second argument of .Un are redundant
return loss
b_bar = b[0]
b_bar
-1.0757576567504166
To complete the reverse engineering exercise by jointly determining 𝑐0 , 𝑏0 , we set up a function that returns two simul-
taneous equations.
c0, b0 = unknowns
g0 = u.G[s-1]
(0.9344994030900681, -1.0386984075517638)
Thus, we have reverse engineered an initial 𝑏0 = −1.038698407551764 that ought to render the AMSS measurability
constraints slack.
The following graph shows simulations of outcomes for both a Lucas-Stokey economy and for an AMSS economy starting
from initial government debt equal to 𝑏0 = −1.038698407551764.
These graphs report outcomes for both the Lucas-Stokey economy with complete markets and the AMSS economy with
one-period risk-free debt only.
log_example = CRRAutility()
T = 20
sHist = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 1, 1, 1, 1, 1, 1, 0])
# Output paths
sim_seq[5] = log_example.Θ[sHist] * sim_seq[1]
sim_bel[5] = log_example.Θ[sHist] * sim_bel[1]
0.04094445433232542
0.001673211146137493
0.001484674847917127
0.001313772136887205
0.0011814037130420663
0.001055965336102068
0.0009446661649946108
0.0008463807319492324
0.0007560453788611131
0.0006756001033938903
0.000604152845540819
0.0005396004518747859
0.00048207169166290613
0.00043082732064067867
0.00038481851351225495
0.000343835217593145
0.0003072436935049677
0.0002745009146233244
0.00024531773293589513
0.00021923324298642947
0.00019593539310787213
0.00017514303481690137
0.0001565593985003591
0.00013996737081815812
0.00012514457789841946
0.00011190070823325749
0.0001000702000922041
8.949728534363834e-05
8.00497532414663e-05
7.160585250570457e-05
6.405840591557493e-05
5.731160522780524e-05
5.1279701373366633e-05
4.588651722582404e-05
4.106390497232627e-05
3.6750969979187823e-05
3.289357328148953e-05
2.9443322731171715e-05
2.6356778254647064e-05
2.3595477005441402e-05
2.1124867549068547e-05
1.8914292342161616e-05
1.6935989661294087e-05
1.5165570482803087e-05
1.3581075188566359e-05
1.2162766163347089e-05
1.0893227516817513e-05
9.756678182519297e-06
8.739234428152772e-06
7.828320614508025e-06
7.012602839408298e-06
6.2821988113865695e-06
5.628118884533389e-06
5.0424276120745635e-06
4.517800318375349e-06
4.048011435284343e-06
3.6271819852132397e-06
3.250228025571809e-06
2.91255521672949e-06
2.6100632205124585e-06
2.339096372677708e-06
2.096300057053759e-06
1.8787856014677842e-06
1.6838896002658147e-06
1.5092763000475938e-06
1.352790440377663e-06
1.2125870135921682e-06
1.0869367592654264e-06
9.74329344948381e-07
8.734258726613521e-07
7.82979401245993e-07
7.019280421759928e-07
6.292786681149374e-07
5.641636376342722e-07
5.058008139530142e-07
4.5348427330256424e-07
4.0659062310367744e-07
3.6455314441729855e-07
3.2687002299145745e-07
2.930882045255147e-07
2.6280345786809706e-07
2.356529429295176e-07
2.1131168850248635e-07
1.8948851788438695e-07
1.6992245629426705e-07
1.5237965358488245e-07
1.3665054480740185e-07
1.2254729288266142e-07
1.0990157880047098e-07
9.85625196806722e-08
8.839490296454315e-08
7.927751099544721e-08
7.110169892267009e-08
6.377012234144897e-08
5.719543299951795e-08
5.129944108294742e-08
4.6011930465755267e-08
4.127024907212617e-08
3.7017901411273995e-08
3.320421136675924e-08
2.9783836454122435e-08
2.6716185879207155e-08
2.3964828404060055e-08
2.1497111441656643e-08
1.928376711102591e-08
1.7298534286134342e-08
1.5517887041510468e-08
1.3920711115842077e-08
1.2488086772484325e-08
1.120303914946054e-08
1.0050349805051883e-08
9.016372957223345e-09
8.088867717275256e-09
7.256860052028448e-09
6.5105080491085e-09
5.8409842196277625e-09
5.240371187393206e-09
4.701571286205833e-09
4.2182149401635156e-09
3.784594252430241e-09
3.3955835551064364e-09
3.0465910785331343e-09
2.7334965385949916e-09
2.4526029798499404e-09
2.2005967896788517e-09
1.9745023230252437e-09
1.7716540861495694e-09
1.5896779606666392e-09
1.4263644656786832e-09
1.279915801041798e-09
1.1484611488603225e-09
1.0305702313922867e-09
9.247647878021015e-10
8.298468061604299e-10
7.446744286173443e-10
6.682506157688693e-10
5.996765544062293e-10
5.381420956749845e-10
4.829271458904042e-10
4.3337871811544764e-10
3.8891892933983235e-10
3.4902066124392655e-10
3.1321799130111273e-10
2.8109002457092086e-10
2.5225950288597284e-10
2.263868938948011e-10
2.0316830484184638e-10
1.8233409175417047e-10
1.6363582056463494e-10
1.4685617665861112e-10
1.3179940303096093e-10
1.1828486777347211e-10
1.0615888599012755e-10
9.527490070407684e-11
The Ramsey allocations and Ramsey outcomes are identical for the Lucas-Stokey and AMSS economies.
This outcome confirms the success of our reverse-engineering exercises.
Notice how for 𝑡 ≥ 1, the tax rate is a constant - so is the par value of government debt.
However, output and labor supply are both nontrivial time-invariant functions of the Markov state.
The following graph shows the par value of government debt and the flat-rate tax on labor income for a long simulation
for our sample economy.
For the same realization of a government expenditure path, the graph reports outcomes for two economies
• the gray lines are for the Lucas-Stokey economy with complete markets
• the blue lines are for the AMSS economy with risk-free one-period debt only
For both economies, initial government debt due at time 0 is 𝑏0 = .5.
For the Lucas-Stokey complete markets economy, the government debt plotted is 𝑏𝑡+1 (𝑠𝑡+1 ).
• Notice that this is a time-invariant function of the Markov state from the beginning.
For the AMSS incomplete markets economy, the government debt plotted is 𝑏𝑡+1 (𝑠𝑡 ).
• Notice that this is a martingale-like random process that eventually seems to converge to a constant 𝑏̄ ≈ −1.07.
• Notice that the limiting value 𝑏̄ < 0 so that asymptotically the government makes a constant level of risk-free loans
to the public.
• In the simulation displayed as well as other simulations we have run, the par value of government debt converges
to about −1.07 after between 1400 and 2000 periods.
For the AMSS incomplete markets economy, the marginal tax rate on labor income 𝜏𝑡 converges to a constant
• labor supply and output each converge to time-invariant functions of the Markov state
sim_seq_long = log_sequential.simulate(0.5, 0, T)
sHist_long = sim_seq_long[-3]
sim_bel_long = log_bellman.simulate(0.5, 0, T, sHist_long)
As remarked above, after 𝑏𝑡+1 (𝑠𝑡 ) has converged to a constant, the measurability constraints in the AMSS model cease
to bind
• the associated Lagrange multipliers on those implementability constraints converge to zero
This leads us to seek an initial value of government debt 𝑏0 that renders the measurability constraints slack from time
𝑡 = 0 onward
• a tell-tale sign of this situation is that the Ramsey planner in a corresponding Lucas-Stokey economy would instruct
the government to issue a constant level of government debt 𝑏𝑡+1 (𝑠𝑡+1 ) across the two Markov states
We now describe how to find such an initial level of government debt.
It is useful to link the outcome of our reverse engineering exercise to limiting approximations constructed by BEGS
[Bhandari et al., 2017].
BEGS [Bhandari et al., 2017] used a slightly different notation to represent a generalization of the AMSS model.
We’ll introduce a version of their notation so that readers can quickly relate notation that appears in their key formulas to
the notation that we have used.
BEGS work with objects 𝐵𝑡 , ℬ𝑡 , ℛ𝑡 , 𝒳𝑡 that are related to our notation by
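$$
\begin{aligned}
\mathcal{R}_t &= \frac{u_{c,t}}{u_{c,t-1}} R_{t-1} = \frac{u_{c,t}}{\beta E_{t-1} u_{c,t}} \\
B_t &= \frac{b_{t+1}(s^t)}{R_t(s^t)} \\
b_t(s^{t-1}) &= \mathcal{R}_{t-1} B_{t-1} \\
\mathcal{B}_t &= u_{c,t} B_t = (\beta E_t u_{c,t+1}) b_{t+1}(s^t) \\
\mathcal{X}_t &= u_{c,t} [g_t - \tau_t n_t]
\end{aligned}
$$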
In terms of their notation, equation (44) of [Bhandari et al., 2017] expresses the time 𝑡 state 𝑠 government budget
constraint as
$$
\mathcal{B}(s) = \mathcal{R}_\tau(s, s_-) \mathcal{B}_- + \mathcal{X}_{\tau(s)}(s) \tag{47.8}
$$
where the dependence on 𝜏 is to remind us that these objects depend on the tax rate and 𝑠− is last period’s Markov state.
BEGS interpret random variations in the right side of (47.8) as a measure of fiscal risk composed of
• interest-rate-driven fluctuations in time 𝑡 effective payments due on the government portfolio, namely,
ℛ𝜏 (𝑠, 𝑠− )ℬ− , and
• fluctuations in the effective government deficit 𝒳𝑡
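BEGS give conditions under which the ergodic mean of ℬ𝑡 is
$$
\mathcal{B}^* = -\frac{\mathrm{cov}^\infty(\mathcal{R}, \mathcal{X})}{\mathrm{var}^\infty(\mathcal{R})} \tag{47.9}
$$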
where the superscript ∞ denotes a moment taken with respect to an ergodic distribution.
Formula (47.9) presents ℬ∗ as a regression coefficient of 𝒳𝑡 on ℛ𝑡 in the ergodic distribution.
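This regression coefficient emerges as the minimizer for a variance-minimization problem:
$$
\mathcal{B}^* = \mathrm{argmin}_{\mathcal{B}} \, \mathrm{var}^\infty(\mathcal{R} \mathcal{B} + \mathcal{X}) \tag{47.10}
$$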
The minimand in criterion (47.10) is the measure of fiscal risk associated with a given tax-debt policy that appears on
the right side of equation (47.8).
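The link between the regression coefficient and the variance-minimization problem follows because var(ℛℬ + 𝒳) = ℬ² var(ℛ) + 2ℬ cov(ℛ, 𝒳) + var(𝒳) is quadratic in ℬ. The following is a minimal numerical check of that claim. The two-state values of `R` and `X` are made-up illustrative numbers, not quantities computed from the AMSS economy, and the helper names `cov` and `fiscal_risk` are our own.

```python
import numpy as np

# Hypothetical two-state example: these values are illustrative assumptions,
# not quantities computed from the AMSS economy
pi = np.array([0.5, 0.5])      # ergodic probabilities of the two states
R = np.array([1.05, 1.17])     # effective returns, one per state
X = np.array([0.10, -0.02])    # effective deficits, one per state


def cov(x, y):
    """Population covariance under the distribution pi."""
    return (x * y) @ pi - (x @ pi) * (y @ pi)


def fiscal_risk(B):
    """Variance of R * B + X, the minimand in the fiscal-risk criterion."""
    J = R * B + X
    return cov(J, J)


# Regression-coefficient formula for the minimizer
B_star = -cov(R, X) / cov(R, R)

# Brute-force check: minimize the criterion over a grid centered on B_star
grid = np.linspace(B_star - 2, B_star + 2, 401)
B_grid = grid[np.argmin([fiscal_risk(B) for B in grid])]

print(B_star, B_grid, fiscal_risk(B_star))
```

With only two states, ℛ and 𝒳 are exactly linearly related, so the minimized variance in this sketch is zero up to rounding error.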
Expressing formula (47.9) in terms of our notation tells us that 𝑏̄ should approximately equal
$$
\hat b = \frac{\mathcal{B}^*}{\beta E_t u_{c,t+1}} \tag{47.11}
$$
BEGS also derive the following approximation to the rate of convergence to ℬ∗ from an arbitrary initial condition.
$$
\frac{E_t (\mathcal{B}_{t+1} - \mathcal{B}^*)}{(\mathcal{B}_t - \mathcal{B}^*)} \approx \frac{1}{1 + \beta^2 \mathrm{var}(\mathcal{R})} \tag{47.12}
$$
(See the equation above equation (47) in [Bhandari et al., 2017])
For our example, we describe some code that we use to compute the steady state mean and the rate of convergence to it.
The values of 𝜋(𝑠) are 0.5, 0.5.
We can then construct 𝒳(𝑠), ℛ(𝑠), 𝑢𝑐 (𝑠) for our two states using the definitions above.
We can then construct 𝛽𝐸𝑡−1 𝑢𝑐 = 𝛽 ∑𝑠 𝑢𝑐 (𝑠)𝜋(𝑠), cov(ℛ(𝑠), 𝒳(𝑠)) and var(ℛ(𝑠)) to be plugged into formula (47.11).
We also want to compute var(𝒳).
To compute the variances and covariance, we use the following standard formulas.
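Temporarily let 𝑥(𝑠), 𝑠 = 1, 2 be an arbitrary random variable.
Then we define
$$
\begin{aligned}
\mu_x &= \sum_s x(s) \pi(s) \\
\mathrm{var}(x) &= \left( \sum_s x(s)^2 \pi(s) \right) - \mu_x^2 \\
\mathrm{cov}(x, y) &= \left( \sum_s x(s) y(s) \pi(s) \right) - \mu_x \mu_y
\end{aligned}
$$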
After we compute these moments, we compute the BEGS approximation to the asymptotic mean 𝑏̂ in formula (47.11).
Here are some functions that we’ll use to compute key objects that we want
def mean(x):
    '''Returns mean for x given initial state'''
    x = np.array(x)
    return x @ u.π[s]

def variance(x):
    x = np.array(x)
    return x**2 @ u.π[s] - mean(x)**2
Now let’s form the two random variables ℛ, 𝒳 appearing in the BEGS approximating formulas
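u = CRRAutility()

s = 0
c = [0.940580824225584, 0.8943592757759343]  # Vector for c
g = u.G       # Vector for g
n = c + g     # Labor supply, since c + g = n
τ = lambda s: 1 + u.Un(1, n[s]) / u.Uc(c[s], 1)

# R_s and X_s follow the definitions of ℛ and 𝒳 given above:
# ℛ(s) = u_c(s) / (β E u_c) and 𝒳(s) = u_c(s) [g(s) - τ(s) n(s)]
Eu_c = sum(u.Uc(c[j], n[j]) * u.π[s, j] for j in range(2))
R_s = lambda s: u.Uc(c[s], n[s]) / (u.β * Eu_c)
X_s = lambda s: u.Uc(c[s], n[s]) * (g[s] - τ(s) * n[s])

R = [R_s(0), R_s(1)]
X = [X_s(0), X_s(1)]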
Now let’s compute the ingredients of the approximating limit and the approximating rate of convergence
-1.0757585378303758
bhat, b_bar
(-1.0757585378303758, -1.0757576567504166)
So we have
bhat - b_bar
-8.810799592140484e-07
These outcomes show that 𝑏̂ does a remarkably good job of approximating 𝑏̄.
Next, let’s compute the BEGS fiscal criterion that 𝑏̂ is minimizing
-9.020562075079397e-17
This is machine zero, a verification that 𝑏̂ succeeds in minimizing the nonnegative fiscal cost criterion 𝐽 (ℬ∗ ) defined in
BEGS and in equation (47.13) above.
Let’s push our luck and compute the mean reversion speed in the formula above equation (47) in [Bhandari et al., 2017].
Now let’s compute the implied mean time to get to within 0.01 of the limit
The slow rate of convergence and the implied time of getting within one percent of the limiting value do a good job of
approximating our long simulation above.
In a subsequent lecture we shall study an extension of the model in which the force highlighted in this lecture causes
government debt to converge to a nontrivial distribution instead of the single debt level discovered here.
FORTYEIGHT
In addition to what’s in Anaconda, this lecture will need the following libraries:
48.1 Overview
This lecture studies government debt in an AMSS economy [Aiyagari et al., 2002] of the type described in Optimal
Taxation without State-Contingent Debt.
We study the behavior of government debt as time 𝑡 → +∞.
We use these techniques
• simulations
• a regression coefficient from the tail of a long simulation that allows us to verify that the asymptotic mean of
government debt solves a fiscal-risk minimization problem
• an approximation to the mean of an ergodic distribution of government debt
• an approximation to the rate of convergence to an ergodic distribution of government debt
We apply tools that are applicable to more general incomplete markets economies that are presented on pages 648 - 650
in section III.D of [Bhandari et al., 2017] (BEGS).
We study an AMSS economy [Aiyagari et al., 2002] with three Markov states driving government expenditures.
• In a previous lecture, we showed that with only two Markov states, it is possible that endogenous interest rate
fluctuations eventually can support complete markets allocations and Ramsey outcomes.
• The presence of three states prevents the full spanning that eventually prevails in the two-state example featured in
Fiscal Insurance via Fluctuating Interest Rates.
The lack of full spanning means that the ergodic distribution of the par value of government debt is nontrivial, in contrast
to the situation in Fiscal Insurance via Fluctuating Interest Rates in which the ergodic distribution of the par value of
government debt is concentrated on one point.
Nevertheless, [Bhandari et al., 2017] (BEGS) establish that, for general settings that include ours, the Ramsey planner
steers government assets to a level that comes as close as possible to providing full spanning in a precise sense defined
by BEGS that we describe below.
We use code constructed in Fluctuating Interest Rates Deliver Fiscal Insurance.
Warning: Key equations in [Bhandari et al., 2017] section III.D carry typos that we correct below.
Let’s start with some imports:
As in Optimal Taxation without State-Contingent Debt and Optimal Taxation with State-Contingent Debt, we assume that
the representative agent has utility function
$$
u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}
$$
We work directly with labor supply instead of leisure.
We assume that
$$
c_t + g_t = n_t
$$

$$
\beta = .9, \quad \sigma = 2, \quad \gamma = 2
$$
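import numpy as np


class CRRAutility:

    def __init__(self,
                 β=0.9,
                 σ=2,
                 γ=2,
                 π=np.full((2, 2), 0.5),
                 G=np.array([0.1, 0.2]),
                 Θ=np.ones(2),
                 transfers=False):

        # Store the parameters as attributes
        self.β, self.σ, self.γ = β, σ, γ
        self.π, self.G, self.Θ, self.transfers = π, G, Θ, transfers

    # Utility function
    def U(self, c, n):
        σ = self.σ
        if σ == 1.:
            U = np.log(c)
        else:
            U = (c**(1 - σ) - 1) / (1 - σ)
        return U - n**(1 + self.γ) / (1 + self.γ)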
We’ll want first and second moments of some key random variables below.
The following code computes these moments; the code is recycled from Fluctuating Interest Rates Deliver Fiscal Insurance.
import numpy as np
from scipy.optimize import root
from quantecon import MarkovChain


class SequentialAllocation:

    def find_first_best(self):
        '''
        Find the first best allocation
        '''
        model = self.model
        S, Θ, G = self.S, self.Θ, self.G
        Uc, Un = model.Uc, model.Un

        def res(z):
            c = z[:S]
            n = z[S:]
            return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])

        if not res.success:
            raise Exception('Could not find first best')

        self.cFB = res.x[:S]
        self.nFB = res.x[S:]

        def FOC(z):
            c = z[:S]
            n = z[S:2 * S]
            Ξ = z[2 * S:]
            # FOC of c
            return np.hstack([Uc(c, n) - μ * (Ucc(c, n) * c + Uc(c, n)) - Ξ,
                              Un(c, n) - μ * (Unn(c, n) * n + Un(c, n)) \

        # Compute x
        I = Uc(c, n) * c + Un(c, n) * n
        x = np.linalg.solve(np.eye(S) - self.β * self.π, I)

        return c, n, x, Ξ

        # Find root
        res = root(FOC, np.array(
            [0, self.cFB[s_0], self.nFB[s_0], self.ΞFB[s_0]]))
        if not res.success:
            raise Exception('Could not find time 0 LS allocation.')

        return res.x

        if sHist is None:
            sHist = self.mc.simulate(T, s_0)

        # Time 0
        μ, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
        ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
        Bhist[0] = B_
        μHist[0] = μ

        # Time 1 onward
        for t in range(1, T):
            c, n, x, Ξ = self.time1_allocation(μ)
            Τ = self.Τ(c, n)
            u_c = Uc(c, n)
            s = sHist[t]
            Eu_c = π[sHist[t - 1]] @ u_c
            cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x[s] / u_c[s], \
                Τ[s]
            RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (β * Eu_c)
            μHist[t] = μ
import numpy as np
from scipy.optimize import fmin_slsqp
from scipy.optimize import root
from quantecon import MarkovChain


class RecursiveAllocationAMSS:

    def solve_time1_bellman(self):
        '''
        Solve the time 1 Bellman equation for calibration model and
        initial grid μgrid0
        '''
        model, μgrid0 = self.model, self.μgrid
        π = model.π
        S = len(model.π)

        # Create xgrid
        x = np.vstack(xgrid).T
        xbar = [x.min(0).max(), x.max(0).min()]
        xgrid = np.linspace(xbar[0], xbar[1], len(μgrid0))
        self.xgrid = xgrid

            print(diff)
            Vf = Vfnew

        if sHist is None:
            sHist = simulate_markov(π, s_0, T)

        # Time 1 onward
        for t in range(1, T):
            s_, x, s = sHist[t - 1], xHist[t - 1], sHist[t]
            c, n, xprime, T = cf[s_, :](x), nf[s_, :](
                x), xprimef[s_, :](x), Tf[s_, :](x)

            Τ = self.Τ(c, n)[s]
            u_c = Uc(c, n)
            Eu_c = π[s_, :] @ u_c

            μHist[t] = self.Vf[s](xprime[s])
class BellmanEquation:
    '''
    Bellman equation for the continuation of the Lucas-Stokey Problem
    '''

        self.z0 = {}
        cf, nf, xprimef = policies0

        for s_ in range(self.S):
            for x in xgrid:
                self.z0[x, s_] = np.hstack([cf[s_, :](x),
                                            nf[s_, :](x),
                                            xprimef[s_, :](x),
                                            np.zeros(self.S)])

        self.find_first_best()

    def find_first_best(self):
        '''
        Find the first best allocation
        '''
        model = self.model
        S, Θ, Uc, Un, G = self.S, self.Θ, model.Uc, model.Un, self.G

        def res(z):
            c = z[:S]
            n = z[S:]

        self.cFB = res.x[:S]
        self.nFB = res.x[S:]

        IFB = Uc(self.cFB, self.nFB) * self.cFB + \
            Un(self.cFB, self.nFB) * self.nFB

        self.zFB = {}
        for s in range(S):
            self.zFB[s] = np.hstack(
                [self.cFB[s], self.nFB[s], self.π[s] @ self.xFB, 0.])

        def objf(z):
            c, n, xprime = z[:S], z[S:2 * S], z[2 * S:3 * S]

            Vprime = np.empty(S)
            for s in range(S):
                Vprime[s] = Vf[s](xprime[s])

        def objf_prime(x):
            # Forward-difference approximation to the gradient of objf
            epsilon = 1e-7
            # np.asfarray was removed in NumPy 2.0; np.asarray is equivalent here
            x0 = np.asarray(x, dtype=float)
            f0 = np.atleast_1d(objf(x0))
            jac = np.zeros([len(x0), len(f0)])
            dx = np.zeros(len(x0))
            for i in range(len(x0)):
                dx[i] = epsilon
                jac[i] = (objf(x0+dx) - f0)/epsilon
                dx[i] = 0.0

            return jac.transpose()

        def cons(z):
            c, n, xprime, T = z[:S], z[S:2 * S], z[2 * S:3 * S], z[3 * S:]
            u_c = Uc(c, n)
            Eu_c = π[s_] @ u_c
            return np.hstack([
                x * u_c / Eu_c - u_c * (c - T) - Un(c, n) * n - β * xprime,
                Θ * n - c - G])

        if model.transfers:
            bounds = [(0., 100)] * S + [(0., 100)] * S + \
                [self.xbar] * S + [(0., 100.)] * S
        else:
            bounds = [(0., 100)] * S + [(0., 100)] * S + \
                [self.xbar] * S + [(0., 0.)] * S
        out, fx, _, imode, smode = fmin_slsqp(objf, self.z0[x, s_],
                                              f_eqcons=cons, bounds=bounds,
                                              fprime=objf_prime, full_output=True,
                                              iprint=0, acc=self.tol,
                                              iter=self.maxiter)

        if imode > 0:
            raise Exception(smode)

        def objf(z):
            c, n, xprime = z[:-1]

        def cons(z):
            c, n, xprime, T = z
            return np.hstack([
                -Uc(c, n) * (c - B_ - T) - Un(c, n) * n - β * xprime,
                (Θ * n - c - G)[s0]])

        if model.transfers:
            bounds = [(0., 100), (0., 100), self.xbar, (0., 100.)]
        else:
            bounds = [(0., 100), (0., 100), self.xbar, (0., 0.)]
        out, fx, _, imode, smode = fmin_slsqp(objf, self.zFB[s0], f_eqcons=cons,
                                              bounds=bounds, full_output=True,
                                              iprint=0)

        if imode > 0:
import numpy as np
from scipy.interpolate import UnivariateSpline


class interpolate_wrapper:

    def transpose(self):
        self.F = self.F.transpose()

    def __len__(self):
        return len(self.F)


class interpolator_factory:


def fun_vstack(fun_list):


def fun_hstack(fun_list):

    return sHist
Next, we show the code that we use to generate a very long simulation starting from initial government debt equal to −.5.
Here is a graph of a long simulation of 102000 periods.
sim_seq_long = log_sequential.simulate(0.5, 0, T)
sHist_long = sim_seq_long[-3]
sim_bel_long = log_bellman.simulate(0.5, 0, T, sHist_long)
0.038266353387659546
0.0015144378246632448
0.0013387575049931865
0.0011833202400662248
0.0010600307116134505
0.0009506620324908642
0.0008518776517238551
0.0007625857031042564
0.0006819563061669217
0.0006094002927240671
0.0005443007356805235
0.00048599500343956094
0.0004338395935928358
0.00038722730865154364
0.00034559541217657187
0.00030842870645340995
0.00027525901875688697
0.0002456631291987257
0.00021925988533911457
0.0001957069581927878
0.00017469751641633328
0.00015595697131045533
0.00013923987965580473
0.0001243270476244632
0.00011102285954170156
9.915283206080047e-05
8.856139177373994e-05
7.910986485356134e-05
7.067466534026614e-05
6.314566737649043e-05
5.6424746008715835e-05
5.04244714230645e-05
4.5066942129829506e-05
4.028274354582181e-05
3.601001917066026e-05
3.219364287744318e-05
2.878448158073308e-05
2.5738738366349524e-05
2.3017369974638877e-05
2.0585562530972924e-05
1.8412273759209572e-05
1.6470096733078585e-05
1.4734148603737835e-05
1.3182214255360329e-05
1.1794654716176686e-05
1.0553942898779478e-05
9.444436197515114e-06
8.452171093491432e-06
7.564681603048501e-06
6.770836606096674e-06
6.060699172057158e-06
5.4253876343226e-06
4.856977544060761e-06
4.348382732427091e-06
3.893276456302588e-06
3.4860028420224977e-06
3.1215110784890745e-06
2.7952840260155024e-06
2.503284254157189e-06
2.241904747465382e-06
2.0079209145832687e-06
1.7984472260187192e-06
1.610904141295967e-06
1.4429883256895489e-06
1.2926354365994746e-06
1.1580011940576491e-06
1.0374362190402233e-06
9.294651286343194e-07
8.327660623755013e-07
7.461585686381671e-07
6.68586648784756e-07
5.991017296865946e-07
5.368606502407216e-07
4.811037017633464e-07
4.3115434615062044e-07
3.8640500348483447e-07
3.4631274740294855e-07
3.1039146715661056e-07
2.782060642970499e-07
2.493665449692665e-07
2.235241683944158e-07
2.0036660045892633e-07
1.796140357496926e-07
1.610161234596195e-07
1.4434845857135709e-07
1.29410194199688e-07
1.1602140686642469e-07
1.04020962175412e-07
9.326451087350253e-08
8.362279520562034e-08
7.49799979528415e-08
6.723237810210067e-08
6.028699653820159e-08
5.4060588066801066e-08
4.847855517381241e-08
4.347405660607874e-08
3.898720608840536e-08
3.496434157686767e-08
3.135737680533792e-08
2.8123222131646282e-08
2.5223262308472423e-08
2.2622892571432625e-08
2.0291098813063476e-08
1.820008555543109e-08
1.6324938418135388e-08
1.4643330672610771e-08
1.3135245110419445e-08
1.178274355586975e-08
1.0569743803546048e-08
9.48183058751907e-09
8.506079544395937e-09
7.630907318911004e-09
6.845926774203295e-09
6.141826797773109e-09
5.510259068441386e-09
4.943738281315066e-09
4.435554859709816e-09
3.979736766026741e-09
3.5708317622814044e-09
3.2040044801866767e-09
2.874916539533131e-09
2.579680212253616e-09
2.3148068175021918e-09
2.077170148801081e-09
1.8639635474165993e-09
1.6726726276855955e-09
1.5010414936033808e-09
1.3470449992327086e-09
1.2088698423920761e-09
1.0848882197883804e-09
9.736395405805598e-10
8.738135346705384e-10
7.842367703299733e-10
7.03855297579472e-10
6.317225605423774e-10
5.669925787732949e-10
5.089032105148693e-10
4.5677367318159076e-10
4.0999013116379334e-10
3.680044560697966e-10
3.3032415368561477e-10
2.96506010211222e-10
2.6615516244191936e-10
2.389139399385772e-10
2.144649644252697e-10
1.9252092177853976e-10
1.7282471699749249e-10
1.551454449875162e-10
1.3927730577138407e-10
1.2503449048385917e-10
1.1224916676355658e-10
1.0077318342152794e-10
9.047094182757221e-11
For the short samples early in our simulated sample of 102,000 observations, fluctuations in government debt and the
tax rate conceal the weak but inexorable force that the Ramsey planner puts into both series driving them toward ergodic
marginal distributions that are far from these early observations
• early observations are more influenced by the initial value of the par value of government debt than by the ergodic
mean of the par value of government debt
• much later observations are more influenced by the ergodic mean and are independent of the par value of initial
government debt
BEGS [Bhandari et al., 2017] call 𝒳𝑡 the effective government deficit and ℬ𝑡 the effective government debt.
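Equation (44) of [Bhandari et al., 2017] expresses the time 𝑡 state 𝑠 government budget constraint as
$$
\mathcal{B}(s) = \mathcal{R}_\tau(s, s_-) \mathcal{B}_- + \mathcal{X}_{\tau(s)}(s) \tag{48.1}
$$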
where the dependence on 𝜏 is meant to remind us that these objects depend on the tax rate; 𝑠− is last period’s Markov
state.
BEGS interpret random variations in the right side of (48.1) as fiscal risks generated by
• interest-rate-driven fluctuations in time 𝑡 effective payments due on the government portfolio, namely,
ℛ𝜏 (𝑠, 𝑠− )ℬ− , and
• fluctuations in the effective government deficit 𝒳𝑡
BEGS give conditions under which the ergodic mean of ℬ𝑡 is
$$
\mathcal{B}^* = -\frac{\mathrm{cov}^\infty(\mathcal{R}, \mathcal{X})}{\mathrm{var}^\infty(\mathcal{R})} \tag{48.2}
$$
where the superscript ∞ denotes a moment taken with respect to an ergodic distribution.
This regression coefficient of 𝒳𝑡 on ℛ𝑡 emerges as the minimizer for a variance-minimization problem:
$$
\mathcal{B}^* = \mathrm{argmin}_{\mathcal{B}} \, \mathrm{var}^\infty(\mathcal{R} \mathcal{B} + \mathcal{X}) \tag{48.3}
$$
The minimand in criterion (48.3) measures fiscal risk associated with a given tax-debt policy that appears on the right
side of equation (48.1).
Expressing formula (48.2) in terms of our notation tells us that the ergodic mean of the par value 𝑏 of government debt
in the AMSS model should be approximately
$$
\hat b = \frac{\mathcal{B}^*}{\beta E(E_t u_{c,t+1})} = \frac{\mathcal{B}^*}{\beta E(u_{c,t+1})} \tag{48.4}
$$
where mathematical expectations are taken with respect to the ergodic distribution.
BEGS also derive the following approximation to the rate of convergence to ℬ∗ from an arbitrary initial condition.
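$$
\frac{E_t (\mathcal{B}_{t+1} - \mathcal{B}^*)}{(\mathcal{B}_t - \mathcal{B}^*)} \approx \frac{1}{1 + \beta^2 \mathrm{var}^\infty(\mathcal{R})} \tag{48.5}
$$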
(See the equation above equation (47) in BEGS [Bhandari et al., 2017])
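To get a feel for what the approximate rate of convergence implies, it can be converted into a number of periods: if the expected gap ℬ𝑡 − ℬ∗ shrinks by the factor `rate` each period, it falls below a given fraction of its initial size after about log(fraction)/log(rate) periods. The sketch below does this arithmetic; the value `var_R = 0.01` is a hypothetical illustration rather than a moment computed from the model, and the function names are our own.

```python
import numpy as np


def convergence_rate(beta, var_R):
    """Approximate one-period persistence of the gap B_t - B_star."""
    return 1.0 / (1.0 + beta**2 * var_R)


def periods_to_close_gap(rate, fraction=0.01):
    """Smallest integer t such that rate**t <= fraction."""
    return int(np.ceil(np.log(fraction) / np.log(rate)))


# var_R is a hypothetical value chosen only for illustration
rate = convergence_rate(beta=0.9, var_R=0.01)
n_periods = periods_to_close_gap(rate)

print(rate, n_periods)
```

Because β² var(ℛ) is small when interest-rate fluctuations are small, the rate is close to one and convergence takes hundreds of periods, which is why a very long simulation is needed.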
The remainder of this lecture is about technical material based on formulas from BEGS [Bhandari et al., 2017].
The topic involves interpreting and extending formula (48.3) for the ergodic mean ℬ∗ .
Notice how attributes of the ergodic distribution for ℬ𝑡 appear on the right side of formula (48.3) for approximating the
ergodic mean via ℬ∗ .
Therefore, formula (48.3) is not useful for estimating the mean of the ergodic distribution in advance of actually
approximating the ergodic distribution.
• we need to know the ergodic distribution to compute the right side of formula (48.3)
So the primary use of equation (48.3) is to confirm that the ergodic distribution solves a fiscal-risk minimization
problem.
As an example, notice how we used the formula for the mean of ℬ in the ergodic distribution of the special AMSS
economy in Fiscal Insurance via Fluctuating Interest Rates.
BEGS [Bhandari et al., 2017] also propose an approximation to ℬ∗ that can be computed without first approximating
the ergodic distribution.
To construct the BEGS approximation to ℬ∗ , we just follow steps set forth on pages 648 - 650 of section III.D of [Bhandari
et al., 2017]
• notation in BEGS might be confusing at first sight, so it is important to stare and digest before computing
• there are also some sign errors in the [Bhandari et al., 2017] text that we’ll want to correct here
Here is a step-by-step description of the BEGS [Bhandari et al., 2017] approximation procedure.
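$$
\mathcal{R}_\tau(s) = \frac{c_\tau(s)^{-\sigma}}{\beta \sum_{s'=1}^S c_\tau(s')^{-\sigma} \pi(s')}
$$
and
$$
\mathcal{X}_\tau(s) = (c_\tau(s) + g(s))^{1+\gamma} - c_\tau(s)^{1-\sigma}
$$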
each for 𝑠 = 1, … , 𝑆.
BEGS call ℛ𝜏 (𝑠) the effective return on risk-free debt and they call 𝒳𝜏 (𝑠) the effective government deficit.
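The construction of ℛ𝜏(𝑠) and 𝒳𝜏(𝑠) for a given flat tax rate 𝜏 can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's own code: we assume the household's labor supply condition (1 − 𝜏)𝑐^(−𝜎) = (𝑐 + 𝑔)^𝛾 pins down consumption in each state, we assume three equally likely i.i.d. states with expenditures `g = [0.1, 0.2, 0.3]`, and the function names `solve_c` and `effective_return_and_deficit` are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq

# Assumed calibration (illustrative): three equally likely i.i.d. states
beta, sigma, gamma = 0.9, 2.0, 2.0
g = np.array([0.1, 0.2, 0.3])
pi = np.full(3, 1 / 3)


def solve_c(tau):
    """Consumption by state from (1 - tau) c^(-sigma) = (c + g)^gamma."""
    def foc(c, g_s):
        return (1 - tau) * c**(-sigma) - (c + g_s)**gamma
    # foc is positive near zero and negative at c = 10, so a root is bracketed
    return np.array([brentq(foc, 1e-8, 10.0, args=(g_s,)) for g_s in g])


def effective_return_and_deficit(tau):
    """Effective return R_tau(s) and effective deficit X_tau(s) by state."""
    c = solve_c(tau)
    u_c = c**(-sigma)
    R = u_c / (beta * (u_c @ pi))
    X = (c + g)**(1 + gamma) - c**(1 - sigma)
    return R, X


R, X = effective_return_and_deficit(0.2)
print(R, X)
```

By construction the mean of ℛ𝜏 under 𝜋 equals 1/𝛽, and 𝒳𝜏(𝑠) coincides with 𝑢𝑐(𝑠)[𝑔(𝑠) − 𝜏𝑛(𝑠)] once the labor supply condition is imposed.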
Step 3: With the preceding objects in hand, for a given ℬ, we seek a 𝜏 that satisfies
$$
\mathcal{B} = -\frac{\beta}{1-\beta} E \mathcal{X}_\tau \equiv -\frac{\beta}{1-\beta} \sum_s \mathcal{X}_\tau(s) \pi(s)
$$
This equation says that at a constant discount factor 𝛽, equivalent government debt ℬ equals the present value of the mean
effective government surplus.
Another typo alert: there is a sign error in equation (46) of BEGS [Bhandari et al., 2017]: the left side should be
multiplied by −1.
• We have made this correction in the above equation.
For a given ℬ, let a 𝜏 that solves the above equation be called 𝜏 (ℬ).
We’ll use a Python root solver to find a 𝜏 that solves this equation for a given ℬ.
We’ll use this function to induce a function 𝜏 (ℬ).
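Step 4: With a Python program that computes 𝜏 (ℬ) in hand, next we write a Python function to compute the random
variable
$$
J(\mathcal{B})(s) = \mathcal{R}_{\tau(\mathcal{B})}(s) \mathcal{B} + \mathcal{X}_{\tau(\mathcal{B})}(s), \quad s = 1, \ldots, S
$$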
Step 5: Now that we have a way to compute the random variable 𝐽 (ℬ)(𝑠), 𝑠 = 1, … , 𝑆, via a composition of Python
functions, we can use the population variance function that we defined in the code above to construct a function var(𝐽 (ℬ)).
We put var(𝐽 (ℬ)) into a Python function minimizer and compute
$$
\mathcal{B}^* = \mathrm{argmin}_{\mathcal{B}} \, \mathrm{var}(J(\mathcal{B}))
$$
Step 6: Next we take the minimizer ℬ∗ and the Python functions for computing means and variances and compute
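$$
\mathrm{rate} = \frac{1}{1 + \beta^2 \mathrm{var}(\mathcal{R}_{\tau(\mathcal{B}^*)})}
$$
The outputs of this string of calculations are the two scalars
$$
(\mathcal{B}^*, \mathrm{rate})
$$
Next we compute the divisor
$$
div = \beta E u_{c,t+1}
$$
and then compute the mean of the par value of government debt in the AMSS model
$$
\hat b = \frac{\mathcal{B}^*}{div}
$$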
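Steps 3 through 6 can be strung together in a short script. The version below is a self-contained sketch under assumptions that are ours rather than the lecture's: three equally likely i.i.d. states with `g = [0.1, 0.2, 0.3]`, the labor supply condition (1 − 𝜏)𝑐^(−𝜎) = (𝑐 + 𝑔)^𝛾, a hand-picked bracket for the root solver, and the divisor 𝛽𝐸𝑢𝑐 evaluated at 𝜏(ℬ∗). The function names are hypothetical and the numbers it produces need not coincide with those reported in the lecture.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

# Assumed calibration (illustrative): three equally likely i.i.d. states
beta, sigma, gamma = 0.9, 2.0, 2.0
g = np.array([0.1, 0.2, 0.3])
pi = np.full(3, 1 / 3)


def mean(x):
    return x @ pi


def variance(x):
    return x**2 @ pi - mean(x)**2


def allocation(tau):
    """Return (u_c, R, X) by state for a flat tax rate tau."""
    def foc(c, g_s):
        # assumed labor supply condition: (1 - tau) c^(-sigma) = (c + g)^gamma
        return (1 - tau) * c**(-sigma) - (c + g_s)**gamma
    c = np.array([brentq(foc, 1e-8, 10.0, args=(g_s,)) for g_s in g])
    u_c = c**(-sigma)
    R = u_c / (beta * mean(u_c))
    X = (c + g)**(1 + gamma) - c**(1 - sigma)
    return u_c, R, X


def tau_of_B(B):
    """Step 3: tax rate at which B = -beta / (1 - beta) * E[X_tau]."""
    def resid(tau):
        return B + beta / (1 - beta) * mean(allocation(tau)[2])
    # bracket chosen by hand for this calibration and B in [-2, 2]
    return brentq(resid, 0.0, 0.5)


def var_J(B):
    """Steps 4-5: variance of J(B)(s) = R(s) B + X(s) at tau = tau(B)."""
    _, R, X = allocation(tau_of_B(B))
    return variance(R * B + X)


# Step 5: minimize var(J(B)) over B
B_star = minimize_scalar(var_J, bounds=(-2.0, 2.0), method='bounded').x

# Step 6: rate of convergence, divisor, and implied mean of the par value of debt
u_c, R, _ = allocation(tau_of_B(B_star))
rate = 1 / (1 + beta**2 * variance(R))
div = beta * mean(u_c)
b_hat = B_star / div

print(B_star, rate, b_hat)
```

Under these assumed values, `tau_of_B(1.0)` comes out at roughly 0.274. Root finding inside a scalar minimizer is slow but transparent; a production version would likely cache allocations or supply derivatives.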
In the two-Markov-state AMSS economy in Fiscal Insurance via Fluctuating Interest Rates, 𝐸𝑡 𝑢𝑐,𝑡+1 = 𝐸𝑢𝑐,𝑡+1 in the
ergodic distribution.
We have confirmed that this formula very accurately describes a constant par value of government debt that
• supports full fiscal insurance via fluctuating interest parameters, and
• is the limit of government debt as 𝑡 → +∞
In the three-Markov-state economy of this lecture, the par value of government debt fluctuates in a history-dependent
way even asymptotically.
In this economy, 𝑏̂ given by the above formula approximates the mean of the ergodic distribution of the par value of
government debt.
Note: while the approximation circumvents the chicken-and-egg problem that surrounds the much better approximation
associated with the green vertical line, it does so by enlarging the approximation error.
• 𝑏̂ is represented by the red vertical line plotted in the histogram of the last 100,000 observations of our simulation
of the par value of government debt plotted above
• the approximation is fairly accurate but not perfect
48.4.7 Execution
Step 1
Step 2
c**(-u.σ) @ u.π
u.π
s = 0
R, X = compute_R_X(τ, u, s)
mean(R, s)
1.1111111111111112
mean(X, s)
0.19134248445303795
X @ u.π
Step 3
s = 0
B = 1.0
0.2740159773695818
Step 4
min_J(B, u, s)
0.035564405653720765
Step 6
-1.199483167941158
B_hat = B_star/div
B_hat
-1.0577661126390971
0.09572916798461703
0.9931353432732218
FORTYNINE
In addition to what’s in Anaconda, this lecture will need the following libraries:
49.1 Overview
This lecture describes how Chang [Chang, 1998] analyzed competitive equilibria and a best competitive equilibrium
called a Ramsey plan.
He did this by
• characterizing a competitive equilibrium recursively in a way also employed in the dynamic Stackelberg problems
and Calvo model lectures to pose Stackelberg problems in linear economies, and then
• appropriately adapting an argument of Abreu, Pearce, and Stacchetti [Abreu et al., 1990] to describe key features
of the set of competitive equilibria
Roberto Chang [Chang, 1998] chose a model of Calvo [Calvo, 1978] as a simple structure that conveys ideas that apply
more broadly.
A textbook version of Chang’s model appears in chapter 25 of [Ljungqvist and Sargent, 2018].
This lecture and Credible Government Policies in Chang Model can be viewed as more sophisticated and complete treat-
ments of the topics discussed in Ramsey plans, time inconsistency, sustainable plans.
Both this lecture and Credible Government Policies in Chang Model make extensive use of an idea to which we apply the
nickname dynamic programming squared.
In dynamic programming squared problems there are typically two interrelated Bellman equations
• A Bellman equation for a set of agents or followers with value or value function 𝑣𝑎 .
• A Bellman equation for a principal or Ramsey planner or Stackelberg leader with value or value function 𝑣𝑝 in
which 𝑣𝑎 appears as an argument.
We encountered problems with this structure in dynamic Stackelberg problems, optimal taxation with state-contingent debt,
and other lectures.
We’ll start with some standard imports:
import numpy as np
import polytope
import matplotlib.pyplot as plt
An infinitely lived representative agent and an infinitely lived government exist at dates 𝑡 = 0, 1, ….
The objects in play are
• an initial quantity 𝑀−1 of nominal money holdings
• a sequence of inverse money growth rates ℎ⃗ and an associated sequence of nominal money holdings 𝑀⃗
• a sequence of values of money 𝑞⃗
• a sequence of real money holdings 𝑚⃗
• a sequence of total tax collections 𝑥⃗
• a sequence of per capita rates of consumption 𝑐⃗
• a sequence of per capita incomes 𝑦⃗
A benevolent government chooses sequences (𝑀⃗, ℎ⃗, 𝑥⃗) subject to a sequence of budget constraints and other constraints
imposed by competitive equilibrium.
Given tax collection and price of money sequences, a representative household chooses sequences (𝑐⃗, 𝑚⃗) of consumption
and real balances.
In competitive equilibrium, the price of money sequence 𝑞⃗ clears markets, thereby reconciling decisions of the government
and the representative household.
Chang adopts a version of a model that [Calvo, 1978] designed to exhibit time-inconsistency of a Ramsey policy in a
simple and transparent setting.
By influencing the representative household’s expectations, government actions at time 𝑡 affect components of household
utilities for periods 𝑠 before 𝑡.
When setting a path for monetary expansion rates, the government takes into account how the household’s anticipations
of the government’s future actions affect the household’s current decisions.
The ultimate source of time inconsistency is that a time 0 Ramsey planner takes these effects into account in designing a
plan of government actions for 𝑡 ≥ 0.
49.2 Decisions
A representative household faces a nonnegative value of money sequence 𝑞⃗ and sequences 𝑦⃗, 𝑥⃗ of income and total tax
collections, respectively.
Facing vector 𝑞⃗ as a price taker, the representative household chooses nonnegative sequences 𝑐⃗, 𝑀⃗ of consumption and
nominal balances, respectively, to maximize
$$
\sum_{t=0}^\infty \beta^t [u(c_t) + v(q_t M_t)] \tag{49.1}
$$
subject to
𝑞𝑡 𝑀𝑡 ≤ 𝑦𝑡 + 𝑞𝑡 𝑀𝑡−1 − 𝑐𝑡 − 𝑥𝑡 (49.2)
and
𝑞𝑡 𝑀𝑡 ≤ 𝑚̄ (49.3)
Here 𝑞𝑡 is the reciprocal of the price level at 𝑡, which we can also call the value of money.
Chang [Chang, 1998] assumes that
• 𝑢 ∶ ℝ+ → ℝ is twice continuously differentiable, strictly concave, and strictly increasing;
• 𝑣 ∶ ℝ+ → ℝ is twice continuously differentiable and strictly concave;
• $\lim_{c \to 0} u'(c) = \lim_{m \to 0} v'(m) = +\infty$;
• there is a finite level 𝑚 = 𝑚𝑓 such that 𝑣′ (𝑚𝑓 ) = 0
The household carries real balances out of a period equal to 𝑚𝑡 = 𝑞𝑡 𝑀𝑡 .
Inequality (49.2) is the household’s time 𝑡 budget constraint.
It tells how real balances 𝑞𝑡 𝑀𝑡 carried out of period 𝑡 depend on real balances 𝑞𝑡 𝑀𝑡−1 carried into period 𝑡, income,
consumption, and taxes.
Equation (49.3) imposes an exogenous upper bound 𝑚̄ on the household’s choice of real balances, where 𝑚̄ ≥ 𝑚𝑓 .
49.2.2 Government
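The government chooses a sequence of inverse money growth rates with time 𝑡 component
$h_t \equiv \frac{M_{t-1}}{M_t} \in \Pi \equiv [\underline{\pi}, \overline{\pi}]$,
where $0 < \underline{\pi} < 1 < \frac{1}{\beta} \leq \overline{\pi}$.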
The government purchases no goods.
It taxes only to acquire paper currency that it will withdraw from circulation (e.g., by burning it).
Let 𝑝𝑡 be the price level at time 𝑡, measured as time 𝑡 dollars per unit of the consumption good.
Evidently, the value of paper currency measured in units of the consumption good at time 𝑡 is
$$
q_t = \frac{1}{p_t}.
$$
The government faces a sequence of budget constraints with time 𝑡 component
$$
x_t + \frac{M_t - M_{t-1}}{p_t} = 0,
$$
where 𝑥𝑡 is the real value of revenue that the government raises from taxes and $\frac{M_t - M_{t-1}}{p_t}$ is the real
value of revenue that the government raises by printing new paper currency.
Evidently, this budget constraint can be rewritten as
−𝑥𝑡 = 𝑚𝑡 (1 − ℎ𝑡 ) (49.4)
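For readers who want the intermediate steps, the rewriting uses only the definitions $q_t = 1/p_t$, $m_t = q_t M_t$, and $h_t = M_{t-1}/M_t$ introduced above:

$$
-x_t = \frac{M_t - M_{t-1}}{p_t}
     = q_t M_t - q_t M_t \frac{M_{t-1}}{M_t}
     = m_t - m_t h_t
     = m_t (1 - h_t)
$$

So a government that shrinks the money stock ($h_t > 1$) must collect positive taxes $x_t = m_t (h_t - 1) > 0$, while money creation ($h_t < 1$) finances a subsidy.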
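The restrictions $m_t \in [0, \bar m]$ and $h_t \in \Pi = [\underline{\pi}, \overline{\pi}]$ evidently imply that
$x_t \in X \equiv [(\underline{\pi} - 1) \bar m, (\overline{\pi} - 1) \bar m]$.
We define the set $E \equiv [0, \bar m] \times \Pi \times X$, so that we require that $(m, h, x) \in E$.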
To represent the idea that taxes are distorting, Chang makes the following assumption about outcomes for per capita
output:
𝑦𝑡 = 𝑓(𝑥𝑡 ), (49.5)
where 𝑓 ∶ ℝ → ℝ satisfies 𝑓(𝑥) > 0, 𝑓(𝑥) is twice continuously differentiable, 𝑓 ″ (𝑥) < 0, 𝑓 ′ (0) = 0, and 𝑓(𝑥) = 𝑓(−𝑥)
for all 𝑥 ∈ ℝ, so that subsidies and taxes are equally distorting.
Example parameterizations
In some of our Python code deployed later in this lecture, we’ll assume the following functional forms:
$$
\begin{aligned}
u(c) &= \log(c) \\
v(m) &= \frac{1}{500} (m \bar m - 0.5 m^2)^{0.5} \\
f(x) &= 180 - (0.4 x)^2
\end{aligned}
$$
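These functional forms translate directly into Python. The sketch below is only an illustration: the value `m_bar = 30.0` is a hypothetical choice for the upper bound on real balances, and the derivative `v_prime` is our own addition, obtained by differentiating the formula for 𝑣.

```python
import numpy as np

m_bar = 30.0   # hypothetical upper bound on real balances, for illustration only


def u(c):
    """Utility from consumption."""
    return np.log(c)


def v(m, m_bar=m_bar):
    """Utility from real balances."""
    return (m * m_bar - 0.5 * m**2)**0.5 / 500


def v_prime(m, m_bar=m_bar):
    """Derivative of v with respect to m."""
    return (m_bar - m) / (1000 * (m * m_bar - 0.5 * m**2)**0.5)


def f(x):
    """Output as a function of tax collections; symmetric in x."""
    return 180 - (0.4 * x)**2


print(f(0.0), f(5.0), f(-5.0), v_prime(m_bar))
```

Under this parameterization 𝑣′(𝑚) = 0 exactly at 𝑚 = 𝑚̄, so the satiation level 𝑚𝑓 coincides with the upper bound 𝑚̄, and 𝑓 satisfies 𝑓(𝑥) = 𝑓(−𝑥) with 𝑓′(0) = 0 as the assumptions above require.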
The tax distortion function
Calvo’s and Chang’s purpose is not to model the causes of tax distortions in any detail but simply to summarize the outcome
of those distortions via the function 𝑓(𝑥).
A key part of the specification is that tax distortions are increasing in the absolute value of tax revenues.
Ramsey plan: A Ramsey plan is a competitive equilibrium that maximizes (49.1).
Within-period timing of decisions is as follows:
• first, the government chooses ℎ𝑡 and 𝑥𝑡 ;
• then given 𝑞⃗ and its expectations about future values of 𝑥 and 𝑦, the household chooses 𝑀𝑡 and therefore 𝑚𝑡
because 𝑚𝑡 = 𝑞𝑡 𝑀𝑡 ;
• then output 𝑦𝑡 = 𝑓(𝑥𝑡 ) is realized;
• finally 𝑐𝑡 = 𝑦𝑡
This within-period timing confronts the government with choices framed by how the private sector wants to respond when
the government takes time 𝑡 actions that differ from what the private sector had expected.
This consideration will be important in the lecture Credible Government Policies in Chang Model, where we study credible government policies.
The model is designed to focus on the intertemporal trade-offs between the welfare benefits of deflation and the welfare
costs associated with the high tax collections required to retire money at a rate that delivers deflation.
A benevolent time 0 government can promote utility-generating increases in real balances only by imposing sufficiently
large distorting tax collections.
To promote the welfare-increasing effects of high real balances, the government wants to induce gradual deflation.
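Given 𝑀−1 and the sequences 𝑞⃗, 𝑥⃗, 𝑦⃗, the household’s problem can be represented by the Lagrangian
$$
\mathcal{L} = \max_{\vec c, \vec M} \min_{\vec \lambda, \vec \mu} \sum_{t=0}^\infty \beta^t
\{ u(c_t) + v(M_t q_t) + \lambda_t [y_t - c_t - x_t + q_t M_{t-1} - q_t M_t]
+ \mu_t [\bar m - q_t M_t] \}
$$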
First-order conditions with respect to 𝑐𝑡 and 𝑀𝑡 , respectively, are
𝑢′ (𝑐𝑡 ) = 𝜆𝑡
𝑞𝑡 [𝑢′ (𝑐𝑡 ) − 𝑣′ (𝑀𝑡 𝑞𝑡 )] ≤ 𝛽𝑢′ (𝑐𝑡+1 )𝑞𝑡+1 , = if 𝑀𝑡 𝑞𝑡 < 𝑚̄
The last equation expresses Karush-Kuhn-Tucker complementary slackness conditions.
These insist that the inequality is an equality at an interior solution for 𝑀𝑡 .
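Using $h_t = \frac{M_{t-1}}{M_t}$ and $q_t = \frac{m_t}{M_t}$ in these first-order conditions and rearranging implies
$$
m_t [u'(c_t) - v'(m_t)] \leq \beta u'(f(x_{t+1})) m_{t+1} h_{t+1}, \quad = \text{ if } m_t < \bar m \tag{49.6}
$$
Define the following key variable
$$
\theta_{t+1} \equiv u'(f(x_{t+1})) m_{t+1} h_{t+1} \tag{49.7}
$$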
This is real money balances at time 𝑡 + 1 measured in units of marginal utility, which Chang refers to as ‘the marginal
utility of real balances’.
From the standpoint of the household at time $t$, equation (49.7) shows that $\theta_{t+1}$ intermediates the influences of $(\vec x_{t+1}, \vec m_{t+1})$ on the household’s choice of real balances $m_t$.
By “intermediates” we mean that the future paths $(\vec x_{t+1}, \vec m_{t+1})$ influence $m_t$ entirely through their effects on the scalar $\theta_{t+1}$.
The observation that the one dimensional promised marginal utility of real balances 𝜃𝑡+1 functions in this way is an
important step in constructing a class of competitive equilibria that have a recursive representation.
A closely related observation pervaded the analysis of Stackelberg plans in lecture dynamic Stackelberg problems.
Definition:
• A government policy is a pair of sequences $(\vec h, \vec x)$ where $h_t \in \Pi$ for all $t \geq 0$.
• A price system is a nonnegative value of money sequence $\vec q$.
• An allocation is a triple of nonnegative sequences $(\vec c, \vec m, \vec y)$.
It is required that time $t$ components $(m_t, x_t, h_t) \in E$.
Definition:
Given $M_{-1}$, a government policy $(\vec h, \vec x)$, price system $\vec q$, and allocation $(\vec c, \vec m, \vec y)$ are said to be a competitive equilibrium if
• $m_t = q_t M_t$ and $y_t = f(x_t)$.
• The government budget constraint is satisfied.
• Given $\vec q, \vec x, \vec y$, $(\vec c, \vec m)$ solves the household’s problem.
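$$
\begin{aligned}
h_t & = h(\theta_t) \\
m_t & = m(\theta_t) \\
x_t & = x(\theta_t) \\
\theta_{t+1} & = \Psi(\theta_t)
\end{aligned}
\tag{49.8}
$$
starting from $\theta_0$.
The range and domain of $\Psi(\cdot)$ are both $\Omega$.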
3. A recursive representation of a Ramsey plan
• A recursive representation of a Ramsey plan is a recursive competitive equilibrium $\theta_0, (h, m, x, \Psi)$ that, among all recursive competitive equilibria, maximizes $\sum_{t=0}^\infty \beta^t [u(c_t) + v(q_t M_t)]$.
• The Ramsey planner chooses $\theta_0, (h, m, x, \Psi)$ from among the set of recursive competitive equilibria at time 0.
• Iterations on the function $\Psi$ determine subsequent $\theta_t$’s that summarize the aspects of the continuation competitive equilibria that influence the household’s decisions.
• At time 0, the Ramsey planner commits to this implied sequence $\{\theta_t\}_{t=0}^\infty$ and therefore to an associated sequence of continuation competitive equilibria.
4. A characterization of time-inconsistency of a Ramsey plan
• Imagine that after a ‘revolution’ at time 𝑡 ≥ 1, a new Ramsey planner is given the opportunity to ignore history
and solve a brand new Ramsey plan.
• This new planner would want to reset the 𝜃𝑡 associated with the original Ramsey plan to 𝜃0 .
• The incentive to reinitialize 𝜃𝑡 associated with this revolution experiment indicates the time-inconsistency of
the Ramsey plan.
• By resetting 𝜃 to 𝜃0 , the new planner avoids the costs at time 𝑡 that the original Ramsey planner must pay
to reap the beneficial effects that the original Ramsey plan for 𝑠 ≥ 𝑡 had achieved via its influence on the
household’s decisions for 𝑠 = 0, … , 𝑡 − 1.
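The recursive representation (49.8) can be simulated by iterating on $\Psi$ from an initial $\theta_0$. The sketch below only illustrates the mechanics; the policy functions `h_pol`, `m_pol`, `x_pol`, and `Psi_pol` are hypothetical placeholders, not solutions of the Chang model.

```python
import numpy as np

def simulate_recursive_plan(theta0, h, m, x, Psi, T=10):
    """Iterate theta_{t+1} = Psi(theta_t) and record (h_t, m_t, x_t) along the way."""
    theta = np.empty(T + 1)
    path = {"h": np.empty(T), "m": np.empty(T), "x": np.empty(T)}
    theta[0] = theta0
    for t in range(T):
        path["h"][t] = h(theta[t])
        path["m"][t] = m(theta[t])
        path["x"][t] = x(theta[t])
        theta[t + 1] = Psi(theta[t])
    path["theta"] = theta
    return path

# Hypothetical placeholder policy functions (assumptions for illustration);
# x_pol is built so that the government budget constraint x = m(h - 1) holds.
h_pol = lambda th: 1.0 + 0.1 * th
m_pol = lambda th: 10.0 + th
x_pol = lambda th: m_pol(th) * (h_pol(th) - 1.0)
Psi_pol = lambda th: 0.5 * th + 0.05   # contraction with fixed point 0.1

path = simulate_recursive_plan(0.2, h_pol, m_pol, x_pol, Psi_pol, T=30)
```

Resetting `theta0` at a later date, as in the ‘revolution’ experiment, would amount to calling the function again with the original initial value instead of continuing along `path["theta"]`.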
49.5 Analysis
A competitive equilibrium is a triple of sequences $(\vec m, \vec x, \vec h) \in E^\infty$ that satisfies (49.2), (49.3), and (49.6).
Chang works with a set of competitive equilibria defined as follows.
Definition: $CE = \{(\vec m, \vec x, \vec h) \in E^\infty \text{ such that (49.2), (49.3), and (49.6) are satisfied}\}$.
𝐶𝐸 is not empty because there exists a competitive equilibrium with ℎ𝑡 = 1 for all 𝑡 ≥ 1, namely, an equilibrium with
a constant money supply and constant price level.
Chang establishes that 𝐶𝐸 is also compact.
Chang makes the following key observation that combines ideas of Abreu, Pearce, and Stacchetti [Abreu et al., 1990]
with insights of Kydland and Prescott [Kydland and Prescott, 1980].
Proposition: The continuation of a competitive equilibrium is a competitive equilibrium.
That is, $(\vec m, \vec x, \vec h) \in CE$ implies that $(\vec m_t, \vec x_t, \vec h_t) \in CE$ for all $t \geq 1$.
(Lecture dynamic Stackelberg problems also used a version of this insight)
We can now state that a Ramsey problem is to
$$
\max_{(\vec m, \vec x, \vec h) \in E^\infty} \sum_{t=0}^\infty \beta^t [u(c_t) + v(m_t)]
$$
$$
\Omega = \{\theta \in \mathbb{R} \text{ such that } \theta = u'(f(x_0))(m_0 + x_0) \text{ for some } (\vec m, \vec x, \vec h) \in CE\}
$$
Equation (49.6) inherits from the household’s Euler equation for money holdings the property that the value of $m_0$ consistent with the representative household’s choices depends on $(\vec h_1, \vec m_1)$.
This dependence is captured in the definition above by making $\Omega$ be the set of first period values of $\theta_0$ satisfying $\theta_0 = u'(f(x_0))(m_0 + x_0)$ for first period component $(m_0, h_0)$ of competitive equilibrium sequences $(\vec m, \vec x, \vec h)$.
Chang establishes that Ω is a nonempty and compact subset of ℝ+ .
Next Chang advances:
Definition: $\Gamma(\theta) = \{(\vec m, \vec x, \vec h) \in CE \mid \theta = u'(f(x_0))(m_0 + x_0)\}$.
Thus, $\Gamma(\theta)$ is the set of competitive equilibrium sequences $(\vec m, \vec x, \vec h)$ whose first period components $(m_0, h_0)$ deliver the prescribed value $\theta$ for first period marginal utility.
If we knew the sets Ω, Γ(𝜃), we could use the following two-step procedure to find at least the value of the Ramsey
outcome to the representative household
1. Find the indirect value function $w(\theta)$ defined as
$$
w(\theta) = \max_{(\vec m, \vec x, \vec h) \in \Gamma(\theta)} \sum_{t=0}^\infty \beta^t [u(f(x_t)) + v(m_t)]
$$
and
𝜃 = 𝑢′ (𝑓(𝑥))(𝑚 + 𝑥) (49.11)
and
−𝑥 = 𝑚(1 − ℎ) (49.12)
and
Before we use this proposition to recover a recursive representation of the Ramsey plan, note that the proposition relies
on knowing the set Ω.
To find Ω, Chang uses the insights of Kydland and Prescott [Kydland and Prescott, 1980] together with a method based on
the Abreu, Pearce, and Stacchetti [Abreu et al., 1990] iteration to convergence on an operator 𝐵 that maps continuation
values into values.
We want an operator that maps a continuation 𝜃 into a current 𝜃.
Chang lets 𝑄 be a nonempty, bounded subset of ℝ.
Elements of the set 𝑄 are taken to be candidate values for continuation marginal utilities.
Chang defines an operator
Let $\vec h^t = (h_0, h_1, \ldots, h_t)$ denote a history of inverse money creation rates with time $t$ component $h_t \in \Pi$.
A government strategy $\sigma = \{\sigma_t\}_{t=0}^\infty$ is a $\sigma_0 \in \Pi$ and for $t \geq 1$ a sequence of functions $\sigma_t : \Pi^{t-1} \to \Pi$.
Chang restricts the government’s choice of strategies to the following space:
$$
CE_\pi = \{\vec h \in \Pi^\infty : \text{there is some } (\vec m, \vec x) \text{ such that } (\vec m, \vec x, \vec h) \in CE\}
$$
In words, 𝐶𝐸𝜋 is the set of money growth sequences consistent with the existence of competitive equilibria.
Chang observes that 𝐶𝐸𝜋 is nonempty and compact.
Definition: $\sigma$ is said to be admissible if for all $t \geq 1$ and after any history $\vec h^{t-1}$, the continuation $\vec h_t$ implied by $\sigma$ belongs to $CE_\pi$.
Admissibility of 𝜎 means that anticipated policy choices associated with 𝜎 are consistent with the existence of competitive
equilibria after each possible subsequent history.
After any history $\vec h^{t-1}$, admissibility restricts the government’s choice in period $t$ to the set
In words, $CE_\pi^0$ is the set of all first period money growth rates $h = h_0$, each of which is consistent with the existence of a sequence of money growth rates $\vec h$ starting from $h_0$ in the initial period and for which a competitive equilibrium exists.
Remark:
$CE_\pi^0 = \{h \in \Pi : \text{there is } (m, \theta') \in [0, \bar m] \times \Omega \text{ such that } m[u'(f((h - 1)m)) - v'(m)] \leq \beta \theta' \text{ with equality if } m < \bar m\}$.
At this point it is convenient to introduce another operator that can be used to compute a Ramsey plan.
For computing a Ramsey plan, this operator is wasteful because it works with a state vector that is bigger than necessary.
We introduce this operator because it helps to prepare the way for Chang’s operator called $\tilde D(Z)$ that we shall describe in lecture credible government policies.
It is also useful because a fixed point of the operator to be defined here provides a good guess for an initial set from which to initiate iterations on Chang’s set-to-set operator $\tilde D(Z)$ to be described in lecture credible government policies.
Let 𝑆 be the set of all pairs (𝑤, 𝜃) of competitive equilibrium values and associated initial marginal utilities.
Let 𝑊 be a bounded set of values in ℝ.
Let 𝑍 be a nonempty subset of 𝑊 × Ω.
Think of using pairs (𝑤′ , 𝜃′ ) drawn from 𝑍 as candidate continuation value, 𝜃 pairs.
Define the operator
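$$
\begin{aligned}
D(Z) = \{ (w, \theta) : \ & \exists h \in CE_\pi^0 \text{ and } (m(h), x(h), w'(h), \theta'(h)) \in [0, \bar m] \times X \times Z \\
& \text{such that} \\
& \theta = u'(f(x(h)))(m(h) + x(h)) \\
& x(h) = m(h)(h - 1) \\
& m(h)(u'(f(x(h))) - v'(m(h))) \leq \beta \theta'(h) \quad (\text{with equality if } m(h) < \bar m) \}
\end{aligned}
$$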
We noted that the set 𝑆 can be found by iterating to convergence on 𝐷, provided that we start with a sufficiently large
initial set 𝑆0 .
Our implementation builds on ideas in this notebook.
To find 𝑆 we use a numerical algorithm called the outer hyperplane approximation algorithm.
A key feature of this algorithm is that we discretize the action space, i.e., we create a grid of possible values for 𝑚 and ℎ
(note that 𝑥 is implied by 𝑚 and ℎ). This discretization simplifies computation of 𝑆 ̃ by allowing us to find it by solving a
sequence of linear programs.
The outer hyperplane approximation algorithm proceeds as follows:
1. Initialize subgradients, 𝐻, and hyperplane levels, 𝐶0 .
2. Given a set of subgradients, 𝐻, and hyperplane levels, 𝐶𝑡 , for each subgradient ℎ𝑖 ∈ 𝐻:
• Solve a linear program (described below) for each action in the action space.
• Find the maximum and update the corresponding hyperplane level, 𝐶𝑖,𝑡+1 .
3. If |𝐶𝑡+1 − 𝐶𝑡 | > 𝜖, return to 2.
Step 1 simply creates a large initial set 𝑆0 .
Given some set 𝑆𝑡 , Step 2 then constructs the set 𝑆𝑡+1 = 𝐷(𝑆𝑡 ). The linear program in Step 2 is designed to construct
a set 𝑆𝑡+1 that is as large as possible while satisfying the constraints of the 𝐷(𝑆) operator.
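Step 1 of the algorithm above can be sketched as follows. This is only one possible initialization, assumed here for illustration: subgradients are placed evenly on the unit circle and each hyperplane level is set to the support function of a bounding box for $(w, \theta)$. The function name and the box bounds are hypothetical.

```python
import numpy as np

def init_outer_approximation(w_bounds, theta_bounds, n_grad=8):
    """Return unit subgradients H and levels C0 of a polygon {z : H z <= C0}
    that contains the box w_bounds x theta_bounds."""
    angles = np.linspace(0, 2 * np.pi, n_grad, endpoint=False)
    H = np.column_stack((np.cos(angles), np.sin(angles)))
    corners = np.array([(w, th) for w in w_bounds for th in theta_bounds])
    # Level i is the largest value of h_i . z over the corners of the box
    C0 = (corners @ H.T).max(axis=0)
    return H, C0

# Illustrative (assumed) bounds for w and theta
H, C0 = init_outer_approximation((0.0, 30.0), (0.0395, 0.2193))
```

Each later pass of Step 2 would replace the levels `C0` with updated levels while keeping `H` fixed, so the approximating set can only shrink toward the fixed point.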
To do this, for each subgradient $h_i$, and for each point in the action space $(m_j, h_j)$, we solve the following problem:
$$
\max_{[w', \theta']} h_i \cdot (w, \theta)
$$
subject to
$$
\begin{aligned}
H \cdot (w', \theta') & \leq C_t \\
\theta & = u'(f(x_j))(m_j + x_j) \\
x_j & = m_j (h_j - 1)
\end{aligned}
$$
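A single linear program of this kind can be sketched with `scipy.optimize.linprog`. The sketch makes several assumptions that are not stated in the problem above: the current value is taken to be $w = u(f(x_j)) + v(m_j) + \beta w'$, the Euler equation $m_j (u'(f(x_j)) - v'(m_j)) = \beta \theta'$ is imposed as an equality (the case $m_j < \bar m$ only), and the parameter values and the helper name `hyperplane_level` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

beta, m_bar = 0.8, 30.0   # assumed parameter values

u = np.log
u_p = lambda c: 1.0 / c
v = lambda m: np.sqrt(m * m_bar - 0.5 * m**2) / 500
v_p = lambda m: (m_bar - m) / (1000 * np.sqrt(m * m_bar - 0.5 * m**2))
f = lambda x: 180 - (0.4 * x) ** 2

def hyperplane_level(h_i, m_j, h_j, H, C):
    """Maximize h_i . (w, theta) over (w', theta') for one action (m_j, h_j), m_j < m_bar."""
    x_j = m_j * (h_j - 1)
    c_j = f(x_j)
    theta = u_p(c_j) * (m_j + x_j)
    euler = m_j * (u_p(c_j) - v_p(m_j))
    # linprog minimizes, so negate the part of the objective that depends on w'
    obj = [-h_i[0] * beta, 0.0]
    res = linprog(obj, A_ub=H, b_ub=C,
                  A_eq=[[0.0, beta]], b_eq=[euler],
                  bounds=[(None, None), (None, None)])
    if res.status != 0:
        return None, res
    w = u(c_j) + v(m_j) + beta * res.x[0]
    return h_i[0] * w + h_i[1] * theta, res

# Candidate set: the box [0, 30] x [0, 0.3], written as H z <= C
H = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
C = np.array([30.0, 0.3, 0.0, 0.0])
level, res = hyperplane_level(H[0], m_j=10.0, h_j=1.0, H=H, C=C)
```

Only $w'$ enters the objective because $\theta$ is pinned down by the action $(m_j, h_j)$; the continuation $\theta'$ is determined by the Euler equation and must merely lie inside the candidate set.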
$$
u(c) = \log(c)
$$
$$
v(m) = \frac{1}{500} (m \bar m - 0.5 m^2)^{0.5}
$$
"""
Provides a class called ChangModel to solve different
parameterizations of the Chang (1998) model.
"""
import numpy as np
import quantecon as qe
import time
class ChangModel:
"""
Class to solve for the competitive and sustainable sets in the Chang (1998)
model, for different parameterizations.
"""
w_space = np.array([min(w_vec[~np.isinf(w_vec)]),
max(w_vec[~np.isinf(w_vec)])])
p_space = np.array([0, max(p_vec[~np.isinf(w_vec)])])
self.p_space = p_space
# Points on circle
H = np.zeros((N, 2))
for i in range(N):
    x = degrees[i]
    H[i, 0] = np.cos(x)
    H[i, 1] = np.sin(x)
return C, H, Z
def solve_worst_spe(self):
"""
Method to solve for BR(Z). See p.449 of Chang (1998)
"""
# Pre-compute constraints
aineq_mbar = np.vstack((self.H, np.array([0, -self.β])))
bineq_mbar = np.vstack((self.c0_s, 0))
aineq = self.H
bineq = self.c0_s
aeq = [[0, -self.β]]
for j in range(self.N_a):
# Only try if consumption is possible
if self.f_vec[j] > 0:
# If m = mbar, use inequality constraint
if self.A[j, 1] == self.mbar:
bineq_mbar[-1] = self.euler_vec[j]
res = linprog(c, A_ub=aineq_mbar, b_ub=bineq_mbar,
bounds=(self.w_bnds_s, self.p_bnds_s))
else:
beq = self.euler_vec[j]
res = linprog(c, A_ub=aineq, b_ub=bineq, A_eq=aeq, b_eq=beq,
bounds=(self.w_bnds_s, self.p_bnds_s))
# Max over h and min over other variables (see Chang (1998) p.449)
self.br_z = np.nanmax(np.nanmin(p_vec.reshape(self.n_m, self.n_h), 0))
def solve_subgradient(self):
"""
Method to solve for E(Z). See p.449 of Chang (1998)
"""
# Pre-compute constraints
aineq_C_mbar = np.vstack((self.H, np.array([0, -self.β])))
bineq_C_mbar = np.vstack((self.c0_c, 0))
aineq_C = self.H
bineq_C = self.c0_c
aeq_C = [[0, -self.β]]
for j in range(self.N_a):
# Only try if consumption is possible
if self.f_vec[j] > 0:
# COMPETITIVE EQUILIBRIA
# If m = mbar, use inequality constraint
if self.A[j, 1] == self.mbar:
bineq_C_mbar[-1] = self.euler_vec[j]
res = linprog(c, A_ub=aineq_C_mbar, b_ub=bineq_C_mbar,
bounds=(self.w_bnds_c, self.p_bnds_c))
# If m < mbar, use equality constraint
else:
beq_C = self.euler_vec[j]
res = linprog(c, A_ub=aineq_C, b_ub=bineq_C, A_eq = aeq_C,
b_eq = beq_C, bounds=(self.w_bnds_c, \
self.p_bnds_c))
if res.status == 0:
c_a1a2_c[j] = self.H[i, 0] * (self.u_vec[j] \
+ self.β * res.x[0]) + self.H[i, 1] * self.Θ_vec[j]
t_a1a2_c[j] = res.x
# SUSTAINABLE EQUILIBRIA
# If m = mbar, use inequality constraint
if self.A[j, 1] == self.mbar:
bineq_S_mbar[-2] = self.euler_vec[j]
bineq_S_mbar[-1] = self.u_vec[j] - self.br_z
res = linprog(c, A_ub=aineq_S_mbar, b_ub=bineq_S_mbar,
bounds=(self.w_bnds_s, self.p_bnds_s))
# If m < mbar, use equality constraint
else:
bineq_S[-1] = self.u_vec[j] - self.br_z
beq_S = self.euler_vec[j]
res = linprog(c, A_ub=aineq_S, b_ub=bineq_S, A_eq = aeq_S,
b_eq = beq_S, bounds=(self.w_bnds_s, \
self.p_bnds_s))
if res.status == 0:
c_a1a2_s[j] = self.H[i, 0] * (self.u_vec[j] \
+ self.β*res.x[0]) + self.H[i, 1] * self.Θ_vec[j]
t_a1a2_s[j] = res.x
for i in range(self.N_g):
self.c1_c[i] = np.dot(self.z1_c[:, i], self.H[i, :])
self.c1_s[i] = np.dot(self.z1_s[:, i], self.H[i, :])
t = time.time()
diff = tol + 1
iters = 0
# Save iteration
self.c_dic_c[iters], self.c_dic_s[iters] = np.copy(self.c1_c), \
np.copy(self.c1_s)
self.iters = iters
elapsed = time.time() - t
print('Convergence achieved after {} iterations and {} \
seconds'.format(iters, round(elapsed, 2)))
def p_fun2(x):
scale = -1 + 2*(x[1] - θ_min)/(θ_max - θ_min)
p_fun = - (u(x[0],mbar) \
+ self.β * np.dot(cheb.chebvander(scale, order - 1), c))
return p_fun
# Bellman Iterations
diff = 1
iters = 1
self.θ_grid = s
self.p_iter = p_iter1
self.Φ = Φ
self.c = c
print('Convergence achieved after {} iterations'.format(iters))
# Check residuals
θ_grid_fine = np.linspace(θ_min, θ_max, 100)
resid_grid = np.zeros(100)
p_grid = np.zeros(100)
θ_prime_grid = np.zeros(100)
m_grid = np.zeros(100)
h_grid = np.zeros(100)
for i in range(100):
θ = θ_grid_fine[i]
res = minimize(p_fun,
lb1 + (ub1-lb1) / 2,
method='SLSQP',
bounds=bnds1,
constraints=cons1,
tol=1e-10)
if res.success == True:
p = -p_fun(res.x)
p_grid[i] = p
θ_prime_grid[i] = res.x[2]
h_grid[i] = res.x[0]
m_grid[i] = res.x[1]
res = minimize(p_fun2,
lb2 + (ub2-lb2)/2,
method='SLSQP',
bounds=bnds2,
constraints=cons2,
tol=1e-10)
if -p_fun2(res.x) > p and res.success == True:
p = -p_fun2(res.x)
p_grid[i] = p
θ_prime_grid[i] = res.x[1]
h_grid[i] = res.x[0]
m_grid[i] = self.mbar
self.resid_grid = resid_grid
self.θ_grid_fine = θ_grid_fine
self.θ_prime_grid = θ_prime_grid
self.m_grid = m_grid
self.h_grid = h_grid
self.p_grid = p_grid
self.x_grid = m_grid * (h_grid - 1)
# Simulate
θ_series = np.zeros(31)
m_series = np.zeros(30)
h_series = np.zeros(30)
# Find initial θ
def ValFun(x):
scale = -1 + 2*(x - θ_min)/(θ_max - θ_min)
p_fun = np.dot(cheb.chebvander(scale, order - 1), c)
return -p_fun
res = minimize(ValFun,
(θ_min + θ_max)/2,
bounds=[(θ_min, θ_max)])
θ_series[0] = res.x[0]
# Simulate
for i in range(30):
θ = θ_series[i]
res = minimize(p_fun,
lb1 + (ub1-lb1)/2,
method='SLSQP',
bounds=bnds1,
constraints=cons1,
tol=1e-10)
if res.success == True:
p = -p_fun(res.x)
h_series[i] = res.x[0]
m_series[i] = res.x[1]
θ_series[i+1] = res.x[2]
res2 = minimize(p_fun2,
lb2 + (ub2-lb2)/2,
method='SLSQP',
bounds=bnds2,
constraints=cons2,
tol=1e-10)
if -p_fun2(res2.x) > p and res2.success == True:
h_series[i] = res2.x[0]
m_series[i] = self.mbar
θ_series[i+1] = res2.x[1]
self.θ_series = θ_series
self.m_series = m_series
self.h_series = h_series
self.x_series = m_series * (h_series - 1)
[1.9168]
[0.66782]
[0.49235]
[0.32412]
[0.19022]
[0.10863]
[0.05817]
[0.0262]
[0.01836]
[0.01415]
[0.00297]
[0.00089]
[0.00027]
[0.00008]
[0.00002]
[0.00001]
Convergence achieved after 16 iterations and 38.65 seconds
def plot_competitive(ChangModel):
"""
Method that only plots competitive equilibrium set
"""
poly_C = polytope.Polytope(ChangModel.H, ChangModel.c1_c)
ext_C = polytope.extreme(poly_C)
ax.set_xlabel('w', fontsize=16)
ax.set_ylabel(r"$\theta$", fontsize=18)
plt.tight_layout()
plt.show()
plot_competitive(ch1)
[0.06369]
[0.02476]
[0.02153]
[0.01915]
[0.01795]
[0.01642]
[0.01507]
[0.01284]
[0.01106]
[0.00694]
[0.0085]
[0.00781]
[0.00433]
[0.00492]
[0.00303]
[0.00182]
[0.00638]
[0.00116]
[0.00093]
[0.00075]
[0.0006]
[0.00494]
[0.00038]
[0.00121]
[0.00024]
[0.0002]
[0.00016]
[0.00013]
[0.0001]
[0.00008]
[0.00006]
[0.00005]
[0.00004]
[0.00003]
[0.00003]
[0.00002]
[0.00002]
[0.00001]
[0.00001]
[0.00001]
Convergence achieved after 40 iterations and 114.68 seconds
plot_competitive(ch2)
In this section we solve the Bellman equation confronting a continuation Ramsey planner.
The construction of a Ramsey plan is decomposed into two subproblems in Ramsey plans, time inconsistency, sustainable plans and dynamic Stackelberg problems.
• Subproblem 1 is faced by a sequence of continuation Ramsey planners at 𝑡 ≥ 1.
• Subproblem 2 is faced by a Ramsey planner at 𝑡 = 0.
The problem is:
subject to:
$$
\begin{aligned}
\theta & = u'(f(x))(m + x) \\
x & = m(h - 1) \\
(m, x, h) & \in E \\
\theta' & \in \Omega
\end{aligned}
$$
To solve this Bellman equation, we must know the set Ω.
We have solved the Bellman equation for the two sets of parameter values for which we computed the equilibrium value
sets above.
Hence for these parameter configurations, we know the bounds of Ω.
The two sets of parameters differ only in the level of 𝛽.
From the figures earlier in this lecture, we know that when 𝛽 = 0.3, Ω = [0.0088, 0.0499], and when 𝛽 = 0.8,
Ω = [0.0395, 0.2193]
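Knowing the bounds of $\Omega$ matters in practice because a value function on $\Omega$ can be approximated with Chebyshev polynomials after mapping $\theta$ into $[-1, 1]$. The following sketch illustrates that mapping with `numpy.polynomial.chebyshev.chebvander`; the target function `np.log` is a stand-in chosen for illustration, not the planner's value function, and `order = 12` is an assumed setting.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

theta_min, theta_max = 0.0395, 0.2193   # bounds of Omega reported for beta = 0.8
order = 12                              # number of basis functions (assumed)

def to_unit(theta):
    """Map theta in [theta_min, theta_max] into [-1, 1]."""
    return -1 + 2 * (theta - theta_min) / (theta_max - theta_min)

# Chebyshev nodes on [-1, 1], mapped back into Omega
k = np.arange(1, order + 1)
nodes = np.cos((2 * k - 1) * np.pi / (2 * order))
theta_nodes = theta_min + (nodes + 1) * (theta_max - theta_min) / 2

target = np.log                          # stand-in for a value function

Phi = cheb.chebvander(nodes, order - 1)  # basis matrix at the nodes
coef = np.linalg.solve(Phi, target(theta_nodes))

def approx(theta):
    """Evaluate the Chebyshev approximation at theta (array-like)."""
    return cheb.chebvander(to_unit(np.asarray(theta)), order - 1) @ coef
```

In a Bellman iteration the vector `coef` would be refitted at each step from the maximized values at `theta_nodes`, and convergence would be judged by the change in `coef` or in the fitted values.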
First, a quick check that our approximations of the value functions are good.
We do this by calculating the residuals between iterates on the value function on a fine grid:
max(abs(ch1.resid_grid)), max(abs(ch2.resid_grid))
(6.46313155971967e-06, 6.875358415925348e-07)
The value functions plotted below trace out the right edges of the sets of equilibrium values plotted above
plt.show()
The next figure plots the optimal policy functions; values of 𝜃′ , 𝑚, 𝑥, ℎ for each value of the state 𝜃:
plt.show()
With the first set of parameter values, the value of 𝜃′ chosen by the Ramsey planner quickly hits the upper limit of Ω.
But with the second set of parameters it converges to a value in the interior of the set.
Consequently, the choice of $\bar \theta$ is clearly important with the first set of parameter values.
One way of seeing this is plotting $\theta'(\theta)$ for each set of parameters.
With the first set of parameter values, this function does not intersect the 45-degree line until $\bar \theta$, whereas in the second set of parameter values, it intersects in the interior.
axes[0].legend()
plt.show()
Subproblem 2 is equivalent to the planner choosing the initial value of $\theta$ (i.e. the value which maximizes the value function).
From this starting point, we can then trace out the paths for $\{\theta_t, m_t, h_t, x_t\}_{t=0}^\infty$ that support this equilibrium.
plt.show()
In Credible Government Policies in Chang Model we shall find a subset of competitive equilibria that are sustainable in the sense that a sequence of government administrations that chooses sequentially, rather than once and for all at time 0, will choose to implement them.
In the process of constructing them, we shall construct another, smaller set of competitive equilibria.
FIFTY
In addition to what’s in Anaconda, this lecture will need the following libraries:
50.1 Overview
Some of the material in this lecture and competitive equilibria in the Chang model can be viewed as more sophisticated
and complete treatments of the topics discussed in Ramsey plans, time inconsistency, sustainable plans.
This lecture assumes almost the same economic environment analyzed in competitive equilibria in the Chang model.
The only change – and it is a substantial one – is the timing protocol for making government decisions.
In competitive equilibria in the Chang model, a Ramsey planner chose a comprehensive government policy once-and-for-all
at time 0.
Now in this lecture, there is no time 0 Ramsey planner.
Instead there is a sequence of government decision-makers, one for each 𝑡.
The time 𝑡 government decision-maker chooses time 𝑡 government actions after forecasting what future governments will do.
We use the notion of a sustainable plan proposed in [Chari and Kehoe, 1990], also referred to as a credible public policy
in [Stokey, 1989].
Technically, this lecture starts where lecture competitive equilibria in the Chang model on Ramsey plans within the Chang
[Chang, 1998] model stopped.
That lecture presents recursive representations of competitive equilibria and a Ramsey plan for a version of a model of
Calvo [Calvo, 1978] that Chang used to analyze and illustrate these concepts.
We used two operators to characterize competitive equilibria and a Ramsey plan, respectively.
In this lecture, we define a credible public policy or sustainable plan.
Starting from a large enough initial set $Z_0$, we use iterations on Chang’s set-to-set operator $\tilde D(Z)$ to compute a set of values associated with sustainable plans.
Chang’s operator $\tilde D(Z)$ is closely connected with the operator $D(Z)$ introduced in lecture competitive equilibria in the Chang model.
• $\tilde D(Z)$ incorporates all of the restrictions imposed in constructing the operator $D(Z)$, but …
• It adds some additional restrictions
– these additional restrictions incorporate the idea that a plan must be sustainable.
– sustainable means that the government wants to implement it at all times after all histories.
Let’s start with some standard imports:
import numpy as np
import polytope
import matplotlib.pyplot as plt
We begin by reviewing the set up deployed in competitive equilibria in the Chang model.
Chang’s model, adopted from Calvo, is designed to focus on the intertemporal trade-offs between the welfare benefits
of deflation and the welfare costs associated with the high tax collections required to retire money at a rate that delivers
deflation.
A benevolent time 0 government can promote utility generating increases in real balances only by imposing an infinite
sequence of sufficiently large distorting tax collections.
To promote the welfare increasing effects of high real balances, the government wants to induce gradual deflation.
We start by reviewing notation.
For a sequence of scalars $\vec z \equiv \{z_t\}_{t=0}^\infty$, let $\vec z^t = (z_0, \ldots, z_t)$, $\vec z_t = (z_t, z_{t+1}, \ldots)$.
An infinitely lived representative agent and an infinitely lived government exist at dates 𝑡 = 0, 1, ….
The objects in play are
• an initial quantity 𝑀−1 of nominal money holdings
• a sequence of inverse money growth rates ℎ⃗ and an associated sequence of nominal money holdings 𝑀⃗
• a sequence of values of money 𝑞 ⃗
• a sequence of real money holdings 𝑚⃗
• a sequence of total tax collections 𝑥⃗
• a sequence of per capita rates of consumption 𝑐 ⃗
• a sequence of per capita incomes 𝑦 ⃗
A benevolent government chooses sequences $(\vec M, \vec h, \vec x)$ subject to a sequence of budget constraints and other constraints imposed by competitive equilibrium.
Given tax collection and price of money sequences, a representative household chooses sequences $(\vec c, \vec m)$ of consumption and real balances.
In competitive equilibrium, the price of money sequence 𝑞 ⃗ clears markets, thereby reconciling decisions of the government
and the representative household.
A representative household faces a nonnegative value of money sequence $\vec q$ and sequences $\vec y, \vec x$ of income and total tax collections, respectively.
The household chooses nonnegative sequences $\vec c, \vec M$ of consumption and nominal balances, respectively, to maximize
$$
\sum_{t=0}^\infty \beta^t [u(c_t) + v(q_t M_t)] \tag{50.1}
$$
subject to
𝑞𝑡 𝑀𝑡 ≤ 𝑦𝑡 + 𝑞𝑡 𝑀𝑡−1 − 𝑐𝑡 − 𝑥𝑡 (50.2)
and
𝑞𝑡 𝑀𝑡 ≤ 𝑚̄ (50.3)
Here 𝑞𝑡 is the reciprocal of the price level at 𝑡, also known as the value of money.
Chang [Chang, 1998] assumes that
• 𝑢 ∶ ℝ+ → ℝ is twice continuously differentiable, strictly concave, and strictly increasing;
• 𝑣 ∶ ℝ+ → ℝ is twice continuously differentiable and strictly concave;
• $\lim_{c \to 0} u'(c) = \lim_{m \to 0} v'(m) = +\infty$;
• there is a finite level 𝑚 = 𝑚𝑓 such that 𝑣′ (𝑚𝑓 ) = 0
Real balances carried out of a period equal 𝑚𝑡 = 𝑞𝑡 𝑀𝑡 .
Inequality (50.2) is the household’s time 𝑡 budget constraint.
It tells how real balances 𝑞𝑡 𝑀𝑡 carried out of period 𝑡 depend on income, consumption, taxes, and real balances 𝑞𝑡 𝑀𝑡−1
carried into the period.
Equation (50.3) imposes an exogenous upper bound 𝑚̄ on the choice of real balances, where 𝑚̄ ≥ 𝑚𝑓 .
50.2.2 Government
The government chooses a sequence of inverse money growth rates with time $t$ component $h_t \equiv \frac{M_{t-1}}{M_t} \in \Pi \equiv [\underline \pi, \overline \pi]$, where $0 < \underline \pi < 1 < \frac{1}{\beta} \leq \overline \pi$.
The government faces a sequence of budget constraints with time 𝑡 component
−𝑥𝑡 = 𝑚𝑡 (1 − ℎ𝑡 ) (50.4)
The restrictions $m_t \in [0, \bar m]$ and $h_t \in \Pi$ evidently imply that $x_t \in X \equiv [(\underline \pi - 1) \bar m, (\overline \pi - 1) \bar m]$.
We define the set $E \equiv [0, \bar m] \times \Pi \times X$, so that we require that $(m, h, x) \in E$.
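The bounds on tax collections implied by the government budget constraint can be checked numerically. This is a minimal sketch under assumed parameter values (`beta`, `m_bar`, `pi_low`, `pi_high` are illustrative and merely satisfy the stated restrictions); the helper names are hypothetical.

```python
import numpy as np

beta, m_bar = 0.8, 30.0            # assumed values
pi_low, pi_high = 0.8, 1 / beta    # assumed: 0 < pi_low < 1 < 1/beta <= pi_high

# Bounds of X implied by m in [0, m_bar] and h in [pi_low, pi_high]
x_low, x_high = (pi_low - 1) * m_bar, (pi_high - 1) * m_bar

def tax_from_budget(m, h):
    """Government budget constraint -x = m(1 - h), i.e. x = m(h - 1)."""
    return m * (h - 1)

def in_E(m, h, x, tol=1e-12):
    """Check whether (m, h, x) lies in E = [0, m_bar] x Pi x X."""
    return (0 <= m <= m_bar
            and pi_low <= h <= pi_high
            and x_low - tol <= x <= x_high + tol)

m_grid = np.linspace(0, m_bar, 5)
h_grid = np.linspace(pi_low, pi_high, 5)
all_inside = all(in_E(m, h, tax_from_budget(m, h))
                 for m in m_grid for h in h_grid)
```

With $h < 1$ the money supply grows and `tax_from_budget` is negative (a subsidy); with $h > 1$ money is retired and taxes are positive.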
To represent the idea that taxes are distorting, Chang makes the following assumption about outcomes for per capita
output:
𝑦𝑡 = 𝑓(𝑥𝑡 ) (50.5)
where 𝑓 ∶ ℝ → ℝ satisfies 𝑓(𝑥) > 0, 𝑓(𝑥) is twice continuously differentiable, 𝑓 ″ (𝑥) < 0, 𝑓 ′ (0) = 0, and 𝑓(𝑥) = 𝑓(−𝑥)
for all 𝑥 ∈ ℝ, so that subsidies and taxes are equally distorting.
The purpose is not to model the causes of tax distortions in any detail but simply to summarize the outcome of those
distortions via the function 𝑓(𝑥).
A key part of the specification is that tax distortions are increasing in the absolute value of tax revenues.
The government chooses a competitive equilibrium that maximizes (50.1).
For the results in this lecture, the timing of actions within a period is important because of the incentives that it activates.
Chang assumed the following within-period timing of decisions:
• first, the government chooses ℎ𝑡 and 𝑥𝑡 ;
• then, given $\vec q$ and its expectations about future values of $x$ and $y$, the household chooses $M_t$ and therefore $m_t$, because $m_t = q_t M_t$;
• then output 𝑦𝑡 = 𝑓(𝑥𝑡 ) is realized;
• finally 𝑐𝑡 = 𝑦𝑡
This within-period timing confronts the government with choices framed by how the private sector wants to respond when
the government takes time 𝑡 actions that differ from what the private sector had expected.
This timing will shape the incentives confronting the government at each history that are to be incorporated in the construction of the $\tilde D$ operator below.
$$
\mathcal{L} = \max_{\vec c, \vec M} \min_{\vec \lambda, \vec \mu} \sum_{t=0}^\infty \beta^t \big\{ u(c_t) + v(M_t q_t) + \lambda_t [y_t - c_t - x_t + q_t M_{t-1} - q_t M_t] + \mu_t [\bar m - q_t M_t] \big\}
$$
$$
u'(c_t) = \lambda_t
$$
$$
q_t [u'(c_t) - v'(M_t q_t)] \leq \beta u'(c_{t+1}) q_{t+1}, \quad = \text{ if } M_t q_t < \bar m
$$
Using $h_t = \frac{M_{t-1}}{M_t}$ and $q_t = \frac{m_t}{M_t}$ in these first-order conditions and rearranging implies
This is real money balances at time 𝑡 + 1 measured in units of marginal utility, which Chang refers to as ‘the marginal
utility of real balances’.
From the standpoint of the household at time $t$, equation (50.7) shows that $\theta_{t+1}$ intermediates the influences of $(\vec x_{t+1}, \vec m_{t+1})$ on the household’s choice of real balances $m_t$.
Definition:
• A government policy is a pair of sequences $(\vec h, \vec x)$ where $h_t \in \Pi$ for all $t \geq 0$.
• A price system is a non-negative value of money sequence $\vec q$.
• An allocation is a triple of non-negative sequences $(\vec c, \vec m, \vec y)$.
It is required that time $t$ components $(m_t, x_t, h_t) \in E$.
Definition:
Given $M_{-1}$, a government policy $(\vec h, \vec x)$, price system $\vec q$, and allocation $(\vec c, \vec m, \vec y)$ are said to be a competitive equilibrium if
• $m_t = q_t M_t$ and $y_t = f(x_t)$.
• The government budget constraint is satisfied.
• Given $\vec q, \vec x, \vec y$, $(\vec c, \vec m)$ solves the household’s problem.
ℎ̂ 𝑡 = ℎ(𝑤𝑡 , 𝜃𝑡 )
𝑚𝑡 = 𝑚(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝑥𝑡 = 𝑥(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 ) (50.8)
𝑤𝑡+1 = 𝜒(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝜃𝑡+1 = Ψ(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
• Here it is to be understood that ℎ̂ 𝑡 is the action that the government policy instructs the government to take, while
ℎ𝑡 possibly not equal to ℎ̂ 𝑡 is some other action that the government is free to take at time 𝑡.
The plan is credible if it is in the time 𝑡 government’s interest to execute it.
Credibility requires that, for every possible choice of $h_t$ that is consistent with competitive equilibria, and therefore at each instance and circumstance of choice, the government attains a weakly higher lifetime utility with continuation value $w_{t+1} = \chi(h_t, w_t, \theta_t)$ by adhering to the plan and confirming the associated time $t$ action $\hat h_t$ that the public had expected earlier.
Please note the subtle change in arguments of the functions used to represent a competitive equilibrium and a Ramsey
plan, on the one hand, and a credible government plan, on the other hand.
The extra arguments appearing in the functions used to represent a credible plan come from allowing the government to
contemplate disappointing the private sector’s expectation about its time 𝑡 choice ℎ̂ 𝑡 .
A credible plan induces the government to confirm the private sector’s expectation.
The recursive representation of the plan uses the evolution of continuation values to deter the government from wanting
to disappoint the private sector’s expectations.
Technically, a Ramsey plan and a credible plan both incorporate history dependence.
For a Ramsey plan, this is encoded in the dynamics of the state variable 𝜃𝑡 , a promised marginal utility that the Ramsey
plan delivers to the private sector.
For a credible government plan, the two-dimensional state vector $(w_t, \theta_t)$ encodes history dependence.
A government strategy $\sigma$ and an allocation rule $\alpha$ are said to constitute a sustainable plan (SP) if
1. 𝜎 is admissible.
2. Given 𝜎, 𝛼 is competitive.
3. After any history $\vec h^{t-1}$, the continuation of $\sigma$ is optimal for the government; i.e., the sequence $\vec h_t$ induced by $\sigma$ after $\vec h^{t-1}$ maximizes over $CE_\pi$ given $\alpha$.
Given any history $\vec h^{t-1}$, the continuation of a sustainable plan is a sustainable plan.
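Let $\Theta = \{(\vec m, \vec x, \vec h) \in CE : \text{there is an SP whose outcome is } (\vec m, \vec x, \vec h)\}$.
Let
$$
S = \Big\{ (w, \theta) : \text{there is a sustainable outcome } (\vec m, \vec x, \vec h) \in \Theta \text{ with value } w = \sum_{t=0}^\infty \beta^t [u(f(x_t)) + v(m_t)] \text{ and such that } u'(f(x_0))(m_0 + x_0) = \theta \Big\}
$$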
The space $S$ is a compact subset of $W \times \Omega$ where $W = [\underline w, \overline w]$ is the space of values associated with sustainable plans.
Here $\underline w$ and $\overline w$ are finite bounds on the set of values.
Because there is at least one sustainable plan, 𝑆 is nonempty.
Now recall the within-period timing protocol, which we can depict (ℎ, 𝑥) → 𝑚 = 𝑞𝑀 → 𝑦 = 𝑐.
With this timing protocol in mind, the time 0 component of an SP has the following components:
1. A period 0 action $\hat h \in \Pi$ that the public expects the government to take, together with subsequent within-period consequences $m(\hat h), x(\hat h)$ when the government acts as expected.
2. For any first-period action ℎ ≠ ℎ̂ with ℎ ∈ 𝐶𝐸𝜋0 , a pair of within-period consequences 𝑚(ℎ), 𝑥(ℎ) when the
government does not act as the public had expected.
3. For every ℎ ∈ Π, a pair (𝑤′ (ℎ), 𝜃′ (ℎ)) ∈ 𝑆 to carry into next period.
These components must be such that it is optimal for the government to choose ℎ̂ as expected; and for every possible
ℎ ∈ Π, the government budget constraint and the household’s Euler equation must hold with continuation 𝜃 being 𝜃′ (ℎ).
Given the timing protocol within the model, the representative household’s response to a government deviation to ℎ ≠
ℎ̂ from a prescribed ℎ̂ consists of a first-period action 𝑚(ℎ) and associated subsequent actions, together with future
equilibrium prices, captured by (𝑤′ (ℎ), 𝜃′ (ℎ)).
At this point, Chang introduces an idea in the spirit of Abreu, Pearce, and Stacchetti [Abreu et al., 1990].
Let 𝑍 be a nonempty subset of 𝑊 × Ω.
Think of using pairs (𝑤′ , 𝜃′ ) drawn from 𝑍 as candidate continuation value, promised marginal utility pairs.
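Define the following operator:

$$
\begin{aligned}
\tilde D(Z) = \Bigl\{ (w, \theta) : \ & \text{there is } \hat h \in CE^0_\pi \text{ and for each } h \in CE^0_\pi \\
& \text{a four-tuple } (m(h), x(h), w'(h), \theta'(h)) \in [0, \bar m] \times X \times Z \\
& \text{such that} \\
& w = u(f(x(\hat h))) + v(m(\hat h)) + \beta w'(\hat h) \qquad \text{(50.10)} \\
& \theta = u'(f(x(\hat h)))(m(\hat h) + x(\hat h)) \qquad \text{(50.11)} \\
& \text{and for all } h \in CE^0_\pi \\
& x(h) = m(h)(h - 1) \\
& m(h)(u'(f(x(h))) - v'(m(h))) \leq \beta \theta'(h) \quad (\text{with equality if } m(h) < \bar m) \\
& \text{and} \\
& w \geq u(f(x(h))) + v(m(h)) + \beta w'(h) \Bigr\} \qquad \text{(50.12)}
\end{aligned}
\qquad \text{(50.9)}
$$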
This operator adds the key incentive constraint to the conditions that defined the earlier 𝐷(𝑍) operator in
competitive equilibria in the Chang model.
Condition (50.12) requires that the plan deter the government from wanting to take one-shot deviations when candidate
continuation values are drawn from 𝑍.
Proposition:
1. If $Z \subset \tilde D(Z)$, then $\tilde D(Z) \subset S$ ('self-generation').
2. $S = \tilde D(S)$ ('factorization').
Proposition:
1. Monotonicity of $\tilde D$: $Z \subset Z'$ implies $\tilde D(Z) \subset \tilde D(Z')$.
2. $Z$ compact implies that $\tilde D(Z)$ is compact.
Chang establishes that 𝑆 is compact and that therefore there exists a highest value SP and a lowest value SP.
Further, the preceding structure allows Chang to compute 𝑆 by iterating to convergence on 𝐷̃ provided that one begins
with a sufficiently large initial set 𝑍0 .
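The idea of shrinking a large initial set until it stops changing can be seen in a toy setting. The sketch below is not Chang's operator: it assumes a hypothetical finite state space and a made-up `SUCCESSORS` map, and its stand-in operator `D` keeps a state only if at least one of its continuations lies in the candidate set. It is meant only to illustrate monotonicity and convergence to the largest self-generating set.

```python
# Toy illustration (not Chang's model): iterate a monotone set operator
# downward from a large initial set until it stops shrinking.

# Hypothetical "continuation" map: state -> states it can move to next.
SUCCESSORS = {0: {1}, 1: {2}, 2: {0}, 3: {4}, 4: {5}, 5: set(), 6: {3}}


def D(Z):
    """Keep the states that have at least one continuation inside Z."""
    return {z for z, nxt in SUCCESSORS.items() if nxt & Z}


def iterate_to_fixed_point(Z0):
    """Iterate Z_{n+1} = D(Z_n) from Z0 and return the whole sequence."""
    path = [set(Z0)]
    while True:
        Z_next = D(path[-1])
        if Z_next == path[-1]:
            return path
        path.append(Z_next)


# Start from the whole state space, the analogue of a "large" Z_0.
path = iterate_to_fixed_point(SUCCESSORS.keys())
for n, Z in enumerate(path):
    print(n, sorted(Z))
```

Each set in `path` is contained in the previous one, and the last set is mapped into itself by `D`, which mirrors the nested sequence produced when iterating on the operator from a sufficiently large starting set.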
This structure delivers the following recursive representation of a sustainable outcome:
1. choose an initial (𝑤0 , 𝜃0 ) ∈ 𝑆;
2. generate a sustainable outcome recursively by iterating on (50.8), which we repeat here for convenience:

$$
\begin{aligned}
\hat h_t & = h(w_t, \theta_t) \\
m_t & = m(h_t, w_t, \theta_t) \\
x_t & = x(h_t, w_t, \theta_t) \\
w_{t+1} & = \chi(h_t, w_t, \theta_t) \\
\theta_{t+1} & = \Psi(h_t, w_t, \theta_t)
\end{aligned}
$$

Above we defined the $\tilde D(Z)$ operator as (50.9).
Chang (1998) provides a method for dealing with the final three constraints.
These incentive constraints ensure that the government wants to choose ℎ̂ as the private sector had expected it to.
Chang’s simplification starts from the idea that, when considering whether or not to confirm the private sector’s expecta-
tion, the government only needs to consider the payoff of the best possible deviation.
Equally, to provide incentives to the government, we only need to consider the harshest possible punishment.
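Let $h$ denote some possible deviation. Chang defines the worst value that can follow the deviation $h$ as the minimum of

$$
u(f(x(h))) + v(m(h)) + \beta w'(h)
$$

over $(m(h), x(h), w'(h), \theta'(h)) \in [0, \bar m] \times X \times Z$, subject to

$$
x(h) = m(h)(h - 1)
$$

$$
m(h)(u'(f(x(h))) - v'(m(h))) \leq \beta \theta'(h) \quad (\text{with equality if } m(h) < \bar m)
$$

$BR(Z)$ is the maximum of this value over deviations $h \in \Pi$: the payoff from the best possible deviation when it is met with the harshest punishment available in $Z$.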
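$$
\begin{aligned}
E(Z) = \Bigl\{ (w, \theta) : \ & \exists\, h \in CE^0_\pi \text{ and } (m(h), x(h), w'(h), \theta'(h)) \in [0, \bar m] \times X \times Z \\
& \text{such that} \\
& \theta = u'(f(x(h)))(m(h) + x(h)) \\
& x(h) = m(h)(h - 1) \\
& w \geq BR(Z) \Bigr\}
\end{aligned}
$$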
Aside from the final incentive constraint, this is the same as the operator in competitive equilibria in the Chang model.
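The "best deviation, harshest punishment" logic behind 𝐵𝑅(𝑍) can be illustrated with a small array. The numbers below are hypothetical payoffs, not output from the model; rows index money balances 𝑚, columns index deviations ℎ, and `np.nan` marks action pairs for which no feasible continuation exists.

```python
import numpy as np

# Hypothetical payoffs P[i, j] for money balance m_i (rows) and
# deviation h_j (columns); np.nan marks infeasible action pairs.
P = np.array([[1.0, 2.5, np.nan],
              [0.5, 3.0, 4.0],
              [1.5, 2.0, 3.5]])

# Harshest punishment for each deviation: minimum over rows, ignoring NaNs.
worst_by_h = np.nanmin(P, axis=0)

# Most tempting deviation: maximum over columns of those worst-case values.
br = np.nanmax(worst_by_h)

print(worst_by_h, br)
```

Using the NaN-aware reductions means infeasible action pairs drop out of both the inner minimum and the outer maximum without special-casing.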
Consequently, to implement this operator we just need to add one step to our outer hyperplane approximation algorithm:
1. Initialize subgradients, 𝐻, and hyperplane levels, 𝐶0 .
2. Given a set of subgradients, 𝐻, and hyperplane levels, 𝐶𝑡 , calculate 𝐵𝑅(𝑆𝑡 ).
3. Given 𝐻, 𝐶𝑡 , and 𝐵𝑅(𝑆𝑡 ), for each subgradient ℎ𝑖 ∈ 𝐻:
• Solve a linear program (described below) for each action in the action space.
• Find the maximum and update the corresponding hyperplane level, 𝐶𝑖,𝑡+1 .
4. If |𝐶𝑡+1 − 𝐶𝑡 | > 𝜖, return to 2.
Step 1 simply creates a large initial set 𝑆0 .
Given some set 𝑆𝑡 , Step 2 then constructs the value 𝐵𝑅(𝑆𝑡 ).
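To do this, we solve the following problem for each point in the action space $(m_j, h_j)$:

$$
\min_{[w', \theta']} \ u(f(x_j)) + v(m_j) + \beta w'
$$

subject to

$$
H \cdot (w', \theta') \leq C_t
$$

$$
x_j = m_j(h_j - 1)
$$

Taking the minimum of the resulting values over $m_j$ and then the maximum over $h_j$ gives $BR(S_t)$.
Step 3 then solves, for each subgradient $h_i$ and each point in the action space:

$$
\max_{[w', \theta']} \ h_i \cdot (w, \theta)
$$

subject to

$$
H \cdot (w', \theta') \leq C_t
$$

$$
\theta = u'(f(x_j))(m_j + x_j)
$$

$$
x_j = m_j(h_j - 1)
$$

$$
w \geq BR(Z)
$$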
This problem maximizes the hyperplane level for a given set of actions.
The second part of Step 3 then finds the maximum possible hyperplane level across the action space.
The algorithm constructs a sequence of progressively smaller sets 𝑆𝑡+1 ⊂ 𝑆𝑡 ⊂ 𝑆𝑡−1 ⋯ ⊂ 𝑆0 .
Step 4 ends the algorithm when the difference between these sets is small enough.
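The core computation in Step 3, maximizing a direction over a set described by hyperplanes, can be sketched with `scipy.optimize.linprog`. This is a simplified illustration under stated assumptions: the starting set is a hypothetical disc of radius 2 centred at (1, 1), and the model-specific constraints (the Euler equation and 𝑤 ≥ 𝐵𝑅(𝑍)) are left out. The helper name `support` is ours, not the lecture's.

```python
import numpy as np
from scipy.optimize import linprog

# Subgradients: N unit vectors evenly spaced around the circle.
N = 8
angles = np.linspace(0, 2 * np.pi, N, endpoint=False)
H = np.column_stack((np.cos(angles), np.sin(angles)))

# Hypothetical starting set: a disc of radius 2 centred at (1, 1),
# approximated from outside by the N tangent half-planes H z <= C.
centre, radius = np.array([1.0, 1.0]), 2.0
C = H @ centre + radius


def support(direction, H, C):
    """Return max direction . z subject to H z <= C, and a maximizer."""
    # linprog minimizes, so flip the sign of the objective.
    # Both coordinates are free; linprog's default bounds are (0, None).
    res = linprog(-direction, A_ub=H, b_ub=C, bounds=[(None, None)] * 2)
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun, res.x


# Hyperplane level in the direction of the first subgradient
level, z_star = support(H[0], H, C)
print(level, z_star)
```

In the full algorithm an LP of this kind is solved once per subgradient and per action, with extra rows added to the constraint set, and the largest value found becomes the updated hyperplane level.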
We have created a Python class that solves the model assuming the following functional forms:

$$
u(c) = \log(c)
$$

$$
v(m) = \frac{1}{500}(m \bar m - 0.5 m^2)^{0.5}
$$

$$
f(x) = 180 - (0.4x)^2
$$

The remaining parameters $\{\beta, \bar m, \underline{h}, \bar h\}$ are then variables to be specified for an instance of the Chang class.
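As a rough sketch, these functional forms and the derivatives that enter the Euler equation could be written as follows. The function names and the value `mbar = 30` are illustrative assumptions rather than the class's actual attributes.

```python
import numpy as np

mbar = 30.0  # hypothetical upper bound on real balances


def u(c):
    """Utility from consumption."""
    return np.log(c)


def v(m, mbar=mbar):
    """Utility from real balances."""
    return (m * mbar - 0.5 * m**2) ** 0.5 / 500


def f(x):
    """Maps x into consumption, c = f(x)."""
    return 180 - (0.4 * x) ** 2


def u_prime(c):
    """Derivative of u."""
    return 1 / c


def v_prime(m, mbar=mbar):
    """Derivative of v with respect to m (defined for 0 < m <= mbar)."""
    return 0.5 * (mbar - m) / (500 * (m * mbar - 0.5 * m**2) ** 0.5)


# Promised marginal utility θ = u'(f(x)) (m + x) at one action pair (m, h)
m, h = 10.0, 1.05
x = m * (h - 1)
θ = u_prime(f(x)) * (m + x)
print(θ)
```

With this specification the marginal utility of real balances falls to zero at `m = mbar`.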
Below we use the class to solve the model and plot the resulting equilibrium set, once with 𝛽 = 0.3 and once with 𝛽 = 0.8.
We also plot the (larger) competitive equilibrium sets, which we described in competitive equilibria in the Chang model.
(We have set the number of subgradients to 10 in order to speed up the code for now. We can increase accuracy by
increasing the number of subgradients.)
The following code computes sustainable plans:
"""
Provides a class called ChangModel to solve different
parameterizations of the Chang (1998) model.
"""
import time

import matplotlib.pyplot as plt
import numpy as np
import numpy.polynomial.chebyshev as cheb
import quantecon as qe
from scipy.optimize import linprog, minimize


class ChangModel:
    """
    Class to solve for the competitive and sustainable sets in the Chang (1998)
    model, for different parameterizations.
    """

    # ...

        w_space = np.array([min(w_vec[~np.isinf(w_vec)]),
                            max(w_vec[~np.isinf(w_vec)])])
        p_space = np.array([0, max(p_vec[~np.isinf(w_vec)])])
        self.p_space = p_space

        # ...

            # Points on circle
            H = np.zeros((N, 2))
            for i in range(N):
                x = degrees[i]
                H[i, 0] = np.cos(x)
                H[i, 1] = np.sin(x)

            # ...

            return C, H, Z
    def solve_worst_spe(self):
        """
        Method to solve for BR(Z). See p.449 of Chang (1998)
        """
        # Set-up assumed by analogy with solve_subgradient; these four
        # definitions do not appear in the listing
        p_vec = np.full(self.N_a, np.nan)
        c = [1, 0]  # minimize the continuation value w'
        aineq_mbar = np.vstack((self.H, np.array([0, -self.β])))
        bineq_mbar = np.vstack((self.c0_s, 0))

        aineq = self.H
        bineq = self.c0_s
        aeq = [[0, -self.β]]

        for j in range(self.N_a):
            # Only try if consumption is possible
            if self.f_vec[j] > 0:
                # If m = mbar, use inequality constraint
                if self.A[j, 1] == self.mbar:
                    bineq_mbar[-1] = self.euler_vec[j]
                    res = linprog(c, A_ub=aineq_mbar, b_ub=bineq_mbar,
                                  bounds=(self.w_bnds_s, self.p_bnds_s))
                # If m < mbar, use equality constraint
                else:
                    beq = self.euler_vec[j]
                    res = linprog(c, A_ub=aineq, b_ub=bineq, A_eq=aeq, b_eq=beq,
                                  bounds=(self.w_bnds_s, self.p_bnds_s))
                if res.status == 0:
                    p_vec[j] = self.u_vec[j] + self.β * res.x[0]

        # Max over h and min over other variables (see Chang (1998) p.449)
        self.br_z = np.nanmax(np.nanmin(p_vec.reshape(self.n_m, self.n_h), 0))
    def solve_subgradient(self):
        """
        Method to solve for E(Z). See p.449 of Chang (1998)
        """
        # Pre-compute constraints
        aineq_C_mbar = np.vstack((self.H, np.array([0, -self.β])))
        bineq_C_mbar = np.vstack((self.c0_c, 0))

        aineq_C = self.H
        bineq_C = self.c0_c
        aeq_C = [[0, -self.β]]

        # ...

                    # COMPETITIVE EQUILIBRIA
                    # If m = mbar, use inequality constraint
                    if self.A[j, 1] == self.mbar:
                        bineq_C_mbar[-1] = self.euler_vec[j]
                        res = linprog(c, A_ub=aineq_C_mbar, b_ub=bineq_C_mbar,
                                      bounds=(self.w_bnds_c, self.p_bnds_c))
                    # If m < mbar, use equality constraint
                    else:
                        beq_C = self.euler_vec[j]
                        res = linprog(c, A_ub=aineq_C, b_ub=bineq_C,
                                      A_eq=aeq_C, b_eq=beq_C,
                                      bounds=(self.w_bnds_c, self.p_bnds_c))
                    if res.status == 0:
                        c_a1a2_c[j] = self.H[i, 0] * (self.u_vec[j]
                            + self.β * res.x[0]) + self.H[i, 1] * self.Θ_vec[j]
                        t_a1a2_c[j] = res.x

                    # SUSTAINABLE EQUILIBRIA
                    # If m = mbar, use inequality constraint
                    if self.A[j, 1] == self.mbar:
                        bineq_S_mbar[-2] = self.euler_vec[j]
                        bineq_S_mbar[-1] = self.u_vec[j] - self.br_z
                        res = linprog(c, A_ub=aineq_S_mbar, b_ub=bineq_S_mbar,
                                      bounds=(self.w_bnds_s, self.p_bnds_s))
                    # If m < mbar, use equality constraint
                    else:
                        bineq_S[-1] = self.u_vec[j] - self.br_z
                        beq_S = self.euler_vec[j]
                        res = linprog(c, A_ub=aineq_S, b_ub=bineq_S,
                                      A_eq=aeq_S, b_eq=beq_S,
                                      bounds=(self.w_bnds_s, self.p_bnds_s))
                    if res.status == 0:
                        c_a1a2_s[j] = self.H[i, 0] * (self.u_vec[j]
                            + self.β * res.x[0]) + self.H[i, 1] * self.Θ_vec[j]
                        t_a1a2_s[j] = res.x

        # ...

        for i in range(self.N_g):
            self.c1_c[i] = np.dot(self.z1_c[:, i], self.H[i, :])
            self.c1_s[i] = np.dot(self.z1_s[:, i], self.H[i, :])
        t = time.time()
        diff = tol + 1
        iters = 0

        # ...

            # Save iteration
            self.c_dic_c[iters], self.c_dic_s[iters] = np.copy(self.c1_c), \
                np.copy(self.c1_s)
            self.iters = iters

        elapsed = time.time() - t
        # Adjacent string literals avoid embedding indentation in the message
        print('Convergence achieved after {} iterations and {} '
              'seconds'.format(iters, round(elapsed, 2)))
        def p_fun2(x):
            scale = -1 + 2*(x[1] - θ_min)/(θ_max - θ_min)
            p_fun = - (u(x[0], mbar)
                       + self.β * np.dot(cheb.chebvander(scale, order - 1), c))
            return p_fun

        # Bellman Iterations
        # ...

        self.θ_grid = s
        self.p_iter = p_iter1
        self.Φ = Φ
        self.c = c
        print('Convergence achieved after {} iterations'.format(iters))

        # Check residuals
        θ_grid_fine = np.linspace(θ_min, θ_max, 100)
        resid_grid = np.zeros(100)
        p_grid = np.zeros(100)
        θ_prime_grid = np.zeros(100)
        m_grid = np.zeros(100)
        h_grid = np.zeros(100)
        for i in range(100):
            θ = θ_grid_fine[i]
            # Call completed to match the identical call in the simulation loop
            res = minimize(p_fun,
                           lb1 + (ub1-lb1) / 2,
                           method='SLSQP',
                           bounds=bnds1,
                           constraints=cons1,
                           tol=1e-10)
            # ...

        self.resid_grid = resid_grid
        self.θ_grid_fine = θ_grid_fine
        self.θ_prime_grid = θ_prime_grid
        self.m_grid = m_grid
        self.h_grid = h_grid
        self.p_grid = p_grid
        self.x_grid = m_grid * (h_grid - 1)

        # Simulate
        θ_series = np.zeros(31)
        m_series = np.zeros(30)
        h_series = np.zeros(30)

        # Find initial θ
        def ValFun(x):
            scale = -1 + 2*(x - θ_min)/(θ_max - θ_min)
            p_fun = np.dot(cheb.chebvander(scale, order - 1), c)
            return -p_fun

        res = minimize(ValFun,
                       (θ_min + θ_max)/2,
                       bounds=[(θ_min, θ_max)])
        # res.x is a length-1 array; store its scalar entry
        θ_series[0] = res.x[0]

        # Simulate
        for i in range(30):
            θ = θ_series[i]
            res = minimize(p_fun,
                           lb1 + (ub1-lb1)/2,
                           method='SLSQP',
                           bounds=bnds1,
                           constraints=cons1,
                           tol=1e-10)
            # ...

        self.θ_series = θ_series
        self.m_series = m_series
        self.h_series = h_series
        self.x_series = m_series * (h_series - 1)
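The functions above evaluate the value function by rescaling 𝜃 to [−1, 1] and multiplying a Chebyshev basis by a coefficient vector `c`. A minimal standalone sketch of that evaluation follows; the interval, the basis size, and the stand-in target function `np.log` are assumptions made for illustration and are not taken from the model.

```python
import numpy as np
import numpy.polynomial.chebyshev as cheb

θ_min, θ_max, order = 1.0, 3.0, 10   # hypothetical interval and basis size


def to_unit(θ):
    """Rescale θ from [θ_min, θ_max] to [-1, 1]."""
    return -1 + 2 * (θ - θ_min) / (θ_max - θ_min)


# Chebyshev nodes on [-1, 1] and the matching θ values
k = np.arange(order)
s = np.cos((2 * k + 1) * np.pi / (2 * order))
θ_nodes = θ_min + (s + 1) * (θ_max - θ_min) / 2

# Fit coefficients so the series matches a stand-in target at the nodes
Φ = cheb.chebvander(s, order - 1)          # shape (order, order)
c = np.linalg.solve(Φ, np.log(θ_nodes))


def approx(θ):
    """Evaluate the fitted Chebyshev series at θ (scalar or array)."""
    return np.dot(cheb.chebvander(to_unit(θ), order - 1), c)


θ_test = np.linspace(θ_min, θ_max, 50)
max_err = np.max(np.abs(approx(θ_test) - np.log(θ_test)))
print(max_err)
```

The rescaling step matters because Chebyshev polynomials are defined on [−1, 1]; evaluating the basis at unscaled 𝜃 values would give meaningless results.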
The set of (𝑤, 𝜃) associated with sustainable plans is smaller than the set of (𝑤, 𝜃) pairs associated with competitive
equilibria, since the additional constraints associated with sustainability must also be satisfied.
Let’s compute two examples, one with a low 𝛽, another with a higher 𝛽.
ch1.solve_sustainable()
[1.9168]
[0.66782]
[0.49235]
[0.32412]
[0.19022]
[0.10863]
[0.05817]
[0.0262]
[0.01836]
[0.01415]
[0.00297]
[0.00089]
[0.00027]
[0.00008]
[0.00002]
[0.00001]
Convergence achieved after 16 iterations and 38.57 seconds
The following plot shows both the set of 𝑤, 𝜃 pairs associated with competitive equilibria (in red) and the smaller set of
𝑤, 𝜃 pairs associated with sustainable plans (in blue).
def plot_equilibria(ChangModel):
    """
    Method to plot both equilibrium sets
    """
    fig, ax = plt.subplots(figsize=(7, 5))

    ax.set_xlabel('w', fontsize=16)
    ax.set_ylabel(r"$\theta$", fontsize=18)

    # ...

    plt.tight_layout()
    plt.show()
plot_equilibria(ch1)
ch2.solve_sustainable()
[0.06369]
[0.02476]
[0.02153]
[0.01915]
[0.01795]
[0.01642]
[0.01507]
[0.01284]
[0.01106]
[0.00694]
[0.0085]
[0.00781]
[0.00433]
[0.00492]
[0.00303]
[0.00182]
[0.00638]
[0.00116]
[0.00093]
[0.00075]
[0.0006]
[0.00494]
[0.00038]
[0.00121]
[0.00024]
[0.0002]
[0.00016]
[0.00013]
[0.0001]
[0.00008]
[0.00006]
[0.00005]
[0.00004]
[0.00003]
[0.00003]
[0.00002]
[0.00002]
[0.00001]
[0.00001]
[0.00001]
Convergence achieved after 40 iterations and 113.3 seconds
plot_equilibria(ch2)
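Plotting a set stored as subgradients and hyperplane levels requires the vertices of the polygon {𝑧 : 𝐻𝑧 ≤ 𝐶}. One way to recover them, assuming SciPy's `HalfspaceIntersection` rather than whatever library the lecture's full plotting code relies on, is sketched below. The helper name `polygon_vertices` and the unit-square example are ours.

```python
import numpy as np
from scipy.spatial import ConvexHull, HalfspaceIntersection


def polygon_vertices(H, C, interior_point):
    """Vertices of {z : H z <= C}, ordered counter-clockwise (2-D case)."""
    C = np.asarray(C, dtype=float).reshape(-1)
    # SciPy expects rows [a, b] meaning a . z + b <= 0, so H z <= C -> [H, -C]
    halfspaces = np.column_stack((np.asarray(H, dtype=float), -C))
    hs = HalfspaceIntersection(halfspaces,
                               np.asarray(interior_point, dtype=float))
    pts = hs.intersections
    hull = ConvexHull(pts)
    return pts[hull.vertices]


# Hypothetical example: the unit square written as four half-planes
H_sq = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]])
C_sq = np.array([1, 1, 0, 0])
verts = polygon_vertices(H_sq, C_sq, interior_point=(0.5, 0.5))
print(verts)
```

The ordered vertices can then be passed to a Matplotlib call such as `ax.fill(verts[:, 0], verts[:, 1], alpha=0.5)`. The interior point must lie strictly inside the set, so for the model's sets it would have to be chosen from the computed extreme points.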
Other
CHAPTER
FIFTYONE
TROUBLESHOOTING
This page is for readers experiencing errors when running the code from the lectures.
The basic assumption of the lectures is that code in a lecture should execute whenever
1. it is executed in a Jupyter notebook and
2. the notebook is running on a machine with the latest version of Anaconda Python.
You have installed Anaconda, haven’t you, following the instructions in this lecture?
Assuming that you have, the most common source of problems for our readers is that their Anaconda distribution is not
up to date.
Here’s a useful article on how to update Anaconda.
Another option is to simply remove Anaconda and reinstall.
You also need to keep the external code libraries, such as QuantEcon.py up to date.
For this task you can either
• use pip install --upgrade quantecon on the command line, or
• execute !pip install --upgrade quantecon within a Jupyter notebook.
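Before upgrading, it can help to check which versions are currently installed. A minimal sketch using only the standard library follows; the helper name `installed_version` and the list of packages checked are illustrative choices.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python", sys.version.split()[0])
for name in ("quantecon", "numpy", "scipy"):
    print(name, installed_version(name) or "not installed")
```

Including this output when reporting a problem makes it easier to tell whether an out-of-date library is the cause.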
If your local environment is still not working you can do two things.
First, you can use a remote machine instead, by clicking on the Launch Notebook icon available for each lecture.
Second, you can report an issue, so we can try to fix your local set up.
We like getting feedback on the lectures so please don’t hesitate to get in touch.
Advanced Quantitative Economics with Python
One way to give feedback is to raise an issue through our issue tracker.
Please be as specific as possible. Tell us where the problem is and as much detail about your local set up as you can
provide.
Another feedback option is to use our discourse forum.
Finally, you can provide direct feedback to [email protected]
FIFTYTWO
REFERENCES
FIFTYTHREE
EXECUTION STATISTICS
!python --version
Python 3.12.7
!conda list
[Abr88] Dilip Abreu. On the theory of infinitely repeated games with discounting. Econometrica, 56:383–396,
1988.
[APS90] Dilip Abreu, David Pearce, and Ennio Stacchetti. Toward a theory of discounted repeated games with
imperfect monitoring. Econometrica, 58(5):1041–1063, September 1990.
[AMSSeppala02] S Rao Aiyagari, Albert Marcet, Thomas J Sargent, and Juha Seppälä. Optimal taxation without state-contingent
debt. Journal of Political Economy, 110(6):1220–1254, 2002.
[AMS02] Franklin Allen, Stephen Morris, and Hyun Song Shin. Beauty contests, bubbles, and iterated expectations
in asset markets. mimeo, 2002.
[AHMS96] Evan Anderson, Lars Peter Hansen, Ellen R. McGrattan, and Thomas J. Sargent. Mechanics of forming
and estimating dynamic linear economies. In Hans M. Amman, David A. Kendrick, and John Rust, editors,
Handbook of computational economics, 171–252. Elsevier Science, North-Holland, 1996.
[AHS03] Evan W. Anderson, Lars Peter Hansen, and Thomas J. Sargent. A Quartet of Semigroups for Model Specification,
Robustness, Prices of Risk, and Model Detection. Journal of the European Economic Association,
1(1):68–123, March 2003. URL: https://ideas.repec.org/a/tpr/jeurec/v1y2003i1p68-123.html.
[Are08] Cristina Arellano. Default risk and income fluctuations in emerging economies. The American Economic
Review, pages 690–712, 2008.
[AP91] Athanasios Papoulis and S Unnikrishna Pillai. Probability, random variables, and stochastic processes.
McGraw Hill, 1991.
[AP11] Orazio P Attanasio and Nicola Pavoni. Risk sharing in private information models with asset accumulation:
explaining the excess smoothness of consumption. Econometrica, 79(4):1027–1068, 2011.
[BCZ14] David Backus, Mikhail Chernov, and Stanley Zin. Sources of Entropy in Representative Agent
Models. Journal of Finance, 69(1):51–99, February 2014. URL:
https://ideas.repec.org/a/bla/jfinan/v69y2014i1p51-99.html.
[BHS09] Francisco Barillas, Lars Peter Hansen, and Thomas J. Sargent. Doubts or variability? Journal
of Economic Theory, 144(6):2388–2418, November 2009. URL:
https://ideas.repec.org/a/eee/jetheo/v144y2009i6p2388-2418.html.
[Bar79] Robert J Barro. On the Determination of the Public Debt. Journal of Political Economy, 87(5):940–971,
1979.
[Bar99] Robert J Barro. Determinants of democracy. Journal of Political economy, 107(S6):S158–S183, 1999.
[BM03] Robert J Barro and Rachel McCleary. Religion and economic growth. Technical Report, National Bureau
of Economic Research, 2003.
[BEGS17] Anmol Bhandari, David Evans, Mikhail Golosov, and Thomas J. Sargent. Fiscal Policy and Debt Manage-
ment with Incomplete Markets. The Quarterly Journal of Economics, 132(2):617–663, 2017.
[BCG18] Alberto Bisin, Gian Luca Clementi, and Piero Gottardi. Capital and hedging demand with incomplete
markets. Technical Report, NYU and EUI, 2018.
[BL92] Fischer Black and Robert Litterman. Global portfolio optimization. Financial analysts journal, 48(5):28–
43, 1992.
[BTWZ24] Job Boerma, Aleh Tsyvinski, Ruodu Wang, and Zhenyuan Zhang. Composite sorting. Technical Report,
University of Wisconsin, 2024.
[Buc04] James A. Bucklew. An Introduction to Rare Event Simulation. Springer Verlag, New York, 2004.
[Cag56] Philip Cagan. The monetary dynamics of hyperinflation. In Milton Friedman, editor, Studies in the Quantity
Theory of Money, pages 25–117. University of Chicago Press, Chicago, 1956.
[Cal78] Guillermo A. Calvo. On the time consistency of optimal policy in a monetary economy. Econometrica,
46(6):1411–1428, 1978.
[CR83] Gary Chamberlain and Michael Rothschild. Arbitrage, Factor Structure, and Mean-Variance Analysis on
Large Asset Markets. Econometrica, 51(5):1281–1304, September 1983. URL:
https://ideas.repec.org/a/ecm/emetrp/v51y1983i5p1281-304.html.
[Cha98] Roberto Chang. Credible monetary policy in an infinite horizon model: recursive approaches. Journal of
Economic Theory, 81(2):431–461, 1998.
[CK90] Varadarajan V Chari and Patrick J Kehoe. Sustainable plans. Journal of Political Economy, pages 783–802,
1990.
[Coa37] Ronald Harry Coase. The nature of the firm. economica, 4(16):386–405, 1937.
[Coc05] John H. Cochrane. Asset Pricing: revised edition. Princeton University Press, Princeton, New Jersey, 2005.
[CC08] J. D. Cryer and K-S. Chan. Time Series Analysis. Springer, 2nd edition, 2008.
[DSS11] Julie Delon, Julien Salomon, and Andrei Sobolevski. Minimum-weight perfect matching for non-intrinsic
distances on the line. arXiv preprint arXiv:1102.1558, 2011.
[DJ92] Raymond J Deneckere and Kenneth L Judd. Cyclical and chaotic behavior in a dynamic equilibrium model,
with implications for fiscal policy. Cycles and chaos in economic equilibrium, pages 308–329, 1992.
[Dic75] J Dickey. Bayesian alternatives to the f-test and least-squares estimate in the normal linear model. In S.E.
Fienberg and A. Zellner, editors, Studies in Bayesian econometrics and statistics, pages 515–554. North-
Holland, Amsterdam, 1975.
[DVGC99] JBR Do Val, JC Geromel, and OLV Costa. Solutions for the linear-quadratic control problem of markov
jump linear systems. Journal of Optimization Theory and Applications, 103(2):283–311, 1999.
[Fri56] M. Friedman. A Theory of the Consumption Function. Princeton University Press, 1956.
[Gal37] Albert Gallatin. Report on the finances, November, 1807. In Reports of the Secretary of the Treasury of
the United States, Vol 1. Government printing office, Washington, DC, 1837.
[GS89] Itzhak Gilboa and David Schmeidler. Maxmin Expected Utility with Non-Unique Prior. Journal of Math-
ematical Economics, 18(2):141–153, apr 1989.
[Hal78] Robert E Hall. Stochastic Implications of the Life Cycle-Permanent Income Hypothesis: Theory and Evi-
dence. Journal of Political Economy, 86(6):971–987, 1978.
[HS08a] L P Hansen and T J Sargent. Robustness. Princeton University Press, 2008.
[Han12] Lars Peter Hansen. Dynamic Valuation Decomposition Within Stochastic Economies. Econometrica,
80(3):911–967, May 2012. URL: https://ideas.repec.org/a/ecm/emetrp/v80y2012i3p911-967.html,
doi:10.3982/ECTA8070.
[HJ91] Lars Peter Hansen and Ravi Jagannathan. Implications of Security Market Data for Models of Dynamic
Economies. Journal of Political Economy, 99(2):225–262, April 1991. URL:
https://ideas.repec.org/a/ucp/jpolec/v99y1991i2p225-62.html, doi:10.1086/261749.
[HR87] Lars Peter Hansen and Scott F Richard. The Role of Conditioning Information in Deducing Testable
Restrictions Implied by Dynamic Asset Pricing Models. Econometrica, 55(3):587–613, May 1987.
[HS80] Lars Peter Hansen and Thomas J Sargent. Formulating and estimating dynamic linear rational expectations
models. Journal of Economic Dynamics and control, 2:7–46, 1980.
[HS00] Lars Peter Hansen and Thomas J Sargent. Wanting robustness in macroeconomics. Manuscript, Department
of Economics, Stanford University., 2000.
[HS08b] Lars Peter Hansen and Thomas J Sargent. Robustness. Princeton University Press, 2008.
[HS01] Lars Peter Hansen and Thomas J. Sargent. Robust control and model uncertainty. American Economic
Review, 91(2):60–66, 2001.
[HS13] Lars Peter Hansen and Thomas J. Sargent. Recursive Linear Models of Dynamic Economics. Princeton
University Press, Princeton, New Jersey, 2013.
[HS24] Lars Peter Hansen and Thomas J. Sargent. Risk, uncertainty, and value. University of Chicago and NYU
manuscript, 2024.
[HST99] Lars Peter Hansen, Thomas J. Sargent, and Thomas D. Tallarini. Robust Permanent Income and
Pricing. Review of Economic Studies, 66(4):873–907, 1999. URL:
https://ideas.repec.org/a/oup/restud/v66y1999i4p873-907..html.
[HK79] J. Michael Harrison and David M. Kreps. Martingales and arbitrage in multiperiod securities markets.
Journal of Economic Theory, 20(3):381–408, June 1979. URL:
https://ideas.repec.org/a/eee/jetheo/v20y1979i3p381-408.html.
[HK85] Elhanan Helpman and Paul Krugman. Market structure and international trade. MIT Press Cambridge,
1985.
[HLL96] O Hernandez-Lerma and J B Lasserre. Discrete-Time Markov Control Processes: Basic Optimality Criteria.
Number Vol 1 in Applications of Mathematics Stochastic Modelling and Applied Probability. Springer,
1996.
[HN97] Hugo A Hopenhayn and Juan Pablo Nicolini. Optimal Unemployment Insurance. Journal of Political Economy,
105(2):412–438, April 1997. URL: https://ideas.repec.org/a/ucp/jpolec/v105y1997i2p412-38.html,
doi:10.1086/262078.
[HR93] Hugo A Hopenhayn and Richard Rogerson. Job Turnover and Policy Evaluation: A General Equilibrium
Analysis. Journal of Political Economy, 101(5):915–938, 1993.
[Jac73] D. H. Jacobson. Optimal stochastic linear systems with exponential performance criteria and their relation
to differential games. IEEE Transactions on Automatic Control, 18(2):124–131, 1973.
[Jud98] K L Judd. Numerical Methods in Economics. Scientific and Engineering. MIT Press, 1998.
[Jud85] Kenneth L Judd. On the performance of patents. Econometrica, pages 567–585, 1985.
[JYC03] Kenneth L. Judd, Sevin Yeltekin, and James Conklin. Computing Supergame Equilibria. Econometrica,
71(4):1239–1254, July 2003. URL: https://ideas.repec.org/a/ecm/emetrp/v71y2003i4p1239-1254.html.
[Kas00] Kenneth Kasa. Forecasting the forecasts of others in the frequency domain. Review of Economic Dynamics,
3:726–756, 2000.
[KNS18] Tomoo Kikuchi, Kazuo Nishimura, and John Stachurski. Span of control, transaction costs, and the struc-
ture of production chains. Theoretical Economics, 13(2):729–760, 2018.
[Kni21] Frank H. Knight. Risk, Uncertainty, and Profit. Houghton Mifflin, 1921.
[Kre81] David M. Kreps. Arbitrage and equilibrium in economies with infinitely many commodities. Journal
of Mathematical Economics, 8(1):15–35, March 1981. URL:
https://ideas.repec.org/a/eee/mateco/v8y1981i1p15-35.html.
[KP80] Finn E Kydland and Edward C Prescott. Dynamic optimal taxation, rational expectations and optimal
control. Journal of Economic Dynamics and Control, 2:79–91, 1980.
[LM94] A Lasota and M C MacKey. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics. Applied Mathe-
matical Sciences. Springer-Verlag, 1994.
[Lea78] Edward E Leamer. Specification searches: Ad hoc inference with nonexperimental data. Volume 53. John
Wiley & Sons Incorporated, 1978.
[LWY13] Eric M. Leeper, Todd B. Walker, and Shu‐Chun Susan Yang. Fiscal foresight and information flows. Econo-
metrica, 81(3):1115–1145, May 2013.
[LS18] L Ljungqvist and T J Sargent. Recursive Macroeconomic Theory. MIT Press, 4th edition, 2018.
[Luc87] Robert E Lucas. Models of business cycles. Volume 26. Oxford Blackwell, 1987.
[Luc78] Robert E Lucas, Jr. Asset prices in an exchange economy. Econometrica: Journal of the Econometric Society,
46(6):1429–1445, 1978.
[LS83] Robert E Lucas, Jr. and Nancy L Stokey. Optimal Fiscal and Monetary Policy in an Economy without
Capital. Journal of monetary Economics, 12(3):55–93, 1983.
[MMR06] Fabio Maccheroni, Massimo Marinacci, and Aldo Rustichini. Ambiguity Aversion, Robustness, and the
Variational Representation of Preferences. Econometrica, 74(6):1147–1498, 2006.
[MT09] S P Meyn and R L Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, 2009.
[MF02] Mario J Miranda and P L Fackler. Applied Computational Economics and Finance. Cambridge: MIT Press,
2002.
[MM58] Franco Modigliani and Merton H. Miller. Corporation finance and the theory of investment. American
Economic Review, XLVIII(3):261–297, 1958.
[Mut60] John F Muth. Optimal properties of exponentially weighted forecasts. Journal of the american statistical
association, 55(290):299–306, 1960.
[Orf88] Sophocles J Orfanidis. Optimum Signal Processing: An Introduction. McGraw Hill Publishing, New York,
New York, 1988.
[PCL86] Joseph Pearlman, David Currie, and Paul Levine. Rational Expectations Models with Private Information.
Economic Modelling, 3(2):90–105, 1986.
[PS05] Joseph G. Pearlman and Thomas J. Sargent. Knowing the Forecasts of Others. Review of Economic Dynamics,
8(2):480–497, April 2005. URL: https://ideas.repec.org/a/red/issued/v8y2005i2p480-497.html,
doi:10.1016/j.red.2004.10.011.
[Put05] Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley &
Sons, 2005.
[Ram27] F. P. Ramsey. A Contribution to the theory of taxation. Economic Journal, 37(145):47–61, 1927.
[REL75] Robert E. Lucas, Jr. An equilibrium model of the business cycle. Journal of Political Economy, 83:1113–
1144, 1975.
[Rom05] Steven Roman. Advanced linear algebra. Volume 3. Springer, 2005.
[RMS94] Sherwin Rosen, Kevin M Murphy, and Jose A Scheinkman. Cattle cycles. Journal of Political Economy,
102(3):468–492, 1994.
[Ros78] Stephen A Ross. A Simple Approach to the Valuation of Risky Streams. The Journal of Business,
51(3):453–475, July 1978. URL: https://ideas.repec.org/a/ucp/jnlbus/v51y1978i3p453-75.html.
[Ros76] Stephen A. Ross. The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13(3):341–
360, December 1976. URL: https://ideas.repec.org/a/eee/jetheo/v13y1976i3p341-360.html.
[Roz67] Y. A. Rozanov. Stationary Random Processes. Holden-Day, San Francisco, 1967.
[Rus96] John Rust. Numerical dynamic programming in economics. Handbook of computational economics, 1:619–
729, 1996.
[RR04] Jaewoo Ryoo and Sherwin Rosen. The engineering labor market. Journal of political economy,
112(S1):S110–S140, 2004.
[SHR91] Thomas Sargent, Lars Peter Hansen, and Will Roberts. Observable implications of present value budget
balance. In Rational Expectations Econometrics. Westview Press, 1991.
[Sar77] Thomas J Sargent. The Demand for Money During Hyperinflations under Rational Expectations: I. Inter-
national Economic Review, 18(1):59–82, February 1977.
[Sar87] Thomas J Sargent. Macroeconomic Theory. Academic Press, New York, 2nd edition, 1987.
[SW73] Thomas J Sargent and Neil Wallace. The stability of models of money and growth with perfect foresight.
Econometrica: Journal of the Econometric Society, pages 1043–1048, 1973.
[Sar91] Thomas J. Sargent. Equilibrium with signal extraction from endogenous variables. Journal of Economic
Dynamics and Control, 15:245–273, 1991.
[SW49] Claude E. Shannon and Warren Weaver. The Mathematical Theory of Communication. University of Illinois
Press, Urbana, 1949.
[SW79] Steven Shavell and Laurence Weiss. The optimal payment of unemployment insurance benefits over time.
Journal of political Economy, 87(6):1347–1362, 1979.
[Shi95] A N Shiriaev. Probability. Graduate texts in mathematics. Springer, 2nd edition, 1995.
[Sin87] Kenneth J. Singleton. Asset prices in a time-series model with disparately informed competitive traders.
In William A. Barnett and Kenneth J. Singleton, editors, New Approaches to Monetary Economics.
Cambridge University Press, 1987.
[SLP89] N L Stokey, R E Lucas, and E C Prescott. Recursive Methods in Economic Dynamics. Harvard University
Press, 1989.
[Sto89] Nancy L Stokey. Reputation and time consistency. The American Economic Review, pages 134–139, 1989.
[Sto91] Nancy L. Stokey. Credible public policy. Journal of Economic Dynamics and Control, 15(4):627–656,
October 1991.
[SW09] Lars E.O. Svensson and Noah Williams. Optimal Monetary Policy under Uncertainty in DSGE Models: A
Markov Jump-Linear-Quadratic Approach. In Klaus Schmidt-Hebbel and Carl E. Walsh, editors (Norman Loayza
and Klaus Schmidt-Hebbel, series editors), Monetary Policy under Uncertainty and Learning,
volume 13 of Central Banking, Analysis, and Economic Policies Book Series, chapter 3, pages 77–114.
Central Bank of Chile, March 2009.
[SW+08] Lars EO Svensson, Noah Williams, and others. Optimal monetary policy under uncertainty: a markov
jump-linear-quadratic approach. Federal Reserve Bank of St. Louis Review, 90(4):275–293, 2008.
[Tal00] Thomas D Tallarini. Risk-sensitive real business cycles. Journal of Monetary Economics, 45(3):507–532,
June 2000.
[Tow83] Robert M. Townsend. Forecasting the forecasts of others. Journal of Political Economy, 91:546–588, 1983.
[Whi63] Peter Whittle. Prediction and regulation by linear least-square methods. English Univ. Press, 1963.
[Whi81] Peter Whittle. Risk-sensitive linear/quadratic/gaussian control. Advances in Applied Probability,
13(4):764–777, 1981.
[Whi83] Peter Whittle. Prediction and Regulation by Linear Least Squares Methods. University of Minnesota Press,
Minneapolis, Minnesota, 2nd edition, 1983.
[Whi90] Peter Whittle. Risk-Sensitive Optimal Control. Wiley, New York, 1990.