Quantitative Economics with Python

July 1, 2019

https://lectures.quantecon.org/py/
Contents

I Introduction to Python

1 About Python
3 An Introductory Example
4 Python Essentials
6 NumPy
7 Matplotlib
8 SciPy
9 Numba
15 Debugging
16 Pandas
44 Consumption and Tax Smoothing with Complete and Incomplete Markets
46 Robustness
I Introduction to Python

1
About Python
1.1 Contents
• Overview 1.2
1.2 Overview
At this stage, it’s not our intention that you try to replicate all you see
We will work through what follows at a slow pace later in the lecture series
Our only objective for this lecture is to give you some feel for what Python is, and what it can do
• communications
• web development
• CGI and graphical user interfaces
• games
• multimedia, data processing, security, etc., etc., etc.
• Google
• Dropbox
• Reddit
• YouTube
• Walt Disney Animation, etc., etc.
The following chart, produced using Stack Overflow Trends, shows one measure of the relative
popularity of Python
The figure indicates not only that Python is widely used but also that adoption of Python
has accelerated significantly since 2012
We suspect this is driven at least in part by uptake in the scientific domain, particularly in
rapidly growing fields like data science
For example, the popularity of pandas, a library for data analysis with Python, has exploded, as seen here
(The corresponding time path for MATLAB is shown for comparison)
Note that pandas takes off in 2012, which is the same year that we see Python’s popularity begin to spike in the first figure
Overall, it’s clear that Python’s popularity is rising
1.3.3 Features
One nice feature of Python is its elegant syntax — we’ll see many examples later on
Elegant code might sound superfluous but in fact it’s highly beneficial because it makes the
syntax easy to read and easy to remember
Remembering how to read from files, sort dictionaries and other such routine tasks means
that you don’t need to break your flow in order to hunt down correct syntax
Closely related to elegant syntax is an elegant design
Features like iterators, generators, decorators, list comprehensions, etc. make Python highly
expressive, allowing you to get more done with less code
Namespaces improve productivity by cutting down on bugs and syntax errors
Fundamental matrix and array processing capabilities are provided by the excellent NumPy
library
NumPy provides the basic array data type plus some simple processing operations
For example, let’s build some arrays
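The array-building cell was lost in extraction; a sketch consistent with the inner product shown below (the grid over [−π, π] and the names a, b, c are assumptions, though any symmetric grid of cosines and sines gives an essentially-zero inner product):

```python
import numpy as np

a = np.linspace(-np.pi, np.pi, 100)  # a grid of 100 evenly spaced points
b = np.cos(a)                        # apply cosine to each element of a
c = np.sin(a)                        # apply sine to each element of a
print(b @ c)                         # inner product, essentially zero by symmetry
```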
In [2]: b @ c
Out[2]: 1.5265566588595902e-16
The number you see here might vary slightly but it’s essentially zero
(For older versions of Python and NumPy you need to use the np.dot function)
The SciPy library is built on top of NumPy and provides additional functionality
For example, let’s calculate ∫_{-2}^{2} ϕ(z) dz where ϕ is the standard normal density
In [3]: from scipy.stats import norm
from scipy.integrate import quad

ϕ = norm()
value, error = quad(ϕ.pdf, -2, 2)  # Integrate using Gaussian quadrature
value
Out[3]: 0.9544997361036417
• linear algebra
• integration
• interpolation
• optimization
• distributions and random number generation
• signal processing
• etc., etc.
1.4.2 Graphics
The most popular and comprehensive Python library for creating figures and graphs is Matplotlib
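The lecture’s own figures were images and did not survive extraction; a minimal usage sketch (the file name and the sine-curve data are illustrative, and we save to disk rather than call plt.show() so the sketch works in a script):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # non-interactive backend, suitable for scripts
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))            # a simple 2D line plot
fig.savefig("sine.png")          # write the figure to disk
```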
Example 3D plot
• Plotly
• Bokeh
• VPython — 3D graphics and animations
1.4.3 Symbolic Algebra
Out[4]: 3*x + y
We can solve polynomials
solve(x**2 + x + 2)
limit(1 / x, x, 0)
Out[7]: oo
In [8]: limit(sin(x) / x, x, 0)
Out[8]: 1
In [9]: diff(sin(x), x)
Out[9]: cos(x)
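The symbol definitions for these SymPy cells were lost in extraction; a self-contained sketch consistent with the outputs above (the grouping into a single cell is an assumption) is:

```python
from sympy import symbols, solve, limit, sin, diff

x, y = symbols('x y')
print(x + x + x + y)           # symbolic simplification: 3*x + y
print(solve(x**2 + x + 2))     # roots of the polynomial (complex in this case)
print(limit(1 / x, x, 0))      # oo
print(limit(sin(x) / x, x, 0)) # 1
print(diff(sin(x), x))         # cos(x)
```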
The beauty of importing this functionality into Python is that we are working within a fully
fledged programming language
We can easily create tables of derivatives, generate LaTeX output, add it to figures, and so on
1.4.4 Statistics
Python’s data manipulation and statistics libraries have improved rapidly over the last few
years
Pandas
One of the most popular libraries for working with data is pandas
Pandas is fast, efficient, flexible and well designed
Here’s a simple example, using some fake data
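The cell that built the DataFrame was lost in extraction; a sketch consistent with the table below (the seed 1234 and the ISO date format are assumptions, chosen because they reproduce the values shown) is:

```python
import numpy as np
import pandas as pd

np.random.seed(1234)                            # assumed seed; reproduces the table
data = np.random.randn(5, 2)                    # 5x2 matrix of N(0, 1) draws
dates = pd.date_range('2010-12-28', periods=5)  # daily dates starting 28 Dec 2010
df = pd.DataFrame(data, columns=('price', 'weight'), index=dates)
print(df)
```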
price weight
2010-12-28 0.471435 -1.190976
2010-12-29 1.432707 -0.312652
2010-12-30 -0.720589 0.887163
2010-12-31 0.859588 -0.636524
2011-01-01 0.015696 -2.242685
In [11]: df.mean()
Here’s some example code that generates and plots a random graph, with node color determined by shortest path length from a central node
Running your Python code on massive servers in the cloud is becoming easier and easier
A nice example is Anaconda Enterprise
See also
- Amazon Elastic Compute Cloud
- The Google App Engine (Python, Java, PHP or Go)
- Pythonanywhere
- Sagemath Cloud
Apart from the cloud computing options listed above, you might like to consider
- Parallel computing through IPython clusters
- The Starcluster interface to Amazon’s EC2
- GPU programming through PyCuda, PyOpenCL, Theano or similar
There are many other interesting developments with scientific programming in Python
Some representative examples include
- Jupyter — Python in your browser with code cells, embedded images, etc.
- Numba — Make Python run at the same speed as native machine code!
- Blaze — a generalization of NumPy
- PyTables — manage large data sets
- CVXPY — convex optimization in Python
2
Setting Up Your Python Environment

2.1 Contents
• Overview 2.2
• Anaconda 2.3
• Exercises 2.8
2.2 Overview
In this lecture, you will learn how to

1. get a Python environment up and running with all the necessary tools
2. execute simple Python commands
3. run a sample program
4. install the code libraries that underpin these lectures
2.3 Anaconda
The core Python package is easy to install but not what you should choose for these lectures

These lectures require the entire scientific programming ecosystem, which the core installation doesn’t provide
Hence the best approach for our purposes is to install a free Python distribution that contains the core language plus the most popular scientific libraries. We recommend the Anaconda distribution, which is
• very popular
• cross platform
• comprehensive
• completely unrelated to the Nicki Minaj song of the same name
Anaconda also comes with a great package management system to organize your code libraries
All of what follows assumes that you adopt this recommendation!
Installing Anaconda is straightforward: download the binary and follow the instructions
Important points:
Anaconda supplies a tool called conda to manage and upgrade your Anaconda packages
One conda command you should execute regularly is the one that updates the whole Anaconda distribution
As a practice run, please execute the following
1. Open up a terminal
2. Type conda update anaconda
Jupyter notebooks are one of the many possible ways to interact with Python and the scientific libraries
They use a browser-based interface to Python with
Because of these possibilities, Jupyter is fast turning into a major player in the scientific computing ecosystem
Here’s an image showing execution of some code (borrowed from here) in a Jupyter notebook
You can find a nice example of the kinds of things you can do in a Jupyter notebook (such as
include maths and text) here
While Jupyter isn’t the only way to code in Python, it’s great for when you wish to
Once you have installed Anaconda, you can start the Jupyter notebook by either searching for Jupyter in your applications menu, or opening up a terminal and typing jupyter notebook

If you use the second option, you will see something like this (click to enlarge)
Thus, the Jupyter kernel is listening for Python commands on port 8888 of our local machine
Hopefully, your default browser has also opened up with a web page that looks something like
this (click to enlarge)
The notebook displays an active cell, into which you can type Python commands
Let’s start with how to edit code and run simple programs
Running Cells
Notice that in the previous figure the cell is surrounded by a green border
This means that the cell is in edit mode
As a result, you can type in Python code and it will appear in the cell
When you’re ready to execute the code in a cell, hit Shift-Enter instead of the usual Enter
(Note: There are also menu and button options for running code in a cell that you can find
by exploring)
Modal Editing
The next thing to understand about the Jupyter notebook is that it uses a modal editing system
This means that the effect of typing at the keyboard depends on which mode you are in
The two modes are

1. Edit mode
2. Command mode
To switch to

• command mode from edit mode, hit the Esc key or Ctrl-M
• edit mode from command mode, hit Enter
The modal behavior of the Jupyter notebook is a little tricky at first but very efficient when
you get used to it
User Interface Tour
At this stage, we recommend you take your time to
• look at the various options in the menus and see what they do
• take the “user interface tour”, which can be accessed through the help menu
import numpy as np
import matplotlib.pyplot as plt

N = 20
θ = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 10 * np.random.rand(N)
width = np.pi / 4 * np.random.rand(N)
ax = plt.subplot(111, polar=True)
bars = ax.bar(θ, radii, width=width, bottom=0.0)
plt.show()
Don’t worry about the details for now — let’s just run it and see what happens
The easiest way to run this code is to copy and paste into a cell in the notebook
(In older versions of Jupyter you might need to add the command %matplotlib inline
before you generate the figure)
Clicking on the top right of the lower split closes the on-line help
Other Content
In addition to executing code, the Jupyter notebook allows you to embed text, equations, figures and even videos in the page
For example, here we enter a mixture of plain text and LaTeX instead of code
Next we Esc to enter command mode and then type m to indicate that we are writing Markdown, a mark-up language similar to (but simpler than) LaTeX
(You can also use your mouse to select Markdown from the Code drop-down box just below
the list of menu items)
Now we Shift+Enter to produce this
Notebook files are just text files structured in JSON and typically ending with .ipynb
You can share them in the usual way that you share files — or by using web services such as
nbviewer
The notebooks you see on that site are static html representations
To run one, download it as an ipynb file by clicking on the download icon at the top right
Save it somewhere, navigate to it from the Jupyter dashboard and then run as discussed
above
QuantEcon has its own site for sharing Jupyter notebooks related to economics – QuantEcon
Notes
Notebooks submitted to QuantEcon Notes can be shared with a link, and are open to comments and votes by the community
To run an existing Python file, such as test.py, type %run test.py into a cell
Alternatively, you can type the following into a terminal
Using the run command is often easier than copy and paste
(You might find that the % is unnecessary — use %automagic to toggle the need for %)
Note that Jupyter only looks for test.py in the present working directory (PWD)
If test.py isn’t in that directory, you will get an error
Let’s look at a successful example, where we run a file test.py with contents:
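The listing of test.py was lost in extraction; hypothetical contents consistent with the five lines of output shown below:

```python
# test.py: hypothetical reconstruction matching the output shown
for i in range(5):
    print("foobar")
```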
foobar
foobar
foobar
foobar
foobar
Here
• pwd asks Jupyter to show the PWD (or %pwd — see the comment about automagic
above)
– Note that test.py is there (on our computer, because we saved it there earlier)
• cat test.py asks Jupyter to print the contents of test.py (or !type test.py on
Windows)
If you’re trying to run a file not in the present working directory, you’ll get an error
To fix this error you need to either
One way to achieve the first option is to use the Upload button
• The button is on the top level dashboard, where Jupyter first opened to
• Look where the pointer is in this picture
Note: You can type the first letter or two of each directory name and then use the tab key to
expand
It’s often convenient to be able to see your code before you run it
The preceding discussion covers most of what you need to know to interact with this website
However, as you start to write longer programs, you might want to experiment with your
workflow
There are many different options and we mention them only in passing
2.7.1 JupyterLab

2.7.2 Text Editors

A text editor is an application that is specifically designed to work with text files — such as Python programs
Nothing beats the power and efficiency of a good text editor for working with program text
A good text editor will provide
• efficient text editing commands (e.g., copy, paste, search and replace)
• syntax highlighting, etc.
The IPython shell has many of the features of the notebook: tab completion, color syntax,
etc.
It also has command history through the arrow keys

The up arrow key brings previously typed commands to the prompt

This saves a lot of typing…
Here’s one set up, on a Linux box, with
2.7.4 IDEs
IDEs are Integrated Development Environments, which allow you to edit, execute and interact with code from an integrated environment
One of the most popular in recent times is VS Code, which is now available via Anaconda
We hear good things about VS Code — please tell us about your experiences on the forum
2.8 Exercises
2.8.1 Exercise 1
If Jupyter is still running, quit by using Ctrl-C at the terminal where you started it
Now launch again, but this time using jupyter notebook --no-browser
This should start the kernel without launching the browser
Note also the startup message: It should give you a URL such as http://localhost:8888 where the notebook is running
Now
2.8.2 Exercise 2
As an exercise, try
1. Installing Git
2. Getting a copy of QuantEcon.py using Git
For example, if you’ve installed the command line version, open up a terminal and enter
(This is just git clone in front of the URL for the repository)
Even better,
1. Sign up to GitHub
2. Look into ‘forking’ GitHub repositories (forking means making your own copy of a
GitHub repository, stored on GitHub)
3. Fork QuantEcon.py
4. Clone your fork to some local directory, make edits, commit them, and push them back
up to your forked GitHub repo
5. If you made a valuable improvement, send us a pull request!
3
An Introductory Example
3.1 Contents
• Overview 3.2
• Version 1 3.4
• Exercises 3.6
• Solutions 3.7
Note: These references offer help on installing Python but you should probably stick with the
method on our set up page
You’ll then have an outstanding scientific computing environment (Anaconda) and be ready
to move on to the rest of our course
3.2 Overview
In this lecture, we will write and then pick apart small Python programs
The objective is to introduce you to basic Python syntax and data structures
Deeper concepts will be covered in later lectures
3.2.1 Prerequisites
Suppose we want to simulate and plot the white noise process 𝜖0 , 𝜖1 , … , 𝜖𝑇 , where each draw
𝜖𝑡 is independent standard normal
In other words, we want to generate figures that look something like this:
3.4 Version 1
Here are a few lines of code that perform the task we set
In [1]: import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(100)
plt.plot(x)
plt.show()
After import numpy as np we have access to these attributes via the syntax np.attribute
Here’s another example
np.sqrt(4)
Out[2]: 2.0
In [3]: import numpy

numpy.sqrt(4)
Out[3]: 2.0
In fact, you can find and explore the directory for NumPy on your computer easily enough if
you look around
On this machine, it’s located in
anaconda3/lib/python3.6/site-packages/numpy
Subpackages
Consider the line x = np.random.randn(100)
Here np refers to the package NumPy, while random is a subpackage of NumPy
You can see the contents here
Subpackages are just packages that are subdirectories of another package
np.sqrt(4)
Out[4]: 2.0
In [5]: from numpy import sqrt

sqrt(4)
Out[5]: 2.0
ts_length = 100
ϵ_values = []   # Empty list

for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)

plt.plot(ϵ_values)
plt.show()
In brief,
3.5.2 Lists
In [7]: x = [10, 'foo', False] # We can include heterogeneous data inside a list
type(x)
Out[7]: list
The first element of x is an integer, the next is a string and the third is a Boolean value
When adding a value to a list, we can use the syntax list_name.append(some_value)
In [8]: x
In [9]: x.append(2.5)
x
Here append() is what’s called a method, which is a function “attached to” an object—in
this case, the list x
We’ll learn all about methods later on, but just to give you some idea,
• Python objects such as lists, strings, etc. all have methods that are used to manipulate
the data contained in the object
• String objects have string methods, list objects have list methods, etc.
In [10]: x
In [11]: x.pop()
Out[11]: 2.5
In [12]: x
In [13]: x
In [14]: x[0]
Out[14]: 10
In [15]: x[1]
Out[15]: 'foo'
Now let’s consider the for loop from the program above, which was
Python executes the two indented lines ts_length times before moving on
These two lines are called a code block, since they comprise the “block” of code that we
are looping over
Unlike most other languages, Python knows the extent of the code block only from indentation
In our program, indentation decreases after line ϵ_values.append(e), telling Python that this line marks the lower limit of the code block
More on indentation below—for now, let’s look at another example of a for loop
This example helps to clarify how the for loop works: When we execute a loop of the form

for variable_name in sequence:
    <code block>

the Python interpreter performs the following:

• For each element of the sequence, it “binds” the name variable_name to that element and then executes the code block
The sequence object can in fact be a very general object, as we’ll see soon enough
In discussing the for loop, we explained that the code blocks being looped over are delimited
by indentation
In fact, in Python, all code blocks (i.e., those occurring inside loops, if clauses, function definitions, etc.) are delimited by indentation
Thus, unlike most other languages, whitespace in Python code affects the output of the program
Once you get used to it, this is a good thing: It
On the other hand, it takes a bit of care to get right, so please remember:
• The line before the start of a code block always ends in a colon
– for i in range(10):
– if x > y:
– while x < 100:
– etc., etc.
• All lines in a code block must have the same amount of indentation
• The Python standard is 4 spaces, and that’s what you should use
Tabs vs Spaces
One small “gotcha” here is the mixing of tabs and spaces, which often leads to errors
(Important: Within text files, the internal representation of tabs and spaces is not the same)
You can use your Tab key to insert 4 spaces, but you need to make sure it’s configured to do
so
If you are using a Jupyter notebook you will have no problems here
Also, good text editors will allow you to configure the Tab key to insert spaces instead of tabs — try searching online
The for loop is the most common technique for iteration in Python
But, for the purpose of illustration, let’s modify the program above to use a while loop instead
Note that
• the code block for the while loop is again delimited only by indentation
• the statement i = i + 1 can be replaced by i += 1
Now let’s go back to the for loop, but restructure our program to make the logic clearer
To this end, we will break our program into two parts:
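The function definition itself was lost in extraction; a sketch consistent with the call generate_data(100) below (the body mirrors the earlier loop that filled ϵ_values) is:

```python
import numpy as np

def generate_data(n):
    ϵ_values = []
    for i in range(n):
        e = np.random.randn()   # one standard normal draw per iteration
        ϵ_values.append(e)
    return ϵ_values
```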
data = generate_data(100)
plt.plot(data)
plt.show()
Let’s go over this carefully, in case you’re not familiar with functions and how they work
We have defined a function called generate_data() as follows
This whole function definition is read by the Python interpreter and stored in memory
When the interpreter gets to the expression generate_data(100), it executes the function
body with n set equal to 100
The net result is that the name data is bound to the list ϵ_values returned by the function
3.5.7 Conditions
def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        if generator_type == 'U':
            e = np.random.uniform(0, 1)
        else:
            e = np.random.randn()
        ϵ_values.append(e)
    return ϵ_values
Hopefully, the syntax of the if/else clause is self-explanatory, with indentation again delimit-
ing the extent of the code blocks
Notes
Now, there are several ways that we can simplify the code above
For example, we can get rid of the conditionals altogether by just passing the desired generator type as a function
To understand this, consider the following version
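The version in question was lost in extraction; a sketch consistent with the discussion that follows (functions passed as arguments, calls like generator_type()) is:

```python
import numpy as np

def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        e = generator_type()      # call whatever function was passed in
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, np.random.uniform)
print(len(data))
```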
This principle works more generally—for example, consider the following piece of code
In [22]: max(7, 2, 4)

Out[22]: 7
In [23]: m = max
m(7, 2, 4)
Out[23]: 7
Here we created another name for the built-in function max(), which could then be used in
identical ways
In the context of our program, the ability to bind new names to functions means that there is
no problem passing a function as an argument to another function—as we did above
We can also simplify the code for generating the list of random draws considerably by using
something called a list comprehension
List comprehensions are an elegant Python tool for creating lists
Consider the following example, where the list comprehension is on the right-hand side of the
second line
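The example itself was lost in extraction; a minimal sketch in the spirit of the text (the animal names are illustrative), with the comprehension on the right-hand side of the second line:

```python
animals = ['dog', 'cat', 'bird']
plurals = [animal + 's' for animal in animals]   # the list comprehension
print(plurals)
```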
In [25]: range(8)
Out[25]: range(0, 8)
ϵ_values = []
for i in range(n):
    e = generator_type()
    ϵ_values.append(e)
into
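The one-line replacement was lost in extraction; the comprehension equivalent of the loop above (here generator_type is bound to np.random.randn purely so the sketch is self-contained) is:

```python
import numpy as np

n = 100
generator_type = np.random.randn                    # stand-in binding for the sketch
ϵ_values = [generator_type() for i in range(n)]     # one line replaces the loop
print(len(ϵ_values))
```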
3.6 Exercises
3.6.1 Exercise 1
Recall that n! is defined as n! = n × (n − 1) × ⋯ × 2 × 1

Write a function factorial such that factorial(n) returns n! for any positive integer n, without using any external library
3.6.2 Exercise 2
The binomial random variable 𝑌 ∼ 𝐵𝑖𝑛(𝑛, 𝑝) represents the number of successes in 𝑛 binary
trials, where each trial succeeds with probability 𝑝
Without any import besides from numpy.random import uniform, write a function
binomial_rv such that binomial_rv(n, p) generates one draw of 𝑌
Hint: If 𝑈 is uniform on (0, 1) and 𝑝 ∈ (0, 1), then the expression U < p evaluates to True
with probability 𝑝
3.6.3 Exercise 3
Compute an approximation to π using Monte Carlo. Use no imports besides NumPy. Your hints are as follows:
• If 𝑈 is a bivariate uniform random variable on the unit square (0, 1)2 , then the proba-
bility that 𝑈 lies in a subset 𝐵 of (0, 1)2 is equal to the area of 𝐵
• If 𝑈1 , … , 𝑈𝑛 are IID copies of 𝑈 , then, as 𝑛 gets large, the fraction that falls in 𝐵, con-
verges to the probability of landing in 𝐵
• For a circle, area = pi * radius^2
3.6.4 Exercise 4
Write a program that prints one realization of the following random device:
• Flip an unbiased coin 10 times
• If 3 consecutive heads occur one or more times within this sequence, pay one dollar
• If not, pay nothing
3.6.5 Exercise 5
Your next task is to simulate and plot the correlated time series

x_{t+1} = α x_t + ϵ_{t+1}, where x_0 = 0 and t = 0, …, T

Here ϵ_0, ϵ_1, …, ϵ_{T+1} are IID and standard normal; set T = 200 and α = 0.9
3.6.6 Exercise 6
To do the next exercise, you will need to know how to produce a plot legend
The following example should be sufficient to convey the idea
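The example was lost in extraction; a sketch of legend usage consistent with the surrounding hints (the label text is illustrative, and we save to a file rather than call plt.show() so the sketch runs in a script):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")              # non-interactive backend for scripted use
import matplotlib.pyplot as plt

x = np.random.randn(100)
plt.plot(x, label="white noise")   # the label= keyword is picked up by legend()
plt.legend()
plt.savefig("legend_example.png")
```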
Now, starting with your solution to exercise 5, plot three simulated time series, one for each
of the cases 𝛼 = 0, 𝛼 = 0.8 and 𝛼 = 0.98
In particular, you should produce (modulo randomness) a figure that looks as follows
(The figure nicely illustrates how time series with the same one-step-ahead conditional volatilities, as these three processes have, can have very different unconditional volatilities.)
Use a for loop to step through the 𝛼 values
Important hints:
• If you call the plot() function multiple times before calling show(), all of the lines you produce will end up on the same figure
  – And if you omit the argument 'b-' to the plot function, Matplotlib will automatically select different colors for each line
3.7 Solutions
3.7.1 Exercise 1
In [30]: def factorial(n):
    k = 1
    for i in range(n):
        k = k * (i + 1)
    return k

factorial(4)
Out[30]: 24
3.7.2 Exercise 2
In [31]: from numpy.random import uniform
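The function definition was lost in extraction; a sketch consistent with the exercise statement (count uniform draws below p) is:

```python
from numpy.random import uniform

def binomial_rv(n, p):
    count = 0
    for i in range(n):
        U = uniform()
        if U < p:          # success with probability p
            count += 1
    return count

print(binomial_rv(10, 0.5))
```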
binomial_rv(10, 0.5)
Out[31]: 5
3.7.3 Exercise 3
In [32]: n = 100000
count = 0
for i in range(n):
    u, v = np.random.uniform(), np.random.uniform()
    d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
    if d < 0.5:
        count += 1

area_estimate = count / n
print(area_estimate * 4)  # dividing by radius**2

3.13976
3.7.4 Exercise 4
In [33]: from numpy.random import uniform

payoff = 0
count = 0

for i in range(10):
    U = uniform()
    count = count + 1 if U < 0.5 else 0
    if count == 3:
        payoff = 1

print(payoff)
1
3.7.5 Exercise 5
The next line embeds all subsequent figures in the browser itself

%matplotlib inline

In [34]: α = 0.9
ts_length = 200
current_x = 0

x_values = []
for i in range(ts_length + 1):
    x_values.append(current_x)
    current_x = α * current_x + np.random.randn()

plt.plot(x_values)
plt.show()
3.7.6 Exercise 6

αs = [0.0, 0.8, 0.98]
ts_length = 200

for α in αs:
    x_values = []
    current_x = 0
    for i in range(ts_length):
        x_values.append(current_x)
        current_x = α * current_x + np.random.randn()
    plt.plot(x_values, label=f'α = {α}')
plt.legend()
plt.show()
4
Python Essentials
4.1 Contents
• Iterating 4.4
• Exercises 4.8
• Solutions 4.9
In this lecture, we’ll cover features of the language that are essential to reading and writing
Python code
We’ve already met several built-in Python data types, such as strings, integers, floats and
lists
Let’s learn a bit more about them
One simple data type is Boolean values, which can be either True or False
In [1]: x = True
x
Out[1]: True
In the next line of code, the interpreter evaluates the expression on the right of = and binds y to this value

In [2]: y = 100 < 10
y

Out[2]: False
In [3]: type(y)
Out[3]: bool
In [4]: x + y
Out[4]: 1
In [5]: x * y
Out[5]: 0
In [6]: True + True

Out[6]: 2

In [7]: bools = [True, True, False, True]  # List of Boolean values

sum(bools)

Out[7]: 3
The two most common data types used to represent numbers are integers and floats
In [8]: a, b = 1, 2
c, d = 2.5, 10.0
type(a)
Out[8]: int
In [9]: type(c)
Out[9]: float
Computers distinguish between the two because, while floats are more informative, arithmetic
operations on integers are faster and more accurate
As long as you’re using Python 3.x, division of integers yields floats
In [10]: 1 / 2
Out[10]: 0.5
But be careful! If you’re still using Python 2.x, division of two integers returns only the integer part
For integer division in Python 3.x use this syntax:
In [11]: 1 // 2
Out[11]: 0
In [12]: x = complex(1, 2)
y = complex(2, 1)
x * y
Out[12]: 5j
4.2.2 Containers
Python has several basic types for storing collections of (possibly heterogeneous) data
We’ve already discussed lists
A related data type is tuples, which are “immutable” lists

In [13]: x = ('a', 'b')  # Parentheses instead of the square brackets
In [14]: type(x)
Out[14]: tuple
In Python, an object is called immutable if, once created, the object cannot be changed
Conversely, an object is mutable if it can still be altered after creation
Python lists are mutable
In [15]: x = [1, 2]
x[0] = 10
x
Out[15]: [10, 2]
In [16]: x = (1, 2)
x[0] = 10
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-d1b2647f6c81> in <module>
      1 x = (1, 2)
----> 2 x[0] = 10

TypeError: 'tuple' object does not support item assignment
We’ll say more about the role of mutable and immutable data a bit later
Tuples (and lists) can be “unpacked” as follows

In [17]: integers = (10, 20, 30)
x, y, z = integers
x

Out[17]: 10
In [18]: y
Out[18]: 20
In [19]: a = [2, 4, 6, 8]
a[1:]
Out[19]: [4, 6, 8]
In [20]: a[1:3]
Out[20]: [4, 6]
In [21]: a[-2:]  # Last two elements of the list

Out[21]: [6, 8]
In [22]: s = 'foobar'
s[-3:] # Select the last three elements
Out[22]: 'bar'
In [23]: d = {'name': 'Frodo', 'age': 33}
type(d)

Out[23]: dict
In [24]: d['age']
Out[24]: 33
In [25]: s1 = {'a', 'b'}
type(s1)

Out[25]: set

In [26]: s2 = {'b', 'c'}
s1.issubset(s2)

Out[26]: False
In [27]: s1.intersection(s2)
Out[27]: {'b'}
Let’s briefly review reading and writing to text files, starting with writing
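The code cells for this example were lost in extraction; a sketch consistent with the file name and the output shown later (the two 'Testing' lines) is:

```python
# Hypothetical reconstruction of the write-then-read example
f = open('newfile.txt', 'w')    # open 'newfile.txt' for writing
f.write('Testing\n')            # here '\n' means new line
f.write('Testing again')
f.close()

f = open('newfile.txt', 'r')    # now open the same file for reading
out = f.read()
f.close()
print(out)
```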
Here, the built-in function open() creates a file object for writing to
In [30]: %pwd
Out[30]: '/home/anju/Desktop/lecture-source-py/_build/jupyter/executed'
In [32]: print(out)
Testing
Testing again
4.3.1 Paths
Note that if newfile.txt is not in the present working directory then this call to open()
fails
In this case, you can shift the file to the pwd or specify the full path to the file
f = open('insert_full_path_to_file/newfile.txt', 'r')
4.4 Iterating
One of the most important tasks in computing is stepping through a sequence of data and
performing a given action
One of Python’s strengths is its simple, flexible interface to this kind of iteration via the for
loop
Many Python objects are “iterable”, in the sense that they can be looped over
To give an example, let’s write the file us_cities.txt, which lists US cities and their population, to the present working directory
Overwriting us_cities.txt
Suppose that we want to make the information more readable, by capitalizing names and
adding commas to mark thousands
The us_cities.py program reads the data in and makes the conversion:
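The program listing was lost in extraction; a sketch consistent with the description (the city names and populations are illustrative, and the sketch recreates the data file first so it is self-contained):

```python
# Hypothetical reconstruction; first recreate a small us_cities.txt
with open('us_cities.txt', 'w') as f:
    f.write('new york: 8244910\nlos angeles: 3819702\nchicago: 2707120')

data_file = open('us_cities.txt', 'r')
for line in data_file:
    city, population = line.split(':')             # tuple unpacking
    city = city.title()                            # capitalize city names
    population = '{0:,}'.format(int(population))   # add commas to mark thousands
    print(city.ljust(15) + population)
data_file.close()
```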
Here format() is a string method used for inserting variables into strings
The reformatting of each line is the result of three different string methods, the details of
which can be left till later
The interesting part of this program for us is line 2, which shows that
1. The file object f is iterable, in the sense that it can be placed to the right of in within
a for loop
2. Iteration steps through each line in the file
One thing you might have noticed is that Python tends to favor looping without explicit indexing
For example,

x_values = [1, 2, 3]  # some iterable x
for x in x_values:
    print(x * x)

1
4
9
is preferred to

for i in range(len(x_values)):
    print(x_values[i] * x_values[i])

1
4
9
When you compare these two alternatives, you can see why the first one is preferred
Python provides some facilities to simplify looping without indices
One is zip(), which is used for stepping through pairs from two sequences
For example, try running the following code
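The zip example itself was lost in extraction; a minimal sketch (the country and city names are illustrative):

```python
countries = ('Japan', 'Korea', 'China')
cities = ('Tokyo', 'Seoul', 'Beijing')
for country, city in zip(countries, cities):       # step through pairs
    print(f'The capital of {country} is {city}')
```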
The zip() function is also useful for creating dictionaries — for example
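The dictionary example was lost in extraction; a minimal sketch (the names and marks are illustrative):

```python
names = ['Tom', 'John']
marks = ['E', 'F']
print(dict(zip(names, marks)))   # pair up keys and values
```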
If we actually need the index from a list, one option is to use enumerate()
To understand what enumerate() does, consider the following example

letter_list = ['a', 'b', 'c']
for index, letter in enumerate(letter_list):
    print(f"letter_list[{index}] = '{letter}'")

letter_list[0] = 'a'
letter_list[1] = 'b'
letter_list[2] = 'c'
4.5.1 Comparisons
Many different kinds of expressions evaluate to one of the Boolean values (i.e., True or
False)
A common type is comparisons, such as
In [41]: x, y = 1, 2
x < y
Out[41]: True
In [42]: x > y
Out[42]: False
In [43]: 1 < 2 < 3  # Chained comparisons are also possible

Out[43]: True

In [44]: 1 <= 2 <= 3

Out[44]: True
In [45]: x = 1 # Assignment
x == 2 # Comparison
Out[45]: False
In [46]: 1 != 2
Out[46]: True
Note that when testing conditions, we can use any valid Python expression

In [47]: x = 'yes' if 42 else 'no'
x

Out[47]: 'yes'

In [48]: x = 'yes' if [] else 'no'
x

Out[48]: 'no'
• Expressions that evaluate to zero, empty sequences or containers (strings, lists, etc.)
and None are all equivalent to False
We can combine expressions using and, or and not, the standard logical connectives

In [49]: 1 < 2 and 'f' in 'foo'

Out[49]: True

In [50]: 1 < 2 and 'g' in 'foo'

Out[50]: False

In [51]: 1 < 2 or 'g' in 'foo'

Out[51]: True

In [52]: not True

Out[52]: False

In [53]: not not True

Out[53]: True
Remember

• P and Q is True if both are True, else False
• P or Q is False if both are False, else True
Let’s talk a bit more about functions, which are all important for good programming style
Python has a number of built-in functions that are available without import
We have already met some
In [54]: max(19, 20)

Out[54]: 20

In [55]: range(4)  # in Python 3 this returns a range iterator object

Out[55]: range(0, 4)
In [56]: list(range(4)) # will evaluate the range iterator and create a list
Out[56]: [0, 1, 2, 3]
In [57]: str(22)
Out[57]: '22'
In [58]: type(22)
Out[58]: int
Out[59]: False
Out[60]: True
User-defined functions are important for improving the clarity of your code by
Functions without a return statement automatically return the special Python object None
4.6.3 Docstrings
Python has a system for adding comments to functions, modules, etc. called docstrings
The nice thing about docstrings is that they are available at run-time
Try running this
In [62]: def f(x):
    """
    This function squares its argument
    """
    return x**2
In [63]: f?
Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Docstring: This function squares its argument
In [64]: f??
Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Source:
def f(x):
    """
    This function squares its argument
    """
    return x**2
With one question mark we bring up the docstring, and with two we get the source code as
well
For example, quad expects a function as its first argument; rather than defining and naming a one-off function just to integrate x³ over [0, 2], we can pass an anonymous one

quad(lambda x: x**3, 0, 2)
Here the function created by lambda is said to be anonymous because it was never given a
name
If you did the exercises in the previous lecture, you would have come across the statement

plt.plot(x, 'b-', label="white noise")

In this call to Matplotlib’s plot function, notice that the last argument is passed in name=argument syntax
This is called a keyword argument, with label being the keyword
Non-keyword arguments are called positional arguments, since their meaning is determined by
order
Keyword arguments are particularly useful when a function has a lot of arguments, in which
case it’s hard to remember the right order
You can adopt keyword arguments in user-defined functions with no difficulty
The next example illustrates the syntax

def f(x, a=1, b=1):
    return a + b * x
The keyword argument values we supplied in the definition of f become the default values
In [69]: f(2)
Out[69]: 3
In [70]: f(2, a=4, b=5)

Out[70]: 14
To learn more about the Python programming philosophy type import this at the prompt
Among other things, Python strongly favors consistency in programming style
We’ve all heard the saying about consistency and little minds
In programming, as in mathematics, the opposite is true
• A mathematical paper where the symbols ∪ and ∩ were reversed would be very hard to
read, even if the author told you so on the first page
4.8 Exercises
4.8.1 Exercise 1
Part 1: Given two numeric lists or tuples x_vals and y_vals of equal length, compute their
inner product using zip()
Part 2: In one line, count the number of even numbers in 0,…,99
Part 3: Given pairs = ((2, 5), (4, 2), (9, 8), (12, 10)), count the number of
pairs (a, b) such that both a and b are even
4.8.2 Exercise 2
p(x) = a_0 + a_1 x + a_2 x^2 + ⋯ + a_n x^n = ∑_{i=0}^{n} a_i x^i (1)
Write a function p(x, coeff) that computes the value in Eq. (1) given a point x and a list of coefficients coeff
Try to use enumerate() in your loop
4.8.3 Exercise 3
Write a function that takes a string as an argument and returns the number of capital letters
in the string
Hint: 'foo'.upper() returns 'FOO'
4.8.4 Exercise 4
Write a function that takes two sequences seq_a and seq_b as arguments and returns True
if every element in seq_a is also an element of seq_b, else False
4.8.5 Exercise 5
When we cover the numerical libraries, we will see they include many alternatives for interpolation and function approximation
Without using any imports, write a function linapprox(f, a, b, n, x) that returns the piecewise linear interpolation of the function f at the point x, based on n evenly spaced grid points a = point[0] < point[1] < ... < point[n-1] = b
Aim for clarity, not efficiency
4.9 Solutions
4.9.1 Exercise 1
Part 1 Solution:
Here’s one possible solution
Out[71]: 6
Out[72]: 6
Part 2 Solution:
One solution is
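Using the fact that True counts as 1 in a sum:

```python
print(sum(x % 2 == 0 for x in range(100)))  # 50
```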
Out[73]: 50
Out[74]: 50
Some less natural alternatives that nonetheless help to illustrate the flexibility of list comprehensions are
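Two such alternatives (our sketches):

```python
print(len([x for x in range(100) if x % 2 == 0]))   # 50
print(sum([1 for x in range(100) if x % 2 == 0]))   # 50
```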
Out[75]: 50
and
Out[76]: 50
Part 3 Solution
Here’s one possibility
In [77]: pairs = ((2, 5), (4, 2), (9, 8), (12, 10))
sum([x % 2 == 0 and y % 2 == 0 for x, y in pairs])
Out[77]: 2
4.9.2 Exercise 2
In [78]: def p(x, coeff):
return sum(a * x**i for i, a in enumerate(coeff))
In [79]: p(1, (2, 4))

Out[79]: 6
4.9.3 Exercise 3
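One possibility, using str.isupper (the function name and the sample string are ours):

```python
def count_uppercase_chars(s):
    # True counts as 1, so summing counts the capital letters
    return sum(c.isupper() for c in s)

print(count_uppercase_chars('The Rain in Spain'))  # 3
```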
Out[80]: 3
4.9.4 Exercise 4
Here’s a solution:
# == test == #
True
False
Of course, if we use the sets data type then the solution is easier
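A set-based sketch:

```python
def f(seq_a, seq_b):
    return set(seq_a) <= set(seq_b)   # subset comparison

print(f([1, 2], [1, 2, 3]))   # True
print(f([1, 2, 3], [1, 2]))   # False
```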
4.9.5 Exercise 5
In [83]: def linapprox(f, a, b, n, x):
    """
    Evaluates the piecewise linear interpolant of f at x on the interval
    [a, b], with n evenly spaced grid points.

    Parameters
    ===========
    f : function
        The function to approximate

    a, b : scalar(float)
        The endpoints of the interpolation interval

    n : integer
        Number of grid points

    x : scalar(float)
        The point at which to evaluate the interpolant

    Returns
    =========
    A float. The interpolant evaluated at x
    """
    length_of_interval = b - a
    num_subintervals = n - 1
    step = length_of_interval / num_subintervals

    # === find the first grid point larger than x === #
    point = a
    while point <= x:
        point += step

    # === x must lie between the gridpoints (point - step) and point === #
    u, v = point - step, point

    return f(u) + (x - u) * (f(v) - f(u)) / (v - u)
5

OOP I: Introduction to Object Oriented Programming

5.1 Contents
• Overview 5.2
• Objects 5.3
• Summary 5.4
5.2 Overview
Python is a pragmatic language that blends object-oriented and procedural styles, rather than
taking a purist approach
However, at a foundational level, Python is object-oriented
5.3 Objects
In Python, an object is a collection of data and instructions held in computer memory that
consists of
1. a type
2. a unique identity
3. data (i.e., content)
4. methods
5.3.1 Type
Python provides for different types of objects, to accommodate different categories of data
For example
Out[1]: str
Out[2]: int
Out[3]: '300cc'
Out[4]: 700
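The inputs behind these outputs were presumably along the following lines (the example values are ours):

```python
s = 'This is a string'
type(s)        # str

x = 42
type(x)        # int

'300' + 'cc'   # adding strings means concatenation: '300cc'
300 + 400      # adding integers means addition: 700
```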
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-263a89d2d982> in <module>
----> 1 '300' + 400

TypeError: can only concatenate str (not "int") to str
Here we are mixing types, and it's unclear to Python whether the user wants to convert '300' to an integer and then add it to 400, or to convert 400 to a string and then concatenate it with '300'
To avoid the error, you need to clarify by changing the relevant type
For example,
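One option is to convert the string to an integer before adding:

```python
print(int('300') + 400)   # 700
```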
Out[6]: 700
5.3.2 Identity
In Python, each object has a unique identifier, which helps Python (and us) keep track of the
object
The identity of an object can be obtained via the id() function
In [7]: y = 2.5
z = 2.5
id(y)
Out[7]: 140535456630128
In [8]: id(z)
Out[8]: 140535456630080
In this example, y and z happen to have the same value (i.e., 2.5), but they are not the
same object
The identity of an object is in fact just the address of the object in memory
If we set x = 42 then we create an object of type int that contains the data 42
In fact, it contains more, as the following example shows
In [9]: x = 42
x
Out[9]: 42
In [10]: x.imag
Out[10]: 0
In [11]: x.__class__
Out[11]: int
When Python creates this integer object, it stores with it various auxiliary information, such
as the imaginary part, and the type
Any name following a dot is called an attribute of the object to the left of the dot
We see from this example that objects have attributes that contain auxiliary information
They also have attributes that act like functions, called methods
These attributes are important, so let’s discuss them in-depth
5.3.4 Methods
In [12]: x = ['foo', 'bar']
callable(x.append)

Out[12]: True
In [13]: callable(x.__doc__)
Out[13]: False
Methods typically act on the data contained in the object they belong to, or combine that
data with other data
In [14]: s = 'This is a string'

In [15]: s.lower()

Out[15]: 'this is a string'
It doesn’t look like there are any methods used here, but in fact the square bracket assign-
ment notation is just a convenient interface to a method call
What actually happens is that Python calls the __setitem__ method, as follows
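A quick sketch of the equivalence:

```python
x = ['a', 'b']
x[0] = 'aa'               # square bracket assignment...
print(x)                  # ['aa', 'b']

x = ['a', 'b']
x.__setitem__(0, 'aa')    # ...is a call to this method
print(x)                  # ['aa', 'b']
```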
(If you wanted to you could modify the __setitem__ method, so that square bracket assignment does something totally different)
5.4 Summary
In [20]: type(f)
Out[20]: function
In [21]: id(f)
Out[21]: 140535456543336
In [22]: f.__name__
Out[22]: 'f'
We can see that f has type, identity, attributes and so on—just like any other object
It also has methods
One example is the __call__ method, which just evaluates the function
In [23]: f.__call__(3)
Out[23]: 9
In [24]: import math

id(math)
Out[24]: 140535632790936
This uniform treatment of data in Python (everything is an object) helps keep the language
simple and consistent
Part II

The Scientific Libraries
6
NumPy
6.1 Contents
• Overview 6.2
• Exercises 6.7
• Solutions 6.8
“Let’s be clear: the work of science has nothing whatever to do with consensus.
Consensus is the business of politics. Science, on the contrary, requires only one
investigator who happens to be right, which means that he or she has results that
are verifiable by reference to the real world. In science consensus is irrelevant.
What is relevant is reproducible results.” – Michael Crichton
6.2 Overview
In this lecture, we introduce NumPy arrays and the fundamental array processing operations
provided by NumPy
6.2.1 References
• Loops in Python over Python data types like lists carry significant overhead
• C and Fortran code contains a lot of type information that can be used for optimization
• Various optimizations can be carried out during compilation when the compiler sees the
instructions as a whole
However, for a task like the one described above, there's no need to switch back to C or Fortran
Instead, we can use NumPy, where the instructions look like this:
import numpy as np

x = np.random.uniform(0, 1, size=1000000)
x.mean()
Out[1]: 0.5004892850074708
The operations of creating the array and computing its mean are both passed out to carefully
optimized machine code compiled from C
More generally, NumPy sends operations in batches to optimized C and Fortran code
This is similar in spirit to Matlab, which provides an interface to fast Fortran routines
In a later lecture, we’ll discuss code that isn’t easy to vectorize and how such routines can
also be optimized
The most important thing that NumPy defines is an array data type formally called a
numpy.ndarray
In [2]: a = np.zeros(3)
a
In [3]: type(a)
Out[3]: numpy.ndarray
NumPy arrays are somewhat like native Python lists, except that
• Data must be homogeneous (all elements of the same type)
• These types must be one of the data types (dtypes) provided by NumPy
There are also dtypes to represent complex numbers, unsigned integers, etc
On modern machines, the default dtype for arrays is float64
In [4]: a = np.zeros(3)
type(a[0])
Out[4]: numpy.float64
Out[5]: numpy.int64
In [6]: z = np.zeros(10)
Here z is a flat array with no dimension — neither row nor column vector
The dimension is recorded in the shape attribute, which is a tuple
In [7]: z.shape
Out[7]: (10,)
Here the shape tuple has only one element, which is the length of the array (tuples with one
element end with a comma)
To give it dimension, we can change the shape attribute

In [8]: z.shape = (10, 1)
z
Out[8]: array([[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.]])
In [9]: z = np.zeros(4)
z.shape = (2, 2)
z
In the last case, to make the 2 by 2 array, we could also pass a tuple to the zeros() function, as in z = np.zeros((2, 2))
In [10]: z = np.empty(3)
z
In [12]: z = np.identity(2)
z
In addition, NumPy arrays can be created from Python lists, tuples, etc. using np.array
In [14]: type(z)
Out[14]: numpy.ndarray
See also np.asarray, which performs a similar function, but does not make a distinct copy
of data already in a NumPy array
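A quick check of the difference:

```python
import numpy as np

z = np.array([10, 20])
print(np.asarray(z) is z)   # True: the existing array is reused
print(np.array(z) is z)     # False: a new copy is made
```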
Out[17]: True
Out[18]: False
To read in the array data from a text file containing numeric data use np.loadtxt or
np.genfromtxt—see the documentation for details
In [19]: z = np.linspace(1, 2, 5)
z
In [20]: z[0]
Out[20]: 1.0
In [22]: z[-1]
Out[22]: 2.0
In [23]: z = np.array([[1, 2], [3, 4]])

In [24]: z[0, 0]
Out[24]: 1
In [25]: z[0, 1]
Out[25]: 2
And so on
Note that indices are still zero-based, to maintain compatibility with Python sequences
Columns and rows can be extracted as follows
In [26]: z[0, :]
In [27]: z[:, 1]
In [28]: z = np.linspace(2, 4, 5)
z
In [30]: z
In [31]: d = np.array([0, 1, 1, 0, 0], dtype=bool)

In [32]: z[d]
Out[32]: array([2.5, 3. ])
In [33]: z = np.empty(3)
z
In [34]: z[:] = 42
z
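The outputs that follow come from array methods; a sketch consistent with them:

```python
import numpy as np

a = np.array((4, 3, 2, 1))
a.sort()           # sorts a in place, giving [1, 2, 3, 4]
print(a.sum())     # 10
print(a.mean())    # 2.5
print(a.max())     # 4
print(a.argmax())  # 3, the index of the maximal element
print(a.var())     # 1.25
print(a.std())     # ≈ 1.1180
```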
Out[37]: 10
Out[38]: 2.5
Out[39]: 4
Out[40]: 3
Out[43]: 1.25
Out[44]: 1.118033988749895
In [46]: z = np.linspace(2, 4, 5)
z
In [47]: z.searchsorted(2.2)
Out[47]: 1
Many of the methods discussed above have equivalent functions in the NumPy namespace
In [49]: np.sum(a)
Out[49]: 10
In [50]: np.mean(a)
Out[50]: 2.5
In [52]: a * b
In [53]: a + 10
In [54]: a * 10
In [56]: A + 10
In [57]: A * B
With Anaconda’s scientific Python package based around Python 3.5 and above, one can use
the @ symbol for matrix multiplication, as follows:
(For older versions of Python and NumPy you need to use the np.dot function)
We can also use @ to take the inner product of two flat arrays
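For example:

```python
import numpy as np

A = np.ones((2, 2))
B = np.ones((2, 2))
print(A @ B)                 # matrix product: every entry equals 2

x = np.array((1, 2))
y = np.array((10, 20))
print(x @ y)                 # inner product: 1*10 + 2*20 = 50
```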
Out[59]: 50
In [61]: A @ (0, 1)
Mutability leads to the following behavior (which can be shocking to MATLAB programmers…)
In [64]: a = np.random.randn(3)
a
In [65]: b = a
b[0] = 0.0
a
In [66]: a = np.random.randn(3)
a
In [67]: b = np.copy(a)
b
In [68]: b[:] = 1
b
In [69]: a
NumPy provides versions of the standard functions log, exp, sin, etc. that act element-
wise on arrays
In [71]: n = len(z)
y = np.empty(n)
for i in range(n):
y[i] = np.sin(z[i])
Because they act element-wise on arrays, these functions are called vectorized functions
In NumPy-speak, they are also called ufuncs, which stands for “universal functions”
As we saw above, the usual arithmetic operations (+, *, etc.) also work element-wise, and
combining these with the ufuncs gives a very large set of fast element-wise functions
In [72]: z
In [75]: x = np.random.randn(4)
x
def f(x): return 1 if x > 0 else 0

f = np.vectorize(f)
f(x) # Passing the same vector x as in the previous example
However, this approach doesn't always obtain the same speed as a more carefully crafted vectorized function
6.6.2 Comparisons
In [79]: y[0] = 5
z == y
In [80]: z != y
In [82]: z > 3
In [83]: b = z > 3
b
In [84]: z[b]
6.6.3 Sub-packages
NumPy provides some additional functionality related to scientific programming through its
sub-packages
We’ve already seen how we can generate random variables using np.random
Out[86]: 5.034
Out[87]: -2.0000000000000004
Out[88]: array([[-2. , 1. ],
[ 1.5, -0.5]])
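The linear algebra sub-package lives in np.linalg; the determinant and inverse shown above can be reproduced with:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
print(np.linalg.det(A))   # ≈ -2
print(np.linalg.inv(A))   # [[-2. ,  1. ], [ 1.5, -0.5]]
```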
Much of this functionality is also available in SciPy, a collection of modules that are built on
top of NumPy
We’ll cover the SciPy versions in more detail soon
For a comprehensive list of what’s available in NumPy see this documentation
6.7 Exercises
6.7.1 Exercise 1
p(x) = a_0 + a_1 x + a_2 x^2 + ⋯ + a_N x^N = ∑_{n=0}^{N} a_n x^n (1)
Earlier, you wrote a simple function p(x, coeff) to evaluate Eq. (1) without considering
efficiency
Now write a new function that does the same job, but uses NumPy arrays and array operations for its computations, rather than any form of Python loop
(Such functionality is already implemented as np.poly1d, but for the sake of the exercise
don’t use this class)
6.7.2 Exercise 2
• Divide the unit interval [0, 1] into 𝑛 subintervals 𝐼0 , 𝐼1 , … , 𝐼𝑛−1 such that the length of
𝐼𝑖 is 𝑞𝑖
• Draw a uniform random variable 𝑈 on [0, 1] and return the 𝑖 such that 𝑈 ∈ 𝐼𝑖
from random import uniform

def sample(q):
a = 0.0
U = uniform(0, 1)
for i in range(len(q)):
if a < U <= a + q[i]:
return i
a = a + q[i]
If you can’t see how this works, try thinking through the flow for a simple example, such as q
= [0.25, 0.75] It helps to sketch the intervals on paper
Your exercise is to speed it up using NumPy, avoiding explicit loops
If you can, write the method so that draw(k) returns k draws from q
6.7.3 Exercise 3
6.8 Solutions
In [90]: import matplotlib.pyplot as plt
%matplotlib inline
6.8.1 Exercise 1
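One vectorized approach builds the powers of x with cumprod and then forms an inner product with the coefficients (a sketch):

```python
import numpy as np

def p(x, coef):
    X = np.empty(len(coef))
    X[0] = 1
    X[1:] = x
    y = np.cumprod(X)      # y = [1, x, x**2, ...]
    return coef @ y

coef = np.ones(3)
print(coef)                 # [1. 1. 1.]
print(p(1, coef))           # 3.0
print(np.poly1d(coef)(1))   # cross-check with np.poly1d: 3.0
```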
Let’s test it
[1. 1. 1.]
3.0
3.0
6.8.2 Exercise 2
class DiscreteRV:
"""
Generates an array of draws from a discrete random variable with vector of
probabilities given by q.
"""
The logic is not obvious, but if you take your time and read it slowly, you will understand
There is a problem here, however
Suppose that q is altered after an instance of DiscreteRV is created, for example by
The problem is that Q does not change accordingly, and Q is the data used in the draw
method
To deal with this, one option is to compute Q every time the draw method is called
But this is inefficient relative to computing Q once-off
A better option is to use descriptors
A solution from the quantecon library using descriptors that behaves as we desire can be
found here
6.8.3 Exercise 3
In [95]: """
Modifies ecdf.py from QuantEcon to add in a plot method
"""

class ECDF:
    """
    One-dimensional empirical distribution function given a vector of
    observations.

    Parameters
    ----------
    observations : array_like
        An array of observations

    Attributes
    ----------
    observations : array_like
        An array of observations
    """

    def __init__(self, observations):
        self.observations = np.asarray(observations)

    def __call__(self, x):
        """
        Evaluates the ecdf at x

        Parameters
        ----------
        x : scalar(float)
            The x at which the ecdf is evaluated

        Returns
        -------
        scalar(float)
            Fraction of the sample less than x
        """
        return np.mean(self.observations <= x)

    def plot(self, a=None, b=None):
        """
        Plot the ecdf on the interval [a, b]

        Parameters
        ----------
        a : scalar(float), optional(default=None)
            Lower endpoint of the plot interval
        b : scalar(float), optional(default=None)
            Upper endpoint of the plot interval
        """
        # === choose a reasonable interval if [a, b] not specified === #
        if a is None:
            a = self.observations.min() - self.observations.std()
        if b is None:
            b = self.observations.max() + self.observations.std()

        # === generate plot === #
        x_vals = np.linspace(a, b, num=100)
        f = np.vectorize(self.__call__)
        plt.plot(x_vals, f(x_vals))
        plt.show()
In [96]: X = np.random.randn(1000)
F = ECDF(X)
F.plot()
7
Matplotlib
7.1 Contents
• Overview 7.2
• Exercises 7.6
• Solutions 7.7
7.2 Overview
We’ve already generated quite a few figures in these lectures using Matplotlib
Matplotlib is an outstanding graphics library, designed for scientific computing, with
Here’s the kind of easy example you might find in introductory treatments
This is simple and convenient, but also somewhat limited and un-Pythonic
For example, in the function calls, a lot of objects get created and passed around without
making themselves known to the programmer
Python programmers tend to prefer a more explicit style of programming (run import this
in a code block and look at the second line)
This leads us to the alternative, object-oriented Matplotlib API
Here’s the code corresponding to the preceding figure using the object-oriented API
7.3.3 Tweaks
We’ve also used alpha to make the line slightly transparent—which makes it look smoother
The location of the legend can be changed by replacing ax.legend() with
ax.legend(loc='upper center')
Matplotlib has a huge array of functions and features, which you can discover over time as
you have need for them
We mention just a few
from random import uniform
from scipy.stats import norm

fig, ax = plt.subplots()
x = np.linspace(-4, 4, 150)
for i in range(3):
m, s = uniform(-1, 1), uniform(1, 2)
y = norm.pdf(x, loc=m, scale=s)
current_label = f'$\mu = {m:.2}$'
ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
ax.legend()
plt.show()
7.4.3 3D Plots
Perhaps you will find a set of customizations that you regularly use
Suppose we usually prefer our axes to go through the origin, and to have a grid
Here’s a nice example from Matthew Doty of how the object-oriented API can be used to
build a custom subplots function that implements these changes
Read carefully through the code and see if you can follow what’s going on
ax.grid()
return fig, ax
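Filling in the missing lines, such a function might look like this (the spine settings are our reconstruction):

```python
import matplotlib.pyplot as plt

def subplots():
    "Custom subplots with axes through the origin"
    fig, ax = plt.subplots()

    # Set the left and bottom axes through the origin
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
    # Hide the right and top spines
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')

    ax.grid()
    return fig, ax
```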
1. calls the standard plt.subplots function internally to generate the fig, ax pair,
2. makes the desired customizations to ax, and
3. passes the fig, ax pair back to the calling code
7.6 Exercises
7.6.1 Exercise 1
7.7 Solutions
7.7.1 Exercise 1
θ_vals = np.linspace(0, 2, 10)
x = np.linspace(0, 5, 150)
fig, ax = plt.subplots(figsize=(10, 6))

for θ in θ_vals:
ax.plot(x, np.cos(np.pi * θ * x) * np.exp(- x))
plt.show()
8
SciPy
8.1 Contents
• Statistics 8.3
• Optimization 8.5
• Integration 8.6
• Exercises 8.8
• Solutions 8.9
SciPy builds on top of NumPy to provide common tools for scientific programming such as
• linear algebra
• numerical integration
• interpolation
• optimization
• distributions and random number generation
• signal processing
• etc., etc
SciPy is a package that contains various tools that are built on top of NumPy, using its array
data type and related functionality
In fact, when we import SciPy we also get NumPy, as can be seen from the SciPy initializa-
tion file
__all__ = []
__all__ += _num.__all__
__all__ += ['randn', 'rand', 'fft', 'ifft']
del _num
# Remove the linalg imported from numpy so that the scipy.linalg package can be
# imported.
del linalg
__all__.remove('linalg')
However, it’s more common and better practice to use NumPy functionality explicitly
import numpy as np

a = np.identity(3)
8.3 Statistics
f(x; a, b) = x^(a-1) (1 - x)^(b-1) / ∫_0^1 u^(a-1) (1 - u)^(b-1) du (0 ≤ x ≤ 1) (1)
Sometimes we need access to the density itself, or the cdf, the quantiles, etc.
For this, we can use scipy.stats, which provides all of this functionality as well as random
number generation in a single consistent interface
Here’s an example of usage
In this code, we created a so-called rv_frozen object, via the call q = beta(5, 5)
The “frozen” part of the notation implies that q represents a particular distribution with a
particular set of parameters
Once we’ve done so, we can then generate random numbers, evaluate the density, etc., all
from this fixed distribution
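For instance (the evaluation points are ours):

```python
from scipy.stats import beta

q = beta(5, 5)        # a "frozen" Beta(5, 5) distribution

print(q.cdf(0.4))     # cumulative distribution function ≈ 0.2666
print(q.pdf(0.4))     # density ≈ 2.0902
print(q.ppf(0.8))     # quantile (inverse cdf) function ≈ 0.6339
print(q.mean())       # 0.5
```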
Out[6]: 0.26656768000000003
Out[7]: 2.0901888000000013
Out[8]: 0.6339134834642708
In [9]: q.mean()
Out[9]: 0.5
identifier = scipy.stats.distribution_name(shape_parameters)
identifier = scipy.stats.distribution_name(shape_parameters,
loc=c, scale=d)
obs = q.rvs(2000) # 2000 observations from the distribution
grid = np.linspace(0.01, 0.99, 100)

fig, ax = plt.subplots()
ax.hist(obs, bins=40, density=True)
ax.plot(grid, beta.pdf(grid, 5, 5), 'k-', linewidth=2)
plt.show()
from scipy.stats import linregress

x = np.random.randn(200)
y = 2 * x + 0.1 * np.random.randn(200)
gradient, intercept, r_value, p_value, std_err = linregress(x, y)
gradient, intercept
f = lambda x: np.sin(4 * (x - 1/4)) + x + x**20 - 1 # this is Eq. (2)
x = np.linspace(0, 1, 100)

plt.figure(figsize=(10, 8))
plt.plot(x, f(x))
plt.axhline(ls='--', c='k')
plt.show()
8.4.1 Bisection
And so on
This is bisection
Here’s a fairly simplistic implementation of the algorithm in Python
It works for all sufficiently well behaved increasing continuous functions with 𝑓(𝑎) < 0 < 𝑓(𝑏)
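One such simplistic implementation:

```python
def bisect(f, a, b, tol=10e-5):
    """
    Implements the bisection root finding algorithm, assuming that f is a
    real-valued function on [a, b] satisfying f(a) < 0 < f(b).
    """
    lower, upper = a, b
    while upper - lower > tol:
        middle = 0.5 * (upper + lower)
        if f(middle) > 0:       # root lies in the lower half
            lower, upper = lower, middle
        else:                   # root lies in the upper half
            lower, upper = middle, upper
    return 0.5 * (upper + lower)

print(bisect(lambda x: x**2 - 2, 0, 2))  # ≈ √2
```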
In fact, SciPy provides its own bisection function, which we now test using the function 𝑓 de-
fined in Eq. (2)
In [14]: from scipy.optimize import bisect

bisect(f, 0, 1)
Out[14]: 0.4082935042806639
• When the function is well-behaved, the Newton-Raphson method is faster than bisection
• When the function is less well-behaved, the Newton-Raphson might fail
Let’s investigate this using the same function 𝑓, first looking at potential instability
Out[15]: 0.40829350427935673
Out[16]: 0.7001700000000279
62.4 µs ± 4.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
149 µs ± 5.77 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
So far we have seen that the Newton-Raphson method is fast but not robust
The bisection algorithm is robust but relatively slow
This illustrates a general principle
• If you have specific knowledge about your function, you might be able to exploit it to
generate efficiency
• If not, then the algorithm choice involves a trade-off between the speed of convergence
and robustness
In practice, most default algorithms for root-finding, optimization and fixed points use hybrid
methods
These methods typically combine a fast method with a robust method in the following manner:
1. attempt to use a fast method
2. check diagnostics
3. if diagnostics are bad, then switch to a more robust algorithm
In scipy.optimize, the function brentq is such a hybrid method and a good default
In [19]: from scipy.optimize import brentq

brentq(f, 0, 1)
Out[19]: 0.40829350427936706
15.6 µs ± 840 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Here the correct solution is found and the speed is almost the same as newton
In [21]: from scipy.optimize import fixed_point

fixed_point(lambda x: x**2, 10.0) # 10.0 is an initial guess

Out[21]: array(1.)
If you don’t get good results, you can always switch back to the brentq root finder, since
the fixed point of a function 𝑓 is the root of 𝑔(𝑥) ∶= 𝑥 − 𝑓(𝑥)
8.5 Optimization
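A representative univariate example, using fminbound to minimize x² on the interval [-1, 2] (a sketch):

```python
from scipy.optimize import fminbound

result = fminbound(lambda x: x**2, -1, 2)   # search for the minimizer on [-1, 2]
print(result)                               # ≈ 0.0
```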
Out[22]: 0.0
8.6 Integration
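A representative call, integrating x² over [0, 1] with quad:

```python
from scipy.integrate import quad

integral, error = quad(lambda x: x**2, 0, 1)
print(integral)   # ≈ 1/3
```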
Out[23]: 0.33333333333333337
In fact, quad is an interface to a very standard numerical integration routine in the Fortran
library QUADPACK
By default it uses adaptive Gauss-Kronrod quadrature, which pairs Gaussian rules with error estimates on each subinterval
There are other options for univariate integration—a useful one is fixed_quad, which is fast
and hence works well inside for loops
There are also functions for multivariate integration
See the documentation for more details
We saw that NumPy provides a module for linear algebra called linalg
SciPy also provides a module for linear algebra with the same name
The latter is not an exact superset of the former, but overall it has more functionality
We leave you to investigate the set of available routines
8.8 Exercises
8.8.1 Exercise 1
8.9 Solutions
8.9.1 Exercise 1
Out[26]: 0.408294677734375
9
Numba
9.1 Contents
• Overview 9.2
• Vectorization 9.4
• Numba 9.5
In addition to what’s in Anaconda, this lecture will need the following libraries
9.2 Overview
In our lecture on NumPy, we learned one method to improve speed and efficiency in numerical work
That method, called vectorization, involved sending array processing operations in batch to
efficient low-level code
This clever idea dates back to Matlab, which uses it extensively
Unfortunately, vectorization is limited and has several weaknesses
One weakness is that it is highly memory-intensive
Another problem is that only some algorithms can be vectorized
In the last few years, a new Python library called Numba has appeared that solves many of
these problems
It does so through something called just in time (JIT) compilation
JIT compilation is effective in many numerical settings and can generate extremely fast, efficient code
It can also do other tricks such as facilitate multithreading (a form of parallelization well
suited to numerical work)
To understand what Numba does and why, we need some background knowledge
Let’s start by thinking about higher-level languages, such as Python
These languages are optimized for humans
This means that the programmer can leave many details to the runtime environment
The upside is that, compared to low-level languages, Python is typically faster to write, less
error-prone and easier to debug
The downside is that Python is harder to optimize — that is, turn into fast machine code —
than languages like C or Fortran
Indeed, the standard implementation of Python (called CPython) cannot match the speed of
compiled languages such as C or Fortran
Does that mean that we should just switch to C or Fortran for everything?
The answer is no, no and one hundred times no
High productivity languages should be chosen over high-speed languages for the great majority of scientific computing tasks
This is because
1. Of any given program, relatively few lines are ever going to be time-critical
2. For those lines of code that are time-critical, we can achieve C-like speed using a combination of NumPy and Numba
Let’s start by trying to understand why high-level languages like Python are slower than com-
piled code
In [2]: a, b = 10, 10
a + b
Out[2]: 20
Even for this simple operation, the Python interpreter has a fair bit of work to do
For example, in the statement a + b, the interpreter has to know which operation to invoke
If a and b are strings, then a + b requires string concatenation
In [3]: 'foo' + 'bar'

Out[3]: 'foobar'
(We say that the operator + is overloaded — its action depends on the type of the objects on
which it acts)
As a result, Python must check the type of the objects and then call the correct operation
This involves substantial overheads
Static Types
Compiled languages avoid these overheads with explicit, static types
For example, consider the following C code, which sums the integers from 1 to 10
#include <stdio.h>
int main(void) {
int i;
int sum = 0;
for (i = 1; i <= 10; i++) {
sum = sum + i;
}
printf("sum = %d\n", sum);
return 0;
}
• In modern computers, memory addresses are allocated to each byte (one byte = 8 bits)
Moreover, the compiler is made aware of the data type by the programmer
Hence, each successive data point can be accessed by shifting forward in memory space by a
known and fixed amount
9.4 Vectorization
• The machine code itself is typically compiled from carefully optimized C or Fortran
This can greatly accelerate many (but not all) numerical computations
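For comparison, a pure-Python loop over the same computation might be:

```python
import random

n = 100_000
y = 0.0
for i in range(n):
    x = random.uniform(0, 1)
    y += x**2
print(y)   # roughly n / 3, since E[X²] = 1/3 for X uniform on [0, 1]
```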
Out[6]: 0.04178762435913086
In [7]: qe.util.tic()
n = 100_000
x = np.random.uniform(0, 1, n)
np.sum(x**2)
qe.util.toc()
Out[7]: 0.0038301944732666016
The second code block — which achieves the same thing as the first — runs much faster
The reason is that in the second implementation we have broken the loop down into three
basic operations
1. draw n uniforms
2. square them
3. sum them
Many functions provided by NumPy are so-called universal functions — also called ufuncs
This means that they map scalars into scalars and, acting element-wise, arrays into arrays
In [8]: np.cos(1.0)
Out[8]: 0.5403023058681398
f(x, y) = cos(x^2 + y^2) / (1 + x^2 + y^2) and a = 3
Here’s a plot of 𝑓
def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

grid = np.linspace(-3, 3, 1000)
m = -np.inf

qe.tic()
for x in grid:
    for y in grid:
        z = f(x, y)
        if z > m:
            m = z
qe.toc()
Out[11]: 2.7486989498138428
x, y = np.meshgrid(grid, grid)

qe.tic()
np.max(f(x, y))
qe.toc()
Out[12]: 0.02516627311706543
In the vectorized version, all the looping takes place in compiled code
As you can see, the second version is much faster
(We’ll make it even faster again below when we discuss Numba)
9.5 Numba
9.5.1 Prerequisites
9.5.2 An Example
𝑥𝑡+1 = 4𝑥𝑡 (1 − 𝑥𝑡 )
Here’s the plot of a typical trajectory, starting from 𝑥0 = 0.1, with 𝑡 on the x-axis
x = qm(0.1, 250)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, 'b-', lw=2, alpha=0.8)
ax.set_xlabel('time', fontsize=16)
plt.show()
Let’s time and compare identical function calls across these two versions:
In [15]: qe.util.tic()
qm(0.1, int(10**5))
time1 = qe.util.toc()
In [16]: qe.util.tic()
qm_numba(0.1, int(10**5))
time2 = qe.util.toc()
The first execution is relatively slow because of JIT compilation (see below)
Next time and all subsequent times it runs much faster:
In [17]: qe.util.tic()
qm_numba(0.1, int(10**5))
time2 = qe.util.toc()
In [18]: time1 / time2 # Calculate speed gain

Out[18]: 174.51294400963275
In [19]: @jit
def qm(x0, n):
x = np.empty(n+1)
x[0] = x0
for t in range(n):
x[t+1] = 4 * x[t] * (1 - x[t])
return x
Numba attempts to generate fast machine code using the infrastructure provided by the
LLVM Project
It does this by inferring type information on the fly
As you can imagine, this is easier for simple Python objects (simple scalar data types, such as
floats, integers, etc.)
Numba also plays well with NumPy arrays, which it treats as typed memory regions
In [20]: a = 1
@jit
def add_x(x):
return a + x
print(add_x(10))
11
In [21]: a = 2
print(add_x(10))
11
Notice that changing the global had no effect on the value returned by the function
When Numba compiles machine code for functions, it treats global variables as constants to
ensure type stability
Numba can also be used to create custom ufuncs with the @vectorize decorator
To illustrate the advantage of using Numba to vectorize a function, we return to a maximization problem discussed above
In [22]: from numba import vectorize

@vectorize
def f_vec(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)
qe.tic()
np.max(f_vec(x, y))
qe.toc()
Out[22]: 0.030055522918701172
qe.tic()
np.max(f_vec(x, y))
qe.toc()
Out[23]: 0.023700714111328125
10

Other Scientific Libraries

10.1 Contents
• Overview 10.2
• Cython 10.3
• Joblib 10.4
• Exercises 10.6
• Solutions 10.7
In addition to what’s in Anaconda, this lecture will need the following libraries
10.2 Overview
In this lecture, we review some other scientific libraries that are useful for economic research
and analysis
We have, however, already picked most of the low-hanging fruit in terms of economic research
Hence you should feel free to skip this lecture on first pass
10.3 Cython
Like Numba, Cython provides an approach to generating fast compiled code that can be used
from Python
As was the case with Numba, a key problem is the fact that Python is dynamically typed
As you’ll recall, Numba solves this problem (where possible) by inferring type
Cython’s approach is different — programmers add type definitions directly to their “Python”
code
As such, the Cython language can be thought of as Python with type definitions
In addition to a language specification, Cython is also a language translator, transforming
Cython code into optimized C and C++ code
Cython also takes care of building language extensions — the wrapper code that interfaces
between the resulting compiled code and Python
Important Note:
In what follows code is executed in a Jupyter notebook
This is to take advantage of a Cython cell magic that makes Cython particularly easy to use
Some modifications are required to run the code outside a notebook
∑_{i=0}^{n} α^i = (1 − α^{n+1}) / (1 − α)
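A pure-Python version of the corresponding function, for comparison with the Cython version below:

```python
def geo_prog(alpha, n):
    # Sum the geometric series 1 + alpha + ... + alpha**n by accumulation
    current = 1.0
    total = current
    for i in range(n):
        current = current * alpha
        total = total + current
    return total

print(geo_prog(0.5, 10))   # (1 - 0.5**11) / (1 - 0.5) = 1.9990234375
```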
If you’re not familiar with C, the main thing you should take notice of is the type definitions
In [4]: %%cython
def geo_prog_cython(double alpha, int n):
cdef double current = 1.0
cdef double sum = current
cdef int i
for i in range(n):
current = current * alpha
sum = sum + current
return sum
Here cdef is a Cython keyword indicating a variable declaration and is followed by a type
The %%cython line at the top is not actually Cython code — it's a Jupyter cell magic indicating the start of Cython code
After executing the cell, you can now call the function geo_prog_cython from within
Python
What you are in fact calling is compiled C code with a Python call interface
Out[5]: 0.0884397029876709
In [6]: qe.util.tic()
geo_prog_cython(0.99, int(10**6))
qe.util.toc()
Out[6]: 0.03421354293823242
Let’s go back to the first problem that we worked with: generating the iterates of the
quadratic map
𝑥𝑡+1 = 4𝑥𝑡 (1 − 𝑥𝑡 )
The problem of computing iterates and returning a time series requires us to work with arrays
The natural array type to work with is NumPy arrays
Here’s a Cython implementation that initializes, populates and returns a NumPy array
In [7]: %%cython
import numpy as np
If you run this code and time it, you will see that its performance is disappointing — nothing
like the speed gain we got from Numba
In [8]: qe.util.tic()
qm_cython_first_pass(0.1, int(10**5))
qe.util.toc()
Out[8]: 0.03150629997253418
This example was also computed in the Numba lecture, and you can see Numba is around 90
times faster
The reason is that working with NumPy arrays incurs substantial Python overheads
We can do better by using Cython’s typed memoryviews, which provide more direct access to
arrays in memory
When using them, the first step is to create a NumPy array
Next, we declare a memoryview and bind it to the NumPy array
Here’s an example:
In [9]: %%cython
import numpy as np
from numpy cimport float_t
Here
In [10]: qe.util.tic()
qm_cython(0.1, int(10**5))
qe.util.toc()
Out[10]: 0.0006136894226074219
10.3.3 Summary
Cython requires more expertise than Numba, and is a little more fiddly in terms of getting
good performance
In fact, it’s surprising how difficult it is to beat the speed improvements provided by Numba
Nonetheless,
10.4 Joblib
10.4.1 Caching
Perhaps, like us, you sometimes run a long computation that simulates a model at a given set
of parameters — to generate a figure, say, or a table
20 minutes later you realize that you want to tweak the figure and now you have to do it all
again
What caching will do is automatically store results at each parameterization
With Joblib, results are compressed and stored on file, and automatically served back up to
you when you repeat the calculation
10.4.2 An Example
Let’s look at a toy example, related to the quadratic map model discussed above
Let’s say we want to generate a long trajectory from a certain initial condition 𝑥0 and see
what fraction of the sample is below 0.1
(We’ll omit JIT compilation or other speedups for simplicity)
Here’s our code
import numpy as np
from joblib import Memory

memory = Memory(location='./joblib_cache')

@memory.cache
def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return np.mean(x < 0.1)
We are using joblib to cache the result of calling qm at a given set of parameters
With the argument location='./joblib_cache', any call to this function results in both the input values and output values being stored in a subdirectory joblib_cache of the present working directory
(In UNIX shells, . refers to the present working directory)
The first time we call the function with a given set of parameters we see some extra output
that notes information being cached
In [13]: qe.util.tic()
n = int(1e7)
qm(0.2, n)
qe.util.toc()
________________________________________________________________________________
[Memory] Calling __main__--home-anju-Desktop-lecture-source-py-_build-jupyter-executed-__ipython-input__.qm…
qm(0.2, 10000000)
_______________________________________________________________qm - 8.9s, 0.1min
TOC: Elapsed: 0:00:8.85
Out[13]: 8.85545039176941
10.5. OTHER OPTIONS 141
The next time we call the function with the same set of parameters, the result is returned
almost instantaneously
In [14]: qe.util.tic()
n = int(1e7)
qm(0.2, n)
qe.util.toc()
Out[14]: 0.0007827281951904297
10.5 Other Options

There are in fact many other approaches to speeding up your Python code
One is interfacing with Fortran
If you are comfortable writing Fortran you will find it very easy to create extension modules
from Fortran code using F2Py
F2Py is a Fortran-to-Python interface generator that is particularly simple to use
Robert Johansson provides a very nice introduction to F2Py, among other things
Recently, a Jupyter cell magic for Fortran has been developed — you might want to give it a
try
10.6 Exercises
10.6.1 Exercise 1
For example, let the period length be one month, and suppose the current state is high
We see from the graph that the state next month will be
Your task is to simulate a sequence of monthly volatility states according to this rule
Set the length of the sequence to n = 100000 and start in the high state
Implement a pure Python version, a Numba version and a Cython version, and compare
speeds
To test your code, evaluate the fraction of time that the chain spends in the low state
If your code is correct, it should be about 2/3
10.7 Solutions
10.7.1 Exercise 1
We let
• 0 represent “low”
• 1 represent “high”
In [15]: p, q = 0.1, 0.2 # Prob of leaving low and high state respectively
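The cell defining compute_series is not shown in this copy. A pure Python/NumPy sketch consistent with the calls below, in which the chain leaves the low state with probability p and the high state with probability q, is:

```python
import numpy as np

p, q = 0.1, 0.2  # Prob of leaving low and high state respectively

def compute_series(n):
    x = np.empty(n, dtype=np.int_)      # Allocate memory for the sample path
    x[0] = 1                            # Start in the high state
    U = np.random.uniform(0, 1, size=n)
    for t in range(1, n):
        current_x = x[t-1]
        if current_x == 0:              # Low state: leave with probability p
            x[t] = U[t] < p
        else:                           # High state: leave with probability q
            x[t] = 0 if U[t] < q else 1
    return x
```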
Let’s run this code and check that the fraction of time spent in the low state is about 0.666
In [17]: n = 100000
x = compute_series(n)
print(np.mean(x == 0)) # Fraction of time x is in state 0
0.6629
In [18]: qe.util.tic()
compute_series(n)
qe.util.toc()
Out[18]: 0.0751335620880127
compute_series_numba = jit(compute_series)
In [20]: x = compute_series_numba(n)
print(np.mean(x == 0))
0.66566
In [21]: qe.util.tic()
compute_series_numba(n)
qe.util.toc()
Out[21]: 0.0015265941619873047
In [23]: %%cython
import numpy as np
from numpy cimport int_t, float_t
In [24]: compute_series_cy(10)
In [25]: x = compute_series_cy(n)
print(np.mean(x == 0))
0.66746
In [26]: qe.util.tic()
compute_series_cy(n)
qe.util.toc()
Out[26]: 0.0033597946166992188
11

Writing Good Code
11.1 Contents
• Overview 11.2
• Summary 11.6
11.2 Overview
When computer programs are small, poorly written code is not overly costly
But more data, more sophisticated models, and more computer power are enabling us to take
on more challenging problems that involve writing longer programs
For such programs, investment in good coding practices will pay high returns
The main payoffs are higher productivity and faster code
In this lecture, we review some elements of good coding practice
We also touch on modern developments in scientific computing — such as just in time compilation — and how they affect good program design
Here
148 11. WRITING GOOD CODE
1. sets 𝑘0 = 1
2. iterates using Eq. (1) to produce a sequence 𝑘0 , 𝑘1 , 𝑘2 … , 𝑘𝑇
3. plots the sequence
for j in range(3):
k[0] = 1
for t in range(49):
k[t+1] = s * k[t]**α[j] + (1 - δ) * k[t]
axes[0].plot(k, 'o-', label=rf"$\alpha = {α[j]},\; s = {s},\; \delta={δ}$")
axes[0].grid(lw=0.2)
axes[0].set_ylim(0, 18)
axes[0].set_xlabel('time')
axes[0].set_ylabel('capital')
axes[0].legend(loc='upper left', frameon=True, fontsize=14)
for j in range(3):
k[0] = 1
for t in range(49):
k[t+1] = s[j] * k[t]**α + (1 - δ) * k[t]
axes[1].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta={δ}$")
axes[1].grid(lw=0.2)
axes[1].set_xlabel('time')
axes[1].set_ylabel('capital')
axes[1].set_ylim(0, 18)
axes[1].legend(loc='upper left', frameon=True, fontsize=14)
for j in range(3):
k[0] = 1
for t in range(49):
k[t+1] = s * k[t]**α + (1 - δ[j]) * k[t]
axes[2].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta={δ[j]}$")
11.3. AN EXAMPLE OF BAD CODE 149
axes[2].set_ylim(0, 18)
axes[2].set_xlabel('time')
axes[2].set_ylabel('capital')
axes[2].grid(lw=0.2)
axes[2].legend(loc='upper left', frameon=True, fontsize=14)
plt.show()
There are usually many different ways to write a program that accomplishes a given task
For small programs, like the one above, the way you write code doesn’t matter too much
But if you are ambitious and want to produce useful things, you'll write medium to large programs too
In those settings, coding style matters a great deal
Fortunately, lots of smart people have thought about the best way to write code
Here are some basic precepts
If you look at the code above, you'll see numbers like 50 and 49 and 3 scattered through the code
These kinds of numeric literals in the body of your code are sometimes called "magic numbers"
This is not a compliment
While numeric literals are not all evil, the numbers shown in the program above should certainly be replaced by named constants
For example, the code above could declare the variable time_series_length = 50
Then in the loops, 49 should be replaced by time_series_length - 1
The advantages are:
Yes, we realize that you can just cut and paste and change a few symbols
But as a programmer, your aim should be to automate repetition, not do it yourself
More importantly, repeating the same logic in different places means that eventually one of
them will likely be wrong
If you want to know more, read the excellent summary found on this page
We’ll talk about how to avoid repetition below
11.4. GOOD CODING PRACTICE 151
Sure, global variables (i.e., names assigned to values outside of any function or class) are convenient
Rookie programmers typically use global variables with abandon — as we once did ourselves
But global variables are dangerous, especially in medium to large size programs, since
This makes it much harder to be certain about what some small part of a given piece of code
actually commands
Here’s a useful discussion on the topic
While the odd global in small scripts is no big deal, we recommend that you teach yourself to
avoid them
(We’ll discuss how just below)
JIT Compilation
In fact, there’s now another good reason to avoid global variables
In scientific computing, we’re witnessing the rapid growth of just in time (JIT) compilation
JIT compilation can generate excellent performance for scripting languages like Python and
Julia
But the task of the compiler used for JIT compilation becomes much harder when many
global variables are present
(This is because data type instability hinders the generation of efficient machine code — we’ll
learn more about such topics later on)
Fortunately, we can easily avoid the evils of global variables and WET code
• WET stands for “we love typing” and is the opposite of DRY
Here’s some code that reproduces the plot above with better coding style
It uses a function to avoid repetition
Note also that
• global variables are quarantined by collecting them together at the end, not the start, of the program
• magic numbers are avoided
• the loop at the end where the actual work is done is short and relatively simple
ax.grid(lw=0.2)
ax.set_xlabel('time')
ax.set_ylabel('capital')
ax.set_ylim(0, 18)
ax.legend(loc='upper left', frameon=True, fontsize=14)
plt.show()
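The improved, function-based version is only partially visible above. A sketch of the kind of helper it relies on (the name generate_path and its defaults are ours, not the lecture's) shows how one function replaces the three copied loops:

```python
import numpy as np

def generate_path(α, s, δ, series_length=50):
    "Iterate k_{t+1} = s k_t**α + (1 - δ) k_t from k_0 = 1 and return the series."
    k = np.empty(series_length)
    k[0] = 1
    for t in range(series_length - 1):
        k[t+1] = s * k[t]**α + (1 - δ) * k[t]
    return k
```

Each panel then needs only ax.plot(generate_path(α, s, δ), label=...) inside a short loop over the parameter being varied.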
11.6 Summary
12

OOP II: Building Classes

12.1 Contents
• Overview 12.2
• Exercises 12.6
• Solutions 12.7
12.2 Overview
So imagine now you want to write a program with consumers, who can
156 12. OOP II: BUILDING CLASSES
As discussed in an earlier lecture, in the OOP paradigm, data and functions are bundled together into "objects"
An example is a Python list, which not only stores data but also knows how to sort itself, etc.
In [1]: x = [1, 5, 4]
x.sort()
x
Out[1]: [1, 4, 5]
As we now know, sort is a function that is “part of” the list object — and hence called a
method
If we want to make our own types of objects we need to use class definitions
A class definition is a blueprint for a particular class of objects (e.g., lists, strings or complex
numbers)
It describes
In Python, the data and methods of an object are collectively referred to as attributes
Attributes are accessed via “dotted attribute notation”
• object_name.data
• object_name.method_name()
In the example
In [2]: x = [1, 5, 4]
x.sort()
x.__class__
Out[2]: list
• x is an object or instance, created from the definition for Python lists, but with its own
particular data
• x.sort() and x.__class__ are two attributes of x
• dir(x) can be used to view all the attributes of x
OOP is useful for the same reason that abstraction is useful: for recognizing and exploiting
the common structure
For example,
• a Markov chain consists of a set of states and a collection of transition probabilities for
moving across states
• a general equilibrium theory consists of a commodity space, preferences, technologies,
and an equilibrium definition
• a game consists of a list of players, lists of actions available to each player, player payoffs as functions of all players' actions, and a timing protocol
These are all abstractions that collect together “objects” of the same “type”
Recognizing common structure allows us to employ common tools
In economic theory, this might be a proposition that applies to all games of a certain type
In Python, this might be a method that’s useful for all Markov chains (e.g., simulate)
When we use OOP, the simulate method is conveniently bundled together with the Markov
chain object
Admittedly a little contrived, this example of a class helps us internalize some new syntax
Here’s one implementation
This class defines instance data wealth and three methods: __init__, earn and spend
• wealth is instance data because each consumer we create (each instance of the Consumer class) will have its own separate wealth data
The ideas behind the earn and spend methods were discussed above
Both of these act on the instance data wealth
The __init__ method is a constructor method
Whenever we create an instance of the class, this method will be called automatically
Calling __init__ sets up a “namespace” to hold the instance data — more on this soon
We’ll also discuss the role of self just below
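The cell containing the class definition is not shown in this copy. A sketch consistent with the usage and output below (including the "Insufficent funds" spelling that appears in the printed output) is:

```python
class Consumer:

    def __init__(self, w):
        "Initialize consumer with w dollars of wealth"
        self.wealth = w

    def earn(self, y):
        "The consumer earns y dollars"
        self.wealth += y

    def spend(self, x):
        "The consumer spends x dollars if feasible"
        new_wealth = self.wealth - x
        if new_wealth < 0:
            # Spelling matches the output shown below
            print("Insufficent funds")
        else:
            self.wealth = new_wealth
```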
Usage
Here’s an example of usage
Out[4]: 5
In [5]: c1.earn(15)
c1.spend(100)
Insufficent funds
We can of course create multiple instances each with its own data
In [6]: c1 = Consumer(10)
c2 = Consumer(12)
c2.spend(4)
c2.wealth
Out[6]: 8
In [7]: c1.wealth
Out[7]: 10
In [8]: c1.__dict__
In [9]: c2.__dict__
Out[9]: {'wealth': 8}
When we access or set attributes we’re actually just modifying the dictionary maintained by
the instance
Self
If you look at the Consumer class definition again you’ll see the word self throughout the
code
The rules with self are that
– e.g., the earn method references self.wealth rather than just wealth
• Any method defined within the class should have self as its first argument
There are no examples of the last rule in the preceding code but we will see some shortly
Details
In this section, we look at some more formal details related to classes and self
• You might wish to skip to the next section on first pass of this lecture
• You can return to these details after you’ve familiarized yourself with more examples
Methods actually live inside a class object formed when the interpreter reads the class definition
Note how the three methods __init__, earn and spend are stored in the class object
Consider the following code
In [11]: c1 = Consumer(10)
c1.earn(10)
c1.wealth
Out[11]: 20
When you call earn via c1.earn(10) the interpreter passes the instance c1 and the argument 10 to Consumer.earn
In fact, the following are equivalent
• c1.earn(10)
• Consumer.earn(c1, 10)
In the function call Consumer.earn(c1, 10) note that c1 is the first argument
Recall that in the definition of the earn method, self is the first parameter
The end result is that self is bound to the instance c1 inside the function call
That’s why the statement self.wealth += y inside earn ends up modifying c1.wealth
For our next example, let’s write a simple class to implement the Solow growth model
The Solow growth model is a neoclassical growth model where the amount of capital stock
per capita 𝑘𝑡 evolves according to the rule
𝑘_{𝑡+1} = (𝑠𝑧𝑘_𝑡^𝛼 + (1 − 𝛿)𝑘_𝑡) / (1 + 𝑛)    (1)
Here
12.4. DEFINING YOUR OWN CLASSES 161
The steady state of the model is the 𝑘 that solves Eq. (1) when 𝑘𝑡+1 = 𝑘𝑡 = 𝑘
Here’s a class that implements this model
Some points of interest in the code are
• An instance maintains a record of its current capital stock in the variable self.k
– Notice how inside update the reference to the local method h is self.h
class Solow:
    r"""
    Implements the Solow growth model with the update rule given above
    """
    def __init__(self, n=0.05,  # population growth rate
                       s=0.25,  # savings rate
                       δ=0.1,   # depreciation rate
                       α=0.3,   # capital share
                       z=2.0,   # productivity
                       k=1.0):  # current capital stock
        self.n, self.s, self.δ, self.α, self.z = n, s, δ, α, z
        self.k = k
def h(self):
"Evaluate the h function"
# Unpack parameters (get rid of self to simplify notation)
n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
# Apply the update rule
return (s * z * self.k**α + (1 - δ) * self.k) / (1 + n)
def update(self):
"Update the current state (i.e., the capital stock)."
self.k = self.h()
def steady_state(self):
"Compute the steady state value of capital."
# Unpack parameters (get rid of self to simplify notation)
n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
# Compute and return steady state
return ((s * z) / (n + δ))**(1 / (1 - α))
Here’s a little program that uses the class to compute time series from two different initial
conditions
The common steady state is also plotted for comparison
s1 = Solow()
s2 = Solow(k=8.0)
T = 60
fig, ax = plt.subplots(figsize=(9, 6))
ax.legend()
plt.show()
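The simulation loop between the figure setup and the legend call is missing in this copy. A self-contained sketch of how the two time paths are typically generated (with a condensed copy of the Solow class above, so the snippet runs on its own; the plotting calls are indicated in comments):

```python
class Solow:
    "Condensed, self-contained version of the Solow class defined above."

    def __init__(self, n=0.05, s=0.25, δ=0.1, α=0.3, z=2.0, k=1.0):
        self.n, self.s, self.δ, self.α, self.z, self.k = n, s, δ, α, z, k

    def update(self):
        "Apply the law of motion to the current capital stock."
        n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
        self.k = (s * z * self.k**α + (1 - δ) * self.k) / (1 + n)

    def steady_state(self):
        "Steady state level of capital."
        n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
        return ((s * z) / (n + δ))**(1 / (1 - α))

s1, s2 = Solow(), Solow(k=8.0)
T = 60
paths = {}
for s in (s1, s2):
    path = [s.k]                 # record the initial condition
    for t in range(T):
        s.update()
        path.append(s.k)
    paths[path[0]] = path
# Each path would then be drawn with ax.plot(path, 'o-', label=...),
# plus a horizontal line at s1.steady_state() for comparison
```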
Next, let’s write a class for a simple one good market where agents are price takers
The market consists of the following objects:
Here
The class provides methods to compute various values of interest, including competitive equilibrium price and quantity, tax revenue raised, consumer surplus and producer surplus
Here’s our implementation
class Market:
    """
    Models a market with linear demand q = ad - bd p and linear
    supply q = az + bz (p - tax), where tax is a per-unit tax
    """
    def __init__(self, ad, bd, az, bz, tax):
        self.ad, self.bd, self.az, self.bz, self.tax = ad, bd, az, bz, tax
        if ad < az:
            raise ValueError('Insufficient demand.')
def price(self):
"Return equilibrium price"
return (self.ad - self.az + self.bz * self.tax) / (self.bd + self.bz)
def quantity(self):
"Compute equilibrium quantity"
return self.ad - self.bd * self.price()
def consumer_surp(self):
"Compute consumer surplus"
# == Compute area under inverse demand function == #
integrand = lambda x: (self.ad / self.bd) - (1 / self.bd) * x
area, error = quad(integrand, 0, self.quantity())
return area - self.price() * self.quantity()
def producer_surp(self):
"Compute producer surplus"
# == Compute area above inverse supply curve, excluding tax == #
integrand = lambda x: -(self.az / self.bz) + (1 / self.bz) * x
area, error = quad(integrand, 0, self.quantity())
return (self.price() - self.tax) * self.quantity() - area
def taxrev(self):
"Compute tax revenue"
return self.tax * self.quantity()
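The inverse_demand, inverse_supply and inverse_supply_no_tax methods called in the plotting program below are not visible in the copy of the class above. A sketch of what they must compute, derived from the demand curve q = ad − bd·p and the supply curve q = az + bz·(p − tax) implicit in the price method (the Market here is condensed so the snippet runs on its own, and the parameter values in the test are ours, not the lecture's):

```python
class Market:
    "Condensed Market class carrying the inverse-curve methods."

    def __init__(self, ad, bd, az, bz, tax):
        self.ad, self.bd, self.az, self.bz, self.tax = ad, bd, az, bz, tax
        if ad < az:
            raise ValueError('Insufficient demand.')

    def price(self):
        "Return equilibrium price"
        return (self.ad - self.az + self.bz * self.tax) / (self.bd + self.bz)

    def quantity(self):
        "Compute equilibrium quantity"
        return self.ad - self.bd * self.price()

    def inverse_demand(self, x):
        "Price at which consumers demand quantity x"
        return self.ad / self.bd - x / self.bd

    def inverse_supply(self, x):
        "Price at which producers supply quantity x, tax included"
        return -(self.az / self.bz) + x / self.bz + self.tax

    def inverse_supply_no_tax(self, x):
        "Price at which producers supply quantity x, without the tax"
        return -(self.az / self.bz) + x / self.bz
```

At the equilibrium quantity, inverse_demand and inverse_supply both return the equilibrium price, which is a useful internal consistency check.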
Here's a short program that uses this class to plot an inverse demand curve together with inverse supply curves with and without taxes
q_max = m.quantity() * 2
q_grid = np.linspace(0.0, q_max, 100)
pd = m.inverse_demand(q_grid)
ps = m.inverse_supply(q_grid)
psno = m.inverse_supply_no_tax(q_grid)
fig, ax = plt.subplots()
ax.plot(q_grid, pd, lw=2, alpha=0.6, label='demand')
ax.plot(q_grid, ps, lw=2, alpha=0.6, label='supply')
ax.plot(q_grid, psno, '--k', lw=2, alpha=0.6, label='supply without tax')
ax.set_xlabel('quantity', fontsize=14)
ax.set_xlim(0, q_max)
ax.set_ylabel('price', fontsize=14)
ax.legend(loc='lower right', frameon=False, fontsize=14)
plt.show()
Out[20]: 1.125
Let’s look at one more example, related to chaotic dynamics in nonlinear systems
One simple transition rule that can generate complex dynamics is the logistic map
Let’s write a class for generating time series from this model
Here’s one implementation
def update(self):
"Apply the map to update state."
self.x = self.r * self.x * (1 - self.x)
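Only the update method of the class is visible above. A self-contained sketch of the full class, with the constructor and the generate_sequence method inferred from the plotting code below:

```python
class Chaos:
    """
    Models the dynamical system x_{t+1} = r x_t (1 - x_t)
    """
    def __init__(self, x0, r):
        "Initialize with state x0 and parameter r"
        self.x, self.r = x0, r

    def update(self):
        "Apply the map to update state."
        self.x = self.r * self.x * (1 - self.x)

    def generate_sequence(self, n):
        "Generate and return a trajectory of length n"
        path = []
        for i in range(n):
            path.append(self.x)
            self.update()
        return path
```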
fig, ax = plt.subplots()
ax.set_xlabel('$t$', fontsize=14)
ax.set_ylabel('$x_t$', fontsize=14)
x = ch.generate_sequence(ts_length)
ax.plot(range(ts_length), x, 'bo-', alpha=0.5, lw=2, label='$x_t$')
plt.show()
ax.set_xlabel('$r$', fontsize=16)
plt.show()
12.5. SPECIAL METHODS 167
Python provides special methods with which some neat tricks can be performed
For example, recall that lists and tuples have a notion of length and that this length can be
queried via the len function
Out[25]: 2
If you want to provide a return value for the len function when applied to your user-defined
object, use the __len__ special method
class Foo:

    def __len__(self):
        return 42
Now we get
In [27]: f = Foo()
len(f)
Out[27]: 42
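The cell redefining Foo is not shown. Calling an instance like a function invokes the __call__ special method; a definition consistent with f(8) returning 50 below (the + 42 is our guess, echoing the __len__ example) is:

```python
class Foo:

    def __call__(self, x):
        "Make instances callable: f(x) is equivalent to f.__call__(x)"
        return x + 42
```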
In [29]: f = Foo()
f(8) # Exactly equivalent to f.__call__(8)
Out[29]: 50
12.6 Exercises
12.6.1 Exercise 1
The empirical cumulative distribution function (ecdf) corresponding to a sample {𝑋𝑖 }𝑛𝑖=1 is
defined as
𝐹_𝑛(𝑥) ∶= (1/𝑛) ∑_{𝑖=1}^{𝑛} 1{𝑋_𝑖 ≤ 𝑥}    (𝑥 ∈ R)    (3)
Here 1{𝑋𝑖 ≤ 𝑥} is an indicator function (one if 𝑋𝑖 ≤ 𝑥 and zero otherwise) and hence 𝐹𝑛 (𝑥)
is the fraction of the sample that falls below 𝑥
The Glivenko–Cantelli Theorem states that, provided that the sample is IID, the ecdf 𝐹_𝑛 converges to the true distribution function 𝐹
Implement 𝐹𝑛 as a class called ECDF, where
• A given sample {𝑋𝑖 }𝑛𝑖=1 are the instance data, stored as self.observations
• The class implements a __call__ method that returns 𝐹𝑛 (𝑥) for any 𝑥
12.6.2 Exercise 2
Consider polynomials of the form

𝑝(𝑥) = 𝑎_0 + 𝑎_1𝑥 + 𝑎_2𝑥² + ⋯ + 𝑎_𝑁𝑥^𝑁 = ∑_{𝑛=0}^{𝑁} 𝑎_𝑛𝑥^𝑛    (𝑥 ∈ R)    (4)
The instance data for the class Polynomial will be the coefficients (in the case of Eq. (4),
the numbers 𝑎0 , … , 𝑎𝑁 )
Provide methods that
12.7 Solutions
12.7.1 Exercise 1
In [30]: class ECDF:
In [31]: # == test == #
print(F(0.5))
0.4
0.484
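The body of the ECDF class is not shown above; a sketch matching the specification in the exercise is:

```python
class ECDF:

    def __init__(self, observations):
        "Store the sample as instance data"
        self.observations = observations

    def __call__(self, x):
        "Return the fraction of observations less than or equal to x"
        counter = 0.0
        for obs in self.observations:
            if obs <= x:
                counter += 1
        return counter / len(self.observations)
```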
12.7.2 Exercise 2
In [32]: class Polynomial:
def differentiate(self):
"Reset self.coefficients to those of p' instead of p."
new_coefficients = []
for i, a in enumerate(self.coefficients):
new_coefficients.append(i * a)
# Remove the first element, which is zero
del new_coefficients[0]
# And reset coefficients data to new values
self.coefficients = new_coefficients
return new_coefficients
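The cell above shows only the differentiate method. A self-contained sketch of the whole class, with an __init__ storing the coefficients and a __call__ method for evaluation (the call interface is our assumption; the exercise only asks to "provide methods"):

```python
class Polynomial:

    def __init__(self, coefficients):
        """
        Creates an instance of the Polynomial class, where
        coefficients is a list [a_0, a_1, ..., a_N].
        """
        self.coefficients = coefficients

    def __call__(self, x):
        "Evaluate the polynomial, returning p(x) = sum of a_n x**n."
        y = 0
        for i, a in enumerate(self.coefficients):
            y += a * x**i
        return y

    def differentiate(self):
        "Reset self.coefficients to those of p' instead of p."
        new_coefficients = [i * a for i, a in enumerate(self.coefficients)]
        del new_coefficients[0]      # the derivative has one fewer term
        self.coefficients = new_coefficients
        return new_coefficients
```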
13

OOP III: Samuelson Multiplier Accelerator
13.1 Contents
• Overview 13.2
• Details 13.3
• Implementation 13.4
• Stochastic Shocks 13.5
• Government Spending 13.6
• Wrapping Everything Into a Class 13.7
• Using the LinearStateSpace Class 13.8
• Pure Multiplier Model 13.9
• Summary 13.10
13.2 Overview
This lecture creates non-stochastic and stochastic versions of Paul Samuelson’s celebrated
multiplier accelerator model [115]
In doing so, we extend the example of the Solow model class in our second OOP lecture
Our objectives are to
172 13. OOP III: SAMUELSON MULTIPLIER ACCELERATOR
Samuelson used a second-order linear difference equation to represent a model of national output based on three components:
• a national output identity asserting that national output is the sum of consumption plus investment plus government purchases
• a Keynesian consumption function asserting that consumption at time 𝑡 is equal to a
constant times national output at time 𝑡 − 1
• an investment accelerator asserting that investment at time 𝑡 equals a constant called
the accelerator coefficient times the difference in output between period 𝑡 − 1 and 𝑡 − 2
• the idea that consumption plus investment plus government purchases constitute aggregate demand, which automatically calls forth an equal amount of aggregate supply
(To read about linear difference equations see here or chapter IX of [118])
Samuelson used the model to analyze how particular values of the marginal propensity to
consume and the accelerator coefficient might give rise to transient business cycles in national
output
Possible dynamic properties include
Later we present an extension that adds a random shock to the right side of the national income identity representing random fluctuations in aggregate demand
This modification makes national output become governed by a second-order stochastic linear
difference equation that, with appropriate parameter values, gives rise to recurrent irregular
business cycles
(To read about stochastic linear difference equations see chapter XI of [118])
13.3 Details
𝐶_𝑡 = 𝑎𝑌_{𝑡−1} + 𝛾    (1)

𝐼_𝑡 = 𝑏(𝑌_{𝑡−1} − 𝑌_{𝑡−2})    (2)

𝑌_𝑡 = 𝐶_𝑡 + 𝐼_𝑡 + 𝐺_𝑡    (3)
Equations Eq. (1), Eq. (2), and Eq. (3) imply the following second-order linear difference
equation for national income:
𝑌_𝑡 = (𝑎 + 𝑏)𝑌_{𝑡−1} − 𝑏𝑌_{𝑡−2} + (𝛾 + 𝐺_𝑡)

or

𝑌_𝑡 = 𝜌_1 𝑌_{𝑡−1} + 𝜌_2 𝑌_{𝑡−2} + (𝛾 + 𝐺_𝑡)

where 𝜌_1 = (𝑎 + 𝑏) and 𝜌_2 = −𝑏
To complete the model, we require two initial conditions
If the model is to generate time series for 𝑡 = 0, … , 𝑇 , we require initial values

𝑌_{−1} = Ȳ_{−1},    𝑌_{−2} = Ȳ_{−2}

We'll ordinarily set the parameters (𝑎, 𝑏) so that starting from an arbitrary pair of initial conditions (Ȳ_{−1}, Ȳ_{−2}), national income 𝑌_𝑡 converges to a constant value as 𝑡 becomes large
The deterministic version of the model described so far — meaning that no random shocks
hit aggregate demand — has only transient fluctuations
We can convert the model to one that has persistent irregular fluctuations by adding a random shock to aggregate demand
𝑌𝑡 = 𝜌1 𝑌𝑡−1 + 𝜌2 𝑌𝑡−2
or
To discover the properties of the solution of Eq. (6), it is useful first to form the characteristic polynomial for Eq. (6):

𝑧² − 𝜌_1 𝑧 − 𝜌_2    (7)

𝑧² − 𝜌_1 𝑧 − 𝜌_2 = (𝑧 − 𝜆_1)(𝑧 − 𝜆_2) = 0    (8)
𝜆1 = 𝑟𝑒𝑖𝜔 , 𝜆2 = 𝑟𝑒−𝑖𝜔
where 𝑟 is the amplitude of the complex number and 𝜔 is its angle or phase
These can also be represented as
𝜆_1 = 𝑟(cos(𝜔) + 𝑖 sin(𝜔))

𝜆_2 = 𝑟(cos(𝜔) − 𝑖 sin(𝜔))
𝑌𝑡 = 𝜆𝑡1 𝑐1 + 𝜆𝑡2 𝑐2
where 𝑐1 and 𝑐2 are constants that depend on the two initial conditions and on 𝜌1 , 𝜌2
When the roots are complex, it is useful to pursue the following calculations
Notice that
𝑌𝑡 = 𝑐1 (𝑟𝑒𝑖𝜔 )𝑡 + 𝑐2 (𝑟𝑒−𝑖𝜔 )𝑡
= 𝑐1 𝑟𝑡 𝑒𝑖𝜔𝑡 + 𝑐2 𝑟𝑡 𝑒−𝑖𝜔𝑡
= 𝑐1 𝑟𝑡 [cos(𝜔𝑡) + 𝑖 sin(𝜔𝑡)] + 𝑐2 𝑟𝑡 [cos(𝜔𝑡) − 𝑖 sin(𝜔𝑡)]
= (𝑐1 + 𝑐2 )𝑟𝑡 cos(𝜔𝑡) + 𝑖(𝑐1 − 𝑐2 )𝑟𝑡 sin(𝜔𝑡)
The only way that 𝑌𝑡 can be a real number for each 𝑡 is if 𝑐1 + 𝑐2 is a real number and 𝑐1 − 𝑐2
is an imaginary number
This happens only when 𝑐1 and 𝑐2 are complex conjugates, in which case they can be written
in the polar forms
𝑐1 = 𝑣𝑒𝑖𝜃 , 𝑐2 = 𝑣𝑒−𝑖𝜃
So we can write

𝑌_𝑡 = 𝑣𝑒^{𝑖𝜃}(𝑟𝑒^{𝑖𝜔})^𝑡 + 𝑣𝑒^{−𝑖𝜃}(𝑟𝑒^{−𝑖𝜔})^𝑡 = 2𝑣𝑟^𝑡 cos(𝜔𝑡 + 𝜃)

where 𝑣 and 𝜃 are constants that must be chosen to satisfy initial conditions for 𝑌_{−1}, 𝑌_{−2}
This formula shows that when the roots are complex, 𝑌_𝑡 displays oscillations with period 𝑝̌ = 2𝜋/𝜔 and damping factor 𝑟
We say that 𝑝̌ is the period because in that amount of time the cosine wave cos(𝜔𝑡 + 𝜃) goes through exactly one complete cycle
(Draw a cosine function to convince yourself of this please)
Remark: Following [115], we want to choose the parameters 𝑎, 𝑏 of the model so that the absolute values (of the possibly complex) roots 𝜆_1, 𝜆_2 of the characteristic polynomial are both strictly less than one:
Remark: When both roots 𝜆_1, 𝜆_2 of the characteristic polynomial have absolute values strictly less than one, the absolute value of the larger one governs the rate of convergence to the steady state of the non-stochastic version of the model
Here is the formula for the matrix 𝐴 in the linear state space system in the case that government expenditures are a constant 𝐺:
        ⎡   1      0     0  ⎤
    𝐴 = ⎢ 𝛾 + 𝐺   𝜌_1   𝜌_2 ⎥
        ⎣   0      1     0  ⎦
13.4 Implementation
def param_plot():
    """This function creates the graph on page 189 of Sargent Macroeconomic Theory, second edition, 1987"""
# Set axis
xmin, ymin = -3, -2
xmax, ymax = -xmin, -ymin
plt.axis([xmin, xmax, ymin, ymax])
return fig
param_plot()
plt.show()
The graph portrays regions in which the (𝜆1 , 𝜆2 ) root pairs implied by the (𝜌1 = (𝑎 + 𝑏), 𝜌2 =
−𝑏) difference equation parameter pairs in the Samuelson model are such that:
• (𝜆1 , 𝜆2 ) are complex with modulus less than 1 - in this case, the {𝑌𝑡 } sequence displays
damped oscillations
• (𝜆1 , 𝜆2 ) are both real, but one is strictly greater than 1 - this leads to explosive growth
• (𝜆_1, 𝜆_2) are both real, but one is strictly less than −1 - this leads to explosive oscillations
• (𝜆1 , 𝜆2 ) are both real and both are less than 1 in absolute value - in this case, there is
smooth convergence to the steady state without damped cycles
Later we’ll present the graph with a red mark showing the particular point implied by the
setting of (𝑎, 𝑏)
def categorize_solution(ρ1, ρ2):
    """This function categorizes the solution type, given ρ1 and ρ2"""
    discriminant = ρ1 ** 2 + 4 * ρ2
    if ρ2 > 1 + ρ1 or ρ2 < -1:
        print('Explosive oscillations')
    elif ρ1 + ρ2 > 1:
        print('Explosive growth')
    elif discriminant < 0:
        print('Roots are complex with modulus less than one; therefore damped oscillations')
    else:
        print('Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state')
categorize_solution(1.3, -.4)
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
The following function calculates roots of the characteristic polynomial using high school algebra
(We’ll calculate the roots in other ways later)
The function also plots a 𝑌𝑡 starting from initial conditions that we set
roots = []
ρ1 = α + β
ρ2 = -β
print(f'ρ_1 is {ρ1}')
print(f'ρ_2 is {ρ2}')
discriminant = ρ1 ** 2 + 4 * ρ2
    # Roots of z**2 - ρ1 * z - ρ2 = 0 via the quadratic formula
    if discriminant == 0:
        roots.append(ρ1 / 2)
        print('Single real root: ')
        print(''.join(str(roots)))
    elif discriminant > 0:
        roots.append((ρ1 + sqrt(discriminant).real) / 2)
        roots.append((ρ1 - sqrt(discriminant).real) / 2)
        print('Two real roots: ')
        print(''.join(str(roots)))
    else:
        roots.append((ρ1 + sqrt(discriminant)) / 2)
        roots.append((ρ1 - sqrt(discriminant)) / 2)
        print('Two complex roots: ')
        print(''.join(str(roots)))
    return y_t

plot_y(y_nonstochastic())

ρ_1 is 1.42
ρ_2 is -0.5
Two real roots:
[0.7740312423743284, 0.6459687576256715]
Absolute values of roots are less than one
The next cell writes code that takes as inputs the modulus 𝑟 and phase 𝜙 of a conjugate pair
of complex numbers in polar form
𝜆1 = 𝑟 exp(𝑖𝜙), 𝜆2 = 𝑟 exp(−𝑖𝜙)
• The code assumes that these two complex numbers are the roots of the characteristic
polynomial
• It then reverse-engineers (𝑎, 𝑏) and (𝜌_1, 𝜌_2) pairs that would generate those roots
import cmath
import math
r = .95
period = 10 # Length of cycle in units of time
ϕ = 2 * math.pi / period
a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = (1.5371322893124+0j), (-0.9024999999999999+0j)
ρ1 = ρ1.real
ρ2 = ρ2.real
ρ1, ρ2
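The body of the reverse-engineering function is not shown in this copy. Since 𝜆_1 + 𝜆_2 = 𝜌_1 and 𝜆_1 𝜆_2 = −𝜌_2 for the polynomial z² − ρ1 z − ρ2, a sketch that reproduces the values printed above (the name f is our label for it) is:

```python
import cmath
import math

def f(r, ϕ):
    """
    Given a conjugate pair λ1 = r e^{iϕ}, λ2 = r e^{-iϕ}, recover
    (ρ1, ρ2) from the sum and product of the roots, then (a, b)
    from ρ1 = a + b and ρ2 = -b.
    """
    λ1 = cmath.rect(r, ϕ)     # r e^{iϕ}
    λ2 = cmath.rect(r, -ϕ)    # r e^{-iϕ}
    ρ1 = λ1 + λ2              # sum of roots
    ρ2 = -λ1 * λ2             # minus the product of roots
    b = -ρ2
    a = ρ1 - b
    return ρ1, ρ2, a, b
```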
Here we’ll use numpy to compute the roots of the characteristic polynomial
p1 = cmath.polar(r1)
p2 = cmath.polar(r2)
r, ϕ = 0.95, 0.6283185307179586
p1, p2 = (0.95, 0.6283185307179586), (0.95, -0.6283185307179586)
a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = 1.5371322893124, -0.9024999999999999
    """Rather than computing the roots of the characteristic polynomial by hand as we did earlier, this function enlists numpy to do the work for us"""
# Useful constants
ρ1 = α + β
ρ2 = -β
categorize_solution(ρ1, ρ2)
return y_t
plot_y(y_nonstochastic())
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.85+0.27838822j 0.85-0.27838822j]
Roots are complex
a = a.real # drop the imaginary part so that it is a valid input into y_nonstochastic
b = b.real
a, b = 0.6180339887498949, 1.0
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.80901699+0.58778525j 0.80901699-0.58778525j]
Roots are complex
Roots are less than one
We can also use sympy to compute analytic formulas for the roots
r1 = Symbol("ρ_1")
r2 = Symbol("ρ_2")
z = Symbol("z")
Out[12]:
[𝜌_1/2 − (1/2)√(𝜌_1² + 4𝜌_2),  𝜌_1/2 + (1/2)√(𝜌_1² + 4𝜌_2)]
In [13]: a = Symbol("α")
b = Symbol("β")
r1 = a + b
r2 = -b
Out[13]:
13.5. STOCHASTIC SHOCKS 185
[𝛼/2 + 𝛽/2 − (1/2)√(𝛼² + 2𝛼𝛽 + 𝛽² − 4𝛽),  𝛼/2 + 𝛽/2 + (1/2)√(𝛼² + 2𝛼𝛽 + 𝛽² − 4𝛽)]
Now we’ll construct some code to simulate the stochastic version of the model that emerges
when we add a random shock process to aggregate demand
"""This function takes parameters of a stochastic version of the model and proceeds to analyze
the roots of the characteristic polynomial and also generate a simulation"""
# Useful constants
ρ1 = α + β
ρ2 = -β
# Categorize solution
categorize_solution(ρ1, ρ2)
# Generate shocks
ε = np.random.normal(0, 1, n)
return y_t
plot_y(y_stochastic())
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one
Let’s do a simulation in which there are shocks and the characteristic polynomial has complex
roots
In [15]: r = .97
a = a.real # drop the imaginary part so that it is a valid input into y_nonstochastic
b = b.real
a, b = 0.6285929690873979, 0.9409000000000001
Roots are complex with modulus less than one; therefore damped oscillations
[0.78474648+0.57015169j 0.78474648-0.57015169j]
Roots are complex
Roots are less than one
13.6. GOVERNMENT SPENDING 187
    """This program computes a response to a permanent increase in government expenditures that occurs at time 20"""
# Useful constants
ρ1 = α + β
ρ2 = -β
# Categorize solution
categorize_solution(ρ1, ρ2)
# Generate shocks
ε = np.random.normal(0, 1, n)
    def transition(x, t, g=0):
        # Non-stochastic case
        if σ == 0:
            return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + g
        # Stochastic
        else:
            ε = np.random.normal(0, 1, n)
            return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + g + σ * ε[t]
# No government spending
if g == 0:
y_t.append(transition(y_t, t))
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one
We can also see the response to a one time jump in government expenditures
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one
class Samuelson():

    r"""This class represents the Samuelson model, which evolves according to

    .. math::

        Y_t = \rho_1 Y_{t-1} + \rho_2 Y_{t-2} + \gamma + G_t + \sigma \epsilon_t
Parameters
----------
y_0 : scalar
Initial condition for Y_0
y_1 : scalar
Initial condition for Y_1
α : scalar
Marginal propensity to consume
β : scalar
Accelerator coefficient
n : int
Number of iterations
σ : scalar
Volatility parameter. It must be greater than or equal to 0. Set
equal to 0 for a non-stochastic model.
g : scalar
Government spending shock
g_t : int
Time at which government spending shock occurs. Must be specified
when duration != None.
duration : {None, 'permanent', 'one-off'}
Specifies type of government spending shock. If None, government
spending is equal to g for all t.
"""
def __init__(self,
y_0=100,
y_1=50,
α=1.3,
β=0.2,
γ=10,
n=100,
σ=0,
g=0,
g_t=0,
duration=None):
def root_type(self):
if all(isinstance(root, complex) for root in self.roots):
return 'Complex conjugate'
elif len(self.roots) > 1:
return 'Double real'
else:
return 'Single real'
13.7. WRAPPING EVERYTHING INTO A CLASS 191
def root_less_than_one(self):
if all(abs(root) < 1 for root in self.roots):
return True
def solution_type(self):
ρ1, ρ2 = self.ρ1, self.ρ2
discriminant = ρ1 ** 2 + 4 * ρ2
if ρ2 >= 1 + ρ1 or ρ2 <= -1:
return 'Explosive oscillations'
elif ρ1 + ρ2 >= 1:
return 'Explosive growth'
elif discriminant < 0:
return 'Damped oscillations'
else:
return 'Steady state'
# Stochastic
else:
ϵ = np.random.normal(0, 1, self.n)
return self.ρ1 * x[t - 1] + self.ρ2 * x[t - 2] + self.γ + g + self.σ * ϵ[t]
def generate_series(self):
# No government spending
if self.g == 0:
y_t.append(self._transition(y_t, t))
def summary(self):
print('Summary\n' + '-' * 50)
print(f'Root type: {self.root_type()}')
print(f'Solution type: {self.solution_type()}')
print(f'Roots: {str(self.roots)}')
if self.root_less_than_one():
print('Absolute value of roots is less than one')
else:
print('Absolute value of roots is not less than one')
if self.σ > 0:
print('Stochastic series with σ = ' + str(self.σ))
else:
print('Non-stochastic series')
if self.g != 0:
print('Government spending equal to ' + str(self.g))
if self.duration is not None:
print(self.duration.capitalize() +
' government spending shock at t = ' + str(self.g_t))
def plot(self):
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(self.generate_series())
ax.set(xlabel='Iteration', xlim=(0, self.n))
ax.set_ylabel('$Y_t$', rotation=0)
ax.grid()
return fig
def param_plot(self):
fig = param_plot()
ax = fig.gca()
plt.legend(fontsize=12, loc=3)
return fig
Summary
--------------------------------------------------
Root type: Complex conjugate
Solution type: Damped oscillations
Roots: [0.65+0.27838822j 0.65-0.27838822j]
Absolute value of roots is less than one
Stochastic series with σ = 2
Government spending equal to 10
Permanent government spending shock at t = 20
In [21]: sam.plot()
plt.show()
We’ll use our graph to show where the roots lie and how their location is consistent with the
behavior of the path just graphed
The red + sign shows the location of the roots
In [22]: sam.param_plot()
plt.show()
It turns out that we can use the QuantEcon.py LinearStateSpace class to do much of the
work that we have done from scratch above
Here is how we map the Samuelson model into an instance of a LinearStateSpace class
""" This script maps the Samuelson model in the the ``LinearStateSpace`` class"""
α = 0.8
β = 0.9
ρ1 = α + β
ρ2 = -β
γ = 10
σ = 1
g = 10
n = 100
A = [[1, 0, 0],
[γ + g, ρ1, ρ2],
[0, 1, 0]]
x, y = sam_t.simulate(ts_length=n)
axes[-1].set_xlabel('Iteration')
plt.show()
13.8. USING THE LINEARSTATESPACE CLASS 195
Let’s plot impulse response functions for the instance of the Samuelson model using a
method in the LinearStateSpace class
Out[24]:
(2, 6, 1)
(2, 6, 1)
Now let’s compute the zeros of the characteristic polynomial by simply calculating the
eigenvalues of 𝐴
In [25]: A = np.asarray(A)
w, v = np.linalg.eig(A)
print(w)
We could also create a subclass of LinearStateSpace (inheriting all its methods and
attributes) to add more functions to use
"""
this subclass creates a Samuelson multiplier-accelerator model
as a linear state space system
"""
def __init__(self,
y_0=100,
y_1=100,
α=0.8,
β=0.9,
γ=10,
σ=1,
g=10):
self.α, self.β = α, β
self.y_0, self.y_1, self.g = y_0, y_1, g
self.γ, self.σ = γ, σ
self.ρ1 = α + β
self.ρ2 = -β
x, y = self.simulate(ts_length)
axes[-1].set_xlabel('Iteration')
return fig
x, y = self.impulse_response(j)
return fig
13.8.3 Illustrations
In [30]: samlss.plot_irf(100)
plt.show()
In [31]: samlss.multipliers()
Let’s shut down the accelerator by setting 𝑏 = 0 to get a pure multiplier model
• the absence of cycles gives an idea about why Samuelson included the accelerator
In [33]: pure_multiplier.plot_simulation()
Out[33]:
In [35]: pure_multiplier.plot_simulation()
13.9. PURE MULTIPLIER MODEL 201
Out[35]:
In [36]: pure_multiplier.plot_irf(100)
Out[36]:
13.10 Summary
In this lecture, we wrote functions and classes to represent non-stochastic and stochastic
versions of the Samuelson (1939) multiplier-accelerator model, described in [115]
We saw that different parameter values led to different output paths, which could either be
stationary, explosive, or oscillating
We also were able to represent the model using the QuantEcon.py LinearStateSpace class
14
More Language Features
14.1 Contents
• Overview 14.2
• Iterables and Iterators 14.3
• Names and Name Resolution 14.4
• Handling Errors 14.5
• Decorators and Descriptors 14.6
• Generators 14.7
• Recursive Function Calls 14.8
• Exercises 14.9
• Solutions 14.10
14.2 Overview
With this last lecture, our advice is to skip it on first pass, unless you have a burning
desire to read it
It’s here
A variety of topics are treated in the lecture, including generators, exceptions and descriptors
206 14. MORE LANGUAGE FEATURES
14.3.1 Iterators
Writing us_cities.txt
In [2]: f = open('us_cities.txt')
f.__next__()
In [3]: f.__next__()
We see that file objects do indeed have a __next__ method, and that calling this method
returns the next line in the file
The next method can also be accessed via the builtin function next(), which directly calls
this method
In [4]: next(f)
In [6]: next(e)
Writing test_table.csv
f = open('test_table.csv', 'r')
nikkei_data = reader(f)
next(nikkei_data)
In [9]: next(nikkei_data)
All iterators can be placed to the right of the in keyword in for loop statements
In fact this is how the for loop works: If we write
for x in iterator:
<code block>
f = open('somefile.txt', 'r')
for line in f:
# do something
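The iter/next mechanics described above can be made concrete. This is a minimal sketch of what a for loop does under the hood (the helper name for_loop_equivalent is ours, not from the lecture):

```python
def for_loop_equivalent(iterable, body):
    """Run `body` on each element, mimicking `for x in iterable: body(x)`."""
    iterator = iter(iterable)   # obtain an iterator from the iterable
    while True:
        try:
            x = next(iterator)  # fetch the next element
        except StopIteration:   # raised when the iterator is exhausted
            break
        body(x)

# Usage: collect squares, just as a for loop would
results = []
for_loop_equivalent([1, 2, 3], lambda x: results.append(x ** 2))
```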
14.3.3 Iterables
You already know that we can put a Python list to the right of in in a for loop
spam
eggs
Out[11]: list
In [12]: next(x)
---------------------------------------------------------------------------
<ipython-input-12-92de4e9f6b1e> in <module>
----> 1 next(x)
Out[13]: list
In [14]: y = iter(x)
type(y)
Out[14]: list_iterator
In [15]: next(y)
Out[15]: 'foo'
In [16]: next(y)
Out[16]: 'bar'
In [17]: next(y)
---------------------------------------------------------------------------
<ipython-input-17-81b9d2f0f16a> in <module>
----> 1 next(y)
StopIteration:
In [18]: iter(42)
---------------------------------------------------------------------------
<ipython-input-18-ef50b48e4398> in <module>
----> 1 iter(42)
Some built-in functions that act on sequences also work with iterables
For example
Out[19]: 10
In [20]: y = iter(x)
type(y)
Out[20]: list_iterator
In [21]: max(y)
Out[21]: 10
One thing to remember about iterators is that they are depleted by use
Out[22]: 10
In [23]: max(y)
---------------------------------------------------------------------------
<ipython-input-23-062424e6ec08> in <module>
----> 1 max(y)
In [24]: x = 42
We now know that when this statement is executed, Python creates an object of type int in
your computer’s memory, containing
• the value 42
• some associated attributes
g = f
id(g) == id(f)
Out[25]: True
In [26]: g('test')
test
In the first step, a function object is created, and the name f is bound to it
After binding the name g to the same object, we can use it anywhere we would use f
What happens when the number of names bound to an object goes to zero?
Here’s an example of this situation, where the name x is first bound to one object and then
rebound to another
In [27]: x = 'foo'
id(x)
Out[27]: 139979150881488
14.4.2 Namespaces
In [29]: x = 42
Writing math2.py
Next let’s import the math module from the standard library
In [33]: math.pi
Out[33]: 3.141592653589793
In [34]: math2.pi
Out[34]: 'foobar'
These two different bindings of pi exist in different namespaces, each one implemented as a
dictionary
We can look at the dictionary directly, using module_name.__dict__
math.__dict__.items()
Out[35]: dict_items([('__name__', 'math'), ('__doc__', 'This module is always available. It provides access t
math2.__dict__.items()
As you know, we access elements of the namespace using the dotted attribute notation
In [37]: math.pi
Out[37]: 3.141592653589793
Out[38]: True
In [39]: vars(math).items()
Out[39]: dict_items([('__name__', 'math'), ('__doc__', 'This module is always available. It provides access t
In [40]: dir(math)[0:10]
Out[40]: ['__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'acos',
'acosh',
'asin',
'asinh']
In [41]: print(math.__doc__)
In [42]: math.__name__
Out[42]: 'math'
In [43]: print(__name__)
__main__
When we run a script using IPython’s run command, the contents of the file are executed as
part of __main__ too
To see this, let’s create a file mod.py that prints its own __name__ attribute
Writing mod.py
mod
__main__
In the second case, the code is executed as part of __main__, so __name__ is equal to
__main__
To see the contents of the namespace of __main__ we use vars() rather than
vars(__main__)
If you do this in IPython, you will see a whole lot of variables that IPython needs, and has
initialized when you started up your session
If you prefer to see only the variables you have initialized, use whos
In [47]: x = 2
y = 3
import numpy as np
%whos
import amodule
At this point, the interpreter creates a namespace for the module amodule and starts
executing commands in the module
While this occurs, the namespace amodule.__dict__ is the global namespace
Once execution of the module finishes, the interpreter returns to the module from where the
import statement was made
In this case it’s __main__, so the namespace of __main__ again becomes the global
namespace
Important fact: When we call a function, the interpreter creates a local namespace for that
function, and registers the variables in that namespace
The reason for this will be explained in just a moment
Variables in the local namespace are called local variables
After the function returns, the namespace is deallocated and lost
While the function is executing, we can view the contents of the local namespace with
locals()
For example, consider
In [49]: f(1)
{'x': 1, 'a': 2}
Out[49]: 2
We have been using various built-in functions, such as max(), dir(), str(), list(),
len(), range(), type(), etc.
How does access to these names work?
In [50]: dir()[0:10]
Out[50]: ['In', 'Out', '_', '_11', '_13', '_14', '_15', '_16', '_19', '_2']
In [51]: dir(__builtins__)[0:10]
Out[51]: ['ArithmeticError',
'AssertionError',
'AttributeError',
'BaseException',
'BlockingIOError',
'BrokenPipeError',
'BufferError',
'BytesWarning',
'ChildProcessError',
'ConnectionAbortedError']
In [52]: __builtins__.max
But __builtins__ is special, because we can always access them directly as well
In [53]: max
Out[54]: True
At any point of execution, there are in fact at least two namespaces that can be accessed
directly
(“Accessed directly” means without using a dot, as in pi rather than math.pi)
These namespaces are
If the interpreter is executing a function, then the directly accessible namespaces are
Here f is the enclosing function for g, and each function gets its own namespaces
Now we can give the rule for how namespace resolution works:
The order in which the interpreter searches for names is
If the name is not in any of these namespaces, the interpreter raises a NameError
This is called the LEGB rule (local, enclosing, global, builtin)
Here’s an example that helps to illustrate
Consider a script test.py that looks as follows
a = 0
y = g(10)
print("a = ", a, "y = ", y)
Writing test.py
a = 0 y = 11
In [58]: x
Out[58]: 2
First, the global namespace for the script is created, and the function g and the name a are registered within it
This is a good time to say a little more about mutable vs immutable objects
Consider the code segment
x = 1
print(f(x), x)
2 1
We now understand what will happen here: The code prints 2 as the value of f(x) and 1 as
the value of x
First f and x are registered in the global namespace
The call f(x) creates a local namespace and adds x to it, bound to 1
Next, this local x is rebound to the new integer object 2, and this value is returned
None of this affects the global x
However, it’s a different story when we use a mutable data type such as a list
x = [1]
print(f(x), x)
[2] [2]
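The two cases above can be checked side by side in a self-contained snippet (the function names are ours, chosen for clarity):

```python
def increment(z):
    z = z + 1       # rebinds the *local* name z to a new int object
    return z

def append_two(lst):
    lst.append(2)   # mutates the object that lst is bound to
    return lst

x = 1
increment(x)
# x is unchanged: the rebinding happened only in the local namespace
assert x == 1

y = [1]
append_two(y)
# y is changed: both names referred to the same mutable list
assert y == [1, 2]
```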
$$
s^2 := \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2,
\qquad \bar{y} = \text{sample mean}
$$
• Because the debugging information provided by the interpreter is often less useful than
the information on possible errors you have in your head when writing code
• Because errors causing execution to stop are frustrating if you’re in the middle of a
large computation
• Because it reduces confidence in your code on the part of your users (if you are writing
for others)
14.5.1 Assertions
If we run this with an array of length one, the program will terminate and print our error
message
In [62]: var([1])
---------------------------------------------------------------------------
<ipython-input-62-8419b6ab38ec> in <module>
----> 1 var([1])
<ipython-input-61-e6ffb16a7098> in var(y)
1 def var(y):
2 n = len(y)
----> 3 assert n > 1, 'Sample size must be greater than one.'
4 return np.sum((y - y.mean())**2) / float(n-1)
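For reference, the var function behind this traceback, assembled from the frames shown:

```python
import numpy as np

def var(y):
    n = len(y)
    assert n > 1, 'Sample size must be greater than one.'
    return np.sum((y - y.mean())**2) / float(n - 1)
```

Called with a NumPy array of length one, it raises AssertionError with the message above; with a longer sample it returns the unbiased sample variance.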
The approach used above is a bit limited, because it always leads to termination
Sometimes we can handle errors more gracefully, by treating special cases
Let’s look at how this is done
Exceptions
Here’s an example of a common error type
In [63]: def f:
Since illegal syntax cannot be executed, a syntax error terminates execution of the program
Here’s a different kind of error, unrelated to syntax
In [64]: 1 / 0
---------------------------------------------------------------------------
<ipython-input-64-bc757c3fda29> in <module>
----> 1 1 / 0
Here’s another
In [65]: x1 = y1
---------------------------------------------------------------------------
<ipython-input-65-a7b8d65e9e45> in <module>
----> 1 x1 = y1
And another
In [66]: 'foo' + 6
---------------------------------------------------------------------------
<ipython-input-66-216809d6e6fe> in <module>
----> 1 'foo' + 6
And another
In [67]: X = []
x = X[0]
---------------------------------------------------------------------------
<ipython-input-67-082a18d7a0aa> in <module>
1 X = []
----> 2 x = X[0]
In [69]: f(2)
Out[69]: 0.5
In [70]: f(0)
In [71]: f(0.0)
In [73]: f(2)
Out[73]: 0.5
In [74]: f(0)
In [75]: f('foo')
In [77]: f(2)
Out[77]: 0.5
In [78]: f(0)
In [79]: f('foo')
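The definitions of f elided between these calls follow the standard try–except pattern. A sketch consistent with the outputs shown (the exact messages in the original may differ):

```python
def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print('Error: division by zero.  Returned None')
    except TypeError:
        print('Error: unsupported operation.  Returned None')

f(2)      # returns 0.5
f(0)      # prints the ZeroDivisionError message, returns None
f('foo')  # prints the TypeError message, returns None
```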
Let’s look at some special syntax elements that are routinely used by Python developers
You might not need the following concepts immediately, but you will see them in other
people’s code
Hence you need to understand them at some stage of your Python education
14.6.1 Decorators
Decorators are a bit of syntactic sugar that, while easily avoided, have turned out to be
popular
It’s very easy to say what decorators do
On the other hand it takes a bit of effort to explain why you might use them
An Example
Suppose we are working on a program that looks something like this
def f(x):
return np.log(np.log(x))
def g(x):
return np.sqrt(42 * x)
Now suppose there’s a problem: occasionally negative numbers get fed to f and g in the
calculations that follow
If you try it, you’ll see that when these functions are called with negative numbers they
return a NumPy object called nan
This stands for “not a number” (and indicates that you are trying to evaluate a mathematical
function at a point where it is not defined)
Perhaps this isn’t what we want, because it causes other problems that are hard to pick up
later on
Suppose that instead we want the program to terminate whenever this happens, with a
sensible error message
This change is easy enough to implement
def f(x):
assert x >= 0, "Argument must be nonnegative"
return np.log(np.log(x))
def g(x):
assert x >= 0, "Argument must be nonnegative"
return np.sqrt(42 * x)
Notice however that there is some repetition here, in the form of two identical lines of code
Repetition makes our code longer and harder to maintain, and hence is something we try
hard to avoid
Here it’s not a big deal, but imagine now that instead of just f and g, we have 20 such
functions that we need to modify in exactly the same way
This means we need to repeat the test logic (i.e., the assert line testing nonnegativity) 20
times
The situation is still worse if the test logic is longer and more complicated
In this kind of scenario the following approach would be neater
def check_nonneg(func):
def safe_function(x):
assert x >= 0, "Argument must be nonnegative"
return func(x)
return safe_function
def f(x):
return np.log(np.log(x))
def g(x):
return np.sqrt(42 * x)
f = check_nonneg(f)
g = check_nonneg(g)
# Program continues with various calculations using f and g
def g(x):
return np.sqrt(42 * x)
f = check_nonneg(f)
g = check_nonneg(g)
with
In [86]: @check_nonneg
def f(x):
return np.log(np.log(x))
@check_nonneg
def g(x):
return np.sqrt(42 * x)
14.6.2 Descriptors
One potential problem we might have here is that a user alters one of these variables but not
the other
Out[88]: 1000
In [89]: car.kms
Out[89]: 1610.0
Out[90]: 1610.0
In the last two lines we see that miles and kms are out of sync
What we really want is some mechanism whereby each time a user sets one of these variables,
the other is automatically updated
A Solution
In Python, this issue is solved using descriptors
A descriptor is just a Python object that implements certain methods
These methods are triggered when the object is accessed through dotted attribute notation
The best way to understand this is to see it in action
Consider this alternative version of the Car class
def get_miles(self):
return self._miles
def get_kms(self):
return self._kms
Out[92]: 1000
Out[93]: 9660.0
The builtin Python function property takes getter and setter methods and creates a
property
For example, after car is created as an instance of Car, the object car.miles is a property
Being a property, when we set its value via car.miles = 6000 its setter method is
triggered — in this case set_miles
Decorators and Properties
These days it’s very common to see the property function used via a decorator
Here’s another version of our Car class that works as before but now uses decorators to set
up the properties
class Car:

    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    @property
    def miles(self):
        return self._miles

    @property
    def kms(self):
        return self._kms

    @miles.setter
    def miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    @kms.setter
    def kms(self, value):
        self._kms = value
        self._miles = value / 1.61
14.7 Generators
Out[95]: tuple
In [97]: type(plural)
Out[97]: list
Out[98]: generator
In [99]: next(plural)
Out[99]: 'dogs'
In [100]: next(plural)
Out[100]: 'cats'
In [101]: next(plural)
Out[101]: 'birds'
Out[102]: 285
The function sum() calls next() to get the items and adds successive terms
In fact, we can omit the outer brackets in this case
Out[103]: 285
The most flexible way to create generator objects is to use generator functions
Let’s look at some examples
Example 1
Here’s a very simple example of a generator function
It looks like a function, but uses a keyword yield that we haven’t met before
Let’s see how it works after running this code
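The definition of f referenced here did not survive extraction; a reconstruction consistent with the outputs below ('start', 'middle', 'end'):

```python
def f():
    yield 'start'
    yield 'middle'
    yield 'end'

gen = f()  # calling f() returns a generator object; no body code runs yet
```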
In [105]: type(f)
Out[105]: function
In [107]: next(gen)
Out[107]: 'start'
In [108]: next(gen)
Out[108]: 'middle'
In [109]: next(gen)
Out[109]: 'end'
In [110]: next(gen)
---------------------------------------------------------------------------
<ipython-input-110-6e72e47198db> in <module>
----> 1 next(gen)
StopIteration:
The generator function f() is used to create generator objects (in this case gen)
Generators are iterators, because they support a next method
The first call to next(gen) executes code in the body of f() until it meets a yield statement, then returns the yielded value
The second call to next(gen) starts executing from the next line
In [113]: g
Out[114]: generator
In [115]: next(gen)
Out[115]: 2
In [116]: next(gen)
Out[116]: 4
In [117]: next(gen)
Out[117]: 16
In [118]: next(gen)
---------------------------------------------------------------------------
<ipython-input-118-6e72e47198db> in <module>
----> 1 next(gen)
StopIteration:
• The body of g() executes until the line yield x, and the value of x is returned
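The definition of g was lost in extraction; a reconstruction consistent with the values 2, 4, 16 yielded above, assuming the squaring loop used in the standard version of this lecture:

```python
def g(x):
    while x < 100:
        yield x
        x = x * x   # x is updated after each yield; 2 -> 4 -> 16 -> 256 (stop)

gen = g(2)
```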
Out[121]: 5001162
But we are creating two huge lists here, range(n) and draws
This uses lots of memory and is very slow
If we make n even bigger then this happens
In [122]: n = 100000000
draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
In [124]: n = 10000000
draws = f(n)
draws
In [125]: sum(draws)
Out[125]: 5000216
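The generator function f(n) used here replaces the huge list with one draw at a time. A sketch, assuming the coin-flip recurrence described in the text:

```python
import random

def f(n):
    """Yield n Bernoulli(0.5) draws one at a time, using O(1) memory."""
    i = 1
    while i <= n:
        yield random.uniform(0, 1) < 0.5
        i += 1

n = 10_000
total = sum(f(n))   # consumes the generator without building a list
```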
In summary, iterables can be used in a for loop and can be converted into an iterator via iter()
This is not something that you will use every day, but it is still useful — you should learn it
at some stage
Basically, a recursive function is a function that calls itself
For example, consider the problem of computing 𝑥𝑡 for some t when
What happens here is that each successive call uses its own frame in the stack
• a frame is where the local variables of a given function call are held
• stack is memory used to process function calls
– a First In Last Out (FILO) queue
This example is somewhat contrived, since the first (iterative) solution would usually be
preferred to the recursive solution
We’ll meet less contrived applications of recursion later on
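The two solutions contrasted above can be sketched as follows. The recurrence is assumed from the standard version of this lecture (x_{t+1} = 2 x_t with x_0 = 1):

```python
def x_iterative(t):
    """Iterative solution: a simple loop, one frame in total."""
    x = 1
    for _ in range(t):
        x = 2 * x
    return x

def x_recursive(t):
    """Recursive solution: each call pushes a new frame onto the stack."""
    if t == 0:
        return 1
    return 2 * x_recursive(t - 1)
```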
14.9 Exercises
14.9.1 Exercise 1
The first few numbers in the sequence are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
Write a function to recursively compute the 𝑡-th Fibonacci number for any 𝑡
14.9.2 Exercise 2
Complete the following code, and test it using this csv file, which we assume that you’ve put
in your current working directory
dates = column_iterator('test_table.csv', 1)
14.9.3 Exercise 3
prices
3
8
7
21
Using try – except, write a program to read in the contents of the file and sum the num-
bers, ignoring lines without numbers
14.10 Solutions
14.10.1 Exercise 1
Let’s test it
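The solution elided here is the standard recursive translation of the Fibonacci definition; a sketch:

```python
def x(t):
    """Compute the t-th Fibonacci number recursively."""
    if t == 0:
        return 0
    if t == 1:
        return 1
    return x(t - 1) + x(t - 2)

print([x(i) for i in range(11)])  # 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
```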
14.10.2 Exercise 2
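One natural solution, matching the usage below, is a generator function that yields the requested column entry from each line; a sketch (the original solution may differ in detail):

```python
def column_iterator(target_file, column_number):
    """A generator that yields the column_number-th field of each line."""
    for line in open(target_file):
        yield line.split(',')[column_number - 1]
```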
dates = column_iterator('test_table.csv', 1)
i = 1
for date in dates:
print(date)
if i == 10:
break
i += 1
Date
2009-05-21
2009-05-20
2009-05-19
2009-05-18
2009-05-15
2009-05-14
2009-05-13
2009-05-12
2009-05-11
14.10.3 Exercise 3
7
21
Writing numbers.txt
In [132]: f = open('numbers.txt')
total = 0.0
for line in f:
try:
total += float(line)
except ValueError:
pass
f.close()
print(total)
39.0
15
Debugging
15.1 Contents
• Overview 15.2
• Debugging 15.3
“Debugging is twice as hard as writing the code in the first place. Therefore, if
you write the code as cleverly as possible, you are, by definition, not smart enough
to debug it.” – Brian Kernighan
15.2 Overview
Are you one of those programmers who fills their code with print statements when trying to
debug their programs?
Hey, we all used to do that
(OK, sometimes we still do that…)
But once you start writing larger programs you’ll need a better system
Debugging tools for Python vary across platforms, IDEs and editors
Here we’ll focus on Jupyter and leave you to explore other settings
We’ll need the following imports
15.3 Debugging
---------------------------------------------------------------------------
<ipython-input-2-c32a2280f47b> in <module>
5 plt.show()
6
----> 7 plot_log() # Call the function, generate plot
<ipython-input-2-c32a2280f47b> in plot_log()
2 fig, ax = plt.subplots(2, 1)
3 x = np.linspace(1, 2, 10)
----> 4 ax.plot(x, np.log(x))
5 plt.show()
6
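For reference, the function behind this traceback, reconstructed from the frames shown. The call fails by design, so here we catch the error to let the snippet run to completion (the non-interactive backend line is ours, so the script runs anywhere):

```python
import matplotlib
matplotlib.use('Agg')            # non-interactive backend (our addition)
import matplotlib.pyplot as plt
import numpy as np

def plot_log():
    fig, ax = plt.subplots(2, 1)   # bug: returns an *array* of two axes
    x = np.linspace(1, 2, 10)
    ax.plot(x, np.log(x))          # AttributeError: ndarray has no plot method
    plt.show()

try:
    plot_log()
except AttributeError as e:
    caught = e                     # the error discussed in the text
```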
This code is intended to plot the log function over the interval [1, 2]
But there’s an error here: plt.subplots(2, 1) should be just plt.subplots()
(The call plt.subplots(2, 1) returns a NumPy array containing two axes objects,
suitable for having two subplots on the same figure)
The traceback shows that the error occurs at the method call ax.plot(x, np.log(x))
The error occurs because we have mistakenly made ax a NumPy array, and a NumPy array
has no plot method
But let’s pretend that we don’t understand this for the moment
We might suspect there’s something wrong with ax but when we try to investigate this
object, we get the following exception:
In [3]: ax
---------------------------------------------------------------------------
<ipython-input-3-b00e77935981> in <module>
----> 1 ax
The problem is that ax was defined inside plot_log(), and the name is lost once that
function terminates
Let’s try doing it a different way
We run the first cell block again, generating the same error
---------------------------------------------------------------------------
<ipython-input-4-c32a2280f47b> in <module>
5 plt.show()
6
----> 7 plot_log() # Call the function, generate plot
<ipython-input-4-c32a2280f47b> in plot_log()
2 fig, ax = plt.subplots(2, 1)
3 x = np.linspace(1, 2, 10)
----> 4 ax.plot(x, np.log(x))
5 plt.show()
6
%debug
You should be dropped into a new prompt that looks something like this
ipdb>
ipdb> ax
array([<matplotlib.axes.AxesSubplot object at 0x290f5d0>,
<matplotlib.axes.AxesSubplot object at 0x2930810>], dtype=object)
It’s now very clear that ax is an array, which clarifies the source of the problem
To find out what else you can do from inside ipdb (or pdb), use the online help
ipdb> h
Undocumented commands:
======================
retval rv
ipdb> h c
c(ont(inue))
Continue execution, only stop when a breakpoint is encountered.
plot_log()
Here the original problem is fixed, but we’ve accidentally written np.logspace(1, 2, 10)
instead of np.linspace(1, 2, 10)
Now there won’t be any exception, but the plot won’t look right
To investigate, it would be helpful if we could inspect variables like x during execution of the
function
To this end, we add a “break point” by inserting breakpoint() inside the function code
block
def plot_log():
breakpoint()
fig, ax = plt.subplots()
x = np.logspace(1, 2, 10)
ax.plot(x, np.log(x))
plt.show()
plot_log()
Now let’s run the script, and investigate via the debugger
> <ipython-input-6-a188074383b7>(6)plot_log()
-> fig, ax = plt.subplots()
(Pdb) n
> <ipython-input-6-a188074383b7>(7)plot_log()
-> x = np.logspace(1, 2, 10)
(Pdb) n
> <ipython-input-6-a188074383b7>(8)plot_log()
-> ax.plot(x, np.log(x))
(Pdb) x
array([ 10. , 12.91549665, 16.68100537, 21.5443469 ,
27.82559402, 35.93813664, 46.41588834, 59.94842503,
77.42636827, 100. ])
We used n twice to step forward through the code (one line at a time)
Then we printed the value of x to see what was happening with that variable
To exit from the debugger, use q
16
Pandas
16.1 Contents
• Overview 16.2
• Series 16.3
• DataFrames 16.4
• On-Line Data Sources 16.5
• Exercises 16.6
• Solutions 16.7
16.2 Overview
Just as NumPy provides the basic array data type plus core array operations, pandas defines fundamental structures for working with data and endows them with methods that facilitate operations such as
• reading in data
• adjusting indices
• working with dates and time series
• sorting, grouping, re-ordering and general data munging [1]
• dealing with missing values, etc., etc.
More sophisticated statistical functionality is left to other packages, such as statsmodels and
scikit-learn, which are built on top of pandas
This lecture will provide a basic introduction to pandas
Throughout the lecture, we will assume that the following imports have taken place
16.3 Series
Two important data types defined by pandas are Series and DataFrame
You can think of a Series as a “column” of data, such as a collection of observations on a
single variable
A DataFrame is an object for storing related columns of data
Let’s start with Series
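The Series shown below was created along these lines (the values are random draws, so yours will differ):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(4), name='daily returns')
print(s)
```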
Out[2]: 0 0.246617
1 1.616297
2 1.371344
3 -0.854713
Name: daily returns, dtype: float64
Here you can imagine the indices 0, 1, 2, 3 as indexing four listed companies, and the
values being daily returns on their shares
Pandas Series are built on top of NumPy arrays and support many similar operations
In [3]: s * 100
Out[3]: 0 24.661661
1 161.629724
2 137.134394
3 -85.471300
Name: daily returns, dtype: float64
In [4]: np.abs(s)
Out[4]: 0 0.246617
1 1.616297
2 1.371344
3 0.854713
Name: daily returns, dtype: float64
In [5]: s.describe()
Viewed in this way, Series are like fast, efficient Python dictionaries (with the restriction
that the items in the dictionary all have the same type—in this case, floats)
In fact, you can use much of the same syntax as Python dictionaries
In [7]: s['AMZN']
Out[7]: 0.24661661104520952
In [8]: s['AMZN'] = 0
s
In [9]: 'AAPL' in s
Out[9]: True
16.4 DataFrames
While a Series is a single column of data, a DataFrame is several columns, one for each
variable
In essence, a DataFrame in pandas is analogous to a (highly optimized) Excel spreadsheet
Thus, it is a powerful tool for representing and analyzing data that are naturally organized
into rows and columns, often with descriptive indexes for individual rows and individual
columns
Let’s look at an example that reads data from the CSV file pandas/data/test_pwt.csv
that can be downloaded here
Here’s the content of test_pwt.csv
"country","country isocode","year","POP","XRAT","tcgdp","cc","cg"
"Argentina","ARG","2000","37335.653","0.9995","295072.21869","75.716805379","5.5
"Australia","AUS","2000","19053.186","1.72483","541804.6521","67.759025993","6.7
"India","IND","2000","1006300.297","44.9416","1728144.3748","64.575551328","14.0
"Israel","ISR","2000","6114.57","4.07733","129253.89423","64.436450847","10.2666
"Malawi","MWI","2000","11801.505","59.543808333","5026.2217836","74.707624181","
"South Africa","ZAF","2000","45064.098","6.93983","227242.36949","72.718710427",
"United States","USA","2000","282171.957","1","9898700","72.347054303","6.032453
"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","
Supposing you have this data saved as test_pwt.csv in the present working directory (type
%pwd in Jupyter to see what this is), it can be read in as follows:
In [10]: df = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas/data/test_pw
type(df)
Out[10]: pandas.core.frame.DataFrame
In [11]: df
cc cg
0 75.716805 5.578804
1 67.759026 6.720098
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954
5 72.718710 5.726546
6 72.347054 6.032454
7 78.978740 5.108068
We can select particular rows using standard Python array slicing notation
In [12]: df[2:5]
cc cg
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954
To select columns, we can pass a list containing the names of the desired columns represented
as strings
To select both rows and columns using integers, the iloc attribute should be used with the
format .iloc[rows, columns]
To select rows and columns using a mixture of integers and labels, the loc attribute can be
used in a similar way
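These three selection idioms can be sketched on a toy frame standing in for the Penn World Table data (the values below are illustrative, not the real figures):

```python
import pandas as pd

# A small stand-in for the Penn World Table frame used in the text
df = pd.DataFrame({'country': ['Argentina', 'Australia', 'India'],
                   'POP': [37335.653, 19053.186, 1006300.297],
                   'tcgdp': [295072.2, 541804.7, 1728144.4]})

cols = df[['country', 'tcgdp']]        # columns by a list of name strings
block = df.iloc[0:2, 0:2]              # rows and columns by integer position
mixed = df.loc[df.index[0:2], ['country', 'tcgdp']]  # integers plus labels
```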
Let’s imagine that we’re only interested in population and total GDP (tcgdp)
One way to strip the data frame df down to only these variables is to overwrite the
dataframe using the selection method described above
Here the index 0, 1,..., 7 is redundant because we can use the country names as an
index
To do this, we set the index to be the country variable in the dataframe
In [17]: df = df.set_index('country')
df
Next, we’re going to add a column showing real GDP per capita, multiplying by 1,000,000 as
we go because total GDP is in millions
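The new column is one line of elementwise arithmetic; a sketch on a two-country excerpt (the real code operates on the full df loaded above):

```python
import pandas as pd

df = pd.DataFrame({'POP': [37335.653, 282171.957],
                   'tcgdp': [295072.21869, 9898700.0]},
                  index=['Argentina', 'United States'])

# total GDP is in millions, so scale up before dividing by population
df['GDP percap'] = df['tcgdp'] * 1e6 / df['POP']
```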
One of the nice things about pandas DataFrame and Series objects is that they have
methods for plotting and visualization that work through Matplotlib
For example, we can easily generate a bar plot of GDP per capita
df['GDP percap'].plot(kind='bar')
plt.show()
252 16. PANDAS
At the moment the data frame is ordered alphabetically on the countries—let’s change it to
GDP per capita
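The reordering can be done with sort_values; a minimal sketch with made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({'GDP percap': [35080.0, 7840.0, 21340.0]},
                  index=['United States', 'Uruguay', 'Israel'])

# Reorder rows from richest to poorest before plotting
df = df.sort_values(by='GDP percap', ascending=False)
```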
https://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv
One option is to use requests, a standard Python library for requesting data over the Internet
To begin, try the following code on your computer
import requests

r = requests.get('http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv')
If this call fails, two common reasons are:
1. You are not connected to the Internet — hopefully, this isn't the case
2. Your machine is accessing the Internet through a proxy server, and Python isn't aware of this
Assuming no errors, the returned text can be split into lines and inspected, e.g. with source = r.text.split('\n'):
In [25]: source[0]
Out[25]: 'DATE,VALUE\r'
In [26]: source[1]
Out[26]: '1948-01-01,3.4\r'
In [27]: source[2]
Out[27]: '1948-02-01,3.8\r'
We could now write some additional code to parse this text and store it as an array
But this is unnecessary — pandas’ read_csv function can handle the task for us
We use parse_dates=True so that pandas recognizes our dates column, allowing for simple
date filtering
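A minimal sketch of the same call, using an in-memory copy of the first few lines instead of the network:

```python
import io
import pandas as pd

# Mimic the UNRATE file with an in-memory CSV
csv_text = "DATE,VALUE\n1948-01-01,3.4\n1948-02-01,3.8\n"
data = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
```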
The data has been read into a pandas DataFrame called data that we can now manipulate in
the usual way
16.5 On-line Data Sources
In [29]: type(data)
Out[29]: pandas.core.frame.DataFrame
In [30]: data.head()
Out[30]: VALUE
DATE
1948-01-01 3.4
1948-02-01 3.8
1948-03-01 4.0
1948-04-01 3.9
1948-05-01 3.5
In [31]: pd.set_option('precision', 1)
data.describe() # Your output might differ slightly
Out[31]: VALUE
count 857.0
mean 5.8
std 1.6
min 2.5
25% 4.6
50% 5.6
75% 6.8
max 10.8
We can also plot the unemployment rate from 2006 to 2012 as follows
In [32]: data['2006':'2012'].plot()
plt.show()
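Partial-string date slicing works on any Series with a DatetimeIndex; a small sketch:

```python
import pandas as pd

idx = pd.to_datetime(['2005-01-01', '2006-01-01', '2007-01-01',
                      '2012-01-01', '2013-01-01'])
s = pd.Series([1, 2, 3, 4, 5], index=idx)

# Year strings select all observations in 2006 through 2012, inclusive
window = s['2006':'2012']
```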
Let’s look at one more example of downloading and manipulating data — this time from the
World Bank
The World Bank collects and organizes data on a huge range of indicators
For example, here’s some data on government debt as a ratio to GDP
If you click on “DOWNLOAD DATA” you will be given the option to download the data as
an Excel file
The next program does this for you, reads an Excel file into a pandas DataFrame, and plots
time series for the US and Australia
16.6 Exercises
16.6.1 Exercise 1
Write a program to calculate the percentage price change over 2013 for the following shares
A dataset of daily closing prices for the above firms can be found in pandas/data/ticker_data.csv and can be downloaded here
Plot the result as a bar graph like the one that follows
16.7 Solutions
16.7.1 Exercise 1
In [35]: ticker = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas/data/tic
ticker.set_index('Date', inplace=True)
ticker_list = {'AMZN': 'Amazon',
               'BA': 'Boeing',
               'QCOM': 'Qualcomm',
               'KO': 'Coca-Cola',
               'GOOG': 'Google',
               'SNE': 'Sony',
               'PTR': 'PetroChina'}

price_change = pd.Series()

for tick in ticker_list:
    change = 100 * (ticker[tick].iloc[-1] - ticker[tick].iloc[0]) / ticker[tick].iloc[0]
    name = ticker_list[tick]
    price_change[name] = change

price_change.sort_values(inplace=True)
fig, ax = plt.subplots(figsize=(10,8))
price_change.plot(kind='bar', ax=ax)
plt.show()
Footnotes
[1] Wikipedia defines munging as cleaning data from one raw form into a structured, purged
one.
17
Pandas for Panel Data
17.1 Contents
• Overview 17.2
• Exercises 17.7
• Solutions 17.8
17.2 Overview
pandas (derived from ‘panel’ and ‘data’) contains powerful and easy-to-use tools for solving
exactly these kinds of problems
In what follows, we will use a panel data set of real minimum wages from the OECD to create:
260 17. PANDAS FOR PANEL DATA
We will begin by reading in our long format panel data from a CSV file and reshaping the
resulting DataFrame with pivot_table to build a MultiIndex
Additional detail will be added to our DataFrame using pandas’ merge function, and data
will be summarized with the groupby function
Most of this lecture was created by Natasha Watkins
We will read in a dataset from the OECD of real minimum wages in 32 countries and assign
it to realwage
The dataset pandas_panel/realwage.csv can be downloaded here
Make sure the file is in your current working directory
realwage = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/r
The data is currently in long format, which is difficult to analyze when there are several dimensions to the data
We will use pivot_table to create a wide format panel, with a MultiIndex to handle
higher dimensional data
pivot_table arguments should specify the data (values), the index, and the columns we
want in our resulting dataframe
By passing a list in columns, we can create a MultiIndex in our column axis
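A minimal sketch of this call on made-up long-format data (the column names here are illustrative, not the exact ones in realwage.csv):

```python
import pandas as pd

# Made-up long-format panel resembling the minimum wage data
long = pd.DataFrame({'Time': ['2006', '2006', '2007', '2007'],
                     'Country': ['Australia', 'Belgium', 'Australia', 'Belgium'],
                     'Pay period': ['Hourly'] * 4,
                     'value': [12.06, 10.09, 12.46, 10.22]})

# A list in `columns` produces a MultiIndex on the column axis
wide = long.pivot_table(values='value',
                        index='Time',
                        columns=['Country', 'Pay period'])
```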
Country … \
Series In 2015 constant prices at 2015 USD exchange rates …
Pay period Annual …
Time …
2006-01-01 23,826.64 …
2007-01-01 24,616.84 …
2008-01-01 24,185.70 …
2009-01-01 24,496.84 …
2010-01-01 24,373.76 …
Country
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88
To more easily filter our time series data later on, we will convert the index into a DatetimeIndex
Out[4]: pandas.core.indexes.datetimes.DatetimeIndex
The columns contain multiple levels of indexing, known as a MultiIndex, with levels being
ordered hierarchically (Country > Series > Pay period)
A MultiIndex is the simplest and most flexible way to manage panel data in pandas
In [5]: type(realwage.columns)
Out[5]: pandas.core.indexes.multi.MultiIndex
In [6]: realwage.columns.names
Like before, we can select the country (the top level of our MultiIndex)
Stacking and unstacking levels of the MultiIndex will be used throughout this lecture to
reshape our dataframe into a format we need
.stack() rotates the lowest level of the column MultiIndex to the row index (.unstack() works in the opposite direction - try it out)
In [8]: realwage.stack().head()
Country \
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 23,826.64
Hourly 12.06
2007-01-01 Annual 24,616.84
Hourly 12.46
2008-01-01 Annual 24,185.70
Country Belgium … \
Series In 2015 constant prices at 2015 USD PPPs …
Time Pay period …
2006-01-01 Annual 21,042.28 …
Hourly 10.09 …
2007-01-01 Annual 21,310.05 …
Hourly 10.22 …
2008-01-01 Annual 21,416.96 …
Country
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 12,594.40
Hourly 6.05
2007-01-01 Annual 12,974.40
Hourly 6.24
2008-01-01 Annual 14,097.56
[5 rows x 64 columns]
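The mechanics can be sketched on a tiny frame: .stack() moves the innermost column level into the rows, and .unstack() reverses it

```python
import pandas as pd

cols = pd.MultiIndex.from_product([['Australia', 'Belgium'], ['Annual', 'Hourly']],
                                  names=['Country', 'Pay period'])
df = pd.DataFrame([[23826.64, 12.06, 21042.28, 10.09]],
                  index=['2006-01-01'], columns=cols)

stacked = df.stack()            # 'Pay period' rotates into the row index
roundtrip = stacked.unstack()   # and unstack() restores the original shape
```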
We can also pass in an argument to select the level we would like to stack
In [9]: realwage.stack(level='Country').head()
Time
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Country
Australia 25,349.90 12.83
Belgium 20,753.48 9.95
Brazil 2,842.28 1.21
Canada 17,367.24 8.35
Chile 4,251.49 1.81
For the rest of the lecture, we will work with a dataframe of the hourly real minimum wages across countries and time, measured in 2015 US dollars
To create our filtered dataframe (realwage_f), we can use the xs method to select values
at lower levels in the multiindex, while keeping the higher levels (countries in this case)
In [11]: realwage_f = realwage.xs(('Hourly', 'In 2015 constant prices at 2015 USD exchange rates'),
level=('Pay period', 'Series'), axis=1)
realwage_f.head()
[5 rows x 32 columns]
Similar to relational databases like SQL, pandas has built-in methods to merge datasets together
Using country information from WorldData.info, we’ll add the continent of each country to
realwage_f with the merge function
The CSV file can be found in pandas_panel/countries.csv and can be downloaded
here
[5 rows x 17 columns]
First, we’ll select just the country and continent variables from worlddata and rename the
column to ‘Country’
In [14]: realwage_f.transpose().head()
Time 2016-01-01
Country
Australia 12.98
Belgium 9.76
Brazil 1.24
Canada 8.48
Chile 1.91
[5 rows x 11 columns]
We can use either left, right, inner, or outer join to merge our datasets:
We will also need to specify where the country name is located in each dataframe, which will
be the key that is used to merge the dataframes ‘on’
Our ‘left’ dataframe (realwage_f.transpose()) contains countries in the index, so we
set left_index=True
Our ‘right’ dataframe (worlddata) contains countries in the ‘Country’ column, so we set
right_on='Country'
[5 rows x 13 columns]
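A minimal sketch of this merge pattern on a two-country stand-in:

```python
import pandas as pd

left = pd.DataFrame({'2016-01-01': [12.98, 9.76]},
                    index=['Australia', 'Belgium'])           # countries in the index
right = pd.DataFrame({'Country': ['Australia', 'Belgium'],
                      'Continent': ['Australia', 'Europe']})  # countries in a column

merged = left.merge(right, left_index=True, right_on='Country', how='left')
```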
Countries that appeared in realwage_f but not in worlddata will have NaN in the Continent column
To check whether this has occurred, we can use .isnull() on the continent column and
filter the merged dataframe
In [16]: merged[merged['Continent'].isnull()]
[3 rows x 13 columns]
In [17]: missing_continents = {'Korea': 'Asia',
                               'Russian Federation': 'Europe',
                               'Slovak Republic': 'Europe'}

merged['Country'].map(missing_continents)
Out[17]: 17 NaN
23 NaN
32 NaN
100 NaN
38 NaN
108 NaN
41 NaN
225 NaN
53 NaN
58 NaN
45 NaN
68 NaN
233 NaN
86 NaN
88 NaN
91 NaN
247 Asia
117 NaN
122 NaN
123 NaN
138 NaN
153 NaN
151 NaN
174 NaN
175 NaN
247 Europe
247 Europe
198 NaN
200 NaN
227 NaN
241 NaN
240 NaN
Name: Country, dtype: object
merged[merged['Country'] == 'Korea']
[1 rows x 13 columns]
We will also combine the Americas into a single continent - this will make our visualization
nicer later on
To do this, we will use .replace() and loop through a list of the continent values we want
to replace
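A sketch of that loop on a made-up frame:

```python
import pandas as pd

merged = pd.DataFrame({'Country': ['Brazil', 'Canada', 'Belgium'],
                       'Continent': ['South America', 'North America', 'Europe']})

# Collapse the American sub-continents into a single 'America' label
for continent in ['North America', 'South America', 'Central America']:
    merged['Continent'] = merged['Continent'].replace(continent, 'America')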
Now that we have all the data we want in a single DataFrame, we will reshape it back into
panel form with a MultiIndex
We should also sort the index using .sort_index() so that we can efficiently filter our dataframe later on
By default, levels will be sorted top-down
2015-01-01 2016-01-01
Continent Country
America Brazil 1.21 1.24
Canada 8.35 8.48
Chile 1.81 1.91
Colombia 1.13 1.12
Costa Rica 2.56 2.63
[5 rows x 11 columns]
While merging, we lost our DatetimeIndex, as we merged columns that were not in datetime format
In [21]: merged.columns
Now that we have set the merged columns as the index, we can recreate a DatetimeIndex
using .to_datetime()
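A minimal sketch of this conversion:

```python
import pandas as pd

df = pd.DataFrame({'w': [1.0, 2.0]}, index=['2006-01-01', '2007-01-01'])

df.index = pd.to_datetime(df.index)   # string labels -> DatetimeIndex
```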
17.5 Grouping and Summarizing Data
The DatetimeIndex tends to work more smoothly in the row axis, so we will go ahead and
transpose merged
[5 rows x 32 columns]
Grouping and summarizing data can be particularly useful for understanding large panel
datasets
A simple way to summarize data is to call an aggregation method on the dataframe, such as
.mean() or .max()
For example, we can calculate the average real minimum wage for each country over the period 2006 to 2016 (the default is to aggregate over rows)
In [24]: merged.mean().head(10)
Using this series, we can plot the average real minimum wage over the past decade for each
country in our data set
plt.show()
Passing in axis=1 to .mean() will aggregate over columns (giving the average minimum
wage for all countries over time)
In [26]: merged.mean(axis=1).head()
Out[26]: Time
2006-01-01 4.69
2007-01-01 4.84
2008-01-01 4.90
2009-01-01 5.08
2010-01-01 5.11
dtype: float64
In [27]: merged.mean(axis=1).plot()
plt.title('Average real minimum wage 2006 - 2016')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()
We can also specify a level of the MultiIndex (in the column axis) to aggregate over
We can plot the average minimum wages in each continent as a time series
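Since the level-aggregation idiom varies across pandas versions, one robust sketch is to transpose, group by the Continent level, and transpose back:

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([('America', 'Brazil'), ('America', 'Canada'),
                                  ('Europe', 'Belgium')],
                                 names=['Continent', 'Country'])
merged = pd.DataFrame([[1.0, 8.0, 10.0],
                       [2.0, 9.0, 11.0]],
                      index=['2006-01-01', '2007-01-01'], columns=cols)

# Average real minimum wage by continent, one value per date
continent_means = merged.T.groupby(level='Continent').mean().T
```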
In [31]: merged.stack().describe()
The groupby method achieves the first step of this process, creating a new
DataFrameGroupBy object with data split into groups
Let’s split merged by continent again, this time using the groupby function, and name the
resulting object grouped
Calling an aggregation method on the object applies the function to each group, the results of
which are combined in a new data structure
For example, we can return the number of countries in our dataset for each continent using
.size()
In this case, our new data structure is a Series
In [33]: grouped.size()
Out[33]: Continent
America 7
Asia 4
Europe 19
dtype: int64
Calling .get_group() to return just the countries in a single group, we can create a kernel
density estimate of the distribution of real minimum wages in 2016 for each continent
grouped.groups.keys() will return the keys from the groupby object
continents = grouped.groups.keys()
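In isolation, .get_group() looks like this on a made-up frame (the kernel density step itself would then use seaborn's kdeplot on each group):

```python
import pandas as pd

merged = pd.DataFrame({'Continent': ['America', 'America', 'Europe'],
                       'wage2016': [1.24, 8.48, 9.76]},
                      index=['Brazil', 'Canada', 'Belgium'])

grouped = merged.groupby('Continent')
america = grouped.get_group('America')   # just the rows for one group
```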
This lecture has provided an introduction to some of pandas' more advanced features, including multiindices, merging, grouping and plotting
Other tools that may be useful in panel data analysis include xarray, a Python package that extends pandas to N-dimensional data structures
17.7 Exercises
17.7.1 Exercise 1
In these exercises, you’ll work with a dataset of employment rates in Europe by age and sex
from Eurostat
The dataset pandas_panel/employ.csv can be downloaded here
Reading in the CSV file returns a panel dataset in long format. Use .pivot_table() to
construct a wide format dataframe with a MultiIndex in the columns
Start off by exploring the dataframe and the variables available in the MultiIndex levels
Write a program that quickly returns all values in the MultiIndex
17.7.2 Exercise 2
Filter the above dataframe to only include employment as a percentage of ‘active population’
Create a grouped boxplot using seaborn of employment rates in 2015 by age group and sex
Hint: GEO includes both areas and countries
17.8 Solutions
17.8.1 Exercise 1
In [35]: employ = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/em
employ = employ.pivot_table(values='Value',
index=['DATE'],
columns=['UNIT','AGE', 'SEX', 'INDIC_EM', 'GEO'])
employ.index = pd.to_datetime(employ.index) # ensure that dates are datetime format
employ.head()
UNIT
AGE
SEX
INDIC_EM
GEO United Kingdom
DATE
2007-01-01 4,131.00
2008-01-01 4,204.00
2009-01-01 4,193.00
2010-01-01 4,186.00
2011-01-01 4,164.00
This is a large dataset so it is useful to explore the levels and variables available
In [36]: employ.columns.names
17.8.2 Exercise 2
To easily filter by country, swap GEO to the top level and sort the MultiIndex
We need to get rid of a few items in GEO which are not countries
A fast way to get rid of the EU areas is to use a list comprehension to find the level values in
GEO that begin with ‘Euro’
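A sketch of that list comprehension (the area labels here are illustrative):

```python
geo_values = ['Austria', 'Belgium',
              'Euro area (18 countries)', 'European Union (28 countries)']

# Keep only level values that do not begin with 'Euro'
countries = [x for x in geo_values if not x.startswith('Euro')]
```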
Select only percentage employed in the active population from the dataframe
GEO
AGE
SEX Total
DATE
2007-01-01 59.30
2008-01-01 59.80
2009-01-01 60.30
2010-01-01 60.00
2011-01-01 59.70
18
Linear Regression in Python
18.1 Contents
• Overview 18.2
• Endogeneity 18.5
• Summary 18.6
• Exercises 18.7
• Solutions 18.8
In addition to what’s in Anaconda, this lecture will need the following libraries
18.2 Overview
Linear regression is a standard tool for analyzing the relationship between two or more variables
In this lecture, we'll use the Python package statsmodels to estimate, interpret, and visualize linear regression models
Along the way, we’ll discuss a variety of topics, including
As an example, we will replicate results from Acemoglu, Johnson and Robinson's seminal paper [3]
280 18. LINEAR REGRESSION IN PYTHON
In the paper, the authors emphasize the importance of institutions in economic development
The main contribution is the use of settler mortality rates as a source of exogenous variation
in institutional differences
Such variation is needed to determine whether it is institutions that give rise to greater economic growth, rather than the other way around
18.2.1 Prerequisites
18.2.2 Comments
[3] wish to determine whether or not differences in institutions can help to explain observed
economic outcomes
How do we measure institutional differences and economic outcomes?
In this paper,
• economic outcomes are proxied by log GDP per capita in 1995, adjusted for exchange
rates
• institutional differences are proxied by an index of protection against expropriation on
average over 1985-95, constructed by the Political Risk Services Group
These variables and other data used in the paper are available for download on Daron Acemoglu's webpage
We will use pandas’ .read_stata() function to read in data contained in the .dta files to
dataframes
df1 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable1.dt
df1.head()
Let’s use a scatterplot to see whether any obvious relationship exists between GDP per capita
and the protection against expropriation index
The plot shows a fairly strong positive relationship between protection against expropriation
and log GDP per capita
Specifically, if higher protection against expropriation is a measure of institutional quality,
then better institutions appear to be positively correlated with better economic outcomes
(higher GDP per capita)
Given the plot, choosing a linear model to describe this relationship seems like a reasonable
assumption
We can write our model as
𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 = 𝛽0 + 𝛽1 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 + 𝑢𝑖
where:
• 𝛽0 is the intercept of the linear trend line
• 𝛽1 is the slope of the linear trend line, representing the marginal effect of protection
against risk on log GDP per capita
• 𝑢𝑖 is a random error term (deviations of observations from the linear trend due to factors not included in the model)
Visually, this linear model involves choosing a straight line that best fits the data, as in the
following plot (Figure 2 in [3])
X = df1_subset['avexpr']
y = df1_subset['logpgp95']
labels = df1_subset['shortnam']
plt.xlim([3.3,10.5])
plt.ylim([4,10.5])
plt.xlabel('Average Expropriation Risk 1985-95')
plt.ylabel('Log GDP per capita, PPP, 1995')
plt.title('Figure 2: OLS relationship between expropriation risk and income')
plt.show()
18.3 Simple Linear Regression
The most common technique to estimate the parameters (𝛽’s) of the linear model is Ordinary
Least Squares (OLS)
As the name implies, an OLS model is solved by finding the parameters that minimize the sum of squared residuals, i.e.

$$\min_{\hat{\beta}} \sum_{i=1}^{N} \hat{u}_i^2$$
where 𝑢̂𝑖 is the difference between the observation and the predicted value of the dependent
variable
To estimate the constant term 𝛽0 , we need to add a column of 1’s to our dataset (consider
the equation if 𝛽0 was replaced with 𝛽0 𝑥𝑖 and 𝑥𝑖 = 1)
In [5]: df1['const'] = 1
Now we can construct our model in statsmodels using the OLS function
We will use pandas dataframes with statsmodels; however, standard arrays can also be used as arguments
Out[6]: statsmodels.regression.linear_model.OLS
Out[7]: statsmodels.regression.linear_model.RegressionResultsWrapper
In [8]: print(results.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Using our parameter estimates, we can now write our estimated relationship as
$$\widehat{logpgp95}_i = 4.63 + 0.53 \, avexpr_i$$
This equation describes the line that best fits our data, as shown in Figure 2
We can use this equation to predict the level of log GDP per capita for a value of the index of
expropriation protection
For example, for a country with an index value of 7.07 (the average for the dataset), we find
that their predicted level of log GDP per capita in 1995 is 8.38
Out[9]: 6.515625
Out[10]: 8.3771
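As a quick check of the arithmetic, the fitted line evaluated at the quoted mean reproduces the figure above:

```python
# Fitted line: logpgp95_hat = 4.63 + 0.53 * avexpr
mean_expr = 7.07   # dataset average of avexpr quoted above
predicted_logpgp95 = 4.63 + 0.53 * mean_expr
```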
An easier (and more accurate) way to obtain this result is to use .predict() and set
𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡 = 1 and 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 = 𝑚𝑒𝑎𝑛_𝑒𝑥𝑝𝑟
Out[11]: array([8.09156367])
We can obtain an array of predicted 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 for every value of 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 in our dataset by
calling .predict() on our results
Plotting the predicted values against 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 shows that the predicted values lie along the line that we fitted above
The observed values of 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 are also plotted for comparison purposes
plt.legend()
plt.title('OLS predicted values')
plt.xlabel('avexpr')
plt.ylabel('logpgp95')
plt.show()
So far we have only accounted for institutions affecting economic performance - almost certainly there are numerous other factors affecting GDP that are not included in our model
Leaving out variables that affect 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 will result in omitted variable bias, yielding
biased and inconsistent parameter estimates
We can extend our bivariate regression model to a multivariate regression model by
adding in other factors that may affect 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖
[3] consider other factors such as:
Let’s estimate some of the extended models considered in the paper (Table 2) using data from
maketable2.dta
Now that we have fitted our model, we will use summary_col to display the results in a single table (model numbers correspond to those in the paper)
results_table = summary_col(results=[reg1,reg2,reg3],
float_format='%0.2f',
stars = True,
model_names=['Model 1',
'Model 3',
'Model 4'],
info_dict=info_dict,
regressor_order=['const',
'avexpr',
'lat_abst',
'asia',
'africa'])
print(results_table)
(0.49) (0.45)
asia -0.15
(0.15)
africa -0.92***
(0.17)
other 0.30
(0.37)
R-squared 0.61 0.62 0.72
No. observations 111 111 111
=========================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
18.5 Endogeneity
As [3] discuss, the OLS models likely suffer from endogeneity issues, resulting in biased and
inconsistent model estimates
Namely, there is likely a two-way relationship between institutions and economic outcomes:
To deal with endogeneity, we can use two-stage least squares (2SLS) regression, which
is an extension of OLS regression
This method requires replacing the endogenous variable 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 with a variable that is:
The new set of regressors is called an instrument, which aims to remove endogeneity in our
proxy of institutional differences
The main contribution of [3] is the use of settler mortality rates to instrument for institu-
tional differences
They hypothesize that higher mortality rates of colonizers led to the establishment of insti-
tutions that were more extractive in nature (less protection against expropriation), and these
institutions still persist today
Using a scatterplot (Figure 3 in [3]), we can see protection against expropriation is negatively
correlated with settler mortality rates, coinciding with the authors’ hypothesis and satisfying
the first condition of a valid instrument
X = df1_subset2['logem4']
y = df1_subset2['avexpr']
labels = df1_subset2['shortnam']
plt.scatter(X, y, marker='')
plt.xlim([1.8,8.4])
plt.ylim([3.3,10.4])
plt.xlabel('Log of Settler Mortality')
plt.ylabel('Average Expropriation Risk 1985-95')
plt.title('Figure 3: First-stage relationship between settler mortality and expropriation risk')
plt.show()
The second condition may not be satisfied if settler mortality rates in the 17th to 19th centuries have a direct effect on current GDP (in addition to their indirect effect through institutions)
For example, settler mortality rates may be related to the current disease environment in a
country, which could affect current economic performance
[3] argue this is unlikely because:
• The majority of settler deaths were due to malaria and yellow fever and had a limited
effect on local people
• The disease burden on local people in Africa or India, for example, did not appear to
be higher than average, supported by relatively high population densities in these areas
before colonization
As we appear to have a valid instrument, we can use 2SLS regression to obtain consistent and
unbiased parameter estimates
First stage
The first stage involves regressing the endogenous variable (𝑎𝑣𝑒𝑥𝑝𝑟𝑖 ) on the instrument
The instrument is the set of all exogenous variables in our model (and not just the variable
we have replaced)
Using model 1 as an example, our instrument is simply a constant and settler mortality rates
𝑙𝑜𝑔𝑒𝑚4𝑖
Therefore, we will estimate the first-stage regression as
𝑎𝑣𝑒𝑥𝑝𝑟𝑖 = 𝛿0 + 𝛿1 𝑙𝑜𝑔𝑒𝑚4𝑖 + 𝑣𝑖
The data we need to estimate this equation is located in maketable4.dta (only complete
data, indicated by baseco = 1, is used for estimation)
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Second stage
We need to retrieve the predicted values of 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 using .predict()
We then replace the endogenous variable 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 with the predicted values $\widehat{avexpr}_i$ in the original linear model
Our second stage regression is thus

$$logpgp95_i = \beta_0 + \beta_1 \widehat{avexpr}_i + u_i$$
results_ss = sm.OLS(df4['logpgp95'],
df4[['const', 'predicted_avexpr']]).fit()
print(results_ss.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The second-stage regression results give us an unbiased and consistent estimate of the effect
of institutions on economic outcomes
The result suggests a stronger positive relationship than what the OLS results indicated
Note that while our parameter estimates are correct, our standard errors are not and for this
reason, computing 2SLS ‘manually’ (in stages with OLS) is not recommended
We can correctly estimate a 2SLS regression in one step using the linearmodels package, an
extension of statsmodels
Note that when using IV2SLS, the exogenous and instrument variables are split up in the
function arguments (whereas before the instrument included exogenous variables)
In [19]: iv = IV2SLS(dependent=df4['logpgp95'],
exog=df4['const'],
endog=df4['avexpr'],
instruments=df4['logem4']).fit(cov_type='unadjusted')
print(iv.summary)
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
const 1.9097 1.0106 1.8897 0.0588 -0.0710 3.8903
avexpr 0.9443 0.1541 6.1293 0.0000 0.6423 1.2462
==============================================================================
Endogenous: avexpr
Instruments: logem4
Unadjusted Covariance (Homoskedastic)
Debiased: False
Given that we now have consistent and unbiased estimates, we can infer from the model we have estimated that institutional differences (stemming from institutions set up during colonization) can help to explain differences in income levels across countries today
[3] use a marginal effect of 0.94 to calculate that the difference in the index between Chile and Nigeria (i.e. institutional quality) implies up to a 7-fold difference in income, emphasizing the significance of institutions in economic development
18.6 Summary
We have demonstrated basic OLS and 2SLS regression in statsmodels and linearmodels
If you are familiar with R, you may want to use the formula interface to statsmodels, or consider using rpy2 to call R from within Python
18.7 Exercises
18.7.1 Exercise 1
In the lecture, we think the original model suffers from endogeneity bias due to the likely effect income has on institutional development
Although endogeneity is often best identified by thinking about the data and model, we can
formally test for endogeneity using the Hausman test
We want to test for correlation between the endogenous variable, 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 , and the errors, 𝑢𝑖
First, we regress 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 on the instrument

$$avexpr_i = \pi_0 + \pi_1 logem4_i + \upsilon_i$$

Second, we retrieve the residuals 𝜐𝑖̂ and include them in the original equation, giving

$$logpgp95_i = \beta_0 + \beta_1 avexpr_i + \alpha \hat{\upsilon}_i + u_i$$
If 𝛼 is statistically significant (with a p-value < 0.05), then we reject the null hypothesis and
conclude that 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 is endogenous
Using the above information, estimate a Hausman test and interpret your results
18.7.2 Exercise 2
The OLS parameter 𝛽 can also be estimated using matrix algebra and numpy (you may need
to review the numpy lecture to complete this exercise)
The linear equation we want to estimate is (written in matrix form)
𝑦 = 𝑋𝛽 + 𝑢
To solve for the unknown parameter 𝛽, we want to minimize the sum of squared residuals
$$\min_{\hat{\beta}} \hat{u}'\hat{u}$$
Rearranging the first equation and substituting into the second equation, we can write

$$\min_{\hat{\beta}} (y - X\hat{\beta})'(y - X\hat{\beta})$$

Solving this optimization problem gives the solution for the 𝛽̂ coefficients
𝛽 ̂ = (𝑋 ′ 𝑋)−1 𝑋 ′ 𝑦
Using the above information, compute 𝛽 ̂ from model 1 using numpy - your results should be
the same as those in the statsmodels output from earlier in the lecture
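As a sketch under simulated data (so the true coefficients are known), the formula can be checked with numpy:

```python
import numpy as np

# Made-up data generated from known coefficients
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])          # constant plus one regressor
y = X @ np.array([4.6, 0.5]) + 0.1 * rng.normal(size=n)

# β_hat = (X'X)⁻¹ X'y, computed with a linear solve rather than an explicit inverse
β_hat = np.linalg.solve(X.T @ X, X.T @ y)
```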
18.8 Solutions
18.8.1 Exercise 1
In [20]: # Load in data
df4 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable4.d
print(reg2.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The output shows that the coefficient on the residuals is statistically significant, indicating
𝑎𝑣𝑒𝑥𝑝𝑟𝑖 is endogenous
18.8.2 Exercise 2
In [21]: # Load in data
df1 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable1.d
df1 = df1.dropna(subset=['logpgp95', 'avexpr'])
# Add a constant term and set up y and X
df1['const'] = 1
y = df1['logpgp95'].values
X = df1[['const', 'avexpr']].values

# Compute β_hat = (X'X)⁻¹X'y using a linear solve
β_hat = np.linalg.solve(X.T @ X, X.T @ y)
β_0 = 4.6
β_1 = 0.53
19
Maximum Likelihood Estimation
19.1 Contents
• Overview 19.2
• Summary 19.8
• Exercises 19.9
• Solutions 19.10
19.2 Overview
In a previous lecture, we estimated the relationship between dependent and explanatory variables using linear regression
But what if a linear relationship is not an appropriate assumption for our model?
One widely used alternative is maximum likelihood estimation, which involves specifying a
class of distributions, indexed by unknown parameters, and then using the data to pin down
these parameter values
The benefit relative to linear regression is that it allows more flexibility in the probabilistic
relationships between variables
Here we illustrate maximum likelihood by replicating Daniel Treisman's (2016) paper, Russia's Billionaires, which connects the number of billionaires in a country to its economic characteristics
The paper concludes that Russia has a higher number of billionaires than economic factors
such as market size and tax rate predict
296 19. MAXIMUM LIKELIHOOD ESTIMATION
19.2.1 Prerequisites
19.2.2 Comments
Let’s consider the steps we need to go through in maximum likelihood estimation and how
they pertain to this study
The first step with maximum likelihood estimation is to choose the probability distribution
believed to be generating the data
More precisely, we need to make an assumption as to which parametric class of distributions
is generating the data
• e.g., the class of all normal distributions, or the class of all gamma distributions
• e.g., the class of normal distributions is a family of distributions indexed by its mean
𝜇 ∈ (−∞, ∞) and standard deviation 𝜎 ∈ (0, ∞)
We’ll let the data pick out a particular element of the class by pinning down the parameters
The parameter estimates so produced will be called maximum likelihood estimates
$$f(y) = \frac{\mu^y}{y!} e^{-\mu}, \qquad y = 0, 1, 2, \ldots, \infty$$
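A direct translation of this pmf into code (a sketch of the poisson_pmf function referred to later in the lecture):

```python
import math
import numpy as np

def poisson_pmf(y, μ):
    "Probability of y occurrences under a Poisson distribution with mean μ"
    return np.exp(-μ) * μ**y / math.factorial(y)
```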
We can plot the Poisson distribution over 𝑦 for different values of 𝜇 as follows
19.3 Set Up and Assumptions
ax.grid()
ax.set_xlabel('$y$', fontsize=14)
ax.set_ylabel('$f(y \mid \mu)$', fontsize=14)
ax.axis(xmin=0, ymin=0)
ax.legend(fontsize=14)
plt.show()
Notice that the Poisson distribution begins to resemble a normal distribution as the mean of
𝑦 increases
Let’s have a look at the distribution of the data we’ll be working with in this lecture
Treisman’s main source of data is Forbes’ annual rankings of billionaires and their estimated
net worth
The dataset mle/fp.dta can be downloaded here or from its AER page
[5 rows x 36 columns]
Using a histogram, we can view the distribution of the number of billionaires per country,
numbil0, in 2008 (the United States is dropped for plotting purposes)
plt.subplots(figsize=(12, 8))
plt.hist(numbil0_2008, bins=30)
plt.xlim(left=0)
plt.grid()
plt.xlabel('Number of billionaires in 2008')
plt.ylabel('Count')
plt.show()
19.4 Conditional Distributions
From the histogram, it appears that the Poisson assumption is not unreasonable (albeit with
a very low 𝜇 and some outliers)
$$f(y_i \mid \mathbf{x}_i) = \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}; \qquad y_i = 0, 1, 2, \ldots, \infty \tag{1}$$
To illustrate the idea that the distribution of 𝑦𝑖 depends on x𝑖 let’s run a simple simulation
We use our poisson_pmf function from above and arbitrary values for 𝛽 and x𝑖
y_values = range(0, 20)

# Illustrative ("arbitrary") parameter vector and observations
β = np.array([0.25, 0.25, 0.25, 0.25])
datasets = [np.array([0, 1, 1, 1]),
            np.array([2, 3, 2, 4]),
            np.array([3, 4, 5, 3])]

fig, ax = plt.subplots(figsize=(12, 8))
for X in datasets:
    μ = np.exp(X @ β)
    distribution = [poisson_pmf(y_i, μ) for y_i in y_values]
    ax.plot(y_values,
            distribution,
            label=f'$\mu_i$={μ:.1}',
            marker='o',
            markersize=8,
            alpha=0.5)
ax.grid()
ax.legend()
ax.set_xlabel('$y \mid x_i$')
ax.set_ylabel(r'$f(y \mid x_i; \beta )$')
ax.axis(xmin=0, ymin=0)
plt.show()
In our model for number of billionaires, the conditional distribution contains 4 (𝑘 = 4) parameters that we need to estimate
We will label our entire parameter vector as 𝛽 where

        ⎡𝛽0⎤
    𝛽 = ⎢𝛽1⎥
        ⎢𝛽2⎥
        ⎣𝛽3⎦
To estimate the model using MLE, we want to maximize the likelihood that our estimate 𝛽̂ is
the true parameter 𝛽
Intuitively, we want to find the 𝛽̂ that best fits our data
First, we need to construct the likelihood function ℒ(𝛽), which is similar to a joint probability density function
Assume we have some data 𝑦 = {𝑦1, 𝑦2} where each observation 𝑦𝑖 ∼ 𝑓(𝑦𝑖)
If 𝑦1 and 𝑦2 are independent, the joint pmf of these data is 𝑓(𝑦1 , 𝑦2 ) = 𝑓(𝑦1 ) ⋅ 𝑓(𝑦2 )
If 𝑦𝑖 follows a Poisson distribution with 𝜆 = 7, we can visualize the joint pmf like so
plot_joint_poisson(μ=7, y_n=20)
Similarly, the joint pmf of our data (which is distributed as a conditional Poisson distribution) can be written as
𝑓(𝑦1, 𝑦2, … , 𝑦𝑛 ∣ x1, x2, … , x𝑛; 𝛽) = ∏_{𝑖=1}^{𝑛} (𝜇𝑖^{𝑦𝑖} / 𝑦𝑖!) 𝑒^{−𝜇𝑖}

The likelihood function is this same joint distribution, treated as a function of the parameter 𝛽 given the data:

ℒ(𝛽 ∣ 𝑦1, 𝑦2, … , 𝑦𝑛; x1, x2, … , x𝑛) = ∏_{𝑖=1}^{𝑛} (𝜇𝑖^{𝑦𝑖} / 𝑦𝑖!) 𝑒^{−𝜇𝑖}
                                      = 𝑓(𝑦1, 𝑦2, … , 𝑦𝑛 ∣ x1, x2, … , x𝑛; 𝛽)
Now that we have our likelihood function, we want to find the 𝛽̂ that yields the maximum
likelihood value
max_{𝛽} ℒ(𝛽)

Working with the log-likelihood is easier, and the MLE 𝛽̂ for the Poisson model can be obtained by solving

max_{𝛽} ( ∑_{𝑖=1}^{𝑛} 𝑦𝑖 log 𝜇𝑖 − ∑_{𝑖=1}^{𝑛} 𝜇𝑖 − ∑_{𝑖=1}^{𝑛} log 𝑦𝑖! )
However, no analytical solution exists to the above problem – to find the MLE we need to use
numerical methods
Many distributions do not have nice, analytical solutions and therefore require numerical
methods to solve for parameter estimates
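As a sanity check on numerical optimization, note that for an i.i.d. Poisson sample without regressors the MLE does have a closed form, 𝜇̂ = 𝑦̄, so a generic optimizer should recover the sample mean. A quick sketch (not part of the lecture code; data values illustrative) using scipy.optimize:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

# Negative log-likelihood of an i.i.d. Poisson(μ) sample
y = np.array([2, 1, 0, 3, 1, 2, 4, 1])

def neg_logL(μ):
    # gammaln(y + 1) = log(y!)
    return -np.sum(y * np.log(μ) - μ - gammaln(y + 1))

res = minimize_scalar(neg_logL, bounds=(0.01, 10), method='bounded')
print(res.x)  # close to y.mean() = 1.75, the known closed-form MLE
```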
One such numerical method is the Newton-Raphson algorithm
Our goal is to find the maximum likelihood estimate 𝛽̂
At 𝛽,̂ the first derivative of the log-likelihood function will be equal to 0
Let’s illustrate this by supposing
ax1.set_ylabel(r'$log \mathcal{L(\beta)}$',
rotation=0,
labelpad=35,
fontsize=15)
ax2.set_ylabel(r'$\frac{dlog \mathcal{L(\beta)}}{d \beta}$ ',
rotation=0,
labelpad=35,
fontsize=19)
ax2.set_xlabel(r'$\beta$', fontsize=15)
ax1.grid(), ax2.grid()
plt.axhline(c='black')
plt.show()
The plot shows that the maximum likelihood value (the top plot) occurs when 𝑑 log ℒ(𝛽)/𝑑𝛽 = 0 (the bottom plot)
Therefore, the likelihood is maximized when 𝛽 = 10
We can also ensure that this value is a maximum (as opposed to a minimum) by checking
that the second derivative (slope of the bottom plot) is negative
The Newton-Raphson algorithm finds a point where the first derivative is 0
To use the algorithm, we take an initial guess at the maximum value, 𝛽⁽⁰⁾ (the OLS parameter estimates might be a reasonable guess), then iterate the updating rule

𝛽⁽ᵏ⁺¹⁾ = 𝛽⁽ᵏ⁾ − 𝐻⁻¹(𝛽⁽ᵏ⁾) 𝐺(𝛽⁽ᵏ⁾)

where 𝐺(𝛽⁽ᵏ⁾) and 𝐻(𝛽⁽ᵏ⁾) are the gradient vector and Hessian matrix of the log-likelihood function, evaluated at 𝛽⁽ᵏ⁾

As can be seen from the updating equation, 𝛽⁽ᵏ⁺¹⁾ = 𝛽⁽ᵏ⁾ only when 𝐺(𝛽⁽ᵏ⁾) = 0, i.e., where the first derivative is equal to 0
(In practice, we stop iterating when the difference is below a small tolerance threshold)
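In one dimension the updating rule is easy to see in action. A toy sketch (separate from the class-based implementation below), applied to a quadratic log-likelihood whose maximum is at 𝛽 = 10 as in the plot above:

```python
def newton_raphson_1d(G, H, β0, tol=1e-8, max_iter=50):
    """Find a zero of the first derivative G using its derivative H."""
    β = β0
    for _ in range(max_iter):
        β_new = β - G(β) / H(β)
        if abs(β_new - β) < tol:
            return β_new
        β = β_new
    return β

# log L(β) = -(β - 10)**2 - 10, so G(β) = -2(β - 10) and H(β) = -2
β_hat = newton_raphson_1d(lambda β: -2 * (β - 10), lambda β: -2.0, β0=3.0)
print(β_hat)  # 10.0 — on a quadratic, Newton-Raphson converges in one step
```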
Let’s have a go at implementing the Newton-Raphson algorithm
First, we’ll create a class called PoissonRegression so we can easily recompute the values
of the log likelihood, gradient and Hessian for every iteration
from scipy.special import factorial

class PoissonRegression:

    def __init__(self, y, X, β):
        self.X = X
        self.n, self.k = X.shape
        self.y = y.reshape(self.n, 1)  # n_by_1 column vector
        self.β = β.reshape(self.k, 1)  # k_by_1 column vector

    def μ(self):
        return np.exp(self.X @ self.β)

    def logL(self):
        y = self.y
        μ = self.μ()
        return np.sum(y * np.log(μ) - μ - np.log(factorial(y)))

    def G(self):
        X = self.X
        y = self.y
        μ = self.μ()
        return X.T @ (y - μ)

    def H(self):
        X = self.X
        μ = self.μ()
        return -(X.T @ (μ * X))
Our function newton_raphson will take a PoissonRegression object that has an initial
guess of the parameter vector 𝛽 0
The algorithm will update the parameter vector according to the updating rule, and recalculate the gradient and Hessian matrices at the new parameter estimates
Iteration will end when either:
• The difference between the parameter and the updated parameter is below a tolerance
level
• The maximum number of iterations has been achieved (meaning convergence is not
achieved)
So we can get an idea of what's going on while the algorithm is running, an option display=True is added to print out values at each iteration
def newton_raphson(model, tol=1e-3, max_iter=1000, display=True):

    i = 0
    error = 100  # Initial error value

    # Iterate until error is below tolerance or max iterations reached
    while np.any(np.abs(error) > tol) and i < max_iter:
        H, G = model.H(), model.G()
        β_new = model.β - (np.linalg.inv(H) @ G)  # Newton-Raphson update
        error = β_new - model.β
        model.β = β_new

        # Print iterations
        if display:
            β_list = [f'{t:.3}' for t in list(model.β.flatten())]
            update = f'{i:<13}{model.logL():<16.8}{β_list}'
            print(update)

        i += 1

    print(f'Number of iterations: {i}')
    print(f'β_hat = {model.β.flatten()}')

    return model.β.flatten()  # Return a flat array for β (instead of a k_by_1 column vector)
Let’s try out our algorithm with a small dataset of 5 observations and 3 variables in X
y = np.array([1, 0, 1, 1, 0])
Iteration_k Log-likelihood θ
-----------------------------------------------------------------------------------------
0 -4.3447622 ['-1.49', '0.265', '0.244']
1 -3.5742413 ['-3.38', '0.528', '0.474']
2 -3.3999526 ['-5.06', '0.782', '0.702']
3 -3.3788646 ['-5.92', '0.909', '0.82']
4 -3.3783559 ['-6.07', '0.933', '0.843']
5 -3.3783555 ['-6.08', '0.933', '0.843']
Number of iterations: 6
β_hat = [-6.07848205 0.93340226 0.84329625]
As this was a simple model with few observations, the algorithm achieved convergence in only
6 iterations
You can see that with each iteration, the log-likelihood value increased
Remember, our objective was to maximize the log-likelihood function, which the algorithm
has worked to achieve
Also, note that the increase in log ℒ(𝛽 (𝑘) ) becomes smaller with each iteration
This is because the gradient is approaching 0 as we reach the maximum, and therefore the
numerator in our updating equation is becoming smaller
The gradient vector should be close to 0 at 𝛽̂
In [10]: poi.G()
Out[10]: array([[-3.95169228e-07],
[-1.00114805e-06],
[-7.73114562e-07]])
The iterative process can be visualized in the following diagram, where the maximum is found
at 𝛽 = 10
β = np.linspace(2, 18)
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(β, logL(β), lw=2, c='black')
Note that our implementation of the Newton-Raphson algorithm is rather basic — for more
robust implementations see, for example, scipy.optimize
Now that we know what's going on under the hood, we can apply MLE to an interesting application
We'll use the Poisson regression model in statsmodels to obtain a richer output with standard errors, test values, and more
statsmodels uses the same algorithm as above to find the maximum likelihood estimates
Before we begin, let’s re-estimate our simple model with statsmodels to confirm we obtain
the same coefficients and log-likelihood value
X = np.array([[1, 2, 5],
[1, 1, 3],
[1, 4, 2],
[1, 5, 2],
[1, 3, 1]])
y = np.array([1, 0, 1, 1, 0])
Now let's replicate results from Daniel Treisman's paper, Russia's Billionaires, mentioned earlier in the lecture
Treisman starts by estimating equation Eq. (1), where:
• 𝑦𝑖 is number of billionaires𝑖
• 𝑥𝑖1 is log GDP per capita𝑖
• 𝑥𝑖2 is log population𝑖
• 𝑥𝑖3 is years in GATT𝑖 – years membership in GATT and WTO (to proxy access to international markets)
# Add a constant
df['const'] = 1
# Variable sets
reg1 = ['const', 'lngdppc', 'lnpop', 'gattwto08']
reg2 = ['const', 'lngdppc', 'lnpop',
'gattwto08', 'lnmcap08', 'rintr', 'topint08']
reg3 = ['const', 'lngdppc', 'lnpop', 'gattwto08', 'lnmcap08',
'rintr', 'topint08', 'nrrents', 'roflaw']
Then we can use the Poisson function from statsmodels to fit the model
We’ll use robust standard errors as in the author’s paper
# Specify model
poisson_reg = sm.Poisson(df[['numbil0']], df[reg1],
missing='drop').fit(cov_type='HC0')
print(poisson_reg.summary())
results_table = summary_col(results=results,
float_format='%0.3f',
stars=True,
model_names=reg_names,
info_dict=info_dict,
regressor_order=regressor_order)
results_table.add_title('Table 1 - Explaining the Number of Billionaires in 2008')
print(results_table)
(0.010) (0.010)
topint08 -0.051***-0.058***
(0.011) (0.012)
nrrents -0.005
(0.010)
roflaw 0.203
(0.372)
Pseudo R-squared 0.86 0.90 0.90
No. observations 197 131 131
=================================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
The output suggests that the frequency of billionaires is positively correlated with GDP
per capita, population size, stock market capitalization, and negatively correlated with top
marginal income tax rate
To analyze our results by country, we can plot the difference between the predicted and actual values, then sort from highest to lowest and plot the first 15
# Calculate difference
results_df['difference'] = results_df['numbil0'] - results_df['prediction']
As we can see, Russia has by far the highest number of billionaires in excess of what is pre-
dicted by the model (around 50 more than expected)
Treisman uses this empirical result to discuss possible reasons for Russia’s excess of billion-
aires, including the origination of wealth in Russia, the political climate, and the history of
privatization in the years after the USSR
19.8 Summary
19.9 Exercises
19.9.1 Exercise 1
Suppose we wanted to estimate the probability of an event 𝑦𝑖 occurring, given some observations
𝑓(𝑦𝑖; 𝛽) = 𝜇𝑖^{𝑦𝑖} (1 − 𝜇𝑖)^{1−𝑦𝑖},    𝑦𝑖 = 0, 1
where 𝜇𝑖 = Φ(x′𝑖 𝛽)
Φ represents the cumulative normal distribution and constrains the predicted 𝑦𝑖 to be between 0 and 1 (as required for a probability)
𝛽 is a vector of coefficients
Following the example in the lecture, write a class to represent the Probit model
To begin, find the log-likelihood function and derive the gradient and Hessian
The scipy module stats.norm contains the functions needed to compute the cdf and pdf of the normal distribution
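As a quick check of those scipy functions (a sketch, not part of the exercise itself):

```python
from scipy.stats import norm

# Φ and φ from the Probit model: the standard normal cdf and pdf
print(norm.cdf(0.0))   # 0.5
print(norm.pdf(0.0))   # 1/√(2π) ≈ 0.3989
```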
19.9.2 Exercise 2
Use the following dataset and initial values of 𝛽 to estimate the MLE with the Newton-
Raphson algorithm developed earlier in the lecture
    ⎡1 2 4⎤        ⎡1⎤           ⎡0.1⎤
    ⎢1 1 1⎥        ⎢0⎥           ⎢0.1⎥
X = ⎢1 4 3⎥    𝑦 = ⎢1⎥    𝛽⁽⁰⁾ = ⎣0.1⎦
    ⎢1 5 6⎥        ⎢1⎥
    ⎣1 3 5⎦        ⎣0⎦
Verify your results with statsmodels - you can import the Probit function with the following import statement

from statsmodels.discrete.discrete_model import Probit
Note that the simple Newton-Raphson algorithm developed in this lecture is very sensitive to
initial values, and therefore you may fail to achieve convergence with different starting values
19.10 Solutions
19.10.1 Exercise 1
The log-likelihood function is

log ℒ = ∑_{𝑖=1}^{𝑛} [ 𝑦𝑖 log Φ(x𝑖′𝛽) + (1 − 𝑦𝑖) log(1 − Φ(x𝑖′𝛽)) ]

Using the fact that the derivative of the normal cdf is the normal pdf,

∂Φ(𝑠)/∂𝑠 = 𝜙(𝑠)

the gradient is

∂ log ℒ/∂𝛽 = ∑_{𝑖=1}^{𝑛} [ 𝑦𝑖 𝜙(x𝑖′𝛽)/Φ(x𝑖′𝛽) − (1 − 𝑦𝑖) 𝜙(x𝑖′𝛽)/(1 − Φ(x𝑖′𝛽)) ] x𝑖

and the Hessian is

∂² log ℒ/∂𝛽∂𝛽′ = − ∑_{𝑖=1}^{𝑛} 𝜙(x𝑖′𝛽) [ 𝑦𝑖 (𝜙(x𝑖′𝛽) + x𝑖′𝛽 Φ(x𝑖′𝛽))/Φ(x𝑖′𝛽)² + (1 − 𝑦𝑖) (𝜙(x𝑖′𝛽) − x𝑖′𝛽 (1 − Φ(x𝑖′𝛽)))/(1 − Φ(x𝑖′𝛽))² ] x𝑖 x𝑖′
Using these results, we can write a class for the Probit model as follows
from scipy.stats import norm

class ProbitRegression:

    def __init__(self, y, X, β):
        self.X, self.y, self.β = X, y, β
        self.n, self.k = X.shape

    def μ(self):
        return norm.cdf(self.X @ self.β.T)

    def ϕ(self):
        return norm.pdf(self.X @ self.β.T)

    def logL(self):
        y = self.y
        μ = self.μ()
        return np.sum(y * np.log(μ) + (1 - y) * np.log(1 - μ))

    def G(self):
        X, y = self.X, self.y
        μ = self.μ()
        ϕ = self.ϕ()
        return np.sum((X.T * y * ϕ / μ - X.T * (1 - y) * ϕ / (1 - μ)),
                      axis=1)

    def H(self):
        X, y = self.X, self.y
        β = self.β
        μ = self.μ()
        ϕ = self.ϕ()
        a = (ϕ + (X @ β.T) * μ) / μ**2
        b = (ϕ - (X @ β.T) * (1 - μ)) / (1 - μ)**2
        return -(ϕ * (y * a + (1 - y) * b) * X.T) @ X
19.10.2 Exercise 2
In [19]: X = np.array([[1, 2, 4],
                       [1, 1, 1],
                       [1, 4, 3],
                       [1, 5, 6],
                       [1, 3, 5]])

         y = np.array([1, 0, 1, 1, 0])

         # Take the initial guess at βs given in the exercise
         β = np.array([0.1, 0.1, 0.1])

         # Create instance of Probit regression class
         prob = ProbitRegression(y, X, β)

         # Run Newton-Raphson algorithm
         newton_raphson(prob)
Iteration_k Log-likelihood θ
-----------------------------------------------------------------------------------------
0 -2.3796884 ['-1.34', '0.775', '-0.157']
1 -2.3687526 ['-1.53', '0.775', '-0.0981']
2 -2.3687294 ['-1.55', '0.778', '-0.0971']
3 -2.3687294 ['-1.55', '0.778', '-0.0971']
Number of iterations: 4
β_hat = [-1.54625858 0.77778952 -0.09709757]
print(Probit(y, X).fit().summary())
20

Geometric Series for Elementary Economics
20.1 Contents
• Overview 20.2
• Key Formulas 20.3
• Example: The Money Multiplier in Fractional Reserve Banking 20.4
• Example: The Keynesian Multiplier 20.5
• Example: Interest Rates and Present Values 20.6
• Back to the Keynesian Multiplier 20.7
20.2 Overview
The lecture describes important ideas in economics that use the mathematics of geometric
series
Among these are

• the Keynesian multiplier
• the money multiplier that arises in fractional reserve banking
• interest rates and present values of streams of payments
(As we shall see below, the term multiplier comes down to meaning sum of a convergent
geometric series)
These and other applications prove the truth of the wisecrack that "in economics, a little knowledge of geometric series goes a long way"

20.3 Key Formulas

To start, let 𝑐 be a real number that lies strictly between −1 and 1

We want to evaluate the infinite geometric series

1 + 𝑐 + 𝑐² + 𝑐³ + ⋯

The key formula is

1 + 𝑐 + 𝑐² + 𝑐³ + ⋯ = 1/(1 − 𝑐)        (1)
To prove key formula Eq. (1), multiply both sides by (1 − 𝑐) and verify that if 𝑐 ∈ (−1, 1),
then the outcome is the equation 1 = 1
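Writing the verification out:

```latex
(1 - c)\,(1 + c + c^2 + c^3 + \cdots)
  = (1 + c + c^2 + \cdots) - (c + c^2 + c^3 + \cdots) = 1
\quad \Longrightarrow \quad
1 + c + c^2 + c^3 + \cdots = \frac{1}{1 - c}, \qquad c \in (-1, 1)
```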
We also want an expression for the finite geometric series

1 + 𝑐 + 𝑐² + 𝑐³ + ⋯ + 𝑐^𝑇

The key formula here is

1 + 𝑐 + 𝑐² + 𝑐³ + ⋯ + 𝑐^𝑇 = (1 − 𝑐^{𝑇+1})/(1 − 𝑐)
Remark: The above formula works for any scalar value of 𝑐 other than 1. We don't have to restrict 𝑐 to the set (−1, 1)
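Both formulas are easy to confirm numerically; a quick sketch with illustrative values:

```python
import numpy as np

c, T = 0.9, 25
powers = c ** np.arange(T + 1)

# Finite-sum formula: valid for any c ≠ 1
assert np.isclose(powers.sum(), (1 - c**(T + 1)) / (1 - c))

# Infinite-sum formula: requires |c| < 1; truncate at a large T
big_T = 10_000
print(sum(c**t for t in range(big_T)))  # ≈ 1 / (1 - c) = 10.0
```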
We now move on to describe some famous economic applications of geometric series

20.4 Example: The Money Multiplier in Fractional Reserve Banking

In a fractional reserve banking system, banks hold only a fraction 𝑟 ∈ (0, 1) of cash behind each deposit receipt that they issue
• In recent times
– cash consists of pieces of paper issued by the government and called dollars or
pounds or …
– a deposit is a balance in a checking or savings account that entitles the owner to
ask the bank for immediate payment in cash
• When the UK and France and the US were on either a gold or silver standard (before 1914, for example)
  – cash was a gold or silver coin
  – a deposit receipt was a bank note that the issuing bank promised to convert into gold or silver on demand
Economists and financiers often define the supply of money as an economy-wide sum of
cash plus deposits
In a fractional reserve banking system (one in which the reserve ratio 𝑟 satisfies 0 < 𝑟 < 1), banks create money by issuing deposits backed by fractional reserves plus loans that they make to their customers
A geometric series is a key tool for understanding how banks create money (i.e., deposits) in
a fractional reserve system
The geometric series formula Eq. (1) is at the heart of the classic model of the money creation process – one that leads us to the celebrated money multiplier
The model assumes a set of banks indexed by 𝑖 = 0, 1, 2, …, with bank 𝑖's loans 𝐿𝑖, reserves 𝑅𝑖, and deposits 𝐷𝑖 satisfying the balance sheet equation

𝐿𝑖 + 𝑅𝑖 = 𝐷𝑖
The left side of the above equation is the sum of the bank’s assets, namely, the loans 𝐿𝑖 it
has outstanding plus its reserves of cash 𝑅𝑖
The right side records bank 𝑖’s liabilities, namely, the deposits 𝐷𝑖 held by its depositors; these
are IOU’s from the bank to its depositors in the form of either checking accounts or savings
accounts (or before 1914, bank notes issued by a bank stating promises to redeem note for
gold or silver on demand)
Each bank 𝑖 sets its reserves to satisfy the equation
𝑅𝑖 = 𝑟𝐷𝑖 (2)
• the reserve ratio is either set by a government or chosen by banks for precautionary reasons
Next we add a theory stating that bank 𝑖 + 1’s deposits depend entirely on loans made by
bank 𝑖, namely
𝐷𝑖+1 = 𝐿𝑖 (3)
Thus, we can think of the banks as being arranged along a line with loans from bank 𝑖 being
immediately deposited in 𝑖 + 1
Finally, we add an initial condition about an exogenous level of bank 0’s deposits
𝐷0 is given exogenously
We can think of 𝐷0 as being the amount of cash that a first depositor put into the first bank
in the system, bank number 𝑖 = 0
Now we do a little algebra
Combining equations Eq. (2) and Eq. (3) tells us that
𝐿𝑖 = (1 − 𝑟)𝐷𝑖 (4)
This states that bank 𝑖 loans a fraction (1 − 𝑟) of its deposits and keeps a fraction 𝑟 as cash
reserves
Combining equation Eq. (4) with equation Eq. (3) tells us that

𝐷𝑖+1 = (1 − 𝑟)𝐷𝑖

which implies that

𝐷𝑖 = (1 − 𝑟)^𝑖 𝐷0        (5)
Equation Eq. (5) expresses 𝐷𝑖 as the 𝑖 th term in the product of 𝐷0 and the geometric series
1, (1 − 𝑟), (1 − 𝑟)2 , ⋯
The total quantity of deposits in the banking system is therefore

∑_{𝑖=0}^{∞} (1 − 𝑟)^𝑖 𝐷0 = 𝐷0 / (1 − (1 − 𝑟)) = 𝐷0 / 𝑟        (6)
The money multiplier is a number that tells the multiplicative factor by which an exogenous injection of cash into bank 0 leads to an increase in the total deposits in the banking system
Equation Eq. (6) asserts that the money multiplier is 1/𝑟
• an initial deposit of cash of 𝐷0 in bank 0 leads the banking system to create total deposits of 𝐷0/𝑟
• The initial deposit 𝐷0 is held as reserves, distributed throughout the banking system according to 𝐷0 = ∑_{𝑖=0}^{∞} 𝑅𝑖
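The mechanics above are easy to simulate; a minimal sketch (reserve ratio and initial deposit are illustrative):

```python
import numpy as np

r = 0.1      # reserve ratio (illustrative)
D0 = 100.0   # exogenous initial deposit (illustrative)

# Deposits at each bank along the line: D_i = (1 - r)**i * D0
D = (1 - r) ** np.arange(500) * D0

print(D.sum())        # ≈ D0 / r = 1000, the money multiplier at work
print((r * D).sum())  # reserves across banks sum back to ≈ D0 = 100
```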
20.5 Example: The Keynesian Multiplier

The famous economist John Maynard Keynes and his followers created a simple model intended to determine national income 𝑦 in circumstances in which
• there are substantial unemployed resources, in particular excess supply of labor and
capital
• prices and interest rates fail to adjust to make aggregate supply equal demand (e.g.,
prices and interest rates are frozen)
• national income is entirely determined by aggregate demand
The first equation is a national income identity asserting that consumption 𝑐 plus investment 𝑖 equals national income 𝑦:

𝑐 + 𝑖 = 𝑦
The second equation is a Keynesian consumption function asserting that people consume a
fraction 𝑏 ∈ (0, 1) of their income:
𝑐 = 𝑏𝑦
Combining the two equations and solving for 𝑦 gives

𝑦 = (1/(1 − 𝑏)) 𝑖
The quantity 1/(1 − 𝑏) is called the investment multiplier or simply the multiplier
Applying the formula for the sum of an infinite geometric series, we can write the above equation as
𝑦 = 𝑖 ∑_{𝑡=0}^{∞} 𝑏^𝑡 = (1/(1 − 𝑏)) 𝑖
The expression ∑_{𝑡=0}^{∞} 𝑏^𝑡 motivates an interpretation of the multiplier as the outcome of a dynamic process that we describe next
We arrive at a dynamic version by interpreting the nonnegative integer 𝑡 as indexing time and
changing our specification of the consumption function to take time into account
𝑐𝑡 = 𝑏𝑦𝑡−1
so that 𝑏 is the marginal propensity to consume (now) out of last period’s income
We begin with an initial condition stating that

𝑦−1 = 0

We also assume that investment is constant over time, so that

𝑖𝑡 = 𝑖  for all 𝑡 ≥ 0

It follows that output at time 0 is
𝑦0 = 𝑖 + 𝑐0 = 𝑖 + 𝑏𝑦−1 = 𝑖
and
𝑦1 = 𝑐1 + 𝑖 = 𝑏𝑦0 + 𝑖 = (1 + 𝑏)𝑖
and
𝑦2 = 𝑐2 + 𝑖 = 𝑏𝑦1 + 𝑖 = (1 + 𝑏 + 𝑏2 )𝑖
More generally,

𝑦𝑡 = 𝑏𝑦𝑡−1 + 𝑖 = (1 + 𝑏 + 𝑏² + ⋯ + 𝑏^𝑡)𝑖
or
𝑦𝑡 = ((1 − 𝑏^{𝑡+1})/(1 − 𝑏)) 𝑖
Evidently, as 𝑡 → +∞,
𝑦𝑡 → (1/(1 − 𝑏)) 𝑖
Remark 1: The above formula is often applied to assert that an exogenous increase in investment of Δ𝑖 at time 0 ignites a dynamic process of increases in national income by successive amounts

Δ𝑖, (1 + 𝑏)Δ𝑖, (1 + 𝑏 + 𝑏²)Δ𝑖, ⋯

at times 0, 1, 2, …
Remark 2: Let 𝑔𝑡 be an exogenous sequence of government expenditures
If we generalize the model so that the national income identity becomes
𝑐𝑡 + 𝑖 𝑡 + 𝑔 𝑡 = 𝑦 𝑡
then a version of the preceding argument shows that the government expenditures multiplier is also 1/(1 − 𝑏), so that a permanent increase in government expenditures ultimately leads to an increase in national income equal to the multiplier times the increase in government expenditures
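The multiplier limit can be confirmed by brute-force iteration; a sketch with illustrative values of 𝑏 and 𝑖:

```python
b, i = 2/3, 0.3   # illustrative propensity to consume and fixed investment
y = 0.0           # initial condition y_{-1} = 0

# Iterate y_t = b * y_{t-1} + i until the geometric terms die out
for t in range(200):
    y = b * y + i

print(y)  # ≈ i / (1 - b) = 0.9
```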
20.6 Example: Interest Rates and Present Values

We can apply our formula for geometric series to study how interest rates affect values of streams of dollar payments that extend over time
We work in discrete time and assume that 𝑡 = 0, 1, 2, … indexes time
We let 𝑟 ∈ (0, 1) be a one-period net nominal interest rate, and let

𝑅 = 1 + 𝑟 ∈ (1, 2)

be the corresponding gross nominal interest rate
Remark: The gross nominal interest rate 𝑅 is an exchange rate or relative price of dollars between times 𝑡 and 𝑡 + 1. The units of 𝑅 are dollars at time 𝑡 + 1 per dollar at time 𝑡
When people borrow and lend, they trade dollars now for dollars later or dollars later for dollars now
The price at which these exchanges occur is the gross nominal interest rate
We assume that the net nominal interest rate 𝑟 is fixed over time, so that 𝑅 is the gross nominal interest rate at times 𝑡 = 0, 1, 2, …
Two important geometric sequences are
1, 𝑅, 𝑅2 , ⋯ (7)
and

1, 𝑅⁻¹, 𝑅⁻², ⋯        (8)
Sequence Eq. (7) tells us how dollar values of an investment accumulate through time
Sequence Eq. (8) tells us how to discount future dollars to get their values in terms of today's dollars
20.6.1 Accumulation
Geometric sequence Eq. (7) tells us how one dollar invested and re-invested in a project with
gross one period nominal rate of return accumulates
• here we assume that net interest payments are reinvested in the project
• thus, 1 dollar invested at time 0 pays interest 𝑟 dollars after one period, so we have 𝑟 + 1 = 𝑅 dollars at time 1
• at time 1 we reinvest 1 + 𝑟 = 𝑅 dollars and receive interest of 𝑟𝑅 dollars at time 2 plus
the principal 𝑅 dollars, so we receive 𝑟𝑅 + 𝑅 = (1 + 𝑟)𝑅 = 𝑅2 dollars at the end of
period 2
• and so on
Evidently, if we invest 𝑥 dollars at time 0 and reinvest the proceeds, then the sequence

𝑥, 𝑥𝑅, 𝑥𝑅², ⋯

tells how our account accumulates at dates 𝑡 = 0, 1, 2, …
20.6.2 Discounting
Geometric sequence Eq. (8) tells us how much future dollars are worth in terms of today’s
dollars
Remember that the units of 𝑅 are dollars at 𝑡 + 1 per dollar at 𝑡
It follows that the value today of one dollar to be received at time 𝑡 + 𝑗 is 𝑅^{−𝑗}
So if someone has a claim on 𝑥 dollars at time 𝑡 + 𝑗, it is worth 𝑥𝑅−𝑗 dollars at time 𝑡 (e.g.,
today)
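As a small numeric illustration of discounting (values illustrative):

```python
r = 0.05
R = 1 + r

# A claim on 100 dollars 10 periods from now, valued today
x, j = 100.0, 10
pv = x * R ** (-j)
print(pv)  # ≈ 61.39
```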
Now consider pricing an infinite lease that entitles its owner to a stream of payments 𝑥𝑡 at times 𝑡 = 0, 1, 2, …, where the payments grow at the gross rate 𝐺 = 1 + 𝑔, so that

𝑥𝑡 = 𝐺^𝑡 𝑥0

The present value of the lease is

𝑝0 = 𝑥0 + 𝑥1/𝑅 + 𝑥2/𝑅² + ⋯
   = 𝑥0 (1 + 𝐺𝑅⁻¹ + 𝐺²𝑅⁻² + ⋯)
   = 𝑥0 · 1/(1 − 𝐺𝑅⁻¹)

where the last line uses the formula for an infinite geometric series
where the last line uses the formula for an infinite geometric series
Recall that 𝑅 = 1 + 𝑟 and 𝐺 = 1 + 𝑔 and that 𝑅 > 𝐺 and 𝑟 > 𝑔 and that 𝑟 and 𝑔 are typically small numbers, e.g., .05 or .03
Use the Taylor series of 1/(1 + 𝑟) about 𝑟 = 0, namely,

1/(1 + 𝑟) = 1 − 𝑟 + 𝑟² − 𝑟³ + ⋯

and the fact that 𝑟 is small to approximate 1/(1 + 𝑟) ≈ 1 − 𝑟
Use this approximation to write 𝑝0 as
𝑝0 = 𝑥0 · 1/(1 − 𝐺𝑅⁻¹)
   = 𝑥0 · 1/(1 − (1 + 𝑔)(1 − 𝑟))
   = 𝑥0 · 1/(1 − (1 + 𝑔 − 𝑟 − 𝑟𝑔))
   ≈ 𝑥0 · 1/(𝑟 − 𝑔)
The formula

𝑝0 = 𝑥0/(𝑟 − 𝑔)

is known as the Gordon formula for the present value or current price of an infinite payment stream 𝑥0 𝐺^𝑡 when the nominal one-period interest rate is 𝑟 and when 𝑟 > 𝑔
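We can compare the Gordon approximation with the exact geometric sum; a sketch (parameter values illustrative):

```python
x0, r, g = 1.0, 0.05, 0.03
G, R = 1 + g, 1 + r

exact = x0 / (1 - G / R)   # exact infinite-lease value
gordon = x0 / (r - g)      # Gordon approximation

# A long truncated sum should agree with the exact value
ratio = G / R
approx_sum = x0 * sum(ratio**t for t in range(10_000))
print(exact, gordon)  # 52.5 vs 50.0 — close when r and g are small
```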
We can also extend the asset pricing formula so that it applies to finite leases
Let the payment stream on the lease now be 𝑥𝑡 for 𝑡 = 1, 2, … , 𝑇 , where again
𝑥𝑡 = 𝐺𝑡 𝑥0
𝑝0 = 𝑥0 + 𝑥1/𝑅 + ⋯ + 𝑥𝑇/𝑅^𝑇
   = 𝑥0 (1 + 𝐺𝑅⁻¹ + ⋯ + 𝐺^𝑇 𝑅^{−𝑇})
   = 𝑥0 (1 − 𝐺^{𝑇+1} 𝑅^{−(𝑇+1)}) / (1 − 𝐺𝑅⁻¹)
Applying the Taylor series of 1/(1 + 𝑟)^{𝑇+1} about 𝑟 = 0 gives

1/(1 + 𝑟)^{𝑇+1} = 1 − 𝑟(𝑇 + 1) + (1/2) 𝑟²(𝑇 + 1)(𝑇 + 2) + ⋯ ≈ 1 − 𝑟(𝑇 + 1)
Expanding:
We could have also approximated by removing the second term 𝑟𝑔𝑥0(𝑇 + 1) when 𝑇 is relatively small compared to 1/(𝑟𝑔) to get 𝑥0(𝑇 + 1) as in the finite stream approximation
We will plot the true finite stream present-value and the two approximations, under different values of 𝑇, 𝑔 and 𝑟, in Python
First we plot the true finite stream present-value after computing it below
# True present value of a finite lease
def finite_lease_pv(T, g, r, x_0):
    G = (1 + g)
    R = (1 + r)
    return (x_0 * (1 - G**(T + 1) * R**(-T - 1))) / (1 - G * R**(-1))

# (The two approximation functions finite_lease_pv_approx_f and
# finite_lease_pv_approx_s, used below, follow from the Taylor
# approximations derived above)

# Infinite lease
def infinite_lease(g, r, x_0):
    G = (1 + g)
    R = (1 + r)
    return x_0 / (1 - G * R**(-1))
Now that we have test run our functions, we can plot some outcomes
First we study the quality of our approximations
In [3]: g = 0.02
r = 0.03
x_0 = 1
T_max = 50
T = np.arange(0, T_max+1)
fig, ax = plt.subplots()
ax.set_title('Finite Lease Present Value $T$ Periods Ahead')
y_1 = finite_lease_pv(T, g, r, x_0)
y_2 = finite_lease_pv_approx_f(T, g, r, x_0)
y_3 = finite_lease_pv_approx_s(T, g, r, x_0)
ax.plot(T, y_1, label='True T-period Lease PV')
ax.plot(T, y_2, label='T-period Lease First-order Approx.')
ax.plot(T, y_3, label='T-period Lease First-order Approx. adj.')
ax.legend()
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
plt.show()
The above graph shows how, as duration 𝑇 → +∞, the value of a lease of duration 𝑇 approaches the value of a perpetual lease
Now we consider two different views of what happens as 𝑟 and 𝑔 covary
fig, ax = plt.subplots()
# r ~ g, not defined when r = g, but approximately goes to straight line with slope 1
r = 0.4001
g = 0.4
ax.plot(finite_lease_pv(T, g, r, x_0), label=r'$r \approx g$', color='orange')
# r < g
r = 0.4
g = 0.5
ax.plot(finite_lease_pv(T, g, r, x_0), label='$r<g$', color='red')
ax.legend()
plt.show()
The above graph gives a big hint for why the condition 𝑟 > 𝑔 is necessary if a lease of length 𝑇 = +∞ is to have finite value
For fans of 3-d graphs the same point comes through in the following graph
If you aren’t enamored of 3-d graphs, feel free to skip the next visualization!
rr, gg = np.meshgrid(r, g)
z = finite_lease_pv(T, gg, rr, x_0)
We can use a little calculus to study how the present value 𝑝0 of a lease varies with 𝑟 and 𝑔
We will use a library called SymPy
SymPy enables us to do symbolic math calculations including computing derivatives of algebraic equations
We will illustrate how it works by creating a symbolic expression that represents our present
value formula for an infinite lease
After that, we’ll use SymPy to compute derivatives
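A sketch of what the SymPy setup might look like (the lecture's own input cell is not reproduced here, so the variable names are assumptions):

```python
import sympy as sym

g, r, x0 = sym.symbols('g, r, x0')

# Present value of an infinite lease, p0 = x0 / (1 - G R^{-1})
p0 = x0 / (1 - (1 + g) * (1 + r) ** (-1))

# Derivatives with respect to g and r
dp_dg = sym.diff(p0, g)
dp_dr = sym.diff(p0, r)
print(dp_dg)
print(dp_dr)
```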
Out[7]:

𝑥0 / (1 − (𝑔 + 1)/(𝑟 + 1))

dp0 / dg is:

Out[8]:

𝑥0 / ((𝑟 + 1)(1 − (𝑔 + 1)/(𝑟 + 1))²)

dp0 / dr is:

Out[9]:

𝑥0 (−𝑔 − 1) / ((𝑟 + 1)²(1 − (𝑔 + 1)/(𝑟 + 1))²)
We can see that 𝑑𝑝0/𝑑𝑔 > 0, since its numerator and denominator will always be positive, and that 𝑑𝑝0/𝑑𝑟 < 0, since its numerator −𝑔 − 1 is negative while its denominator will always be positive
20.7 Back to the Keynesian Multiplier

We will now go back to the case of the Keynesian multiplier and plot the time path of 𝑦𝑡, given that consumption is a constant fraction of national income, and investment is fixed
# Function that calculates a path of y
def calculate_y(i, b, g, T, y_init):
    y = np.zeros(T + 1)
    y[0] = i + b * y_init + g
    for t in range(1, T + 1):
        y[t] = b * y[t-1] + i + g
    return y

# Initial values
i_0 = 0.3
g_0 = 0.3
# 2/3 of income goes towards consumption
b = 2/3
y_init = 0
T = 100
fig, ax = plt.subplots()
ax.set_title('Path of Aggregate Output Over Time')
ax.set_xlabel('$t$')
ax.set_ylabel('$y_t$')
ax.plot(np.arange(0, T+1), calculate_y(i_0, b, g_0, T, y_init))
# Output predicted by geometric series
ax.hlines(i_0 / (1 - b) + g_0 / (1 - b), xmin=-1, xmax=101, linestyles='--')
plt.show()
In this model, income grows over time, until it gradually converges to the infinite geometric
series sum of income
We now examine what will happen if we vary the so-called marginal propensity to consume, i.e., the fraction of income that is consumed
# A range of propensities to consume (illustrative values)
b_0 = 1/3
b_1 = 2/3
b_2 = 5/6
b_3 = 0.9

fig, ax = plt.subplots()
ax.set_title('Changing Consumption as a Fraction of Income')
ax.set_ylabel('$y_t$')
ax.set_xlabel('$t$')
x = np.arange(0, T+1)
for b in (b_0, b_1, b_2, b_3):
y = calculate_y(i_0, b, g_0, T, y_init)
ax.plot(x, y, label=r'$b=$'+f"{b:.2f}")
ax.legend()
plt.show()
Increasing the marginal propensity to consume 𝑏 increases the path of output over time
Notice here, whether government spending increases from 0.3 to 0.4 or investment increases
from 0.3 to 0.4, the shifts in the graphs are identical
21
Linear Algebra
21.1 Contents
• Overview 21.2
• Vectors 21.3
• Matrices 21.4
• Exercises 21.8
• Solutions 21.9
21.2 Overview
Linear algebra is one of the most useful branches of applied mathematics for economists to
invest in
For example, many applied problems in economics and finance require the solution of a linear
system of equations, such as
𝑦1 = 𝑎𝑥1 + 𝑏𝑥2
𝑦2 = 𝑐𝑥1 + 𝑑𝑥2

or, more generally,

𝑦1 = 𝑎11𝑥1 + 𝑎12𝑥2 + ⋯ + 𝑎1𝑘𝑥𝑘
 ⋮
𝑦𝑛 = 𝑎𝑛1𝑥1 + 𝑎𝑛2𝑥2 + ⋯ + 𝑎𝑛𝑘𝑥𝑘

The objective here is to solve for the "unknowns" 𝑥1, … , 𝑥𝑘 given 𝑎11, … , 𝑎𝑛𝑘 and 𝑦1, … , 𝑦𝑛
When considering such problems, it is essential that we first consider at least some of the following questions

• Does a solution actually exist?
• Are there in fact many solutions, and if so how should we interpret them?
• If no solution exists, is there a best "approximate" solution?
• If a solution exists, how should we compute it?
21.3 Vectors
A vector of length 𝑛 is just a sequence (or array, or tuple) of 𝑛 numbers, which we write as
𝑥 = (𝑥1 , … , 𝑥𝑛 ) or 𝑥 = [𝑥1 , … , 𝑥𝑛 ]
We will write these sequences either horizontally or vertically as we please
(Later, when we wish to perform certain matrix operations, it will become necessary to distinguish between the two)
The set of all 𝑛-vectors is denoted by R𝑛
For example, R2 is the plane, and a vector in R2 is just a point in the plane
Traditionally, vectors are represented visually as arrows from the origin to the point
The following figure represents three vectors in this manner
The two most common operators for vectors are addition and scalar multiplication, which we
now describe
As a matter of definition, when we add two vectors, we add them element-by-element
        ⎡𝑥1⎤   ⎡𝑦1⎤    ⎡𝑥1 + 𝑦1⎤
𝑥 + 𝑦 = ⎢𝑥2⎥ + ⎢𝑦2⎥ ∶= ⎢𝑥2 + 𝑦2⎥
        ⎢ ⋮ ⎥   ⎢ ⋮ ⎥   ⎢   ⋮   ⎥
        ⎣𝑥𝑛⎦   ⎣𝑦𝑛⎦    ⎣𝑥𝑛 + 𝑦𝑛⎦
Scalar multiplication is an operation that takes a number 𝛾 and a vector 𝑥 and produces

       ⎡𝛾𝑥1⎤
𝛾𝑥 ∶= ⎢𝛾𝑥2⎥
       ⎢ ⋮ ⎥
       ⎣𝛾𝑥𝑛⎦
fig, ax = plt.subplots(figsize=(10, 8))
x = (2, 2)           # an illustrative vector
scalars = (-2, 2)
x = np.array(x)
for s in scalars:
v = s * x
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='red',
shrink=0,
alpha=0.5,
width=0.5))
ax.text(v[0] + 0.4, v[1] - 0.2, f'${s} x$', fontsize='16')
plt.show()
In Python, a vector can be represented as a list or tuple, such as x = (2, 4, 6), but is
more commonly represented as a NumPy array
One advantage of NumPy arrays is that scalar multiplication and addition have very natural
syntax
In [4]: 4 * x
The inner product of vectors 𝑥, 𝑦 ∈ R𝑛 is defined as

𝑥′𝑦 ∶= ∑_{𝑖=1}^{𝑛} 𝑥𝑖𝑦𝑖

The norm of a vector 𝑥 represents its "length" (i.e., its distance from the zero vector) and is defined as

‖𝑥‖ ∶= √(𝑥′𝑥) ∶= ( ∑_{𝑖=1}^{𝑛} 𝑥𝑖² )^{1/2}
In [5]: x = np.ones(3)            # Vector of three ones
        y = np.array((2, 4, 6))   # Converts tuple (2, 4, 6) into an array
        x @ y                     # Inner product of x and y

Out[5]: 12.0

In [6]: np.sqrt(np.sum(x**2))     # Norm of x, method one

Out[6]: 1.7320508075688772

In [7]: np.linalg.norm(x)         # Norm of x, method two

Out[7]: 1.7320508075688772
21.3.3 Span
Given a set of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in R𝑛 , it’s natural to think about the new vectors we
can create by performing linear operations
New vectors created in this manner are called linear combinations of 𝐴
In particular, 𝑦 ∈ R𝑛 is a linear combination of 𝐴 ∶= {𝑎1, … , 𝑎𝑘} if

𝑦 = 𝛽1𝑎1 + ⋯ + 𝛽𝑘𝑎𝑘 for some scalars 𝛽1, … , 𝛽𝑘
In this context, the values 𝛽1 , … , 𝛽𝑘 are called the coefficients of the linear combination
The set of linear combinations of 𝐴 is called the span of 𝐴
The next figure shows the span of 𝐴 = {𝑎1 , 𝑎2 } in R3
The span is a two-dimensional plane passing through these two points and the origin
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(projection='3d')

x_min, x_max = -5, 5                   # Axis limits (illustrative)
x_coords, y_coords = (3, 3), (4, -4)   # Coordinates of a1, a2 (illustrative)

α, β = 0.2, 0.1
f = lambda x, y: α * x + β * y         # Height of the plane at (x, y)

gs = 3
z = np.linspace(x_min, x_max, gs)
x = np.zeros(gs)
y = np.zeros(gs)
ax.plot(x, y, z, 'k-', lw=2, alpha=0.5)
ax.plot(z, x, y, 'k-', lw=2, alpha=0.5)
ax.plot(y, z, x, 'k-', lw=2, alpha=0.5)
# Lines to vectors
for i in (0, 1):
x = (0, x_coords[i])
y = (0, y_coords[i])
z = (0, f(x_coords[i], y_coords[i]))
ax.plot(x, y, z, 'b-', lw=1.5, alpha=0.6)
Examples
If 𝐴 contains only one vector 𝑎1 ∈ R2 , then its span is just the scalar multiples of 𝑎1 , which is
the unique line passing through both 𝑎1 and the origin
If 𝐴 = {𝑒1, 𝑒2, 𝑒3} consists of the canonical basis vectors of R3, that is

      ⎡1⎤         ⎡0⎤         ⎡0⎤
𝑒1 ∶= ⎢0⎥,  𝑒2 ∶= ⎢1⎥,  𝑒3 ∶= ⎢0⎥
      ⎣0⎦         ⎣0⎦         ⎣1⎦
then the span of 𝐴 is all of R3 , because, for any 𝑥 = (𝑥1 , 𝑥2 , 𝑥3 ) ∈ R3 , we can write
𝑥 = 𝑥 1 𝑒1 + 𝑥 2 𝑒2 + 𝑥 3 𝑒3
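This decomposition is easy to verify numerically; a small sketch:

```python
import numpy as np

e1, e2, e3 = np.identity(3)       # rows are the canonical basis vectors
x = np.array([2.0, -1.0, 7.0])    # an arbitrary illustrative vector
x1, x2, x3 = x

print(np.allclose(x1 * e1 + x2 * e2 + x3 * e3, x))  # True
```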
As we’ll see, it’s often desirable to find families of vectors with relatively large span, so that
many vectors can be described by linear operators on a few vectors
The condition we need for a set of vectors to have a large span is what's called linear independence
In particular, a collection of vectors 𝐴 ∶= {𝑎1, … , 𝑎𝑘} in R𝑛 is said to be

• linearly dependent if some strict subset of 𝐴 has the same span as 𝐴
• linearly independent if it is not linearly dependent
Put differently, a set of vectors is linearly independent if no vector is redundant to the span
and linearly dependent otherwise
To illustrate the idea, recall the figure that showed the span of vectors {𝑎1 , 𝑎2 } in R3 as a
plane through the origin
If we take a third vector 𝑎3 and form the set {𝑎1, 𝑎2, 𝑎3}, this set will be

• linearly dependent if 𝑎3 lies in the plane
• linearly independent otherwise
As another illustration of the concept, since R𝑛 can be spanned by 𝑛 vectors (see the discussion of canonical basis vectors above), any collection of 𝑚 > 𝑛 vectors in R𝑛 must be linearly dependent
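One way to see this numerically is via the matrix rank; a sketch with four illustrative vectors in R3:

```python
import numpy as np

# Four vectors in R^3 stacked as rows — more vectors than dimensions
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# Rank can be at most 3, so the four rows cannot all be independent
print(np.linalg.matrix_rank(A))  # 3
```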
The following statements are equivalent to linear independence of 𝐴 ∶= {𝑎1, … , 𝑎𝑘} ⊂ R𝑛

1. No vector in 𝐴 can be formed as a linear combination of the other elements
2. If 𝛽1𝑎1 + ⋯ + 𝛽𝑘𝑎𝑘 = 0 for scalars 𝛽1, … , 𝛽𝑘, then 𝛽1 = ⋯ = 𝛽𝑘 = 0
Another nice thing about sets of linearly independent vectors is that each element in the span
has a unique representation as a linear combination of these vectors
In other words, if 𝐴 ∶= {𝑎1, … , 𝑎𝑘} ⊂ R𝑛 is linearly independent and

𝑦 = 𝛽1𝑎1 + ⋯ + 𝛽𝑘𝑎𝑘

then no other coefficient sequence 𝛾1, … , 𝛾𝑘 will produce the same vector 𝑦
21.4 Matrices
Matrices are a neat way of organizing data for use in linear operations
Just as was the case for vectors, a number of algebraic operations are defined for matrices
Scalar multiplication and addition are immediate generalizations of the vector case: 𝛾𝐴 multiplies every element of 𝐴 by 𝛾, and 𝐴 + 𝐵 adds corresponding elements (so 𝐴 and 𝐵 must have the same shape)
Note
𝐴𝐵 and 𝐵𝐴 are not generally the same thing
NumPy arrays are also used as matrices, and have fast, efficient functions and methods for all
the standard matrix operations [1]
You can create them manually from tuples of tuples (or lists of lists) as follows

In [9]: A = ((1, 2),
             (3, 4))

        type(A)
Out[9]: tuple
In [10]: A = np.array(A)
type(A)
Out[10]: numpy.ndarray
In [11]: A.shape
Out[11]: (2, 2)
The shape attribute is a tuple giving the number of rows and columns — see here for more
discussion
To get the transpose of A, use A.transpose() or, more simply, A.T
There are many convenient functions for creating common matrices (matrices of zeros, ones,
etc.) — see here
Since operations are performed elementwise by default, scalar multiplication and addition
have very natural syntax
In [12]: A = np.identity(3)
B = np.ones((3, 3))
2 * A
In [13]: A + B
Each 𝑛 × 𝑘 matrix 𝐴 can be identified with a function 𝑓(𝑥) = 𝐴𝑥 that maps 𝑥 ∈ R𝑘 into
𝑦 = 𝐴𝑥 ∈ R𝑛
These kinds of functions have a special property: they are linear
A function 𝑓 ∶ R𝑘 → R𝑛 is called linear if, for all 𝑥, 𝑦 ∈ R𝑘 and all scalars 𝛼, 𝛽, we have

𝑓(𝛼𝑥 + 𝛽𝑦) = 𝛼𝑓(𝑥) + 𝛽𝑓(𝑦)
You can check that this holds for the function 𝑓(𝑥) = 𝐴𝑥 + 𝑏 when 𝑏 is the zero vector and
fails when 𝑏 is nonzero
In fact, it’s known that 𝑓 is linear if and only if there exists a matrix 𝐴 such that 𝑓(𝑥) = 𝐴𝑥
for all 𝑥
𝑦 = 𝐴𝑥 (3)
The problem we face is to determine a vector 𝑥 ∈ R𝑘 that solves Eq. (3), taking 𝑦 and 𝐴 as
given
This is a special case of a more general problem: Find an 𝑥 such that 𝑦 = 𝑓(𝑥)
Given an arbitrary function 𝑓 and a 𝑦, is there always an 𝑥 such that 𝑦 = 𝑓(𝑥)?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows
for ax in axes:
    # Set the axes through the origin
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')

ax = axes[0]

ax = axes[1]
ybar = 2.6
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.04, 0.91 * ybar, '$y$', fontsize=16)
plt.show()
In the first plot, there are multiple solutions, as the function is not one-to-one, while in the
second there are no solutions, since 𝑦 lies outside the range of 𝑓
Can we impose conditions on 𝐴 in Eq. (3) that rule out these problems?
In this context, the most important thing to recognize about the expression 𝐴𝑥 is that it cor-
responds to a linear combination of the columns of 𝐴
In particular, if 𝑎1 , … , 𝑎𝑘 are the columns of 𝐴, then
𝐴𝑥 = 𝑥1 𝑎1 + ⋯ + 𝑥𝑘 𝑎𝑘
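This column interpretation is easy to confirm numerically; the matrix and vector below are arbitrary:

```python
import numpy as np

# A 3 x 2 matrix with columns a_1, a_2
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([2.0, -1.0])

# A @ x should equal x_1 * a_1 + x_2 * a_2
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
print(np.allclose(A @ x, combo))  # True
```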
Let’s discuss some more details, starting with the case where 𝐴 is 𝑛 × 𝑛
This is the familiar case where the number of unknowns equals the number of equations
For arbitrary 𝑦 ∈ R𝑛 , we hope to find a unique 𝑥 ∈ R𝑛 such that 𝑦 = 𝐴𝑥
In view of the observations immediately above, if the columns of 𝐴 are linearly independent,
then their span, and hence the range of 𝑓(𝑥) = 𝐴𝑥, is all of R𝑛
Hence there always exists an 𝑥 such that 𝑦 = 𝐴𝑥
Moreover, the solution is unique
In particular, the following are equivalent
The property of having linearly independent columns is sometimes expressed as having full
column rank
Inverse Matrices
Can we give some sort of expression for the solution?
If 𝑦 and 𝐴 are scalar with 𝐴 ≠ 0, then the solution is 𝑥 = 𝐴−1 𝑦
A similar expression is available in the matrix case
In particular, if a square matrix 𝐴 has full column rank, then it possesses a multiplicative inverse matrix 𝐴−1, with the property that 𝐴𝐴−1 = 𝐴−1𝐴 = 𝐼
As a consequence, if we pre-multiply both sides of 𝑦 = 𝐴𝑥 by 𝐴−1 , we get 𝑥 = 𝐴−1 𝑦
This is the solution that we’re looking for
Determinants
Another quick comment about square matrices is that to every such matrix we assign a
unique number called the determinant of the matrix — you can find the expression for it here
If the determinant of 𝐴 is not zero, then we say that 𝐴 is nonsingular
Perhaps the most important fact about determinants is that 𝐴 is nonsingular if and only if 𝐴
is of full column rank
This gives us a useful one-number summary of whether or not a square matrix can be in-
verted
Without much loss of generality, let’s go over the intuition focusing on the case where the
columns of 𝐴 are linearly independent
It follows that the span of the columns of 𝐴 is a 𝑘-dimensional subspace of R𝑛
This span is very “unlikely” to contain arbitrary 𝑦 ∈ R𝑛
To see why, recall the figure above, where 𝑘 = 2 and 𝑛 = 3
Imagine an arbitrarily chosen 𝑦 ∈ R3 , located somewhere in that three-dimensional space
What’s the likelihood that 𝑦 lies in the span of {𝑎1 , 𝑎2 } (i.e., the two dimensional plane
through these points)?
In a sense, it must be very small, since this plane has zero “thickness”
As a result, in the 𝑛 > 𝑘 case we usually give up on existence
However, we can still seek the best approximation, for example, an 𝑥 that makes the distance
‖𝑦 − 𝐴𝑥‖ as small as possible
To solve this problem, one can use either calculus or the theory of orthogonal projections
The solution is known to be 𝑥̂ = (𝐴′ 𝐴)−1 𝐴′ 𝑦 — see for example chapter 3 of these notes
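As a quick sketch of this formula (the data below is made up for illustration), the normal-equations solution agrees with NumPy's built-in least-squares routine:

```python
import numpy as np

# An overdetermined system: 3 equations, 2 unknowns
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 2.5])

# Normal equations: x_hat = (A'A)^{-1} A'y
x_hat = np.linalg.solve(A.T @ A, A.T @ y)

# Compare with NumPy's least-squares routine
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(x_hat, x_lstsq))  # True
```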
This is the 𝑛 × 𝑘 case with 𝑛 < 𝑘, so there are fewer equations than unknowns
In this case there are either no solutions or infinitely many — in other words, uniqueness
never holds
For example, consider the case where 𝑘 = 3 and 𝑛 = 2
Thus, the columns of 𝐴 consist of 3 vectors in R2
This set can never be linearly independent, since it is possible to find two vectors that span
R2
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two
For example, let’s say that 𝑎1 = 𝛼𝑎2 + 𝛽𝑎3
Then if 𝑦 = 𝐴𝑥 = 𝑥1 𝑎1 + 𝑥2 𝑎2 + 𝑥3 𝑎3 , we can also write
Here’s an illustration of how to solve linear equations with SciPy’s linalg submodule
All of these routines are Python front ends to time-tested and highly optimized FORTRAN
code
In [14]: import numpy as np
         from scipy.linalg import inv, solve, det

         A = ((1, 2), (3, 4))
         A = np.array(A)
         y = np.ones((2, 1))  # Column vector

In [15]: det(A)  # Check that A is nonsingular, and hence invertible
Out[15]: -2.0
In [16]: A_inv = inv(A)  # Compute the inverse
         A_inv
Out[16]: array([[-2. , 1. ],
                [ 1.5, -0.5]])
In [17]: x = A_inv @ y  # Solution
         A @ x          # Should equal y
Out[17]: array([[1.],
                [1.]])
In [18]: solve(A, y)  # Produces the same solution
Out[18]: array([[-1.],
                [ 1.]])
Observe how we can solve for 𝑥 = 𝐴−1𝑦 either via inv(A) @ y or via solve(A, y)
The latter method uses a different algorithm (LU decomposition) that is numerically more
stable, and hence should almost always be preferred
To obtain the least-squares solution 𝑥̂ = (𝐴′ 𝐴)−1 𝐴′ 𝑦, use scipy.linalg.lstsq(A, y)
𝐴𝑣 = 𝜆𝑣
A = ((1, 2),
(2, 1))
A = np.array(A)
evals, evecs = eig(A)
evecs = evecs[:, 0], evecs[:, 1]
plt.show()
The eigenvalue equation is equivalent to (𝐴 − 𝜆𝐼)𝑣 = 0, and this has a nonzero solution 𝑣 only
when the columns of 𝐴 − 𝜆𝐼 are linearly dependent
This in turn is equivalent to stating that the determinant is zero
Hence to find all eigenvalues, we can look for 𝜆 such that the determinant of 𝐴 − 𝜆𝐼 is zero
This problem can be expressed as one of solving for the roots of a polynomial in 𝜆 of degree 𝑛
This in turn implies the existence of 𝑛 solutions in the complex plane, although some might
be repeated
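As a small sketch of this connection (with an arbitrarily chosen matrix), the roots of the characteristic polynomial coincide with the eigenvalues computed directly:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

# Characteristic polynomial of a 2 x 2 matrix: λ² - trace(A) λ + det(A)
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
poly_roots = np.sort(np.roots(coeffs))

# Compare with the eigenvalues computed directly
evals = np.sort(np.linalg.eigvals(A))
print(np.allclose(poly_roots, evals))  # True
```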
Some nice facts about the eigenvalues of a square matrix 𝐴 are as follows
A corollary of the first statement is that a matrix is invertible if and only if all its eigenvalues
are nonzero
Using SciPy, we can solve for the eigenvalues and eigenvectors of a matrix as follows
In [20]: A = ((1, 2),
              (2, 1))
         A = np.array(A)
         evals, evecs = eig(A)
         evals
In [21]: evecs
It is sometimes useful to consider the generalized eigenvalue problem, which, for given matri-
ces 𝐴 and 𝐵, seeks generalized eigenvalues 𝜆 and eigenvectors 𝑣 such that
𝐴𝑣 = 𝜆𝐵𝑣
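SciPy's eig handles this case too when a second matrix is passed; the matrices below are arbitrary:

```python
import numpy as np
from scipy.linalg import eig

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
B = np.array([[2.0, 0.0],
              [0.0, 1.0]])

# Generalized eigenvalues λ and eigenvectors v with A v = λ B v
evals, evecs = eig(A, B)

# Check the defining equation for the first pair
λ, v = evals[0], evecs[:, 0]
print(np.allclose(A @ v, λ * (B @ v)))  # True
```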
We round out our discussion by briefly mentioning several other important topics
Recall the usual summation formula for a geometric progression, which states that if |𝑎| < 1, then ∑_{𝑘=0}^{∞} 𝑎^𝑘 = (1 − 𝑎)^{−1}
A generalization of this idea exists in the matrix setting
Matrix Norms
Let 𝐴 be a square matrix, and let

‖𝐴‖ ∶= max_{‖𝑥‖=1} ‖𝐴𝑥‖
The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand
side is a matrix norm — in this case, the so-called spectral norm
For example, for a square matrix 𝑆, the condition ‖𝑆‖ < 1 means that 𝑆 is contractive, in the
sense that it pulls all vectors towards the origin [2]
Neumann’s Theorem
Let 𝐴 be a square matrix and let 𝐴𝑘 ∶= 𝐴𝐴𝑘−1 with 𝐴1 ∶= 𝐴
In other words, 𝐴𝑘 is the 𝑘-th power of 𝐴
Neumann’s theorem states the following: If ‖𝐴𝑘 ‖ < 1 for some 𝑘 ∈ N, then 𝐼 − 𝐴 is invertible,
and
(𝐼 − 𝐴)^{−1} = ∑_{𝑘=0}^{∞} 𝐴^𝑘 (4)
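A numerical sketch of Neumann's theorem, using an arbitrary matrix whose norm is comfortably below one:

```python
import numpy as np

# An arbitrary matrix with spectral norm well below 1
A = np.array([[0.1, 0.2],
              [0.3, 0.1]])

# Partial sums of the Neumann series I + A + A² + ...
S = np.zeros((2, 2))
term = np.eye(2)
for _ in range(50):
    S += term
    term = term @ A

# Compare with the direct inverse of (I - A)
direct = np.linalg.inv(np.eye(2) - A)
print(np.allclose(S, direct))  # True
```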
Spectral Radius
A result known as Gelfand’s formula tells us that, for any square matrix 𝐴,

𝜌(𝐴) = lim_{𝑘→∞} ‖𝐴^𝑘‖^{1/𝑘}

Here 𝜌(𝐴) is the spectral radius, defined as max𝑖 |𝜆𝑖|, where {𝜆𝑖}𝑖 is the set of eigenvalues of 𝐴
As a consequence of Gelfand’s formula, if all eigenvalues are strictly less than one in modulus, there exists a 𝑘 with ‖𝐴^𝑘‖ < 1, in which case Eq. (4) is valid
Analogous definitions exist for negative definite and negative semi-definite matrices
It is notable that if 𝐴 is positive definite, then all of its eigenvalues are strictly positive, and
hence 𝐴 is invertible (with positive definite inverse)
Then
1. ∂(𝑎′𝑥)/∂𝑥 = 𝑎
2. ∂(𝐴𝑥)/∂𝑥 = 𝐴′
3. ∂(𝑥′𝐴𝑥)/∂𝑥 = (𝐴 + 𝐴′)𝑥
4. ∂(𝑦′𝐵𝑧)/∂𝑦 = 𝐵𝑧
5. ∂(𝑦′𝐵𝑧)/∂𝐵 = 𝑦𝑧′
21.8 Exercises
21.8.1 Exercise 1
𝑦 = 𝐴𝑥 + 𝐵𝑢
Here
ℒ = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]
1. 𝜆 = −2𝑃 𝑦
2. The optimizing choice of 𝑢 satisfies 𝑢 = −(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥
3. The function 𝑣 satisfies 𝑣(𝑥) = −𝑥′ 𝑃 ̃ 𝑥 where 𝑃 ̃ = 𝐴′ 𝑃 𝐴 − 𝐴′ 𝑃 𝐵(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴
As we will see, in economic contexts Lagrange multipliers often are shadow prices
Note
If we don’t care about the Lagrange multipliers, we can substitute the constraint
into the objective function, and then just maximize −(𝐴𝑥 + 𝐵𝑢)′ 𝑃 (𝐴𝑥 + 𝐵𝑢) −
𝑢′ 𝑄𝑢 with respect to 𝑢. You can verify that this leads to the same maximizer.
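As a sanity check on these first-order conditions, we can verify numerically (with randomly generated positive definite P and Q, an arbitrary choice) that the u stated above is indeed a maximizer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary positive definite P, Q and arbitrary A, B, x
k, j = 3, 2
M1, M2 = rng.standard_normal((k, k)), rng.standard_normal((j, j))
P, Q = M1 @ M1.T + np.eye(k), M2 @ M2.T + np.eye(j)
A, B = rng.standard_normal((k, k)), rng.standard_normal((k, j))
x = rng.standard_normal(k)

def objective(u):
    y = A @ x + B @ u
    return -y @ P @ y - u @ Q @ u

# Candidate maximizer from the first-order conditions
u_star = -np.linalg.solve(Q + B.T @ P @ B, B.T @ P @ A @ x)

# The objective is strictly concave in u, so any perturbation should do worse
perturbed = [objective(u_star + 0.1 * rng.standard_normal(j)) for _ in range(100)]
print(all(objective(u_star) >= v for v in perturbed))  # True
```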
21.9 Solutions
s.t.
𝑦 = 𝐴𝑥 + 𝐵𝑢
with primitives
𝐿 = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]
1.
Differentiating Lagrangian equation w.r.t y and setting its derivative equal to zero yields
∂𝐿/∂𝑦 = −(𝑃 + 𝑃′)𝑦 − 𝜆 = −2𝑃𝑦 − 𝜆 = 0,
since P is symmetric
Accordingly, the first-order condition for maximizing L w.r.t. y implies
𝜆 = −2𝑃 𝑦
2.
Differentiating Lagrangian equation w.r.t. u and setting its derivative equal to zero yields
∂𝐿/∂𝑢 = −(𝑄 + 𝑄′)𝑢 + 𝐵′𝜆 = −2𝑄𝑢 + 𝐵′𝜆 = 0
Substituting 𝜆 = −2𝑃 𝑦 gives
𝑄𝑢 + 𝐵′ 𝑃 𝑦 = 0
𝑄𝑢 + 𝐵′ 𝑃 (𝐴𝑥 + 𝐵𝑢) = 0
(𝑄 + 𝐵′ 𝑃 𝐵)𝑢 + 𝐵′ 𝑃 𝐴𝑥 = 0
𝑢 = −(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥 ,
which follows from the definition of the first-order conditions for Lagrangian equation
3.
Rewriting our problem by substituting the constraint into the objective function, we get
Since we know the optimal choice of u satisfies 𝑢 = −(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥, then
−2𝑢′ 𝐵′ 𝑃 𝐴𝑥 = −2𝑥′ 𝑆 ′ 𝐵′ 𝑃 𝐴𝑥
= 2𝑥′ 𝐴′ 𝑃 𝐵(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥
Notice that the term (𝑄 + 𝐵′ 𝑃 𝐵)−1 is symmetric as both P and Q are symmetric
Regarding the third term −𝑢′ (𝑄 + 𝐵′ 𝑃 𝐵)𝑢,
Therefore, the solution to the optimization problem 𝑣(𝑥) = −𝑥′ 𝑃 ̃ 𝑥 follows the above result by
denoting 𝑃 ̃ ∶= 𝐴′ 𝑃 𝐴 − 𝐴′ 𝑃 𝐵(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴
Footnotes
[1] Although there is a specialized matrix data type defined in NumPy, it’s more standard to
work with ordinary NumPy arrays. See this discussion.
[2] Suppose that ‖𝑆‖ < 1. Take any nonzero vector 𝑥, and let 𝑟 ∶= ‖𝑥‖. We have ‖𝑆𝑥‖ =
𝑟‖𝑆(𝑥/𝑟)‖ ≤ 𝑟‖𝑆‖ < 𝑟 = ‖𝑥‖. Hence every point is pulled towards the origin.
22

Complex Numbers and Trigonometry
22.1 Contents
• Overview 22.2
22.2 Overview
𝑟 = |𝑧| = √𝑥2 + 𝑦2
The value 𝜃 is the angle of (𝑥, 𝑦) with respect to the real axis
Evidently, the tangent of 𝜃 is 𝑦/𝑥
Therefore,

𝜃 = tan^{−1}(𝑦/𝑥)
22.2.2 An Example
Consider the complex number 𝑧 = 1 + √3 𝑖
For 𝑧 = 1 + √3 𝑖, 𝑥 = 1, 𝑦 = √3
It follows that 𝑟 = 2 and 𝜃 = tan^{−1}(√3) = 𝜋/3 = 60°
Let’s use Python to plot the trigonometric form of the complex number 𝑧 = 1 + √3 𝑖
import numpy as np
import matplotlib.pyplot as plt

π = np.pi

# Set parameters
r = 2
θ = π/3
x = r * np.cos(θ)
x_range = np.linspace(0, x, 1000)
θ_range = np.linspace(0, θ, 1000)
# Plot
fig = plt.figure(figsize=(8, 8))
ax = plt.subplot(111, projection='polar')
ax.set_rmax(2)
ax.set_rticks((0.5, 1, 1.5, 2)) # less radial ticks
ax.set_rlabel_position(-88.5) # get radial labels away from plotted line
ax.grid(True)
plt.show()
(𝑟(cos 𝜃 + 𝑖 sin 𝜃))^𝑛 = (𝑟𝑒^{𝑖𝜃})^𝑛
and compute
22.4.1 Example 1
1 = 𝑒𝑖𝜃 𝑒−𝑖𝜃
= (cos 𝜃 + 𝑖 sin 𝜃)(cos (-𝜃) + 𝑖 sin (-𝜃))
= (cos 𝜃 + 𝑖 sin 𝜃)(cos 𝜃 − 𝑖 sin 𝜃)
= cos2 𝜃 + sin2 𝜃
= 𝑥²/𝑟² + 𝑦²/𝑟²
and thus
𝑥2 + 𝑦2 = 𝑟2
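A quick numerical sketch of this identity, using Python's cmath module with an arbitrary angle:

```python
import cmath

θ = 0.7  # an arbitrary angle

# e^{iθ} e^{-iθ} should equal 1, which encodes cos²θ + sin²θ = 1
z = cmath.exp(1j * θ) * cmath.exp(-1j * θ)
print(abs(z - 1) < 1e-12)  # True
```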
22.4.2 Example 2
𝑥𝑛 = 𝑎𝑧^𝑛 + 𝑎̄𝑧̄^𝑛
= 𝑝𝑒𝑖𝜔 (𝑟𝑒𝑖𝜃 )𝑛 + 𝑝𝑒−𝑖𝜔 (𝑟𝑒−𝑖𝜃 )𝑛
= 𝑝𝑟𝑛 𝑒𝑖(𝜔+𝑛𝜃) + 𝑝𝑟𝑛 𝑒−𝑖(𝜔+𝑛𝜃)
= 𝑝𝑟𝑛 [cos (𝜔 + 𝑛𝜃) + 𝑖 sin (𝜔 + 𝑛𝜃) + cos (𝜔 + 𝑛𝜃) − 𝑖 sin (𝜔 + 𝑛𝜃)]
= 2𝑝𝑟𝑛 cos (𝜔 + 𝑛𝜃)
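This formula can be checked numerically; all parameter values below are arbitrary:

```python
import cmath
import math

# Arbitrary parameters
p, ω = 1.3, 0.4      # a = p e^{iω}
r, θ = 0.9, 0.25     # z = r e^{iθ}
n = 7

a = p * cmath.exp(1j * ω)
z = r * cmath.exp(1j * θ)

# x_n = a z^n + conj(a) conj(z)^n should equal 2 p r^n cos(ω + nθ)
x_n = a * z**n + a.conjugate() * z.conjugate()**n
closed_form = 2 * p * r**n * math.cos(ω + n * θ)
print(abs(x_n - closed_form) < 1e-12)  # True
```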
22.4.3 Example 3
This example provides machinery that is at the heart of Samuelson’s analysis of his multiplier-accelerator model [115]
Thus, consider a second-order linear difference equation
𝑥𝑛+2 = 𝑐1 𝑥𝑛+1 + 𝑐2 𝑥𝑛
𝑧 2 − 𝑐1 𝑧 − 𝑐 2 = 0
or
(𝑧2 − 𝑐1 𝑧 − 𝑐2 ) = (𝑧 − 𝑧1 )(𝑧 − 𝑧2 ) = 0
has roots 𝑧1, 𝑧2
A solution is a sequence {𝑥𝑛}_{𝑛=0}^{∞} that satisfies the difference equation
Under the following circumstances, we can apply our example 2 formula to solve the differ-
ence equation
• the roots 𝑧1 , 𝑧2 of the characteristic polynomial of the difference equation form a com-
plex conjugate pair
• the values 𝑥0 , 𝑥1 are given initial conditions
where 𝜔, 𝑝 are coefficients to be determined from information encoded in the initial conditions
𝑥1 , 𝑥0
Since 𝑥0 = 2𝑝 cos 𝜔 and 𝑥1 = 2𝑝𝑟 cos(𝜔 + 𝜃), the ratio of 𝑥1 to 𝑥0 is

𝑥1/𝑥0 = 𝑟 cos(𝜔 + 𝜃)/cos(𝜔)
We can solve this equation for 𝜔, then solve for 𝑝 using 𝑥0 = 2𝑝 cos 𝜔
With the sympy package in Python, we are able to solve and plot the dynamics of 𝑥𝑛 given
different values of 𝑛
In this example, we set the initial values: 𝑟 = 0.9, 𝜃 = 𝜋/4, 𝑥0 = 4, 𝑥1 = 𝑟 ⋅ 2√2 = 1.8√2
We first numerically solve for 𝜔 and 𝑝 using nsolve in the sympy package based on the
above initial condition:
import numpy as np
from sympy import *

π = np.pi

# Set parameters
r = 0.9
θ = π/4
x0 = 4
x1 = 2 * r * sqrt(2)

# Define symbols to be solved for
ω, p = symbols('ω p', real=True)

# Solve for ω
## Note: we choose the solution near 0
eq1 = Eq(x1/x0 - r * cos(ω+θ) / cos(ω), 0)
ω = nsolve(eq1, ω, 0)
ω = float(ω)
print(f'ω = {ω:1.3f}')

# Solve for p
eq2 = Eq(x0 - 2 * p * cos(ω), 0)
p = nsolve(eq2, p, 0)
p = float(p)
print(f'p = {p:1.3f}')
ω = 0.000
p = 2.000
# Define x_n
x = lambda n: 2 * p * r**n * np.cos(ω + n * θ)

# Set a range of n values to plot (the upper bound is arbitrary)
max_n = 30
n = np.arange(0, max_n + 1)
# Plot
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(n, x(n))
ax.set(xlim=(0, max_n), ylim=(-5, 5), xlabel='$n$', ylabel='$x_n$')
ax.grid()
plt.show()
cos(𝜔 + 𝜃) = (𝑒^{𝑖(𝜔+𝜃)} + 𝑒^{−𝑖(𝜔+𝜃)})/2

sin(𝜔 + 𝜃) = (𝑒^{𝑖(𝜔+𝜃)} − 𝑒^{−𝑖(𝜔+𝜃)})/(2𝑖)
Since both real and imaginary parts of the above formula should be equal, we get:

cos(𝜔 + 𝜃) = cos 𝜔 cos 𝜃 − sin 𝜔 sin 𝜃

sin(𝜔 + 𝜃) = sin 𝜔 cos 𝜃 + cos 𝜔 sin 𝜃
The equations above are also known as the angle sum identities. We can verify the equa-
tions using the simplify function in the sympy package:
# ω and θ were overwritten with floats above, so redefine them as symbols
ω, θ = symbols('ω θ', real=True)

# Verify
print("cos(ω)cos(θ) - sin(ω)sin(θ) =", simplify(cos(ω)*cos(θ) - sin(ω) * sin(θ)))
print("cos(ω)sin(θ) + sin(ω)cos(θ) =", simplify(cos(ω)*sin(θ) + sin(ω) * cos(θ)))
We can also compute the trigonometric integrals using polar forms of complex numbers
For example, we want to solve the following integral:
∫_{−𝜋}^{𝜋} cos(𝜔) sin(𝜔) 𝑑𝜔
and thus:
∫_{−𝜋}^{𝜋} cos(𝜔) sin(𝜔) 𝑑𝜔 = (1/2) sin²(𝜋) − (1/2) sin²(−𝜋) = 0
We can verify the analytical as well as numerical results using integrate in the sympy
package:
ω = Symbol('ω')
print('The analytical solution for integral of cos(ω)sin(ω) is:')
integrate(cos(ω) * sin(ω), ω)
Out[6]:
sin²(𝜔)/2
In [7]: print('The numerical solution for the integral of cos(ω)sin(ω) from -π to π is:')
integrate(cos(ω) * sin(ω), (ω, -π, π))
Out[7]:
0
23

Orthogonal Projections and Their Applications
23.1 Contents
• Overview 23.2
• Exercises 23.9
• Solutions 23.10
23.2 Overview
Orthogonal projection is a cornerstone of vector space methods, with many diverse applica-
tions
These include, but are not limited to,
• key ideas
• least squares regression
For background and foundational concepts, see our lecture on linear algebra
For more proofs and greater theoretical detail, see A Primer in Econometric Theory
For a complete set of proofs in a general setting, see, for example, [109]
For an advanced treatment of projection in the context of least squares prediction, see this
book chapter
Assume 𝑥, 𝑧 ∈ R𝑛
Define ⟨𝑥, 𝑧⟩ = ∑𝑖 𝑥𝑖 𝑧𝑖
Recall ‖𝑥‖2 = ⟨𝑥, 𝑥⟩
The law of cosines states that ⟨𝑥, 𝑧⟩ = ‖𝑥‖‖𝑧‖ cos(𝜃) where 𝜃 is the angle between the vectors
𝑥 and 𝑧
When ⟨𝑥, 𝑧⟩ = 0, then cos(𝜃) = 0 and 𝑥 and 𝑧 are said to be orthogonal and we write 𝑥 ⟂ 𝑧
𝑆 ⟂ is a linear subspace of R𝑛
𝑦̂ ∶= argmin_{𝑧∈𝑆} ‖𝑦 − 𝑧‖
• 𝑦̂ ∈ 𝑆
• 𝑦 − 𝑦̂ ⟂ 𝑆
Hence ‖𝑦 − 𝑧‖ ≥ ‖𝑦 − 𝑦̂‖, which completes the proof
For a linear space 𝑌 and a fixed linear subspace 𝑆, we have a functional relationship
1. 𝑃 𝑦 ∈ 𝑆 and
2. 𝑦 − 𝑃 𝑦 ⟂ 𝑆
For example, to prove 1, observe that 𝑦 = 𝑃 𝑦 + 𝑦 − 𝑃 𝑦 and apply the Pythagorean law
Orthogonal Complement
Let 𝑆 ⊂ R𝑛 .
The orthogonal complement of 𝑆 is the linear subspace 𝑆 ⟂ that satisfies 𝑥1 ⟂ 𝑥2 for every
𝑥1 ∈ 𝑆 and 𝑥2 ∈ 𝑆 ⟂
Let 𝑌 be a linear space with linear subspace 𝑆 and its orthogonal complement 𝑆 ⟂
We write
𝑌 = 𝑆 ⊕ 𝑆⟂
to indicate that for every 𝑦 ∈ 𝑌 there is unique 𝑥1 ∈ 𝑆 and a unique 𝑥2 ∈ 𝑆 ⟂ such that
𝑦 = 𝑥1 + 𝑥2
Moreover, 𝑥1 = 𝐸𝑆̂ 𝑦 and 𝑥2 = 𝑦 − 𝐸𝑆̂ 𝑦
This amounts to another version of the OPT:
Theorem. If 𝑆 is a linear subspace of R𝑛 , 𝐸𝑆̂ 𝑦 = 𝑃 𝑦 and 𝐸𝑆̂ ⟂ 𝑦 = 𝑀 𝑦, then
𝑥 = ∑_{𝑖=1}^{𝑘} ⟨𝑥, 𝑢𝑖⟩𝑢𝑖 for all 𝑥 ∈ 𝑆
To see this, observe that since 𝑥 ∈ span{𝑢1 , … , 𝑢𝑘 }, we can find scalars 𝛼1 , … , 𝛼𝑘 that verify
𝑥 = ∑_{𝑗=1}^{𝑘} 𝛼𝑗 𝑢𝑗 (1)

Taking the inner product of both sides of Eq. (1) with 𝑢𝑖 gives

⟨𝑥, 𝑢𝑖⟩ = ∑_{𝑗=1}^{𝑘} 𝛼𝑗 ⟨𝑢𝑗, 𝑢𝑖⟩ = 𝛼𝑖
When the subspace onto which we are projecting is orthonormal, computing the projection simplifies:
Theorem If {𝑢1 , … , 𝑢𝑘 } is an orthonormal basis for 𝑆, then
𝑃𝑦 = ∑_{𝑖=1}^{𝑘} ⟨𝑦, 𝑢𝑖⟩𝑢𝑖, ∀ 𝑦 ∈ R𝑛 (2)
⟨𝑦 − ∑_{𝑖=1}^{𝑘} ⟨𝑦, 𝑢𝑖⟩𝑢𝑖, 𝑢𝑗⟩ = ⟨𝑦, 𝑢𝑗⟩ − ∑_{𝑖=1}^{𝑘} ⟨𝑦, 𝑢𝑖⟩⟨𝑢𝑖, 𝑢𝑗⟩ = 0
𝐸𝑆̂ 𝑦 = 𝑃 𝑦
𝑃 = 𝑋(𝑋 ′ 𝑋)−1 𝑋 ′
1. 𝑃 𝑦 ∈ 𝑆, and
2. 𝑦 − 𝑃 𝑦 ⟂ 𝑆
𝑆 ∶= span 𝑋 ∶= span{col1 𝑋, … , col𝑘 𝑋}
𝑃 𝑦 = 𝑈 (𝑈 ′ 𝑈 )−1 𝑈 ′ 𝑦
𝑃𝑦 = 𝑈𝑈′𝑦 = ∑_{𝑖=1}^{𝑘} ⟨𝑢𝑖, 𝑦⟩𝑢𝑖
We have recovered our earlier result about projecting onto the span of an orthonormal basis
𝛽 ̂ ∶= (𝑋 ′ 𝑋)−1 𝑋 ′ 𝑦
𝑋 𝛽 ̂ = 𝑋(𝑋 ′ 𝑋)−1 𝑋 ′ 𝑦 = 𝑃 𝑦
Because 𝑋𝑏 ∈ span(𝑋)
If probabilities and hence E are unknown, we cannot solve this problem directly
However, if a sample is available, we can estimate the risk with the empirical risk:
min_{𝑓∈ℱ} (1/𝑁) ∑_{𝑛=1}^{𝑁} (𝑦𝑛 − 𝑓(𝑥𝑛))²
min_{𝑏∈R𝐾} ∑_{𝑛=1}^{𝑁} (𝑦𝑛 − 𝑏′𝑥𝑛)²
23.7.2 Solution
𝑦 ∶= (𝑦1, 𝑦2, … , 𝑦𝑁)′, 𝑥𝑛 ∶= (𝑥𝑛1, 𝑥𝑛2, … , 𝑥𝑛𝐾)′ = 𝑛-th observation on all regressors
and
min_{𝑏∈R𝐾} ∑_{𝑛=1}^{𝑁} (𝑦𝑛 − 𝑏′𝑥𝑛)² = min_{𝑏∈R𝐾} ‖𝑦 − 𝑋𝑏‖²
𝛽 ̂ ∶= (𝑋 ′ 𝑋)−1 𝑋 ′ 𝑦
𝑦 ̂ ∶= 𝑋 𝛽 ̂ = 𝑃 𝑦
𝑢̂ ∶= 𝑦 − 𝑦 ̂ = 𝑦 − 𝑃 𝑦 = 𝑀 𝑦
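These objects are straightforward to compute directly; here is a sketch with made-up data, confirming that the residuals are orthogonal to the regressors, as the OPT requires:

```python
import numpy as np

# Made-up data: N = 5 observations, K = 2 regressors
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.0, 2.2, 2.9, 4.1, 4.8])

# β̂ = (X'X)^{-1} X'y, fitted values ŷ = Xβ̂, residuals û = y - ŷ
β_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ β_hat
u_hat = y - y_hat

# û ⟂ span(X), i.e. X'û = 0
print(np.allclose(X.T @ u_hat, 0))  # True
```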
Let’s return to the connection between linear independence and orthogonality touched on
above
A result of much interest is a famous algorithm for constructing orthonormal sets from lin-
early independent sets
The next section gives details
Theorem For each linearly independent set {𝑥1 , … , 𝑥𝑘 } ⊂ R𝑛 , there exists an orthonormal
set {𝑢1 , … , 𝑢𝑘 } with
23.8.2 QR Decomposition
The following result uses the preceding algorithm to produce a useful decomposition
Theorem If 𝑋 is 𝑛 × 𝑘 with linearly independent columns, then there exists a factorization 𝑋 = 𝑄𝑅 where

• 𝑅 is 𝑘 × 𝑘, upper triangular, and nonsingular
• 𝑄 is 𝑛 × 𝑘 with orthonormal columns
• 𝑥𝑗 ∶= col𝑗(𝑋), the 𝑗-th column of 𝑋
• {𝑢1 , … , 𝑢𝑘 } be orthonormal with the same span as {𝑥1 , … , 𝑥𝑘 } (to be constructed using
Gram–Schmidt)
• 𝑄 be formed from cols 𝑢𝑖
𝑥𝑗 = ∑_{𝑖=1}^{𝑗} ⟨𝑢𝑖, 𝑥𝑗⟩𝑢𝑖 for 𝑗 = 1, … , 𝑘
For matrices 𝑋 and 𝑦 that overdetermine 𝛽 in the linear equation system 𝑦 = 𝑋𝛽, we found the least squares approximator 𝛽̂ = (𝑋′𝑋)−1𝑋′𝑦
Using the QR decomposition 𝑋 = 𝑄𝑅 gives
𝛽 ̂ = (𝑅′ 𝑄′ 𝑄𝑅)−1 𝑅′ 𝑄′ 𝑦
= (𝑅′ 𝑅)−1 𝑅′ 𝑄′ 𝑦
= 𝑅−1 (𝑅′ )−1 𝑅′ 𝑄′ 𝑦 = 𝑅−1 𝑄′ 𝑦
Numerical routines would in this case use the alternative form 𝑅𝛽 ̂ = 𝑄′ 𝑦 and back substitu-
tion
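A sketch of this numerically more stable route, using SciPy's triangular solver on a small made-up system:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

# A made-up overdetermined system
X = np.array([[1.0, 0.0],
              [0.0, -6.0],
              [2.0, 2.0]])
y = np.array([1.0, 3.0, -3.0])

# Least squares via QR: solve R β = Q'y by back substitution
Q, R = qr(X, mode='economic')
β_qr = solve_triangular(R, Q.T @ y)

# Compare with the normal-equations formula
β_ne = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(β_qr, β_ne))  # True
```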
23.9 Exercises
23.9.1 Exercise 1
23.9.2 Exercise 2
Let 𝑃 = 𝑋(𝑋 ′ 𝑋)−1 𝑋 ′ and let 𝑀 = 𝐼 − 𝑃 . Show that 𝑃 and 𝑀 are both idempotent and
symmetric. Can you give any intuition as to why they should be idempotent?
23.9.3 Exercise 3
𝑦 ∶= ⎛ 1 ⎞
     ⎜ 3 ⎟
     ⎝ −3 ⎠

and

𝑋 ∶= ⎛ 1  0 ⎞
     ⎜ 0 −6 ⎟
     ⎝ 2  2 ⎠
23.10 Solutions
23.10.1 Exercise 1
23.10.2 Exercise 2
Symmetry and idempotence of 𝑀 and 𝑃 can be established using standard rules for matrix
algebra. The intuition behind idempotence of 𝑀 and 𝑃 is that both are orthogonal projec-
tions. After a point is projected into a given subspace, applying the projection again makes
no difference. (A point inside the subspace is not shifted by orthogonal projection onto that
space because it is already the closest point in the subspace to itself.)
23.10.3 Exercise 3
Here’s a function that computes the orthonormal vectors using the GS algorithm given in the
lecture
def gram_schmidt(X):
    """
    Implements Gram-Schmidt orthogonalization.

    Parameters
    ----------
    X : an n x k array with linearly independent columns

    Returns
    -------
    U : an n x k array with orthonormal columns
    """
    # Set up
    n, k = X.shape
    U = np.empty((n, k))
    I = np.eye(n)

    # The first column of U is the normalized first column of X
    v1 = X[:, 0]
    U[:, 0] = v1 / np.sqrt(np.sum(v1 * v1))

    for i in range(1, k):
        # Set up
        b = X[:, i]       # The vector we're going to project
        Z = X[:, :i]      # The first i columns of X

        # Project onto the orthogonal complement of the column span of Z
        M = I - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
        u = M @ b

        # Normalize
        U[:, i] = u / np.sqrt(np.sum(u * u))

    return U
In [3]: y = [1, 3, -3]

        X = [[1, 0],
             [0, -6],
             [2, 2]]

        X, y = [np.asarray(z) for z in (X, y)]
First, let’s try projection of 𝑦 onto the column space of 𝑋 using the ordinary matrix expres-
sion:
Now let’s do the same using an orthonormal basis created from our gram_schmidt function
In [4]: U = gram_schmidt(X)
U
This is the same answer. So far so good. Finally, let’s try the same thing but with the basis
obtained via QR decomposition:
In [5]: from scipy.linalg import qr

        Q, R = qr(X, mode='economic')
        Q
24

LLN and CLT
24.1 Contents
• Overview 24.2
• Relationships 24.3
• LLN 24.4
• CLT 24.5
• Exercises 24.6
• Solutions 24.7
24.2 Overview
This lecture illustrates two of the most important theorems of probability and statistics: The
law of large numbers (LLN) and the central limit theorem (CLT)
These beautiful theorems lie behind many of the most fundamental results in econometrics
and quantitative economic modeling
The lecture is based around simulations that show the LLN and CLT in action
We also demonstrate how the LLN and CLT break down when the assumptions they are
based on do not hold
In addition, we examine several useful extensions of the classical theorems, such as
24.3 Relationships
The LLN gives conditions under which sample moments converge to population moments as
sample size increases
The CLT provides information about the rate at which sample moments converge to popula-
tion moments as sample size increases
24.4 LLN
We begin with the law of large numbers, which tells us when sample averages will converge to
their population means
The classical law of large numbers concerns independent and identically distributed (IID)
random variables
Here is the strongest version of the classical LLN, known as Kolmogorov’s strong law
Let 𝑋1 , … , 𝑋𝑛 be independent and identically distributed scalar random variables, with com-
mon distribution 𝐹
When it exists, let 𝜇 denote the common mean of this sample:
𝜇 ∶= E𝑋 = ∫ 𝑥𝐹 (𝑑𝑥)
In addition, let
𝑋̄𝑛 ∶= (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑋𝑖
Then Kolmogorov’s strong law states that

P {𝑋̄𝑛 → 𝜇 as 𝑛 → ∞} = 1 (1)
24.4.2 Proof
The proof of Kolmogorov’s strong law is nontrivial – see, for example, theorem 8.3.5 of [38]
On the other hand, we can prove a weaker version of the LLN very easily and still get most of
the intuition
The version we prove is as follows: If 𝑋1, … , 𝑋𝑛 is IID with E𝑋𝑖² < ∞, then, for any 𝜖 > 0, we have

P {|𝑋̄𝑛 − 𝜇| ≥ 𝜖} → 0 as 𝑛 → ∞ (2)
(This version is weaker because we claim only convergence in probability rather than almost
sure convergence, and assume a finite second moment)
To see that this is so, fix 𝜖 > 0, and let 𝜎2 be the variance of each 𝑋𝑖
Recall the Chebyshev inequality, which tells us that

P {|𝑋̄𝑛 − 𝜇| ≥ 𝜖} ≤ E[(𝑋̄𝑛 − 𝜇)²] / 𝜖² (3)
E[(𝑋̄𝑛 − 𝜇)²] = E{[(1/𝑛) ∑_{𝑖=1}^{𝑛} (𝑋𝑖 − 𝜇)]²}
             = (1/𝑛²) ∑_{𝑖=1}^{𝑛} ∑_{𝑗=1}^{𝑛} E(𝑋𝑖 − 𝜇)(𝑋𝑗 − 𝜇)
             = (1/𝑛²) ∑_{𝑖=1}^{𝑛} E(𝑋𝑖 − 𝜇)²
             = 𝜎²/𝑛
Here the crucial step is at the third equality, which follows from independence
Independence means that if 𝑖 ≠ 𝑗, then the covariance term E(𝑋𝑖 − 𝜇)(𝑋𝑗 − 𝜇) drops out
As a result, 𝑛2 − 𝑛 terms vanish, leading us to a final expression that goes to zero in 𝑛
Combining our last result with Eq. (3), we come to the estimate
P {|𝑋̄𝑛 − 𝜇| ≥ 𝜖} ≤ 𝜎² / (𝑛𝜖²) (4)
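We can see this bound at work in a small simulation (distribution and parameters chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(42)

# Uniform on [0, 1]: μ = 0.5, σ² = 1/12
n, reps, ε = 100, 50_000, 0.05
μ, σ2 = 0.5, 1/12

# reps independent draws of the sample mean of n observations
sample_means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
freq = np.mean(np.abs(sample_means - μ) >= ε)

# The observed frequency should respect the Chebyshev bound
bound = σ2 / (n * ε**2)
print(freq <= bound)  # True
```

Note that Chebyshev's inequality is typically far from tight: here the bound is 1/3, while the observed frequency is much smaller.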
24.4.3 Illustration
Let’s now illustrate the classical IID law of large numbers using simulation
In particular, we aim to generate some sequences of IID random variables and plot the evolu-
tion of 𝑋̄ 𝑛 as 𝑛 increases
Below is a figure that does just this (as usual, you can click on it to expand it)
It shows IID observations from three different distributions and plots 𝑋̄ 𝑛 against 𝑛 in each
case
The dots represent the underlying observations 𝑋𝑖 for 𝑖 = 1, … , 100
In each of the three cases, convergence of 𝑋̄ 𝑛 to 𝜇 occurs as predicted
n = 100
for ax in axes:
    # == Choose a randomly selected distribution == #
    name = random.choice(list(distributions.keys()))
    distribution = distributions.pop(name)

    # == Generate n draws from the distribution == #
    data = distribution.rvs(n)

    # == Compute sample mean at each n == #
    sample_mean = np.empty(n)
    for i in range(n):
        sample_mean[i] = np.mean(data[:i+1])

    # == Plot == #
    ax.plot(list(range(n)), data, 'o', color='grey', alpha=0.5)
    axlabel = '$\\bar X_n$ for $X_i \sim$' + name
    ax.plot(list(range(n)), sample_mean, 'g-', lw=3, alpha=0.6, label=axlabel)
    m = distribution.mean()
    ax.plot(list(range(n)), [m] * n, 'k--', lw=1.5, label='$\mu$')
    ax.vlines(list(range(n)), m, data, lw=0.2)
    ax.legend(**legend_args)
plt.show()
The three distributions are chosen at random from a selection stored in the dictionary dis-
tributions
What happens if the condition E|𝑋| < ∞ in the statement of the LLN is not satisfied?
This might be the case if the underlying distribution is heavy-tailed — the best-known example is the Cauchy distribution, which has density
𝑓(𝑥) = 1 / (𝜋(1 + 𝑥²)) (𝑥 ∈ R)
The next figure shows 100 independent draws from this distribution
In [2]: n = 100
        distribution = cauchy()

        fig, ax = plt.subplots(figsize=(10, 6))
        data = distribution.rvs(n)

        ax.plot(list(range(n)), data, 'bo', alpha=0.5)
        ax.vlines(list(range(n)), 0, data, lw=0.2)
        plt.show()
Notice how extreme observations are far more prevalent here than the previous figure
Let’s now have a look at the behavior of the sample mean
In [3]: n = 1000
        distribution = cauchy()

        fig, ax = plt.subplots(figsize=(10, 6))
        data = distribution.rvs(n)

        # == Compute sample mean at each n == #
        sample_mean = np.empty(n)
        for i in range(n):
            sample_mean[i] = np.mean(data[:i+1])

        # == Plot == #
ax.plot(list(range(n)), sample_mean, 'r-', lw=3, alpha=0.6,
label='$\\bar X_n$')
ax.plot(list(range(n)), [0] * n, 'k--', lw=0.5)
ax.legend()
plt.show()
Here we’ve increased 𝑛 to 1000, but the sequence still shows no sign of converging
Will convergence become visible if we take 𝑛 even larger?
The answer is no
To see this, recall that the characteristic function of the Cauchy distribution is

𝜙(𝑡) = E𝑒^{𝑖𝑡𝑋} = 𝑒^{−|𝑡|} (5)

Using independence, the characteristic function of the sample mean becomes

E𝑒^{𝑖𝑡𝑋̄𝑛} = E exp{𝑖 (𝑡/𝑛) ∑_{𝑗=1}^{𝑛} 𝑋𝑗}
         = E ∏_{𝑗=1}^{𝑛} exp{𝑖 (𝑡/𝑛) 𝑋𝑗}
         = ∏_{𝑗=1}^{𝑛} E exp{𝑖 (𝑡/𝑛) 𝑋𝑗} = [𝜙(𝑡/𝑛)]^𝑛

In view of Eq. (5), this is just 𝑒^{−|𝑡|}

Thus, in the case of the Cauchy distribution, the sample mean itself has the very same Cauchy distribution, regardless of 𝑛, and hence convergence never occurs
24.5 CLT
Next, we turn to the central limit theorem, which tells us about the distribution of the devia-
tion between sample averages and population means
The central limit theorem is one of the most remarkable results in all of mathematics
In the classical IID setting, it tells us the following:
If the sequence 𝑋1 , … , 𝑋𝑛 is IID, with common mean 𝜇 and common variance 𝜎2 ∈ (0, ∞),
then
√𝑛(𝑋̄𝑛 − 𝜇) →𝑑 𝑁(0, 𝜎²) as 𝑛 → ∞ (6)

Here →𝑑 𝑁(0, 𝜎²) indicates convergence in distribution to a centered (i.e., zero-mean) normal with standard deviation 𝜎
24.5.2 Intuition
The striking implication of the CLT is that for any distribution with finite second moment,
the simple operation of adding independent copies always leads to a Gaussian curve
A relatively simple proof of the central limit theorem can be obtained by working with char-
acteristic functions (see, e.g., theorem 9.5.6 of [38])
The proof is elegant but almost anticlimactic, and it provides surprisingly little intuition
In fact, all of the proofs of the CLT that we know are similar in this respect
Why does adding independent copies produce a bell-shaped distribution?
Part of the answer can be obtained by investigating the addition of independent Bernoulli
random variables
In particular, let 𝑋𝑖 be binary, with P{𝑋𝑖 = 0} = P{𝑋𝑖 = 1} = 0.5, and let 𝑋1 , … , 𝑋𝑛 be
independent
Think of 𝑋𝑖 = 1 as a “success”, so that 𝑌𝑛 = ∑_{𝑖=1}^{𝑛} 𝑋𝑖 is the number of successes in 𝑛 trials
The next figure plots the probability mass function of 𝑌𝑛 for 𝑛 = 1, 2, 4, 8
plt.show()
When 𝑛 = 1, the distribution is flat — one success or no successes have the same probability
When 𝑛 = 2 we can either have 0, 1 or 2 successes
Notice the peak in probability mass at the mid-point 𝑘 = 1
The reason is that there are more ways to get 1 success (“fail then succeed” or “succeed then
fail”) than to get zero or two successes
Moreover, the two trials are independent, so the outcomes “fail then succeed” and “succeed
then fail” are just as likely as the outcomes “fail then fail” and “succeed then succeed”
(If there were positive correlation, say, then “succeed then fail” would be less likely than “succeed then succeed”)
Here, already we have the essence of the CLT: addition under independence leads probability
mass to pile up in the middle and thin out at the tails
For 𝑛 = 4 and 𝑛 = 8 we again get a peak at the “middle” value (halfway between the mini-
mum and the maximum possible value)
The intuition is the same — there are simply more ways to get these middle outcomes
If we continue, the bell-shaped curve becomes even more pronounced
We are witnessing the binomial approximation of the normal distribution
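This approximation can be sketched directly by comparing the binomial pmf with the corresponding normal density:

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 400, 0.5

# Mean and standard deviation of Y_n = number of successes
μ, σ = n * p, np.sqrt(n * p * (1 - p))

# Compare binomial probabilities with the normal density near the mean
ks = np.arange(int(μ - 3*σ), int(μ + 3*σ))
pmf = binom.pmf(ks, n, p)
approx = norm.pdf(ks, loc=μ, scale=σ)

print(np.max(np.abs(pmf - approx)) < 1e-3)  # True
```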
24.5.3 Simulation 1
Since the CLT seems almost magical, running simulations that verify its implications is one
good way to build intuition
To this end, we now perform the following simulation
1. Choose an arbitrary distribution 𝐹 for the underlying observations 𝑋𝑖

2. Generate independent draws of 𝑌𝑛 ∶= √𝑛(𝑋̄𝑛 − 𝜇)
3. Use these draws to compute some measure of their distribution — such as a histogram
4. Compare the latter to 𝑁 (0, 𝜎2 )
Here’s some code that does exactly this for the exponential distribution 𝐹 (𝑥) = 1 − 𝑒−𝜆𝑥
(Please experiment with other choices of 𝐹 , but remember that, to conform with the condi-
tions of the CLT, the distribution must have a finite second moment)
In [4]: # == Set parameters == #
        n = 250                  # Choice of n
        k = 100000               # Number of draws of Y_n
        distribution = expon(2)  # Exponential distribution, λ = 1/2
        μ, s = distribution.mean(), distribution.std()

        # == Draw underlying RVs. Each row contains a draw of X_1,...,X_n == #
        data = distribution.rvs((k, n))

        # == Compute mean of each row, producing k draws of \bar X_n == #
        sample_means = data.mean(axis=1)

        # == Generate observations of Y_n == #
        Y = np.sqrt(n) * (sample_means - μ)
# == Plot == #
fig, ax = plt.subplots(figsize=(10, 6))
xmin, xmax = -3 * s, 3 * s
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, norm.pdf(xgrid, scale=s), 'k-', lw=2, label='$N(0, \sigma^2)$')
ax.legend()
plt.show()
Notice the absence of for loops — every operation is vectorized, meaning that the major cal-
culations are all shifted to highly optimized C code
The fit to the normal density is already tight and can be further improved by increasing n
You can also experiment with other specifications of 𝐹
24.5.4 Simulation 2
Our next simulation is somewhat like the first, except that we aim to track the distribution of
√
𝑌𝑛 ∶= 𝑛(𝑋̄ 𝑛 − 𝜇) as 𝑛 increases
In the simulation, we’ll be working with random variables having 𝜇 = 0
Thus, when 𝑛 = 1, we have 𝑌1 = 𝑋1 , so the first distribution is just the distribution of the
underlying random variable
For 𝑛 = 2, the distribution of 𝑌2 is that of (𝑋1 + 𝑋2)/√2, and so on
What we expect is that, regardless of the distribution of the underlying random variable, the
distribution of 𝑌𝑛 will smooth out into a bell-shaped curve
The next figure shows this process for 𝑋𝑖 ∼ 𝑓, where 𝑓 was specified as the convex combina-
tion of three different beta densities
(Taking a convex combination is an easy way to produce an irregular shape for 𝑓)
In the figure, the closest density is that of 𝑌1 , while the furthest is that of 𝑌5
beta_dist = beta(2, 2)
def gen_x_draws(k):
    """
    Returns a flat array containing k independent draws from the
    distribution of X, the underlying random variable. This distribution
    is itself a convex combination of three beta distributions.
    """
    bdraws = beta_dist.rvs((3, k))

    # == Transform rows, so each represents a different distribution == #
    bdraws[0, :] -= 0.5
    bdraws[1, :] += 0.6
    bdraws[2, :] -= 1.1

    # == Set X[i] = bdraws[j, i], where j is a random draw from {0, 1, 2} == #
    js = np.random.randint(0, 3, size=k)
    X = bdraws[js, np.arange(k)]

    # == Rescale, so that the random variable is zero mean == #
    m, sigma = X.mean(), X.std()
    return (X - m) / sigma
nmax = 5
reps = 100000
ns = list(range(1, nmax + 1))

# == Form a matrix Z of independent draws and build the Y_n draws == #
Z = np.empty((reps, nmax))
for i in range(nmax):
    Z[:, i] = gen_x_draws(reps)
S = Z.cumsum(axis=1)
Y = (1 / np.sqrt(ns)) * S   # Column n-1 holds draws of Y_n
# == Plot == #
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(projection='3d')
a, b = -3, 3
gs = 100
xs = np.linspace(a, b, gs)
# == Build verts == #
greys = np.linspace(0.3, 0.7, nmax)
verts = []
for n in ns:
density = gaussian_kde(Y[:, n-1])
ys = density(xs)
verts.append(list(zip(xs, ys)))
The law of large numbers and central limit theorem work just as nicely in multidimensional
settings
To state the results, let’s recall some elementary facts about random vectors
A random vector X is just a sequence of 𝑘 random variables (𝑋1 , … , 𝑋𝑘 )
E[X] ∶= (E[𝑋1], E[𝑋2], … , E[𝑋𝑘])′ = (𝜇1, 𝜇2, … , 𝜇𝑘)′ =∶ 𝜇
X̄𝑛 ∶= (1/𝑛) ∑_{𝑖=1}^{𝑛} X𝑖
P {X̄ 𝑛 → 𝜇 as 𝑛 → ∞} = 1 (7)
√ 𝑑
𝑛(X̄ 𝑛 − 𝜇) → 𝑁 (0, Σ) as 𝑛→∞ (8)
24.6 Exercises
24.6.1 Exercise 1
√𝑛{𝑔(𝑋̄𝑛) − 𝑔(𝜇)} →𝑑 𝑁(0, 𝑔′(𝜇)²𝜎²) as 𝑛 → ∞ (9)
This theorem is used frequently in statistics to obtain the asymptotic distribution of estimators — many of which can be expressed as functions of sample means
(These kinds of results are often said to use the “delta method”)
The proof is based on a Taylor expansion of 𝑔 around the point 𝜇
Taking the result as given, let the distribution 𝐹 of each 𝑋𝑖 be uniform on [0, 𝜋/2] and let
𝑔(𝑥) = sin(𝑥)
Derive the asymptotic distribution of √𝑛 {𝑔(𝑋̄𝑛) − 𝑔(𝜇)} and illustrate convergence in the
same spirit as the program illustrate_clt.py discussed above
What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
What is the source of the problem?
24.6.2 Exercise 2
Here’s a result that’s often used in developing statistical tests, and is connected to the multivariate central limit theorem
If you study econometric theory, you will see this result used again and again
Assume the setting of the multivariate CLT discussed above, so that
√𝑛 (X̄𝑛 − 𝜇) →ᵈ 𝑁(0, Σ)    (10)
is valid
In a statistical setting, one often wants the right-hand side to be standard normal so that
confidence intervals are easily computed
This normalization can be achieved on the basis of three observations
First, if X is a random vector in R𝑘 and A is constant and 𝑘 × 𝑘, then
Var[AX] = A Var[X]A′
Second, by the continuous mapping theorem, if Z𝑛 →ᵈ Z in Rᵏ and A is constant and 𝑘 × 𝑘, then

AZ𝑛 →ᵈ AZ
Third, if S is a 𝑘 × 𝑘 symmetric positive definite matrix, then there exists a symmetric positive definite matrix Q, called the inverse square root of S, such that
QSQ′ = I
Putting these observations together, it follows that if Q is the inverse square root of Σ, then

Z𝑛 ∶= √𝑛 Q(X̄𝑛 − 𝜇) →ᵈ Z ∼ 𝑁(0, I)
Applying the continuous mapping theorem one more time tells us that
‖Z𝑛‖² →ᵈ ‖Z‖²

Since ‖Z‖² is a sum of squares of 𝑘 independent standard normals, it has the chi-squared distribution with 𝑘 degrees of freedom, and hence

𝑛 ‖Q(X̄𝑛 − 𝜇)‖² →ᵈ 𝜒²(𝑘)    (11)
Use simulation to illustrate the convergence in Eq. (11) in the special case 𝑘 = 2, taking

X𝑖 ∶= (𝑊𝑖, 𝑈𝑖 + 𝑊𝑖)′

where

• each 𝑊𝑖 is an IID draw from the uniform distribution on [−1, 1]
• each 𝑈𝑖 is an IID draw from the uniform distribution on [−2, 2]
• 𝑈𝑖 and 𝑊𝑖 are independent of each other

Hints:

• scipy.linalg.sqrtm(A) computes the square root of A — you still need to invert it
• you should be able to work out Σ from the information provided
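Before turning to the solutions, here is a quick numerical check of the inverse square root property used in the exercise; the matrix S is purely illustrative:

```python
import numpy as np
from scipy.linalg import sqrtm, inv

# An illustrative symmetric positive definite matrix
S = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Q is the inverse of the (symmetric) square root of S
Q = inv(sqrtm(S))

# Defining property of the inverse square root: Q S Q' = I
print(np.allclose(Q @ S @ Q.T, np.identity(2)))  # True
```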
24.7 Solutions
24.7.1 Exercise 1
In [7]: """
Illustrates the delta method, a consequence of the central limit theorem.
"""
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import uniform, norm

# == Set parameters == #
n = 250
replications = 100000
distribution = uniform(loc=0, scale=(np.pi / 2))
μ, s = distribution.mean(), distribution.std()

g = np.sin
g_prime = np.cos

# == Generate obs of sqrt{n} (g(X̄_n) - g(μ)) == #
data = distribution.rvs((replications, n))
sample_means = data.mean(axis=1)  # Compute mean of each row
error_obs = np.sqrt(n) * (g(sample_means) - g(μ))

# == Plot == #
asymptotic_sd = g_prime(μ) * s
fig, ax = plt.subplots(figsize=(10, 6))
xmin = -3 * g_prime(μ) * s
xmax = -xmin
ax.set_xlim(xmin, xmax)
ax.hist(error_obs, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
lb = r"$N(0, g'(\mu)^2 \sigma^2)$"
ax.plot(xgrid, norm.pdf(xgrid, scale=asymptotic_sd), 'k-', lw=2, label=lb)
ax.legend()
plt.show()
What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
In this case, the mean 𝜇 of this distribution is 𝜋/2, and since 𝑔′ = cos, we have 𝑔′ (𝜇) = 0
Hence the conditions of the delta theorem are not satisfied
24.7.2 Exercise 2
First we want to verify the claim that

√𝑛 Q(X̄𝑛 − 𝜇) →ᵈ 𝑁(0, I)

This is straightforward given the facts presented in the exercise. Let

Y𝑛 ∶= √𝑛 (X̄𝑛 − 𝜇) and Y ∼ 𝑁(0, Σ)

By the multivariate CLT and the continuous mapping theorem, we have

QY𝑛 →ᵈ QY

Since linear combinations of normal random variables are normal, the vector QY is also normal
Its mean is clearly 0, and its variance-covariance matrix is Var[QY] = Q Var[Y] Q′ = QΣQ′ = I
𝑑
In conclusion, QY𝑛 → QY ∼ 𝑁 (0, I), which is what we aimed to show
Now we turn to the simulation exercise
Our solution is as follows
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import uniform, chi2
from scipy.linalg import inv, sqrtm

# == Set parameters == #
n = 250
replications = 50000
dw = uniform(loc=-1, scale=2)  # Uniform(-1, 1)
du = uniform(loc=-2, scale=4)  # Uniform(-2, 2)
sw, su = dw.std(), du.std()
vw, vu = sw**2, su**2
Σ = np.array(((vw, vw), (vw, vw + vu)))

# == Compute Σ^{-1/2} == #
Q = inv(sqrtm(Σ))

# == Generate observations of sqrt{n} X̄_n == #
error_obs = np.empty((2, replications))
for i in range(replications):
    # == Generate one sequence of bivariate shocks == #
    W = dw.rvs(n)
    U = du.rvs(n)
    # == Construct the n observations of the random vector == #
    X = np.vstack((W, U + W))
    # == Construct the i-th observation of sqrt{n} X̄_n == #
    error_obs[:, i] = np.sqrt(n) * X.mean(axis=1)

# == Premultiply by Q and then take the squared norm == #
chisq_obs = np.sum((Q @ error_obs)**2, axis=0)

# == Plot == #
fig, ax = plt.subplots(figsize=(10, 6))
xmax = 8
ax.set_xlim(0, xmax)
xgrid = np.linspace(0, xmax, 200)
lb = "Chi-squared with 2 degrees of freedom"
ax.plot(xgrid, chi2.pdf(xgrid, 2), 'k-', lw=2, label=lb)
ax.legend()
ax.hist(chisq_obs, bins=50, density=True)
plt.show()
25 Linear State Space Models
25.1 Contents
• Overview 25.2
• Prediction 25.7
• Code 25.8
• Exercises 25.9
• Solutions 25.10
“We may regard the present state of the universe as the effect of its past and the
cause of its future” – Marquis de Laplace
In addition to what’s in Anaconda, this lecture will need the following libraries
25.2 Overview
– non-financial income
– dividends on a stock
– the money supply
– a government deficit or surplus, etc.
25.3 The Linear State Space Model
The linear state space system is

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1
𝑦𝑡 = 𝐺𝑥𝑡    (1)

where 𝑥𝑡 is an 𝑛 × 1 state vector, 𝑦𝑡 is a 𝑘 × 1 observation vector, and {𝑤𝑡} is an IID sequence of 𝑚 × 1 standard normal shocks
25.3.1 Primitives
The primitives of the model are
1. the matrices 𝐴, 𝐶, 𝐺
2. shock distribution, which we have specialized to 𝑁 (0, 𝐼)
3. the distribution of the initial condition 𝑥0 , which we have set to 𝑁 (𝜇0 , Σ0 )
Given 𝐴, 𝐶, 𝐺 and draws of 𝑥0 and 𝑤1 , 𝑤2 , …, the model Eq. (1) pins down the values of the
sequences {𝑥𝑡 } and {𝑦𝑡 }
Even without these draws, the primitives 1–3 pin down the probability distributions of {𝑥𝑡 }
and {𝑦𝑡 }
Later we’ll see how to compute these distributions and their moments
Martingale Difference Shocks
We’ve made the common assumption that the shocks are independent standardized normal
vectors
But some of what we say will be valid under the assumption that {𝑤𝑡+1 } is a martingale
difference sequence
A martingale difference sequence is a sequence that is zero mean when conditioned on past
information
In the present case, since {𝑥𝑡} is our state sequence, this means that it satisfies

E[𝑤𝑡+1 ∣ 𝑥𝑡, 𝑥𝑡−1, …] = 0

This is a weaker condition than that {𝑤𝑡} is IID with 𝑤𝑡+1 ∼ 𝑁(0, 𝐼)
25.3.2 Examples
Second-order difference equation
Let {𝑦𝑡} be a deterministic sequence that satisfies

𝑦𝑡+1 = 𝜙0 + 𝜙1𝑦𝑡 + 𝜙2𝑦𝑡−1, with 𝑦0, 𝑦−1 given    (2)

To map Eq. (2) into our state space system Eq. (1), we set
     ⎛ 1    ⎞        ⎛ 1   0   0  ⎞        ⎛ 0 ⎞
𝑥𝑡 = ⎜ 𝑦𝑡   ⎟    𝐴 = ⎜ 𝜙0  𝜙1  𝜙2 ⎟    𝐶 = ⎜ 0 ⎟    𝐺 = [0  1  0]
     ⎝ 𝑦𝑡−1 ⎠        ⎝ 0   1   0  ⎠        ⎝ 0 ⎠
You can confirm that under these definitions, Eq. (1) and Eq. (2) agree
The next figure shows the dynamics of this process when 𝜙0 = 1.1, 𝜙1 = 0.8, 𝜙2 = −0.8, 𝑦0 =
𝑦−1 = 1
Univariate Autoregression
We can use Eq. (1) to represent the model 𝑦𝑡+1 = 𝜙1𝑦𝑡 + 𝜙2𝑦𝑡−1 + 𝜙3𝑦𝑡−2 + 𝜙4𝑦𝑡−3 + 𝜎𝑤𝑡+1, taking the state to be 𝑥𝑡 = [𝑦𝑡  𝑦𝑡−1  𝑦𝑡−2  𝑦𝑡−3]′ and

    ⎛ 𝜙1  𝜙2  𝜙3  𝜙4 ⎞        ⎛ 𝜎 ⎞
𝐴 = ⎜ 1   0   0   0  ⎟    𝐶 = ⎜ 0 ⎟    𝐺 = [1  0  0  0]
    ⎜ 0   1   0   0  ⎟        ⎜ 0 ⎟
    ⎝ 0   0   1   0  ⎠        ⎝ 0 ⎠
The matrix 𝐴 has the form of the companion matrix to the vector [𝜙1 𝜙2 𝜙3 𝜙4 ]
The next figure shows the dynamics of this process when 𝜙1 = 0.5, 𝜙2 = −0.2, 𝜙3 = 0, 𝜙4 = 0.5, 𝜎 = 0.2 and 𝑦0 = 𝑦−1 = 𝑦−2 = 𝑦−3 = 1
Vector Autoregressions
Now suppose that
• 𝑦𝑡 is a 𝑘 × 1 vector
• 𝜙𝑗 is a 𝑘 × 𝑘 matrix and
• 𝑤𝑡 is 𝑘 × 1
     ⎛ 𝑦𝑡   ⎞        ⎛ 𝜙1  𝜙2  𝜙3  𝜙4 ⎞        ⎛ 𝜎 ⎞
𝑥𝑡 = ⎜ 𝑦𝑡−1 ⎟    𝐴 = ⎜ 𝐼   0   0   0  ⎟    𝐶 = ⎜ 0 ⎟    𝐺 = [𝐼  0  0  0]
     ⎜ 𝑦𝑡−2 ⎟        ⎜ 0   𝐼   0   0  ⎟        ⎜ 0 ⎟
     ⎝ 𝑦𝑡−3 ⎠        ⎝ 0   0   𝐼   0  ⎠        ⎝ 0 ⎠
Seasonals
We can use Eq. (1) to represent the deterministic seasonal 𝑦𝑡 = 𝑦𝑡−4 and the indeterministic seasonal 𝑦𝑡 = 𝜙4𝑦𝑡−4 + 𝑤𝑡
The deterministic seasonal corresponds to

    ⎛ 0  0  0  1 ⎞
𝐴 = ⎜ 1  0  0  0 ⎟
    ⎜ 0  1  0  0 ⎟
    ⎝ 0  0  1  0 ⎠
It is easy to check that 𝐴4 = 𝐼, which implies that 𝑥𝑡 is strictly periodic with period 4:[1]
𝑥𝑡+4 = 𝑥𝑡
Such an 𝑥𝑡 process can be used to model deterministic seasonals in quarterly time series
The indeterministic seasonal produces recurrent, but aperiodic, seasonal fluctuations
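The claimed periodicity of the deterministic seasonal is easy to confirm numerically:

```python
import numpy as np

# The seasonal transition matrix from the text
A = np.array([[0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]])

# A^4 = I, so x_{t+4} = x_t for any initial state
A4 = np.linalg.matrix_power(A, 4)
print(np.array_equal(A4, np.identity(4, dtype=int)))  # True
```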
Time Trends
The model 𝑦𝑡 = 𝑎𝑡 + 𝑏 is known as a linear time trend
We can represent this model in the linear state space form by taking
𝐴 = ⎛ 1  1 ⎞    𝐶 = ⎛ 0 ⎞    𝐺 = [𝑎  𝑏]    (4)
    ⎝ 0  1 ⎠        ⎝ 0 ⎠
and starting at initial condition 𝑥0 = [0  1]′
In fact, it’s possible to use the state-space system to represent polynomial trends of any order
For instance, let
     ⎛ 0 ⎞        ⎛ 1  1  0 ⎞        ⎛ 0 ⎞
𝑥0 = ⎜ 0 ⎟    𝐴 = ⎜ 0  1  1 ⎟    𝐶 = ⎜ 0 ⎟
     ⎝ 1 ⎠        ⎝ 0  0  1 ⎠        ⎝ 0 ⎠
It follows that
     ⎛ 1  𝑡  𝑡(𝑡 − 1)/2 ⎞
𝐴ᵗ = ⎜ 0  1  𝑡          ⎟
     ⎝ 0  0  1          ⎠
Then 𝑥′𝑡 = [𝑡(𝑡 − 1)/2 𝑡 1], so that 𝑥𝑡 contains linear and quadratic time trends
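The expression for 𝐴ᵗ can be checked directly (here with 𝑡 = 7):

```python
import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])

t = 7
At = np.linalg.matrix_power(A, t)
# First row of A^t should be [1, t, t(t-1)/2]
print(At[0])  # [1, 7, 21]
```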
25.3.3 Moving Average Representations
A nonrecursive expression for 𝑥𝑡 as a function of 𝑥0, 𝑤1, 𝑤2, … , 𝑤𝑡 can be found by using Eq. (1) repeatedly:

𝑥𝑡 = 𝐴𝑥𝑡−1 + 𝐶𝑤𝑡
   = 𝐴²𝑥𝑡−2 + 𝐴𝐶𝑤𝑡−1 + 𝐶𝑤𝑡
   ⋮
   = ∑ⱼ₌₀ᵗ⁻¹ 𝐴ʲ𝐶𝑤𝑡−𝑗 + 𝐴ᵗ𝑥0    (5)
𝐴 = ⎛ 1  1 ⎞    𝐶 = ⎛ 1 ⎞
    ⎝ 0  1 ⎠        ⎝ 0 ⎠

You will be able to show that

𝐴ᵗ = ⎛ 1  𝑡 ⎞    and    𝐴ʲ𝐶 = [1  0]′
     ⎝ 0  1 ⎠
Substituting into the moving average representation Eq. (5), we obtain
𝑥1𝑡 = ∑ⱼ₌₀ᵗ⁻¹ 𝑤𝑡−𝑗 + [1  𝑡] 𝑥0
25.4 Distributions and Moments
25.4.1 Unconditional Moments
Using Eq. (1), it’s easy to obtain expressions for the (unconditional) means of 𝑥𝑡 and 𝑦𝑡
We’ll explain what unconditional and conditional mean soon
Letting 𝜇𝑡 ∶= E[𝑥𝑡] and taking expectations in Eq. (1) gives

𝜇𝑡+1 = 𝐴𝜇𝑡, with 𝜇0 given    (6)

Similarly, the variance-covariance matrix Σ𝑡 ∶= Var[𝑥𝑡] = E[(𝑥𝑡 − 𝜇𝑡)(𝑥𝑡 − 𝜇𝑡)′] satisfies

Σ𝑡+1 = 𝐴Σ𝑡𝐴′ + 𝐶𝐶′, with Σ0 given    (7)

As a matter of terminology, we will sometimes call 𝜇𝑡 the unconditional mean and Σ𝑡 the unconditional variance-covariance matrix of 𝑥𝑡
This is to distinguish 𝜇𝑡 and Σ𝑡 from related objects that use conditioning information, to be
defined below
However, you should be aware that these “unconditional” moments do depend on the initial
distribution 𝑁 (𝜇0 , Σ0 )
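The recursions in Eq. (6) and Eq. (7) are straightforward to iterate in NumPy; here is a minimal sketch with illustrative 𝐴 and 𝐶 (not the lecture's example):

```python
import numpy as np

# Illustrative stable A and C (all eigenvalues of A inside the unit circle)
A = np.array([[0.8, 0.1],
              [0.0, 0.9]])
C = np.array([[1.0],
              [0.5]])

mu = np.array([1.0, 1.0])    # μ_0
Sigma = np.zeros((2, 2))     # Σ_0

# Iterate μ_{t+1} = A μ_t and Σ_{t+1} = A Σ_t A' + C C'
for t in range(100):
    mu = A @ mu
    Sigma = A @ Sigma @ A.T + C @ C.T

# Since A is stable, μ_t converges to zero and Σ_t to a fixed point
```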
Moments of the Observations
Using linearity of expectations again we have

E[𝑦𝑡] = E[𝐺𝑥𝑡] = 𝐺𝜇𝑡    and    Var[𝑦𝑡] = Var[𝐺𝑥𝑡] = 𝐺Σ𝑡𝐺′
25.4.2 Distributions
In general, knowing the mean and variance-covariance matrix of a random vector is not quite
as good as knowing the full distribution
However, there are some situations where these moments alone tell us all we need to know
These are situations in which the mean vector and covariance matrix are sufficient statistics for the population distribution
(Sufficient statistics form a list of objects that characterize a population distribution)
One such situation is when the vector in question is Gaussian (i.e., normally distributed)
This is the case here, given our Gaussian assumptions on the primitives and the linearity of Eq. (1): we can see immediately that both 𝑥𝑡 and 𝑦𝑡 are Gaussian for all 𝑡 ≥ 0 [2]
Since 𝑥𝑡 is Gaussian, to find the distribution, all we need to do is find its mean and variance-covariance matrix
But in fact we’ve already done this, in Eq. (6) and Eq. (7)
Letting 𝜇𝑡 and Σ𝑡 be as defined by these equations, we have
𝑥𝑡 ∼ 𝑁 (𝜇𝑡 , Σ𝑡 ) (11)
In the right-hand figure, these values are converted into a rotated histogram that shows relative frequencies from our sample of 20 𝑦𝑇 ’s
(The parameters and source code for the figures can be found in file linear_models/paths_and_hist.py)
Here is another figure, this time with 100 observations
Let’s now try with 500,000 observations, showing only the histogram (without rotation)
The black line is the population density of 𝑦𝑇 calculated from Eq. (12)
The histogram and population distribution are close, as expected
By looking at the figures and experimenting with parameters, you will gain a feel for how the
population distribution depends on the model primitives listed above, as intermediated by the
distribution’s sufficient statistics
Ensemble Means
In the preceding figure, we approximated the population distribution of 𝑦𝑇 by generating 𝐼 sample paths (i.e., time series) where 𝐼 is a large number, recording each observation 𝑦𝑇ⁱ, and histogramming this sample
Just as the histogram approximates the population distribution, the ensemble or cross-sectional average
𝑦̄𝑇 ∶= (1/𝐼) ∑ᵢ₌₁ᴵ 𝑦𝑇ⁱ
approximates the expectation E[𝑦𝑇 ] = 𝐺𝜇𝑇 (as implied by the law of large numbers)
Here’s a simulation comparing the ensemble averages and population means at time points
𝑡 = 0, … , 50
The parameters are the same as for the preceding figures, and the sample size is relatively
small (𝐼 = 20)
𝑥̄𝑇 ∶= (1/𝐼) ∑ᵢ₌₁ᴵ 𝑥𝑇ⁱ → 𝜇𝑇    (𝐼 → ∞)

(1/𝐼) ∑ᵢ₌₁ᴵ (𝑥𝑇ⁱ − 𝑥̄𝑇)(𝑥𝑇ⁱ − 𝑥̄𝑇)′ → Σ𝑇    (𝐼 → ∞)
By the Markov property, the joint distribution of a history 𝑥0, 𝑥1, … , 𝑥𝑇 factors as

𝑝(𝑥0, 𝑥1, … , 𝑥𝑇) = 𝑝(𝑥0) ∏ₜ₌₀ᵀ⁻¹ 𝑝(𝑥𝑡+1 ∣ 𝑥𝑡), where 𝑝(𝑥𝑡+1 ∣ 𝑥𝑡) = 𝑁(𝐴𝑥𝑡, 𝐶𝐶′)
Autocovariance Functions
An important object related to the joint distribution is the autocovariance function

Σ𝑡+𝑗,𝑡 ∶= E[(𝑥𝑡+𝑗 − 𝜇𝑡+𝑗)(𝑥𝑡 − 𝜇𝑡)′]

Elementary calculations show that

Σ𝑡+𝑗,𝑡 = 𝐴ʲΣ𝑡    (14)
Notice that Σ𝑡+𝑗,𝑡 in general depends on both 𝑗, the gap between the two dates, and 𝑡, the
earlier date
25.5 Stationarity and Ergodicity
Stationarity and ergodicity are two properties that, when they hold, greatly aid analysis of linear state space models
Let’s start with the intuition
Let’s look at some more time series from the same model that we analyzed above
This picture shows cross-sectional distributions for 𝑦 at times 𝑇 , 𝑇 ′ , 𝑇 ″
Note how the time series “settle down” in the sense that the distributions at 𝑇 ′ and 𝑇 ″ are
relatively similar to each other — but unlike the distribution at 𝑇
Apparently, the distributions of 𝑦𝑡 converge to a fixed long-run distribution as 𝑡 → ∞
When such a distribution exists it is called a stationary distribution
Since the distributions in question are Gaussian, and a Gaussian distribution is pinned down by its mean and variance-covariance matrix, the stationary distribution here takes the form
𝜓∞ = 𝑁 (𝜇∞ , Σ∞ )
where 𝜇∞ and Σ∞ are fixed points of Eq. (6) and Eq. (7) respectively
Let’s see what happens to the preceding figure if we start 𝑥0 at the stationary distribution
Now the differences in the observed distributions at 𝑇 , 𝑇 ′ and 𝑇 ″ come entirely from random
fluctuations due to the finite sample size
By choosing 𝑥0 ∼ 𝑁(𝜇∞, Σ∞), and using the definitions of 𝜇∞ and Σ∞ as fixed points of Eq. (6) and Eq. (7), we’ve ensured that 𝜇𝑡 = 𝜇∞ and Σ𝑡 = Σ∞ for all 𝑡
Moreover, in view of Eq. (14), the autocovariance function takes the form Σ𝑡+𝑗,𝑡 = 𝐴𝑗 Σ∞ ,
which depends on 𝑗 but not on 𝑡
This motivates the following definition
A process {𝑥𝑡} is said to be covariance stationary if
• both 𝜇𝑡 and Σ𝑡 are constant in 𝑡
• Σ𝑡+𝑗,𝑡 depends on the time gap 𝑗 but not on time 𝑡
In our setting, {𝑥𝑡 } will be covariance stationary if 𝜇0 , Σ0 , 𝐴, 𝐶 assume values that imply that
none of 𝜇𝑡 , Σ𝑡 , Σ𝑡+𝑗,𝑡 depends on 𝑡
If the moduli of the eigenvalues of 𝐴 are all strictly less than one, then Eq. (6) has the unique fixed point 𝜇∞ = 0
The difference equation Eq. (7) also has a unique fixed point Σ∞ in this case, and, moreover

𝜇𝑡 → 𝜇∞ = 0 and Σ𝑡 → Σ∞ as 𝑡 → ∞

regardless of the initial conditions 𝜇0 and Σ0
Processes with a Constant State Component
To investigate processes with a constant state component, suppose that 𝐴 and 𝐶 take the form

𝐴 = ⎛ 𝐴1  𝑎 ⎞    𝐶 = ⎛ 𝐶1 ⎞
    ⎝ 0   1 ⎠        ⎝ 0  ⎠
where
• 𝐴1 is an (𝑛 − 1) × (𝑛 − 1) matrix
• 𝑎 is an (𝑛 − 1) × 1 column vector
Let 𝑥𝑡 = [𝑥′1𝑡  1]′ where 𝑥1𝑡 is (𝑛 − 1) × 1
It follows that

𝑥1,𝑡+1 = 𝐴1𝑥1𝑡 + 𝑎 + 𝐶1𝑤𝑡+1

Let 𝜇1𝑡 = E[𝑥1𝑡] and take expectations on both sides of this expression to get

𝜇1,𝑡+1 = 𝐴1𝜇1𝑡 + 𝑎    (15)
Assume now that the moduli of the eigenvalues of 𝐴1 are all strictly less than one
Then Eq. (15) has a unique stationary solution, namely,
𝜇1∞ = (𝐼 − 𝐴1 )−1 𝑎
The stationary value of 𝜇𝑡 itself is then 𝜇∞ ∶= [𝜇′1∞  1]′
The stationary values of Σ𝑡 and Σ𝑡+𝑗,𝑡 satisfy
Σ∞ = 𝐴Σ∞ 𝐴′ + 𝐶𝐶 ′
(16)
Σ𝑡+𝑗,𝑡 = 𝐴𝑗 Σ∞
Notice that here Σ𝑡+𝑗,𝑡 depends on the time gap 𝑗 but not on calendar time 𝑡
In conclusion, if
• 𝑥0 ∼ 𝑁 (𝜇∞ , Σ∞ ) and
• the moduli of the eigenvalues of 𝐴1 are all strictly less than unity
then the {𝑥𝑡 } process is covariance stationary, with constant state component
Note
If the eigenvalues of 𝐴1 are less than unity in modulus, then (a) starting from any initial value, the mean and variance-covariance matrix both converge to their stationary values; and (b) iterations on Eq. (7) converge to the fixed point of the discrete Lyapunov equation in the first line of Eq. (16)
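The fixed point of the discrete Lyapunov equation can be computed directly with SciPy; a sketch with illustrative 𝐴 and 𝐶:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative stable A and C (eigenvalues of A inside the unit circle)
A = np.array([[0.8, 0.1],
              [0.0, 0.9]])
C = np.array([[1.0],
              [0.5]])

# Σ∞ solves the discrete Lyapunov equation Σ = A Σ A' + C C'
Sigma_inf = solve_discrete_lyapunov(A, C @ C.T)

# Verify the fixed-point property
print(np.allclose(Sigma_inf, A @ Sigma_inf @ A.T + C @ C.T))  # True
```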
25.5.5 Ergodicity
Ergodicity concerns time series averages such as

𝑥̄ ∶= (1/𝑇) ∑ₜ₌₁ᵀ 𝑥𝑡    and    𝑦̄ ∶= (1/𝑇) ∑ₜ₌₁ᵀ 𝑦𝑡
Do these time series averages converge to something interpretable in terms of our basic state-space representation?
The answer depends on something called ergodicity
Ergodicity is the property that time series and ensemble averages coincide
More formally, ergodicity implies that time series sample averages converge to their expectation under the stationary distribution
In particular,

• (1/𝑇) ∑ₜ₌₁ᵀ 𝑥𝑡 → 𝜇∞
• (1/𝑇) ∑ₜ₌₁ᵀ (𝑥𝑡 − 𝑥̄𝑇)(𝑥𝑡 − 𝑥̄𝑇)′ → Σ∞
• (1/𝑇) ∑ₜ₌₁ᵀ (𝑥𝑡+𝑗 − 𝑥̄𝑇)(𝑥𝑡 − 𝑥̄𝑇)′ → 𝐴ʲΣ∞
In our linear Gaussian setting, any covariance stationary process is also ergodic
In some settings, the observation equation 𝑦𝑡 = 𝐺𝑥𝑡 is modified to include an error term
Often this error term represents the idea that the true state can only be observed imperfectly
To include an error term in the observation we introduce

• an IID sequence of ℓ × 1 random vectors 𝑣𝑡 ∼ 𝑁(0, 𝐼)
• a 𝑘 × ℓ matrix 𝐻

and extend the observation equation to 𝑦𝑡 = 𝐺𝑥𝑡 + 𝐻𝑣𝑡, where the measurement error {𝑣𝑡} is independent of {𝑤𝑡} and 𝑥0
The distribution of 𝑦𝑡 then becomes

𝑦𝑡 ∼ 𝑁(𝐺𝜇𝑡, 𝐺Σ𝑡𝐺′ + 𝐻𝐻′)
25.7 Prediction
The theory of prediction for linear state space systems is elegant and simple
Suppose we want to forecast 𝑥𝑡+1 given information at time 𝑡; the natural forecast is the conditional expectation

E𝑡[𝑥𝑡+1] ∶= E[𝑥𝑡+1 ∣ 𝑥𝑡, 𝑥𝑡−1, … , 𝑥0] = 𝐴𝑥𝑡

The right-hand side follows from 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1 and the fact that 𝑤𝑡+1 is zero mean and independent of 𝑥𝑡, 𝑥𝑡−1, … , 𝑥0
That E𝑡 [𝑥𝑡+1 ] = E[𝑥𝑡+1 ∣ 𝑥𝑡 ] is an implication of {𝑥𝑡 } having the Markov property
More generally, we’d like to compute the 𝑗-step ahead forecasts E𝑡 [𝑥𝑡+𝑗 ] and E𝑡 [𝑦𝑡+𝑗 ]
With a bit of algebra, we obtain

𝑥𝑡+𝑗 = 𝐴ʲ𝑥𝑡 + 𝐴ʲ⁻¹𝐶𝑤𝑡+1 + 𝐴ʲ⁻²𝐶𝑤𝑡+2 + ⋯ + 𝐶𝑤𝑡+𝑗
In view of the IID property, current and past state values provide no information about future values of the shock
Hence E𝑡 [𝑤𝑡+𝑘 ] = E[𝑤𝑡+𝑘 ] = 0
It now follows from linearity of expectations that the 𝑗-step ahead forecast of 𝑥 is
E𝑡 [𝑥𝑡+𝑗 ] = 𝐴𝑗 𝑥𝑡
It is useful to obtain the covariance matrix of the vector of 𝑗-step-ahead prediction errors
𝑥𝑡+𝑗 − E𝑡[𝑥𝑡+𝑗] = ∑ₛ₌₀ʲ⁻¹ 𝐴ˢ𝐶𝑤𝑡−𝑠+𝑗    (20)
Evidently,

𝑉𝑗 ∶= E𝑡[(𝑥𝑡+𝑗 − E𝑡[𝑥𝑡+𝑗])(𝑥𝑡+𝑗 − E𝑡[𝑥𝑡+𝑗])′] = ∑ₖ₌₀ʲ⁻¹ 𝐴ᵏ𝐶𝐶′(𝐴′)ᵏ    (21)
𝑉𝑗 is the conditional covariance matrix of the errors in forecasting 𝑥𝑡+𝑗 , conditioned on time 𝑡
information 𝑥𝑡
Under particular conditions, 𝑉𝑗 converges to
𝑉∞ = 𝐶𝐶 ′ + 𝐴𝑉∞ 𝐴′ (23)
Equation Eq. (23) is an example of a discrete Lyapunov equation in the covariance matrix 𝑉∞
A sufficient condition for 𝑉𝑗 to converge is that the eigenvalues of 𝐴 be strictly less than one
in modulus
Weaker sufficient conditions for convergence associate eigenvalues equaling or exceeding one
in modulus with elements of 𝐶 that equal 0
In several contexts, we want to compute forecasts of geometric sums of future random variables governed by the linear state-space system Eq. (1)
We want the following objects
• Forecast of a geometric sum of future 𝑥’s, or E𝑡[∑ⱼ₌₀^∞ 𝛽ʲ𝑥𝑡+𝑗]
• Forecast of a geometric sum of future 𝑦’s, or E𝑡[∑ⱼ₌₀^∞ 𝛽ʲ𝑦𝑡+𝑗]
These objects are important components of some famous and interesting dynamic models
For example,
• if {𝑦𝑡} is a stream of dividends, then E[∑ⱼ₌₀^∞ 𝛽ʲ𝑦𝑡+𝑗 ∣ 𝑥𝑡] is a model of a stock price
• if {𝑦𝑡} is the money supply, then E[∑ⱼ₌₀^∞ 𝛽ʲ𝑦𝑡+𝑗 ∣ 𝑥𝑡] is a model of the price level
Formulas
Fortunately, it is easy to use a little matrix algebra to compute these objects
Suppose that every eigenvalue of 𝐴 has modulus strictly less than 1/𝛽
It then follows that 𝐼 + 𝛽𝐴 + 𝛽²𝐴² + ⋯ = [𝐼 − 𝛽𝐴]⁻¹
This leads to our formulas:
E𝑡[∑ⱼ₌₀^∞ 𝛽ʲ𝑥𝑡+𝑗] = [𝐼 + 𝛽𝐴 + 𝛽²𝐴² + ⋯]𝑥𝑡 = [𝐼 − 𝛽𝐴]⁻¹𝑥𝑡

E𝑡[∑ⱼ₌₀^∞ 𝛽ʲ𝑦𝑡+𝑗] = 𝐺[𝐼 + 𝛽𝐴 + 𝛽²𝐴² + ⋯]𝑥𝑡 = 𝐺[𝐼 − 𝛽𝐴]⁻¹𝑥𝑡
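As a check, the closed form [𝐼 − 𝛽𝐴]⁻¹𝑥𝑡 can be compared with a truncated version of the geometric sum; the 𝐴, 𝛽 and 𝑥 below are illustrative:

```python
import numpy as np

β = 0.96
A = np.array([[0.5, 0.2],
              [0.1, 0.4]])
x = np.array([1.0, 2.0])

# Closed form: [I - βA]^{-1} x
closed = np.linalg.solve(np.identity(2) - β * A, x)

# Truncated series: sum_j β^j A^j x
total = np.zeros(2)
term = x.copy()
for j in range(200):
    total += term
    term = β * (A @ term)

print(np.allclose(closed, total))  # True
```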
25.8 Code
Our preceding simulations and calculations are based on code in the file lss.py from the
QuantEcon.py package
The code implements a class for handling linear state space models (simulations, calculating
moments, etc.)
One Python construct you might not be familiar with is the use of a generator function in the
method moment_sequence()
Go back and read the relevant documentation if you’ve forgotten how generator functions
work
Examples of usage are given in the solutions to the exercises
25.9 Exercises
25.9.1 Exercise 1
25.9.2 Exercise 2
25.9.3 Exercise 3
25.9.4 Exercise 4
25.10 Solutions
In [2]: import numpy as np
import matplotlib.pyplot as plt
from quantecon import LinearStateSpace
25.10.1 Exercise 1
In [3]: ϕ_0, ϕ_1, ϕ_2 = 1.1, 0.8, -0.8

A = [[1,   0,   0  ],
     [ϕ_0, ϕ_1, ϕ_2],
     [0,   1,   0  ]]
C = np.zeros((3, 1))
G = [0, 1, 0]

ar = LinearStateSpace(A, C, G, mu_0=np.ones(3))
x, y = ar.simulate(ts_length=50)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(y.flatten(), 'b-', lw=2, alpha=0.7)
plt.show()
25.10.2 Exercise 2
In [4]: ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5
σ = 0.2

A = [[ϕ_1, ϕ_2, ϕ_3, ϕ_4],
     [1,   0,   0,   0  ],
     [0,   1,   0,   0  ],
     [0,   0,   1,   0  ]]
C = [[σ], [0], [0], [0]]
G = [1, 0, 0, 0]

ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
x, y = ar.simulate(ts_length=200)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(y.flatten(), 'b-', lw=2, alpha=0.7)
plt.show()
25.10.3 Exercise 3
In [5]: from scipy.stats import norm
import random

ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5
σ = 0.1

A = [[ϕ_1, ϕ_2, ϕ_3, ϕ_4],
     [1,   0,   0,   0  ],
     [0,   1,   0,   0  ],
     [0,   0,   1,   0  ]]
C = [[σ], [0], [0], [0]]
G = [1, 0, 0, 0]

I = 20
T = 50
ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
ymin, ymax = -0.5, 1.15

fig, ax = plt.subplots(figsize=(10, 6))
ax.set_ylim(ymin, ymax)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel('$y_t$', fontsize=16)

ensemble_mean = np.zeros(T)
for i in range(I):
    x, y = ar.simulate(ts_length=T)
    y = y.flatten()
    ax.plot(y, 'c-', lw=0.8, alpha=0.5)
    ensemble_mean = ensemble_mean + y

ensemble_mean = ensemble_mean / I
ax.plot(ensemble_mean, color='b', lw=2, alpha=0.8, label='$\\bar y_t$')

m = ar.moment_sequence()
population_means = []
for t in range(T):
    μ_x, μ_y, Σ_x, Σ_y = next(m)
    population_means.append(float(μ_y))

ax.plot(population_means, color='g', lw=2, alpha=0.8, label=r'$G\mu_t$')
ax.legend(ncol=2)
plt.show()
25.10.4 Exercise 4
In [6]: ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5
σ = 0.1

A = [[ϕ_1, ϕ_2, ϕ_3, ϕ_4],
     [1,   0,   0,   0  ],
     [0,   1,   0,   0  ],
     [0,   0,   1,   0  ]]
C = [[σ], [0], [0], [0]]
G = [1, 0, 0, 0]

T0 = 10
T1 = 50
T2 = 75
T4 = 100

ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
ymin, ymax = -0.8, 1.25

fig, ax = plt.subplots(figsize=(8, 5))
ax.grid(alpha=0.4)
ax.set_ylim(ymin, ymax)
ax.set_ylabel('$y_t$', fontsize=16)
ax.vlines((T0, T1, T2), -1.5, 1.5)

for i in range(80):
    rcolor = random.choice(('c', 'g', 'b'))
    x, y = ar.simulate(ts_length=T4)
    y = y.flatten()
    ax.plot(y, color=rcolor, lw=0.8, alpha=0.5)
    ax.plot((T0, T1, T2), (y[T0], y[T1], y[T2],), 'ko', alpha=0.5)

plt.show()
Footnotes
[1] The eigenvalues of 𝐴 are (1, −1, 𝑖, −𝑖).
[2] The correct way to argue this is by induction. Suppose that 𝑥𝑡 is Gaussian. Then Eq. (1)
and Eq. (10) imply that 𝑥𝑡+1 is Gaussian. Since 𝑥0 is assumed to be Gaussian, it follows that
every 𝑥𝑡 is Gaussian. Evidently, this implies that each 𝑦𝑡 is Gaussian.
26 Finite Markov Chains
26.1 Contents
• Overview 26.2
• Definitions 26.3
• Simulation 26.4
• Ergodicity 26.8
• Exercises 26.10
• Solutions 26.11
In addition to what’s in Anaconda, this lecture will need the following libraries
26.2 Overview
Markov chains are one of the most useful classes of stochastic processes, being
• simple, flexible and supported by many elegant theoretical results
• valuable for building intuition about random dynamic models
• central to quantitative modeling in their own right
You will find them in many of the workhorse models of economics and finance
In this lecture, we review some of the theory of Markov chains
We will also introduce some of the high-quality routines for working with Markov chains
available in QuantEcon.py
Prerequisite knowledge is basic probability and linear algebra
26.3 Definitions
The following concepts are fundamental
26.3.1 Stochastic Matrices
A stochastic matrix (or Markov matrix) is an 𝑛 × 𝑛 square matrix 𝑃 such that
1. each element of 𝑃 is nonnegative, and
2. each row of 𝑃 sums to one
Each row of 𝑃 can be regarded as a probability mass function over 𝑛 possible outcomes
It is not too difficult to check [1] that if 𝑃 is a stochastic matrix, then so is the 𝑘-th power 𝑃ᵏ for all 𝑘 ∈ N
26.3.2 Markov Chains
Let 𝑆 be a finite set with 𝑛 elements {𝑥1, … , 𝑥𝑛}, called the state space
A Markov chain {𝑋𝑡} on 𝑆 is a sequence of random variables on 𝑆 that have the Markov property: for any date 𝑡 and any state 𝑦 ∈ 𝑆,

P{𝑋𝑡+1 = 𝑦 ∣ 𝑋𝑡} = P{𝑋𝑡+1 = 𝑦 ∣ 𝑋𝑡, 𝑋𝑡−1, …}

In other words, knowing the current state is enough to know probabilities for future states
In particular, the dynamics of a Markov chain are fully determined by the set of values

𝑃(𝑥, 𝑦) ∶= P{𝑋𝑡+1 = 𝑦 ∣ 𝑋𝑡 = 𝑥}    (𝑥, 𝑦 ∈ 𝑆)
By construction,
• 𝑃 (𝑥, 𝑦) is the probability of going from 𝑥 to 𝑦 in one unit of time (one step)
• 𝑃 (𝑥, ⋅) is the conditional distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥
𝑃𝑖𝑗 = 𝑃 (𝑥𝑖 , 𝑥𝑗 ) 1 ≤ 𝑖, 𝑗 ≤ 𝑛
Going the other way, if we take a stochastic matrix 𝑃, we can generate a Markov chain {𝑋𝑡} as follows:
1. draw 𝑋0 from some specified distribution
2. for each 𝑡 = 0, 1, … , draw 𝑋𝑡+1 from 𝑃(𝑋𝑡, ⋅)
26.3.3 Example 1
Consider a worker who, at any given time 𝑡, is either unemployed (state 0) or employed (state
1)
Suppose that, over a one month period,
• an unemployed worker finds a job with probability 𝛼 ∈ (0, 1)
• an employed worker loses her job with probability 𝛽 ∈ (0, 1)
In terms of a Markov model, we have
• 𝑆 = {0, 1}
• 𝑃 (0, 1) = 𝛼 and 𝑃 (1, 0) = 𝛽
𝑃 = ⎛ 1 − 𝛼    𝛼   ⎞
    ⎝   𝛽    1 − 𝛽 ⎠
Once we have the values 𝛼 and 𝛽, we can address a range of questions, such as
26.3.4 Example 2
    ⎛ 0.971  0.029  0     ⎞
𝑃 = ⎜ 0.145  0.778  0.077 ⎟
    ⎝ 0      0.508  0.492 ⎠
where
For example, the matrix tells us that when the state is normal growth, the state will again be
normal growth next month with probability 0.97
In general, large values on the main diagonal indicate persistence in the process {𝑋𝑡 }
This Markov process can also be represented as a directed graph, with edges labeled by transition probabilities
26.4 Simulation
One natural way to answer questions about Markov chains is to simulate them
(To approximate the probability of event 𝐸, we can simulate many times and count the fraction of times that 𝐸 occurs)
Nice functionality for simulating Markov chains exists in QuantEcon.py
• Efficient, bundled with lots of other useful routines for handling Markov chains
However, it’s also a good exercise to roll our own routines — let’s do that first and then come
back to the methods in QuantEcon.py
In these exercises, we’ll take the state space to be 𝑆 = {0, … , 𝑛 − 1}
To simulate a Markov chain, we need its stochastic matrix 𝑃 and either an initial state or a probability distribution 𝜓 for the initial state to be drawn from
The Markov chain is then constructed as discussed above. To repeat:
1. At time 𝑡 = 0, the initial state 𝑋0 is either given or drawn from 𝜓
2. At each subsequent time 𝑡, the new state 𝑋𝑡+1 is drawn from 𝑃(𝑋𝑡, ⋅)
In order to implement this simulation procedure, we need a method for generating draws from
a discrete distribution
For this task, we’ll use DiscreteRV from QuantEcon
We’ll write our code as a function that takes the following three arguments
• A stochastic matrix P
• An initial state init
• A positive integer sample_size representing the length of the time series the function
should return
def mc_sample_path(P, init=0, sample_size=1000):
    # === make sure P is a NumPy array === #
    P = np.asarray(P)
    # === allocate memory === #
    X = np.empty(sample_size, dtype=int)
    X[0] = init
    # === convert each row of P into a distribution === #
    n = len(P)
    P_dist = [qe.DiscreteRV(P[i, :]) for i in range(n)]
    # === generate the sample path === #
    for t in range(sample_size - 1):
        X[t+1] = P_dist[X[t]].draw()
    return X
𝑃 ∶= ⎛ 0.4  0.6 ⎞    (3)
     ⎝ 0.2  0.8 ⎠
As we’ll see later, for a long series drawn from P, the fraction of the sample that takes value 0
will be about 0.25
If you run the following code you should get roughly that answer
Out[4]: 0.25109
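Consistent with this, the stationary probability of state 0 can be computed directly from 𝑃; a sketch using the eigenvector associated with the unit eigenvalue:

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])

# A stationary ψ solves ψ = ψP, i.e. ψ' is an eigenvector of P' for eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
i = int(np.argmin(np.abs(vals - 1)))
ψ_star = np.real(vecs[:, i])
ψ_star = ψ_star / ψ_star.sum()   # normalize to a probability distribution
print(ψ_star.round(4))  # [0.25 0.75]
```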
As discussed above, QuantEcon.py has routines for handling Markov chains, including simulation
Here’s an illustration using the same P as the preceding example
Out[5]: 0.249741
678 ms ± 9.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
30.2 ms ± 396 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
If we want to simulate with output as indices rather than state values we can use
In [11]: mc.simulate_indices(ts_length=4)
26.5 Marginal Distributions
Suppose that
1. {𝑋𝑡} is a Markov chain with stochastic matrix 𝑃
2. the distribution of 𝑋𝑡 is known to be 𝜓𝑡
What then is the distribution of 𝑋𝑡+1, or, more generally, of 𝑋𝑡+𝑚?
26.5.1 Solution
In words, to get the probability of being at 𝑦 tomorrow, we account for all ways this can happen and sum their probabilities
Rewriting this statement in terms of marginal and conditional probabilities gives
𝜓𝑡+1(𝑦) = ∑ₓ∈𝑆 𝑃(𝑥, 𝑦)𝜓𝑡(𝑥)
𝜓𝑡+1 = 𝜓𝑡 𝑃 (4)
In other words, to move the distribution forward one unit of time, we postmultiply by 𝑃
By repeating this 𝑚 times we move forward 𝑚 steps into the future
Hence, iterating on Eq. (4), the expression 𝜓𝑡+𝑚 = 𝜓𝑡 𝑃 𝑚 is also valid — here 𝑃 𝑚 is the 𝑚-th
power of 𝑃
As a special case, we see that if 𝜓0 is the initial distribution from which 𝑋0 is drawn, then
𝜓0 𝑃 𝑚 is the distribution of 𝑋𝑚
This is very important, so let’s repeat it
𝑋0 ∼ 𝜓 0 ⟹ 𝑋𝑚 ∼ 𝜓0 𝑃 𝑚 (5)
𝑋𝑡 ∼ 𝜓𝑡 ⟹ 𝑋𝑡+𝑚 ∼ 𝜓𝑡 𝑃 𝑚 (6)
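Eq. (5) is easy to check numerically; a small sketch with an illustrative two-state 𝑃 and initial distribution:

```python
import numpy as np

ψ = np.array([0.3, 0.7])          # an illustrative initial distribution
P = np.array([[0.4, 0.6],
              [0.2, 0.8]])

# Distribution of X_3 when X_0 ~ ψ, via Eq. (5): ψ P^3
ψ_3 = ψ @ np.linalg.matrix_power(P, 3)
print(ψ_3)  # still a probability distribution (nonnegative, sums to one)
```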
Inserting this into Eq. (6), we see that, conditional on 𝑋𝑡 = 𝑥, the distribution of 𝑋𝑡+𝑚 is the
𝑥-th row of 𝑃 𝑚
In particular, P{𝑋𝑡+𝑚 = 𝑦 ∣ 𝑋𝑡 = 𝑥} = 𝑃ᵐ(𝑥, 𝑦), the (𝑥, 𝑦) element of 𝑃ᵐ
Recall the stochastic matrix 𝑃 for recession and growth considered above
Suppose that the current state is unknown — perhaps statistics are available only at the end
of the current month
We estimate the probability that the economy is in state 𝑥 to be 𝜓(𝑥)
The probability of being in recession (either mild or severe) in 6 months time is given by the
inner product
𝜓𝑃⁶ ⋅ (0  1  1)′
The marginal distributions we have been studying can be viewed either as probabilities or as
cross-sectional frequencies in large samples
To illustrate, recall our model of employment/unemployment dynamics for a given worker
discussed above
Consider a large (i.e., tending to infinite) population of workers, each of whose lifetime experience is described by the specified dynamics, independent of one another
Let 𝜓 be the current cross-sectional distribution over {0, 1}
The cross-sectional distribution records the fractions of workers employed and unemployed at
a given moment
The same distribution also describes the fractions of a particular worker’s career spent being
employed and unemployed, respectively
26.6 Irreducibility and Aperiodicity
Irreducibility and aperiodicity are central concepts of modern Markov chain theory
Let’s see what they’re about
26.6.1 Irreducibility
The stochastic matrix 𝑃 is called irreducible if all states communicate; that is, if 𝑥 and 𝑦
communicate for all (𝑥, 𝑦) in 𝑆 × 𝑆
For example, consider the following transition probabilities for wealth of a fictitious set of
households
We can translate this into a stochastic matrix, putting zeros where there’s no edge between
nodes
0.9 0.1 0
𝑃 ∶= ⎛
⎜ 0.4 0.4 0.2 ⎞
⎟
⎝ 0.1 0.1 0.8 ⎠
It’s clear from the graph that this stochastic matrix is irreducible: we can reach any state
from any other state eventually
We can also test this using QuantEcon.py’s MarkovChain class
Out[12]: True
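If QuantEcon.py is not at hand, irreducibility can also be checked with NumPy alone, since a nonnegative 𝑛 × 𝑛 matrix 𝑃 is irreducible exactly when 𝐼 + 𝑃 + ⋯ + 𝑃ⁿ⁻¹ has no zero entries:

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])

# Every state reaches every other iff I + P + ... + P^{n-1} is strictly positive
n = len(P)
reach = sum(np.linalg.matrix_power(P, k) for k in range(n))
print(bool(np.all(reach > 0)))  # True
```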
Here’s a more pessimistic scenario, where the poor are poor forever
This stochastic matrix is not irreducible, since, for example, rich is not accessible from poor
Let’s confirm this
Out[13]: False
In [14]: mc.communication_classes
It might be clear to you already that irreducibility is going to be important in terms of long
run outcomes
For example, poverty is a life sentence in the second graph but not the first
We’ll come back to this a bit later
26.6.2 Aperiodicity
Loosely speaking, a Markov chain is called periodic if it cycles in a predictable way, and aperiodic otherwise
Here’s a trivial example with three states
mc = qe.MarkovChain(P)
mc.period
Out[15]: 3
More formally, the period of a state 𝑥 is the greatest common divisor of the set of integers 𝐷(𝑥) ∶= {𝑗 ≥ 1 ∶ 𝑃ʲ(𝑥, 𝑥) > 0}
In the last example, 𝐷(𝑥) = {3, 6, 9, …} for every state 𝑥, so the period is 3
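The period can be computed along the lines of this definition; a sketch for the three-state cycle:

```python
import numpy as np
from math import gcd
from functools import reduce

# Three-state cycle 0 -> 1 -> 2 -> 0
P = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])

# D(x) = {j >= 1 : P^j(x, x) > 0}, computed up to a finite horizon
D = [j for j in range(1, 20) if np.linalg.matrix_power(P, j)[0, 0] > 0]
period = reduce(gcd, D)
print(D[:3], period)  # [3, 6, 9] 3
```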
A stochastic matrix is called aperiodic if the period of every state is 1, and periodic otherwise
For example, the stochastic matrix associated with the transition probabilities below is periodic because, for example, state 𝑎 has period 2
mc = qe.MarkovChain(P)
mc.period
Out[16]: 2
In [17]: mc.is_aperiodic
Out[17]: False
26.7 Stationary Distributions
As seen in Eq. (4), we can shift probabilities forward one unit of time via postmultiplication by 𝑃
Some distributions are invariant under this updating process — such a distribution 𝜓* is called stationary for 𝑃, and satisfies 𝜓* = 𝜓*𝑃
(For example, if 𝑃 is the identity matrix, then all distributions are stationary)
Since stationary distributions are long run equilibria, to get uniqueness we require that initial
conditions are not infinitely persistent
Infinite persistence of initial conditions occurs if certain regions of the state space cannot be
accessed from other regions, which is the opposite of irreducibility
This gives some intuition for the following fundamental theorem
Theorem. If 𝑃 is both aperiodic and irreducible, then
1. 𝑃 has exactly one stationary distribution 𝜓*
2. for any initial distribution 𝜓0, we have ‖𝜓0𝑃ᵗ − 𝜓*‖ → 0 as 𝑡 → ∞
A stochastic matrix satisfying the conditions of the theorem is sometimes called uniformly ergodic
26.7.1 Example
Recall our model of employment/unemployment dynamics for a given worker discussed above
Assuming 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), the uniform ergodicity condition is satisfied
Let 𝜓∗ = (𝑝, 1 − 𝑝) be the stationary distribution, so that 𝑝 corresponds to unemployment
(state 0)
Using 𝜓* = 𝜓*𝑃, you can check that

𝑝 = 𝛽 / (𝛼 + 𝛽)
This is, in some sense, a steady state probability of unemployment — more on interpretation
below
Not surprisingly it tends to zero as 𝛽 → 0, and to one as 𝛼 → 0
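We can verify that (𝑝, 1 − 𝑝) is indeed stationary for illustrative values of 𝛼 and 𝛽:

```python
import numpy as np

# Illustrative parameter values (any α, β in (0, 1) work)
α, β = 0.1, 0.05
P = np.array([[1 - α, α],
              [β, 1 - β]])

p = β / (α + β)
ψ_star = np.array([p, 1 - p])

# Check the stationarity property ψ* P = ψ*
print(np.allclose(ψ_star @ P, ψ_star))  # True
```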
As discussed above, a given Markov matrix 𝑃 can have many stationary distributions
That is, there can be many row vectors 𝜓 such that 𝜓 = 𝜓𝑃
In fact if 𝑃 has two distinct stationary distributions 𝜓1 , 𝜓2 then it has infinitely many, since
in this case, as you can verify,

𝜓3 ∶= 𝜆𝜓1 + (1 − 𝜆)𝜓2

is a stationary distribution for 𝑃 for every 𝜆 ∈ [0, 1]
Part 2 of the Markov chain convergence theorem stated above tells us that the distribution of
𝑋𝑡 converges to the stationary distribution regardless of where we start off
This adds considerable weight to our interpretation of 𝜓∗ as a stochastic steady state
The convergence in the theorem is illustrated in the next figure
mc = qe.MarkovChain(P)
ψ_star = mc.stationary_distributions[0]
ax.scatter(ψ_star[0], ψ_star[1], ψ_star[2], c='k', s=60)
plt.show()
The code for the figure can be found here — you might like to try experimenting with different initial conditions
26.8 Ergodicity
Under irreducibility, yet another important result obtains: for all 𝑥 ∈ 𝑆,

(1/𝑚) ∑ₜ₌₁ᵐ 1{𝑋𝑡 = 𝑥} → 𝜓*(𝑥) as 𝑚 → ∞    (7)
Here
• 1{𝑋𝑡 = 𝑥} = 1 if 𝑋𝑡 = 𝑥 and zero otherwise
• convergence is with probability one
• the result does not depend on the distribution (or value) of 𝑋0
The result tells us that the fraction of time the chain spends at state 𝑥 converges to 𝜓∗ (𝑥) as
time goes to infinity
This gives us another way to interpret the stationary distribution — provided that the convergence result in Eq. (7) is valid
The convergence in Eq. (7) is a special case of a law of large numbers result for Markov
chains — see EDTC, section 4.3.4 for some additional information
26.8.1 Example
Recall our cross-sectional interpretation of the employment/unemployment model discussed above
Assuming 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), the stationary distribution is (𝑝, 1 − 𝑝), where

𝑝 = 𝛽 / (𝛼 + 𝛽)

In the cross-sectional interpretation, this is the fraction of people unemployed; in view of our latest (ergodicity) result, it is also the fraction of time that a single worker can expect to spend unemployed
26.9 Computing Expectations
We sometimes want to compute mathematical expectations of functions of 𝑋𝑡, such as

E[ℎ(𝑋𝑡)]    (8)

and conditional expectations such as

E[ℎ(𝑋𝑡+𝑘) ∣ 𝑋𝑡 = 𝑥]    (9)

where ℎ is a given function, which, in matrix terms, we can regard as the column vector

ℎ = (ℎ(𝑥1), … , ℎ(𝑥𝑛))′
The unconditional expectation Eq. (8) is easy: We just sum over the distribution of 𝑋𝑡 to get
E[ℎ(𝑋𝑡 )] = 𝜓𝑃 𝑡 ℎ
For the conditional expectation Eq. (9), we need to sum over the conditional distribution of
𝑋𝑡+𝑘 given 𝑋𝑡 = 𝑥
We already know that this is 𝑃ᵏ(𝑥, ⋅), so

E[ℎ(𝑋𝑡+𝑘) ∣ 𝑋𝑡 = 𝑥] = (𝑃ᵏℎ)(𝑥)
E[∑ⱼ₌₀^∞ 𝛽ʲℎ(𝑋𝑡+𝑗) ∣ 𝑋𝑡 = 𝑥] = [(𝐼 − 𝛽𝑃)⁻¹ℎ](𝑥)
where
(𝐼 − 𝛽𝑃 )−1 = 𝐼 + 𝛽𝑃 + 𝛽 2 𝑃 2 + ⋯
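A quick check of this formula against a truncated Neumann series, with illustrative 𝑃, 𝛽 and ℎ:

```python
import numpy as np

β = 0.9
P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
h = np.array([1.0, 0.0])   # h(x) = 1{x = 0}

# Closed form: (I - βP)^{-1} h
v = np.linalg.solve(np.identity(2) - β * P, h)

# Truncated series: (I + βP + β²P² + ...) h
total = np.zeros(2)
term = h.copy()
for j in range(500):
    total += term
    term = β * (P @ term)

print(np.allclose(v, total))  # True
```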
26.10 Exercises
26.10.1 Exercise 1
According to the discussion above, if a worker’s employment dynamics obey the stochastic
matrix
𝑃 = ⎛ 1 − 𝛼    𝛼   ⎞
    ⎝   𝛽    1 − 𝛽 ⎠
with 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), then, in the long-run, the fraction of time spent unemployed
will be
𝑝 ∶= 𝛽 / (𝛼 + 𝛽)
In other words, if {𝑋𝑡} represents the Markov chain for employment, then 𝑋̄𝑚 → 𝑝 as 𝑚 → ∞, where

𝑋̄𝑚 ∶= (1/𝑚) ∑ₜ₌₁ᵐ 1{𝑋𝑡 = 0}

Your exercise is to illustrate this convergence: generate one simulated time series of length 10,000, starting at 𝑋0 = 0, and plot 𝑋̄𝑚 − 𝑝 against 𝑚
(You don’t need to add the fancy touches to the graph—see the solution if you’re interested)
26.10.2 Exercise 2
Now let’s think about which pages are likely to be important, in the sense of being valuable
to a search engine user
One possible criterion for the importance of a page is the number of inbound links — an indication of popularity
By this measure, m and j are the most important pages, with 5 inbound links each
However, what if the pages linking to m, say, are not themselves important?
Thinking this way, it seems appropriate to weight the inbound nodes by relative importance
The PageRank algorithm does precisely this
A slightly simplified presentation that captures the basic idea is as follows
Letting 𝑗 be (the integer index of) a typical page and 𝑟𝑗 be its ranking, we set

𝑟𝑗 = ∑ᵢ∈𝐿ⱼ 𝑟ᵢ / ℓᵢ

where
• ℓᵢ is the total number of outbound links from page 𝑖
• 𝐿ⱼ is the set of all pages 𝑖 that link to page 𝑗
This is a measure of the number of inbound links, weighted by their own ranking (and normalized by 1/ℓᵢ)
There is, however, another interpretation, and it brings us back to Markov chains
Let 𝑃 be the matrix given by 𝑃 (𝑖, 𝑗) = 1{𝑖 → 𝑗}/ℓ𝑖 where 1{𝑖 → 𝑗} = 1 if 𝑖 has a link to 𝑗
and zero otherwise
The matrix 𝑃 is a stochastic matrix provided that each page has at least one link
𝑟𝑗 = ∑ᵢ∈𝐿ⱼ 𝑟ᵢ/ℓᵢ = ∑ᵢ 1{𝑖 → 𝑗} 𝑟ᵢ/ℓᵢ = ∑ᵢ 𝑃(𝑖, 𝑗)𝑟ᵢ

Writing 𝑟 for the row vector of rankings, this becomes 𝑟 = 𝑟𝑃, so 𝑟 is a stationary distribution of the stochastic matrix 𝑃
Thus, motion from page to page is that of a web surfer who moves from one page to another
by randomly clicking on one of the links on that page
Here “random” means that each link is selected with equal probability
Since 𝑟 is the stationary distribution of 𝑃 , assuming that the uniform ergodicity condition is
valid, we can interpret 𝑟𝑗 as the fraction of time that a (very persistent) random surfer spends
at page 𝑗
Your exercise is to apply this ranking algorithm to the graph pictured above and return the
list of pages ordered by rank
The data for this graph is in the web_graph_data.txt file — you can also view it here
There is a total of 14 nodes (i.e., web pages), the first named a and the last named n
A typical line from the file has the form
d -> h;
In [21]: import re
When you solve for the ranking, you will find that the highest ranked node is in fact g, while
the lowest is a
26.10.3 Exercise 3
The stochastic process under study is the AR(1) model 𝑦𝑡+1 = 𝜌𝑦𝑡 + 𝑢𝑡+1, where {𝑢𝑡} is IID 𝑁(0, 𝜎𝑢²) and |𝜌| < 1; its stationary variance is

𝜎𝑦² ∶= 𝜎𝑢²/(1 − 𝜌²)
Tauchen’s method [128] is the most common method for approximating this continuous state
process with a finite state Markov chain
A routine for this already exists in QuantEcon.py but let’s write our own version as an exer-
cise
As a first step, we choose

• 𝑛, the number of states in the discrete approximation
• 𝑚, an integer that parameterizes the width of the state space
Next, we create a state space {𝑥0 , … , 𝑥𝑛−1 } ⊂ R and a stochastic 𝑛 × 𝑛 matrix 𝑃 such that
• 𝑥0 = −𝑚 𝜎𝑦
• 𝑥𝑛−1 = 𝑚 𝜎𝑦
• 𝑥𝑖+1 = 𝑥𝑖 + 𝑠 where 𝑠 = (𝑥𝑛−1 − 𝑥0 )/(𝑛 − 1)
Let 𝐹 be the cumulative distribution function of the normal distribution 𝑁 (0, 𝜎𝑢2 )
The values 𝑃 (𝑥𝑖 , 𝑥𝑗 ) are computed to approximate the AR(1) process — omitting the deriva-
tion, the rules are as follows:
1. If 𝑗 = 0, then set

   𝑃(𝑥𝑖, 𝑥𝑗) = 𝐹(𝑥0 − 𝜌𝑥𝑖 + 𝑠/2)

2. If 𝑗 = 𝑛 − 1, then set

   𝑃(𝑥𝑖, 𝑥𝑗) = 1 − 𝐹(𝑥𝑛−1 − 𝜌𝑥𝑖 − 𝑠/2)

3. Otherwise, set

   𝑃(𝑥𝑖, 𝑥𝑗) = 𝐹(𝑥𝑗 − 𝜌𝑥𝑖 + 𝑠/2) − 𝐹(𝑥𝑗 − 𝜌𝑥𝑖 − 𝑠/2)
26.11 Solutions
26.11.1 Exercise 1
Compute the fraction of time that the worker spends unemployed, and compare it to the sta-
tionary probability
In [24]: from quantecon import MarkovChain
α = β = 0.1
N = 10000
p = β / (α + β)                       # Stationary probability of unemployment
mc = MarkovChain([[1 - α, α], [β, 1 - β]])
X_bar = (mc.simulate(N) == 0).cumsum() / (1 + np.arange(N))
fig, ax = plt.subplots()
ax.plot(X_bar - p, label='$\\bar X_m - p$')
ax.legend(loc='upper right')
plt.show()
26.11.2 Exercise 2
First, save the data into a file called web_graph_data.txt by executing the next cell
m -> g;
n -> c;
n -> j;
n -> m;
Writing web_graph_data.txt
In [26]: """
Return list of pages, ordered by rank
"""
import re
import numpy as np
from operator import itemgetter
from quantecon import MarkovChain
infile = 'web_graph_data.txt'
alphabet = 'abcdefghijklmnopqrstuvwxyz'
n = 14                                   # Total number of pages
Q = np.zeros((n, n))                     # Q[i, j] = 1 if page i links to page j
for i, j in re.findall(r'(\w) -> (\w)', open(infile).read()):
    Q[alphabet.index(i), alphabet.index(j)] = 1
P = Q / Q.sum(axis=1, keepdims=True)     # Row-normalize to a stochastic matrix
r = MarkovChain(P).stationary_distributions[0]
print('Rankings\n***')
for name, rank in sorted(zip(alphabet, r), key=itemgetter(1), reverse=True):
    print(f'{name}: {rank:.4}')
Rankings
***
g: 0.1607
j: 0.1594
m: 0.1195
n: 0.1088
k: 0.09106
b: 0.08326
e: 0.05312
i: 0.05312
c: 0.04834
h: 0.0456
l: 0.03202
d: 0.03056
f: 0.01164
a: 0.002911
26.11.3 Exercise 3
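The full solution code is not reproduced here, but a compact sketch of the routine described in the exercise might look as follows (the function name `approx_markov` and its default arguments are our own choices, not the QuantEcon.py version):

```python
import numpy as np
from scipy.stats import norm

def approx_markov(ρ, σ_u, m=3, n=7):
    """Tauchen approximation of X' = ρX + u with u ~ N(0, σ_u²).
    Returns the state vector x and an n × n stochastic matrix P."""
    F = norm(scale=σ_u).cdf
    σ_y = np.sqrt(σ_u**2 / (1 - ρ**2))       # Stationary standard deviation
    x = np.linspace(-m * σ_y, m * σ_y, n)    # Evenly spaced state space
    s = x[1] - x[0]                          # Step size
    P = np.empty((n, n))
    for i in range(n):
        P[i, 0] = F(x[0] - ρ * x[i] + s/2)                          # Rule 1
        P[i, n-1] = 1 - F(x[n-1] - ρ * x[i] - s/2)                  # Rule 2
        for j in range(1, n-1):                                     # Rule 3
            P[i, j] = F(x[j] - ρ * x[i] + s/2) - F(x[j] - ρ * x[i] - s/2)
    return x, P

x, P = approx_markov(ρ=0.9, σ_u=1.0)
```

Each row of P sums to one because the middle terms telescope between the two boundary rules.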
[1] Hint: First show that if 𝑃 and 𝑄 are stochastic matrices then so is their product — to
check the row sums, try post multiplying by a column vector of ones. Finally, argue that 𝑃 𝑛
is a stochastic matrix using induction.
27

Continuous State Markov Chains
27.1 Contents

• Overview 27.2
• The Density Case 27.3
• Beyond Densities 27.4
• Stability 27.5
• Exercises 27.6
• Solutions 27.7
• Appendix 27.8
In addition to what’s in Anaconda, this lecture will need the following libraries
27.2 Overview
In a previous lecture, we learned about finite Markov chains, a relatively elementary class of
stochastic dynamic models
The present lecture extends this analysis to continuous (i.e., uncountable) state Markov
chains
Most stochastic dynamic models studied by economists either fit directly into this class or can
be represented as continuous state Markov chains after minor modifications
In this lecture, our focus will be on continuous Markov models that
• evolve in discrete-time
• are often nonlinear
The fact that we accommodate nonlinear models here is significant, because linear stochastic
models have their own highly developed toolset, as we’ll see later on
The question that interests us most is: Given a particular stochastic dynamic model, how will
the state of the system evolve over time?
In particular, what can we say about the distribution of the state at each point in time, and about its long-run behavior?
Answering these questions will lead us to revisit many of the topics that occupied us in the
finite state case, such as simulation, distribution dynamics, stability, ergodicity, etc.
Note
For some people, the term “Markov chain” always refers to a process with a finite
or discrete state space. We follow the mainstream mathematical literature (e.g.,
[95]) in using the term to refer to any discrete time Markov process
You are probably aware that some distributions can be represented by densities and some
cannot
(For example, distributions on the real numbers R that put positive probability on individual
points have no density representation)
We are going to start our analysis by looking at Markov chains where the one-step transition
probabilities have density representations
The benefit is that the density case offers a very direct parallel to the finite case in terms of
notation and intuition
Once we’ve built some intuition we’ll cover the general case
In our lecture on finite Markov chains, we studied discrete-time Markov chains that evolve on
a finite state space 𝑆
In this setting, the dynamics of the model are described by a stochastic matrix — a nonnega-
tive square matrix 𝑃 = 𝑃 [𝑖, 𝑗] such that each row 𝑃 [𝑖, ⋅] sums to one
The interpretation of 𝑃 is that 𝑃 [𝑖, 𝑗] represents the probability of transitioning from state 𝑖
to state 𝑗 in one unit of time
In symbols,
P{𝑋𝑡+1 = 𝑗 | 𝑋𝑡 = 𝑖} = 𝑃 [𝑖, 𝑗]
Equivalently, 𝑃 can be thought of as a family of distributions 𝑃[𝑖, ⋅], one for each 𝑖 ∈ 𝑆, where 𝑃[𝑖, ⋅] is the distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑖
(As you probably recall, when using NumPy arrays, 𝑃 [𝑖, ⋅] is expressed as P[i, :])
In this section, we’ll allow 𝑆 to be a subset of R, such as
• R itself
• the positive reals (0, ∞)
• a bounded interval (𝑎, 𝑏)
The family of discrete distributions 𝑃 [𝑖, ⋅] will be replaced by a family of densities 𝑝(𝑥, ⋅), one
for each 𝑥 ∈ 𝑆
Analogous to the finite state case, 𝑝(𝑥, ⋅) is to be understood as the distribution (density) of
𝑋𝑡+1 given 𝑋𝑡 = 𝑥
More formally, a stochastic kernel on 𝑆 is a function 𝑝 ∶ 𝑆 × 𝑆 → R with the property that

1. 𝑝(𝑥, 𝑦) ≥ 0 for all 𝑥, 𝑦 ∈ 𝑆
2. ∫ 𝑝(𝑥, 𝑦) 𝑑𝑦 = 1 for all 𝑥 ∈ 𝑆

For example, the function

𝑝𝑤(𝑥, 𝑦) ∶= (1/√(2𝜋)) exp{−(𝑦 − 𝑥)²/2}    (1)

is a stochastic kernel, and it corresponds to the simple random walk

𝑋𝑡+1 = 𝑋𝑡 + 𝜉𝑡+1  where {𝜉𝑡} is IID 𝑁(0, 1)    (2)
To see this, let’s find the stochastic kernel 𝑝 corresponding to Eq. (2)
Recall that 𝑝(𝑥, ⋅) represents the distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥
Letting 𝑋𝑡 = 𝑥 in Eq. (2) and considering the distribution of 𝑋𝑡+1 , we see that 𝑝(𝑥, ⋅) =
𝑁 (𝑥, 1)
In other words, 𝑝 is exactly 𝑝𝑤 , as defined in Eq. (1)
In the previous section, we made the connection between stochastic difference equation
Eq. (2) and stochastic kernel Eq. (1)
In economics and time-series analysis we meet stochastic difference equations of all different
shapes and sizes
It will be useful for us if we have some systematic methods for converting stochastic difference
equations into stochastic kernels
To this end, consider the generic (scalar) stochastic difference equation given by

𝑋𝑡+1 = 𝜇(𝑋𝑡) + 𝜎(𝑋𝑡) 𝜉𝑡+1    (3)

Here

• {𝜉𝑡} is an IID sequence with common density 𝜙 on R
• 𝜇 and 𝜎 are given functions on 𝑆, with 𝜎(𝑥) > 0 for all 𝑥

Example 1: The random walk Eq. (2) is a special case of Eq. (3), with 𝜇(𝑥) = 𝑥 and 𝜎(𝑥) = 1
Example 2: Consider the ARCH model

𝑋𝑡+1 = 𝛼𝑋𝑡 + 𝜎𝑡 𝜉𝑡+1,  where  𝜎𝑡² = 𝛽 + 𝛾𝑋𝑡²    (4)

This is a special case of Eq. (3) with 𝜇(𝑥) = 𝛼𝑥 and 𝜎(𝑥) = (𝛽 + 𝛾𝑥²)^(1/2)
Example 3: With stochastic production and a constant savings rate, the one-sector neoclas-
sical growth model leads to a law of motion for capital per worker such as
𝑘𝑡+1 = 𝑠𝐴𝑡+1 𝑓(𝑘𝑡) + (1 − 𝛿)𝑘𝑡    (5)

Here

• 𝑠 is the rate of savings
• 𝐴𝑡+1 is a production shock
• 𝛿 is a depreciation rate
• 𝑓 ∶ R+ → R+ is a production function satisfying 𝑓(𝑘) > 0 whenever 𝑘 > 0
(The fixed savings rate can be rationalized as the optimal policy for a particular set of tech-
nologies and preferences (see [87], section 3.1.2), although we omit the details here)
Equation Eq. (5) is a special case of Eq. (3) with 𝜇(𝑥) = (1 − 𝛿)𝑥 and 𝜎(𝑥) = 𝑠𝑓(𝑥)
Now let’s obtain the stochastic kernel corresponding to the generic model Eq. (3)
To find it, note first that if 𝑈 is a random variable with density 𝑓𝑈 , and 𝑉 = 𝑎 + 𝑏𝑈 for some
constants 𝑎, 𝑏 with 𝑏 > 0, then the density of 𝑉 is given by
𝑓𝑉(𝑣) = (1/𝑏) 𝑓𝑈((𝑣 − 𝑎)/𝑏)    (6)
(The proof is below. For a multidimensional version see EDTC, theorem 8.1.3)
Taking Eq. (6) as given for the moment, we can obtain the stochastic kernel 𝑝 for Eq. (3) by
recalling that 𝑝(𝑥, ⋅) is the conditional density of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥
In the present case, this is equivalent to stating that 𝑝(𝑥, ⋅) is the density of 𝑌 ∶= 𝜇(𝑥) +
𝜎(𝑥) 𝜉𝑡+1 when 𝜉𝑡+1 ∼ 𝜙
Hence, by Eq. (6),
𝑝(𝑥, 𝑦) = (1/𝜎(𝑥)) 𝜙((𝑦 − 𝜇(𝑥))/𝜎(𝑥))    (7)
For example, the growth model in Eq. (5) has stochastic kernel
𝑝(𝑥, 𝑦) = (1/(𝑠𝑓(𝑥))) 𝜙((𝑦 − (1 − 𝛿)𝑥)/(𝑠𝑓(𝑥)))    (8)
In this section of our lecture on finite Markov chains, we asked the following question: if 𝜓𝑡 is the probability mass function of 𝑋𝑡, what is the probability mass function of 𝑋𝑡+1?

The answer is

𝜓𝑡+1(𝑗) = ∑(all 𝑖) 𝑃[𝑖, 𝑗] 𝜓𝑡(𝑖)  for all 𝑗

This intuitive equality states that the probability of being at 𝑗 tomorrow is the probability of
visiting 𝑖 today and then going on to 𝑗, summed over all possible 𝑖

In the density case, we just replace the sum with an integral and probability mass functions
with densities, yielding

𝜓𝑡+1(𝑦) = ∫ 𝑝(𝑥, 𝑦) 𝜓𝑡(𝑥) 𝑑𝑥  for all 𝑦 ∈ 𝑆    (9)

where 𝜓𝑡 is now the density of 𝑋𝑡

It is convenient to express Eq. (9) in operator form: the Markov operator corresponding to 𝑝 sends a density 𝜓 into the new density 𝜓𝑃, where

(𝜓𝑃)(𝑦) ∶= ∫ 𝑝(𝑥, 𝑦) 𝜓(𝑥) 𝑑𝑥    (10)
Note
Unlike most operators, we write 𝑃 to the right of its argument, instead of to the
left (i.e., 𝜓𝑃 instead of 𝑃 𝜓). This is a common convention, with the intention be-
ing to maintain the parallel with the finite case — see here
With this notation, we can write Eq. (9) more succinctly as 𝜓𝑡+1 (𝑦) = (𝜓𝑡 𝑃 )(𝑦) for all 𝑦, or,
dropping the 𝑦 and letting “=” indicate equality of functions,
𝜓𝑡+1 = 𝜓𝑡 𝑃 (11)
Equation Eq. (11) tells us that if we specify a distribution for 𝜓0 , then the entire sequence of
future distributions can be obtained by iterating with 𝑃
It’s interesting to note that Eq. (11) is a deterministic difference equation
Thus, by converting a stochastic difference equation such as Eq. (3) into a stochastic kernel 𝑝
and hence an operator 𝑃 , we convert a stochastic difference equation into a deterministic one
(albeit in a much higher dimensional space)
Note
Some people might be aware that discrete Markov chains are in fact a special case
of the continuous Markov chains we have just described. The reason is that proba-
bility mass functions are densities with respect to the counting measure.
27.3.4 Computation
To learn about the dynamics of a given process, it’s useful to compute and study the se-
quences of densities generated by the model
One way to do this is to try to implement the iteration described by Eq. (10) and Eq. (11)
using numerical integration
However, to produce 𝜓𝑃 from 𝜓 via Eq. (10), you would need to integrate at every 𝑦, and
there is a continuum of such 𝑦
Another possibility is to discretize the model, but this introduces errors of unknown size
A nicer alternative in the present setting is to combine simulation with an elegant estimator
called the look-ahead estimator
Let's go over the ideas with reference to the growth model discussed above, the dynamics of
which we repeat here for convenience:

𝑘𝑡+1 = 𝑠𝐴𝑡+1 𝑓(𝑘𝑡) + (1 − 𝛿)𝑘𝑡    (12)
Our aim is to compute the sequence {𝜓𝑡 } associated with this model and fixed initial condi-
tion 𝜓0
To approximate 𝜓𝑡 by simulation, recall that, by definition, 𝜓𝑡 is the density of 𝑘𝑡 given 𝑘0 ∼
𝜓0
If we wish to generate observations of this random variable, all we need to do is

1. draw 𝑘0 from the specified initial condition 𝜓0
2. draw the shocks 𝐴1, … , 𝐴𝑡 from their specified density
3. compute 𝑘𝑡 iteratively via Eq. (12)

Repeating this 𝑛 times yields 𝑛 independent draws, and in particular 𝑛 independent observations 𝑘𝑡−1¹, … , 𝑘𝑡−1ⁿ of 𝑘𝑡−1, which can be turned into an estimate of 𝜓𝑡 via the look-ahead estimator

𝜓𝑡ⁿ(𝑦) = (1/𝑛) ∑(𝑖=1 to 𝑛) 𝑝(𝑘𝑡−1ⁱ, 𝑦)    (13)

By the law of large numbers, with probability one as 𝑛 → ∞,

(1/𝑛) ∑(𝑖=1 to 𝑛) 𝑝(𝑘𝑡−1ⁱ, 𝑦) → E 𝑝(𝑘𝑡−1ⁱ, 𝑦) = ∫ 𝑝(𝑥, 𝑦)𝜓𝑡−1(𝑥) 𝑑𝑥 = 𝜓𝑡(𝑦)
27.3.5 Implementation
A class called LAE for estimating densities by this technique can be found in lae.py
Given our use of the __call__ method, an instance of LAE acts as a callable object, which
is essentially a function that can store its own data (see this discussion)
This function returns the right-hand side of Eq. (13) using
• the data and stochastic kernel that it stores as its instance data
• the value 𝑦 as its argument
The function is vectorized, in the sense that if psi is such an instance and y is an array, then
the call psi(y) acts elementwise
(This is the reason that we reshaped X and y inside the class — to make vectorization work)
Because the implementation is fully vectorized, it is about as efficient as it would be in C or
Fortran
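To see how little machinery the look-ahead estimator needs, here is a minimal sketch of such a callable class, written fresh for illustration rather than copied from lae.py (the class name `LookAhead` and the example kernel are our own):

```python
import numpy as np

class LookAhead:
    """Minimal look-ahead estimator: stores a kernel p and data X, and
    evaluates (1/n) Σ_i p(X_i, y) at every point of an array y."""
    def __init__(self, p, X):
        self.p = p
        self.X = np.asarray(X).reshape(-1, 1)   # Column vector of observations

    def __call__(self, y):
        y = np.asarray(y).reshape(1, -1)        # Row vector of evaluation points
        return self.p(self.X, y).mean(axis=0)   # Average kernel over observations

# Usage with the random-walk kernel p(x, y) = ϕ(y − x), ϕ standard normal
p = lambda x, y: np.exp(-(y - x)**2 / 2) / np.sqrt(2 * np.pi)
psi = LookAhead(p, np.random.randn(1000))
grid = np.linspace(-4, 4, 9)
print(psi(grid))                                # Density estimate on the grid
```

Reshaping X into a column and y into a row lets NumPy broadcasting evaluate p at every (X_i, y_j) pair at once, which is what makes the call vectorized.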
27.3.6 Example
The following code is an example of usage for the stochastic growth model described above
from scipy.stats import beta, lognorm
from quantecon import LAE

# == Define parameters == #
s = 0.2
δ = 0.1
a_σ = 0.4                      # A = exp(B) where B ~ N(0, a_σ)
α = 0.4                        # We set f(k) = k**α
ψ_0 = beta(5, 5, scale=0.5)    # Initial distribution
φ = lognorm(a_σ)

def p(x, y):
    "Stochastic kernel for the growth model with Cobb-Douglas production"
    d = s * x**α
    return φ.pdf((y - (1 - δ) * x) / d) / d

n = 10000                      # Number of observations at each date t
T = 30                         # Compute density of k_t for t = 1,...,T

# == Generate matrix such that t-th column is n observations of k_t == #
k = np.empty((n, T))
A = φ.rvs((n, T))
k[:, 0] = ψ_0.rvs(n)           # Draw first column from initial distribution
for t in range(T-1):
    k[:, t+1] = s * A[:, t] * k[:, t]**α + (1 - δ) * k[:, t]

# == Generate T instances of LAE using this data, one for each date t == #
laes = [LAE(p, k[:, t]) for t in range(T)]
# == Plot == #
fig, ax = plt.subplots()
ygrid = np.linspace(0.01, 4.0, 200)
greys = [str(g) for g in np.linspace(0.0, 0.8, T)]
greys.reverse()
for ψ, g in zip(laes, greys):
ax.plot(ygrid, ψ(ygrid), color=g, lw=2, alpha=0.6)
ax.set_xlabel('capital')
ax.set_title(f'Density of $k_1$ (lighter) to $k_T$ (darker) for $T={T}$')
plt.show()
The figure shows part of the density sequence {𝜓𝑡 }, with each density computed via the look-
ahead estimator
Notice that the sequence of densities shown in the figure seems to be converging — more on
this in just a moment
Another quick comment is that each of these distributions could be interpreted as a cross-
sectional distribution (recall this discussion)
Up until now, we have focused exclusively on continuous state Markov chains where all condi-
tional distributions 𝑝(𝑥, ⋅) are densities
As discussed above, not all distributions can be represented as densities
If the conditional distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥 cannot be represented as a density for
some 𝑥 ∈ 𝑆, then we need a slightly different theory
The ultimate option is to switch from densities to probability measures, but not all readers
will be familiar with measure theory
We can, however, construct a fairly general theory using distribution functions
To illustrate the issues, recall that Hopenhayn and Rogerson [67] study a model of firm dy-
namics where individual firm productivity follows the exogenous process
𝑋𝑡+1 = 𝑎 + 𝜌𝑋𝑡 + 𝜉𝑡+1,  where {𝜉𝑡} is IID 𝑁(0, 𝜎²)

with this process censored at the endpoints so that it remains in [0, 1]: realizations below 0 are replaced by 0 and realizations above 1 by 1
If you think about it, you will see that for any given 𝑥 ∈ [0, 1], the conditional distribution of
𝑋𝑡+1 given 𝑋𝑡 = 𝑥 puts positive probability mass on 0 and 1
Hence it cannot be represented as a density
What we can do instead is use cumulative distribution functions (cdfs)
To this end, set

𝐺(𝑥, 𝑦) ∶= P{𝑋𝑡+1 ≤ 𝑦 | 𝑋𝑡 = 𝑥}

This family of cdfs 𝐺(𝑥, ⋅) plays a role analogous to the stochastic kernel in the density case

The distribution dynamics in Eq. (9) are then replaced by

𝐹𝑡+1(𝑦) = ∫ 𝐺(𝑥, 𝑦) 𝑑𝐹𝑡(𝑥)    (14)
Here 𝐹𝑡 and 𝐹𝑡+1 are cdfs representing the distribution of the current state and next period
state
The intuition behind Eq. (14) is essentially the same as for Eq. (9)
27.4.2 Computation
If you wish to compute these cdfs, you cannot use the look-ahead estimator as before
Indeed, you should not use any density estimator, since the objects you are estimat-
ing/computing are not densities
One good option is simulation as before, combined with the empirical distribution function
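As a sketch of this simulation approach — the parameter values a, ρ, σ below are our own illustrative choices, not taken from the model above — the conditional cdf can be estimated at a point by the fraction of simulated draws falling at or below it:

```python
import numpy as np

def G_hat(x, y, a=0.1, ρ=0.8, σ=0.2, n=10_000):
    """Monte Carlo estimate of G(x, y) = P{X_{t+1} ≤ y | X_t = x} for a
    censored AR(1); a, ρ, σ are illustrative parameter values only."""
    draws = a + ρ * x + σ * np.random.randn(n)
    draws = np.clip(draws, 0, 1)       # Censor the draws into [0, 1]
    return (draws <= y).mean()         # Empirical distribution function at y

print(G_hat(0.5, 0.6))                 # Estimated conditional probability
```

Note how censoring shows up directly: the estimate jumps to positive values at y = 0 and reaches exactly one at y = 1.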
27.5 Stability
In our lecture on finite Markov chains, we also studied stationarity, stability and ergodicity
Here we will cover the same topics for the continuous case
We will, however, treat only the density case (as in this section), where the stochastic kernel
is a family of densities
The general case is relatively similar — references are given below
Analogous to the finite case, given a stochastic kernel 𝑝 and corresponding Markov operator
as defined in Eq. (10), a density 𝜓∗ on 𝑆 is called stationary for 𝑃 if it is a fixed point of the
operator 𝑃
In other words,

𝜓∗(𝑦) = ∫ 𝑝(𝑥, 𝑦)𝜓∗(𝑥) 𝑑𝑥  for all 𝑦 ∈ 𝑆    (15)
As with the finite case, if 𝜓∗ is stationary for 𝑃 , and the distribution of 𝑋0 is 𝜓∗ , then, in
view of Eq. (11), 𝑋𝑡 will have this same distribution for all 𝑡
Hence 𝜓∗ is the stochastic equivalent of a steady state
In the finite case, we learned that at least one stationary distribution exists, although there
may be many
When the state space is infinite, the situation is more complicated
Even existence can fail very easily
For example, the random walk model has no stationary density (see, e.g., EDTC, p. 210)
However, there are well-known conditions under which a stationary density 𝜓∗ exists
With additional conditions, we can also get a unique stationary density (𝜓 ∈ 𝒟 and 𝜓 = 𝜓𝑃 ⟹ 𝜓 = 𝜓∗, where 𝒟 is the set of densities on 𝑆), and also global convergence in the sense that
∀ 𝜓 ∈ 𝒟, 𝜓𝑃 𝑡 → 𝜓∗ as 𝑡 → ∞ (16)
This combination of existence, uniqueness and global convergence in the sense of Eq. (16) is
often referred to as global stability
Under very similar conditions, we get ergodicity, which means that
(1/𝑛) ∑(𝑡=1 to 𝑛) ℎ(𝑋𝑡) → ∫ ℎ(𝑥)𝜓∗(𝑥) 𝑑𝑥  as 𝑛 → ∞    (17)
for any (measurable) function ℎ ∶ 𝑆 → R such that the right-hand side is finite
Note that the convergence in Eq. (17) does not depend on the distribution (or value) of 𝑋0
This is actually very important for simulation — it means we can learn about 𝜓∗ (i.e., ap-
proximate the right-hand side of Eq. (17) via the left-hand side) without requiring any special
knowledge about what to do with 𝑋0
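As a quick numerical check of Eq. (17) — using a stationary AR(1) model of our own choosing, with ℎ(𝑥) = 𝑥², so that the right-hand side equals the stationary variance 1/(1 − 𝜌²):

```python
import numpy as np

np.random.seed(42)
ρ, n = 0.5, 200_000
X = np.empty(n)
X[0] = 0.0                             # Any initial condition works, per Eq. (17)
for t in range(n - 1):                 # X' = ρX + ξ with ξ ~ N(0, 1)
    X[t+1] = ρ * X[t] + np.random.randn()

time_avg = (X**2).mean()               # Left-hand side of Eq. (17) with h(x) = x²
stationary_var = 1 / (1 - ρ**2)        # Right-hand side: ∫ x² ψ*(x) dx
print(time_avg, stationary_var)        # The two should be close
```

Rerunning with a different X[0] barely changes the time average, which is exactly the initial-condition independence claimed above.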
So what are these conditions we require to get global stability and ergodicity?
In essence, it must be the case that
1. Probability mass does not drift off to the “edges” of the state space
2. Sufficient “mixing” obtains
As stated above, the growth model treated here is stable under mild conditions on the primi-
tives
We can see this stability in action — in particular, the convergence in Eq. (16) — by simulat-
ing the path of densities from various initial conditions
Here is such a figure
All sequences are converging towards the same limit, regardless of their initial condition
The details regarding initial conditions and so on are given in this exercise, where you are
asked to replicate the figure
In the preceding figure, each sequence of densities is converging towards the unique stationary
density 𝜓∗
Even from this figure, we can get a fair idea what 𝜓∗ looks like, and where its mass is located
However, there is a much more direct way to estimate the stationary density, and it involves
only a slight modification of the look-ahead estimator
Let’s say that we have a model of the form Eq. (3) that is stable and ergodic
Let 𝑝 be the corresponding stochastic kernel, as given in Eq. (7)
To approximate the stationary density 𝜓∗ , we can simply generate a long time-series
𝑋0 , 𝑋1 , … , 𝑋𝑛 and estimate 𝜓∗ via
𝜓∗𝑛(𝑦) = (1/𝑛) ∑(𝑡=1 to 𝑛) 𝑝(𝑋𝑡, 𝑦)    (18)
This is essentially the same as the look-ahead estimator Eq. (13), except that now the obser-
vations we generate are a single time-series, rather than a cross-section
The justification for Eq. (18) is that, with probability one as 𝑛 → ∞,
(1/𝑛) ∑(𝑡=1 to 𝑛) 𝑝(𝑋𝑡, 𝑦) → ∫ 𝑝(𝑥, 𝑦)𝜓∗(𝑥) 𝑑𝑥 = 𝜓∗(𝑦)
where the convergence is by Eq. (17) and the equality on the right is by Eq. (15)
The right-hand side is exactly what we want to compute
On top of this asymptotic result, it turns out that the rate of convergence for the look-ahead
estimator is very good
The first exercise helps illustrate this point
27.6 Exercises
27.6.1 Exercise 1
Consider the threshold autoregressive model

𝑋𝑡+1 = 𝜃|𝑋𝑡| + (1 − 𝜃²)^(1/2) 𝜉𝑡+1  where {𝜉𝑡} is IID 𝑁(0, 1)    (19)
This is one of those rare nonlinear stochastic models where an analytical expression for the
stationary density is available
In particular, provided that |𝜃| < 1, there is a unique stationary density 𝜓∗ given by
𝜓∗(𝑦) = 2𝜙(𝑦) Φ[𝜃𝑦/(1 − 𝜃²)^(1/2)]    (20)
Here 𝜙 is the standard normal density and Φ is the standard normal cdf
As an exercise, compute the look-ahead estimate of 𝜓∗ , as defined in Eq. (18), and compare it
with 𝜓∗ in Eq. (20) to see whether they are indeed close for large 𝑛
In doing so, set 𝜃 = 0.8 and 𝑛 = 500
The next figure shows the result of such a computation
The additional density (black line) is a nonparametric kernel density estimate, added to the
solution for illustration
(You can try to replicate it before looking at the solution if you want to)
As you can see, the look-ahead estimator is a much tighter fit than the kernel density estima-
tor
If you repeat the simulation you will see that this is consistently the case
27.6.2 Exercise 2
27.6.3 Exercise 3
Generate three data sets, each of 𝑛 = 500 observations, with {𝑋1, … , 𝑋𝑛} ∼ 𝐿𝑁(0, 1), {𝑌1, … , 𝑌𝑛} ∼ 𝑁(2, 1), and {𝑍1, … , 𝑍𝑛} ∼ 𝑁(4, 1)
In [3]: n = 500
x = np.random.randn(n) # N(0, 1)
x = np.exp(x) # Map x to lognormal
y = np.random.randn(n) + 2.0 # N(2, 1)
z = np.random.randn(n) + 4.0 # N(4, 1)
Each data set is represented by a box, where the top and bottom of the box are the third and
first quartiles of the data, and the red line in the center is the median
The boxes give some indication as to the location, scale, and shape of each data set's distribution
initial_conditions = np.linspace(8, 0, J)
27.7 Solutions
27.7.1 Exercise 1
Here the model is the one given in Eq. (19), with 𝜃 = 0.8 and 𝜉𝑡 ∼ 𝑁(0, 1)
Try running at n = 10, 100, 1000, 10000 to get an idea of the speed of convergence
from scipy.stats import norm, gaussian_kde
from quantecon import LAE

φ = norm()
n = 500
θ = 0.8
# == Frequently used constants == #
d = np.sqrt(1 - θ**2)
δ = θ / d

def ψ_star(y):
    "True stationary density of the TAR Model"
    return 2 * norm.pdf(y) * norm.cdf(δ * y)

def p(x, y):
    "Stochastic kernel for the TAR model"
    return φ.pdf((y - θ * np.abs(x)) / d) / d

Z = φ.rvs(n)
X = np.empty(n)
X[0] = 0.0               # Initial condition
for t in range(n-1):
    X[t+1] = θ * np.abs(X[t]) + d * Z[t]
ψ_est = LAE(p, X)
k_est = gaussian_kde(X)
27.7.2 Exercise 2
φ = lognorm(a_σ)
for i in range(4):
ax = axes[i]
ax.set_xlim(0, xmax)
ψ_0 = beta(5, 5, scale=0.5, loc=i*2) # Initial distribution
27.7.3 Exercise 3
In [6]: n = 20
k = 5000
J = 6
θ = 0.9
d = np.sqrt(1 - θ**2)
δ = θ / d

# Figure layout below is a guess at the original: J stacked panels
fig, axes = plt.subplots(J, 1, figsize=(10, 4*J))
initial_conditions = np.linspace(8, 0, J)
X = np.empty((k, n))
for j in range(J):
    axes[j].set_ylim(-4, 8)
    axes[j].set_title(f'time series from t = {initial_conditions[j]}')
    Z = np.random.randn(k, n)
    X[:, 0] = initial_conditions[j]
    for t in range(1, n):
        X[:, t] = θ * np.abs(X[:, t-1]) + d * Z[:, t]
    axes[j].boxplot(X)
plt.show()
27.8 Appendix
28

Cass-Koopmans Optimal Growth Model

28.1 Contents
• Overview 28.2
• The Growth Model 28.3
• Competitive Equilibrium 28.4
28.2 Overview
This lecture describes a model that Tjalling Koopmans [78] and David Cass [24] used to ana-
lyze optimal growth
The model can be viewed as an extension of the model of Robert Solow described in an ear-
lier lecture but adapted to make the savings rate the outcome of an optimal choice
(Solow assumed a constant saving rate determined outside the model)
We describe two versions of the model to illustrate what is, in fact, a more general connection
between a planned economy and an economy organized as a competitive equilibrium
The lecture uses important ideas including
• We shall encounter this trick in this lecture and also in this lecture
• An application of a guess and verify method for solving a system of difference equa-
tions
• The intimate connection between the cases for the optimality of two competing visions
of good ways to organize an economy, namely:
• A turnpike property that describes optimal paths for long-but-finite horizon economies
• A non-stochastic version of a theory of the term structure of interest rates
𝑈(𝐶⃗) = ∑(𝑡=0 to 𝑇) 𝛽^𝑡 𝐶𝑡^(1−𝛾)/(1 − 𝛾)    (1)
where 𝛽 ∈ (0, 1) is a discount factor and 𝛾 > 0 governs the curvature of the one-period utility
function
Note that
𝑢(𝐶𝑡) = 𝐶𝑡^(1−𝛾)/(1 − 𝛾)    (2)
ℒ(𝐶⃗, 𝐾⃗, 𝜇⃗) = ∑(𝑡=0 to 𝑇) 𝛽^𝑡 {𝑢(𝐶𝑡) + 𝜇𝑡(𝐹(𝐾𝑡, 1) + (1 − 𝛿)𝐾𝑡 − 𝐶𝑡 − 𝐾𝑡+1)}
𝐹(𝐾𝑡, 𝑁𝑡) = 𝐴𝐾𝑡^𝛼 𝑁𝑡^(1−𝛼) = 𝑁𝑡 𝐴(𝐾𝑡/𝑁𝑡)^𝛼

𝑓(𝐾𝑡/𝑁𝑡) = 𝐴(𝐾𝑡/𝑁𝑡)^𝛼
𝐹(𝐾𝑡, 𝑁𝑡) = 𝑁𝑡 𝑓(𝐾𝑡/𝑁𝑡)
∂𝐹/∂𝐾 = ∂[𝑁𝑡 𝑓(𝐾𝑡/𝑁𝑡)]/∂𝐾𝑡
       = 𝑁𝑡 𝑓′(𝐾𝑡/𝑁𝑡) (1/𝑁𝑡)    (chain rule)
       = 𝑓′(𝐾𝑡/𝑁𝑡)|(𝑁𝑡=1)
       = 𝑓′(𝐾𝑡)    (6)
Also

∂𝐹/∂𝑁 = ∂[𝑁𝑡 𝑓(𝐾𝑡/𝑁𝑡)]/∂𝑁𝑡    (product rule)
       = 𝑓(𝐾𝑡/𝑁𝑡) + 𝑁𝑡 𝑓′(𝐾𝑡/𝑁𝑡) (−𝐾𝑡/𝑁𝑡²)    (chain rule)
       = 𝑓(𝐾𝑡/𝑁𝑡) − (𝐾𝑡/𝑁𝑡) 𝑓′(𝐾𝑡/𝑁𝑡)|(𝑁𝑡=1)
       = 𝑓(𝐾𝑡) − 𝐾𝑡 𝑓′(𝐾𝑡)
• Note: Our objective function and constraints satisfy conditions that work to assure
that required second-order conditions are satisfied at an allocation that satisfies the
first-order conditions that we are about to compute
Here are the first order necessary conditions for extremization (i.e., maximization with
respect to 𝐶⃗, 𝐾⃗, minimization with respect to 𝜇⃗):

𝑢′(𝐶𝑡) − 𝜇𝑡 = 0,  𝑡 = 0, 1, … , 𝑇    (7)

𝛽𝜇𝑡+1[𝑓′(𝐾𝑡+1) + (1 − 𝛿)] − 𝜇𝑡 = 0,  𝑡 = 0, 1, … , 𝑇 − 1    (8)

𝐹(𝐾𝑡, 1) + (1 − 𝛿)𝐾𝑡 − 𝐶𝑡 − 𝐾𝑡+1 = 0,  𝑡 = 0, 1, … , 𝑇    (9)

−𝜇𝑇 ≤ 0,  with equality if 𝐾𝑇+1 > 0    (10)
Note that in Eq. (8) we plugged in for ∂𝐹/∂𝐾 using our formula Eq. (6) above
Because 𝑁𝑡 = 1 for 𝑡 = 1, … , 𝑇, we need not differentiate with respect to those arguments
Note that Eq. (9) comes from the occurrence of 𝐾𝑡 in both the period 𝑡 and period 𝑡 − 1 fea-
sibility constraints
Eq. (10) comes from differentiating with respect to 𝐾𝑇 +1 in the last period and applying the
following condition called a Karush-Kuhn-Tucker condition (KKT):
𝜇𝑇 𝐾𝑇 +1 = 0 (11)
Rewriting Eq. (8) using 𝜇𝑡 = 𝑢′(𝐶𝑡) from Eq. (7) gives

𝑢′(𝐶𝑡) = 𝛽𝑢′(𝐶𝑡+1)[𝑓′(𝐾𝑡+1) + (1 − 𝛿)]

Taking the inverse of the marginal utility function on both sides of the above equation gives

𝐶𝑡+1 = 𝑢′⁻¹((𝛽[𝑓′(𝐾𝑡+1) + (1 − 𝛿)]/𝑢′(𝐶𝑡))⁻¹)

which for our utility function becomes the consumption Euler equation

𝐶𝑡+1 = (𝛽𝐶𝑡^𝛾 [𝑓′(𝐾𝑡+1) + (1 − 𝛿)])^(1/𝛾) = 𝐶𝑡 (𝛽[𝑓′(𝐾𝑡+1) + (1 − 𝛿)])^(1/𝛾)
In [2]: @njit
def u(c, γ):
'''
Utility function
ASIDE: If you have a utility function that is hard to solve by hand
you can use automatic or symbolic differentiation
See https://github.com/HIPS/autograd
'''
if γ == 1:
## If γ = 1 we can show via L'hopital's Rule that the utility becomes log
return np.log(c)
else:
return c**(1 - γ) / (1 - γ)
@njit
def u_prime(c, γ):
'''Derivative of utility'''
if γ == 1:
return 1 / c
else:
return c**(-γ)
@njit
def u_prime_inv(c, γ):
    '''Inverse of marginal utility'''
    if γ == 1:
        return 1 / c
    else:
        return c**(-1 / γ)
@njit
def f(A, k, α):
'''Production function'''
return A * k**α
@njit
def f_prime(A, k, α):
'''Derivative of production function'''
return α * A * k**(α - 1)
@njit
def f_prime_inv(A, k, α):
return (k / (A * α))**(1 / (α - 1))
We shall use a shooting method to compute an optimal allocation 𝐶,⃗ 𝐾⃗ and an associated
Lagrange multiplier sequence 𝜇⃗
The first-order necessary conditions for the planning problem, namely, equations Eq. (7),
Eq. (8), and Eq. (9), form a system of difference equations with two boundary conditions:

• the initial condition 𝐾0, which is given
• the terminal condition 𝐾𝑇+1 = 0
• Given 𝜇0 and 𝑘0 , we could compute 𝑐0 from equation Eq. (7) and then 𝑘1 from equation
Eq. (9) and 𝜇1 from equation Eq. (8)
• We could then iterate on to compute the remaining elements of 𝐶,⃗ 𝐾,⃗ 𝜇⃗
The following Python code implements the shooting algorithm for the planning problem
We make a slight modification starting with a guess of 𝑐0 but since 𝑐0 is a function of 𝜇0
there is no difference to the procedure above
We’ll apply it with an initial guess that will turn out not to be perfect, as we’ll soon see
In [3]: # Parameters
γ = 2
δ = 0.02
β = 0.95
α = 0.33
A = 1
# Initial guesses
T = 10
c = np.zeros(T+1) # T periods of consumption initialized to 0
k = np.zeros(T+2) # T periods of capital initialized to 0 (T+2 to include t+1 variable as well)
k[0] = 0.3 # Initial k
c[0] = 0.2 # Guess of c_0
@njit
def shooting_method(c,    # Initial consumption
                    k,    # Initial capital
                    γ,    # Coefficient of relative risk aversion
                    δ,    # Depreciation rate on capital
                    β,    # Discount factor
                    α,    # Return to capital per capita
                    A):   # Technology
    T = len(c) - 1
    for t in range(T):
        # Feasibility constraint: compute next-period capital
        k[t+1] = f(A=A, k=k[t], α=α) + (1 - δ) * k[t] - c[t]
        if k[t+1] < 0:    # Ensure nonnegativity
            k[t+1] = 0
        # Euler equation: compute next-period consumption
        c[t+1] = c[t] * (β * (f_prime(A=A, k=k[t+1], α=α) + (1 - δ)))**(1 / γ)
    return c, k
paths = shooting_method(c, k, γ, δ, β, α, A)

# Sketch of the plot (the original figure shows c, k and μ in separate panels)
fig, ax = plt.subplots()
ax.plot(paths[1], 'o-', label='$K_t$')
ax.scatter(T+1, 0, s=80)                   # The target K_{T+1} = 0
ax.axvline(T+1, color='k', ls='--', lw=1)
ax.legend()
plt.tight_layout()
plt.show()
Evidently, our initial guess for 𝜇0 is too high, making initial consumption too low
We know this because we miss our 𝐾𝑇 +1 = 0 target on the high side
Now we automate things with a search-for-a-good 𝜇0 algorithm that stops when we hit the
target 𝐾𝑇+1 = 0
The search procedure is to use a bisection method
Here is how we apply the bisection method
We take an initial guess for 𝐶0 (we can eliminate 𝜇0 because 𝐶0 is an exact function of 𝜇0 )
We know that the lowest 𝐶0 can ever be is 0 and the largest it can be is initial output 𝑓(𝐾0 )
We take a 𝐶0 guess and shoot forward to 𝑇 + 1
If the 𝐾𝑇 +1 > 0, let it be our new lower bound on 𝐶0
If 𝐾𝑇 +1 < 0, let it be our new upper bound
Make a new guess for 𝐶0 exactly halfway between our new upper and lower bounds
Shoot forward again and iterate the procedure
When 𝐾𝑇 +1 gets close enough to 0 (within some error tolerance bounds), stop and declare
victory
In [4]: @njit
def bisection_method(c,
                     k,
                     γ,            # Coefficient of relative risk aversion
                     δ,            # Depreciation rate on capital
                     β,            # Discount factor
                     α,            # Return to capital per capita
                     A,            # Technology
                     tol=1e-4,
                     max_iter=1e4,
                     terminal=0):  # Value we are shooting towards
    T = len(c) - 1
    i = 1                                    # Initial iteration
    c_high = f(k=k[0], α=α, A=A)             # Initial high value of c
    c_low = 0                                # Initial low value of c

    path_c, path_k = shooting_method(c, k, γ, δ, β, α, A)

    while (np.abs((path_k[T+1] - terminal)) > tol or path_k[T] == terminal) and i < max_iter:
        if path_k[T+1] - terminal > tol:     # Overshooting: c_0 too low
            c_low = c[0]
        else:                                # Undershooting: c_0 too high
            c_high = c[0]
        c[0] = (c_high + c_low) / 2          # Bisect on c_0
        path_c, path_k = shooting_method(c, k, γ, δ, β, α, A)
        i += 1

    μ = u_prime(c=path_c, γ=γ)
    return path_c, path_k, μ
In [5]: T = 10
c = np.zeros(T+1) # T periods of consumption initialized to 0
k = np.zeros(T+2) # T periods of capital initialized to 0. T+2 to include t+1 variable as well.
paths = bisection_method(c, k, γ, δ, β, α, A)
def plot_paths(paths, axes=None, ss=None):
    # Minimal sketch of the lost plotting helper (the original adds labels)
    if axes is None:
        fig, axes = plt.subplots(1, 3, figsize=(13, 3))
    for path, ax in zip(paths, axes):
        ax.plot(path)
    if ss is not None:
        axes[1].axhline(ss, c='k', ls='--', lw=1)

plot_paths(paths)
plt.show()
𝑓(𝐾̄) − 𝛿𝐾̄ = 𝐶̄    (13)

1 = 𝛽 (𝑢′(𝐶̄)/𝑢′(𝐶̄)) [𝑓′(𝐾̄) + (1 − 𝛿)]

Defining 𝛽 = 1/(1 + 𝜌) and cancelling gives

1 + 𝜌 = 𝑓′(𝐾̄) + (1 − 𝛿)

Simplifying gives
𝑓 ′ (𝐾)̄ = 𝜌 + 𝛿
and
𝐾̄ = 𝑓 ′−1 (𝜌 + 𝛿)
𝛼𝐾̄^(𝛼−1) = 𝜌 + 𝛿
Finally, using 𝛼 = .33, 𝜌 = 1/𝛽 − 1 = 1/(19/20) − 1 = 20/19 − 19/19 = 1/19, 𝛿 = 1/50, we get

𝐾̄ = ((33/100)/(1/50 + 1/19))^(100/67) ≈ 9.57583
Let’s verify this with Python and then use this steady state 𝐾̄ as our initial capital stock 𝐾0
In [6]: ρ = 1 / β - 1
k_ss = f_prime_inv(k=ρ+δ, A=A, α=α)
Now we plot
In [7]: T = 150
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_ss # Start at steady state
paths = bisection_method(c, k, γ, δ, β, α, A)
plot_paths(paths, ss=k_ss)
Evidently, in this economy with a large value of 𝑇, 𝐾𝑡 stays near its initial value until the end
of the horizon approaches
Evidently, the planner likes the steady state capital stock and wants to stay near there for a
long time
Let’s see what happens when we push the initial 𝐾0 below 𝐾̄
plot_paths(paths, ss=k_ss)
Notice how the planner pushes capital toward the steady state, stays near there for a while,
then pushes 𝐾𝑡 toward the terminal value 𝐾𝑇 +1 = 0 as 𝑡 gets close to 𝑇
The following graphs compare outcomes as we vary 𝑇
for T in T_list:
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_init
paths = bisection_method(c, k, γ, δ, β, α, A)
plot_paths(paths, ss=k_ss, axes=axes)
The following calculation shows that when we set 𝑇 very large the planner makes the capital
stock spend most of its time close to its steady state value
for T in T_list:
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_init
paths = bisection_method(c, k, γ, δ, β, α, A)
plot_paths(paths, ss=k_ss, axes=axes)
The different colors in the above graphs are tied to outcomes with different horizons 𝑇
Notice that as the horizon increases, the planner puts 𝐾𝑡 closer to the steady state value 𝐾̄
for longer
This pattern reflects a turnpike property of the steady state
A rule of thumb for the planner is
• for whatever 𝐾0 you start with, push 𝐾𝑡 toward the steady state and stay there for as
long as you can
In loose language: head for the turnpike and stay near it for as long as you can
As we drive 𝑇 toward +∞, the planner keeps 𝐾𝑡 very close to its steady state for all dates
after some transition toward the steady state
The planner makes the saving rate (𝑓(𝐾𝑡) − 𝐶𝑡)/𝑓(𝐾𝑡) vary over time
Let’s calculate it
In [11]: @njit
def S(K):
'''Aggregate savings'''
T = len(K) - 2
S = np.zeros(T+1)
for t in range(T+1):
S[t] = K[t+1] - (1 - δ) * K[t]
return S
@njit
def s(K):
'''Savings rate'''
T = len(K) - 2
Y = f(A, K, α)
Y = Y[0:T+1]
s = S(K) / Y
return s
def plot_savings(paths, k_ss=None, s_ss=None, c_ss=None, axes=None):
    # Minimal sketch of the savings plotting helper used below
    savings_path = s(paths[1])               # Savings rate along the capital path
    new_paths = (paths[0], paths[1], savings_path)
    if axes is None:
        fig, axes = plt.subplots(1, 3, figsize=(13, 3))
    for path, ax in zip(new_paths, axes):
        ax.plot(path)
    for ax, val in zip(axes, (c_ss, k_ss, s_ss)):
        if val is not None:
            ax.axhline(val, c='k', ls='--', lw=1)
for T in T_list:
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_init
paths = bisection_method(c, k, γ, δ, β, α, A)
plot_savings(paths, k_ss=k_ss, axes=axes)
The steady state savings rate is

𝑠̄ = 𝛿𝐾̄/𝑓(𝐾̄)
In [12]: T = 130
# Steady states
S_ss = δ * k_ss
c_ss = f(A, k_ss, α) - S_ss
s_ss = S_ss / f(A, k_ss, α)
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_ss / 3 # Start below steady state
paths = bisection_method(c, k, γ, δ, β, α, A, terminal=k_ss)
plot_savings(paths, k_ss=k_ss, s_ss=s_ss, c_ss=c_ss)
28.3.5 Exercise
• Plot the optimal consumption, capital, and savings paths when the initial capital level
begins at 1.5 times the steady state level as we shoot towards the steady state at 𝑇 =
130
• Why does the savings rate respond like it does?
28.3.6 Solution
In [13]: T = 130
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_ss * 1.5 # Start above steady state
paths = bisection_method(c, k, γ, δ, β, α, A, terminal=k_ss)
plot_savings(paths, k_ss=k_ss, s_ss=s_ss, c_ss=c_ss)
Next, we study a decentralized version of an economy with the same technology and prefer-
ence structure as our planned economy
But now there is no planner
Market prices adjust to reconcile distinct decisions that are made separately by a representa-
tive household and a representative firm
The technology for producing goods and accumulating capital via physical investment re-
mains as in our planned economy
There is a representative consumer who has the same preferences over consumption plans as
did the consumer in the planned economy
Instead of being told what to consume and save by a planner, the household chooses for itself
subject to a budget constraint
• At each time 𝑡, the household receives wages and rentals of capital from a firm – these
comprise its income at time 𝑡
• The consumer decides how much income to allocate to consumption or to savings
• The household can save either by acquiring additional physical capital (it trades one
for one with time 𝑡 consumption) or by acquiring claims on consumption at dates other
than 𝑡
• A utility-maximizing household owns all physical capital and labor and rents them to
the firm
• The household consumes, supplies labor, and invests in physical capital
• A profit-maximizing representative firm operates the production technology
• The firm rents labor and capital each period from the representative household and sells
its output each period to the household
• The representative household and the representative firm are both price takers:
– they (correctly) believe that prices are not affected by their choices
Note: We are free to think of there being a large number 𝑀 of identical representative con-
sumers and 𝑀 identical representative firms
28.4. COMPETITIVE EQUILIBRIUM 493
The firm's profits at time $t$ are

$$F(\tilde{k}_t, \tilde{n}_t) - w_t \tilde{n}_t - \eta_t \tilde{k}_t$$

First-order conditions for maximizing profits are

$$F_k(\tilde{k}_t, \tilde{n}_t) = \eta_t$$

and

$$F_n(\tilde{k}_t, \tilde{n}_t) = w_t \tag{14}$$
Because $F$ is homogeneous of degree 1, $F(\lambda \tilde{k}_t, \lambda \tilde{n}_t) = \lambda F(\tilde{k}_t, \tilde{n}_t)$ for $\lambda \in (0, 1)$

Taking the partial derivative $\frac{\partial F}{\partial \lambda}$ on both sides of the above equation gives, via the chain rule,

$$F(\tilde{k}_t, \tilde{n}_t) = \frac{\partial F}{\partial \tilde{k}_t} \tilde{k}_t + \frac{\partial F}{\partial \tilde{n}_t} \tilde{n}_t$$

Rewriting the firm's profits as

$$\frac{\partial F}{\partial \tilde{k}_t} \tilde{k}_t + \frac{\partial F}{\partial \tilde{n}_t} \tilde{n}_t - w_t \tilde{n}_t - \eta_t \tilde{k}_t$$

or

$$\left( \frac{\partial F}{\partial \tilde{k}_t} - \eta_t \right) \tilde{k}_t + \left( \frac{\partial F}{\partial \tilde{n}_t} - w_t \right) \tilde{n}_t$$
Because $F$ is homogeneous of degree 1, it follows that $\frac{\partial F}{\partial \tilde{k}_t}$ and $\frac{\partial F}{\partial \tilde{n}_t}$ are homogeneous of degree 0 and therefore fixed with respect to $\tilde{k}_t$ and $\tilde{n}_t$

If $\frac{\partial F}{\partial \tilde{k}_t} > \eta_t$, then the firm makes positive profits on each additional unit of $\tilde{k}_t$, so it will want to make $\tilde{k}_t$ arbitrarily large

But setting $\tilde{k}_t = +\infty$ is not physically feasible, so presumably equilibrium prices will assume values that present the firm with no such arbitrage opportunity

A related argument applies if $\frac{\partial F}{\partial \tilde{n}_t} > w_t$

If $\frac{\partial F}{\partial \tilde{k}_t} < \eta_t$, the firm will set $\tilde{k}_t$ to zero
The household's time $t$ income from supplying its one unit of labor and its capital is $w_t \cdot 1 + \eta_t k_t$
Here (𝑘𝑡+1 − (1 − 𝛿)𝑘𝑡 ) is the household’s net investment in physical capital and 𝛿 ∈ (0, 1) is
again a depreciation rate of capital
In period $t$, the household is free to purchase more goods to be consumed and invested in physical capital than its income from supplying capital and labor to the firm, provided that in some other periods its income exceeds its purchases
A household's net excess demand for time $t$ consumption goods is the gap

$$e_t := c_t + (k_{t+1} - (1 - \delta)k_t) - (w_t \cdot 1 + \eta_t k_t)$$
There is a single grand competitive market in which a representative household can trade
date 0 goods for goods at all other dates 𝑡 = 1, 2, … , 𝑇
What matters are not bilateral trades of the good at one date 𝑡 for the good at another date
𝑡 ̃ ≠ 𝑡.
Instead, think of there being multilateral and multitemporal trades in which bundles of
goods at some dates can be traded for bundles of goods at some other dates.
There exist complete markets in such bundles with associated market prices
Because 𝑞𝑡0 is a relative price, the units in terms of which prices are quoted are arbitrary –
we can normalize them without substantial consequence
If we use the price vector $\{q_t^0\}_{t=0}^T$ to evaluate a stream of excess demands $\{e_t\}_{t=0}^T$, we compute the present value of $\{e_t\}_{t=0}^T$ to be $\sum_{t=0}^T q_t^0 e_t$
That the market is multitemporal is reflected in the situation that the household faces a
single budget constraint
It states that the present value of the household’s net excess demands must be zero:
$$\sum_{t=0}^{T} q_t^0 e_t \le 0$$
or
$$\sum_{t=0}^{T} q_t^0 \left( c_t + (k_{t+1} - (1 - \delta)k_t) - (w_t \cdot 1 + \eta_t k_t) \right) \le 0$$
$$\max_{\vec{c}, \vec{k}} \sum_{t=0}^{T} \beta^t u(c_t)$$

$$\text{subject to} \quad \sum_{t=0}^{T} q_t^0 \left( c_t + (k_{t+1} - (1 - \delta)k_t) - w_t - \eta_t k_t \right) \le 0$$
28.4.6 Definitions
– Given the price system, the allocation solves the household’s problem
– Given the price system, the allocation solves the firm’s problem
𝜂𝑡 = 𝑓 ′ (𝐾𝑡 ) (17)
and so on
If our guess for the equilibrium price system is correct, then it must occur that
$$k_t^* = \tilde{k}_t^* \tag{19}$$

$$1 = \tilde{n}_t^* \tag{20}$$

$$c_t^* + k_{t+1}^* - (1 - \delta)k_t^* = F(\tilde{k}_t^*, \tilde{n}_t^*)$$
We shall verify that for 𝑡 = 0, … , 𝑇 the allocations chosen by the household and the firm both
equal the allocation that solves the planning problem:
Our approach is to stare at first-order necessary conditions for the optimization problems of
the household and the firm
At the price system we have guessed, both sets of first-order conditions are satisfied at the
allocation that solves the planning problem
To solve the household’s problem, we formulate the appropriate Lagrangian and pose the
min-max problem:
$$\min_{\lambda} \max_{\vec{c}, \vec{k}} \mathcal{L}(\vec{c}, \vec{k}, \lambda) = \sum_{t=0}^{T} \beta^t u(c_t) + \lambda \left( \sum_{t=0}^{T} q_t^0 \left( ((1 - \delta)k_t + w_t) + \eta_t k_t - c_t - k_{t+1} \right) \right)$$
$$\lambda: \quad \left( \sum_{t=0}^{T} q_t^0 \left( c_t + (k_{t+1} - (1 - \delta)k_t) - w_t - \eta_t k_t \right) \right) \le 0 \tag{24}$$
Now we plug in for our guesses of prices and derive all the FONC of the planner problem
Eq. (7)-Eq. (10):
Combining Eq. (22) and Eq. (15), we get:
𝑢′ (𝐶𝑡 ) = 𝜇𝑡
Rewriting Eq. (26) by dividing both sides by $\lambda$ (which is nonzero because $u' > 0$), we get:
or
$$\sum_{t=0}^{T} \beta^t \mu_t \left( C_t + (K_{t+1} - (1 - \delta)K_t) - f(K_t) + K_t f'(K_t) - f'(K_t)K_t \right) \le 0$$
Cancelling,
$$\sum_{t=0}^{T} \beta^t \mu_t \left( C_t + K_{t+1} - (1 - \delta)K_t - F(K_t, 1) \right) \le 0$$
Since $\beta^t$ and $\mu_t$ are always positive here (excepting perhaps the $T+1$ period), we get

$$-\beta^{T+1} \mu_{T+1} \le 0$$

and, dividing by $\beta^{T+1}$,

$$-\mu_{T+1} \le 0$$
$$\frac{\partial F(K_t, 1)}{\partial K_t} = f'(K_t) = \eta_t$$

$$\frac{\partial F(K_t, 1)}{\partial \tilde{L}} = f(K_t) - f'(K_t)K_t = w_t$$
By Eq. (19) and Eq. (20) this allocation is identical to the one that solves the consumer’s
problem
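These marginal-product pricing conditions can be checked numerically for a Cobb-Douglas $f(K) = AK^{\alpha}$ (the functional form used in the lecture's code; the parameter values here are illustrative). By Euler's theorem, paying each factor its marginal product exhausts output, so the firm earns zero profits:

```python
import numpy as np

# Illustrative Cobb-Douglas technology f(K) = A K**α, with labor fixed at 1
A, α = 1.0, 0.33
K = np.linspace(0.5, 5.0, 10)

f = lambda k: A * k**α                  # output F(K, 1)
f_prime = lambda k: α * A * k**(α - 1)  # marginal product of capital

η = f_prime(K)               # rental rate: ∂F/∂K
w = f(K) - f_prime(K) * K    # wage: ∂F/∂L evaluated at L = 1

# Euler's theorem: factor payments exhaust output, so profits are zero
assert np.allclose(w * 1 + η * K, f(K))
```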
Note: Because budget sets are affected only by relative prices, $\{q_t^0\}$ is determined only up to multiplication by a positive constant

Normalization: We are free to choose a $\{q_t^0\}$ that makes $\lambda = 1$, thereby making $q_t^0$ be measured in units of the marginal utility of time 0 goods
We will also plot q, w and 𝜂 below to show the prices that induce the same aggregate move-
ments we saw earlier in the planning problem.
In [14]: @njit
def q_func(β, c, γ):
# Here we choose numeraire to be u'(c_0) -- this is q^(t_0)_t
T = len(c) - 2
q = np.zeros(T+1)
q[0] = 1
for t in range(1, T+1):
q[t] = β**t * u_prime(c[t], γ)
return q
@njit
def w_func(A, k, α):
w = f(A, k, α) - k * f_prime(A, k, α)
return w
@njit
def η_func(A, k, α):
η = f_prime(A, k, α)
return η
for T in T_list:
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_ss / 3
c, k, μ = bisection_method(c, k, γ, δ, β, α, A)
q = q_func(β, c, γ)
w = w_func(A, k, α)[:-1]
η = η_func(A, k, α)[:-1]
plots = [q, w, η, c, k, μ]
plt.tight_layout()
plt.show()
Varying Curvature
Now we see how our results change if we keep $T$ constant, but allow the curvature parameter $\gamma$ to vary, starting with $K_0$ below the steady state.
We plot the results for 𝑇 = 150
for γ in γ_list:
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_ss / 3
c, k, μ = bisection_method(c, k, γ, δ, β, α, A)
q = q_func(β, c, γ)
w = w_func(A, k, α)[:-1]
η = η_func(A, k, α)[:-1]
plots = [q, w, η, c, k, μ]
axes[0, 0].legend()
plt.tight_layout()
plt.show()
Now, we compute Hicks-Arrow prices again, but also calculate the implied yields to maturity
This will let us plot a yield curve
The key formulas are:
The yield to maturity

$$r_{t_0, t} = -\frac{\log q_t^{t_0}}{t - t_0}$$

and the Hicks-Arrow price

$$q_t^{t_0} = \beta^{t - t_0} \frac{u'(c_t)}{u'(c_{t_0})} = \beta^{t - t_0} \frac{c_t^{-\gamma}}{c_{t_0}^{-\gamma}}$$
We redefine our function for 𝑞 to allow arbitrary base years, and define a new function for 𝑟,
then plot both
First, we plot when 𝑡0 = 0 as before, for different values of 𝑇 , with 𝐾0 below the steady state
In [17]: @njit
def q_func(t_0, β, c, γ):
# Here we choose numeraire to be u'(c_0) -- this is q^(t_0)_t
T = len(c)
q = np.zeros(T+1-t_0)
q[0] = 1
for t in range(t_0+1, T):
q[t-t_0] = β**(t - t_0) * u_prime(c[t], γ) / u_prime(c[t_0], γ)
return q
@njit
def r_func(t_0, β, c, γ):
'''Yield to maturity'''
T = len(c) - 1
r = np.zeros(T+1-t_0)
for t in range(t_0+1, T+1):
r[t-t_0]= -np.log(q_func(t_0, β, c, γ)[t-t_0]) / (t - t_0)
return r
t_0 = 0
T_list = [150, 75, 50]
γ = 2
titles = ['Hicks-Arrow Prices', 'Yields']
ylabels = ['$q_t^0$', '$r_t^0$']
for T in T_list:
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_ss / 3
c, k, μ = bisection_method(c, k, γ, δ, β, α, A)
q = q_func(t_0, β, c, γ)
r = r_func(t_0, β, c, γ)
plt.tight_layout()
plt.show()
In [18]: t_0 = 20
for T in T_list:
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_ss / 3
c, k, μ = bisection_method(c, k, γ, δ, β, α, A)
q = q_func(t_0, β, c, γ)
r = r_func(t_0, β, c, γ)
We shall have more to say about the term structure of interest rates in a later lecture on the
topic
29 A First Look at the Kalman Filter
29.1 Contents
• Overview 29.2
• The Basic Idea 29.3
• Convergence 29.4
• Implementation 29.5
• Exercises 29.6
• Solutions 29.7
In addition to what’s in Anaconda, this lecture will need the following libraries
29.2 Overview
This lecture provides a simple and intuitive introduction to the Kalman filter, for those who
either
• have heard of the Kalman filter but don’t know how it works, or
• know the Kalman filter equations, but don’t know where they come from
506 29. A FIRST LOOK AT THE KALMAN FILTER
The Kalman filter has many applications in economics, but for now let’s pretend that we are
rocket scientists
A missile has been launched from country Y and our mission is to track it
Let 𝑥 ∈ R2 denote the current location of the missile—a pair indicating latitude-longitude
coordinates on a map
At the present moment in time, the precise location 𝑥 is unknown, but we do have some be-
liefs about 𝑥
One way to summarize our knowledge is a point prediction 𝑥̂
• But what if the President wants to know the probability that the missile is currently
over the Sea of Japan?
• Then it is better to summarize our initial beliefs with a bivariate probability density 𝑝
– ∫𝐸 𝑝(𝑥)𝑑𝑥 indicates the probability that we attach to the missile being in region 𝐸
$$p = N(\hat{x}, \Sigma) \tag{1}$$

where $\hat{x}$ is the mean of the distribution and $\Sigma$ is a $2 \times 2$ covariance matrix. In our simulations, we will suppose that
This density 𝑝(𝑥) is shown below as a contour map, with the center of the red ellipse being
equal to 𝑥̂
Parameters
----------
x : array_like(float)
Random variable
y : array_like(float)
Random variable
σ_x : array_like(float)
Standard deviation of random variable x
σ_y : array_like(float)
Standard deviation of random variable y
μ_x : scalar(float)
Mean value of random variable x
μ_y : scalar(float)
Mean value of random variable y
σ_xy : array_like(float)
Covariance of random variables x and y
"""
x_μ = x - μ_x
y_μ = y - μ_y
Z = gen_gaussian_plot_vals(x_hat, Σ)
ax.contourf(X, Y, Z, 6, alpha=0.6, cmap=cm.jet)
cs = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs, inline=1, fontsize=10)
plt.show()
We are now presented with some good news and some bad news
The good news is that the missile has been located by our sensors, which report that the cur-
rent location is 𝑦 = (2.3, −1.9)
The next figure shows the original prior 𝑝(𝑥) and the new reported location 𝑦
Z = gen_gaussian_plot_vals(x_hat, Σ)
ax.contourf(X, Y, Z, 6, alpha=0.6, cmap=cm.jet)
cs = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs, inline=1, fontsize=10)
ax.text(float(y[0]), float(y[1]), "$y$", fontsize=20, color="black")
plt.show()
29.3. THE BASIC IDEA 509
Here 𝐺 and 𝑅 are 2 × 2 matrices with 𝑅 positive definite. Both are assumed known, and the
noise term 𝑣 is assumed to be independent of 𝑥
How then should we combine our prior 𝑝(𝑥) = 𝑁 (𝑥,̂ Σ) and this new information 𝑦 to improve
our understanding of the location of the missile?
As you may have guessed, the answer is to use Bayes’ theorem, which tells us to update our
prior 𝑝(𝑥) to 𝑝(𝑥 | 𝑦) via
$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}$$
• 𝑝(𝑥) = 𝑁 (𝑥,̂ Σ)
• In view of Eq. (3), the conditional density 𝑝(𝑦 | 𝑥) is 𝑁 (𝐺𝑥, 𝑅)
• 𝑝(𝑦) does not depend on 𝑥, and enters into the calculations only as a normalizing con-
stant
Because we are in a linear and Gaussian framework, the updated density can be computed by
calculating population linear regressions
In particular, the solution is known [1] to be
$$p(x \mid y) = N(\hat{x}^F, \Sigma^F)$$

where

$$\hat{x}^F := \hat{x} + \Sigma G' (G \Sigma G' + R)^{-1} (y - G\hat{x}) \quad \text{and} \quad \Sigma^F := \Sigma - \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma \tag{4}$$
Here $\Sigma G' (G \Sigma G' + R)^{-1}$ is the matrix of population regression coefficients of the hidden object $x - \hat{x}$ on the surprise $y - G\hat{x}$
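Eq. (4) is easy to check numerically. In the sketch below, $G$ is the identity and $R = 0.5\Sigma$ (the choices used for the figure), while the prior mean and covariance are made-up stand-ins; with these choices the regression-coefficient matrix reduces to $\frac{2}{3}I$ and the posterior covariance to $\Sigma / 3$:

```python
import numpy as np

# Made-up prior beliefs; G and R follow the figure's choices G = I, R = 0.5 Σ
x_hat = np.array([0.2, -0.2])
Σ = np.array([[0.4, 0.3],
              [0.3, 0.45]])
G = np.eye(2)
R = 0.5 * Σ
y = np.array([2.3, -1.9])

# Eq. (4): regress the hidden x - x̂ on the surprise y - G x̂
M = Σ @ G.T @ np.linalg.inv(G @ Σ @ G.T + R)   # regression coefficients
x_hat_F = x_hat + M @ (y - G @ x_hat)
Σ_F = Σ - M @ G @ Σ

# With G = I and R = 0.5 Σ, M = (2/3) I and Σ_F = Σ / 3
```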
This new density 𝑝(𝑥 | 𝑦) = 𝑁 (𝑥𝐹̂ , Σ𝐹 ) is shown in the next figure via contour lines and the
color map
The original density is left in as contour lines for comparison
Z = gen_gaussian_plot_vals(x_hat, Σ)
cs1 = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs1, inline=1, fontsize=10)
M = Σ * G.T * linalg.inv(G * Σ * G.T + R)
x_hat_F = x_hat + M * (y - G * x_hat)
Σ_F = Σ - M * G * Σ
new_Z = gen_gaussian_plot_vals(x_hat_F, Σ_F)
cs2 = ax.contour(X, Y, new_Z, 6, colors="black")
ax.clabel(cs2, inline=1, fontsize=10)
ax.contourf(X, Y, new_Z, 6, alpha=0.6, cmap=cm.jet)
ax.text(float(y[0]), float(y[1]), "$y$", fontsize=20, color="black")
plt.show()
Our new density twists the prior $p(x)$ in a direction determined by the new information $y - G\hat{x}$
In generating the figure, we set 𝐺 to the identity matrix and 𝑅 = 0.5Σ for Σ defined in
Eq. (2)
But now let’s suppose that we are given another task: to predict the location of the missile
after one unit of time (whatever that may be) has elapsed
To do this we need a model of how the state evolves
Let’s suppose that we have one, and that it’s linear and Gaussian. In particular,
Our aim is to combine this law of motion and our current distribution 𝑝(𝑥 | 𝑦) = 𝑁 (𝑥𝐹̂ , Σ𝐹 ) to
come up with a new predictive distribution for the location in one unit of time
In view of Eq. (5), all we have to do is introduce a random vector 𝑥𝐹 ∼ 𝑁 (𝑥𝐹̂ , Σ𝐹 ) and work
out the distribution of 𝐴𝑥𝐹 + 𝑤 where 𝑤 is independent of 𝑥𝐹 and has distribution 𝑁 (0, 𝑄)
Since linear combinations of Gaussians are Gaussian, 𝐴𝑥𝐹 + 𝑤 is Gaussian
Elementary calculations and the expressions in Eq. (4) tell us that
and
The matrix 𝐴Σ𝐺′ (𝐺Σ𝐺′ + 𝑅)−1 is often written as 𝐾Σ and called the Kalman gain
• The subscript Σ has been added to remind us that 𝐾Σ depends on Σ, but not 𝑦 or 𝑥̂
$$\hat{x}_{new} := A\hat{x} + K_{\Sigma}(y - G\hat{x})$$

$$\Sigma_{new} := A \Sigma A' - K_{\Sigma} G \Sigma A' + Q \tag{6}$$
The predictive distribution is the new density shown in the following figure, where the update
has used parameters
$$A = \begin{pmatrix} 1.2 & 0.0 \\ 0.0 & -0.2 \end{pmatrix}, \qquad Q = 0.3 \Sigma$$
# Density 1
Z = gen_gaussian_plot_vals(x_hat, Σ)
cs1 = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs1, inline=1, fontsize=10)
# Density 2
M = Σ * G.T * linalg.inv(G * Σ * G.T + R)
x_hat_F = x_hat + M * (y - G * x_hat)
Σ_F = Σ - M * G * Σ
Z_F = gen_gaussian_plot_vals(x_hat_F, Σ_F)
cs2 = ax.contour(X, Y, Z_F, 6, colors="black")
ax.clabel(cs2, inline=1, fontsize=10)
# Density 3
new_x_hat = A * x_hat_F
new_Σ = A * Σ_F * A.T + Q
new_Z = gen_gaussian_plot_vals(new_x_hat, new_Σ)
cs3 = ax.contour(X, Y, new_Z, 6, colors="black")
plt.show()
Repeating Eq. (6), the dynamics for $\hat{x}_t$ and $\Sigma_t$ are as follows

$$\hat{x}_{t+1} = A\hat{x}_t + K_{\Sigma_t}(y_t - G\hat{x}_t)$$

$$\Sigma_{t+1} = A \Sigma_t A' - K_{\Sigma_t} G \Sigma_t A' + Q \tag{7}$$
These are the standard dynamic equations for the Kalman filter (see, for example, [87], page
58)
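One pass of Eq. (7) can be written as a short plain-NumPy function. This is a sketch, not the library implementation; the parameter values are illustrative (identity observation and noise matrices, not the figure's exact $\Sigma$-based ones):

```python
import numpy as np

def kalman_step(A, G, Q, R, x_hat, Σ, y):
    """One pass of Eq. (7): condition on the observation y, then predict."""
    K = A @ Σ @ G.T @ np.linalg.inv(G @ Σ @ G.T + R)  # Kalman gain K_Σ
    x_hat_new = A @ x_hat + K @ (y - G @ x_hat)
    Σ_new = A @ Σ @ A.T - K @ G @ Σ @ A.T + Q
    return x_hat_new, Σ_new

# Illustrative parameters (made up for this sketch)
A = np.array([[1.2, 0.0],
              [0.0, -0.2]])
G = np.eye(2)
Q, R = 0.3 * np.eye(2), 0.5 * np.eye(2)

x_hat, Σ = np.zeros(2), np.eye(2)     # made-up prior beliefs
x_hat, Σ = kalman_step(A, G, Q, R, x_hat, Σ, y=np.array([2.3, -1.9]))
```

The `Kalman` class from QuantEcon.py, introduced in the Implementation section below, packages this same recursion.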
29.4 Convergence
29.5 Implementation
The class Kalman from the QuantEcon.py package implements the Kalman filter
𝑄 ∶= 𝐶𝐶 ′ and 𝑅 ∶= 𝐻𝐻 ′
• The class Kalman from the QuantEcon.py package has a number of methods, some that
we will wait to use until we study more advanced applications in subsequent lectures
29.6 Exercises
29.6.1 Exercise 1
Consider the following simple application of the Kalman filter, loosely based on [87], section
2.9.2
Suppose that
The task of this exercise is to simulate the model and, using the code from kalman.py, plot the first five predictive densities $p_t(x) = N(\hat{x}_t, \Sigma_t)$
As shown in [87], sections 2.9.1–2.9.2, these distributions asymptotically put all mass on the
unknown value 𝜃
In the simulation, take 𝜃 = 10, 𝑥0̂ = 8 and Σ0 = 1
Your figure should – modulo randomness – look something like this
29.6.2 Exercise 2
The preceding figure gives some support to the idea that probability mass converges to 𝜃
To get a better idea, choose a small 𝜖 > 0 and calculate
$$z_t := 1 - \int_{\theta - \epsilon}^{\theta + \epsilon} p_t(x)\, dx$$
for 𝑡 = 0, 1, 2, … , 𝑇
Plot $z_t$ against $t$, setting $\epsilon = 0.1$ and $T = 600$
Your figure should show the error erratically declining, something like this
29.6.3 Exercise 3
As discussed above, if the shock sequence {𝑤𝑡 } is not degenerate, then it is not in general
possible to predict 𝑥𝑡 without error at time 𝑡 − 1 (and this would be the case even if we could
observe 𝑥𝑡−1 )
Let’s now compare the prediction 𝑥𝑡̂ made by the Kalman filter against a competitor who is
allowed to observe 𝑥𝑡−1
This competitor will use the conditional expectation E[𝑥𝑡 | 𝑥𝑡−1 ], which in this case is 𝐴𝑥𝑡−1
The conditional expectation is known to be the optimal prediction method in terms of mini-
mizing mean squared error
(More precisely, the minimizer of E ‖𝑥𝑡 − 𝑔(𝑥𝑡−1 )‖2 with respect to 𝑔 is 𝑔∗ (𝑥𝑡−1 ) ∶=
E[𝑥𝑡 | 𝑥𝑡−1 ])
Thus we are comparing the Kalman filter against a competitor who has more information (in
the sense of being able to observe the latent state) and behaves optimally in terms of mini-
mizing squared error
Our horse race will be assessed in terms of squared error
In particular, your task is to generate a graph plotting observations of both ‖𝑥𝑡 − 𝐴𝑥𝑡−1 ‖2 and
‖𝑥𝑡 − 𝑥𝑡̂ ‖2 against 𝑡 for 𝑡 = 1, … , 50
For the parameters, set 𝐺 = 𝐼, 𝑅 = 0.5𝐼 and 𝑄 = 0.3𝐼, where 𝐼 is the 2 × 2 identity
Set
$$A = \begin{pmatrix} 0.5 & 0.4 \\ 0.6 & 0.3 \end{pmatrix}, \qquad \Sigma_0 = \begin{pmatrix} 0.9 & 0.3 \\ 0.3 & 0.9 \end{pmatrix}$$
Observe how, after an initial learning period, the Kalman filter performs quite well, even rela-
tive to the competitor who predicts optimally with knowledge of the latent state
29.6.4 Exercise 4
29.7 Solutions
In [6]: from quantecon import Kalman
from quantecon import LinearStateSpace
from scipy.stats import norm
29.7.1 Exercise 1
In [7]: # == parameters == #
θ = 10 # Constant value of state x_t
A, C, G, H = 1, 0, 1, 1
ss = LinearStateSpace(A, C, G, H, mu_0=θ)
# == set up plot == #
fig, ax = plt.subplots(figsize=(10,8))
xgrid = np.linspace(θ - 5, θ + 2, 200)
for i in range(N):
# == record the current predicted mean and variance == #
m, v = [float(z) for z in (kalman.x_hat, kalman.Sigma)]
# == plot, update filter == #
ax.plot(xgrid, norm.pdf(xgrid, loc=m, scale=np.sqrt(v)), label=f'$t={i}$')
kalman.update(y[i])
29.7.2 Exercise 2
In [8]: from scipy.integrate import quad
ϵ = 0.1
θ = 10 # Constant value of state x_t
A, C, G, H = 1, 0, 1, 1
ss = LinearStateSpace(A, C, G, H, mu_0=θ)
x_hat_0, Σ_0 = 8, 1
kalman = Kalman(ss, x_hat_0, Σ_0)
T = 600
z = np.empty(T)
x, y = ss.simulate(T)
y = y.flatten()
for t in range(T):
# Record the current predicted mean and variance and plot their densities
m, v = [float(temp) for temp in (kalman.x_hat, kalman.Sigma)]
kalman.update(y[t])
29.7.3 Exercise 3
In [9]: from numpy.random import multivariate_normal
from scipy.linalg import eigvals
G = np.identity(2)                  # observe the state directly
H = np.sqrt(0.5) * np.identity(2)   # R = HH' = 0.5 I, as specified above
A = [[0.5, 0.4],
     [0.6, 0.3]]
C = np.sqrt(0.3) * np.identity(2)   # Q = CC' = 0.3 I
# === Set up state space model, initial value x_0 set to zero === #
ss = LinearStateSpace(A, C, G, H, mu_0=np.zeros(2))
Σ_0 = [[0.9, 0.3],
       [0.3, 0.9]]
kn = Kalman(ss, np.zeros(2), Σ_0)
# == Print eigenvalues of A == #
print("Eigenvalues of A:")
print(eigvals(A))
# == Print stationary Σ == #
S, K = kn.stationary_values()
print("Stationary prediction error variance:")
print(S)
e1 = np.empty(T-1)
e2 = np.empty(T-1)
fig, ax = plt.subplots(figsize=(9,6))
ax.plot(range(1, T), e1, 'k-', lw=2, alpha=0.6, label='Kalman filter error')
ax.plot(range(1, T), e2, 'g-', lw=2, alpha=0.6, label='Conditional expectation error')
ax.legend()
plt.show()
Eigenvalues of A:
[ 0.9+0.j -0.1+0.j]
Stationary prediction error variance:
[[0.40329108 0.1050718 ]
[0.1050718 0.41061709]]
Footnotes
[1] See, for example, page 93 of [18]. To get from his expressions to the ones used above, you
will also need to apply the Woodbury matrix identity.
30 Reverse Engineering a la Muth
30.1 Contents
%matplotlib inline
np.set_printoptions(linewidth=120, precision=4, suppress=True)
This lecture uses the Kalman filter to reformulate John F. Muth’s first paper [98] about ratio-
nal expectations
Muth used classical prediction methods to reverse engineer a stochastic process that renders
optimal Milton Friedman’s [43] “adaptive expectations” scheme
Milton Friedman [43] (1956) posited that consumers forecast their future disposable income with the adaptive expectations scheme
$$y^*_{t+i,t} = K \sum_{j=0}^{\infty} (1 - K)^j y_{t-j} \tag{1}$$

where $K \in (0, 1)$ and $y^*_{t+i,t}$ is a forecast of future $y$ over horizon $i$
524 30. REVERSE ENGINEERING A LA MUTH
Milton Friedman justified the exponential smoothing forecasting scheme Eq. (1) informally, noting that it seemed a plausible way to use past income to forecast future income
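The geometric sum in Eq. (1) can be computed recursively: the forecast $\hat{y}_t = K\sum_{j \ge 0}(1-K)^j y_{t-j}$ satisfies $\hat{y}_t = K y_t + (1-K)\hat{y}_{t-1}$, which is exponential smoothing. The sketch below, with a made-up $K$ and income series, confirms the equivalence:

```python
import numpy as np

K = 0.5                              # illustrative smoothing parameter
rng = np.random.default_rng(0)
y = rng.standard_normal(500)         # made-up income series y_0, ..., y_T

# Direct (truncated) version of Eq. (1): K Σ_j (1 - K)^j y_{T-j}
T = len(y) - 1
weights = K * (1 - K) ** np.arange(T + 1)
direct = weights @ y[::-1]

# Equivalent recursion: forecast_t = K y_t + (1 - K) forecast_{t-1}
forecast = 0.0
for y_t in y:
    forecast = K * y_t + (1 - K) * forecast

assert np.isclose(direct, forecast)  # the two computations agree (up to rounding)
```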
In his first paper about rational expectations, John F. Muth [98] reverse-engineered a univariate stochastic process $\{y_t\}_{t=-\infty}^{\infty}$ for which Milton Friedman's adaptive expectations scheme gives linear least squares forecasts of $y_{t+i}$ for any horizon $i$
Muth sought a setting and a sense in which Friedman’s forecasting scheme is optimal
That is, Muth asked for what optimal forecasting question is Milton Friedman’s adaptive
expectation scheme the answer
Muth (1960) used classical prediction methods based on lag-operators and 𝑧-transforms to
find the answer to his question
Please see lectures Classical Control with Linear Algebra and Classical Filtering and Predic-
tion with Linear Algebra for an introduction to the classical tools that Muth used
Rather than using those classical tools, in this lecture we apply the Kalman filter to express
the heart of Muth’s analysis concisely
The lecture First Look at Kalman Filter describes the Kalman filter
We’ll use limiting versions of the Kalman filter corresponding to what are called stationary
values in that lecture
Suppose that an observable 𝑦𝑡 is the sum of an unobserved random walk 𝑥𝑡 and an IID shock
𝜖2,𝑡 :
$$x_{t+1} = x_t + \sigma_x \epsilon_{1,t+1}$$

$$y_t = x_t + \sigma_y \epsilon_{2,t} \tag{2}$$

where

$$\begin{bmatrix} \epsilon_{1,t+1} \\ \epsilon_{2,t} \end{bmatrix} \sim \mathcal{N}(0, I)$$
is an IID process
Note: A property of the state-space representation Eq. (2) is that in general neither 𝜖1,𝑡 nor
𝜖2,𝑡 is in the space spanned by square-summable linear combinations of 𝑦𝑡 , 𝑦𝑡−1 , …
In general $\begin{bmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{bmatrix}$ has more information about future $y_{t+j}$'s than is contained in $y_t, y_{t-1}, \ldots$
We can use the asymptotic or stationary values of the Kalman gain and the one-step-ahead
conditional state covariance matrix to compute a time-invariant innovations representation
$$\hat{x}_{t+1} = \hat{x}_t + K a_t$$

$$y_t = \hat{x}_t + a_t \tag{3}$$
Note: A key property about an innovations representation is that 𝑎𝑡 is in the space spanned
by square summable linear combinations of 𝑦𝑡 , 𝑦𝑡−1 , …
For more ramifications of this property, see the lectures Shock Non-Invertibility and Recursive
Models of Dynamic Linear Economies
Later we’ll stack these state-space systems Eq. (2) and Eq. (3) to display some classic findings
of Muth
But first, let's create an instance of the state-space system Eq. (2), then apply the quantecon Kalman class, then use it to construct the associated "innovations representation"
Now we want to map the time-invariant innovations representation Eq. (3) and the original
state-space system Eq. (2) into a convenient form for deducing the impulse responses from
the original shocks to the 𝑥𝑡 and 𝑥𝑡̂
Putting both of these representations into a single state-space system is yet another applica-
tion of the insight that “finding the state is an art”
We’ll define a state vector and appropriate state-space matrices that allow us to represent
both systems in one fell swoop
Note that
𝑎𝑡 = 𝑥𝑡 + 𝜎𝑦 𝜖2,𝑡 − 𝑥𝑡̂
so that
$$\hat{x}_{t+1} = \hat{x}_t + K(x_t + \sigma_y \epsilon_{2,t} - \hat{x}_t) = (1 - K)\hat{x}_t + K x_t + K \sigma_y \epsilon_{2,t}$$
$$\begin{bmatrix} x_{t+1} \\ \hat{x}_{t+1} \\ \epsilon_{2,t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ K & 1-K & K\sigma_y \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_t \\ \hat{x}_t \\ \epsilon_{2,t} \end{bmatrix} + \begin{bmatrix} \sigma_x & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \epsilon_{1,t+1} \\ \epsilon_{2,t+1} \end{bmatrix}$$
$$\begin{bmatrix} y_t \\ a_t \end{bmatrix} = \begin{bmatrix} 1 & 0 & \sigma_y \\ 1 & -1 & \sigma_y \end{bmatrix} \begin{bmatrix} x_t \\ \hat{x}_t \\ \epsilon_{2,t} \end{bmatrix}$$
is a state-space system that tells us how the shocks $\begin{bmatrix} \epsilon_{1,t+1} \\ \epsilon_{2,t+1} \end{bmatrix}$ affect the states $\hat{x}_{t+1}$, $x_t$, the observable $y_t$, and the innovation $a_t$
With this tool at our disposal, let’s form the composite system and simulate it
In [4]: # Create grand state-space for y_t, a_t as observed vars -- Use stacking trick above
Af = np.array([[ 1, 0, 0],
[K1, 1 - K1, K1 * σ_y],
[ 0, 0, 0]])
Cf = np.array([[σ_x, 0],
[ 0, K1 * σ_y],
[ 0, 1]])
Gf = np.array([[1, 0, σ_y],
[1, -1, σ_y]])
Now that we have simulated our joint system, we have 𝑥𝑡 , 𝑥𝑡̂ , and 𝑦𝑡
We can now investigate how these variables are related by plotting some key objects
First, let's plot the hidden state $x_t$ and the filtered version $\hat{x}_t$, which is the linear least squares projection of $x_t$ on the history $y_{t-1}, y_{t-2}, \ldots$
We see above that 𝑦 seems to look like white noise around the values of 𝑥
30.2.5 Innovations
Recall that we wrote down the innovation representation that depended on 𝑎𝑡 . We now plot
the innovations {𝑎𝑡 }:
fig, ax = plt.subplots(2)
ax[0].plot(coefs_ma_array, label="MA")
ax[0].legend()
ax[1].plot(coefs_var_array, label="VAR")
ax[1].legend()
plt.show()
The moving average coefficients in the top panel show tell-tale signs of 𝑦𝑡 being a process
whose first difference is a first-order autoregression
The autoregressive coefficients decline geometrically with decay rate (1 − 𝐾)
These are exactly the target outcomes that Muth (1960) aimed to reverse engineer
Dynamic Programming
31
Shortest Paths
31.1 Contents
• Overview 31.2
• Exercises 31.6
• Solutions 31.7
31.2 Overview
The shortest path problem is a classic problem in mathematics and computer science with
applications in
Variations of the methods we discuss in this lecture are used millions of times every day, in
applications such as
• Google Maps
• routing packets on the internet
For us, the shortest path problem also provides a nice introduction to the logic of dynamic
programming
Dynamic programming is an extremely powerful optimization technique that we apply in
many lectures on this site
The shortest path problem is one of finding how to traverse a graph from one specified node
to another at minimum cost
Consider the following graph
For this simple graph, a quick scan of the edges shows that the optimal paths are
• A, C, F, G at cost 8
• A, D, F, G at cost 8

31.4 Finding Least-Cost Paths
• Start at A
• From node $v$, move to any node that solves

$$\min_{w \in F_v} \{ c(v, w) + J(w) \}$$

where

• $F_v$ is the set of nodes that can be reached from $v$ in one step
• $c(v, w)$ is the cost of traveling from $v$ to $w$
• $J(w)$ is the cost-to-go from $w$, i.e., the total cost from $w$ if the best route is taken
Hence, if we know the function 𝐽 , then finding the best path is almost trivial
But how to find 𝐽 ?
Some thought will convince you that, for every node $v$, the function $J$ satisfies

$$J(v) = \min_{w \in F_v} \{ c(v, w) + J(w) \}$$

This is known as the Bellman equation, after the mathematician Richard Bellman
1. Set 𝑛 = 0
2. Set 𝐽𝑛+1 (𝑣) = min𝑤∈𝐹𝑣 {𝑐(𝑣, 𝑤) + 𝐽𝑛 (𝑤)} for all 𝑣
3. If 𝐽𝑛+1 and 𝐽𝑛 are not equal then increment 𝑛, go to 2
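As a sketch, the three-step iteration above can be run on a small made-up graph (not the one in the figure):

```python
# A made-up four-node graph (not the one in the figure); graph[v][w] is c(v, w)
graph = {
    'A': {'B': 1, 'C': 5},
    'B': {'C': 2, 'D': 6},
    'C': {'D': 2},
    'D': {},                  # destination: no outgoing edges
}

M = 1e10                      # stand-in for +infinity
J = {v: M for v in graph}
J['D'] = 0                    # cost-to-go at the destination is zero

# Bellman iteration: J_{n+1}(v) = min_{w in F_v} { c(v, w) + J_n(w) }
while True:
    next_J = {v: min((c + J[w] for w, c in graph[v].items()), default=0)
              for v in graph}
    if next_J == J:
        break
    J = next_J

print(J['A'])   # least cost from A to D: 1 + 2 + 2 = 5 via A, B, C, D
```

The solution code below follows the same pattern, with an `update_J` step and the same large constant `M` standing in for infinity.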
31.6 Exercises
31.6.1 Exercise 1
Use the algorithm given above to find the optimal path (and its cost) for the following graph
You can put it in a Jupyter notebook cell and hit Shift-Enter — it will be saved in the local
directory as file graph.txt
Writing graph.txt
Here the line node0, node1 0.04, node8 11.11, node14 72.21 means that from node0 we can go to

• node1 at cost 0.04
• node8 at cost 11.11
• node14 at cost 72.21

and so on
According to our calculations, the optimal path and its cost are like this
Your code should replicate this result
31.7 Solutions
31.7.1 Exercise 1
In [2]: def read_graph(in_file):
""" Read in the graph from the data file. The graph is stored
as a dictionary, where the keys are the nodes and the values
are a list of pairs (d, c), where d is a node and c is a number.
If (d, c) is in the list for node n, then d can be reached from
n at cost c.
"""
graph = {}
infile = open(in_file)
for line in infile:
elements = line.split(',')
node = elements.pop(0)
graph[node] = []
if node != 'node99':
for element in elements:
destination, cost = element.split()
graph[node].append((destination, float(cost)))
infile.close()
return graph
print('node99\n')
print('Cost: ', sum_costs)
## Main loop
graph = read_graph('graph.txt')
M = 1e10
J = {}
for node in graph:
J[node] = M
J['node99'] = 0
while True:
next_J = update_J(J, graph)
if next_J == J:
break
else:
J = next_J
print_best_path(J, graph)
node0
node8
node11
node18
node23
node33
node41
node53
node56
node57
node60
node67
node70
node73
node76
node85
node87
node88
node93
node94
node96
node97
node98
node99
Cost: 160.55000000000007
32 Job Search I: The McCall Search Model
32.1 Contents
• Overview 32.2
• Exercises 32.6
• Solutions 32.7
In addition to what’s in Anaconda, this lecture will need the following libraries
32.2 Overview
The McCall search model [94] helped transform economists’ way of thinking about labor mar-
kets
To clarify vague notions such as “involuntary” unemployment, McCall modeled the decision
problem of unemployed agents directly, in terms of factors such as
542 32. JOB SEARCH I: THE MCCALL SEARCH MODEL
• impatience
• unemployment compensation
$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t Y_t$$
32.3.1 A Trade-Off
• Waiting too long for a good offer is costly, since the future is discounted
• Accepting too early is costly, since better offers might arrive in the future
In order to optimally trade off current and future rewards, we need to think about two things:
To weigh these two aspects of the decision problem, we need to assign values to states
To this end, let 𝑣∗ (𝑤) be the total lifetime value accruing to an unemployed worker who en-
ters the current period unemployed but with wage offer 𝑤 in hand
More precisely, 𝑣∗ (𝑤) denotes the value of the objective function (1) when an agent in this
situation makes optimal decisions now and at all future points in time
Of course 𝑣∗ (𝑤) is not trivial to calculate because we don’t yet know what decisions are opti-
mal and what aren’t!
But think of 𝑣∗ as a function that assigns to each possible wage 𝑤 the maximal lifetime value
that can be obtained with that offer in hand
A crucial observation is that this function 𝑣∗ must satisfy the recursion
$$v^*(w) = \max\left\{ \frac{w}{1 - \beta},\; c + \beta \sum_{w'} v^*(w') \phi(w') \right\} \tag{1}$$
• the first term inside the max operation is the lifetime payoff from accepting current of-
fer 𝑤, since
$$w + \beta w + \beta^2 w + \cdots = \frac{w}{1 - \beta}$$
• the second term inside the max operation is the continuation value, which is the life-
time payoff from rejecting the current offer and then behaving optimally in all subse-
quent periods
If we optimize and pick the best of these two options, we obtain maximal lifetime value from
today, given current offer 𝑤
But this is precisely 𝑣∗ (𝑤), which is the l.h.s. of Eq. (1)
Suppose for now that we are able to solve Eq. (1) for the unknown function 𝑣∗
Once we have this function in hand we can behave optimally (i.e., make the right choice be-
tween accept and reject)
All we have to do is select the maximal choice on the r.h.s. of Eq. (1)
The optimal action is best thought of as a policy, which is, in general, a map from states to
actions
In our case, the state is the current wage offer 𝑤
Given any 𝑤, we can read off the corresponding best choice (accept or reject) by picking the
max on the r.h.s. of Eq. (1)
Thus, we have a map from R to {0, 1}, with 1 meaning accept and 0 meaning reject
We can write the policy as follows
$$\sigma(w) := \mathbf{1}\left\{ \frac{w}{1 - \beta} \ge c + \beta \sum_{w'} v^*(w') \phi(w') \right\}$$
$$\sigma(w) := \mathbf{1}\{ w \ge \bar{w} \}$$
where
$$\bar{w} := (1 - \beta) \left\{ c + \beta \sum_{w'} v^*(w') \phi(w') \right\}$$
Here 𝑤̄ is a constant depending on 𝛽, 𝑐 and the wage distribution called the reservation wage
The agent should accept if and only if the current wage offer exceeds the reservation wage
Clearly, we can compute this reservation wage if we can compute the value function
To put the above ideas into action, we need to compute the value function at points
𝑤1 , … , 𝑤 𝑛
In doing so, we can identify these values with the vector 𝑣∗ = (𝑣𝑖∗ ) where 𝑣𝑖∗ ∶= 𝑣∗ (𝑤𝑖 )
In view of Eq. (1), this vector satisfies the nonlinear system of equations
𝑣𝑖∗ = max {𝑤𝑖/(1 − 𝛽), 𝑐 + 𝛽 ∑𝑗 𝑣𝑗∗ 𝜙(𝑤𝑗)}  for 𝑖 = 1, … , 𝑛    (2)
One way to solve this system is successive approximation:
Step 1: pick an arbitrary initial guess 𝑣 ∈ R𝑛
Step 2: compute a new vector 𝑣′ ∈ R𝑛 via
𝑣𝑖′ = max {𝑤𝑖/(1 − 𝛽), 𝑐 + 𝛽 ∑𝑗 𝑣𝑗 𝜙(𝑤𝑗)}  for 𝑖 = 1, … , 𝑛    (3)
Step 3: calculate a measure of the deviation between 𝑣 and 𝑣′ , such as max𝑖 |𝑣𝑖 − 𝑣𝑖′ |
Step 4: if the deviation is larger than some fixed tolerance, set 𝑣 = 𝑣′ and go to step 2, else
continue
Step 5: return 𝑣
This algorithm returns an arbitrarily good approximation to the true solution to Eq. (2),
which represents the value function
(Arbitrarily good means here that the approximation converges to the true solution as the
tolerance goes to zero)
The algorithm above can also be expressed in terms of an operator 𝑇 that maps a vector 𝑣 into a new vector 𝑇𝑣 via
(𝑇𝑣)𝑖 = max {𝑤𝑖/(1 − 𝛽), 𝑐 + 𝛽 ∑𝑗 𝑣𝑗 𝜙(𝑤𝑗)}  for 𝑖 = 1, … , 𝑛    (4)
(A new vector 𝑇𝑣 is obtained from a given vector 𝑣 by evaluating the r.h.s. at each 𝑖)
One can show that the conditions of the Banach contraction mapping theorem are satisfied by
𝑇 as a self-mapping on R𝑛
One implication is that 𝑇 has a unique fixed point in R𝑛
Moreover, it’s immediate from the definition of 𝑇 that this fixed point is precisely the value
function
The iterative algorithm presented above corresponds to iterating with 𝑇 from some initial
guess 𝑣
The Banach contraction mapping theorem tells us that this iterative process generates a sequence that converges to the fixed point
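We can illustrate the contraction property numerically. The sketch below builds 𝑇 from Eq. (4) on an illustrative wage grid (the grid and uniform probabilities are our own choices, not the lecture's defaults) and checks that ‖𝑇𝑣 − 𝑇𝑢‖∞ ≤ 𝛽‖𝑣 − 𝑢‖∞ for random vectors:

```python
import numpy as np

β, c = 0.99, 25
w_vals = np.linspace(10, 60, 51)                 # illustrative wage grid
φ_vals = np.ones_like(w_vals) / len(w_vals)      # illustrative uniform probabilities

def T(v):
    # (Tv)_i = max{ w_i / (1 - β), c + β Σ_j v_j φ(w_j) }
    return np.maximum(w_vals / (1 - β), c + β * np.sum(v * φ_vals))

rng = np.random.default_rng(0)
v, u = rng.uniform(0, 6000, 51), rng.uniform(0, 6000, 51)
lhs = np.max(np.abs(T(v) - T(u)))
rhs = β * np.max(np.abs(v - u))
assert lhs <= rhs + 1e-8   # T is a contraction of modulus β in the sup norm
```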
32.4.3 Implementation
n, a, b = 50, 200, 100                    # wage grid size and beta-binomial parameters
w_vals = np.linspace(10, 60, n+1)         # possible wage values
dist = BetaBinomial(n, a, b)
φ_vals = dist.pdf()                       # probabilities over w_vals
plt.show()
First, let’s have a look at the sequence of approximate value functions that the algorithm
above generates
Default parameter values are embedded in the function
Our initial guess 𝑣 is the value of accepting at every given wage
num_plots = 6
fig, ax = plt.subplots(figsize=(9, 6))
v = w_vals / (1 - β)            # initial guess: the value of accepting at every wage
v_next = np.empty_like(v)
for i in range(num_plots):
    ax.plot(w_vals, v, label=f"iterate {i}")
    # Update guess
    for j, w in enumerate(w_vals):
        stop_val = w / (1 - β)
        cont_val = c + β * np.sum(v * φ_vals)
        v_next[j] = max(stop_val, cont_val)
    v[:] = v_next
ax.legend(loc='lower right')
plt.show()
Here’s more serious iteration effort, that continues until measured deviation between succes-
sive iterates is below tol
We’ll be using JIT compilation via Numba to turbo charge our loops
In [5]: @jit(nopython=True)
        def compute_reservation_wage(c=25,
                                     β=0.99,
                                     w_vals=w_vals,
                                     φ_vals=φ_vals,
                                     max_iter=500,
                                     tol=1e-6):
            v = w_vals / (1 - β)
            v_next = np.empty_like(v)
            i = 0
            error = tol + 1
            while i < max_iter and error > tol:
                for j, w in enumerate(w_vals):
                    stop_val = w / (1 - β)
                    cont_val = c + β * np.sum(v * φ_vals)
                    v_next[j] = max(stop_val, cont_val)
                error = np.max(np.abs(v_next - v))
                i += 1
                v[:] = v_next
            return (1 - β) * (c + β * np.sum(v * φ_vals))
In [6]: compute_reservation_wage()
Out[6]: 47.316499710024964
Now that we know how to compute the reservation wage, let's see how it varies with parameters
In particular, let’s look at what happens when we change 𝛽 and 𝑐
In [7]: grid_size = 25
        c_vals = np.linspace(10.0, 30.0, grid_size)   # unemployment compensation values
        β_vals = np.linspace(0.9, 0.99, grid_size)    # discount factor values
        R = np.empty((grid_size, grid_size))
        for i, c in enumerate(c_vals):
            for j, β in enumerate(β_vals):
                R[i, j] = compute_reservation_wage(c=c, β=β)
ax.set_title("reservation wage")
ax.set_xlabel("$c$", fontsize=16)
ax.set_ylabel("$β$", fontsize=16)
ax.ticklabel_format(useOffset=False)
plt.show()
As expected, the reservation wage increases both with patience and with unemployment compensation
32.5 Computing the Optimal Policy: Take 2
The approach to dynamic programming just described is very standard and broadly applicable
For this particular problem, there's also an easier way, which circumvents the need to compute the value function
Let ℎ denote the value of not accepting a job in this period but then behaving optimally in
all subsequent periods
That is,
ℎ = 𝑐 + 𝛽 ∑𝑤′ 𝑣∗(𝑤′)𝜙(𝑤′)    (5)
where 𝑣∗ is the value function from Eq. (1). By the Bellman equation, we have
𝑣∗(𝑤′) = max {𝑤′/(1 − 𝛽), ℎ}
Substituting this expression into Eq. (5) gives
ℎ = 𝑐 + 𝛽 ∑𝑤′ max {𝑤′/(1 − 𝛽), ℎ} 𝜙(𝑤′)    (6)
This is a nonlinear equation in the single unknown ℎ, and we can solve it by iterating: starting from a guess ℎ, compute the update
ℎ′ = 𝑐 + 𝛽 ∑𝑤′ max {𝑤′/(1 − 𝛽), ℎ} 𝜙(𝑤′)    (7)
and continue until successive iterates are sufficiently close together
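Before the jitted implementation below, here is a self-contained sketch of this scalar iteration with a made-up three-point wage distribution (the wages and probabilities are illustrative only):

```python
import numpy as np

β, c = 0.99, 25
w_vals = np.array([30.0, 40.0, 50.0])      # illustrative wage support
φ_vals = np.array([0.25, 0.5, 0.25])       # illustrative probabilities

h = np.sum(w_vals * φ_vals) / (1 - β)      # initial guess
for _ in range(2000):
    h = c + β * np.sum(np.maximum(w_vals / (1 - β), h) * φ_vals)

w_bar = (1 - β) * h                        # reservation wage, ≈ 49.03 here
```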
In [9]: @jit(nopython=True)
def compute_reservation_wage_two(c=25,
β=0.99,
w_vals=w_vals,
φ_vals=φ_vals,
max_iter=500,
tol=1e-5):
# == First compute h == #
h = np.sum(w_vals * �_vals) / (1 - β)
i = 0
error = tol + 1
while i < max_iter and error > tol:
s = np.maximum(w_vals / (1 - β), h)
h_next = c + β * np.sum(s * φ_vals)
error = np.abs(h_next - h)
i += 1
h = h_next
return (1 - β) * h
32.6 Exercises
32.6.1 Exercise 1
Compute the average duration of unemployment when 𝛽 = 0.99 and 𝑐 takes the following
values
That is, start the agent off as unemployed, compute their reservation wage given the parameters, and then simulate to see how long it takes to accept
Repeat a large number of times and take the average
Plot mean unemployment duration as a function of 𝑐 in c_vals
32.7 Solutions
32.7.1 Exercise 1
@jit(nopython=True)
def compute_stopping_time(w_bar, seed=1234):
np.random.seed(seed)
t = 1
while True:
# Generate a wage draw
w = w_vals[qe.random.draw(cdf)]
if w >= w_bar:
stopping_time = t
break
else:
t += 1
return stopping_time
@jit(nopython=True)
def compute_mean_stopping_time(w_bar, num_reps=100000):
obs = np.empty(num_reps)
for i in range(num_reps):
obs[i] = compute_stopping_time(w_bar, seed=i)
return obs.mean()
plt.show()
33
Job Search II: Search and Separation
33.1 Contents
• Overview 33.2
• Implementation 33.5
• Exercises 33.7
• Solutions 33.8
In addition to what’s in Anaconda, this lecture will need the following libraries
33.2 Overview
Previously we looked at the McCall job search model [94] as a way of understanding unemployment and worker decisions
One unrealistic feature of the model is that every job is permanent
In this lecture, we extend the McCall model by introducing job separation
Once separation enters the picture, the agent comes to view
• the opportunities he or she (let’s say he to save one character) has to work at different
wages
• exogenous events that destroy his current job
• his decision making process while unemployed
The worker's objective is to maximize the expected discounted sum of utility
E ∑𝑡≥0 𝛽ᵗ𝑢(𝑌𝑡)    (1)
where 𝑌𝑡 is income: the wage when employed and unemployment compensation when unemployed
The only difference from the baseline model is that we've added some flexibility over preferences by introducing a utility function 𝑢
It satisfies 𝑢′ > 0 and 𝑢″ < 0
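The implementation below works with a utility parameter σ; one standard specification satisfying these conditions is CRRA utility, sketched here for illustration:

```python
def u(c, σ=2.0):
    # CRRA utility: u'(c) = c**(-σ) > 0 and u''(c) = -σ * c**(-σ - 1) < 0 for c > 0
    return (c**(1 - σ) - 1) / (1 - σ)

assert u(2.0) > u(1.0)                       # strictly increasing
assert u(2.0) - u(1.0) > u(3.0) - u(2.0)     # strictly concave
```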
Here’s what happens at the start of a given period in our model with search and separation
If currently employed, the worker consumes his wage 𝑤, receiving utility 𝑢(𝑤)
If currently unemployed, he
Let
• 𝑣(𝑤) be the total lifetime value accruing to a worker who enters the current period employed with wage 𝑤
• ℎ be the total lifetime value accruing to a worker who is unemployed this period
Here value means the value of the objective function Eq. (1) when the worker makes optimal
decisions at all future points in time
Suppose for now that the worker can calculate the function 𝑣 and the constant ℎ and use
them in his decision making
Then 𝑣 and ℎ should satisfy
𝑣(𝑤) = 𝑢(𝑤) + 𝛽[(1 − 𝛼)𝑣(𝑤) + 𝛼ℎ]    (2)
and
ℎ = 𝑢(𝑐) + 𝛽 ∑𝑤′ max {ℎ, 𝑣(𝑤′)} 𝜙(𝑤′)    (3)
where 𝛼 is the rate of job separation and 𝜙 is the wage offer distribution
Let’s interpret these two equations in light of the fact that today’s tomorrow is tomorrow’s
today
• The left-hand sides of equations Eq. (2) and Eq. (3) are the values of a worker in a particular situation today
• The right-hand sides of the equations are the discounted (by 𝛽) expected values of the
possible situations that worker can be in tomorrow
• But tomorrow the worker can be in only one of the situations whose values today are on
the left sides of our two equations
Equation Eq. (3) incorporates the fact that a currently unemployed worker will maximize his
own welfare
In particular, if his next period wage offer is 𝑤′ , he will choose to remain unemployed unless
ℎ < 𝑣(𝑤′ )
Equations Eq. (2) and Eq. (3) are the Bellman equations for this model
Equations Eq. (2) and Eq. (3) provide enough information to solve out for both 𝑣 and ℎ
Before discussing this, however, let’s make a small extension to the model
Let’s suppose now that unemployed workers don’t always receive job offers
Instead, let’s suppose that unemployed workers only receive an offer with probability 𝛾
If our worker does receive an offer, the wage offer is drawn from 𝜙 as before
He either accepts or rejects the offer
The Bellman equations then become
𝑣(𝑤) = 𝑢(𝑤) + 𝛽[(1 − 𝛼)𝑣(𝑤) + 𝛼ℎ]    (4)
and
ℎ = 𝑢(𝑐) + 𝛽(1 − 𝛾)ℎ + 𝛽𝛾 ∑𝑤′ max {ℎ, 𝑣(𝑤′)} 𝜙(𝑤′)    (5)
We’ll use the same iterative approach to solving the Bellman equations that we adopted in
the first job search lecture
Here this amounts to iterating on the pair of equations: given current guesses 𝑣𝑛 and ℎ𝑛, compute
𝑣𝑛+1(𝑤) = 𝑢(𝑤) + 𝛽[(1 − 𝛼)𝑣𝑛(𝑤) + 𝛼ℎ𝑛]
and
ℎ𝑛+1 = 𝑢(𝑐) + 𝛽(1 − 𝛾)ℎ𝑛 + 𝛽𝛾 ∑𝑤′ max {ℎ𝑛, 𝑣𝑛(𝑤′)} 𝜙(𝑤′)
and repeat until successive iterates are sufficiently close
33.5 Implementation
@njit
def u(c, σ):
    # CRRA utility, increasing and strictly concave for c > 0
    return (c**(1 - σ) - 1) / (1 - σ) if c > 0 else -10e6
class McCallModel:
"""
Stores the parameters and functions associated with a given model.
"""
def __init__(self,
α=0.2, # Job separation rate
β=0.98, # Discount factor
γ=0.7, # Job offer rate
c=6.0, # Unemployment compensation
σ=2.0, # Utility parameter
w_vals=None, # Possible wage values
φ_vals=None): # Probabilities over w_vals
# Add a default wage vector and probabilities over the vector using
# the beta-binomial distribution
if w_vals is None:
n = 60 # number of possible outcomes for wage
self.w_vals = np.linspace(10, 20, n) # wages between 10 and 20
a, b = 600, 400 # shape parameters
dist = BetaBinomial(n-1, a, b)
self.φ_vals = dist.pdf()
else:
self.w_vals = w_vals
self.φ_vals = φ_vals
In [4]: @njit
        def Q(v, h, paras):
            """
            A jitted function to update the Bellman equations
            """
            α, β, γ, c, σ, w_vals, φ_vals = paras
            v_new = np.empty_like(v)
            for i in range(len(w_vals)):
                w = w_vals[i]
                v_new[i] = u(w, σ) + β * ((1 - α) * v[i] + α * h)
            h_new = u(c, σ) + β * (1 - γ) * h + \
                    β * γ * np.sum(np.maximum(h, v) * φ_vals)
            return v_new, h_new
The approach is to iterate until successive iterates are closer together than some small tolerance level
We then return the current iterate as an approximate solution
return v, h
Let’s plot the approximate solutions 𝑣 and ℎ to see what they look like
We’ll use the default parameterizations found in the code above
plt.show()
The value 𝑣 is increasing because higher 𝑤 generates a higher wage flow conditional on staying employed
33.6 The Reservation Wage
Once 𝑣 and ℎ are known, the agent can use them to make decisions in the face of a given
wage offer
If 𝑣(𝑤) > ℎ, then working at wage 𝑤 is preferred to unemployment
If 𝑣(𝑤) < ℎ, then remaining unemployed will generate greater lifetime value
Suppose in particular that 𝑣 crosses ℎ (as it does in the preceding figure)
Then, since 𝑣 is increasing, there is a unique smallest 𝑤 in the set of possible wages such that
𝑣(𝑤) ≥ ℎ
We denote this wage 𝑤̄ and call it the reservation wage
Optimal behavior for the worker is characterized by 𝑤̄
• if the wage offer 𝑤 in hand is greater than or equal to 𝑤,̄ then the worker accepts
• if the wage offer 𝑤 in hand is less than 𝑤,̄ then the worker rejects
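Since 𝑣 − ℎ is increasing where it matters, the smallest 𝑤 with 𝑣(𝑤) ≥ ℎ can be located with np.searchsorted, as the function below does. A toy illustration with made-up values:

```python
import numpy as np

w_vals = np.linspace(10, 20, 6)                  # toy wage grid
v = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])     # made-up (increasing) value of employment
h = 3.5                                          # made-up value of unemployment

w_idx = np.searchsorted(v - h, 0)    # first index where v - h >= 0
w_bar = w_vals[w_idx]                # the reservation wage on this toy grid
```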
def compute_reservation_wage(mcm, return_values=False):
    """
    Computes the reservation wage of an instance of the McCall model
    by finding the smallest w such that v(w) >= h.
    If v(w) > h for all w, then the reservation wage w_bar is set to
    the lowest wage in mcm.w_vals.
    If v(w) < h for all w, then w_bar is set to np.inf.
    """
v, h = solve_model(mcm)
w_idx = np.searchsorted(v - h, 0)
if w_idx == len(v):
w_bar = np.inf
else:
w_bar = mcm.w_vals[w_idx]
if not return_values:
return w_bar
else:
return w_bar, v, h
Let’s use it to look at how the reservation wage varies with parameters
In each instance below, we'll show you a figure and then ask you to reproduce it in the exercises
In the figure below, we use the default parameters in the McCallModel class, apart from c
(which takes the values given on the horizontal axis)
As expected, higher unemployment compensation causes the worker to hold out for higher
wages
In effect, the cost of continuing job search is reduced
Again, the results are intuitive: More patient workers will hold out for higher wages
Finally, let’s look at how 𝑤̄ varies with the job separation rate 𝛼
Higher 𝛼 translates to a greater chance that a worker will face termination in each period
once employed
33.7 Exercises
33.7.1 Exercise 1
33.7.2 Exercise 2
In [8]: grid_size = 25
γ_vals = np.linspace(0.05, 0.95, grid_size)
33.8 Solutions
33.8.1 Exercise 1
In [9]: grid_size = 25
c_vals = np.linspace(2, 12, grid_size) # values of unemployment compensation
w_bar_vals = np.empty_like(c_vals)
mcm = McCallModel()
for i, c in enumerate(c_vals):
mcm.c = c
w_bar = compute_reservation_wage(mcm)
w_bar_vals[i] = w_bar
ax.set(xlabel='unemployment compensation',
ylabel='reservation wage')
ax.plot(c_vals, w_bar_vals, label=r'$\bar w$ as a function of $c$')
ax.grid()
plt.show()
33.8.2 Exercise 2
In [10]: grid_size = 25
γ_vals = np.linspace(0.05, 0.95, grid_size)
w_bar_vals = np.empty_like(γ_vals)
mcm = McCallModel()
for i, γ in enumerate(γ_vals):
mcm.γ = γ
w_bar = compute_reservation_wage(mcm)
w_bar_vals[i] = w_bar
plt.show()
34
A Problem that Stumped Milton Friedman
34.1 Contents
• Overview 34.2
• Implementation 34.5
• Analysis 34.6
34.2 Overview
This lecture describes a statistical decision problem encountered by Milton Friedman and W. Allen Wallis during World War II, when they were analysts at the U.S. Government's Statistical Research Group at Columbia University
This problem led Abraham Wald [132] to formulate sequential analysis, an approach to
statistical decision problems intimately related to dynamic programming
In this lecture, we apply dynamic programming algorithms to Friedman and Wallis and
Wald’s problem
Key ideas in play will be:
• Bayes’ Law
• Dynamic programming
• Type I and type II statistical errors
– a type I error occurs when you reject a null hypothesis that is true
– a type II error is when you accept a null hypothesis that is false
• Abraham Wald’s sequential probability ratio test
• The power of a statistical test
• The critical region of a statistical test
• A uniformly most powerful test
On pages 137-139 of his 1998 book Two Lucky People with Rose Friedman [44], Milton Friedman described a problem presented to him and Allen Wallis during World War II, when they worked at the US Government's Statistical Research Group at Columbia University
Let’s listen to Milton Friedman tell us what happened
The standard statistical answer was to specify a number of firings (say 1,000) and
a pair of percentages (e.g., 53% and 47%) and tell the client that if A receives a 1
in more than 53% of the firings, it can be regarded as superior; if it receives a 1 in
fewer than 47%, B can be regarded as superior; if the percentage is between 47%
and 53%, neither can be so regarded.
When Allen Wallis was discussing such a problem with (Navy) Captain Garret L.
Schyler, the captain objected that such a test, to quote from Allen’s account, may
prove wasteful. If a wise and seasoned ordnance officer like Schyler were on the
premises, he would see after the first few thousand or even few hundred [rounds]
that the experiment need not be completed either because the new method is ob-
viously inferior or because it is obviously superior beyond what was hoped for …
Friedman and Wallis struggled with the problem but, after realizing that they were not able
to solve it, described the problem to Abraham Wald
That started Wald on the path that led him to Sequential Analysis [132]
We’ll formulate the problem using dynamic programming
34.4 A Dynamic Programming Approach
The following presentation of the problem closely follows Dimitri Bertsekas's treatment in Dynamic Programming and Stochastic Control [14]
A decision-maker observes IID draws of a random variable 𝑧
He (or she) wants to know which of two probability distributions 𝑓0 or 𝑓1 governs 𝑧
After a number of draws, also to be determined, he makes a decision as to which of the distri-
butions is generating the draws he observes
He starts with a prior probability 𝜋−1 that 𝑓 = 𝑓0 and updates his beliefs as draws arrive. After observing 𝑧𝑘, 𝑧𝑘−1, …, 𝑧0, his posterior is
𝜋𝑘 = P{𝑓 = 𝑓0 ∣ 𝑧𝑘, 𝑧𝑘−1, …, 𝑧0}
which satisfies the recursion
𝜋𝑘+1 = 𝜋𝑘𝑓0(𝑧𝑘+1) / [𝜋𝑘𝑓0(𝑧𝑘+1) + (1 − 𝜋𝑘)𝑓1(𝑧𝑘+1)],    𝑘 = −1, 0, 1, …
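A minimal numeric sketch of this recursion, using two toy densities in place of 𝑓0 and 𝑓1 (both are our own illustrative choices):

```python
def f0(z):
    return 1.0          # Uniform(0, 1) density, standing in for f0

def f1(z):
    return 2 * z        # Beta(2, 1) density, standing in for f1

def bayes_update(π, z):
    # one step of the recursion for π_{k+1}
    num = π * f0(z)
    return num / (num + (1 - π) * f1(z))

π = 0.5                          # prior weight on f0
for z in (0.9, 0.8, 0.95):       # draws that are more likely under f1
    π = bayes_update(π, z)
# after these draws, the belief in f0 has fallen below the prior
```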
After observing 𝑧𝑘, 𝑧𝑘−1, …, 𝑧0, the decision-maker believes that 𝑧𝑘+1 has probability distribution
𝑓(𝑧) = 𝜋𝑘𝑓0(𝑧) + (1 − 𝜋𝑘)𝑓1(𝑧)
This is a mixture of distributions 𝑓0 and 𝑓1 , with the weight on 𝑓0 being the posterior proba-
bility that 𝑓 = 𝑓0 [1]
To help illustrate this kind of distribution, let’s inspect some mixtures of beta distributions
The density of a beta probability distribution with parameters 𝑎 and 𝑏 is
𝑓(𝑧; 𝑎, 𝑏) = Γ(𝑎 + 𝑏) 𝑧^(𝑎−1) (1 − 𝑧)^(𝑏−1) / [Γ(𝑎)Γ(𝑏)]   where   Γ(𝑡) ∶= ∫₀^∞ 𝑥^(𝑡−1) 𝑒^(−𝑥) 𝑑𝑥
The next figure shows two beta distributions in the top panel
The bottom panel presents mixtures of these distributions, with various mixing probabilities
𝜋𝑘
def beta_function_factory(a, b):
    @vectorize
    def p(x):
        r = gamma(a + b) / (gamma(a) * gamma(b))
        return r * x**(a-1) * (1 - x)**(b-1)
    @njit
    def p_rvs():
        return np.random.beta(a, b)
    return p, p_rvs
f0, _ = beta_function_factory(1, 1)
f1, _ = beta_function_factory(9, 9)
grid = np.linspace(0, 1, 50)
axes[0].set_title("Original Distributions")
axes[0].plot(grid, f0(grid), lw=2, label="$f_0$")
axes[0].plot(grid, f1(grid), lw=2, label="$f_1$")
axes[1].set_title("Mixtures")
for π in 0.25, 0.5, 0.75:
y = π * f0(grid) + (1 - π) * f1(grid)
axes[1].plot(y, lw=2, label=f"$\pi_k$ = {π}")
for ax in axes:
ax.legend()
ax.set(xlabel="$z$ values", ylabel="probability of $z_k$")
plt.tight_layout()
plt.show()
After observing 𝑧𝑘, 𝑧𝑘−1, …, 𝑧0, the decision-maker chooses among three distinct actions:
• he decides that 𝑓 = 𝑓0 and draws no more values of 𝑧
• he decides that 𝑓 = 𝑓1 and draws no more values of 𝑧
• he postpones deciding now and instead chooses to draw another 𝑧
Associated with these three actions, the decision-maker can suffer three kinds of losses:
• a loss 𝐿0 if he decides 𝑓 = 𝑓0 when actually 𝑓 = 𝑓1
• a loss 𝐿1 if he decides 𝑓 = 𝑓1 when actually 𝑓 = 𝑓0
• a cost 𝑐 if he postpones deciding and draws another 𝑧
34.4.3 Intuition
Let’s try to guess what an optimal decision rule might look like before we go further
Suppose at some given point in time that 𝜋 is close to 1
Then our prior beliefs and the evidence so far point strongly to 𝑓 = 𝑓0
If, on the other hand, 𝜋 is close to 0, then 𝑓 = 𝑓1 is strongly favored
Finally, if 𝜋 is in the middle of the interval [0, 1], then we have little information in either di-
rection
This reasoning suggests a decision rule such as the one shown in the figure
As we’ll see, this is indeed the correct form of the decision rule
The key problem is to determine the threshold values 𝛼, 𝛽, which will depend on the parame-
ters listed above
You might like to pause at this point and try to predict the impact of a parameter such as 𝑐
or 𝐿0 on 𝛼 or 𝛽
Let 𝐽 (𝜋) be the total loss for a decision-maker with current belief 𝜋 who chooses optimally
With some thought, you will agree that 𝐽 should satisfy the Bellman equation
𝐽(𝜋) = min {(1 − 𝜋)𝐿0, 𝜋𝐿1, 𝑐 + E[𝐽(𝜋′)]}    (1)
where 𝜋′ is the random variable defined by
𝜋′ = 𝜅(𝑧′, 𝜋) = 𝜋𝑓0(𝑧′) / [𝜋𝑓0(𝑧′) + (1 − 𝜋)𝑓1(𝑧′)]
when 𝜋 is fixed and 𝑧′ is drawn from the current best guess, which is the distribution 𝑓 defined by
𝑓𝜋(𝑧′) = 𝜋𝑓0(𝑧′) + (1 − 𝜋)𝑓1(𝑧′)    (2)
In this expression:
• (1 − 𝜋)𝐿0 is the expected loss associated with accepting 𝑓0 (i.e., the cost of making a
type II error)
• 𝜋𝐿1 is the expected loss associated with accepting 𝑓1 (i.e., the cost of making a type I
error)
• ℎ(𝜋) ∶= 𝑐 + E[𝐽(𝜋′)] is the continuation value; i.e., the expected cost associated with drawing one more 𝑧
Re-writing Eq. (1) in terms of ℎ, the Bellman equation becomes
𝐽(𝜋) = min {(1 − 𝜋)𝐿0, 𝜋𝐿1, ℎ(𝜋)}    (3)
The optimal decision rule is characterized by two numbers 𝛼, 𝛽 ∈ (0, 1) × (0, 1) that satisfy
(1 − 𝜋)𝐿0 < min {𝜋𝐿1, ℎ(𝜋)}  if  𝜋 ≥ 𝛼
and
𝜋𝐿1 < min {(1 − 𝜋)𝐿0, ℎ(𝜋)}  if  𝜋 ≤ 𝛽
The optimal decision rule is then
accept 𝑓 = 𝑓0 if 𝜋 ≥ 𝛼
accept 𝑓 = 𝑓1 if 𝜋 ≤ 𝛽
draw another 𝑧 if 𝛽 ≤ 𝜋 ≤ 𝛼
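The decision rule maps beliefs to actions; a sketch (the cutoffs 0.8 and 0.2 below are illustrative placeholders, not solved values):

```python
def decide(π, α=0.8, β=0.2):
    # map the current belief π into one of the three actions
    if π >= α:
        return "accept f0"
    if π <= β:
        return "accept f1"
    return "draw again"

assert decide(0.9) == "accept f0"
assert decide(0.1) == "accept f1"
assert decide(0.5) == "draw again"
```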
Our aim is to compute the value function 𝐽 , and from it the associated cutoffs 𝛼 and 𝛽
34.5. IMPLEMENTATION 571
To make our computations simpler, using Eq. (2), we can write the continuation value ℎ(𝜋) as
ℎ(𝜋) = 𝑐 + ∫ min {(1 − 𝜅(𝑧′, 𝜋))𝐿0, 𝜅(𝑧′, 𝜋)𝐿1, ℎ(𝜅(𝑧′, 𝜋))} 𝑓𝜋(𝑧′) 𝑑𝑧′    (4)
This equality can be read as a functional equation in the unknown function ℎ, which we solve below by iterating on Eq. (4)
34.5 Implementation
class WaldFriedman:
    """
    Stores the parameters of the sequential decision problem.
    """
    def __init__(self,
c=1.25, # Cost of another draw
a0=1,
b0=1,
a1=3,
b1=1.2,
L0=25, # Cost of selecting f0 when f1 is true
L1=25, # Cost of selecting f1 when f0 is true
π_grid_size=200,
mc_size=1000):
# Set up distributions
self.f0, self.f0_rvs = beta_function_factory(a0, b0)
self.f1, self.f1_rvs = beta_function_factory(a1, b1)
"""
Returns a jitted version of the Q operator.
@njit
def κ(z, π):
"""
Updates π using Bayes' rule and the current observation z.
"""
π_f0, π_f1 = π * f0(z), (1 - π) * f1(z)
π_new = π_f0 / (π_f0 + π_f1)
return π_new
@njit(parallel=parallel_flag)
def Q(h):
h_new = np.empty_like(π_grid)
h_func = lambda p: interp(π_grid, h, p)
        for i in prange(len(π_grid)):
            π = π_grid[i]
            # Monte Carlo estimate of E[min{(1 − π′)L0, π′L1, h(π′)}] under f_π
            integral_f0, integral_f1 = 0.0, 0.0
            for m in range(mc_size):
                π_0 = κ(f0_rvs(), π)    # update π with a draw from f0
                integral_f0 += min((1 - π_0) * L0, π_0 * L1, h_func(π_0))
                π_1 = κ(f1_rvs(), π)    # update π with a draw from f1
                integral_f1 += min((1 - π_1) * L0, π_1 * L1, h_func(π_1))
            integral = (π * integral_f0 + (1 - π) * integral_f1) / mc_size
            h_new[i] = c + integral
return h_new
return Q
To solve the model, we will iterate using Q to find the fixed point
"""
Compute the continuation value function
* wf is an instance of WaldFriedman
"""
Q = operator_factory(wf, parallel_flag=use_parallel)
# Set up loop
h = np.zeros(len(wf.π_grid))
    i = 0
    error = tol + 1
    while i < max_iter and error > tol:
        h_new = Q(h)
        error = np.max(np.abs(h - h_new))
        i += 1
        h = h_new
    if i == max_iter:
        print("Failed to converge!")
    else:
        print(f"Converged in {i} iterations.")
    return h_new
34.6 Analysis
In [7]: wf = WaldFriedman()
plt.show()
Converged in 25 iterations.
We will also set up a function to compute the cutoffs 𝛼 and 𝛽 and plot these on our value
function plot
return (β, α)
β, α = find_cutoff_rule(wf, h_star)
cost_L0 = (1 - wf.π_grid) * wf.L0
cost_L1 = wf.π_grid * wf.L1
plt.legend(borderpad=1.1)
plt.show()
34.6.2 Simulations
The next figure shows the outcomes of 500 simulations of the decision process
On the left is a histogram of the stopping times, which equal the number of draws of 𝑧𝑘 required to make a decision
The average number of draws is around 6.6
On the right is the fraction of correct decisions at the stopping time
In this case, the decision-maker is correct 80% of the time
return π_new
if true_dist == "f0":
f, f_rvs = wf.f0, wf.f0_rvs
elif true_dist == "f1":
f, f_rvs = wf.f1, wf.f1_rvs
# Find cutoffs
β, α = find_cutoff_rule(wf, h_star)
if true_dist == "f0":
if decision == 0:
correct = True
else:
correct = False
return correct, π, t
def stopping_dist(wf, h_star, ndraws=250, true_dist="f0"):
    # Empirical distributions of stopping times and correctness
    tdist = np.empty(ndraws)
    cdist = np.empty(ndraws)
    for i in range(ndraws):
        correct, π, t = simulate(wf, true_dist, h_star)
        tdist[i] = t
        cdist[i] = correct
    return cdist, tdist
def simulation_plot(wf):
h_star = solve_model(wf)
ndraws = 500
cdist, tdist = stopping_dist(wf, h_star, ndraws)
ax[0].hist(tdist, bins=np.max(tdist))
ax[0].set_title(f"Stopping times over {ndraws} replications")
ax[0].set(xlabel="time", ylabel="number of stops")
ax[0].annotate(f"mean = {np.mean(tdist)}", xy=(max(tdist) / 2,
max(np.histogram(tdist, bins=max(tdist))[0]) / 2))
ax[1].hist(cdist.astype(int), bins=2)
ax[1].set_title(f"Correct decisions over {ndraws} replications")
ax[1].annotate(f"% correct = {np.mean(cdist)}",
xy=(0.05, ndraws / 2))
plt.show()
simulation_plot(wf)
Converged in 25 iterations.
In [11]: wf = WaldFriedman(c=2.5)
simulation_plot(wf)
Converged in 13 iterations.
Increased cost per draw has induced the decision-maker to take fewer draws before deciding
Because he decides with less information, the percentage of time he is correct drops
This leads him to have a higher expected loss when he puts equal weight on both models
To facilitate comparative statics, we provide a Jupyter notebook that generates the same
plots, but with sliders
With these sliders, you can adjust parameters and immediately observe
• effects on the smoothness of the value function in the indecisive middle range as we increase the number of grid points in the piecewise linear approximation
• effects of different settings for the cost parameters 𝐿0 , 𝐿1 , 𝑐, the parameters of two beta
distributions 𝑓0 and 𝑓1 , and the number of points and linear functions 𝑚 to use in the
piece-wise continuous approximation to the value function
• various simulations from 𝑓0 and associated distributions of waiting times to making a
decision
• associated histograms of correct and incorrect decisions
34.7 Comparison with Neyman-Pearson Formulation
For several reasons, it is useful to describe the theory underlying the test that Navy Captain G. S. Schuyler had been told to use and that led him to approach Milton Friedman and Allen Wallis to convey his conjecture that superior practical procedures existed
Evidently, the Navy had told Captain Schuyler to use what it knew to be a state-of-the-art Neyman-Pearson test
We’ll rely on Abraham Wald’s [132] elegant summary of Neyman-Pearson theory
For our purposes, watch for these features of the setup:
• The sample size 𝑛 is not fixed but rather an object to be chosen; technically 𝑛 is a random variable
• The parameters 𝛽 and 𝛼 characterize cut-off rules used to determine 𝑛 as a random
variable
• Laws of large numbers make no appearances in the sequential construction
Wald frames the problem as making a decision about a probability distribution that is partially known
(You have to assume that something is already known in order to state a well-posed problem
– usually, something means a lot)
By limiting what is unknown, Wald uses the following simple structure to illustrate the main
ideas:
As a basis for choosing among critical regions the following considerations have
been advanced by Neyman and Pearson: In accepting or rejecting 𝐻0 we may
commit errors of two kinds. We commit an error of the first kind if we reject 𝐻0
when it is true; we commit an error of the second kind if we accept 𝐻0 when 𝐻1
is true. After a particular critical region 𝑊 has been chosen, the probability of
committing an error of the first kind, as well as the probability of committing an
error of the second kind is uniquely determined. The probability of committing an
error of the first kind is equal to the probability, determined by the assumption
that 𝐻0 is true, that the observed sample will be included in the critical region 𝑊 .
The probability of committing an error of the second kind is equal to the proba-
bility, determined on the assumption that 𝐻1 is true, that the probability will fall
outside the critical region 𝑊 . For any given critical region 𝑊 we shall denote the
probability of an error of the first kind by 𝛼 and the probability of an error of the
second kind by 𝛽.
Let’s listen carefully to how Wald applies law of large numbers to interpret 𝛼 and 𝛽:
The quantity 𝛼 is called the size of the critical region, and the quantity 1 − 𝛽 is called the
power of the critical region
Wald notes that
one critical region 𝑊 is more desirable than another if it has smaller values of 𝛼
and 𝛽. Although either 𝛼 or 𝛽 can be made arbitrarily small by a proper choice of
the critical region 𝑊, it is impossible to make both 𝛼 and 𝛽 arbitrarily small for a
fixed value of 𝑛, i.e., a fixed sample size.
Neyman and Pearson show that a region consisting of all samples (𝑧1 , 𝑧2 , … , 𝑧𝑛 )
which satisfy the inequality
[𝑓1(𝑧1) ⋯ 𝑓1(𝑧𝑛)] / [𝑓0(𝑧1) ⋯ 𝑓0(𝑧𝑛)] ≥ 𝑘
is a most powerful critical region for testing the hypothesis 𝐻0 against the alternative hy-
pothesis 𝐻1 . The term 𝑘 on the right side is a constant chosen so that the region will have
the required size 𝛼.
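The Neyman-Pearson statistic is just a product of density ratios; a sketch with toy densities (computed in logs for numerical stability):

```python
from math import exp, log

def likelihood_ratio(sample, f0, f1):
    # Π f1(z)/f0(z) over the sample, accumulated via logs
    return exp(sum(log(f1(z)) - log(f0(z)) for z in sample))

f0 = lambda z: 1.0      # toy H0 density: Uniform(0, 1)
f1 = lambda z: 2 * z    # toy H1 density: Beta(2, 1)

assert likelihood_ratio([0.9, 0.8], f0, f1) > 1    # high draws favor f1
assert likelihood_ratio([0.1, 0.2], f0, f1) < 1    # low draws favor f0
```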
Wald goes on to discuss Neyman and Pearson’s concept of uniformly most powerful test
Here is how Wald introduces the notion of a sequential test
A rule is given for making one of the following three decisions at any stage of the
experiment (at the 𝑚th trial for each integral value of 𝑚): (1) to accept the
hypothesis 𝐻, (2) to reject the hypothesis 𝐻, (3) to continue the experiment by
making an additional observation. Thus, such a test procedure is carried out
sequentially. On the basis of the first observation, one of the aforementioned
decisions is made. If the first or second decision is made, the process is terminated.
If the third decision is made, a second trial is performed. Again, on the basis of
the first two observations, one of the three decisions is made. If the third decision
is made, a third trial is performed, and so on. The process is continued until either
the first or the second decision is made. The number 𝑛 of observations required
by such a test procedure is a random variable, since the value of 𝑛 depends on the
outcome of the observations.
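Wald's procedure can be sketched directly from this description. The thresholds A and B below are illustrative placeholders; in Wald's theory they are chosen to deliver the desired error probabilities:

```python
def sprt(draws, f0, f1, A=20.0, B=0.05):
    # sequential probability ratio test: continue sampling while B < Λ < A
    Λ = 1.0
    n = 0
    for z in draws:
        n += 1
        Λ *= f1(z) / f0(z)
        if Λ >= A:
            return "reject H0", n    # evidence strongly favors f1
        if Λ <= B:
            return "accept H0", n    # evidence strongly favors f0
    return "no decision yet", n

f0 = lambda z: 1.0      # toy H0 density: Uniform(0, 1)
f1 = lambda z: 2 * z    # toy H1 density: Beta(2, 1)
decision, n = sprt([0.9] * 10, f0, f1)   # each draw multiplies Λ by 1.8
```

The number of observations n is random: it depends on how quickly the accumulated likelihood ratio crosses one of the two thresholds.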
Footnotes
[1] Because the decision-maker believes that 𝑧𝑘+1 is drawn from a mixture of two IID distributions, he does not believe that the sequence [𝑧𝑘+1, 𝑧𝑘+2, …] is IID. Instead, he believes that it is exchangeable. See [79], chapter 11, for a discussion of exchangeability.
35
Job Search III: Search with Learning
35.1 Contents
• Overview 35.2
• Model 35.3
• Exercises 35.6
• Solutions 35.7
• Appendix 35.8
In addition to what’s in Anaconda, this lecture will need the following libraries
35.2 Overview
In this lecture, we consider an extension of the previously studied job search model of McCall
[94]
In the McCall model, an unemployed worker decides when to accept a permanent position at a specified wage, given
• his or her discount factor 𝛽
• the level of unemployment compensation 𝑐
• the distribution from which wage offers are drawn
In the version considered below, the wage distribution is unknown and must be learned
• Infinite horizon dynamic programming with two states and one binary control
• Bayesian updating to learn the unknown distribution
35.3 Model
Let’s first review the basic McCall model [94] and then add the variation we want to consider
Recall that, in the baseline model, an unemployed worker is presented in each period with a permanent job offer at wage 𝑊𝑡
At time 𝑡, our worker either
The wage sequence {𝑊𝑡 } is IID and generated from known density 𝑞
The worker aims to maximize the expected discounted sum of earnings E ∑𝑡≥0 𝛽ᵗ𝑦𝑡
The value function 𝑣 satisfies the recursion
𝑣(𝑤) = max {𝑤/(1 − 𝛽), 𝑐 + 𝛽 ∫ 𝑣(𝑤′)𝑞(𝑤′)𝑑𝑤′}    (1)
Now let’s extend the model by considering the variation presented in [87], section 6.6
The model is as above, apart from the fact that
The worker knows there are two possible distributions 𝐹 and 𝐺 — with densities 𝑓 and 𝑔
At the start of time, “nature” selects 𝑞 to be either 𝑓 or 𝑔 — the wage distribution from
which the entire sequence {𝑊𝑡 } will be drawn
This choice is not observed by the worker, who puts prior probability 𝜋0 on 𝑓 being chosen
Update rule: worker’s time 𝑡 estimate of the distribution is 𝜋𝑡 𝑓 + (1 − 𝜋𝑡 )𝑔, where 𝜋𝑡 updates
via
𝜋𝑡+1 = 𝜋𝑡𝑓(𝑤𝑡+1) / [𝜋𝑡𝑓(𝑤𝑡+1) + (1 − 𝜋𝑡)𝑔(𝑤𝑡+1)]    (2)
This last expression follows from Bayes’ rule, which tells us that
P{𝑞 = 𝑓 | 𝑊 = 𝑤} = P{𝑊 = 𝑤 | 𝑞 = 𝑓} P{𝑞 = 𝑓} / P{𝑊 = 𝑤}   where   P{𝑊 = 𝑤} = ∑𝜔∈{𝑓,𝑔} P{𝑊 = 𝑤 | 𝑞 = 𝜔} P{𝑞 = 𝜔}
The fact that Eq. (2) is recursive allows us to progress to a recursive solution method
Letting
𝑞𝜋(𝑤) ∶= 𝜋𝑓(𝑤) + (1 − 𝜋)𝑔(𝑤)   and   𝜅(𝑤, 𝜋) ∶= 𝜋𝑓(𝑤) / [𝜋𝑓(𝑤) + (1 − 𝜋)𝑔(𝑤)]
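These two objects are easy to sketch numerically, using toy densities as illustrative stand-ins for 𝑓 and 𝑔:

```python
f = lambda w: 1.0       # Beta(1, 1), i.e. uniform, as in the parameterization below
g = lambda w: 2 * w     # Beta(2, 1), an illustrative stand-in for g

def q(w, π):
    # predictive density of the next offer, given current belief π
    return π * f(w) + (1 - π) * g(w)

def κ(w, π):
    # updated belief that the true density is f, after observing w
    return π * f(w) / q(w, π)

assert q(0.5, 0.5) == 1.0
assert κ(0.9, 0.5) < 0.5 < κ(0.1, 0.5)   # high offers shift belief toward g
```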
we can express the value function for the unemployed worker recursively as follows
𝑣(𝑤, 𝜋) = max {𝑤/(1 − 𝛽), 𝑐 + 𝛽 ∫ 𝑣(𝑤′, 𝜋′) 𝑞𝜋(𝑤′) 𝑑𝑤′}   where 𝜋′ = 𝜅(𝑤′, 𝜋)    (3)
Notice that the current guess 𝜋 is a state variable, since it affects the worker’s perception of
probabilities for future rewards
35.3.3 Parameterization
• 𝑓 is Beta(1, 1)
• 𝑔 is Beta(3, 1.2)
• 𝛽 = 0.95 and 𝑐 = 0.3
def beta_function_factory(a, b):
    @vectorize
    def p(x):
        r = gamma(a + b) / (gamma(a) * gamma(b))
        return r * x**(a-1) * (1 - x)**(b-1)
    return p
f = beta_function_factory(1, 1)
g = beta_function_factory(3, 1.2)
plt.figure(figsize=(10, 8))
plt.plot(x_grid, f(x_grid), label='$f$', lw=2)
plt.plot(x_grid, g(x_grid), label='$g$', lw=2)
plt.legend()
plt.show()
What kind of optimal policy might result from Eq. (3) and the parameterization specified
above?
Intuitively, if we accept at 𝑤𝑎 and 𝑤𝑎 ≤ 𝑤𝑏 , then — all other things being given — we should
also accept at 𝑤𝑏
This suggests a policy of accepting whenever 𝑤 exceeds some threshold value 𝑤̄
But 𝑤̄ should depend on 𝜋 — in fact, it should be decreasing in 𝜋 because
• 𝑓 is a less attractive offer distribution than 𝑔
• larger 𝜋 means more weight on 𝑓 and less on 𝑔
Thus larger 𝜋 depresses the worker's assessment of her future prospects, so relatively low current offers become more attractive
Summary: We conjecture that the optimal policy is of the form 1{𝑤 ≥ 𝑤̄(𝜋)} for some decreasing function 𝑤̄
Let’s set about solving the model and see how our results match with our intuition
35.4 Take 1: Solution by VFI
We begin by solving via value function iteration (VFI), which is natural but ultimately turns out to be second best
The class SearchProblem is used to store parameters and methods needed to compute optimal actions
"""
def __init__(self,
β=0.95, # Discount factor
c=0.3, # Unemployment compensation
F_a=1,
F_b=1,
G_a=3,
G_b=1.2,
w_max=1, # Maximum wage possible
w_grid_size=100,
π_grid_size=100,
mc_size=500):
self.mc_size = mc_size
The following function takes an instance of this class and returns jitted versions of the Bellman operator T, and a get_greedy() function to compute the approximate optimal policy from a guess v of the value function
f, g = sp.f, sp.g
w_f, w_g = sp.w_f, sp.w_g
β, c = sp.β, sp.c
mc_size = sp.mc_size
w_grid, π_grid = sp.w_grid, sp.π_grid
@njit
def κ(w, π):
"""
Updates π using Bayes' rule and the current wage observation w.
"""
        pf, pg = π * f(w), (1 - π) * g(w)
        π_new = pf / (pf + pg)
        return π_new
@njit(parallel=parallel_flag)
def T(v):
"""
The Bellman operator.
"""
v_func = lambda x, y: mlinterp((w_grid, π_grid), v, (x, y))
v_new = np.empty_like(v)
for i in prange(len(w_grid)):
for j in prange(len(π_grid)):
w = w_grid[i]
π = π_grid[j]
                v_1 = w / (1 - β)
                # Monte Carlo integration using the stored draws from F and G
                integral_f, integral_g = 0.0, 0.0
                for m in range(mc_size):
                    integral_f += v_func(w_f[m], κ(w_f[m], π))
                    integral_g += v_func(w_g[m], κ(w_g[m], π))
                integral = (π * integral_f + (1 - π) * integral_g) / mc_size
                v_2 = c + β * integral
v_new[i, j] = max(v_1, v_2)
return v_new
@njit(parallel=parallel_flag)
    def get_greedy(v):
        """
        Compute optimal actions taking v as the value function.
        """
        v_func = lambda x, y: mlinterp((w_grid, π_grid), v, (x, y))
        σ = np.empty_like(v)
        for i in prange(len(w_grid)):
            for j in prange(len(π_grid)):
                w = w_grid[i]
                π = π_grid[j]
                v_1 = w / (1 - β)
                integral_f, integral_g = 0.0, 0.0
                for m in range(mc_size):
                    integral_f += v_func(w_f[m], κ(w_f[m], π))
                    integral_g += v_func(w_g[m], κ(w_g[m], π))
                integral = (π * integral_f + (1 - π) * integral_g) / mc_size
                v_2 = c + β * integral
                σ[i, j] = v_1 > v_2  # Evaluates to 1 or 0
        return σ
return T, get_greedy
We will omit a detailed discussion of the code because there is a more efficient solution
method that we will use later
To solve the model we will use the following function that iterates using T to find a fixed
point
def solve_model(sp,
                use_parallel=True,
                tol=1e-4,
max_iter=1000,
verbose=True,
print_skip=5):
"""
Solves for the value function
* sp is an instance of SearchProblem
"""
T, _ = operator_factory(sp, use_parallel)
    # Set up loop
    i = 0
    error = tol + 1
    m, n = len(sp.w_grid), len(sp.π_grid)

    # Initialize v
    v = np.zeros((m, n)) + sp.c / (1 - sp.β)

    while i < max_iter and error > tol:
        v_new = T(v)
        error = np.max(np.abs(v - v_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        v = v_new

    if i == max_iter:
        print("Failed to converge!")

    if verbose and i < max_iter:
        print(f"\nConverged in {i} iterations.")

    return v_new
In [7]: sp = SearchProblem()
v_star = solve_model(sp)
fig, ax = plt.subplots(figsize=(6, 6))
ax.contourf(sp.π_grid, sp.w_grid, v_star, 12, alpha=0.6, cmap=cm.jet)
cs = ax.contour(sp.π_grid, sp.w_grid, v_star, 12, colors="black")
ax.clabel(cs, inline=1, fontsize=10)
ax.set(xlabel='$\pi$', ylabel='$w$')
plt.show()
Converged in 33 iterations.
The results fit well with our intuition from the Looking Forward section above

• The black line in the figure corresponds to the function 𝑤̄(𝜋) introduced there
• It is decreasing as expected

35.5 Take 2: A More Efficient Method
By definition of the reservation wage, at 𝑤 = 𝑤̄(𝜋) the value of accepting equals the value of rejecting, so

𝑤̄(𝜋)/(1 − 𝛽) = 𝑐 + 𝛽 ∫ 𝑣(𝑤′, 𝜋′) 𝑞𝜋(𝑤′) 𝑑𝑤′ (4)

Together with the threshold structure of the optimal policy, this lets us write the value function as

𝑣(𝑤, 𝜋) = max {𝑤/(1 − 𝛽), 𝑤̄(𝜋)/(1 − 𝛽)} (5)

Combining Eq. (4) and Eq. (5) gives

𝑤̄(𝜋)/(1 − 𝛽) = 𝑐 + 𝛽 ∫ max {𝑤′/(1 − 𝛽), 𝑤̄(𝜅(𝑤′, 𝜋))/(1 − 𝛽)} 𝑞𝜋(𝑤′) 𝑑𝑤′

Multiplying both sides by 1 − 𝛽 yields

𝑤̄(𝜋) = (1 − 𝛽)𝑐 + 𝛽 ∫ max {𝑤′, 𝑤̄ ∘ 𝜅(𝑤′, 𝜋)} 𝑞𝜋(𝑤′) 𝑑𝑤′ (6)
Eq. (6) can be understood as a functional equation in which 𝑤̄ is the unknown function
To solve the RWFE, we will first show that its solution is the fixed point of a contraction
mapping
To this end, let 𝑏[0, 1] be the bounded real-valued functions on [0, 1], endowed with the supremum norm ‖ ⋅ ‖, and define the operator 𝑄 on 𝑏[0, 1] by

(𝑄𝜔)(𝜋) = (1 − 𝛽)𝑐 + 𝛽 ∫ max {𝑤′, 𝜔 ∘ 𝜅(𝑤′, 𝜋)} 𝑞𝜋(𝑤′) 𝑑𝑤′ (7)
Comparing Eq. (6) and Eq. (7), we see that the set of fixed points of 𝑄 exactly coincides with
the set of solutions to the RWFE
Moreover, for any 𝜔, 𝜔′ ∈ 𝑏[0, 1], basic algebra and the triangle inequality for integrals tells us
that
|(𝑄𝜔)(𝜋) − (𝑄𝜔′ )(𝜋)| ≤ 𝛽 ∫ |max {𝑤′ , 𝜔 ∘ 𝜅(𝑤′ , 𝜋)} − max {𝑤′ , 𝜔′ ∘ 𝜅(𝑤′ , 𝜋)}| 𝑞𝜋 (𝑤′ ) 𝑑𝑤′ (8)
Working case by case, it is easy to check that for real numbers 𝑎, 𝑏, 𝑐 we always have

|max {𝑎, 𝑐} − max {𝑏, 𝑐}| ≤ |𝑎 − 𝑏| (9)

Combining Eq. (8) and Eq. (9) yields
|(𝑄𝜔)(𝜋) − (𝑄𝜔′ )(𝜋)| ≤ 𝛽 ∫ |𝜔 ∘ 𝜅(𝑤′ , 𝜋) − 𝜔′ ∘ 𝜅(𝑤′ , 𝜋)| 𝑞𝜋 (𝑤′ ) 𝑑𝑤′ ≤ 𝛽‖𝜔 − 𝜔′ ‖ (10)
In other words, 𝑄 is a contraction of modulus 𝛽 on the complete metric space (𝑏[0, 1], ‖ ⋅ ‖)
Hence 𝑄 has a unique fixed point in 𝑏[0, 1], this fixed point is the unique solution 𝑤̄ to the RWFE, and iterates of 𝑄 converge to 𝑤̄ from any starting point in 𝑏[0, 1]
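The practical payoff of the contraction property is geometric convergence of the iterates. A minimal scalar illustration, using a stand-in contraction rather than the model's 𝑄:

```python
β = 0.95

def Q(x):
    "A stand-in scalar contraction of modulus β (not the model's Q)."
    return β * x + 1.0

# Iterating any contraction of modulus β < 1 converges to its unique
# fixed point, here x* = 1 / (1 - β), with error shrinking by β each step
x = 0.0
for _ in range(500):
    x = Q(x)

x_star = 1 / (1 - β)
```

This is why the solvers below track a sup-norm error between successive iterates and stop once it falls below a tolerance.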
Implementation
The following function takes an instance of SearchProblem and returns the operator Q
def Q_factory(sp, parallel_flag=True):

    f, g = sp.f, sp.g
w_f, w_g = sp.w_f, sp.w_g
β, c = sp.β, sp.c
mc_size = sp.mc_size
w_grid, π_grid = sp.w_grid, sp.π_grid
@njit
def κ(w, π):
"""
Updates π using Bayes' rule and the current wage observation w.
"""
pf, pg = π * f(w), (1 - π) * g(w)
π_new = pf / (pf + pg)
return π_new
@njit(parallel=parallel_flag)
def Q(ω):
"""
"""
ω_func = lambda p: interp(π_grid, ω, p)
ω_new = np.empty_like(ω)
for i in prange(len(π_grid)):
π = π_grid[i]
integral_f, integral_g = 0.0, 0.0
for m in prange(mc_size):
integral_f += max(w_f[m], ω_func(κ(w_f[m], π)))
integral_g += max(w_g[m], ω_func(κ(w_g[m], π)))
integral = (π * integral_f + (1 - π) * integral_g) / mc_size
ω_new[i] = (1 - β) * c + β * integral
return ω_new
return Q
35.6 Exercises
35.6.1 Exercise 1
35.7 Solutions
35.7.1 Exercise 1
This code solves the “Offer Distribution Unknown” model by iterating on a guess of the reservation wage function
You should find that the run time is shorter than that of the value function approach
Similar to above, we set up a function to iterate with Q to find the fixed point
def solve_wbar(sp,
               use_parallel=True,
               tol=1e-4,
               max_iter=1000,
               verbose=True,
               print_skip=5):

    Q = Q_factory(sp, use_parallel)

    # Set up loop
    i = 0
    error = tol + 1

    # Initialize w
    w = np.ones_like(sp.π_grid)

    while i < max_iter and error > tol:
        w_new = Q(w)
        error = np.max(np.abs(w - w_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        w = w_new

    if i == max_iter:
        print("Failed to converge!")

    if verbose and i < max_iter:
        print(f"\nConverged in {i} iterations.")

    return w_new
In [11]: sp = SearchProblem()
w_bar = solve_wbar(sp)
Converged in 24 iterations.
35.8 Appendix
The next piece of code is a fun simulation that examines the effect of a change in the underlying wage offer distribution on the unemployment rate
At a point in the simulation, the distribution becomes significantly worse
It takes a while for agents to learn this, and in the meantime, they are too optimistic and
turn down too many jobs
As a result, the unemployment rate spikes
@njit
def update(a, b, e, π):
"Update e and π by drawing wage offer from beta distribution with parameters a and b"
if e == False:
w = np.random.beta(a, b) # Draw random wage
if w >= w_func(π):
e = True # Take new job
else:
π = 1 / (1 + ((1 - π) * g(w)) / (π * f(w)))
return e, π
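The update formula above is just Bayes' rule rearranged; it is algebraically the same as the pf / (pf + pg) form used in κ earlier. A quick check with arbitrary positive density values (the numbers below are purely illustrative):

```python
import numpy as np

def update_ratio(π, fw, gw):
    "Bayes update written as a ratio, as in the simulation code."
    return 1 / (1 + ((1 - π) * gw) / (π * fw))

def update_direct(π, fw, gw):
    "Bayes update written as pf / (pf + pg), as in κ."
    pf, pg = π * fw, (1 - π) * gw
    return pf / (pf + pg)

for π, fw, gw in [(0.2, 1.0, 0.5), (0.7, 0.3, 2.0), (0.5, 1.5, 1.5)]:
    assert np.isclose(update_ratio(π, fw, gw), update_direct(π, fw, gw))
```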
@njit
def simulate_path(F_a=F_a,
F_b=F_b,
G_a=G_a,
G_b=G_b,
N=5000, # Number of agents
T=600, # Simulation length
d=200, # Change date
s=0.025): # Separation rate
e = np.ones((N, T+1))
        π = np.ones((N, T+1)) * 1e-3
        a, b = G_a, G_b                      # Initial distribution parameters
        for t in range(T):
            if t == d:
                a, b = F_a, F_b              # Change distribution parameters
            for n in range(N):
                if e[n, t] and np.random.uniform() <= s:
                    e[n, t] = 0              # Exogenous separation
                e[n, t+1], π[n, t+1] = update(a, b, e[n, t], π[n, t])
        return e

unemployment_rate = 1 - simulate_path().mean(axis=0)
plt.figure(figsize=(10, 6))
plt.plot(unemployment_rate)
plt.axvline(d, color='r', alpha=0.6, label='Change date')
plt.xlabel('Time')
plt.title('Unemployment rate')
plt.legend()
plt.show()
36 Job Search IV: Modeling Career Choice
36.1 Contents
• Overview 36.2
• Model 36.3
• Implementation 36.4
• Exercises 36.5
• Solutions 36.6
In addition to what’s in Anaconda, this lecture will need the following libraries
36.2 Overview
• Career and job within career both chosen to maximize expected discounted wage flow
• Infinite horizon dynamic programming with two state variables
36.3 Model
For workers, wages can be decomposed into the contribution of job and career
• 𝑤𝑡 = 𝜃𝑡 + 𝜖𝑡, where
  – 𝜃𝑡 is the contribution of the worker's career at time 𝑡
  – 𝜖𝑡 is the contribution of the worker's current job

At the start of time 𝑡, a worker has three options:
• retain a current (career, job) pair (𝜃𝑡 , 𝜖𝑡 ) — referred to hereafter as “stay put”
• retain a current career 𝜃𝑡 but redraw a job 𝜖𝑡 — referred to hereafter as “new job”
• redraw both a career 𝜃𝑡 and a job 𝜖𝑡 — referred to hereafter as “new life”
Draws of 𝜃 and 𝜖 are independent of each other and past values, with
• 𝜃𝑡 ∼ 𝐹
• 𝜖𝑡 ∼ 𝐺
Notice that the worker does not have the option to retain a job but redraw a career — starting a new career always requires starting a new job
A young worker aims to maximize the expected sum of discounted wages
E [∑𝑡=0^∞ 𝛽^𝑡 𝑤𝑡] (1)
where 𝛽 ∈ (0, 1) is the discount factor

Let 𝑣(𝜃, 𝜖) denote the value function, which is the maximum of Eq. (1) over all feasible (career, job) policies, given the initial state (𝜃, 𝜖)

The value function obeys 𝑣(𝜃, 𝜖) = max{𝐼, 𝐼𝐼, 𝐼𝐼𝐼}, where

𝐼 = 𝜃 + 𝜖 + 𝛽𝑣(𝜃, 𝜖)

𝐼𝐼 = 𝜃 + ∫ 𝜖′ 𝐺(𝑑𝜖′) + 𝛽 ∫ 𝑣(𝜃, 𝜖′) 𝐺(𝑑𝜖′)

𝐼𝐼𝐼 = ∫ 𝜃′ 𝐹(𝑑𝜃′) + ∫ 𝜖′ 𝐺(𝑑𝜖′) + 𝛽 ∫ ∫ 𝑣(𝜃′, 𝜖′) 𝐹(𝑑𝜃′) 𝐺(𝑑𝜖′)

Evidently 𝐼, 𝐼𝐼 and 𝐼𝐼𝐼 correspond to “stay put”, “new job” and “new life”, respectively
36.3.1 Parameterization
As in [87], section 6.5, we will focus on a discrete version of the model, parameterized as follows:

• 𝛽 = 0.95
• B = 5
• grid_size = 50
The distributions 𝐹 and 𝐺 are discrete distributions generating draws from the grid points
np.linspace(0, B, grid_size)
A very useful family of discrete distributions is the Beta-binomial family, with probability
mass function
𝑝(𝑘 | 𝑛, 𝑎, 𝑏) = (𝑛 choose 𝑘) 𝐵(𝑘 + 𝑎, 𝑛 − 𝑘 + 𝑏) / 𝐵(𝑎, 𝑏), 𝑘 = 0, … , 𝑛

where 𝐵(⋅, ⋅) is the beta function
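As a sketch, this pmf is easy to evaluate with SciPy's comb and beta functions (the helper name below is ours, not the library's):

```python
import numpy as np
from scipy.special import beta, comb

def beta_binomial_pmf(k, n, a, b):
    "Beta-binomial probability mass function p(k | n, a, b)."
    return comb(n, k) * beta(k + a, n - k + b) / beta(a, b)

n, a, b = 50, 3, 1.2
probs = np.array([beta_binomial_pmf(k, n, a, b) for k in range(n + 1)])
# As a valid pmf, the probabilities are nonnegative and sum to one
```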
Interpretation:

• draw 𝑞 from a Beta distribution with shape parameters (𝑎, 𝑏)
• run 𝑛 independent binary trials, each with success probability 𝑞
• 𝑝(𝑘 | 𝑛, 𝑎, 𝑏) is the probability of 𝑘 successes in these 𝑛 trials

Nice properties:

• very flexible class of distributions, including uniform and symmetric unimodal shapes
• only three parameters
Here’s a figure showing the effect on the pmf of different shape parameters when 𝑛 = 50
n = 50
a_vals = [0.5, 1, 100]
b_vals = [0.5, 1, 100]
fig, ax = plt.subplots(figsize=(10, 6))
for a, b in zip(a_vals, b_vals):
    ab_label = f'$a = {a}, b = {b}$'
    ax.plot(list(range(n + 1)), BetaBinomial(n, a, b).pdf(), '-o', label=ab_label)
ax.legend()
plt.show()
36.4 Implementation
We will first create a class CareerWorkerProblem which will hold the default parameterizations of the model and an initial guess for the value function
class CareerWorkerProblem:

    def __init__(self,
                 B=5.0,          # Upper bound
                 β=0.95,         # Discount factor
                 grid_size=50,   # Grid size
                 F_a=1,
                 F_b=1,
                 G_a=1,
                 G_b=1):

        self.β, self.grid_size, self.B = β, grid_size, B

        self.θ = np.linspace(0, B, grid_size)     # Set of θ values
        self.ϵ = np.linspace(0, B, grid_size)     # Set of ϵ values

        self.F_probs = BetaBinomial(grid_size - 1, F_a, F_b).pdf()
        self.G_probs = BetaBinomial(grid_size - 1, G_a, G_b).pdf()
        self.F_mean = np.sum(self.θ * self.F_probs)
        self.G_mean = np.sum(self.ϵ * self.G_probs)
The following function takes an instance of CareerWorkerProblem and returns the corresponding Bellman operator 𝑇 and the greedy policy function
In this model, 𝑇 is defined by 𝑇 𝑣(𝜃, 𝜖) = max{𝐼, 𝐼𝐼, 𝐼𝐼𝐼}, where 𝐼, 𝐼𝐼 and 𝐼𝐼𝐼 are as given in
Eq. (2)
"""
Returns jitted versions of the Bellman operator and the
greedy policy function
cw is an instance of ``CareerWorkerProblem``
"""
@njit(parallel=parallel_flag)
def T(v):
"The Bellman operator"
v_new = np.empty_like(v)
for i in prange(len(v)):
for j in prange(len(v)):
                v1 = θ[i] + ϵ[j] + β * v[i, j]                    # stay put
v2 = θ[i] + G_mean + β * v[i, :] @ G_probs # new job
v3 = G_mean + F_mean + β * F_probs @ v @ G_probs # new life
v_new[i, j] = max(v1, v2, v3)
return v_new
@njit
def get_greedy(v):
"Computes the v-greedy policy"
σ = np.empty(v.shape)
for i in range(len(v)):
for j in range(len(v)):
                v1 = θ[i] + ϵ[j] + β * v[i, j]
v2 = θ[i] + G_mean + β * v[i, :] @ G_probs
v3 = G_mean + F_mean + β * F_probs @ v @ G_probs
if v1 > max(v2, v3):
action = 1
elif v2 > max(v1, v3):
action = 2
else:
action = 3
σ[i, j] = action
return σ
return T, get_greedy
def solve_model(cw,
                use_parallel=True,
                tol=1e-4,
                max_iter=1000,
                verbose=True,
                print_skip=25):

    T, _ = operator_factory(cw, parallel_flag=use_parallel)

    # Set up loop
    v = np.ones((cw.grid_size, cw.grid_size)) * 100  # Initial guess
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        v_new = T(v)
        error = np.max(np.abs(v - v_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        v = v_new

    if i == max_iter:
        print("Failed to converge!")

    return v_new
In [7]: cw = CareerWorkerProblem()
T, get_greedy = operator_factory(cw)
v_star = solve_model(cw, verbose=False)
greedy_star = get_greedy(v_star)
Interpretation:
• If both job and career are poor or mediocre, the worker will experiment with a new job
and new career
• If career is sufficiently good, the worker will hold it and experiment with new jobs until
a sufficiently good one is found
• If both job and career are good, the worker will stay put
Notice that the worker will always hold on to a sufficiently good career, but not necessarily
hold on to even the best paying job
The reason is that high lifetime wages require both variables to be large, and the worker cannot change careers without changing jobs
36.5 Exercises
36.5.1 Exercise 1
Using the default parameterization in the class CareerWorkerProblem, generate and plot
typical sample paths for 𝜃 and 𝜖 when the worker follows the optimal policy
In particular, modulo randomness, reproduce the following figure (where the horizontal axis
represents time)
Hint: To generate the draws from the distributions 𝐹 and 𝐺, use quantecon.random.draw()
36.5.2 Exercise 2
Let’s now consider how long it takes for the worker to settle down to a permanent job, given
a starting point of (𝜃, 𝜖) = (0, 0)
In other words, we want to study the distribution of the random variable
𝑇 ∗ ∶= the first point in time from which the worker’s job no longer changes
Evidently, the worker’s job becomes permanent if and only if (𝜃𝑡 , 𝜖𝑡 ) enters the “stay put”
region of (𝜃, 𝜖) space
Letting 𝑆 denote this region, 𝑇 ∗ can be expressed as the first passage time to 𝑆 under the
optimal policy:
𝑇 ∗ ∶= inf{𝑡 ≥ 0 | (𝜃𝑡 , 𝜖𝑡 ) ∈ 𝑆}
Collect 25,000 draws of this random variable and compute the median (which should be
about 7)
Repeat the exercise with 𝛽 = 0.99 and interpret the change
36.5.3 Exercise 3
Set the parameterization to G_a = G_b = 100 and generate a new optimal policy figure –
interpret
36.6 Solutions
36.6.1 Exercise 1
In [9]: F = np.cumsum(cw.F_probs)
G = np.cumsum(cw.G_probs)
v_star = solve_model(cw, verbose=False)
T, get_greedy = operator_factory(cw)
greedy_star = get_greedy(v_star)
plt.legend()
plt.show()
36.6.2 Exercise 2
In [10]: cw = CareerWorkerProblem()
F = np.cumsum(cw.F_probs)
G = np.cumsum(cw.G_probs)
T, get_greedy = operator_factory(cw)
v_star = solve_model(cw, verbose=False)
greedy_star = get_greedy(v_star)
@njit
def passage_time(optimal_policy, F, G):
t = 0
i = j = 0
while True:
if optimal_policy[i, j] == 1: # Stay put
return t
elif optimal_policy[i, j] == 2: # New job
j = int(qe.random.draw(G))
else: # New life
i, j = int(qe.random.draw(F)), int(qe.random.draw(G))
t += 1
@njit(parallel=True)
def median_time(optimal_policy, F, G, M=25000):
samples = np.empty(M)
for i in prange(M):
samples[i] = passage_time(optimal_policy, F, G)
return np.median(samples)
median_time(greedy_star, F, G)
Out[10]: 7.0
To compute the median with 𝛽 = 0.99 instead of the default value 𝛽 = 0.95, replace cw =
CareerWorkerProblem() with cw = CareerWorkerProblem(β=0.99)
The medians are subject to randomness but should be about 7 and 14 respectively
Not surprisingly, more patient workers will wait longer to settle down to their final job
36.6.3 Exercise 3
In the new figure, you see that the region in which the worker stays put has grown, because the distribution for 𝜖 has become more concentrated around the mean, making very high-paying jobs less likely
37 Job Search V: On-the-Job Search
37.1 Contents
• Overview 37.2
• Model 37.3
• Implementation 37.4
• Solving for Policies 37.5
• Exercises 37.6
• Solutions 37.7
In addition to what’s in Anaconda, this lecture will need the following libraries
37.2 Overview
37.3 Model
Let
• 𝑥𝑡 denote the time-𝑡 job-specific human capital of a worker employed at a given firm
• 𝑤𝑡 denote current wages
Let 𝑤𝑡 = 𝑥𝑡(1 − 𝑠𝑡 − 𝜙𝑡), where

• 𝜙𝑡 is time spent investing in human capital specific to the current job
• 𝑠𝑡 is search effort, devoted to obtaining new offers from other firms
For as long as the worker remains in the current job, evolution of {𝑥𝑡 } is given by 𝑥𝑡+1 =
𝑔(𝑥𝑡 , 𝜙𝑡 )
When search effort at 𝑡 is 𝑠𝑡 , the worker receives a new job offer with probability 𝜋(𝑠𝑡 ) ∈ [0, 1]
Value of offer is 𝑢𝑡+1 , where {𝑢𝑡 } is IID with common distribution 𝑓
Worker has the right to reject the current offer and continue with existing job
In particular, 𝑥𝑡+1 = 𝑢𝑡+1 if accepts and 𝑥𝑡+1 = 𝑔(𝑥𝑡 , 𝜙𝑡 ) if rejects
Letting 𝑏𝑡+1 ∈ {0, 1} be binary with 𝑏𝑡+1 = 1 indicating an offer, we can write

𝑥𝑡+1 = (1 − 𝑏𝑡+1) 𝑔(𝑥𝑡, 𝜙𝑡) + 𝑏𝑡+1 max {𝑔(𝑥𝑡, 𝜙𝑡), 𝑢𝑡+1} (1)
Agent’s objective: maximize expected discounted sum of wages via controls {𝑠𝑡 } and {𝜙𝑡 }
Taking the expectation of 𝑣(𝑥𝑡+1 ) and using Eq. (1), the Bellman equation for this problem
can be written as
𝑣(𝑥) = max {𝑥(1 − 𝑠 − 𝜙) + 𝛽(1 − 𝜋(𝑠))𝑣[𝑔(𝑥, 𝜙)] + 𝛽𝜋(𝑠) ∫ 𝑣[𝑔(𝑥, 𝜙) ∨ 𝑢]𝑓(𝑑𝑢)} (2)
𝑠+𝜙≤1
37.3.1 Parameterization
𝑔(𝑥, 𝜙) = 𝐴(𝑥𝜙)^𝛼, 𝜋(𝑠) = √𝑠 and 𝑓 = Beta(2, 2)
• 𝐴 = 1.4
• 𝛼 = 0.6
• 𝛽 = 0.96
Before we solve the model, let’s make some quick calculations that provide intuition on what
the solution should look like
To begin, observe that the worker has two instruments to build capital and hence wages:

1. investing in capital specific to the current job via 𝜙
2. searching for a new job with better job-specific capital via 𝑠
Since wages are 𝑥(1 − 𝑠 − 𝜙), marginal cost of investment via either 𝜙 or 𝑠 is identical
Our risk-neutral worker should focus on whatever instrument has the highest expected return
The relative expected return will depend on 𝑥
For example, suppose first that 𝑥 = 0.05
• If 𝑠 = 1 and 𝜙 = 0, then since 𝑔(𝑥, 𝜙) = 0, taking expectations of Eq. (1) gives expected
next period capital equal to 𝜋(𝑠)E𝑢 = E𝑢 = 0.5
• If 𝑠 = 0 and 𝜙 = 1, then next period capital is 𝑔(𝑥, 𝜙) = 𝑔(0.05, 1) ≈ 0.23
Both rates of return are good, but the return from search is better
Next, suppose that 𝑥 = 0.4
1. At any given state 𝑥, the two controls 𝜙 and 𝑠 will function primarily as substitutes —
worker will focus on whichever instrument has the higher expected return
2. For sufficiently small 𝑥, search will be preferable to investment in job-specific human
capital. For larger 𝑥, the reverse will be true
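The back-of-the-envelope comparison for 𝑥 = 0.05 is easy to verify numerically:

```python
A, α = 1.4, 0.6
g = lambda x, ϕ: A * (x * ϕ)**α   # law of motion for human capital

x = 0.05
return_from_investment = g(x, 1)   # s = 0, ϕ = 1
return_from_search = 0.5           # π(1) · E[u] = 1 × 0.5 for u ~ Beta(2, 2)
# return_from_investment ≈ 0.23, so search dominates at this low x
```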
Now let’s turn to implementation, and see if we can match our predictions
37.4 Implementation
We will set up a class JVWorker that holds the parameters of the model described above
"""
def __init__(self,
A=1.4,
α=0.6,
β=0.96, # Discount factor
# Max of grid is the max of a large quantile value for f and the
# fixed point y = g(y, 1)
        ϵ = 1e-4
        grid_max = max(A**(1 / (1 - α)), stats.beta(a, b).ppf(1 - ϵ))

        # Human capital
        self.x_grid = np.linspace(ϵ, grid_max, grid_size)
        self.ϵ = ϵ    # Also used below as the lower bound of the policy search
The function operator_factory takes an instance of this class and returns a jitted version of the Bellman operator T, i.e.

𝑇𝑣(𝑥) = max_{𝑠+𝜙≤1} 𝑤(𝑠, 𝜙)

where

𝑤(𝑠, 𝜙) ∶= 𝑥(1 − 𝑠 − 𝜙) + 𝛽(1 − 𝜋(𝑠))𝑣[𝑔(𝑥, 𝜙)] + 𝛽𝜋(𝑠) ∫ 𝑣[𝑔(𝑥, 𝜙) ∨ 𝑢]𝑓(𝑑𝑢) (3)
When we represent 𝑣, it will be with a NumPy array v giving values on grid x_grid
But to evaluate the right-hand side of Eq. (3), we need a function, so we replace the arrays v
and x_grid with a function v_func that gives linear interpolation of v on x_grid
Inside the for loop, for each x in the grid over the state space, we set up the function 𝑤(𝑧) =
𝑤(𝑠, 𝜙) defined in Eq. (3)
The function is maximized over all feasible (𝑠, 𝜙) pairs
Another function, get_greedy, returns the optimal policies for 𝑠 and 𝜙 given a value function
"""
Returns a jitted version of the Bellman operator T
jv is an instance of JVWorker
"""
π, β = jv.π, jv.β
x_grid, �, mc_size = jv.x_grid, jv.�, jv.mc_size
f_rvs, g = jv.f_rvs, jv.g
    @njit
    def objective(z, x, v):
        s, ϕ = z
        v_func = lambda x: interp(x_grid, v, x)

        integral = 0
        for m in range(mc_size):
            u = f_rvs[m]
            integral += v_func(max(g(x, ϕ), u))
        integral = integral / mc_size

        q = π(s) * integral + (1 - π(s)) * v_func(g(x, ϕ))
        return x * (1 - ϕ - s) + β * q
@njit(parallel=parallel_flag)
def T(v):
"""
The Bellman operator
"""
v_new = np.empty_like(v)
        for i in prange(len(x_grid)):
            x = x_grid[i]
            # === Search on a grid === #
            search_grid = np.linspace(ϵ, 1, 15)
            max_val = -1
            for s in search_grid:
                for ϕ in search_grid:
                    current_val = objective((s, ϕ), x, v) if s + ϕ <= 1 else -1
                    if current_val > max_val:
                        max_val = current_val
            v_new[i] = max_val
        return v_new
@njit
def get_greedy(v):
"""
Computes the v-greedy policy of a given function v
"""
        s_policy, ϕ_policy = np.empty_like(v), np.empty_like(v)
        for i in range(len(x_grid)):
            x = x_grid[i]
            # === Search on a grid === #
            search_grid = np.linspace(ϵ, 1, 15)
            max_val = -1
            for s in search_grid:
                for ϕ in search_grid:
                    current_val = objective((s, ϕ), x, v) if s + ϕ <= 1 else -1
                    if current_val > max_val:
                        max_val = current_val
                        max_s, max_ϕ = s, ϕ
            s_policy[i], ϕ_policy[i] = max_s, max_ϕ
        return s_policy, ϕ_policy
return T, get_greedy
To solve the model, we will write a function that uses the Bellman operator and iterates to
find a fixed point
"""
Solves the model by value function iteration
* jv is an instance of JVWorker
"""
T, _ = operator_factory(jv, parallel_flag=use_parallel)
616 37. JOB SEARCH V: ON-THE-JOB SEARCH
# Set up loop
v = jv.x_grid * 0.5 # Initial condition
i = 0
error = tol + 1
if i == max_iter:
print("Failed to converge!")
return v_new
Let’s plot the optimal policies and see what they look like
In [6]: jv = JVWorker()
T, get_greedy = operator_factory(jv)
v_star = solve_model(jv)
s_star, ϕ_star = get_greedy(v_star)
plots = [s_star, ϕ_star, v_star]
titles = ["s policy", "ϕ policy", "value function"]

fig, axes = plt.subplots(3, 1, figsize=(12, 12))
for ax, plot, title in zip(axes, plots, titles):
    ax.plot(jv.x_grid, plot)
    ax.set(title=title)
axes[-1].set_xlabel("x")
plt.show()
The horizontal axis is the state 𝑥, while the vertical axis gives 𝑠(𝑥) and 𝜙(𝑥)
Overall, the policies match well with our predictions from above
• Worker switches from one investment strategy to the other depending on relative return
• For low values of 𝑥, the best option is to search for a new job
• Once 𝑥 is larger, worker does better by investing in human capital specific to the current position
37.6 Exercises
37.6.1 Exercise 1
Let’s look at the dynamics for the state process {𝑥𝑡 } associated with these policies
The dynamics are given by Eq. (1) when 𝜙𝑡 and 𝑠𝑡 are chosen according to the optimal poli-
cies, and P{𝑏𝑡+1 = 1} = 𝜋(𝑠𝑡 )
Since the dynamics are random, analysis is a bit subtle
One way to do it is to plot, for each 𝑥 in a relatively fine grid called plot_grid, a large
number 𝐾 of realizations of 𝑥𝑡+1 given 𝑥𝑡 = 𝑥
Plot this with one dot for each realization, in the form of a 45 degree diagram, setting
jv = JVWorker(grid_size=25, mc_size=50)
plot_grid_max, plot_grid_size = 1.2, 100
plot_grid = np.linspace(0, plot_grid_max, plot_grid_size)
fig, ax = plt.subplots()
ax.set_xlim(0, plot_grid_max)
ax.set_ylim(0, plot_grid_max)
By examining the plot, argue that under the optimal policies, the state 𝑥𝑡 will converge to a
constant value 𝑥̄ close to unity
Argue that at the steady state, 𝑠𝑡 ≈ 0 and 𝜙𝑡 ≈ 0.6
37.6.2 Exercise 2
In the preceding exercise, we found that 𝑠𝑡 converges to zero and 𝜙𝑡 converges to about 0.6
Since these results were calculated at a value of 𝛽 close to one, let’s compare them to the best
choice for an infinitely patient worker
Intuitively, an infinitely patient worker would like to maximize steady state wages, which are
a function of steady state capital
You can take it as given—it’s certainly true—that the infinitely patient worker does not
search in the long run (i.e., 𝑠𝑡 = 0 for large 𝑡)
Thus, given 𝜙, steady state capital is the positive fixed point 𝑥∗ (𝜙) of the map 𝑥 ↦ 𝑔(𝑥, 𝜙)
Steady state wages can be written as 𝑤∗ (𝜙) = 𝑥∗ (𝜙)(1 − 𝜙)
Graph 𝑤∗ (𝜙) with respect to 𝜙, and examine the best choice of 𝜙
Can you give a rough interpretation for the value that you see?
37.7 Solutions
37.7.1 Exercise 1
• If 𝑥𝑡 is below about 0.2 the dynamics are random, but 𝑥𝑡+1 > 𝑥𝑡 is very likely
• As 𝑥𝑡 increases the dynamics become deterministic, and 𝑥𝑡 converges to a steady state
value close to 1
Referring back to the earlier policy figure, we see that 𝑥𝑡 ≈ 1 means that 𝑠𝑡 = 𝑠(𝑥𝑡) ≈ 0 and 𝜙𝑡 = 𝜙(𝑥𝑡) ≈ 0.6
37.7.2 Exercise 2
In [8]: jv = JVWorker()

        def xbar(ϕ):
            A, α = jv.A, jv.α
            return (A * ϕ**α)**(1 / (1 - α))

        ϕ_grid = np.linspace(0, 1, 100)
        fig, ax = plt.subplots(figsize=(9, 5))
        ax.plot(ϕ_grid, [xbar(ϕ) * (1 - ϕ) for ϕ in ϕ_grid], label=r'$w^*(\phi)$')
        ax.set(xlabel=r'$\phi$')
        ax.legend()
        plt.show()
38 Optimal Growth I: The Stochastic Optimal Growth Model

38.1 Contents

• Overview 38.2
• The Model 38.3
• Computation 38.4
• Exercises 38.5
• Solutions 38.6
In addition to what’s in Anaconda, this lecture will need the following libraries
38.2 Overview
In this lecture, we’re going to study a simple optimal growth model with one agent
The model is a version of the standard one sector infinite horizon growth model studied in
• [123], chapter 2
• [87], section 3.1
• EDTC, chapter 1
• [127], chapter 12
We use an interpolation function from the interpolation.py package because it comes in handy
later when we want to just-in-time compile our code
This library can be installed with the following command in Jupyter: !pip install interpolation
38.3 The Model

The resource constraint for the agent is

𝑘𝑡+1 + 𝑐𝑡 ≤ 𝑦𝑡 (1)

In what follows,
While many other treatments of the stochastic growth model use 𝑘𝑡 as the state variable, we
will use 𝑦𝑡
This will allow us to treat a stochastic model while maintaining only one state variable
We consider alternative states and timing specifications in some of our other lectures
38.3.2 Optimization
The agent's aim is to maximize

E [∑𝑡=0^∞ 𝛽^𝑡 𝑢(𝑐𝑡)] (2)
subject to

𝑦𝑡+1 = 𝑓(𝑦𝑡 − 𝑐𝑡)𝜉𝑡+1 and 0 ≤ 𝑐𝑡 ≤ 𝑦𝑡 for all 𝑡 (3)

where

• 𝑢 is a bounded, continuous and strictly increasing utility function and
• 𝛽 ∈ (0, 1) is a discount factor
In Eq. (3) we are assuming that the resource constraint Eq. (1) holds with equality — which
is reasonable because 𝑢 is strictly increasing and no output will be wasted at the optimum
In summary, the agent’s aim is to select a path 𝑐0 , 𝑐1 , 𝑐2 , … for consumption that is
1. nonnegative,
2. feasible in the sense of Eq. (1),
3. optimal, in the sense that it maximizes Eq. (2) relative to all other feasible consumption
sequences, and
4. adapted, in the sense that the action 𝑐𝑡 depends only on observable outcomes, not on
future outcomes such as 𝜉𝑡+1
• 𝑦𝑡 is called the state variable — it summarizes the “state of the world” at the start of
each period
• 𝑐𝑡 is called the control variable — a value chosen by the agent each period after observ-
ing the state
One way to think about solving this problem is to look for the best policy function
A policy function is a map from past and present observables into current action
We’ll be particularly interested in Markov policies, which are maps from the current state
𝑦𝑡 into a current action 𝑐𝑡
For dynamic programming problems such as this one (in fact for any Markov decision process), the optimal policy is always a Markov policy
In other words, the current state 𝑦𝑡 provides a sufficient statistic for the history in terms of
making an optimal decision today
This is quite intuitive but if you wish you can find proofs in texts such as [123] (section 4.1)
A Markov policy 𝜎 ∶ R+ → R+ is called a feasible consumption policy if it satisfies

0 ≤ 𝜎(𝑦) ≤ 𝑦 for all 𝑦 ∈ R+ (4)

In other words, a feasible consumption policy is a Markov policy that respects the resource constraint
The set of all feasible consumption policies will be denoted by Σ
Each 𝜎 ∈ Σ determines a continuous state Markov process {𝑦𝑡} for output via

𝑦𝑡+1 = 𝑓(𝑦𝑡 − 𝜎(𝑦𝑡))𝜉𝑡+1, 𝑦0 given (5)
This is the time path for output when we choose and stick with the policy 𝜎
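Such a path is straightforward to simulate once primitives are fixed. Here is a sketch under assumed primitives, 𝑓(𝑘) = 𝑘^𝛼 with lognormal shocks, and an arbitrary (not optimal) policy 𝜎(𝑦) = 0.4𝑦:

```python
import numpy as np

# Hypothetical primitives, chosen only to illustrate the recursion
α, s = 0.4, 0.1
f = lambda k: k**α
σ = lambda y: 0.4 * y          # an arbitrary feasible Markov policy

np.random.seed(0)
T = 100
y = np.empty(T + 1)
y[0] = 1.0
for t in range(T):
    ξ = np.exp(s * np.random.randn())        # lognormal shock
    y[t + 1] = f(y[t] - σ(y[t])) * ξ          # y' = f(y - σ(y)) ξ
```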
We insert this process into the objective function to get
E [∑𝑡=0^∞ 𝛽^𝑡 𝑢(𝑐𝑡)] = E [∑𝑡=0^∞ 𝛽^𝑡 𝑢(𝜎(𝑦𝑡))] (6)
This is the total expected present value of following policy 𝜎 forever, given initial income 𝑦0
The aim is to select a policy that makes this number as large as possible
The next section covers these ideas more formally
38.3.4 Optimality
The 𝜎-value function is

𝑣𝜎(𝑦) = E [∑𝑡=0^∞ 𝛽^𝑡 𝑢(𝜎(𝑦𝑡))] (7)

when {𝑦𝑡} is generated by Eq. (5) with 𝑦0 = 𝑦
The value function is then defined as

𝑣∗(𝑦) ∶= sup_{𝜎∈Σ} 𝑣𝜎(𝑦) (8)

The value function gives the maximal value that can be obtained from state 𝑦, after considering all feasible policies

A policy 𝜎 ∈ Σ is called optimal if it attains the supremum in Eq. (8) for all 𝑦 ∈ R+
With our assumptions on utility and production function, the value function as defined in
Eq. (8) also satisfies a Bellman equation
For this problem, the Bellman equation takes the form

𝑣(𝑦) = max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧)𝜙(𝑑𝑧)} for all 𝑦 ∈ R+ (9)
The Bellman equation is important because it gives us more information about the value
function
It also suggests a way of computing the value function, which we discuss below
The primary importance of the value function is that we can use it to compute optimal policies
The details are as follows
Given a continuous function 𝑣 on R+, we say that 𝜎 ∈ Σ is 𝑣-greedy if 𝜎(𝑦) is a solution to

max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧)𝜙(𝑑𝑧)} (10)
for every 𝑦 ∈ R+
In other words, 𝜎 ∈ Σ is 𝑣-greedy if it optimally trades off current and future rewards when 𝑣
is taken to be the value function
In our setting, we have the following key result: a feasible consumption policy is optimal if and only if it is 𝑣∗-greedy
The intuition is similar to the intuition for the Bellman equation, which was provided after
Eq. (9)
See, for example, theorem 10.1.11 of EDTC
Hence, once we have a good approximation to 𝑣∗, we can compute the (approximately) optimal policy by computing the corresponding greedy policy
The advantage is that we are now solving a much lower dimensional optimization problem

The Bellman operator 𝑇 is defined by

𝑇𝑣(𝑦) = max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧)𝜙(𝑑𝑧)} (11)

In other words, 𝑇 sends the function 𝑣 into the new function 𝑇𝑣 defined by Eq. (11)
By construction, the set of solutions to the Bellman equation Eq. (9) exactly coincides with
the set of fixed points of 𝑇
For example, if 𝑇𝑣 = 𝑣, then, for any 𝑦 ≥ 0,

𝑣(𝑦) = 𝑇𝑣(𝑦) = max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧)𝜙(𝑑𝑧)}

which says precisely that 𝑣 solves the Bellman equation
One can also show that 𝑇 is a contraction mapping on the set of continuous bounded functions on R+ under the supremum distance
The results stated above assume that the utility function is bounded
In practice economists often work with unbounded utility functions — and so will we
In the unbounded setting, various optimality theories exist
Unfortunately, they tend to be case-specific, as opposed to valid for a large range of applications
Nevertheless, their main conclusions are usually in line with those stated for the bounded case
just above (as long as we drop the word “bounded”)
Consult, for example, section 12.2 of EDTC, [75] or [92]
38.4 Computation
Let’s now look at computing the value function and the optimal policy
The first step is to compute the value function by value function iteration
In theory, the algorithm is as follows:

1. Begin with an array of values representing the values of some initial function 𝑣 on a grid
2. Build a function on the state space by interpolating these values
3. Obtain and record the value 𝑇𝑣 on each grid point by applying the Bellman operator
4. Unless some stopping condition is satisfied, set 𝑣 = 𝑇𝑣 and go to step 2

To build the function in step 2, we use piecewise linear interpolation, as in the following example
def f(x):
    "An arbitrary function to approximate"
    return 2 * np.cos(6 * x) + np.sin(14 * x) + 2.5

c_grid = np.linspace(0, 1, 6)       # Coarse grid of interpolation nodes
f_grid = np.linspace(0, 1, 150)     # Fine grid, used for plotting

def Af(x):
    "Piecewise linear interpolation of f on c_grid"
    return interp(c_grid, f(c_grid), x)
Another advantage of piecewise linear interpolation is that it preserves useful shape properties
such as monotonicity and concavity/convexity
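A quick way to see the monotonicity claim: interpolate an increasing function on a coarse grid and check that the interpolant is increasing on a fine grid (this sketch uses NumPy's built-in `np.interp` rather than the interpolation package):

```python
import numpy as np

c_grid = np.linspace(0, 1, 6)
f_vals = np.sqrt(c_grid)                  # increasing (and concave) data
fine_grid = np.linspace(0, 1, 200)
approx = np.interp(fine_grid, c_grid, f_vals)
# Linear interpolation of increasing data is increasing everywhere
```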
class OptimalGrowthModel:

    def __init__(self,
f, # Production function
u, # Utility function
β=0.96, # Discount factor
μ=0,
s=0.1,
grid_max=4,
grid_size=200,
                 shock_size=250):

        self.β, self.μ, self.s = β, μ, s
        self.f, self.u = f, u
        self.grid = np.linspace(1e-5, grid_max, grid_size)          # Grid over y
        self.shocks = np.exp(μ + s * np.random.randn(shock_size))   # Draws of ξ


def operator_factory(og, parallel_flag=True):
    """
    Returns jitted versions of the Bellman operator T and a
    get_greedy function for computing a v-greedy policy

    og is an instance of the growth model
    """
    f, u, β = og.f, og.u, og.β
    grid, shocks = og.grid, og.shocks
@njit
def objective(c, v, y):
"""
The right-hand side of the Bellman equation
"""
# First turn v into a function via interpolation
v_func = lambda x: interp(grid, v, x)
return u(c) + β * np.mean(v_func(f(y - c) * shocks))
@njit(parallel=parallel_flag)
def T(v):
"""
The Bellman operator
"""
v_new = np.empty_like(v)
for i in prange(len(grid)):
y = grid[i]
# Solve for optimal v at y
v_max = brent_max(objective, 1e-10, y, args=(v, y))[1]
v_new[i] = v_max
return v_new
@njit
def get_greedy(v):
"""
Computes the v-greedy policy of a given function v
"""
σ = np.empty_like(v)
for i in range(len(grid)):
y = grid[i]
# Solve for optimal c at y
c_max = brent_max(objective, 1e-10, y, args=(v, y))[0]
σ[i] = c_max
return σ
return T, get_greedy
The function operator_factory takes a class that represents the growth model and returns the operator T and a function get_greedy that we will use to solve the model
Notice that the expectation in Eq. (11) is computed via Monte Carlo, using the approximation
∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧)𝜙(𝑑𝑧) ≈ (1/𝑛) ∑𝑖=1^𝑛 𝑣(𝑓(𝑦 − 𝑐)𝜉𝑖)
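A sanity check on this Monte Carlo approach: with 𝑣 equal to the identity, the expectation of the shock exp(𝜇 + 𝑠𝜁) has the known value exp(𝜇 + 𝑠²/2), which the sample mean should recover:

```python
import numpy as np

np.random.seed(0)
μ, s, n = 0.0, 0.1, 500_000
ξ = np.exp(μ + s * np.random.randn(n))    # IID lognormal draws
mc_estimate = ξ.mean()                    # Monte Carlo estimate of E[ξ]
exact = np.exp(μ + s**2 / 2)              # exact lognormal mean
```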
38.4.4 An Example
• 𝑓(𝑘) = 𝑘𝛼
• 𝑢(𝑐) = ln 𝑐
• 𝜙 is the distribution of exp(𝜇 + 𝜎𝜁) when 𝜁 is standard normal
As is well-known (see [87], section 3.1.2), for this particular problem an exact analytical solution is available, with
𝜎∗ (𝑦) = (1 − 𝛼𝛽)𝑦
We will define functions to compute the closed-form solutions to check our answers
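For the policy, the closed form is simple enough to sketch directly (the parameter values below are hypothetical, chosen only for illustration):

```python
α, β = 0.4, 0.96   # hypothetical parameter values, for illustration only

def σ_star(y):
    "Closed-form optimal policy: consume the fraction 1 - αβ of income."
    return (1 - α * β) * y

# The implied saving rate is the constant αβ, independent of income
y = 2.0
saving_rate = (y - σ_star(y)) / y
```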
To test our code, we want to see if we can replicate the analytical solution numerically, using
fitted value function iteration
First, having run the code for the general model shown above, let’s generate an instance of
the model and generate its Bellman operator
We first need to define a jitted version of the production function
@njit
def f(k):
"""
Cobb-Douglas production function
"""
return k**α
Now we will create an instance of the model and assign it to the variable og
This instance will use the Cobb-Douglas production function and log utility
We will use og to generate the Bellman operator and a function that computes greedy policies
The two functions are essentially indistinguishable, so we are off to a good start
Now let’s have a look at iterating with the Bellman operator, starting off from an arbitrary
initial condition
The initial condition we’ll start with is 𝑣(𝑦) = 5 ln(𝑦)
ax.plot(grid, v, color=plt.cm.jet(0),
lw=2, alpha=0.6, label='Initial condition')
for i in range(n):
    v = T(v)  # Apply the Bellman operator
    ax.plot(grid, v, color=plt.cm.jet(i / n), lw=2, alpha=0.6)

ax.legend()
ax.set(ylim=(-40, 10), xlim=(np.min(grid), np.max(grid)))
plt.show()
1. the first 36 functions generated by the fitted value function iteration algorithm, with
hotter colors given to higher iterates
2. the true value function 𝑣∗ drawn in black
def solve_model(og,
                use_parallel=True,
                tol=1e-4,
                max_iter=1000,
                verbose=True,
                print_skip=25):

    T, _ = operator_factory(og, parallel_flag=use_parallel)

    # Set up loop
    v = np.log(og.grid)  # Initial condition
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        v_new = T(v)
        error = np.max(np.abs(v - v_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        v = v_new

    if i == max_iter:
        print("Failed to converge!")

    return v_new
ax.legend()
ax.set_ylim(-35, -24)
plt.show()
To compute an approximate optimal policy, we will use the second function returned from
operator_factory that backs out the optimal policy from the solution to the Bellman
equation
The next figure compares the result to the exact solution, which, as mentioned above, is
𝜎(𝑦) = (1 − 𝛼𝛽)𝑦
ax.legend()
plt.show()
The figure shows that we’ve done a good job in this instance of approximating the true policy
38.5 Exercises
38.5.1 Exercise 1
38.6 Solutions
38.6.1 Exercise 1
Here’s one solution (assuming as usual that you’ve executed everything above)
σ_star = get_greedy(v_solution)
σ_func = lambda x: interp(grid, σ_star, x) # Define an optimal policy function
y = simulate_og(σ_func, og, α)
ax.plot(y, lw=2, alpha=0.6, label=rf'$\beta = {β}$')
ax.legend(loc='lower right')
plt.show()
39 Optimal Growth II: Time Iteration
39.1 Contents
• Overview 39.2
• The Euler Equation 39.3
• Comparison with Value Function Iteration 39.4
• Implementation 39.5
• Exercises 39.6
• Solutions 39.7
In addition to what’s in Anaconda, this lecture will need the following libraries
39.2 Overview
In this lecture, we’ll continue our earlier study of the stochastic optimal growth model
In that lecture, we solved the associated discounted dynamic programming problem using
value function iteration
The beauty of this technique is its broad applicability
With numerical problems, however, we can often attain higher efficiency in specific applications by deriving methods that are carefully tailored to the application at hand
The stochastic optimal growth model has plenty of structure to exploit for this purpose, especially when we adopt some concavity and smoothness assumptions over primitives
We’ll use this structure to obtain an Euler equation based method that’s more efficient
than value function iteration for this and some other closely related applications
In a subsequent lecture, we'll see that the numerical implementation part of the Euler equation method can be further adjusted to obtain even more efficiency
Let’s start with some imports
Let’s take the model set out in the stochastic growth model lecture and add the assumptions
that
• 𝜎∗ is the unique optimal policy for the stochastic optimal growth model
• the optimal policy is continuous, strictly increasing and also interior, in the sense that
0 < 𝜎∗ (𝑦) < 𝑦 for all strictly positive 𝑦, and
• the value function is strictly concave and continuously differentiable, with

$$
v'(y) = u'(\sigma^*(y)) =: (u' \circ \sigma^*)(y) \tag{2}
$$
The last result is called the envelope condition due to its relationship with the envelope
theorem
To see why Eq. (2) might be valid, write the Bellman equation in the equivalent form

$$
v(y) = \max_{0 \le k \le y} \left\{ u(y - k) + \beta \int v(f(k) z) \, \phi(dz) \right\}
$$

Differentiating with respect to 𝑦, and then evaluating at the optimum, yields Eq. (2)

The first-order condition for the maximization problem is

$$
u'(y - k) = \beta \int v'(f(k) z) \, f'(k) \, z \, \phi(dz) \tag{3}
$$
Combining Eq. (2) and the first-order condition Eq. (3) gives the famous Euler equation

$$
(u' \circ \sigma^*)(y) = \beta \int (u' \circ \sigma^*)(f(y - \sigma^*(y)) z) \, f'(y - \sigma^*(y)) \, z \, \phi(dz) \tag{4}
$$

We can restate the Euler equation as the functional equation

$$
(u' \circ \sigma)(y) = \beta \int (u' \circ \sigma)(f(y - \sigma(y)) z) \, f'(y - \sigma(y)) \, z \, \phi(dz) \tag{5}
$$

over interior consumption policies 𝜎, one solution of which is the optimal policy 𝜎∗
Our aim is to solve the functional equation Eq. (5) and hence obtain 𝜎∗
Just as we introduced the Bellman operator to solve the Bellman equation, we will now intro-
duce an operator over policies to help us solve the Euler equation
This operator 𝐾 will act on the set of all 𝜎 ∈ Σ that are continuous, strictly increasing and
interior (i.e., 0 < 𝜎(𝑦) < 𝑦 for all strictly positive 𝑦)
Henceforth we denote this set of policies by 𝒫
We call this operator the Coleman-Reffett operator to acknowledge the work of [28] and [107]
In essence, 𝐾𝜎 is the consumption policy that the Euler equation tells you to choose today
when your future consumption policy is 𝜎
The important thing to note about 𝐾 is that, by construction, its fixed points coincide with
solutions to the functional equation Eq. (5)
In particular, the optimal policy 𝜎∗ is a fixed point
Indeed, for fixed 𝑦, the value 𝐾𝜎∗(𝑦) is the 𝑐 that solves

$$
u'(c) = \beta \int (u' \circ \sigma^*)(f(y - c) z) \, f'(y - c) \, z \, \phi(dz)
$$
Sketching these curves and using the information above will convince you that they cross ex-
actly once as 𝑐 ranges over (0, 𝑦)
With a bit more analysis, one can show in addition that 𝐾𝜎 ∈ 𝒫 whenever 𝜎 ∈ 𝒫
How does Euler equation time iteration compare with value function iteration?
Both can be used to compute the optimal policy, but is one faster or more accurate?
There are two parts to this story
First, on a theoretical level, the two methods are essentially isomorphic
In particular, they converge at the same rate
We’ll prove this in just a moment
The other side of the story is the accuracy of the numerical implementation
It turns out that, once we actually implement these two routines, time iteration is more accurate than value function iteration
More on this below
$$
\tau \circ g = h \circ \tau
$$

$$
g = \tau^{-1} \circ h \circ \tau \tag{8}
$$
Here’s a similar figure that traces out the action of the maps on a point 𝑥 ∈ 𝑋
In fact, if you like proofs by induction, you won’t have trouble showing that
$$
g^n = \tau^{-1} \circ h^n \circ \tau
$$
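The conjugacy is easy to confirm numerically. A quick illustration, using hypothetical maps (the square root on $Y = (0, \infty)$ and the bijection $\tau(x) = e^x$, so that the conjugate map works out to $g(x) = x/2$):

```python
import numpy as np

# Hypothetical example maps: h acts on Y = (0, ∞), τ is a bijection from X = R to Y
h = lambda y: np.sqrt(y)
tau = lambda x: np.exp(x)
tau_inv = lambda y: np.log(y)
g = lambda x: tau_inv(h(tau(x)))   # the conjugate map on X; here g(x) = x / 2

def iterate(func, x, n):
    "Apply func to x a total of n times."
    for _ in range(n):
        x = func(x)
    return x

x, n = 1.7, 5
lhs = iterate(g, x, n)                # gⁿ(x)
rhs = tau_inv(iterate(h, tau(x), n))  # (τ⁻¹ ∘ hⁿ ∘ τ)(x)
print(np.isclose(lhs, rhs))  # True
```

The two iterates agree to machine precision, which is the content of the identity above.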
$$
M \circ T = K \circ M \tag{9}
$$

$$
T^n = M^{-1} \circ K^n \circ M
$$
39.5 Implementation
We’ve just shown that the operators 𝑇 and 𝐾 have the same rate of convergence
However, it turns out that, once numerical approximation is taken into account, significant
differences arise
In particular, the image of policy functions under 𝐾 can be calculated faster and with greater
accuracy than the image of value functions under 𝑇
Our intuition for this result is that
• the Coleman-Reffett operator exploits more information because it uses first order and
envelope conditions
• policy functions generally have less curvature than value functions, and hence admit
more accurate approximations based on grid point information
class OptimalGrowthModel:

    def __init__(self,
                 f,                # Production function
                 f_prime,          # f'(k)
                 u,                # Utility function
                 u_prime,          # Marginal utility
                 β=0.96,           # Discount factor
                 μ=0,              # Shock location parameter
                 s=0.1,            # Shock scale parameter
                 grid_max=4,
                 grid_size=200,
                 shock_size=250):

        self.β, self.μ, self.s = β, μ, s
        self.f, self.u = f, u
        self.f_prime, self.u_prime = f_prime, u_prime
        # Set up grid and draw shocks for Monte Carlo integration
        self.grid = np.linspace(1e-4, grid_max, grid_size)
        self.shocks = np.exp(μ + s * np.random.randn(shock_size))
@njit
def objective(c, σ, y):
    """
    The right-hand side of the Euler equation, minus u'(c), as a function of c
    """
    # First turn σ into a function via interpolation
    σ_func = lambda x: interp(grid, σ, x)
    vals = u_prime(σ_func(f(y - c) * shocks)) * f_prime(y - c) * shocks
    return u_prime(c) - β * np.mean(vals)
@njit(parallel=parallel_flag)
def K(σ):
"""
The Coleman-Reffett operator
"""
σ_new = np.empty_like(σ)
for i in prange(len(grid)):
y = grid[i]
# Solve for optimal c at y
c_star = brentq(objective, 1e-10, y-1e-10, args=(σ, y))[0]
σ_new[i] = c_star
return σ_new
return K
It has some similarities to the code for the Bellman operator in our optimal growth lecture
For example, it evaluates integrals by Monte Carlo and approximates functions using linear
interpolation
Here’s that Bellman operator code again, which needs to be executed because we’ll use it in
some tests below
@njit
def objective(c, v, y):
"""
The right-hand side of the Bellman equation
"""
# First turn v into a function via interpolation
v_func = lambda x: interp(grid, v, x)
return u(c) + β * np.mean(v_func(f(y - c) * shocks))
@njit(parallel=parallel_flag)
def T(v):
"""
The Bellman operator
"""
v_new = np.empty_like(v)
for i in prange(len(grid)):
y = grid[i]
# Solve for optimal v at y
v_max = brent_max(objective, 1e-10, y, args=(v, y))[1]
v_new[i] = v_max
return v_new
@njit
def get_greedy(v):
"""
Computes the v-greedy policy of a given function v
"""
σ = np.empty_like(v)
for i in range(len(grid)):
y = grid[i]
# Solve for optimal c at y
c_max = brent_max(objective, 1e-10, y, args=(v, y))[0]
σ[i] = c_max
return σ
return T, get_greedy
As we did for value function iteration, let’s start by testing our method in the presence of a
model that does have an analytical solution
First, we generate an instance of OptimalGrowthModel and return the corresponding
Coleman-Reffett operator
In [6]: α = 0.3
@njit
def f(k):
"Deterministic part of production function"
return k**α
@njit
def f_prime(k):
return α * k**(α - 1)
og = OptimalGrowthModel(f=f, f_prime=f_prime,
u=np.log, u_prime=njit(lambda x: 1/x))
K = time_operator_factory(og)
In [7]: @njit
def σ_star(y, α, β):
"True optimal policy"
    return (1 - α * β) * y

grid, β = og.grid, og.β
σ_star_new = K(σ_star(grid, α, β))  # One application of K to the true policy

fig, ax = plt.subplots()
ax.plot(grid, σ_star(grid, α, β), label="optimal policy $\sigma^*$")
ax.plot(grid, σ_star_new, label="$K\sigma^*$")
ax.legend()
plt.show()
We can’t really distinguish the two plots, so we are looking good, at least for this test
Next, let’s try iterating from an arbitrary initial condition and see if we converge towards 𝜎∗
The initial condition we’ll use is the one that eats the whole pie: 𝜎(𝑦) = 𝑦
In [8]: n = 15
σ = grid.copy() # Set initial condition
fig, ax = plt.subplots(figsize=(9, 6))
lb = 'initial condition $\sigma(y) = y$'
ax.plot(grid, σ, color=plt.cm.jet(0), alpha=0.6, label=lb)
for i in range(n):
σ = K(σ)
ax.plot(grid, σ, color=plt.cm.jet(i / n), alpha=0.6)
plt.show()
We see that the policy has converged nicely, in only a few steps
Now let’s compare the accuracy of iteration between the operators
We’ll generate
1. 𝐾 𝑛 𝜎 where 𝜎(𝑦) = 𝑦
2. (𝑀 ∘ 𝑇 𝑛 ∘ 𝑀 −1 )𝜎 where 𝜎(𝑦) = 𝑦
for i in range(sim_length):
σ = K(σ) # Time iteration
v = T(v) # Value function iteration
As you can see, time iteration is much more accurate for a given number of iterations
39.6 Exercises
39.6.1 Exercise 1
39.6.2 Exercise 2
39.6.3 Exercise 3
Consider the same model as above but with the CRRA utility function

$$
u(c) = \frac{c^{1-\gamma} - 1}{1 - \gamma}
$$
Iterate 20 times with Bellman iteration and Euler equation time iteration
Compare the resulting policies and check that they are close
39.6.4 Exercise 4
Solve the above model as we did in the previous lecture using the operators 𝑇 and 𝐾, and check the solutions are similar by plotting
39.7 Solutions
39.7.1 Exercise 1
39.7.2 Exercise 2
Let $v(y) := \int_0^y u'(\sigma(x)) \, dx$ with $v(0) = 0$
With a small amount of effort, you will be able to show that 𝑣 ∈ 𝒱 and 𝑀 𝑣 = 𝜎
It’s also true that 𝑀 is one-to-one on 𝒱
To see this, suppose that 𝑣 and 𝑤 are elements of 𝒱 satisfying 𝑀 𝑣 = 𝑀 𝑤
Then 𝑣(0) = 𝑤(0) = 0 and 𝑣′ = 𝑤′ on (0, ∞)
The fundamental theorem of calculus then implies that 𝑣 = 𝑤 on ℝ₊
39.7.3 Exercise 3
Here’s the code, which will execute if you’ve run all the code above
@njit
def u(c):
return (c**(1 - γ) - 1) / (1 - γ)
@njit
def u_prime(c):
return c**(-γ)
T, get_greedy = operator_factory(og)
K = time_operator_factory(og)
for i in range(sim_length):
σ = K(σ) # Time iteration
v = T(v) # Value function iteration
39.7.4 Exercise 4
Here’s is the function we need to solve the model using value function iteration, copied from
the previous lecture
T, _ = operator_factory(og, parallel_flag=use_parallel)
# Set up loop
v = np.log(og.grid) # Initial condition
i = 0
error = tol + 1
while i < max_iter and error > tol:
    v_new = T(v)
    error = np.max(np.abs(v - v_new))
    i += 1
    v = v_new
if i == max_iter:
    print("Failed to converge!")
return v_new
K = time_operator_factory(og, parallel_flag=use_parallel)
# Set up loop
σ = og.grid # Initial condition
i = 0
error = tol + 1
while i < max_iter and error > tol:
    σ_new = K(σ)
    error = np.max(np.abs(σ - σ_new))
    i += 1
    σ = σ_new
if i == max_iter:
    print("Failed to converge!")
return σ_new
Converged in 10 iterations.
Time iteration is numerically far more accurate for a given number of iterations
40 Optimal Growth III: The Endogenous Grid Method
40.1 Contents
• Overview 40.2
• Implementation 40.4
• Speed 40.5
In addition to what’s in Anaconda, this lecture will need the following libraries
40.2 Overview
Let’s start by reminding ourselves of the theory and then see how the numerics fit in
40.3.1 Theory
Take the model set out in the time iteration lecture, following the same terminology and no-
tation
The Euler equation is
The method discussed above requires a root-finding routine to find the 𝑐𝑖 corresponding to a
given income value 𝑦𝑖
Root-finding is costly because it typically involves a significant number of function evaluations
As pointed out by Carroll [23], we can avoid this if 𝑦𝑖 is chosen endogenously
The only assumption required is that 𝑢′ is invertible on (0, ∞)
The idea is this:
First, we fix an exogenous grid {𝑘𝑖 } for capital (𝑘 = 𝑦 − 𝑐)
Then we obtain 𝑐𝑖 via

$$
c_i = (u')^{-1} \left\{ \beta \int (u' \circ \sigma)(f(k_i) z) \, f'(k_i) \, z \, \phi(dz) \right\}
$$

where $(u')^{-1}$ is the inverse function of $u'$ and 𝜎 is the current guess of the policy

Finally, for each 𝑐𝑖 we set 𝑦𝑖 = 𝑐𝑖 + 𝑘𝑖 to obtain the endogenous grid of income points
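The key payoff is that the inversion of $u'$ replaces the root-finding step entirely. Here is a minimal sketch of one such update, assuming log utility (so $u'(c) = 1/c$ and $(u')^{-1}(m) = 1/m$) and Cobb-Douglas production $f(k) = k^\alpha$; the grid sizes and parameter values are illustrative:

```python
import numpy as np

# One EGM update under assumed log utility and Cobb-Douglas production
α, β, μ, s = 0.3, 0.96, 0, 0.1
k_grid = np.linspace(1e-2, 4, 200)             # exogenous grid for k = y - c
shocks = np.exp(μ + s * np.random.randn(250))  # Monte Carlo draws of the shock

def egm_step(σ_func):
    "One Coleman-Reffett update on the endogenous grid -- no root-finding."
    c = np.empty_like(k_grid)
    for i, k in enumerate(k_grid):
        # RHS of the Euler equation at k_i, integrating over the shocks
        rhs = β * np.mean((1 / σ_func(k**α * shocks)) * α * k**(α - 1) * shocks)
        c[i] = 1 / rhs             # invert u' analytically
    y = k_grid + c                 # endogenous income grid, y_i = k_i + c_i
    return y, c

y, c = egm_step(lambda x: x)       # start from the guess σ(y) = y
print(np.allclose(c, k_grid / (α * β)))  # shocks cancel for this initial guess
```

With the initial guess $\sigma(y) = y$ the shock cancels inside the expectation, so the update can be verified in closed form, as the final line checks.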
40.4 Implementation
Let’s implement this version of the Coleman-Reffett operator and see how it performs
First, we will construct a class OptimalGrowthModel to hold the parameters of the model
"""
The class holds parameters and true value and policy functions.
"""
    def __init__(self,
                 f,            # Production function
                 f_prime,      # f'(k)
                 u,            # Utility function
                 u_prime,      # Marginal utility
                 u_prime_inv,  # Inverse marginal utility
                 β=0.96,       # Discount factor
                 μ=0,
                 s=0.1,
                 grid_max=4,
                 grid_size=200,
                 shock_size=250):

        self.β, self.μ, self.s = β, μ, s
        self.f, self.u = f, u
        self.f_prime, self.u_prime = f_prime, u_prime
        self.u_prime_inv = u_prime_inv
        self.grid = np.linspace(1e-4, grid_max, grid_size)
        self.shocks = np.exp(μ + s * np.random.randn(shock_size))
def K(σ):
    """
    The Coleman-Reffett operator using EGM

    * σ is a function
    """
    # Allocate memory for value of consumption on endogenous grid points
    c = np.empty_like(grid)
    # Solve for updated consumption values at each exogenous grid point
    for i, k in enumerate(grid):
        vals = u_prime(σ(f(k) * shocks)) * f_prime(k) * shocks
        c[i] = u_prime_inv(β * np.mean(vals))
    y = grid + c                       # endogenous grid: y_i = k_i + c_i
    σ_new = lambda x: interp(y, c, x)  # update policy via interpolation
    return σ_new
return K
@njit
def objective(c, σ, y):
"""
The right hand side of the operator
"""
# First turn σ into a function via interpolation
σ_func = lambda x: interp(grid, σ, x)
vals = u_prime(σ_func(f(y - c) * shocks)) * f_prime(y - c) * shocks
return u_prime(c) - β * np.mean(vals)
@njit(parallel=parallel_flag)
def K(σ):
"""
The Coleman-Reffett operator
"""
σ_new = np.empty_like(σ)
for i in prange(len(grid)):
y = grid[i]
# Solve for optimal c at y
c_star = brentq(objective, 1e-10, y-1e-10, args=(σ, y))[0]
σ_new[i] = c_star
return σ_new
return K
As we did for value function iteration and time iteration, let’s start by testing our method
with the log-linear benchmark
First, we generate an instance
@njit
def f(k):
"""
Cobb-Douglas production function
"""
return k**α
@njit
def f_prime(k):
"""
First derivative of the production function
"""
return α * k**(α - 1)
@njit
def u_prime(c):
return 1 / c
og = OptimalGrowthModel(f=f,
f_prime=f_prime,
u=np.log,
u_prime=u_prime,
u_prime_inv=u_prime)
def c_star(y):
"True optimal policy"
return (1 - α * β) * y
ax.legend()
plt.show()
Out[8]: 9.881666666666672e-06
Next, let’s try iterating from an arbitrary initial condition and see if we converge towards 𝜎∗
Let’s start from the consumption policy that eats the whole pie: 𝜎(𝑦) = 𝑦
In [9]: σ = lambda x: x
n = 15
fig, ax = plt.subplots(figsize=(9, 6))
for i in range(n):
σ = K(σ) # Update policy
ax.plot(grid, σ(grid), color=plt.cm.jet(i / n), alpha=0.6)
ax.legend()
plt.show()
We see that the policy has converged nicely, in only a few steps
40.5 Speed
Now let’s compare the clock times per iteration for the standard Coleman-Reffett operator
(with exogenous grid) and the EGM version
We’ll do so using the CRRA model adopted in the exercises of the Euler equation time itera-
tion lecture
@njit
def u(c):
return (c**(1 - γ) - 1) / (1 - γ)
@njit
def u_prime(c):
return c**(-γ)
@njit
def u_prime_inv(c):
return c**(-1 / γ)
og = OptimalGrowthModel(f=f,
f_prime=f_prime,
u=u,
u_prime=u_prime,
u_prime_inv=u_prime_inv)
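The invertibility of $u'$, which the EGM relies on, is easy to sanity-check for this CRRA specification:

```python
import numpy as np

# Check that u_prime_inv really inverts u_prime for CRRA utility
γ = 1.5
u_prime = lambda c: c**(-γ)
u_prime_inv = lambda m: m**(-1 / γ)

c = np.linspace(0.1, 5, 50)
print(np.allclose(u_prime_inv(u_prime(c)), c))  # True
```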
In [11]: sim_length = 20
Out[11]: 0.3692951202392578
We see that the EGM version is significantly faster, even without jit compilation!
The absence of numerical root-finding means that it is typically more accurate at each step as
well
41 LQ Dynamic Programming Problems
41.1 Contents
• Overview 41.2
• Introduction 41.3
• Implementation 41.5
• Exercises 41.8
• Solutions 41.9
In addition to what’s in Anaconda, this lecture will need the following libraries
41.2 Overview
Linear quadratic (LQ) control refers to a class of dynamic optimization problems that have
found applications in almost every scientific field
This lecture provides an introduction to LQ control and its economic applications
As we will see, LQ systems have a simple structure that makes them an excellent workhorse
for a wide variety of economic problems
Moreover, while the linear-quadratic structure is restrictive, it is in fact far more flexible than
it may appear initially
These themes appear repeatedly below
Mathematically, LQ control problems are closely related to the Kalman filter
• matrix manipulations
• vectors of random variables
• dynamic programming and the Bellman equation (see for example this lecture and this
lecture)
• [87], chapter 5
• [52], chapter 4
• [65], section 3.5
In order to focus on computation, we leave longer proofs to these sources (while trying to provide as much intuition as possible)
41.3 Introduction
The “linear” part of LQ is a linear law of motion for the state, while the “quadratic” part
refers to preferences
Let’s begin with the former, move on to the latter, and then put them together into an opti-
mization problem
41.3.1 The Law of Motion

Let the state vector follow

$$
x_{t+1} = A x_t + B u_t + C w_{t+1}, \qquad t = 0, 1, 2, \ldots \tag{1}
$$

Here
• 𝑥𝑡 is 𝑛 × 1, 𝐴 is 𝑛 × 𝑛
• 𝑢𝑡 is 𝑘 × 1, 𝐵 is 𝑛 × 𝑘
• 𝑤𝑡 is 𝑗 × 1, 𝐶 is 𝑛 × 𝑗
Example 1
Consider a household budget constraint given by
𝑎𝑡+1 + 𝑐𝑡 = (1 + 𝑟)𝑎𝑡 + 𝑦𝑡
Here 𝑎𝑡 is assets, 𝑟 is a fixed interest rate, 𝑐𝑡 is current consumption, and 𝑦𝑡 is current non-
financial income
If we suppose that {𝑦𝑡} is serially uncorrelated and 𝑁(0, 𝜎²), then, taking {𝑤𝑡} to be standard normal, we can write the system as

$$
a_{t+1} = (1 + r) a_t - c_t + \sigma w_{t+1}
$$
This is clearly a special case of Eq. (1), with assets being the state and consumption being
the control
Example 2
One unrealistic feature of the previous model is that non-financial income has a zero mean
and is often negative
This can easily be overcome by adding a sufficiently large mean
Hence in this example, we take 𝑦𝑡 = 𝜎𝑤𝑡+1 + 𝜇 for some positive real number 𝜇
Another alteration that’s useful to introduce (we’ll see why soon) is to change the control
variable from consumption to the deviation of consumption from some “ideal” quantity 𝑐 ̄
(Most parameterizations will be such that 𝑐 ̄ is large relative to the amount of consumption
that is attainable in each period, and hence the household wants to increase consumption)
For this reason, we now take our control to be 𝑢𝑡 ∶= 𝑐𝑡 − 𝑐 ̄
In terms of these variables, the budget constraint 𝑎𝑡+1 = (1 + 𝑟)𝑎𝑡 − 𝑐𝑡 + 𝑦𝑡 becomes

$$
a_{t+1} = (1 + r) a_t - u_t - \bar c + \sigma w_{t+1} + \mu \tag{2}
$$
How can we write this new system in the form of equation Eq. (1)?
If, as in the previous example, we take 𝑎𝑡 as the state, then we run into a problem: the law of
motion contains some constant terms on the right-hand side
This means that we are dealing with an affine function, not a linear one (recall this discussion)
Fortunately, we can easily circumvent this problem by adding an extra state variable
In particular, if we write
$$
\begin{pmatrix} a_{t+1} \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 + r & -\bar c + \mu \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} a_t \\ 1 \end{pmatrix}
+ \begin{pmatrix} -1 \\ 0 \end{pmatrix} u_t
+ \begin{pmatrix} \sigma \\ 0 \end{pmatrix} w_{t+1} \tag{3}
$$
Moreover, the model is now linear and can be written in the form of Eq. (1) by setting
$$
x_t := \begin{pmatrix} a_t \\ 1 \end{pmatrix}, \quad
A := \begin{pmatrix} 1 + r & -\bar c + \mu \\ 0 & 1 \end{pmatrix}, \quad
B := \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \quad
C := \begin{pmatrix} \sigma \\ 0 \end{pmatrix} \tag{4}
$$
41.3.2 Preferences
In the LQ model, the aim is to minimize a flow of losses, where the time-𝑡 loss is given by the quadratic expression

$$
x_t' R x_t + u_t' Q u_t \tag{5}
$$
Here

• 𝑅 is assumed to be 𝑛 × 𝑛, symmetric and nonnegative definite
• 𝑄 is assumed to be 𝑘 × 𝑘, symmetric and positive definite
Note
In fact, for many economic problems, the definiteness conditions on 𝑅 and 𝑄 can
be relaxed. It is sufficient that certain submatrices of 𝑅 and 𝑄 be nonnegative
definite. See [52] for details
Example 1
A very simple example that satisfies these assumptions is to take 𝑅 and 𝑄 to be identity matrices so that current loss is

$$
x_t' x_t + u_t' u_t = \|x_t\|^2 + \|u_t\|^2
$$
Thus, for both the state and the control, loss is measured as squared distance from the origin
(In fact, the general case Eq. (5) can also be understood in this way, but with 𝑅 and 𝑄 identifying other – non-Euclidean – notions of "distance" from the zero vector)
Intuitively, we can often think of the state 𝑥𝑡 as representing deviation from a target, such as
The aim is to put the state close to the target, while using controls parsimoniously
Example 2
In the household problem studied above, setting 𝑅 = 0 and 𝑄 = 1 yields preferences

$$
x_t' R x_t + u_t' Q u_t = u_t^2 = (c_t - \bar c)^2
$$
Under this specification, the household’s current loss is the squared deviation of consumption
from the ideal level 𝑐 ̄
41.4 Optimality – Finite Horizon
Let’s now be precise about the optimization problem we wish to consider, and look at how to
solve it
We will begin with the finite horizon case, with terminal time 𝑇 ∈ N
In this case, the aim is to choose a sequence of controls {𝑢0, …, 𝑢𝑇−1} to minimize the objective

$$
\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (x_t' R x_t + u_t' Q u_t) + \beta^T x_T' R_f x_T \right\} \tag{6}
$$
41.4.2 Information
There’s one constraint we’ve neglected to mention so far, which is that the decision-maker
who solves this LQ problem knows only the present and the past, not the future
To clarify this point, consider the sequence of controls {𝑢0 , … , 𝑢𝑇 −1 }
When choosing these controls, the decision-maker is permitted to take into account the effects
of the shocks {𝑤1 , … , 𝑤𝑇 } on the system
However, it is typically assumed — and will be assumed here — that the time-𝑡 control 𝑢𝑡
can be made with knowledge of past and present shocks only
The fancy measure-theoretic way of saying this is that 𝑢𝑡 must be measurable with respect to
the 𝜎-algebra generated by 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡
This is in fact equivalent to stating that 𝑢𝑡 can be written in the form 𝑢𝑡 = 𝑔𝑡(𝑥0, 𝑤1, 𝑤2, …, 𝑤𝑡) for some Borel measurable function 𝑔𝑡
(Just about every function that's useful for applications is Borel measurable, so, for the purposes of intuition, you can read that last phrase as "for some function 𝑔𝑡")
Now note that 𝑥𝑡 will ultimately depend on the realizations of 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡
In fact, it turns out that 𝑥𝑡 summarizes all the information about these historical shocks that
the decision-maker needs to set controls optimally
More precisely, it can be shown that any optimal control 𝑢𝑡 can always be written as a function of the current state alone
Hence in what follows we restrict attention to control policies (i.e., functions) of the form
𝑢𝑡 = 𝑔𝑡 (𝑥𝑡 )
Actually, the preceding discussion applies to all standard dynamic programming problems
What’s special about the LQ case is that – as we shall soon see — the optimal 𝑢𝑡 turns out
to be a linear function of 𝑥𝑡
41.4.3 Solution
To solve the finite horizon LQ problem we can use a dynamic programming strategy based on
backward induction that is conceptually similar to the approach adopted in this lecture
For reasons that will soon become clear, we first introduce the notation 𝐽𝑇 (𝑥) = 𝑥′ 𝑅𝑓 𝑥
Now consider the problem of the decision-maker in the second to last period
In particular, let the time be 𝑇 − 1, and suppose that the state is 𝑥𝑇 −1
The decision-maker must trade off current and (discounted) final losses, and hence solves

$$
\min_u \left\{ x' R x + u' Q u + \beta \, \mathbb{E} \, J_T(A x + B u + C w_T) \right\} \tag{7}
$$
The function 𝐽𝑇 −1 will be called the 𝑇 − 1 value function, and 𝐽𝑇 −1 (𝑥) can be thought of as
representing total “loss-to-go” from state 𝑥 at time 𝑇 − 1 when the decision-maker behaves
optimally
Now let’s step back to 𝑇 − 2
For a decision-maker at 𝑇 − 2, the value 𝐽𝑇 −1 (𝑥) plays a role analogous to that played by the
terminal loss 𝐽𝑇 (𝑥) = 𝑥′ 𝑅𝑓 𝑥 for the decision-maker at 𝑇 − 1
That is, 𝐽𝑇 −1 (𝑥) summarizes the future loss associated with moving to state 𝑥
The decision-maker chooses her control 𝑢 to trade off current loss against future loss, where
• the next period state is 𝑥𝑇 −1 = 𝐴𝑥𝑇 −2 + 𝐵𝑢 + 𝐶𝑤𝑇 −1 , and hence depends on the choice
of current control
• the “cost” of landing in state 𝑥𝑇 −1 is 𝐽𝑇 −1 (𝑥𝑇 −1 )
Letting
The first equality is the Bellman equation from dynamic programming theory specialized to
the finite horizon LQ problem
Now that we have {𝐽0 , … , 𝐽𝑇 }, we can obtain the optimal controls
As a first step, let’s find out what the value functions look like
It turns out that every 𝐽𝑡 has the form 𝐽𝑡(𝑥) = 𝑥′𝑃𝑡𝑥 + 𝑑𝑡 where 𝑃𝑡 is an 𝑛 × 𝑛 matrix and 𝑑𝑡 is a constant
We can show this by induction, starting from 𝑃𝑇 ∶= 𝑅𝑓 and 𝑑𝑇 = 0
Using this notation, Eq. (7) becomes
To obtain the minimizer, we can take the derivative of the r.h.s. with respect to 𝑢 and set it
equal to zero
Applying the relevant rules of matrix calculus, this gives

$$
u = -(Q + \beta B' P_T B)^{-1} \beta B' P_T A x \tag{9}
$$

Plugging this expression for the minimizer back into the objective confirms that
$$
J_{T-1}(x) = x' P_{T-1} x + d_{T-1}
$$

where

$$
P_{T-1} := R - \beta^2 A' P_T B (Q + \beta B' P_T B)^{-1} B' P_T A + \beta A' P_T A \tag{10}
$$

and

$$
d_{T-1} := \beta \, \mathrm{trace}(C' P_T C) \tag{11}
$$
Continuing backward in this manner for 𝑡 = 𝑇 − 1, …, 1 yields the recursions

$$
P_{t-1} = R - \beta^2 A' P_t B (Q + \beta B' P_t B)^{-1} B' P_t A + \beta A' P_t A \tag{12}
$$

and

$$
d_{t-1} = \beta \left( d_t + \mathrm{trace}(C' P_t C) \right) \tag{13}
$$

Recalling Eq. (9), the minimizers from these backward steps are

$$
u_t = -F_t x_t \quad \text{where} \quad F_t := (Q + \beta B' P_{t+1} B)^{-1} \beta B' P_{t+1} A \tag{14}
$$
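The backward induction just described can be sketched in a few lines of NumPy, using the standard finite-horizon updates for $F_t$, $d_t$ and $P_t$ from Eqs. (12)-(14); the scalar matrices in the final call are purely illustrative:

```python
import numpy as np

# Backward induction: start from P_T = R_f, d_T = 0 and step back T times
def backward_induction(A, B, C, Q, R, Rf, β, T):
    "Return P_0, d_0 and the policy matrices F_0, ..., F_{T-1}."
    P, d = Rf, 0.0
    Fs = []
    for _ in range(T):
        S = Q + β * B.T @ P @ B
        F = np.linalg.solve(S, β * B.T @ P @ A)        # Eq. (14)
        d = β * (d + np.trace(C.T @ P @ C))            # Eq. (13)
        P = R + β * A.T @ P @ A - β * A.T @ P @ B @ F  # Eq. (12)
        Fs.append(F)
    return P, d, Fs[::-1]

# Trivial scalar example: A = B = Q = R = R_f = 1, no shocks, β = 1, one step
I = np.eye(1)
P0, d0, Fs = backward_induction(A=I, B=I, C=0 * I, Q=I, R=I, Rf=I, β=1.0, T=1)
print(P0[0, 0], Fs[0][0, 0])  # 1.5 0.5
```

For the scalar example the updates can be checked by hand: $S = 2$, $F = 1/2$ and $P_{T-1} = 1 + 1 - 1/2 = 3/2$.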
41.5 Implementation
We will use code from lqcontrol.py in QuantEcon.py to solve finite and infinite horizon linear
quadratic control problems
In the module, the various updating, simulation and fixed point methods are wrapped in a
class called LQ, which includes
• Instance data:
• Methods:
– update_values — shifts 𝑑𝑡 , 𝑃𝑡 , 𝐹𝑡 to their 𝑡 − 1 values via Eq. (12), Eq. (13) and
Eq. (14)
– stationary_values — computes 𝑃 , 𝑑, 𝐹 in the infinite horizon case
– compute_sequence — simulates the dynamics of 𝑥𝑡, 𝑢𝑡, 𝑤𝑡 given 𝑥0 and assuming standard normal shocks
41.5.1 An Application
Early Keynesian models assumed that households have a constant marginal propensity to
consume from current income
Data contradicted the constancy of the marginal propensity to consume
In response, Milton Friedman, Franco Modigliani and others built models based on a consumer's preference for an intertemporally smooth consumption stream
$$
\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (c_t - \bar c)^2 + \beta^T q a_T^2 \right\} \tag{16}
$$
$$
Q := 1, \quad
R := \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad \text{and} \quad
R_f := \begin{pmatrix} q & 0 \\ 0 & 0 \end{pmatrix}
$$
Now that the problem is expressed in LQ form, we can proceed to the solution by applying
Eq. (12) and Eq. (14)
After generating shocks 𝑤1, …, 𝑤𝑇, the dynamics for assets and consumption can be simulated via Eq. (15)
The following figure was computed using 𝑟 = 0.05, 𝛽 = 1/(1 + 𝑟), 𝑐̄ = 2, 𝜇 = 1, 𝜎 = 0.25, 𝑇 = 45 and 𝑞 = 10⁶
The shocks {𝑤𝑡 } were taken to be IID and standard normal
# == Model parameters == #
r = 0.05
β = 1/(1 + r)
T = 45
c_bar = 2
σ = 0.25
μ = 1
q = 1e6
# == Formulate as an LQ problem, using the matrices in Eq. (4) == #
Q = 1
R = np.zeros((2, 2))
Rf = np.zeros((2, 2))
Rf[0, 0] = q
A = [[1 + r, -c_bar + μ],
     [0, 1]]
B = [[-1],
     [0]]
C = [[σ],
     [0]]
# == Plot results == #
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5)
for ax in axes:
ax.grid()
ax.set_xlabel('Time')
ax.legend(ncol=2, **legend_args)
plt.show()
The top panel shows the time path of consumption 𝑐𝑡 and income 𝑦𝑡 in the simulation
As anticipated by the discussion on consumption smoothing, the time path of consumption is
much smoother than that for income
(But note that consumption becomes more irregular towards the end of life, when the zero
final asset requirement impinges more on consumption choices)
The second panel in the figure shows that the time path of assets 𝑎𝑡 is closely correlated with
cumulative unanticipated income, where the latter is defined as
$$
z_t := \sum_{j=0}^{t} \sigma w_j
$$
A key message is that unanticipated windfall gains are saved rather than consumed, while
unanticipated negative shocks are met by reducing assets
(Again, this relationship breaks down towards the end of life due to the zero final asset requirement)
# == Plot results == #
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5)
for ax in axes:
ax.grid()
ax.set_xlabel('Time')
ax.legend(ncol=2, **legend_args)
plt.show()
We now have a slowly rising consumption stream and a hump-shaped build-up of assets in the
middle periods to fund rising consumption
However, the essential features are the same: consumption is smooth relative to income, and
assets are strongly positively correlated with cumulative unanticipated income
Let’s now consider a number of standard extensions to the LQ problem treated above
For further examples and a more systematic treatment, see [53], section 2.4
In some LQ problems, preferences include a cross-product term 𝑢′𝑡 𝑁 𝑥𝑡 , so that the objective
function becomes
$$
\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (x_t' R x_t + u_t' Q u_t + 2 u_t' N x_t) + \beta^T x_T' R_f x_T \right\} \tag{17}
$$
Finally, we consider the infinite horizon case, with cross-product term, unchanged dynamics
and objective function given by
$$
\mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t (x_t' R x_t + u_t' Q u_t + 2 u_t' N x_t) \right\} \tag{20}
$$
In the infinite horizon case, optimal policies can depend on time only if time itself is a component of the state vector 𝑥𝑡
In other words, there exists a fixed matrix 𝐹 such that 𝑢𝑡 = −𝐹 𝑥𝑡 for all 𝑡
That decision rules are constant over time is intuitive — after all, the decision-maker faces
the same infinite horizon at every stage, with only the current state changing
Not surprisingly, 𝑃 and 𝑑 are also constant
The stationary matrix 𝑃 is the solution to the discrete-time algebraic Riccati equation

$$
P = R - (\beta B' P A + N)' (Q + \beta B' P B)^{-1} (\beta B' P A + N) + \beta A' P A \tag{21}
$$
Equation Eq. (21) is also called the LQ Bellman equation, and the map that sends a given 𝑃
into the right-hand side of Eq. (21) is called the LQ Bellman operator
The stationary optimal policy for this model is

$$
u_t = -F x_t \quad \text{where} \quad F := (Q + \beta B' P B)^{-1} (\beta B' P A + N) \tag{22}
$$
The sequence {𝑑𝑡} from Eq. (13) is replaced by the constant value

$$
d := \mathrm{trace}(C' P C) \, \frac{\beta}{1 - \beta} \tag{23}
$$
The state evolves according to the time-homogeneous process 𝑥𝑡+1 = (𝐴 − 𝐵𝐹 )𝑥𝑡 + 𝐶𝑤𝑡+1
An example infinite horizon problem is treated below
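To make the stationary objects concrete, here is a pure-NumPy sketch that iterates the map defined by the right-hand side of Eq. (21) (with no cross-product term, 𝑁 = 0) until convergence, then forms the stationary policy and constant via Eq. (22) and Eq. (23); the scalar parameter values are illustrative:

```python
import numpy as np

# Iterate the LQ Bellman (Riccati) operator to a fixed point, N = 0 case
β = 0.95
A, B, C = np.array([[0.9]]), np.array([[1.0]]), np.array([[0.1]])
Q, R = np.array([[1.0]]), np.array([[1.0]])

P = R.copy()
for _ in range(10_000):
    S = Q + β * B.T @ P @ B
    P_new = R + β * A.T @ P @ A \
            - (β * B.T @ P @ A).T @ np.linalg.solve(S, β * B.T @ P @ A)
    if np.max(np.abs(P_new - P)) < 1e-12:
        P = P_new
        break
    P = P_new

F = np.linalg.solve(Q + β * B.T @ P @ B, β * B.T @ P @ A)  # Eq. (22), N = 0
d = β / (1 - β) * np.trace(C.T @ P @ C)                    # Eq. (23)
print(P.shape, F.shape, d > 0)
```

This is essentially what the `stationary_values` method computes, though production implementations solve the Riccati equation by more robust means than naive iteration.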
Linear quadratic control problems of the class discussed above have the property of certainty
equivalence
By this, we mean that the optimal policy 𝐹 is not affected by the parameters in 𝐶, which
specify the shock process
This can be confirmed by inspecting Eq. (22) or Eq. (19)
It follows that we can ignore uncertainty when solving for optimal behavior, and plug it back
in when examining optimal state dynamics
$$
\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (c_t - \bar c)^2 + \beta^T q a_T^2 \right\} \tag{24}
$$
To put this into an LQ setting, consider the budget constraint, which becomes

$$
a_{t+1} = (1 + r) a_t - u_t - \bar c + m_1 t + m_2 t^2 + \sigma w_{t+1} \tag{25}
$$
The fact that 𝑎𝑡+1 is a linear function of (𝑎𝑡 , 1, 𝑡, 𝑡2 ) suggests taking these four variables as
the state vector 𝑥𝑡
Once a good choice of state and control (recall 𝑢𝑡 = 𝑐𝑡 − 𝑐)̄ has been made, the remaining
specifications fall into place relatively easily
Thus, for the dynamics we set
$$
x_t := \begin{pmatrix} a_t \\ 1 \\ t \\ t^2 \end{pmatrix}, \quad
A := \begin{pmatrix} 1 + r & -\bar c & m_1 & m_2 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 2 & 1 \end{pmatrix}, \quad
B := \begin{pmatrix} -1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad
C := \begin{pmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{26}
$$
If you expand the expression 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡 + 𝐶𝑤𝑡+1 using this specification, you will find that assets follow Eq. (25) as desired and that the other state variables also update appropriately
To implement preference specification Eq. (24) we take
$$
Q := 1, \quad
R := \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\quad \text{and} \quad
R_f := \begin{pmatrix} q & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{27}
$$
The next figure shows a simulation of consumption and assets computed using the compute_sequence method
41.7 Further Applications

41.7.1 Application 1: Age-Dependent Income Process

In the previous application, we generated income dynamics with an inverted U shape using polynomials and placed them in an LQ framework
It is arguably the case that this income process still contains unrealistic features
A more common earning profile is where
1. income grows over working life, fluctuating around an increasing trend, with growth
flattening off in later years
2. retirement follows, with lower but relatively stable (non-financial) income
$$
y_t =
\begin{cases}
p(t) + \sigma w_{t+1} & \text{if } t \le K \\
s & \text{otherwise}
\end{cases} \tag{28}
$$
Here
• $p(t) := m_1 t + m_2 t^2$ with the coefficients $m_1, m_2$ chosen such that $p(K) = \mu$ and $p(0) = p(2K) = 0$
• 𝑠 is retirement income
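The two conditions on $p$ pin down the coefficients: a little algebra gives $m_1 = 2\mu / K$ and $m_2 = -\mu / K^2$. A quick check, with illustrative values of $K$ and $\mu$:

```python
# Coefficients m1, m2 such that p(t) = m1·t + m2·t² satisfies
# p(K) = μ and p(0) = p(2K) = 0 (K and μ below are illustrative)
K, μ = 40, 1.0
m1 = 2 * μ / K
m2 = -μ / K**2

p = lambda t: m1 * t + m2 * t**2
print(p(0), p(K), p(2 * K))  # 0.0 1.0 0.0
```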
1. solve lq_retired by the usual backward induction procedure, iterating back to the
start of retirement
2. take the start-of-retirement value function generated by this process, and use it as the
terminal condition 𝑅𝑓 to feed into the lq_working specification
3. solve lq_working by backward induction from this choice of 𝑅𝑓 , iterating back to the
start of working life
This process gives the entire life-time sequence of value functions and optimal policies
The full set of parameters used in the simulation is discussed in Exercise 2, where you are
asked to replicate the figure
Once again, the dominant feature observable in the simulation is consumption smoothing
The asset path fits well with standard life cycle theory, with dissaving early in life followed by
later saving
Assets peak at retirement and subsequently decline
𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡 + 𝑑𝑡
$$
\mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t \pi_t \right\}
\quad \text{where} \quad
\pi_t := p_t q_t - c q_t - \gamma (q_{t+1} - q_t)^2 \tag{29}
$$
Here
This can be formulated as an LQ problem and then solved and simulated, but first let’s study
the problem and try to get some intuition
One way to start thinking about the problem is to consider what would happen if 𝛾 = 0
Without adjustment costs there is no intertemporal trade-off, so the monopolist will choose
output to maximize current profit in each period
It’s not difficult to show that profit-maximizing output is
$$
\bar q_t := \frac{a_0 - c + d_t}{2 a_1}
$$
• if 𝛾 is close to zero, then 𝑞𝑡 will track the time path of 𝑞𝑡̄ relatively closely
• if 𝛾 is larger, then 𝑞𝑡 will be smoother than 𝑞𝑡̄ , as the monopolist seeks to avoid adjust-
ment costs
The reason for making this substitution is that, as you will be able to verify, 𝜋𝑡̂ reduces to the
simple quadratic
$$
\min \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left\{ a_1 (q_t - \bar q_t)^2 + \gamma u_t^2 \right\} \tag{30}
$$
It’s now relatively straightforward to find 𝑅 and 𝑄 such that Eq. (30) can be written as
Eq. (20)
Furthermore, the matrices 𝐴, 𝐵 and 𝐶 from Eq. (1) can be found by writing down the dy-
namics of each element of the state
Exercise 3 asks you to complete this process, and reproduce the preceding figures
41.8 Exercises
41.8.1 Exercise 1
41.8.2 Exercise 2
For lq_retired, use the same definition of 𝑥𝑡 and 𝑢𝑡 , but modify 𝐴, 𝐵, 𝐶 to correspond to
constant income 𝑦𝑡 = 𝑠
For lq_retired, set preferences as in Eq. (27)
For lq_working, preferences are the same, except that 𝑅𝑓 should be replaced by the final
value function that emerges from iterating lq_retired back to the start of retirement
With some careful footwork, the simulation can be generated by patching together the simulations from these two separate models
41.8.3 Exercise 3
41.9 Solutions
41.9.1 Exercise 1
$$
y_t = m_1 t + m_2 t^2 + \sigma w_{t+1}
$$

where {𝑤𝑡} is IID 𝑁(0, 1) and the coefficients $m_1$ and $m_2$ are chosen so that $p(t) = m_1 t + m_2 t^2$ has an inverted U shape with
# == Formulate as an LQ problem == #
Q = 1
R = np.zeros((4, 4))
Rf = np.zeros((4, 4))
Rf[0, 0] = q
A = [[1 + r, -c_bar, m1, m2],
[0, 1, 0, 0],
[0, 1, 1, 0],
[0, 1, 2, 1]]
B = [[-1],
[ 0],
[ 0],
[ 0]]
C = [[σ],
[0],
[0],
[0]]
# == Plot results == #
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5)
for ax in axes:
ax.grid()
ax.set_xlabel('Time')
ax.legend(ncol=2, **legend_args)
plt.show()
41.9.2 Exercise 2
This is a permanent income / life-cycle model with polynomial growth in income over working life followed by a fixed retirement income
The model is solved by combining two LQ programming problems as described in the lecture
up = np.column_stack((up_w, up_r))
c = up.flatten() + c_bar # Consumption
# == Plot results == #
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5)
for ax in axes:
ax.grid()
ax.set_xlabel('Time')
ax.legend(ncol=2, **legend_args)
plt.show()
41.9.3 Exercise 3
The first task is to find the matrices 𝐴, 𝐵, 𝐶, 𝑄, 𝑅 that define the LQ problem
Recall that 𝑥𝑡 = (𝑞𝑡̄ 𝑞𝑡 1)′ , while 𝑢𝑡 = 𝑞𝑡+1 − 𝑞𝑡
Letting 𝑚0 ∶= (𝑎0 − 𝑐)/2𝑎1 and 𝑚1 ∶= 1/2𝑎1 , we can write 𝑞𝑡̄ = 𝑚0 + 𝑚1 𝑑𝑡 , and then, with
some manipulation
$$
\bar q_{t+1} = m_0 (1 - \rho) + \rho \bar q_t + m_1 \sigma w_{t+1}
$$
$$
\min \mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t \left[ a_1 (q_t - \bar q_t)^2 + \gamma u_t^2 \right] \right\}
$$
# == Useful constants == #
m0 = (a0-c)/(2 * a1)
m1 = 1/(2 * a1)
# == Formulate LQ problem == #
Q = γ
R = [[ a1, -a1, 0],
[-a1, a1, 0],
[ 0, 0, 0]]
A = [[ρ, 0, m0 * (1 - ρ)],
[0, 1, 0],
[0, 0, 1]]
B = [[0],
[1],
[0]]
C = [[m1 * σ],
[ 0],
[ 0]]
time = range(len(q))
ax.set(xlabel='Time', xlim=(0, max(time)))
ax.plot(time, q_bar, 'k-', lw=2, alpha=0.6, label=r'$\bar q_t$')
ax.plot(time, q, 'b-', lw=2, alpha=0.6, label='$q_t$')
ax.legend(ncol=2, **legend_args)
s = f'dynamics with $\gamma = {γ}$'
ax.text(max(time) * 0.6, 1 * q_bar.max(), s, fontsize=14)
plt.show()
42 Optimal Savings I: The Permanent Income Model
42.1 Contents
• Overview 42.2
In addition to what’s in Anaconda, this lecture will need the following libraries
42.2 Overview
This lecture describes a rational expectations version of the famous permanent income model
of Milton Friedman [43]
Robert Hall cast Friedman’s model within a linear-quadratic setting [48]
Like Hall, we formulate an infinite-horizon linear-quadratic savings problem
We use the model as a vehicle for illustrating
In this section, we state and solve the savings and consumption problem faced by the con-
sumer
42.3.1 Preliminaries
A martingale is a stochastic process $\{X_t\}$ that satisfies

$$\mathbb E_t[X_{t+1}] = X_t, \qquad t = 0, 1, 2, \ldots$$
Martingales have the feature that the history of past outcomes provides no predictive power
for changes between current and future outcomes
For example, the current wealth of a gambler engaged in a “fair game” has this property
One common class of martingales is the family of random walks
A random walk is a stochastic process {𝑋𝑡 } that satisfies
$$X_{t+1} = X_t + w_{t+1}$$

$$X_t = \sum_{j=1}^{t} w_j + X_0$$
Not every martingale arises as a random walk (see, for example, Wald’s martingale)
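These properties are easy to check numerically. Here is a small sketch (the sample sizes are arbitrary illustrative choices):

```python
import numpy as np

np.random.seed(0)

# Simulate many sample paths of the random walk X_{t+1} = X_t + w_{t+1}
n_paths, T = 100_000, 25
w = np.random.randn(n_paths, T)     # IID N(0, 1) increments w_1, ..., w_T
X = np.cumsum(w, axis=1)            # X_t = w_1 + ... + w_t, taking X_0 = 0

# The martingale property E_t[X_{t+1}] = X_t says that increments have
# conditional mean zero, so the cross-path average increment at each date
# should be close to zero
max_drift = np.abs(w.mean(axis=0)).max()
```

With this many paths, `max_drift` comes out close to zero, consistent with the martingale property.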
42.3. THE SAVINGS PROBLEM 695
A consumer has preferences over consumption streams that are ordered by the utility func-
tional
$$\mathbb E_0 \left[ \sum_{t=0}^\infty \beta^t u(c_t) \right] \tag{1}$$

where

The consumer maximizes Eq. (1) by choosing a consumption and borrowing plan $\{c_t, b_{t+1}\}_{t=0}^\infty$ subject to the sequence of budget constraints

$$c_t + b_t = \frac{1}{1+r} b_{t+1} + y_t, \qquad t \geq 0 \tag{2}$$
Here
The consumer also faces initial conditions 𝑏0 and 𝑦0 , which can be fixed or random
42.3.3 Assumptions
For the remainder of this lecture, we follow Friedman and Hall in assuming that (1 + 𝑟)−1 = 𝛽
Regarding the endowment process, we assume it has the state-space representation
where
The restriction on 𝜌(𝐴) prevents income from growing so fast that discounted geometric sums
of some quadratic forms to be described below become infinite
Regarding preferences, we assume the quadratic utility function
Note
Along with this quadratic utility specification, we allow consumption to be nega-
tive. However, by choosing parameters appropriately, we can make the probability
that the model generates negative consumption paths over finite time horizons as
low as desired
$$\mathbb E_0 \left[ \sum_{t=0}^\infty \beta^t b_t^2 \right] < \infty \tag{4}$$
This condition rules out an always-borrow scheme that would allow the consumer to enjoy
bliss consumption forever
First-order conditions for maximizing Eq. (1) subject to Eq. (2) are
$$\mathbb E_t[c_{t+1}] = c_t \tag{6}$$
(In fact, quadratic preferences are necessary for this conclusion [1])
One way to interpret Eq. (6) is that consumption will change only when “new information”
about permanent income is revealed
These ideas will be clarified below
Note
One way to solve the consumer’s problem is to apply dynamic programming as
in this lecture. We do this later. But first we use an alternative approach that is
revealing and shows the work that dynamic programming does for us behind the
scenes
To accomplish this, observe first that Eq. (4) implies $\lim_{t \to \infty} \beta^{t/2} b_{t+1} = 0$

Using this restriction on the debt path and solving Eq. (2) forward yields

$$b_t = \sum_{j=0}^\infty \beta^j (y_{t+j} - c_{t+j}) \tag{7}$$

Take conditional expectations on both sides of Eq. (7) and use the martingale property of consumption and the law of iterated expectations to deduce

$$b_t = \sum_{j=0}^\infty \beta^j \mathbb E_t[y_{t+j}] - \frac{c_t}{1-\beta} \tag{8}$$

Solving for $c_t$ gives

$$c_t = (1-\beta)\left[\sum_{j=0}^\infty \beta^j \mathbb E_t[y_{t+j}] - b_t\right] = \frac{r}{1+r}\left[\sum_{j=0}^\infty \beta^j \mathbb E_t[y_{t+j}] - b_t\right] \tag{9}$$
Note that 𝑧𝑡 contains all variables useful for forecasting the consumer’s future endowment
It is plausible that current decisions 𝑐𝑡 and 𝑏𝑡+1 should be expressible as functions of 𝑧𝑡 and 𝑏𝑡
$$\sum_{j=0}^\infty \beta^j \mathbb E_t[y_{t+j}] = \mathbb E_t\left[\sum_{j=0}^\infty \beta^j y_{t+j}\right] = U(I - \beta A)^{-1} z_t$$

Combining this with Eq. (9) gives

$$c_t = \frac{r}{1+r}\left[U(I - \beta A)^{-1} z_t - b_t\right] \tag{10}$$
Using this equality to eliminate 𝑐𝑡 in the budget constraint Eq. (2) gives
$$\begin{aligned}
b_{t+1} &= (1+r)(b_t + c_t - y_t) \\
        &= (1+r)b_t + r\left[U(I - \beta A)^{-1} z_t - b_t\right] - (1+r) U z_t \\
        &= b_t + U\left[r(I - \beta A)^{-1} - (1+r)I\right] z_t \\
        &= b_t + U(I - \beta A)^{-1}(A - I) z_t
\end{aligned}$$
To get from the second last to the last expression in this chain of equalities is not trivial
A key is to use the fact that $(1+r)\beta = 1$ and $(I - \beta A)^{-1} = \sum_{j=0}^\infty \beta^j A^j$
We’ve now successfully written 𝑐𝑡 and 𝑏𝑡+1 as functions of 𝑏𝑡 and 𝑧𝑡
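As an illustration, the following sketch evaluates these two decision rules for a hypothetical endowment process (the matrices $A$, $U$ and the interest rate below are placeholders, not values from the lecture):

```python
import numpy as np

r = 0.05
β = 1 / (1 + r)
A = np.array([[0.9, 0.0],       # hypothetical endowment dynamics
              [0.0, 1.0]])
U = np.array([1.0, 1.0])        # hypothetical selector, y_t = U z_t

H = U @ np.linalg.inv(np.eye(2) - β * A)    # U (I - βA)^{-1}

def policies(z, b):
    "Optimal consumption and next-period debt at state (z_t, b_t)."
    c = r / (1 + r) * (H @ z - b)           # Eq. (10)
    b_next = b + H @ (A - np.eye(2)) @ z    # law of motion for debt
    return c, b_next

c, b_next = policies(np.array([1.0, 1.0]), 0.0)
```

With these numbers $H = (7, 21)$, so the call returns $c = 4/3$ and $b_{t+1} = -0.7$.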
A State-Space Representation
We can summarize our dynamics in the form of a linear state-space system governing con-
sumption, debt and income:
$$x_t = \begin{bmatrix} z_t \\ b_t \end{bmatrix}, \qquad
\tilde A = \begin{bmatrix} A & 0 \\ U(I - \beta A)^{-1}(A - I) & 1 \end{bmatrix}, \qquad
\tilde C = \begin{bmatrix} C \\ 0 \end{bmatrix}$$

and

$$\tilde U = \begin{bmatrix} U & 0 \\ (1-\beta) U (I - \beta A)^{-1} & -(1-\beta) \end{bmatrix}, \qquad
\tilde y_t = \begin{bmatrix} y_t \\ c_t \end{bmatrix}$$

so that our system becomes

$$\begin{aligned}
x_{t+1} &= \tilde A x_t + \tilde C w_{t+1} \\
\tilde y_t &= \tilde U x_t
\end{aligned} \tag{12}$$
We can use the following formulas from linear state space models to compute population
mean 𝜇𝑡 = E𝑥𝑡 and covariance Σ𝑡 ∶= E[(𝑥𝑡 − 𝜇𝑡 )(𝑥𝑡 − 𝜇𝑡 )′ ]
$$\mu_{t+1} = \tilde A \mu_t \quad \text{with } \mu_0 \text{ given} \tag{13}$$

$$\Sigma_{t+1} = \tilde A \Sigma_t \tilde A' + \tilde C \tilde C' \quad \text{with } \Sigma_0 \text{ given} \tag{14}$$

$$\mu_{y,t} = \tilde U \mu_t, \qquad \Sigma_{y,t} = \tilde U \Sigma_t \tilde U' \tag{15}$$
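Iterating these recursions is mechanical; here is a minimal sketch with placeholder matrices standing in for $\tilde A$ and $\tilde C$:

```python
import numpy as np

# Placeholder system matrices (not the model's actual Ã and C̃)
A_tilde = np.array([[0.9, 0.0],
                    [0.1, 1.0]])
C_tilde = np.array([[0.1],
                    [0.0]])

μ = np.array([1.0, 0.0])    # μ_0
Σ = np.zeros((2, 2))        # Σ_0

for t in range(50):
    μ = A_tilde @ μ                                    # Eq. (13)
    Σ = A_tilde @ Σ @ A_tilde.T + C_tilde @ C_tilde.T  # Eq. (14)
```

After any number of iterations, Σ remains a symmetric positive semidefinite covariance matrix.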
Consider, for example, the case in which income is an IID process, obtained by setting

$$z_t = \begin{bmatrix} z_{1t} \\ 1 \end{bmatrix}, \qquad
A = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
U = \begin{bmatrix} 1 & \mu \end{bmatrix}, \qquad
C = \begin{bmatrix} \sigma \\ 0 \end{bmatrix}$$

In this case, the formulas for consumption and debt derived above imply

$$b_t = -\sigma \sum_{j=1}^{t-1} w_j
\qquad \text{and} \qquad
c_t = \mu + (1-\beta)\sigma \sum_{j=1}^{t} w_j$$
Thus income is IID and debt and consumption are both Gaussian random walks
Defining assets as −𝑏𝑡 , we see that assets are just the cumulative sum of unanticipated in-
comes prior to the present date
The next figure shows a typical realization with 𝑟 = 0.05, 𝜇 = 1, and 𝜎 = 0.15
In [3]: r = 0.05
β = 1 / (1 + r)
σ = 0.15
μ = 1
T = 60
@njit
def time_path(T):
w = np.random.randn(T+1) # w_0, w_1, ..., w_T
w[0] = 0
b = np.zeros(T+1)
for t in range(1, T+1):
b[t] = w[1:t].sum()
b = -σ * b
c = μ + (1 - β) * (σ * w - b)
return w, b, c
w, b, c = time_path(T)
plt.show()
b_sum = np.zeros(T+1)
for i in range(250):
w, b, c = time_path(T) # Generate new time path
rcolor = random.choice(('c', 'g', 'b', 'k'))
ax.plot(c, color=rcolor, lw=0.8, alpha=0.7)
ax.grid()
ax.set(xlabel='Time', ylabel='Consumption')
plt.show()
42.4 Alternative Representations
In this section, we shed more light on the evolution of savings, debt and consumption by rep-
resenting their dynamics in several different ways
Hall [48] suggested an insightful way to summarize the implications of LQ permanent income
theory
First, to represent the solution for 𝑏𝑡 , shift Eq. (9) forward one period and eliminate 𝑏𝑡+1 by
using Eq. (2) to obtain
$$c_{t+1} = (1-\beta) \sum_{j=0}^\infty \beta^j \mathbb E_{t+1}[y_{t+j+1}] - (1-\beta)\left[\beta^{-1}(c_t + b_t - y_t)\right]$$

If we add and subtract $\beta^{-1}(1-\beta)\sum_{j=0}^\infty \beta^j \mathbb E_t[y_{t+j}]$ from the right side of the preceding equation and rearrange, we obtain

$$c_{t+1} - c_t = (1-\beta)\sum_{j=0}^\infty \beta^j \left\{\mathbb E_{t+1}[y_{t+j+1}] - \mathbb E_t[y_{t+j+1}]\right\} \tag{16}$$
The right side is the time 𝑡 + 1 innovation to the expected present value of the endowment
process {𝑦𝑡 }
We can represent the optimal decision rule for (𝑐𝑡 , 𝑏𝑡+1 ) in the form of Eq. (16) and Eq. (8),
which we repeat:
$$b_t = \sum_{j=0}^\infty \beta^j \mathbb E_t[y_{t+j}] - \frac{1}{1-\beta} c_t \tag{17}$$
Equation Eq. (17) asserts that the consumer’s debt due at 𝑡 equals the expected present value
of its endowment minus the expected present value of its consumption stream
A high debt thus indicates a large expected present value of surpluses 𝑦𝑡 − 𝑐𝑡
Recalling again our discussion on forecasting geometric sums, we have
$$\mathbb E_t \sum_{j=0}^\infty \beta^j y_{t+j} = U(I - \beta A)^{-1} z_t$$

$$\mathbb E_{t+1} \sum_{j=0}^\infty \beta^j y_{t+j+1} = U(I - \beta A)^{-1} z_{t+1}$$

$$\mathbb E_t \sum_{j=0}^\infty \beta^j y_{t+j+1} = U(I - \beta A)^{-1} A z_t$$
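One way to sanity-check formulas like these is to truncate the infinite sum at a long horizon and compare it with the closed form (the $\beta$, $A$, $U$ and $z_t$ below are illustrative placeholders):

```python
import numpy as np

β = 0.95
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])
U = np.array([1.0, 0.0])
z = np.array([1.0, 2.0])

# Closed form U(I - βA)^{-1} z_t for the discounted sum of forecasts
closed = U @ np.linalg.inv(np.eye(2) - β * A) @ z

# Term-by-term sum, using E_t[y_{t+j}] = U A^j z_t
truncated = sum(β**j * (U @ np.linalg.matrix_power(A, j) @ z)
                for j in range(500))
```

Because the spectral radius of $\beta A$ is well below one here, 500 terms are more than enough for the two numbers to agree to high precision.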
Using these formulas together with Eq. (3) and substituting into Eq. (16) and Eq. (17) gives
the following representation for the consumer’s optimum decision rule:
42.4.2 Cointegration
Representation Eq. (18) reveals that the joint process {𝑐𝑡 , 𝑏𝑡 } possesses the property that En-
gle and Granger [39] called cointegration
Cointegration is a tool that allows us to apply powerful results from the theory of stationary
stochastic processes to (certain transformations of) nonstationary models
To apply cointegration in the present context, suppose that 𝑧𝑡 is asymptotically stationary [4]
Despite this, both 𝑐𝑡 and 𝑏𝑡 will be non-stationary because they have unit roots (see Eq. (11)
for 𝑏𝑡 )
Nevertheless, there is a linear combination of 𝑐𝑡 , 𝑏𝑡 that is asymptotically stationary
$$(1-\beta) b_t + c_t = (1-\beta) \mathbb E_t \sum_{j=0}^\infty \beta^j y_{t+j} \tag{20}$$
Equation Eq. (20) asserts that the cointegrating residual on the left side equals the condi-
tional expectation of the geometric sum of future incomes on the right [6]
Consider again Eq. (18), this time in light of our discussion of distribution dynamics in the
lecture on linear systems
The dynamics of $c_t$ are given by

$$c_{t+1} = c_t + (1-\beta) U (I - \beta A)^{-1} C w_{t+1}$$

or

$$c_t = c_0 + \sum_{j=1}^{t} \hat w_j
\qquad \text{for} \qquad
\hat w_{t+1} := (1-\beta) U (I - \beta A)^{-1} C w_{t+1}$$
The unit root affecting $c_t$ causes the time $t$ variance of $c_t$ to grow linearly with $t$

In particular, since $\{\hat w_t\}$ is IID, we have

$$\operatorname{Var}(c_t) = t \, \hat\sigma^2$$

where $\hat\sigma^2$ denotes the variance of $\hat w_t$
A number of different studies have investigated this prediction and found some support for it
(see, e.g., [32], [126])
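The linear growth of the variance is easy to verify by simulation (the innovation scale and sample sizes below are arbitrary illustrative choices):

```python
import numpy as np

np.random.seed(42)

σ_hat = 0.2                          # std dev of the IID innovations ŵ_t
n_paths, T = 50_000, 40
w_hat = σ_hat * np.random.randn(n_paths, T)
c_dev = np.cumsum(w_hat, axis=1)     # c_t - c_0 along each sample path

# Var(c_t) should equal t * σ_hat², so this ratio should hover near 1
t = np.arange(1, T + 1)
ratio = c_dev.var(axis=0) / (t * σ_hat**2)
```

Across all dates, the sample ratio stays within a few percent of one.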
Impulse response functions measure responses to various impulses (i.e., temporary shocks)
The impulse response function of {𝑐𝑡 } to the innovation {𝑤𝑡 } is a box
In particular, the response of 𝑐𝑡+𝑗 to a unit increase in the innovation 𝑤𝑡+1 is (1 − 𝛽)𝑈 (𝐼 −
𝛽𝐴)−1 𝐶 for all 𝑗 ≥ 1
It’s useful to express the innovation to the expected present value of the endowment process
in terms of a moving average representation for income 𝑦𝑡
The endowment process defined by Eq. (3) has the moving average representation
where
• $d(L) = \sum_{j=0}^\infty d_j L^j$ for some sequence $d_j$, where $L$ is the lag operator [3]
• at time $t$, the consumer has an information set [5] $w^t = [w_t, w_{t-1}, \ldots]$
Notice that
It follows that
The object 𝑑(𝛽) is the present value of the moving average coefficients in the represen-
tation for the endowment process 𝑦𝑡
$$\begin{bmatrix} z_{1,t+1} \\ z_{2,t+1} \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
  \begin{bmatrix} z_{1t} \\ z_{2t} \end{bmatrix}
+ \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}
  \begin{bmatrix} w_{1,t+1} \\ w_{2,t+1} \end{bmatrix}$$
42.5. TWO CLASSIC EXAMPLES 705
Here
42.5.1 Example 1
Formula Eq. (26) shows how an increment 𝜎1 𝑤1𝑡+1 to the permanent component of income
𝑧1𝑡+1 leads to
But the purely transitory component of income 𝜎2 𝑤2𝑡+1 leads to a permanent increment in
consumption by a fraction 1 − 𝛽 of transitory income
The remaining fraction 𝛽 is saved, leading to a permanent increment in −𝑏𝑡+1
Application of the formula for debt in Eq. (11) to this example shows that
This confirms that none of 𝜎1 𝑤1𝑡 is saved, while all of 𝜎2 𝑤2𝑡 is saved
The next figure illustrates these very different reactions to transitory and permanent income
shocks using impulse-response functions
In [5]: r = 0.05
β = 1 / (1 + r)
S = 5 # Impulse date
σ1 = σ2 = 0.15
@njit
def time_path(T, permanent=False):
"Time path of consumption and debt given shock sequence"
w1 = np.zeros(T+1)
w2 = np.zeros(T+1)
b = np.zeros(T+1)
c = np.zeros(T+1)
if permanent:
w1[S+1] = 1.0
else:
w2[S+1] = 1.0
for t in range(1, T):
b[t+1] = b[t] - σ2 * w2[t]
c[t+1] = c[t] + σ1 * w1[t+1] + (1 - β) * σ2 * w2[t+1]
return b, c
L = 0.175
axes[0].legend(loc='lower right')
plt.tight_layout()
plt.show()
42.5.2 Example 2
Assume now that at time 𝑡 the consumer observes 𝑦𝑡 , and its history up to 𝑡, but not 𝑧𝑡
Under this assumption, it is appropriate to use an innovation representation to form 𝐴, 𝐶, 𝑈
in Eq. (18)
The discussion in sections 2.9.1 and 2.11.3 of [87] shows that the pertinent state space repre-
sentation for 𝑦𝑡 is
$$\begin{bmatrix} y_{t+1} \\ a_{t+1} \end{bmatrix}
= \begin{bmatrix} 1 & -(1-K) \\ 0 & 0 \end{bmatrix}
  \begin{bmatrix} y_t \\ a_t \end{bmatrix}
+ \begin{bmatrix} 1 \\ 1 \end{bmatrix} a_{t+1}$$

$$y_t = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} y_t \\ a_t \end{bmatrix}$$
where
In the same discussion in [87] it is shown that 𝐾 ∈ [0, 1] and that 𝐾 increases as 𝜎1 /𝜎2 does
In other words, 𝐾 increases as the ratio of the standard deviation of the permanent shock to
that of the transitory shock increases
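As a sketch of where such a $K$ comes from (this code is not from the lecture), one can iterate the scalar Riccati recursion of the Kalman filter for a hidden random-walk component with innovation std $\sigma_1$, observed subject to transitory noise with std $\sigma_2$:

```python
def kalman_gain(σ1, σ2, tol=1e-12):
    "Steady-state Kalman gain via fixed-point iteration on the scalar Riccati recursion."
    Σ = 1.0                     # prediction variance of the hidden state
    while True:
        Σ_new = Σ + σ1**2 - Σ**2 / (Σ + σ2**2)
        if abs(Σ_new - Σ) < tol:
            return Σ_new / (Σ_new + σ2**2)    # gain K ∈ (0, 1)
        Σ = Σ_new

# K rises with the ratio σ1 / σ2, as claimed above
gains = [kalman_gain(σ1, 1.0) for σ1 in (0.1, 0.5, 1.0, 2.0)]
```

Evaluating the list confirms that each gain lies strictly between 0 and 1 and that the sequence is increasing in $\sigma_1/\sigma_2$.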
Please see our lecture containing a first look at the Kalman filter
Applying the formulas in Eq. (18) implies
where the endowment process can now be represented in terms of the univariate innovation to
𝑦𝑡 as
The consumer permanently increases his consumption by the full amount of his estimate of
the permanent part of 𝑎𝑡+1 , but by only (1 − 𝛽) times his estimate of the purely transitory
part of 𝑎𝑡+1
Therefore, in total, he permanently increments his consumption by a fraction 𝐾 + (1 − 𝛽)(1 −
𝐾) = 1 − 𝛽(1 − 𝐾) of 𝑎𝑡+1
He saves the remaining fraction 𝛽(1 − 𝐾)
According to equation Eq. (29), the first difference of income is a first-order moving average
Equation Eq. (28) asserts that the first difference of consumption is IID
Application of the formula to this example shows that
This indicates how the fraction 𝐾 of the innovation to 𝑦𝑡 that is regarded as permanent influ-
ences the fraction of the innovation that is saved
The model described above significantly changed how economists think about consumption
While Hall’s model does a remarkably good job as a first approximation to consumption data,
it’s widely believed that it doesn’t capture important aspects of some consumption/savings
data
For example, liquidity constraints and precautionary savings appear to be present sometimes
Further discussion can be found in, e.g., [49], [103], [31], [22]
$$c_0 = \frac{b_1}{1+r} - b_0 + y_0 \qquad \text{and} \qquad c_1 = y_1 - b_1$$

$$\max_{b_1} \left\{ u\!\left(\frac{b_1}{R} - b_0 + y_0\right) + \beta \, \mathbb E_0\left[u(y_1 - b_1)\right] \right\}$$
[4] This would be the case if, for example, the spectral radius of 𝐴 is strictly less than one
[5] A moving average representation for a process 𝑦𝑡 is said to be fundamental if the linear
space spanned by 𝑦𝑡 is equal to the linear space spanned by 𝑤𝑡 . A time-invariant innovations
representation, attained via the Kalman filter, is by construction fundamental.
[6] See [70], [84], [85] for interesting applications of related ideas.
43 Optimal Savings II: LQ Techniques
43.1 Contents
• Overview 43.2
• Setup 43.3
• Implementation 43.5
43.2 Overview
This lecture continues our analysis of the linear-quadratic (LQ) permanent income model of
savings and consumption
As we saw in our previous lecture on this topic, Robert Hall [48] used the LQ permanent in-
come model to restrict and interpret intertemporal comovements of nondurable consumption,
nonfinancial income, and financial wealth
For example, we saw how the model asserts that for any covariance stationary process for
nonfinancial income
This isomorphism means that in analyzing the LQ permanent income model, we are in effect
also analyzing the Barro tax smoothing model
It is just a matter of appropriately relabeling the variables in Hall’s model
In this lecture, we’ll
• show how the solution to the LQ permanent income model can be obtained using LQ
control methods
• represent the model as a linear state space system as in this lecture
• apply QuantEcon’s LinearStateSpace class to characterize statistical features of the con-
sumer’s optimal consumption and borrowing plans
We’ll then use these characterizations to construct a simple model of cross-section wealth and
consumption dynamics in the spirit of Truman Bewley [16]
(Later we’ll study other Bewley models—see this lecture)
The model will prove useful for illustrating concepts such as
• stationarity
• ergodicity
• ensemble moments and cross-section observations
43.3 Setup
Let’s recall the basic features of the model discussed in the permanent income model
Consumer preferences are ordered by
$$\mathbb E_0 \sum_{t=0}^\infty \beta^t u(c_t) \tag{1}$$

$$c_t + b_t = \frac{1}{1+r} b_{t+1} + y_t, \qquad t \geq 0 \tag{2}$$

$$\mathbb E_0 \sum_{t=0}^\infty \beta^t b_t^2 < \infty \tag{3}$$
The interpretation of all variables and parameters is the same as in the previous lecture
We continue to assume that (1 + 𝑟)𝛽 = 1
The dynamics of {𝑦𝑡 } again follow the linear state space model
The restrictions on the shock process and parameters are the same as in our previous lecture
If we set
For the purposes of this lecture, let’s assume {𝑦𝑡 } is a second-order univariate autoregressive
process:
We can map this into the linear state space framework in Eq. (4), as discussed in our lecture
on linear models
To do so we take
$$z_t = \begin{bmatrix} 1 \\ y_t \\ y_{t-1} \end{bmatrix}, \qquad
A = \begin{bmatrix} 1 & 0 & 0 \\ \alpha & \rho_1 & \rho_2 \\ 0 & 1 & 0 \end{bmatrix}, \qquad
C = \begin{bmatrix} 0 \\ \sigma \\ 0 \end{bmatrix}, \qquad \text{and} \qquad
U = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}$$
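A quick check that this mapping reproduces the AR(2) recursion, using the same illustrative parameter values that appear in the code below:

```python
import numpy as np

α, ρ1, ρ2, σ = 10.0, 0.9, 0.0, 1.0
A = np.array([[1., 0., 0.],
              [α,  ρ1, ρ2],
              [0., 1., 0.]])
C = np.array([0., σ, 0.])
U = np.array([0., 1., 0.])

z = np.array([1., 2., 1.])      # state (1, y_t, y_{t-1})
w = 0.5                         # a shock realization
z_next = A @ z + C * w
y_next = U @ z_next             # = α + ρ1 * y_t + ρ2 * y_{t-1} + σ * w
```

Note also that the bottom row of $A$ simply shifts $y_t$ into the $y_{t-1}$ slot of the next state.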
Previously we solved the permanent income model by solving a system of linear expectational
difference equations subject to two boundary conditions
Here we solve the same model using LQ methods based on dynamic programming
After confirming that answers produced by the two methods agree, we apply QuantEcon’s
LinearStateSpace class to illustrate features of the model
Why solve a model in two distinct ways?
Because by doing so we gather insights about the structure of the model
Our earlier approach based on solving a system of expectational difference equations brought
to the fore the role of the consumer’s expectations about future nonfinancial income
On the other hand, formulating the model in terms of an LQ dynamic programming problem
reminds us that
Recall from our lecture on LQ theory that the optimal linear regulator problem is to choose a
decision rule for 𝑢𝑡 to minimize
$$\mathbb E \sum_{t=0}^\infty \beta^t \left\{ x_t' R x_t + u_t' Q u_t \right\},$$

subject to

$$x_{t+1} = \tilde A x_t + \tilde B u_t + \tilde C w_{t+1}, \qquad t \geq 0, \tag{5}$$
where 𝑤𝑡+1 is IID with mean vector zero and E𝑤𝑡 𝑤𝑡′ = 𝐼
The tildes in 𝐴,̃ 𝐵,̃ 𝐶 ̃ are to avoid clashing with notation in Eq. (4)
The value function for this problem is 𝑣(𝑥) = −𝑥′ 𝑃 𝑥 − 𝑑, where
• 𝑃 is the unique positive semidefinite solution of the corresponding matrix Riccati equa-
tion
• The scalar $d$ is given by $d = \beta(1-\beta)^{-1} \operatorname{trace}(P \tilde C \tilde C')$
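To convey what computing $P$ involves, here is a scalar sketch (with made-up numbers, not the lecture's matrices) that iterates the discounted Riccati operator to a fixed point:

```python
# Scalar discounted LQ problem: minimize E Σ β^t (R x_t² + Q u_t²)
# subject to x_{t+1} = A x_t + B u_t + C w_{t+1}
β, Q, R = 0.95, 1.0, 1.0
A, B = 1.0, 1.0

P = 0.0
for _ in range(2000):
    # Discounted Riccati operator
    P = R + β * A**2 * P - (β * A * B * P)**2 / (Q + β * B**2 * P)

F = β * B * P * A / (Q + β * B**2 * P)    # optimal decision rule u_t = -F x_t
```

After enough iterations $P$ satisfies the Riccati equation to machine precision, and the implied feedback coefficient $F$ lies between 0 and 1.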
$$x_t := \begin{bmatrix} z_t \\ b_t \end{bmatrix}
= \begin{bmatrix} 1 \\ y_t \\ y_{t-1} \\ b_t \end{bmatrix}$$

$$\tilde A := \begin{bmatrix} A & 0 \\ (1+r)(U_\gamma - U) & 1+r \end{bmatrix}, \qquad
\tilde B := \begin{bmatrix} 0 \\ 1+r \end{bmatrix}, \qquad \text{and} \qquad
\tilde C := \begin{bmatrix} C \\ 0 \end{bmatrix}$$
Please confirm for yourself that, with these definitions, the LQ dynamics Eq. (5) match the
dynamics of 𝑧𝑡 and 𝑏𝑡 described above
To map utility into the quadratic form 𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 we can set
43.5 Implementation
The reason is that it drops out of the Euler equation for consumption
In what follows we set it equal to unity
# Set parameters
α, β, ρ1, ρ2, σ = 10.0, 0.95, 0.9, 0.0, 1.0
R = 1 / β
A = np.array([[1., 0., 0.],
[α, ρ1, ρ2],
[0., 1., 0.]])
C = np.array([[0.], [σ], [0.]])
G = np.array([[0., 1., 0.]])
# These choices will initialize the state vector of an individual at zero debt
# and the ergodic distribution of the endowment process. Use these to create
# the Bewley economy.
mxbewley = mxo
sxbewley = sxo
QLQ = np.array([1.0])
BLQ = np.array([0., 0., 0., R]).reshape(4,1)
CLQ = np.array([0., σ, 0., 0.]).reshape(4,1)
β_LQ = β
print(f"R = \n {RLQ}")
print(f"Q = \n {QLQ}")
A =
[[ 1. 0. 0. 0. ]
[10. 0.9 0. 0. ]
[ 0. 1. 0. 0. ]
[ 0. -1.05263158 0. 1.05263158]]
B =
[[0. ]
[0. ]
[0. ]
[1.05263158]]
R =
[[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 1.e-09]]
Q =
[1.]
We’ll save the implied optimal policy function and soon compare it with what we get by employing an alternative solution method
In our first lecture on the infinite horizon permanent income problem we used a different solu-
tion method
The method was based around
• deducing the Euler equations that are the first-order conditions with respect to con-
sumption and savings
• using the budget constraints and boundary condition to complete a system of expecta-
tional linear difference equations
• solving those equations to obtain the solution
In [7]: # Use the above formulas to create the optimal policies for b_{t+1} and c_t
b_pol = G @ la.inv(np.eye(3, 3) - β * A) @ (A - np.eye(3, 3))
c_pol = (1 - β) * G @ la.inv(np.eye(3, 3) - β * A)
# Use the following values to start everyone off at b=0, initial incomes zero
μ_0 = np.array([1., 0., 0., 0.])
Σ_0 = np.zeros((4, 4))
A_LSS calculated as we have here should equal ABF calculated above using the LQ model
We have verified that the two methods give the same solution
Now let’s create instances of the LinearStateSpace class and use it to do some interesting ex-
periments
To do this, we’ll use the outcomes from our second method
• In the first example, all consumers begin with zero nonfinancial income and zero debt
• In the second example, while all begin with zero debt, we draw their initial income lev-
els from the invariant distribution of financial income
In the first example, consumers’ nonfinancial income paths display pronounced transients
early in the sample
We generate 25 paths of the exogenous non-financial income process and the associated opti-
mal consumption and debt paths.
In the first set of graphs, darker lines depict a particular sample path, while the lighter lines
describe 24 other paths
A second graph plots a collection of simulations against the population distribution that we
extract from the LinearStateSpace instance LSS
Comparing sample paths with population distributions at each date 𝑡 is a useful exercise—see
our discussion of the laws of large numbers
• compute and plot population quantiles of the distributions of consumption and debt for
a population of consumers
• simulate a group of 25 consumers and plot sample paths on the same graph as the pop-
ulation distribution
# Simulation/Moment Parameters
moment_generator = LSS.moment_sequence()
for i in range(npaths):
sims = LSS.simulate(T)
bsim[i, :] = sims[0][-1, :]
csim[i, :] = sims[1][1, :]
ysim[i, :] = sims[1][0, :]
# Get T
T = bsim.shape[1]
# Plot debt
ax[1].plot(bsim[0, :], label="b", color="r")
ax[1].plot(bsim.T, alpha=.1, color="r")
ax[1].legend(loc=4)
ax[1].set(xlabel="t", ylabel="debt")
fig.tight_layout()
return fig
# Consumption fan
ax[0].plot(xvals, cons_mean, color="k")
ax[0].plot(csim.T, color="k", alpha=.25)
ax[0].fill_between(xvals, c_perc_95m, c_perc_95p, alpha=.25, color="b")
# Debt fan
ax[1].plot(xvals, debt_mean, color="k")
ax[1].plot(bsim.T, color="k", alpha=.25)
ax[1].fill_between(xvals, d_perc_95m, d_perc_95p, alpha=.25, color="b")
ax[1].fill_between(xvals, d_perc_90m, d_perc_90p, alpha=.25, color="r")
ax[1].set(xlabel="t", ylabel="debt")
fig.tight_layout()
return fig
Now let’s create figures with initial conditions of zero for 𝑦0 and 𝑏0
plt.show()
plt.show()
$$(1-\beta) b_t + c_t = (1-\beta) \mathbb E_t \sum_{j=0}^\infty \beta^j y_{t+j} \tag{6}$$

So at time 0 we have

$$c_0 = (1-\beta) \mathbb E_0 \sum_{t=0}^\infty \beta^t y_t$$
This tells us that consumption starts at the income that would be paid by an annuity whose
value equals the expected discounted value of nonfinancial income at time 𝑡 = 0
To support that level of consumption, the consumer borrows a lot early and consequently
builds up substantial debt
In fact, he or she incurs so much debt that eventually, in the stochastic steady state, he con-
sumes less each period than his nonfinancial income
43.6. TWO EXAMPLE ECONOMIES 723
He uses the gap between consumption and nonfinancial income mostly to service the interest
payments due on his debt
Thus, when we look at the panel of debt in the accompanying graph, we see that this is a
group of ex-ante identical people each of whom starts with zero debt
All of them accumulate debt in anticipation of rising nonfinancial income
They expect their nonfinancial income to rise toward the invariant distribution of income, a
consequence of our having started them at 𝑦−1 = 𝑦−2 = 0
Cointegration Residual
The following figure plots realizations of the left side of Eq. (6), which, as discussed in our
last lecture, is called the cointegrating residual
As mentioned above, the right side can be thought of as an annuity payment on the expected present value of future income $\mathbb E_t \sum_{j=0}^\infty \beta^j y_{t+j}$

Early along a realization, $c_t$ is approximately constant while $(1-\beta)b_t$ and $(1-\beta)\mathbb E_t \sum_{j=0}^\infty \beta^j y_{t+j}$ both rise markedly as the household's present value of income and borrowing rise pretty much together
This example illustrates the following point: the definition of cointegration implies that the
cointegrating residual is asymptotically covariance stationary, not covariance stationary
The cointegrating residual for the specification with zero income and zero debt initially has a
notable transient component that dominates its behavior early in the sample
By altering initial conditions, we shall remove this transient in our second example to be pre-
sented below
return fig
When we set 𝑦−1 = 𝑦−2 = 0 and 𝑏0 = 0 in the preceding exercise, we make debt “head north”
early in the sample
Average debt in the cross-section rises and approaches the asymptote
We can regard these as outcomes of a “small open economy” that borrows from abroad at the
fixed gross interest rate 𝑅 = 𝑟 + 1 in anticipation of rising incomes
So with the economic primitives set as above, the economy converges to a steady state in
which there is an excess aggregate supply of risk-free loans at a gross interest rate of 𝑅
This excess supply is filled by “foreign lenders” willing to make those loans
We can use virtually the same code to rig a “poor man’s Bewley [16] model” in the following
way
This rigs a closed economy in which people are borrowing and lending with each other at a
gross risk-free interest rate of 𝑅 = 𝛽 −1
Across the group of people being analyzed, risk-free loans are in zero excess supply
We have arranged primitives so that 𝑅 = 𝛽 −1 clears the market for risk-free loans at zero
aggregate excess supply
So the risk-free loans are being made from one person to another within our closed set of agents
There is no need for foreigners to lend to our group
Let’s have a look at the corresponding figures
plt.show()
plt.show()
But now there is some initial dispersion because there is ex-ante heterogeneity in the initial draws of $\begin{bmatrix} y_{-1} \\ y_{-2} \end{bmatrix}$
44 Consumption and Tax Smoothing with Complete and Incomplete Markets

44.1 Contents
• Overview 44.2
• Background 44.3
In addition to what’s in Anaconda, this lecture will need the following libraries
44.2 Overview
Complete markets allow a consumer or government to buy or sell claims contingent on all
possible states of the world
Incomplete markets allow a consumer or government to buy or sell only a limited set of secu-
rities, often only a single risk-free security
Hall [48] and Barro [11] both assumed that the only asset that can be traded is a risk-free one
period bond
Hall assumed an exogenous stochastic process of nonfinancial income and an exogenous gross
interest rate on one period risk-free debt that equals 𝛽 −1 , where 𝛽 ∈ (0, 1) is also a con-
sumer’s intertemporal discount factor
Barro [11] made an analogous assumption about the risk-free interest rate in a tax-smoothing
model that we regard as isomorphic to Hall’s consumption-smoothing model
We maintain Hall and Barro’s assumption about the interest rate when we describe an incom-
plete markets version of our model
In addition, we extend their assumption about the interest rate to an appropriate counterpart
that we use in a “complete markets” model in the style of Lucas and Stokey [90]
While we are equally interested in consumption-smoothing and tax-smoothing models, for the
most part, we focus explicitly on consumption-smoothing versions of these models
But for each version of the consumption-smoothing model, there is a natural tax-smoothing
counterpart obtained simply by
For elaborations on this theme, please see Optimal Savings II: LQ Techniques and later parts
of this lecture
We’ll consider two closely related alternative assumptions about the consumer’s exogenous
nonfinancial income process (or in the tax-smoothing interpretation, the government’s exoge-
nous expenditure process):
• that it obeys a finite 𝑁 state Markov chain (setting 𝑁 = 2 most of the time)
• that it is described by a linear state space model with a continuous state vector in R𝑛
driven by a Gaussian vector IID shock process
We’ll spend most of this lecture studying the finite-state Markov specification, but will briefly
treat the linear state space specification before concluding
This lecture can be viewed as a followup to Optimal Savings II: LQ Techniques and a warm-
up for a model of tax smoothing described in Optimal Taxation with State-Contingent Debt
Linear-quadratic versions of the Lucas-Stokey tax-smoothing model are described in Optimal
Taxation in an LQ Economy
The key difference between those lectures and this one is
• Here the decision-maker takes all prices as exogenous, meaning that his decisions do not
affect them
• In Optimal Taxation in an LQ Economy and Optimal Taxation with State-Contingent
Debt, the decision-maker – the government in the case of these lectures – recognizes
that his decisions affect prices
So these later lectures are partly about how the government should manipulate prices of gov-
ernment debt
44.3 Background
In the complete markets version of the model, each period the consumer can buy or sell one-
period ahead state-contingent securities whose payoffs depend on next period’s realization of
the Markov state
In the two-state Markov chain case, there are two such securities each period
In an 𝑁 state Markov state version of the model, 𝑁 such securities are traded each period
These state-contingent securities are commonly called Arrow securities, after Kenneth Arrow
who first theorized about them
In the incomplete markets version of the model, the consumer can buy and sell only one secu-
rity each period, a risk-free bond with gross return 𝛽 −1
$$y_t = \begin{cases} \bar y_1 & \text{if } s_t = \bar s_1 \\ \bar y_2 & \text{if } s_t = \bar s_2 \end{cases}$$

$$\mathbb E \left[\sum_{t=0}^\infty \beta^t u(c_t)\right]
\qquad \text{where } u(c_t) = -(c_t - \gamma)^2 \text{ and } 0 < \beta < 1 \tag{1}$$
The two models differ in how effectively the market structure allows the consumer to trans-
fer resources across time and Markov states, there being more transfer opportunities in the
complete markets setting than in the incomplete markets setting
Watch how these differences in opportunities affect
where 𝑏𝑡 is the consumer’s one-period debt that falls due at time 𝑡 and 𝑏𝑡+1 (𝑠𝑗̄ | 𝑠𝑡 ) are the
consumer’s time 𝑡 sales of the time 𝑡 + 1 consumption good in Markov state 𝑠𝑗̄ , a source of
time 𝑡 revenues
An analog of Hall’s assumption that the one-period risk-free gross interest rate is $\beta^{-1}$ is

$$q(\bar s_j \mid \bar s_i) = \beta P_{ij} \tag{2}$$
To understand this, observe that in state $\bar s_i$ it costs $\sum_j q(\bar s_j \mid \bar s_i)$ to purchase one unit of consumption next period for sure, i.e., no matter what state of the world occurs at $t+1$

Hence the implied price of a risk-free claim on one unit of consumption next period is

$$\sum_j q(\bar s_j \mid \bar s_i) = \sum_j \beta P_{ij} = \beta$$
This confirms that Eq. (2) is a natural analog of Hall’s assumption about the risk-free one-
period interest rate
First-order necessary conditions for maximizing the consumer’s expected utility are
44.4. MODEL 1 (COMPLETE MARKETS) 733
$$\beta \frac{u'(c_{t+1})}{u'(c_t)} \, \mathbb P\{s_{t+1} \mid s_t\} = q(s_{t+1} \mid s_t)$$

Combined with our assumption about Arrow security prices, these conditions imply

$$c_{t+1} = c_t \tag{3}$$
Thus, our consumer sets 𝑐𝑡 = 𝑐 ̄ for all 𝑡 ≥ 0 for some value 𝑐 ̄ that it is our job now to deter-
mine
Guess: We’ll make the plausible guess that

$$b_{t+1}(\bar s_j \mid s_t) = b(\bar s_j), \qquad j = 1, 2 \tag{4}$$

so that the amount borrowed today turns out to depend only on tomorrow’s Markov state. (Why is this a plausible guess?)
To determine 𝑐,̄ we shall pursue the implications of the consumer’s budget constraints in each
Markov state today and our guess Eq. (4) about the consumer’s debt level choices
For 𝑡 ≥ 1, these imply
or
If we substitute Eq. (6) into the first equation of Eq. (5) and rearrange, we discover that
$$b(\bar s_1) = b_0 \tag{7}$$
We can then use the second equation of Eq. (5) to deduce the restriction
$$y(\bar s_1) - y(\bar s_2) + \left[q(\bar s_1 \mid \bar s_1) - q(\bar s_1 \mid \bar s_2) - 1\right] b_0 + \left[q(\bar s_2 \mid \bar s_1) + 1 - q(\bar s_2 \mid \bar s_2)\right] b(\bar s_2) = 0, \tag{8}$$
The preceding calculations indicate that in the complete markets version of our model, we
obtain the following striking results:
• The consumer chooses to make consumption perfectly constant across time and Markov
states
We computed the constant level of consumption 𝑐 ̄ and indicated how that level depends on
the underlying specifications of preferences, Arrow securities prices, the stochastic process of
exogenous nonfinancial income, and the initial debt level 𝑏0
• The consumer’s debt neither accumulates, nor decumulates, nor drifts – instead, the
debt level each period is an exact function of the Markov state, so in the two-state
Markov case, it switches between two values
• We have verified guess Eq. (4)
We computed how one of those debt levels depends entirely on initial debt – it equals it – and
how the other value depends on virtually all remaining parameters of the model
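As a concrete illustration (this sketch is not one of the lecture's own listings), we can solve the two state-contingent budget constraints for 𝑐 ̄ and 𝑏(𝑠2̄ ) directly, imposing 𝑏(𝑠1̄ ) = 𝑏0 and using the default parameter values that appear in the code below (𝛽 = 0.96, 𝑦 = (2, 1.5), 𝑏0 = 3):

```python
import numpy as np

β, b0 = 0.96, 3.0
y = np.array([2.0, 1.5])
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])
Q = β * P   # Arrow securities prices, as in Eq. (2)

# Budget constraints with b(s̄1) = b0 and unknowns (c̄, b(s̄2)):
#   c̄ + b(s̄i) = y(s̄i) + Q[i, 0] * b0 + Q[i, 1] * b(s̄2),   i = 1, 2
M = np.array([[1.0, -Q[0, 1]],
              [1.0, 1.0 - Q[1, 1]]])
rhs = np.array([y[0] + (Q[0, 0] - 1.0) * b0,
                y[1] + Q[1, 0] * b0])
c_bar, b2 = np.linalg.solve(M, rhs)
print(c_bar, b2)
```

The same two numbers come out of the function consumption_complete() defined below.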
44.4.2 Code
Here’s some code that, among other things, contains a function called consump-
tion_complete()
This function computes 𝑏(𝑠1̄ ), 𝑏(𝑠2̄ ), 𝑐 ̄ as outcomes given a set of parameters, under the as-
sumption of complete markets
class ConsumptionProblem:
"""
The data for a consumption problem, including some default values.
"""
def __init__(self,
β=.96,
y=[2, 1.5],
b0=3,
P=np.asarray([[.8, .2],
[.4, .6]])):
"""
Parameters
----------
β : discount factor
P : 2x2 transition matrix
y : list containing the two income levels
b0 : debt in period 0 (= state_1 debt level)
"""
self.β = β
self.y = y
self.b0 = b0
self.P = P
def consumption_complete(cp):
    """
    Computes endogenous values for the complete market case.

    Parameters
    ----------
    cp : instance of ConsumptionProblem

    Returns
    -------
    c_bar : constant consumption
    b1, b2 : debt levels in the two Markov states
    """
    β, P, y, b0 = cp.β, cp.P, cp.y, cp.b0  # Unpack

    Q = β * P  # Price system
def consumption_incomplete(cp, N_simul=150):
    """
    Computes endogenous values for the incomplete market case.

    Parameters
    ----------
    cp : instance of ConsumptionProblem
    N_simul : int
    """
# Useful variables
y = np.asarray(y).reshape(2, 1)
v = np.linalg.inv(np.eye(2) - β * P) @ y
for i, s in enumerate(s_path):
c_path[i] = (1 - β) * (v - b_path[i] * np.ones((2, 1)))[s, 0]
b_path[i + 1] = b_path[i] + db[s, 0]
In [3]: cp = ConsumptionProblem()
c_bar, b1, b2 = consumption_complete(cp)
debt_complete = np.asarray([b1, b2])
np.isclose(c_bar + b2 - cp.y[1] - (cp.β * cp.P)[1, :] @ debt_complete, 0)
Out[3]: True
Below, we’ll take the outcomes produced by this code – in particular the implied consumption
and debt paths – and compare them with outcomes from an incomplete markets model in the
spirit of Hall [48] and Barro [11] (and also, for those who love history, Gallatin (1807) [46])
44.5 Model 2 (One-Period Risk-Free Debt Only)

This is a version of the original models of Hall (1978) and Barro (1979) in which the decision-
maker’s ability to substitute intertemporally is constrained by his ability to buy or sell only
one security, a risk-free one-period bond bearing a constant gross interest rate that equals 𝛽 −1
Given an initial debt 𝑏0 at time 0, the consumer faces a sequence of budget constraints
𝑐𝑡 + 𝑏𝑡 = 𝑦𝑡 + 𝛽𝑏𝑡+1 , 𝑡≥0
where 𝛽 is the price at time 𝑡 of a risk-free claim on one unit of consumption at time 𝑡 + 1
First-order conditions for the consumer’s problem are
𝑏𝑡 = E𝑡 ∑∞𝑗=0 𝛽𝑗 𝑦𝑡+𝑗 − (1 − 𝛽)−1 𝑐𝑡    (10)
and
𝑐𝑡 = (1 − 𝛽) [E𝑡 ∑∞𝑗=0 𝛽𝑗 𝑦𝑡+𝑗 − 𝑏𝑡 ]    (11)
Eq. (11) expresses 𝑐𝑡 as a net interest rate factor 1 − 𝛽 times the sum of the expected present value of nonfinancial income E𝑡 ∑∞𝑗=0 𝛽𝑗 𝑦𝑡+𝑗 and financial wealth −𝑏𝑡
Substituting Eq. (11) into the one-period budget constraint and rearranging leads to
𝑏𝑡+1 − 𝑏𝑡 = 𝛽−1 [(1 − 𝛽) E𝑡 ∑∞𝑗=0 𝛽𝑗 𝑦𝑡+𝑗 − 𝑦𝑡 ]    (12)
Now let’s do a useful calculation that will yield a convenient expression for the key term
∞
E𝑡 ∑𝑗=0 𝛽 𝑗 𝑦𝑡+𝑗 in our finite Markov chain setting
Define
𝑣𝑡 ∶= E𝑡 ∑∞𝑗=0 𝛽𝑗 𝑦𝑡+𝑗
In our finite Markov chain setting, 𝑣𝑡 = 𝑣(1) when 𝑠𝑡 = 𝑠1̄ and 𝑣𝑡 = 𝑣(2) when 𝑠𝑡 = 𝑠2̄
Therefore, we can write
𝑣 ⃗ = 𝑦 ⃗ + 𝛽𝑃 𝑣 ⃗
where 𝑣 ⃗ = [𝑣(1), 𝑣(2)]′ and 𝑦 ⃗ = [𝑦(1), 𝑦(2)]′
We can also write the last expression as
𝑣 ⃗ = (𝐼 − 𝛽𝑃 )−1 𝑦 ⃗
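For instance, with the default values used in the code above (𝛽 = 0.96, 𝑦 = (2, 1.5), and the transition matrix 𝑃 from the ConsumptionProblem class), 𝑣 ⃗ can be computed directly:

```python
import numpy as np

β = 0.96
y = np.array([2.0, 1.5])
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])

# v solves v = y + β P v, i.e., (I - β P) v = y
v = np.linalg.solve(np.eye(2) - β * P, y)
print(v)
```

Each entry of v is close to the risk-free present value 1/(1 − 𝛽) = 25 times an average income level, as one would expect.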
In our finite Markov chain setting, from expression Eq. (11), consumption at date 𝑡 when
debt is 𝑏𝑡 and the Markov state today is 𝑠𝑡 = 𝑖 is evidently
In contrast to outcomes in the complete markets model, in the incomplete markets model
• consumption drifts over time as a random walk; the level of consumption at time 𝑡 de-
pends on the level of debt that the consumer brings into the period as well as the ex-
pected discounted present value of nonfinancial income at 𝑡
• the consumer’s debt drifts upward over time in response to low realizations of nonfinan-
cial income and drifts downward over time in response to high realizations of nonfinan-
cial income
• the drift over time in the consumer’s debt and the dependence of current consumption
on today’s debt level account for the drift over time in consumption
The code above also contains a function called consumption_incomplete() that uses Eq. (13)
and Eq. (14) to
Let’s try this, using the same parameters in both complete and incomplete markets economies
np.random.seed(1)
N_simul = 150
cp = ConsumptionProblem()

c_bar, b1, b2 = consumption_complete(cp)
debt_complete = np.asarray([b1, b2])
c_path, debt_path, y_path, s_path = consumption_incomplete(cp, N_simul=N_simul)

fig, ax = plt.subplots(1, 2, figsize=(15, 5))
ax[0].set_title('Consumption paths')
ax[0].plot(np.arange(N_simul), c_path, label='incomplete market')
ax[0].plot(np.arange(N_simul), c_bar * np.ones(N_simul), label='complete market')
ax[0].plot(np.arange(N_simul), y_path, label='income', alpha=.6, ls='--')
ax[0].legend()
ax[0].set_xlabel('Periods')
ax[1].set_title('Debt paths')
ax[1].plot(np.arange(N_simul), debt_path, label='incomplete market')
ax[1].plot(np.arange(N_simul), debt_complete[s_path], label='complete market')
ax[1].plot(np.arange(N_simul), y_path, label='income', alpha=.6, ls='--')
ax[1].legend()
ax[1].axhline(0, color='k', ls='--')
ax[1].set_xlabel('Periods')
plt.show()
In the graphs above, for the same sample path of nonfinancial income 𝑦𝑡 , notice that
• consumption is constant when there are complete markets, but it takes a random walk
in the incomplete markets version of the model
• the consumer’s debt oscillates between two values that are functions of the Markov state
in the complete markets model, while the consumer’s debt drifts in a “unit root” fashion
in the incomplete markets economy
• Purchasing insurance protects the government against the need to raise taxes
too high or issue too much debt in the high government expenditure event.
We assume that government expenditures move between two values 𝐺1 < 𝐺2 , where Markov
state 1 means “peace” and Markov state 2 means “war”
The government budget constraint in Markov state 𝑖 is
𝑇𝑖 + 𝑏𝑖 = 𝐺𝑖 + ∑𝑗 𝑄𝑖𝑗 𝑏𝑗
where
𝑄𝑖𝑗 = 𝛽𝑃𝑖𝑗
is the price of one unit of output next period in state 𝑗 when today’s Markov state is 𝑖 and 𝑏𝑖
is the government’s level of assets in Markov state 𝑖
That is, 𝑏𝑖 is the amount of the one-period loans owned by the government that fall due at
time 𝑡
As above, we’ll assume that the initial Markov state is state 1
In addition, to simplify our example, we’ll set the government’s initial asset level to 0, so that
𝑏1 = 0
Here’s our code to compute a quantitative example with zero debt in peace time:
In [6]: # Parameters
β = .96
y = [1, 2]
b0 = 0
P = np.asarray([[.8, .2],
[.4, .6]])
cp = ConsumptionProblem(β, y, b0, P)
Q = β * P
N_simul = 150

c_bar, b1, b2 = consumption_complete(cp)
debt_complete = np.asarray([b1, b2])

print(f"P \n {P}")
print(f"Q \n {Q}")
print(f"Govt expenditures in peace and war = {y}")
print(f"Constant tax collections = {c_bar}")
print(f"Govt assets in two states = {debt_complete}")
msg = """
Now let's check the government's budget constraint in peace and war.
Our assumptions imply that the government always purchases 0 units of the
Arrow peace security.
"""
print(msg)
AS1 = Q[0, 1] * b2
print(f"Spending on Arrow war security in peace = {AS1}")
AS2 = Q[1, 1] * b2
print(f"Spending on Arrow war security in war = {AS2}")
print("\n")
print("Government tax collections plus asset levels in peace and war")
TB1 = c_bar + b1
print(f"T+b in peace = {TB1}")
TB2 = c_bar + b2
print(f"T+b in war = {TB2}")
print("\n")
print("Total government spending in peace and war")
G1 = y[0] + AS1
G2 = y[1] + AS2
print(f"Peace = {G1}")
print(f"War = {G2}")
print("\n")
print("Let's see ex-post and ex-ante returns on Arrow securities")
Π = np.reciprocal(Q)
exret = Π
print(f"Ex-post returns to purchase of Arrow securities = {exret}")
exant = Π * P
print(f"Ex-ante returns to purchase of Arrow securities {exant}")
P
[[0.8 0.2]
[0.4 0.6]]
Q
[[0.768 0.192]
[0.384 0.576]]
Govt expenditures in peace and war = [1, 2]
Constant tax collections = 1.3116883116883118
Govt assets in two states = [0. 1.62337662]
Now let's check the government's budget constraint in peace and war.
Our assumptions imply that the government always purchases 0 units of the
Arrow peace security.
44.6.1 Explanation
In this example, the government always purchases 0 units of the Arrow security that pays off in peace time (Markov state 1)
But it purchases a positive amount of the security that pays off in war time (Markov state 2)
We recommend plugging the quantities computed above into the government budget constraints in the two Markov states and verifying that each constraint is satisfied
This is an example in which the government purchases insurance against the possibility that
war breaks out or continues
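Following that suggestion, here is a small sketch that plugs the values printed above (rounded as displayed) into the budget constraint 𝑇𝑖 + 𝑏𝑖 = 𝐺𝑖 + ∑𝑗 𝑄𝑖𝑗 𝑏𝑗 in each Markov state:

```python
import numpy as np

β = 0.96
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])
Q = β * P
G = np.array([1.0, 2.0])            # expenditures in peace and war
T = 1.3116883116883118              # constant tax collections, from the output above
b = np.array([0.0, 1.62337662])     # government assets in the two states

for i in range(2):
    print(f"state {i + 1}: T + b_i = {T + b[i]:.6f}, "
          f"G_i + sum_j Q_ij b_j = {G[i] + Q[i] @ b:.6f}")
```

Both sides agree in each state, confirming that the budget constraints hold with these quantities.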
𝑃 = [ 1   0 ]
    [ .2  .8 ]
Also, start the system in Markov state 2 (war) with initial government assets −10, so that the
government starts the war in debt and 𝑏2 = −10
Now we’ll use a setting like that in the first lecture on the permanent income model
• incomplete markets: the consumer could trade only a single risk-free one-period bond
bearing gross one-period risk-free interest rate equal to 𝛽 −1
• the consumer’s exogenous nonfinancial income was governed by a linear state space
model driven by Gaussian shocks, the kind of model studied in an earlier lecture about
linear state space models
where 𝜙(⋅ | 𝜇, Σ) is a multivariate Gaussian distribution with mean vector 𝜇 and covariance
matrix Σ
Let 𝑏(𝑥𝑡+1 ) be a vector of state-contingent debt due at 𝑡 + 1 as a function of the 𝑡 + 1 state
𝑥𝑡+1 .
Using the pricing function assumed in Eq. (15), the value at 𝑡 of 𝑏(𝑥𝑡+1 ) is
In the complete markets setting, the consumer faces a sequence of budget constraints
𝑐𝑡 + 𝑏𝑡 = 𝑦𝑡 + 𝛽E𝑡 𝑏𝑡+1 , 𝑡 ≥ 0
𝑏𝑡 = E𝑡 ∑∞𝑗=0 𝛽𝑗 (𝑦𝑡+𝑗 − 𝑐𝑡+𝑗 )
We assume as before that the consumer cares about the expected value of
∑∞𝑡=0 𝛽𝑡 𝑢(𝑐𝑡 ),    0 < 𝛽 < 1
In the incomplete markets version of the model, we assumed that 𝑢(𝑐𝑡 ) = −(𝑐𝑡 − 𝛾)2 , so that
the above utility functional became
− ∑∞𝑡=0 𝛽𝑡 (𝑐𝑡 − 𝛾)2 ,    0 < 𝛽 < 1
But in the complete markets version, we can assume a more general form of utility function
that satisfies 𝑢′ > 0 and 𝑢″ < 0
The first-order condition for the consumer’s problem with complete markets and our assump-
tion about Arrow securities prices is
𝑏𝑡 = E𝑡 ∑∞𝑗=0 𝛽𝑗 (𝑦𝑡+𝑗 − 𝑐 ̄)
or
𝑏𝑡 = 𝑆𝑦 (𝐼 − 𝛽𝐴)−1 𝑥𝑡 − [1/(1 − 𝛽)] 𝑐 ̄    (16)
𝑏̄0 = 𝑆𝑦 (𝐼 − 𝛽𝐴)−1 𝑥0 − [1/(1 − 𝛽)] 𝑐 ̄    (17)
where 𝑏̄0 is an initial level of the consumer’s debt, specified as a parameter of the problem
Thus, in the complete markets version of the consumption-smoothing model, 𝑐𝑡 = 𝑐,̄ ∀𝑡 ≥ 0 is
determined by Eq. (17) and the consumer’s debt is a fixed function of the state 𝑥𝑡 described
by Eq. (16)
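A minimal numerical sketch of Eq. (16) and Eq. (17), using an illustrative scalar AR(1) income process 𝑦𝑡+1 = (1 − 𝜌)𝜇 + 𝜌𝑦𝑡 cast in state space form with 𝑥𝑡 = [1, 𝑦𝑡 ]′ (all parameter values here are hypothetical):

```python
import numpy as np

ρ, μ, β = 0.9, 2.0, 0.96
A = np.array([[1.0,          0.0],
              [(1 - ρ) * μ,  ρ ]])
S_y = np.array([[0.0, 1.0]])        # selects y_t from the state
x0 = np.array([[1.0], [2.0]])       # start income at its mean
b0 = 3.0                            # initial debt level

rm = np.linalg.inv(np.eye(2) - β * A)       # (I - βA)^{-1}
pv_income = (S_y @ rm @ x0).item()          # expected discounted income at t = 0
c_bar = (1 - β) * (pv_income - b0)          # solve Eq. (17) for c̄
b_t0 = pv_income - c_bar / (1 - β)          # Eq. (16) evaluated at t = 0
print(c_bar, b_t0)                          # b_t0 recovers b0
```

Since income starts at its mean, the present value of income is 𝜇/(1 − 𝛽) = 50 here, and Eq. (16) evaluated at 𝑥0 recovers the initial debt, as it must.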
Here’s an example that shows how in this setting the availability of insurance against fluctu-
ating nonfinancial income allows the consumer completely to smooth consumption across time
and across states of the world
# Debt
x_hist, y_hist = lss.simulate(T)
b_hist = np.squeeze(S_y @ rm @ x_hist - cbar / (1 - β))
# Define parameters
N_simul = 150
α, ρ1, ρ2 = 10.0, 0.9, 0.0
σ = 1.0
# Consumption plots
ax[0].set_title('Cons and income', fontsize=17)
ax[0].plot(np.arange(N_simul), c_hist_com, label='consumption')
ax[0].plot(np.arange(N_simul), y_hist_com, label='income', alpha=.6, linestyle='--')
ax[0].legend()
ax[0].set_xlabel('Periods')
ax[0].set_ylim([-5.0, 110])
# Debt plots
ax[1].set_title('Debt and income')
ax[1].plot(np.arange(N_simul), b_hist_com, label='debt')
ax[1].plot(np.arange(N_simul), y_hist_com, label='Income', alpha=.6, linestyle='--')
ax[1].legend()
ax[1].axhline(0, color='k')
ax[1].set_xlabel('Periods')
plt.show()
The incomplete markets version of the model with nonfinancial income being governed by a
linear state space system is described in the first lecture on the permanent income model and
the followup lecture on the permanent income model
In that version, consumption follows a random walk and the consumer’s debt follows a pro-
cess with a unit root
We leave it to the reader to apply the usual isomorphism to deduce the corresponding impli-
cations for a tax-smoothing model like Barro’s [11]
45

Optimal Savings III: Occasionally Binding Constraints

45.1 Contents

• Overview 45.2
• The Optimal Savings Problem 45.3
• Computation 45.4
• Exercises 45.5
• Solutions 45.6
In addition to what’s in Anaconda, this lecture will need the following libraries
45.2 Overview
Next, we study an optimal savings problem for an infinitely lived consumer—the “common
ancestor” described in [87], section 1.3
This is an essential sub-problem for many representative macroeconomic models
• [4]
• [68]
• etc.
It is related to the decision problem in the stochastic optimal growth model and yet differs in
important ways
For example, the choice problem for the agent includes an additive income term that leads to
an occasionally binding constraint
Our presentation of the model will be relatively brief
• For further details on economic intuition, implication and models, see [87]
• Proofs of all mathematical results stated below can be found in this paper
To solve the model we will use Euler equation based time iteration, similar to this lecture
This method turns out to be globally convergent under mild assumptions, even when utility is
unbounded (both above and below)
We’ll need the following imports
45.2.1 References
Other useful references include [31], [33], [80], [105], [108] and [119]
Let’s write down the model and then discuss how to solve it
45.3.1 Set-Up
Consider a household that chooses a state-contingent consumption plan {𝑐𝑡 }𝑡≥0 to maximize
E ∑∞𝑡=0 𝛽𝑡 𝑢(𝑐𝑡 )
subject to
Here
Non-capital income {𝑧𝑡 } is assumed to be a Markov process taking values in 𝑍 ⊂ (0, ∞) with
stochastic kernel Π
This means that Π(𝑧, 𝐵) is the probability that 𝑧𝑡+1 ∈ 𝐵 given 𝑧𝑡 = 𝑧
The expectation of 𝑓(𝑧𝑡+1 ) given 𝑧𝑡 = 𝑧 is written as
The asset space is [−𝑏, ∞) and the state is the pair (𝑎, 𝑧) ∈ 𝑆 ∶= [−𝑏, ∞) × 𝑍
A feasible consumption path from (𝑎, 𝑧) ∈ 𝑆 is a consumption sequence {𝑐𝑡 } such that {𝑐𝑡 }
and its induced asset path {𝑎𝑡 } satisfy
1. (𝑎0 , 𝑧0 ) = (𝑎, 𝑧)
2. the feasibility constraints in Eq. (1), and
3. measurability of 𝑐𝑡 w.r.t. the filtration generated by {𝑧1 , … , 𝑧𝑡 }
The meaning of the third point is just that consumption at time 𝑡 can only be a function of
outcomes that have already been observed
𝑉 (𝑎, 𝑧) ∶= sup E {∑∞𝑡=0 𝛽𝑡 𝑢(𝑐𝑡 )}    (2)
and
In essence, this says that the natural “arbitrage” relation 𝑢′ (𝑐𝑡 ) = 𝛽𝑅 E𝑡 [𝑢′ (𝑐𝑡+1 )] holds when
the choice of current consumption is interior
Interiority means that 𝑐𝑡 is strictly less than its upper bound 𝑅𝑎𝑡 + 𝑧𝑡 + 𝑏
(The lower boundary case 𝑐𝑡 = 0 never arises at the optimum because 𝑢′ (0) = ∞)
When 𝑐𝑡 does hit the upper bound 𝑅𝑎𝑡 + 𝑧𝑡 + 𝑏, the strict inequality 𝑢′ (𝑐𝑡 ) > 𝛽𝑅 E𝑡 [𝑢′ (𝑐𝑡+1 )]
can occur because 𝑐𝑡 cannot increase sufficiently to attain equality
With some thought and effort, one can show that Eq. (3) and Eq. (4) are equivalent to
1. For each (𝑎, 𝑧) ∈ 𝑆, a unique optimal consumption path from (𝑎, 𝑧) exists
2. This path is the unique feasible path from (𝑎, 𝑧) satisfying the Euler equality Eq. (5)
and the transversality condition
Moreover, there exists an optimal consumption function 𝜎∗ ∶ 𝑆 → [0, ∞) such that the path
from (𝑎, 𝑧) generated by
(𝑎0 , 𝑧0 ) = (𝑎, 𝑧), 𝑧𝑡+1 ∼ Π(𝑧𝑡 , 𝑑𝑦), 𝑐𝑡 = 𝜎∗ (𝑎𝑡 , 𝑧𝑡 ) and 𝑎𝑡+1 = 𝑅𝑎𝑡 + 𝑧𝑡 − 𝑐𝑡
satisfies both Eq. (5) and Eq. (6), and hence is the unique optimal path from (𝑎, 𝑧)
In summary, to solve the optimization problem, we need to compute 𝜎∗
45.4 Computation
We can rewrite Eq. (5) to make it a statement about functions rather than random variables
In particular, consider the functional equation
𝑢′ ∘ 𝜎 (𝑎, 𝑧) = max {𝛾 ∫ 𝑢′ ∘ 𝜎 (𝑅𝑎 + 𝑧 − 𝜎(𝑎, 𝑧), 𝑧′) Π(𝑧, 𝑑𝑧′) , 𝑢′ (𝑅𝑎 + 𝑧 + 𝑏)}    (7)
where 𝛾 ∶= 𝛽𝑅
We have to be careful with VFI (i.e., iterating with 𝑇 ) in this setting because 𝑢 is not as-
sumed to be bounded
• In fact typically unbounded both above and below — e.g. 𝑢(𝑐) = log 𝑐
• In which case, the standard DP theory does not apply
• 𝑇 𝑛 𝑣 is not guaranteed to converge to the value function for arbitrary continuous
bounded 𝑣
Nonetheless, we can always try the popular strategy “iterate and hope”
We can then check the outcome by comparing with that produced by TI
The latter is known to converge, as described above
45.4.3 Implementation
First, we build a class called ConsumerProblem that stores the model primitives
self.u, self.du = u, du
self.r, self.R = r, 1 + r
self.β, self.b = β, b
self.Π, self.z_vals = np.array(Π), tuple(z_vals)
self.asset_grid = np.linspace(-b, grid_max, grid_size)
@njit
def euler_diff(c, a, z, i_z, σ):
"""
The difference of the left-hand side and the right-hand side
of the Euler Equation.
"""
lhs = du(c)
expectation = 0
for i in range(len(z_vals)):
expectation += du(interp(asset_grid, σ[:, i], R * a + z - c)) * Π[i_z, i]
        rhs = max(γ * expectation, du(R * a + z + b))
        return lhs - rhs
@njit
def K(σ):
"""
The operator K.
return σ_new
return K
K uses linear interpolation along the asset grid to approximate the value and consumption
functions
To solve for the optimal policy function, we will write a function solve_model to iterate
and find the optimal 𝜎
def solve_model(cp, tol=1e-4, max_iter=1000, verbose=True):
    """
    Solves for the optimal policy using time iteration

    * cp is an instance of ConsumerProblem
    """
# initial guess of σ
σ = np.empty((len(asset_grid), len(z_vals)))
for i_a, a in enumerate(asset_grid):
for i_z, z in enumerate(z_vals):
c_max = R * a + z + b
σ[i_a, i_z] = c_max
K = operator_factory(cp)
    i = 0
    error = tol + 1
    while error > tol and i < max_iter:
        σ_new = K(σ)
        error = np.max(np.abs(σ_new - σ))
        σ = σ_new
        i += 1

    if i == max_iter:
        print("Failed to converge!")
    elif verbose:
        print(f"Converged in {i} iterations.")

    return σ_new
Plotting the result using the default parameters of the ConsumerProblem class
In [6]: cp = ConsumerProblem()
σ_star = solve_model(cp)
Converged in 41 iterations.
The following exercises walk you through several applications where policy functions are com-
puted
45.5 Exercises
45.5.1 Exercise 1
45.5.2 Exercise 2
Now let’s consider the long run asset levels held by households
We’ll take r = 0.03 and otherwise use default parameters
The following figure is a 45 degree diagram showing the law of motion for assets when con-
sumption is optimal
𝑎′ = ℎ(𝑎, 𝑧) ∶= 𝑅𝑎 + 𝑧 − 𝜎∗ (𝑎, 𝑧)
Ergodicity is valid here, so stationary probabilities can be calculated by averaging over a sin-
gle long time series
Hence to approximate the stationary distribution we can simulate a long time series for assets
and histogram, as in the following figure
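To illustrate this simulate-and-average approach with a self-contained toy example (the consumption rule 𝜎(𝑎, 𝑧) = 𝜅(𝑅𝑎 + 𝑧 + 𝑏) below is hypothetical, standing in for the optimal 𝜎∗ computed by solve_model, and all parameter values are made up):

```python
import numpy as np

np.random.seed(0)
R, b, κ = 1.03, 0.0, 0.96
z_vals = np.array([0.5, 1.0])
Π = np.array([[0.6,  0.4],
              [0.05, 0.95]])

T = 100_000
a = np.empty(T + 1)
a[0] = 0.0
i_z = 0
for t in range(T):
    i_z = np.random.choice(2, p=Π[i_z])
    c = κ * (R * a[t] + z_vals[i_z] + b)      # hypothetical policy, not σ*
    a[t + 1] = R * a[t] + z_vals[i_z] - c

# The empirical distribution of {a_t} approximates the stationary distribution;
# in practice one would histogram `a` at this point
print(a.mean())
```

Replacing the hypothetical rule with the interpolated 𝜎∗ gives the figure described above.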
45.5.3 Exercise 3
Following on from exercises 1 and 2, let’s look at how savings and aggregate asset holdings
vary with the interest rate
• Note: [87] section 18.6 can be consulted for more background on the topic treated in
this exercise
For a given parameterization of the model, the mean of the stationary distribution can be in-
terpreted as aggregate capital in an economy with a unit mass of ex-ante identical households
facing idiosyncratic shocks
Let’s look at how this measure of aggregate capital varies with the interest rate and borrow-
ing constraint
The next figure plots aggregate capital against the interest rate for b in (1, 3)
45.6 Solutions
45.6.1 Exercise 1
45.6.2 Exercise 2
def compute_asset_series(cp, T=500_000):
    """
    Simulates a time series of length T for assets, given optimal
    savings behavior.

    cp is an instance of ConsumerProblem
    """
Π, z_vals, R = cp.Π, cp.z_vals, cp.R # Simplify names
mc = MarkovChain(Π)
σ_star = solve_model(cp, verbose=False)
cf = lambda a, i_z: interp(cp.asset_grid, σ_star[:, i_z], a)
a = np.zeros(T+1)
z_seq = mc.simulate(T)
for t in range(T):
i_z = z_seq[t]
a[t+1] = R * a[t] + z_vals[i_z] - cf(a[t], i_z)
return a
cp = ConsumerProblem(r=0.03, grid_max=4)
a = compute_asset_series(cp)
45.6.3 Exercise 3
In [10]: M = 25
r_vals = np.linspace(0, 0.04, M)
fig, ax = plt.subplots(figsize=(10, 8))
Finished iteration b = 1
Finished iteration b = 3
46
Robustness
46.1 Contents
• Overview 46.2
• The Model 46.3
• Robustness as Outcome of a Two-Person Zero-Sum Game 46.5
• The Stochastic Case 46.6
• Implementation 46.7
• Application 46.8
• Appendix 46.9
In addition to what’s in Anaconda, this lecture will need the following libraries
46.2 Overview
This lecture modifies a Bellman equation to express a decision-maker’s doubts about transi-
tion dynamics
His specification doubts make the decision-maker want a robust decision rule
Robust means insensitive to misspecification of transition dynamics
The decision-maker has a single approximating model
He calls it approximating to acknowledge that he doesn’t completely trust it
He fears that outcomes will actually be determined by another model that he cannot describe
explicitly
All that he knows is that the actual data-generating model is in some (uncountable) set of
models that surrounds his approximating model
He quantifies the discrepancy between his approximating model and the genuine data-
generating model by using a quantity called entropy
(We’ll explain what entropy means below)
He wants a decision rule that will work well enough no matter which of those other models
actually governs outcomes
This is what it means for his decision rule to be “robust to misspecification of an approximat-
ing model”
This may sound like too much to ask for, but …
… a secret weapon is available to design robust decision rules
The secret weapon is max-min control theory
A value-maximizing decision-maker enlists the aid of an (imaginary) value-minimizing model
chooser to construct bounds on the value attained by a given decision rule under different
models of the transition dynamics
The original decision-maker uses those bounds to construct a decision rule with an assured
performance level, no matter which model actually governs outcomes
Note
In reading this lecture, please don’t think that our decision-maker is paranoid
when he conducts a worst-case analysis. By designing a rule that works well
against a worst-case, his intention is to construct a rule that will work well across
a set of models.
Our “robust” decision-maker wants to know how well a given rule will work when he does not
know a single transition law …
… he wants to know sets of values that will be attained by a given decision rule 𝐹 under a set
of transition laws
Ultimately, he wants to design a decision rule 𝐹 that shapes these sets of values in ways that
he prefers
With this in mind, consider the following graph, which relates to a particular decision prob-
lem to be explained below
• Value refers to a sum of discounted rewards obtained by applying the decision rule 𝐹
when the state starts at some fixed initial state 𝑥0
• Entropy is a non-negative number that measures the size of a set of models surrounding
the decision-maker’s approximating model
– Entropy is zero when the set includes only the approximating model, indicating
that the decision-maker completely trusts the approximating model
– Entropy is bigger, and the set of surrounding models is bigger, the less the
decision-maker trusts the approximating model
The shaded region indicates that for all models having entropy less than or equal to the num-
ber on the horizontal axis, the value obtained will be somewhere within the indicated set of
values
Now let’s compare sets of values associated with two different decision rules, 𝐹𝑟 and 𝐹𝑏
In the next figure,
• The red set shows the value-entropy correspondence for decision rule 𝐹𝑟
• The blue set shows the value-entropy correspondence for decision rule 𝐹𝑏
• more robust means that the set of values is less sensitive to increasing misspecification
as measured by entropy
Notice that the less robust rule 𝐹𝑟 promises higher values for small misspecifications (small
entropy)
(But it is more fragile in the sense that it is more sensitive to perturbations of the approxi-
mating model)
Below we’ll explain in detail how to construct these sets of values for a given 𝐹 , but for now
…
Here is a hint about the secret weapons we’ll use to construct these sets
If you want to understand more about why one serious quantitative researcher is interested in
this approach, we recommend Lars Peter Hansen’s Nobel lecture
• [56]
• [52]
46.3 The Model

For simplicity, we present ideas in the context of a class of problems with linear transition
laws and quadratic objective functions
To fit in with our earlier lecture on LQ control, we will treat loss minimization rather than
value maximization
To begin, recall the infinite horizon LQ problem, where an agent chooses a sequence of con-
trols {𝑢𝑡 } to minimize
∑∞𝑡=0 𝛽𝑡 {𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 }    (1)
As before,
• 𝑥𝑡 is 𝑛 × 1, 𝐴 is 𝑛 × 𝑛
• 𝑢𝑡 is 𝑘 × 1, 𝐵 is 𝑛 × 𝑘
• 𝑤𝑡 is 𝑗 × 1, 𝐶 is 𝑛 × 𝑗
• 𝑅 is 𝑛 × 𝑛 and 𝑄 is 𝑘 × 𝑘
We also allow for model uncertainty on the part of the agent solving this optimization prob-
lem
In particular, the agent takes 𝑤𝑡 = 0 for all 𝑡 ≥ 0 as a benchmark model but admits the
possibility that this model might be wrong
As a consequence, she also considers a set of alternative models expressed in terms of se-
quences {𝑤𝑡 } that are “close” to the zero sequence
She seeks a policy that will do well enough for a set of alternative models whose members are
pinned down by sequences {𝑤𝑡 }
Soon we’ll quantify the quality of a model specification in terms of the maximal size of the
∞
expression ∑𝑡=0 𝛽 𝑡+1 𝑤𝑡+1
′
𝑤𝑡+1
If our agent takes {𝑤𝑡 } as a given deterministic sequence, then, drawing on intuition from
earlier lectures on dynamic programming, we can anticipate Bellman equations such as
where
and 𝐼 is a 𝑗 × 𝑗 identity matrix. Substituting this expression for the maximum into Eq. (3)
yields
𝑃 = ℬ(𝒟(𝑃 ))
The operator ℬ is the standard (i.e., non-robust) LQ Bellman operator, and 𝑃 = ℬ(𝑃 ) is the
standard matrix Riccati equation coming from the Bellman equation — see this discussion
Under some regularity conditions (see [52]), the operator ℬ ∘ 𝒟 has a unique positive definite
fixed point, which we denote below by 𝑃 ̂
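To make this concrete, here is a sketch that computes 𝑃 ̂ by iterating ℬ ∘ 𝒟 in a scalar example. The operator formulas in the comments are the standard robust LQ expressions (stated here as assumptions, since their displayed forms are omitted above), and all parameter values are illustrative:

```python
import numpy as np

# Scalar example: every matrix is 1×1, so ' (transpose) is trivial
A, B, C = 1.0, 1.0, 0.5
R, Q = 1.0, 1.0
β, θ = 0.95, 50.0

def D(P):
    # 𝒟(P) = P + P C (θI - C'PC)^{-1} C'P
    return P + P * C / (θ - C * P * C) * C * P

def B_op(P):
    # Standard LQ Bellman operator:
    # ℬ(P) = R + β A'PA - β² A'PB (Q + β B'PB)^{-1} B'PA
    return R + β * A * P * A - β**2 * A * P * B * B * P * A / (Q + β * B * P * B)

P = R
for _ in range(1000):
    P_new = B_op(D(P))
    if abs(P_new - P) < 1e-12:
        break
    P = P_new

print(P)   # approximates the fixed point P̂ of ℬ∘𝒟
```

Because θ is large here, 𝒟 is nearly the identity and 𝑃 ̂ is close to the standard (non-robust) Riccati fixed point, consistent with the discussion below.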
A robust policy, indexed by 𝜃, is 𝑢 = −𝐹 ̂ 𝑥 where
We also define
The interpretation of 𝐾̂ is that 𝑤𝑡+1 = 𝐾̂𝑥𝑡 on the worst-case path of {𝑥𝑡 }, in the sense that this vector is the maximizer of Eq. (4) evaluated at the fixed rule 𝑢 = −𝐹 ̂ 𝑥
Note that 𝑃 ̂ , 𝐹 ̂ , 𝐾̂ are all determined by the primitives and 𝜃
Note also that if 𝜃 is very large, then 𝒟 is approximately equal to the identity mapping
Hence, when 𝜃 is large, 𝑃 ̂ and 𝐹 ̂ are approximately equal to their standard LQ values
Furthermore, when 𝜃 is large, 𝐾̂ is approximately equal to zero
Conversely, smaller 𝜃 is associated with greater fear of model misspecification and greater
concern for robustness
46.5 Robustness as Outcome of a Two-Person Zero-Sum Game

What we have done above can be interpreted in terms of a two-person zero-sum game in
which 𝐹 ̂ , 𝐾̂ are Nash equilibrium objects
Agent 1 is our original agent, who seeks to minimize loss in the LQ program while admitting
the possibility of misspecification
Agent 2 is an imaginary malevolent player
Agent 2’s malevolence helps the original agent to compute bounds on his value function
across a set of models
We begin with agent 2’s problem
Agent 2
1. knows a fixed policy 𝐹 specifying the behavior of agent 1, in the sense that 𝑢𝑡 = −𝐹 𝑥𝑡
for all 𝑡
2. responds by choosing a shock sequence {𝑤𝑡 } from a set of paths sufficiently close to the
benchmark sequence {0, 0, 0, …}
A natural way to say "sufficiently close to the zero sequence" is to restrict the summed inner product ∑∞𝑡=1 𝑤′𝑡 𝑤𝑡 to be small
However, to obtain a time-invariant recursive formulation, it turns out to be convenient to
restrict a discounted inner product
∑∞𝑡=1 𝛽𝑡 𝑤′𝑡 𝑤𝑡 ≤ 𝜂    (9)
Now let 𝐹 be a fixed policy, and let 𝐽𝐹 (𝑥0 , w) be the present-value cost of that policy given
sequence w ∶= {𝑤𝑡 } and initial condition 𝑥0 ∈ R𝑛
Substituting −𝐹 𝑥𝑡 for 𝑢𝑡 in Eq. (1), this value can be written as
𝐽𝐹 (𝑥0 , w) ∶= ∑∞𝑡=0 𝛽𝑡 𝑥′𝑡 (𝑅 + 𝐹 ′ 𝑄𝐹 )𝑥𝑡    (10)
where
and the initial condition 𝑥0 is as specified in the left side of Eq. (10)
Agent 2 chooses w to maximize agent 1’s loss 𝐽𝐹 (𝑥0 , w) subject to Eq. (9)
Using a Lagrangian formulation, we can express this problem as
max_w ∑∞𝑡=0 𝛽𝑡 {𝑥′𝑡 (𝑅 + 𝐹 ′ 𝑄𝐹 )𝑥𝑡 − 𝛽𝜃(𝑤′𝑡+1 𝑤𝑡+1 − 𝜂)}
where {𝑥𝑡 } satisfies Eq. (11) and 𝜃 is a Lagrange multiplier on constraint Eq. (9)
For the moment, let’s take 𝜃 as fixed, allowing us to drop the constant 𝛽𝜃𝜂 term in the objec-
tive function, and hence write the problem as
max_w ∑∞𝑡=0 𝛽𝑡 {𝑥′𝑡 (𝑅 + 𝐹 ′ 𝑄𝐹 )𝑥𝑡 − 𝛽𝜃 𝑤′𝑡+1 𝑤𝑡+1 }
or, equivalently,
min_w ∑∞𝑡=0 𝛽𝑡 {−𝑥′𝑡 (𝑅 + 𝐹 ′ 𝑄𝐹 )𝑥𝑡 + 𝛽𝜃 𝑤′𝑡+1 𝑤𝑡+1 }    (12)
𝛽 ∑∞𝑡=0 𝛽𝑡 𝑥′𝑡 𝐾(𝐹 , 𝜃𝜂 )′ 𝐾(𝐹 , 𝜃𝜂 )𝑥𝑡 = 𝜂    (13)
Here 𝑥𝑡 is given by Eq. (11) — which in this case becomes 𝑥𝑡+1 = (𝐴 − 𝐵𝐹 + 𝐶𝐾(𝐹 , 𝜃))𝑥𝑡
46.5.2 Using Agent 2’s Problem to Construct Bounds on the Value Sets
𝑅𝜃 (𝑥0 , 𝐹 ) ≤ ∑∞𝑡=0 𝛽𝑡 {−𝑥′𝑡 (𝑅 + 𝐹 ′ 𝑄𝐹 )𝑥𝑡 } + 𝛽𝜃 ∑∞𝑡=0 𝛽𝑡 𝑤′𝑡+1 𝑤𝑡+1 ,
𝑅𝜃 (𝑥0 , 𝐹 ) − 𝜃 ent ≤ ∑∞𝑡=0 𝛽𝑡 {−𝑥′𝑡 (𝑅 + 𝐹 ′ 𝑄𝐹 )𝑥𝑡 }    (14)
where
ent ∶= 𝛽 ∑∞𝑡=0 𝛽𝑡 𝑤′𝑡+1 𝑤𝑡+1
The left side of inequality Eq. (14) is a straight line with slope −𝜃
Technically, it is a “separating hyperplane”
At a particular value of entropy, the line is tangent to the lower bound of values as a function
of entropy
In particular, the lower bound on the left side of Eq. (14) is attained when
ent = 𝛽 ∑∞𝑡=0 𝛽𝑡 𝑥′𝑡 𝐾(𝐹 , 𝜃)′ 𝐾(𝐹 , 𝜃)𝑥𝑡    (15)
To construct the lower bound on the set of values associated with all perturbations w satisfy-
ing the entropy constraint Eq. (9) at a given entropy level, we proceed as follows:
Note
This procedure sweeps out a set of separating hyperplanes indexed by different
values for the Lagrange multiplier 𝜃
𝑉𝜃̃ (𝑥0 , 𝐹 ) = max_w ∑∞𝑡=0 𝛽𝑡 {−𝑥′𝑡 (𝑅 + 𝐹 ′ 𝑄𝐹 )𝑥𝑡 − 𝛽𝜃̃ 𝑤′𝑡+1 𝑤𝑡+1 }    (16)
𝑉𝜃̃ (𝑥0 , 𝐹 ) ≥ ∑∞𝑡=0 𝛽𝑡 {−𝑥′𝑡 (𝑅 + 𝐹 ′ 𝑄𝐹 )𝑥𝑡 } − 𝛽𝜃̃ ∑∞𝑡=0 𝛽𝑡 𝑤′𝑡+1 𝑤𝑡+1
𝑉𝜃̃ (𝑥0 , 𝐹 ) + 𝜃̃ ent ≥ ∑∞𝑡=0 𝛽𝑡 {−𝑥′𝑡 (𝑅 + 𝐹 ′ 𝑄𝐹 )𝑥𝑡 }    (17)
where
ent ≡ 𝛽 ∑∞𝑡=0 𝛽𝑡 𝑤′𝑡+1 𝑤𝑡+1
The left side of inequality Eq. (17) is a straight line with slope 𝜃 ̃
The upper bound on the left side of Eq. (17) is attained when
ent = 𝛽 ∑∞𝑡=0 𝛽𝑡 𝑥′𝑡 𝐾(𝐹 , 𝜃̃)′ 𝐾(𝐹 , 𝜃̃)𝑥𝑡    (18)
To construct the upper bound on the set of values associated with all perturbations w with a given entropy, we proceed much as we did for the lower bound
min_{𝑢𝑡 } ∑∞𝑡=0 𝛽𝑡 {𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 − 𝛽𝜃 𝑤′𝑡+1 𝑤𝑡+1 }    (19)
∑∞𝑡=0 𝛽𝑡 {𝑥′𝑡 (𝑅 − 𝛽𝜃𝐾 ′ 𝐾)𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 }    (20)
subject to
Once again, the expression for the optimal policy can be found here — we denote it by 𝐹 ̃
Clearly, the 𝐹 ̃ we have obtained depends on 𝐾, which, in agent 2’s problem, depended on an
initial policy 𝐹
Holding all other parameters fixed, we can represent this relationship as a mapping Φ, where
𝐹 ̃ = Φ(𝐾(𝐹 , 𝜃))
As you may have already guessed, the robust policy 𝐹 ̂ defined in Eq. (7) is a fixed point of
the mapping Φ
In particular, for any given 𝜃,
46.6 The Stochastic Case

Now we turn to the stochastic case, where the sequence {𝑤𝑡 } is treated as an IID sequence of
random vectors
In this setting, we suppose that our agent is uncertain about the conditional probability distri-
bution of 𝑤𝑡+1
The agent takes the standard normal distribution 𝑁 (0, 𝐼) as the baseline conditional distribu-
tion, while admitting the possibility that other “nearby” distributions prevail
These alternative conditional distributions of 𝑤𝑡+1 might depend nonlinearly on the history
𝑥𝑠 , 𝑠 ≤ 𝑡
To implement this idea, we need a notion of what it means for one distribution to be near
another one
Here we adopt a very useful measure of closeness for distributions known as the relative en-
tropy, or Kullback-Leibler divergence
For densities 𝑝, 𝑞, the Kullback-Leibler divergence of 𝑞 from 𝑝 is defined as
$$D_{KL}(p, q) := \int \ln \left[ \frac{p(x)}{q(x)} \right] p(x)\, dx$$
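As a quick numerical illustration (not code from the lecture), we can check this definition on a grid for two normal densities, where the closed form for a common $\sigma$ is $(\mu_p - \mu_q)^2 / (2\sigma^2)$:

```python
import numpy as np

def norm_pdf(x, μ, σ):
    return np.exp(-(x - μ)**2 / (2 * σ**2)) / (σ * np.sqrt(2 * np.pi))

# D_KL(p, q) = ∫ ln[p(x)/q(x)] p(x) dx, approximated on a fine grid
x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
p = norm_pdf(x, 0.0, 1.0)
q = norm_pdf(x, 0.5, 1.0)
D_kl = np.sum(np.log(p / q) * p) * dx

# For two normals with common σ this equals (μ_p - μ_q)² / (2σ²) = 0.125
assert np.isclose(D_kl, 0.125, atol=1e-4)
print(D_kl)
```

Note that relative entropy is not symmetric: $D_{KL}(p, q) \neq D_{KL}(q, p)$ in general.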
Using this notation, we replace Eq. (3) with the stochastic analog
$$J(x) = \min_{u} \max_{\psi \in \mathcal{P}} \left\{ x'Rx + u'Qu + \beta \left[ \int J(Ax + Bu + Cw)\, \psi(dw) - \theta D_{KL}(\psi, \phi) \right] \right\} \tag{22}$$
Here 𝒫 represents the set of all densities on R𝑛 and 𝜙 is the benchmark distribution 𝑁 (0, 𝐼)
The distribution $\psi$ is chosen as the least desirable conditional distribution in terms of next period outcomes, while taking into account the penalty term $\theta D_{KL}(\psi, \phi)$
This penalty term plays a role analogous to the one played by the deterministic penalty 𝜃𝑤′ 𝑤
in Eq. (3), since it discourages large deviations from the benchmark
The maximization problem in Eq. (22) appears highly nontrivial — after all, we are maximizing over an infinite dimensional space consisting of the entire set of densities
However, it turns out that the solution is tractable, and in fact also falls within the class of
normal distributions
First, we note that 𝐽 has the form 𝐽 (𝑥) = 𝑥′ 𝑃 𝑥 + 𝑑 for some positive definite matrix 𝑃 and
constant real number 𝑑
Substituting the expression for the maximum into Bellman equation Eq. (22) and using
𝐽 (𝑥) = 𝑥′ 𝑃 𝑥 + 𝑑 gives
$$x'Px + d = \min_{u} \left\{ x'Rx + u'Qu + \beta\, (Ax + Bu)' \mathcal{D}(P)(Ax + Bu) + \beta [d + \kappa(\theta, P)] \right\} \tag{25}$$
Since constant terms do not affect minimizers, the solution is the same as Eq. (6), leading to
To solve this Bellman equation, we take 𝑃 ̂ to be the positive definite fixed point of ℬ ∘ 𝒟
In addition, we take $\hat d$ as the real number solving $d = \beta [d + \kappa(\theta, \hat P)]$, which is

$$\hat d := \frac{\beta}{1 - \beta}\, \kappa(\theta, \hat P) \tag{26}$$
The robust policy in this stochastic case is the minimizer in Eq. (25), which is once again 𝑢 =
−𝐹 ̂ 𝑥 for 𝐹 ̂ given by Eq. (7)
Substituting the robust policy into Eq. (24) we obtain the worst-case shock distribution:
$$w_{t+1} \sim N\left( \hat K x_t,\; (I - \theta^{-1} C' \hat P C)^{-1} \right)$$
Before turning to implementation, we briefly outline how to compute several other quantities
of interest
Worst-Case Value of a Policy
One thing we will be interested in doing is holding a policy fixed and computing the discounted loss associated with that policy
So let 𝐹 be a given policy and let 𝐽𝐹 (𝑥) be the associated loss, which, by analogy with
Eq. (22), satisfies
Writing 𝐽𝐹 (𝑥) = 𝑥′ 𝑃𝐹 𝑥 + 𝑑𝐹 and applying the same argument used to derive Eq. (23) we get
and
$$d_F := \frac{\beta}{1 - \beta}\, \kappa(\theta, P_F) = \frac{\beta}{1 - \beta}\, \theta \ln[\det(I - \theta^{-1} C' P_F C)^{-1}] \tag{27}$$
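A rough numerical sketch of the map $(\theta, P_F) \mapsto d_F$ in Eq. (27); the matrices C and P below are made-up placeholders, not the model's:

```python
import numpy as np

def kappa(θ, P, C):
    # κ(θ, P) = θ ln[det(I - θ⁻¹ C'PC)⁻¹]; requires I - θ⁻¹ C'PC to be
    # positive definite, i.e. θ large enough
    I = np.eye(C.shape[1])
    M = I - (C.T @ P @ C) / θ
    sign, logdet = np.linalg.slogdet(M)
    assert sign > 0, "θ too small: I - θ⁻¹ C'PC must be positive definite"
    return -θ * logdet  # ln[det(M)⁻¹] = -ln det(M)

def d_F(θ, P, C, β=0.95):
    # d_F = β/(1-β) · κ(θ, P_F), as in Eq. (27)
    return β / (1 - β) * kappa(θ, P, C)

# Made-up placeholder matrices, just to exercise the formula
C = np.array([[0.0], [0.0], [0.05]])
P = np.eye(3)
print(d_F(2.0, P, C))
```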
If you skip ahead to the appendix, you will be able to verify that −𝑃𝐹 is the solution to the
Bellman equation in agent 2’s problem discussed above — we use this in our computations
46.7 Implementation
The QuantEcon.py package provides a class called RBLQ for implementation of robust LQ
optimal control
The code can be found on GitHub
Here is a brief description of the methods of the class
• K_to_F() and F_to_K() solve the decision problems of agent 1 and agent 2 respectively
• evaluate_F() computes the loss and entropy associated with a given policy — see
this discussion
46.8 Application
Let us consider a monopolist similar to this one, but now facing model uncertainty
The inverse demand function is $p_t = a_0 - a_1 y_t + d_t$, where

$$d_{t+1} = \rho d_t + \sigma_d w_{t+1}, \quad \{w_t\} \overset{\text{IID}}{\sim} N(0, 1)$$

The monopolist's period return, net of adjustment costs, is

$$r_t = p_t y_t - \gamma\, \frac{(y_{t+1} - y_t)^2}{2} - c y_t$$

Setting

$$x_t = \begin{bmatrix} 1 \\ y_t \\ d_t \end{bmatrix} \quad \text{and} \quad u_t = y_{t+1} - y_t$$

the problem can be written in LQ form with

$$R = - \begin{bmatrix} 0 & b & 0 \\ b & -a_1 & 1/2 \\ 0 & 1/2 & 0 \end{bmatrix} \quad \text{and} \quad Q = \gamma / 2$$

where $b := (a_0 - c)/2$, and

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \rho \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad C = \begin{bmatrix} 0 \\ 0 \\ \sigma_d \end{bmatrix}$$
The standard normal distribution for $w_t$ is understood as the agent's baseline, with uncertainty parameterized by $\theta$
We compute value-entropy correspondences for two policies
1. The no concern for robustness policy 𝐹0 , which is the ordinary LQ loss minimizer
2. A “moderate” concern for robustness policy 𝐹𝑏 , with 𝜃 = 0.02
The code for producing the graph shown above, with blue being for the robust policy, is as
follows
In [2]: """
"""
import pandas as pd
import numpy as np
from scipy.linalg import eig
import matplotlib.pyplot as plt
import quantecon as qe
# == model parameters == #
a_0 = 100
a_1 = 0.5
ρ = 0.9
σ_d = 0.05
β = 0.95
c = 2
γ = 50.0
θ = 0.002
ac = (a_0 - c) / 2.0

# == Define LQ matrices == #
R = np.array([[0., ac, 0.],
              [ac, -a_1, 0.5],
              [0., 0.5, 0.]])
R = -R  # For minimization
Q = γ / 2
# -------------------------------------------------------------------------- #
# Functions
# -------------------------------------------------------------------------- #

def value_and_entropy(emax, F, bw, grid_size=1000):
    """
    Compute the value function and entropy levels for a θ path
    increasing until it reaches the specified target entropy value.

    Parameters
    ==========
emax: scalar
The target entropy value
F: array_like
The policy function to be evaluated
bw: str
A string specifying whether the implied shock path follows best
or worst assumptions. The only acceptable values are 'best' and
'worst'.
Returns
=======
df: pd.DataFrame
A pandas DataFrame containing the value function and entropy
values up to the emax parameter. The columns are 'value' and
'entropy'.
"""
if bw == 'worst':
θs = 1 / np.linspace(1e-8, 1000, grid_size)
else:
θs = -1 / np.linspace(1e-8, 1000, grid_size)
    df = pd.DataFrame(index=θs, columns=('value', 'entropy'))

    for θ in θs:
df.loc[θ] = evaluate_policy(θ, F)
if df.loc[θ, 'entropy'] >= emax:
break
df = df.dropna(how='any')
return df
# -------------------------------------------------------------------------- #
# Main
# -------------------------------------------------------------------------- #
emax = 1.6e6
fig, ax = plt.subplots()
ax.set_xlim(0, emax)
ax.set_ylabel("Value")
ax.set_xlabel("Entropy")
ax.grid()
class Curve:
plt.show()
Can you explain the different shape of the value-entropy correspondence for the robust policy?
46.9 Appendix
We sketch the proof only of the first claim in this section, which is that, for any given 𝜃,
𝐾(𝐹 ̂ , 𝜃) = 𝐾,̂ where 𝐾̂ is as given in Eq. (8)
This is the content of the next lemma
Lemma. If 𝑃 ̂ is the fixed point of the map ℬ ∘ 𝒟 and 𝐹 ̂ is the robust policy as given in
Eq. (7), then
Proof: As a first step, observe that when 𝐹 = 𝐹 ̂ , the Bellman equation associated with the
LQ problem Eq. (11) – Eq. (12) is
Suppose for a moment that −𝑃 ̂ solves the Bellman equation Eq. (29)
In this case, the policy becomes
Using the definition of 𝒟, we can rewrite the right-hand side more simply as
Although it involves a substantial amount of algebra, it can be shown that the latter is just 𝑃 ̂
(Hint: Use the fact that 𝑃 ̂ = ℬ(𝒟(𝑃 ̂ )))
47
Discrete State Dynamic Programming
47.1 Contents
• Overview 47.2
• Exercises 47.6
• Solutions 47.7
In addition to what’s in Anaconda, this lecture will need the following libraries
47.2 Overview
In this lecture we discuss a family of dynamic programming problems with the following features:
• monetary economics
When a given model is not inherently discrete, it is common to replace it with a discretized
version in order to use discrete DP techniques
This lecture covers
• the theory of dynamic programming in a discrete setting, plus examples and applications
• a powerful set of routines for solving discrete DPs from the QuantEcon code library
The objective of this lecture is to provide a more systematic and theoretical treatment, including algorithms and implementation, while focusing on the discrete case
47.2.2 Code
JIT compilation relies on Numba, which should work seamlessly if you are using Anaconda as
suggested
47.2.3 References
For background reading on dynamic programming and additional applications, see, for example,
• [87]
• [65], section 3.5
• [104]
• [123]
• [112]
• [96]
• EDTC, chapter 5

47.3 Discrete DPs
$$\mathbb{E} \left[ \sum_{t=0}^{\infty} \beta^t r(s_t, a_t) \right] \tag{1}$$
where
Each pair (𝑠𝑡 , 𝑎𝑡 ) pins down transition probabilities 𝑄(𝑠𝑡 , 𝑎𝑡 , 𝑠𝑡+1 ) for the next period state
𝑠𝑡+1
Thus, actions influence not only current rewards but also the future time path of the state
The essence of dynamic programming problems is to trade off current rewards vs favorable
positioning of the future state (modulo randomness)
Examples:
47.3.1 Policies
The most fruitful way to think about solutions to discrete DP problems is to compare policies
In general, a policy is a randomized map from past actions and states to current action
In the setting formalized below, it suffices to consider so-called stationary Markov policies,
which consider only the current state
In particular, a stationary Markov policy is a map 𝜎 from states to actions
It is known that, for any arbitrary policy, there exists a stationary Markov policy that dominates it at least weakly
SA ∶= {(𝑠, 𝑎) ∣ 𝑠 ∈ 𝑆, 𝑎 ∈ 𝐴(𝑠)}
1. A reward function 𝑟 ∶ SA → R
2. A transition probability function 𝑄 ∶ SA → Δ(𝑆), where Δ(𝑆) is the set of probability
distributions over 𝑆
3. A discount factor 𝛽 ∈ [0, 1)
We also use the notation 𝐴 ∶= ⋃𝑠∈𝑆 𝐴(𝑠) = {0, … , 𝑚 − 1} and call this set the action space
A policy is a function 𝜎 ∶ 𝑆 → 𝐴
A policy is called feasible if it satisfies 𝜎(𝑠) ∈ 𝐴(𝑠) for all 𝑠 ∈ 𝑆
Denote the set of all feasible policies by Σ
If a decision-maker uses a policy 𝜎 ∈ Σ, then
Comments
Notice that we’re not really distinguishing between functions from 𝑆 to R and vectors in R𝑛
This is natural because they are in one to one correspondence
Let 𝑣𝜎 (𝑠) denote the discounted sum of expected reward flows from policy 𝜎 when the initial
state is 𝑠
To calculate this quantity we pass the expectation through the sum in Eq. (1) and use Eq. (2)
to get
$$v_\sigma(s) = \sum_{t=0}^{\infty} \beta^t (Q_\sigma^t r_\sigma)(s) \qquad (s \in S)$$
This function is called the policy value function for the policy 𝜎
The optimal value function, or simply value function, is the function $v^* \colon S \to \mathbb{R}$ defined by

$$v^*(s) = \max_{\sigma \in \Sigma} v_\sigma(s) \qquad (s \in S)$$
(We can use max rather than sup here because the domain is a finite set)
A policy 𝜎 ∈ Σ is called optimal if 𝑣𝜎 (𝑠) = 𝑣∗ (𝑠) for all 𝑠 ∈ 𝑆
Given any $w \colon S \to \mathbb{R}$, a policy $\sigma \in \Sigma$ is called $w$-greedy if

$$\sigma(s) \in \operatorname*{arg\,max}_{a \in A(s)} \left\{ r(s, a) + \beta \sum_{s' \in S} w(s') Q(s, a, s') \right\} \qquad (s \in S)$$
As discussed in detail below, optimal policies are precisely those that are 𝑣∗ -greedy
𝑇𝜎 𝑣 = 𝑟𝜎 + 𝛽𝑄𝜎 𝑣
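As a small self-contained check (the two-state rewards and transitions below are made up, not from the lecture), the policy value function can be computed by solving the linear system $v_\sigma = (I - \beta Q_\sigma)^{-1} r_\sigma$, which is exactly the fixed point of $T_\sigma$:

```python
import numpy as np

β = 0.95
# A fixed policy σ induces a reward vector r_σ and a stochastic matrix Q_σ.
# The numbers below are made up for illustration.
r_σ = np.array([1.0, 2.0])
Q_σ = np.array([[0.9, 0.1],
                [0.2, 0.8]])

# v_σ = Σ_t β^t Q_σ^t r_σ = (I - β Q_σ)^{-1} r_σ
v_σ = np.linalg.solve(np.eye(2) - β * Q_σ, r_σ)

# v_σ is the unique fixed point of T_σ v = r_σ + β Q_σ v
assert np.allclose(r_σ + β * Q_σ @ v_σ, v_σ)
print(v_σ)
```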
Now that the theory has been set out, let’s turn to solution methods
The code for solving discrete DPs is available in ddp.py from the QuantEcon.py code library
It implements the three most important solution methods for discrete dynamic programs,
namely
Perhaps the most familiar method for solving all manner of dynamic programs is value function iteration
This algorithm uses the fact that the Bellman operator 𝑇 is a contraction mapping with fixed
point 𝑣∗
Hence, iterative application of 𝑇 to any initial function 𝑣0 ∶ 𝑆 → R converges to 𝑣∗
The details of the algorithm can be found in the appendix
This routine, also known as Howard’s policy improvement algorithm, exploits more closely the
particular structure of a discrete DP problem
Each iteration consists of
1. A policy evaluation step that computes the value 𝑣𝜎 of a policy 𝜎 by solving the linear
equation 𝑣 = 𝑇𝜎 𝑣
2. A policy improvement step that computes a 𝑣𝜎 -greedy policy
In the current setting, policy iteration computes an exact optimal policy in finitely many iterations
Modified policy iteration replaces the policy evaluation step in policy iteration with “partial
policy evaluation”
The latter computes an approximation to the value of a policy $\sigma$ by iterating $T_\sigma$ a specified number of times
This approach can be useful when the state space is very large and the linear system in the
policy evaluation step of policy iteration is correspondingly difficult to solve
The details of the algorithm can be found in the appendix
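A minimal sketch of partial policy evaluation (using the same kind of made-up two-state policy as above): iterating $T_\sigma$ approximates the exact linear solve, with error shrinking at rate $\beta$ per iteration:

```python
import numpy as np

β = 0.95
r_σ = np.array([1.0, 2.0])                  # made-up policy rewards
Q_σ = np.array([[0.9, 0.1],
                [0.2, 0.8]])                # made-up transition matrix

# Exact policy evaluation: solve the linear system v = r_σ + β Q_σ v
v_exact = np.linalg.solve(np.eye(2) - β * Q_σ, r_σ)

# Partial policy evaluation: apply T_σ a fixed number of times instead
v = np.zeros(2)
for _ in range(300):
    v = r_σ + β * Q_σ @ v   # one application of T_σ

# The error contracts at rate β per iteration
assert np.max(np.abs(v - v_exact)) < 1e-4
print(v)
```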
47.5 Example: A Growth Model
$$s' = a + U \quad \text{where} \quad U \sim U[0, \ldots, B]$$

– hence $n = M + B + 1$

$$Q(s, a, s') := \begin{cases} \frac{1}{B+1} & \text{if } a \leq s' \leq a + B \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
This information will be used to create an instance of DiscreteDP by passing the following
information
1. An 𝑛 × 𝑚 reward array 𝑅
2. An 𝑛 × 𝑚 × 𝑛 transition probability array 𝑄
3. A discount factor 𝛽
class SimpleOG:

    def __init__(self, B=10, M=5, α=0.5, β=0.9):
        self.B, self.M, self.α, self.β = B, M, α, β
        self.n = B + M + 1  # Number of states
        self.m = M + 1      # Number of actions
        self.R = np.empty((self.n, self.m))
        self.Q = np.zeros((self.n, self.m, self.n))
        self.populate_Q()
        self.populate_R()

    def u(self, c):
        return c**self.α

    def populate_R(self):
        """
        Populate the R matrix, with R[s, a] = -np.inf for infeasible
        state-action pairs.
        """
        for s in range(self.n):
            for a in range(self.m):
                self.R[s, a] = self.u(s - a) if a <= s else -np.inf
    def populate_Q(self):
        """
        Populate the Q matrix by setting

            Q[s, a, s'] = 1 / (B + 1)  for  a <= s' <= a + B

        """
        for a in range(self.m):
            self.Q[:, a, a:(a + self.B + 1)] = 1.0 / (self.B + 1)
In [6]: dir(results)
(In IPython version 4.0 and above you can also type results. and hit the tab key)
The most important attributes are v, the value function, and σ, the optimal policy
In [7]: results.v
In [8]: results.sigma
Since we’ve used policy iteration, these results will be exact unless we hit the iteration bound
max_iter
Let’s make sure this didn’t happen
In [9]: results.max_iter
Out[9]: 250
In [10]: results.num_iter
Out[10]: 3
Another interesting object is results.mc, which is the controlled chain defined by 𝑄𝜎∗ ,
where 𝜎∗ is the optimal policy
In other words, it gives the dynamics of the state when the agent follows the optimal policy
Since this object is an instance of MarkovChain from QuantEcon.py (see this lecture for more
discussion), we can easily simulate it, compute its stationary distribution and so on
In [11]: results.mc.stationary_distributions
If we look at the bar graph we can see the rightward shift in probability mass
• s_indices and a_indices are arrays of equal length L enumerating all feasible
state-action pairs
• R is an array of length L giving corresponding rewards
• Q is an L x n transition probability array
Here’s how we could set up these objects for the preceding example
B, M, α, β = 10, 5, 0.5, 0.9
n = B + M + 1
m = M + 1

def u(c):
    return c**α
s_indices = []
a_indices = []
Q = []
R = []
b = 1.0 / (B + 1)
for s in range(n):
for a in range(min(M, s) + 1): # All feasible a at this s
s_indices.append(s)
a_indices.append(a)
q = np.zeros(n)
q[a:(a + B + 1)] = b # b on these values, otherwise 0
Q.append(q)
R.append(u(s - a))
For larger problems, you might need to write this code more efficiently by vectorizing or using
Numba
47.6 Exercises
In the stochastic optimal growth lecture, we solve a benchmark model that has an analytical solution, so that we can check our numerical results
The exercise is to replicate this solution using DiscreteDP
47.7 Solutions
47.7.1 Setup
In [15]: α = 0.65
f = lambda k: k**α
u = np.log
β = 0.95
Here we want to solve a finite state version of the continuous state model above
We discretize the state space into a grid of size grid_size=500, from 10−6 to grid_max=2
In [16]: grid_max = 2
grid_size = 500
grid = np.linspace(1e-6, grid_max, grid_size)
We choose the action to be the amount of capital to save for the next period (the state is the
capital stock at the beginning of the period)
Thus the state indices and the action indices are both 0, …, grid_size-1
Action (indexed by) a is feasible at state (indexed by) s if and only if grid[a] < f(grid[s]) (zero consumption is not allowed because of the log utility)
Thus the Bellman equation is:

$$v(k) = \max_{0 < k' < f(k)} \left\{ \ln(f(k) - k') + \beta v(k') \right\}$$
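The array C used in the next cell (consumption at each state-action pair) is not shown above; here is a minimal reconstruction consistent with the feasibility rule grid[a] < f(grid[s]), which should be treated as our sketch rather than the lecture's exact code:

```python
import numpy as np

α = 0.65
f = lambda k: k**α
grid_size = 500
grid = np.linspace(1e-6, 2, grid_size)

# C[s, a] = consumption when the state is grid[s] and the action
# (capital saved for next period) is grid[a]; the pair is feasible
# exactly when C[s, a] > 0
C = f(grid).reshape(grid_size, 1) - grid.reshape(1, grid_size)

s_indices, a_indices = np.where(C > 0)
L = len(s_indices)
print(L)  # number of feasible state-action pairs
```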
# State-action indices
s_indices, a_indices = np.where(C > 0)
print(L)
print(s_indices)
print(a_indices)
118841
[ 0 1 1 … 499 499 499]
[ 0 0 1 … 389 390 391]
(Degenerate) transition probability matrix Q (of shape (L, grid_size)): we choose the scipy.sparse.lil_matrix format, although any format will do (internally it will be converted to the csr format):
(If you are familiar with the data structure of scipy.sparse.csr_matrix, the following is the
most efficient way to create the Q matrix in the current case)
In [21]: ddp = DiscreteDP(R, Q, β, s_indices, a_indices)
Notes
Here we intensively vectorized the operations on arrays to simplify the code
As noted, however, vectorization consumes a lot of memory, and it can do so prohibitively for large grids
Out[22]: 3
Note that sigma contains the indices of the optimal capital stocks to save for the next period. The following translates sigma to the corresponding consumption vector
In [23]: # Optimal consumption in the discrete version
         c = f(grid) - grid[σ]

         # Exact solution of the continuous version
         ab = α * β
         c1 = (np.log(1 - ab) + np.log(ab) * ab / (1 - ab)) / (1 - β)
         c2 = α / (1 - ab)

         def v_star(k):
             return c1 + c2 * np.log(k)

         def c_star(k):
             return (1 - ab) * k**α
Let us compare the solution of the discrete model with that of the original continuous model
In [25]: np.abs(v - v_star(grid)).max()
47.7. SOLUTIONS 799
In [26]: np.abs(v - v_star(grid))[1:].max()
In [27]: np.abs(c - c_star(grid)).max()
In fact, the optimal consumption obtained in the discrete version is not really monotone, but
the decrements are quite small:
In [28]: diff = np.diff(c)
         (diff >= 0).all()
In [29]: dec_ind = np.where(diff < 0)[0]
         len(dec_ind)
In [30]: np.abs(diff[dec_ind]).max()
Out[31]: True
Out[32]: 123
Out[33]: True
Out[34]: 5
Out[35]: True
Speed Comparison
%timeit ddp.solve(method='value_iteration')
%timeit ddp.solve(method='policy_iteration')
%timeit ddp.solve(method='modified_policy_iteration')
As is often the case, policy iteration and modified policy iteration are much faster than value
iteration
plt.show()
We next plot the consumption policies along with the value iteration
sample_size = 25
fig, ax = plt.subplots(figsize=(8,5))
ax.set_xlabel("time")
ax.set_ylabel("capital")
ax.set_ylim(0.10, 0.30)
ax.legend(loc='lower right')
plt.show()
This appendix covers the details of the solution algorithms implemented for DiscreteDP
We will make use of the following notions of approximate optimality:
The DiscreteDP value iteration method implements value function iteration as follows
(While not explicit, in the actual implementation each algorithm is terminated if the number
of iterations reaches iter_max)
Given 𝜀 > 0, provided that 𝑣0 is such that 𝑇 𝑣0 ≥ 𝑣0 , the modified policy iteration algorithm
terminates in a finite number of iterations
It returns an 𝜀/2-approximation of the optimal value function and an 𝜀-optimal policy function (unless iter_max is reached)
See also the documentation for DiscreteDP
Part VII
48
Schelling's Segregation Model
48.1 Contents
• Outline 48.2
• The Model 48.3
• Results 48.4
• Exercises 48.5
• Solutions 48.6
48.2 Outline
In 1969, Thomas C. Schelling developed a simple but striking model of racial segregation [121]
His model studies the dynamics of racially mixed neighborhoods
Like much of Schelling’s work, the model shows how local interactions can lead to surprising
aggregate structure
In particular, it shows that a relatively mild preference for neighbors of similar race can lead in aggregate to the collapse of mixed neighborhoods, and to high levels of segregation
In recognition of this and other research, Schelling was awarded the 2005 Nobel Prize in Economic Sciences (joint with Robert Aumann)
In this lecture, we (in fact you) will build and run a version of Schelling’s model
We will cover a variation of Schelling’s model that is easy to program and captures the main
idea
48.3.1 Set-Up
Suppose we have two types of people: orange people and green people
For the purpose of this lecture, we will assume there are 250 of each type
These agents all live on a single unit square
The location of an agent is just a point (𝑥, 𝑦), where 0 < 𝑥, 𝑦 < 1
48.3.2 Preferences
We will say that an agent is happy if half or more of her 10 nearest neighbors are of the same
type
Here ‘nearest’ is in terms of Euclidean distance
An agent who is not happy is called unhappy
An important point here is that agents are not averse to living in mixed areas
They are perfectly happy if half their neighbors are of the other color
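A sketch of the happiness test just described (the function name and data layout are our own, not the lecture's solution code):

```python
import numpy as np

def is_happy(location, own_type, others, num_neighbors=10, require_same=5):
    """
    True if at least `require_same` of the `num_neighbors` agents closest
    to `location` (in Euclidean distance) share `own_type`.
    `others` is a list of ((x, y), type) pairs for the remaining agents.
    """
    locations = np.array([loc for loc, _ in others])
    types = np.array([t for _, t in others])
    distances = np.linalg.norm(locations - np.array(location), axis=1)
    nearest = np.argsort(distances)[:num_neighbors]
    return int(np.sum(types[nearest] == own_type)) >= require_same

# Eight nearby greens and two distant oranges: a green agent at the
# origin is happy, an orange agent there is not
others = [((0.1 * i, 0.1 * i), 'green') for i in range(8)]
others += [((0.9, 0.9), 'orange'), ((0.95, 0.95), 'orange')]
print(is_happy((0.0, 0.0), 'green', others))   # → True
print(is_happy((0.0, 0.0), 'orange', others))  # → False
```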
48.3.3 Behavior
48.4 Results
Let’s have a look at the results we got when we coded and ran this model
As discussed above, agents are initially mixed randomly together
But after several cycles, they become segregated into distinct regions
In this instance, the program terminated after 4 cycles through the set of agents, indicating
that all agents had reached a state of happiness
What is striking about the pictures is how rapidly racial integration breaks down
This is despite the fact that people in the model don’t actually mind living mixed with the
other type
Even with these preferences, the outcome is a high degree of segregation
48.5 Exercises
48.5.1 Exercise 1
* Data:
* Methods:
48.6 Solutions
48.6.1 Exercise 1
class Agent:
def draw_location(self):
self.location = uniform(0, 1), uniform(0, 1)
# == Main == #
num_of_type_0 = 250
num_of_type_1 = 250
num_neighbors = 10       # Number of agents regarded as neighbors
require_same_type = 5    # Want at least this many neighbors to be same type

# == Create a list of agents == #
agents = [Agent(0) for i in range(num_of_type_0)]
agents.extend(Agent(1) for i in range(num_of_type_1))
count = 1
# == Loop until none wishes to move == #
while True:
print('Entering loop ', count)
plot_distribution(agents, count)
count += 1
no_one_moved = True
for agent in agents:
old_location = agent.location
agent.update(agents)
if agent.location != old_location:
no_one_moved = False
if no_one_moved:
break
print('Converged, terminating.')
Entering loop 1
Entering loop 2
Entering loop 3
Entering loop 4
Converged, terminating.
49
A Lake Model of Employment and Unemployment
49.1 Contents
• Overview 49.2
• Implementation 49.4
• Exercises 49.7
• Solutions 49.8
In addition to what’s in Anaconda, this lecture will need the following libraries
49.2 Overview
It is a good model for interpreting monthly labor department reports on gross and net jobs
created and jobs destroyed
The “lakes” in the model are the pools of employed and unemployed
The “flows” between the lakes are caused by
For the first part of this lecture, the parameters governing transitions into and out of unemployment and employment are exogenous
Later, we’ll determine some of these transition rates endogenously using the McCall search
model
We’ll also use some nifty concepts like ergodicity, which provides a fundamental link between
cross-sectional and long run time series distributions
These concepts will help us build an equilibrium model of ex-ante homogeneous workers
whose different luck generates variations in their ex post experiences
49.2.1 Prerequisites
Before working through what follows, we recommend you read the lecture on finite Markov
chains
You will also need some basic linear algebra and probability
(Here and below, capital letters represent stocks and lowercase letters represent flows)
The value 𝑏(𝐸𝑡 + 𝑈𝑡 ) is the mass of new workers entering the labor force unemployed
The total stock of workers 𝑁𝑡 = 𝐸𝑡 + 𝑈𝑡 evolves as
Letting $X_t := \begin{pmatrix} U_t \\ E_t \end{pmatrix}$, the law of motion for $X$ is

$$X_{t+1} = A X_t \quad \text{where} \quad A := \begin{pmatrix} (1-d)(1-\lambda) + b & (1-d)\alpha + b \\ (1-d)\lambda & (1-d)(1-\alpha) \end{pmatrix}$$
This law tells us how total employment and unemployment evolve over time
$$\begin{pmatrix} U_{t+1}/N_{t+1} \\ E_{t+1}/N_{t+1} \end{pmatrix} = \frac{1}{1+g}\, A \begin{pmatrix} U_t/N_t \\ E_t/N_t \end{pmatrix}$$
Letting

$$x_t := \begin{pmatrix} u_t \\ e_t \end{pmatrix} = \begin{pmatrix} U_t/N_t \\ E_t/N_t \end{pmatrix}$$

we can write this as

$$x_{t+1} = \hat{A} x_t \quad \text{where} \quad \hat{A} := \frac{1}{1+g}\, A$$
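A quick numerical check of this law of motion, using the default parameter values that appear in the implementation below:

```python
import numpy as np

# Default parameters from the LakeModel class below
λ, α, b, d = 0.283, 0.013, 0.0124, 0.00822
g = b - d

A = np.array([[(1 - d) * (1 - λ) + b, (1 - d) * α + b],
              [(1 - d) * λ,           (1 - d) * (1 - α)]])
A_hat = A / (1 + g)

# Iterating x_{t+1} = Â x_t drives the rate vector to its steady state
x = np.array([0.5, 0.5])
for _ in range(1000):
    x = A_hat @ x

assert np.allclose(x, A_hat @ x)    # x̄ = Â x̄
assert np.isclose(x.sum(), 1.0)     # rates still sum to one
print(x)  # (unemployment rate, employment rate) in steady state
```

The columns of $\hat A$ sum to one, so the rates keep summing to one along the whole path.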
49.4 Implementation
class LakeModel:
"""
Solves the lake model and computes dynamics of unemployment stocks and
rates.
Parameters:
------------
λ : scalar
The job finding rate for currently unemployed workers
α : scalar
The dismissal rate for currently employed workers
b : scalar
Entry rate into the labor force
d : scalar
Exit rate from the labor force
"""
def __init__(self, λ=0.283, α=0.013, b=0.0124, d=0.00822):
self._λ, self._α, self._b, self._d = λ, α, b, d
self.compute_derived_values()
def compute_derived_values(self):
# Unpack names to simplify expression
λ, α, b, d = self._λ, self._α, self._b, self._d
        self._g = b - d
        self._A = np.array([[(1-d) * (1-λ) + b, (1 - d) * α + b],
                            [(1-d) * λ,         (1 - d) * (1 - α)]])
        self._A_hat = self._A / (1 + self._g)
@property
def g(self):
return self._g
@property
def A(self):
return self._A
@property
def A_hat(self):
return self._A_hat
@property
def λ(self):
return self._λ
    @λ.setter
    def λ(self, new_value):
        self._λ = new_value
        self.compute_derived_values()
@property
def α(self):
return self._α
@α.setter
def α(self, new_value):
self._α = new_value
self.compute_derived_values()
@property
def b(self):
return self._b
@b.setter
def b(self, new_value):
self._b = new_value
self.compute_derived_values()
@property
def d(self):
return self._d
@d.setter
def d(self, new_value):
self._d = new_value
self.compute_derived_values()
    def rate_steady_state(self, tol=1e-6):
        """
        Finds the steady state of the system x_{t+1} = A_hat x_t

        Returns
--------
xbar : steady state vector of employment and unemployment rates
"""
x = 0.5 * np.ones(2)
error = tol + 1
while error > tol:
new_x = self.A_hat @ x
error = np.max(np.abs(new_x - x))
x = new_x
return x
    def simulate_stock_path(self, X0, T):
        """
        Simulates the sequence of employment and unemployment stocks

        Parameters
------------
X0 : array
Contains initial values (E0, U0)
T : int
Number of periods to simulate
Returns
---------
X : iterator
Contains sequence of employment and unemployment stocks
"""
    def simulate_rate_path(self, x0, T):
        """
        Simulates the sequence of employment and unemployment rates

        Parameters
------------
x0 : array
Contains initial values (e0,u0)
T : int
Number of periods to simulate
Returns
---------
x : iterator
Contains sequence of employment and unemployment rates
"""
x = np.atleast_1d(x0) # Recast as array just in case
for t in range(T):
yield x
x = self.A_hat @ x
As desired, if we create an instance and update a primitive like 𝛼, derived objects like 𝐴 will
also change
In [3]: lm = LakeModel()
lm.α
Out[3]: 0.013
In [4]: lm.A
In [5]: lm.α = 2
lm.A
Let’s run a simulation under the default parameters (see above) starting from 𝑋0 = (12, 138)
lm = LakeModel()
N_0 = 150 # Population
e_0 = 0.92 # Initial employment rate
u_0 = 1 - e_0 # Initial unemployment rate
T = 50 # Simulation length
axes[2].plot(X_path.sum(1), lw=2)
axes[2].set_title('Labor force')
for ax in axes:
ax.grid()
plt.tight_layout()
plt.show()
The aggregates 𝐸𝑡 and 𝑈𝑡 don’t converge because their sum 𝐸𝑡 + 𝑈𝑡 grows at rate 𝑔
On the other hand, the vector of employment and unemployment rates 𝑥𝑡 can be in a steady
state 𝑥̄ if there exists an 𝑥̄ such that
• $\bar{x} = \hat{A} \bar{x}$
• the components satisfy $\bar{e} + \bar{u} = 1$
This equation tells us that a steady state level 𝑥̄ is an eigenvector of 𝐴 ̂ associated with a unit
eigenvalue
We also have $x_t \to \bar{x}$ as $t \to \infty$ provided that the remaining eigenvalue of $\hat{A}$ has modulus less than 1
This is the case for our default parameters:
In [7]: lm = LakeModel()
e, f = np.linalg.eigvals(lm.A_hat)
abs(e), abs(f)
Let’s look at the convergence of the unemployment and employment rate to steady state lev-
els (dashed red line)
In [8]: lm = LakeModel()
e_0 = 0.92 # Initial employment rate
xbar = lm.rate_steady_state()
plt.tight_layout()
plt.show()
An individual worker’s employment dynamics are governed by a finite state Markov process
The worker can be in one of two states:
• 𝑠𝑡 = 0 means unemployed
• 𝑠𝑡 = 1 means employed
$$P = \begin{pmatrix} 1 - \lambda & \lambda \\ \alpha & 1 - \alpha \end{pmatrix}$$
Let 𝜓𝑡 denote the marginal distribution over employment/unemployment states for the
worker at time 𝑡
As usual, we regard it as a row vector
We know from an earlier discussion that 𝜓𝑡 follows the law of motion
𝜓𝑡+1 = 𝜓𝑡 𝑃
We also know from the lecture on finite Markov chains that if 𝛼 ∈ (0, 1) and 𝜆 ∈ (0, 1), then
𝑃 has a unique stationary distribution, denoted here by 𝜓∗
The unique stationary distribution satisfies
$$\psi^*[0] = \frac{\alpha}{\alpha + \lambda}$$
Not surprisingly, probability mass on the unemployment state increases with the dismissal
rate and falls with the job finding rate
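We can verify this closed form numerically by computing the stationary distribution as the left eigenvector of $P$ for eigenvalue 1 (a small sketch, using the default $\alpha$ and $\lambda$ from above):

```python
import numpy as np

α, λ = 0.013, 0.283   # default dismissal and job finding rates from above

P = np.array([[1 - λ, λ],
              [α,     1 - α]])

# ψ* solves ψ P = ψ: a left eigenvector of P for eigenvalue 1,
# normalized to sum to one
eigvals, eigvecs = np.linalg.eig(P.T)
i = np.argmin(np.abs(eigvals - 1))
ψ_star = np.real(eigvecs[:, i])
ψ_star = ψ_star / ψ_star.sum()

# Agrees with the closed form ψ*[0] = α / (α + λ)
assert np.isclose(ψ_star[0], α / (α + λ))
print(ψ_star)
```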
49.5.1 Ergodicity
$$\bar{s}_{u,T} := \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}\{s_t = 0\}$$

and

$$\bar{s}_{e,T} := \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}\{s_t = 1\}$$

$$\lim_{T \to \infty} \bar{s}_{u,T} = \psi^*[0] \quad \text{and} \quad \lim_{T \to \infty} \bar{s}_{e,T} = \psi^*[1]$$
How long does it take for time series sample averages to converge to cross-sectional averages?
We can use QuantEcon.py’s MarkovChain class to investigate this
Let’s plot the path of the sample averages over 5,000 periods
lm = LakeModel(d=0, b=0)
T = 5000 # Simulation length
α, λ = lm.α, lm.λ
P = [[1 - λ, λ],
[ α, 1 - α]]
mc = MarkovChain(P)
xbar = lm.rate_steady_state()
plt.tight_layout()
plt.show()
The most important thing to remember about the model is that optimal decisions are characterized by a reservation wage $\bar{w}$
• If the wage offer 𝑤 in hand is greater than or equal to 𝑤,̄ then the worker accepts
• Otherwise, the worker rejects
As we saw in our discussion of the model, the reservation wage depends on the wage offer distribution and the parameters
Suppose that all workers inside a lake model behave according to the McCall search model
The exogenous probability of leaving employment remains 𝛼
But their optimal decision rules determine the probability 𝜆 of leaving unemployment
This is now
$$\lambda = \gamma\, \mathbb{P}\{w_t \geq \bar{w}\} = \gamma \sum_{w' \geq \bar{w}} p(w') \tag{1}$$
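With a discrete wage offer distribution, Eq. (1) is a single masked sum. A sketch with made-up numbers (the wage grid, probabilities, and reservation wage below are placeholders, not values from the lecture):

```python
import numpy as np

γ = 0.7                              # job offer arrival rate (made up)
w_vec = np.linspace(10, 20, 60)      # wage grid (made up)
p_vec = np.ones(60) / 60             # uniform offer probabilities (made up)
w_bar = 15.0                         # hypothetical reservation wage

# λ = γ P{w_t ≥ w̄} = γ Σ_{w' ≥ w̄} p(w')
λ = γ * p_vec[w_vec >= w_bar].sum()
print(λ)
```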
We can use the McCall search version of the Lake Model to find an optimal level of unemployment insurance
We assume that the government sets unemployment compensation 𝑐
The government imposes a lump-sum tax $\tau$ sufficient to finance total unemployment payments
To attain a balanced budget at a steady state, taxes, the steady state unemployment rate 𝑢,
and the unemployment compensation rate must satisfy
𝜏 = 𝑢𝑐
𝜏 = 𝑢(𝑐, 𝜏 )𝑐
𝑊 ∶= 𝑒 E[𝑉 | employed] + 𝑢 𝑈
where the notation 𝑉 and 𝑈 is as defined in the McCall search model lecture
The wage offer distribution will be a discretized version of the lognormal distribution
𝐿𝑁 (log(20), 1), as shown in the next figure
• 𝑏 = 0.0124
• 𝑑 = 0.00822
• 𝛼 = 0.013
We will make use of code we wrote in the McCall model lecture, embedded below for convenience
The first piece of code, repeated below, implements value function iteration
@jit
def u(c, σ):
if c > 0:
return (c**(1 - σ) - 1) / (1 - σ)
else:
return -10e6
class McCallModel:
"""
Stores the parameters and functions associated with a given model.
"""
def __init__(self,
α=0.2, # Job separation rate
β=0.98, # Discount rate
γ=0.7, # Job offer rate
c=6.0, # Unemployment compensation
σ=2.0, # Utility parameter
w_vec=None, # Possible wage values
p_vec=None): # Probabilities over w_vec
# Add a default wage vector and probabilities over the vector using
# the beta-binomial distribution
if w_vec is None:
n = 60 # number of possible outcomes for wage
self.w_vec = np.linspace(10, 20, n) # wages between 10 and 20
a, b = 600, 400 # shape parameters
dist = BetaBinomial(n-1, a, b)
self.p_vec = dist.pdf()
else:
self.w_vec = w_vec
self.p_vec = p_vec
@jit
def _update_bellman(α, β, γ, c, σ, w_vec, p_vec, V, V_new, U):
"""
A jitted function to update the Bellman equations. Note that V_new is
modified in place (i.e, modified by this function). The new value of U is
returned.
"""
for w_idx, w in enumerate(w_vec):
# w_idx indexes the vector of possible wages
V_new[w_idx] = u(w, σ) + β * ((1 - α) * V[w_idx] + α * U)
U_new = u(c, σ) + β * (1 - γ) * U + \
β * γ * np.sum(np.maximum(U, V) * p_vec)
return U_new
def solve_mccall_model(mcm, tol=1e-5, max_iter=2000):
    """
    Iterates to convergence on the Bellman equations.

    Parameters
    ----------
    mcm : an instance of McCallModel
    tol : float
        error tolerance
    max_iter : int
        the maximum number of iterations
    """
    V = np.ones(len(mcm.w_vec))   # Initial guess of V
    V_new = np.empty_like(V)      # To store updates to V
    U, i, error = 1, 0, tol + 1
    while error > tol and i < max_iter:
        U_new = _update_bellman(mcm.α, mcm.β, mcm.γ, mcm.c, mcm.σ,
                                mcm.w_vec, mcm.p_vec, V, V_new, U)
        error = np.max(np.abs(V_new - V)) + abs(U_new - U)
        V[:], U = V_new, U_new
        i += 1
    return V, U
The second piece of code repeated from the McCall model lecture is used to compute the reservation wage
def compute_reservation_wage(mcm, return_values=False):
    """
    Computes the reservation wage of an instance of the McCall model
    by finding the smallest w such that V(w) >= U.

    If V(w) > U for all w, then the reservation wage w_bar is set to
    the lowest wage in mcm.w_vec.

    Parameters
    ----------
    mcm : an instance of McCallModel
    return_values : bool (optional, default=False)
        Return the value functions as well

    Returns
    -------
    w_bar : scalar
        The reservation wage
    """
V, U = solve_mccall_model(mcm)
w_idx = np.searchsorted(V - U, 0)
if w_idx == len(V):
w_bar = np.inf
else:
w_bar = mcm.w_vec[w_idx]
if return_values == False:
return w_bar
else:
return w_bar, V, U
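The np.searchsorted call above exploits the fact that V − U is nondecreasing in the wage: it returns the first index at which 0 could be inserted while keeping the array sorted, i.e., the index of the lowest wage at which V − U crosses zero. A standalone illustration:

```python
import numpy as np

# A stand-in for V - U, which is increasing in the wage
diff = np.array([-3.0, -1.2, 0.5, 2.0])

# First index at which 0 could be inserted keeping the array sorted,
# i.e. the index of the lowest wage with V - U >= 0
idx = np.searchsorted(diff, 0)
print(idx)   # 2
```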
Now let’s compute and plot welfare, employment, unemployment, and tax revenue as a function of the unemployment compensation rate
"""
mcm = McCallModel(α=α_q,
β=β,
γ=γ,
c=c-τ, # post tax compensation
σ=σ,
w_vec=w_vec-τ, # post tax wages
p_vec=p_vec)
"""
w_bar, λ, V, U = compute_optimal_quantities(c, τ)
return e, u, welfare
from scipy.optimize import brentq

def find_balanced_budget_tax(c):
    """
    Find the tax level that will induce a balanced budget.
    """
    def steady_state_budget(t):
        e, u, w = compute_steady_state_quantities(c, t)
        return t - u * c

    τ = brentq(steady_state_budget, 0.0, 0.9 * c)
    return τ
# Levels of unemployment insurance we wish to study (illustrative grid)
c_vec = np.linspace(5, 140, 60)

tax_vec = []
unempl_vec = []
empl_vec = []
welfare_vec = []

for c in c_vec:
t = find_balanced_budget_tax(c)
e_rate, u_rate, welfare = compute_steady_state_quantities(c, t)
tax_vec.append(t)
unempl_vec.append(u_rate)
empl_vec.append(e_rate)
welfare_vec.append(welfare)
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
plots = [unempl_vec, empl_vec, tax_vec, welfare_vec]
titles = ['Unemployment', 'Employment', 'Tax', 'Welfare']
for ax, plot, title in zip(axes.flatten(), plots, titles):
    ax.plot(c_vec, plot, lw=2, alpha=0.7)
    ax.set_title(title)

plt.tight_layout()
plt.show()
49.7 Exercises
49.7.1 Exercise 1
Consider an economy with an initial stock of workers 𝑁0 = 100 at the steady state level of
employment in the baseline parameterization
• 𝛼 = 0.013
• 𝜆 = 0.283
• 𝑏 = 0.0124
• 𝑑 = 0.00822
49.7.2 Exercise 2
Consider an economy with an initial stock of workers 𝑁0 = 100 at the steady state level of
employment in the baseline parameterization
Suppose that for 20 periods the birth rate was temporarily high (𝑏 = 0.025) and then returned to its original level
Plot the transition dynamics of the unemployment and employment stocks for 50 periods
Plot the transition dynamics for the rates
How long does the economy take to return to its original steady state?
49.8 Solutions
49.8.1 Exercise 1
We begin by constructing the class containing the default parameters and assigning the
steady state values to x0
In [13]: lm = LakeModel()
x0 = lm.rate_steady_state()
print(f"Initial Steady State: {x0}")
In [14]: N0 = 100
T = 50

# Simulate stocks, starting from steady state rates scaled up to N0 workers
# (simulate_stock_path is assumed to be defined on LakeModel earlier in the lecture)
X_path = np.vstack(lm.simulate_stock_path(x0 * N0, T))

fig, axes = plt.subplots(3, 1, figsize=(10, 8))
axes[0].plot(X_path[:, 0])
axes[0].set_title('Unemployment')
axes[1].plot(X_path[:, 1])
axes[1].set_title('Employment')
axes[2].plot(X_path.sum(1))
axes[2].set_title('Labor force')
for ax in axes:
ax.grid()
plt.tight_layout()
plt.show()
plt.tight_layout()
plt.show()
We see that it takes 20 periods for the economy to converge to its new steady state levels
49.8.2 Exercise 2
This next exercise has the economy experiencing a boom in entrances to the labor market and
then later returning to the original levels
For 20 periods the economy has a new entry rate into the labor market
Let’s start off at the baseline parameterization and record the steady state
In [18]: lm = LakeModel()
x0 = lm.rate_steady_state()
Now we reset 𝑏 to the original value and then, using the state after 20 periods for the new
initial conditions, we simulate for the additional 30 periods
axes[0].plot(X_path[:, 0])
axes[0].set_title('Unemployment')
axes[1].plot(X_path[:, 1])
axes[1].set_title('Employment')
axes[2].plot(X_path.sum(1))
axes[2].set_title('Labor force')
for ax in axes:
ax.grid()
plt.tight_layout()
plt.show()
plt.tight_layout()
plt.show()
50 Rational Expectations Equilibrium
50.1 Contents
• Overview 50.2
• Defining Rational Expectations Equilibrium 50.3
• Computation of an Equilibrium 50.4
• Exercises 50.5
• Solutions 50.6
In addition to what’s in Anaconda, this lecture will need the following libraries
50.2 Overview
Equality between a perceived and an actual law of motion for endogenous market-wide objects captures in a nutshell what the rational expectations equilibrium concept is all about
Finally, we will learn about the important “Big 𝐾, little 𝑘” trick, a modeling device widely
used in macroeconomics
Except that for us, instead of “Big 𝐾” it will be “Big 𝑌 ” and instead of “little 𝑘” it will be “little 𝑦”
This widely used method applies in contexts in which a “representative firm” or agent is a
“price taker” operating within a competitive equilibrium
We want to impose that
• The representative firm or individual takes aggregate 𝑌 as given when it chooses individual 𝑦, but …
• At the end of the day, 𝑌 = 𝑦, so that the representative firm is indeed representative
• Taking 𝑌 as beyond control when posing the choice problem of the agent who chooses 𝑦; but …
• Imposing 𝑌 = 𝑦 after having solved the individual’s optimization problem
Please watch for how this strategy is applied as the lecture unfolds
We begin by applying the Big 𝑌 , little 𝑦 trick in a very simple static context
A Simple Static Example of the Big Y, Little y Trick
Consider a static model in which a collection of 𝑛 firms produce a homogeneous good that is
sold in a competitive market
Each of these 𝑛 firms sells output 𝑦
The price 𝑝 of the good lies on an inverse demand curve
𝑝 = 𝑎 0 − 𝑎1 𝑌 (1)
where
• 𝑎𝑖 > 0 for 𝑖 = 0, 1
• 𝑌 = 𝑛𝑦 is the market-wide level of output
Using Eq. (1), the representative firm chooses 𝑦 to maximize 𝑝𝑦 − 𝑐1 𝑦 − 𝑐2 𝑦²/2, taking 𝑌 as given; the first-order condition is

𝑎0 − 𝑎1 𝑌 − 𝑐1 − 𝑐2 𝑦 = 0    (3)
At this point, but not before, we substitute 𝑌 = 𝑛𝑦 into Eq. (3) to obtain the following linear equation, to be solved for the competitive equilibrium output of the representative firm:

𝑎0 − 𝑎1 𝑛𝑦 − 𝑐1 − 𝑐2 𝑦 = 0    (4)
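To make the two-step logic concrete, here is the calculation with purely illustrative parameter values: the first-order condition is solved for 𝑦 only after 𝑌 = 𝑛𝑦 has been imposed, giving 𝑦 = (𝑎0 − 𝑐1 )/(𝑎1 𝑛 + 𝑐2 ):

```python
# Illustrative parameter values for the static example
a0, a1 = 100.0, 0.05
c1, c2 = 5.0, 2.0
n = 10

# Solve a0 - a1*n*y - c1 - c2*y = 0 for the representative firm's output y
y = (a0 - c1) / (a1 * n + c2)
Y = n * y               # market-wide output
p = a0 - a1 * Y         # implied market price

print(y, Y, p)
```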
• [89]
• [118], chapter XIV
• [87], chapter 7
50.3 Defining Rational Expectations Equilibrium

Our first illustration of a rational expectations equilibrium involves a market with 𝑛 firms, each of which seeks to maximize the discounted present value of profits in the face of adjustment costs
The adjustment costs induce the firms to make gradual adjustments, which in turn requires
consideration of future prices
Individual firms understand that, via the inverse demand curve, the price is determined by
the amounts supplied by other firms
Hence each firm wants to forecast future total industry supplies
In our context, a forecast is generated by a belief about the law of motion for the aggregate
state
Rational expectations equilibrium prevails when this belief coincides with the actual law of
motion generated by production choices induced by this belief
We formulate a rational expectations equilibrium in terms of a fixed point of an operator that
maps beliefs into optimal beliefs
𝑝𝑡 = 𝑎0 − 𝑎1 𝑌𝑡 (5)
where
• 𝑎𝑖 > 0 for 𝑖 = 0, 1
• 𝑌𝑡 = 𝑛𝑦𝑡 is the market-wide level of output
Each firm chooses a production plan to maximize the present value of profits

∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑟𝑡    (6)

where

𝑟𝑡 ∶= 𝑝𝑡 𝑦𝑡 − 𝛾(𝑦𝑡+1 − 𝑦𝑡 )²/2,    𝑦0 given    (7)
Regarding timing, the firm observes 𝑝𝑡 and 𝑦𝑡 when it chooses 𝑦𝑡+1 at time 𝑡
To state the firm’s optimization problem completely requires that we specify dynamics for all
state variables
This includes ones that the firm cares about but does not control like 𝑝𝑡
We turn to this problem now
Prices and Aggregate Output
In view of Eq. (5), the firm’s incentive to forecast the market price translates into an incentive to forecast aggregate output 𝑌𝑡
Aggregate output depends on the choices of other firms
We assume that 𝑛 is such a large number that the output of any single firm has a negligible
effect on aggregate output
That justifies firms in regarding their forecasts of aggregate output as being unaffected by
their own output decisions
The Firm’s Beliefs
We suppose the firm believes that market-wide output 𝑌𝑡 follows the law of motion

𝑌𝑡+1 = 𝐻(𝑌𝑡 )    (8)

where 𝑌0 is a known initial condition
𝑣(𝑦, 𝑌 ) = max_{𝑦′ } {𝑎0 𝑦 − 𝑎1 𝑦𝑌 − 𝛾(𝑦′ − 𝑦)²/2 + 𝛽𝑣(𝑦′ , 𝐻(𝑌 ))}    (9)

Let ℎ(𝑦, 𝑌 ) denote the firm’s optimal policy function, so that

𝑦𝑡+1 = ℎ(𝑦𝑡 , 𝑌𝑡 )    (10)

where

ℎ(𝑦, 𝑌 ) ∶= arg max_{𝑦′ } {𝑎0 𝑦 − 𝑎1 𝑦𝑌 − 𝛾(𝑦′ − 𝑦)²/2 + 𝛽𝑣(𝑦′ , 𝐻(𝑌 ))}    (11)
The first-order condition for the maximization on the right side of Eq. (9) is

−𝛾(𝑦′ − 𝑦) + 𝛽𝑣𝑦 (𝑦′ , 𝐻(𝑌 )) = 0    (12)

while the envelope condition gives

𝑣𝑦 (𝑦, 𝑌 ) = 𝑎0 − 𝑎1 𝑌 + 𝛾(𝑦′ − 𝑦)
Substituting this equation into Eq. (12) gives the Euler equation

−𝛾(𝑦𝑡+1 − 𝑦𝑡 ) + 𝛽[𝑎0 − 𝑎1 𝑌𝑡+1 + 𝛾(𝑦𝑡+2 − 𝑦𝑡+1 )] = 0    (13)
The firm optimally sets an output path that satisfies Eq. (13), taking Eq. (8) as given, and subject to the initial condition for 𝑦0 and

lim_{𝑡→∞} 𝛽^𝑡 𝑦𝑡 𝑣𝑦 (𝑦𝑡 , 𝑌𝑡 ) = 0

This last condition is called the transversality condition, and acts as a first-order necessary condition “at infinity”
The firm’s decision rule solves the difference equation Eq. (13) subject to the given initial
condition 𝑦0 and the transversality condition
Note that solving the Bellman equation Eq. (9) for 𝑣 and then ℎ in Eq. (11) yields a decision rule that automatically imposes both the Euler equation Eq. (13) and the transversality condition
The Actual Law of Motion for Output
As we’ve seen, a given belief translates into a particular decision rule ℎ
Recalling that 𝑌𝑡 = 𝑛𝑦𝑡 , the actual law of motion for market-wide output is then

𝑌𝑡+1 = 𝑛ℎ(𝑌𝑡 /𝑛, 𝑌𝑡 )    (14)
Thus, when firms believe that the law of motion for market-wide output is Eq. (8), their optimizing behavior makes the actual law of motion be Eq. (14)
A rational expectations equilibrium or recursive competitive equilibrium of the model with adjustment costs is a decision rule ℎ and an aggregate law of motion 𝐻 such that

• Given belief 𝐻, the map ℎ is the firm’s optimal policy function
• The law of motion 𝐻 satisfies 𝐻(𝑌 ) = 𝑛ℎ(𝑌 /𝑛, 𝑌 ) for all 𝑌
Thus, a rational expectations equilibrium equates the perceived and actual laws of motion
Eq. (8) and Eq. (14)
Fixed Point Characterization
As we’ve seen, the firm’s optimum problem induces a mapping Φ from a perceived law of motion 𝐻 for market-wide output to an actual law of motion Φ(𝐻)
The mapping Φ is the composition of two operations, taking a perceived law of motion into a
decision rule via Eq. (9)–Eq. (11), and a decision rule into an actual law via Eq. (14)
The 𝐻 component of a rational expectations equilibrium is a fixed point of Φ
50.4 Computation of an Equilibrium
Now let’s consider the problem of computing the rational expectations equilibrium
Readers accustomed to dynamic programming arguments might try to address this problem
by choosing some guess 𝐻0 for the aggregate law of motion and then iterating with Φ
Unfortunately, the mapping Φ is not a contraction
In particular, there is no guarantee that direct iterations on Φ converge [1]
Fortunately, there is another method that works here
The method exploits a general connection between equilibrium and Pareto optimality expressed in the fundamental theorems of welfare economics (see, e.g., [93])
Lucas and Prescott [89] used this method to construct a rational expectations equilibrium
The details follow
Our plan of attack is to match the Euler equations of the market problem with those for a
single-agent choice problem
As we’ll see, this planning problem can be solved by LQ control (linear regulator)
The optimal quantities from the planning problem are rational expectations equilibrium
quantities
The rational expectations equilibrium price can be obtained as a shadow price in the planning
problem
For convenience, in this section, we set 𝑛 = 1
We first compute a sum of consumer and producer surplus at time 𝑡
𝑠(𝑌𝑡 , 𝑌𝑡+1 ) ∶= ∫_0^{𝑌𝑡 } (𝑎0 − 𝑎1 𝑥) 𝑑𝑥 − 𝛾(𝑌𝑡+1 − 𝑌𝑡 )²/2    (15)
The first term is the area under the demand curve, while the second measures the social costs
of changing output
The planning problem is to choose a production plan {𝑌𝑡 } to maximize
∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑠(𝑌𝑡 , 𝑌𝑡+1 )
Evaluating the integral in Eq. (15) yields the quadratic form 𝑎0 𝑌𝑡 − 𝑎1 𝑌𝑡2 /2
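This evaluation is easy to confirm numerically (a trapezoidal sum is exact for the linear integrand; parameter values are illustrative):

```python
import numpy as np

a0, a1 = 100.0, 0.05    # illustrative demand parameters
Y = 50.0

# Trapezoidal approximation of the area under the inverse demand curve on [0, Y]
x = np.linspace(0.0, Y, 10_001)
f = a0 - a1 * x
numeric = np.sum((f[:-1] + f[1:]) / 2 * np.diff(x))

closed_form = a0 * Y - a1 * Y**2 / 2
print(numeric, closed_form)
```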
𝑉 (𝑌 ) = max_{𝑌 ′ } {𝑎0 𝑌 − (𝑎1 /2)𝑌 ² − 𝛾(𝑌 ′ − 𝑌 )²/2 + 𝛽𝑉 (𝑌 ′ )}    (16)
−𝛾(𝑌 ′ − 𝑌 ) + 𝛽𝑉 ′ (𝑌 ′ ) = 0 (17)
𝑉 ′ (𝑌 ) = 𝑎0 − 𝑎1 𝑌 + 𝛾(𝑌 ′ − 𝑌 )
Substituting this into Eq. (17) gives the Euler equation

−𝛾(𝑌𝑡+1 − 𝑌𝑡 ) + 𝛽[𝑎0 − 𝑎1 𝑌𝑡+1 + 𝛾(𝑌𝑡+2 − 𝑌𝑡+1 )] = 0    (18)

which, when 𝑛 = 1 so that 𝑌𝑡 = 𝑦𝑡 , is identical to the firm’s Euler equation Eq. (13)
If it is appropriate to apply the same terminal conditions for these two difference equations,
which it is, then we have verified that a solution of the planning problem is also a rational
expectations equilibrium quantity sequence
It follows that for this example we can compute equilibrium quantities by forming the optimal
linear regulator problem corresponding to the Bellman equation Eq. (16)
The optimal policy function for the planning problem is the aggregate law of motion 𝐻 that
the representative firm faces within a rational expectations equilibrium
Structure of the Law of Motion
As you are asked to show in the exercises, the fact that the planner’s problem is an LQ problem implies an optimal policy — and hence aggregate law of motion — taking the form
𝑌𝑡+1 = 𝜅0 + 𝜅1 𝑌𝑡 (19)
Now that we know the aggregate law of motion is linear, we can see from the firm’s Bellman
equation Eq. (9) that the firm’s problem can also be framed as an LQ problem
As you’re asked to show in the exercises, the LQ formulation of the firm’s problem implies a
law of motion that looks as follows
𝑦𝑡+1 = ℎ0 + ℎ1 𝑦𝑡 + ℎ2 𝑌𝑡 (20)
50.5 Exercises
50.5.1 Exercise 1
Express the solution of the firm’s problem in the form Eq. (20) and give the values for each
ℎ𝑗
If there were 𝑛 identical competitive firms all behaving according to Eq. (20), what would Eq. (20) imply for the actual law of motion Eq. (8) for market supply?
50.5.2 Exercise 2
Consider the following 𝜅0 , 𝜅1 pairs as candidates for the aggregate law of motion component of a rational expectations equilibrium (see Eq. (19))

• (94.0886298678, 0.923409232937)
• (93.2119845412, 0.984323478873)
• (95.0818452486, 0.952459076301)

Extending the program that you wrote for exercise 1, determine which, if any, satisfy the definition of a rational expectations equilibrium
Describe an iterative algorithm that uses the program that you wrote for exercise 1 to compute a rational expectations equilibrium
(You are not being asked actually to use the algorithm you are suggesting)
50.5.3 Exercise 3

Recast the planner’s problem as an LQ problem and, using the parameter values from exercise 1, solve it to obtain the optimal policy 𝑌𝑡+1 = 𝜅0 + 𝜅1 𝑌𝑡
50.5.4 Exercise 4
A monopolist faces the industry demand curve Eq. (5) and chooses {𝑌𝑡 } to maximize ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑟𝑡 where

𝑟𝑡 = 𝑝𝑡 𝑌𝑡 − 𝛾(𝑌𝑡+1 − 𝑌𝑡 )²/2
Formulate this problem as an LQ problem and show that the optimal policy has the form

𝑌𝑡+1 = 𝑚0 + 𝑚1 𝑌𝑡

Compare the resulting 𝑚0 , 𝑚1 with the equilibrium values 𝜅0 , 𝜅1
50.6 Solutions
In [2]: import numpy as np
import matplotlib.pyplot as plt
from quantecon import LQ
%matplotlib inline
50.6.1 Exercise 1
To map a problem into a discounted optimal linear control problem, we need to define
𝑥𝑡 = [𝑦𝑡 𝑌𝑡 1]′ ,    𝑢𝑡 = 𝑦𝑡+1 − 𝑦𝑡
For 𝐴, 𝐵, 𝑄, 𝑅 we set

      ⎡1   0   0 ⎤        ⎡1⎤        ⎡   0     𝑎1 /2  −𝑎0 /2⎤
𝐴 = ⎢0   𝜅1  𝜅0 ⎥ ,  𝐵 = ⎢0⎥ ,  𝑅 = ⎢ 𝑎1 /2    0       0  ⎥ ,  𝑄 = 𝛾/2
      ⎣0   0   1 ⎦        ⎣0⎦        ⎣−𝑎0 /2    0       0  ⎦
We’ll use the module lqcontrol.py to solve the firm’s problem at the stated parameter
values
This will return an LQ policy 𝐹 with the interpretation 𝑢𝑡 = −𝐹 𝑥𝑡 , or
𝑦𝑡+1 − 𝑦𝑡 = −𝐹0 𝑦𝑡 − 𝐹1 𝑌𝑡 − 𝐹2

Matching this with the form Eq. (20) gives

ℎ0 = −𝐹2 , ℎ1 = 1 − 𝐹0 , ℎ2 = −𝐹1
a0 = 100
a1 = 0.05
β = 0.95
γ = 10.0
# == Beliefs == #
κ0 = 95.5
κ1 = 0.95

# == Formulate the LQ problem == #
A = np.array([[1, 0, 0], [0, κ1, κ0], [0, 0, 1]])
B = np.array([[1], [0], [0]])
R = np.array([[0, a1/2, -a0/2], [a1/2, 0, 0], [-a0/2, 0, 0]])
Q = 0.5 * γ

lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()
F = F.flatten()
out1 = f"F = [{F[0]:.3f}, {F[1]:.3f}, {F[2]:.3f}]"
h0, h1, h2 = -F[2], 1 - F[0], -F[1]
out2 = f"(h0, h1, h2) = ({h0:.3f}, {h1:.3f}, {h2:.3f})"
print(out1)
print(out2)
For the case 𝑛 > 1, recall that 𝑌𝑡 = 𝑛𝑦𝑡 , which, combined with the previous equation, yields

𝑌𝑡+1 = 𝑛(ℎ0 + ℎ1 𝑦𝑡 + ℎ2 𝑌𝑡 ) = 𝑛ℎ0 + (ℎ1 + 𝑛ℎ2 )𝑌𝑡
50.6.2 Exercise 2
To determine whether a 𝜅0 , 𝜅1 pair forms the aggregate law of motion component of a rational expectations equilibrium, we can proceed as follows:

• Determine the corresponding firm law of motion 𝑦𝑡+1 = ℎ0 + ℎ1 𝑦𝑡 + ℎ2 𝑌𝑡
• Test whether the associated aggregate law of motion coincides with the proposed law 𝑌𝑡+1 = 𝜅0 + 𝜅1 𝑌𝑡

In the second step, we can use 𝑌𝑡 = 𝑛𝑦𝑡 = 𝑦𝑡 (since 𝑛 = 1), so that 𝑌𝑡+1 = 𝑛ℎ(𝑌𝑡 /𝑛, 𝑌𝑡 ) becomes

𝑌𝑡+1 = ℎ(𝑌𝑡 , 𝑌𝑡 ) = ℎ0 + (ℎ1 + ℎ2 )𝑌𝑡
The output tells us that the answer is pair (iii), which implies (ℎ0 , ℎ1 , ℎ2 ) = (95.0819, 1.0000, −0.0475)
(Notice we use np.allclose to test equality of floating-point numbers, since exact equality
is too strict)
Regarding the iterative algorithm, one could loop from a given (𝜅0 , 𝜅1 ) pair to the associated
firm law and then to a new (𝜅0 , 𝜅1 ) pair
This amounts to implementing the operator Φ described in the lecture
(There is in general no guarantee that this iterative process will converge to a rational expectations equilibrium)
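A sketch of that iteration, written against a hypothetical helper firm_h(κ0, κ1) that solves the firm's LQ problem from exercise 1 and returns (h0, h1, h2):

```python
def iterate_phi(firm_h, κ0, κ1, tol=1e-6, max_iter=500):
    """Iterate κ -> Φ(κ) until the pair stops changing (no convergence guarantee)."""
    for _ in range(max_iter):
        h0, h1, h2 = firm_h(κ0, κ1)      # firm's law: y' = h0 + h1*y + h2*Y
        new_κ0, new_κ1 = h0, h1 + h2     # aggregate law with n = 1: Y' = h0 + (h1 + h2)Y
        if abs(new_κ0 - κ0) + abs(new_κ1 - κ1) < tol:
            return new_κ0, new_κ1
        κ0, κ1 = new_κ0, new_κ1
    return κ0, κ1
```

Each pass applies the operator Φ once; as noted above, nothing guarantees that this process converges.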
50.6.3 Exercise 3
For the planner’s problem we take

𝑥𝑡 = [𝑌𝑡 1]′ ,    𝑢𝑡 = 𝑌𝑡+1 − 𝑌𝑡

and

      ⎡1  0⎤        ⎡1⎤        ⎡ 𝑎1 /2   −𝑎0 /2⎤
𝐴 = ⎢     ⎥ ,  𝐵 = ⎢ ⎥ ,  𝑅 = ⎢               ⎥ ,  𝑄 = 𝛾/2
      ⎣0  1⎦        ⎣0⎦        ⎣−𝑎0 /2     0   ⎦
The optimal policy takes the form 𝑢𝑡 = −𝐹 𝑥𝑡 , or

𝑌𝑡+1 − 𝑌𝑡 = −𝐹0 𝑌𝑡 − 𝐹1

so we can obtain the implied aggregate law of motion via 𝜅0 = −𝐹1 and 𝜅1 = 1 − 𝐹0
The Python code to solve this problem is below:

A = np.eye(2)
B = np.array([[1], [0]])
R = np.array([[a1 / 2, -a0 / 2], [-a0 / 2, 0]])
Q = γ / 2

lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()
F = F.flatten()
κ0, κ1 = -F[1], 1 - F[0]
print(κ0, κ1)
95.08187459215002 0.9524590627039248
The output yields the same (𝜅0 , 𝜅1 ) pair obtained as an equilibrium from the previous exercise
50.6.4 Exercise 4
The monopolist’s LQ problem is almost identical to the planner’s problem from the previous
exercise, except that
      ⎡  𝑎1    −𝑎0 /2⎤
𝑅 = ⎢               ⎥
      ⎣−𝑎0 /2     0  ⎦
R = np.array([[a1, -a0 / 2], [-a0 / 2, 0]])

lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()
F = F.flatten()
m0, m1 = -F[1], 1 - F[0]
print(m0, m1)
73.47294403502818 0.9265270559649701
We see that the law of motion for the monopolist is approximately 𝑌𝑡+1 = 73.4729 + 0.9265𝑌𝑡

In the rational expectations case, the law of motion was approximately 𝑌𝑡+1 = 95.0818 + 0.9525𝑌𝑡
One way to compare these two laws of motion is by their fixed points, which give long-run
equilibrium output in each case
For laws of the form 𝑌𝑡+1 = 𝑐0 + 𝑐1 𝑌𝑡 , the fixed point is 𝑐0 /(1 − 𝑐1 )
If you crunch the numbers, you will see that the monopolist adopts a lower long-run quantity
than obtained by the competitive market, implying a higher market price
This is analogous to the elementary static-case results
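Carrying out that computation with the two laws of motion reported above:

```python
def fixed_point(c0, c1):
    """Long-run output for a law of motion Y' = c0 + c1 * Y (requires |c1| < 1)."""
    return c0 / (1 - c1)

Y_competitive = fixed_point(95.0818, 0.9525)   # rational expectations law
Y_monopoly = fixed_point(73.4729, 0.9265)      # monopolist's law

print(Y_competitive, Y_monopoly)   # roughly 2001.7 and 999.6
```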
Footnotes
[1] A literature that studies whether models populated with agents who learn can converge to
rational expectations equilibria features iterations on a modification of the mapping Φ that
can be approximated as 𝛾Φ + (1 − 𝛾)𝐼. Here 𝐼 is the identity operator and 𝛾 ∈ (0, 1) is a
relaxation parameter. See [91] and [41] for statements and applications of this approach to
establish conditions under which collections of adaptive agents who use least squares learning converge to a rational expectations equilibrium.
51 Markov Perfect Equilibrium
51.1 Contents
• Overview 51.2
• Background 51.3
• Linear Markov Perfect Equilibria 51.4
• Application 51.5
• Exercises 51.6
• Solutions 51.7
In addition to what’s in Anaconda, this lecture will need the following libraries
51.2 Overview
This lecture describes the concept of Markov perfect equilibrium, a key notion for analyzing economic problems involving dynamic strategic interaction. We will focus on settings with

• two players
• quadratic payoff functions
• linear transition rules for the state
51.3 Background
• Choice of price, output, location or capacity for firms in an industry (e.g., [40], [113],
[36])
• Rate of extraction from a shared natural resource, such as a fishery (e.g., [86], [130])
Two firms are the only producers of a good the demand for which is governed by a linear inverse demand function
𝑝 = 𝑎0 − 𝑎1 (𝑞1 + 𝑞2 ) (1)
Here 𝑝 = 𝑝𝑡 is the price of the good, 𝑞𝑖 = 𝑞𝑖𝑡 is the output of firm 𝑖 = 1, 2 at time 𝑡 and
𝑎0 > 0, 𝑎1 > 0
In Eq. (1) and what follows, the time subscript is suppressed when possible to simplify notation
Each firm recognizes that its output affects total output and therefore the market price
The one-period payoff function of firm 𝑖 is price times quantity minus adjustment costs:

𝜋𝑖 (𝑞𝑖 , 𝑞−𝑖 , 𝑞𝑖̂ ) = 𝑝𝑞𝑖 − 𝛾(𝑞𝑖̂ − 𝑞𝑖 )² ,    𝛾 > 0    (2)

where 𝑞𝑖̂ denotes firm 𝑖’s next-period output

Substituting the inverse demand curve Eq. (1) into Eq. (2) lets us express the one-period payoff as

𝜋𝑖 (𝑞𝑖 , 𝑞−𝑖 , 𝑞𝑖̂ ) = 𝑞𝑖 (𝑎0 − 𝑎1 𝑞𝑖 − 𝑎1 𝑞−𝑖 ) − 𝛾(𝑞𝑖̂ − 𝑞𝑖 )²    (3)
𝑣𝑖 (𝑞𝑖 , 𝑞−𝑖 ) = max_{𝑞𝑖̂ } {𝜋𝑖 (𝑞𝑖 , 𝑞−𝑖 , 𝑞𝑖̂ ) + 𝛽𝑣𝑖 (𝑞𝑖̂ , 𝑓−𝑖 (𝑞−𝑖 , 𝑞𝑖 ))}    (4)
Definition A Markov perfect equilibrium of the duopoly model is a pair of value functions (𝑣1 , 𝑣2 ) and a pair of policy functions (𝑓1 , 𝑓2 ) such that, for each 𝑖 ∈ {1, 2} and each possible state,

• The value function 𝑣𝑖 satisfies the Bellman equation Eq. (4)
• The maximizer on the right side of Eq. (4) equals 𝑓𝑖 (𝑞𝑖 , 𝑞−𝑖 )
The adjective “Markov” denotes that the equilibrium decision rules depend only on the current values of the state variables, not other parts of their histories
“Perfect” means complete, in the sense that the equilibrium is constructed by backward induction and hence builds in optimizing behavior for each firm at all possible future states
• These include many states that will not be reached when we iterate forward
on the pair of equilibrium strategies 𝑓𝑖 starting from a given initial state
51.3.2 Computation
One strategy for computing a Markov perfect equilibrium is iterating to convergence on pairs
of Bellman equations and decision rules
In particular, let 𝑣𝑖𝑗 , 𝑓𝑖𝑗 be the value function and policy function for firm 𝑖 at the 𝑗-th iteration
Imagine constructing the iterates
𝑣_𝑖^{𝑗+1} (𝑞𝑖 , 𝑞−𝑖 ) = max_{𝑞𝑖̂ } {𝜋𝑖 (𝑞𝑖 , 𝑞−𝑖 , 𝑞𝑖̂ ) + 𝛽𝑣_𝑖^𝑗 (𝑞𝑖̂ , 𝑓−𝑖 (𝑞−𝑖 , 𝑞𝑖 ))}    (5)
51.4 Linear Markov Perfect Equilibria

As we saw in the duopoly example, the study of Markov perfect equilibria in games with two players leads us to an interrelated pair of Bellman equations
In linear-quadratic dynamic games, these “stacked Bellman equations” become “stacked Riccati equations” with a tractable mathematical structure
We’ll lay out that structure in a general setup and then apply it to some simple problems
Player 𝑖 takes {𝑢−𝑖𝑡 } as given and minimizes

∑_{𝑡=𝑡0 }^{𝑡1 −1} 𝛽^{𝑡−𝑡0 } {𝑥′𝑡 𝑅𝑖 𝑥𝑡 + 𝑢′𝑖𝑡 𝑄𝑖 𝑢𝑖𝑡 + 𝑢′−𝑖𝑡 𝑆𝑖 𝑢−𝑖𝑡 + 2𝑥′𝑡 𝑊𝑖 𝑢𝑖𝑡 + 2𝑢′−𝑖𝑡 𝑀𝑖 𝑢𝑖𝑡 }    (6)
while the state evolves according to the transition law

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵1 𝑢1𝑡 + 𝐵2 𝑢2𝑡    (7)

Here 𝑥𝑡 is the state vector, 𝑢𝑖𝑡 is the vector of controls of player 𝑖, and 𝑅𝑖 , 𝑆𝑖 , 𝑄𝑖 , 𝑊𝑖 , 𝑀𝑖 are conformable matrices
If we take 𝑢2𝑡 = −𝐹2𝑡 𝑥𝑡 and substitute it into Eq. (6) and Eq. (7), then player 1’s problem
becomes minimization of
∑_{𝑡=𝑡0 }^{𝑡1 −1} 𝛽^{𝑡−𝑡0 } {𝑥′𝑡 Π1𝑡 𝑥𝑡 + 𝑢′1𝑡 𝑄1 𝑢1𝑡 + 2𝑢′1𝑡 Γ1𝑡 𝑥𝑡 }    (8)
subject to

𝑥𝑡+1 = Λ1𝑡 𝑥𝑡 + 𝐵1 𝑢1𝑡    (9)

where

• Λ1𝑡 ∶= 𝐴 − 𝐵2 𝐹2𝑡
• Π1𝑡 ∶= 𝑅1 + 𝐹′2𝑡 𝑆1 𝐹2𝑡
• Γ1𝑡 ∶= 𝑊1′ − 𝑀1′ 𝐹2𝑡

This is an LQ dynamic programming problem that can be solved by working backward; the policy rule that solves it is
𝐹1𝑡 = (𝑄1 + 𝛽𝐵1′ 𝑃1𝑡+1 𝐵1 )−1 (𝛽𝐵1′ 𝑃1𝑡+1 Λ1𝑡 + Γ1𝑡 ) (10)
𝑃1𝑡 = Π1𝑡 − (𝛽𝐵1′ 𝑃1𝑡+1 Λ1𝑡 + Γ1𝑡 )′ (𝑄1 + 𝛽𝐵1′ 𝑃1𝑡+1 𝐵1 )−1 (𝛽𝐵1′ 𝑃1𝑡+1 Λ1𝑡 + Γ1𝑡 ) + 𝛽Λ′1𝑡 𝑃1𝑡+1 Λ1𝑡
(11)
Similarly, the policy that solves player 2’s problem is
𝐹2𝑡 = (𝑄2 + 𝛽𝐵2′ 𝑃2𝑡+1 𝐵2 )−1 (𝛽𝐵2′ 𝑃2𝑡+1 Λ2𝑡 + Γ2𝑡 ) (12)
𝑃2𝑡 = Π2𝑡 − (𝛽𝐵2′ 𝑃2𝑡+1 Λ2𝑡 + Γ2𝑡 )′ (𝑄2 + 𝛽𝐵2′ 𝑃2𝑡+1 𝐵2 )−1 (𝛽𝐵2′ 𝑃2𝑡+1 Λ2𝑡 + Γ2𝑡 ) + 𝛽Λ′2𝑡 𝑃2𝑡+1 Λ2𝑡
(13)
Here in all cases 𝑡 = 𝑡0 , … , 𝑡1 − 1 and the terminal conditions are 𝑃𝑖𝑡1 = 0
The solution procedure is to use equations Eq. (10), Eq. (11), Eq. (12), and Eq. (13), and
“work backwards” from time 𝑡1 − 1
Since we’re working backward, 𝑃1𝑡+1 and 𝑃2𝑡+1 are taken as given at each stage
Moreover, since

• the right side of Eq. (10) contains 𝐹2𝑡
• the right side of Eq. (12) contains 𝐹1𝑡

these equations must be solved jointly for 𝐹1𝑡 and 𝐹2𝑡 at each 𝑡; fortunately, they are linear in those objects
We often want to compute the solutions of such games for infinite horizons, in the hope that
the decision rules 𝐹𝑖𝑡 settle down to be time-invariant as 𝑡1 → +∞
In practice, we usually fix 𝑡1 and compute the equilibrium of an infinite horizon game by driving 𝑡0 → −∞
This is the approach we adopt in the next section
51.4.3 Implementation
We use the function nnash from QuantEcon.py that computes a Markov perfect equilibrium
of the infinite horizon linear-quadratic dynamic game in the manner described above
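To see what nnash must accomplish, the backward recursion can be sketched directly for a scalar example in which 𝑆𝑖 = 𝑊𝑖 = 𝑀𝑖 = 0, so that Π𝑖𝑡 = 𝑅𝑖 and Γ𝑖𝑡 = 0 (all parameter values below are illustrative):

```python
import numpy as np

# Scalar example: x' = A x + B1 u1 + B2 u2, player i minimizes
# sum of β^t (x R_i x + u_i Q_i u_i)
A, B1, B2 = 1.0, 0.5, 0.5
R1 = R2 = 1.0
Q1 = Q2 = 1.0
β = 0.95

P1 = P2 = 0.0                    # terminal conditions P_i,t1 = 0
for _ in range(200):             # work backward long enough to settle down
    k1 = β * B1 * P1 / (Q1 + β * B1**2 * P1)
    k2 = β * B2 * P2 / (Q2 + β * B2**2 * P2)
    # Eq. (10) and Eq. (12) are linear in (F1, F2): F_i = k_i (A - B_{-i} F_{-i})
    M = np.array([[1.0, k1 * B2], [k2 * B1, 1.0]])
    F1, F2 = np.linalg.solve(M, np.array([k1 * A, k2 * A]))
    Λ1, Λ2 = A - B2 * F2, A - B1 * F1
    # Riccati updates, Eq. (11) and Eq. (13), with Γ_it = 0 and Π_it = R_i
    P1 = R1 - (β * B1 * P1 * Λ1)**2 / (Q1 + β * B1**2 * P1) + β * Λ1**2 * P1
    P2 = R2 - (β * B2 * P2 * Λ2)**2 / (Q2 + β * B2**2 * P2) + β * Λ2**2 * P2

print(F1, F2)    # time-invariant policies u_i = -F_i x
```

In the vector case, nnash performs this same joint linear solve together with matrix Riccati updates.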
51.5 Application
Let’s use these procedures to treat some applications, starting with the duopoly model
To map the duopoly model into coupled linear-quadratic dynamic programming problems,
define the state and controls as
𝑥𝑡 ∶= [1 𝑞1𝑡 𝑞2𝑡 ]′  and  𝑢𝑖𝑡 ∶= 𝑞𝑖,𝑡+1 − 𝑞𝑖𝑡 ,    𝑖 = 1, 2
If we write

𝑥′𝑡 𝑅𝑖 𝑥𝑡 + 𝑢′𝑖𝑡 𝑄𝑖 𝑢𝑖𝑡

for the one-period payoff of firm 𝑖 (expressed as a loss to be minimized), then 𝑄1 = 𝑄2 = 𝛾,
        ⎡   0    −𝑎0 /2    0  ⎤               ⎡   0      0    −𝑎0 /2⎤
𝑅1 ∶= ⎢−𝑎0 /2    𝑎1     𝑎1 /2⎥   and   𝑅2 ∶= ⎢   0      0     𝑎1 /2⎥
        ⎣   0     𝑎1 /2    0  ⎦               ⎣−𝑎0 /2   𝑎1 /2    𝑎1 ⎦
        ⎡1  0  0⎤          ⎡0⎤          ⎡0⎤
𝐴 ∶= ⎢0  1  0⎥ ,  𝐵1 ∶= ⎢1⎥ ,  𝐵2 ∶= ⎢0⎥
        ⎣0  0  1⎦          ⎣0⎦          ⎣1⎦
The optimal decision rule of firm 𝑖 will take the form 𝑢𝑖𝑡 = −𝐹𝑖 𝑥𝑡 , inducing the following closed-loop system for the evolution of 𝑥𝑡 in the Markov perfect equilibrium:

𝑥𝑡+1 = (𝐴 − 𝐵1 𝐹1 − 𝐵2 𝐹2 )𝑥𝑡    (14)
Consider the previously presented duopoly model with parameter values of:
• 𝑎0 = 10
• 𝑎1 = 2
• 𝛽 = 0.96
• 𝛾 = 12
From these, we compute the infinite horizon MPE using the preceding code
In [2]: """
"""
import numpy as np
import quantecon as qe
# == Parameters == #
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0
# == In LQ form == #
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])
R1 = [[ 0., -a0 / 2, 0.],
      [-a0 / 2., a1, a1 / 2.],
      [ 0, a1 / 2., 0.]]
R2 = [[ 0., 0., -a0 / 2],
      [ 0., 0., a1 / 2.],
      [-a0 / 2, a1 / 2., a1]]
Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0

# == Solve using QE's nnash function == #
F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1,
                          Q2, S1, S2, W1, W2, M1,
                          M2, beta=β)

# == Display policies == #
print("Computed policies for firm 1 and firm 2:\n")
print(f"F1 = {F1}")
print(f"F2 = {F2}")
print("\n")
In particular, let’s take F2 as computed above, plug it into Eq. (8) and Eq. (9) to get firm 1’s
problem and solve it using LQ
We hope that the resulting policy will agree with F1 as computed above
In [3]: Λ1 = A - B2 @ F2
lq1 = qe.LQ(Q1, R1, Λ1, B1, beta=β)
P1_ih, F1_ih, d = lq1.stationary_values()
F1_ih
This is close enough for rock and roll, as they say in the trade
Indeed, np.allclose agrees with our assessment

In [4]: np.allclose(F1, F1_ih)

Out[4]: True
51.5.3 Dynamics
Let’s now investigate the dynamics of price and output in this simple duopoly model under
the MPE policies
Given our optimal policies 𝐹 1 and 𝐹 2, the state evolves according to Eq. (14)
The following program
• imports 𝐹 1 and 𝐹 2 from the previous program along with all parameters
• computes the evolution of 𝑥𝑡 using Eq. (14)
• extracts and plots industry output 𝑞𝑡 = 𝑞1𝑡 + 𝑞2𝑡 and price 𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡
AF = A - B1 @ F1 - B2 @ F2
n = 20
x = np.empty((3, n))
x[:, 0] = 1, 1, 1
for t in range(n-1):
x[:, t+1] = AF @ x[:, t]
q1 = x[1, :]
q2 = x[2, :]
q = q1 + q2 # Total output, MPE
p = a0 - a1 * q # Price, MPE
Note that the initial condition has been set to 𝑞10 = 𝑞20 = 1.0
To gain some perspective we can compare this to what happens in the monopoly case
The first panel in the next figure compares output of the monopolist and industry output under the MPE, as a function of time
The second panel shows analogous curves for price
Here parameters are the same as above for both the MPE and monopoly solutions
The monopolist initial condition is 𝑞0 = 2.0 to mimic the industry initial condition 𝑞10 =
𝑞20 = 1.0 in the MPE case
As expected, output is higher and prices are lower under duopoly than monopoly
51.6 Exercises
51.6.1 Exercise 1
Replicate the pair of figures showing the comparison of output and prices for the monopolist
and duopoly under MPE
Parameters are as in duopoly_mpe.py and you can use that code to compute MPE policies
under duopoly
The optimal policy in the monopolist case can be computed using QuantEcon.py’s LQ class
51.6.2 Exercise 2
It takes the form of an infinite horizon linear-quadratic game proposed by Judd [72]
Two firms set prices and quantities of two goods interrelated through their demand curves
Relevant variables are defined as follows:

• 𝐼𝑖𝑡 = inventories of firm 𝑖 at the beginning of 𝑡
• 𝑞𝑖𝑡 = production of firm 𝑖 during period 𝑡
• 𝑝𝑖𝑡 = price charged by firm 𝑖 during 𝑡
• 𝑆𝑖𝑡 = sales made by firm 𝑖 during 𝑡
• 𝐸𝑖𝑡 = costs of production of firm 𝑖 during 𝑡
• 𝐶𝑖𝑡 = costs of holding inventories for firm 𝑖 during 𝑡

The firms’ cost functions are

• 𝐶𝑖𝑡 = 𝑐𝑖1 + 𝑐𝑖2 𝐼𝑖𝑡 + 0.5𝑐𝑖3 𝐼𝑖𝑡²
• 𝐸𝑖𝑡 = 𝑒𝑖1 + 𝑒𝑖2 𝑞𝑖𝑡 + 0.5𝑒𝑖3 𝑞𝑖𝑡², where 𝑒𝑖𝑗 , 𝑐𝑖𝑗 are positive scalars

Inventories obey the law of motion

𝐼𝑖,𝑡+1 = (1 − 𝛿)(𝐼𝑖𝑡 + 𝑞𝑖𝑡 − 𝑆𝑖𝑡 )

Demand is governed by the linear schedule

𝑆𝑡 = 𝐷𝑝𝑖𝑡 + 𝑏

where

• 𝑆𝑡 = [𝑆1𝑡 𝑆2𝑡 ]′
• 𝐷 is a 2 × 2 negative definite matrix and
• 𝑏 is a vector of constants
Firm 𝑖 maximizes the undiscounted criterion

lim_{𝑇 →∞} (1/𝑇 ) ∑_{𝑡=0}^{𝑇 } (𝑝𝑖𝑡 𝑆𝑖𝑡 − 𝐸𝑖𝑡 − 𝐶𝑖𝑡 )
𝑢𝑖𝑡 = [𝑝𝑖𝑡 𝑞𝑖𝑡 ]′  and  𝑥𝑡 = [𝐼1𝑡 𝐼2𝑡 1]′
Decision rules for price and quantity take the form 𝑢𝑖𝑡 = −𝐹𝑖 𝑥𝑡
The Markov perfect equilibrium of Judd’s model can be computed by filling in the matrices
appropriately
The exercise is to calculate these matrices and compute the following figures
The first figure shows the dynamics of inventories for each firm when the parameters are
In [6]: δ = 0.02
D = np.array([[-1, 0.5], [0.5, -1]])
b = np.array([25, 25])
c1 = c2 = np.array([1, -2, 1])
e1 = e2 = np.array([10, 10, 3])
51.7 Solutions
51.7.1 Exercise 1
First, let’s compute the duopoly MPE under the stated parameters
In [7]: # == Parameters == #
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0
# == In LQ form == #
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])
R1 = [[ 0., -a0/2, 0.],
[-a0 / 2., a1, a1 / 2.],
[ 0, a1 / 2., 0.]]
R2 = [[ 0., 0., -a0 / 2],
      [ 0., 0., a1 / 2.],
      [-a0 / 2, a1 / 2., a1]]
Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0

F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1,
                          Q2, S1, S2, W1, W2, M1,
                          M2, beta=β)
Now we evaluate the time path of industry output and prices given initial condition 𝑞10 =
𝑞20 = 1
In [8]: AF = A - B1 @ F1 - B2 @ F2
n = 20
x = np.empty((3, n))
x[:, 0] = 1, 1, 1
for t in range(n-1):
x[:, t+1] = AF @ x[:, t]
q1 = x[1, :]
q2 = x[2, :]
q = q1 + q2 # Total output, MPE
p = a0 - a1 * q # Price, MPE
Next, the monopolist’s problem can be mapped into an LQ problem by setting

𝑥𝑡 = 𝑞𝑡 − 𝑞 ̄  and  𝑢𝑡 = 𝑞𝑡+1 − 𝑞𝑡

with

𝑅 = 𝑎1  and  𝑄 = 𝛾,    𝐴 = 𝐵 = 1
In [9]: R = a1
Q = γ
A = B = 1
lq_alt = qe.LQ(Q, R, A, B, beta=β)
P, F, d = lq_alt.stationary_values()
ax = axes[0]
ax.plot(qm, 'b-', lw=2, alpha=0.75, label='monopolist output')
ax.plot(q, 'g-', lw=2, alpha=0.75, label='MPE total output')
ax.set(ylabel="output", xlabel="time", ylim=(2, 4))
ax.legend(loc='upper left', frameon=0)
ax = axes[1]
ax.plot(pm, 'b-', lw=2, alpha=0.75, label='monopolist price')
ax.plot(p, 'g-', lw=2, alpha=0.75, label='MPE price')
ax.set(ylabel="price", xlabel="time")
ax.legend(loc='upper right', frameon=0)
plt.show()
51.7.2 Exercise 2
In [11]: δ = 0.02
D = np.array([[-1, 0.5], [0.5, -1]])
b = np.array([25, 25])
c1 = c2 = np.array([1, -2, 1])
e1 = e2 = np.array([10, 10, 3])
δ_1 = 1 - δ
With

𝑢𝑖𝑡 = [𝑝𝑖𝑡 𝑞𝑖𝑡 ]′  and  𝑥𝑡 = [𝐼1𝑡 𝐼2𝑡 1]′
we set up the matrices as follows:
S1 = np.zeros((2, 2))
S2 = np.copy(S1)
W1 = np.array([[ 0, 0],
[ 0, 0],
[-0.5 * e1[1], b[0] / 2.]])
W2 = np.array([[ 0, 0],
[ 0, 0],
[-0.5 * e2[1], b[1] / 2.]])
Now let’s look at the dynamics of inventories, and reproduce the graph corresponding to 𝛿 = 0.02
In [14]: AF = A - B1 @ F1 - B2 @ F2
n = 25
x = np.empty((3, n))
x[:, 0] = 2, 0, 1
for t in range(n-1):
x[:, t+1] = AF @ x[:, t]
I1 = x[0, :]
I2 = x[1, :]
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(I1, 'b-', lw=2, alpha=0.75, label='inventories, firm 1')
ax.plot(I2, 'g-', lw=2, alpha=0.75, label='inventories, firm 2')
ax.set_title(rf'$\delta = {δ}$')
ax.legend()
plt.show()
52 Robust Markov Perfect Equilibrium
52.1 Contents
• Overview 52.2
• Linear Markov Perfect Equilibria with Robust Agents 52.3
• Application 52.4
52.2 Overview
This lecture describes a Markov perfect equilibrium with robust agents. As in Markov perfect equilibrium, we focus on settings with

• two players
• quadratic payoff functions
• linear transition rules for the state vector
These specifications simplify calculations and allow us to give a simple example that illustrates basic forces
This lecture is based on ideas described in chapter 15 of [52] and in Markov perfect equilibrium and Robustness
Decisions of two agents affect the motion of a state vector that appears as an argument of
payoff functions of both agents
As described in Markov perfect equilibrium, when decision-makers have no concerns about
the robustness of their decision rules to misspecifications of the state dynamics, a Markov
perfect equilibrium can be computed via backward recursion on two sets of equations
This lecture shows how a similar equilibrium concept and similar computational procedures
apply when we impute concerns about robustness to both decision-makers
A Markov perfect equilibrium with robust agents will be characterized by a pair of Bellman equations, one for each agent, and a pair of decision rules that remain optimal against worst-case perturbations of the state dynamics
Below, we’ll construct a robust firms version of the classic duopoly model with adjustment
costs analyzed in Markov perfect equilibrium
52.3 Linear Markov Perfect Equilibria with Robust Agents

As we saw in Markov perfect equilibrium, the study of Markov perfect equilibria in dynamic games with two players leads us to an interrelated pair of Bellman equations
In linear quadratic dynamic games, these “stacked Bellman equations” become “stacked Riccati equations” with a tractable mathematical structure
We consider a general linear quadratic regulator game with two players, each of whom fears
model misspecifications
We often call the players agents
The agents share a common baseline model for the transition dynamics of the state vector
But now one or more agents doubt that the baseline model is correctly specified
The agents express the possibility that their baseline specification is incorrect by adding a
contribution 𝐶𝑣𝑖𝑡 to the time 𝑡 transition law for the state
For convenience, we’ll start with a finite horizon formulation, where 𝑡0 is the initial date and
𝑡1 is the common terminal date
Player 𝑖 takes a sequence {𝑢−𝑖𝑡 } as given and chooses a sequence {𝑢𝑖𝑡 } to minimize and {𝑣𝑖𝑡 }
to maximize
∑_{𝑡=𝑡0 }^{𝑡1 −1} 𝛽^{𝑡−𝑡0 } {𝑥′𝑡 𝑅𝑖 𝑥𝑡 + 𝑢′𝑖𝑡 𝑄𝑖 𝑢𝑖𝑡 + 𝑢′−𝑖𝑡 𝑆𝑖 𝑢−𝑖𝑡 + 2𝑥′𝑡 𝑊𝑖 𝑢𝑖𝑡 + 2𝑢′−𝑖𝑡 𝑀𝑖 𝑢𝑖𝑡 − 𝜃𝑖 𝑣′𝑖𝑡 𝑣𝑖𝑡 }    (1)
subject to the distorted transition law

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵1 𝑢1𝑡 + 𝐵2 𝑢2𝑡 + 𝐶𝑣𝑖𝑡    (2)

Here 𝑣𝑖𝑡 is a distortion to the state dynamics and 𝜃𝑖 > 0 is a penalty parameter governing player 𝑖’s fear of model misspecification
A robust Markov perfect equilibrium is a pair of sequences {𝐹1𝑡 , 𝐹2𝑡 } and a pair of sequences {𝐾1𝑡 , 𝐾2𝑡 } such that

• {𝐹1𝑡 , 𝐾1𝑡 } solves player 1’s robust decision problem, taking {𝐹2𝑡 } as given, and
• {𝐹2𝑡 , 𝐾2𝑡 } solves player 2’s robust decision problem, taking {𝐹1𝑡 } as given
If we substitute 𝑢2𝑡 = −𝐹2𝑡 𝑥𝑡 into Eq. (1) and Eq. (2), then player 1’s problem becomes
minimization-maximization of
$$\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' \Pi_{1t} x_t + u_{1t}' Q_1 u_{1t} + 2 u_{1t}' \Gamma_{1t} x_t - \theta_1 v_{1t}' v_{1t} \right\} \tag{3}$$
subject to

$$x_{t+1} = \Lambda_{1t} x_t + B_1 u_{1t} + C v_{1t} \tag{4}$$

where

• $\Lambda_{1t} := A - B_2 F_{2t}$
• $\Pi_{1t} := R_1 + F_{2t}' S_1 F_{2t}$
• $\Gamma_{1t} := W_1' - M_1' F_{2t}$
This is an LQ robust dynamic programming problem of the type studied in the Robustness
lecture, which can be solved by working backward
Maximization with respect to the distortion $v_{1t}$ leads to the following version of the $\mathcal{D}$ operator from the Robustness lecture, namely

$$\mathcal{D}_1(P) := P + P C (\theta_1 I - C' P C)^{-1} C' P \tag{5}$$
The matrix 𝐹1𝑡 in the policy rule 𝑢1𝑡 = −𝐹1𝑡 𝑥𝑡 that solves agent 1’s problem satisfies
$$F_{1t} = (Q_1 + \beta B_1' \mathcal{D}_1(P_{1t+1}) B_1)^{-1} (\beta B_1' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} + \Gamma_{1t}) \tag{6}$$

$$P_{1t} = \Pi_{1t} - (\beta B_1' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} + \Gamma_{1t})' (Q_1 + \beta B_1' \mathcal{D}_1(P_{1t+1}) B_1)^{-1} (\beta B_1' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} + \Gamma_{1t}) + \beta \Lambda_{1t}' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} \tag{7}$$

Similarly, the policy that solves player 2's problem is

$$F_{2t} = (Q_2 + \beta B_2' \mathcal{D}_2(P_{2t+1}) B_2)^{-1} (\beta B_2' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} + \Gamma_{2t}) \tag{8}$$

$$P_{2t} = \Pi_{2t} - (\beta B_2' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} + \Gamma_{2t})' (Q_2 + \beta B_2' \mathcal{D}_2(P_{2t+1}) B_2)^{-1} (\beta B_2' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} + \Gamma_{2t}) + \beta \Lambda_{2t}' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} \tag{9}$$
Here in all cases 𝑡 = 𝑡0 , … , 𝑡1 − 1 and the terminal conditions are 𝑃𝑖𝑡1 = 0
52.3. LINEAR MARKOV PERFECT EQUILIBRIA WITH ROBUST AGENTS 879
The solution procedure is to use equations Eq. (6), Eq. (7), Eq. (8), and Eq. (9), and “work
backwards” from time 𝑡1 − 1
Since we’re working backwards, 𝑃1𝑡+1 and 𝑃2𝑡+1 are taken as given at each stage
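Working backward, each step of the recursion maps $P_{i,t+1}$ into $(F_{it}, P_{it})$ via Eq. (6)-(7) (and Eq. (8)-(9) for player 2). The following is an illustrative sketch of one such step for a single player; the function name and matrix arguments are our own, not the lecture's implementation:

```python
import numpy as np

def robust_backward_step(P_next, Λ, Π, Γ, Q, B, C, θ, β):
    """One backward step of Eq. (6)-(7): given P_{t+1} and this period's
    (Λ, Π, Γ), return (F_t, P_t).  Illustrative sketch only."""
    I = np.eye(C.shape[1])
    # D operator from the Robustness lecture: D(P) = P + PC(θI - C'PC)^{-1}C'P
    DP = P_next + P_next @ C @ np.linalg.solve(θ * I - C.T @ P_next @ C,
                                               C.T @ P_next)
    H = β * B.T @ DP @ Λ + Γ                      # recurring term in (6)-(7)
    F = np.linalg.solve(Q + β * B.T @ DP @ B, H)  # Eq. (6)
    P = Π - H.T @ F + β * Λ.T @ DP @ Λ            # Eq. (7)
    return F, P
```

Iterating this from the terminal condition $P_{it_1} = 0$, while refreshing $\Lambda, \Pi, \Gamma$ from the other player's current rule, produces the equilibrium sequences.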
Moreover, since

• some terms on the right-hand side of Eq. (6) contain $F_{2t}$
• some terms on the right-hand side of Eq. (8) contain $F_{1t}$

we need to solve these $k_1 + k_2$ equations simultaneously

As in Markov perfect equilibrium, a key insight here is that equations Eq. (6) and Eq. (8) are linear in $F_{1t}$ and $F_{2t}$
After these equations have been solved, we can take 𝐹𝑖𝑡 and solve for 𝑃𝑖𝑡 in Eq. (7) and
Eq. (9)
Notice how 𝑗’s control law 𝐹𝑗𝑡 is a function of {𝐹𝑖𝑠 , 𝑠 ≥ 𝑡, 𝑖 ≠ 𝑗}
Thus, agent 𝑖’s choice of {𝐹𝑖𝑡 ; 𝑡 = 𝑡0 , … , 𝑡1 − 1} influences agent 𝑗’s choice of control laws
However, in the Markov perfect equilibrium of this game, each agent is assumed to ignore the
influence that his choice exerts on the other agent’s choice
After these equations have been solved, we can also deduce associated sequences of worst-case
shocks
$$v_{it} = K_{it} x_t$$

where $K_{it}$ is the worst-case shock coefficient matrix that solves player $i$'s maximization problem, given the decision rules $\{F_{1t}, F_{2t}\}$
We often want to compute the solutions of such games for infinite horizons, in the hope that
the decision rules 𝐹𝑖𝑡 settle down to be time-invariant as 𝑡1 → +∞
In practice, we usually fix 𝑡1 and compute the equilibrium of an infinite horizon game by driving 𝑡0 → −∞
This is the approach we adopt in the next section
52.3.6 Implementation
We use the function nnash_robust to compute a Markov perfect equilibrium of the infinite horizon linear quadratic dynamic game with robust planners in the manner described above
52.4 Application
Without concerns for robustness, the model is identical to the duopoly model from the
Markov perfect equilibrium lecture
To begin, we briefly review the structure of that model
Two firms are the only producers of a good, the demand for which is governed by the linear inverse demand function

$$p = a_0 - a_1 (q_1 + q_2) \tag{10}$$
Here 𝑝 = 𝑝𝑡 is the price of the good, 𝑞𝑖 = 𝑞𝑖𝑡 is the output of firm 𝑖 = 1, 2 at time 𝑡 and
𝑎0 > 0, 𝑎1 > 0
In Eq. (10) and what follows,

• the time subscript is suppressed when possible to simplify notation
• $\hat{x}$ denotes a next period value of variable $x$
Each firm recognizes that its output affects total output and therefore the market price
The one-period payoff function of firm $i$ is price times quantity minus adjustment costs:

$$\pi_{it} = p_t q_{it} - \gamma (q_{i,t+1} - q_{it})^2, \qquad \gamma > 0 \tag{11}$$
Substituting the inverse demand curve Eq. (10) into Eq. (11) lets us express the one-period payoff as

$$\pi_{it} = a_0 q_{it} - a_1 q_{it}^2 - a_1 q_{it} q_{-it} - \gamma (q_{i,t+1} - q_{it})^2 \tag{12}$$

where $q_{-it}$ denotes the output of the firm other than $i$. To cast the problem in LQ form, let
$$x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix} \quad \text{and} \quad u_{it} := q_{i,t+1} - q_{it}, \quad i = 1, 2$$
If we write the one-period payoff of firm $i$ as $-x_t' R_i x_t - u_{it}' Q_i u_{it}$, where $Q_1 = Q_2 = \gamma$,
$$R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix} \quad \text{and} \quad R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix}$$
then we recover the one-period payoffs Eq. (11) for the two firms in the duopoly model
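We can confirm this numerically: with $x_t = (1, q_{1t}, q_{2t})'$, firm 1's one-period payoff equals $-(x_t' R_1 x_t + \gamma u_{1t}^2)$. A small check, using the parameter values set later in the lecture:

```python
import numpy as np

a0, a1, γ = 10.0, 2.0, 12.0

R1 = np.array([[0.,      -a0 / 2, 0.],
               [-a0 / 2,  a1,     a1 / 2],
               [0.,       a1 / 2, 0.]])

def payoff_firm1(q1, q2, u1):
    """Direct payoff: price times quantity minus adjustment costs."""
    p = a0 - a1 * (q1 + q2)
    return p * q1 - γ * u1**2

# Arbitrary illustrative outputs and output adjustment
q1, q2, u1 = 1.5, 2.0, 0.3
x = np.array([1.0, q1, q2])
assert np.isclose(-(x @ R1 @ x + γ * u1**2), payoff_firm1(q1, q2, u1))
```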
The law of motion for the state 𝑥𝑡 is 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵1 𝑢1𝑡 + 𝐵2 𝑢2𝑡 where
$$A := \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad B_1 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad B_2 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
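As a quick sanity check, these matrices encode the identity $q_{i,t+1} = q_{it} + u_{it}$:

```python
import numpy as np

A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])

# Pick arbitrary current outputs and output adjustments
q1, q2, u1, u2 = 1.5, 2.0, 0.3, -0.1
x = np.array([1.0, q1, q2])
x_next = A @ x + (B1 * u1 + B2 * u2).ravel()
assert np.allclose(x_next, [1.0, q1 + u1, q2 + u2])
```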
A robust decision rule of firm $i$ will take the form $u_{it} = -F_i x_t$, inducing the following closed-loop system for the evolution of $x$ in the Markov perfect equilibrium:

$$x_{t+1} = (A - B_1 F_1 - B_2 F_2) x_t \tag{13}$$

The parameter values are
• 𝑎0 = 10
• 𝑎1 = 2
• 𝛽 = 0.96
• 𝛾 = 12
From these, we computed the infinite horizon MPE without robustness using the code
In [2]: import numpy as np
import quantecon as qe
from scipy.linalg import solve

# == Parameters == #
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0
# == In LQ form == #
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])
Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0
R1 = [[ 0.,      -a0 / 2,  0.],
      [-a0 / 2,   a1,       a1 / 2],
      [ 0.,       a1 / 2,   0.]]

R2 = [[ 0.,       0.,      -a0 / 2],
      [ 0.,       0.,       a1 / 2],
      [-a0 / 2,   a1 / 2,   a1]]

# == Solve for the MPE using QuantEcon's nnash == #
F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1, Q2,
                          S1, S2, W1, W2, M1, M2, beta=β)
# == Display policies == #
print("Computed policies for firm 1 and firm 2:\n")
print(f"F1 = {F1}")
print(f"F2 = {F2}")
print("\n")
def nnash_robust(A, C, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2,
                 θ1, θ2, beta=1.0, tol=1e-8, max_iter=1000):
    r"""
    Compute the limit of a Nash linear quadratic dynamic game with
    robustness concern.

    Parameters
    ----------
    A : scalar(float) or array_like(float)
        Corresponds to the MPE equations, should be of size (n, n)
    C : scalar(float) or array_like(float)
        As above, size (n, c), c is the size of w
    B1 : scalar(float) or array_like(float)
        As above, size (n, k_1)
    B2 : scalar(float) or array_like(float)
        As above, size (n, k_2)
    R1 : scalar(float) or array_like(float)
        As above, size (n, n)
    R2 : scalar(float) or array_like(float)
        As above, size (n, n)
    Q1 : scalar(float) or array_like(float)
        As above, size (k_1, k_1)
    Q2 : scalar(float) or array_like(float)
        As above, size (k_2, k_2)
    S1 : scalar(float) or array_like(float)
        As above, size (k_1, k_1)
    S2 : scalar(float) or array_like(float)
        As above, size (k_2, k_2)
    W1 : scalar(float) or array_like(float)
        As above, size (n, k_1)
    W2 : scalar(float) or array_like(float)
        As above, size (n, k_2)
    M1 : scalar(float) or array_like(float)
        As above, size (k_2, k_1)
    M2 : scalar(float) or array_like(float)
        As above, size (k_1, k_2)
    θ1 : scalar(float)
        Robustness parameter of player 1
    θ2 : scalar(float)
        Robustness parameter of player 2
    beta : scalar(float), optional(default=1.0)
        Discount factor
    tol : scalar(float), optional(default=1e-8)
        This is the tolerance level for convergence
    max_iter : scalar(int), optional(default=1000)
        This is the maximum number of iterations allowed

    Returns
    -------
    F1 : array_like, dtype=float, shape=(k_1, n)
        Feedback law for agent 1
    F2 : array_like, dtype=float, shape=(k_2, n)
        Feedback law for agent 2
    P1 : array_like, dtype=float, shape=(n, n)
        The steady-state solution to the associated discrete matrix
        Riccati equation for agent 1
    P2 : array_like, dtype=float, shape=(n, n)
        The steady-state solution to the associated discrete matrix
        Riccati equation for agent 2
    """
    # == Unload parameters and make sure everything is a matrix == #
    params = A, C, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2
    params = map(np.asmatrix, params)
    A, C, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2 = params

    # == Initial values == #
    n = A.shape[0]
    k_1 = B1.shape[1]
    k_2 = B2.shape[1]

    v1 = np.eye(k_1)
    v2 = np.eye(k_2)
    P1 = np.eye(n) * 1e-5
    P2 = np.eye(n) * 1e-5
    F1 = np.random.randn(k_1, n)
    F2 = np.random.randn(k_2, n)
    for it in range(max_iter):
        # Update
        F10 = F1
        F20 = F2

        I = np.eye(C.shape[1])

        # D1(P1), D2(P2): apply D(P) = P + PC(θI - C'PC)^{-1}C'P
        # Note: the solve fails if θI - C'PC is singular
        INV1 = solve(θ1 * I - C.T @ P1 @ C, I)
        D1P1 = P1 + P1 @ C @ INV1 @ C.T @ P1
        INV2 = solve(θ2 * I - C.T @ P2 @ C, I)
        D2P2 = P2 + P2 @ C @ INV2 @ C.T @ P2

        Λ1 = A - B2 @ F2
        Λ2 = A - B1 @ F1
        Π1 = R1 + F2.T @ S1 @ F2
        Π2 = R2 + F1.T @ S2 @ F1
        Γ1 = W1.T - M1.T @ F2
        Γ2 = W2.T - M2.T @ F1

        # Compute F1 and F2 via Eq. (6) and Eq. (8)
        F1 = solve(beta * B1.T @ D1P1 @ B1 + Q1,
                   beta * B1.T @ D1P1 @ Λ1 + Γ1)
        F2 = solve(beta * B2.T @ D2P2 @ B2 + Q2,
                   beta * B2.T @ D2P2 @ Λ2 + Γ2)

        # Compute P1 and P2 via Eq. (7) and Eq. (9)
        P1 = Π1 - (beta * B1.T @ D1P1 @ Λ1 + Γ1).T @ F1 + \
             beta * Λ1.T @ D1P1 @ Λ1
        P2 = Π2 - (beta * B2.T @ D2P2 @ Λ2 + Γ2).T @ F2 + \
             beta * Λ2.T @ D2P2 @ Λ2

        dd = np.max(np.abs(F10 - F1)) + np.max(np.abs(F20 - F2))
        if dd < tol:  # success!
            break
    else:
        raise ValueError(f'No convergence: Iteration limit of {max_iter} reached in nnash')

    return F1, F2, P1, P2
Without robustness, firm $i$ wants to minimize

$$\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' R_i x_t + u_{it}' Q_i u_{it} + u_{-it}' S_i u_{-it} + 2 x_t' W_i u_{it} + 2 u_{-it}' M_i u_{it} \right\}$$

where

$$x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix} \quad \text{and} \quad u_{it} := q_{i,t+1} - q_{it}, \quad i = 1, 2$$
and

$$R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix}, \quad R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix}, \quad Q_1 = Q_2 = \gamma, \quad S_i = W_i = M_i = 0, \ i = 1, 2$$

The parameter values are
• 𝑎0 = 10
• 𝑎1 = 2
• 𝛽 = 0.96
• 𝛾 = 12
In [4]: # == Parameters == #
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0
# == In LQ form == #
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])
Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0
Consistency Check
We first conduct a comparison test to check if nnash_robust agrees with qe.nnash in the
non-robustness case in which each 𝜃𝑖 ≈ +∞
We can see that the results are consistent across the two functions
Comparative Dynamics under Baseline Transition Dynamics
We want to compare the dynamics of price and output under the baseline MPE model with
those under the baseline model under the robust decision rules within the robust MPE
This means that we simulate the state dynamics under the MPE equilibrium closed-loop
transition matrix
𝐴𝑜 = 𝐴 − 𝐵1 𝐹1 − 𝐵2 𝐹2
where $F_1$ and $F_2$ are the firms' robust decision rules within the robust Markov perfect equilibrium
• by simulating under the baseline model transition dynamics and the robust
MPE rules we are assuming that at the end of the day firms' concerns
about misspecification of the baseline model do not materialize
• a short way of saying this is that misspecification fears are all ‘just in the
minds’ of the firms
• simulating under the baseline model is a common practice in the literature
• note that some assumption about the model that actually governs the data
has to be made in order to create a simulation
• later we will describe the (erroneous) beliefs of the two firms that justify
their robust decisions as best responses to transition laws that are distorted
relative to the baseline model
After simulating 𝑥𝑡 under the baseline transition dynamics and robust decision rules 𝐹𝑖 , 𝑖 =
1, 2, we extract and plot industry output 𝑞𝑡 = 𝑞1𝑡 + 𝑞2𝑡 and price 𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡
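This simulation step can be sketched as follows; $F_1$ and $F_2$ stand for whichever pair of decision rules is being evaluated, and the helper function is our own illustration rather than the lecture's code:

```python
import numpy as np

a0, a1 = 10.0, 2.0   # demand parameters from the lecture

def simulate_baseline(A, B1, B2, F1, F2, x0, T):
    """Simulate x_{t+1} = (A - B1 F1 - B2 F2) x_t under the baseline
    (undistorted) transition law; return output and price paths."""
    AO = A - B1 @ F1 - B2 @ F2
    x = np.empty((3, T))
    x[:, 0] = x0
    for t in range(T - 1):
        x[:, t + 1] = AO @ x[:, t]
    q = x[1] + x[2]        # industry output q1 + q2
    p = a0 - a1 * q        # price from the inverse demand curve
    return q, p
```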
Here we set the robustness and volatility matrix parameters as follows:
• 𝜃1 = 0.02
• 𝜃2 = 0.04
• $C = \begin{pmatrix} 0 \\ 0.01 \\ 0.01 \end{pmatrix}$
• both firms fear that the baseline specification of the state transition dynamics is incorrect
• firm 1 fears misspecification more than firm 2
The following code prepares graphs that compare market-wide output 𝑞1𝑡 + 𝑞2𝑡 and the price
of the good 𝑝𝑡 under equilibrium decision rules 𝐹𝑖 , 𝑖 = 1, 2 from an ordinary Markov perfect
equilibrium and the decision rules under a Markov perfect equilibrium with robust firms with
multiplier parameters 𝜃𝑖 , 𝑖 = 1, 2 set as described above
Both industry output and price are under the transition dynamics associated with the baseline model; only the decision rules 𝐹𝑖 differ across the two equilibrium objects presented
ax = axes[0]
ax.plot(q, 'g-', lw=2, alpha=0.75, label='MPE output')
ax.plot(qr, 'm-', lw=2, alpha=0.75, label='RMPE output')
ax.set(ylabel="output", xlabel="time", ylim=(2, 4))
ax.legend(loc='upper left', frameon=0)
ax = axes[1]
ax.plot(p, 'g-', lw=2, alpha=0.75, label='MPE price')
ax.plot(pr, 'm-', lw=2, alpha=0.75, label='RMPE price')
ax.set(ylabel="price", xlabel="time")
ax.legend(loc='upper right', frameon=0)
plt.show()
Under the dynamics associated with the baseline model, the price path is higher with the
Markov perfect equilibrium robust decision rules than it is with decision rules for the ordinary
Markov perfect equilibrium
So is the industry output path
To dig a little beneath the forces driving these outcomes, we want to plot 𝑞1𝑡 and 𝑞2𝑡 in the Markov perfect equilibrium with robust firms and to compare them with corresponding objects in the Markov perfect equilibrium without robust firms
ax = axes[0]
ax.plot(q1, 'g-', lw=2, alpha=0.75, label='firm 1 MPE output')
ax.plot(qr1, 'b-', lw=2, alpha=0.75, label='firm 1 RMPE output')
ax.set(ylabel="output", xlabel="time", ylim=(1, 2))
ax.legend(loc='upper left', frameon=0)
ax = axes[1]
ax.plot(q2, 'g-', lw=2, alpha=0.75, label='firm 2 MPE output')
ax.plot(qr2, 'r-', lw=2, alpha=0.75, label='firm 2 RMPE output')
ax.set(ylabel="output", xlabel="time", ylim=(1, 2))
ax.legend(loc='upper left', frameon=0)
plt.show()
Evidently, firm 1's output path is substantially lower when firms are robust than it otherwise would be, while firm 2's output path is virtually the same as it would be in an ordinary Markov perfect equilibrium with no robust firms
Recall that we have set 𝜃1 = .02 and 𝜃2 = .04, so that firm 1 fears misspecification of the
baseline model substantially more than does firm 2
• but also please notice that firm 2’s behavior in the Markov perfect equilibrium with ro-
bust firms responds to the decision rule 𝐹1 𝑥𝑡 employed by firm 1
• thus it is something of a coincidence that its output is almost the same in the two equi-
libria
Larger concerns about misspecification induce firm 1 to be more cautious than firm 2 in pre-
dicting market price and the output of the other firm
To explore this, we study next how ex-post the two firms’ beliefs about state dynamics differ
in the Markov perfect equilibrium with robust firms
(by ex-post we mean after extremization of each firm’s intertemporal objective)
Heterogeneous Beliefs
As before, let $A^o = A - B_1 F_1^r - B_2 F_2^r$, where in a robust MPE, $F_i^r$ is a robust decision rule for firm $i$
Worst-case forecasts of 𝑥𝑡 starting from 𝑡 = 0 differ between the two firms
This means that worst-case forecasts of industry output 𝑞1𝑡 + 𝑞2𝑡 and price 𝑝𝑡 also differ be-
tween the two firms
To find these worst-case beliefs, we compute the following three “closed-loop” transition ma-
trices
• $A^o$
• $A^o + C K_1$
• $A^o + C K_2$
We call the first transition law, namely, 𝐴𝑜 , the baseline transition under firms’ robust deci-
sion rules
We call the second and third the worst-case transitions under the robust decision rules for firms 1 and 2, respectively
From {𝑥𝑡 } paths generated by each of these transition laws, we pull off the associated price
and total output sequences
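Generating the paths under the three transition laws can be sketched as follows, where $K_1$ and $K_2$ are the worst-case shock matrices in $v_{it} = K_{it} x_t$ (the helper below is our own illustrative sketch):

```python
import numpy as np

def closed_loop_paths(A, B1, B2, C, F1, F2, K1, K2, x0, T):
    """Simulate x_t under the baseline law AO and under each firm's
    worst-case law AO + C K_i; return total output q1 + q2 for each."""
    AO = A - B1 @ F1 - B2 @ F2
    out = {}
    for name, lom in [('baseline', AO),
                      ('worst-case firm 1', AO + C @ K1),
                      ('worst-case firm 2', AO + C @ K2)]:
        x = np.empty((3, T))
        x[:, 0] = x0
        for t in range(T - 1):
            x[:, t + 1] = lom @ x[:, t]
        out[name] = x[1] + x[2]   # industry output path
    return out
```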
The following code plots them
In [10]: # == Plot == #
fig, axes = plt.subplots(2, 1, figsize=(9, 9))
ax = axes[0]
ax.plot(qrp1, 'b--', lw=2, alpha=0.75, label='RMPE worst-case belief output player 1')
ax.plot(qrp2, 'r:', lw=2, alpha=0.75, label='RMPE worst-case belief output player 2')
ax.plot(qr, 'm-', lw=2, alpha=0.75, label='RMPE output')
ax.set(ylabel="output", xlabel="time", ylim=(2, 4))
ax.legend(loc='upper left', frameon=0)
ax = axes[1]
ax.plot(prp1, 'b--', lw=2, alpha=0.75, label='RMPE worst-case belief price player 1')
ax.plot(prp2, 'r:', lw=2, alpha=0.75, label='RMPE worst-case belief price player 2')
ax.plot(pr, 'm-', lw=2, alpha=0.75, label='RMPE price')
ax.set(ylabel="price", xlabel="time")
ax.legend(loc='upper right', frameon=0)
plt.show()
We see from the above graph that under robustness concerns, player 1 and player 2 have heterogeneous beliefs about total output and the goods price even though they share the same baseline model and information
• firm 1 thinks that total output will be higher and price lower than does firm
2
• this leads firm 1 to produce less than firm 2
These beliefs justify (or rationalize) the Markov perfect equilibrium robust decision rules
This means that the robust rules are the unique optimal rules (or best responses) to the in-
dicated worst-case transition dynamics
([52] discuss how this property of robust decision rules is connected to the concept of admissi-
bility in Bayesian statistical decision theory)
53
Uncertainty Traps
53.1 Contents
• Overview 53.2
• Implementation 53.4
• Results 53.5
• Exercises 53.6
• Solutions 53.7
53.2 Overview
894 53. UNCERTAINTY TRAPS
Uncertainty traps stem from a positive externality: high levels of aggregate economic activity generate valuable information
The original model described in [42] has many interesting moving parts
Here we examine a simplified version that nonetheless captures many of the key ideas
53.3.1 Fundamentals

The evolution of the fundamental process $\{\theta_t\}$ is given by

$$\theta_{t+1} = \rho \theta_t + \sigma_\theta w_{t+1}$$

where

• $\sigma_\theta > 0$ and $0 < \rho < 1$
• $\{w_t\}$ is IID and standard normal
53.3.2 Output

There is a total $\bar{M}$ of risk-averse entrepreneurs; output of the $m$-th entrepreneur, conditional on being active in the market, is

$$x_m = \theta + \epsilon_m \quad \text{where} \quad \epsilon_m \sim N(0, \gamma_x^{-1}) \tag{1}$$
Dropping time subscripts, beliefs for current 𝜃 are represented by the normal distribution
𝑁 (𝜇, 𝛾 −1 )
Here 𝛾 is the precision of beliefs; its inverse is the degree of uncertainty
These parameters are updated by Kalman filtering
Let

• $M$ denote the number of entrepreneurs currently active in the market
• $X$ denote the average output $\frac{1}{M} \sum_m x_m$ of the currently active entrepreneurs
With this notation and primes for next period values, we can write the updating of the mean
and precision via
$$\mu' = \rho \, \frac{\gamma \mu + M \gamma_x X}{\gamma + M \gamma_x} \tag{2}$$

$$\gamma' = \left( \frac{\rho^2}{\gamma + M \gamma_x} + \sigma_\theta^2 \right)^{-1} \tag{3}$$
These are standard Kalman filtering results applied to the current setting
Exercise 1 provides more details on how Eq. (2) and Eq. (3) are derived and then asks you to
fill in remaining steps
The next figure plots the law of motion for the precision in Eq. (3) as a 45 degree diagram,
with one curve for each 𝑀 ∈ {0, … , 6}
The other parameter values are 𝜌 = 0.99, 𝛾𝑥 = 0.5, 𝜎𝜃 = 0.5
Points where the curves hit the 45 degree lines are long-run steady states for precision for different values of 𝑀

Thus, if one of these values for 𝑀 remains fixed, a corresponding steady state is the equilibrium level of precision
• high values of 𝑀 correspond to greater information about the fundamental, and hence
more precision in steady state
• low values of 𝑀 correspond to less information and more uncertainty in steady state
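These long-run steady states can be computed directly by iterating the law of motion Eq. (3) to convergence; a minimal sketch:

```python
def γ_update(γ, M, ρ=0.99, γ_x=0.5, σ_θ=0.5):
    """Law of motion (3) for the precision of beliefs."""
    return 1 / (ρ**2 / (γ + M * γ_x) + σ_θ**2)

def steady_state_precision(M, γ0=4.0, tol=1e-10, max_iter=100_000):
    """Iterate (3) to convergence for a fixed number of active firms M."""
    γ = γ0
    for _ in range(max_iter):
        γ_new = γ_update(γ, M)
        if abs(γ_new - γ) < tol:
            break
        γ = γ_new
    return γ_new
```

Higher `M` yields a higher steady-state precision, matching the bullet points above.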
53.3.4 Participation
Omitting time subscripts once more, entrepreneurs enter the market in the current period if

$$\mathbb{E}[u(x_m - F_m)] > c \tag{4}$$

Here
• the mathematical expectation of 𝑥𝑚 is based on Eq. (1) and beliefs 𝑁 (𝜇, 𝛾 −1 ) for 𝜃
• 𝐹𝑚 is a stochastic but pre-visible fixed cost, independent across time and firms
• 𝑐 is a constant reflecting opportunity costs
The statement that 𝐹𝑚 is pre-visible means that it is realized at the start of the period and
treated as a constant in Eq. (4)
The utility function has the constant absolute risk aversion form
$$u(x) = \frac{1}{a} \left( 1 - \exp(-ax) \right) \tag{5}$$
where 𝑎 is a positive parameter
Combining Eq. (4) and Eq. (5), entrepreneur 𝑚 participates in the market (or is said to be
active) when
$$\frac{1}{a} \left\{ 1 - \mathbb{E} \left[ \exp \left( -a(\theta + \epsilon_m - F_m) \right) \right] \right\} > c$$
Using standard formulas for expectations of lognormal random variables, this is equivalent to
the condition
$$\psi(\mu, \gamma, F_m) := \frac{1}{a} \left( 1 - \exp \left( -a\mu + aF_m + \frac{a^2 \left( \frac{1}{\gamma} + \frac{1}{\gamma_x} \right)}{2} \right) \right) - c > 0 \tag{6}$$
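Condition Eq. (6) is straightforward to code; in the sketch below the default parameter values mirror those used in the simulations later in the lecture:

```python
import numpy as np

def ψ(μ, γ, F, a=1.5, γ_x=0.5, c=-420):
    """Eq. (6): expected utility gain from market entry, net of the
    opportunity cost c.  The entrepreneur is active when ψ > 0."""
    expected_u = (1 / a) * (1 - np.exp(-a * μ + a * F
                                       + a**2 * (1 / γ + 1 / γ_x) / 2))
    return expected_u - c
```

Note that ψ is increasing in beliefs μ and precision γ, and decreasing in the fixed cost F.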
53.4 Implementation

We want to simulate this economy

As a first step, let's put together a class that bundles

• the parameters, the current value of $\theta$ and the current values of the two belief parameters $\mu$ and $\gamma$
• methods to update 𝜃, 𝜇 and 𝛾, as well as to determine the number of active firms and
their outputs
The updating methods follow the laws of motion for 𝜃, 𝜇 and 𝛾 given above
The method to evaluate the number of active firms generates 𝐹1 , … , 𝐹𝑀̄ and tests condition
Eq. (6) for each firm
The init method encodes as default values the parameters we’ll use in the simulations below
class UncertaintyTrapEcon:

    def __init__(self,
                 a=1.5,          # Risk aversion
                 γ_x=0.5,        # Production shock precision
                 ρ=0.99,         # Correlation coefficient for θ
                 σ_θ=0.5,        # Standard dev of θ shock
                 num_firms=100,  # Number of firms
                 σ_F=1.5,        # Standard dev of fixed costs
                 c=-420,         # External opportunity cost
                 μ_init=0,       # Initial value for μ
                 γ_init=4,       # Initial value for γ
                 θ_init=0):      # Initial value for θ

        # == Record values == #
        self.a, self.γ_x, self.ρ, self.σ_θ = a, γ_x, ρ, σ_θ
        self.num_firms, self.σ_F, self.c = num_firms, σ_F, c
        self.σ_x = np.sqrt(1/γ_x)

        # == Initialize states == #
        self.γ, self.μ, self.θ = γ_init, μ_init, θ_init

    def ψ(self, F):
        temp1 = -self.a * (self.μ - F)
        temp2 = self.a**2 * (1/self.γ + 1/self.γ_x) / 2
        return (1 / self.a) * (1 - np.exp(temp1 + temp2)) - self.c

    def update_beliefs(self, X, M):
        """
        Update beliefs (μ, γ) based on aggregates X and M.
        """
        # Simplify names
        γ_x, ρ, σ_θ = self.γ_x, self.ρ, self.σ_θ
        # Update μ
        temp1 = ρ * (self.γ * self.μ + M * γ_x * X)
        temp2 = self.γ + M * γ_x
        self.μ = temp1 / temp2
        # Update γ
        self.γ = 1 / (ρ**2 / (self.γ + M * γ_x) + σ_θ**2)

    def update_θ(self, w):
        """
        Update the fundamental state θ given shock w.
        """
        self.θ = self.ρ * self.θ + self.σ_θ * w

    def gen_aggregates(self):
        """
        Generate aggregates based on current beliefs (μ, γ). This
        is a simulation step that depends on the draws for F.
        """
        F_vals = self.σ_F * np.random.randn(self.num_firms)
        M = np.sum(self.ψ(F_vals) > 0)  # Counts number of active firms
        if M > 0:
            x_vals = self.θ + self.σ_x * np.random.randn(M)
            X = x_vals.mean()
        else:
            X = 0
        return X, M
In the results below we use this code to simulate time series for the major variables
53.5 Results
Let’s look first at the dynamics of 𝜇, which the agents use to track 𝜃
We see that 𝜇 tracks 𝜃 well when there are sufficient firms in the market
However, there are times when 𝜇 tracks 𝜃 poorly due to insufficient information
These are episodes where the uncertainty traps take hold
During these episodes

• precision is low and uncertainty is high
• output is suppressed
To get a clearer idea of the dynamics, let’s look at all the main time series at once, for a given
set of shocks
Notice how the traps only take hold after a sequence of bad draws for the fundamental
Thus, the model gives us a propagation mechanism that maps bad random draws into long
downturns in economic activity
53.6 Exercises
53.6.1 Exercise 1
Fill in the details behind Eq. (2) and Eq. (3) based on the following standard result (see, e.g.,
p. 24 of [136])
Fact Let x = (𝑥1 , … , 𝑥𝑀 ) be a vector of IID draws from common distribution 𝑁 (𝜃, 1/𝛾𝑥 ) and
let 𝑥̄ be the sample mean. If 𝛾𝑥 is known and the prior for 𝜃 is 𝑁 (𝜇, 1/𝛾), then the posterior
distribution of $\theta$ given $\mathbf{x}$ is $N(\mu_0, 1/\gamma_0)$

where
$$\mu_0 = \frac{\mu \gamma + M \bar{x} \gamma_x}{\gamma + M \gamma_x} \quad \text{and} \quad \gamma_0 = \gamma + M \gamma_x$$
53.6.2 Exercise 2

Modulo randomness, replicate the simulation figures shown above

• Use the parameter values listed as defaults in the __init__ method of the UncertaintyTrapEcon class
53.7 Solutions
In [2]: import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import itertools
53.7.1 Exercise 1
This exercise asked you to validate the laws of motion for 𝛾 and 𝜇 given in the lecture, based
on the stated result about Bayesian updating in a scalar Gaussian setting. The stated result
tells us that after observing average output 𝑋 of the 𝑀 firms, our posterior beliefs will be
𝑁 (𝜇0 , 1/𝛾0 )
where
$$\mu_0 = \frac{\mu \gamma + M X \gamma_x}{\gamma + M \gamma_x} \quad \text{and} \quad \gamma_0 = \gamma + M \gamma_x$$
If we take a random variable 𝜃 with this distribution and then evaluate the distribution of
𝜌𝜃 + 𝜎𝜃 𝑤 where 𝑤 is independent and standard normal, we get the expressions for 𝜇′ and 𝛾 ′
given in the lecture
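The two-step composition can be verified numerically with arbitrary illustrative values:

```python
import numpy as np

# Illustrative values (not from the lecture) for the current state
ρ, γ_x, σ_θ = 0.99, 0.5, 0.5
μ, γ, M, X = 0.5, 4.0, 3, 0.8

# Step 1: Bayesian update after observing the mean output X of M firms
γ0 = γ + M * γ_x
μ0 = (μ * γ + M * γ_x * X) / γ0

# Step 2: push the posterior through θ' = ρ θ + σ_θ w
μ_prime = ρ * μ0
γ_prime = 1 / (ρ**2 / γ0 + σ_θ**2)

# The composition reproduces Eq. (2) and Eq. (3)
assert np.isclose(μ_prime, ρ * (γ * μ + M * γ_x * X) / (γ + M * γ_x))
assert np.isclose(γ_prime, (ρ**2 / (γ + M * γ_x) + σ_θ**2)**(-1))
```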
53.7.2 Exercise 2
First, let’s replicate the plot that illustrates the law of motion for precision, which is
$$\gamma_{t+1} = \left( \frac{\rho^2}{\gamma_t + M \gamma_x} + \sigma_\theta^2 \right)^{-1}$$
Here 𝑀 is the number of active firms. The next figure plots 𝛾𝑡+1 against 𝛾𝑡 on a 45 degree
diagram for different values of 𝑀
In [3]: ρ, γ_x, σ_θ = 0.99, 0.5, 0.5

fig, ax = plt.subplots(figsize=(9, 9))
γ = np.linspace(1e-10, 3, 200)  # γ grid
ax.plot(γ, γ, 'k-')             # 45 degree line
for M in range(7):
    γ_next = 1 / (ρ**2 / (γ + M * γ_x) + σ_θ**2)
    label_string = f"$M = {M}$"
    ax.plot(γ, γ_next, lw=2, label=label_string)

ax.legend(loc='lower right', fontsize=14)
ax.set_xlabel(r'$\gamma$', fontsize=16)
ax.set_ylabel(r"$\gamma'$", fontsize=16)
ax.grid()
plt.show()
The points where the curves hit the 45 degree lines are the long-run steady states corresponding to each 𝑀, if that value of 𝑀 was to remain fixed. As the number of firms falls, so does the long-run steady state of precision
Next let’s generate time series for beliefs and the aggregates – that is, the number of active
firms and average output
In [4]: econ = UncertaintyTrapEcon()

sim_length = 2000
μ_vec = np.empty(sim_length)
θ_vec = np.empty(sim_length)
γ_vec = np.empty(sim_length)
X_vec = np.empty(sim_length)
M_vec = np.empty(sim_length)
μ_vec[0] = econ.μ
γ_vec[0] = econ.γ
θ_vec[0] = 0
w_shocks = np.random.randn(sim_length)
for t in range(sim_length-1):
    X, M = econ.gen_aggregates()
    X_vec[t] = X
    M_vec[t] = M
    econ.update_beliefs(X, M)
    econ.update_θ(w_shocks[t])
    μ_vec[t+1] = econ.μ
    γ_vec[t+1] = econ.γ
    θ_vec[t+1] = econ.θ
plt.show()
If you run the code above you’ll get different plots, of course
Try experimenting with different parameters to see the effects on the time series
(It would also be interesting to experiment with non-Gaussian distributions for the shocks,
but this is a big exercise since it takes us outside the world of the standard Kalman filter)
54

The Aiyagari Model
54.1 Contents
• Overview 54.2
• Firms 54.4
• Code 54.5
In addition to what’s in Anaconda, this lecture will need the following libraries
54.2 Overview
In this lecture, we describe the structure of a class of models that build on work by Truman
Bewley [15]
We begin by discussing an example of a Bewley model due to Rao Aiyagari
The model features
• Heterogeneous agents
• A single exogenous vehicle for borrowing and lending
• Limits on amounts individual agents may borrow
The Aiyagari model has been used to investigate many topics, including

• precautionary savings and the effect of liquidity constraints
• risk sharing and asset pricing
• the shape of the wealth distribution
908 54. THE AIYAGARI MODEL
54.2.1 References
54.3.1 Households
$$\max \, \mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t)$$
subject to

$$a_{t+1} + c_t \leq w z_t + (1 + r) a_t, \qquad c_t \geq 0, \qquad a_t \geq -B$$
where
• 𝑐𝑡 is current consumption
• 𝑎𝑡 is assets
• 𝑧𝑡 is an exogenous component of labor income capturing stochastic unemployment risk,
etc.
• 𝑤 is a wage rate
• 𝑟 is a net interest rate
• 𝐵 is the maximum amount that the agent is allowed to borrow
The exogenous process {𝑧𝑡 } follows a finite state Markov chain with given stochastic matrix
𝑃
The wage and interest rate are fixed over time
In this simple version of the model, households supply labor inelastically because they do not
value leisure
54.4 Firms
Firms produce output by hiring capital and labor, with the Cobb-Douglas technology

$$Y_t = A K_t^\alpha N^{1-\alpha}$$

where

• $A$ and $\alpha$ are parameters with $A > 0$ and $\alpha \in (0, 1)$
• $K_t$ is aggregate capital and $N$ is total labor supply
• $\delta$ below is the depreciation rate of capital

From the firm's first-order condition for capital, the inverse demand for capital is

$$r = A\alpha \left( \frac{N}{K} \right)^{1-\alpha} - \delta \tag{1}$$

Using this expression and the firm's first-order condition for labor, we can pin down the equilibrium wage rate as a function of $r$ as

$$w(r) = A(1 - \alpha)(A\alpha / (r + \delta))^{\alpha / (1 - \alpha)} \tag{2}$$
54.4.1 Equilibrium

We construct a stationary rational expectations equilibrium (SREE)

In such an equilibrium
• prices induce behavior that generates aggregate quantities consistent with the prices
• aggregate quantities and prices are constant over time
In more detail, an SREE lists a set of prices, savings and production policies such that
• households want to choose the specified savings policies taking the prices as given
• firms maximize profits taking the same prices as given
• the resulting aggregate quantities are consistent with the prices; in particular, the de-
mand for capital equals the supply
• aggregate quantities (defined as cross-sectional averages) are constant
In practice, once parameter values are set, we can check for an SREE by the following steps

1. pick a proposed quantity $K$ for aggregate capital

2. determine corresponding prices, with interest rate $r$ determined by Eq. (1) and a wage rate $w(r)$ as given in Eq. (2)
3. determine the common optimal savings policy of the households given these prices

4. compute aggregate capital as the mean of steady state capital given this savings policy

If this final quantity agrees with $K$, then we have an SREE
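The fixed-point logic of these steps can be sketched by replacing steps 3-4 with a stand-in supply curve (the function `capital_supply` below is purely hypothetical; the real model derives savings from the household problem):

```python
import numpy as np

def rd(K, A=1.0, N=1.0, α=0.33, δ=0.05):
    """Step 2: interest rate from Eq. (1), given proposed capital K."""
    return A * α * (N / K)**(1 - α) - δ

def capital_supply(r, a=2.5, b=40.0):
    """Hypothetical stand-in for steps 3-4: an increasing map from
    the interest rate to aggregate household savings."""
    return a + b * r

# Bisection on K: stop when proposed K matches implied savings
lo, hi = 1.0, 20.0
for _ in range(60):
    mid = (lo + hi) / 2
    if capital_supply(rd(mid)) > mid:
        lo = mid   # savings exceed proposed capital, raise K
    else:
        hi = mid
K_star = (lo + hi) / 2
assert abs(capital_supply(rd(K_star)) - K_star) < 1e-6
```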
54.5 Code
class Household:
"""
This class takes the parameters that define a household asset accumulation
problem and computes the corresponding reward and transition matrices R
and Q required to generate an instance of DiscreteDP, and thereby solve
for the optimal policy.
"""
def __init__(self, r=0.01, w=1.0, β=0.96, a_min=1e-10,
             Π=[[0.9, 0.1], [0.1, 0.9]], z_vals=[0.1, 1.0],
             a_max=18, a_size=200):
    # Store values, set up grids over a and z
    self.r, self.w, self.β = r, w, β
    self.a_min, self.a_max, self.a_size = a_min, a_max, a_size
    self.Π = np.asarray(Π)
    self.z_vals = np.asarray(z_vals)
    self.z_size = len(z_vals)
    self.a_vals = np.linspace(a_min, a_max, a_size)
    self.n = a_size * self.z_size
    self.Q = np.zeros((self.n, a_size, self.n))
    self.build_Q()
    self.R = np.empty((self.n, a_size))
    self.build_R()

def set_prices(self, r, w):
    """Reset prices and trigger a re-build of R."""
    self.r, self.w = r, w
    self.build_R()

def build_Q(self):
    populate_Q(self.Q, self.a_size, self.z_size, self.Π)

def build_R(self):
    self.R.fill(-np.inf)
    populate_R(self.R, self.a_size, self.z_size, self.a_vals,
               self.z_vals, self.r, self.w)
@jit(nopython=True)
def populate_R(R, a_size, z_size, a_vals, z_vals, r, w):
    n = a_size * z_size
    for s_i in range(n):
        a_i = s_i // z_size
        z_i = s_i % z_size
        a = a_vals[a_i]
        z = z_vals[z_i]
        for new_a_i in range(a_size):
            a_new = a_vals[new_a_i]
            c = w * z + (1 + r) * a - a_new
            if c > 0:
                R[s_i, new_a_i] = np.log(c)  # Utility

@jit(nopython=True)
def populate_Q(Q, a_size, z_size, Π):
    n = a_size * z_size
    for s_i in range(n):
        z_i = s_i % z_size
        for a_i in range(a_size):
            for next_z_i in range(z_size):
                Q[s_i, a_i, a_i * z_size + next_z_i] = Π[z_i, next_z_i]
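A useful property to check is that each conditional distribution Q[s, a, :] constructed this way sums to one. A small self-contained sketch with hypothetical grid sizes:

```python
import numpy as np

# Hypothetical small grids for illustration
a_size, z_size = 3, 2
Π = np.array([[0.9, 0.1], [0.1, 0.9]])
n = a_size * z_size

Q = np.zeros((n, a_size, n))
for s_i in range(n):
    z_i = s_i % z_size
    for a_i in range(a_size):
        for next_z_i in range(z_size):
            # Next state index pairs the chosen a_i with the next z draw
            Q[s_i, a_i, a_i * z_size + next_z_i] = Π[z_i, next_z_i]

# Every (state, action) pair yields a proper probability distribution
assert np.allclose(Q.sum(axis=2), 1.0)
```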
@jit(nopython=True)
def asset_marginal(s_probs, a_size, z_size):
    a_probs = np.zeros(a_size)
    for a_i in range(a_size):
        for z_i in range(z_size):
            a_probs[a_i] += s_probs[a_i * z_size + z_i]
    return a_probs
As a first example of what we can do, let’s compute and plot an optimal accumulation policy
at fixed prices
# Example prices
r = 0.03
w = 0.956
# Simplify names
z_size, a_size = am.z_size, am.a_size
z_vals, a_vals = am.z_vals, am.a_vals
n = a_size * z_size
# Get all optimal actions across the set of a indices with z fixed in each row
a_star = np.empty((z_size, a_size))
for s_i in range(n):
    a_i = s_i // z_size
    z_i = s_i % z_size
    a_star[z_i, a_i] = a_vals[results.sigma[s_i]]
plt.show()
The plot shows asset accumulation policies at different values of the exogenous state
Now we want to calculate the equilibrium
Let’s do this visually as a first pass
The following code draws aggregate supply and demand curves
The intersection gives equilibrium interest rates and capital
In [4]: A = 1.0
N = 1.0
α = 0.33
β = 0.96
δ = 0.05
def r_to_w(r):
    """
    Equilibrium wages associated with a given interest rate r.
    """
    return A * (1 - α) * (A * α / (r + δ))**(α / (1 - α))

def rd(K):
    """
    Inverse demand curve for capital. The interest rate associated with a
    given demand for capital K.
    """
    return A * α * (N / K)**(1 - α) - δ
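As a sanity check, these two maps are mutually consistent: inverting Eq. (1) at any K and applying r_to_w recovers the marginal product of labor at that same K:

```python
import numpy as np

A, N, α, δ = 1.0, 1.0, 0.33, 0.05

def rd(K):
    """Eq. (1): interest rate implied by capital demand K."""
    return A * α * (N / K)**(1 - α) - δ

def r_to_w(r):
    """Eq. (2): equilibrium wage associated with interest rate r."""
    return A * (1 - α) * (A * α / (r + δ))**(α / (1 - α))

K = 5.0
w_direct = A * (1 - α) * (K / N)**α   # marginal product of labor at K
assert np.isclose(r_to_w(rd(K)), w_direct)
```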
def prices_to_capital_stock(am, r):
    """
    Map prices to the induced level of capital stock.

    Parameters
    ----------
    am : Household
        An instance of an aiyagari_household.Household
    r : float
        The interest rate
    """
    w = r_to_w(r)
    am.set_prices(r, w)
    aiyagari_ddp = DiscreteDP(am.R, am.Q, β)
    # Compute the optimal policy
    results = aiyagari_ddp.solve(method='policy_iteration')
    # Compute the stationary distribution
    stationary_probs = results.mc.stationary_distributions[0]
    # Extract the marginal distribution for assets
    asset_probs = asset_marginal(stationary_probs, am.a_size, am.z_size)
    # Return K
    return np.sum(asset_probs * am.a_vals)
plt.show()
55

Default Risk and Income Fluctuations
55.1 Contents
• Overview 55.2
• Structure 55.3
• Equilibrium 55.4
• Computation 55.5
• Results 55.6
• Exercises 55.7
• Solutions 55.8
In addition to what’s in Anaconda, this lecture will need the following libraries
55.2 Overview

This lecture computes versions of Arellano's [8] model of sovereign default
916 55. DEFAULT RISK AND INCOME FLUCTUATIONS
The interest rate on government debt adjusts in response to the state-dependent default
probability chosen by government
The model yields outcomes that help interpret sovereign default experiences, including

• countercyclical interest rates on sovereign debt
• countercyclical trade balances

Notably, long recessions caused by bad draws in the income process increase the government's incentive to default
This can lead to

• spikes in interest rates
• temporary losses of access to international credit markets
55.3 Structure
A small open economy is endowed with an exogenous stochastically fluctuating potential output stream {𝑦𝑡 }
Potential output is realized only in periods in which the government honors its sovereign debt
The output good can be traded or consumed
The sequence {𝑦𝑡 } is described by a Markov process with stochastic density kernel 𝑝(𝑦, 𝑦′ )
Households within the country are identical and rank stochastic consumption streams accord-
ing to
$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t) \tag{1}$$
Here

• $0 < \beta < 1$ is a time discount factor
• $u$ is an increasing and strictly concave utility function
The government is the only domestic actor with access to foreign credit
Because household are averse to consumption fluctuations, the government will try to smooth
consumption by borrowing from (and lending to) foreign creditors
The only credit instrument available to the government is a one-period bond traded in international credit markets
The bond market has the following features
• A purchase of a bond with face value 𝐵′ is a claim to 𝐵′ units of the consumption good
next period
• For selling −𝐵′ units of next period goods the seller earns −𝑞𝐵′ of today’s goods
– if 𝐵′ < 0, then −𝑞𝐵′ units of the good are received in the current period, for a
promise to repay −𝐵′ units next period
– there is an equilibrium price function 𝑞(𝐵′ , 𝑦) that makes 𝑞 depend on both 𝐵′ and
𝑦
Earnings on the government portfolio are distributed (or, if negative, taxed) lump sum to
households
When the government is not excluded from financial markets, the one-period national budget constraint is

$$c = y + B - q(B', y) B' \tag{2}$$
Here and below, a prime denotes a next period value or a claim maturing next period
To rule out Ponzi schemes, we also require that 𝐵 ≥ −𝑍 in every period
Foreign creditors

• are risk neutral
• know the domestic output stochastic process $\{y_t\}$ and observe $y_t, y_{t-1}, \ldots$ at time $t$
• can borrow or lend without limit in an international credit market at a constant international interest rate $r$
• receive full payment if the government chooses to pay
• receive zero if the government defaults on its one-period debt due
When a government is expected to default next period with probability 𝛿, the expected value
of a promise to pay one unit of consumption next period is 1 − 𝛿
Therefore, the discounted expected value of a promise to pay 𝐵 next period is
$$q = \frac{1 - \delta}{1 + r} \tag{3}$$
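Eq. (3) is a one-liner in code; the value of $r$ below is an assumed illustration, not one taken from the text:

```python
def bond_price(δ, r=0.017):
    """Eq. (3): price of a one-period bond that pays one unit next
    period unless the government defaults (probability δ).
    r = 0.017 is an assumed risk-free rate for illustration."""
    return (1 - δ) / (1 + r)
```

The price falls as the default probability rises, embedding the risk premium.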
Next we turn to how the government in effect chooses the default probability 𝛿
In each period, the government chooses between

1. defaulting

2. meeting its current obligations and purchasing or selling an optimal quantity of one-period sovereign debt
If the government defaults in the current period, then

• output immediately falls from $y$ to $h(y)$, where $0 \leq h(y) \leq y$
• it returns to $y$ only after the country regains access to international credit markets
While in a state of default, the economy regains access to foreign credit in each subsequent
period with probability 𝜃
55.4 Equilibrium
1. The interest rate on the government's debt includes a risk premium sufficient to make foreign creditors expect on average to earn the constant risk-free international interest rate
To express these ideas more precisely, consider first the choices of the government, which

1. enters a period with initial assets $B$, or what is the same thing, initial debt to be repaid now of $-B$

2. observes current output $y$, and

3. chooses either

   1. to default, or
   2. to pay $-B$ and set next period's debt due to $-B'$
In a recursive formulation,

• state variables for the government are $(B, y)$
• $v(B, y)$ is the optimum value of the government's problem when, at the beginning of a period, it faces the choice of whether to honor or default
• $v_c(B, y)$ is the value of choosing to pay obligations falling due
• $v_d(y)$ is the value of choosing to default
𝑣𝑑 (𝑦) does not depend on 𝐵 because, when access to credit is eventually regained, net foreign
assets equal 0
Expressed recursively, the value of conducting business as usual (that is, not defaulting) is

𝑣𝑐(𝐵, 𝑦) = max_{𝐵′ ≥ −𝑍} {𝑢(𝑦 − 𝑞(𝐵′, 𝑦)𝐵′ + 𝐵) + 𝛽 ∫ 𝑣(𝐵′, 𝑦′)𝑝(𝑦, 𝑦′)𝑑𝑦′}
Given zero profits for foreign creditors in equilibrium, we can combine Eq. (3) and Eq. (4) to
pin down the bond price function:
𝑞(𝐵′, 𝑦) = (1 − 𝛿(𝐵′, 𝑦))/(1 + 𝑟)    (5)
An equilibrium is a price function 𝑞(𝐵′, 𝑦), a triple of value functions (𝑣𝑐(𝐵, 𝑦), 𝑣𝑑(𝑦), 𝑣(𝐵, 𝑦)), a default decision rule, and an asset accumulation decision rule such that
• The three Bellman equations for (𝑣𝑐 (𝐵, 𝑦), 𝑣𝑑 (𝑦), 𝑣(𝐵, 𝑦)) are satisfied
• Given the price function 𝑞(𝐵′ , 𝑦), the default decision rule and the asset accumulation
decision rule attain the optimal value function 𝑣(𝐵, 𝑦), and
• The price function 𝑞(𝐵′ , 𝑦) satisfies equation Eq. (5)
55.5 Computation
• The appendix to [8] recommends value function iteration until convergence, updating
the price, and then repeating
• Instead, we update the bond price at every value function iteration step
The second approach is faster and the two different procedures deliver very similar results
Here is a more detailed description of our algorithm:
1. Update the value function 𝑣(𝐵, 𝑦), the default rule, the implied ex ante default probability, and the price function
2. Check for convergence. If converged, stop – if not, go to step 1
In [2]: """
"""
import numpy as np
import random
import quantecon as qe
from numba import jit
class Arellano_Economy:
"""
Arellano 2008 deals with a small open economy whose government
invests in foreign assets in order to smooth the consumption of
domestic households. Domestic households receive a stochastic
path of income.
Parameters
----------
β : float
Time discounting parameter
γ : float
Risk-aversion parameter
r : float
International lending rate
ρ : float
Persistence in the income process
η : float
Standard deviation of the income process
θ : float
Probability of re-entering financial markets in each period
ny : int
Number of points in y grid
nB : int
Number of points in B grid
tol : float
Error tolerance in iteration
maxit : int
Maximum number of iterations
"""
def __init__(self,
β=.953, # time discount rate
γ=2., # risk aversion
r=0.017, # international interest rate
ρ=.945, # persistence in output
η=0.025, # st dev of output shock
θ=0.282, # prob of regaining access
ny=21, # number of points in y grid
nB=251, # number of points in B grid
tol=1e-8, # error tolerance in iteration
maxit=10000):
# Save parameters
self.β, self.γ, self.r = β, γ, r
self.ρ, self.η, self.θ = ρ, η, θ
922 55. DEFAULT RISK AND INCOME FLUCTUATIONS
# Allocate memory
self.Vd = np.zeros(ny)
self.Vc = np.zeros((ny, nB))
self.V = np.zeros((ny, nB))
self.Q = np.ones((ny, nB)) * .95 # Initial guess for prices
self.default_prob = np.empty((ny, nB))
# == Main loop == #
it, dist = 0, 10.
while dist > tol and it < maxit:
# Update prices
Vd_compat = np.repeat(self.Vd, self.nB).reshape(self.ny, self.nB)
default_states = Vd_compat > self.Vc
self.default_prob[:, :] = self.Py @ default_states
self.Q[:, :] = (1 - self.default_prob)/(1 + self.r)
it += 1
if it % 25 == 0:
print(f"Running iteration {it} with dist of {dist}")
return None
def compute_savings_policy(self):
"""
Compute optimal savings B' conditional on not defaulting.
The policy is recorded as an index value in Bgrid.
"""
# Allocate memory
self.next_B_index = np.empty((self.ny, self.nB))
EV = self.Py @ self.V
if y_init is None:
# Set to index near the mean of the ygrid
y_init = np.searchsorted(self.ygrid, self.ygrid.mean())
if B_init is None:
B_init = zero_B_index
# Start off not in default
in_default = False
for t in range(T-1):
yi, Bi = y_sim_indices[t], B_sim_indices[t]
if not in_default:
if self.Vc[yi, Bi] < self.Vd[yi]:
in_default = True
Bi_next = zero_B_index
else:
Bi_next = int(self.next_B_index[yi, Bi])
else:
in_default_series[t] = 1
Bi_next = zero_B_index
if random.uniform(0, 1) < self.θ:
in_default = False
B_sim_indices[t+1] = Bi_next
q_sim[t] = self.Q[yi, int(Bi_next)]
return return_vecs
@jit(nopython=True)
def u(c, γ):
return c**(1-γ)/(1-γ)
@jit(nopython=True)
def _inner_loop(ygrid, def_y, Bgrid, Vd, Vc, EVc,
EVd, EV, qq, β, θ, γ):
"""
This is a numba version of the inner loop of the solve in the
Arellano class. It updates Vd and Vc in place.
"""
ny, nB = len(ygrid), len(Bgrid)
zero_ind = nB // 2 # Integer division
for iy in range(ny):
y = ygrid[iy] # Pull out current y
# Compute Vd
Vd[iy] = u(def_y[iy], γ) + \
β * (θ * EVc[iy, zero_ind] + (1 - θ) * EVd[iy])
# Compute Vc
for ib in range(nB):
B = Bgrid[ib] # Pull out current B
current_max = -1e14
for ib_next in range(nB):
c = max(y - qq[iy, ib_next] * Bgrid[ib_next] + B, 1e-14)
m = u(c, γ) + β * EV[iy, ib_next]
if m > current_max:
current_max = m
Vc[iy, ib] = current_max
return None
@jit(nopython=True)
def _compute_savings_policy(ygrid, Bgrid, Q, EV, γ, β, next_B_index):
# Compute best index in Bgrid given iy, ib
ny, nB = len(ygrid), len(Bgrid)
for iy in range(ny):
y = ygrid[iy]
for ib in range(nB):
B = Bgrid[ib]
current_max = -1e10
for ib_next in range(nB):
c = max(y - Q[iy, ib_next] * Bgrid[ib_next] + B, 1e-14)
m = u(c, γ) + β * EV[iy, ib_next]
if m > current_max:
current_max = m
current_max_index = ib_next
next_B_index[iy, ib] = current_max_index
return None
55.6 Results
• For example, r=0.017 matches the average quarterly rate on a 5 year US treasury over
the period 1983–2001
Details on how to compute the figures are reported as solutions to the exercises
The first figure shows the bond price schedule and replicates Figure 3 of Arellano, where 𝑦𝐿 and 𝑦𝐻 are particular below average and above average values of output 𝑦
• Higher levels of debt (larger −𝐵′ ) induce larger discounts on the face value, which cor-
respond to higher interest rates
• Lower income also causes more discounting, as foreign creditors anticipate greater likeli-
hood of default
The next figure plots value functions and replicates the right hand panel of Figure 4 of [8]
We can use the results of the computation to study the default probability 𝛿(𝐵′ , 𝑦) defined in
Eq. (4)
The next plot shows these default probabilities over (𝐵′ , 𝑦) as a heat map
As anticipated, the probability that the government chooses to default in the following period
increases with indebtedness and falls with income
Next let’s run a time series simulation of {𝑦𝑡 }, {𝐵𝑡 } and 𝑞(𝐵𝑡+1 , 𝑦𝑡 )
The grey vertical bars correspond to periods when the economy is excluded from financial
markets because of a past default
One notable feature of the simulated data is the nonlinear response of interest rates
Periods of relative stability are followed by sharp spikes in the discount rate on government
debt
55.7 Exercises
55.7.1 Exercise 1
To the extent that you can, replicate the figures shown above
• Use the parameter values listed as defaults in the __init__ method of Arellano_Economy
• The time series will of course vary depending on the shock draws
55.8 Solutions
In [4]: # Create "Y High" and "Y Low" values as 5% devs from mean
high, low = np.mean(ae.ygrid) * 1.05, np.mean(ae.ygrid) * .95
iy_high, iy_low = (np.searchsorted(ae.ygrid, x) for x in (high, low))
In [5]: # Create "Y High" and "Y Low" values as 5% devs from mean
high, low = np.mean(ae.ygrid) * 1.05, np.mean(ae.ygrid) * .95
iy_high, iy_low = (np.searchsorted(ae.ygrid, x) for x in (high, low))
# Create figure
fig, ax = plt.subplots(figsize=(10, 6.5))
hm = ax.pcolormesh(xx, yy, zz)
cax = fig.add_axes([.92, .1, .02, .8])
fig.colorbar(hm, cax=cax)
ax.axis([xx.min(), 0.05, yy.min(), yy.max()])
ax.set(xlabel="$B'$", ylabel="$y$", title="Probability of Default")
plt.show()
In [7]: T = 250
y_vec, B_vec, q_vec, default_vec = ae.simulate(T)
plt.show()
56 Globalization and Cycles
56.1 Contents
• Overview 56.2
• Model 56.4
• Simulation 56.5
• Exercises 56.6
• Solutions 56.7
56.2 Overview
In this lecture, we review the paper Globalization and Synchronization of Innovation Cycles
by Kiminori Matsuyama, Laura Gardini and Iryna Sushko
This model helps us understand several interesting stylized facts about the world economy
One of these is synchronized business cycles across different countries
Most existing models that generate synchronized business cycles do so by assumption, since
they tie output in each country to a common shock
They also fail to explain certain features of the data, such as the fact that the degree of syn-
chronization tends to increase with trade ties
By contrast, in the model we consider in this lecture, synchronization is both endogenous and
increasing with the extent of trade integration
In particular, as trade costs fall and international competition increases, innovation incentives
become aligned and countries synchronize their innovation cycles
934 56. GLOBALIZATION AND CYCLES
56.2.1 Background
The model builds on work by Judd [73], Deneckere and Judd [34] and Helpman and Krugman [64] by developing a two-country model with trade and innovation
On the technical side, the paper introduces the concept of coupled oscillators to economic
modeling
As we will see, coupled oscillators arise endogenously within the model
Below we review the model and replicate some of the results on synchronization of innovation
across countries
56.3 Key Ideas
As discussed above, two countries produce and trade with each other
In each country, firms innovate, producing new varieties of goods and, in doing so, receiving
temporary monopoly power
Imitators follow and, after one period of monopoly, what had previously been new varieties
now enter competitive production
Firms have incentives to innovate and produce new goods when the mass of varieties of goods
currently in production is relatively low
In addition, there are strategic complementarities in the timing of innovation
Firms have incentives to innovate in the same period, so as to avoid competing with substi-
tutes that are competitively produced
This leads to temporal clustering in innovations in each country
After a burst of innovation, the mass of goods currently in production increases
However, goods also become obsolete, so that not all survive from period to period
This mechanism generates a cycle, where the mass of varieties increases through simultaneous
innovation and then falls through obsolescence
56.3.2 Synchronization
In the absence of trade, the timing of innovation cycles in each country is decoupled
This will be the case when trade costs are prohibitively high
If trade costs fall, then goods produced in each country penetrate each other’s markets
As illustrated below, this leads to synchronization of business cycles across the two countries
56.4 Model
𝑌𝑘,𝑡 = 𝐶𝑘,𝑡 = (𝑋𝑜𝑘,𝑡/(1 − 𝛼))^{1−𝛼} (𝑋𝑘,𝑡/𝛼)^{𝛼}
Here 𝑋𝑜𝑘,𝑡 is a homogeneous input which can be produced from labor using a linear, one-for-one technology
It is freely tradeable, competitively supplied, and homogeneous across countries
By choosing the price of this good as numeraire and assuming both countries find it optimal
to always produce the homogeneous good, we can set 𝑤1,𝑡 = 𝑤2,𝑡 = 1
The good 𝑋𝑘,𝑡 is a composite, built from many differentiated goods via
𝑋𝑘,𝑡^{1−1/𝜎} = ∫_{Ω𝑡} [𝑥𝑘,𝑡(𝜈)]^{1−1/𝜎} 𝑑𝜈
Here 𝑥𝑘,𝑡 (𝜈) is the total amount of a differentiated good 𝜈 ∈ Ω𝑡 that is produced
The parameter 𝜎 > 1 is the direct partial elasticity of substitution between a pair of varieties
and Ω𝑡 is the set of varieties available in period 𝑡
We can split the varieties into those which are supplied competitively and those supplied monopolistically; that is, Ω𝑡 = Ω𝑐𝑡 + Ω𝑚𝑡
56.4.1 Prices
𝑥𝑘,𝑡(𝜈) = (𝑝𝑘,𝑡(𝜈)/𝑃𝑘,𝑡)^{−𝜎} (𝛼𝐿𝑘/𝑃𝑘,𝑡)
Here
[𝑃𝑘,𝑡]^{1−𝜎} = ∫_{Ω𝑡} [𝑝𝑘,𝑡(𝜈)]^{1−𝜎} 𝑑𝜈
The price of a variety also depends on the origin, 𝑗, and destination, 𝑘, of the goods because
shipping varieties between countries incurs an iceberg trade cost 𝜏𝑗,𝑘
Thus the effective price in country 𝑘 of a variety 𝜈 produced in country 𝑗 becomes 𝑝𝑘,𝑡 (𝜈) =
𝜏𝑗,𝑘 𝑝𝑗,𝑡 (𝜈)
Using these expressions, we can derive the total demand for each variety, which is
where
𝐴𝑗,𝑡 ∶= ∑𝑘 𝜌𝑗,𝑘𝐿𝑘/(𝑃𝑘,𝑡)^{1−𝜎}   and   𝜌𝑗,𝑘 = (𝜏𝑗,𝑘)^{1−𝜎} ≤ 1
It is assumed that 𝜏1,1 = 𝜏2,2 = 1 and 𝜏1,2 = 𝜏2,1 = 𝜏 for some 𝜏 > 1, so that
𝑝𝑐𝑗,𝑡(𝜈) = 𝑝𝑐𝑗,𝑡 ∶= 𝜓   and   𝐷𝑐𝑗,𝑡 = 𝑦𝑐𝑗,𝑡 ∶= 𝛼𝐴𝑗,𝑡(𝑝𝑐𝑗,𝑡)^{−𝜎}
Monopolists will have the same marked-up price, so, for all 𝜈 ∈ Ω𝑚 ,
𝑝𝑚𝑗,𝑡(𝜈) = 𝑝𝑚𝑗,𝑡 ∶= 𝜓/(1 − 1/𝜎)   and   𝐷𝑚𝑗,𝑡 = 𝑦𝑚𝑗,𝑡 ∶= 𝛼𝐴𝑗,𝑡(𝑝𝑚𝑗,𝑡)^{−𝜎}
Define
𝑐
𝑝𝑗,𝑡 𝑐
𝑦𝑗,𝑡 1 1−𝜎
𝜃 ∶= 𝑚 𝑚 = (1 − )
𝑝𝑗,𝑡 𝑦𝑗,𝑡 𝜎
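Since demand for each variety satisfies 𝑦 ∝ 𝑝^{−𝜎}, the revenue ratio collapses to (𝑝𝑐/𝑝𝑚)^{1−𝜎}. Here is a small sketch confirming the identity numerically (the values of 𝜎, 𝜓, 𝛼 and the demand shifter 𝐴 are illustrative only):

```python
import numpy as np

# Check θ := (p^c y^c) / (p^m y^m) = (1 - 1/σ)**(1 - σ)
sigma, psi, alpha, A = 3.0, 1.0, 0.5, 1.0   # illustrative values
p_c = psi                          # competitive price equals marginal cost
p_m = psi / (1 - 1 / sigma)        # monopoly markup price
y_c = alpha * A * p_c**(-sigma)    # demand at the competitive price
y_m = alpha * A * p_m**(-sigma)    # demand at the monopoly price
theta = (p_c * y_c) / (p_m * y_m)
print(np.isclose(theta, (1 - 1 / sigma)**(1 - sigma)))  # True
```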
Using the preceding definitions and some algebra, the price indices can now be rewritten as
(𝑃𝑘,𝑡/𝜓)^{1−𝜎} = 𝑀𝑘,𝑡 + 𝜌𝑀𝑗,𝑡   where   𝑀𝑗,𝑡 ∶= 𝑁𝑐𝑗,𝑡 + 𝑁𝑚𝑗,𝑡/𝜃

The symbols 𝑁𝑐𝑗,𝑡 and 𝑁𝑚𝑗,𝑡 will denote the measures of Ω𝑐𝑡 and Ω𝑚𝑡 respectively
To introduce a new variety, a firm must hire 𝑓 units of labor per variety in each country
Monopolist profits must be less than or equal to zero in expectation, so
𝑁𝑚𝑗,𝑡 ≥ 0,   𝜋𝑚𝑗,𝑡 ∶= (𝑝𝑚𝑗,𝑡 − 𝜓)𝑦𝑚𝑗,𝑡 − 𝑓 ≤ 0   and   𝜋𝑚𝑗,𝑡𝑁𝑚𝑗,𝑡 = 0

𝑁𝑚𝑗,𝑡 = 𝜃(𝑀𝑗,𝑡 − 𝑁𝑐𝑗,𝑡) ≥ 0,   (1/𝜎)[𝛼𝐿𝑗/(𝜃(𝑀𝑗,𝑡 + 𝜌𝑀𝑘,𝑡)) + 𝛼𝐿𝑘/(𝜃(𝑀𝑗,𝑡 + 𝑀𝑘,𝑡/𝜌))] ≤ 𝑓
With 𝛿 as the exogenous probability of a variety becoming obsolete, the dynamic equation for
the measure of firms becomes
𝑁𝑐𝑗,𝑡+1 = 𝛿(𝑁𝑐𝑗,𝑡 + 𝑁𝑚𝑗,𝑡) = 𝛿(𝑁𝑐𝑗,𝑡 + 𝜃(𝑀𝑗,𝑡 − 𝑁𝑐𝑗,𝑡))
𝑛𝑗,𝑡 ∶= 𝜃𝜎𝑓𝑁𝑐𝑗,𝑡/(𝛼(𝐿1 + 𝐿2)),   𝑖𝑗,𝑡 ∶= 𝜃𝜎𝑓𝑁𝑚𝑗,𝑡/(𝛼(𝐿1 + 𝐿2)),   𝑚𝑗,𝑡 ∶= 𝜃𝜎𝑓𝑀𝑗,𝑡/(𝛼(𝐿1 + 𝐿2)) = 𝑛𝑗,𝑡 + 𝑖𝑗,𝑡/𝜃
We also use 𝑠𝑗 ∶= 𝐿𝑗/(𝐿1 + 𝐿2) to denote the share of labor employed in country 𝑗
We can use these definitions and the preceding expressions to obtain a law of motion for
𝑛𝑡 ∶= (𝑛1,𝑡 , 𝑛2,𝑡 )
In particular, given an initial condition, 𝑛0 = (𝑛1,0, 𝑛2,0) ∈ R²₊, the equilibrium trajectory, {𝑛𝑡}∞𝑡=0 = {(𝑛1,𝑡, 𝑛2,𝑡)}∞𝑡=0, is obtained by iterating on 𝑛𝑡+1 = 𝐹(𝑛𝑡), where 𝐹 ∶ R²₊ → R²₊ is given by
Here
while
𝑠1(𝜌) = 1 − 𝑠2(𝜌) = min {(𝑠1 − 𝜌𝑠2)/(1 − 𝜌), 1}
1 = 𝑠𝑗/(ℎ𝑗(𝑛𝑘) + 𝜌𝑛𝑘) + 𝑠𝑘/(ℎ𝑗(𝑛𝑘) + 𝑛𝑘/𝜌)

Rewritten as a quadratic in ℎ𝑗(𝑛𝑘), this becomes

ℎ𝑗(𝑛𝑘)² + ((𝜌 + 1/𝜌)𝑛𝑘 − 𝑠𝑗 − 𝑠𝑘)ℎ𝑗(𝑛𝑘) + (𝑛𝑘² − 𝑠𝑗𝑛𝑘/𝜌 − 𝑠𝑘𝑛𝑘𝜌) = 0
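The function _hj defined in the code below solves exactly this quadratic. As a sanity check, we can verify directly that its positive root satisfies the market clearing condition (the parameter values are illustrative only):

```python
import numpy as np

# Positive root of h**2 + b*h + c = 0 and check of the clearing condition
sj, sk, rho, nk = 0.5, 0.5, 0.2, 0.3    # illustrative values
b = (rho + 1 / rho) * nk - sj - sk
c = nk**2 - sj * nk / rho - sk * rho * nk
h = (-b + np.sqrt(b**2 - 4 * c)) / 2    # h_j(n_k) > 0

lhs = sj / (h + rho * nk) + sk / (h + nk / rho)
print(np.isclose(lhs, 1.0))  # True: the market clearing condition holds
```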
56.5 Simulation
@jit(nopython=True)
def _hj(j, nk, s1, s2, θ, δ, ρ):
"""
If we expand the implicit function for h_j(n_k) then we find that
it is quadratic. We know that h_j(n_k) > 0 so we can get its
value by using the quadratic form
"""
# Find out whose h we are evaluating
if j == 1:
sj = s1
sk = s2
else:
sj = s2
sk = s1
b = ((ρ + 1 / ρ) * nk - sj - sk)
c = (nk * nk - (sj * nk) / ρ - sk * ρ * nk)
# h_j(n_k) is the positive root of h**2 + b*h + c = 0
root = (-b + np.sqrt(b**2 - 4 * c)) / 2
return root
@jit(nopython=True)
def DLL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
"Determine whether (n1, n2) is in the set DLL"
return (n1 <= s1_ρ) and (n2 <= s2_ρ)
@jit(nopython=True)
def DHH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
"Determine whether (n1, n2) is in the set DHH"
return (n1 >= _hj(1, n2, s1, s2, θ, δ, ρ)) and (n2 >= _hj(2, n1, s1, s2, θ, δ, ρ))
@jit(nopython=True)
def DHL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
"Determine whether (n1, n2) is in the set DHL"
return (n1 >= s1_ρ) and (n2 <= _hj(2, n1, s1, s2, θ, δ, ρ))
@jit(nopython=True)
def DLH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
"Determine whether (n1, n2) is in the set DLH"
return (n1 <= _hj(1, n2, s1, s2, θ, δ, ρ)) and (n2 >= s2_ρ)
@jit(nopython=True)
def one_step(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
"""
Takes a current value for (n_{1, t}, n_{2, t}) and returns the
values (n_{1, t+1}, n_{2, t+1}) according to the law of motion.
"""
# Depending on where we are, evaluate the right branch
if DLL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
n1_tp1 = δ * (θ * s1_ρ + (1 - θ) * n1)
n2_tp1 = δ * (θ * s2_ρ + (1 - θ) * n2)
elif DHH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
n1_tp1 = δ * n1
n2_tp1 = δ * n2
elif DHL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
n1_tp1 = δ * n1
n2_tp1 = δ * (θ * _hj(2, n1, s1, s2, θ, δ, ρ) + (1 - θ) * n2)
elif DLH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
n1_tp1 = δ * (θ * _hj(1, n2, s1, s2, θ, δ, ρ) + (1 - θ) * n1)
n2_tp1 = δ * n2
return n1_tp1, n2_tp1
@jit(nopython=True)
def n_generator(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
"""
Given an initial condition, continues to yield new values of
n1 and n2
"""
n1_t, n2_t = n1_0, n2_0
while True:
n1_tp1, n2_tp1 = one_step(n1_t, n2_t, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ)
yield (n1_tp1, n2_tp1)
n1_t, n2_t = n1_tp1, n2_tp1
@jit(nopython=True)
def _pers_till_sync(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ, maxiter, npers):
"""
Takes initial values and iterates forward to see whether
the histories eventually end up in sync.
If countries are symmetric then as soon as the two countries have the
same measure of firms then they will be synchronized -- However, if
they are not symmetric then it is possible they have the same measure
of firms but are not yet synchronized. To address this, we check whether
firms stay synchronized for `npers` periods with Euclidean norm
Parameters
----------
n1_0 : scalar(Float)
Initial normalized measure of firms in country one
n2_0 : scalar(Float)
Initial normalized measure of firms in country two
maxiter : scalar(Int)
Maximum number of periods to simulate
npers : scalar(Int)
Number of periods we would like the countries to have the
same measure for
Returns
-------
synchronized : scalar(Bool)
Did the two economies end up synchronized
pers_2_sync : scalar(Int)
The number of periods required until they synchronized
"""
# Initialize the status of synchronization
synchronized = False
pers_2_sync = maxiter
iters = 0
# Initialize generator
n_gen = n_generator(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ)
@jit(nopython=True)
def _create_attraction_basis(s1_ρ, s2_ρ, s1, s2, θ, δ, ρ, maxiter, npers, npts):
# Create unit range with npts
synchronized, pers_2_sync = False, 0
unit_range = np.linspace(0.0, 1.0, npts)
return time_2_sync
class MSGSync:
"""
The paper "Globalization and Synchronization of Innovation Cycles" presents
a two-country model with endogenous innovation cycles. Combines elements
from Deneckere Judd (1985) and Helpman Krugman (1985) to allow for a
model with trade that has firms who can introduce new varieties into
the economy.
Parameters
----------
s1 : scalar(Float)
Amount of total labor in country 1 relative to total worldwide labor
θ : scalar(Float)
A measure of how much more of the competitive variety is used in
production of final goods
δ : scalar(Float)
Percentage of firms that are not exogenously destroyed every period
ρ : scalar(Float)
Measure of how expensive it is to trade between countries
"""
def __init__(self, s1=0.5, θ=2.5, δ=0.7, ρ=0.2):
# Store model parameters
self.s1, self.θ, self.δ, self.ρ = s1, θ, δ, ρ
def _unpack_params(self):
return self.s1, self.s2, self.θ, self.δ, self.ρ
def _calc_s1_ρ(self):
# Unpack params
s1, s2, θ, δ, ρ = self._unpack_params()
# s_1(ρ) = min(val, 1)
val = (s1 - ρ * s2) / (1 - ρ)
return min(val, 1)
Parameters
----------
n1_0 : scalar(Float)
Initial normalized measure of firms in country one
n2_0 : scalar(Float)
Initial normalized measure of firms in country two
T : scalar(Int)
Number of periods to simulate
Returns
-------
n1 : Array(Float64, ndim=1)
A history of normalized measures of firms in country one
n2 : Array(Float64, ndim=1)
A history of normalized measures of firms in country two
"""
# Unpack parameters
s1, s2, θ, δ, ρ = self._unpack_params()
s1_ρ, s2_ρ = self.s1_ρ, self.s2_ρ
# Allocate space
n1 = np.empty(T)
n2 = np.empty(T)
# Store in arrays
n1[t] = n1_tp1
n2[t] = n2_tp1
return n1, n2
If countries are symmetric then as soon as the two countries have the
same measure of firms then they will be synchronized -- However, if
they are not symmetric then it is possible they have the same measure
of firms but are not yet synchronized. To address this, we check whether
firms stay synchronized for `npers` periods with Euclidean norm
Parameters
----------
n1_0 : scalar(Float)
Initial normalized measure of firms in country one
n2_0 : scalar(Float)
Initial normalized measure of firms in country two
maxiter : scalar(Int)
Maximum number of periods to simulate
npers : scalar(Int)
Number of periods we would like the countries to have the
same measure for
Returns
-------
synchronized : scalar(Bool)
Did the two economies end up synchronized
pers_2_sync : scalar(Int)
The number of periods required until they synchronized
"""
# Unpack parameters
s1, s2, θ, δ, ρ = self._unpack_params()
s1_ρ, s2_ρ = self.s1_ρ, self.s2_ρ
return ab
We write a short function below that exploits the preceding code and plots two time series
Each time series gives the dynamics for the two countries
The time series share parameters but differ in their initial condition
Here’s the function
In [2]: def plot_timeseries(n1_0, n2_0, s1=0.5, θ=2.5, δ=0.7, ρ=0.2, ax=None, title=''):
"""
Plot a single time series with initial conditions
"""
if ax is None:
fig, ax = plt.subplots()
ax.legend()
ax.set(title=title, ylim=(0.15, 0.8))
return ax
# Create figure
fig, ax = plt.subplots(2, 1, figsize=(10, 8))
fig.tight_layout()
plt.show()
In the first case, innovation in the two countries does not synchronize
In the second case, different initial conditions are chosen, and the cycles become synchronized
Next, let’s study the initial conditions that lead to synchronized cycles more systematically
We generate time series from a large collection of different initial conditions and mark those
conditions with different colors according to whether synchronization occurs or not
The next display shows exactly this for four different parameterizations (one for each subfig-
ure)
Dark colors indicate synchronization, while light colors indicate failure to synchronize
56.6 Exercises
56.6.1 Exercise 1
Replicate the figure shown above by coloring initial conditions according to whether or not
synchronization occurs from those conditions
56.7 Solutions
return ab, cf
Additionally, instead of just seeing 4 plots at once, we might want to be able to manually change 𝜌 and see how it affects the plot in real time. Below we use an interactive plot to do this
Note, interactive plotting requires the ipywidgets module to be installed and enabled
57 Coase's Theory of the Firm
57.1 Contents
• Overview 57.2
• The Model 57.3
• Equilibrium 57.4
• Existence, Uniqueness and Computation of Equilibria 57.5
• Implementation 57.6
• Exercises 57.7
• Solutions 57.8
57.2 Overview
In 1937, Ronald Coase wrote a brilliant essay on the nature of the firm [27]
Coase was writing at a time when the Soviet Union was rising to become a significant indus-
trial power
At the same time, many free-market economies were afflicted by a severe and painful depres-
sion
This contrast led to an intensive debate on the relative merits of decentralized, price-based
allocation versus top-down planning
In the midst of this debate, Coase made an important observation: even in free-market
economies, a great deal of top-down planning does in fact take place
This is because firms form an integral part of free-market economies and, within firms, alloca-
tion is by planning
In other words, free-market economies blend both planning (within firms) and decentralized
production coordinated by prices
The question Coase asked is this: if prices and free markets are so efficient, then why do firms
even exist?
Couldn’t the associated within-firm planning be done more efficiently by the market?
950 57. COASE’S THEORY OF THE FIRM
On top of asking a deep and fascinating question, Coase also supplied an illuminating answer:
firms exist because of transaction costs
Here’s one example of a transaction cost:
Suppose agent A is considering setting up a small business and needs a web developer to con-
struct and help run an online store
She can use the labor of agent B, a web developer, by writing up a freelance contract for
these tasks and agreeing on a suitable price
But contracts like this can be time-consuming and difficult to verify
• How will agent A be able to specify exactly what she wants, to the finest detail, when
she herself isn’t sure how the business will evolve?
• And what if she isn’t familiar with web technology? How can she specify all the relevant
details?
• And, if things go badly, will failure to comply with the contract be verifiable in court?
In this situation, perhaps it will be easier to employ agent B under a simple labor contract
The cost of this contract is far smaller because such contracts are simpler and more standard
The basic agreement in a labor contract is: B will do what A asks him to do for the term of
the contract, in return for a given salary
Making this agreement is much easier than trying to map every task out in advance in a con-
tract that will hold up in a court of law
So agent A decides to hire agent B and a firm of nontrivial size appears, due to transaction
costs
57.2.2 A Trade-Off
• transaction costs, which add to the expense of operating between firms, and
• diminishing returns to management, which add to the expense of operating within firms
For example, you could think of management as a pyramid, so hiring more workers to im-
plement more tasks requires expansion of the pyramid, and hence labor costs grow at a rate
more than proportional to the range of tasks
Diminishing returns to management make in-house production expensive, favoring small firms
57.2.3 Summary
• Firms grow because transaction costs encourage them to take some operations in house
• But as they get large, in-house operations become costly due to diminishing returns to
management
• The size of firms is determined by balancing these effects, thereby equalizing the
marginal costs of each form of operation
57.3 The Model
57.3.1 Subcontracting
The subcontracting scheme by which tasks are allocated across firms is illustrated in the fig-
ure below
In this example,
• Firm 1 receives a contract to sell one unit of the completed good to a final buyer
• Firm 1 then forms a contract with firm 2 to purchase the partially completed good at
stage 𝑡1 , with
the intention of implementing the remaining 1 − 𝑡1 tasks in-house (i.e., processing from stage
𝑡1 to stage 1)
• Firm 2 repeats this procedure, forming a contract with firm 3 to purchase the good at
stage 𝑡2
• Firm 3 decides to complete the chain, selecting 𝑡3 = 0
At this point, production unfolds in the opposite direction (i.e., from upstream to down-
stream)
• Firm 3 completes processing stages from 𝑡3 = 0 up to 𝑡2 and transfers the good to firm
2
• Firm 2 then processes from 𝑡2 up to 𝑡1 and transfers the good to firm 1,
• Firm 1 processes from 𝑡1 to 1 and delivers the completed good to the final buyer
The length of the interval of stages (range of tasks) carried out by firm 𝑖 is denoted by ℓ𝑖
Each firm chooses only its upstream boundary, treating its downstream boundary as given
The benefit of this formulation is that it implies a recursive structure for the decision problem
for each firm
In choosing how many processing stages to subcontract, each successive firm faces essentially
the same decision problem as the firm above it in the chain, with the only difference being
that the decision space is a subinterval of the decision space for the firm above
We will exploit this recursive structure in our study of equilibrium
57.3.2 Costs
57.4 Equilibrium
We assume that all firms are ex-ante identical and act as price takers
As price takers, they face a price function 𝑝, which is a map from [0, 1] to R+ , with 𝑝(𝑡) inter-
preted as the price of the good at processing stage 𝑡
There is a countable infinity of firms indexed by 𝑖 and no barriers to entry
The cost of supplying the initial input (the good processed up to stage zero) is set to zero for
simplicity
Free entry and the infinite fringe of competitors rule out positive profits for incumbents, since
any incumbent could be replaced by a member of the competitive fringe filling the same role
in the production chain
Profits are never negative in equilibrium because firms can freely exit
An equilibrium in this setting is an allocation of firms and a price function such that
1. all active firms in the chain make zero profits, including suppliers of raw materials
2. no firm in the production chain has an incentive to deviate, and
3. no inactive firms can enter the chain and extract positive profits
As a labeling convention, we assume that firms enter in order, with firm 1 being the furthest
downstream
An allocation {ℓ𝑖} is called feasible if ∑𝑖≥1 ℓ𝑖 = 1
In a feasible allocation, the entire production process is completed by finitely many firms
Given a feasible allocation, {ℓ𝑖}, let {𝑡𝑖} represent the corresponding transaction stages, defined by

𝑡𝑖 ∶= 1 − ∑𝑗≤𝑖 ℓ𝑗

In particular, 𝑡𝑖−1 is the downstream boundary of firm 𝑖 and 𝑡𝑖 is its upstream boundary
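For instance, with a hypothetical feasible allocation, the transaction stages are just one minus the partial sums of the firm sizes:

```python
import numpy as np

ell = [0.3, 0.3, 0.4]       # hypothetical feasible allocation (sums to 1)
t = 1 - np.cumsum(ell)      # t_i = 1 - (ell_1 + ... + ell_i)
print(t)                    # the final stage is 0: the chain is complete
```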
As transaction costs are incurred only by the buyer, its profits are
1. 𝑝(0) = 0,
2. 𝜋𝑖 = 0 for all 𝑖, and
3. 𝑝(𝑠) − 𝑐(𝑠 − 𝑡) − 𝛿𝑝(𝑡) ≤ 0 for any pair 𝑠, 𝑡 with 0 ≤ 𝑠 ≤ 𝑡 ≤ 1
The rationale behind these conditions was given in our informal definition of equilibrium
above
We have defined an equilibrium but does one exist? Is it unique? And, if so, how can we com-
pute it?
57.5 Existence, Uniqueness and Computation of Equilibria
By definition, 𝑡∗ (𝑠) is the cost-minimizing upstream boundary for a firm that is contracted to
deliver the good at stage 𝑠 and faces the price function 𝑝∗
Since 𝑝∗ lies in 𝒫 and since 𝑐 is strictly convex, it follows that the right-hand side of Eq. (4) is
continuous and strictly convex in 𝑡
Hence the minimizer 𝑡∗ (𝑠) exists and is uniquely defined
We can use 𝑡∗ to construct an equilibrium allocation as follows:
Recall that firm 1 sells the completed good at stage 𝑠 = 1, so its optimal upstream boundary is 𝑡∗(1)
Hence firm 2’s optimal upstream boundary is 𝑡∗ (𝑡∗ (1))
Continuing in this way produces the sequence {𝑡∗𝑖 } defined by
The sequence ends when a firm chooses to complete all remaining tasks
We label this firm (and hence the number of firms in the chain) as
The task allocation corresponding to Eq. (5) is given by ℓ𝑖∗ ∶= 𝑡∗𝑖−1 − 𝑡∗𝑖 for all 𝑖
In [77] it is shown that
While the proofs are too long to repeat here, much of the insight can be obtained by observ-
ing that, as a fixed point of 𝑇 , the equilibrium price function must satisfy
From this equation, it is clear that profits are zero for all incumbent firms
We can develop some additional insights on the behavior of firms by examining marginal con-
ditions associated with the equilibrium
As a first step, let ℓ∗ (𝑠) ∶= 𝑠 − 𝑡∗ (𝑠)
This is the cost-minimizing range of in-house tasks for a firm with downstream boundary 𝑠
In [77] it is shown that 𝑡∗ and ℓ∗ are increasing and continuous, while 𝑝∗ is continuously dif-
ferentiable at all 𝑠 ∈ (0, 1) with
Equation Eq. (8) follows from 𝑝∗ (𝑠) = min𝑡≤𝑠 {𝑐(𝑠 − 𝑡) + 𝛿𝑝∗ (𝑡)} and the envelope theorem for
derivatives
A related equation is the first order condition for 𝑝∗ (𝑠) = min𝑡≤𝑠 {𝑐(𝑠 − 𝑡) + 𝛿𝑝∗ (𝑡)}, the mini-
mization problem for a firm with upstream boundary 𝑠, which is
This condition matches the marginal condition expressed verbally by Coase that we stated
above:
“A firm will tend to expand until the costs of organizing an extra transaction
within the firm become equal to the costs of carrying out the same transaction
by means of an exchange on the open market…”
Combining Eq. (8) and Eq. (9) and evaluating at 𝑠 = 𝑡𝑖 , we see that active firms that are
adjacent satisfy
𝛿𝑐′(ℓ∗𝑖+1) = 𝑐′(ℓ∗𝑖)    (10)
In other words, the marginal in-house cost per task at a given firm equals that of its upstream partner multiplied by the gross transaction cost 𝛿
This expression can be thought of as a Coase–Euler equation, which determines inter-firm
efficiency by indicating how two costly forms of coordination (markets and management) are
jointly minimized in equilibrium
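To make Eq. (10) concrete, take the cost function 𝑐(ℓ) = exp(10ℓ) − 1 used in the implementation below, so that 𝑐′(ℓ) = 10 exp(10ℓ) and the Coase–Euler equation reduces to ℓ𝑖+1 = ℓ𝑖 − ln(𝛿)/10. Here is a short sketch (the size of the most downstream firm is a hypothetical starting value):

```python
import numpy as np

delta = 1.05                 # gross transaction cost
ell = [0.3]                  # hypothetical size of the most downstream firm
for _ in range(3):
    # Coase-Euler: δ c'(ℓ_{i+1}) = c'(ℓ_i)  =>  ℓ_{i+1} = ℓ_i - ln(δ)/10
    ell.append(ell[-1] - np.log(delta) / 10)

c_prime = lambda l: 10 * np.exp(10 * l)
print(all(np.isclose(delta * c_prime(l1), c_prime(l0))
          for l0, l1 in zip(ell, ell[1:])))  # True
```

Each firm is ln(𝛿)/10 ≈ 0.0049 smaller than its downstream neighbor, consistent with firm size increasing with downstreamness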
57.6 Implementation
For most specifications of primitives, there is no closed-form solution for the equilibrium as
far as we are aware
However, we know that we can compute the equilibrium corresponding to a given transaction
cost parameter 𝛿 and a cost function 𝑐 by applying the results stated above
In particular, we can
1. iterate with the operator 𝑇 from some initial guess until the iterates converge to an approximation of 𝑝∗, and then
2. recover the firm boundaries and sizes from 𝑡∗ and ℓ∗
As we step between iterates, we will use linear interpolation of functions, as we did in our lecture on optimal growth and several other places.
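As a reminder of how linear interpolation turns an array of grid values into a callable function, here is a minimal sketch using np.interp (the grid and values are illustrative):

```python
import numpy as np

grid = np.linspace(0, 1, 5)
vals = grid**2                           # values of some function on the grid
p = lambda t: np.interp(t, grid, vals)   # piecewise-linear interpolant
print(p(0.5))                            # exact at grid points: 0.25
```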
To begin, here’s a class to store primitives and a grid
class ProductionChain:
    def __init__(self,
                 n=1000,
                 delta=1.05,
                 c=lambda t: np.exp(10 * t) - 1):
        self.n, self.delta, self.c = n, delta, c
        self.grid = np.linspace(1/n, 1, n)
* pc is an instance of ProductionChain
* The initial condition is p = c
"""
delta, c, n, grid = pc.delta, pc.c, pc.n, pc.grid
p = c(grid) # Initial condition is c(s), as an array
new_p = np.empty_like(p)
error = tol + 1
i = 0
if i < max_iter:
print(f"Iteration converged in {i} steps")
else:
print(f"Warning: iteration hit upper bound {max_iter}")
The next function computes the optimal choice of upstream boundary and range of tasks implemented for a firm facing price function p_function and with downstream boundary 𝑠

def optimal_choices(pc, p_function, s):
    """
    Takes p_function as the true function, returns optimal choices (t, ell)
    for a firm with downstream boundary s
    """
    delta, c = pc.delta, pc.c
    f = lambda t: delta * p_function(t) + c(s - t)
    t_star = max(fminbound(f, -1, s), 0)
    ell_star = s - t_star
    return t_star, ell_star
The allocation of firms can be computed by recursively stepping through firms’ choices of
their respective upstream boundary, treating the previous firm’s upstream boundary as their
own downstream boundary
In doing so, we start with firm 1, who has downstream boundary 𝑠 = 1
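The stepping procedure just described can be sketched as follows. This is a sketch rather than the lecture's exact code: the helper from above is restated with 𝛿 and 𝑐 passed in directly instead of via a ProductionChain instance, so that the snippet is self-contained.

```python
import numpy as np
from scipy.optimize import fminbound

def optimal_choices(delta, c, p_function, s):
    # Upstream boundary t* for a firm with downstream boundary s:
    # minimize delta * p(t) + c(s - t) over t in [0, s]
    f = lambda t: delta * p_function(t) + c(s - t)
    t_star = max(fminbound(f, -1, s), 0)
    return t_star, s - t_star

def compute_stages(delta, c, p_function):
    # Walk down the chain: the most downstream firm has s = 1; each firm's
    # upstream boundary becomes the next firm's downstream boundary.
    s = 1.0
    transaction_stages = [s]
    while s > 0:
        s, ell = optimal_choices(delta, c, p_function, s)
        transaction_stages.append(s)
    return np.array(transaction_stages)
```

Called with the equilibrium price function in place of p_function, the returned array contains the boundaries of all active firms.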
In [6]: pc = ProductionChain()
        p_star = compute_prices(pc)
        transaction_stages = compute_stages(pc, p_star)

        fig, ax = plt.subplots()
        ax.plot(pc.grid, p_star(pc.grid))
        ax.set_xlim(0.0, 1.0)
        ax.set_ylim(0.0)
        for s in transaction_stages:
            ax.axvline(x=s, c="0.5")
        plt.show()
Here’s the function ℓ∗ , which shows how large a firm with downstream boundary 𝑠 chooses to
be
fig, ax = plt.subplots()
ax.plot(pc.grid, ell_star, label="$\ell^*$")
ax.legend(fontsize=14)
plt.show()
960 57. COASE’S THEORY OF THE FIRM
57.7 Exercises
57.7.1 Exercise 1
57.7.2 Exercise 2
57.8 Solutions
57.8.1 Exercise 1
In [8]: for delta in (1.01, 1.05, 1.1):
            pc = ProductionChain(delta=delta)
            p_star = compute_prices(pc)
            transaction_stages = compute_stages(pc, p_star)
            num_firms = len(transaction_stages)
            print(f"When delta={delta} there are {num_firms} firms")
57.8.2 Exercise 2
Firm size increases with downstreamness because 𝑝∗ , the equilibrium price function, is in-
creasing and strictly convex
This means that, for a given producer, the marginal cost of the input purchased from the pro-
ducer just upstream from itself in the chain increases as we go further downstream
Hence downstream firms choose to do more in house than upstream firms — and are therefore
larger
The equilibrium price function is strictly convex due to both transaction costs and diminish-
ing returns to management
One way to put this is that firms are prevented from completely mitigating the costs associ-
ated with diminishing returns to management — which induce convexity — by transaction
costs. This is because transaction costs force firms to have nontrivial size
Here’s one way to compute and graph value added across firms
In [9]: pc = ProductionChain()
p_star = compute_prices(pc)
stages = compute_stages(pc, p_star)
va = []
fig, ax = plt.subplots()
ax.plot(va, label="value added by firm")
ax.set_xticks((5, 25))
ax.set_xticklabels(("downstream firms", "upstream firms"))
plt.show()
58 Recursive Models of Dynamic Linear Economies

58.1 Contents
“Mathematics is the art of giving the same name to different things” – Henri
Poincaré
“Complete market economies are all alike” – Robert E. Lucas, Jr., (1989)
In saying that “complete markets are all alike”, Robert E. Lucas, Jr. was noting that all of
them have
• a commodity space
• a space dual to the commodity space in which prices reside
• endowments of resources
• peoples’ preferences over goods
• physical technologies for transforming resources into goods
• random processes that govern shocks to technologies and preferences and associated in-
formation flows
• a single budget constraint per person
• the existence of a representative consumer even when there are many people in the
model
• a concept of competitive equilibrium
• theorems connecting competitive equilibrium allocations to allocations that would be
chosen by a benevolent social planner
The models rule out frictions such as

• Enforcement difficulties
• Information asymmetries
• Other forms of transactions costs
• Externalities
Much of the imperialism of complete markets models comes from applying these two tricks
The Hicks trick of indexing commodities by time is the idea that dynamics are a special
case of statics
The Arrow trick of indexing commodities by chance is the idea that analysis of trade un-
der uncertainty is a special case of the analysis of trade under certainty
The [59] class of models specifies the commodity space, preferences, technologies, stochastic
shocks and information flows in ways that allow the models to be analyzed completely using
only the tools of linear time series models and linear-quadratic optimal control described in
the two lectures Linear State Space Models and Linear Quadratic Control
58.2. A SUITE OF MODELS 967
There are costs and benefits associated with the simplifications and specializations needed to
make a particular model fit within the [59] class
• the costs are that linear-quadratic structures are sometimes too confining
• benefits include computational speed, simplicity, and ability to analyze many model fea-
tures analytically or nearly analytically
A variety of superficially different models are all instances of the [59] class of models
The diversity of these models conceals an essential unity that illustrates the quotation by
Robert E. Lucas, Jr., with which we began this lecture
58.2.2 Forecasting?
A consequence of a single budget constraint per person plus the Hicks-Arrow tricks is that
households and firms need not forecast
But there exist equivalent structures called recursive competitive equilibria in which they
do appear to need to forecast
In these structures, to forecast, households and firms use:
For an application of the [59] class of models, the outcome of theorizing is a stochastic pro-
cess, i.e., a probability distribution over sequences of prices and quantities, indexed by param-
eters describing preferences, technologies, and information flows
Another name for that object is a likelihood function, a key object of both frequentist and
Bayesian statistics
There are two important uses of an equilibrium stochastic process or likelihood func-
tion
The first is to solve the direct problem
The direct problem takes as inputs values of the parameters that define preferences, tech-
nologies, and information flows and as an output characterizes or simulates random paths of
quantities and prices
The second use of an equilibrium stochastic process or likelihood function is to solve the in-
verse problem
The inverse problem takes as an input a time series sample of observations on a subset of
prices and quantities determined by the model and from them makes inferences about the
parameters that define the model’s preferences, technologies, and information flows
A [59] economy consists of lists of matrices that describe peoples’ household technologies,
their preferences over consumption services, their production technologies, and their informa-
tion sets
There are complete markets in history-contingent commodities
Competitive equilibrium allocations and prices
Different example economies manifest themselves simply as different settings for various ma-
trices
[59] use these tools:
The models are flexible enough to express alternative senses of a representative household
• A single ‘stand-in’ household of the type used to good effect by Edward C. Prescott
• Heterogeneous households satisfying conditions for Gorman aggregation into a represen-
tative household
• Heterogeneous household technologies that violate conditions for Gorman aggregation but are still susceptible to aggregation into a single representative household via ‘non-Gorman’ or ‘mongrel’ aggregation
These three alternative types of aggregation have different consequences in terms of how
prices and allocations can be computed
In particular, can prices and an aggregate allocation be computed before the equilibrium allo-
cation to individual heterogeneous households is computed?
• Answers are “Yes” for Gorman aggregation, “No” for non-Gorman aggregation
In summary, the insights and practical benefits from economics to be introduced in this lec-
ture are
• Speed and ease of computation that comes from unleashing a common suite of Python
programs
• Information
• Technologies
• Preferences
We’ll use stochastic linear difference equations to describe information flows and equilibrium
outcomes
The sequence {𝑤𝑡 ∶ 𝑡 = 1, 2, …} is said to be a martingale difference sequence adapted to
{𝐽𝑡 ∶ 𝑡 = 0, 1, …} if 𝐸(𝑤𝑡+1 |𝐽𝑡 ) = 0 for 𝑡 = 0, 1, …
The sequence {𝑤𝑡 ∶ 𝑡 = 1, 2, …} is said to be conditionally homoskedastic if 𝐸(𝑤𝑡+1 𝑤′𝑡+1 ∣ 𝐽𝑡 ) = 𝐼 for 𝑡 = 0, 1, …
We assume that the {𝑤𝑡 ∶ 𝑡 = 1, 2, …} process is conditionally homoskedastic
Let {𝑥𝑡 ∶ 𝑡 = 1, 2, …} be a sequence of 𝑛-dimensional random vectors, i.e. an 𝑛-dimensional
stochastic process
The process {𝑥𝑡 ∶ 𝑡 = 1, 2, …} is constructed recursively using an initial random vector 𝑥0 ∼ 𝒩(𝑥0̂ , Σ0 ) and a time-invariant law of motion

𝑥𝑡 = 𝐴𝑥𝑡−1 + 𝐶𝑤𝑡

so that 𝐸(𝑥𝑡+1 ∣ 𝐽𝑡 ) = 𝐴𝑥𝑡

Iterating backwards on the law of motion gives

𝑥𝑡 = 𝐴𝑥𝑡−1 + 𝐶𝑤𝑡 = 𝐴²𝑥𝑡−2 + 𝐴𝐶𝑤𝑡−1 + 𝐶𝑤𝑡 = [∑_{𝜏=0}^{𝑡−1} 𝐴^𝜏 𝐶𝑤_{𝑡−𝜏}] + 𝐴^𝑡 𝑥0

Shifting forward in time from date 𝑡,

𝑥_{𝑡+𝑗} = ∑_{𝑠=0}^{𝑗−1} 𝐴^𝑠 𝐶𝑤_{𝑡+𝑗−𝑠} + 𝐴^𝑗 𝑥𝑡

so that the 𝑗-step-ahead forecast is 𝐸𝑡 𝑥_{𝑡+𝑗} = 𝐴^𝑗 𝑥𝑡
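The law of motion is straightforward to simulate; here is a minimal sketch (the matrices 𝐴 and 𝐶 are placeholders to be supplied by a particular model):

```python
import numpy as np

def simulate(A, C, x0, T, seed=0):
    # x_{t+1} = A x_t + C w_{t+1}, with w_{t+1} ~ N(0, I) i.i.d.
    rng = np.random.default_rng(seed)
    n, m = C.shape
    x = np.empty((T + 1, n))
    x[0] = x0
    for t in range(T):
        x[t + 1] = A @ x[t] + C @ rng.standard_normal(m)
    return x
```

The 𝑗-step-ahead forecast from the simulated state at date 𝑡 is then 𝐴^𝑗 𝑥𝑡, as in the last equation above.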
The covariance matrix of the 𝑗-step-ahead forecast error is

𝐸𝑡 (𝑥_{𝑡+𝑗} − 𝐸𝑡 𝑥_{𝑡+𝑗})(𝑥_{𝑡+𝑗} − 𝐸𝑡 𝑥_{𝑡+𝑗})′ = ∑_{𝑘=0}^{𝑗−1} 𝐴^𝑘 𝐶𝐶′ 𝐴^{𝑘′} ≡ 𝑣𝑗

where 𝑣𝑗 can be computed recursively from

𝑣1 = 𝐶𝐶′
𝑣𝑗 = 𝐶𝐶′ + 𝐴𝑣_{𝑗−1} 𝐴′ ,  𝑗 ≥ 2
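The recursion for 𝑣𝑗 is easy to implement and to check against the direct sum ∑ₖ 𝐴^𝑘 𝐶𝐶′ 𝐴^{𝑘′}; a sketch:

```python
import numpy as np

def forecast_error_covs(A, C, J):
    # v_1 = C C';  v_j = C C' + A v_{j-1} A'  for j >= 2
    CC = C @ C.T
    v = [CC]
    for _ in range(J - 1):
        v.append(CC + A @ v[-1] @ A.T)
    return v
```

Here v[j-1] holds 𝑣𝑗, i.e. the covariance of the 𝑗-step-ahead forecast error.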
𝑗−1
′
𝜐𝑗,𝜏 = ∑ 𝐴𝑘 𝐶𝑖𝜏 𝑖′𝜏 𝐶 ′ 𝐴 𝑘 .
𝑘=0
𝑁
Note that ∑𝜏=1 𝑖𝜏 𝑖′𝜏 = 𝐼, so that we have
𝑁
∑ 𝜐𝑗,𝜏 = 𝜐𝑗
𝜏=1
𝑏𝑡 = 𝑈𝑏 𝑧𝑡 and 𝑑𝑡 = 𝑈𝑑 𝑧𝑡 ,
𝑈𝑏 and 𝑈𝑑 are matrices that select entries of 𝑧𝑡 . The law of motion for {𝑧𝑡 ∶ 𝑡 = 0, 1, …} is

𝑧_{𝑡+1} = 𝐴22 𝑧𝑡 + 𝐶2 𝑤_{𝑡+1}

where 𝑧0 is a given initial condition. The eigenvalues of the matrix 𝐴22 have absolute values that are less than or equal to one
Thus, in summary, our model of information and shocks is
• Production technologies
• Household technologies
• Household preferences
Production Technology
Where 𝑐𝑡 is a vector of consumption rates, 𝑘𝑡 is a vector of physical capital goods, 𝑔𝑡 is a vector of intermediate production goods, and 𝑑𝑡 is a vector of technology shocks, the production technology is
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
𝑔𝑡 ⋅ 𝑔𝑡 = ℓ𝑡2
Here Φ𝑐 , Φ𝑔 , Φ𝑖 , Γ, Δ𝑘 , Θ𝑘 are all matrices conformable to the vectors they multiply and ℓ𝑡 is a
disutility generating resource supplied by the household
For technical reasons that facilitate computations, we make the following
Assumption: [Φ𝑐 Φ𝑔 ] is nonsingular
Household Technology
Households confront a technology that allows them to devote consumption goods to construct
a vector ℎ𝑡 of household capital goods and a vector 𝑠𝑡 of utility generating house services
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
Household preferences are ordered by

−(1/2) 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝑠𝑡 − 𝑏𝑡 ) ⋅ (𝑠𝑡 − 𝑏𝑡 ) + ℓ𝑡² ] ∣ 𝐽0 ,  0 < 𝛽 < 1
We now proceed to give examples of production and household technologies that appear in
various models that appear in the literature
First, we give examples of production Technologies
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
∣ 𝑔𝑡 ∣≤ ℓ𝑡
So 𝑐𝑡 = 𝑑𝑡
To implement this specification, we can choose 𝐴22 , 𝐶2 , and 𝑈𝑑 to make 𝑑𝑡 follow any of a
variety of stochastic processes
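For instance, a scalar AR(1) endowment with unconditional mean can be encoded as follows (a sketch with illustrative numbers; the first component of 𝑧𝑡 is a constant):

```python
import numpy as np

# d_t = mu_d + tilde_d_t, with tilde_d_{t+1} = rho * tilde_d_t + sigma * w_{t+1}
# and z_t = (1, tilde_d_t)'
rho, sigma, mu_d = 0.8, 0.3, 5.0
A22 = np.array([[1.0, 0.0],
                [0.0, rho]])
C2 = np.array([[0.0],
               [sigma]])
Ud = np.array([[mu_d, 1.0]])   # selects d_t = mu_d * 1 + tilde_d_t
```

Richer specifications (higher-order autoregressions, several shocks) just enlarge 𝑧𝑡 and the corresponding matrices.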
To satisfy our earlier rank assumption, we set:
𝑐𝑡 + 𝑖𝑡 = 𝑑1𝑡
𝑔𝑡 = 𝜙1 𝑖𝑡
Φ𝑐 = [1 0]′ ,  Φ𝑖 = [1 𝜙1 ]′ ,  Φ𝑔 = [0 −1]′ ,  Γ = [0 0]′ ,  𝑑𝑡 = [𝑑1𝑡 0]′
We can use this specification to create a linear-quadratic version of Lucas’s (1978) asset pric-
ing model
There is a single consumption good, a single intermediate good, and a single investment good
The technology is described by
Set
Φ𝑐 = [1 0]′ ,  Φ𝑔 = [0 −1]′ ,  Φ𝑖 = [0 𝜙1 ]′ ,  Γ = [𝛾 0]′ ,  Δ𝑘 = 𝛿𝑘 ,  Θ𝑘 = 1
We set 𝐴22 , 𝐶2 and 𝑈𝑑 to make (𝑑1𝑡 , 𝑑2𝑡 )′ = 𝑑𝑡 follow a desired stochastic process
Now we describe some examples of preferences, which as we have seen are ordered by
−(1/2) 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝑠𝑡 − 𝑏𝑡 ) ⋅ (𝑠𝑡 − 𝑏𝑡 ) + (ℓ𝑡 )² ] ∣ 𝐽0 ,  0 < 𝛽 < 1
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
and we make
Assumption: The absolute values of the eigenvalues of Δℎ are less than or equal to one
Later we shall introduce canonical household technologies that satisfy an ‘invertibility’ re-
quirement relating sequences {𝑠𝑡 } of services and {𝑐𝑡 } of consumption flows
And we’ll describe how to obtain a canonical representation of a household technology from
one that is not canonical
Here are some examples of household preferences
Time Separable preferences
−(1/2) 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝑐𝑡 − 𝑏𝑡 )² + ℓ𝑡² ] ∣ 𝐽0 ,  0 < 𝛽 < 1
Consumer Durables
Services at 𝑡 are related to the stock of durables at the beginning of the period:
𝑠𝑡 = 𝜆ℎ𝑡−1 , 𝜆 > 0
−(1/2) 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝜆ℎ𝑡−1 − 𝑏𝑡 )² + ℓ𝑡² ] ∣ 𝐽0
Set Δℎ = 𝛿ℎ , Θℎ = 1, Λ = 𝜆, Π = 0
Habit Persistence
−(1/2) 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝑐𝑡 − 𝜆(1 − 𝛿ℎ ) ∑_{𝑗=0}^{∞} 𝛿ℎ^𝑗 𝑐_{𝑡−𝑗−1} − 𝑏𝑡 )² + ℓ𝑡² ] ∣ 𝐽0

ℎ𝑡 = (1 − 𝛿ℎ ) ∑_{𝑗=0}^{𝑡} 𝛿ℎ^𝑗 𝑐_{𝑡−𝑗} + 𝛿ℎ^{𝑡+1} ℎ−1
𝑠𝑡 = −𝜆ℎ𝑡−1 + 𝑐𝑡 , 𝜆 > 0
−(1/2) 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝑐𝑡 − 𝜆(1 − 𝛿ℎ ) ∑_{𝑗=0}^{∞} 𝛿ℎ^𝑗 𝑐_{𝑡−4𝑗−4} − 𝑏𝑡 )² + ℓ𝑡² ]
Stacking the four most recent values of ℎ̃ gives the household capital vector

ℎ𝑡 = [ℎ̃𝑡 ℎ̃𝑡−1 ℎ̃𝑡−2 ℎ̃𝑡−3 ]′ = Δℎ [ℎ̃𝑡−1 ℎ̃𝑡−2 ℎ̃𝑡−3 ℎ̃𝑡−4 ]′ + Θℎ 𝑐𝑡

where Δℎ is the 4 × 4 matrix with first row [0 0 0 𝛿ℎ ] and ones on the subdiagonal, and Θℎ = [(1 − 𝛿ℎ ) 0 0 0]′
Adjustment Costs
Recall
−(1/2) 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝑐𝑡 − 𝑏1𝑡 )² + 𝜆² (𝑐𝑡 − 𝑐𝑡−1 )² + ℓ𝑡² ] ∣ 𝐽0 ,  0 < 𝛽 < 1 , 𝜆 > 0
ℎ𝑡 = 𝑐 𝑡
𝑠𝑡 = [0 −𝜆]′ ℎ𝑡−1 + [1 𝜆]′ 𝑐𝑡
so that

𝑠1𝑡 = 𝑐𝑡  and  𝑠2𝑡 = −𝜆ℎ𝑡−1 + 𝜆𝑐𝑡 = 𝜆(𝑐𝑡 − 𝑐𝑡−1 )
We set the first component 𝑏1𝑡 of 𝑏𝑡 to capture the stochastic bliss process and set the second
component identically equal to zero.
Thus, we set Δℎ = 0, Θℎ = 1
Λ = [0 −𝜆]′ ,  Π = [1 𝜆]′
Λ = [0 0]′ and Π = [𝜋1 0; 𝜋2 𝜋3 ].
−(1/2) 𝛽^𝑡 (Π𝑐𝑡 − 𝑏𝑡 )′ (Π𝑐𝑡 − 𝑏𝑡 )

𝑚𝑢𝑡 = −𝛽^𝑡 [Π′ Π 𝑐𝑡 − Π′ 𝑏𝑡 ]
Production Technology
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
𝑔𝑡 ⋅ 𝑔𝑡 = ℓ𝑡2
Household Technology
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
Preferences

−(1/2) 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝑠𝑡 − 𝑏𝑡 ) ⋅ (𝑠𝑡 − 𝑏𝑡 ) + ℓ𝑡² ] ∣ 𝐽0 ,  0 < 𝛽 < 1

The planning problem is to maximize

−(1/2) 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝑠𝑡 − 𝑏𝑡 ) ⋅ (𝑠𝑡 − 𝑏𝑡 ) + 𝑔𝑡 ⋅ 𝑔𝑡 ] ∣ 𝐽0

subject to
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡 ,
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡 ,
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡 ,
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡 ,
𝑧𝑡+1 = 𝐴22 𝑧𝑡 + 𝐶2 𝑤𝑡+1 , 𝑏𝑡 = 𝑈𝑏 𝑧𝑡 , and 𝑑𝑡 = 𝑈𝑑 𝑧𝑡
𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 ℎ𝑡 ⋅ ℎ𝑡 ∣ 𝐽0 < ∞ and 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑘𝑡 ⋅ 𝑘𝑡 ∣ 𝐽0 < ∞
Define:
𝐿0² = [{𝑦𝑡 } ∶ 𝑦𝑡 is a random variable in 𝐽𝑡 and 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑦𝑡² ∣ 𝐽0 < +∞]
Thus, we require that each component of ℎ𝑡 and each component of 𝑘𝑡 belong to 𝐿20
We shall compare and utilize two approaches to solving the planning problem
• Lagrangian
• Dynamic Programming
ℒ = −𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(1/2)[(𝑠𝑡 − 𝑏𝑡 ) ⋅ (𝑠𝑡 − 𝑏𝑡 ) + 𝑔𝑡 ⋅ 𝑔𝑡 ]
    + 𝑀𝑡𝑑′ ⋅ (Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 − Γ𝑘𝑡−1 − 𝑑𝑡 )
    + 𝑀𝑡𝑘′ ⋅ (𝑘𝑡 − Δ𝑘 𝑘𝑡−1 − Θ𝑘 𝑖𝑡 )
    + 𝑀𝑡ℎ′ ⋅ (ℎ𝑡 − Δℎ ℎ𝑡−1 − Θℎ 𝑐𝑡 )] ∣ 𝐽0
for 𝑡 = 0, 1, …
In addition, we have the complementary slackness conditions (these recover the original tran-
sition equations) and also transversality conditions
lim_{𝑡→∞} 𝛽^𝑡 𝐸[𝑀𝑡𝑘′ 𝑘𝑡 ] ∣ 𝐽0 = 0

lim_{𝑡→∞} 𝛽^𝑡 𝐸[𝑀𝑡ℎ′ ℎ𝑡 ] ∣ 𝐽0 = 0
The system formed by the FONCs and the transition equations can be handed over to
Python
Python will solve the planning problem for fixed parameter values
Here are the Python Ready Equations
𝑀𝑡𝑠 = 𝑏𝑡 − 𝑠𝑡

𝑀𝑡ℎ = 𝐸[∑_{𝜏=1}^{∞} 𝛽^𝜏 (Δℎ′ )^{𝜏−1} Λ′ 𝑀𝑠_{𝑡+𝜏} ∣ 𝐽𝑡 ]

𝑀𝑡𝑑 = [Φ𝑐′ ; Φ𝑔′ ]⁻¹ [Θℎ′ 𝑀𝑡ℎ + Π′ 𝑀𝑡𝑠 ; −𝑔𝑡 ]

𝑀𝑡𝑘 = 𝐸[∑_{𝜏=1}^{∞} 𝛽^𝜏 (Δ𝑘′ )^{𝜏−1} Γ′ 𝑀𝑑_{𝑡+𝜏} ∣ 𝐽𝑡 ]

(here [𝑎 ; 𝑏] denotes 𝑎 stacked above 𝑏)
Although it is possible to use matrix operator methods to solve the above Python ready
equations, that is not the approach we’ll use
Instead, we’ll use dynamic programming to get recursive representations for both quantities
and shadow prices
Φ𝑐 𝑐0 + Φ𝑔 𝑔0 + Φ𝑖 𝑖0 = Γ𝑘−1 + 𝑑0 ,
𝑘0 = Δ𝑘 𝑘−1 + Θ𝑘 𝑖0 ,
ℎ0 = Δℎ ℎ−1 + Θℎ 𝑐0 ,
𝑠0 = Λℎ−1 + Π𝑐0 ,
𝑧1 = 𝐴22 𝑧0 + 𝐶2 𝑤1 , 𝑏0 = 𝑈𝑏 𝑧0 and 𝑑0 = 𝑈𝑑 𝑧0
𝑉 (𝑥) = 𝑥′ 𝑃 𝑥 + 𝜌
−𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 + 2𝑢′𝑡 𝑊 ′ 𝑥𝑡 ],  0 < 𝛽 < 1
subject to 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡 + 𝐶𝑤𝑡+1

The optimal value function is 𝑉 (𝑥𝑡 ) = −𝑥′𝑡 𝑃 𝑥𝑡 − 𝜌, where 𝑃 satisfies the algebraic matrix Riccati equation

𝑃 = 𝑅 + 𝛽𝐴′ 𝑃 𝐴 − (𝛽𝐴′ 𝑃 𝐵 + 𝑊 )(𝑄 + 𝛽𝐵′ 𝑃 𝐵)⁻¹ (𝛽𝐵′ 𝑃 𝐴 + 𝑊 ′ )
The optimum decision rule for 𝑢𝑡 is independent of the parameters 𝐶, and so of the noise
statistics
Iterating on the Bellman operator leads to
𝑉𝑗 (𝑥𝑡 ) = −𝑥′𝑡 𝑃𝑗 𝑥𝑡 − 𝜌𝑗
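Each Bellman iteration updates 𝑃𝑗 through the Riccati operator. Here is a minimal numpy sketch of the iteration (the matrices are generic placeholders, and the update below is the standard discounted Riccati recursion with cross-product term 𝑊):

```python
import numpy as np

def riccati_iterate(A, B, R, Q, W, beta, tol=1e-10, max_iter=10_000):
    # Iterate P <- R + beta A'PA
    #             - (beta A'PB + W)(Q + beta B'PB)^{-1}(beta B'PA + W')
    P = np.zeros_like(R)
    for _ in range(max_iter):
        M = np.linalg.solve(Q + beta * B.T @ P @ B,
                            beta * B.T @ P @ A + W.T)
        P_new = R + beta * A.T @ P @ A - (beta * A.T @ P @ B + W) @ M
        if np.max(np.abs(P_new - P)) < tol:
            return P_new
        P = P_new
    return P
```

In practice QuantEcon's LQ class performs this computation; the sketch only illustrates the mechanics.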
max_{{𝑢𝑡 ,𝑥𝑡+1 }} −𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 + 2𝑢′𝑡 𝑊 ′ 𝑥𝑡 ],  0 < 𝛽 < 1
ℎ𝑡−1
𝑥𝑡 = ⎡ ⎤
⎢𝑘𝑡−1 ⎥ , 𝑢𝑡 = 𝑖𝑡
⎣ 𝑧𝑡 ⎦
where
𝑆 = (𝐺′ 𝐺 + 𝐻 ′ 𝐻)/2
For us a useful fact is that Lagrange multipliers equal gradients of the planner’s value func-
tion
ℳ𝑑𝑡 = 𝑀𝑑 𝑥𝑡 where 𝑀𝑑 = [Φ𝑐′ ; Φ𝑔′ ]⁻¹ [Θℎ′ 𝑀ℎ + Π′ 𝑀𝑠 ; −𝑆𝑔 ]
We will use this fact and these equations to compute competitive equilibrium prices
Let’s start with describing the commodity space and pricing functional for our competi-
tive equilibrium
For the commodity space, we use
𝐿0² = [{𝑦𝑡 } ∶ 𝑦𝑡 is a random variable in 𝐽𝑡 and 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑦𝑡² ∣ 𝐽0 < +∞]
𝜋(𝑐) = 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑝𝑡0 ⋅ 𝑐𝑡 ∣ 𝐽0
The representative household owns the endowment process and initial stocks of ℎ and 𝑘 and chooses stochastic processes for {𝑐𝑡 , 𝑠𝑡 , ℎ𝑡 , ℓ𝑡 }_{𝑡=0}^{∞} , each element of which is in 𝐿0² , to maximize
−(1/2) 𝐸0 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝑠𝑡 − 𝑏𝑡 ) ⋅ (𝑠𝑡 − 𝑏𝑡 ) + ℓ𝑡² ]
subject to
𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑝𝑡0 ⋅ 𝑐𝑡 ∣ 𝐽0 = 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 (𝑤𝑡0 ℓ𝑡 + 𝛼𝑡0 ⋅ 𝑑𝑡 ) ∣ 𝐽0 + 𝑣0 ⋅ 𝑘−1
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
We now describe the problems faced by two types of firms called type I and type II
A type I firm rents capital and labor and endowments and produces 𝑐𝑡 , 𝑖𝑡 .
It chooses stochastic processes for {𝑐𝑡 , 𝑖𝑡 , 𝑘𝑡 , ℓ𝑡 , 𝑔𝑡 , 𝑑𝑡 }, each element of which is in 𝐿20 , to maxi-
mize
𝐸0 ∑_{𝑡=0}^{∞} 𝛽^𝑡 (𝑝𝑡0 ⋅ 𝑐𝑡 + 𝑞𝑡0 ⋅ 𝑖𝑡 − 𝑟𝑡0 ⋅ 𝑘𝑡−1 − 𝑤𝑡0 ℓ𝑡 − 𝛼𝑡0 ⋅ 𝑑𝑡 )
subject to
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
− ℓ𝑡2 + 𝑔𝑡 ⋅ 𝑔𝑡 = 0
A firm of type II acquires capital via investment and then rents stocks of capital to the 𝑐, 𝑖-
producing type I firm
A type II firm is a price taker facing the vector 𝑣0 and the stochastic processes {𝑟𝑡0 , 𝑞𝑡0 }.
The firm chooses 𝑘−1 and stochastic processes for {𝑘𝑡 , 𝑖𝑡 }_{𝑡=0}^{∞} to maximize

𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 (𝑟𝑡0 ⋅ 𝑘𝑡−1 − 𝑞𝑡0 ⋅ 𝑖𝑡 ) ∣ 𝐽0 − 𝑣0 ⋅ 𝑘−1
subject to
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
• Each component of the price system and the allocation resides in the space
𝐿20
• Given the price system and given ℎ−1 , 𝑘−1 , the allocation solves the represen-
tative household’s problem and the problems of the two types of firms
Versions of the two classical welfare theorems prevail under our assumptions
We exploit that fact in our algorithm for computing a competitive equilibrium
The allocation (i.e., quantities) that solve the planning problem are the competitive equilib-
rium quantities
Step 2: use the following formulas to compute the equilibrium price system
𝑤𝑡0 = |𝑆𝑔 𝑥𝑡 | / 𝜇0𝑤

𝑣0 = Γ′ 𝑀0𝑑 /𝜇0𝑤 + Δ𝑘′ 𝑀0𝑘 /𝜇0𝑤
Verification: With this price system, values can be assigned to the Lagrange multipliers for
each of our three classes of agents that cause all first-order necessary conditions to be satisfied
at these prices and at the quantities associated with the optimum of the planning problem
An important use of an equilibrium pricing system is to do asset pricing
Thus, imagine that we are presented a dividend stream: {𝑦𝑡 } ∈ 𝐿20 and want to compute the
value of a perpetual claim to this stream
To value this asset we simply take price times quantity and add to get an asset value:
𝑎0 = 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑝𝑡0 ⋅ 𝑦𝑡 ∣ 𝐽0

To compute 𝑎0 we proceed as follows

We let
𝑦𝑡 = 𝑈𝑎 𝑥𝑡
𝑦𝑡 = 𝑈𝑎 𝑥𝑡

𝑎0 = 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑥′𝑡 𝑍𝑎 𝑥𝑡 ∣ 𝐽0

where 𝑍𝑎 = 𝑈𝑎′ 𝑀𝑐 /𝜇0𝑤
𝑎0 = 𝑥′0 𝜇𝑎 𝑥0 + 𝜎𝑎

where

𝜇𝑎 = ∑_{𝜏=0}^{∞} 𝛽^𝜏 (𝐴𝑜′ )^𝜏 𝑍𝑎 (𝐴𝑜 )^𝜏

𝜎𝑎 = (𝛽/(1 − 𝛽)) trace (𝑍𝑎 ∑_{𝜏=0}^{∞} 𝛽^𝜏 (𝐴𝑜 )^𝜏 𝐶𝐶 ′ (𝐴𝑜′ )^𝜏 )
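Both infinite sums above solve discounted discrete Lyapunov equations, so 𝑎0 can be computed without truncation. A sketch (matrix inputs are generic placeholders, and 𝑍𝑎 is taken as given):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def asset_value(Ao, C, Za, x0, beta):
    b = np.sqrt(beta)
    # mu_a = sum_tau beta^tau (Ao')^tau Za Ao^tau  solves
    #   mu_a = Za + beta * Ao' mu_a Ao
    mu_a = solve_discrete_lyapunov(b * Ao.T, Za)
    # Sigma = sum_tau beta^tau Ao^tau C C' (Ao')^tau
    Sigma = solve_discrete_lyapunov(b * Ao, C @ C.T)
    sigma_a = beta / (1 - beta) * np.trace(Za @ Sigma)
    return x0.T @ mu_a @ x0 + sigma_a
```

Scaling 𝐴𝑜 by √𝛽 turns each discounted sum into a standard discrete Lyapunov equation, which scipy solves directly.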
𝐿𝑡² = [{𝑦𝑠 }_{𝑠=𝑡}^{∞} ∶ 𝑦𝑠 is a random variable in 𝐽𝑠 for 𝑠 ≥ 𝑡 and 𝐸 ∑_{𝑠=𝑡}^{∞} 𝛽^{𝑠−𝑡} 𝑦𝑠² ∣ 𝐽𝑡 < +∞].
𝑤𝑠𝑡 =∣ 𝑆𝑔 𝑥𝑠 |/[𝑒𝑗̄ 𝑀𝑐 𝑥𝑡 ], 𝑠 ≥ 𝑡
𝑟𝑠𝑡 = Γ′ 𝑀𝑑 𝑥𝑠 /[𝑒𝑗̄ 𝑀𝑐 𝑥𝑡 ], 𝑠 ≥ 𝑡
𝛼𝑡𝑠 = 𝑀𝑑 𝑥𝑠 /[𝑒𝑗̄ 𝑀𝑐 𝑥𝑡 ], 𝑠 ≥ 𝑡
58.3 Econometrics
Up to now, we have described how to solve the direct problem that maps model parameters
into an (equilibrium) stochastic process of prices and quantities
Recall the inverse problem of inferring model parameters from a single realization of a time
series of some of the prices and quantities
Another name for the inverse problem is econometrics
An advantage of the [59] structure is that it comes with a self-contained theory of economet-
rics
𝑥𝑡+1 = 𝐴𝑜 𝑥𝑡 + 𝐶𝑤𝑡+1
𝑦𝑡 = 𝐺𝑥𝑡 + 𝑣𝑡
where 𝑣𝑡 is a martingale difference sequence of measurement errors that satisfies 𝐸𝑣𝑡 𝑣𝑡′ =
𝑅, 𝐸𝑤𝑡+1 𝑣𝑠′ = 0 for all 𝑡 + 1 ≥ 𝑠 and
𝑥0 ∼ 𝒩(𝑥0̂ , Σ0 ).
Innovations Representation:
𝑥̂𝑡+1 = 𝐴𝑜 𝑥̂𝑡 + 𝐾𝑡 𝑎𝑡
𝑦𝑡 = 𝐺𝑥̂𝑡 + 𝑎𝑡 ,
• 𝑛𝑤 + 𝑛𝑦 versus 𝑛𝑦
• 𝐻(𝑦𝑡 ) ⊂ 𝐻(𝑤𝑡 , 𝑣𝑡 )
• 𝐻(𝑦𝑡 ) = 𝐻(𝑎𝑡 )
Kalman Filter:
Kalman gain:
𝐾𝑡 = 𝐴𝑜 Σ𝑡 𝐺′ (𝐺Σ𝑡 𝐺′ + 𝑅)−1
Σ𝑡+1 = 𝐴𝑜 Σ𝑡 𝐴𝑜′ + 𝐶𝐶 ′
− 𝐴𝑜 Σ𝑡 𝐺′ (𝐺Σ𝑡 𝐺′ + 𝑅)−1 𝐺Σ𝑡 𝐴𝑜′
𝑎𝑡 = 𝑦𝑡 − 𝐺𝑥̂𝑡
𝑥̂𝑡+1 = 𝐴𝑜 𝑥̂𝑡 + 𝐾𝑡 𝑎𝑡
can be used recursively to construct a record of innovations {𝑎𝑡 }𝑇𝑡=0 from an (𝑥0̂ , Σ0 ) and a
record of observations {𝑦𝑡 }𝑇𝑡=0
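These recursions translate directly into code; here is a minimal sketch (function and argument names are our own, and y is any sequence of observation vectors):

```python
import numpy as np

def innovations(y, Ao, G, C, R, x_hat, Sigma):
    # Builds the innovations {a_t} from observations {y_t} via
    #   a_t = y_t - G x_hat_t,    K_t = Ao Sigma_t G'(G Sigma_t G' + R)^{-1}
    #   x_hat_{t+1} = Ao x_hat_t + K_t a_t
    #   Sigma_{t+1} = Ao Sigma_t Ao' + C C' - K_t G Sigma_t Ao'
    a = []
    for y_t in y:
        a_t = y_t - G @ x_hat
        K = Ao @ Sigma @ G.T @ np.linalg.inv(G @ Sigma @ G.T + R)
        x_hat = Ao @ x_hat + K @ a_t
        Sigma = Ao @ Sigma @ Ao.T + C @ C.T - K @ G @ Sigma @ Ao.T
        a.append(a_t)
    return a
```

The returned innovations are exactly the inputs needed for the Gaussian log-likelihood below.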
Σ = 𝐴𝑜 Σ𝐴𝑜′ + 𝐶𝐶 ′
− 𝐴𝑜 Σ𝐺′ (𝐺Σ𝐺′ + 𝑅)−1 𝐺Σ𝐴𝑜′
𝐾 = 𝐴𝑜 Σ𝑡 𝐺′ (𝐺Σ𝐺′ + 𝑅)−1
𝑥̂𝑡+1 = 𝐴𝑜 𝑥̂𝑡 + 𝐾𝑎𝑡
𝑦𝑡 = 𝐺𝑥̂𝑡 + 𝑎𝑡
𝑓(𝑦𝑇 , 𝑦𝑇 −1 , … , 𝑦0 ) = 𝑓𝑇 (𝑦𝑇 |𝑦𝑇 −1 , … , 𝑦0 )𝑓𝑇 −1 (𝑦𝑇 −1 |𝑦𝑇 −2 , … , 𝑦0 ) ⋯ 𝑓1 (𝑦1 |𝑦0 )𝑓0 (𝑦0 )
= 𝑔𝑇 (𝑎𝑇 )𝑔𝑇 −1 (𝑎𝑇 −1 ) … 𝑔1 (𝑎1 )𝑓0 (𝑦0 ).
Gaussian Log-Likelihood:
−.5 ∑_{𝑡=0}^{𝑇} {𝑛𝑦 ln(2𝜋) + ln |Ω𝑡 | + 𝑎′𝑡 Ω𝑡⁻¹ 𝑎𝑡 }
Key Insight: The zeros of the polynomial det[𝐺(𝑧𝐼 − 𝐴𝑜 )−1 𝐾 + 𝐼] all lie inside the unit cir-
cle, which means that 𝑎𝑡 lies in the space spanned by square summable linear combinations of
𝑦𝑡
𝐻(𝑎𝑡 ) = 𝐻(𝑦𝑡 )
𝐿𝑥𝑡 ≡ 𝑥𝑡−1
𝐿−1 𝑥𝑡 ≡ 𝑥𝑡+1
Applying the inverse of the operator on the right side and using
𝑦𝑡 = ∑_{𝑗=1}^{∞} 𝐺(𝐴𝑜 − 𝐾𝐺)^{𝑗−1} 𝐾𝑦𝑡−𝑗 + 𝑎𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
• Π is nonsingular, and
• the absolute values of the eigenvalues of (Δℎ − Θℎ Π⁻¹ Λ) are strictly less than 1/√𝛽.
The restriction on the eigenvalues of the matrix (Δℎ − Θℎ Π−1 Λ) keeps the household capital
stock {ℎ𝑡 } in 𝐿20
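Both requirements are easy to check numerically for a candidate (Δℎ , Θℎ , Λ, Π, 𝛽); a minimal sketch:

```python
import numpy as np

def is_canonical(Delta_h, Theta_h, Lambda, Pi, beta):
    # Requirement 1: Pi nonsingular
    if np.linalg.matrix_rank(Pi) < Pi.shape[0]:
        return False
    # Requirement 2: spectral radius of (Delta_h - Theta_h Pi^{-1} Lambda)
    # strictly below 1 / sqrt(beta)
    M = Delta_h - Theta_h @ np.linalg.inv(Pi) @ Lambda
    return np.max(np.abs(np.linalg.eigvals(M))) < 1 / np.sqrt(beta)
```

When the check fails, the factorization identity described below delivers an observationally equivalent canonical representation.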
𝑠𝑖,𝑡 = Λℎ𝑖,𝑡−1
ℎ𝑖,𝑡 = Δℎ ℎ𝑖,𝑡−1
𝑊0 = 𝐸0 ∑_{𝑡=0}^{∞} 𝛽^𝑡 (𝑤𝑡0 ℓ𝑡 + 𝛼𝑡0 ⋅ 𝑑𝑡 ) + 𝑣0 ⋅ 𝑘−1

𝜇0𝑤 = [𝐸0 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝜌𝑡0 ⋅ (𝑏𝑡 − 𝑠𝑖,𝑡 ) − 𝑊0 ] / [𝐸0 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝜌𝑡0 ⋅ 𝜌𝑡0 ]
This system expresses consumption demands at date 𝑡 as functions of: (i) time-𝑡 conditional expectations of future scaled Arrow-Debreu prices {𝑝0𝑡+𝑠 }_{𝑠=0}^{∞} ; (ii) the stochastic process for the household's endowment {𝑑𝑡 } and preference shock {𝑏𝑡 }, as mediated through the multiplier 𝜇0𝑤 and wealth 𝑊0 ; and (iii) past values of consumption, as mediated through the state variable ℎ𝑡−1
We shall explore how the dynamic demand schedule for consumption goods opens up the pos-
sibility of satisfying Gorman’s (1953) conditions for aggregation in a heterogeneous consumer
model
The first equation of our demand system is an Engel curve for consumption that is linear in the marginal utility 𝜇0𝑤 of individual wealth with a coefficient on 𝜇0𝑤 that depends only on prices
The multiplier 𝜇0𝑤 depends on wealth in an affine relationship, so that consumption is linear in wealth
In a model with multiple consumers who have the same household technologies (Δℎ , Θℎ , Λ, Π)
but possibly different preference shock processes and initial values of household capital stocks,
the coefficient on the marginal utility of wealth is the same for all consumers
Gorman showed that when Engel curves satisfy this property, there exists a unique commu-
nity or aggregate preference ordering over aggregate consumption that is independent of the
distribution of wealth
𝑠𝑖,𝑡 = Λℎ𝑖,𝑡−1
ℎ𝑖,𝑡 = Δℎ ℎ𝑖,𝑡−1 ,
𝑊𝑡 = 𝐸𝑡 ∑_{𝑗=0}^{∞} 𝛽^𝑗 (𝑤𝑡𝑡+𝑗 ℓ𝑡+𝑗 + 𝛼𝑡𝑡+𝑗 ⋅ 𝑑𝑡+𝑗 ) + 𝑣𝑡 ⋅ 𝑘𝑡−1

𝜇𝑡𝑤 = [𝐸𝑡 ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝜌𝑡𝑡+𝑗 ⋅ (𝑏𝑡+𝑗 − 𝑠𝑖,𝑡+𝑗 ) − 𝑊𝑡 ] / [𝐸𝑡 ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝜌𝑡𝑡+𝑗 ⋅ 𝜌𝑡𝑡+𝑗 ]
[Π + 𝛽^{1/2} 𝐿⁻¹ Λ(𝐼 − 𝛽^{1/2} 𝐿⁻¹ Δℎ )⁻¹ Θℎ ]′ [Π + 𝛽^{1/2} 𝐿Λ(𝐼 − 𝛽^{1/2} 𝐿Δℎ )⁻¹ Θℎ ]
= [Π̂ + 𝛽^{1/2} 𝐿⁻¹ Λ̂(𝐼 − 𝛽^{1/2} 𝐿⁻¹ Δℎ )⁻¹ Θℎ ]′ [Π̂ + 𝛽^{1/2} 𝐿Λ̂(𝐼 − 𝛽^{1/2} 𝐿Δℎ )⁻¹ Θℎ ]
The factorization identity guarantees that the [Λ,̂ Π]̂ representation satisfies both require-
ments for a canonical representation
58.5. GORMAN AGGREGATION AND ENGEL CURVES 991
Demand:
𝐸0 ∑_{𝑡=0}^{∞} 𝛽^𝑡 {𝑝𝑡 ⋅ 𝑐𝑡 − 𝑔𝑡 ⋅ 𝑔𝑡 /2}
Φ𝑐 𝑐𝑡 + Φ𝑖 𝑖𝑡 + Φ𝑔 𝑔𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡 .
𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 {𝑝𝑡 𝑐𝑡 − 𝑔𝑡² /2}
𝑐𝑡 = 𝛾𝑘𝑡−1
𝑘𝑡 = 𝛿𝑘 𝑘𝑡−1 + 𝑖𝑡
𝑔𝑡 = 𝑓1 𝑖𝑡 + 𝑓2 𝑑𝑡
where 𝑑𝑡 is a cost shifter, 𝛾 > 0, and 𝑓1 > 0 is a cost parameter and 𝑓2 = 1. Demand is
governed by
𝑝𝑡 = 𝛼0 − 𝛼1 𝑐𝑡 + 𝑢𝑡
where 𝑢𝑡 is a demand shifter with mean zero and 𝛼0 , 𝛼1 are positive parameters
Assume that 𝑢𝑡 , 𝑑𝑡 are uncorrelated first-order autoregressive processes
𝑅𝑡 = 𝑏𝑡 + 𝛼ℎ𝑡
𝑝𝑡 = 𝐸𝑡 ∑_{𝜏=0}^{∞} (𝛽𝛿ℎ )^𝜏 𝑅𝑡+𝜏
where ℎ𝑡 is the stock of housing at time 𝑡, 𝑅𝑡 is the rental rate for housing, 𝑝𝑡 is the price of new houses, and 𝑏𝑡 is a demand shifter; 𝛼 < 0 is a demand parameter, and 𝛿ℎ is a depreciation factor for houses
We cast this demand specification within our class of models by letting the stock of houses ℎ𝑡
evolve according to
ℎ𝑡 = 𝛿ℎ ℎ𝑡−1 + 𝑐𝑡 , 𝛿ℎ ∈ (0, 1)
𝑠𝑡 = 𝑏𝑡 − 𝜇0 𝜌𝑡0
where the price of new houses 𝑝𝑡 is related to 𝜌𝑡0 by 𝜌𝑡0 = 𝜋−1 [𝑝𝑡 − 𝛽𝛿ℎ 𝐸𝑡 𝑝𝑡+1 ]
Rosen, Murphy, and Scheinkman (1994). Let 𝑝𝑡 be the price of freshly slaughtered beef, 𝑚𝑡
the feeding cost of preparing an animal for slaughter, ℎ̃ 𝑡 the one-period holding cost for a ma-
ture animal, 𝛾1 ℎ̃ 𝑡 the one-period holding cost for a yearling, and 𝛾0 ℎ̃ 𝑡 the one-period holding
cost for a calf
The cost processes {ℎ̃ 𝑡 , 𝑚𝑡 }∞ ∞
𝑡=0 are exogenous, while the stochastic process {𝑝𝑡 }𝑡=0 is deter-
mined by a rational expectations equilibrium. Let 𝑥𝑡̃ be the breeding stock, and 𝑦𝑡̃ be the to-
tal stock of animals
The law of motion for cattle stocks is
𝑥𝑡̃ = (1 − 𝛿)𝑥𝑡−1
̃ + 𝑔𝑥𝑡−3
̃ − 𝑐𝑡
𝐸0 ∑_{𝑡=0}^{∞} 𝛽^𝑡 {𝑝𝑡 𝑐𝑡 − ℎ̃ 𝑡 𝑥𝑡̃ − (𝛾0 ℎ̃ 𝑡 )(𝑔𝑥𝑡−1
̃ ) − (𝛾1 ℎ̃ 𝑡 )(𝑔𝑥𝑡−2
̃ ) − 𝑚𝑡 𝑐𝑡 − Ψ(𝑥𝑡̃ , 𝑥𝑡−1
̃ , 𝑥𝑡−2
̃ , 𝑐𝑡 )}
where
Ψ = (𝜓1 /2)𝑥𝑡̃ ² + (𝜓2 /2)𝑥𝑡−1
̃ ² + (𝜓3 /2)𝑥𝑡−2
̃ ² + (𝜓4 /2)𝑐𝑡²
Demand is governed by
𝑐𝑡 = 𝛼0 − 𝛼1 𝑝𝑡 + 𝑑𝑡̃
Ryoo and Rosen’s (2004) [114] model consists of the following equations:
first, a demand curve for engineers
third, a definition of the discounted present value of each new engineering student
𝑣𝑡 = 𝛽^𝑘 𝐸𝑡 ∑_{𝑗=0}^{∞} (𝛽𝛿𝑁 )^𝑗 𝑤𝑡+𝑘+𝑗 ;
𝑛𝑡 = 𝛼𝑠 𝑣𝑡 + 𝜖2𝑡 , 𝛼𝑠 > 0
Here {𝜖1𝑡 , 𝜖2𝑡 } are stochastic processes of labor demand and supply shocks
Definition: A partial equilibrium is a stochastic process {𝑤𝑡 , 𝑁𝑡 , 𝑣𝑡 , 𝑛𝑡 }_{𝑡=0}^{∞} satisfying these four equations, and initial conditions 𝑁−1 , 𝑛−𝑠 , 𝑠 = 1, … , 𝑘
We sweep the time-to-build structure and the demand for engineers into the household technology and put the supply of new engineers into the technology for producing goods
ℎ1𝑡−1
⎡ ℎ ⎤
𝑠𝑡 = [𝜆1 0 … 0] ⎢ 2𝑡−1 ⎥ + 0 ⋅ 𝑐𝑡
⎢ ⋮ ⎥
ℎ
⎣ 𝑘+1,𝑡−1 ⎦
ℎ1𝑡 𝛿𝑁 1 0 ⋯ 0 ℎ1𝑡−1 0
⎡ ℎ ⎤ ⎡0 0 1 ⋯ 0⎤ ⎡ ℎ2𝑡−1 ⎤ ⎡0⎤
⎢ 2𝑡 ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥
⎢ ⋮ ⎥=⎢ ⋮ ⋮ ⋮ ⋱ ⋮⎥⎢ ⋮ ⎥ + ⎢ ⋮ ⎥ 𝑐𝑡
⎢ ℎ𝑘,𝑡 ⎥ ⎢ 0 ⋯ ⋯ 0 1⎥ ⎢ ℎ𝑘,𝑡−1 ⎥ ⎢0⎥
⎣ℎ𝑘+1,𝑡 ⎦ ⎣ 0 0 0 ⋯ 0⎦ ⎣ℎ𝑘+1,𝑡−1 ⎦ ⎣1⎦
This specification sets Rosen’s 𝑁𝑡 = ℎ1𝑡−1 , 𝑛𝑡 = 𝑐𝑡 , ℎ𝜏+1,𝑡−1 = 𝑛𝑡−𝜏 , 𝜏 = 1, … , 𝑘, and uses the
home-produced service to capture the demand for labor. Here 𝜆1 embodies Rosen’s demand
parameter 𝛼𝑑
[𝑤𝑢𝑡 𝑤𝑠𝑡 ]′ = 𝛼𝑑 [𝑁𝑢𝑡 𝑁𝑠𝑡 ]′ + 𝜖1𝑡
where 𝛼𝑑 is a (2 × 2) matrix of demand parameters and 𝜖1𝑡 is a vector of demand shifters; second, time-to-train specifications for skilled and unskilled labor, respectively:
where 𝑁𝑠𝑡 , 𝑁𝑢𝑡 are stocks of the two types of labor, and 𝑛𝑠𝑡 , 𝑛𝑢𝑡 are entry rates into the two
occupations
third, definitions of discounted present values of new entrants to the skilled and unskilled oc-
cupations, respectively:
𝑣𝑠𝑡 = 𝛽^𝑘 𝐸𝑡 ∑_{𝑗=0}^{∞} (𝛽𝛿𝑁 )^𝑗 𝑤𝑠,𝑡+𝑘+𝑗

𝑣𝑢𝑡 = 𝐸𝑡 ∑_{𝑗=0}^{∞} (𝛽𝛿𝑁 )^𝑗 𝑤𝑢,𝑡+𝑗
where 𝑤𝑢𝑡 , 𝑤𝑠𝑡 are wage rates for the two occupations; and fourth, supply curves for new en-
trants:
[𝑛𝑠𝑡 𝑛𝑢𝑡 ]′ = 𝛼𝑠 [𝑣𝑢𝑡 𝑣𝑠𝑡 ]′ + 𝜖2𝑡
Short Cut
As an alternative, Siow simply used the equalizing differences condition
𝑣𝑢𝑡 = 𝑣𝑠𝑡
𝜙𝑐 ⋅ 𝑐𝑡 + 𝑖𝑡 = 𝛾𝑘𝑡−1 + 𝑒𝑡
𝑘𝑡 = 𝑘𝑡−1 + 𝑖𝑡
𝜙𝑖 𝑖 𝑡 − 𝑔 𝑡 = 0
Implication One:
Equality of Present Values of Moving Average Coefficients of 𝑐 and 𝑒
𝑘𝑡−1 = 𝛽 ∑_{𝑗=0}^{∞} 𝛽^𝑗 (𝜙𝑐 ⋅ 𝑐𝑡+𝑗 − 𝑒𝑡+𝑗 )

𝑘𝑡−1 = 𝛽 ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝐸(𝜙𝑐 ⋅ 𝑐𝑡+𝑗 − 𝑒𝑡+𝑗 )|𝐽𝑡

∑_{𝑗=0}^{∞} 𝛽^𝑗 (𝜙𝑐 )′ 𝜒𝑗 = ∑_{𝑗=0}^{∞} 𝛽^𝑗 𝜖𝑗
and
These have been tested in work by Hansen, Sargent, and Roberts (1991) [116] and by Attana-
sio and Pavoni (2011) [10]
We now assume that there is a finite number of households, each with its own household tech-
nology and preferences over consumption services
Household 𝑗 orders preferences over consumption processes according to
−(1/2) 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝑠𝑗𝑡 − 𝑏𝑗𝑡 ) ⋅ (𝑠𝑗𝑡 − 𝑏𝑗𝑡 ) + ℓ𝑗𝑡² ] ∣ 𝐽0
𝑏𝑗𝑡 = 𝑈𝑏𝑗 𝑧𝑡
𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑝𝑡0 ⋅ 𝑐𝑗𝑡 ∣ 𝐽0 = 𝐸 ∑_{𝑡=0}^{∞} 𝛽^𝑡 (𝑤𝑡0 ℓ𝑗𝑡 + 𝛼𝑡0 ⋅ 𝑑𝑗𝑡 ) ∣ 𝐽0 + 𝑣0 ⋅ 𝑘𝑗,−1 ,
where 𝑘𝑗,−1 is given. The 𝑗th consumer owns an endowment process 𝑑𝑗𝑡 , governed by the
stochastic process 𝑑𝑗𝑡 = 𝑈𝑑𝑗 𝑧𝑡
We refer to this as a setting with Gorman heterogeneous households
This specification confines heterogeneity among consumers to:
ℓ𝑗𝑡 = (𝜇0𝑗𝑤 /𝜇0𝑎𝑤 ) ℓ𝑎𝑡

𝜇0𝑗𝑤 𝐸0 ∑_{𝑡=0}^{∞} 𝛽^𝑡 {𝜌𝑡0 ⋅ 𝜌𝑡0 + (𝑤𝑡0 /𝜇0𝑎𝑤 )ℓ𝑎𝑡 } = 𝐸0 ∑_{𝑡=0}^{∞} 𝛽^𝑡 {𝜌𝑡0 ⋅ (𝑏𝑗𝑡 − 𝑠𝑗𝑡 ) − 𝛼𝑡0 ⋅ 𝑑𝑗𝑡 } − 𝑣0 𝑘𝑗,−1

𝑠𝑗𝑡 − 𝑏𝑗𝑡 = 𝜇0𝑗𝑤 𝜌𝑡0
We now describe a less tractable type of heterogeneity across households that we dub Non-
Gorman heterogeneity
Here is the specification
Preferences and Household Technologies:
∞
1
− 𝐸 ∑ 𝛽 𝑡 [(𝑠𝑖𝑡 − 𝑏𝑖𝑡 ) ⋅ (𝑠𝑖𝑡 − 𝑏𝑖𝑡 ) + ℓ𝑖𝑡
2
] ∣ 𝐽0
2 𝑡=0
𝑏𝑖𝑡 = 𝑈𝑏𝑖 𝑧𝑡
Production Technology
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
𝑑𝑖𝑡 = 𝑈𝑑𝑖 𝑧𝑡 , 𝑖 = 1, 2
Pareto Problem:
−(𝜆/2) 𝐸0 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝑠1𝑡 − 𝑏1𝑡 ) ⋅ (𝑠1𝑡 − 𝑏1𝑡 ) + ℓ1𝑡² ]
− ((1 − 𝜆)/2) 𝐸0 ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝑠2𝑡 − 𝑏2𝑡 ) ⋅ (𝑠2𝑡 − 𝑏2𝑡 ) + ℓ2𝑡² ]
𝑝𝑡 = 𝜇0⁻¹ Π′ 𝑏𝑡 − 𝜇0⁻¹ Π′ Π𝑐𝑡
Integrating the marginal utility vector shows that preferences can be taken to be
𝜇0⁻¹ Π′ Π = (𝜇01 Π1⁻¹ Π1⁻¹′ + 𝜇02 Π2⁻¹ Π2⁻¹′ )⁻¹
58.10. NON-GORMAN HETEROGENEOUS HOUSEHOLDS 999
Dynamic Analogue:
We now describe how to extend mongrel aggregation to a dynamic setting
The key comparison is
∑_{𝑡=0}^{∞} 𝛽^𝑡 [𝜆(𝑠1𝑡 − 𝑏1𝑡 ) ⋅ (𝑠1𝑡 − 𝑏1𝑡 ) + (1 − 𝜆)(𝑠2𝑡 − 𝑏2𝑡 ) ⋅ (𝑠2𝑡 − 𝑏2𝑡 )]

subject to (ℎ1,−1 , ℎ2,−1 ) given and {𝑏1𝑡 }, {𝑏2𝑡 }, {𝑐𝑡 } being known and fixed sequences
Substituting the {𝑐1𝑡 , 𝑐2𝑡 } sequences that solve this problem as functions of {𝑏1𝑡 , 𝑏2𝑡 , 𝑐𝑡 } into
the objective determines a mongrel preference ordering over {𝑐𝑡 } = {𝑐1𝑡 + 𝑐2𝑡 }
In solving this problem, it is convenient to proceed by using Fourier transforms. For details,
please see [59] where they deploy a
Secret Weapon: Another application of the spectral factorization identity
Concluding remark: The [59] class of models described in this lecture are all complete
markets models. We have exploited the fact that complete market models are all alike to
allow us to define a class that gives the same name to different things in the spirit of
Henri Poincare
Could we create such a class for incomplete markets models?
That would be nice, but before trying it would be wise to contemplate the remainder of a
statement by Robert E. Lucas, Jr., with which we began this lecture
“Complete market economies are all alike but each incomplete market economy is
incomplete in its own individual way.” Robert E. Lucas, Jr., (1989)
59 Growth in Dynamic Linear Economies

59.1 Contents
This lecture describes several complete market economies having a common linear-quadratic-
Gaussian structure
Three examples of such economies show how the DLE class can be used to compute equilibria
of such economies in Python and to illustrate how different versions of these economies can or
cannot generate sustained growth
We require the following imports
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
𝑑𝑡 = 𝑈 𝑑 𝑧𝑡
• Consumption and physical investment goods are produced using the following technol-
ogy
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
𝑔𝑡 ⋅ 𝑔𝑡 = 𝑙2𝑡
−(1/2) E ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝑠𝑡 − 𝑏𝑡 ) ⋅ (𝑠𝑡 − 𝑏𝑡 ) + 𝑙𝑡² ],  0 < 𝛽 < 1
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
{𝐴22 , 𝐶2 , 𝑈𝑏 , 𝑈𝑑 , Φ𝑐 , Φ𝑔 , Φ𝑖 , Γ, Δ𝑘 , Θ𝑘 , Λ, Π, Δℎ , Θℎ }
The first welfare theorem asserts that a competitive equilibrium allocation solves the follow-
ing planning problem
Choose {𝑐𝑡 , 𝑠𝑡 , 𝑖𝑡 , ℎ𝑡 , 𝑘𝑡 , 𝑔𝑡 }_{𝑡=0}^{∞} to maximize

−(1/2) E ∑_{𝑡=0}^{∞} 𝛽^𝑡 [(𝑠𝑡 − 𝑏𝑡 ) ⋅ (𝑠𝑡 − 𝑏𝑡 ) + 𝑔𝑡 ⋅ 𝑔𝑡 ]
59.3. A PLANNING PROBLEM 1003
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
and
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
𝑑𝑡 = 𝑈 𝑑 𝑧𝑡
The DLE class in Python maps this planning problem into a linear-quadratic dynamic pro-
gramming problem and then solves it by using QuantEcon’s LQ class
(See Section 5.5 of Hansen & Sargent (2013) [59] for a full description of how to map these
economies into an LQ setting, and how to use the solution to the LQ problem to construct
the output matrices in order to simulate the economies)
The state for the LQ problem is
$$ x_t = \begin{bmatrix} h_{t-1} \\ k_{t-1} \\ z_t \end{bmatrix} $$
𝑥𝑡+1 = 𝐴𝑜 𝑥𝑡 + 𝐶𝑤𝑡+1
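In code, the law of motion $x_{t+1} = A^o x_t + C w_{t+1}$ can be simulated directly with NumPy. The sketch below uses small placeholder matrices, not the 𝐴𝑜 and 𝐶 actually produced by the DLE class:

```python
import numpy as np

def simulate_lss(A, C, x0, T, seed=0):
    """Simulate x_{t+1} = A x_t + C w_{t+1} with iid standard normal shocks."""
    rng = np.random.default_rng(seed)
    n, k = C.shape
    x = np.empty((T + 1, n))
    x[0] = x0
    for t in range(T):
        x[t + 1] = A @ x[t] + C @ rng.standard_normal(k)
    return x

# Illustrative 2-state, 1-shock system (placeholder values, not DLE output)
A = np.array([[0.95, 0.10],
              [0.00, 0.80]])
C = np.array([[0.0],
              [1.0]])
path = simulate_lss(A, C, x0=np.array([1.0, 0.0]), T=100)
```

The DLE class wraps exactly this kind of iteration when `compute_sequence` is called.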
Each of the example economies shown here will share a number of components. In particular,
for each we will consider preferences of the form
$$ -\frac{1}{2}\, \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left[ (s_t - b_t)^2 + l_t^2 \right], \qquad 0 < \beta < 1 $$
𝑠𝑡 = 𝜆ℎ𝑡−1 + 𝜋𝑐𝑡
ℎ𝑡 = 𝛿ℎ ℎ𝑡−1 + 𝜃ℎ 𝑐𝑡
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
𝑐𝑡 + 𝑖𝑡 = 𝛾1 𝑘𝑡−1 + 𝑑1𝑡
𝑘𝑡 = 𝛿𝑘 𝑘𝑡−1 + 𝑖𝑡
𝑔𝑡 = 𝜙1 𝑖𝑡 , 𝜙1 > 0
$$ \begin{bmatrix} d_{1t} \\ 0 \end{bmatrix} = U_d z_t $$
$$ z_{t+1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.8 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} z_t + \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} w_{t+1} $$
$$ U_b = \begin{bmatrix} 30 & 0 & 0 \end{bmatrix}, \qquad U_d = \begin{bmatrix} 5 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} $$
We shall vary {𝜆, 𝜋, 𝛿ℎ , 𝜃ℎ , 𝛾1 , 𝛿𝑘 , 𝜙1 } and the initial state 𝑥0 across the three economies
59.4 Example Economies
First, we set parameters such that consumption follows a random walk. In particular, we set
$$ \lambda = 0, \; \pi = 1, \; \gamma_1 = 0.1, \; \phi_1 = 0.00001, \; \delta_k = 0.95, \; \beta = \frac{1}{1.05} $$
(In this economy 𝛿ℎ and 𝜃ℎ are arbitrary as household capital does not enter the equation for consumption services. We set them to values that will become useful in Example 3)
It is worth noting that this choice of parameter values ensures that 𝛽(𝛾1 + 𝛿𝑘 ) = 1
For simulations of this economy, we choose an initial condition of
$$ x_0 = \begin{bmatrix} 5 & 150 & 1 & 0 & 0 \end{bmatrix}' $$
# Initial condition
x0 = np.array([[5], [150], [1], [0], [0]])
These parameter values are used to define an economy of the DLE class
We can then simulate the economy for a chosen length of time, from our initial state vector
𝑥0
The economy stores the simulated values for each variable. Below we plot consumption and
investment
In [6]: # This is the right panel of Fig 5.7.1 from p.105 of HS2013
plt.plot(Econ1.c[0], label='Cons.')
plt.plot(Econ1.i[0], label='Inv.')
plt.legend()
plt.show()
Inspection of the plot shows that the sample paths of consumption and investment drift in
ways that suggest that each has or nearly has a random walk or unit root component
This is confirmed by checking the eigenvalues of 𝐴𝑜
The endogenous eigenvalue that appears to be unity reflects the random walk character of
consumption in Hall’s model
In [8]: Econ1.endo[1]
Out[8]: 0.9999999999904767
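The eigenvalue check can be reproduced with plain NumPy: compute the eigenvalues of 𝐴𝑜 and inspect their moduli. The sketch below uses a placeholder matrix with one unit root, not the economy's actual 𝐴𝑜:

```python
import numpy as np

# Placeholder A^o with one unit eigenvalue (not the DLE economy's matrix)
Ao = np.array([[1.0, 0.5],
               [0.0, 0.7]])
moduli = np.sort(np.abs(np.linalg.eigvals(Ao)))
has_unit_root = bool(np.isclose(moduli[-1], 1.0))
```

An eigenvalue of modulus one signals a random-walk (unit-root) component in the corresponding direction of the state.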
The fact that the largest endogenous eigenvalue is strictly less than unity in modulus means
that it is possible to compute the non-stochastic steady state of consumption, investment and
capital
In [9]: Econ1.compute_steadystate()
np.set_printoptions(precision=3, suppress=True)
print(Econ1.css, Econ1.iss, Econ1.kss)
However, the near-unity endogenous eigenvalue means that these steady state values are of
little relevance
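A non-stochastic steady state can also be computed by hand from the unit eigenvalue that the constant contributes to the transition matrix: the steady state is the eigenvector of 𝐴𝑜 associated with eigenvalue 1, scaled so the constant equals 1. A sketch with a placeholder two-dimensional system (constant, capital):

```python
import numpy as np

# Placeholder law of motion: state = (constant, k), with k' = 2*constant + 0.5*k,
# so the steady state solves k = 2 + 0.5 k, i.e. k = 4
A = np.array([[1.0, 0.0],
              [2.0, 0.5]])
vals, vecs = np.linalg.eig(A)
i = int(np.argmin(np.abs(vals - 1.0)))  # eigenvalue closest to 1
xss = vecs[:, i].real
xss = xss / xss[0]                      # normalize the constant to 1
```

The `compute_steadystate` method performs the analogous calculation for the full DLE state.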
We generate our next economy by making two alterations to the parameters of Example 1:

• First, we raise the adjustment-cost parameter 𝜙1 from 0.00001 to 1
  – This will lower the endogenous eigenvalue that is close to 1, causing the economy to head more quickly to the vicinity of its non-stochastic steady-state
• Second, we raise 𝛾1 from 0.1 to 0.15
  – This has the effect of raising the optimal steady-state value of capital

We also start the economy off from an initial condition with a lower capital stock
$$ x_0 = \begin{bmatrix} 5 & 20 & 1 & 0 & 0 \end{bmatrix}' $$
In [10]: γ2 = 0.15
γ22 = np.array([[γ2], [0]])
φ_12 = 1
φ_i2 = np.array([[1], [-φ_12]])
Creating the DLE class and then simulating gives the following plot for consumption and investment
Econ2.compute_sequence(x02, ts_length=300)
plt.plot(Econ2.c[0], label='Cons.')
plt.plot(Econ2.i[0], label='Inv.')
plt.legend()
plt.show()
Simulating our new economy shows that consumption grows quickly in the early stages of the
sample
However, it then settles down around the new non-stochastic steady-state level of consumption of 17.5, which we find as follows
In [12]: Econ2.compute_steadystate()
print(Econ2.css, Econ2.iss, Econ2.kss)
The economy converges faster to this level than in Example 1 because the largest endogenous
eigenvalue of 𝐴𝑜 is now significantly lower than 1
For our third economy, we choose parameter values with the aim of generating sustained
growth in consumption, investment and capital
To do this, we set parameters so that Jones and Manuelli’s “growth condition” is just satisfied
In our notation, just satisfying the growth condition is actually equivalent to setting 𝛽(𝛾1 + 𝛿𝑘 ) = 1, the condition that was necessary for consumption to be a random walk in Hall’s model
Thus, we lower 𝛾1 back to 0.1
In our model, this is a necessary but not sufficient condition for growth
$$ -\frac{1}{2}\, \mathbb{E} \sum_{t=0}^{\infty} \beta^t \Big[ \big(c_t - b_t - (1-\delta_h)\sum_{j=0}^{\infty} \delta_h^j c_{t-j-1}\big)^2 + l_t^2 \Big] $$
• the effective “bliss point” $b_t + (1-\delta_h)\sum_{j=0}^{\infty} \delta_h^j c_{t-j-1}$ now shifts in response to a moving average of past consumption
Since 𝛿ℎ and 𝜃ℎ were defined earlier, the only change we need to make from the parameters of
Example 1 is to define the new value of 𝜆
Thus, adding habit persistence to the Hall model of Example 1 is enough to generate sustained growth in our economy
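The moving-average form of the habit stock can be checked numerically: with $\theta_h = 1 - \delta_h$, iterating $h_t = \delta_h h_{t-1} + \theta_h c_t$ from $h_{-1} = 0$ reproduces $(1-\delta_h)\sum_j \delta_h^j c_{t-j-1}$ exactly. A sketch with an arbitrary consumption path:

```python
import numpy as np

δ_h = 0.9
rng = np.random.default_rng(0)
c = rng.uniform(size=200)              # arbitrary consumption history

# Recursive habit stock: h_t = δ_h h_{t-1} + (1 - δ_h) c_t, with h_{-1} = 0
h = 0.0
for c_t in c:
    h = δ_h * h + (1 - δ_h) * c_t

# Direct geometric moving average of past consumption
direct = (1 - δ_h) * sum(δ_h**j * c[-1 - j] for j in range(len(c)))
```

This equivalence is why a single extra state variable suffices to carry the entire consumption history.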
The eigenvalues of 𝐴𝑜 in this new economy are
We now have two unit endogenous eigenvalues. One stems from satisfying the growth condition (as in Example 1)
The other unit eigenvalue results from setting 𝜆 = −1
To show the importance of both of these for generating growth, we consider the following experiments
Econ4.compute_sequence(x0, ts_length=300)
plt.plot(Econ4.c[0], label='Cons.')
plt.plot(Econ4.i[0], label='Inv.')
plt.legend()
plt.show()
Econ5.compute_sequence(x0, ts_length=300)
plt.plot(Econ5.c[0], label='Cons.')
plt.plot(Econ5.i[0], label='Inv.')
plt.legend()
plt.show()
60 Lucas Asset Pricing Using DLE

60.1 Contents
This lecture uses the DLE class to price payout streams that are linear functions of the economy’s state vector, as well as risk-free assets that pay out one unit of the first consumption good with certainty
We assume basic knowledge of the class of economic environments that fall within the domain
of the DLE class
Many details about the basic environment are contained in the lecture Growth in Dynamic
Linear Economies
We’ll also need the following imports
We use a linear-quadratic version of an economy that Lucas (1978) [88] used to develop an
equilibrium theory of asset prices:
Preferences
$$ -\frac{1}{2}\, \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left[ (c_t - b_t)^2 + l_t^2 \right] \Big| J_0 $$
𝑠𝑡 = 𝑐𝑡
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
Technology
𝑐𝑡 = 𝑑1𝑡
𝑘𝑡 = 𝛿𝑘 𝑘𝑡−1 + 𝑖𝑡
𝑔𝑡 = 𝜙1 𝑖𝑡 , 𝜙1 > 0
$$ \begin{bmatrix} d_{1t} \\ 0 \end{bmatrix} = U_d z_t $$
Information
$$ z_{t+1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.8 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} z_t + \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} w_{t+1} $$
$$ U_b = \begin{bmatrix} 30 & 0 & 0 \end{bmatrix}, \qquad U_d = \begin{bmatrix} 5 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} $$
$$ x_0 = \begin{bmatrix} 5 & 150 & 1 & 0 & 0 \end{bmatrix}' $$
Hansen & Sargent (2013) [59] show that the time 𝑡 value of a permanent claim to a stream 𝑦𝑠 = 𝑈𝑎 𝑥𝑠 , 𝑠 ≥ 𝑡 is:
$$ a_t = \frac{x_t' \mu_a x_t + \sigma_a}{\bar e_1 M_c x_t} $$

with

$$ \mu_a = \sum_{\tau=0}^{\infty} \beta^\tau (A^{o\prime})^\tau Z_a A^{o\tau} $$

$$ \sigma_a = \frac{\beta}{1-\beta}\, \operatorname{trace}\Big( Z_a \sum_{\tau=0}^{\infty} \beta^\tau (A^o)^\tau C C' (A^{o\prime})^\tau \Big) $$
where
$$ Z_a = U_a' M_c $$
The use of 𝑒1̄ indicates that the first consumption good is the numeraire
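The infinite sum defining $\mu_a$ satisfies $\mu_a = Z_a + \beta A^{o\prime} \mu_a A^o$, so it can be computed as a discounted discrete Lyapunov equation rather than by truncation. A sketch with placeholder $A^o$ and $Z_a$ matrices (not the Lucas-economy matrices), cross-checked against a truncated sum:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

β = 0.95
# Placeholder stable A^o and symmetric Z_a
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])
Z = np.array([[1.0, 0.2],
              [0.2, 2.0]])

# solve_discrete_lyapunov(a, q) returns X satisfying X = a X a' + q;
# taking a = sqrt(β) A' gives μ = β A' μ A + Z
μ = solve_discrete_lyapunov(np.sqrt(β) * A.T, Z)

# Truncated version of μ_a = Σ_τ β^τ (A^o')^τ Z_a (A^o)^τ
μ_trunc = sum(β**τ * np.linalg.matrix_power(A.T, τ) @ Z @ np.linalg.matrix_power(A, τ)
              for τ in range(300))
```

The Lyapunov route is exact (up to solver precision) and avoids choosing a truncation horizon.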
In [3]: gam = 0
γ = np.array([[gam], [0]])
φ_c = np.array([[1], [0]])
φ_g = np.array([[0], [1]])
φ_1 = 1e-4
φ_i = np.array([[0], [-φ_1]])
δ_k = np.array([[.95]])
θ_k = np.array([[1]])
β = np.array([[1 / 1.05]])
ud = np.array([[5, 1, 0],
[0, 0, 0]])
a22 = np.array([[1, 0, 0],
[0, 0.8, 0],
[0, 0, 0.5]])
c2 = np.array([[0, 1, 0],
[0, 0, 1]]).T
l_λ = np.array([[0]])
π_h = np.array([[1]])
δ_h = np.array([[.9]])
θ_h = np.array([[1]]) - δ_h
ub = np.array([[30, 0, 0]])
x0 = np.array([[5, 150, 1, 0, 0]]).T
The graph below plots the price of this claim over time:
The next plot displays the realized gross rate of return on this “Lucas tree” as well as on a
risk-free one-period bond:
Above we have also calculated the correlation coefficient between these two returns
To give an idea of how the term structure of interest rates moves in this economy, the next
plot displays the net rates of return on one-period and five-period risk-free bonds:
From the above plot, we can see the tendency of the term structure to slope up when rates
are low and to slope down when rates are high
Comparing it to the previous plot of the price of the “Lucas tree”, we can also see that net
rates of return are low when the price of the tree is high, and vice versa
We now plot the realized gross rate of return on a “Lucas tree” as well as on a risk-free one-period bond when the autoregressive parameter for the endowment process is reduced to 0.4:
The correlation between these two gross rates is now more negative
Next, we again plot the net rates of return on one-period and five-period risk-free bonds:
We can see the tendency of the term structure to slope up when rates are low (and down
when rates are high) has been accentuated relative to the first instance of our economy
61 IRFs in Hall Models
61.1 Contents
This lecture shows how the DLE class can be used to create impulse response functions for
three related economies, starting from Hall (1978) [48]
Knowledge of the basic economic environment is assumed
See the lecture “Growth in Dynamic Linear Economies” for more details
$$ \lambda = 0, \; \pi = 1, \; \gamma_1 = 0.1, \; \phi_1 = 0.00001, \; \delta_k = 0.95, \; \beta = \frac{1}{1.05} $$
(In this example 𝛿ℎ and 𝜃ℎ are arbitrary as household capital does not enter the equation for
consumption services
We set them to values that will become useful in Example 3)
It is worth noting that this choice of parameter values ensures that 𝛽(𝛾1 + 𝛿𝑘 ) = 1
For simulations of this economy, we choose an initial condition of:
$$ x_0 = \begin{bmatrix} 5 & 150 & 1 & 0 & 0 \end{bmatrix}' $$
These parameter values are used to define an economy of the DLE class
We can then simulate the economy for a chosen length of time, from our initial state vector
𝑥0
The economy stores the simulated values for each variable. Below we plot consumption and
investment:
The DLE class can be used to create impulse response functions for each of the endogenous
variables: {𝑐𝑡 , 𝑠𝑡 , ℎ𝑡 , 𝑖𝑡 , 𝑘𝑡 , 𝑔𝑡 }
If no selector vector for the shock is specified, the default choice is to give IRFs to the first
shock in 𝑤𝑡+1
Below we plot the impulse response functions of investment and consumption to an endowment innovation (the first shock) in the Hall model:
It can be seen that the endowment shock has permanent effects on the level of both consumption and investment, consistent with the endogenous unit eigenvalue in this economy
Investment is much more responsive to the endowment shock at shorter time horizons
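Impulse responses for any linear state-space system can be computed directly: with $x_{t+1} = A x_t + C w_{t+1}$ and an observable $y_t = S x_t$, the response of $y$ at horizon $j \geq 1$ to a one-time shock $w_1 = e$ is $S A^{j-1} C e$. A sketch with a placeholder near-unit-root scalar system (not the Hall-model matrices):

```python
import numpy as np

def irf(A, C, S, shock, T):
    """Responses S A^{j-1} C shock for j = 1..T, given x_{t+1} = A x_t + C w_{t+1}."""
    x = C @ shock         # state right after the impulse
    out = []
    for _ in range(T):
        out.append(S @ x)
        x = A @ x
    return np.array(out)

# Placeholder scalar state with a near-unit root
A = np.array([[0.99]])
C = np.array([[1.0]])
S = np.array([[1.0]])
resp = irf(A, C, S, shock=np.array([1.0]), T=40)
```

The `irf` method of the DLE class computes these sequences for each endogenous variable and stacks them into the `c_irf`, `i_irf`, etc. attributes.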
We generate our next economy by making only one change to the parameters of Example 1: we raise the parameter associated with the cost of adjusting capital, 𝜙1 , from 0.00001 to 0.2
This will lower the endogenous eigenvalue that is unity in Example 1 to a value slightly below
1
In [7]: Econ2.irf(ts_length=40,shock=None)
# This is the left panel of Fig 5.8.1 from p.106 of HS2013
plt.plot(Econ2.c_irf,label='Cons.')
plt.plot(Econ2.i_irf,label='Inv.')
plt.legend()
plt.show()
In [8]: Econ2.endo
In [9]: Econ2.compute_steadystate()
print(Econ2.css, Econ2.iss, Econ2.kss)
The first graph shows that there seems to be a downward trend in both consumption and investment
This is a consequence of the decrease in the largest endogenous eigenvalue from unity in the earlier economy, caused by the higher adjustment cost
The present economy has a nonstochastic steady state value of 5 for consumption and 0 for
both capital and investment
Because the largest endogenous eigenvalue is still close to 1, the economy heads only slowly
towards these mean values
The impulse response functions now show that an endowment shock does not have a permanent effect on the levels of either consumption or investment
We generate our third economy by raising 𝜙1 further, to 1.0. We also raise the production
function parameter from 0.1 to 0.15 (which raises the non-stochastic steady state value of
capital above zero)
We also change the specification of preferences to make the consumption good durable
Specifically, we allow for a single durable household good obeying:
Services are related to the stock of durables at the beginning of the period:
𝑠𝑡 = 𝜆ℎ𝑡−1 , 𝜆 > 0
$$ -\frac{1}{2}\, \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left[ (\lambda h_{t-1} - b_t)^2 + l_t^2 \right] \Big| J_0 $$
To implement this, we set 𝜆 = 0.1 and 𝜋 = 0 (we have already set 𝜃ℎ = 1 and 𝛿ℎ = 0.9)
We start from an initial condition that makes consumption begin near its non-stochastic steady state
In [10]: φ_13 = 1
φ_i3 = np.array([[1], [-φ_13]])
γ_12 = 0.15
γ_2 = np.array([[γ_12], [0]])
l_λ2 = np.array([[0.1]])
π_h2 = np.array([[0]])
In contrast to Hall’s original model of Example 1, it is now investment that is much smoother than consumption
This illustrates how making consumption goods durable tends to undo the strong consumption smoothing result that Hall obtained
The impulse response functions confirm that consumption is now much more responsive to an
endowment shock (and investment less so) than in Example 1
As in Example 2, the endowment shock has permanent effects on neither variable
62 Permanent Income Model Using the DLE Class
62.1 Contents
This lecture adds a third solution method for the linear-quadratic-Gaussian permanent income model with 𝛽𝑅 = 1, complementing the other two solution methods described in Optimal Savings I: The Permanent Income Model and Optimal Savings II: LQ Techniques
and this Jupyter notebook http://nbviewer.jupyter.org/github/QuantEcon/
QuantEcon.notebooks/blob/master/permanent_income.ipynb
The additional solution method uses the DLE class
In this way, we map the permanent income model into the framework of Hansen & Sargent
(2013) “Recursive Models of Dynamic Linear Economies” [59]
We’ll also require the following imports
%matplotlib inline
np.set_printoptions(suppress=True, precision=4)
$$ E_0 \sum_{t=0}^{\infty} \beta^t u(c_t) \tag{1} $$
where 𝑤𝑡+1 is an IID process with mean zero and identity contemporaneous covariance matrix, 𝐴22 is a stable matrix, its eigenvalues being strictly below unity in modulus, and 𝑈𝑦 is a selection vector that identifies 𝑦 with a particular linear combination of the 𝑧𝑡
We impose the following condition on the consumption and borrowing plan:
$$ E_0 \sum_{t=0}^{\infty} \beta^t b_t^2 < +\infty \tag{4} $$
$$ x_t = \begin{bmatrix} z_t \\ b_t \end{bmatrix} $$
where 𝑏𝑡 is its one-period debt falling due at the beginning of period 𝑡 and 𝑧𝑡 contains all
variables useful for forecasting its future endowment
We assume that {𝑦𝑡 } follows a second order univariate autoregressive process:
One way of solving this model is to map the problem into the framework outlined in Section
4.8 of [59] by setting up our technology, information and preference matrices as follows:
Technology: $\phi_c = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\phi_g = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $\phi_i = \begin{bmatrix} -1 \\ -0.00001 \end{bmatrix}$, $\Gamma = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$, $\Delta_k = 0$, $\Theta_k = R$
Information: $A_{22} = \begin{bmatrix} 1 & 0 & 0 \\ \alpha & \rho_1 & \rho_2 \\ 0 & 1 & 0 \end{bmatrix}$, $C_2 = \begin{bmatrix} 0 \\ \sigma \\ 0 \end{bmatrix}$, $U_b = \begin{bmatrix} \gamma & 0 & 0 \end{bmatrix}$, $U_d = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$
Preferences: Λ = 0, Π = 1, Δℎ = 0, Θℎ = 0
We set parameters
𝛼 = 10, 𝛽 = 0.95, 𝜌1 = 0.9, 𝜌2 = 0, 𝜎 = 1
(The value of 𝛾 does not affect the optimal decision rule)
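The information matrices put the AR(2) endowment process into companion form: with $z_t = (1, y_t, y_{t-1})'$, iterating $z_{t+1} = A_{22} z_t + C_2 w_{t+1}$ reproduces $y_{t+1} = \alpha + \rho_1 y_t + \rho_2 y_{t-1} + \sigma w_{t+1}$. A sketch checking the two recursions against each other:

```python
import numpy as np

α, ρ1, ρ2, σ = 10.0, 0.9, 0.0, 1.0
A22 = np.array([[1, 0,  0],
                [α, ρ1, ρ2],
                [0, 1,  0]])
C2 = np.array([0.0, σ, 0.0])

rng = np.random.default_rng(1)
shocks = rng.standard_normal(50)

# Companion-form recursion: z_t = (1, y_t, y_{t-1})
z = np.array([1.0, 0.0, 0.0])
y_companion = []
for w in shocks:
    z = A22 @ z + C2 * w
    y_companion.append(z[1])

# Direct AR(2) recursion on the same shock sequence
y, y_lag, y_direct = 0.0, 0.0, []
for w in shocks:
    y, y_lag = α + ρ1 * y + ρ2 * y_lag + σ * w, y
    y_direct.append(y)
```

The two paths coincide, confirming that the companion form carries the full AR(2) dynamics.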
The chosen matrices mean that the household’s technology is:
$$ c_t + k_{t-1} = i_t + y_t $$

$$ \frac{k_t}{R} = i_t $$

$$ l_t^2 = (0.00001)^2\, i_t^2 $$
Combining the first two of these gives the budget constraint of the permanent income model,
where 𝑘𝑡 = 𝑏𝑡+1
The third equation is a very small penalty on debt-accumulation to rule out Ponzi schemes
We set up this instance of the DLE class below:
γ = np.array([[-1], [0]])
φ_c = np.array([[1], [0]])
φ_g = np.array([[0], [1]])
φ_1 = 1e-5
φ_i = np.array([[-1], [-φ_1]])
δ_k = np.array([[0]])
θ_k = np.array([[1 / β]])
β = np.array([[β]])
l_λ = np.array([[0]])
π_h = np.array([[1]])
δ_h = np.array([[0]])
θ_h = np.array([[0]])
To check the solution of this model with that from the LQ problem, we select the 𝑆𝑐 matrix
from the DLE class
The solution to the DLE economy has:
𝑐𝑡 = 𝑆𝑐 𝑥𝑡
In [4]: Econ1.Sc
where

$$ x_t = \begin{bmatrix} h_{t-1} \\ k_{t-1} \\ z_t \end{bmatrix} $$
plt.figure(figsize=(12, 4))
plt.subplot(121)
for i in range(25):
Econ1.compute_sequence(x0, ts_length=150)
plt.plot(Econ1.c[0], c='g')
plt.plot(Econ1.d[0], c='b')
plt.plot(Econ1.c[0], label='Consumption', c='g')
plt.plot(Econ1.d[0], label='Income', c='b')
plt.legend()
plt.subplot(122)
for i in range(25):
Econ1.compute_sequence(x0, ts_length=150)
plt.plot(Econ1.k[0], color='r')
plt.plot(Econ1.k[0], label='Debt', c='r')
plt.legend()
plt.show()
63 Rosen Schooling Model
63.1 Contents
• a stock of “Engineers” 𝑁𝑡
• a number of new entrants in engineering school, 𝑛𝑡
• the wage rate of engineers, 𝑤𝑡
𝑤𝑡 = −𝛼𝑑 𝑁𝑡 + 𝜖𝑑𝑡
𝑁𝑡+𝑘 = 𝛿𝑁 𝑁𝑡+𝑘−1 + 𝑛𝑡
$$ v_t = \beta^k\, \mathbb{E} \sum_{j=0}^{\infty} (\beta \delta_N)^j w_{t+k+j} $$
𝑛𝑡 = 𝛼𝑠 𝑣𝑡 + 𝜖𝑠𝑡
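The present value $v_t$ is a geometric sum, so under a constant expected wage $\bar w$ it collapses to $\beta^k \bar w / (1 - \beta \delta_N)$. A sketch checking the closed form against a truncated sum (the wage level here is hypothetical):

```python
# Parameters in the spirit of the lecture; w_bar is a hypothetical constant wage
β, δ_N, k = 1 / 1.05, 0.95, 4
w_bar = 1.0

v_closed = β**k * w_bar / (1 - β * δ_N)
v_trunc = β**k * sum((β * δ_N)**j * w_bar for j in range(3000))
```

The discount factor on future wages, $\beta \delta_N$, combines time preference with attrition of the engineering stock.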
• sweeping the time-to-build structure and the demand for engineers into the household
technology, and
• putting the supply of engineers into the technology for producing goods
63.3.1 Preferences
$$ \Pi = 0, \quad \Lambda = \begin{bmatrix} \alpha_d & 0 & \cdots & 0 \end{bmatrix}, \quad \Delta_h = \begin{bmatrix} \delta_N & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}, \quad \Theta_h = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} $$
63.3.2 Technology
To capture Ryoo and Rosen’s [114] supply curve, we use the physical technology:
𝑐𝑡 = 𝑖𝑡 + 𝑑1𝑡
𝜓1 𝑖 𝑡 = 𝑔 𝑡
63.3.3 Information
$$ A_{22} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \rho_s & 0 \\ 0 & 0 & \rho_d \end{bmatrix}, \quad C_2 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad U_b = \begin{bmatrix} 30 & 0 & 1 \end{bmatrix}, \quad U_d = \begin{bmatrix} 10 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} $$
where 𝜌𝑠 and 𝜌𝑑 describe the persistence of the supply and demand shocks
β = np.array([[1 / 1.05]])
α_d = np.array([[0.1]])
α_s = 1
ε_1 = 1e-7
λ_1 = np.ones((1, k)) * ε_1
l_λ = np.hstack((α_d, λ_1)) # Use of ε_1 is a trick to acquire detectability; see HS2013 p. 228 footnote
π_h = np.array([[0]])
δ_n = np.array([[0.95]])
d1 = np.vstack((δ_n, np.zeros((k - 1, 1))))
d2 = np.hstack((d1, np.eye(k)))
δ_h = np.vstack((d2, np.zeros((1, k + 1))))
ψ_1 = 1 / α_s
δ_k = np.array([[0]])
θ_k = np.array([[0]])
ρ_s = 0.8
ρ_d = 0.8
1. Raising 𝛼𝑑 to 2
2. Raising k to 7
3. Raising k to 10
α_d = np.array([[0.1]])
k = 7
λ_1 = np.ones((1, k)) * ε_1
l_λ = np.hstack((α_d, λ_1))
d1 = np.vstack((δ_n, np.zeros((k - 1, 1))))
d2 = np.hstack((d1, np.eye(k)))
δ_h = np.vstack((d2, np.zeros((1, k+1))))
θ_h = np.vstack((np.zeros((k, 1)),
np.ones((1, 1))))
k = 10
λ_1 = np.ones((1, k)) * ε_1
l_λ = np.hstack((α_d, λ_1))
d1 = np.vstack((δ_n, np.zeros((k - 1, 1))))
d2 = np.hstack((d1, np.eye(k)))
δ_h = np.vstack((d2, np.zeros((1, k + 1))))
θ_h = np.vstack((np.zeros((k, 1)),
np.ones((1, 1))))
Econ1.irf(ts_length=25, shock=shock_demand)
Econ2.irf(ts_length=25, shock=shock_demand)
Econ3.irf(ts_length=25, shock=shock_demand)
Econ4.irf(ts_length=25, shock=shock_demand)
The first figure plots the impulse response of 𝑛𝑡 (on the left) and 𝑁𝑡 (on the right) to a positive demand shock, for 𝛼𝑑 = 0.1 and 𝛼𝑑 = 2
When 𝛼𝑑 = 2, the number of new students 𝑛𝑡 rises initially, but the response then turns negative
A positive demand shock raises wages, drawing new students into the profession
However, these new students raise 𝑁𝑡
The higher is 𝛼𝑑 , the larger the effect of this rise in 𝑁𝑡 on wages
This counteracts the demand shock’s positive effect on wages, reducing the number of new
students in subsequent periods
Consequently, when 𝛼𝑑 is lower, the effect of a demand shock on 𝑁𝑡 is larger
plt.subplot(122)
plt.plot(Econ1.h_irf[:, 0], label='$\\alpha_d = 0.1$')
plt.plot(Econ2.h_irf[:, 0], label='$\\alpha_d = 2$')
plt.legend()
plt.title('Response of $N_t$ to a demand shock')
plt.show()
The next figure plots the impulse response of 𝑛𝑡 (on the left) and 𝑁𝑡 (on the right) to a positive demand shock, for 𝑘 = 4, 𝑘 = 7 and 𝑘 = 10 (with 𝛼𝑑 = 0.1)
plt.subplot(122)
plt.plot(Econ1.h_irf[:,0], label='$k=4$')
plt.plot(Econ3.h_irf[:,0], label='$k=7$')
plt.plot(Econ4.h_irf[:,0], label='$k=10$')
plt.legend()
plt.title('Response of $N_t$ to a demand shock')
plt.show()
Both panels in the above figure show that raising k lowers the effect of a positive demand
shock on entry into the engineering profession
Increasing the number of periods of schooling lowers the number of new students in response
to a demand shock
This occurs because with longer required schooling, new students ultimately benefit less from
the impact of that shock on wages
64 Cattle Cycles
64.1 Contents
This lecture uses the DLE class to construct instances of the “Cattle Cycles” model of Rosen,
Murphy and Scheinkman (1994) [110]
That paper constructs a rational expectations equilibrium model to understand sources of
recurrent cycles in US cattle stocks and prices
We make the following imports
The model features a static linear demand curve and a “time-to-grow” structure for cattle
Let 𝑝𝑡 be the price of slaughtered beef, 𝑚𝑡 the cost of preparing an animal for slaughter, ℎ𝑡 the holding cost for a mature animal, 𝛾1 ℎ𝑡 the holding cost for a yearling, and 𝛾0 ℎ𝑡 the holding cost for a calf
𝑥𝑡 = (1 − 𝛿)𝑥𝑡−1 + 𝑔𝑥𝑡−3 − 𝑐𝑡
where 𝑔 < 1 is the number of calves that each member of the breeding stock has each year,
and 𝑐𝑡 is the number of cattle slaughtered
The total headcount of cattle is
𝑦𝑡 = 𝑥𝑡 + 𝑔𝑥𝑡−1 + 𝑔𝑥𝑡−2
This equation states that the total number of cattle equals the sum of adults, calves and
yearlings, respectively
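The stock dynamics can be iterated directly. The sketch below simulates the law of motion for $x_t$ under a hypothetical constant slaughter path and initial stocks (placeholder values), and builds the total headcount $y_t$:

```python
import numpy as np

δ, g = 0.1, 0.85
T = 30
c = np.full(T, 10.0)          # hypothetical constant slaughter path
x = np.empty(T + 3)
x[:3] = 100.0                 # initial stocks x_{-3}, x_{-2}, x_{-1}
for t in range(T):
    # x_t = (1 - δ) x_{t-1} + g x_{t-3} - c_t
    x[t + 3] = (1 - δ) * x[t + 2] + g * x[t] - c[t]

# Total headcount: adults plus the two immature cohorts
# y_t = x_t + g x_{t-1} + g x_{t-2}
y = x[3:] + g * x[2:-1] + g * x[1:-2]
```

The three-period "time-to-grow" lag in the law of motion is what later produces the hump-shaped responses of the total stock.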
A representative farmer chooses {𝑐𝑡 , 𝑥𝑡 } to maximize:
$$ \mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t \Big\{ p_t c_t - h_t x_t - \gamma_0 h_t (g x_{t-1}) - \gamma_1 h_t (g x_{t-2}) - m_t c_t - \frac{\psi_1}{2} x_t^2 - \frac{\psi_2}{2} x_{t-1}^2 - \frac{\psi_3}{2} x_{t-3}^2 - \frac{\psi_4}{2} c_t^2 \Big\} $$
subject to the law of motion for 𝑥𝑡 , taking as given the stochastic laws of motion for the exogenous processes, the equilibrium price process, and the initial state [𝑥−1 , 𝑥−2 , 𝑥−3 ]
Remark The 𝜓𝑗 parameters are very small quadratic costs that are included for technical reasons, to make the linear-quadratic dynamic programming problem solved by the fictitious planner (who in effect chooses equilibrium quantities and shadow prices) well posed and well behaved
Demand for beef is governed by 𝑐𝑡 = 𝑎0 − 𝑎1 𝑝𝑡 + 𝑑𝑡̃ , where 𝑑𝑡̃ is a stochastic process with mean zero, representing a demand shifter
64.3.1 Preferences
We set $\Lambda = 0$, $\Delta_h = 0$, $\Theta_h = 0$, $\Pi = \alpha_1^{-1/2}$ and $b_t = \Pi \tilde d_t + \Pi \alpha_0$
With these settings, the FOC for the household’s problem becomes the demand curve of the
“Cattle Cycles” model
64.3.2 Technology
$$ \Delta_k = \begin{bmatrix} 1-\delta & 0 & g \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \qquad \Theta_k = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} $$
(where 𝑖𝑡 = −𝑐𝑡 )
To capture the production of cattle, we set
$$ \Phi_c = \begin{bmatrix} 1 \\ f_1 \\ 0 \\ 0 \\ -f_7 \end{bmatrix}, \quad \Phi_g = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad \Phi_i = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \Gamma = \begin{bmatrix} 0 & 0 & 0 \\ f_1(1-\delta) & 0 & g f_1 \\ f_3 & 0 & 0 \\ 0 & f_5 & 0 \\ 0 & 0 & 0 \end{bmatrix} $$
64.3.3 Information
We set
$$ A_{22} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \rho_1 & 0 & 0 \\ 0 & 0 & \rho_2 & 0 \\ 0 & 0 & 0 & \rho_3 \end{bmatrix}, \quad C_2 = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 15 \end{bmatrix}, \quad U_b = \begin{bmatrix} \Pi \alpha_0 & 0 & 0 & \Pi \end{bmatrix}, \quad U_d = \begin{bmatrix} 0 \\ f_2 U_h \\ f_4 U_h \\ f_6 U_h \\ f_8 U_h \end{bmatrix} $$
To map this into our class, we set $f_1^2 = \frac{\Psi_1}{2}$, $f_2^2 = \frac{\Psi_2}{2}$, $f_3^2 = \frac{\Psi_3}{2}$, $2f_1 f_2 = 1$, $2f_3 f_4 = \gamma_0 g$, $2f_5 f_6 = \gamma_1 g$
In [4]: β = np.array([[0.909]])
lλ = np.array([[0]])
a1 = 0.5
πh = np.array([[1 / (sqrt(a1))]])
δh = np.array([[0]])
θh = np.array([[0]])
δ = 0.1
g = 0.85
f1 = 0.001
f3 = 0.001
f5 = 0.001
f7 = 0.001
φg = np.array([[0, 0, 0, 0],
[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 0, 1,0],
[0, 0, 0, 1]])
γ = np.array([[ 0, 0, 0],
[f1 * (1 - δ), 0, g * f1],
[ f3, 0, 0],
[ 0, f5, 0],
[ 0, 0, 0]])
δk = np.array([[1 - δ, 0, g],
[ 1, 0, 0],
[ 0, 1, 0]])
ρ1 = 0
ρ2 = 0
ρ3 = 0.6
a0 = 500
γ0 = 0.4
γ1 = 0.7
f2 = 1 / (2 * f1)
f4 = γ0 * g / (2 * f3)
f6 = γ1 * g / (2 * f5)
f8 = 1 / (2 * f7)
c2 = np.array([[0, 0, 0],
[1, 0, 0],
[0, 1, 0],
[0, 0, 15]])
Notice that we have set 𝜌1 = 𝜌2 = 0, so ℎ𝑡 and 𝑚𝑡 consist of a constant and a white noise
component
We set up the economy using tuples for information, technology and preference matrices be-
low
We also construct two extra information matrices, corresponding to cases when 𝜌3 = 1 and
𝜌3 = 0 (as opposed to the baseline case of 𝜌3 = 0.6)
ρ3_2 = 1
a22_2 = np.array([[1, 0, 0, 0],
[0, ρ1, 0, 0],
[0, 0, ρ2, 0],
[0, 0, 0, ρ3_2]])
ρ3_3 = 0
a22_3 = np.array([[1, 0, 0, 0],
[0, ρ1, 0, 0],
[0, 0, ρ2, 0],
[0, 0, 0, ρ3_3]])
# Example of how we can look at the matrices associated with a given namedtuple
Info1.a22
Out[5]: array([[1. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.6]])
# Calculate steady-state in baseline case and use to set the initial condition
Econ1.compute_steadystate(nnc=4)
x0 = Econ1.zz
[110] use the model to understand the sources of recurrent cycles in total cattle stocks
Plotting 𝑦𝑡 for a simulation of their model shows its ability to generate cycles in quantities
In their Figure 3, [110] plot the impulse response functions of consumption and the breeding
stock of cattle to the demand shock, 𝑑𝑡̃ , under the three different values of 𝜌3
We replicate their Figure 3 below
Econ1.irf(ts_length=25, shock=shock_demand)
Econ2.irf(ts_length=25, shock=shock_demand)
Econ3.irf(ts_length=25, shock=shock_demand)
plt.figure(figsize=(12, 4))
plt.subplot(121)
plt.plot(Econ1.c_irf, label='$\\rho=0.6$')
plt.plot(Econ2.c_irf, label='$\\rho=1$')
plt.plot(Econ3.c_irf, label='$\\rho=0$')
plt.title('Consumption response to demand shock')
plt.legend()
plt.subplot(122)
plt.plot(Econ1.k_irf[:, 0], label='$\\rho=0.6$')
plt.plot(Econ2.k_irf[:, 0], label='$\\rho=1$')
plt.plot(Econ3.k_irf[:, 0], label='$\\rho=0$')
plt.title('Breeding stock response to demand shock')
plt.legend()
plt.show()
The above figures show how consumption patterns differ markedly, depending on the persistence of the demand shock:
• If it is purely transitory (𝜌3 = 0) then consumption rises immediately but is later reduced to build stocks up again.
• If it is permanent (𝜌3 = 1), then consumption falls immediately, in order to build up
stocks to satisfy the permanent rise in future demand.
In Figure 4 of their paper, [110] plot the response to a demand shock of the breeding stock
and the total stock, for 𝜌3 = 0 and 𝜌3 = 0.6
We replicate their Figure 4 below
plt.figure(figsize=(12, 4))
plt.subplot(121)
plt.plot(Econ1.k_irf[:, 0], label='Breeding Stock')
plt.plot(Total1_irf, label='Total Stock')
plt.title('$\\rho=0.6$')
plt.subplot(122)
plt.plot(Econ3.k_irf[:, 0], label='Breeding Stock')
plt.plot(Total3_irf, label='Total Stock')
plt.title('$\\rho=0$')
plt.show()
The fact that 𝑦𝑡 is a weighted moving average of 𝑥𝑡 creates a hump-shaped response of the total stock to demand shocks, contributing to the cyclicality seen in the first graph of this lecture
65 Shock Non-Invertibility
This is another member of a suite of lectures that use the quantecon DLE class to instantiate
models within the [59] class of models described in detail in Recursive Models of Dynamic
Linear Economies
In addition to what’s in Anaconda, this lecture uses the quantecon library
This lecture can be viewed as introducing an early contribution to what is now often called a
news and noise issue
In particular, it analyzes and illustrates an invertibility issue that is endemic within a class
of permanent income models
Technically, the invertibility problem indicates a situation in which histories of the shocks in an econometrician’s autoregressive or Wold moving average representation span a smaller information space than do the shocks seen by the agent inside the econometrician’s model
This situation sets the stage for an econometrician who is unaware of the problem to misinterpret shocks and likely responses to them
We consider the following modification of Robert Hall’s (1978) model [48] in which the endowment process is the sum of two orthogonal autoregressive processes:
Preferences
$$ -\frac{1}{2}\, \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left[ (c_t - b_t)^2 + l_t^2 \right] \Big| J_0 $$
𝑠𝑡 = 𝑐𝑡
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
Technology
𝑐𝑡 + 𝑖𝑡 = 𝛾𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = 𝛿𝑘 𝑘𝑡−1 + 𝑖𝑡
𝑔𝑡 = 𝜙1 𝑖𝑡 , 𝜙1 > 0
$$ g_t \cdot g_t = l_t^2 $$
Information
$$ z_{t+1} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0.9 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix} z_t + \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 4 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} w_{t+1} $$
𝑈𝑏 = [ 30 0 0 0 0 0 ]
The preference shock is constant at 30, while the endowment process is the sum of a constant
and two orthogonal processes
Specifically:
𝑑𝑡 = 5 + 𝑑1𝑡 + 𝑑2𝑡
𝑑1𝑡 is a first-order AR process, while 𝑑2𝑡 is a third-order pure moving average process
θ_k = np.array([[1]])
β = np.array([[1 / 1.05]])
l_λ = np.array([[0]])
π_h = np.array([[1]])
δ_h = np.array([[.9]])
θ_h = np.array([[1]]) - δ_h
ud = np.array([[5, 1, 1, 0.8, 0.6, 0.4],
[0, 0, 0, 0, 0, 0]])
a22 = np.zeros((6, 6))
a22[[0, 1, 3, 4, 5], [0, 1, 2, 3, 4]] = np.array([1.0, 0.9, 1.0, 1.0, 1.0]) # Chase's great trick
c2 = np.zeros((6, 2))
c2[[1, 2], [0, 1]] = np.array([1.0, 4.0])
ub = np.array([[30, 0, 0, 0, 0, 0]])
x0 = np.array([[5], [150], [1], [0], [0], [0], [0], [0]])
$$ \mathbb{E} \sum_{j=0}^{\infty} \beta^j (c_{t+j} - d_{t+j}) \Big| J_t = \beta^{-1} k_{t-1} \quad \forall t $$
$$ \begin{bmatrix} c_t \\ c_t - d_t \end{bmatrix} = \begin{bmatrix} \sigma_1(L) \\ \sigma_2(L) \end{bmatrix} w_t $$

$$ \begin{bmatrix} c_t \\ c_t - d_t \end{bmatrix} = \begin{bmatrix} \sigma_1^*(L) \\ \sigma_2^*(L) \end{bmatrix} u_t $$
The Appendix of chapter 8 of [59] explains why the impulse response functions in the Wold representation estimated by the econometrician do not resemble the impulse response functions that depict the response of consumption and the deficit to innovations to agents’ information
Technically, $\sigma_2(\beta) = \begin{bmatrix} 0 & 0 \end{bmatrix}$ implies that the history of the 𝑢𝑡 s spans a smaller linear space than does the history of the 𝑤𝑡 s
This means that 𝑢𝑡 will typically be a distributed lag of 𝑤𝑡 that is not concentrated at zero
lag:
$$ u_t = \sum_{j=0}^{\infty} \alpha_j w_{t-j} $$
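A scalar example makes the invertibility issue concrete. Suppose the agent's process were the non-invertible MA(1) $y_t = w_t - 2 w_{t-1}$ with unit-variance $w_t$; the econometrician's Wold representation is the invertible $y_t = u_t - 0.5\, u_{t-1}$ with $\mathrm{Var}(u_t) = 4$, which has identical autocovariances but different innovations. (This example is illustrative, not the model's actual $\sigma(L)$.) A sketch verifying the match:

```python
# Agent's (non-invertible) MA(1): y_t = w_t + θ w_{t-1}, θ = -2, Var(w) = 1
θ, σw2 = -2.0, 1.0
γ0 = (1 + θ**2) * σw2        # variance of y
γ1 = θ * σw2                 # first autocovariance

# Econometrician's (invertible) Wold form: θ* = 1/θ, Var(u) = θ² Var(w)
θ_star, σu2 = 1 / θ, θ**2 * σw2
γ0_wold = (1 + θ_star**2) * σu2
γ1_wold = θ_star * σu2
```

Because the second moments coincide, no amount of data on 𝑦 alone can recover the agent's shocks; the fitted innovation 𝑢𝑡 is a distributed lag of current and past 𝑤's.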
Econ1.irf(ts_length=40, shock=None)
plt.figure(figsize=(12, 4))
plt.subplot(121)
plt.plot(Econ1.c_irf, label='Consumption')
plt.plot(Econ1.c_irf - Econ1.d_irf[:,0].reshape(40,1), label='Deficit')
plt.legend()
plt.title('Response to $w_{1t}$')
plt.subplot(122)
plt.plot(Econ1.c_irf, label='Consumption')
plt.plot(Econ1.c_irf - Econ1.d_irf[:,0].reshape(40, 1), label='Deficit')
plt.legend()
plt.title('Response to $w_{2t}$')
plt.show()
The above figure displays the impulse response of consumption and the deficit to the endowment innovations
Consumption displays the characteristic “random walk” response with respect to each innovation
Each endowment innovation leads to a temporary surplus followed by a permanent net-of-interest deficit
The temporary surplus just offsets the permanent deficit in terms of expected present value
HS_kal = qe.Kalman(LSS_HS)
w_lss = HS_kal.whitener_lss()
ma_coefs = HS_kal.stationary_coefficients(50, 'ma')
jj = 50
y1_w1 = np.empty(jj)
y2_w1 = np.empty(jj)
y1_w2 = np.empty(jj)
y2_w2 = np.empty(jj)
for t in range(jj):
y1_w1[t] = ma_coefs[t][0, 0]
y1_w2[t] = ma_coefs[t][0, 1]
y2_w1[t] = ma_coefs[t][1, 0]
y2_w2[t] = ma_coefs[t][1, 1]
plt.figure(figsize=(12, 4))
plt.subplot(121)
plt.plot(y1_w1, label='Consumption')
plt.plot(y2_w1, label='Deficit')
plt.legend()
plt.title('Response to $u_{1t}$')
plt.subplot(122)
plt.plot(y1_w2, label='Consumption')
plt.plot(y2_w2, label='Deficit')
plt.legend()
plt.title('Response to $u_{2t}$')
plt.show()
The above figure displays the impulse response of consumption and the deficit to the innovations in the econometrician’s Wold representation
• this is the object that would be recovered from a high order vector autoregression on
the econometrician’s observations
• this is indicative of the Granger causality imposed on the [𝑐𝑡 , 𝑐𝑡 − 𝑑𝑡 ] process by Hall’s
model: consumption Granger causes 𝑐𝑡 − 𝑑𝑡 , with no reverse causality
jj = 20
irf_wlss = w_lss.impulse_response(jj)
ycoefs = irf_wlss[1]
a1_w1 = np.empty(jj)
a1_w2 = np.empty(jj)
a2_w1 = np.empty(jj)
a2_w2 = np.empty(jj)

for t in range(jj):
a1_w1[t] = ycoefs[t][0, 0]
a1_w2[t] = ycoefs[t][0, 1]
a2_w1[t] = ycoefs[t][1, 0]
a2_w2[t] = ycoefs[t][1, 1]
plt.figure(figsize=(12, 4))
plt.subplot(121)
plt.plot(a1_w1, label='Consumption innov.')
plt.plot(a2_w1, label='Deficit innov.')
plt.title('Response to $w_{1t}$')
plt.legend()
plt.subplot(122)
plt.plot(a1_w2, label='Consumption innov.')
plt.plot(a2_w2, label='Deficit innov.')
plt.legend()
plt.title('Response to $w_{2t}$')
plt.show()
$$ u_t = \sum_{j=0}^{\infty} \alpha_j w_{t-j} $$
While the responses of the innovations to consumption are concentrated at lag zero for both components of 𝑤𝑡 , the responses of the innovations to (𝑐𝑡 − 𝑑𝑡 ) are spread over time (especially in response to 𝑤1𝑡 )
Thus, the innovations to (𝑐𝑡 − 𝑑𝑡 ) as revealed by the vector autoregression depend on what
the economic agent views as “old news”
Part IX

66 Von Neumann Growth Model (and a Generalization)
66.1 Contents
• Notation 66.2
• Duality 66.5
np.set_printoptions(precision=2)
Let:
n ... number of goods
m ... number of activities
A ... input matrix is m-by-n
a_{i,j} - amount of good j consumed by activity i
B ... output matrix is m-by-n
b_{i,j} - amount of good j produced by activity i
Parameters
----------
A : array_like
    Input matrix of the economy. It should be `m x n`
B : array_like
    Output matrix of the economy. It should be `m x n`
"""
def __repr__(self):
return self.__str__()
def __str__(self):
me = """
Generalized von Neumann expanding model:
- number of goods : {n}
- number of activities : {m}
Assumptions:
- AI: every column of B has a positive entry : {AI}
- AII: every row of A has a positive entry : {AII}
"""
# Irreducible : {irr}
return dedent(me.format(n=self.n, m=self.m,
AI=self.AI, AII=self.AII))
def bounds(self):
"""
Calculate the trivial upper and lower bounds for alpha (expansion rate) and
beta (interest factor). See the proof of Theorem 9.8 in Gale (1960) :cite:`gale1989theory`
"""
n, m = self.n, self.m
A, B = self.A, self.B
return LB, UB
M(gamma) = B - gamma * A
Outputs:
--------
value: scalar
value of the zero-sum game
strategy: vector
if dual = False, it is the intensity vector,
if dual = True, it is the price vector
"""
if dual == False:
# Solve the primal LP (for details see the description)
# (1) Define the problem for v as a maximization (linprog minimizes)
c = np.hstack([np.zeros(m), -1])
else:
# Solve the dual LP (for details see the description)
# (1) Define the problem for v as a maximization (linprog minimizes)
c = np.hstack([np.zeros(n), 1])
if res.status != 0:
print(res.message)
Outputs:
--------
alpha: scalar
optimal expansion rate
"""
LB, UB = self.bounds()
γ = (LB + UB) / 2
ZS = self.zerosum(γ=γ)
V = ZS[0] # value of the game with γ
if V >= 0:
LB = γ
else:
UB = γ
return γ, x, p
Outputs:
--------
beta: scalar
optimal interest rate
"""
LB, UB = self.bounds()
if V > 0:
LB = γ
else:
UB = γ
return γ, x, p
66.2 Notation
𝑏.,𝑗 > 0 ∀𝑗 = 1, 2, … , 𝑛
𝑎𝑖,. > 0 ∀𝑖 = 1, 2, … , 𝑚
A semi-positive 𝑚-vector 𝑥 denotes the levels at which activities are operated (intensity vector)
Therefore,
B1 = np.array([[1, 0, 0, 0],
[0, 0, 2, 0],
[0, 1, 0, 1]])
B2 = np.array([[1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 2, 0],
[0, 0, 0, 1, 0, 1]])
The following code sets up our first Neumann economy or Neumann instance
Out[4]:
Generalized von Neumann expanding model:
- number of goods : 4
- number of activities : 3
Assumptions:
- AI: every column of B has a positive entry : True
- AII: every row of A has a positive entry : True
Out[5]:
Generalized von Neumann expanding model:
- number of goods : 6
- number of activities : 5
Assumptions:
- AI: every column of B has a positive entry : True
- AII: every row of A has a positive entry : True
Attach a time index 𝑡 to the preceding objects, regard an economy as a dynamic system, and
study sequences
An interesting special case holds the technology process constant and investigates the dynam-
ics of quantities and prices only
Accordingly, in the rest of this notebook, we assume that (𝐴𝑡 , 𝐵𝑡 ) = (𝐴, 𝐵) for all 𝑡 ≥ 0
A crucial element of the dynamic interpretation involves the timing of production
We assume that production (consumption of inputs) takes place in period 𝑡, while the associ-
ated output materializes in period 𝑡 + 1, i.e. consumption of 𝑥𝑇𝑡 𝐴 in period 𝑡 results in 𝑥𝑇𝑡 𝐵
amounts of output in period 𝑡 + 1
𝑥𝑇𝑡 𝐵 ≥ 𝑥𝑇𝑡+1 𝐴 ∀𝑡 ≥ 1
which asserts that no more goods can be used today than were produced yesterday
Accordingly, 𝐴𝑝𝑡 tells the costs of production in period 𝑡 and 𝐵𝑝𝑡 tells revenues in period 𝑡 +
1
𝑥𝑡+1 ./𝑥𝑡 = 𝛼, ∀𝑡 ≥ 0
With balanced growth, the law of motion of 𝑥 is evidently 𝑥𝑡+1 = 𝛼𝑥𝑡 and so we can rewrite
the feasibility constraint as
𝑥𝑇𝑡 𝐵 ≥ 𝛼𝑥𝑇𝑡 𝐴 ∀𝑡
In the same spirit, define 𝛽 ∈ R as the interest factor per unit of time
We assume that it is always possible to earn a gross return equal to the constant interest fac-
tor 𝛽 by investing “outside the model”
Under this assumption about outside investment opportunities, a no-arbitrage condition gives
rise to the following (no profit) restriction on the price sequence:
𝛽𝐴𝑝𝑡 ≥ 𝐵𝑝𝑡 ∀𝑡
This says that production cannot yield a return greater than that offered by the investment
opportunity (note that we compare values in period 𝑡 + 1)
The balanced growth assumption allows us to drop time subscripts and conduct an analysis
purely in terms of a time-invariant growth rate 𝛼 and interest factor 𝛽
66.5 Duality
The following two problems are connected by a remarkable dual relationship between the
technological and valuation characteristics of the economy:
Definition: The technological expansion problem (TEP) for the economy (𝐴, 𝐵) is to find a
semi-positive 𝑚-vector 𝑥 > 0 and a number 𝛼 ∈ R, s.t.
max_{𝛼} 𝛼   s.t.   𝑥^𝑇 𝐵 ≥ 𝛼 𝑥^𝑇 𝐴
Theorem 9.3 of David Gale's book [45] asserts that if Assumptions I and II are both satisfied, then a maximum value of 𝛼 exists and it is positive
It is called the technological expansion rate and is denoted by 𝛼0 . The associated intensity
vector 𝑥0 is the optimal intensity vector
Definition: The economical expansion problem (EEP) for (𝐴, 𝐵) is to find a semi-positive
𝑛-vector 𝑝 > 0 and a number 𝛽 ∈ R, such that
min_{𝛽} 𝛽   s.t.   𝐵𝑝 ≤ 𝛽𝐴𝑝
Assumptions I and II imply existence of a minimum value 𝛽0 > 0 called the economic expan-
sion rate
The corresponding price vector 𝑝0 is the optimal price vector
Evidently, the criterion functions in the technological expansion problem and the economical expansion problem are both linearly homogeneous, so the optimal 𝑥_0 and 𝑝_0 are defined only up to a positive scale factor
For simplicity (and to emphasize a close connection to zero-sum games), in the following, we
normalize both vectors 𝑥0 and 𝑝0 to have unit length
A standard duality argument (see Lemma 9.4. in (Gale, 1960) [45]) implies that under As-
sumptions I and II, 𝛽0 ≤ 𝛼0
But in the other direction, that is 𝛽0 ≥ 𝛼0 , Assumptions I and II are not sufficient
Nevertheless, von Neumann (1937) [131] proved the following remarkable “duality-type” re-
sult connecting TEP and EEP
Theorem 1 (von Neumann): If the economy (𝐴, 𝐵) satisfies Assumptions I and II, then
there exists a set (𝛾 ∗ , 𝑥0 , 𝑝0 ), where 𝛾 ∗ ∈ [𝛽0 , 𝛼0 ] ⊂ R, 𝑥0 > 0 is an 𝑚-vector, 𝑝0 > 0 is an
𝑛-vector and the following holds true
𝑥𝑇0 𝐵 ≥ 𝛾 ∗ 𝑥𝑇0 𝐴
𝐵𝑝0 ≤ 𝛾 ∗ 𝐴𝑝0
𝑥𝑇0 (𝐵 − 𝛾 ∗ 𝐴) 𝑝0 = 0
Proof (Sketch): Assumption I and II imply that there exist (𝛼0 , 𝑥0 ) and (𝛽0 , 𝑝0 )
solving the TEP and EEP, respectively. If 𝛾 ∗ > 𝛼0 , then by definition of 𝛼0 , there
cannot exist a semi-positive 𝑥 that satisfies 𝑥𝑇 𝐵 ≥ 𝛾 ∗ 𝑥𝑇 𝐴. Similarly, if 𝛾 ∗ < 𝛽0 ,
there is no semi-positive 𝑝 so that 𝐵𝑝 ≤ 𝛾 ∗ 𝐴𝑝. Let 𝛾 ∗ ∈ [𝛽0 , 𝛼0 ], then 𝑥𝑇0 𝐵 ≥
𝛼0 𝑥𝑇0 𝐴 ≥ 𝛾 ∗ 𝑥𝑇0 𝐴. Moreover, 𝐵𝑝0 ≤ 𝛽0 𝐴𝑝0 ≤ 𝛾 ∗ 𝐴𝑝0 . These two inequalities imply
𝑥0 (𝐵 − 𝛾 ∗ 𝐴) 𝑝0 = 0.
Here the constant 𝛾 ∗ is both expansion and interest factor (not necessarily optimal)
We have already encountered and discussed the first two inequalities that represent feasibility
and no-profit conditions
Moreover, the equality compactly captures the requirements that if any good grows at a rate
larger than 𝛾 ∗ (i.e., if it is oversupplied), then its price must be zero; and that if any activity
provides negative profit, it must be unused
Therefore, these expressions encode all equilibrium conditions and Theorem I essentially
states that under Assumptions I and II there always exists an equilibrium (𝛾 ∗ , 𝑥0 , 𝑝0 ) with
balanced growth
Note that Theorem I is silent about uniqueness of the equilibrium. In fact, it does not rule
out (trivial) cases with 𝑥𝑇0 𝐵𝑝0 = 0 so that nothing of value is produced
To exclude such uninteresting cases, Kemeny, Morgenstern and Thompson (1956) add an extra requirement
• by playing the appropriate mixed strategy, the maximizing player can assure himself at least 𝑉 (𝐶) (no matter what the column player chooses)
• by playing the appropriate mixed strategy, the minimizing player can make sure that the maximizing player will not get more than 𝑉 (𝐶) (irrespective of what is the maximizing player's choice)
From the famous theorem of Nash (1951), it follows that there always exists a mixed strategy
Nash equilibrium for any finite two-player zero-sum game
Moreover, von Neumann's Minimax Theorem (1928) [100] implies that
Finding Nash equilibria of a finite two-player zero-sum game can be formulated as a linear
programming problem
To see this, we introduce the following notation

• For a fixed 𝑥, let 𝑣 be the value of the minimization problem: 𝑣 ≡ min_𝑝 𝑥^𝑇 𝐶𝑝 = min_𝑗 𝑥^𝑇 𝐶𝑒_𝑗
• For a fixed 𝑝, let 𝑢 be the value of the maximization problem: 𝑢 ≡ max_𝑥 𝑥^𝑇 𝐶𝑝 = max_𝑖 (𝑒^𝑖)^𝑇 𝐶𝑝
Then the max-min problem (the game from the maximizing player’s point of view) can be
written as the primal LP
𝑉(𝐶) = max 𝑣
s.t.  𝑣 𝜄_𝑛^𝑇 ≤ 𝑥^𝑇 𝐶
      𝑥 ≥ 0
      𝜄_𝑛^𝑇 𝑥 = 1
while the min-max problem (the game from the minimizing player’s point of view) is the dual
LP
𝑉(𝐶) = min 𝑢
s.t.  𝑢 𝜄_𝑚 ≥ 𝐶𝑝
      𝑝 ≥ 0
      𝜄_𝑚^𝑇 𝑝 = 1
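The primal LP above can be sketched directly with scipy.optimize.linprog, which minimizes, so we flip the sign of 𝑣. This is a minimal illustrative sketch, not the lecture's own zerosum method; the name game_value and the example matrices are ours.

```python
import numpy as np
from scipy.optimize import linprog

def game_value(C):
    """Value of the zero-sum game with payoff matrix C.

    Variables are (x, v); we minimize -v subject to
    v <= (x^T C)_j for each column j, x >= 0, sum(x) = 1.
    """
    m, n = C.shape
    c = np.hstack([np.zeros(m), -1.0])                     # objective: minimize -v
    A_ub = np.hstack([-C.T, np.ones((n, 1))])              # v - (x^T C)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]              # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]
```

For matching pennies, 𝐶 = [[1, −1], [−1, 1]], the value is 0 with the uniform mixed strategy, as the minimax theorem predicts.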
Hamburger, Thompson and Weil (1967) [50] view the input-output pair of the economy as
payoff matrices of two-player zero-sum games. Using this interpretation, they restate As-
sumption I and II as follows
𝑀 (𝛾) ≡ 𝐵 − 𝛾𝐴
For fixed 𝛾, treating 𝑀 (𝛾) as a matrix game, we can calculate the solution of the game
• If 𝛾 > 𝛼0 , then for all 𝑥 > 0, there ∃𝑗 ∈ {1, … , 𝑛}, s.t. [𝑥𝑇 𝑀 (𝛾)]𝑗 < 0 implying that
𝑉 (𝑀 (𝛾)) < 0
• If 𝛾 < 𝛽0 , then for all 𝑝 > 0, there ∃𝑖 ∈ {1, … , 𝑚}, s.t. [𝑀 (𝛾)𝑝]𝑖 > 0 implying that
𝑉 (𝑀 (𝛾)) > 0
• If 𝛾 ∈ {𝛽0 , 𝛼0 }, then (by Theorem I) the optimal intensity and price vectors 𝑥0 and 𝑝0
satisfy
That is, (𝑥0 , 𝑝0 , 0) is a solution of the game 𝑀 (𝛾) so that 𝑉 (𝑀 (𝛽0 )) = 𝑉 (𝑀 (𝛼0 )) = 0
Moreover, if 𝑥′ is optimal for the maximizing player in 𝑀 (𝛾 ′ ) for 𝛾 ′ ∈ (𝛽0 , 𝛼0 ) and 𝑝″ is op-
timal for the minimizing player in 𝑀 (𝛾 ″ ) where 𝛾 ″ ∈ (𝛽0 , 𝛾 ′ ), then (𝑥′ , 𝑝″ , 0) is a solution for
𝑀 (𝛾), ∀𝛾 ∈ (𝛾 ″ , 𝛾 ′ )
𝑀(𝛾)𝑝″ = 𝑀(𝛾″)𝑝″ + (𝛾″ − 𝛾)𝐴𝑝″ ≤ 0
hence 𝑉 (𝑀 (𝛾)) ≤ 0
It is clear from the above argument that 𝛽0 , 𝛼0 are the minimal and maximal 𝛾 for which
𝑉 (𝑀 (𝛾)) = 0
Moreover, Hamburger et al. (1967) [50] show that the function 𝛾 ↦ 𝑉 (𝑀 (𝛾)) is continuous
and nonincreasing in 𝛾
This suggests an algorithm to compute (𝛼0 , 𝑥0 ) and (𝛽0 , 𝑝0 ) for a given input-output pair
(𝐴, 𝐵)
66.6.2 Algorithm
Hamburger, Thompson and Weil (1967) [50] propose a simple bisection algorithm to find the
minimal and maximal roots (i.e. 𝛽0 and 𝛼0 ) of the function 𝛾 ↦ 𝑉 (𝑀 (𝛾))
Step 1
First, notice that we can easily find trivial upper and lower bounds for 𝛼0 and 𝛽0
• TEP requires that 𝑥^𝑇(𝐵 − 𝛼𝐴) ≥ 0^𝑇 and 𝑥 > 0, so if 𝛼 is so large that max_𝑖 {[(𝐵 − 𝛼𝐴)𝜄_𝑛]_𝑖} < 0, then TEP ceases to have a solution; hence we can define UB as the 𝛼∗ that solves max_𝑖 {[(𝐵 − 𝛼∗𝐴)𝜄_𝑛]_𝑖} = 0
• Similar to the upper bound, if 𝛽 is so low that min𝑗 {[𝜄𝑇𝑚 (𝐵 − 𝛽𝐴)]𝑗 } > 0, then the EEP
has no solution and so we can define LB as the 𝛽 ∗ that solves min𝑗 {[𝜄𝑇𝑚 (𝐵 − 𝛽 ∗ 𝐴)]𝑗 } = 0
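These two trivial bounds reduce to row-wise and column-wise ratios of output to input; a sketch under the assumption that the row and column sums of 𝐴 are positive (trivial_bounds is our illustrative name, not the lecture's bounds method):

```python
import numpy as np

def trivial_bounds(A, B):
    """Trivial bounds for α_0 and β_0 from Step 1.

    UB solves max_i {[(B - α A) ι_n]_i} = 0, i.e. the largest row-wise
    output/input ratio; LB solves min_j {[ι_m^T (B - β A)]_j} = 0, the
    smallest column-wise ratio.  Assumes A has positive row and column sums.
    """
    A, B = np.asarray(A, float), np.asarray(B, float)
    UB = np.max(B.sum(axis=1) / A.sum(axis=1))   # row sums: one ratio per activity
    LB = np.min(B.sum(axis=0) / A.sum(axis=0))   # column sums: one ratio per good
    return LB, UB
```

With 𝐴 = 𝐼₂ and 𝐵 = diag(2, 3), for example, the bounds are LB = 2 and UB = 3.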
In [6]: N1.bounds()
Step 2
Compute 𝛼0 and 𝛽0
• Finding 𝛼0
1. Fix 𝛾 = (𝑈𝐵 + 𝐿𝐵)/2 and compute the solution of the two-player zero-sum game associated with 𝑀(𝛾). We can use either the primal or the dual LP problem
2. If 𝑉 (𝑀 (𝛾)) ≥ 0, then set 𝐿𝐵 = 𝛾, otherwise let 𝑈 𝐵 = 𝛾
3. Iterate on 1. and 2. until |𝑈 𝐵 − 𝐿𝐵| < 𝜖
• Finding 𝛽0
1. Fix 𝛾 = (𝑈𝐵 + 𝐿𝐵)/2 and compute the solution of the two-player zero-sum game associated with 𝑀(𝛾). We can use either the primal or the dual LP problem
2. If 𝑉 (𝑀 (𝛾)) > 0, then set 𝐿𝐵 = 𝛾, otherwise let 𝑈 𝐵 = 𝛾
3. Iterate on 1. and 2. until |𝑈 𝐵 − 𝐿𝐵| < 𝜖
Existence: Since 𝑉 (𝑀 (𝐿𝐵)) > 0 and 𝑉 (𝑀 (𝑈 𝐵)) < 0 and 𝑉 (𝑀 (⋅)) is a continuous,
nonincreasing function, there is at least one 𝛾 ∈ [𝐿𝐵, 𝑈 𝐵], s.t. 𝑉 (𝑀 (𝛾)) = 0
The zerosum method calculates the value and optimal strategies associated with a given 𝛾
In [7]: γ = 2
for ax, grid, N, i in zip(axes, (value_ex1_grid, value_ex2_grid), (N1, N2), (1, 2)):
ax.plot(γ_grid, grid)
ax.set(title=f'Example {i}', xlabel='$\gamma$')
ax.axhline(0, c='k', lw=1)
ax.axvline(N.bounds()[0], c='r', ls='--', label='lower bound')
ax.axvline(N.bounds()[1], c='g', ls='--', label='upper bound')
plt.show()
The expansion method implements the bisection algorithm for 𝛼0 (and uses the primal LP
problem for 𝑥0 )
α_0 = 1.2599210478365421
x_0 = [0.33 0.26 0.41]
The corresponding p from the dual = [0.41 0.33 0.26 0. ]
The interest method implements the bisection algorithm for 𝛽0 (and uses the dual LP prob-
lem for 𝑝0 )
β_0 = 1.2599210478365421
p_0 = [0.41 0.33 0.26 0. ]
The corresponding x from the primal = [0.33 0.26 0.41]
Of course, when 𝛾 ∗ is unique, it is irrelevant which one of the two methods we use
In particular, as will be shown below, in the case of an irreducible (𝐴, 𝐵) (as in Example 1), the maximal and minimal roots of 𝑉 (𝑀 (𝛾)) necessarily coincide, implying a "full duality" result, i.e. 𝛼_0 = 𝛽_0 = 𝛾∗, and that the expansion (and interest) rate 𝛾∗ is unique
As an illustration, compute first the maximal and minimal roots of 𝑉 (𝑀 (⋅)) for Example 2,
which displays a reducible input-output pair (𝐴, 𝐵)
α_0 = 1.2556518474593759
x_0 = [0. 0. 0.33 0.26 0.41]
The corresponding p from the dual = [4.43e-01 5.57e-01 0.00e+00 8.49e-17 1.26e-17 0.00e+00]
β_0 = 1.0000000009313226
p_0 = [0.5 0.5 0. 0. 0. 0. ]
The corresponding x from the primal = [3.33e-01 3.33e-01 3.33e-01 1.45e-19 0.00e+00]
As we can see, with a reducible (𝐴, 𝐵), the roots found by the bisection algorithms might dif-
fer, so there might be multiple 𝛾 ∗ that make the value of the game with 𝑀 (𝛾 ∗ ) zero. (see the
figure above)
Indeed, although the von Neumann theorem assures existence of the equilibrium, Assump-
tions I and II are not sufficient for uniqueness. Nonetheless, Kemeny et al. (1956) show that
there are at most finitely many economic solutions, meaning that there are only finitely many
𝛾 ∗ that satisfy 𝑉 (𝑀 (𝛾 ∗ )) = 0 and 𝑥𝑇0 𝐵𝑝0 > 0 and that for each such 𝛾𝑖∗ , there is a self-
sufficient part of the economy (a sub-economy) that in equilibrium can expand independently
with the expansion coefficient 𝛾𝑖∗
The following theorem (see Theorem 9.10. in Gale, 1960 [45]) asserts that imposing irre-
ducibility is sufficient for uniqueness of (𝛾 ∗ , 𝑥0 , 𝑝0 )
Theorem II: Consider the conditions of Theorem 1. If the economy (𝐴, 𝐵) is irreducible,
then 𝛾 ∗ = 𝛼0 = 𝛽0
There is a special (𝐴, 𝐵) that allows us to simplify the solution method significantly by invok-
ing the powerful Perron-Frobenius theorem for non-negative matrices
Definition: We call an economy simple if it satisfies

1. 𝑛 = 𝑚
2. Each activity produces exactly one good
3. Each good is produced by one and only one activity
These assumptions imply that 𝐵 = 𝐼𝑛 , i.e., that 𝐵 can be written as an identity matrix (pos-
sibly after reshuffling its rows and columns)
The simple model has the following special property (Theorem 9.11. in [45]): if 𝑥0 and 𝛼0 > 0
solve the TEP with (𝐴, 𝐼𝑛 ), then
𝑥_0^𝑇 = 𝛼_0 𝑥_0^𝑇 𝐴  ⇔  𝑥_0^𝑇 𝐴 = (1/𝛼_0) 𝑥_0^𝑇
The latter shows that 1/𝛼0 is a positive eigenvalue of 𝐴 and 𝑥0 is the corresponding non-
negative left eigenvector
The classical result of Perron and Frobenius implies that a non-negative matrix always has
a non-negative eigenvalue-eigenvector pair
Moreover, if 𝐴 is irreducible, then the optimal intensity vector 𝑥0 is positive and unique up to
multiplication by a positive scalar
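For a simple economy this observation turns the whole problem into an eigenvalue computation; a sketch (simple_expansion is our illustrative name, not part of the lecture's Neumann class):

```python
import numpy as np

def simple_expansion(A):
    """α_0 and x_0 for a simple economy (B = I_n) via Perron-Frobenius.

    1/α_0 is the Perron root of A and x_0 the corresponding left
    eigenvector, normalized to sum to one.
    """
    eigvals, eigvecs = np.linalg.eig(np.asarray(A, float).T)  # left eigenpairs of A
    k = np.argmax(eigvals.real)       # Perron root: the largest (real) eigenvalue
    λ = eigvals[k].real
    x0 = np.abs(eigvecs[:, k].real)
    return 1 / λ, x0 / x0.sum()       # expansion rate and normalized intensities
```

For the irreducible matrix 𝐴 = [[0, 0.5], [0.5, 0]] the Perron root is 0.5, so 𝛼_0 = 2 with 𝑥_0 = (0.5, 0.5).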
67 Covariance Stationary Processes
67.1 Contents
• Overview 67.2
• Introduction 67.3
• Spectral Analysis 67.4
• Implementation 67.5
In addition to what’s in Anaconda, this lecture will need the following libraries
67.2 Overview
In this lecture we study covariance stationary linear stochastic processes, a class of models
routinely used to study economic and financial time series
This class has the advantage of being
We will focus much of our attention on linear covariance stationary models with a finite num-
ber of parameters
In particular, we will study stationary ARMA processes, which form a cornerstone of the
standard theory of time series analysis
Every ARMA process can be represented in linear state space form
However, ARMA processes have some important structure that makes it valuable to study
them separately
The famous Fourier transform and its inverse are used to map between the two representa-
tions
• [87], chapter 2
• [118], chapter 11
• John Cochrane’s notes on time series analysis, chapter 8
• [122], chapter 6
• [29], all
67.3 Introduction
67.3.1 Definitions
Throughout this lecture, we will work exclusively with zero-mean (i.e., 𝜇 = 0) covariance
stationary processes
The zero-mean assumption costs nothing in terms of generality since working with non-zero-
mean processes involves no more than adding a constant
Perhaps the simplest class of covariance stationary processes is the class of white noise processes
A process {𝜖𝑡 } is called a white noise process if
1. E𝜖𝑡 = 0
2. 𝛾(𝑘) = 𝜎2 1{𝑘 = 0} for some 𝜎 > 0
From the simple building block provided by white noise, we can construct a very flexible fam-
ily of covariance stationary processes — the general linear processes
𝑋_𝑡 = ∑_{𝑗=0}^{∞} 𝜓_𝑗 𝜖_{𝑡−𝑗},   𝑡 ∈ Z (1)
where {𝜖_𝑡} is white noise and {𝜓_𝑡} is a square summable sequence of coefficients. The autocovariance function of a general linear process is

𝛾(𝑘) = 𝜎² ∑_{𝑗=0}^{∞} 𝜓_𝑗 𝜓_{𝑗+𝑘} (2)
By the Cauchy-Schwarz inequality, one can show that 𝛾(𝑘) satisfies Eq. (2)
Evidently, 𝛾(𝑘) does not depend on 𝑡
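Eq. (2) is easy to check numerically; here is a small sketch for the coefficients 𝜓_𝑗 = 𝜙^𝑗 (the AR(1) case, with 𝜎 = 1), truncating the infinite sum at a large K:

```python
import numpy as np

# Truncated version of γ(k) = σ² Σ_j ψ_j ψ_{j+k} for ψ_j = ϕ**j, σ = 1
σ, ϕ, K = 1.0, 0.5, 200
ψ = ϕ ** np.arange(K)

def γ(k):
    # sum of ψ_j ψ_{j+k} over j = 0, ..., K-k-1
    return σ**2 * np.sum(ψ[:K - k] * ψ[k:])

# Closed form for comparison: γ(k) = ϕ^k σ² / (1 - ϕ²)
```

With K = 200 the truncation error is on the order of 𝜙^{2K} and so negligible.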
Remarkably, the class of general linear processes goes a long way towards describing the en-
tire class of zero-mean covariance stationary processes
In particular, Wold’s decomposition theorem states that every zero-mean covariance station-
ary process {𝑋𝑡 } can be written as
𝑋_𝑡 = ∑_{𝑗=0}^{∞} 𝜓_𝑗 𝜖_{𝑡−𝑗} + 𝜂_𝑡
where
67.3.5 AR and MA
𝛾(𝑘) = 𝜙^𝑘 𝜎²/(1 − 𝜙²),   𝑘 = 0, 1, … (4)
The next figure plots an example of this function for 𝜙 = 0.8 and 𝜙 = −0.8 with 𝜎 = 1
num_rows, num_cols = 2, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 8))
plt.subplots_adjust(hspace=0.4)
for i, ϕ in enumerate((0.8, -0.8)):
    ax = axes[i]
    times = list(range(16))
    acov = [ϕ**k / (1 - ϕ**2) for k in times]
    ax.plot(times, acov, 'bo-', alpha=0.6, label=f'autocovariance, $\phi = {ϕ:.2}$')
    ax.legend(loc='upper right')
    ax.set(xlabel='time', xlim=(0, 15))
    ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
plt.show()
Another very simple process is the MA(1) process (here MA means “moving average”)
𝑋𝑡 = 𝜖𝑡 + 𝜃𝜖𝑡−1
The AR(1) can be generalized to an AR(𝑝) and likewise for the MA(1)
Putting all of this together, we get the following definition
A stochastic process {𝑋𝑡 } is called an autoregressive moving average process, or ARMA(𝑝, 𝑞),
if it can be written as
𝐿^0 𝑋_𝑡 − 𝜙_1 𝐿^1 𝑋_𝑡 − ⋯ − 𝜙_𝑝 𝐿^𝑝 𝑋_𝑡 = 𝐿^0 𝜖_𝑡 + 𝜃_1 𝐿^1 𝜖_𝑡 + ⋯ + 𝜃_𝑞 𝐿^𝑞 𝜖_𝑡 (6)
In what follows we always assume that the roots of the polynomial 𝜙(𝑧) lie outside the unit
circle in the complex plane
This condition is sufficient to guarantee that the ARMA(𝑝, 𝑞) process is covariance stationary
In fact, it implies that the process falls within the class of general linear processes described
above
That is, given an ARMA(𝑝, 𝑞) process {𝑋_𝑡} satisfying the unit circle condition, there exists a square summable sequence {𝜓_𝑡} with 𝑋_𝑡 = ∑_{𝑗=0}^{∞} 𝜓_𝑗 𝜖_{𝑡−𝑗} for all 𝑡
The sequence {𝜓𝑡 } can be obtained by a recursive procedure outlined on page 79 of [29]
The function 𝑡 ↦ 𝜓𝑡 is often called the impulse response function
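For the ARMA(1,1) case 𝑋_𝑡 = 𝜙𝑋_{𝑡−1} + 𝜖_𝑡 + 𝜃𝜖_{𝑡−1}, the recursion is short enough to sketch directly (the general procedure in [29] handles arbitrary 𝑝 and 𝑞; the function name here is ours):

```python
import numpy as np

def arma11_impulse_response(ϕ, θ, n=10):
    """ψ_0 = 1, ψ_1 = ϕ + θ, ψ_j = ϕ ψ_{j-1} for j >= 2.

    Follows from ψ(L) = (1 + θL)/(1 - ϕL) = (1 + θL) Σ_j ϕ^j L^j.
    """
    ψ = np.empty(n)
    ψ[0] = 1.0
    ψ[1] = ϕ + θ
    for j in range(2, n):
        ψ[j] = ϕ * ψ[j - 1]
    return ψ
```

For example, 𝜙 = 0.5 and 𝜃 = 0.2 give the geometrically decaying sequence 1, 0.7, 0.35, 0.175, ….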
Autocovariance functions provide a great deal of information about covariance stationary pro-
cesses
In fact, for zero-mean Gaussian processes, the autocovariance function characterizes the entire
joint distribution
Even for non-Gaussian processes, it provides a significant amount of information
It turns out that there is an alternative representation of the autocovariance function of a
covariance stationary process, called the spectral density
At times, the spectral density is easier to derive, easier to manipulate, and provides additional
intuition
Before discussing the spectral density, we invite you to recall the main properties of complex
numbers (or skip to the next section)
It can be helpful to remember that, in a formal sense, complex numbers are just points
(𝑥, 𝑦) ∈ R2 endowed with a specific notion of multiplication
When (𝑥, 𝑦) is regarded as a complex number, 𝑥 is called the real part and 𝑦 is called the
imaginary part
The modulus or absolute value of a complex number 𝑧 = (𝑥, 𝑦) is just its Euclidean norm in
R2 , but is usually written as |𝑧| instead of ‖𝑧‖
The product of two complex numbers (𝑥, 𝑦) and (𝑢, 𝑣) is defined to be (𝑥𝑢 − 𝑣𝑦, 𝑥𝑣 + 𝑦𝑢),
while addition is standard pointwise vector addition
When endowed with these notions of multiplication and addition, the set of complex numbers
forms a field — addition and multiplication play well together, just as they do in R
The complex number (𝑥, 𝑦) is often written as 𝑥 + 𝑖𝑦, where 𝑖 is called the imaginary unit and
is understood to obey 𝑖2 = −1
The 𝑥 + 𝑖𝑦 notation provides an easy way to remember the definition of multiplication given
above, because, proceeding naively,
Converted back to our first notation, this becomes (𝑥𝑢 − 𝑣𝑦, 𝑥𝑣 + 𝑦𝑢) as promised
Complex numbers can be represented in the polar form 𝑟𝑒^{𝑖𝜔}, where 𝑟 ∶= |𝑧| is the modulus and 𝜔 is the angle

The spectral density 𝑓 of a covariance stationary process with autocovariance function 𝛾 is defined as

𝑓(𝜔) ∶= ∑_{𝑘∈Z} 𝛾(𝑘)𝑒^{𝑖𝜔𝑘},   𝜔 ∈ R

(Some authors normalize the expression on the right by constants such as 1/𝜋 — the convention chosen makes little difference provided you are consistent)
Using the fact that 𝛾 is even, in the sense that 𝛾(𝑡) = 𝛾(−𝑡) for all 𝑡, we can show that
• real-valued
• even (𝑓(𝜔) = 𝑓(−𝜔) ), and
• 2𝜋-periodic, in the sense that 𝑓(2𝜋 + 𝜔) = 𝑓(𝜔) for all 𝜔
It follows that the values of 𝑓 on [0, 𝜋] determine the values of 𝑓 on all of R — the proof is an
exercise
For this reason, it is standard to plot the spectral density only on the interval [0, 𝜋]
It is an exercise to show that the MA(1) process 𝑋_𝑡 = 𝜃𝜖_{𝑡−1} + 𝜖_𝑡 has the spectral density

𝑓(𝜔) = 𝜎² (1 + 2𝜃 cos(𝜔) + 𝜃²) (10)
With a bit more effort, it’s possible to show (see, e.g., p. 261 of [118]) that the spectral den-
sity of the AR(1) process 𝑋𝑡 = 𝜙𝑋𝑡−1 + 𝜖𝑡 is
𝑓(𝜔) = 𝜎² / (1 − 2𝜙 cos(𝜔) + 𝜙²) (11)
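The AR(1) spectral density is easy to evaluate on a grid; with 𝜙 = 0.8 it peaks at 𝜔 = 0, consistent with the slowly decaying, all-positive autocovariances (a small sketch with 𝜎 = 1):

```python
import numpy as np

# AR(1) spectral density σ² / (1 - 2φ cos ω + φ²) on [0, π]
σ, ϕ = 1.0, 0.8
ω = np.linspace(0, np.pi, 201)
f = σ**2 / (1 - 2 * ϕ * np.cos(ω) + ϕ**2)
```

With 𝜙 = −0.8 the same expression is instead largest at 𝜔 = 𝜋, matching the oscillating autocovariances plotted earlier.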
More generally, it can be shown that the spectral density of the ARMA process Eq. (5) is
𝑓(𝜔) = ∣𝜃(𝑒^{𝑖𝜔})/𝜙(𝑒^{𝑖𝜔})∣² 𝜎² (12)
where
The derivation of Eq. (12) uses the fact that convolutions become products under Fourier
transformations
The proof is elegant and can be found in many places — see, for example, [118], chapter 11,
section 4
It’s a nice exercise to verify that Eq. (10) and Eq. (11) are indeed special cases of Eq. (12)
67.4. SPECTRAL ANALYSIS 1083
Plotting Eq. (11) reveals the shape of the spectral density for the AR(1) model when 𝜙 takes
the values 0.8 and -0.8 respectively
These spectral densities correspond to the autocovariance functions for the AR(1) process
shown above
Informally, we think of the spectral density as being large at those 𝜔 ∈ [0, 𝜋] at which the
autocovariance function seems approximately to exhibit big damped cycles
To see the idea, let’s consider why, in the lower panel of the preceding figure, the spectral
density for the case 𝜙 = −0.8 is large at 𝜔 = 𝜋
When we evaluate this at 𝜔 = 𝜋, we get a large number because cos(𝜋𝑘) is large and positive
when (−0.8)𝑘 is positive, and large in absolute value and negative when (−0.8)𝑘 is negative
Hence the product is always large and positive, and hence the sum of the products on the
right-hand side of Eq. (13) is large
These ideas are illustrated in the next figure, which has 𝑘 on the horizontal axis
In [4]: ϕ = -0.8
times = list(range(16))
y1 = [ϕ**k / (1 - ϕ**2) for k in times]
y2 = [np.cos(np.pi * k) for k in times]
y3 = [a * b for a, b in zip(y1, y2)]
num_rows, num_cols = 3, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 8))
plt.subplots_adjust(hspace=0.25)
# Autocovariance when ϕ = -0.8
ax = axes[0]
ax.plot(times, y1, 'bo-', alpha=0.6, label='$\gamma(k)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), yticks=(-2, -1, 0, 1, 2))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
# Cycles at frequency π
ax = axes[1]
ax.plot(times, y2, 'bo-', alpha=0.6, label='$\cos(\pi k)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), yticks=(-1, 0, 1))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
# Product
ax = axes[2]
ax.stem(times, y3, label='$\gamma(k) \cos(\pi k)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), ylim=(-3, 3), yticks=(-1, 0, 1, 2, 3))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
ax.set_xlabel("k")
plt.show()
On the other hand, if we evaluate 𝑓(𝜔) at 𝜔 = 𝜋/3, then the cycles are not matched, the
sequence 𝛾(𝑘) cos(𝜔𝑘) contains both positive and negative terms, and hence the sum of these
terms is much smaller
In [5]: ϕ = -0.8
times = list(range(16))
y1 = [ϕ**k / (1 - ϕ**2) for k in times]
y2 = [np.cos(np.pi * k/3) for k in times]
y3 = [a * b for a, b in zip(y1, y2)]
num_rows, num_cols = 3, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 8))
plt.subplots_adjust(hspace=0.25)
# Autocovariance when ϕ = -0.8
ax = axes[0]
ax.plot(times, y1, 'bo-', alpha=0.6, label='$\gamma(k)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), yticks=(-2, -1, 0, 1, 2))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
# Cycles at frequency π/3
ax = axes[1]
ax.plot(times, y2, 'bo-', alpha=0.6, label='$\cos(\pi k/3)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), yticks=(-1, 0, 1))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
# Product
ax = axes[2]
ax.stem(times, y3, label='$\gamma(k) \cos(\pi k/3)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), ylim=(-3, 3), yticks=(-1, 0, 1, 2, 3))
plt.show()
In summary, the spectral density is large at frequencies 𝜔 where the autocovariance function
exhibits damped cycles
We have just seen that the spectral density is useful in the sense that it provides a frequency-
based perspective on the autocovariance structure of a covariance stationary process
Another reason that the spectral density is useful is that it can be “inverted” to recover the
autocovariance function via the inverse Fourier transform
In particular, for all 𝑘 ∈ Z, we have
𝛾(𝑘) = (1/2𝜋) ∫_{−𝜋}^{𝜋} 𝑓(𝜔)𝑒^{𝑖𝜔𝑘} 𝑑𝜔 (14)
This is convenient in situations where the spectral density is easier to calculate and manipu-
late than the autocovariance function
(For example, the expression Eq. (12) for the ARMA spectral density is much easier to work
with than the expression for the ARMA autocovariance)
This section is loosely based on [118], p. 249-253, and is included for those who would like additional insight into spectral densities
Others should feel free to skip to the next section — none of this material is necessary to
progress to computation
Recall that every separable Hilbert space 𝐻 has a countable orthonormal basis {ℎ𝑘 }
The nice thing about such a basis is that every 𝑓 ∈ 𝐻 satisfies 𝑓 = ∑_𝑘 𝛼_𝑘 ℎ_𝑘, where the Fourier coefficients are 𝛼_𝑘 ∶= ⟨𝑓, ℎ_𝑘⟩
Summarizing these results, we say that any separable Hilbert space is isometrically isomor-
phic to ℓ2
In essence, this says that each separable Hilbert space we consider is just a different way of
looking at the fundamental space ℓ2
With this in mind, let’s specialize to a setting where
ℎ_𝑘(𝜔) = 𝑒^{𝑖𝜔𝑘}/√(2𝜋),   𝑘 ∈ Z,  𝜔 ∈ [−𝜋, 𝜋]
Using the definition of 𝑇 from above and the fact that 𝑓 is even, we now have
𝑇𝛾 = ∑_{𝑘∈Z} 𝛾(𝑘) 𝑒^{𝑖𝜔𝑘}/√(2𝜋) = (1/√(2𝜋)) 𝑓(𝜔) (16)
In other words, apart from a scalar multiple, the spectral density is just a transformation of
𝛾 ∈ ℓ2 under a certain linear isometry — a different way to view 𝛾
In particular, it is an expansion of the autocovariance function with respect to the trigono-
metric basis functions in 𝐿2
As discussed above, the Fourier coefficients of 𝑇 𝛾 are given by the sequence 𝛾, and, in partic-
ular, 𝛾(𝑘) = ⟨𝑇 𝛾, ℎ𝑘 ⟩
Transforming this inner product into its integral expression and using Eq. (16) gives Eq. (14),
justifying our earlier expression for the inverse transform
67.5 Implementation
Most code for working with covariance stationary models deals with ARMA models
Python code for studying ARMA models can be found in the tsa submodule of statsmodels
Since this code doesn’t quite cover our needs — particularly vis-a-vis spectral analysis —
we’ve put together the module arma.py, which is part of QuantEcon.py package
The module provides functions for mapping ARMA(𝑝, 𝑞) models into their
67.5.1 Application
Let’s use this code to replicate the plots on pages 68–69 of [87]
Here are some functions to generate the plots
def plot_autocovariance(arma, ax=None):
    acov = arma.autocovariance()
ax.stem(list(range(len(acov))), acov)
ax.set(xlim=(-0.5, len(acov) - 0.5), title='Autocovariance',
xlabel='time', ylabel='autocovariance')
return ax
def quad_plot(arma):
"""
Plots the impulse response, spectral_density, autocovariance,
and one realization of the process.
"""
num_rows, num_cols = 2, 2
fig, axes = plt.subplots(num_rows, num_cols, figsize=(12, 8))
plot_functions = [plot_impulse_response,
plot_spectral_density,
plot_autocovariance,
plot_simulation]
for plot_func, ax in zip(plot_functions, axes.flatten()):
plot_func(arma, ax)
plt.tight_layout()
plt.show()
ϕ = 0.0
θ = 0.0
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)
If we look carefully, things look good: the spectrum is the flat line at 10⁰ at the very top of the spectrum graph, which is as it should be
Also
• the variance equals 1 = (1/2𝜋) ∫_{−𝜋}^{𝜋} 1 𝑑𝜔, as it should
• the covariogram and impulse response look as they should
• it is actually challenging to visualize a time series realization of white noise –
a sequence of surprises – but this too looks pretty good
To get some more examples, as our laboratory we’ll replicate quartets of graphs that [87] use
to teach “how to read spectral densities”
Ljungqvist and Sargent's first model is 𝑋_𝑡 = 1.3𝑋_{𝑡−1} − .7𝑋_{𝑡−2} + 𝜖_𝑡
In [9]: ϕ = 0.9
θ = -0.0
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)
In [11]: ϕ = .98
θ = -0.7
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)
67.5.2 Explanation
The call
arma = ARMA(ϕ, θ, σ)
The parameter σ is always a scalar, the standard deviation of the white noise
We also permit ϕ and θ to be scalars, in which case the model will be interpreted as
𝑋𝑡 = 𝜙𝑋𝑡−1 + 𝜖𝑡 + 𝜃𝜖𝑡−1
The two numerical packages most useful for working with ARMA models are scipy.signal
and numpy.fft
The package scipy.signal expects the parameters to be passed into its functions in a
manner consistent with the alternative ARMA notation Eq. (8)
For example, the impulse response sequence {𝜓𝑡 } discussed above can be obtained using
scipy.signal.dimpulse, and the function call should be of the form
times, ψ = dimpulse((ma_poly, ar_poly, 1), n=impulse_length)
where ma_poly and ar_poly correspond to the polynomials in Eq. (7) — that is,

ma_poly = (1, 𝜃_1, 𝜃_2, … , 𝜃_𝑞)
ar_poly = (1, −𝜙_1, −𝜙_2, … , −𝜙_𝑝)
To this end, we also maintain the arrays ma_poly and ar_poly as instance data, with their
values computed automatically from the values of phi and theta supplied by the user
If the user decides to change the value of either theta or phi ex-post by assignments such
as arma.phi = (0.5, 0.2) or arma.theta = (0, -0.1)
then ma_poly and ar_poly should update automatically to reflect these new parameters
This is achieved in our implementation by using descriptors
As discussed above, for ARMA processes the spectral density has a simple representation that
is relatively easy to calculate
Given this fact, the easiest way to obtain the autocovariance function is to recover it from the
spectral density via the inverse Fourier transform
Here we use NumPy’s Fourier transform package np.fft, which wraps a standard Fortran-
based package called FFTPACK
A look at the np.fft documentation shows that the inverse transform np.fft.ifft takes a given
sequence 𝐴0 , 𝐴1 , … , 𝐴𝑛−1 and returns the sequence 𝑎0 , 𝑎1 , … , 𝑎𝑛−1 defined by
𝑎_𝑘 = (1/𝑛) ∑_{𝑡=0}^{𝑛−1} 𝐴_𝑡 𝑒^{𝑖𝑘2𝜋𝑡/𝑛}
Thus, if we set 𝐴𝑡 = 𝑓(𝜔𝑡 ), where 𝑓 is the spectral density and 𝜔𝑡 ∶= 2𝜋𝑡/𝑛, then
𝑎_𝑘 ≈ (1/2𝜋) ∫_0^{2𝜋} 𝑓(𝜔)𝑒^{𝑖𝜔𝑘} 𝑑𝜔 = (1/2𝜋) ∫_{−𝜋}^{𝜋} 𝑓(𝜔)𝑒^{𝑖𝜔𝑘} 𝑑𝜔
68 Estimation of Spectra
68.1 Contents
• Overview 68.2
• Periodograms 68.3
• Smoothing 68.4
• Exercises 68.5
• Solutions 68.6
In addition to what’s in Anaconda, this lecture will need the following libraries
68.2 Overview
68.3 Periodograms
Recall that the spectral density 𝑓 of a covariance stationary process with autocovariance function 𝛾 can be written
Now consider the problem of estimating the spectral density of a given time series, when 𝛾 is
unknown
In particular, let 𝑋0 , … , 𝑋𝑛−1 be 𝑛 consecutive observations of a single time series that is as-
sumed to be covariance stationary
The most common estimator of the spectral density of this process is the periodogram of
𝑋0 , … , 𝑋𝑛−1 , which is defined as
$$
I(\omega) := \frac{1}{n} \left| \sum_{t=0}^{n-1} X_t e^{i t \omega} \right|^2, \qquad \omega \in \mathbb{R} \tag{1}
$$
$$
I(\omega) = \frac{1}{n} \left\{ \left[ \sum_{t=0}^{n-1} X_t \cos(\omega t) \right]^2 + \left[ \sum_{t=0}^{n-1} X_t \sin(\omega t) \right]^2 \right\}
$$
It is straightforward to show that the function 𝐼 is even and 2𝜋-periodic (i.e., 𝐼(𝜔) = 𝐼(−𝜔)
and 𝐼(𝜔 + 2𝜋) = 𝐼(𝜔) for all 𝜔 ∈ R)
From these two results, you will be able to verify that the values of 𝐼 on [0, 𝜋] determine the
values of 𝐼 on all of R
The next section helps to explain the connection between the periodogram and the spectral
density
68.3.1 Interpretation
To interpret the periodogram, it is convenient to focus on its values at the Fourier frequencies
$$
\omega_j := \frac{2\pi j}{n}, \qquad j = 0, \ldots, n-1
$$
For $j \neq 0$, these frequencies satisfy

$$
\sum_{t=0}^{n-1} e^{i t \omega_j} = \sum_{t=0}^{n-1} \exp\left\{ i 2\pi j \frac{t}{n} \right\} = 0
$$
Letting $\bar X$ denote the sample mean $n^{-1} \sum_{t=0}^{n-1} X_t$, we then have

$$
n I(\omega_j) = \left| \sum_{t=0}^{n-1} (X_t - \bar X) e^{i t \omega_j} \right|^2
= \sum_{t=0}^{n-1} (X_t - \bar X) e^{i t \omega_j} \sum_{r=0}^{n-1} (X_r - \bar X) e^{-i r \omega_j}
$$
Now let
$$
\hat\gamma(k) := \frac{1}{n} \sum_{t=k}^{n-1} (X_t - \bar X)(X_{t-k} - \bar X), \qquad k = 0, 1, \ldots, n-1
$$
This is the sample autocovariance function, the natural “plug-in estimator” of the autocovari-
ance function 𝛾
(“Plug-in estimator” is an informal term for an estimator found by replacing expectations
with sample means)
With this notation, we can now write
$$
I(\omega_j) = \hat\gamma(0) + 2 \sum_{k=1}^{n-1} \hat\gamma(k) \cos(\omega_j k)
$$
Recalling our expression for 𝑓 given above, we see that 𝐼(𝜔𝑗 ) is just a sample analog of 𝑓(𝜔𝑗 )
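The identity above can be verified numerically; the sketch below (on arbitrary simulated data, an assumption for illustration) computes $I(\omega_j)$ both via the FFT and via the sample autocovariances and compares them for $j \neq 0$

```python
import numpy as np

rng = np.random.default_rng(42)
n = 64
X = rng.standard_normal(n)               # any series works for this identity
X_bar = X.mean()

# Periodogram I(ω_j) = (1/n)|Σ_t X_t e^{itω_j}|² at all Fourier frequencies
I_fft = np.abs(np.fft.fft(X))**2 / n

# Sample autocovariances γ̂(k) = (1/n) Σ_{t=k}^{n-1} (X_t - X̄)(X_{t-k} - X̄)
gamma_hat = np.array([np.sum((X[k:] - X_bar) * (X[:n - k] - X_bar)) / n
                      for k in range(n)])

# I(ω_j) = γ̂(0) + 2 Σ_{k=1}^{n-1} γ̂(k) cos(ω_j k), valid for j ≠ 0
ω = 2 * np.pi * np.arange(n) / n
I_acov = np.array([gamma_hat[0]
                   + 2 * np.sum(gamma_hat[1:] * np.cos(ω[j] * np.arange(1, n)))
                   for j in range(n)])

print(np.max(np.abs(I_fft[1:] - I_acov[1:])))   # ≈ 0
```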
68.3.2 Calculation
Let’s now consider how to compute the periodogram as defined in Eq. (1)
There are already functions available that will do this for us — an example is statsmodels.tsa.stattools.periodogram in the statsmodels package
However, it is very simple to replicate their results, and this will give us a platform to make
useful extensions
The most common way to calculate the periodogram is via the discrete Fourier transform,
which in turn is implemented through the fast Fourier transform algorithm
In general, given a sequence $a_0, \ldots, a_{n-1}$, the discrete Fourier transform computes the sequence

$$
A_j := \sum_{t=0}^{n-1} a_t \exp\left\{ i 2\pi \frac{t j}{n} \right\}, \qquad j = 0, \ldots, n-1
$$
With numpy.fft.fft imported as fft and 𝑎0 , … , 𝑎𝑛−1 stored in NumPy array a, the func-
tion call fft(a) returns the values 𝐴0 , … , 𝐴𝑛−1 as a NumPy array
It follows that when the data $X_0, \ldots, X_{n-1}$ are stored in array X, the values $I(\omega_j)$ at the Fourier frequencies, which are given by

$$
\frac{1}{n} \left| \sum_{t=0}^{n-1} X_t \exp\left\{ i 2\pi \frac{t j}{n} \right\} \right|^2, \qquad j = 0, \ldots, n-1
$$

can be computed by np.abs(fft(X))**2 / len(X)
Note: The NumPy function abs acts elementwise, and correctly handles complex numbers
(by computing their modulus, which is exactly what we need)
A function called periodogram that puts all this together can be found here
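The QuantEcon version adds smoothing and windowing options; a minimal sketch of the core computation, restricted to frequencies in $[0, \pi]$, might look like this (a hypothetical stand-in, not the library implementation)

```python
import numpy as np

def periodogram(x):
    "Return Fourier frequencies in [0, π] and periodogram values I(ω_j) there."
    n = len(x)
    I_w = np.abs(np.fft.fft(x))**2 / n       # I(ω_j) at all Fourier frequencies
    w = 2 * np.pi * np.arange(n) / n         # ω_j = 2πj/n
    return w[:n // 2 + 1], I_w[:n // 2 + 1]  # keep ω_j in [0, π]

# A pure cosine at the 4th Fourier frequency puts all periodogram mass there
n = 64
x = np.cos(2 * np.pi * 4 * np.arange(n) / n)
w, I_w = periodogram(x)
print(np.argmax(I_w))   # 4
```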
Let’s generate some data for this function using the ARMA class from QuantEcon.py (see the
lecture on linear processes for more details)
Here’s a code snippet that, once the preceding code has been run, generates data from the
process
where {𝜖𝑡 } is white noise with unit variance, and compares the periodogram to the actual
spectral density
n = 40                 # Data size
φ, θ = 0.5, (0, -0.8)  # AR and MA parameters
lp = ARMA(φ, θ)
X = lp.simulation(ts_length=n)
fig, ax = plt.subplots()
x, y = periodogram(X)
ax.plot(x, y, 'b-', lw=2, alpha=0.5, label='periodogram')
x_sd, y_sd = lp.spectral_density(two_pi=False, res=120)
ax.plot(x_sd, y_sd, 'r-', lw=2, alpha=0.8, label='spectral density')
ax.legend()
plt.show()
This estimate looks rather disappointing, but the data size is only 40, so perhaps it’s not sur-
prising that the estimate is poor
However, if we try again with n = 1200 the outcome is not much better
The periodogram is far too irregular relative to the underlying spectral density
This brings us to our next topic
68.4 Smoothing
$$
I_S(\omega_j) := \sum_{\ell=-p}^{p} w(\ell) \, I(\omega_{j+\ell}) \tag{3}
$$
where the weights 𝑤(−𝑝), … , 𝑤(𝑝) are a sequence of 2𝑝 + 1 nonnegative values summing to
one
In general, larger values of 𝑝 indicate more smoothing — more on this below
The next figure shows the kind of sequence typically used
Note the smaller weights towards the edges and larger weights in the center, so that more distant values from $I(\omega_j)$ have less weight than closer ones in the sum Eq. (3)
def hanning_window(M):
w = [0.5 - 0.5 * np.cos(2 * np.pi * n/(M-1)) for n in range(M)]
return w
Our next step is to provide code that will not only estimate the periodogram but also provide
smoothing as required
Such functions have been written in estspec.py and are available once you've installed QuantEcon.py
The GitHub listing displays three functions, smooth(), periodogram(),
ar_periodogram(). We will discuss the first two here and the third one below
The periodogram() function returns a periodogram, optionally smoothed via the
smooth() function
Regarding the smooth() function, since smoothing adds a nontrivial amount of computa-
tion, we have applied a fairly terse array-centric method based around np.convolve
Readers are left either to explore or simply to use this code according to their interests
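As one illustration of the idea (not the estspec.py implementation, which treats end effects more carefully), smoothing with np.convolve can be sketched as

```python
import numpy as np

def smooth(x, window_len=7):
    "Smooth x by convolving with normalized Hann weights (a sketch)."
    w = np.hanning(window_len)
    w = w / w.sum()                      # weights are nonnegative and sum to one
    return np.convolve(x, w, mode='same')

# Smoothing leaves a constant series unchanged away from the boundaries
y = smooth(np.ones(50))
print(y[10])   # ≈ 1.0
```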
The next three figures each show smoothed and unsmoothed periodograms, as well as the
population or “true” spectral density
(The model is the same as before — see equation Eq. (2) — and there are 400 observations)
From the top figure to bottom, the window length is varied from small to large
In looking at the figure, we can see that for this model and data size, the window length chosen in the middle figure provides the best fit
Relative to this value, the first window length provides insufficient smoothing, while the third
gives too much smoothing
Of course in real estimation problems, the true spectral density is not visible and the choice of appropriate smoothing will have to be made based on judgement/priors or some other theory
In the code listing, we showed three functions from the file estspec.py
The third function in the file (ar_periodogram()) adds a pre-processing step to periodogram smoothing
First, we describe the basic idea, and after that we give the code
The essential idea is to
1. Transform the data in order to make estimation of the spectral density more efficient
2. Compute the periodogram associated with the transformed data
3. Reverse the effect of the transformation on the periodogram, so that it now estimates
the spectral density of the original process
Let’s examine this idea more carefully in a particular setting — where the data are assumed
to be generated by an AR(1) process
(More general ARMA settings can be handled using similar techniques to those described be-
low)
Suppose in particular that $\{X_t\}$ is covariance stationary and AR(1), with

$$
X_{t+1} = \mu + \phi X_t + \epsilon_{t+1} \tag{4}
$$

where $\mu$ and $\phi \in (-1, 1)$ are unknown parameters and $\{\epsilon_t\}$ is white noise
It follows that if we regress 𝑋𝑡+1 on 𝑋𝑡 and an intercept, the residuals will approximate white
noise
Let $g$ denote the spectral density of the white noise process $\{\epsilon_t\}$ (a constant function)
In view of an earlier result we obtained while discussing ARMA processes, 𝑓 and 𝑔 are related
by
$$
f(\omega) = \left| \frac{1}{1 - \phi e^{i\omega}} \right|^2 g(\omega) \tag{5}
$$
This suggests that the recoloring step, which constructs an estimate $I$ of $f$ from $I_0$, should set

$$
I(\omega) = \left| \frac{1}{1 - \hat\phi e^{i\omega}} \right|^2 I_0(\omega)
$$

where $\hat\phi$ is the OLS estimate of $\phi$
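A sketch of the three steps for AR(1) data is given below; the helper name and parameter choices are hypothetical, and unlike the estspec.py ar_periodogram it does not smooth the whitened periodogram

```python
import numpy as np

def ar_periodogram_sketch(x):
    "Whiten x with an AR(1) regression, take the periodogram, then recolor."
    n = len(x)
    # Step 1: regress x_{t+1} on x_t and a constant; residuals ≈ white noise
    A = np.column_stack((np.ones(n - 1), x[:-1]))
    beta, *_ = np.linalg.lstsq(A, x[1:], rcond=None)
    phi_hat = beta[1]
    e_hat = x[1:] - A @ beta

    # Step 2: periodogram I_0 of the residuals, on frequencies in [0, π]
    m = len(e_hat)
    I0 = np.abs(np.fft.fft(e_hat))**2 / m
    w = 2 * np.pi * np.arange(m) / m
    w, I0 = w[:m // 2 + 1], I0[:m // 2 + 1]

    # Step 3: recolor, I(ω) = |1 - φ̂ e^{iω}|⁻² I_0(ω)
    return w, I0 / np.abs(1 - phi_hat * np.exp(1j * w))**2

# Try it on simulated AR(1) data with φ = -0.9 (true density peaks near ω = π)
rng = np.random.default_rng(0)
n = 512
x = np.zeros(n)
for t in range(1, n):
    x[t] = -0.9 * x[t - 1] + rng.standard_normal()
w, I_w = ar_periodogram_sketch(x)
```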
The periodograms are calculated from time series drawn from Eq. (4) with $\mu = 0$ and $\phi = -0.9$
Each time series is of length 150
The difference between the three subfigures is just randomness — each one uses a different draw of the time series
In all cases, periodograms are fit with the “hamming” window and window length of 65
Overall, the fit of the AR smoothed periodogram is much better, in the sense of being closer
to the true spectral density
68.5 Exercises
68.5.1 Exercise 1
68.5.2 Exercise 2
The model is as in equation Eq. (4), with 𝜇 = 0, 𝜙 = −0.9 and 150 observations in each time
series
All periodograms are fit with the “hamming” window and window length of 65
68.6 Solutions
68.6.1 Exercise 1
In [5]: ## Data
n = 400
φ = 0.5
θ = 0, -0.8
lp = ARMA(φ, θ)
X = lp.simulation(ts_length=n)
x, y = periodogram(X)
ax[i].plot(x, y, 'b-', lw=2, alpha=0.5, label='periodogram')
ax[i].legend()
ax[i].set_title(f'window length = {wl}')
plt.show()
68.6.2 Exercise 2
In [6]: lp = ARMA(-0.9)
wl = 65
for i in range(3):
X = lp.simulation(ts_length=150)
ax[i].set_xlim(0, np.pi)
ax[i].legend(loc='upper left')
plt.show()
69

Additive and Multiplicative Functionals
69.1 Contents
• Overview 69.2
• A Particular Additive Functional 69.3
• Dynamics 69.4
• Code 69.5
• More About the Multiplicative Martingale 69.6
69.2 Overview
Many economic time series display persistent growth that prevents them from being asymp-
totically stationary and ergodic
For example, outputs, prices, and dividends typically display irregular but persistent growth
Asymptotic stationarity and ergodicity are key assumptions needed to make it possible to
learn by applying statistical methods
Are there ways to model time series having persistent growth that still enable statistical learning based on a law of large numbers for an asymptotically stationary and ergodic process?
The answer provided by Hansen and Scheinkman [60] is yes
They described two classes of time series models that accommodate growth
They are:
1. a constant
2. a trend component
3. an asymptotically stationary component
4. a martingale
Here
• 𝑥𝑡 is an 𝑛 × 1 vector,
• 𝐴 is an 𝑛 × 𝑛 stable matrix (all eigenvalues lie within the open unit circle),
• 𝑧𝑡+1 ∼ 𝑁 (0, 𝐼) is an 𝑚 × 1 IID shock,
• 𝐵 is an 𝑛 × 𝑚 matrix, and
• 𝑥0 ∼ 𝑁 (𝜇0 , Σ0 ) is a random initial condition for 𝑥
• a scalar constant 𝜈,
• the vector 𝑥𝑡 , and
• the same Gaussian vector 𝑧𝑡+1 that appears in the VAR Eq. (1)
In particular,
A convenient way to represent our additive functional is to use a linear state space system
To do this, we set up state and observation vectors
$$
\hat x_t = \begin{bmatrix} 1 \\ x_t \\ y_t \end{bmatrix}
\quad \text{and} \quad
\hat y_t = \begin{bmatrix} x_t \\ y_t \end{bmatrix}
$$
$$
\begin{bmatrix} 1 \\ x_{t+1} \\ y_{t+1} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ 0 & A & 0 \\ \nu & D' & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ x_t \\ y_t \end{bmatrix}
+
\begin{bmatrix} 0 \\ B \\ F' \end{bmatrix} z_{t+1}
$$
$$
\begin{bmatrix} x_t \\ y_t \end{bmatrix}
=
\begin{bmatrix} 0 & I & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ x_t \\ y_t \end{bmatrix}
$$
$$
\hat x_{t+1} = \hat A \hat x_t + \hat B z_{t+1}, \qquad \hat y_t = \hat D \hat x_t
$$
69.4 Dynamics
$$
\tilde x_{t+1} = \phi_1 \tilde x_t + \phi_2 \tilde x_{t-1} + \phi_3 \tilde x_{t-2} + \phi_4 \tilde x_{t-3} + \sigma z_{t+1} \tag{3}
$$
$\phi(z) = (1 - \phi_1 z - \phi_2 z^2 - \phi_3 z^3 - \phi_4 z^4)$
In fact, this whole model can be mapped into the additive functional system definition in
Eq. (1) – Eq. (2) by appropriate selection of the matrices 𝐴, 𝐵, 𝐷, 𝐹
You can try writing these matrices down now as an exercise — correct expressions appear in
the code below
69.4.1 Simulation
In [2]: """
@authors: Chase Coleman, Balint Szoke, Tom Sargent
"""
import numpy as np
import scipy as sp
import scipy.linalg as la
import quantecon as qe
import matplotlib.pyplot as plt
from scipy.stats import norm, lognorm
class AMF_LSS_VAR:
"""
This class transforms an additive (multiplicative)
functional into a QuantEcon linear state space system.
"""
# Set F
if not np.any(F):
self.F = np.zeros((self.nk, 1))
else:
self.F = F
# Set ν
if not np.any(ν):
self.ν = np.zeros((self.nm, 1))
elif type(ν) == float:
self.ν = np.asarray([[ν]])
elif len(ν.shape) == 1:
self.ν = np.expand_dims(ν, 1)
else:
self.ν = ν
if self.ν.shape[0] != self.D.shape[0]:
raise ValueError("The dimension of ν is inconsistent with D!")
def construct_ss(self):
"""
This creates the state space representation that can be passed
into the quantecon LSS class.
"""
# Pull out useful info
nx, nk, nm = self.nx, self.nk, self.nm
A, B, D, F, ν = self.A, self.B, self.D, self.F, self.ν
if self.add_decomp:
ν, H, g = self.add_decomp
else:
ν, H, g = self.additive_decomp()
# Auxiliary blocks with 0's and 1's to fill out the lss matrices
nx0c = np.zeros((nx, 1))
nx0r = np.zeros(nx)
nx1 = np.ones(nx)
nk0 = np.zeros(nk)
ny0c = np.zeros((nm, 1))
ny0r = np.zeros(nm)
ny1m = np.eye(nm)
ny0m = np.zeros((nm, nm))
nyx0m = np.zeros_like(D)
return lss
def additive_decomp(self):
"""
Return values for the martingale decomposition
- ν : unconditional mean difference in Y
- H : coefficient for the (linear) martingale component (κ_a)
- g : coefficient for the stationary component g(x)
- Y_0 : it should be the function of X_0 (for now set it to 0.0)
"""
I = np.identity(self.nx)
A_res = la.solve(I - self.A, I)
g = self.D @ A_res
H = self.F + self.D @ A_res @ self.B
return self.ν, H, g
def multiplicative_decomp(self):
"""
Return values for the multiplicative decomposition (Example 5.4.4.)
- ν_tilde : eigenvalue
- H : vector for the Jensen term
"""
ν, H, g = self.additive_decomp()
ν_tilde = ν + (.5)*np.expand_dims(np.diag(H @ H.T), 1)
return ν_tilde, H, g
return llh[-1]
"""
# Pull out right sizes so we know how to increment
nx, nk, nm = self.nx, self.nk, self.nm
# Allocate space (nm is the number of additive functionals - we want npaths for each)
mpath = np.empty((nm*npaths, T))
mbounds = np.empty((nm*2, T))
spath = np.empty((nm*npaths, T))
sbounds = np.empty((nm*2, T))
tpath = np.empty((nm*npaths, T))
add_figs = []
for ii in range(nm):
li, ui = npaths*(ii), npaths*(ii+1)
LI, UI = 2*(ii), 2*(ii+1)
add_figs.append(self.plot_given_paths(T, ypath[li:ui,:], mpath[li:ui,:], spath[li:ui,:],
tpath[li:ui,:], mbounds[LI:UI,:], sbounds[LI:UI,:],
show_trend=show_trend))
return add_figs
"""
# Pull out right sizes so we know how to increment
nx, nk, nm = self.nx, self.nk, self.nm
# Matrices for the multiplicative decomposition
ν_tilde, H, g = self.multiplicative_decomp()
# Allocate space (nm is the number of functionals - we want npaths for each)
mpath_mult = np.empty((nm*npaths, T))
mbounds_mult = np.empty((nm*2, T))
spath_mult = np.empty((nm*npaths, T))
sbounds_mult = np.empty((nm*2, T))
tpath_mult = np.empty((nm*npaths, T))
ypath_mult = np.empty((nm*npaths, T))
t*(.5)*np.expand_dims(np.diag(H @ H.T),1)[ii])))
Sdist = lognorm(np.asscalar(np.sqrt(yvar[nx+2*nm+ii, nx+2*nm+ii])),
scale = np.asscalar( np.exp(-ymeans[nx+2*nm+ii])))
mbounds_mult[li:ui, t] = Mdist.ppf([.01, .99])
sbounds_mult[li:ui, t] = Sdist.ppf([.01, .99])
mult_figs = []
for ii in range(nm):
li, ui = npaths*(ii), npaths*(ii+1)
LI, UI = 2*(ii), 2*(ii+1)
return mult_figs
# Allocate space (nm is the number of functionals - we want npaths for each)
mpath_mult = np.empty((nm*npaths, T))
mbounds_mult = np.empty((nm*2, T))
mart_figs = []
for ii in range(nm):
li, ui = npaths*(ii), npaths*(ii+1)
LI, UI = 2*(ii), 2*(ii+1)
mart_figs.append(self.plot_martingale_paths(T, mpath_mult[li:ui, :],
mbounds_mult[LI:UI, :],
horline=1))
mart_figs[ii].suptitle(f'Martingale components for many paths of $y_{ii+1}$', fontsize=14)
return mart_figs
# Allocate space
trange = np.arange(T)
# Create figure
fig, ax = plt.subplots(2, 2, sharey=True, figsize=(15, 8))
return fig
# Create figure
fig, ax = plt.subplots(1, 1, figsize=(10, 6))
return fig
For now, we just plot 𝑦𝑡 and 𝑥𝑡 , postponing until later a description of exactly how we com-
pute them
# A matrix should be n x n
A = np.array([[φ_1, φ_2, φ_3, φ_4],
              [  1,   0,   0,   0],
              [  0,   1,   0,   0],
              [  0,   0,   1,   0]])
# B matrix should be n x k
B = np.array([[σ, 0, 0, 0]]).T
D = np.array([1, 0, 0, 0]) @ A
F = np.array([1, 0, 0, 0]) @ B
T = 150
x, y = amf.lss.simulate(T)
69.4.2 Decomposition
Hansen and Sargent [58] describe how to construct a decomposition of an additive functional
into four parts:
To attain this decomposition for the particular class of additive functionals defined by Eq. (1)
and Eq. (2), we first construct the matrices
$$
H := F + B'(I - A')^{-1} D, \qquad g := D'(I - A)^{-1}
$$
$$
y_t = \underbrace{t \nu}_{\text{trend component}}
+ \overbrace{\sum_{j=1}^{t} H z_j}^{\text{Martingale component}}
- \underbrace{g x_t}_{\text{stationary component}}
+ \overbrace{g x_0 + y_0}^{\text{initial conditions}}
$$
At this stage, you should pause and verify that 𝑦𝑡+1 − 𝑦𝑡 satisfies Eq. (2)
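One way to carry out that verification is by simulation; the sketch below uses hypothetical scalar values of A, B, D, F, ν and rebuilds y from the four components of the decomposition

```python
import numpy as np

# Hypothetical scalar parameters for x_{t+1} = A x_t + B z_{t+1} and
# y_{t+1} - y_t = ν + D x_t + F z_{t+1}
A, B, D, F, ν = 0.8, 1.0, 0.5, 0.2, 0.01
H = F + D * B / (1 - A)          # martingale coefficient (scalar case)
g = D / (1 - A)                  # stationary-component coefficient

rng = np.random.default_rng(1234)
T = 200
z = rng.standard_normal(T + 1)   # z_1, ..., z_T (index 0 unused)
x = np.zeros(T + 1)
y = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = A * x[t] + B * z[t + 1]
    y[t + 1] = y[t] + ν + D * x[t] + F * z[t + 1]

# Rebuild y_t from trend, martingale, stationary, and initial-condition parts
t_grid = np.arange(T + 1)
martingale = np.concatenate(([0.0], np.cumsum(H * z[1:])))
y_decomp = t_grid * ν + martingale - g * x + g * x[0] + y[0]

print(np.max(np.abs(y - y_decomp)))   # ≈ 0
```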
It is convenient for us to introduce the following notation:
$$
\begin{bmatrix} 1 \\ t+1 \\ x_{t+1} \\ y_{t+1} \\ m_{t+1} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
0 & 0 & A & 0 & 0 \\
\nu & 0 & D' & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} 1 \\ t \\ x_t \\ y_t \\ m_t \end{bmatrix}
+
\begin{bmatrix} 0 \\ 0 \\ B \\ F' \\ H' \end{bmatrix} z_{t+1}
$$
and
$$
\begin{bmatrix} x_t \\ y_t \\ \tau_t \\ m_t \\ s_t \end{bmatrix}
=
\begin{bmatrix}
0 & 0 & I & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & \nu & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & -g & 0 & 0
\end{bmatrix}
\begin{bmatrix} 1 \\ t \\ x_t \\ y_t \\ m_t \end{bmatrix}
$$
With

$$
\tilde x_t := \begin{bmatrix} 1 \\ t \\ x_t \\ y_t \\ m_t \end{bmatrix}
\quad \text{and} \quad
\tilde y_t := \begin{bmatrix} x_t \\ y_t \\ \tau_t \\ m_t \\ s_t \end{bmatrix}
$$

we have

$$
\tilde x_{t+1} = \tilde A \tilde x_t + \tilde B z_{t+1}, \qquad \tilde y_t = \tilde D \tilde x_t
$$
69.5 Code
The class AMF_LSS_VAR mentioned above does all that we want to study our additive
functional
In fact, AMF_LSS_VAR does more because it allows us to study an associated multiplicative
functional as well
(A hint that it does more is the name of the class – here AMF stands for “additive and mul-
tiplicative functional” – the code computes and displays objects associated with multiplicative
functionals too)
Let’s use this code (embedded above) to explore the example process described above
If you run the code that first simulated that example again and then the method call you will
generate (modulo randomness) the plot
In [4]: amf.plot_additive(T)
plt.show()
/home/anju/anaconda3/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py:1920: RuntimeWarning: in
lower_bound = self.a * scale + loc
/home/anju/anaconda3/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py:1921: RuntimeWarning: in
upper_bound = self.b * scale + loc
When we plot multiple realizations of a component in the 2nd, 3rd, and 4th panels, we also
plot the population 95% probability coverage sets computed using the LinearStateSpace class
We have chosen to simulate many paths, all starting from the same non-random initial condi-
tions 𝑥0 , 𝑦0 (you can tell this from the shape of the 95% probability coverage shaded areas)
Notice tell-tale signs of these probability coverage shaded areas
• the purple one for the martingale component $m_t$ grows with $\sqrt{t}$
• the green one for the stationary component $s_t$ converges to a constant band
$$
\frac{M_t}{M_0} = \exp(t\nu) \exp\left( \sum_{j=1}^{t} H \cdot Z_j \right) \exp\left( D'(I - A)^{-1} x_0 - D'(I - A)^{-1} x_t \right)
$$
or

$$
\frac{M_t}{M_0} = \exp(\tilde\nu t) \left( \frac{\tilde M_t}{\tilde M_0} \right) \left( \frac{\tilde e(X_0)}{\tilde e(x_t)} \right)
$$
where
$$
\tilde\nu = \nu + \frac{H \cdot H}{2}, \qquad
\tilde M_t = \exp\left( \sum_{j=1}^{t} \left( H \cdot z_j - \frac{H \cdot H}{2} \right) \right), \qquad
\tilde M_0 = 1
$$
and
$$
\tilde e(x) = \exp[g(x)] = \exp[D'(I - A)^{-1} x]
$$
In [5]: amf.plot_multiplicative(T)
plt.show()
As before, when we plotted multiple realizations of a component in the 2nd, 3rd, and 4th panels, we also plotted population 95% confidence bands computed using the LinearStateSpace class
Comparing this figure and the last also helps show how geometric growth differs from arith-
metic growth
The top right panel of the above graph shows a panel of martingales associated with the
panel of 𝑀𝑡 = exp(𝑦𝑡 ) that we have generated for a limited horizon 𝑇
It is interesting to see how the martingale behaves as $T \to +\infty$
Let’s see what happens when we set 𝑇 = 12000 instead of 150
Hansen and Sargent [58] (ch. 8) note that the martingale component $\tilde M_t$ of the multiplicative decomposition has a peculiar property: while $E \tilde M_t = 1$ for each $t$, $\tilde M_t$ converges to zero almost surely as $t \to +\infty$
In [6]: np.random.seed(10021987)
amf.plot_martingales(12000)
plt.show()
The dotted line in the above graph is the mean 𝐸 𝑀̃ 𝑡 = 1 of the martingale
It remains constant at unity, illustrating the first property
The purple 95 percent coverage interval collapses around zero, illustrating the second property
Let's drill down and study the probability distribution of the multiplicative martingale $\{\tilde M_t\}_{t=0}^{\infty}$ in more detail
As we have seen, it has representation
$$
\tilde M_t = \exp\left( \sum_{j=1}^{t} \left( H \cdot z_j - \frac{H \cdot H}{2} \right) \right), \qquad \tilde M_0 = 1
$$
where $H = [F + B'(I - A')^{-1} D]$
It follows that $\log \tilde M_t \sim \mathcal{N}\left( -\frac{t H \cdot H}{2}, \, t H \cdot H \right)$ and that consequently $\tilde M_t$ is log normal
In particular, we want to simulate 5000 sample paths of length 𝑇 for the case in which 𝑥 is a
scalar and [𝐴, 𝐵, 𝐷, 𝐹 ] = [0.8, 0.001, 1.0, 0.01] and 𝜈 = 0.005
After accomplishing this, we want to display and stare at histograms of 𝑀̃ 𝑇𝑖 for various values
of 𝑇
Here is code that accomplishes these tasks
We’ll do this by formulating the additive functional as a linear state space model and putting
the LinearStateSpace class to work
In [7]: """
"""
import numpy as np
import scipy as sp
import scipy.linalg as la
import quantecon as qe
import matplotlib.pyplot as plt
from scipy.stats import lognorm
class AMF_LSS_VAR:
"""
This class is written to transform a scalar additive functional
into a linear state space system.
"""
def __init__(self, A, B, D, F=0.0, ν=0.0):
# Unpack required elements
self.A, self.B, self.D, self.F, self.ν = A, B, D, F, ν
def construct_ss(self):
"""
This creates the state space representation that can be passed
into the quantecon LSS class.
"""
# Pull out useful info
A, B, D, F, ν = self.A, self.B, self.D, self.F, self.ν
nx, nk, nm = 1, 1, 1
if self.add_decomp:
ν, H, g = self.add_decomp
else:
ν, H, g = self.additive_decomp()
return lss
def additive_decomp(self):
"""
Return values for the martingale decomposition (Proposition 4.3.3.)
- ν : unconditional mean difference in Y
- H : coefficient for the (linear) martingale component (kappa_a)
- g : coefficient for the stationary component g(x)
- Y_0 : it should be the function of X_0 (for now set it to 0.0)
"""
A_res = 1 / (1 - self.A)
g = self.D * A_res
H = self.F + self.D * A_res * self.B
return self.ν, H, g
def multiplicative_decomp(self):
"""
Return values for the multiplicative decomposition (Example 5.4.4.)
- ν_tilde : eigenvalue
- H : vector for the Jensen term
"""
ν, H, g = self.additive_decomp()
ν_tilde = ν + (.5) * H**2
return ν_tilde, H, g
return llh[-1]
return x, y
# Allocate space
storeX = np.empty((I, T))
storeY = np.empty((I, T))
for i in range(I):
# Do specific simulation
x, y = simulate_xy(amf, T)
Now that we have these functions in our toolkit, let's apply them to run some simulations
# Allocate space
add_mart_comp = np.empty((I, T))
# Build model
amf_2 = AMF_LSS_VAR(0.8, 0.001, 1.0, 0.01,.005)
This is peculiar, so make sure you are careful in working with the log normal distribution
Here is some code that tackles these tasks
# The distribution
mdist = lognorm(np.sqrt(t * H2), scale=np.exp(-t * H2 / 2))
x = np.linspace(xmin, xmax, npts)
pdf = mdist.pdf(x)
return x, pdf
# The distribution
lmdist = norm(-t * H2 / 2, np.sqrt(t * H2))
x = np.linspace(xmin, xmax, npts)
pdf = lmdist.pdf(x)
return x, pdf
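One concrete thing to check is SciPy's lognormal parameterization, a common source of slips; the sketch below uses hypothetical values of $t$ and $H \cdot H$ and confirms the martingale mean of one

```python
import numpy as np
from scipy.stats import lognorm

# If log M̃_t ~ N(μ, s²) with μ = -t H·H / 2 and s² = t H·H, then in SciPy's
# parameterization M̃_t is lognorm(s, scale=np.exp(μ))
t, HH = 100, 0.01**2            # hypothetical t and H·H
s, μ = np.sqrt(t * HH), -t * HH / 2
mdist = lognorm(s, scale=np.exp(μ))

print(mdist.mean())    # exp(μ + s²/2) = 1: the martingale property
print(mdist.median())  # exp(μ) < 1: the distribution drifts toward zero
```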
plt.tight_layout()
plt.show()
These probability density functions help us understand mechanics underlying the peculiar
property of our multiplicative martingale
70

Classical Control with Linear Algebra

70.1 Contents
• Overview 70.2
• Implementation 70.7
• Exercises 70.8
70.2 Overview
In this lecture and a companion lecture Classical Filtering with Linear Algebra, we study the
classical theory of linear-quadratic (LQ) optimal control problems.
The classical approach does not use the two closely related methods – dynamic programming
and Kalman filtering – that we describe in other lectures, namely, Linear Quadratic Dynamic
Programming Problems and A First Look at the Kalman Filter
Instead, they use either
In this lecture and the sequel Classical Filtering with Linear Algebra, we mostly rely on ele-
mentary linear algebra
The main tool from linear algebra we’ll put to work here is LU decomposition
We’ll begin with discrete horizon problems
Then we’ll view infinite horizon problems as appropriate limits of these finite horizon prob-
lems
Later, we will examine the close connection between LQ control and least-squares prediction
and filtering problems
These classes of problems are connected in the sense that to solve each, essentially the same
mathematics is used
70.2.1 References
Let 𝐿 be the lag operator, so that, for sequence {𝑥𝑡 } we have 𝐿𝑥𝑡 = 𝑥𝑡−1
More generally, let 𝐿𝑘 𝑥𝑡 = 𝑥𝑡−𝑘 with 𝐿0 𝑥𝑡 = 𝑥𝑡 and
𝑑(𝐿) = 𝑑0 + 𝑑1 𝐿 + … + 𝑑𝑚 𝐿𝑚
$$
\max_{\{y_t\}} \lim_{N \to \infty} \sum_{t=0}^{N} \beta^t \left\{ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} \left[ d(L) y_t \right]^2 \right\}, \tag{1}
$$
where
Maximization in Eq. (1) is subject to initial conditions for 𝑦−1 , 𝑦−2 … , 𝑦−𝑚
Maximization is over infinite sequences {𝑦𝑡 }𝑡≥0
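Applying a lag polynomial such as $d(L)$ to a sequence is a one-sided convolution; a small sketch with hypothetical coefficients and pre-sample values set to zero

```python
import numpy as np

d = np.array([1.0, -0.5])          # hypothetical d(L) = 1 - 0.5 L
y = np.array([1.0, 2.0, 3.0, 4.0])

# (d(L)y)_t = Σ_j d_j y_{t-j}; np.convolve computes exactly this sum, and
# truncating to len(y) keeps t = 0, ..., 3 with y_{-1} treated as zero
dy = np.convolve(y, d)[:len(y)]
print(dy)   # [1.  1.5 2.  2.5]
```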
70.3.1 Example
The formulation of the LQ problem given above is broad enough to encompass many useful
models
As a simple illustration, recall that in LQ Dynamic Programming Problems we consider a
monopolist facing stochastic demand shocks and adjustment costs
Let’s consider a deterministic version of this problem, where the monopolist maximizes the
discounted sum
$$
\sum_{t=0}^{\infty} \beta^t \pi_t
$$
and
• 𝑎𝑡 ∶= 𝛼0 + 𝑑𝑡 − 𝑐
• ℎ ∶= 2𝛼1
• $d(L) := \sqrt{2\gamma}\,(I - L)$
Further examples of this problem for factor demand, economic growth, and government policy
problems are given in ch. IX of [118]
70.4 Finite Horizon Theory

1. fixing 𝑁 > 𝑚,
2. differentiating the finite version of Eq. (1) with respect to 𝑦0 , 𝑦1 , … , 𝑦𝑁 , and
3. setting these derivatives to zero
$$
J = \sum_{t=0}^{N} \beta^t \left[ d(L) y_t \right]\left[ d(L) y_t \right]
= \sum_{t=0}^{N} \beta^t \left( d_0 y_t + d_1 y_{t-1} + \cdots + d_m y_{t-m} \right)\left( d_0 y_t + d_1 y_{t-1} + \cdots + d_m y_{t-m} \right)
$$
$$
\frac{\partial J}{\partial y_t}
= 2\beta^t d_0\, d(L) y_t + 2\beta^{t+1} d_1\, d(L) y_{t+1} + \cdots + 2\beta^{t+m} d_m\, d(L) y_{t+m}
= 2\beta^t \left( d_0 + d_1 \beta L^{-1} + d_2 \beta^2 L^{-2} + \cdots + d_m \beta^m L^{-m} \right) d(L) y_t
$$

or, more compactly,

$$
\frac{\partial J}{\partial y_t} = 2\beta^t\, d(\beta L^{-1})\, d(L)\, y_t \tag{2}
$$
$$
\begin{aligned}
\frac{\partial J}{\partial y_N} &= 2\beta^N d_0\, d(L) y_N \\
\frac{\partial J}{\partial y_{N-1}} &= 2\beta^{N-1} \left[ d_0 + \beta d_1 L^{-1} \right] d(L) y_{N-1} \\
&\;\;\vdots \\
\frac{\partial J}{\partial y_{N-m+1}} &= 2\beta^{N-m+1} \left[ d_0 + \beta L^{-1} d_1 + \cdots + \beta^{m-1} L^{-m+1} d_{m-1} \right] d(L) y_{N-m+1}
\end{aligned} \tag{3}
$$
With these preliminaries under our belts, we are ready to differentiate Eq. (1)

Differentiating Eq. (1) with respect to $y_t$ for $t = 0, \ldots, N - m$ gives the Euler equations

$$
\left[ h + d(\beta L^{-1})\, d(L) \right] y_t = a_t, \qquad t = 0, 1, \ldots, N - m \tag{4}
$$
The system of equations Eq. (4) forms a 2 × 𝑚 order linear difference equation that must hold
for the values of 𝑡 indicated.
Differentiating Eq. (1) with respect to 𝑦𝑡 for 𝑡 = 𝑁 − 𝑚 + 1, … , 𝑁 gives the terminal condi-
tions
These conditions uniquely pin down the solution of the finite 𝑁 problem
That is, for the finite 𝑁 problem, conditions Eq. (4) and Eq. (5) are necessary and sufficient
for a maximum, by concavity of the objective function
Next, we describe how to obtain the solution using matrix methods
Let’s look at how linear algebra can be used to tackle and shed light on the finite horizon LQ
control problem
A Single Lag Term
Let’s begin with the special case in which 𝑚 = 1
We want to solve the system of 𝑁 + 1 linear equations
$$
\begin{aligned}
\left[ h + d(\beta L^{-1})\, d(L) \right] y_t &= a_t, \qquad t = 0, 1, \ldots, N-1 \\
\beta^N \left[ a_N - h y_N - d_0\, d(L) y_N \right] &= 0
\end{aligned} \tag{6}
$$
where 𝑑(𝐿) = 𝑑0 + 𝑑1 𝐿
These equations are to be solved for 𝑦0 , 𝑦1 , … , 𝑦𝑁 as functions of 𝑎0 , 𝑎1 , … , 𝑎𝑁 and 𝑦−1
Let $\phi_0 := h + d_0^2 + \beta d_1^2$ and $\phi_1 := d_0 d_1$, so that $h + d(\beta L^{-1})\, d(L) = \phi_0 + \phi_1 L + \beta \phi_1 L^{-1}$

Then Eq. (6) can be expressed as

$$
\begin{bmatrix}
(\phi_0 - d_1^2) & \phi_1 & 0 & \cdots & \cdots & 0 \\
\beta\phi_1 & \phi_0 & \phi_1 & 0 & \cdots & 0 \\
0 & \beta\phi_1 & \phi_0 & \phi_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & \cdots & \beta\phi_1 & \phi_0 & \phi_1 \\
0 & \cdots & \cdots & 0 & \beta\phi_1 & \phi_0
\end{bmatrix}
\begin{bmatrix} y_N \\ y_{N-1} \\ y_{N-2} \\ \vdots \\ y_1 \\ y_0 \end{bmatrix}
=
\begin{bmatrix} a_N \\ a_{N-1} \\ a_{N-2} \\ \vdots \\ a_1 \\ a_0 - \phi_1 y_{-1} \end{bmatrix} \tag{7}
$$
or
$$
W \bar y = \bar a \tag{8}
$$
1. The first element differs from the remaining diagonal elements, reflecting the terminal
condition
2. The sub-diagonal elements equal $\beta$ times the super-diagonal elements
$$
\bar y = W^{-1} \bar a \tag{9}
$$
The factorization can be normalized so that the diagonal elements of 𝑈 are unity
Using the LU representation in Eq. (9), we obtain
$$
U \bar y = L^{-1} \bar a \tag{10}
$$
Because there are zeros everywhere in the matrix on the left of Eq. (7) except on the diagonal, super-diagonal, and sub-diagonal, the $LU$ decomposition takes the form
$$
\begin{bmatrix}
1 & U_{12} & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & U_{23} & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & U_{34} & \cdots & 0 & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & U_{N, N+1} \\
0 & 0 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}
\begin{bmatrix} y_N \\ y_{N-1} \\ y_{N-2} \\ y_{N-3} \\ \vdots \\ y_1 \\ y_0 \end{bmatrix}
=
\begin{bmatrix}
L^{-1}_{11} & 0 & 0 & \cdots & 0 \\
L^{-1}_{21} & L^{-1}_{22} & 0 & \cdots & 0 \\
L^{-1}_{31} & L^{-1}_{32} & L^{-1}_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
L^{-1}_{N,1} & L^{-1}_{N,2} & L^{-1}_{N,3} & \cdots & 0 \\
L^{-1}_{N+1,1} & L^{-1}_{N+1,2} & L^{-1}_{N+1,3} & \cdots & L^{-1}_{N+1, N+1}
\end{bmatrix}
\begin{bmatrix} a_N \\ a_{N-1} \\ a_{N-2} \\ \vdots \\ a_1 \\ a_0 - \phi_1 y_{-1} \end{bmatrix}
$$

where $L^{-1}_{ij}$ is the $(i, j)$ element of $L^{-1}$ and $U_{ij}$ is the $(i, j)$ element of $U$
Note how the left side for a given 𝑡 involves 𝑦𝑡 and one lagged value 𝑦𝑡−1 while the right side
involves all future values of the forcing process 𝑎𝑡 , 𝑎𝑡+1 , … , 𝑎𝑁
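For the special case $m = 1$ with $\beta = 1$, the system Eq. (7) is easy to build and solve directly; below is a sketch with hypothetical parameter values

```python
import numpy as np

# Hypothetical objective parameters, with d(L) = d0 + d1 L and β = 1
h, d0, d1 = 1.0, 1.0, -0.5
N = 20
a = np.ones(N + 1)               # forcing sequence a_0, ..., a_N
y_m1 = 0.0                       # initial condition y_{-1}

phi0 = h + d0**2 + d1**2         # interior diagonal entry (β = 1)
phi1 = d0 * d1                   # off-diagonal entry

# Rows are ordered y_N, y_{N-1}, ..., y_0 as in Eq. (7)
W = (np.diag(phi0 * np.ones(N + 1))
     + np.diag(phi1 * np.ones(N), 1)
     + np.diag(phi1 * np.ones(N), -1))
W[0, 0] = phi0 - d1**2           # terminal condition alters the first row

a_bar = a[::-1].copy()
a_bar[-1] -= phi1 * y_m1         # last row absorbs the initial condition
y = np.linalg.solve(W, a_bar)[::-1]   # recover y_0, ..., y_N

# Interior rows reproduce the Euler equation φ1 y_{t+1} + φ0 y_t + φ1 y_{t-1} = a_t
t = 5
print(phi1 * y[t + 1] + phi0 * y[t] + phi1 * y[t - 1] - a[t])   # ≈ 0
```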
Additional Lag Terms
We briefly indicate how this approach extends to the problem with 𝑚 > 1
Assume that 𝛽 = 1 and let 𝐷𝑚+1 be the (𝑚 + 1) × (𝑚 + 1) symmetric matrix whose elements
are determined from the following formula:
$$
(D_{m+1} + h I_{m+1})
\begin{bmatrix} y_N \\ y_{N-1} \\ \vdots \\ y_{N-m} \end{bmatrix}
=
\begin{bmatrix} a_N \\ a_{N-1} \\ \vdots \\ a_{N-m} \end{bmatrix}
+ M
\begin{bmatrix} y_{N-m-1} \\ y_{N-m-2} \\ \vdots \\ y_{N-2m} \end{bmatrix}
$$
where 𝑀 is (𝑚 + 1) × 𝑚 and
$$
U \bar y = L^{-1} \bar a \tag{11}
$$

or

$$
\sum_{j=0}^{t} U_{-t+N+1,\, -t+N+j+1}\, y_{t-j}
= \sum_{j=0}^{N-t} L^{-1}_{-t+N+1,\, -t+N+1-j}\, \bar a_{t+j},
\qquad t = 0, 1, \ldots, N
$$

where $L^{-1}_{t,s}$ is the element in the $(t, s)$ position of $L^{-1}$, and similarly for $U$
The left side of equation Eq. (11) is the “feedback” part of the optimal control law for 𝑦𝑡 ,
while the right-hand side is the “feedforward” part
We note that there is a different control law for each 𝑡
Thus, in the finite horizon case, the optimal control law is time-dependent
It is natural to suspect that as 𝑁 → ∞, Eq. (11) becomes equivalent to the solution of our
infinite horizon problem, which below we shall show can be expressed as
so that as $N \to \infty$ we expect that for each fixed $t$, $U_{t, t-j} \to c_j$ and $L^{-1}_{t, t+j}$ approaches the coefficient on $L^{-j}$ in the expansion of $c(\beta L^{-1})^{-1}$
This suspicion is true under general conditions that we shall study later
For now, we note that by creating the matrix 𝑊 for large 𝑁 and factoring it into the 𝐿𝑈
form, good approximations to 𝑐(𝐿) and 𝑐(𝛽𝐿−1 )−1 can be obtained
70.5 The Infinite Horizon Limit

For the infinite horizon problem, we propose to discover first-order necessary conditions by taking the limits of Eq. (4) and Eq. (5) as $N \to \infty$
This approach is valid, and the limits of Eq. (4) and Eq. (5) as 𝑁 approaches infinity are
first-order necessary conditions for a maximum
However, for the infinite horizon problem with 𝛽 < 1, the limits of Eq. (4) and Eq. (5) are, in
general, not sufficient for a maximum
That is, the limits of Eq. (5) do not provide enough information uniquely to determine the
solution of the Euler equation Eq. (4) that maximizes Eq. (1)
As we shall see below, a side condition on the path of $y_t$ that together with Eq. (4) is sufficient for an optimum is

$$
\sum_{t=0}^{\infty} \beta^t h y_t^2 < \infty \tag{12}
$$
All paths that satisfy the Euler equations, except the one that we shall select below, violate
this condition and, therefore, evidently lead to (much) lower values of Eq. (1) than does the
optimal path selected by the solution procedure below
Consider the characteristic equation associated with the Euler equation
where 𝑧0 is a constant
In Eq. (14), we substitute $(z - z_j) = -z_j (1 - \frac{1}{z_j} z)$ and $(z - \beta z_j^{-1}) = z (1 - \frac{\beta}{z_j} z^{-1})$ for $j = 1, \ldots, m$ to get

$$
h + d(\beta z^{-1})\, d(z)
= (-1)^m (z_0 z_1 \cdots z_m) \left( 1 - \tfrac{1}{z_1} z \right) \cdots \left( 1 - \tfrac{1}{z_m} z \right) \left( 1 - \tfrac{\beta}{z_1} z^{-1} \right) \cdots \left( 1 - \tfrac{\beta}{z_m} z^{-1} \right)
$$
Now define $c(z) = \sum_{j=0}^{m} c_j z^j$ as

$$
c(z) = \left[ (-1)^m z_0 z_1 \cdots z_m \right]^{1/2} \left( 1 - \frac{z}{z_1} \right) \left( 1 - \frac{z}{z_2} \right) \cdots \left( 1 - \frac{z}{z_m} \right) \tag{15}
$$
$$
c(z) = c_0 (1 - \lambda_1 z) \cdots (1 - \lambda_m z) \tag{17}
$$

where

$$
c_0 = \left[ (-1)^m z_0 z_1 \cdots z_m \right]^{1/2}; \qquad \lambda_j = \frac{1}{z_j}, \quad j = 1, \ldots, m
$$
Since $|z_j| > \sqrt{\beta}$ for $j = 1, \ldots, m$, it follows that $|\lambda_j| < 1/\sqrt{\beta}$ for $j = 1, \ldots, m$
Using Eq. (17), we can express the factorization Eq. (16) as
In sum, we have constructed a factorization Eq. (16) of the characteristic polynomial for the
Euler equation in which the zeros of 𝑐(𝑧) exceed 𝛽 1/2 in modulus, and the zeros of 𝑐 (𝛽𝑧 −1 )
are less than 𝛽 1/2 in modulus
Using Eq. (16), we now write the Euler equation as
𝑐(𝛽𝐿−1 ) 𝑐 (𝐿) 𝑦𝑡 = 𝑎𝑡
The unique solution of the Euler equation that satisfies condition Eq. (12) is
$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L)\, y_t = \frac{c_0^{-2}\, a_t}{(1 - \beta\lambda_1 L^{-1}) \cdots (1 - \beta\lambda_m L^{-1})} \tag{19}
$$
Using partial fractions, we can write the characteristic polynomial on the right side of Eq. (19) as

$$
\sum_{j=1}^{m} \frac{A_j}{1 - \lambda_j \beta L^{-1}}
\qquad \text{where} \quad
A_j := \frac{c_0^{-2}}{\prod_{i \neq j} \left( 1 - \frac{\lambda_i}{\lambda_j} \right)}
$$
$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L)\, y_t = \sum_{j=1}^{m} \frac{A_j}{1 - \lambda_j \beta L^{-1}}\, a_t
$$
or
$$
(1 - \lambda_1 L) \cdots (1 - \lambda_m L)\, y_t = \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} (\lambda_j \beta)^k a_{t+k} \tag{20}
$$
Equation Eq. (20) expresses the optimum sequence for 𝑦𝑡 in terms of 𝑚 lagged 𝑦’s, and 𝑚
weighted infinite geometric sums of future 𝑎𝑡 ’s
Furthermore, Eq. (20) is the unique solution of the Euler equation that satisfies the initial
conditions and condition Eq. (12)
In effect, condition Eq. (12) compels us to solve the “unstable” roots of ℎ + 𝑑(𝛽𝑧 −1 )𝑑(𝑧) for-
ward (see [118])
The step of factoring the polynomial $h + d(\beta z^{-1})\, d(z)$ into $c(\beta z^{-1})\, c(z)$, where the zeros of $c(z)$ all have modulus exceeding $\sqrt{\beta}$, is central to solving the problem
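To see the root pairing behind this factorization concretely, here is a small numerical sketch (with illustrative values $h = 1$, $d(L) = 1 - 0.5L$, and $\beta = 1$, so the $2m$ roots come in $(z_j,\, 1/z_j)$ pairs):

```python
import numpy as np

h, β = 1.0, 1.0
d = np.array([1.0, -0.5])       # d(z) = 1 - 0.5 z  (illustrative coefficients)
m = len(d) - 1

# Coefficients of z^m, ..., z^{-m} in h + d(βz^{-1}) d(z)
ϕ = np.zeros(2 * m + 1)
for i in range(-m, m + 1):
    ϕ[m - i] = np.sum(np.diag(np.outer(d, d), k=-i))
ϕ[m] += h

# Multiplying by z^m gives an ordinary polynomial whose 2m roots
# come in pairs (z_j, β/z_j): one inside the unit circle, one outside
roots = np.roots(ϕ)
print(sorted(abs(roots)))       # one modulus below 1, its reciprocal above 1
```

With $\beta = 1$ the product of each pair is $1$, which is what the solution procedure exploits when it keeps only the roots outside the unit circle.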
We note two features of the solution Eq. (20)
- Since $|\lambda_j| < 1/\sqrt{\beta}$ for all $j$, it follows that $|\lambda_j \beta| < \sqrt{\beta}$
- The assumption that $\{a_t\}$ is of exponential order less than $1/\sqrt{\beta}$ is sufficient to guarantee that the geometric sums of future $a_t$'s on the right side of Eq. (20) converge
We immediately see that those sums will converge under the weaker condition that {𝑎𝑡 } is of
exponential order less than 𝜙−1 where 𝜙 = max {𝛽𝜆𝑖 , 𝑖 = 1, … , 𝑚}
Note that with 𝑎𝑡 identically zero, Eq. (20) implies that in general |𝑦𝑡 | eventually grows expo-
nentially at a rate given by max𝑖 |𝜆𝑖 |
The condition $\max_i |\lambda_i| < 1/\sqrt{\beta}$ guarantees that condition Eq. (12) is satisfied

In fact, $\max_i |\lambda_i| < 1/\sqrt{\beta}$ is a necessary condition for Eq. (12) to hold
Were Eq. (12) not satisfied, the objective function would diverge to −∞, implying that the 𝑦𝑡
path could not be optimal
For example, with $a_t = 0$ for all $t$, it is easy to describe a naive (nonoptimal) policy for $\{y_t, t \geq 0\}$ that gives a finite value of Eq. (1)
We can simply let 𝑦𝑡 = 0 for 𝑡 ≥ 0
This policy involves at most 𝑚 nonzero values of ℎ𝑦𝑡2 and [𝑑(𝐿)𝑦𝑡 ]2 , and so yields a finite
value of Eq. (1)
Therefore it is easy to dominate a path that violates Eq. (12)
70.6 Undiscounted Problems

It is worthwhile focusing on a special case of the LQ problems above: the undiscounted problem that emerges when $\beta = 1$
In this case, the Euler equation is
(ℎ + 𝑑(𝐿−1 )𝑑(𝐿)) 𝑦𝑡 = 𝑎𝑡
The factorization of the characteristic polynomial then becomes

$$\left(h + d(z^{-1})\, d(z)\right) = c(z^{-1})\, c(z)$$

where

$$\begin{aligned}
c(z) &= c_0 (1 - \lambda_1 z) \ldots (1 - \lambda_m z) \\
c_0 &= \left[(-1)^m z_0 z_1 \cdots z_m\right]^{1/2} \\
|\lambda_j| &< 1 \text{ for } j = 1, \ldots, m \\
\lambda_j &= \frac{1}{z_j} \text{ for } j = 1, \ldots, m \\
z_0 &= \text{constant}
\end{aligned}$$
The solution of the problem becomes

$$(1 - \lambda_1 L) \cdots (1 - \lambda_m L)\, y_t = \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} \lambda_j^k\, a_{t+k}$$
Discounted problems can always be converted into undiscounted problems via a simple trans-
formation
Consider problem Eq. (1) with 0 < 𝛽 < 1
Define the transformed variables

$$\tilde{a}_t = a_t \beta^{t/2}, \quad \tilde{y}_t = y_t \beta^{t/2} \tag{21}$$

Then notice that $\beta^t [d(L) y_t]^2 = [\tilde{d}(L)\tilde{y}_t]^2$ with $\tilde{d}(L) = \sum_{j=0}^{m} \tilde{d}_j L^j$ and $\tilde{d}_j = \beta^{j/2} d_j$
Then the original criterion function Eq. (1) is equivalent to
$$\lim_{N \to \infty} \sum_{t=0}^{N} \left\{ \tilde{a}_t \tilde{y}_t - \frac{1}{2} h \tilde{y}_t^2 - \frac{1}{2} \left[\tilde{d}(L)\, \tilde{y}_t\right]^2 \right\} \tag{22}$$
The solution of this transformed problem is

$$(1 - \tilde{\lambda}_1 L) \cdots (1 - \tilde{\lambda}_m L)\, \tilde{y}_t = \sum_{j=1}^{m} \tilde{A}_j \sum_{k=0}^{\infty} \tilde{\lambda}_j^k\, \tilde{a}_{t+k}$$
or
$$\tilde{y}_t = \tilde{f}_1 \tilde{y}_{t-1} + \cdots + \tilde{f}_m \tilde{y}_{t-m} + \sum_{j=1}^{m} \tilde{A}_j \sum_{k=0}^{\infty} \tilde{\lambda}_j^k\, \tilde{a}_{t+k}, \tag{23}$$

where

$$\left[(-1)^m \tilde{z}_0 \tilde{z}_1 \cdots \tilde{z}_m\right]^{1/2} (1 - \tilde{\lambda}_1 z) \ldots (1 - \tilde{\lambda}_m z) = \tilde{c}(z), \quad \text{where } |\tilde{\lambda}_j| < 1$$
We leave it to the reader to show that Eq. (23) implies the equivalent form of the solution
$$y_t = f_1 y_{t-1} + \cdots + f_m y_{t-m} + \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} (\lambda_j \beta)^k\, a_{t+k}$$

where

$$f_j = \tilde{f}_j \beta^{-j/2}, \quad A_j = \tilde{A}_j, \quad \lambda_j = \tilde{\lambda}_j \beta^{-1/2} \tag{24}$$
The transformations Eq. (21) and the inverse formulas Eq. (24) allow us to solve a discounted
problem by first solving a related undiscounted problem
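The equivalence $\beta^t [d(L) y_t]^2 = [\tilde{d}(L)\tilde{y}_t]^2$ that underlies this conversion is easy to verify numerically; a minimal sketch with arbitrary illustrative coefficients and an arbitrary path $\{y_t\}$:

```python
import numpy as np

β, m = 0.9, 2
d = np.array([1.0, -0.7, 0.3])            # illustrative d_0, ..., d_m
d_tilde = β**(np.arange(m + 1) / 2) * d   # d̃_j = β^{j/2} d_j

rng = np.random.default_rng(0)
y = rng.standard_normal(20)               # an arbitrary path {y_t}
y_tilde = β**(np.arange(20) / 2) * y      # ỹ_t = β^{t/2} y_t

t = 10
dLy = sum(d[j] * y[t - j] for j in range(m + 1))                    # d(L) y_t
dLy_tilde = sum(d_tilde[j] * y_tilde[t - j] for j in range(m + 1))  # d̃(L) ỹ_t

print(np.isclose(β**t * dLy**2, dLy_tilde**2))   # True
```

Since the transformation also maps $\tilde{a}_t \tilde{y}_t = \beta^t a_t y_t$ and $\tilde{y}_t^2 = \beta^t y_t^2$, every term of the discounted criterion is reproduced in the undiscounted one.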
70.7 Implementation
Code that computes solutions to the LQ problem using the methods described above can be
found in file control_and_filter.py
Here’s how it looks
In [1]: """
"""
import numpy as np
import scipy.stats as spst
import scipy.linalg as la
class LQFilter:

def __init__(self, d, h, y_m, r=None, h_eps=None, β=None):
"""
Parameters
----------
d : list or numpy.array (1-D or a 2-D column vector)
The order of the coefficients: [d_0, d_1, ..., d_m]
h : scalar
Parameter of the objective function (corresponding to the
quadratic term)
y_m : list or numpy.array (1-D or a 2-D column vector)
Initial conditions for y
r : list or numpy.array (1-D or a 2-D column vector)
The order of the coefficients: [r_0, r_1, ..., r_k]
(optional, if not defined -> deterministic problem)
β : scalar
Discount factor (optional, default value is one)
"""
self.h = h
self.d = np.asarray(d)
self.m = self.d.shape[0] - 1
self.y_m = np.asarray(y_m)
if self.m == self.y_m.shape[0]:
self.y_m = self.y_m.reshape(self.m, 1)
else:
raise ValueError(f"y_m must be of length m = {self.m:d}")
#---------------------------------------------
# Define the coefficients of ϕ upfront
#---------------------------------------------
ϕ = np.zeros(2 * self.m + 1)
for i in range(- self.m, self.m + 1):
ϕ[self.m - i] = np.sum(np.diag(self.d.reshape(self.m + 1, 1) @ \
self.d.reshape(1, self.m + 1), k=-i))
ϕ[self.m] = ϕ[self.m] + self.h
self.ϕ = ϕ
#-----------------------------------------------------
# If r is given calculate the vector ϕ_r
#-----------------------------------------------------
if r is None:
pass
else:
self.r = np.asarray(r)
self.k = self.r.shape[0] - 1
ϕ_r = np.zeros(2 * self.k + 1)
for i in range(- self.k, self.k + 1):
ϕ_r[self.k - i] = np.sum(np.diag(self.r.reshape(self.k + 1, 1) @ \
self.r.reshape(1, self.k + 1), k=-i))
if h_eps is None:
self.ϕ_r = ϕ_r
else:
ϕ_r[self.k] = ϕ_r[self.k] + h_eps
self.ϕ_r = ϕ_r
#-----------------------------------------------------
# If β is given, define the transformed variables
#-----------------------------------------------------
if β is None:
self.β = 1
else:
self.β = β
self.d = self.β**(np.arange(self.m + 1)/2) * self.d
self.y_m = self.y_m * (self.β**(- np.arange(1, self.m + 1)/2)).reshape(self.m, 1)
def construct_W_and_Wm(self, N):
"""
This constructs the matrices W and W_m for a given number of periods N
"""
m = self.m
d = self.d
W = np.zeros((N + 1, N + 1))
W_m = np.zeros((N + 1, m))
#---------------------------------------
# Terminal conditions
#---------------------------------------
for j in range(m):
for i in range(j + 1, m + 1):
M[i, j] = D_m1[i - j - 1, m]
#----------------------------------------------
# Euler equations for t = 0, 1, ..., N-(m+1)
#----------------------------------------------
ϕ = self.ϕ
for i in range(m):
W_m[N - i, :(m - i)] = ϕ[(m + 1 + i):]
return W, W_m
def roots_of_characteristic(self):
"""
This function calculates z_0 and the 2m roots of the characteristic equation
associated with the Euler equation (1.7)
Note:
------
numpy.poly1d(roots, True) defines a polynomial using its roots that can be
evaluated at any point. If x_1, x_2, ... , x_m are the roots then
p(x) = (x - x_1)(x - x_2)...(x - x_m)
"""
m = self.m
ϕ = self.ϕ
# Calculate the roots of the 2m-polynomial
roots = np.roots(ϕ)
# Sort the roots according to their length (in descending order)
roots_sorted = roots[np.argsort(abs(roots))[::-1]]
z_0 = ϕ.sum() / np.poly1d(roots, True)(1)
z_1_to_m = roots_sorted[:m]     # we need only the roots outside the unit circle
λ = 1 / z_1_to_m
return z_1_to_m, z_0, λ
def coeffs_of_c(self):
'''
This function computes the coefficients {c_j, j = 0, 1, ..., m} for
c(z) = sum_{j = 0}^{m} c_j z^j
'''
z_1_to_m, z_0 = self.roots_of_characteristic()[:2]
c_0 = (z_0 * np.prod(z_1_to_m).real * (-1)**self.m)**(1 / 2)
c_coeffs = np.poly1d(z_1_to_m, True).c * z_0 / c_0
return c_coeffs[::-1]
def solution(self):
"""
This function calculates {λ_j, j=1,...,m} and {A_j, j=1,...,m}
of the expression (1.15)
"""
λ = self.roots_of_characteristic()[2]
c_0 = self.coeffs_of_c()[-1]
A = np.zeros(self.m, dtype=complex)
for j in range(self.m):
denom = 1 - λ/λ[j]
A[j] = c_0**(-2) / np.prod(denom[np.arange(self.m) != j])
return λ, A
def construct_V(self, N):
'''
This constructs the covariance matrix of the process for a given period N
'''
V = np.zeros((N, N))
ϕ_r = self.ϕ_r
for i in range(N):
for j in range(N):
if abs(i-j) <= self.k:
V[i, j] = ϕ_r[self.k + abs(i-j)]
return V
return d.rvs()
N = np.asarray(a_hist).shape[0] - 1
a_hist = np.asarray(a_hist).reshape(N + 1, 1)
V = self.construct_V(N + 1)
return Ea_hist
Note:
------
scipy.linalg.lu normalizes L, U so that L has unit diagonal elements
To make things consistent with the lecture, we need an auxiliary diagonal
matrix D which renormalizes L and U
"""
N = np.asarray(a_hist).shape[0] - 1
W, W_m = self.construct_W_and_Wm(N)
L, U = la.lu(W, permute_l=True)
D = np.diag(1 / np.diag(U))
U = D @ U
L = L @ np.diag(1 / np.diag(D))
J = np.fliplr(np.eye(N + 1))
a_hist = J @ np.asarray(a_hist).reshape(N + 1, 1)
#--------------------------------------------
# Transform the 'a' sequence if β is given
#--------------------------------------------
if self.β != 1:
a_hist = a_hist * (self.β**(np.arange(N + 1) / 2))[::-1].reshape(N + 1, 1)
#--------------------------------------------
# Transform the optimal sequence back if β is given
#--------------------------------------------
if self.β != 1:
y_hist = y_hist * (self.β**(- np.arange(-self.m, N + 1)/2)).reshape(N + 1 + self.m, 1)
70.7.1 Example
d = γ * np.asarray([1, -1])
y_m = np.asarray(y_m).reshape(m, 1)
plt.show()
plot_simulation()
In [3]: plot_simulation(γ=5)
And here’s 𝛾 = 10
In [4]: plot_simulation(γ=10)
70.8 Exercises
70.8.1 Exercise 1
$$(1 - \tilde{\lambda}_1 L) \cdots (1 - \tilde{\lambda}_m L)\, \tilde{y}_t = \sum_{j=1}^{m} \tilde{A}_j \sum_{k=0}^{\infty} \tilde{\lambda}_j^k\, \tilde{a}_{t+k}$$
or
$$\tilde{y}_t = \tilde{f}_1 \tilde{y}_{t-1} + \cdots + \tilde{f}_m \tilde{y}_{t-m} + \sum_{j=1}^{m} \tilde{A}_j \sum_{k=0}^{\infty} \tilde{\lambda}_j^k\, \tilde{a}_{t+k} \tag{25}$$
Here

- $h + \tilde{d}(z^{-1})\tilde{d}(z) = \tilde{c}(z^{-1})\tilde{c}(z)$
- $\tilde{c}(z) = \left[(-1)^m \tilde{z}_0 \tilde{z}_1 \cdots \tilde{z}_m\right]^{1/2} (1 - \tilde{\lambda}_1 z) \cdots (1 - \tilde{\lambda}_m z)$

where the $\tilde{z}_j$ are the zeros of $h + \tilde{d}(z^{-1})\tilde{d}(z)$
Prove that Eq. (25) implies that the solution for 𝑦𝑡 in feedback form is
$$y_t = f_1 y_{t-1} + \ldots + f_m y_{t-m} + \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} \beta^k \lambda_j^k\, a_{t+k}$$
70.8.2 Exercise 2
$$\sum_{t=0}^{2} \left\{ a_t y_t - \frac{1}{2} \left[(1 - 2L) y_t\right]^2 \right\}$$
70.8.3 Exercise 3
$$\lim_{N \to \infty} \sum_{t=0}^{N} -\frac{1}{2} \left[(1 - 2L) y_t\right]^2,$$
70.8.4 Exercise 4
$$\lim_{N \to \infty} \sum_{t=0}^{N} \left\{ (.0000001)\, y_t^2 - \frac{1}{2} \left[(1 - 2L) y_t\right]^2 \right\}$$
subject to 𝑦−1 given. Prove that the solution 𝑦𝑡 = 2𝑦𝑡−1 violates condition Eq. (12), and so is
not optimal
Prove that the optimal solution is approximately 𝑦𝑡 = .5𝑦𝑡−1
71

Classical Prediction and Filtering with Linear Algebra
71.1 Contents
• Overview 71.2
• Finite Dimensional Prediction 71.3
• Combined Finite Dimensional Control and Prediction 71.4
• Infinite Horizon Prediction and Filtering Problems 71.5
• Exercises 71.6
71.2 Overview
This is a sequel to the earlier lecture Classical Control with Linear Algebra
That lecture used linear algebra – in particular, the LU decomposition – to formulate and
solve a class of linear-quadratic optimal control problems
In this lecture, we’ll be using a closely related decomposition, the Cholesky decomposition, to
solve linear prediction and filtering problems
We exploit the useful fact that there is an intimate connection between two superficially different classes of problems:

- With every LQ control problem, there is implicitly affiliated a linear least squares prediction or filtering problem
- With every linear least squares prediction or filtering problem, there is implicitly affiliated an LQ control problem

The first class of problems involves no randomness, while the second is all about randomness

Nevertheless, essentially the same mathematics solves both types of problem

This connection, which is often termed “duality,” is present whether one uses “classical” or “recursive” solution procedures

In fact, we saw duality at work earlier when we formulated control and prediction problems recursively in lectures LQ dynamic programming problems, A first look at the Kalman filter, and The permanent income model
71.2.1 References
71.3 Finite Dimensional Prediction

Let $(x_1, x_2, \ldots, x_T)' = x$ be a $T \times 1$ vector of random variables with mean $\mathbb{E}x = 0$ and covariance matrix $\mathbb{E}xx' = V$, where $V$ is a $T \times T$ positive definite matrix

The key insight here comes from noting that because the covariance matrix $V$ of $x$ is positive definite and symmetric, there exists a (Cholesky) decomposition of $V$ such that

$$V = L^{-1}(L^{-1})'$$
and

$$L V L' = I$$

Here

- $L$ is nonsingular
- $\mathbb{E}\varepsilon\varepsilon' = L\mathbb{E}xx'L' = I$, where $\varepsilon := Lx$
- $x = L^{-1}\varepsilon$
$$\begin{aligned}
L_{11} x_1 &= \varepsilon_1 \\
L_{21} x_1 + L_{22} x_2 &= \varepsilon_2 \\
&\ \vdots \\
L_{T1} x_1 + \cdots + L_{TT} x_T &= \varepsilon_T
\end{aligned} \tag{1}$$
or
$$\sum_{j=0}^{t-1} L_{t,t-j}\, x_{t-j} = \varepsilon_t, \quad t = 1, 2, \ldots, T \tag{2}$$
$$\begin{aligned}
x_1 &= L^{-1}_{11} \varepsilon_1 \\
x_2 &= L^{-1}_{22} \varepsilon_2 + L^{-1}_{21} \varepsilon_1 \\
&\ \vdots \\
x_T &= L^{-1}_{TT} \varepsilon_T + L^{-1}_{T,T-1} \varepsilon_{T-1} + \cdots + L^{-1}_{T,1} \varepsilon_1
\end{aligned} \tag{3}$$
or
$$x_t = \sum_{j=0}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} \tag{4}$$
where $L^{-1}_{i,j}$ denotes the $i,j$ element of $L^{-1}$
From Eq. (2), it follows that $\varepsilon_t$ is in the linear subspace spanned by $x_t, x_{t-1}, \ldots, x_1$

From Eq. (4), it follows that $x_t$ is in the linear subspace spanned by $\varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_1$
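These identities are easy to check numerically; a minimal sketch for an arbitrary positive definite $V$:

```python
import numpy as np

V = np.array([[ 5.0, -2.0,  0.0],
              [-2.0,  5.0, -2.0],
              [ 0.0, -2.0,  5.0]])      # an arbitrary positive definite covariance matrix

Linv = np.linalg.cholesky(V)            # lower triangular, V = L^{-1} (L^{-1})'
L = np.linalg.inv(Linv)                 # the L of the text, so that ε = Lx

print(np.allclose(L @ V @ L.T, np.eye(3)))   # L V L' = I  →  True
```

Because both $L$ and $L^{-1}$ are lower triangular, each $\varepsilon_t$ involves only current and past $x$'s, and each $x_t$ only current and past $\varepsilon$'s, which is exactly the subspace statement above.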
To proceed, it is useful to drill down and note that for 𝑡 − 1 ≥ 𝑚 ≥ 1 we can rewrite Eq. (4)
in the form of the moving average representation
$$x_t = \sum_{j=0}^{m-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} + \sum_{j=m}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} \tag{6}$$
Representation Eq. (6) is an orthogonal decomposition of $x_t$ into a part $\sum_{j=m}^{t-1} L^{-1}_{t,t-j}\varepsilon_{t-j}$ that lies in the space spanned by $[x_{t-m}, x_{t-m+1}, \ldots, x_1]$ and an orthogonal component $\sum_{j=0}^{m-1} L^{-1}_{t,t-j}\varepsilon_{t-j}$ that does not lie in that space but instead in a linear space known as its orthogonal complement

It follows that

$$\mathbb{E}[x_t \mid x_{t-m}, x_{t-m-1}, \ldots, x_1] = \sum_{j=m}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j}$$
71.3.1 Implementation
Code that computes solutions to LQ control and filtering problems using the methods described here and in Classical Control with Linear Algebra can be found in the file control_and_filter.py
Here’s how it looks
In [1]: """
"""
import numpy as np
import scipy.stats as spst
import scipy.linalg as la
class LQFilter:

def __init__(self, d, h, y_m, r=None, h_eps=None, β=None):
"""
Parameters
----------
d : list or numpy.array (1-D or a 2-D column vector)
The order of the coefficients: [d_0, d_1, ..., d_m]
h : scalar
Parameter of the objective function (corresponding to the
quadratic term)
self.h = h
self.d = np.asarray(d)
self.m = self.d.shape[0] - 1
self.y_m = np.asarray(y_m)
if self.m == self.y_m.shape[0]:
self.y_m = self.y_m.reshape(self.m, 1)
else:
raise ValueError(f"y_m must be of length m = {self.m:d}")
#---------------------------------------------
# Define the coefficients of ϕ upfront
#---------------------------------------------
ϕ = np.zeros(2 * self.m + 1)
for i in range(- self.m, self.m + 1):
ϕ[self.m - i] = np.sum(np.diag(self.d.reshape(self.m + 1, 1) @ \
self.d.reshape(1, self.m + 1), k=-i))
ϕ[self.m] = ϕ[self.m] + self.h
self.ϕ = ϕ
#-----------------------------------------------------
# If r is given calculate the vector ϕ_r
#-----------------------------------------------------
if r is None:
pass
else:
self.r = np.asarray(r)
self.k = self.r.shape[0] - 1
ϕ_r = np.zeros(2 * self.k + 1)
for i in range(- self.k, self.k + 1):
ϕ_r[self.k - i] = np.sum(np.diag(self.r.reshape(self.k + 1, 1) @ \
self.r.reshape(1, self.k + 1), k=-i))
if h_eps is None:
self.ϕ_r = ϕ_r
else:
ϕ_r[self.k] = ϕ_r[self.k] + h_eps
self.ϕ_r = ϕ_r
#-----------------------------------------------------
# If β is given, define the transformed variables
#-----------------------------------------------------
if β is None:
self.β = 1
else:
self.β = β
self.d = self.β**(np.arange(self.m + 1)/2) * self.d
self.y_m = self.y_m * (self.β**(- np.arange(1, self.m + 1)/2)).reshape(self.m, 1)
def construct_W_and_Wm(self, N):
"""
This constructs the matrices W and W_m for a given number of periods N
"""
m = self.m
d = self.d
W = np.zeros((N + 1, N + 1))
W_m = np.zeros((N + 1, m))
#---------------------------------------
# Terminal conditions
#---------------------------------------
for j in range(m):
for i in range(j + 1, m + 1):
M[i, j] = D_m1[i - j - 1, m]
#----------------------------------------------
# Euler equations for t = 0, 1, ..., N-(m+1)
#----------------------------------------------
ϕ = self.ϕ
for i in range(m):
W_m[N - i, :(m - i)] = ϕ[(m + 1 + i):]
return W, W_m
def roots_of_characteristic(self):
"""
This function calculates z_0 and the 2m roots of the characteristic equation
associated with the Euler equation (1.7)
Note:
------
numpy.poly1d(roots, True) defines a polynomial using its roots that can be
evaluated at any point. If x_1, x_2, ... , x_m are the roots then
p(x) = (x - x_1)(x - x_2)...(x - x_m)
"""
m = self.m
ϕ = self.ϕ
# Calculate the roots of the 2m-polynomial
roots = np.roots(ϕ)
# Sort the roots according to their length (in descending order)
roots_sorted = roots[np.argsort(abs(roots))[::-1]]
z_0 = ϕ.sum() / np.poly1d(roots, True)(1)
z_1_to_m = roots_sorted[:m]     # we need only the roots outside the unit circle
λ = 1 / z_1_to_m
return z_1_to_m, z_0, λ
def coeffs_of_c(self):
'''
This function computes the coefficients {c_j, j = 0, 1, ..., m} for
c(z) = sum_{j = 0}^{m} c_j z^j
'''
z_1_to_m, z_0 = self.roots_of_characteristic()[:2]
c_0 = (z_0 * np.prod(z_1_to_m).real * (-1)**self.m)**(1 / 2)
c_coeffs = np.poly1d(z_1_to_m, True).c * z_0 / c_0
return c_coeffs[::-1]
def solution(self):
"""
This function calculates {λ_j, j=1,...,m} and {A_j, j=1,...,m}
of the expression (1.15)
"""
λ = self.roots_of_characteristic()[2]
c_0 = self.coeffs_of_c()[-1]
A = np.zeros(self.m, dtype=complex)
for j in range(self.m):
denom = 1 - λ/λ[j]
A[j] = c_0**(-2) / np.prod(denom[np.arange(self.m) != j])
return λ, A
def construct_V(self, N):
'''
This constructs the covariance matrix of the process for a given period N
'''
V = np.zeros((N, N))
ϕ_r = self.ϕ_r
for i in range(N):
for j in range(N):
if abs(i-j) <= self.k:
V[i, j] = ϕ_r[self.k + abs(i-j)]
return V
return d.rvs()
N = np.asarray(a_hist).shape[0] - 1
a_hist = np.asarray(a_hist).reshape(N + 1, 1)
V = self.construct_V(N + 1)
return Ea_hist
Note:
------
scipy.linalg.lu normalizes L, U so that L has unit diagonal elements
To make things consistent with the lecture, we need an auxiliary diagonal
matrix D which renormalizes L and U
"""
N = np.asarray(a_hist).shape[0] - 1
W, W_m = self.construct_W_and_Wm(N)
L, U = la.lu(W, permute_l=True)
D = np.diag(1 / np.diag(U))
U = D @ U
L = L @ np.diag(1 / np.diag(D))
J = np.fliplr(np.eye(N + 1))
a_hist = J @ np.asarray(a_hist).reshape(N + 1, 1)
#--------------------------------------------
# Transform the 'a' sequence if β is given
#--------------------------------------------
if self.β != 1:
a_hist = a_hist * (self.β**(np.arange(N + 1) / 2))[::-1].reshape(N + 1, 1)
#--------------------------------------------
# Transform the optimal sequence back if β is given
#--------------------------------------------
if self.β != 1:
y_hist = y_hist * (self.β**(- np.arange(-self.m, N + 1)/2)).reshape(N + 1 + self.m, 1)
71.3.2 Example 1
𝑥𝑡 = (1 − 2𝐿)𝜀𝑡
where 𝜀𝑡 is a serially uncorrelated random process with mean zero and variance unity
If we were to use the tools associated with infinite dimensional prediction and filtering to be
described below, we would use the Wiener-Kolmogorov formula Eq. (21) to compute the lin-
ear least squares forecasts E[𝑥𝑡+𝑗 ∣ 𝑥𝑡 , 𝑥𝑡−1 , …], for 𝑗 = 1, 2
But we can do everything we want by instead using our finite dimensional tools and setting
𝑑 = 𝑟, generating an instance of LQFilter, then invoking pertinent methods of LQFilter
In [2]: m = 1
y_m = np.asarray([.0]).reshape(m, 1)
d = np.asarray([1, -2])
r = np.asarray([1, -2])
h = 0.0
example = LQFilter(d, h, y_m, r=d)
In [3]: example.coeffs_of_c()
In [4]: example.roots_of_characteristic()
Now let’s form the covariance matrix of a time series vector of length 𝑁 and put it in 𝑉
Then we’ll take a Cholesky decomposition of 𝑉 = 𝐿−1 𝐿−1 and use it to form the vector of
“moving average representations” 𝑥 = 𝐿−1 𝜀 and the vector of “autoregressive representations”
𝐿𝑥 = 𝜀
In [5]: V = example.construct_V(N=5)
print(V)
[[ 5. -2. 0. 0. 0.]
[-2. 5. -2. 0. 0.]
[ 0. -2. 5. -2. 0.]
[ 0. 0. -2. 5. -2.]
[ 0. 0. 0. -2. 5.]]
Notice how the lower rows of the “moving average representations” are converging to the appropriate infinite history Wold representation to be described below when we study infinite horizon prediction and filtering
In [6]: Li = np.linalg.cholesky(V)
print(Li)
[[ 2.23606798 0. 0. 0. 0. ]
[-0.89442719 2.04939015 0. 0. 0. ]
[ 0. -0.97590007 2.01186954 0. 0. ]
[ 0. 0. -0.99410024 2.00293902 0. ]
[ 0. 0. 0. -0.99853265 2.000733 ]]
Notice how the lower rows of the “autoregressive representations” are converging to the appropriate infinite-history autoregressive representation to be described below when we study infinite horizon prediction and filtering
In [7]: L = np.linalg.inv(Li)
print(L)
[[ 0.4472136 0. 0. 0. 0. ]
[ 0.19518001 0.48795004 0. 0. 0. ]
[ 0.09467621 0.23669053 0.49705012 0. 0. ]
[ 0.04698977 0.11747443 0.2466963 0.49926632 -0. ]
[ 0.02345182 0.05862954 0.12312203 0.24917554 0.49981682]]
71.3.3 Example 2
$$X_t = (1 - \sqrt{2} L^2)\, \varepsilon_t$$
where 𝜀𝑡 is a serially uncorrelated random process with mean zero and variance unity
Let’s find a Wold moving average representation for 𝑥𝑡 that will prevail in the infinite-history
context to be studied in detail below
To do this, we’ll use the Wiener-Kolomogorov formula Eq. (21) presented below to compute
the linear least squares forecasts E [𝑋𝑡+𝑗 ∣ 𝑋𝑡−1 , …] for 𝑗 = 1, 2, 3
We proceed in the same way as in example 1
In [8]: m = 2
y_m = np.asarray([.0, .0]).reshape(m, 1)
d = np.asarray([1, 0, -np.sqrt(2)])
r = np.asarray([1, 0, -np.sqrt(2)])
h = 0.0
example = LQFilter(d, h, y_m, r=d)
example.coeffs_of_c()
In [9]: example.roots_of_characteristic()
In [10]: V = example.construct_V(N=8)
print(V)
[[ 3. 0. -1.41421356 0. 0. 0.
0. 0. ]
[ 0. 3. 0. -1.41421356 0. 0.
0. 0. ]
[-1.41421356 0. 3. 0. -1.41421356 0.
0. 0. ]
[ 0. -1.41421356 0. 3. 0. -1.41421356
0. 0. ]
[ 0. 0. -1.41421356 0. 3. 0.
-1.41421356 0. ]
[ 0. 0. 0. -1.41421356 0. 3.
0. -1.41421356]
[ 0. 0. 0. 0. -1.41421356 0.
3. 0. ]
[ 0. 0. 0. 0. 0. -1.41421356
0. 3. ]]
In [11]: Li = np.linalg.cholesky(V)
print(Li[-3:, :])
[[ 0. 0. 0. -0.9258201 0. 1.46385011
0. 0. ]
[ 0. 0. 0. 0. -0.96609178 0.
1.43759058 0. ]
[ 0. 0. 0. 0. 0. -0.96609178
0. 1.43759058]]
In [12]: L = np.linalg.inv(Li)
print(L)
[[0.57735027 0. 0. 0. 0. 0.
0. 0. ]
[0. 0.57735027 0. 0. 0. 0.
0. 0. ]
[0.3086067 0. 0.65465367 0. 0. 0.
0. 0. ]
[0. 0.3086067 0. 0.65465367 0. 0.
0. 0. ]
[0.19518001 0. 0.41403934 0. 0.68313005 0.
0. 0. ]
[0. 0.19518001 0. 0.41403934 0. 0.68313005
0. 0. ]
[0.13116517 0. 0.27824334 0. 0.45907809 0.
0.69560834 0. ]
[0. 0.13116517 0. 0.27824334 0. 0.45907809
0. 0.69560834]]
71.3.4 Prediction
It immediately follows from the “orthogonality principle” of least squares (see [9] or [118]
[ch. X]) that
$$\begin{aligned}
\mathbb{E}[x_t \mid x_{t-m}, x_{t-m+1}, \ldots, x_1] &= \sum_{j=m}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} \\
&= \left[L^{-1}_{t,1}\ L^{-1}_{t,2}\ \ldots\ L^{-1}_{t,t-m}\ 0\ 0 \ldots 0\right] L\, x
\end{aligned} \tag{7}$$
$$\mathbb{E}[x \mid x_s, x_{s-1}, \ldots, x_1] = L^{-1} \begin{bmatrix} I_s & 0 \\ 0 & 0_{(t-s)} \end{bmatrix} L x \tag{8}$$
This formula will be convenient in representing the solution of control problems under uncer-
tainty
Equation Eq. (4) can be recognized as a finite dimensional version of a moving average repre-
sentation
Equation Eq. (2) can be viewed as a finite dimensional version of an autoregressive representation
Notice that even if the 𝑥𝑡 process is covariance stationary, so that 𝑉 is such that 𝑉𝑖𝑗 depends
only on |𝑖 − 𝑗|, the coefficients in the moving average representation are time-dependent, there
being a different moving average for each 𝑡
If 𝑥𝑡 is a covariance stationary process, the last row of 𝐿−1 converges to the coefficients in the
Wold moving average representation for {𝑥𝑡 } as 𝑇 → ∞
Further, if 𝑥𝑡 is covariance stationary, for fixed 𝑘 and 𝑗 > 0, 𝐿−1 −1
𝑇 ,𝑇 −𝑗 converges to 𝐿𝑇 −𝑘,𝑇 −𝑘−𝑗
as 𝑇 → ∞
That is, the “bottom” rows of 𝐿−1 converge to each other and to the Wold moving average
coefficients as 𝑇 → ∞
This last observation gives one simple and widely-used practical way of forming a finite 𝑇 ap-
proximation to a Wold moving average representation
First, form the covariance matrix $\mathbb{E}xx' = V$, then obtain the Cholesky decomposition $L^{-1}(L^{-1})'$ of $V$, which can be accomplished quickly on a computer

The last row of $L^{-1}$ gives the approximate Wold moving average coefficients
This method can readily be generalized to multivariate systems
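Here is a minimal sketch of that recipe for the scalar process $x_t = \varepsilon_t - 2\varepsilon_{t-1}$ (so $\gamma_0 = 5$, $\gamma_1 = -2$), whose invertible Wold representation one can check is $x_t = (1 - 0.5L)w_t$ with $\mathrm{Var}(w_t) = 4$:

```python
import numpy as np

# Covariance matrix of x_t = ε_t - 2ε_{t-1}:  γ(0) = 5, γ(1) = -2, γ(j) = 0 for j ≥ 2
T = 20
V = 5.0 * np.eye(T) - 2.0 * (np.eye(T, k=1) + np.eye(T, k=-1))

Li = np.linalg.cholesky(V)      # V = Li @ Li.T with Li lower triangular

# The last row of Li approximates the Wold coefficients scaled by std(w) = 2,
# i.e. it converges to (..., 0, -1, 2)
print(Li[-1, -2:])
```

Dividing the last row by its diagonal entry recovers the normalized Wold coefficients $(1, -0.5)$, even though the original representation $x_t = (1 - 2L)\varepsilon_t$ is not invertible.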
71.4 Combined Finite Dimensional Control and Prediction

Consider the finite-dimensional control problem of maximizing

$$\mathbb{E} \sum_{t=0}^{N} \left\{ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} \left[d(L) y_t\right]^2 \right\}, \quad h > 0$$
$$U \bar{y} = L^{-1} \bar{a} + K \begin{bmatrix} y_{-1} \\ \vdots \\ y_{-m} \end{bmatrix}$$
$$\mathbb{E}[\bar{a} \mid a_s, a_{s-1}, \ldots, a_0] = \tilde{U}^{-1} \begin{bmatrix} 0 & 0 \\ 0 & I_{(s+1)} \end{bmatrix} \tilde{U} \bar{a}$$
(We have reversed the time axis in dating the 𝑎’s relative to earlier)
The time axis can be reversed in representation Eq. (8) by replacing 𝐿 with 𝐿𝑇
The optimal decision rule to use at time 0 ≤ 𝑡 ≤ 𝑁 is then given by the (𝑁 − 𝑡 + 1)th row of
$$U \bar{y} = L^{-1} \tilde{U}^{-1} \begin{bmatrix} 0 & 0 \\ 0 & I_{(t+1)} \end{bmatrix} \tilde{U} \bar{a} + K \begin{bmatrix} y_{-1} \\ \vdots \\ y_{-m} \end{bmatrix}$$
71.5 Infinite Horizon Prediction and Filtering Problems

Suppose that

$$Y_t = d(L) u_t \tag{9}$$

where $d(L) = \sum_{j=0}^{m} d_j L^j$, and $u_t$ is a serially uncorrelated stationary random process satisfying
$$\begin{aligned}
\mathbb{E} u_t &= 0 \\
\mathbb{E} u_t u_s &= \begin{cases} 1 & \text{if } t = s \\ 0 & \text{otherwise} \end{cases}
\end{aligned} \tag{10}$$
𝑋𝑡 = 𝑌𝑡 + 𝜀𝑡 (11)
where 𝜀𝑡 is a serially uncorrelated stationary random process with E𝜀𝑡 = 0 and E𝜀𝑡 𝜀𝑠 = 0 for
all distinct 𝑡 and 𝑠
We also assume that E𝜀𝑡 𝑢𝑠 = 0 for all 𝑡 and 𝑠
The linear least squares prediction problem is to find the 𝐿2 random variable 𝑋̂ 𝑡+𝑗
among linear combinations of {𝑋𝑡 , 𝑋𝑡−1 , …} that minimizes E(𝑋̂ 𝑡+𝑗 − 𝑋𝑡+𝑗 )2
That is, the problem is to find a $\gamma_j(L) = \sum_{k=0}^{\infty} \gamma_{jk} L^k$ such that $\sum_{k=0}^{\infty} |\gamma_{jk}|^2 < \infty$ and $\mathbb{E}\left[\gamma_j(L) X_t - X_{t+j}\right]^2$ is minimized
The linear least squares filtering problem is to find a $b(L) = \sum_{j=0}^{\infty} b_j L^j$ such that $\sum_{j=0}^{\infty} |b_j|^2 < \infty$ and $\mathbb{E}\left[b(L) X_t - Y_t\right]^2$ is minimized
Interesting versions of these problems related to the permanent income theory were studied
by [98]
$$\begin{aligned}
C_X(\tau) &= \mathbb{E} X_t X_{t-\tau} \\
C_Y(\tau) &= \mathbb{E} Y_t Y_{t-\tau} \qquad \tau = 0, \pm 1, \pm 2, \ldots \\
C_{Y,X}(\tau) &= \mathbb{E} Y_t X_{t-\tau}
\end{aligned} \tag{12}$$
$$\begin{aligned}
g_X(z) &= \sum_{\tau=-\infty}^{\infty} C_X(\tau) z^{\tau} \\
g_Y(z) &= \sum_{\tau=-\infty}^{\infty} C_Y(\tau) z^{\tau} \\
g_{YX}(z) &= \sum_{\tau=-\infty}^{\infty} C_{YX}(\tau) z^{\tau}
\end{aligned} \tag{13}$$
𝑦𝑡 = 𝐴(𝐿)𝑣1𝑡 + 𝐵(𝐿)𝑣2𝑡
𝑥𝑡 = 𝐶(𝐿)𝑣1𝑡 + 𝐷(𝐿)𝑣2𝑡
$$\begin{aligned}
g_Y(z) &= d(z)\, d(z^{-1}) \\
g_X(z) &= d(z)\, d(z^{-1}) + h \\
g_{YX}(z) &= d(z)\, d(z^{-1})
\end{aligned} \tag{15}$$
The key step in obtaining solutions to our problems is to factor the covariance generating
function 𝑔𝑋 (𝑧) of 𝑋
The solutions of our problems are given by formulas due to Wiener and Kolmogorov
These formulas utilize the Wold moving average representation of the 𝑋𝑡 process,
𝑋𝑡 = 𝑐 (𝐿) 𝜂𝑡 (16)
where $c(L) = \sum_{j=0}^{m} c_j L^j$, with

- $c_0 \eta_t = X_t - \mathbb{E}[X_t \mid X_{t-1}, \ldots]$
- $\mathbb{E}\eta_t^2 = 1$
Therefore, we have already shown constructively how to factor the covariance generating
function 𝑔𝑋 (𝑧) = 𝑑(𝑧) 𝑑 (𝑧 −1 ) + ℎ
We now introduce the annihilation operator:
$$\left[ \sum_{j=-\infty}^{\infty} f_j L^j \right]_+ \equiv \sum_{j=0}^{\infty} f_j L^j \tag{20}$$
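In words, the annihilation operator simply discards negative powers of $L$; a toy sketch on a dictionary of coefficients (a hypothetical helper for illustration, not part of the lecture code):

```python
def annihilate(coeffs):
    """[sum_j f_j L^j]_+ : keep only the nonnegative powers of L."""
    return {j: f for j, f in coeffs.items() if j >= 0}

f = {-2: 1.0, -1: 0.5, 0: 2.0, 3: -1.0}   # arbitrary illustrative coefficients
print(annihilate(f))   # {0: 2.0, 3: -1.0}
```

The Wiener-Kolmogorov formulas below apply this operator to ratios of polynomials in $L$, which first have to be expanded in nonnegative powers before truncation.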
The Wiener-Kolmogorov formula for $\gamma_j(L)$ is

$$\gamma_j(L) = \left[ \frac{c(L)}{L^j} \right]_+ c(L)^{-1} \tag{21}$$
We have defined the solution of the filtering problem as E[𝑌𝑡 ∣ 𝑋𝑡 , 𝑋𝑡−1 , …] = 𝑏(𝐿)𝑋𝑡
The Wiener-Kolmogorov formula for $b(L)$ is
$$b(L) = \left[ \frac{g_{YX}(L)}{c(L^{-1})} \right]_+ c(L)^{-1}$$
or
$$b(L) = \left[ \frac{d(L)\, d(L^{-1})}{c(L^{-1})} \right]_+ c(L)^{-1} \tag{22}$$
Formulas Eq. (21) and Eq. (22) are discussed in detail in [134] and [118]
The interested reader can find there several examples of the use of these formulas in economics. Some classic examples using these formulas are due to [98]
As an example of the usefulness of formula Eq. (22), we let 𝑋𝑡 be a stochastic process with
Wold moving average representation
𝑋𝑡 = 𝑐(𝐿)𝜂𝑡
where $\mathbb{E}\eta_t^2 = 1$, and $c_0 \eta_t = X_t - \mathbb{E}[X_t \mid X_{t-1}, \ldots]$, $c(L) = \sum_{j=0}^{m} c_j L^j$
Suppose that at time 𝑡, we wish to predict a geometric sum of future 𝑋’s, namely
$$y_t \equiv \sum_{j=0}^{\infty} \delta^j X_{t+j} = \frac{1}{1 - \delta L^{-1}} X_t$$
$$b(L) = \left[ \frac{c(L)}{1 - \delta L^{-1}} \right]_+ c(L)^{-1} \tag{23}$$
In order to evaluate the term in the annihilation operator, we use the following result from
[55]
Proposition Let
- $g(z) = \sum_{j=0}^{\infty} g_j z^j$ where $\sum_{j=0}^{\infty} |g_j|^2 < +\infty$
- $h(z^{-1}) = (1 - \delta_1 z^{-1}) \ldots (1 - \delta_n z^{-1})$, where $|\delta_j| < 1$ for $j = 1, \ldots, n$
Then
$$\left[ \frac{g(z)}{h(z^{-1})} \right]_+ = \frac{g(z)}{h(z^{-1})} - \sum_{j=1}^{n} \frac{\delta_j g(\delta_j)}{\prod_{\substack{k=1 \\ k \neq j}}^{n} (\delta_j - \delta_k)} \left( \frac{1}{z - \delta_j} \right) \tag{24}$$
and, alternatively,
$$\left[ \frac{g(z)}{h(z^{-1})} \right]_+ = \sum_{j=1}^{n} B_j \left( \frac{z g(z) - \delta_j g(\delta_j)}{z - \delta_j} \right) \tag{25}$$

where $B_j = 1 / \prod_{\substack{k=1 \\ k \neq j}}^{n} (1 - \delta_k / \delta_j)$
Applying formula Eq. (25) of the proposition to evaluating Eq. (23) with 𝑔(𝑧) = 𝑐(𝑧) and
ℎ(𝑧−1 ) = 1 − 𝛿𝑧 −1 gives
$$b(L) = \left[ \frac{L c(L) - \delta c(\delta)}{L - \delta} \right] c(L)^{-1}$$
or
$$b(L) = \left[ \frac{1 - \delta c(\delta) L^{-1} c(L)^{-1}}{1 - \delta L^{-1}} \right]$$
Thus, we have
$$\mathbb{E}\left[ \sum_{j=0}^{\infty} \delta^j X_{t+j} \,\Big|\, X_t, X_{t-1}, \ldots \right] = \left[ \frac{1 - \delta c(\delta) L^{-1} c(L)^{-1}}{1 - \delta L^{-1}} \right] X_t \tag{26}$$
This formula is useful in solving stochastic versions of problem 1 of lecture Classical Control
with Linear Algebra in which the randomness emerges because {𝑎𝑡 } is a stochastic process
The problem is to maximize
$$\mathbb{E}_0 \lim_{N \to \infty} \sum_{t=0}^{N} \beta^t \left[ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} \left[d(L) y_t\right]^2 \right] \tag{27}$$
$$a_t = c(L)\, \eta_t$$

where

$$c(L) = \sum_{j=0}^{\tilde{n}} c_j L^j$$

and

$$\eta_t = a_t - \mathbb{E}[a_t \mid a_{t-1}, \ldots]$$
The problem is to maximize Eq. (27) with respect to a contingency plan expressing 𝑦𝑡 as a
function of information known at 𝑡, which is assumed to be (𝑦𝑡−1 , 𝑦𝑡−2 , … , 𝑎𝑡 , 𝑎𝑡−1 , …)
The solution of this problem can be achieved in two steps
First, ignoring the uncertainty, we can solve the problem assuming that {𝑎𝑡 } is a known se-
quence
The solution is, from above,

$$y_t = f_1 y_{t-1} + \cdots + f_m y_{t-m} + \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} (\lambda_j \beta)^k a_{t+k}$$

or

$$(1 - \lambda_1 L) \ldots (1 - \lambda_m L)\, y_t = \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} (\lambda_j \beta)^k a_{t+k} \tag{28}$$
Second, the solution of the problem under uncertainty is obtained by replacing the terms on
the right-hand side of the above expressions with their linear least squares predictors
Using Eq. (26) and Eq. (28), we have the following solution
$$(1 - \lambda_1 L) \ldots (1 - \lambda_m L)\, y_t = \sum_{j=1}^{m} A_j \left[ \frac{1 - \beta \lambda_j c(\beta \lambda_j) L^{-1} c(L)^{-1}}{1 - \beta \lambda_j L^{-1}} \right] a_t$$
Blaschke factors
The following is a useful piece of mathematics underlying “root flipping”
Let $\pi(z) = \sum_{j=0}^{m} \pi_j z^j$ and let $z_1, \ldots, z_k$ be the zeros of $\pi(z)$ that are inside the unit circle, $k < m$
Then define

$$\theta(z) = \pi(z) \left( \frac{z_1 z - 1}{z - z_1} \right) \left( \frac{z_2 z - 1}{z - z_2} \right) \cdots \left( \frac{z_k z - 1}{z - z_k} \right)$$

It can be proved directly that $\theta(z^{-1})\theta(z) = \pi(z^{-1})\pi(z)$ and that the zeros of $\theta(z)$ are not inside the unit circle
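A quick numerical check of this root-flipping property, using an illustrative $\pi(z) = (1 - 2z)(1 - 0.25z)$ with a single zero $z_1 = 0.5$ inside the unit circle:

```python
import numpy as np

z1 = 0.5                                     # the zero of π inside the unit circle

def π(z):
    return (1 - 2 * z) * (1 - 0.25 * z)      # zeros at 0.5 and 4

def θ(z):
    return π(z) * (z1 * z - 1) / (z - z1)    # multiply by the Blaschke factor

z = np.exp(0.7j)                             # an arbitrary point on the unit circle
print(np.isclose(abs(θ(z)), abs(π(z))))      # moduli agree on |z| = 1 → True
print(np.isclose(abs(θ(2.0)), 0.0))          # the zero at 0.5 is flipped to 1/0.5 = 2 → True
```

Because the moduli agree on the unit circle, $\theta$ and $\pi$ generate the same covariances, which is why root flipping lets us pass to a representation with all zeros outside the unit circle.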
71.6 Exercises
71.6.1 Exercise 1
Let 𝑌𝑡 = (1 − 2𝐿)𝑢𝑡 where 𝑢𝑡 is a mean zero white noise with E𝑢2𝑡 = 1. Let
$$X_t = Y_t + \varepsilon_t$$
where 𝜀𝑡 is a serially uncorrelated white noise with E𝜀2𝑡 = 9, and E𝜀𝑡 𝑢𝑠 = 0 for all 𝑡 and 𝑠
Find the Wold moving average representation for 𝑋𝑡
Compute the coefficients $A_{1j}$ and $A_{2j}$ in

$$\hat{\mathbb{E}}[X_{t+1} \mid X_t, X_{t-1}, \ldots] = \sum_{j=0}^{\infty} A_{1j} X_{t-j}$$

$$\hat{\mathbb{E}}[X_{t+2} \mid X_t, X_{t-1}, \ldots] = \sum_{j=0}^{\infty} A_{2j} X_{t-j}$$
71.6.2 Exercise 2
𝑌𝑡 = 𝐷(𝐿)𝑈𝑡
where $D(L) = \sum_{j=0}^{m} D_j L^j$, $D_j$ an $n \times n$ matrix, $U_t$ an $(n \times 1)$ vector white noise with $\mathbb{E}U_t = 0$ for all $t$, $\mathbb{E}U_t U_s' = 0$ for all $s \neq t$, and $\mathbb{E}U_t U_t' = I$ for all $t$
Let 𝜀𝑡 be an 𝑛 × 1 vector white noise with mean 0 and contemporaneous covariance matrix 𝐻,
where 𝐻 is a positive definite matrix
Let 𝑋𝑡 = 𝑌𝑡 + 𝜀𝑡
Define the covariograms as $C_X(\tau) = \mathbb{E} X_t X_{t-\tau}'$, $C_Y(\tau) = \mathbb{E} Y_t Y_{t-\tau}'$, $C_{YX}(\tau) = \mathbb{E} Y_t X_{t-\tau}'$
Then define the matrix covariance generating function, as in (21), only interpret all the ob-
jects in (21) as matrices
Show that the covariance generating functions are given by
$$\begin{aligned}
g_Y(z) &= D(z) D(z^{-1})' \\
g_X(z) &= D(z) D(z^{-1})' + H \\
g_{YX}(z) &= D(z) D(z^{-1})'
\end{aligned}$$
$$D(z) D(z^{-1})' + H = C(z) C(z^{-1})', \quad C(z) = \sum_{j=0}^{m} C_j z^j$$
where the zeros of |𝐶(𝑧)| do not lie inside the unit circle
A vector Wold moving average representation of 𝑋𝑡 is then
𝑋𝑡 = 𝐶(𝐿)𝜂𝑡
$$\mathbb{E}\left[X_{t+j} \mid X_t, X_{t-1}, \ldots\right] = \left[ \frac{C(L)}{L^j} \right]_+ \eta_t$$
If 𝐶(𝐿) is invertible, i.e., if the zeros of det 𝐶(𝑧) lie strictly outside the unit circle, then this
formula can be written
$$\mathbb{E}\left[X_{t+j} \mid X_t, X_{t-1}, \ldots\right] = \left[ \frac{C(L)}{L^j} \right]_+ C(L)^{-1} X_t$$
Part XI
72

Asset Pricing I: Finite State Models
72.1 Contents
• Overview 72.2
• Pricing Models 72.3
• Prices in the Risk-Neutral Case 72.4
• Asset Prices under Risk Aversion 72.5
• Exercises 72.6
• Solutions 72.7
“A little knowledge of geometric series goes a long way” – Robert E. Lucas, Jr.
In addition to what’s in Anaconda, this lecture will need the following libraries
72.2 Overview
An asset is a claim on one or more future payoffs. The spot price of an asset depends primarily on

- the anticipated dynamics for the stream of income accruing to the owners
- attitudes to risk
- rates of time preference
In this lecture, we consider some standard pricing models and dividend stream specifications
We study how prices and dividend-price ratios respond in these different scenarios
We also look at creating and pricing derivative assets by repackaging income streams
Key tools for the lecture are

- formulas for predicting future values of functions of a Markov state
- a formula for predicting the discounted sum of future values of a Markov state
Let’s look at some equations that we expect to hold for prices of assets under ex-dividend
contracts (we will consider cum-dividend pricing in the exercises)
What happens if for some reason traders discount payouts differently depending on the state
of the world?
Michael Harrison and David Kreps [62] and Lars Peter Hansen and Scott Richard [54] showed that in quite general settings the price of an ex-dividend asset obeys

$$p_t = \mathbb{E}_t \left[ m_{t+1} (d_{t+1} + p_{t+1}) \right] \tag{2}$$

for some stochastic discount factor $m_{t+1}$

Recall that, from the definition of a conditional covariance $\mathrm{cov}_t(x_{t+1}, y_{t+1})$, we have

$$\mathbb{E}_t (x_{t+1} y_{t+1}) = \mathrm{cov}_t(x_{t+1}, y_{t+1}) + \mathbb{E}_t x_{t+1} \mathbb{E}_t y_{t+1} \tag{3}$$

If we apply this definition to the asset pricing equation Eq. (2) we obtain

$$p_t = \mathbb{E}_t m_{t+1} \mathbb{E}_t (d_{t+1} + p_{t+1}) + \mathrm{cov}_t(m_{t+1}, d_{t+1} + p_{t+1}) \tag{4}$$

Equation Eq. (4) asserts that the covariance of the stochastic discount factor with the one period payout $d_{t+1} + p_{t+1}$ is an important determinant of the price $p_t$
We give examples of some models of stochastic discount factors that have been proposed later
in this lecture and also in a later lecture
Aside from prices, another quantity of interest is the price-dividend ratio 𝑣𝑡 ∶= 𝑝𝑡 /𝑑𝑡
Let’s write down an expression that this ratio should satisfy
We can divide both sides of Eq. (2) by 𝑑𝑡 to get
$$v_t = \mathbb{E}_t \left[ m_{t+1} \frac{d_{t+1}}{d_t} (1 + v_{t+1}) \right] \tag{5}$$
72.4 Prices in the Risk-Neutral Case

What can we say about price dynamics on the basis of the models described above?

The answer to this question depends on
For now let’s focus on the risk-neutral case, where the stochastic discount factor is constant,
and study how prices depend on the dividend process
The simplest case is risk-neutral pricing in the face of a constant, non-random dividend
stream 𝑑𝑡 = 𝑑 > 0
Removing the expectation from Eq. (1) and iterating forward gives
$$\begin{aligned}
p_t &= \beta(d + p_{t+1}) \\
&= \beta(d + \beta(d + p_{t+2})) \\
&\ \vdots \\
&= \beta(d + \beta d + \beta^2 d + \cdots + \beta^{k-2} d + \beta^{k-1} p_{t+k})
\end{aligned}$$
$$\bar{p} := \frac{\beta d}{1 - \beta} \tag{6}$$
Consider a growing, non-random dividend process 𝑑𝑡+1 = 𝑔𝑑𝑡 where 0 < 𝑔𝛽 < 1
While prices are not usually constant when dividends grow over time, the price dividend-ratio
might be
If we guess this, substituting 𝑣𝑡 = 𝑣 into Eq. (5) as well as our other assumptions, we get
𝑣 = 𝛽𝑔(1 + 𝑣)
Since 𝛽𝑔 < 1, we have a unique positive solution:
$$v = \frac{\beta g}{1 - \beta g}$$

The price is then

$$p_t = \frac{\beta g}{1 - \beta g}\, d_t$$
If, in this example, we take 𝑔 = 1 + 𝜅 and let 𝜌 ∶= 1/𝛽 − 1, then the price becomes
$$p_t = \frac{1 + \kappa}{\rho - \kappa}\, d_t$$
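The two expressions for the price are algebraically identical; a quick numeric check with illustrative values $\beta = 0.96$ and $\kappa = 0.02$:

```python
β = 0.96
κ = 0.02
g = 1 + κ
ρ = 1 / β - 1          # so that β = 1 / (1 + ρ)
d_t = 1.0

p1 = β * g / (1 - β * g) * d_t     # price from the β, g formulation
p2 = (1 + κ) / (ρ - κ) * d_t       # price from the κ, ρ formulation
print(abs(p1 - p2) < 1e-9)         # True: the two formulas agree
```

Note how sensitive the price is to the gap between the growth rate $\kappa$ and the rate of time preference $\rho$: as $\kappa \to \rho$ the price diverges.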
𝑔𝑡 = 𝑔(𝑋𝑡 ), 𝑡 = 1, 2, …
where
1. $\{X_t\}$ is a finite Markov chain with state space $S$ and transition probabilities $P(x, y) := \mathbb{P}\{X_{t+1} = y \mid X_t = x\}$ for $x, y \in S$
2. $g$ is a given function on $S$ taking positive values

(For a refresher on notation and theory for finite Markov chains see this lecture)
The next figure shows a simulation, where
Pricing
To obtain asset prices in this setting, let’s adapt our analysis from the case of deterministic
growth
In that case, we found that 𝑣 is constant
This encourages us to guess that, in the current case, 𝑣𝑡 is constant given the state 𝑋𝑡
In other words, we are looking for a fixed function 𝑣 such that the price-dividend ratio satisfies 𝑣𝑡 = 𝑣(𝑋𝑡 )
We can substitute this guess into Eq. (5) to get

𝑣(𝑥) = 𝛽 ∑_{𝑦∈𝑆} 𝑃 (𝑥, 𝑦)𝑔(𝑦)(1 + 𝑣(𝑦)) for all 𝑥 ∈ 𝑆

or, in vector notation,

𝑣 = 𝛽𝐾(1 + 𝑣) (9)

Here 𝐾 is the matrix with entries 𝐾(𝑥, 𝑦) ∶= 𝑃 (𝑥, 𝑦)𝑔(𝑦), 𝑣 is the vector (𝑣(𝑥1 ), … , 𝑣(𝑥𝑛 )), and 1 is a vector of ones. Provided that the spectral radius of 𝐾 is strictly less than 𝛽 −1 , this equation has the unique solution

𝑣 = (𝐼 − 𝛽𝐾)−1 𝛽𝐾1 (10)
72.4.4 Code

The following snippet computes 𝑣 in the risk-neutral case. It assumes that mc (a quantecon MarkovChain holding the discretized state process, with n states) and the discount factor β have been defined earlier in the session:

import numpy as np
from numpy.linalg import solve

K = mc.P * np.exp(mc.state_values)          # K(x, y) = P(x, y) g(y) with g(x) = exp(x)
I = np.identity(n)
v = solve(I - β * K, β * K @ np.ones(n))    # v = (I - βK)^{-1} βK1
Now let’s turn to the case where agents are risk averse
We’ll price several distinct assets, including
Let’s start with a version of the celebrated asset pricing model of Robert E. Lucas, Jr. [88]
As in [88], suppose that the stochastic discount factor takes the form
𝑚𝑡+1 = 𝛽 𝑢′ (𝑐𝑡+1 )/𝑢′ (𝑐𝑡 ) (11)

where 𝑢 is a concave utility function and 𝑐𝑡 is time 𝑡 consumption of a representative consumer. Assume that consumption equals the dividend, 𝑐𝑡 = 𝑑𝑡 , and that utility takes the CRRA form

𝑢(𝑐) = 𝑐^(1−𝛾) /(1 − 𝛾) with 𝛾 > 0 (12)

Inserting the CRRA specification into Eq. (11) and using 𝑐𝑡 = 𝑑𝑡 gives

𝑚𝑡+1 = 𝛽 (𝑐𝑡+1 /𝑐𝑡 )^(−𝛾) = 𝛽 𝑔𝑡+1^(−𝛾) (13)
Substituting this into Eq. (5) gives the price-dividend ratio formula

𝑣(𝑋𝑡 ) = 𝛽E𝑡 [𝑔(𝑋𝑡+1 )^(1−𝛾) (1 + 𝑣(𝑋𝑡+1 ))]

If we let 𝐽 (𝑥, 𝑦) ∶= 𝑃 (𝑥, 𝑦)𝑔(𝑦)^(1−𝛾) , then we can rewrite this in vector form as

𝑣 = 𝛽𝐽 (1 + 𝑣)
Assuming that the spectral radius of 𝐽 is strictly less than 𝛽 −1 , this equation has the unique
solution
𝑣 = (𝐼 − 𝛽𝐽 )−1 𝛽𝐽 1 (14)
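Here is a minimal numerical sketch of Eq. (14) on a two-state chain; the transition matrix and per-state growth rates below are illustrative values, not primitives from the lecture:

```python
import numpy as np
from numpy.linalg import solve

β, γ = 0.96, 2.0
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])               # illustrative transition matrix
g = np.exp(np.array([-0.01, 0.03]))      # illustrative growth rate per state
J = P * g**(1 - γ)                       # J(x, y) = P(x, y) g(y)^(1-γ)

# Check the spectral radius condition, then solve (I - βJ)v = βJ1
assert max(abs(np.linalg.eigvals(J))) < 1 / β
v = solve(np.identity(2) - β * J, β * J @ np.ones(2))
print(np.allclose(v, β * J @ (1 + v)))   # v solves the fixed point v = βJ(1 + v)
```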
We will define a function tree_price to solve for 𝑣 given parameters stored in the class AssetPriceModel
class AssetPriceModel:
    """
    A class that stores the state process and other primitives.

    Parameters
    ----------
    β : scalar(float)
        Discount factor
    mc : MarkovChain
        Contains the transition matrix and set of state values for the state
        process
    γ : scalar(float)
        Coefficient of risk aversion
    g : callable
        The function mapping states to growth rates
    """
    def __init__(self, β=0.96, mc=None, γ=2.0, g=np.exp):
        self.β, self.γ = β, γ
        self.g = g
        # If no chain is supplied, default to a discretized AR(1) state process
        self.mc = mc if mc is not None else qe.tauchen(0.9, 0.02, n=25)
        self.n = self.mc.P.shape[0]
def tree_price(ap):
"""
Computes the price-dividend ratio of the Lucas tree.
Parameters
----------
ap: AssetPriceModel
An instance of AssetPriceModel containing primitives
Returns
-------
v : array_like(float)
Lucas tree price-dividend ratio
"""
# == Simplify names, set up matrices == #
β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
J = P * ap.g(y)**(1 - γ)
# == Compute v == #
I = np.identity(ap.n)
Ones = np.ones(ap.n)
v = solve(I - β * J, β * J @ Ones)
return v
Here’s a plot of 𝑣 as a function of the state for several values of 𝛾, with a positively correlated Markov process and 𝑔(𝑥) = exp(𝑥)

γs = [1.2, 1.4, 1.6, 1.8, 2.0]
ap = AssetPriceModel()
states = ap.mc.state_values

fig, ax = plt.subplots()
for γ in γs:
    ap.γ = γ
    v = tree_price(ap)
    ax.plot(states, v, lw=2, alpha=0.6, label=rf"$\gamma = {γ}$")
ax.set(xlabel='state', ylabel='price-dividend ratio')
ax.legend()
plt.show()
In the special case 𝛾 = 1, we have 𝐽 = 𝑃 . Recalling that 𝑃 ^𝑖 1 = 1 for all 𝑖 and applying Neumann’s geometric series lemma, we are led to

𝑣 = 𝛽(𝐼 − 𝛽𝑃 )−1 1 = 𝛽 ∑_{𝑖=0}^∞ 𝛽^𝑖 𝑃 ^𝑖 1 = (𝛽/(1 − 𝛽)) 1
Thus, with log preferences, the price-dividend ratio for a Lucas tree is constant
Alternatively, if 𝛾 = 0, then 𝐽 = 𝐾 and we recover the risk-neutral solution Eq. (10)
This is as expected, since 𝛾 = 0 implies 𝑢(𝑐) = 𝑐 (and hence agents are risk-neutral)
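A two-line check of the log-preference result; the stochastic matrix below is arbitrary:

```python
import numpy as np
from numpy.linalg import solve

β = 0.9
P = np.array([[0.7, 0.3],
              [0.2, 0.8]])   # any stochastic matrix will do
# With γ = 1 we have J = P, so v = (I - βP)^{-1} βP1; since P1 = 1,
# the price-dividend ratio is the constant β/(1 - β) in every state
v = solve(np.identity(2) - β * P, β * P @ np.ones(2))
print(np.allclose(v, β / (1 - β)))
```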
Consider next a consol, a bond with infinite maturity that pays a constant coupon 𝜁 > 0 each period. An owner of the consol at the end of period 𝑡 is entitled to

• 𝜁 in period 𝑡 + 1, plus
• the right to sell the claim for 𝑝𝑡+1 next period

The price therefore satisfies

𝑝𝑡 = E𝑡 [𝑚𝑡+1 (𝜁 + 𝑝𝑡+1 )]

Substituting the discount factor Eq. (13) into the above equation yields

𝑝𝑡 = E𝑡 [𝛽𝑔𝑡+1^(−𝛾) (𝜁 + 𝑝𝑡+1 )] (15)
Letting 𝑀 (𝑥, 𝑦) = 𝑃 (𝑥, 𝑦)𝑔(𝑦)−𝛾 and rewriting in vector notation yields the solution
𝑝 = (𝐼 − 𝛽𝑀 )−1 𝛽𝑀 𝜁1 (16)
The above is implemented in the function consol_price:

def consol_price(ap, ζ):
    """
    Computes price of a consol bond with coupon ζ

    Parameters
    ----------
    ap: AssetPriceModel
        An instance of AssetPriceModel containing primitives
    ζ : scalar(float)
        Coupon of the consol

    Returns
    -------
    p : array_like(float)
        Consol bond prices
    """
    # == Simplify names, set up matrices == #
    β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
    M = P * ap.g(y)**(- γ)

    # == Compute price == #
    I = np.identity(ap.n)
    Ones = np.ones(ap.n)
    p = solve(I - β * M, β * ζ * M @ Ones)

    return p
Let’s now price options of varying maturity that give the right to purchase a consol at a price
𝑝𝑆
An Infinite Horizon Call Option
We want to price an infinite horizon option to purchase a consol at a price 𝑝𝑆
The option entitles the owner at the beginning of a period either to purchase the bond at the strike price 𝑝𝑆 now, or to hold the option until next period

Thus, the owner either exercises the option now or chooses not to exercise and waits until next period
This is termed an infinite-horizon call option with strike price 𝑝𝑆
The owner of the option is entitled to purchase the consol at the price 𝑝𝑆 at the beginning of
any period, after the coupon has been paid to the previous owner of the bond
The fundamentals of the economy are identical with the one above, including the stochastic
discount factor and the process for consumption
Let 𝑤(𝑋𝑡 , 𝑝𝑆 ) be the value of the option when the time 𝑡 growth state is known to be 𝑋𝑡 but
before the owner has decided whether or not to exercise the option at time 𝑡 (i.e., today)
Recalling that 𝑝(𝑋𝑡 ) is the value of the consol when the initial growth state is 𝑋𝑡 , the value
of the option satisfies
𝑤(𝑋𝑡 , 𝑝𝑆 ) = max {𝛽 E𝑡 [ (𝑢′ (𝑐𝑡+1 )/𝑢′ (𝑐𝑡 )) 𝑤(𝑋𝑡+1 , 𝑝𝑆 )] , 𝑝(𝑋𝑡 ) − 𝑝𝑆 }
The first term on the right is the value of waiting, while the second is the value of exercising
now
We can also write this as

𝑤(𝑥, 𝑝𝑆 ) = max {𝛽 ∑_{𝑦∈𝑆} 𝑃 (𝑥, 𝑦)𝑔(𝑦)^(−𝛾) 𝑤(𝑦, 𝑝𝑆 ), 𝑝(𝑥) − 𝑝𝑆 } (17)

With 𝑀 (𝑥, 𝑦) = 𝑃 (𝑥, 𝑦)𝑔(𝑦)^(−𝛾) and 𝑤 as the vector of values (𝑤(𝑥𝑖 , 𝑝𝑆 ))_{𝑖=1}^𝑛 , we can express Eq. (17) as the nonlinear vector equation

𝑤 = max{𝛽𝑀 𝑤, 𝑝 − 𝑝𝑆 1} (18)
To solve Eq. (18), form the operator 𝑇 mapping vector 𝑤 into vector 𝑇 𝑤 via
𝑇 𝑤 = max{𝛽𝑀 𝑤, 𝑝 − 𝑝𝑆 1}
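The iteration can be sketched on a toy two-state example; M and p below are made-up numbers standing in for the model's objects, not outputs of the lecture's code:

```python
import numpy as np

β, p_s = 0.9, 1.0
M = np.array([[0.5, 0.5],
              [0.4, 0.6]])         # stands in for P(x, y) g(y)^(-γ)
p = np.array([2.0, 3.0])          # stands in for the consol price vector

w = np.zeros(2)
for _ in range(1000):
    w_new = np.maximum(β * M @ w, p - p_s)    # apply T
    if np.max(np.abs(w_new - w)) < 1e-10:
        break
    w = w_new

print(np.allclose(w, np.maximum(β * M @ w, p - p_s)))   # w ≈ Tw at the limit
```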
Starting at some guess and iterating with 𝑇 to convergence, we obtain the function call_option:

def call_option(ap, ζ, p_s, ϵ=1e-8):
    """
    Computes price of a call option on a consol bond

    Parameters
    ----------
    ap: AssetPriceModel
        An instance of AssetPriceModel containing primitives
    ζ : scalar(float)
        Coupon of the consol
    p_s : scalar(float)
        Strike price
    ϵ : scalar(float), optional(default=1e-8)
        Tolerance for infinite horizon problem

    Returns
    -------
    w : array_like(float)
        Infinite horizon call option prices
    """
    # == Simplify names, set up matrices == #
    β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
    M = P * ap.g(y)**(- γ)

    # == Compute option price by iterating T to convergence == #
    p = consol_price(ap, ζ)
    w = np.zeros(ap.n)
    error = ϵ + 1
    while error > ϵ:
        # Maximize across columns
        w_new = np.maximum(β * M @ w, p - p_s)
        # Find maximal difference of each component and update
        error = np.amax(np.abs(w - w_new))
        w = w_new

    return w
In [8]: ap = AssetPriceModel(β=0.9)
ζ = 1.0
strike_price = 40
x = ap.mc.state_values
p = consol_price(ap, ζ)
w = call_option(ap, ζ, strike_price)
For the one-period case, define

𝑚1 ∶= 𝛽𝑀 1

where the 𝑖-th element of 𝑚1 is the reciprocal of the one-period gross risk-free interest rate in state 𝑥𝑖

Other Terms
Let 𝑚𝑗 be an 𝑛 × 1 vector whose 𝑖-th component is the reciprocal of the 𝑗-period gross risk-free interest rate in state 𝑥𝑖 ; these vectors satisfy 𝑚1 = 𝛽𝑀 1 and 𝑚𝑗+1 = 𝛽𝑀 𝑚𝑗
72.6 Exercises
72.6.1 Exercise 1
72.6.2 Exercise 2
In [9]: n = 5
P = 0.0125 * np.ones((n, n))
P += np.diag(0.95 - 0.0125 * np.ones(5))
s = np.array([0.95, 0.975, 1.0, 1.025, 1.05]) # state values of the Markov chain
γ = 2.0
β = 0.94
72.6.3 Exercise 3
Let’s consider finite horizon call options, which are more common than the infinite horizon
variety
Finite horizon options obey functional equations closely related to Eq. (17)
A 𝑘 period option expires after 𝑘 periods
If we view today as date zero, a 𝑘 period option gives the owner the right to exercise the option to purchase the risk-free consol at the strike price 𝑝𝑆 at dates 0, 1, … , 𝑘 − 1
The option expires at time 𝑘
Thus, for 𝑘 = 1, 2, …, let 𝑤(𝑥, 𝑘) be the value of a 𝑘-period option
It obeys

𝑤(𝑥, 𝑘) = max {𝛽 ∑_{𝑦∈𝑆} 𝑃 (𝑥, 𝑦)𝑔(𝑦)^(−𝛾) 𝑤(𝑦, 𝑘 − 1), 𝑝(𝑥) − 𝑝𝑆 } with 𝑤(𝑥, 0) = 0 for all 𝑥
72.7 Solutions
72.7.1 Exercise 1
For a cum-dividend asset, the pricing equation becomes

𝑝𝑡 = 𝑑𝑡 + 𝛽E𝑡 [𝑝𝑡+1 ]

With a constant, non-random dividend stream 𝑑𝑡 = 𝑑, this gives

𝑝𝑡 = 𝑑/(1 − 𝛽)

With a growing, non-random dividend process 𝑑𝑡+1 = 𝑔𝑑𝑡 , it gives

𝑝𝑡 = 𝑑𝑡 /(1 − 𝛽𝑔)
72.7.2 Exercise 2
In [10]: n = 5
P = 0.0125 * np.ones((n, n))
P += np.diag(0.95 - 0.0125 * np.ones(5))
s = np.array([0.95, 0.975, 1.0, 1.025, 1.05]) # state values
mc = qe.MarkovChain(P, state_values=s)
γ = 2.0
β = 0.94
ζ = 1.0
p_s = 150.0
In [12]: tree_price(apm)
In [13]: consol_price(apm, ζ)
72.7.3 Exercise 3

Here’s a suitable function, which iterates the recursion from Exercise 3 forward 𝑘 steps starting from 𝑤(𝑥, 0) = 0:

def finite_horizon_call_option(ap, ζ, p_s, k):
    """
    Computes k-period option value
    """
    # == Simplify names, set up matrices == #
    β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
    M = P * ap.g(y)**(- γ)

    # == Compute option price by iterating k times == #
    p = consol_price(ap, ζ)
    w = np.zeros(ap.n)
    for i in range(k):
        # Maximize across columns
        w = np.maximum(β * M @ w, p - p_s)

    return w
73.1 Contents
• Overview 73.2
• Exercises 73.4
• Solutions 73.5
In addition to what’s in Anaconda, this lecture will need the following libraries
73.2 Overview
1194 73. ASSET PRICING II: THE LUCAS ASSET PRICING MODEL
Lucas studied a pure exchange economy with a representative consumer (or household), where

• Pure exchange means that all endowments are exogenous
• Representative consumer means that either there is a single consumer, or all consumers have identical endowments and preferences

Either way, the assumption of a representative agent means that prices adjust to eradicate desires to trade
This makes it very easy to compute competitive equilibrium prices
We will assume that this endowment is Markovian, following the exogenous process

𝑦𝑡+1 = 𝐺(𝑦𝑡 , 𝜉𝑡+1 )

where {𝜉𝑡 } is IID. The representative consumer ranks consumption streams according to

E ∑_{𝑡=0}^∞ 𝛽^𝑡 𝑢(𝑐𝑡 ) (1)
Here 𝛽 ∈ (0, 1) is a fixed discount factor and 𝑢 is a strictly increasing, strictly concave, continuously differentiable period utility function
An asset is a claim on the endowment stream. The owner of a share at the end of period 𝑡 is entitled to

– 𝑦𝑡+1 and
– the right to sell the claim tomorrow at price 𝑝𝑡+1
Since this is a competitive model, the first step is to pin down consumer behavior, taking
prices as given
Next, we’ll impose equilibrium constraints and try to back out prices
In the consumer problem, the consumer’s control variable is the share 𝜋𝑡 of the claim held in
each period
Thus, the consumer problem is to maximize Eq. (1) subject to
𝑐𝑡 + 𝜋𝑡+1 𝑝𝑡 ≤ 𝜋𝑡 𝑦𝑡 + 𝜋𝑡 𝑝𝑡
• Since this is a competitive (read: price taking) model, the consumer will take this func-
tion 𝑝 as given
• In this way, we determine consumer behavior given 𝑝 and then use equilibrium condi-
tions to recover 𝑝
Using the assumption that price is a given function 𝑝 of 𝑦, we write the value function and constraint as

𝑣(𝜋, 𝑦) = max_{𝑐,𝜋′ } {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝜋′ , 𝐺(𝑦, 𝑧))𝜙(𝑑𝑧)}

subject to

𝑐 + 𝜋′ 𝑝(𝑦) ≤ 𝜋𝑦 + 𝜋𝑝(𝑦) (2)

We can invoke the fact that utility is increasing to claim equality in Eq. (2) and hence eliminate the constraint, obtaining

𝑣(𝜋, 𝑦) = max_{𝜋′ } {𝑢[𝜋(𝑦 + 𝑝(𝑦)) − 𝜋′ 𝑝(𝑦)] + 𝛽 ∫ 𝑣(𝜋′ , 𝐺(𝑦, 𝑧))𝜙(𝑑𝑧)} (3)
The solution to this dynamic programming problem is an optimal policy expressing either 𝜋′
or 𝑐 as a function of the state (𝜋, 𝑦)
• Each one determines the other, since 𝑐(𝜋, 𝑦) = 𝜋(𝑦 + 𝑝(𝑦)) − 𝜋′ (𝜋, 𝑦)𝑝(𝑦)
Next Steps
What we need to do now is determine equilibrium prices
It seems that to obtain these, we will have to
1. Solve this two-dimensional dynamic programming problem for the optimal policy
2. Impose equilibrium constraints
3. Solve out for the price function 𝑝(𝑦) directly
However, as Lucas showed, there is a related but more straightforward way to do this
Equilibrium Constraints
Since the consumption good is not storable, in equilibrium we must have 𝑐𝑡 = 𝑦𝑡 for all 𝑡
In addition, since there is one representative consumer (alternatively, since all consumers are
identical), there should be no trade in equilibrium
In particular, the representative consumer owns the whole tree in every period, so 𝜋𝑡 = 1 for
all 𝑡
Prices must adjust to satisfy these two constraints
The Equilibrium Price Function
Now observe that the first-order condition for Eq. (3) can be written as

𝑢′ (𝑐)𝑝(𝑦) = 𝛽 ∫ 𝑣1′ (𝜋′ , 𝐺(𝑦, 𝑧))𝜙(𝑑𝑧)

where 𝑣1′ is the derivative of 𝑣 with respect to its first argument

To obtain 𝑣1′ we can simply differentiate the right-hand side of Eq. (3) with respect to 𝜋, yielding

𝑣1′ (𝜋, 𝑦) = 𝑢′ (𝑐)(𝑦 + 𝑝(𝑦))

Next, we impose the equilibrium constraints while combining the last two equations to get

𝑝(𝑦) = 𝛽 ∫ (𝑢′ [𝐺(𝑦, 𝑧)]/𝑢′ (𝑦)) [𝐺(𝑦, 𝑧) + 𝑝(𝐺(𝑦, 𝑧))]𝜙(𝑑𝑧) (4)

In sequential rather than functional notation, we can also write this as

𝑝𝑡 = E𝑡 [𝛽 (𝑢′ (𝑐𝑡+1 )/𝑢′ (𝑐𝑡 )) (𝑦𝑡+1 + 𝑝𝑡+1 )] (5)
Equation (4) is a functional equation in the price function 𝑝. It is convenient to set

𝑓(𝑦) ∶= 𝑢′ (𝑦)𝑝(𝑦) (6)

so that Eq. (4) becomes

𝑓(𝑦) = ℎ(𝑦) + 𝛽 ∫ 𝑓[𝐺(𝑦, 𝑧)]𝜙(𝑑𝑧) (7)

Here ℎ(𝑦) ∶= 𝛽 ∫ 𝑢′ [𝐺(𝑦, 𝑧)]𝐺(𝑦, 𝑧)𝜙(𝑑𝑧) is a function that depends only on the primitives

Equation Eq. (7) is a functional equation in 𝑓

The plan is to solve out for 𝑓 and convert back to 𝑝 via Eq. (6)
To solve Eq. (7) we’ll use a standard method: convert it to a fixed point problem
First, we introduce the operator 𝑇 mapping 𝑓 into 𝑇 𝑓 as defined by

(𝑇 𝑓)(𝑦) = ℎ(𝑦) + 𝛽 ∫ 𝑓[𝐺(𝑦, 𝑧)]𝜙(𝑑𝑧) (8)
(Note: If you find the mathematics heavy going you can take 1–2 as given and skip to the
next section)
Recall the Banach contraction mapping theorem

It tells us that the previous statements will be true if we can find an 𝛼 < 1 such that

‖𝑇 𝑓 − 𝑇 𝑔‖ ≤ 𝛼‖𝑓 − 𝑔‖, ∀ 𝑓, 𝑔 ∈ 𝑐𝑏R+ (9)

(Here ‖𝑓‖ ∶= sup_{𝑥∈R+ } |𝑓(𝑥)|)
To see that Eq. (9) is valid, pick any 𝑓, 𝑔 ∈ 𝑐𝑏R+ and any 𝑦 ∈ R+
Observe that, since integrals get larger when absolute values are moved to the inside,

|𝑇 𝑓(𝑦) − 𝑇 𝑔(𝑦)| = |𝛽 ∫ 𝑓[𝐺(𝑦, 𝑧)]𝜙(𝑑𝑧) − 𝛽 ∫ 𝑔[𝐺(𝑦, 𝑧)]𝜙(𝑑𝑧)|
   ≤ 𝛽 ∫ |𝑓[𝐺(𝑦, 𝑧)] − 𝑔[𝐺(𝑦, 𝑧)]| 𝜙(𝑑𝑧)
   ≤ 𝛽 ∫ ‖𝑓 − 𝑔‖𝜙(𝑑𝑧)
   = 𝛽‖𝑓 − 𝑔‖
Since the right-hand side is an upper bound, taking the sup over all 𝑦 on the left-hand side
gives Eq. (9) with 𝛼 ∶= 𝛽
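The bound can also be checked numerically for a discretized version of 𝑇; everything below (the function h, the grid, the law of motion and the shock draws) is made up purely to exercise the inequality:

```python
import numpy as np

rng = np.random.default_rng(0)
β, α = 0.95, 0.9
grid = np.linspace(0.5, 2.0, 50)
z = rng.lognormal(sigma=0.1, size=200)   # shock draws
h = np.log(1 + grid)                     # an arbitrary bounded h

def T(f):
    # (Tf)(y) = h(y) + β E[f(G(y, z))] with G(y, z) = y^α z,
    # f approximated off-grid by linear interpolation
    return h + β * np.array([np.mean(np.interp(y**α * z, grid, f))
                             for y in grid])

f, g = np.sin(grid), np.cos(grid)
lhs = np.max(np.abs(T(f) - T(g)))
rhs = β * np.max(np.abs(f - g))
print(lhs <= rhs)    # the contraction bound ‖Tf - Tg‖ ≤ β‖f - g‖ holds
```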
The preceding discussion tells us that we can compute 𝑓 ∗ by picking any arbitrary 𝑓 ∈ 𝑐𝑏R+ and then iterating with 𝑇
The equilibrium price function 𝑝∗ can then be recovered by 𝑝∗ (𝑦) = 𝑓 ∗ (𝑦)/𝑢′ (𝑦)
Let’s try this when ln 𝑦𝑡+1 = 𝛼 ln 𝑦𝑡 + 𝜎𝜖𝑡+1 where {𝜖𝑡 } is IID and standard normal
Utility will take the isoelastic form 𝑢(𝑐) = 𝑐1−𝛾 /(1 − 𝛾), where 𝛾 > 0 is the coefficient of
relative risk aversion
We will set up a LucasTree class to hold parameters of the model

from scipy.stats import lognorm

class LucasTree:
    """
    Stores parameters of the Lucas tree model.
    """
    def __init__(self,
                 γ=2,            # CRRA utility parameter
                 β=0.95,         # Discount factor
                 α=0.90,         # Correlation coefficient
                 σ=0.1,          # Volatility coefficient
                 grid_size=100):

        self.γ, self.β, self.α, self.σ = γ, β, α, σ

        # Grid covering most of the mass of the stationary distribution
        # of the endowment process
        ssd = σ / np.sqrt(1 - α**2)
        self.grid = np.linspace(np.exp(-4 * ssd), np.exp(4 * ssd), grid_size)
        self.grid_size = grid_size

        # Shock draws (lognormal) and the function h evaluated on the grid
        self.draws = lognorm(σ).rvs(500)
        self.h = np.array([β * np.mean((y**α * self.draws)**(1 - γ))
                           for y in self.grid])
The following function takes an instance of the LucasTree and generates a jitted version of the Lucas operator

from numba import njit, prange

def operator_factory(tree, parallel_flag=True):
    """
    Returns approximate Lucas operator, which computes and returns the
    updated function Tf on the grid points.
    """
    grid, h = tree.grid, tree.h
    α, β = tree.α, tree.β
    z_vec = tree.draws

    @njit(parallel=parallel_flag)
    def T(f):
        """
        The Lucas operator
        """
        Tf = np.empty_like(f)
        Af = lambda x: np.interp(x, grid, f)   # linear interpolation of f

        # == Apply the T operator to f using Monte Carlo integration == #
        for i in prange(len(grid)):
            y = grid[i]
            Tf[i] = h[i] + β * np.mean(Af(y**α * z_vec))

        return Tf

    return T
To solve the model, we write a function that iterates using the Lucas operator to find the fixed point

def solve_model(tree, tol=1e-6, max_iter=500):
    """
    Compute the equilibrium price function associated with Lucas tree

    * tree is an instance of LucasTree
    """
    # == simplify notation == #
    grid, grid_size = tree.grid, tree.grid_size
    γ = tree.γ

    T = operator_factory(tree)

    i = 0
    f = np.ones_like(grid)  # Initial guess of f
    error = tol + 1
    while error > tol and i < max_iter:
        Tf = T(f)
        error = np.max(np.abs(Tf - f))
        f = Tf
        i += 1

    price = f * grid**γ  # Back out prices, p(y) = f(y) / u'(y) = f(y) y^γ

    return price
tree = LucasTree()
price_vals = solve_model(tree)

plt.figure(figsize=(12, 8))
plt.plot(tree.grid, price_vals, label='$p^*(y)$')
plt.xlabel('$y$')
plt.ylabel('price')
plt.legend()
plt.show()
We see that the price is increasing, even if we remove all serial correlation from the endowment process
The reason is that a larger current endowment reduces current marginal utility
The price must therefore rise to induce the household to consume the entire endowment (and
hence satisfy the resource constraint)
What happens with a more patient consumer?
Here the orange line corresponds to the previous parameters and the green line is price when
𝛽 = 0.98
We see that when consumers are more patient the asset becomes more valuable, and the price
of the Lucas tree shifts up
Exercise 1 asks you to replicate this figure
73.4 Exercises
73.4.1 Exercise 1
73.5 Solutions
73.5.1 Exercise 1
In [7]: fig, ax = plt.subplots(figsize=(10, 6))

        for β in (0.95, 0.98):
            tree = LucasTree(β=β)
            grid = tree.grid
            price_vals = solve_model(tree)
            ax.plot(grid, price_vals, lw=2, alpha=0.7, label=rf'$\beta = {β}$')

        ax.legend(loc='upper left')
        ax.set(xlabel='$y$', ylabel='price', xlim=(min(grid), max(grid)))
        plt.show()
74
74.1 Contents
• Overview 74.2
• Exercises 74.5
• Solutions 74.6
In addition to what’s in Anaconda, this lecture will need the following libraries
74.2 Overview
• heterogeneous beliefs
• incomplete markets
• short sales constraints, and possibly …
• (leverage) limits on an investor’s ability to borrow in order to finance purchases of a
risky asset
74.2.1 References
Prior to reading the following, you might like to review our lectures on
1204 74. ASSET PRICING III: INCOMPLETE MARKETS
• Markov chains
• Asset pricing with finite state space
74.2.2 Bubbles
The model simplifies by ignoring alterations in the distribution of wealth among investors
having different beliefs about the fundamentals that determine asset payouts
There is a fixed number 𝐴 of shares of an asset

Each share entitles its owner to a stream of dividends {𝑑𝑡 } governed by a Markov chain defined on a state space 𝑆 = {0, 1}

The dividend obeys

𝑑𝑡 = { 0 if 𝑠𝑡 = 0
     { 1 if 𝑠𝑡 = 1
The owner of a share at the beginning of time 𝑡 is entitled to the dividend paid at time 𝑡
The owner of the share at the beginning of time 𝑡 is also entitled to sell the share to another
investor during time 𝑡
Two types ℎ = 𝑎, 𝑏 of investors differ only in their beliefs about a Markov transition matrix 𝑃
with typical element
𝑃 (𝑖, 𝑗) = P{𝑠𝑡+1 = 𝑗 ∣ 𝑠𝑡 = 𝑖}
𝑃𝑎 = [ 1/2  1/2
       2/3  1/3 ]

𝑃𝑏 = [ 2/3  1/3
       1/4  3/4 ]
The stationary (i.e., invariant) distributions of these two matrices can be calculated as follows:

In [2]: import numpy as np
        import quantecon as qe

        qa = np.array([[1/2, 1/2], [2/3, 1/3]])
        qb = np.array([[2/3, 1/3], [1/4, 3/4]])
        mcA = qe.MarkovChain(qa)
        mcB = qe.MarkovChain(qb)
        mcA.stationary_distributions

In [3]: mcB.stationary_distributions
An owner of the asset at the end of time 𝑡 is entitled to the dividend at time 𝑡 + 1 and also
has the right to sell the asset at time 𝑡 + 1
Both types of investors are risk-neutral and both have the same fixed discount factor 𝛽 ∈
(0, 1)
In our numerical example, we’ll set 𝛽 = .75, just as Harrison and Kreps did
We’ll eventually study the consequences of two different assumptions about the number of
shares 𝐴 relative to the resources that our two types of investors can invest in the stock
1. Both types of investors have enough resources (either wealth or the capacity to borrow)
so that they can purchase the entire available stock of the asset [1]
2. No single type of investor has sufficient resources to purchase the entire stock
The above specifications of the perceived transition matrices 𝑃𝑎 and 𝑃𝑏 , taken directly from
Harrison and Kreps, build in stochastically alternating temporary optimism and pessimism
Remember that state 1 is the high dividend state
• In state 0, a type 𝑎 agent is more optimistic about next period’s dividend than a type 𝑏
agent
• In state 1, a type 𝑏 agent is more optimistic about next period’s dividend
However, the stationary distributions 𝜋𝐴 = [.57 .43] and 𝜋𝐵 = [.43 .57] tell us that a type 𝑏 person is more optimistic about the dividend process in the long run than is a type 𝑎 person
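The stationary distributions quoted above can be reproduced directly with NumPy, by solving 𝜋𝑃 = 𝜋 together with the normalization that 𝜋 sums to one:

```python
import numpy as np

def stationary(P):
    n = P.shape[0]
    # Stack (P' - I)π = 0 with the normalization 1'π = 1; solve by least squares
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1
    return np.linalg.lstsq(A, b, rcond=None)[0]

P_a = np.array([[1/2, 1/2], [2/3, 1/3]])
P_b = np.array([[2/3, 1/3], [1/4, 3/4]])
print(stationary(P_a).round(2))   # ≈ [0.57, 0.43]
print(stationary(P_b).round(2))   # ≈ [0.43, 0.57]
```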
Transition matrices for the temporarily optimistic and pessimistic investors are constructed as
follows
Temporarily optimistic investors (i.e., the investor with the most optimistic beliefs in each
state) believe the transition matrix
𝑃𝑜 = [ 1/2  1/2
       1/4  3/4 ]

Temporarily pessimistic investors (i.e., the investor with the most pessimistic beliefs in each state) believe the transition matrix

𝑃𝑝 = [ 2/3  1/3
       2/3  1/3 ]
74.3.4 Information
Investors know a price function mapping the state 𝑠𝑡 at 𝑡 into the equilibrium price 𝑝(𝑠𝑡 ) that
prevails in that state
This price function is endogenous and to be determined below
When investors choose whether to purchase or sell the asset at 𝑡, they also know 𝑠𝑡
1. All agents share the same beliefs, so that equilibrium prices reflect a single belief (either 𝑃𝑎 , 𝑃𝑏 , 𝑃𝑜 or 𝑃𝑝 )

2. There are two types of agents differentiated only by their beliefs. Each type of agent has sufficient resources to purchase all of the asset (Harrison and Kreps’s setting)

3. There are two types of agents with different beliefs, but because of limited wealth and/or limited leverage, both types of investors hold the asset each period
The following table gives a summary of the findings obtained in the remainder of the lecture
(you will be asked to recreate the table in an exercise)
It records implications of Harrison and Kreps’s specifications of 𝑃𝑎 , 𝑃𝑏 , 𝛽
𝑠𝑡 0 1
𝑝𝑎 1.33 1.22
𝑝𝑏 1.45 1.91
𝑝𝑜 1.85 2.08
𝑝𝑝 1 1
𝑝𝑎̂ 1.85 1.69
𝑝𝑏̂ 1.69 2.08
Here

• 𝑝𝑎 is the equilibrium price under homogeneous beliefs 𝑃𝑎
• 𝑝𝑏 is the equilibrium price under homogeneous beliefs 𝑃𝑏
• 𝑝𝑜 is the equilibrium price under homogeneous beliefs 𝑃𝑜
• 𝑝𝑝 is the equilibrium price under homogeneous beliefs 𝑃𝑝
• 𝑝𝑎̂ is the amount type 𝑎 investors are willing to pay for the asset
• 𝑝𝑏̂ is the amount type 𝑏 investors are willing to pay for the asset

We’ll explain these values and how they are calculated one row at a time
[𝑝ℎ (0), 𝑝ℎ (1)]′ = 𝛽[𝐼 − 𝛽𝑃ℎ ]−1 𝑃ℎ [0, 1]′ (1)
The first two rows of the table report 𝑝𝑎 (𝑠) and 𝑝𝑏 (𝑠)
Here’s a function that can be used to compute these values
In [4]: import scipy.linalg as la

        def price_single_beliefs(transition, dividend_payoff, β=.75):
            """
            Function to solve for prices under a single belief
            """
            # First compute the inverse piece
            imbq_inv = la.inv(np.eye(transition.shape[0]) - β * transition)

            # Next compute prices
            prices = β * imbq_inv @ transition @ dividend_payoff

            return prices
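As a standalone check against the first two rows of the summary table:

```python
import numpy as np

β = 0.75
d = np.array([0, 1])                       # dividend in each state
P_a = np.array([[1/2, 1/2], [2/3, 1/3]])
P_b = np.array([[2/3, 1/3], [1/4, 3/4]])

def single_belief_price(P):
    # Eq. (1): p = β(I - βP)^{-1} P d
    return β * np.linalg.solve(np.eye(2) - β * P, P @ d)

print(single_belief_price(P_a).round(2))   # ≈ [1.33, 1.22]
print(single_belief_price(P_b).round(2))   # ≈ [1.45, 1.91]
```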
• 𝑝ℎ (𝑠) tells what investor ℎ thinks is the “fundamental value” of the asset
• Here “fundamental value” means the expected discounted present value of future divi-
dends
We will compare these fundamental values of the asset with equilibrium values when traders
have different beliefs
p̄(𝑠) = 𝛽 max {𝑃𝑎 (𝑠, 0)p̄(0) + 𝑃𝑎 (𝑠, 1)(1 + p̄(1)), 𝑃𝑏 (𝑠, 0)p̄(0) + 𝑃𝑏 (𝑠, 1)(1 + p̄(1))} (2)

for 𝑠 = 0, 1
The marginal investor who prices the asset in state 𝑠 is of type 𝑎 if

𝑃𝑎 (𝑠, 0)p̄(0) + 𝑃𝑎 (𝑠, 1)(1 + p̄(1)) > 𝑃𝑏 (𝑠, 0)p̄(0) + 𝑃𝑏 (𝑠, 1)(1 + p̄(1))

and is of type 𝑏 if

𝑃𝑎 (𝑠, 0)p̄(0) + 𝑃𝑎 (𝑠, 1)(1 + p̄(1)) < 𝑃𝑏 (𝑠, 0)p̄(0) + 𝑃𝑏 (𝑠, 1)(1 + p̄(1))
𝑝̄𝑗+1 (𝑠) = 𝛽 max {𝑃𝑎 (𝑠, 0)𝑝̄𝑗 (0) + 𝑃𝑎 (𝑠, 1)(1 + 𝑝̄𝑗 (1)), 𝑃𝑏 (𝑠, 0)𝑝̄𝑗 (0) + 𝑃𝑏 (𝑠, 1)(1 + 𝑝̄𝑗 (1))} (3)
for 𝑠 = 0, 1
The third row of the table reports equilibrium prices that solve the functional equation when
𝛽 = .75
Here the type that is optimistic about 𝑠𝑡+1 prices the asset in state 𝑠𝑡
It is instructive to compare these prices with the equilibrium prices for the homogeneous belief economies that solve under beliefs 𝑃𝑎 and 𝑃𝑏
Equilibrium prices p̄ in the heterogeneous beliefs economy exceed what any prospective investor regards as the fundamental value of the asset in each possible state

Nevertheless, the economy recurrently visits a state that makes each investor want to purchase the asset for more than he believes its future dividends are worth

The reason is that he expects to have the option to sell the asset later to another investor who will value the asset more highly than he will
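A compact sketch of this computation, iterating on the maximum of the two belief types' valuations, reproduces the heterogeneous-beliefs prices in the table:

```python
import numpy as np

β = 0.75
d = np.array([0, 1])
P_a = np.array([[1/2, 1/2], [2/3, 1/3]])
P_b = np.array([[2/3, 1/3], [1/4, 3/4]])

p = np.zeros(2)
for _ in range(10000):
    # In each state, the more optimistic valuation prices the asset
    p_new = β * np.maximum(P_a @ (d + p), P_b @ (d + p))
    if np.max(np.abs(p_new - p)) < 1e-12:
        break
    p = p_new

print(p.round(2))   # ≈ [1.85, 2.08], matching the table
```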
• Investors of type 𝑎 are willing to pay the following price for the asset

  p̂𝑎 (𝑠) = { p̄(0)                                     if 𝑠𝑡 = 0
           { 𝛽(𝑃𝑎 (1, 0)p̄(0) + 𝑃𝑎 (1, 1)(1 + p̄(1)))   if 𝑠𝑡 = 1

• Investors of type 𝑏 are willing to pay the following price for the asset

  p̂𝑏 (𝑠) = { 𝛽(𝑃𝑏 (0, 0)p̄(0) + 𝑃𝑏 (0, 1)(1 + p̄(1)))   if 𝑠𝑡 = 0
           { p̄(1)                                     if 𝑠𝑡 = 1
• The asset changes hands whenever the state changes from 0 to 1 or from 1 to 0
• The valuations 𝑝𝑎̂ (𝑠) and 𝑝𝑏̂ (𝑠) are displayed in the fourth and fifth rows of the table
• Even the pessimistic investors who don’t buy the asset think that it is worth more than
they think future dividends are worth
Here’s code to solve for p̄, p̂𝑎 and p̂𝑏 using the iterative method described above

def price_optimistic_beliefs(transitions, dividend_payoff, β=.75,
                             max_iter=50000, tol=1e-16):
    """
    Function to solve for prices when the marginal investor is the
    (temporarily) optimistic type
    """
    # Guess an initial price vector of [0, 0]
    p_new = np.array([[0.], [0.]])

    for i in range(max_iter):
        p_old = p_new
        # Take the maximum valuation across belief types, state by state
        p_new = β * np.max([q @ (p_old + dividend_payoff)
                            for q in transitions], axis=0)

        if np.max(np.abs(p_new - p_old)) < tol:
            break

    # The price each type is willing to pay is the minimum valuation, given p̄
    ptwiddle = β * np.min([q @ (p_old + dividend_payoff)
                           for q in transitions], axis=0)

    phat_a = np.array([p_new[0], ptwiddle[1]])
    phat_b = np.array([ptwiddle[0], p_new[1]])

    return p_new, phat_a, phat_b
Outcomes differ when the more optimistic type of investor has insufficient wealth — or insufficient ability to borrow enough — to hold the entire stock of the asset
In this case, the asset price must adjust to attract pessimistic investors
Instead of equation Eq. (2), the equilibrium price satisfies
p̌(𝑠) = 𝛽 min {𝑃𝑎 (𝑠, 0)p̌(0) + 𝑃𝑎 (𝑠, 1)(1 + p̌(1)), 𝑃𝑏 (𝑠, 0)p̌(0) + 𝑃𝑏 (𝑠, 1)(1 + p̌(1))} (4)
and the marginal investor who prices the asset is always the one that values it less highly
than does the other type
Now the marginal investor is always the (temporarily) pessimistic type
Notice from the sixth row of the table that the pessimistic price p̌ is lower than the homogeneous belief prices 𝑝𝑎 and 𝑝𝑏 in both states
When pessimistic investors price the asset according to Eq. (4), optimistic investors think
that the asset is underpriced
If they could, optimistic investors would willingly borrow at the one-period gross interest rate
𝛽 −1 to purchase more of the asset
Implicit constraints on leverage prohibit them from doing so
When optimistic investors price the asset as in equation Eq. (2), pessimistic investors think
that the asset is overpriced and would like to sell the asset short
Constraints on short sales prevent that
Here’s code to solve for 𝑝̌ using iteration
def price_pessimistic_beliefs(transitions, dividend_payoff, β=.75,
                              max_iter=50000, tol=1e-16):
    """
    Function to solve for prices when the marginal investor is the
    pessimistic type
    """
    # Guess an initial price vector of [0, 0]
    p_new = np.array([[0.], [0.]])

    for i in range(max_iter):
        p_old = p_new
        # Take the minimum valuation across belief types, state by state
        p_new = β * np.min([q @ (p_old + dividend_payoff)
                            for q in transitions], axis=0)

        if np.max(np.abs(p_new - p_old)) < tol:
            break

    return p_new
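A standalone version of the same iteration, with the Harrison-Kreps matrices hard-coded, converges to (1, 1), matching the 𝑝𝑝 row of the table:

```python
import numpy as np

β = 0.75
d = np.array([0, 1])
P_a = np.array([[1/2, 1/2], [2/3, 1/3]])
P_b = np.array([[2/3, 1/3], [1/4, 3/4]])

p = np.zeros(2)
for _ in range(1000):
    # Eq. (4): the marginal investor is the one with the lower valuation
    p = β * np.minimum(P_a @ (d + p), P_b @ (d + p))

print(p.round(2))   # ≈ [1., 1.]
```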
• Compared to the homogeneous beliefs setting that leads to the pricing formula Eq. (1), high volume occurs when the Harrison-Kreps pricing formula Eq. (2) prevails
Type 𝑎 investors sell the entire stock of the asset to type 𝑏 investors every time the state
switches from 𝑠𝑡 = 0 to 𝑠𝑡 = 1
Type 𝑏 investors sell the asset to type 𝑎 investors every time the state switches from 𝑠𝑡 = 1 to
𝑠𝑡 = 0
Scheinkman takes this as a strength of the model because he observes high volume during
famous bubbles
• If the supply of the asset is increased sufficiently either physically (more “houses” are
built) or artificially (ways are invented to short sell “houses”), bubbles end when the
supply has grown enough to outstrip optimistic investors’ resources for purchasing the
asset
• If optimistic investors finance purchases by borrowing, tightening leverage constraints
can extinguish a bubble
74.5 Exercises
74.5.1 Exercise 1
Recreate the summary table using the functions we have built above
𝑠𝑡 0 1
𝑝𝑎 1.33 1.22
𝑝𝑏 1.45 1.91
𝑝𝑜 1.85 2.08
𝑝𝑝 1 1
𝑝𝑎̂ 1.85 1.69
𝑝𝑏̂ 1.69 2.08
You will first need to define the transition matrices and dividend payoff vector
74.6 Solutions
74.6.1 Exercise 1
First, we will obtain equilibrium price vectors with homogeneous beliefs, including when all
investors are optimistic or pessimistic
p_a
====================
State 0: [1.33]
State 1: [1.22]
--------------------
p_b
====================
State 0: [1.45]
State 1: [1.91]
--------------------
p_optimistic
====================
State 0: [1.85]
State 1: [2.08]
--------------------
p_pessimistic
====================
State 0: [1.]
State 1: [1.]
--------------------
We will use the price_optimistic_beliefs function to find the price under heterogeneous be-
liefs
p_optimistic
====================
State 0: [1.85]
State 1: [2.08]
--------------------
p_hat_a
====================
State 0: [1.85]
State 1: [1.69]
--------------------
p_hat_b
====================
State 0: [1.69]
State 1: [2.08]
--------------------
Notice that the equilibrium price with heterogeneous beliefs is equal to the price under single beliefs with optimistic investors. This is due to the marginal investor being the temporarily optimistic type
Footnotes
[1] By assuming that both types of agents always have “deep enough pockets” to purchase
all of the asset, the model takes wealth dynamics off the table. The Harrison-Kreps model
generates high trading volume when the state changes either from 0 to 1 or from 1 to 0.
75
75.1 Contents
• Overview 75.2
• Appendix 75.3
75.2 Overview
The famous Black-Litterman (1992) [19] portfolio choice model that we describe in this lecture is motivated by the finding that, with high or moderate frequency data, means are more difficult to estimate than variances
A model of robust portfolio choice that we’ll describe also begins from the same starting
point
To begin, we’ll take for granted that means are more difficult to estimate than covariances and will focus on how Black and Litterman, on the one hand, and robust control theorists, on the other, would recommend modifying the mean-variance portfolio choice model to take that into account
At the end of this lecture, we shall use some rate-of-convergence results and some simulations to verify that means are indeed more difficult to estimate than variances
Among the ideas in play in this lecture will be
1216 75. TWO MODIFICATIONS OF MEAN-VARIANCE PORTFOLIO THEORY
This lecture describes two lines of thought that modify the classic mean-variance portfolio
choice model in ways designed to make its recommendations more plausible
As we mentioned above, the two approaches build on a common and widespread hunch – that because it is much easier statistically to estimate covariances of excess returns than it is to estimate their means, it makes sense to contemplate the consequences of adjusting investors’ subjective beliefs about mean returns in order to render more sensible decisions
Both of the adjustments that we describe are designed to confront a widely recognized embarrassment to mean-variance portfolio theory, namely, that it usually implies taking very extreme long-short portfolio positions
𝑟 ⃗ − 𝑟𝑓 1 ∼ 𝒩(𝜇, Σ)
or
𝑟 ⃗ − 𝑟𝑓 1 = 𝜇 + 𝐶𝜖
𝑤′ (𝑟 ⃗ − 𝑟𝑓 1) ∼ 𝒩(𝑤′ 𝜇, 𝑤′ Σ𝑤)
𝑈 (𝜇, Σ; 𝑤) = 𝑤′ 𝜇 − (𝛿/2) 𝑤′ Σ𝑤 (1)
where 𝛿 > 0 is a risk-aversion parameter. The first-order condition for maximizing Eq. (1)
with respect to the vector 𝑤 is
𝜇 = 𝛿Σ𝑤
𝑤 = (𝛿Σ)−1 𝜇 (2)
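A two-asset sketch of rule Eq. (2), with made-up inputs:

```python
import numpy as np

δ = 2.0                          # risk-aversion parameter (illustrative)
μ = np.array([0.03, 0.05])       # mean excess returns (illustrative)
Σ = np.array([[0.04, 0.01],
              [0.01, 0.09]])     # covariance of excess returns (illustrative)

w = np.linalg.solve(δ * Σ, μ)    # w = (δΣ)^{-1} μ
print(np.allclose(δ * Σ @ w, μ)) # w satisfies the first-order condition μ = δΣw
```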
The key inputs into the portfolio choice model Eq. (2) are estimates of the mean 𝜇 and covariance matrix Σ of excess returns, together with a value of the risk-aversion parameter 𝛿
When estimates of 𝜇 and Σ from historical sample means and covariances have been combined with reasonable values of the risk-aversion parameter 𝛿 to compute an optimal portfolio from formula Eq. (2), a typical outcome has been 𝑤’s with extreme long and short positions

A common reaction to these outcomes is that they are so unreasonable that a portfolio manager cannot recommend them to a customer
In [2]: np.random.seed(12)

        N = 10    # Number of assets
        T = 200   # Sample size

        # Random market portfolio (normalized to sum to one)
        w_m = np.random.rand(N)
        w_m = w_m / w_m.sum()

        # True risk premia and variance of excess return (constructed so that the Sharpe ratio is 1)
        μ = (np.random.randn(N) + 5) / 100    # Mean excess return (risk premium)
        S = np.random.randn(N, N)             # Random matrix for the covariance matrix
        V = S @ S.T                           # Turn the random matrix into symmetric psd
        Σ = V * (w_m @ μ)**2 / (w_m @ V @ w_m)    # Make sure that the Sharpe ratio is one

        # Generate a sample of excess returns (stat is scipy.stats, imported earlier)
        excess_return = stat.multivariate_normal(μ, Σ)
        sample = excess_return.rvs(T)

        # Estimate μ and Σ
        μ_est = sample.mean(0).reshape(N, 1)
        Σ_est = np.cov(sample.T)
• They continue to accept Eq. (2) as a good model for choosing an optimal portfolio 𝑤
• They want to continue to allow the customer to express his or her risk tolerance by set-
ting 𝛿
• Leaving Σ at its maximum-likelihood value, they push 𝜇 away from its maximum-likelihood value in a way designed to make portfolio choices that are more plausible in terms of conforming to what most people actually do
In particular, given Σ and a reasonable value of 𝛿, Black and Litterman reverse engineered
a vector 𝜇𝐵𝐿 of mean excess returns that makes the 𝑤 implied by formula Eq. (2) equal the
actual market portfolio 𝑤𝑚 , so that
𝑤𝑚 = (𝛿Σ)−1 𝜇𝐵𝐿
75.2.6 Details
Let’s define
𝑤𝑚′ 𝜇 ≡ (𝑟𝑚 − 𝑟𝑓 )

as the excess return on the market portfolio 𝑤𝑚

Define

𝜎^2 = 𝑤𝑚′ Σ𝑤𝑚

as the variance of the excess return on the market portfolio and

SR𝑚 = (𝑟𝑚 − 𝑟𝑓 )/𝜎

as the Sharpe-ratio on the market portfolio 𝑤𝑚
Let 𝛿𝑚 be the value of the risk aversion parameter that induces an investor to hold the market portfolio in light of the optimal portfolio choice rule Eq. (2)

Evidently, portfolio rule Eq. (2) then implies that 𝑟𝑚 − 𝑟𝑓 = 𝛿𝑚 𝜎^2 or

𝛿𝑚 = (𝑟𝑚 − 𝑟𝑓 )/𝜎^2

or

𝛿𝑚 = SR𝑚 /𝜎
Following the Black-Litterman philosophy, our first step will be to back out a value of 𝛿𝑚 from an estimate of the Sharpe ratio SR𝑚 and an estimate of the volatility 𝜎 of the market portfolio

The second key Black-Litterman step is then to use this value of 𝛿 together with the maximum likelihood estimate of Σ to deduce a 𝜇BL that verifies portfolio rule Eq. (2) at the market portfolio 𝑤 = 𝑤𝑚
𝜇𝑚 = 𝛿𝑚 Σ𝑤𝑚
The starting point of the Black-Litterman portfolio choice model is thus a pair (𝛿𝑚 , 𝜇𝑚 ) that
tells the customer to hold the market portfolio
# Observed mean excess market return
r_m = w_m @ μ_est

# Estimated variance of the market portfolio
σ_m = w_m @ Σ_est @ w_m

# Sharpe-ratio
SR_m = r_m / np.sqrt(σ_m)

# Risk aversion of the market portfolio holder
d_m = r_m / σ_m

# Derive the "view" that would induce holding the market portfolio
μ_m = (d_m * Σ_est @ w_m).reshape(N, 1)
Black and Litterman start with a baseline customer who asserts that he or she shares the
market’s views, which means that he or she believes that excess returns are governed by
𝑟 ⃗ − 𝑟𝑓 1 ∼ 𝒩(𝜇𝐵𝐿 , Σ) (3)
Black and Litterman would advise that customer to hold the market portfolio of risky securities
Black and Litterman then imagine a consumer who would like to express a view that differs
from the market’s
The consumer wants appropriately to mix his view with the market’s before using Eq. (2) to
choose a portfolio
Suppose that the customer’s view is expressed by a hunch that rather than Eq. (3), excess
returns are governed by
𝑟 ⃗ − 𝑟𝑓 1 ∼ 𝒩(𝜇,̂ 𝜏 Σ)
where 𝜏 > 0 is a scalar parameter that determines how the decision maker wants to mix his
view 𝜇̂ with the market’s view 𝜇BL
Black and Litterman would then use a formula like the following one to mix the views 𝜇̂ and 𝜇BL

𝜇̃ = (Σ−1 + (𝜏 Σ)−1 )−1 (Σ−1 𝜇𝐵𝐿 + (𝜏 Σ)−1 𝜇̂) (4)

Black and Litterman would then advise the customer to hold the portfolio associated with these views implied by rule Eq. (2):
𝑤̃ = (𝛿Σ)−1 𝜇̃
This portfolio 𝑤̃ will deviate from the portfolio 𝑤𝐵𝐿 in amounts that depend on the mixing
parameter 𝜏 .
If 𝜇̂ is the maximum likelihood estimator and 𝜏 is chosen so as to weight this view heavily, then the customer’s portfolio will involve big short-long positions
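For proportional covariances, the precision-weighted mixture of the two views reduces to a simple convex combination — a small sketch with made-up numbers (the λ = 1 case of Eq. (6)):

```python
import numpy as np

Σ = np.array([[0.04, 0.01],
              [0.01, 0.09]])              # illustrative covariance matrix
μ_BL = np.array([0.03, 0.05])             # market-implied view (illustrative)
μ_hat = np.array([0.06, 0.02])            # the customer's own view (illustrative)

def mix(τ):
    # Precision-weighted average of the two views
    Σ_inv, τΣ_inv = np.linalg.inv(Σ), np.linalg.inv(τ * Σ)
    return np.linalg.solve(Σ_inv + τΣ_inv, Σ_inv @ μ_BL + τΣ_inv @ μ_hat)

# Because the two covariances are proportional, the mixture equals the convex
# combination (τ μ_BL + μ_hat) / (1 + τ), so it moves along a straight line
for τ in (0.5, 1.0, 2.0):
    print(np.allclose(mix(τ), (τ * μ_BL + μ_hat) / (1 + τ)))
```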
The function black_litterman implements the mixing formula; with λ = 1 it reproduces Eq. (4)

def black_litterman(λ, μ1, μ2, Σ1, Σ2):
    """
    Calculates the Black-Litterman mixture mean excess return
    """
    Σ1_inv = np.linalg.inv(Σ1)
    Σ2_inv = np.linalg.inv(Σ2)

    μ_tilde = np.linalg.solve(Σ1_inv + λ * Σ2_inv,
                              Σ1_inv @ μ1 + λ * Σ2_inv @ μ2)
    return μ_tilde

τ = 1
μ_tilde = black_litterman(1, μ_m, μ_est, Σ_est, τ * Σ_est)
@interact(τ=τ_slider)
def BL_plot(τ):
μ_tilde = black_litterman(1, μ_m, μ_est, Σ_est, τ * Σ_est)
w_tilde = np.linalg.solve(δ * Σ_est, μ_tilde)
𝜇 ∼ 𝒩(𝜇𝐵𝐿 , Σ)
Given a particular realization of the mean excess returns 𝜇 one observes the average excess
returns 𝜇̂ on the market according to the distribution
𝜇̂ ∣ 𝜇, Σ ∼ 𝒩(𝜇, 𝜏 Σ)
where 𝜏 is typically small capturing the idea that the variation in the mean is smaller than
the variation of the individual random variable
Given the realized excess returns, one should then update the prior over the mean excess returns according to Bayes’ rule

The corresponding posterior over mean excess returns is normally distributed with mean

(Σ−1 + (𝜏 Σ)−1 )−1 (Σ−1 𝜇𝐵𝐿 + (𝜏 Σ)−1 𝜇̂)

Hence, the Black-Litterman recommendation is consistent with the Bayes update of the prior over the mean excess returns in light of the realized average excess returns on the market
75.2. OVERVIEW 1223
𝑟𝑒⃗ ∼ 𝒩(𝜇𝐵𝐿 , Σ)
and
𝑟𝑒⃗ ∼ 𝒩(𝜇,̂ 𝜏 Σ)
A special feature of the multivariate normal random variable 𝑍 is that its density function depends only on the (Euclidean) length of its realization 𝑧

Formally, let the 𝑘-dimensional random vector be

𝑍 ∼ 𝒩(𝜇, Σ)

then

𝑍 ̄ ≡ Σ^(−1/2) (𝑍 − 𝜇) ∼ 𝒩(0, 𝐼)
and so the points where the density takes the same value can be described by the ellipse

𝑧̄ ′ 𝑧̄ = (𝑧 − 𝜇)′ Σ−1 (𝑧 − 𝜇) = 𝑑 ̄

where 𝑑 ̄ ≥ 0 is a constant
Remark: More generally there is a class of density functions that possesses this
feature, i.e.
This property is called spherical symmetry (see p. 81 in Leamer (1978) [83])
In our specific example, we can use the pair (𝑑1̄ , 𝑑2̄ ) as being two “likelihood” values for which the corresponding iso-likelihood ellipses in the excess return space are given by

(𝑟𝑒⃗ − 𝜇𝐵𝐿 )′ Σ−1 (𝑟𝑒⃗ − 𝜇𝐵𝐿 ) = 𝑑1̄

(𝑟𝑒⃗ − 𝜇̂)′ (𝜏 Σ)−1 (𝑟𝑒⃗ − 𝜇̂) = 𝑑2̄
Notice that for particular 𝑑1̄ and 𝑑2̄ values the two ellipses have a tangency point
These tangency points, indexed by the pairs (𝑑1̄ , 𝑑2̄ ), characterize points 𝑟𝑒⃗ from which there
exists no deviation where one can increase the likelihood of one view without decreasing the
likelihood of the other view
The pairs (𝑑1̄ , 𝑑2̄ ) for which there is such a point outlines a curve in the excess return space.
This curve is reminiscent of the Pareto curve in an Edgeworth-box setting
Dickey (1975) [35] calls it a curve decolletage
Leamer (1978) [83] calls it an information contract curve and describes it by the following
program: maximize the likelihood of one view, say the Black-Litterman recommendation
while keeping the likelihood of the other view at least at a prespecified constant 𝑑2̄
$$\vec{r}^e = \left(\Sigma^{-1} + \lambda(\tau\Sigma)^{-1}\right)^{-1}\left(\Sigma^{-1}\mu_{BL} + \lambda(\tau\Sigma)^{-1}\hat{\mu}\right) \tag{6}$$

Note that if 𝜆 = 1, Eq. (6) is equivalent to Eq. (4) and it identifies one point on the information contract curve.
Furthermore, because 𝜆 is a function of the minimum likelihood $\bar{d}_2$ on the RHS of the constraint, by varying $\bar{d}_2$ (or 𝜆), we can trace out the whole curve as the figure below illustrates
In [5]: np.random.seed(1987102)
N = 2 # Number of assets
T = 200 # Sample size
τ = 0.8
μ = (np.random.randn(N) + 5) / 100
S = np.random.randn(N, N)
V = S @ S.T
Σ = V * (w_m @ μ)**2 / (w_m @ V @ w_m)
excess_return = stat.multivariate_normal(μ, Σ)
sample = excess_return.rvs(T)
μ_est = sample.mean(0).reshape(N, 1)
Σ_est = np.cov(sample.T)
@interact(λ=λ_slider)
def decolletage(λ):
dist_r_BL = stat.multivariate_normal(μ_m.squeeze(), Σ_est)
dist_r_hat = stat.multivariate_normal(μ_est.squeeze(), τ * Σ_est)
X, Y = np.meshgrid(r1, r2)
Z_BL = np.zeros((N_r1, N_r2))
Z_hat = np.zeros((N_r1, N_r2))
for i in range(N_r1):
for j in range(N_r2):
Z_BL[i, j] = dist_r_BL.pdf(np.hstack([X[i, j], Y[i, j]]))
Z_hat[i, j] = dist_r_hat.pdf(np.hstack([X[i, j], Y[i, j]]))
Note that the path connecting the two points 𝜇̂ and 𝜇𝐵𝐿 is a straight line, which comes from the fact that the covariance matrices of the two competing distributions (views) are proportional to each other
To illustrate that this is not necessarily the case, consider another example using the same parameter values, except that the “second view” constituting the constraint has covariance matrix 𝜏 𝐼 instead of 𝜏 Σ
This leads to the following figure, on which the curve connecting 𝜇̂ and 𝜇𝐵𝐿 is bent
@interact(λ=λ_slider)
def decolletage(λ):
dist_r_BL = stat.multivariate_normal(μ_m.squeeze(), Σ_est)
dist_r_hat = stat.multivariate_normal(μ_est.squeeze(), τ * np.eye(N))
X, Y = np.meshgrid(r1, r2)
Z_BL = np.zeros((N_r1, N_r2))
Z_hat = np.zeros((N_r1, N_r2))
for i in range(N_r1):
for j in range(N_r2):
Z_BL[i, j] = dist_r_BL.pdf(np.hstack([X[i, j], Y[i, j]]))
Z_hat[i, j] = dist_r_hat.pdf(np.hstack([X[i, j], Y[i, j]]))
$$\hat{\beta}_{OLS} = (X'X)^{-1}X'y$$

$$\mathrm{mse}(\hat{\beta}_{OLS}, \beta_0) := \mathbb{E}\|\hat{\beta}_{OLS} - \beta_0\|^2 = \underbrace{\mathbb{E}\|\hat{\beta}_{OLS} - \mathbb{E}\hat{\beta}_{OLS}\|^2}_{\text{variance}} + \underbrace{\|\mathbb{E}\hat{\beta}_{OLS} - \beta_0\|^2}_{\text{bias}}$$
From this decomposition, one can see that in order for the MSE to be small, both the bias
and the variance terms must be small
For example, consider the case when 𝑋 is a 𝑇-vector of ones (where 𝑇 is the sample size), so that $\hat{\beta}_{OLS}$ is simply the sample average, while $\beta_0 \in \mathbb{R}$ is defined by the true mean of 𝑦
In this example the MSE is

$$\mathrm{mse}(\hat{\beta}_{OLS}, \beta_0) = \underbrace{\frac{1}{T^2}\,\mathbb{E}\left(\sum_{t=1}^{T}(y_t - \beta_0)\right)^2}_{\text{variance}} + \underbrace{0}_{\text{bias}}$$
However, because there is a trade-off between the estimator's bias and variance, there are cases when, by permitting a small bias, we can substantially reduce the variance so that overall the MSE gets smaller
A typical scenario when this proves useful is when the number of coefficients to be estimated is large relative to the sample size
In these cases, one approach to handling the bias-variance trade-off is the so-called Tikhonov regularization
A general form with regularization matrix Γ can be written as

$$\min_{\beta}\left\{\|X\beta - y\|^2 + \|\Gamma(\beta - \tilde{\beta})\|^2\right\}$$

which yields the solution

$$\hat{\beta}_{Reg} = (X'X + \Gamma'\Gamma)^{-1}(X'y + \Gamma'\Gamma\tilde{\beta})$$
Substituting the value of $\hat{\beta}_{OLS}$ yields

$$\hat{\beta}_{Reg} = (X'X + \Gamma'\Gamma)^{-1}(X'X\hat{\beta}_{OLS} + \Gamma'\Gamma\tilde{\beta})$$
Often, the regularization matrix takes the form Γ = 𝜆𝐼 with 𝜆 > 0 and 𝛽 ̃ = 0
Then the Tikhonov regularization is equivalent to what is called ridge regression in statistics
To illustrate how this estimator addresses the bias-variance trade-off, we compute the MSE of
the ridge estimator
$$\mathrm{mse}(\hat{\beta}_{ridge}, \beta_0) = \underbrace{\frac{1}{(T+\lambda)^2}\,\mathbb{E}\left(\sum_{t=1}^{T}(y_t - \beta_0)\right)^2}_{\text{variance}} + \underbrace{\left(\frac{\lambda}{T+\lambda}\right)^2 \beta_0^2}_{\text{bias}}$$
The ridge regression shrinks the coefficients of the estimated vector towards zero relative to the OLS estimates, thus reducing the variance term at the cost of introducing a “small” bias
However, there is nothing special about the zero vector
When 𝛽 ̃ ≠ 0 shrinkage occurs in the direction of 𝛽 ̃
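A minimal sketch of this shrinkage effect, using simulated data and hypothetical values for 𝜆, the true coefficients, and the target 𝛽̃ (here zero, which gives ridge regression):

```python
import numpy as np

np.random.seed(42)
T, k = 50, 3
X = np.random.randn(T, k)
β_true = np.array([1.0, -2.0, 0.5])        # hypothetical coefficients
y = X @ β_true + np.random.randn(T)

β_ols = np.linalg.solve(X.T @ X, X.T @ y)

β_tilde = np.zeros(k)        # shrinkage target
λ = 10.0
Γ = np.sqrt(λ) * np.eye(k)   # Γ'Γ = λI corresponds to ridge regression

β_reg = np.linalg.solve(X.T @ X + Γ.T @ Γ,
                        X.T @ y + Γ.T @ Γ @ β_tilde)

# the regularized estimate is pulled from β̂_OLS toward β̃
print(np.linalg.norm(β_reg - β_tilde) < np.linalg.norm(β_ols - β_tilde))  # True
```

Choosing a nonzero `β_tilde` shifts the shrinkage target in exactly the way the text describes.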
Now, we can give a regularization interpretation of the Black-Litterman portfolio recommendation
To this end, simplify first the equation Eq. (4) characterizing the Black-Litterman recommendation
In our case, 𝜇̂ is the estimated mean excess returns of securities. This could be written as a
vector autoregression where
Correspondingly, the OLS regression of 𝑦 on 𝑋 would yield the mean excess returns as coefficients
With $\Gamma = \sqrt{\tau T^{-1}}\,(I_N \otimes \iota_T)$ we can write the regularized version of the mean excess return estimation

$$\begin{aligned}
\hat{\beta}_{Reg} &= (X'X + \Gamma'\Gamma)^{-1}(X'X\hat{\beta}_{OLS} + \Gamma'\Gamma\tilde{\beta}) \\
&= (1+\tau)^{-1} X'X (X'X)^{-1} (\hat{\beta}_{OLS} + \tau\tilde{\beta}) \\
&= (1+\tau)^{-1} (\hat{\beta}_{OLS} + \tau\tilde{\beta}) \\
&= (1+\tau^{-1})^{-1} (\tau^{-1}\hat{\beta}_{OLS} + \tilde{\beta})
\end{aligned}$$
Given that $\hat{\beta}_{OLS} = \hat{\mu}$ and $\tilde{\beta} = \mu_{BL}$ in the Black-Litterman model, we have the following interpretation of the model's recommendation
The estimated (personal) view of the mean excess returns, 𝜇̂, that would lead to extreme short-long positions is “shrunk” towards the market view, 𝜇𝐵𝐿, that leads to the more conservative market portfolio
The Black-Litterman approach is partly inspired by the econometric insight that it is easier
to estimate covariances of excess returns than the means
That is what gave Black and Litterman license to adjust investors’ perception of mean excess
returns while not tampering with the covariance matrix of excess returns
Robust control theory is another approach that also hinges on adjusting mean excess returns but not covariances
Associated with a robust control problem is what Hansen and Sargent [57], [52] call a T operator
Let's define the T operator as it applies to the problem at hand
Let 𝑥 be an 𝑛 × 1 Gaussian random vector with mean vector 𝜇 and covariance matrix Σ =
𝐶𝐶 ′ . This means that 𝑥 can be represented as
𝑥 = 𝜇 + 𝐶𝜖
where 𝜖 ∼ 𝒩(0, 𝐼)
Let 𝜙(𝜖) denote the associated standardized Gaussian density
Let 𝑚(𝜖, 𝜇) be a likelihood ratio, meaning that it satisfies
• 𝑚(𝜖, 𝜇) > 0
• ∫ 𝑚(𝜖, 𝜇)𝜙(𝜖)𝑑𝜖 = 1
The associated distorted density is then

$$\tilde{\phi}(\epsilon) = m(\epsilon, \mu)\phi(\epsilon)$$
The next concept that we need is the entropy of the distorted distribution 𝜙 ̃ with respect to
𝜙
Entropy is defined as

$$\mathrm{ent} = \int m(\epsilon, \mu) \log m(\epsilon, \mu)\, \phi(\epsilon)\, d\epsilon$$

or

$$\mathrm{ent} = \int \log m(\epsilon, \mu)\, \tilde{\phi}(\epsilon)\, d\epsilon$$
That is, relative entropy is the expected value of the log likelihood ratio $\log m$, where the expectation is taken with respect to the twisted density $\tilde{\phi}$
Relative entropy is non-negative. It is a measure of the discrepancy between two probability
distributions
As such, it plays an important role in governing the behavior of statistical tests designed to
discriminate one probability distribution from another
We are ready to define the T operator
Let 𝑉(𝑥) be a value function
Define

$$\mathsf{T}V(x) = -\theta \log \int \exp\left(\frac{-V(\mu + C\epsilon)}{\theta}\right) \phi(\epsilon)\, d\epsilon$$
This asserts that T is an indirect utility function for a minimization problem in which an evil
agent chooses a distorted probability distribution 𝜙 ̃ to lower expected utility, subject to a
penalty term that gets bigger the larger is relative entropy
Here the penalty parameter

$$\theta \in [\underline{\theta}, +\infty]$$

is a robustness parameter; when it is +∞, there is no scope for the minimizing agent to distort the distribution, so no robustness to alternative distributions is acquired. As 𝜃 is lowered, more robustness is achieved
Note: The T operator is sometimes called a risk-sensitivity operator
We shall apply T to the special case of a linear value function $w'(\vec{r} - r_f 1)$ where $\vec{r} - r_f 1 \sim \mathcal{N}(\mu, \Sigma)$ or $\vec{r} - r_f 1 = \mu + C\epsilon$ and $\epsilon \sim \mathcal{N}(0, I)$
The associated worst-case distribution of 𝜖 is Gaussian with mean $v = -\theta^{-1}C'w$ and covariance matrix 𝐼
(When the value function is affine, the worst-case distribution distorts the mean vector of 𝜖 but not the covariance matrix of 𝜖)
For utility function argument $w'(\vec{r} - r_f 1)$

$$\mathsf{T}(\vec{r} - r_f 1) = w'\mu + \zeta - \frac{1}{2\theta} w'\Sigma w$$

and entropy is

$$\frac{v'v}{2} = \frac{1}{2\theta^2} w'CC'w$$
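The entropy formula can be checked numerically in the scalar case. The sketch below (with an arbitrary mean distortion 𝑣, an illustrative value) approximates ∫ log 𝑚 · 𝜙̃ 𝑑𝜖 by a Riemann sum and compares it with 𝑣²/2, the scalar analogue of 𝑣′𝑣/2:

```python
import numpy as np
from scipy.stats import norm

v = 0.7                          # an arbitrary mean distortion
ε = np.linspace(-10, 10, 200001)
φ = norm.pdf(ε)                  # baseline standard normal density
φ_tilde = norm.pdf(ε, loc=v)     # distorted density, mean shifted by v
m = φ_tilde / φ                  # likelihood ratio

ent = np.sum(np.log(m) * φ_tilde) * (ε[1] - ε[0])   # ∫ log m · φ̃ dε
print(ent, v**2 / 2)             # the two numbers agree
```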
According to criterion (1), the mean-variance portfolio choice problem chooses 𝑤 to maximize

$$E[w(\vec{r} - r_f 1)] - \frac{\delta}{2}\mathrm{var}[w(\vec{r} - r_f 1)]$$

which equals

$$w'\mu - \frac{\delta}{2} w'\Sigma w$$
A robust decision maker can be modeled as replacing the mean return $E[w(\vec{r} - r_f 1)]$ with the risk-sensitive criterion

$$\mathsf{T}[w(\vec{r} - r_f 1)] = w'\mu - \frac{1}{2\theta} w'\Sigma w$$

that comes from replacing the mean 𝜇 of $\vec{r} - r_f 1$ with the worst-case mean

$$\mu - \theta^{-1}\Sigma w$$

The robust version of the portfolio choice problem is then to choose 𝑤 to maximize

$$\mathsf{T}[w(\vec{r} - r_f 1)] - \frac{\delta}{2} w'\Sigma w$$

or

$$w'\left(\mu - \theta^{-1}\Sigma w\right) - \frac{\delta}{2} w'\Sigma w \tag{7}$$

The maximizer of Eq. (7) is

$$w_{\text{rob}} = \frac{1}{\delta + \gamma}\Sigma^{-1}\mu$$

where $\gamma \equiv 2\theta^{-1}$
75.3 Appendix
We want to illustrate the “folk theorem” that with high or moderate frequency data, it is
more difficult to estimate means than variances
In order to operationalize this statement, we take two analog estimators:

• sample average: $\bar{X}_N = \frac{1}{N}\sum_{i=1}^{N} X_i$
• sample variance: $S_N = \frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{X}_N)^2$
to estimate the unconditional mean and unconditional variance of the random variable 𝑋,
respectively
To measure the “difficulty of estimation”, we use mean squared error (MSE), that is, the average squared difference between the estimator and the true value
Assuming that the process $\{X_i\}$ is ergodic, both analog estimators are known to converge to their true values as the sample size 𝑁 goes to infinity
More precisely, for all 𝜀 > 0

$$P\left(|\bar{X}_N - \mu| > \varepsilon\right) \to 0 \quad \text{and} \quad P\left(|S_N - \mathbb{V}(X)| > \varepsilon\right) \to 0 \quad \text{as } N \to \infty$$

A necessary condition for these convergence results is that the associated MSEs vanish as 𝑁 goes to infinity, or in other words,

$$\mathrm{MSE}(\bar{X}_N, \mu) \to 0 \quad \text{and} \quad \mathrm{MSE}(S_N, \mathbb{V}(X)) \to 0 \quad \text{as } N \to \infty$$
Even if the MSEs converge to zero, the associated rates might be different. Looking at the limit of the relative MSE (as the sample size grows to infinity),

$$\frac{\mathrm{MSE}(S_N, \mathbb{V}(X))}{\mathrm{MSE}(\bar{X}_N, \mu)} \to B$$

can inform us about the relative (asymptotic) rates of convergence of the two estimators
We start our analysis with the benchmark case of IID data. Consider a sample of size 𝑁 generated by the following IID process,

$$X_i \sim \mathcal{N}(\mu, \sigma^2)$$

$$\mathrm{MSE}(\bar{X}_N, \mu) = \frac{\sigma^2}{N}$$
$$\mathrm{MSE}(S_N, \sigma^2) = \frac{2\sigma^4}{N-1}$$
Both estimators are unbiased and hence the MSEs reflect the corresponding variances of the
estimators
Furthermore, both MSEs are 𝑜(1) with a (multiplicative) factor of difference in their rates of convergence:

$$\frac{\mathrm{MSE}(S_N, \sigma^2)}{\mathrm{MSE}(\bar{X}_N, \mu)} = \frac{N\, 2\sigma^2}{N-1} \xrightarrow{N \to \infty} 2\sigma^2$$
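A Monte Carlo sketch of this benchmark ratio, with illustrative values for 𝜎, 𝑁 and the number of replications:

```python
import numpy as np

np.random.seed(1234)
μ, σ = 0.0, 2.0
N, M = 500, 10000       # sample size and number of replications

samples = np.random.normal(μ, σ, size=(M, N))
X_bar = samples.mean(axis=1)
S_N = samples.var(axis=1, ddof=1)

mse_mean = np.mean((X_bar - μ)**2)   # ≈ σ²/N
mse_var = np.mean((S_N - σ**2)**2)   # ≈ 2σ⁴/(N−1)

ratio = mse_var / mse_mean
print(ratio)  # close to 2σ² = 8
```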
We are interested in how this (asymptotic) relative rate of convergence changes as increasing
sampling frequency puts dependence into the data
To investigate how sampling frequency affects relative rates of convergence, we assume that the data are generated by a mean-reverting continuous time process of the form

$$dX_t = -\kappa(X_t - \mu)\,dt + \sigma\,dW_t$$

where 𝜇 is the unconditional mean, 𝜅 > 0 is a persistence parameter, and $\{W_t\}$ is a standardized Brownian motion
Observations arising from this system in particular discrete periods $\mathcal{T}(h) \equiv \{nh : n \in \mathbb{Z}\}$ with ℎ > 0 can be described by the following process

$$X_{t+1} = (1 - \exp(-\kappa h))\mu + \exp(-\kappa h)X_t + \epsilon_{t,h}$$
where
$$\epsilon_{t,h} \sim \mathcal{N}(0, \Sigma_h) \quad \text{with} \quad \Sigma_h = \frac{\sigma^2(1 - \exp(-2\kappa h))}{2\kappa}$$
We call ℎ the frequency parameter, whereas 𝑛 represents the number of lags between observations
Hence, the effective distance between two observations 𝑋𝑡 and 𝑋𝑡+𝑛 in the discrete time notation is equal to ℎ ⋅ 𝑛 in terms of the underlying continuous time process
Straightforward calculations show that the autocovariance function for the stochastic process $\{X_t\}_{t\in\mathcal{T}(h)}$ is

$$\gamma_h(n) \equiv \mathrm{cov}(X_{t+hn}, X_t) = \frac{\exp(-\kappa h n)\sigma^2}{2\kappa}$$
It follows that if 𝑛 = 0, the unconditional variance is given by $\gamma_h(0) = \frac{\sigma^2}{2\kappa}$ irrespective of the sampling frequency
The following figure illustrates how the dependence between the observations is related to the
sampling frequency
• For any given ℎ, the autocorrelation converges to zero as we increase the distance 𝑛 between the observations. This represents the “weak dependence” of the 𝑋 process
• Moreover, for a fixed lag length 𝑛, the dependence vanishes as ℎ increases, that is, as the sampling frequency falls. In fact, letting ℎ go to ∞ gives back the case of IID data
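A small numerical illustration of these two observations, evaluating the autocorrelation exp(−𝜅ℎ𝑛) at a few illustrative frequency parameters ℎ:

```python
import numpy as np

κ = 0.1
n = np.arange(0, 41)

for h in [0.1, 1.0, 10.0]:
    ρ = np.exp(-κ * h * n)          # autocorrelation at lag n
    # ρ decays in n for any h; for fixed n it falls toward 0 as h grows
    print(f"h = {h:5.1f}: ρ(1) = {ρ[1]:.4f}, ρ(40) = {ρ[40]:.6f}")
```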
In [7]: μ = .0
κ = .1
σ = .5
var_uncond = σ**2 / (2 * κ)
Consider again the AR(1) process generated by discrete sampling with frequency ℎ. Assume that we have a sample of size 𝑁 and we would like to estimate the unconditional mean 𝜇
The sample average is an unbiased estimator:

$$E[\bar{X}_N] = \frac{1}{N}\sum_{i=1}^{N} E[X_i] = E[X_0] = \mu$$
The variance of the sample mean is given by

$$\begin{aligned}
\mathbb{V}(\bar{X}_N) &= \mathbb{V}\left(\frac{1}{N}\sum_{i=1}^{N} X_i\right) \\
&= \frac{1}{N^2}\left(\sum_{i=1}^{N} \mathbb{V}(X_i) + 2\sum_{i=1}^{N-1}\sum_{s=i+1}^{N} \mathrm{cov}(X_i, X_s)\right) \\
&= \frac{1}{N^2}\left(N\gamma(0) + 2\sum_{i=1}^{N-1} i\cdot\gamma(h\cdot(N-i))\right) \\
&= \frac{1}{N^2}\left(N\frac{\sigma^2}{2\kappa} + 2\sum_{i=1}^{N-1} i\cdot\exp(-\kappa h(N-i))\frac{\sigma^2}{2\kappa}\right)
\end{aligned}$$
It is explicit in the above equation that time dependence in the data inflates the variance of
the mean estimator through the covariance terms. Moreover, as we can see, a higher sampling
frequency—smaller ℎ—makes all the covariance terms larger, everything else being fixed. This
implies a relatively slower rate of convergence of the sample average for high-frequency data
Intuitively, the stronger dependence across observations for high-frequency data reduces the
“information content” of each observation relative to the IID case
We can upper bound the variance term in the following way

$$\begin{aligned}
\mathbb{V}(\bar{X}_N) &= \frac{1}{N^2}\left(N\frac{\sigma^2}{2\kappa} + 2\sum_{i=1}^{N-1} i\cdot\exp(-\kappa h(N-i))\frac{\sigma^2}{2\kappa}\right) \\
&\leq \frac{\sigma^2}{2\kappa N}\left(1 + 2\sum_{i=1}^{N-1} \exp(-\kappa h i)\right) \\
&= \underbrace{\frac{\sigma^2}{2\kappa N}}_{\text{IID case}}\left(1 + 2\,\frac{1 - \exp(-\kappa h)^{N-1}}{1 - \exp(-\kappa h)}\right)
\end{aligned}$$
Asymptotically, the $\exp(-\kappa h)^{N-1}$ term vanishes and the dependence in the data inflates the benchmark IID variance by a factor of

$$\left(1 + 2\,\frac{1}{1 - \exp(-\kappa h)}\right)$$
This long run factor is larger the higher is the frequency (the smaller is ℎ)
Therefore, we expect the asymptotic relative MSEs, 𝐵, to change with time-dependent data. We just saw that the mean estimator's rate is roughly changing by a factor of

$$\left(1 + 2\,\frac{1}{1 - \exp(-\kappa h)}\right)$$
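The inequality behind this factor can be checked deterministically by evaluating both the exact variance of the sample mean and the upper bound for illustrative parameter values:

```python
import numpy as np

σ, κ, N = 0.5, 0.1, 200
γ0 = σ**2 / (2 * κ)            # unconditional variance σ²/(2κ)

results = []
for h in [0.1, 0.5, 1.0, 5.0]:
    i = np.arange(1, N)
    # exact variance of the sample mean, including the covariance terms
    V_exact = (N * γ0 + 2 * np.sum(i * np.exp(-κ * h * (N - i)) * γ0)) / N**2
    ρ = np.exp(-κ * h)
    # the upper bound: IID variance times the inflation factor
    bound = γ0 / N * (1 + 2 * (1 - ρ**(N - 1)) / (1 - ρ))
    results.append((h, V_exact, bound))
    print(f"h = {h}: exact = {V_exact:.5f}, bound = {bound:.5f}")
```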
mean_uncond = μ
std_uncond = np.sqrt(σ**2 / (2 * κ))
for i in range(N):
    # (1 - ρ) * μ is the AR(1) intercept implied by unconditional mean μ
    y_path[:, i + 1] = (1 - ρ) * μ + ρ * y_path[:, i] + ε_path[:, i]
return y_path
var_est_store = []
mean_est_store = []
labels = []
for h in h_grid:
labels.append(h)
sample = sample_generator(h, N_app, M_app)
mean_est_store.append(np.mean(sample, 1))
var_est_store.append(np.var(sample, 1))
var_est_store = np.array(var_est_store)
mean_est_store = np.array(mean_est_store)
The above figure illustrates the relationship between the asymptotic relative MSEs and the
sampling frequency
• We can see that with low-frequency data – large values of ℎ – the ratio of asymptotic
rates approaches the IID case
• As ℎ gets smaller – the higher the frequency – the relative performance of the variance
estimator is better in the sense that the ratio of asymptotic rates gets smaller. That
is, as the time dependence gets more pronounced, the rate of convergence of the mean
estimator’s MSE deteriorates more than that of the variance estimator
Part XII
76
Stackelberg Plans
76.1 Contents
• Overview 76.2
• Duopoly 76.3
In addition to what’s in Anaconda, this lecture will need the following libraries
76.2 Overview
This notebook formulates and computes a plan that a Stackelberg leader uses to manipulate forward-looking decisions of a Stackelberg follower that depend on continuation sequences of decisions made once and for all by the Stackelberg leader at time 0
To facilitate computation and interpretation, we formulate things in a context that allows us
to apply linear optimal dynamic programming
From the beginning, we carry along a linear-quadratic model of duopoly in which firms face
adjustment costs that make them want to forecast actions of other firms that influence future
prices
76.3 Duopoly
𝑝𝑡 = 𝑎0 − 𝑎1 (𝑞1𝑡 + 𝑞2𝑡 )
where 𝑞𝑖𝑡 is output of firm 𝑖 at time 𝑡 and 𝑎0 and 𝑎1 are both positive
𝑞10 , 𝑞20 are given numbers that serve as initial conditions at time 0
By incurring a cost of change

$$\gamma v_{it}^2$$

where 𝛾 > 0, firm 𝑖 can change its output according to $q_{it+1} = q_{it} + v_{it}$
Firm 𝑖's profits at time 𝑡 equal

$$\pi_{it} = p_t q_{it} - \gamma v_{it}^2$$
$$\sum_{t=0}^{\infty} \beta^t \pi_{it}$$
where the appearance behind the semi-colon indicates that 𝑞2⃗ is given
Firm 1's problem induces the best response mapping

$$\vec{q}_1 = B(\vec{q}_2)$$

The Stackelberg leader's problem is to choose $\vec{q}_2$ to maximize its own discounted profits, taking the mapping 𝐵 into account; the maximizer is a sequence $\vec{q}_2$ that depends on the initial conditions 𝑞10, 𝑞20 and the parameters of the model 𝑎0, 𝑎1, 𝛾
This formulation captures key features of the model
While our abstract formulation reveals the timing protocol and equilibrium concept well, it obscures details that must be addressed when we want to compute and interpret a Stackelberg plan and the follower's best response to it
To gain insights about these things, we study them in more detail
Firm 2 knows that firm 1 chooses second and takes this into account in choosing $\{q_{2t+1}\}_{t=0}^{\infty}$
In the spirit of working backward, we study firm 1's problem first, taking $\{q_{2t+1}\}_{t=0}^{\infty}$ as given
$$L = \sum_{t=0}^{\infty} \beta^t \left\{ a_0 q_{1t} - a_1 q_{1t}^2 - a_1 q_{1t} q_{2t} - \gamma v_{1t}^2 + \lambda_t\left[q_{1t} + v_{1t} - q_{1t+1}\right] \right\}$$
We approach this problem using methods described in Ljungqvist and Sargent RMT5 chapter
2, appendix A and Macroeconomic Theory, 2nd edition, chapter IX
First-order conditions for this problem are

$$\frac{\partial L}{\partial q_{1t}} = a_0 - 2a_1 q_{1t} - a_1 q_{2t} + \lambda_t - \beta^{-1}\lambda_{t-1} = 0, \quad t \geq 1$$

$$\frac{\partial L}{\partial v_{1t}} = -2\gamma v_{1t} + \lambda_t = 0, \quad t \geq 0$$
These first-order conditions and the constraint $q_{1t+1} = q_{1t} + v_{1t}$ can be rearranged to take the form

$$v_{1t} = \beta v_{1t+1} + \frac{\beta a_0}{2\gamma} - \frac{\beta a_1}{\gamma} q_{1t+1} - \frac{\beta a_1}{2\gamma} q_{2t+1}$$

$$q_{1t+1} = q_{1t} + v_{1t}$$
We can substitute the second equation into the first equation to obtain

$$q_{1t+1} - q_{1t} = \beta(q_{1t+2} - q_{1t+1}) + c_0 - c_1 q_{1t+1} - c_2 q_{2t+1}$$

where $c_0 = \frac{\beta a_0}{2\gamma}$, $c_1 = \frac{\beta a_1}{\gamma}$, $c_2 = \frac{\beta a_1}{2\gamma}$
This equation can in turn be rearranged to become the second-order difference equation

$$\beta q_{1t+2} - (1 + \beta + c_1) q_{1t+1} + q_{1t} = -c_0 + c_2 q_{2t+1} \tag{1}$$

Equation Eq. (1) is a second-order difference equation in the sequence $\vec{q}_1$ whose solution we want
It satisfies two boundary conditions:
Using the lag operators described in chapter IX of Macroeconomic Theory, Second edition (1987), difference equation Eq. (1) can be written as

$$\beta\left(1 - \frac{1 + \beta + c_1}{\beta} L + \beta^{-1} L^2\right) q_{1t+2} = -c_0 + c_2 q_{2t+1}$$

The polynomial in the lag operator on the left side can be factored as

$$\left(1 - \frac{1 + \beta + c_1}{\beta} L + \beta^{-1} L^2\right) = (1 - \delta_1 L)(1 - \delta_2 L) \tag{2}$$
Because $\delta_2 > \frac{1}{\sqrt{\beta}}$ the operator $(1 - \delta_2 L)$ contributes an unstable component if solved backwards but a stable component if solved forwards
Mechanically, write

$$(1 - \delta_2 L) = -\delta_2 L\,(1 - \delta_2^{-1} L^{-1})$$

and use the inverse

$$\left[-\delta_2 L\,(1 - \delta_2^{-1} L^{-1})\right]^{-1} = -\delta_2^{-1}(1 - \delta_2^{-1} L^{-1})^{-1} L^{-1}$$
Operating on both sides of equation Eq. (2) with 𝛽−1 times this inverse operator gives the follower's decision rule for setting $q_{1t+1}$ in the feedback-feedforward form

$$q_{1t+1} = \delta_1 q_{1t} - c_0 \delta_2^{-1}\beta^{-1}\frac{1}{1 - \delta_2^{-1}} + c_2 \delta_2^{-1}\beta^{-1}\sum_{j=0}^{\infty} \delta_2^{-j} q_{2t+j+1}, \quad t \geq 0 \tag{3}$$
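A quick numerical sketch of the factorization in Eq. (2), using illustrative values for 𝛽 and 𝑐₁ (in the model 𝑐₁ is a function of the other parameters, not fixed here):

```python
import numpy as np

β, c1 = 0.96, 0.1     # illustrative values only

# matching coefficients in (1 − δ₁L)(1 − δ₂L): δ₁ + δ₂ = (1+β+c₁)/β
# and δ₁δ₂ = 1/β, so δ₁, δ₂ solve δ² − ((1+β+c₁)/β)δ + 1/β = 0
δ1, δ2 = np.sort(np.roots([1, -(1 + β + c1) / β, 1 / β]).real)

print(δ1, δ2)
print(δ2 > 1 / np.sqrt(β))   # the unstable root satisfies δ₂ > 1/√β
```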
The problem of the Stackelberg leader firm 2 is to choose the sequence $\{q_{2t+1}\}_{t=0}^{\infty}$ to maximize its discounted profits

$$\sum_{t=0}^{\infty} \beta^t \left\{ (a_0 - a_1(q_{1t} + q_{2t}))q_{2t} - \gamma(q_{2t+1} - q_{2t})^2 \right\}$$

subject to the sequence of constraints Eq. (3) for 𝑡 ≥ 0. Attaching a sequence of Lagrange multipliers $\{\theta_t\}$ to the constraints yields the Lagrangian

$$\begin{aligned}
\tilde{L} = {} & \sum_{t=0}^{\infty} \beta^t \left\{ (a_0 - a_1(q_{1t} + q_{2t}))q_{2t} - \gamma(q_{2t+1} - q_{2t})^2 \right\} \\
& + \sum_{t=0}^{\infty} \beta^t \theta_t \left\{ \delta_1 q_{1t} - c_0 \delta_2^{-1}\beta^{-1}\frac{1}{1 - \delta_2^{-1}} + c_2 \delta_2^{-1}\beta^{-1}\sum_{j=0}^{\infty} \delta_2^{-j} q_{2t+j+1} - q_{1t+1} \right\}
\end{aligned} \tag{4}$$
But 𝑥𝑡 is a decision made by the Stackelberg follower at time 𝑡 that is the follower's best response to the choice of an entire sequence of decisions made by the Stackelberg leader at time 𝑡 = 0
Let

$$y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}$$

and

$$r(y, u) = y'Ry + u'Qu$$
Subject to an initial condition for 𝑧0, but not for 𝑥0, the Stackelberg leader wants to maximize
$$-\sum_{t=0}^{\infty} \beta^t r(y_t, u_t) \tag{5}$$
subject to

$$\begin{bmatrix} I & 0 \\ G_{21} & G_{22} \end{bmatrix} \begin{bmatrix} z_{t+1} \\ x_{t+1} \end{bmatrix} = \begin{bmatrix} \hat{A}_{11} & \hat{A}_{12} \\ \hat{A}_{21} & \hat{A}_{22} \end{bmatrix} \begin{bmatrix} z_t \\ x_t \end{bmatrix} + \hat{B}u_t \tag{6}$$

We assume that the matrix $\begin{bmatrix} I & 0 \\ G_{21} & G_{22} \end{bmatrix}$ on the left side of equation Eq. (6) is invertible, so that we can multiply both sides by its inverse to obtain

$$\begin{bmatrix} z_{t+1} \\ x_{t+1} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} z_t \\ x_t \end{bmatrix} + Bu_t \tag{7}$$
or

$$y_{t+1} = Ay_t + Bu_t \tag{8}$$
The Stackelberg follower's best response mapping is summarized by the second block of equations of Eq. (7)
In particular, these equations are the first-order conditions of the Stackelberg follower's optimization problem (i.e., its Euler equations)
These Euler equations summarize the forward-looking aspect of the follower’s behavior and
express how its time 𝑡 decision depends on the leader’s actions at times 𝑠 ≥ 𝑡
When combined with a stability condition to be imposed below, the Euler equations summarize the follower's best response to the sequence of actions by the leader
The Stackelberg leader maximizes Eq. (5) by choosing sequences $\{u_t, x_t, z_{t+1}\}_{t=0}^{\infty}$ subject to Eq. (8) and an initial condition for 𝑧0
Please remember that the follower’s Euler equation is embedded in the system of dynamic
equations 𝑦𝑡+1 = 𝐴𝑦𝑡 + 𝐵𝑢𝑡
Note that in the definition of Ω(𝑦0 ), 𝑦0 is taken as given
Although it is taken as given in Ω(𝑦0 ), eventually, the 𝑥0 component of 𝑦0 will be chosen by
the Stackelberg leader
• to respect the protocol in which the follower chooses 𝑞1⃗ after seeing 𝑞2⃗ chosen by the
leader
• to make the leader choose 𝑞2⃗ while respecting that 𝑞1⃗ will be the follower’s best response
to 𝑞2⃗
• to represent the leader's problem recursively by artfully choosing the state variables confronting the leader and the control variables available to it
Subproblem 1

$$v(y_0) = \max_{(\vec{y}_1, \vec{u}_0)\in\Omega(y_0)} -\sum_{t=0}^{\infty} \beta^t r(y_t, u_t)$$
Subproblem 2

$$w(z_0) = \max_{x_0} v(y_0)$$

Subproblem 1 can be solved with dynamic programming. Its Bellman equation is

$$v(y) = \max_{u, y^*} \left\{ -r(y, u) + \beta v(y^*) \right\} \tag{9}$$

where the maximization is subject to

$$y^* = Ay + Bu$$

which, as in the lecture on the linear regulator, gives rise to an algebraic matrix Riccati equation and an optimal decision rule

$$u_t = -Fy_t$$
Subproblem 2

We find an optimal 𝑥0 by equating to zero the gradient of 𝑣(𝑦0) with respect to 𝑥0:

$$-2P_{21}z_0 - 2P_{22}x_0 = 0$$

which implies that

$$x_0 = -P_{22}^{-1}P_{21}z_0$$
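A small sketch of this first-order condition, using a randomly generated positive definite matrix as a stand-in for the Riccati solution 𝑃 (the true 𝑃 comes from solving the leader's dynamic program):

```python
import numpy as np

np.random.seed(7)
nz, nx = 3, 2
M = np.random.randn(nz + nx, nz + nx)
P = M @ M.T + np.eye(nz + nx)       # stand-in for the Riccati solution P

P21 = P[nz:, :nz]                   # lower-left block
P22 = P[nz:, nz:]                   # lower-right block
z0 = np.random.randn(nz)

x0 = -np.linalg.solve(P22, P21 @ z0)

# the gradient of v(y₀) = −y₀'P y₀ with respect to x₀ vanishes at x₀
grad = -2 * P21 @ z0 - 2 * P22 @ x0
print(np.allclose(grad, 0))  # True
```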
76.5 Stackelberg Plan
Now let’s map our duopoly model into the above setup.
We will formulate a state space system
$$y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}$$
where in this instance 𝑥𝑡 = 𝑣1𝑡 , the time 𝑡 decision of the follower firm 1
Now we’ll proceed to cast our duopoly model within the framework of the more general
linear-quadratic structure described above
That will allow us to compute a Stackelberg plan simply by enlisting a Riccati equation to
solve a linear-quadratic dynamic program
As emphasized above, firm 1 acts as if firm 2's decisions $\{q_{2t+1}, v_{2t}\}_{t=0}^{\infty}$ are given and beyond its control
$$L = \sum_{t=0}^{\infty} \beta^t \left\{ a_0 q_{1t} - a_1 q_{1t}^2 - a_1 q_{1t} q_{2t} - \gamma v_{1t}^2 + \lambda_t\left[q_{1t} + v_{1t} - q_{1t+1}\right] \right\}$$
$$\frac{\partial L}{\partial q_{1t}} = a_0 - 2a_1 q_{1t} - a_1 q_{2t} + \lambda_t - \beta^{-1}\lambda_{t-1} = 0, \quad t \geq 1$$

$$\frac{\partial L}{\partial v_{1t}} = -2\gamma v_{1t} + \lambda_t = 0, \quad t \geq 0$$
These first-order conditions and the constraint $q_{1t+1} = q_{1t} + v_{1t}$ can be rearranged to take the form

$$v_{1t} = \beta v_{1t+1} + \frac{\beta a_0}{2\gamma} - \frac{\beta a_1}{\gamma} q_{1t+1} - \frac{\beta a_1}{2\gamma} q_{2t+1}$$

$$q_{1t+1} = q_{1t} + v_{1t}$$
We use these two equations as components of the following linear system that confronts a
Stackelberg continuation leader at time 𝑡
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \frac{\beta a_0}{2\gamma} & -\frac{\beta a_1}{2\gamma} & -\frac{\beta a_1}{\gamma} & \beta \end{bmatrix} \begin{bmatrix} 1 \\ q_{2t+1} \\ q_{1t+1} \\ v_{1t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \\ v_{1t} \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} v_{2t}$$
Time 𝑡 revenues of firm 2 are $\pi_{2t} = a_0 q_{2t} - a_1 q_{2t}^2 - a_1 q_{1t} q_{2t}$ which evidently equal

$$z_t' R_1 z_t \equiv \begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \end{bmatrix}' \begin{bmatrix} 0 & \frac{a_0}{2} & 0 \\ \frac{a_0}{2} & -a_1 & -\frac{a_1}{2} \\ 0 & -\frac{a_1}{2} & 0 \end{bmatrix} \begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \end{bmatrix}$$
where

$$y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}$$

$$R = \begin{bmatrix} R_1 & 0 \\ 0 & 0 \end{bmatrix}$$
First, let’s get a recursive representation of the Stackelberg leader’s choice of 𝑞2⃗ for our
duopoly model
That we distinguish $\check{z}_t$ from $z_t$ is part and parcel of the Big K, little k device in this instance
We have demonstrated that a Stackelberg plan for $\{u_t\}_{t=0}^{\infty}$ has a recursive representation

$$\begin{aligned}
\check{x}_0 &= -P_{22}^{-1}P_{21}z_0 \\
u_t &= -F\check{y}_t, \quad t \geq 0 \\
\check{y}_{t+1} &= (A - BF)\check{y}_t, \quad t \geq 0
\end{aligned}$$
From this representation, we can deduce the sequence of functions $\sigma = \{\sigma_t(\check{z}^t)\}_{t=0}^{\infty}$ that comprise a Stackelberg plan
For convenience, let $\check{A} \equiv A - BF$ and partition $\check{A}$ conformably to the partition $y_t = \begin{bmatrix} \check{z}_t \\ \check{x}_t \end{bmatrix}$ as

$$\begin{bmatrix} \check{A}_{11} & \check{A}_{12} \\ \check{A}_{21} & \check{A}_{22} \end{bmatrix}$$
$$\check{x}_t = \sum_{j=1}^{t} H_j^t \check{z}_{t-j}$$

where

$$\begin{aligned}
H_1^t &= \check{A}_{21} \\
H_2^t &= \check{A}_{22}\check{A}_{21} \\
&\;\;\vdots \\
H_{t-1}^t &= \check{A}_{22}^{\,t-2}\check{A}_{21} \\
H_t^t &= \check{A}_{22}^{\,t-1}\left(\check{A}_{21} + \check{A}_{22}H_0^0\right)
\end{aligned}$$
An optimal decision rule for the leader's choice of 𝑢𝑡 is

$$u_t = -F\check{y}_t \equiv -\begin{bmatrix} F_z & F_x \end{bmatrix} \begin{bmatrix} \check{z}_t \\ \check{x}_t \end{bmatrix}$$

or

$$u_t = -F_z\check{z}_t - F_x\sum_{j=1}^{t} H_j^t\check{z}_{t-j} = \sigma_t(\check{z}^t) \tag{10}$$
Representation Eq. (10) confirms that whenever $F_x \neq 0$, the typical situation, the time 𝑡 component 𝜎𝑡 of a Stackelberg plan is history-dependent, meaning that the Stackelberg leader's choice 𝑢𝑡 depends not just on $\check{z}_t$ but on components of the history $\check{z}^{t-1}$
After all, at the end of the day, it will turn out that because we set 𝑧0̌ = 𝑧0 , it will be true
that 𝑧𝑡 = 𝑧𝑡̌ for all 𝑡 ≥ 0
Then why did we distinguish 𝑧𝑡̌ from 𝑧𝑡 ?
The answer is that if we want to present to the Stackelberg follower a history-dependent
representation of the Stackelberg leader’s sequence 𝑞2⃗ , we must use representation Eq. (10)
cast in terms of the history 𝑧𝑡̌ and not a corresponding representation cast in terms of 𝑧𝑡
Given the sequence $\vec{q}_2$ chosen by the Stackelberg leader in our duopoly model, it turns out that the Stackelberg follower's problem is recursive in the natural state variables that confront a follower at any time 𝑡 ≥ 0
This means that the follower’s plan is time consistent
To verify these claims, we’ll formulate a recursive version of a follower’s problem that builds
on our recursive representation of the Stackelberg leader’s plan and our use of the Big K,
little k idea
We now use what amounts to another “Big 𝐾, little 𝑘” trick (see rational expectations equilibrium) to formulate a recursive version of a follower's problem cast in terms of an ordinary Bellman equation
Firm 1, the follower, faces $\{q_{2t}\}_{t=0}^{\infty}$ as a given quantity sequence chosen by the leader and believes that its output price at 𝑡 satisfies

$$p_t = a_0 - a_1(q_{1t} + q_{2t}), \quad t \geq 0$$
To do so, recall that under the Stackelberg plan, firm 2 sets output according to the $q_{2t}$ component of

$$y_t = \begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \\ x_t \end{bmatrix}$$

which is governed by

$$y_{t+1} = (A - BF)y_t$$
To represent a $\{q_{2t}\}$ sequence that is exogenous to firm 1, we define

$$\tilde{y}_t = \begin{bmatrix} 1 \\ q_{2t} \\ \tilde{q}_{1t} \\ \tilde{x}_t \end{bmatrix}$$

governed by

$$\tilde{y}_{t+1} = (A - BF)\tilde{y}_t$$

subject to the initial condition $\tilde{q}_{10} = q_{10}$ and $\tilde{x}_0 = x_0$ where $x_0 = -P_{22}^{-1}P_{21}z_0$ as stated above
Firm 1's state vector is

$$X_t = \begin{bmatrix} \tilde{y}_t \\ q_{1t} \end{bmatrix}$$

It follows the law of motion

$$\begin{bmatrix} \tilde{y}_{t+1} \\ q_{1t+1} \end{bmatrix} = \begin{bmatrix} A - BF & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \tilde{y}_t \\ q_{1t} \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} x_t \tag{11}$$
This specification assures that from the point of view of firm 1, 𝑞2𝑡 is an exogenous process
Here
• 𝑞1𝑡
̃ , 𝑥𝑡̃ play the role of Big K
• 𝑞1𝑡 , 𝑥𝑡 play the role of little k
Firm 1's one-period payoff can then be expressed as

$$X_t'\tilde{R}X_t - \gamma x_t^2 = \begin{bmatrix} 1 \\ q_{2t} \\ \tilde{q}_{1t} \\ \tilde{x}_t \\ q_{1t} \end{bmatrix}' \begin{bmatrix} 0 & 0 & 0 & 0 & \frac{a_0}{2} \\ 0 & 0 & 0 & 0 & -\frac{a_1}{2} \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ \frac{a_0}{2} & -\frac{a_1}{2} & 0 & 0 & -a_1 \end{bmatrix} \begin{bmatrix} 1 \\ q_{2t} \\ \tilde{q}_{1t} \\ \tilde{x}_t \\ q_{1t} \end{bmatrix} - \gamma x_t^2$$

with $\tilde{Q} = \gamma$
𝑥𝑡 = −𝐹 ̃ 𝑋𝑡
𝑋̃ 𝑡+1 = (𝐴 ̃ − 𝐵̃ 𝐹 ̃ )𝑋𝑡
$$X_0 = \begin{bmatrix} 1 \\ q_{20} \\ q_{10} \\ x_0 \\ q_{10} \end{bmatrix}$$
we recover
𝑥0 = −𝐹 ̃ 𝑋̃ 0
which will verify that we have properly set up a recursive representation of the follower’s
problem facing the Stackelberg leader’s 𝑞2⃗
Since the follower can solve its problem using dynamic programming its problem is recursive
in what for it are the natural state variables, namely
$$\begin{bmatrix} 1 \\ q_{2t} \\ \tilde{q}_{1t} \\ \tilde{x}_t \end{bmatrix}$$
Here is our code to compute a Stackelberg plan via a linear-quadratic dynamic program as
outlined above
In [3]: # == Parameters == #
a0 = 10
a1 = 2
β = 0.96
γ = 120
n = 300
tol0 = 1e-8
tol1 = 1e-16
tol2 = 1e-2
βs = np.ones(n)
βs[1:] = β
βs = βs.cumprod()
In [4]: # == In LQ form == #
Alhs = np.eye(4)
Arhs = np.eye(4)
Arhs[2, 3] = 1
Alhsinv = la.inv(Alhs)
A = Alhsinv @ Arhs
Q = np.array([[γ]])
# == Simulate forward == #
π_leader = np.zeros(n)
z0 = np.array([[1, 1, 1]]).T
x0 = H_0_0 @ z0
y0 = np.vstack((z0, x0))
π_matrix = (R + F.T @ Q @ F)
for t in range(n):
π_leader[t] = -(yt[:, t].T @ π_matrix @ yt[:, t])
# == Display policies == #
print("Computed policy for Stackelberg leader\n")
print(f"F = {F}")
# == Display values == #
print("Computed values for the Stackelberg leader at t=0:\n")
print(f"v_leader_forward(forward sim) = {v_leader_forward:.4f}")
print(f"v_leader_direct (direct) = {v_leader_direct:.4f}")
Out[7]: True
Out[8]: True
76.8 Exhibiting Time Inconsistency of Stackelberg Plan
In the code below, we compare two values:

• the continuation value $-y_t'Py_t$ earned by a continuation Stackelberg leader who inherits state $y_t$ at 𝑡
• the value of a reborn Stackelberg leader who inherits state $z_t$ at 𝑡 and sets $x_t = -P_{22}^{-1}P_{21}z_t$

The difference between these two values is a tell-tale sign of the time inconsistency of the Stackelberg plan
yt_reset = yt.copy()
yt_reset[-1, :] = (H_0_0 @ yt[:3, :])
for t in range(n):
vt_leader[t] = -yt[:, t].T @ P @ yt[:, t]
vt_reset_leader[t] = -yt_reset[:, t].T @ P @ yt_reset[:, t]
plt.tight_layout()
plt.show()
We now formulate and compute the recursive version of the follower’s problem
We check that the recursive Big 𝐾 , little 𝑘 formulation of the follower’s problem produces
the same output path 𝑞1⃗ that we computed when we solved the Stackelberg problem
Q_tilde = Q
B_tilde = np.array([[0, 0, 0, 0, 1]]).T
In [12]: # Checks that the recursive formulation of the follower's problem gives
# the same solution as the original Stackelberg problem
plt.plot(yt_tilde[4], 'r', label="q_tilde")
plt.plot(yt_tilde[2], 'b', label="q")
plt.legend()
plt.show()
Note: Variables with _tilde are obtained from solving the follower's problem – those without are from the Stackelberg problem
In [13]: # Maximum absolute difference in quantities over time between the first and second solution methods
np.max(np.abs(yt_tilde[4] - yt_tilde[2]))
Out[13]: 1.7763568394002505e-15
In [14]: # x0 == x0_tilde
yt[:, 0][-1] - (yt_tilde[:, 1] - yt_tilde[:, 0])[-1] < tol0
Out[14]: True
If we inspect the coefficients in the decision rule −𝐹 ̃ , we can spot the reason that the follower
chooses to set 𝑥𝑡 = 𝑥𝑡̃ when it sets 𝑥𝑡 = −𝐹 ̃ 𝑋𝑡 in the recursive formulation of the follower
problem
Can you spot what features of 𝐹 ̃ imply this?
Hint: remember the components of 𝑋𝑡
Out[18]: True
for i in range(1000):
P_guess = ((R_tilde + F_tilde_star.T @ Q @ F_tilde_star) +
β * (A_tilde - B_tilde @ F_tilde_star).T @ P_guess
@ (A_tilde - B_tilde @ F_tilde_star))
Out[20]: 112.65590740578095
Out[21]: 112.65590740578085
for i in range(100):
# Compute P_iter
P_iter = np.zeros((5, 5))
for j in range(1000):
P_iter = ((R_tilde + F_iter.T @ Q @ F_iter) + β *
(A_tilde - B_tilde @ F_iter).T @ P_iter @
(A_tilde - B_tilde @ F_iter))
# Update F_iter
F_iter = (β * la.inv(Q + β * B_tilde.T @ P_iter @ B_tilde)
@ B_tilde.T @ P_iter @ A_tilde)
In [23]: # Simulate the system using `F_tilde_star` and check that it gives the same result as the original solution
for t in range(n-1):
yt_tilde_star[t+1, :] = (A_tilde - B_tilde @ F_tilde_star) @ yt_tilde_star[t, :]
Out[24]: 0.0
$$z_t = \begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \end{bmatrix}$$

$$B_1 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$
𝑧𝑡+1 = (𝐴 − 𝐵1 𝐹1 − 𝐵2 𝐹2 )𝑧𝑡
In [25]: # == In LQ form == #
A = np.eye(3)
B1 = np.array([[0], [0], [1]])
B2 = np.array([[0], [1], [0]])
Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0
# == Simulate forward == #
AF = A - B1 @ F1 - B2 @ F2
z = np.empty((3, n))
z[:, 0] = 1, 1, 1
for t in range(n-1):
z[:, t+1] = AF @ z[:, t]
# == Display policies == #
print("Computed policies for firm 1 and firm 2:\n")
print(f"F1 = {F1}")
print(f"F2 = {F2}")
In [26]: q1 = z[1, :]
q2 = z[2, :]
q = q1 + q2 # Total output, MPE
p = a0 - a1 * q # Price, MPE
In [27]: # Computes the maximum difference between the two quantities of the two firms
np.max(np.abs(q1 - q2))
Out[27]: 6.8833827526759706e-15
π_1 = p * q1 - γ * (u1) ** 2
π_2 = p * q2 - γ * (u2) ** 2
# == Display values == #
print("Computed values for firm 1 and firm 2:\n")
print(f"v1(forward sim) = {v1_forward:.4f}; v1 (direct) = {v1_direct:.4f}")
print(f"v2 (forward sim) = {v2_forward:.4f}; v2 (direct) = {v2_direct:.4f}")
Out[29]: True
for t in range(n):
vt_MPE[t] = -z[:, t].T @ P1 @ z[:, t]
vt_follower[t] = -yt_tilde[:, t].T @ P_tilde @ yt_tilde[:, t]
Computed values:
vt_leader(y0) = 150.0324
vt_follower(y0) = 112.6559
vt_MPE(y0) = 133.3296
In [32]: # Compute the difference in total value between the Stackelberg and the MPE
vt_leader[0] + vt_follower[0] - 2 * vt_MPE[0]
Out[32]: -3.9709425620912953
77

Ramsey Plans, Time Inconsistency, Sustainable Plans

77.1 Contents

• Overview 77.2
• Structure 77.4

77.2 Overview
This lecture describes a linear-quadratic version of a model that Guillermo Calvo [21] used to
illustrate the time inconsistency of optimal government plans
Like Chang [25], we use the model as a laboratory in which to explore the consequences of
different timing protocols for government decision making
Important features of the model include:
• rational expectations
• costly government actions at all dates 𝑡 ≥ 1 that increase household utilities at dates
before 𝑡
• two Bellman equations, one that expresses the private sector's expectation of future inflation as a function of current and future government actions, another that describes the value function of a Ramsey planner
There is no uncertainty
77.3 The Model

Let:

• 𝑚𝑡 be the log of the nominal money supply
• 𝑝𝑡 be the log of the price level
• 𝜃𝑡 = 𝑝𝑡+1 − 𝑝𝑡 be the net rate of inflation between 𝑡 and 𝑡 + 1
• 𝜇𝑡 = 𝑚𝑡+1 − 𝑚𝑡 be the net rate of growth of nominal balances
The demand for real balances is governed by a perfect foresight version of the Cagan [20] de-
mand function:
𝑚𝑡 − 𝑝𝑡 = −𝛼𝜃𝑡 ,   𝛼 > 0    (1)

for 𝑡 ≥ 0
Equation Eq. (1) asserts that the demand for real balances is inversely related to the public’s
expected rate of inflation, which here equals the actual rate of inflation
(When there is no uncertainty, an assumption of rational expectations simplifies to per-
fect foresight)
(See [117] for a rational expectations version of the model when there is uncertainty)
Subtracting the demand function at time 𝑡 from the demand function at 𝑡 + 1 gives:
𝜇𝑡 − 𝜃𝑡 = −𝛼𝜃𝑡+1 + 𝛼𝜃𝑡
or
𝜃𝑡 = (𝛼/(1+𝛼)) 𝜃𝑡+1 + (1/(1+𝛼)) 𝜇𝑡    (2)
Because 𝛼 > 0, we have 0 < 𝛼/(1+𝛼) < 1
Definition: For a scalar 𝑥𝑡, let 𝐿² be the space of sequences {𝑥𝑡}_{𝑡=0}^∞ satisfying

∑_{𝑡=0}^∞ 𝑥𝑡² < +∞
Solving equation Eq. (2) forward for sequences in 𝐿² gives

𝜃𝑡 = (1/(1+𝛼)) ∑_{𝑗=0}^∞ (𝛼/(1+𝛼))^𝑗 𝜇𝑡+𝑗    (3)
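As a quick numerical sanity check on Eq. (3) (an illustrative sketch, not part of the lecture's source code), note that a constant money growth rate 𝜇 implies 𝜃𝑡 = 𝜇, because the weights (1/(1+𝛼))(𝛼/(1+𝛼))^𝑗 sum to one:

```python
import numpy as np

α = 1.0    # illustrative value of the demand-function slope
μ = 0.05   # constant money growth rate

# Truncate the infinite sum in Eq. (3) at a long horizon
j = np.arange(2000)
weights = (1 / (1 + α)) * (α / (1 + α))**j
θ = np.sum(weights * μ)   # Eq. (3) with μ_{t+j} = μ for all j

print(θ)   # ≈ 0.05, i.e. θ_t = μ
```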
Insight: In the spirit of Chang [25], note that equations Eq. (1) and Eq. (3) show that 𝜃𝑡
intermediates how choices of 𝜇𝑡+𝑗 , 𝑗 = 0, 1, … impinge on time 𝑡 real balances 𝑚𝑡 − 𝑝𝑡 = −𝛼𝜃𝑡
We shall use this insight to help us simplify and analyze government policy problems
That future rates of money creation influence earlier rates of inflation creates optimal govern-
ment policy problems in which timing protocols matter
We can rewrite the model as:
[1  𝜃𝑡+1]′ = [ 1  0 ; 0  (1+𝛼)/𝛼 ] [1  𝜃𝑡]′ + [ 0 ; −1/𝛼 ] 𝜇𝑡
or

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝜇𝑡    (4)

where 𝑥𝑡 = [1  𝜃𝑡]′
We write the model in the state-space form Eq. (4) even though 𝜃0 is to be determined and
so is not an initial condition as it ordinarily would be in the state-space model described in
Linear Quadratic Control
We write the model in the form Eq. (4) because we want to apply an approach described in
Stackelberg problems
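As an illustrative check (with an assumed value for 𝛼), the matrices 𝐴 and 𝐵 read off the display above reproduce both directions of Eq. (2):

```python
import numpy as np

α = 1.0   # illustrative parameter value
A = np.array([[1, 0],
              [0, (1 + α) / α]])   # transition matrix from the display above
B = np.array([[0],
              [-1 / α]])           # impact of the money growth rate μ_t

θ_t, μ_t = 0.03, 0.02
x_next = A @ np.array([[1.0], [θ_t]]) + B * μ_t
θ_next = x_next[1, 0]

# The state equation encodes θ_{t+1} = ((1+α)/α) θ_t - (1/α) μ_t ...
assert np.isclose(θ_next, (1 + α) / α * θ_t - μ_t / α)
# ... which is Eq. (2) rearranged
assert np.isclose(θ_t, α / (1 + α) * θ_next + μ_t / (1 + α))
```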
Assume that a representative household’s utility of real balances at time 𝑡 is:
𝑈(𝑚𝑡 − 𝑝𝑡) = 𝑎0 + 𝑎1(𝑚𝑡 − 𝑝𝑡) − (𝑎2/2)(𝑚𝑡 − 𝑝𝑡)² ,   𝑎0 > 0, 𝑎1 > 0, 𝑎2 > 0    (5)
The “bliss level” of real balances is then 𝑎1/𝑎2
The money demand function Eq. (1) and the utility function Eq. (5) imply that utility maxi-
mizing or bliss level of real balances is attained when:
𝜃𝑡 = 𝜃* = −𝑎1/(𝑎2𝛼)
Below, we introduce the discount factor 𝛽 ∈ (0, 1) that a representative household and a
benevolent government both use to discount future utilities
(If we set parameters so that 𝜃∗ = log(𝛽), then we can regard a recommendation to set 𝜃𝑡 =
𝜃∗ as a “poor man’s Friedman rule” that attains Milton Friedman’s optimal quantity of
money)
Via equation Eq. (3), a government plan 𝜇⃗ = {𝜇𝑡 }∞
𝑡=0 leads to an equilibrium sequence of
inflation outcomes 𝜃 ⃗ = {𝜃𝑡 }∞
𝑡=0
We assume that social costs 2𝑐 𝜇2𝑡 are incurred at 𝑡 when the government changes the stock of
nominal money balances at rate 𝜇𝑡
Therefore, the one-period welfare function of a benevolent government is:
−𝑠(𝜃𝑡, 𝜇𝑡) ≡ −𝑟(𝑥𝑡, 𝜇𝑡) = [1  𝜃𝑡] [ 𝑎0  −𝑎1𝛼/2 ; −𝑎1𝛼/2  −𝑎2𝛼²/2 ] [1  𝜃𝑡]′ − (𝑐/2)𝜇𝑡² = −𝑥𝑡′𝑅𝑥𝑡 − 𝑄𝜇𝑡²    (6)
𝑣0 = −∑_{𝑡=0}^∞ 𝛽^𝑡 𝑟(𝑥𝑡, 𝜇𝑡) = −∑_{𝑡=0}^∞ 𝛽^𝑡 𝑠(𝜃𝑡, 𝜇𝑡)    (7)
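The matrices implicit in Eq. (6) can be checked numerically: with 𝑅 defined as in the ChangLQ class below and 𝑄 = 𝑐/2, the quadratic form reproduces 𝑈(−𝛼𝜃) − (𝑐/2)𝜇² exactly (parameter values here are illustrative, not the lecture's calibration):

```python
import numpy as np

a0, a1, a2, c, α = 1.0, 1.5, 2.0, 0.3, 0.8   # illustrative parameters

R = -np.array([[a0,          -a1 * α / 2],
               [-a1 * α / 2, -a2 * α**2 / 2]])
Q = c / 2

def U(m):
    """Utility of real balances m = m_t - p_t, Eq. (5)"""
    return a0 + a1 * m - (a2 / 2) * m**2

θ, μ = 0.07, -0.02
x = np.array([1.0, θ])
# -x'Rx - Qμ² should equal U(-αθ) - (c/2)μ²
assert np.isclose(-x @ R @ x - Q * μ**2, U(-α * θ) - (c / 2) * μ**2)
```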
77.4 Structure
The following structure is induced by private agents’ behavior as summarized by the demand
function for money Eq. (1) that leads to equation Eq. (3) that tells how future settings of 𝜇
affect the current value of 𝜃
77.5 Intertemporal Influences
Equation Eq. (3) maps a policy sequence of money growth rates 𝜇⃗ = {𝜇𝑡}_{𝑡=0}^∞ ∈ 𝐿² into an inflation sequence 𝜃⃗ = {𝜃𝑡}_{𝑡=0}^∞ ∈ 𝐿²
𝑣𝑡 = 𝑠(𝜃𝑡, 𝜇𝑡) + 𝛽𝑣𝑡+1    (8)
Criterion function Eq. (7) and the constraint system Eq. (4) exhibit the following structure:
That settings of 𝜇 at one date affect household utilities at earlier dates sets the stage for the
emergence of a time-inconsistent optimal government plan under a Ramsey (also called a
Stackelberg) timing protocol
We’ll study outcomes under a Ramsey timing protocol below
But we’ll also study the consequences of other timing protocols
In two of our models, a single policymaker chooses a sequence {𝜇𝑡 }∞𝑡=0 once and for all, taking
into account how 𝜇𝑡 affects household one-period utilities at dates 𝑠 = 0, 1, … , 𝑡 − 1
In two other models, there is a sequence of policymakers, each of whom sets 𝜇𝑡 at one 𝑡 only
• Each such policymaker ignores effects that its choice of 𝜇𝑡 has on household one-period
utilities at dates 𝑠 = 0, 1, … , 𝑡 − 1
The four models differ with respect to timing protocols, constraints on government choices,
and government policymakers’ beliefs about how their decisions affect private agents’ beliefs
about future government decisions
The models are
– a time 𝑡 policymaker chooses 𝜇𝑡 only and forecasts that future government deci-
sions are unaffected by its choice
– a time 𝑡 policymaker chooses only 𝜇𝑡 but believes that its choice of 𝜇𝑡 shapes pri-
vate agents’ beliefs about future rates of money creation and inflation, and through
them, future government actions
77.7 A Ramsey Planner

Define

Ω(𝑥0) = {(𝑥⃗1, 𝜇⃗0) ∶ 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝜇𝑡 , ∀𝑡 ≥ 0}
77.7.1 Subproblem 1
𝐽(𝑥0) = max_{(𝑥⃗1, 𝜇⃗0) ∈ Ω(𝑥0)} ∑_{𝑡=0}^∞ 𝛽^𝑡 𝑟(𝑥𝑡, 𝜇𝑡)
subject to:
𝑥′ = 𝐴𝑥 + 𝐵𝜇
As in Stackelberg problems, we map this problem into a linear-quadratic control problem and
then carefully use the optimal value function associated with it
Guessing that 𝐽(𝑥) = −𝑥′𝑃𝑥 and substituting into the Bellman equation gives rise to the algebraic matrix Riccati equation:

𝑃 = 𝑅 + 𝛽𝐴′𝑃𝐴 − 𝛽²𝐴′𝑃𝐵(𝑄 + 𝛽𝐵′𝑃𝐵)^{−1}𝐵′𝑃𝐴

and an optimal decision rule

𝜇𝑡 = −𝐹𝑥𝑡

where

𝐹 = 𝛽(𝑄 + 𝛽𝐵′𝑃𝐵)^{−1}𝐵′𝑃𝐴
77.7.2 Subproblem 2
𝑉 = max_{𝑥0} 𝐽(𝑥0)
𝐽(𝑥0) = −[1  𝜃0] [ 𝑃11  𝑃12 ; 𝑃21  𝑃22 ] [1  𝜃0]′ = −𝑃11 − 2𝑃21𝜃0 − 𝑃22𝜃0²
The first-order necessary condition for maximizing 𝐽(𝑥0) with respect to 𝜃0 is

−2𝑃21 − 2𝑃22𝜃0 = 0
which implies
𝜃0* = −𝑃21/𝑃22
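This maximizer can be illustrated numerically (with an arbitrary symmetric matrix standing in for the model's actual 𝑃): a grid search over 𝐽(𝜃0) = −𝑃11 − 2𝑃21𝜃0 − 𝑃22𝜃0² recovers 𝜃0* = −𝑃21/𝑃22:

```python
import numpy as np

# An arbitrary symmetric P with P22 > 0, so J is strictly concave in θ0
# (illustrative numbers, not the model's actual P)
P = np.array([[2.0, 0.6],
              [0.6, 1.5]])

θ_grid = np.linspace(-2, 2, 400001)
J = -P[0, 0] - 2 * P[1, 0] * θ_grid - P[1, 1] * θ_grid**2

θ_star = -P[1, 0] / P[1, 1]              # the formula above
assert abs(θ_grid[np.argmax(J)] - θ_star) < 1e-4
```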
The preceding calculations indicate that we can represent a Ramsey plan 𝜇⃗ recursively with
the following system created in the spirit of Chang [25]:
𝜃0 = 𝜃0∗
𝜇𝑡 = 𝑏0 + 𝑏1 𝜃𝑡 (9)
𝜃𝑡+1 = 𝑑0 + 𝑑1 𝜃𝑡
The inflation rate 𝜃𝑡 that appears in the system Eq. (9) and equation Eq. (3) plays three roles
simultaneously:
As discussed in Stackelberg problems and Optimal taxation with state-contingent debt, a con-
tinuation Ramsey plan is not a Ramsey plan
This is a concise way of characterizing the time inconsistency of a Ramsey plan
The time inconsistency of a Ramsey plan has motivated other models of government decision
making that alter either
Instead of allowing the Ramsey government to choose different settings of its instrument at different moments, we now assume that the government at time 0 chooses, once and for all, a constant sequence 𝜇𝑡 = 𝜇̌ for all 𝑡 ≥ 0 to maximize
𝑈(−𝛼𝜇̌) − (𝑐/2)𝜇̌²
Here we have imposed the perfect foresight outcome implied by equation Eq. (2) that 𝜃𝑡 = 𝜇̌
when the government chooses a constant 𝜇 for all 𝑡 ≥ 0
With the quadratic form Eq. (5) for the utility function 𝑈, the maximizing 𝜇̌ is
𝜇̌ = −𝛼𝑎1/(𝛼²𝑎2 + 𝑐)
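This closed form can be verified numerically (with illustrative parameter values) by maximizing 𝑈(−𝛼𝜇) − (𝑐/2)𝜇² on a fine grid:

```python
import numpy as np

a0, a1, a2, c, α = 1.0, 1.5, 2.0, 0.3, 0.8   # illustrative parameters

def U(m):
    return a0 + a1 * m - (a2 / 2) * m**2      # Eq. (5)

μ_grid = np.linspace(-2.0, 0.0, 2_000_001)
W = U(-α * μ_grid) - (c / 2) * μ_grid**2      # objective of the restricted planner

μ_check = -α * a1 / (α**2 * a2 + c)           # closed-form maximizer above
assert abs(μ_grid[np.argmax(W)] - μ_check) < 1e-5
```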
In a Markov perfect equilibrium, the time-𝑡 government takes future governments’ constant setting 𝜇̄ as given, so that equation Eq. (2) implies

𝜃𝑡 = (𝛼/(1+𝛼)) 𝜇̄ + (1/(1+𝛼)) 𝜇𝑡

The time-𝑡 government then chooses 𝜇𝑡 to maximize

𝑊 = 𝑈(−𝛼𝜃𝑡) − (𝑐/2)𝜇𝑡² + 𝛽𝑉(𝜇̄)
where 𝑉 (𝜇)̄ is the time 0 value 𝑣0 of recursion Eq. (8) under a money supply growth rate that
is forever constant at 𝜇̄
Substituting for 𝑈 and 𝜃𝑡 gives:

𝑊 = 𝑎0 + 𝑎1(−(𝛼²/(1+𝛼))𝜇̄ − (𝛼/(1+𝛼))𝜇𝑡) − (𝑎2/2)(−(𝛼²/(1+𝛼))𝜇̄ − (𝛼/(1+𝛼))𝜇𝑡)² − (𝑐/2)𝜇𝑡² + 𝛽𝑉(𝜇̄)

The first-order necessary condition for 𝜇𝑡 is then:

−(𝛼/(1+𝛼))𝑎1 − 𝑎2(−(𝛼²/(1+𝛼))𝜇̄ − (𝛼/(1+𝛼))𝜇𝑡)(−𝛼/(1+𝛼)) − 𝑐𝜇𝑡 = 0
Rearranging we get:
𝜇𝑡 = −𝑎1 / [((1+𝛼)/𝛼)𝑐 + (𝛼/(1+𝛼))𝑎2] − (𝛼²𝑎2 / {[((1+𝛼)/𝛼)𝑐 + (𝛼/(1+𝛼))𝑎2](1+𝛼)}) 𝜇̄
Imposing the equilibrium condition 𝜇𝑡 = 𝜇̄ and solving gives

𝜇̄ = −𝑎1 / [((1+𝛼)/𝛼)𝑐 + (𝛼/(1+𝛼))𝑎2 + (𝛼²/(1+𝛼))𝑎2]
In light of results presented in the previous section, this can be simplified to:
𝜇̄ = −𝛼𝑎1 / (𝛼²𝑎2 + (1+𝛼)𝑐)
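A numerical check of the algebra above (with illustrative parameters): the simplified expression for 𝜇̄ is indeed a fixed point of the time-𝑡 government's best response to a constant future setting 𝜇̄:

```python
a1, a2, c, α = 1.5, 2.0, 0.3, 0.8            # illustrative parameters

D = (1 + α) / α * c + α / (1 + α) * a2       # common denominator above

def best_response(μ_bar):
    """μ_t chosen by the time-t government, from the first-order condition"""
    return -a1 / D - α**2 * a2 / (D * (1 + α)) * μ_bar

μ_bar = -α * a1 / (α**2 * a2 + (1 + α) * c)  # candidate MPE value
assert abs(best_response(μ_bar) - μ_bar) < 1e-12
```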
77.10 Equilibrium Outcomes for Three Models of Government Policy Making

Below we compute sequences {𝜃𝑡, 𝜇𝑡} under a Ramsey plan and compare these with the constant levels of 𝜃 and 𝜇 in (a) a Markov perfect equilibrium and (b) a Ramsey plan in which the planner is restricted to choose 𝜇𝑡 = 𝜇̌ for all 𝑡 ≥ 0
We denote the Ramsey sequence as 𝜃𝑅 , 𝜇𝑅 and the MPE values as 𝜃𝑀𝑃 𝐸 , 𝜇𝑀𝑃 𝐸
The bliss level of inflation is denoted by 𝜃∗
First, we will create a class ChangLQ that solves the models and stores their values
class ChangLQ:
"""
Class to solve LQ Chang model
"""
def __init__(self, α, α0, α1, α2, c, T=1000, θ_n=200):
# Record parameters
self.α, self.α0, self.α1 = α, α0, α1
self.α2, self.c, self.T, self.θ_n = α2, c, T, θ_n
# LQ Matrices
R = -np.array([[α0, -α1 * α / 2],
[-α1 * α/2, -α2 * α**2 / 2]])
Q = -np.array([[-c / 2]])
A = np.array([[1, 0], [0, (1 + α) / α]])
B = np.array([[0], [-1 / α]])
# Solve Subproblem 2
self.θ_R = -self.P[0, 1] / self.P[1, 1]
self.J_series = J_series
self.μ_series = μ_series
self.θ_series = θ_series
J_LB = min(J_space)
J_UB = max(J_space)
J_range = J_UB - J_LB
self.J_LB = J_LB - 0.05 * J_range
self.J_UB = J_UB + 0.05 * J_range
self.J_range = J_range
self.J_space = J_space
self.θ_space = θ_space
self.μ_space = μ_space
self.θ_prime = θ_prime
self.check_space = check_space
Out[3]: 0.8464817248906141
The following code generates a figure that plots the value function from the Ramsey Planner’s
problem, which is maximized at 𝜃0𝑅
The figure also shows the limiting value 𝜃∞^𝑅 to which the inflation rate 𝜃𝑡 converges under the Ramsey plan and compares it to the MPE value and the bliss value
def plot_value_function(clq):
    """
    Plot the Ramsey planner's value function over θ, marking θ_0^R, θ_∞^R,
    the bliss value θ* and the MPE value θ^MPE
    """
    fig, ax = plt.subplots()
    ax.set_xlim([clq.θ_LB, clq.θ_UB])
    ax.set_ylim([clq.J_LB, clq.J_UB])
    t1 = clq.θ_space[np.argmax(clq.J_space)]
    tR = clq.θ_series[1, -1]
    θ_points = [t1, tR, clq.θ_B, clq.θ_MPE]
    labels = [r"$\theta_0^R$", r"$\theta_\infty^R$",
              r"$\theta^*$", r"$\theta^{MPE}$"]
plot_value_function(clq)
The next code generates a figure that plots the value function from the Ramsey Planner’s
problem as well as that for a Ramsey planner that must choose a constant 𝜇 (that in turn
equals an implied constant 𝜃)
plt.xlabel(r"$\theta$", fontsize=18)
ax.plot(clq.θ_space, clq.check_space,
lw=2, label=r"$V^\check(\theta)$")
plt.legend(fontsize=14, loc='upper left')
θ_points = [clq.θ_space[np.argmax(clq.J_space)],
clq.μ_check]
labels = [r"$\theta_0^R$", r"$\theta^\check$"]
compare_ramsey_check(clq)
The next code generates figures that plot the policy functions for a continuation Ramsey
planner
The left figure shows the choice of 𝜃′ chosen by a continuation Ramsey planner who inherits 𝜃
The right figure plots a continuation Ramsey planner’s choice of 𝜇 as a function of an inher-
ited 𝜃
ax = axes[0]
ax.set_ylim([clq.θ_LB, clq.θ_UB])
ax.plot(clq.θ_space, clq.θ_prime,
label=r"$\theta'(\theta)$", lw=2)
x = np.linspace(clq.θ_LB, clq.θ_UB, 5)
ax.plot(x, x, 'k--', lw=2, alpha=0.7)
ax.set_ylabel(r"$\theta'$", fontsize=18)
θ_points = [clq.θ_space[np.argmax(clq.J_space)],
clq.θ_series[1, -1]]
ax = axes[1]
μ_min = min(clq.μ_space)
μ_max = max(clq.μ_space)
for ax in axes:
ax.set_xlabel(r"$\theta$", fontsize=18)
ax.set_xlim([clq.θ_LB, clq.θ_UB])
plot_policy_functions(clq)
The following code generates a figure that plots sequences of 𝜇 and 𝜃 in the Ramsey plan and
compares these to the constant levels in a MPE and in a Ramsey plan with a government re-
stricted to set 𝜇𝑡 to a constant for all 𝑡
plt.tight_layout()
plt.show()
plot_ramsey_MPE(clq)
The variation over time in 𝜇⃗ chosen by the Ramsey planner is a symptom of time inconsis-
tency
• The Ramsey planner reaps immediate benefits from promising lower inflation later to be
achieved by costly distorting taxes
• These benefits are intermediated by reductions in expected inflation that precede the re-
ductions in money creation rates that rationalize them, as indicated by equation Eq. (3)
• A government authority offered the opportunity to ignore effects on past utilities and to
reoptimize at date 𝑡 ≥ 1 would, if allowed, want to deviate from a Ramsey plan
Note: A modified Ramsey plan constructed under the restriction that 𝜇𝑡 must be constant over time is time consistent (see 𝜇̌ and 𝜃̌ in the above graphs)
In settings in which governments actually choose sequentially, many economists regard a time-inconsistent plan as implausible because of the incentives to deviate that arise along the plan
A way to summarize this defect in a Ramsey plan is to say that it is not credible because incentives for policymakers to deviate from it endure along the plan
For that reason, the Markov perfect equilibrium concept attracts many economists
The no incentive to deviate from the plan property is what makes the Markov perfect equilib-
rium concept attractive
Research by Abreu [1], Chari and Kehoe [26] [124], and Stokey [125] discovered conditions
under which a Ramsey plan can be rescued from the complaint that it is not credible
They accomplished this by expanding the description of a plan to include expectations about
adverse consequences of deviating from it that can serve to deter deviations
We turn to such theories of sustainable plans next
77.11 A Fourth Model of Government Decision Making

• The government chooses {𝜇𝑡}_{𝑡=0}^∞ not once and for all at 𝑡 = 0 but instead sets 𝜇𝑡 at time 𝑡, not before
• private agents’ forecasts of {𝜇𝑡+𝑗+1 , 𝜃𝑡+𝑗+1 }∞
𝑗=0 respond to whether the government at 𝑡
confirms or disappoints their forecasts of 𝜇𝑡 brought into period 𝑡 from period 𝑡 − 1
• the government at each time 𝑡 understands how private agents’ forecasts will respond to
its choice of 𝜇𝑡
• at each 𝑡, the government chooses 𝜇𝑡 to maximize a continuation discounted utility of a
representative household
• at time 𝑡 − 1, private agents expect that the government will set 𝜇𝑡 = 𝜇̃𝑡, and more generally that it will set 𝜇𝑡+𝑗 = 𝜇̃𝑡+𝑗 for all 𝑗 ≥ 0
• Those forecasts determine a 𝜃𝑡 = 𝜃̃𝑡 and an associated log of real balances 𝑚𝑡 − 𝑝𝑡 = −𝛼𝜃̃𝑡 at 𝑡
• Given those expectations and the associated 𝜃𝑡 , at 𝑡 a government is free to set 𝜇𝑡 ∈ R
• If the government at 𝑡 confirms private agents’ expectations by setting 𝜇𝑡 = 𝜇̃𝑡 at time 𝑡, private agents expect the continuation government policy {𝜇̃𝑡+𝑗+1}_{𝑗=0}^∞ and therefore bring expectation 𝜃̃𝑡+1 into period 𝑡 + 1
• If the government at 𝑡 disappoints private agents by setting 𝜇𝑡 ≠ 𝜇̃𝑡, private agents expect {𝜇𝑗^𝐴}_{𝑗=0}^∞ as the continuation policy for 𝑡 + 1, i.e., {𝜇𝑡+𝑗+1} = {𝜇𝑗^𝐴}_{𝑗=0}^∞, and therefore expect 𝜃0^𝐴 for 𝑡 + 1. Here 𝜇⃗^𝐴 = {𝜇𝑗^𝐴}_{𝑗=0}^∞ is an alternative government plan to be described below
The government’s one-period return function 𝑠(𝜃, 𝜇) described in equation Eq. (6) above has
the property that for all 𝜃
𝑠(𝜃, 0) ≥ 𝑠(𝜃, 𝜇)
This inequality implies that whenever the policy calls for the government to set 𝜇 ≠ 0, the
government could raise its one-period return by setting 𝜇 = 0
Disappointing private sector expectations in that way would increase the government’s cur-
rent payoff but would have adverse consequences for subsequent government payoffs be-
cause the private sector would alter its expectations about future settings of 𝜇
The temporary gain constitutes the government’s temptation to deviate from a plan
If the government at 𝑡 is to resist the temptation to raise its current payoff, it is only because
it forecasts adverse consequences that its setting of 𝜇𝑡 would bring for subsequent government
payoffs via alterations in the private sector’s expectations
That credible plans come in pairs seems to threaten an explosion in the number of plans to keep track of
But Dilip Abreu showed how to render the number of plans that must be kept track of manageable
The key is an object called a self-enforcing plan
A plan 𝜇⃗^𝐴 is said to be self-enforcing if

𝑣𝑗^𝐴 = 𝑠(𝜃𝑗^𝐴, 𝜇𝑗^𝐴) + 𝛽𝑣𝑗+1^𝐴
     ≥ 𝑠(𝜃𝑗^𝐴, 0) + 𝛽𝑣0^𝐴 ≡ 𝑣𝑗^{𝐴,𝐷} ,   𝑗 ≥ 0    (10)
(Here it is useful to recall that setting 𝜇 = 0 is the maximizing choice for the government’s
one-period return function)
The first line tells the consequences of confirming private agents’ expectations, while the sec-
ond line tells the consequences of disappointing private agents’ expectations
A consequence of the definition is that a self-enforcing plan is credible
Self-enforcing plans can be used to construct other credible plans, including ones with better
values
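The definition in Eq. (10) can be sketched in code using constant plans, for which the value recursion solves in closed form (the parameters and the one-period return below are illustrative, not the lecture's calibration):

```python
a0, a1, a2, c, α, β = 1.0, 1.5, 2.0, 0.3, 0.8, 0.85   # illustrative parameters

def s(θ, μ):
    """One-period government return U(-αθ) - (c/2)μ², as in Eq. (6)"""
    m = -α * θ
    return a0 + a1 * m - (a2 / 2) * m**2 - (c / 2) * μ**2

def self_enforcing_constant(μ_const):
    """Check Eq. (10) for the plan μ_j = μ_const for all j, which implies
    θ_j = μ_const and v_j = s/(1-β) for every j"""
    v = s(μ_const, μ_const) / (1 - β)
    return v >= s(μ_const, 0.0) + β * v - 1e-12   # small numerical tolerance

# Deviating from μ = 0 changes nothing, so that plan is (weakly) self-enforcing
assert self_enforcing_constant(0.0)
# For μ ≠ 0, deviating to μ = 0 and restarting the same plan is a free lunch,
# so a constant nonzero plan fails the inequality in Eq. (10)
assert not self_enforcing_constant(0.05)
```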
A sufficient condition for a plan 𝜇⃗ to be credible or sustainable is that deviations from it trigger reversion to a self-enforcing plan 𝜇⃗^𝐴, and that for all 𝑗 ≥ 0

𝑣𝑗 = 𝑠(𝜃𝑗, 𝜇𝑗) + 𝛽𝑣𝑗+1 ≥ 𝑠(𝜃𝑗, 0) + 𝛽𝑣0^𝐴
Abreu taught us that the key step in constructing a credible plan is first constructing a self-enforcing plan that has a low time 0 value
The idea is to use the self-enforcing plan as a continuation plan whenever the government’s choice at time 𝑡 fails to confirm private agents’ expectations
We shall use a construction featured in [1] to construct a self-enforcing plan with low time 0
value
[1] invented a way to create a self-enforcing plan with a low initial value
Imitating his idea, we can construct a self-enforcing plan 𝜇⃗ with a low time 0 value to the
government by insisting that future government decision makers set 𝜇𝑡 to a value yielding
low one-period utilities to the household for a long time, after which government decisions
thereafter yield high one-period utilities
Consider a plan 𝜇⃗^𝐴 that sets 𝜇𝑡^𝐴 = 𝜇̄ (a high positive number) for 𝑇𝐴 periods, and then reverts to the Ramsey plan
Denote this sequence by {𝜇𝑡^𝐴}_{𝑡=0}^∞
𝜃𝑡^𝐴 = (1/(1+𝛼)) ∑_{𝑗=0}^∞ (𝛼/(1+𝛼))^𝑗 𝜇𝑡+𝑗^𝐴
𝑣0^𝐴 = ∑_{𝑡=0}^{𝑇𝐴−1} 𝛽^𝑡 𝑠(𝜃𝑡^𝐴, 𝜇𝑡^𝐴) + 𝛽^{𝑇𝐴} 𝐽(𝜃0^𝑅)
clq.V_A = np.zeros(T)
for t in range(T):
clq.V_A[t] = sum(U_A[t:] / clq.β**t)
plt.tight_layout()
plt.show()
abreu_plan(clq)
To confirm that the plan 𝜇⃗^𝐴 is self-enforcing, we plot an object that we call 𝑉𝑡^{𝐴,𝐷}, defined in the second line of equation Eq. (10) above
𝑉𝑡^{𝐴,𝐷} is the value at 𝑡 of deviating from the self-enforcing plan 𝜇⃗^𝐴 by setting 𝜇𝑡 = 0 and then restarting plan 𝜇⃗^𝐴 in period 𝑡 + 1
Out[9]: True
check_ramsey(clq)
Out[10]: True
We can represent a sustainable plan recursively by taking the continuation value 𝑣𝑡 as a state
variable
We form the following 3-tuple of functions:
𝜇̂𝑡 = 𝜈𝜇(𝑣𝑡)
𝜃𝑡 = 𝜈𝜃(𝑣𝑡)    (11)
𝑣𝑡+1 = 𝜈𝑣(𝑣𝑡, 𝜇𝑡)
In [11]: clq.J_series[0]
Out[11]: 6.67918822960449
In [12]: clq.J_check
Out[12]: 6.676729524674898
In [13]: clq.J_MPE
Out[13]: 6.663435886995107
The theory deployed in this lecture is an application of what we nickname dynamic pro-
gramming squared
The nickname refers to the fact that a value satisfying one Bellman equation is itself an argu-
ment in a second Bellman equation
Thus, our models have involved two Bellman equations:
A value 𝜃 from one Bellman equation appears as an argument of a second Bellman equation
for another value 𝑣
78
Optimal Taxation in an LQ
Economy
78.1 Contents
• Overview 78.2
• The Ramsey Problem 78.3
• Implementation 78.4
• Examples 78.5
• Exercises 78.6
• Solutions 78.7
In addition to what’s in Anaconda, this lecture will need the following libraries
78.2 Overview
The Ramsey problem [106] is to choose tax and borrowing plans that maximize the house-
hold’s welfare, taking the household’s optimizing behavior as given
There is a large number of competitive equilibria indexed by different government fiscal poli-
cies
The Ramsey planner chooses the best competitive equilibrium
We want to study the dynamics of tax rates, tax revenues, and government debt under a Ramsey plan
Because the Lucas and Stokey model features state-contingent government debt, the govern-
ment debt dynamics differ substantially from those in a model of Robert Barro [11]
The treatment given here closely follows this manuscript, prepared by Thomas J. Sargent and
Francois R. Velde
We cover only the key features of the problem in this lecture, leaving you to refer to that
source for additional results and intuition
78.3 The Ramsey Problem

We begin by outlining the key assumptions regarding technology, households, and the government sector
78.3.1 Technology
78.3.2 Households
Consider a representative household who chooses a path {ℓ𝑡 , 𝑐𝑡 } for labor and consumption to
maximize
−(1/2) E ∑_{𝑡=0}^∞ 𝛽^𝑡 [(𝑐𝑡 − 𝑏𝑡)² + ℓ𝑡²]    (1)
subject to the time 0 budget constraint

E ∑_{𝑡=0}^∞ 𝛽^𝑡 𝑝𝑡^0 [𝑑𝑡 + (1 − 𝜏𝑡)ℓ𝑡 + 𝑠𝑡 − 𝑐𝑡] = 0    (2)
Here
The scaled Arrow-Debreu price 𝑝𝑡0 is related to the unscaled Arrow-Debreu price as follows.
If we let 𝜋𝑡^0(𝑥^𝑡) denote the probability (density) of a history 𝑥^𝑡 = [𝑥𝑡, 𝑥𝑡−1, …, 𝑥0] of the state, then the Arrow-Debreu time 0 price of a claim on one unit of consumption at date 𝑡, history 𝑥^𝑡 would be
𝛽^𝑡 𝑝𝑡^0 / 𝜋𝑡^0(𝑥^𝑡)
Thus, our scaled Arrow-Debreu price is the ordinary Arrow-Debreu price multiplied by the probability of the corresponding history and divided by the discount factor 𝛽^𝑡
The budget constraint Eq. (2) requires that the present value of consumption be restricted to
equal the present value of endowments, labor income and coupon payments on bond holdings
78.3.3 Government
The government imposes a linear tax on labor income, fully committing to a stochastic path
of tax rates at time zero
The government also issues state-contingent debt
Given government tax and borrowing plans, we can construct a competitive equilibrium with
distorting government taxes
Among all such competitive equilibria, the Ramsey plan is the one that maximizes the welfare
of the representative consumer
Endowments, government expenditure, the preference shock process 𝑏𝑡 , and promised coupon
payments on initial government debt 𝑠𝑡 are all exogenous, and given by
• 𝑑𝑡 = 𝑆𝑑 𝑥𝑡
• 𝑔𝑡 = 𝑆𝑔 𝑥𝑡
• 𝑏𝑡 = 𝑆𝑏 𝑥𝑡
• 𝑠𝑡 = 𝑆𝑠 𝑥𝑡
The matrices 𝑆𝑑 , 𝑆𝑔 , 𝑆𝑏 , 𝑆𝑠 are primitives and {𝑥𝑡 } is an exogenous stochastic process taking
values in R𝑘
We consider two specifications for {𝑥𝑡 }
1. Discrete case: {𝑥𝑡 } is a discrete state Markov chain with transition matrix 𝑃
2. VAR case: {𝑥𝑡} obeys 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1 where {𝑤𝑡} is independent zero-mean Gaussian with identity covariance matrix
78.3.5 Feasibility
𝑐𝑡 + 𝑔 𝑡 = 𝑑 𝑡 + ℓ 𝑡 (3)
A labor-consumption process {ℓ𝑡 , 𝑐𝑡 } is called feasible if Eq. (3) holds for all 𝑡
Where 𝑝𝑡0 is again a scaled Arrow-Debreu price, the time zero government budget constraint
is
E ∑_{𝑡=0}^∞ 𝛽^𝑡 𝑝𝑡^0 (𝑠𝑡 + 𝑔𝑡 − 𝜏𝑡ℓ𝑡) = 0    (4)
78.3.7 Equilibrium
An equilibrium is a feasible allocation {ℓ𝑡 , 𝑐𝑡 }, a sequence of prices {𝑝𝑡0 }, and a tax system
{𝜏𝑡 } such that
1. The allocation {ℓ𝑡 , 𝑐𝑡 } is optimal for the household given {𝑝𝑡0 } and {𝜏𝑡 }
2. The government’s budget constraint Eq. (4) is satisfied
The Ramsey problem is to choose the equilibrium {ℓ𝑡 , 𝑐𝑡 , 𝜏𝑡 , 𝑝𝑡0 } that maximizes the house-
hold’s welfare
If {ℓ𝑡 , 𝑐𝑡 , 𝜏𝑡 , 𝑝𝑡0 } solves the Ramsey problem, then {𝜏𝑡 } is called the Ramsey plan
The solution procedure we adopt is
1. Use the first-order conditions from the household problem to pin down prices and allo-
cations given {𝜏𝑡 }
2. Use these expressions to rewrite the government budget constraint Eq. (4) in terms of
exogenous variables and allocations
3. Maximize the household’s objective function Eq. (1) subject to the constraint con-
structed in step 2 and the feasibility constraint Eq. (3)
The solution to this maximization problem pins down all quantities of interest
78.3.8 Solution
Step one is to obtain the first-order conditions for the household’s problem, taking taxes and prices as given
Letting 𝜇 be the Lagrange multiplier on Eq. (2), the first-order conditions are 𝑝𝑡^0 = (𝑏𝑡 − 𝑐𝑡)/𝜇 and ℓ𝑡 = (𝑏𝑡 − 𝑐𝑡)(1 − 𝜏𝑡)
Rearranging and normalizing at 𝜇 = 𝑏0 − 𝑐0, we can write these conditions as
𝑝𝑡^0 = (𝑏𝑡 − 𝑐𝑡)/(𝑏0 − 𝑐0)   and   𝜏𝑡 = 1 − ℓ𝑡/(𝑏𝑡 − 𝑐𝑡)    (5)
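A quick consistency check of Eq. (5) against the household first-order condition ℓ𝑡 = (𝑏𝑡 − 𝑐𝑡)(1 − 𝜏𝑡), with signs such that prices and tax rates are positive when 𝑏𝑡 > 𝑐𝑡 (the numbers below are made up for illustration):

```python
import numpy as np

b, c, l = 2.135, 1.0, 0.6        # illustrative time-t values with b_t > c_t
b0, c0 = 2.135, 0.9              # time 0 values used for the normalization

τ = 1 - l / (b - c)              # tax rate from Eq. (5)
p = (b - c) / (b0 - c0)          # scaled Arrow-Debreu price from Eq. (5)

# Eq. (5) was obtained by rearranging ℓ_t = (b_t - c_t)(1 - τ_t); check it holds
assert np.isclose((b - c) * (1 - τ), l)
assert p > 0
```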
Substituting Eq. (5) into the government’s budget constraint Eq. (4) yields
E ∑_{𝑡=0}^∞ 𝛽^𝑡 [(𝑏𝑡 − 𝑐𝑡)(𝑠𝑡 + 𝑔𝑡 − ℓ𝑡) + ℓ𝑡²] = 0    (6)
The Ramsey problem now amounts to maximizing Eq. (1) subject to Eq. (6) and Eq. (3)
The associated Lagrangian is
ℒ = E ∑_{𝑡=0}^∞ 𝛽^𝑡 { −(1/2)[(𝑐𝑡 − 𝑏𝑡)² + ℓ𝑡²] + 𝜆[(𝑏𝑡 − 𝑐𝑡)(ℓ𝑡 − 𝑠𝑡 − 𝑔𝑡) − ℓ𝑡²] + 𝜇𝑡[𝑑𝑡 + ℓ𝑡 − 𝑐𝑡 − 𝑔𝑡] }    (7)
The first-order conditions associated with 𝑐𝑡 and ℓ𝑡 are

(𝑏𝑡 − 𝑐𝑡) − 𝜆(ℓ𝑡 − 𝑠𝑡 − 𝑔𝑡) = 𝜇𝑡

and
ℓ𝑡 − 𝜆[(𝑏𝑡 − 𝑐𝑡 ) − 2ℓ𝑡 ] = 𝜇𝑡
Combining these last two equalities with Eq. (3) and working through the algebra, one can show that

ℓ𝑡 = ℓ̄𝑡 − 𝜈𝑚𝑡   and   𝑐𝑡 = 𝑐̄𝑡 − 𝜈𝑚𝑡    (8)

where
• 𝜈 ∶= 𝜆/(1 + 2𝜆)
• ℓ𝑡̄ ∶= (𝑏𝑡 − 𝑑𝑡 + 𝑔𝑡 )/2
• 𝑐𝑡̄ ∶= (𝑏𝑡 + 𝑑𝑡 − 𝑔𝑡 )/2
• 𝑚𝑡 ∶= (𝑏𝑡 − 𝑑𝑡 − 𝑠𝑡 )/2
Apart from 𝜈, all of these quantities are expressed in terms of exogenous variables
To solve for 𝜈, we can use the government’s budget constraint again
The term inside the brackets in Eq. (6) is (𝑏𝑡 − 𝑐𝑡 )(𝑠𝑡 + 𝑔𝑡 ) − (𝑏𝑡 − 𝑐𝑡 )ℓ𝑡 + ℓ𝑡2
Using Eq. (8), the definitions above, and the fact that ℓ̄𝑡 = 𝑏𝑡 − 𝑐̄𝑡, this term can be rewritten as (𝑏𝑡 − 𝑐̄𝑡)(𝑔𝑡 + 𝑠𝑡) + 2𝑚𝑡²(𝜈² − 𝜈), so that the budget constraint Eq. (6) becomes

E {∑_{𝑡=0}^∞ 𝛽^𝑡 (𝑏𝑡 − 𝑐̄𝑡)(𝑔𝑡 + 𝑠𝑡)} + (𝜈² − 𝜈) E {∑_{𝑡=0}^∞ 𝛽^𝑡 2𝑚𝑡²} = 0    (9)
• The two expectations terms in Eq. (9) can be solved for in terms of model primitives
• This in turn allows us to solve for the Lagrange multiplier 𝜈
• With 𝜈 in hand, we can go back and solve for the allocations via Eq. (8)
• Once we have the allocations, prices and the tax system can be derived from Eq. (5)
If we define

𝑏0 ∶= E {∑_{𝑡=0}^∞ 𝛽^𝑡 (𝑏𝑡 − 𝑐̄𝑡)(𝑔𝑡 + 𝑠𝑡)}   and   𝑎0 ∶= E {∑_{𝑡=0}^∞ 𝛽^𝑡 2𝑚𝑡²}    (10)

then Eq. (9) reduces to solving the quadratic equation

𝑏0 + 𝑎0(𝜈² − 𝜈) = 0

for 𝜈
Provided that 4𝑏0 < 𝑎0 , there is a unique solution 𝜈 ∈ (0, 1/2), and a unique corresponding
𝜆>0
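In code, the relevant root of the quadratic is the smaller one (a sketch with illustrative values of 𝑎0 and 𝑏0 satisfying 4𝑏0 < 𝑎0):

```python
import numpy as np

b0, a0 = 0.16, 1.0                          # illustrative; note 4 * b0 < a0

ν = (1 - np.sqrt(1 - 4 * b0 / a0)) / 2      # smaller root of b0 + a0(ν² - ν) = 0
assert np.isclose(b0 + a0 * (ν**2 - ν), 0)
assert 0 < ν < 0.5                          # the root in (0, 1/2)

λ = ν / (1 - 2 * ν)                         # invert ν = λ/(1 + 2λ)
assert λ > 0                                # unique corresponding λ > 0
```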
Let’s work out how to compute mathematical expectations in Eq. (10)
For the first one, the random variable (𝑏𝑡 − 𝑐𝑡̄ )(𝑔𝑡 + 𝑠𝑡 ) inside the summation can be expressed
as
(1/2) 𝑥𝑡′ (𝑆𝑏 − 𝑆𝑑 + 𝑆𝑔)′(𝑆𝑔 + 𝑆𝑠) 𝑥𝑡
For the second expectation in Eq. (10), the random variable 2𝑚2𝑡 can be written as
(1/2) 𝑥𝑡′ (𝑆𝑏 − 𝑆𝑑 − 𝑆𝑠)′(𝑆𝑏 − 𝑆𝑑 − 𝑆𝑠) 𝑥𝑡
It follows that both objects of interest are special cases of the expression
𝑞(𝑥0) = E ∑_{𝑡=0}^∞ 𝛽^𝑡 𝑥𝑡′𝐻𝑥𝑡    (11)
Suppose first that {𝑥𝑡} is the Gaussian VAR described above. In this case, 𝑞(𝑥0) = 𝑥0′𝑄𝑥0 + 𝑣, where 𝑄 solves 𝑄 = 𝐻 + 𝛽𝐴′𝑄𝐴 and 𝑣 = (𝛽/(1 − 𝛽)) trace(𝐶′𝑄𝐶). The first equation is known as a discrete Lyapunov equation and can be solved using this function
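As a sketch of the VAR case (using scipy's general-purpose Lyapunov solver rather than the quantecon helper the lecture relies on), here is a scalar example where the sum is known in closed form:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

β = 0.95
A = np.array([[0.9]])     # scalar "VAR" so the answer is known in closed form
C = np.zeros((1, 1))      # shut down the noise for this check
H = np.array([[1.0]])

# Q = H + β A'QA is a discrete Lyapunov equation in √β · A
Q = solve_discrete_lyapunov(np.sqrt(β) * A.T, H)
v = β / (1 - β) * np.trace(C.T @ Q @ C)

x0 = np.array([2.0])
q = x0 @ Q @ x0 + v       # E Σ β^t x_t' H x_t

# deterministic scalar case: Σ β^t (0.9^t x0)² = x0² / (1 - β · 0.81)
assert np.isclose(q, x0[0]**2 / (1 - β * 0.9**2))
```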
Next, suppose that {𝑥𝑡 } is the discrete Markov process described above
Suppose further that each 𝑥𝑡 takes values in the state space {𝑥^1, … , 𝑥^𝑁} ⊂ R^𝑘
Let ℎ ∶ R𝑘 → R be a given function, and suppose that we wish to evaluate
𝑞(𝑥0) = E ∑_{𝑡=0}^∞ 𝛽^𝑡 ℎ(𝑥𝑡)   given 𝑥0 = 𝑥^𝑗
In this case, we have

𝑞(𝑥0) = ∑_{𝑡=0}^∞ 𝛽^𝑡 (𝑃^𝑡ℎ)[𝑗]    (12)
Here

• ℎ is the 𝑁 × 1 vector whose 𝑗-th entry is ℎ(𝑥^𝑗)
• 𝑃^𝑡 is the 𝑡-th power of the transition matrix 𝑃
• (𝑃^𝑡ℎ)[𝑗] is the 𝑗-th element of 𝑃^𝑡ℎ
It can be shown that Eq. (12) is in fact equal to the 𝑗-th element of the vector (𝐼 − 𝛽𝑃 )−1 ℎ
This last fact is applied in the calculations below
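A quick verification of this fact on a toy chain (the values below are illustrative):

```python
import numpy as np

β = 0.9
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])                 # toy transition matrix
h = np.array([1.0, -1.0])                  # h(x^j) stacked into a vector

exact = np.linalg.solve(np.eye(2) - β * P, h)   # (I - βP)^{-1} h

# compare with a long truncation of Σ β^t P^t h
approx = sum(β**t * np.linalg.matrix_power(P, t) @ h for t in range(500))
assert np.allclose(exact, approx)
```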
We are interested in tracking several other variables besides the ones described above.
To prepare the way for this, we define
𝑝𝑡+𝑗^𝑡 = (𝑏𝑡+𝑗 − 𝑐𝑡+𝑗)/(𝑏𝑡 − 𝑐𝑡)
as the scaled Arrow-Debreu time 𝑡 price of a history contingent claim on one unit of con-
sumption at time 𝑡 + 𝑗
These are prices that would prevail at time 𝑡 if markets were reopened at time 𝑡
These prices are constituents of the present value of government obligations outstanding at
time 𝑡, which can be expressed as
𝐵𝑡 ∶= E𝑡 ∑_{𝑗=0}^∞ 𝛽^𝑗 𝑝𝑡+𝑗^𝑡 (𝜏𝑡+𝑗ℓ𝑡+𝑗 − 𝑔𝑡+𝑗)    (13)
Using our expression for prices and the Ramsey plan, we can also write 𝐵𝑡 as
𝐵𝑡 = E𝑡 ∑_{𝑗=0}^∞ 𝛽^𝑗 [(𝑏𝑡+𝑗 − 𝑐𝑡+𝑗)(ℓ𝑡+𝑗 − 𝑔𝑡+𝑗) − ℓ𝑡+𝑗²] / (𝑏𝑡 − 𝑐𝑡)
Since 𝑝𝑡+𝑗^𝑡 = 𝑝𝑡+1^𝑡 𝑝𝑡+𝑗^{𝑡+1}, we can decompose 𝐵𝑡 as

𝐵𝑡 = (𝜏𝑡ℓ𝑡 − 𝑔𝑡) + E𝑡 ∑_{𝑗=1}^∞ 𝛽^𝑗 𝑝𝑡+𝑗^𝑡 (𝜏𝑡+𝑗ℓ𝑡+𝑗 − 𝑔𝑡+𝑗)
and
𝐵𝑡 = (𝜏𝑡ℓ𝑡 − 𝑔𝑡) + 𝛽E𝑡 𝑝𝑡+1^𝑡 𝐵𝑡+1    (14)
Define the gross one-period risk-free interest rate 𝑅𝑡 by

𝑅𝑡^{−1} ∶= 𝛽 E𝑡 𝑝𝑡+1^𝑡    (15)
78.3.12 A Martingale
Π𝑡 ∶= ∑_{𝑠=0}^𝑡 𝜋𝑠
• 𝑅𝑡[𝐵𝑡 + 𝑔𝑡 − 𝜏𝑡ℓ𝑡], which is what the government would have owed at the beginning of period 𝑡 + 1 if it had simply borrowed at the one-period risk-free rate rather than selling state-contingent securities
Thus, 𝜋𝑡+1 is the excess payout on the actual portfolio of state-contingent government debt
relative to an alternative portfolio sufficient to finance 𝐵𝑡 + 𝑔𝑡 − 𝜏𝑡 ℓ𝑡 and consisting entirely of
risk-free one-period bonds
Use expressions Eq. (14) and Eq. (15) to obtain
𝜋𝑡+1 = 𝐵𝑡+1 − (1/(𝛽E𝑡 𝑝𝑡+1^𝑡)) [𝛽E𝑡 (𝑝𝑡+1^𝑡 𝐵𝑡+1)]
or

𝜋𝑡+1 = 𝐵𝑡+1 − 𝐸̃𝑡 𝐵𝑡+1
where 𝐸𝑡̃ is the conditional mathematical expectation taken with respect to a one-step tran-
sition density that has been formed by multiplying the original transition density with the
likelihood ratio
𝑚𝑡+1^𝑡 = 𝑝𝑡+1^𝑡 / E𝑡 𝑝𝑡+1^𝑡

It follows from the preceding definitions that

𝐸̃𝑡 𝜋𝑡+1 = 0

which asserts that {𝜋𝑡+1} is a martingale difference sequence under the distorted probability measure, and that {Π𝑡} is a martingale under the distorted probability measure
In the tax-smoothing model of Robert Barro [11], government debt is a random walk
In the current model, government debt {𝐵𝑡 } is not a random walk, but the excess payoff
{Π𝑡 } on it is
78.4 Implementation
Parameters
===========
T: int
Length of the simulation
Returns
========
path: a namedtuple of type 'Path', containing
g - Govt spending
d - Endowment
b - Utility shift parameter
s - Coupon payment on existing debt
c - Consumption
l - Labor
p - Price
τ - Tax rate
rvn - Revenue
B - Govt debt
R - Risk-free gross return
π - One-period risk-free interest rate
Π - Cumulative rate of return, adjusted
ξ - Adjustment factor for Π
"""
# == Simplify names == #
β, Sg, Sd, Sb, Ss = econ.β, econ.Sg, econ.Sd, econ.Sb, econ.Ss
if econ.discrete:
P, x_vals = econ.proc
else:
A, C = econ.proc
rvn = l * τ
return path
def gen_fig_1(path):
"""
The parameter is the path namedtuple returned by compute_paths(). See
the docstring of that function for details.
"""
T = len(path.c)
# == Prepare axes == #
num_rows, num_cols = 2, 2
fig, axes = plt.subplots(num_rows, num_cols, figsize=(14, 10))
plt.subplots_adjust(hspace=0.4)
for i in range(num_rows):
for j in range(num_cols):
axes[i, j].grid()
axes[i, j].set_xlabel('Time')
bbox = (0., 1.02, 1., .102)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}
plt.show()
def gen_fig_2(path):
"""
The parameter is the path namedtuple returned by compute_paths(). See
the docstring of that function for details.
"""
T = len(path.c)
# == Prepare axes == #
num_rows, num_cols = 2, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 10))
plt.subplots_adjust(hspace=0.5)
bbox = (0., 1.02, 1., .102)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}
plt.show()
The function var_quadratic_sum imported from quadsums is for computing the value of
Eq. (11) when the exogenous process {𝑥𝑡 } is of the VAR type described above
Below the definition of the function, you will see definitions of two namedtuple objects,
Economy and Path
The first is used to collect all the parameters and primitives of a given LQ economy, while the
second collects output of the computations
In Python, a namedtuple is a popular data type from the collections module of the
standard library that replicates the functionality of a tuple, but also allows you to assign a
name to each tuple element
These elements can then be referenced via dotted attribute notation — see for example the use of path in the functions gen_fig_1() and gen_fig_2()
The benefits of using namedtuples:

• Keeps content organized by meaning
• Helps ensure that all required data have been supplied
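Here is a minimal illustration of the pattern (the type and field names below are made up, not the lecture's actual Economy or Path definitions):

```python
from collections import namedtuple

# A small namedtuple in the same style as Economy and Path
Model = namedtuple('Model', ('β', 'T'))
m = Model(β=0.95, T=50)

assert m.β == 0.95 and m.T == 50     # dotted attribute access
assert m == (0.95, 50)               # still behaves like an ordinary tuple
```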
78.5 Examples
• 𝛽 = 1/1.05
• 𝑏𝑡 = 2.135 and 𝑠𝑡 = 𝑑𝑡 = 0 for all 𝑡
In [3]: # == Parameters == #
β = 1 / 1.05
ρ, mg = .7, .35
A = eye(2)
A[0, :] = ρ, mg * (1-ρ)
C = np.zeros((2, 1))
C[0, 0] = np.sqrt(1 - ρ**2) * mg / 10
Sg = np.array((1, 0)).reshape(1, 2)
Sd = np.array((0, 0)).reshape(1, 2)
Sb = np.array((0, 2.135)).reshape(1, 2)
Ss = np.array((0, 0)).reshape(1, 2)
T = 50
path = compute_paths(T, economy)
gen_fig_1(path)
In [4]: gen_fig_2(path)
Our second example adopts a discrete Markov specification for the exogenous process
In [5]: # == Parameters == #
β = 1 / 1.05
P = np.array([[0.8, 0.2, 0.0],
[0.0, 0.5, 0.5],
[0.0, 0.0, 1.0]])
Sg = np.array((1, 0, 0, 0, 0)).reshape(1, 5)
Sd = np.array((0, 1, 0, 0, 0)).reshape(1, 5)
Sb = np.array((0, 0, 1, 0, 0)).reshape(1, 5)
Ss = np.array((0, 0, 0, 1, 0)).reshape(1, 5)
T = 15
path = compute_paths(T, economy)
gen_fig_1(path)
In [6]: gen_fig_2(path)
78.6 Exercises
78.6.1 Exercise 1
78.7 Solutions
78.7.1 Exercise 1
In [7]: # == Parameters == #
β = 1 / 1.05
ρ, mg = .95, .35
A = np.array([[0, 0, 0, ρ, mg*(1-ρ)],
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 1]])
C = np.zeros((5, 1))
C[0, 0] = np.sqrt(1 - ρ**2) * mg / 8
Sg = np.array((1, 0, 0, 0, 0)).reshape(1, 5)
Sd = np.array((0, 0, 0, 0, 0)).reshape(1, 5)
Sb = np.array((0, 0, 0, 0, 2.135)).reshape(1, 5) # Chosen st. (Sc + Sg) * x0 = 1
Ss = np.array((0, 0, 0, 0, 0)).reshape(1, 5)
T = 50
path = compute_paths(T, economy)
gen_fig_1(path)
In [8]: gen_fig_2(path)
79 Optimal Taxation with State-Contingent Debt
79.1 Contents
• Overview 79.2
• Examples 79.5
In addition to what’s in Anaconda, this lecture will need the following libraries
79.2 Overview
This lecture describes a celebrated model of optimal fiscal policy by Robert E. Lucas, Jr., and
Nancy Stokey [90]
The model revisits classic issues about how to pay for a war
Here a war means a more or less temporary surge in an exogenous government expenditure
process
The model features
• a Ramsey planner who at time 𝑡 = 0 chooses a plan for taxes and trades of Arrow secu-
rities for all 𝑡 ≥ 0
After first presenting the model in a space of sequences, we shall represent it recursively
in terms of two Bellman equations formulated along lines that we encountered in Dynamic
Stackelberg models
As in Dynamic Stackelberg models, to apply dynamic programming we shall define the state
vector artfully
In particular, we shall include forward-looking variables that summarize optimal responses of
private agents to a Ramsey plan
See Optimal taxation for analysis within a linear-quadratic setting
For $t \geq 0$, a history $s^t = [s_t, s_{t-1}, \ldots, s_0]$ of an exogenous state $s_t$ has joint probability density $\pi_t(s^t)$
We begin by assuming that government purchases 𝑔𝑡 (𝑠𝑡 ) at time 𝑡 ≥ 0 depend on 𝑠𝑡
Let 𝑐𝑡 (𝑠𝑡 ), ℓ𝑡 (𝑠𝑡 ), and 𝑛𝑡 (𝑠𝑡 ) denote consumption, leisure, and labor supply, respectively, at
history 𝑠𝑡 and date 𝑡
A representative household is endowed with one unit of time that can be divided between
leisure $\ell_t$ and labor $n_t$:

$$n_t(s^t) + \ell_t(s^t) = 1 \tag{1}$$

Output equals $n_t(s^t)$ and can be divided between $c_t(s^t)$ and $g_t(s^t)$

$$c_t(s^t) + g_t(s^t) = n_t(s^t) \tag{2}$$
A representative household ranks consumption, leisure plans according to

$$\sum_{t=0}^{\infty} \sum_{s^t} \beta^t \pi_t(s^t)\, u[c_t(s^t), \ell_t(s^t)] \tag{3}$$
where the utility function 𝑢 is increasing, strictly concave, and three times continuously dif-
ferentiable in both arguments
The technology pins down a pre-tax wage rate to unity for all 𝑡, 𝑠𝑡
The government imposes a flat-rate tax 𝜏𝑡 (𝑠𝑡 ) on labor income at time 𝑡, history 𝑠𝑡
There are complete markets in one-period Arrow securities
One unit of an Arrow security issued at time 𝑡 at history 𝑠𝑡 and promising to pay one unit of
time 𝑡 + 1 consumption in state 𝑠𝑡+1 costs 𝑝𝑡+1 (𝑠𝑡+1 |𝑠𝑡 )
The government issues one-period Arrow securities each period
The government has a sequence of budget constraints whose time $t \geq 0$ component is

$$g_t(s^t) = \tau_t(s^t)\, n_t(s^t) + \sum_{s_{t+1}} p_{t+1}(s_{t+1}|s^t)\, b_{t+1}(s_{t+1}|s^t) - b_t(s_t|s^{t-1}) \tag{4}$$
where
• 𝑝𝑡+1 (𝑠𝑡+1 |𝑠𝑡 ) is a competitive equilibrium price of one unit of consumption at date 𝑡 + 1
in state 𝑠𝑡+1 at date 𝑡 and history 𝑠𝑡
• 𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) is government debt falling due at time 𝑡, history 𝑠𝑡
$$c_t(s^t) + \sum_{s_{t+1}} p_{t+1}(s_{t+1}|s^t)\, b_{t+1}(s_{t+1}|s^t) = \left[1 - \tau_t(s^t)\right] n_t(s^t) + b_t(s_t|s^{t-1}) \quad \forall t \geq 0 \tag{5}$$
The household faces the price system as a price-taker and takes the government policy as
given
The household chooses $\{c_t(s^t), \ell_t(s^t)\}_{t=0}^{\infty}$ to maximize Eq. (3) subject to Eq. (5) and Eq. (1) for all $t, s^t$
A competitive equilibrium with distorting taxes is a feasible allocation, a price system,
and a government policy such that
• Given the price system and the government policy, the allocation solves the household’s
optimization problem
• Given the allocation, government policy, and price system, the government’s budget
constraint is satisfied for all 𝑡, 𝑠𝑡
We find it convenient sometimes to work with the Arrow-Debreu price system that is implied
by a sequence of Arrow securities prices
Let $q_t^0(s^t)$ be the price at time 0, measured in time 0 consumption goods, of one unit of consumption at time $t$, history $s^t$
$$q_{t+1}^0(s^{t+1}) = p_{t+1}(s_{t+1}|s^t)\, q_t^0(s^t) \quad \text{s.t.} \quad q_0^0(s^0) = 1 \tag{6}$$
Arrow-Debreu prices are useful when we want to compress a sequence of budget constraints
into a single intertemporal budget constraint, as we shall find it convenient to do below
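For instance, the recursion Eq. (6) can be applied along a single realized path of states; the one-period Arrow prices below are made-up numbers:

```python
# Compound one-period Arrow prices into date-0 Arrow-Debreu prices along a
# sample path, using q⁰_{t+1} = p_{t+1} · q⁰_t with q⁰_0 = 1 (toy numbers)
p = [0.95, 0.93, 0.96]   # p_{t+1}(s_{t+1} | s^t) along the realized path
q = [1.0]                # q⁰_0(s⁰) = 1
for p_next in p:
    q.append(p_next * q[-1])
print(q)
```
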
We apply a popular approach to solving a Ramsey problem, called the primal approach
The idea is to use first-order conditions for household optimization to eliminate taxes and
prices in favor of quantities, then pose an optimization problem cast entirely in terms of
quantities
After Ramsey quantities have been found, taxes and prices can then be unwound from the
allocation
The primal approach uses four steps:
1. Obtain first-order conditions of the household's problem and solve them for $\{q_t^0(s^t), \tau_t(s^t)\}_{t=0}^{\infty}$ as functions of the allocation $\{c_t(s^t), n_t(s^t)\}_{t=0}^{\infty}$
2. Substitute these expressions for taxes and prices in terms of the allocation into the
household’s present-value budget constraint
• This intertemporal constraint involves only the allocation and is regarded as an imple-
mentability constraint
3. Find the allocation that maximizes the utility of the representative household Eq. (3)
subject to the feasibility constraints Eq. (1) and Eq. (2) and the implementability condition
derived in step 2
4. Use the Ramsey allocation together with the formulas from step 1 to find taxes and
prices
By sequential substitution of one one-period budget constraint Eq. (5) into another, we can
obtain the household’s present-value budget constraint:
$$\sum_{t=0}^{\infty} \sum_{s^t} q_t^0(s^t)\, c_t(s^t) = \sum_{t=0}^{\infty} \sum_{s^t} q_t^0(s^t)\left[1 - \tau_t(s^t)\right] n_t(s^t) + b_0 \tag{7}$$
First-order conditions for the household’s problem for ℓ𝑡 (𝑠𝑡 ) and 𝑏𝑡 (𝑠𝑡+1 |𝑠𝑡 ), respectively, im-
ply
$$\left(1 - \tau_t(s^t)\right) = \frac{u_l(s^t)}{u_c(s^t)} \tag{8}$$
and
$$p_{t+1}(s_{t+1}|s^t) = \beta \pi(s_{t+1}|s^t) \left(\frac{u_c(s^{t+1})}{u_c(s^t)}\right) \tag{9}$$
$$q_t^0(s^t) = \beta^t \pi_t(s^t) \frac{u_c(s^t)}{u_c(s^0)} \tag{10}$$
Using the first-order conditions Eq. (8) and Eq. (9) to eliminate taxes and prices from
Eq. (7), we derive the implementability condition
$$\sum_{t=0}^{\infty} \sum_{s^t} \beta^t \pi_t(s^t) \left[u_c(s^t)\, c_t(s^t) - u_\ell(s^t)\, n_t(s^t)\right] - u_c(s^0)\, b_0 = 0 \tag{11}$$
$$\sum_{t=0}^{\infty} \sum_{s^t} \beta^t \pi_t(s^t)\, u[c_t(s^t), 1 - n_t(s^t)] \tag{12}$$
𝑉 [𝑐𝑡 (𝑠𝑡 ), 𝑛𝑡 (𝑠𝑡 ), Φ] = 𝑢[𝑐𝑡 (𝑠𝑡 ), 1 − 𝑛𝑡 (𝑠𝑡 )] + Φ [𝑢𝑐 (𝑠𝑡 )𝑐𝑡 (𝑠𝑡 ) − 𝑢ℓ (𝑠𝑡 )𝑛𝑡 (𝑠𝑡 )] (13)
$$J = \sum_{t=0}^{\infty} \sum_{s^t} \beta^t \pi_t(s^t) \left\{ V[c_t(s^t), n_t(s^t), \Phi] + \theta_t(s^t)\left[n_t(s^t) - c_t(s^t) - g_t(s^t)\right] \right\} - \Phi u_c(0)\, b_0 \tag{14}$$
where $\{\theta_t(s^t); \forall s^t\}_{t \geq 0}$ is a sequence of Lagrange multipliers on the feasibility conditions Eq. (2)

Given an initial government debt $b_0$, we want to maximize $J$ with respect to $\{c_t(s^t), n_t(s^t); \forall s^t\}_{t \geq 0}$ and to minimize with respect to $\{\theta_t(s^t); \forall s^t\}_{t \geq 0}$
The first-order conditions for the Ramsey problem for periods $t \geq 1$ and $t = 0$, respectively, are

$$\begin{aligned}
c_t(s^t) &:\quad (1+\Phi)\, u_c(s^t) + \Phi\left[u_{cc}(s^t)\, c_t(s^t) - u_{\ell c}(s^t)\, n_t(s^t)\right] - \theta_t(s^t) = 0, \quad t \geq 1 \\
n_t(s^t) &:\quad -(1+\Phi)\, u_\ell(s^t) - \Phi\left[u_{c\ell}(s^t)\, c_t(s^t) - u_{\ell\ell}(s^t)\, n_t(s^t)\right] + \theta_t(s^t) = 0, \quad t \geq 1
\end{aligned} \tag{15}$$

and

$$\begin{aligned}
c_0(s^0, b_0) &:\quad (1+\Phi)\, u_c(s^0, b_0) + \Phi\left[u_{cc}(s^0, b_0)\, c_0(s^0, b_0) - u_{\ell c}(s^0, b_0)\, n_0(s^0, b_0)\right] - \theta_0(s^0, b_0) - \Phi u_{cc}(s^0, b_0)\, b_0 = 0 \\
n_0(s^0, b_0) &:\quad -(1+\Phi)\, u_\ell(s^0, b_0) - \Phi\left[u_{c\ell}(s^0, b_0)\, c_0(s^0, b_0) - u_{\ell\ell}(s^0, b_0)\, n_0(s^0, b_0)\right] + \theta_0(s^0, b_0) + \Phi u_{c\ell}(s^0, b_0)\, b_0 = 0
\end{aligned} \tag{16}$$
If two histories $s^t$ and $\tilde{s}^\tau$ satisfy

$$g_t(s^t) = g_\tau(\tilde{s}^\tau) = g$$

then it follows from Eq. (17) that the Ramsey choices of consumption and leisure,
$(c_t(s^t), \ell_t(s^t))$ and $(c_\tau(\tilde{s}^\tau), \ell_\tau(\tilde{s}^\tau))$, are identical
The proposition asserts that the optimal allocation is a function of the currently realized
quantity of government purchases 𝑔 only and does not depend on the specific history that
preceded that realization of 𝑔
Henceforth, we assume that the exogenous state $s_t$ is governed by a Markov chain with transition probabilities $\Pi(s'|s)$

Also, assume that government purchases $g$ are an exact time-invariant function $g(s)$ of $s$
We maintain these assumptions throughout the remainder of this lecture
We complete the Ramsey plan by computing the Lagrange multiplier Φ on the implementabil-
ity constraint Eq. (11)
Government budget balance restricts Φ via the following line of reasoning
The household’s first-order conditions imply
$$\left(1 - \tau_t(s^t)\right) = \frac{u_l(s^t)}{u_c(s^t)} \tag{19}$$

$$p_{t+1}(s_{t+1}|s^t) = \beta\, \Pi(s_{t+1}|s_t) \frac{u_c(s^{t+1})}{u_c(s^t)} \tag{20}$$
Substituting from Eq. (19), Eq. (20), and the feasibility condition Eq. (2) into the recursive
version Eq. (5) of the household budget constraint gives
$$u_c(s^t)\left[n_t(s^t) - g_t(s^t)\right] + \beta \sum_{s_{t+1}} \Pi(s_{t+1}|s_t)\, u_c(s^{t+1})\, b_{t+1}(s_{t+1}|s^t) = u_l(s^t)\, n_t(s^t) + u_c(s^t)\, b_t(s_t|s^{t-1}) \tag{21}$$
In terms of the scaled government debt $x(s) = u_c(s)\, b(s)$ and a time-invariant allocation, equation Eq. (21) becomes

$$u_c(s)\left[n(s) - g(s)\right] + \beta \sum_{s'} \Pi(s'|s)\, x'(s') = u_l(s)\, n(s) + x(s) \tag{22}$$

where $s'$ denotes a next period value of $s$ and $x'(s')$ denotes a next period value of $x$

Equation Eq. (22) is easy to solve for $x(s)$ for $s = 1, \ldots, S$

If we let $\vec{n}, \vec{g}, \vec{x}$ denote $S \times 1$ vectors whose $i$th elements are the respective $n$, $g$, and $x$ values
when $s = i$, and let $\Pi$ be the transition matrix for the Markov state $s$, then we can express
Eq. (22) as the matrix equation

$$\vec{u}_c(\vec{n} - \vec{g}) + \beta \Pi \vec{x} = \vec{u}_l\, \vec{n} + \vec{x} \tag{23}$$

whose solution is

$$\vec{x} = (I - \beta \Pi)^{-1}\left[\vec{u}_c(\vec{n} - \vec{g}) - \vec{u}_l\, \vec{n}\right] \tag{24}$$

In these equations, by $\vec{u}_c \vec{n}$, for example, we mean element-by-element multiplication of the
two vectors
After solving for $\vec{x}$, we can find $b(s_t|s^{t-1})$ in Markov state $s_t = s$ from $b(s) = \frac{x(s)}{u_c(s)}$ or the
matrix equation
$$\vec{b} = \frac{\vec{x}}{\vec{u}_c} \tag{25}$$
where division here means an element-by-element division of the respective components of the
𝑆 × 1 vectors 𝑥⃗ and 𝑢⃗𝑐
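A toy two-state version of this linear solve and the element-by-element division, with made-up primitives, looks like this:

```python
import numpy as np

# Made-up two-state primitives (not this lecture's calibration)
β = 0.9
Π = np.array([[0.8, 0.2],
              [0.4, 0.6]])          # Markov transition matrix
u_c = np.array([4.0, 3.5])          # u_c(s) by state
u_l = np.array([0.5, 0.6])          # u_l(s) by state
n = np.array([0.6, 0.7])
g = np.array([0.1, 0.2])

# x = (I - βΠ)^{-1} [u_c(n - g) - u_l n]
rhs = u_c * (n - g) - u_l * n
x = np.linalg.solve(np.eye(2) - β * Π, rhs)

# Element-by-element division recovers b(s) = x(s) / u_c(s)
b = x / u_c
```
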
Here is a computational algorithm:
1. Start with a guess for the value for Φ, then use the first-order conditions and the feasi-
bility conditions to compute 𝑐(𝑠𝑡 ), 𝑛(𝑠𝑡 ) for 𝑠 ∈ [1, … , 𝑆] and 𝑐0 (𝑠0 , 𝑏0 ) and 𝑛0 (𝑠0 , 𝑏0 ),
given Φ
• these depend on Φ
$$u_{c,0}\, b_0 = u_{c,0}(n_0 - g_0) - u_{l,0}\, n_0 + \beta \sum_{s=1}^{S} \Pi(s|s_0)\, x(s) \tag{26}$$
2. Solve equation Eq. (26) for $\Phi$ by gradually raising $\Phi$ if the left side of Eq. (26) exceeds the
right side and lowering $\Phi$ if the left side is less than the right side
3. After computing a Ramsey allocation, recover the flat tax rate on labor from Eq. (8) and
the implied one-period Arrow securities prices from Eq. (9)
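The monotone search over $\Phi$ in step 2 can be sketched as a bisection; here `residual` stands in for the left side of Eq. (26) minus the right side, computed from the $\Phi$-dependent allocation:

```python
def solve_Φ(residual, lo=-1.0, hi=1.0, tol=1e-10):
    """Bisect on Φ: raise Φ while the residual of (26) is positive,
    lower it while negative (monotonicity is assumed)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid        # left side exceeds right side → raise Φ
        else:
            hi = mid        # left side below right side → lower Φ
    return 0.5 * (lo + hi)

# Usage with a stand-in residual whose root is known
Φ_star = solve_Φ(lambda Φ: 0.2 - Φ)
```
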
In summary, when 𝑔𝑡 is a time-invariant function of a Markov state 𝑠𝑡 , a Ramsey plan can be
constructed by solving 3𝑆 + 3 equations in 𝑆 components each of 𝑐,⃗ 𝑛,⃗ and 𝑥⃗ together with
𝑛0 , 𝑐0 , and Φ
In our calculations below and in a subsequent lecture based on an extension of the Lucas-
Stokey model by Aiyagari, Marcet, Sargent, and Seppälä (2002) [5], we shall modify the one-
period utility function assumed above
(We adopted the preceding utility specification because it was the one used in the original
[90] paper)
We will modify their specification by instead assuming that the representative agent has util-
ity function
$$u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}$$
𝑐𝑡 + 𝑔𝑡 = 𝑛𝑡
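A direct transcription of this utility function (the parameter values below are only placeholders), with a finite-difference check of its marginal utility of consumption:

```python
def u(c, n, σ=2.0, γ=2.0):
    # CRRA utility over consumption minus disutility of labor
    return c**(1 - σ) / (1 - σ) - n**(1 + γ) / (1 + γ)

def u_c(c, σ=2.0):
    return c**(-σ)   # marginal utility of consumption

# Finite-difference sanity check at an arbitrary point
c, n, h = 0.5, 0.6, 1e-6
fd = (u(c + h, n) - u(c - h, n)) / (2 * h)
```
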
With these understandings, equations Eq. (17) and Eq. (18) simplify in the case of the CRRA
utility function
They become
and
(1 + Φ)[𝑢𝑐 (𝑐0 ) + 𝑢𝑛 (𝑐0 + 𝑔0 )] + Φ[𝑐0 𝑢𝑐𝑐 (𝑐0 ) + (𝑐0 + 𝑔0 )𝑢𝑛𝑛 (𝑐0 + 𝑔0 )] − Φ𝑢𝑐𝑐 (𝑐0 )𝑏0 = 0 (28)
In equation Eq. (27), it is understood that 𝑐 and 𝑔 are each functions of the Markov state 𝑠
In addition, the time 𝑡 = 0 budget constraint is satisfied at 𝑐0 and initial government debt 𝑏0 :
$$b_0 + g_0 = \tau_0(c_0 + g_0) + \frac{\bar{b}}{R_0} \tag{29}$$
where 𝑅0 is the gross interest rate for the Markov state 𝑠0 that is assumed to prevail at time
𝑡 = 0 and 𝜏0 is the time 𝑡 = 0 tax rate
In equation Eq. (29), it is understood that
$$\tau_0 = 1 - \frac{u_{l,0}}{u_{c,0}}$$

and

$$R_0^{-1} = \beta \sum_{s=1}^{S} \Pi(s|s_0) \frac{u_c(s)}{u_{c,0}}$$
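Given marginal utilities, these two formulas are straightforward to evaluate; the numbers below are made up:

```python
import numpy as np

# Made-up time-0 marginal utilities and transition probabilities
u_c0, u_l0 = 4.2, 2.1
β = 1 / 1.05
Π_row = np.array([0.6, 0.4])      # Π(s | s₀) across S = 2 states
u_c = np.array([4.0, 3.5])        # u_c(s) by state

τ0 = 1 - u_l0 / u_c0
R0 = u_c0 / (β * Π_row @ u_c)     # gross rate: reciprocal of the bond price
```
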
class SequentialAllocation:
    '''
    Class that takes a CESutility or BGPutility object as input and
    returns the planner's allocation as a function of the multiplier
    on the implementability constraint μ.
    '''

    def find_first_best(self):
        '''
        Find the first-best allocation
        '''
        model = self.model
        S, Θ, G = self.S, self.Θ, self.G
        Uc, Un = model.Uc, model.Un

        def res(z):
            c = z[:S]
            n = z[S:]
            return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])

        # Solve the first-best first-order conditions for c and n
        res = root(res, 0.5 * np.ones(2 * S))
        if not res.success:
            raise Exception('Could not find first best')
        self.cFB = res.x[:S]
        self.nFB = res.x[S:]
    def time1_allocation(self, μ):
        '''
        Computes optimal allocation for time t >= 1 for a given μ
        '''
        model = self.model
        S, Θ, G = self.S, self.Θ, self.G
        Uc, Ucc, Un, Unn = model.Uc, model.Ucc, model.Un, model.Unn

        def FOC(z):
            c = z[:S]
            n = z[S:2 * S]
            Ξ = z[2 * S:]
            return np.hstack([Uc(c, n) - μ * (Ucc(c, n) * c + Uc(c, n)) - Ξ,      # FOC of c
                              Un(c, n) - μ * (Unn(c, n) * n + Un(c, n)) + Θ * Ξ,  # FOC of n
                              Θ * n - c - G])                                     # feasibility

        # Find the root of the first-order conditions
        res = root(FOC, self.zFB)
        if not res.success:
            raise Exception('Could not find LS allocation.')
        z = res.x
        c, n, Ξ = z[:S], z[S:2 * S], z[2 * S:]

        # Compute x
        I = Uc(c, n) * c + Un(c, n) * n
        x = np.linalg.solve(np.eye(S) - self.β * self.π, I)

        return c, n, x, Ξ
# Find root
res = root(FOC, np.array(
[0, self.cFB[s_0], self.nFB[s_0], self.ΞFB[s_0]]))
if not res.success:
raise Exception('Could not find time 0 LS allocation.')
return res.x
if sHist is None:
sHist = self.mc.simulate(T, s_0)
# Time 0
μ, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
Bhist[0] = B_
μHist[0] = μ
# Time 1 onward
for t in range(1, T):
c, n, x, Ξ = self.time1_allocation(μ)
Τ = self.Τ(c, n)
u_c = Uc(c, n)
s = sHist[t]
Eu_c = π[sHist[t - 1]] @ u_c
cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x[s] / \
u_c[s], Τ[s]
RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (β * Eu_c)
μHist[t] = μ
79.4 Recursive Formulation of the Ramsey Problem

The variable $x_t(s^t) = u_c(s^t)\, b_t(s_t|s^{t-1})$ in equation Eq. (21) appears to be a purely "forward-looking" variable
But $x_t(s^t)$ is also a natural candidate for a state variable in a recursive formulation of the
Ramsey problem
To express a Ramsey plan recursively, we imagine that a time 0 Ramsey planner is followed
by a sequence of continuation Ramsey planners at times 𝑡 = 1, 2, …
A “continuation Ramsey planner” has a different objective function and faces different con-
straints than a Ramsey planner
A key step in representing a Ramsey plan recursively is to regard the marginal utility scaled
government debts 𝑥𝑡 (𝑠𝑡 ) = 𝑢𝑐 (𝑠𝑡 )𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) as predetermined quantities that continuation
Ramsey planners at times 𝑡 ≥ 1 are obligated to attain
Continuation Ramsey planners do this by choosing continuation policies that induce the rep-
resentative household to make choices that imply that 𝑢𝑐 (𝑠𝑡 )𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) = 𝑥𝑡 (𝑠𝑡 )
A time 𝑡 ≥ 1 continuation Ramsey planner delivers 𝑥𝑡 by choosing a suitable 𝑛𝑡 , 𝑐𝑡 pair and
a list of 𝑠𝑡+1 -contingent continuation quantities 𝑥𝑡+1 to bequeath to a time 𝑡 + 1 continuation
Ramsey planner
A time 𝑡 ≥ 1 continuation Ramsey planner faces 𝑥𝑡 , 𝑠𝑡 as state variables
But the time 0 Ramsey planner faces 𝑏0 , not 𝑥0 , as a state variable
Furthermore, the Ramsey planner cares about (𝑐0 (𝑠0 ), ℓ0 (𝑠0 )), while continuation Ramsey
planners do not
The time 0 Ramsey planner hands 𝑥1 as a function of 𝑠1 to a time 1 continuation Ramsey
planner
These lines of delegated authorities and responsibilities across time express the continuation
Ramsey planners’ obligations to implement their parts of the original Ramsey plan, designed
once-and-for-all at time 0
After 𝑠𝑡 has been realized at time 𝑡 ≥ 1, the state variables confronting the time 𝑡 continua-
tion Ramsey planner are (𝑥𝑡 , 𝑠𝑡 )
We work backward by presenting a Bellman equation for 𝑉 (𝑥, 𝑠) first, then a Bellman equa-
tion for 𝑊 (𝑏, 𝑠)
where maximization over 𝑛 and the 𝑆 elements of 𝑥′ (𝑠′ ) is subject to the single imple-
mentability constraint for 𝑡 ≥ 1
$$\begin{aligned}
n_t &= f(x_t, s_t), \quad t \geq 1 \\
x_{t+1}(s_{t+1}) &= h(s_{t+1}; x_t, s_t), \quad s_{t+1} \in S,\ t \geq 1
\end{aligned} \tag{32}$$
where maximization over 𝑛0 and the 𝑆 elements of 𝑥′ (𝑠1 ) is subject to the time 0 imple-
mentability constraint
Associated with a value function $W(b_0, s_0)$ that solves Bellman equation Eq. (33) are $S + 1$
time 0 policy functions
$$\begin{aligned}
n_0 &= f_0(b_0, s_0) \\
x_1(s_1) &= h_0(s_1; b_0, s_0)
\end{aligned} \tag{35}$$
Notice the appearance of state variables (𝑏0 , 𝑠0 ) in the time 0 policy functions for the Ramsey
planner as compared to (𝑥𝑡 , 𝑠𝑡 ) in the policy functions Eq. (32) for the time 𝑡 ≥ 1 continua-
tion Ramsey planners
The value function $V(x_t, s_t)$ of the time $t$ continuation Ramsey planner equals
$E_t \sum_{\tau=t}^{\infty} \beta^{\tau-t} u(c_\tau, l_\tau)$, where the consumption and leisure processes are evaluated along the
original time 0 Ramsey plan
Attach a Lagrange multiplier $\Phi_1(x, s)$ to constraint Eq. (31) and a Lagrange multiplier $\Phi_0$ to
the time 0 implementability constraint Eq. (34)
Time 𝑡 ≥ 1: the first-order conditions for the time 𝑡 ≥ 1 constrained maximization problem on
the right side of the continuation Ramsey planner’s Bellman equation Eq. (30) are
for 𝑛
Given Φ1 , equation Eq. (37) is one equation to be solved for 𝑛 as a function of 𝑠 (or of 𝑔(𝑠))
Equation Eq. (36) implies 𝑉𝑥 (𝑥′ , 𝑠′ ) = Φ1 , while an envelope condition is 𝑉𝑥 (𝑥, 𝑠) = Φ1 , so it
follows that
Time 𝑡 = 0: For the time 0 problem on the right side of the Ramsey planner’s Bellman equa-
tion Eq. (33), first-order conditions are
𝑉𝑥 (𝑥(𝑠1 ), 𝑠1 ) = Φ0 (39)
Notice similarities and differences between the first-order conditions for 𝑡 ≥ 1 and for 𝑡 = 0
An additional term is present in Eq. (40) except in three special cases
• 𝑏0 = 0, or
• 𝑢𝑐 is constant (i.e., preferences are quasi-linear in consumption), or
• initial government assets are sufficiently large to finance all government purchases with
interest earnings from those assets so that Φ0 = 0
Except in these special cases, the allocation and the labor tax rate as functions of 𝑠𝑡 differ
between dates 𝑡 = 0 and subsequent dates 𝑡 ≥ 1
Naturally, the first-order conditions in this recursive formulation of the Ramsey problem
agree with the first-order conditions derived when we first formulated the Ramsey plan in the
space of sequences
Equations Eq. (39) and Eq. (40) imply that Φ0 = Φ1 and that
𝑉𝑥 (𝑥𝑡 , 𝑠𝑡 ) = Φ0 (41)
for all 𝑡 ≥ 1
When 𝑉 is concave in 𝑥, this implies state-variable degeneracy along a Ramsey plan in the
sense that for 𝑡 ≥ 1, 𝑥𝑡 will be a time-invariant function of 𝑠𝑡
Given Φ0 , this function mapping 𝑠𝑡 into 𝑥𝑡 can be expressed as a vector 𝑥⃗ that solves equa-
tion Eq. (34) for 𝑛 and 𝑐 as functions of 𝑔 that are associated with Φ = Φ0
While the marginal utility adjusted level of government debt 𝑥𝑡 is a key state variable for the
continuation Ramsey planners at 𝑡 ≥ 1, it is not a state variable at time 0
The time 0 Ramsey planner faces 𝑏0 , not 𝑥0 = 𝑢𝑐,0 𝑏0 , as a state variable
The discrepancy in state variables faced by the time 0 Ramsey planner and the time 𝑡 ≥ 1
continuation Ramsey planners captures the differing obligations and incentives faced by the
time 0 Ramsey planner and the time 𝑡 ≥ 1 continuation Ramsey planners
• The time 0 Ramsey planner is obligated to honor government debt 𝑏0 measured in time
0 consumption goods
• The time 0 Ramsey planner can manipulate the value of government debt as measured
by 𝑢𝑐,0 𝑏0
• In contrast, time 𝑡 ≥ 1 continuation Ramsey planners are obligated not to alter values
of debt, as measured by 𝑢𝑐,𝑡 𝑏𝑡 , that they inherit from a preceding Ramsey planner or
continuation Ramsey planner
This in turn means that prices of one-period Arrow securities 𝑝𝑡+1 (𝑠𝑡+1 |𝑠𝑡 ) = 𝑝(𝑠𝑡+1 |𝑠𝑡 ) will
be the same time-invariant functions of (𝑠𝑡+1 , 𝑠𝑡 ) for 𝑡 ≥ 1, but a different function 𝑝0 (𝑠1 |𝑠0 )
for 𝑡 = 0, except when 𝑏0 = 0
The differences between these time 0 and time 𝑡 ≥ 1 objects reflect the Ramsey planner’s
incentive to manipulate Arrow security prices and, through them, the value of initial govern-
ment debt 𝑏0
class RecursiveAllocation:
'''
Compute the planner's allocation by solving Bellman
equation.
'''
def solve_time1_bellman(self):
'''
Solve the time 1 Bellman equation for calibration model and initial grid μgrid0
'''
model, μgrid0 = self.model, self.μgrid
S = len(model.π)
# Create xgrid
xbar = [x.min(0).max(), x.max(0).min()]
xgrid = np.linspace(xbar[0], xbar[1], len(μgrid0))
self.xgrid = xgrid
for s in range(S):
diff = max(diff, np.abs(
(Vf[s](xgrid) - Vfnew[s](xgrid)) / Vf[s](xgrid)).max())
Vf = Vfnew
if sHist is None:
sHist = self.mc.simulate(T, s_0)
# Time 0
cHist[0], nHist[0], xprime = self.time0_allocation(B_, s_0)
ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
Bhist[0] = B_
μHist[0] = 0
# Time 1 onward
for t in range(1, T):
s, x = sHist[t], xprime[sHist[t]]
Τ = self.Τ(c, n)[s]
u_c = Uc(c, n)
Eu_c = π[sHist[t - 1]] @ u_c
μHist[t] = self.Vf[s](x, 1)
class BellmanEquation:
'''
Bellman equation for the continuation of the Lucas-Stokey Problem
'''
self.z0 = {}
cf, nf, xprimef = policies0
for s in range(self.S):
for x in xgrid:
xprime0 = np.empty(self.S)
for sprime in range(self.S):
xprime0[sprime] = xprimef[s, sprime](x)
self.z0[x, s] = np.hstack([cf[s](x), nf[s](x), xprime0])
self.find_first_best()
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, Uc, Un, G = self.S, self.Θ, model.Uc, model.Un, self.G
def res(z):
c = z[:S]
n = z[S:]
return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
res = root(res, 0.5 * np.ones(2 * S))
if not res.success:
    raise Exception('Could not find first best')
self.cFB = res.x[:S]
self.nFB = res.x[S:]
IFB = Uc(self.cFB, self.nFB) * self.cFB + Un(self.cFB, self.nFB) * self.nFB
self.xFB = np.linalg.solve(np.eye(S) - self.β * self.π, IFB)
self.zFB = {}
for s in range(S):
self.zFB[s] = np.hstack([self.cFB[s], self.nFB[s], self.xFB])
'''
Given the continuation value function for next period, return this
period's value function T(V) and the optimal policies
'''
if not self.time_0:
def PF(x, s): return self.get_policies_time1(x, s, Vf)
else:
def PF(B_, s0): return self.get_policies_time0(B_, s0, Vf)
return PF
def objf(z):
c, n, xprime = z[0], z[1], z[2:]
Vprime = np.empty(S)
for sprime in range(S):
Vprime[sprime] = Vf[sprime](xprime[sprime])
def cons(z):
c, n, xprime = z[0], z[1], z[2:]
return np.hstack([x - Uc(c, n) * c - Un(c, n) * n - β * π[s] @ xprime,
(Θ * n - c - G)[s]])
if imode > 0:
raise Exception(smode)
self.z0[x, s] = out
return np.hstack([-fx, out])
def objf(z):
c, n, xprime = z[0], z[1], z[2:]
Vprime = np.empty(S)
for sprime in range(S):
Vprime[sprime] = Vf[sprime](xprime[sprime])
def cons(z):
c, n, xprime = z[0], z[1], z[2:]
return np.hstack([-Uc(c, n) * (c - B_) - Un(c, n) * n - β * π[s0] @ xprime,
(Θ * n - c - G)[s0]])
if imode > 0:
raise Exception(smode)
79.5 Examples
This example illustrates in a simple setting how a Ramsey planner manages risk
Government expenditures are known for sure in all periods except one
We define the components of the state vector as the following six (𝑡, 𝑔) pairs:
(0, 𝑔𝑙 ), (1, 𝑔𝑙 ), (2, 𝑔𝑙 ), (3, 𝑔𝑙 ), (3, 𝑔ℎ ), (𝑡 ≥ 4, 𝑔𝑙 )
We think of these 6 states as corresponding to 𝑠 = 1, 2, 3, 4, 5, 6
The transition matrix is
0 1 0 0 0 0
⎛
⎜0 0 1 0 0 0⎞⎟
⎜
⎜ ⎟
0 0 0 0.5 0.5 0⎟
Π=⎜
⎜
⎜
⎟
⎟
⎜0 0 0 0 0 1⎟⎟
⎜
⎜0 ⎟
0 0 0 0 1⎟
⎝0 0 0 0 0 1⎠
$$g = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.2 \\ 0.1 \end{pmatrix}$$
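In code, the transition matrix and expenditure vector above are simply:

```python
import numpy as np

# Six-state chain: war can occur at t = 3 with probability 0.5
Π = np.array([[0, 1, 0,   0,   0, 0],
              [0, 0, 1,   0,   0, 0],
              [0, 0, 0, 0.5, 0.5, 0],
              [0, 0, 0,   0,   0, 1],
              [0, 0, 0,   0,   0, 1],
              [0, 0, 0,   0,   0, 1]])
g = np.array([0.1, 0.1, 0.1, 0.1, 0.2, 0.1])   # g_h = 0.2 in the war state
```
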
$$u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}$$
class CRRAutility:
def __init__(self,
β=0.9,
σ=2,
γ=2,
π=0.5*np.ones((2, 2)),
G=np.array([0.1, 0.2]),
Θ=np.ones(2),
transfers=False):
# Utility function
def U(self, c, n):
σ = self.σ
if σ == 1.:
U = np.log(c)
else:
U = (c**(1 - σ) - 1) / (1 - σ)
return U - n**(1 + self.γ) / (1 + self.γ)
sim_seq_h[4] = time_example.G[sHist_h]
# Output paths
sim_seq_l[5] = time_example.Θ[sHist_l] * sim_seq_l[1]
sim_seq_h[5] = time_example.Θ[sHist_h] * sim_seq_h[1]
plt.tight_layout()
plt.show()
Tax smoothing
• the tax rate is the same at 𝑡 = 3 for both the high 𝑔𝑡 outcome and the low 𝑔𝑡 outcome
$$R_t = \frac{u_{c,t}}{\beta E_t[u_{c,t+1}]}$$
A tax policy that makes time $t = 0$ consumption higher than time $t = 1$ consumption
evidently increases the risk-free one-period interest rate, $R_t$, at $t = 0$
Raising the time 𝑡 = 0 risk-free interest rate makes time 𝑡 = 0 consumption goods cheaper
relative to consumption goods at later dates, thereby lowering the value 𝑢𝑐,0 𝑏0 of initial gov-
ernment debt 𝑏0
We see this in a figure below that plots the time path for the risk-free interest rate under
both realizations of the time 𝑡 = 3 government expenditure shock
The following plot illustrates how the government lowers the interest rate at time 0 by raising
consumption
At time 𝑡 = 1, the government evidently saves since it has set the tax rate sufficiently high to
allow it to set 𝑏2 < 𝑏1
At time 𝑡 = 2 the government trades state-contingent Arrow securities to hedge against war
at 𝑡 = 3
At times $t \geq 4$ the government rolls over its debt, knowing that the tax rate is set at the level
required to service the interest payments on the debt and government expenditures
We have seen that when 𝑏0 > 0, the Ramsey plan sets the time 𝑡 = 0 tax rate partly with an
eye toward raising a risk-free interest rate for one-period loans between times 𝑡 = 0 and 𝑡 = 1
By raising this interest rate, the plan makes time 𝑡 = 0 goods cheap relative to consumption
goods at later times
By doing this, it lowers the value of time 𝑡 = 0 debt that it has inherited and must finance
In the preceding example, the Ramsey tax rate at time 0 differs from its value at time 1
To explore what is going on here, let’s simplify things by removing the possibility of war at
time 𝑡 = 3
The Ramsey problem then includes no randomness because 𝑔𝑡 = 𝑔𝑙 for all 𝑡
The figure below plots the Ramsey tax rates and gross interest rates at time 𝑡 = 0 and time
𝑡 ≥ 1 as functions of the initial government debt (using the sequential allocation solution and
a CRRA utility function defined above)
n = 100
tax_policy = np.empty((n, 2))
interest_rate = np.empty((n, 2))
gov_debt = np.linspace(-1.5, 1, n)
for i in range(n):
tax_policy[i] = tax_sequence.simulate(gov_debt[i], 0, 2)[3]
interest_rate[i] = tax_sequence.simulate(gov_debt[i], 0, 3)[-1]
fig.tight_layout()
plt.show()
The figure indicates that if the government enters with positive debt, it sets a tax rate at 𝑡 =
0 that is less than all later tax rates
By setting a lower tax rate at 𝑡 = 0, the government raises consumption, which reduces the
value 𝑢𝑐,0 𝑏0 of its initial debt
It does this by increasing 𝑐0 and thereby lowering 𝑢𝑐,0
Conversely, if 𝑏0 < 0, the Ramsey planner sets the tax rate at 𝑡 = 0 higher than in subsequent
periods
A side effect of lowering time 𝑡 = 0 consumption is that it raises the one-period interest rate
at time 0 above that of subsequent periods
There are only two values of initial government debt at which the tax rate is constant for all
𝑡≥0
The first is 𝑏0 = 0
• Here the government can’t use the 𝑡 = 0 tax rate to alter the value of the
initial debt
The second occurs when the government enters with sufficiently large assets that the Ramsey
planner can achieve first best and sets 𝜏𝑡 = 0 for all 𝑡
It is only for these two values of initial government debt that the Ramsey plan is time-
consistent
Another way of saying this is that, except for these two values of initial government debt, a
continuation of a Ramsey plan is not a Ramsey plan
To illustrate this, consider a Ramsey planner who starts with an initial government debt 𝑏1
associated with one of the Ramsey plans computed above
Call $\tau_1^R$ the time $t = 0$ tax rate chosen by the Ramsey planner confronting this value for
initial government debt
The figure below shows both the tax rate at time 1 chosen by our original Ramsey planner
and what a new Ramsey planner would choose for its time 𝑡 = 0 tax rate
n = 100
tax_policy = np.empty((n, 2))
τ_reset = np.empty((n, 2))
gov_debt = np.linspace(-1.5, 1, n)
for i in range(n):
tax_policy[i] = tax_sequence.simulate(gov_debt[i], 0, 2)[3]
τ_reset[i] = tax_sequence.simulate(gov_debt[i], 0, 1)[3]
fig.tight_layout()
plt.show()
The tax rates in the figure are equal for only two values of initial government debt
The complete tax smoothing for $t \geq 1$ in the preceding example is a consequence of our
having assumed CRRA preferences
To see what is driving this outcome, we begin by noting that the Ramsey tax rate for 𝑡 ≥ 1
is a time-invariant function 𝜏 (Φ, 𝑔) of the Lagrange multiplier on the implementability con-
straint and government expenditures
For CRRA preferences, we can exploit the relations $U_{cc}\, c = -\sigma U_c$ and $U_{nn}\, n = \gamma U_n$ to derive

$$\frac{(1 + (1-\sigma)\Phi)\, U_c}{(1 + (1-\gamma)\Phi)\, U_n} = 1$$
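The two CRRA relations used above are easy to verify numerically by finite differences at an arbitrary point:

```python
σ, γ = 2.0, 2.0          # illustrative parameter values
c, n, h = 0.5, 0.6, 1e-6

Uc = c**(-σ)
Un = -n**γ               # marginal disutility of labor
Ucc = ((c + h)**(-σ) - (c - h)**(-σ)) / (2 * h)
Unn = (-(n + h)**γ + (n - h)**γ) / (2 * h)

# Check U_cc·c = -σ·U_c and U_nn·n = γ·U_n
print(Ucc * c + σ * Uc, Unn * n - γ * Un)
```
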
class LogUtility:

    def __init__(self,
β=0.9,
ψ=0.69,
π=0.5*np.ones((2, 2)),
G=np.array([0.1, 0.2]),
Θ=np.ones(2),
transfers=False):
# Utility function
def U(self, c, n):
return np.log(c) + self.ψ * np.log(1 - n)
Also, suppose that 𝑔𝑡 follows a two-state IID process with equal probabilities attached to 𝑔𝑙
and 𝑔ℎ
To compute the tax rate, we will use both the sequential and recursive approaches described
above
The figure below plots a sample path of the Ramsey tax rate
T = 20
sHist = np.array([0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 0, 1,
1, 1, 1, 1, 1, 0])
# Simulate
sim_seq = seq_log.simulate(0.5, 0, T, sHist)
sim_bel = bel_log.simulate(0.5, 0, T, sHist)
# Output paths
sim_seq[5] = log_example.Θ[sHist] * sim_seq[1]
sim_bel[5] = log_example.Θ[sHist] * sim_bel[1]
axes.flatten()[0].legend(('Sequential', 'Recursive'))
fig.tight_layout()
plt.show()
As should be expected, the recursive and sequential solutions produce almost identical alloca-
tions
Unlike outcomes with CRRA preferences, the tax rate is not perfectly smoothed
Instead, the government raises the tax rate when 𝑔𝑡 is high
A related lecture describes an extension of the Lucas-Stokey model by Aiyagari, Marcet, Sar-
gent, and Seppälä (2002) [5]
In the AMSS economy, only a risk-free bond is traded
That lecture compares the recursive representation of the Lucas-Stokey model presented in
this lecture with one for an AMSS economy
By comparing these recursive formulations, we shall glean a sense in which the dimension of
the state is lower in the Lucas Stokey model
Accompanying that difference in dimension will be different dynamics of government debt
80 Optimal Taxation without State-Contingent Debt
80.1 Contents
• Overview 80.2
• Examples 80.5
In addition to what’s in Anaconda, this lecture will need the following libraries
80.2 Overview
In an earlier lecture, we described a model of optimal taxation with state-contingent debt due
to Robert E. Lucas, Jr., and Nancy Stokey [90]
Aiyagari, Marcet, Sargent, and Seppälä [5] (hereafter, AMSS) studied optimal taxation in a
model without state-contingent debt
In this lecture, we
Many but not all features of the economy are identical to those of the Lucas-Stokey economy
Let’s start with things that are identical
For $t \geq 0$, a history of the state is represented by $s^t = [s_t, s_{t-1}, \ldots, s_0]$
Government purchases 𝑔(𝑠) are an exact time-invariant function of 𝑠
Let 𝑐𝑡 (𝑠𝑡 ), ℓ𝑡 (𝑠𝑡 ), and 𝑛𝑡 (𝑠𝑡 ) denote consumption, leisure, and labor supply, respectively, at
history 𝑠𝑡 at time 𝑡
Each period a representative household is endowed with one unit of time that can be divided
between leisure $\ell_t$ and labor $n_t$:

$$n_t(s^t) + \ell_t(s^t) = 1 \tag{1}$$

Output equals $n_t(s^t)$ and can be divided between consumption $c_t(s^t)$ and $g(s_t)$

$$c_t(s^t) + g(s_t) = n_t(s^t) \tag{2}$$
A representative household ranks consumption, leisure plans according to

$$\sum_{t=0}^{\infty} \sum_{s^t} \beta^t \pi_t(s^t)\, u[c_t(s^t), \ell_t(s^t)] \tag{3}$$

where the utility function $u$ is increasing, strictly concave, and three times continuously differentiable in both arguments
The government imposes a flat rate tax 𝜏𝑡 (𝑠𝑡 ) on labor income at time 𝑡, history 𝑠𝑡
Lucas and Stokey assumed that there are complete markets in one-period Arrow securities;
also see smoothing models
It is at this point that AMSS [5] modify the Lucas and Stokey economy
AMSS allow the government to issue only one-period risk-free debt each period
Ruling out complete markets in this way is a step in the direction of making total tax collections behave more like those prescribed in [11] than they do in [90]
Let

• $b_{t+1}(s^t)$ be the amount of the time $t + 1$ consumption good that at time $t$ the government promised to pay
• 𝑅𝑡 (𝑠𝑡 ) be the gross interest rate on risk-free one-period debt between periods 𝑡 and 𝑡 + 1
• 𝑇𝑡 (𝑠𝑡 ) be a non-negative lump-sum transfer to the representative household [1]
That 𝑏𝑡+1 (𝑠𝑡 ) is the same for all realizations of 𝑠𝑡+1 captures its risk-free character
The market value at time 𝑡 of government debt maturing at time 𝑡 + 1 equals 𝑏𝑡+1 (𝑠𝑡 ) divided
by 𝑅𝑡 (𝑠𝑡 )
The government’s budget constraint in period 𝑡 at history 𝑠𝑡 is
$$b_t(s^{t-1}) = \tau_t^n(s^t)\, n_t(s^t) - g_t(s^t) - T_t(s^t) + \frac{b_{t+1}(s^t)}{R_t(s^t)} \equiv z(s^t) + \frac{b_{t+1}(s^t)}{R_t(s^t)} \tag{4}$$

where $z(s^t)$ is the net-of-interest government surplus
The household's Euler equation for the risk-free bond implies

$$\frac{1}{R_t(s^t)} = \sum_{s_{t+1}|s^t} \beta\, \pi_{t+1}(s^{t+1}|s^t) \frac{u_c(s^{t+1})}{u_c(s^t)}$$
Substituting this expression into the government’s budget constraint Eq. (4) yields:
$$b_t(s^{t-1}) = z(s^t) + \beta \sum_{s_{t+1}|s^t} \pi_{t+1}(s^{t+1}|s^t) \frac{u_c(s^{t+1})}{u_c(s^t)}\, b_{t+1}(s^t) \tag{5}$$
Components of $z(s^t)$ on the right side depend on $s^t$, but the left side is required to depend only on $s^{t-1}$

This is what it means for one-period government debt to be risk-free

Therefore, the sum on the right side of Eq. (5) also has to depend only on $s^{t-1}$
This requirement will give rise to measurability constraints on the Ramsey allocation to
be discussed soon
If we replace $b_{t+1}(s^t)$ on the right side of Eq. (5) by the right side of next period's budget constraint (associated with a particular realization $s_{t+1}$) we get
After making similar repeated substitutions for all future occurrences of government indebtedness, and by invoking the natural debt limit, we arrive at:
$$b_t(s^{t-1}) = \sum_{j=0}^{\infty} \sum_{s^{t+j}|s^t} \beta^j\, \pi_{t+j}(s^{t+j}|s^t)\, \frac{u_c(s^{t+j})}{u_c(s^t)}\, z(s^{t+j}) \tag{6}$$
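For a Markov economy, this conditional expected discounted sum can be evaluated exactly by solving a small linear system, the same trick used in the lecture's own code (`np.linalg.solve(np.eye(S) - β * π, ...)`); the chain, consumption, and surplus values below are made-up illustration inputs:

```python
import numpy as np

β, σ = 0.9, 2.0
π = np.array([[0.9, 0.1],
              [0.5, 0.5]])
c = np.array([0.60, 0.50])      # hypothetical consumption by Markov state
z = np.array([0.05, -0.05])     # hypothetical net-of-interest surplus by state
u_c = c ** (-σ)

# w(s) = E[ Σ_j β**j u_c(s_j) z(s_j) | s_0 = s ]  solves  w = u_c*z + β π w
w = np.linalg.solve(np.eye(2) - β * π, u_c * z)
b = w / u_c                     # value of debt supportable in each state
print(b)
```

The two components of `b` generally differ, which is exactly why risk-free (state-by-state constant) debt cannot support this present value in all states at once.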
Now let’s
• substitute the resource constraint into the net-of-interest government surplus, and
• use the household’s first-order condition 1 − 𝜏𝑡𝑛 (𝑠𝑡 ) = 𝑢ℓ (𝑠𝑡 )/𝑢𝑐 (𝑠𝑡 ) to eliminate the
labor tax rate
$$z(s^t) = \left[1 - \frac{u_\ell(s^t)}{u_c(s^t)}\right] [c_t(s^t) + g_t(s^t)] - g_t(s^t) - T_t(s^t) \tag{7}$$
If we substitute the appropriate versions of the right side of Eq. (7) for 𝑧(𝑠𝑡+𝑗 ) into equation
Eq. (6), we obtain a sequence of implementability constraints on a Ramsey allocation in an
AMSS economy
Expression Eq. (6) at time 𝑡 = 0 and initial state 𝑠0 was also an implementability constraint
on a Ramsey allocation in a Lucas-Stokey economy:
$$b_0(s^{-1}) = \mathbb{E}_0 \sum_{j=0}^{\infty} \beta^j\, \frac{u_c(s^j)}{u_c(s^0)}\, z(s^j) \tag{8}$$
The analogous expression at time $t$, history $s^t$, is

$$b_t(s^{t-1}) = \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j\, \frac{u_c(s^{t+j})}{u_c(s^t)}\, z(s^{t+j}) \tag{9}$$
The expression on the right side of Eq. (9) in the Lucas-Stokey (1983) economy would equal
the present value of a continuation stream of government surpluses evaluated at what would
be competitive equilibrium Arrow-Debreu prices at date 𝑡
In the Lucas-Stokey economy, that present value is measurable with respect to 𝑠𝑡
In the AMSS economy, the restriction that government debt be risk-free imposes that that
same present value must be measurable with respect to 𝑠𝑡−1
In a language used in the literature on incomplete markets models, it can be said that the
AMSS model requires that at each (𝑡, 𝑠𝑡 ) what would be the present value of continuation
government surpluses in the Lucas-Stokey model must belong to the marketable subspace
of the AMSS model
After we have substituted the resource constraint into the utility function, we can express the
Ramsey problem as being to choose an allocation that solves
$$\max_{\{c_t(s^t),\, b_{t+1}(s^t)\}} \mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t\, u\big(c_t(s^t),\, 1 - c_t(s^t) - g_t(s^t)\big)$$

subject to
$$\mathbb{E}_0 \sum_{j=0}^{\infty} \beta^j\, \frac{u_c(s^j)}{u_c(s^0)}\, z(s^j) \geq b_0(s^{-1}) \tag{10}$$
and
$$\mathbb{E}_t \sum_{j=0}^{\infty} \beta^j\, \frac{u_c(s^{t+j})}{u_c(s^t)}\, z(s^{t+j}) = b_t(s^{t-1}) \quad \forall\, s^t \tag{11}$$
given 𝑏0 (𝑠−1 )
Lagrangian Formulation
Let 𝛾0 (𝑠0 ) be a non-negative Lagrange multiplier on constraint Eq. (10)
As in the Lucas-Stokey economy, this multiplier is strictly positive when the government must
resort to distortionary taxation; otherwise it equals zero
A consequence of the assumption that there are no markets in state-contingent securities and that a market exists only in a risk-free security is that we have to attach a stochastic process $\{\gamma_t(s^t)\}_{t=1}^{\infty}$ of Lagrange multipliers to the implementability constraints Eq. (11)
Depending on how the constraints bind, these multipliers can be positive or negative:
A negative multiplier 𝛾𝑡 (𝑠𝑡 ) < 0 means that if we could relax constraint Eq. (11), we would
like to increase the beginning-of-period indebtedness for that particular realization of history
𝑠𝑡
That would let us reduce the beginning-of-period indebtedness for some other history [2]
These features flow from the fact that the government cannot use state-contingent debt and
therefore cannot allocate its indebtedness efficiently across future states
The Lagrangian for the Ramsey problem is

$$\begin{aligned}
J &= \mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t \Big\{ u\big(c_t(s^t), 1 - c_t(s^t) - g_t(s^t)\big) + \gamma_t(s^t) \Big[ \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j u_c(s^{t+j})\, z(s^{t+j}) - u_c(s^t)\, b_t(s^{t-1}) \Big] \Big\} \\
&= \mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t \Big\{ u\big(c_t(s^t), 1 - c_t(s^t) - g_t(s^t)\big) + \Psi_t(s^t)\, u_c(s^t)\, z(s^t) - \gamma_t(s^t)\, u_c(s^t)\, b_t(s^{t-1}) \Big\}
\end{aligned} \tag{12}$$

where

$$\Psi_t(s^t) = \Psi_{t-1}(s^{t-1}) + \gamma_t(s^t) \quad \text{and} \quad \Psi_{-1}(s^{-1}) = 0 \tag{13}$$
In Eq. (12), the second equality uses the law of iterated expectations and Abel's summation formula (also called summation by parts)
First-order conditions with respect to 𝑐𝑡 (𝑠𝑡 ) can be expressed as
$$u_c(s^t) - u_\ell(s^t) + \Psi_t(s^t) \left\{ [u_{cc}(s^t) - u_{c\ell}(s^t)]\, z(s^t) + u_c(s^t)\, z_c(s^t) \right\} - \gamma_t(s^t)\, [u_{cc}(s^t) - u_{c\ell}(s^t)]\, b_t(s^{t-1}) = 0 \tag{14}$$
If we substitute 𝑧(𝑠𝑡 ) from Eq. (7) and its derivative 𝑧𝑐 (𝑠𝑡 ) into the first-order condition
Eq. (14), we find two differences from the corresponding condition for the optimal allocation
in a Lucas-Stokey economy with state-contingent government debt
1. The term involving 𝑏𝑡 (𝑠𝑡−1 ) in the first-order condition Eq. (14) does not appear in the
corresponding expression for the Lucas-Stokey economy
2. The Lagrange multiplier $\Psi_t(s^t)$ in the first-order condition Eq. (14) may change over time in response to realizations of the state, while the multiplier $\Phi$ in the Lucas-Stokey economy is time-invariant
We need some code from an earlier lecture on optimal taxation with state-contingent debt: the sequential allocation implementation:
class SequentialAllocation:
'''
Class that takes CESutility or BGPutility object as input returns
planner's allocation as a function of the multiplier on the
implementability constraint μ.
'''
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, G = self.S, self.Θ, self.G
Uc, Un = model.Uc, model.Un
def res(z):
c = z[:S]
n = z[S:]
return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
        res = root(res, 0.5 * np.ones(2 * S))
        if not res.success:
raise Exception('Could not find first best')
self.cFB = res.x[:S]
self.nFB = res.x[S:]
def FOC(z):
c = z[:S]
n = z[S:2 * S]
Ξ = z[2 * S:]
return np.hstack([Uc(c, n) - μ * (Ucc(c, n) * c + Uc(c, n)) - Ξ, # FOC of c
Un(c, n) - μ * (Unn(c, n) * n + Un(c, n)) + \
Θ * Ξ, # FOC of n
Θ * n - c - G])
# Compute x
I = Uc(c, n) * c + Un(c, n) * n
x = np.linalg.solve(np.eye(S) - self.β * self.π, I)
return c, n, x, Ξ
# Find root
res = root(FOC, np.array(
[0, self.cFB[s_0], self.nFB[s_0], self.ΞFB[s_0]]))
if not res.success:
raise Exception('Could not find time 0 LS allocation.')
return res.x
if sHist is None:
sHist = self.mc.simulate(T, s_0)
# Time 0
μ, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
Bhist[0] = B_
μHist[0] = μ
# Time 1 onward
for t in range(1, T):
c, n, x, Ξ = self.time1_allocation(μ)
Τ = self.Τ(c, n)
u_c = Uc(c, n)
s = sHist[t]
Eu_c = π[sHist[t - 1]] @ u_c
cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x[s] / \
u_c[s], Τ[s]
RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (β * Eu_c)
μHist[t] = μ
80.4 Recursive Version of AMSS Model

To analyze the AMSS model, we find it useful to adopt a recursive formulation using techniques like those in our lectures on dynamic Stackelberg models and optimal taxation with state-contingent debt
In essence, the AMSS model:

• leaves intact the single implementability constraint on allocations Eq. (8) from the Lucas-Stokey economy, but
• adds measurability constraints Eq. (6) on functions of tails of allocations at each time and history
We now explore how these constraints alter Bellman equations for a time 0 Ramsey planner
and for time 𝑡 ≥ 1, history 𝑠𝑡 continuation Ramsey planners
where 𝑅𝑡 (𝑠𝑡 ) is the gross risk-free rate of interest between 𝑡 and 𝑡 + 1 at history 𝑠𝑡 and 𝑇𝑡 (𝑠𝑡 )
are non-negative transfers
Throughout this lecture, we shall set transfers to zero (for some issues about the limiting
behavior of debt, this makes a possibly important difference from AMSS [5], who restricted
transfers to be non-negative)
In this case, the household faces a sequence of budget constraints

$$b_t(s^{t-1}) + (1 - \tau_t(s^t))\, n_t(s^t) = c_t(s^t) + \frac{b_{t+1}(s^t)}{R_t(s^t)} \tag{16}$$
The household’s first-order conditions are 𝑢𝑐,𝑡 = 𝛽𝑅𝑡 E𝑡 𝑢𝑐,𝑡+1 and (1 − 𝜏𝑡 )𝑢𝑐,𝑡 = 𝑢𝑙,𝑡
Using these to eliminate $R_t$ and $\tau_t$ from budget constraint Eq. (16) gives

$$b_t(s^{t-1}) + \frac{u_{l,t}(s^t)}{u_{c,t}(s^t)}\, n_t(s^t) = c_t(s^t) + \beta\, \frac{\mathbb{E}_t\, u_{c,t+1}}{u_{c,t}(s^t)}\, b_{t+1}(s^t) \tag{17}$$

or

$$u_{c,t}(s^t)\, b_t(s^{t-1}) + u_{l,t}(s^t)\, n_t(s^t) = u_{c,t}(s^t)\, c_t(s^t) + \beta\, (\mathbb{E}_t\, u_{c,t+1})\, b_{t+1}(s^t) \tag{18}$$
Now define

$$x_t \equiv \beta\, b_{t+1}(s^t)\, \mathbb{E}_t\, u_{c,t+1} = u_{c,t}(s^t)\, \frac{b_{t+1}(s^t)}{R_t(s^t)} \tag{19}$$

so that Eq. (18) can be written as

$$\frac{u_{c,t}\, x_{t-1}}{\beta\, \mathbb{E}_{t-1}\, u_{c,t}} = u_{c,t}\, c_t - u_{l,t}\, n_t + x_t \tag{20}$$
for 𝑡 ≥ 1
The right side of Eq. (21) expresses the time $t$ value of government debt in terms of a linear combination of terms whose individual components are measurable with respect to $s^t$

The sum of terms on the right side of Eq. (21) must equal $b_t(s^{t-1})$

That implies that it has to be measurable with respect to $s^{t-1}$

Equations Eq. (21) are the measurability constraints that the AMSS model adds to the single time 0 implementability constraint imposed in the Lucas-Stokey model
Let Π(𝑠|𝑠− ) be a Markov transition matrix whose entries tell probabilities of moving from
state 𝑠− to state 𝑠 in one period
In these terms, the measurability constraint for $t \geq 1$ becomes

$$\frac{u_c(s)\, x_-}{\beta \sum_{\tilde s} \Pi(\tilde s|s_-)\, u_c(\tilde s)} = u_c(s)\,(n(s) - g(s)) - u_l(s)\, n(s) + x(s) \tag{23}$$
A continuation Ramsey planner at 𝑡 ≥ 1 takes (𝑥𝑡−1 , 𝑠𝑡−1 ) = (𝑥− , 𝑠− ) as given and before 𝑠 is
realized chooses (𝑛𝑡 (𝑠𝑡 ), 𝑥𝑡 (𝑠𝑡 )) = (𝑛(𝑠), 𝑥(𝑠)) for 𝑠 ∈ 𝑆
The Ramsey planner takes (𝑏0 , 𝑠0 ) as given and chooses (𝑛0 , 𝑥0 ).
The value function $W(b_0, s_0)$ for the time $t = 0$ Ramsey planner satisfies the Bellman equation
Let 𝜇(𝑠|𝑠− )Π(𝑠|𝑠− ) be a Lagrange multiplier on the constraint Eq. (23) for state 𝑠
After forming an appropriate Lagrangian, we find that the continuation Ramsey planner’s
first-order condition with respect to 𝑥(𝑠) is
$$V_x(x_-, s_-) = \sum_{s} \Pi(s|s_-)\, \mu(s|s_-)\, \frac{u_c(s)}{\beta \sum_{\tilde s} \Pi(\tilde s|s_-)\, u_c(\tilde s)} \tag{27}$$
Using $\mu(s|s_-) = \beta V_x(x(s), s)$, this can be expressed as

$$V_x(x_-, s_-) = \sum_{s} \left( \Pi(s|s_-)\, \frac{u_c(s)}{\sum_{\tilde s} \Pi(\tilde s|s_-)\, u_c(\tilde s)} \right) V_x(x(s), s) \tag{28}$$

where the twisted transition probabilities are defined by

$$\check\Pi(s|s_-) \equiv \Pi(s|s_-)\, \frac{u_c(s)}{\sum_{\tilde s} \Pi(\tilde s|s_-)\, u_c(\tilde s)}$$

Exercise: Please verify that $\check\Pi(s|s_-)$ is a valid Markov transition density, i.e., that its elements are all non-negative and that for each $s_-$, the sum over $s$ equals unity
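The exercise can be checked numerically; here is a minimal sketch with made-up inputs for the transition matrix and marginal utilities:

```python
import numpy as np

Π = np.array([[0.7, 0.3],        # hypothetical transition matrix Π(s|s_-)
              [0.4, 0.6]])
u_c = np.array([2.5, 3.2])       # hypothetical marginal utilities u_c(s)

# twisted matrix: weight each entry by u_c(s), then renormalize each row
Π_twist = Π * u_c / (Π @ u_c)[:, None]
print(Π_twist.sum(axis=1))       # each row sums to one
```

The row sums equal one by construction, because the denominator $\sum_{\tilde s} \Pi(\tilde s|s_-) u_c(\tilde s)$ is exactly the row's unnormalized total.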
Along a Ramsey plan, the state variable 𝑥𝑡 = 𝑥𝑡 (𝑠𝑡 , 𝑏0 ) becomes a function of the history 𝑠𝑡
and initial government debt 𝑏0
In the Lucas-Stokey model, we found that the corresponding state variable is degenerate: the multiplier $\Phi$ is constant over time
That 𝑉𝑥 (𝑥, 𝑠) varies over time according to a twisted martingale means that there is no state-
variable degeneracy in the AMSS model
In the AMSS model, both 𝑥 and 𝑠 are needed to describe the state
This property of the AMSS model transmits a twisted martingale component to consumption,
employment, and the tax rate
When $\mu(s|s_-) = \beta V_x(x(s), s)$ converges to zero, in the limit $u_l(s) = u_c(s)$, so that $\tau(x(s), s) = 0$
Thus, in the limit, if 𝑔𝑡 is perpetually random, the government accumulates sufficient assets
to finance all expenditures from earnings on those assets, returning any excess revenues to the
household as non-negative lump-sum transfers
80.4.7 Code
class RecursiveAllocationAMSS:
def solve_time1_bellman(self):
'''
Solve the time 1 Bellman equation for calibration model and
initial grid μgrid0
'''
model, μgrid0 = self.model, self.μgrid
π = model.π
S = len(model.π)
# Create xgrid
x = np.vstack(xgrid).T
xbar = [x.min(0).max(), x.max(0).min()]
xgrid = np.linspace(xbar[0], xbar[1], len(μgrid0))
self.xgrid = xgrid
print(diff)
Vf = Vfnew
if sHist is None:
sHist = simulate_markov(π, s_0, T)
# time 1 onward
for t in range(1, T):
s_, x, s = sHist[t - 1], xHist[t - 1], sHist[t]
c, n, xprime, T = cf[s_, :](x), nf[s_, :](
x), xprimef[s_, :](x), Tf[s_, :](x)
Τ = self.Τ(c, n)[s]
u_c = Uc(c, n)
Eu_c = π[s_, :] @ u_c
μHist[t] = self.Vf[s](xprime[s])
class BellmanEquation:
'''
Bellman equation for the continuation of the Lucas-Stokey Problem
'''
self.z0 = {}
cf, nf, xprimef = policies0
for s_ in range(self.S):
for x in xgrid:
self.z0[x, s_] = np.hstack([cf[s_, :](x),
nf[s_, :](x),
xprimef[s_, :](x),
np.zeros(self.S)])
self.find_first_best()
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, Uc, Un, G = self.S, self.Θ, model.Uc, model.Un, self.G
def res(z):
c = z[:S]
n = z[S:]
return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
        res = root(res, 0.5 * np.ones(2 * S))
        self.cFB = res.x[:S]
self.nFB = res.x[S:]
IFB = Uc(self.cFB, self.nFB) * self.cFB + \
Un(self.cFB, self.nFB) * self.nFB
self.zFB = {}
for s in range(S):
self.zFB[s] = np.hstack(
[self.cFB[s], self.nFB[s], self.π[s] @ self.xFB, 0.])
def objf(z):
c, n, xprime = z[:S], z[S:2 * S], z[2 * S:3 * S]
Vprime = np.empty(S)
for s in range(S):
Vprime[s] = Vf[s](xprime[s])
def cons(z):
c, n, xprime, T = z[:S], z[S:2 * S], z[2 * S:3 * S], z[3 * S:]
u_c = Uc(c, n)
Eu_c = π[s_] @ u_c
return np.hstack([
x * u_c / Eu_c - u_c * (c - T) - Un(c, n) * n - β * xprime,
Θ * n - c - G])
if model.transfers:
if imode > 0:
raise Exception(smode)
def objf(z):
c, n, xprime = z[:-1]
def cons(z):
c, n, xprime, T = z
return np.hstack([
-Uc(c, n) * (c - B_ - T) - Un(c, n) * n - β * xprime,
(Θ * n - c - G)[s0]])
if model.transfers:
bounds = [(0., 100), (0., 100), self.xbar, (0., 100.)]
else:
bounds = [(0., 100), (0., 100), self.xbar, (0., 0.)]
out, fx, _, imode, smode = fmin_slsqp(objf, self.zFB[s0], f_eqcons=cons,
bounds=bounds, full_output=True, iprint=0)
if imode > 0:
raise Exception(smode)
80.5 Examples
class interpolate_wrapper:
def transpose(self):
self.F = self.F.transpose()
def __len__(self):
return len(self.F)
class interpolator_factory:
def fun_vstack(fun_list):
def fun_hstack(fun_list):
return sHist
In our lecture on optimal taxation with state contingent debt we studied how the government
manages uncertainty in a simple setting
As in that lecture, we assume the one-period utility function
$$u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}$$
Note
For convenience in matching our computer code, we have expressed utility as a
function of 𝑛 rather than leisure 𝑙
We consider the same government expenditure process studied in the lecture on optimal taxa-
tion with state contingent debt
Government expenditures are known for sure in all periods except one
A useful trick is to define components of the state vector as the following six $(t, g)$ pairs:

$$(0, g_l),\ (1, g_l),\ (2, g_l),\ (3, g_l),\ (3, g_h),\ (t \geq 4, g_l)$$

The transition matrix is

$$P = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.5 & 0.5 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

and the government expenditure vector is

$$g = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.2 \\ 0.1 \end{pmatrix}$$
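As a quick standalone sketch (pure NumPy; the simulation helper below is an illustrative stand-in rather than the lecture's own simulation code), we can build this chain and trace out an expenditure history:

```python
import numpy as np

P = np.array([[0, 1, 0, 0,   0,   0],
              [0, 0, 1, 0,   0,   0],
              [0, 0, 0, 0.5, 0.5, 0],
              [0, 0, 0, 0,   0,   1],
              [0, 0, 0, 0,   0,   1],
              [0, 0, 0, 0,   0,   1]])
g = np.array([0.1, 0.1, 0.1, 0.1, 0.2, 0.1])

def simulate_chain(P, s0, T, rng):
    """Draw a length-T state path from transition matrix P, starting at s0."""
    path = np.empty(T, dtype=int)
    path[0] = s0
    for t in range(1, T):
        path[t] = rng.choice(len(P), p=P[path[t - 1]])
    return path

rng = np.random.default_rng(0)
path = simulate_chain(P, 0, 8, rng)
print(path, g[path])   # war (g = 0.2) can occur only at the fourth date
```

Every simulated path passes deterministically through states 0, 1, 2, faces the single 50-50 war/peace draw, and is then absorbed in the final state.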
class CRRAutility:
def __init__(self,
β=0.9,
σ=2,
γ=2,
π=0.5*np.ones((2, 2)),
G=np.array([0.1, 0.2]),
Θ=np.ones(2),
                 transfers=False):

        self.β, self.σ, self.γ = β, σ, γ
        self.π, self.G, self.Θ, self.transfers = π, G, Θ, transfers

    # Utility function
def U(self, c, n):
σ = self.σ
if σ == 1.:
U = np.log(c)
else:
U = (c**(1 - σ) - 1) / (1 - σ)
return U - n**(1 + self.γ) / (1 + self.γ)
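Since the class body above is abbreviated, here is a standalone re-implementation of just the `U` method for a quick sanity check (parameter defaults mirror the class signature):

```python
import numpy as np

def U(c, n, σ=2, γ=2):
    """CRRA utility u(c, n) = (c**(1-σ) - 1)/(1-σ) - n**(1+γ)/(1+γ)."""
    base = np.log(c) if σ == 1. else (c**(1 - σ) - 1) / (1 - σ)
    return base - n**(1 + γ) / (1 + γ)

# utility is increasing in consumption and decreasing in labor
print(U(0.7, 0.8) > U(0.6, 0.8), U(0.7, 0.8) > U(0.7, 0.9))
```

The `σ == 1.` branch handles the log-utility limit of the CRRA consumption term.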
The following figure plots the Ramsey plan under both complete and incomplete markets for
both possible realizations of the state at time 𝑡 = 3
Optimal policies when the government has access to state contingent debt are represented by
black lines, while the optimal policies when there is only a risk-free bond are in red
Paths with circles are histories in which there is peace, while those with triangles denote war
time_example = CRRAutility()
# Output paths
sim_seq_l[5] = time_example.Θ[sHist_l] * sim_seq_l[1]
sim_seq_h[5] = time_example.Θ[sHist_h] * sim_seq_h[1]
sim_bel_l[5] = time_example.Θ[sHist_l] * sim_bel_l[1]
sim_bel_h[5] = time_example.Θ[sHist_h] * sim_bel_h[1]
plt.tight_layout()
plt.show()
(Running the solver prints a long sequence of value-function iteration errors, falling from 0.6029 to about 9.6e-05.)
How a Ramsey planner responds to war depends on the structure of the asset market
If it is able to trade state-contingent debt, then at time 𝑡 = 2
• the tax rate is a function of the current level of government spending only, given the
Lagrange multiplier on the implementability constraint
Without state contingent debt, the optimal tax rate is history dependent
class LogUtility:

    def __init__(self,
β=0.9,
ψ=0.69,
π=0.5*np.ones((2, 2)),
G=np.array([0.1, 0.2]),
Θ=np.ones(2),
                 transfers=False):

        self.β, self.ψ = β, ψ
        self.π, self.G, self.Θ, self.transfers = π, G, Θ, transfers

    # Utility function
def U(self, c, n):
return np.log(c) + self.ψ * np.log(1 - n)
With these preferences, Ramsey tax rates will vary even in the Lucas-Stokey model with
state-contingent debt
The figure below plots optimal tax policies for both the economy with state contingent debt
(circles) and the economy with only a risk-free bond (triangles)
T = 20
sHist = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 1, 1, 1, 1, 1, 1, 0])
# Simulate
sim_seq = log_sequential.simulate(0.5, 0, T, sHist)
sim_bel = log_bellman.simulate(0.5, 0, T, sHist)
# Output paths
sim_seq[5] = log_example.Θ[sHist] * sim_seq[1]
sim_bel[5] = log_example.Θ[sHist] * sim_bel[1]
(Again the solver prints a sequence of value-function iteration errors, falling from 0.0944 to about 9.1e-05.)
When the government experiences a prolonged period of peace, it is able to reduce govern-
ment debt and set permanently lower tax rates
However, the government finances a long war by borrowing and raising taxes
This results in a drift away from policies with state-contingent debt that depends on the history of shocks
This is even more evident in the following figure that plots the evolution of the two policies
over 200 periods
# Output paths
sim_seq_long[5] = log_example.Θ[sHist_long] * sim_seq_long[1]
sim_bel_long[5] = log_example.Θ[sHist_long] * sim_bel_long[1]
Footnotes
[1] In an allocation that solves the Ramsey problem and that levies distorting taxes on labor,
why would the government ever want to hand revenues back to the private sector? It would
not in an economy with state-contingent debt, since any such allocation could be improved by
lowering distortionary taxes rather than handing out lump-sum transfers. But, without state-
contingent debt there can be circumstances when a government would like to make lump-sum
transfers to the private sector.
[2] From the first-order conditions for the Ramsey problem, there exists another realization $\tilde s^t$ with the same history up until the previous period, i.e., $\tilde s^{t-1} = s^{t-1}$, but where the multiplier on constraint Eq. (11) takes a positive value, so $\gamma_t(\tilde s^t) > 0$.
81 Fluctuating Interest Rates Deliver Fiscal Insurance
81.1 Contents
• Overview 81.2
81.2 Overview
This lecture extends our investigations of how optimal policies for levying a flat-rate tax on
labor income and issuing government debt depend on whether there are complete markets for
debt
A Ramsey allocation and Ramsey policy in the AMSS [5] model described in optimal taxation without state-contingent debt generally differ from a Ramsey allocation and Ramsey policy in the Lucas-Stokey [90] model described in optimal taxation with state-contingent debt
This is because the implementability restriction that a competitive equilibrium with a distorting tax imposes on allocations in the Lucas-Stokey model is just one among a set of implementability conditions imposed in the AMSS model
These additional constraints require that time 𝑡 components of a Ramsey allocation for the
AMSS model be measurable with respect to time 𝑡 − 1 information
The measurability constraints imposed by the AMSS model are inherited from the restriction
that only one-period risk-free bonds can be traded
Differences between the Ramsey allocations in the two models indicate that at least some of the measurability constraints of the AMSS model of optimal taxation without state-contingent debt are violated at the Ramsey allocation of a corresponding [90] model with state-contingent debt
Another way to say this is that differences between the Ramsey allocations of the two models
indicate that some of the measurability constraints of the AMSS model are violated at the
Ramsey allocation of the Lucas-Stokey model
Nonzero Lagrange multipliers on those constraints make the Ramsey allocation for the AMSS
model differ from the Ramsey allocation for the Lucas-Stokey model
This lecture studies a special AMSS model in which
• After the implementability constraints (8) no longer bind in the tail of the AMSS Ram-
sey plan
– history dependence of the AMSS state variable $x_t$ vanishes and $x_t$ becomes a time-invariant function of the Markov state $s_t$
– the par value of government debt becomes constant over time so that 𝑏𝑡+1 (𝑠𝑡 ) =
𝑏̄ for 𝑡 ≥ 𝑇 for a sufficiently large 𝑇
– 𝑏̄ < 0, so that the tail of the Ramsey plan instructs the government always to make
a constant par value of risk-free one-period loans to the private sector
– the one-period gross interest rate $R_t(s_t)$ on risk-free debt converges to a time-invariant function of the Markov state $s_t$
• For a particular 𝑏0 < 0 (i.e., a positive level of initial government loans to the private
sector), the measurability constraints never bind
– the par value $b_{t+1}(s^t) = \bar b$ of government debt at time $t$ and Markov state $s_t$ is constant across time and states, but …
– the market value $\bar b / R_t(s_t)$ of government debt at time $t$ varies as a time-invariant function of the Markov state $s_t$
– fluctuations in the interest rate make gross earnings on government debt $\bar b / R_t(s_t)$ fully insure the gross-of-gross-interest-payments government budget against fluctuations in government expenditures
– the state variable $x$ in a recursive representation of a Ramsey plan is a time-invariant function of the Markov state for $t \geq 0$
• In this special case, the Ramsey allocation in the AMSS model agrees with that in a [90] model in which the same amount of state-contingent debt falls due in all states tomorrow
– it is a situation in which the Ramsey planner loses nothing from not being able to purchase state-contingent debt and being restricted to exchange only risk-free debt
• This outcome emerges only when we initialize government debt at a particular 𝑏0 < 0
In a nutshell, the reason for this striking outcome is that at a particular level of risk-free government assets, fluctuations in the one-period risk-free interest rate provide the government with complete insurance against stochastically varying government expenditures
81.3 Forces at Work

The forces driving asymptotic outcomes here are examples of dynamics present in a more general class of incomplete markets models analyzed in [17] (BEGS)
BEGS provide conditions under which government debt under a Ramsey plan converges to an
invariant distribution
BEGS construct approximations to that asymptotically invariant distribution of government
debt under a Ramsey plan
BEGS also compute an approximation to a Ramsey plan's rate of convergence to that limiting invariant distribution
We shall use the BEGS approximating limiting distribution and the approximating rate of
convergence to help interpret outcomes here
For a long time, the Ramsey plan puts a nontrivial martingale-like component into the par value of government debt as part of the way that the Ramsey plan imperfectly smooths distortions from the labor tax rate across time and Markov states
But BEGS show that binding implementability constraints slowly push government debt in
a direction designed to let the government use fluctuations in equilibrium interest rate rather
than fluctuations in par values of debt to insure against shocks to government expenditures
• This is a weak (but unrelenting) force that, starting from an initial debt level, for a long time is dominated by the stochastic martingale-like component of debt dynamics that the Ramsey planner uses to facilitate imperfect tax-smoothing across time and states
• This weak force slowly drives the par value of government assets to a constant level
at which the government can completely insure against government expenditure shocks
while shutting down the stochastic component of debt dynamics
• At that point, the tail of the par value of government debt becomes a trivial martingale:
it is constant over time
81.4 Logical Flow of Lecture
• We describe a two-state AMSS economy and generate a long simulation starting from a
positive initial government debt
• We observe that in a long simulation starting from positive government debt, the par
value of government debt eventually converges to a constant 𝑏̄
• In fact, the par value of government debt converges to the same constant level 𝑏̄ for al-
ternative realizations of the Markov government expenditure process and for alternative
settings of initial government debt 𝑏0
• We reverse engineer a particular value of initial government debt 𝑏0 (it turns out to be
negative) for which the continuation debt moves to 𝑏̄ immediately
• We note that for this particular initial debt 𝑏0 , the Ramsey allocations for the AMSS
economy and the Lucas-Stokey model are identical
– we verify that the LS Ramsey planner chooses to purchase identical claims to
time 𝑡 + 1 consumption for all Markov states tomorrow for each Markov state to-
day
• We compute the BEGS approximations to check how accurately they describe the dynamics of the long simulation
Although we are studying an AMSS [5] economy, a Lucas-Stokey [90] economy plays an important role in the reverse-engineering calculation to be described below

For that reason, it is helpful to have readily available some key equations underlying a Ramsey plan for the Lucas-Stokey economy
Recall first-order conditions for a Ramsey allocation for the Lucas-Stokey economy
For 𝑡 ≥ 1, these take the form
There is one such equation for each value of the Markov state 𝑠𝑡
In addition, given an initial Markov state, the time 𝑡 = 0 quantities 𝑐0 and 𝑏0 satisfy
$$b_0 + g_0 = \tau_0 (c_0 + g_0) + \frac{\bar b}{R_0} \tag{3}$$
where 𝑅0 is the gross interest rate for the Markov state 𝑠0 that is assumed to prevail at time
𝑡 = 0 and 𝜏0 is the time 𝑡 = 0 tax rate
$$\tau_0 = 1 - \frac{u_{l,0}}{u_{c,0}} \quad \text{and} \quad R_0^{-1} = \beta \sum_{s=1}^{S} \Pi(s|s_0)\, \frac{u_c(s)}{u_{c,0}}$$
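A small standalone sketch of these two formulas under the CRRA specification used below; all numerical inputs here are hypothetical placeholders, not a solved Ramsey allocation:

```python
import numpy as np

β, σ, γ = 0.9, 2.0, 2.0
Π = np.array([[0.5, 0.5],
              [0.5, 0.5]])
g = np.array([0.1, 0.2])
c1 = np.array([0.64, 0.61])      # hypothetical time-1 consumption by state
c0, s0 = 0.65, 0                 # hypothetical time-0 consumption, initial state

n0 = c0 + g[s0]                  # resource constraint: n = c + g
u_c0 = c0 ** (-σ)                # u_c = c**(-σ)
u_l0 = n0 ** γ                   # with u(c, n), n**γ plays the role of u_l
τ0 = 1 - u_l0 / u_c0             # time-0 labor tax rate
R0 = 1 / (β * Π[s0] @ (c1 ** (-σ)) / u_c0)
print(τ0, R0)
```

The same two lines computing `τ0` and `R0` are what the reverse-engineering calculation below needs to evaluate at a candidate time-0 allocation.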
It is useful to transform some of the above equations to forms that are more natural for analyzing the case of a CRRA utility specification that we shall use in our example economies
As in lectures optimal taxation without state-contingent debt and optimal taxation with
state-contingent debt, we assume that the representative agent has utility function
$$u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}$$
$$c_t + g_t = n_t$$
The analysis of Lucas and Stokey prevails once we make the following replacements
With these understandings, equations Eq. (1) and Eq. (2) simplify in the case of the CRRA
utility function
They become

$$(1+\Phi)\,[u_c(c) + u_n(c+g)] + \Phi\,[c\, u_{cc}(c) + (c+g)\, u_{nn}(c+g)] = 0 \tag{4}$$

and

$$(1+\Phi)\,[u_c(c_0) + u_n(c_0+g_0)] + \Phi\,[c_0\, u_{cc}(c_0) + (c_0+g_0)\, u_{nn}(c_0+g_0)] - \Phi\, u_{cc}(c_0)\, b_0 = 0 \tag{5}$$
In equation Eq. (4), it is understood that 𝑐 and 𝑔 are each functions of the Markov state 𝑠
The CRRA utility function is represented in the following class
class CRRAutility:
def __init__(self,
β=0.9,
σ=2,
γ=2,
π=0.5*np.ones((2, 2)),
G=np.array([0.1, 0.2]),
Θ=np.ones(2),
                 transfers=False):

        self.β, self.σ, self.γ = β, σ, γ
        self.π, self.G, self.Θ, self.transfers = π, G, Θ, transfers

    # Utility function
def U(self, c, n):
σ = self.σ
if σ == 1.:
U = np.log(c)
else:
U = (c**(1 - σ) - 1) / (1 - σ)
return U - n**(1 + self.γ) / (1 + self.γ)
The parameter values are $\beta = 0.9$, $\sigma = 2$, $\gamma = 2$
class SequentialAllocation:
'''
Class that takes CESutility or BGPutility object as input returns
planner's allocation as a function of the multiplier on the
implementability constraint μ.
'''
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, G = self.S, self.Θ, self.G
Uc, Un = model.Uc, model.Un
def res(z):
c = z[:S]
n = z[S:]
return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
        res = root(res, 0.5 * np.ones(2 * S))
        if not res.success:
raise Exception('Could not find first best')
self.cFB = res.x[:S]
self.nFB = res.x[S:]
def FOC(z):
c = z[:S]
n = z[S:2 * S]
Ξ = z[2 * S:]
return np.hstack([Uc(c, n) - μ * (Ucc(c, n) * c + Uc(c, n)) - Ξ, # FOC of c
Un(c, n) - μ * (Unn(c, n) * n + Un(c, n)) + \
Θ * Ξ, # FOC of n
Θ * n - c - G])
# Compute x
I = Uc(c, n) * c + Un(c, n) * n
return c, n, x, Ξ
# Find root
res = root(FOC, np.array(
[0, self.cFB[s_0], self.nFB[s_0], self.ΞFB[s_0]]))
if not res.success:
raise Exception('Could not find time 0 LS allocation.')
return res.x
if sHist is None:
sHist = self.mc.simulate(T, s_0)
# Time 0
μ, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
Bhist[0] = B_
μHist[0] = μ
# Time 1 onward
for t in range(1, T):
c, n, x, Ξ = self.time1_allocation(μ)
Τ = self.Τ(c, n)
u_c = Uc(c, n)
s = sHist[t]
class RecursiveAllocationAMSS:
def solve_time1_bellman(self):
'''
Solve the time 1 Bellman equation for calibration model and
initial grid μgrid0
'''
model, μgrid0 = self.model, self.μgrid
π = model.π
S = len(model.π)
# Create xgrid
x = np.vstack(xgrid).T
xbar = [x.min(0).max(), x.max(0).min()]
xgrid = np.linspace(xbar[0], xbar[1], len(μgrid0))
self.xgrid = xgrid
print(diff)
Vf = Vfnew
if sHist is None:
sHist = simulate_markov(π, s_0, T)
# time 1 onward
for t in range(1, T):
s_, x, s = sHist[t - 1], xHist[t - 1], sHist[t]
c, n, xprime, T = cf[s_, :](x), nf[s_, :](
x), xprimef[s_, :](x), Tf[s_, :](x)
Τ = self.Τ(c, n)[s]
u_c = Uc(c, n)
Eu_c = π[s_, :] @ u_c
μHist[t] = self.Vf[s](xprime[s])
class BellmanEquation:
'''
Bellman equation for the continuation of the Lucas-Stokey Problem
'''
self.z0 = {}
cf, nf, xprimef = policies0
for s_ in range(self.S):
for x in xgrid:
self.z0[x, s_] = np.hstack([cf[s_, :](x),
nf[s_, :](x),
xprimef[s_, :](x),
np.zeros(self.S)])
self.find_first_best()
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, Uc, Un, G = self.S, self.Θ, model.Uc, model.Un, self.G
def res(z):
c = z[:S]
n = z[S:]
return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
res = root(res, 0.5 * np.ones(2 * S))
self.cFB = res.x[:S]
self.nFB = res.x[S:]
IFB = Uc(self.cFB, self.nFB) * self.cFB + \
Un(self.cFB, self.nFB) * self.nFB
self.zFB = {}
for s in range(S):
self.zFB[s] = np.hstack(
[self.cFB[s], self.nFB[s], self.π[s] @ self.xFB, 0.])
'''
Finds the optimal policies
'''
model, β, Θ, G, S, π = self.model, self.β, self.Θ, self.G, self.S, self.π
U, Uc, Un = model.U, model.Uc, model.Un
def objf(z):
c, n, xprime = z[:S], z[S:2 * S], z[2 * S:3 * S]
Vprime = np.empty(S)
for s in range(S):
Vprime[s] = Vf[s](xprime[s])
def cons(z):
c, n, xprime, T = z[:S], z[S:2 * S], z[2 * S:3 * S], z[3 * S:]
u_c = Uc(c, n)
Eu_c = π[s_] @ u_c
return np.hstack([
x * u_c / Eu_c - u_c * (c - T) - Un(c, n) * n - β * xprime,
Θ * n - c - G])
if model.transfers:
bounds = [(0., 100)] * S + [(0., 100)] * S + \
[self.xbar] * S + [(0., 100.)] * S
else:
bounds = [(0., 100)] * S + [(0., 100)] * S + \
[self.xbar] * S + [(0., 0.)] * S
out, fx, _, imode, smode = fmin_slsqp(objf, self.z0[x, s_],
f_eqcons=cons, bounds=bounds,
full_output=True, iprint=0,
acc=self.tol, iter=self.maxiter)
if imode > 0:
raise Exception(smode)
def objf(z):
c, n, xprime = z[:-1]
def cons(z):
c, n, xprime, T = z
return np.hstack([
-Uc(c, n) * (c - B_ - T) - Un(c, n) * n - β * xprime,
(Θ * n - c - G)[s0]])
if model.transfers:
bounds = [(0., 100), (0., 100), self.xbar, (0., 100.)]
else:
bounds = [(0., 100), (0., 100), self.xbar, (0., 0.)]
out, fx, _, imode, smode = fmin_slsqp(objf, self.zFB[s0], f_eqcons=cons,
bounds=bounds, full_output=True, iprint=0)
if imode > 0:
raise Exception(smode)
class interpolate_wrapper:
def transpose(self):
self.F = self.F.transpose()
def __len__(self):
return len(self.F)
class interpolator_factory:
def fun_vstack(fun_list):
def fun_hstack(fun_list):
return sHist
We can reverse engineer a value 𝑏0 of initial debt due that renders the AMSS measurability
constraints slack from time 𝑡 = 0 onward
We accomplish this by recognizing that if the AMSS measurability constraints never bind,
then the AMSS allocation and Ramsey plan are equivalent to those for a Lucas-Stokey economy
in which, for each period 𝑡 ≥ 0, the government promises to pay the same state-contingent
amount 𝑏̄ in each state tomorrow
This insight tells us to find a 𝑏0 and other fundamentals for the Lucas-Stokey [90] model that
make the Ramsey planner want to borrow the same value 𝑏̄ next period for all states and all
dates
We accomplish this by using various equations for the Lucas-Stokey [90] model presented in
optimal taxation with state-contingent debt
We use the following steps
Step 1: Pick an initial Φ
Step 2: Given that Φ, jointly solve two versions of equation Eq. (4) for 𝑐(𝑠), 𝑠 = 1, 2 associ-
ated with the two values for 𝑔(𝑠), 𝑠 = 1, 2
Step 3: Solve the following equation for 𝑥⃗
𝑏⃗ = 𝑥⃗ / 𝑢⃗𝑐    (7)
u = CRRAutility()
def min_Φ(Φ):
# Solve Φ(c)
def equations(unknowns, Φ):
c1, c2 = unknowns
# First argument of .Uc and second argument of .Un are redundant
81.7. CODE FOR REVERSE ENGINEERING 1379
return loss
Out[7]: -1.0757576567504166
c0, b0 = unknowns
g0 = u.G[s-1]
In [9]: c0, b0 = fsolve(solve_cb, np.array([1., -1.], dtype='float64'),
   ...:                 args=(Φ_star, b[0], 1), xtol=1.0e-12)
c0, b0
The following graph shows simulations of outcomes for both a Lucas-Stokey economy and for
an AMSS economy starting from initial government debt equal to 𝑏0 = −1.038698407551764
These graphs report outcomes for both the Lucas-Stokey economy with complete markets and
the AMSS economy with one-period risk-free debt only
log_example = CRRAutility()
T = 20
sHist = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 1, 1, 1, 1, 1, 1, 0])
# Output paths
sim_seq[5] = log_example.Θ[sHist] * sim_seq[1]
sim_bel[5] = log_example.Θ[sHist] * sim_bel[1]
0.04094445433234912
0.0016732111459338028
...
0.0004308273211001684

81.8. SHORT SIMULATION FOR REVERSE-ENGINEERED: INITIAL DEBT 1381

0.0003848185136981698
...
9.527750852134792e-11
81.9. LONG SIMULATION 1383
The Ramsey allocations and Ramsey outcomes are identical for the Lucas-Stokey and AMSS
economies
This outcome confirms the success of our reverse-engineering exercises
Notice how for 𝑡 ≥ 1, the tax rate is a constant - so is the par value of government debt
However, output and labor supply are both nontrivial time-invariant functions of the Markov
state
The following graph shows the par value of government debt and the flat rate tax on labor
income for a long simulation for our sample economy
For the same realization of a government expenditure path, the graph reports outcomes for
two economies
• the gray lines are for the Lucas-Stokey economy with complete markets
• the blue lines are for the AMSS economy with risk-free one-period debt only
• Notice that this is a time-invariant function of the Markov state from the beginning
For the AMSS incomplete markets economy, the government debt plotted is 𝑏𝑡+1 (𝑠𝑡 )
• Notice that this is a martingale-like random process that eventually seems to converge
to a constant 𝑏̄ ≈ −1.07
• Notice that the limiting value 𝑏̄ < 0 so that asymptotically the government makes a
constant level of risk-free loans to the public
• In the simulation displayed, as well as in other simulations we have run, the par value of
government debt converges to about −1.07 after between 1400 and 2000 periods
For the AMSS incomplete markets economy, the marginal tax rate on labor income 𝜏𝑡 con-
verges to a constant
• labor supply and output each converge to time-invariant functions of the Markov state
sim_seq_long = log_sequential.simulate(0.5, 0, T)
sHist_long = sim_seq_long[-3]
sim_bel_long = log_bellman.simulate(0.5, 0, T, sHist_long)
As remarked above, after 𝑏𝑡+1 (𝑠𝑡 ) has converged to a constant, the measurability constraints
in the AMSS model cease to bind
This leads us to seek an initial value of government debt 𝑏0 that renders the measurability
constraints slack from time 𝑡 = 0 onward
• a tell-tale sign of this situation is that the Ramsey planner in a corresponding Lucas-
Stokey economy would instruct the government to issue a constant level of government
debt 𝑏𝑡+1 (𝑠𝑡+1 ) across the two Markov states
It is useful to link the outcome of our reverse engineering exercise to limiting approximations
constructed by [17]
[17] used a slightly different notation to represent a generalization of the AMSS model
We’ll introduce a version of their notation so that readers can quickly relate notation that
appears in their key formulas to the notation that we have used
BEGS work with objects 𝐵𝑡 , ℬ𝑡 , ℛ𝑡 , 𝒳𝑡 that are related to our notation by
ℛ𝑡 = (𝑢𝑐,𝑡 / 𝑢𝑐,𝑡−1) 𝑅𝑡−1 = 𝑢𝑐,𝑡 / (𝛽𝐸𝑡−1𝑢𝑐,𝑡)

𝐵𝑡 = 𝑏𝑡+1(𝑠𝑡) / 𝑅𝑡(𝑠𝑡)
𝑏𝑡 (𝑠𝑡−1 ) = ℛ𝑡−1 𝐵𝑡−1
ℬ𝑡 = 𝑢𝑐,𝑡 𝐵𝑡 = (𝛽𝐸𝑡 𝑢𝑐,𝑡+1 )𝑏𝑡+1 (𝑠𝑡 )
𝒳𝑡 = 𝑢𝑐,𝑡 [𝑔𝑡 − 𝜏𝑡 𝑛𝑡 ]
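To fix ideas, here is a minimal sketch of the last two definitions as plain functions; the function names, 𝛽, and the numbers in the example below are illustrative assumptions, not the lecture's calibration.

```python
β = 0.9   # assumed discount factor


def effective_debt(E_uc_next, b_next):
    # ℬ_t = u_{c,t} B_t = (β E_t u_{c,t+1}) · b_{t+1}(s^t)
    return β * E_uc_next * b_next


def effective_deficit(uc, g, τ, n):
    # 𝒳_t = u_{c,t} [g_t − τ_t n_t]
    return uc * (g - τ * n)
```

For example, with 𝐸𝑡𝑢𝑐,𝑡+1 = 1 and 𝑏𝑡+1 = −1, effective government debt is −𝛽.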
In terms of their notation, equation (44) of [17] expresses the time 𝑡 state 𝑠 government bud-
get constraint as
where the dependence on 𝜏 is to remind us that these objects depend on the tax rate and 𝑠−
is last period’s Markov state
BEGS interpret random variations in the right side of Eq. (8) as a measure of fiscal risk
composed of
ℬ∗ = − cov∞(ℛ, 𝒳) / var∞(ℛ)    (9)
where the superscript ∞ denotes a moment taken with respect to an ergodic distribution
Formula Eq. (9) presents ℬ∗ as a regression coefficient of 𝒳𝑡 on ℛ𝑡 in the ergodic distribution
This regression coefficient emerges as the minimizer for a variance-minimization problem:
The minimand in criterion Eq. (10) is the measure of fiscal risk associated with a given tax-
debt policy that appears on the right side of equation Eq. (8)
Expressing formula Eq. (9) in terms of our notation tells us that 𝑏̄ should approximately
equal
𝑏̂ = ℬ∗ / (𝛽𝐸𝑡𝑢𝑐,𝑡+1)    (11)
BEGS also derive the following approximation to the rate of convergence to ℬ∗ from an arbi-
trary initial condition
𝐸𝑡(ℬ𝑡+1 − ℬ∗) / (ℬ𝑡 − ℬ∗) ≈ 1 / (1 + 𝛽²var(ℛ))    (12)
For our example, we describe some code that we use to compute the steady state mean and
the rate of convergence to it
The values of 𝜋(𝑠) are .5, .5
We can then construct 𝒳(𝑠), ℛ(𝑠), 𝑢𝑐 (𝑠) for our two states using the definitions above
We can then construct 𝛽𝐸𝑡−1 𝑢𝑐 = 𝛽 ∑𝑠 𝑢𝑐 (𝑠)𝜋(𝑠), cov(ℛ(𝑠), 𝒳(𝑠)) and var(ℛ(𝑠)) to be
plugged into formula Eq. (11)
We also want to compute var(𝒳)
81.10. BEGS APPROXIMATIONS OF LIMITING DEBT AND CONVERGENCE RATE 1387
To compute the variances and covariance, we use the following standard formulas
Temporarily let 𝑥(𝑠), 𝑠 = 1, 2 be an arbitrary random variable
Then we define
𝜇𝑥 = ∑𝑠 𝑥(𝑠)𝜋(𝑠)

cov(𝑥, 𝑦) = (∑𝑠 𝑥(𝑠)𝑦(𝑠)𝜋(𝑠)) − 𝜇𝑥𝜇𝑦
After we compute these moments, we compute the BEGS approximation to the asymptotic
mean 𝑏̂ in formula Eq. (11)
After that, we move on to compute ℬ∗ in formula Eq. (9)
We’ll also evaluate the BEGS criterion Eq. (8) at the limiting value ℬ∗
𝐽(ℬ∗) = var(ℛ)(ℬ∗)² + 2ℬ∗cov(ℛ, 𝒳) + var(𝒳)    (13)
Here are some functions that we’ll use to compute key objects that we want
def variance(x):
x = np.array(x)
return x**2 @ u.π[s] - mean(x)**2
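As a self-contained check of the criterion 𝐽 in Eq. (13) and of Eq. (9), here is a sketch with an assumed two-state 𝜋 and made-up values for ℛ and 𝒳; 𝒳 is constructed to be exactly affine in ℛ, so the minimized criterion is zero.

```python
import numpy as np

π = np.array([0.5, 0.5])                   # assumed two-state probabilities


def mean(x):
    return x @ π


def variance(x):
    return x**2 @ π - mean(x)**2


def covariance(x, y):
    return (x * y) @ π - mean(x) * mean(y)


def J(B, R, X):
    # J(ℬ) = var(ℛ) ℬ² + 2 ℬ cov(ℛ, 𝒳) + var(𝒳), as in Eq. (13)
    return variance(R) * B**2 + 2 * B * covariance(R, X) + variance(X)


R = np.array([1.05, 1.15])                 # illustrative values of ℛ
X = 1.675 - 1.5 * R                        # 𝒳 exactly affine in ℛ
B_star = -covariance(R, X) / variance(R)   # minimizer, as in Eq. (9)
```

Because 𝒳 is affine in ℛ here, ℬ∗ equals minus the regression slope (1.5) and 𝐽(ℬ∗) = 0; perturbing ℬ away from ℬ∗ raises the criterion.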
Now let’s form the two random variables ℛ, 𝒳 appearing in the BEGS approximating formu-
las
In [13]: u = CRRAutility()
s = 0
c = [0.940580824225584, 0.8943592757759343] # Vector for c
g = u.G # Vector for g
n = c + g # Labor, from the feasibility constraint c + g = n
τ = lambda s: 1 + u.Un(1, n[s]) / u.Uc(c[s], 1)
R = [R_s(0), R_s(1)]
X = [X_s(0), X_s(1)]
Now let’s compute the ingredient of the approximating limit and the approximating rate of
convergence
Out[14]: -1.0757585378303758
So we have
Out[16]: -8.810799592140484e-07
Out[17]: -9.020562075079397e-17
This is machine zero, a verification that 𝑏̂ succeeds in minimizing the nonnegative fiscal cost
criterion 𝐽 (ℬ∗ ) defined in BEGS and in equation Eq. (13) above
Let’s push our luck and compute the mean reversion speed in the formula above equation
(47) in [17]
Now let’s compute the implied mean time to get to within .01 of the limit
The slow rate of convergence and the implied time of getting within one percent of the
limiting value do a good job of approximating our long simulation above
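The implied time to get within a given distance of the limit can be sketched as a back-of-envelope calculation; the reversion factor 0.993 and the unit initial gap below are illustrative stand-ins, not the lecture's computed values.

```python
import math


def periods_to_within(gap0, tol, rate):
    # The expected gap decays geometrically, |gap_t| ≈ |gap0| · rate**t,
    # so the required horizon solves rate**t = tol / |gap0|
    return math.log(tol / abs(gap0)) / math.log(rate)


t_star = periods_to_within(1.0, 0.01, 0.993)   # roughly 650-660 periods
```

A reversion factor close to 1 thus implies convergence horizons of hundreds or thousands of periods, consistent with the long simulation discussed above.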
82

Fiscal Risk and Government Debt

82.1 Contents
• Overview 82.2
In addition to what’s in Anaconda, this lecture will need the following libraries
82.2 Overview
This lecture studies government debt in an AMSS economy [5] of the type described in Opti-
mal Taxation without State-Contingent Debt
We study the behavior of government debt as time 𝑡 → +∞
We use these techniques
• simulations
• a regression coefficient from the tail of a long simulation that allows us to
verify that the asymptotic mean of government debt solves a fiscal-risk mini-
mization problem
• an approximation to the mean of an ergodic distribution of government debt
• an approximation to the rate of convergence to an ergodic distribution of
government debt
We apply tools applicable to more general incomplete markets economies that are presented
on pages 648 - 650 in section III.D of [17] (BEGS)
We study an [5] economy with three Markov states driving government expenditures
1390 82. FISCAL RISK AND GOVERNMENT DEBT
• In a previous lecture, we showed that with only two Markov states, it is pos-
sible that eventually endogenous interest rate fluctuations support complete
markets allocations and Ramsey outcomes
• The presence of three states prevents the full spanning that eventually pre-
vails in the two-state example featured in Fiscal Insurance via Fluctuating
Interest Rates
The lack of full spanning means that the ergodic distribution of the par value of government
debt is nontrivial, in contrast to the situation in Fiscal Insurance via Fluctuating Interest
Rates where the ergodic distribution of the par value is concentrated on one point
Nevertheless, [17] (BEGS) establish that, for general settings that include ours, the Ramsey
planner steers government assets to a level that comes as close as possible to providing full
spanning in a precise sense defined by BEGS that we describe below
We use code constructed in a previous lecture
Warning: Key equations in [17] section III.D carry typos that we correct below
As in Optimal Taxation without State-Contingent Debt and Optimal Taxation with State-
Contingent Debt, we assume that the representative agent has utility function
𝑢(𝑐, 𝑛) = 𝑐^(1−𝜎)/(1 − 𝜎) − 𝑛^(1+𝛾)/(1 + 𝛾)
𝑐𝑡 + 𝑔𝑡 = 𝑛𝑡
𝛽 = .9
𝜎=2
𝛾=2
class CRRAutility:
def __init__(self,
β=0.9,
σ=2,
γ=2,
π=0.5*np.ones((2, 2)),
G=np.array([0.1, 0.2]),
Θ=np.ones(2),
transfers=False):
# Utility function
def U(self, c, n):
σ = self.σ
if σ == 1.:
U = np.log(c)
else:
U = (c**(1 - σ) - 1) / (1 - σ)
return U - n**(1 + self.γ) / (1 + self.γ)
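The allocations computed below also use the marginal utilities Uc and Un. Consistent with the CRRA specification above, they can be sketched as standalone functions with the lecture's calibration 𝜎 = 𝛾 = 2 (a sketch, not necessarily the class's exact methods):

```python
σ, γ = 2, 2   # calibration used in this lecture


def Uc(c, n):
    # ∂U/∂c = c**(-σ) from the CRRA consumption term
    return c**(-σ)


def Un(c, n):
    # ∂U/∂n = -n**γ from the disutility term n**(1+γ)/(1+γ)
    return -n**γ
```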
We’ll want first and second moments of some key random variables below
The following code computes these moments; the code is recycled from Fiscal Insurance via
Fluctuating Interest Rates
class SequentialAllocation:
'''
Class that takes CESutility or BGPutility object as input and returns
planner's allocation as a function of the multiplier on the
implementability constraint μ.
'''
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, G = self.S, self.Θ, self.G
Uc, Un = model.Uc, model.Un
def res(z):
c = z[:S]
n = z[S:]
return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
res = root(res, 0.5 * np.ones(2 * S))
if not res.success:
raise Exception('Could not find first best')
self.cFB = res.x[:S]
self.nFB = res.x[S:]
def FOC(z):
c = z[:S]
n = z[S:2 * S]
Ξ = z[2 * S:]
return np.hstack([Uc(c, n) - μ * (Ucc(c, n) * c + Uc(c, n)) - Ξ, # FOC of c
Un(c, n) - μ * (Unn(c, n) * n + Un(c, n)) + \
Θ * Ξ, # FOC of n
Θ * n - c - G])
# Compute x
I = Uc(c, n) * c + Un(c, n) * n
x = np.linalg.solve(np.eye(S) - self.β * self.π, I)
return c, n, x, Ξ
# Find root
res = root(FOC, np.array(
[0, self.cFB[s_0], self.nFB[s_0], self.ΞFB[s_0]]))
if not res.success:
raise Exception('Could not find time 0 LS allocation.')
return res.x
if sHist is None:
sHist = self.mc.simulate(T, s_0)
# Time 0
μ, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
Bhist[0] = B_
μHist[0] = μ
# Time 1 onward
for t in range(1, T):
c, n, x, Ξ = self.time1_allocation(μ)
Τ = self.Τ(c, n)
u_c = Uc(c, n)
s = sHist[t]
Eu_c = π[sHist[t - 1]] @ u_c
cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x[s] / \
u_c[s], Τ[s]
RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (β * Eu_c)
μHist[t] = μ
class RecursiveAllocationAMSS:
def solve_time1_bellman(self):
'''
Solve the time 1 Bellman equation for calibration model and
initial grid μgrid0
'''
model, μgrid0 = self.model, self.μgrid
π = model.π
S = len(model.π)
# Create xgrid
x = np.vstack(xgrid).T
xbar = [x.min(0).max(), x.max(0).min()]
xgrid = np.linspace(xbar[0], xbar[1], len(μgrid0))
self.xgrid = xgrid
print(diff)
Vf = Vfnew
if sHist is None:
sHist = simulate_markov(π, s_0, T)
# time 1 onward
for t in range(1, T):
s_, x, s = sHist[t - 1], xHist[t - 1], sHist[t]
c, n, xprime, T = cf[s_, :](x), nf[s_, :](
x), xprimef[s_, :](x), Tf[s_, :](x)
Τ = self.Τ(c, n)[s]
u_c = Uc(c, n)
Eu_c = π[s_, :] @ u_c
μHist[t] = self.Vf[s](xprime[s])
class BellmanEquation:
'''
Bellman equation for the continuation of the Lucas-Stokey Problem
'''
self.z0 = {}
cf, nf, xprimef = policies0
for s_ in range(self.S):
for x in xgrid:
self.z0[x, s_] = np.hstack([cf[s_, :](x),
nf[s_, :](x),
xprimef[s_, :](x),
np.zeros(self.S)])
self.find_first_best()
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, Uc, Un, G = self.S, self.Θ, model.Uc, model.Un, self.G
def res(z):
c = z[:S]
n = z[S:]
return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
res = root(res, 0.5 * np.ones(2 * S))
self.cFB = res.x[:S]
self.nFB = res.x[S:]
IFB = Uc(self.cFB, self.nFB) * self.cFB + \
Un(self.cFB, self.nFB) * self.nFB
self.zFB = {}
for s in range(S):
self.zFB[s] = np.hstack(
[self.cFB[s], self.nFB[s], self.π[s] @ self.xFB, 0.])
def objf(z):
c, n, xprime = z[:S], z[S:2 * S], z[2 * S:3 * S]
Vprime = np.empty(S)
for s in range(S):
Vprime[s] = Vf[s](xprime[s])
def cons(z):
c, n, xprime, T = z[:S], z[S:2 * S], z[2 * S:3 * S], z[3 * S:]
u_c = Uc(c, n)
Eu_c = π[s_] @ u_c
return np.hstack([
x * u_c / Eu_c - u_c * (c - T) - Un(c, n) * n - β * xprime,
Θ * n - c - G])
if model.transfers:
bounds = [(0., 100)] * S + [(0., 100)] * S + \
[self.xbar] * S + [(0., 100.)] * S
else:
bounds = [(0., 100)] * S + [(0., 100)] * S + \
[self.xbar] * S + [(0., 0.)] * S
out, fx, _, imode, smode = fmin_slsqp(objf, self.z0[x, s_],
f_eqcons=cons, bounds=bounds,
full_output=True, iprint=0,
acc=self.tol, iter=self.maxiter)
if imode > 0:
raise Exception(smode)
def objf(z):
c, n, xprime = z[:-1]
def cons(z):
c, n, xprime, T = z
return np.hstack([
-Uc(c, n) * (c - B_ - T) - Un(c, n) * n - β * xprime,
(Θ * n - c - G)[s0]])
if model.transfers:
bounds = [(0., 100), (0., 100), self.xbar, (0., 100.)]
else:
bounds = [(0., 100), (0., 100), self.xbar, (0., 0.)]
out, fx, _, imode, smode = fmin_slsqp(objf, self.zFB[s0], f_eqcons=cons,
bounds=bounds, full_output=True, iprint=0)
if imode > 0:
raise Exception(smode)
class interpolate_wrapper:
def transpose(self):
self.F = self.F.transpose()
def __len__(self):
return len(self.F)
class interpolator_factory:
def fun_vstack(fun_list):
def fun_hstack(fun_list):
return sHist
Next, we show the code that we use to generate a very long simulation starting from initial
government debt equal to −.5
Here is a graph of a long simulation of 102000 periods
82.4. LONG SIMULATION 1399
sim_seq_long = log_sequential.simulate(0.5, 0, T)
sHist_long = sim_seq_long[-3]
sim_bel_long = log_bellman.simulate(0.5, 0, T, sHist_long)
0.03826635338764132
0.0015144378246369176
...
9.046923715917151e-11
We discard the first 2000 observations of the simulation and construct the histogram of the
par value of government debt
We obtain the following graph for the histogram of the last 100,000 observations on the par
value of government debt
The black vertical line denotes the sample mean for the last 100,000 observations included in
the histogram; the green vertical line denotes the value of ℬ∗/𝐸𝑢𝑐, associated with the sample
(presumably from the ergodic distribution), where ℬ∗ is the regression coefficient described
below; the red vertical line denotes an approximation by [17] to the mean of the ergodic
distribution that can be precomputed before sampling from the ergodic distribution, as
described below
Before moving on to discuss the histogram and the vertical lines approximating the ergodic
mean of government debt in more detail, the following graphs show government debt and
taxes early in the simulation, for periods 1-100 and 101 to 200 respectively
For the short samples early in our simulated sample of 102,000 observations, fluctuations in
government debt and the tax rate conceal the weak but inexorable force that the Ramsey
planner puts into both series driving them toward ergodic distributions far from these early
observations
• early observations are more influenced by the initial value of the par value of
government debt than by the ergodic mean of the par value of government
debt
• much later observations are more influenced by the ergodic mean and are in-
dependent of the initial value of the par value of government debt
ℛ𝑡 = (𝑢𝑐,𝑡 / 𝑢𝑐,𝑡−1) 𝑅𝑡−1 = 𝑢𝑐,𝑡 / (𝛽𝐸𝑡−1𝑢𝑐,𝑡)

𝐵𝑡 = 𝑏𝑡+1(𝑠𝑡) / 𝑅𝑡(𝑠𝑡)
𝑏𝑡 (𝑠𝑡−1 ) = ℛ𝑡−1 𝐵𝑡−1
ℬ𝑡 = 𝑢𝑐,𝑡 𝐵𝑡 = (𝛽𝐸𝑡 𝑢𝑐,𝑡+1 )𝑏𝑡+1 (𝑠𝑡 )
𝒳𝑡 = 𝑢𝑐,𝑡 [𝑔𝑡 − 𝜏𝑡 𝑛𝑡 ]
[17] call 𝒳𝑡 the effective government deficit, and ℬ𝑡 the effective government debt
Equation (44) of [17] expresses the time 𝑡, state 𝑠 government budget constraint as
where the dependence on 𝜏 is to remind us that these objects depend on the tax rate, and 𝑠− is
last period’s Markov state
BEGS interpret random variations in the right side of Eq. (1) as fiscal risks generated by
BEGS give conditions under which the ergodic mean of ℬ𝑡 approximately satisfies the equa-
tion
ℬ∗ = − cov∞(ℛ𝑡, 𝒳𝑡) / var∞(ℛ𝑡)    (2)
82.5. ASYMPTOTIC MEAN AND RATE OF CONVERGENCE 1407
where the superscript ∞ denotes a moment taken with respect to an ergodic distribution
Formula Eq. (2) represents ℬ∗ as a regression coefficient of 𝒳𝑡 on ℛ𝑡 in the ergodic distribu-
tion
Regression coefficient ℬ∗ solves a variance-minimization problem:
The minimand in criterion Eq. (3) measures fiscal risk associated with a given tax-debt policy
that appears on the right side of equation Eq. (1)
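The regression-coefficient reading of Eq. (2) can be checked numerically on synthetic draws; the data below are illustrative, not draws from the model's ergodic distribution, and the population regression slope of 𝒳 on ℛ is −2 by construction, so ℬ∗ should be close to 2.

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.normal(1.1, 0.02, size=100_000)                 # stand-in draws of ℛ
X = 0.5 - 2.0 * R + rng.normal(0, 0.01, size=100_000)   # 𝒳 with slope -2 plus noise

# ℬ* = -cov(ℛ, 𝒳) / var(ℛ), i.e. minus the regression slope of 𝒳 on ℛ
B_star = -np.cov(R, X)[0, 1] / np.var(R, ddof=1)
```

Equivalently, `B_star` is the minimizer of var(𝒳 + ℬℛ), which is the variance-minimization reading of the criterion.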
Expressing formula Eq. (2) in terms of our notation tells us that the ergodic mean of the par
value 𝑏 of government debt in the AMSS model should approximately equal
𝑏̂ = ℬ∗ / (𝛽𝐸(𝐸𝑡𝑢𝑐,𝑡+1)) = ℬ∗ / (𝛽𝐸(𝑢𝑐,𝑡+1))    (4)
where mathematical expectations are taken with respect to the ergodic distribution
BEGS also derive the following approximation to the rate of convergence to ℬ∗ from an arbi-
trary initial condition
𝐸𝑡(ℬ𝑡+1 − ℬ∗) / (ℬ𝑡 − ℬ∗) ≈ 1 / (1 + 𝛽²var∞(ℛ))    (5)
The remainder of this lecture is about technical material based on formulas from [17]
The topic is interpreting and extending formula Eq. (3) for the ergodic mean ℬ∗
Attributes of the ergodic distribution for ℬ𝑡 appear on the right side of formula Eq. (3) for
the ergodic mean ℬ∗
Thus, formula Eq. (3) is not useful for estimating the mean of the ergodic distribution in
advance of actually computing the ergodic distribution
• we need to know the ergodic distribution to compute the right side of for-
mula Eq. (3)
So the primary use of equation Eq. (3) is that it confirms that the asymptotic mean of
government debt solves a fiscal-risk minimization problem
As an example, notice how we used the formula for the mean of ℬ in the ergodic distribution
of the special AMSS economy in Fiscal Insurance via Fluctuating Interest Rates
[17] propose an approximation to ℬ∗ that can be computed without first knowing the ergodic
distribution
To construct the BEGS approximation to ℬ∗ , we just follow steps set forth on pages 648 - 650
of section III.D of [17]
• notation in BEGS might be confusing at first sight, so it is important to stare and digest before computing
• there are also some sign errors in the [17] text that we’ll want to correct
Step 2: Knowing 𝑐𝜏 (𝑠), 𝑠 = 1, … , 𝑆 for a given 𝜏 , we want to compute the random variables
ℛ𝜏(𝑠) = 𝑐𝜏(𝑠)^(−𝜎) / (𝛽 ∑𝑠′=1,…,𝑆 𝑐𝜏(𝑠′)^(−𝜎) 𝜋(𝑠′))
and
each for 𝑠 = 1, … , 𝑆
BEGS call ℛ𝜏 (𝑠) the effective return on risk-free debt and they call 𝒳𝜏 (𝑠) the effective
government deficit
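Step 2 can be sketched as follows for an assumed two-state example with the CRRA calibration; the helper name `R_X` and its inputs are hypothetical, and the deficit is computed from the definition 𝒳 = 𝑢𝑐[𝑔 − 𝜏𝑛] together with feasibility 𝑛 = 𝑐 + 𝑔 and the labor FOC 𝜏 = 1 + 𝑈𝑛/𝑈𝑐 used earlier.

```python
import numpy as np

β, σ, γ = 0.9, 2, 2
π = np.array([0.5, 0.5])          # assumed two-state probabilities
g = np.array([0.1, 0.2])          # assumed expenditure levels


def R_X(c):
    # c: consumption in each state for a given τ
    uc = c**(-σ)                  # u_c(s)
    n = c + g                     # feasibility: c_t + g_t = n_t
    τ = 1 + (-n**γ) / uc          # labor FOC: τ = 1 + U_n/U_c
    R = uc / (β * (uc @ π))       # effective return ℛ_τ(s)
    X = uc * (g - τ * n)          # effective deficit 𝒳_τ(s)
    return R, X


R, X = R_X(np.array([0.94, 0.89]))
```

By construction 𝐸ℛ = 1/𝛽 regardless of the consumption vector, which is a useful sanity check.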
Step 3: With the preceding objects in hand, for a given ℬ, we seek a 𝜏 that satisfies
ℬ = − (𝛽/(1 − 𝛽)) 𝐸𝒳𝜏 ≡ − (𝛽/(1 − 𝛽)) ∑𝑠 𝒳𝜏(𝑠)𝜋(𝑠)
This equation says that at a constant discount factor 𝛽, effective government debt ℬ equals
the present value of the mean effective government surplus
Typo alert: there is a sign error in equation (46) of [17] –the left side should be multiplied
by −1
For a given ℬ, let a 𝜏 that solves the above equation be called 𝜏 (ℬ)
We’ll use a Python root solver to find a 𝜏 that solves this equation for a given ℬ
We’ll use this function to induce a function 𝜏 (ℬ)
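Here is a sketch of that root-solving step under the same kind of illustrative two-state calibration; `c_of_τ`, `EX`, and `τ_of_B` are hypothetical names, with the inner solve using the labor FOC (1 − 𝜏)𝑐^(−𝜎) = (𝑐 + 𝑔)^𝛾.

```python
import numpy as np
from scipy.optimize import brentq

β, σ, γ = 0.9, 2, 2
π = np.array([0.5, 0.5])          # assumed two-state probabilities
g = np.array([0.1, 0.2])          # assumed expenditure levels


def c_of_τ(τ, gs):
    # Labor FOC with a unit wage: (1 - τ) c**(-σ) = (c + gs)**γ
    return brentq(lambda c: (1 - τ) * c**(-σ) - (c + gs)**γ, 1e-6, 10.0)


def EX(τ):
    # Mean effective deficit E 𝒳_τ = Σ_s u_c(s) [g(s) - τ n(s)] π(s)
    c = np.array([c_of_τ(τ, gs) for gs in g])
    uc, n = c**(-σ), c + g
    return (uc * (g - τ * n)) @ π


def τ_of_B(B):
    # Solve ℬ = -β/(1-β) · E 𝒳_τ for τ
    return brentq(lambda τ: B + β / (1 - β) * EX(τ), -0.5, 0.6)


τ_star = τ_of_B(1.0)
```

Raising ℬ requires a larger mean effective surplus, so 𝜏(ℬ) is increasing over this range.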
Step 4: With a Python program that computes 𝜏 (ℬ) in hand, next we write a Python func-
tion to compute the random variable
Step 5: Now that we have a machine to compute the random variable 𝐽 (ℬ)(𝑠), 𝑠 = 1, … , 𝑆,
via a composition of Python functions, we can use the population variance function that we
defined in the code above to construct a function var(𝐽 (ℬ))
We put var(𝐽 (ℬ)) into a function minimizer and compute
Step 6: Next we take the minimizer ℬ∗ and the Python functions for computing means and
variances and compute
rate = 1 / (1 + 𝛽²var(ℛ𝜏(ℬ∗)))
(ℬ∗ , rate)
𝑑𝑖𝑣 = 𝛽𝐸𝑢𝑐,𝑡+1
and then compute the mean of the par value of government debt in the AMSS model
𝑏̂ = ℬ∗ / 𝑑𝑖𝑣
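These last two computations can be sketched together; the inputs below (ℬ∗ and the state-by-state ℛ and 𝑢𝑐 vectors) are illustrative numbers under an assumed two-state 𝜋, not the lecture's computed values.

```python
import numpy as np

β = 0.9
π = np.array([0.5, 0.5])                  # assumed two-state probabilities


def variance(x):
    return x**2 @ π - (x @ π)**2


def rate_and_bhat(B_star, R, uc):
    rate = 1 / (1 + β**2 * variance(R))   # reversion factor, as in Eq. (5)
    div = β * (uc @ π)                    # div = β E u_{c,t+1}
    return rate, B_star / div             # b̂ = ℬ*/div

rate, bhat = rate_and_bhat(-1.2, np.array([1.10, 1.12]), np.array([1.0, 1.2]))
```

Because var(ℛ) is small, the reversion factor is close to 1, which is why convergence in the simulations above is slow.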
In the two-Markov-state AMSS economy in Fiscal Insurance via Fluctuating Interest Rates,
𝐸𝑡 𝑢𝑐,𝑡+1 = 𝐸𝑢𝑐,𝑡+1 in the ergodic distribution and we have confirmed that this formula very
accurately describes a constant par value of government debt that
In the three-Markov-state economy of this lecture, the par value of government debt fluctu-
ates in a history-dependent way even asymptotically
In this economy, 𝑏̂ given by the above formula approximates the mean of the ergodic distribu-
tion of the par value of government debt
• this is the red vertical line plotted in the histogram of the last 100,000 obser-
vations of our simulation of the par value of government debt plotted above
• the approximation is fairly accurate but not perfect
• so while the approximation circumvents the chicken and egg problem sur-
rounding the much better approximation associated with the green vertical
line, it does so by enlarging the approximation error
82.5.7 Execution
Step 2
82.5.9 Code
In [14]: u.π
In [15]: s = 0
R, X = compute_R_X(τ, u, s)
In [16]: R
In [17]: mean(R, s)
Out[17]: 1.1111111111111112
In [18]: X
In [19]: mean(X, s)
Out[19]: 0.19134248445303795
In [20]: X @ u.π
Step 3
In [22]: s = 0
B = 1.0
Out[22]: 0.2740159773695818
In [24]: min_J(B, u, s)
Out[24]: 0.035564405653720765
Step 6
Out[25]: -1.199482032053344
In [27]: div = u.β * (u.Uc(c[0], n[0]) * u.π[s, 0] + u.Uc(c[1], n[1]) * u.π[s, 1]
   ...:              + u.Uc(c[2], n[2]) * u.π[s, 2])
Out[28]: -1.057765110954647
Out[29]: 0.09572926599432369
Out[31]: 0.9931353429089931
83

Competitive Equilibria of Chang Model

83.1 Contents
• Overview 83.2
• Setting 83.3
• Analysis 83.6
83.2 Overview
This lecture describes how Chang [25] analyzed competitive equilibria and a best competi-
tive equilibrium called a Ramsey plan
He did this by
1416 83. COMPETITIVE EQUILIBRIA OF CHANG MODEL
Roberto Chang [25] chose a model of Calvo [21] as a simple structure that conveys ideas that
apply more broadly
A textbook version of Chang’s model appears in chapter 25 of [87]
This lecture and Credible Government Policies in Chang Model can be viewed as more so-
phisticated and complete treatments of the topics discussed in Ramsey plans, time inconsis-
tency, sustainable plans
Both this lecture and Credible Government Policies in Chang Model make extensive use of an
idea to which we apply the nickname dynamic programming squared
In dynamic programming squared problems there are typically two interrelated Bellman equa-
tions
• A Bellman equation for a set of agents or followers with value or value function 𝑣𝑎
• A Bellman equation for a principal or Ramsey planner or Stackelberg leader with value
or value function 𝑣𝑝 in which 𝑣𝑎 appears as an argument
We encountered problems with this structure in dynamic Stackelberg problems, optimal taxa-
tion with state-contingent debt, and other lectures
An infinitely lived representative agent and an infinitely lived government exist at dates 𝑡 =
0, 1, …
The objects in play are
A benevolent government chooses sequences (𝑀⃗ , ℎ,⃗ 𝑥)⃗ subject to a sequence of budget con-
straints and other constraints imposed by competitive equilibrium
Given tax collection and price of money sequences, a representative household chooses sequences (𝑐⃗, 𝑚⃗) of consumption and real balances
In competitive equilibrium, the price of money sequence 𝑞 ⃗ clears markets, thereby reconciling
decisions of the government and the representative household
Chang adopts a version of a model that Calvo [21] designed to exhibit time-inconsistency of a Ramsey policy in a simple and transparent setting
By influencing the representative household’s expectations, government actions at time 𝑡 af-
fect components of household utilities for periods 𝑠 before 𝑡
When setting a path for monetary expansion rates, the government takes into account how
the household’s anticipations of the government’s future actions affect the household’s current
decisions
The ultimate source of time inconsistency is that a time 0 Ramsey planner takes these effects
into account in designing a plan of government actions for 𝑡 ≥ 0
83.3 Setting
A representative household faces a nonnegative value of money sequence 𝑞 ⃗ and sequences 𝑦,⃗ 𝑥⃗
of income and total tax collections, respectively
The household chooses nonnegative sequences 𝑐,⃗ 𝑀⃗ of consumption and nominal balances,
respectively, to maximize
∑_{𝑡=0}^{∞} 𝛽^𝑡 [𝑢(𝑐𝑡) + 𝑣(𝑞𝑡𝑀𝑡)] (1)
subject to
𝑞𝑡 𝑀𝑡 ≤ 𝑦𝑡 + 𝑞𝑡 𝑀𝑡−1 − 𝑐𝑡 − 𝑥𝑡 (2)
and
𝑞𝑡 𝑀𝑡 ≤ 𝑚̄ (3)
Here 𝑞𝑡 is the reciprocal of the price level at 𝑡, which we can also call the value of money
Chang [25] assumes that
83.3.2 Government
The government chooses a sequence of inverse money growth rates with time 𝑡 component ℎ𝑡 ≡ 𝑀𝑡−1/𝑀𝑡 ∈ Π ≡ [π̲, π̄], where 0 < π̲ < 1 < 1/𝛽 ≤ π̄
−𝑥𝑡 = 𝑚𝑡 (1 − ℎ𝑡 ) (4)
The restrictions 𝑚𝑡 ∈ [0, 𝑚̄] and ℎ𝑡 ∈ Π evidently imply that 𝑥𝑡 ∈ 𝑋 ≡ [(π̲ − 1)𝑚̄, (π̄ − 1)𝑚̄]

We define the set 𝐸 ≡ [0, 𝑚̄] × Π × 𝑋, so that we require that (𝑚, ℎ, 𝑥) ∈ 𝐸
To represent the idea that taxes are distorting, Chang makes the following assumption about
outcomes for per capita output:
𝑦𝑡 = 𝑓(𝑥𝑡 ), (5)
where 𝑓 ∶ R → R satisfies 𝑓(𝑥) > 0, is twice continuously differentiable, 𝑓 ″ (𝑥) < 0, and
𝑓(𝑥) = 𝑓(−𝑥) for all 𝑥 ∈ R, so that subsidies and taxes are equally distorting
Calvo’s and Chang’s purpose is not to model the causes of tax distortions in any detail but
simply to summarize the outcome of those distortions via the function 𝑓(𝑥)
A key part of the specification is that tax distortions are increasing in the absolute value of
tax revenues
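To make the assumptions on 𝑓 concrete, here is a minimal quadratic example (the constant 𝐴 and the functional form are illustrative assumptions, not Chang's exact specification):

```python
import numpy as np

A = 180.0  # illustrative constant, large enough that f(x) > 0 nearby

def f(x):
    # symmetric (f(x) = f(-x)) and strictly concave (f'' < 0), so
    # taxes (x > 0) and subsidies (x < 0) are equally distorting
    return A - x**2

x = np.linspace(-3, 3, 7)
assert np.allclose(f(x), f(-x))      # symmetry
assert np.all(np.diff(f(x), 2) < 0)  # concavity on the grid
```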
Ramsey plan: A Ramsey plan is a competitive equilibrium that maximizes Eq. (1)
Within-period timing of decisions is as follows:
This within-period timing confronts the government with choices framed by how the private
sector wants to respond when the government takes time 𝑡 actions that differ from what the
private sector had expected
This consideration will be important when we study credible government policies in lecture Credible Government Policies in Chang Model
The model is designed to focus on the intertemporal trade-offs between the welfare benefits
of deflation and the welfare costs associated with the high tax collections required to retire
money at a rate that delivers deflation
A benevolent time 0 government can promote utility generating increases in real balances
only by imposing sufficiently large distorting tax collections
To promote the welfare increasing effects of high real balances, the government wants to in-
duce gradual deflation
83.4. COMPETITIVE EQUILIBRIUM 1419
ℒ = max_{𝑐⃗, 𝑀⃗} min_{𝜆⃗, 𝜇⃗} ∑_{𝑡=0}^{∞} 𝛽^𝑡 {𝑢(𝑐𝑡) + 𝑣(𝑀𝑡𝑞𝑡) + 𝜆𝑡[𝑦𝑡 − 𝑐𝑡 − 𝑥𝑡 + 𝑞𝑡𝑀𝑡−1 − 𝑞𝑡𝑀𝑡] + 𝜇𝑡[𝑚̄ − 𝑞𝑡𝑀𝑡]}

First-order conditions with respect to 𝑐𝑡 and 𝑀𝑡 are

𝑢′(𝑐𝑡) = 𝜆𝑡

𝑞𝑡[𝑢′(𝑐𝑡) − 𝑣′(𝑀𝑡𝑞𝑡)] ≤ 𝛽𝑢′(𝑐𝑡+1)𝑞𝑡+1, with equality if 𝑀𝑡𝑞𝑡 < 𝑚̄
This is real money balances at time 𝑡 + 1 measured in units of marginal utility, which Chang
refers to as ‘the marginal utility of real balances’
From the standpoint of the household at time 𝑡, equation Eq. (7) shows that 𝜃𝑡+1 intermediates the influences of (𝑥⃗𝑡+1, 𝑚⃗𝑡+1) on the household's choice of real balances 𝑚𝑡

By “intermediates” we mean that the future paths (𝑥⃗𝑡+1, 𝑚⃗𝑡+1) influence 𝑚𝑡 entirely through their effects on the scalar 𝜃𝑡+1
The observation that the one dimensional promised marginal utility of real balances 𝜃𝑡+1
functions in this way is an important step in constructing a class of competitive equilibria
that have a recursive representation
A closely related observation pervaded the analysis of Stackelberg plans in lecture dynamic
Stackelberg problems
Definition: A government policy is a pair of sequences (ℎ⃗, 𝑥⃗) where ℎ𝑡 ∈ Π for all 𝑡 ≥ 0

Definition: A price system is a nonnegative value of money sequence 𝑞⃗
Given 𝑀−1, a government policy (ℎ⃗, 𝑥⃗), price system 𝑞⃗, and allocation (𝑐⃗, 𝑚⃗, 𝑦⃗) are said to be a competitive equilibrium if
• 𝑚𝑡 = 𝑞𝑡 𝑀𝑡 and 𝑦𝑡 = 𝑓(𝑥𝑡 )
• The government budget constraint is satisfied
• Given 𝑞⃗, 𝑥⃗, 𝑦⃗, (𝑐⃗, 𝑚⃗) solves the household’s problem
• Let Ω denote the set of initial promised marginal utilities of money 𝜃0 associated with
competitive equilibria
• Chang exploits the fact that a competitive equilibrium consists of a first period outcome
(ℎ0 , 𝑚0 , 𝑥0 ) and a continuation competitive equilibrium with marginal utility of money
𝜃1 ∈ Ω
ℎ𝑡 = ℎ(𝜃𝑡 )
𝑚𝑡 = 𝑚(𝜃𝑡 )
(8)
𝑥𝑡 = 𝑥(𝜃𝑡 )
𝜃𝑡+1 = Ψ(𝜃𝑡 )
starting from 𝜃0
The range and domain of Ψ(⋅) are both Ω
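The recursive structure in Eq. (8) can be simulated directly once the four functions are in hand. The policy functions and transition map below are hypothetical stand-ins chosen only to illustrate the iteration, not the lecture's solved objects:

```python
# Hypothetical policy functions and transition map, standing in for the
# objects (h, m, x, Ψ) that are actually solved for later in the lecture
def Ψ(θ):                # a contraction with interior fixed point θ* = 0.5
    return 0.5 + 0.8 * (θ - 0.5)

def h(θ): return 1.0 + 0.1 * (θ - 0.5)
def m(θ): return 2.0 + θ
def x(θ): return m(θ) * (h(θ) - 1)     # consistent with Eq. (4)

θ = 0.9                                 # initial promise θ0
h_path, m_path, x_path = [], [], []
for t in range(50):
    h_path.append(h(θ))
    m_path.append(m(θ))
    x_path.append(x(θ))
    θ = Ψ(θ)                            # θ_{t+1} = Ψ(θ_t)
```

Because this illustrative Ψ is a contraction, the simulated θ path converges to its fixed point.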
• Imagine that after a ‘revolution’ at time 𝑡 ≥ 1, a new Ramsey planner is given the op-
portunity to ignore history and solve a brand new Ramsey plan
• This new planner would want to reset the 𝜃𝑡 associated with the original Ramsey plan
to 𝜃0
• The incentive to reinitialize 𝜃𝑡 associated with this revolution experiment indicates the
time-inconsistency of the Ramsey plan
• By resetting 𝜃 to 𝜃0 , the new planner avoids the costs at time 𝑡 that the original Ram-
sey planner must pay to reap the beneficial effects that the original Ramsey plan for
𝑠 ≥ 𝑡 had achieved via its influence on the household’s decisions for 𝑠 = 0, … , 𝑡 − 1
83.6 Analysis
A competitive equilibrium is a triple of sequences (𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐸 ∞ that satisfies Eq. (2),
Eq. (3), and Eq. (6)
Chang works with a set of competitive equilibria defined as follows
Definition: 𝐶𝐸 = {(𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐸 ∞ such that Eq. (2), Eq. (3), and Eq. (6) are satisfied }
𝐶𝐸 is not empty because there exists a competitive equilibrium with ℎ𝑡 = 1 for all 𝑡 ≥ 1,
namely, an equilibrium with a constant money supply and constant price level
Chang establishes that 𝐶𝐸 is also compact
Chang makes the following key observation that combines ideas of Abreu, Pearce, and Stac-
chetti [2] with insights of Kydland and Prescott [81]
Proposition: The continuation of a competitive equilibrium is a competitive equilibrium
That is, (𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐶𝐸 implies that (𝑚⃗ 𝑡 , 𝑥𝑡⃗ , ℎ⃗ 𝑡 ) ∈ 𝐶𝐸 ∀ 𝑡 ≥ 1
(Lecture dynamic Stackelberg problems also used a version of this insight)
We can now state that a Ramsey problem is to

max_{(𝑚⃗, 𝑥⃗, ℎ⃗) ∈ 𝐸^∞} ∑_{𝑡=0}^{∞} 𝛽^𝑡 [𝑢(𝑐𝑡) + 𝑣(𝑚𝑡)]

subject to Eq. (2), Eq. (3), and Eq. (6)
Ω = {𝜃 ∈ R such that 𝜃 = 𝑢′ (𝑓(𝑥0 ))(𝑚0 + 𝑥0 ) for some (𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐶𝐸}
Equation Eq. (6) inherits from the household’s Euler equation for money holdings the prop-
erty that the value of 𝑚0 consistent with the representative household’s choices depends on
(ℎ⃗ 1 , 𝑚⃗ 1 )
This dependence is captured in the definition above by making Ω be the set of first period
values of 𝜃0 satisfying 𝜃0 = 𝑢′ (𝑓(𝑥0 ))(𝑚0 + 𝑥0 ) for first period component (𝑚0 , ℎ0 ) of compet-
itive equilibrium sequences (𝑚,⃗ 𝑥,⃗ ℎ)⃗
𝑤(𝜃) = max_{(𝑚⃗, 𝑥⃗, ℎ⃗) ∈ Γ(𝜃)} ∑_{𝑡=0}^{∞} 𝛽^𝑡 [𝑢(𝑓(𝑥𝑡)) + 𝑣(𝑚𝑡)]
and

𝜃 = 𝑢′(𝑓(𝑥))(𝑚 + 𝑥) (11)

and

−𝑥 = 𝑚(1 − ℎ) (12)

and

𝑚[𝑢′(𝑓(𝑥)) − 𝑣′(𝑚)] ≤ 𝛽𝜃′, with equality if 𝑚 < 𝑚̄ (13)
Before we use this proposition to recover a recursive representation of the Ramsey plan, note
that the proposition relies on knowing the set Ω
To find Ω, Chang uses the insights of Kydland and Prescott [81] together with a method
based on the Abreu, Pearce, and Stacchetti [2] iteration to convergence on an operator 𝐵 that
maps continuation values into values
We want an operator that maps a continuation 𝜃 into a current 𝜃
Define 𝐵(𝑄) = {𝜃 ∈ R ∶ there exist (𝑚, 𝑥, ℎ) ∈ 𝐸 and 𝜃′ ∈ 𝑄 such that Eq. (11), Eq. (12), and Eq. (13) hold}

Thus, 𝐵(𝑄) is the set of first period 𝜃’s attainable with (𝑚, 𝑥, ℎ) ∈ 𝐸 and some 𝜃′ ∈ 𝑄
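A grid-based sketch of one application of 𝐵 might look as follows. The grids, parameter values, and the functional forms for 𝑢′, 𝑣′, and 𝑓 are all illustrative assumptions (𝑣′ here comes from a quadratic 𝑣, not the lecture's specification):

```python
import numpy as np

β, mbar = 0.9, 30.0
u_p = lambda c: 1 / c              # u(c) = log(c), illustrative
v_p = lambda m: (mbar - m) / 500   # from a quadratic v, illustrative
f = lambda x: 180 - x**2           # satisfies the stated assumptions on f

def B(Q, m_grid, h_grid, tol=1e-3):
    """One application of B: current θ's attainable with some (m, x, h)
    in E and a continuation θ' in Q, per Eqs. (11)-(13)."""
    Q = np.asarray(Q)
    out = []
    for m in m_grid:
        for h in h_grid:
            x = m * (h - 1)              # Eq. (12)
            c = f(x)
            if c <= 0:
                continue
            lhs = m * (u_p(c) - v_p(m))  # left side of Eq. (13)
            if m < mbar:                 # equality required when m < mbar
                ok = np.any(np.abs(lhs - β * Q) < tol)
            else:                        # inequality allowed at m = mbar
                ok = np.any(lhs <= β * Q + tol)
            if ok:
                out.append(u_p(c) * (m + x))   # Eq. (11)
    return np.unique(np.round(out, 6))

m_grid = np.linspace(1.0, mbar, 30)
h_grid = np.linspace(0.9, 1.1, 21)
Q = np.linspace(-0.1, 0.2, 200)
θ_new = B(Q, m_grid, h_grid)
```

Iterating `B` on nested candidate sets until they stop shrinking is the set-valued analogue of value function iteration.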
Proposition: (i) If 𝑄 ⊂ 𝐵(𝑄), then 𝐵(𝑄) ⊂ Ω (‘self-generation’); (ii) Ω = 𝐵(Ω) (‘factorization’)
Let ℎ⃗^𝑡 = (ℎ0, ℎ1, …, ℎ𝑡) denote a history of inverse money creation rates with time 𝑡 component ℎ𝑡 ∈ Π

A government strategy 𝜎 = {𝜎𝑡}_{𝑡=0}^{∞} is a 𝜎0 ∈ Π and for 𝑡 ≥ 1 a sequence of functions 𝜎𝑡 ∶ Π^{𝑡−1} → Π
Chang restricts the government’s choice of strategies to the following space:
𝐶𝐸𝜋 = {ℎ⃗ ∈ Π∞ ∶ there is some (𝑚,⃗ 𝑥)⃗ such that (𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐶𝐸}
In words, 𝐶𝐸𝜋 is the set of money growth sequences consistent with the existence of competi-
tive equilibria
Chang observes that 𝐶𝐸𝜋 is nonempty and compact
Definition: 𝜎 is said to be admissible if for all 𝑡 ≥ 1 and after any history ℎ⃗ 𝑡−1 , the continua-
tion ℎ⃗ 𝑡 implied by 𝜎 belongs to 𝐶𝐸𝜋
Admissibility of 𝜎 means that anticipated policy choices associated with 𝜎 are consistent with
the existence of competitive equilibria after each possible subsequent history
After any history ℎ⃗^{𝑡−1}, admissibility restricts the government’s choice in period 𝑡 to the set

𝐶𝐸𝜋0 = {ℎ ∈ Π ∶ there is ℎ⃗ ∈ 𝐶𝐸𝜋 with ℎ0 = ℎ}

In words, 𝐶𝐸𝜋0 is the set of all first period money growth rates ℎ = ℎ0, each of which is consistent with the existence of a sequence of money growth rates ℎ⃗ starting from ℎ0 in the initial period and for which a competitive equilibrium exists
Remark: 𝐶𝐸𝜋0 = {ℎ ∈ Π ∶ there is (𝑚, 𝜃′) ∈ [0, 𝑚̄] × Ω such that 𝑚[𝑢′(𝑓((ℎ − 1)𝑚)) − 𝑣′(𝑚)] ≤ 𝛽𝜃′ with equality if 𝑚 < 𝑚̄}
Definition: An allocation rule is a sequence of functions 𝛼⃗ = {𝛼𝑡}_{𝑡=0}^{∞} such that 𝛼𝑡 ∶ Π^𝑡 → [0, 𝑚̄] × 𝑋
Thus, the time 𝑡 component of 𝛼⃗ is 𝛼𝑡(ℎ⃗^𝑡), a pair of functions (𝑚𝑡(ℎ⃗^𝑡), 𝑥𝑡(ℎ⃗^𝑡))
Definition: Given an admissible government strategy 𝜎, an allocation rule 𝛼 is called com-
petitive if given any history ℎ⃗ 𝑡−1 and ℎ𝑡 ∈ 𝐶𝐸𝜋0 , the continuations of 𝜎 and 𝛼 after (ℎ⃗ 𝑡−1 , ℎ𝑡 )
induce a competitive equilibrium sequence
At this point it is convenient to introduce another operator that can be used to compute a
Ramsey plan
For computing a Ramsey plan, this operator is wasteful because it works with a state vector
that is bigger than necessary
We introduce this operator because it helps to prepare the way for Chang’s operator called 𝐷̃(𝑍) that we shall describe in lecture credible government policies

It is also useful because a fixed point of the operator to be defined here provides a good guess for an initial set from which to initiate iterations on Chang’s set-to-set operator 𝐷̃(𝑍) to be described in lecture credible government policies
Let 𝑆 be the set of all pairs (𝑤, 𝜃) of competitive equilibrium values and associated initial
marginal utilities
Let 𝑊 be a bounded set of values in R
Let 𝑍 be a nonempty subset of 𝑊 × Ω
Think of using pairs (𝑤′ , 𝜃′ ) drawn from 𝑍 as candidate continuation value, 𝜃 pairs
Define the operator 𝐷(𝑍), whose full definition is displayed below
It is possible to establish
Proposition: If 𝑍 ⊂ 𝐷(𝑍), then 𝐷(𝑍) ⊂ 𝑆 (‘self-generation’)

Proposition: 𝑆 = 𝐷(𝑆) (‘factorization’)
It can be shown that 𝑆 is compact and that therefore there exists a (𝑤, 𝜃) pair within this set
that attains the highest possible value 𝑤
This (𝑤, 𝜃) pair is associated with a Ramsey plan
Further, we can compute 𝑆 by iterating to convergence on 𝐷 provided that one begins with a
sufficiently large initial set 𝑆0
As a very useful by-product, the algorithm that finds the largest fixed point 𝑆 = 𝐷(𝑆) also
produces the Ramsey plan, its value 𝑤, and the associated competitive equilibrium
𝐷(𝑍) = {(𝑤, 𝜃) ∶ ∃ ℎ ∈ 𝐶𝐸𝜋0 and (𝑚(ℎ), 𝑥(ℎ), 𝑤′(ℎ), 𝜃′(ℎ)) ∈ [0, 𝑚̄] × 𝑋 × 𝑍

such that

𝑤 = 𝑢(𝑓(𝑥(ℎ))) + 𝑣(𝑚(ℎ)) + 𝛽𝑤′(ℎ)

𝜃 = 𝑢′(𝑓(𝑥(ℎ)))(𝑚(ℎ) + 𝑥(ℎ))

𝑥(ℎ) = 𝑚(ℎ)(ℎ − 1)

𝑚(ℎ)[𝑢′(𝑓(𝑥(ℎ))) − 𝑣′(𝑚(ℎ))] ≤ 𝛽𝜃′(ℎ), with equality if 𝑚(ℎ) < 𝑚̄}
We noted that the set 𝑆 can be found by iterating to convergence on 𝐷, provided that we
start with a sufficiently large initial set 𝑆0
A key feature of this algorithm is that we discretize the action space, i.e., we create a grid of possible values for 𝑚 and ℎ (note that 𝑥 is implied by 𝑚 and ℎ). This discretization simplifies computation of 𝑆̃ by allowing us to find it by solving a sequence of linear programs
The outer hyperplane approximation algorithm proceeds as follows:
• Solve a linear program (described below) for each action in the action space
• Find the maximum and update the corresponding hyperplane level, 𝐶𝑖,𝑡+1
max_{(𝑤′, 𝜃′)} ℎ𝑖 ⋅ (𝑤, 𝜃)

subject to

𝐻 ⋅ (𝑤′, 𝜃′) ≤ 𝐶𝑡

𝜃 = 𝑢′(𝑓(𝑥𝑗))(𝑚𝑗 + 𝑥𝑗)

𝑥𝑗 = 𝑚𝑗(ℎ𝑗 − 1)

83.7. CALCULATING ALL PROMISE-VALUE PAIRS IN CE 1427
This problem maximizes the hyperplane level for a given set of actions
The second part of Step 2 then finds the maximum possible hyperplane level across the action
space
The algorithm constructs a sequence of progressively smaller sets 𝑆𝑡+1 ⊂ 𝑆𝑡 ⊂ 𝑆𝑡−1 ⋯ ⊂ 𝑆0
Step 3 ends the algorithm when the difference between these sets is small enough
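A single one of these linear programs can be sketched with `scipy.optimize.linprog`. The directions 𝐻, the levels 𝐶𝑡, and the action-specific numbers `u_val` and `θ_j` are illustrative assumptions, not quantities taken from the class below:

```python
import numpy as np
from scipy.optimize import linprog

# Hyperplane directions h_i on the unit circle and current levels C_t
N = 8
angles = np.linspace(0, 2 * np.pi, N, endpoint=False)
H = np.column_stack((np.cos(angles), np.sin(angles)))
C_t = np.ones(N)          # illustrative outer-approximation levels

β = 0.9
u_val, θ_j = 0.5, 0.1     # hypothetical u(f(x_j)) + v(m_j) and implied θ

# For direction h_i, choose (w', θ') with H @ (w', θ') <= C_t to maximize
# h_i . (w, θ), where w = u_val + β w' and θ = θ_j is fixed by the action
h_i = H[0]                # direction (1, 0): maximize w
res = linprog(c=np.array([-h_i[0] * β, 0.0]),   # linprog minimizes
              A_ub=H, b_ub=C_t,
              bounds=[(None, None), (None, None)])
w = u_val + β * res.x[0]
level = h_i @ np.array([w, θ_j])   # candidate updated level C_{i,t+1}
```

Repeating this for every action and taking the maximum over actions gives the updated level for direction ℎ𝑖.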
We have created a Python class that solves the model assuming the following functional
forms:
𝑢(𝑐) = log(𝑐)

𝑣(𝑚) = (1/500)(𝑚𝑚̄ − 0.5𝑚²)^{0.5}
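In Python these functional forms can be written as follows (the value of 𝑚̄ is an illustrative assumption; the class below takes it as a parameter):

```python
import numpy as np

mbar = 30.0   # m̄, an illustrative value

def u(c):
    return np.log(c)

def v(m):
    # v(m) = (1/500) * (mbar*m - 0.5*m**2)**0.5
    return (1 / 500) * (mbar * m - 0.5 * m**2)**0.5

m = np.linspace(0.1, mbar, 50)
assert np.all(np.diff(v(m)) > 0)   # v is increasing on (0, mbar)
```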
In [2]: """
Author: Sebastian Graves
"""

import numpy as np
import quantecon as qe
import time
import polytope
import matplotlib.pyplot as plt
import numpy.polynomial.chebyshev as cheb
from scipy.optimize import linprog, minimize
class ChangModel:
"""
Class to solve for the competitive and sustainable sets in the Chang (1998)
model, for different parameterizations.
"""
w_space = np.array([min(w_vec[~np.isinf(w_vec)]),
max(w_vec[~np.isinf(w_vec)])])
p_space = np.array([0, max(p_vec[~np.isinf(w_vec)])])
self.p_space = p_space
# Points on circle
H = np.zeros((N, 2))
for i in range(N):
x = degrees[i]
H[i, 0] = np.cos(x)
H[i, 1] = np.sin(x)
return C, H, Z
def solve_worst_spe(self):
"""
Method to solve for BR(Z). See p.449 of Chang (1998)
"""
# Pre-compute constraints
aineq_mbar = np.vstack((self.H, np.array([0, -self.β])))
bineq_mbar = np.vstack((self.c0_s, 0))
aineq = self.H
bineq = self.c0_s
aeq = [[0, -self.β]]
for j in range(self.N_a):
# Only try if consumption is possible
if self.f_vec[j] > 0:
# If m = mbar, use inequality constraint
if self.A[j, 1] == self.mbar:
bineq_mbar[-1] = self.euler_vec[j]
res = linprog(c, A_ub=aineq_mbar, b_ub=bineq_mbar,
bounds=(self.w_bnds_s, self.p_bnds_s))
else:
beq = self.euler_vec[j]
res = linprog(c, A_ub=aineq, b_ub=bineq, A_eq=aeq, b_eq=beq,
bounds=(self.w_bnds_s, self.p_bnds_s))
if res.status == 0:
p_vec[j] = self.u_vec[j] + self.β * res.x[0]
# Max over h and min over other variables (see Chang (1998) p.449)
self.br_z = np.nanmax(np.nanmin(p_vec.reshape(self.n_m, self.n_h), 0))
def solve_subgradient(self):
"""
Method to solve for E(Z). See p.449 of Chang (1998)
"""
# Pre-compute constraints
aineq_C_mbar = np.vstack((self.H, np.array([0, -self.β])))
bineq_C_mbar = np.vstack((self.c0_c, 0))
aineq_C = self.H
bineq_C = self.c0_c
aeq_C = [[0, -self.β]]
for j in range(self.N_a):
# Only try if consumption is possible
if self.f_vec[j] > 0:
# COMPETITIVE EQUILIBRIA
# If m = mbar, use inequality constraint
if self.A[j, 1] == self.mbar:
bineq_C_mbar[-1] = self.euler_vec[j]
res = linprog(c, A_ub=aineq_C_mbar, b_ub=bineq_C_mbar,
bounds=(self.w_bnds_c, self.p_bnds_c))
# If m < mbar, use equality constraint
else:
beq_C = self.euler_vec[j]
res = linprog(c, A_ub=aineq_C, b_ub=bineq_C, A_eq = aeq_C,
b_eq = beq_C, bounds=(self.w_bnds_c, self.p_bnds_c))
if res.status == 0:
c_a1a2_c[j] = self.H[i, 0]*(self.u_vec[j] + self.β * res.x[0]) + self.H[i, 1] * res.x[1]
t_a1a2_c[j] = res.x
# SUSTAINABLE EQUILIBRIA
# If m = mbar, use inequality constraint
if self.A[j, 1] == self.mbar:
bineq_S_mbar[-2] = self.euler_vec[j]
bineq_S_mbar[-1] = self.u_vec[j] - self.br_z
res = linprog(c, A_ub=aineq_S_mbar, b_ub=bineq_S_mbar,
bounds=(self.w_bnds_s, self.p_bnds_s))
# If m < mbar, use equality constraint
else:
bineq_S[-1] = self.u_vec[j] - self.br_z
beq_S = self.euler_vec[j]
res = linprog(c, A_ub=aineq_S, b_ub=bineq_S, A_eq = aeq_S,
b_eq = beq_S, bounds=(self.w_bnds_s, self.p_bnds_s))
if res.status == 0:
c_a1a2_s[j] = self.H[i, 0] * (self.u_vec[j] + self.β*res.x[0]) + self.H[i, 1] * res.x[1]
t_a1a2_s[j] = res.x
for i in range(self.N_g):
self.c1_c[i] = np.dot(self.z1_c[:, i], self.H[i, :])
self.c1_s[i] = np.dot(self.z1_s[:, i], self.H[i, :])
t = time.time()
diff = tol + 1
iters = 0
# Save iteration
self.c_dic_c[iters], self.c_dic_s[iters] = np.copy(self.c1_c), np.copy(self.c1_s)
self.iters = iters
elapsed = time.time() - t
print('Convergence achieved after {} iterations and {} seconds'.format(iters, round(elapsed, 2)))
c = np.zeros(Φ.shape[0])
def p_fun2(x):
scale = -1 + 2*(x[1] - θ_min)/(θ_max - θ_min)
p_fun = - (u(x[0],mbar) + self.β * np.dot(cheb.chebvander(scale, order - 1), c))
return p_fun
cons1 = ({'type': 'eq', 'fun': lambda x: uc_p(f(x[0], x[1])) * x[1] * (x[0] - 1) + v_p(x[1])
{'type': 'eq', 'fun': lambda x: uc_p(f(x[0], x[1])) * x[0] * x[1] - θ})
cons2 = ({'type': 'ineq', 'fun': lambda x: uc_p(f(x[0], mbar)) * mbar * (x[0] - 1) + v_p(mbar)
{'type': 'eq', 'fun': lambda x: uc_p(f(x[0], mbar)) * x[0] * mbar - θ})
# Bellman Iterations
diff = 1
iters = 1
self.θ_grid = s
self.p_iter = p_iter1
self.Φ = Φ
self.c = c
print('Convergence achieved after {} iterations'.format(iters))
# Check residuals
θ_grid_fine = np.linspace(θ_min, θ_max, 100)
resid_grid = np.zeros(100)
p_grid = np.zeros(100)
θ_prime_grid = np.zeros(100)
m_grid = np.zeros(100)
h_grid = np.zeros(100)
for i in range(100):
θ = θ_grid_fine[i]
res = minimize(p_fun,
lb1 + (ub1-lb1) / 2,
method='SLSQP',
bounds=bnds1,
constraints=cons1,
tol=1e-10)
if res.success == True:
p = -p_fun(res.x)
p_grid[i] = p
θ_prime_grid[i] = res.x[2]
h_grid[i] = res.x[0]
m_grid[i] = res.x[1]
res = minimize(p_fun2,
lb2 + (ub2-lb2)/2,
method='SLSQP',
bounds=bnds2,
constraints=cons2,
tol=1e-10)
if -p_fun2(res.x) > p and res.success == True:
p = -p_fun2(res.x)
p_grid[i] = p
θ_prime_grid[i] = res.x[1]
h_grid[i] = res.x[0]
m_grid[i] = self.mbar
scale = -1 + 2 * (θ - θ_min)/(θ_max - θ_min)
resid_grid[i] = np.dot(cheb.chebvander(scale, order-1), c) - p
self.resid_grid = resid_grid
self.θ_grid_fine = θ_grid_fine
self.θ_prime_grid = θ_prime_grid
self.m_grid = m_grid
self.h_grid = h_grid
self.p_grid = p_grid
self.x_grid = m_grid * (h_grid - 1)
# Simulate
θ_series = np.zeros(31)
m_series = np.zeros(30)
h_series = np.zeros(30)
# Find initial θ
def ValFun(x):
scale = -1 + 2*(x - θ_min)/(θ_max - θ_min)
p_fun = np.dot(cheb.chebvander(scale, order - 1), c)
return -p_fun
res = minimize(ValFun,
(θ_min + θ_max)/2,
bounds=[(θ_min, θ_max)])
θ_series[0] = res.x
# Simulate
for i in range(30):
θ = θ_series[i]
res = minimize(p_fun,
lb1 + (ub1-lb1)/2,
method='SLSQP',
bounds=bnds1,
constraints=cons1,
tol=1e-10)
if res.success == True:
p = -p_fun(res.x)
h_series[i] = res.x[0]
m_series[i] = res.x[1]
θ_series[i+1] = res.x[2]
res2 = minimize(p_fun2,
lb2 + (ub2-lb2)/2,
method='SLSQP',
bounds=bnds2,
constraints=cons2,
tol=1e-10)
if -p_fun2(res2.x) > p and res2.success == True:
h_series[i] = res2.x[0]
m_series[i] = self.mbar
θ_series[i+1] = res2.x[1]
self.θ_series = θ_series
self.m_series = m_series
self.h_series = h_series
self.x_series = m_series * (h_series - 1)
def plot_competitive(ChangModel):
"""
Method that only plots competitive equilibrium set
"""
poly_C = polytope.Polytope(ChangModel.H, ChangModel.c1_c)
ext_C = polytope.extreme(poly_C)
ax.set_xlabel('w', fontsize=16)
ax.set_ylabel(r"$\theta$", fontsize=18)
plt.tight_layout()
plt.show()
plot_competitive(ch1)
[0.00024]
[0.0002]
[0.00016]
[0.00013]
[0.0001]
[0.00008]
[0.00006]
[0.00005]
[0.00004]
[0.00003]
[0.00003]
[0.00002]
[0.00002]
[0.00001]
[0.00001]
[0.00001]
Convergence achieved after 40 iterations and 971.84 seconds
In [6]: plot_competitive(ch2)
In this section we solve the Bellman equation confronting a continuation Ramsey planner
The construction of a Ramsey plan is decomposed into two subproblems in Ramsey plans, time inconsistency, sustainable plans and dynamic Stackelberg problems
The continuation Ramsey planner’s Bellman equation is

𝑤(𝜃) = max_{𝑚, 𝑥, ℎ, 𝜃′} {𝑢(𝑓(𝑥)) + 𝑣(𝑚) + 𝛽𝑤(𝜃′)}

subject to:

𝜃 = 𝑢′(𝑓(𝑥))(𝑚 + 𝑥)

𝑥 = 𝑚(ℎ − 1)

(𝑚, 𝑥, ℎ) ∈ 𝐸

𝜃′ ∈ Ω
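A minimal grid-based sketch of one Bellman update follows. The class above instead uses Chebyshev collocation with SLSQP; the grids, tolerance, parameter values, and functional forms here are illustrative assumptions only:

```python
import numpy as np

β, mbar, A = 0.9, 30.0, 180.0          # illustrative parameters
u = lambda c: np.log(c)
v = lambda m: (1 / 500) * (mbar * m - 0.5 * m**2)**0.5
u_p = lambda c: 1 / c
v_p = lambda m: (mbar - m) / (1000 * (mbar * m - 0.5 * m**2)**0.5)
f = lambda x: A - x**2

θ_grid = np.linspace(0.01, 0.15, 30)   # stand-in for Ω
w = np.zeros_like(θ_grid)              # initial guess for w(θ)
m_grid = np.linspace(1.0, mbar, 60)
h_grid = np.linspace(0.9, 1.1, 60)

def bellman_update(w):
    w_new = np.full_like(θ_grid, -np.inf)
    for i, θ in enumerate(θ_grid):
        for m in m_grid:
            for h in h_grid:
                x = m * (h - 1)
                c = f(x)
                if c <= 0:
                    continue
                # promise keeping θ = u'(f(x))(m + x), up to grid tolerance
                if abs(u_p(c) * (m + x) - θ) > 5e-3:
                    continue
                # continuation promise from Eq. (13) at equality
                θp = np.clip(m * (u_p(c) - v_p(m)) / β,
                             θ_grid[0], θ_grid[-1])
                val = u(c) + v(m) + β * np.interp(θp, θ_grid, w)
                w_new[i] = max(w_new[i], val)
    return w_new

w1 = bellman_update(w)
```

Iterating `bellman_update` to convergence would approximate 𝑤(𝜃) on the grid; collocation, as in the class, achieves the same with far fewer evaluations.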
First, a quick check that our approximations of the value functions are good
We do this by calculating the residuals between iterates on the value function on a fine grid:
The value functions plotted below trace out the right edges of the sets of equilibrium values
plotted above
plt.show()
The next figure plots the optimal policy functions; values of 𝜃′ , 𝑚, 𝑥, ℎ for each value of the
state 𝜃:
plt.show()
83.8. SOLVING A CONTINUATION RAMSEY PLANNER’S BELLMAN EQUATION 1439
With the first set of parameter values, the value of 𝜃′ chosen by the Ramsey planner quickly
hits the upper limit of Ω
But with the second set of parameters it converges to a value in the interior of the set
Consequently, the choice of 𝜃 ̄ is clearly important with the first set of parameter values
One way of seeing this is plotting 𝜃′ (𝜃) for each set of parameters
With the first set of parameter values, this function does not intersect the 45-degree line until
𝜃,̄ whereas in the second set of parameter values, it intersects in the interior
axes[0].legend()
plt.show()
Subproblem 2 is equivalent to the planner choosing the initial value of 𝜃 (i.e. the value which
maximizes the value function)
From this starting point, we can then trace out the paths for {𝜃𝑡 , 𝑚𝑡 , ℎ𝑡 , 𝑥𝑡 }∞
𝑡=0 that support
this equilibrium
These are shown below for both sets of parameters
plt.show()
In Credible Government Policies in Chang Model we shall find a subset of competitive equi-
libria that are sustainable in the sense that a sequence of government administrations that
chooses sequentially, rather than once and for all at time 0 will choose to implement them
In the process of constructing them, we shall construct another, smaller set of competitive
equilibria
1442 83. COMPETITIVE EQUILIBRIA OF CHANG MODEL
84

Credible Government Policies in Chang Model
84.1 Contents
• Overview 84.2
• The Setting 84.3
• Calculating the Set of Sustainable Promise-Value Pairs 84.4
84.2 Overview
Some of the material in this lecture and competitive equilibria in the Chang model can be
viewed as more sophisticated and complete treatments of the topics discussed in Ramsey
plans, time inconsistency, sustainable plans
This lecture assumes almost the same economic environment analyzed in competitive equilib-
ria in the Chang model
The only change – and it is a substantial one – is the timing protocol for making government
decisions
In competitive equilibria in the Chang model, a Ramsey planner chose a comprehensive gov-
ernment policy once-and-for-all at time 0
Now in this lecture, there is no time 0 Ramsey planner
Instead there is a sequence of government decision-makers, one for each 𝑡
The time 𝑡 government decision-maker chooses time 𝑡 government actions after forecasting what future governments will do
We use the notion of a sustainable plan proposed in [26], also referred to as a credible public
policy in [124]
1444 84. CREDIBLE GOVERNMENT POLICIES IN CHANG MODEL
Technically, this lecture starts where lecture competitive equilibria in the Chang model on
Ramsey plans within the Chang [25] model stopped
That lecture presents recursive representations of competitive equilibria and a Ramsey plan for
a version of a model of Calvo [21] that Chang used to analyze and illustrate these concepts
We used two operators to characterize competitive equilibria and a Ramsey plan, respectively
In this lecture, we define a credible public policy or sustainable plan
Starting from a large enough initial set 𝑍0, we use iterations on Chang’s set-to-set operator 𝐷̃(𝑍) to compute a set of values associated with sustainable plans

Chang’s operator 𝐷̃(𝑍) is closely connected with the operator 𝐷(𝑍) introduced in lecture competitive equilibria in the Chang model

• 𝐷̃(𝑍) incorporates all of the restrictions imposed in constructing the operator 𝐷(𝑍), but …
– these additional restrictions incorporate the idea that a plan must be sustainable
– sustainable means that the government wants to implement it at all times after all
histories
We begin by reviewing the set up deployed in competitive equilibria in the Chang model
Chang’s model, adopted from Calvo, is designed to focus on the intertemporal trade-offs be-
tween the welfare benefits of deflation and the welfare costs associated with the high tax col-
lections required to retire money at a rate that delivers deflation
A benevolent time 0 government can promote utility generating increases in real balances
only by imposing an infinite sequence of sufficiently large distorting tax collections
To promote the welfare increasing effects of high real balances, the government wants to in-
duce gradual deflation
We start by reviewing notation
For a sequence of scalars 𝑧⃗ ≡ {𝑧𝑡}_{𝑡=0}^{∞}, let 𝑧⃗^𝑡 = (𝑧0, …, 𝑧𝑡) and 𝑧⃗_𝑡 = (𝑧𝑡, 𝑧𝑡+1, …)
An infinitely lived representative agent and an infinitely lived government exist at dates 𝑡 =
0, 1, …
The objects in play are
A benevolent government chooses sequences (𝑀⃗ , ℎ,⃗ 𝑥)⃗ subject to a sequence of budget con-
straints and other constraints imposed by competitive equilibrium
Given tax collection and price of money sequences, a representative household chooses sequences (𝑐⃗, 𝑚⃗) of consumption and real balances
In competitive equilibrium, the price of money sequence 𝑞 ⃗ clears markets, thereby reconciling
decisions of the government and the representative household
A representative household faces a nonnegative value of money sequence 𝑞 ⃗ and sequences 𝑦,⃗ 𝑥⃗
of income and total tax collections, respectively
The household chooses nonnegative sequences 𝑐,⃗ 𝑀⃗ of consumption and nominal balances,
respectively, to maximize
∑_{𝑡=0}^{∞} 𝛽^𝑡 [𝑢(𝑐𝑡) + 𝑣(𝑞𝑡𝑀𝑡)] (1)
subject to
𝑞𝑡 𝑀𝑡 ≤ 𝑦𝑡 + 𝑞𝑡 𝑀𝑡−1 − 𝑐𝑡 − 𝑥𝑡 (2)
and
𝑞𝑡 𝑀𝑡 ≤ 𝑚̄ (3)
Here 𝑞𝑡 is the reciprocal of the price level at 𝑡, also known as the value of money
Chang [25] assumes that
84.3.2 Government
The government chooses a sequence of inverse money growth rates with time 𝑡 component ℎ𝑡 ≡ 𝑀𝑡−1/𝑀𝑡 ∈ Π ≡ [π̲, π̄], where 0 < π̲ < 1 < 1/𝛽 ≤ π̄
−𝑥𝑡 = 𝑚𝑡 (1 − ℎ𝑡 ) (4)
The restrictions 𝑚𝑡 ∈ [0, 𝑚̄] and ℎ𝑡 ∈ Π evidently imply that 𝑥𝑡 ∈ 𝑋 ≡ [(π̲ − 1)𝑚̄, (π̄ − 1)𝑚̄]

We define the set 𝐸 ≡ [0, 𝑚̄] × Π × 𝑋, so that we require that (𝑚, ℎ, 𝑥) ∈ 𝐸
To represent the idea that taxes are distorting, Chang makes the following assumption about
outcomes for per capita output:
𝑦𝑡 = 𝑓(𝑥𝑡 ) (5)
where 𝑓 ∶ R → R satisfies 𝑓(𝑥) > 0, is twice continuously differentiable, 𝑓 ″ (𝑥) < 0, and
𝑓(𝑥) = 𝑓(−𝑥) for all 𝑥 ∈ R, so that subsidies and taxes are equally distorting
The purpose is not to model the causes of tax distortions in any detail but simply to summa-
rize the outcome of those distortions via the function 𝑓(𝑥)
A key part of the specification is that tax distortions are increasing in the absolute value of
tax revenues
The government chooses a competitive equilibrium that maximizes Eq. (1)
For the results in this lecture, the timing of actions within a period is important because of
the incentives that it activates
Chang assumed the following within-period timing of decisions:
This within-period timing confronts the government with choices framed by how the private
sector wants to respond when the government takes time 𝑡 actions that differ from what the
private sector had expected
This timing will shape the incentives confronting the government at each history that are to
be incorporated in the construction of the 𝐷̃ operator below
ℒ = max_{𝑐⃗, 𝑀⃗} min_{𝜆⃗, 𝜇⃗} ∑_{𝑡=0}^{∞} 𝛽^𝑡 {𝑢(𝑐𝑡) + 𝑣(𝑀𝑡𝑞𝑡) + 𝜆𝑡[𝑦𝑡 − 𝑐𝑡 − 𝑥𝑡 + 𝑞𝑡𝑀𝑡−1 − 𝑞𝑡𝑀𝑡] + 𝜇𝑡[𝑚̄ − 𝑞𝑡𝑀𝑡]}

First-order conditions are

𝑢′(𝑐𝑡) = 𝜆𝑡

𝑞𝑡[𝑢′(𝑐𝑡) − 𝑣′(𝑀𝑡𝑞𝑡)] ≤ 𝛽𝑢′(𝑐𝑡+1)𝑞𝑡+1, with equality if 𝑀𝑡𝑞𝑡 < 𝑚̄

Using ℎ𝑡 = 𝑀𝑡−1/𝑀𝑡 and 𝑞𝑡 = 𝑚𝑡/𝑀𝑡 in these first-order conditions and rearranging implies
This is real money balances at time 𝑡 + 1 measured in units of marginal utility, which Chang
refers to as ‘the marginal utility of real balances’
From the standpoint of the household at time 𝑡, equation Eq. (7) shows that 𝜃𝑡+1 intermediates the influences of (𝑥⃗𝑡+1, 𝑚⃗𝑡+1) on the household’s choice of real balances 𝑚𝑡

By “intermediates” we mean that the future paths (𝑥⃗𝑡+1, 𝑚⃗𝑡+1) influence 𝑚𝑡 entirely through their effects on the scalar 𝜃𝑡+1
The observation that the one dimensional promised marginal utility of real balances 𝜃𝑡+1
functions in this way is an important step in constructing a class of competitive equilibria
that have a recursive representation
A closely related observation pervaded the analysis of Stackelberg plans in dynamic Stackel-
berg problems and the Calvo model
Definition: Given 𝑀−1, a government policy (ℎ⃗, 𝑥⃗), price system 𝑞⃗, and allocation (𝑐⃗, 𝑚⃗, 𝑦⃗) are said to be a competitive equilibrium if

• 𝑚𝑡 = 𝑞𝑡 𝑀𝑡 and 𝑦𝑡 = 𝑓(𝑥𝑡 )
• The government budget constraint is satisfied
• Given 𝑞⃗, 𝑥⃗, 𝑦⃗, (𝑐⃗, 𝑚⃗) solves the household’s problem
ℎ̂ 𝑡 = ℎ(𝑤𝑡 , 𝜃𝑡 )
𝑚𝑡 = 𝑚(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝑥𝑡 = 𝑥(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 ) (8)
𝑤𝑡+1 = 𝜒(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝜃𝑡+1 = Ψ(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
• Here it is to be understood that ℎ̂ 𝑡 is the action that the government policy instructs
the government to take, while ℎ𝑡 possibly not equal to ℎ̂ 𝑡 is some other action that the
government is free to take at time 𝑡
so that at each instance and circumstance of choice, a government attains a weakly higher lifetime utility with continuation value 𝑤𝑡+1 = 𝜒(ℎ𝑡, 𝑤𝑡, 𝜃𝑡) by adhering to the plan and confirming the associated time 𝑡 action ℎ̂𝑡 that the public had expected earlier
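This adherence requirement reduces to a family of inequalities, one for each feasible deviation ℎ. A toy check is sketched below; all payoff numbers are hypothetical stand-ins, not model output:

```python
import numpy as np

β = 0.9   # illustrative discount factor

# Hypothetical payoffs at some state (w_t, θ_t): current utility and
# continuation value from confirming the expected action h_hat, and from
# each possible one-shot deviation h (worishing continuation values are
# replaced by punishing ones after a deviation)
r_adhere, w_next_adhere = 0.55, 5.0
h_grid = np.linspace(0.9, 1.1, 21)
r_dev = 0.55 + 0.3 * np.abs(h_grid - 1.0)       # deviation payoffs
w_next_dev = 5.0 - 4.0 * np.abs(h_grid - 1.0)   # punishing continuations

def is_credible(r0, w0, r_dev, w_dev, β):
    """Adhering must weakly dominate every one-shot deviation."""
    return bool(np.all(r0 + β * w0 >= r_dev + β * w_dev))

credible = is_credible(r_adhere, w_next_adhere, r_dev, w_next_dev, β)
```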
Please note the subtle change in arguments of the functions used to represent a competitive
equilibrium and a Ramsey plan, on the one hand, and a credible government plan, on the
other hand
The extra arguments appearing in the functions used to represent a credible plan come from
allowing the government to contemplate disappointing the private sector’s expectation about
its time 𝑡 choice ℎ̂ 𝑡
A credible plan induces the government to confirm the private sector’s expectation
84.3. THE SETTING 1449
The recursive representation of the plan uses the evolution of continuation values to deter the
government from wanting to disappoint the private sector’s expectations
Technically, a Ramsey plan and a credible plan both incorporate history dependence
For a Ramsey plan, this is encoded in the dynamics of the state variable 𝜃𝑡 , a promised
marginal utility that the Ramsey plan delivers to the private sector
For a credible government plan, the two-dimensional state vector (𝑤𝑡, 𝜃𝑡) encodes history dependence
A government strategy 𝜎 and an allocation rule 𝛼 are said to constitute a sustainable plan
(SP) if
1. 𝜎 is admissible
2. Given 𝜎, 𝛼 is competitive
3. After any history ℎ⃗ 𝑡−1 , the continuation of 𝜎 is optimal for the government; i.e., the se-
quence ℎ⃗ 𝑡 induced by 𝜎 after ℎ⃗ 𝑡−1 maximizes over 𝐶𝐸𝜋 given 𝛼
Given any history ℎ⃗ 𝑡−1 , the continuation of a sustainable plan is a sustainable plan
Let Θ = {(𝑚⃗, 𝑥⃗, ℎ⃗) ∈ 𝐶𝐸 ∶ there is an SP whose outcome is (𝑚⃗, 𝑥⃗, ℎ⃗)}

Now consider the space

𝑆 = {(𝑤, 𝜃) ∶ there is an SP whose outcome (𝑚⃗, 𝑥⃗, ℎ⃗) ∈ Θ has value 𝑤 = ∑∞𝑡=0 𝛽𝑡 [𝑢(𝑓(𝑥𝑡)) + 𝑣(𝑚𝑡)] and such that 𝑢′(𝑓(𝑥0))(𝑚0 + 𝑥0) = 𝜃}
The space 𝑆 is a compact subset of 𝑊 × Ω, where 𝑊 = [𝑤̲, 𝑤̄] is the space of values associated with sustainable plans. Here 𝑤̲ and 𝑤̄ are finite lower and upper bounds on the set of values
Because there is at least one sustainable plan, 𝑆 is nonempty
Now recall the within-period timing protocol, which we can depict (ℎ, 𝑥) → 𝑚 = 𝑞𝑀 → 𝑦 = 𝑐
With this timing protocol in mind, the time 0 component of an SP has the following components:
1. A period 0 action ℎ̂ ∈ Π that the public expects the government to take, together with subsequent within-period consequences 𝑚(ℎ̂), 𝑥(ℎ̂) when the government acts as expected
2. For any first-period action ℎ ≠ ℎ̂ with ℎ ∈ 𝐶𝐸𝜋0 , a pair of within-period consequences
𝑚(ℎ), 𝑥(ℎ) when the government does not act as the public had expected
3. For every ℎ ∈ Π, a pair (𝑤′ (ℎ), 𝜃′ (ℎ)) ∈ 𝑆 to carry into next period
These components must be such that it is optimal for the government to choose ℎ̂ as expected; and for every possible ℎ ∈ Π, the government budget constraint and the household's Euler equation must hold with continuation 𝜃 being 𝜃′(ℎ)
Given the timing protocol within the model, the representative household’s response to a
government deviation to ℎ ≠ ℎ̂ from a prescribed ℎ̂ consists of a first-period action 𝑚(ℎ)
and associated subsequent actions, together with future equilibrium prices, captured by
(𝑤′ (ℎ), 𝜃′ (ℎ))
At this point, Chang introduces an idea in the spirit of Abreu, Pearce, and Stacchetti [2]
Let 𝑍 be a nonempty subset of 𝑊 × Ω
Think of using pairs (𝑤′, 𝜃′) drawn from 𝑍 as candidate (continuation value, promised marginal utility) pairs
Define the following operator:

𝐷̃(𝑍) = {(𝑤, 𝜃) ∶ there is ℎ̂ ∈ 𝐶𝐸𝜋0 and for each ℎ ∈ 𝐶𝐸𝜋0 a four-tuple (𝑚(ℎ), 𝑥(ℎ), 𝑤′(ℎ), 𝜃′(ℎ)) ∈ [0, 𝑚̄] × 𝑋 × 𝑍 (9)

such that

𝑤 = 𝑢(𝑓(𝑥(ℎ̂))) + 𝑣(𝑚(ℎ̂)) + 𝛽𝑤′(ℎ̂) (10)

𝜃 = 𝑢′(𝑓(𝑥(ℎ̂)))(𝑚(ℎ̂) + 𝑥(ℎ̂)) (11)

and

𝑢(𝑓(𝑥(ℎ))) + 𝑣(𝑚(ℎ)) + 𝛽𝑤′(ℎ) ≤ 𝑤 for all ℎ ∈ 𝐶𝐸𝜋0 } (12)
This operator adds the key incentive constraint to the conditions that defined the 𝐷(𝑍) operator in the earlier lecture on competitive equilibria in the Chang model
Condition Eq. (12) requires that the plan deter the government from wanting to take one-shot
deviations when candidate continuation values are drawn from 𝑍
Proposition:

1. If 𝑍 ⊂ 𝐷̃(𝑍), then 𝐷̃(𝑍) ⊂ 𝑆 (‘self-generation’)

2. 𝑆 = 𝐷̃(𝑆) (‘factorization’)
84.4. CALCULATING THE SET OF SUSTAINABLE PROMISE-VALUE PAIRS 1451
Proposition:
Chang establishes that 𝑆 is compact and that therefore there exists a highest value SP and a
lowest value SP
Further, the preceding structure allows Chang to compute 𝑆 by iterating to convergence on 𝐷̃
provided that one begins with a sufficiently large initial set 𝑍0
This structure delivers the following recursive representation of a sustainable outcome:
ℎ̂ 𝑡 = ℎ(𝑤𝑡 , 𝜃𝑡 )
𝑚𝑡 = 𝑚(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝑥𝑡 = 𝑥(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝑤𝑡+1 = 𝜒(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝜃𝑡+1 = Ψ(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
Above we defined the 𝐷̃(𝑍) operator in Eq. (9)
Chang (1998) provides a method for dealing with the final three constraints
These incentive constraints ensure that the government wants to choose ℎ̂ as the private sector had expected it to
Chang’s simplification starts from the idea that, when considering whether or not to confirm
the private sector’s expectation, the government only needs to consider the payoff of the best
possible deviation
Equally, to provide incentives to the government, we only need to consider the harshest possible punishment
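This best-deviation / harshest-punishment logic boils down to a max-min computation, the same pattern as the np.nanmax(np.nanmin(...)) reduction that appears in the solution code later in this lecture. A toy illustration, with an invented payoff matrix:

```python
import numpy as np

# Invented payoff matrix: entry [i, j] is the government's payoff when it
# deviates to action j and the private sector imposes punishment i.
payoffs = np.array([[2.0, 1.0, 3.0],
                    [0.5, 4.0, 2.5],
                    [1.5, 0.8, 1.2]])

# Harshest punishment for each deviation: minimize over punishments (rows)
worst_per_deviation = payoffs.min(axis=0)          # → [0.5, 0.8, 1.2]

# Best deviation given the harshest punishments: maximize over deviations
best_deviation_value = worst_per_deviation.max()   # → 1.2
```

Only this max-min value matters for incentives: a plan is credible if confirming the expected action is at least as good as the best deviation punished as harshly as possible.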
Let ℎ denote some possible deviation. Chang defines:

𝐵𝑅(ℎ; 𝑍) = min {𝑢(𝑓(𝑥(ℎ))) + 𝑣(𝑚(ℎ)) + 𝛽𝑤′(ℎ)} over four-tuples (𝑚(ℎ), 𝑥(ℎ), 𝑤′(ℎ), 𝜃′(ℎ)) ∈ [0, 𝑚̄] × 𝑋 × 𝑍, subject to

𝑥(ℎ) = 𝑚(ℎ)(ℎ − 1)

𝑚(ℎ)(𝑢′(𝑓(𝑥(ℎ))) + 𝑣′(𝑚(ℎ))) ≤ 𝛽𝜃′(ℎ) (with equality if 𝑚(ℎ) < 𝑚̄)

and sets 𝐵𝑅(𝑍) equal to the maximum of 𝐵𝑅(ℎ; 𝑍) over deviations ℎ ∈ 𝐶𝐸𝜋0
For a given deviation ℎ, this problem finds the worst possible sustainable value
We then define:
𝐸(𝑍) = {(𝑤, 𝜃) ∶ ∃ℎ ∈ 𝐶𝐸𝜋0 and (𝑚(ℎ), 𝑥(ℎ), 𝑤′(ℎ), 𝜃′(ℎ)) ∈ [0, 𝑚̄] × 𝑋 × 𝑍

such that

𝑤 = 𝑢(𝑓(𝑥(ℎ))) + 𝑣(𝑚(ℎ)) + 𝛽𝑤′(ℎ)

𝜃 = 𝑢′(𝑓(𝑥(ℎ)))(𝑚(ℎ) + 𝑥(ℎ))

𝑥(ℎ) = 𝑚(ℎ)(ℎ − 1)

𝑚(ℎ)(𝑢′(𝑓(𝑥(ℎ))) + 𝑣′(𝑚(ℎ))) ≤ 𝛽𝜃′(ℎ) (with equality if 𝑚(ℎ) < 𝑚̄)

and

𝑤 ≥ 𝐵𝑅(𝑍)}
Aside from the final incentive constraint, this is the same as the operator defined in the lecture on competitive equilibria in the Chang model

Consequently, to implement this operator we just need to add one step to our outer hyperplane approximation algorithm:
• Solve a linear program (described below) for each action in the action space
• Find the maximum and update the corresponding hyperplane level, 𝐶𝑖,𝑡+1
Concretely, for each point (𝑚𝑗, ℎ𝑗) in the action space we solve the linear program

min over [𝑤′, 𝜃′] of 𝑤′

subject to

𝐻 ⋅ (𝑤′, 𝜃′) ≤ 𝐶𝑡

𝛽𝜃′ = 𝑚𝑗(𝑢′(𝑓(𝑥𝑗)) + 𝑣′(𝑚𝑗)) (with “≥” in place of “=” when 𝑚𝑗 = 𝑚̄)

𝑥𝑗 = 𝑚𝑗(ℎ𝑗 − 1)
This gives us a matrix of possible values, corresponding to each point in the action space
To find 𝐵𝑅(𝑍), we minimize over the 𝑚 dimension and maximize over the ℎ dimension
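Each of these subproblems is a small linear program. As a hedged sketch of how one of them can be posed with scipy.optimize.linprog — the subgradient matrix H, the levels C_t, the discount factor β and the Euler-equation value e_j below are all made-up numbers, not output of the actual algorithm:

```python
import numpy as np
from scipy.optimize import linprog

β = 0.8

# Made-up hyperplane approximation of a candidate set Z ≈ [0, 10] × [0, 5]
H = np.array([[1.0, 0.0],
              [-1.0, 0.0],
              [0.0, 1.0],
              [0.0, -1.0]])
C_t = np.array([10.0, 0.0, 5.0, 0.0])

e_j = 2.0   # made-up Euler-equation value for one action (m_j, h_j)

# Smallest continuation value w' with (w', θ') in Z and βθ' = e_j
res = linprog(c=[1.0, 0.0],                  # minimize w'
              A_ub=H, b_ub=C_t,              # H · (w', θ') ≤ C_t
              A_eq=[[0.0, β]], b_eq=[e_j],   # βθ' = e_j
              bounds=[(None, None), (None, None)])

w_prime, θ_prime = res.x    # here (0.0, 2.5)
```

Solving one such program per point in the action space and then taking the max-min described above yields the scalar 𝐵𝑅(𝑍).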
Step 3 then constructs the set 𝑆𝑡+1 = 𝐸(𝑆𝑡 ). The linear program in Step 3 is designed to
construct a set 𝑆𝑡+1 that is as large as possible while satisfying the constraints of the 𝐸(𝑆)
operator
To do this, for each subgradient ℎ𝑖 , and for each point in the action space (𝑚𝑗 , ℎ𝑗 ), we solve
the following problem:
max over [𝑤′, 𝜃′] of ℎ𝑖 ⋅ (𝑤, 𝜃)

subject to

𝐻 ⋅ (𝑤′, 𝜃′) ≤ 𝐶𝑡

𝑤 = 𝑢(𝑓(𝑥𝑗)) + 𝑣(𝑚𝑗) + 𝛽𝑤′

𝜃 = 𝑢′(𝑓(𝑥𝑗))(𝑚𝑗 + 𝑥𝑗)

𝑥𝑗 = 𝑚𝑗(ℎ𝑗 − 1)

𝛽𝜃′ = 𝑚𝑗(𝑢′(𝑓(𝑥𝑗)) + 𝑣′(𝑚𝑗)) (with “≥” in place of “=” when 𝑚𝑗 = 𝑚̄)

𝑤 ≥ 𝐵𝑅(𝑍)
This problem maximizes the hyperplane level for a given set of actions
The second part of Step 3 then finds the maximum possible hyperplane level across the action
space
The algorithm constructs a sequence of progressively smaller sets 𝑆𝑡+1 ⊂ 𝑆𝑡 ⊂ 𝑆𝑡−1 ⋯ ⊂ 𝑆0
Step 4 ends the algorithm when the difference between these sets is small enough
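Schematically, the whole procedure is a fixed-point iteration on the vector of hyperplane levels. The sketch below uses a stand-in update — a made-up contraction toward fixed levels — in place of the real 𝐸(𝑆) step, just to show the shape of the loop and the stopping rule:

```python
import numpy as np

C_star = np.array([4.0, 1.0, 2.0, 1.0])   # made-up limiting hyperplane levels

def update_levels(C):
    """Stand-in for one application of E(S); the real version solves a
    linear program for each subgradient and each action in the action space."""
    return 0.5 * C + 0.5 * C_star

C = np.full(4, 10.0)          # levels describing a large initial set S_0
tol, diff, iters = 1e-8, np.inf, 0

while diff > tol:
    C_new = update_levels(C)
    diff = np.max(np.abs(C_new - C))   # Step 4: distance between S_t and S_{t+1}
    C, iters = C_new, iters + 1
```

Because each update can only shrink the approximating set, the levels decrease monotonically and the loop terminates once successive sets are close.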
We have created a Python class that solves the model assuming the following functional
forms:
𝑢(𝑐) = log(𝑐)

𝑣(𝑚) = (1/500)(𝑚̄𝑚 − 0.5𝑚²)^0.5
In [2]: """
Author: Sebastian Graves
import numpy as np
import quantecon as qe
import time

# Imports used later in this listing
from scipy.optimize import linprog, minimize
import numpy.polynomial.chebyshev as cheb
import matplotlib.pyplot as plt
class ChangModel:
"""
Class to solve for the competitive and sustainable sets in the Chang (1998)
model, for different parameterizations.
"""
uc = lambda c: np.log(c)
uc_p = lambda c: 1/c
v = lambda m: 1/500 * (mbar * m - 0.5 * m**2)**0.5
v_p = lambda m: 0.5/500 * (mbar * m - 0.5 * m**2)**(-0.5) * (mbar - m)
u = lambda h, m: uc(f(h, m)) + v(m)
w_space = np.array([min(w_vec[~np.isinf(w_vec)]),
max(w_vec[~np.isinf(w_vec)])])
p_space = np.array([0, max(p_vec[~np.isinf(w_vec)])])
self.p_space = p_space
# Points on circle
H = np.zeros((N, 2))
for i in range(N):
x = degrees[i]
H[i, 0] = np.cos(x)
H[i, 1] = np.sin(x)
return C, H, Z
def solve_worst_spe(self):
"""
Method to solve for BR(Z). See p.449 of Chang (1998)
"""
# Pre-compute constraints
aineq_mbar = np.vstack((self.H, np.array([0, -self.β])))
bineq_mbar = np.vstack((self.c0_s, 0))
aineq = self.H
bineq = self.c0_s
aeq = [[0, -self.β]]
for j in range(self.N_a):
# Only try if consumption is possible
if self.f_vec[j] > 0:
# If m = mbar, use inequality constraint
if self.A[j, 1] == self.mbar:
bineq_mbar[-1] = self.euler_vec[j]
res = linprog(c, A_ub=aineq_mbar, b_ub=bineq_mbar,
bounds=(self.w_bnds_s, self.p_bnds_s))
else:
beq = self.euler_vec[j]
res = linprog(c, A_ub=aineq, b_ub=bineq, A_eq=aeq, b_eq=beq,
bounds=(self.w_bnds_s, self.p_bnds_s))
if res.status == 0:
p_vec[j] = self.u_vec[j] + self.β * res.x[0]
# Max over h and min over other variables (see Chang (1998) p.449)
self.br_z = np.nanmax(np.nanmin(p_vec.reshape(self.n_m, self.n_h), 0))
def solve_subgradient(self):
"""
Method to solve for E(Z). See p.449 of Chang (1998)
"""
# Pre-compute constraints
aineq_C_mbar = np.vstack((self.H, np.array([0, -self.β])))
bineq_C_mbar = np.vstack((self.c0_c, 0))
aineq_C = self.H
bineq_C = self.c0_c
aeq_C = [[0, -self.β]]
for j in range(self.N_a):
# Only try if consumption is possible
if self.f_vec[j] > 0:
# COMPETITIVE EQUILIBRIA
# If m = mbar, use inequality constraint
if self.A[j, 1] == self.mbar:
bineq_C_mbar[-1] = self.euler_vec[j]
res = linprog(c, A_ub=aineq_C_mbar, b_ub=bineq_C_mbar,
bounds=(self.w_bnds_c, self.p_bnds_c))
# If m < mbar, use equality constraint
else:
beq_C = self.euler_vec[j]
res = linprog(c, A_ub=aineq_C, b_ub=bineq_C, A_eq = aeq_C,
b_eq = beq_C, bounds=(self.w_bnds_c, self.p_bnds_c))
if res.status == 0:
c_a1a2_c[j] = self.H[i, 0]*(self.u_vec[j] + self.β * res.x[0]) + self.H[i, 1]
t_a1a2_c[j] = res.x
# SUSTAINABLE EQUILIBRIA
# If m = mbar, use inequality constraint
if self.A[j, 1] == self.mbar:
bineq_S_mbar[-2] = self.euler_vec[j]
bineq_S_mbar[-1] = self.u_vec[j] - self.br_z
res = linprog(c, A_ub=aineq_S_mbar, b_ub=bineq_S_mbar,
bounds=(self.w_bnds_s, self.p_bnds_s))
# If m < mbar, use equality constraint
else:
bineq_S[-1] = self.u_vec[j] - self.br_z
beq_S = self.euler_vec[j]
res = linprog(c, A_ub=aineq_S, b_ub=bineq_S, A_eq = aeq_S,
b_eq = beq_S, bounds=(self.w_bnds_s, self.p_bnds_s))
if res.status == 0:
c_a1a2_s[j] = self.H[i, 0] * (self.u_vec[j] + self.β*res.x[0]) + self.H[i, 1]
t_a1a2_s[j] = res.x
for i in range(self.N_g):
self.c1_c[i] = np.dot(self.z1_c[:, i], self.H[i, :])
self.c1_s[i] = np.dot(self.z1_s[:, i], self.H[i, :])
t = time.time()
diff = tol + 1
iters = 0
# Save iteration
self.c_dic_c[iters], self.c_dic_s[iters] = np.copy(self.c1_c), np.copy(self.c1_s)
self.iters = iters
elapsed = time.time() - t
print('Convergence achieved after {} iterations and {} seconds'.format(iters, round(elapsed, 2)))
def p_fun2(x):
scale = -1 + 2*(x[1] - θ_min)/(θ_max - θ_min)
p_fun = - (u(x[0],mbar) + self.β * np.dot(cheb.chebvander(scale, order - 1), c))
return p_fun
cons1 = ({'type': 'eq', 'fun': lambda x: uc_p(f(x[0], x[1])) * x[1] * (x[0] - 1) + v_p(x[1])
{'type': 'eq', 'fun': lambda x: uc_p(f(x[0], x[1])) * x[0] * x[1] - θ})
cons2 = ({'type': 'ineq', 'fun': lambda x: uc_p(f(x[0], mbar)) * mbar * (x[0] - 1) + v_p(mbar)
{'type': 'eq', 'fun': lambda x: uc_p(f(x[0], mbar)) * x[0] * mbar - θ})
# Bellman Iterations
diff = 1
iters = 1
self.θ_grid = s
self.p_iter = p_iter1
self.Φ = Φ
self.c = c
print('Convergence achieved after {} iterations'.format(iters))
# Check residuals
θ_grid_fine = np.linspace(θ_min, θ_max, 100)
resid_grid = np.zeros(100)
p_grid = np.zeros(100)
θ_prime_grid = np.zeros(100)
m_grid = np.zeros(100)
h_grid = np.zeros(100)
for i in range(100):
θ = θ_grid_fine[i]
res = minimize(p_fun,
lb1 + (ub1-lb1) / 2,
method='SLSQP',
bounds=bnds1,
constraints=cons1,
tol=1e-10)
if res.success == True:
p = -p_fun(res.x)
p_grid[i] = p
θ_prime_grid[i] = res.x[2]
h_grid[i] = res.x[0]
m_grid[i] = res.x[1]
res = minimize(p_fun2,
lb2 + (ub2-lb2)/2,
method='SLSQP',
bounds=bnds2,
constraints=cons2,
tol=1e-10)
if -p_fun2(res.x) > p and res.success == True:
p = -p_fun2(res.x)
p_grid[i] = p
θ_prime_grid[i] = res.x[1]
h_grid[i] = res.x[0]
m_grid[i] = self.mbar
scale = -1 + 2 * (θ - θ_min)/(θ_max - θ_min)
resid_grid[i] = np.dot(cheb.chebvander(scale, order-1), c) - p
self.resid_grid = resid_grid
self.θ_grid_fine = θ_grid_fine
self.θ_prime_grid = θ_prime_grid
self.m_grid = m_grid
self.h_grid = h_grid
self.p_grid = p_grid
self.x_grid = m_grid * (h_grid - 1)
# Simulate
θ_series = np.zeros(31)
m_series = np.zeros(30)
h_series = np.zeros(30)
# Find initial θ
def ValFun(x):
scale = -1 + 2*(x - θ_min)/(θ_max - θ_min)
p_fun = np.dot(cheb.chebvander(scale, order - 1), c)
return -p_fun
res = minimize(ValFun,
(θ_min + θ_max)/2,
bounds=[(θ_min, θ_max)])
θ_series[0] = res.x
# Simulate
for i in range(30):
θ = θ_series[i]
res = minimize(p_fun,
lb1 + (ub1-lb1)/2,
method='SLSQP',
bounds=bnds1,
constraints=cons1,
tol=1e-10)
if res.success == True:
p = -p_fun(res.x)
h_series[i] = res.x[0]
m_series[i] = res.x[1]
θ_series[i+1] = res.x[2]
res2 = minimize(p_fun2,
lb2 + (ub2-lb2)/2,
method='SLSQP',
bounds=bnds2,
constraints=cons2,
tol=1e-10)
if -p_fun2(res2.x) > p and res2.success == True:
h_series[i] = res2.x[0]
m_series[i] = self.mbar
θ_series[i+1] = res2.x[1]
self.θ_series = θ_series
self.m_series = m_series
self.h_series = h_series
The set of (𝑤, 𝜃) pairs associated with sustainable plans is smaller than the set of (𝑤, 𝜃) pairs associated with competitive equilibria, since the additional constraints associated with sustainability must also be satisfied
Let’s compute two examples, one with a low 𝛽, another with a higher 𝛽
In [4]: ch1.solve_sustainable()
The following plot shows both the set of 𝑤, 𝜃 pairs associated with competitive equilibria (in
red) and the smaller set of 𝑤, 𝜃 pairs associated with sustainable plans (in blue)
def plot_equilibria(ChangModel):
"""
Method to plot both equilibrium sets
"""
fig, ax = plt.subplots(figsize=(7, 5))
ax.set_xlabel('w', fontsize=16)
ax.set_ylabel(r"$\theta$", fontsize=18)
R = ext_C[idx_Ramsey, :]
ax.scatter(R[0], R[1], 150, 'black', 'o', zorder=1)
w_min = min(ext_C[:, 0])
plt.tight_layout()
plt.show()
plot_equilibria(ch1)
In [7]: ch2.solve_sustainable()
[0.01795]
[0.01642]
[0.01507]
[0.01284]
[0.01106]
[0.00694]
[0.0085]
[0.00781]
[0.00433]
[0.00492]
[0.00303]
[0.00182]
[0.00638]
[0.00116]
[0.00093]
[0.00075]
[0.0006]
[0.00494]
[0.00038]
[0.00121]
[0.00024]
[0.0002]
[0.00016]
[0.00013]
[0.0001]
[0.00008]
[0.00006]
[0.00005]
[0.00004]
[0.00003]
[0.00003]
[0.00002]
[0.00002]
[0.00001]
[0.00001]
[0.00001]
Convergence achieved after 40 iterations and 782.13 seconds
In [8]: plot_equilibria(ch2)
[1] Dilip Abreu. On the theory of infinitely repeated games with discounting. Economet-
rica, 56:383–396, 1988.
[2] Dilip Abreu, David Pearce, and Ennio Stacchetti. Toward a theory of discounted re-
peated games with imperfect monitoring. Econometrica, 58(5):1041–1063, September
1990.
[3] Daron Acemoglu, Simon Johnson, and James A Robinson. The colonial origins of com-
parative development: An empirical investigation. The American Economic Review,
91(5):1369–1401, 2001.
[4] S Rao Aiyagari. Uninsured Idiosyncratic Risk and Aggregate Saving. The Quarterly
Journal of Economics, 109(3):659–684, 1994.
[5] S Rao Aiyagari, Albert Marcet, Thomas J Sargent, and Juha Seppälä. Optimal taxation
without state-contingent debt. Journal of Political Economy, 110(6):1220–1254, 2002.
[6] B. D. O. Anderson and J. B. Moore. Optimal Filtering. Dover Publications, 2005.
[7] E. W. Anderson, L. P. Hansen, E. R. McGrattan, and T. J. Sargent. Mechanics of
Forming and Estimating Dynamic Linear Economies. In Handbook of Computational
Economics. Elsevier, vol 1 edition, 1996.
[8] Cristina Arellano. Default risk and income fluctuations in emerging economies. The
American Economic Review, pages 690–712, 2008.
[9] Athanasios Papoulis and S Unnikrishna Pillai. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 1991.
[10] Orazio P Attanasio and Nicola Pavoni. Risk sharing in private information models with
asset accumulation: Explaining the excess smoothness of consumption. Econometrica,
79(4):1027–1068, 2011.
[11] Robert J Barro. On the Determination of the Public Debt. Journal of Political Econ-
omy, 87(5):940–971, 1979.
[12] Jess Benhabib, Alberto Bisin, and Shenghao Zhu. The wealth distribution in bewley
economies with capital income risk. Journal of Economic Theory, 159:489–515, 2015.
[13] L M Benveniste and J A Scheinkman. On the Differentiability of the Value Function in
Dynamic Models of Economics. Econometrica, 47(3):727–732, 1979.
[14] Dmitri Bertsekas. Dynamic Programming and Stochastic Control. Academic Press, New
York, 1975.
[15] Truman Bewley. The permanent income hypothesis: A theoretical formulation. Journal
of Economic Theory, 16(2):252–292, 1977.
1466 BIBLIOGRAPHY
[17] Anmol Bhandari, David Evans, Mikhail Golosov, and Thomas J. Sargent. Fiscal Policy
and Debt Management with Incomplete Markets. The Quarterly Journal of Economics,
132(2):617–663, 2017.
[19] Fischer Black and Robert Litterman. Global portfolio optimization. Financial analysts
journal, 48(5):28–43, 1992.
[20] Philip Cagan. The monetary dynamics of hyperinflation. In Milton Friedman, editor,
Studies in the Quantity Theory of Money, pages 25–117. University of Chicago Press,
Chicago, 1956.
[21] Guillermo A. Calvo. On the time consistency of optimal policy in a monetary economy.
Econometrica, 46(6):1411–1428, 1978.
[22] Christopher D Carroll. A Theory of the Consumption Function, with and without Liq-
uidity Constraints. Journal of Economic Perspectives, 15(3):23–45, 2001.
[23] Christopher D Carroll. The method of endogenous gridpoints for solving dynamic
stochastic optimization problems. Economics Letters, 91(3):312–320, 2006.
[24] David Cass. Optimum growth in an aggregative model of capital accumulation. Review
of Economic Studies, 32(3):233–240, 1965.
[25] Roberto Chang. Credible monetary policy in an infinite horizon model: Recursive ap-
proaches. Journal of Economic Theory, 81(2):431–461, 1998.
[26] Varadarajan V Chari and Patrick J Kehoe. Sustainable plans. Journal of Political
Economy, pages 783–802, 1990.
[27] Ronald Harry Coase. The nature of the firm. Economica, 4(16):386–405, 1937.
[28] Wilbur John Coleman. Solving the Stochastic Growth Model by Policy-Function Itera-
tion. Journal of Business & Economic Statistics, 8(1):27–29, 1990.
[29] J. D. Cryer and K-S. Chan. Time Series Analysis. Springer, 2nd edition, 2008.
[30] Steven J Davis, R Jason Faberman, and John Haltiwanger. The flow approach to labor
markets: New data sources, micro-macro links and the recent downturn. Journal of
Economic Perspectives, 2006.
[31] Angus Deaton. Saving and Liquidity Constraints. Econometrica, 59(5):1221–1248, 1991.
[32] Angus Deaton and Christina Paxson. Intertemporal Choice and Inequality. Journal of
Political Economy, 102(3):437–467, 1994.
[33] Wouter J Den Haan. Comparison of solutions to the incomplete markets model with
aggregate uncertainty. Journal of Economic Dynamics and Control, 34(1):4–27, 2010.
[34] Raymond J Deneckere and Kenneth L Judd. Cyclical and chaotic behavior in a dy-
namic equilibrium model, with implications for fiscal policy. Cycles and chaos in eco-
nomic equilibrium, pages 308–329, 1992.
[35] J Dickey. Bayesian alternatives to the f-test and least-squares estimate in the normal
linear model. In S.E. Fienberg and A. Zellner, editors, Studies in Bayesian econometrics
and statistics, pages 515–554. North-Holland, Amsterdam, 1975.
[36] Ulrich Doraszelski and Mark Satterthwaite. Computable markov-perfect industry dy-
namics. The RAND Journal of Economics, 41(2):215–243, 2010.
[37] Y E Du, Ehud Lehrer, and A D Y Pauzner. Competitive economy as a ranking device
over networks. submitted, 2013.
[38] R M Dudley. Real Analysis and Probability. Cambridge Studies in Advanced Mathe-
matics. Cambridge University Press, 2002.
[39] Robert F Engle and Clive W J Granger. Co-integration and Error Correction: Repre-
sentation, Estimation, and Testing. Econometrica, 55(2):251–276, 1987.
[40] Richard Ericson and Ariel Pakes. Markov-perfect industry dynamics: A framework for
empirical work. The Review of Economic Studies, 62(1):53–82, 1995.
[44] Milton Friedman and Rose D Friedman. Two Lucky People. University of Chicago
Press, 1998.
[45] David Gale. The theory of linear economic models. University of Chicago press, 1989.
[46] Albert Gallatin. Report on the finances, November 1807. In Reports of the Secretary of the Treasury of the United States, Vol 1. Government Printing Office, Washington, DC, 1837.
[47] Olle Häggström. Finite Markov chains and algorithmic applications, volume 52. Cam-
bridge University Press, 2002.
[48] Robert E Hall. Stochastic Implications of the Life Cycle-Permanent Income Hypothesis:
Theory and Evidence. Journal of Political Economy, 86(6):971–987, 1978.
[49] Robert E Hall and Frederic S Mishkin. The Sensitivity of Consumption to Transitory
Income: Estimates from Panel Data on Households. National Bureau of Economic Re-
search Working Paper Series, No. 505, 1982.
[50] Michael J Hamburger, Gerald L Thompson, and Roman L Weil. Computation of expan-
sion rates for the generalized von neumann model of an expanding economy. Economet-
rica, Journal of the Econometric Society, pages 542–547, 1967.
[51] James D Hamilton. What’s real about the business cycle? Federal Reserve Bank of St.
Louis Review, (July-August):435–452, 2005.
[53] L P Hansen and T J Sargent. Recursive Models of Dynamic Linear Economies. The
Gorman Lectures in Economics. Princeton University Press, 2013.
[54] Lars Peter Hansen and Scott F Richard. The Role of Conditioning Information in De-
ducing Testable. Econometrica, 55(3):587–613, May 1987.
[55] Lars Peter Hansen and Thomas J Sargent. Formulating and estimating dynamic linear
rational expectations models. Journal of Economic Dynamics and control, 2:7–46, 1980.
[56] Lars Peter Hansen and Thomas J Sargent. Wanting robustness in macroeconomics.
Manuscript, Department of Economics, Stanford University., 4, 2000.
[57] Lars Peter Hansen and Thomas J. Sargent. Robust control and model uncertainty.
American Economic Review, 91(2):60–66, 2001.
[58] Lars Peter Hansen and Thomas J Sargent. Robustness. Princeton university press, 2008.
[59] Lars Peter Hansen and Thomas J. Sargent. Recursive Linear Models of Dynamic Eco-
nomics. Princeton University Press, Princeton, New Jersey, 2013.
[60] Lars Peter Hansen and José A Scheinkman. Long-term risk: An operator approach.
Econometrica, 77(1):177–234, 2009.
[61] J. Michael Harrison and David M. Kreps. Speculative investor behavior in a stock mar-
ket with heterogeneous expectations. The Quarterly Journal of Economics, 92(2):323–
336, 1978.
[62] J. Michael Harrison and David M. Kreps. Martingales and arbitrage in multiperiod
securities markets. Journal of Economic Theory, 20(3):381–408, June 1979.
[63] John Heaton and Deborah J Lucas. Evaluating the effects of incomplete markets on risk
sharing and asset pricing. Journal of Political Economy, pages 443–487, 1996.
[64] Elhanan Helpman and Paul Krugman. Market structure and international trade. MIT
Press Cambridge, 1985.
[66] Hugo A Hopenhayn and Edward C Prescott. Stochastic Monotonicity and Stationary
Distributions for Dynamic Economies. Econometrica, 60(6):1387–1406, 1992.
[67] Hugo A Hopenhayn and Richard Rogerson. Job Turnover and Policy Evaluation: A
General Equilibrium Analysis. Journal of Political Economy, 101(5):915–938, 1993.
[69] K Jänich. Linear Algebra. Springer Undergraduate Texts in Mathematics and Technol-
ogy. Springer, 1994.
[70] John Y. Campbell and Robert J. Shiller. The Dividend-Price Ratio and Expectations of Future Dividends and Discount Factors. Review of Financial Studies, 1(3):195–228, 1988.
[71] Boyan Jovanovic. Firm-specific capital and turnover. Journal of Political Economy,
87(6):1246–1260, 1979.
[72] K L Judd. Cournot versus bertrand: A dynamic resolution. Technical report, Hoover
Institution, Stanford University, 1990.
[73] Kenneth L Judd. On the performance of patents. Econometrica, pages 567–585, 1985.
[74] Kenneth L. Judd, Sevin Yeltekin, and James Conklin. Computing Supergame Equilib-
ria. Econometrica, 71(4):1239–1254, 07 2003.
[75] Takashi Kamihigashi. Elementary results on solutions to the bellman equation of dy-
namic programming: existence, uniqueness, and convergence. Technical report, Kobe
University, 2012.
[76] John G Kemeny, Oskar Morgenstern, and Gerald L Thompson. A generalization of the
von neumann model of an expanding economy. Econometrica, Journal of the Economet-
ric Society, pages 115–135, 1956.
[77] Tomoo Kikuchi, Kazuo Nishimura, and John Stachurski. Span of control, transaction
costs, and the structure of production chains. Theoretical Economics, 13(2):729–760,
2018.
[79] David M. Kreps. Notes on the Theory of Choice. Westview Press, Boulder, Colorado,
1988.
[81] Finn E Kydland and Edward C Prescott. Dynamic optimal taxation, rational expecta-
tions and optimal control. Journal of Economic Dynamics and Control, 2:79–91, 1980.
[82] A Lasota and M C MacKey. Chaos, Fractals, and Noise: Stochastic Aspects of Dynam-
ics. Applied Mathematical Sciences. Springer-Verlag, 1994.
[83] Edward E Leamer. Specification searches: Ad hoc inference with nonexperimental data,
volume 53. John Wiley & Sons Incorporated, 1978.
[84] Martin Lettau and Sydney Ludvigson. Consumption, Aggregate Wealth, and Expected
Stock Returns. Journal of Finance, 56(3):815–849, 06 2001.
[85] Martin Lettau and Sydney C. Ludvigson. Understanding Trend and Cycle in Asset
Values: Reevaluating the Wealth Effect on Consumption. American Economic Review,
94(1):276–299, March 2004.
[86] David Levhari and Leonard J Mirman. The great fish war: an example using a dynamic
cournot-nash solution. The Bell Journal of Economics, pages 322–334, 1980.
[87] L Ljungqvist and T J Sargent. Recursive Macroeconomic Theory. MIT Press, 4th edition, 2018.
[88] Robert E Lucas, Jr. Asset prices in an exchange economy. Econometrica: Journal of the
Econometric Society, 46(6):1429–1445, 1978.
[89] Robert E Lucas, Jr. and Edward C Prescott. Investment under uncertainty. Economet-
rica: Journal of the Econometric Society, pages 659–681, 1971.
[90] Robert E Lucas, Jr. and Nancy L Stokey. Optimal Fiscal and Monetary Policy in an Economy without Capital. Journal of Monetary Economics, 12(3):55–93, 1983.
[91] Albert Marcet and Thomas J Sargent. Convergence of Least-Squares Learning in En-
vironments with Hidden State Variables and Private Information. Journal of Political
Economy, 97(6):1306–1322, 1989.
[92] V Filipe Martins-da Rocha and Yiannis Vailakis. Existence and Uniqueness of a Fixed
Point for Local Contractions. Econometrica, 78(3):1127–1141, 2010.
[94] J J McCall. Economics of Information and Job Search. The Quarterly Journal of Eco-
nomics, 84(1):113–126, 1970.
[95] S P Meyn and R L Tweedie. Markov Chains and Stochastic Stability. Cambridge Uni-
versity Press, 2009.
[96] Mario J Miranda and P L Fackler. Applied Computational Economics and Finance.
Cambridge: MIT Press, 2002.
[97] F. Modigliani and R. Brumberg. Utility analysis and the consumption function: An in-
terpretation of cross-section data. In K.K Kurihara, editor, Post-Keynesian Economics.
1954.
[98] John F Muth. Optimal properties of exponentially weighted forecasts. Journal of the
american statistical association, 55(290):299–306, 1960.
[99] Derek Neal. The Complexity of Job Mobility among Young Men. Journal of Labor
Economics, 17(2):237–261, 1999.
[102] Jenő Pál and John Stachurski. Fitted value function iteration with probability one con-
tractions. Journal of Economic Dynamics and Control, 37(1):251–264, 2013.
[104] Martin L Puterman. Markov decision processes: discrete stochastic dynamic program-
ming. John Wiley & Sons, 2005.
[105] Guillaume Rabault. When do borrowing constraints bind? Some new results on the
income fluctuation problem. Journal of Economic Dynamics and Control, 26(2):217–
245, 2002.
[107] Kevin L Reffett. Production-based asset pricing in monetary economies with transac-
tions costs. Economica, pages 427–443, 1996.
[110] Sherwin Rosen, Kevin M Murphy, and Jose A Scheinkman. Cattle cycles. Journal of
Political Economy, 102(3):468–492, 1994.
[114] Jaewoo Ryoo and Sherwin Rosen. The engineering labor market. Journal of political
economy, 112(S1):S110–S140, 2004.
[115] Paul A. Samuelson. Interactions between the multiplier analysis and the principle of acceleration. Review of Economics and Statistics, 21(2):75–78, 1939.
[116] Thomas Sargent, Lars Peter Hansen, and Will Roberts. Observable implications of
present value budget balance. In Rational Expectations Econometrics. Westview Press,
1991.
[117] Thomas J Sargent. The Demand for Money During Hyperinflations under Rational
Expectations: I. International Economic Review, 18(1):59–82, February 1977.
[118] Thomas J Sargent. Macroeconomic Theory. Academic Press, New York, 2nd edition,
1987.
[119] Jack Schechtman and Vera L S Escudero. Some results on an income fluctuation prob-
lem. Journal of Economic Theory, 16(2):151–166, 1977.
[120] Jose A. Scheinkman. Speculation, Trading, and Bubbles. Columbia University Press,
New York, 2014.
[124] Nancy L Stokey. Reputation and time consistency. The American Economic Review,
pages 134–139, 1989.
[125] Nancy L. Stokey. Credible public policy. Journal of Economic Dynamics and Control,
15(4):627–656, October 1991.
[126] Kjetil Storesletten, Christopher I Telmer, and Amir Yaron. Consumption and risk shar-
ing over the life cycle. Journal of Monetary Economics, 51(3):609–633, 2004.
[128] George Tauchen. Finite state markov-chain approximations to univariate and vector
autoregressions. Economics Letters, 20(2):177–181, 1986.
[129] Daniel Treisman. Russia’s billionaires. The American Economic Review, 106(5):236–241,
2016.
[130] Ngo Van Long. Dynamic games in the economics of natural resources: a survey. Dy-
namic Games and Applications, 1(1):115–148, 2011.
[131] John von Neumann. Über ein ökonomisches Gleichungssystem und eine Verallgemeinerung des Brouwerschen Fixpunktsatzes. In Ergebnisse eines Mathematischen Kolloquiums, volume 8, pages 73–83, 1937.
[132] Abraham Wald. Sequential Analysis. John Wiley and Sons, New York, 1947.
[133] Peter Whittle. Prediction and regulation by linear least-square methods. English Univ.
Press, 1963.
[134] Peter Whittle. Prediction and Regulation by Linear Least Squares Methods. University
of Minnesota Press, Minneapolis, Minnesota, 2nd edition, 1983.
[136] G Alastair Young and Richard L Smith. Essentials of statistical inference. Cambridge
University Press, 2005.