Quantitative Economics With Python

January 6, 2020
Contents

I Introduction to Python

1 About Python
3 An Introductory Example
4 Python Essentials
8 NumPy
9 Matplotlib
10 SciPy
11 Numba
12 Parallelization
13 Pandas
16 Debugging
57 Robustness
Part I

Introduction to Python
Chapter 1
About Python
1.1 Contents
• Overview 1.2
• What’s Python? 1.3
• Scientific Programming 1.4
• Learn More 1.5
“Python has gotten sufficiently weapons grade that we don’t descend into R any-
more. Sorry, R people. I used to be one of you but we no longer descend into R.”
– Chris Wiggins
1.2 Overview
The following chart, produced using Stack Overflow Trends, shows one measure of the relative popularity of Python.
The figure indicates not only that Python is widely used but also that adoption of Python
has accelerated significantly since 2012.
We suspect this is driven at least in part by uptake in the scientific domain, particularly in
rapidly growing fields like data science.
For example, the popularity of pandas, a library for data analysis with Python, has exploded, as seen here.
Note that pandas takes off in 2012, which is the same year that we see Python's popularity begin to spike in the first figure.
Overall, it’s clear that
• Python is one of the most popular programming languages worldwide.
• Python is a major tool for scientific computing, accounting for a rapidly rising share of
scientific work around the globe.
1.3.3 Features
One nice feature of Python is its elegant syntax — we’ll see many examples later on.
Elegant code might sound superfluous but in fact it’s highly beneficial because it makes the
syntax easy to read and easy to remember.
Remembering how to handle routine tasks, such as reading from files or sorting dictionaries, means that you don't need to break your flow in order to hunt down the correct syntax.
Closely related to elegant syntax is an elegant design.
Features like iterators, generators, decorators, list comprehensions, etc. make Python highly
expressive, allowing you to get more done with less code.
Fundamental matrix and array processing capabilities are provided by the excellent NumPy
library.
NumPy provides the basic array data type plus some simple processing operations.
For example, let’s build some arrays
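A minimal sketch of the kind of construction intended here (any pair of arrays with a near-zero inner product would do — below we take the cosine and sine functions on a symmetric grid, so the inner product computed next is zero up to rounding):

import numpy as np

a = np.linspace(-np.pi, np.pi, 100)  # Evenly spaced grid from -π to π
b = np.cos(a)                        # Apply cosine to each element of a
c = np.sin(a)                        # Apply sine to each element of a

Now let's take the inner product of b and c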
In [2]: b @ c
Out[2]: 2.706168622523819e-16
The number you see here might vary slightly but it’s essentially zero.
(For older versions of Python and NumPy you need to use the np.dot function)
The SciPy library is built on top of NumPy and provides additional functionality.
For example, let's calculate ∫_{−2}^{2} 𝜙(𝑧) 𝑑𝑧 where 𝜙 is the standard normal density.
from scipy.stats import norm
from scipy.integrate import quad

ϕ = norm()
value, error = quad(ϕ.pdf, -2, 2)  # Integrate using Gaussian quadrature
value
Out[3]: 0.9544997361036417
1.4.2 Graphics
The most popular and comprehensive Python library for creating figures and graphs is Matplotlib.
• Plots, histograms, contour images, 3D, bar charts, etc., etc.
• Output in many formats (PDF, PNG, EPS, etc.)
• LaTeX integration
Example 2D plot with embedded LaTeX annotations
Example 3D plot
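As an illustrative sketch (the data and labels here are arbitrary), a minimal 2D plot with a LaTeX-annotated legend can be produced as follows:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
plt.plot(x, np.sin(x), label=r'$y = \sin(x)$')  # LaTeX in the legend label
plt.legend()
plt.show()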
Other graphics libraries include

• Bokeh
• VPython — 3D graphics and animations
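1.4.3 Symbolic Algebra

It's also useful to be able to manipulate symbolic expressions, and the SymPy library provides this functionality from within Python. As a sketch of the setup assumed by the cells below, we declare a pair of symbols and form an expression:

In [4]: from sympy import Symbol

x, y = Symbol('x'), Symbol('y')  # Treat 'x' and 'y' as algebraic symbols
x + x + x + y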
Out[4]: 3𝑥 + 𝑦
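We can also expand expressions — a sketch consistent with the output below:

In [5]: expression = (x + y)**2
expression.expand()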
Out[5]: 𝑥2 + 2𝑥𝑦 + 𝑦2
solve polynomials
solve(x**2 + x + 2)
limit(1 / x, x, 0)
Out[7]: ∞
In [8]: limit(sin(x) / x, x, 0)
Out[8]: 1
In [9]: diff(sin(x), x)
The beauty of importing this functionality into Python is that we are working within a fully
fledged programming language.
Can easily create tables of derivatives, generate LaTeX output, add it to figures, etc., etc.
1.4.4 Statistics
Python’s data manipulation and statistics libraries have improved rapidly over the last few
years.
Pandas
One of the most popular libraries for working with data is pandas.
Pandas is fast, efficient, flexible and well designed.
Here’s a simple example, using some fake data
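A sketch of a cell that produces the kind of table shown below (it assumes NumPy imported as np and pandas as pd; the random seed makes the draws reproducible):

import pandas as pd
np.random.seed(1234)

data = np.random.randn(5, 2)                # 5x2 matrix of N(0, 1) random draws
dates = pd.date_range('28/12/2010', periods=5)

df = pd.DataFrame(data, columns=('price', 'weight'), index=dates)
print(df)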
price weight
2010-12-28 0.471435 -1.190976
2010-12-29 1.432707 -0.312652
2010-12-30 -0.720589 0.887163
2010-12-31 0.859588 -0.636524
2011-01-01 0.015696 -2.242685
In [11]: df.mean()
Here's some example code that generates and plots a random graph, with node color determined by shortest path length from a central node.
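A sketch of such code (the function calls are standard NetworkX API; the graph size and styling details are illustrative):

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

# Generate a random geometric graph on the unit square
g = nx.random_geometric_graph(200, 0.12, seed=1234)
pos = nx.get_node_attributes(g, 'pos')          # Node positions

# Find the node closest to the center point (0.5, 0.5)
dists = [(x - 0.5)**2 + (y - 0.5)**2 for x, y in pos.values()]
ncenter = int(np.argmin(dists))

# Color each node by its shortest path length from the central node
path_lengths = nx.single_source_shortest_path_length(g, ncenter)
nx.draw_networkx_edges(g, pos, alpha=0.4)
nx.draw_networkx_nodes(g, pos,
                       nodelist=list(path_lengths.keys()),
                       node_color=list(path_lengths.values()),
                       node_size=120, alpha=0.5)
plt.show()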
Running your Python code on massive servers in the cloud is becoming easier and easier.
A nice example is Anaconda Enterprise.
See also
• Amazon Elastic Compute Cloud
• The Google App Engine (Python, Java, PHP or Go)
• Pythonanywhere
• Sagemath Cloud
Apart from the cloud computing options listed above, you might like to consider
• Parallel computing through IPython clusters.
• The Starcluster interface to Amazon’s EC2.
• GPU programming through PyCuda, PyOpenCL, Theano or similar.
There are many other interesting developments with scientific programming in Python.
Some representative examples include
• Jupyter — Python in your browser with code cells, embedded images, etc.
• Numba — Make Python run at the same speed as native machine code!
Chapter 2

Setting Up Your Python Environment
2.1 Contents
• Overview 2.2
• Anaconda 2.3
• Jupyter Notebooks 2.4
• Installing Libraries 2.5
• Working with Files 2.6
• Editors and IDEs 2.7
• Exercises 2.8
2.2 Overview

In this lecture, you will learn how to

1. get a Python environment up and running with all the necessary tools

2. execute simple Python commands

3. run a sample program

4. install the code libraries that underpin these lectures
2.3 Anaconda
The core Python package is easy to install but not what you should choose for these lectures.
These lectures require the entire scientific programming ecosystem, which
• the core installation doesn’t provide
• is painful to install one piece at a time
Hence the best approach for our purposes is to install a free Python distribution that contains the core language plus the most popular scientific libraries.

The distribution we recommend is Anaconda.
Installing Anaconda is straightforward: download the binary and follow the instructions.
Important points:
• Install the latest version.
• If you are asked during the installation process whether you’d like to make Anaconda
your default Python installation, say yes.
• Otherwise, you can accept all of the defaults.
Anaconda supplies a tool called conda to manage and upgrade your Anaconda packages.
One conda command you should execute regularly is the one that updates the whole Anaconda distribution.
As a practice run, please execute the following

1. Open up a terminal

2. Type conda update anaconda
Jupyter notebooks are one of the many possible ways to interact with Python and the scientific libraries.
They use a browser-based interface to Python with
• The ability to write and execute Python commands.
• Formatted output in the browser, including tables, figures, animation, etc.
• The option to mix in formatted text and mathematical expressions.
Because of these possibilities, Jupyter is fast turning into a major player in the scientific computing ecosystem.
Here’s an image showing execution of some code (borrowed from here) in a Jupyter notebook
You can find a nice example of the kinds of things you can do in a Jupyter notebook (such as
include maths and text) here.
While Jupyter isn’t the only way to code in Python, it’s great for when you wish to
• start coding in Python
• test new ideas or interact with small pieces of code
• share or collaborate scientific ideas with students or colleagues
These lectures are designed for executing in Jupyter notebooks.
Once you have installed Anaconda, you can start the Jupyter notebook.
Either
• search for Jupyter in your applications menu, or
• open up a terminal and type jupyter notebook
– Windows users should substitute “Anaconda command prompt” for “terminal” in
the previous line.
If you use the second option, you will see something like this
The notebook displays an active cell, into which you can type Python commands.
Let’s start with how to edit code and run simple programs.
Running Cells
Notice that in the previous figure the cell is surrounded by a green border.
This means that the cell is in edit mode.
As a result, you can type in Python code and it will appear in the cell.
When you're ready to execute the code in a cell, hit Shift-Enter instead of the usual Enter.
(Note: There are also menu and button options for running code in a cell that you can find
by exploring)
Modal Editing
The next thing to understand about the Jupyter notebook is that it uses a modal editing system.
This means that the effect of typing at the keyboard depends on which mode you are in.
The two modes are
1. Edit mode

2. Command mode
Python 3 introduced support for unicode characters, allowing the use of characters such as 𝛼
and 𝛽 in your code.
Unicode characters can be typed quickly in Jupyter using the tab key.
Try creating a new code cell and typing \alpha, then hitting the tab key on your keyboard.
A Test Program
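The plotting lines below assume some setup has already been run. A minimal sketch of data definitions that make them work (the names N, θ, radii, width and colors match the calls that follow):

import numpy as np
import matplotlib.pyplot as plt

N = 20
θ = np.linspace(0.0, 2 * np.pi, N, endpoint=False)  # Bar angles
radii = 10 * np.random.rand(N)                      # Random bar lengths
width = np.pi / 4 * np.random.rand(N)               # Random bar widths
colors = plt.cm.viridis(radii / 10)                 # Color bars by length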
ax = plt.subplot(111, projection='polar')
ax.bar(θ, radii, width=width, bottom=0.0, color=colors, alpha=0.5)
plt.show()
Don’t worry about the details for now — let’s just run it and see what happens.
The easiest way to run this code is to copy and paste it into a cell in the notebook.
(In older versions of Jupyter you might need to add the command %matplotlib inline
before you generate the figure)
Tab Completion
On-Line Help
Clicking on the top right of the lower split closes the on-line help.
Other Content
In addition to executing code, the Jupyter notebook allows you to embed text, equations, fig-
ures and even videos in the page.
For example, here we enter a mixture of plain text and LaTeX instead of code
Next we press Esc to enter command mode and then type m to indicate that we are writing Markdown, a mark-up language similar to (but simpler than) LaTeX.
(You can also use your mouse to select Markdown from the Code drop-down box just below
the list of menu items)
Now we press Shift+Enter to produce the rendered output
Notebook files are just text files structured in JSON and typically ending with .ipynb.
You can share them in the usual way that you share files — or by using web services such as
nbviewer.
The notebooks you see on that site are static html representations.
To run one, download it as an ipynb file by clicking on the download icon at the top right.
Save it somewhere, navigate to it from the Jupyter dashboard and then run as discussed
above.
QuantEcon has its own site for sharing Jupyter notebooks related to economics – QuantEcon
Notes.
Notebooks submitted to QuantEcon Notes can be shared with a link, and are open to comments and votes by the community.
2.5 Installing Libraries

Most of the libraries we need come with Anaconda. Other libraries can be installed with pip. For example, you can install the QuantEcon.py library by typing the following into a cell

!pip install --upgrade quantecon
Alternatively, you can type the following into a terminal

pip install --upgrade quantecon
2.6 Working with Files

To run an existing Python file from a notebook, you can use the run magic.

• For example, %run test.py will run the file test.py.

Using the run command is often easier than copy and paste.
(You might find that the % is unnecessary — use %automagic to toggle the need for %)
Note that Jupyter only looks for test.py in the present working directory (PWD).
If test.py isn’t in that directory, you will get an error.
Let’s look at a successful example, where we run a file test.py with contents:
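For instance, a file whose contents produce the output shown below is

for i in range(5):
    print('foobar')

Running %run test.py then gives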
foobar
foobar
foobar
foobar
foobar
Here
• pwd asks Jupyter to show the PWD (or %pwd — see the comment about automagic above)
– This is where Jupyter is going to look for files to run.
– Your output will look a bit different depending on your OS.
If you’re trying to run a file not in the present working directory, you’ll get an error.
To fix this error, you need to either

1. shift the file into the present working directory, or

2. change the present working directory to where the file is located.
One way to achieve the first option is to use the Upload button
• The button is on the top level dashboard, where Jupyter first opened to
• Look where the pointer is in this picture
It’s often convenient to be able to see your code before you run it.
The preceding discussion covers most of what you need to know to interact with this website.
However, as you start to write longer programs, you might want to experiment with your
workflow.
There are many different options and we mention them only in passing.
2.7.1 JupyterLab

JupyterLab is a browser-based interface from the Jupyter team that combines notebooks, text editors and terminals in a single window.

2.7.2 Text Editors

A text editor is an application that is specifically designed to work with text files — such as Python programs.
Nothing beats the power and efficiency of a good text editor for working with program text.
A good text editor will provide
• efficient text editing commands (e.g., copy, paste, search and replace)
2.7.3 Text Editors plus IPython Shell

The IPython shell has many of the features of the notebook: tab completion, color syntax, etc.
It also has command history through the arrow key.
The up arrow key brings previously typed commands to the prompt.
This saves a lot of typing…
Here's one setup, on a Linux box, with
• a file being edited in Vim
• an IPython shell next to it, to run the file
2.7.4 IDEs
IDEs are Integrated Development Environments, which allow you to edit, execute and interact with code from an integrated environment.
One of the most popular in recent times is VS Code, which is now available via Anaconda.
We hear good things about VS Code — please tell us about your experiences on the forum.
2.8 Exercises
2.8.1 Exercise 1
If Jupyter is still running, quit by using Ctrl-C at the terminal where you started it.
Now launch again, but this time using jupyter notebook --no-browser.
This should start the kernel without launching the browser.
Note also the startup message: It should give you a URL such as
http://localhost:8888 where the notebook is running.
Now

1. Start your browser, or open a new tab if it's already running.

2. Enter the URL from above (e.g. http://localhost:8888) in the address bar at the top.
This is an alternative way to start the notebook that can also be handy.
2.8.2 Exercise 2
As the first task, try

1. Installing Git.

2. Getting a copy of QuantEcon.py using Git.

For example, if you've installed the command line version, open up a terminal and enter

git clone https://github.com/QuantEcon/QuantEcon.py

(This is just git clone in front of the URL for the repository)
As the second task,
1. Sign up to GitHub.
2. Look into ‘forking’ GitHub repositories (forking means making your own copy of a
GitHub repository, stored on GitHub).
3. Fork QuantEcon.py.
4. Clone your fork to some local directory, make edits, commit them, and push them back
up to your forked GitHub repo.
Chapter 3

An Introductory Example
3.1 Contents
• Overview 3.2
• The Task: Plotting a White Noise Process 3.3
• Version 1 3.4
• Alternative Versions 3.5
• Exercises 3.6
• Solutions 3.7
We’re now ready to start learning the Python language itself.
The level of this and the next few lectures will suit those with some basic knowledge of programming.
But don’t give up if you have none—you are not excluded.
You just need to cover a few of the fundamentals of programming before returning here.
Good references for first-time programmers include:
• The first 5 or 6 chapters of How to Think Like a Computer Scientist.
• Automate the Boring Stuff with Python.
• The start of Dive into Python 3.
Note: These references offer help on installing Python but you should probably stick with the
method on our set up page.
You’ll then have an outstanding scientific computing environment (Anaconda) and be ready
to move on to the rest of our course.
3.2 Overview
In this lecture, we will write and then pick apart small Python programs.
The objective is to introduce you to basic Python syntax and data structures.
Deeper concepts will be covered in later lectures.
3.2.1 Prerequisites

The only prerequisite for this lecture is a working Python environment, set up as described in the previous chapter.

3.3 The Task: Plotting a White Noise Process

Suppose we want to simulate and plot the white noise process 𝜖_0, 𝜖_1, …, 𝜖_𝑇, where each draw 𝜖_𝑡 is independent standard normal.
In other words, we want to generate figures that look something like this:
3.4 Version 1
Here are a few lines of code that perform the task we set
import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(100)
plt.plot(x)
plt.show()
Let's break this down. The first two lines import the NumPy and Matplotlib libraries. After import numpy as np, NumPy's functionality is accessed via the np prefix

In [2]: np.sqrt(4)

Out[2]: 2.0

We could also just write import numpy, in which case the prefix is the full package name

In [3]: import numpy

numpy.sqrt(4)
Out[3]: 2.0
Packages

A Python package is a directory containing

1. files with Python code — called modules in Python speak

2. possibly some compiled code that can be accessed by Python (e.g., functions compiled from C or FORTRAN code)
3. a file called __init__.py that specifies what will be executed when we type import
package_name
In fact, you can find and explore the directory for NumPy on your computer easily enough if
you look around.
On this machine, it’s located in
anaconda3/lib/python3.6/site-packages/numpy
Subpackages

Consider the line np.sqrt(4). Here np refers to the package NumPy, while sqrt is a function that NumPy provides.

In [4]: np.sqrt(4)

Out[4]: 2.0

We can also import functionality directly, after which no package prefix is needed

In [5]: from numpy import sqrt

sqrt(4)
Out[5]: 2.0
3.5 Alternative Versions

Let's try writing some alternative versions of our first program, which plotted IID draws from the standard normal distribution.

3.5.1 A Version with a For Loop

Here's a version that illustrates for loops and Python lists

import numpy as np
import matplotlib.pyplot as plt

ts_length = 100
ϵ_values = []   # Empty list

for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)

plt.plot(ϵ_values)
plt.show()
In brief,
• The first line sets the desired length of the time series.
• The next line creates an empty list called ϵ_values that will store the 𝜖𝑡 values as we
generate them.
• The next three lines are the for loop, which repeatedly draws a new random number 𝜖𝑡
and appends it to the end of the list ϵ_values.
• The last two lines generate the plot and display it to the user.
Let’s study some parts of this program in more detail.
3.5.2 Lists
In [7]: x = [10, 'foo', False] # We can include heterogeneous data inside a list
type(x)
Out[7]: list
The first element of x is an integer, the next is a string, and the third is a Boolean value.
When adding a value to a list, we can use the syntax list_name.append(some_value)
In [8]: x

Out[8]: [10, 'foo', False]

In [9]: x.append(2.5)
x

Out[9]: [10, 'foo', False, 2.5]
Here append() is what’s called a method, which is a function “attached to” an object—in
this case, the list x.
We’ll learn all about methods later on, but just to give you some idea,
• Python objects such as lists, strings, etc. all have methods that are used to manipulate
the data contained in the object.
• String objects have string methods, list objects have list methods, etc.
Another useful list method is pop()
In [10]: x

Out[10]: [10, 'foo', False, 2.5]

In [11]: x.pop()

Out[11]: 2.5

In [12]: x

Out[12]: [10, 'foo', False]

Lists in Python are zero-based (as in C and many other languages), so the first element is referenced by x[0]

In [13]: x

Out[13]: [10, 'foo', False]
In [14]: x[0]
Out[14]: 10
In [15]: x[1]
Out[15]: 'foo'
Now let’s consider the for loop from the program above, which was
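for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)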
Python executes the two indented lines ts_length times before moving on.
These two lines are called a code block, since they comprise the “block” of code that we
are looping over.
Unlike most other languages, Python knows the extent of the code block only from indentation.
In our program, indentation decreases after line ϵ_values.append(e), telling Python that
this line marks the lower limit of the code block.
More on indentation below—for now, let’s look at another example of a for loop
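A sketch of such an example (the details are illustrative):

animals = ['dog', 'cat', 'bird']
for animal in animals:
    print("The plural of " + animal + " is " + animal + "s")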
This example helps to clarify how the for loop works: when we execute a loop of the form for variable_name in sequence:, the Python interpreter binds variable_name to each element of sequence in turn, and executes the code block once for each binding.
In discussing the for loop, we explained that the code blocks being looped over are delimited
by indentation.
In fact, in Python, all code blocks (i.e., those occurring inside loops, if clauses, function definitions, etc.) are delimited by indentation.
Thus, unlike most other languages, whitespace in Python code affects the output of the program.
Once you get used to it, this is a good thing: It
• forces clean, consistent indentation, improving readability
• removes clutter, such as the brackets or end statements used in other languages
On the other hand, it takes a bit of care to get right, so please remember:
• The line before the start of a code block always ends in a colon
– for i in range(10):
– if x > y:
– while x < 100:
– etc., etc.
• All lines in a code block must have the same amount of indentation.
• The Python standard is 4 spaces, and that’s what you should use.
Tabs vs Spaces
One small “gotcha” here is the mixing of tabs and spaces, which often leads to errors.
(Important: Within text files, the internal representation of tabs and spaces is not the same)
You can use your Tab key to insert 4 spaces, but you need to make sure it’s configured to do
so.
If you are using a Jupyter notebook, you will have no problems here.
Also, good text editors will allow you to configure the Tab key to insert spaces instead of tabs
— try searching online.
The for loop is the most common technique for iteration in Python.
But, for the purpose of illustration, let's modify the program above to use a while loop instead.
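Here is a sketch of the while-loop version — the same logic as before, with the loop counter managed by hand:

import numpy as np
import matplotlib.pyplot as plt

ts_length = 100
ϵ_values = []
i = 0
while i < ts_length:
    e = np.random.randn()
    ϵ_values.append(e)
    i = i + 1

plt.plot(ϵ_values)
plt.show()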
Note that
• the code block for the while loop is again delimited only by indentation
• the statement i = i + 1 can be replaced by i += 1
Now let’s go back to the for loop, but restructure our program to make the logic clearer.
To this end, we will break our program into two parts:

1. A user-defined function that generates a list of random variables.

2. The main part of the program, which calls this function and plots the result.
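Here is a sketch of the function part; its structure matches the line-by-line description below:

def generate_data(n):
    ϵ_values = []
    for i in range(n):
        e = np.random.randn()
        ϵ_values.append(e)
    return ϵ_values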
data = generate_data(100)
plt.plot(data)
plt.show()
Let’s go over this carefully, in case you’re not familiar with functions and how they work.
We have defined a function called generate_data() as follows
• def is a Python keyword used to start function definitions.
• def generate_data(n): indicates that the function is called generate_data and
that it has a single argument n.
• The indented code is a code block called the function body—in this case, it creates an
IID list of random draws using the same logic as before.
• The return keyword indicates that ϵ_values is the object that should be returned to
the calling code.
This whole function definition is read by the Python interpreter and stored in memory.
When the interpreter gets to the expression generate_data(100), it executes the function
body with n set equal to 100.
The net result is that the name data is bound to the list ϵ_values returned by the function.
3.5.7 Conditions
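Suppose we now want the function to be able to return either uniform or standard normal draws, depending on a flag passed by the caller. A sketch consistent with the notes below (the flag 'U' selects uniform draws):

def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        if generator_type == 'U':
            e = np.random.uniform(0, 1)
        else:
            e = np.random.randn()
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, 'U')
plt.plot(data)
plt.show()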
Hopefully, the syntax of the if/else clause is self-explanatory, with indentation again delimiting the extent of the code blocks.
Notes
• We are passing the argument U as a string, which is why we write it as 'U'.
• Notice that equality is tested with the == syntax, not =.
– For example, the statement a = 10 assigns the name a to the value 10.
– The expression a == 10 evaluates to either True or False, depending on the
value of a.
Now, there are several ways that we can simplify the code above.
For example, we can get rid of the conditionals altogether by just passing the desired generator type as a function.
To understand this, consider the following version.
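A sketch (the key loop is quoted again further below):

def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        e = generator_type()   # Call whatever function was passed in
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, np.random.uniform)
plt.plot(data)
plt.show()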
In Python, functions can be passed around like other objects. For example

In [22]: max(7, 2, 4)

Out[22]: 7
In [23]: m = max
m(7, 2, 4)
Out[23]: 7
Here we created another name for the built-in function max(), which could then be used in
identical ways.
In the context of our program, the ability to bind new names to functions means that there is
no problem passing a function as an argument to another function—as we did above.
We can also simplify the code for generating the list of random draws considerably by using
something called a list comprehension.
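A list comprehension builds a list in a single expression. For example (an illustrative sketch):

animals = ['dog', 'cat', 'bird']
plurals = [animal + 's' for animal in animals]   # ['dogs', 'cats', 'birds']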
In [25]: range(8)
Out[25]: range(0, 8)
Using this syntax, we can simplify the lines in our function

ϵ_values = []
for i in range(n):
e = generator_type()
ϵ_values.append(e)
into

ϵ_values = [generator_type() for i in range(n)]
3.6 Exercises
3.6.1 Exercise 1

Recall that 𝑛! is read as "𝑛 factorial" and defined as 𝑛! = 𝑛 × (𝑛 − 1) × ⋯ × 2 × 1. There are functions to compute this in various libraries, but let's write our own version as an exercise. In particular, write a function factorial such that factorial(n) returns 𝑛! for any positive integer 𝑛.
3.6.2 Exercise 2
The binomial random variable 𝑌 ∼ 𝐵𝑖𝑛(𝑛, 𝑝) represents the number of successes in 𝑛 binary
trials, where each trial succeeds with probability 𝑝.
Without any import besides from numpy.random import uniform, write a function
binomial_rv such that binomial_rv(n, p) generates one draw of 𝑌 .
Hint: If 𝑈 is uniform on (0, 1) and 𝑝 ∈ (0, 1), then the expression U < p evaluates to True
with probability 𝑝.
3.6.3 Exercise 3

Compute an approximation to 𝜋 using Monte Carlo. Use no imports besides NumPy.

Your hints are as follows:

• If 𝑈 is a bivariate uniform random variable on the unit square (0, 1)², then the probability that 𝑈 lies in a subset 𝐵 of (0, 1)² is equal to the area of 𝐵.

• If 𝑈_1, …, 𝑈_𝑛 are IID copies of 𝑈, then, as 𝑛 gets large, the fraction that falls in 𝐵 converges to the probability of landing in 𝐵.

• For a circle, area = 𝜋 × radius².
3.6.4 Exercise 4
Write a program that prints one realization of the following random device:
• Flip an unbiased coin 10 times.
• If 3 consecutive heads occur one or more times within this sequence, pay one dollar.
• If not, pay nothing.
Use no import besides from numpy.random import uniform.
3.6.5 Exercise 5
Your next task is to simulate and plot the correlated time series

𝑥_{𝑡+1} = 𝛼 𝑥_𝑡 + 𝜖_{𝑡+1}    where 𝑥_0 = 0 and 𝑡 = 0, …, 𝑇

The sequence of shocks {𝜖_𝑡} is assumed to be IID and standard normal. Set 𝑇 = 200 and 𝛼 = 0.9.
3.6.6 Exercise 6
To do the next exercise, you will need to know how to produce a plot legend.
The following example should be sufficient to convey the idea
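A sketch (the data are arbitrary — the point is the label argument and the legend call):

import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(100)
plt.plot(x, label="white noise")   # The label is picked up by the legend
plt.legend()
plt.show()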
Now, starting with your solution to exercise 5, plot three simulated time series, one for each
of the cases 𝛼 = 0, 𝛼 = 0.8 and 𝛼 = 0.98.
In particular, you should produce (modulo randomness) a figure that looks as follows
(The figure nicely illustrates how time series with the same one-step-ahead conditional volatilities, as these three processes have, can have very different unconditional volatilities.)
Use a for loop to step through the 𝛼 values.
Important hints:
• If you call the plot() function multiple times before calling show(), all of the lines
you produce will end up on the same figure.
– And if you omit the argument 'b-' to the plot function, Matplotlib will automatically select different colors for each line.
• The expression 'foo' + str(42) evaluates to 'foo42'.
3.7 Solutions
3.7.1 Exercise 1
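One solution, consistent with the call below, is the following sketch:

In [30]: def factorial(n):
    k = 1
    for i in range(n):
        k = k * (i + 1)
    return k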
factorial(4)
Out[30]: 24
3.7.2 Exercise 2
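A sketch that follows the hint given in the exercise:

In [31]: from numpy.random import uniform

def binomial_rv(n, p):
    count = 0
    for i in range(n):
        U = uniform()
        if U < p:
            count = count + 1   # Success with probability p
    return count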
binomial_rv(10, 0.5)
Out[31]: 6
3.7.3 Exercise 3
In [32]: n = 100000
count = 0
for i in range(n):
u, v = np.random.uniform(), np.random.uniform()
d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
if d < 0.5:
count += 1
area_estimate = count / n

print(area_estimate * 4)  # Area of circle is π r² with r = 0.5, so multiply by 4

3.14636
3.7.4 Exercise 4
In [33]: from numpy.random import uniform

payoff = 0
count = 0
for i in range(10):
U = uniform()
count = count + 1 if U < 0.5 else 0
if count == 3:
payoff = 1
print(payoff)
3.7.5 Exercise 5
The next line embeds all subsequent figures in the browser itself

%matplotlib inline

In [34]: α = 0.9
ts_length = 200
current_x = 0
x_values = []
for i in range(ts_length + 1):
x_values.append(current_x)
current_x = α * current_x + np.random.randn()
plt.plot(x_values)
plt.show()
3.7.6 Exercise 6

In [35]: αs = [0.0, 0.8, 0.98]
ts_length = 200

for α in αs:
x_values = []
current_x = 0
for i in range(ts_length):
x_values.append(current_x)
current_x = α * current_x + np.random.randn()
plt.plot(x_values, label=f'α = {α}')
plt.legend()
plt.show()
Chapter 4
Python Essentials
4.1 Contents

• Data Types 4.2
• Input and Output 4.3
• Iterating 4.4
• Comparisons and Logical Operators 4.5
• More Functions 4.6
• Coding Style and PEP8 4.7
• Exercises 4.8
• Solutions 4.9
4.2 Data Types

We've already met several built-in Python data types, such as strings, integers, floats and lists.
Let’s learn a bit more about them.
4.2.1 Primitive Data Types

One simple data type is Boolean values, which can be either True or False
In [1]: x = True
x
Out[1]: True
In the next line of code, the interpreter evaluates the expression on the right of = and binds y to this value

In [2]: y = 100 < 10
y

Out[2]: False
In [3]: type(y)
Out[3]: bool
In arithmetic expressions, True is converted to 1 and False to 0, so Booleans can be added and multiplied

In [4]: x + y

Out[4]: 1

In [5]: x * y

Out[5]: 0

In [6]: True + True

Out[6]: 2

In [7]: bools = [True, True, False, True]  # List of Boolean values
sum(bools)

Out[7]: 3
The two most common data types used to represent numbers are integers and floats
In [8]: a, b = 1, 2
c, d = 2.5, 10.0
type(a)
Out[8]: int
In [9]: type(c)
Out[9]: float
Computers distinguish between the two because, while floats are more informative, arithmetic
operations on integers are faster and more accurate.
As long as you’re using Python 3.x, division of integers yields floats
In [10]: 1 / 2
Out[10]: 0.5
To return only the integer part of the division of two integers in Python 3.x, use this syntax:
In [11]: 1 // 2
Out[11]: 0
Complex numbers are another primitive data type in Python

In [12]: x = complex(1, 2)
y = complex(2, 1)
x * y
Out[12]: 5j
4.2.2 Containers
Python has several basic types for storing collections of (possibly heterogeneous) data.
We’ve already discussed lists.
A related data type is tuples, which are “immutable” lists
In [13]: x = ('a', 'b')  # Parentheses instead of the square brackets
x = 'a', 'b'             # Or no brackets — the meaning is identical

In [14]: type(x)
Out[14]: tuple
In Python, an object is called immutable if, once created, the object cannot be changed.
Conversely, an object is mutable if it can still be altered after creation.
Python lists are mutable
In [15]: x = [1, 2]
x[0] = 10
x
Out[15]: [10, 2]
In [16]: x = (1, 2)
x[0] = 10
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-d1b2647f6c81> in <module>
      1 x = (1, 2)
----> 2 x[0] = 10

TypeError: 'tuple' object does not support item assignment
We’ll say more about the role of mutable and immutable data a bit later.
Tuples (and lists) can be "unpacked" as follows

In [17]: integers = (10, 20, 30)
x, y, z = integers
x

Out[17]: 10
In [18]: y
Out[18]: 20
Slice Notation
To access multiple elements of a list or tuple, you can use Python’s slice notation.
For example,
In [19]: a = [2, 4, 6, 8]
a[1:]
Out[19]: [4, 6, 8]
In [20]: a[1:3]
Out[20]: [4, 6]
The general rule is that a[m:n] returns n − m elements, starting at a[m]. Negative numbers are also permissible

In [21]: a[-2:]  # Last two elements of the list

Out[21]: [6, 8]
In [22]: s = 'foobar'
s[-3:] # Select the last three elements
Out[22]: 'bar'
Two other container types we should mention before moving on are sets and dictionaries.
Dictionaries are much like lists, except that the items are named instead of numbered

In [23]: d = {'name': 'Frodo', 'age': 33}
type(d)

Out[23]: dict
In [24]: d['age']
Out[24]: 33
Sets are unordered collections without duplicates, and set methods provide the usual set-theoretic operations

In [25]: s1 = {'a', 'b'}
type(s1)

Out[25]: set

In [26]: s2 = {'b', 'c'}
s1.issubset(s2)

Out[26]: False
In [27]: s1.intersection(s2)
Out[27]: {'b'}
Let’s briefly review reading and writing to text files, starting with writing
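A sketch of the writing step (the file name is illustrative):

f = open('newfile.txt', 'w')   # Open 'newfile.txt' for writing
f.write('Testing\n')           # Here '\n' means new line
f.write('Testing again')
f.close()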
Here
• The built-in function open() creates a file object for writing to.
• Both write() and close() are methods of file objects.
Where is this file that we’ve created?
Recall that Python maintains a concept of the present working directory (pwd) that can be located from within Jupyter or IPython via
In [30]: %pwd
Out[30]: '/home/ubuntu/repos/lecture-source-py/_build/jupyterpdf/executed'
Now let's read the contents of the file back in

In [31]: f = open('newfile.txt', 'r')
out = f.read()
out

Out[31]: 'Testing\nTesting again'

In [32]: print(out)
Testing
Testing again
4.3.1 Paths
Note that if newfile.txt is not in the present working directory then this call to open()
fails.
In this case, you can shift the file to the pwd or specify the full path to the file
f = open('insert_full_path_to_file/newfile.txt', 'r')
4.4 Iterating
One of the most important tasks in computing is stepping through a sequence of data and
performing a given action.
One of Python’s strengths is its simple, flexible interface to this kind of iteration via the for
loop.
Many Python objects are “iterable”, in the sense that they can be looped over.
To give an example, let's write the file us_cities.txt, which lists US cities and their population, to the present working directory.
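A sketch of such a cell, using the %%file cell magic (the city data are illustrative, in the name: population format assumed by the program below):

%%file us_cities.txt
new york: 8244910
los angeles: 3819702
chicago: 2707120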
Overwriting us_cities.txt
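Here is a sketch of the program under discussion: the data file is opened and each line is reformatted and printed.

data_file = open('us_cities.txt', 'r')
for line in data_file:
    city, population = line.split(':')            # Tuple unpacking, one pair per line
    city = city.title()                           # Capitalize city names
    population = '{0:,}'.format(int(population))  # Add commas to numbers
    print(city.ljust(15) + population)
data_file.close()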
Here format() is a string method used for inserting variables into strings.
The reformatting of each line is the result of three different string methods, the details of
which can be left till later.
The interesting part of this program for us is line 2, which shows that
1. The file object data_file is iterable, in the sense that it can be placed to the right of
in within a for loop.
One thing you might have noticed is that Python tends to favor looping without explicit indexing.
For example,
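this sketch prints the squares of a list's elements

x_values = [1, 2, 3]   # Some iterable x
for x in x_values:
    print(x * x)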
1
4
9
is preferred to
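for i in range(len(x_values)):
    print(x_values[i] * x_values[i])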
1
4
9
When you compare these two alternatives, you can see why the first one is preferred.
Python provides some facilities to simplify looping without indices.
One is zip(), which is used for stepping through pairs from two sequences.
For example, try running the following code
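A sketch (the data are illustrative):

countries = ('Japan', 'Korea', 'China')
cities = ('Tokyo', 'Seoul', 'Beijing')

for country, city in zip(countries, cities):
    print(f'The capital of {country} is {city}')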
The zip() function is also useful for creating dictionaries — for example
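names = ['Tom', 'John']
marks = ['E', 'F']
dict(zip(names, marks))   # {'Tom': 'E', 'John': 'F'}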
If we actually need the index from a list, one option is to use enumerate().
To understand what enumerate() does, consider the following example
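letter_list = ['a', 'b', 'c']

for index, letter in enumerate(letter_list):
    print(f"letter_list[{index}] = '{letter}'")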
letter_list[0] = 'a'
letter_list[1] = 'b'
letter_list[2] = 'c'
4.5 Comparisons and Logical Operators

4.5.1 Comparisons
Many different kinds of expressions evaluate to one of the Boolean values (i.e., True or
False).
A common type is comparisons, such as
In [41]: x, y = 1, 2
x < y
Out[41]: True
In [42]: x > y
Out[42]: False
In addition, Python allows us to chain inequalities

In [43]: 1 < 2 < 3

Out[43]: True

In [44]: 1 <= 2 <= 3

Out[44]: True
In [45]: x = 1 # Assignment
x == 2 # Comparison
Out[45]: False
In [46]: 1 != 2
Out[46]: True
Note that when testing conditions, we can use any valid Python expression
In [47]: x = 'yes' if 42 else 'no'
x

Out[47]: 'yes'

In [48]: x = 'yes' if [] else 'no'
x

Out[48]: 'no'

What's going on here? The rule is:

• Expressions that evaluate to zero, empty sequences or containers (strings, lists, etc.) and None are all equivalent to False.

• All other values are equivalent to True.

4.5.2 Combining Expressions

We can combine expressions using and, or and not — the standard logical connectives

In [49]: 1 < 2 and 'f' in 'foo'

Out[49]: True

In [50]: 1 < 2 and 'g' in 'foo'

Out[50]: False

In [51]: 1 < 2 or 'g' in 'foo'

Out[51]: True

In [52]: not True

Out[52]: False

In [53]: not not True

Out[53]: True
Remember
• P and Q is True if both are True, else False
• P or Q is False if both are False, else True
Let’s talk a bit more about functions, which are all important for good programming style.
Python has a number of built-in functions that are available without import.
We have already met some

In [54]: max(19, 20)

Out[54]: 20

In [55]: range(4)  # Creates a range iterator

Out[55]: range(0, 4)
In [56]: list(range(4)) # will evaluate the range iterator and create a list
Out[56]: [0, 1, 2, 3]
In [57]: str(22)
Out[57]: '22'
In [58]: type(22)
Out[58]: int
Two more useful built-in functions are any() and all()

In [59]: bools = False, True, True
all(bools)  # True if all are True and False otherwise

Out[59]: False

In [60]: any(bools)  # False if all are False and True otherwise

Out[60]: True
User-defined functions are important for improving the clarity of your code by
• separating different strands of logic
• facilitating code reuse
(Writing the same thing twice is almost always a bad idea)
The basics of user-defined functions were discussed here.
Functions without a return statement automatically return the special Python object None.
4.6.3 Docstrings
Python has a system for adding comments to functions, modules, etc. called docstrings.
The nice thing about docstrings is that they are available at run-time.
For example, run the following cell, which defines a function with a docstring

def f(x):
    """
    This function squares its argument
    """
    return x**2

After running this, the docstring is available

In [63]: f?
Type: function
In [64]: f??
Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Source:
def f(x):
"""
This function squares its argument
"""
return x**2
With one question mark we bring up the docstring, and with two we get the source code as
well.
4.6.4 One-Line Functions: lambda

The lambda keyword is used to create simple functions on one line. For example, the definitions

def f(x):
    return x**3

and

f = lambda x: x**3

are entirely equivalent. To see why lambda is useful, suppose that we want to calculate ∫_{0}^{2} 𝑥³ 𝑑𝑥 and have SciPy's quad function do the numerical integration. The syntax is

quad(lambda x: x**3, 0, 2)
Here the function created by lambda is said to be anonymous because it was never given a
name.
If you did the exercises in the previous lecture, you would have come across the statement

plt.plot(x, 'b-', label="white noise")

In this call to Matplotlib's plot function, notice that the last argument is passed in name=argument syntax.
This is called a keyword argument, with label being the keyword.
Non-keyword arguments are called positional arguments, since their meaning is determined by
order
• plot(x, 'b-', label="white noise") is different from plot('b-', x,
label="white noise")
Keyword arguments are particularly useful when a function has a lot of arguments, in which
case it’s hard to remember the right order.
You can adopt keyword arguments in user-defined functions with no difficulty.
The next example illustrates the syntax
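A sketch consistent with the calls below:

def f(x, a=1, b=1):
    return a + b * x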
The keyword argument values we supplied in the definition of f become the default values
In [69]: f(2)
Out[69]: 3
In [70]: f(2, a=4, b=5)

Out[70]: 14
4.7 Coding Style and PEP8

To learn more about the Python programming philosophy type import this at the prompt.
Among other things, Python strongly favors consistency in programming style.
We’ve all heard the saying about consistency and little minds.
In programming, as in mathematics, the opposite is true
• A mathematical paper where the symbols ∪ and ∩ were reversed would be very hard to
read, even if the author told you so on the first page.
In Python, the standard style is set out in PEP8.
(Occasionally we'll deviate from PEP8 in these lectures to better match mathematical notation)
4.8 Exercises
4.8.1 Exercise 1
Part 1: Given two numeric lists or tuples x_vals and y_vals of equal length, compute their
inner product using zip().
Part 2: In one line, count the number of even numbers in 0,…,99.
• Hint: x % 2 returns 0 if x is even, 1 otherwise.
Part 3: Given pairs = ((2, 5), (4, 2), (9, 8), (12, 10)), count the number of
pairs (a, b) such that both a and b are even.
4.8.2 Exercise 2
𝑝(𝑥) = 𝑎_0 + 𝑎_1𝑥 + 𝑎_2𝑥² + ⋯ + 𝑎_𝑛𝑥ⁿ = ∑_{𝑖=0}^{𝑛} 𝑎_𝑖 𝑥^𝑖        (1)
Write a function p such that p(x, coeff) computes the value in (1) given a point x and a list of coefficients coeff.
Try to use enumerate() in your loop.
4.8.3 Exercise 3
Write a function that takes a string as an argument and returns the number of capital letters
in the string.
Hint: 'foo'.upper() returns 'FOO'.
4.8.4 Exercise 4
Write a function that takes two sequences seq_a and seq_b as arguments and returns True
if every element in seq_a is also an element of seq_b, else False.
• By “sequence” we mean a list, a tuple or a string.
• Do the exercise without using sets and set methods.
4.8.5 Exercise 5
When we cover the numerical libraries, we will see they include many alternatives for interpolation and function approximation.
Nevertheless, let’s write our own function approximation routine as an exercise.
In particular, without using any imports, write a function linapprox that takes as arguments
• A function f mapping some interval [𝑎, 𝑏] into ℝ.
• Two scalars a and b providing the limits of this interval.
• An integer n determining the number of grid points.
• A number x satisfying a <= x <= b.
and returns the piecewise linear interpolation of f at x, based on n evenly spaced grid points
a = point[0] < point[1] < ... < point[n-1] = b.
Aim for clarity, not efficiency.
4.9 Solutions
4.9.1 Exercise 1
Part 1 Solution:
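One solution (the data are illustrative):

x_vals = [1, 2, 3]
y_vals = [1, 1, 1]
sum([x * y for x, y in zip(x_vals, y_vals)])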
Out[71]: 6
This also works as a generator expression, without building the intermediate list

In [72]: sum(x * y for x, y in zip(x_vals, y_vals))

Out[72]: 6
Part 2 Solution:
One solution is
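sum([x % 2 == 0 for x in range(100)])   # Each True counts as 1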
Out[73]: 50
or, equivalently, the generator-expression variant

In [74]: sum(x % 2 == 0 for x in range(100))

Out[74]: 50
Some less natural alternatives that nonetheless help to illustrate the flexibility of list comprehensions are
In [75]: len([x for x in range(100) if x % 2 == 0])

Out[75]: 50
and
In [76]: sum([1 for x in range(100) if x % 2 == 0])

Out[76]: 50
Part 3 Solution
In [77]: pairs = ((2, 5), (4, 2), (9, 8), (12, 10))
sum([x % 2 == 0 and y % 2 == 0 for x, y in pairs])
Out[77]: 2
4.9.2 Exercise 2
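A sketch using enumerate(), as suggested in the exercise:

In [79]: def p(x, coeff):
    return sum(a * x**i for i, a in enumerate(coeff))

p(1, (2, 4))   # 2 + 4x evaluated at x = 1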
Out[79]: 6
4.9.3 Exercise 3
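Here's one solution:

In [80]: def f(string):
    count = 0
    for letter in string:
        if letter == letter.upper() and letter.isalpha():
            count += 1
    return count

f('The Rain in Spain')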
Out[80]: 3
An alternative, more pythonic solution:

In [81]: def count_uppercase_chars(s):
    return sum([c.isupper() for c in s])

count_uppercase_chars('The Rain in Spain')

Out[81]: 3
4.9.4 Exercise 4
Here’s a solution:
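def f(seq_a, seq_b):
    is_subset = True
    for a in seq_a:
        if a not in seq_b:
            is_subset = False
    return is_subset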
# == test == #

print(f([1, 2], [1, 2, 3]))
print(f([1, 2, 3], [1, 2]))

True
False
Of course, if we use the sets data type then the solution is easier
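def f(seq_a, seq_b):
    return set(seq_a).issubset(set(seq_b))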
4.9.5 Exercise 5
def linapprox(f, a, b, n, x):
    """
    Evaluates the piecewise linear interpolant of f at x on the
    interval [a, b], with n evenly spaced grid points.

    Parameters
    ==========
    f : function
        The function to approximate

    a, b : scalars
        The endpoints of the interval

    n : integer
        Number of grid points

    x : scalar
        The point at which to evaluate

    Returns
    =======
    A float. The interpolant evaluated at x
    """
length_of_interval = b - a
num_subintervals = n - 1
step = length_of_interval / num_subintervals
# === find first grid point larger than x === #
point = a
while point <= x:
    point += step
# === x must lie between the gridpoints (point - step) and point === #
u, v = point - step, point

# === Return the linear interpolation of f between u and v === #
return f(u) + (x - u) * (f(v) - f(u)) / (v - u)
Chapter 5

OOP I: Introduction to Object Oriented Programming

5.1 Contents
• Overview 5.2
• Objects 5.3
• Summary 5.4
5.2 Overview
Python is a pragmatic language that blends object-oriented and procedural styles, rather than
taking a purist approach.
However, at a foundational level, Python is object-oriented.
In particular, in Python, everything is an object.
In this lecture, we explain what that statement means and why it matters.
5.3 Objects
In Python, an object is a collection of data and instructions held in computer memory that
consists of
1. a type

2. a unique identity

3. data (i.e., its contents)

4. methods
5.3.1 Type
Python provides for different types of objects, to accommodate different categories of data.
For example

In [1]: s = 'This is a string'
type(s)

Out[1]: str

In [2]: x = 42   # Now let's create an integer
type(x)

Out[2]: int

The type of an object matters for many expressions. For example, the addition operator between two strings means concatenation

In [3]: '300' + 'cc'

Out[3]: '300cc'

On the other hand, between two numbers it means ordinary addition

In [4]: 300 + 400

Out[4]: 700

Consider the following expression

In [5]: '300' + 400
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-263a89d2d982> in <module>
----> 1 '300' + 400

TypeError: can only concatenate str (not "int") to str
Here we are mixing types, and it’s unclear to Python whether the user wants to
• convert '300' to an integer and then add it to 400, or
• convert 400 to string and then concatenate it with '300'
Some languages might try to guess but Python is strongly typed
• Type is important, and implicit type conversion is rare.
• Python will respond instead by raising a TypeError.
To avoid the error, you need to clarify by changing the relevant type.
For example,
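int('300') + 400    # To add as numbers, change the string to an integer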
5.3.2 Identity
In Python, each object has a unique identifier, which helps Python (and us) keep track of the
object.
The identity of an object can be obtained via the id() function
In [6]: y = 2.5
z = 2.5
id(y)
Out[6]: 140495967880488
In [7]: id(z)
Out[7]: 140495967880536
In this example, y and z happen to have the same value (i.e., 2.5), but they are not the
same object.
The identity of an object is in fact just the address of the object in memory.
5.3.3 Object Content: Data and Attributes
If we set x = 42 then we create an object of type int that contains the data 42.
In fact, it contains more, as the following example shows
In [8]: x = 42
x
Out[8]: 42
In [9]: x.imag
Out[9]: 0
In [10]: x.__class__
Out[10]: int
When Python creates this integer object, it stores with it various auxiliary information, such
as the imaginary part, and the type.
Any name following a dot is called an attribute of the object to the left of the dot.
• e.g.,imag and __class__ are attributes of x.
We see from this example that objects have attributes that contain auxiliary information.
They also have attributes that act like functions, called methods.
These attributes are important, so let’s discuss them in-depth.
5.3.4 Methods

Methods are attributes of objects that are callable — i.e., attributes that can be called as functions

In [11]: x = ['foo', 'bar']
callable(x.append)

Out[11]: True
In [12]: callable(x.__doc__)
Out[12]: False
Methods typically act on the data contained in the object they belong to, or combine that
data with other data
In [13]: s = 'This is a string'

In [14]: s.lower()

Out[14]: 'this is a string'
Another example is square bracket assignment, as in x[0] = 'aa'. It doesn't look like there are any methods used here, but in fact the square bracket assignment notation is just a convenient interface to a method call.
What actually happens is that Python calls the __setitem__ method, as follows
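A sketch:

x = ['a', 'b']
x.__setitem__(0, 'aa')   # Equivalent to x[0] = 'aa'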
(If you wanted to you could modify the __setitem__ method, so that square bracket assignment does something totally different)
5.4 Summary

In Python, everything in memory is treated as an object. This includes not just lists and strings, but also less obvious things, such as functions once they have been read into memory. For example, consider the function

def f(x): return x**2

In [19]: type(f)
Out[19]: function
In [20]: id(f)
Out[20]: 140495967322592
In [21]: f.__name__
Out[21]: 'f'
We can see that f has type, identity, attributes and so on—just like any other object.
It also has methods.
One example is the __call__ method, which just evaluates the function
In [22]: f.__call__(3)
Out[22]: 9
Modules loaded into memory are also treated as objects

In [23]: import math

id(math)
Out[23]: 140496031121800
This uniform treatment of data in Python (everything is an object) helps keep the language
simple and consistent.
Chapter 6

OOP II: Building Classes
6.1 Contents
• Overview 6.2
• OOP Review 6.3
• Defining Your Own Classes 6.4
• Special Methods 6.5
• Exercises 6.6
• Solutions 6.7
6.2 Overview
As discussed in an earlier lecture, in the OOP paradigm, data and functions are bundled together into "objects".
An example is a Python list, which not only stores data but also knows how to sort itself, etc.
In [2]: x = [1, 5, 4]
x.sort()
x
Out[2]: [1, 4, 5]
As we now know, sort is a function that is “part of” the list object — and hence called a
method.
If we want to make our own types of objects we need to use class definitions.
A class definition is a blueprint for a particular class of objects (e.g., lists, strings or complex
numbers).
It describes
• What kind of data the class stores
• What methods it has for acting on these data
An object or instance is a realization of the class, created from the blueprint
• Each instance has its own unique data.
• Methods set out in the class definition act on this (and other) data.
In Python, the data and methods of an object are collectively referred to as attributes.
Attributes are accessed via “dotted attribute notation”
• object_name.data
• object_name.method_name()
In the example
In [3]: x = [1, 5, 4]
x.sort()
x.__class__
Out[3]: list
• x is an object or instance, created from the definition for Python lists, but with its own
particular data.
• x.sort() and x.__class__ are two attributes of x.
• dir(x) can be used to view all the attributes of x.
OOP is useful for the same reason that abstraction is useful: for recognizing and exploiting
the common structure.
For example,
• a Markov chain consists of a set of states and a collection of transition probabilities for
moving across states
• a general equilibrium theory consists of a commodity space, preferences, technologies,
and an equilibrium definition
• a game consists of a list of players, lists of actions available to each player, player payoffs as functions of all players' actions, and a timing protocol
These are all abstractions that collect together “objects” of the same “type”.
Recognizing common structure allows us to employ common tools.
In economic theory, this might be a proposition that applies to all games of a certain type.
In Python, this might be a method that’s useful for all Markov chains (e.g., simulate).
When we use OOP, the simulate method is conveniently bundled together with the Markov
chain object.
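6.4 Defining Your Own Classes

As a first example, let's build a Consumer class with a wealth attribute (its instance data) plus earn and spend methods. The sketch below is consistent with the usage that follows — earn(y) adds y to wealth, while spend(x) either deducts x or reports insufficient funds:

class Consumer:

    def __init__(self, w):
        "Initialize consumer with w dollars of wealth"
        self.wealth = w

    def earn(self, y):
        "The consumer earns y dollars"
        self.wealth += y

    def spend(self, x):
        "The consumer spends x dollars if feasible"
        new_wealth = self.wealth - x
        if new_wealth < 0:
            print("Insufficient funds")
        else:
            self.wealth = new_wealth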
Usage
In [5]: c1 = Consumer(10)  # Create instance with initial wealth 10
c1.spend(5)
c1.wealth

Out[5]: 5
In [6]: c1.earn(15)
c1.spend(100)
Insufficient funds
We can of course create multiple instances each with its own data
In [7]: c1 = Consumer(10)
c2 = Consumer(12)
c2.spend(4)
c2.wealth
Out[7]: 8
In [8]: c1.wealth
Out[8]: 10
In [9]: c1.__dict__

Out[9]: {'wealth': 10}
In [10]: c2.__dict__
Out[10]: {'wealth': 8}
When we access or set attributes we’re actually just modifying the dictionary maintained by
the instance.
Self
If you look at the Consumer class definition again you’ll see the word self throughout the
code.
The rules with self are that
• Any instance data should be prepended with self
– e.g., the earn method references self.wealth rather than just wealth
• Any method defined within the class should have self as its first argument
– e.g., def earn(self, y) rather than just def earn(y)
• Any method referenced within the class should be called as self.method_name
There are no examples of the last rule in the preceding code but we will see some shortly.
Details
In this section, we look at some more formal details related to classes and self
• You might wish to skip to the next section on first pass of this lecture.
• You can return to these details after you’ve familiarized yourself with more examples.
Methods actually live inside a class object formed when the interpreter reads the class definition
Note how the three methods __init__, earn and spend are stored in the class object.
Consider the following code
In [12]: c1 = Consumer(10)
c1.earn(10)
c1.wealth
Out[12]: 20
When you call earn via c1.earn(10) the interpreter passes the instance c1 and the argument 10 to Consumer.earn.
In fact, the following are equivalent
• c1.earn(10)
• Consumer.earn(c1, 10)
In the function call Consumer.earn(c1, 10) note that c1 is the first argument.
Recall that in the definition of the earn method, self is the first parameter
The end result is that self is bound to the instance c1 inside the function call.
That’s why the statement self.wealth += y inside earn ends up modifying c1.wealth.
For our next example, let’s write a simple class to implement the Solow growth model.
The Solow growth model is a neoclassical growth model where the amount of capital stock
per capita 𝑘𝑡 evolves according to the rule
𝑘_{𝑡+1} = [𝑠 𝑧 𝑘_𝑡^𝛼 + (1 − 𝛿) 𝑘_𝑡] / (1 + 𝑛)        (1)
Here
• 𝑠 is an exogenously given savings rate
• 𝑧 is a productivity parameter
• 𝛼 is capital’s share of income
• 𝑛 is the population growth rate
• 𝛿 is the depreciation rate
The steady state of the model is the 𝑘 that solves (1) when 𝑘𝑡+1 = 𝑘𝑡 = 𝑘.
Here’s a class that implements this model.
Some points of interest in the code are
• An instance maintains a record of its current capital stock in the variable self.k.
• The h method implements the right-hand side of (1).
• The update method uses h to update capital as per (1).
– Notice how inside update the reference to the local method h is self.h.
The methods steady_state and generate_sequence are fairly self-explanatory
"""
def __init__(self, n=0.05, # population growth rate
s=0.25, # savings rate
δ=0.1, # depreciation rate
α=0.3,  # share of capital
z=2.0, # productivity
k=1.0):  # current capital stock

        self.n, self.s, self.δ, self.α, self.z, self.k = n, s, δ, α, z, k
def h(self):
"Evaluate the h function"
# Unpack parameters (get rid of self to simplify notation)
n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
# Apply the update rule
return (s * z * self.k**α + (1 - δ) * self.k) / (1 + n)
def update(self):
"Update the current state (i.e., the capital stock)."
self.k = self.h()
def steady_state(self):
"Compute the steady state value of capital."
# Unpack parameters (get rid of self to simplify notation)
n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
# Compute and return steady state
return ((s * z) / (n + δ))**(1 / (1 - α))
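The generate_sequence method referred to above might look as follows (a sketch consistent with the plotting code below):

    def generate_sequence(self, t):
        "Generate and return a time series of length t"
        path = []
        for i in range(t):
            path.append(self.k)
            self.update()
        return path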
Here’s a little program that uses the class to compute time series from two different initial
conditions.
The common steady state is also plotted for comparison
In [15]: s1 = Solow()
s2 = Solow(k=8.0)
T = 60
fig, ax = plt.subplots(figsize=(9, 6))
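# Plot the two time paths and the common steady state (a sketch consistent
# with the surrounding description)
for s in s1, s2:
    lb = f'capital series from initial state {s.k}'
    ax.plot(s.generate_sequence(T), 'o-', lw=2, alpha=0.6, label=lb)

ax.plot([s1.steady_state()] * T, 'k-', label='steady state')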
ax.set_xlabel('$k_{t+1}$', fontsize=14)
ax.set_ylabel('$k_t$', fontsize=14)
ax.legend()
plt.show()
Next, let’s write a class for a simple one good market where agents are price takers.
The market consists of the following objects:
• A linear demand curve 𝑄 = 𝑎𝑑 − 𝑏𝑑 𝑝
• A linear supply curve 𝑄 = 𝑎𝑧 + 𝑏𝑧 (𝑝 − 𝑡)
Here
• 𝑝 is price paid by the consumer, 𝑄 is quantity and 𝑡 is a per-unit tax.
• Other symbols are demand and supply parameters.
The class provides methods to compute various values of interest, including competitive equi-
librium price and quantity, tax revenue raised, consumer surplus and producer surplus.
class Market:

    def __init__(self, ad, bd, az, bz, tax):
        """
        Set up market parameters. All parameters are scalars. See the
        discussion above for interpretation.
        """
        self.ad, self.bd, self.az, self.bz, self.tax = ad, bd, az, bz, tax
        if ad < az:
            raise ValueError('Insufficient demand.')
def price(self):
"Return equilibrium price"
        return (self.ad - self.az + self.bz * self.tax) / (self.bd + self.bz)
def quantity(self):
"Compute equilibrium quantity"
return self.ad - self.bd * self.price()
def consumer_surp(self):
"Compute consumer surplus"
# == Compute area under inverse demand function == #
integrand = lambda x: (self.ad / self.bd) - (1 / self.bd) * x
area, error = quad(integrand, 0, self.quantity())
return area - self.price() * self.quantity()
def producer_surp(self):
"Compute producer surplus"
# == Compute area above inverse supply curve, excluding tax == #
integrand = lambda x: -(self.az / self.bz) + (1 / self.bz) * x
area, error = quad(integrand, 0, self.quantity())
return (self.price() - self.tax) * self.quantity() - area
def taxrev(self):
"Compute tax revenue"
return self.tax * self.quantity()
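The inverse demand and supply curves used in the plotting code below can be added as methods (a sketch, mirroring the integrands above):

    def inverse_demand(self, x):
        "Compute inverse demand"
        return self.ad / self.bd - (1 / self.bd) * x

    def inverse_supply(self, x):
        "Compute inverse supply curve"
        return -(self.az / self.bz) + (1 / self.bz) * x + self.tax

    def inverse_supply_no_tax(self, x):
        "Compute inverse supply curve without tax"
        return -(self.az / self.bz) + (1 / self.bz) * x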
Here's a short program that uses this class to plot an inverse demand curve together with inverse supply curves with and without taxes
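First we create an instance (the parameter values are illustrative):

baseline_params = 15, .5, -2, .5, 3    # ad, bd, az, bz, tax
m = Market(*baseline_params)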
q_max = m.quantity() * 2
q_grid = np.linspace(0.0, q_max, 100)
pd = m.inverse_demand(q_grid)
ps = m.inverse_supply(q_grid)
psno = m.inverse_supply_no_tax(q_grid)
fig, ax = plt.subplots()
ax.plot(q_grid, pd, lw=2, alpha=0.6, label='demand')
ax.plot(q_grid, ps, lw=2, alpha=0.6, label='supply')
ax.plot(q_grid, psno, '--k', lw=2, alpha=0.6, label='supply without tax')
ax.set_xlabel('quantity', fontsize=14)
ax.set_xlim(0, q_max)
ax.set_ylabel('price', fontsize=14)
ax.legend(loc='lower right', frameon=False, fontsize=14)
plt.show()
The class can also be used to compute, for example, the deadweight loss from the tax. With the baseline parameters above, this evaluates to

Out[21]: 1.125
Let’s look at one more example, related to chaotic dynamics in nonlinear systems.
One simple transition rule that can generate complex dynamics is the logistic map

𝑥_{𝑡+1} = 𝑟 𝑥_𝑡 (1 − 𝑥_𝑡),    𝑥_0 ∈ [0, 1],  𝑟 ∈ [0, 4]        (2)
Let’s write a class for generating time series from this model.
Here’s one implementation
class Chaos:
    """
    Models the dynamical system with x_{t+1} = r x_t (1 - x_t)
    """
    def __init__(self, x0, r):
        "Initialize with state x0 and parameter r"
        self.x, self.r = x0, r

    def update(self):
        "Apply the map to update state."
        self.x = self.r * self.x * (1 - self.x)
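The usage below also calls a generate_sequence method; a minimal sketch is

    def generate_sequence(self, n):
        "Generate and return a sequence of length n"
        path = []
        for i in range(n):
            path.append(self.x)
            self.update()
        return path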
Here's code that uses the class to generate and plot a time series (the parameters x0 = 0.1 and r = 4.0 generate chaotic dynamics)

ch = Chaos(0.1, 4.0)
ts_length = 250

fig, ax = plt.subplots()
ax.set_xlabel('$t$', fontsize=14)
ax.set_ylabel('$x_t$', fontsize=14)
x = ch.generate_sequence(ts_length)
ax.plot(range(ts_length), x, 'bo-', alpha=0.5, lw=2, label='$x_t$')
plt.show()
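The bifurcation diagram below plots long-run outcomes of 𝑥_𝑡 for each value of 𝑟. A sketch of code that produces it (the grid and point styling are illustrative):

fig, ax = plt.subplots()
ch = Chaos(0.1, 4)
r = 2.5
while r < 4:
    ch.r = r
    t = ch.generate_sequence(1000)[950:]   # Keep only the long-run values
    ax.plot([r] * len(t), t, 'b.', ms=0.6)
    r = r + 0.005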
ax.set_xlabel('$r$', fontsize=16)
ax.set_ylabel('$x_t$', fontsize=16)
plt.show()
Python provides special methods with which some neat tricks can be performed.
For example, recall that lists and tuples have a notion of length and that this length can be
queried via the len function
In [26]: x = (10, 20)
len(x)

Out[26]: 2
If you want to provide a return value for the len function when applied to your user-defined
object, use the __len__ special method
class Foo:

    def __len__(self):
        return 42
Now we get
In [28]: f = Foo()
len(f)
Out[28]: 42
Another special method is __call__, which sets instances up to behave like functions

class Foo:

    def __call__(self, x):
        return x + 42

After running this we get

In [30]: f = Foo()
f(8)  # Exactly equivalent to f.__call__(8)
Out[30]: 50
6.6 Exercises
6.6.1 Exercise 1
The empirical cumulative distribution function (ecdf) corresponding to a sample {𝑋_𝑖}_{𝑖=1}^{𝑛} is defined as
𝐹_𝑛(𝑥) := (1/𝑛) ∑_{𝑖=1}^{𝑛} 1{𝑋_𝑖 ≤ 𝑥}        (𝑥 ∈ ℝ)        (3)

Here 1{𝑋_𝑖 ≤ 𝑥} is an indicator function (one if 𝑋_𝑖 ≤ 𝑥 and zero otherwise) and hence 𝐹_𝑛(𝑥) is the fraction of the sample that falls below 𝑥.
The Glivenko–Cantelli Theorem states that, provided that the sample is IID, the ecdf 𝐹_𝑛 converges to the true distribution function 𝐹.
Implement 𝐹𝑛 as a class called ECDF, where
• A given sample {𝑋_𝑖}_{𝑖=1}^{𝑛} are the instance data, stored as self.observations.
• The class implements a __call__ method that returns 𝐹𝑛 (𝑥) for any 𝑥.
Your code should work as follows (modulo randomness)
6.6.2 Exercise 2
Consider the polynomial

𝑝(𝑥) = 𝑎_0 + 𝑎_1𝑥 + 𝑎_2𝑥² + ⋯ + 𝑎_𝑁𝑥^𝑁 = ∑_{𝑛=0}^{𝑁} 𝑎_𝑛 𝑥ⁿ        (𝑥 ∈ ℝ)        (4)
The instance data for the class Polynomial will be the coefficients (in the case of (4), the numbers 𝑎_0, …, 𝑎_𝑁).
Provide methods that

1. Evaluate the polynomial (4), returning 𝑝(𝑥) for any 𝑥.

2. Differentiate the polynomial, replacing the original coefficients with those of its derivative 𝑝′.
6.7 Solutions
6.7.1 Exercise 1

class ECDF:

    def __init__(self, observations):
        self.observations = observations

    def __call__(self, x):
        counter = 0.0
        for obs in self.observations:
            if obs <= x:
                counter += 1
        return counter / len(self.observations)
In [32]: # == test == #
from random import uniform

samples = [uniform(0, 1) for i in range(10)]
F = ECDF(samples)
print(F(0.5))  # Fraction of the sample below 0.5

F.observations = [uniform(0, 1) for i in range(1000)]
print(F(0.5))
0.4
0.484
6.7.2 Exercise 2
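A sketch of the class: the constructor stores the coefficients and __call__ evaluates (4), while the differentiate method (shown next) modifies the coefficients in place.

class Polynomial:

    def __init__(self, coefficients):
        "Create an instance p with p(x) = sum of coefficients[i] * x**i"
        self.coefficients = coefficients

    def __call__(self, x):
        "Evaluate the polynomial at x"
        y = 0
        for i, a in enumerate(self.coefficients):
            y += a * x**i
        return y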
def differentiate(self):
"Reset self.coefficients to those of p' instead of p."
new_coefficients = []
for i, a in enumerate(self.coefficients):
new_coefficients.append(i * a)
# Remove the first element, which is zero
del new_coefficients[0]
# And reset coefficients data to new values
self.coefficients = new_coefficients
return new_coefficients
Part II

The Scientific Libraries
Chapter 7

Python for Scientific Computing

7.1 Contents
• Overview 7.2
• Scientific Libraries 7.3
• The Need for Speed 7.4
• Vectorization 7.5
• Beyond Vectorization 7.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
7.2 Overview
Let’s briefly review Python’s scientific libraries, starting with why we need them.
7.3 Scientific Libraries

One obvious reason we use scientific libraries is that they implement routines we want to use.
For example, it’s almost always better to use an existing routine for root finding than to write
a new one from scratch.
(For standard algorithms, efficiency is maximized if the community can coordinate on a common set of implementations, written by experts and tuned by users to be as fast and robust as possible.)
But this is not the only reason that we use Python’s scientific libraries.
Another is that pure Python, while flexible and elegant, is not fast.
So we need libraries that are designed to accelerate execution of Python code.
As we’ll see below, there are now Python libraries that can do this extremely well.
In terms of popularity, the big four in the world of scientific Python libraries are
• NumPy
• SciPy
• Matplotlib
• Pandas
For us, there's another (relatively new) library that will also be essential for numerical computing:
• Numba
Over the next few lectures we’ll see how to use these libraries.
But first, let’s quickly review how they fit together.
• NumPy forms the foundations by providing a basic array data type (think of vectors
and matrices) and functions for acting on these arrays (e.g., matrix multiplication).
• SciPy builds on NumPy by adding the kinds of numerical methods that are routinely
used in science (interpolation, optimization, root finding, etc.).
• Matplotlib is used to generate figures, with a focus on plotting data stored in NumPy
arrays.
• Pandas provides types and functions for empirical work (e.g., manipulating data).
• Numba accelerates execution via JIT compilation — we'll learn about this soon.
7.4 The Need for Speed

The upside is that, compared to low-level languages, Python is typically faster to write, less error-prone and easier to debug.
The downside is that Python is harder to optimize — that is, turn into fast machine code —
than languages like C or Fortran.
Indeed, the standard implementation of Python (called CPython) cannot match the speed of
compiled languages such as C or Fortran.
Does that mean that we should just switch to C or Fortran for everything?
The answer is: No, no and one hundred times no!
(This is what you should say to the senior professor insisting that the model needs to be
rewritten in Fortran or C++.)
There are two reasons why:
First, for any given program, relatively few lines are ever going to be time-critical.
Hence it is far more efficient to write most of our code in a high productivity language like
Python.
Second, even for those lines of code that are time-critical, we can now achieve the same speed
as C or Fortran using Python’s scientific libraries.
Before we learn how to do this, let's try to understand why plain vanilla Python is slower than C or Fortran.
This will, in turn, help us figure out how to speed things up.
Dynamic Typing

Consider this Python operation

In [2]: a, b = 10, 10
a + b
Out[2]: 20
Even for this simple operation, the Python interpreter has a fair bit of work to do.
For example, in the statement a + b, the interpreter has to know which operation to invoke.
If a and b are strings, then a + b requires string concatenation
Out[3]: 'foobar'
(We say that the operator + is overloaded — its action depends on the type of the objects on
which it acts)
As a result, Python must check the type of the objects and then call the correct operation.
This involves substantial overheads.
Static Types
#include <stdio.h>
int main(void) {
int i;
int sum = 0;
for (i = 1; i <= 10; i++) {
sum = sum + i;
}
printf("sum = %d\n", sum);
return 0;
}
In C or Fortran, these integers would typically be stored in an array, which is a simple data
structure for storing homogeneous data.
Such an array is stored in a single contiguous block of memory
• In modern computers, memory addresses are allocated to each byte (one byte = 8 bits).
• For example, a 64 bit integer is stored in 8 bytes of memory.
• An array of 𝑛 such integers occupies 8𝑛 consecutive memory slots.
Moreover, the compiler is made aware of the data type by the programmer.
• In this case 64 bit integers
Hence, each successive data point can be accessed by shifting forward in memory space by a
known and fixed amount.
• In this case 8 bytes
7.5 Vectorization
There is a clever method called vectorization that can be used to speed up high level languages in numerical applications.
The key idea is to send array processing operations in batch to pre-compiled and efficient na-
tive machine code.
The machine code itself is typically compiled from carefully optimized C or Fortran.
For example, when working in a high level language, the operation of inverting a large ma-
trix can be subcontracted to efficient machine code that is pre-compiled for this purpose and
supplied to users as part of a package.
This clever idea dates back to MATLAB, which uses vectorization extensively.
Vectorization can greatly accelerate many numerical computations (but not all, as we shall
see).
Let’s see how vectorization works in Python, using NumPy.
Next let’s try some non-vectorized code, which uses a native Python loop to generate, square
and then sum a large number of random variables:
In [6]: n = 1_000_000
108 CHAPTER 7. PYTHON FOR SCIENTIFIC COMPUTING
In [7]: %%time
In [8]: %%time
x = np.random.uniform(0, 1, n)
y = np.sum(x**2)
As you can see, the second code block runs much faster. Why?
The second code block breaks the loop down into three basic operations
1. draw n uniforms
2. square them
3. sum them
Many functions provided by NumPy are so-called universal functions — also called ufuncs.
This means that they
• map scalars into scalars, as expected
• map arrays into arrays, acting element-wise
For example, np.cos is a ufunc:
7.5. VECTORIZATION 109
In [9]: np.cos(1.0)
Out[9]: 0.5403023058681398
cos(𝑥2 + 𝑦2 )
𝑓(𝑥, 𝑦) = and 𝑎 = 3
1 + 𝑥2 + 𝑦 2
Here’s a plot of 𝑓
In [13]: %%time
m = -np.inf
for x in grid:
for y in grid:
z = f(x, y)
if z > m:
m = z
In [14]: %%time
x, y = np.meshgrid(grid, grid)
np.max(f(x, y))
CPU times: user 16.5 ms, sys: 24.2 ms, total: 40.7 ms
Wall time: 41 ms
Out[14]: 0.9999819641085747
In the vectorized version, all the looping takes place in compiled code.
As you can see, the second version is much faster.
(We’ll make it even faster again later on, using more scientific programming tricks.)
NumPy
8.1 Contents
• Overview 8.2
• NumPy Arrays 8.3
• Operations on Arrays 8.4
• Additional Functionality 8.5
• Exercises 8.6
• Solutions 8.7
“Let’s be clear: the work of science has nothing whatever to do with consensus.
Consensus is the business of politics. Science, on the contrary, requires only one
investigator who happens to be right, which means that he or she has results that
are verifiable by reference to the real world. In science consensus is irrelevant.
What is relevant is reproducible results.” – Michael Crichton
8.2 Overview
8.2.1 References
113
114 CHAPTER 8. NUMPY
In [2]: a = np.zeros(3)
a
In [3]: type(a)
Out[3]: numpy.ndarray
NumPy arrays are somewhat like native Python lists, except that
• Data must be homogeneous (all elements of the same type).
• These types must be one of the data types (dtypes) provided by NumPy.
The most important of these dtypes are:
• float64: 64 bit floating-point number
• int64: 64 bit integer
• bool: 8 bit True or False
There are also dtypes to represent complex numbers, unsigned integers, etc.
On modern machines, the default dtype for arrays is float64
In [4]: a = np.zeros(3)
type(a[0])
Out[4]: numpy.float64
Out[5]: numpy.int64
8.3. NUMPY ARRAYS 115
In [6]: z = np.zeros(10)
Here z is a flat array with no dimension — neither row nor column vector.
The dimension is recorded in the shape attribute, which is a tuple
In [7]: z.shape
Out[7]: (10,)
Here the shape tuple has only one element, which is the length of the array (tuples with one
element end with a comma).
To give it dimension, we can change the shape attribute
Out[8]: array([[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.]])
In [9]: z = np.zeros(4)
z.shape = (2, 2)
z
In the last case, to make the 2 by 2 array, we could also pass a tuple to the zeros() func-
tion, as in z = np.zeros((2, 2)).
In [10]: z = np.empty(3)
z
116 CHAPTER 8. NUMPY
In [12]: z = np.identity(2)
z
In addition, NumPy arrays can be created from Python lists, tuples, etc. using np.array
In [14]: type(z)
Out[14]: numpy.ndarray
See also np.asarray, which performs a similar function, but does not make a distinct copy
of data already in a NumPy array.
Out[17]: True
Out[18]: False
To read in the array data from a text file containing numeric data use np.loadtxt or
np.genfromtxt—see the documentation for details.
8.3. NUMPY ARRAYS 117
In [19]: z = np.linspace(1, 2, 5)
z
In [20]: z[0]
Out[20]: 1.0
In [22]: z[-1]
Out[22]: 2.0
In [24]: z[0, 0]
Out[24]: 1
In [25]: z[0, 1]
Out[25]: 2
And so on.
Note that indices are still zero-based, to maintain compatibility with Python sequences.
Columns and rows can be extracted as follows
In [26]: z[0, :]
In [27]: z[:, 1]
118 CHAPTER 8. NUMPY
In [28]: z = np.linspace(2, 4, 5)
z
In [30]: z
In [32]: z[d]
Out[32]: array([2.5, 3. ])
In [33]: z = np.empty(3)
z
In [34]: z[:] = 42
z
Out[37]: 10
Out[38]: 2.5
Out[39]: 4
Out[40]: 3
Out[43]: 1.25
Out[44]: 1.118033988749895
In [46]: z = np.linspace(2, 4, 5)
z
In [47]: z.searchsorted(2.2)
Out[47]: 1
Many of the methods discussed above have equivalent functions in the NumPy namespace
In [49]: np.sum(a)
Out[49]: 10
In [50]: np.mean(a)
Out[50]: 2.5
In [52]: a * b
In [53]: a + 10
8.4. OPERATIONS ON ARRAYS 121
In [54]: a * 10
In [56]: A + 10
In [57]: A * B
With Anaconda’s scientific Python package based around Python 3.5 and above, one can use
the @ symbol for matrix multiplication, as follows:
(For older versions of Python and NumPy you need to use the np.dot function)
We can also use @ to take the inner product of two flat arrays
Out[59]: 50
In [61]: A @ (0, 1)
Mutability leads to the following behavior (which can be shocking to MATLAB program-
mers…)
In [64]: a = np.random.randn(3)
a
In [65]: b = a
b[0] = 0.0
a
Making Copies
In [66]: a = np.random.randn(3)
a
In [67]: b = np.copy(a)
b
In [68]: b[:] = 1
b
In [69]: a
NumPy provides versions of the standard functions log, exp, sin, etc. that act element-
wise on arrays
124 CHAPTER 8. NUMPY
In [71]: n = len(z)
y = np.empty(n)
for i in range(n):
y[i] = np.sin(z[i])
Because they act element-wise on arrays, these functions are called vectorized functions.
In NumPy-speak, they are also called ufuncs, which stands for “universal functions”.
As we saw above, the usual arithmetic operations (+, *, etc.) also work element-wise, and
combining these with the ufuncs gives a very large set of fast element-wise functions.
In [72]: z
In [75]: x = np.random.randn(4)
x
f = np.vectorize(f)
f(x) # Passing the same vector x as in the previous example
However, this approach doesn’t always obtain the same speed as a more carefully crafted vec-
torized function.
8.5. ADDITIONAL FUNCTIONALITY 125
8.5.2 Comparisons
In [79]: y[0] = 5
z == y
In [80]: z != y
In [82]: z > 3
In [83]: b = z > 3
b
In [84]: z[b]
8.5.3 Sub-packages
NumPy provides some additional functionality related to scientific programming through its
sub-packages.
We’ve already seen how we can generate random variables using np.random
y.mean()
Out[86]: 5.027
Out[87]: -2.0000000000000004
Out[88]: array([[-2. , 1. ],
[ 1.5, -0.5]])
Much of this functionality is also available in SciPy, a collection of modules that are built on
top of NumPy.
We’ll cover the SciPy versions in more detail soon.
For a comprehensive list of what’s available in NumPy see this documentation.
8.6 Exercises
8.6.1 Exercise 1
𝑁
𝑝(𝑥) = 𝑎0 + 𝑎1 𝑥 + 𝑎2 𝑥2 + ⋯ 𝑎𝑁 𝑥𝑁 = ∑ 𝑎𝑛 𝑥𝑛 (1)
𝑛=0
Earlier, you wrote a simple function p(x, coeff) to evaluate (1) without considering effi-
ciency.
Now write a new function that does the same job, but uses NumPy arrays and array opera-
tions for its computations, rather than any form of Python loop.
(Such functionality is already implemented as np.poly1d, but for the sake of the exercise
don’t use this class)
• Hint: Use np.cumprod()
8.6. EXERCISES 127
8.6.2 Exercise 2
def sample(q):
a = 0.0
U = uniform(0, 1)
for i in range(len(q)):
if a < U <= a + q[i]:
return i
a = a + q[i]
If you can’t see how this works, try thinking through the flow for a simple example, such as q
= [0.25, 0.75] It helps to sketch the intervals on paper.
Your exercise is to speed it up using NumPy, avoiding explicit loops
• Hint: Use np.searchsorted and np.cumsum
If you can, implement the functionality as a class called discreteRV, where
• the data for an instance of the class is the vector of probabilities q
• the class has a draw() method, which returns one draw according to the algorithm de-
scribed above
If you can, write the method so that draw(k) returns k draws from q.
8.6.3 Exercise 3
2. Add a method that plots the ECDF over [𝑎, 𝑏], where 𝑎 and 𝑏 are method parameters.
128 CHAPTER 8. NUMPY
8.7 Solutions
8.7.1 Exercise 1
Let’s test it
In [92]: x = 2
coef = np.linspace(2, 4, 3)
print(coef)
print(p(x, coef))
# For comparison
q = np.poly1d(np.flip(coef))
print(q(x))
[2. 3. 4.]
24.0
24.0
8.7.2 Exercise 2
class DiscreteRV:
"""
Generates an array of draws from a discrete random variable with�
↪vector of
probabilities given by q.
"""
"""
Returns k draws from q. For each such draw, the value i is returned
with probability q[i].
"""
return self.Q.searchsorted(uniform(0, 1, size=k))
The logic is not obvious, but if you take your time and read it slowly, you will understand.
There is a problem here, however.
Suppose that q is altered after an instance of discreteRV is created, for example by
The problem is that Q does not change accordingly, and Q is the data used in the draw
method.
To deal with this, one option is to compute Q every time the draw method is called.
But this is inefficient relative to computing Q once-off.
A better option is to use descriptors.
A solution from the quantecon library using descriptors that behaves as we desire can be
found here.
8.7.3 Exercise 3
In [95]: """
Modifies ecdf.py from QuantEcon to add in a plot method
"""
class ECDF:
"""
One-dimensional empirical distribution function given a vector of
observations.
Parameters
----------
observations : array_like
An array of observations
Attributes
----------
observations : array_like
An array of observations
"""
self.observations = np.asarray(observations)
Parameters
----------
x : scalar(float)
The x at which the ecdf is evaluated
Returns
-------
scalar(float)
Fraction of the sample less than x
"""
return np.mean(self.observations <= x)
Parameters
----------
a : scalar(float), optional(default=None)
Lower endpoint of the plot interval
b : scalar(float), optional(default=None)
Upper endpoint of the plot interval
"""
In [96]: X = np.random.randn(1000)
F = ECDF(X)
F.plot()
8.7. SOLUTIONS 131
132 CHAPTER 8. NUMPY
Chapter 9
Matplotlib
9.1 Contents
• Overview 9.2
• The APIs 9.3
• More Features 9.4
• Further Reading 9.5
• Exercises 9.6
• Solutions 9.7
9.2 Overview
We’ve already generated quite a few figures in these lectures using Matplotlib.
Matplotlib is an outstanding graphics library, designed for scientific computing, with
• high-quality 2D and 3D plots
• output in all the usual formats (PDF, PNG, etc.)
• LaTeX integration
• fine-grained control over all aspects of presentation
• animation, etc.
133
134 CHAPTER 9. MATPLOTLIB
Here’s the kind of easy example you might find in introductory treatments
This is simple and convenient, but also somewhat limited and un-Pythonic.
For example, in the function calls, a lot of objects get created and passed around without
making themselves known to the programmer.
Python programmers tend to prefer a more explicit style of programming (run import this
in a code block and look at the second line).
This leads us to the alternative, object-oriented Matplotlib API.
Here’s the code corresponding to the preceding figure using the object-oriented API
9.3.3 Tweaks
We’ve also used alpha to make the line slightly transparent—which makes it look smoother.
The location of the legend can be changed by replacing ax.legend() with
ax.legend(loc='upper center').
Matplotlib has a huge array of functions and features, which you can discover over time as
you have need for them.
We mention just a few.
fig, ax = plt.subplots()
x = np.linspace(-4, 4, 150)
for i in range(3):
m, s = uniform(-1, 1), uniform(1, 2)
y = norm.pdf(x, loc=m, scale=s)
current_label = f'$\mu = {m:.2}$'
ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
ax.legend()
plt.show()
9.4. MORE FEATURES 139
9.4.3 3D Plots
ygrid = xgrid
x, y = np.meshgrid(xgrid, ygrid)
Perhaps you will find a set of customizations that you regularly use.
Suppose we usually prefer our axes to go through the origin, and to have a grid.
Here’s a nice example from Matthew Doty of how the object-oriented API can be used to
build a custom subplots function that implements these changes.
Read carefully through the code and see if you can follow what’s going on
fig, ax = plt.subplots()
ax.grid()
return fig, ax
1. calls the standard plt.subplots function internally to generate the fig, ax pair,
9.6 Exercises
9.6.1 Exercise 1
9.7 Solutions
9.7.1 Exercise 1
for θ in θ_vals:
ax.plot(x, np.cos(np.pi * θ * x) * np.exp(- x))
plt.show()
144 CHAPTER 9. MATPLOTLIB
Chapter 10
SciPy
10.1 Contents
• Overview 10.2
• SciPy versus NumPy 10.3
• Statistics 10.4
• Roots and Fixed Points 10.5
• Optimization 10.6
• Integration 10.7
• Linear Algebra 10.8
• Exercises 10.9
• Solutions 10.10
10.2 Overview
SciPy builds on top of NumPy to provide common tools for scientific programming such as
• linear algebra
• numerical integration
• interpolation
• optimization
• distributions and random number generation
• signal processing
• etc., etc
Like NumPy, SciPy is stable, mature and widely used.
Many SciPy routines are thin wrappers around industry-standard Fortran libraries such as
LAPACK, BLAS, etc.
It’s not really necessary to “learn” SciPy as a whole.
A more common approach is to get some idea of what’s in the library and then look up docu-
mentation as required.
In this lecture, we aim only to highlight some useful parts of the package.
145
146 CHAPTER 10. SCIPY
SciPy is a package that contains various tools that are built on top of NumPy, using its array
data type and related functionality.
In fact, when we import SciPy we also get NumPy, as can be seen from this excerpt the
SciPy initialization file:
However, it’s more common and better practice to use NumPy functionality explicitly
a = np.identity(3)
10.4 Statistics
𝑥(𝑎−1) (1 − 𝑥)(𝑏−1)
𝑓(𝑥; 𝑎, 𝑏) = 1
(0 ≤ 𝑥 ≤ 1) (1)
∫0 𝑢(𝑎−1) (1 − 𝑢)(𝑏−1) 𝑑𝑢
Sometimes we need access to the density itself, or the cdf, the quantiles, etc.
10.4. STATISTICS 147
For this, we can use scipy.stats, which provides all of this functionality as well as random
number generation in a single consistent interface.
Here’s an example of usage
fig, ax = plt.subplots()
ax.hist(obs, bins=40, density=True)
ax.plot(grid, q.pdf(grid), 'k-', linewidth=2)
plt.show()
The object q that represents the distribution has additional useful methods, including
Out[5]: 0.26656768000000003
Out[6]: 0.6339134834642708
In [7]: q.mean()
Out[7]: 0.5
148 CHAPTER 10. SCIPY
The general syntax for creating these objects that represent distributions (of type
rv_frozen) is
name = scipy.stats.distribution_name(shape_parameters,
loc=c, scale=d)
fig, ax = plt.subplots()
ax.hist(obs, bins=40, density=True)
ax.plot(grid, beta.pdf(grid, 5, 5), 'k-', linewidth=2)
plt.show()
x = np.random.randn(200)
y = 2 * x + 0.1 * np.random.randn(200)
gradient, intercept, r_value, p_value, std_err = linregress(x, y)
gradient, intercept
fig, ax = plt.subplots()
ax.plot(x, f(x))
ax.axhline(ls='--', c='k')
plt.show()
10.5.1 Bisection
In [12]: bisect(f, 0, 1)
Out[12]: 0.408294677734375
bisect(f, 0, 1)
Out[13]: 0.4082935042806639
Unlike bisection, the Newton-Raphson method uses local slope information in an attempt to
increase the speed of convergence.
Let’s investigate this using the same function 𝑓 defined above.
With a suitable initial condition for the search we get convergence:
Out[14]: 0.40829350427935673
Out[15]: 0.7001700000000279
2. Check diagnostics
In scipy.optimize, the function brentq is such a hybrid method and a good default
brentq(f, 0, 1)
Out[16]: 0.40829350427936706
Here the correct solution is found and the speed is better than bisection:
28.3 µs ± 676 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
152 CHAPTER 10. SCIPY
105 µs ± 1.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Out[19]: array(1.)
If you don’t get good results, you can always switch back to the brentq root finder, since
the fixed point of a function 𝑓 is the root of 𝑔(𝑥) ∶= 𝑥 − 𝑓(𝑥).
10.6 Optimization
Out[20]: 0.0
10.7. INTEGRATION 153
10.7 Integration
Out[21]: 0.33333333333333337
In fact, quad is an interface to a very standard numerical integration routine in the Fortran
library QUADPACK.
It uses Clenshaw-Curtis quadrature, based on expansion in terms of Chebychev polynomials.
There are other options for univariate integration—a useful one is fixed_quad, which is fast
and hence works well inside for loops.
There are also functions for multivariate integration.
See the documentation for more details.
We saw that NumPy provides a module for linear algebra called linalg.
SciPy also provides a module for linear algebra with the same name.
The latter is not an exact superset of the former, but overall it has more functionality.
We leave you to investigate the set of available routines.
154 CHAPTER 10. SCIPY
10.9 Exercises
10.9.1 Exercise 1
10.10 Solutions
10.10.1 Exercise 1
Out[23]: 0.408294677734375
Chapter 11
Numba
11.1 Contents
• Overview 11.2
• Compiling Functions 11.3
• Decorators and “nopython” Mode ??
• Compiling Classes 11.5
• Alternatives to Numba 11.6
• Summary and Comments 11.7
• Exercises 11.8
• Solutions 11.9
In addition to what’s in Anaconda, this lecture will need the following libraries:
Please also make sure that you have the latest version of Anaconda, since old versions are a
common source of errors.
Let’s start with some imports:
%matplotlib inline
11.2 Overview
In an earlier lecture we learned about vectorization, which is one method to improve speed
and efficiency in numerical work.
Vectorization involves sending array processing operations in batch to efficient low-level code.
However, as discussed previously, vectorization has several weaknesses.
One is that it is highly memory-intensive when working with large amounts of data.
Another is that the set of algorithms that can be entirely vectorized is not universal.
155
156 CHAPTER 11. NUMBA
As stated above, Numba’s primary use is compiling functions to fast native machine code
during runtime.
11.3.1 An Example
Let’s consider a problem that is difficult to vectorize: generating the trajectory of a difference
equation given an initial condition.
We will take the difference equation to be the quadratic map
𝑥𝑡+1 = 𝛼𝑥𝑡 (1 − 𝑥𝑡 )
In [3]: α = 4.0
Here’s the plot of a typical trajectory, starting from 𝑥0 = 0.1, with 𝑡 on the x-axis
x = qm(0.1, 250)
fig, ax = plt.subplots()
ax.plot(x, 'b-', lw=2, alpha=0.8)
ax.set_xlabel('time', fontsize=16)
plt.show()
11.3. COMPILING FUNCTIONS 157
qm_numba = jit(qm)
In [6]: n = 10_000_000
qe.tic()
qm(0.1, int(n))
time1 = qe.toc()
In [7]: qe.tic()
qm_numba(0.1, int(n))
time2 = qe.toc()
In [8]: qe.tic()
qm_numba(0.1, int(n))
time2 = qe.toc()
Out[9]: 133.9559653226916
This kind of speed gain is huge relative to how simple and clear the implementation is.
Numba attempts to generate fast machine code using the infrastructure provided by the
LLVM Project.
It does this by inferring type information on the fly.
(See our earlier lecture on scientific computing for a discussion of types.)
The basic idea is this:
• Python is very flexible and hence we could call the function qm with many types.
– e.g., x0 could be a NumPy array or a list, n could be an integer or a float, etc.
• This makes it hard to pre-compile the function.
• However, when we do actually call the function, by executing qm(0.5, 10), say, the types
of x0 and n become clear.
• Moreover, the types of other variables in qm can be inferred once the input is known.
• So the strategy of Numba and other JIT compilers is to wait until this moment, and
then compile the function.
That’s why it is called “just-in-time” compilation.
Note that, if you make the call qm(0.5, 10) and then follow it with qm(0.9, 20), compilation
only takes place on the first call.
The compiled code is then cached and recycled as required.
In the code above we created a JIT compiled version of qm via the call
(We will explain all about decorators in a later lecture but you can skip the details at this
stage.)
Let’s see how this is done.
To target a function for JIT compilation we can put @jit before the function definition.
Here’s what this looks like for qm
In [11]: @jit
def qm(x0, n):
x = np.empty(n+1)
x[0] = x0
for t in range(n):
x[t+1] = α * x[t] * (1 - x[t])
return x
@njit
def qm(x0, n):
x = np.empty(n+1)
x[0] = x0
for t in range(n):
x[t+1] = 4 * x[t] * (1 - x[t])
return x
In [15]: solow_data = [
('n', float64),
('s', float64),
('δ', float64),
('α', float64),
('z', float64),
('k', float64)
]
@jitclass(solow_data)
class Solow:
r"""
Implements the Solow growth model with the update rule
"""
def __init__(self, n=0.05, # population growth rate
s=0.25, # savings rate
δ=0.1, # depreciation rate
α=0.3, # share of labor
11.5. COMPILING CLASSES 161
z=2.0, # productivity
k=1.0): # current capital stock
def h(self):
"Evaluate the h function"
# Unpack parameters (get rid of self to simplify notation)
n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
# Apply the update rule
return (s * z * self.k**α + (1 - δ) * self.k) / (1 + n)
def update(self):
"Update the current state (i.e., the capital stock)."
self.k = self.h()
def steady_state(self):
"Compute the steady state value of capital."
# Unpack parameters (get rid of self to simplify notation)
n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
# Compute and return steady state
return ((s * z) / (n + δ))**(1 / (1 - α))
First we specified the types of the instance data for the class in solow_data.
After that, targeting the class for JIT compilation only requires adding @jitclass(solow_data)
before the class definition.
When we call the methods in the class, the methods are compiled just like functions.
In [16]: s1 = Solow()
s2 = Solow(k=8.0)
T = 60
fig, ax = plt.subplots()
ax.legend()
plt.show()
162 CHAPTER 11. NUMBA
11.6.1 Cython
Like Numba, Cython provides an approach to generating fast compiled code that can be used
from Python.
As was the case with Numba, a key problem is the fact that Python is dynamically typed.
As you’ll recall, Numba solves this problem (where possible) by inferring type.
Cython’s approach is different — programmers add type definitions directly to their “Python”
code.
As such, the Cython language can be thought of as Python with type definitions.
In addition to a language specification, Cython is also a language translator, transforming
Cython code into optimized C and C++ code.
Cython also takes care of building language extensions — the wrapper code that interfaces
between the resulting compiled code and Python.
While Cython has certain advantages, we generally find it both slower and more cumbersome
than Numba.
11.7. SUMMARY AND COMMENTS 163
If you are comfortable writing Fortran you will find it very easy to create extension modules
from Fortran code using F2Py.
F2Py is a Fortran-to-Python interface generator that is particularly simple to use.
Robert Johansson provides a nice introduction to F2Py, among other things.
Recently, a Jupyter cell magic for Fortran has been developed — you might want to give it a
try.
11.7.1 Limitations
As we’ve seen, Numba needs to infer type information on all variables to generate fast
machine-level instructions.
For simple routines, Numba infers types very well.
For larger ones, or for routines using external libraries, it can easily fail.
Hence, it’s prudent when using Numba to focus on speeding up small, time-critical snippets of
code.
This will give you much better performance than blanketing your Python programs with
@jit statements.
In [17]: a = 1
@jit
def add_a(x):
return a + x
print(add_a(10))
11
In [18]: a = 2
print(add_a(10))
11
164 CHAPTER 11. NUMBA
Notice that changing the global had no effect on the value returned by the function.
When Numba compiles machine code for functions, it treats global variables as constants to
ensure type stability.
11.8 Exercises
11.8.1 Exercise 1
11.8.2 Exercise 2
For example, let the period length be one day, and suppose the current state is high.
We see from the graph that the state tomorrow will be
• high with probability 0.8
• low with probability 0.2
Your task is to simulate a sequence of daily volatility states according to this rule.
Set the length of the sequence to n = 1_000_000 and start in the high state.
Implement a pure Python version and a Numba version, and compare speeds.
To test your code, evaluate the fraction of time that the chain spends in the low state.
If your code is correct, it should be about 2/3.
Hints:
• Represent the low state as 0 and the high state as 1.
• If you want to store integers in a NumPy array and then apply JIT compilation, use x
= np.empty(n, dtype=np.int_).
11.9. SOLUTIONS 165
11.9 Solutions
11.9.1 Exercise 1
@njit
def calculate_pi(n=1_000_000):
count = 0
for i in range(n):
u, v = uniform(0, 1), uniform(0, 1)
d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
if d < 0.5:
count += 1
area_estimate = count / n
return area_estimate * 4 # dividing by radius**2
Out[20]: 3.140572
Out[21]: 3.144764
If we switch of JIT compilation by removing @njit, the code takes around 150 times as long
on our machine.
So we get a speed gain of 2 orders of magnitude–which is huge–by adding four characters.
11.9.2 Exercise 2
We let
• 0 represent “low”
• 1 represent “high”
In [22]: p, q = 0.1, 0.2 # Prob of leaving low and high state respectively
166 CHAPTER 11. NUMBA
Let’s run this code and check that the fraction of time spent in the low state is about 0.666
In [24]: n = 1_000_000
x = compute_series(n)
print(np.mean(x == 0)) # Fraction of time x is in state 0
0.66638
In [25]: qe.tic()
compute_series(n)
qe.toc()
Out[25]: 1.2377808094024658
compute_series_numba = jit(compute_series)
In [27]: x = compute_series_numba(n)
print(np.mean(x == 0))
0.664597
In [28]: qe.tic()
compute_series_numba(n)
qe.toc()
Out[28]: 0.02013850212097168
Parallelization
12.1 Contents
• Overview 12.2
• Types of Parallelization 12.3
• Implicit Multithreading in NumPy 12.4
• Multithreaded Loops in Numba 12.5
• Exercises 12.6
• Solutions 12.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
12.2 Overview
The growth of CPU clock speed (i.e., the speed at which a single chain of logic can be run)
has slowed dramatically in recent years.
This is unlikely to change in the near future, due to inherent physical limitations on the con-
struction of chips and circuit boards.
Chip designers and computer programmers have responded to the slowdown by seeking a dif-
ferent path to fast execution: parallelization.
Hardware makers have increased the number of cores (physical CPUs) embedded in each ma-
chine.
For programmers, the challenge has been to exploit these multiple CPUs by running many
processes in parallel (i.e., simultaneously).
This is particularly important in scientific programming, which requires handling
• large amounts of data and
• CPU intensive simulations and other calculations.
In this lecture we discuss parallelization for scientific computing, with a focus on
169
170 CHAPTER 12. PARALLELIZATION
%matplotlib inline
Large textbooks have been written on different approaches to parallelization but we will keep
a tight focus on what’s most useful to us.
We will briefly review the two main kinds of parallelization in common use in scientific com-
puting and discuss their pros and cons.
12.3.1 Multiprocessing
Multiprocessing means concurrent execution of multiple processes using more than one pro-
cessor.
In this context, a process is a chain of instructions (i.e., a program).
Multiprocessing can be carried out on one machine with multiple CPUs or on a collection of
machines connected by a network.
In the latter case, the collection of machines is usually called a cluster.
With multiprocessing, each process has its own memory space, although the physical memory
chip might be shared.
12.3.2 Multithreading
Multithreading is similar to multiprocessing, except that, during execution, the threads all
share the same memory space.
Native Python struggles to implement multithreading due to some legacy design features.
But this is not a restriction for scientific libraries like NumPy and Numba.
Functions imported from these libraries and JIT-compiled code run in low level execution en-
vironments where Python’s legacy restrictions don’t apply.
Multithreading is more lightweight because most system and memory resources are shared by
the threads.
In addition, the fact that multiple threads all access a shared pool of memory is extremely
convenient for numerical programming.
12.4. IMPLICIT MULTITHREADING IN NUMPY 171
On the other hand, multiprocessing is more flexible and can be distributed across clusters.
For the great majority of what we do in these lectures, multithreading will suffice.
Actually, you have already been using multithreading in your Python code, although you
might not have realized it.
(We are, as usual, assuming that you are running the latest version of Anaconda Python.)
This is because NumPy cleverly implements multithreading in a lot of its compiled code.
Let’s look at some examples to see this in action.
The next piece of code computes the eigenvalues of a large number of randomly generated
matrices.
It takes a few seconds to run.
In [3]: n = 20
m = 1000
for i in range(n):
X = np.random.randn(m, m)
λ = np.linalg.eigvals(X)
Now, let’s look at the output of the htop system monitor on our machine while this code is
running:
Over the last few years, NumPy has managed to push this kind of multithreading out to more
and more operations.
For example, let’s return to a maximization problem discussed previously:
956 ms ± 26.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If you have a system monitor such as htop (Linux/Mac) or perfmon (Windows), then try run-
ning this and then observing the load on your CPUs.
(You will probably need to bump up the grid size to see large effects.)
At least on our machine, the output shows that the operation is successfully distributed
across multiple threads.
This is one of the reasons why the vectorized code above is fast.
To get some basis for comparison for the last example, let’s try the same thing with Numba.
In fact there is an easy way to do this, since Numba can also be used to create custom ufuncs
with the [@vectorize](http://numba.pydata.org/numba-doc/dev/user/vectorize.html) decora-
tor.
@vectorize
def f_vec(x, y):
return np.cos(x**2 + y**2) / (1 + x**2 + y**2)
Out[6]: 0.9999992797121728
At least on our machine, the difference in the speed between the Numba version and the vec-
torized NumPy version shown above is not large.
12.5. MULTITHREADED LOOPS IN NUMBA 173
But there’s quite a bit going on here so let’s try to break down what is happening.
Both Numba and NumPy use efficient machine code that’s specialized to these floating point
operations.
However, the code NumPy uses is, in some ways, less efficient.
The reason is that, in NumPy, the operation np.cos(x**2 + y**2) / (1 + x**2 +
y**2) generates several intermediate arrays.
For example, a new array is created when x**2 is calculated.
The same is true when y**2 is calculated, and then x**2 + y**2 and so on.
Numba avoids creating all these intermediate arrays by compiling one function that is special-
ized to the entire operation.
But if this is true, then why isn’t the Numba code faster?
The reason is that NumPy makes up for its disadvantages with implicit multithreading, as
we’ve just discussed.
Out[8]: 0.9999992797121728
533 ms ± 22.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Now our code runs significantly faster than the NumPy version.
We just saw one approach to parallelization in Numba, using the parallel flag in
@vectorize.
This is neat but, it turns out, not well suited to many problems we consider.
Fortunately, Numba provides another approach to multithreading that will work for us almost
everywhere parallelization is possible.
174 CHAPTER 12. PARALLELIZATION
To illustrate, let’s look first at a simple, single-threaded (i.e., non-parallelized) piece of code.
The code simulates updating the wealth 𝑤𝑡 of a household via the rule
Here
• 𝑅 is the gross rate of return on assets
• 𝑠 is the savings rate of the household and
• 𝑦 is labor income.
We model both 𝑅 and 𝑦 as independent draws from a lognormal distribution.
Here’s the code:
@njit
def h(w, r=0.1, s=0.3, v1=0.1, v2=1.0):
"""
Updates household wealth.
"""
# Draw shocks
R = np.exp(v1 * randn()) * (1 + r)
y = np.exp(v2 * randn())
# Update wealth
w = R * s * w + y
return w
T = 100
w = np.empty(T)
w[0] = 5
for t in range(T-1):
w[t+1] = h(w[t])
ax.plot(w)
Now let’s suppose that we have a large population of households and we want to know what
median wealth will be.
This is not easy to solve with pencil and paper, so we will use simulation instead.
In particular, we will simulate a large number of households and then calculate median wealth
for this group.
Suppose we are interested in the long-run average of this median over time.
It turns out that, for the specification that we’ve chosen above, we can calculate this this by
taking a one-period snapshot of what has happened to median wealth of the group at the end
of a long simulation.
Moreover, provided the simulation period is long enough, initial conditions don’t matter.
• This is due to something called ergodicity, which we will discuss later on.
So, in summary, we are going to simulate 50,000 households by
In [12]: @njit
def compute_long_run_median(w0=1, T=1000, num_reps=50_000):
obs = np.empty(num_reps)
for i in range(num_reps):
w = w0
for t in range(T):
176 CHAPTER 12. PARALLELIZATION
w = h(w)
obs[i] = w
return np.median(obs)
In [13]: %%time
compute_long_run_median()
Out[13]: 1.832989238241507
@njit(parallel=True)
def compute_long_run_median_parallel(w0=1, T=1000, num_reps=50_000):
obs = np.empty(num_reps)
for i in prange(num_reps):
w = w0
for t in range(T):
w = h(w)
obs[i] = w
return np.median(obs)
In [15]: %%time
compute_long_run_median_parallel()
Out[15]: 1.8362577016890518
12.5.1 A Warning
Parallelization works well in the outer loop of the last example because the individual tasks
inside the loop are independent of each other.
12.6. EXERCISES 177
12.6 Exercises
12.6.1 Exercise 1
12.7 Solutions
12.7.1 Exercise 1
@njit(parallel=True)
def calculate_pi(n=1_000_000):
count = 0
for i in prange(n):
u, v = uniform(0, 1), uniform(0, 1)
d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
if d < 0.5:
count += 1
area_estimate = count / n
return area_estimate * 4 # dividing by radius**2
178 CHAPTER 12. PARALLELIZATION
Out[17]: 3.13968
Out[18]: 3.14246
By switching parallelization on and off (selecting True or False in the @jnit annotation),
we can test the speed gain that multithreading provides on top of JIT compilation.
On our workstation, we find that parallelization increases execution speed by a factor of 2 or
3.
(If you are executing locally, you will get different numbers, depending mainly on the number
of CPUs on your machine.)
Chapter 13
Pandas
13.1 Contents
• Overview 13.2
• Series 13.3
• DataFrames 13.4
• On-Line Data Sources 13.5
• Exercises 13.6
• Solutions 13.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
13.2 Overview
179
180 CHAPTER 13. PANDAS
Just as NumPy provides the basic array data type plus core array operations, pandas
• reading in data
• adjusting indices
• working with dates and time series
• sorting, grouping, re-ordering and general data munging Section ??
• dealing with missing values, etc., etc.
More sophisticated statistical functionality is left to other packages, such as statsmodels and
scikit-learn, which are built on top of pandas.
This lecture will provide a basic introduction to pandas.
Throughout the lecture, we will assume that the following imports have taken place
13.3 Series
Two important data types defined by pandas are Series and DataFrame.
You can think of a Series as a “column” of data, such as a collection of observations on a
single variable.
A DataFrame is an object for storing related columns of data.
Let’s start with Series
13.3. SERIES 181
Out[3]: 0 0.992591
1 -0.391037
2 0.832296
3 -1.050007
Name: daily returns, dtype: float64
Here you can imagine the indices 0, 1, 2, 3 as indexing four listed companies, and the
values being daily returns on their shares.
Pandas Series are built on top of NumPy arrays and support many similar operations
In [4]: s * 100
Out[4]: 0 99.259107
1 -39.103725
2 83.229571
3 -105.000735
Name: daily returns, dtype: float64
In [5]: np.abs(s)
Out[5]: 0 0.992591
1 0.391037
2 0.832296
3 1.050007
Name: daily returns, dtype: float64
In [6]: s.describe()
Viewed in this way, Series are like fast, efficient Python dictionaries (with the restriction
that the items in the dictionary all have the same type—in this case, floats).
In fact, you can use much of the same syntax as Python dictionaries
In [8]: s['AMZN']
Out[8]: 0.9925910656571161
In [9]: s['AMZN'] = 0
s
In [10]: 'AAPL' in s
Out[10]: True
13.4 DataFrames
While a Series is a single column of data, a DataFrame is several columns, one for each
variable.
In essence, a DataFrame in pandas is analogous to a (highly optimized) Excel spreadsheet.
Thus, it is a powerful tool for representing and analyzing data that are naturally organized
into rows and columns, often with descriptive indexes for individual rows and individual
columns.
Let’s look at an example that reads data from the CSV file pandas/data/test_pwt.csv
that can be downloaded here.
Here’s the content of test_pwt.csv
"country","country isocode","year","POP","XRAT","tcgdp","cc","cg"
"Argentina","ARG","2000","37335.653","0.9995","295072.21869","75.716805379","5.5
"Australia","AUS","2000","19053.186","1.72483","541804.6521","67.759025993","6.7
"India","IND","2000","1006300.297","44.9416","1728144.3748","64.575551328","14.0
"Israel","ISR","2000","6114.57","4.07733","129253.89423","64.436450847","10.2666
"Malawi","MWI","2000","11801.505","59.543808333","5026.2217836","74.707624181","
"South Africa","ZAF","2000","45064.098","6.93983","227242.36949","72.718710427",
"United States","USA","2000","282171.957","1","9898700","72.347054303","6.032453
"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","
Supposing you have this data saved as test_pwt.csv in the present working directory (type
%pwd in Jupyter to see what this is), it can be read in as follows:
13.4. DATAFRAMES 183
In [11]: df = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/
↪raw/master/pandas
/data/test_pwt.csv')
type(df)
Out[11]: pandas.core.frame.DataFrame
In [12]: df
cc cg
0 75.716805 5.578804
1 67.759026 6.720098
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954
5 72.718710 5.726546
6 72.347054 6.032454
7 78.978740 5.108068
We can select particular rows using standard Python array slicing notation
In [13]: df[2:5]
cc cg
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954
To select columns, we can pass a list containing the names of the desired columns represented
as strings
3 Israel 1.292539e+05
4 Malawi 5.026222e+03
5 South Africa 2.272424e+05
6 United States 9.898700e+06
7 Uruguay 2.525596e+04
To select both rows and columns using integers, the iloc attribute should be used with the
format .iloc[rows, columns]
To select rows and columns using a mixture of integers and labels, the loc attribute can be
used in a similar way
Let’s imagine that we’re only interested in population and total GDP (tcgdp).
One way to strip the data frame df down to only these variables is to overwrite the
dataframe using the selection method described above
Here the index 0, 1,..., 7 is redundant because we can use the country names as an in-
dex.
To do this, we set the index to be the country variable in the dataframe
In [18]: df = df.set_index('country')
df
13.4. DATAFRAMES 185
Next, we’re going to add a column showing real GDP per capita, multiplying by 1,000,000 as
we go because total GDP is in millions
One of the nice things about pandas DataFrame and Series objects is that they have
methods for plotting and visualization that work through Matplotlib.
For example, we can easily generate a bar plot of GDP per capita
At the moment the data frame is ordered alphabetically on the countries—let’s change it to
GDP per capita
https://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv
188 CHAPTER 13. PANDAS
One option is to use requests, a standard Python library for requesting data over the Inter-
net.
To begin, try the following code on your computer
In [25]: r = requests.get('http://research.stlouisfed.org/fred2/series/UNRATE/
↪downloaddata/UNRATE
.csv')
1. You are not connected to the Internet — hopefully, this isn’t the case.
2. Your machine is accessing the Internet through a proxy server, and Python isn’t aware
of this.
source = requests.get(url).content.decode().split("\n")
source[0]
Out[26]: 'DATE,VALUE\r'
In [27]: source[1]
Out[27]: '1948-01-01,3.4\r'
In [28]: source[2]
13.5. ON-LINE DATA SOURCES 189
Out[28]: '1948-02-01,3.8\r'
We could now write some additional code to parse this text and store it as an array.
But this is unnecessary — pandas’ read_csv function can handle the task for us.
We use parse_dates=True so that pandas recognizes our dates column, allowing for simple
date filtering
The data has been read into a pandas DataFrame called data that we can now manipulate in
the usual way
In [30]: type(data)
Out[30]: pandas.core.frame.DataFrame
Out[31]: VALUE
DATE
1948-01-01 3.4
1948-02-01 3.8
1948-03-01 4.0
1948-04-01 3.9
1948-05-01 3.5
In [32]: pd.set_option('precision', 1)
data.describe() # Your output might differ slightly
Out[32]: VALUE
count 863.0
mean 5.7
std 1.6
min 2.5
25% 4.5
50% 5.6
75% 6.8
max 10.8
We can also plot the unemployment rate from 2006 to 2012 as follows
In [33]: data['2006':'2012'].plot()
plt.show()
190 CHAPTER 13. PANDAS
The maker of pandas has also authored a library called pandas_datareader that gives pro-
grammatic access to many data sources straight from the Jupyter notebook.
While some sources require an access key, many of the most important (e.g., FRED, OECD,
EUROSTAT and the World Bank) are free to use.
For now let’s work through one example of downloading and plotting data — this time from
the World Bank.
The World Bank collects and organizes data on a huge range of indicators.
For example, here’s some data on government debt as a ratio to GDP.
The next code example fetches the data for you and plots time series for the US and Aus-
tralia
The documentation provides more details on how to access various data sources.
13.6 Exercises
13.6.1 Exercise 1
Write a program to calculate the percentage price change over 2013 for the following shares
A dataset of daily closing prices for the above firms can be found in
pandas/data/ticker_data.csv and can be downloaded here.
Plot the result as a bar graph like this one
192 CHAPTER 13. PANDAS
13.7 Solutions
13.7.1 Exercise 1
ndas/data/ticker_data.csv')
ticker.set_index('Date', inplace=True)
price_change = pd.Series()
price_change.sort_values(inplace=True)
13.7. SOLUTIONS 193
fig, ax = plt.subplots(figsize=(10,8))
price_change.plot(kind='bar', ax=ax)
plt.show()
Footnotes
[1] Wikipedia defines munging as cleaning data from one raw form into a structured, purged
one.
194 CHAPTER 13. PANDAS
Part III
195
Chapter 14
14.1 Contents
• Overview 14.2
• An Example of Poor Code 14.3
• Good Coding Practice 14.4
• Revisiting the Example 14.5
• Exercises 14.6
• Solutions 14.7
14.2 Overview
When computer programs are small, poorly written code is not overly costly.
But more data, more sophisticated models, and more computer power are enabling us to take
on more challenging problems that involve writing longer programs.
For such programs, investment in good coding practices will pay high returns.
The main payoffs are higher productivity and faster code.
In this lecture, we review some elements of good coding practice.
We also touch on modern developments in scientific computing — such as just in time compi-
lation — and how they affect good program design.
Here
• 𝑘𝑡 is capital at time 𝑡 and
• 𝑠, 𝛼, 𝛿 are parameters (savings, a productivity parameter and depreciation)
197
198 CHAPTER 14. WRITING GOOD CODE
1. sets 𝑘0 = 1
for j in range(3):
k[0] = 1
for t in range(49):
k[t+1] = s * k[t]**α[j] + (1 - δ) * k[t]
axes[0].plot(k, 'o-', label=rf"$\alpha = {α[j]},\; s = {s},\;�
↪\delta={δ}$")
axes[0].grid(lw=0.2)
axes[0].set_ylim(0, 18)
axes[0].set_xlabel('time')
axes[0].set_ylabel('capital')
axes[0].legend(loc='upper left', frameon=True)
for j in range(3):
k[0] = 1
for t in range(49):
k[t+1] = s[j] * k[t]**α + (1 - δ) * k[t]
axes[1].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s[j]},\;�
↪\delta={δ}$")
axes[1].grid(lw=0.2)
axes[1].set_xlabel('time')
axes[1].set_ylabel('capital')
axes[1].set_ylim(0, 18)
axes[1].legend(loc='upper left', frameon=True)
14.3. AN EXAMPLE OF POOR CODE 199
for j in range(3):
k[0] = 1
for t in range(49):
k[t+1] = s * k[t]**α + (1 - δ[j]) * k[t]
axes[2].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\;�
↪\delta={δ[j]}$")
axes[2].set_ylim(0, 18)
axes[2].set_xlabel('time')
axes[2].set_ylabel('capital')
axes[2].grid(lw=0.2)
axes[2].legend(loc='upper left', frameon=True)
plt.show()
200 CHAPTER 14. WRITING GOOD CODE
14.4. GOOD CODING PRACTICE 201
There are usually many different ways to write a program that accomplishes a given task.
For small programs, like the one above, the way you write code doesn’t matter too much.
But if you are ambitious and want to produce useful things, you’ll write medium to large pro-
grams too.
In those settings, coding style matters a great deal.
Fortunately, lots of smart people have thought about the best way to write code.
Here are some basic precepts.
If you look at the code above, you’ll see numbers like 50 and 49 and 3 scattered through the
code.
These kinds of numeric literals in the body of your code are sometimes called “magic num-
bers”.
This is not a compliment.
While numeric literals are not all evil, the numbers shown in the program above should cer-
tainly be replaced by named constants.
For example, the code above could declare the variable time_series_length = 50.
Then in the loops, 49 should be replaced by time_series_length - 1.
The advantages are:
• the meaning is much clearer throughout
• to alter the time series length, you only need to change one value
More importantly, repeating the same logic in different places means that eventually one of
them will likely be wrong.
If you want to know more, read the excellent summary found on this page.
We’ll talk about how to avoid repetition below.
Sure, global variables (i.e., names assigned to values outside of any function or class) are con-
venient.
Rookie programmers typically use global variables with abandon — as we once did ourselves.
But global variables are dangerous, especially in medium to large size programs, since
• they can affect what happens in any part of your program
• they can be changed by any function
This makes it much harder to be certain about what some small part of a given piece of code
actually commands.
Here’s a useful discussion on the topic.
While the odd global in small scripts is no big deal, we recommend that you teach yourself to
avoid them.
(We’ll discuss how just below).
JIT Compilation
For scientific computing, there is another good reason to avoid global variables.
As we’ve seen in previous lectures, JIT compilation can generate excellent performance for
scripting languages like Python.
But the task of the compiler used for JIT compilation becomes harder when global variables
are present.
Put differently, the type inference required for JIT compilation is safer and more effective
when variables are sandboxed inside a function.
Fortunately, we can easily avoid the evils of global variables and WET code.
• WET stands for “we enjoy typing” and is the opposite of DRY.
We can do this by making frequent use of functions or classes.
In fact, functions and classes are designed specifically to help us avoid shaming ourselves by
repeating code or excessive use of global variables.
Both can be useful, and in fact they work well with each other.
14.5. REVISITING THE EXAMPLE 203
Here’s some code that reproduces the plot above with better coding style.
ax.set_xlabel('time')
ax.set_ylabel('capital')
ax.set_ylim(0, 18)
ax.legend(loc='upper left', frameon=True)
plt.show()
204 CHAPTER 14. WRITING GOOD CODE
14.6. EXERCISES 205
14.6 Exercises
14.6.1 Exercise 1
𝑞𝑠 (𝑝) = exp(𝛼𝑝) − 𝛽.
𝑞𝑑 (𝑝) = 𝛾𝑝−𝛿 .
This yields the equilibrium price 𝑝∗ . From this we get the equilibrium price by 𝑞 ∗ = 𝑞𝑠 (𝑝∗ )
The parameter values will be
• 𝛼 = 0.1
• 𝛽=1
• 𝛾=1
• 𝛿=1
# Compute equilibrium
def h(p):
return p**(-1) - (np.exp(0.1 * p) - 1) # demand - supply
p_star = brentq(h, 2, 4)
q_star = np.exp(0.1 * p_star) - 1
qs = np.exp(0.1 * grid) - 1
qd = grid**(-1)
ax.set_xlabel('price')
ax.set_ylabel('quantity')
ax.legend(loc='upper center')
plt.show()
p_star = brentq(h, 2, 4)
14.6. EXERCISES 207
qs = np.exp(0.1 * p_grid) - 1
qd = 1.25 * p_grid**(-1)
ax.set_xlabel('price')
ax.set_ylabel('quantity')
ax.legend(loc='upper center')
plt.show()
Now we might consider supply shifts, but you already get the idea that there’s a lot of re-
peated code here.
Refactor and improve clarity in the code above using the principles discussed in this lecture.
208 CHAPTER 14. WRITING GOOD CODE
14.7 Solutions
14.7.1 Exercise 1
def compute_equilibrium(self):
def h(p):
return self.qd(p) - self.qs(p)
p_star = brentq(h, 2, 4)
q_star = np.exp(self.α * p_star) - self.β
def plot_equilibrium(self):
# Now plot
grid = np.linspace(2, 4, 100)
fig, ax = plt.subplots()
ax.set_xlabel('price')
ax.set_ylabel('quantity')
ax.legend(loc='upper center')
plt.show()
In [8]: eq = Equilibrium()
In [9]: eq.compute_equilibrium()
In [10]: eq.plot_equilibrium()
14.7. SOLUTIONS 209
One of the nice things about our refactored code is that, when we change parameters, we
don’t need to repeat ourselves:
In [12]: eq.compute_equilibrium()
In [13]: eq.plot_equilibrium()
210 CHAPTER 14. WRITING GOOD CODE
Chapter 15
15.1 Contents
• Overview 15.2
• Iterables and Iterators 15.3
• Names and Name Resolution 15.4
• Handling Errors 15.5
• Decorators and Descriptors 15.6
• Generators 15.7
• Recursive Function Calls 15.8
• Exercises 15.9
• Solutions 15.10
15.2 Overview
With this last lecture, our advice is to skip it on first pass, unless you have a burning de-
sire to read it.
It’s here
2. for those who have worked through a number of applications, and now want to learn
more about the Python language
A variety of topics are treated in the lecture, including generators, exceptions and descriptors.
211
212 CHAPTER 15. MORE LANGUAGE FEATURES
15.3.1 Iterators
Overwriting us_cities.txt
In [2]: f = open('us_cities.txt')
f.__next__()
In [3]: f.__next__()
We see that file objects do indeed have a __next__ method, and that calling this method
returns the next line in the file.
The next method can also be accessed via the builtin function next(), which directly calls
this method
In [4]: next(f)
In [6]: next(e)
Overwriting test_table.csv
f = open('test_table.csv', 'r')
nikkei_data = reader(f)
next(nikkei_data)
In [9]: next(nikkei_data)
All iterators can be placed to the right of the in keyword in for loop statements.
In fact this is how the for loop works: If we write
for x in iterator:
<code block>
f = open('somefile.txt', 'r')
for line in f:
# do something
15.3.3 Iterables
You already know that we can put a Python list to the right of in in a for loop
spam
eggs
Out[11]: list
In [12]: next(x)
�
↪---------------------------------------------------------------------------
<ipython-input-12-92de4e9f6b1e> in <module>
----> 1 next(x)
In [ ]: x = ['foo', 'bar']
type(x)
In [13]: y = iter(x)
type(y)
Out[13]: list_iterator
In [14]: next(y)
Out[14]: 'foo'
In [15]: next(y)
Out[15]: 'bar'
In [16]: next(y)
�
↪---------------------------------------------------------------------------
<ipython-input-16-81b9d2f0f16a> in <module>
----> 1 next(y)
StopIteration:
In [17]: iter(42)
�
↪---------------------------------------------------------------------------
216 CHAPTER 15. MORE LANGUAGE FEATURES
<ipython-input-17-ef50b48e4398> in <module>
----> 1 iter(42)
Some built-in functions that act on sequences also work with iterables
• max(), min(), sum(), all(), any()
For example
Out[18]: 10
In [19]: y = iter(x)
type(y)
Out[19]: list_iterator
In [20]: max(y)
Out[20]: 10
One thing to remember about iterators is that they are depleted by use
Out[21]: 10
In [22]: max(y)
�
↪ ---------------------------------------------------------------------------
15.4. NAMES AND NAME RESOLUTION 217
<ipython-input-22-062424e6ec08> in <module>
----> 1 max(y)
In [23]: x = 42
We now know that when this statement is executed, Python creates an object of type int in
your computer’s memory, containing
• the value 42
• some associated attributes
But what is x itself?
In Python, x is called a name, and the statement x = 42 binds the name x to the integer
object we have just discussed.
Under the hood, this process of binding names to objects is implemented as a dictionary—
more about this in a moment.
There is no problem binding two or more names to the one object, regardless of what that
object is
g = f
id(g) == id(f)
Out[24]: True
In [25]: g('test')
test
In the first step, a function object is created, and the name f is bound to it.
After binding the name g to the same object, we can use it anywhere we would use f.
218 CHAPTER 15. MORE LANGUAGE FEATURES
What happens when the number of names bound to an object goes to zero?
Here’s an example of this situation, where the name x is first bound to one object and then
rebound to another
In [26]: x = 'foo'
id(x)
Out[26]: 139915650812536
15.4.2 Namespaces
In [28]: x = 42
Overwriting math2.py
Next let’s import the math module from the standard library
In [32]: math.pi
Out[32]: 3.141592653589793
In [33]: math2.pi
Out[33]: 'foobar'
These two different bindings of pi exist in different namespaces, each one implemented as a
dictionary.
We can look at the dictionary directly, using module_name.__dict__
math.__dict__.items()
origin='/home/ubuntu/anaconda3/lib/python3.7/lib-dynload/math.cpython-37m-
x86_64-linux-
gnu.so')), ('acos', <built-in function acos>), ('acosh', <built-in�
↪function acosh>),
math2.__dict__.items()
('__spec__', ModuleSpec(name='math2',
loader=<_frozen_importlib_external.SourceFileLoader object at�
↪0x7f40a0d030b8>,
origin='/home/ubuntu/repos/lecture-source-py/_build/jupyterpdf/executed/
↪math2.py')),
('__file__', '/home/ubuntu/repos/lecture-source-
py/_build/jupyterpdf/executed/math2.py'), ('__cached__', '/home/ubuntu/
↪repos/lecture-
source-py/_build/jupyterpdf/executed/__pycache__/math2.cpython-37.pyc'),
('__builtins__', {'__name__': 'builtins', '__doc__': "Built-in functions,�
↪exceptions,
'license': Type license() to see the full license text, 'help': Type�
↪help() for
InteractiveShell.get_ipython of <ipykernel.zmqshell.ZMQInteractiveShell�
↪object at
As you know, we access elements of the namespace using the dotted attribute notation
In [36]: math.pi
Out[36]: 3.141592653589793
Out[37]: True
In [38]: vars(math).items()
origin='/home/ubuntu/anaconda3/lib/python3.7/lib-dynload/math.cpython-37m-
x86_64-linux-
gnu.so')), ('acos', <built-in function acos>), ('acosh', <built-in�
↪function acosh>),
224 CHAPTER 15. MORE LANGUAGE FEATURES
In [39]: dir(math)[0:10]
Out[39]: ['__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
15.4. NAMES AND NAME RESOLUTION 225
'acos',
'acosh',
'asin',
'asinh']
In [40]: print(math.__doc__)
In [41]: math.__name__
Out[41]: 'math'
In [42]: print(__name__)
__main__
When we run a script using IPython’s run command, the contents of the file are executed as
part of __main__ too.
To see this, let’s create a file mod.py that prints its own __name__ attribute
Overwriting mod.py
mod
__main__
In the second case, the code is executed as part of __main__, so __name__ is equal to
__main__.
To see the contents of the namespace of __main__ we use vars() rather than
vars(__main__) .
If you do this in IPython, you will see a whole lot of variables that IPython needs, and has
initialized when you started up your session.
If you prefer to see only the variables you have initialized, use whos
In [46]: x = 2
y = 3
import numpy as np
%whos
import amodule
At this point, the interpreter creates a namespace for the module amodule and starts exe-
cuting commands in the module.
While this occurs, the namespace amodule.__dict__ is the global namespace.
Once execution of the module finishes, the interpreter returns to the module from where the
import statement was made.
In this case it’s __main__, so the namespace of __main__ again becomes the global names-
pace.
Important fact: When we call a function, the interpreter creates a local namespace for that
function, and registers the variables in that namespace.
The reason for this will be explained in just a moment.
Variables in the local namespace are called local variables.
After the function returns, the namespace is deallocated and lost.
While the function is executing, we can view the contents of the local namespace with
locals().
For example, consider
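a function along these lines (the exact body is an assumption, chosen to match the output below)

In [47]: def f(x):
             a = 2
             print(locals())
             return a * x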
In [48]: f(1)
{'x': 1, 'a': 2}
Out[48]: 2
We have been using various built-in functions, such as max(), dir(), str(), list(),
len(), range(), type(), etc.
How does access to these names work?
• These definitions are stored in a module called builtins.
• They have their own namespace called __builtins__.
In [49]: dir()[0:10]
Out[49]: ['In', 'Out', '_', '_11', '_13', '_14', '_15', '_18', '_19', '_2']
In [50]: dir(__builtins__)[0:10]
Out[50]: ['ArithmeticError',
'AssertionError',
'AttributeError',
'BaseException',
'BlockingIOError',
'BrokenPipeError',
'BufferError',
'BytesWarning',
'ChildProcessError',
'ConnectionAbortedError']
In [51]: __builtins__.max

Out[51]: <built-in function max>

But __builtins__ is special, because we can always access it directly as well

In [52]: max

Out[52]: <built-in function max>

In [53]: __builtins__.max == max

Out[53]: True
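Enclosed Functions

One (relatively sophisticated) wrinkle in name resolution involves functions defined inside other functions. Here is a small sketch (the function bodies are assumptions chosen for illustration)

In [54]: def f():
             a = 2
             def g():
                 b = 4
                 print(a * b)
             g()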
Here f is the enclosing function for g, and each function gets its own namespaces.
Now we can give the rule for how namespace resolution works:
The order in which the interpreter searches for names is

1. the local namespace (if it exists)
2. the hierarchy of enclosing namespaces (if they exist)
3. the global namespace
4. the builtin namespace

If the name is not in any of these namespaces, the interpreter raises a NameError.

This is called the LEGB rule (local, enclosing, global, builtin).
Here’s an example that helps to illustrate .
Consider a script test.py that looks as follows
a = 0
y = g(10)
print("a = ", a, "y = ", y)
Overwriting test.py
a = 0 y = 11
In [57]: x
Out[57]: 2
First,
• The global namespace {} is created.
• The function object is created, and g is bound to it within the global namespace.
• The name a is bound to 0, again in the global namespace.
Next g is called via y = g(10), leading to the following sequence of actions
• The local namespace for the function is created.
• Local names x and a are bound, so that the local namespace becomes {'x': 10,
'a': 1}.
• Statement x = x + a uses the local a and local x to compute x + a, and binds local
name x to the result.
• This value is returned, and y is bound to it in the global namespace.
• Local x and a are discarded (and the local namespace is deallocated).
Note that the global a was not affected by the local a.
This is a good time to say a little more about mutable vs immutable objects.
Consider the code segment

In [58]: def f(x):
             x = x + 1
             return x

         x = 1
         print(f(x), x)

2 1
We now understand what will happen here: The code prints 2 as the value of f(x) and 1 as
the value of x.
First f and x are registered in the global namespace.
The call f(x) creates a local namespace and adds x to it, bound to 1.
Next, this local x is rebound to the new integer object 2, and this value is returned.
None of this affects the global x.
However, it’s a different story when we use a mutable data type such as a list
In [59]: def f(x):
             x[0] = x[0] + 1
             return x

         x = [1]
         print(f(x), x)

[2] [2]
15.5 Handling Errors

Sometimes it's possible to anticipate errors as we're writing code.

For example, the unbiased sample variance of sample $y_1, \ldots, y_n$ is defined as

$$s^2 := \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad \bar{y} = \text{sample mean}$$

This expression is undefined when the sample size is one, since we then divide by zero.
15.5.1 Assertions
For example, pretend for a moment that the np.var function doesn’t exist and we need to
write our own
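Here's a version with an assert statement (the body is taken from the traceback shown below)

In [60]: def var(y):
             n = len(y)
             assert n > 1, 'Sample size must be greater than one.'
             return np.sum((y - y.mean())**2) / float(n-1)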
If we run this with an array of length one, the program will terminate and print our error
message
In [61]: var([1])
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-61-8419b6ab38ec> in <module>
----> 1 var([1])

<ipython-input-60-e6ffb16a7098> in var(y)
      1 def var(y):
      2     n = len(y)
----> 3     assert n > 1, 'Sample size must be greater than one.'
      4     return np.sum((y - y.mean())**2) / float(n-1)

AssertionError: Sample size must be greater than one.
The approach used above is a bit limited, because it always leads to termination.
Sometimes we can handle errors more gracefully, by treating special cases.
Let’s look at how this is done.
Exceptions

Here's an example of a common error type, a syntax error

In [62]: def f:

  File "<ipython-input-62>", line 1
    def f:
         ^
SyntaxError: invalid syntax

Since illegal syntax cannot be executed, a syntax error terminates execution of the program.
Here’s a different kind of error, unrelated to syntax
In [63]: 1 / 0

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-63-bc757c3fda29> in <module>
----> 1 1 / 0

ZeroDivisionError: division by zero
Here’s another
In [64]: x1 = y1

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-64-a7b8d65e9e45> in <module>
----> 1 x1 = y1

NameError: name 'y1' is not defined
And another
In [65]: 'foo' + 6

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-65-216809d6e6fe> in <module>
----> 1 'foo' + 6

TypeError: can only concatenate str (not "int") to str
And another
In [66]: X = []
         x = X[0]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-66-082a18d7a0aa> in <module>
      1 X = []
----> 2 x = X[0]

IndexError: list index out of range
Catching Exceptions
We can catch and deal with exceptions using try – except blocks.
Here’s a simple example
In [67]: def f(x):
             try:
                 return 1.0 / x
             except ZeroDivisionError:
                 print('Error: division by zero. Returned None')
             return None
In [68]: f(2)
Out[68]: 0.5
In [69]: f(0)

Error: division by zero. Returned None

In [70]: f(0.0)

Error: division by zero. Returned None
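We can also catch several error types separately. The calls below come from a version of f along these lines (a sketch, chosen to be consistent with the outputs)

In [71]: def f(x):
             try:
                 return 1.0 / x
             except ZeroDivisionError:
                 print('Error: division by zero. Returned None')
             except TypeError:
                 print('Error: unsupported operation. Returned None')
             return None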
In [72]: f(2)
Out[72]: 0.5
In [73]: f(0)

Error: division by zero. Returned None

In [74]: f('foo')

Error: unsupported operation. Returned None
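If we feel lazy we can catch all error types with a bare except clause (again a sketch matching the calls that follow)

In [75]: def f(x):
             try:
                 return 1.0 / x
             except:
                 print('Error. Returned None')
             return None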
In [76]: f(2)
Out[76]: 0.5
In [77]: f(0)

Error. Returned None

In [78]: f('foo')

Error. Returned None
Let’s look at some special syntax elements that are routinely used by Python developers.
You might not need the following concepts immediately, but you will see them in other peo-
ple’s code.
Hence you need to understand them at some stage of your Python education.
15.6.1 Decorators
Decorators are a bit of syntactic sugar that, while easily avoided, have turned out to be popu-
lar.
It’s very easy to say what decorators do.
On the other hand it takes a bit of effort to explain why you might use them.
An Example

Suppose we are working with a program that uses the following pair of functions

import numpy as np

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

# Program continues with various calculations using f and g
Now suppose there’s a problem: occasionally negative numbers get fed to f and g in the cal-
culations that follow.
If you try it, you’ll see that when these functions are called with negative numbers they re-
turn a NumPy object called nan .
This stands for “not a number” (and indicates that you are trying to evaluate a mathematical
function at a point where it is not defined).
Perhaps this isn’t what we want, because it causes other problems that are hard to pick up
later on.
Suppose that instead we want the program to terminate whenever this happens, with a sensi-
ble error message.
This change is easy enough to implement
def f(x):
assert x >= 0, "Argument must be nonnegative"
return np.log(np.log(x))
def g(x):
assert x >= 0, "Argument must be nonnegative"
return np.sqrt(42 * x)
Notice however that there is some repetition here, in the form of two identical lines of code.
Repetition makes our code longer and harder to maintain, and hence is something we try
hard to avoid.
Here it’s not a big deal, but imagine now that instead of just f and g, we have 20 such func-
tions that we need to modify in exactly the same way.
This means we need to repeat the test logic (i.e., the assert line testing nonnegativity) 20
times.
The situation is still worse if the test logic is longer and more complicated.
In this kind of scenario the following approach would be neater
def check_nonneg(func):
def safe_function(x):
assert x >= 0, "Argument must be nonnegative"
return func(x)
return safe_function
def f(x):
return np.log(np.log(x))
def g(x):
return np.sqrt(42 * x)
f = check_nonneg(f)
g = check_nonneg(g)
# Program continues with various calculations using f and g
Enter Decorators

The last version of our code is still a little repetitive: for every function we want to wrap, we must add a line such as f = check_nonneg(f) after its definition.

Python's decorator syntax gives us the same behavior with less typing: we replace

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)

with
In [85]: @check_nonneg
def f(x):
return np.log(np.log(x))
@check_nonneg
def g(x):
return np.sqrt(42 * x)
15.6.2 Descriptors
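Descriptors solve a common problem regarding management of variables. Suppose, for example, that we have a simple Car class that stores the same quantity in two units, miles and kilometers (the class body below is a sketch, chosen to match the outputs that follow)

In [86]: class Car:

             def __init__(self, miles=1000):
                 self.miles = miles
                 self.kms = miles * 1.61

         car = Car()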
One potential problem we might have here is that a user alters one of these variables but not
the other
In [87]: car.miles

Out[87]: 1000

In [88]: car.kms

Out[88]: 1610.0

In [89]: car.miles = 6000
         car.kms

Out[89]: 1610.0
In the last two lines we see that miles and kms are out of sync.
What we really want is some mechanism whereby each time a user sets one of these variables,
the other is automatically updated.
A Solution

Here's a version of the Car class that uses getter and setter methods to keep miles and kms in sync

In [90]: class Car:

             def get_miles(self):
                 return self._miles

             def get_kms(self):
                 return self._kms
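             # The remainder of the class is a reconstruction (assumed),
             # chosen to be consistent with the outputs that follow
             def __init__(self, miles=1000):
                 self._miles = miles
                 self._kms = miles * 1.61

             def set_miles(self, value):
                 self._miles = value
                 self._kms = value * 1.61

             def set_kms(self, value):
                 self._kms = value
                 self._miles = value / 1.61

             miles = property(get_miles, set_miles)
             kms = property(get_kms, set_kms)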
In [91]: car = Car()
         car.miles

Out[91]: 1000

In [92]: car.miles = 6000
         car.kms

Out[92]: 9660.0
How it Works
The names _miles and _kms are arbitrary names we are using to store the values of the
variables.
The objects miles and kms are properties, a common kind of descriptor.
The methods get_miles, set_miles, get_kms and set_kms define what happens when
you get (i.e. access) or set (bind) these variables
• So-called “getter” and “setter” methods.
The builtin Python function property takes getter and setter methods and creates a prop-
erty.
For example, after car is created as an instance of Car, the object car.miles is a property.
Being a property, when we set its value via car.miles = 6000 its setter method is trig-
gered — in this case set_miles.
These days it's very common to see the property function used via a decorator.
Here’s another version of our Car class that works as before but now uses decorators to set
up the properties
@property
def miles(self):
return self._miles
@property
def kms(self):
return self._kms
@miles.setter
def miles(self, value):
self._miles = value
self._kms = value * 1.61
@kms.setter
def kms(self, value):
self._kms = value
self._miles = value / 1.61
15.7 Generators
The easiest way to build generators is with generator expressions — just like list comprehensions, but with round brackets.

Here is the list comprehension case, starting from a tuple

In [94]: singular = ('dog', 'cat', 'bird')
         type(singular)

Out[94]: tuple

In [95]: plural = [string + 's' for string in singular]
         plural

Out[95]: ['dogs', 'cats', 'birds']

In [96]: type(plural)

Out[96]: list

And here is the generator expression case

In [97]: singular = ('dog', 'cat', 'bird')
         plural = (string + 's' for string in singular)
         type(plural)

Out[97]: generator

In [98]: next(plural)

Out[98]: 'dogs'

In [99]: next(plural)

Out[99]: 'cats'

In [100]: next(plural)

Out[100]: 'birds'
Generators can be fed directly to functions such as sum()

In [101]: sum((x * x for x in range(10)))

Out[101]: 285

The function sum() calls next() to get the items and adds successive terms.

In fact, we can omit the outer brackets in this case

In [102]: sum(x * x for x in range(10))

Out[102]: 285
The most flexible way to create generator objects is to use generator functions.
Let’s look at some examples.
Example 1

Here's a very simple example of a generator function

In [103]: def f():
              yield 'start'
              yield 'middle'
              yield 'end'

It looks like a function, but uses a keyword yield that we haven't met before.

Let's see how it works after running this code

In [104]: type(f)

Out[104]: function

In [105]: gen = f()
          gen

Out[105]: <generator object f at 0x...>
In [106]: next(gen)
Out[106]: 'start'
In [107]: next(gen)
Out[107]: 'middle'
In [108]: next(gen)
Out[108]: 'end'
In [109]: next(gen)

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-109-6e72e47198db> in <module>
----> 1 next(gen)

StopIteration:
The generator function f() is used to create generator objects (in this case gen).
Generators are iterators, because they support a next method.
The first call to next(gen)
• Executes code in the body of f() until it meets a yield statement.
• Returns that value to the caller of next(gen).
The second call to next(gen) starts executing from the next line
In [ ]: def f():
yield 'start'
yield 'middle' # This line!
yield 'end'
Example 2

Our next example receives an argument x from the caller

In [110]: def g(x):
              while x < 100:
                  yield x
                  x = x * x

Let's see how it works

In [111]: g

Out[111]: <function __main__.g(x)>

In [112]: gen = g(2)
          type(gen)

Out[112]: generator
In [113]: next(gen)
Out[113]: 2
In [114]: next(gen)
Out[114]: 4
In [115]: next(gen)
Out[115]: 16
In [116]: next(gen)

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-116-6e72e47198db> in <module>
----> 1 next(gen)

StopIteration:
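Generators also help us manage memory. To see this, let's count the number of "heads" in a large number of simulated coin flips, first by building a plain list (the input cells are reconstructed to match the output below)

In [117]: import random

In [118]: n = 10000000
          draws = [random.uniform(0, 1) < 0.5 for i in range(n)]

In [119]: sum(draws)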
Out[119]: 4996183
But we are creating two huge lists here, range(n) and draws.
This uses lots of memory and is very slow.
If we make n even bigger then this happens
In [120]: n = 100000000
draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
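On most machines the cell above will exhaust memory, since the list of draws no longer fits. We can avoid the problem with a generator function that produces the draws one at a time (a sketch consistent with the calls below)

In [121]: def f(n):
              i = 1
              while i <= n:
                  yield random.uniform(0, 1) < 0.5
                  i += 1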
In [122]: n = 10000000
draws = f(n)
draws

Out[122]: <generator object f at 0x...>
In [123]: sum(draws)
Out[123]: 4998307
In summary, iterables
• avoid the need to create big lists/tuples, and
• provide a uniform interface to iteration that can be used transparently in for loops
15.8 Recursive Function Calls

This is not something that you will use every day, but it is still useful — you should learn it at some stage.

Basically, a recursive function is a function that calls itself.

For example, consider the problem of computing $x_t$ for some $t$ when

$$x_{t+1} = 2 x_t, \qquad x_0 = 1$$
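Here are an iterative and a recursive implementation (a minimal sketch)

In [124]: def x_loop(t):
              x = 1
              for i in range(t):
                  x = 2 * x
              return x

          def x(t):
              if t == 0:
                  return 1
              else:
                  return 2 * x(t-1)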
What happens here is that each successive call uses its own frame in the stack

• a frame is where the local variables of a given function call are held
• the stack is memory used to process function calls
  – a First In Last Out (FILO) queue
This example is somewhat contrived, since the first (iterative) solution would usually be pre-
ferred to the recursive solution.
We’ll meet less contrived applications of recursion later on.
15.9 Exercises
15.9.1 Exercise 1

The Fibonacci numbers are defined by

$$x_{t+1} = x_t + x_{t-1}, \qquad x_0 = 0, \; x_1 = 1$$

The first few numbers in the sequence are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55.

Write a function to recursively compute the $t$-th Fibonacci number for any $t$.
15.9.2 Exercise 2

Complete the following code, and test it using this csv file, which we assume that you've put in your current working directory

def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    that steps through the elements of column column_number in file
    target_file.
    """
    # put your code here

dates = column_iterator('test_table.csv', 1)

for date in dates:
    print(date)
15.9.3 Exercise 3

Suppose we have a text file numbers.txt containing the following lines

prices
3
8

7
21
Using try – except, write a program to read in the contents of the file and sum the num-
bers, ignoring lines without numbers.
15.10 Solutions
15.10.1 Exercise 1

Here's the standard solution

In [125]: def x(t):
              if t == 0:
                  return 0
              if t == 1:
                  return 1
              else:
                  return x(t-1) + x(t-2)
Let’s test it
15.10.2 Exercise 2
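One solution is to build a generator that opens the file and yields the requested column entry from each line

In [127]: def column_iterator(target_file, column_number):
              """A generator function for CSV files.
              When called with a file name target_file (string) and column
              number column_number (integer), the generator function returns
              a generator that steps through the elements of column
              column_number in file target_file.
              """
              f = open(target_file, 'r')
              for line in f:
                  yield line.split(',')[column_number - 1]
              f.close()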
dates = column_iterator('test_table.csv', 1)
i = 1
for date in dates:
print(date)
if i == 10:
break
i += 1
Date
2009-05-21
2009-05-20
2009-05-19
2009-05-18
2009-05-15
2009-05-14
2009-05-13
2009-05-12
2009-05-11
15.10.3 Exercise 3

First let's write the file

In [129]: %%file numbers.txt
          prices
          3
          8

          7
          21

Overwriting numbers.txt
In [130]: f = open('numbers.txt')
total = 0.0
for line in f:
try:
total += float(line)
except ValueError:
pass
f.close()
print(total)
39.0
Chapter 16
Debugging
16.1 Contents
• Overview 16.2
• Debugging 16.3
• Other Useful Magics 16.4
“Debugging is twice as hard as writing the code in the first place. Therefore, if
you write the code as cleverly as possible, you are, by definition, not smart enough
to debug it.” – Brian Kernighan
16.2 Overview
Are you one of those programmers who fills their code with print statements when trying to
debug their programs?
Hey, we all used to do that.
(OK, sometimes we still do that…)
But once you start writing larger programs you’ll need a better system.
Debugging tools for Python vary across platforms, IDEs and editors.
Here we’ll focus on Jupyter and leave you to explore other settings.
We’ll need the following imports
16.3 Debugging
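Consider a simple (and contrived) example, reconstructed here from the traceback that follows

In [2]: def plot_log():
            fig, ax = plt.subplots(2, 1)
            x = np.linspace(1, 2, 10)
            ax.plot(x, np.log(x))
            plt.show()

        plot_log()  # Call the function, generate plot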
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-c32a2280f47b> in <module>
      5     plt.show()
      6
----> 7 plot_log()  # Call the function, generate plot

<ipython-input-2-c32a2280f47b> in plot_log()
      2     fig, ax = plt.subplots(2, 1)
      3     x = np.linspace(1, 2, 10)
----> 4     ax.plot(x, np.log(x))
      5     plt.show()
      6

AttributeError: 'numpy.ndarray' object has no attribute 'plot'
This code is intended to plot the log function over the interval [1, 2].
But there’s an error here: plt.subplots(2, 1) should be just plt.subplots().
(The call plt.subplots(2, 1) returns a NumPy array containing two axes objects, suit-
able for having two subplots on the same figure)
The traceback shows that the error occurs at the method call ax.plot(x, np.log(x)).
The error occurs because we have mistakenly made ax a NumPy array, and a NumPy array
has no plot method.
But let’s pretend that we don’t understand this for the moment.
We might suspect there’s something wrong with ax but when we try to investigate this ob-
ject, we get the following exception:
In [3]: ax

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-b00e77935981> in <module>
----> 1 ax

NameError: name 'ax' is not defined
The problem is that ax was defined inside plot_log(), and the name is lost once that func-
tion terminates.
Let’s try doing it a different way.
We run the first cell block again, generating the same error
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-c32a2280f47b> in <module>
      5     plt.show()
      6
----> 7 plot_log()  # Call the function, generate plot

<ipython-input-4-c32a2280f47b> in plot_log()
      2     fig, ax = plt.subplots(2, 1)
      3     x = np.linspace(1, 2, 10)
----> 4     ax.plot(x, np.log(x))
      5     plt.show()
      6

AttributeError: 'numpy.ndarray' object has no attribute 'plot'
But this time, immediately after the error occurs, we type the magic %debug in a new cell

%debug
You should be dropped into a new prompt that looks something like this
ipdb>
For example, here we simply type the name ax to see what’s happening with this object:
ipdb> ax
array([<matplotlib.axes.AxesSubplot object at 0x290f5d0>,
<matplotlib.axes.AxesSubplot object at 0x2930810>], dtype=object)
It’s now very clear that ax is an array, which clarifies the source of the problem.
To find out what else you can do from inside ipdb (or pdb), use the online help
ipdb> h
Undocumented commands:
======================
retval rv
ipdb> h c
c(ont(inue))
Continue execution, only stop when a breakpoint is encountered.
The preceding approach is handy but sometimes insufficient.

Consider the following modified version of our function above

In [6]: def plot_log():
            fig, ax = plt.subplots()
            x = np.logspace(1, 2, 10)
            ax.plot(x, np.log(x))
            plt.show()

        plot_log()

Here the original problem is fixed, but we've accidentally written np.logspace(1, 2, 10) instead of np.linspace(1, 2, 10).
Now there won’t be any exception, but the plot won’t look right.
To investigate, it would be helpful if we could inspect variables like x during execution of the
function.
To this end, we add a “break point” by inserting breakpoint() inside the function code
block
def plot_log():
breakpoint()
fig, ax = plt.subplots()
x = np.logspace(1, 2, 10)
ax.plot(x, np.log(x))
plt.show()
plot_log()
Now let’s run the script, and investigate via the debugger
> <ipython-input-6-a188074383b7>(6)plot_log()
-> fig, ax = plt.subplots()
(Pdb) n
> <ipython-input-6-a188074383b7>(7)plot_log()
-> x = np.logspace(1, 2, 10)
(Pdb) n
> <ipython-input-6-a188074383b7>(8)plot_log()
-> ax.plot(x, np.log(x))
(Pdb) x
array([ 10.        ,  12.91549665,  16.68100537,  21.5443469 ,
        27.82559402,  35.93813664,  46.41588834,  59.94842503,
        77.42636827, 100.        ])
We used n twice to step forward through the code (one line at a time).
Then we printed the value of x to see what was happening with that variable.
To exit from the debugger, use q.
Chapter 17

Geometric Series for Elementary Economics
17.1 Contents
• Overview 17.2
• Key Formulas 17.3
• Example: The Money Multiplier in Fractional Reserve Banking 17.4
• Example: The Keynesian Multiplier 17.5
• Example: Interest Rates and Present Values 17.6
• Back to the Keynesian Multiplier 17.7
17.2 Overview
The lecture describes important ideas in economics that use the mathematics of geometric
series.
Among these are
• the Keynesian multiplier
• the money multiplier that prevails in fractional reserve banking systems
• interest rates and present values of streams of payouts from assets
(As we shall see below, the term multiplier comes down to meaning sum of a convergent
geometric series)
These and other applications prove the truth of the wisecrack that "in economics, a little knowledge of geometric series goes a long way".
17.3 Key Formulas

To start, let $c$ be a real number that lies strictly between $-1$ and $1$.

Infinite Geometric Series

For such $c$, the infinite geometric series

$$1 + c + c^2 + c^3 + \cdots$$

converges, and the key formula for its sum is

$$1 + c + c^2 + c^3 + \cdots = \frac{1}{1-c} \tag{1}$$

To prove key formula (1), multiply both sides by $(1 - c)$ and verify that if $c \in (-1, 1)$, then the outcome is the equation $1 = 1$.
Finite Geometric Series

The finite geometric series is

$$1 + c + c^2 + c^3 + \cdots + c^T$$

and its sum is

$$1 + c + c^2 + c^3 + \cdots + c^T = \frac{1 - c^{T+1}}{1 - c}$$
Remark: The above formula works for any value of the scalar $c$ other than $c = 1$. We don't have to restrict $c$ to be in the set $(-1, 1)$.
We now move on to describe some famous economic applications of geometric series.
17.4 Example: The Money Multiplier in Fractional Reserve Banking

In a fractional reserve banking system, banks hold only a fraction $r \in (0, 1)$ of cash behind each deposit receipt that they issue
• In recent times
– cash consists of pieces of paper issued by the government and called dollars or
pounds or …
– a deposit is a balance in a checking or savings account that entitles the owner to
ask the bank for immediate payment in cash
• When the UK and France and the US were on either a gold or silver standard (before
1914, for example)
– cash was a gold or silver coin
– a deposit receipt was a bank note that the bank promised to convert into gold or
silver on demand; (sometimes it was also a checking or savings account balance)
Economists and financiers often define the supply of money as an economy-wide sum of
cash plus deposits.
In a fractional reserve banking system (one in which the reserve ratio 𝑟 satisfies 0 < 𝑟 <
1), banks create money by issuing deposits backed by fractional reserves plus loans that
they make to their customers.
A geometric series is a key tool for understanding how banks create money (i.e., deposits) in
a fractional reserve system.
The geometric series formula (1) is at the heart of the classic model of the money creation
process – one that leads us to the celebrated money multiplier.
A Simple Model

There is a set of banks named $i = 0, 1, 2, \ldots$. Bank $i$'s loans $L_i$, reserves $R_i$, and deposits $D_i$ satisfy the balance sheet identity

$$L_i + R_i = D_i$$
The left side of the above equation is the sum of the bank’s assets, namely, the loans 𝐿𝑖 it
has outstanding plus its reserves of cash 𝑅𝑖 .
The right side records bank 𝑖’s liabilities, namely, the deposits 𝐷𝑖 held by its depositors; these
are IOU’s from the bank to its depositors in the form of either checking accounts or savings
accounts (or before 1914, bank notes issued by a bank stating promises to redeem note for
gold or silver on demand).
Each bank $i$ sets its reserves to satisfy the equation

$$R_i = r D_i \tag{2}$$

where $r \in (0, 1)$ is its reserve-deposit ratio.

The deposits of bank $i + 1$ equal the loans made by bank $i$

$$D_{i+1} = L_i \tag{3}$$
Thus, we can think of the banks as being arranged along a line with loans from bank 𝑖 being
immediately deposited in 𝑖 + 1
• in this way, the debtors to bank 𝑖 become creditors of bank 𝑖 + 1
Finally, we add an initial condition about an exogenous level of bank 0’s deposits
𝐷0 is given exogenously
We can think of 𝐷0 as being the amount of cash that a first depositor put into the first bank
in the system, bank number 𝑖 = 0.
Now we do a little algebra.
Combining equations (2) and (3) tells us that
𝐿𝑖 = (1 − 𝑟)𝐷𝑖 (4)
This states that bank 𝑖 loans a fraction (1 − 𝑟) of its deposits and keeps a fraction 𝑟 as cash
reserves.
Combining equation (4) with equation (3) tells us that

$$D_{i+1} = (1 - r) D_i \tag{5}$$

which implies that $D_i = (1 - r)^i D_0$.

Equation (5) expresses $D_i$ as the $i$-th term in the product of $D_0$ and the geometric series

$$1, \; (1-r), \; (1-r)^2, \cdots$$
Therefore, the sum of all deposits in our banking system is

$$\sum_{i=0}^{\infty} (1-r)^i D_0 = \frac{D_0}{1 - (1-r)} = \frac{D_0}{r} \tag{6}$$
The money multiplier is a number that tells the multiplicative factor by which an exoge-
nous injection of cash into bank 0 leads to an increase in the total deposits in the banking
system.
Equation (6) asserts that the money multiplier is $\frac{1}{r}$

• An initial deposit of cash of $D_0$ in bank 0 leads the banking system to create total deposits of $\frac{D_0}{r}$.
• The initial deposit $D_0$ is held as reserves, distributed throughout the banking system according to $D_0 = \sum_{i=0}^{\infty} R_i$.
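A quick numerical check in Python (the reserve ratio r = 0.1 and initial deposit D0 = 100 below are purely illustrative values):

In [ ]: # Sketch: deposits created at each bank form a geometric series
        r = 0.1    # reserve ratio (illustrative)
        D0 = 100   # initial cash deposit (illustrative)

        # The partial sum of D0 * (1 - r)**i approaches the limit D0 / r
        total_deposits = sum(D0 * (1 - r)**i for i in range(1000))
        print(total_deposits, D0 / r)   # both approximately 1000.0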
17.5 Example: The Keynesian Multiplier

The famous economist John Maynard Keynes and his followers created a simple model intended to determine national income $y$ in circumstances in which
• there are substantial unemployed resources, in particular excess supply of labor and
capital
• prices and interest rates fail to adjust to make aggregate supply equal demand (e.g.,
prices and interest rates are frozen)
• national income is entirely determined by aggregate demand
The first equation is a national income identity asserting that consumption $c$ plus investment $i$ equals national income $y$:

$$c + i = y$$

The second equation is a Keynesian consumption function asserting that people consume a fraction $b \in (0, 1)$ of their income:

$$c = by$$

Substituting the consumption function into the national income identity and solving for $y$ gives

$$y = \frac{1}{1-b} \, i$$

The quantity $\frac{1}{1-b}$ is called the investment multiplier or simply the multiplier.
Applying the formula for the sum of an infinite geometric series, we can write the above equation as

$$y = i \sum_{t=0}^{\infty} b^t$$

where we have used the fact that

$$\frac{1}{1-b} = \sum_{t=0}^{\infty} b^t$$
The expression $\sum_{t=0}^{\infty} b^t$ motivates an interpretation of the multiplier as the outcome of a dynamic process that we describe next.
We arrive at a dynamic version by interpreting the nonnegative integer 𝑡 as indexing time and
changing our specification of the consumption function to take time into account
• we add a one-period lag in how income affects consumption
We let 𝑐𝑡 be consumption at time 𝑡 and 𝑖𝑡 be investment at time 𝑡.
We modify our consumption function to assume the form
𝑐𝑡 = 𝑏𝑦𝑡−1
so that 𝑏 is the marginal propensity to consume (now) out of last period’s income.
We begin with an initial condition stating that

$$y_{-1} = 0$$

We also assume that investment is constant over time:

$$i_t = i \quad \text{for all } t \geq 0$$

It follows that

$$y_0 = i + c_0 = i + b y_{-1} = i$$
and
𝑦1 = 𝑐1 + 𝑖 = 𝑏𝑦0 + 𝑖 = (1 + 𝑏)𝑖
and
𝑦2 = 𝑐2 + 𝑖 = 𝑏𝑦1 + 𝑖 = (1 + 𝑏 + 𝑏2 )𝑖
and, more generally,

$$y_t = b y_{t-1} + i = (1 + b + b^2 + \cdots + b^t) i$$
or
$$y_t = \frac{1 - b^{t+1}}{1 - b} \, i$$

Evidently, as $t \to +\infty$,

$$y_t \to \frac{1}{1-b} \, i$$
Remark 1: The above formula is often applied to assert that an exogenous increase in investment of $\Delta i$ at time 0 ignites a dynamic process of increases in national income by successive amounts

$$\Delta i, \; (1 + b)\Delta i, \; (1 + b + b^2)\Delta i, \cdots$$

at times 0, 1, 2, ….
Remark 2 Let 𝑔𝑡 be an exogenous sequence of government expenditures.
If we generalize the model so that the national income identity becomes
𝑐𝑡 + 𝑖 𝑡 + 𝑔 𝑡 = 𝑦 𝑡
then a version of the preceding argument shows that the government expenditures multiplier is also $\frac{1}{1-b}$, so that a permanent increase in government expenditures ultimately leads to an increase in national income equal to the multiplier times the increase in government expenditures.
17.6 Example: Interest Rates and Present Values

We can apply our formula for geometric series to study how interest rates affect values of streams of dollar payments that extend over time.
We work in discrete time and assume that 𝑡 = 0, 1, 2, … indexes time.
We let 𝑟 ∈ (0, 1) be a one-period net nominal interest rate
• if the nominal interest rate is 5 percent, then 𝑟 = .05
A one-period gross nominal interest rate 𝑅 is defined as
𝑅 = 1 + 𝑟 ∈ (1, 2)
• if 𝑟 = .05, then 𝑅 = 1.05
Remark: The gross nominal interest rate $R$ is an exchange rate or relative price of dollars between times $t$ and $t + 1$. The units of $R$ are dollars at time $t + 1$ per dollar at time $t$.
When people borrow and lend, they trade dollars now for dollars later or dollars later for dol-
lars now.
The price at which these exchanges occur is the gross nominal interest rate.
• If I sell $x$ dollars to you today, you pay me $Rx$ dollars tomorrow.
• This means that you borrowed $x$ dollars from me at a gross interest rate $R$ and a net interest rate $r$.
We assume that the net nominal interest rate 𝑟 is fixed over time, so that 𝑅 is the gross nom-
inal interest rate at times 𝑡 = 0, 1, 2, ….
Two important geometric sequences are

$$1, R, R^2, \cdots \tag{7}$$

and

$$1, R^{-1}, R^{-2}, \cdots \tag{8}$$
Sequence (7) tells us how dollar values of an investment accumulate through time.
Sequence (8) tells us how to discount future dollars to get their values in terms of today’s
dollars.
17.6.1 Accumulation
Geometric sequence (7) tells us how one dollar invested and re-invested in a project with gross one-period nominal rate of return $R$ accumulates

• here we assume that net interest payments are reinvested in the project
• thus, 1 dollar invested at time 0 pays interest $r$ dollars after one period, so we have $r + 1 = R$ dollars at time 1
• at time 1 we reinvest 1 + 𝑟 = 𝑅 dollars and receive interest of 𝑟𝑅 dollars at time 2 plus
the principal 𝑅 dollars, so we receive 𝑟𝑅 + 𝑅 = (1 + 𝑟)𝑅 = 𝑅2 dollars at the end of
period 2
• and so on
Evidently, if we invest $x$ dollars at time 0 and reinvest the proceeds, then the sequence

$$x, \; xR, \; xR^2, \cdots$$

tells us how our account accumulates at times 0, 1, 2, ….
17.6.2 Discounting
Geometric sequence (8) tells us how much future dollars are worth in terms of today’s dollars.
Remember that the units of 𝑅 are dollars at 𝑡 + 1 per dollar at 𝑡.
It follows that
• the units of 𝑅−1 are dollars at 𝑡 per dollar at 𝑡 + 1
• the units of 𝑅−2 are dollars at 𝑡 per dollar at 𝑡 + 2
• and so on; the units of 𝑅−𝑗 are dollars at 𝑡 per dollar at 𝑡 + 𝑗
So if someone has a claim on 𝑥 dollars at time 𝑡 + 𝑗, it is worth 𝑥𝑅−𝑗 dollars at time 𝑡 (e.g.,
today).
17.6.3 Application to Asset Pricing

A lease requires a payments stream of $x_t$ dollars at times $t = 0, 1, 2, \ldots$, where

$$x_t = G^t x_0$$

with $G = (1 + g)$, $g \in (0, 1)$ a growth rate, and $x_0$ a positive initial payment.

The present value of the lease is

$$
\begin{aligned}
p_0 &= x_0 + x_1/R + x_2/R^2 + \cdots \\
    &= x_0 (1 + G R^{-1} + G^2 R^{-2} + \cdots) \\
    &= x_0 \, \frac{1}{1 - G R^{-1}}
\end{aligned}
$$

where the last line uses the formula for an infinite geometric series.
Recall that 𝑅 = 1 + 𝑟 and 𝐺 = 1 + 𝑔 and that 𝑅 > 𝐺 and 𝑟 > 𝑔 and that 𝑟 and 𝑔 are typically
small numbers, e.g., .05 or .03.
Use the Taylor series of $\frac{1}{1+r}$ about $r = 0$, namely,

$$\frac{1}{1+r} = 1 - r + r^2 - r^3 + \cdots$$

and the fact that $r$ is small to approximate $\frac{1}{1+r} \approx 1 - r$.

Use this approximation to write $p_0$ as

$$
\begin{aligned}
p_0 &= x_0 \, \frac{1}{1 - G R^{-1}} \\
    &= x_0 \, \frac{1}{1 - (1+g)(1-r)} \\
    &= x_0 \, \frac{1}{1 - (1 + g - r - rg)} \\
    &\approx x_0 \, \frac{1}{r - g}
\end{aligned}
$$

where the last step uses the fact that $rg$ is small relative to $r - g$.

The approximation

$$p_0 = \frac{x_0}{r - g}$$
is known as the Gordon formula for the present value or current price of an infinite pay-
ment stream 𝑥0 𝐺𝑡 when the nominal one-period interest rate is 𝑟 and when 𝑟 > 𝑔.
We can also extend the asset pricing formula so that it applies to finite leases.
Let the payment stream on the lease now be 𝑥𝑡 for 𝑡 = 1, 2, … , 𝑇 , where again
𝑥𝑡 = 𝐺𝑡 𝑥0
The present value of the lease is

$$
\begin{aligned}
p_0 &= x_0 + x_1/R + \cdots + x_T/R^T \\
    &= x_0 (1 + G R^{-1} + \cdots + G^T R^{-T}) \\
    &= \frac{x_0 (1 - G^{T+1} R^{-(T+1)})}{1 - G R^{-1}}
\end{aligned}
$$

Applying the Taylor series of $\frac{1}{(1+r)^{T+1}}$ about $r = 0$, namely,

$$\frac{1}{(1+r)^{T+1}} = 1 - r(T+1) + \frac{1}{2} r^2 (T+1)(T+2) + \cdots \approx 1 - r(T+1)$$
Expanding the resulting expression and keeping only low-order terms in $r$ and $g$ yields an approximation with a correction term proportional to $rg \, x_0 (T+1)$.

We could have also approximated by removing the second term $rg \, x_0 (T+1)$ when $T$ is relatively small compared to $1/(rg)$ to get $x_0 (T+1)$ as in the finite stream approximation.
We will plot the true finite stream present-value and the two approximations, under different
values of 𝑇 , and 𝑔 and 𝑟 in Python.
First we plot the true finite stream present-value after computing it below
return p
# Infinite lease
def infinite_lease(g, r, x_0):
G = (1 + g)
R = (1 + r)
return x_0 / (1 - G * R**(-1))
Now that we have defined our functions, we can plot some outcomes.
First we study the quality of our approximations
T_max = 50
T = np.arange(0, T_max+1)
g = 0.02
r = 0.03
x_0 = 1
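# Assumed helpers (a sketch): the plot_function routine and the funcs list
# used below were not defined above, so these are minimal reconstructions
def plot_function(axes, x_vals, func, args):
    axes.plot(x_vals, func(*args), label=func.__name__)

our_args = (T, g, r, x_0)
funcs = [finite_lease_pv_true]   # the approximation functions would be added here as well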
fig, ax = plt.subplots()
ax.set_title('Finite Lease Present Value $T$ Periods Ahead')
for f in funcs:
plot_function(ax, T, f, our_args)
ax.legend()
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
plt.show()
The graph above shows how as duration 𝑇 → +∞, the value of a lease of duration 𝑇 ap-
proaches the value of a perpetual lease.
Now we consider two different views of what happens as 𝑟 and 𝑔 covary
ax.legend()
plt.show()
This graph gives a big hint for why the condition 𝑟 > 𝑔 is necessary if a lease of length 𝑇 =
+∞ is to have finite value.
For fans of 3-d graphs the same point comes through in the following graph.
If you aren’t enamored of 3-d graphs, feel free to skip the next visualization!
rr, gg = np.meshgrid(r, g)
z = finite_lease_pv_true(T, gg, rr, x_0)
We can use a little calculus to study how the present value 𝑝0 of a lease varies with 𝑟 and 𝑔.
We will use a library called SymPy.
SymPy enables us to do symbolic math calculations including computing derivatives of alge-
braic equations.
We will illustrate how it works by creating a symbolic expression that represents our present
value formula for an infinite lease.
After that, we’ll use SymPy to compute derivatives
Out[7]: $\dfrac{x_0}{-\frac{g+1}{r+1} + 1}$

In [8]: print('dp0 / dg is:')
        dp_dg = diff(p0, g)
        dp_dg

dp0 / dg is:

Out[8]: $\dfrac{x_0}{(r+1)\left(-\frac{g+1}{r+1} + 1\right)^{2}}$

In [9]: print('dp0 / dr is:')
        dp_dr = diff(p0, r)
        dp_dr

dp0 / dr is:

Out[9]: $-\dfrac{x_0 (g+1)}{(r+1)^{2}\left(-\frac{g+1}{r+1} + 1\right)^{2}}$
We can see that $\frac{\partial p_0}{\partial r} < 0$ as long as $r > g$, $r > 0$, $g > 0$ and $x_0$ is positive, so $\frac{\partial p_0}{\partial r}$ will always be negative.

Similarly, $\frac{\partial p_0}{\partial g} > 0$ as long as $r > g$, $r > 0$, $g > 0$ and $x_0$ is positive, so $\frac{\partial p_0}{\partial g}$ will always be positive.
17.7 Back to the Keynesian Multiplier

We will now go back to the case of the Keynesian multiplier and plot the time path of $y_t$, given that consumption is a constant fraction of national income, and investment is fixed.
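The plotting code below relies on a function calculate_y that iterates the difference equation $y_t = b y_{t-1} + i + g$; here is a sketch consistent with the calls that follow

# Function that calculates a path of y (a minimal sketch)
def calculate_y(i, b, g, T, y_init):
    y = np.zeros(T+1)
    y[0] = i + b * y_init + g
    for t in range(1, T+1):
        y[t] = b * y[t-1] + i + g
    return y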
# Initial values
i_0 = 0.3
g_0 = 0.3
# 2/3 of income goes towards consumption
b = 2/3
y_init = 0
T = 100
fig, ax = plt.subplots()
ax.set_title('Path of Aggregate Output Over Time')
ax.set_xlabel('$t$')
ax.set_ylabel('$y_t$')
ax.plot(np.arange(0, T+1), calculate_y(i_0, b, g_0, T, y_init))
# Output predicted by geometric series
ax.hlines(i_0 / (1 - b) + g_0 / (1 - b), xmin=-1, xmax=101, linestyles='--')
plt.show()
In this model, income grows over time, until it gradually converges to the infinite geometric
series sum of income.
We now examine what will happen if we vary the so-called marginal propensity to con-
sume, i.e., the fraction of income that is consumed
bs = (1/3, 2/3, 5/6, 0.9)   # marginal propensities to consume (assumed values)

fig, ax = plt.subplots()
ax.set_title('Changing Consumption as a Fraction of Income')
ax.set_ylabel('$y_t$')
ax.set_xlabel('$t$')
x = np.arange(0, T+1)
for b in bs:
y = calculate_y(i_0, b, g_0, T, y_init)
ax.plot(x, y, label=r'$b=$'+f"{b:.2f}")
ax.legend()
plt.show()
Increasing the marginal propensity to consume 𝑏 increases the path of output over time.
Now we will compare the effects on output of increases in investment and government spend-
ing.
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(6, 10))

x = np.arange(0, T+1)
values = [0.3, 0.4]
for i in values:
y = calculate_y(i, b, g_0, T, y_init)
ax1.plot(x, y, label=f"i={i}")
for g in values:
y = calculate_y(i_0, b, g, T, y_init)
ax2.plot(x, y, label=f"g={g}")
Notice here, whether government spending increases from 0.3 to 0.4 or investment increases
from 0.3 to 0.4, the shifts in the graphs are identical.
Chapter 18
Linear Algebra
18.1 Contents
• Overview 18.2
• Vectors 18.3
• Matrices 18.4
• Solving Systems of Equations 18.5
• Eigenvalues and Eigenvectors 18.6
• Further Topics 18.7
• Exercises 18.8
• Solutions 18.9
18.2 Overview
Linear algebra is one of the most useful branches of applied mathematics for economists to
invest in.
For example, many applied problems in economics and finance require the solution of a linear
system of equations, such as
𝑦1 = 𝑎𝑥1 + 𝑏𝑥2
𝑦2 = 𝑐𝑥1 + 𝑑𝑥2
or, more generally,

$$
\begin{aligned}
y_1 &= a_{11} x_1 + a_{12} x_2 + \cdots + a_{1k} x_k \\
&\vdots \\
y_n &= a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nk} x_k
\end{aligned}
$$

The objective here is to solve for the "unknowns" $x_1, \ldots, x_k$ given $a_{11}, \ldots, a_{nk}$ and $y_1, \ldots, y_n$.
When considering such problems, it is essential that we first consider at least some of the fol-
lowing questions
• Does a solution actually exist?
• Are there in fact many solutions, and if so how should we interpret them?
• If no solution exists, is there a best “approximate” solution?
18.3 Vectors
A vector of length 𝑛 is just a sequence (or array, or tuple) of 𝑛 numbers, which we write as
𝑥 = (𝑥1 , … , 𝑥𝑛 ) or 𝑥 = [𝑥1 , … , 𝑥𝑛 ].
We will write these sequences either horizontally or vertically as we please.
(Later, when we wish to perform certain matrix operations, it will become necessary to distin-
guish between the two)
The set of all 𝑛-vectors is denoted by ℝ𝑛 .
For example, ℝ2 is the plane, and a vector in ℝ2 is just a point in the plane.
Traditionally, vectors are represented visually as arrows from the origin to the point.
The following figure represents three vectors in this manner
The two most common operators for vectors are addition and scalar multiplication, which we
now describe.
As a matter of definition, when we add two vectors, we add them element-by-element
$$x + y = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} := \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$$
Scalar multiplication is an operation that takes a number $\gamma$ and a vector $x$ and produces

$$\gamma x := \begin{bmatrix} \gamma x_1 \\ \gamma x_2 \\ \vdots \\ \gamma x_n \end{bmatrix}$$
In [3]: fig, ax = plt.subplots(figsize=(10, 8))
        # Set the axes through the origin
        for spine in ['left', 'bottom']:
            ax.spines[spine].set_position('zero')
        for spine in ['right', 'top']:
            ax.spines[spine].set_color('none')

        x = (2, 2)   # vector to be scaled (an illustrative choice)
        ax.annotate('', xy=x, xytext=(0, 0),
                    arrowprops=dict(facecolor='blue',
                                    shrink=0,
                                    alpha=1,
                                    width=0.5))
        ax.text(x[0] + 0.4, x[1] - 0.2, '$x$', fontsize='16')

        scalars = (-2, 2)
        x = np.array(x)

        for s in scalars:
            v = s * x
            ax.annotate('', xy=v, xytext=(0, 0),
                        arrowprops=dict(facecolor='red',
                                        shrink=0,
                                        alpha=0.5,
                                        width=0.5))
            ax.text(v[0] + 0.4, v[1] - 0.2, f'${s} x$', fontsize='16')
        plt.show()
In Python, a vector can be represented as a list or tuple, such as x = (2, 4, 6), but is more commonly represented as a NumPy array.

One advantage of NumPy arrays is that scalar multiplication and addition have very natural syntax

In [4]: x = np.array((2, 4, 6))
        x + 10

Out[4]: array([12, 14, 16])

In [5]: 4 * x

Out[5]: array([ 8, 16, 24])
18.3.2 Inner Product and Norm

The inner product of vectors $x, y \in \mathbb{R}^n$ is defined as

$$x'y := \sum_{i=1}^{n} x_i y_i$$

The norm of a vector $x$ represents its "length" (i.e., its distance from the zero vector) and is defined as

$$\|x\| := \sqrt{x'x} := \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}$$
These quantities can be computed as follows

In [6]: x = np.ones(3)
        y = np.array((2, 4, 6))
        np.sum(x * y)   # inner product of x and y

Out[6]: 12.0

In [7]: np.sqrt(np.sum(x**2))   # norm of x, method one

Out[7]: 1.7320508075688772

In [8]: np.linalg.norm(x)       # norm of x, method two

Out[8]: 1.7320508075688772
18.3.3 Span
Given a set of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in ℝ𝑛 , it’s natural to think about the new vectors we
can create by performing linear operations.
New vectors created in this manner are called linear combinations of 𝐴.
In particular, $y \in \mathbb{R}^n$ is a linear combination of $A := \{a_1, \ldots, a_k\}$ if

$$y = \beta_1 a_1 + \cdots + \beta_k a_k \quad \text{for some scalars } \beta_1, \ldots, \beta_k$$

In this context, the values $\beta_1, \ldots, \beta_k$ are called the coefficients of the linear combination.
The set of linear combinations of 𝐴 is called the span of 𝐴.
The next figure shows the span of 𝐴 = {𝑎1 , 𝑎2 } in ℝ3 .
The span is a two-dimensional plane passing through these two points and the origin.
In [9]: fig = plt.figure(figsize=(10, 8))
        ax = fig.add_subplot(111, projection='3d')
        x_min, x_max = -5, 5

        α, β = 0.2, 0.1

        # Linear function whose graph is the plane spanned by a_1 and a_2
        def f(x, y):
            return α * x + β * y

        # Axis lines through the origin
        gs = 3
        z = np.linspace(x_min, x_max, gs)
        x = np.zeros(gs)
        y = np.zeros(gs)
        ax.plot(x, y, z, 'k-', lw=2, alpha=0.5)
        ax.plot(z, x, y, 'k-', lw=2, alpha=0.5)
        ax.plot(y, z, x, 'k-', lw=2, alpha=0.5)

        # Coordinates of the vectors a_1 and a_2 (illustrative values)
        x_coords, y_coords = (3, 3), (4, -4)
# Lines to vectors
for i in (0, 1):
x = (0, x_coords[i])
y = (0, y_coords[i])
z = (0, f(x_coords[i], y_coords[i]))
ax.plot(x, y, z, 'b-', lw=1.5, alpha=0.6)
Examples
If 𝐴 contains only one vector 𝑎1 ∈ ℝ2 , then its span is just the scalar multiples of 𝑎1 , which is
the unique line passing through both 𝑎1 and the origin.
If 𝐴 = {𝑒1 , 𝑒2 , 𝑒3 } consists of the canonical basis vectors of ℝ3 , that is
$$e_1 := \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad e_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad e_3 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
then the span of 𝐴 is all of ℝ3 , because, for any 𝑥 = (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 , we can write
𝑥 = 𝑥 1 𝑒1 + 𝑥 2 𝑒2 + 𝑥 3 𝑒3
As we’ll see, it’s often desirable to find families of vectors with relatively large span, so that
many vectors can be described by linear operators on a few vectors.
The condition we need for a set of vectors to have a large span is what’s called linear inde-
pendence.
In particular, a collection of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in ℝ𝑛 is said to be
• linearly dependent if some strict subset of 𝐴 has the same span as 𝐴.
• linearly independent if it is not linearly dependent.
Put differently, a set of vectors is linearly independent if no vector is redundant to the span
and linearly dependent otherwise.
To illustrate the idea, recall the figure that showed the span of vectors {𝑎1 , 𝑎2 } in ℝ3 as a
plane through the origin.
If we take a third vector 𝑎3 and form the set {𝑎1 , 𝑎2 , 𝑎3 }, this set will be
• linearly dependent if 𝑎3 lies in the plane
• linearly independent otherwise
As another illustration of the concept, since ℝ𝑛 can be spanned by 𝑛 vectors (see the discus-
sion of canonical basis vectors above), any collection of 𝑚 > 𝑛 vectors in ℝ𝑛 must be linearly
dependent.
The following statements are equivalent to linear independence of $A := \{a_1, \ldots, a_k\} \subset \mathbb{R}^n$

1. No vector in $A$ can be formed as a linear combination of the other elements.
2. If $\beta_1 a_1 + \cdots + \beta_k a_k = 0$ for scalars $\beta_1, \ldots, \beta_k$, then $\beta_1 = \cdots = \beta_k = 0$.
Another nice thing about sets of linearly independent vectors is that each element in the span has a unique representation as a linear combination of these vectors.

In other words, if $A := \{a_1, \ldots, a_k\} \subset \mathbb{R}^n$ is linearly independent and

$$y = \beta_1 a_1 + \cdots + \beta_k a_k$$

then no other coefficient sequence $\gamma_1, \ldots, \gamma_k$ will produce the same vector $y$.
18.4 Matrices
Matrices are a neat way of organizing data for use in linear operations.
An $n \times k$ matrix is a rectangular array $A$ of numbers with $n$ rows and $k$ columns:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nk} \end{bmatrix}$$
Often, the numbers in the matrix represent coefficients in a system of linear equations, as dis-
cussed at the start of this lecture.
For obvious reasons, the matrix 𝐴 is also called a vector if either 𝑛 = 1 or 𝑘 = 1.
In the former case, 𝐴 is called a row vector, while in the latter it is called a column vector.
If 𝑛 = 𝑘, then 𝐴 is called square.
The matrix formed by replacing 𝑎𝑖𝑗 by 𝑎𝑗𝑖 for every 𝑖 and 𝑗 is called the transpose of 𝐴 and
denoted 𝐴′ or 𝐴⊤ .
If 𝐴 = 𝐴′ , then 𝐴 is called symmetric.
For a square matrix $A$, the $n$ elements of the form $a_{ii}$ for $i = 1, \ldots, n$ are called the principal diagonal.
𝐴 is called diagonal if the only nonzero entries are on the principal diagonal.
If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then
𝐴 is called the identity matrix and denoted by 𝐼.
Just as was the case for vectors, a number of algebraic operations are defined for matrices.
Scalar multiplication and addition are immediate generalizations of the vector case:

$$\gamma A := \begin{bmatrix} \gamma a_{11} & \cdots & \gamma a_{1k} \\ \vdots & & \vdots \\ \gamma a_{n1} & \cdots & \gamma a_{nk} \end{bmatrix}$$

and

$$A + B := \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1k} + b_{1k} \\ \vdots & & \vdots \\ a_{n1} + b_{n1} & \cdots & a_{nk} + b_{nk} \end{bmatrix}$$
In the latter case, the matrices must have the same shape in order for the definition to make
sense.
We also have a convention for multiplying two matrices.
The rule for matrix multiplication generalizes the idea of inner products discussed above and
is designed to make multiplication play well with basic linear operations.
If 𝐴 and 𝐵 are two matrices, then their product 𝐴𝐵 is formed by taking as its 𝑖, 𝑗-th element
the inner product of the 𝑖-th row of 𝐴 and the 𝑗-th column of 𝐵.
There are many tutorials to help you visualize this operation, such as this one, or the discus-
sion on the Wikipedia page.
If 𝐴 is 𝑛 × 𝑘 and 𝐵 is 𝑗 × 𝑚, then to multiply 𝐴 and 𝐵 we require 𝑘 = 𝑗, and the resulting
matrix 𝐴𝐵 is 𝑛 × 𝑚.
As perhaps the most important special case, consider multiplying 𝑛 × 𝑘 matrix 𝐴 and 𝑘 × 1
column vector 𝑥.
According to the preceding rule, this gives us an $n \times 1$ column vector

$$Ax = \begin{bmatrix} a_{11} x_1 + \cdots + a_{1k} x_k \\ \vdots \\ a_{n1} x_1 + \cdots + a_{nk} x_k \end{bmatrix}$$
Note
𝐴𝐵 and 𝐵𝐴 are not generally the same thing.
NumPy arrays are also used as matrices, and have fast, efficient functions and methods for all the standard matrix operations [1].
You can create them manually from tuples of tuples (or lists of lists) as follows

In [10]: A = ((1, 2),
              (3, 4))

         type(A)

Out[10]: tuple
In [11]: A = np.array(A)
type(A)
Out[11]: numpy.ndarray
In [12]: A.shape
Out[12]: (2, 2)
The shape attribute is a tuple giving the number of rows and columns — see here for more
discussion.
To get the transpose of A, use A.transpose() or, more simply, A.T.
There are many convenient functions for creating common matrices (matrices of zeros, ones,
etc.) — see here.
Since operations are performed elementwise by default, scalar multiplication and addition
have very natural syntax
In [13]: A = np.identity(3)
B = np.ones((3, 3))
2 * A

Out[13]: array([[2., 0., 0.],
                [0., 2., 0.],
                [0., 0., 2.]])

In [14]: A + B

Out[14]: array([[2., 1., 1.],
                [1., 2., 1.],
                [1., 1., 2.]])
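To multiply matrices we use the @ symbol. Note that A * B is elementwise multiplication, not the matrix product — a quick sketch:

In [ ]: A = np.identity(3)
        B = np.ones((3, 3))
        A @ B    # matrix product
        A * B    # elementwise product, not matrix multiplication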
Each 𝑛 × 𝑘 matrix 𝐴 can be identified with a function 𝑓(𝑥) = 𝐴𝑥 that maps 𝑥 ∈ ℝ𝑘 into
𝑦 = 𝐴𝑥 ∈ ℝ𝑛 .
These kinds of functions have a special property: they are linear.
A function $f : \mathbb{R}^k \to \mathbb{R}^n$ is called linear if, for all $x, y \in \mathbb{R}^k$ and all scalars $\alpha, \beta$, we have

$$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$$
You can check that this holds for the function 𝑓(𝑥) = 𝐴𝑥 + 𝑏 when 𝑏 is the zero vector and
fails when 𝑏 is nonzero.
In fact, it’s known that 𝑓 is linear if and only if there exists a matrix 𝐴 such that 𝑓(𝑥) = 𝐴𝑥
for all 𝑥.
18.5 Solving Systems of Equations

Consider again the system of equations

$$y = Ax \tag{3}$$
The problem we face is to determine a vector 𝑥 ∈ ℝ𝑘 that solves (3), taking 𝑦 and 𝐴 as given.
This is a special case of a more general problem: Find an 𝑥 such that 𝑦 = 𝑓(𝑥).
Given an arbitrary function 𝑓 and a 𝑦, is there always an 𝑥 such that 𝑦 = 𝑓(𝑥)?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows
In [15]: fig, axes = plt.subplots(2, 1, figsize=(10, 10))

         for ax in axes:
# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')
ax = axes[0]
ax = axes[1]
ybar = 2.6
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.04, 0.91 * ybar, '$y$', fontsize=16)
plt.show()
In the first plot, there are multiple solutions, as the function is not one-to-one, while in the
second there are no solutions, since 𝑦 lies outside the range of 𝑓.
Can we impose conditions on 𝐴 in (3) that rule out these problems?
In this context, the most important thing to recognize about the expression 𝐴𝑥 is that it cor-
responds to a linear combination of the columns of 𝐴.
In particular, if 𝑎1 , … , 𝑎𝑘 are the columns of 𝐴, then
𝐴𝑥 = 𝑥1 𝑎1 + ⋯ + 𝑥𝑘 𝑎𝑘
Let’s discuss some more details, starting with the case where 𝐴 is 𝑛 × 𝑛.
This is the familiar case where the number of unknowns equals the number of equations.
For arbitrary 𝑦 ∈ ℝ𝑛 , we hope to find a unique 𝑥 ∈ ℝ𝑛 such that 𝑦 = 𝐴𝑥.
In view of the observations immediately above, if the columns of 𝐴 are linearly independent,
then their span, and hence the range of 𝑓(𝑥) = 𝐴𝑥, is all of ℝ𝑛 .
Hence there always exists an 𝑥 such that 𝑦 = 𝐴𝑥.
Moreover, the solution is unique.
In particular, the following are equivalent

1. The columns of $A$ are linearly independent.
2. For any $y \in \mathbb{R}^n$, the equation $y = Ax$ has a unique solution.
The property of having linearly independent columns is sometimes expressed as having full
column rank.
Inverse Matrices

Can we give some sort of expression for the solution?

If $y$ and $A$ are scalar with $A \neq 0$, then the solution is $x = A^{-1} y$.

A similar expression is available in the matrix case: if a square matrix $A$ has full column rank, then it possesses a multiplicative inverse matrix $A^{-1}$, with the property that $A A^{-1} = A^{-1} A = I$.

As a consequence, if we pre-multiply both sides of $y = Ax$ by $A^{-1}$, we get $x = A^{-1} y$, which is the solution that we're looking for.
Determinants
Another quick comment about square matrices is that to every such matrix we assign a
unique number called the determinant of the matrix — you can find the expression for it
here.
If the determinant of 𝐴 is not zero, then we say that 𝐴 is nonsingular.
Perhaps the most important fact about determinants is that 𝐴 is nonsingular if and only if 𝐴
is of full column rank.
This gives us a useful one-number summary of whether or not a square matrix can be in-
verted.
The Case of n > k

This is the case where the system $y = Ax$ has more equations than unknowns.

This case is very important in many settings, not least in the setting of linear regression (where $n$ is the number of observations, and $k$ is the number of explanatory variables).
Given arbitrary 𝑦 ∈ ℝ𝑛 , we seek an 𝑥 ∈ ℝ𝑘 such that 𝑦 = 𝐴𝑥.
In this setting, the existence of a solution is highly unlikely.
Without much loss of generality, let’s go over the intuition focusing on the case where the
columns of 𝐴 are linearly independent.
It follows that the span of the columns of 𝐴 is a 𝑘-dimensional subspace of ℝ𝑛 .
This span is very “unlikely” to contain arbitrary 𝑦 ∈ ℝ𝑛 .
To see why, recall the figure above, where 𝑘 = 2 and 𝑛 = 3.
Imagine an arbitrarily chosen 𝑦 ∈ ℝ3 , located somewhere in that three-dimensional space.
What’s the likelihood that 𝑦 lies in the span of {𝑎1 , 𝑎2 } (i.e., the two dimensional plane
through these points)?
In a sense, it must be very small, since this plane has zero “thickness”.
As a result, in the 𝑛 > 𝑘 case we usually give up on existence.
However, we can still seek the best approximation, for example, an 𝑥 that makes the distance
‖𝑦 − 𝐴𝑥‖ as small as possible.
To solve this problem, one can use either calculus or the theory of orthogonal projections.
The solution is known to be 𝑥̂ = (𝐴′ 𝐴)−1 𝐴′ 𝑦 — see for example chapter 3 of these notes.
The Case of n < k

This is the $n \times k$ case with $n < k$, so there are fewer equations than unknowns.
In this case there are either no solutions or infinitely many — in other words, uniqueness
never holds.
For example, consider the case where 𝑘 = 3 and 𝑛 = 2.
Thus, the columns of 𝐴 consists of 3 vectors in ℝ2 .
This set can never be linearly independent, since it is possible to find two vectors that span
ℝ2 .
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two.
For example, let’s say that 𝑎1 = 𝛼𝑎2 + 𝛽𝑎3 .
Then if 𝑦 = 𝐴𝑥 = 𝑥1 𝑎1 + 𝑥2 𝑎2 + 𝑥3 𝑎3 , we can also write
Here’s an illustration of how to solve linear equations with SciPy’s linalg submodule.
All of these routines are Python front ends to time-tested and highly optimized FORTRAN
code
In [16]: import numpy as np
         from scipy.linalg import inv, solve, det, eig

         A = ((1, 2), (3, 4))
         A = np.array(A)
         y = np.ones((2, 1))   # column vector
         det(A)                # check that A is nonsingular

Out[16]: -2.0

In [17]: A_inv = inv(A)   # compute the inverse
         A_inv

Out[17]: array([[-2. ,  1. ],
                [ 1.5, -0.5]])

In [18]: x = A_inv @ y   # solution
         A @ x           # should equal y

Out[18]: array([[1.],
                [1.]])

In [19]: solve(A, y)   # produces the same solution

Out[19]: array([[-1.],
                [ 1.]])
Observe how we can solve for 𝑥 = 𝐴−1 𝑦 by either via inv(A) @ y, or using solve(A, y).
The latter method uses a different algorithm (LU decomposition) that is numerically more
stable, and hence should almost always be preferred.
To obtain the least-squares solution 𝑥̂ = (𝐴′ 𝐴)−1 𝐴′ 𝑦, use scipy.linalg.lstsq(A, y).
18.6 Eigenvalues and Eigenvectors

Let $A$ be an $n \times n$ square matrix. If $\lambda$ is a scalar and $v$ is a non-zero vector in $\mathbb{R}^n$ such that

$$A v = \lambda v$$

then we say that $\lambda$ is an eigenvalue of $A$, and $v$ is an eigenvector.
plt.show()
The eigenvalue equation is equivalent to (𝐴 − 𝜆𝐼)𝑣 = 0, and this has a nonzero solution 𝑣 only
when the columns of 𝐴 − 𝜆𝐼 are linearly dependent.
This in turn is equivalent to stating that the determinant is zero.
Hence to find all eigenvalues, we can look for 𝜆 such that the determinant of 𝐴 − 𝜆𝐼 is zero.
This problem can be expressed as one of solving for the roots of a polynomial in 𝜆 of degree
𝑛.
This in turn implies the existence of 𝑛 solutions in the complex plane, although some might
be repeated.
Some nice facts about the eigenvalues of a square matrix $A$ are as follows

1. The determinant of $A$ equals the product of the eigenvalues.
2. The trace of $A$ (the sum of the elements on the principal diagonal) equals the sum of the eigenvalues.
3. If $A$ is symmetric, then all of its eigenvalues are real.
4. If $A$ is invertible and $\lambda_1, \ldots, \lambda_n$ are its eigenvalues, then the eigenvalues of $A^{-1}$ are $1/\lambda_1, \ldots, 1/\lambda_n$.
A corollary of the first statement is that a matrix is invertible if and only if all its eigenvalues
are nonzero.
Using SciPy, we can solve for the eigenvalues and eigenvectors of a matrix as follows
In [21]: A = ((1, 2),
              (2, 1))   # example matrix (assumed)
         A = np.array(A)
         evals, evecs = eig(A)
         evals

Out[21]: array([ 3.+0.j, -1.+0.j])

In [22]: evecs

Out[22]: array([[ 0.70710678, -0.70710678],
                [ 0.70710678,  0.70710678]])
It is sometimes useful to consider the generalized eigenvalue problem, which, for given matri-
ces 𝐴 and 𝐵, seeks generalized eigenvalues 𝜆 and eigenvectors 𝑣 such that
𝐴𝑣 = 𝜆𝐵𝑣
18.7 Further Topics

We round out our discussion by briefly mentioning several other important topics.

18.7.1 Series Expansions

Recall the usual summation formula for a geometric progression, which states that if $|a| < 1$, then

$$\sum_{k=0}^{\infty} a^k = (1 - a)^{-1}$$

A generalization of this idea exists in the matrix setting.
Matrix Norms

Let $A$ be a square matrix, and define its (spectral) norm by

$$\|A\| := \max_{\|x\| = 1} \|Ax\|$$

The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand side is a matrix norm — in this case, the so-called spectral norm.

For example, for a square matrix $S$, the condition $\|S\| < 1$ means that $S$ is contractive, in the sense that it pulls all vectors towards the origin [2].
Neumann’s Theorem
∞
−1
(𝐼 − 𝐴) = ∑ 𝐴𝑘 (4)
𝑘=0
Spectral Radius

A result known as Gelfand's formula tells us that, for any square matrix $A$,

$$\rho(A) = \lim_{k \to \infty} \|A^k\|^{1/k}$$

Here $\rho(A)$ is the spectral radius, defined as $\max_i |\lambda_i|$, where $\{\lambda_i\}_i$ is the set of eigenvalues of $A$.

As a consequence of Gelfand's formula, if all eigenvalues are strictly less than one in modulus, there exists a $k$ with $\|A^k\| < 1$.

In which case (4) is valid.
18.7.2 Positive Definite Matrices

Let $A$ be a symmetric $n \times n$ matrix. We say that $A$ is

1. positive definite if $x'Ax > 0$ for every nonzero $x \in \mathbb{R}^n$
2. positive semi-definite if $x'Ax \geq 0$ for every $x \in \mathbb{R}^n$

Analogous definitions exist for negative definite and negative semi-definite matrices.

It is notable that if $A$ is positive definite, then all of its eigenvalues are strictly positive, and hence $A$ is invertible (with positive definite inverse).
18.7.3 Differentiating Linear and Quadratic Forms

The following formulas are useful in many economic contexts. Let

• $z$, $x$ and $a$ all be $n \times 1$ vectors
• $A$ be an $n \times n$ matrix
• $B$ be an $m \times n$ matrix and $y$ be an $m \times 1$ vector

Then

1. $\frac{\partial a'x}{\partial x} = a$
2. $\frac{\partial Ax}{\partial x} = A'$
3. $\frac{\partial x'Ax}{\partial x} = (A + A')x$
4. $\frac{\partial y'Bz}{\partial y} = Bz$
5. $\frac{\partial y'Bz}{\partial B} = yz'$
18.8 Exercises
18.8.1 Exercise 1

Let $x$ be a given $n \times 1$ vector and consider the problem

$$v(x) = \max_{y, u} \{ -y'Py - u'Qu \}$$

subject to the linear constraint

$$y = Ax + Bu$$
Here
• 𝑃 is an 𝑛 × 𝑛 matrix and 𝑄 is an 𝑚 × 𝑚 matrix
• 𝐴 is an 𝑛 × 𝑛 matrix and 𝐵 is an 𝑛 × 𝑚 matrix
• both 𝑃 and 𝑄 are symmetric and positive semidefinite
(What must the dimensions of 𝑦 and 𝑢 be to make this a well-posed problem?)
One way to solve the problem is to form the Lagrangian
ℒ = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]
and then differentiate with respect to $y$, $u$ and $\lambda$. Show that these conditions imply that

1. $\lambda = -2Py$.
2. The optimizing choice of $u$ satisfies $u = -(Q + B'PB)^{-1}B'PAx$.
3. The function $v$ satisfies $v(x) = -x'\tilde{P}x$ where $\tilde{P} = A'PA - A'PB(Q + B'PB)^{-1}B'PA$.

As we will see, in economic contexts Lagrange multipliers often are shadow prices.
Note
If we don’t care about the Lagrange multipliers, we can substitute the constraint
into the objective function, and then just maximize −(𝐴𝑥+𝐵𝑢)′ 𝑃 (𝐴𝑥+𝐵𝑢)−𝑢′ 𝑄𝑢
with respect to 𝑢. You can verify that this leads to the same maximizer.
18.9 Solutions

18.9.1 Exercise 1

We have the optimization problem

$$v(x) = \max_{y, u} \{ -y'Py - u'Qu \}$$

s.t.

$$y = Ax + Bu$$
with primitives
• 𝑃 be a symmetric and positive semidefinite 𝑛 × 𝑛 matrix
• 𝑄 be a symmetric and positive semidefinite 𝑚 × 𝑚 matrix
• 𝐴 an 𝑛 × 𝑛 matrix
• 𝐵 an 𝑛 × 𝑚 matrix
The associated Lagrangian is:
𝐿 = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]
Step 1.

Differentiating the Lagrangian with respect to $y$ and setting its derivative equal to zero yields
$$\frac{\partial L}{\partial y} = -(P + P')y - \lambda = -2Py - \lambda = 0,$$
since P is symmetric.
Accordingly, the first-order condition for maximizing L w.r.t. y implies
𝜆 = −2𝑃 𝑦
Step 2.

Differentiating the Lagrangian with respect to $u$ and setting its derivative equal to zero yields
$$\frac{\partial L}{\partial u} = -(Q + Q')u + B'\lambda = -2Qu + B'\lambda = 0$$

Substituting $\lambda = -2Py$ gives

$$Qu + B'Py = 0$$

Substituting the linear constraint $y = Ax + Bu$ into the above equation gives

$$Qu + B'P(Ax + Bu) = 0$$

$$(Q + B'PB)u + B'PAx = 0$$

which is the first-order condition for maximizing $L$ w.r.t. $u$. Thus the optimal choice of $u$ must satisfy

$$u = -(Q + B'PB)^{-1}B'PAx$$
Step 3.
Rewriting our problem by substituting the constraint into the objective function, we get

$$v(x) = \max_u \{ -(Ax + Bu)'P(Ax + Bu) - u'Qu \}$$

Since we know the optimal choice of $u$ satisfies $u = -(Q + B'PB)^{-1}B'PAx$, define $S := -(Q + B'PB)^{-1}B'PA$ so that $u = Sx$. Expanding the objective at the optimum,

$$v(x) = -x'A'PAx - 2u'B'PAx - u'(Q + B'PB)u$$

For the second term $-2u'B'PAx$,

$$-2u'B'PAx = -2x'S'B'PAx = 2x'A'PB(Q + B'PB)^{-1}B'PAx$$

Notice that the term $(Q + B'PB)^{-1}$ is symmetric as both $P$ and $Q$ are symmetric.
Regarding the third term $-u'(Q + B'PB)u$,

$$-u'(Q + B'PB)u = -x'S'(Q + B'PB)Sx = -x'A'PB(Q + B'PB)^{-1}B'PAx$$

Hence, the summation of the second and third terms is $x'A'PB(Q + B'PB)^{-1}B'PAx$.

This implies that

$$
\begin{aligned}
v(x) &= -x'A'PAx + x'A'PB(Q + B'PB)^{-1}B'PAx \\
&= -x'\left[ A'PA - A'PB(Q + B'PB)^{-1}B'PA \right] x
\end{aligned}
$$
Therefore, the solution to the optimization problem 𝑣(𝑥) = −𝑥′ 𝑃 ̃ 𝑥 follows the above result by
denoting 𝑃 ̃ ∶= 𝐴′ 𝑃 𝐴 − 𝐴′ 𝑃 𝐵(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴
Footnotes
[1] Although there is a specialized matrix data type defined in NumPy, it’s more standard to
work with ordinary NumPy arrays. See this discussion.
[2] Suppose that ‖𝑆‖ < 1. Take any nonzero vector 𝑥, and let 𝑟 ∶= ‖𝑥‖. We have ‖𝑆𝑥‖ =
𝑟‖𝑆(𝑥/𝑟)‖ ≤ 𝑟‖𝑆‖ < 𝑟 = ‖𝑥‖. Hence every point is pulled towards the origin.
Chapter 19

Complex Numbers and Trigonometry
19.1 Contents
• Overview 19.2
• De Moivre’s Theorem 19.3
• Applications of de Moivre’s Theorem 19.4
19.2 Overview

This lecture introduces some elementary mathematics and trigonometry, built around the polar form of a complex number $z = x + iy$ with real part $x$ and imaginary part $y$.

The modulus of $z$ is

$$r = |z| = \sqrt{x^2 + y^2}$$
The value $\theta$ is the angle of $(x, y)$ with respect to the real axis.

Evidently, the tangent of $\theta$ is $\frac{y}{x}$.

Therefore,

$$\theta = \tan^{-1}\left(\frac{y}{x}\right)$$
19.2.2 An Example
Consider the complex number $z = 1 + \sqrt{3}\,i$.

For $z = 1 + \sqrt{3}\,i$, $x = 1$, $y = \sqrt{3}$.

It follows that $r = 2$ and $\theta = \tan^{-1}(\sqrt{3}) = \frac{\pi}{3} = 60°$.

Let's use Python to plot the trigonometric form of the complex number $z = 1 + \sqrt{3}\,i$.
In [1]: import numpy as np
        import matplotlib.pyplot as plt

        π = np.pi

        # Set parameters
        r = 2
        θ = π/3
        x = r * np.cos(θ)
        x_range = np.linspace(0, x, 1000)
        θ_range = np.linspace(0, θ, 1000)
# Plot
fig = plt.figure(figsize=(8, 8))
ax = plt.subplot(111, projection='polar')
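        # A sketch of the plotting commands (assumed): draw r, x, y and θ
        ax.plot((0, θ), (0, r), marker='o', color='b')             # segment for r
        ax.plot(np.zeros(x_range.shape), x_range, color='b')       # segment for x
        ax.plot(θ_range, x / np.cos(θ_range), color='b')           # segment for y
        ax.plot(θ_range, np.full(θ_range.shape, 0.1), color='r')   # small arc for θ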
ax.set_rmax(2)
ax.set_rticks((0.5, 1, 1.5, 2)) # Less radial ticks
ax.set_rlabel_position(-88.5) # Get radial labels away from plotted line
ax.grid(True)
plt.show()
19.3 De Moivre's Theorem

To prove de Moivre's theorem, note that

$$(r(\cos\theta + i\sin\theta))^n = \left(r e^{i\theta}\right)^n$$

and compute.
19.4.1 Example 1

We can use de Moivre's theorem to show that $r^2 = x^2 + y^2$. We have

$$
\begin{aligned}
1 &= e^{i\theta} e^{-i\theta} \\
&= (\cos\theta + i\sin\theta)(\cos(-\theta) + i\sin(-\theta)) \\
&= (\cos\theta + i\sin\theta)(\cos\theta - i\sin\theta) \\
&= \cos^2\theta + \sin^2\theta \\
&= \frac{x^2}{r^2} + \frac{y^2}{r^2}
\end{aligned}
$$

and thus

$$x^2 + y^2 = r^2$$
19.4.2 Example 2

Let $z = re^{i\theta}$, $\bar{z} = re^{-i\theta}$, $a = pe^{i\omega}$ and $\bar{a} = pe^{-i\omega}$. Then

$$
\begin{aligned}
x_n &= a z^n + \bar{a} \bar{z}^n \\
&= p e^{i\omega} (r e^{i\theta})^n + p e^{-i\omega} (r e^{-i\theta})^n \\
&= p r^n e^{i(\omega + n\theta)} + p r^n e^{-i(\omega + n\theta)} \\
&= p r^n [\cos(\omega + n\theta) + i \sin(\omega + n\theta) + \cos(\omega + n\theta) - i \sin(\omega + n\theta)] \\
&= 2 p r^n \cos(\omega + n\theta)
\end{aligned}
$$
19.4.3 Example 3

This example provides machinery that is at the heart of Samuelson's analysis of his multiplier-accelerator model [139].

Thus, consider a second-order linear difference equation

$$x_{n+2} = c_1 x_{n+1} + c_2 x_n$$

whose characteristic polynomial is

$$z^2 - c_1 z - c_2 = 0$$
or

$$(z^2 - c_1 z - c_2) = (z - z_1)(z - z_2) = 0$$

has roots $z_1, z_2$.
A solution is a sequence $\{x_n\}_{n=0}^{\infty}$ that satisfies the difference equation.
Under the following circumstances, we can apply our example 2 formula to solve the differ-
ence equation
• the roots 𝑧1 , 𝑧2 of the characteristic polynomial of the difference equation form a com-
plex conjugate pair
• the values 𝑥0 , 𝑥1 are given initial conditions
To solve the difference equation, recall from example 2 that

$$x_n = 2 p r^n \cos(\omega + n\theta)$$

where $\omega, p$ are coefficients to be determined from information encoded in the initial conditions $x_1, x_0$.
Since $x_0 = 2p \cos\omega$ and $x_1 = 2pr \cos(\omega + \theta)$, the ratio of $x_1$ to $x_0$ is

$$\frac{x_1}{x_0} = \frac{r \cos(\omega + \theta)}{\cos\omega}$$

We can solve this equation for $\omega$, then solve for $p$ using $x_0 = 2p \cos\omega$.
With the sympy package in Python, we are able to solve and plot the dynamics of 𝑥𝑛 given
different values of 𝑛.
In this example, we set the initial values:

• $r = 0.9$
• $\theta = \frac{1}{4}\pi$
• $x_0 = 4$
• $x_1 = r \cdot 2\sqrt{2} = 1.8\sqrt{2}$
We first numerically solve for 𝜔 and 𝑝 using nsolve in the sympy package based on the
above initial condition:
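A minimal setup sketch (the import list and the name π = np.pi are assumptions):

import numpy as np
from sympy import sqrt, cos, sin, symbols, Eq, nsolve

π = np.pi

# Set parameters
r = 0.9
θ = π/4
x0 = 4
x1 = 2 * r * sqrt(2)

# Define symbols to be solved for
ω, p = symbols('ω p', real=True)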
# Solve for ω
## Note: we choose the solution near 0
eq1 = Eq(x1/x0 - r * cos(ω+θ) / cos(ω), 0)
ω = nsolve(eq1, ω, 0)
ω = float(ω)
print(f'ω = {ω:1.3f}')
# Solve for p
eq2 = Eq(x0 - 2 * p * cos(ω), 0)
p = nsolve(eq2, p, 0)
p = float(p)
print(f'p = {p:1.3f}')
ω = 0.000
p = 2.000
# Define x_n
x = lambda n: 2 * p * r**n * np.cos(ω + n * θ)
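# Horizon for the plot (max_n = 30 is an assumed value; any horizon works)
max_n = 30
n = np.arange(0, max_n + 1)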
# Plot
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(n, x(n))
ax.set(xlim=(0, max_n), ylim=(-5, 5), xlabel='$n$', ylabel='$x_n$')
ax.grid()
plt.show()
19.4.4 Trigonometric Identities

We can obtain trigonometric identities by manipulating polar forms of complex numbers. Start from

$$\cos(\omega + \theta) = \frac{e^{i(\omega+\theta)} + e^{-i(\omega+\theta)}}{2}$$

$$\sin(\omega + \theta) = \frac{e^{i(\omega+\theta)} - e^{-i(\omega+\theta)}}{2i}$$

Since both real and imaginary parts of $e^{i(\omega+\theta)} = e^{i\omega}e^{i\theta}$ should be equal, we get:

$$\cos(\omega + \theta) = \cos\omega\cos\theta - \sin\omega\sin\theta$$

$$\sin(\omega + \theta) = \cos\omega\sin\theta + \sin\omega\cos\theta$$
The equations above are also known as the angle sum identities. We can verify the equa-
tions using the simplify function in the sympy package:
# Verify
print("cos(ω)cos(θ) - sin(ω)sin(θ) =",
simplify(cos(ω)*cos(θ) - sin(ω) * sin(θ)))
print("cos(ω)sin(θ) + sin(ω)cos(θ) =",
simplify(cos(ω)*sin(θ) + sin(ω) * cos(θ)))
19.4.5 Trigonometric Integrals

We can also compute the trigonometric integrals using polar forms of complex numbers.
For example, we want to solve the following integral:
For example, we want to solve the following integral:

$$\int_{-\pi}^{\pi} \cos(\omega) \sin(\omega) \, d\omega$$

Since $\cos(\omega)\sin(\omega) = \frac{1}{2}\frac{d}{d\omega}\sin^2(\omega)$, we have

$$\int_{-\pi}^{\pi} \cos(\omega) \sin(\omega) \, d\omega = \frac{1}{2}\sin^2(\pi) - \frac{1}{2}\sin^2(-\pi) = 0$$
We can verify the analytical as well as numerical results using integrate in the sympy
package:
ω = Symbol('ω')
print('The analytical solution for integral of cos(ω)sin(ω) is:')
integrate(cos(ω) * sin(ω), ω)
Out[6]: $\dfrac{\sin^2(\omega)}{2}$

In [7]: print('The numerical solution for the integral of cos(ω)sin(ω) from -π to π is:')
        integrate(cos(ω) * sin(ω), (ω, -π, π))

Out[7]: 0
19.4.6 Exercises

We invite the reader to verify analytically and with the sympy package the following two equalities:

$$\int_{-\pi}^{\pi} \cos^2(\omega) \, d\omega = \pi$$

$$\int_{-\pi}^{\pi} \sin^2(\omega) \, d\omega = \pi$$
Chapter 20

Orthogonal Projections and Their Applications
20.1 Contents
• Overview 20.2
• Key Definitions 20.3
• The Orthogonal Projection Theorem 20.4
• Orthonormal Basis 20.5
• Projection Using Matrix Algebra 20.6
• Least Squares Regression 20.7
• Orthogonalization and Decomposition 20.8
• Exercises 20.9
• Solutions 20.10
20.2 Overview
Orthogonal projection is a cornerstone of vector space methods, with many diverse applica-
tions.
These include, but are not limited to,
• Least squares projection, also known as linear regression
• Conditional expectations for multivariate normal (Gaussian) distributions
• Gram–Schmidt orthogonalization
• QR decomposition
• Orthogonal polynomials
• etc
In this lecture, we focus on
• key ideas
• least squares regression
We’ll require the following imports:
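Imports along the following lines suffice for the code below (the exact list in the source is not shown, so treat this as an assumption):

import numpy as np
from scipy.linalg import qr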
For background and foundational concepts, see our lecture on linear algebra.
For more proofs and greater theoretical detail, see A Primer in Econometric Theory.
For a complete set of proofs in a general setting, see, for example, [132].
For an advanced treatment of projection in the context of least squares prediction, see this
book chapter.
20.3 Key Definitions

Assume x, z ∈ ℝⁿ.
Define ⟨𝑥, 𝑧⟩ = ∑𝑖 𝑥𝑖 𝑧𝑖 .
Recall ‖𝑥‖2 = ⟨𝑥, 𝑥⟩.
The law of cosines states that ⟨𝑥, 𝑧⟩ = ‖𝑥‖‖𝑧‖ cos(𝜃) where 𝜃 is the angle between the vectors
𝑥 and 𝑧.
When ⟨𝑥, 𝑧⟩ = 0, then cos(𝜃) = 0 and 𝑥 and 𝑧 are said to be orthogonal and we write 𝑥 ⟂ 𝑧.
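To make these definitions concrete, here is a minimal NumPy sketch (the vectors and the helper name inner are ours, chosen for illustration only):

import numpy as np

def inner(x, z):
    "Inner product ⟨x, z⟩ = Σᵢ xᵢ zᵢ."
    return np.sum(x * z)

x = np.array([1.0, 2.0, 2.0])
z = np.array([2.0, -1.0, 0.0])

print(inner(x, z))            # 0.0, so x ⟂ z
print(np.sqrt(inner(x, x)))   # the norm ‖x‖ = 3.0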
S⟂ is a linear subspace of ℝⁿ

• To see this, fix x, y ∈ S⟂ and α, β ∈ ℝ.
• Observe that if z ∈ S, then

⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩ = α × 0 + β × 0 = 0

• Hence αx + βy ∈ S⟂, as was to be shown.
The orthogonal projection of y onto S solves the minimization problem

ŷ := argmin_{z∈S} ‖y − z‖

For any z ∈ S, the Pythagorean law applied to the orthogonal pair y − ŷ ⟂ ŷ − z gives ‖y − z‖² = ‖y − ŷ‖² + ‖ŷ − z‖². Hence ‖y − z‖ ≥ ‖y − ŷ‖, which completes the proof.
For a linear space Y and a fixed linear subspace S, we have a functional relationship y ↦ its orthogonal projection ŷ onto S. By the theorem, this mapping is well defined; we denote it by P and write P y := ŷ. The mapping P is characterized by the two properties

1. P y ∈ S and
2. y − P y ⟂ S

From these, further useful facts follow, such as ‖y‖² = ‖P y‖² + ‖y − P y‖². For example, to prove this last claim, observe that y = P y + y − P y and apply the Pythagorean law.
Orthogonal Complement
Let 𝑆 ⊂ ℝ𝑛 .
The orthogonal complement of 𝑆 is the linear subspace 𝑆 ⟂ that satisfies 𝑥1 ⟂ 𝑥2 for every
𝑥1 ∈ 𝑆 and 𝑥2 ∈ 𝑆 ⟂ .
Let 𝑌 be a linear space with linear subspace 𝑆 and its orthogonal complement 𝑆 ⟂ .
We write
𝑌 = 𝑆 ⊕ 𝑆⟂
to indicate that for every 𝑦 ∈ 𝑌 there is unique 𝑥1 ∈ 𝑆 and a unique 𝑥2 ∈ 𝑆 ⟂ such that
𝑦 = 𝑥 1 + 𝑥2 .
When a set {u_1, …, u_k} ⊂ ℝⁿ is orthonormal (the vectors are mutually orthogonal and each has unit norm) and spans S, it forms an orthonormal basis of S, and every element of S can be represented in terms of it:

x = Σ_{i=1}^{k} ⟨x, u_i⟩ u_i    for all x ∈ S

To see this, observe that since x ∈ span{u_1, …, u_k}, we can find scalars α_1, …, α_k that verify

x = Σ_{j=1}^{k} α_j u_j    (1)

Taking the inner product of (1) with u_i and using orthonormality gives

⟨x, u_i⟩ = Σ_{j=1}^{k} α_j ⟨u_j, u_i⟩ = α_i
When the subspace onto which we are projecting has an orthonormal basis, computing the projection simplifies:

Theorem If {u_1, …, u_k} is an orthonormal basis for S, then

P y = Σ_{i=1}^{k} ⟨y, u_i⟩ u_i,    ∀ y ∈ ℝⁿ    (2)

To verify (2), note that the right-hand side clearly lies in S, and the error is orthogonal to every basis vector:

⟨y − Σ_{i=1}^{k} ⟨y, u_i⟩ u_i, u_j⟩ = ⟨y, u_j⟩ − Σ_{i=1}^{k} ⟨y, u_i⟩⟨u_i, u_j⟩ = 0
In finite dimensions the projection mapping can be represented by a matrix. Writing Ê_S y for the orthogonal projection of y onto S, we have

Ê_S y = P y    where    P = X(X′X)^{−1}X′

whenever S is the column span of a matrix X with linearly independent columns. To verify that this matrix expression is indeed the orthogonal projection, one checks the two characterizing properties:

1. P y ∈ S, and
2. y − P y ⟂ S

When S is instead spanned by a matrix U with orthonormal columns, the expression simplifies, since U′U = I:

P y = U(U′U)^{−1}U′ y = U U′ y = Σ_{i=1}^{k} ⟨u_i, y⟩ u_i

We have recovered our earlier result about projecting onto the span of an orthonormal basis.
The matrix P also delivers the least squares fitted values: with

β̂ := (X′X)^{−1}X′ y

we have

X β̂ = X(X′X)^{−1}X′ y = P y

Because Xb ∈ span(X) for any b, minimizing ‖y − Xb‖ over b is the same problem as projecting y onto span(X).
If probabilities and hence 𝔼 are unknown, we cannot solve this problem directly.

However, if a sample is available, we can estimate the risk with the empirical risk:

min_{f∈ℱ} (1/N) Σ_{n=1}^{N} (y_n − f(x_n))²

If we restrict attention to linear functions f(x) = b′x, the problem becomes

min_{b∈ℝᴷ} Σ_{n=1}^{N} (y_n − b′x_n)²
20.7.2 Solution

Define the vector of observations on the dependent variable and the vector of observations on the regressors for the n-th data point:

y := (y_1, y_2, …, y_N)′    and    x_n := (x_{n1}, x_{n2}, …, x_{nK})′ = n-th obs on all regressors

Let X be the N × K matrix whose n-th row is x_n′. Then

argmin_{b∈ℝᴷ} Σ_{n=1}^{N} (y_n − b′x_n)² = argmin_{b∈ℝᴷ} ‖y − Xb‖
By our results on projection, the solution and related objects are

β̂ := (X′X)^{−1}X′ y    (the least squares estimator)

ŷ := X β̂ = P y    (the vector of fitted values)

û := y − ŷ = y − P y = M y    (the vector of residuals, where M := I − P)
Let’s return to the connection between linear independence and orthogonality touched on
above.
A result of much interest is a famous algorithm for constructing orthonormal sets from lin-
early independent sets.
The next section gives details.
Theorem For each linearly independent set {x_1, …, x_k} ⊂ ℝⁿ, there exists an orthonormal set {u_1, …, u_k} with

span{x_1, …, x_i} = span{u_1, …, u_i}    for i = 1, …, k
20.8.2 QR Decomposition
The following result uses the preceding algorithm to produce a useful decomposition.
Theorem If 𝑋 is 𝑛 × 𝑘 with linearly independent columns, then there exists a factorization
𝑋 = 𝑄𝑅 where
• 𝑅 is 𝑘 × 𝑘, upper triangular, and nonsingular
• 𝑄 is 𝑛 × 𝑘 with orthonormal columns
Proof sketch: Let
• 𝑥𝑗 ∶= col𝑗 (𝑋)
• {𝑢1 , … , 𝑢𝑘 } be orthonormal with the same span as {𝑥1 , … , 𝑥𝑘 } (to be constructed using
Gram–Schmidt)
• 𝑄 be formed from cols 𝑢𝑖
Since x_j ∈ span{u_1, …, u_j}, we have

x_j = Σ_{i=1}^{j} ⟨u_i, x_j⟩ u_i    for j = 1, …, k
For matrices X and y that overdetermine β in the linear equation system y = Xβ, we found the least squares approximator β̂ = (X′X)^{−1}X′y.
Using the QR decomposition 𝑋 = 𝑄𝑅 gives
𝛽 ̂ = (𝑅′ 𝑄′ 𝑄𝑅)−1 𝑅′ 𝑄′ 𝑦
= (𝑅′ 𝑅)−1 𝑅′ 𝑄′ 𝑦
= 𝑅−1 (𝑅′ )−1 𝑅′ 𝑄′ 𝑦 = 𝑅−1 𝑄′ 𝑦
Numerical routines would in this case use the alternative form 𝑅𝛽 ̂ = 𝑄′ 𝑦 and back substitu-
tion.
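As a rough illustration of that last point (our own sketch, not code from this lecture), here is the QR route to least squares in SciPy, with solve_triangular performing the back substitution step:

import numpy as np
from scipy.linalg import qr, solve_triangular

X = np.array([[1.0, 0.0], [0.0, -6.0], [2.0, 2.0]])
y = np.array([1.0, 3.0, -3.0])

Q, R = qr(X, mode='economic')             # X = QR, Q is n x k, R is k x k
beta_hat = solve_triangular(R, Q.T @ y)   # solves R β̂ = Q'y by back substitution

# Same answer as the normal equations (X'X) β̂ = X'y
print(beta_hat)
print(np.linalg.solve(X.T @ X, X.T @ y))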
20.9 Exercises
20.9.1 Exercise 1
20.9.2 Exercise 2
Let 𝑃 = 𝑋(𝑋 ′ 𝑋)−1 𝑋 ′ and let 𝑀 = 𝐼 − 𝑃 . Show that 𝑃 and 𝑀 are both idempotent and
symmetric. Can you give any intuition as to why they should be idempotent?
20.9.3 Exercise 3

Using Gram–Schmidt orthogonalization, produce a linear projection of y onto the column space of X, and verify this using the projection matrix P := X(X′X)^{−1}X′ and also using QR decomposition, where

y := (1, 3, −3)′

and

X := ⎛ 1   0 ⎞
     ⎜ 0  −6 ⎟
     ⎝ 2   2 ⎠
20.10 Solutions
20.10.1 Exercise 1
20.10.2 Exercise 2
Symmetry and idempotence of M and P can be established using standard rules for matrix algebra. The intuition behind idempotence of M and P is that both are orthogonal projections. After a point is projected into a given subspace, applying the projection again makes no difference. (A point inside the subspace is not shifted by orthogonal projection onto that space because it is already the closest point in the subspace to itself.)
20.10.3 Exercise 3
Here's a function that computes the orthonormal vectors using the GS algorithm given in the lecture

def gram_schmidt(X):
    """
    Implements Gram-Schmidt orthogonalization.

    Parameters
    ----------
    X : an n x k array with linearly independent columns

    Returns
    -------
    U : an n x k array with orthonormal columns
    """
    # Set up
    n, k = X.shape
    U = np.empty((n, k))
    I = np.eye(n)

    # The first column of U is the normalized first column of X
    v1 = X[:, 0]
    U[:, 0] = v1 / np.sqrt(np.sum(v1 * v1))

    for i in range(1, k):
        # The vector we're going to project
        b = X[:, i]
        # The first i columns of X
        Z = X[:, 0:i]
        # Project onto the orthogonal complement of the col span of Z
        M = I - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
        u = M @ b
        # Normalize
        U[:, i] = u / np.sqrt(np.sum(u * u))

    return U
y = [1, 3, -3]

X = [[1, 0],
     [0, -6],
     [2, 2]]

X, y = [np.asarray(z) for z in (X, y)]
First, let’s try projection of 𝑦 onto the column space of 𝑋 using the ordinary matrix expres-
sion:
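A minimal sketch of this computation:

Py1 = X @ np.linalg.inv(X.T @ X) @ X.T @ y
Py1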
Now let’s do the same using an orthonormal basis created from our gram_schmidt function
In [5]: U = gram_schmidt(X)
U
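Projecting with this basis uses the formula Py = UU′y; a minimal sketch:

Py2 = U @ U.T @ y
Py2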
This is the same answer. So far so good. Finally, let’s try the same thing but with the basis
obtained via QR decomposition:
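A minimal sketch of the QR route (using scipy.linalg.qr in economic mode):

from scipy.linalg import qr

Q, R = qr(X, mode='economic')
Py3 = Q @ Q.T @ y
Py3

All three approaches deliver the same projection of y onto the column space of X.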
Chapter 21

LLN and CLT

21.1 Contents
• Overview 21.2
• Relationships 21.3
• LLN 21.4
• CLT 21.5
• Exercises 21.6
• Solutions 21.7
21.2 Overview
This lecture illustrates two of the most important theorems of probability and statistics: The
law of large numbers (LLN) and the central limit theorem (CLT).
These beautiful theorems lie behind many of the most fundamental results in econometrics
and quantitative economic modeling.
The lecture is based around simulations that show the LLN and CLT in action.
We also demonstrate how the LLN and CLT break down when the assumptions they are
based on do not hold.
In addition, we examine several useful extensions of the classical theorems, such as
• The delta method, for smooth functions of random variables.
• The multivariate case.
Some of these extensions are presented as exercises.
We’ll need the following imports:
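Imports along these lines cover the code below (the exact list in the source is not shown, so treat this as an assumption):

import random
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t, beta, lognorm, expon, gamma, uniform, cauchy
from scipy.stats import gaussian_kde, poisson, binom, norm, chi2
from scipy.linalg import inv, sqrtm
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.collections import PolyCollection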
21.3 Relationships
21.4 LLN
We begin with the law of large numbers, which tells us when sample averages will converge to
their population means.
The classical law of large numbers concerns independent and identically distributed (IID)
random variables.
Here is the strongest version of the classical LLN, known as Kolmogorov’s strong law.
Let 𝑋1 , … , 𝑋𝑛 be independent and identically distributed scalar random variables, with com-
mon distribution 𝐹 .
When it exists, let 𝜇 denote the common mean of this sample:
𝜇 ∶= 𝔼𝑋 = ∫ 𝑥𝐹 (𝑑𝑥)
In addition, let
X̄_n := (1/n) Σ_{i=1}^{n} X_i

denote the sample mean. Kolmogorov's strong law states that, if 𝔼|X| is finite, then

ℙ {X̄_n → μ as n → ∞} = 1    (1)
21.4.2 Proof
The proof of Kolmogorov’s strong law is nontrivial – see, for example, theorem 8.3.5 of [49].
On the other hand, we can prove a weaker version of the LLN very easily and still get most of
the intuition.
The version we prove is as follows: If X_1, …, X_n is IID with 𝔼X_i² < ∞, then, for any ϵ > 0, we have

ℙ {|X̄_n − μ| ≥ ϵ} → 0    as n → ∞    (2)

(This version is weaker because we claim only convergence in probability rather than almost sure convergence, and assume a finite second moment)
To see that this is so, fix 𝜖 > 0, and let 𝜎2 be the variance of each 𝑋𝑖 .
Recall the Chebyshev inequality, which tells us that
ℙ {|X̄_n − μ| ≥ ϵ} ≤ 𝔼[(X̄_n − μ)²] / ϵ²    (3)
Now observe that

𝔼[(X̄_n − μ)²] = 𝔼[((1/n) Σ_{i=1}^{n} (X_i − μ))²]
             = (1/n²) Σ_{i=1}^{n} Σ_{j=1}^{n} 𝔼(X_i − μ)(X_j − μ)
             = (1/n²) Σ_{i=1}^{n} 𝔼(X_i − μ)²
             = σ²/n
Here the crucial step is at the third equality, which follows from independence.
Independence means that if 𝑖 ≠ 𝑗, then the covariance term 𝔼(𝑋𝑖 − 𝜇)(𝑋𝑗 − 𝜇) drops out.
As a result, 𝑛2 − 𝑛 terms vanish, leading us to a final expression that goes to zero in 𝑛.
Combining our last result with (3), we come to the estimate

ℙ {|X̄_n − μ| ≥ ϵ} ≤ σ² / (nϵ²)    (4)

The claim in (2) is now clear, since the right-hand side goes to zero as n → ∞.
Of course, if the sequence X_1, …, X_n is correlated, then the cross-product terms 𝔼(X_i − μ)(X_j − μ) need not vanish, and the argument can fail — although averaging can still be effective if those correlations die out quickly enough as |i − j| grows. This idea is very important in time series analysis, and we'll come across it again soon enough.
21.4.3 Illustration
Let’s now illustrate the classical IID law of large numbers using simulation.
In particular, we aim to generate some sequences of IID random variables and plot the evolu-
tion of 𝑋̄ 𝑛 as 𝑛 increases.
Below is a figure that does just this (as usual, you can click on it to expand it).
It shows IID observations from three different distributions and plots 𝑋̄ 𝑛 against 𝑛 in each
case.
The dots represent the underlying observations 𝑋𝑖 for 𝑖 = 1, … , 100.
In each of the three cases, convergence of 𝑋̄ 𝑛 to 𝜇 occurs as predicted
In [2]: n = 100

        # An arbitrary collection of distributions from scipy.stats
        distributions = {"student's t with 10 degrees of freedom": t(10),
                         "β(2, 2)": beta(2, 2),
                         "lognormal LN(0, 1/2)": lognorm(0.5),
                         "γ(5, 1/2)": gamma(5, scale=0.5),
                         "poisson(4)": poisson(4),
                         "exponential with λ = 1": expon(1)}

        fig, axes = plt.subplots(3, 1, figsize=(10, 20))
        plt.subplots_adjust(hspace=0.5)

        for ax in axes:
            # Choose a randomly selected distribution
            name = random.choice(list(distributions.keys()))
            distribution = distributions.pop(name)

            # Generate n draws and the sample mean at each n
            data = distribution.rvs(n)
            sample_mean = np.cumsum(data) / np.arange(1, n + 1)

            # Plot
            ax.plot(list(range(n)), data, 'o', color='grey', alpha=0.5)
            axlabel = '$\\bar X_n$ for $X_i \\sim$' + name
            ax.plot(list(range(n)), sample_mean, 'g-', lw=3, alpha=0.6, label=axlabel)
            m = distribution.mean()
            ax.plot(list(range(n)), [m] * n, 'k--', lw=1.5, label='$\\mu$')
            ax.vlines(list(range(n)), m, data, lw=0.2)
            ax.legend()

        plt.show()
The three distributions are chosen at random from a selection stored in the dictionary
distributions.
21.5 CLT
Next, we turn to the central limit theorem, which tells us about the distribution of the devia-
tion between sample averages and population means.
The central limit theorem is one of the most remarkable results in all of mathematics.
In the classical IID setting, it tells us the following:
If the sequence 𝑋1 , … , 𝑋𝑛 is IID, with common mean 𝜇 and common variance 𝜎2 ∈ (0, ∞),
then
√n (X̄_n − μ) →ᵈ N(0, σ²)    as n → ∞    (5)

Here →ᵈ N(0, σ²) indicates convergence in distribution to a centered (i.e., zero mean) normal with standard deviation σ.
21.5.2 Intuition
The striking implication of the CLT is that for any distribution with finite second moment,
the simple operation of adding independent copies always leads to a Gaussian curve.
A relatively simple proof of the central limit theorem can be obtained by working with char-
acteristic functions (see, e.g., theorem 9.5.6 of [49]).
The proof is elegant but almost anticlimactic, and it provides surprisingly little intuition.
In fact, all of the proofs of the CLT that we know are similar in this respect.
Why does adding independent copies produce a bell-shaped distribution?
Part of the answer can be obtained by investigating the addition of independent Bernoulli
random variables.
In particular, let 𝑋𝑖 be binary, with ℙ{𝑋𝑖 = 0} = ℙ{𝑋𝑖 = 1} = 0.5, and let 𝑋1 , … , 𝑋𝑛 be
independent.
Think of X_i = 1 as a "success", so that Y_n = Σ_{i=1}^{n} X_i is the number of successes in n trials.
The next figure plots the probability mass function of Y_n for n = 1, 2, 4, 8

In [3]: fig, axes = plt.subplots(2, 2, figsize=(10, 6))
        plt.subplots_adjust(hspace=0.4)
        axes = axes.flatten()
        ns = [1, 2, 4, 8]
        dom = list(range(9))

        for ax, n in zip(axes, ns):
            b = binom(n, 0.5)
            ax.bar(dom, b.pmf(dom), alpha=0.6, align='center')
            ax.set(xlim=(-0.5, 8.5), ylim=(0, 0.55), title=f'$n = {n}$')

        plt.show()
When 𝑛 = 1, the distribution is flat — one success or no successes have the same probability.
When 𝑛 = 2 we can either have 0, 1 or 2 successes.
Notice the peak in probability mass at the mid-point 𝑘 = 1.
The reason is that there are more ways to get 1 success (“fail then succeed” or “succeed then
fail”) than to get zero or two successes.
Moreover, the two trials are independent, so the outcomes “fail then succeed” and “succeed
then fail” are just as likely as the outcomes “fail then fail” and “succeed then succeed”.
(If there were positive correlation, say, then "succeed then fail" would be less likely than "succeed then succeed")
Here, already we have the essence of the CLT: addition under independence leads probability
mass to pile up in the middle and thin out at the tails.
For 𝑛 = 4 and 𝑛 = 8 we again get a peak at the “middle” value (halfway between the mini-
mum and the maximum possible value).
The intuition is the same — there are simply more ways to get these middle outcomes.
If we continue, the bell-shaped curve becomes even more pronounced.
We are witnessing the binomial approximation of the normal distribution.
21.5.3 Simulation 1
Since the CLT seems almost magical, running simulations that verify its implications is one
good way to build intuition.
To this end, we now perform the following simulation

1. Choose an arbitrary distribution F for the underlying observations X_i.
2. Generate independent draws of Y_n := √n (X̄_n − μ).
3. Use these draws to compute some measure of their distribution — such as a histogram.
4. Compare the latter to N(0, σ²).
Here’s some code that does exactly this for the exponential distribution 𝐹 (𝑥) = 1 − 𝑒−𝜆𝑥 .
(Please experiment with other choices of 𝐹 , but remember that, to conform with the condi-
tions of the CLT, the distribution must have a finite second moment)
# Set parameters
n = 250                         # Choice of n
k = 100_000                     # Number of draws of Y_n
distribution = expon(scale=2)   # Exponential distribution with rate λ = 1/2
μ, s = distribution.mean(), distribution.std()

# Draw underlying RVs. Each row contains a draw of X_1,...,X_n
data = distribution.rvs((k, n))
# Compute mean of each row, producing k draws of \bar X_n
sample_means = data.mean(axis=1)
# Generate observations of Y_n
Y = np.sqrt(n) * (sample_means - μ)

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
xmin, xmax = -3 * s, 3 * s
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, norm.pdf(xgrid, scale=s), 'k-', lw=2, label='$N(0, \\sigma^2)$')
ax.legend()
plt.show()
Notice the absence of for loops — every operation is vectorized, meaning that the major cal-
culations are all shifted to highly optimized C code.
The fit to the normal density is already tight and can be further improved by increasing n.
You can also experiment with other specifications of 𝐹 .
21.5.4 Simulation 2
Our next simulation is somewhat like the first, except that we aim to track the distribution of Y_n := √n (X̄_n − μ) as n increases.
In the simulation, we’ll be working with random variables having 𝜇 = 0.
Thus, when 𝑛 = 1, we have 𝑌1 = 𝑋1 , so the first distribution is just the distribution of the
underlying random variable.
For n = 2, the distribution of Y_2 is that of (X_1 + X_2)/√2, and so on.
What we expect is that, regardless of the distribution of the underlying random variable, the
distribution of 𝑌𝑛 will smooth out into a bell-shaped curve.
The next figure shows this process for 𝑋𝑖 ∼ 𝑓, where 𝑓 was specified as the convex combina-
tion of three different beta densities.
(Taking a convex combination is an easy way to produce an irregular shape for 𝑓)
In the figure, the closest density is that of 𝑌1 , while the furthest is that of 𝑌5
beta_dist = beta(2, 2)

def gen_x_draws(k):
    """
    Returns a flat array containing k independent draws from the
    distribution of X, the underlying random variable. This distribution
    is itself a convex combination of three beta distributions.
    """
    bdraws = beta_dist.rvs((3, k))
    # Transform rows, so each represents a different distribution
    bdraws[0, :] -= 0.5
    bdraws[1, :] += 0.6
    bdraws[2, :] -= 1.1
    # Set X[i] = bdraws[j, i], where j is a random draw from {0, 1, 2}
    js = np.random.randint(0, 3, size=k)
    X = bdraws[js, np.arange(k)]
    # Rescale, so that the random variable is zero mean
    m, sigma = X.mean(), X.std()
    return (X - m) / sigma

nmax = 5
reps = 100000
ns = list(range(1, nmax + 1))

# Form a matrix Z such that each column is reps independent draws of X
Z = np.empty((reps, nmax))
for i in range(nmax):
    Z[:, i] = gen_x_draws(reps)
# Take cumulative sum across columns
S = Z.cumsum(axis=1)
# Divide the j-th column by sqrt(j), giving draws of Y_j in column j
Y = (1 / np.sqrt(ns)) * S

# Plot
fig = plt.figure(figsize=(10, 6))
ax = fig.gca(projection='3d')

a, b = -3, 3
gs = 100
xs = np.linspace(a, b, gs)

# Build verts: one kernel density estimate for each value of n
greys = np.linspace(0.3, 0.7, nmax)
verts = []
for n in ns:
    density = gaussian_kde(Y[:, n-1])
    ys = density(xs)
    verts.append(list(zip(xs, ys)))

poly = PolyCollection(verts, facecolors=[str(g) for g in greys])
poly.set_alpha(0.85)
ax.add_collection3d(poly, zs=ns, zdir='x')
ax.set(xlim3d=(1, nmax), ylim3d=(a, b), zlim3d=(0, 0.4),
       xlabel="n", ylabel='$Y_n$', zlabel='$p(y_n)$')
plt.show()
If you run the file from the ordinary IPython shell, the figure should pop up in a window that
you can rotate with your mouse, giving different views on the density sequence.
The law of large numbers and central limit theorem work just as nicely in multidimensional
settings.
To state the results, let’s recall some elementary facts about random vectors.
A random vector X is just a sequence of 𝑘 random variables (𝑋1 , … , 𝑋𝑘 ).
Each realization of X is an element of ℝ𝑘 .
A collection of random vectors X_1, …, X_n is called independent if, given any n vectors x_1, …, x_n in ℝᵏ, we have

ℙ{X_1 ≤ x_1, …, X_n ≤ x_n} = ℙ{X_1 ≤ x_1} × ⋯ × ℙ{X_n ≤ x_n}

(The vector inequalities X ≤ x are interpreted elementwise.)
The vector of expectations is

𝔼[X] := (𝔼[X_1], 𝔼[X_2], …, 𝔼[X_k])′ = (μ_1, μ_2, …, μ_k)′ =: μ

In addition, define the vector of sample means

X̄_n := (1/n) Σ_{i=1}^{n} X_i
In this setting, the law of large numbers tells us that, when 𝔼‖X‖ is finite,

ℙ {X̄_n → μ as n → ∞} = 1    (6)

and the central limit theorem tells us that, when Σ, the variance-covariance matrix of X, is finite,

√n (X̄_n − μ) →ᵈ N(0, Σ)    as n → ∞    (7)
21.6 Exercises
21.6.1 Exercise 1

One consequence of the CLT concerns smooth functions of the sample mean: if g : ℝ → ℝ is differentiable at μ and g′(μ) ≠ 0, then, in the setting of the scalar CLT above,

√n {g(X̄_n) − g(μ)} →ᵈ N(0, g′(μ)² σ²)    as n → ∞    (8)
This theorem is used frequently in statistics to obtain the asymptotic distribution of estima-
tors — many of which can be expressed as functions of sample means.
(These kinds of results are often said to use the “delta method”)
The proof is based on a Taylor expansion of 𝑔 around the point 𝜇.
Taking the result as given, let the distribution 𝐹 of each 𝑋𝑖 be uniform on [0, 𝜋/2] and let
𝑔(𝑥) = sin(𝑥).
Derive the asymptotic distribution of √n {g(X̄_n) − g(μ)} and illustrate convergence in the same spirit as the program illustrate_clt.py discussed above.
What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
What is the source of the problem?
21.6.2 Exercise 2
Here’s a result that’s often used in developing statistical tests, and is connected to the multi-
variate central limit theorem.
If you study econometric theory, you will see this result used again and again.
Assume the setting of the multivariate CLT discussed above, so that

1. X_1, …, X_n is a sequence of IID random vectors, each taking values in ℝᵏ,
2. μ := 𝔼[X_i], and Σ is the variance-covariance matrix of X_i, and
3. the convergence

√n (X̄_n − μ) →ᵈ N(0, Σ)    (9)

is valid.
In a statistical setting, one often wants the right-hand side to be standard normal so that
confidence intervals are easily computed.
This normalization can be achieved on the basis of three observations.
First, if X is a random vector in ℝ𝑘 and A is constant and 𝑘 × 𝑘, then
Var[AX] = A Var[X]A′
Second, by the continuous mapping theorem, if Z_n →ᵈ Z in ℝᵏ and A is constant and k × k, then

A Z_n →ᵈ A Z
Third, if S is a 𝑘×𝑘 symmetric positive definite matrix, then there exists a symmetric positive
definite matrix Q, called the inverse square root of S, such that
QSQ′ = I
Applying these three observations, with Q the inverse square root of Σ, gives

Z_n := √n Q(X̄_n − μ) →ᵈ Z ∼ N(0, I)

Applying the continuous mapping theorem one more time tells us that

‖Z_n‖² →ᵈ ‖Z‖²

Given the distribution of Z (the squared norm of a k-dimensional standard normal vector is chi-squared with k degrees of freedom), we conclude that

n ‖Q(X̄_n − μ)‖² →ᵈ χ²(k)    (10)

Your second exercise is to illustrate the convergence in (10) with a simulation, taking
X_i := ( W_i, U_i + W_i )′

where

• each W_i is an IID draw from the uniform distribution on [−1, 1],
• each U_i is an IID draw from the uniform distribution on [−2, 2], and
• U_i and W_i are independent of each other.

Hints:
1. scipy.linalg.sqrtm(A) computes the square root of A. You still need to invert it.
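A quick sketch of the hint in code (the matrix S here is an arbitrary example of ours):

import numpy as np
from scipy.linalg import sqrtm, inv

S = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # any symmetric positive definite matrix
Q = inv(sqrtm(S))            # inverse square root of S
print(Q @ S @ Q)             # ≈ identity, since Q S Q' = I and Q is symmetric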
21.7 Solutions
21.7.1 Exercise 1
In [6]: """
Illustrates the delta method, a consequence of the central limit theorem.
"""
# Set parameters
n = 250
replications = 100000
distribution = uniform(loc=0, scale=(np.pi / 2))
μ, s = distribution.mean(), distribution.std()
g = np.sin
g_prime = np.cos
# Plot
asymptotic_sd = g_prime(μ) * s
fig, ax = plt.subplots(figsize=(10, 6))
xmin = -3 * g_prime(μ) * s
xmax = -xmin
ax.set_xlim(xmin, xmax)
ax.hist(error_obs, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
lb = "$N(0, g'(\mu)^2 \sigma^2)$"
ax.plot(xgrid, norm.pdf(xgrid, scale=asymptotic_sd), 'k-', lw=2, label=lb)
ax.legend()
plt.show()
What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
In this case, the mean 𝜇 of this distribution is 𝜋/2, and since 𝑔′ = cos, we have 𝑔′ (𝜇) = 0.
Hence the conditions of the delta theorem are not satisfied.
21.7.2 Exercise 2

We need to show that

√n Q(X̄_n − μ) →ᵈ N(0, I)

To do this, let

Y_n := √n (X̄_n − μ)    and    Y ∼ N(0, Σ)

By the multivariate CLT and the continuous mapping theorem, we have

Q Y_n →ᵈ Q Y

Since linear combinations of normal random variables are normal, the vector QY is also normal.

Its mean is clearly 0, and its variance-covariance matrix is

Var[QY] = Q Var[Y] Q′ = Q Σ Q′ = I

In conclusion, QY_n →ᵈ QY ∼ N(0, I), which is what we aimed to show.
# Set parameters (the replication count is an assumed value)
n = 250
replications = 50_000
dw = uniform(loc=-1, scale=2)   # W uniform on [-1, 1]
du = uniform(loc=-2, scale=4)   # U uniform on [-2, 2]
Σ = np.array([[dw.var(), dw.var()],
              [dw.var(), dw.var() + du.var()]])

# Compute Σ^{-1/2}
Q = inv(sqrtm(Σ))

# Generate draws of Y_n := sqrt(n) X̄_n (here μ = 0)
W = dw.rvs((replications, n))
U = du.rvs((replications, n))
Y = np.sqrt(n) * np.stack((W.mean(axis=1), (U + W).mean(axis=1)))

# Premultiply by Q and take the squared norm
chisq_obs = np.sum((Q @ Y)**2, axis=0)

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
xmax = 8
ax.set_xlim(0, xmax)
xgrid = np.linspace(0, xmax, 200)
lb = "Chi-squared with 2 degrees of freedom"
ax.plot(xgrid, chi2.pdf(xgrid, 2), 'k-', lw=2, label=lb)
ax.legend()
ax.hist(chisq_obs, bins=50, density=True)
plt.show()
Chapter 22
Heavy-Tailed Distributions
22.1 Contents
• Overview 22.2
• Visual Comparisons 22.3
• Failure of the LLN 22.4
• Classifying Tail Properties 22.5
• Exercises 22.6
• Solutions 22.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
22.2 Overview
Most commonly used probability distributions in classical statistics and the natural sciences
have either bounded support or light tails.
When a distribution is light-tailed, extreme observations are rare and draws tend not to devi-
ate too much from the mean.
Having internalized these kinds of distributions, many researchers and practitioners use rules
of thumb such as “outcomes more than four or five standard deviations from the mean can
safely be ignored.”
However, some distributions encountered in economics have far more probability mass in the
tails than distributions like the normal distribution.
With such heavy-tailed distributions, what would be regarded as extreme outcomes for
someone accustomed to thin tailed distributions occur relatively frequently.
Examples of heavy-tailed distributions observed in economic and financial settings include
• the income distribution and the wealth distribution (see, e.g., [159], [17]),
• the firm size distribution ([13], [59]),
• the distribution of returns on holding assets over short time horizons ([112], [128]), and
• the distribution of city sizes ([135], [59]).
These heavy tails turn out to be important for our understanding of economic outcomes.
As one example, the heaviness of the tail in the wealth distribution is one natural measure of
inequality.
It matters for taxation and redistribution policies, as well as for flow-on effects for productiv-
ity growth, business cycles, and political economy
• see, e.g., [4], [63], [24] or [5].
This lecture formalizes some of the concepts introduced above and reviews the key ideas.
Let’s start with some imports:
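Imports along these lines cover the code below (the exact list in the source is not shown, so treat this as an assumption):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import cauchy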
The following two lines can be added to avoid an annoying FutureWarning, and prevent a
specific compatibility issue between pandas and matplotlib from causing problems down the
line:
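A sketch of those lines (this is the standard pandas recipe for the matplotlib compatibility issue; the exact form in the source is an assumption):

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()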
22.3 Visual Comparisons

One way to build intuition on the difference between light and heavy tails is to plot independent draws and compare them side-by-side.
22.3.1 A Simulation
The figure below shows a simulation. (You will be asked to replicate it in the exercises.)
The top two subfigures each show 120 independent draws from the normal distribution, which
is light-tailed.
The bottom subfigure shows 120 independent draws from the Cauchy distribution, which is heavy-tailed.
In the top subfigure, the standard deviation of the normal distribution is 2, and the draws are
clustered around the mean.
In the middle subfigure, the standard deviation is increased to 12 and, as expected, the
amount of dispersion rises.
The bottom subfigure, with the Cauchy draws, shows a different pattern: tight clustering
around the mean for the great majority of observations, combined with a few sudden large
deviations from the mean.
This is typical of a heavy-tailed distribution.
The same pattern shows up in real financial data. The code below uses the yfinance package to pull daily prices for a single stock; the ticker and date range are assumptions, chosen to be consistent with the 1217 observations reported below:

import yfinance as yf

s = yf.download('AMZN', '2015-1-1', '2019-11-1')['Adj Close']

r = s.pct_change()

fig, ax = plt.subplots()
ax.plot(r, linestyle='', marker='o', alpha=0.5, ms=4)
ax.vlines(r.index, 0, r.values, lw=0.2)
ax.set_ylabel('returns', fontsize=12)
ax.set_xlabel('date', fontsize=12)
plt.show()
[*********************100%***********************] 1 of 1 completed
Five of the 1217 observations are more than 5 standard deviations from the mean.
Overall, the figure is suggestive of heavy tails, although not to the same degree as the Cauchy distribution in the figure above.

If, however, one takes tick-by-tick data rather than daily data, the heavy-tailedness of the distribution increases further.
One impact of heavy tails is that sample averages can be poor estimators of the underlying
mean of the distribution.
To understand this point better, recall our earlier discussion of the Law of Large Numbers,
which considered IID 𝑋1 , … , 𝑋𝑛 with common distribution 𝐹
If 𝔼|X_i| is finite, then the sample mean X̄_n := (1/n) Σ_{i=1}^{n} X_i satisfies

ℙ {X̄_n → μ as n → ∞} = 1    (1)

where μ is the common mean of the sample.
np.random.seed(1234)
N = 1_000

distribution = cauchy()

fig, ax = plt.subplots()
data = distribution.rvs(N)

# Compute the sample mean at each n
sample_mean = np.cumsum(data) / np.arange(1, N + 1)

# Plot
ax.plot(range(N), sample_mean, alpha=0.6, label='$\\bar X_n$')
ax.plot(range(N), np.zeros(N), 'k--', lw=0.5)
ax.legend()
plt.show()
The failure of the LLN here can be understood through the characteristic function φ(t) := 𝔼e^{itX}. Using independence of the draws, the characteristic function of the sample mean satisfies

𝔼 e^{it X̄_n} = 𝔼 exp{i (t/n) Σ_{j=1}^{n} X_j}
            = 𝔼 ∏_{j=1}^{n} exp{i (t/n) X_j}
            = ∏_{j=1}^{n} 𝔼 exp{i (t/n) X_j} = [φ(t/n)]ⁿ

In the Cauchy case φ(t) = e^{−|t|}, so [φ(t/n)]ⁿ = e^{−|t|} for every n: the sample mean has the same distribution as a single draw, and it never settles down.
To keep our discussion precise, we need some definitions concerning tail properties.
We will focus our attention on the right hand tails of nonnegative random variables and their
distributions.
The definitions for left hand tails are very similar and we omit them to simplify the exposi-
tion.
A distribution F on ℝ₊ is called heavy-tailed if its moment generating function is infinite everywhere on (0, ∞); that is, if

∫_0^∞ exp(tx) F(dx) = ∞    for all t > 0.    (3)
One specific class of heavy-tailed distributions has been found repeatedly in economic and
social phenomena: the class of so-called power laws.
Specifically, given α > 0, a nonnegative random variable X is said to have a Pareto tail with tail index α if

lim_{x→∞} x^α ℙ{X > x} = c    for some c > 0.    (4)

Evidently (4) implies the existence of positive constants b and x̄ such that ℙ{X > x} ≥ b x^{−α} whenever x ≥ x̄.
The implication is that ℙ{𝑋 > 𝑥} converges to zero no faster than 𝑥−𝛼 .
In some sources, a random variable obeying (4) is said to have a power law tail.
The primary example is the Pareto distribution, which has distribution function

F(x) = 1 − (x̄/x)^α    if x ≥ x̄
F(x) = 0               if x < x̄    (5)

for some positive constants x̄ and α.
One graphical technique for investigating Pareto tails and power laws is the so-called rank-size plot.

This kind of figure plots log size against log rank of the population (i.e., location in the population when sorted from smallest to largest).
Often just the largest 5 or 10% of observations are plotted.
For a sufficiently large number of draws from a Pareto distribution, the plot generates a
straight line. For distributions with thinner tails, the data points are concave.
A discussion of why this occurs can be found in [122].
The figure below provides one example, using simulated data.
The rank-size plot shows draws from three different distributions: folded normal, chi-squared with 1 degree of freedom and Pareto.
The Pareto sample produces a straight line, while the lines produced by the other samples are
concave.
22.6 Exercises
22.6.1 Exercise 1
Replicate the figure presented above that compares normal and Cauchy draws.
Use np.random.seed(11) to set the seed.
22.6.2 Exercise 2
Prove: If 𝑋 has a Pareto tail with tail index 𝛼, then 𝔼[𝑋 𝑟 ] = ∞ for all 𝑟 ≥ 𝛼.
22.6.3 Exercise 3
Repeat exercise 1, but replace the three distributions (two normal, one Cauchy) with three
Pareto distributions using different choices of 𝛼.
For 𝛼, try 1.15, 1.5 and 1.75.
Use np.random.seed(11) to set the seed.
22.6.4 Exercise 4

Replicate the rank-size plot figure presented above, using samples of the absolute values of standard normal draws, lognormal draws, and draws from a Pareto distribution with tail index 1. Use np.random.seed(13) to set the seed.
22.6.5 Exercise 5
There is an ongoing argument about whether the firm size distribution should be modeled as
a Pareto distribution or a lognormal distribution (see, e.g., [58], [98] or [146]).
This sounds esoteric but has real implications for a variety of economic phenomena.
To illustrate this fact in a simple way, let us consider an economy with 100,000 firms, an in-
terest rate of r = 0.05 and a corporate tax rate of 15%.
Your task is to estimate the present discounted value of projected corporate tax revenue over
the next 10 years.
Because we are forecasting, we need a model.
We will suppose that

1. the number of firms and the firm size distribution (measured in profits) remain fixed, and
2. the firm size distribution is either lognormal or Pareto.

Present discounted value of tax revenue will be estimated by

1. generating 100,000 draws of firm profit from the firm size distribution,
2. multiplying by the tax rate, and
3. summing the results with discounting to obtain present value.
The Pareto distribution is assumed to take the form (5) with x̄ = 1 and α = 1.05.

(The value of the tail index α is plausible given the data [59].)
To make the lognormal option as similar as possible to the Pareto option, choose its parame-
ters such that the mean and median of both distributions are the same.
Note that, for each distribution, your estimate of tax revenue will be random because it is
based on a finite number of draws.
To take this into account, generate 100 replications (evaluations of tax revenue) for each of
the two distributions and compare the two samples by
• producing a violin plot visualizing the two samples side-by-side and
• printing the mean and standard deviation of both samples.
For the seed use np.random.seed(1234).
What differences do you observe?
(Note: a better approach to this problem would be to model firm dynamics and try to track
individual firms given the current distribution. We will discuss firm dynamics in later lec-
tures.)
22.7 Solutions
22.7.1 Exercise 1
In [6]: n = 120
        np.random.seed(11)

        fig, axes = plt.subplots(3, 1, figsize=(6, 12))

        for ax in axes:
            ax.set_ylim((-120, 120))

        s_vals = 2, 12

        for ax, s in zip(axes[:2], s_vals):
            data = np.random.randn(n) * s
            ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
            ax.vlines(list(range(n)), 0, data, lw=0.2)
            ax.set_title(f"draws from $N(0, \\sigma^2)$ with $\\sigma = {s}$", fontsize=11)

        ax = axes[2]
        distribution = cauchy()
        data = distribution.rvs(n)
        ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
        ax.vlines(list(range(n)), 0, data, lw=0.2)
        ax.set_title(f"draws from the Cauchy distribution", fontsize=11)

        plt.subplots_adjust(hspace=0.25)
        plt.show()
22.7.2 Exercise 2
Let 𝑋 have a Pareto tail with tail index 𝛼 and let 𝐹 be its cdf.
Fix 𝑟 ≥ 𝛼.
As discussed after (4), we can take positive constants b and x̄ such that

ℙ{X > x} ≥ b x^{−α}    whenever x ≥ x̄

But then

𝔼X^r = r ∫_0^∞ x^{r−1} ℙ{X > x} dx ≥ r ∫_0^{x̄} x^{r−1} ℙ{X > x} dx + r ∫_{x̄}^∞ x^{r−1} b x^{−α} dx

We know that ∫_{x̄}^∞ x^{r−α−1} dx = ∞ whenever r − α − 1 ≥ −1.

Since r ≥ α, we have 𝔼X^r = ∞.
22.7.3 Exercise 3

In [7]: from scipy.stats import pareto

        np.random.seed(11)

        n = 120
        alphas = [1.15, 1.50, 1.75]

        fig, axes = plt.subplots(3, 1, figsize=(6, 8))

        for (a, ax) in zip(alphas, axes):
            ax.set_ylim((-5, 50))
            # Pareto draws with tail index a (scipy's shape parameter is b)
            data = pareto.rvs(size=n, scale=1, b=a)
            ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5, ms=4)
            ax.vlines(list(range(n)), 0, data, lw=0.2)
            ax.set_title(f"draws from the Pareto distribution with $\\alpha = {a}$", fontsize=11)

        plt.subplots_adjust(hspace=0.4)
        plt.show()
22.7.4 Exercise 4

In [8]: sample_size = 1000
        np.random.seed(13)
        z = np.random.randn(sample_size)

        data_1 = np.abs(z)
        data_2 = np.exp(z)
        data_3 = np.exp(np.random.exponential(scale=1.0, size=sample_size))

        fig, axes = plt.subplots(3, 1, figsize=(6, 8))
        labels = ('$|z|$', '$\\exp(z)$', 'Pareto with tail index $1.0$')

        for data, label, ax in zip((data_1, data_2, data_3), labels, axes):
            # Sort from largest to smallest; plot log rank against log size
            size = np.sort(data)[::-1]
            rank = np.arange(1, sample_size + 1)
            ax.loglog(rank, size, 'o', ms=3, alpha=0.5, label=label)
            ax.set_xlabel("log rank")
            ax.set_ylabel("log size")
            ax.legend()

        fig.subplots_adjust(hspace=0.4)
        plt.show()
22.7.5 Exercise 5
To do the exercise, we need to choose the parameters 𝜇 and 𝜎 of the lognormal distribution to
match the mean and median of the Pareto distribution.
Here we understand the lognormal distribution as that of the random variable exp(𝜇 + 𝜎𝑍)
when 𝑍 is standard normal.
The mean and median of the Pareto distribution (5) with x̄ = 1 are

mean = α/(α − 1)    and    median = 2^{1/α}

Using the corresponding expressions for the lognormal distribution leads us to the equations

α/(α − 1) = exp(μ + σ²/2)    and    2^{1/α} = exp(μ)
which we solve for 𝜇 and 𝜎 given 𝛼 = 1.05
Here is code that generates the two samples, produces the violin plot and prints the mean
and standard deviation of the two samples.
# Parameters from the exercise statement
num_firms = 100_000
num_years = 10
tax_rate = 0.15
r = 0.05

β = 1 / (1 + r)  # discount factor
x_bar = 1.0
α = 1.05

def pareto_rvs(n):
    "Uses a standard method to generate Pareto draws."
    u = np.random.uniform(size=n)
    y = x_bar / (u**(1/α))
    return y
In [10]: μ = np.log(2) / α
σ_sq = 2 * (np.log(α/(α - 1)) - np.log(2)/α)
σ = np.sqrt(σ_sq)
Here’s a function to compute a single estimate of tax revenue for a particular choice of distri-
bution dist.
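Here is one way to write such a function, consistent with the parameters above (a sketch of ours, not necessarily the original implementation):

def tax_rev(dist):
    # Total tax raised over num_years, discounted to present value
    tax_raised = 0
    for t in range(num_years):
        if dist == 'pareto':
            π = pareto_rvs(num_firms)
        else:
            π = np.exp(μ + σ * np.random.randn(num_firms))
        tax_raised += β**t * np.sum(π * tax_rate)
    return tax_raised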
num_reps = 100   # number of replications, per the exercise
np.random.seed(1234)

tax_rev_lognorm = np.empty(num_reps)
tax_rev_pareto = np.empty(num_reps)

for i in range(num_reps):
    tax_rev_pareto[i] = tax_rev('pareto')
    tax_rev_lognorm[i] = tax_rev('lognorm')

fig, ax = plt.subplots()
data = tax_rev_pareto, tax_rev_lognorm
ax.violinplot(data)
plt.show()

print("Pareto: mean =", tax_rev_pareto.mean(), ", std =", tax_rev_pareto.std())
print("Lognormal: mean =", tax_rev_lognorm.mean(), ", std =", tax_rev_lognorm.std())
Looking at the output of the code, our main conclusion is that the Pareto assumption leads
to a lower mean and greater dispersion.
Part V
Introduction to Dynamics
Chapter 23

Dynamics in One Dimension
23.1 Contents
• Overview 23.2
• Some Definitions 23.3
• Graphical Analysis 23.4
• Exercises 23.5
• Solutions 23.6
23.2 Overview
In this lecture we give a quick introduction to discrete time dynamics in one dimension.
In one-dimensional models, the state of the system is described by a single variable.
Although most interesting dynamic models have two or more state variables, the one-
dimensional setting is a good place to learn the foundations of dynamics and build intuition.
Let’s start with some standard imports:
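A pair of imports along these lines is all the code below requires (the exact list is an assumption):

import numpy as np
import matplotlib.pyplot as plt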
23.3 Some Definitions

This section sets out the objects of interest and the kinds of properties we study.
A (time homogeneous, first order) dynamic system consists of a set S and a function g mapping S into itself, generating a sequence via the difference equation

x_{t+1} = g(x_t)    (1)

Here S is called the state space and x is called the state variable.

In the definition,
• time homogeneity means that g is the same at each time t
• first order means dependence on only one lag (i.e., earlier states such as x_{t−1} do not enter into (1)).
If x_0 ∈ S is given, then (1) recursively defines the sequence

x_0, g(x_0), g(g(x_0)), …    (2)

One simple example is the linear difference equation

x_{t+1} = a x_t + b    (3)

with fixed parameters a ≠ 1 and b. Iterating gives x_1 = a x_0 + b, then x_2 = a² x_0 + ab + b, and so on. Continuing in this way, and using our knowledge of geometric series, we find that, for any t ≥ 0,

x_t = a^t x_0 + b (1 − a^t)/(1 − a)    (4)
This is about all we need to know about the linear model.
We have an exact expression for 𝑥𝑡 for all 𝑡 and hence a full understanding of the dynamics.
Notice in particular that if |a| < 1, then, by (4), we have

x_t → b/(1 − a)    as t → ∞    (5)

regardless of x_0.
This is an example of what is called global stability, a topic we return to below.
In the linear example above, we obtained an exact analytical expression for 𝑥𝑡 in terms of ar-
bitrary 𝑡 and 𝑥0 .
This made analysis of dynamics very easy.
When models are nonlinear, however, the situation can be quite different.
For example, recall how we previously studied the law of motion for the Solow growth model, a simplified version of which is

k_{t+1} = s z k_t^α + (1 − δ) k_t    (6)

Here k is capital stock and s, z, α, δ are positive parameters with 0 < α, δ < 1.
If you try to iterate like we did in (3), you will find that the algebra gets messy quickly.
Analyzing the dynamics of this model requires a different method (see below).
23.3.4 Stability
A steady state of the difference equation 𝑥𝑡+1 = 𝑔(𝑥𝑡 ) is a point 𝑥∗ in 𝑆 such that 𝑥∗ =
𝑔(𝑥∗ ).
In other words, 𝑥∗ is a fixed point of the function 𝑔 in 𝑆.
For example, for the linear model 𝑥𝑡+1 = 𝑎𝑥𝑡 + 𝑏, you can use the definition to check that
• 𝑥∗ ∶= 𝑏/(1 − 𝑎) is a steady state whenever 𝑎 ≠ 1.
• if 𝑎 = 1 and 𝑏 = 0, then every 𝑥 ∈ ℝ is a steady state.
• if 𝑎 = 1 and 𝑏 ≠ 0, then the linear model has no steady state in ℝ.
A steady state 𝑥∗ of 𝑥𝑡+1 = 𝑔(𝑥𝑡 ) is called globally stable if, for all 𝑥0 ∈ 𝑆,
𝑥𝑡 = 𝑔𝑡 (𝑥0 ) → 𝑥∗ as 𝑡 → ∞
For example, in the linear model 𝑥𝑡+1 = 𝑎𝑥𝑡 + 𝑏 with 𝑎 ≠ 1, the steady state 𝑥∗
• is globally stable if |𝑎| < 1 and
• fails to be globally stable otherwise.
This follows directly from (4).
A steady state x* of x_{t+1} = g(x_t) is called locally stable if there exists an ϵ > 0 such that

‖x_0 − x*‖ < ϵ  ⟹  x_t = g^t(x_0) → x* as t → ∞

Obviously every globally stable steady state is also locally stable, but the converse need not hold.

23.4 Graphical Analysis

Let's look at an example: the Solow model with dynamics given in (6).
We begin with some plotting code that you can ignore at first reading.
The function of the code is to produce 45 degree diagrams and time series plots.
def subplots(fs):
    "Custom subplots with axes through the origin"
    fig, ax = plt.subplots(figsize=fs)
    # Run the left and bottom spines through the origin
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')
    return fig, ax

def plot45(g, xmin, xmax, x0, num_arrows=6, var='x'):
    xgrid = np.linspace(xmin, xmax, 200)

    fig, ax = subplots((6.5, 6))
    ax.set_xlim(xmin, xmax)
    ax.set_ylim(xmin, xmax)

    # Arrow styling (values assumed)
    hw = (xmax - xmin) * 0.01
    arrow_args = dict(fc="k", ec="k", head_width=hw, head_length=2*hw,
                      length_includes_head=True, lw=1, alpha=0.6)

    ax.plot(xgrid, g(xgrid), 'b-', lw=2, alpha=0.6, label='g')
    ax.plot(xgrid, xgrid, 'k-', lw=1, alpha=0.7, label='45')

    x = x0
    xticks = [xmin]
    xtick_labels = [xmin]

    for i in range(num_arrows):
        if i == 0:
            ax.arrow(x, 0.0, 0.0, g(x), **arrow_args)  # x, y, dx, dy
        else:
            ax.arrow(x, x, 0.0, g(x) - x, **arrow_args)
            ax.plot((x, x), (0, x), 'k', ls='dotted')
        ax.arrow(x, g(x), g(x) - x, 0, **arrow_args)

        x = g(x)
        xticks.append(x)
        xtick_labels.append(r'${}_{}$'.format(var, str(i+1)))

    ax.plot((x, x), (0, x), 'k-', ls='dotted')
    xticks.append(xmax)
    xtick_labels.append(xmax)

    ax.set_xticks(xticks)
    ax.set_yticks(xticks)
    ax.set_xticklabels(xtick_labels)
    ax.set_yticklabels(xtick_labels)

    ax.legend(loc='upper left', frameon=False)
    plt.show()
Let’s create a 45 degree diagram for the Solow model at a fixed set of parameters
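A parameterization along the following lines produces the diagram (the particular values are an assumption, chosen only for illustration):

s, z, α, δ = 0.3, 2.0, 0.3, 0.4
xmin, xmax = 0.0, 4.0
g = lambda k: s * z * k**α + (1 - δ) * k

plot45(g, xmin, xmax, 0, num_arrows=0, var='k')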
The plot shows the map g and the 45 degree line. The unique positive steady state, which solves szk^α + (1 − δ)k = k, is

k* = (sz/δ)^{1/(1−α)}
23.4.1 Trajectories
By the preceding discussion, in regions where 𝑔 lies above the 45 degree line, we know that
the trajectory is increasing.
The next figure traces out a trajectory in such a region so we can see this more clearly.
The initial condition is 𝑘0 = 0.25.
In [6]: k0 = 0.25
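# Trace the trajectory from k0 using the plot45 helper sketched above
plot45(g, xmin, xmax, k0, num_arrows=5, var='k')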
We can plot the time series of capital corresponding to the figure above as follows:
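A minimal time series helper along these lines serves the purpose (the name ts_plot and its arguments are our own sketch):

def ts_plot(g, x0, ts_length=10, var='k'):
    # Iterate the map g forward from x0 and plot the resulting path
    x = np.empty(ts_length)
    x[0] = x0
    for t in range(ts_length - 1):
        x[t+1] = g(x[t])
    fig, ax = plt.subplots()
    ax.plot(range(ts_length), x, 'bo-', alpha=0.6, lw=2)
    ax.set_xlabel('$t$', fontsize=14)
    ax.set_ylabel(f'${var}_t$', fontsize=14)
    plt.show()

ts_plot(g, k0)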
When capital stock is higher than the unique positive steady state we see that it declines:
In [9]: k0 = 2.95
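# As before, trace the path from this higher initial condition (our sketch)
plot45(g, xmin, xmax, k0, num_arrows=5, var='k')
ts_plot(g, k0)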
The Solow model is nonlinear but still generates very regular dynamics.
One model that generates irregular dynamics is the quadratic map

g(x) = 4x(1 − x),    x ∈ [0, 1]

xmin, xmax = 0, 1
g = lambda x: 4 * x * (1 - x)

x0 = 0.3
plot45(g, xmin, xmax, x0, num_arrows=0)
23.5 Exercises
23.5.1 Exercise 1

Consider again the linear model x_{t+1} = a x_t + b. When a ≠ 1, the unique steady state is b/(1 − a), and it is globally stable if |a| < 1. Use 45 degree diagrams and time series plots to illustrate global stability when (i) a = 0.5, b = 1 and (ii) a = −0.5, b = 1.
23.6 Solutions
23.6.1 Exercise 1
In [15]: a, b = 0.5, 1
xmin, xmax = -1, 3
g = lambda x: a * x + b
In [16]: x0 = -0.5
plot45(g, xmin, xmax, x0, num_arrows=5)
Here is the corresponding time series, which converges towards the steady state.
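# Using the ts_plot sketch from above
ts_plot(g, x0, var='x')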
In [18]: a, b = -0.5, 1
xmin, xmax = -1, 3
g = lambda x: a * x + b
In [19]: x0 = -0.5
plot45(g, xmin, xmax, x0, num_arrows=5)
Here is the corresponding time series, which converges towards the steady state.
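# Using the ts_plot sketch from above
ts_plot(g, x0, var='x')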
Once again we have convergence to the steady state but the nature of convergence differs.
In particular, the time series jumps from above the steady state to below it and back again.
In the current context, the series is said to exhibit damped oscillations.
Chapter 24
AR1 Processes
24.1 Contents
• Overview 24.2
• The AR(1) Model 24.3
• Stationarity and Asymptotic Stability 24.4
• Ergodicity 24.5
• Exercises 24.6
• Solutions 24.7
24.2 Overview
In this lecture we are going to study a very simple class of stochastic models called AR(1)
processes.
These simple models are used again and again in economic research to represent the dynamics
of series such as
• labor income
• dividends
• productivity, etc.
AR(1) processes can take negative values but are easily converted into positive processes
when necessary by a transformation such as exponentiation.
We are going to study AR(1) processes partly because they are useful and partly because
they help us understand important concepts.
Let’s start with some imports:
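Imports along these lines cover the code in this chapter (the exact list is an assumption):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm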
24.3 The AR(1) Model

The AR(1) model (autoregressive model of order 1) takes the form

X_{t+1} = a X_t + b + c W_{t+1}    (1)

where a, b, c are scalar-valued parameters and {W_t} is IID and standard normal. Iterating backwards from time t, equation (1) gives

X_t = a^t X_0 + b Σ_{j=0}^{t−1} a^j + c Σ_{j=0}^{t−1} a^j W_{t−j}    (2)
Equation (2) shows that 𝑋𝑡 is a well defined random variable, the value of which depends on
• the parameters,
• the initial condition 𝑋0 and
• the shocks 𝑊1 , … 𝑊𝑡 from time 𝑡 = 1 to the present.
Throughout, the symbol 𝜓𝑡 will be used to refer to the density of this random variable 𝑋𝑡 .
One of the nice things about this model is that it’s so easy to trace out the sequence of distri-
butions {𝜓𝑡 } corresponding to the time series {𝑋𝑡 }.
To see this, we first note that 𝑋𝑡 is normally distributed for each 𝑡.
This is immediate form (2), since linear combinations of independent normal random vari-
ables are normal.
Given that 𝑋𝑡 is normally distributed, we will know the full distribution 𝜓𝑡 if we can pin
down its first two moments.
Let 𝜇𝑡 and 𝑣𝑡 denote the mean and variance of 𝑋𝑡 respectively.
We can pin down these values from (2) or we can use the following recursive expressions:

μ_{t+1} = a μ_t + b    and    v_{t+1} = a² v_t + c²    (3)

These expressions are obtained from (1) by taking, respectively, the expectation and variance of both sides of the equality.
In calculating the second expression, we are using the fact that 𝑋𝑡 and 𝑊𝑡+1 are independent.
(This follows from our assumptions and (2).)
Given the dynamics in (2) and initial conditions 𝜇0 , 𝑣0 , we obtain 𝜇𝑡 , 𝑣𝑡 and hence
𝜓𝑡 = 𝑁 (𝜇𝑡 , 𝑣𝑡 )
The following code uses these facts to track the sequence of marginal distributions {ψ_t}.

The parameters are (the particular values are an assumption, consistent with the plots)

a, b, c = 0.9, 0.1, 0.5
mu, v = -3.0, 0.6  # initial conditions mu_0, v_0

sim_length = 10
grid = np.linspace(-5, 7, 120)

fig, ax = plt.subplots()

for t in range(sim_length):
    mu = a * mu + b
    v = a**2 * v + c**2
    ax.plot(grid, norm.pdf(grid, loc=mu, scale=np.sqrt(v)),
            label=f"$\\psi_{t}$",
            alpha=0.7)

ax.legend(bbox_to_anchor=[1.05, 1], loc=2, borderaxespad=1)
plt.show()
Notice that, in the figure above, the sequence {𝜓𝑡 } seems to be converging to a limiting dis-
tribution.
This is even clearer if we project forward further into the future, wrapping the loop above in a function:

def plot_density_seq(ax, mu_0=-3.0, v_0=0.6, sim_length=60):
    mu, v = mu_0, v_0
    for t in range(sim_length):
        mu = a * mu + b
        v = a**2 * v + c**2
        ax.plot(grid, norm.pdf(grid, loc=mu, scale=np.sqrt(v)), alpha=0.5)

fig, ax = plt.subplots()
plot_density_seq(ax)
plt.show()
In fact it’s easy to show that such convergence will occur, regardless of the initial condition,
whenever |𝑎| < 1.
To see this, we just have to look at the dynamics of the first two moments, as given in (3).
When |a| < 1, these sequences converge to the respective limits

μ* := b/(1 − a)    and    v* = c²/(1 − a²)    (4)
(See our lecture on one dimensional dynamics for background on deterministic convergence.)
Hence
𝜓𝑡 → 𝜓∗ = 𝑁 (𝜇∗ , 𝑣∗ ) as 𝑡 → ∞ (5)
We can confirm this is valid for the sequence above using the following code.

fig, ax = plt.subplots()
plot_density_seq(ax)

mu_star = b / (1 - a)
std_star = np.sqrt(c**2 / (1 - a**2))  # square root of v_star
psi_star = norm.pdf(grid, loc=mu_star, scale=std_star)
ax.plot(grid, psi_star, 'k-', lw=2, label="$\\psi^*$")
ax.legend()

plt.show()
A stationary distribution is a distribution that is a fixed point of the update rule for distribu-
tions.
In other words, if 𝜓𝑡 is stationary, then 𝜓𝑡+𝑗 = 𝜓𝑡 for all 𝑗 in ℕ.
A different way to put this, specialized to the current setting, is as follows: a density 𝜓 on ℝ
is stationary for the AR(1) process if
𝑋𝑡 ∼ 𝜓 ⟹ 𝑎𝑋𝑡 + 𝑏 + 𝑐𝑊𝑡+1 ∼ 𝜓
24.5 Ergodicity

Under the stability condition |a| < 1, time series averages for the AR(1) process converge to averages under the stationary distribution: for any integrable function h,

(1/m) Σ_{t=1}^{m} h(X_t) → ∫ h(x) ψ*(x) dx    as m → ∞    (6)
whenever the integral on the right hand side is finite and well defined.
Notes:
• In (6), convergence holds with probability one.
• The textbook by [117] is a classic reference on ergodicity.
For example, if we consider the identity function h(x) = x, we get

(1/m) Σ_{t=1}^{m} X_t → ∫ x ψ*(x) dx    as m → ∞
In other words, the time series sample mean converges to the mean of the stationary distribu-
tion.
As will become clear over the next few lectures, ergodicity is a very important concept for
statistics and simulation.
24.6 Exercises
24.6.1 Exercise 1

Let k be a natural number. The k-th central moment of a random variable X is defined as

M_k := 𝔼[(X − 𝔼X)^k]

When X ∼ N(μ, σ²), it is known that

M_k = 0 if k is odd, and M_k = σ^k (k − 1)!! if k is even

where (k − 1)!! is the double factorial of k − 1. According to the ergodicity result discussed above, for the AR(1) process we should then have, for any k ∈ ℕ,

(1/m) Σ_{t=1}^{m} (X_t − μ*)^k ≈ M_k

when m is large, with M_k computed using σ = std*.

Confirm this by simulation at a range of k using the default parameters from the lecture.
24.6.2 Exercise 2
Write your own version of a one dimensional kernel density estimator, which estimates a density from a sample.

Write it as a class that takes the data X and bandwidth h when initialized and provides a method f such that

f(x) = (1/(hn)) Σ_{i=1}^{n} K((x − X_i)/h)

For K, use the Gaussian kernel (K is the standard normal density). If h is not supplied, set it by Silverman's rule of thumb, h = 1.06 σ̂ n^{−1/5}, where σ̂ is the sample standard deviation.
24.6.3 Exercise 3

In the lecture we discussed the following fact: for the AR(1) process

X_{t+1} = a X_t + b + c W_{t+1}

with {W_t} IID and standard normal, if ψ_t is N(μ_t, s_t²), then ψ_{t+1} is N(a μ_t + b, a² s_t² + c²). Confirm this, at least approximately, by simulation:

1. Generate n draws of X_t from the N(μ_t, s_t²) distribution.
2. Update them all using the rule X_{t+1} = a X_t + b + c W_{t+1}.
3. Use the resulting sample of X_{t+1} values to produce a density estimate via kernel density estimation.

Try this for n = 2000 and confirm that the simulation based estimate of ψ_{t+1} does converge to the theoretical distribution.
24.7 Solutions
24.7.1 Exercise 1
from numba import njit
from scipy.special import factorial2

@njit
def sample_moments_ar1(k, m=100_000, mu_0=0.0, sigma_0=1.0, seed=1234):
    np.random.seed(seed)
    sample_sum = 0.0
    x = mu_0 + sigma_0 * np.random.randn()
    for t in range(m):
        sample_sum += (x - mu_star)**k
        x = a * x + b + c * np.random.randn()
    return sample_sum / m

def true_moments_ar1(k):
    if k % 2 == 0:
        return std_star**k * factorial2(k - 1)
    else:
        return 0

k_vals = np.arange(6) + 1
sample_moments = np.empty(len(k_vals))
true_moments = np.empty(len(k_vals))

for i, k in enumerate(k_vals):
    sample_moments[i] = sample_moments_ar1(k)
    true_moments[i] = true_moments_ar1(k)

fig, ax = plt.subplots()
ax.plot(k_vals, true_moments, label="true moments")
ax.plot(k_vals, sample_moments, label="sample moments")
ax.legend()
plt.show()
24.7.2 Exercise 2

In [8]: K = norm.pdf

        class KDE:

            def __init__(self, x_data, h=None):
                if h is None:
                    c = x_data.std()
                    n = len(x_data)
                    h = 1.06 * c * n**(-1/5)   # Silverman's rule of thumb
                self.h = h
                self.x_data = x_data

            def f(self, x):
                if np.isscalar(x):
                    return K((x - self.x_data) / self.h).mean() / self.h
                else:
                    y = np.empty(len(x))
                    for i, x_val in enumerate(x):
                        y[i] = K((x_val - self.x_data) / self.h).mean() / self.h
                    return y

        def plot_kde(ϕ, x_min=-0.2, x_max=1.2):
            x_data = ϕ.rvs(n)
            kde = KDE(x_data)
            x_grid = np.linspace(x_min, x_max, 100)
            fig, ax = plt.subplots()
            ax.plot(x_grid, kde.f(x_grid), label="estimate")
            ax.plot(x_grid, ϕ.pdf(x_grid), label="true density")
            ax.legend()
            plt.show()

In [9]: from scipy.stats import beta

        n = 500
        parameter_pairs = (2, 2), (2, 5), (0.5, 0.5)
        for α, β in parameter_pairs:
            plot_kde(beta(α, β))
We see that the kernel density estimator is effective when the underlying distribution is
smooth but less so otherwise.
24.7.3 Exercise 3
In [11]: a = 0.9
b = 0.0
c = 0.1
μ = -3
s = 0.2
In [12]: μ_next = a * μ + b
s_next = np.sqrt(a**2 * s**2 + c**2)
In [14]: ψ = norm(μ, s)
         ψ_next = norm(μ_next, s_next)

In [15]: n = 2000
         x_draws = ψ.rvs(n)
         x_draws_next = a * x_draws + b + c * np.random.randn(n)
         kde = KDE(x_draws_next)

         x_grid = np.linspace(μ_next - 1, μ_next + 1, 100)
         fig, ax = plt.subplots()
         ax.plot(x_grid, ψ_next.pdf(x_grid), label="$\\psi_{t+1}$")
         ax.plot(x_grid, kde.f(x_grid), label="estimate")
         ax.legend()
         plt.show()
The simulated distribution approximately coincides with the theoretical distribution, as pre-
dicted.
Chapter 25

Finite Markov Chains
25.1 Contents
• Overview 25.2
• Definitions 25.3
• Simulation 25.4
• Marginal Distributions 25.5
• Irreducibility and Aperiodicity 25.6
• Stationary Distributions 25.7
• Ergodicity 25.8
• Computing Expectations 25.9
• Exercises 25.10
• Solutions 25.11
In addition to what’s in Anaconda, this lecture will need the following libraries:
25.2 Overview
Markov chains are one of the most useful classes of stochastic processes, being
• simple, flexible and supported by many elegant theoretical results
• valuable for building intuition about random dynamic models
• central to quantitative modeling in their own right
You will find them in many of the workhorse models of economics and finance.
In this lecture, we review some of the theory of Markov chains.
We will also introduce some of the high-quality routines for working with Markov chains
available in QuantEcon.py.
Prerequisite knowledge is basic probability and linear algebra.
Let’s start with some standard imports:
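Imports along these lines cover the code below (the exact list is an assumption):

import quantecon as qe
import numpy as np
import matplotlib.pyplot as plt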
25.3 Definitions

25.3.1 Stochastic Matrices

A stochastic matrix (or Markov matrix) is an n × n square matrix P such that each element is nonnegative and each row sums to one.

Each row of P can be regarded as a probability mass function over n possible outcomes.

It is not difficult to check that if P is a stochastic matrix, then so is the k-th power P^k for all k ∈ ℕ.

25.3.2 Markov Chains

A Markov chain {X_t} on a finite set S is a sequence of random variables on S that have the Markov property: the conditional distribution of X_{t+1} given the entire history depends only on X_t.

In other words, knowing the current state is enough to know probabilities for future states.

In particular, the dynamics of a Markov chain are fully determined by the set of values

P(x, y) := ℙ{X_{t+1} = y | X_t = x}    (x, y ∈ S)    (1)
By construction,
• 𝑃 (𝑥, 𝑦) is the probability of going from 𝑥 to 𝑦 in one unit of time (one step)
• 𝑃 (𝑥, ⋅) is the conditional distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥
We can view 𝑃 as a stochastic matrix where
𝑃𝑖𝑗 = 𝑃 (𝑥𝑖 , 𝑥𝑗 ) 1 ≤ 𝑖, 𝑗 ≤ 𝑛
Going the other way, if we take a stochastic matrix P, we can generate a Markov chain {X_t} as follows:

1. draw X_0 from a specified marginal distribution ψ
2. for each t = 0, 1, …, draw X_{t+1} from P(X_t, ⋅)
25.3.3 Example 1
Consider a worker who, at any given time 𝑡, is either unemployed (state 0) or employed (state
1).
Suppose that, over a one month period,

1. An unemployed worker finds a job with probability α ∈ (0, 1).
2. An employed worker loses her job and becomes unemployed with probability β ∈ (0, 1).

In terms of a Markov model, we have S = {0, 1} and the stochastic matrix

P = ⎛ 1−α    α  ⎞
    ⎝  β    1−β ⎠    (3)
Once we have the values 𝛼 and 𝛽, we can address a range of questions, such as
• What is the average duration of unemployment?
• Over the long-run, what fraction of time does a worker find herself unemployed?
• Conditional on employment, what is the probability of becoming unemployed at least
once over the next 12 months?
We’ll cover such applications below.
25.3.4 Example 2

From US business cycle data, the following stochastic matrix over three growth states has been estimated:

P = ⎛ 0.971  0.029  0     ⎞
    ⎜ 0.145  0.778  0.077 ⎟
    ⎝ 0      0.508  0.492 ⎠
where
• the frequency is monthly
• the first state represents “normal growth”
• the second state represents “mild recession”
• the third state represents “severe recession”
For example, the matrix tells us that when the state is normal growth, the state will again be
normal growth next month with probability 0.97.
In general, large values on the main diagonal indicate persistence in the process {𝑋𝑡 }.
This Markov process can also be represented as a directed graph, with edges labeled by tran-
sition probabilities
25.4 Simulation
One natural way to answer questions about Markov chains is to simulate them.
(To approximate the probability of event 𝐸, we can simulate many times and count the frac-
tion of times that 𝐸 occurs).
Nice functionality for simulating Markov chains exists in QuantEcon.py.
• Efficient, bundled with lots of other useful routines for handling Markov chains.
However, it’s also a good exercise to roll our own routines — let’s do that first and then come
back to the methods in QuantEcon.py.
In these exercises, we'll take the state space to be S = {0, …, n − 1}.
To simulate a Markov chain, we need its stochastic matrix 𝑃 and a probability distribution 𝜓
for the initial state to be drawn from.
The Markov chain is then constructed as discussed above. To repeat:
2. At each subsequent time 𝑡, the new state 𝑋𝑡+1 is drawn from 𝑃 (𝑋𝑡 , ⋅).
To implement this simulation procedure, we need a method for generating draws from a dis-
crete distribution.
For this task, we’ll use random.draw from QuantEcon, which works as follows:
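A sketch of how qe.random.draw works (the particular values are illustrative):

ψ = (0.3, 0.7)           # probabilities over {0, 1}
cdf = np.cumsum(ψ)       # convert into a cumulative distribution
qe.random.draw(cdf, 5)   # generate 5 independent draws from ψ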
We’ll write our code as a function that takes the following three arguments
• A stochastic matrix P
• An initial state init
• A positive integer sample_size representing the length of the time series the function
should return
def mc_sample_path(P, init=0, sample_size=1_000):

    # set up
    P = np.asarray(P)
    X = np.empty(sample_size, dtype=int)

    # convert each row of P into a cumulative distribution
    P_dist = [np.cumsum(P[i, :]) for i in range(len(P))]

    # simulate
    X[0] = init
    for t in range(sample_size - 1):
        X[t+1] = qe.random.draw(P_dist[X[t]])

    return X
As we'll see later, for a long series drawn from P, the fraction of the sample that takes value 0 will be about 0.25.

Moreover, this is true regardless of the initial distribution from which X_0 is drawn.
The following code illustrates this
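A sketch of that illustration (the matrix P is assumed; any P with stationary probability 0.25 on state 0 works):

In [6]: P = [[0.4, 0.6],
             [0.2, 0.8]]

        X = mc_sample_path(P, init=0, sample_size=100_000)
        np.mean(X == 0)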
Out[6]: 0.25002
You can try changing the initial distribution to confirm that the output is always close to
0.25.
As discussed above, QuantEcon.py has routines for handling Markov chains, including simula-
tion.
Here’s an illustration using the same P as the preceding example
mc = qe.MarkovChain(P)
X = mc.simulate(ts_length=1_000_000)
np.mean(X == 0)
Out[7]: 0.2506
If we want to simulate with output as indices rather than state values we can use
In [13]: mc.simulate_indices(ts_length=4)
25.5 Marginal Distributions

Suppose that

1. {X_t} is a Markov chain with stochastic matrix P
2. the distribution of X_t is known to be ψ_t

What then is the distribution of X_{t+1}? In words, to get the probability of being at y tomorrow, we account for all ways this can happen and sum their probabilities.

Rewriting this statement in terms of marginal and conditional probabilities gives ψ_{t+1}(y) = Σ_{x∈S} P(x, y) ψ_t(x). There are n such equations, one for each y ∈ S. Thinking of ψ_{t+1} and ψ_t as row vectors, they are summarized by the matrix expression

ψ_{t+1} = ψ_t P    (4)
In other words, to move the distribution forward one unit of time, we postmultiply by 𝑃 .
By repeating this 𝑚 times we move forward 𝑚 steps into the future.
Hence, iterating on (4), the expression 𝜓𝑡+𝑚 = 𝜓𝑡 𝑃 𝑚 is also valid — here 𝑃 𝑚 is the 𝑚-th
power of 𝑃 .
As a special case, we see that if 𝜓0 is the initial distribution from which 𝑋0 is drawn, then
𝜓0 𝑃 𝑚 is the distribution of 𝑋𝑚 .
This is very important, so let’s repeat it
𝑋0 ∼ 𝜓 0 ⟹ 𝑋𝑚 ∼ 𝜓0 𝑃 𝑚 (5)
𝑋𝑡 ∼ 𝜓𝑡 ⟹ 𝑋𝑡+𝑚 ∼ 𝜓𝑡 𝑃 𝑚 (6)
We know that the probability of transitioning from 𝑥 to 𝑦 in one step is 𝑃 (𝑥, 𝑦).
It turns out that the probability of transitioning from 𝑥 to 𝑦 in 𝑚 steps is 𝑃 𝑚 (𝑥, 𝑦), the
(𝑥, 𝑦)-th element of the 𝑚-th power of 𝑃 .
To see why, consider again (6), but now with ψ_t putting all probability on state x:

• 1 in the x-th position and zero elsewhere

Inserting this into (6), we see that, conditional on X_t = x, the distribution of X_{t+m} is the x-th row of P^m.

In particular

ℙ{X_{t+m} = y | X_t = x} = P^m(x, y) = (x, y)-th element of P^m
Recall the stochastic matrix 𝑃 for recession and growth considered above.
Suppose that the current state is unknown — perhaps statistics are available only at the end
of the current month.
We estimate the probability that the economy is in state 𝑥 to be 𝜓(𝑥).
The probability of being in recession (either mild or severe) in 6 months time is given by the inner product

ψ P⁶ · (0, 1, 1)′
The marginal distributions we have been studying can be viewed either as probabilities or as
cross-sectional frequencies in large samples.
To illustrate, recall our model of employment/unemployment dynamics for a given worker
discussed above.
Consider a large population of workers, each of whose lifetime experience is described by the
specified dynamics, independent of one another.
Let 𝜓 be the current cross-sectional distribution over {0, 1}.
The cross-sectional distribution records the fractions of workers employed and unemployed at
a given moment.
• For example, 𝜓(0) is the unemployment rate.
What will the cross-sectional distribution be in 10 periods hence?
The answer is 𝜓𝑃 10 , where 𝑃 is the stochastic matrix in (3).
This is because each worker is updated according to 𝑃 , so 𝜓𝑃 10 represents probabilities for a
single randomly selected worker.
But when the sample is large, outcomes and probabilities are roughly equal (by the Law of
Large Numbers).
So for a very large (tending to infinite) population, 𝜓𝑃 10 also represents the fraction of work-
ers in each state.
This is exactly the cross-sectional distribution.
Irreducibility and aperiodicity are central concepts of modern Markov chain theory.
Let’s see what they’re about.
25.6.1 Irreducibility

Let P be a fixed stochastic matrix. Two states x and y are said to communicate if there exist positive integers j and k such that P^j(x, y) > 0 and P^k(y, x) > 0; the matrix P is called irreducible if all states communicate.

Consider, for example, a household that can be poor, middle class or rich, with transitions between those states given by a directed graph. We can translate this into a stochastic matrix, putting zeros where there's no edge between nodes:

P := ⎛ 0.9  0.1  0   ⎞
     ⎜ 0.4  0.4  0.2 ⎟
     ⎝ 0.1  0.1  0.8 ⎠
It’s clear from the graph that this stochastic matrix is irreducible: we can reach any state
from any other state eventually.
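We can confirm this using QuantEcon.py (a sketch; the state labels are illustrative):

In [14]: P = [[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]]

         mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))
         mc.is_irreducible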
Out[14]: True
Here’s a more pessimistic scenario, where the poor are poor forever
This stochastic matrix is not irreducible, since, for example, rich is not accessible from poor.
Let’s confirm this
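A sketch of such a matrix and the check (the particular numbers are an assumption; the key feature is the absorbing "poor" state):

In [15]: P = [[1.0, 0.0, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]]

         mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))
         mc.is_irreducible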
Out[15]: False
In [16]: mc.communication_classes
It might be clear to you already that irreducibility is going to be important in terms of long
run outcomes.
For example, poverty is a life sentence in the second graph but not the first.
We’ll come back to this a bit later.
25.6.2 Aperiodicity
Loosely speaking, a Markov chain is called periodic if it cycles in a predictable way, and aperiodic otherwise.
Here's a trivial example with three states, where the chain cycles 0 → 1 → 2 → 0 → ⋯ deterministically:

In [17]: P = [[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]]

         mc = qe.MarkovChain(P)
         mc.period
Out[17]: 3
More formally, the period of a state x is the greatest common divisor of the set of integers

D(x) := {j ≥ 1 : P^j(x, x) > 0}
A stochastic matrix is called aperiodic if the period of every state is 1, and periodic other-
wise.
For example, the stochastic matrix associated with the transition probabilities below is periodic because, for example, state a has period 2 (the chain alternates between {a, c} and {b, d}):

In [18]: P = [[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]]

         mc = qe.MarkovChain(P)
         mc.period
Out[18]: 2
In [19]: mc.is_aperiodic
Out[19]: False
25.7 Stationary Distributions

As seen in (4), we can shift probabilities forward one unit of time via postmultiplication by P.

Some distributions are invariant under this updating process: a distribution ψ* satisfying ψ* = ψ* P is called stationary (or invariant) for P.
25.7.1 Example
Recall our model of employment/unemployment dynamics for a given worker discussed above.
Assuming 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), the uniform ergodicity condition is satisfied.
Let 𝜓∗ = (𝑝, 1 − 𝑝) be the stationary distribution, so that 𝑝 corresponds to unemployment
(state 0).
Using 𝜓∗ = 𝜓∗ 𝑃 and a bit of algebra yields
𝛽
𝑝=
𝛼+𝛽
This is, in some sense, a steady state probability of unemployment — more on interpretation
below.
Not surprisingly it tends to zero as 𝛽 → 0, and to one as 𝛼 → 0.
As discussed above, a given Markov matrix P can have many stationary distributions.

That is, there can be many row vectors ψ such that ψ = ψP.

In fact if P has two distinct stationary distributions ψ_1, ψ_2 then it has infinitely many, since in this case, as you can verify,

ψ_3 := λ ψ_1 + (1 − λ) ψ_2

is stationary for every λ ∈ [0, 1].
P = [[0.4, 0.6],       # example matrix (values assumed)
     [0.2, 0.8]]

mc = qe.MarkovChain(P)
mc.stationary_distributions  # Show all stationary distributions
Part 2 of the Markov chain convergence theorem stated above tells us that the distribution of
𝑋𝑡 converges to the stationary distribution regardless of where we start off.
This adds considerable weight to our interpretation of 𝜓∗ as a stochastic steady state.
The convergence in the theorem is illustrated in the next figure
mc = qe.MarkovChain(P)
ψ_star = mc.stationary_distributions[0]
ax.scatter(ψ_star[0], ψ_star[1], ψ_star[2], c='k', s=60)
plt.show()
Here
• 𝑃 is the stochastic matrix for recession and growth considered above.
• The highest red dot is an arbitrarily chosen initial probability distribution 𝜓, repre-
sented as a vector in ℝ3 .
• The other red dots are the distributions 𝜓𝑃 𝑡 for 𝑡 = 1, 2, ….
• The black dot is 𝜓∗ .
The code for the figure can be found here — you might like to try experimenting with differ-
ent initial conditions.
25.8 Ergodicity
Under irreducibility, yet another important result obtains: for all x ∈ S,

(1/m) Σ_{t=1}^{m} 1{X_t = x} → ψ*(x)    as m → ∞    (7)
Here
• 1{𝑋𝑡 = 𝑥} = 1 if 𝑋𝑡 = 𝑥 and zero otherwise
• convergence is with probability one
• the result does not depend on the distribution (or value) of 𝑋0
The result tells us that the fraction of time the chain spends at state 𝑥 converges to 𝜓∗ (𝑥) as
time goes to infinity.
This gives us another way to interpret the stationary distribution — provided that the convergence result in (7) is valid.
25.8.1 Example

Recall our employment/unemployment model, where the stationary probability of unemployment is

p = β/(α + β)

By the ergodicity result, p is also the long-run fraction of time that a single worker spends unemployed, linking the time series and cross-sectional interpretations.

25.9 Computing Expectations

We sometimes want to compute mathematical expectations of functions of Markov chains, such as the unconditional expectation

𝔼[h(X_t)]    (8)

and the conditional expectation

𝔼[h(X_{t+k}) | X_t = x]    (9)
where
• {𝑋𝑡 } is a Markov chain generated by 𝑛 × 𝑛 stochastic matrix 𝑃
• ℎ is a given function, which, in expressions involving matrix algebra, we’ll think of as
the column vector
h = ⎛ h(x_1) ⎞
    ⎜   ⋮    ⎟
    ⎝ h(x_n) ⎠
The unconditional expectation (8) is easy: we just sum over the distribution of X_t to get

𝔼[h(X_t)] = ψ P^t h

where ψ is the distribution of X_0. For the conditional expectation (9), we need to sum over the conditional distribution of X_{t+k} given X_t = x.

We already know that this is P^k(x, ⋅), so

𝔼[h(X_{t+k}) | X_t = x] = (P^k h)(x)
Expectations of geometric sums also arise frequently. For 0 < β < 1,

𝔼 [Σ_{j=0}^{∞} β^j h(X_{t+j}) | X_t = x] = [(I − βP)^{−1} h](x)

where

(I − βP)^{−1} = I + βP + β²P² + ⋯
25.10 Exercises
25.10.1 Exercise 1
According to the discussion above, if a worker’s employment dynamics obey the stochastic
matrix
1−𝛼 𝛼
𝑃 =( )
𝛽 1−𝛽
with 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), then, in the long-run, the fraction of time spent unemployed
will be
𝛽
𝑝 ∶=
𝛼+𝛽
In other words, if {𝑋𝑡 } represents the Markov chain for employment, then 𝑋̄ 𝑚 → 𝑝 as 𝑚 →
∞, where
X̄_m := (1/m) Σ_{t=1}^{m} 1{X_t = 0}
The exercise is to illustrate this convergence by computing 𝑋̄ 𝑚 for large 𝑚 and checking that
it is close to 𝑝.
You will see that this statement is true regardless of the choice of initial condition or the val-
ues of 𝛼, 𝛽, provided both lie in (0, 1).
25.10.2 Exercise 2
The ranking 𝑟𝑗 of web page 𝑗 is defined recursively by

𝑟𝑗 = ∑_{𝑖∈𝐿𝑗} 𝑟𝑖 /ℓ𝑖
where
• ℓ𝑖 is the total number of outbound links from 𝑖
• 𝐿𝑗 is the set of all pages 𝑖 such that 𝑖 has a link to 𝑗
This is a measure of the number of inbound links, weighted by their own ranking (and nor-
malized by 1/ℓ𝑖 ).
There is, however, another interpretation, and it brings us back to Markov chains.
Let 𝑃 be the matrix given by 𝑃 (𝑖, 𝑗) = 1{𝑖 → 𝑗}/ℓ𝑖 where 1{𝑖 → 𝑗} = 1 if 𝑖 has a link to 𝑗
and zero otherwise.
The matrix 𝑃 is a stochastic matrix provided that each page has at least one link.
With this definition of 𝑃 we have
𝑟𝑗 = ∑_{𝑖∈𝐿𝑗} 𝑟𝑖 /ℓ𝑖 = ∑_{all 𝑖} 1{𝑖 → 𝑗} 𝑟𝑖 /ℓ𝑖 = ∑_{all 𝑖} 𝑃 (𝑖, 𝑗) 𝑟𝑖
Writing 𝑟 for the row vector of rankings, this becomes 𝑟 = 𝑟𝑃 , so 𝑟 is a stationary distribution of 𝑃 .
Thinking of 𝑃 (𝑖, 𝑗) as the probability of moving from page 𝑖 to page 𝑗, the rank 𝑟𝑗 can be interpreted as the long-run fraction of time that a randomly surfing user spends at page 𝑗.
Your exercise is to apply this ranking algorithm to the graph pictured above and return the
list of pages ordered by rank.
The data for this graph is in the web_graph_data.txt file — you can also view it here.
There is a total of 14 nodes (i.e., web pages), the first named a and the last named n.
A typical line from the file has the form
d -> h;
To parse lines of this form you can use regular expressions:

In [23]: import re
         re.findall(r'\w', 'x +++ y ****** z')   # \w matches alphanumerics

Out[23]: ['x', 'y', 'z']
When you solve for the ranking, you will find that the highest ranked node is in fact g, while
the lowest is a.
25.10.3 Exercise 3
Consider the stochastic AR(1) process 𝑦𝑡+1 = 𝜌𝑦𝑡 + 𝑢𝑡+1 , where |𝜌| < 1 and {𝑢𝑡 } is IID and 𝑁 (0, 𝜎𝑢2 ).
Its stationary variance is

𝜎𝑦2 ∶= 𝜎𝑢2 / (1 − 𝜌2 )
Tauchen’s method [156] is the most common method for approximating this continuous state
process with a finite state Markov chain.
A routine for this already exists in QuantEcon.py but let’s write our own version as an exer-
cise.
As a first step, we choose 𝑛, the number of states for the discrete approximation, and 𝑚, an integer that parameterizes the width of the state space.
Next, we create an equispaced grid {𝑥0 , … , 𝑥𝑛−1 } ⊂ ℝ with 𝑥0 = −𝑚𝜎𝑦 and 𝑥𝑛−1 = 𝑚𝜎𝑦 , and let 𝑠 = 𝑥𝑖+1 − 𝑥𝑖 denote the gap between grid points.
Writing 𝐹 for the cdf of 𝑁 (0, 𝜎𝑢2 ), the transition probabilities 𝑃 (𝑥𝑖 , 𝑥𝑗 ) are computed as follows:
1. If 𝑗 = 0, then set 𝑃 (𝑥𝑖 , 𝑥𝑗 ) = 𝐹 (𝑥0 − 𝜌𝑥𝑖 + 𝑠/2)
2. If 𝑗 = 𝑛 − 1, then set 𝑃 (𝑥𝑖 , 𝑥𝑗 ) = 1 − 𝐹 (𝑥𝑛−1 − 𝜌𝑥𝑖 − 𝑠/2)
3. Otherwise, set 𝑃 (𝑥𝑖 , 𝑥𝑗 ) = 𝐹 (𝑥𝑗 − 𝜌𝑥𝑖 + 𝑠/2) − 𝐹 (𝑥𝑗 − 𝜌𝑥𝑖 − 𝑠/2)
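Here is a minimal sketch of these rules, written as a stand-alone function rather than the QuantEcon.py routine (the grid construction and the default parameter values are assumptions):

import numpy as np
from scipy.stats import norm

def tauchen_sketch(ρ, σ_u, m=3, n=7):
    "Approximate y' = ρ y + u, u ~ N(0, σ_u²), by an n-state Markov chain."
    σ_y = σ_u / np.sqrt(1 - ρ**2)
    x = np.linspace(-m * σ_y, m * σ_y, n)    # equispaced grid
    s = x[1] - x[0]                          # gap between grid points
    F = norm(scale=σ_u).cdf                  # cdf of N(0, σ_u²)
    P = np.empty((n, n))
    for i in range(n):
        P[i, 0] = F(x[0] - ρ * x[i] + s / 2)
        P[i, n-1] = 1 - F(x[n-1] - ρ * x[i] - s / 2)
        for j in range(1, n-1):
            P[i, j] = F(x[j] - ρ * x[i] + s / 2) - F(x[j] - ρ * x[i] - s / 2)
    return x, P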
25.11 Solutions
25.11.1 Exercise 1
In [25]: α = β = 0.1
         N = 10000
         p = β / (α + β)
         P = np.array([[1 - α, α], [β, 1 - β]])
         X = qe.MarkovChain(P).simulate(N)                 # state 0 = unemployed
         X_bar = (X == 0).cumsum() / np.arange(1, N + 1)   # fraction of time unemployed
         fig, ax = plt.subplots()
         ax.plot(X_bar - p, label=r'$\bar X_m - p$')
         ax.legend(loc='upper right')
         plt.show()
25.11.2 Exercise 2
First, save the data into a file called web_graph_data.txt by executing the next cell

In [26]: %%writefile web_graph_data.txt
         c -> c;
c -> g;
c -> j;
c -> m;
d -> f;
d -> h;
d -> k;
e -> d;
e -> h;
e -> l;
f -> a;
f -> b;
f -> j;
f -> l;
g -> b;
g -> j;
h -> d;
h -> g;
h -> l;
h -> m;
i -> g;
i -> h;
i -> n;
j -> e;
j -> i;
j -> k;
k -> n;
l -> m;
m -> g;
n -> c;
n -> j;
n -> m;
Overwriting web_graph_data.txt
In [27]: """
Return list of pages, ordered by rank
"""
import re
from operator import itemgetter
infile = 'web_graph_data.txt'
alphabet = 'abcdefghijklmnopqrstuvwxyz'
print(f'{name}: {rank:.4}')
Rankings
***
g: 0.1607
j: 0.1594
m: 0.1195
n: 0.1088
k: 0.09106
b: 0.08326
e: 0.05312
i: 0.05312
c: 0.04834
h: 0.0456
l: 0.03202
d: 0.03056
f: 0.01164
a: 0.002911
25.11.3 Exercise 3
Chapter 26

Inventory Dynamics
26.1 Contents
• Overview 26.2
• Sample Paths 26.3
• Marginal Distributions 26.4
• Exercises 26.5
• Solutions 26.6
26.2 Overview
In this lecture we will study the time path of inventories for firms that follow so-called s-S
inventory dynamics.
Such firms wait until their inventory falls below a trigger level 𝑠 and then restock up to their capacity 𝑆.
These kinds of policies are common in practice and also optimal in certain circumstances.
A review of early literature and some macroeconomic implications can be found in [30].
Here our main aim is to learn more about simulation, time series and Markov dynamics.
While our Markov environment and many of the concepts we consider are related to those
found in our lecture on finite Markov chains, the state space is a continuum in the current
application.
Let’s start with some imports
Let 𝑋𝑡 denote the firm's inventory at time 𝑡 and write 𝑎+ ∶= max(𝑎, 0). The s-S dynamics are

𝑋𝑡+1 = (𝑆 − 𝐷𝑡+1 )+   if 𝑋𝑡 ≤ 𝑠
𝑋𝑡+1 = (𝑋𝑡 − 𝐷𝑡+1 )+   if 𝑋𝑡 > 𝑠

where 𝐷𝑡+1 is demand in period 𝑡 + 1, assumed throughout to be lognormal:

𝐷𝑡 = exp(𝜇 + 𝜎𝑍𝑡 )

where 𝜇 and 𝜎 are parameters and {𝑍𝑡 } is IID and standard normal.
Here’s a class that stores parameters and generates time paths for inventory.
In [2]: firm_data = [
            ('s', float64),          # restock trigger level
            ('S', float64),          # capacity
            ('mu', float64),         # shock location parameter
            ('sigma', float64)       # shock scale parameter
        ]

        @jitclass(firm_data)
        class Firm:

            def __init__(self, s=10, S=100, mu=1.0, sigma=0.5):
                # default parameter values are assumptions
                self.s, self.S, self.mu, self.sigma = s, S, mu, sigma

            def update(self, x):
                "Update the inventory state from x to its time t+1 value."
                Z = np.random.randn()
                D = np.exp(self.mu + self.sigma * Z)
                if x <= self.s:
                    return max(self.S - D, 0)
                else:
                    return max(x - D, 0)

            def sim_inventory_path(self, x_init, sim_length):
                "Simulate an inventory path of given length from x_init."
                X = np.empty(sim_length)
                X[0] = x_init
                for t in range(sim_length-1):
                    X[t+1] = self.update(X[t])
                return X
In [3]: firm = Firm()
        s, S = firm.s, firm.S
sim_length = 100
x_init = 50
X = firm.sim_inventory_path(x_init, sim_length)
fig, ax = plt.subplots()
bbox = (0., 1.02, 1., .102)
legend_args = {'ncol': 3,
'bbox_to_anchor': bbox,
'loc': 3,
'mode': 'expand'}
ax.plot(X, label="inventory")
ax.plot(s * np.ones(sim_length), 'k--', label="$s$")
ax.plot(S * np.ones(sim_length), 'k-', label="$S$")
ax.set_ylim(0, S+10)
ax.set_xlabel("time")
ax.legend(**legend_args)
plt.show()
Now let’s simulate multiple paths in order to build a more complete picture of the probabili-
ties of different outcomes:
In [4]: sim_length=200
fig, ax = plt.subplots()
for i in range(400):
X = firm.sim_inventory_path(x_init, sim_length)
ax.plot(X, 'b', alpha=0.2, lw=0.5)
plt.show()
In [5]: T = 50
        M = 200                        # Number of draws
        ymin, ymax = 0, S + 10

        fig, axes = plt.subplots(1, 2, figsize=(11, 6))

        for ax in axes:
            ax.grid(alpha=0.4)
ax = axes[0]
ax.set_ylim(ymin, ymax)
ax.set_ylabel('$X_t$', fontsize=16)
ax.vlines((T,), -1.5, 1.5)
ax.set_xticks((T,))
ax.set_xticklabels((r'$T$',))
sample = np.empty(M)
for m in range(M):
X = firm.sim_inventory_path(x_init, 2 * T)
ax.plot(X, 'b-', lw=1, alpha=0.5)
ax.plot((T,), (X[T+1],), 'ko', alpha=0.5)
sample[m] = X[T+1]
axes[1].set_ylim(ymin, ymax)
axes[1].hist(sample,
bins=16,
density=True,
orientation='horizontal',
histtype='bar',
alpha=0.5)
plt.show()
In [6]: T = 50
M = 50_000
fig, ax = plt.subplots()
sample = np.empty(M)
for m in range(M):
X = firm.sim_inventory_path(x_init, T+1)
sample[m] = X[T]
ax.hist(sample,
bins=36,
density=True,
histtype='bar',
alpha=0.75)
plt.show()
The allocation of probability mass is similar to what was shown by the histogram just above.
26.5 Exercises
26.5.1 Exercise 1
26.5.2 Exercise 2
Using simulation, calculate the probability that firms that start with 𝑋0 = 70 need to order
twice or more in the first 50 periods.
You will need a large sample size to get an accurate reading.
26.6 Solutions
26.6.1 Exercise 1
In [8]: s, S, mu, sigma = firm.s, firm.S, firm.mu, firm.sigma   # globals for the jitted function

        @njit(parallel=True)
def shift_firms_forward(current_inventory_levels, num_periods):
num_firms = len(current_inventory_levels)
new_inventory_levels = np.empty(num_firms)
for f in prange(num_firms):
x = current_inventory_levels[f]
for t in range(num_periods):
Z = np.random.randn()
D = np.exp(mu + sigma * Z)
if x <= s:
x = max(S - D, 0)
else:
x = max(x - D, 0)
new_inventory_levels[f] = x
return new_inventory_levels
In [10]: x_init = 50
         num_firms = 50_000
         sample_dates = 0, 10, 50, 250, 500, 750   # reporting dates (values assumed)
         first_diffs = np.diff(sample_dates)
         # plot_kde (used below) is a kernel-density plotting helper
         # defined earlier in the lecture
fig, ax = plt.subplots()
X = np.ones(num_firms) * x_init
current_date = 0
for d in first_diffs:
X = shift_firms_forward(X, d)
current_date += d
plot_kde(X, ax, label=f't = {current_date}')
ax.set_xlabel('inventory')
ax.set_ylabel('probability')
ax.legend()
plt.show()
26.6.2 Exercise 2
In [11]: @njit(parallel=True)
         def compute_freq(sim_length=50, x_init=70, num_firms=1_000_000):
             # uses the globals s, S, mu, sigma defined above
             firm_counter = 0                  # firms that restock 2x or more
             for m in prange(num_firms):
                 x = x_init
                 restock_counter = 0           # number of restocks for firm m

                 for t in range(sim_length):
                     Z = np.random.randn()
                     D = np.exp(mu + sigma * Z)
                     if x <= s:
                         x = max(S - D, 0)
                         restock_counter += 1
                     else:
                         x = max(x - D, 0)

                 if restock_counter > 1:
                     firm_counter += 1

             return firm_counter / num_firms
Note the time the routine takes to run, as well as the output.
In [12]: %%time
freq = compute_freq()
print(f"Frequency of at least two stock outs = {freq}")
Try switching the parallel flag to False in the jitted function above.
Depending on your system, the difference can be substantial.
(On our desktop machine, the speed up is by a factor of 5.)
Chapter 27

Linear State Space Models
27.1 Contents
• Overview 27.2
• The Linear State Space Model 27.3
• Distributions and Moments 27.4
• Stationarity and Ergodicity 27.5
• Noisy Observations 27.6
• Prediction 27.7
• Code 27.8
• Exercises 27.9
• Solutions 27.10
“We may regard the present state of the universe as the effect of its past and the
cause of its future” – Marquis de Laplace
In addition to what’s in Anaconda, this lecture will need the following libraries:
27.2 Overview
27.3 The Linear State Space Model

The objects in play are an 𝑛 × 1 state vector 𝑥𝑡 , a 𝑘 × 1 observation vector 𝑦𝑡 , and the linear state space system

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1
𝑦𝑡 = 𝐺𝑥𝑡   (1)

where {𝑤𝑡 } is IID 𝑁 (0, 𝐼) and 𝑥0 ∼ 𝑁 (𝜇0 , Σ0 ).

27.3.1 Primitives

The primitives of the model are

1. the matrices 𝐴, 𝐶, 𝐺
2. the shock distribution, which we have specified as 𝑁 (0, 𝐼)
3. the distribution of the initial condition 𝑥0 , which we have set to 𝑁 (𝜇0 , Σ0 )
Given 𝐴, 𝐶, 𝐺 and draws of 𝑥0 and 𝑤1 , 𝑤2 , …, the model (1) pins down the values of the se-
quences {𝑥𝑡 } and {𝑦𝑡 }.
Even without these draws, the primitives 1–3 pin down the probability distributions of {𝑥𝑡 }
and {𝑦𝑡 }.
Later we’ll see how to compute these distributions and their moments.
We’ve made the common assumption that the shocks are independent standardized normal
vectors.
But some of what we say will be valid under the assumption that {𝑤𝑡+1 } is a martingale
difference sequence.
A martingale difference sequence is a sequence that is zero mean when conditioned on past
information.
In the present case, since {𝑥𝑡 } is our state sequence, this means that it satisfies

𝔼[𝑤𝑡+1 ∣ 𝑥𝑡 , 𝑥𝑡−1 , …] = 0
This is a weaker condition than that {𝑤𝑡 } is IID with 𝑤𝑡+1 ∼ 𝑁 (0, 𝐼).
27.3.2 Examples

Second-order difference equation: let {𝑦𝑡 } be a scalar process satisfying

𝑦𝑡+1 = 𝜙0 + 𝜙1 𝑦𝑡 + 𝜙2 𝑦𝑡−1   (2)

One way to map this into the form (1) is to take

𝑥𝑡 = (1, 𝑦𝑡 , 𝑦𝑡−1 )′ ,   𝐴 = [ 1   0   0
                               𝜙0  𝜙1  𝜙2
                               0   1   0 ] ,   𝐶 = (0, 0, 0)′ ,   𝐺 = [0  1  0]
You can confirm that under these definitions, (1) and (2) agree.
The next figure shows the dynamics of this process when 𝜙0 = 1.1, 𝜙1 = 0.8, 𝜙2 = −0.8, 𝑦0 = 𝑦−1 = 1.
Univariate autoregressive processes: the AR(4) model 𝑦𝑡+1 = 𝜙1 𝑦𝑡 + 𝜙2 𝑦𝑡−1 + 𝜙3 𝑦𝑡−2 + 𝜙4 𝑦𝑡−3 + 𝜎𝑤𝑡+1 can be mapped into (1) by taking 𝑥𝑡 = (𝑦𝑡 , 𝑦𝑡−1 , 𝑦𝑡−2 , 𝑦𝑡−3 )′ and

𝐴 = [ 𝜙1  𝜙2  𝜙3  𝜙4
       1   0   0   0
       0   1   0   0
       0   0   1   0 ] ,   𝐶 = (𝜎, 0, 0, 0)′ ,   𝐺 = [1  0  0  0]
The matrix 𝐴 has the form of the companion matrix to the vector [𝜙1 𝜙2 𝜙3 𝜙4 ].
The next figure shows the dynamics of this process when
Vector Autoregressions
If 𝑦𝑡 is a 𝑘 × 1 vector and the coefficients 𝜙𝑗 and 𝜎 are 𝑘 × 𝑘 matrices, the same construction works, with 𝐼 the 𝑘 × 𝑘 identity matrix:

𝑥𝑡 = (𝑦𝑡 , 𝑦𝑡−1 , 𝑦𝑡−2 , 𝑦𝑡−3 )′ ,   𝐴 = [ 𝜙1  𝜙2  𝜙3  𝜙4
                                            𝐼   0   0   0
                                            0   𝐼   0   0
                                            0   0   𝐼   0 ] ,   𝐶 = (𝜎, 0, 0, 0)′ ,   𝐺 = [𝐼  0  0  0]
Seasonals
Consider the matrix

𝐴 = [ 0  0  0  1
      1  0  0  0
      0  1  0  0
      0  0  1  0 ]

It is easy to check that 𝐴4 = 𝐼, which implies that 𝑥𝑡 is strictly periodic with period 4 [1]:

𝑥𝑡+4 = 𝑥𝑡
Such an 𝑥𝑡 process can be used to model deterministic seasonals in quarterly time series.
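As a quick numerical check of the periodicity claim, one can verify 𝐴4 = 𝐼 directly:

import numpy as np

A = np.array([[0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]])
assert np.array_equal(np.linalg.matrix_power(A, 4), np.eye(4, dtype=int))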
The indeterministic seasonal produces recurrent, but aperiodic, seasonal fluctuations.
Time Trends
The model 𝑦𝑡 = 𝑎𝑡 + 𝑏 is known as a linear time trend. It can be represented in the form (1) by taking

𝐴 = [ 1  1
      0  1 ] ,   𝐶 = (0, 0)′ ,   𝐺 = [𝑎  𝑏]   (4)

and starting at initial condition 𝑥0 = (0, 1)′ .
In fact, it’s possible to use the state-space system to represent polynomial trends of any or-
der.
For instance, let
𝑥0 = (0, 0, 1)′ ,   𝐴 = [ 1  1  0
                           0  1  1
                           0  0  1 ] ,   𝐶 = (0, 0, 0)′
It follows that
𝐴𝑡 = [ 1  𝑡  𝑡(𝑡 − 1)/2
       0  1  𝑡
       0  0  1 ]
Then 𝑥′𝑡 = [𝑡(𝑡 − 1)/2 𝑡 1], so that 𝑥𝑡 contains linear and quadratic time trends.
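The formula for 𝐴𝑡 is easy to confirm numerically, for instance:

import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
t = 7
At = np.linalg.matrix_power(A, t)
assert At[0, 1] == t and At[1, 2] == t and At[0, 2] == t * (t - 1) // 2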
Iterating backwards on (1) yields the moving average representation

𝑥𝑡 = 𝐴𝑥𝑡−1 + 𝐶𝑤𝑡
   = 𝐴2 𝑥𝑡−2 + 𝐴𝐶𝑤𝑡−1 + 𝐶𝑤𝑡
   ⋮
   = ∑_{𝑗=0}^{𝑡−1} 𝐴𝑗 𝐶𝑤𝑡−𝑗 + 𝐴𝑡 𝑥0   (5)
For example, take

𝐴 = [ 1  1
      0  1 ] ,   𝐶 = (1, 0)′

You will be able to show that 𝐴𝑡 = [ 1  𝑡
                                     0  1 ] and 𝐴𝑗 𝐶 = (1, 0)′ .
Substituting into the moving average representation (5), we obtain
𝑥1𝑡 = ∑_{𝑗=0}^{𝑡−1} 𝑤𝑡−𝑗 + [1  𝑡] 𝑥0
Using (1), it’s easy to obtain expressions for the (unconditional) means of 𝑥𝑡 and 𝑦𝑡 .
We’ll explain what unconditional and conditional mean soon.
Letting 𝜇𝑡 ∶= 𝔼[𝑥𝑡 ] and using linearity of expectations, we find that

𝜇𝑡+1 = 𝐴𝜇𝑡 ,   with 𝜇0 given   (6)

Similarly, letting Σ𝑡 ∶= Var[𝑥𝑡 ], a short calculation gives

Σ𝑡+1 = 𝐴Σ𝑡 𝐴′ + 𝐶𝐶 ′ ,   with Σ0 given   (7)
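In code, the moment recursions (6) and (7) can be iterated directly; a minimal sketch, with matrices and initial conditions chosen purely for illustration:

import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [1.1, 0.8, -0.8],
              [0.0, 1.0, 0.0]])
C = np.zeros((3, 1))
μ, Σ = np.ones(3), np.zeros((3, 3))     # μ_0 and Σ_0
for t in range(50):
    μ = A @ μ                           # update (6)
    Σ = A @ Σ @ A.T + C @ C.T           # update (7)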
27.4.2 Distributions
In general, knowing the mean and variance-covariance matrix of a random vector is not quite
as good as knowing the full distribution.
However, there are some situations where these moments alone tell us all we need to know.
These are situations in which the mean vector and covariance matrix are sufficient statis-
tics for the population distribution.
(Sufficient statistics form a list of objects that characterize a population distribution)
One such situation is when the vector in question is Gaussian (i.e., normally distributed).
This is the case here, given
In particular, given our Gaussian assumptions on the primitives and the linearity of (1), we
can see immediately that both 𝑥𝑡 and 𝑦𝑡 are Gaussian for all 𝑡 ≥ 0 [2].
Since 𝑥𝑡 is Gaussian, to find the distribution, all we need to do is find its mean and variance-
covariance matrix.
But in fact we’ve already done this, in (6) and (7).
Letting 𝜇𝑡 and Σ𝑡 be as defined by these equations, we have
𝑥𝑡 ∼ 𝑁 (𝜇𝑡 , Σ𝑡 ) (11)
In the right-hand figure, these values are converted into a rotated histogram that shows rela-
tive frequencies from our sample of 20 𝑦𝑇 ’s.
(The parameters and source code for the figures can be found in file lin-
ear_models/paths_and_hist.py)
Here is another figure, this time with 100 observations
Let’s now try with 500,000 observations, showing only the histogram (without rotation)
Ensemble Means
Just as the histogram approximates the population distribution, the ensemble or cross-
sectional average
𝑦𝑇̄ ∶= (1/𝐼) ∑_{𝑖=1}^{𝐼} 𝑦𝑇𝑖
approximates the expectation 𝔼[𝑦𝑇 ] = 𝐺𝜇𝑇 (as implied by the law of large numbers).
Here’s a simulation comparing the ensemble averages and population means at time points
𝑡 = 0, … , 50.
The parameters are the same as for the preceding figures, and the sample size is relatively small (𝐼 = 20).
𝑥𝑇̄ ∶= (1/𝐼) ∑_{𝑖=1}^{𝐼} 𝑥𝑇𝑖 → 𝜇𝑇   (𝐼 → ∞)

(1/𝐼) ∑_{𝑖=1}^{𝐼} (𝑥𝑇𝑖 − 𝑥𝑇̄ )(𝑥𝑇𝑖 − 𝑥𝑇̄ )′ → Σ𝑇   (𝐼 → ∞)
𝑝(𝑥0 , 𝑥1 , … , 𝑥𝑇 ) = 𝑝(𝑥0 ) ∏_{𝑡=0}^{𝑇−1} 𝑝(𝑥𝑡+1 | 𝑥𝑡 ),   where   𝑝(𝑥𝑡+1 | 𝑥𝑡 ) = 𝑁 (𝐴𝑥𝑡 , 𝐶𝐶 ′ )
Autocovariance Functions

An important object related to the joint distribution is the autocovariance function Σ𝑡+𝑗,𝑡 ∶= 𝔼[(𝑥𝑡+𝑗 − 𝜇𝑡+𝑗 )(𝑥𝑡 − 𝜇𝑡 )′ ]. Elementary calculations show that

Σ𝑡+𝑗,𝑡 = 𝐴𝑗 Σ𝑡   (14)
Notice that Σ𝑡+𝑗,𝑡 in general depends on both 𝑗, the gap between the two dates, and 𝑡, the
earlier date.
27.5 Stationarity and Ergodicity

Stationarity and ergodicity are two properties that, when they hold, greatly aid analysis of linear state space models.
Let’s start with the intuition.
Let’s look at some more time series from the same model that we analyzed above.
This picture shows cross-sectional distributions for 𝑦 at times 𝑇 , 𝑇 ′ , 𝑇 ″
Note how the time series “settle down” in the sense that the distributions at 𝑇 ′ and 𝑇 ″ are
relatively similar to each other — but unlike the distribution at 𝑇 .
Apparently, the distributions of 𝑦𝑡 converge to a fixed long-run distribution as 𝑡 → ∞.
When such a distribution exists it is called a stationary distribution.
Since all distributions here are Gaussian, and a Gaussian distribution is pinned down by its mean and variance-covariance matrix, a stationary distribution in our setting takes the form

𝜓∞ = 𝑁 (𝜇∞ , Σ∞ )

where 𝜇∞ and Σ∞ are fixed points of (6) and (7) respectively.
Let’s see what happens to the preceding figure if we start 𝑥0 at the stationary distribution.
Now the differences in the observed distributions at 𝑇 , 𝑇 ′ and 𝑇 ″ come entirely from random
fluctuations due to the finite sample size.
By
• our choosing 𝑥0 ∼ 𝑁 (𝜇∞ , Σ∞ )
• the definitions of 𝜇∞ and Σ∞ as fixed points of (6) and (7) respectively
we’ve ensured that 𝜇𝑡 = 𝜇∞ and Σ𝑡 = Σ∞ for all 𝑡.
Moreover, in view of (14), the autocovariance function takes the form Σ𝑡+𝑗,𝑡 = 𝐴𝑗 Σ∞ , which
depends on 𝑗 but not on 𝑡.
This motivates the following definition.
A process {𝑥𝑡 } is said to be covariance stationary if
• both 𝜇𝑡 and Σ𝑡 are constant in 𝑡
• Σ𝑡+𝑗,𝑡 depends on the time gap 𝑗 but not on time 𝑡
In our setting, {𝑥𝑡 } will be covariance stationary if 𝜇0 , Σ0 , 𝐴, 𝐶 assume values that imply that
none of 𝜇𝑡 , Σ𝑡 , Σ𝑡+𝑗,𝑡 depends on 𝑡.
The difference equation 𝜇𝑡+1 = 𝐴𝜇𝑡 is known to have unique fixed point 𝜇∞ = 0 if all eigen-
values of 𝐴 have moduli strictly less than unity.
That is, if (np.absolute(np.linalg.eigvals(A)) < 1).all() == True.
The difference equation (7) also has a unique fixed point in this case, and, moreover
𝜇𝑡 → 𝜇∞ = 0 and Σ𝑡 → Σ∞ as 𝑡→∞
To investigate processes with a constant state component, suppose that 𝐴 and 𝐶 take the form

𝐴 = [ 𝐴1  𝑎
       0   1 ] ,   𝐶 = [ 𝐶1
                          0 ]
where
• 𝐴1 is an (𝑛 − 1) × (𝑛 − 1) matrix
• 𝑎 is an (𝑛 − 1) × 1 column vector
Let 𝑥𝑡 = (𝑥′1𝑡 , 1)′ where 𝑥1𝑡 is (𝑛 − 1) × 1.
It follows that

𝑥1,𝑡+1 = 𝐴1 𝑥1𝑡 + 𝑎 + 𝐶1 𝑤𝑡+1

Let 𝜇1𝑡 = 𝔼[𝑥1𝑡 ] and take expectations on both sides of this expression to get

𝜇1,𝑡+1 = 𝐴1 𝜇1𝑡 + 𝑎   (15)
Assume now that the moduli of the eigenvalues of 𝐴1 are all strictly less than one.
Then (15) has a unique stationary solution, namely,
𝜇1∞ = (𝐼 − 𝐴1 )−1 𝑎
The stationary value of 𝜇𝑡 itself is then 𝜇∞ ∶= (𝜇′1∞ , 1)′ .
The stationary values of Σ𝑡 and Σ𝑡+𝑗,𝑡 satisfy
Σ∞ = 𝐴Σ∞ 𝐴′ + 𝐶𝐶 ′
(16)
Σ𝑡+𝑗,𝑡 = 𝐴𝑗 Σ∞
Notice that here Σ𝑡+𝑗,𝑡 depends on the time gap 𝑗 but not on calendar time 𝑡.
In conclusion, if
• 𝑥0 ∼ 𝑁 (𝜇∞ , Σ∞ ) and
• the moduli of the eigenvalues of 𝐴1 are all strictly less than unity
then the {𝑥𝑡 } process is covariance stationary, with constant state component.
Note
If the eigenvalues of 𝐴1 are less than unity in modulus, then (a) starting from any
initial value, the mean and variance-covariance matrix both converge to their sta-
tionary values; and (b) iterations on (7) converge to the fixed point of the discrete
Lyapunov equation in the first line of (16).
27.5.5 Ergodicity
Ensemble averages across simulations are interesting theoretically, but in real life, we usually
observe only a single realization {𝑥𝑡 , 𝑦𝑡 }𝑇𝑡=0 .
So now let’s take a single realization and form the time-series averages
𝑥̄ ∶= (1/𝑇 ) ∑_{𝑡=1}^{𝑇} 𝑥𝑡   and   𝑦 ̄ ∶= (1/𝑇 ) ∑_{𝑡=1}^{𝑇} 𝑦𝑡
Do these time series averages converge to something interpretable in terms of our basic state-
space representation?
The answer depends on something called ergodicity.
Ergodicity is the property that time series and ensemble averages coincide.
More formally, ergodicity implies that time series sample averages converge to their expecta-
tion under the stationary distribution.
In particular,
• (1/𝑇 ) ∑_{𝑡=1}^{𝑇} 𝑥𝑡 → 𝜇∞
• (1/𝑇 ) ∑_{𝑡=1}^{𝑇} (𝑥𝑡 − 𝑥𝑇̄ )(𝑥𝑡 − 𝑥𝑇̄ )′ → Σ∞
• (1/𝑇 ) ∑_{𝑡=1}^{𝑇} (𝑥𝑡+𝑗 − 𝑥𝑇̄ )(𝑥𝑡 − 𝑥𝑇̄ )′ → 𝐴𝑗 Σ∞
In our linear Gaussian setting, any covariance stationary process is also ergodic.
27.6 Noisy Observations

In some settings, the observation equation 𝑦𝑡 = 𝐺𝑥𝑡 is modified to include an error term.
Often this error term represents the idea that the true state can only be observed imperfectly.
To include an error term in the observation we introduce
• An IID sequence of ℓ × 1 random vectors 𝑣𝑡 ∼ 𝑁 (0, 𝐼).
• A 𝑘 × ℓ matrix 𝐻.
and extend the linear state-space system to

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1
𝑦𝑡 = 𝐺𝑥𝑡 + 𝐻𝑣𝑡

The distribution of the observation 𝑦𝑡 is then

𝑦𝑡 ∼ 𝑁 (𝐺𝜇𝑡 , 𝐺Σ𝑡 𝐺′ + 𝐻𝐻 ′ )
27.7 Prediction
The theory of prediction for linear state space systems is elegant and simple. The natural one-step-ahead forecast of the state is

𝔼𝑡 [𝑥𝑡+1 ] = 𝐴𝑥𝑡

The right-hand side follows from 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1 and the fact that 𝑤𝑡+1 is zero mean and independent of 𝑥𝑡 , 𝑥𝑡−1 , … , 𝑥0 .
That 𝔼𝑡 [𝑥𝑡+1 ] = 𝔼[𝑥𝑡+1 ∣ 𝑥𝑡 ] is an implication of {𝑥𝑡 } having the Markov property.
The one-step-ahead forecast error is

𝑥𝑡+1 − 𝔼𝑡 [𝑥𝑡+1 ] = 𝐶𝑤𝑡+1
More generally, we’d like to compute the 𝑗-step ahead forecasts 𝔼𝑡 [𝑥𝑡+𝑗 ] and 𝔼𝑡 [𝑦𝑡+𝑗 ].
With a bit of algebra, we obtain
In view of the IID property, current and past state values provide no information about fu-
ture values of the shock.
Hence 𝔼𝑡 [𝑤𝑡+𝑘 ] = 𝔼[𝑤𝑡+𝑘 ] = 0.
It now follows from linearity of expectations that the 𝑗-step ahead forecast of 𝑥 is
𝔼𝑡 [𝑥𝑡+𝑗 ] = 𝐴𝑗 𝑥𝑡 ,   and hence the 𝑗-step ahead forecast of 𝑦 is 𝔼𝑡 [𝑦𝑡+𝑗 ] = 𝐺𝐴𝑗 𝑥𝑡
It is useful to obtain the covariance matrix of the vector of 𝑗-step-ahead prediction errors
𝑥𝑡+𝑗 − 𝔼𝑡 [𝑥𝑡+𝑗 ] = ∑_{𝑠=0}^{𝑗−1} 𝐴𝑠 𝐶𝑤𝑡−𝑠+𝑗   (20)
Evidently,
𝑉𝑗 ∶= 𝔼𝑡 [(𝑥𝑡+𝑗 − 𝔼𝑡 [𝑥𝑡+𝑗 ])(𝑥𝑡+𝑗 − 𝔼𝑡 [𝑥𝑡+𝑗 ])′ ] = ∑_{𝑘=0}^{𝑗−1} 𝐴𝑘 𝐶𝐶 ′ (𝐴𝑘 )′   (21)
𝑉𝑗 is the conditional covariance matrix of the errors in forecasting 𝑥𝑡+𝑗 , conditioned on time 𝑡
information 𝑥𝑡 .
Under particular conditions, 𝑉𝑗 converges to
𝑉∞ = 𝐶𝐶 ′ + 𝐴𝑉∞ 𝐴′ (23)
Weaker sufficient conditions for convergence associate eigenvalues equaling or exceeding one
in modulus with elements of 𝐶 that equal 0.
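When the eigenvalues of 𝐴 are all strictly less than one in modulus, 𝑉∞ can be computed as the solution of the discrete Lyapunov equation (23); a minimal sketch, with matrices assumed for illustration:

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.8, 0.1],
              [0.0, 0.9]])
C = np.array([[1.0],
              [0.5]])
V_inf = solve_discrete_lyapunov(A, C @ C.T)   # solves V = A V A' + C C'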
In several contexts, we want to compute forecasts of geometric sums of future random vari-
ables governed by the linear state-space system (1).
We want the following objects
• Forecast of a geometric sum of future 𝑥’s, or 𝔼𝑡 [∑_{𝑗=0}^{∞} 𝛽 𝑗 𝑥𝑡+𝑗 ].
• Forecast of a geometric sum of future 𝑦’s, or 𝔼𝑡 [∑_{𝑗=0}^{∞} 𝛽 𝑗 𝑦𝑡+𝑗 ].
These objects are important components of some famous and interesting dynamic models.
For example,
• if {𝑦𝑡 } is a stream of dividends, then 𝔼 [∑_{𝑗=0}^{∞} 𝛽 𝑗 𝑦𝑡+𝑗 | 𝑥𝑡 ] is a model of a stock price
• if {𝑦𝑡 } is the money supply, then 𝔼 [∑_{𝑗=0}^{∞} 𝛽 𝑗 𝑦𝑡+𝑗 | 𝑥𝑡 ] is a model of the price level
Formulas
𝔼𝑡 [∑_{𝑗=0}^{∞} 𝛽 𝑗 𝑥𝑡+𝑗 ] = [𝐼 + 𝛽𝐴 + 𝛽 2 𝐴2 + ⋯ ]𝑥𝑡 = [𝐼 − 𝛽𝐴]−1 𝑥𝑡

𝔼𝑡 [∑_{𝑗=0}^{∞} 𝛽 𝑗 𝑦𝑡+𝑗 ] = 𝐺[𝐼 + 𝛽𝐴 + 𝛽 2 𝐴2 + ⋯ ]𝑥𝑡 = 𝐺[𝐼 − 𝛽𝐴]−1 𝑥𝑡
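Numerically, it is preferable to evaluate these expressions with a linear solve rather than an explicit inverse; a minimal sketch with assumed inputs:

import numpy as np

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
G = np.array([[1.0, 0.0]])
β = 0.96
x_t = np.array([1.0, 0.5])

Ex_sum = np.linalg.solve(np.eye(2) - β * A, x_t)   # E_t Σ_j β^j x_{t+j}
Ey_sum = G @ Ex_sum                                # E_t Σ_j β^j y_{t+j}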
27.8 Code
Our preceding simulations and calculations are based on code in the file lss.py from the
QuantEcon.py package.
The code implements a class for handling linear state space models (simulations, calculating
moments, etc.).
One Python construct you might not be familiar with is the use of a generator function in the
method moment_sequence().
Go back and read the relevant documentation if you’ve forgotten how generator functions
work.
27.9 Exercises
27.9.1 Exercise 1
27.9.2 Exercise 2
27.9.3 Exercise 3
27.9.4 Exercise 4
27.10 Solutions
27.10.1 Exercise 1
from quantecon import LinearStateSpace

ϕ_0, ϕ_1, ϕ_2 = 1.1, 0.8, -0.8

A = [[1, 0, 0 ],
     [ϕ_0, ϕ_1, ϕ_2],
     [0, 1, 0 ]]
C = np.zeros((3, 1))
G = [0, 1, 0]

ar = LinearStateSpace(A, C, G, mu_0=np.ones(3))
x, y = ar.simulate(ts_length=50)

fig, ax = plt.subplots()
ax.plot(y.flatten())
ax.set_xlabel('time')
ax.set_ylabel('$y_t$', fontsize=16)
plt.show()
27.10.2 Exercise 2
# Coefficient values below are assumptions chosen for illustration
ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.25, -0.5, 0.1
σ = 0.2
A = [[ϕ_1, ϕ_2, ϕ_3, ϕ_4],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0]]
C = [[σ], [0], [0], [0]]
G = [1, 0, 0, 0]
ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
x, y = ar.simulate(ts_length=200)
27.10.3 Exercise 3
I = 20
T = 50
ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
ymin, ymax = -0.5, 1.15

fig, ax = plt.subplots()
ax.set_ylim(ymin, ymax)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel('$y_t$', fontsize=16)
ensemble_mean = np.zeros(T)
for i in range(I):
x, y = ar.simulate(ts_length=T)
y = y.flatten()
ax.plot(y, 'c-', lw=0.8, alpha=0.5)
ensemble_mean = ensemble_mean + y
ensemble_mean = ensemble_mean / I
ax.plot(ensemble_mean, color='b', lw=2, alpha=0.8, label='$\\bar y_t$')
m = ar.moment_sequence()
population_means = []
for t in range(T):
μ_x, μ_y, Σ_x, Σ_y = next(m)
population_means.append(float(μ_y))
ax.plot(population_means, color='g', lw=2, alpha=0.8, label=r'$G\mu_t$')
ax.legend(ncol=2)
plt.show()
27.10.4 Exercise 4
T0 = 10
T1 = 50
T2 = 75
T4 = 100
import random

fig, ax = plt.subplots()
ax.grid(alpha=0.4)
ax.set_ylim(ymin, ymax)
ax.set_ylabel('$y_t$', fontsize=16)
ax.vlines((T0, T1, T2), -1.5, 1.5)
for i in range(80):
rcolor = random.choice(('c', 'g', 'b'))
x, y = ar.simulate(ts_length=T4)
y = y.flatten()
ax.plot(y, color=rcolor, lw=0.8, alpha=0.5)
ax.plot((T0, T1, T2), (y[T0], y[T1], y[T2],), 'ko', alpha=0.5)
plt.show()
Footnotes
[1] The eigenvalues of 𝐴 are (1, −1, 𝑖, −𝑖).
[2] The correct way to argue this is by induction. Suppose that 𝑥𝑡 is Gaussian. Then (1) and
(10) imply that 𝑥𝑡+1 is Gaussian. Since 𝑥0 is assumed to be Gaussian, it follows that every 𝑥𝑡
is Gaussian. Evidently, this implies that each 𝑦𝑡 is Gaussian.
Chapter 28

Application: The Samuelson Multiplier-Accelerator
28.1 Contents
• Overview 28.2
• Details 28.3
• Implementation 28.4
• Stochastic Shocks 28.5
• Government Spending 28.6
• Wrapping Everything Into a Class 28.7
• Using the LinearStateSpace Class 28.8
• Pure Multiplier Model 28.9
• Summary 28.10
In addition to what’s in Anaconda, this lecture will need the following libraries:
28.2 Overview
This lecture creates non-stochastic and stochastic versions of Paul Samuelson’s celebrated
multiplier accelerator model [139].
In doing so, we extend the example of the Solow model class in our second OOP lecture.
Our objectives are to
• provide a more detailed example of OOP and classes
• review a famous model
• review linear difference equations, both deterministic and stochastic
Let’s start with some standard imports:
We’ll also use the following for various tasks described below:
Samuelson used a second-order linear difference equation to represent a model of national out-
put based on three components:
• a national output identity asserting that national outcome is the sum of consumption
plus investment plus government purchases.
• a Keynesian consumption function asserting that consumption at time 𝑡 is equal to a
constant times national output at time 𝑡 − 1.
• an investment accelerator asserting that investment at time 𝑡 equals a constant called
the accelerator coefficient times the difference in output between period 𝑡 − 1 and 𝑡 − 2.
• the idea that consumption plus investment plus government purchases constitute aggre-
gate demand, which automatically calls forth an equal amount of aggregate supply.
(To read about linear difference equations see here or chapter IX of [142])
Samuelson used the model to analyze how particular values of the marginal propensity to
consume and the accelerator coefficient might give rise to transient business cycles in national
output.
Possible dynamic properties include
• smooth convergence to a constant level of output
• damped business cycles that eventually converge to a constant level of output
• persistent business cycles that neither dampen nor explode
Later we present an extension that adds a random shock to the right side of the national in-
come identity representing random fluctuations in aggregate demand.
This modification makes national output become governed by a second-order stochastic linear
difference equation that, with appropriate parameter values, gives rise to recurrent irregular
business cycles.
(To read about stochastic linear difference equations see chapter XI of [142])
28.3 Details
𝐶𝑡 = 𝑎𝑌𝑡−1 + 𝛾   (1)

𝐼𝑡 = 𝑏(𝑌𝑡−1 − 𝑌𝑡−2 )   (2)

𝑌𝑡 = 𝐶𝑡 + 𝐼𝑡 + 𝐺𝑡   (3)
• The parameter 𝑎 is peoples’ marginal propensity to consume out of income — equation (1) asserts that people consume a fraction 𝑎 ∈ (0, 1) of each additional dollar of income.
• The parameter 𝑏 > 0 is the investment accelerator coefficient - equation (2) asserts that
people invest in physical capital when income is increasing and disinvest when it is de-
creasing.
Equations (1), (2), and (3) imply the following second-order linear difference equation for na-
tional income:
𝑌𝑡 = (𝑎 + 𝑏)𝑌𝑡−1 − 𝑏𝑌𝑡−2 + (𝛾 + 𝐺𝑡 )
or, setting 𝜌1 = 𝑎 + 𝑏 and 𝜌2 = −𝑏,

𝑌𝑡 = 𝜌1 𝑌𝑡−1 + 𝜌2 𝑌𝑡−2 + (𝛾 + 𝐺𝑡 )

To complete the model, we require two initial conditions:

𝑌−1 = 𝑌−1̄ ,   𝑌−2 = 𝑌−2̄
We’ll ordinarily set the parameters (𝑎, 𝑏) so that, starting from an arbitrary pair of initial conditions (𝑌−1̄ , 𝑌−2̄ ), national income 𝑌𝑡 converges to a constant value as 𝑡 becomes large.
The deterministic version of the model described so far — meaning that no random shocks
hit aggregate demand — has only transient fluctuations.
We can convert the model to one that has persistent irregular fluctuations by adding a ran-
dom shock to aggregate demand.
𝑌𝑡 = 𝜌1 𝑌𝑡−1 + 𝜌2 𝑌𝑡−2

or

𝑌𝑡 − 𝜌1 𝑌𝑡−1 − 𝜌2 𝑌𝑡−2 = 0   (6)
To discover the properties of the solution of (6), it is useful first to form the characteristic
polynomial for (6):
𝑧 2 − 𝜌1 𝑧 − 𝜌 2 (7)
𝑧2 − 𝜌1 𝑧 − 𝜌2 = (𝑧 − 𝜆1 )(𝑧 − 𝜆2 ) = 0 (8)
𝜆1 = 𝑟𝑒𝑖𝜔 , 𝜆2 = 𝑟𝑒−𝑖𝜔
where 𝑟 is the amplitude of the complex number and 𝜔 is its angle or phase.
𝜆1 = 𝑟(cos(𝜔) + 𝑖 sin(𝜔))
𝜆2 = 𝑟(cos(𝜔) − 𝑖 sin(𝜔))
𝑌𝑡 = 𝜆𝑡1 𝑐1 + 𝜆𝑡2 𝑐2
where 𝑐1 and 𝑐2 are constants that depend on the two initial conditions and on 𝜌1 , 𝜌2 .
When the roots are complex, it is useful to pursue the following calculations.
Notice that
𝑌𝑡 = 𝑐1 (𝑟𝑒𝑖𝜔 )𝑡 + 𝑐2 (𝑟𝑒−𝑖𝜔 )𝑡
= 𝑐1 𝑟𝑡 𝑒𝑖𝜔𝑡 + 𝑐2 𝑟𝑡 𝑒−𝑖𝜔𝑡
= 𝑐1 𝑟𝑡 [cos(𝜔𝑡) + 𝑖 sin(𝜔𝑡)] + 𝑐2 𝑟𝑡 [cos(𝜔𝑡) − 𝑖 sin(𝜔𝑡)]
= (𝑐1 + 𝑐2 )𝑟𝑡 cos(𝜔𝑡) + 𝑖(𝑐1 − 𝑐2 )𝑟𝑡 sin(𝜔𝑡)
The only way that 𝑌𝑡 can be a real number for each 𝑡 is if 𝑐1 + 𝑐2 is a real number and 𝑐1 − 𝑐2
is an imaginary number.
This happens only when 𝑐1 and 𝑐2 are complex conjugates, in which case they can be written
in the polar forms
𝑐1 = 𝑣𝑒𝑖𝜃 , 𝑐2 = 𝑣𝑒−𝑖𝜃
So we can write

𝑌𝑡 = 𝑣𝑒𝑖𝜃 (𝑟𝑒𝑖𝜔 )𝑡 + 𝑣𝑒−𝑖𝜃 (𝑟𝑒−𝑖𝜔 )𝑡 = 2𝑣𝑟𝑡 cos(𝜔𝑡 + 𝜃)

where 𝑣 and 𝜃 are constants that must be chosen to satisfy initial conditions for 𝑌−1 , 𝑌−2 .
This formula shows that when the roots are complex, 𝑌𝑡 displays oscillations with period 𝑝̌ = 2𝜋/𝜔 and damping factor 𝑟.
We say that 𝑝̌ is the period because in that amount of time the cosine wave cos(𝜔𝑡 + 𝜃) goes through exactly one complete cycle.
(Draw a cosine function to convince yourself of this please)
Remark: Following [139], we want to choose the parameters 𝑎, 𝑏 of the model so that the ab-
solute values (of the possibly complex) roots 𝜆1 , 𝜆2 of the characteristic polynomial are both
strictly less than one:
Remark: When both roots 𝜆1 , 𝜆2 of the characteristic polynomial have absolute values
strictly less than one, the absolute value of the larger one governs the rate of convergence to
the steady state of the non stochastic version of the model.
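For any candidate pair (𝑎, 𝑏), the roots and their moduli are easy to inspect; a small sketch (parameter values assumed):

import numpy as np

a, b = 0.92, 0.5
ρ1, ρ2 = a + b, -b
roots = np.roots([1, -ρ1, -ρ2])      # roots of z² − ρ1 z − ρ2
print(roots)
print(np.abs(roots).max())           # the largest modulus governs convergence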
𝐴 = [   1      0   0
      𝛾 + 𝐺   𝜌1  𝜌2
        0      1   0 ]
28.4 Implementation
def param_plot():
    "Plot the (ρ1, ρ2) parameter regions; only the axis setup is shown here."
    fig, ax = plt.subplots(figsize=(10, 6))
    # ... code drawing the solution-type regions goes here ...
    # Set axis
    xmin, ymin = -3, -2
    xmax, ymax = -xmin, -ymin
    plt.axis([xmin, xmax, ymin, ymax])
    return fig

param_plot()
plt.show()
The graph portrays regions in which the (𝜆1 , 𝜆2 ) root pairs implied by the (𝜌1 = (𝑎 + 𝑏), 𝜌2 =
−𝑏) difference equation parameter pairs in the Samuelson model are such that:
• (𝜆1 , 𝜆2 ) are complex with modulus less than 1 - in this case, the {𝑌𝑡 } sequence displays
damped oscillations.
• (𝜆1 , 𝜆2 ) are both real, but one is strictly greater than 1 - this leads to explosive growth.
• (𝜆1 , 𝜆2 ) are both real, but one is strictly less than −1 - this leads to explosive oscilla-
tions.
• (𝜆1 , 𝜆2 ) are both real and both are less than 1 in absolute value - in this case, there is
smooth convergence to the steady state without damped cycles.
Later we’ll present the graph with a red mark showing the particular point implied by the parameterization under study.
def categorize_solution(ρ1, ρ2):
    "Classify the solution type implied by ρ1, ρ2."
    discriminant = ρ1 ** 2 + 4 * ρ2
    if ρ2 > 1 + ρ1 or ρ2 < -1:
        print('Explosive oscillations')
    elif ρ1 + ρ2 > 1:
        print('Explosive growth')
    elif discriminant < 0:
        print('Roots are complex with modulus less than one; '
              'therefore damped oscillations')
    else:
        print('Roots are real and absolute values are less than one; '
              'therefore get smooth convergence to a steady state')

categorize_solution(1.3, -.4)
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
def plot_y(function=None):
    "Plot the path of Y_t."
    plt.subplots(figsize=(10, 6))
    plt.plot(function)
    plt.xlabel('Time $t$')
    plt.ylabel('$Y_t$', rotation=0)
    plt.grid()
    plt.show()
The following function calculates roots of the characteristic polynomial using high school al-
gebra.
(We’ll calculate the roots in other ways later)
The function also plots a 𝑌𝑡 starting from initial conditions that we set
from cmath import sqrt

def y_nonstochastic(y_0=100, y_1=80, α=.92, β=.5, γ=10, n=80):
    # default argument values are assumptions consistent with the output below
    roots = []
    ρ1 = α + β
    ρ2 = -β
    print(f'ρ_1 is {ρ1}')
    print(f'ρ_2 is {ρ2}')
    discriminant = ρ1 ** 2 + 4 * ρ2
    if discriminant == 0:
        roots.append(-ρ1 / 2)
        print('Single real root: ')
        print(''.join(str(roots)))
    elif discriminant > 0:
        roots.append((-ρ1 + sqrt(discriminant).real) / 2)
        roots.append((-ρ1 - sqrt(discriminant).real) / 2)
        print('Two real roots: ')
        print(''.join(str(roots)))
    else:
        roots.append((-ρ1 + sqrt(discriminant)) / 2)
        roots.append((-ρ1 - sqrt(discriminant)) / 2)
        print('Two complex roots: ')
        print(''.join(str(roots)))
    if all(abs(root) < 1 for root in roots):
        print('Absolute values of roots are less than one')
    else:
        print('Absolute values of roots are not less than one')
    # Generate the path for Y_t
    y_t = [y_0, y_1]
    for t in range(2, n):
        y_t.append(ρ1 * y_t[t - 1] + ρ2 * y_t[t - 2] + γ)
    return y_t

plot_y(y_nonstochastic())
ρ_1 is 1.42
ρ_2 is -0.5
Two real roots:
[-0.6459687576256715, -0.7740312423743284]
Absolute values of roots are less than one
The next cell writes code that takes as inputs the modulus 𝑟 and phase 𝜙 of a conjugate pair
of complex numbers in polar form
𝜆1 = 𝑟 exp(𝑖𝜙), 𝜆2 = 𝑟 exp(−𝑖𝜙)
• The code assumes that these two complex numbers are the roots of the characteristic
polynomial
• It then reverse-engineers (𝑎, 𝑏) and (𝜌1 , 𝜌2 ), pairs that would generate those roots
r = .95
period = 10                     # Length of cycle in units of time
ϕ = 2 * math.pi / period
ρ1, ρ2 = f(r, ϕ)                # f is the reverse-engineering routine described above
ρ1 = ρ1.real                    # drop the (zero) imaginary parts
ρ2 = ρ2.real
ρ1, ρ2

a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = (1.5371322893124+0j), (-0.9024999999999999+0j)
Here we’ll use numpy to compute the roots of the characteristic polynomial
p1 = cmath.polar(r1)
p2 = cmath.polar(r2)
r, ϕ = 0.95, 0.6283185307179586
p1, p2 = (0.95, 0.6283185307179586), (0.95, -0.6283185307179586)
a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = 1.5371322893124, -0.9024999999999999
# Useful constants
ρ1 = α + β
ρ2 = -β
categorize_solution(ρ1, ρ2)
return y_t
plot_y(y_nonstochastic())
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.85+0.27838822j 0.85-0.27838822j]
Roots are complex
Roots are less than one
a, b = 0.6180339887498949, 1.0
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.80901699+0.58778525j 0.80901699-0.58778525j]
Roots are complex
Roots are less than one
We can also use sympy to compute analytic formulas for the roots
In [14]: init_printing()

         r1 = Symbol("ρ_1")
         r2 = Symbol("ρ_2")
         z = Symbol("z")

         solve(z**2 - r1*z - r2, z)    # roots of the characteristic polynomial

Out[14]: [𝜌1 /2 − √(𝜌12 + 4𝜌2 )/2 ,  𝜌1 /2 + √(𝜌12 + 4𝜌2 )/2]
In [15]: a = Symbol("α")
b = Symbol("β")
r1 = a + b
r2 = -b
𝛼 𝛽 1 𝛼 𝛽 1
[ + − √𝛼2 + 2𝛼𝛽 + 𝛽 2 − 4𝛽, + + √𝛼2 + 2𝛼𝛽 + 𝛽 2 − 4𝛽]
2 2 2 2 2 2
Now we’ll construct some code to simulate the stochastic version of the model that emerges
when we add a random shock process to aggregate demand
In [16]: def y_stochastic(y_0=0, y_1=0, α=0.8, β=0.2, γ=10, n=100, σ=5):
             # default argument values are assumptions
             # Useful constants
             ρ1 = α + β
             ρ2 = -β
             # Categorize solution
             categorize_solution(ρ1, ρ2)
             # Find and print roots of the characteristic polynomial
             roots = np.roots([1, -ρ1, -ρ2])
             print(roots)
             # Generate shocks
             ϵ = np.random.normal(0, 1, n)
             # Generate the path for Y_t
             y_t = [y_0, y_1]
             for t in range(2, n):
                 y_t.append(ρ1 * y_t[t - 1] + ρ2 * y_t[t - 2] + γ + σ * ϵ[t])
             return y_t

         plot_y(y_stochastic())
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Let’s do a simulation in which there are shocks and the characteristic polynomial has complex
roots
In [17]: r = .97
a, b = 0.6285929690873979, 0.9409000000000001
Roots are complex with modulus less than one; therefore damped oscillations
[0.78474648+0.57015169j 0.78474648-0.57015169j]
Roots are complex
Roots are less than one
In [18]: def y_stochastic_g(y_0=20, y_1=20, α=0.8, β=0.2, γ=10, n=100, σ=2,
                            g=0, g_t=0, duration=None):
             # defaults and the government-spending branch logic are assumptions
             ρ1, ρ2 = α + β, -β                  # Useful constants
             categorize_solution(ρ1, ρ2)         # Categorize solution
             roots = np.roots([1, -ρ1, -ρ2])     # Find roots of polynomial
             print(roots)
             print('Roots are real' if all(np.isreal(roots)) else 'Roots are complex')
             print('Roots are less than one' if all(abs(roots) < 1)
                   else 'Roots are not less than one')
             ϵ = np.random.normal(0, 1, n)       # Generate shocks

             def transition(x, t, g=0):          # Stochastic transition
                 return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + g + σ * ϵ[t]

             y_t = [y_0, y_1]
             for t in range(2, n):
                 if g == 0:                           # No government spending
                     y_t.append(transition(y_t, t))
                 elif duration is None:               # g applies in every period
                     y_t.append(transition(y_t, t, g))
                 elif duration == 'permanent':        # g from date g_t onwards
                     y_t.append(transition(y_t, t, g if t >= g_t else 0))
                 else:                                # one-off shock at g_t
                     y_t.append(transition(y_t, t, g if t == g_t else 0))
             return y_t
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one
We can also see the response to a one time jump in government expenditures
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one
class Samuelson():

    """This class represents the Samuelson model, otherwise known as the
    multiplier-accelerator model. The path of output is governed by the
    linear second-order difference equation

    .. math::

        Y_t = γ + G + (α + β) Y_{t-1} - β Y_{t-2}

    Parameters
    ----------
y_0 : scalar
Initial condition for Y_0
y_1 : scalar
Initial condition for Y_1
α : scalar
Marginal propensity to consume
β : scalar
Accelerator coefficient
n : int
Number of iterations
σ : scalar
Volatility parameter. It must be greater than or equal to 0. Set
equal to 0 for a non-stochastic model.
g : scalar
Government spending shock
g_t : int
Time at which government spending shock occurs. Must be specified
when duration != None.
duration : {None, 'permanent', 'one-off'}
Specifies type of government spending shock. If none, government
spending equal to g for all t.
"""
def __init__(self,
y_0=100,
y_1=50,
α=1.3,
β=0.2,
γ=10,
n=100,
σ=0,
g=0,
g_t=0,
duration=None):

        # Body below is a reconstruction consistent with the methods that follow
        self.y_0, self.y_1, self.α, self.β = y_0, y_1, α, β
        self.γ, self.σ, self.n = γ, σ, n
        self.g, self.g_t, self.duration = g, g_t, duration
        self.ρ1 = α + β
        self.ρ2 = -β
        self.roots = np.roots([1, -self.ρ1, -self.ρ2])   # characteristic roots
def root_type(self):
if all(isinstance(root, complex) for root in self.roots):
return 'Complex conjugate'
elif len(self.roots) > 1:
return 'Double real'
else:
return 'Single real'
def root_less_than_one(self):
if all(abs(root) < 1 for root in self.roots):
return True
def solution_type(self):
ρ1, ρ2 = self.ρ1, self.ρ2
discriminant = ρ1 ** 2 + 4 * ρ2
if ρ2 >= 1 + ρ1 or ρ2 <= -1:
return 'Explosive oscillations'
elif ρ1 + ρ2 >= 1:
return 'Explosive growth'
elif discriminant < 0:
return 'Damped oscillations'
else:
return 'Steady state'
    def _transition(self, x, t, g=0):

        # Non-stochastic (σ == 0), separated to avoid generating
        # random series when not needed
        if self.σ == 0:
            return self.ρ1 * x[t - 1] + self.ρ2 * x[t - 2] + self.γ + g

        # Stochastic
        else:
            ϵ = np.random.normal(0, 1, self.n)
            return self.ρ1 * x[t - 1] + self.ρ2 * x[t - 2] + self.γ + g \
                   + self.σ * ϵ[t]
    def generate_series(self):

        # Create list and set initial conditions
        y_t = [self.y_0, self.y_1]

        for t in range(2, self.n):
            # No government spending
            if self.g == 0:
                y_t.append(self._transition(y_t, t))
            elif self.duration is None:          # g applies in every period
                y_t.append(self._transition(y_t, t, self.g))
            elif self.duration == 'permanent':   # g from date g_t onwards
                y_t.append(self._transition(y_t, t,
                                            self.g if t >= self.g_t else 0))
            else:                                # one-off shock at g_t
                y_t.append(self._transition(y_t, t,
                                            self.g if t == self.g_t else 0))
        return y_t
def summary(self):
print('Summary\n' + '-' * 50)
print(f'Root type: {self.root_type()}')
print(f'Solution type: {self.solution_type()}')
print(f'Roots: {str(self.roots)}')
if self.root_less_than_one() == True:
print('Absolute value of roots is less than one')
else:
print('Absolute value of roots is not less than one')
if self.σ > 0:
print('Stochastic series with σ = ' + str(self.σ))
else:
print('Non-stochastic series')
if self.g != 0:
print('Government spending equal to ' + str(self.g))
if self.duration != None:
print(self.duration.capitalize() +
' government spending shock at t = ' + str(self.g_t))
def plot(self):
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(self.generate_series())
ax.set(xlabel='Iteration', xlim=(0, self.n))
ax.set_ylabel('$Y_t$', rotation=0)
ax.grid()
return fig
def param_plot(self):
fig = param_plot()
ax = fig.gca()
plt.legend(fontsize=12, loc=3)
return fig
Summary
--------------------------------------------------
Root type: Complex conjugate
Solution type: Damped oscillations
Roots: [0.65+0.27838822j 0.65-0.27838822j]
Absolute value of roots is less than one
Stochastic series with σ = 2
Government spending equal to 10
Permanent government spending shock at t = 20
In [23]: sam.plot()
plt.show()
We’ll use our graph to show where the roots lie and how their location is consistent with the
behavior of the path just graphed.
The red + sign shows the location of the roots
In [24]: sam.param_plot()
plt.show()
It turns out that we can use the QuantEcon.py LinearStateSpace class to do much of the
work that we have done from scratch above.
Here is how we map the Samuelson model into an instance of a LinearStateSpace class
# uses the parameter values α, β, γ, σ, g, n and ρ1, ρ2 defined above
A = [[1, 0, 0],
     [γ + g, ρ1, ρ2],
     [0, 1, 0]]
G = [[γ + g, ρ1, ρ2], [γ, α, 0], [0, β, -β]]   # rows pick out Y, C, I
C = np.zeros((3, 1))
C[1] = σ                                       # stochastic
sam_t = LinearStateSpace(A, C, G, mu_0=[1, 100, 100])
x, y = sam_t.simulate(ts_length=n)
fig, axes = plt.subplots(3, 1, sharex=True)
for ax, series in zip(axes, y):
    ax.plot(series)
axes[-1].set_xlabel('Iteration')
plt.show()
Let’s plot impulse response functions for the instance of the Samuelson model using a
method in the LinearStateSpace class
Out[26]: (2, 6, 1)
(2, 6, 1)
Now let’s compute the zeros of the characteristic polynomial by simply calculating the eigen-
values of 𝐴
In [27]: A = np.asarray(A)
w, v = np.linalg.eig(A)
print(w)
We could also create a subclass of LinearStateSpace (inheriting all its methods and at-
tributes) to add more functions to use
"""
This subclass creates a Samuelson multiplier-accelerator model
as a linear state space system.
"""
def __init__(self,
y_0=100,
y_1=100,
α=0.8,
β=0.9,
γ=10,
σ=1,
g=10):
self.α, self.β = α, β
self.y_0, self.y_1, self.g = y_0, y_1, g
self.γ, self.σ = γ, σ
self.ρ1 = α + β
self.ρ2 = -β
x, y = self.simulate(ts_length)
axes[-1].set_xlabel('Iteration')
return fig
x, y = self.impulse_response(j)
return fig
28.8.3 Illustrations
In [32]: samlss.plot_irf(100)
plt.show()
In [33]: samlss.multipliers()
Let’s shut down the accelerator by setting 𝑏 = 0 to get a pure multiplier model
• the absence of cycles gives an idea about why Samuelson included the accelerator
In [35]: pure_multiplier.plot_simulation()
[35]:
In [37]: pure_multiplier.plot_simulation()
[37]:
In [38]: pure_multiplier.plot_irf(100)
[38]:
28.10 Summary
In this lecture, we wrote functions and classes to represent non-stochastic and stochastic ver-
sions of the Samuelson (1939) multiplier-accelerator model, described in [139].
We saw that different parameter values led to different output paths, which could either be
stationary, explosive, or oscillating.
We also were able to represent the model using the QuantEcon.py LinearStateSpace class.
Chapter 29

Kesten Processes and Firm Dynamics
29.1 Contents
• Overview 29.2
• Kesten Processes 29.3
• Heavy Tails 29.4
• Application: Firm Dynamics 29.5
• Exercises 29.6
• Solutions 29.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
29.2 Overview
import quantecon as qe
The following two lines are only added to avoid a FutureWarning caused by compatibility
issues between pandas and matplotlib.
Additional technical background related to this lecture can be found in the monograph of
[27].
The GARCH model is common in financial applications, where time series such as asset re-
turns exhibit time varying volatility.
For example, consider the following plot of daily returns on the Nasdaq Composite Index for
the period 1st January 2006 to 1st November 2019.
import yfinance as yf

s = yf.download('^IXIC', '2006-1-1', '2019-11-1')['Adj Close']
r = s.pct_change()

fig, ax = plt.subplots()
ax.plot(r, alpha=0.7)
ax.set_ylabel('returns', fontsize=12)
ax.set_xlabel('date', fontsize=12)
plt.show()
[*********************100%***********************] 1 of 1 completed
Notice how the series exhibits bursts of volatility (high variance) and then settles down again.
GARCH models can replicate this feature.
The GARCH(1, 1) volatility process takes the form
𝜎𝑡+1² = 𝛼0 + 𝜎𝑡²(𝛼1 𝜉𝑡+1² + 𝛽)   (2)
where {𝜉𝑡 } is IID with 𝔼𝜉𝑡2 = 1 and all parameters are positive.
Returns on a given asset are then modeled as
𝑟𝑡 = 𝜎𝑡 𝜁𝑡 (3)
Suppose that a given household saves a fixed fraction 𝑠 of its current wealth in every period.
The household earns labor income 𝑦𝑡 at the start of time 𝑡.
Letting 𝑅𝑡 denote the gross rate of return on assets, wealth then evolves according to

𝑤𝑡+1 = 𝑅𝑡+1 𝑠𝑤𝑡 + 𝑦𝑡+1

which is a Kesten process of the form (1).
29.3.3 Stationarity
In earlier lectures, such as the one on AR(1) processes, we introduced the notion of a station-
ary distribution.
In the present context, we can define a stationary distribution as follows:
The distribution 𝐹 ∗ on ℝ is called stationary for the Kesten process (1) if

𝑋𝑡 ∼ 𝐹 ∗   ⟹   𝑎𝑡+1 𝑋𝑡 + 𝜂𝑡+1 ∼ 𝐹 ∗   (5)
In other words, if the current state 𝑋𝑡 has distribution 𝐹 ∗ , then so does the next period state
𝑋𝑡+1 .
We can write this alternatively as

𝐹 ∗ (𝑦) = ∫ ℙ{𝑎𝑡+1 𝑥 + 𝜂𝑡+1 ≤ 𝑦} 𝐹 ∗ (𝑑𝑥)   for all 𝑦 ≥ 0   (6)
The left hand side is the distribution of the next period state when the current state is drawn
from 𝐹 ∗ .
The equality in (6) states that this distribution is unchanged.
Noting that the fraction of households with wealth in interval 𝑑𝑤 is 𝐹 ∗ (𝑑𝑤), we get
By the definition of stationarity and the assumption that 𝐹 ∗ is stationary for the wealth pro-
cess, this is just 𝐹 ∗ (𝑦).
Hence the fraction of households with wealth in [0, 𝑦] is the same next period as it is this pe-
riod.
Since 𝑦 was chosen arbitrarily, the distribution is unchanged.
The Kesten process 𝑋𝑡+1 = 𝑎𝑡+1 𝑋𝑡 + 𝜂𝑡+1 does not always have a stationary distribution.
For example, if 𝑎𝑡 ≡ 𝜂𝑡 ≡ 1 for all 𝑡, then 𝑋𝑡 = 𝑋0 + 𝑡, which diverges to infinity.
To prevent this kind of divergence, we require that {𝑎𝑡 } is strictly less than 1 most of the
time.
In particular, if 𝔼 ln 𝑎𝑡 < 0 and 𝔼𝜂𝑡 < ∞, then a unique stationary distribution exists on ℝ+ .
29.4 Heavy Tails

Under certain conditions, the stationary distribution of a Kesten process has a Pareto tail.
(See our earlier lecture on heavy-tailed distributions for background.)
This fact is significant for economics because of the prevalence of Pareto-tailed distributions.
To state the conditions under which the stationary distribution of a Kesten process has a
Pareto tail, we first recall that a random variable is called nonarithmetic if its distribution
is not concentrated on {… , −2𝑡, −𝑡, 0, 𝑡, 2𝑡, …} for any 𝑡 ≥ 0.
For example, any random variable with a density is nonarithmetic.
The famous Kesten–Goldie Theorem (see, e.g., [27], theorem 2.4.4) states that if, in addition to the conditions needed for stationarity, 𝑎𝑡 is positive and nonarithmetic and there exists a positive constant 𝛼 such that

𝔼[𝑎𝑡𝛼 ] = 1,   𝔼[𝜂𝑡𝛼 ] < ∞,   and   𝔼[𝑎𝑡𝛼+1 ] < ∞
then the stationary distribution of the Kesten process has a Pareto tail with tail index 𝛼.
More precisely, if 𝐹 ∗ is the unique stationary distribution and 𝑋 ∗ ∼ 𝐹 ∗ , then

lim_{𝑥→∞} 𝑥𝛼 ℙ{𝑋 ∗ > 𝑥} = 𝑐

for some positive constant 𝑐.
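For the lognormal specification 𝑎𝑡 = exp(𝜇 + 𝜎𝑍𝑡 ) used below, 𝔼[𝑎𝑡𝛼 ] = exp(𝛼𝜇 + 𝛼2 𝜎2 /2), so the tail index can be computed in closed form or numerically; a sketch with assumed parameter values:

import numpy as np
from scipy.optimize import brentq

μ, σ = -0.5, 1.0                                  # parameters of ln a_t (assumed)

def excess_moment(α):
    return np.exp(α * μ + α**2 * σ**2 / 2) - 1    # E[a_t^α] − 1

α_star = brentq(excess_moment, 1e-6, 10.0)        # tail index solving E[a_t^α] = 1
print(α_star, -2 * μ / σ**2)                      # agrees with the closed form −2μ/σ²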
29.4.2 Intuition
In [5]: μ = -0.5
σ = 1.0
def kesten_ts(ts_length=100):
x = np.zeros(ts_length)
for t in range(ts_length-1):
a = np.exp(μ + σ * np.random.randn())
b = np.exp(np.random.randn())
x[t+1] = a * x[t] + b
return x
fig, ax = plt.subplots()
num_paths = 10
np.random.seed(12)
for i in range(num_paths):
ax.plot(kesten_ts())
ax.set(xlabel='time', ylabel='$X_t$')
plt.show()
As noted in our lecture on heavy tails, for common measures of firm size such as revenue or
employment, the US firm size distribution exhibits a Pareto tail (see, e.g., [13], [59]).
Let us try to explain this rather striking fact using the Kesten–Goldie Theorem.
It was postulated many years ago by Robert Gibrat [62] that firm size evolves according to a
simple rule whereby size next period is proportional to current size.
This is now known as Gibrat’s law of proportional growth.
We can express this idea by stating that a suitably defined measure 𝑠𝑡 of firm size obeys
𝑠𝑡+1 / 𝑠𝑡 = 𝑎𝑡+1   (8)
One implication of Gibrat’s law is that the growth rate of individual firms does not depend
on their size.
However, over the last few decades, research contradicting Gibrat’s law has accumulated in
the literature.
For example, it is commonly found that, on average,
1. small firms grow faster than large firms (see, e.g., [53] and [66]) and
2. the growth rate of small firms is more volatile than that of large firms [50].
On the other hand, Gibrat’s law is generally found to be a reasonable approximation for large
firms [53].
We can accommodate these empirical findings by modifying (8) to

𝑠𝑡+1 = 𝑎𝑡+1 𝑠𝑡 + 𝑏𝑡+1   (9)
where {𝑎𝑡 } and {𝑏𝑡 } are both IID and independent of each other.
In the exercises you are asked to show that (9) is more consistent with the empirical findings
presented above than Gibrat’s law in (8).
29.6 Exercises
29.6.1 Exercise 1
Simulate and plot 15 years of daily returns (consider each year as having 250 working days)
using the GARCH(1, 1) process in (2)–(3).
Take 𝜉𝑡 and 𝜁𝑡 to be independent and standard normal.
Set 𝛼0 = 0.00001, 𝛼1 = 0.1, 𝛽 = 0.9 and 𝜎0 = 0.
Compare visually with the Nasdaq Composite Index returns shown above.
While the time path differs, you should see bursts of high volatility.
29.6.2 Exercise 2
In our discussion of firm dynamics, it was claimed that (9) is more consistent with the empiri-
cal literature than Gibrat’s law in (8).
(The empirical literature was reviewed immediately above (9).)
In what sense is this true (or false)?
29.6.3 Exercise 3
29.6.4 Exercise 4
One unrealistic aspect of the firm dynamics specified in (9) is that it ignores entry and exit.
In any given period and in any given market, we observe significant numbers of firms entering
and exiting the market.
Empirical discussion of this can be found in a famous paper by Hugo Hopenhayn [85].
In the same paper, Hopenhayn builds a model of entry and exit that incorporates profit max-
imization by firms and market clearing quantities, wages and prices.
In his model, a stationary equilibrium occurs when the number of entrants equals the number
of exiting firms.
In this setting, firm dynamics can be expressed as
𝑠𝑡+1 = 𝑒𝑡+1 𝟙{𝑠𝑡 < 𝑠}̄ + (𝑎𝑡+1 𝑠𝑡 + 𝑏𝑡+1 )𝟙{𝑠𝑡 ≥ 𝑠}̄ (10)
Here
• the state variable 𝑠𝑡 represents productivity (which is a proxy for output and hence firm size),
• the IID sequence {𝑒𝑡 } is thought of as a productivity draw for a new entrant and
• the variable 𝑠̄ is a threshold value that we take as given, although it is determined endogenously in Hopenhayn’s model.
The idea behind (10) is that firms stay in the market as long as their productivity 𝑠𝑡 remains at or above 𝑠̄.
• In this case, their productivity updates according to (9).
Firms choose to exit when their productivity 𝑠𝑡 falls below 𝑠̄.
• In this case, they are replaced by a new firm with productivity 𝑒𝑡+1 .
29.7 Solutions
29.7.1 Exercise 1
α_0, α_1, β = 0.00001, 0.1, 0.9    # parameters from the exercise statement
years = 15
days = years * 250
def garch_ts(ts_length=days):
σ2 = 0
r = np.zeros(ts_length)
for t in range(ts_length-1):
ξ = np.random.randn()
σ2 = α_0 + σ2 * (α_1 * ξ**2 + β)
r[t] = np.sqrt(σ2) * np.random.randn()
return r
fig, ax = plt.subplots()
np.random.seed(12)
ax.plot(garch_ts(), alpha=0.7)
ax.set(xlabel='time', ylabel='$\\sigma_t^2$')
plt.show()
29.7.2 Exercise 2
Also, Gibrat’s law is generally found to be a more reasonable approximation for large firms than for small firms.
The claim is that the dynamics in (9) are more consistent with points 1-2 than Gibrat’s law.
To see why, we rewrite (9) in terms of growth dynamics:
𝑠𝑡+1 / 𝑠𝑡 = 𝑎𝑡+1 + 𝑏𝑡+1 / 𝑠𝑡   (11)
Taking 𝑠𝑡 = 𝑠 as given, the conditional mean and variance of firm growth are

𝔼𝑎 + 𝔼𝑏 / 𝑠   and   𝕍𝑎 + 𝕍𝑏 / 𝑠2
Both of these decline with firm size 𝑠, consistent with the data.
Moreover, the law of motion (11) clearly approaches Gibrat’s law (8) as 𝑠𝑡 gets large.
29.7.3 Exercise 3
Since 𝑎𝑡 = exp(𝜇 + 𝜎𝑍𝑡 ) with {𝑍𝑡 } standard normal, we have 𝔼 ln 𝑎𝑡 = 𝔼(𝜇 + 𝜎𝑍) = 𝜇,
and since 𝜂𝑡 has finite moments of all orders, the stationarity condition holds if and only if
𝜇 < 0.
Given the properties of the lognormal distribution (which has finite moments of all orders), the only other condition in doubt is existence of a positive constant 𝛼 such that 𝔼[𝑎𝑡𝛼 ] = 1.

This is equivalent to the statement

exp(𝛼𝜇 + 𝛼2 𝜎2 /2) = 1

Solving for 𝛼 gives 𝛼 = −2𝜇/𝜎2 , which is positive exactly when 𝜇 < 0.
29.7.4 Exercise 4
@njit(parallel=True)
def generate_draws(μ_a=-0.5,
σ_a=0.1,
μ_b=0.0,
σ_b=0.5,
μ_e=0.0,
σ_e=0.5,
s_bar=1.0,
T=500,
M=1_000_000,
s_init=1.0):
draws = np.empty(M)
for m in prange(M):
s = s_init
for t in range(T):
if s < s_bar:
new_s = np.exp(μ_e + σ_e * randn())
else:
a = np.exp(μ_a + σ_a * randn())
b = np.exp(μ_b + σ_b * randn())
new_s = a * s + b
s = new_s
draws[m] = s
return draws
data = generate_draws()

fig, ax = plt.subplots()
# rank-size plot: sort sizes in descending order, then plot log rank vs log size
size_data = -np.sort(-data)
rank_data = np.arange(1, len(data) + 1)
ax.loglog(rank_data, size_data, 'o', markersize=3.0, alpha=0.5)
ax.set_xlabel("log rank")
ax.set_ylabel("log size")
plt.show()
Chapter 30

Wealth Distribution Dynamics

30.1 Contents
• Overview 30.2
• Lorenz Curves and the Gini Coefficient 30.3
• A Model of Wealth Dynamics 30.4
• Implementation 30.5
• Applications 30.6
• Exercises 30.7
• Solutions 30.8
In addition to what’s in Anaconda, this lecture will need the following libraries:
30.2 Overview
The evolution of wealth for any given household depends on their savings behavior.
Modeling such behavior will form an important part of this lecture series.
However, in this particular lecture, we will be content with rather ad hoc (but plausible) sav-
ings rules.
We do this to more easily explore the implications of different specifications of income dy-
namics and investment returns.
At the same time, all of the techniques discussed here can be plugged into models that use
optimization to obtain savings rules.
We will use the following imports.
import quantecon as qe
from numba import njit, jitclass, float64, prange
n = 2000
sample = np.exp(np.random.randn(n))          # lognormal sample
f_vals, l_vals = qe.lorenz_curve(sample)

fig, ax = plt.subplots()
ax.plot(f_vals, l_vals, label='Lorenz curve, lognormal sample')
ax.plot(f_vals, f_vals, label='Lorenz curve, equality')
ax.legend()
plt.show()
This curve can be understood as follows: if point (𝑥, 𝑦) lies on the curve, it means that, col-
lectively, the bottom (100𝑥)% of the population holds (100𝑦)% of the wealth.
The “equality” line is the 45 degree line (which might not be exactly 45 degrees in the figure,
depending on the aspect ratio).
A sample that produces this line exhibits perfect equality.
The other line in the figure is the Lorenz curve for the lognormal sample, which deviates sig-
nificantly from perfect equality.
For example, the bottom 80% of the population holds around 40% of total wealth.
Here is another example, which shows how the Lorenz curve shifts as the underlying distribu-
tion changes.
We generate 10,000 observations using the Pareto distribution with a range of parameters,
and then compute the Lorenz curve corresponding to each set of observations.
You can see that, as the tail parameter of the Pareto distribution increases, inequality de-
creases.
This is to be expected, because a higher tail index implies less weight in the tail of the Pareto
distribution.
The definition and interpretation of the Gini coefficient can be found on the corresponding
Wikipedia page.
A value of 0 indicates perfect equality (corresponding the case where the Lorenz curve
matches the 45 degree line) and a value of 1 indicates complete inequality (all wealth held
by the richest household).
The QuantEcon.py library contains a function to calculate the Gini coefficient.
We can test it on the Weibull distribution with parameter 𝑎, where the Gini coefficient is
known to be
𝐺 = 1 − 2^{−1/𝑎}
Let’s see if the Gini coefficient computed from a simulated sample matches this at each fixed
value of 𝑎.
a_vals = np.linspace(1, 3, 25)        # Weibull parameter grid (range assumed)
n = 10_000
ginis = []
ginis_theoretical = []

fig, ax = plt.subplots()
for a in a_vals:
    y = np.random.weibull(a, size=n)
    ginis.append(qe.gini_coefficient(y))
    ginis_theoretical.append(1 - 2**(-1/a))
ax.plot(a_vals, ginis, label='estimated gini coefficient')
ax.plot(a_vals, ginis_theoretical, label='theoretical gini coefficient')
ax.legend()
ax.set_xlabel("Weibull parameter $a$")
ax.set_ylabel("Gini coefficient")
plt.show()
The model we will study is

𝑤𝑡+1 = (1 + 𝑟𝑡+1 )𝑠(𝑤𝑡 ) + 𝑦𝑡+1   (1)

where
• 𝑤𝑡 is wealth at time 𝑡 for a given household,
• 𝑟𝑡 is the rate of return of financial assets,
• 𝑦𝑡 is current non-financial (e.g., labor) income and
• 𝑠(𝑤𝑡 ) is current wealth net of consumption
Letting {𝑧𝑡 } be a correlated state process of the form

𝑧𝑡+1 = 𝑎𝑧𝑡 + 𝑏 + 𝜎𝑧 𝜖𝑡+1

where {𝜖𝑡 } is IID and standard normal, we put
𝑅𝑡 ∶= 1 + 𝑟𝑡 = 𝑐𝑟 exp(𝑧𝑡 ) + exp(𝜇𝑟 + 𝜎𝑟 𝜉𝑡 )
and
𝑦𝑡 = 𝑐𝑦 exp(𝑧𝑡 ) + exp(𝜇𝑦 + 𝜎𝑦 𝜁𝑡 )
𝑠(𝑤) = 𝑠0 𝑤 ⋅ 𝟙{𝑤 ≥ 𝑤̂}   (2)
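In code, the savings rule (2) is just a thresholded linear function (parameter values assumed):

def s(w, s_0=0.75, w_hat=1.0):
    "Savings rule (2): save fraction s_0 of wealth at or above the threshold."
    return s_0 * w if w >= w_hat else 0.0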
30.5 Implementation
In [7]: wealth_dynamics_data = [
('w_hat', float64), # savings parameter
('s_0', float64), # savings parameter
('c_y', float64), # labor income parameter
('μ_y', float64), # labor income parameter
('σ_y', float64), # labor income parameter
('c_r', float64), # rate of return parameter
('μ_r', float64), # rate of return parameter
('σ_r', float64), # rate of return parameter
('a', float64), # aggregate shock parameter
('b', float64), # aggregate shock parameter
('σ_z', float64), # aggregate shock parameter
('z_mean', float64), # mean of z process
('z_var', float64), # variance of z process
('y_mean', float64), # mean of y process
('R_mean', float64) # mean of R process
]
Here’s a class that stores instance data and implements methods that update the aggregate
state and household wealth.
In [8]: @jitclass(wealth_dynamics_data)
class WealthDynamics:
def __init__(self,
w_hat=1.0,
s_0=0.75,
c_y=1.0,
μ_y=1.0,
σ_y=0.2,
c_r=0.05,
μ_r=0.1,
σ_r=0.5,
a=0.5,
b=0.0,
σ_z=0.1):

            # Body below is a reconstruction consistent with the attributes
            # declared in wealth_dynamics_data
            self.w_hat, self.s_0 = w_hat, s_0
            self.c_y, self.μ_y, self.σ_y = c_y, μ_y, σ_y
            self.c_r, self.μ_r, self.σ_r = c_r, μ_r, σ_r
            self.a, self.b, self.σ_z = a, b, σ_z

            # Stationary moments of the z process and the implied means of
            # y and R (lognormal moment formulas)
            self.z_mean = b / (1 - a)
            self.z_var = σ_z**2 / (1 - a**2)
            exp_z_mean = np.exp(self.z_mean + self.z_var / 2)
            self.R_mean = c_r * exp_z_mean + np.exp(μ_r + σ_r**2 / 2)
            self.y_mean = c_y * exp_z_mean + np.exp(μ_y + σ_y**2 / 2)
def parameters(self):
"""
Collect and return parameters.
"""
parameters = (self.w_hat, self.s_0,
self.c_y, self.μ_y, self.σ_y,
self.c_r, self.μ_r, self.σ_r,
self.a, self.b, self.σ_z)
return parameters
        def update_states(self, w, z):
            """
            Update one period, given current wealth w and persistent state z.
            """
            # Simplify names
            params = self.parameters()
            w_hat, s_0, c_y, μ_y, σ_y, c_r, μ_r, σ_r, a, b, σ_z = params
            zp = a * z + b + σ_z * np.random.randn()
# Update wealth
y = c_y * np.exp(zp) + np.exp(μ_y + σ_y * np.random.randn())
wp = y
if w >= w_hat:
R = c_r * np.exp(zp) + np.exp(μ_r + σ_r * np.random.randn())
wp += R * s_0 * w
return wp, zp
Here’s function to simulate the time series of wealth for in individual households.
In [9]: @njit
def wealth_time_series(wdy, w_0, n):
"""
Generate a single time series of length n for wealth given
initial value w_0.
The initial persistent state z_0 for each household is drawn from
the stationary distribution of the AR(1) process.
"""
z = wdy.z_mean + np.sqrt(wdy.z_var) * np.random.randn()
w = np.empty(n)
w[0] = w_0
for t in range(n-1):
w[t+1], z = wdy.update_states(w[t], z)
return w
In [10]: @njit(parallel=True)
def update_cross_section(wdy, w_distribution, shift_length=500):
"""
Shifts a cross-section of household forward in time
"""
    new_distribution = np.empty_like(w_distribution)

    # Update each household
    for i in prange(len(w_distribution)):
        z = wdy.z_mean + np.sqrt(wdy.z_var) * np.random.randn()
        w = w_distribution[i]
        for t in range(shift_length-1):
            w, z = wdy.update_states(w, z)
        new_distribution[i] = w
    return new_distribution
Parallelization is very effective in the function above because the time path of each household
can be calculated independently once the path for the aggregate state is known.
30.6 Applications
Let’s try simulating the model at different parameter values and investigate the implications
for the wealth distribution.
wdy = WealthDynamics()
ts_length = 200
w = wealth_time_series(wdy, wdy.y_mean, ts_length)
fig, ax = plt.subplots()
ax.plot(w)
plt.show()
Now we investigate how the Lorenz curves associated with the wealth distribution change as
return to savings varies.
The code below plots Lorenz curves for three different values of 𝜇𝑟 .
If you are running this yourself, note that it will take one or two minutes to execute.
This is unavoidable because we are executing a CPU intensive task.
In fact the code, which is JIT compiled and parallelized, runs extremely fast relative to the
number of computations.
The Lorenz curve shifts downwards as returns on financial income rise, indicating a rise in
inequality.
We will look at this again via the Gini coefficient immediately below, but first consider the
following image of our system resources when the code above is executing:
Notice how effectively Numba has implemented multithreading for this routine: all 8 CPUs
on our workstation are running at maximum capacity (even though four of them are virtual).
Since the code is both efficiently JIT compiled and fully parallelized, it’s close to impossible
to make this sequence of tasks run faster without changing hardware.
Now let’s check the Gini coefficient.
Once again, we see that inequality increases as returns on financial income rise.
Let’s finish this section by investigating what happens when we change the volatility term 𝜎𝑟
in financial returns.
We see that greater volatility has the effect of increasing inequality in this model.
30.7 Exercises
30.7.1 Exercise 1
For a wealth or income distribution with Pareto tail, a higher tail index suggests lower in-
equality.
Indeed, it is possible to prove that the Gini coefficient of the Pareto distribution with tail in-
dex 𝑎 is 1/(2𝑎 − 1).
To the extent that you can, confirm this by simulation.
In particular, generate a plot of the Gini coefficient against the tail index using both the theo-
retical value just given and the value computed from a sample via qe.gini_coefficient.
For the values of the tail index, use a_vals = np.linspace(1, 10, 25).
Use a sample of size 1,000 for each 𝑎 and the sampling method for generating Pareto draws employed in the discussion of Lorenz curves for the Pareto distribution.
To the extent that you can, interpret the monotone relationship between the Gini index and 𝑎.
30.7.2 Exercise 2
The Kesten–Goldie theorem tells us that Kesten processes have Pareto tails under a range of
parameterizations.
The theorem does not directly apply here, since savings is not always constant and since the
multiplicative and additive terms in (1) are not IID.
At the same time, given the similarities, perhaps Pareto tails will arise.
To test this, run a simulation that generates a cross-section of wealth and generate a rank-size
plot.
In viewing the plot, remember that Pareto tails generate a straight line. Is this what you see?
For sample size and initial conditions, use
30.8 Solutions
Here is one solution, which produces a good match between theory and simulation.
30.8.1 Exercise 1
In general, for a Pareto distribution, a higher tail index implies less weight in the right hand
tail.
This means less extreme values for wealth and hence more equality.
More equality translates to a lower Gini index.
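Here is a minimal sketch of the requested computation; the Pareto draws use the inverse-transform method referred to in the exercise.

import numpy as np
import matplotlib.pyplot as plt
import quantecon as qe

a_vals = np.linspace(1, 10, 25)
n = 1000
ginis = np.empty_like(a_vals)

for i, a in enumerate(a_vals):
    u = np.random.uniform(size=n)
    w = u**(-1/a)                    # Pareto draws via inverse transform
    ginis[i] = qe.gini_coefficient(w)

fig, ax = plt.subplots()
ax.plot(a_vals, ginis, label='simulated')
ax.plot(a_vals, 1/(2*a_vals - 1), label='theoretical')
ax.set_xlabel('tail index $a$')
ax.set_ylabel('Gini coefficient')
ax.legend()
plt.show()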
30.8.2 Exercise 2
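The data plotted below can be generated along the following lines — a sketch assuming the wdy instance and update_cross_section function from above; the sample size and tail cutoff are illustrative assumptions.

num_households = 250_000

# Simulate a cross-section of wealth
ψ_0 = np.full(num_households, wdy.y_mean)
sample = update_cross_section(wdy, ψ_0, shift_length=500)

# Rank-size data for the top tail: log rank against log size
tail = np.sort(sample)[::-1][:1000]
rank_data = np.log(np.arange(1, len(tail) + 1))
size_data = np.log(tail)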
fig, ax = plt.subplots()
ax.plot(rank_data, size_data, 'o', markersize=3.0, alpha=0.5)
ax.set_xlabel("log rank")
ax.set_ylabel("log size")
plt.show()
Chapter 31

Continuous State Markov Chains
31.1 Contents
• Overview 31.2
• The Density Case 31.3
• Beyond Densities 31.4
• Stability 31.5
• Exercises 31.6
• Solutions 31.7
• Appendix 31.8
In addition to what’s in Anaconda, this lecture will need the following libraries:
31.2 Overview
In a previous lecture, we learned about finite Markov chains, a relatively elementary class of
stochastic dynamic models.
The present lecture extends this analysis to continuous (i.e., uncountable) state Markov
chains.
Most stochastic dynamic models studied by economists either fit directly into this class or can
be represented as continuous state Markov chains after minor modifications.
In this lecture, our focus will be on continuous Markov models that
• evolve in discrete-time
• are often nonlinear
The fact that we accommodate nonlinear models here is significant, because linear stochastic
models have their own highly developed toolset, as we’ll see later on.
The question that interests us most is: Given a particular stochastic dynamic model, how will
the state of the system evolve over time?
In particular,
• What happens to the distribution of the state variables?
• Is there anything we can say about the “average behavior” of these variables?
Note
For some people, the term “Markov chain” always refers to a process with a finite
or discrete state space. We follow the mainstream mathematical literature (e.g.,
[117]) in using the term to refer to any discrete time Markov process.
You are probably aware that some distributions can be represented by densities and some
cannot.
(For example, distributions on the real numbers ℝ that put positive probability on individual
points have no density representation)
We are going to start our analysis by looking at Markov chains where the one-step transition
probabilities have density representations.
The benefit is that the density case offers a very direct parallel to the finite case in terms of
notation and intuition.
Once we’ve built some intuition we’ll cover the general case.
In our lecture on finite Markov chains, we studied discrete-time Markov chains that evolve on
a finite state space 𝑆.
In this setting, the dynamics of the model are described by a stochastic matrix — a nonnega-
tive square matrix 𝑃 = 𝑃 [𝑖, 𝑗] such that each row 𝑃 [𝑖, ⋅] sums to one.
The interpretation of 𝑃 is that 𝑃 [𝑖, 𝑗] represents the probability of transitioning from state 𝑖
to state 𝑗 in one unit of time.
In symbols,
ℙ{𝑋𝑡+1 = 𝑗 | 𝑋𝑡 = 𝑖} = 𝑃 [𝑖, 𝑗]
Equivalently,

• 𝑃 can be thought of as a family of distributions 𝑃 [𝑖, ⋅], one for each 𝑖 ∈ 𝑆
• 𝑃 [𝑖, ⋅] is the distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑖
$$p_w(x, y) := \frac{1}{\sqrt{2\pi}} \exp \left\{ -\frac{(y - x)^2}{2} \right\} \qquad (1)$$
$$X_{t+1} = X_t + \xi_{t+1} \quad \text{where} \quad \{\xi_t\} \stackrel{IID}{\sim} N(0, 1) \qquad (2)$$
In the previous section, we made the connection between stochastic difference equation (2)
and stochastic kernel (1).
In economics and time-series analysis we meet stochastic difference equations of all different
shapes and sizes.
It will be useful for us if we have some systematic methods for converting stochastic difference
equations into stochastic kernels.
To this end, consider the generic (scalar) stochastic difference equation given by
$$X_{t+1} = \mu(X_t) + \sigma(X_t) \, \xi_{t+1} \qquad (3)$$

Here $\{\xi_t\}$ is an IID sequence with common density $\phi$.
This is a special case of (3) with 𝜇(𝑥) = 𝛼𝑥 and 𝜎(𝑥) = (𝛽 + 𝛾𝑥2 )1/2 .
Example 3: With stochastic production and a constant savings rate, the one-sector neoclassical growth model leads to a law of motion for capital per worker such as

$$k_{t+1} = s A_{t+1} f(k_t) + (1 - \delta) k_t \qquad (5)$$
Here
• 𝑠 is the rate of savings
• 𝐴𝑡+1 is a production shock
– The 𝑡 + 1 subscript indicates that 𝐴𝑡+1 is not visible at time 𝑡
• 𝛿 is a depreciation rate
• 𝑓 ∶ ℝ+ → ℝ+ is a production function satisfying 𝑓(𝑘) > 0 whenever 𝑘 > 0
(The fixed savings rate can be rationalized as the optimal policy for a particular set of tech-
nologies and preferences (see [108], section 3.1.2), although we omit the details here).
Equation (5) is a special case of (3) with 𝜇(𝑥) = (1 − 𝛿)𝑥 and 𝜎(𝑥) = 𝑠𝑓(𝑥).
Now let’s obtain the stochastic kernel corresponding to the generic model (3).
To find it, note first that if 𝑈 is a random variable with density 𝑓𝑈 , and 𝑉 = 𝑎 + 𝑏𝑈 for some
constants 𝑎, 𝑏 with 𝑏 > 0, then the density of 𝑉 is given by
$$f_V(v) = \frac{1}{b} \, f_U \left( \frac{v - a}{b} \right) \qquad (6)$$
(The proof is below. For a multidimensional version see EDTC, theorem 8.1.3).
Taking (6) as given for the moment, we can obtain the stochastic kernel 𝑝 for (3) by recalling
that 𝑝(𝑥, ⋅) is the conditional density of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥.
In the present case, this is equivalent to stating that 𝑝(𝑥, ⋅) is the density of 𝑌 ∶= 𝜇(𝑥) +
𝜎(𝑥) 𝜉𝑡+1 when 𝜉𝑡+1 ∼ 𝜙.
Hence, by (6),
$$p(x, y) = \frac{1}{\sigma(x)} \, \phi \left( \frac{y - \mu(x)}{\sigma(x)} \right) \qquad (7)$$
For the growth model (5), this becomes

$$p(x, y) = \frac{1}{s f(x)} \, \phi \left( \frac{y - (1 - \delta) x}{s f(x)} \right) \qquad (8)$$
In this section of our lecture on finite Markov chains, we asked the following question: If

1. $\{X_t\}$ is a Markov chain with stochastic matrix $P$, and
2. the distribution of $X_t$ is known to be $\psi_t$,

then what is the distribution of $X_{t+1}$? Letting $\psi_{t+1}$ denote the distribution of $X_{t+1}$, the answer we gave was

$$\psi_{t+1}[j] = \sum_{i \in S} P[i, j] \, \psi_t[i]$$
This intuitive equality states that the probability of being at 𝑗 tomorrow is the probability of
visiting 𝑖 today and then going on to 𝑗, summed over all possible 𝑖.
In the density case, we just replace the sum with an integral and probability mass functions with densities, yielding

$$\psi_{t+1}(y) = \int p(x, y) \, \psi_t(x) \, dx, \qquad \forall y \in S \qquad (9)$$

It is convenient to represent this updating process with an operator: the Markov operator corresponding to $p$ is the map $P$ sending a density $\psi$ into the new density $\psi P$, where

$$(\psi P)(y) = \int p(x, y) \, \psi(x) \, dx \qquad (10)$$
Note
Unlike most operators, we write 𝑃 to the right of its argument, instead of to the
left (i.e., 𝜓𝑃 instead of 𝑃 𝜓). This is a common convention, with the intention be-
ing to maintain the parallel with the finite case — see here
With this notation, we can write (9) more succinctly as 𝜓𝑡+1 (𝑦) = (𝜓𝑡 𝑃 )(𝑦) for all 𝑦, or, drop-
ping the 𝑦 and letting “=” indicate equality of functions,
𝜓𝑡+1 = 𝜓𝑡 𝑃 (11)
Equation (11) tells us that if we specify a distribution for 𝜓0 , then the entire sequence of fu-
ture distributions can be obtained by iterating with 𝑃 .
It’s interesting to note that (11) is a deterministic difference equation.
Thus, by converting a stochastic difference equation such as (3) into a stochastic kernel 𝑝 and
hence an operator 𝑃 , we convert a stochastic difference equation into a deterministic one (al-
beit in a much higher dimensional space).
Note
Some people might be aware that discrete Markov chains are in fact a special case
of the continuous Markov chains we have just described. The reason is that proba-
bility mass functions are densities with respect to the counting measure.
31.3.4 Computation
To learn about the dynamics of a given process, it’s useful to compute and study the se-
quences of densities generated by the model.
One way to do this is to try to implement the iteration described by (10) and (11) using nu-
merical integration.
However, to produce 𝜓𝑃 from 𝜓 via (10), you would need to integrate at every 𝑦, and there is
a continuum of such 𝑦.
Another possibility is to discretize the model, but this introduces errors of unknown size.
A nicer alternative in the present setting is to combine simulation with an elegant estimator
called the look-ahead estimator.
Let’s go over the ideas with reference to the growth model discussed above, the dynamics of
which we repeat here for convenience:
Our aim is to compute the sequence {𝜓𝑡 } associated with this model and fixed initial condi-
tion 𝜓0 .
To approximate 𝜓𝑡 by simulation, recall that, by definition, 𝜓𝑡 is the density of 𝑘𝑡 given 𝑘0 ∼
𝜓0 .
If we wish to generate observations of this random variable, all we need to do is

1. draw $k_0$ from the specified initial condition $\psi_0$
2. drive it forward $t - 1$ periods using the model dynamics (12)

Repeating this $n$ times yields $n$ independent draws $k_{t-1}^1, \ldots, k_{t-1}^n$, and the look-ahead estimator of $\psi_t$ takes the form
$$\psi_t^n(y) = \frac{1}{n} \sum_{i=1}^n p(k_{t-1}^i, y) \qquad (13)$$
The justification is that, with probability one as $n \to \infty$,

$$\frac{1}{n} \sum_{i=1}^n p(k_{t-1}^i, y) \to \mathbb{E} \, p(k_{t-1}^i, y) = \int p(x, y) \, \psi_{t-1}(x) \, dx = \psi_t(y)$$

by the law of large numbers.
31.3.5 Implementation
A class called LAE for estimating densities by this technique can be found in lae.py.
Given our use of the __call__ method, an instance of LAE acts as a callable object, which
is essentially a function that can store its own data (see this discussion).
This function returns the right-hand side of (13) using
• the data and stochastic kernel that it stores as its instance data
• the value 𝑦 as its argument
The function is vectorized, in the sense that if psi is such an instance and y is an array, then
the call psi(y) acts elementwise.
(This is the reason that we reshaped X and y inside the class — to make vectorization work)
Because the implementation is fully vectorized, it is about as efficient as it would be in C or
Fortran.
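For reference, here is a minimal sketch of such a class; the actual lae.py may differ in details.

import numpy as np

class LAE:
    """
    Look-ahead estimator: stores a stochastic kernel p and observations X,
    and evaluates (1/n) * sum_i p(X_i, y) at arbitrary points y.
    """
    def __init__(self, p, X):
        X = X.flatten()                 # Ensure X is one-dimensional
        n = len(X)
        self.p, self.X = p, X.reshape((n, 1))

    def __call__(self, y):
        k = len(y)
        v = self.p(self.X, y.reshape((1, k)))  # n x k array via broadcasting
        return np.mean(v, axis=0)              # Average down the columns

With psi = LAE(p, data), the call psi(ygrid) evaluates the estimator at every point of ygrid in one vectorized operation.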
31.3.6 Example
The following code is an example of usage for the stochastic growth model described above
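The fragment below assumes a simulated data matrix k; a sketch of that simulation step, with illustrative parameter values (the true values used in the original are not shown in this excerpt), is:

import numpy as np
from scipy.stats import beta, lognorm

s, δ, α, a_σ = 0.2, 0.1, 0.4, 0.4      # Assumed parameter values
ϕ = lognorm(a_σ)                        # Density of the production shock
ψ_0 = beta(5, 5, scale=0.5)             # Assumed initial distribution of k_0
n, T = 10_000, 30

def p(x, y):
    "Stochastic kernel (8) for the growth model"
    d = s * x**α
    return ϕ.pdf((y - (1 - δ) * x) / d) / d

# Matrix whose t-th column holds n observations of k_t
k = np.empty((n, T))
A = ϕ.rvs((n, T))
k[:, 0] = ψ_0.rvs(n)
for t in range(T-1):
    k[:, t+1] = s * A[:, t] * k[:, t]**α + (1 - δ) * k[:, t]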
# == Generate T instances of LAE using this data, one for each date t == #
laes = [LAE(p, k[:, t]) for t in range(T)]
# == Plot == #
fig, ax = plt.subplots()
ygrid = np.linspace(0.01, 4.0, 200)
greys = [str(g) for g in np.linspace(0.0, 0.8, T)]
greys.reverse()
for ψ, g in zip(laes, greys):
ax.plot(ygrid, ψ(ygrid), color=g, lw=2, alpha=0.6)
ax.set_xlabel('capital')
ax.set_title(f'Density of $k_1$ (lighter) to $k_T$ (darker) for $T={T}$')
plt.show()
The figure shows part of the density sequence {𝜓𝑡 }, with each density computed via the look-
ahead estimator.
Notice that the sequence of densities shown in the figure seems to be converging — more on
this in just a moment.
Another quick comment is that each of these distributions could be interpreted as a cross-
sectional distribution (recall this discussion).
Up until now, we have focused exclusively on continuous state Markov chains where all condi-
tional distributions 𝑝(𝑥, ⋅) are densities.
As discussed above, not all distributions can be represented as densities.
If the conditional distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥 cannot be represented as a density for
some 𝑥 ∈ 𝑆, then we need a slightly different theory.
The ultimate option is to switch from densities to probability measures, but not all readers
will be familiar with measure theory.
We can, however, construct a fairly general theory using distribution functions.
To illustrate the issues, recall that Hopenhayn and Rogerson [87] study a model of firm dy-
namics where individual firm productivity follows the exogenous process
$$X_{t+1} = a + \rho X_t + \xi_{t+1}, \quad \text{where} \quad \{\xi_t\} \stackrel{IID}{\sim} N(0, \sigma^2)$$

As written, this process fits the density case. However, the authors wanted productivity to take values in $[0, 1]$, so the process is truncated at the endpoints: realizations below $0$ are mapped to $0$ and realizations above $1$ are mapped to $1$.
If you think about it, you will see that for any given 𝑥 ∈ [0, 1], the conditional distribution of
𝑋𝑡+1 given 𝑋𝑡 = 𝑥 puts positive probability mass on 0 and 1.
Hence it cannot be represented as a density.
What we can do instead is use cumulative distribution functions (cdfs).
To this end, set

$$G(x, y) := \mathbb{P}\{X_{t+1} \le y \mid X_t = x\}$$
This family of cdfs 𝐺(𝑥, ⋅) plays a role analogous to the stochastic kernel in the density case.
The distribution dynamics in (9) are then replaced by

$$F_{t+1}(y) = \int G(x, y) \, dF_t(x) \qquad (14)$$
Here 𝐹𝑡 and 𝐹𝑡+1 are cdfs representing the distribution of the current state and next period
state.
The intuition behind (14) is essentially the same as for (9).
31.4.2 Computation
If you wish to compute these cdfs, you cannot use the look-ahead estimator as before.
Indeed, you should not use any density estimator, since the objects you are estimating/com-
puting are not densities.
One good option is simulation as before, combined with the empirical distribution function.
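For example, here is a minimal empirical distribution function helper (a sketch, not taken from the lecture's source files):

import numpy as np

def ecdf(sample):
    "Return the empirical cdf F of a one-dimensional sample"
    s = np.sort(sample)
    def F(y):
        return np.searchsorted(s, y, side='right') / len(s)
    return F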
31.5 Stability
In our lecture on finite Markov chains, we also studied stationarity, stability and ergodicity.
Here we will cover the same topics for the continuous case.
We will, however, treat only the density case (as in this section), where the stochastic kernel
is a family of densities.
The general case is relatively similar — references are given below.
Analogous to the finite case, given a stochastic kernel 𝑝 and corresponding Markov operator
as defined in (10), a density 𝜓∗ on 𝑆 is called stationary for 𝑃 if it is a fixed point of the op-
erator 𝑃 .
In other words,

$$\psi^*(y) = \int p(x, y) \, \psi^*(x) \, dx \quad \text{for all } y \in S \qquad (15)$$
As with the finite case, if 𝜓∗ is stationary for 𝑃 , and the distribution of 𝑋0 is 𝜓∗ , then, in
view of (11), 𝑋𝑡 will have this same distribution for all 𝑡.
Hence 𝜓∗ is the stochastic equivalent of a steady state.
In the finite case, we learned that at least one stationary distribution exists, although there
may be many.
When the state space is infinite, the situation is more complicated.
Even existence can fail very easily.
For example, the random walk model has no stationary density (see, e.g., EDTC, p. 210).
However, there are well-known conditions under which a stationary density 𝜓∗ exists.
With additional conditions, we can also get a unique stationary density (𝜓 ∈ 𝒟 and 𝜓 =
𝜓𝑃 ⟹ 𝜓 = 𝜓∗ ), and also global convergence in the sense that
$$\forall \, \psi \in \mathcal{D}, \quad \psi P^t \to \psi^* \text{ as } t \to \infty \qquad (16)$$
This combination of existence, uniqueness and global convergence in the sense of (16) is often
referred to as global stability.
Under very similar conditions, we get ergodicity, which means that
$$\frac{1}{n} \sum_{t=1}^n h(X_t) \to \int h(x) \, \psi^*(x) \, dx \quad \text{as } n \to \infty \qquad (17)$$
for any (measurable) function ℎ ∶ 𝑆 → ℝ such that the right-hand side is finite.
Note that the convergence in (17) does not depend on the distribution (or value) of 𝑋0 .
This is actually very important for simulation — it means we can learn about 𝜓∗ (i.e., ap-
proximate the right-hand side of (17) via the left-hand side) without requiring any special
knowledge about what to do with 𝑋0 .
So what are these conditions we require to get global stability and ergodicity?
In essence, it must be the case that
1. Probability mass does not drift off to the “edges” of the state space.
2. Sufficient “mixing” obtains.
As stated above, the growth model treated here is stable under mild conditions on the primi-
tives.
• See EDTC, section 11.3.4 for more details.
We can see this stability in action — in particular, the convergence in (16) — by simulating
the path of densities from various initial conditions.
Here is such a figure.
All sequences are converging towards the same limit, regardless of their initial condition.
The details regarding initial conditions and so on are given in this exercise, where you are
asked to replicate the figure.
In the preceding figure, each sequence of densities is converging towards the unique stationary
density 𝜓∗ .
Even from this figure, we can get a fair idea what 𝜓∗ looks like, and where its mass is located.
However, there is a much more direct way to estimate the stationary density, and it involves
only a slight modification of the look-ahead estimator.
Let’s say that we have a model of the form (3) that is stable and ergodic.
Let 𝑝 be the corresponding stochastic kernel, as given in (7).
To approximate the stationary density 𝜓∗ , we can simply generate a long time-series
𝑋0 , 𝑋1 , … , 𝑋𝑛 and estimate 𝜓∗ via
$$\psi_n^*(y) = \frac{1}{n} \sum_{t=1}^n p(X_t, y) \qquad (18)$$
This is essentially the same as the look-ahead estimator (13), except that now the observa-
tions we generate are a single time-series, rather than a cross-section.
The justification for (18) is that, with probability one as 𝑛 → ∞,
$$\frac{1}{n} \sum_{t=1}^n p(X_t, y) \to \int p(x, y) \, \psi^*(x) \, dx = \psi^*(y)$$
where the convergence is by (17) and the equality on the right is by (15).
The right-hand side is exactly what we want to compute.
On top of this asymptotic result, it turns out that the rate of convergence for the look-ahead
estimator is very good.
The first exercise helps illustrate this point.
31.6 Exercises
31.6.1 Exercise 1
Consider the threshold autoregressive model

$$X_{t+1} = \theta |X_t| + (1 - \theta^2)^{1/2} \xi_{t+1} \quad \text{where} \quad \{\xi_t\} \stackrel{IID}{\sim} N(0, 1) \qquad (19)$$
This is one of those rare nonlinear stochastic models where an analytical expression for the
stationary density is available.
In particular, provided that |𝜃| < 1, there is a unique stationary density 𝜓∗ given by
$$\psi^*(y) = 2 \, \phi(y) \, \Phi \left[ \frac{\theta y}{(1 - \theta^2)^{1/2}} \right] \qquad (20)$$
Here 𝜙 is the standard normal density and Φ is the standard normal cdf.
As an exercise, compute the look-ahead estimate of 𝜓∗ , as defined in (18), and compare it
with 𝜓∗ in (20) to see whether they are indeed close for large 𝑛.
In doing so, set 𝜃 = 0.8 and 𝑛 = 500.
The next figure shows the result of such a computation
The additional density (black line) is a nonparametric kernel density estimate, added to the
solution for illustration.
(You can try to replicate it before looking at the solution if you want to)
As you can see, the look-ahead estimator is a much tighter fit than the kernel density estima-
tor.
If you repeat the simulation you will see that this is consistently the case.
31.6.2 Exercise 2
31.6.3 Exercise 3
{𝑋1 , … , 𝑋𝑛 } ∼ 𝐿𝑁 (0, 1), {𝑌1 , … , 𝑌𝑛 } ∼ 𝑁 (2, 1), and {𝑍1 , … , 𝑍𝑛 } ∼ 𝑁 (4, 1),
In [4]: n = 500
x = np.random.randn(n) # N(0, 1)
x = np.exp(x) # Map x to lognormal
y = np.random.randn(n) + 2.0 # N(2, 1)
z = np.random.randn(n) + 4.0 # N(4, 1)
Each data set is represented by a box, where the top and bottom of the box are the third and
first quartiles of the data, and the red line in the center is the median.
The boxes give some indication as to
• the location of probability mass for each sample
• whether the distribution is right-skewed (as is the lognormal distribution), etc
Now let’s put these ideas to use in a simulation.
Consider the threshold autoregressive model in (19).
We know that the distribution of 𝑋𝑡 will converge to (20) whenever |𝜃| < 1.
Let’s observe this convergence from different initial conditions using boxplots.
In particular, the exercise is to generate J boxplot figures, one for each initial condition 𝑋0 in
initial_conditions = np.linspace(8, 0, J)
1. Generate $k$ time-series of length $n$, each starting at $X_0$ and obeying (19).
2. Create a boxplot representing $n$ distributions, where the $t$-th distribution shows the $k$ observations of $X_t$.
31.7 Solutions
31.7.1 Exercise 1
In [5]: ϕ = norm()
n = 500
θ = 0.8
# == Frequently used constants == #
d = np.sqrt(1 - θ**2)
δ = θ / d
def ψ_star(y):
"True stationary density of the TAR Model"
return 2 * norm.pdf(y) * norm.cdf(δ * y)
def p(x, y):
    "Stochastic kernel for the TAR model"
    return ϕ.pdf((y - θ * np.abs(x)) / d) / d

Z = ϕ.rvs(n)
X = np.empty(n)
X[0] = 0.0  # Initial condition; any fixed value works for large n
for t in range(n-1):
    X[t+1] = θ * np.abs(X[t]) + d * Z[t]
ψ_est = LAE(p, X)
k_est = gaussian_kde(X)
31.7.2 Exercise 2
In [6]: # Model parameters matching the example above
s, δ, α, a_σ = 0.2, 0.1, 0.4, 0.4
xmax = 6.5
ϕ = lognorm(a_σ)

fig, axes = plt.subplots(2, 2, figsize=(11, 8))
axes = axes.flatten()

for i in range(4):
    ax = axes[i]
    ax.set_xlim(0, xmax)
    ψ_0 = beta(5, 5, scale=0.5, loc=i*2)  # Initial distribution
    # Simulate and plot the density sequence as in the example above
31.7.3 Exercise 3
In [7]: n = 20
k = 5000
J = 6
θ = 0.9
d = np.sqrt(1 - θ**2)
δ = θ / d
initial_conditions = np.linspace(8, 0, J)
X = np.empty((k, n))
fig, axes = plt.subplots(J, 1, figsize=(10, 4*J))

for j in range(J):
axes[j].set_ylim(-4, 8)
axes[j].set_title(f'time series from t = {initial_conditions[j]}')
Z = np.random.randn(k, n)
X[:, 0] = initial_conditions[j]
for t in range(1, n):
X[:, t] = θ * np.abs(X[:, t-1]) + d * Z[:, t]
axes[j].boxplot(X)
plt.show()
31.8 Appendix
Chapter 32

Cass-Koopmans Optimal Growth Model
32.1 Contents
• Overview 32.2
• The Growth Model 32.3
• Competitive Equilibrium 32.4
Coauthor: Brandon Kaplowitz
32.2 Overview
This lecture describes a model that Tjalling Koopmans [99] and David Cass [33] used to ana-
lyze optimal growth.
The model can be viewed as an extension of the model of Robert Solow described in an ear-
lier lecture but adapted to make the savings rate the outcome of an optimal choice.
(Solow assumed a constant saving rate determined outside the model).
We describe two versions of the model to illustrate what is, in fact, a more general connection
between a planned economy and an economy organized as a competitive equilibrium.
The lecture uses important ideas including
• Hicks-Arrow prices named after John R. Hicks and Kenneth Arrow.
• A min-max problem for solving a planning problem.
• A shooting algorithm for solving difference equations subject to initial and terminal
conditions.
• A connection between some Lagrange multipliers in the min-max problem and the
Hicks-Arrow prices.
• A Big 𝐾 , little 𝑘 trick widely used in macroeconomic dynamics.
• We shall encounter this trick in this lecture and also in this lecture.
• An application of a guess and verify method for solving a system of difference equa-
tions.
• The intimate connection between the cases for the optimality of two competing visions of good ways to organize an economy, namely:
– socialism in which a central planner commands the allocation of resources, and
– capitalism (also known as a market economy) in which competitive markets allocate resources.
$$U(\vec{C}) = \sum_{t=0}^{T} \beta^t \frac{C_t^{1-\gamma}}{1 - \gamma} \qquad (1)$$
where 𝛽 ∈ (0, 1) is a discount factor and 𝛾 > 0 governs the curvature of the one-period utility
function.
Note that
$$u(C_t) = \frac{C_t^{1-\gamma}}{1 - \gamma} \qquad (2)$$
The planner's Lagrangian is

$$\mathcal{L}(\vec{C}, \vec{K}, \vec{\mu}) = \sum_{t=0}^{T} \beta^t \left\{ u(C_t) + \mu_t \left( F(K_t, 1) + (1 - \delta) K_t - C_t - K_{t+1} \right) \right\}$$
The production technology is

$$F(K_t, N_t) = A K_t^{\alpha} N_t^{1-\alpha} = N_t A \left( \frac{K_t}{N_t} \right)^{\alpha}$$

Defining

$$f \left( \frac{K_t}{N_t} \right) = A \left( \frac{K_t}{N_t} \right)^{\alpha}$$

we can write

$$F(K_t, N_t) = N_t \, f \left( \frac{K_t}{N_t} \right)$$
$$
\begin{aligned}
\frac{\partial F}{\partial K} &= \frac{\partial \, N_t f(K_t / N_t)}{\partial K_t} \\
&= N_t f' \left( \frac{K_t}{N_t} \right) \frac{1}{N_t} && \text{(Chain rule)} \\
&= \left. f' \left( \frac{K_t}{N_t} \right) \right|_{N_t = 1} \\
&= f'(K_t)
\end{aligned}
\qquad (6)
$$
Also
$$
\begin{aligned}
\frac{\partial F}{\partial N} &= \frac{\partial \, N_t f(K_t / N_t)}{\partial N_t} && \text{(Product rule)} \\
&= f \left( \frac{K_t}{N_t} \right) + N_t f' \left( \frac{K_t}{N_t} \right) \frac{-K_t}{N_t^2} && \text{(Chain rule)} \\
&= \left. f \left( \frac{K_t}{N_t} \right) - \frac{K_t}{N_t} f' \left( \frac{K_t}{N_t} \right) \right|_{N_t = 1} \\
&= f(K_t) - f'(K_t) K_t
\end{aligned}
$$
To solve the Lagrangian extremization problem, we compute first derivatives of the La-
grangian and set them equal to 0.
• Note: Our objective function and constraints satisfy conditions that work to assure
that required second-order conditions are satisfied at an allocation that satisfies the
first-order conditions that we are about to compute.
Here are the first-order necessary conditions for extremization (i.e., maximization with respect to $\vec{C}, \vec{K}$, minimization with respect to $\vec{\mu}$):

$$u'(C_t) = \mu_t, \qquad t = 0, 1, \ldots, T \qquad (7)$$

$$\beta \mu_t \left[ f'(K_t) + (1 - \delta) \right] = \mu_{t-1}, \qquad t = 1, 2, \ldots, T \qquad (8)$$

$$F(K_t, 1) + (1 - \delta) K_t - C_t - K_{t+1} = 0, \qquad t = 0, 1, \ldots, T \qquad (9)$$

$$-\mu_T \le 0 \qquad (10)$$
Note that in (8) we plugged in for $\frac{\partial F}{\partial K}$ using our formula (6) above.
Because $N_t = 1$ for $t = 1, \ldots, T$, we need not differentiate with respect to those arguments.
Note that (8) comes from the occurrence of $K_t$ in both the period $t$ and period $t-1$ feasibility constraints.
(10) comes from differentiating with respect to 𝐾𝑇 +1 in the last period and applying the fol-
lowing condition called a Karush-Kuhn-Tucker condition (KKT):
$$\mu_T K_{T+1} = 0 \qquad (11)$$
Rewriting gives

$$u'(C_{t+1}) \, \beta \left[ f'(K_{t+1}) + (1 - \delta) \right] = u'(C_t) \qquad (12)$$
Taking the inverse of the utility function on both sides of the above equation gives
$$C_{t+1} = u'^{-1} \left( \left( \frac{\beta}{u'(C_t)} \left[ f'(K_{t+1}) + (1 - \delta) \right] \right)^{-1} \right)$$
which for our utility function becomes

$$C_{t+1} = \left( \beta C_t^{\gamma} \left[ f'(K_{t+1}) + (1 - \delta) \right] \right)^{1/\gamma} = C_t \left( \beta \left[ f'(K_{t+1}) + (1 - \delta) \right] \right)^{1/\gamma}$$
In [2]: @njit
def u(c, γ):
'''
Utility function
ASIDE: If you have a utility function that is hard to solve by hand
you can use automatic or symbolic differentiation
See https://github.com/HIPS/autograd
'''
if γ == 1:
# If γ = 1 we can show via L'hopital's Rule that the utility
# becomes log
return np.log(c)
else:
return c**(1 - γ) / (1 - γ)
@njit
def u_prime(c, γ):
'''Derivative of utility'''
if γ == 1:
return 1 / c
else:
return c**(-γ)
@njit
def u_prime_inv(c, γ):
    '''Inverse of marginal utility'''
    if γ == 1:
        return 1 / c
    else:
        return c**(-1 / γ)
@njit
def f(A, k, α):
'''Production function'''
return A * k**α
@njit
def f_prime(A, k, α):
'''Derivative of production function'''
return α * A * k**(α - 1)
@njit
def f_prime_inv(A, k, α):
    '''Inverse of the derivative of the production function'''
    return (k / (A * α))**(1 / (α - 1))
We shall use a shooting method to compute an optimal allocation 𝐶,⃗ 𝐾⃗ and an associated
Lagrange multiplier sequence 𝜇.⃗
The first-order necessary conditions for the planning problem, namely, equations (7), (8), and
(9), form a system of difference equations with two boundary conditions:
• 𝐾0 is a given initial condition for capital
• $K_{T+1} = 0$ is a terminal condition for capital that we deduced from the first-order necessary condition for $K_{T+1}$ together with the KKT condition (11)
We have no initial condition for the Lagrange multiplier 𝜇0 .
If we did, solving for the allocation would be simple:
• Given 𝜇0 and 𝑘0 , we could compute 𝑐0 from equation (7) and then 𝑘1 from equation (9)
and 𝜇1 from equation (8).
• We could then iterate on to compute the remaining elements of 𝐶,⃗ 𝐾,⃗ 𝜇.⃗
But we don’t have an initial condition for 𝜇0 , so this won’t work.
But a simple modification called the shooting algorithm will work.
In [3]: # Parameters
γ = 2
δ = 0.02
β = 0.95
α = 0.33
A = 1
# Initial guesses
T = 10
c = np.zeros(T+1) # T periods of consumption initialized to 0
# T periods of capital initialized to 0 (T+2 to include t+1 variable as well)
k = np.zeros(T+2)
k[0] = 0.3 # Initial k
c[0] = 0.2 # Guess of c_0
@njit
def shooting_method(c, # Initial consumption
k, # Initial capital
γ, # Coefficient of relative risk aversion
δ, # Depreciation rate on capital
β, # Discount factor
α, # Return to capital per capita
A): # Technology
T = len(c) - 1
for t in range(T):
    # Feasibility constraint (9): update capital
    k[t+1] = f(A=A, k=k[t], α=α) + (1 - δ) * k[t] - c[t]
    if k[t+1] < 0:  # Ensure nonnegativity
        k[t+1] = 0
    # Euler equation: update consumption
    c[t+1] = c[t] * (β * (f_prime(A=A, k=k[t+1], α=α) + (1 - δ)))**(1 / γ)
# Terminal condition for capital
k[T+1] = f(A=A, k=k[T], α=α) + (1 - δ) * k[T] - c[T]
return c, k
paths = shooting_method(c, k, γ, δ, β, α, A)

# Plot the capital path and mark the terminal target K_{T+1} = 0
fig, ax = plt.subplots()
ax.plot(paths[1], label='$k_t$')
ax.scatter(T+1, 0, s=80)
ax.axvline(T+1, color='k', ls='--', lw=1)
ax.legend()
plt.show()
Evidently, our initial guess for $\mu_0$ is too high, making initial consumption too low.
We know this because we miss our $K_{T+1} = 0$ target on the high side.
Now we automate things with a search-for-a-good $\mu_0$ algorithm that stops when we hit the target $K_{T+1} = 0$.
The search procedure is to use a bisection method.
Here is how we apply the bisection method.
We take an initial guess for 𝐶0 (we can eliminate 𝜇0 because 𝐶0 is an exact function of 𝜇0 ).
We know that the lowest 𝐶0 can ever be is 0 and the largest it can be is initial output 𝑓(𝐾0 ).
We take a 𝐶0 guess and shoot forward to 𝑇 + 1.
If $K_{T+1} > 0$, let it be our new lower bound on $C_0$.
If 𝐾𝑇 +1 < 0, let it be our new upper bound.
Make a new guess for 𝐶0 exactly halfway between our new upper and lower bounds.
Shoot forward again and iterate the procedure.
When 𝐾𝑇 +1 gets close enough to 0 (within some error tolerance bounds), stop and declare
victory.
In [4]: @njit
def bisection_method(c,
k,
γ, # Coefficient of relative risk aversion
δ, # Depreciation rate
β, # Discount factor
α, # Return to capital per capita
A, # Technology
tol=1e-4,
max_iter=1e4,
terminal=0): # Value we are shooting towards
T = len(c) - 1
i = 1  # Initial iteration
c_high = f(k=k[0], α=α, A=A)  # Initial high value of c
c_low = 0  # Initial low value of c
path_c, path_k = shooting_method(c, k, γ, δ, β, α, A)
while (np.abs(path_k[T+1] - terminal) > tol) and (i < max_iter):
    if path_k[T+1] - terminal > 0:
        # Terminal capital too high: consumption guess too low
        c_low = c[0]
    else:
        # Terminal capital too low: consumption guess too high
        c_high = c[0]
    c[0] = (c_high + c_low) / 2
    path_c, path_k = shooting_method(c, k, γ, δ, β, α, A)
    i += 1
μ = u_prime(c=path_c, γ=γ)
return path_c, path_k, μ
In [5]: T = 10
c = np.zeros(T+1) # T+1 periods of consumption initialized to 0
k = np.zeros(T+2) # T+2 periods of capital (includes K_{T+1})
k[0] = 0.3 # Initial condition
paths = bisection_method(c, k, γ, δ, β, α, A)

def plot_paths(paths, axes=None, ss=None):
    T = len(paths[0])
    created = axes is None
    if created:
        fix, axes = plt.subplots(1, 3, figsize=(13, 3))
    labels = ['$c_t$', '$k_t$', '$\\mu_t$']
    for path, label, ax in zip(paths, labels, axes):
        ax.plot(path)
        ax.set(ylabel=label, xlabel='t')
    if ss is not None:  # Mark the steady state level of capital
        axes[1].axhline(ss, c='k', ls='--', lw=1)
    if created:
        plt.tight_layout()
        plt.show()

plot_paths(paths)
If the economy settles into a steady state with constant consumption $\bar{C}$ and capital $\bar{K}$, feasibility requires

$$f(\bar{K}) - \delta \bar{K} = \bar{C} \qquad (13)$$
Evaluating the Euler equation at the steady state gives

$$1 = \beta \frac{u'(\bar{C})}{u'(\bar{C})} \left[ f'(\bar{K}) + (1 - \delta) \right]$$
Defining $\beta = \frac{1}{1 + \rho}$ and cancelling gives

$$1 + \rho = f'(\bar{K}) + (1 - \delta)$$

Simplifying gives
$$f'(\bar{K}) = \rho + \delta$$

and

$$\bar{K} = f'^{-1}(\rho + \delta)$$

For our Cobb-Douglas production function this becomes

$$\alpha \bar{K}^{\alpha - 1} = \rho + \delta$$
Finally, using $\alpha = .33$, $\rho = 1/\beta - 1 = 1/(19/20) - 1 = 20/19 - 19/19 = 1/19$ and $\delta = 1/50$, we get

$$\bar{K} = \left( \frac{\frac{33}{100}}{\frac{1}{50} + \frac{1}{19}} \right)^{\frac{100}{67}} \approx 9.57583$$
Let’s verify this with Python and then use this steady state 𝐾̄ as our initial capital stock 𝐾0 .
In [6]: ρ = 1 / β - 1
k_ss = f_prime_inv(k=ρ+δ, A=A, α=α)
Now we plot
In [7]: T = 150
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_ss # Start at steady state
paths = bisection_method(c, k, γ, δ, β, α, A)
plot_paths(paths, ss=k_ss)
Evidently, in this economy with a large value of $T$, $K_t$ stays near its initial value until the end of the horizon approaches.
Evidently, the planner likes the steady state capital stock and wants to stay near there for a
long time.
Let’s see what happens when we push the initial 𝐾0 below 𝐾.̄
k_init = k_ss / 3  # Start below the steady state
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_init
paths = bisection_method(c, k, γ, δ, β, α, A)
plot_paths(paths, ss=k_ss)
Notice how the planner pushes capital toward the steady state, stays near there for a while,
then pushes 𝐾𝑡 toward the terminal value 𝐾𝑇 +1 = 0 as 𝑡 gets close to 𝑇 .
The following graphs compare outcomes as we vary 𝑇 .
T_list = (150, 75, 50)
fix, axes = plt.subplots(1, 3, figsize=(13, 3))
for T in T_list:
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_init
paths = bisection_method(c, k, γ, δ, β, α, A)
plot_paths(paths, ss=k_ss, axes=axes)
The following calculation shows that when we set 𝑇 very large the planner makes the capital
stock spend most of its time close to its steady state value.
T_list = (250, 150, 75, 50)  # Horizons to compare
fix, axes = plt.subplots(1, 3, figsize=(13, 3))
for T in T_list:
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_init
paths = bisection_method(c, k, γ, δ, β, α, A)
plot_paths(paths, ss=k_ss, axes=axes)
The different colors in the above graphs are tied to outcomes with different horizons 𝑇 .
Notice that as the horizon increases, the planner puts 𝐾𝑡 closer to the steady state value 𝐾̄
for longer.
This pattern reflects a turnpike property of the steady state.
A rule of thumb for the planner is
• for whatever 𝐾0 you start with, push 𝐾𝑡 toward the steady state and stay there for as
long as you can
In loose language: head for the turnpike and stay near it for as long as you can.
As we drive 𝑇 toward +∞, the planner keeps 𝐾𝑡 very close to its steady state for all dates
after some transition toward the steady state.
The planner makes the saving rate $\frac{f(K_t) - C_t}{f(K_t)}$ vary over time.
Let’s calculate it
In [11]: @njit
def S(K):
'''Aggregate savings'''
T = len(K) - 2
S = np.zeros(T+1)
for t in range(T+1):
S[t] = K[t+1] - (1 - δ) * K[t]
return S
@njit
def s(K):
'''Savings rate'''
T = len(K) - 2
Y = f(A, K, α)
Y = Y[0:T+1]
s = S(K) / Y
return s
def plot_savings(paths, k_ss=None, s_ss=None, c_ss=None, axes=None):
    T = len(paths[0])
    k_star = paths[1]
    savings_path = s(k_star)
    new_paths = (paths[0], paths[1], savings_path)
    created = axes is None
    if created:
        fix, axes = plt.subplots(1, 3, figsize=(13, 3))
    labels = ['$c_t$', '$k_t$', '$s_t$']
    for path, label, ax in zip(new_paths, labels, axes):
        ax.plot(path)
        ax.set(ylabel=label, xlabel='t')
    # Mark steady state values where provided
    if c_ss is not None:
        axes[0].axhline(c_ss, c='k', ls='--', lw=1)
    if k_ss is not None:
        axes[1].axhline(k_ss, c='k', ls='--', lw=1)
    if s_ss is not None:
        axes[2].axhline(s_ss, c='k', ls='--', lw=1)
    if created:
        plt.tight_layout()
        plt.show()
T_list = (150, 75, 50)
fix, axes = plt.subplots(1, 3, figsize=(13, 3))
for T in T_list:
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_init
paths = bisection_method(c, k, γ, δ, β, α, A)
plot_savings(paths, k_ss=k_ss, axes=axes)
In the steady state, savings just covers depreciation, so the steady state savings rate is

$$\bar{s} = \frac{\delta \bar{K}}{f(\bar{K})}$$
In [12]: T = 130
# Steady states
S_ss = δ * k_ss
c_ss = f(A, k_ss, α) - S_ss
s_ss = S_ss / f(A, k_ss, α)
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_ss / 3 # Start below steady state
paths = bisection_method(c, k, γ, δ, β, α, A, terminal=k_ss)
plot_savings(paths, k_ss=k_ss, s_ss=s_ss, c_ss=c_ss)
32.3.5 Exercise
• Plot the optimal consumption, capital, and savings paths when the initial capital level
begins at 1.5 times the steady state level as we shoot towards the steady state at 𝑇 =
130.
• Why does the savings rate respond as it does?
32.3.6 Solution
In [13]: T = 130
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_ss * 1.5 # Start above steady state
paths = bisection_method(c, k, γ, δ, β, α, A, terminal=k_ss)
plot_savings(paths, k_ss=k_ss, s_ss=s_ss, c_ss=c_ss)
Next, we study a decentralized version of an economy with the same technology and prefer-
ence structure as our planned economy.
But now there is no planner.
Market prices adjust to reconcile distinct decisions that are made separately by a representa-
tive household and a representative firm.
The technology for producing goods and accumulating capital via physical investment re-
mains as in our planned economy.
There is a representative consumer who has the same preferences over consumption plans as
did the consumer in the planned economy.
Instead of being told what to consume and save by a planner, the household chooses for itself
subject to a budget constraint
• At each time 𝑡, the household receives wages and rentals of capital from a firm – these
comprise its income at time 𝑡.
• The consumer decides how much income to allocate to consumption or to savings.
• The household can save either by acquiring additional physical capital (it trades one
for one with time 𝑡 consumption) or by acquiring claims on consumption at dates other
than 𝑡.
• A utility-maximizing household owns all physical capital and labor and rents them to
the firm.
• The household consumes, supplies labor, and invests in physical capital.
• A profit-maximizing representative firm operates the production technology.
• The firm rents labor and capital each period from the representative household and sells
its output each period to the household.
• The representative household and the representative firm are both price takers:
– they (correctly) believe that prices are not affected by their choices
Note: We are free to think of there being a large number 𝑀 of identical representative con-
sumers and 𝑀 identical representative firms.
The representative firm chooses capital $\tilde{k}_t$ and labor $\tilde{n}_t$ to maximize period profits

$$F(\tilde{k}_t, \tilde{n}_t) - w_t \tilde{n}_t - \eta_t \tilde{k}_t$$

where $w_t$ is the wage and $\eta_t$ the rental rate of capital.
The first-order conditions for profit maximization are

$$F_k(\tilde{k}_t, \tilde{n}_t) = \eta_t$$

and

$$F_n(\tilde{k}_t, \tilde{n}_t) = w_t \qquad (14)$$
By Euler's theorem for homogeneous functions,

$$F(\tilde{k}_t, \tilde{n}_t) = \frac{\partial F}{\partial \tilde{k}_t} \tilde{k}_t + \frac{\partial F}{\partial \tilde{n}_t} \tilde{n}_t$$

so that profits equal

$$\frac{\partial F}{\partial \tilde{k}_t} \tilde{k}_t + \frac{\partial F}{\partial \tilde{n}_t} \tilde{n}_t - w_t \tilde{n}_t - \eta_t \tilde{k}_t$$

or

$$\left( \frac{\partial F}{\partial \tilde{k}_t} - \eta_t \right) \tilde{k}_t + \left( \frac{\partial F}{\partial \tilde{n}_t} - w_t \right) \tilde{n}_t$$
Because $F$ is homogeneous of degree 1, it follows that $\frac{\partial F}{\partial \tilde{k}_t}$ and $\frac{\partial F}{\partial \tilde{n}_t}$ are homogeneous of degree 0 and therefore fixed with respect to $\tilde{k}_t$ and $\tilde{n}_t$.
If $\frac{\partial F}{\partial \tilde{k}_t} > \eta_t$, then the firm makes positive profits on each additional unit of $\tilde{k}_t$, so it will want to make $\tilde{k}_t$ arbitrarily large.
But setting $\tilde{k}_t = +\infty$ is not physically feasible, so presumably equilibrium prices will assume values that present the firm with no such arbitrage opportunity.
A related argument applies if $\frac{\partial F}{\partial \tilde{n}_t} > w_t$.
If $\frac{\partial F}{\partial \tilde{k}_t} < \eta_t$, the firm will set $\tilde{k}_t$ to zero.
The household's income at time $t$ is $w_t \cdot 1 + \eta_t k_t$: its labor earnings plus the rental income on the capital it supplies.
Here (𝑘𝑡+1 − (1 − 𝛿)𝑘𝑡 ) is the household’s net investment in physical capital and 𝛿 ∈ (0, 1) is
again a depreciation rate of capital.
In period $t$, the household is free to purchase more goods to be consumed and invested in physical capital than its income from supplying capital and labor to the firm, provided that in some other periods its income exceeds its purchases.
A household’s net excess demand for time 𝑡 consumption goods is the gap
There is a single grand competitive market in which a representative household can trade
date 0 goods for goods at all other dates 𝑡 = 1, 2, … , 𝑇 .
What matters are not bilateral trades of the good at one date 𝑡 for the good at another date
𝑡 ̃ ≠ 𝑡.
Instead, think of there being multilateral and multitemporal trades in which bundles of
goods at some dates can be traded for bundles of goods at some other dates.
There exist complete markets in such bundles with associated market prices.
Let $q_t^0$ be the price of one unit of the good at date $t$ in terms of the date-$0$ good.
Because $q_t^0$ is a relative price, the units in terms of which prices are quoted are arbitrary – we can normalize them without substantial consequence.
If we use the price vector $\{q_t^0\}_{t=0}^T$ to evaluate a stream of excess demands $\{e_t\}_{t=0}^T$, we compute the present value of $\{e_t\}_{t=0}^T$ to be $\sum_{t=0}^T q_t^0 e_t$.
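As a tiny illustration (the helper name is hypothetical), this present value is a single inner product:

import numpy as np

def present_value(q, e):
    "Present value of an excess demand stream e at prices q"
    return float(np.sum(np.asarray(q) * np.asarray(e)))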
That the market is multitemporal is reflected in the situation that the household faces a
single budget constraint.
It states that the present value of the household’s net excess demands must be zero:
$$\sum_{t=0}^{T} q_t^0 e_t \le 0$$

or

$$\sum_{t=0}^{T} q_t^0 \left( c_t + (k_{t+1} - (1 - \delta) k_t) - (w_t \cdot 1 + \eta_t k_t) \right) \le 0$$
The household's problem is

$$\max_{\vec{c}, \vec{k}} \; \sum_{t=0}^{T} \beta^t u(c_t)$$

$$\text{subject to} \quad \sum_{t=0}^{T} q_t^0 \left( c_t + (k_{t+1} - (1 - \delta) k_t) - w_t - \eta_t k_t \right) \le 0$$
32.4.6 Definitions
We have computed an allocation {𝐶,⃗ 𝐾,⃗ 1}⃗ that solves the planning problem.
We use that allocation to construct our guess for the equilibrium price system.
In particular, we guess that for $t = 0, \ldots, T$:

$$q_t^0 = \beta^t \frac{u'(C_t)}{u'(C_0)} \qquad (15)$$

$$w_t = f(K_t) - K_t f'(K_t) \qquad (16)$$

$$\eta_t = f'(K_t) \qquad (17)$$
If our guess for the equilibrium price system is correct, then it must occur that
$$k_t^* = \tilde{k}_t^* \qquad (19)$$

$$1 = \tilde{n}_t^* \qquad (20)$$

$$c_t^* + k_{t+1}^* - (1 - \delta) k_t^* = F(\tilde{k}_t^*, \tilde{n}_t^*) \qquad (21)$$
We shall verify that for 𝑡 = 0, … , 𝑇 the allocations chosen by the household and the firm both
equal the allocation that solves the planning problem:
Our approach is to stare at first-order necessary conditions for the optimization problems of
the household and the firm.
At the price system we have guessed, both sets of first-order conditions are satisfied at the
allocation that solves the planning problem.
To solve the household’s problem, we formulate the appropriate Lagrangian and pose the
min-max problem:
$$\min_{\lambda} \max_{\vec{c}, \vec{k}} \; \mathcal{L}(\vec{c}, \vec{k}, \lambda) = \sum_{t=0}^{T} \beta^t u(c_t) + \lambda \left( \sum_{t=0}^{T} q_t^0 \left( ((1 - \delta) k_t + w_t \cdot 1) + \eta_t k_t - c_t - k_{t+1} \right) \right)$$

First-order conditions are:

$$c_t: \quad \beta^t u'(c_t) - \lambda q_t^0 = 0, \qquad t = 0, 1, \ldots, T \qquad (22)$$

$$k_t: \quad \lambda q_t^0 \left[ (1 - \delta) + \eta_t \right] - \lambda q_{t-1}^0 = 0, \qquad t = 1, 2, \ldots, T \qquad (23)$$

$$\lambda: \quad \left( \sum_{t=0}^{T} q_t^0 \left( c_t + (k_{t+1} - (1 - \delta) k_t) - w_t - \eta_t k_t \right) \right) \le 0 \qquad (24)$$

$$k_{T+1}: \quad -\lambda q_{T+1}^0 \le 0 \qquad (25)$$
Now we plug in for our guesses of prices and derive all the FONC of the planner problem (7)-
(10):
Combining (22) and (15), we get:
$$u'(C_t) = \mu_t$$
which is (7).
Combining (23), (15), and (17), we get:

$$\lambda \beta^t u'(C_t) = \lambda \beta^{t+1} u'(C_{t+1}) \left[ (1 - \delta) + f'(K_{t+1}) \right] \qquad (26)$$

Rewriting (26) by dividing by $\lambda$ on both sides (which is nonzero since $u' > 0$) we get:

$$\beta^t u'(C_t) = \beta^{t+1} u'(C_{t+1}) \left[ (1 - \delta) + f'(K_{t+1}) \right] \qquad (27)$$

or

$$u'(C_t) = \beta u'(C_{t+1}) \left[ (1 - \delta) + f'(K_{t+1}) \right]$$

which is (8).
Combining (24), (15), (16) and (17) after multiplying both sides of (24) by 𝜆, we get:
$$\sum_{t=0}^{T} \beta^t \mu_t \left( C_t + (K_{t+1} - (1 - \delta) K_t) - f(K_t) + K_t f'(K_t) - f'(K_t) K_t \right) \le 0$$
Cancelling,
$$\sum_{t=0}^{T} \beta^t \mu_t \left( C_t + K_{t+1} - (1 - \delta) K_t - F(K_t, 1) \right) \le 0$$
Since $\beta^t$ and $\mu_t$ are always positive here (excepting perhaps the $T+1$ period), we get:

$$C_t + K_{t+1} - (1 - \delta) K_t - F(K_t, 1) = 0 \quad \text{for all } t$$

which is (9).
Combining (25) and (15), we get:

$$-\beta^{T+1} \mu_{T+1} \le 0$$

Dividing both sides by $\beta^{T+1}$ gives

$$-\mu_{T+1} \le 0$$
Evaluating the firm's first-order condition for capital at $\tilde{n}_t = 1$ gives

$$\frac{\partial F(K_t, 1)}{\partial K_t} = f'(K_t) = \eta_t$$

which is (17).
If we now plug (21) into (14) for all $t$, we get:

$$\frac{\partial F(\tilde{K}_t, 1)}{\partial \tilde{L}} = f(K_t) - f'(K_t) K_t = w_t$$

which is (16).
In [14]: @njit
def q_func(β, c, γ):
    # Here we choose the numeraire to be u'(c_0) -- this is q^0_t
    T = len(c) - 2
    q = np.zeros(T+2)
    q[0] = 1
    for t in range(1, T+2):
        q[t] = β**t * u_prime(c[t], γ) / u_prime(c[0], γ)
    return q
@njit
def w_func(A, k, α):
w = f(A, k, α) - k * f_prime(A, k, α)
return w
@njit
def η_func(A, k, α):
η = f_prime(A, k, α)
return η
T_list = (250, 150, 75, 50)

fix, axes = plt.subplots(2, 3, figsize=(13, 6))
titles = ['Hicks-Arrow Prices', 'Labor Rental Rate', 'Capital Rental Rate',
          'Consumption', 'Capital', 'Lagrange Multiplier']
ylabels = ['$q_t^0$', '$w_t$', '$\\eta_t$', '$c_t$', '$k_t$', '$\\mu_t$']

for T in T_list:
    c = np.zeros(T+1)
    k = np.zeros(T+2)
    c[0] = 0.3
    k[0] = k_ss / 3
    c, k, μ = bisection_method(c, k, γ, δ, β, α, A)

    q = q_func(β, c, γ)
    w = w_func(A, k, α)[:-1]
    η = η_func(A, k, α)[:-1]
    plots = [q, w, η, c, k, μ]

    for ax, plot, title, y in zip(axes.flatten(), plots, titles, ylabels):
        ax.plot(plot)
        ax.set(title=title, ylabel=y, xlabel='t')

plt.tight_layout()
plt.show()
Varying Curvature
Now we see how our results change if we keep 𝑇 constant, but allow the curvature parameter,
𝛾 to vary, starting with 𝐾0 below the steady state.
We plot the results for 𝑇 = 150
γ_list = (1.1, 4, 6, 8)  # Candidate curvature values
T = 150

fix, axes = plt.subplots(2, 3, figsize=(13, 6))

for γ in γ_list:
    c = np.zeros(T+1)
    k = np.zeros(T+2)
    c[0] = 0.3
    k[0] = k_ss / 3
    c, k, μ = bisection_method(c, k, γ, δ, β, α, A)

    q = q_func(β, c, γ)
    w = w_func(A, k, α)[:-1]
    η = η_func(A, k, α)[:-1]
    plots = [q, w, η, c, k, μ]

    for ax, plot, title, y in zip(axes.flatten(), plots, titles, ylabels):
        ax.plot(plot, label=f'$\\gamma = {γ}$')
        ax.set(title=title, ylabel=y, xlabel='t')
axes[0, 0].legend()
plt.tight_layout()
plt.show()
Now, we compute Hicks-Arrow prices again, but also calculate the implied yields to maturity.
This will let us plot a yield curve.
The key formulas are:
The yield to maturity is

$$r_{t_0, t} = -\frac{\log q_t^{t_0}}{t - t_0}$$

and the Hicks-Arrow price for a base year $t_0 \le t$ is

$$q_t^{t_0} = \beta^{t - t_0} \frac{u'(c_t)}{u'(c_{t_0})} = \beta^{t - t_0} \frac{c_t^{-\gamma}}{c_{t_0}^{-\gamma}}$$
We redefine our function for 𝑞 to allow arbitrary base years, and define a new function for 𝑟,
then plot both.
First, we plot when 𝑡0 = 0 as before, for different values of 𝑇 , with 𝐾0 below the steady state
In [17]: @njit
def q_func(t_0, β, c, γ):
# Here we choose numeraire to be u'(c_0) -- this is q^(t_0)_t
T = len(c)
q = np.zeros(T+1-t_0)
q[0] = 1
for t in range(t_0+1, T):
q[t-t_0] = β**(t - t_0) * u_prime(c[t], γ) / u_prime(c[t_0], γ)
return q
@njit
def r_func(t_0, β, c, γ):
'''Yield to maturity'''
T = len(c) - 1
r = np.zeros(T+1-t_0)
for t in range(t_0+1, T+1):
r[t-t_0]= -np.log(q_func(t_0, β, c, γ)[t-t_0]) / (t - t_0)
return r
t_0 = 0
T_list = [150, 75, 50]
γ = 2
titles = ['Hicks-Arrow Prices', 'Yields']
ylabels = ['$q_t^0$', '$r_t^0$']

fix, axes = plt.subplots(1, 2, figsize=(10, 5))
for ax, title, y in zip(axes, titles, ylabels):
    ax.set(title=title, ylabel=y, xlabel='t')

for T in T_list:
    c = np.zeros(T+1)
    k = np.zeros(T+2)
    c[0] = 0.3
    k[0] = k_ss / 3
    c, k, μ = bisection_method(c, k, γ, δ, β, α, A)
    q = q_func(t_0, β, c, γ)
    r = r_func(t_0, β, c, γ)
    axes[0].plot(q)
    axes[1].plot(r)

plt.tight_layout()
plt.show()
In [18]: t_0 = 20

fix, axes = plt.subplots(1, 2, figsize=(10, 5))
for ax, title, y in zip(axes, titles, ylabels):
    ax.set(title=title, ylabel=y, xlabel='t')

for T in T_list:
    c = np.zeros(T+1)
    k = np.zeros(T+2)
    c[0] = 0.3
    k[0] = k_ss / 3
    c, k, μ = bisection_method(c, k, γ, δ, β, α, A)
    q = q_func(t_0, β, c, γ)
    r = r_func(t_0, β, c, γ)
    axes[0].plot(q)
    axes[1].plot(r)

plt.tight_layout()
plt.show()
We shall have more to say about the term structure of interest rates in a later lecture on the
topic.
Chapter 33

A First Look at the Kalman Filter
33.1 Contents
• Overview 33.2
• The Basic Idea 33.3
• Convergence 33.4
• Implementation 33.5
• Exercises 33.6
• Solutions 33.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
33.2 Overview
This lecture provides a simple and intuitive introduction to the Kalman filter, for those who
either
• have heard of the Kalman filter but don’t know how it works, or
• know the Kalman filter equations, but don’t know where they come from
For additional (more advanced) reading on the Kalman filter, see
• [108], section 2.7
• [8]
The second reference presents a comprehensive treatment of the Kalman filter.
Required knowledge: Familiarity with matrix manipulations, multivariate normal distribu-
tions, covariance matrices, etc.
We’ll need the following imports:
In [1]: from scipy import linalg
import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt
from quantecon import Kalman, LinearStateSpace
from scipy.stats import norm
from numpy.linalg import eigvals
The Kalman filter has many applications in economics, but for now let’s pretend that we are
rocket scientists.
A missile has been launched from country Y and our mission is to track it.
Let 𝑥 ∈ ℝ2 denote the current location of the missile—a pair indicating latitude-longitude
coordinates on a map.
At the present moment in time, the precise location 𝑥 is unknown, but we do have some be-
liefs about 𝑥.
One way to summarize our knowledge is a point prediction 𝑥̂
• But what if the President wants to know the probability that the missile is currently
over the Sea of Japan?
• Then it is better to summarize our initial beliefs with a bivariate probability density 𝑝
– ∫𝐸 𝑝(𝑥)𝑑𝑥 indicates the probability that we attach to the missile being in region 𝐸.
The density 𝑝 is called our prior for the random variable 𝑥.
To keep things tractable in our example, we assume that our prior is Gaussian.
In particular, we take
$$p = N(\hat{x}, \Sigma) \qquad (1)$$

where $\hat{x}$ is the mean of the distribution and $\Sigma$ is a $2 \times 2$ covariance matrix. In our simulations, we will suppose that

$$\hat{x} = \begin{pmatrix} 0.2 \\ -0.2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 0.4 & 0.3 \\ 0.3 & 0.45 \end{pmatrix} \qquad (2)$$
This density 𝑝(𝑥) is shown below as a contour map, with the center of the red ellipse being
equal to 𝑥.̂
# Set up the Gaussian prior density p
Σ = np.matrix([[0.4, 0.3], [0.3, 0.45]])
x_hat = np.matrix([0.2, -0.2]).T
# The observed value of y
y = np.matrix([2.3, -1.9]).T
# Grid on which to plot
x_grid = np.linspace(-1.5, 2.9, 100)
y_grid = np.linspace(-3.1, 1.7, 100)
X, Y = np.meshgrid(x_grid, y_grid)

def bivariate_normal(x, y, σ_x=1.0, σ_y=1.0, μ_x=0.0, μ_y=0.0, σ_xy=0.0):
    """
    Density of a bivariate normal distribution.

    Parameters
    ----------
    x : array_like(float)
        Random variable
    y : array_like(float)
        Random variable
    σ_x : array_like(float)
        Standard deviation of random variable x
    σ_y : array_like(float)
        Standard deviation of random variable y
    μ_x : scalar(float)
        Mean value of random variable x
    μ_y : scalar(float)
        Mean value of random variable y
    σ_xy : array_like(float)
        Covariance of random variables x and y
    """
    x_μ = x - μ_x
    y_μ = y - μ_y

    ρ = σ_xy / (σ_x * σ_y)
    z = x_μ**2 / σ_x**2 + y_μ**2 / σ_y**2 - 2 * ρ * x_μ * y_μ / (σ_x * σ_y)
    denom = 2 * np.pi * σ_x * σ_y * np.sqrt(1 - ρ**2)
    return np.exp(-z / (2 * (1 - ρ**2))) / denom

def gen_gaussian_plot_vals(μ, C):
    "Z values for plotting the bivariate Gaussian N(μ, C)"
    m_x, m_y = float(μ[0]), float(μ[1])
    s_x, s_y = np.sqrt(C[0, 0]), np.sqrt(C[1, 1])
    s_xy = C[0, 1]
    return bivariate_normal(X, Y, s_x, s_y, m_x, m_y, s_xy)

# Plot the prior as a contour map
fig, ax = plt.subplots(figsize=(10, 8))
Z = gen_gaussian_plot_vals(x_hat, Σ)
ax.contourf(X, Y, Z, 6, alpha=0.6, cmap=cm.jet)
cs = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs, inline=1, fontsize=10)
plt.show()
We are now presented with some good news and some bad news.
The good news is that the missile has been located by our sensors, which report that the cur-
rent location is 𝑦 = (2.3, −1.9).
The next figure shows the original prior 𝑝(𝑥) and the new reported location 𝑦
fig, ax = plt.subplots(figsize=(10, 8))
Z = gen_gaussian_plot_vals(x_hat, Σ)
ax.contourf(X, Y, Z, 6, alpha=0.6, cmap=cm.jet)
cs = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs, inline=1, fontsize=10)
ax.text(float(y[0]), float(y[1]), "$y$", fontsize=20, color="black")
plt.show()
The sensor data takes the form

$$y = G x + v, \qquad v \sim N(0, R) \qquad (3)$$

Here $G$ and $R$ are $2 \times 2$ matrices with $R$ positive definite. Both are assumed known, and the noise term $v$ is assumed to be independent of $x$.
How then should we combine our prior 𝑝(𝑥) = 𝑁 (𝑥,̂ Σ) and this new information 𝑦 to improve
our understanding of the location of the missile?
As you may have guessed, the answer is to use Bayes’ theorem, which tells us to update our
prior 𝑝(𝑥) to 𝑝(𝑥 | 𝑦) via
$$p(x \mid y) = \frac{p(y \mid x) \, p(x)}{p(y)}$$

In our linear Gaussian setting, the posterior distribution is also Gaussian:

$$p(x \mid y) = N(\hat{x}^F, \Sigma^F)$$

where
$$\hat{x}^F := \hat{x} + \Sigma G' (G \Sigma G' + R)^{-1} (y - G \hat{x}) \quad \text{and} \quad \Sigma^F := \Sigma - \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma \qquad (4)$$

Here $\Sigma G' (G \Sigma G' + R)^{-1}$ is the matrix of population regression coefficients of the hidden object $x - \hat{x}$ on the surprise $y - G \hat{x}$.
This new density 𝑝(𝑥 | 𝑦) = 𝑁 (𝑥𝐹̂ , Σ𝐹 ) is shown in the next figure via contour lines and the
color map.
The original density is left in as contour lines for comparison
fig, ax = plt.subplots(figsize=(10, 8))
Z = gen_gaussian_plot_vals(x_hat, Σ)
cs1 = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs1, inline=1, fontsize=10)
M = Σ * G.T * linalg.inv(G * Σ * G.T + R)
x_hat_F = x_hat + M * (y - G * x_hat)
Σ_F = Σ - M * G * Σ
new_Z = gen_gaussian_plot_vals(x_hat_F, Σ_F)
cs2 = ax.contour(X, Y, new_Z, 6, colors="black")
ax.clabel(cs2, inline=1, fontsize=10)
ax.contourf(X, Y, new_Z, 6, alpha=0.6, cmap=cm.jet)
ax.text(float(y[0]), float(y[1]), "$y$", fontsize=20, color="black")
plt.show()
Our new density twists the prior 𝑝(𝑥) in a direction determined by the new information 𝑦 −
𝐺𝑥.̂
In generating the figure, we set 𝐺 to the identity matrix and 𝑅 = 0.5Σ for Σ defined in (2).
Suppose that the missile's location follows the law of motion

$$x_{t+1} = A x_t + w_{t+1}, \quad \text{where} \quad w_t \sim N(0, Q) \qquad (5)$$

Our aim is to combine this law of motion and our current distribution $p(x \mid y) = N(\hat{x}^F, \Sigma^F)$ to come up with a new predictive distribution for the location in one unit of time.
In view of (5), all we have to do is introduce a random vector 𝑥𝐹 ∼ 𝑁 (𝑥𝐹̂ , Σ𝐹 ) and work out
the distribution of 𝐴𝑥𝐹 + 𝑤 where 𝑤 is independent of 𝑥𝐹 and has distribution 𝑁 (0, 𝑄).
Since linear combinations of Gaussians are Gaussian, 𝐴𝑥𝐹 + 𝑤 is Gaussian.
Elementary calculations and the expressions in (4) tell us that

$$\mathbb{E}[A x^F + w] = A \mathbb{E} x^F = A \hat{x}^F = A \hat{x} + A \Sigma G' (G \Sigma G' + R)^{-1} (y - G \hat{x})$$

and

$$\operatorname{Var}[A x^F + w] = A \operatorname{Var}[x^F] A' + Q = A \Sigma^F A' + Q = A \Sigma A' - A \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma A' + Q$$
The matrix 𝐴Σ𝐺′ (𝐺Σ𝐺′ + 𝑅)−1 is often written as 𝐾Σ and called the Kalman gain.
• The subscript Σ has been added to remind us that 𝐾Σ depends on Σ, but not 𝑦 or 𝑥.̂
Using this notation, we can summarize our results as follows.
Our updated prediction is the density $N(\hat{x}_{new}, \Sigma_{new})$ where

$$
\begin{aligned}
\hat{x}_{new} &:= A \hat{x} + K_{\Sigma} (y - G \hat{x}) \\
\Sigma_{new} &:= A \Sigma A' - K_{\Sigma} G \Sigma A' + Q
\end{aligned}
\qquad (6)
$$

• The density $p_{new}(x) = N(\hat{x}_{new}, \Sigma_{new})$ is called the predictive distribution
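As a compact summary, here is a sketch of one full update cycle implementing (4) and (6) directly; the function name and interface are illustrative.

import numpy as np

def kalman_update(x_hat, Σ, y, A, G, Q, R):
    "One cycle of the Kalman filter: filter on y, then forecast"
    M = Σ @ G.T @ np.linalg.inv(G @ Σ @ G.T + R)  # Regression coefficients
    x_hat_F = x_hat + M @ (y - G @ x_hat)         # Filtered mean
    Σ_F = Σ - M @ G @ Σ                           # Filtered covariance
    x_new = A @ x_hat_F                           # Predictive mean
    Σ_new = A @ Σ_F @ A.T + Q                     # Predictive covariance
    return x_new, Σ_new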
The predictive distribution is the new density shown in the following figure, where the update has used parameters

$$A = \begin{pmatrix} 1.2 & 0.0 \\ 0.0 & -0.2 \end{pmatrix}, \qquad Q = 0.3 \Sigma$$
fig, ax = plt.subplots(figsize=(10, 8))

# Density 1
Z = gen_gaussian_plot_vals(x_hat, Σ)
cs1 = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs1, inline=1, fontsize=10)
# Density 2
M = Σ * G.T * linalg.inv(G * Σ * G.T + R)
x_hat_F = x_hat + M * (y - G * x_hat)
Σ_F = Σ - M * G * Σ
Z_F = gen_gaussian_plot_vals(x_hat_F, Σ_F)
cs2 = ax.contour(X, Y, Z_F, 6, colors="black")
ax.clabel(cs2, inline=1, fontsize=10)
# Density 3
new_x_hat = A * x_hat_F
new_Σ = A * Σ_F * A.T + Q
new_Z = gen_gaussian_plot_vals(new_x_hat, new_Σ)
cs3 = ax.contour(X, Y, new_Z, 6, colors="black")
ax.clabel(cs3, inline=1, fontsize=10)
ax.contourf(X, Y, new_Z, 6, alpha=0.6, cmap=cm.jet)
plt.show()
Iterating this cycle period by period, conditioning on the most recent observation each time, yields the recursions

$$
\begin{aligned}
\hat{x}_{t+1} &= A \hat{x}_t + K_{\Sigma_t} (y_t - G \hat{x}_t) \\
\Sigma_{t+1} &= A \Sigma_t A' - K_{\Sigma_t} G \Sigma_t A' + Q
\end{aligned}
\qquad (7)
$$
These are the standard dynamic equations for the Kalman filter (see, for example, [108], page
58).
33.4 Convergence

The matrix $\Sigma_t$ in (7) evolves deterministically, independently of the observed data. Under suitable conditions, the sequence $\{\Sigma_t\}$ converges to a constant matrix that solves the fixed-point equation

$$\Sigma = A \Sigma A' - A \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma A' + Q \qquad (9)$$
33.5 Implementation
The class Kalman from the QuantEcon.py package implements the Kalman filter
• Instance data consists of:
– the moments (𝑥𝑡̂ , Σ𝑡 ) of the current prior.
– An instance of the LinearStateSpace class from QuantEcon.py.
The latter represents a linear state space model of the form

$$
\begin{aligned}
x_{t+1} &= A x_t + C w_{t+1} \\
y_t &= G x_t + H v_t
\end{aligned}
$$

where the shocks $w_t$ and $v_t$ are IID standard normals. To connect this with the notation of this lecture we set

$$Q := CC' \quad \text{and} \quad R := HH'$$
• The class Kalman from the QuantEcon.py package has a number of methods, some that
we will wait to use until we study more advanced applications in subsequent lectures.
• Methods pertinent for this lecture are:
– prior_to_filtered, which updates $(\hat{x}_t, \Sigma_t)$ to $(\hat{x}_t^F, \Sigma_t^F)$
– filtered_to_forecast, which updates the filtering distribution to the predictive distribution – which becomes the new prior $(\hat{x}_{t+1}, \Sigma_{t+1})$
– update, which combines the last two methods
– stationary_values, which computes the solution to (9) and the corresponding (stationary) Kalman gain
You can view the program on GitHub.
33.6 Exercises
33.6.1 Exercise 1
Consider the following simple application of the Kalman filter, loosely based on [108], section
2.9.2.
Suppose that
• all variables are scalars
• the hidden state {𝑥𝑡 } is in fact constant, equal to some 𝜃 ∈ ℝ unknown to the modeler
State dynamics are therefore given by (5) with 𝐴 = 1, 𝑄 = 0 and 𝑥0 = 𝜃.
The measurement equation is 𝑦𝑡 = 𝜃 + 𝑣𝑡 where 𝑣𝑡 is 𝑁 (0, 1) and IID.
The task of this exercise to simulate the model and, using the code from kalman.py, plot
the first five predictive densities 𝑝𝑡 (𝑥) = 𝑁 (𝑥𝑡̂ , Σ𝑡 ).
As shown in [108], sections 2.9.1–2.9.2, these distributions asymptotically put all mass on the
unknown value 𝜃.
In the simulation, take 𝜃 = 10, 𝑥0̂ = 8 and Σ0 = 1.
Your figure should – modulo randomness – look something like this
33.6.2 Exercise 2
The preceding figure gives some support to the idea that probability mass converges to 𝜃.
To get a better idea, choose a small 𝜖 > 0 and calculate
$$z_t := 1 - \int_{\theta - \epsilon}^{\theta + \epsilon} p_t(x) \, dx$$
for 𝑡 = 0, 1, 2, … , 𝑇 .
Plot 𝑧𝑡 against 𝑇 , setting 𝜖 = 0.1 and 𝑇 = 600.
Your figure should show the error erratically declining, something like this
33.6.3 Exercise 3
As discussed above, if the shock sequence {𝑤𝑡 } is not degenerate, then it is not in general
possible to predict 𝑥𝑡 without error at time 𝑡 − 1 (and this would be the case even if we could
observe 𝑥𝑡−1 ).
Let’s now compare the prediction 𝑥𝑡̂ made by the Kalman filter against a competitor who is
allowed to observe 𝑥𝑡−1 .
This competitor will use the conditional expectation 𝔼[𝑥𝑡 | 𝑥𝑡−1 ], which in this case is 𝐴𝑥𝑡−1 .
The conditional expectation is known to be the optimal prediction method in terms of mini-
mizing mean squared error.
(More precisely, the minimizer of 𝔼 ‖𝑥𝑡 − 𝑔(𝑥𝑡−1 )‖2 with respect to 𝑔 is 𝑔∗ (𝑥𝑡−1 ) ∶= 𝔼[𝑥𝑡 | 𝑥𝑡−1 ])
Thus we are comparing the Kalman filter against a competitor who has more information (in
the sense of being able to observe the latent state) and behaves optimally in terms of mini-
mizing squared error.
Our horse race will be assessed in terms of squared error.
In particular, your task is to generate a graph plotting observations of both ‖𝑥𝑡 − 𝐴𝑥𝑡−1 ‖2 and
‖𝑥𝑡 − 𝑥𝑡̂ ‖2 against 𝑡 for 𝑡 = 1, … , 50.
For the parameters, set 𝐺 = 𝐼, 𝑅 = 0.5𝐼 and 𝑄 = 0.3𝐼, where 𝐼 is the 2 × 2 identity.
Set
$$A = \begin{pmatrix} 0.5 & 0.4 \\ 0.6 & 0.3 \end{pmatrix}$$

$$\Sigma_0 = \begin{pmatrix} 0.9 & 0.3 \\ 0.3 & 0.9 \end{pmatrix}$$
Observe how, after an initial learning period, the Kalman filter performs quite well, even rela-
tive to the competitor who predicts optimally with knowledge of the latent state.
33.6.4 Exercise 4
33.7 Solutions
33.7.1 Exercise 1
In [7]: # Parameters
θ = 10 # Constant value of state x_t
A, C, G, H = 1, 0, 1, 1
ss = LinearStateSpace(A, C, G, H, mu_0=θ)

x_hat_0, Σ_0 = 8, 1
kalman = Kalman(ss, x_hat_0, Σ_0)

N = 5  # Number of predictive densities to plot
x, y = ss.simulate(N)
y = y.flatten()

# Set up plot
fig, ax = plt.subplots(figsize=(10, 8))
xgrid = np.linspace(θ - 5, θ + 2, 200)

for i in range(N):
    # Record the current predicted mean and variance
    m, v = [float(z) for z in (kalman.x_hat, kalman.Sigma)]
    # Plot the predictive density, then update the filter
    ax.plot(xgrid, norm.pdf(xgrid, loc=m, scale=np.sqrt(v)), label=f'$t={i}$')
    kalman.update(y[i])

ax.legend(loc='upper left')
plt.show()
33.7.2 Exercise 2
In [8]: ϵ = 0.1
θ = 10 # Constant value of state x_t
A, C, G, H = 1, 0, 1, 1
ss = LinearStateSpace(A, C, G, H, mu_0=θ)
x_hat_0, Σ_0 = 8, 1
kalman = Kalman(ss, x_hat_0, Σ_0)
T = 600
z = np.empty(T)
x, y = ss.simulate(T)
y = y.flatten()
for t in range(T):
    # Record the current predicted mean and variance
    m, v = [float(temp) for temp in (kalman.x_hat, kalman.Sigma)]
    # Probability mass outside the interval (θ - ϵ, θ + ϵ)
    f = norm(loc=m, scale=np.sqrt(v))
    z[t] = 1 - (f.cdf(θ + ϵ) - f.cdf(θ - ϵ))
    kalman.update(y[t])

fig, ax = plt.subplots(figsize=(9, 6))
ax.set_ylim(0, 1)
ax.set_xlabel('$t$')
ax.plot(range(T), z)
plt.show()
33.7.3 Exercise 3
In [9]: # Define A, C, G, H
G = np.identity(2)
H = np.sqrt(0.5) * np.identity(2)
A = [[0.5, 0.4],
[0.6, 0.3]]
C = np.sqrt(0.3) * np.identity(2)
# Initial conditions (the prior mean is an assumption here)
Σ_0 = np.matrix([[0.9, 0.3], [0.3, 0.9]])
x_hat_0 = np.matrix([8, 8]).T

ss = LinearStateSpace(A, C, G, H, mu_0=x_hat_0)
kn = Kalman(ss, x_hat_0, Σ_0)

# Print eigenvalues of A
print("Eigenvalues of A:")
print(eigvals(A))

# Print stationary Σ
S, K = kn.stationary_values()
print("Stationary prediction error variance:")
print(S)

# Simulate and record the two squared prediction errors
T = 50
x, y = ss.simulate(T)
e1 = np.empty(T-1)
e2 = np.empty(T-1)
for t in range(1, T):
    kn.update(y[:, t-1])
    e1[t-1] = np.sum((x[:, t] - np.array(kn.x_hat).flatten())**2)
    e2[t-1] = np.sum((x[:, t] - np.dot(A, x[:, t-1]))**2)
fig, ax = plt.subplots(figsize=(9,6))
ax.plot(range(1, T), e1, 'k-', lw=2, alpha=0.6,
label='Kalman filter error')
ax.plot(range(1, T), e2, 'g-', lw=2, alpha=0.6,
label='Conditional expectation error')
ax.legend()
plt.show()
Eigenvalues of A:
[ 0.9+0.j -0.1+0.j]
Stationary prediction error variance:
[[0.40329108 0.1050718 ]
[0.1050718 0.41061709]]
Footnotes
[1] See, for example, page 93 of [25]. To get from his expressions to the ones used above, you
will also need to apply the Woodbury matrix identity.
Chapter 34

Reverse Engineering a la Muth
34.1 Contents
This lecture uses the Kalman filter to reformulate John F. Muth’s first paper [120] about ra-
tional expectations.
Muth used classical prediction methods to reverse engineer a stochastic process that renders
optimal Milton Friedman’s [56] “adaptive expectations” scheme.
Milton Friedman [56] (1956) posited that consumers forecast their future disposable income with the adaptive expectations scheme
$$y_{t+i,t}^* = K \sum_{j=0}^{\infty} (1 - K)^j y_{t-j} \qquad (1)$$

where $K \in (0, 1)$ and $y_{t+i,t}^*$ is a forecast of future $y$ over horizon $i$.
Milton Friedman justified the exponential smoothing forecasting scheme (1) informally,
noting that it seemed a plausible way to use past income to forecast future income.
In his first paper about rational expectations, John F. Muth [120] reverse-engineered a univariate stochastic process $\{y_t\}_{t=-\infty}^{\infty}$ for which Milton Friedman's adaptive expectations scheme gives linear least squares forecasts of $y_{t+i}$ for any horizon $i$.
Muth sought a setting and a sense in which Friedman’s forecasting scheme is optimal.
That is, Muth asked for what optimal forecasting question is Milton Friedman’s adaptive
expectation scheme the answer.
Muth (1960) used classical prediction methods based on lag-operators and 𝑧-transforms to
find the answer to his question.
Please see lectures Classical Control with Linear Algebra and Classical Filtering and Predic-
tion with Linear Algebra for an introduction to the classical tools that Muth used.
Rather than using those classical tools, in this lecture we apply the Kalman filter to express
the heart of Muth’s analysis concisely.
The lecture First Look at Kalman Filter describes the Kalman filter.
We’ll use limiting versions of the Kalman filter corresponding to what are called stationary
values in that lecture.
Suppose that an observable 𝑦𝑡 is the sum of an unobserved random walk 𝑥𝑡 and an IID shock
𝜖2,𝑡 :
$$
\begin{aligned}
x_{t+1} &= x_t + \sigma_x \epsilon_{1,t+1} \\
y_t &= x_t + \sigma_y \epsilon_{2,t}
\end{aligned}
\qquad (2)
$$

where

$$\begin{bmatrix} \epsilon_{1,t+1} \\ \epsilon_{2,t} \end{bmatrix} \sim \mathcal{N}(0, I)$$
is an IID process.
Note: A property of the state-space representation (2) is that in general neither 𝜖1,𝑡 nor 𝜖2,𝑡
is in the space spanned by square-summable linear combinations of 𝑦𝑡 , 𝑦𝑡−1 , ….
In general $\begin{bmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{bmatrix}$ has more information about future $y_{t+j}$'s than is contained in $y_t, y_{t-1}, \ldots$.
We can use the asymptotic or stationary values of the Kalman gain and the one-step-ahead
conditional state covariance matrix to compute a time-invariant innovations representation
$$
\begin{aligned}
\hat{x}_{t+1} &= \hat{x}_t + K a_t \\
y_t &= \hat{x}_t + a_t
\end{aligned}
\qquad (3)
$$
where 𝑥𝑡̂ = 𝐸[𝑥𝑡 |𝑦𝑡−1 , 𝑦𝑡−2 , …] and 𝑎𝑡 = 𝑦𝑡 − 𝐸[𝑦𝑡 |𝑦𝑡−1 , 𝑦𝑡−2 , …].
Note: A key property about an innovations representation is that 𝑎𝑡 is in the space spanned
by square summable linear combinations of 𝑦𝑡 , 𝑦𝑡−1 , ….
For more ramifications of this property, see the lectures Shock Non-Invertibility and Recursive
Models of Dynamic Linear Economies.
Later we’ll stack these state-space systems (2) and (3) to display some classic findings of
Muth.
But first, let’s create an instance of the state-space system (2) then apply the quantecon
Kalman class, then uses it to construct the associated “innovations representation”
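The original code cell is missing from this excerpt; a minimal sketch, with the parameter values σ_x = σ_y = 1 as assumptions, is:

import numpy as np
import quantecon as qe

σ_x, σ_y = 1.0, 1.0                     # Assumed shock standard deviations

A, C = np.array([[1.0]]), np.array([[σ_x]])
G, H = np.array([[1.0]]), np.array([[σ_y]])

ss = qe.LinearStateSpace(A, C, G, H)
kalman = qe.Kalman(ss)
Σ_infinity, K_infinity = kalman.stationary_values()
K1 = float(K_infinity)                  # Stationary Kalman gain, used as K1 below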
Now we want to map the time-invariant innovations representation (3) and the original state-
space system (2) into a convenient form for deducing the impulse responses from the original
shocks to the 𝑥𝑡 and 𝑥𝑡̂ .
Putting both of these representations into a single state-space system is yet another applica-
tion of the insight that “finding the state is an art”.
We’ll define a state vector and appropriate state-space matrices that allow us to represent
both systems in one fell swoop.
Note that
$$a_t = x_t + \sigma_y \epsilon_{2,t} - \hat{x}_t$$

so that

$$
\begin{aligned}
\hat{x}_{t+1} &= \hat{x}_t + K (x_t + \sigma_y \epsilon_{2,t} - \hat{x}_t) \\
&= (1 - K) \hat{x}_t + K x_t + K \sigma_y \epsilon_{2,t}
\end{aligned}
$$
$$
\begin{bmatrix} x_{t+1} \\ \hat{x}_{t+1} \\ \epsilon_{2,t+1} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ K & (1 - K) & K \sigma_y \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_t \\ \hat{x}_t \\ \epsilon_{2,t} \end{bmatrix}
+
\begin{bmatrix} \sigma_x & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \epsilon_{1,t+1} \\ \epsilon_{2,t+1} \end{bmatrix}
$$

$$
\begin{bmatrix} y_t \\ a_t \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & \sigma_y \\ 1 & -1 & \sigma_y \end{bmatrix}
\begin{bmatrix} x_t \\ \hat{x}_t \\ \epsilon_{2,t} \end{bmatrix}
$$

is a state-space system that tells us how the shocks $\begin{bmatrix} \epsilon_{1,t+1} \\ \epsilon_{2,t+1} \end{bmatrix}$ affect the states $\hat{x}_{t+1}, x_t$, the observable $y_t$, and the innovation $a_t$.
With this tool at our disposal, let’s form the composite system and simulate it
In [4]: # Create grand state-space for y_t, a_t as observed vars -- Use
# stacking trick above
Af = np.array([[ 1, 0, 0],
[K1, 1 - K1, K1 * σ_y],
[ 0, 0, 0]])
Cf = np.array([[σ_x, 0],
[ 0, K1 * σ_y],
[ 0, 1]])
Gf = np.array([[1, 0, σ_y],
[1, -1, σ_y]])
Now that we have simulated our joint system, we have 𝑥𝑡 , 𝑥𝑡̂ , and 𝑦𝑡 .
We can now investigate how these variables are related by plotting some key objects.
First, let’s plot the hidden state 𝑥𝑡 and the filtered version 𝑥𝑡̂ that is linear-least squares pro-
jection of 𝑥𝑡 on the history 𝑦𝑡−1 , 𝑦𝑡−2 , …
We see above that 𝑦 seems to look like white noise around the values of 𝑥.
34.2.5 Innovations
Recall that we wrote down the innovation representation that depended on 𝑎𝑡 . We now plot
the innovations {𝑎𝑡 }:
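The innovations plot cell was lost in extraction, as was the lead-in to the next cell, which displays the moving average and autoregressive representations implied by the stationary gain K1. A sketch of how the plotted arrays could be generated (array names assumed to match the plotting cell below) is:

# MA coefficients of y_t in terms of innovations a_t:
#   y_t = a_t + K1*(a_{t-1} + a_{t-2} + ...), by the innovations representation (3)
# AR coefficients of y_t on lagged y's: K1*(1-K1)^j, a geometric decay
J = 20                                   # number of lags to display (assumed)
coefs_ma_array = np.full(J, K1)
coefs_ma_array[0] = 1.0
coefs_var_array = K1 * (1 - K1) ** np.arange(J)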
fig, ax = plt.subplots(2)
ax[0].plot(coefs_ma_array, label="MA")
ax[0].legend()
ax[1].plot(coefs_var_array, label="VAR")
ax[1].legend()
plt.show()
The moving average coefficients in the top panel show tell-tale signs of 𝑦𝑡 being a process
whose first difference is a first-order autoregression.
The autoregressive coefficients decline geometrically with decay rate (1 − 𝐾).
These are exactly the target outcomes that Muth (1960) aimed to reverse engineer.
Dynamic Programming
Chapter 35
Shortest Paths
35.1 Contents
• Overview 35.2
• Outline of the Problem 35.3
• Finding Least-Cost Paths 35.4
• Solving for Minimum Cost-to-Go 35.5
• Exercises 35.6
• Solutions 35.7
35.2 Overview
The shortest path problem is a classic problem in mathematics and computer science with
applications in
• Economics (sequential decision making, analysis of social networks, etc.)
• Operations research and transportation
• Robotics and artificial intelligence
• Telecommunication network design and routing
• etc., etc.
Variations of the methods we discuss in this lecture are used millions of times every day, in
applications such as
• Google Maps
• routing packets on the internet
For us, the shortest path problem also provides a nice introduction to the logic of dynamic
programming.
Dynamic programming is an extremely powerful optimization technique that we apply in
many lectures on this site.
The only scientific library we’ll need in what follows is NumPy:
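For completeness, the import this refers to (the cell was lost in extraction):

import numpy as np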
35.3 Outline of the Problem
The shortest path problem is one of finding how to traverse a graph from one specified node
to another at minimum cost.
Consider the following graph

[Figure omitted in this extraction: nodes A through G joined by arrows, with a travel cost marked on each arrow.]

One way to travel from A to G is the path

• A, D, F, G at cost 8

35.4 Finding Least-Cost Paths

The best path can be found as follows:

1. Start at node $v = A$
2. From the current node $v$, move to any node that solves

$$\min_{w \in F_v} \{ c(v, w) + J(w) \}$$
where

• $F_v$ is the set of nodes that can be reached from $v$ in one step,
• $c(v, w)$ is the cost of traveling from $v$ to $w$, and
• $J(w)$ is the cost-to-go from $w$, i.e., the total cost of traveling from $w$ to the destination along the least-cost path.
Hence, if we know the function 𝐽 , then finding the best path is almost trivial.
But how can we find the cost-to-go function 𝐽 ?
Some thought will convince you that, for every node $v$, the function $J$ satisfies

$$J(v) = \min_{w \in F_v} \{ c(v, w) + J(w) \}$$
This is known as the Bellman equation, after the mathematician Richard Bellman.
The Bellman equation can be thought of as a restriction that 𝐽 must satisfy.
What we want to do now is use this restriction to compute 𝐽 .
35.5 Solving for Minimum Cost-to-Go

Let's look at an algorithm for computing $J$ and then think about how to implement it.

35.5.1 The Algorithm

The standard algorithm for finding $J$ is to start with an initial guess and then iterate.
This is a standard approach to solving nonlinear equations, often called the method of suc-
cessive approximations.
Our initial guess will be

$$J_0(v) = 0 \text{ for all } v$$

Now

1. Set $n = 0$
2. Set $J_{n+1}(v) = \min_{w \in F_v} \{ c(v, w) + J_n(w) \}$ for all $v$
3. If $J_{n+1}$ and $J_n$ are not equal, then increment $n$ and go to step 2
35.5.2 Implementation
Having an algorithm is a good start, but we also need to think about how to implement it on
a computer.
First, for the cost function 𝑐, we’ll implement it as a matrix 𝑄, where a typical element is
𝑐(𝑣, 𝑤) if 𝑤 ∈ 𝐹𝑣
𝑄(𝑣, 𝑤) = {
+∞ otherwise
Notice that the cost of staying still (on the principle diagonal) is set to
# (Setup and loop reconstructed; only the two counter lines survived extraction.
#  Q is the distance matrix for the small graph above.)
num_nodes = Q.shape[0]
J = np.zeros(num_nodes)       # initial guess
next_J = np.empty(num_nodes)  # stores updated guess

max_iter = 500
i = 0

while i < max_iter:
    for v in range(num_nodes):
        # minimize Q[v, w] + J[w] over all choices of next node w
        next_J[v] = np.min(Q[v, :] + J)
    if np.equal(next_J, J).all():
        break
    J[:] = next_J
    i += 1

print("The cost-to-go function is", J)
35.6 Exercises
35.6.1 Exercise 1

The graph data written to the file graph.txt below describes a large weighted directed graph; use the algorithm above to find the least-cost path from node 0 to node 99.

Note: You will be dealing with floating point numbers now, rather than integers, so consider replacing np.equal() with np.allclose().

Overwriting graph.txt
35.7 Solutions
35.7.1 Exercise 1
First let's write a function that reads in the graph data above and builds a distance matrix.

num_nodes = 100
destination_node = 99   # (assumed: the last node in graph.txt)

def map_graph_to_distance_matrix(in_file):

    # Start with np.inf for every pair (no direct arc)
    Q = np.full((num_nodes, num_nodes), np.inf)

    infile = open(in_file)
    # ... (parsing of the node/edge lines was lost in extraction; each line of
    # graph.txt lists a node and its out-edges with costs, written into Q) ...
    infile.close()
    return Q
1. a "Bellman operator" function that takes a distance matrix and current guess of J and returns an updated guess of J, and
2. a function that iterates with this operator until successive guesses converge.
def compute_cost_to_go(Q):
    num_nodes = Q.shape[0]
    J = np.zeros(num_nodes)      # Initial guess
    next_J = np.empty(num_nodes) # Stores updated guess

    max_iter = 500
    i = 0

    while i < max_iter:
        for v in range(num_nodes):
            next_J[v] = np.min(Q[v, :] + J)   # Bellman update
        if np.allclose(next_J, J):
            break
        J[:] = next_J
        i += 1

    return J
We used np.allclose() rather than testing exact equality because we are dealing with floating
point numbers now.
Finally, here's a function that uses the cost-to-go function to obtain the optimal path (and its cost).

def print_best_path(J, Q):
    # (Body reconstructed; only the final two print lines survived extraction.)
    sum_costs = 0
    current_node = 0
    destination_node = Q.shape[0] - 1   # assumed: last node is the destination
    while current_node != destination_node:
        print(current_node)
        # Move to the next node and increment costs
        next_node = np.argmin(Q[current_node, :] + J)
        sum_costs += Q[current_node, next_node]
        current_node = next_node

    print(destination_node)
    print('Cost: ', sum_costs)
Okay, now that we have the necessary functions, let's call them to do the job we were assigned.
In [8]: Q = map_graph_to_distance_matrix('graph.txt')
J = compute_cost_to_go(Q)
print_best_path(J, Q)
0
8
11
18
23
33
41
53
56
57
60
67
70
73
76
85
87
88
93
94
96
97
98
99
Cost: 160.55000000000007
Chapter 36

Job Search I: The McCall Search Model
36.1 Contents
• Overview 36.2
• The McCall Model 36.3
• Computing the Optimal Policy: Take 1 36.4
• Computing the Optimal Policy: Take 2 36.5
• Exercises 36.6
• Solutions 36.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
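The install command itself was lost in extraction; presumably the standard one for these lectures:

!pip install --upgrade quantecon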
36.2 Overview
The McCall search model [116] helped transform economists’ way of thinking about labor
markets.
To clarify vague notions such as “involuntary” unemployment, McCall modeled the decision
problem of unemployed agents directly, in terms of factors such as
• current and likely future wages
• impatience
• unemployment compensation
To solve the decision problem he used dynamic programming.
Here we set up McCall’s model and adopt the same solution method.
As we’ll see, McCall’s model is not only interesting in its own right but also an excellent vehi-
cle for learning dynamic programming.
Let’s start with some imports:
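The import cell was lost in extraction; based on the names used in the code below, it would have included at least:

import numpy as np
import matplotlib.pyplot as plt
import quantecon as qe
from numba import jit, jitclass, float64
from quantecon.distributions import BetaBinomial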
36.3 The McCall Model

An unemployed agent receives in each period a job offer at wage

$$w_t = w(s_t) \quad \text{where } s_t \in \mathbb{S}$$
Here you should think of state process {𝑠𝑡 } as some underlying, unspecified random factor
that impacts on wages.
(Introducing an exogenous stochastic state process is a standard way for economists to inject
randomness into their models.)
In this lecture, we adopt the following simple environment:
• {𝑠𝑡 } is IID, with 𝑞(𝑠) being the probability of observing state 𝑠 in 𝕊 at each point in
time, and
• the agent observes 𝑠𝑡 at the start of 𝑡 and hence knows 𝑤𝑡 = 𝑤(𝑠𝑡 ),
• the set 𝕊 is finite.
(In later lectures, we will relax all of these assumptions.)
At time $t$, our agent has two choices:

1. Accept the offer and work permanently at constant wage $w_t$.
2. Reject the offer, receive unemployment compensation $c$, and reconsider next period.
The agent is infinitely lived and aims to maximize the expected discounted sum of earnings
$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t y_t$$
36.3.1 A Trade-Off

In order to optimally trade off current and future rewards, we need to think about two things:

1. the current payoffs we get from different choices
2. the different states that those choices will lead to next period (in this case, either employment or unemployment)
To weigh these two aspects of the decision problem, we need to assign values to states.
To this end, let 𝑣∗ (𝑠) be the total lifetime value accruing to an unemployed worker who enters
the current period unemployed when the state is 𝑠 ∈ 𝕊.
In particular, the agent has wage offer 𝑤(𝑠) in hand.
More precisely, 𝑣∗ (𝑠) denotes the value of the objective function (1) when an agent in this sit-
uation makes optimal decisions now and at all future points in time.
Of course 𝑣∗ (𝑠) is not trivial to calculate because we don’t yet know what decisions are opti-
mal and what aren’t!
But think of 𝑣∗ as a function that assigns to each possible state 𝑠 the maximal lifetime value
that can be obtained with that offer in hand.
A crucial observation is that this function 𝑣∗ must satisfy the recursion
$$v^*(s) = \max \left\{ \frac{w(s)}{1-\beta}, \; c + \beta \sum_{s' \in \mathbb{S}} v^*(s') q(s') \right\} \tag{1}$$
This important equation is a version of the Bellman equation, which is ubiquitous in eco-
nomic dynamics and other fields involving planning over time.
The intuition behind it is as follows:
• the first term inside the max operation is the lifetime payoff from accepting the current offer, since

$$\frac{w(s)}{1-\beta} = w(s) + \beta w(s) + \beta^2 w(s) + \cdots$$
• the second term inside the max operation is the continuation value, which is the life-
time payoff from rejecting the current offer and then behaving optimally in all subse-
quent periods
If we optimize and pick the best of these two options, we obtain maximal lifetime value from
today, given current state 𝑠.
But this is precisely 𝑣∗ (𝑠), which is the l.h.s. of (1).
Suppose for now that we are able to solve (1) for the unknown function 𝑣∗ .
Once we have this function in hand we can behave optimally (i.e., make the right choice be-
tween accept and reject).
All we have to do is select the maximal choice on the r.h.s. of (1).
The optimal action is best thought of as a policy, which is, in general, a map from states to
actions.
Given any 𝑠, we can read off the corresponding best choice (accept or reject) by picking the
max on the r.h.s. of (1).
Thus, we have a map from ℝ to {0, 1}, with 1 meaning accept and 0 meaning reject.
We can write the policy as follows

$$\sigma(s) := \mathbf{1} \left\{ \frac{w(s)}{1-\beta} \ge c + \beta \sum_{s' \in \mathbb{S}} v^*(s') q(s') \right\}$$

Here $\mathbf{1}\{P\} = 1$ if statement $P$ is true and equals 0 otherwise.

We can also write this as

$$\sigma(s) := \mathbf{1}\{w(s) \ge \bar{w}\}$$

where

$$\bar{w} := (1-\beta) \left\{ c + \beta \sum_{s' \in \mathbb{S}} v^*(s') q(s') \right\} \tag{2}$$

Here $\bar{w}$ (called the reservation wage) is a constant depending on $\beta$, $c$ and the wage distribution.
The agent should accept if and only if the current wage offer exceeds the reservation wage.
In view of (2), we can compute this reservation wage if we can compute the value function.
To put the above ideas into action, we need to compute the value function at each possible
state 𝑠 ∈ 𝕊.
Let’s suppose that 𝕊 = {1, … , 𝑛}.
The value function is then represented by the vector 𝑣∗ = (𝑣∗ (𝑖))𝑛𝑖=1 .
In view of (1), this vector satisfies the nonlinear system of equations
$$v^*(i) = \max \left\{ \frac{w(i)}{1-\beta}, \; c + \beta \sum_{1 \le j \le n} v^*(j) q(j) \right\} \quad \text{for } i = 1, \ldots, n \tag{3}$$
Step 1: pick an arbitrary initial guess $v \in \mathbb{R}^n$.

Step 2: compute a new vector $v' \in \mathbb{R}^n$ via

$$v'(i) = \max \left\{ \frac{w(i)}{1-\beta}, \; c + \beta \sum_{1 \le j \le n} v(j) q(j) \right\} \quad \text{for } i = 1, \ldots, n \tag{4}$$
Step 3: calculate a measure of the deviation between $v$ and $v'$, such as $\max_i |v(i) - v'(i)|$.
Step 4: if the deviation is larger than some fixed tolerance, set 𝑣 = 𝑣′ and go to step 2, else
continue.
Step 5: return 𝑣.
Let $\{v_k\}$ denote the sequence generated by this algorithm.
This sequence converges to the solution to (3) as 𝑘 → ∞, which is the value function 𝑣∗ .
The iteration can be understood via the operator $T$ mapping $\mathbb{R}^n$ into itself, defined by

$$(Tv)(i) = \max \left\{ \frac{w(i)}{1-\beta}, \; c + \beta \sum_{1 \le j \le n} v(j) q(j) \right\} \quad \text{for } i = 1, \ldots, n \tag{5}$$
(A new vector $Tv$ is obtained from a given vector $v$ by evaluating the r.h.s. at each $i$.)
The element 𝑣𝑘 in the sequence {𝑣𝑘 } of successive approximations corresponds to 𝑇 𝑘 𝑣.
• This is 𝑇 applied 𝑘 times, starting at the initial guess 𝑣
One can show that the conditions of the Banach fixed point theorem are satisfied by 𝑇 on ℝ𝑛 .
One implication is that 𝑇 has a unique fixed point in ℝ𝑛 .
• That is, a unique vector $\bar{v}$ such that $T\bar{v} = \bar{v}$.
Moreover, it’s immediate from the definition of 𝑇 that this fixed point is 𝑣∗ .
A second implication of the Banach contraction mapping theorem is that {𝑇 𝑘 𝑣} converges to
the fixed point 𝑣∗ regardless of 𝑣.
36.4.3 Implementation
Our default for 𝑞, the distribution of the state process, will be Beta-binomial.
In [5]: # (Cell reconstructed; parameter values assumed to match the results below.)
        n, a, b = 50, 200, 100                       # default parameters
        q_default = BetaBinomial(n, a, b).pdf()      # default choice of q

        w_min, w_max = 10, 60
        w_default = np.linspace(w_min, w_max, n+1)   # default wage grid

        fig, ax = plt.subplots()
        ax.plot(w_default, q_default, '-o', label='$q(w(i))$')
        ax.set_xlabel('wages')
        ax.set_ylabel('probabilities')
        plt.show()
In [6]: mccall_data = [
('c', float64), # unemployment compensation
('β', float64), # discount factor
('w', float64[:]), # array of wage values, w[i] = wage at state i
('q', float64[:]) # array of probabilities
]
Here’s a class that stores the data and computes the value on the right hand side of the Bell-
man equation (4).
Default parameter values are embedded in the class.
In [7]: @jitclass(mccall_data)
        class McCallModel:

            def __init__(self, c=25, β=0.99):   # defaults assumed, matching the exercises
                self.c, self.β = c, β
                self.w, self.q = w_default, q_default

            def bellman(self, i, v):
                # Right hand side of the Bellman equation (4) at state i, given guess v
                return max(self.w[i] / (1 - self.β),
                           self.c + self.β * np.sum(v * self.q))
Based on these defaults, let’s try plotting the first few approximate value functions in the se-
quence {𝑇 𝑘 𝑣}.
We will start from guess 𝑣 given by 𝑣(𝑖) = 𝑤(𝑖)/(1 − 𝛽), which is the value of accepting at
every given wage.
Here’s a function to implement this:
"""
n = len(mcm.w)
v = mcm.w / (1 - mcm.β)
v_next = np.empty_like(v)
624 CHAPTER 36. JOB SEARCH I: THE MCCALL SEARCH MODEL
for i in range(num_plots):
ax.plot(mcm.w, v, '-', alpha=0.4, label=f"iterate {i}")
# Update guess
for i in range(n):
v_next[i] = mcm.bellman(i, v)
v[:] = v_next # copy contents into v
ax.legend(loc='lower right')
fig, ax = plt.subplots()
ax.set_xlabel('wage')
ax.set_ylabel('value')
plot_value_function_seq(mcm, ax)
plt.show()
You can see that convergence is occurring: successive iterates are getting closer together.
Here’s a more serious iteration effort to compute the limit, which continues until measured
deviation between successive iterates is below tol.
Once we obtain a good approximation to the limit, we will use it to calculate the reservation
wage.
We’ll be using JIT compilation via Numba to turbocharge our loops.
In [10]: @jit(nopython=True)
         def compute_reservation_wage(mcm,
                                      max_iter=500,
                                      tol=1e-6):

             # Simplify names
             c, β, w, q = mcm.c, mcm.β, mcm.w, mcm.q

             n = len(w)
             v = w / (1 - β)           # initial guess
             v_next = np.empty_like(v)
             i = 0
             error = tol + 1
             while i < max_iter and error > tol:
                 for j in range(n):
                     v_next[j] = mcm.bellman(j, v)
                 error = np.max(np.abs(v_next - v))
                 i += 1
                 v[:] = v_next

             # Reservation wage from the converged value function, as in (2)
             return (1 - β) * (c + β * np.sum(v * q))
The next line computes the reservation wage at the default parameters
In [11]: compute_reservation_wage(mcm)
Out[11]: 47.316499710024964
Now we know how to compute the reservation wage, let’s see how it varies with parameters.
In particular, let’s look at what happens when we change 𝛽 and 𝑐.
In [12]: grid_size = 25
         R = np.empty((grid_size, grid_size))

         c_vals = np.linspace(10.0, 30.0, grid_size)   # grid of c values (assumed)
         β_vals = np.linspace(0.9, 0.99, grid_size)    # grid of β values (assumed)

         for i, c in enumerate(c_vals):
             for j, β in enumerate(β_vals):
                 mcm = McCallModel(c=c, β=β)
                 R[i, j] = compute_reservation_wage(mcm)
fig, ax = plt.subplots(figsize=(10, 5.7))

cs1 = ax.contourf(c_vals, β_vals, R.T, alpha=0.75)
ctr1 = ax.contour(c_vals, β_vals, R.T)

plt.clabel(ctr1, inline=1, fontsize=13)
plt.colorbar(cs1, ax=ax)

ax.set_title("reservation wage")
ax.set_xlabel("$c$", fontsize=16)
ax.set_ylabel("$β$", fontsize=16)

ax.ticklabel_format(useOffset=False)
plt.show()
As expected, the reservation wage increases both with patience and with unemployment com-
pensation.
36.5 Computing the Optimal Policy: Take 2

The approach to dynamic programming just described is very standard and broadly applicable.
For this particular problem, there’s also an easier way, which circumvents the need to com-
pute the value function.
Let $h$ denote the continuation value:

$$h = c + \beta \sum_{s' \in \mathbb{S}} v^*(s') q(s') \tag{6}$$

The Bellman equation can now be written as

$$v^*(s') = \max \left\{ \frac{w(s')}{1-\beta}, \; h \right\}$$

Substituting this last equation into (6) gives

$$h = c + \beta \sum_{s' \in \mathbb{S}} \max \left\{ \frac{w(s')}{1-\beta}, \; h \right\} q(s') \tag{7}$$

This is a nonlinear equation in the single scalar $h$, which we can solve by iterating on

$$h' = c + \beta \sum_{s' \in \mathbb{S}} \max \left\{ \frac{w(s')}{1-\beta}, \; h \right\} q(s') \tag{8}$$

Once the iterates converge, the reservation wage is recovered as $\bar{w} = (1-\beta)h$, which is exactly what the return line of the code below computes.
In [14]: @jit(nopython=True)
def compute_reservation_wage_two(mcm,
max_iter=500,
tol=1e-5):
# Simplify names
c, β, w, q = mcm.c, mcm.β, mcm.w, mcm.q
# == First compute h == #
h = np.sum(w * q) / (1 - β)
i = 0
error = tol + 1
while i < max_iter and error > tol:
s = np.maximum(w / (1 - β), h)
h_next = c + β * np.sum(s * q)
error = np.abs(h_next - h)
i += 1
h = h_next
return (1 - β) * h
36.6 Exercises
36.6.1 Exercise 1
Compute the average duration of unemployment when $\beta = 0.99$ and $c$ takes the values in c_vals (the grid defined in the solution below).
That is, start the agent off as unemployed, compute their reservation wage given the parame-
ters, and then simulate to see how long it takes to accept.
Repeat a large number of times and take the average.
Plot mean unemployment duration as a function of 𝑐 in c_vals.
36.6.2 Exercise 2
The purpose of this exercise is to show how to replace the discrete wage offer distribution
used above with a continuous distribution.
This is a significant topic because many convenient distributions are continuous (i.e., have a
density).
Fortunately, the theory changes little in our simple model.
Recall that ℎ in (6) denotes the value of not accepting a job in this period but then behaving
optimally in all subsequent periods:
To shift to a continuous offer distribution, we can replace (6) by
$$h = c + \beta \int \max \left\{ \frac{w(s')}{1-\beta}, \; h \right\} q(s') \, ds' \tag{10}$$
The aim is to solve this nonlinear equation by iteration, and from it obtain the reservation
wage.
Try to carry this out, setting
• the state sequence {𝑠𝑡 } to be IID and standard normal and
• the wage function to be 𝑤(𝑠) = exp(𝜇 + 𝜎𝑠).
You will need to implement a new version of the McCallModel class that assumes a lognor-
mal wage distribution.
Calculate the integral by Monte Carlo, by averaging over a large number of wage draws.
For default parameters, use c=25, β=0.99, σ=0.5, μ=2.5.
Once your code is working, investigate how the reservation wage changes with 𝑐 and 𝛽.
36.7 Solutions
36.7.1 Exercise 1
In [15]: cdf = np.cumsum(q_default)   # (line reconstructed; cdf is used below)

         @jit(nopython=True)
         def compute_stopping_time(w_bar, seed=1234):
np.random.seed(seed)
t = 1
while True:
# Generate a wage draw
w = w_default[qe.random.draw(cdf)]
# Stop when the draw is above the reservation wage
if w >= w_bar:
stopping_time = t
break
else:
t += 1
return stopping_time
@jit(nopython=True)
def compute_mean_stopping_time(w_bar, num_reps=100000):
obs = np.empty(num_reps)
for i in range(num_reps):
obs[i] = compute_stopping_time(w_bar, seed=i)
return obs.mean()
c_vals = np.linspace(10, 40, 25)   # grid of c values (assumed)
stop_times = np.empty_like(c_vals)
for i, c in enumerate(c_vals):
    mcm = McCallModel(c=c)
    w_bar = compute_reservation_wage_two(mcm)
    stop_times[i] = compute_mean_stopping_time(w_bar)

fig, ax = plt.subplots()
ax.plot(c_vals, stop_times, label="mean unemployment duration")
ax.set(xlabel="unemployment compensation", ylabel="months")
ax.legend()
plt.show()
36.7.2 Exercise 2
In [16]: mccall_data_continuous = [
('c', float64), # unemployment compensation
('β', float64), # discount factor
('σ', float64), # scale parameter in lognormal distribution
('μ', float64), # location parameter in lognormal distribution
('w_draws', float64[:]) # draws of wages for Monte Carlo
]
@jitclass(mccall_data_continuous)
class McCallModelContinuous:

    def __init__(self, c=25, β=0.99, σ=0.5, μ=2.5, mc_size=1000):
        self.c, self.β, self.σ, self.μ = c, β, σ, μ

        # Draw and store shocks for the Monte Carlo integral
        np.random.seed(1234)
        s = np.random.randn(mc_size)
        self.w_draws = np.exp(μ + σ * s)

@jit(nopython=True)
def compute_reservation_wage_continuous(mcmc, max_iter=500, tol=1e-5):

    c, β, σ, μ, w_draws = mcmc.c, mcmc.β, mcmc.σ, mcmc.μ, mcmc.w_draws

    h = np.mean(w_draws) / (1 - β)   # initial guess
    i = 0
    error = tol + 1
    while i < max_iter and error > tol:
        integral = np.mean(np.maximum(w_draws / (1 - β), h))
        h_next = c + β * integral

        error = np.abs(h_next - h)
        i += 1

        h = h_next

    return (1 - β) * h
In [17]: grid_size = 25
         R = np.empty((grid_size, grid_size))

         c_vals = np.linspace(10.0, 30.0, grid_size)   # grid of c values (assumed)
         β_vals = np.linspace(0.9, 0.99, grid_size)    # grid of β values (assumed)

         for i, c in enumerate(c_vals):
             for j, β in enumerate(β_vals):
                 mcmc = McCallModelContinuous(c=c, β=β)
                 R[i, j] = compute_reservation_wage_continuous(mcmc)

         fig, ax = plt.subplots(figsize=(10, 5.7))

         cs1 = ax.contourf(c_vals, β_vals, R.T, alpha=0.75)
         ctr1 = ax.contour(c_vals, β_vals, R.T)

         plt.clabel(ctr1, inline=1, fontsize=13)
         plt.colorbar(cs1, ax=ax)

         ax.set_title("reservation wage")
         ax.set_xlabel("$c$", fontsize=16)
         ax.set_ylabel("$β$", fontsize=16)

         ax.ticklabel_format(useOffset=False)
         plt.show()
Chapter 37

Job Search II: Search and Separation
37.1 Contents
• Overview 37.2
• The Model 37.3
• Solving the Model 37.4
• Implementation 37.5
• Impact of Parameters 37.6
• Exercises 37.7
• Solutions 37.8
In addition to what’s in Anaconda, this lecture will need the following libraries:
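The install command itself was lost in extraction; presumably the standard one:

!pip install --upgrade quantecon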
37.2 Overview
Previously we looked at the McCall job search model [116] as a way of understanding unem-
ployment and worker decisions.
One unrealistic feature of the model is that every job is permanent.
In this lecture, we extend the McCall model by introducing job separation.
Once separation enters the picture, the agent comes to view
• the loss of a job as a capital loss, and
• a spell of unemployment as an investment in searching for an acceptable job
The other minor addition is that a utility function will be included to make worker prefer-
ences slightly more sophisticated.
We’ll need the following imports
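The import cell was lost in extraction; based on the names used below, it would have included at least:

import numpy as np
import matplotlib.pyplot as plt
from numba import njit, jitclass, float64
from quantecon.distributions import BetaBinomial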
37.3 The Model

The agent aims to maximize the expected discounted sum

$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(y_t) \tag{1}$$
At this stage the only difference from the baseline model is that we’ve added some flexibility
to preferences by introducing a utility function 𝑢.
It satisfies 𝑢′ > 0 and 𝑢″ < 0.
For now we will drop the separation between the state process and the wage process that we maintained for the baseline model.
In particular, we simply suppose that wage offers {𝑤𝑡 } are IID with common distribution 𝑞.
The set of possible wage values is denoted by 𝕎.
(Later we will go back to having a separate state process {𝑠𝑡 } driving random outcomes, since
this formulation is usually convenient in more sophisticated models.)
If currently unemployed, the worker either accepts or rejects the current offer 𝑤𝑡 .
If he accepts, then he begins work immediately at wage 𝑤𝑡 .
If he rejects, then he receives unemployment compensation 𝑐.
The process then repeats.
(Note: we do not allow for job search while employed—this topic is taken up in a later lec-
ture)
We drop time subscripts in what follows and primes denote next period values.
Let
• 𝑣(𝑤𝑒 ) be total lifetime value accruing to a worker who enters the current period em-
ployed with existing wage 𝑤𝑒
• $h(w)$ be the total lifetime value accruing to a worker who enters the current period unemployed and receives wage offer $w$.
Here value means the value of the objective function (1) when the worker makes optimal deci-
sions at all future points in time.
Our first aim is to obtain these functions.
Suppose for now that the worker can calculate the functions 𝑣 and ℎ and use them in his de-
cision making.
Then $v$ and $h$ should satisfy

$$v(w_e) = u(w_e) + \beta \left[ (1-\alpha) v(w_e) + \alpha \sum_{w' \in \mathbb{W}} h(w') q(w') \right]$$

and

$$h(w) = \max \left\{ v(w), \; u(c) + \beta \sum_{w' \in \mathbb{W}} h(w') q(w') \right\}$$
Rather than jumping straight into solving these equations, let’s see if we can simplify them
somewhat.
(This process will be analogous to our second pass at the plain vanilla McCall model, where
we simplified the Bellman equation.)
First, let

$$d := \sum_{w' \in \mathbb{W}} h(w') q(w')$$

be the expected value of unemployment next period. Substituting, the two Bellman equations simplify to

$$v(w) = u(w) + \beta \left[ (1-\alpha) v(w) + \alpha d \right] \tag{5}$$

and

$$d = \sum_{w' \in \mathbb{W}} \max \left\{ v(w'), \; u(c) + \beta d \right\} q(w') \tag{6}$$
We’ll use the same iterative approach to solving the Bellman equations that we adopted in
the first job search lecture.
Here this amounts to

1. make guesses for the functions $v$ and the scalar $d$
2. plug these guesses into the right-hand sides of (5) and (6)
3. update the left-hand sides from this rule and then repeat
37.5 Implementation
In [3]: @njit
def u(c, σ=2.0):
return (c**(1 - σ) - 1) / (1 - σ)
Also, here’s a default wage distribution, based around the BetaBinomial distribution:
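The cell defining these defaults was lost in extraction; a sketch with assumed (but typical) values:

In [4]: n, a, b = 50, 200, 100                  # default parameters (assumed)
        q_default = BetaBinomial(n, a, b).pdf() # default choice of q
        w_default = np.linspace(10, 20, n+1)    # default wage grid (assumed range)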
Here’s our jitted class for the McCall model with separation.
In [5]: mccall_data = [
('α', float64), # job separation rate
('β', float64), # discount factor
('c', float64), # unemployment compensation
('w', float64[:]), # list of wage values
('q', float64[:]) # pmf of random variable w
]
@jitclass(mccall_data)
class McCallModel:
    """
    Stores the parameters and functions associated with a given model.
    """

    def __init__(self, α=0.2, β=0.98, c=6.0, w=w_default, q=q_default):
        # (defaults assumed; the __init__ and update bodies are reconstructed)
        self.α, self.β, self.c, self.w, self.q = α, β, c, w, q

    def update(self, v, d):
        " One update on the pair of Bellman equations (5)-(6). "
        α, β, c, w, q = self.α, self.β, self.c, self.w, self.q

        v_new = np.empty_like(v)
        for i in range(len(w)):
            v_new[i] = u(w[i]) + β * ((1 - α) * v[i] + α * d)

        d_new = np.sum(np.maximum(v, u(c) + β * d) * q)

        return v_new, d_new
Now we iterate until successive realizations are closer together than some small tolerance
level.
We then return the current iterate as an approximate solution.
In [6]: @njit
        def solve_model(mcm, tol=1e-5, max_iter=2000):
            """
            Iterates to convergence on the Bellman equations

            * mcm is an instance of McCallModel
            """
            v = np.ones_like(mcm.w)   # Initial guess of v
            d = 1                     # Initial guess of d
            i = 0
            error = tol + 1

            while error > tol and i < max_iter:
                v_new, d_new = mcm.update(v, d)
                error = max(np.max(np.abs(v_new - v)), np.abs(d_new - d))
                v = v_new
                d = d_new
                i += 1

            return v, d
In [7]: # (Cell reconstructed: plot v along with the value of unemployment h)
        mcm = McCallModel()
        v, d = solve_model(mcm)
        h = u(mcm.c) + mcm.β * d

        fig, ax = plt.subplots()
        ax.plot(mcm.w, v, 'b-', lw=2, alpha=0.7, label='$v$')
        ax.plot(mcm.w, [h] * len(mcm.w), 'g-', lw=2, alpha=0.7, label='$h$')
        ax.legend()
        plt.show()
The value 𝑣 is increasing because higher 𝑤 generates a higher wage flow conditional on stay-
ing employed.
In [8]: @njit
def compute_reservation_wage(mcm):
"""
Computes the reservation wage of an instance of the McCall model
by finding the smallest w such that v(w) >= h.

If no such w exists, then w_bar is set to np.inf.
"""
v, d = solve_model(mcm)
h = u(mcm.c) + mcm.β * d
w_bar = np.inf
for i, wage in enumerate(mcm.w):
if v[i] > h:
w_bar = wage
break
return w_bar
Next we will investigate how the reservation wage varies with parameters.
In each instance below, we’ll show you a figure and then ask you to reproduce it in the exer-
cises.
As expected, higher unemployment compensation causes the worker to hold out for higher
wages.
In effect, the cost of continuing job search is reduced.
Next, let's look at how $\bar{w}$ varies with the discount factor $\beta$. Again, the results are intuitive: more patient workers will hold out for higher wages.
Finally, let’s look at how 𝑤̄ varies with the job separation rate 𝛼.
Higher 𝛼 translates to a greater chance that a worker will face termination in each period
once employed.
37.7 Exercises
37.7.1 Exercise 1

Reproduce the reservation wage figures shown above, investigating how $\bar{w}$ varies with $c$, $\beta$ and $\alpha$. Use the following parameter grids:
In [9]: grid_size = 25
c_vals = np.linspace(2, 12, grid_size) # unemployment compensation
beta_vals = np.linspace(0.8, 0.99, grid_size) # discount factors
alpha_vals = np.linspace(0.05, 0.5, grid_size) # separation rate
37.8 Solutions
37.8.1 Exercise 1
In [10]: mcm = McCallModel()

         w_bar_vals = np.empty_like(c_vals)

         fig, ax = plt.subplots()

         for i, c in enumerate(c_vals):
             mcm.c = c
             w_bar = compute_reservation_wage(mcm)
             w_bar_vals[i] = w_bar
ax.set(xlabel='unemployment compensation',
ylabel='reservation wage')
ax.plot(c_vals, w_bar_vals, label=r'$\bar w$ as a function of $c$')
ax.legend()
plt.show()
fig, ax = plt.subplots()

for i, β in enumerate(beta_vals):
    mcm.β = β
    w_bar = compute_reservation_wage(mcm)
    w_bar_vals[i] = w_bar

ax.set(xlabel='discount factor', ylabel='reservation wage')
ax.plot(beta_vals, w_bar_vals, label=r'$\bar w$ as a function of $\beta$')
ax.legend()
plt.show()
fig, ax = plt.subplots()

for i, α in enumerate(alpha_vals):
    mcm.α = α
    w_bar = compute_reservation_wage(mcm)
    w_bar_vals[i] = w_bar

ax.set(xlabel='separation rate', ylabel='reservation wage')
ax.plot(alpha_vals, w_bar_vals, label=r'$\bar w$ as a function of $\alpha$')
ax.legend()
plt.show()
Chapter 38

Job Search III: Fitted Value Function Iteration
38.1 Contents
• Overview 38.2
• The Algorithm 38.3
• Implementation 38.4
• Exercises 38.5
• Solutions 38.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
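The install commands were lost in extraction; based on the imports used below, presumably:

!pip install --upgrade quantecon
!pip install interpolation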
38.2 Overview
In this lecture we again study the McCall job search model with separation, but now with a
continuous wage distribution.
While we already considered continuous wage distributions briefly in the exercises of the first
job search lecture, the change was relatively trivial in that case.
This is because we were able to reduce the problem to solving for a single scalar value (the
continuation value).
Here, with separation, the change is less trivial, since a continuous wage distribution leads to
an uncountably infinite state space.
The infinite state space leads to additional challenges, particularly when it comes to applying
value function iteration (VFI).
These challenges will lead us to modify VFI by adding an interpolation step.
The combination of VFI and this interpolation step is called fitted value function itera-
tion (fitted VFI).
Fitted VFI is very common in practice, so we will take some time to work through the de-
tails.
import numpy as np
import matplotlib.pyplot as plt
import quantecon as qe
from interpolation import interp
from numpy.random import randn
from numba import njit, jitclass, prange, float64, int32
38.3 The Algorithm

The model is the same as the McCall model with job separation we studied before, except
that the wage offer distribution is continuous.
We are going to start with the two Bellman equations we obtained for the model with job
separation after a simplifying transformation.
Modified to accommodate continuous wage draws, they take the following form:

$$v(w) = u(w) + \beta \left[ (1-\alpha) v(w) + \alpha d \right] \tag{1}$$

and

$$d = \int \max \left\{ v(w'), \; u(c) + \beta d \right\} q(w') \, dw' \tag{2}$$

The differences from the pair of Bellman equations we previously worked with are that

1. in (2), what used to be a sum over a finite number of wage values is an integral over an infinite set, and
2. the function $v$ in (1) is defined over all $w \in \mathbb{R}_+$.

The plan, as before, is to solve these equations by iteration:

1. begin with guesses $v$, $d$
2. plug $v$, $d$ into the right hand sides of (1)–(2) and compute the left hand sides to obtain updates $v'$, $d'$
3. unless some stopping condition is satisfied, set $(v, d) = (v', d')$ and go to step 2.
However, there is a problem we must confront before we implement this procedure: The iter-
ates of the value function can neither be calculated exactly nor stored on a computer.
To see the issue, consider (2).
Even if 𝑣 is a known function, the only way to store its update 𝑣′ is to record its value 𝑣′ (𝑤)
for every 𝑤 ∈ ℝ+ .
Clearly, this is impossible.
What we can do instead is use fitted value function iteration:

1. Begin with an array v representing the values of an initial guess of the value function on some grid points $\{w_i\}$.
2. Build a function $\hat{v}$ on the state space $\mathbb{R}_+$ by interpolating between these data points.
3. Obtain and record the samples of the updated function $v'(w_i)$ on each grid point $w_i$.
4. Unless some stopping condition is satisfied, take this as the new array and go to step 1.
Piecewise linear interpolation is a good default choice for the interpolation step because it

1. combines well with value function iteration (see, e.g., [64] or [148]) and
2. preserves useful shape properties such as monotonicity.

Linear interpolation in one dimension looks as follows (the test function and plotting lines are reconstructed; only fragments of this cell survived extraction):

In [3]: def f(x):
            y1 = 2 * np.cos(6 * x) + np.sin(14 * x)
            return y1 + 2.5

        c_grid = np.linspace(0, 1, 6)
        f_grid = np.linspace(0, 1, 150)

        def Af(x):
            return interp(c_grid, f(c_grid), x)

        fig, ax = plt.subplots()
        ax.plot(f_grid, f(f_grid), 'b-', label='true function')
        ax.plot(f_grid, Af(f_grid), 'g-', label='linear approximation')
        ax.legend(loc="upper center")
        plt.show()
38.4 Implementation
The first step is to build a jitted class for the McCall model with separation and a continuous
wage offer distribution.
We will take the utility function to be the log function for this application, with 𝑢(𝑐) = ln 𝑐.
We will adopt the lognormal distribution for wages, with 𝑤 = exp(𝜇 + 𝜎𝑧) when 𝑧 is standard
normal and 𝜇, 𝜎 are parameters.
In [4]: @njit
def lognormal_draws(n=1000, μ=2.5, σ=0.5, seed=1234):
np.random.seed(seed)
z = np.random.randn(n)
w_draws = np.exp(μ + σ * z)
return w_draws
In [5]: mccall_data_continuous = [
('c', float64), # unemployment compensation
('α', float64), # job separation rate
('β', float64), # discount factor
('σ', float64), # scale parameter in lognormal distribution
('μ', float64), # location parameter in lognormal distribution
('w_grid', float64[:]), # grid of points for fitted VFI
('w_draws', float64[:]) # draws of wages for Monte Carlo
]
@jitclass(mccall_data_continuous)
class McCallModelContinuous:

    def __init__(self,
                 c=1,
                 α=0.1,
                 β=0.96,
                 μ=2.5,          # lognormal location (default assumed)
                 σ=0.5,          # lognormal scale (default assumed)
                 grid_min=1e-10,
                 grid_max=5,
                 grid_size=100,
                 w_draws=lognormal_draws()):

        # (Body reconstructed: store primitives, wage grid and Monte Carlo draws)
        self.c, self.α, self.β = c, α, β
        self.σ, self.μ = σ, μ
        self.w_grid = np.linspace(grid_min, grid_max, grid_size)
        self.w_draws = w_draws

    def update(self, v, d):

        # Simplify names
        c, α, β, σ, μ = self.c, self.α, self.β, self.σ, self.μ
        w = self.w_grid
        u = lambda x: np.log(x)

        # Interpolate the array-represented value function
        vf = lambda x: interp(w, v, x)

        # Update d using Monte Carlo to evaluate the integral
        d_new = np.mean(np.maximum(vf(self.w_draws), u(c) + β * d))

        # Update v
        v_new = u(w) + β * ((1 - α) * v + α * d)

        return v_new, d_new
In [6]: @njit
        def solve_model(mcm, tol=1e-5, max_iter=2000):
            """
            Iterates to convergence on the Bellman equations

            * mcm is an instance of McCallModelContinuous
            """
            v = np.ones_like(mcm.w_grid)   # Initial guess of v
            d = 1                          # Initial guess of d
            i = 0
            error = tol + 1

            while error > tol and i < max_iter:
                v_new, d_new = mcm.update(v, d)
                error = max(np.max(np.abs(v_new - v)), np.abs(d_new - d))
                v, d = v_new, d_new
                i += 1

            return v, d
In [7]: @njit
def compute_reservation_wage(mcm):
"""
Computes the reservation wage of an instance of the McCall model
by finding the smallest w such that v(w) >= h.
v, d = solve_model(mcm)
h = u(mcm.c) + mcm.β * d
w_bar = np.inf
for i, wage in enumerate(mcm.w_grid):
if v[i] > h:
w_bar = wage
break
return w_bar
The exercises ask you to explore the solution and how it changes with parameters.
38.5 Exercises
38.5.1 Exercise 1
Use the code above to explore what happens to the reservation wage when the wage parame-
ter 𝜇 changes.
Use the default parameters and 𝜇 in mu_vals = np.linspace(0.0, 2.0, 15)
Is the impact on the reservation wage as you expected?
38.5.2 Exercise 2

Consider how the agent responds to an increase in volatility. Compute the reservation wage when the wage offer distribution is uniform on $(m - s, m + s)$ and $s$ varies, holding the mean $m$ fixed.
38.6 Solutions
38.6.1 Exercise 1

Here is one solution (the setup lines are reconstructed):

mcm = McCallModelContinuous()
mu_vals = np.linspace(0.0, 2.0, 15)
w_bar_vals = np.empty_like(mu_vals)

fig, ax = plt.subplots()

for i, m in enumerate(mu_vals):
    mcm.w_draws = lognormal_draws(μ=m)
    w_bar = compute_reservation_wage(mcm)
    w_bar_vals[i] = w_bar

ax.set(xlabel='$\mu$', ylabel='reservation wage')
ax.plot(mu_vals, w_bar_vals, label=r'$\bar w$ as a function of $\mu$')
ax.legend()
plt.show()
Not surprisingly, the agent is more inclined to wait when the distribution of offers shifts to
the right.
38.6.2 Exercise 2

Here is one solution (the mean m and the grid of spreads are assumed; the spread is kept below m so wages stay positive):

mcm = McCallModelContinuous()
m = 2.0
s_vals = np.linspace(1.0, 2.0, 15)
w_bar_vals = np.empty_like(s_vals)

fig, ax = plt.subplots()

for i, s in enumerate(s_vals):
    a, b = m - s, m + s
    mcm.w_draws = np.random.uniform(low=a, high=b, size=10_000)
    w_bar = compute_reservation_wage(mcm)
    w_bar_vals[i] = w_bar

ax.set(xlabel='volatility', ylabel='reservation wage')
ax.plot(s_vals, w_bar_vals, label=r'$\bar w$ as a function of wage volatility')
ax.legend()
plt.show()
Chapter 39

Job Search IV: Correlated Wage Offers
39.1 Contents
• Overview 39.2
• The Model 39.3
• Implementation 39.4
• Unemployment Duration 39.5
• Exercises 39.6
• Solutions 39.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
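The install commands were lost in extraction; based on the imports used below, presumably:

!pip install --upgrade quantecon
!pip install interpolation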
39.2 Overview
In this lecture we solve a McCall style job search model with persistent and transitory com-
ponents to wages.
In other words, we relax the unrealistic assumption that randomness in wages is independent
over time.
At the same time, we will go back to assuming that jobs are permanent and no separation
occurs.
This is to keep the model relatively simple as we study the impact of correlation.
We will use the following imports:

import numpy as np
import matplotlib.pyplot as plt
import quantecon as qe
from interpolation import interp
from numpy.random import randn
from numba import njit, jitclass, prange, float64
39.3 The Model

Wages at each point in time are given by

$$w_t = \exp(z_t) + y_t$$

where

$$y_t = \exp(\mu + s \zeta_t) \quad \text{and} \quad z_{t+1} = d + \rho z_t + \sigma \epsilon_{t+1}$$
Here {𝜁𝑡 } and {𝜖𝑡 } are both IID and standard normal.
Here {𝑦𝑡 } is a transitory component and {𝑧𝑡 } is persistent.
As before, the worker can either

1. accept an offer and work permanently at that wage, or
2. take unemployment compensation $c$ and wait till next period.

The value function satisfies the Bellman equation

$$v^*(w, z) = \max \left\{ \frac{u(w)}{1-\beta}, \; u(c) + \beta \, \mathbb{E}_z v^*(w', z') \right\}$$

In this expression, $u$ is a utility function and $\mathbb{E}_z$ is expectation of next period variables given current $z$.
The variable 𝑧 enters as a state in the Bellman equation because its current value helps pre-
dict future wages.
39.3.1 A Simplification
There is a way that we can reduce dimensionality in this problem, which greatly accelerates
computation.
To start, let $f^*$ be the continuation value function, defined by

$$f^*(z) := u(c) + \beta \, \mathbb{E}_z v^*(w', z')$$

The Bellman equation can now be written as

$$v^*(w, z) = \max \left\{ \frac{u(w)}{1-\beta}, \; f^*(z) \right\}$$

Combining the last two expressions, we see that the continuation value function satisfies

$$f^*(z) = u(c) + \beta \, \mathbb{E}_z \max \left\{ \frac{u(w')}{1-\beta}, \; f^*(z') \right\}$$

We'll solve this functional equation for $f^*$ by introducing the operator

$$Qf(z) = u(c) + \beta \, \mathbb{E}_z \max \left\{ \frac{u(w')}{1-\beta}, \; f(z') \right\}$$

By construction, $f^*$ is a fixed point of $Q$, and it can be computed by iterating with $Q$ from a reasonable initial condition.

Once we have $f^*$, the worker optimally accepts the current offer when

$$\frac{u(w)}{1-\beta} \ge f^*(z)$$

For utility $u(w) = \ln w$, as used in the code below, this acceptance rule becomes $w \ge \bar{w}(z)$, where the reservation wage is

$$\bar{w}(z) := \exp(f^*(z)(1-\beta)) \tag{1}$$
Our main aim is to solve for the reservation rule and study its properties and implications.
39.4 Implementation
In [3]: job_search_data = [
('μ', float64), # transient shock log mean
('s', float64), # transient shock log variance
('d', float64), # shift coefficient of persistent state
('ρ', float64), # correlation coefficient of persistent state
('σ', float64), # state volatility
('β', float64), # discount factor
('c', float64), # unemployment compensation
('z_grid', float64[:]), # grid over the state space
('e_draws', float64[:,:]) # Monte Carlo draws for integration
]
Here’s a class that stores the data and the right hand side of the Bellman equation.
In [4]: @jitclass(job_search_data)
class JobSearch:
def __init__(self,
μ=0.0, # transient shock log mean
s=1.0, # transient shock log variance
d=0.0, # shift coefficient of persistent state
ρ=0.9, # correlation coefficient of persistent state
σ=0.1, # state volatility
β=0.98, # discount factor
c=5, # unemployment compensation
mc_size=1000,
grid_size=100):
# Store parameters and draw shocks for Monte Carlo integration
# (these assignments were lost in extraction and are reconstructed)
self.μ, self.s, self.d = μ, s, d
self.ρ, self.σ, self.β, self.c = ρ, σ, β, c

np.random.seed(1234)
self.e_draws = randn(2, mc_size)

# Set up grid
z_mean = d / (1 - ρ)
z_sd = np.sqrt(σ / (1 - ρ**2))
k = 3 # std devs from mean
a, b = z_mean - k * z_sd, z_mean + k * z_sd
self.z_grid = np.linspace(a, b, grid_size)
def parameters(self):
"""
Return all parameters as a tuple.
"""
return self.μ, self.s, self.d, \
self.ρ, self.σ, self.β, self.c
In [5]: @njit(parallel=True)
def Q(js, f_in, f_out):
"""
Apply the operator Q.
* js is an instance of JobSearch
* f_in and f_out are arrays that represent f and Qf respectively
"""
μ, s, d, ρ, σ, β, c = js.parameters()
M = js.e_draws.shape[1]
for i in prange(len(js.z_grid)):
z = js.z_grid[i]
expectation = 0.0
for m in range(M):
e1, e2 = js.e_draws[:, m]
z_next = d + ρ * z + σ * e1
go_val = interp(js.z_grid, f_in, z_next) # f(z')
y_next = np.exp(μ + s * e2) # y' draw
w_next = np.exp(z_next) + y_next # w' draw
stop_val = np.log(w_next) / (1 - β)
expectation += max(stop_val, go_val)
expectation = expectation / M
f_out[i] = np.log(c) + β * expectation
In [6]: def compute_fixed_point(js,
                                tol=1e-4,
                                max_iter=1000,
                                verbose=True,
                                print_skip=25):
            # (def line, initial condition and loop body reconstructed)

            f_init = np.log(js.c) * np.ones(len(js.z_grid))
            f_out = np.empty_like(f_init)

            # Set up loop
            f_in = f_init
            i = 0
            error = tol + 1

            while i < max_iter and error > tol:
                Q(js, f_in, f_out)
                error = np.max(np.abs(f_in - f_out))
                i += 1
                if verbose and i % print_skip == 0:
                    print(f"Error at iteration {i} is {error}.")
                f_in[:] = f_out

            if i == max_iter:
                print("Failed to converge!")

            if verbose and i < max_iter:
                print(f"\nConverged in {i} iterations.")

            return f_out
In [7]: js = JobSearch()
qe.tic()
f_star = compute_fixed_point(js, verbose=True)
qe.toc()
Out[7]: 6.022047758102417
Next we will compute and plot the reservation wage function defined in (1).
In [8]: res_wage_function = np.exp(f_star * (1 - js.β))

        fig, ax = plt.subplots()
ax.plot(js.z_grid, res_wage_function, label="reservation wage given $z$")
ax.set(xlabel="$z$", ylabel="wage")
ax.legend()
plt.show()
In [9]: c_vals = 1, 2, 3
fig, ax = plt.subplots()
for c in c_vals:
js = JobSearch(c=c)
f_star = compute_fixed_point(js, verbose=False)
res_wage_function = np.exp(f_star * (1 - js.β))
ax.plot(js.z_grid, res_wage_function, label=f"$\\bar w$ at $c = {c}$")
ax.set(xlabel="$z$", ylabel="wage")
ax.legend()
plt.show()
As expected, higher unemployment compensation shifts the reservation wage up at all state
values.
39.5 Unemployment Duration

Next we study how mean unemployment duration varies with unemployment compensation.
For simplicity we’ll fix the initial state at 𝑧𝑡 = 0.
In [10]: def compute_unemployment_duration(js, seed=1234):
             # (wrapper def line reconstructed; it sets up the names used below)

             f_star = compute_fixed_point(js, verbose=False)
             μ, s, d, ρ, σ, β, c = js.parameters()
             z_grid = js.z_grid
             np.random.seed(seed)

             @njit
             def f_star_function(z):
                 return interp(z_grid, f_star, z)
@njit
def draw_tau(t_max=10_000):
z = 0
t = 0
unemployed = True
while unemployed and t < t_max:
# draw current wage
y = np.exp(μ + s * np.random.randn())
w = np.exp(z) + y
res_wage = np.exp(f_star_function(z) * (1 - β))
# if optimal to stop, record t
if w >= res_wage:
unemployed = False
τ = t
# else increment data and state
else:
z = ρ * z + d + σ * np.random.randn()
t += 1
return τ
@njit(parallel=True)
def compute_expected_tau(num_reps=100_000):
sum_value = 0
for i in prange(num_reps):
sum_value += draw_tau()
return sum_value / num_reps
return compute_expected_tau()
Let’s test this out with some possible values for unemployment compensation.
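The test cell itself was lost in extraction; a sketch consistent with the surrounding discussion (the grid of c values is assumed):

c_vals = np.linspace(1.0, 10.0, 8)   # assumed grid
durations = np.empty_like(c_vals)
for i, c in enumerate(c_vals):
    js = JobSearch(c=c)
    durations[i] = compute_unemployment_duration(js)

fig, ax = plt.subplots()
ax.plot(c_vals, durations)
ax.set(xlabel="unemployment compensation", ylabel="mean unemployment duration")
plt.show()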
39.6 Exercises
39.6.1 Exercise 1
Investigate how mean unemployment duration varies with the discount factor 𝛽.
• What is your prior expectation?
• Do your results match up?
39.7 Solutions
39.7.1 Exercise 1
Here is one solution (the β grid is assumed):

beta_vals = np.linspace(0.94, 0.98, 8)
durations = np.empty_like(beta_vals)
for i, β in enumerate(beta_vals):
    js = JobSearch(β=β)
    durations[i] = compute_unemployment_duration(js)

fig, ax = plt.subplots()
ax.plot(beta_vals, durations)
ax.set_xlabel("$\\beta$")
ax.set_ylabel("mean unemployment duration")
plt.show()
The figure shows that more patient individuals tend to wait longer before accepting an offer.
Chapter 40

Exchangeability and Bayesian Updating
40.1 Contents
• Overview 40.2
• Appendix 40.3
Co-author: Zejin Shi
40.2 Overview
40.2.1 Independently and Identically Distributed

A sequence of random variables $W_0, W_1, \ldots$ is independently and identically distributed (IID) if its joint density factors as

$$p(W_T, W_{T-1}, \ldots, W_1, W_0) = f(W_T) f(W_{T-1}) \cdots f(W_0)$$

so that the joint density is the product of a sequence of identical marginal densities.
40.2.2 IID Means Past Observations Don’t Tell Us Anything About Future
Observations
If a sequence of random variables is IID, past information provides no information about future realizations.
In this sense, there is nothing to learn about the future from the past.
To understand these statements, let the joint distribution of a sequence of random variables
{𝑊𝑡 }𝑇𝑡=0 that is not necessarily IID, be
𝑝(𝑊𝑇 , 𝑊𝑇 −1 , … , 𝑊1 , 𝑊0 )
Using the laws of probability, we can always factor such a joint density into a product of conditional densities:

$$
\begin{aligned}
p(W_T, W_{T-1}, \ldots, W_1, W_0) = \; & p(W_T | W_{T-1}, \ldots, W_0) \, p(W_{T-1} | W_{T-2}, \ldots, W_0) \cdots \\
& \cdots p(W_1 | W_0) \, p(W_0)
\end{aligned}
$$

In general,

$$p(W_t | W_{t-1}, \ldots, W_0) \ne p(W_t)$$

which states that the conditional density on the left side does not equal the marginal density on the right side.

In the special IID case,

$$p(W_t | W_{t-1}, \ldots, W_0) = p(W_t)$$

so the partial history $W_{t-1}, \ldots, W_0$ contains no information about the probability of $W_t$.
Let us now turn to a setting in which there is something to learn. We assume that a decision maker

• knows both of two possible densities $f$ and $g$
• doesn't know which of these two distributions nature has drawn
• summarizes his ignorance by acting as if or thinking that nature chose distribution $F$ with probability $\tilde{\pi} \in (0, 1)$ and distribution $G$ with probability $1 - \tilde{\pi}$
• at date 𝑡 ≥ 0 has observed the partial history 𝑤𝑡 , 𝑤𝑡−1 , … , 𝑤0 of draws from the appro-
priate joint density of the partial history
But what do we mean by the appropriate joint distribution?
We’ll discuss that next and in the process describe the concept of exchangeability.
If nature draws the sequence IID from $F$, its joint density is

$$f(W_0) f(W_1) \cdots$$

while if nature draws IID from $G$, the joint density is

$$g(W_0) g(W_1) \cdots$$

The decision maker's personal joint density is then the mixture

$$h(W_0, W_1, \ldots) \equiv \tilde{\pi} \left[ f(W_0) f(W_1) \cdots \right] + (1 - \tilde{\pi}) \left[ g(W_0) g(W_1) \cdots \right] \tag{1}$$
Under the unconditional distribution ℎ(𝑊0 , 𝑊1 , …), the sequence 𝑊0 , 𝑊1 , … is not indepen-
dently and identically distributed.
To verify this claim, it is sufficient to notice, for example, that
$$h(w_0, w_1) = \tilde{\pi} f(w_0) f(w_1) + (1-\tilde{\pi}) g(w_0) g(w_1) \ne \left( \tilde{\pi} f(w_0) + (1-\tilde{\pi}) g(w_0) \right) \left( \tilde{\pi} f(w_1) + (1-\tilde{\pi}) g(w_1) \right)$$

Thus the conditional distribution

$$h(w_1 | w_0) \equiv \frac{h(w_0, w_1)}{\tilde{\pi} f(w_0) + (1-\tilde{\pi}) g(w_0)} \ne \left( \tilde{\pi} f(w_1) + (1-\tilde{\pi}) g(w_1) \right)$$

so past realizations do carry information about future ones.
40.2.5 Exchangeability
While the sequence 𝑊0 , 𝑊1 , … is not IID, it can be verified that it is exchangeable, which
means that
$$h(w_0, w_1) = h(w_1, w_0)$$
and so on.
More generally, a sequence of random variables is said to be exchangeable if the joint prob-
ability distribution for the sequence does not change when the positions in the sequence in
which finitely many of the random variables appear are altered.
Equation (1) represents our instance of an exchangeable joint density over a sequence of ran-
dom variables as a mixture of two IID joint densities over a sequence of random variables.
For a Bayesian statistician, the mixing parameter 𝜋̃ ∈ (0, 1) has a special interpretation as a
prior probability that nature selected probability distribution 𝐹 .
DeFinetti [40] established a related representation of an exchangeable process created by mixing sequences of IID Bernoulli random variables with parameters $\theta$ and mixing probability $\pi(\theta)$ for a density $\pi(\theta)$ that a Bayesian statistician would interpret as a prior over the unknown Bernoulli parameter $\theta$.
We noted above that in our example model there is something to learn about the future from past data drawn from our particular instance of a process that is exchangeable but not IID.
But how can we learn?
And about what?
The answer to the about what question is: about $\tilde{\pi}$.
The answer to the how question is to use Bayes’ Law.
Another way to say use Bayes’ Law is to say compute an appropriate conditional distribution.
Let’s dive into Bayes’ Law in this context.
Let $q$ represent the distribution that nature actually draws $w$ from and let

$$\pi = \mathbb{P}\{q = f\}$$

where we regard $\pi$ as the decision maker's subjective probability (also called a personal probability).
Suppose that at $t \ge 0$, the decision maker has observed a history $w^t \equiv [w_t, w_{t-1}, \ldots, w_0]$.

We let

$$\pi_t = \mathbb{P}\{q = f | w^t\}$$

where we adopt the convention that $\pi_{-1} = \tilde{\pi}$.

The distribution of $w_{t+1}$ conditional on $w^t$ is then

$$\pi_t f + (1 - \pi_t) g$$

Bayes' rule for updating $\pi_{t+1}$ is

$$\pi_{t+1} = \frac{\pi_t f(w_{t+1})}{\pi_t f(w_{t+1}) + (1 - \pi_t) g(w_{t+1})} \tag{2}$$
The last expression follows from Bayes’ rule, which tells us that
$$\mathbb{P}\{q = f \,|\, W = w\} = \frac{\mathbb{P}\{W = w \,|\, q = f\} \, \mathbb{P}\{q = f\}}{\mathbb{P}\{W = w\}} \quad \text{and} \quad \mathbb{P}\{W = w\} = \sum_{\omega \in \{f, g\}} \mathbb{P}\{W = w \,|\, q = \omega\} \, \mathbb{P}\{q = \omega\}$$
Let’s stare at and rearrange Bayes’ Law as represented in equation (2) with the aim of under-
standing how the posterior 𝜋𝑡+1 is influenced by the prior 𝜋𝑡 and the likelihood ratio
$$l(w) = \frac{f(w)}{g(w)}$$

Dividing the numerator and denominator of (2) by $g(w_{t+1})$ gives

$$\pi_{t+1} = \frac{\pi_t f(w_{t+1})}{\pi_t f(w_{t+1}) + (1-\pi_t) g(w_{t+1})} = \frac{\pi_t l(w_{t+1})}{\pi_t l(w_{t+1}) + (1-\pi_t)} \tag{3}$$
Notice how the likelihood ratio and the prior interact to determine whether an observation
𝑤𝑡+1 leads the decision maker to increase or decrease the subjective probability he/she at-
taches to distribution 𝐹 .
When the likelihood ratio $l(w_{t+1})$ exceeds one, the observation $w_{t+1}$ nudges the probability $\pi$ put on distribution $F$ upward, and when the likelihood ratio $l(w_{t+1})$ is less than one, the observation $w_{t+1}$ nudges $\pi$ downward.
Representation (3) is the foundation of the graphs that we'll use to display the dynamics of $\{\pi_t\}_{t=0}^{\infty}$ that are induced by Bayes' Law.
We’ll plot 𝑙 (𝑤) as a way to enlighten us about how learning – i.e., Bayesian updating of the
probability 𝜋 that nature has chosen distribution 𝑓 – works.
To create the Python infrastructure to do our work for us, we construct a wrapper function
that displays informative graphs given parameters of 𝑓 and 𝑔.
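The import cell for this lecture was lost in extraction; based on the calls used below, it would have included at least:

import numpy as np
import matplotlib.pyplot as plt
from numba import njit, vectorize
from math import gamma
from scipy.integrate import quad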
In [2]: @vectorize
        def p(x, a, b):
            "The density of a Beta(a, b) distribution"
            r = gamma(a + b) / (gamma(a) * gamma(b))
            return r * x**(a-1) * (1 - x)**(b-1)

        def learning_example(F_a=1, F_b=1, G_a=3, G_b=1.2):
            """
            Display the likelihood ratio, the two densities, and the direction
            in which Bayes' Law pushes π, for given Beta parameters of F and G.
            (Function name and several lost pieces of the body are
            reconstructed from the surviving fragments.)
            """
            f = njit(lambda x: p(x, F_a, F_b))
            g = njit(lambda x: p(x, G_a, G_b))
            l = lambda w: f(w) / g(w)

            w_max = 1
            w_grid = np.linspace(1e-12, w_max-1e-12, 100)

            # grid of (w, π) pairs and the movement of π implied by Bayes' Law
            W = np.linspace(1e-12, w_max-1e-12, 20)
            Π = np.linspace(1e-12, 1-1e-12, 20)

            ΔW = np.zeros((len(W), len(Π)))
            ΔΠ = np.empty((len(W), len(Π)))
            for i, w in enumerate(W):
                for j, π in enumerate(Π):
                    lw = l(w)
                    ΔΠ[i, j] = π * (lw / (π * lw + 1 - π) - 1)

            # ... (the three-panel plotting code -- likelihood ratio, the two
            # densities, and a quiver of (ΔW, ΔΠ) -- was lost in extraction) ...
            plt.show()
Now we’ll create a group of graphs designed to illustrate the dynamics induced by Bayes’
Law.
We’ll begin with the default values of various objects, then change them in a subsequent ex-
ample.
In [3]: learning_example()
Please look at the three graphs above, created for an instance in which $f$ is a uniform distribution on $[0, 1]$ (i.e., a Beta distribution with parameters $F_a = 1$, $F_b = 1$), while $g$ is a Beta distribution with the default parameter values $G_a = 3$, $G_b = 1.2$.
The graph on the left plots the likelihood ratio $l(w)$ on the ordinate axis against $w$ on the coordinate axis.
The middle graph plots both 𝑓(𝑤) and 𝑔(𝑤) against 𝑤, with the horizontal dotted lines show-
ing values of 𝑤 at which the likelihood ratio equals 1.
The graph on the right side plots arrows to the right that show when Bayes’ Law makes 𝜋
increase and arrows to the left that show when Bayes’ Law make 𝜋 decrease.
Notice how the length of the arrows, which show the magnitude of the force from Bayes’ Law
impelling 𝜋 to change, depend on both the prior probability 𝜋 on the ordinate axis and the
evidence in the form of the current draw of 𝑤 on the coordinate axis.
The fractions in the colored areas of the middle graphs are probabilities under 𝐹 and 𝐺, re-
spectively, that realizations of 𝑤 fall into the interval that updates the belief 𝜋 in a correct
direction (i.e., toward 0 when 𝐺 is the true distribution, and towards 1 when 𝐹 is the true
distribution).
For example, in the above example, under true distribution 𝐹 , 𝜋 will be updated toward 0 if
𝑤 falls into the interval [0.524, 0.999], which occurs with probability 1 − .524 = .476 under
𝐹 . But this would occur with probability 0.816 if 𝐺 were the true distribution. The fraction
0.816 in the orange region is the integral of 𝑔(𝑤) over this interval.
Next we use our code to create graphs for another instance of our model.
We keep 𝐹 the same as in the preceding instance, namely a uniform distribution, but now
assume that 𝐺 is a Beta distribution with parameters 𝐺𝑎 = 2, 𝐺𝑏 = 1.6.
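The call that regenerates the graphs for this new parameterization would be along the lines of:

In [4]: learning_example(G_a=2, G_b=1.6)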
Notice how the likelihood ratio, the middle graph, and the arrows compare with the previous
instance of our example.
40.3 Appendix
Now we’ll have some fun by plotting multiple realizations of sample paths of 𝜋𝑡 under two
possible assumptions about nature’s choice of distribution:
• that nature permanently draws from 𝐹
• that nature permanently draws from 𝐺
Outcomes depend on a peculiar property of likelihood ratio processes that are discussed in
this lecture
To do this, we create some Python code.
In [5]: def function_factory(F_a=1, F_b=1, G_a=3, G_b=1.2):
            # (Outer function and loop bodies reconstructed from the fragments.)

            # define f and g
            f = njit(lambda x: p(x, F_a, F_b))
            g = njit(lambda x: p(x, G_a, G_b))

            @njit
            def update(a, b, π):
                "Update π by drawing from beta distribution with parameters a and b"
                # Draw
                w = np.random.beta(a, b)
                # Update belief
                π = 1 / (1 + ((1 - π) * g(w)) / (π * f(w)))
                return π

            @njit
            def simulate_path(a, b, T=50):
                "Simulates a path of beliefs π with length T"
                π = np.empty(T+1)
                # initial condition
                π[0] = 0.5
                for t in range(1, T+1):
                    π[t] = update(a, b, π[t-1])
                return π

            def simulate(a=1, b=1, T=50, N=500, display=True):
                "Simulates N paths of beliefs π, each of length T"
                π_paths = np.empty((N, T+1))
                if display:
                    fig = plt.figure()

                for i in range(N):
                    π_paths[i] = simulate_path(a=a, b=b, T=T)
                    if display:
                        plt.plot(range(T+1), π_paths[i], color='b', lw=0.8, alpha=0.5)

                if display:
                    plt.show()

                return π_paths

            return simulate

        simulate = function_factory()
We begin by generating 𝑁 simulated {𝜋𝑡 } paths with 𝑇 periods when the sequence is truly
IID draws from 𝐹 . We set the initial prior 𝜋−1 = .5.
In [7]: T = 50

        # when nature selects F, i.e. Beta(1, 1) draws (call reconstructed)
        π_paths_F = simulate(a=1, b=1, T=T)
In the above graph we observe that for most paths 𝜋𝑡 → 1. So Bayes’ Law evidently eventu-
ally discovers the truth for most of our paths.
Next, we generate paths with 𝑇 periods when the sequence is truly IID draws from 𝐺. Again,
we set the initial prior 𝜋−1 = .5.
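The corresponding cell, reconstructed ($G$ corresponds to Beta(3, 1.2) draws):

In [8]: # when nature selects G
        π_paths_G = simulate(a=3, b=1.2, T=T)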
We study rates of convergence of 𝜋𝑡 to 1 when nature generates the data as IID draws from 𝐹
and of 𝜋𝑡 to 0 when nature generates the data as IID draws from 𝐺.
We do this by averaging across simulated paths of {𝜋𝑡 }𝑇𝑡=0 .
Using $N$ simulated $\{\pi_t\}$ paths, we compute $1 - \frac{1}{N}\sum_{i=1}^{N} \pi_{i,t}$ at each $t$ when the data are generated as draws from $F$, and compute $\frac{1}{N}\sum_{i=1}^{N} \pi_{i,t}$ when the data are generated as draws from $G$.
From the above graph, rates of convergence appear not to depend on whether 𝐹 or 𝐺 gener-
ates the data.
More insights about the dynamics of $\{\pi_t\}$ can be gleaned by computing the following conditional expectations of $\frac{\pi_{t+1}}{\pi_t}$ as functions of $\pi_t$ via integration with respect to the pertinent probability distribution:

$$
\begin{aligned}
E\left[\frac{\pi_{t+1}}{\pi_t} \Bigm| q = \omega, \pi_t\right]
&= E\left[\frac{l(w_{t+1})}{\pi_t l(w_{t+1}) + (1-\pi_t)} \Bigm| q = \omega, \pi_t\right] \\
&= \int_0^1 \frac{l(w_{t+1})}{\pi_t l(w_{t+1}) + (1-\pi_t)} \, \omega(w_{t+1}) \, dw_{t+1}
\end{aligned}
$$
where 𝜔 = 𝑓, 𝑔.
The following code approximates the integral above:
# define f and g
f = njit(lambda x: p(x, F_a, F_b))
g = njit(lambda x: p(x, G_a, G_b))
expected_rario = np.empty(len(π_grid))
for q, inte in zip(["f", "g"], [integrand_f, integrand_g]):
for i, π in enumerate(π_grid):
expected_rario[i]= quad(inte, 0, 1, args=(π,))[0]
plt.plot(π_grid, expected_rario, label=f"{q} generates")
plt.hlines(1, 0, 1, linestyle="--")
plt.xlabel("$π_t$")
plt.ylabel("$E[\pi_{t+1}/\pi_t]$")
plt.legend()
plt.show()
In [12]: expected_ratio()
The above graph shows that when $F$ generates the data, $\pi_t$ on average always heads north, while when $G$ generates the data, $\pi_t$ heads south.
Next, we'll look at a degenerate case in which $f$ and $g$ are identical beta distributions, with $F_a = G_a = 3$, $F_b = G_b = 1.2$.
In a sense, here there is nothing to learn.
The above graph says that 𝜋𝑡 is inert and would remain at its initial value.
Finally, let’s look at a case in which 𝑓 and 𝑔 are neither very different nor identical, in partic-
ular one in which 𝐹𝑎 = 2, 𝐹𝑏 = 1 and 𝐺𝑎 = 3, 𝐺𝑏 = 1.2.
Chapter 41

Job Search V: Search with Learning

41.1 Contents
• Overview 41.2
• Model 41.3
• Take 1: Solution by VFI 41.4
• Take 2: A More Efficient Method 41.5
• Another Functional Equation 41.6
• Solving the RWFE 41.7
• Implementation 41.8
• Exercises 41.9
• Solutions 41.10
• Appendix A 41.11
• Appendix B 41.12
• Examples 41.13
In addition to what’s in Anaconda, this lecture deploys the libraries:
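The install and import cells were lost in extraction; based on the names used below, they would have included at least:

!pip install --upgrade quantecon
!pip install interpolation

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
import quantecon as qe
from math import gamma
from numba import njit, prange, vectorize
from interpolation import interp
from interpolation.splines import mlinterp
from scipy.stats import cumfreq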
41.2 Overview
In this lecture, we consider an extension of the previously studied job search model of McCall
[116].
We’ll build on a model of Bayesian learning discussed in this lecture on the topic of exchange-
ability and its relationship to the concept of IID (identically and independently distributed)
random variables and to Bayesian updating.
In the McCall model, an unemployed worker decides when to accept a permanent job at a
specific fixed wage, given
• his or her discount factor
• the level of unemployment compensation
• the distribution from which wage offers are drawn
In the version considered below, the wage distribution is unknown and must be learned.
• Infinite horizon dynamic programming with two states and one binary control.
• Bayesian updating to learn the unknown distribution.
41.3 Model
Let’s first review the basic McCall model [116] and then add the variation we want to con-
sider.
Recall that, in the baseline model , an unemployed worker is presented in each period with a
permanent job offer at wage 𝑊𝑡 .
At time $t$, our worker either

1. accepts the offer and works permanently at constant wage $W_t$, or
2. rejects the offer, receives unemployment compensation $c$ and reconsiders next period
The value function satisfies the recursion

$$v(w) = \max \left\{ \frac{w}{1-\beta}, \; c + \beta \int v(w') q(w') \, dw' \right\} \tag{1}$$
Now let's extend the model by considering the variation presented in [108], section 6.6.

The model is as above, apart from the fact that

• the density $q$ is unknown
• the worker learns about $q$ by starting with a prior and updating it based on wage offers that he/she observes

The worker knows there are two possible distributions $F$ and $G$, with densities $f$ and $g$, and attaches probability $\pi_t$ to $F$ at time $t$. Beliefs update according to

$$\pi_{t+1} = \frac{\pi_t f(w_{t+1})}{\pi_t f(w_{t+1}) + (1 - \pi_t) g(w_{t+1})} \tag{2}$$
This last expression follows from Bayes’ rule, which tells us that
$$\mathbb{P}\{q = f \,|\, W = w\} = \frac{\mathbb{P}\{W = w \,|\, q = f\} \, \mathbb{P}\{q = f\}}{\mathbb{P}\{W = w\}} \quad \text{and} \quad \mathbb{P}\{W = w\} = \sum_{\omega \in \{f, g\}} \mathbb{P}\{W = w \,|\, q = \omega\} \, \mathbb{P}\{q = \omega\}$$
The fact that (2) is recursive allows us to progress to a recursive solution method.
Letting

$$q_\pi(w) := \pi f(w) + (1-\pi) g(w) \quad \text{and} \quad \kappa(w, \pi) := \frac{\pi f(w)}{\pi f(w) + (1-\pi) g(w)}$$
we can express the value function for the unemployed worker recursively as follows
$$v(w, \pi) = \max \left\{ \frac{w}{1-\beta}, \; c + \beta \int v(w', \pi') \, q_\pi(w') \, dw' \right\} \quad \text{where } \pi' = \kappa(w', \pi) \tag{3}$$
Notice that the current guess 𝜋 is a state variable, since it affects the worker’s perception of
probabilities for future rewards.
41.3.3 Parameterization

Following section 6.6 of [108], our baseline parameterization takes Beta distributions for $F$ and $G$ (the defaults in the SearchProblem class below: $F_a = F_b = 1$, $G_a = 3$, $G_b = 1.2$), with $\beta = 0.95$ and $c = 0.3$.
In [3]: @vectorize
def p(x, a, b):
r = gamma(a + b) / (gamma(a) * gamma(b))
return r * x**(a-1) * (1 - x)**(b-1)
# (Cell reconstructed: plot the two densities f and g)
x_grid = np.linspace(0, 1, 100)
f = lambda x: p(x, 1, 1)
g = lambda x: p(x, 3, 1.2)

fig, ax = plt.subplots()
ax.plot(x_grid, f(x_grid), label="$f$", lw=2)
ax.plot(x_grid, g(x_grid), label="$g$", lw=2)
ax.legend()
plt.show()
41.3.4 Looking Forward

What kind of optimal policy might result from (3) and the parameterization specified above?
Intuitively, if we accept at 𝑤𝑎 and 𝑤𝑎 ≤ 𝑤𝑏 , then — all other things being given — we should
also accept at 𝑤𝑏 .
This suggests a policy of accepting whenever $w$ exceeds some threshold value $\bar{w}$.
But 𝑤̄ should depend on 𝜋 — in fact, it should be decreasing in 𝜋 because
• 𝑓 is a less attractive offer distribution than 𝑔
• larger 𝜋 means more weight on 𝑓 and less on 𝑔
Thus larger 𝜋 depresses the worker’s assessment of her future prospects, and relatively low
current offers become more attractive.
41.4 Take 1: Solution by VFI
Let’s set about solving the model and see how our results match with our intuition.
We begin by solving via value function iteration (VFI), which is natural but ultimately turns
out to be second best.
The class SearchProblem is used to store parameters and methods needed to compute optimal actions.

In [4]: class SearchProblem:
            """
            A class to store a given parameterization of the generalized
            McCall model.
            """

            def __init__(self,
                         β=0.95,            # Discount factor
                         c=0.3,             # Unemployment compensation
                         F_a=1,
                         F_b=1,
                         G_a=3,
                         G_b=1.2,
                         w_max=1,           # Maximum wage possible
                         w_grid_size=100,
                         π_grid_size=100,
                         mc_size=500):

                # (Body reconstructed: store primitives, grids, and Monte Carlo
                # wage samples from F and G for integration.)
                self.β, self.c, self.w_max = β, c, w_max
                self.f = njit(lambda x: p(x, F_a, F_b))
                self.g = njit(lambda x: p(x, G_a, G_b))

                self.π_min, self.π_max = 1e-3, 1-1e-3   # avoid end points
                self.w_grid = np.linspace(0, w_max, w_grid_size)
                self.π_grid = np.linspace(self.π_min, self.π_max, π_grid_size)

                np.random.seed(1234)
                self.w_f = np.random.beta(F_a, F_b, mc_size)
                self.w_g = np.random.beta(G_a, G_b, mc_size)

                self.mc_size = mc_size
The following function takes an instance of this class and returns jitted versions of the Bellman operator T, and a get_greedy() function to compute the approximate optimal policy from a guess v of the value function

In [5]: def operator_factory(sp, parallel_flag=True):

            f, g = sp.f, sp.g
            w_f, w_g = sp.w_f, sp.w_g
            β, c = sp.β, sp.c
            mc_size = sp.mc_size
            w_grid, π_grid = sp.w_grid, sp.π_grid

            @njit
            def κ(w, π):
                """
                Updates π using Bayes' rule and the current wage observation w.
                """
                pf, pg = π * f(w), (1 - π) * g(w)
                π_new = pf / (pf + pg)
                return π_new

            @njit(parallel=parallel_flag)
            def T(v):
                """
                The Bellman operator.
                """
                v_func = lambda x, y: mlinterp((w_grid, π_grid), v, (x, y))
                v_new = np.empty_like(v)

                for i in prange(len(w_grid)):
                    for j in prange(len(π_grid)):
                        w = w_grid[i]
                        π = π_grid[j]

                        v_1 = w / (1 - β)

                        integral_f, integral_g = 0.0, 0.0
                        for m in prange(mc_size):
                            integral_f += v_func(w_f[m], κ(w_f[m], π))
                            integral_g += v_func(w_g[m], κ(w_g[m], π))
                        integral = (π * integral_f + (1 - π) * integral_g) / mc_size

                        v_2 = c + β * integral
                        v_new[i, j] = max(v_1, v_2)

                return v_new

            @njit(parallel=parallel_flag)
            def get_greedy(v):
                """
                Compute optimal actions taking v as the value function.
                """
                v_func = lambda x, y: mlinterp((w_grid, π_grid), v, (x, y))
                σ = np.empty_like(v)

                for i in prange(len(w_grid)):
                    for j in prange(len(π_grid)):
                        w = w_grid[i]
                        π = π_grid[j]

                        v_1 = w / (1 - β)

                        integral_f, integral_g = 0.0, 0.0
                        for m in prange(mc_size):
                            integral_f += v_func(w_f[m], κ(w_f[m], π))
                            integral_g += v_func(w_g[m], κ(w_g[m], π))
                        integral = (π * integral_f + (1 - π) * integral_g) / mc_size

                        v_2 = c + β * integral

                        σ[i, j] = v_1 > v_2   # Evaluates to 1 or 0

                return σ

            return T, get_greedy
We will omit a detailed discussion of the code because there is a more efficient solution
method that we will use later.
To solve the model we will use the following function that iterates using T to find a fixed
point
"""
Solves for the value function
* sp is an instance of SearchProblem
"""
T, _ = operator_factory(sp, use_parallel)
# Set up loop
i = 0
error = tol + 1
m, n = len(sp.w_grid), len(sp.π_grid)
# Initialize v
v = np.zeros((m, n)) + sp.c / (1 - sp.β)
if i == max_iter:
print("Failed to converge!")
return v_new
690 CHAPTER 41. JOB SEARCH V: SEARCH WITH LEARNING
In [7]: sp = SearchProblem()
v_star = solve_model(sp)
fig, ax = plt.subplots(figsize=(6, 6))
ax.contourf(sp.π_grid, sp.w_grid, v_star, 12, alpha=0.6, cmap=cm.jet)
cs = ax.contour(sp.π_grid, sp.w_grid, v_star, 12, colors="black")
ax.clabel(cs, inline=1, fontsize=10)
ax.set(xlabel='$\pi$', ylabel='$w$')
plt.show()
Converged in 32 iterations.
In [8]: # (Cell reconstructed: plot the optimal policy implied by v_star)
        T, get_greedy = operator_factory(sp)
        σ_star = get_greedy(v_star)

        fig, ax = plt.subplots(figsize=(6, 6))
        ax.contourf(sp.π_grid, sp.w_grid, σ_star, 1, alpha=0.6, cmap=cm.jet)
        ax.contour(sp.π_grid, sp.w_grid, σ_star, 1, colors="black")
        ax.set(xlabel='$\pi$', ylabel='$w$')
        plt.show()
The results fit well with our intuition from the Looking Forward discussion above.
• The black line in the figure above corresponds to the function 𝑤(𝜋)
̄ introduced there.
• It is decreasing as expected.
41.5 Take 2: A More Efficient Method

We will use iteration with an operator that has the same contraction rate as the Bellman operator, but

• is one dimensional rather than two dimensional, and
• involves no maximization step.
As a consequence, the algorithm is orders of magnitude faster than VFI.
This section illustrates the point that when it comes to programming, a bit of mathematical
analysis goes a long way.
41.6 Another Functional Equation

To begin, note that when $w = \bar{w}(\pi)$, the worker is indifferent between accepting and rejecting, so the two terms on the right of (3) are equal:

$$\frac{\bar{w}(\pi)}{1-\beta} = c + \beta \int v(w', \pi') \, q_\pi(w') \, dw' \tag{4}$$

Together, (3) and (4) give

$$v(w, \pi) = \max \left\{ \frac{w}{1-\beta}, \; \frac{\bar{w}(\pi)}{1-\beta} \right\} \tag{5}$$

Combining (4) and (5), we obtain

$$\frac{\bar{w}(\pi)}{1-\beta} = c + \beta \int \max \left\{ \frac{w'}{1-\beta}, \; \frac{\bar{w}(\pi')}{1-\beta} \right\} q_\pi(w') \, dw'$$

Multiplying by $1-\beta$, substituting in $\pi' = \kappa(w', \pi)$ and using $\circ$ for composition of functions yields

$$\bar{w}(\pi) = (1-\beta) c + \beta \int \max \left\{ w', \; \bar{w} \circ \kappa(w', \pi) \right\} q_\pi(w') \, dw' \tag{6}$$
Equation (6) can be understood as a functional equation, where 𝑤̄ is the unknown function.
• Let’s call it the reservation wage functional equation (RWFE).
• The solution 𝑤̄ to the RWFE is the object that we wish to compute.
41.7 Solving the RWFE

To solve the RWFE, we will first show that its solution is the fixed point of a contraction
mapping.
To this end, let
• 𝑏[0, 1] be the bounded real-valued functions on [0, 1]
• ‖𝜔‖ ∶= sup𝑥∈[0,1] |𝜔(𝑥)|
Consider the operator $Q$ mapping $\omega \in b[0,1]$ into $Q\omega \in b[0,1]$ via

$$(Q\omega)(\pi) = (1-\beta) c + \beta \int \max \left\{ w', \; \omega \circ \kappa(w', \pi) \right\} q_\pi(w') \, dw' \tag{7}$$
Comparing (6) and (7), we see that the set of fixed points of 𝑄 exactly coincides with the set
of solutions to the RWFE.
• If 𝑄𝑤̄ = 𝑤̄ then 𝑤̄ solves (6) and vice versa.
Moreover, for any $\omega, \omega' \in b[0,1]$, basic algebra and the triangle inequality for integrals tell us that

$$|(Q\omega)(\pi) - (Q\omega')(\pi)| \le \beta \int \left| \max\{w', \omega \circ \kappa(w', \pi)\} - \max\{w', \omega' \circ \kappa(w', \pi)\} \right| q_\pi(w') \, dw' \tag{8}$$

Working case by case, it is easy to check that for real numbers $a, b, c$ we always have

$$|\max\{a, b\} - \max\{a, c\}| \le |b - c| \tag{9}$$

Combining (8) and (9) yields

$$|(Q\omega)(\pi) - (Q\omega')(\pi)| \le \beta \int \left| \omega \circ \kappa(w', \pi) - \omega' \circ \kappa(w', \pi) \right| q_\pi(w') \, dw' \le \beta \, \|\omega - \omega'\| \tag{10}$$
In other words, 𝑄 is a contraction of modulus 𝛽 on the complete metric space (𝑏[0, 1], ‖ ⋅ ‖).
Hence
• A unique solution 𝑤̄ to the RWFE exists in 𝑏[0, 1].
• 𝑄𝑘 𝜔 → 𝑤̄ uniformly as 𝑘 → ∞, for any 𝜔 ∈ 𝑏[0, 1].
41.8 Implementation
The following function takes an instance of SearchProblem and returns the operator Q
In [9]: def Q_factory(sp, parallel_flag=True):

            f, g = sp.f, sp.g
            w_f, w_g = sp.w_f, sp.w_g
            β, c = sp.β, sp.c
            mc_size = sp.mc_size
            w_grid, π_grid = sp.w_grid, sp.π_grid

            @njit
            def κ(w, π):
                """
                Updates π using Bayes' rule and the current wage observation w.
                """
                pf, pg = π * f(w), (1 - π) * g(w)
                π_new = pf / (pf + pg)
                return π_new

            @njit(parallel=parallel_flag)
            def Q(ω):
                """
                Updates the reservation wage function guess ω via the
                operator Q defined in (7).
                """
                ω_func = lambda p: interp(π_grid, ω, p)
                ω_new = np.empty_like(ω)

                for i in prange(len(π_grid)):
                    π = π_grid[i]
                    integral_f, integral_g = 0.0, 0.0

                    for m in prange(mc_size):
                        integral_f += max(w_f[m], ω_func(κ(w_f[m], π)))
                        integral_g += max(w_g[m], ω_func(κ(w_g[m], π)))
                    integral = (π * integral_f + (1 - π) * integral_g) / mc_size

                    ω_new[i] = (1 - β) * c + β * integral

                return ω_new

            return Q
41.9 Exercises

41.9.1 Exercise 1

Use the default parameters and Q_factory to compute an optimal policy. Your result should coincide closely with the figure for the optimal policy shown above. Try experimenting with different parameters, and confirm that the change in the optimal policy coincides with your intuition.
41.10 Solutions
41.10.1 Exercise 1
This code solves the “Offer Distribution Unknown” model by iterating on a guess of the reser-
vation wage function.
You should find that the run time is shorter than that of the value function approach.
41.10. SOLUTIONS 695
Similar to above, we set up a function to iterate with Q to find the fixed point
Q = Q_factory(sp, use_parallel)
# Set up loop
i = 0
error = tol + 1
m, n = len(sp.w_grid), len(sp.π_grid)
# Initialize w
w = np.ones_like(sp.π_grid)
if i == max_iter:
print("Failed to converge!")
return w_new
In [11]: sp = SearchProblem()
w_bar = solve_wbar(sp)
Converged in 26 iterations.
41.11 Appendix A
The next piece of code generates a fun simulation to see what the effect of a change in the
underlying distribution on the unemployment rate is.
At a point in the simulation, the distribution becomes significantly worse.
It takes a while for agents to learn this, and in the meantime, they are too optimistic and
turn down too many jobs.
As a result, the unemployment rate spikes
In [12]: # (Setup lines and parts of the loop reconstructed; the original
         # cell was partially lost in extraction.)
         F_a, F_b, G_a, G_b = 1, 1, 3, 1.2

         sp = SearchProblem(F_a=F_a, F_b=F_b, G_a=G_a, G_b=G_b)
         f, g = sp.f, sp.g
         π_grid = sp.π_grid
         w_bar = solve_wbar(sp)
         w_func = njit(lambda x: interp(π_grid, w_bar, x))

         @njit
         def update(a, b, e, π):
             "Update e and π by drawing wage offer from beta distribution with parameters a and b"

             if e == False:
                 w = np.random.beta(a, b)       # Draw random wage
                 if w >= w_func(π):
                     e = True                   # Take new job
                 else:
                     π = 1 / (1 + ((1 - π) * g(w)) / (π * f(w)))

             return e, π

         @njit
         def simulate_path(F_a=F_a,
                           F_b=F_b,
                           G_a=G_a,
                           G_b=G_b,
                           N=5000,       # Number of agents
                           T=600,        # Simulation length
                           d=200,        # Change date
                           s=0.025):     # Separation rate
             """Simulates employment status e and beliefs π for N agents;
             the offer distribution switches from G to F at date d."""

             e = np.ones((N, T+1))
             π = np.ones((N, T+1)) * 1e-3

             a, b = G_a, G_b               # Initial distribution parameters

             for t in range(T+1):
                 if t == d:
                     a, b = F_a, F_b       # Change distribution parameters

                 # update each agent
                 for n in range(N):
                     if e[n, t] == 1:                          # if employed
                         if np.random.uniform(0, 1) <= s:      # exogenous separation
                             e[n, t] = 0

                     new_e, new_π = update(a, b, e[n, t], π[n, t])
                     e[n, t+1] = new_e
                     π[n, t+1] = new_π

             return e[:, 1:]

         # Plot the unemployment rate over time
         unemployment_rate = 1 - simulate_path().mean(axis=0)

         fig, ax = plt.subplots()
         ax.plot(unemployment_rate)
         ax.axvline(200, color='r', alpha=0.6, label='change date')
         ax.set_xlabel('time')
         ax.set_title('unemployment rate')
         ax.legend()
         plt.show()
41.12 Appendix B
In this appendix we provide more details about how Bayes’ Law contributes to the workings
of the model.
We present some graphs that bring out additional insights about how learning works.
We build on graphs proposed in this lecture.
In particular, we’ll add actions of our searching worker to a key graph presented in that lec-
ture.
To begin, we first define two functions for computing the empirical distributions of unemploy-
ment duration and π at the time of employment.
In [13]: @njit
def empirical_dist(F_a, F_b, G_a, G_b, w_bar, π_grid,
N=10000, T=600):
"""
Simulates population for computing empirical cumulative
distribution of unemployment duration and π at time when
the worker accepts the wage offer. For each job search
problem, we simulate two cases, in which either f or g is
the true offer distribution.
Parameters
----------
Returns
-------
accept_t : 2 by N ndarray. The empirical distribution of
unemployment duration when f or g generates offers.
accept_π : 2 by N ndarray. The empirical distribution of
π at the time of employment when f or g generates offers.
"""
# f or g generates offers
for i, (a, b) in enumerate([(F_a, F_b), (G_a, G_b)]):
# update each agent
for n in range(N):
# initial prior
π = 0.5
for t in range(T+1):
def cumfreq_x(res):
"""
A helper function for calculating the x grids of
the cumulative frequency histogram.
"""
cumcount = res.cumcount
lowerlimit, binsize = res.lowerlimit, res.binsize
x = lowerlimit + np.linspace(0, binsize * cumcount.size, cumcount.size)
return x
Now we define a wrapper function for analyzing job search models with learning under differ-
ent parameterizations.
The wrapper takes parameters of beta distributions and unemployment compensation as in-
puts and then displays various things we want to know to interpret the solution of our search
model.
In addition, it computes empirical cumulative distributions of two key objects.
# part 1: display the details of the model settings and some results
w_grid = np.linspace(1e-12, 1-1e-12, 100)
ΔW = np.zeros((len(W), len(Π)))
ΔΠ = np.empty((len(W), len(Π)))
for i, w in enumerate(W):
for j, π in enumerate(Π):
lw = l(w)
ΔΠ[i, j] = π * (lw / (π * lw + 1 - π) - 1)
plt.show()
axs[0].grid(linestyle='--')
axs[0].legend(loc=4)
axs[0].title.set_text('CDF of duration of unemployment')
axs[0].set(xlabel='time', ylabel='Prob(time)')
plt.show()
We now turn to some examples that provide insight into how the model works.
41.13 Examples
The formula implies that the direction of motion of 𝜋𝑡 is determined by the relationship be-
tween 𝑙(𝑤𝑡 ) and 1.
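To see this, note that with 𝑙(𝑤) ∶= 𝑓(𝑤)/𝑔(𝑤), the Bayes update for beliefs can be rearranged as

$$\pi_t - \pi_{t-1} = \frac{\pi_{t-1}(1 - \pi_{t-1})\big(l(w_t) - 1\big)}{\pi_{t-1}\, l(w_t) + 1 - \pi_{t-1}}$$

so beliefs move up when 𝑙(𝑤𝑡) > 1 and down when 𝑙(𝑤𝑡) < 1. (This is the same expression computed as ΔΠ in the plotting code above.)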
The magnitude of the step is small if
• 𝑙(𝑤) is close to 1, which means the new 𝑤 is not very informative for distinguishing the
two distributions, or
• 𝜋𝑡−1 is close to either 0 or 1, which means the prior is strong.
Will an unemployed worker accept an offer earlier or not, when the actual ruling distribution
is 𝑔 instead of 𝑓?
Two countervailing effects are at work.
• if 𝑓 generates successive wage offers, then 𝑤 is more likely to be low, but 𝜋 is moving up
toward 1, which lowers the reservation wage, i.e., the worker becomes less selective the
longer he or she remains unemployed.
• if 𝑔 generates wage offers, then 𝑤 is more likely to be high, but 𝜋 is moving down
toward 0, which raises the reservation wage, i.e., the worker becomes more selective the
longer he or she remains unemployed.
Quantitatively, the lower right figure sheds light on which effect dominates in this example.
It shows the probability that a previously unemployed worker accepts an offer at different val-
ues of 𝜋 when 𝑓 or 𝑔 generates wage offers.
That graph shows that for the particular 𝑓 and 𝑔 in this example, the worker is always more
likely to accept an offer when 𝑓 generates the data even when 𝜋 is close to zero so that the
worker believes the true distribution is 𝑔 and therefore is relatively more selective.
The empirical cumulative distribution of the duration of unemployment verifies our conjec-
ture.
In [15]: job_search_example()
41.13.2 Example 2
41.13.3 Example 3
41.13.4 Example 4
41.13.5 Example 5
Chapter 42

Job Search VI: Modeling Career Choice

42.1 Contents
• Overview 42.2
• Model 42.3
• Implementation 42.4
• Exercises 42.5
• Solutions 42.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
42.2 Overview
• Career and job within career both chosen to maximize expected discounted wage flow.
42.3 Model
𝔼 ∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑤𝑡    (1)
where the wage 𝑤𝑡 = 𝜃𝑡 + 𝜖𝑡 is the sum of a career component 𝜃𝑡 and a job component 𝜖𝑡. The
value function satisfies 𝑣(𝜃, 𝜖) = max{𝐼, 𝐼𝐼, 𝐼𝐼𝐼}, where

𝐼 = 𝜃 + 𝜖 + 𝛽𝑣(𝜃, 𝜖)
𝐼𝐼 = 𝜃 + ∫ 𝜖′ 𝐺(𝑑𝜖′) + 𝛽 ∫ 𝑣(𝜃, 𝜖′) 𝐺(𝑑𝜖′)    (2)
𝐼𝐼𝐼 = ∫ 𝜃′ 𝐹(𝑑𝜃′) + ∫ 𝜖′ 𝐺(𝑑𝜖′) + 𝛽 ∫∫ 𝑣(𝜃′, 𝜖′) 𝐹(𝑑𝜃′) 𝐺(𝑑𝜖′)

Evidently 𝐼, 𝐼𝐼 and 𝐼𝐼𝐼 correspond to “stay put”, “new job” and “new life”, respectively.
42.3.1 Parameterization
As in [108], section 6.5, we will focus on a discrete version of the model, parameterized as fol-
lows:
• both 𝜃 and 𝜖 take values in the set np.linspace(0, B, grid_size) — an even
grid of points between 0 and 𝐵 inclusive
• grid_size = 50
• B = 5
• β = 0.95
The distributions 𝐹 and 𝐺 are discrete distributions generating draws from the grid points
np.linspace(0, B, grid_size).
A very useful family of discrete distributions is the Beta-binomial family, with probability
mass function
$$p(k \,|\, n, a, b) = \binom{n}{k} \frac{B(k + a,\, n - k + b)}{B(a, b)}, \qquad k = 0, \ldots, n$$

where 𝐵 is the beta function.
Interpretation:
• draw 𝑞 from a Beta distribution with shape parameters (𝑎, 𝑏)
• run 𝑛 independent binary trials, each with success probability 𝑞
• 𝑝(𝑘 | 𝑛, 𝑎, 𝑏) is the probability of 𝑘 successes in these 𝑛 trials
Nice properties:
• very flexible class of distributions, including uniform, symmetric unimodal, etc.
• only three parameters
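The figure code below calls a helper gen_probs(n, a, b) that is not shown in this extract. A minimal version, using SciPy's binom and beta special functions (an illustrative sketch, not necessarily the lecture's exact implementation), is:

import numpy as np
from scipy.special import binom, beta

def gen_probs(n, a, b):
    "Beta-binomial pmf p(k | n, a, b) for k = 0, ..., n."
    probs = np.zeros(n + 1)
    for k in range(n + 1):
        probs[k] = binom(n, k) * beta(k + a, n - k + b) / beta(a, b)
    return probs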
Here’s a figure showing the effect on the pmf of different shape parameters when 𝑛 = 50.
n = 50
a_vals = [0.5, 1, 100]
b_vals = [0.5, 1, 100]
fig, ax = plt.subplots(figsize=(10, 6))
for a, b in zip(a_vals, b_vals):
ab_label = f'$a = {a:.1f}$, $b = {b:.1f}$'
ax.plot(list(range(0, n+1)), gen_probs(n, a, b), '-o', label=ab_label)
ax.legend()
plt.show()
42.4 Implementation
We will first create a class CareerWorkerProblem which will hold the default parameteri-
zations of the model and an initial guess for the value function.
class CareerWorkerProblem:

    def __init__(self,
B=5.0, # Upper bound
β=0.95, # Discount factor
grid_size=50, # Grid size
F_a=1,
F_b=1,
G_a=1,
G_b=1):
The following function takes an instance of CareerWorkerProblem and returns the corre-
sponding Bellman operator 𝑇 and the greedy policy function.
In this model, 𝑇 is defined by 𝑇 𝑣(𝜃, 𝜖) = max{𝐼, 𝐼𝐼, 𝐼𝐼𝐼}, where 𝐼, 𝐼𝐼 and 𝐼𝐼𝐼 are as given in
(2).
"""
Returns jitted versions of the Bellman operator and the
greedy policy function
cw is an instance of ``CareerWorkerProblem``
"""
@njit(parallel=parallel_flag)
def T(v):
"The Bellman operator"
v_new = np.empty_like(v)
for i in prange(len(v)):
for j in prange(len(v)):
v1 = θ[i] + ϵ[j] + β * v[i, j] # Stay put
v2 = θ[i] + G_mean + β * v[i, :] @ G_probs # New job
v3 = G_mean + F_mean + β * F_probs @ v @ G_probs # New life
v_new[i, j] = max(v1, v2, v3)
return v_new
@njit
def get_greedy(v):
"Computes the v-greedy policy"
σ = np.empty(v.shape)
for i in range(len(v)):
for j in range(len(v)):
v1 = θ[i] + ϵ[j] + β * v[i, j]
v2 = θ[i] + G_mean + β * v[i, :] @ G_probs
v3 = G_mean + F_mean + β * F_probs @ v @ G_probs
if v1 > max(v2, v3):
action = 1
elif v2 > max(v1, v3):
action = 2
else:
action = 3
σ[i, j] = action
return σ
return T, get_greedy
def solve_model(cw, use_parallel=True, tol=1e-4, max_iter=1000, verbose=True):
    T, _ = operator_factory(cw, parallel_flag=use_parallel)
    # Set up loop
    v = np.ones((cw.grid_size, cw.grid_size)) * 100  # Initial guess
    i = 0
    error = tol + 1
    # Iterate with the Bellman operator to convergence
    while i < max_iter and error > tol:
        v_new = T(v)
        error = np.max(np.abs(v - v_new))
        i += 1
        v = v_new
    if i == max_iter:
        print("Failed to converge!")
    elif verbose:
        print(f"Converged in {i} iterations.")
    return v_new
In [7]: cw = CareerWorkerProblem()
T, get_greedy = operator_factory(cw)
v_star = solve_model(cw, verbose=False)
greedy_star = get_greedy(v_star)
Interpretation:
• If both job and career are poor or mediocre, the worker will experiment with a new job
and new career.
• If career is sufficiently good, the worker will hold it and experiment with new jobs until
a sufficiently good one is found.
• If both job and career are good, the worker will stay put.
Notice that the worker will always hold on to a sufficiently good career, but not necessarily
hold on to even the best paying job.
The reason is that high lifetime wages require both variables to be large, and the worker can-
not change careers without changing jobs.
• Sometimes a good job must be sacrificed in order to change to a better career.
42.5 Exercises
42.5.1 Exercise 1
Using the default parameterization in the class CareerWorkerProblem, generate and plot
typical sample paths for 𝜃 and 𝜖 when the worker follows the optimal policy.
In particular, modulo randomness, reproduce the following figure (where the horizontal axis
represents time)
42.5.2 Exercise 2
Let’s now consider how long it takes for the worker to settle down to a permanent job, given
a starting point of (𝜃, 𝜖) = (0, 0).
In other words, we want to study the distribution of the random variable
𝑇 ∗ ∶= the first point in time from which the worker’s job no longer changes
Evidently, the worker’s job becomes permanent if and only if (𝜃𝑡 , 𝜖𝑡 ) enters the “stay put”
region of (𝜃, 𝜖) space.
Letting 𝑆 denote this region, 𝑇 ∗ can be expressed as the first passage time to 𝑆 under the
optimal policy:
𝑇 ∗ ∶= inf{𝑡 ≥ 0 | (𝜃𝑡 , 𝜖𝑡 ) ∈ 𝑆}
Collect 25,000 draws of this random variable and compute the median (which should be
about 7).
Repeat the exercise with 𝛽 = 0.99 and interpret the change.
42.5.3 Exercise 3
Set the parameterization to G_a = G_b = 100 and generate a new optimal policy figure –
interpret.
42.6 Solutions
42.6.1 Exercise 1
In [9]: F = np.cumsum(cw.F_probs)
G = np.cumsum(cw.G_probs)
v_star = solve_model(cw, verbose=False)
T, get_greedy = operator_factory(cw)
greedy_star = get_greedy(v_star)
plt.legend()
plt.show()
42.6.2 Exercise 2
In [10]: cw = CareerWorkerProblem()
F = np.cumsum(cw.F_probs)
G = np.cumsum(cw.G_probs)
T, get_greedy = operator_factory(cw)
v_star = solve_model(cw, verbose=False)
greedy_star = get_greedy(v_star)
@njit
def passage_time(optimal_policy, F, G):
t = 0
i = j = 0
while True:
if optimal_policy[i, j] == 1: # Stay put
return t
elif optimal_policy[i, j] == 2: # New job
j = int(qe.random.draw(G))
else: # New life
i, j = int(qe.random.draw(F)), int(qe.random.draw(G))
t += 1
@njit(parallel=True)
def median_time(optimal_policy, F, G, M=25000):
samples = np.empty(M)
for i in prange(M):
samples[i] = passage_time(optimal_policy, F, G)
return np.median(samples)
median_time(greedy_star, F, G)
Out[10]: 7.0
To compute the median with 𝛽 = 0.99 instead of the default value 𝛽 = 0.95, replace cw =
CareerWorkerProblem() with cw = CareerWorkerProblem(β=0.99).
The medians are subject to randomness but should be about 7 and 14 respectively.
Not surprisingly, more patient workers will wait longer to settle down to their final job.
42.6.3 Exercise 3
In the new figure, you see that the region in which the worker stays put has grown, because
the distribution for 𝜖 has become more concentrated around the mean, making high-paying
jobs less likely.
Chapter 43

Job Search VII: On-the-Job Search
43.1 Contents
• Overview 43.2
• Model 43.3
• Implementation 43.4
• Solving for Policies 43.5
• Exercises 43.6
• Solutions 43.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
43.2 Overview
43.3 Model
Let 𝑥𝑡 denote the time-𝑡 job-specific human capital of a worker employed at a given firm and
let 𝑤𝑡 denote current wages.
Let 𝑤𝑡 = 𝑥𝑡 (1 − 𝑠𝑡 − 𝜙𝑡 ), where
• 𝜙𝑡 is investment in job-specific human capital for the current role and
• 𝑠𝑡 is search effort, devoted to obtaining new offers from other firms.
For as long as the worker remains in the current job, evolution of {𝑥𝑡 } is given by 𝑥𝑡+1 =
𝑔(𝑥𝑡 , 𝜙𝑡 ).
When search effort at 𝑡 is 𝑠𝑡 , the worker receives a new job offer with probability 𝜋(𝑠𝑡 ) ∈
[0, 1].
The value of the offer, measured in job-specific human capital, is 𝑢𝑡+1 , where {𝑢𝑡 } is IID with
common distribution 𝑓.
The worker can reject the current offer and continue with existing job.
Hence 𝑥𝑡+1 = 𝑢𝑡+1 if he/she accepts and 𝑥𝑡+1 = 𝑔(𝑥𝑡 , 𝜙𝑡 ) otherwise.
Let 𝑏𝑡+1 ∈ {0, 1} be a binary random variable, where 𝑏𝑡+1 = 1 indicates that the worker
receives an offer at the end of time 𝑡.
We can write the law of motion for capital as

𝑥_{𝑡+1} = (1 − 𝑏_{𝑡+1}) 𝑔(𝑥𝑡, 𝜙𝑡) + 𝑏_{𝑡+1} max{𝑔(𝑥𝑡, 𝜙𝑡), 𝑢_{𝑡+1}}    (1)
Agent’s objective: maximize expected discounted sum of wages via controls {𝑠𝑡 } and {𝜙𝑡 }.
Taking the expectation of 𝑣(𝑥𝑡+1 ) and using (1), the Bellman equation for this problem can
be written as
𝑣(𝑥) = max_{𝑠+𝜙≤1} {𝑥(1 − 𝑠 − 𝜙) + 𝛽(1 − 𝜋(𝑠))𝑣[𝑔(𝑥, 𝜙)] + 𝛽𝜋(𝑠) ∫ 𝑣[𝑔(𝑥, 𝜙) ∨ 𝑢]𝑓(𝑑𝑢)}    (2)
43.3.1 Parameterization
𝑔(𝑥, 𝜙) = 𝐴(𝑥𝜙)^𝛼,    𝜋(𝑠) = √𝑠    and    𝑓 = Beta(2, 2)
Before we solve the model, let’s make some quick calculations that provide intuition on what
the solution should look like.
To begin, observe that the worker has two instruments to build capital and hence wages:
1. investing in capital specific to the current job via 𝜙
2. searching for a new job with a better job-specific capital match via 𝑠
Since wages are 𝑥(1 − 𝑠 − 𝜙), marginal cost of investment via either 𝜙 or 𝑠 is identical.
Our risk-neutral worker should focus on whatever instrument has the highest expected return.
The relative expected return will depend on 𝑥.
For example, suppose first that 𝑥 = 0.05
• If 𝑠 = 1 and 𝜙 = 0, then since 𝑔(𝑥, 𝜙) = 0, taking expectations of (1) gives expected
next period capital equal to 𝜋(𝑠)𝔼𝑢 = 𝔼𝑢 = 0.5.
• If 𝑠 = 0 and 𝜙 = 1, then next period capital is 𝑔(𝑥, 𝜙) = 𝑔(0.05, 1) ≈ 0.23.
Both rates of return are good, but the return from search is better.
Next, suppose that 𝑥 = 0.4
• If 𝑠 = 1 and 𝜙 = 0, then expected next period capital is again 0.5
• If 𝑠 = 0 and 𝜙 = 1, then 𝑔(𝑥, 𝜙) = 𝑔(0.4, 1) ≈ 0.8
Return from investment via 𝜙 dominates expected return from search.
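These back-of-the-envelope returns are easy to verify numerically; here is a quick check, using the values 𝐴 = 1.4 and 𝛼 = 0.6 from the JVWorker defaults below:

A, α = 1.4, 0.6

def g(x, ϕ):
    "Law of motion for job-specific human capital."
    return A * (x * ϕ)**α

print(g(0.05, 1))  # ≈ 0.23: return to investing everything at x = 0.05
print(g(0.4, 1))   # ≈ 0.80: return to investing everything at x = 0.4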
Combining these observations gives us two informal predictions:
1. At any given state 𝑥, the two controls 𝜙 and 𝑠 will function primarily as substitutes —
the worker will focus on whichever instrument has the higher expected return.

2. For sufficiently small 𝑥, search will be preferable to investment in job-specific human
capital. For larger 𝑥, the reverse will be true.
Now let’s turn to implementation, and see if we can match our predictions.
43.4 Implementation
We will set up a class JVWorker that holds the parameters of the model described above
"""
def __init__(self,
A=1.4,
α=0.6,
β=0.96, # Discount factor
# Max of grid is the max of a large quantile value for f and the
# fixed point y = g(y, 1)
ϵ = 1e-4
grid_max = max(A**(1 / (1 - α)), stats.beta(a, b).ppf(1 - ϵ))
# Human capital grid
self.x_grid = np.linspace(ϵ, grid_max, grid_size)
The function operator_factory takes an instance of this class and returns a jitted version
of the Bellman operator T, i.e.

𝑇𝑣(𝑥) = max_{𝑠+𝜙≤1} 𝑤(𝑠, 𝜙)

where

𝑤(𝑠, 𝜙) ∶= 𝑥(1 − 𝑠 − 𝜙) + 𝛽(1 − 𝜋(𝑠))𝑣[𝑔(𝑥, 𝜙)] + 𝛽𝜋(𝑠) ∫ 𝑣[𝑔(𝑥, 𝜙) ∨ 𝑢]𝑓(𝑑𝑢)    (3)
When we represent 𝑣, it will be with a NumPy array v giving values on grid x_grid.
But to evaluate the right-hand side of (3), we need a function, so we replace the arrays v and
x_grid with a function v_func that gives linear interpolation of v on x_grid.
Inside the for loop, for each x in the grid over the state space, we set up the function 𝑤(𝑧) =
𝑤(𝑠, 𝜙) defined in (3).
The function is maximized over all feasible (𝑠, 𝜙) pairs.
Another function, get_greedy returns the optimal choice of 𝑠 and 𝜙 at each 𝑥, given a
value function.
"""
Returns a jitted version of the Bellman operator T
jv is an instance of JVWorker
"""
π, β = jv.π, jv.β
x_grid, ϵ, mc_size = jv.x_grid, jv.ϵ, jv.mc_size
f_rvs, g = jv.f_rvs, jv.g
@njit
def objective(z, x, v):
s, ϕ = z
v_func = lambda x: interp(x_grid, v, x)
integral = 0
for m in range(mc_size):
u = f_rvs[m]
integral += v_func(max(g(x, ϕ), u))
integral = integral / mc_size
return x * (1 - s - ϕ) + β * (1 - π(s)) * v_func(g(x, ϕ)) + β * π(s) * integral
@njit(parallel=parallel_flag)
def T(v):
"""
The Bellman operator
"""
v_new = np.empty_like(v)
for i in prange(len(x_grid)):
x = x_grid[i]
# Search on a grid
search_grid = np.linspace(ϵ, 1, 15)
max_val = -1
for s in search_grid:
for ϕ in search_grid:
current_val = objective((s, ϕ), x, v) if s + ϕ <= 1 else -1
if current_val > max_val:
max_val = current_val
v_new[i] = max_val
return v_new
@njit
def get_greedy(v):
"""
Computes the v-greedy policy of a given function v
"""
s_policy, ϕ_policy = np.empty_like(v), np.empty_like(v)
for i in range(len(x_grid)):
x = x_grid[i]
# Search on a grid
search_grid = np.linspace(ϵ, 1, 15)
max_val = -1
for s in search_grid:
for ϕ in search_grid:
current_val = objective((s, ϕ), x, v) if s + ϕ <= 1 else -1
if current_val > max_val:
max_val = current_val
max_s, max_ϕ = s, ϕ
s_policy[i], ϕ_policy[i] = max_s, max_ϕ
return s_policy, ϕ_policy
return T, get_greedy
To solve the model, we will write a function that uses the Bellman operator and iterates to
find a fixed point.
"""
Solves the model by value function iteration
* jv is an instance of JVWorker
"""
    T, _ = operator_factory(jv, parallel_flag=use_parallel)

    # Set up loop
    v = jv.x_grid * 0.5  # Initial condition
    i = 0
    error = tol + 1
    while i < max_iter and error > tol:
        v_new = T(v)
        error = np.max(np.abs(v - v_new))
        i += 1
        v = v_new
    if i == max_iter:
        print("Failed to converge!")
    return v_new
Let’s generate the optimal policies and see what they look like.
In [6]: jv = JVWorker()
T, get_greedy = operator_factory(jv)
v_star = solve_model(jv)
s_star, ϕ_star = get_greedy(v_star)
axes[-1].set_xlabel("x")
plt.show()
The horizontal axis is the state 𝑥, while the vertical axis gives 𝑠(𝑥) and 𝜙(𝑥).
Overall, the policies match well with our predictions from above
• Worker switches from one investment strategy to the other depending on relative re-
turn.
• For low values of 𝑥, the best option is to search for a new job.
• Once 𝑥 is larger, worker does better by investing in human capital specific to the cur-
rent position.
43.6 Exercises
43.6.1 Exercise 1
Let’s look at the dynamics for the state process {𝑥𝑡 } associated with these policies.
The dynamics are given by (1) when 𝜙𝑡 and 𝑠𝑡 are chosen according to the optimal policies,
and ℙ{𝑏𝑡+1 = 1} = 𝜋(𝑠𝑡 ).
Since the dynamics are random, analysis is a bit subtle.
One way to do it is to plot, for each 𝑥 in a relatively fine grid called plot_grid, a large
number 𝐾 of realizations of 𝑥𝑡+1 given 𝑥𝑡 = 𝑥.
Plot this with one dot for each realization, in the form of a 45 degree diagram, setting
jv = JVWorker(grid_size=25, mc_size=50)
plot_grid_max, plot_grid_size = 1.2, 100
plot_grid = np.linspace(0, plot_grid_max, plot_grid_size)
fig, ax = plt.subplots()
ax.set_xlim(0, plot_grid_max)
ax.set_ylim(0, plot_grid_max)
By examining the plot, argue that under the optimal policies, the state 𝑥𝑡 will converge to a
constant value 𝑥̄ close to unity.
Argue that at the steady state, 𝑠𝑡 ≈ 0 and 𝜙𝑡 ≈ 0.6.
43.6.2 Exercise 2
In the preceding exercise, we found that 𝑠𝑡 converges to zero and 𝜙𝑡 converges to about 0.6.
Since these results were calculated at a value of 𝛽 close to one, let’s compare them to the best
choice for an infinitely patient worker.
Intuitively, an infinitely patient worker would like to maximize steady state wages, which are
a function of steady state capital.
You can take it as given—it’s certainly true—that the infinitely patient worker does not
search in the long run (i.e., 𝑠𝑡 = 0 for large 𝑡).
Thus, given 𝜙, steady state capital is the positive fixed point 𝑥∗ (𝜙) of the map 𝑥 ↦ 𝑔(𝑥, 𝜙).
Steady state wages can be written as 𝑤∗ (𝜙) = 𝑥∗ (𝜙)(1 − 𝜙).
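With the parameterization 𝑔(𝑥, 𝜙) = 𝐴(𝑥𝜙)^𝛼 used here, this fixed point is available in closed form:

$$x = A (x \phi)^{\alpha} \;\Longrightarrow\; x^{1-\alpha} = A \phi^{\alpha} \;\Longrightarrow\; x^*(\phi) = \left(A \phi^{\alpha}\right)^{1/(1-\alpha)}$$

which is exactly what the function xbar in the solution below computes.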
43.7 Solutions
43.7.1 Exercise 1
plt.show()
43.7.2 Exercise 2
In [9]: jv = JVWorker()
def xbar(ϕ):
A, α = jv.A, jv.α
return (A * ϕ**α)**(1 / (1 - α))
plt.show()
Chapter 44

A Problem that Stumped Milton Friedman

44.1 Contents
• Overview 44.2
• Origin of the Problem 44.3
• A Dynamic Programming Approach 44.4
• Implementation 44.5
• Analysis 44.6
• Comparison with Neyman-Pearson Formulation 44.7
Co-author: Chase Coleman
In addition to what’s in Anaconda, this lecture will need the following libraries:
44.2 Overview
This lecture describes a statistical decision problem encountered by Milton Friedman and W.
Allen Wallis during World War II when they were analysts at the U.S. Government’s Statisti-
cal Research Group at Columbia University.
This problem led Abraham Wald [162] to formulate sequential analysis, an approach to
statistical decision problems intimately related to dynamic programming.
In this lecture, we apply dynamic programming algorithms to Friedman and Wallis and
Wald’s problem.
Key ideas in play will be:
• Bayes’ Law
• Dynamic programming
• Type I and type II statistical errors
– a type I error occurs when you reject a null hypothesis that is true
– a type II error is when you accept a null hypothesis that is false
• Abraham Wald’s sequential probability ratio test
• The power of a statistical test
• The critical region of a statistical test
• A uniformly most powerful test
We’ll begin with some imports:
On pages 137-139 of his 1998 book Two Lucky People with Rose Friedman [57], Milton Fried-
man described a problem presented to him and Allen Wallis during World War II, when they
worked at the US Government’s Statistical Research Group at Columbia University.
Let’s listen to Milton Friedman tell us what happened
The standard statistical answer was to specify a number of firings (say 1,000) and
a pair of percentages (e.g., 53% and 47%) and tell the client that if A receives a 1
in more than 53% of the firings, it can be regarded as superior; if it receives a 1 in
fewer than 47%, B can be regarded as superior; if the percentage is between 47%
and 53%, neither can be so regarded.
When Allen Wallis was discussing such a problem with (Navy) Captain Garret L.
Schyler, the captain objected that such a test, to quote from Allen’s account, may
prove wasteful. If a wise and seasoned ordnance officer like Schyler were on the
premises, he would see after the first few thousand or even few hundred [rounds]
that the experiment need not be completed either because the new method is ob-
viously inferior or because it is obviously superior beyond what was hoped for ….
Friedman and Wallis struggled with the problem but, after realizing that they were not able
to solve it, described the problem to Abraham Wald.
That started Wald on the path that led him to Sequential Analysis [162].
We’ll formulate the problem using dynamic programming.
The following presentation of the problem closely follows Dimitri Bertsekas's treatment in
Dynamic Programming and Stochastic Control [20].
A decision-maker observes IID draws of a random variable 𝑧.
He (or she) wants to know which of two probability distributions 𝑓0 or 𝑓1 governs 𝑧.
After a number of draws, also to be determined, he makes a decision as to which of the distri-
butions is generating the draws he observes.
He starts with a prior probability 𝜋_{−1} = ℙ{𝑓 = 𝑓0} and, after observing 𝑘 + 1 draws
𝑧𝑘, 𝑧𝑘−1, … , 𝑧0, holds the posterior

𝜋𝑘 = ℙ{𝑓 = 𝑓0 ∣ 𝑧𝑘 , 𝑧𝑘−1 , … , 𝑧0 }

which is updated recursively by Bayes' law:

$$\pi_{k+1} = \frac{\pi_k f_0(z_{k+1})}{\pi_k f_0(z_{k+1}) + (1 - \pi_k) f_1(z_{k+1})}, \qquad k = -1, 0, 1, \ldots$$
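To get a feel for how this recursion moves beliefs, here is a small illustrative sketch (the helper name bayes_update is ours, not from the lecture code) that applies the update to draws from 𝑓0:

import numpy as np
from scipy.stats import beta

f0, f1 = beta(1, 1), beta(9, 9)        # Candidate densities, as in the figure below

def bayes_update(π, z):
    "One application of the recursion above."
    a, b = π * f0.pdf(z), (1 - π) * f1.pdf(z)
    return a / (a + b)

π = 0.5                                 # Neutral prior
for z in f0.rvs(5, random_state=0):     # Five draws from the true density f0
    π = bayes_update(π, z)
print(π)                                # Posterior probability that f = f0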
After observing 𝑧𝑘, 𝑧𝑘−1, … , 𝑧0, the decision-maker believes that 𝑧𝑘+1 has probability distribution

𝑓_{𝜋𝑘}(𝑣) = 𝜋𝑘 𝑓0(𝑣) + (1 − 𝜋𝑘) 𝑓1(𝑣)

This is a mixture of distributions 𝑓0 and 𝑓1, with the weight on 𝑓0 being the posterior
probability that 𝑓 = 𝑓0 [1].
To help illustrate this kind of distribution, let’s inspect some mixtures of beta distributions.
The density of a beta probability distribution with parameters 𝑎 and 𝑏 is
$$f(z; a, b) = \frac{\Gamma(a + b)\, z^{a-1} (1 - z)^{b-1}}{\Gamma(a)\Gamma(b)} \qquad \text{where} \quad \Gamma(t) := \int_0^{\infty} x^{t-1} e^{-x}\, dx$$
The next figure shows two beta distributions in the top panel.
The bottom panel presents mixtures of these distributions, with various mixing probabilities
𝜋𝑘
def beta_function_factory(a, b):

    @vectorize
def p(x):
r = gamma(a + b) / (gamma(a) * gamma(b))
return r * x**(a-1) * (1 - x)**(b-1)
@njit
def p_rvs():
return np.random.beta(a, b)
return p, p_rvs
f0, _ = beta_function_factory(1, 1)
f1, _ = beta_function_factory(9, 9)
grid = np.linspace(0, 1, 50)
axes[0].set_title("Original Distributions")
axes[0].plot(grid, f0(grid), lw=2, label="$f_0$")
axes[0].plot(grid, f1(grid), lw=2, label="$f_1$")
axes[1].set_title("Mixtures")
for π in 0.25, 0.5, 0.75:
y = π * f0(grid) + (1 - π) * f1(grid)
axes[1].plot(y, lw=2, label=f"$\pi_k$ = {π}")
for ax in axes:
ax.legend()
ax.set(xlabel="$z$ values", ylabel="probability of $z_k$")
plt.tight_layout()
plt.show()
After observing 𝑧𝑘 , 𝑧𝑘−1 , … , 𝑧0 , the decision-maker chooses among three distinct actions:
• He decides that 𝑓 = 𝑓0 and draws no more 𝑧’s
• He decides that 𝑓 = 𝑓1 and draws no more 𝑧’s
• He postpones deciding now and instead chooses to draw a 𝑧𝑘+1
Associated with these three actions, the decision-maker can suffer three kinds of losses:
• A loss 𝐿0 if he decides 𝑓 = 𝑓0 when actually 𝑓 = 𝑓1
• A loss 𝐿1 if he decides 𝑓 = 𝑓1 when actually 𝑓 = 𝑓0
• A cost 𝑐 if he postpones deciding and chooses instead to draw another 𝑧
44.4.3 Intuition
Let’s try to guess what an optimal decision rule might look like before we go further.
Suppose at some given point in time that 𝜋 is close to 1.
Then our prior beliefs and the evidence so far point strongly to 𝑓 = 𝑓0 .
If, on the other hand, 𝜋 is close to 0, then 𝑓 = 𝑓1 is strongly favored.
Finally, if 𝜋 is in the middle of the interval [0, 1], then we have little information in either di-
rection.
This reasoning suggests a decision rule such as the one shown in the figure
As we’ll see, this is indeed the correct form of the decision rule.
The key problem is to determine the threshold values 𝛼, 𝛽, which will depend on the parame-
ters listed above.
You might like to pause at this point and try to predict the impact of a parameter such as 𝑐
or 𝐿0 on 𝛼 or 𝛽.
Let 𝐽 (𝜋) be the total loss for a decision-maker with current belief 𝜋 who chooses optimally.
With some thought, you will agree that 𝐽 should satisfy the Bellman equation

𝐽(𝜋) = min {(1 − 𝜋)𝐿0, 𝜋𝐿1, 𝑐 + 𝔼[𝐽(𝜋′)]}    (1)

where 𝜋′ is the random variable defined by
$$\pi' = \kappa(z', \pi) = \frac{\pi f_0(z')}{\pi f_0(z') + (1 - \pi) f_1(z')} \qquad (2)$$
when 𝜋 is fixed and 𝑧′ is drawn from the current best guess, which is the distribution 𝑓
defined by

𝑓_𝜋(𝑣) = 𝜋𝑓0(𝑣) + (1 − 𝜋)𝑓1(𝑣)

The optimal decision rule is then characterized by two cutoffs 𝛼, 𝛽 ∈ (0, 1) with 𝛽 < 𝛼, and the prescription

accept 𝑓 = 𝑓0 if 𝜋 ≥ 𝛼
accept 𝑓 = 𝑓1 if 𝜋 ≤ 𝛽
draw another 𝑧 if 𝛽 ≤ 𝜋 ≤ 𝛼
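As a sketch, this threshold rule maps directly into code (the function is purely illustrative; the cutoffs 𝛼 and 𝛽 themselves are computed below by find_cutoff_rule):

def decide(π, α, β):
    "Threshold decision rule for the sequential testing problem."
    if π >= α:
        return "accept f0"
    elif π <= β:
        return "accept f1"
    else:
        return "draw again"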
Our aim is to compute the value function 𝐽 , and from it the associated cutoffs 𝛼 and 𝛽.
To make our computations simpler, using (2), we can write the continuation value ℎ(𝜋) as

ℎ(𝜋) = 𝑐 + ∫ min{(1 − 𝜅(𝑧′, 𝜋))𝐿0, 𝜅(𝑧′, 𝜋)𝐿1, ℎ(𝜅(𝑧′, 𝜋))} 𝑓𝜋(𝑧′) 𝑑𝑧′    (4)

This equality can be read as a functional equation in the unknown function ℎ, and we will
solve it by iterating on the operator that its right-hand side defines.
44.5 Implementation
class WaldFriedman:

    def __init__(self,
c=1.25, # Cost of another draw
a0=1,
b0=1,
a1=3,
b1=1.2,
L0=25, # Cost of selecting f0 when f1 is true
L1=25, # Cost of selecting f1 when f0 is true
π_grid_size=200,
mc_size=1000):
# Set up distributions
self.f0, self.f0_rvs = beta_function_factory(a0, b0)
self.f1, self.f1_rvs = beta_function_factory(a1, b1)
"""
Returns a jitted version of the Q operator.
@njit
def κ(z, π):
"""
Updates π using Bayes' rule and the current observation z
"""
π_f0, π_f1 = π * f0(z), (1 - π) * f1(z)
π_new = π_f0 / (π_f0 + π_f1)
return π_new
@njit(parallel=parallel_flag)
def Q(h):
h_new = np.empty_like(π_grid)
h_func = lambda p: interp(π_grid, h, p)
for i in prange(len(π_grid)):
    π = π_grid[i]
    # Monte Carlo integration: average the minimal loss over
    # belief updates generated by draws from f0 and from f1
    integral_f0, integral_f1 = 0.0, 0.0
    for m in range(mc_size):
        π0 = κ(f0_rvs(), π)
        π1 = κ(f1_rvs(), π)
        integral_f0 += min((1 - π0) * L0, π0 * L1, h_func(π0))
        integral_f1 += min((1 - π1) * L0, π1 * L1, h_func(π1))
    integral = (π * integral_f0 + (1 - π) * integral_f1) / mc_size
    h_new[i] = c + integral
return h_new
return Q
To solve the model, we will iterate using Q to find the fixed point
def solve_model(wf, use_parallel=True, tol=1e-4, max_iter=1000):
    """
    Compute the continuation value function

    * wf is an instance of WaldFriedman
    """
    Q = operator_factory(wf, parallel_flag=use_parallel)

    # Set up loop
    h = np.zeros(len(wf.π_grid))
    i = 0
    error = tol + 1
    while i < max_iter and error > tol:
        h_new = Q(h)
        error = np.max(np.abs(h - h_new))
        i += 1
        h = h_new
    if i == max_iter:
        print("Failed to converge!")
    else:
        print(f"Converged in {i} iterations.")
    return h_new
44.6 Analysis
In [7]: wf = WaldFriedman()
        h_star = solve_model(wf)  # Solve the model
plt.show()
Converged in 26 iterations.
We will also set up a function to compute the cutoffs 𝛼 and 𝛽 and plot these on our value
function plot
"""
This function takes a continuation value function and returns the
corresponding cutoffs of where you transition between continuing and
choosing a specific model
"""
π_grid = wf.π_grid
L0, L1 = wf.L0, wf.L1
return (β, α)
β, α = find_cutoff_rule(wf, h_star)
cost_L0 = (1 - wf.π_grid) * wf.L0
cost_L1 = wf.π_grid * wf.L1
plt.legend(borderpad=1.1)
plt.show()
44.6.2 Simulations
The next figure shows the outcomes of 500 simulations of the decision process.
On the left is a histogram of the stopping times, which equal the number of draws of 𝑧𝑘 re-
quired to make a decision.
The average number of draws is around 6.6.
On the right is the fraction of correct decisions at the stopping time.
In this case, the decision-maker is correct 80% of the time
"""
This function takes an initial condition and simulates until it
stops (when a decision is made)
"""
return π_new
if true_dist == "f0":
f, f_rvs = wf.f0, wf.f0_rvs
elif true_dist == "f1":
f, f_rvs = wf.f1, wf.f1_rvs
# Find cutoffs
β, α = find_cutoff_rule(wf, h_star)
if true_dist == "f0":
if decision == 0:
correct = True
else:
correct = False
return correct, π, t
"""
Simulates repeatedly to get distributions of time needed to make a
decision and how often they are correct
"""
for i in range(ndraws):
correct, π, t = simulate(wf, true_dist, h_star)
tdist[i] = t
cdist[i] = correct
return cdist, tdist
def simulation_plot(wf):
h_star = solve_model(wf)
ndraws = 500
cdist, tdist = stopping_dist(wf, h_star, ndraws)
fig, ax = plt.subplots(1, 2, figsize=(14, 5))
ax[0].hist(tdist, bins=np.max(tdist))
ax[0].set_title(f"Stopping times over {ndraws} replications")
ax[0].set(xlabel="time", ylabel="number of stops")
ax[0].annotate(f"mean = {np.mean(tdist)}", xy=(max(tdist) / 2,
max(np.histogram(tdist, bins=max(tdist))[0]) / 2))
ax[1].hist(cdist.astype(int), bins=2)
ax[1].set_title(f"Correct decisions over {ndraws} replications")
ax[1].annotate(f"% correct = {np.mean(cdist)}",
xy=(0.05, ndraws / 2))
plt.show()
simulation_plot(wf)
Converged in 26 iterations.
In [11]: wf = WaldFriedman(c=2.5)
simulation_plot(wf)
Converged in 13 iterations.
The increased cost per draw has induced the decision-maker to take fewer draws before deciding.

Because he decides on the basis of less information, the percentage of time he is correct drops.

This leads to him having a higher expected loss when he puts equal weight on both models.
To facilitate comparative statics, we provide a Jupyter notebook that generates the same
plots, but with sliders.
With these sliders, you can adjust parameters and immediately observe
• effects on the smoothness of the value function in the indecisive middle range as we in-
crease the number of grid points in the piecewise linear approximation.
• effects of different settings for the cost parameters 𝐿0 , 𝐿1 , 𝑐, the parameters of two beta
distributions 𝑓0 and 𝑓1 , and the number of points and linear functions 𝑚 to use in the
piece-wise continuous approximation to the value function.
• various simulations from 𝑓0 and associated distributions of waiting times to making a
decision.
• associated histograms of correct and incorrect decisions.
For several reasons, it is useful to describe the theory underlying the test that Navy Captain
G. S. Schuyler had been told to use and that led him to approach Milton Friedman and Allen
Wallis to convey his conjecture that superior practical procedures existed.

Evidently, the Navy had told Captain Schuyler to use what it knew to be a state-of-the-art
Neyman-Pearson test.
As a basis for choosing among critical regions the following considerations have
been advanced by Neyman and Pearson: In accepting or rejecting 𝐻0 we may
commit errors of two kinds. We commit an error of the first kind if we reject 𝐻0
when it is true; we commit an error of the second kind if we accept 𝐻0 when 𝐻1
is true. After a particular critical region 𝑊 has been chosen, the probability of
committing an error of the first kind, as well as the probability of committing an
error of the second kind, is uniquely determined.
Let's listen carefully to how Wald applies the law of large numbers to interpret 𝛼 and 𝛽:
The quantity 𝛼 is called the size of the critical region, and the quantity 1 − 𝛽 is called the
power of the critical region.
Wald notes that
one critical region 𝑊 is more desirable than another if it has smaller values of 𝛼
and 𝛽. Although either 𝛼 or 𝛽 can be made arbitrarily small by a proper choice of
the critical region 𝑊, it is impossible to make both 𝛼 and 𝛽 arbitrarily small for a
fixed value of 𝑛, i.e., a fixed sample size.
Neyman and Pearson show that a region consisting of all samples (𝑧1 , 𝑧2 , … , 𝑧𝑛 )
which satisfy the inequality
$$\frac{f_1(z_1) \cdots f_1(z_n)}{f_0(z_1) \cdots f_0(z_n)} \geq k$$
is a most powerful critical region for testing the hypothesis 𝐻0 against the alternative hy-
pothesis 𝐻1 . The term 𝑘 on the right side is a constant chosen so that the region will have
the required size 𝛼.
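A minimal sketch of such a fixed-sample-size likelihood ratio test (an illustration of the idea, assuming vectorized densities f0 and f1 like those built by beta_function_factory above, and a given critical value k):

import numpy as np

def lr_test(z_sample, f0, f1, k):
    """
    Reject H0 when the likelihood ratio of the sample
    exceeds the critical value k.
    """
    likelihood_ratio = np.prod(f1(z_sample) / f0(z_sample))
    return likelihood_ratio >= k  # True means 'reject H0'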
Wald goes on to discuss Neyman and Pearson’s concept of uniformly most powerful test.
Here is how Wald introduces the notion of a sequential test
A rule is given for making one of the following three decisions at any stage of the
experiment (at the m-th trial for each integral value of m): (1) to accept the hypothesis
H, (2) to reject the hypothesis H, (3) to continue the experiment by
making an additional observation. Thus, such a test procedure is carried out sequentially.
On the basis of the first observation, one of the aforementioned decisions
is made. If the first or second decision is made, the process is terminated. If
the third decision is made, a second trial is performed. Again, on the basis of the
first two observations, one of the three decisions is made. If the third decision is
made, a third trial is performed, and so on. The process is continued until either
the first or the second decision is made. The number n of observations required
by such a test procedure is a random variable, since the value of n depends on the
outcome of the observations.
Footnotes
[1] Because the decision-maker believes that 𝑧𝑘+1 is drawn from a mixture of two IID distributions, he does not believe that the sequence [𝑧𝑘+1, 𝑧𝑘+2, …] is IID. Instead, he believes that
it is exchangeable. See [100], chapter 11, for a discussion of exchangeability.
Chapter 45

Optimal Growth I: The Stochastic Optimal Growth Model
45.1 Contents
• Overview 45.2
• The Model 45.3
• Computation 45.4
• Exercises 45.5
• Solutions 45.6
45.2 Overview
In this lecture, we’re going to study a simple optimal growth model with one agent.
The model is a version of the standard one sector infinite horizon growth model studied in
• [149], chapter 2
• [108], section 3.1
• EDTC, chapter 1
• [153], chapter 12
The technique we use to solve the model is dynamic programming.
Our treatment of dynamic programming follows on from earlier treatments in our lectures on
shortest paths and job search.
We’ll discuss some of the technical details of dynamic programming as we go along.
45.2.1 Code
Regarding code, our implementation in this lecture will focus on clarity and flexibility.
Both of these things are nice, particularly for those readers still trying to understand the ma-
terial, but they do cost us some speed — as you will see when you run the code.
In the next lecture we will sacrifice some of this clarity and flexibility in order to accelerate
our code with just-in-time compilation.
%matplotlib inline
𝑘𝑡+1 + 𝑐𝑡 ≤ 𝑦𝑡 (1)
In what follows,
• The sequence {𝜉𝑡 } is assumed to be IID.
• The common distribution of each 𝜉𝑡 will be denoted 𝜙.
• The production function 𝑓 is assumed to be increasing and continuous.
• Depreciation of capital is not made explicit but can be incorporated into the production
function.
While many other treatments of the stochastic growth model use 𝑘𝑡 as the state variable, we
will use 𝑦𝑡 .
This will allow us to treat a stochastic model while maintaining only one state variable.
We consider alternative states and timing specifications in some of our other lectures.
45.3.2 Optimization
𝔼 [∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝑐𝑡)]    (2)
subject to

𝑦_{𝑡+1} = 𝑓(𝑦𝑡 − 𝑐𝑡) 𝜉_{𝑡+1}    and    0 ≤ 𝑐𝑡 ≤ 𝑦𝑡    for all 𝑡    (3)

where
• 𝑢 is a bounded, continuous and strictly increasing utility function and
• 𝛽 ∈ (0, 1) is a discount factor.
In (3) we are assuming that the resource constraint (1) holds with equality — which is rea-
sonable because 𝑢 is strictly increasing and no output will be wasted at the optimum.
In summary, the agent’s aim is to select a path 𝑐0 , 𝑐1 , 𝑐2 , … for consumption that is
1. nonnegative,

2. feasible in the sense that the resource constraint (1) is obeyed,
3. optimal, in the sense that it maximizes (2) relative to all other feasible consumption
sequences, and
4. adapted, in the sense that the action 𝑐𝑡 depends only on observable outcomes, not on
future outcomes such as 𝜉𝑡+1 .
One way to think about solving this problem is to look for the best policy function.
A policy function is a map from past and present observables into current action.
We’ll be particularly interested in Markov policies, which are maps from the current state
𝑦𝑡 into a current action 𝑐𝑡 .
For dynamic programming problems such as this one (in fact for any Markov decision pro-
cess), the optimal policy is always a Markov policy.
In other words, the current state 𝑦𝑡 provides a sufficient statistic for the history in terms of
making an optimal decision today.
This is quite intuitive but if you wish you can find proofs in texts such as [149] (section 4.1).
Hereafter we focus on finding the best Markov policy.
In our context, a Markov policy is a function 𝜎 ∶ ℝ+ → ℝ+ , with the understanding that states
are mapped to actions via
𝑐𝑡 = 𝜎(𝑦𝑡)    for all 𝑡

In what follows, we will call 𝜎 a feasible consumption policy if it satisfies

0 ≤ 𝜎(𝑦) ≤ 𝑦    for all 𝑦 ∈ ℝ_+    (4)
In other words, a feasible consumption policy is a Markov policy that respects the resource
constraint.
The set of all feasible consumption policies will be denoted by Σ.
Each 𝜎 ∈ Σ determines a continuous state Markov process {𝑦𝑡} for output via

𝑦_{𝑡+1} = 𝑓(𝑦𝑡 − 𝜎(𝑦𝑡)) 𝜉_{𝑡+1},    𝑦0 given    (5)
This is the time path for output when we choose and stick with the policy 𝜎.
We insert this process into the objective function to get
𝔼 [∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝑐𝑡)] = 𝔼 [∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝜎(𝑦𝑡))]    (6)
This is the total expected present value of following policy 𝜎 forever, given initial income 𝑦0 .
The aim is to select a policy that makes this number as large as possible.
The next section covers these ideas more formally.
45.3.4 Optimality
The value 𝑣𝜎(𝑦) of following policy 𝜎 forever, given initial income 𝑦0 = 𝑦, is

𝑣𝜎(𝑦) = 𝔼 [∑_{𝑡=0}^{∞} 𝛽^𝑡 𝑢(𝜎(𝑦𝑡))]    (7)
The value function is then defined as

𝑣^∗(𝑦) ∶= sup_{𝜎∈Σ} 𝑣𝜎(𝑦)    (8)

The value function gives the maximal value that can be obtained from state 𝑦, after considering all feasible policies.

A policy 𝜎 ∈ Σ is called optimal if it attains the supremum in (8) for all 𝑦 ∈ ℝ_+.
With our assumptions on utility and production function, the value function as defined in (8)
also satisfies a Bellman equation.
For this problem, the Bellman equation takes the form

𝑣(𝑦) = max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧)}    (9)
The primary importance of the value function is that we can use it to compute optimal poli-
cies.
The details are as follows.
Given a continuous function 𝑣 on ℝ_+, we say that 𝜎 ∈ Σ is 𝑣-greedy if 𝜎(𝑦) is a solution to

max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧)}    (10)

for every 𝑦 ∈ ℝ_+.
In other words, 𝜎 ∈ Σ is 𝑣-greedy if it optimally trades off current and future rewards when 𝑣
is taken to be the value function.
In our setting, we have the following key result: a feasible consumption policy is optimal if
and only if it is 𝑣^∗-greedy.
The intuition is similar to the intuition for the Bellman equation, which was provided after
(9).
See, for example, theorem 10.1.11 of EDTC.
Hence, once we have a good approximation to 𝑣∗ , we can compute the (approximately) opti-
mal policy by computing the corresponding greedy policy.
The advantage is that we are now solving a much lower dimensional optimization problem.
The Bellman operator is the map 𝑇 sending a continuous bounded function 𝑣 into

𝑇𝑣(𝑦) = max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧)}    (11)

In other words, 𝑇 sends the function 𝑣 into the new function 𝑇𝑣 defined by (11).
By construction, the set of solutions to the Bellman equation (9) exactly coincides with the
set of fixed points of 𝑇 .
For example, if 𝑇𝑣 = 𝑣, then, for any 𝑦 ≥ 0,

𝑣(𝑦) = max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧)}

which says precisely that 𝑣 solves the Bellman equation.
One can also show that 𝑇 is a contraction mapping on the set of continuous bounded functions on ℝ_+ under the supremum distance

𝜌(𝑔, ℎ) = sup_{𝑦≥0} |𝑔(𝑦) − ℎ(𝑦)|
It’s not too hard to show that a 𝑣∗ -greedy policy exists (see EDTC, theorem 10.1.11 if you
get stuck).
Hence at least one optimal policy exists.
Our problem now is how to compute it.
The results stated above assume that the utility function is bounded.
In practice economists often work with unbounded utility functions — and so will we.
In the unbounded setting, various optimality theories exist.
Unfortunately, they tend to be case-specific, as opposed to valid for a large range of applica-
tions.
Nevertheless, their main conclusions are usually in line with those stated for the bounded case
just above (as long as we drop the word “bounded”).
Consult, for example, section 12.2 of EDTC, [95] or [114].
45.4 Computation
Let’s now look at computing the value function and the optimal policy.
We will use fitted value function iteration, which was described in detail in a previous lecture.
The algorithm will be
1. Begin with an array of values {𝑣1 , … , 𝑣𝐼 } representing the values of some initial function
𝑣 on the grid points {𝑦1 , … , 𝑦𝐼 }.
2. Build a function 𝑣̂ on the state space ℝ_+ by linear interpolation, based on these data
points.

3. Obtain and record the value 𝑇𝑣̂(𝑦𝑖) on each grid point 𝑦𝑖 by repeatedly solving (11).

4. Unless some stopping condition is satisfied, set {𝑣1, … , 𝑣𝐼} = {𝑇𝑣̂(𝑦1), … , 𝑇𝑣̂(𝑦𝐼)} and
go to step 2.
To maximize the right hand side of the Bellman equation, we are going to use the
minimize_scalar routine from SciPy.
Since we are maximizing rather than minimizing, we will use the fact that the maximizer of 𝑔
on the interval [𝑎, 𝑏] is the minimizer of −𝑔 on the same interval.
To this end, and to keep the interface tidy, we will wrap minimize_scalar in an outer
function as follows:
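(The lecture's exact wrapper is not shown in this extract; a minimal version under the same design — flipping the sign so that a bounded scalar minimizer performs the maximization — is:)

from scipy.optimize import minimize_scalar

def maximize(g, a, b, args):
    """
    Maximize the function g over the interval [a, b].

    The maximizer of g on any interval is also the minimizer
    of -g; args collects any extra arguments to g.
    Returns the maximizer and the maximal value.
    """
    objective = lambda x: -g(x, *args)
    result = minimize_scalar(objective, bounds=(a, b), method='bounded')
    maximizer, maximum = result.x, -result.fun
    return maximizer, maximum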
We will assume for now that 𝜙 is the distribution of exp(𝜇 + 𝑠𝜁) when 𝜁 is standard normal.
We will store this and other primitives of the optimal growth model in a class.
The class, defined below, combines both parameters and a method that realizes the right
hand side of the Bellman equation (9).
class OptimalGrowthModel:

    def __init__(self,
u, # utility function
f, # production function
β=0.96, # discount factor
μ=0, # shock location parameter
s=0.1, # shock scale parameter
grid_max=4,
grid_size=120,
shock_size=250,
seed=1234):
# Set up grid
self.grid = np.linspace(1e-5, grid_max, grid_size)
v = interp1d(self.grid, v_array)
In the last line, the expectation in (11) is computed via Monte Carlo, using the approxima-
tion
∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧) ≈ (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑣(𝑓(𝑦 − 𝑐)𝜉𝑖)
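In code, with shocks an array holding the draws 𝜉1, … , 𝜉𝑛, this approximation is essentially a one-liner; as an illustrative sketch:

import numpy as np

def compute_expectation(v, f, y, c, shocks):
    "Monte Carlo estimate of the integral above, given callables v and f."
    return np.mean(v(f(y - c) * shocks))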
def T(og, v):
    """
    The Bellman operator.

    * og is an instance of OptimalGrowthModel
    * v is an array representing a guess of the value function
    """
v_new = np.empty_like(v)
v_greedy = np.empty_like(v)
for i in range(len(grid)):
y = grid[i]
45.4.4 An Example
For this particular problem, an exact analytical solution is available (see [108], section 3.1.2),
with
𝜎∗ (𝑦) = (1 − 𝛼𝛽)𝑦
It is valuable to have these closed-form solutions because it lets us check whether our code
works for this particular case.
In Python, the functions above can be expressed as
Next let’s create an instance of the model with the above primitives and assign it to the vari-
able og.
In [6]: α = 0.4
def fcd(k):
return k**α
og = OptimalGrowthModel(u=np.log, f=fcd)
Now let’s see what happens when we apply our Bellman operator to the exact solution 𝑣∗ in
this case.
In theory, since 𝑣∗ is a fixed point, the resulting function should again be 𝑣∗ .
In practice, we expect some small numerical error
fig, ax = plt.subplots()
ax.set_ylim(-35, -24)
ax.plot(grid, v, lw=2, alpha=0.6, label='$Tv^*$')
ax.plot(grid, v_init, lw=2, alpha=0.6, label='$v^*$')
ax.legend()
plt.show()
The two functions are essentially indistinguishable, so we are off to a good start.
Now let’s have a look at iterating with the Bellman operator, starting off from an arbitrary
initial condition.
The initial condition we’ll start with is, somewhat arbitrarily, 𝑣(𝑦) = 5 ln(𝑦)
fig, ax = plt.subplots()
ax.plot(grid, v, color=plt.cm.jet(0),
lw=2, alpha=0.6, label='Initial condition')
for i in range(n):
v_greedy, v = T(og, v) # Apply the Bellman operator
ax.plot(grid, v, color=plt.cm.jet(i / n), lw=2, alpha=0.6)
ax.legend()
ax.set(ylim=(-40, 10), xlim=(np.min(grid), np.max(grid)))
plt.show()
The figure shows the first 36 functions generated by the fitted value function iteration
algorithm, with hotter colors given to higher iterates.
We can write a function that iterates until the difference is below a particular tolerance level.
def solve_model(og, tol=1e-4, max_iter=1000):
    # Set up loop
    v = np.log(og.grid)  # Initial condition
    i = 0
    error = tol + 1
    while i < max_iter and error > tol:
        v_greedy, v_new = T(og, v)  # Apply the Bellman operator
        error = np.max(np.abs(v - v_new))
        i += 1
        v = v_new
    if i == max_iter:
        print("Failed to converge!")
    return v_greedy, v_new
ax.legend()
ax.set_ylim(-35, -24)
plt.show()
ax.legend()
plt.show()
The figure shows that we’ve done a good job in this instance of approximating the true pol-
icy.
45.5 Exercises
45.5.1 Exercise 1
A common choice for utility function in this kind of work is the CRRA specification
𝑢(𝑐) = (𝑐^{1−𝛾} − 1) / (1 − 𝛾)
Maintaining the other defaults, including the Cobb-Douglas production function, solve the
optimal growth model with this utility specification.
In doing so,
• Set 𝛾 = 1.5.
• Use the solve_model function defined above.
• Time how long this function takes to run, so we can compare it to faster code developed
in the next lecture
45.6 Solutions
45.6.1 Exercise 1
γ = 1.5

def u_crra(c):
return (c**(1 - γ) - 1) / (1 - γ)
og = OptimalGrowthModel(u=u_crra, f=fcd)
In [14]: %%time
v_greedy, v_solution = solve_model(og)
Let’s plot the policy function just to see what it looks like:
ax.legend()
plt.show()
Chapter 46

Optimal Growth II: Accelerating the Code with Numba
46.1 Contents
• Overview 46.2
• The Model 46.3
• Computation 46.4
• Exercises 46.5
• Solutions 46.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
46.2 Overview
In a previous lecture, we studied a stochastic optimal growth model with one representative
agent.
We solved the model using dynamic programming.
In writing our code, we focused on clarity and flexibility.
These are good things but there’s often a trade-off between flexibility and speed.
The reason is that, when code is less flexible, we can exploit structure more easily.
(This is true about algorithms and mathematical problems more generally: more specific
problems have more structure, which, with some thought, can be exploited for better results.)
So, in this lecture, we are going to accept less flexibility while gaining speed, using just-in-
time compilation to accelerate our code.
Let’s start with some imports:
%matplotlib inline
𝑢(𝑐) = (𝑐^{1−𝛾} − 1) / (1 − 𝛾)
46.4 Computation
As before, we will store the primitives of the optimal growth model in a class.
But now we are going to use Numba’s @jitclass decorator to target our class for JIT com-
pilation.
Because we are going to use Numba to compile our class, we need to specify the types of the
data:
In [3]: opt_growth_data = [
('α', float64), # Production parameter
('β', float64), # Discount factor
('μ', float64), # Shock location parameter
('γ', float64), # Preference parameter
('s', float64), # Shock scale parameter
('grid', float64[:]), # Grid (array)
('shocks', float64[:]) # Shock draws (array)
]
In [4]: @jitclass(opt_growth_data)
class OptimalGrowthModel:
def __init__(self,
α=0.4,
β=0.96,
μ=0,
s=0.1,
γ=1.5,
grid_max=4,
grid_size=120,
shock_size=250,
seed=1234):
# Set up grid
self.grid = np.linspace(1e-5, grid_max, grid_size)
In [5]: @jit(nopython=True)
def T(og, v):
"""
The Bellman operator.
* og is an instance of OptimalGrowthModel
* v is an array representing a guess of the value function
"""
v_new = np.empty_like(v)
for i in range(len(og.grid)):
y = og.grid[i]
return v_new
Here’s another function, very similar to the last, that computes a 𝑣-greedy policy:
In [6]: @jit(nopython=True)
def get_greedy(og, v):
"""
Compute a v-greedy policy.
* og is an instance of OptimalGrowthModel
* v is an array representing a guess of the value function
"""
v_greedy = np.empty_like(v)
for i in range(len(og.grid)):
y = og.grid[i]
return v_greedy
The last two functions could be merged, as they were in our previous implementation, but we
resisted doing so to increase efficiency.
Here’s a function that iterates from a starting guess of the value function until the difference
between successive iterates is below a particular tolerance level.
def solve_model(og, tol=1e-4, max_iter=1000):
    # Set up loop
    v = np.log(og.grid)  # Initial condition
    i = 0
    error = tol + 1
    while i < max_iter and error > tol:
        v_new = T(og, v)
        error = np.max(np.abs(v - v_new))
        i += 1
        v = v_new
    if i == max_iter:
        print("Failed to converge!")
    return v_new
In [8]: og = OptimalGrowthModel()
Now we call solve_model, using the %%time magic to check how long it takes.
In [9]: %%time
v_solution = solve_model(og)
You will notice that this is much faster than our original implementation.
Let’s plot the resulting policy:
fig, ax = plt.subplots()
ax.legend(loc='lower right')
plt.show()
46.5 Exercises
46.5.1 Exercise 1
The next figure shows a simulation of 100 elements of this sequence for three different dis-
count factors (and hence three different policies)
46.6 Solutions
46.6.1 Exercise 1
og = OptimalGrowthModel(β=β, s=0.05)
v_solution = solve_model(og)
v_greedy = get_greedy(og, v_solution)
ax.legend(loc='lower right')
plt.show()
Converged in 44 iterations.
Error at iteration 25 is 0.20961181523261985.
Error at iteration 50 is 0.008387575216147525.
Error at iteration 75 is 0.0006017314226482995.
Converged in 93 iterations.
Error at iteration 25 is 1.636332251620388.
Error at iteration 50 is 0.5549102065497209.
Error at iteration 75 is 0.3346444091976082.
Error at iteration 100 is 0.20194598162470356.
Error at iteration 125 is 0.12186727717256929.
Error at iteration 150 is 0.07354260348984099.
Error at iteration 175 is 0.04438036734114803.
Error at iteration 200 is 0.026781986385216783.
Error at iteration 225 is 0.016161984176847.
Error at iteration 250 is 0.009753187414219155.
Error at iteration 275 is 0.005885704607351272.
Error at iteration 300 is 0.003551815140596659.
Error at iteration 325 is 0.0021433951642393367.
Error at iteration 350 is 0.00129346338508185.
Error at iteration 375 is 0.0007805595314920311.
Error at iteration 400 is 0.0004710401467207248.
Error at iteration 425 is 0.0002842561149236644.
Error at iteration 450 is 0.00017153853966078714.
Error at iteration 475 is 0.00010351745852688055.
Chapter 47

Optimal Growth III: Time Iteration

47.1 Contents
• Overview 47.2
• The Euler Equation 47.3
• Comparison with Value Function Iteration 47.4
• Implementation 47.5
• Exercises 47.6
• Solutions 47.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
47.2 Overview
In this lecture, we’ll continue our earlier study of the stochastic optimal growth model.
In that lecture, we solved the associated discounted dynamic programming problem using
value function iteration.
The beauty of this technique is its broad applicability.
With numerical problems, however, we can often attain higher efficiency in specific applica-
tions by deriving methods that are carefully tailored to the application at hand.
The stochastic optimal growth model has plenty of structure to exploit for this purpose, espe-
cially when we adopt some concavity and smoothness assumptions over primitives.
We’ll use this structure to obtain an Euler equation based method that’s more efficient
than value function iteration for this and some other closely related applications.
In a subsequent lecture, we’ll see that the numerical implementation part of the Euler equa-
tion method can be further adjusted to obtain even more efficiency.
Let’s start with some imports:
Let's take the model set out in the stochastic growth model lecture and add the assumptions
that

• 𝑢 and 𝑓 are continuously differentiable and strictly concave
• 𝑓(0) = 0
• lim_{𝑐→0} 𝑢′(𝑐) = ∞ and lim_{𝑐→∞} 𝑢′(𝑐) = 0
• lim_{𝑘→0} 𝑓′(𝑘) = ∞ and lim_{𝑘→∞} 𝑓′(𝑘) = 0
Under these assumptions, the value function is differentiable and the optimal policy 𝜎^∗ is interior, with

(𝑣^∗)′(𝑦) = 𝑢′(𝜎^∗(𝑦))    (2)

The last result is called the envelope condition due to its relationship with the envelope
theorem.

To see why (2) might be valid, write the Bellman equation in the equivalent form

𝑣^∗(𝑦) = max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣^∗(𝑓(𝑦 − 𝑐)𝑧) 𝜙(𝑑𝑧)}

differentiate with respect to 𝑦, and evaluate at the optimum. The first-order condition for an
interior choice of 𝑐 is

𝑢′(𝑐) = 𝛽 ∫ (𝑣^∗)′(𝑓(𝑦 − 𝑐)𝑧) 𝑓′(𝑦 − 𝑐) 𝑧 𝜙(𝑑𝑧)    (3)
Combining (2) and the first-order condition (3) gives the famous Euler equation

(𝑢′ ∘ 𝜎^∗)(𝑦) = 𝛽 ∫ (𝑢′ ∘ 𝜎^∗)(𝑓(𝑦 − 𝜎^∗(𝑦))𝑧) 𝑓′(𝑦 − 𝜎^∗(𝑦)) 𝑧 𝜙(𝑑𝑧)    (4)

which can be viewed as the functional equation

(𝑢′ ∘ 𝜎)(𝑦) = 𝛽 ∫ (𝑢′ ∘ 𝜎)(𝑓(𝑦 − 𝜎(𝑦))𝑧) 𝑓′(𝑦 − 𝜎(𝑦)) 𝑧 𝜙(𝑑𝑧)    (5)

over interior consumption policies 𝜎, one solution of which is the optimal policy 𝜎^∗.

Our aim is to solve the functional equation (5) and hence obtain 𝜎^∗.
Just as we introduced the Bellman operator to solve the Bellman equation, we will now intro-
duce an operator over policies to help us solve the Euler equation.
This operator 𝐾 will act on the set of all 𝜎 ∈ Σ that are continuous, strictly increasing and
interior (i.e., 0 < 𝜎(𝑦) < 𝑦 for all strictly positive 𝑦).

Henceforth we denote this set of policies by 𝒫.

The operator 𝐾

1. takes as its argument a 𝜎 ∈ 𝒫 and

2. returns a new function 𝐾𝜎, where 𝐾𝜎(𝑦) is the 𝑐 ∈ (0, 𝑦) that solves

𝑢′(𝑐) = 𝛽 ∫ (𝑢′ ∘ 𝜎)(𝑓(𝑦 − 𝑐)𝑧) 𝑓′(𝑦 − 𝑐) 𝑧 𝜙(𝑑𝑧)    (6)
We call this operator the Coleman-Reffett operator to acknowledge the work of [37] and
[130].
In essence, 𝐾𝜎 is the consumption policy that the Euler equation tells you to choose today
when your future consumption policy is 𝜎.
The important thing to note about 𝐾 is that, by construction, its fixed points coincide with
solutions to the functional equation (5).
In particular, the optimal policy 𝜎∗ is a fixed point.
Indeed, for fixed 𝑦, the value 𝐾𝜎^∗(𝑦) is the 𝑐 that solves

𝑢′(𝑐) = 𝛽 ∫ (𝑢′ ∘ 𝜎^∗)(𝑓(𝑦 − 𝑐)𝑧) 𝑓′(𝑦 − 𝑐) 𝑧 𝜙(𝑑𝑧)

In view of the Euler equation (4), this is exactly 𝜎^∗(𝑦).
How does Euler equation time iteration compare with value function iteration?
Both can be used to compute the optimal policy, but is one faster or more accurate?
There are two parts to this story.
First, on a theoretical level, the two methods are essentially isomorphic.
In particular, they converge at the same rate.
We’ll prove this in just a moment.
The other side of the story is the accuracy of the numerical implementation.
It turns out that, once we actually implement these two routines, time iteration is more accu-
rate than value function iteration.
More on this below.
𝜏 ∘𝑔 =ℎ∘𝜏
𝑔 = 𝜏 −1 ∘ ℎ ∘ 𝜏 (8)
Here’s a similar figure that traces out the action of the maps on a point 𝑥 ∈ 𝑋
𝑔𝑛 = 𝜏 −1 ∘ ℎ𝑛 ∘ 𝜏
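This iteration rule follows by a simple induction: if 𝑔^{𝑛−1} = 𝜏^{−1} ∘ ℎ^{𝑛−1} ∘ 𝜏, then

$$g^{n} = g \circ g^{n-1} = (\tau^{-1} \circ h \circ \tau) \circ (\tau^{-1} \circ h^{n-1} \circ \tau) = \tau^{-1} \circ h^{n} \circ \tau$$

since the inner 𝜏 ∘ 𝜏^{−1} cancels.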
A Bijection
Let 𝒱 be all strictly concave, continuously differentiable functions 𝑣 mapping ℝ+ to itself and
satisfying 𝑣(0) = 0 and 𝑣′ (𝑦) > 𝑢′ (𝑦) for all positive 𝑦.
For 𝑣 ∈ 𝒱 let

𝑀𝑣 ∶= ℎ ∘ 𝑣′    where    ℎ ∶= (𝑢′)^{−1}

It can be shown that 𝑀 is a bijection from 𝒱 to 𝒫.
Commutative Operators
It is an additional solved exercise (see below) to show that 𝑇 and 𝐾 commute under 𝑀 , in
the sense that
𝑀 ∘𝑇 =𝐾 ∘𝑀 (9)
𝑇 𝑛 = 𝑀 −1 ∘ 𝐾 𝑛 ∘ 𝑀
47.5 Implementation
We’ve just shown that the operators 𝑇 and 𝐾 have the same rate of convergence.
However, it turns out that, once numerical approximation is taken into account, significant
differences arise.
In particular, the image of policy functions under 𝐾 can be calculated faster and with greater
accuracy than the image of value functions under 𝑇 .
Our intuition for this result is that
• the Coleman-Reffett operator exploits more information because it uses first order and
envelope conditions
• policy functions generally have less curvature than value functions, and hence admit
more accurate approximations based on grid point information
First, we’ll store the parameters of the model in a class OptimalGrowthModel
class OptimalGrowthModel:

    def __init__(self,
f,
f_prime,
u,
u_prime,
β=0.96,
μ=0,
s=0.1,
grid_max=4,
grid_size=200,
shock_size=250):
"""
A function factory for building the Coleman-Reffett operator.
Here og is an instance of OptimalGrowthModel.
"""
β = og.β
f, u = og.f, og.u
f_prime, u_prime = og.f_prime, og.u_prime
grid, shocks = og.grid, og.shocks
@njit
def objective(c, σ, y):
"""
The right hand side of the operator
"""
# First turn w into a function via interpolation
σ_func = lambda x: interp(grid, σ, x)
vals = u_prime(σ_func(f(y - c) * shocks)) * f_prime(y - c) * shocks
return u_prime(c) - β * np.mean(vals)
@njit(parallel=parallel_flag)
def K(σ):
"""
The Coleman-Reffett operator
"""
σ_new = np.empty_like(σ)
for i in prange(len(grid)):
y = grid[i]
# Solve for optimal c at y
c_star = brentq(objective, 1e-10, y-1e-10, args=(σ, y))[0]
σ_new[i] = c_star
return σ_new
return K
It has some similarities to the code for the Bellman operator in our optimal growth lecture.
For example, it evaluates integrals by Monte Carlo and approximates functions using linear
interpolation.
Here’s that Bellman operator code again, which needs to be executed because we’ll use it in
some tests below.
def operator_factory(og, parallel_flag=True):
    """
    Returns jitted versions of the Bellman operator and the
    v-greedy policy function.
    """
    β = og.β
    f, u = og.f, og.u
    grid, shocks = og.grid, og.shocks

    @njit
def objective(c, v, y):
"""
The right-hand side of the Bellman equation
"""
# First turn v into a function via interpolation
v_func = lambda x: interp(grid, v, x)
return u(c) + β * np.mean(v_func(f(y - c) * shocks))
@njit(parallel=parallel_flag)
def T(v):
"""
The Bellman operator
"""
v_new = np.empty_like(v)
for i in prange(len(grid)):
y = grid[i]
# Solve for optimal v at y
v_max = brent_max(objective, 1e-10, y, args=(v, y))[1]
v_new[i] = v_max
return v_new
@njit
def get_greedy(v):
"""
Computes the v-greedy policy of a given function v
"""
σ = np.empty_like(v)
for i in range(len(grid)):
y = grid[i]
# Solve for optimal c at y
c_max = brent_max(objective, 1e-10, y, args=(v, y))[0]
σ[i] = c_max
return σ
return T, get_greedy
As we did for value function iteration, let's start by testing our method on a model that does
have an analytical solution.
First, we generate an instance of OptimalGrowthModel and return the corresponding
Coleman-Reffett operator.
In [6]: α = 0.3
@njit
def f(k):
"Deterministic part of production function"
return k**α
@njit
def f_prime(k):
return α * k**(α - 1)
og = OptimalGrowthModel(f=f, f_prime=f_prime,
u=np.log, u_prime=njit(lambda x: 1/x))
K = time_operator_factory(og)
In [7]: grid = og.grid
        β = og.β

        @njit
        def σ_star(y, α, β):
            "True optimal policy"
            return (1 - α * β) * y

        σ_star_new = K(σ_star(grid, α, β))  # Image of the true policy under K

        fig, ax = plt.subplots()
        ax.plot(grid, σ_star(grid, α, β), label="optimal policy $\sigma^*$")
        ax.plot(grid, σ_star_new, label="$K\sigma^*$")
        ax.legend()
        plt.show()
We can’t really distinguish the two plots, so we are looking good, at least for this test.
Next, let’s try iterating from an arbitrary initial condition and see if we converge towards 𝜎∗ .
The initial condition we’ll use is the one that eats the whole pie: 𝜎(𝑦) = 𝑦.
In [8]: n = 15
        σ = grid.copy()  # Set initial condition
        fig, ax = plt.subplots(figsize=(9, 6))
        lb = 'initial condition $\sigma(y) = y$'
        ax.plot(grid, σ, 'k-', lw=2, alpha=0.6, label=lb)

        for i in range(n):
            σ = K(σ)
            ax.plot(grid, σ, color=plt.cm.jet(i / n), alpha=0.6)

        ax.plot(grid, σ_star(grid, α, β), 'k--', lw=2, alpha=0.8,
                label='$\sigma^*$')
        ax.legend()
        plt.show()
We see that the policy has converged nicely, in only a few steps.
Now let’s compare the accuracy of iteration between the operators.
We’ll generate
1. 𝐾 𝑛 𝜎 where 𝜎(𝑦) = 𝑦
2. (𝑀 ∘ 𝑇 𝑛 ∘ 𝑀 −1 )𝜎 where 𝜎(𝑦) = 𝑦
sim_length = 20

σ = grid.copy()    # Initial condition for time iteration
v = u(grid)        # Initial condition for value function iteration

for i in range(sim_length):
    σ = K(σ)  # Time iteration
    v = T(v)  # Value function iteration

# Plot the resulting policies against the true policy
fig, ax = plt.subplots(figsize=(9, 6))
ax.plot(grid, σ, label='$K^n \sigma$ (time iteration)')
ax.plot(grid, get_greedy(v), label='policy from $T^n v$')
ax.plot(grid, σ_star(grid, α, β), 'k--', label='$\sigma^*$')
ax.legend()
plt.show()
As you can see, time iteration is much more accurate for a given number of iterations.
47.6 Exercises
47.6.1 Exercise 1
47.6.2 Exercise 2
47.6.3 Exercise 3
Consider the same model as above but with the CRRA utility function

$$u(c) = \frac{c^{1-\gamma} - 1}{1 - \gamma}$$
Iterate 20 times with Bellman iteration and Euler equation time iteration
• start time iteration from 𝜎(𝑦) = 𝑦
• start value function iteration from 𝑣(𝑦) = 𝑢(𝑦)
• set 𝛾 = 1.5
Compare the resulting policies and check that they are close.
47.6.4 Exercise 4
Solve the above model as we did in the previous lecture using the operators $T$ and $K$, and check that the solutions are similar by plotting.
47.7 Solutions
47.7.1 Exercise 1
47.7.2 Exercise 2
47.7.3 Exercise 3
Here’s the code, which will execute if you’ve run all the code above
@njit
def u(c):
return (c**(1 - γ) - 1) / (1 - γ)
@njit
def u_prime(c):
return c**(-γ)
T, get_greedy = operator_factory(og)
K = time_operator_factory(og)
for i in range(sim_length):
σ = K(σ) # Time iteration
v = T(v) # Value function iteration
47.7.4 Exercise 4
Here’s is the function we need to solve the model using value function iteration, copied from
the previous lecture
def solve_model(og,
use_parallel=True,
tol=1e-4,
max_iter=1000,
verbose=True,
print_skip=25):
T, _ = operator_factory(og, parallel_flag=use_parallel)
# Set up loop
v = np.log(og.grid) # Initial condition
i = 0
error = tol + 1
if i == max_iter:
796 CHAPTER 47. OPTIMAL GROWTH III: TIME ITERATION
print("Failed to converge!")
return v_new
K = time_operator_factory(og, parallel_flag=use_parallel)
# Set up loop
σ = og.grid # Initial condition
i = 0
error = tol + 1
if i == max_iter:
print("Failed to converge!")
return σ_new
Converged in 10 iterations.
Time iteration is numerically far more accurate for a given number of iterations.
Chapter 48

Optimal Growth IV: The Endogenous Grid Method
48.1 Contents
• Overview 48.2
• Key Idea 48.3
• Implementation 48.4
• Speed 48.5
In addition to what’s in Anaconda, this lecture will need the following libraries:
48.2 Overview
Let’s start by reminding ourselves of the theory and then see how the numerics fit in.
48.3.1 Theory
Take the model set out in the time iteration lecture, following the same terminology and no-
tation.
The Euler equation is

$$(u' \circ \sigma^*)(y) = \beta \int (u' \circ \sigma^*)\big(f(y - \sigma^*(y)) z\big) \, f'(y - \sigma^*(y)) \, z \, \phi(dz)$$

As we saw, the Coleman-Reffett operator $K$ uses this identity to update a guess of the optimal policy: given $\sigma$, the value $K\sigma(y)$ is the $c$ that solves the same equation with $\sigma$ in place of $\sigma^*$ and $c$ in place of $\sigma^*(y)$.
The method discussed above requires a root-finding routine to find the 𝑐𝑖 corresponding to a
given income value 𝑦𝑖 .
Root-finding is costly because it typically involves a significant number of function evalua-
tions.
As pointed out by Carroll [32], we can avoid this if 𝑦𝑖 is chosen endogenously.
The only assumption required is that 𝑢′ is invertible on (0, ∞).
The idea is this:
First, we fix an exogenous grid $\{k_i\}$ for capital ($k = y - c$).

Then we obtain $c_i$ via

$$c_i = (u')^{-1} \left\{ \beta \int (u' \circ \sigma)\big(f(k_i) z\big) \, f'(k_i) \, z \, \phi(dz) \right\}$$

where $\sigma$ is the current guess of the policy.

Finally, for each $c_i$ we set $y_i = c_i + k_i$, which gives an endogenous grid of income values $\{y_i\}$ on which the updated policy takes values $\{c_i\}$.
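Here is a minimal sketch of a single such update, using a hypothetical CRRA setup in which $(u')^{-1}$ has the closed form $x \mapsto x^{-1/\gamma}$ (all parameter values below are illustrative assumptions, not taken from the lecture):

import numpy as np

γ, β, α = 2.0, 0.96, 0.4
u_prime = lambda c: c**(-γ)
u_prime_inv = lambda x: x**(-1/γ)               # (u')^{-1} in closed form
f = lambda k: k**α
f_prime = lambda k: α * k**(α - 1)

k_grid = np.linspace(0.1, 4, 5)                 # Exogenous grid for capital
shocks = np.exp(0.1 * np.random.randn(250))     # Lognormal shock draws
σ = lambda y: 0.7 * y                           # Current guess of the policy

# Each c_i satisfies the Euler equation at k_i exactly -- no root-finding
vals = np.array([np.mean(u_prime(σ(f(k) * shocks)) * f_prime(k) * shocks)
                 for k in k_grid])
c = u_prime_inv(β * vals)
y_endog = k_grid + c                            # Endogenous grid: y_i = k_i + c_i
print(np.column_stack((y_endog, c)))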
48.4 Implementation
Let’s implement this version of the Coleman-Reffett operator and see how it performs.
First, we will construct a class OptimalGrowthModel to hold the parameters of the model.

class OptimalGrowthModel:
    """
    The class holds parameters and true value and policy functions.
    """

    def __init__(self,
f, # Production function
f_prime, # f'(k)
u, # Utility function
u_prime, # Marginal utility
u_prime_inv, # Inverse marginal utility
β=0.96, # Discount factor
μ=0,
s=0.1,
grid_max=4,
grid_size=200,
shock_size=250):
        self.β, self.μ, self.s = β, μ, s
        self.f, self.u = f, u
self.f_prime, self.u_prime, self.u_prime_inv = f_prime, u_prime, \
u_prime_inv
# Set up grid
self.grid = np.linspace(1e-5, grid_max, grid_size)
# Store shocks
self.shocks = np.exp(μ + s * np.random.randn(shock_size))
def egm_operator_factory(og):
    """
    A function factory for building the Coleman-Reffett operator
    with the endogenous grid method.
    Here og is an instance of OptimalGrowthModel.
    """
    f, f_prime = og.f, og.f_prime
    u_prime, u_prime_inv = og.u_prime, og.u_prime_inv
    grid, shocks, β = og.grid, og.shocks, og.β

    def K(σ):
        """
        The Coleman-Reffett operator using EGM

        * σ is a function
        """
        # Allocate memory for value of consumption on endogenous grid points
        c = np.empty_like(grid)
        for i, k in enumerate(grid):
            vals = u_prime(σ(f(k) * shocks)) * f_prime(k) * shocks
            c[i] = u_prime_inv(β * np.mean(vals))
        y = grid + c                          # Endogenous grid: y_i = k_i + c_i
        σ_new = lambda x: interp(y, c, x)     # Update policy via interpolation
        return σ_new

    return K
We’ll also run our original implementation, which uses an exogenous grid and requires root-
finding, so we can perform some comparisons.
def time_operator_factory(og, parallel_flag=True):
    """
    A function factory for building the Coleman-Reffett operator with
    an exogenous grid. Here og is an instance of OptimalGrowthModel.
    """
    β = og.β
    f, f_prime = og.f, og.f_prime
    u_prime = og.u_prime
    grid, shocks = og.grid, og.shocks

    @njit
def objective(c, σ, y):
"""
The right hand side of the operator
"""
# First turn w into a function via interpolation
σ_func = lambda x: interp(grid, σ, x)
vals = u_prime(σ_func(f(y - c) * shocks)) * f_prime(y - c) * shocks
return u_prime(c) - β * np.mean(vals)
@njit(parallel=parallel_flag)
def K(σ):
"""
The Coleman-Reffett operator
"""
σ_new = np.empty_like(σ)
for i in prange(len(grid)):
y = grid[i]
# Solve for optimal c at y
c_star = brentq(objective, 1e-10, y-1e-10, args=(σ, y))[0]
σ_new[i] = c_star
return σ_new
return K
As we did for value function iteration and time iteration, let’s start by testing our method
with the log-linear benchmark.
First, we generate an instance
@njit
def f(k):
"""
Cobb-Douglas production function
"""
return k**α
@njit
def f_prime(k):
"""
First derivative of the production function
"""
return α * k**(α - 1)
@njit
def u_prime(c):
return 1 / c
og = OptimalGrowthModel(f=f,
f_prime=f_prime,
u=np.log,
u_prime=u_prime,
u_prime_inv=u_prime)
def c_star(y):
    "True optimal policy"
    return (1 - α * β) * y

grid = og.grid
K = egm_operator_factory(og)   # Operator K with endogenous grid

c_star_new = K(c_star)         # Image of the true policy under K

fig, ax = plt.subplots(figsize=(9, 6))
ax.plot(grid, c_star(grid), label="optimal policy $c^*$")
ax.plot(grid, c_star_new(grid), label="$Kc^*$")
ax.legend()
plt.show()

The maximal absolute deviation between the two policies is

In [8]: np.max(np.abs(c_star_new(grid) - c_star(grid)))

Out[8]: 9.881666666666669e-06
Next, let’s try iterating from an arbitrary initial condition and see if we converge towards 𝜎∗ .
Let’s start from the consumption policy that eats the whole pie: 𝜎(𝑦) = 𝑦
In [9]: σ = lambda x: x
        n = 15
        fig, ax = plt.subplots(figsize=(9, 6))
        lb = 'initial condition $\sigma(y) = y$'
        ax.plot(grid, grid, 'k-', lw=2, alpha=0.6, label=lb)

        for i in range(n):
            σ = K(σ)  # Update policy
            ax.plot(grid, σ(grid), color=plt.cm.jet(i / n), alpha=0.6)

        ax.legend()
        plt.show()
We see that the policy has converged nicely, in only a few steps.
48.5 Speed
Now let’s compare the clock times per iteration for the standard Coleman-Reffett operator
(with exogenous grid) and the EGM version.
We’ll do so using the CRRA model adopted in the exercises of the Euler equation time itera-
tion lecture.
γ = 1.5   # Preference parameter

@njit
def u(c):
return (c**(1 - γ) - 1) / (1 - γ)
@njit
def u_prime(c):
return c**(-γ)
@njit
def u_prime_inv(c):
return c**(-1 / γ)
og = OptimalGrowthModel(f=f,
f_prime=f_prime,
u=u,
u_prime=u_prime,
u_prime_inv=u_prime_inv)
K_time = time_operator_factory(og)
# Call once to compile jitted version
K_time(grid)
# Coleman-Reffett operator with endogenous grid
K_egm = egm_operator_factory(og)
In [11]: import time

         sim_length = 20
         σ = grid    # Initial policy, exogenous-grid version

         t0 = time.time()
         for i in range(sim_length):
             σ = K_time(σ)
         time.time() - t0

Out[11]: 0.3163783550262451
We see that the EGM version is significantly faster, even without jit compilation!
The absence of numerical root-finding means that it is typically more accurate at each step as
well.
Chapter 49

The Income Fluctuation Problem
49.1 Contents
• Overview 49.2
• The Optimal Savings Problem 49.3
• Computation 49.4
• Exercises 49.5
• Solutions 49.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
49.2 Overview
Next, we study an optimal savings problem for an infinitely lived consumer—the “common
ancestor” described in [108], section 1.3.
This is an essential sub-problem for many representative macroeconomic models
• [6]
• [88]
• etc.
It is related to the decision problem in the stochastic optimal growth model and yet differs in
important ways.
For example, the choice problem for the agent includes an additive income term that leads to
an occasionally binding constraint.
Our presentation of the model will be relatively brief.
• For further details on economic intuition, implication and models, see [108].
• Proofs of all mathematical results stated below can be found in this paper.
To solve the model we will use Euler equation based time iteration, similar to this lecture.
This method turns out to be globally convergent under mild assumptions, even when utility is
unbounded (both above and below).
49.2.1 References
Other useful references include [41], [43], [101], [127], [131] and [143].
Let’s write down the model and then discuss how to solve it.
49.3.1 Set-Up
Consider a household that chooses a state-contingent consumption plan {𝑐𝑡 }𝑡≥0 to maximize
$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t)$$

subject to

$$c_t + a_{t+1} \leq R a_t + z_t, \qquad c_t \geq 0, \qquad a_t \geq -b, \qquad t = 0, 1, \ldots \tag{1}$$
Here
• 𝛽 ∈ (0, 1) is the discount factor
• 𝑎𝑡 is asset holdings at time 𝑡, with ad-hoc borrowing constraint 𝑎𝑡 ≥ −𝑏
• 𝑐𝑡 is consumption
• 𝑧𝑡 is non-capital income (wages, unemployment compensation, etc.)
• 𝑅 ∶= 1 + 𝑟, where 𝑟 > 0 is the interest rate on savings
Non-capital income {𝑧𝑡 } is assumed to be a Markov process taking values in 𝑍 ⊂ (0, ∞) with
stochastic kernel Π.
This means that Π(𝑧, 𝐵) is the probability that 𝑧𝑡+1 ∈ 𝐵 given 𝑧𝑡 = 𝑧.
The expectation of $f(z_{t+1})$ given $z_t = z$ is written as

$$\int f(\acute z) \, \Pi(z, d\acute z)$$

We further assume that

1. $\beta R < 1$

2. $u$ is smooth, strictly increasing and strictly concave with $\lim_{c \to 0} u'(c) = \infty$ and $\lim_{c \to \infty} u'(c) = 0$
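For intuition, here is a small sketch of the kernel and the conditional expectation when $Z$ is finite, so that $\Pi$ is just a Markov matrix (the two-state values below are illustrative assumptions):

import numpy as np

z_vals = np.array([0.5, 1.0])          # Possible income states
Π = np.array([[0.60, 0.40],            # Π[i, j] = P(z' = z_vals[j] | z = z_vals[i])
              [0.05, 0.95]])

f = lambda z: np.sqrt(z)

# Conditional expectation of f(z_{t+1}) given z_t = z
for i, z in enumerate(z_vals):
    print(f"E[f(z') | z = {z}] = {Π[i] @ f(z_vals):.4f}")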
The asset space is [−𝑏, ∞) and the state is the pair (𝑎, 𝑧) ∈ 𝑆 ∶= [−𝑏, ∞) × 𝑍.
A feasible consumption path from $(a, z) \in S$ is a consumption sequence $\{c_t\}$ such that $\{c_t\}$ and its induced asset path $\{a_t\}$ satisfy

1. $(a_0, z_0) = (a, z)$

2. the feasibility constraints in (1), and

3. each $c_t$ is a function only of the outcomes $(z_0, \ldots, z_t)$ observed up to time $t$

The meaning of the third point is just that consumption at time $t$ can only be a function of outcomes that have already been observed.
The value function is defined by

$$V(a, z) := \sup \, \mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t u(c_t) \right\} \tag{2}$$

where the supremum is over all feasible consumption paths from $(a, z)$.
An optimal consumption path from (𝑎, 𝑧) is a feasible consumption path from (𝑎, 𝑧) that at-
tains the supremum in (2).
To pin down such paths we can use a version of the Euler equation, which in the present setting is

$$u'(c_t) \geq \beta R \, \mathbb{E}_t [u'(c_{t+1})] \tag{3}$$

and

$$u'(c_t) = \beta R \, \mathbb{E}_t [u'(c_{t+1})] \quad \text{whenever } c_t < R a_t + z_t + b \tag{4}$$
In essence, this says that the natural “arbitrage” relation 𝑢′ (𝑐𝑡 ) = 𝛽𝑅 𝔼𝑡 [𝑢′ (𝑐𝑡+1 )] holds when
the choice of current consumption is interior.
Interiority means that 𝑐𝑡 is strictly less than its upper bound 𝑅𝑎𝑡 + 𝑧𝑡 + 𝑏.
(The lower boundary case 𝑐𝑡 = 0 never arises at the optimum because 𝑢′ (0) = ∞)
When 𝑐𝑡 does hit the upper bound 𝑅𝑎𝑡 + 𝑧𝑡 + 𝑏, the strict inequality 𝑢′ (𝑐𝑡 ) > 𝛽𝑅 𝔼𝑡 [𝑢′ (𝑐𝑡+1 )]
can occur because 𝑐𝑡 cannot increase sufficiently to attain equality.
With some thought and effort, one can show that (3) and (4) are equivalent to

$$u'(c_t) = \max \left\{ \beta R \, \mathbb{E}_t [u'(c_{t+1})], \; u'(R a_t + z_t + b) \right\} \tag{5}$$

It can be shown that

1. For each $(a, z) \in S$, a unique optimal consumption path from $(a, z)$ exists

2. This path is the unique feasible path from $(a, z)$ satisfying the Euler equality (5) and the transversality condition

$$\lim_{t \to \infty} \beta^t \, \mathbb{E} [u'(c_t) \, a_{t+1}] = 0 \tag{6}$$
Moreover, there exists an optimal consumption function 𝜎∗ ∶ 𝑆 → [0, ∞) such that the path
from (𝑎, 𝑧) generated by
(𝑎0 , 𝑧0 ) = (𝑎, 𝑧), 𝑧𝑡+1 ∼ Π(𝑧𝑡 , 𝑑𝑦), 𝑐𝑡 = 𝜎∗ (𝑎𝑡 , 𝑧𝑡 ) and 𝑎𝑡+1 = 𝑅𝑎𝑡 + 𝑧𝑡 − 𝑐𝑡
satisfies both (5) and (6), and hence is the unique optimal path from (𝑎, 𝑧).
In summary, to solve the optimization problem, we need to compute 𝜎∗ .
49.4 Computation
We can rewrite (5) to make it a statement about functions rather than random variables.
In particular, consider the functional equation
$$(u' \circ \sigma)(a, z) = \max \left\{ \gamma \int (u' \circ \sigma)\big[R a + z - \sigma(a, z), \, \acute z\big] \, \Pi(z, d\acute z), \;\; u'(R a + z + b) \right\} \tag{7}$$

where $\gamma := \beta R$.
We have to be careful with VFI (i.e., iterating with 𝑇 ) in this setting because 𝑢 is not as-
sumed to be bounded
• In fact typically unbounded both above and below — e.g. 𝑢(𝑐) = log 𝑐.
• In which case, the standard DP theory does not apply.
• 𝑇 𝑛 𝑣 is not guaranteed to converge to the value function for arbitrary continuous
bounded 𝑣.
Nonetheless, we can always try the popular strategy “iterate and hope”.
We can then check the outcome by comparing with that produced by TI.
The latter is known to converge, as described above.
49.4.3 Implementation
First, we build a class called ConsumerProblem that stores the model primitives.
class ConsumerProblem:
    """
    A class that stores primitives for the income fluctuation problem.
    """

    def __init__(self,
                 r=0.01,                         # Interest rate
                 β=0.96,                         # Discount factor
                 Π=((0.6, 0.4), (0.05, 0.95)),   # Markov matrix for z_t
                 z_vals=(0.5, 1.0),              # State space of z_t
                 b=0,                            # Borrowing constraint
                 grid_max=16,
                 grid_size=50,
                 u=np.log,                       # Utility function
                 du=njit(lambda x: 1/x)):        # Derivative of utility

        self.u, self.du = u, du
        self.r, self.R = r, 1 + r
        self.β, self.b = β, b
        self.Π, self.z_vals = np.array(Π), tuple(z_vals)
        self.asset_grid = np.linspace(-b, grid_max, grid_size)
def operator_factory(cp):
    """
    A function factory for building the operator K.

    Here cp is an instance of ConsumerProblem.
    """
    # Simplify names
    R, Π, β, u, b, du = cp.R, cp.Π, cp.β, cp.u, cp.b, cp.du
    asset_grid, z_vals = cp.asset_grid, cp.z_vals
    γ = R * β

    @njit
    def euler_diff(c, a, z, i_z, σ):
        """
        The difference of the left-hand side and the right-hand side
        of the Euler Equation.
        """
        lhs = du(c)
        expectation = 0
        for i in range(len(z_vals)):
            expectation += du(interp(asset_grid, σ[:, i], R * a + z - c)) \
                            * Π[i_z, i]
        rhs = max(γ * expectation, du(R * a + z + b))
        return lhs - rhs

    @njit
    def K(σ):
        """
        The operator K.
        """
        σ_new = np.empty_like(σ)
        for i_a in range(len(asset_grid)):
            a = asset_grid[i_a]
            for i_z in range(len(z_vals)):
                z = z_vals[i_z]
                # Solve for the c that sets euler_diff to zero
                c_star = brentq(euler_diff, 1e-8, R * a + z + b,
                                args=(a, z, i_z, σ))[0]
                σ_new[i_a, i_z] = c_star
        return σ_new

    return K
K uses linear interpolation along the asset grid to approximate the consumption function.
To solve for the optimal policy function, we will write a function solve_model that iterates with K to find an (approximate) fixed point σ.

def solve_model(cp,
                tol=1e-4,
                max_iter=1000,
                verbose=True,
                print_skip=25):
    """
    Solves for the optimal policy using time iteration

    * cp is an instance of ConsumerProblem
    """
    R, z_vals, b = cp.R, cp.z_vals, cp.b
    asset_grid = cp.asset_grid

    # Initial guess of σ: consume everything
    σ = np.empty((len(asset_grid), len(z_vals)))
    for i_a, a in enumerate(asset_grid):
        for i_z, z in enumerate(z_vals):
            c_max = R * a + z + b
            σ[i_a, i_z] = c_max

    K = operator_factory(cp)

    # Set up loop
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        σ_new = K(σ)
        error = np.max(np.abs(σ - σ_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        σ = σ_new

    if i == max_iter:
        print("Failed to converge!")

    if verbose and i < max_iter:
        print(f"\nConverged in {i} iterations.")

    return σ_new
Plotting the result using the default parameters of the ConsumerProblem class
In [6]: cp = ConsumerProblem()
σ_star = solve_model(cp)
Converged in 41 iterations.
The following exercises walk you through several applications where policy functions are com-
puted.
49.5 Exercises
49.5.1 Exercise 1
49.5.2 Exercise 2
Now let’s consider the long run asset levels held by households.
We’ll take r = 0.03 and otherwise use default parameters.
The following figure is a 45 degree diagram showing the law of motion for assets when con-
sumption is optimal
𝑎′ = ℎ(𝑎, 𝑧) ∶= 𝑅𝑎 + 𝑧 − 𝜎∗ (𝑎, 𝑧)
Hence to approximate the stationary distribution we can simulate a long time series for assets
and histogram, as in the following figure
49.5.3 Exercise 3
Following on from exercises 1 and 2, let’s look at how savings and aggregate asset holdings
vary with the interest rate
• Note: [108] section 18.6 can be consulted for more background on the topic treated in
this exercise.
For a given parameterization of the model, the mean of the stationary distribution can be in-
terpreted as aggregate capital in an economy with a unit mass of ex-ante identical households
facing idiosyncratic shocks.
Let’s look at how this measure of aggregate capital varies with the interest rate and borrow-
ing constraint.
The next figure plots aggregate capital against the interest rate for b in (1, 3)
49.6 Solutions
49.6.1 Exercise 1
49.6.2 Exercise 2
def compute_asset_series(cp, T=500_000):
    """
    Simulates a time series of length T for assets, given optimal
    savings behavior.

    cp is an instance of ConsumerProblem
    """
    Π, z_vals, R = cp.Π, cp.z_vals, cp.R  # Simplify names
    mc = MarkovChain(Π)
    σ_star = solve_model(cp, verbose=False)
    cf = lambda a, i_z: interp(cp.asset_grid, σ_star[:, i_z], a)
    a = np.zeros(T+1)
    z_seq = mc.simulate(T)
    for t in range(T):
        i_z = z_seq[t]
        a[t+1] = R * a[t] + z_vals[i_z] - cf(a[t], i_z)
    return a

cp = ConsumerProblem(r=0.03, grid_max=4)
a = compute_asset_series(cp)
49.6.3 Exercise 3
In [10]: M = 25
         r_vals = np.linspace(0, 0.04, M)
         fig, ax = plt.subplots(figsize=(10, 8))

         for b in (1, 3):
             asset_mean = []
             for r_val in r_vals:
                 cp = ConsumerProblem(r=r_val, b=b)
                 mean = np.mean(compute_asset_series(cp, T=250_000))
                 asset_mean.append(mean)
             ax.plot(asset_mean, r_vals, label=f'$b = {b}$')
             print(f"Finished iteration b = {b}")

         ax.set(xlabel='capital', ylabel='interest rate')
         ax.grid()
         ax.legend(loc='upper left')
         plt.show()

Finished iteration b = 1

Finished iteration b = 3
Chapter 50

Discrete State Dynamic Programming
50.1 Contents
• Overview 50.2
• Discrete DPs 50.3
• Solving Discrete DPs 50.4
• Example: A Growth Model 50.5
• Exercises 50.6
• Solutions 50.7
• Appendix: Algorithms 50.8
In addition to what’s in Anaconda, this lecture will need the following libraries:
50.2 Overview
In this lecture we discuss a family of dynamic programming problems with the following features:

1. a discrete state space and discrete choices (actions)

2. an infinite horizon

3. discounted rewards
Such problems, called discrete dynamic programs (discrete DPs), arise in many applied fields, e.g.

• asset pricing

• industrial organization, etc.
When a given model is not inherently discrete, it is common to replace it with a discretized
version in order to use discrete DP techniques.
This lecture covers
• the theory of dynamic programming in a discrete setting, plus examples and applica-
tions
• a powerful set of routines for solving discrete DPs from the QuantEcon code library
Let’s start with some imports:
50.2.2 Code
50.2.3 References
For background reading on dynamic programming and additional applications, see, for exam-
ple,
• [108]
• [84], section 3.5
• [126]
• [149]
• [136]
• [118]
• EDTC, chapter 5
50.3 Discrete DPs

Loosely speaking, a discrete DP is a maximization problem with an objective function of the form

$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t r(s_t, a_t) \tag{1}$$
where
• 𝑠𝑡 is the state variable
• 𝑎𝑡 is the action
• 𝛽 is a discount factor
• 𝑟(𝑠𝑡 , 𝑎𝑡 ) is interpreted as a current reward when the state is 𝑠𝑡 and the action chosen is
𝑎𝑡
Each pair (𝑠𝑡 , 𝑎𝑡 ) pins down transition probabilities 𝑄(𝑠𝑡 , 𝑎𝑡 , 𝑠𝑡+1 ) for the next period state
𝑠𝑡+1 .
Thus, actions influence not only current rewards but also the future time path of the state.
The essence of dynamic programming problems is to trade off current rewards vs favorable
positioning of the future state (modulo randomness).
Examples:
• consuming today vs saving and accumulating assets
• accepting a job offer today vs seeking a better one in the future
• exercising an option now vs waiting
50.3.1 Policies
The most fruitful way to think about solutions to discrete DP problems is to compare poli-
cies.
In general, a policy is a randomized map from past actions and states to current action.
In the setting formalized below, it suffices to consider so-called stationary Markov policies,
which consider only the current state.
In particular, a stationary Markov policy is a map 𝜎 from states to actions
• 𝑎𝑡 = 𝜎(𝑠𝑡 ) indicates that 𝑎𝑡 is the action to be taken in state 𝑠𝑡
It is known that, for any arbitrary policy, there exists a stationary Markov policy that domi-
nates it at least weakly.
• See section 5.5 of [126] for discussion and proofs.
In what follows, stationary Markov policies are referred to simply as policies.
The aim is to find an optimal policy, in the sense of one that maximizes (1).
Let’s now step through these ideas more carefully.
Formally, a discrete dynamic program consists of the following components:

1. A finite set of states $S = \{0, \ldots, n - 1\}$.

2. A finite set of feasible actions $A(s)$ for each state $s \in S$, and a corresponding set of feasible state-action pairs

$$SA := \{(s, a) \mid s \in S, \; a \in A(s)\}$$

3. A reward function $r \colon SA \to \mathbb{R}$.

4. A transition probability function $Q \colon SA \to \Delta(S)$, where $\Delta(S)$ is the set of probability distributions over $S$.

5. A discount factor $\beta \in [0, 1)$.
We also use the notation 𝐴 ∶= ⋃𝑠∈𝑆 𝐴(𝑠) = {0, … , 𝑚 − 1} and call this set the action space.
A policy is a function 𝜎 ∶ 𝑆 → 𝐴.
A policy is called feasible if it satisfies 𝜎(𝑠) ∈ 𝐴(𝑠) for all 𝑠 ∈ 𝑆.
Denote the set of all feasible policies by Σ.
If a decision-maker uses a policy 𝜎 ∈ Σ, then
• the current reward at time 𝑡 is 𝑟(𝑠𝑡 , 𝜎(𝑠𝑡 ))
• the probability that 𝑠𝑡+1 = 𝑠′ is 𝑄(𝑠𝑡 , 𝜎(𝑠𝑡 ), 𝑠′ )
For each 𝜎 ∈ Σ, define
• $r_\sigma$ by $r_\sigma(s) := r(s, \sigma(s))$
• 𝑄𝜎 by 𝑄𝜎 (𝑠, 𝑠′ ) ∶= 𝑄(𝑠, 𝜎(𝑠), 𝑠′ )
Notice that 𝑄𝜎 is a stochastic matrix on 𝑆.
It gives transition probabilities of the controlled chain when we follow policy 𝜎.
If we think of $r_\sigma$ as a column vector, then so is $Q_\sigma^t r_\sigma$, and the $s$-th row of the latter has the interpretation

$$(Q_\sigma^t r_\sigma)(s) = \mathbb{E}\,[r(s_t, \sigma(s_t)) \mid s_0 = s] \quad \text{when } \{s_t\} \sim Q_\sigma \tag{2}$$

Comments

• $\{s_t\} \sim Q_\sigma$ means that the state is generated by stochastic matrix $Q_\sigma$.

• See this discussion on computing expectations of Markov chains for an explanation of the expression in (2).
Notice that we’re not really distinguishing between functions from 𝑆 to ℝ and vectors in ℝ𝑛 .
This is natural because they are in one to one correspondence.
Let 𝑣𝜎 (𝑠) denote the discounted sum of expected reward flows from policy 𝜎 when the initial
state is 𝑠.
To calculate this quantity we pass the expectation through the sum in (1) and use (2) to get
$$v_\sigma(s) = \sum_{t=0}^{\infty} \beta^t (Q_\sigma^t r_\sigma)(s) \qquad (s \in S)$$
This function is called the policy value function for the policy 𝜎.
The optimal value function, or simply value function, is the function $v^* \colon S \to \mathbb{R}$ defined by

$$v^*(s) = \max_{\sigma \in \Sigma} v_\sigma(s) \qquad (s \in S)$$

(We can use max rather than sup here because the domain is a finite set)
A policy 𝜎 ∈ Σ is called optimal if 𝑣𝜎 (𝑠) = 𝑣∗ (𝑠) for all 𝑠 ∈ 𝑆.
Given any $w \colon S \to \mathbb{R}$, a policy $\sigma \in \Sigma$ is called $w$-greedy if

$$\sigma(s) \in \operatorname*{arg\,max}_{a \in A(s)} \left\{ r(s, a) + \beta \sum_{s' \in S} w(s') \, Q(s, a, s') \right\} \qquad (s \in S)$$

As discussed in detail below, optimal policies are precisely those that are $v^*$-greedy.
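As a quick illustration, here is a brute-force computation of a $w$-greedy policy for a tiny, made-up discrete DP (the reward and transition arrays below are arbitrary assumptions, chosen only to make the sketch runnable):

import numpy as np

n, m, β = 3, 2, 0.9
R = np.array([[1.0, 0.5],          # R[s, a] = r(s, a)
              [0.0, 1.0],
              [0.5, 0.0]])
Q = np.ones((n, m, n)) / n         # Q[s, a, s'], uniform for simplicity

w = np.zeros(n)                    # Any function w : S -> R
vals = R + β * (Q @ w)             # vals[s, a] = r(s,a) + β Σ_{s'} w(s') Q(s,a,s')
σ = np.argmax(vals, axis=1)        # A w-greedy policy
print(σ)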
Two operators play a central role in the theory. Given a feasible policy $\sigma$, the operator $T_\sigma$ maps $v \in \mathbb{R}^n$ into

$$T_\sigma v = r_\sigma + \beta Q_\sigma v$$

while the Bellman operator $T$ maps $v$ into $Tv$, where $(Tv)(s) = \max_{a \in A(s)} \{ r(s, a) + \beta \sum_{s'} v(s') Q(s, a, s') \}$.
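Since $T_\sigma$ is an affine map, its fixed point — the policy value $v_\sigma$ — can be computed directly by solving the linear system $v = r_\sigma + \beta Q_\sigma v$. A minimal sketch with made-up primitives:

import numpy as np

β = 0.9
r_σ = np.array([1.0, 0.0, 0.5])             # Rewards under policy σ
Q_σ = np.array([[0.9, 0.1, 0.0],            # Transition matrix under σ
                [0.0, 0.8, 0.2],
                [0.3, 0.0, 0.7]])

# v_σ = (I - β Q_σ)^{-1} r_σ
v_σ = np.linalg.solve(np.eye(3) - β * Q_σ, r_σ)
print(v_σ)

This is exactly the policy evaluation step that policy iteration, described below, performs at each pass.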
50.4 Solving Discrete DPs

Now that the theory has been set out, let's turn to solution methods.
The code for solving discrete DPs is available in ddp.py from the QuantEcon.py code library.
It implements the three most important solution methods for discrete dynamic programs,
namely
• value function iteration
• policy function iteration
• modified policy function iteration
Let’s briefly review these algorithms and their implementation.
Perhaps the most familiar method for solving all manner of dynamic programs is value func-
tion iteration.
This algorithm uses the fact that the Bellman operator 𝑇 is a contraction mapping with fixed
point 𝑣∗ .
Hence, iterative application of 𝑇 to any initial function 𝑣0 ∶ 𝑆 → ℝ converges to 𝑣∗ .
The details of the algorithm can be found in the appendix.
This routine, also known as Howard’s policy improvement algorithm, exploits more closely the
particular structure of a discrete DP problem.
Each iteration consists of

1. A policy evaluation step that computes the value $v_\sigma$ of a policy $\sigma$ by solving the linear equation $v = T_\sigma v$.

2. A policy improvement step that computes a $v_\sigma$-greedy policy.
In the current setting, policy iteration computes an exact optimal policy in finitely many iter-
ations.
• See theorem 10.2.6 of EDTC for a proof.
The details of the algorithm can be found in the appendix.
Modified policy iteration replaces the policy evaluation step in policy iteration with “partial
policy evaluation”.
The latter computes an approximation to the value of a policy 𝜎 by iterating 𝑇𝜎 for a speci-
fied number of times.
This approach can be useful when the state space is very large and the linear system in the
policy evaluation step of policy iteration is correspondingly difficult to solve.
The details of the algorithm can be found in the appendix.
50.5 Example: A Growth Model

As a first example, consider a simple consumption-saving model in which the state $s$ is current wealth and the action $a$ is the amount of wealth to save (so that consumption is $s - a$). Next period's wealth is

$$s' = a + U \quad \text{where} \quad U \sim U[0, \ldots, B]$$

Hence the transition probabilities are

$$Q(s, a, s') := \begin{cases} \dfrac{1}{B + 1} & \text{if } a \leq s' \leq a + B \\[1ex] 0 & \text{otherwise} \end{cases} \tag{3}$$
This information will be used to create an instance of DiscreteDP by passing the following
information
1. An $n \times m$ reward array $R$.

2. An $n \times m \times n$ transition probability array $Q$.

3. A discount factor $\beta$.
class SimpleOG:

    def __init__(self, B=10, M=5, α=0.5, β=0.9):
        """
        Set up R, Q and β, the three elements that define an instance
        of the DiscreteDP class.
        """
        self.B, self.M, self.α, self.β = B, M, α, β
        self.n = B + M + 1
        self.m = M + 1

        self.R = np.empty((self.n, self.m))
        self.Q = np.zeros((self.n, self.m, self.n))

        self.populate_Q()
        self.populate_R()

    def u(self, c):
        return c**self.α

    def populate_R(self):
        """
        Populate the R matrix, with R[s, a] = -np.inf for infeasible
        state-action pairs.
        """
        for s in range(self.n):
            for a in range(self.m):
                self.R[s, a] = self.u(s - a) if a <= s else -np.inf

    def populate_Q(self):
        """
        Populate the Q matrix by setting

            Q[s, a, s'] = 1 / (B + 1)  for  a <= s' <= a + B

        """
        for a in range(self.m):
            self.Q[:, a, a:(a + self.B + 1)] = 1.0 / (self.B + 1)
In [7]: dir(results)
(In IPython version 4.0 and above you can also type results. and hit the tab key)
The most important attributes are v, the value function, and σ, the optimal policy
In [8]: results.v
In [9]: results.sigma
Since we’ve used policy iteration, these results will be exact unless we hit the iteration bound
max_iter.
Let’s make sure this didn’t happen
In [10]: results.max_iter
Out[10]: 250
In [11]: results.num_iter
Out[11]: 3
Another interesting object is results.mc, which is the controlled chain defined by 𝑄𝜎∗ ,
where 𝜎∗ is the optimal policy.
In other words, it gives the dynamics of the state when the agent follows the optimal policy.
Since this object is an instance of MarkovChain from QuantEcon.py (see this lecture for more
discussion), we can easily simulate it, compute its stationary distribution and so on.
In [12]: results.mc.stationary_distributions
If we look at the bar graph we can see the rightward shift in probability mass
Here's how we could set up these objects for the model above

B, M, α, β = 10, 5, 0.5, 0.9
n = B + M + 1
m = M + 1

def u(c):
    return c**α

s_indices = []
a_indices = []
Q = []
R = []
b = 1.0 / (B + 1)
for s in range(n):
for a in range(min(M, s) + 1): # All feasible a at this s
s_indices.append(s)
a_indices.append(a)
q = np.zeros(n)
q[a:(a + B + 1)] = b # b on these values, otherwise 0
Q.append(q)
R.append(u(s - a))
For larger problems, you might need to write this code more efficiently by vectorizing or using
Numba.
50.6 Exercises
In the stochastic optimal growth dynamic programming lecture, we solved a benchmark model that has an analytical solution, so that we could check our numerical results.
The exercise is to replicate this solution using DiscreteDP.
50.7 Solutions
50.7.1 Setup
In [15]: α = 0.65
f = lambda k: k**α
u = np.log
β = 0.95
Here we want to solve a finite state version of the continuous state model above.
We discretize the state space into a grid of size grid_size=500, from 10−6 to grid_max=2
In [16]: grid_max = 2
grid_size = 500
grid = np.linspace(1e-6, grid_max, grid_size)
We choose the action to be the amount of capital to save for the next period (the state is the
capital stock at the beginning of the period).
Thus the state indices and the action indices are both 0, …, grid_size-1.
Action (indexed by) a is feasible at state (indexed by) s if and only if grid[a] < f(grid[s]) (zero consumption is not allowed because of the log utility).

Thus the Bellman equation is:

$$v(k) = \max_{0 < k' < f(k)} u\big(f(k) - k'\big) + \beta v(k')$$
# Consumption matrix, with nonpositive consumption included
C = f(grid).reshape(grid_size, 1) - grid.reshape(1, grid_size)

# State-action indices
s_indices, a_indices = np.where(C > 0)

# Number of state-action pairs
L = len(s_indices)

print(L)
print(s_indices)
print(a_indices)

118841
[ 0 1 1 … 499 499 499]
[ 0 0 1 … 389 390 391]

The reward vector R (of length L):

R = u(C[s_indices, a_indices])
(Degenerate) transition probability matrix Q (of shape (L, grid_size)), where we choose the scipy.sparse.lil_matrix format, though any format will do (internally it will be converted to the csr format):

(If you are familiar with the data structure of scipy.sparse.csr_matrix, that is the most efficient way to create the Q matrix in the current case)
Notes
Here we made intensive use of vectorized array operations to simplify the code.

As noted, however, vectorization is memory intensive, and it can be prohibitively so for grids with large size.
Out[22]: 10
Note that sigma contains the indices of the optimal capital stocks to save for the next period. The following translates sigma to the corresponding consumption vector.

The optimal policies in this benchmark model have the known closed forms $v^*(k) = c_1 + c_2 \log k$ and $c^*(k) = (1 - \alpha\beta) k^\alpha$:

ab = α * β
c1 = (np.log(1 - ab) + np.log(ab) * ab / (1 - ab)) / (1 - β)
c2 = α / (1 - ab)

def v_star(k):
    return c1 + c2 * np.log(k)

def c_star(k):
    return (1 - ab) * k**α
Let us compare the solution of the discrete model with that of the original continuous model
Out[25]: 121.49819147053378
Out[26]: 0.012681735127500815
Out[27]: 0.003826523100010082
In fact, the optimal consumption obtained in the discrete version is not really monotone, but
the decrements are quite small:
Out[28]: False
Out[29]: 174
In [30]: np.abs(diff[dec_ind]).max()
Out[30]: 0.001961853339766839
Out[31]: True
Value Iteration
Out[32]: 294
Out[33]: True
Out[34]: 16
Out[35]: True
Speed Comparison
337 ms ± 27.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
26.1 ms ± 1.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
30 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
As is often the case, policy iteration and modified policy iteration are much faster than value
iteration.
Let us first visualize the convergence of the value iteration algorithm as in the lecture, where
we use ddp.bellman_operator implemented as a method of DiscreteDP
plt.show()
We next plot the consumption policies along with the value iteration
/home/ubuntu/anaconda3/lib/python3.7/site-packages/quantecon/compute_fp.py:151:
RuntimeWarning: max_iter attained before convergence in compute_fixed_point
warnings.warn(_non_convergence_msg, RuntimeWarning)
Finally, let us work on Exercise 2, where we plot the trajectories of the capital stock for three
different discount factors, 0.9, 0.94, and 0.98, with initial condition 𝑘0 = 0.1.
sample_size = 25
fig, ax = plt.subplots(figsize=(8,5))
ax.set_xlabel("time")
ax.set_ylabel("capital")
ax.set_ylim(0.10, 0.30)
ax.legend(loc='lower right')
plt.show()
This appendix covers the details of the solution algorithms implemented for DiscreteDP.
We will make use of the following notions of approximate optimality:
• For 𝜀 > 0, 𝑣 is called an 𝜀-approximation of 𝑣∗ if ‖𝑣 − 𝑣∗ ‖ < 𝜀.
• A policy 𝜎 ∈ Σ is called 𝜀-optimal if 𝑣𝜎 is an 𝜀-approximation of 𝑣∗ .
Value Iteration

The DiscreteDP value iteration method implements value function iteration as follows

1. Choose any $v_0 \in \mathbb{R}^n$, and specify $\varepsilon > 0$; set $i = 0$.

2. Compute $v_{i+1} = T v_i$.

3. If $\|v_{i+1} - v_i\| < [(1 - \beta)/(2\beta)]\varepsilon$, then go to step 4; otherwise, set $i = i + 1$ and go to step 2.

4. Compute a $v_{i+1}$-greedy policy $\sigma$, and return $v_{i+1}$ and $\sigma$.

Policy Iteration

The DiscreteDP policy iteration method runs as follows

1. Choose any $v_0 \in \mathbb{R}^n$ and compute a $v_0$-greedy policy $\sigma_0$; set $i = 0$.

2. Compute the value $v_{\sigma_i}$ by solving the equation $v = T_{\sigma_i} v$.

3. Compute a $v_{\sigma_i}$-greedy policy $\sigma_{i+1}$; let $\sigma_{i+1} = \sigma_i$ if possible.

4. If $\sigma_{i+1} = \sigma_i$, then return $v_{\sigma_i}$ and $\sigma_{i+1}$; otherwise, set $i = i + 1$ and go to step 2.

Modified Policy Iteration

The DiscreteDP modified policy iteration method replaces the policy evaluation step above with a fixed number of iterations of $T_\sigma$.
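Here is a compact sketch of the value iteration stopping rule in code, applied to the tiny made-up DP from the earlier sketches (the primitives are illustrative assumptions):

import numpy as np

n, m, β, ε = 3, 2, 0.9, 1e-6
R = np.array([[1.0, 0.5], [0.0, 1.0], [0.5, 0.0]])
Q = np.ones((n, m, n)) / n

T = lambda v: np.max(R + β * (Q @ v), axis=1)   # Bellman operator

v = np.zeros(n)
while True:
    v_new = T(v)
    if np.max(np.abs(v_new - v)) < (1 - β) / (2 * β) * ε:   # Step 3
        break
    v = v_new

σ = np.argmax(R + β * (Q @ v_new), axis=1)      # Step 4: a v-greedy policy
print(v_new, σ)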
Given 𝜀 > 0, provided that 𝑣0 is such that 𝑇 𝑣0 ≥ 𝑣0 , the modified policy iteration algorithm
terminates in a finite number of iterations.
It returns an 𝜀/2-approximation of the optimal value function and an 𝜀-optimal policy func-
tion (unless iter_max is reached).
See also the documentation for DiscreteDP.
Part VII
LQ Control
Chapter 51
LQ Control: Foundations
51.1 Contents
• Overview 51.2
• Introduction 51.3
• Optimality – Finite Horizon 51.4
• Implementation 51.5
• Extensions and Comments 51.6
• Further Applications 51.7
• Exercises 51.8
• Solutions 51.9
In addition to what’s in Anaconda, this lecture will need the following libraries:
51.2 Overview
Linear quadratic (LQ) control refers to a class of dynamic optimization problems that have
found applications in almost every scientific field.
This lecture provides an introduction to LQ control and its economic applications.
As we will see, LQ systems have a simple structure that makes them an excellent workhorse
for a wide variety of economic problems.
Moreover, while the linear-quadratic structure is restrictive, it is in fact far more flexible than
it may appear initially.
These themes appear repeatedly below.
Mathematically, LQ control problems are closely related to the Kalman filter
• Recursive formulations of linear-quadratic control problems and Kalman filtering prob-
lems both involve matrix Riccati equations.
• Classical formulations of linear control and linear filtering problems make use of similar
matrix decompositions (see for example this lecture and this lecture).
In reading what follows, it will be useful to have some familiarity with
• matrix manipulations
51.3 Introduction
The “linear” part of LQ is a linear law of motion for the state, while the “quadratic” part
refers to preferences.
Let’s begin with the former, move on to the latter, and then put them together into an opti-
mization problem.
The law of motion for the state is

$$x_{t+1} = A x_t + B u_t + C w_{t+1}, \qquad t = 0, 1, 2, \ldots \tag{1}$$

Here
• 𝑢𝑡 is a “control” vector, incorporating choices available to a decision-maker confronting
the current state 𝑥𝑡
• {𝑤𝑡 } is an uncorrelated zero mean shock process satisfying 𝔼𝑤𝑡 𝑤𝑡′ = 𝐼, where the right-
hand side is the identity matrix
Regarding the dimensions
• 𝑥𝑡 is 𝑛 × 1, 𝐴 is 𝑛 × 𝑛
• 𝑢𝑡 is 𝑘 × 1, 𝐵 is 𝑛 × 𝑘
• 𝑤𝑡 is 𝑗 × 1, 𝐶 is 𝑛 × 𝑗
Example 1
𝑎𝑡+1 + 𝑐𝑡 = (1 + 𝑟)𝑎𝑡 + 𝑦𝑡
Here 𝑎𝑡 is assets, 𝑟 is a fixed interest rate, 𝑐𝑡 is current consumption, and 𝑦𝑡 is current non-
financial income.
If we suppose that $\{y_t\}$ is serially uncorrelated and $N(0, \sigma^2)$, then, taking $\{w_t\}$ to be standard normal, we can write the system as

$$a_{t+1} = (1 + r) a_t - c_t + \sigma w_{t+1}$$

This is clearly a special case of (1), with assets being the state and consumption being the control.
Example 2
One unrealistic feature of the previous model is that non-financial income has a zero mean
and is often negative.
This can easily be overcome by adding a sufficiently large mean.
Hence in this example, we take 𝑦𝑡 = 𝜎𝑤𝑡+1 + 𝜇 for some positive real number 𝜇.
Another alteration that’s useful to introduce (we’ll see why soon) is to change the control
variable from consumption to the deviation of consumption from some “ideal” quantity 𝑐.̄
(Most parameterizations will be such that 𝑐 ̄ is large relative to the amount of consumption
that is attainable in each period, and hence the household wants to increase consumption)
For this reason, we now take our control to be 𝑢𝑡 ∶= 𝑐𝑡 − 𝑐.̄
In terms of these variables, the budget constraint $a_{t+1} = (1 + r) a_t - c_t + y_t$ becomes

$$a_{t+1} = (1 + r) a_t - u_t - \bar c + \sigma w_{t+1} + \mu \tag{2}$$

How can we write this new system in the form of equation (1)?

If, as in the previous example, we take $a_t$ as the state, then we run into a problem: the law of motion contains some constant terms on the right-hand side.

This means that we are dealing with an affine function, not a linear one (recall this discussion).

Fortunately, we can easily circumvent this problem by adding an extra state variable.

In particular, if we write

$$\begin{pmatrix} a_{t+1} \\ 1 \end{pmatrix} = \begin{pmatrix} 1 + r & -\bar c + \mu \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a_t \\ 1 \end{pmatrix} + \begin{pmatrix} -1 \\ 0 \end{pmatrix} u_t + \begin{pmatrix} \sigma \\ 0 \end{pmatrix} w_{t+1} \tag{3}$$

then the first row is identical to (2). Moreover, the model is now linear and can be written in the form of (1) by setting

$$x_t := \begin{pmatrix} a_t \\ 1 \end{pmatrix}, \quad A := \begin{pmatrix} 1 + r & -\bar c + \mu \\ 0 & 1 \end{pmatrix}, \quad B := \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \quad C := \begin{pmatrix} \sigma \\ 0 \end{pmatrix} \tag{4}$$
51.3.2 Preferences
In the LQ model, the aim is to minimize the flow of losses, where the time-$t$ loss is given by the quadratic expression

$$x_t' R x_t + u_t' Q u_t \tag{5}$$
Here
• 𝑅 is assumed to be 𝑛 × 𝑛, symmetric and nonnegative definite.
• 𝑄 is assumed to be 𝑘 × 𝑘, symmetric and positive definite.
Note
In fact, for many economic problems, the definiteness conditions on 𝑅 and 𝑄 can
be relaxed. It is sufficient that certain submatrices of 𝑅 and 𝑄 be nonnegative
definite. See [71] for details.
Example 1
A very simple example that satisfies these assumptions is to take $R$ and $Q$ to be identity matrices so that current loss is

$$x_t' I x_t + u_t' I u_t = \| x_t \|^2 + \| u_t \|^2$$
Thus, for both the state and the control, loss is measured as squared distance from the origin.
(In fact, the general case (5) can also be understood in this way, but with 𝑅 and 𝑄 identify-
ing other – non-Euclidean – notions of “distance” from the zero vector).
Intuitively, we can often think of the state 𝑥𝑡 as representing deviation from a target, such as
• deviation of inflation from some target level
• deviation of a firm’s capital stock from some desired quantity
The aim is to put the state close to the target, while using controls parsimoniously.
Example 2

In the household savings problem studied above, setting $R = 0$ and $Q = 1$ yields preferences under which the household's current loss is the squared deviation of consumption from the ideal level $\bar c$.
Let’s now be precise about the optimization problem we wish to consider, and look at how to
solve it.
We will begin with the finite horizon case, with terminal time 𝑇 ∈ ℕ.
In this case, the aim is to choose a sequence of controls $\{u_0, \ldots, u_{T-1}\}$ to minimize the objective

$$\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (x_t' R x_t + u_t' Q u_t) + \beta^T x_T' R_f x_T \right\} \tag{6}$$

subject to the law of motion (1) and initial state $x_0$. Here $R_f$ is an $n \times n$ symmetric, nonnegative definite matrix penalizing the terminal state, and $\beta \in (0, 1]$ is the discount factor.
51.4.2 Information
There’s one constraint we’ve neglected to mention so far, which is that the decision-maker
who solves this LQ problem knows only the present and the past, not the future.
To clarify this point, consider the sequence of controls {𝑢0 , … , 𝑢𝑇 −1 }.
When choosing these controls, the decision-maker is permitted to take into account the effects
of the shocks {𝑤1 , … , 𝑤𝑇 } on the system.
However, it is typically assumed — and will be assumed here — that the time-𝑡 control 𝑢𝑡
can be made with knowledge of past and present shocks only.
The fancy measure-theoretic way of saying this is that 𝑢𝑡 must be measurable with respect to
the 𝜎-algebra generated by 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 .
This is in fact equivalent to stating that 𝑢𝑡 can be written in the form 𝑢𝑡 =
𝑔𝑡 (𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 ) for some Borel measurable function 𝑔𝑡 .
(Just about every function that’s useful for applications is Borel measurable, so, for the pur-
poses of intuition, you can read that last phrase as “for some function 𝑔𝑡 ”)
Now note that 𝑥𝑡 will ultimately depend on the realizations of 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 .
In fact, it turns out that 𝑥𝑡 summarizes all the information about these historical shocks that
the decision-maker needs to set controls optimally.
More precisely, it can be shown that any optimal control 𝑢𝑡 can always be written as a func-
tion of the current state alone.
Hence in what follows we restrict attention to control policies (i.e., functions) of the form
𝑢𝑡 = 𝑔𝑡 (𝑥𝑡 ).
Actually, the preceding discussion applies to all standard dynamic programming problems.
What’s special about the LQ case is that – as we shall soon see — the optimal 𝑢𝑡 turns out
to be a linear function of 𝑥𝑡 .
51.4.3 Solution
To solve the finite horizon LQ problem we can use a dynamic programming strategy based on
backward induction that is conceptually similar to the approach adopted in this lecture.
For reasons that will soon become clear, we first introduce the notation 𝐽𝑇 (𝑥) = 𝑥′ 𝑅𝑓 𝑥.
Now consider the problem of the decision-maker in the second to last period.
In particular, let the time be 𝑇 − 1, and suppose that the state is 𝑥𝑇 −1 .
The decision-maker must trade off current and (discounted) final losses, and hence solves

$$\min_u \left\{ x_{T-1}' R x_{T-1} + u' Q u + \beta \, \mathbb{E} \, J_T(A x_{T-1} + B u + C w_T) \right\}$$

At this stage, it is convenient to define

$$J_{T-1}(x) := \min_u \left\{ x' R x + u' Q u + \beta \, \mathbb{E} \, J_T(A x + B u + C w_T) \right\} \tag{7}$$

The function $J_{T-1}$ will be called the $T-1$ value function, and $J_{T-1}(x)$ can be thought of as representing total "loss-to-go" from state $x$ at time $T-1$ when the decision-maker behaves optimally.
Now let’s step back to 𝑇 − 2.
For a decision-maker at 𝑇 − 2, the value 𝐽𝑇 −1 (𝑥) plays a role analogous to that played by the
terminal loss 𝐽𝑇 (𝑥) = 𝑥′ 𝑅𝑓 𝑥 for the decision-maker at 𝑇 − 1.
That is, 𝐽𝑇 −1 (𝑥) summarizes the future loss associated with moving to state 𝑥.
The decision-maker chooses her control 𝑢 to trade off current loss against future loss, where
• the next period state is 𝑥𝑇 −1 = 𝐴𝑥𝑇 −2 + 𝐵𝑢 + 𝐶𝑤𝑇 −1 , and hence depends on the choice
of current control.
• the “cost” of landing in state 𝑥𝑇 −1 is 𝐽𝑇 −1 (𝑥𝑇 −1 ).
Her problem is therefore

$$\min_u \left\{ x_{T-2}' R x_{T-2} + u' Q u + \beta \, \mathbb{E} \, J_{T-1}(A x_{T-2} + B u + C w_{T-1}) \right\}$$

Letting

$$J_{T-2}(x) := \min_u \left\{ x' R x + u' Q u + \beta \, \mathbb{E} \, J_{T-1}(A x + B u + C w_{T-1}) \right\}$$

the pattern for backward induction is now clear: working backwards from the terminal condition $J_T(x) = x' R_f x$, we construct the value functions $\{J_T, J_{T-1}, \ldots, J_0\}$ via

$$J_{t-1}(x) = \min_u \left\{ x' R x + u' Q u + \beta \, \mathbb{E} \, J_t(A x + B u + C w_t) \right\}$$

The first equality is the Bellman equation from dynamic programming theory specialized to the finite horizon LQ problem.

Now that we have $\{J_0, \ldots, J_T\}$, we can obtain the optimal controls.
As a first step, let’s find out what the value functions look like.
It turns out that every 𝐽𝑡 has the form 𝐽𝑡 (𝑥) = 𝑥′ 𝑃𝑡 𝑥 + 𝑑𝑡 where 𝑃𝑡 is a 𝑛 × 𝑛 matrix and 𝑑𝑡
is a constant.
We can show this by induction, starting from 𝑃𝑇 ∶= 𝑅𝑓 and 𝑑𝑇 = 0.
Using this notation, (7) becomes

$$J_{T-1}(x) = \min_u \left\{ x' R x + u' Q u + \beta \, \mathbb{E} \left[ (A x + B u + C w_T)' P_T (A x + B u + C w_T) \right] \right\} \tag{8}$$

To obtain the minimizer, we can take the derivative of the r.h.s. with respect to $u$ and set it equal to zero.

Applying the relevant rules of matrix calculus, this gives

$$u = -(Q + \beta B' P_T B)^{-1} \beta B' P_T A \, x \tag{9}$$

Plugging this back in and rearranging yields

$$J_{T-1}(x) = x' P_{T-1} x + d_{T-1}$$

where

$$P_{T-1} = R - \beta^2 A' P_T B (Q + \beta B' P_T B)^{-1} B' P_T A + \beta A' P_T A \tag{10}$$

and

$$d_{T-1} := \beta \, \mathrm{trace}(C' P_T C) \tag{11}$$

Iterating the same argument backwards in time gives $J_t(x) = x' P_t x + d_t$ for every $t$, where

$$P_{t-1} = R - \beta^2 A' P_t B (Q + \beta B' P_t B)^{-1} B' P_t A + \beta A' P_t A \tag{12}$$

$$d_{t-1} = \beta \left( d_t + \mathrm{trace}(C' P_t C) \right) \tag{13}$$

and the optimal control at each $t$ is $u_t = -F_t x_t$, where

$$F_t := (Q + \beta B' P_{t+1} B)^{-1} \beta B' P_{t+1} A \tag{14}$$

The state then evolves according to

$$x_{t+1} = (A - B F_t) x_t + C w_{t+1} \tag{15}$$
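To make the recursion concrete, here is a scalar sketch of (12)-(14) with illustrative parameter values (everything is one-dimensional, so the matrix inverses collapse to division):

# Backward recursion for P_t, d_t and F_t in the scalar case
A, B, C = 1.0, 1.0, 0.5        # Law of motion x' = A x + B u + C w
R, Q, Rf = 1.0, 1.0, 1.0       # Loss parameters
β, T = 0.95, 5

P, d = Rf, 0.0                 # Terminal conditions: P_T = R_f, d_T = 0
for t in range(T, 0, -1):
    F = β * B * P * A / (Q + β * B * P * B)                              # (14)
    P_new = R - (β * B * P * A)**2 / (Q + β * B * P * B) + β * A * P * A # (12)
    d = β * (d + C * P * C)                                              # (13)
    P = P_new
    print(f"t = {t-1}:  P = {P:.4f},  d = {d:.4f},  F = {F:.4f}")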
51.5 Implementation
We will use code from lqcontrol.py in QuantEcon.py to solve finite and infinite horizon linear
quadratic control problems.
In the module, the various updating, simulation and fixed point methods are wrapped in a
class called LQ, which includes
• Instance data:
– The required parameters 𝑄, 𝑅, 𝐴, 𝐵 and optional parameters C, β, T, R_f, N spec-
ifying a given LQ model
* set 𝑇 and 𝑅𝑓 to None in the infinite horizon case
* set C = None (or zero) in the deterministic case
– the value function and policy data
* 𝑑𝑡 , 𝑃𝑡 , 𝐹𝑡 in the finite horizon case
* 𝑑, 𝑃 , 𝐹 in the infinite horizon case
• Methods:
– update_values — shifts 𝑑𝑡 , 𝑃𝑡 , 𝐹𝑡 to their 𝑡 − 1 values via (12), (13) and (14)
– stationary_values — computes 𝑃 , 𝑑, 𝐹 in the infinite horizon case
– compute_sequence —- simulates the dynamics of 𝑥𝑡 , 𝑢𝑡 , 𝑤𝑡 given 𝑥0 and assum-
ing standard normal shocks
51.5.1 An Application
Early Keynesian models assumed that households have a constant marginal propensity to
consume from current income.
Data contradicted the constancy of the marginal propensity to consume.
In response, Milton Friedman, Franco Modigliani and others built models based on a con-
sumer’s preference for an intertemporally smooth consumption stream.
(See, for example, [56] or [119])
One property of those models is that households purchase and sell financial assets to make
consumption streams smoother than income streams.
The household savings problem outlined above captures these ideas.
The optimization problem for the household is to choose a consumption sequence in order to
minimize
$$\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (c_t - \bar c)^2 + \beta^T q \, a_T^2 \right\} \tag{16}$$

subject to the budget constraint $a_{t+1} = (1 + r) a_t - c_t + y_t$, with $a_0$ given. Here $q$ is a large positive constant, the role of which is to induce the consumer to target zero debt at the end of her life.

As before, we set $y_t = \sigma w_{t+1} + \mu$ and $u_t := c_t - \bar c$, after which the law of motion takes the form (3)-(4). To match the objective (16), we take

$$Q := 1, \quad R := \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad \text{and} \quad R_f := \begin{pmatrix} q & 0 \\ 0 & 0 \end{pmatrix}$$
Now that the problem is expressed in LQ form, we can proceed to the solution by applying
(12) and (14).
After generating shocks 𝑤1 , … , 𝑤𝑇 , the dynamics for assets and consumption can be simu-
lated via (15).
The following figure was computed using 𝑟 = 0.05, 𝛽 = 1/(1 + 𝑟), 𝑐 ̄ = 2, 𝜇 = 1, 𝜎 = 0.25, 𝑇 = 45
and 𝑞 = 106 .
The shocks {𝑤𝑡 } were taken to be IID and standard normal.
# Model parameters
r = 0.05
β = 1/(1 + r)
T = 45
c_bar = 2
σ = 0.25
μ = 1
q = 1e6

# Formulate as an LQ problem
Q = 1
R = np.zeros((2, 2))
Rf = np.zeros((2, 2))
Rf[0, 0] = q
A = [[1 + r, -c_bar + μ],
     [0,              1]]
B = [[-1],
     [ 0]]
C = [[σ],
     [0]]

# Compute solutions and simulate
lq = LQ(Q, R, A, B, C, beta=β, T=T, Rf=Rf)
x0 = (0, 1)
xp, up, wp = lq.compute_sequence(x0)

# Convert back to assets, consumption and income
assets = xp[0, :]           # a_t
c = up.flatten() + c_bar    # c_t
income = σ * wp[0, 1:] + μ  # y_t

# Plot results
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5)

bbox = (0., 1.02, 1., .102)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}

axes[0].plot(range(1, T+1), income, 'g-', label="non-financial income", **p_args)
axes[0].plot(range(T), c, 'k-', label="consumption", **p_args)
axes[1].plot(range(1, T+1), np.cumsum(income - μ), 'r-',
             label="cumulative unanticipated income", **p_args)
axes[1].plot(range(T+1), assets, 'b-', label="assets", **p_args)

for ax in axes:
    ax.grid()
    ax.set_xlabel('Time')
    ax.legend(ncol=2, **legend_args)

plt.show()
The top panel shows the time path of consumption 𝑐𝑡 and income 𝑦𝑡 in the simulation.
As anticipated by the discussion on consumption smoothing, the time path of consumption is
much smoother than that for income.
(But note that consumption becomes more irregular towards the end of life, when the zero
final asset requirement impinges more on consumption choices).
The second panel in the figure shows that the time path of assets 𝑎𝑡 is closely correlated with
cumulative unanticipated income, where the latter is defined as
𝑡
𝑧𝑡 ∶= ∑ 𝜎𝑤𝑡
𝑗=0
A key message is that unanticipated windfall gains are saved rather than consumed, while
unanticipated negative shocks are met by reducing assets.
(Again, this relationship breaks down towards the end of life due to the zero final asset re-
quirement)
These results are relatively robust to changes in parameters.
For example, let’s increase 𝛽 from 1/(1 + 𝑟) ≈ 0.952 to 0.96 while keeping other parameters
fixed.
This consumer is slightly more patient than the last one, and hence puts relatively more weight on later consumption values.

# Compute solutions and simulate, now with β = 0.96
lq = LQ(Q, R, A, B, C, beta=0.96, T=T, Rf=Rf)
x0 = (0, 1)
xp, up, wp = lq.compute_sequence(x0)

assets = xp[0, :]           # a_t
c = up.flatten() + c_bar    # c_t
income = σ * wp[0, 1:] + μ  # y_t

# Plot results (same commands as above)
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5)

axes[0].plot(range(1, T+1), income, 'g-', label="non-financial income", **p_args)
axes[0].plot(range(T), c, 'k-', label="consumption", **p_args)
axes[1].plot(range(1, T+1), np.cumsum(income - μ), 'r-',
             label="cumulative unanticipated income", **p_args)
axes[1].plot(range(T+1), assets, 'b-', label="assets", **p_args)

for ax in axes:
    ax.grid()
    ax.set_xlabel('Time')
    ax.legend(ncol=2, **legend_args)

plt.show()
We now have a slowly rising consumption stream and a hump-shaped build-up of assets in the
middle periods to fund rising consumption.
However, the essential features are the same: consumption is smooth relative to income, and
assets are strongly positively correlated with cumulative unanticipated income.
Let’s now consider a number of standard extensions to the LQ problem treated above.
In some LQ problems, preferences include a cross-product term $u_t' N x_t$, so that the objective function becomes

$$\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (x_t' R x_t + u_t' Q u_t + 2 u_t' N x_t) + \beta^T x_T' R_f x_T \right\} \tag{17}$$
Finally, we consider the infinite horizon case, with cross-product term, unchanged dynamics and objective function given by

$$\mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t (x_t' R x_t + u_t' Q u_t + 2 u_t' N x_t) \right\} \tag{20}$$
In the infinite horizon case, optimal policies can depend on time only if time itself is a compo-
nent of the state vector 𝑥𝑡 .
In other words, there exists a fixed matrix 𝐹 such that 𝑢𝑡 = −𝐹 𝑥𝑡 for all 𝑡.
That decision rules are constant over time is intuitive — after all, the decision-maker faces
the same infinite horizon at every stage, with only the current state changing.
Not surprisingly, 𝑃 and 𝑑 are also constant.
The stationary matrix $P$ is the solution to the discrete-time algebraic Riccati equation

$$P = R - (\beta B' P A + N)' (Q + \beta B' P B)^{-1} (\beta B' P A + N) + \beta A' P A \tag{21}$$

Equation (21) is also called the LQ Bellman equation, and the map that sends a given $P$ into the right-hand side of (21) is called the LQ Bellman operator.

The stationary optimal policy for this model is

$$u_t = -F x_t \quad \text{where} \quad F = (Q + \beta B' P B)^{-1} (\beta B' P A + N) \tag{22}$$

The sequence $\{d_t\}$ from (13) is replaced by the constant value

$$d := \frac{\beta}{1 - \beta} \, \mathrm{trace}(C' P C) \tag{23}$$
The state evolves according to the time-homogeneous process 𝑥𝑡+1 = (𝐴 − 𝐵𝐹 )𝑥𝑡 + 𝐶𝑤𝑡+1 .
An example infinite horizon problem is treated below.
Linear quadratic control problems of the class discussed above have the property of certainty
equivalence.
By this, we mean that the optimal policy 𝐹 is not affected by the parameters in 𝐶, which
specify the shock process.
This can be confirmed by inspecting (22) or (19).
It follows that we can ignore uncertainty when solving for optimal behavior, and plug it back
in when examining optimal state dynamics.
51.7 Further Applications

Previously, the consumer's objective was

$$\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (c_t - \bar c)^2 + \beta^T q \, a_T^2 \right\} \tag{24}$$

Suppose now that income follows the polynomial trend $p(t) := m_1 t + m_2 t^2$ plus noise, so that $y_t = m_1 t + m_2 t^2 + \sigma w_{t+1}$ and assets evolve according to

$$a_{t+1} = (1 + r) a_t - c_t + m_1 t + m_2 t^2 + \sigma w_{t+1} \tag{25}$$

The fact that $a_{t+1}$ is a linear function of $(a_t, 1, t, t^2)$ suggests taking these four variables as the state vector $x_t$.
Once a good choice of state and control (recall 𝑢𝑡 = 𝑐𝑡 − 𝑐)̄ has been made, the remaining
specifications fall into place relatively easily.
Thus, for the dynamics we set
$$x_t := \begin{pmatrix} a_t \\ 1 \\ t \\ t^2 \end{pmatrix}, \quad A := \begin{pmatrix} 1 + r & -\bar c & m_1 & m_2 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 2 & 1 \end{pmatrix}, \quad B := \begin{pmatrix} -1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad C := \begin{pmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{26}$$
If you expand the expression 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡 + 𝐶𝑤𝑡+1 using this specification, you will find
that assets follow (25) as desired and that the other state variables also update appropriately.
To implement preference specification (24) we take
$$Q := 1, \quad R := \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad R_f := \begin{pmatrix} q & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{27}$$
The next figure shows a simulation of consumption and assets computed using the
compute_sequence method of lqcontrol.py with initial assets set to zero.
The asset path exhibits dynamics consistent with standard life cycle theory.
Exercise 1 gives the full set of parameters used here and asks you to replicate the figure.
In the previous application, we generated income dynamics with an inverted U shape using
polynomials and placed them in an LQ framework.
It is arguably the case that this income process still contains unrealistic features.
A more common earning profile is where
1. income grows over working life, fluctuating around an increasing trend, with growth flattening off in later years

2. retirement follows, with lower but relatively stable (non-financial) income

Letting $K$ be the retirement date, we can express these income dynamics by

$$y_t = \begin{cases} p(t) + \sigma w_{t+1} & \text{if } t \leq K \\ s & \text{otherwise} \end{cases} \tag{28}$$
Here
• 𝑝(𝑡) ∶= 𝑚1 𝑡 + 𝑚2 𝑡2 with the coefficients 𝑚1 , 𝑚2 chosen such that 𝑝(𝐾) = 𝜇 and 𝑝(0) =
𝑝(2𝐾) = 0
• 𝑠 is retirement income
We suppose that preferences are unchanged and given by (16).
The budget constraint is also unchanged and given by 𝑎𝑡+1 = (1 + 𝑟)𝑎𝑡 − 𝑐𝑡 + 𝑦𝑡 .
Our aim is to solve this problem and simulate paths using the LQ techniques described in this
lecture.
In fact, this is a nontrivial problem, as the kink in the dynamics (28) at 𝐾 makes it very diffi-
cult to express the law of motion as a fixed-coefficient linear system.
However, we can still use our LQ methods here by suitably linking two-component LQ prob-
lems.
These two LQ problems describe the consumer’s behavior during her working life
(lq_working) and retirement (lq_retired).
(This is possible because, in the two separate periods of life, the respective income processes
[polynomial trend and constant] each fit the LQ framework)
The basic idea is that although the whole problem is not a single time-invariant LQ problem,
it is still a dynamic programming problem, and hence we can use appropriate Bellman equa-
tions at every stage.
Based on this logic, we can
1. solve lq_retired by the usual backward induction procedure, iterating back to the
start of retirement.
2. take the start-of-retirement value function generated by this process, and use it as the
terminal condition 𝑅𝑓 to feed into the lq_working specification.
3. solve lq_working by backward induction from this choice of 𝑅𝑓 , iterating back to the
start of working life.
This process gives the entire life-time sequence of value functions and optimal policies.
The next figure shows one simulation based on this procedure.
The full set of parameters used in the simulation is discussed in Exercise 2, where you are
asked to replicate the figure.
Once again, the dominant feature observable in the simulation is consumption smoothing.
The asset path fits well with standard life cycle theory, with dissaving early in life followed by
later saving.
Assets peak at retirement and subsequently decline.
Consider next a monopolist facing the inverse demand curve

$$p_t = a_0 - a_1 q_t + d_t$$

where $q_t$ is output and the demand shock $d_t$ follows the AR(1) process $d_{t+1} = \rho d_t + \sigma w_{t+1}$, with $\{w_t\}$ IID and standard normal.

The monopolist maximizes the expected discounted sum of present and future profits

$$\mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t \pi_t \right\} \quad \text{where} \quad \pi_t := p_t q_t - c q_t - \gamma (q_{t+1} - q_t)^2 \tag{29}$$
Here
• 𝛾(𝑞𝑡+1 − 𝑞𝑡 )2 represents adjustment costs
• 𝑐 is average cost of production
This can be formulated as an LQ problem and then solved and simulated, but first let’s study
the problem and try to get some intuition.
One way to start thinking about the problem is to consider what would happen if 𝛾 = 0.
Without adjustment costs there is no intertemporal trade-off, so the monopolist will choose
output to maximize current profit in each period.
It’s not difficult to show that profit-maximizing output is
𝑎0 − 𝑐 + 𝑑 𝑡
𝑞𝑡̄ ∶=
2𝑎1
With adjustment costs present ($\gamma > 0$), the monopolist trades off the benefit of being close to $\bar q_t$ against the cost of changing output. Setting $u_t := q_{t+1} - q_t$, the problem can be expressed as

$$\min \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left\{ a_1 (q_t - \bar q_t)^2 + \gamma u_t^2 \right\} \tag{30}$$
It’s now relatively straightforward to find 𝑅 and 𝑄 such that (30) can be written as (20).
Furthermore, the matrices 𝐴, 𝐵 and 𝐶 from (1) can be found by writing down the dynamics
of each element of the state.
Exercise 3 asks you to complete this process, and reproduce the preceding figures.
51.8 Exercises
51.8.1 Exercise 1
51.8.2 Exercise 2
51.8.3 Exercise 3
51.9 Solutions
51.9.1 Exercise 1
𝑦𝑡 = 𝑚1 𝑡 + 𝑚2 𝑡2 + 𝜎𝑤𝑡+1
where {𝑤𝑡 } is IID 𝑁 (0, 1) and the coefficients 𝑚1 and 𝑚2 are chosen so that 𝑝(𝑡) = 𝑚1 𝑡 +
𝑚2 𝑡2 has an inverted U shape with
• 𝑝(0) = 0, 𝑝(𝑇 /2) = 𝜇, and
• 𝑝(𝑇 ) = 0
# Model parameters
r = 0.05
β = 1/(1 + r)
T = 45
c_bar = 2
σ = 0.25
μ = 1
q = 1e6

# Income parameters: p(t) = m1*t + m2*t**2, chosen so that
# p(0) = p(T) = 0 and p(T/2) = μ
m1 = T * (μ / (T/2)**2)
m2 = -(μ / (T/2)**2)

# Formulate as an LQ problem
Q = 1
R = np.zeros((4, 4))
Rf = np.zeros((4, 4))
Rf[0, 0] = q
A = [[1 + r, -c_bar, m1, m2],
[0, 1, 0, 0],
[0, 1, 1, 0],
[0, 1, 2, 1]]
B = [[-1],
[ 0],
[ 0],
[ 0]]
C = [[σ],
[0],
[0],
[0]]
# Compute solutions and simulate
lq = LQ(Q, R, A, B, C, beta=β, T=T, Rf=Rf)
x0 = (0, 1, 0, 0)
xp, up, wp = lq.compute_sequence(x0)

# Convert results back to assets, consumption and income
ap = xp[0, :]                  # Assets
c = up.flatten() + c_bar       # Consumption
time = np.arange(1, T+1)
income = σ * wp[0, 1:] + m1 * time + m2 * time**2  # Income

# Plot results
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5)

axes[0].plot(range(1, T+1), income, 'g-', label="non-financial income", **p_args)
axes[0].plot(range(T), c, 'k-', label="consumption", **p_args)
axes[1].plot(range(T+1), ap, 'b-', label="assets", **p_args)

for ax in axes:
    ax.grid()
    ax.set_xlabel('Time')
    ax.legend(ncol=2, **legend_args)

plt.show()
51.9.2 Exercise 2
This is a permanent income / life-cycle model with polynomial growth in income over work-
ing life followed by a fixed retirement income.
The model is solved by combining two LQ programming problems as described in the lecture.
up = np.column_stack((up_w, up_r))
c = up.flatten() + c_bar # Consumption
# Plot results
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))
plt.subplots_adjust(hspace=0.5)
for ax in axes:
ax.grid()
ax.set_xlabel('Time')
ax.legend(ncol=2, **legend_args)
plt.show()
51.9.3 Exercise 3
The first task is to find the matrices 𝐴, 𝐵, 𝐶, 𝑄, 𝑅 that define the LQ problem.
Recall that 𝑥𝑡 = (𝑞𝑡̄ 𝑞𝑡 1)′ , while 𝑢𝑡 = 𝑞𝑡+1 − 𝑞𝑡 .
Letting 𝑚0 ∶= (𝑎0 − 𝑐)/2𝑎1 and 𝑚1 ∶= 1/2𝑎1 , we can write 𝑞𝑡̄ = 𝑚0 + 𝑚1 𝑑𝑡 , and then, with
some manipulation
𝑞𝑡+1
̄ = 𝑚0 (1 − 𝜌) + 𝜌𝑞𝑡̄ + 𝑚1 𝜎𝑤𝑡+1
∞
min 𝔼 {∑ 𝛽 𝑡 𝑎1 (𝑞𝑡 − 𝑞𝑡̄ )2 + 𝛾𝑢2𝑡 }
𝑡=0
# Model parameters
a0 = 5
a1 = 0.5
σ = 0.15
ρ = 0.9
γ = 1
β = 0.95
c = 2

# Useful constants
m0 = (a0 - c) / (2 * a1)
m1 = 1 / (2 * a1)
# Formulate LQ problem
Q = γ
R = [[ a1, -a1, 0],
[-a1, a1, 0],
[ 0, 0, 0]]
A = [[ρ, 0, m0 * (1 - ρ)],
[0, 1, 0],
[0, 0, 1]]
B = [[0],
[1],
[0]]
C = [[m1 * σ],
[ 0],
[ 0]]
# Solve for the optimal policy and simulate
lq = LQ(Q, R, A, B, C=C, beta=β)
P, F, d = lq.stationary_values()

x0 = (m0, 2, 1)
xp, up, wp = lq.compute_sequence(x0, ts_length=150)

q_bar = xp[0, :]    # Sequence of target output levels
q = xp[1, :]        # Sequence of actual output levels

fig, ax = plt.subplots(figsize=(10, 6.5))
time = range(len(q))
ax.set(xlabel='Time', xlim=(0, max(time)))
ax.plot(time, q_bar, 'k-', lw=2, alpha=0.6, label=r'$\bar q_t$')
ax.plot(time, q, 'b-', lw=2, alpha=0.6, label='$q_t$')
ax.legend(ncol=2, **legend_args)
s = f'dynamics with $\gamma = {γ}$'
ax.text(max(time) * 0.6, 1 * q_bar.max(), s, fontsize=14)
plt.show()
Chapter 52

Optimal Savings I: The Permanent Income Model
52.1 Contents
• Overview 52.2
• The Savings Problem 52.3
• Alternative Representations 52.4
• Two Classic Examples 52.5
• Further Reading 52.6
• Appendix: The Euler Equation 52.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
52.2 Overview
This lecture describes a rational expectations version of the famous permanent income model
of Milton Friedman [56].
Robert Hall cast Friedman’s model within a linear-quadratic setting [67].
Like Hall, we formulate an infinite-horizon linear-quadratic savings problem.
We use the model as a vehicle for illustrating
• alternative formulations of the state of a dynamic system
• the idea of cointegration
• impulse response functions
• the idea that changes in consumption are useful as predictors of movements in income
Background readings on the linear-quadratic-Gaussian permanent income model are Hall’s
[67] and chapter 2 of [108].
Let’s start with some imports
import numpy as np
import random
import matplotlib.pyplot as plt
from numba import njit
52.3 The Savings Problem

In this section, we state and solve the savings and consumption problem faced by the consumer.
52.3.1 Preliminaries
We begin with a quick reminder: a stochastic process $\{X_t\}$ is a martingale if

$$\mathbb{E}_t [X_{t+1}] = X_t, \qquad t = 0, 1, 2, \ldots$$

A leading example is the random walk

$$X_{t+1} = X_t + w_{t+1}$$

where $\{w_t\}$ is IID with zero mean, which can be written as

$$X_t = \sum_{j=1}^{t} w_j + X_0$$
Not every martingale arises as a random walk (see, for example, Wald’s martingale).
A consumer has preferences over consumption streams that are ordered by the utility functional

$$\mathbb{E}_0 \left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right] \tag{1}$$
where
• 𝔼𝑡 is the mathematical expectation conditioned on the consumer’s time 𝑡 information
• 𝑐𝑡 is time 𝑡 consumption
• 𝑢 is a strictly concave one-period utility function
• 𝛽 ∈ (0, 1) is a discount factor
The consumer maximizes (1) by choosing a consumption, borrowing plan $\{c_t, b_{t+1}\}_{t=0}^{\infty}$ subject to the sequence of budget constraints

$$c_t + b_t = \frac{1}{1+r} b_{t+1} + y_t, \qquad t \geq 0 \tag{2}$$
Here
• 𝑦𝑡 is an exogenous endowment process.
• 𝑟 > 0 is a time-invariant risk-free net interest rate.
• 𝑏𝑡 is one-period risk-free debt maturing at 𝑡.
The consumer also faces initial conditions 𝑏0 and 𝑦0 , which can be fixed or random.
52.3.3 Assumptions
For the remainder of this lecture, we follow Friedman and Hall in assuming that (1+𝑟)−1 = 𝛽.
Regarding the endowment process, we assume it has the state-space representation

$$z_{t+1} = A z_t + C w_{t+1}, \qquad y_t = U z_t \tag{3}$$
where
• {𝑤𝑡 } is an IID vector process with 𝔼𝑤𝑡 = 0 and 𝔼𝑤𝑡 𝑤𝑡′ = 𝐼.
• The spectral radius of 𝐴 satisfies 𝜌(𝐴) < √1/𝛽.
• 𝑈 is a selection vector that pins down 𝑦𝑡 as a particular linear combination of compo-
nents of 𝑧𝑡 .
The restriction on 𝜌(𝐴) prevents income from growing so fast that discounted geometric sums
of some quadratic forms to be described below become infinite.
Regarding preferences, we assume the quadratic utility function

$$u(c) = -(c - \gamma)^2$$

where $\gamma$ is a bliss level of consumption.
Note
Along with this quadratic utility specification, we allow consumption to be nega-
tive. However, by choosing parameters appropriately, we can make the probability
that the model generates negative consumption paths over finite time horizons as
low as desired.
We further assume the restriction

$$\mathbb{E}_0 \left[ \sum_{t=0}^{\infty} \beta^t b_t^2 \right] < \infty \tag{4}$$
This condition rules out an always-borrow scheme that would allow the consumer to enjoy
bliss consumption forever.
First-order conditions for maximizing (1) subject to (2) are

$$\mathbb{E}_t [u'(c_{t+1})] = u'(c_t), \qquad t = 0, 1, \ldots \tag{5}$$

With our quadratic preferences, these conditions imply that consumption follows a martingale:

$$\mathbb{E}_t [c_{t+1}] = c_t \tag{6}$$

(In fact, quadratic preferences are necessary for this conclusion.)
One way to interpret (6) is that consumption will change only when “new information” about
permanent income is revealed.
These ideas will be clarified below.
Note
One way to solve the consumer’s problem is to apply dynamic programming as
in this lecture. We do this later. But first we use an alternative approach that is
revealing and shows the work that dynamic programming does for us behind the
scenes.
To accomplish this, observe first that (4) implies $\lim_{t \to \infty} \beta^{t/2} b_{t+1} = 0$.

Using this restriction on the debt path and solving (2) forward yields

$$b_t = \sum_{j=0}^{\infty} \beta^j (y_{t+j} - c_{t+j}) \tag{7}$$
Take conditional expectations on both sides of (7) and use the martingale property of consumption and the law of iterated expectations to deduce

$$b_t = \sum_{j=0}^{\infty} \beta^j \, \mathbb{E}_t [y_{t+j}] - \frac{c_t}{1 - \beta} \tag{8}$$

Solving for $c_t$ gives

$$c_t = (1 - \beta) \left[ \sum_{j=0}^{\infty} \beta^j \, \mathbb{E}_t [y_{t+j}] - b_t \right] = \frac{r}{1+r} \left[ \sum_{j=0}^{\infty} \beta^j \, \mathbb{E}_t [y_{t+j}] - b_t \right] \tag{9}$$

Using the endowment process (3) and standard results on forecasting geometric sums, we have

$$\sum_{j=0}^{\infty} \beta^j \, \mathbb{E}_t [y_{t+j}] = \mathbb{E}_t \left[ \sum_{j=0}^{\infty} \beta^j y_{t+j} \right] = U (I - \beta A)^{-1} z_t$$

so that (9) becomes

$$c_t = \frac{r}{1+r} \left[ U (I - \beta A)^{-1} z_t - b_t \right] \tag{10}$$
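Formula (10) relies on the geometric-sum identity $\sum_j \beta^j \mathbb{E}_t[y_{t+j}] = U(I - \beta A)^{-1} z_t$. Here is a quick numerical check of that identity with an arbitrary (assumed) stable state process:

import numpy as np

β = 0.95
A = np.array([[0.9, 0.0],
              [0.3, 0.5]])
U = np.array([[1.0, 1.0]])
z = np.array([[1.0],
              [0.5]])

# Closed form: U (I - βA)^{-1} z
closed = (U @ np.linalg.inv(np.eye(2) - β * A) @ z).item()

# Truncated sum of β^j U A^j z  (E_t[z_{t+j}] = A^j z_t, since shocks are mean zero)
total, Aj = 0.0, np.eye(2)
for j in range(500):
    total += β**j * (U @ Aj @ z).item()
    Aj = Aj @ A
print(closed, total)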
Substituting (10) into the budget constraint gives the law of motion for debt:

$$\begin{aligned}
b_{t+1} &= (1+r)(b_t + c_t - y_t) \\
&= (1+r) b_t + r \left[ U (I - \beta A)^{-1} z_t - b_t \right] - (1+r) U z_t \\
&= b_t + U \left[ r (I - \beta A)^{-1} - (1+r) I \right] z_t \\
&= b_t + U (I - \beta A)^{-1} (A - I) z_t
\end{aligned} \tag{11}$$

To get from the second last to the last expression in this chain of equalities is not trivial.

A key is to use the fact that $(1 + r)\beta = 1$ and $(I - \beta A)^{-1} = \sum_{j=0}^{\infty} \beta^j A^j$.

We've now successfully written $c_t$ and $b_{t+1}$ as functions of $b_t$ and $z_t$.
A State-Space Representation
We can summarize our dynamics in the form of a linear state-space system governing consumption, debt and income. Set

$$x_t = \begin{bmatrix} z_t \\ b_t \end{bmatrix}, \quad \tilde A = \begin{bmatrix} A & 0 \\ U (I - \beta A)^{-1} (A - I) & 1 \end{bmatrix}, \quad \tilde C = \begin{bmatrix} C \\ 0 \end{bmatrix}$$

and

$$\tilde U = \begin{bmatrix} U & 0 \\ (1 - \beta) U (I - \beta A)^{-1} & -(1 - \beta) \end{bmatrix}, \quad \tilde y_t = \begin{bmatrix} y_t \\ c_t \end{bmatrix}$$

Then

$$\begin{aligned}
x_{t+1} &= \tilde A x_t + \tilde C w_{t+1} \\
\tilde y_t &= \tilde U x_t
\end{aligned} \tag{12}$$
We can use the following formulas from linear state space models to compute the population mean $\mu_t = \mathbb{E} x_t$ and covariance $\Sigma_t := \mathbb{E}[(x_t - \mu_t)(x_t - \mu_t)']$:

$$\mu_{t+1} = \tilde A \mu_t \quad \text{with } \mu_0 \text{ given} \tag{13}$$

$$\Sigma_{t+1} = \tilde A \Sigma_t \tilde A' + \tilde C \tilde C' \quad \text{with } \Sigma_0 \text{ given} \tag{14}$$

For the observables,

$$\mu_{y,t} = \tilde U \mu_t, \qquad \Sigma_{y,t} = \tilde U \Sigma_t \tilde U' \tag{15}$$
To gain some preliminary intuition on the implications of (11), let’s look at a highly stylized
example where income is just IID.
(Later examples will investigate more realistic income streams)
In particular, let $\{w_t\}_{t=1}^{\infty}$ be IID and scalar standard normal, and let

$$z_t = \begin{bmatrix} z_t^1 \\ 1 \end{bmatrix}, \quad A = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \quad U = \begin{bmatrix} 1 & \mu \end{bmatrix}, \quad C = \begin{bmatrix} \sigma \\ 0 \end{bmatrix}$$

Under these assumptions, (10) and (11) yield

$$b_t = -\sigma \sum_{j=1}^{t-1} w_j \qquad \text{and} \qquad c_t = \mu + (1 - \beta) \sigma \sum_{j=1}^{t} w_j$$
Thus income is IID and debt and consumption are both Gaussian random walks.
Defining assets as −𝑏𝑡 , we see that assets are just the cumulative sum of unanticipated in-
comes prior to the present date.
The next figure shows a typical realization with 𝑟 = 0.05, 𝜇 = 1, and 𝜎 = 0.15
In [3]: import numpy as np
import matplotlib.pyplot as plt
from numba import njit

r = 0.05
β = 1 / (1 + r)
σ = 0.15
μ = 1
T = 60

@njit
def time_path(T):
    w = np.random.randn(T+1)  # w_0, w_1, ..., w_T
    w[0] = 0
    b = np.zeros(T+1)
    for t in range(1, T+1):
        b[t] = w[1:t].sum()
    b = -σ * b
    c = μ + (1 - β) * (σ * w - b)
    return w, b, c

w, b, c = time_path(T)

# Plotting code reconstructed (the original lines were lost in extraction)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(μ + σ * w, 'g-', label="Non-financial income")
ax.plot(c, 'k-', label="Consumption")
ax.plot(b, 'b-', label="Debt")
ax.legend()
ax.set_xlabel('Time')
plt.show()
import random

fig, ax = plt.subplots(figsize=(10, 6))   # reconstructed; lost in extraction

b_sum = np.zeros(T+1)
for i in range(250):
    w, b, c = time_path(T)  # Generate new time path
    rcolor = random.choice(('c', 'g', 'b', 'k'))
    ax.plot(c, color=rcolor, lw=0.8, alpha=0.7)

ax.grid()
ax.set(xlabel='Time', ylabel='Consumption')
plt.show()
In this section, we shed more light on the evolution of savings, debt and consumption by rep-
resenting their dynamics in several different ways.
Hall [67] suggested an insightful way to summarize the implications of LQ permanent income
theory.
First, to represent the solution for 𝑏𝑡 , shift (9) forward one period and eliminate 𝑏𝑡+1 by using
(2) to obtain
c_{t+1} = (1 − β) ∑_{j=0}^∞ β^j 𝔼_{t+1} [y_{t+j+1}] − (1 − β)[β^{−1}(c_t + b_t − y_t)]
If we add and subtract β^{−1}(1 − β) ∑_{j=0}^∞ β^j 𝔼_t [y_{t+j}] from the right side of the preceding equation and rearrange, we obtain
c_{t+1} − c_t = (1 − β) ∑_{j=0}^∞ β^j {𝔼_{t+1} [y_{t+j+1}] − 𝔼_t [y_{t+j+1}]}    (16)
The right side is the time 𝑡 + 1 innovation to the expected present value of the endowment
process {𝑦𝑡 }.
We can represent the optimal decision rule for (𝑐𝑡 , 𝑏𝑡+1 ) in the form of (16) and (8), which we
repeat:
b_t = ∑_{j=0}^∞ β^j 𝔼_t [y_{t+j}] − (1/(1 − β)) c_t    (17)
Equation (17) asserts that the consumer’s debt due at 𝑡 equals the expected present value of
its endowment minus the expected present value of its consumption stream.
A high debt thus indicates a large expected present value of surpluses 𝑦𝑡 − 𝑐𝑡 .
Recalling again our discussion on forecasting geometric sums, we have
𝔼_t ∑_{j=0}^∞ β^j y_{t+j} = U(I − βA)^{−1} z_t

𝔼_{t+1} ∑_{j=0}^∞ β^j y_{t+j+1} = U(I − βA)^{−1} z_{t+1}

𝔼_t ∑_{j=0}^∞ β^j y_{t+j+1} = U(I − βA)^{−1} A z_t
Using these formulas together with (3) and substituting into (16) and (17) gives the following representation for the consumer's optimum decision rule:

c_{t+1} = c_t + (1 − β)U(I − βA)^{−1} C w_{t+1}
b_t = U(I − βA)^{−1} z_t − (1/(1 − β)) c_t    (18)
52.4.2 Cointegration
Representation (18) reveals that the joint process {𝑐𝑡 , 𝑏𝑡 } possesses the property that Engle
and Granger [51] called cointegration.
Cointegration is a tool that allows us to apply powerful results from the theory of stationary
stochastic processes to (certain transformations of) nonstationary models.
To apply cointegration in the present context, suppose that z_t is asymptotically stationary.
Despite this, both 𝑐𝑡 and 𝑏𝑡 will be non-stationary because they have unit roots (see (11) for
𝑏𝑡 ).
Nevertheless, there is a linear combination of 𝑐𝑡 , 𝑏𝑡 that is asymptotically stationary.
In particular, from the second equality in (18) we have
(1 − β)b_t + c_t = (1 − β)𝔼_t ∑_{j=0}^∞ β^j y_{t+j}    (20)
Equation (20) asserts that the cointegrating residual on the left side equals the conditional expectation of the geometric sum of future incomes on the right.
Consider again (18), this time in light of our discussion of distribution dynamics in the lec-
ture on linear systems.
The dynamics of c_t are given by

c_{t+1} = c_t + (1 − β)U(I − βA)^{−1} C w_{t+1}

or

c_t = c_0 + ∑_{j=1}^{t} ŵ_j    for    ŵ_{t+1} := (1 − β)U(I − βA)^{−1} C w_{t+1}
The unit root affecting c_t causes the time t variance of c_t to grow linearly with t. In particular, since {ŵ_t} is IID, we have

Var[c_t] = Var[c_0] + t σ̂²

where

σ̂² := (1 − β)² U(I − βA)^{−1} C C′ (I − βA′)^{−1} U′
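In code, σ̂² is one matrix expression; the short sketch below (ours, not the lecture's) computes it from A, C, U and β, with U supplied as a one-dimensional selection vector:

import numpy as np

def consumption_variance_slope(A, C, U, β):
    """
    σ̂² = (1-β)² U (I-βA)^{-1} C C' (I-βA')^{-1} U', the per-period
    increment to Var[c_t] implied by the random walk for consumption.
    """
    n = A.shape[0]
    load = (1 - β) * U @ np.linalg.inv(np.eye(n) - β * A) @ C   # loading of ŵ on w
    return float(load @ load)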
Impulse response functions measure responses to various impulses (i.e., temporary shocks).
The impulse response function of {𝑐𝑡 } to the innovation {𝑤𝑡 } is a box.
In particular, the response of 𝑐𝑡+𝑗 to a unit increase in the innovation 𝑤𝑡+1 is (1 − 𝛽)𝑈 (𝐼 −
𝛽𝐴)−1 𝐶 for all 𝑗 ≥ 1.
It’s useful to express the innovation to the expected present value of the endowment process
in terms of a moving average representation for income 𝑦𝑡 .
The endowment process defined by (3) has the moving average representation

y_{t+1} = d(L) w_{t+1}

where

• d(L) = ∑_{j=0}^∞ d_j L^j for some sequence d_j, where L is the lag operator
• at time t, the consumer has an information set w^t = [w_t, w_{t−1}, …]

Notice that

y_{t+j} − 𝔼_t [y_{t+j}] = d_0 w_{t+j} + d_1 w_{t+j−1} + ⋯ + d_{j−1} w_{t+1}

It follows that

𝔼_{t+1} [y_{t+j}] − 𝔼_t [y_{t+j}] = d_{j−1} w_{t+1}

Using this in (16) gives

c_{t+1} − c_t = (1 − β) d(β) w_{t+1}

The object d(β) is the present value of the moving average coefficients in the representation for the endowment process y_t.
52.5 Two Classic Examples

We illustrate some of the preceding ideas with two examples. In both, nonfinancial income is y_t = z_{1t} + z_{2t}, where

[z_{1,t+1}; z_{2,t+1}] = [[1, 0], [0, 0]] [z_{1t}; z_{2t}] + [[σ_1, 0], [0, σ_2]] [w_{1,t+1}; w_{2,t+1}]
Here
• 𝑤𝑡+1 is an IID 2 × 1 process distributed as 𝑁 (0, 𝐼).
• 𝑧1𝑡 is a permanent component of 𝑦𝑡 .
• 𝑧2𝑡 is a purely transitory component of 𝑦𝑡 .
52.5.1 Example 1
Formula (26) shows how an increment 𝜎1 𝑤1𝑡+1 to the permanent component of income 𝑧1𝑡+1
leads to
• a permanent one-for-one increase in consumption and
• no increase in savings −𝑏𝑡+1
But the purely transitory component of income 𝜎2 𝑤2𝑡+1 leads to a permanent increment in
consumption by a fraction 1 − 𝛽 of transitory income.
The remaining fraction 𝛽 is saved, leading to a permanent increment in −𝑏𝑡+1 .
Application of the formula for debt in (11) to this example shows that

b_{t+1} − b_t = −z_{2t} = −σ_2 w_{2t}

This confirms that none of σ_1 w_{1t} is saved, while all of σ_2 w_{2t} is saved.
The next figure illustrates these very different reactions to transitory and permanent income
shocks using impulse-response functions
In [5]: r = 0.05
β = 1 / (1 + r)
T = 20    # assumed horizon for the impulse responses (original value lost)
S = 5     # Impulse date
σ1 = σ2 = 0.15
L = 0.175

@njit
def time_path(T, permanent=False):
    "Time path of consumption and debt given shock sequence"
    w1 = np.zeros(T+1)
    w2 = np.zeros(T+1)
    b = np.zeros(T+1)
    c = np.zeros(T+1)
    if permanent:
        w1[S+1] = 1.0
    else:
        w2[S+1] = 1.0
    for t in range(1, T):
        b[t+1] = b[t] - σ2 * w2[t]
        c[t+1] = c[t] + σ1 * w1[t+1] + (1 - β) * σ2 * w2[t+1]
    return b, c

# Plotting code reconstructed (the original lines were lost in extraction)
fig, axes = plt.subplots(2, 1, figsize=(10, 8))

for ax, permanent in zip(axes, (False, True)):
    b, c = time_path(T, permanent=permanent)
    shock = 'permanent' if permanent else 'transitory'
    ax.set_title(f'Impulse response: {shock} income shock')
    ax.plot(c, 'g-', label="consumption")
    ax.plot(b, 'b-', label="debt")
    ax.set_ylim(-L, L)

axes[0].legend(loc='lower right')
plt.tight_layout()
plt.show()
52.5.2 Example 2
Assume now that at time 𝑡 the consumer observes 𝑦𝑡 , and its history up to 𝑡, but not 𝑧𝑡 .
Under this assumption, it is appropriate to use an innovation representation to form 𝐴, 𝐶, 𝑈
in (18).
The discussion in sections 2.9.1 and 2.11.3 of [108] shows that the pertinent state space repre-
sentation for 𝑦𝑡 is
[y_{t+1}; a_{t+1}] = [[1, −(1 − K)], [0, 0]] [y_t; a_t] + [1; 1] a_{t+1}

y_t = [1  0] [y_t; a_t]
where
• 𝐾 ∶= the stationary Kalman gain
• 𝑎𝑡 ∶= 𝑦𝑡 − 𝐸[𝑦𝑡 | 𝑦𝑡−1 , … , 𝑦0 ]
In the same discussion in [108] it is shown that 𝐾 ∈ [0, 1] and that 𝐾 increases as 𝜎1 /𝜎2 does.
In other words, 𝐾 increases as the ratio of the standard deviation of the permanent shock to
that of the transitory shock increases.
Please see our first look at the Kalman filter.
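To see these comparative statics numerically, here is a small sketch of our own. Rather than iterating the Kalman filter as [108] does, it recovers the stationary gain K by matching the first-order autocovariance of Δy_t = σ_1 w_{1t} + σ_2 (w_{2t} − w_{2,t−1}) with that of an invertible MA(1), whose coefficient is 1 − K in the representation above:

import numpy as np

def stationary_gain(σ1, σ2):
    "K = 1 - θ, where θ is the invertible MA(1) root of Δy."
    ρ = σ2**2 / (σ1**2 + 2 * σ2**2)              # |lag-1 autocorrelation| of Δy
    θ = (1 - np.sqrt(1 - 4 * ρ**2)) / (2 * ρ)    # invertible root, |θ| < 1
    return 1 - θ

for ratio in (0.5, 1.0, 2.0, 4.0):
    print(f"σ1/σ2 = {ratio}: K = {stationary_gain(ratio, 1.0):.3f}")

The printed gains rise with σ1/σ2, consistent with the claim above.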
Applying formulas (18) implies
where the endowment process can now be represented in terms of the univariate innovation to
𝑦𝑡 as
This indicates how the fraction 𝐾 of the innovation to 𝑦𝑡 that is regarded as permanent influ-
ences the fraction of the innovation that is saved.
The model described above significantly changed how economists think about consumption.
While Hall’s model does a remarkably good job as a first approximation to consumption data,
it’s widely believed that it doesn’t capture important aspects of some consumption/savings
data.
For example, liquidity constraints and precautionary savings appear to be present sometimes.
Further discussion can be found in, e.g., [68], [125], [41], [31].
c_0 = b_1/(1 + r) − b_0 + y_0    and    c_1 = y_1 − b_1

max_{b_1} { u(b_1/R − b_0 + y_0) + β 𝔼_0 [u(y_1 − b_1)] }
Chapter 53

Optimal Savings II: LQ Techniques

53.1 Contents
• Overview 53.2
• Setup 53.3
• The LQ Approach 53.4
• Implementation 53.5
• Two Example Economies 53.6
Co-author: Chase Coleman
In addition to what’s in Anaconda, this lecture will need the following libraries:
53.2 Overview
This lecture continues our analysis of the linear-quadratic (LQ) permanent income model of
savings and consumption.
As we saw in our previous lecture on this topic, Robert Hall [67] used the LQ permanent in-
come model to restrict and interpret intertemporal comovements of nondurable consumption,
nonfinancial income, and financial wealth.
For example, we saw how the model asserts that for any covariance stationary process for
nonfinancial income
• consumption is a random walk
• financial wealth has a unit root and is cointegrated with consumption
Other applications use the same LQ framework.
For example, a model isomorphic to the LQ permanent income model has been used by
Robert Barro [14] to interpret intertemporal comovements of a government’s tax collections,
its expenditures net of debt service, and its public debt.
This isomorphism means that in analyzing the LQ permanent income model, we are in effect
also analyzing the Barro tax smoothing model.
It is just a matter of appropriately relabeling the variables in Hall’s model.
53.3 Setup
Let’s recall the basic features of the model discussed in the permanent income model.
Consumer preferences are ordered by
𝔼_0 ∑_{t=0}^∞ β^t u(c_t)    (1)

The consumer faces a sequence of budget constraints

c_t + b_t = (1/(1 + r)) b_{t+1} + y_t,    t ≥ 0    (2)

and the condition

𝔼_0 ∑_{t=0}^∞ β^t b_t² < ∞    (3)
The interpretation of all variables and parameters are the same as in the previous lecture.
We continue to assume that (1 + 𝑟)𝛽 = 1.
The dynamics of {y_t} again follow the linear state space model

z_{t+1} = A z_t + C w_{t+1}
y_t = U z_t    (4)
The restrictions on the shock process and parameters are the same as in our previous lecture.
For the purposes of this lecture, let's assume that {y_t} is a second-order univariate autoregressive process:

y_{t+1} = α + ρ_1 y_t + ρ_2 y_{t−1} + σ w_{t+1}
We can map this into the linear state space framework in (4), as discussed in our lecture on
linear models.
To do so we take
z_t = [1; y_t; y_{t−1}],    A = [[1, 0, 0], [α, ρ_1, ρ_2], [0, 1, 0]],    C = [0; σ; 0],    and    U = [0  1  0]
Previously we solved the permanent income model by solving a system of linear expectational
difference equations subject to two boundary conditions.
Here we solve the same model using LQ methods based on dynamic programming.
After confirming that answers produced by the two methods agree, we apply QuantEcon’s
LinearStateSpace class to illustrate features of the model.
Why solve a model in two distinct ways?
Because by doing so we gather insights about the structure of the model.
Our earlier approach based on solving a system of expectational difference equations brought
to the fore the role of the consumer’s expectations about future nonfinancial income.
On the other hand, formulating the model in terms of an LQ dynamic programming problem
reminds us that
• finding the state (of a dynamic programming problem) is an art, and
• iterations on a Bellman equation implicitly jointly solve both a forecasting problem and
a control problem
Recall from our lecture on LQ theory that the optimal linear regulator problem is to choose a
decision rule for 𝑢𝑡 to minimize
𝔼 ∑_{t=0}^∞ β^t {x_t′ R x_t + u_t′ Q u_t}

subject to

x_{t+1} = Ã x_t + B̃ u_t + C̃ w_{t+1},    t ≥ 0    (5)
where 𝑤𝑡+1 is IID with mean vector zero and 𝔼𝑤𝑡 𝑤𝑡′ = 𝐼.
The tildes in 𝐴,̃ 𝐵,̃ 𝐶 ̃ are to avoid clashing with notation in (4).
The value function for this problem is 𝑣(𝑥) = −𝑥′ 𝑃 𝑥 − 𝑑, where
• 𝑃 is the unique positive semidefinite solution of the corresponding matrix Riccati equa-
tion.
• The scalar d is given by d = β(1 − β)^{−1} trace(P C̃ C̃′).
To map our problem into this framework, we take

x_t := [z_t; b_t] = [1; y_t; y_{t−1}; b_t]
as the state vector and 𝑢𝑡 ∶= 𝑐𝑡 − 𝛾 as the control.
With this notation and 𝑈𝛾 ∶= [𝛾 0 0], we can write the state dynamics as in (5) when
Ã := [[A, 0], [(1 + r)(U_γ − U), 1 + r]],    B̃ := [0; 1 + r],    and    C̃ := [C; 0]
Please confirm for yourself that, with these definitions, the LQ dynamics (5) match the dy-
namics of 𝑧𝑡 and 𝑏𝑡 described above.
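The following sketch (ours, with made-up parameter values) performs that confirmation for the debt component of the state:

import numpy as np

# Hypothetical parameter values, for illustration only
α, ρ1, ρ2, σ, γ, β = 10.0, 0.9, 0.0, 1.0, 2.0, 0.95
r = 1 / β - 1

A = np.array([[1, 0, 0], [α, ρ1, ρ2], [0, 1, 0]])
C = np.array([[0.0], [σ], [0.0]])
U = np.array([[0, 1, 0]])
Uγ = np.array([[γ, 0, 0]])

A_tilde = np.block([[A, np.zeros((3, 1))],
                    [(1 + r) * (Uγ - U), np.array([[1 + r]])]])
B_tilde = np.vstack([np.zeros((3, 1)), [[1 + r]]])
C_tilde = np.vstack([C, [[0.0]]])

# Pick an arbitrary state and control and check the debt transition
z, b, u = np.array([1.0, 20.0, 19.0]), 1.3, 0.7     # u_t = c_t - γ
x = np.hstack([z, b])
b_next = (A_tilde @ x + B_tilde.flatten() * u)[-1]
c, y = u + γ, float(U @ z)
assert np.isclose(b_next, (1 + r) * (b + c - y))    # matches b_{t+1} = (1+r)(b_t + c_t - y_t)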
To map utility into the quadratic form 𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 we can set
• 𝑄 ∶= 1 (remember that we are minimizing) and
• 𝑅 ∶= a 4 × 4 matrix of zeros
However, there is one problem remaining.
We have no direct way to capture the non-recursive restriction (3) on the debt sequence {𝑏𝑡 }
from within the LQ framework.
To try to enforce it, we’re going to use a trick: put a small penalty on 𝑏𝑡2 in the criterion func-
tion.
In the present setting, this means adding a small entry 𝜖 > 0 in the (4, 4) position of 𝑅.
That will induce a (hopefully) small approximation error in the decision rule.
We’ll check whether it really is small numerically soon.
53.5 Implementation
# Set up the LQ matrices (parameter values inferred from the printed output below)
α, β, ρ1, ρ2, σ = 10.0, 0.95, 0.9, 0.0, 1.0
R = 1 / β
A = np.array([[1., 0., 0.],
              [α, ρ1, ρ2],
              [0., 1., 0.]])
C = np.array([[0.], [σ], [0.]])
G = np.array([[0., 1., 0.]])

# 4x4 LQ matrices, appending the debt state (assembly reconstructed)
ALQ = np.vstack([np.hstack([A, np.zeros((3, 1))]),
                 np.array([[0., -R, 0., R]])])
RLQ = np.zeros((4, 4))
RLQ[3, 3] = 1e-9            # small penalty on b_t^2
QLQ = np.array([1.0])
BLQ = np.array([0., 0., 0., R]).reshape(4, 1)
CLQ = np.array([0., σ, 0., 0.]).reshape(4, 1)
β_LQ = β
A =
[[ 1. 0. 0. 0. ]
[10. 0.9 0. 0. ]
[ 0. 1. 0. 0. ]
[ 0. -1.05263158 0. 1.05263158]]
B =
[[0. ]
[0. ]
[0. ]
[1.05263158]]
R =
[[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 1.e-09]]
Q =
[1.]
We’ll save the implied optimal policy function soon compare them with what we get by em-
ploying an alternative solution method
In our first lecture on the infinite horizon permanent income problem we used a different solu-
tion method.
The method was based around
• deducing the Euler equations that are the first-order conditions with respect to con-
sumption and savings.
• using the budget constraints and boundary condition to complete a system of expecta-
tional linear difference equations.
• solving those equations to obtain the solution.
Expressed in state space notation, the solution took the form
In [8]: import scipy.linalg as la   # assumed imported at the top of the lecture

# Use the above formulas to create the optimal policies for b_{t+1} and c_t
b_pol = G @ la.inv(np.eye(3, 3) - β * A) @ (A - np.eye(3, 3))
c_pol = (1 - β) * G @ la.inv(np.eye(3, 3) - β * A)
# Use the following values to start everyone off at b=0, initial incomes zero
μ_0 = np.array([1., 0., 0., 0.])
Σ_0 = np.zeros((4, 4))
A_LSS calculated as we have here should equal ABF calculated above using the LQ model
We have verified that the two methods give the same solution.
Now let’s create instances of the LinearStateSpace class and use it to do some interesting ex-
periments.
To do this, we’ll use the outcomes from our second method.
• In the second example, while all consumers begin with zero debt, we draw their initial income levels from the invariant distribution of nonfinancial income.
– Consumers are ex-ante heterogeneous.
In the first example, consumers’ nonfinancial income paths display pronounced transients
early in the sample
• these will affect outcomes in striking ways
Those transient effects will not be present in the second example.
We use methods affiliated with the LinearStateSpace class to simulate the model.
We generate 25 paths of the exogenous non-financial income process and the associated opti-
mal consumption and debt paths.
In the first set of graphs, darker lines depict a particular sample path, while the lighter lines
describe 24 other paths.
A second graph plots a collection of simulations against the population distribution that we
extract from the LinearStateSpace instance LSS.
Comparing sample paths with population distributions at each date 𝑡 is a useful exercise—see
our discussion of the laws of large numbers
# Simulation/Moment Parameters
npaths = 25                        # reconstructed: "25 paths" per the text above
bsim = np.empty((npaths, T))
csim = np.empty((npaths, T))
ysim = np.empty((npaths, T))

moment_generator = lss.moment_sequence()

for i in range(npaths):
    sims = lss.simulate(T)
    bsim[i, :] = sims[0][-1, :]    # debt is the last state variable
    csim[i, :] = sims[1][1, :]     # consumption is the second observable
    ysim[i, :] = sims[1][0, :]     # income is the first observable
# Get T
T = bsim.shape[1]
# Plot debt
ax[1].plot(bsim[0, :], label="b", color="r")
ax[1].plot(bsim.T, alpha=.1, color="r")
ax[1].legend(loc=4)
ax[1].set(xlabel="t", ylabel="debt")
fig.tight_layout()
return fig
# Consumption fan
ax[0].plot(xvals, cons_mean, color="k")
ax[0].plot(csim.T, color="k", alpha=.25)
ax[0].fill_between(xvals, c_perc_95m, c_perc_95p, alpha=.25, color="b")
ax[0].fill_between(xvals, c_perc_90m, c_perc_90p, alpha=.25, color="r")
ax[0].set(title="Consumption/Debt over time",
ylim=(cmean-15, cmean+15), ylabel="consumption")
# Debt fan
ax[1].plot(xvals, debt_mean, color="k")
ax[1].plot(bsim.T, color="k", alpha=.25)
ax[1].fill_between(xvals, d_perc_95m, d_perc_95p, alpha=.25, color="b")
ax[1].fill_between(xvals, d_perc_90m, d_perc_90p, alpha=.25, color="r")
ax[1].set(xlabel="t", ylabel="debt")
fig.tight_layout()
return fig
Now let’s create figures with initial conditions of zero for 𝑦0 and 𝑏0
(1 − β)b_t + c_t = (1 − β)𝔼_t ∑_{j=0}^∞ β^j y_{t+j}    (6)

So at time 0 we have

c_0 = (1 − β)𝔼_0 ∑_{t=0}^∞ β^t y_t
This tells us that consumption starts at the income that would be paid by an annuity whose
value equals the expected discounted value of nonfinancial income at time 𝑡 = 0.
To support that level of consumption, the consumer borrows a lot early and consequently
builds up substantial debt.
In fact, he or she incurs so much debt that eventually, in the stochastic steady state, he con-
sumes less each period than his nonfinancial income.
He uses the gap between consumption and nonfinancial income mostly to service the interest
payments due on his debt.
Thus, when we look at the panel of debt in the accompanying graph, we see that this is a
group of ex-ante identical people each of whom starts with zero debt.
All of them accumulate debt in anticipation of rising nonfinancial income.
They expect their nonfinancial income to rise toward the invariant distribution of income, a
consequence of our having started them at 𝑦−1 = 𝑦−2 = 0.
Cointegration Residual
The following figure plots realizations of the left side of (6), which, as discussed in our last
lecture, is called the cointegrating residual.
As mentioned above, the right side can be thought of as an annuity payment on the expected present value of future income 𝔼_t ∑_{j=0}^∞ β^j y_{t+j}.

Early along a realization, c_t is approximately constant while (1 − β)b_t and (1 − β)𝔼_t ∑_{j=0}^∞ β^j y_{t+j} both rise markedly as the household's present value of income and borrowing rise pretty much together.
This example illustrates the following point: the definition of cointegration implies that the
cointegrating residual is asymptotically covariance stationary, not covariance stationary.
The cointegrating residual for the specification with zero income and zero debt initially has a
notable transient component that dominates its behavior early in the sample.
By altering initial conditions, we shall remove this transient in our second example to be pre-
sented below
When we set 𝑦−1 = 𝑦−2 = 0 and 𝑏0 = 0 in the preceding exercise, we make debt “head north”
early in the sample.
Average debt in the cross-section rises and approaches the asymptote.
We can regard these as outcomes of a “small open economy” that borrows from abroad at the
fixed gross interest rate 𝑅 = 𝑟 + 1 in anticipation of rising incomes.
So with the economic primitives set as above, the economy converges to a steady state in
which there is an excess aggregate supply of risk-free loans at a gross interest rate of 𝑅.
This excess supply is filled by “foreigner lenders” willing to make those loans.
We can use virtually the same code to rig a “poor man’s Bewley [22] model” in the following
way
• as before, we start everyone at b_0 = 0.
• But instead of starting everyone at y_{−1} = y_{−2} = 0, we draw [y_{−1}; y_{−2}] from the invariant distribution of the {y_t} process.
This rigs a closed economy in which people are borrowing and lending with each other at a
gross risk-free interest rate of 𝑅 = 𝛽 −1 .
Across the group of people being analyzed, risk-free loans are in zero excess supply.
We have arranged primitives so that 𝑅 = 𝛽 −1 clears the market for risk-free loans at zero
aggregate excess supply.
So the risk-free loans are being made from one person to another within our closed set of agents.
There is no need for foreigners to lend to our group.
Let’s have a look at the corresponding figures
Chapter 54

Information and Consumption Smoothing

54.1 Contents
• Overview 54.2
• Two Representations of the Same Nonfinancial Income Process 54.3
• State Space Representations 54.4
Co-author: Zejin Shi
In addition to what’s in Anaconda, this lecture employs the following libraries:
54.2 Overview
This lecture studies two consumers who have exactly the same nonfinancial income process and who both conform to the linear-quadratic permanent income model of consumption smoothing described in the quantecon lecture.
The two consumers have different information about future nonfinancial incomes.
One consumer each period receives news in the form of a shock that simultaneously affects
both today’s nonfinancial income and the present value of future nonfinancial incomes in a
particular way.
The other, less well informed, consumer each period receives a shock that equals the part of
today’s nonfinancial income that could not be forecast from all past values of nonfinancial
income.
Even though they receive exactly the same nonfinancial incomes each period, our two con-
sumers behave differently because they have different information about their future nonfi-
nancial incomes.
The second consumer receives less information about future nonfinancial incomes in a sense
that we shall make precise below.
This difference in their information sets manifests itself in their responding differently to what
they regard as time 𝑡 information shocks.
Thus, while they receive exactly the same histories of nonfinancial income, our two consumers
receive different shocks or news about their future nonfinancial incomes.
We compare behaviors of our two consumers as a way to learn about
• operating characteristics of the linear-quadratic permanent income model
• how the Kalman filter introduced in this lecture and/or the theory of optimal forecast-
ing introduced in this lecture embody lessons that can be applied to the news and
noise literature
• various ways of representing and computing optimal decision rules in the linear-
quadratic permanent income model
• a Ricardian equivalence outcome describing effects on optimal consumption of a tax cut at time t accompanied by a foreseen permanent increase in taxes that is just sufficient to cover the interest payments used to service the risk-free government bonds that are issued to finance the tax cut
• a simple application of alternative ways to factor a covariance generating function along
lines described in this lecture
This lecture can be regarded as an introduction to some of the invertibility issues that take
center stage in the analysis of fiscal foresight by Eric Leeper, Todd Walker, and Susan Yang
[? ].
54.3 Two Representations of the Same Nonfinancial Income Process

Where β ∈ (0, 1), we study consequences of endowing a consumer with one of two alternative representations for the change in the consumer's nonfinancial income y_{t+1} − y_t.

The first representation, which we shall refer to as the original representation, is

y_{t+1} − y_t = ε_{t+1} − β^{−1} ε_t    (1)
where {𝜖𝑡 } is an i.i.d. normally distributed scalar process with means of zero and contempo-
raneous variances 𝜎𝜖2 .
This representation of the process is used by a consumer who at time 𝑡 knows both 𝑦𝑡 and the
original shock 𝜖𝑡 and can use both of them to forecast future 𝑦𝑡+𝑗 ’s.
Furthermore, as we’ll see below, representation (1) has the peculiar property that a positive
shock 𝜖𝑡+1 leaves the discounted present value of the consumer’s financial income at time 𝑡 + 1
unaltered.
The second representation of the same {y_t} process is

y_{t+1} − y_t = a_{t+1} − β a_t    (2)
where {𝑎𝑡 } is another i.i.d. normally distributed scalar process, with means of zero and now
variances 𝜎𝑎2 .
The two i.i.d. shock variances are related by

σ_a² = β^{−2} σ_ε² > σ_ε²

so that the variance of the innovation exceeds the variance of the original shock by a multiplicative factor β^{−2}.
The second representation is the innovations representation from Kalman filtering theory.
To see how this works, note that equating representations (1) and (2) for y_{t+1} − y_t implies ε_{t+1} − β^{−1}ε_t = a_{t+1} − βa_t, which in turn implies

a_{t+1} = βa_t + ε_{t+1} − β^{−1}ε_t

Solving this difference equation backwards for a_{t+1} gives, after a few lines of algebra,
a_{t+1} = ε_{t+1} + (β − β^{−1}) ∑_{j=0}^∞ β^j ε_{t−j}    (3)

which we can also write as

a_{t+1} = ∑_{j=0}^∞ h_j ε_{t+1−j} ≡ h(L) ε_{t+1}

where

h(L) = (1 − β^{−1}L) / (1 − βL)
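As a quick check (ours, not the lecture's), long division of h(L) recovers exactly the moving average coefficients in (3), namely h_0 = 1 and h_j = (β − β^{−1})β^{j−1} for j ≥ 1:

import numpy as np

β, J = 0.95, 12

geo = β ** np.arange(J)        # coefficients of 1 / (1 - βL)
h = geo.copy()
h[1:] -= geo[:-1] / β          # multiply by (1 - β^{-1} L)

h_claim = np.empty(J)
h_claim[0] = 1.0
h_claim[1:] = (β - 1 / β) * β ** np.arange(J - 1)

print(np.allclose(h, h_claim))   # True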
Let c_j ≡ 𝔼[z_t z_{t−j}] be the jth autocovariance of a stationary process {z_t} and, where z ∈ C is a complex variable, let g(z) = ∑_{j=−∞}^∞ c_j z^j denote its covariance generating function.

Using calculations in the quantecon lecture, the covariance generating function of the process {a_t} defined by a_{t+1} = h(L)ε_{t+1} equals

g_a(z) = h(z) h(z^{−1}) σ_ε² = β^{−2} σ_ε²

so that

σ_a² = β^{−2} σ_ε².

To verify these claims, just notice that g_a(z) = β^{−2}σ_ε² implies that the coefficient g_0 = β^{−2}σ_ε² and that g_j = 0 for j ≠ 0; in other words, {a_t} is serially uncorrelated with variance β^{−2}σ_ε².
Alternatively, if you are uncomfortable with covariance generating functions, note that we can directly calculate σ_a² from formula (3) according to

σ_a² = σ_ε² [1 + (β − β^{−1})² ∑_{j=0}^∞ β^{2j}] = β^{−2} σ_ε².
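A two-line numerical check of ours confirms the sum (the truncation level J is arbitrary):

import numpy as np

β, σ_ϵ, J = 0.95, 1.0, 500
σ_a2 = σ_ϵ**2 * (1 + (β - 1 / β)**2 * sum(β**(2 * j) for j in range(J)))
print(np.isclose(σ_a2, σ_ϵ**2 / β**2))   # True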
We can also obtain representation (2) from representation (1) by using the Kalman filter.
Thus, from equations associated with the Kalman filter, it can be verified that the steady-state Kalman gain K = β² and the steady-state conditional covariance Σ = 𝔼[(ε_t − ε̂_t)² | y_{t−1}, y_{t−2}, …] = (1 − β²)σ_ε².

In a little more detail, let z_t = y_t − y_{t−1} and form the state-space representation for {z_t} together with its innovations representation

ε̂_{t+1} = 0 · ε̂_t + K a_{t+1}
z_{t+1} = −β^{−1} ε̂_t + a_{t+1}

where ε̂_t denotes 𝔼[ε_t | z_t, z_{t−1}, …] and a_{t+1} is the one-step-ahead innovation in z_{t+1}.

By applying formulas for the steady-state Kalman filter, by hand we computed that K = β², σ_a² = β^{−2}σ_ε² (= β^{−2} when σ_ε = 1), and Σ = (1 − β²)σ_ε².
We can also obtain these formulas via the classical filtering theory described in this lecture.
Representation (1) is cast in terms of a news shock 𝜖𝑡+1 that represents a shock to nonfinan-
cial income coming from taxes, transfers, and other random sources of income changes known
to a well-informed person having all sorts of information about the income process.
Representation (2) for the same income process is driven by shocks 𝑎𝑡 that contain less infor-
mation than the news shock 𝜖𝑡 .
Representation (2) is called the innovations representation for the {𝑦𝑡 − 𝑦𝑡−1 } process.
It is cast in terms of what time series statisticians call the innovation or fundamental
shock that emerges from applying the theory of optimally predicting nonfinancial income
based solely on the information contained solely in past levels of growth in nonfinancial in-
come.
Fundamental for the 𝑦𝑡 process means that the shock 𝑎𝑡 can be expressed as a square-
summable linear combination of 𝑦𝑡 , 𝑦𝑡−1 , ….
The shock 𝜖𝑡 is not fundamental and has more information about the future of the {𝑦𝑡 −
𝑦𝑡−1 } process than is contained in 𝑎𝑡 .
Representation (3) reveals the important fact that the original shock 𝜖𝑡 contains more in-
formation about future 𝑦’s than is contained in the semi-infinite history 𝑦𝑡 = [𝑦𝑡 , 𝑦𝑡−1 , …] of
current and past 𝑦’s.
Staring at representation (3) for a_{t+1} shows that it consists both of new news ε_{t+1} as well as a long moving average (β − β^{−1}) ∑_{j=0}^∞ β^j ε_{t−j} of old news.
The better informed representation (1) asserts that a shock ε_t results in an impulse response to nonfinancial income of ε_t times the sequence

1, 1 − β^{−1}, 1 − β^{−1}, …

Representation (2), i.e., the innovations representation, asserts that a shock a_t results in an impulse response to nonfinancial income of a_t times

1, 1 − β, 1 − β, …
Notice that representation (1), namely, y_{t+1} − y_t = −β^{−1}ε_t + ε_{t+1}, implies the linear difference equation

ε_t = βε_{t+1} − β(y_{t+1} − y_t).

Solving this difference equation forward gives

ε_t = β (y_t − (1 − β) ∑_{j=0}^∞ β^j y_{t+j+1})
This equation shows that ε_t equals β times the one-step-backwards error in optimally backcasting y_t based on the future y_+^t ≡ [y_{t+1}, y_{t+2}, …] via the optimal backcasting formula

𝔼[y_t | y_+^t] = (1 − β) ∑_{j=0}^∞ β^j y_{t+j+1}
Thus, 𝜖𝑡 contains exact information about an important linear combination of future nonfi-
nancial income.
Next notice that representation (2), namely, y_{t+1} − y_t = −βa_t + a_{t+1}, implies the linear difference equation

a_{t+1} = βa_t + (y_{t+1} − y_t)

Solving this equation backward establishes that the one-step-prediction error a_{t+1} is

a_{t+1} = y_{t+1} − (1 − β) ∑_{j=0}^∞ β^j y_{t−j}

and, where the information set is y^t = [y_t, y_{t−1}, …], the one-step-ahead optimal prediction is

𝔼[y_{t+1} | y^t] = (1 − β) ∑_{j=0}^∞ β^j y_{t−j}
When we compute optimal consumption-saving policies for the two representations using formulas obtained with the difference equation approach described in the quantecon lecture, we obtain:
for a consumer having the information assumed in the news representation (1):
𝑐𝑡+1 − 𝑐𝑡 = 0
𝑏𝑡+1 − 𝑏𝑡 = −𝛽 −1 𝜖𝑡
for a consumer having the more limited information associated with the innova-
tions representation (2):
𝑐𝑡+1 − 𝑐𝑡 = (1 − 𝛽 2 )𝑎𝑡+1
𝑏𝑡+1 − 𝑏𝑡 = −𝛽𝑎𝑡
These formulas agree with outcomes from the Python programs to be reported below using
state-space representations and dynamic programming.
Evidently the two consumers behave differently though they receive exactly the same histories
of nonfinancial income.
The consumer with information associated with representation (1) responds to each shock ε_{t+1} by leaving his consumption unaltered and saving all of ε_{t+1} in anticipation of the permanently increased taxes that he will bear to pay for the addition ε_{t+1} to his time t + 1 nonfinancial income.
The consumer with information associated with representation (2) responds to a shock 𝑎𝑡+1
by increasing his consumption by what he perceives to be the permanent part of the in-
crease in consumption and by increasing his saving by what he perceives to be the tempo-
rary part.
We can regard the first consumer as someone whose behavior sharply illustrates the behavior
assumed in a classic Ricardian equivalence experiment.
We can cast our two representations in terms of the following two state space systems
[y_{t+1}; ε_{t+1}] = [[1, −β^{−1}], [0, 0]] [y_t; ε_t] + [σ_ε; σ_ε] v_{t+1}

y_t = [1  0] [y_t; ε_t]

and

[y_{t+1}; a_{t+1}] = [[1, −β], [0, 0]] [y_t; a_t] + [σ_a; σ_a] u_{t+1}

y_t = [1  0] [y_t; a_t]
where {𝑣𝑡 } and {𝑢𝑡 } are both i.i.d. sequences of univariate standardized normal random vari-
ables.
These two alternative income processes are ready to be used in the framework presented in
the section “Comparison with the Difference Equation Approach” in the quantecon lecture.
All the code that we shall use below is presented in that lecture.
54.4.1 Computations
We shall use Python to form both of the above two state-space representations, using the
following parameter values 𝜎𝜖 = 1, 𝜎𝑎 = 𝛽 −1 𝜎𝜖 = 𝛽 −1 where 𝛽 is the same value as the
discount factor in the household’s problem in the LQ savings problem in the lecture.
For these two representations, we use the code in the lecture to
• compute optimal decision rules for 𝑐𝑡 , 𝑏𝑡 for the two types of consumers associated with
our two representations of nonfinancial income
• use the value function objects P, d returned by the code to compute optimal values for the two representations when evaluated at the initial condition x_0 = [10; 0]
• create instances of the LinearStateSpace class for the two representations of the {𝑦𝑡 }
process and use them to obtain impulse response functions of 𝑐𝑡 and 𝑏𝑡 to the respective
shocks 𝜖𝑡 and 𝑎𝑡 for the two representations.
• run simulations of {𝑦𝑡 , 𝑐𝑡 , 𝑏𝑡 } of length 𝑇 under both of the representations (later I’ll
give some more details about how we’ll run some special versions of these)
We want to solve the LQ problem

min ∑_{t=0}^∞ β^t (c_t − γ)²

subject to the sequence of budget constraints

c_t + b_t = (1/(1 + r)) b_{t+1} + y_t,    t ≥ 0
Our two alternative state transition laws are

[y_{t+1}; ε_{t+1}; b_{t+1}] = [[1, −β^{−1}, 0], [0, 0, 0], [−(1 + r), 0, 1 + r]] [y_t; ε_t; b_t] + [0; 0; 1 + r] c_t + [σ_ε; σ_ε; 0] ν_{t+1}

with the three matrices on the right labeled A_1, B_1, C_1, and

[y_{t+1}; a_{t+1}; b_{t+1}] = [[1, −β, 0], [0, 0, 0], [−(1 + r), 0, 1 + r]] [y_t; a_t; b_t] + [0; 0; 1 + r] c_t + [σ_a; σ_a; 0] u_{t+1}

with the matrices on the right labeled A_2, B_2, C_2.
R = 1 / β
Evidently optimal consumption and debt decision rules for the consumer having news representation (1) are

c_t* = y_t − ε_t − (1 − β) b_t

b_{t+1}* = β^{−1} c_t* + β^{−1} b_t − β^{−1} y_t
         = β^{−1} y_t − β^{−1} ε_t − (β^{−1} − 1) b_t + β^{−1} b_t − β^{−1} y_t
         = b_t − β^{−1} ε_t
In [7]: -F2
For a consumer having access only to the information associated with the innovations representation (2), the optimal decision rules are

c_t* = y_t − β² a_t − (1 − β) b_t

b_{t+1}* = β^{−1} c_t* + β^{−1} b_t − β^{−1} y_t
         = β^{−1} y_t − β a_t − (β^{−1} − 1) b_t + β^{−1} b_t − β^{−1} y_t
         = b_t − β a_t
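Here is a sketch of how these rules can be recovered numerically with QuantEcon's LQ class. It is ours, not the lecture's exact code, and it assumes β = 0.95, σ_ε = 1, and a bliss level γ = 0 so that the three-dimensional state suffices; as in the previous lecture, a tiny penalty on b_t² stands in for the no-Ponzi condition:

import numpy as np
import quantecon as qe

β, σϵ = 0.95, 1.0
σa = σϵ / β
r = 1 / β - 1

def solve_rep(ρ, σ):
    "LQ solution when y_{t+1} - y_t = -ρ s_t + s_{t+1} for a shock s_t."
    A = np.array([[1, -ρ, 0],
                  [0,  0, 0],
                  [-(1 + r), 0, 1 + r]])
    B = np.array([[0.0], [0.0], [1 + r]])
    C = np.array([[σ], [σ], [0.0]])
    R = np.zeros((3, 3))
    R[2, 2] = 1e-9               # small penalty on b_t², as in the LQ lecture
    Q = np.array([[1.0]])        # cost on the control c_t
    lq = qe.LQ(Q, R, A, B, C=C, beta=β)
    P, F, d = lq.stationary_values()
    return F

F1 = solve_rep(1 / β, σϵ)   # news representation: expect -F1 ≈ [1, -1, -(1-β)]
F2 = solve_rep(β, σa)       # innovations representation: expect -F2 ≈ [1, -β², -(1-β)]
print(-F1, -F2)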
Now we construct two Linear State Space models that emerge from using optimal policies
𝑢𝑡 = −𝐹 𝑥𝑡 for the control variable.
[y_{t+1}; ε_{t+1}; b_{t+1}] = (A_1 − B_1 F_1) [y_t; ε_t; b_t] + C_1 ν_{t+1}

[c_t; b_t] = [−F_1; S_b] [y_t; ε_t; b_t]
To have the Linear State Space model of the innovations representation case, we can simply
replace the corresponding matrices.
The above two impulse response functions show that when the consumer has the information
assumed in the original representation, his response to receiving a positive shock of 𝜖𝑡 is to
leave his consumption unchanged and to save the entire amount of his extra income and then
forever roll over the extra bonds that he holds.
To see this, notice that starting from next period on, his debt permanently decreases by β^{−1}.
The above impulse responses show that when the consumer has only the information that is
assumed to be available under the innovations representation for {𝑦𝑡 − 𝑦𝑡−1 }, he responds to a
positive 𝑎𝑡 by permanently increasing his consumption.
He accomplishes this by consuming a fraction (1 − 𝛽 2 ) of the increment 𝑎𝑡 to his nonfinancial
income and saving the rest in order to lower 𝑏𝑡+1 to finance the permanent increment in his
consumption.
The preceding computations confirm what we had derived earlier using paper and pencil.
Now let’s simulate some paths of consumption and debt for our two types of consumers while
always presenting both types with the same {𝑦𝑡 } path, constructed as described below.
54.4.2 Simulating the Income Process and Two Associated Shock Processes
We now describe how we form a single {𝑦𝑡 }𝑇𝑡=0 realization that we will use to simulate the
two different decision rules associated with our two types of consumer.
We accomplish this in the following steps.
1. We form a {y_t, ε_t} realization by drawing a long simulation of {ε_t}_{t=0}^T, where T is a big integer, ε_t = σ_ε v_t, v_t is a standard normal scalar, y_0 = 100, and

y_{t+1} − y_t = −β^{−1}ε_t + ε_{t+1}

2. We take the same {y_t} realization generated in step 1 and form an innovation process {a_t} from the formulas

a_0 = 0
a_t = ∑_{j=0}^{t−1} β^j (y_{t−j} − y_{t−j−1}) + β^t a_0,    t ≥ 1

3. We throw away the first S observations and form the sample {y_t, ε_t, a_t}_{t=S+1}^T as the realization that we'll use in the following steps.

4. We use the step 3 realization to evaluate and simulate the decision rules for c_t, b_t that Python has computed for us above.
The above steps implement the experiment of comparing decisions made by two consumers
having identical incomes at each date but at each date having different information about
their future incomes.
Here we use formula (3) above to compute a_{t+1} as a function of the history ε_{t+1}, ε_t, ε_{t−1}, ….

We can verify that we recover the same {a_t} sequence computed earlier.
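Here is a minimal version of that verification (ours, not the lecture's code). Note that the steps above set a_0 = 0, whereas for an exact match between the two computations the sketch below starts both recursions at a_0 = ε_0, an assumption we make purely for the comparison:

import numpy as np

β, σϵ, T = 0.95, 1.0, 300
rng = np.random.default_rng(0)

ϵ = σϵ * rng.standard_normal(T + 1)
y = np.empty(T + 1)
y[0] = 100.0
for t in range(T):
    y[t + 1] = y[t] - ϵ[t] / β + ϵ[t + 1]

# Recursively: a_{t+1} = β a_t + (y_{t+1} - y_t)
a_rec = np.empty(T + 1)
a_rec[0] = ϵ[0]
for t in range(T):
    a_rec[t + 1] = β * a_rec[t] + (y[t + 1] - y[t])

# Directly from (3): a_{t+1} = ϵ_{t+1} + (β - β^{-1}) Σ_j β^j ϵ_{t-j}
a_form = np.empty(T + 1)
a_form[0] = ϵ[0]
for t in range(T):
    tail = sum(β**j * ϵ[t - j] for j in range(t + 1))
    a_form[t + 1] = ϵ[t + 1] + (β - 1 / β) * tail

print(np.allclose(a_rec, a_form))   # True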
This quantecon lecture contains another example of a shock-invertibility issue that is endemic
to the LQ permanent income or consumption smoothing model.
The technical issue discussed there is ultimately the source of the shock-invertibility issues
discussed by Eric Leeper, Todd Walker, and Susan Yang [? ] in their analysis of fiscal fore-
sight.
Chapter 55

Consumption Smoothing with Complete and Incomplete Markets

55.1 Contents
• Overview 55.2
• Background 55.3
• Linear State Space Version of Complete Markets Model 55.4
• Model 1 (Complete Markets) 55.5
• Model 2 (One-Period Risk-Free Debt Only) 55.6
In addition to what’s in Anaconda, this lecture uses the library:
55.2 Overview
We maintain Hall’s assumption about the interest rate when we describe an incomplete mar-
kets version of our model.
In addition, we extend Hall’s assumption about the risk-free interest rate to an appropriate
counterpart to create a “complete markets” model in which there are markets in a complete
array of one-period Arrow state-contingent securities.
In this lecture we’ll consider two closely related but distinct alternative assumptions about
the consumer’s exogenous nonfinancial income process:
• that it is generated by a finite 𝑁 state Markov chain (setting 𝑁 = 2 most of the time in
this lecture)
• that it is described by a linear state space model with a continuous state vector in ℝ𝑛
driven by a Gaussian vector IID shock process
We’ll spend most of this lecture studying the finite-state Markov specification, but will begin
by studying the linear state space specification because it is so closely linked to earlier lec-
tures.
Let’s start with some imports:
55.3 Background
These state-contingent securities are commonly called Arrow securities, after Kenneth Arrow (https://en.wikipedia.org/wiki/Kenneth_Arrow).
In the incomplete markets version of the model, the consumer can buy and sell only one
security each period, a risk-free one-period bond with gross one-period return 𝛽 −1 .
Now we’ll study a complete markets model adapted to a setting with a continuous Markov
state like that in the first lecture on the permanent income model.
In that model, there are
• incomplete markets: the consumer can trade only a single risk-free one-period bond
bearing gross one-period risk-free interest rate equal to 𝛽 −1 .
• the consumer’s exogenous nonfinancial income is governed by a linear state space model
driven by Gaussian shocks, the kind of model studied in an earlier lecture about linear
state space models.
We’ll write down a complete markets counterpart of that model.
Suppose that nonfinancial income is governed by the state space system

x_{t+1} = A x_t + C w_{t+1}
y_t = S_y x_t

and suppose that Arrow securities are priced by the pricing kernel

q_{t+1}(x_{t+1} | x_t) = β φ(x_{t+1} | A x_t, C C′)

where φ(⋅ | μ, Σ) is a multivariate Gaussian distribution with mean vector μ and covariance matrix Σ.

With the pricing kernel q_{t+1}(x_{t+1} | x_t) in hand, we can price claims to time t + 1 consumption that pay off when x_{t+1} ∈ A at time t + 1:
In the complete markets setting, the consumer faces a sequence of budget constraints

c_t + b_t = y_t + β 𝔼_t [b_{t+1}]

which verifies that β 𝔼_t [b_{t+1}] is the value of the time t + 1 state-contingent claims issued by the consumer at time t.
We can solve the time 𝑡 budget constraint forward to obtain
b_t = 𝔼_t ∑_{j=0}^∞ β^j (y_{t+j} − c_{t+j})
We assume as before that the consumer cares about the expected value of
∑_{t=0}^∞ β^t u(c_t),    0 < β < 1
In the incomplete markets version of the model, we assumed that 𝑢(𝑐𝑡 ) = −(𝑐𝑡 − 𝛾)2 , so that
the above utility functional became
− ∑_{t=0}^∞ β^t (c_t − γ)²,    0 < β < 1
But in the complete markets version, it is tractable to assume a more general utility function
that satisfies 𝑢′ > 0 and 𝑢″ < 0.
The first-order conditions for the consumer's problem with complete markets and our assumption about Arrow securities prices imply that u′(c_{t+1}) = u′(c_t), so that optimal consumption is constant: c_t = c̄ for all t ≥ 0. It follows that

b_t = 𝔼_t ∑_{j=0}^∞ β^j (y_{t+j} − c̄)
or
b_t = S_y (I − βA)^{−1} x_t − (1/(1 − β)) c̄    (2)
where 𝑐 ̄ satisfies
b̄_0 = S_y (I − βA)^{−1} x_0 − (1/(1 − β)) c̄    (3)
where 𝑏̄0 is an initial level of the consumer’s debt, specified as a parameter of the problem.
Thus, in the complete markets version of the consumption-smoothing model, 𝑐𝑡 = 𝑐,̄ ∀𝑡 ≥ 0
is determined by (3) and the consumer’s debt is a fixed function of the state 𝑥𝑡 described by
(2).
Please recall that in the LQ permanent income model studied in first lecture on the perma-
nent income model, the state is 𝑥𝑡 , 𝑏𝑡 , where 𝑏𝑡 is a complicated function of past state vectors
𝑥𝑡−𝑗 .
Notice that in contrast to that incomplete markets model, in our complete markets model , at
time 𝑡 the state vector is 𝑥𝑡 alone.
Here’s an example that shows how in this setting the availability of insurance against fluctu-
ating nonfinancial income allows the consumer completely to smooth consumption across time
and across states of the world
# Debt (rm = (I - β A)^{-1}, computed earlier in this cell)
x_hist, y_hist = lss.simulate(T)
b_hist = np.squeeze(S_y @ rm @ x_hist - cbar / (1 - β))
# Define parameters
N_simul = 80
fig, ax = plt.subplots(1, 2, figsize=(14, 4))   # reconstructed; lost in extraction

# Consumption plots
ax[0].set_title('Consumption and income')
ax[0].plot(np.arange(N_simul), c_hist_com, label='consumption')
ax[0].plot(np.arange(N_simul), y_hist_com, label='income', alpha=.6,�
↪linestyle='--')
ax[0].legend()
ax[0].set_xlabel('Periods')
ax[0].set_ylim([80, 120])
# Debt plots
ax[1].set_title('Debt and income')
ax[1].plot(np.arange(N_simul), b_hist_com, label='debt')
ax[1].plot(np.arange(N_simul), y_hist_com, label='Income', alpha=.6,�
↪linestyle='--')
ax[1].legend()
ax[1].axhline(0, color='k')
ax[1].set_xlabel('Periods')
plt.show()
The incomplete markets version of the model with nonfinancial income being governed by a
linear state space system is described in the first lecture on the permanent income model and
the followup lecture on the permanent income model.
In that version, consumption follows a random walk and the consumer’s debt follows a pro-
cess with a unit root.
We now turn to a finite-state Markov version of the model in which the consumer’s nonfinan-
cial income is an exact function of a Markov state that takes one of 𝑁 values.
We’ll start with a setting in which in each version of our consumption-smoothing models,
nonfinancial income is governed by a two-state Markov chain (it’s easy to generalize this to
an 𝑁 state Markov chain).
In particular, the state of the world is given by 𝑠𝑡 ∈ {1, 2} that follows a Markov chain with
transition probability matrix
𝑃𝑖𝑗 = ℙ{𝑠𝑡+1 = 𝑗 | 𝑠𝑡 = 𝑖}
Nonfinancial income is

y_t = ȳ_1 if s_t = 1,    and    y_t = ȳ_2 if s_t = 2
The consumer cares about

𝔼 [∑_{t=0}^∞ β^t u(c_t)]    where u(c_t) = −(c_t − γ)² and 0 < β < 1    (4)
Our complete and incomplete markets models differ in how effectively the market structure
allows a consumer to transfer resources across time and Markov states, there being more
transfer opportunities in the complete markets setting than in the incomplete markets setting.
Watch how these differences in opportunities affect
• how smooth consumption is across time and Markov states
• how the consumer chooses to make his levels of indebtedness behave over time and
across Markov states
At each date 𝑡 ≥ 0, the consumer trades a full array of one-period ahead Arrow securi-
ties.
We assume that prices of these securities are exogenous to the consumer.
Exogenous means that they are unaffected by the consumer’s decisions.
In Markov state 𝑠𝑡 at time 𝑡, one unit of consumption in state 𝑠𝑡+1 at time 𝑡 + 1 costs
𝑞(𝑠𝑡+1 | 𝑠𝑡 ) units of the time 𝑡 consumption good.
The prices 𝑞(𝑠𝑡+1 | 𝑠𝑡 ) are given and can be organized into a matrix 𝑄 with 𝑄𝑖𝑗 = 𝑞(𝑗|𝑖)
At time 𝑡 = 0, the consumer starts with an inherited level of debt due at time 0 of 𝑏0 units of
time 0 consumption goods.
The consumer’s budget constraint at 𝑡 ≥ 0 in Markov state 𝑠𝑡 is
where 𝑏𝑡 is the consumer’s one-period debt that falls due at time 𝑡 and 𝑏𝑡+1 (𝑗 | 𝑠𝑡 ) are the con-
sumer’s time 𝑡 sales of the time 𝑡 + 1 consumption good in Markov state 𝑗.
These are
• when multiplied by 𝑞(𝑗 | 𝑠𝑡 ), a source of time 𝑡 revenues to the consumer
• a source of time 𝑡 + 1, obligations or expenditures
A natural analog of Hall's assumption that the one-period risk-free gross interest rate is β^{−1} is

q(j | i) = βP_ij    (6)
To understand how this is a natural analogue, observe that in state i it costs ∑_j q(j | i) to purchase one unit of consumption next period for sure, i.e., no matter what state of the world occurs at t + 1.

Hence the implied price of a risk-free claim on one unit of consumption next period is

∑_j q(j | i) = ∑_j βP_ij = β
This confirms the sense in which (6) is a natural counterpart to Hall’s assumption that the
risk-free one-period gross interest rate is 𝑅 = 𝛽 −1 .
It is timely to recall that the gross one-period risk-free interest rate is the reciprocal of the price at time t of a risk-free claim on one unit of consumption tomorrow.
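In code this check is immediate; here is a short sketch with the transition matrix used in our examples:

import numpy as np

β = 0.96
P = np.array([[.8, .2],
              [.4, .6]])
Q = β * P              # Arrow securities prices under (6)
print(Q.sum(axis=1))   # each row sums to β = 0.96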
First-order necessary conditions for maximizing the consumer's expected utility subject to the sequence of budget constraints (5) are

β (u′(c_{t+1}) / u′(c_t)) ℙ{s_{t+1} | s_t} = q(s_{t+1} | s_t)

Combined with our assumption (6) about Arrow security prices, these conditions imply u′(c_{t+1}) = u′(c_t), or

c_{t+1} = c_t    (7)
We’ll use a guess and verify method to determine these objects
Guess: We’ll make the plausible guess that
so that the amount borrowed today turns out to depend only on tomorrow’s Markov state.
(Why is this is a plausible guess?)
To determine 𝑐,̄ we shall pursue implications of the consumer’s budget constraints in each
Markov state today and our guess (8) about the consumer’s debt level choices.
For t ≥ 1, these imply

c̄ + b(1) = y(1) + q(1 | 1)b(1) + q(2 | 1)b(2)
c̄ + b(2) = y(2) + q(1 | 2)b(1) + q(2 | 2)b(2)    (9)

while at t = 0, with the initial Markov state taken to be s_0 = 1,

c̄ + b_0 = y(1) + q(1 | 1)b(1) + q(2 | 1)b(2)    (10)

where b_0 is the (exogenous) debt the consumer is assumed to bring into period 0.

If we substitute (10) into the first equation of (9) and rearrange, we discover that

b(1) = b_0    (11)

We can then use the second equation of (9) to deduce a restriction that, together with (10), determines c̄ and b(2).
The preceding calculations indicate that in the complete markets version of our model, we
obtain the following striking results:
• The consumer chooses to make consumption perfectly constant across time and across
Markov states.
• State-contingent debt purchases b_{t+1}(s_{t+1} = j | s_t = i) depend only on j.
• If the initial Markov state is s_0 = j and initial consumer debt is b_0, then debt in Markov state j satisfies b(j) = b_0.
To summarize what we have achieved up to now, we have computed the constant level of con-
sumption 𝑐 ̄ and indicated how that level depends on the underlying specifications of prefer-
ences, Arrow securities prices, the stochastic process of exogenous nonfinancial income, and
the initial debt level 𝑏0
• The consumer’s debt neither accumulates, nor decumulates, nor drifts – instead, the
debt level each period is an exact function of the Markov state, so in the two-state
Markov case, it switches between two values.
• We have verified guess (8).
• When the state 𝑠𝑡 returns to the initial state 𝑠0 , debt returns to the initial debt level.
• Debt levels in all other states depend on virtually all remaining parameters of the
model.
55.5.2 Code
Here’s some code that, among other things, contains a function called consump-
tion_complete().
This function computes {b(i)}_{i=1}^N and c̄ as outcomes given a set of parameters for the general case with N Markov states under the assumption of complete markets.
class ConsumptionProblem:
    """
    The data for a consumption problem, including some default values.
    """

    def __init__(self,
                 β=.96,
                 y=[2, 1.5],
                 b0=3,
                 P=[[.8, .2],
                    [.4, .6]],
                 init=0):
        """
        Parameters
        ----------
        β : discount factor
        y : list containing the two income levels
        b0 : debt in period 0 (= initial state debt level)
        P : 2x2 transition matrix
        init : index of initial state s0
        """
        self.β = β
        self.y = np.asarray(y)
        self.b0 = b0
        self.P = np.asarray(P)
        self.init = init

    def simulate(self, N_simul=80, random_state=1):
        """
        Simulate the path of Markov states. (The body of this method was
        lost in extraction; this is a minimal reconstruction.)
        """
        rng = np.random.default_rng(random_state)
        s_path = np.empty(N_simul, dtype=int)
        s_path[0] = self.init
        for t in range(1, N_simul):
            s_path[t] = rng.choice(len(self.y), p=self.P[s_path[t - 1]])
        return s_path
def consumption_complete(cp):
    """
    Computes endogenous values for the complete market case.

    Parameters
    ----------
    cp : instance of ConsumptionProblem

    Returns
    -------
    c_bar and b, associated with the price system Q = β * P
    """
    β, P, y, b0, init = cp.β, cp.P, cp.y, cp.b0, cp.init  # Unpack

    Q = β * P                    # assumed price system
    n = len(y) + 1               # unknowns: c_bar and b(1), ..., b(N)

    # Stack the state budget constraints c̄ + b_i = y_i + (Qb)_i with the
    # time-0 constraint c̄ + b0 = y_init + (Qb)_init and solve the linear
    # system for x = (c̄, b(1), ..., b(N))'. (The assembly below is a
    # reconstruction; parts of this cell were lost in extraction.)
    A = np.zeros((n, n))
    A[:, 0] = 1
    A[1:, 1:] = np.eye(n - 1)

    Q_aug = np.zeros((n, n))
    Q_aug[0, 1:] = Q[init, :]
    Q_aug[1:, 1:] = Q

    y_aug = np.empty((n, 1))
    y_aug[0, 0] = y[init] - b0
    y_aug[1:, 0] = y

    x = np.linalg.inv(A - Q_aug) @ y_aug
    c_bar = x[0, 0]
    b = x[1:, 0]

    return c_bar, b
def consumption_incomplete(cp, s_path):
    """
    Computes endogenous consumption and debt paths using (17) and (18).

    Parameters
    ----------
    cp : instance of ConsumptionProblem
    s_path : the path of states
    """
    β, P, y, b0 = cp.β, cp.P, cp.y, cp.b0  # Unpack
    N_simul = len(s_path)

    # Useful variables
    n = len(y)
    y.shape = (n, 1)
    v = np.linalg.inv(np.eye(n) - β * P) @ y   # expected PV of income by state

    # Allocation and drift lines reconstructed (lost in extraction)
    db = ((1 - β) * v - y) / β                 # debt drift by state, from (18)
    c_path = np.zeros(N_simul)
    b_path = np.zeros(N_simul + 1)
    b_path[0] = b0

    for i, s in enumerate(s_path):
        c_path[i] = (1 - β) * (v - b_path[i] * np.ones((n, 1)))[s, 0]
        b_path[i + 1] = b_path[i] + db[s, 0]

    y_path = y[s_path, 0]
    return c_path, b_path[:-1], y_path
In [5]: cp = ConsumptionProblem()
c_bar, b = consumption_complete(cp)
np.isclose(c_bar + b[1] - cp.y[1] - (cp.β * cp.P)[1, :] @ b, 0)
Out[5]: True
Below, we’ll take the outcomes produced by this code – in particular the implied consumption
and debt paths – and compare them with outcomes from an incomplete markets model in the
spirit of Hall [67]
This is a version of the original models of Hall (1978) in which the consumer’s ability to sub-
stitute intertemporally is constrained by his ability to buy or sell only one security, a risk-free
one-period bond bearing a constant gross interest rate that equals 𝛽 −1 .
Given an initial debt 𝑏0 at time 0, the consumer faces a sequence of budget constraints
𝑐𝑡 + 𝑏𝑡 = 𝑦𝑡 + 𝛽𝑏𝑡+1 , 𝑡≥0
where β is the price at time t of a risk-free claim on one unit of consumption at time t + 1.
First-order conditions for the consumer's problem are

u′(c_t) = 𝔼_t [u′(c_{t+1})]

which for our finite-state Markov setting is Hall's (1978) conclusion that consumption follows a random walk.
As we saw in our first lecture on the permanent income model, this leads to
b_t = 𝔼_t ∑_{j=0}^∞ β^j y_{t+j} − (1 − β)^{−1} c_t    (14)
and
c_t = (1 − β) [𝔼_t ∑_{j=0}^∞ β^j y_{t+j} − b_t]    (15)
Equation (15) expresses c_t as a net interest rate factor 1 − β times the sum of the expected present value of nonfinancial income 𝔼_t ∑_{j=0}^∞ β^j y_{t+j} and financial wealth −b_t.
Substituting (15) into the one-period budget constraint and rearranging leads to
b_{t+1} − b_t = β^{−1} [(1 − β) 𝔼_t ∑_{j=0}^∞ β^j y_{t+j} − y_t]    (16)
Now let's calculate the key term 𝔼_t ∑_{j=0}^∞ β^j y_{t+j} in our finite Markov chain setting. Define

v_t := 𝔼_t ∑_{j=0}^∞ β^j y_{t+j}

which satisfies the recursion

v_t = y_t + β 𝔼_t v_{t+1}
In our two-state Markov chain setting, 𝑣𝑡 = 𝑣(1) when 𝑠𝑡 = 1 and 𝑣𝑡 = 𝑣(2) when 𝑠𝑡 = 2.
Therefore, we can write our Bellman equation as

v(i) = y(i) + β ∑_j P_ij v(j),    i = 1, 2

or

v⃗ = y⃗ + βP v⃗

where v⃗ = [v(1); v(2)] and y⃗ = [y(1); y(2)].

We can also write the last expression as

v⃗ = (I − βP)^{−1} y⃗
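For the two-state example used throughout, computing v⃗ is one line of linear algebra; the income values below are the defaults in ConsumptionProblem:

import numpy as np

β = 0.96
P = np.array([[.8, .2],
              [.4, .6]])
y = np.array([2.0, 1.5])
v = np.linalg.inv(np.eye(2) - β * P) @ y   # expected discounted income by state
print(v)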
In our finite Markov chain setting, from expression (15), consumption at date t when debt is b_t and the Markov state today is s_t = i is evidently

c(b_t, i) = (1 − β)[v(i) − b_t]    (17)

and, from (16), debt evolves according to

b_{t+1} − b_t = β^{−1}[(1 − β)v(i) − y(i)]    (18)
In contrast to outcomes in the complete markets model, in the incomplete markets model
• consumption drifts over time as a random walk; the level of consumption at time 𝑡 de-
pends on the level of debt that the consumer brings into the period as well as the ex-
pected discounted present value of nonfinancial income at 𝑡.
• the consumer’s debt drifts upward over time in response to low realizations of nonfinan-
cial income and drifts downward over time in response to high realizations of nonfinan-
cial income.
• the drift over time in the consumer’s debt and the dependence of current consumption
on today’s debt level account for the drift over time in consumption.
The code above also contains a function called consumption_incomplete() that uses (17) and
(18) to
• simulate paths of 𝑦𝑡 , 𝑐𝑡 , 𝑏𝑡+1
• plot these against values of 𝑐,̄ 𝑏(𝑠1 ), 𝑏(𝑠2 ) found in a corresponding complete markets
economy
Let’s try this, using the same parameters in both complete and incomplete markets economies
In [6]: cp = ConsumptionProblem()
s_path = cp.simulate()
N_simul = len(s_path)

# Reconstructed lines (lost in extraction): compute both solutions, set up axes
c_bar, debt_complete = consumption_complete(cp)
c_path, debt_path, y_path = consumption_incomplete(cp, s_path)
fig, ax = plt.subplots(1, 2, figsize=(14, 4))

ax[0].set_title('Consumption paths')
ax[0].plot(np.arange(N_simul), c_path, label='incomplete market')
ax[0].plot(np.arange(N_simul), c_bar * np.ones(N_simul),
label='complete market')
ax[0].plot(np.arange(N_simul), y_path, label='income', alpha=.6, ls='--')
ax[0].legend()
ax[0].set_xlabel('Periods')
ax[1].set_title('Debt paths')
ax[1].plot(np.arange(N_simul), debt_path, label='incomplete market')
ax[1].plot(np.arange(N_simul), debt_complete[s_path],
label='complete market')
ax[1].plot(np.arange(N_simul), y_path, label='income', alpha=.6, ls='--')
ax[1].legend()
ax[1].axhline(0, color='k', ls='--')
ax[1].set_xlabel('Periods')
plt.show()
In the graph on the left, for the same sample path of nonfinancial income 𝑦𝑡 , notice that
• consumption is constant when there are complete markets, but takes a random walk in
the incomplete markets version of the model.
• the consumer’s debt oscillates between two values that are functions of the Markov state
in the complete markets model, while the consumer’s debt drifts in a “unit root” fashion
in the incomplete markets economy.
55.6.3 A sequel
In tax smoothing with complete and incomplete markets, we reinterpret the mathematics and
Python code presented in this lecture in order to construct tax-smoothing models in the in-
complete markets tradition of Barro [14] as well as in the complete markets tradition of Lucas
and Stokey [111].
Chapter 56

Tax Smoothing with Complete and Incomplete Markets

56.1 Contents
• Overview 56.2
• Tax Smoothing with Complete Markets 56.3
• Returns on State-Contingent Debt 56.4
• More Finite Markov Chain Tax-Smoothing Examples 56.5
In addition to what’s in Anaconda, this lecture uses the library:
56.2 Overview
This lecture describes two types of tax-smoothing models that are counterparts to the
consumption-smoothing models in Consumption Smoothing with Complete and Incomplete
Markets.
• one is in the complete markets tradition of Lucas and Stokey [111].
• the other is in the incomplete markets tradition of Hall [67] and Barro [14].
Complete markets allow a government to buy or sell claims contingent on all possible states of
the world.
Incomplete markets allow a government to buy or sell only a limited set of securities, often
only a single risk-free security.
Barro [14] worked in an incomplete markets tradition by assuming that the only asset that
can be traded is a risk-free one period bond.
Hall assumed an exogenous stochastic process of nonfinancial income and an exogenous gross
interest rate on one period risk-free debt that equals 𝛽 −1 , where 𝛽 ∈ (0, 1) is also a con-
sumer’s intertemporal discount factor.
Barro [14] made an analogous assumption about the risk-free interest rate in a tax-smoothing
model that turns out to have the same mathematical structure as Hall’s consumption-
smoothing model.
Link to History

For those who love history, President Thomas Jefferson's Secretary of the Treasury Albert Gallatin (1807) [61] prescribed policies that come from Barro's model [14].
Let’s start with some standard imports:
56.2.2 Code
Here’s some code that, among other things, contains a function called consump-
tion_complete().
This function computes {b(i)}_{i=1}^N and c̄ as outcomes given a set of parameters for the general case with N Markov states under the assumption of complete markets.
class ConsumptionProblem:
    """
    The data for a consumption problem, including some default values.
    """

    def __init__(self,
                 β=.96,
                 y=[2, 1.5],
                 b0=3,
                 P=[[.8, .2],
                    [.4, .6]],
                 init=0):
        """
        Parameters
        ----------
        β : discount factor
        y : list containing the two income levels
        b0 : debt in period 0 (= initial state debt level)
        P : 2x2 transition matrix
        init : index of initial state s0
        """
        self.β = β
        self.y = np.asarray(y)
        self.b0 = b0
        self.P = np.asarray(P)
        self.init = init

    def simulate(self, N_simul=80, random_state=1):
        """
        Simulate the path of Markov states. (The body of this method was
        lost in extraction; this is a minimal reconstruction.)
        """
        rng = np.random.default_rng(random_state)
        s_path = np.empty(N_simul, dtype=int)
        s_path[0] = self.init
        for t in range(1, N_simul):
            s_path[t] = rng.choice(len(self.y), p=self.P[s_path[t - 1]])
        return s_path
def consumption_complete(cp):
    """
    Computes endogenous values for the complete market case.

    Parameters
    ----------
    cp : instance of ConsumptionProblem

    Returns
    -------
    c_bar and b, associated with the price system Q = β * P
    """
    β, P, y, b0, init = cp.β, cp.P, cp.y, cp.b0, cp.init  # Unpack

    Q = β * P                    # assumed price system
    n = len(y) + 1               # unknowns: c_bar and b(1), ..., b(N)

    # Stack the state budget constraints c̄ + b_i = y_i + (Qb)_i with the
    # time-0 constraint c̄ + b0 = y_init + (Qb)_init and solve the linear
    # system for x = (c̄, b(1), ..., b(N))'. (The assembly below is a
    # reconstruction; parts of this cell were lost in extraction.)
    A = np.zeros((n, n))
    A[:, 0] = 1
    A[1:, 1:] = np.eye(n - 1)

    Q_aug = np.zeros((n, n))
    Q_aug[0, 1:] = Q[init, :]
    Q_aug[1:, 1:] = Q

    y_aug = np.empty((n, 1))
    y_aug[0, 0] = y[init] - b0
    y_aug[1:, 0] = y

    x = np.linalg.inv(A - Q_aug) @ y_aug
    c_bar = x[0, 0]
    b = x[1:, 0]

    return c_bar, b
def consumption_incomplete(cp, s_path):
    """
    Computes endogenous consumption and debt paths using (17) and (18).

    Parameters
    ----------
    cp : instance of ConsumptionProblem
    s_path : the path of states
    """
    β, P, y, b0 = cp.β, cp.P, cp.y, cp.b0  # Unpack
    N_simul = len(s_path)

    # Useful variables
    n = len(y)
    y.shape = (n, 1)
    v = np.linalg.inv(np.eye(n) - β * P) @ y   # expected PV of income by state

    # Allocation and drift lines reconstructed (lost in extraction)
    db = ((1 - β) * v - y) / β                 # debt drift by state, from (18)
    c_path = np.zeros(N_simul)
    b_path = np.zeros(N_simul + 1)
    b_path[0] = b0

    for i, s in enumerate(s_path):
        c_path[i] = (1 - β) * (v - b_path[i] * np.ones((n, 1)))[s, 0]
        b_path[i + 1] = b_path[i] + db[s, 0]

    y_path = y[s_path, 0]
    return c_path, b_path[:-1], y_path
In [4]: cp = ConsumptionProblem()
s_path = cp.simulate()
N_simul = len(s_path)

# Reconstructed lines (lost in extraction): compute both solutions, set up axes
c_bar, debt_complete = consumption_complete(cp)
c_path, debt_path, y_path = consumption_incomplete(cp, s_path)
fig, ax = plt.subplots(1, 2, figsize=(14, 4))

ax[0].set_title('Consumption paths')
ax[0].plot(np.arange(N_simul), c_path, label='incomplete market')
ax[0].plot(np.arange(N_simul), c_bar * np.ones(N_simul), label='complete�
↪market')
ax[1].set_title('Debt paths')
ax[1].plot(np.arange(N_simul), debt_path, label='incomplete market')
ax[1].plot(np.arange(N_simul), debt_complete[s_path], label='complete�
↪market')
plt.show()
In the graph on the left, for the same sample path of nonfinancial income 𝑦𝑡 , notice that
• consumption is constant when there are complete markets, but takes a random walk in
the incomplete markets version of the model.
• the consumer’s debt oscillates between two values that are functions of the Markov state
in the complete markets model, while the consumer’s debt in the incomplete markets
economy drifts because it contains a unit root.
ax[0].legend()
ax[0].set_xlabel('Periods')
ax[0].set_ylim([1.4, 2.1])
plt.show()
56.3 Tax Smoothing with Complete Markets

In Markov state i at time t, the government's budget constraint is

T_i + b_i = G_i + ∑_j Q_ij b_j

where T_i denotes tax collections, G_i government expenditures, and

Q_ij = βP_ij

is the price of one unit of goods when tomorrow's Markov state is j and when today's Markov state is i.
$b_i$ is the government's level of assets when it arrives in Markov state $i$.

That is, $b_i$ equals the one-period state-contingent claims owed to the government that fall due at time $t$ when the Markov state is $i$.

Thus, if $b_i < 0$, the government owes $-b_i$ when the economy arrives in Markov state $i$ at time $t$.
In our examples below, this happens when in a previous war-time period the government has sold Arrow securities paying off $-b_i$ in peacetime Markov state $i$.
It can be enlightening to express the government's budget constraint in Markov state $i$ as

$$T_i = G_i + \left(\sum_j Q_{ij} b_j - b_i\right)$$

in which the term $\left(\sum_j Q_{ij} b_j - b_i\right)$ equals the net amount that the government spends to purchase one-period Arrow securities that will pay off next period in Markov states $j = 1, \ldots, N$.
56.4 Returns on State-Contingent Debt

The ex post one-period gross return on the portfolio of government assets held from state $i$ at time $t$ to state $j$ at time $t+1$ is

$$R(j \mid i) = \frac{b(j)}{\sum_{j'=1}^{N} Q_{ij'} b(j')}$$

where $\sum_{j'=1}^{N} Q_{ij'} b(j')$ is the amount that the government spends at time $t$ in Markov state $i$ to purchase one-period state-contingent claims that will pay off at time $t+1$ depending on what Markov state $j$ is realized then.
The cumulative return earned from putting 1 unit of time $t$ goods into the government portfolio of state-contingent securities at time $t$ and then rolling over the proceeds into the government portfolio each period thereafter is

$$R_T := \prod_{t=0}^{T-1} R(s_{t+1} \mid s_t)$$

Here is some code that computes these objects:
def ex_post_gross_return(b, cp):
    # Ex post one-period gross return on the portfolio of
    # government assets (def line reconstructed)
    Q = cp.β * cp.P
    values = Q @ b
    n = len(b)
    R = np.zeros((n, n))
    for i in range(n):
        ind = cp.P[i, :] != 0
        R[i, ind] = b[ind] / values[i]
    return R

def cumulative_return(s_path, R):
    # Cumulative return along a path of Markov states
    # (def line reconstructed)
    T = len(s_path)
    RT_path = np.empty(T)
    RT_path[0] = 1
    RT_path[1:] = np.cumprod([R[s_path[t], s_path[t+1]] for t in range(T-1)])
    return RT_path
We’ll study a tax-smoothing version of the two Markov state example studied above.
There is peace and government expenditures are low in Markov state 1.
There is war and government expenditures are high in Markov state 2.
We’ll compute optimal policies in both complete and incomplete markets settings.
Then we’ll feed in a particular assumed path of Markov states and study outcomes.
• We’ll assume that the initial Markov state is state 1, which means we start from a state
of peace.
• The government then experiences 3 time periods of war and comes back to peace again.
• The history of Markov states is therefore {𝑝𝑒𝑎𝑐𝑒, 𝑤𝑎𝑟, 𝑤𝑎𝑟, 𝑤𝑎𝑟, 𝑝𝑒𝑎𝑐𝑒}.
In addition, as indicated above, to simplify our example, we’ll set the government’s initial as-
set level to 1, so that 𝑏1 = 1.
Here's our code to compute a quantitative example initialized to have government assets equal to one in an initial peacetime state:
In [7]: # Parameters
        β = .96
        g = [1, 2]                      # govt expenditures in peace and war
        b0 = 1                          # initial govt assets
        P = np.array([[.8, .2],
                      [.4, .6]])

        cp = ConsumptionProblem(β, g, b0, P)
        Q = β * P

        print(f"P \n {P}")
        print(f"Q \n {Q}")
        print(f"Govt expenditures in peace and war = {g}")

        # Reinterpret constant complete-markets consumption as constant
        # taxes, and state-contingent debt as govt assets (reconstructed)
        T_bar, b = consumption_complete(cp)
        print(f"Constant tax collections = {T_bar}")
        print(f"Govt debts in two states = {-b}")

        msg = """
        Now let's check the government's budget constraint in peace and war.
        Our assumptions imply that the government always purchases 0 units of the
        Arrow peace security.
        """
        print(msg)

        # spending on Arrow securities
        AS1 = Q[0, :] @ b
        AS2 = Q[1, :] @ b
        # since the spending on the Arrow peace security is not 0 anymore
        # after we change b0 to 1
        print(f"Spending on Arrow securities in peace = {AS1}")
        print(f"Spending on Arrow securities in war = {AS2}")
        print("")

        # tax collections minus debt levels
        print("Government tax collections minus debt levels in peace and war")
        TB1 = T_bar + b[0]
        print(f"T+b in peace = {TB1}")
        TB2 = T_bar + b[1]
        print(f"T+b in war = {TB2}")
        print("")

        print("Total government spending in peace and war")
        G1 = g[0] + AS1
        G2 = g[1] + AS2
        print(f"Peace = {G1}")
        print(f"War = {G2}")
        print("")

        print("Let's see ex-post and ex-ante returns on Arrow securities")
        Π = np.reciprocal(Q)
        exret = Π
        print(f"Ex-post returns to purchase of Arrow securities = \n {exret}")
        exant = Π * P
        print(f"Ex-ante returns to purchase of Arrow securities \n {exant}")
        print("")

        # returns along the assumed history {peace, war, war, war, peace}
        s_path = [0, 1, 1, 1, 0]
        R = ex_post_gross_return(b, cp)
        RT_path = cumulative_return(s_path, R)

        print("The ex-post one-period gross return on the portfolio of government assets")
        print(R)
        print("")

        print("The cumulative return earned from holding 1 unit market portfolio of government bonds")
        print(RT_path[-1])
P
[[0.8 0.2]
[0.4 0.6]]
Q
[[0.768 0.192]
[0.384 0.576]]
Govt expenditures in peace and war = [1, 2]
Now let's check the government's budget constraint in peace and war.
Our assumptions imply that the government always purchases 0 units of the
Arrow peace security.
The cumulative return earned from holding 1 unit market portfolio of government bonds
2.0860704239993675
56.4.2 Explanation
In this example, the government always purchases 1 unit of the Arrow security that pays off in peace time (Markov state 1).
And it purchases a higher amount of the security that pays off in war time (Markov state 2).
We recommend plugging the quantities computed above into the government budget con-
straints in the two Markov states and staring.
This is an example in which
• during peacetime, the government purchases insurance against the possibility that war
breaks out next period
• during wartime, the government purchases insurance against the possibility that war
continues another period
• the return on the insurance against war is low so long as peace continues
• the return on the insurance against war is high when war breaks out or continues
• given the history of states that we assumed, the value of one unit of the portfolio of
government assets will double in the end because of high returns during wartime.
Exercise: try changing the Markov transition matrix so that
$$P = \begin{bmatrix} 1 & 0 \\ .2 & .8 \end{bmatrix}$$
Also, start the system in Markov state 2 (war) with initial government assets −10, so that the
government starts the war in debt and 𝑏2 = −10.
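A hedged sketch of setting up this exercise with the objects defined above; the parameter names reuse the earlier cells, while the transition matrix, initial state, and initial asset level come from the exercise statement:

P_ex = np.array([[1, 0],
                 [.2, .8]])
# init=1 starts the system in the war state; the sign convention for
# b0 follows the cells above
cp_ex = ConsumptionProblem(β, g, -10, P_ex, init=1)
T_bar_ex, b_ex = consumption_complete(cp_ex)
print(f"Constant tax collections = {T_bar_ex}")
print(f"Govt debts in two states = {-b_ex}")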
56.5 More Finite Markov Chain Tax-Smoothing Examples

For thinking about some episodes in the fiscal history of the United States, we find it interesting to study a few more examples that we now present.

Here we give more examples of tax-smoothing models with both complete and incomplete markets in an $N$ state Markov setting.

These examples differ in how the Markov state jumps between peace and war.
To wrap procedures for solving models, relabeling graphs so that we record government debt
rather than government assets, and displaying results, we construct a new class below.
"""
def __init__(self, g, P, b0, states, β=.96,
init=0, s_path=None, N_simul=80, random_state=1):
def display(self):
# plot graphs
N = len(self.T_path)
plt.figure()
plt.title('Tax collection paths')
plt.plot(np.arange(N), self.T_path, label='incomplete market')
56.5. MORE FINITE MARKOV CHAIN TAX-SMOOTHING EXAMPLES 959
fig, ax = plt.subplots()
ax.set_title('Cumulative return path (complete market)')
line1 = ax.plot(np.arange(N), self.RT_path)[0]
c1 = line1.get_color()
ax.set_xlabel('Periods')
ax.set_ylabel('Cumulative return', color=c1)
ax_ = ax.twinx()
ax_._get_lines.prop_cycler = ax._get_lines.prop_cycler
line2 = ax_.plot(np.arange(N), self.g_path, ls='--')[0]
c2 = line2.get_color()
ax_.set_ylabel('Government expenditures', color=c2)
plt.show()
print(f"P \n {self.cp.P}")
print(f"Q \n {Q}")
print(f"Govt expenditures in {', '.join(self.states)} = {self.cp.y.
↪ flatten()}")
print(f"Constant tax collections = {self.T_bar}")
print(f"Govt debt in {len(self.states)} states = {-self.b}")
print("")
print(f"Government tax collections minus debt levels in {',
'.join(self.states)}")
for i in range(len(self.states)):
TB = self.T_bar + self.b[i]
print(f" T+b in {self.states[i]} = {TB}")
print("")
print(f"Total government spending in {', '.join(self.states)}")
960CHAPTER 56. TAX SMOOTHING WITH COMPLETE AND INCOMPLETE MARKETS
for i in range(len(self.states)):
G = self.cp.y[i, 0] + Q[i, :] @ self.b
print(f" {self.states[i]} = {G}")
print("")
print("Let's see ex-post and ex-ante returns on Arrow securities \n")
print("")
exant = 1 / self.cp.β
print(f"Ex-ante returns to purchase of Arrow securities = {exant}")
print("")
print("The Ex-post one-period gross return on the portfolio of�
↪ government
assets")
print(self.R)
print("")
print("The cumulative return earned from holding 1 unit market�
↪ portfolio of
government bonds")
print(self.RT_path[-1])
56.5.1 Parameters
In [9]: γ = .1
λ = .1
ϕ = .1
θ = .1
ψ = .1
g_L = .5
g_M = .8
g_H = 1.2
β = .96
56.5.2 Example 1
This example is designed to produce some stylized versions of tax, debt, and deficit paths fol-
lowed by the United States during and after the Civil War and also during and after World
War I.
We set the Markov chain to have three states
$$P = \begin{bmatrix} 1-\lambda & \lambda & 0 \\ 0 & 1-\phi & \phi \\ 0 & 0 & 1 \end{bmatrix}$$
We set 𝑏0 = 1 and assume that the initial Markov state is state 1 so that the system starts off
in peace.
These parameters have government expenditure beginning at a low level, surging during the
war, then decreasing after the war to a level that exceeds its prewar level.
(This type of pattern occurred in the US Civil War and World War I experiences.)
# Construct and solve Example 1 (construction cell reconstructed from
# the printed output below)
g_ex1 = [g_L, g_H, g_M]                  # peace, war, postwar
P_ex1 = np.array([[1-λ, λ, 0],
                  [0, 1-ϕ, ϕ],
                  [0, 0, 1]])
b0_ex1 = 1
states_ex1 = ['peace', 'war', 'postwar']

ts_ex1 = TaxSmoothingExample(g_ex1, P_ex1, b0_ex1, states_ex1)
ts_ex1.display()
P
[[0.9 0.1 0. ]
[0. 0.9 0.1]
[0. 0. 1. ]]
Q
[[0.864 0.096 0. ]
[0. 0.864 0.096]
[0. 0. 0.96 ]]
Govt expenditures in peace, war, postwar = [0.5 1.2 0.8]
Constant tax collections = 0.7548096885813149
Govt debt in 3 states = [-1. -4.07093426 -1.12975779]
The cumulative return earned from holding 1 unit market portfolio of government bonds
0.17908622141460384
In [12]: # The following shows the use of the wrapper class when a
         # specific state path is given
         s_path = [0, 0, 1, 1, 2]
         ts_s_path = TaxSmoothingExample(g_ex1, P_ex1, b0_ex1, states_ex1,
                                         s_path=s_path)
         ts_s_path.display()
P
[[0.9 0.1 0. ]
[0. 0.9 0.1]
[0. 0. 1. ]]
Q
[[0.864 0.096 0. ]
[0. 0.864 0.096]
[0. 0. 0.96 ]]
Govt expenditures in peace, war, postwar = [0.5 1.2 0.8]
Constant tax collections = 0.7548096885813149
Govt debt in 3 states = [-1. -4.07093426 -1.12975779]
The cumulative return earned from holding 1 unit market portfolio of government bonds
0.9045311615620274
56.5.3 Example 2

This example captures a peace followed by a war, eventually followed by a permanent peace.

Here we set

$$P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1-\gamma & \gamma \\ \phi & 0 & 1-\phi \end{bmatrix}$$
# Construct and solve Example 2 (construction sketch; expenditure
# levels, b0, and the initial state are assumptions: g_L in the peace
# states, g_H in war)
g_ex2 = [g_L, g_L, g_H]
P_ex2 = np.array([[1, 0, 0],
                  [0, 1-γ, γ],
                  [ϕ, 0, 1-ϕ]])
b0_ex2 = 1
states_ex2 = ['peace', 'temporary peace', 'war']

ts_ex2 = TaxSmoothingExample(g_ex2, P_ex2, b0_ex2, states_ex2)
ts_ex2.display()
P
[[1. 0. 0. ]
Government tax collections minus debt levels in peace, temporary peace, war
T+b in peace = -2.027889273356399
T+b in temporary peace = 1.6053287197231834
T+b in war = 3.1191695501730106
The cumulative return earned from holding 1 unit market portfolio of government bonds
-9.3689917325942
56.5.4 Example 3
This example features a situation in which one of the states is a war state with no hope of
peace next period, while another state is a war state with a positive probability of peace next
period.
The Markov chain is:
$$P = \begin{bmatrix} 1-\lambda & \lambda & 0 & 0 \\ 0 & 1-\phi & \phi & 0 \\ 0 & 0 & 1-\psi & \psi \\ \theta & 0 & 0 & 1-\theta \end{bmatrix}$$

with government expenditure levels for the four states being $[g_L\ g_L\ g_H\ g_H]$ where $g_L < g_H$.
We start with 𝑏0 = 1 and 𝑠0 = 1.
# Construct and solve Example 3 (construction cell reconstructed from
# the printed output below)
g_ex3 = [g_L, g_L, g_H, g_H]
P_ex3 = np.array([[1-λ, λ, 0, 0],
                  [0, 1-ϕ, ϕ, 0],
                  [0, 0, 1-ψ, ψ],
                  [θ, 0, 0, 1-θ]])
b0_ex3 = 1
states_ex3 = ['peace1', 'peace2', 'war1', 'war2']

ts_ex3 = TaxSmoothingExample(g_ex3, P_ex3, b0_ex3, states_ex3)
ts_ex3.display()
P
[[0.9 0.1 0. 0. ]
[0. 0.9 0.1 0. ]
[0. 0. 0.9 0.1]
[0.1 0. 0. 0.9]]
Q
[[0.864 0.096 0. 0. ]
[0. 0.864 0.096 0. ]
[0. 0. 0.864 0.096]
[0.096 0. 0. 0.864]]
Govt expenditures in peace1, peace2, war1, war2 = [0.5 0.5 1.2 1.2]
Constant tax collections = 0.6927944572748268
Govt debt in 4 states = [-1. -3.42494226 -6.86027714 -4.43533487]
Government tax collections minus debt levels in peace1, peace2, war1, war2
T+b in peace1 = 1.6927944572748268
T+b in peace2 = 4.117736720554273
T+b in war1 = 7.553071593533488
T+b in war2 = 5.1281293302540405
The cumulative return earned from holding 1 unit market portfolio of government bonds
0.02371440178864223
56.5.5 Example 4
$$P = \begin{bmatrix} 1-\lambda & \lambda & 0 & 0 & 0 \\ 0 & 1-\phi & \phi & 0 & 0 \\ 0 & 0 & 1-\psi & \psi & 0 \\ 0 & 0 & 0 & 1-\theta & \theta \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

with government expenditure levels for the five states being $[g_L\ g_L\ g_H\ g_H\ g_L]$ where $g_L < g_H$.

We assume that $b_0 = 1$ and $s_0 = 1$.
# Construct and solve Example 4 (construction cell reconstructed from
# the printed output below)
g_ex4 = [g_L, g_L, g_H, g_H, g_L]
P_ex4 = np.array([[1-λ, λ, 0, 0, 0],
                  [0, 1-ϕ, ϕ, 0, 0],
                  [0, 0, 1-ψ, ψ, 0],
                  [0, 0, 0, 1-θ, θ],
                  [0, 0, 0, 0, 1]])
b0_ex4 = 1
states_ex4 = ['peace1', 'peace2', 'war1', 'war2', 'permanent peace']

ts_ex4 = TaxSmoothingExample(g_ex4, P_ex4, b0_ex4, states_ex4)
ts_ex4.display()
P
[[0.9 0.1 0.  0.  0. ]
 [0.  0.9 0.1 0.  0. ]
 [0.  0.  0.9 0.1 0. ]
 [0.  0.  0.  0.9 0.1]
 [0.  0.  0.  0.  1. ]]
Q
[[0.864 0.096 0. 0. 0. ]
[0. 0.864 0.096 0. 0. ]
[0. 0. 0.864 0.096 0. ]
[0. 0. 0. 0.864 0.096]
[0. 0. 0. 0. 0.96 ]]
Govt expenditures in peace1, peace2, war1, war2, permanent peace = [0.5 0.5 1.2 1.2 0.5]
Constant tax collections = 0.6349979047185738
Govt debt in 5 states = [-1. -2.82289484 -5.4053292 -1.77211121 3.37494762]
Government tax collections minus debt levels in peace1, peace2, war1, war2, permanent peace
T+b in peace1 = 1.6349979047185736
T+b in peace2 = 3.4578927455370505
T+b in war1 = 6.040327103363229
T+b in war2 = 2.407109110283644
T+b in permanent peace = -2.739949713245767
The cumulative return earned from holding 1 unit market portfolio of government bonds
-11.132109773063592
56.5.6 Example 5

This example captures a case in which the system follows a deterministic path from peace to war, and back to peace again.

Since there is no randomness, the outcomes in the complete markets setting should be the same as in the incomplete markets setting.
$$P = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}$$
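A sketch of the construction cell for this example, following the pattern of the earlier examples; the expenditure levels and state labels are read off the printed output below, while $b_0 = 1$ is assumed as in the other examples:

g_ex5 = [g_L, g_L, g_H, g_H, g_H, g_H, g_L]   # from the printed output
P_ex5 = np.array([[0, 1, 0, 0, 0, 0, 0],
                  [0, 0, 1, 0, 0, 0, 0],
                  [0, 0, 0, 1, 0, 0, 0],
                  [0, 0, 0, 0, 1, 0, 0],
                  [0, 0, 0, 0, 0, 1, 0],
                  [0, 0, 0, 0, 0, 0, 1],
                  [0, 0, 0, 0, 0, 0, 1]])
b0_ex5 = 1                                     # assumed
states_ex5 = ['peace1', 'peace2', 'war1', 'war2', 'war3', 'permanent peace']

ts_ex5 = TaxSmoothingExample(g_ex5, P_ex5, b0_ex5, states_ex5)
ts_ex5.display()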
P
[[0 1 0 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 0 1 0 0 0]
 [0 0 0 0 1 0 0]
 [0 0 0 0 0 1 0]
 [0 0 0 0 0 0 1]
 [0 0 0 0 0 0 1]]
Q
[[0. 0.96 0. 0. 0. 0. 0. ]
[0. 0. 0.96 0. 0. 0. 0. ]
[0. 0. 0. 0.96 0. 0. 0. ]
[0. 0. 0. 0. 0.96 0. 0. ]
[0. 0. 0. 0. 0. 0.96 0. ]
[0. 0. 0. 0. 0. 0. 0.96]
[0. 0. 0. 0. 0. 0. 0.96]]
Govt expenditures in peace1, peace2, war1, war2, war3, permanent peace = [0.5 0.5 1.2 1.2 1.2 1.2 0.5]
Constant tax collections = 0.5571895472128002
Govt debt in 6 states = [-1. -1.10123911 -1.20669652 -0.58738132 0.05773868 0.72973868 1.42973868]
Government tax collections minus debt levels in peace1, peace2, war1, war2, war3, permanent peace
T+b in peace1 = 1.5571895472128001
T+b in peace2 = 1.6584286588928006
T+b in war1 = 1.7638860668928005
T+b in war2 = 1.1445708668928007
T+b in war3 = 0.4994508668928011
T+b in permanent peace = -0.1725491331071991
Total government spending in peace1, peace2, war1, war2, war3, permanent peace
peace1 = 1.5571895472128003
peace2 = 1.6584286588928003
war1 = 1.7638860668928005
war2 = 1.1445708668928007
war3 = 0.4994508668928006
permanent peace = -0.17254913310719933
The cumulative return earned from holding 1 unit market portfolio of government bonds
1.2775343959060064
The government's budget constraint can be written as

$$T_t + b_t = g_t + \beta E_t b_{t+1}$$

where $T_t$ is tax revenues, $b_t$ are receipts at $t$ from contingent claims that the government had purchased at time $t-1$, and

$$b_t = E_t \sum_{j=0}^{\infty} \beta^j (g_{t+j} - T_{t+j})$$

which implies

$$E_t \sum_{j=0}^{\infty} \beta^j g_{t+j} = b_t + E_t \sum_{j=0}^{\infty} \beta^j T_{t+j}$$

which states that the present value of government purchases equals the value of government assets at $t$ plus the present value of tax receipts.
With these relabelings, examples presented in consumption smoothing with complete and in-
complete markets can be interpreted as tax-smoothing models.
Returns: In the continuous state version of our incomplete markets model, the gross rate of return on the government portfolio equals

$$R(x_{t+1} \mid x_t) = \frac{b(x_{t+1})}{\beta E\,[b(x_{t+1}) \mid x_t]}$$
Throughout this lecture, we have taken one-period interest rates and Arrow security prices as
exogenous objects determined outside the model and specified in ways designed to align our
models closely with the consumption smoothing model of Barro [14].
Other lectures make these objects endogenous and describe how a government optimally manipulates prices of government debt, albeit indirectly, via the effects that distorting taxes have on equilibrium prices and allocations.
In optimal taxation in an LQ economy and recursive optimal taxation, we study complete-
markets models in which the government recognizes that it can manipulate Arrow securities
prices.
• That lecture is a warm-up for the non-linear-quadratic model of tax smoothing de-
scribed in Optimal Taxation with State-Contingent Debt.
• In both Optimal Taxation in an LQ Economy and Optimal Taxation with State-
Contingent Debt, the government recognizes that its decisions affect prices.
In optimal taxation with incomplete markets, we study an incomplete-markets model in
which the government also manipulates prices of government debt.
Chapter 57
Robustness
57.1 Contents
• Overview 57.2
• The Model 57.3
• Constructing More Robust Policies 57.4
• Robustness as Outcome of a Two-Person Zero-Sum Game 57.5
• The Stochastic Case 57.6
• Implementation 57.7
• Application 57.8
• Appendix 57.9
In addition to what’s in Anaconda, this lecture will need the following libraries:
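In the QuantEcon lecture series the required install is typically

!pip install --upgrade quantecon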
57.2 Overview
This lecture modifies a Bellman equation to express a decision-maker’s doubts about transi-
tion dynamics.
His specification doubts make the decision-maker want a robust decision rule.
Robust means insensitive to misspecification of transition dynamics.
The decision-maker has a single approximating model.
He calls it approximating to acknowledge that he doesn’t completely trust it.
He fears that outcomes will actually be determined by another model that he cannot describe
explicitly.
All that he knows is that the actual data-generating model is in some (uncountable) set of
models that surrounds his approximating model.
He quantifies the discrepancy between his approximating model and the genuine data-
generating model by using a quantity called entropy.
(We’ll explain what entropy means below)
He wants a decision rule that will work well enough no matter which of those other models actually governs outcomes.
Note
In reading this lecture, please don’t think that our decision-maker is paranoid
when he conducts a worst-case analysis. By designing a rule that works well
against a worst-case, his intention is to construct a rule that will work well across
a set of models.
Our “robust” decision-maker wants to know how well a given rule will work when he does not
know a single transition law ….
… he wants to know sets of values that will be attained by a given decision rule 𝐹 under a set
of transition laws.
Ultimately, he wants to design a decision rule 𝐹 that shapes these sets of values in ways that
he prefers.
With this in mind, consider the following graph, which relates to a particular decision prob-
lem to be explained below
If you want to understand more about why one serious quantitative researcher is interested in
this approach, we recommend Lars Peter Hansen’s Nobel lecture.
57.3 The Model
For simplicity, we present ideas in the context of a class of problems with linear transition
laws and quadratic objective functions.
To fit in with our earlier lecture on LQ control, we will treat loss minimization rather than
value maximization.
To begin, recall the infinite horizon LQ problem, where an agent chooses a sequence of con-
trols {𝑢𝑡 } to minimize
$$\sum_{t=0}^{\infty} \beta^t \{x_t' R x_t + u_t' Q u_t\} \tag{1}$$

subject to the transition law

$$x_{t+1} = A x_t + B u_t + C w_{t+1} \tag{2}$$
As before,
• 𝑥𝑡 is 𝑛 × 1, 𝐴 is 𝑛 × 𝑛
• 𝑢𝑡 is 𝑘 × 1, 𝐵 is 𝑛 × 𝑘
• 𝑤𝑡 is 𝑗 × 1, 𝐶 is 𝑛 × 𝑗
• 𝑅 is 𝑛 × 𝑛 and 𝑄 is 𝑘 × 𝑘
Here 𝑥𝑡 is the state, 𝑢𝑡 is the control, and 𝑤𝑡 is a shock vector.
For now, we take $\{w_t\} := \{w_t\}_{t=1}^{\infty}$ to be deterministic — a single fixed sequence.
We also allow for model uncertainty on the part of the agent solving this optimization prob-
lem.
In particular, the agent takes 𝑤𝑡 = 0 for all 𝑡 ≥ 0 as a benchmark model but admits the
possibility that this model might be wrong.
As a consequence, she also considers a set of alternative models expressed in terms of se-
quences {𝑤𝑡 } that are “close” to the zero sequence.
She seeks a policy that will do well enough for a set of alternative models whose members are
pinned down by sequences {𝑤𝑡 }.
Soon we'll quantify the quality of a model specification in terms of the maximal size of the expression $\sum_{t=0}^{\infty} \beta^{t+1} w_{t+1}' w_{t+1}$.
57.4 Constructing More Robust Policies

If our agent takes $\{w_t\}$ as a given deterministic sequence, then, drawing on intuition from earlier lectures on dynamic programming, we can anticipate Bellman equations such as

$$J(x) = \min_u \max_w \left\{ x'Rx + u'Qu + \beta\,[J(Ax + Bu + Cw) - \theta w'w] \right\} \tag{3}$$

With $J(x) = x'Px$, standard calculations show that the inner maximization is attained at

$$w = (\theta I - C'PC)^{-1} C'P(Ax + Bu)$$

and that the maximized value is $(Ax + Bu)'\mathcal{D}(P)(Ax + Bu)$, where

$$\mathcal{D}(P) := P + PC(\theta I - C'PC)^{-1} C'P$$

and $I$ is a $j \times j$ identity matrix. Substituting this expression for the maximum into (3) yields

$$P = \mathcal{B}(\mathcal{D}(P))$$
The operator ℬ is the standard (i.e., non-robust) LQ Bellman operator, and 𝑃 = ℬ(𝑃 ) is the
standard matrix Riccati equation coming from the Bellman equation — see this discussion.
Under some regularity conditions (see [71]), the operator ℬ ∘ 𝒟 has a unique positive definite
fixed point, which we denote below by 𝑃 ̂ .
A robust policy, indexed by $\theta$, is $u = -\hat{F}x$ where, following the standard robust-control formulas,

$$\hat{F} = \beta(Q + \beta B'\mathcal{D}(\hat{P})B)^{-1} B'\mathcal{D}(\hat{P})A \tag{7}$$

We also define

$$\hat{K} = (\theta I - C'\hat{P}C)^{-1} C'\hat{P}(A - B\hat{F}) \tag{8}$$

The interpretation of $\hat{K}$ is that $w_{t+1} = \hat{K}x_t$ on the worst-case path of $\{x_t\}$, in the sense that this vector is the maximizer of the inner problem in (3) evaluated at the fixed rule $u = -\hat{F}x$.
Note that 𝑃 ̂ , 𝐹 ̂ , 𝐾̂ are all determined by the primitives and 𝜃.
Note also that if 𝜃 is very large, then 𝒟 is approximately equal to the identity mapping.
Hence, when 𝜃 is large, 𝑃 ̂ and 𝐹 ̂ are approximately equal to their standard LQ values.
Furthermore, when 𝜃 is large, 𝐾̂ is approximately equal to zero.
Conversely, smaller 𝜃 is associated with greater fear of model misspecification and greater
concern for robustness.
57.5 Robustness as Outcome of a Two-Person Zero-Sum Game

What we have done above can be interpreted in terms of a two-person zero-sum game in which $\hat{F}, \hat{K}$ are Nash equilibrium objects.
Agent 1 is our original agent, who seeks to minimize loss in the LQ program while admitting
the possibility of misspecification.
Agent 2 is an imaginary malevolent player.
Agent 2’s malevolence helps the original agent to compute bounds on his value function
across a set of models.
We begin with agent 2’s problem.
Agent 2
1. knows a fixed policy 𝐹 specifying the behavior of agent 1, in the sense that 𝑢𝑡 = −𝐹 𝑥𝑡
for all 𝑡
2. responds by choosing a shock sequence {𝑤𝑡 } from a set of paths sufficiently close to the
benchmark sequence {0, 0, 0, …}
A natural way to say "sufficiently close to the zero sequence" is to restrict the summed inner product $\sum_{t=1}^{\infty} w_t' w_t$ to be small.
However, to obtain a time-invariant recursive formulation, it turns out to be convenient to
restrict a discounted inner product
$$\sum_{t=1}^{\infty} \beta^t w_t' w_t \le \eta \tag{9}$$
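As a quick illustration, the discounted inner product in (9) can be computed directly for any candidate shock path; a minimal sketch with hypothetical values:

import numpy as np

β, η = 0.95, 5.0                     # hypothetical values
w = 0.1 * np.ones((50, 2))           # a candidate shock path, each w_t in R^2
ent = sum(β**t * w[t-1] @ w[t-1] for t in range(1, len(w) + 1))
print(ent <= η)                      # does the path satisfy (9)?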
Now let 𝐹 be a fixed policy, and let 𝐽𝐹 (𝑥0 , w) be the present-value cost of that policy given
sequence w ∶= {𝑤𝑡 } and initial condition 𝑥0 ∈ ℝ𝑛 .
Substituting $-Fx_t$ for $u_t$ in (1), this value can be written as

$$J_F(x_0, \mathbf{w}) := \sum_{t=0}^{\infty} \beta^t x_t'(R + F'QF)x_t \tag{10}$$

where

$$x_{t+1} = (A - BF)x_t + Cw_{t+1} \tag{11}$$

Attaching a Lagrange multiplier $\beta\theta$ to constraint (9), agent 2 solves

$$\max_{\mathbf{w}} \sum_{t=0}^{\infty} \beta^t \{x_t'(R + F'QF)x_t - \beta\theta(w_{t+1}'w_{t+1} - \eta)\}$$

which, dropping the constant term in $\eta$, is the same problem as

$$\max_{\mathbf{w}} \sum_{t=0}^{\infty} \beta^t \{x_t'(R + F'QF)x_t - \beta\theta w_{t+1}'w_{t+1}\}$$

or, equivalently,

$$\min_{\mathbf{w}} \sum_{t=0}^{\infty} \beta^t \{-x_t'(R + F'QF)x_t + \beta\theta w_{t+1}'w_{t+1}\} \tag{12}$$
subject to (11).
What’s striking about this optimization problem is that it is once again an LQ discounted
dynamic programming problem, with w = {𝑤𝑡 } as the sequence of controls.
The expression for the optimal policy can be found by applying the usual LQ formula (see
here).
We denote it by 𝐾(𝐹 , 𝜃), with the interpretation 𝑤𝑡+1 = 𝐾(𝐹 , 𝜃)𝑥𝑡 .
The remaining step for agent 2’s problem is to set 𝜃 to enforce the constraint (9), which can
be done by choosing 𝜃 = 𝜃𝜂 such that
$$\beta \sum_{t=0}^{\infty} \beta^t x_t' K(F, \theta_\eta)' K(F, \theta_\eta) x_t = \eta \tag{13}$$
Here 𝑥𝑡 is given by (11) — which in this case becomes 𝑥𝑡+1 = (𝐴 − 𝐵𝐹 + 𝐶𝐾(𝐹 , 𝜃))𝑥𝑡 .
57.5.2 Using Agent 2’s Problem to Construct Bounds on the Value Sets
Define the minimized object on the right side of problem (12) as 𝑅𝜃 (𝑥0 , 𝐹 ).
Because “minimizers minimize” we have
$$R_\theta(x_0, F) \le \sum_{t=0}^{\infty} \beta^t \{-x_t'(R + F'QF)x_t\} + \beta\theta \sum_{t=0}^{\infty} \beta^t w_{t+1}' w_{t+1},$$

and hence

$$R_\theta(x_0, F) - \theta\,\mathrm{ent} \le \sum_{t=0}^{\infty} \beta^t \{-x_t'(R + F'QF)x_t\} \tag{14}$$

where

$$\mathrm{ent} := \beta \sum_{t=0}^{\infty} \beta^t w_{t+1}' w_{t+1}$$
The left side of inequality (14) is a straight line with slope −𝜃.
Technically, it is a “separating hyperplane”.
At a particular value of entropy, the line is tangent to the lower bound of values as a function
of entropy.
In particular, the lower bound on the left side of (14) is attained when
$$\mathrm{ent} = \beta \sum_{t=0}^{\infty} \beta^t x_t' K(F, \theta)' K(F, \theta) x_t \tag{15}$$
To construct the lower bound on the set of values associated with all perturbations $\mathbf{w}$ satisfying the entropy constraint (9) at a given entropy level, we proceed as follows:

• Fix $\theta$ and compute the minimized value $R_\theta(x_0, F)$ together with the associated entropy using (15).

• Compute the lower bound on the value function $R_\theta(x_0, F) - \theta\,\mathrm{ent}$ and plot it against $\mathrm{ent}$.

• Repeat the preceding steps for a range of values of $\theta$ to trace out the lower bound.
Note
This procedure sweeps out a set of separating hyperplanes indexed by different
values for the Lagrange multiplier 𝜃.
To construct an upper bound we use a negative multiplier: for $\tilde\theta < 0$, define

$$V_{\tilde\theta}(x_0, F) = \max_{\mathbf{w}} \sum_{t=0}^{\infty} \beta^t \{-x_t'(R + F'QF)x_t - \beta\tilde\theta w_{t+1}' w_{t+1}\} \tag{16}$$

Because "maximizers maximize" we have

$$V_{\tilde\theta}(x_0, F) \ge \sum_{t=0}^{\infty} \beta^t \{-x_t'(R + F'QF)x_t\} - \beta\tilde\theta \sum_{t=0}^{\infty} \beta^t w_{t+1}' w_{t+1}$$

which implies

$$V_{\tilde\theta}(x_0, F) + \tilde\theta\,\mathrm{ent} \ge \sum_{t=0}^{\infty} \beta^t \{-x_t'(R + F'QF)x_t\} \tag{17}$$

where

$$\mathrm{ent} \equiv \beta \sum_{t=0}^{\infty} \beta^t w_{t+1}' w_{t+1}$$
The left side of inequality (17) is a straight line with slope $\tilde\theta$.

The upper bound on the left side of (17) is attained when

$$\mathrm{ent} = \beta \sum_{t=0}^{\infty} \beta^t x_t' K(F, \tilde\theta)' K(F, \tilde\theta) x_t \tag{18}$$
To construct the upper bound on the set of values associated with all perturbations $\mathbf{w}$ with a given entropy we proceed much as we did for the lower bound.
Now in the interest of reshaping these sets of values by choosing 𝐹 , we turn to agent 1’s prob-
lem.
Agent 1 takes the worst-case shock process of agent 2 as given and solves

$$\min_{\{u_t\}} \sum_{t=0}^{\infty} \beta^t \{x_t' R x_t + u_t' Q u_t - \beta\theta w_{t+1}' w_{t+1}\} \tag{19}$$

Substituting agent 2's rule $w_{t+1} = Kx_t$ turns this into the ordinary LQ problem of minimizing

$$\sum_{t=0}^{\infty} \beta^t \{x_t'(R - \beta\theta K'K)x_t + u_t' Q u_t\} \tag{20}$$

subject to

$$x_{t+1} = (A + CK)x_t + Bu_t$$
Once again, the expression for the optimal policy can be found here — we denote it by 𝐹 ̃ .
Clearly, the 𝐹 ̃ we have obtained depends on 𝐾, which, in agent 2’s problem, depended on an
initial policy 𝐹 .
Holding all other parameters fixed, we can represent this relationship as a mapping Φ, where
𝐹 ̃ = Φ(𝐾(𝐹 , 𝜃))
As you may have already guessed, the robust policy 𝐹 ̂ defined in (7) is a fixed point of the
mapping Φ.
In particular, for any given $\theta$,

1. $K(\hat{F}, \theta) = \hat{K}$

2. $\Phi(\hat{K}) = \hat{F}$
57.6 The Stochastic Case

Now we turn to the stochastic case, where the sequence $\{w_t\}$ is treated as an IID sequence of random vectors.
In this setting, we suppose that our agent is uncertain about the conditional probability distri-
bution of 𝑤𝑡+1 .
The agent takes the standard normal distribution 𝑁 (0, 𝐼) as the baseline conditional distribu-
tion, while admitting the possibility that other “nearby” distributions prevail.
These alternative conditional distributions of 𝑤𝑡+1 might depend nonlinearly on the history
𝑥𝑠 , 𝑠 ≤ 𝑡.
To implement this idea, we need a notion of what it means for one distribution to be near
another one.
Here we adopt a very useful measure of closeness for distributions known as the relative en-
tropy, or Kullback-Leibler divergence.
For densities $p, q$, the Kullback-Leibler divergence of $q$ from $p$ is defined as

$$D_{KL}(p, q) := \int \ln\left[\frac{p(x)}{q(x)}\right] p(x)\, dx$$
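For two unit-variance normal densities the divergence has the closed form $(\mu_p - \mu_q)^2/2$, which a quick quadrature sketch confirms:

import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

p, q = norm(0, 1).pdf, norm(0.5, 1).pdf
d_kl, _ = quad(lambda x: np.log(p(x) / q(x)) * p(x), -10, 10)
print(d_kl)        # ≈ 0.125 = 0.5**2 / 2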
The Bellman equation then becomes

$$J(x) = \min_u \max_{\psi \in \mathcal{P}} \left\{ x'Rx + u'Qu + \beta \left[\int J(Ax + Bu + Cw)\, \psi(dw) - \theta D_{KL}(\psi, \phi)\right] \right\} \tag{22}$$
Here $\mathcal{P}$ represents the set of all densities on $\mathbb{R}^n$ and $\phi$ is the benchmark distribution $N(0, I)$.

The distribution $\psi$ is chosen as the least desirable conditional distribution in terms of next period outcomes, while taking into account the penalty term $\theta D_{KL}(\psi, \phi)$.
This penalty term plays a role analogous to the one played by the deterministic penalty 𝜃𝑤′ 𝑤
in (3), since it discourages large deviations from the benchmark.
The maximization problem in (22) appears highly nontrivial — after all, we are maximizing
over an infinite dimensional space consisting of the entire set of densities.
However, it turns out that the solution is tractable, and in fact also falls within the class of
normal distributions.
First, we note that 𝐽 has the form 𝐽 (𝑥) = 𝑥′ 𝑃 𝑥 + 𝑑 for some positive definite matrix 𝑃 and
constant real number 𝑑.
Moreover, it turns out that if $(I - \theta^{-1}C'PC)^{-1}$ is nonsingular, then the maximizing distribution in (22) is itself normal (it is the worst-case distribution reported below), and the maximization injects into the Bellman equation the operator $\mathcal{D}$ plus a constant

$$\kappa(\theta, P) := \theta \ln[\det(I - \theta^{-1}C'PC)^{-1}]$$
Substituting the expression for the maximum into Bellman equation (22) and using $J(x) = x'Px + d$ gives

$$x'Px + d = \min_u \left\{ x'Rx + u'Qu + \beta (Ax + Bu)'\mathcal{D}(P)(Ax + Bu) + \beta [d + \kappa(\theta, P)] \right\} \tag{25}$$
Since constant terms do not affect minimizers, the solution is the same as (6), leading to
To solve this Bellman equation, we take $\hat{P}$ to be the positive definite fixed point of $\mathcal{B} \circ \mathcal{D}$.

In addition, we take $\hat{d}$ as the real number solving $d = \beta[d + \kappa(\theta, \hat{P})]$, which is

$$\hat{d} := \frac{\beta}{1-\beta}\kappa(\theta, \hat{P}) \tag{26}$$
The robust policy in this stochastic case is the minimizer in (25), which is once again 𝑢 =
−𝐹 ̂ 𝑥 for 𝐹 ̂ given by (7).
Substituting the robust policy into the worst-case distribution we obtain

$$w_{t+1} \sim N\!\left(\hat{K}x_t,\ (I - \theta^{-1}C'\hat{P}C)^{-1}\right)$$
Before turning to implementation, we briefly outline how to compute several other quantities
of interest.
One thing we will be interested in doing is holding a policy fixed and computing the dis-
counted loss associated with that policy.
So let $F$ be a given policy and let $J_F(x)$ be the associated loss, which, by analogy with (22), satisfies a fixed-policy version of that Bellman equation, with $-Fx$ in place of the minimizing $u$.

Writing $J_F(x) = x'P_Fx + d_F$ and applying the same argument used to derive (23), we get

$$P_F = R + F'QF + \beta(A - BF)'\mathcal{D}(P_F)(A - BF)$$

and

$$d_F := \frac{\beta}{1-\beta}\kappa(\theta, P_F) = \frac{\beta}{1-\beta}\theta \ln[\det(I - \theta^{-1}C'P_FC)^{-1}] \tag{27}$$
If you skip ahead to the appendix, you will be able to verify that −𝑃𝐹 is the solution to the
Bellman equation in agent 2’s problem discussed above — we use this in our computations.
57.7 Implementation
The QuantEcon.py package provides a class called RBLQ for implementation of robust LQ
optimal control.
The code can be found on GitHub.
Here is a brief description of the methods of the class
• d_operator() and b_operator() implement 𝒟 and ℬ respectively
• robust_rule() and robust_rule_simple() both solve for the triple 𝐹 ̂ , 𝐾,̂ 𝑃 ̂ , as
described in equations (7) – (8) and the surrounding discussion
– robust_rule() is more efficient
– robust_rule_simple() is more transparent and easier to follow
• K_to_F() and F_to_K() solve the decision problems of agent 1 and agent 2 respec-
tively
• compute_deterministic_entropy() computes the left-hand side of (13)
• evaluate_F() computes the loss and entropy associated with a given policy — see
this discussion
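As a quick illustration of how these methods fit together, here is a hedged sketch; the matrices Q, R, A, B, C and parameters β, θ are assumed to be defined as in the application below, and the loop mirrors the mapping Φ from the previous section:

rlq = qe.robustlq.RBLQ(Q, R, A, B, C, β, θ)

# One-shot computation of the robust rule
F_hat, K_hat, P_hat = rlq.robust_rule()

# Equivalent fixed-point iteration between the two agents' problems
F = np.zeros_like(F_hat)
for _ in range(200):
    K, _ = rlq.F_to_K(F)     # agent 2's best response to F
    F, _ = rlq.K_to_F(K)     # agent 1's best response to K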
57.8 Application
Let us consider a monopolist similar to this one, but now facing model uncertainty.
The monopolist faces an inverse demand curve $p_t = a_0 - a_1 y_t + d_t$, where the demand shock evolves according to

$$d_{t+1} = \rho d_t + \sigma_d w_{t+1}, \qquad \{w_t\} \overset{\text{IID}}{\sim} N(0, 1)$$

The period return (profit) is

$$r_t = p_t y_t - \gamma\,\frac{(y_{t+1} - y_t)^2}{2} - c y_t$$

To form a linear regulator problem, we take

$$x_t = \begin{bmatrix} 1 \\ y_t \\ d_t \end{bmatrix} \quad\text{and}\quad u_t = y_{t+1} - y_t$$

so that, with $b := (a_0 - c)/2$,

$$R = -\begin{bmatrix} 0 & b & 0 \\ b & -a_1 & 1/2 \\ 0 & 1/2 & 0 \end{bmatrix} \quad\text{and}\quad Q = \gamma/2$$

and

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \rho \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad C = \begin{bmatrix} 0 \\ 0 \\ \sigma_d \end{bmatrix}$$
The standard normal distribution for 𝑤𝑡 is understood as the agent’s baseline, with uncer-
tainty parameterized by 𝜃.
We compute value-entropy correspondences for two policies

1. The no concern for robustness policy $F_0$, which is the ordinary LQ loss minimizer.

2. A policy with a moderate concern for robustness, computed with $\theta = 0.002$ (the value set in the code below).

The code for producing the graph shown above, with blue being for the robust policy, is as follows
In [3]: """
        Authors: Chase Coleman, Spencer Lyon, Thomas Sargent, John Stachurski
        """
        # (imports added for self-containedness)
        import numpy as np
        import pandas as pd
        import quantecon as qe
        import matplotlib.pyplot as plt

        # Model parameters
        a_0 = 100
        a_1 = 0.5
        ρ = 0.9
        σ_d = 0.05
        β = 0.95
        c = 2
        γ = 50.0
        θ = 0.002
        ac = (a_0 - c) / 2.0

        # Define LQ matrices (R, A, B, C reconstructed from the text)
        R = np.array([[0.,  ac,   0.],
                      [ac, -a_1, 0.5],
                      [0., 0.5,  0.]])
        R = -R  # For minimization
        Q = γ / 2
        A = np.array([[1., 0., 0.],
                      [0., 1., 0.],
                      [0., 0., ρ]])
        B = np.array([[0.], [1.], [0.]])
        C = np.array([[0.], [0.], [σ_d]])

        # ----------------------------------------------------------- #
        # Functions  (def lines and some bodies reconstructed)
        # ----------------------------------------------------------- #

        def evaluate_policy(θ, F):
            """
            Given θ (scalar, dtype=float) and policy F (array_like), returns
            the value associated with that policy under the worst case path
            for {w_t}, as well as the entropy level.
            """
            rlq = qe.robustlq.RBLQ(Q, R, A, B, C, β, θ)
            K_F, P_F, d_F, O_F, o_F = rlq.evaluate_F(F)
            x0 = np.array([[1.], [0.], [0.]])
            value = - x0.T @ P_F @ x0 - d_F
            entropy = x0.T @ O_F @ x0 + o_F
            return list(map(float, (value, entropy)))

        def value_and_entropy(emax, F, bw, grid_size=1000):
            """
            Compute the value function and entropy levels for a θ path
            increasing until it reaches the specified target entropy value.

            Parameters
            ==========
            emax: scalar
                The target entropy value
            F: array_like
                The policy function to be evaluated
            bw: str
                A string specifying whether the implied shock path follows best
                or worst assumptions. The only acceptable values are 'best' and
                'worst'.

            Returns
            =======
            df: pd.DataFrame
                A pandas DataFrame containing the value function and entropy
                values up to the emax parameter. The columns are 'value' and
                'entropy'.
            """
            if bw == 'worst':
                θs = 1 / np.linspace(1e-8, 1000, grid_size)
            else:
                θs = -1 / np.linspace(1e-8, 1000, grid_size)

            df = pd.DataFrame(index=θs, columns=('value', 'entropy'))

            for θ in θs:
                df.loc[θ] = evaluate_policy(θ, F)
                if df.loc[θ, 'entropy'] >= emax:
                    break

            df = df.dropna(how='any')
            return df

        # ----------------------------------------------------------- #
        # Main
        # ----------------------------------------------------------- #

        emax = 1.6e6

        # (the computation of the robust and non-robust policies, the
        #  Curve helper class, and the plotting loop were lost; only the
        #  axis setup survives)
        fig, ax = plt.subplots()
        ax.set_xlim(0, emax)
        ax.set_ylabel("Value")
        ax.set_xlabel("Entropy")
        ax.grid()

        plt.show()
Can you explain the different shape of the value-entropy correspondence for the robust pol-
icy?
57.9 Appendix
We sketch the proof only of the first claim in this section, which is that, for any given 𝜃,
𝐾(𝐹 ̂ , 𝜃) = 𝐾,̂ where 𝐾̂ is as given in (8).
This is the content of the next lemma.
Lemma. If $\hat{P}$ is the fixed point of the map $\mathcal{B} \circ \mathcal{D}$ and $\hat{F}$ is the robust policy as given in (7), then

$$K(\hat{F}, \theta) = \hat{K} \tag{28}$$
Proof: As a first step, observe that when 𝐹 = 𝐹 ̂ , the Bellman equation associated with the
LQ problem (11) – (12) is
(revisit this discussion if you don’t know where (29) comes from) and the optimal policy is
Using the definition of 𝒟, we can rewrite the right-hand side more simply as
Although it involves a substantial amount of algebra, it can be shown that the latter is just
𝑃̂ .
(Hint: Use the fact that 𝑃 ̂ = ℬ(𝒟(𝑃 ̂ )))
Chapter 58

Markov Jump Linear Quadratic Dynamic Programming

58.1 Contents
• Overview 58.2
• Review of useful LQ dynamic programming formulas 58.3
• Linked Ricatti equations for Markov LQ dynamic programming 58.4
• Applications 58.5
• Example 1 58.6
• Example 2 58.7
• More examples 58.8
Co-authors: Sebastian Graves and Zejin Shi
In addition to what’s in Anaconda, this lecture will need the following libraries:
58.2 Overview
This lecture describes Markov jump linear quadratic dynamic programming, an ex-
tension of the method described in the first LQ control lecture.
Markov jump linear quadratic dynamic programming is described and analyzed in [46] and
the references cited there.
The method has been applied to problems in macroeconomics and monetary economics by
[155] and [154].
The periodic models of seasonality described in chapter 14 of [78] are a special case of Markov
jump linear quadratic problems.
Markov jump linear quadratic dynamic programming combines advantages of
• the computational simplicity of linear quadratic dynamic programming, with
• the ability of finite state Markov chains to represent interesting patterns of random
variation.
The idea is to replace the constant matrices that define a linear quadratic dynamic programming problem with $N$ sets of matrices that are fixed functions of the state of an $N$ state Markov chain.
The state of the Markov chain together with the continuous 𝑛 × 1 state vector 𝑥𝑡 form the
state of the system.
For the class of infinite horizon problems being studied in this lecture, we obtain 𝑁 interre-
lated matrix Riccati equations that determine 𝑁 optimal value functions and 𝑁 linear deci-
sion rules.
One of these value functions and one of these decision rules apply in each of the 𝑁 Markov
states.
That is, when the Markov state is in state 𝑗, the value function and the decision rule for state
𝑗 prevails.
58.3 Review of useful LQ dynamic programming formulas

The problem is

$$-x_0' P x_0 - \rho = \min_{\{u_t\}_{t=0}^{\infty}} E \sum_{t=0}^{\infty} \beta^t r(x_t, u_t)$$

The optimal decision rule has the form

$$u_t = -F x_t$$

and the optimal value function is

$$-(x_t' P x_t + \rho)$$

where the constant $\rho$ satisfies

$$\rho = \beta \left(\rho + \mathrm{trace}(P C C')\right)$$
With the preceding formulas in mind, we are ready to approach Markov jump linear quadratic dynamic programming.
The key idea is to make the matrices 𝐴, 𝐵, 𝐶, 𝑅, 𝑄, 𝑊 fixed functions of a finite state 𝑠 that
is governed by an 𝑁 state Markov chain.
This makes decision rules depend on the Markov state, and so fluctuate through time in lim-
ited ways.
In particular, we use the following extension of a discrete-time linear quadratic dynamic programming problem.
We let $s(t) \equiv s_t \in \{1, 2, \ldots, N\}$ be a time $t$ realization of an $N$-state Markov chain with transition matrix $\Pi$ having typical element $\Pi_{ij}$.

Here $i$ denotes today and $j$ denotes tomorrow, so that

$$\Pi_{ij} = \mathrm{Prob}(s_{t+1} = j \mid s_t = i)$$

We'll switch between labeling today's state as $s(t)$ and $i$ and between labeling tomorrow's state as $s(t+1)$ or $j$.
The decision-maker solves the minimization problem

$$\min_{\{u_t\}_{t=0}^{\infty}} E \sum_{t=0}^{\infty} \beta^t r(x_t, s(t), u_t)$$

with

$$r(x_t, s(t), u_t) = -(x_t' R(s_t) x_t + u_t' Q(s_t) u_t + 2 u_t' W(s_t) x_t)$$

subject to linear laws of motion with matrices $(A, B, C)$ each possibly dependent on the Markov state $s_t$:

$$x_{t+1} = A(s_t) x_t + B(s_t) u_t + C(s_t) w_{t+1}$$

The optimal decision rules take the form

$$u_t = -F(s_t) x_t$$

and the optimal value functions take the form

$$-(x_t' P(s_t) x_t + \rho(s_t))$$

or equivalently

$$-x_t' P_i x_t - \rho_i$$

The optimal value functions $-x' P_i x - \rho_i$ for $i = 1, \ldots, N$ satisfy the $N$ interrelated Bellman equations

$$-x' P_i x - \rho_i = \max_u -\left[ x' R_i x + u' Q_i u + 2u' W_i x + \beta \sum_{j=1}^{N} \Pi_{ij} E\left((A_i x + B_i u + C_i w)' P_j (A_i x + B_i u + C_i w) + \rho_j\right) \right]$$

The matrices $P(s_t) = P_i$ and the scalars $\rho(s_t) = \rho_i$, $i = 1, \ldots, N$ satisfy a stacked system of algebraic matrix Riccati equations that the code below solves numerically.
58.5 Applications
We now describe some Python code and a few examples that put the code to work.
To begin, we import these Python modules

In [1]: import numpy as np
        import quantecon as qe
        import matplotlib.pyplot as plt
        %matplotlib inline
58.6 Example 1
$$\max_{\{k_t\}_{t=1}^{\infty}} E_0 \sum_{t=0}^{\infty} \beta^t r(s_t, k_t)$$

where

$$r(s_t, k_t) = f_1(s_t) k_t - f_2(s_t) k_t^2 - d(s_t)(k_{t+1} - k_t)^2, \qquad k_{t+1} - k_t = u_t$$
We can think of 𝑘𝑡 as the decision-maker’s capital and 𝑢𝑡 as costs of adjusting the level of
capital.
We assume that 𝑓1 (𝑠𝑡 ) > 0, 𝑓2 (𝑠𝑡 ) > 0, and 𝑑 (𝑠𝑡 ) > 0.
Denote the state transition matrix for Markov state $s_t \in \{\bar{s}_1, \bar{s}_2\}$ as $\Pi$.

Let $x_t = \begin{bmatrix} k_t \\ 1 \end{bmatrix}$
We can represent the one-period payoff function $r(s_t, k_t)$ and the state-transition law as

$$r(s_t, k_t) = -\left(x_t' \underbrace{\begin{bmatrix} f_2(s_t) & -\frac{f_1(s_t)}{2} \\ -\frac{f_1(s_t)}{2} & 0 \end{bmatrix}}_{\equiv R(s_t)} x_t + \underbrace{d(s_t)}_{\equiv Q(s_t)} u_t^2 \right)$$

and

$$x_{t+1} = \begin{bmatrix} k_{t+1} \\ 1 \end{bmatrix} = \underbrace{I_2}_{\equiv A(s_t)} x_t + \underbrace{\begin{bmatrix} 1 \\ 0 \end{bmatrix}}_{\equiv B(s_t)} u_t$$
"""
Rs = np.zeros((m, n, n))
Qs = np.zeros((m, k, k))
for i in range(m):
Rs[i, 0, 0] = f2_vals[i]
Rs[i, 1, 0] = - f1_vals[i] / 2
Rs[i, 0, 1] = - f1_vals[i] / 2
Qs[i, 0, 0] = d_vals[i]
The continuous part of the state 𝑥𝑡 consists of two variables, namely, 𝑘𝑡 and a constant term.
We start with a Markov transition matrix that makes the Markov state be strictly periodic:
$$\Pi_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \qquad f_1(\bar{s}_1) = f_1(\bar{s}_2) = 1, \qquad f_2(\bar{s}_1) = f_2(\bar{s}_2) = 1$$
In contrast to 𝑓1 (𝑠𝑡 ) and 𝑓2 (𝑠𝑡 ), we make the adjustment cost 𝑑(𝑠𝑡 ) vary across Markov states
𝑠𝑡 .
We set the adjustment cost to be lower in Markov state 𝑠2̄
The following code forms a Markov switching LQ problem and computes the optimal value
functions and optimal decision rules for each Markov state
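A sketch of that cell; the helper function and parameter values are assumptions consistent with the text (adjustment cost lower in state $\bar{s}_2$) and with the LQMarkov class from QuantEcon.py:

Π1 = np.array([[0., 1.],
               [1., 0.]])

f1_vals = [1., 1.]
f2_vals = [1., 1.]
d_vals = [1., 0.5]            # assumed: lower adjustment cost in state 2

Rs, Qs = construct_arrays(f1_vals, f2_vals, d_vals)
As = [np.eye(2), np.eye(2)]
Bs = [np.array([[1.], [0.]]), np.array([[1.], [0.]])]

ex1_a = qe.LQMarkov(Π1, Qs, Rs, As, Bs, beta=β)
ex1_a.stationary_values()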
Let’s look at the value function matrices and the decision rules for each Markov state
In [9]: # P(s)
ex1_a.Ps
[[ 1.37424214, -0.68712107],
[-0.68712107, -4.65643947]]])
In [11]: # F(s)
ex1_a.Fs
[[ 0.74848427, -0.37424214]]])
Now we’ll plot the decision rules and see if they make sense
# (grid and decision-rule setup reconstructed)
k_grid = np.linspace(0., 1., 100)
F1, F2 = ex1_a.Fs[0], ex1_a.Fs[1]
u1_star = - F1[0, 0] * k_grid - F1[0, 1]   # u = -F(s) x with x = (k, 1)'
u2_star = - F2[0, 0] * k_grid - F2[0, 1]

fig, ax = plt.subplots()
ax.plot(k_grid, k_grid + u1_star, label="$\overline{s}_1$ (high)")
ax.plot(k_grid, k_grid + u2_star, label="$\overline{s}_2$ (low)")

# The optimal k*
k_star = np.array([0.5, 0.5])              # f1 / (2 f2) in each state
ax.scatter([0.5, 0.5], [0.5, 0.5], marker="*")
ax.plot([k_star[0], k_star[0]], [0., 1.0], '--')

# 45 degree line
ax.plot([0., 1.], [0., 1.], '--', color='grey')

ax.set_xlabel("$k_t$")
ax.set_ylabel("$k_{t+1}$")
ax.legend()
plt.show()
The above graph plots 𝑘𝑡+1 = 𝑘𝑡 + 𝑢𝑡 = 𝑘𝑡 − 𝐹 𝑥𝑡 as an affine (i.e., linear in 𝑘𝑡 plus a constant)
function of 𝑘𝑡 for both Markov states 𝑠𝑡 .
It also plots the 45 degree line.
Notice that the two 𝑠𝑡 -dependent closed loop functions that determine 𝑘𝑡+1 as functions of 𝑘𝑡
share the same rest point (also called a fixed point) at 𝑘𝑡 = 0.5.
Evidently, the optimal decision rule in Markov state 𝑠2̄ , in which the adjustment cost is lower,
makes 𝑘𝑡+1 a flatter function of 𝑘𝑡 in Markov state 𝑠2̄ .
This happens because when 𝑘𝑡 is not at its fixed point, |𝑢𝑡 (𝑠2̄ )| > |𝑢𝑡 (𝑠1̄ )|, so that the
decision-maker adjusts toward the fixed point faster when the Markov state 𝑠𝑡 takes a value
that makes it cheaper.
T = 40                                     # (simulation length assumed)
x0 = np.array([[0., 1.]])                  # (initial state assumed)
x_path, u_path, w_path, s_path = ex1_a.compute_sequence(x0, ts_length=T)

fig, ax = plt.subplots()
ax.plot(range(T), x_path[0, :-1])
ax.set_xlabel("$t$")
ax.set_ylabel("$k_t$")
ax.set_title("Optimal path of $k_t$")
plt.show()
Now we’ll depart from the preceding transition matrix that made the Markov state be strictly
periodic.
We'll begin with symmetric transition matrices of the form

$$\Pi_2 = \begin{bmatrix} 1-\lambda & \lambda \\ \lambda & 1-\lambda \end{bmatrix}$$
[[ 0.74434525, -0.37217263]]])
[[ 0.72818728, -0.36409364]]])
for i, λ in enumerate(λ_vals):
    Π2 = np.array([[1-λ, λ],
                   [λ, 1-λ]])
    # (solution of the Markov jump LQ problem for this Π2 and recording
    #  of the decision-rule coefficients lost in extraction)

ax.set_xlabel("$\lambda$")
ax.set_ylabel("$F(s_t)$")
ax.set_title(f"Coefficient on {state_var}")
ax.legend()
plt.show()
Notice how the decision rules’ constants and slopes behave as functions of 𝜆.
Evidently, as the Markov chain becomes more nearly periodic (i.e., as 𝜆 → 1), the dynamic
program adjusts capital faster in the low adjustment cost Markov state to take advantage of
what is only temporarily a more favorable time to invest.
Now let's study situations in which the Markov transition matrix $\Pi$ is asymmetric

$$\Pi_3 = \begin{bmatrix} 1-\lambda & \lambda \\ \delta & 1-\delta \end{bmatrix}$$
[[ 0.72749075, -0.36374537]]])
for i, λ in enumerate(λ_vals):
    λ_grid[i, :] = λ
    δ_grid[i, :] = δ_vals
    for j, δ in enumerate(δ_vals):
        Π3 = np.array([[1-λ, λ],
                       [δ, 1-δ]])
        # (grid allocation and solution steps lost in extraction)
The following code defines a wrapper function that computes optimal decision rules for cases
with different Markov transition matrices
# (this block is the body of the wrapper's run method; parts of the
#  setup were lost in extraction)

# Symmetric Π
# Notice that pure periodic transition is a special case when λ = 1
print("symmetric Π case:\n")

λ_vals = np.linspace(0., 1., 10)
F1 = np.empty((λ_vals.size, len(state_vec)))
F2 = np.empty((λ_vals.size, len(state_vec)))

for i, λ in enumerate(λ_vals):
    Π2 = np.array([[1-λ, λ],
                   [λ, 1-λ]])
    # (solution and recording of F1, F2 lost in extraction)

ax.set_xlabel("$\lambda$")
ax.set_ylabel("$F(\overline{s}_t)$")
ax.set_title(f"coefficient on {state_var}")
ax.legend()
plt.show()

ax.set_xlabel("$\lambda$")
ax.set_ylabel("$k$")
ax.set_title("Optimal k levels and k targets")
ax.text(0.5, min(k_star) + (max(k_star) - min(k_star)) / 20, "$\lambda=0.5$")
ax.legend(bbox_to_anchor=(1., 1.))
plt.show()

# Asymmetric Π
print("asymmetric Π case:\n")

δ_vals = np.linspace(0., 1., 10)

for i, λ in enumerate(λ_vals):
    λ_grid[i, :] = λ
    δ_grid[i, :] = δ_vals
    for j, δ in enumerate(δ_vals):
        Π3 = np.array([[1-λ, λ],
                       [δ, 1-δ]])
        # (solution steps lost in extraction)

plt.show()
To illustrate the code with another example, we shall set $f_2(s_t)$ and $d(s_t)$ as constant functions and let only $f_1(s_t)$ vary with $s_t$.

Thus, the sole role of the Markov jump state $s_t$ is to identify times in which capital is very productive and other times in which it is less productive.

The example below reveals much about the structure of the optimum problem and optimal policies.

Only $f_1(s_t)$ varies with $s_t$.

So there are different $s_t$-dependent optimal static $k$ levels in different states, $k^*(s_t) = \frac{f_1(s_t)}{2 f_2(s_t)}$, values of $k$ that maximize one-period payoff functions in each state.
We denote a target 𝑘 level as 𝑘𝑡𝑎𝑟𝑔𝑒𝑡 (𝑠𝑡 ), the fixed point of the optimal policies in each state,
given the value of 𝜆.
We call 𝑘𝑡𝑎𝑟𝑔𝑒𝑡 (𝑠𝑡 ) a “target” because in each Markov state 𝑠𝑡 , optimal policies are contrac-
tion mappings and will push 𝑘𝑡 towards a fixed point 𝑘𝑡𝑎𝑟𝑔𝑒𝑡 (𝑠𝑡 ).
When $\lambda \to 0$, each Markov state becomes close to an absorbing state and consequently $k_{target}(s_t) \to k^*(s_t)$.
But when $\lambda \to 1$, the Markov transition matrix becomes more nearly periodic, so the optimal decision rules target the optimal $k$ level in the other state more closely in order to enjoy a higher expected payoff in the next period.
The switch happens at 𝜆 = 0.5 when both states are equally likely to be reached.
Below we plot an additional figure that shows optimal $k$ levels in the two Markov jump states and also how the targeted $k$ levels change as $\lambda$ changes.
symmetric Π case:

[figures]

asymmetric Π case:

[figures]

symmetric Π case:

[figures]

asymmetric Π case:

[figures]
58.7 Example 2
We now add to the Example 1 setup another state variable $w_t$ that follows the evolution law

$$w_{t+1} = \alpha_0(s_t) + \rho(s_t) w_t + \sigma(s_t) \epsilon_{t+1}$$

We think of $w_t$ as a rental rate or tax rate that the decision maker pays each period for $k_t$.
To capture this idea, we add to the decision-maker’s one-period payoff function the product of
𝑤𝑡 and 𝑘𝑡
$$r(s_t, k_t, w_t) = f_1(s_t) k_t - f_2(s_t) k_t^2 - d(s_t)(k_{t+1} - k_t)^2 - w_t k_t,$$
We now let the continuous part of the state at time $t$ be $x_t = \begin{bmatrix} k_t \\ 1 \\ w_t \end{bmatrix}$ and continue to set the control $u_t = k_{t+1} - k_t$.
We can write the one-period payoff function $r(s_t, k_t, w_t)$ and the state-transition law as

$$r(s_t, k_t, w_t) = -\left(x_t' \underbrace{\begin{bmatrix} f_2(s_t) & -\frac{f_1(s_t)}{2} & \frac{1}{2} \\ -\frac{f_1(s_t)}{2} & 0 & 0 \\ \frac{1}{2} & 0 & 0 \end{bmatrix}}_{\equiv R(s_t)} x_t + \underbrace{d(s_t)}_{\equiv Q(s_t)} u_t^2 \right),$$

and

$$x_{t+1} = \begin{bmatrix} k_{t+1} \\ 1 \\ w_{t+1} \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \alpha_0(s_t) & \rho(s_t) \end{bmatrix}}_{\equiv A(s_t)} x_t + \underbrace{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}}_{\equiv B(s_t)} u_t + \underbrace{\begin{bmatrix} 0 \\ 0 \\ \sigma(s_t) \end{bmatrix}}_{\equiv C(s_t)} \epsilon_{t+1}$$
def construct_arrays2(f1_vals, f2_vals, d_vals,
                      α0_vals, ρ_vals, σ_vals):
    # Map Example 2 into the matrices of a Markov jump LQ problem
    # (def line and the trailing Bs/Cs assignments reconstructed from
    #  the displayed R, Q, A, B, C matrices)
    m = len(f1_vals)
    n, k, j = 3, 1, 1

    Rs = np.zeros((m, n, n))
    Qs = np.zeros((m, k, k))
    As = np.zeros((m, n, n))
    Bs = np.zeros((m, n, k))
    Cs = np.zeros((m, n, j))

    for i in range(m):
        Rs[i, 0, 0] = f2_vals[i]
        Rs[i, 1, 0] = - f1_vals[i] / 2
        Rs[i, 0, 1] = - f1_vals[i] / 2
        Rs[i, 0, 2] = 1/2
        Rs[i, 2, 0] = 1/2
        Qs[i, 0, 0] = d_vals[i]
        As[i, 0, 0] = 1
        As[i, 1, 1] = 1
        As[i, 2, 1] = α0_vals[i]
        As[i, 2, 2] = ρ_vals[i]
        Bs[i, 0, 0] = 1
        Cs[i, 2, 0] = σ_vals[i]

    return Rs, Qs, As, Bs, Cs

Ns = None
k_star = None     # (fragments of a later cell; the rest was lost)
symmetric Π case:

[figures]

asymmetric Π case:

[figures]

(this pair of printed cases and figures repeats six times, once for each variant of Example 2)
The following lectures describe how Markov jump linear quadratic dynamic programming can be used to extend the Barro (1979) [14] model of optimal tax-smoothing and government debt in several interesting directions.
Chapter 59

How to Pay for a War: Part 1

59.1 Contents
This lecture uses the method of Markov jump linear quadratic dynamic programming
that is described in lecture Markov Jump LQ dynamic programming to extend the [14] model
of optimal tax-smoothing and government debt in a particular direction.
This lecture has two sequels that offer further extensions of the Barro model
The extensions are modified versions of his 1979 model later suggested by Barro (1999 [15],
2003 [16]).
Barro’s original 1979 [14] model is about a government that borrows and lends in order to
minimize an intertemporal measure of distortions caused by taxes.
Technical tractability induced Barro [14] to assume that
• the government trades only one-period risk-free debt, and
• the one-period risk-free interest rate is constant
By using Markov jump linear quadratic dynamic programming we can allow interest rates to
move over time in empirically interesting ways.
Also, by expanding the dimension of the state, we can add a maturity composition decision to
the government’s problem.
It is by doing these two things that we extend Barro’s 1979 [14] model along lines he sug-
gested in Barro (1999 [15], 2003 [16]).
Barro (1979) [14] assumed
• that a government faces an exogenous sequence of expenditures that it must finance
by a tax collection sequence whose expected present value equals the initial debt it owes
plus the expected present value of those expenditures.
• that the government wants to minimize the following measure of tax distortions: $E_0 \sum_{t=0}^{\infty} \beta^t T_t^2$, where $T_t$ are total tax collections and $E_0$ is a mathematical expectation conditioned on time 0 information.
• that the government trades only one asset, a risk-free one-period bond.
• that the gross interest rate on the one-period bond is constant and equal to 𝛽 −1 , the
reciprocal of the factor 𝛽 at which the government discounts future tax distortions.
Barro’s model can be mapped into a discounted linear quadratic dynamic programming prob-
lem.
Partly inspired by Barro (1999) [15] and Barro (2003) [16], our generalizations of Barro’s
(1979) [14] model assume
• that the government borrows or saves in the form of risk-free bonds of maturities
1, 2, … , 𝐻.
• that interest rates on those bonds are time-varying and in particular, governed by a
jointly stationary stochastic process.
Our generalizations are designed to fit within a generalization of an ordinary linear quadratic
dynamic programming problem in which matrices that define the quadratic objective function
and the state transition function are time-varying and stochastic.
This generalization, known as a Markov jump linear quadratic dynamic program, com-
bines
• the computational simplicity of linear quadratic dynamic programming, and
• the ability of finite state Markov chains to represent interesting patterns of random
variation.
We want the stochastic time variation in the matrices defining the dynamic programming
problem to represent variation over time in
• interest rates
• default rates
• roll over risks
As described in Markov Jump LQ dynamic programming, the idea underlying Markov jump
linear quadratic dynamic programming is to replace the constant matrices defining a
59.3. PUBLIC FINANCE QUESTIONS 1043
linear quadratic dynamic programming problem with matrices that are fixed functions
of an 𝑁 state Markov chain.
For infinite horizon problems, this leads to 𝑁 interrelated matrix Riccati equations that pin
down 𝑁 value functions and 𝑁 linear decision rules, applying to the 𝑁 Markov states.
We begin by solving a version of the Barro (1979) [14] model by mapping it into the original
LQ framework.
As mentioned in this lecture, the Barro model is mathematically isomorphic with the LQ per-
manent income model.
Let 𝑇𝑡 denote tax collections, 𝛽 a discount factor, 𝑏𝑡,𝑡+1 time 𝑡 + 1 goods that the government
promises to pay at 𝑡, 𝐺𝑡 government purchases, 𝑝𝑡,𝑡+1 the number of time 𝑡 goods received per
time 𝑡 + 1 goods promised.
Evidently, 𝑝𝑡,𝑡+1 is inversely related to appropriate corresponding gross interest rates on gov-
ernment debt.
In the spirit of Barro (1979) [14], the stochastic process of government expenditures is exoge-
nous.
The government's problem is to choose a plan for taxation and borrowing $\{b_{t,t+1}, T_t\}_{t=0}^{\infty}$ to minimize

$$E_0 \sum_{t=0}^{\infty} \beta^t T_t^2$$

subject to the budget constraint

$$T_t + p_{t,t+1} b_{t,t+1} = G_t + b_{t-1,t}$$

and the laws of motion

$$G_t = U_{g,t} z_t, \qquad z_{t+1} = A_{22} z_t + C_2 w_{t+1}$$
• later we will extend the model to allow 𝑝𝑡,𝑡+1 to vary over time
To map into the LQ framework, we use $x_t = \begin{bmatrix} b_{t-1,t} \\ z_t \end{bmatrix}$ as the state vector, and $u_t = b_{t,t+1}$ as the control variable.
Therefore, the (𝐴, 𝐵, 𝐶) matrices are defined by the state-transition law:
$$x_{t+1} = \begin{bmatrix} 0 & 0 \\ 0 & A_{22} \end{bmatrix} x_t + \begin{bmatrix} 1 \\ 0 \end{bmatrix} u_t + \begin{bmatrix} 0 \\ C_2 \end{bmatrix} w_{t+1}$$
To find the appropriate $(R, Q, W)$ matrices, we note that $G_t$ and $b_{t-1,t}$ can be written as appropriately defined functions of the current state:

$$G_t = S_G x_t, \qquad b_{t-1,t} = S_1 x_t$$

With $M_t := -p_{t,t+1}$ and $S := S_G + S_1$, the budget constraint then implies

$$T_t = S x_t + M_t u_t$$
We will implement this constant interest-rate version first, assuming that $G_t$ follows an AR(1) process:

$$G_{t+1} = \bar{G} + \rho G_t + \sigma w_{t+1}$$

To do this, we set $z_t = \begin{bmatrix} 1 \\ G_t \end{bmatrix}$, and consequently:

$$A_{22} = \begin{bmatrix} 1 & 0 \\ \bar{G} & \rho \end{bmatrix}, \qquad C_2 = \begin{bmatrix} 0 \\ \sigma \end{bmatrix}$$
# Model parameters
β, Gbar, ρ, σ = 0.95, 5, 0.8, 1     # (Gbar, ρ, σ are assumed values)

A22 = np.array([[1, 0],
                [Gbar, ρ]])
C2 = np.array([[0],
               [σ]])
Ug = np.array([[0, 1]])

# LQ framework matrices
A_t = np.zeros((1, 3))
A_b = np.hstack((np.zeros((2, 1)), A22))
A = np.vstack((A_t, A_b))

B = np.zeros((3, 1))
B[0, 0] = 1

C = np.vstack((np.zeros((1, 1)), C2))

# T_t = S x_t + M u_t, with S = S1 + Sg (reconstructed)
Sg = np.hstack((np.zeros((1, 1)), Ug))
S1 = np.hstack((np.ones((1, 1)), np.zeros((1, 2))))
S = S1 + Sg

M = np.array([[-β]])

R = S.T @ S
Q = M.T @ M
W = M.T @ S
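A sketch of forming and solving the LQ problem with QuantEcon's LQ class; the initial state x0 is an assumption:

LQBarro = qe.LQ(Q, R, A, B, C, N=W, beta=β)
P, F, d = LQBarro.stationary_values()
x0 = np.array([[100, 1, 25]])       # assumed: initial debt, constant, G_0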
We can see the isomorphism by noting that consumption is a martingale in the permanent income model and that taxation is a martingale in Barro's model.

We can verify this by computing the decision rule $u_t = -Fx_t$ and checking that

$$T_t = S x_t + M u_t = (S - MF)x_t$$

and

$$(S - MF)(A - BF) = (S - MF),$$

so that expected time $t+1$ taxes equal time $t$ taxes.
In [5]: S - M @ F, (S - M @ F) @ (A - B @ F)
This explains the gradual fanning out of taxation if we simulate the Barro model a large
number of times:
In [6]: T = 500
for i in range(250):
x, u, w = LQBarro.compute_sequence(x0, ts_length=T)
plt.plot(list(range(T+1)), ((S - M @ F) @ x)[0, :])
plt.xlabel('Time')
plt.ylabel('Taxation')
plt.show()
We see a similar but smoother pattern if we plot government debt over time.

Debt is smoother due to the persistence of the government spending process.
In [7]: T = 500
for i in range(250):
x, u, w = LQBarro.compute_sequence(x0, ts_length=T)
plt.plot(list(range(T+1)), x[0, :])
plt.xlabel('Time')
plt.ylabel('Government debt')
plt.show()
To implement the extension to the Barro model in which 𝑝𝑡,𝑡+1 varies over time, we must al-
low the M matrix to be time-varying.
Our 𝑄 and 𝑊 matrices must also vary over time.
We can solve such a model using the LQMarkov class that solves Markov jump linear quadratic control problems as described above.
The code for the class can be viewed here.
The class takes lists of matrices that corresponds to 𝑁 Markov states.
The value and policy functions are then found by iterating on the system of algebraic matrix
Riccati equations.
The solutions for 𝑃 𝑠, 𝐹 𝑠, 𝑑𝑠 are stored as attributes.
The class also contains a “method” for simulating the model.
We can use the above class to implement a version of the Barro model with a time-varying
interest rate. The simplest way to extend the model is to allow the interest rate to take two
possible values. We set:
$$p^1_{t,t+1} = \beta + 0.02 = 0.97$$

$$p^2_{t,t+1} = \beta - 0.017 = 0.933$$
Thus, the first Markov state has a low interest rate, and the second Markov state has a high interest rate.
We also need to specify a transition matrix for the Markov state.
We use:
$$\Pi = \begin{bmatrix} 0.8 & 0.2 \\ 0.2 & 0.8 \end{bmatrix}$$
(so each Markov state is persistent, and there is an equal chance of moving from one state to
the other)
The choice of parameters means that the unconditional expectation of 𝑝𝑡,𝑡+1 is 0.9515, higher
than 𝛽(= 0.95).
If we were to set 𝑝𝑡,𝑡+1 = 0.9515 in the version of the model with a constant interest rate,
government debt would explode.
As = [A, A]
Bs = [B, B]
Cs = [C, C]
Rs = [R, R]
M1 = np.array([[-β - 0.02]])
M2 = np.array([[-β + 0.017]])
Q1 = M1.T @ M1
Q2 = M2.T @ M2
Qs = [Q1, Q2]
W1 = M1.T @ S
W2 = M2.T @ S
Ws = [W1, W2]
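A sketch of forming and solving the Markov jump LQ problem, using the transition matrix specified above and the LQMarkov class from the previous lecture:

Π = np.array([[0.8, 0.2],
              [0.2, 0.8]])

lqm = qe.LQMarkov(Π, Qs, Rs, As, Bs, Cs=Cs, Ns=Ws, beta=β)
lqm.stationary_values()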
In [9]: lqm.Fs[0]
In [10]: lqm.Fs[1]
Simulating a large number of such economies over time reveals interesting dynamics.
Debt tends to stay low and stable but recurrently surges temporarily to higher levels.
In [11]: T = 2000
x0 = np.array([[1000, 1, 25]])
for i in range(250):
x, u, w, s = lqm.compute_sequence(x0, ts_length=T)
plt.plot(list(range(T+1)), x[0, :])
plt.xlabel('Time')
plt.ylabel('Government debt')
plt.show()
Chapter 60

How to Pay for a War: Part 2

60.1 Contents
In our earlier lecture, we relaxed the second of these assumptions but not the first.
In particular, we used Markov jump linear quadratic dynamic programming to allow the ex-
ogenous interest rate to vary over time.
In this lecture, we add a maturity composition decision to the government’s problem by ex-
panding the dimension of the state.
We assume
• that the government borrows or saves in the form of risk-free bonds of maturities
1, 2, … , 𝐻.
• that interest rates on those bonds are time-varying and in particular are governed by a
jointly stationary stochastic process.
Let’s start with some standard imports:
Let 𝑇𝑡 denote tax collections, 𝛽 a discount factor, 𝑏𝑡,𝑡+1 time 𝑡 + 1 goods that the government
promises to pay at 𝑡, 𝑏𝑡,𝑡+2 time 𝑡 + 2 goods that the government promises to pay at time 𝑡,
𝐺𝑡 government purchases, 𝑝𝑡,𝑡+1 the number of time 𝑡 goods received per time 𝑡 + 1 goods
promised, and 𝑝𝑡,𝑡+2 the number of time 𝑡 goods received per time 𝑡 + 2 goods promised.
Evidently, 𝑝𝑡,𝑡+1 , 𝑝𝑡,𝑡+2 are inversely related to appropriate corresponding gross interest rates
on government debt.
In the spirit of Barro (1979) [14], government expenditures are governed by an exogenous
stochastic process.
Given initial conditions 𝑏−2,0 , 𝑏−1,0 , 𝑧0 , 𝑖0 , where 𝑖0 is the initial Markov state, the government
chooses a contingency plan for $\{b_{t,t+1}, b_{t,t+2}, T_t\}_{t=0}^{\infty}$ to maximize

$$-E_0 \sum_{t=0}^{\infty} \beta^t \left[ T_t^2 + c_1 (b_{t,t+1} - b_{t,t+2})^2 \right]$$

subject to the constraints

$$T_t + p_{t,t+1} b_{t,t+1} + p_{t,t+2} b_{t,t+2} = G_t + b_{t-1,t} + b_{t-2,t}$$

$$G_t = U_{g,t} z_t$$

$$z_{t+1} = A_{22,t} z_t + C_{2,t} w_{t+1}$$
Here 𝑤𝑡+1 ∼ 𝑁 (0, 𝐼) and Π𝑖𝑗 is the probability that the Markov state moves from state 𝑖 to
state 𝑗 in one period.
The variables 𝑇𝑡 , 𝑏𝑡,𝑡+1 , 𝑏𝑡,𝑡+2 are control variables chosen at 𝑡, while the variables 𝑏𝑡−1,𝑡 , 𝑏𝑡−2,𝑡
are endogenous state variables inherited from the past at time 𝑡 and 𝑝𝑡,𝑡+1 , 𝑝𝑡,𝑡+2 are exoge-
nous state variables at time 𝑡.
The parameter 𝑐1 imposes a penalty on the government’s issuing different quantities of one
and two-period debt.
This penalty deters the government from taking large “long-short” positions in debt of differ-
ent maturities. An example below will show this in action.
As well as extending the model to allow for a maturity decision for government debt, we can
also in principle allow the matrices 𝑈𝑔,𝑡 , 𝐴22,𝑡 , 𝐶2,𝑡 to depend on the Markov state.
First, define $\hat b_t := b_{t-1,t} + b_{t-2,t}$, the total debt due at time $t$, and

$$\bar b_t = \begin{bmatrix} \hat b_t \\ b_{t-1,t+1} \end{bmatrix}$$

along with the state and control vectors

$$x_t = \begin{bmatrix} \bar b_t \\ z_t \end{bmatrix}, \qquad u_t = \begin{bmatrix} b_{t,t+1} \\ b_{t,t+2} \end{bmatrix}$$

The endogenous part of the state then evolves according to

$$\begin{bmatrix} \hat b_{t+1} \\ b_{t,t+2} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \hat b_t \\ b_{t-1,t+1} \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} b_{t,t+1} \\ b_{t,t+2} \end{bmatrix}$$
or

$$\bar b_{t+1} = A_{11} \bar b_t + B_1 u_t$$

The government's budget constraint involves

$$G_t = S_{G,t} x_t, \qquad \hat b_t = S_1 x_t$$

and

$$M_t = \begin{bmatrix} -p_{t,t+1} & -p_{t,t+2} \end{bmatrix}$$

where $p_{t,t+1}$ is the discount on one-period loans in the discrete Markov state at time $t$ and $p_{t,t+2}$ is the discount on two-period loans in the discrete Markov state.

Define

$$S_t = S_{G,t} + S_1$$

so that the budget constraint implies

$$T_t = M_t u_t + S_t x_t$$
It follows that

$$T_t^2 = (M_t u_t + S_t x_t)'(M_t u_t + S_t x_t)$$

or

$$T_t^2 = x_t' R_t x_t + u_t' Q_t u_t + 2 u_t' W_t x_t$$

where $R_t = S_t' S_t$, $Q_t = M_t' M_t$ and $W_t = M_t' S_t$.
Because the payoff function also includes the penalty on issuing different quantities of one and two-period debt, we have:

$$T_t^2 + c_1 (b_{t,t+1} - b_{t,t+2})^2 = x_t' R_t x_t + u_t' Q_t u_t + 2 u_t' W_t x_t + c_1 u_t' Q^c u_t$$

where $Q^c = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$. Therefore, the overall $Q$ matrix for the Markov jump LQ problem is:

$$Q^c_t = Q_t + c_1 Q^c$$
The law of motion of the state $x_t$ is

$$x_{t+1} = A_t x_t + B u_t + C_t w_{t+1}$$

where

$$A_t = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22,t} \end{bmatrix}, \qquad B = \begin{bmatrix} B_1 \\ 0 \end{bmatrix}, \qquad C_t = \begin{bmatrix} 0 \\ C_{2,t} \end{bmatrix}$$
Thus, in this problem all the matrices apart from 𝐵 may depend on the Markov state at time
𝑡.
As shown in the previous lecture, the LQMarkov class can solve Markov jump LQ problems
when given the 𝐴, 𝐵, 𝐶, 𝑅, 𝑄, 𝑊 matrices for each Markov state.
The function below maps the primitive matrices and parameters from the above two-period
model into the matrices that the LQMarkov class requires:
"""
Function which takes A22, C2, Ug, p_{t, t+1}, p_{t, t+2} and penalty
parameter c1, and returns the required matrices for the LQMarkov
model: A, B, C, R, Q, W.
This version uses the condensed version of the endogenous state.
"""
B1 = np.eye(2)
# Create M matrix
M = np.hstack((-p1, -p2))
# Create A, B, C matrices
A_T = np.hstack((A11, np.zeros((2, nz))))
A_B = np.hstack((np.zeros((nz, 2)), A22))
A = np.vstack((A_T, A_B))
# Create R, Q, W matrices
R = S.T @ S
Q = M.T @ M + c1 * Qc
W = M.T @ S
return A, B, C, R, Q, W
With the above function, we can proceed to solve the model in two steps:
1. Use LQ_markov_mapping to map 𝑈𝑔,𝑡 , 𝐴22,𝑡 , 𝐶2,𝑡 , 𝑝𝑡,𝑡+1 , 𝑝𝑡,𝑡+2 into the
𝐴, 𝐵, 𝐶, 𝑅, 𝑄, 𝑊 matrices for each of the 𝑛 Markov states.
2. Use the LQMarkov class to solve the resulting n-state Markov jump LQ problem.
To implement a simple example of the two-period model, we assume that $G_t$ follows an AR(1) process:

$$G_{t+1} = \bar G + \rho G_t + \sigma w_{t+1}$$

To do this, we set $z_t = \begin{bmatrix} 1 \\ G_t \end{bmatrix}$, and consequently:

$$A_{22} = \begin{bmatrix} 1 & 0 \\ \bar G & \rho \end{bmatrix}, \qquad C_2 = \begin{bmatrix} 0 \\ \sigma \end{bmatrix}, \qquad U_g = \begin{bmatrix} 0 & 1 \end{bmatrix}$$

The prices in the two Markov states are

$$p^1_{t,t+1} = \beta, \qquad p^1_{t,t+2} = \beta^2 - 0.02$$

$$p^2_{t,t+1} = \beta, \qquad p^2_{t,t+2} = \beta^2 + 0.02$$
We first solve the model with no penalty parameter on different issuance across maturities,
i.e. 𝑐1 = 0.
We also need to specify a transition matrix for the Markov state. We use:

$$\Pi = \begin{bmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{bmatrix}$$
Thus, each Markov state is persistent, and there is an equal chance of moving from one to the
other.
A1, B1, C1, R1, Q1, W1 = LQ_markov_mapping(A22, C_2, Ug, p1, p2, c1)
A2, B2, C2, R2, Q2, W2 = LQ_markov_mapping(A22, C_2, Ug, p3, p4, c1)
Π = np.array([[0.9, 0.1],
[0.1, 0.9]])
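The construction and solution step is again missing from this extract; following the same (assumed) LQMarkov pattern as in the previous section:

As, Bs, Cs = [A1, A2], [B1, B2], [C1, C2]
Rs, Qs, Ws = [R1, R2], [Q1, Q2], [W1, W2]

lqm = qe.LQMarkov(Π, Qs, Rs, As, Bs, Cs=Cs, Ns=Ws, beta=β)
lqm.stationary_values()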
The above simulations show that when no penalty is imposed on different issuances across
maturities, the government has an incentive to take large “long-short” positions in debt of
different maturities.
To prevent such an outcome, we now set 𝑐1 = 0.01.
This penalty is enough to ensure that the government issues positive quantities of both one
and two-period debt:
A1, B1, C1, R1, Q1, W1 = LQ_markov_mapping(A22, C_2, Ug, p1, p2, c1)
A2, B2, C2, R2, Q2, W2 = LQ_markov_mapping(A22, C_2, Ug, p3, p4, c1)
60.7 A Model with Restructuring

In this version of the model, the government is able to redesign the maturity structure of debt every period.
We impose a cost on adjusting issuance of each maturity by amending the payoff function to become:

$$T_t^2 + \sum_{j=0}^{H-1} c_2 \left( b^{t-1}_{t+j} - b^t_{t+j+1} \right)^2$$

The government's budget constraint is now:

$$T_t + \sum_{j=1}^{H} p_{t,t+j} b^t_{t+j} = b^{t-1}_t + \sum_{j=1}^{H-1} p_{t,t+j} b^{t-1}_{t+j} + G_t$$
To map this into the Markov Jump LQ framework, we define state and control variables.
Let:
$$\bar b_t = \begin{bmatrix} b^{t-1}_t \\ b^{t-1}_{t+1} \\ \vdots \\ b^{t-1}_{t+H-1} \end{bmatrix}, \qquad u_t = \begin{bmatrix} b^t_{t+1} \\ b^t_{t+2} \\ \vdots \\ b^t_{t+H} \end{bmatrix}$$
Thus, 𝑏̄𝑡 is the endogenous state (debt issued last period) and 𝑢𝑡 is the control (debt issued
today).
As before, we will also have the exogenous state 𝑧𝑡 , which determines government spending.
Therefore, the full state is:
$$x_t = \begin{bmatrix} \bar b_t \\ z_t \end{bmatrix}$$
We also define a vector 𝑝𝑡 that contains the time 𝑡 price of goods in period 𝑡 + 𝑗:
$$p_t = \begin{bmatrix} p_{t,t+1} \\ p_{t,t+2} \\ \vdots \\ p_{t,t+H} \end{bmatrix}$$
$$\begin{bmatrix} p_{t,t+1} \\ p_{t,t+2} \\ \vdots \\ p_{t,t+H-1} \end{bmatrix} = S_s p_t \quad \text{where} \quad S_s = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$
$$\begin{bmatrix} b^{t-1}_{t+1} \\ b^{t-1}_{t+2} \\ \vdots \\ b^{t-1}_{t+H-1} \end{bmatrix} = S_x \bar b_t \quad \text{where} \quad S_x = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}$$
Using these definitions, the budget constraint implies

$$T_t = b^{t-1}_t + \sum_{j=1}^{H-1} p_{t,t+j} b^{t-1}_{t+j} + G_t - \sum_{j=1}^{H} p_{t,t+j} b^t_{t+j}$$
or
$$T_t = S_t x_t - p_t' u_t$$
Therefore

$$T_t^2 = x_t' R_t x_t + u_t' Q_t u_t + 2 u_t' W_t x_t$$

where $R_t = S_t' S_t$, $Q_t = p_t p_t'$ and $W_t = -p_t S_t$.
Because the payoff function also includes the penalty parameter for rescheduling, we have:
$$T_t^2 + \sum_{j=0}^{H-1} c_2 \left( b^{t-1}_{t+j} - b^t_{t+j+1} \right)^2 = T_t^2 + c_2 (\bar b_t - u_t)'(\bar b_t - u_t)$$
Because the complete state is $x_t$ and not $\bar b_t$, we rewrite this as:

$$T_t^2 + c_2 (S_c x_t - u_t)'(S_c x_t - u_t)$$

where $S_c = [I \quad 0]$.

Multiplying this out gives:

$$T_t^2 + c_2 x_t' S_c' S_c x_t - 2 c_2 u_t' S_c x_t + c_2 u_t' u_t$$
Therefore, with the cost term, we must amend our 𝑅, 𝑄, 𝑊 matrices as follows:
$$R^c_t = R_t + c_2 S_c' S_c$$

$$Q^c_t = Q_t + c_2 I$$

$$W^c_t = W_t - c_2 S_c$$
To finish mapping into the Markov jump LQ setup, we need to construct the law of motion
for the full state.
This is simpler than in the previous setup, as we now have 𝑏̄𝑡+1 = 𝑢𝑡 .
Therefore:
$$x_{t+1} \equiv \begin{bmatrix} \bar b_{t+1} \\ z_{t+1} \end{bmatrix} = A_t x_t + B u_t + C_t w_{t+1}$$

where

$$A_t = \begin{bmatrix} 0 & 0 \\ 0 & A_{22,t} \end{bmatrix}, \qquad B = \begin{bmatrix} I \\ 0 \end{bmatrix}, \qquad C_t = \begin{bmatrix} 0 \\ C_{2,t} \end{bmatrix}$$
As with the previous model, we can use a function to map the primitives of the model with
restructuring into the matrices that the LQMarkov class requires:
"""
Function which takes A22, C2, T, p_t, c and returns the
required matrices for the LQMarkov model: A, B, C, R, Q, W
Note, p_t should be a T by 1 matrix
c is the rescheduling cost (a scalar)
This version uses the condensed version of the endogenous state
"""
# Create Sx, tSx, Ss, S_t matrices (tSx stands for \tilde S_x)
Ss = np.hstack((np.eye(T-1), np.zeros((T-1, 1))))
Sx = np.hstack((np.zeros((T-1, 1)), np.eye(T-1)))
tSx = np.zeros((1, T))
tSx[0, 0] = 1
# Create A, B, C matrices
A_T = np.hstack((np.zeros((T, T)), np.zeros((T, nz))))
A_B = np.hstack((np.zeros((nz, T)), A22))
A = np.vstack((A_T, A_B))
We will assume that there are two Markov states, one with a flatter yield curve, and one with
a steeper yield curve.
In state 1, prices are

$$p^1_{t,t+1} = 0.9695, \qquad p^1_{t,t+2} = 0.902, \qquad p^1_{t,t+3} = 0.8369$$

and in state 2, prices are

$$p^2_{t,t+1} = 0.9295, \qquad p^2_{t,t+2} = 0.902, \qquad p^2_{t,t+3} = 0.8769$$
A1, B1, C1, R1, Q1, W1 = LQ_markov_mapping_restruct(A22, C_2, Ug, H, p1, c2)
A2, B2, C2, R2, Q2, W2 = LQ_markov_mapping_restruct(A22, C_2, Ug, H, p2, c2)
# Setup and first two panels reconstructed (lost in extraction)
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(12, 8))
ax1.plot(u[0, :])
ax1.set_title('One-period debt issuance')
ax1.set_xlabel('Time')
ax2.plot(u[1, :])
ax2.set_title('Two-period debt issuance')
ax2.set_xlabel('Time')
ax3.plot(u[2, :])
ax3.set_title('Three-period debt issuance')
ax3.set_xlabel('Time')
ax4.plot(u[0, :] + u[1, :] + u[2, :])
ax4.set_title('Total debt issuance')
ax4.set_xlabel('Time')
plt.tight_layout()
plt.show()
fig, ax = plt.subplots()
ax.plot((u[0, :] / (u[0, :] + u[1, :] + u[2, :])))
ax.set_title('One-period debt issuance share')
ax.set_xlabel('Time')
plt.show()
Chapter 61

How to Pay for a War: Part 3

61.1 Contents
Let $T_t$ denote tax collections, $\beta$ a discount factor, $b_{t,t+1}$ time $t+1$ goods that the government promises to pay at $t$, $G_t$ government purchases, and $p^t_{t+1}$ the number of time $t$ goods received per time $t+1$ goods promised.
The stochastic process of government expenditures is exogenous.
The government's problem is to choose a plan for borrowing and tax collections $\{b_{t,t+1}, T_t\}_{t=0}^{\infty}$ to minimize

$$E_0 \sum_{t=0}^{\infty} \beta^t T_t^2$$

subject to the constraints

$$T_t + p^t_{t+1} b_{t,t+1} = G_t + b_{t-1,t}$$

$$G_t = U_{g,t} z_t$$

$$z_{t+1} = A_{22,t} z_t + C_{2,t} w_{t+1}$$
where $w_{t+1} \sim N(0, I)$. The variables $T_t, b_{t,t+1}$ are control variables chosen at $t$, while $b_{t-1,t}$ is an endogenous state variable inherited from the past at time $t$ and $p^t_{t+1}$ is an exogenous state variable at time $t$.
This is the same set-up as used in this lecture.
We will consider a situation in which the government faces “roll-over risk”.
Specifically, we shut down the government’s ability to borrow in one of the Markov states.
We do this by setting

$$p^t_{t+1} = \beta$$

in Markov state 1 and

$$p^t_{t+1} = 0$$

in Markov state 2.
Consequently, in the second Markov state, the government is unable to borrow, and the bud-
get constraint becomes 𝑇𝑡 = 𝐺𝑡 + 𝑏𝑡−1,𝑡 .
However, if this is the only adjustment we make in our linear-quadratic model, the government will not set $b_{t,t+1} = 0$, which is the outcome we need in order to express roll-over risk in period $t$.
Instead, the government would have an incentive to set 𝑏𝑡,𝑡+1 to a large negative number in
state 2 – it would accumulate large amounts of assets to bring into period 𝑡 + 1 because that
is cheap (Our Riccati equations will discover this for us!).
Thus, we must represent “roll-over risk” some other way.
To force the government to set $b_{t,t+1} = 0$, we can instead extend the model to have four Markov states:

1. Good today, good yesterday
2. Good today, bad yesterday
3. Bad today, good yesterday
4. Bad today, bad yesterday
where good is a state in which effectively the government can issue debt and bad is a state in
which effectively the government can’t issue debt.
We’ll explain what effectively means shortly.
We now set

$$p^t_{t+1} = \beta$$

in all states.
In addition – and this is important because it defines what we mean by effectively – we put a
large penalty on the 𝑏𝑡−1,𝑡 element of the state vector in states 2 and 4.
This will prevent the government from wishing to issue any debt in states 3 or 4 because it
would experience a large penalty from doing so in the next period.
The transition matrix for this formulation is:
$$\Pi = \begin{bmatrix} 0.95 & 0 & 0.05 & 0 \\ 0.95 & 0 & 0.05 & 0 \\ 0 & 0.9 & 0 & 0.1 \\ 0 & 0.9 & 0 & 0.1 \end{bmatrix}$$
This transition matrix ensures that the Markov state cannot move, for example, from state 3
to state 1.
Because state 3 is “bad today”, the next period cannot have “good yesterday”.
Ug = np.array([[0, 1]])

# LQ framework matrices
A_t = np.zeros((1, 3))
A_b = np.hstack((np.zeros((2, 1)), A22))
A = np.vstack((A_t, A_b))

B = np.zeros((3, 1))
B[0, 0] = 1

# S selects b_{t-1,t} + G_t from the state x_t = (b_{t-1,t}, 1, G_t);
# this line was evidently lost in extraction, but is needed for R = S'S
S = np.hstack((np.eye(1), Ug))

R = S.T @ S
M = np.array([[-β]])
Q = M.T @ M
W = M.T @ S
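The four state-specific R matrices implementing the "large penalty on $b_{t-1,t}$ in states 2 and 4" were also lost in extraction; one way to build them (the penalty magnitude is an illustrative assumption):

# Penalize the debt element of the state heavily in states 2 and 4
R1 = np.copy(R)
R2 = np.copy(R)
R2[0, 0] = R2[0, 0] + 1e9      # large penalty on b_{t-1,t} (assumed value)
Rs = [R1, R2, R1, R2]          # states ordered: (GG, GB, BG, BB)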
This model is simulated below, using the same process for 𝐺𝑡 as in this lecture.
When $p^t_{t+1} = \beta$, government debt fluctuates around zero.
The spikes in the series for taxation show periods when the government is unable to access
financial markets: positive spikes occur when debt is positive, and the government must raise
taxes in the current period.
Negative spikes occur when the government has positive asset holdings.
An inability to use financial markets in the next period means that the government uses those
assets to lower taxation today.
T = 300
x, u, w, state = lqm.compute_sequence(x0, ts_length=T)

# Calculate taxation each period from the budget constraint and the Markov state
tax = np.zeros([T, 1])
for i in range(T):
    tax[i, :] = S @ x[:, i] + M @ u[:, i]
We can adjust the model so that, rather than having debt fluctuate around zero, the government is a debtor in every period in which it is allowed to borrow.
To accomplish this, we simply raise $p^t_{t+1}$ to $\beta + 0.02 = 0.97$.
Q = M.T @ M
W = M.T @ S

# Calculate taxation each period from the budget constraint and the
# Markov state
tax = np.zeros([T, 1])
for i in range(T):
    tax[i, :] = S @ x[:, i] + M @ u[:, i]
With a lower interest rate, the government has an incentive to increase debt over time.
However, with “roll-over risk”, debt is recurrently reset to zero and taxes spike up.
Consequently, the government is wary of letting debt get too high, due to the high costs of a
“sudden stop”.
Chapter 62
Optimal Taxation in an LQ Economy
62.1 Contents
• Overview 62.2
• The Ramsey Problem 62.3
• Implementation 62.4
• Examples 62.5
• Exercises 62.6
• Solutions 62.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
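The install step itself is missing from this extract; for this lecture it is presumably quantecon:

!pip install --upgrade quantecon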
62.2 Overview
There is a large number of competitive equilibria indexed by different government fiscal poli-
cies.
The Ramsey planner chooses the best competitive equilibrium.
We want to study the dynamics of tax rates, tax revenues, and government debt under a Ramsey plan.
Because the Lucas and Stokey model features state-contingent government debt, the govern-
ment debt dynamics differ substantially from those in a model of Robert Barro [14].
The treatment given here closely follows this manuscript, prepared by Thomas J. Sargent and
Francois R. Velde.
We cover only the key features of the problem in this lecture, leaving you to refer to that
source for additional results and intuition.
We’ll need the following imports:
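The import list was dropped in extraction; judging by the functions used below (var_quadratic_sum, eye, namedtuple, and so on), it presumably resembles:

import numpy as np
from numpy import sqrt, eye, zeros, cumsum
from numpy.random import randn
import scipy.linalg
import matplotlib.pyplot as plt
from collections import namedtuple
from quantecon import nullspace, mc_sample_path, var_quadratic_sum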
We begin by outlining the key assumptions regarding technology, households and the govern-
ment sector.
62.3.1 Technology
62.3.2 Households
Consider a representative household who chooses a path {ℓ𝑡 , 𝑐𝑡 } for labor and consumption to
maximize
$$-\frac{1}{2} \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left[ (c_t - b_t)^2 + \ell_t^2 \right] \tag{1}$$

subject to the time zero budget constraint

$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t p_t^0 \left[ d_t + (1 - \tau_t) \ell_t + s_t - c_t \right] = 0 \tag{2}$$
Here
• 𝛽 is a discount factor in (0, 1).
• $p_t^0$ is a scaled Arrow-Debreu price at time $0$ of history-contingent goods at time $t$.
• 𝑏𝑡 is a stochastic preference parameter.
• 𝑑𝑡 is an endowment process.
• 𝜏𝑡 is a flat tax rate on labor income.
• 𝑠𝑡 is a promised time-𝑡 coupon payment on debt issued by the government.
The scaled Arrow-Debreu price 𝑝𝑡0 is related to the unscaled Arrow-Debreu price as follows.
If we let $\pi_t^0(x^t)$ denote the probability (density) of a history $x^t = [x_t, x_{t-1}, \ldots, x_0]$ of the state, then the Arrow-Debreu time $0$ price of a claim on one unit of consumption at date $t$, history $x^t$ would be

$$\frac{\beta^t p_t^0}{\pi_t^0(x^t)}$$
Thus, our scaled Arrow-Debreu price is the ordinary Arrow-Debreu price multiplied by the
discount factor 𝛽 𝑡 and divided by an appropriate probability.
The budget constraint (2) requires that the present value of consumption be restricted to
equal the present value of endowments, labor income and coupon payments on bond holdings.
62.3.3 Government
The government imposes a linear tax on labor income, fully committing to a stochastic path
of tax rates at time zero.
The government also issues state-contingent debt.
Given government tax and borrowing plans, we can construct a competitive equilibrium with
distorting government taxes.
Among all such competitive equilibria, the Ramsey plan is the one that maximizes the welfare
of the representative consumer.
Endowments, government expenditure, the preference shock process 𝑏𝑡 , and promised coupon
payments on initial government debt 𝑠𝑡 are all exogenous, and given by
• 𝑑𝑡 = 𝑆𝑑 𝑥𝑡
• 𝑔𝑡 = 𝑆𝑔 𝑥𝑡
• 𝑏𝑡 = 𝑆𝑏 𝑥𝑡
• 𝑠𝑡 = 𝑆𝑠 𝑥𝑡
The matrices 𝑆𝑑 , 𝑆𝑔 , 𝑆𝑏 , 𝑆𝑠 are primitives and {𝑥𝑡 } is an exogenous stochastic process taking
values in ℝ𝑘 .
We consider two specifications for {𝑥𝑡 }.
1. Discrete case: {𝑥𝑡 } is a discrete state Markov chain with transition matrix 𝑃 .
2. VAR case: $\{x_t\}$ obeys $x_{t+1} = A x_t + C w_{t+1}$ where $\{w_t\}$ is independent zero-mean Gaussian with identity covariance matrix.
62.3.5 Feasibility
The period-by-period feasibility restriction for this economy is

$$c_t + g_t = d_t + \ell_t \tag{3}$$
62.3.6 Government

Where $p_t^0$ is again a scaled Arrow-Debreu price, the time zero government budget constraint is

$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t p_t^0 (s_t + g_t - \tau_t \ell_t) = 0 \tag{4}$$
62.3.7 Equilibrium
An equilibrium is a feasible allocation {ℓ𝑡 , 𝑐𝑡 }, a sequence of prices {𝑝𝑡0 }, and a tax system
{𝜏𝑡 } such that
1. The allocation $\{\ell_t, c_t\}$ is optimal for the household given $\{p_t^0\}$ and $\{\tau_t\}$.

2. The government's budget constraint (4) is satisfied.
The Ramsey problem is to choose the equilibrium {ℓ𝑡 , 𝑐𝑡 , 𝜏𝑡 , 𝑝𝑡0 } that maximizes the house-
hold’s welfare.
If {ℓ𝑡 , 𝑐𝑡 , 𝜏𝑡 , 𝑝𝑡0 } solves the Ramsey problem, then {𝜏𝑡 } is called the Ramsey plan.
The solution procedure we adopt is
1. Use the first-order conditions from the household problem to pin down prices and allo-
cations given {𝜏𝑡 }.
2. Use these expressions to rewrite the government budget constraint (4) in terms of ex-
ogenous variables and allocations.
3. Maximize the household’s objective function (1) subject to the constraint constructed in
step 2 and the feasibility constraint (3).
The solution to this maximization problem pins down all quantities of interest.
62.3.8 Solution
Step one is to obtain the first-order conditions for the household's problem, taking taxes and prices as given.
Letting 𝜇 be the Lagrange multiplier on (2), the first-order conditions are 𝑝𝑡0 = (𝑐𝑡 − 𝑏𝑡 )/𝜇 and
ℓ𝑡 = (𝑐𝑡 − 𝑏𝑡 )(1 − 𝜏𝑡 ).
Rearranging and normalizing at $\mu = b_0 - c_0$, we can write these conditions as

$$p_t^0 = \frac{b_t - c_t}{b_0 - c_0} \quad \text{and} \quad \tau_t = 1 - \frac{\ell_t}{b_t - c_t} \tag{5}$$
Substituting (5) into the government's budget constraint (4) yields

$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t \left[ (b_t - c_t)(s_t + g_t - \ell_t) + \ell_t^2 \right] = 0 \tag{6}$$
The Ramsey problem now amounts to maximizing (1) subject to (6) and (3).
The associated Lagrangian is
$$\mathcal{L} = \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left\{ -\frac{1}{2} \left[ (c_t - b_t)^2 + \ell_t^2 \right] + \lambda \left[ (b_t - c_t)(\ell_t - s_t - g_t) - \ell_t^2 \right] + \mu_t \left[ d_t + \ell_t - c_t - g_t \right] \right\} \tag{7}$$
The first-order conditions associated with $c_t$ and $\ell_t$ are

$$(b_t - c_t) - \lambda (\ell_t - s_t - g_t) = \mu_t$$

and

$$\ell_t - \lambda [(b_t - c_t) - 2\ell_t] = \mu_t$$
Combining these last two equalities with (3) and working through the algebra, one can show that

$$\ell_t = \bar\ell_t - \nu m_t \quad \text{and} \quad c_t = \bar c_t - \nu m_t \tag{8}$$

where
• 𝜈 ∶= 𝜆/(1 + 2𝜆)
• ℓ𝑡̄ ∶= (𝑏𝑡 − 𝑑𝑡 + 𝑔𝑡 )/2
• 𝑐𝑡̄ ∶= (𝑏𝑡 + 𝑑𝑡 − 𝑔𝑡 )/2
• 𝑚𝑡 ∶= (𝑏𝑡 − 𝑑𝑡 − 𝑠𝑡 )/2
Apart from 𝜈, all of these quantities are expressed in terms of exogenous variables.
To solve for 𝜈, we can use the government’s budget constraint again.
The term inside the brackets in (6) is $(b_t - c_t)(s_t + g_t) - (b_t - c_t)\ell_t + \ell_t^2$.

Using (8), the definitions above and the fact that $\bar\ell = b - \bar c$, this term can be rewritten as

$$(b_t - \bar c_t)(g_t + s_t) + 2 m_t^2 (\nu^2 - \nu)$$

Reinserting into (6), we get

$$\mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t (b_t - \bar c_t)(g_t + s_t) \right\} + (\nu^2 - \nu) \mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t 2 m_t^2 \right\} = 0 \tag{9}$$
Defining

$$b_0 := \mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t (b_t - \bar c_t)(g_t + s_t) \right\} \quad \text{and} \quad a_0 := \mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t 2 m_t^2 \right\} \tag{10}$$

it suffices to solve the quadratic equation

$$b_0 + a_0 (\nu^2 - \nu) = 0$$

for $\nu$.
Provided that 4𝑏0 < 𝑎0 , there is a unique solution 𝜈 ∈ (0, 1/2), and a unique corresponding
𝜆 > 0.
Let’s work out how to compute mathematical expectations in (10).
For the first one, the random variable (𝑏𝑡 − 𝑐𝑡̄ )(𝑔𝑡 + 𝑠𝑡 ) inside the summation can be expressed
as
$$\frac{1}{2} x_t' (S_b - S_d + S_g)'(S_g + S_s) x_t$$
For the second expectation in (10), the random variable 2𝑚2𝑡 can be written as
$$\frac{1}{2} x_t' (S_b - S_d - S_s)'(S_b - S_d - S_s) x_t$$
It follows that both objects of interest are special cases of the expression
$$q(x_0) = \mathbb{E} \sum_{t=0}^{\infty} \beta^t x_t' H x_t \tag{11}$$

for some conformable matrix $H$.
Next, suppose that {𝑥𝑡 } is the discrete Markov process described above.
Suppose further that each 𝑥𝑡 takes values in the state space {𝑥1 , … , 𝑥𝑁 } ⊂ ℝ𝑘 .
Let ℎ ∶ ℝ𝑘 → ℝ be a given function, and suppose that we wish to evaluate
$$q(x_0) = \mathbb{E} \sum_{t=0}^{\infty} \beta^t h(x_t) \quad \text{given} \quad x_0 = x^j$$

It is legitimate to pass the expectation through the sum, leading to

$$q(x_0) = \sum_{t=0}^{\infty} \beta^t (P^t h)[j] \tag{12}$$
Here
• 𝑃 𝑡 is the 𝑡-th power of the transition matrix 𝑃 .
• ℎ is, with some abuse of notation, the vector (ℎ(𝑥1 ), … , ℎ(𝑥𝑁 )).
• (𝑃 𝑡 ℎ)[𝑗] indicates the 𝑗-th element of 𝑃 𝑡 ℎ.
It can be shown that (12) is in fact equal to the 𝑗-th element of the vector (𝐼 − 𝛽𝑃 )−1 ℎ.
This last fact is applied in the calculations below.
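As a quick illustration of that fact, the geometric sum in (12) can be computed by solving a linear system rather than summing. The sketch below reuses the three-state transition matrix from the discrete example later in this chapter, with made-up values for h:

β = 0.95                     # discount factor (illustrative)
P = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
h = np.array([1.0, 2.0, 0.5])               # h(x^1), h(x^2), h(x^3), made up
q = np.linalg.solve(np.eye(3) - β * P, h)   # q(x0) for each initial state
print(q)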
We are interested in tracking several other variables besides the ones described above.
To prepare the way for this, we define
$$p^t_{t+j} = \frac{b_{t+j} - c_{t+j}}{b_t - c_t}$$
as the scaled Arrow-Debreu time 𝑡 price of a history contingent claim on one unit of con-
sumption at time 𝑡 + 𝑗.
These are prices that would prevail at time 𝑡 if markets were reopened at time 𝑡.
These prices are constituents of the present value of government obligations outstanding at
time 𝑡, which can be expressed as
$$B_t := \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j p^t_{t+j} (\tau_{t+j} \ell_{t+j} - g_{t+j}) \tag{13}$$
Using our expression for prices and the Ramsey plan, we can also write 𝐵𝑡 as
$$B_t = \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j \frac{(b_{t+j} - c_{t+j})(\ell_{t+j} - g_{t+j}) - \ell_{t+j}^2}{b_t - c_t}$$
The scaled prices satisfy the recursion

$$p^t_{t+j} = p^t_{t+1} p^{t+1}_{t+j}$$

so that

$$B_t = (\tau_t \ell_t - g_t) + E_t \sum_{j=1}^{\infty} \beta^j p^t_{t+j} (\tau_{t+j} \ell_{t+j} - g_{t+j})$$
and
$$B_t = (\tau_t \ell_t - g_t) + \beta E_t p^t_{t+1} B_{t+1} \tag{14}$$
Define
$$R_t^{-1} := \mathbb{E}_t \beta p^t_{t+1} \tag{15}$$

Here $R_t$ is the gross return on one-period risk-free loans between $t$ and $t+1$.
62.3.12 A Martingale
We now want to study two objects, namely

$$\pi_{t+1} := B_{t+1} - R_t [B_t + g_t - \tau_t \ell_t]$$

and the cumulation of $\pi_t$:

$$\Pi_t := \sum_{s=0}^{t} \pi_s$$

The term $\pi_{t+1}$ is the difference between two quantities:

• $B_{t+1}$, the value of government debt at the start of period $t+1$.
• $R_t [B_t + g_t - \tau_t \ell_t]$, which is what the government would have owed at the beginning of period $t+1$ if it had simply borrowed at the one-period risk-free rate rather than selling state-contingent securities.
Thus, 𝜋𝑡+1 is the excess payout on the actual portfolio of state-contingent government debt
relative to an alternative portfolio sufficient to finance 𝐵𝑡 + 𝑔𝑡 − 𝜏𝑡 ℓ𝑡 and consisting entirely of
risk-free one-period bonds.
Use expressions (14) and (15) to obtain
$$\pi_{t+1} = B_{t+1} - \frac{1}{\beta E_t p^t_{t+1}} \left[ \beta E_t p^t_{t+1} B_{t+1} \right]$$
or

$$\pi_{t+1} = B_{t+1} - \tilde E_t B_{t+1} \tag{16}$$
where 𝐸𝑡̃ is the conditional mathematical expectation taken with respect to a one-step tran-
sition density that has been formed by multiplying the original transition density with the
likelihood ratio
$$m^t_{t+1} = \frac{p^t_{t+1}}{E_t p^t_{t+1}}$$
It is easy to verify from (16) that

$$\tilde E_t \pi_{t+1} = 0$$

which asserts that $\{\pi_{t+1}\}$ is a martingale difference sequence under the distorted probability measure, and that $\{\Pi_t\}$ is a martingale under the distorted probability measure.
In the tax-smoothing model of Robert Barro [14], government debt is a random walk.
In the current model, government debt {𝐵𝑡 } is not a random walk, but the excess payoff
{Π𝑡 } on it is.
62.4 Implementation
def compute_paths(T, econ):
    """
    Compute simulated time paths for exogenous and endogenous variables.

    Parameters
    ===========
    econ: a namedtuple of type 'Economy', containing the model primitives
    T: int
        Length of the simulation
Returns
========
path: a namedtuple of type 'Path', containing
g - Govt spending
d - Endowment
b - Utility shift parameter
s - Coupon payment on existing debt
c - Consumption
l - Labor
p - Price
τ - Tax rate
rvn - Revenue
B - Govt debt
R - Risk-free gross return
π - One-period risk-free interest rate
Π - Cumulative rate of return, adjusted
ξ - Adjustment factor for Π
"""
# Simplify names
β, Sg, Sd, Sb, Ss = econ.β, econ.Sg, econ.Sd, econ.Sb, econ.Ss
if econ.discrete:
P, x_vals = econ.proc
else:
A, C = econ.proc
return path
def gen_fig_1(path):
"""
The parameter is the path namedtuple returned by compute_paths(). See
the docstring of that function for details.
"""
T = len(path.c)
# Prepare axes
num_rows, num_cols = 2, 2
fig, axes = plt.subplots(num_rows, num_cols, figsize=(14, 10))
plt.subplots_adjust(hspace=0.4)
for i in range(num_rows):
for j in range(num_cols):
axes[i, j].grid()
axes[i, j].set_xlabel('Time')
bbox = (0., 1.02, 1., .102)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}
plt.show()
def gen_fig_2(path):
"""
The parameter is the path namedtuple returned by compute_paths(). See
the docstring of that function for details.
"""
T = len(path.c)
# Prepare axes
num_rows, num_cols = 2, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 10))
plt.subplots_adjust(hspace=0.5)
bbox = (0., 1.02, 1., .102)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}
plt.show()
The function var_quadratic_sum imported from quadsums is for computing the value of
(11) when the exogenous process {𝑥𝑡 } is of the VAR type described above.
Below the definition of the function, you will see definitions of two namedtuple objects,
Economy and Path.
The first is used to collect all the parameters and primitives of a given LQ economy, while the
second collects output of the computations.
In Python, a namedtuple is a popular data type from the collections module of the
standard library that replicates the functionality of a tuple, but also allows you to assign a
name to each tuple element.
These elements can then be referenced via dotted attribute notation — see for example the use of path in the functions gen_fig_1() and gen_fig_2().
The benefits of using namedtuples:
• Keeps content organized by meaning.
• Helps reduce the number of global variables.
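For instance (a toy illustration, not taken from the lecture's own code):

from collections import namedtuple

Model = namedtuple('Model', ['β', 'ρ'])   # hypothetical two-field namedtuple
m = Model(β=0.95, ρ=0.7)
print(m.β, m.ρ)     # fields accessed via dotted attribute notation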
Other than that, our code is long but relatively straightforward.
62.5 Examples
In [4]: # == Parameters == #
β = 1 / 1.05
ρ, mg = .7, .35
A = eye(2)
A[0, :] = ρ, mg * (1-ρ)
C = np.zeros((2, 1))
C[0, 0] = np.sqrt(1 - ρ**2) * mg / 10
Sg = np.array((1, 0)).reshape(1, 2)
Sd = np.array((0, 0)).reshape(1, 2)
Sb = np.array((0, 2.135)).reshape(1, 2)
Ss = np.array((0, 0)).reshape(1, 2)
# The Economy construction was lost in extraction; reconstructed from the
# 'discrete'/'proc' convention used in compute_paths above
economy = Economy(β=β, Sg=Sg, Sd=Sd, Sb=Sb, Ss=Ss,
                  discrete=False, proc=(A, C))

T = 50
path = compute_paths(T, economy)
gen_fig_1(path)
In [5]: gen_fig_2(path)
Our second example adopts a discrete Markov specification for the exogenous process
In [6]: # == Parameters == #
β = 1 / 1.05
P = np.array([[0.8, 0.2, 0.0],
[0.0, 0.5, 0.5],
[0.0, 0.0, 1.0]])
Sg = np.array((1, 0, 0, 0, 0)).reshape(1, 5)
Sd = np.array((0, 1, 0, 0, 0)).reshape(1, 5)
Sb = np.array((0, 0, 1, 0, 0)).reshape(1, 5)
Ss = np.array((0, 0, 0, 1, 0)).reshape(1, 5)
# x_vals (a matrix of state values, one column per Markov state) was lost
# in extraction and must be supplied before constructing the economy
economy = Economy(β=β, Sg=Sg, Sd=Sd, Sb=Sb, Ss=Ss,
                  discrete=True, proc=(P, x_vals))

T = 15
path = compute_paths(T, economy)
gen_fig_1(path)
In [7]: gen_fig_2(path)
62.6 Exercises
62.6.1 Exercise 1
62.7 Solutions
62.7.1 Exercise 1
In [8]: # == Parameters == #
β = 1 / 1.05
ρ, mg = .95, .35
A = np.array([[0, 0, 0, ρ, mg*(1-ρ)],
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 1]])
C = np.zeros((5, 1))
C[0, 0] = np.sqrt(1 - ρ**2) * mg / 8
Sg = np.array((1, 0, 0, 0, 0)).reshape(1, 5)
Sd = np.array((0, 0, 0, 0, 0)).reshape(1, 5)
# Chosen st. (Sc + Sg) * x0 = 1
Sb = np.array((0, 0, 0, 0, 2.135)).reshape(1, 5)
Ss = np.array((0, 0, 0, 0, 0)).reshape(1, 5)
economy = Economy(β=β, Sg=Sg, Sd=Sd, Sb=Sb, Ss=Ss,
                  discrete=False, proc=(A, C))   # reconstructed as above

T = 50
path = compute_paths(T, economy)
gen_fig_1(path)
In [9]: gen_fig_2(path)
Chapter 63

Production Smoothing via Inventories

63.1 Contents
• Overview 63.2
• Example 1 63.3
• Inventories Not Useful 63.4
• Inventories Useful but are Hardwired to be Zero Always 63.5
• Example 2 63.6
• Example 3 63.7
• Example 4 63.8
• Example 5 63.9
• Example 6 63.10
• Exercises 63.11
Co-author: Zejin Shi
In addition to what’s in Anaconda, this lecture employs the following library:
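The install line itself is missing from this extract; given the use of quantecon's LQ class below, it is presumably:

!pip install --upgrade quantecon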
63.2 Overview
But the firm also prefers to sell out of existing inventories, a preference that we represent by
a cost that is quadratic in the difference between sales in a period and the firm’s beginning of
period inventories.
We compute examples designed to indicate how the firm optimally chooses to smooth produc-
tion and manage inventories while keeping inventories close to sales.
To introduce components of the model, let
• 𝑆𝑡 be sales at time 𝑡
• 𝑄𝑡 be production at time 𝑡
• 𝐼𝑡 be inventories at the beginning of time 𝑡
• 𝛽 ∈ (0, 1) be a discount factor
• $c(Q_t) = c_1 Q_t + c_2 Q_t^2$ be a cost-of-production function, where $c_1 > 0, c_2 > 0$
• $d(I_t, S_t) = d_1 I_t + d_2 (S_t - I_t)^2$, where $d_1 > 0, d_2 > 0$, be a cost-of-holding-inventories function, consisting of two components:
– a cost $d_1 I_t$ of carrying inventories, and
– a cost $d_2 (S_t - I_t)^2$ of having inventories deviate from sales
• $p_t = a_0 - a_1 S_t + v_t$ be an inverse demand function for a firm's product, where $a_0 > 0, a_1 > 0$ and $v_t$ is a demand shock at time $t$
• $\pi_t = p_t S_t - c(Q_t) - d(I_t, S_t)$ be the firm's profits at time $t$
• $\sum_{t=0}^{\infty} \beta^t \pi_t$ be the present value of the firm's profits at time $0$
• $I_{t+1} = I_t + Q_t - S_t$ be the law of motion of inventories
• $z_{t+1} = A_{22} z_t + C_2 \epsilon_{t+1}$ be the law of motion for an exogenous state vector $z_t$ that contains time $t$ information useful for predicting the demand shock $v_t$
• $v_t = G z_t$ link the demand shock to the information set $z_t$
• the constant $1$ be the first component of $z_t$
To map our problem into a linear-quadratic discounted dynamic programming problem (also
known as an optimal linear regulator), we define the state vector at time 𝑡 as
$$x_t = \begin{bmatrix} I_t \\ z_t \end{bmatrix}$$

and the control vector as

$$u_t = \begin{bmatrix} Q_t \\ S_t \end{bmatrix}$$

The law of motion of the state is

$$\begin{bmatrix} I_{t+1} \\ z_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & A_{22} \end{bmatrix} \begin{bmatrix} I_t \\ z_t \end{bmatrix} + \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} Q_t \\ S_t \end{bmatrix} + \begin{bmatrix} 0 \\ C_2 \end{bmatrix} \epsilon_{t+1}$$

or

$$x_{t+1} = A x_t + B u_t + C \epsilon_{t+1}$$
(At this point, we ask that you please forgive us for using 𝑄𝑡 to be the firm’s production at
time 𝑡, while below we use 𝑄 as the matrix in the quadratic form 𝑢′𝑡 𝑄𝑢𝑡 that appears in the
firm’s one-period profit function)
We can express the firm's profit as a function of states and controls as

$$\pi_t = -(x_t' R x_t + u_t' Q u_t + 2 u_t' N x_t)$$
To form the matrices $R, Q, N$, we note that the firm's profits at time $t$ can be expressed as

$$\pi_t = -\Big( \underbrace{d_1 I_t + d_2 I_t^2}_{x_t' R x_t} + \underbrace{c_2 Q_t^2 + (a_1 + d_2) S_t^2}_{u_t' Q u_t} + \underbrace{c_1 Q_t - a_0 S_t - G z_t S_t - 2 d_2 S_t I_t}_{2 u_t' N x_t} \Big)$$

or, in matrix form,

$$\pi_t = -\left( \begin{bmatrix} I_t & z_t' \end{bmatrix} \underbrace{\begin{bmatrix} d_2 & \frac{d_1}{2} S_c \\ \frac{d_1}{2} S_c' & 0 \end{bmatrix}}_{\equiv R} \begin{bmatrix} I_t \\ z_t \end{bmatrix} + \begin{bmatrix} Q_t & S_t \end{bmatrix} \underbrace{\begin{bmatrix} c_2 & 0 \\ 0 & a_1 + d_2 \end{bmatrix}}_{\equiv Q} \begin{bmatrix} Q_t \\ S_t \end{bmatrix} + 2 \begin{bmatrix} Q_t & S_t \end{bmatrix} \underbrace{\begin{bmatrix} 0 & \frac{c_1}{2} S_c \\ -d_2 & -\frac{a_0}{2} S_c - \frac{G}{2} \end{bmatrix}}_{\equiv N} \begin{bmatrix} I_t \\ z_t \end{bmatrix} \right) \tag{63.1}$$

where $S_c = [1, 0, \ldots, 0]$ selects the constant first element of $z_t$.
The optimal decision rule has the form

$$u_t = -F x_t$$

and the evolution of the state under the optimal decision rule is

$$x_{t+1} = (A - BF) x_t + C \epsilon_{t+1}$$
Here is code for computing an optimal decision rule and for analyzing its consequences.
class smoothing_example:
    """
    Class for constructing, solving, and plotting results for the
    inventories and sales smoothing problem.
    """

    def __init__(self,
                 β=0.96,       # Discount factor
                 c1=1,         # Cost-of-production
                 c2=1,
                 d1=1,         # Cost-of-holding inventories
                 d2=1,
                 a0=10,        # Inverse demand function
                 a1=1,
                 A22=[[1, 0], [1, 0.9]],   # z process (defaults assumed)
                 C2=[[0], [1]],
                 G=[0, 1]):

        self.β = β
        self.c1, self.c2 = c1, c2
        self.d1, self.d2 = d1, d2
        self.a0, self.a1 = a0, a1
        self.A22 = np.atleast_2d(A22)
        self.C2 = np.atleast_2d(C2)
        self.G = np.atleast_2d(G)

        # Dimensions
        k, j = self.C2.shape    # Dimensions for randomness part
        n = k + 1               # Number of states
        m = 2                   # Number of controls

        Sc = np.zeros(k)
        Sc[0] = 1

        # State transition matrices (A and R were lost in extraction and
        # are reconstructed from the derivation above)
        A = np.zeros((n, n))
        A[0, 0] = 1
        A[1:, 1:] = self.A22

        B = np.zeros((n, m))
        B[0, :] = 1, -1

        C = np.zeros((n, j))
        C[1:, :] = self.C2

        R = np.zeros((n, n))
        R[0, 0] = d2
        R[0, 1:] = d1 / 2 * Sc
        R[1:, 0] = d1 / 2 * Sc

        Q = np.zeros((m, m))
        Q[0, 0] = c2
        Q[1, 1] = a1 + d2

        N = np.zeros((m, n))
        N[1, 0] = - d2
        N[0, 1:] = c1 / 2 * Sc
        N[1, 1:] = - a0 / 2 * Sc - self.G / 2

        # Construct LQ instance
        self.LQ = qe.LQ(Q, R, A, B, C, N, beta=β)
        self.LQ.stationary_values()
    # Fragment of the class's simulate method (most of its body was lost
    # in extraction)
    Q_path = u_path[0, :]
    S_path = u_path[1, :]
    plt.show()
Notice that the above code sets parameters at the following default values
• discount factor β=0.96,
• inverse demand function: 𝑎0 = 10, 𝑎1 = 1
• cost of production 𝑐1 = 1, 𝑐2 = 1
• costs of holding inventories 𝑑1 = 1, 𝑑2 = 1
In the examples below, we alter some or all of these parameter values.
63.3 Example 1
In this example, the demand shock follows the AR(1) process

$$\nu_t = \alpha + \rho \nu_{t-1} + \epsilon_t,$$

which implies

$$z_{t+1} = \begin{bmatrix} 1 \\ v_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \alpha & \rho \end{bmatrix} \underbrace{\begin{bmatrix} 1 \\ v_t \end{bmatrix}}_{z_t} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} \epsilon_{t+1}.$$
ex1 = smoothing_example()   # default parameters (construction reconstructed)

x0 = [0, 1, 0]
ex1.simulate(x0)
63.4 Inventories Not Useful

Let's first consider a version of the problem in which inventories are not needed or used, so that the firm simply chooses production each period to maximize

$$\sum_{t=0}^{\infty} \beta^t \{ p_t Q_t - C(Q_t) \}$$
It turns out that the optimal plan for 𝑄𝑡 for this problem also solves a sequence of static
problems max𝑄𝑡 {𝑝𝑡 𝑄𝑡 − 𝑐(𝑄𝑡 )}.
When inventories aren’t required or used, sales always equal production.
This simplifies the problem and the optimal no-inventory production maximizes the expected
value of
$$\sum_{t=0}^{\infty} \beta^t \{ p_t Q_t - C(Q_t) \}.$$

The optimal production plan for this no-inventory problem is

$$Q^{ni}_t = \frac{a_0 + \nu_t - c_1}{c_2 + a_1}.$$
Next, we turn to a distinct problem in which inventories are useful – meaning that there are
costs of 𝑑2 (𝐼𝑡 − 𝑆𝑡 )2 associated with having sales not equal to inventories – but we arbitrarily
impose on the firm the costly restriction that it never hold inventories.
Accordingly, the firm solves

$$\max_{\{I_t, Q_t, S_t\}} \sum_{t=0}^{\infty} \beta^t \{ p_t S_t - C(Q_t) - d(I_t, S_t) \}$$

subject to the restrictions that $I_t = 0$ for all $t$ and that $I_{t+1} = I_t + Q_t - S_t$.

The restriction that $I_t = 0$ for all $t$ implies $Q_t = S_t$, so the problem reduces to

$$\max_{Q_t} \sum_{t=0}^{\infty} \beta^t \{ p_t Q_t - C(Q_t) - d(0, Q_t) \}$$

The optimal production plan is

$$Q^h_t = \frac{a_0 + \nu_t - c_1}{c_2 + a_1 + d_2}.$$
We introduce this "$I_t$ is hardwired to zero" specification in order to shed light on the role that inventories play, by comparing outcomes with those under our two other versions of the problem.
The bottom right panel displays the production path for the original problem that we are interested in (the blue line), together with the optimal production path for the model in which inventories are not useful (the green line), and also for the model in which, although inventories are useful, they are hardwired to zero and the firm pays cost $d(0, Q_t)$ (the orange line).
Notice that it is typically optimal for the firm to produce more when inventories aren’t useful.
Here there is no requirement to sell out of inventories and no costs from having sales deviate
from inventories.
But “typical” does not mean “always”.
Thus, if we look closely, we notice that for small 𝑡, the green “production when inventories
aren’t useful” line in the lower right panel is below optimal production in the original model.
High optimal production in the original model early on occurs because the firm wants to ac-
cumulate inventories quickly in order to acquire high inventories for use in later periods.
But how the green line compares to the blue line early on depends on the evolution of the
demand shock, as we will see in a deterministically seasonal demand shock example to be an-
alyzed below.
In that example, the original firm optimally accumulates inventories slowly because the next
positive demand shock is in the distant future.
To make the green-blue model production comparison easier to see, let’s confine the graphs to
the first 10 periods:
63.6 Example 2
Next, we shut down randomness in demand and assume that the demand shock 𝜈𝑡 follows a
deterministic path:
$$\nu_t = \alpha + \rho \nu_{t-1}$$
ex2 = smoothing_example(C2=[[0], [0]])   # no randomness (reconstructed)

x0 = [0, 1, 0]
ex2.simulate(x0)
63.7 Example 3
Now we’ll put randomness back into the demand shock process and also assume that there
are zero costs of holding inventories.
In particular, we’ll look at a situation in which 𝑑1 = 0 but 𝑑2 > 0.
Now it becomes optimal to set sales approximately equal to inventories and to use inventories
to smooth production quite well, as the following figures confirm
ex3 = smoothing_example(d1=0)   # zero cost of carrying inventories (reconstructed)

x0 = [0, 1, 0]
ex3.simulate(x0)
63.8 Example 4
To bring out some features of the optimal policy that are related to some technical issues in
linear control theory, we’ll now temporarily assume that it is costless to hold inventories.
When we completely shut down the cost of holding inventories by setting 𝑑1 = 0 and 𝑑2 = 0,
something absurd happens (because the Bellman equation is opportunistic and very smart).
(Technically, we have set parameters that end up violating conditions needed to assure sta-
bility of the optimally controlled state.)
The firm finds it optimal to set $Q_t \equiv Q^* = \frac{-c_1}{2c_2}$, an output level that minimizes the costs of production (when $c_1 > 0$, as it is with our default settings, it is optimal to set production negative, whatever that means!).
Recall the law of motion for inventories

$$I_{t+1} = I_t + Q_t - S_t$$

So when $d_1 = d_2 = 0$, so that the firm finds it optimal to set $Q_t = \frac{-c_1}{2c_2}$ for all $t$, then

$$I_{t+1} - I_t = \frac{-c_1}{2c_2} - S_t < 0$$
for almost all values of 𝑆𝑡 under our default parameters that keep demand positive almost all
of the time.
The dynamic program instructs the firm to set production costs to zero and to run a Ponzi
scheme by running inventories down forever.
(We can interpret this as the firm somehow going short in or borrowing inventories)
The following figures confirm that inventories head south without limit
ex4 = smoothing_example(d1=0, d2=0)   # costless inventories (reconstructed)

x0 = [0, 1, 0]
ex4.simulate(x0)
Let’s shorten the time span displayed in order to highlight what is going on.
We’ll set the horizon 𝑇 = 30 with the following code
63.9 Example 5
Now we'll assume that the demand shock follows a linear time trend

$$v_t = b + a t, \qquad a > 0, \; b > 0$$

To represent this, we set $C_2 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ and

$$A_{22} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \qquad x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad G = \begin{bmatrix} b & a \end{bmatrix}$$
In [11]: ex5 = smoothing_example(A22=[[1, 0], [1, 1]], C2=[[0], [0]], G=[b, a])
63.10 Example 6

Now we'll assume that the demand shock is deterministically seasonal. To represent this, we set

$$A_{22} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \qquad C_2 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad G' = \begin{bmatrix} b \\ a \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

and

$$x_0 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}$$
Now we'll generate some more examples that differ only in the initial season of the year in which the demand shock begins.
63.11 Exercises
Please try to analyze some inventory sales smoothing problems using the
smoothing_example class.
63.11.1 Exercise 1
Assume that the demand shock follows an AR(2) process

$$\nu_t = \alpha + \rho_1 \nu_{t-1} + \rho_2 \nu_{t-2} + \epsilon_t.$$

You need to construct the $A22$, $C$, and $G$ matrices properly and then input them as the keyword arguments of the smoothing_example class. Simulate paths starting from the initial condition $x_0 = [0, 1, 0, 0]'$.
After this, try to construct a very similar smoothing_example with the same demand shock process but without the randomness $\epsilon_t$. Compute the stationary state $\bar x$ by simulating for a long period. Then add shocks of different magnitudes to $\bar\nu_t$ and simulate the paths. You should see how the firm responds differently by examining the production plans.
63.11.2 Exercise 2
63.11.3 Solution 1
# initial condition
x0 = [0, 1, 0, 0]
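The construction code for this solution was lost in extraction. A sketch consistent with the state ordering above, with illustrative (assumed) parameter values:

# z_t = [1, ν_t, ν_{t-1}]'; parameter values below are assumptions
α, ρ1, ρ2, σ = 1, 1.2, -0.5, 4
A22 = [[1, 0,  0],
       [α, ρ1, ρ2],
       [0, 1,  0]]
C2 = [[0], [σ], [0]]
G = [0, 1, 0]

ex1 = smoothing_example(A22=A22, C2=C2, G=G)
ex1.simulate(x0)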
In the following, we add small and large shocks to $\bar\nu_t$ and compare how the firm responds in each case. As the shock is not very persistent under the parameterization we are using, we focus on a short-period response.
In [20]: T = 40
63.11.4 Solution 2
In [23]: x0 = [0, 1, 0]
In [24]: smoothing_example(c2=5).simulate(x0)
In [25]: smoothing_example(d2=5).simulate(x0)
Part VIII
Chapter 64

Pandas for Panel Data

64.1 Contents
• Overview 64.2
• Slicing and Reshaping Data 64.3
• Merging Dataframes and Filling NaNs 64.4
• Grouping and Summarizing Data 64.5
• Final Remarks 64.6
• Exercises 64.7
• Solutions 64.8
64.2 Overview
We will read in a dataset from the OECD of real minimum wages in 32 countries and assign
it to realwage.
The dataset pandas_panel/realwage.csv can be downloaded here.
Make sure the file is in your current working directory
url = 'https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/realwage.csv'
realwage = pd.read_csv(url)
The data is currently in long format, which is difficult to analyze when there are several di-
mensions to the data.
We will use pivot_table to create a wide format panel, with a MultiIndex to handle
higher dimensional data.
pivot_table arguments should specify the data (values), the index, and the columns we
want in our resulting dataframe.
By passing a list in columns, we can create a MultiIndex in our column axis
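The pivot call itself was lost in extraction; a sketch, assuming the long-format value column is named 'value':

realwage = realwage.pivot_table(values='value',
                                index='Time',
                                columns=['Country', 'Series', 'Pay period'])
realwage.head()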
Country … \
Series In 2015 constant prices at 2015 USD exchange rates …
Pay period Annual …
Time …
2006-01-01 23,826.64 …
2007-01-01 24,616.84 …
2008-01-01 24,185.70 …
2009-01-01 24,496.84 …
2010-01-01 24,373.76 …
Country
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88
To more easily filter our time series data later on, we will convert the index into a DatetimeIndex
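The conversion code is missing from this extract; presumably:

realwage.index = pd.to_datetime(realwage.index)
type(realwage.index)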
Out[4]: pandas.core.indexes.datetimes.DatetimeIndex
The columns contain multiple levels of indexing, known as a MultiIndex, with levels being
ordered hierarchically (Country > Series > Pay period).
A MultiIndex is the simplest and most flexible way to manage panel data in pandas
In [5]: type(realwage.columns)
Out[5]: pandas.core.indexes.multi.MultiIndex
In [6]: realwage.columns.names
Like before, we can select the country (the top level of our MultiIndex)
Stacking and unstacking levels of the MultiIndex will be used throughout this lecture to
reshape our dataframe into a format we need.
.stack() rotates the lowest level of the column MultiIndex to the row index
(.unstack() works in the opposite direction - try it out)
In [8]: realwage.stack().head()
Country \
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 23,826.64
Hourly 12.06
2007-01-01 Annual 24,616.84
Hourly 12.46
2008-01-01 Annual 24,185.70
Country Belgium … \
Series In 2015 constant prices at 2015 USD PPPs …
Time Pay period …
2006-01-01 Annual 21,042.28 …
Hourly 10.09 …
2007-01-01 Annual 21,310.05 …
Hourly 10.22 …
2008-01-01 Annual 21,416.96 …
Country
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 12,594.40
Hourly 6.05
2007-01-01 Annual 12,974.40
Hourly 6.24
2008-01-01 Annual 14,097.56
[5 rows x 64 columns]
We can also pass in an argument to select the level we would like to stack
In [9]: realwage.stack(level='Country').head()
Time
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Country
Australia 25,349.90 12.83
Belgium 20,753.48 9.95
Brazil 2,842.28 1.21
Canada 17,367.24 8.35
Chile 4,251.49 1.81
For the rest of lecture, we will work with a dataframe of the hourly real minimum wages
across countries and time, measured in 2015 US dollars.
To create our filtered dataframe (realwage_f), we can use the xs method to select values
at lower levels in the multiindex, while keeping the higher levels (countries in this case)
realwage_f = realwage.xs(('Hourly',
                          'In 2015 constant prices at 2015 USD exchange rates'),
                         level=('Pay period', 'Series'), axis=1)
realwage_f.head()
2009-01-01 7.58
2010-01-01 7.88
[5 rows x 32 columns]
Similar to relational databases like SQL, pandas has built in methods to merge datasets to-
gether.
Using country information from WorldData.info, we’ll add the continent of each country to
realwage_f with the merge function.
The CSV file can be found in pandas_panel/countries.csv and can be downloaded
here.
worlddata = pd.read_csv('…/pandas_panel/countries.csv', sep=';')   # path truncated in source
worlddata.head()
[5 rows x 17 columns]
First, we’ll select just the country and continent variables from worlddata and rename the
column to ‘Country’
Our dataframes will be merged using country names, requiring us to use the transpose of
realwage_f so that rows correspond to country names in both dataframes
In [14]: realwage_f.transpose().head()
Time 2016-01-01
Country
Australia 12.98
Belgium 9.76
Brazil 1.24
Canada 8.48
Chile 1.91
[5 rows x 11 columns]
We can use either left, right, inner, or outer join to merge our datasets:
• left join includes only countries from the left dataset
• right join includes only countries from the right dataset
• outer join includes countries that are in either the left or right dataset
• inner join includes only countries common to both the left and right datasets
By default, merge will use an inner join.
Here we will pass how='left' to keep all countries in realwage_f, but discard countries in worlddata that do not have a corresponding data entry in realwage_f.
This is illustrated by the red shading in the following diagram
We will also need to specify where the country name is located in each dataframe, which will
be the key that is used to merge the dataframes ‘on’.
Our ‘left’ dataframe (realwage_f.transpose()) contains countries in the index, so we
set left_index=True.
Our ‘right’ dataframe (worlddata) contains countries in the ‘Country’ column, so we set
right_on='Country'
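The merge call itself was lost in extraction; based on the description above, it is presumably:

merged = pd.merge(realwage_f.transpose(), worlddata,
                  how='left', left_index=True, right_on='Country')
merged.head()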
[5 rows x 13 columns]
Countries that appeared in realwage_f but not in worlddata will have NaN in the Conti-
nent column.
To check whether this has occurred, we can use .isnull() on the continent column and
filter the merged dataframe
In [16]: merged[merged['Continent'].isnull()]
[3 rows x 13 columns]
# missing_continents (a dict mapping country names to continents) was
# defined here but lost in extraction
merged['Country'].map(missing_continents)
Out[17]: 17 NaN
23 NaN
32 NaN
100 NaN
38 NaN
108 NaN
41 NaN
225 NaN
53 NaN
58 NaN
45 NaN
68 NaN
233 NaN
86 NaN
88 NaN
91 NaN
247 Asia
117 NaN
122 NaN
123 NaN
138 NaN
153 NaN
151 NaN
174 NaN
175 NaN
247 Europe
247 Europe
198 NaN
200 NaN
227 NaN
241 NaN
240 NaN
Name: Country, dtype: object
In [18]: merged['Continent'] =
merged['Continent'].fillna(merged['Country'].map(missing_continents))
merged[merged['Country'] == 'Korea']
[1 rows x 13 columns]
We will also combine the Americas into a single continent - this will make our visualization
nicer later on.
To do this, we will use .replace() and loop through a list of the continent values we want
to replace
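The replacement loop was lost in extraction; a sketch (the continent labels being replaced are assumptions):

replace = ['Central America', 'North America', 'South America']
for country in replace:
    merged['Continent'].replace(to_replace=country,
                                value='America',
                                inplace=True)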
Now that we have all the data we want in a single DataFrame, we will reshape it back into
panel form with a MultiIndex.
We should also ensure to sort the index using .sort_index() so that we can efficiently fil-
ter our dataframe later on.
By default, levels will be sorted top-down
2015-01-01 2016-01-01
Continent Country
America Brazil 1.21 1.24
Canada 8.35 8.48
Chile 1.81 1.91
Colombia 1.13 1.12
Costa Rica 2.56 2.63
[5 rows x 11 columns]
While merging, we lost our DatetimeIndex, as we merged columns that were not in date-
time format
In [21]: merged.columns
Now that we have set the merged columns as the index, we can recreate a DatetimeIndex
using .to_datetime()
The DatetimeIndex tends to work more smoothly in the row axis, so we will go ahead and
transpose merged
[5 rows x 32 columns]
Grouping and summarizing data can be particularly useful for understanding large panel
datasets.
A simple way to summarize data is to call an aggregation method on the dataframe, such as
.mean() or .max().
For example, we can calculate the average real minimum wage for each country over the pe-
riod 2006 to 2016 (the default is to aggregate over rows)
In [24]: merged.mean().head(10)
Using this series, we can plot the average real minimum wage over the past decade for each
country in our data set
merged.mean().sort_values(ascending=False).plot(
    kind='bar', title="Average real minimum wage 2006 - 2016")
plt.show()
Passing in axis=1 to .mean() will aggregate over columns (giving the average minimum
wage for all countries over time)
In [26]: merged.mean(axis=1).head()
Out[26]: Time
2006-01-01 4.69
2007-01-01 4.84
2008-01-01 4.90
2009-01-01 5.08
2010-01-01 5.11
dtype: float64
In [27]: merged.mean(axis=1).plot()
plt.title('Average real minimum wage 2006 - 2016')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()
We can also specify a level of the MultiIndex (in the column axis) to aggregate over
We can plot the average minimum wages in each continent as a time series
In [31]: merged.stack().describe()
Calling an aggregation method on the object applies the function to each group, the results of
which are combined in a new data structure.
For example, we can return the number of countries in our dataset for each continent using
.size().
In this case, our new data structure is a Series
In [33]: grouped.size()
Out[33]: Continent
America 7
Asia 4
Europe 19
dtype: int64
Calling .get_group() to return just the countries in a single group, we can create a kernel
density estimate of the distribution of real minimum wages in 2016 for each continent.
grouped.groups.keys() will return the keys from the groupby object
continents = grouped.groups.keys()
shade=True)
This lecture has provided an introduction to some of pandas’ more advanced features, includ-
ing multiindices, merging, grouping and plotting.
Other tools that may be useful in panel data analysis include xarray, a Python package that extends pandas to N-dimensional data structures.
64.7 Exercises
64.7.1 Exercise 1
In these exercises, you’ll work with a dataset of employment rates in Europe by age and sex
from Eurostat.
The dataset pandas_panel/employ.csv can be downloaded here.
Reading in the CSV file returns a panel dataset in long format. Use .pivot_table() to
construct a wide format dataframe with a MultiIndex in the columns.
Start off by exploring the dataframe and the variables available in the MultiIndex levels.
Write a program that quickly returns all values in the MultiIndex.
64.7.2 Exercise 2
Filter the above dataframe to only include employment as a percentage of ‘active population’.
Create a grouped boxplot using seaborn of employment rates in 2015 by age group and sex.
Hint: GEO includes both areas and countries.
64.8 Solutions
64.8.1 Exercise 1
employ = pd.read_csv('…/pandas_panel/employ.csv')   # path truncated in source
employ = employ.pivot_table(values='Value',
index=['DATE'],
columns=['UNIT','AGE', 'SEX', 'INDIC_EM', 'GEO'])
employ.index = pd.to_datetime(employ.index)   # ensure that dates are datetime format
employ.head()
UNIT
AGE
SEX
INDIC_EM
GEO United Kingdom
DATE
2007-01-01 4,131.00
2008-01-01 4,204.00
2009-01-01 4,193.00
2010-01-01 4,186.00
2011-01-01 4,164.00
This is a large dataset so it is useful to explore the levels and variables available
In [36]: employ.columns.names
64.8.2 Exercise 2
To easily filter by country, swap GEO to the top level and sort the MultiIndex
We need to get rid of a few items in GEO which are not countries.
A fast way to get rid of the EU areas is to use a list comprehension to find the level values in
GEO that begin with ‘Euro’
Select only percentage employed in the active population from the dataframe
level=('UNIT', 'INDIC_EM'),
axis=1)
employ_f.head()
GEO
AGE
SEX Total
DATE
2007-01-01 59.30
2008-01-01 59.80
2009-01-01 60.30
2010-01-01 60.00
2011-01-01 59.70
plt.xlabel('')
plt.xticks(rotation=35)
plt.ylabel('Percentage of population (%)')
plt.title('Employment in Europe (2015)')
plt.legend(bbox_to_anchor=(1,0.5))
plt.show()
Chapter 65

Linear Regression in Python

65.1 Contents
• Overview 65.2
• Simple Linear Regression 65.3
• Extending the Linear Regression Model 65.4
• Endogeneity 65.5
• Summary 65.6
• Exercises 65.7
• Solutions 65.8
In addition to what’s in Anaconda, this lecture will need the following libraries:
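The install line is missing from this extract; given the use of the linearmodels package later in the lecture, it is presumably:

!pip install --upgrade linearmodels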
65.2 Overview
Linear regression is a standard tool for analyzing the relationship between two or more vari-
ables.
In this lecture, we’ll use the Python package statsmodels to estimate, interpret, and visu-
alize linear regression models.
Along the way, we’ll discuss a variety of topics, including
• simple and multivariate linear regression
• visualization
• endogeneity and omitted variable bias
• two-stage least squares
As an example, we will replicate results from Acemoglu, Johnson and Robinson’s seminal pa-
per [3].
• You can download a copy here.
In the paper, the authors emphasize the importance of institutions in economic development.
The main contribution is the use of settler mortality rates as a source of exogenous variation
in institutional differences.
Such variation is needed to determine whether it is institutions that give rise to greater eco-
nomic growth, rather than the other way around.
Let’s start with some imports:
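The import cell is missing from this extract; judging from the functions used below, it presumably includes:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.iolib.summary2 import summary_col
from linearmodels.iv import IV2SLS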
65.2.1 Prerequisites
65.2.2 Comments
[3] wish to determine whether or not differences in institutions can help to explain observed
economic outcomes.
How do we measure institutional differences and economic outcomes?
In this paper,
• economic outcomes are proxied by log GDP per capita in 1995, adjusted for exchange
rates.
• institutional differences are proxied by an index of protection against expropriation on
average over 1985-95, constructed by the Political Risk Services Group.
These variables and other data used in the paper are available for download on Daron Ace-
moglu’s webpage.
We will use pandas’ .read_stata() function to read in data contained in the .dta files to
dataframes
df1 = pd.read_stata('…/maketable1.dta')   # path truncated in source
df1.head()
Let’s use a scatterplot to see whether any obvious relationship exists between GDP per capita
and the protection against expropriation index
In [4]: plt.style.use('seaborn')
The plot shows a fairly strong positive relationship between protection against expropriation
and log GDP per capita.
Specifically, if higher protection against expropriation is a measure of institutional quality,
then better institutions appear to be positively correlated with better economic outcomes
(higher GDP per capita).
Given the plot, choosing a linear model to describe this relationship seems like a reasonable
assumption.
We can write our model as

$$logpgp95_i = \beta_0 + \beta_1 \, avexpr_i + u_i$$
where:
• 𝛽0 is the intercept of the linear trend line on the y-axis
• 𝛽1 is the slope of the linear trend line, representing the marginal effect of protection
against risk on log GDP per capita
• 𝑢𝑖 is a random error term (deviations of observations from the linear trend due to fac-
tors not included in the model)
Visually, this linear model involves choosing a straight line that best fits the data, as in the
following plot (Figure 2 in [3])
X = df1_subset['avexpr']
y = df1_subset['logpgp95']
labels = df1_subset['shortnam']

# Replace markers with country labels (the plotting setup was lost in
# extraction; reconstructed)
fig, ax = plt.subplots()
ax.scatter(X, y, marker='')
for i, label in enumerate(labels):
    ax.annotate(label, (X.iloc[i], y.iloc[i]))

ax.set_xlim([3.3,10.5])
ax.set_ylim([4,10.5])
ax.set_xlabel('Average Expropriation Risk 1985-95')
ax.set_ylabel('Log GDP per capita, PPP, 1995')
ax.set_title('Figure 2: OLS relationship between expropriation \
risk and income')
plt.show()
The most common technique to estimate the parameters (𝛽’s) of the linear model is Ordinary
Least Squares (OLS).
As the name implies, an OLS model is solved by finding the parameters that minimize the
sum of squared residuals, i.e.
$$\min_{\hat\beta} \sum_{i=1}^{N} \hat u_i^2$$
where 𝑢̂𝑖 is the difference between the observation and the predicted value of the dependent
variable.
To estimate the constant term 𝛽0 , we need to add a column of 1’s to our dataset (consider
the equation if 𝛽0 was replaced with 𝛽0 𝑥𝑖 and 𝑥𝑖 = 1)
In [6]: df1['const'] = 1
Now we can construct our model in statsmodels using the OLS function.
We will use pandas dataframes with statsmodels, however standard arrays can also be
used as arguments
Out[7]: statsmodels.regression.linear_model.OLS
Out[8]: statsmodels.regression.linear_model.RegressionResultsWrapper
In [9]: print(results.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
Using our parameter estimates, we can now write our estimated relationship as
$$\widehat{logpgp95}_i = 4.63 + 0.53 \, avexpr_i$$
This equation describes the line that best fits our data, as shown in Figure 2.
We can use this equation to predict the level of log GDP per capita for a value of the index of
expropriation protection.
For example, for a country with an index value of 7.07 (the average for the dataset), we find
that their predicted level of log GDP per capita in 1995 is 8.38.
Out[10]: 6.515625
Out[11]: 8.3771
An easier (and more accurate) way to obtain this result is to use .predict() and set
𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡 = 1 and 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 = 𝑚𝑒𝑎𝑛_𝑒𝑥𝑝𝑟
Out[12]: array([8.09156367])
We can obtain an array of predicted 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 for every value of 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 in our dataset by
calling .predict() on our results.
Plotting the predicted values against 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 shows that the predicted values lie along the
linear line that we fitted above.
The observed values of 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 are also plotted for comparison purposes
fig, ax = plt.subplots()
ax.scatter(df1_plot['avexpr'], results.predict(), alpha=0.5,
label='predicted')
ax.legend()
So far we have only accounted for institutions affecting economic performance - almost cer-
tainly there are numerous other factors affecting GDP that are not included in our model.
Leaving out variables that affect 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 will result in omitted variable bias, yielding
biased and inconsistent parameter estimates.
We can extend our bivariate regression model to a multivariate regression model by
adding in other factors that may affect 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 .
[3] consider other factors such as:
• the effect of climate on economic outcomes; latitude is used to proxy this
• differences that affect both economic performance and institutions, e.g. cultural, historical, etc.; controlled for with the use of continent dummies
Let’s estimate some of the extended models considered in the paper (Table 2) using data from
maketable2.dta
df2 = pd.read_stata('…/maketable2.dta')   # path truncated in source
Now that we have fitted our model, we will use summary_col to display the results in a sin-
gle table (model numbers correspond to those in the paper)
results_table = summary_col(results=[reg1,reg2,reg3],
float_format='%0.2f',
stars = True,
model_names=['Model 1',
'Model 3',
'Model 4'],
info_dict=info_dict,
regressor_order=['const',
'avexpr',
'lat_abst',
'asia',
'africa'])
print(results_table)
65.5 Endogeneity
As [3] discuss, the OLS models likely suffer from endogeneity issues, resulting in biased and
inconsistent model estimates.
Namely, there is likely a two-way relationship between institutions and economic outcomes:
• richer countries may be able to afford or prefer better institutions
• variables that affect income may also be correlated with institutional differences
• the construction of the index may be biased; analysts may be biased towards seeing
countries with higher income having better institutions
To deal with endogeneity, we can use two-stage least squares (2SLS) regression, which
is an extension of OLS regression.
This method requires replacing the endogenous variable $avexpr_i$ with a variable that is:

1. correlated with $avexpr_i$

2. not correlated with the error term (i.e. it should not directly affect the dependent variable, otherwise it would be correlated with $u_i$ due to omitted variable bias)
The new set of regressors is called an instrument, which aims to remove endogeneity in our
proxy of institutional differences.
The main contribution of [3] is the use of settler mortality rates to instrument for institu-
tional differences.
They hypothesize that higher mortality rates of colonizers led to the establishment of insti-
tutions that were more extractive in nature (less protection against expropriation), and these
institutions still persist today.
Using a scatterplot (Figure 3 in [3]), we can see protection against expropriation is negatively
correlated with settler mortality rates, coinciding with the authors’ hypothesis and satisfying
the first condition of a valid instrument.
fig, ax = plt.subplots()
X = df1_subset2['logem4']
y = df1_subset2['avexpr']
labels = df1_subset2['shortnam']
# Replace markers with country labels
for x_i, y_i, label in zip(X, y, labels):
    ax.annotate(label, (x_i, y_i))
ax.set_xlim([1.8, 8.4])
ax.set_ylim([3.3, 10.4])
plt.show()
The second condition may not be satisfied if settler mortality rates in the 17th to 19th cen-
turies have a direct effect on current GDP (in addition to their indirect effect through institu-
tions).
For example, settler mortality rates may be related to the current disease environment in a
country, which could affect current economic performance.
[3] argue this is unlikely because:
• The majority of settler deaths were due to malaria and yellow fever and had a limited
effect on local people.
• The disease burden on local people in Africa or India, for example, did not appear to
be higher than average, supported by relatively high population densities in these areas
before colonization.
As we appear to have a valid instrument, we can use 2SLS regression to obtain consistent and
unbiased parameter estimates.
First stage
The first stage involves regressing the endogenous variable (𝑎𝑣𝑒𝑥𝑝𝑟𝑖 ) on the instrument.
The instrument is the set of all exogenous variables in our model (and not just the variable
we have replaced).
Using model 1 as an example, our instrument is simply a constant and settler mortality rates
𝑙𝑜𝑔𝑒𝑚4𝑖 .
Therefore, we will estimate the first-stage regression as
𝑎𝑣𝑒𝑥𝑝𝑟𝑖 = 𝛿0 + 𝛿1 𝑙𝑜𝑔𝑒𝑚4𝑖 + 𝑣𝑖
The data we need to estimate this equation is located in maketable4.dta (only complete
data, indicated by baseco = 1, is used for estimation)
df4 = pd.read_stata('…/maketable4.dta')  # full data URL elided in the source
df4 = df4[df4['baseco'] == 1]
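The code that runs this first-stage regression is elided here. A sketch following the statsmodels pattern used earlier in the lecture (the name results_fs is ours):

df4['const'] = 1
results_fs = sm.OLS(df4['avexpr'],
                    df4[['const', 'logem4']],
                    missing='drop').fit()
print(results_fs.summary())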
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
Second stage
We need to retrieve the predicted values of 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 using .predict().
We then replace the endogenous variable $avexpr_i$ with the predicted values $\widehat{avexpr}_i$ in the original linear model
$$logpgp95_i = \beta_0 + \beta_1 \widehat{avexpr}_i + u_i$$
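The step attaching the first-stage fitted values to the dataframe is elided. A one-line sketch (assuming results_fs from the first stage above):

df4['predicted_avexpr'] = results_fs.predict()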
results_ss = sm.OLS(df4['logpgp95'],
df4[['const', 'predicted_avexpr']]).fit()
print(results_ss.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
The second-stage regression results give us an unbiased and consistent estimate of the effect
of institutions on economic outcomes.
The result suggests a stronger positive relationship than what the OLS results indicated.
Note that while our parameter estimates are correct, our standard errors are not and for this
reason, computing 2SLS ‘manually’ (in stages with OLS) is not recommended.
We can correctly estimate a 2SLS regression in one step using the linearmodels package, an
extension of statsmodels
Note that when using IV2SLS, the exogenous and instrument variables are split up in the
function arguments (whereas before the instrument included exogenous variables)
In [19]: from linearmodels.iv import IV2SLS

         iv = IV2SLS(dependent=df4['logpgp95'],
                     exog=df4['const'],
                     endog=df4['avexpr'],
                     instruments=df4['logem4']).fit(cov_type='unadjusted')
         print(iv.summary)
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
const 1.9097 1.0106 1.8897 0.0588 -0.0710 3.8903
avexpr 0.9443 0.1541 6.1293 0.0000 0.6423 1.2462
==============================================================================
Endogenous: avexpr
Instruments: logem4
Unadjusted Covariance (Homoskedastic)
Debiased: False
Given that we now have consistent and unbiased estimates, we can infer from the model we
have estimated that institutional differences (stemming from institutions set up during colo-
nization) can help to explain differences in income levels across countries today.
[3] use a marginal effect of 0.94 to calculate that the difference in the index between Chile and Nigeria (i.e., institutional quality) implies up to a 7-fold difference in income, emphasizing the significance of institutions in economic development.
65.6 Summary
65.7 Exercises
65.7.1 Exercise 1
In the lecture, we think the original model suffers from endogeneity bias due to the likely ef-
fect income has on institutional development.
Although endogeneity is often best identified by thinking about the data and model, we can
formally test for endogeneity using the Hausman test.
We want to test for correlation between the endogenous variable, $avexpr_i$, and the errors, $u_i$.
First, we regress $avexpr_i$ on the instrument, $logem4_i$

$$avexpr_i = \pi_0 + \pi_1 logem4_i + \upsilon_i$$

Second, we retrieve the residuals $\hat{\upsilon}_i$ and include them in the original equation

$$logpgp95_i = \beta_0 + \beta_1 avexpr_i + \alpha \hat{\upsilon}_i + u_i$$
If 𝛼 is statistically significant (with a p-value < 0.05), then we reject the null hypothesis and
conclude that 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 is endogenous.
Using the above information, estimate a Hausman test and interpret your results.
65.7.2 Exercise 2
The OLS parameter 𝛽 can also be estimated using matrix algebra and numpy (you may need
to review the numpy lecture to complete this exercise).
The linear equation we want to estimate is (written in matrix form)
𝑦 = 𝑋𝛽 + 𝑢
To solve for the unknown parameter $\beta$, we want to minimize the sum of squared residuals

$$\min_{\hat{\beta}} \hat{u}' \hat{u}$$
Rearranging the first equation and substituting into the second equation, we can write

$$\min_{\hat{\beta}} (y - X\hat{\beta})'(y - X\hat{\beta})$$
Solving this optimization problem gives the solution for the 𝛽 ̂ coefficients
$$\hat{\beta} = (X'X)^{-1}X'y$$
Using the above information, compute 𝛽 ̂ from model 1 using numpy - your results should be
the same as those in the statsmodels output from earlier in the lecture.
65.8 Solutions
65.8.1 Exercise 1
df4 = pd.read_stata('…/maketable4.dta')  # full data URL elided in the source
df4['const'] = 1
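The intermediate steps of the solution are elided here. A sketch consistent with the test described in the exercise (the names reg1_fs and resid are ours):

# First stage: regress avexpr on the instrument and keep the residuals
reg1_fs = sm.OLS(df4['avexpr'],
                 df4[['const', 'logem4']],
                 missing='drop').fit()
df4['resid'] = reg1_fs.resid

# Augmented regression: original equation plus the residuals
reg2 = sm.OLS(df4['logpgp95'],
              df4[['const', 'avexpr', 'resid']],
              missing='drop').fit()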
print(reg2.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
The output shows that the coefficient on the residuals is statistically significant, indicating
𝑎𝑣𝑒𝑥𝑝𝑟𝑖 is endogenous.
65.8.2 Exercise 2
df1 = pd.read_stata('…/maketable1.dta')  # full data URL elided in the source
df1 = df1.dropna(subset=['logpgp95', 'avexpr'])
df1['const'] = 1

# Define y and X as in model 1
y = df1['logpgp95']
X = df1[['const', 'avexpr']]

# Compute β_hat = (X'X)^{-1} X'y via a linear solve
β_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(β_hat)
This yields estimates of roughly β_0 = 4.6 and β_1 = 0.53, matching the statsmodels output from earlier in the lecture.
Chapter 66

Maximum Likelihood Estimation

66.1 Contents
• Overview 66.2
• Set Up and Assumptions 66.3
• Conditional Distributions 66.4
• Maximum Likelihood Estimation 66.5
• MLE with Numerical Methods 66.6
• Maximum Likelihood Estimation with statsmodels 66.7
• Summary 66.8
• Exercises 66.9
• Solutions 66.10
66.2 Overview
In a previous lecture, we estimated the relationship between dependent and explanatory vari-
ables using linear regression.
But what if a linear relationship is not an appropriate assumption for our model?
One widely used alternative is maximum likelihood estimation, which involves specifying a
class of distributions, indexed by unknown parameters, and then using the data to pin down
these parameter values.
The benefit relative to linear regression is that it allows more flexibility in the probabilistic
relationships between variables.
Here we illustrate maximum likelihood by replicating Daniel Treisman’s (2016) paper, Rus-
sia’s Billionaires, which connects the number of billionaires in a country to its economic char-
acteristics.
The paper concludes that Russia has a higher number of billionaires than economic factors
such as market size and tax rate predict.
We’ll require the following imports:
import numpy as np
from numpy import exp
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.special import factorial
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D
import statsmodels.api as sm
from statsmodels.api import Poisson
from scipy import stats
from scipy.stats import norm
from statsmodels.iolib.summary2 import summary_col
66.2.1 Prerequisites
66.2.2 Comments
Let’s consider the steps we need to go through in maximum likelihood estimation and how
they pertain to this study.
The first step with maximum likelihood estimation is to choose the probability distribution
believed to be generating the data.
More precisely, we need to make an assumption as to which parametric class of distributions
is generating the data.
• e.g., the class of all normal distributions, or the class of all gamma distributions.
Each such class is a family of distributions indexed by a finite number of parameters.
• e.g., the class of normal distributions is a family of distributions indexed by its mean
𝜇 ∈ (−∞, ∞) and standard deviation 𝜎 ∈ (0, ∞).
We’ll let the data pick out a particular element of the class by pinning down the parameters.
The parameter estimates so produced will be called maximum likelihood estimates.
One integer distribution is the Poisson distribution, the probability mass function (pmf) of
which is
$$f(y) = \frac{\mu^y}{y!} e^{-\mu}, \qquad y = 0, 1, 2, \ldots$$
We can plot the Poisson distribution over 𝑦 for different values of 𝜇 as follows
poisson_pmf = lambda y, μ: μ**y / factorial(y) * exp(-μ)
y_values = range(0, 25)

fig, ax = plt.subplots(figsize=(12, 8))
for μ in [1, 5, 10]:
    distribution = [poisson_pmf(y, μ) for y in y_values]
    ax.plot(y_values, distribution, marker='o', label=f'$\mu$={μ}', alpha=0.5)

ax.grid()
ax.set_xlabel('$y$', fontsize=14)
ax.set_ylabel('$f(y \mid \mu)$', fontsize=14)
ax.axis(xmin=0, ymin=0)
ax.legend(fontsize=14)
plt.show()
Notice that the Poisson distribution begins to resemble a normal distribution as the mean of
𝑦 increases.
Let’s have a look at the distribution of the data we’ll be working with in this lecture.
Treisman’s main source of data is Forbes’ annual rankings of billionaires and their estimated
net worth.
The dataset mle/fp.dta can be downloaded here or from its AER page.
In [3]: pd.options.display.max_columns = 10
df = pd.read_stata('…/fp.dta')  # full data URL elided in the source
df.head()
[5 rows x 36 columns]
Using a histogram, we can view the distribution of the number of billionaires per country,
numbil0, in 2008 (the United States is dropped for plotting purposes)
numbil0_2008 = df[(df['year'] == 2008) &
                  (df['country'] != 'United States')].loc[:, 'numbil0']

plt.subplots(figsize=(12, 8))
plt.hist(numbil0_2008, bins=30)
plt.xlim(left=0)
plt.grid()
plt.xlabel('Number of billionaires in 2008')
plt.ylabel('Count')
plt.show()
From the histogram, it appears that the Poisson assumption is not unreasonable (albeit with
a very low 𝜇 and some outliers).
$$f(y_i \mid \mathbf{x}_i) = \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}; \qquad y_i = 0, 1, 2, \ldots \tag{1}$$

where $\mu_i = \exp(\mathbf{x}_i' \beta)$.
To illustrate the idea that the distribution of 𝑦𝑖 depends on x𝑖 let’s run a simple simulation.
We use our poisson_pmf function from above and arbitrary values for 𝛽 and x𝑖
y_values = range(0, 20)

# Arbitrary parameter vector and observations x_i, as discussed above
β = np.array([0.26, 0.18, 0.25, -0.1])
datasets = [np.array([0, 1, 1, 1]),
            np.array([2, 3, 2, 4]),
            np.array([3, 4, 5, 3]),
            np.array([6, 5, 4, 4])]

fig, ax = plt.subplots(figsize=(12, 8))

for X in datasets:
    μ = exp(X @ β)
    distribution = []
    for y_i in y_values:
        distribution.append(poisson_pmf(y_i, μ))
    ax.plot(y_values,
            distribution,
            label=f'$\mu_i$={μ:.1}',
            marker='o',
            markersize=8,
            alpha=0.5)

ax.grid()
ax.legend()
ax.set_xlabel('$y \mid x_i$')
ax.set_ylabel(r'$f(y \mid x_i; \beta )$')
ax.axis(xmin=0, ymin=0)
plt.show()
In our model for number of billionaires, the conditional distribution contains 4 (𝑘 = 4) pa-
rameters that we need to estimate.
We will label our entire parameter vector as 𝛽 where
$$\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}$$
To estimate the model using MLE, we want to maximize the likelihood that our estimate 𝛽̂ is
the true parameter 𝛽.
Intuitively, we want to find the 𝛽̂ that best fits our data.
First, we need to construct the likelihood function ℒ(𝛽), which is similar to a joint probabil-
ity density function.
Assume we have some data 𝑦𝑖 = {𝑦1 , 𝑦2 } and 𝑦𝑖 ∼ 𝑓(𝑦𝑖 ).
If 𝑦1 and 𝑦2 are independent, the joint pmf of these data is 𝑓(𝑦1 , 𝑦2 ) = 𝑓(𝑦1 ) ⋅ 𝑓(𝑦2 ).
If 𝑦𝑖 follows a Poisson distribution with 𝜆 = 7, we can visualize the joint pmf like so
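The helper plot_joint_poisson is elided in this excerpt; a minimal sketch (assuming the poisson_pmf function defined above) is:

def plot_joint_poisson(μ=7, y_n=20):
    # Grid of (y1, y2) points
    yi_values = np.arange(0, y_n, 1)
    X, Y = np.meshgrid(yi_values, yi_values)
    # Joint pmf of two independent Poisson(μ) draws
    Z = poisson_pmf(X, μ) * poisson_pmf(Y, μ)

    fig = plt.figure(figsize=(12, 8))
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_surface(X, Y, Z.astype('float'), cmap='terrain', alpha=0.8)
    ax.set_xlabel('$y_1$', fontsize=14)
    ax.set_ylabel('$y_2$', fontsize=14)
    ax.set_zlabel('$f(y_1, y_2)$', fontsize=14)
    plt.show()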
plot_joint_poisson(μ=7, y_n=20)
Similarly, the joint pmf of our data (which is distributed as a conditional Poisson distribu-
tion) can be written as
$$f(y_1, y_2, \ldots, y_n \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n; \beta) = \prod_{i=1}^{n} \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}$$

Viewed as a function of the parameter $\beta$ given the data, this same object is the likelihood function

$$\mathcal{L}(\beta \mid y_1, y_2, \ldots, y_n; \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n) = \prod_{i=1}^{n} \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i} = f(y_1, y_2, \ldots, y_n \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n; \beta)$$
Now that we have our likelihood function, we want to find the $\hat{\beta}$ that yields the maximum likelihood value

$$\max_{\beta} \mathcal{L}(\beta)$$
Since maximizing the log of a function is equivalent to maximizing the function itself, the Poisson MLE $\hat{\beta}$ can be obtained by solving

$$\max_{\beta} \left( \sum_{i=1}^{n} y_i \log \mu_i - \sum_{i=1}^{n} \mu_i - \sum_{i=1}^{n} \log y_i! \right)$$
However, no analytical solution exists to the above problem – to find the MLE we need to use
numerical methods.
Many distributions do not have nice, analytical solutions and therefore require numerical
methods to solve for parameter estimates.
One such numerical method is the Newton-Raphson algorithm.
Our goal is to find the maximum likelihood estimate 𝛽.̂
At 𝛽,̂ the first derivative of the log-likelihood function will be equal to 0.
Let's illustrate this by supposing $\log \mathcal{L}(\beta) = -(\beta - 10)^2 - 10$
β = np.linspace(1, 20)
logL = lambda x: -(x - 10) ** 2 - 10
dlogL = lambda x: -2 * (x - 10)

fig, (ax1, ax2) = plt.subplots(2, sharex=True, figsize=(12, 8))

ax1.plot(β, logL(β), lw=2)
ax2.plot(β, dlogL(β), lw=2)

ax1.set_ylabel(r'$log \mathcal{L(\beta)}$',
               rotation=0,
               labelpad=35,
               fontsize=15)
ax2.set_ylabel(r'$\frac{dlog \mathcal{L(\beta)}}{d \beta}$ ',
               rotation=0,
               labelpad=35,
               fontsize=19)
ax2.set_xlabel(r'$\beta$', fontsize=15)
ax1.grid(), ax2.grid()
plt.axhline(c='black')
plt.show()
The plot shows that the maximum likelihood value (the top plot) occurs when $\frac{d \log \mathcal{L}(\beta)}{d\beta} = 0$ (the bottom plot).
Therefore, the likelihood is maximized when 𝛽 = 10.
We can also ensure that this value is a maximum (as opposed to a minimum) by checking
that the second derivative (slope of the bottom plot) is negative.
The Newton-Raphson algorithm finds a point where the first derivative is 0.
To use the algorithm, we take an initial guess at the maximum value, $\beta_0$ (the OLS parameter estimates might be a reasonable guess), then iterate the updating rule

$$\beta^{(k+1)} = \beta^{(k)} - H^{-1}(\beta^{(k)}) \, G(\beta^{(k)})$$

where $G(\beta)$ and $H(\beta)$ are the gradient vector and Hessian matrix of the log likelihood, until convergence.
class PoissonRegression:

    def __init__(self, y, X, β):
        self.X = X
        self.n, self.k = X.shape
        # Reshape y and β as column vectors
        self.y = y.reshape(self.n, 1)
        self.β = β.reshape(self.k, 1)

    def μ(self):
        return np.exp(self.X @ self.β)

    def logL(self):
        y = self.y
        μ = self.μ()
        return np.sum(y * np.log(μ) - μ - np.log(factorial(y)))

    def G(self):
        X = self.X
        y = self.y
        μ = self.μ()
        return X.T @ (y - μ)

    def H(self):
        X = self.X
        μ = self.μ()
        return -(X.T @ (μ * X))
Our function newton_raphson will take a PoissonRegression object that has an initial
guess of the parameter vector 𝛽 0 .
The algorithm will update the parameter vector according to the updating rule, and recalcu-
late the gradient and Hessian matrices at the new parameter estimates.
Iteration will end when either:
• The difference between the parameter and the updated parameter is below a tolerance
level.
• The maximum number of iterations has been achieved (meaning convergence is not
achieved).
So we can get an idea of what’s going on while the algorithm is running, an option
display=True is added to print out values at each iteration.
def newton_raphson(model, tol=1e-3, max_iter=1000, display=True):
    # A reconstruction; parts of this function are elided in the source
    i = 0
    error = 100  # Initial error value

    while np.any(error > tol) and i < max_iter:
        H, G = model.H(), model.G()
        β_new = model.β - (np.linalg.inv(H) @ G)  # Updating rule
        error = β_new - model.β
        model.β = β_new

        # Print iterations
        if display:
            β_list = [f'{t:.3}' for t in list(model.β.flatten())]
            update = f'{i:<13}{model.logL():<16.8}{β_list}'
            print(update)

        i += 1

    print(f'Number of iterations: {i}')
    return model.β.flatten()
Let’s try out our algorithm with a small dataset of 5 observations and 3 variables in X.
y = np.array([1, 0, 1, 1, 0])
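The design matrix and the call to the solver are elided here; a sketch consistent with the printed output below (the entries of X are assumptions) is:

X = np.array([[1, 2, 5],
              [1, 1, 3],
              [1, 4, 2],
              [1, 5, 2],
              [1, 3, 1]])

# Take a guess at initial βs and solve
init_β = np.array([0.1, 0.1, 0.1])
poi = PoissonRegression(y, X, β=init_β)
β_hat = newton_raphson(poi, display=True)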
Iteration_k Log-likelihood θ
-----------------------------------------------------------------------------------
0 -4.3447622 ['-1.49', '0.265', '0.244']
1 -3.5742413 ['-3.38', '0.528', '0.474']
2 -3.3999526 ['-5.06', '0.782', '0.702']
As this was a simple model with few observations, the algorithm achieved convergence in only
6 iterations.
You can see that with each iteration, the log-likelihood value increased.
Remember, our objective was to maximize the log-likelihood function, which the algorithm
has worked to achieve.
Also, note that the increase in log ℒ(𝛽 (𝑘) ) becomes smaller with each iteration.
This is because the gradient is approaching 0 as we reach the maximum, and therefore the
numerator in our updating equation is becoming smaller.
The gradient vector should be close to 0 at 𝛽̂
In [11]: poi.G()
Out[11]: array([[-3.95169226e-07],
[-1.00114804e-06],
[-7.73114559e-07]])
The iterative process can be visualized in the following diagram, where the maximum is found
at 𝛽 = 10
β = np.linspace(2, 18)
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(β, logL(β), lw=2, c='black')
ax.set_xlabel(r'$\beta$', fontsize=15)
ax.set_ylabel(r'$log \mathcal{L(\beta)}$',
              rotation=0,
              labelpad=25,
              fontsize=15)
ax.grid(alpha=0.3)
plt.show()
Note that our implementation of the Newton-Raphson algorithm is rather basic — for more
robust implementations see, for example, scipy.optimize.
Now that we know what’s going on under the hood, we can apply MLE to an interesting ap-
plication.
We’ll use the Poisson regression model in statsmodels to obtain a richer output with stan-
dard errors, test values, and more.
statsmodels uses the same algorithm as above to find the maximum likelihood estimates.
Before we begin, let’s re-estimate our simple model with statsmodels to confirm we obtain
the same coefficients and log-likelihood value.
y = np.array([1, 0, 1, 1, 0])
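The re-estimation itself is elided; a sketch using the same small dataset (X as above) is:

X = np.array([[1, 2, 5],
              [1, 1, 3],
              [1, 4, 2],
              [1, 5, 2],
              [1, 3, 1]])

stats_poisson = Poisson(y, X).fit()
print(stats_poisson.summary())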
Now let’s replicate results from Daniel Treisman’s paper, Russia’s Billionaires, mentioned ear-
lier in the lecture.
Treisman starts by estimating equation (1), where:
• 𝑦𝑖 is 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑏𝑖𝑙𝑙𝑖𝑜𝑛𝑎𝑖𝑟𝑒𝑠𝑖
• 𝑥𝑖1 is log 𝐺𝐷𝑃 𝑝𝑒𝑟 𝑐𝑎𝑝𝑖𝑡𝑎𝑖
• 𝑥𝑖2 is log 𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛𝑖
• 𝑥𝑖3 is 𝑦𝑒𝑎𝑟𝑠 𝑖𝑛 𝐺𝐴𝑇 𝑇 𝑖 – years membership in GATT and WTO (to proxy access to in-
ternational markets)
The paper only considers the year 2008 for estimation.
We will set up our variables for estimation like so (you should have the data assigned to df
from earlier in the lecture)
# Add a constant
df['const'] = 1
# Variable sets
reg1 = ['const', 'lngdppc', 'lnpop', 'gattwto08']
reg2 = ['const', 'lngdppc', 'lnpop',
'gattwto08', 'lnmcap08', 'rintr', 'topint08']
reg3 = ['const', 'lngdppc', 'lnpop', 'gattwto08', 'lnmcap08',
'rintr', 'topint08', 'nrrents', 'roflaw']
Then we can use the Poisson function from statsmodels to fit the model.
We’ll use robust standard errors as in the author’s paper
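The fitting step is elided in this excerpt. A sketch of one way to produce the fitted models, names, and summary statistics consumed by summary_col below (the helper names here are ours):

reg_names = ['Model 1', 'Model 2', 'Model 3']
regressor_order = ['const', 'lngdppc', 'lnpop', 'gattwto08']
info_dict = {'Pseudo R-squared': lambda x: f"{x.prsquared:.2f}",
             'No. observations': lambda x: f"{int(x.nobs):d}"}

# Fit each specification with heteroskedasticity-robust standard errors
results = []
for reg in [reg1, reg2, reg3]:
    results.append(sm.Poisson(df[['numbil0']], df[reg],
                              missing='drop').fit(cov_type='HC1'))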
results_table = summary_col(results=results,
float_format='%0.3f',
stars=True,
model_names=reg_names,
info_dict=info_dict,
regressor_order=regressor_order)
results_table.add_title('Table 1 - Explaining the Number of Billionaires \
in 2008')
print(results_table)
The output suggests that the frequency of billionaires is positively correlated with GDP
per capita, population size, stock market capitalization, and negatively correlated with top
marginal income tax rate.
To analyze our results by country, we can plot the difference between the predicted and actual values, then sort from highest to lowest and plot the first 15
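The construction of results_df is elided; a sketch (using the reg3 variable list and model 3 predictions, both assumptions) is:

# Keep the 2008 sample with complete data for model 3
results_df = df[df['year'] == 2008].dropna(subset=reg3 + ['numbil0'])

# Attach predictions from the last fitted model
results_df['prediction'] = results[-1].predict(results_df[reg3])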
# Calculate difference
results_df['difference'] = results_df['numbil0'] - results_df['prediction']
As we can see, Russia has by far the highest number of billionaires in excess of what is pre-
dicted by the model (around 50 more than expected).
Treisman uses this empirical result to discuss possible reasons for Russia’s excess of billion-
aires, including the origination of wealth in Russia, the political climate, and the history of
privatization in the years after the USSR.
66.8 Summary
66.9 Exercises
66.9.1 Exercise 1
Suppose we wanted to estimate the probability of an event 𝑦𝑖 occurring, given some observa-
tions.
One way to do this is with the probit model, where the conditional pmf of $y_i$ is

$$f(y_i; \beta) = \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}, \qquad y_i = 0, 1$$

where $\mu_i = \Phi(\mathbf{x}_i' \beta)$
Φ represents the cumulative normal distribution and constrains the predicted 𝑦𝑖 to be be-
tween 0 and 1 (as required for a probability).
𝛽 is a vector of coefficients.
Following the example in the lecture, write a class to represent the Probit model.
To begin, find the log-likelihood function and derive the gradient and Hessian.
The scipy module stats.norm contains the functions needed to compute the cdf and pdf of the normal distribution.
66.9.2 Exercise 2
Use the following dataset and initial values of 𝛽 to estimate the MLE with the Newton-
Raphson algorithm developed earlier in the lecture
$$X = \begin{bmatrix} 1 & 2 & 4 \\ 1 & 1 & 1 \\ 1 & 4 & 3 \\ 1 & 5 & 6 \\ 1 & 3 & 5 \end{bmatrix} \qquad y = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix} \qquad \beta^{(0)} = \begin{bmatrix} 0.1 \\ 0.1 \\ 0.1 \end{bmatrix}$$
Verify your results with statsmodels; you can import the Probit function with the following import statement:

from statsmodels.discrete.discrete_model import Probit
Note that the simple Newton-Raphson algorithm developed in this lecture is very sensitive to
initial values, and therefore you may fail to achieve convergence with different starting values.
66.10 Solutions
66.10.1 Exercise 1
The log-likelihood is

$$\log \mathcal{L} = \sum_{i=1}^{n} \left[ y_i \log \Phi(\mathbf{x}_i'\beta) + (1 - y_i) \log(1 - \Phi(\mathbf{x}_i'\beta)) \right]$$

Using the fact that the derivative of the normal cdf is the normal pdf,

$$\frac{\partial}{\partial s} \Phi(s) = \phi(s)$$

the gradient and Hessian can be written as

$$\frac{\partial \log \mathcal{L}}{\partial \beta} = \sum_{i=1}^{n} \left[ y_i \frac{\phi(\mathbf{x}_i'\beta)}{\Phi(\mathbf{x}_i'\beta)} - (1 - y_i) \frac{\phi(\mathbf{x}_i'\beta)}{1 - \Phi(\mathbf{x}_i'\beta)} \right] \mathbf{x}_i$$

$$\frac{\partial^2 \log \mathcal{L}}{\partial \beta \, \partial \beta'} = -\sum_{i=1}^{n} \phi(\mathbf{x}_i'\beta) \left[ y_i \frac{\phi(\mathbf{x}_i'\beta) + \mathbf{x}_i'\beta \, \Phi(\mathbf{x}_i'\beta)}{[\Phi(\mathbf{x}_i'\beta)]^2} + (1 - y_i) \frac{\phi(\mathbf{x}_i'\beta) - \mathbf{x}_i'\beta \, (1 - \Phi(\mathbf{x}_i'\beta))}{[1 - \Phi(\mathbf{x}_i'\beta)]^2} \right] \mathbf{x}_i \mathbf{x}_i'$$
Using these results, we can write a class for the Probit model as follows
class ProbitRegression:

    def __init__(self, y, X, β):
        self.X, self.y, self.β = X, y, β
        self.n, self.k = X.shape

    def μ(self):
        return norm.cdf(self.X @ self.β.T)

    def ϕ(self):
        return norm.pdf(self.X @ self.β.T)

    def logL(self):
        y = self.y
        μ = self.μ()
        return np.sum(y * np.log(μ) + (1 - y) * np.log(1 - μ))

    def G(self):
        X, y = self.X, self.y
        μ = self.μ()
        ϕ = self.ϕ()
        return np.sum((X.T * y * ϕ / μ - X.T * (1 - y) * ϕ / (1 - μ)),
                      axis=1)

    def H(self):
        X, y, β = self.X, self.y, self.β
        μ = self.μ()
        ϕ = self.ϕ()
        a = (ϕ + (X @ β.T) * μ) / μ**2
        b = (ϕ - (X @ β.T) * (1 - μ)) / (1 - μ)**2
        return -(ϕ * (y * a + (1 - y) * b) * X.T) @ X
66.10.2 Exercise 2
y = np.array([1, 0, 1, 1, 0])
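The remainder of the cell is elided; a sketch consistent with the iteration log below is:

X = np.array([[1, 2, 4],
              [1, 1, 1],
              [1, 4, 3],
              [1, 5, 6],
              [1, 3, 5]])

# Initial guess and Newton-Raphson solve
β = np.array([0.1, 0.1, 0.1])
prob = ProbitRegression(y, X, β)
newton_raphson(prob)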
Iteration_k Log-likelihood θ
-----------------------------------------------------------------------------------
0 -2.3796884 ['-1.34', '0.775', '-0.157']
1 -2.3687526 ['-1.53', '0.775', '-0.0981']
2 -2.3687294 ['-1.55', '0.778', '-0.0971']
3 -2.3687294 ['-1.55', '0.778', '-0.0971']
Number of iterations: 4
β_hat = [-1.54625858 0.77778952 -0.09709757]
print(Probit(y, X).fit().summary())
Chapter 67

Schelling's Segregation Model
67.1 Contents
• Outline 67.2
• The Model 67.3
• Results 67.4
• Exercises 67.5
• Solutions 67.6
67.2 Outline
In 1969, Thomas C. Schelling developed a simple but striking model of racial segregation
[145].
His model studies the dynamics of racially mixed neighborhoods.
Like much of Schelling’s work, the model shows how local interactions can lead to surprising
aggregate structure.
In particular, it shows that relatively mild preference for neighbors of similar race can lead in
aggregate to the collapse of mixed neighborhoods, and high levels of segregation.
In recognition of this and other research, Schelling was awarded the 2005 Nobel Prize in Eco-
nomic Sciences (joint with Robert Aumann).
In this lecture, we (in fact you) will build and run a version of Schelling’s model.
Let’s start with some imports:
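The import cell is elided in this excerpt; a sketch of the imports the code below relies on is:

from random import uniform, seed
from math import sqrt
import matplotlib.pyplot as plt
%matplotlib inline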
We will cover a variation of Schelling’s model that is easy to program and captures the main
idea.
67.3.1 Set-Up
Suppose we have two types of people: orange people and green people.
For the purpose of this lecture, we will assume there are 250 of each type.
These agents all live on a single unit square.
The location of an agent is just a point (𝑥, 𝑦), where 0 < 𝑥, 𝑦 < 1.
67.3.2 Preferences
We will say that an agent is happy if half or more of her 10 nearest neighbors are of the same
type.
Here ‘nearest’ is in terms of Euclidean distance.
An agent who is not happy is called unhappy.
An important point here is that agents are not averse to living in mixed areas.
They are perfectly happy if half their neighbors are of the other color.
67.3.3 Behavior
Initially, agents are mixed together (integrated): each agent's location is an independent draw from the bivariate uniform distribution on the unit square.
Cycling through the set of all agents, each agent stays put if happy; if unhappy, the agent moves according to the following algorithm:
1. Draw a random location in the unit square
2. If happy at the new location, move there
3. Else, go to step 1
67.4 Results
Let’s have a look at the results we got when we coded and ran this model.
As discussed above, agents are initially mixed randomly together.
But after several cycles, they become segregated into distinct regions.
In this instance, the program terminated after 4 cycles through the set of agents, indicating
that all agents had reached a state of happiness.
What is striking about the pictures is how rapidly racial integration breaks down.
This is despite the fact that people in the model don’t actually mind living mixed with the
other type.
Even with these preferences, the outcome is a high degree of segregation.
67.5 Exercises
67.5.1 Exercise 1
Implement and run this simulation for yourself. Consider a structure where each agent is an object with
* Data: type (orange or green) and location
* Methods: determine whether the agent is happy given the locations of the other agents, and update the location if not
67.6 Solutions
67.6.1 Exercise 1
class Agent:

    def __init__(self, type):
        self.type = type
        self.draw_location()

    def draw_location(self):
        self.location = uniform(0, 1), uniform(0, 1)

    def get_distance(self, other):
        "Euclidean distance between self and other agent."
        return sqrt((self.location[0] - other.location[0])**2
                    + (self.location[1] - other.location[1])**2)

    def happy(self, agents):
        "True if enough of the nearest neighbors are of the same type."
        distances = []
        for agent in agents:
            if self != agent:
                distance = self.get_distance(agent)
                distances.append((distance, agent))
        # == Sort from smallest to largest, according to distance == #
        distances.sort()
        # == Extract the neighboring agents == #
        neighbors = [agent for d, agent in distances[:num_neighbors]]
        # == Count how many neighbors have the same type as self == #
        num_same_type = sum(self.type == agent.type for agent in neighbors)
        return num_same_type >= require_same_type

    def update(self, agents):
        "If not happy, draw new locations until happy."
        while not self.happy(agents):
            self.draw_location()
def plot_distribution(agents, cycle_num):
    "Plot the distribution of agents after cycle_num rounds of the loop."
    # A minimal reconstruction; most of the plotting body is elided in the source
    fig, ax = plt.subplots(figsize=(8, 8))
    colors = {0: 'orange', 1: 'green'}
    for agent in agents:
        x, y = agent.location
        ax.plot(x, y, 'o', markerfacecolor=colors[agent.type],
                markersize=8, alpha=0.6)
    ax.set_title(f'Cycle {cycle_num - 1}')
    plt.show()
# == Main == #
num_of_type_0 = 250
num_of_type_1 = 250
num_neighbors = 10      # Number of agents regarded as neighbors
require_same_type = 5   # Want at least this many neighbors to be same type

# == Create a list of agents == #
agents = [Agent(0) for i in range(num_of_type_0)]
agents.extend(Agent(1) for i in range(num_of_type_1))

count = 1
# == Loop until none wishes to move == #
while True:
print('Entering loop ', count)
plot_distribution(agents, count)
count += 1
no_one_moved = True
for agent in agents:
old_location = agent.location
agent.update(agents)
if agent.location != old_location:
no_one_moved = False
if no_one_moved:
break
print('Converged, terminating.')
Entering loop 1
Entering loop 2
Entering loop 3
Entering loop 4
Converged, terminating.
Chapter 68

A Lake Model of Employment and Unemployment
68.1 Contents
• Overview 68.2
• The Model 68.3
• Implementation 68.4
• Dynamics of an Individual Worker 68.5
• Endogenous Job Finding Rate 68.6
• Exercises 68.7
• Solutions 68.8
• Lake Model Solutions 68.9
In addition to what’s in Anaconda, this lecture will need the following libraries:
68.2 Overview
68.2.1 Prerequisites
Before working through what follows, we recommend you read the lecture on finite Markov
chains.
You will also need some basic linear algebra and probability.
68.3 The Model
The value $b(E_t + U_t)$ is the mass of new workers entering the labor force unemployed.
The total stock of workers $N_t = E_t + U_t$ evolves as

$$N_{t+1} = (1 + b - d) N_t = (1 + g) N_t \qquad \text{where} \quad g := b - d$$
Letting $X_t := \begin{pmatrix} U_t \\ E_t \end{pmatrix}$, the law of motion for $X$ is

$$X_{t+1} = A X_t \quad \text{where} \quad A := \begin{pmatrix} (1-d)(1-\lambda) + b & (1-d)\alpha + b \\ (1-d)\lambda & (1-d)(1-\alpha) \end{pmatrix}$$
This law tells us how total employment and unemployment evolve over time.
Dividing both sides of $X_{t+1} = A X_t$ by $N_{t+1}$ gives the law of motion for rates:

$$\begin{pmatrix} U_{t+1}/N_{t+1} \\ E_{t+1}/N_{t+1} \end{pmatrix} = \frac{1}{1+g} A \begin{pmatrix} U_t/N_t \\ E_t/N_t \end{pmatrix}$$

Letting

$$x_t := \begin{pmatrix} u_t \\ e_t \end{pmatrix} = \begin{pmatrix} U_t/N_t \\ E_t/N_t \end{pmatrix}$$

we can also write this as

$$x_{t+1} = \hat{A} x_t \quad \text{where} \quad \hat{A} := \frac{1}{1+g} A$$
68.4 Implementation
class LakeModel:
    """
    Solves the lake model and computes dynamics of the unemployment stocks
    and rates.

    Parameters:
    ------------
    λ : scalar
        The job finding rate for currently unemployed workers
    α : scalar
        The dismissal rate for currently employed workers
    b : scalar
        Entry rate into the labor force
    d : scalar
        Exit rate from the labor force
    """
    def __init__(self, λ=0.283, α=0.013, b=0.0124, d=0.00822):
        self._λ, self._α, self._b, self._d = λ, α, b, d
        self.compute_derived_values()
    def compute_derived_values(self):
        # Unpack names to simplify expression
        λ, α, b, d = self._λ, self._α, self._b, self._d

        self._g = b - d
        self._A = np.array([[(1-d) * (1-λ) + b, (1 - d) * α + b],
                            [(1-d) * λ,         (1 - d) * (1 - α)]])
        self._A_hat = self._A / (1 + self._g)
@property
def g(self):
return self._g
@property
def A(self):
return self._A
@property
def A_hat(self):
return self._A_hat
@property
def λ(self):
return self._λ
    @λ.setter
    def λ(self, new_value):
        self._λ = new_value
        self.compute_derived_values()
@property
def α(self):
return self._α
@α.setter
def α(self, new_value):
self._α = new_value
self.compute_derived_values()
@property
def b(self):
return self._b
@b.setter
def b(self, new_value):
self._b = new_value
self.compute_derived_values()
@property
def d(self):
return self._d
    @d.setter
    def d(self, new_value):
        self._d = new_value
        self.compute_derived_values()

    def rate_steady_state(self, tol=1e-6):
        """
        Finds the steady state of the system x = A_hat x
Returns
--------
xbar : steady state vector of employment and unemployment rates
"""
x = 0.5 * np.ones(2)
error = tol + 1
while error > tol:
new_x = self.A_hat @ x
error = np.max(np.abs(new_x - x))
x = new_x
return x
    def simulate_stock_path(self, X0, T):
        """
        Simulates the sequence of employment and unemployment stocks

        Parameters
        ------------
        X0 : array
            Contains initial values (E0, U0)
        T : int
            Number of periods to simulate

        Returns
        ---------
        X : iterator
            Contains sequence of employment and unemployment stocks
        """
        X = np.atleast_1d(X0)  # Recast as array just in case
        for t in range(T):
            yield X
            X = self.A @ X
    def simulate_rate_path(self, x0, T):
        """
        Simulates the sequence of employment and unemployment rates

        Parameters
        ------------
        x0 : array
            Contains initial values (e0, u0)
        T : int
            Number of periods to simulate

        Returns
        ---------
        x : iterator
            Contains sequence of employment and unemployment rates
        """
x = np.atleast_1d(x0) # Recast as array just in case
for t in range(T):
yield x
x = self.A_hat @ x
As desired, if we create an instance and update a primitive like 𝛼, derived objects like 𝐴 will
also change
In [4]: lm = LakeModel()
lm.α
Out[4]: 0.013
In [5]: lm.A
In [6]: lm.α = 2
lm.A
Let’s run a simulation under the default parameters (see above) starting from 𝑋0 = (12, 138)
In [7]: lm = LakeModel()
        N_0 = 150      # Population
        e_0 = 0.92     # Initial employment rate
        u_0 = 1 - e_0  # Initial unemployment rate
        T = 50         # Simulation length

        U_0 = u_0 * N_0
        E_0 = e_0 * N_0

        X_0 = (U_0, E_0)
        X_path = np.vstack(lm.simulate_stock_path(X_0, T))

        fig, axes = plt.subplots(3, 1, figsize=(10, 8))

        axes[0].plot(X_path[:, 0], lw=2)
        axes[0].set_title('Unemployment')
        axes[1].plot(X_path[:, 1], lw=2)
        axes[1].set_title('Employment')
        axes[2].plot(X_path.sum(1), lw=2)
        axes[2].set_title('Labor force')

        for ax in axes:
            ax.grid()

        plt.tight_layout()
        plt.show()
The aggregates 𝐸𝑡 and 𝑈𝑡 don’t converge because their sum 𝐸𝑡 + 𝑈𝑡 grows at rate 𝑔.
On the other hand, the vector of employment and unemployment rates $x_t$ can be in a steady state $\bar{x}$ if there exists an $\bar{x}$ such that
• $\bar{x} = \hat{A} \bar{x}$
• the components satisfy $\bar{e} + \bar{u} = 1$
This equation tells us that a steady state level 𝑥̄ is an eigenvector of 𝐴 ̂ associated with a unit
eigenvalue.
We also have $x_t \to \bar{x}$ as $t \to \infty$ provided that the remaining eigenvalue of $\hat{A}$ has modulus less than 1.
This is the case for our default parameters:
In [8]: lm = LakeModel()
e, f = np.linalg.eigvals(lm.A_hat)
abs(e), abs(f)
Let’s look at the convergence of the unemployment and employment rate to steady state lev-
els (dashed red line)
In [9]: lm = LakeModel()
e_0 = 0.92 # Initial employment rate
u_0 = 1 - e_0 # Initial unemployment rate
T = 50 # Simulation length
xbar = lm.rate_steady_state()
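        # (The plotting steps are elided in the source; a sketch consistent
        # with the two-panel figure described above)
        x_0 = (u_0, e_0)
        x_path = np.vstack(lm.simulate_rate_path(x_0, T))

        fig, axes = plt.subplots(2, 1, figsize=(10, 8))
        titles = ['Unemployment rate', 'Employment rate']
        for i, title in enumerate(titles):
            axes[i].plot(x_path[:, i], lw=2, alpha=0.5)
            axes[i].hlines(xbar[i], 0, T, 'r', '--')
            axes[i].set_title(title)
            axes[i].grid()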
plt.tight_layout()
plt.show()
An individual worker’s employment dynamics are governed by a finite state Markov process.
Letting $s_t = 0$ indicate unemployment and $s_t = 1$ employment, the worker's state evolves according to the transition matrix

$$P = \begin{pmatrix} 1 - \lambda & \lambda \\ \alpha & 1 - \alpha \end{pmatrix}$$
Let 𝜓𝑡 denote the marginal distribution over employment/unemployment states for the
worker at time 𝑡.
As usual, we regard it as a row vector.
We know from an earlier discussion that 𝜓𝑡 follows the law of motion
𝜓𝑡+1 = 𝜓𝑡 𝑃
We also know from the lecture on finite Markov chains that if 𝛼 ∈ (0, 1) and 𝜆 ∈ (0, 1), then
𝑃 has a unique stationary distribution, denoted here by 𝜓∗ .
The unique stationary distribution satisfies
$$\psi^*[0] = \frac{\alpha}{\alpha + \lambda}$$
Not surprisingly, probability mass on the unemployment state increases with the dismissal
rate and falls with the job finding rate.
68.5.1 Ergodicity
Let's look at the fraction of time a long-lived worker spends in each state, defining

$$\bar{s}_{u,T} := \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}\{s_t = 0\} \quad \text{and} \quad \bar{s}_{e,T} := \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}\{s_t = 1\}$$

where $\mathbb{1}$ is the indicator function. Since the chain is ergodic, for (almost) every realization,

$$\lim_{T \to \infty} \bar{s}_{u,T} = \psi^*[0] \quad \text{and} \quad \lim_{T \to \infty} \bar{s}_{e,T} = \psi^*[1]$$
How long does it take for time series sample averages to converge to cross-sectional averages?
We can use QuantEcon.py’s MarkovChain class to investigate this.
Let’s plot the path of the sample averages over 5,000 periods
lm = LakeModel(d=0, b=0)  # No entry/exit, so individual and aggregate rates coincide
α, λ = lm.α, lm.λ
P = [[1 - λ, λ],
[ α, 1 - α]]
mc = MarkovChain(P)
xbar = lm.rate_steady_state()
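# (The simulation and plotting steps are elided in the source; a sketch)
T = 5000  # Simulation length
s_path = mc.simulate(T, init=1)
s_bar_e = s_path.cumsum() / np.arange(1, T+1)  # Fraction of time employed
s_bar_u = 1 - s_bar_e                          # Fraction of time unemployed

fig, axes = plt.subplots(2, 1, figsize=(10, 8))
for i, s_bar in enumerate([s_bar_u, s_bar_e]):
    axes[i].plot(s_bar, lw=2, alpha=0.5)
    axes[i].hlines(xbar[i], 0, T, 'r', '--')
    axes[i].grid()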
plt.tight_layout()
plt.show()
The most important thing to remember about the model is that optimal decisions are charac-
terized by a reservation wage 𝑤̄
• If the wage offer 𝑤 in hand is greater than or equal to 𝑤,̄ then the worker accepts.
• Otherwise, the worker rejects.
As we saw in our discussion of the model, the reservation wage depends on the wage offer dis-
tribution and the parameters
• 𝛼, the separation rate
Suppose that all workers inside a lake model behave according to the McCall search model.
The exogenous probability of leaving employment remains 𝛼.
But their optimal decision rules determine the probability 𝜆 of leaving unemployment.
This is now

$$\lambda = \gamma \, \mathbb{P}\{w_t \geq \bar{w}\} = \gamma \sum_{w' \geq \bar{w}} p(w') \tag{1}$$
We can use the McCall search version of the Lake Model to find an optimal level of unem-
ployment insurance.
We assume that the government sets unemployment compensation 𝑐.
The government imposes a lump-sum tax 𝜏 sufficient to finance total unemployment pay-
ments.
To attain a balanced budget at a steady state, taxes, the steady state unemployment rate 𝑢,
and the unemployment compensation rate must satisfy
$$\tau = u c$$

Accounting for the dependence of the steady state unemployment rate on policy, the budget constraint becomes

$$\tau = u(c, \tau) \, c$$

We take as our welfare criterion the steady state quantity

$$W := e \, \mathbb{E}[V \mid \text{employed}] + u \, U$$
where the notation 𝑉 and 𝑈 is as defined in the McCall search model lecture.
The wage offer distribution will be a discretized version of the lognormal distribution
𝐿𝑁 (log(20), 1), as shown in the next figure
# Imports needed below (the original import cell is elided; these are assumptions)
from numba import jit
from quantecon.distributions import BetaBinomial

@jit
def u(c, σ):
    if c > 0:
        return (c**(1 - σ) - 1) / (1 - σ)
    else:
        return -10e6
class McCallModel:
"""
Stores the parameters and functions associated with a given model.
"""
    def __init__(self,
                 α=0.2,        # Job separation rate
                 β=0.98,       # Discount factor
                 γ=0.7,        # Job offer rate
                 c=6.0,        # Unemployment compensation
                 σ=2.0,        # Utility parameter
                 w_vec=None,   # Possible wage values
                 p_vec=None):  # Probabilities over w_vec
        # NB: default parameter values after α are reconstructed assumptions

        self.α, self.β, self.γ, self.c, self.σ = α, β, γ, c, σ

        # Add a default wage vector and probabilities over the vector using
        # the beta-binomial distribution
        if w_vec is None:
            n = 60  # Number of possible outcomes for wage
            # Wages between 10 and 20
            self.w_vec = np.linspace(10, 20, n)
            a, b = 600, 400  # Shape parameters
            dist = BetaBinomial(n-1, a, b)
            self.p_vec = dist.pdf()
        else:
            self.w_vec = w_vec
            self.p_vec = p_vec
@jit
def _update_bellman(α, β, γ, c, σ, w_vec, p_vec, V, V_new, U):
"""
A jitted function to update the Bellman equations. Note that V_new is
modified in place (i.e, modified by this function). The new value of U
is returned.
"""
for w_idx, w in enumerate(w_vec):
# w_idx indexes the vector of possible wages
V_new[w_idx] = u(w, σ) + β * ((1 - α) * V[w_idx] + α * U)
U_new = u(c, σ) + β * (1 - γ) * U + \
β * γ * np.sum(np.maximum(U, V) * p_vec)
return U_new
def solve_mccall_model(mcm, tol=1e-5, max_iter=2000):
    """
    Iterates to convergence on the Bellman equations

    Parameters
    ----------
    mcm : an instance of McCallModel
    tol : float
        error tolerance
    max_iter : int
        the maximum number of iterations
    """
    V = np.ones(len(mcm.w_vec))  # Initial guess of V
    V_new = np.empty_like(V)     # To store updates to V
    U = 1                        # Initial guess of U
    i = 0
    error = tol + 1

    while error > tol and i < max_iter:
        U_new = _update_bellman(mcm.α, mcm.β, mcm.γ, mcm.c, mcm.σ,
                                mcm.w_vec, mcm.p_vec, V, V_new, U)
        error = np.max(np.abs(V_new - V))
        V[:] = V_new
        U = U_new
        i += 1

    return V, U
def compute_reservation_wage(mcm, return_values=False):
    """
    Computes the reservation wage of an instance of the McCall model
    by finding the smallest w such that V(w) > U.

    If V(w) > U for all w, then the reservation wage w_bar is set to
    the lowest wage in mcm.w_vec.

    If V(w) < U for all w, then w_bar is set to np.inf.
Parameters
----------
mcm : an instance of McCallModel
return_values : bool (optional, default=False)
Return the value functions as well
Returns
-------
w_bar : scalar
The reservation wage
"""
V, U = solve_mccall_model(mcm)
w_idx = np.searchsorted(V - U, 0)
if w_idx == len(V):
w_bar = np.inf
else:
w_bar = mcm.w_vec[w_idx]
if return_values == False:
return w_bar
else:
return w_bar, V, U
Now let’s compute and plot welfare, employment, unemployment, and tax revenue as a func-
tion of the unemployment compensation rate
"""
mcm = McCallModel(α=α_q,
β=β,
γ=γ,
c=c-τ, # Post tax compensation
σ=σ,
w_vec=w_vec-τ, # Post tax wages
p_vec=p_vec)
"""
w_bar, λ, V, U = compute_optimal_quantities(c, τ)
return e, u, welfare
from scipy.optimize import brentq  # Root finder (import assumed)

def find_balanced_budget_tax(c):
    """
    Find the tax level that will induce a balanced budget.
    """
    def steady_state_budget(t):
        e, u, w = compute_steady_state_quantities(c, t)
        return t - u * c

    τ = brentq(steady_state_budget, 0.0, 0.9 * c)
    return τ
# Levels of unemployment insurance we wish to study
c_vec = np.linspace(5, 140, 60)

tax_vec = []
unempl_vec = []
empl_vec = []
welfare_vec = []

for c in c_vec:
    t = find_balanced_budget_tax(c)
    e_rate, u_rate, welfare = compute_steady_state_quantities(c, t)
    tax_vec.append(t)
    unempl_vec.append(u_rate)
    empl_vec.append(e_rate)
    welfare_vec.append(welfare)

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
plots = [unempl_vec, empl_vec, tax_vec, welfare_vec]
titles = ['Unemployment', 'Employment', 'Tax', 'Welfare']
for ax, plot, title in zip(axes.flatten(), plots, titles):
    ax.plot(c_vec, plot, lw=2, alpha=0.7)
    ax.set_title(title)
    ax.grid()

plt.tight_layout()
plt.show()
68.7 Exercises
68.7.1 Exercise 1
Consider an economy with an initial stock of workers 𝑁0 = 100 at the steady state level of
employment in the baseline parameterization
• 𝛼 = 0.013
• 𝜆 = 0.283
• 𝑏 = 0.0124
• 𝑑 = 0.00822
(The values for 𝛼 and 𝜆 follow [39])
Suppose that in response to new legislation the hiring rate reduces to 𝜆 = 0.2.
Plot the transition dynamics of the unemployment and employment stocks for 50 periods.
Plot the transition dynamics for the rates.
How long does the economy take to converge to its new steady state?
What is the new steady state level of employment?
68.7.2 Exercise 2
Consider an economy with an initial stock of workers 𝑁0 = 100 at the steady state level of
employment in the baseline parameterization.
Suppose that for 20 periods the birth rate was temporarily high ($b = 0.025$) and then returned to its original level.
Plot the transition dynamics of the unemployment and employment stocks for 50 periods.
Plot the transition dynamics for the rates.
How long does the economy take to return to its original steady state?
68.8 Solutions

68.9 Lake Model Solutions

68.9.1 Exercise 1
We begin by constructing the class containing the default parameters and assigning the
steady state values to x0
In [14]: lm = LakeModel()
x0 = lm.rate_steady_state()
print(f"Initial Steady State: {x0}")
In [15]: N0 = 100
         T = 50

         # New legislation changes λ to 0.2
         lm.λ = 0.2

         xbar = lm.rate_steady_state()  # New steady state
         X_path = np.vstack(lm.simulate_stock_path(x0 * N0, T))
         print(f"New Steady State: {xbar}")

         fig, axes = plt.subplots(3, 1, figsize=[10, 9])

         axes[0].plot(X_path[:, 0])
         axes[0].set_title('Unemployment')

         axes[1].plot(X_path[:, 1])
         axes[1].set_title('Employment')

         axes[2].plot(X_path.sum(1))
         axes[2].set_title('Labor force')

         for ax in axes:
             ax.grid()

         plt.tight_layout()
         plt.show()
# Now plot the transition dynamics of the rates
x_path = np.vstack(lm.simulate_rate_path(x0, T))

fig, axes = plt.subplots(2, 1, figsize=(10, 8))
titles = ['Unemployment rate', 'Employment rate']
for i, title in enumerate(titles):
    axes[i].plot(x_path[:, i])
    axes[i].hlines(xbar[i], 0, T, 'r', '--')
    axes[i].set_title(title)
    axes[i].grid()

plt.tight_layout()
plt.show()
We see that it takes 20 periods for the economy to converge to its new steady state levels.
68.9.2 Exercise 2
This next exercise has the economy experiencing a boom in entrances to the labor market and
then later returning to the original levels.
For 20 periods the economy has a new entry rate into the labor market.
Let’s start off at the baseline parameterization and record the steady state
In [19]: lm = LakeModel()
x0 = lm.rate_steady_state()
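The code setting the higher birth rate and simulating the first 20 periods is elided in the source; a sketch (the names here are ours) is:

N0 = 100
T = 50
T_hat = 20  # Periods with the higher birth rate

lm.b = 0.025  # Temporarily higher entry rate
X_path1 = np.vstack(lm.simulate_stock_path(x0 * N0, T_hat))  # Simulate stocks
x_path1 = np.vstack(lm.simulate_rate_path(x0, T_hat))        # Simulate rates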
Now we reset 𝑏 to the original value and then, using the state after 20 periods for the new
initial conditions, we simulate for the additional 30 periods
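A sketch of this second leg:

lm.b = 0.0124  # Reset b to its original value
T_bar = T - T_hat

# Use the state after 20 periods as the new initial condition
X_path2 = np.vstack(lm.simulate_stock_path(X_path1[-1, :], T_bar + 1))
x_path2 = np.vstack(lm.simulate_rate_path(x_path1[-1, :], T_bar + 1))

# Combine the two legs of each path
X_path = np.vstack([X_path1, X_path2[1:]])
x_path = np.vstack([x_path1, x_path2[1:]])

fig, axes = plt.subplots(3, 1, figsize=[10, 9])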
axes[0].plot(X_path[:, 0])
axes[0].set_title('Unemployment')
axes[1].plot(X_path[:, 1])
axes[1].set_title('Employment')
axes[2].plot(X_path.sum(1))
axes[2].set_title('Labor force')
for ax in axes:
ax.grid()
plt.tight_layout()
plt.show()
fig, axes = plt.subplots(2, 1, figsize=(10, 8))
titles = ['Unemployment rate', 'Employment rate']
for i, title in enumerate(titles):
    axes[i].plot(x_path[:, i])
    axes[i].hlines(x0[i], 0, T, 'r', '--')
    axes[i].set_title(title)
    axes[i].grid()

plt.tight_layout()
plt.show()
Chapter 69

Rational Expectations Equilibrium
69.1 Contents
• Overview 69.2
• Defining Rational Expectations Equilibrium 69.3
• Computation of an Equilibrium 69.4
• Exercises 69.5
• Solutions 69.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
69.2 Overview
Finally, we will learn about the important “Big 𝐾, little 𝑘” trick, a modeling device widely
used in macroeconomics.
Except that for us
• Instead of “Big 𝐾” it will be “Big 𝑌 ”.
• Instead of “little 𝑘” it will be “little 𝑦”.
Let’s start with some standard imports:
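The import cell is elided here; a sketch of what the code below relies on is:

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from quantecon import LQ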
69.2.1 The Big 𝑌, little 𝑦 Trick
This widely used method applies in contexts in which a “representative firm” or agent is a “price taker” operating within a competitive equilibrium.
“price taker” operating within a competitive equilibrium.
We want to impose that
• The representative firm or individual takes aggregate 𝑌 as given when it chooses indi-
vidual 𝑦, but ….
• At the end of the day, 𝑌 = 𝑦, so that the representative firm is indeed representative.
The Big 𝑌 , little 𝑦 trick accomplishes these two goals by
• Taking 𝑌 as beyond control when posing the choice problem of who chooses 𝑦; but ….
• Imposing 𝑌 = 𝑦 after having solved the individual’s optimization problem.
Please watch for how this strategy is applied as the lecture unfolds.
We begin by applying the Big 𝑌 , little 𝑦 trick in a very simple static context.
Consider a static model in which a collection of 𝑛 firms produce a homogeneous good that is
sold in a competitive market.
Each of these 𝑛 firms sell output 𝑦.
The price 𝑝 of the good lies on an inverse demand curve
𝑝 = 𝑎 0 − 𝑎1 𝑌 (1)
where
• 𝑎𝑖 > 0 for 𝑖 = 0, 1
• 𝑌 = 𝑛𝑦 is the market-wide level of output
Each firm has a total cost function $c(y) = c_1 y + c_2 y^2 / 2$, where $c_1, c_2 > 0$.
Taking $Y$ as given, the representative firm maximizes profit $p y - c(y)$; the first-order condition for its choice of $y$ is

$$a_0 - a_1 Y - c_1 - c_2 y = 0 \tag{3}$$
At this point, but not before, we substitute $Y = ny$ into (3) to obtain the following linear equation

$$a_0 - c_1 - (a_1 + n^{-1} c_2) Y = 0 \tag{4}$$

to be solved for the competitive equilibrium market-wide output $Y$.
Our first illustration of a rational expectations equilibrium involves a market with 𝑛 firms,
each of which seeks to maximize the discounted present value of profits in the face of adjust-
ment costs.
The adjustment costs induce the firms to make gradual adjustments, which in turn requires
consideration of future prices.
Individual firms understand that, via the inverse demand curve, the price is determined by
the amounts supplied by other firms.
Hence each firm wants to forecast future total industry supplies.
1220 CHAPTER 69. RATIONAL EXPECTATIONS EQUILIBRIUM
In our context, a forecast is generated by a belief about the law of motion for the aggregate
state.
Rational expectations equilibrium prevails when this belief coincides with the actual law of
motion generated by production choices induced by this belief.
We formulate a rational expectations equilibrium in terms of a fixed point of an operator that
maps beliefs into optimal beliefs.
𝑝𝑡 = 𝑎0 − 𝑎1 𝑌𝑡 (5)
where
• 𝑎𝑖 > 0 for 𝑖 = 0, 1
• 𝑌𝑡 = 𝑛𝑦𝑡 is the market-wide level of output
The firm maximizes the present value of profits

$$\sum_{t=0}^{\infty} \beta^t r_t \tag{6}$$

where $\beta \in (0, 1)$ is the discount factor and

$$r_t := p_t y_t - \frac{\gamma (y_{t+1} - y_t)^2}{2}, \qquad y_0 \text{ given} \tag{7}$$
In view of (5), the firm’s incentive to forecast the market price translates into an incentive to
forecast aggregate output 𝑌𝑡 .
Aggregate output depends on the choices of other firms.
We assume that 𝑛 is such a large number that the output of any single firm has a negligible
effect on aggregate output.
That justifies firms in regarding their forecasts of aggregate output as being unaffected by
their own output decisions.
We suppose the firm believes that market-wide output $Y_t$ follows the law of motion

$$Y_{t+1} = H(Y_t) \tag{8}$$

For now, let's fix a particular belief $H$ in (8) and investigate the firm's response to it.
Let 𝑣 be the optimal value function for the firm’s problem given 𝐻.
The value function satisfies the Bellman equation
$$v(y, Y) = \max_{y'} \left\{ a_0 y - a_1 y Y - \frac{\gamma (y' - y)^2}{2} + \beta v(y', H(Y)) \right\} \tag{9}$$
Let $h$ denote the firm's optimal policy, so that

$$y_{t+1} = h(y_t, Y_t) \tag{10}$$

where

$$h(y, Y) := \operatorname{argmax}_{y'} \left\{ a_0 y - a_1 y Y - \frac{\gamma (y' - y)^2}{2} + \beta v(y', H(Y)) \right\} \tag{11}$$
A First-Order Characterization
The first-order necessary condition for choosing $y'$ in (9) is

$$-\gamma(y' - y) + \beta v_y(y', H(Y)) = 0 \tag{12}$$

while the Benveniste-Scheinkman envelope condition gives

$$v_y(y, Y) = a_0 - a_1 Y + \gamma(y' - y)$$

Substituting the envelope condition into (12) yields the Euler equation

$$-\gamma(y_{t+1} - y_t) + \beta \left[ a_0 - a_1 Y_{t+1} + \gamma(y_{t+2} - y_{t+1}) \right] = 0 \tag{13}$$
The firm optimally sets an output path that satisfies (13), taking (8) as given, and subject to
• the initial conditions for (𝑦0 , 𝑌0 ).
• the terminal condition lim𝑡→∞ 𝛽 𝑡 𝑦𝑡 𝑣𝑦 (𝑦𝑡 , 𝑌𝑡 ) = 0.
This last condition is called the transversality condition, and acts as a first-order necessary
condition “at infinity”.
The firm’s decision rule solves the difference equation (13) subject to the given initial condi-
tion 𝑦0 and the transversality condition.
Note that solving the Bellman equation (9) for 𝑣 and then ℎ in (11) yields a decision rule that
automatically imposes both the Euler equation (13) and the transversality condition.
When all firms behave according to $h$ and $Y_t = n y_t$, the actual law of motion for market-wide output is

$$Y_{t+1} = n \, h(Y_t / n, Y_t) \tag{14}$$

Thus, when firms believe that the law of motion for market-wide output is (8), their optimizing behavior makes the actual law of motion be (14).
A rational expectations equilibrium or recursive competitive equilibrium of the model with adjustment costs is a decision rule $h$ and an aggregate law of motion $H$ such that
1. Given belief $H$, the map $h$ is the firm's optimal policy function.
2. The law of motion $H$ satisfies $H(Y) = n \, h(Y/n, Y)$ for all $Y$.
Thus, a rational expectations equilibrium equates the perceived and actual laws of motion (8)
and (14).
As we’ve seen, the firm’s optimum problem induces a mapping Φ from a perceived law of mo-
tion 𝐻 for market-wide output to an actual law of motion Φ(𝐻).
The mapping Φ is the composition of two operations, taking a perceived law of motion into a
decision rule via (9)–(11), and a decision rule into an actual law via (14).
The 𝐻 component of a rational expectations equilibrium is a fixed point of Φ.
Now let’s consider the problem of computing the rational expectations equilibrium.
Readers accustomed to dynamic programming arguments might try to address this problem
by choosing some guess 𝐻0 for the aggregate law of motion and then iterating with Φ.
Unfortunately, the mapping Φ is not a contraction.
In particular, there is no guarantee that direct iterations on Φ converge; see the footnote at the end of this chapter.
Fortunately, there is another method that works here.
The method exploits a general connection between equilibrium and Pareto optimality ex-
pressed in the fundamental theorems of welfare economics (see, e.g, [115]).
Lucas and Prescott [110] used this method to construct a rational expectations equilibrium.
The details follow.
Our plan of attack is to match the Euler equations of the market problem with those for a
single-agent choice problem.
As we’ll see, this planning problem can be solved by LQ control (linear regulator).
The optimal quantities from the planning problem are rational expectations equilibrium
quantities.
The rational expectations equilibrium price can be obtained as a shadow price in the planning
problem.
For convenience, in this section, we set 𝑛 = 1.
We first compute a sum of consumer and producer surplus at time 𝑡
$$s(Y_t, Y_{t+1}) := \int_0^{Y_t} (a_0 - a_1 x) \, dx - \frac{\gamma (Y_{t+1} - Y_t)^2}{2} \tag{15}$$
The first term is the area under the demand curve, while the second measures the social costs
of changing output.
The planning problem is to choose a production plan {𝑌𝑡 } to maximize
$$\sum_{t=0}^{\infty} \beta^t s(Y_t, Y_{t+1})$$
Evaluating the integral in (15) yields the quadratic form 𝑎0 𝑌𝑡 − 𝑎1 𝑌𝑡2 /2.
As a result, the Bellman equation for the planning problem is
$$V(Y) = \max_{Y'} \left\{ a_0 Y - \frac{a_1}{2} Y^2 - \frac{\gamma (Y' - Y)^2}{2} + \beta V(Y') \right\} \tag{16}$$
The first-order condition for the maximization problem is

$$-\gamma(Y' - Y) + \beta V'(Y') = 0 \tag{17}$$

while the envelope condition gives

$$V'(Y) = a_0 - a_1 Y + \gamma(Y' - Y)$$

Substituting this into equation (17) and rearranging leads to the Euler equation

$$\beta a_0 + \gamma Y_t - [\beta a_1 + \gamma(1 + \beta)] Y_{t+1} + \gamma \beta Y_{t+2} = 0 \tag{18}$$
If it is appropriate to apply the same terminal conditions for these two difference equations,
which it is, then we have verified that a solution of the planning problem is also a rational
expectations equilibrium quantity sequence.
It follows that for this example we can compute equilibrium quantities by forming the optimal
linear regulator problem corresponding to the Bellman equation (16).
The optimal policy function for the planning problem is the aggregate law of motion 𝐻 that
the representative firm faces within a rational expectations equilibrium.
As you are asked to show in the exercises, the fact that the planner's problem is an LQ problem implies an optimal policy, and hence aggregate law of motion, taking the form

$$Y_{t+1} = \kappa_0 + \kappa_1 Y_t \tag{19}$$

while the individual firm's decision rule takes the form

$$y_{t+1} = h_0 + h_1 y_t + h_2 Y_t \tag{20}$$
69.5 Exercises
69.5.1 Exercise 1
Express the solution of the firm’s problem in the form (20) and give the values for each ℎ𝑗 .
If there were 𝑛 identical competitive firms all behaving according to (20), what would (20)
imply for the actual law of motion (8) for market supply.
69.5.2 Exercise 2
Consider the following 𝜅0 , 𝜅1 pairs as candidates for the aggregate law of motion component
of a rational expectations equilibrium (see (19)).
Extending the program that you wrote for exercise 1, determine which if any satisfy the defi-
nition of a rational expectations equilibrium
• (94.0886298678, 0.923409232937)
• (93.2119845412, 0.984323478873)
• (95.0818452486, 0.952459076301)
Describe an iterative algorithm that uses the program that you wrote for exercise 1 to com-
pute a rational expectations equilibrium.
(You are not being asked actually to use the algorithm you are suggesting)
69.5.3 Exercise 3
Recast the planning problem in the LQ framework and use it to compute the aggregate law of motion $Y_{t+1} = \kappa_0 + \kappa_1 Y_t$; compare the resulting $(\kappa_0, \kappa_1)$ with the equilibrium pair found in exercise 2.
69.5.4 Exercise 4
A monopolist faces the industry demand curve (5) and chooses $\{Y_t\}$ to maximize $\sum_{t=0}^{\infty} \beta^t r_t$ where

$$r_t = p_t Y_t - \frac{\gamma (Y_{t+1} - Y_t)^2}{2}$$

Formulate this problem as an LQ problem and compute the monopolist's optimal policy, which takes the form

$$Y_{t+1} = m_0 + m_1 Y_t$$

Compare the resulting law of motion with the competitive equilibrium law of motion from the earlier exercises.
69.6 Solutions
69.6.1 Exercise 1
To map a problem into a discounted optimal linear control problem, we need to define
• state vector 𝑥𝑡 and control vector 𝑢𝑡
• matrices 𝐴, 𝐵, 𝑄, 𝑅 that define preferences and the law of motion for the state
For the state and control vectors, we choose
$$x_t = \begin{bmatrix} y_t \\ Y_t \\ 1 \end{bmatrix}, \qquad u_t = y_{t+1} - y_t$$
For $A, B, Q, R$ we set

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \kappa_1 & \kappa_0 \\ 0 & 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad R = \begin{bmatrix} 0 & a_1/2 & -a_0/2 \\ a_1/2 & 0 & 0 \\ -a_0/2 & 0 & 0 \end{bmatrix}, \quad Q = \gamma/2$$
By construction, the optimal policy has the form $u_t = -F x_t$, i.e.,

$$y_{t+1} - y_t = -F_0 y_t - F_1 Y_t - F_2$$

so, matching coefficients with (20),

$$h_0 = -F_2, \quad h_1 = 1 - F_0, \quad h_2 = -F_1$$
a0 = 100
a1 = 0.05
β = 0.95
γ = 10.0
# Beliefs
κ0 = 95.5
κ1 = 0.95
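# (The LQ matrices are set up in an elided step; a sketch consistent
# with the mapping above)
A = np.array([[1, 0,  0],
              [0, κ1, κ0],
              [0, 0,  1]])
B = np.array([[1], [0], [0]])
R = np.array([[0,     a1/2, -a0/2],
              [a1/2,  0,     0],
              [-a0/2, 0,     0]])
Q = 0.5 * γ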
lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()
F = F.flatten()
out1 = f"F = [{F[0]:.3f}, {F[1]:.3f}, {F[2]:.3f}]"
h0, h1, h2 = -F[2], 1 - F[0], -F[1]
out2 = f"(h0, h1, h2) = ({h0:.3f}, {h1:.3f}, {h2:.3f})"
print(out1)
print(out2)
For the case $n > 1$, recall that $Y_t = n y_t$, which, combined with the previous equation, yields

$$Y_{t+1} = n \left( h_0 + h_1 \frac{Y_t}{n} + h_2 Y_t \right) = n h_0 + (h_1 + n h_2) Y_t$$
69.6.2 Exercise 2
To determine whether a 𝜅0 , 𝜅1 pair forms the aggregate law of motion component of a ratio-
nal expectations equilibrium, we can proceed as follows:
• Determine the corresponding firm law of motion $y_{t+1} = h_0 + h_1 y_t + h_2 Y_t$.
• Test whether the associated aggregate law $Y_{t+1} = n \, h(Y_t/n, Y_t)$ evaluates to $Y_{t+1} = \kappa_0 + \kappa_1 Y_t$.
In the second step, we can use $Y_t = n y_t = y_t$ (setting $n = 1$), so that $Y_{t+1} = n \, h(Y_t/n, Y_t)$ becomes

$$Y_{t+1} = h(Y_t, Y_t) = h_0 + (h_1 + h_2) Y_t$$

Hence the conditions to check are $\kappa_0 = h_0$ and $\kappa_1 = h_1 + h_2$.
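The checking code is elided in the source; a sketch that reuses Q, R and B from exercise 1 might look like this:

candidates = ((94.0886298678, 0.923409232937),
              (93.2119845412, 0.984323478873),
              (95.0818452486, 0.952459076301))

for κ0, κ1 in candidates:
    # Form the perceived aggregate law and solve the firm's problem
    A = np.array([[1, 0,  0],
                  [0, κ1, κ0],
                  [0, 0,  1]])
    lq = LQ(Q, R, A, B, beta=β)
    P, F, d = lq.stationary_values()
    F = F.flatten()
    h0, h1, h2 = -F[2], 1 - F[0], -F[1]

    # Test the fixed-point conditions κ0 = h0 and κ1 = h1 + h2
    if np.allclose((κ0, κ1), (h0, h1 + h2)):
        print(f'Equilibrium pair = ({κ0}, {κ1})')
        print(f'(h0, h1, h2) = ({h0:.4f}, {h1:.4f}, {h2:.4f})')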
The output tells us that the answer is pair (iii), which implies (ℎ0 , ℎ1 , ℎ2 ) =
(95.0819, 1.0000, −.0475).
(Notice we use np.allclose to test equality of floating-point numbers, since exact equality
is too strict).
Regarding the iterative algorithm, one could loop from a given (𝜅0 , 𝜅1 ) pair to the associated
firm law and then to a new (𝜅0 , 𝜅1 ) pair.
This amounts to implementing the operator Φ described in the lecture.
(There is in general no guarantee that this iterative process will converge to a rational expec-
tations equilibrium)
69.6.3 Exercise 3
For the state and control vectors, we choose

$$x_t = \begin{bmatrix} Y_t \\ 1 \end{bmatrix}, \qquad u_t = Y_{t+1} - Y_t$$

and set

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad R = \begin{bmatrix} a_1/2 & -a_0/2 \\ -a_0/2 & 0 \end{bmatrix}, \quad Q = \gamma/2$$
By construction, the optimal policy is $u_t = -F x_t$, i.e.,

$$Y_{t+1} - Y_t = -F_0 Y_t - F_1$$

so we can obtain the implied aggregate law of motion via $\kappa_0 = -F_1$ and $\kappa_1 = 1 - F_0$.
The Python code to solve this problem is below:
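The setup is elided in the source; a sketch using the same parameters as exercise 1:

a0 = 100
a1 = 0.05
β = 0.95
γ = 10.0

A = np.eye(2)
B = np.array([[1], [0]])
R = np.array([[a1/2, -a0/2],
              [-a0/2, 0]])
Q = γ / 2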
lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()
F = F.flatten()
κ0, κ1 = -F[1], 1 - F[0]
print(κ0, κ1)
95.08187459215002 0.9524590627039248
The output yields the same (𝜅0 , 𝜅1 ) pair obtained as an equilibrium from the previous exer-
cise.
69.6.4 Exercise 4
The monopolist’s LQ problem is almost identical to the planner’s problem from the previous
exercise, except that
$$R = \begin{bmatrix} a_1 & -a_0/2 \\ -a_0/2 & 0 \end{bmatrix}$$
lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()
F = F.flatten()
m0, m1 = -F[1], 1 - F[0]
print(m0, m1)
73.47294403502818 0.9265270559649701
We see that the law of motion for the monopolist is approximately 𝑌𝑡+1 = 73.4729 + 0.9265𝑌𝑡 .
In the rational expectations case, the law of motion was approximately 𝑌𝑡+1 = 95.0818 +
0.9525𝑌𝑡 .
One way to compare these two laws of motion is by their fixed points, which give long-run
equilibrium output in each case.
For laws of the form 𝑌𝑡+1 = 𝑐0 + 𝑐1 𝑌𝑡 , the fixed point is 𝑐0 /(1 − 𝑐1 ).
If you crunch the numbers, you will see that the monopolist adopts a lower long-run quantity
than obtained by the competitive market, implying a higher market price.
This is analogous to the elementary static-case results.
Footnotes
[1] A literature that studies whether models populated with agents who learn can converge
to rational expectations equilibria features iterations on a modification of the mapping Φ that
can be approximated as 𝛾Φ + (1 − 𝛾)𝐼. Here 𝐼 is the identity operator and 𝛾 ∈ (0, 1) is a
relaxation parameter. See [113] and [54] for statements and applications of this approach to
establish conditions under which collections of adaptive agents who use least squares learning
to converge to a rational expectations equilibrium.
Chapter 70

Markov Perfect Equilibrium
70.1 Contents
• Overview 70.2
• Background 70.3
• Linear Markov Perfect Equilibria 70.4
• Application 70.5
• Exercises 70.6
• Solutions 70.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
70.2 Overview
70.3 Background
Two firms are the only producers of a good the demand for which is governed by a linear in-
verse demand function
𝑝 = 𝑎0 − 𝑎1 (𝑞1 + 𝑞2 ) (1)
Here 𝑝 = 𝑝𝑡 is the price of the good, 𝑞𝑖 = 𝑞𝑖𝑡 is the output of firm 𝑖 = 1, 2 at time 𝑡 and
𝑎0 > 0, 𝑎1 > 0.
In (1) and what follows,
• the time subscript is suppressed when possible to simplify notation
• 𝑥̂ denotes a next period value of variable 𝑥
Each firm recognizes that its output affects total output and therefore the market price.
The one-period payoff function of firm $i$ is price times quantity minus adjustment costs:

$$\pi_i = p q_i - \gamma (\hat{q}_i - q_i)^2, \qquad \gamma > 0 \tag{2}$$

Substituting the inverse demand curve (1) into (2) lets us express the one-period payoff as

$$\pi_i(q_i, q_{-i}, \hat{q}_i) = a_0 q_i - a_1 q_i^2 - a_1 q_i q_{-i} - \gamma (\hat{q}_i - q_i)^2 \tag{3}$$

where $q_{-i}$ denotes the output of the firm other than $i$.
Firm 𝑖 chooses a decision rule that sets next period quantity 𝑞𝑖̂ as a function 𝑓𝑖 of the current
state (𝑞𝑖 , 𝑞−𝑖 ).
An essential aspect of a Markov perfect equilibrium is that each firm takes the decision rule
of the other firm as known and given.
Given 𝑓−𝑖 , the Bellman equation of firm 𝑖 is
$$v_i(q_i, q_{-i}) = \max_{\hat{q}_i} \left\{ \pi_i(q_i, q_{-i}, \hat{q}_i) + \beta v_i(\hat{q}_i, f_{-i}(q_{-i}, q_i)) \right\} \tag{4}$$
Definition A Markov perfect equilibrium of the duopoly model is a pair of value functions
(𝑣1 , 𝑣2 ) and a pair of policy functions (𝑓1 , 𝑓2 ) such that, for each 𝑖 ∈ {1, 2} and each possible
state,
• The value function 𝑣𝑖 satisfies the Bellman equation (4).
• The maximizer on the right side of (4) is equal to 𝑓𝑖 (𝑞𝑖 , 𝑞−𝑖 ).
The adjective “Markov” denotes that the equilibrium decision rules depend only on the cur-
rent values of the state variables, not other parts of their histories.
“Perfect” means complete, in the sense that the equilibrium is constructed by backward in-
duction and hence builds in optimizing behavior for each firm at all possible future states.
• These include many states that will not be reached when we iterate forward
on the pair of equilibrium strategies 𝑓𝑖 starting from a given initial state.
70.3.2 Computation
One strategy for computing a Markov perfect equilibrium is iterating to convergence on pairs
of Bellman equations and decision rules.
In particular, let 𝑣𝑖𝑗 , 𝑓𝑖𝑗 be the value function and policy function for firm 𝑖 at the 𝑗-th itera-
tion.
Imagine constructing the iterates
$$v_i^{j+1}(q_i, q_{-i}) = \max_{\hat{q}_i} \left\{ \pi_i(q_i, q_{-i}, \hat{q}_i) + \beta v_i^j(\hat{q}_i, f_{-i}(q_{-i}, q_i)) \right\} \tag{5}$$
As we saw in the duopoly example, the study of Markov perfect equilibria in games with two
players leads us to an interrelated pair of Bellman equations.
In linear-quadratic dynamic games, these “stacked Bellman equations” become “stacked Ric-
cati equations” with a tractable mathematical structure.
We’ll lay out that structure in a general setup and then apply it to some simple problems.
In the two-player linear-quadratic dynamic game, player $i$ minimizes

$$\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' R_i x_t + u_{it}' Q_i u_{it} + u_{-it}' S_i u_{-it} + 2 x_t' W_i u_{it} + 2 u_{-it}' M_i u_{it} \right\} \tag{6}$$
Here
• 𝑥𝑡 is an 𝑛 × 1 state vector and 𝑢𝑖𝑡 is a 𝑘𝑖 × 1 vector of controls for player 𝑖
• 𝑅𝑖 is 𝑛 × 𝑛
• 𝑆𝑖 is 𝑘−𝑖 × 𝑘−𝑖
• 𝑄𝑖 is 𝑘𝑖 × 𝑘𝑖
• 𝑊𝑖 is 𝑛 × 𝑘𝑖
• 𝑀𝑖 is 𝑘−𝑖 × 𝑘𝑖
• $A$ is $n \times n$
• $B_i$ is $n \times k_i$
while the state evolves according to

$$x_{t+1} = A x_t + B_1 u_{1t} + B_2 u_{2t} \tag{7}$$
Taking the other player's rule $u_{2t} = -F_{2t} x_t$ as given, player 1's problem is to minimize

$$\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' \Pi_{1t} x_t + u_{1t}' Q_1 u_{1t} + 2 u_{1t}' \Gamma_{1t} x_t \right\} \tag{8}$$

subject to

$$x_{t+1} = \Lambda_{1t} x_t + B_1 u_{1t} \tag{9}$$

where
• $\Lambda_{it} := A - B_{-i} F_{-it}$
• $\Pi_{it} := R_i + F_{-it}' S_i F_{-it}$
• $\Gamma_{it} := W_i' - M_i' F_{-it}$
Solving player 1's problem by backward induction gives

$$F_{1t} = (Q_1 + \beta B_1' P_{1t+1} B_1)^{-1} (\beta B_1' P_{1t+1} \Lambda_{1t} + \Gamma_{1t}) \tag{10}$$

$$P_{1t} = \Pi_{1t} - (\beta B_1' P_{1t+1} \Lambda_{1t} + \Gamma_{1t})' (Q_1 + \beta B_1' P_{1t+1} B_1)^{-1} (\beta B_1' P_{1t+1} \Lambda_{1t} + \Gamma_{1t}) + \beta \Lambda_{1t}' P_{1t+1} \Lambda_{1t} \tag{11}$$

Similarly, player 2's problem yields

$$F_{2t} = (Q_2 + \beta B_2' P_{2t+1} B_2)^{-1} (\beta B_2' P_{2t+1} \Lambda_{2t} + \Gamma_{2t}) \tag{12}$$

$$P_{2t} = \Pi_{2t} - (\beta B_2' P_{2t+1} \Lambda_{2t} + \Gamma_{2t})' (Q_2 + \beta B_2' P_{2t+1} B_2)^{-1} (\beta B_2' P_{2t+1} \Lambda_{2t} + \Gamma_{2t}) + \beta \Lambda_{2t}' P_{2t+1} \Lambda_{2t} \tag{13}$$
Key Insight
A key insight is that equations (10) and (12) are linear in 𝐹1𝑡 and 𝐹2𝑡 .
After these equations have been solved, we can take 𝐹𝑖𝑡 and solve for 𝑃𝑖𝑡 in (11) and (13).
Infinite Horizon
We often want to compute the solutions of such games for infinite horizons, in the hope that
the decision rules 𝐹𝑖𝑡 settle down to be time-invariant as 𝑡1 → +∞.
In practice, we usually fix 𝑡1 and compute the equilibrium of an infinite horizon game by driv-
ing 𝑡0 → −∞.
This is the approach we adopt in the next section.
70.4.3 Implementation
We use the function nnash from QuantEcon.py that computes a Markov perfect equilibrium
of the infinite horizon linear-quadratic dynamic game in the manner described above.
70.5 Application
Let’s use these procedures to treat some applications, starting with the duopoly model.
To map the duopoly model into coupled linear-quadratic dynamic programming problems,
define the state and controls as
$$x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix} \quad \text{and} \quad u_{it} := q_{i,t+1} - q_{it}, \qquad i = 1, 2$$
If we write

$$x_t' R_i x_t + u_{it}' Q_i u_{it}$$

for the one-period payoff of firm $i$, where $Q_1 = Q_2 = \gamma$,

$$R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix} \quad \text{and} \quad R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix}$$

then we recover the one-period payoffs in expression (3).
The law of motion $x_{t+1} = A x_t + B_1 u_{1t} + B_2 u_{2t}$ then holds with

$$A := \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad B_1 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad B_2 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
The optimal decision rule of firm $i$ will take the form $u_{it} = -F_i x_t$, inducing the following closed-loop system for the evolution of $x$ in the Markov perfect equilibrium:

$$x_{t+1} = (A - B_1 F_1 - B_2 F_2) x_t \tag{14}$$
Consider the previously presented duopoly model with parameter values of:
• 𝑎0 = 10
• 𝑎1 = 2
• 𝛽 = 0.96
• 𝛾 = 12
From these, we compute the infinite horizon MPE using the preceding code
In [3]: """
@authors: Chase Coleman, Thomas Sargent, John Stachurski
"""
import numpy as np
import quantecon as qe
# Parameters
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0
# In LQ form
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])
Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0
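# (The R matrices and the equilibrium computation are elided in the
# source; a sketch consistent with the solution code later in this chapter)
R1 = [[ 0.,      -a0 / 2,  0.],
      [-a0 / 2.,  a1,      a1 / 2.],
      [ 0,        a1 / 2., 0.]]

R2 = [[ 0.,      0.,      -a0 / 2],
      [ 0.,      0.,       a1 / 2.],
      [-a0 / 2,  a1 / 2.,  a1]]

# Solve for the Markov perfect equilibrium using qe.nnash
F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1,
                          Q2, S1, S2, W1, W2, M1,
                          M2, beta=β)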
# Display policies
print("Computed policies for firm 1 and firm 2:\n")
print(f"F1 = {F1}")
print(f"F2 = {F2}")
print("\n")
As a sanity check, we can solve firm 1's LQ problem taking firm 2's equilibrium rule $F2$ as given:

In [4]: Λ1 = A - B2 @ F2
        lq1 = qe.LQ(Q1, R1, Λ1, B1, beta=β)
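        # (The rest of the check is elided in the source; a sketch)
        P1_ih, F1_ih, d = lq1.stationary_values()
        F1_ih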
This is close enough for rock and roll, as they say in the trade.
Indeed, np.allclose agrees with our assessment

In [5]: np.allclose(F1, F1_ih)

Out[5]: True
70.5.3 Dynamics
Let’s now investigate the dynamics of price and output in this simple duopoly model under
the MPE policies.
Given our optimal policies 𝐹 1 and 𝐹 2, the state evolves according to (14).
The following program
• imports 𝐹 1 and 𝐹 2 from the previous program along with all parameters.
• computes the evolution of 𝑥𝑡 using (14).
• extracts and plots industry output 𝑞𝑡 = 𝑞1𝑡 + 𝑞2𝑡 and price 𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡 .
In [6]: AF = A - B1 @ F1 - B2 @ F2
n = 20
x = np.empty((3, n))
x[:, 0] = 1, 1, 1
for t in range(n-1):
x[:, t+1] = AF @ x[:, t]
q1 = x[1, :]
q2 = x[2, :]
q = q1 + q2 # Total output, MPE
p = a0 - a1 * q # Price, MPE
Note that the initial condition has been set to 𝑞10 = 𝑞20 = 1.0.
To gain some perspective we can compare this to what happens in the monopoly case.
The first panel in the next figure compares output of the monopolist and industry output un-
der the MPE, as a function of time.
Here parameters are the same as above for both the MPE and monopoly solutions.
The monopolist initial condition is 𝑞0 = 2.0 to mimic the industry initial condition 𝑞10 =
𝑞20 = 1.0 in the MPE case.
As expected, output is higher and prices are lower under duopoly than monopoly.
70.6 Exercises
70.6.1 Exercise 1
Replicate the pair of figures showing the comparison of output and prices for the monopolist
and duopoly under MPE.
Parameters are as in duopoly_mpe.py and you can use that code to compute MPE policies
under duopoly.
The optimal policy in the monopolist case can be computed using QuantEcon.py’s LQ class.
70.6.2 Exercise 2
It takes the form of an infinite horizon linear-quadratic game proposed by Judd [92].
Two firms set prices and quantities of two goods interrelated through their demand curves.
Relevant variables are defined as follows:
• 𝐼𝑖𝑡 = inventories of firm 𝑖 at beginning of 𝑡
• 𝑞𝑖𝑡 = production of firm 𝑖 during period 𝑡
• 𝑝𝑖𝑡 = price charged by firm 𝑖 during period 𝑡
• 𝑆𝑖𝑡 = sales made by firm 𝑖 during period 𝑡
• 𝐸𝑖𝑡 = costs of production of firm 𝑖 during period 𝑡
• 𝐶𝑖𝑡 = costs of carrying inventories for firm 𝑖 during 𝑡
The firms’ cost functions are
• $C_{it} = c_{i1} + c_{i2} I_{it} + 0.5 c_{i3} I_{it}^2$
• $E_{it} = e_{i1} + e_{i2} q_{it} + 0.5 e_{i3} q_{it}^2$ where $e_{ij}, c_{ij}$ are positive scalars
Inventories obey the laws of motion

$$I_{i,t+1} = (1 - \delta) I_{it} + q_{it} - S_{it}$$

Demand is governed by the linear schedule
$$S_t = D p_t + b$$
where
• $S_t = \begin{bmatrix} S_{1t} & S_{2t} \end{bmatrix}'$ and $p_t = \begin{bmatrix} p_{1t} & p_{2t} \end{bmatrix}'$
• 𝐷 is a 2 × 2 negative definite matrix and
• 𝑏 is a vector of constants
Firm $i$ maximizes the undiscounted sum

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T} \left( p_{it} S_{it} - E_{it} - C_{it} \right)$$
We can convert this to a linear-quadratic problem by taking

$$u_{it} = \begin{bmatrix} p_{it} \\ q_{it} \end{bmatrix} \quad \text{and} \quad x_t = \begin{bmatrix} I_{1t} \\ I_{2t} \\ 1 \end{bmatrix}$$
Decision rules for price and quantity take the form 𝑢𝑖𝑡 = −𝐹𝑖 𝑥𝑡 .
The Markov perfect equilibrium of Judd’s model can be computed by filling in the matrices
appropriately.
The exercise is to calculate these matrices and compute the following figures.
The first figure shows the dynamics of inventories for each firm when the parameters are
In [7]: δ = 0.02
D = np.array([[-1, 0.5], [0.5, -1]])
b = np.array([25, 25])
c1 = c2 = np.array([1, -2, 1])
e1 = e2 = np.array([10, 10, 3])
70.7 Solutions
70.7.1 Exercise 1
First, let’s compute the duopoly MPE under the stated parameters
In [8]: # == Parameters == #
        a0 = 10.0
        a1 = 2.0
        β = 0.96
        γ = 12.0

        # == In LQ form == #
        A = np.eye(3)
        B1 = np.array([[0.], [1.], [0.]])
        B2 = np.array([[0.], [0.], [1.]])

        R1 = [[      0.,  -a0 / 2,       0.],
              [-a0 / 2.,       a1,  a1 / 2.],
              [      0.,  a1 / 2.,       0.]]

        R2 = [[      0.,       0.,  -a0 / 2],
              [      0.,       0.,  a1 / 2.],
              [-a0 / 2.,  a1 / 2.,       a1]]

        Q1 = Q2 = γ
        S1 = S2 = W1 = W2 = M1 = M2 = 0.0

        # == Solve using QE's nnash function == #
        F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1,
                                  Q2, S1, S2, W1, W2, M1,
                                  M2, beta=β)
Now we evaluate the time path of industry output and prices given initial condition 𝑞10 =
𝑞20 = 1.
In [9]: AF = A - B1 @ F1 - B2 @ F2
n = 20
x = np.empty((3, n))
x[:, 0] = 1, 1, 1
for t in range(n-1):
x[:, t+1] = AF @ x[:, t]
q1 = x[1, :]
q2 = x[2, :]
q = q1 + q2 # Total output, MPE
p = a0 - a1 * q # Price, MPE
The monopolist's problem can be mapped into a single-agent LQ problem with state and control

$$x_t = q_t - \bar{q} \quad \text{and} \quad u_t = q_{t+1} - q_t$$

where $\bar{q} := a_0 / (2 a_1)$, and with

$$R = a_1, \quad Q = \gamma, \quad A = B = 1$$
In [10]: R = a1
Q = γ
A = B = 1
lq_alt = qe.LQ(Q, R, A, B, beta=β)
P, F, d = lq_alt.stationary_values()
q_bar = a0 / (2.0 * a1)
qm = np.empty(n)
qm[0] = 2
x0 = qm[0] - q_bar
x = x0
for i in range(1, n):
x = A * x - B * F * x
qm[i] = float(x) + q_bar
pm = a0 - a1 * qm
In [11]: fig, axes = plt.subplots(2, 1, figsize=(9, 9))

         ax = axes[0]
ax.plot(qm, 'b-', lw=2, alpha=0.75, label='monopolist output')
ax.plot(q, 'g-', lw=2, alpha=0.75, label='MPE total output')
ax.set(ylabel="output", xlabel="time", ylim=(2, 4))
ax.legend(loc='upper left', frameon=0)
ax = axes[1]
ax.plot(pm, 'b-', lw=2, alpha=0.75, label='monopolist price')
ax.plot(p, 'g-', lw=2, alpha=0.75, label='MPE price')
ax.set(ylabel="price", xlabel="time")
ax.legend(loc='upper right', frameon=0)
plt.show()
70.7.2 Exercise 2
In [12]: δ = 0.02
D = np.array([[-1, 0.5], [0.5, -1]])
b = np.array([25, 25])
c1 = c2 = np.array([1, -2, 1])
e1 = e2 = np.array([10, 10, 3])
δ_1 = 1 - δ
Recall that the control and state for each firm are

$$u_{it} = \begin{bmatrix} p_{it} \\ q_{it} \end{bmatrix} \quad \text{and} \quad x_t = \begin{bmatrix} I_{1t} \\ I_{2t} \\ 1 \end{bmatrix}$$
S1 = np.zeros((2, 2))
S2 = np.copy(S1)
W1 = np.array([[ 0, 0],
[ 0, 0],
[-0.5 * e1[1], b[0] / 2.]])
W2 = np.array([[ 0, 0],
[ 0, 0],
[-0.5 * e2[1], b[1] / 2.]])
Now let’s look at the dynamics of inventories, and reproduce the graph corresponding to 𝛿 =
0.02
In [15]: AF = A - B1 @ F1 - B2 @ F2
n = 25
x = np.empty((3, n))
x[:, 0] = 2, 0, 1
for t in range(n-1):
x[:, t+1] = AF @ x[:, t]
I1 = x[0, :]
I2 = x[1, :]
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(I1, 'b-', lw=2, alpha=0.75, label='inventories, firm 1')
ax.plot(I2, 'g-', lw=2, alpha=0.75, label='inventories, firm 2')
ax.set_title(rf'$\delta = {δ}$')
ax.legend()
plt.show()
Chapter 71

Robust Markov Perfect Equilibrium
71.1 Contents
• Overview 71.2
• Linear Markov Perfect Equilibria with Robust Agents 71.3
• Application 71.4
Co-author: Dongchen Zou
In addition to what’s in Anaconda, this lecture will need the following libraries:
71.2 Overview
Decisions of two agents affect the motion of a state vector that appears as an argument of
payoff functions of both agents.
As described in Markov perfect equilibrium, when decision-makers have no concerns about
the robustness of their decision rules to misspecifications of the state dynamics, a Markov
perfect equilibrium can be computed via backward recursion on two sets of equations

• a pair of Bellman equations, one for each agent.
• a pair of equations that express linear decision rules for each agent as functions of that
agent's continuation value function as well as parameters of preferences and state transition matrices.
This lecture shows how a similar equilibrium concept and similar computational procedures
apply when we impute concerns about robustness to both decision-makers.
A Markov perfect equilibrium with robust agents will be characterized by

• a pair of Bellman equations, one for each agent.
• a pair of equations that express linear decision rules for each agent as functions of that
agent's continuation value function as well as parameters of preferences and state transition matrices.
• a pair of equations that express linear decision rules for worst-case shocks for each agent
as functions of that agent's continuation value function as well as parameters of preferences and state transition matrices.
Below, we’ll construct a robust firms version of the classic duopoly model with adjustment
costs analyzed in Markov perfect equilibrium.
As we saw in Markov perfect equilibrium, the study of Markov perfect equilibria in dynamic
games with two players leads us to an interrelated pair of Bellman equations.
In linear quadratic dynamic games, these "stacked Bellman equations" become "stacked Riccati equations" with a tractable mathematical structure.
We consider a general linear quadratic regulator game with two players, each of whom fears
model misspecifications.
We often call the players agents.
The agents share a common baseline model for the transition dynamics of the state vector

$$x_{t+1} = A x_t + B_1 u_{1t} + B_2 u_{2t}$$

But now one or more agents doubt that the baseline model is correctly specified.
The agents express the possibility that their baseline specification is incorrect by adding a
contribution $C v_{it}$ to the time $t$ transition law for the state.

In game $i$, player $i$ extremizes (minimizing with respect to $\{u_{it}\}$, maximizing with respect to $\{v_{it}\}$) the criterion

$$\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' R_i x_t + u_{it}' Q_i u_{it} + u_{-it}' S_i u_{-it} + 2 x_t' W_i u_{it} + 2 u_{-it}' M_i u_{it} - \theta_i v_{it}' v_{it} \right\} \tag{1}$$

subject to the distorted transition law

$$x_{t+1} = A x_t + B_1 u_{1t} + B_2 u_{2t} + C v_{it} \tag{2}$$
Here
• 𝑥𝑡 is an 𝑛 × 1 state vector, 𝑢𝑖𝑡 is a 𝑘𝑖 × 1 vector of controls for player 𝑖, and
• 𝑣𝑖𝑡 is an ℎ × 1 vector of distortions to the state dynamics that concern player 𝑖
• 𝑅𝑖 is 𝑛 × 𝑛
• 𝑆𝑖 is 𝑘−𝑖 × 𝑘−𝑖
• 𝑄𝑖 is 𝑘𝑖 × 𝑘𝑖
• 𝑊𝑖 is 𝑛 × 𝑘𝑖
• 𝑀𝑖 is 𝑘−𝑖 × 𝑘𝑖
• 𝐴 is 𝑛 × 𝑛
• 𝐵𝑖 is 𝑛 × 𝑘𝑖
• 𝐶 is 𝑛 × ℎ
• $\theta_i \in [\underline{\theta}_i, +\infty]$ is a scalar multiplier parameter of player $i$
If 𝜃𝑖 = +∞, player 𝑖 completely trusts the baseline model.
If $\theta_i < +\infty$, player $i$ suspects that some other unspecified model actually governs the transition dynamics.
The term $\theta_i v_{it}' v_{it}$ is a time $t$ contribution to an entropy penalty that an (imaginary) loss-maximizing agent inside agent $i$'s mind charges for distorting the law of motion in a way that harms agent $i$.
Player 𝑖 employs linear decision rules 𝑢𝑖𝑡 = −𝐹𝑖𝑡 𝑥𝑡 , where 𝐹𝑖𝑡 is a 𝑘𝑖 × 𝑛 matrix.
Player 𝑖’s malevolent alter ego employs decision rules 𝑣𝑖𝑡 = 𝐾𝑖𝑡 𝑥𝑡 where 𝐾𝑖𝑡 is an ℎ × 𝑛 ma-
trix.
A robust Markov perfect equilibrium is a pair of sequences {𝐹1𝑡 , 𝐹2𝑡 } and a pair of sequences
{𝐾1𝑡 , 𝐾2𝑡 } over 𝑡 = 𝑡0 , … , 𝑡1 − 1 that satisfy
• {𝐹1𝑡 , 𝐾1𝑡 } solves player 1’s robust decision problem, taking {𝐹2𝑡 } as given, and
• {𝐹2𝑡 , 𝐾2𝑡 } solves player 2’s robust decision problem, taking {𝐹1𝑡 } as given.
If we substitute 𝑢2𝑡 = −𝐹2𝑡 𝑥𝑡 into (1) and (2), then player 1’s problem becomes
minimization-maximization of
$$\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' \Pi_{1t} x_t + u_{1t}' Q_1 u_{1t} + 2 u_{1t}' \Gamma_{1t} x_t - \theta_1 v_{1t}' v_{1t} \right\} \tag{3}$$
subject to

$$x_{t+1} = \Lambda_{1t} x_t + B_1 u_{1t} + C v_{1t} \tag{4}$$
where

• $\Lambda_{it} := A - B_{-i} F_{-it}$
• $\Pi_{it} := R_i + F_{-it}' S_i F_{-it}$
• $\Gamma_{it} := W_i' - M_i' F_{-it}$
This is an LQ robust dynamic programming problem of the type studied in the Robustness
lecture, which can be solved by working backward.

Maximization with respect to distortion $v_{1t}$ leads to the following version of the $\mathcal{D}$ operator
from the Robustness lecture, namely

$$\mathcal{D}_1(P) := P + P C (\theta_1 I - C' P C)^{-1} C' P \tag{5}$$
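As a quick check of (5), here is a minimal sketch of this operator in code (a direct transcription of the formula, not the lecture's implementation):

In [2]: import numpy as np
        from scipy.linalg import solve

        def D1(P, C, θ1):
            """Apply the operator in (5): D₁(P) = P + P C (θ₁ I - C'PC)⁻¹ C'P."""
            I = np.eye(C.shape[1])
            return P + P @ C @ solve(θ1 * I - C.T @ P @ C, I) @ C.T @ P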
The matrix $F_{1t}$ in the policy rule $u_{1t} = -F_{1t} x_t$ that solves agent 1's problem satisfies

$$F_{1t} = (Q_1 + \beta B_1' \mathcal{D}_1(P_{1t+1}) B_1)^{-1} (\beta B_1' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} + \Gamma_{1t}) \tag{6}$$

$$P_{1t} = \Pi_{1t} - (\beta B_1' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} + \Gamma_{1t})' (Q_1 + \beta B_1' \mathcal{D}_1(P_{1t+1}) B_1)^{-1} (\beta B_1' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} + \Gamma_{1t}) + \beta \Lambda_{1t}' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} \tag{7}$$

Similarly, the policy that solves player 2's problem is

$$F_{2t} = (Q_2 + \beta B_2' \mathcal{D}_2(P_{2t+1}) B_2)^{-1} (\beta B_2' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} + \Gamma_{2t}) \tag{8}$$

$$P_{2t} = \Pi_{2t} - (\beta B_2' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} + \Gamma_{2t})' (Q_2 + \beta B_2' \mathcal{D}_2(P_{2t+1}) B_2)^{-1} (\beta B_2' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} + \Gamma_{2t}) + \beta \Lambda_{2t}' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} \tag{9}$$
As in Markov perfect equilibrium, a key insight here is that equations (6) and (8) are linear
in 𝐹1𝑡 and 𝐹2𝑡 .
After these equations have been solved, we can take 𝐹𝑖𝑡 and solve for 𝑃𝑖𝑡 in (7) and (9).
Notice how 𝑗’s control law 𝐹𝑗𝑡 is a function of {𝐹𝑖𝑠 , 𝑠 ≥ 𝑡, 𝑖 ≠ 𝑗}.
Thus, agent 𝑖’s choice of {𝐹𝑖𝑡 ; 𝑡 = 𝑡0 , … , 𝑡1 − 1} influences agent 𝑗’s choice of control laws.
However, in the Markov perfect equilibrium of this game, each agent is assumed to ignore the
influence that his choice exerts on the other agent’s choice.
After these equations have been solved, we can also deduce associated sequences of worst-case
shocks.
$$v_{it} = K_{it} x_t \quad \text{where} \quad K_{it} = \theta_i^{-1} (I - \theta_i^{-1} C' P_{it+1} C)^{-1} C' P_{it+1} (A - B_1 F_{1t} - B_2 F_{2t})$$
71.3.5 Infinite Horizon

We often want to compute the solutions of such games for infinite horizons, in the hope that
the decision rules 𝐹𝑖𝑡 settle down to be time-invariant as 𝑡1 → +∞.
In practice, we usually fix 𝑡1 and compute the equilibrium of an infinite horizon game by driv-
ing 𝑡0 → −∞.
This is the approach we adopt in the next section.
71.3.6 Implementation
We use the function nnash_robust to compute a Markov perfect equilibrium of the infinite
horizon linear-quadratic dynamic game with robust planners in the manner described above.
71.4 Application
Without concerns for robustness, the model is identical to the duopoly model from the
Markov perfect equilibrium lecture.
To begin, we briefly review the structure of that model.
Two firms are the only producers of a good the demand for which is governed by a linear in-
verse demand function
𝑝 = 𝑎0 − 𝑎1 (𝑞1 + 𝑞2 ) (10)
Here 𝑝 = 𝑝𝑡 is the price of the good, 𝑞𝑖 = 𝑞𝑖𝑡 is the output of firm 𝑖 = 1, 2 at time 𝑡 and
𝑎0 > 0, 𝑎1 > 0.
In (10) and what follows,
• the time subscript is suppressed when possible to simplify notation
• 𝑥̂ denotes a next period value of variable 𝑥
Each firm recognizes that its output affects total output and therefore the market price.
The one-period payoff function of firm $i$ is price times quantity minus adjustment costs:

$$\pi_{it} = p_t q_{it} - \gamma (q_{i,t+1} - q_{it})^2, \quad \gamma > 0 \tag{11}$$

Substituting the inverse demand curve (10) into (11) lets us express the one-period payoff as

$$\pi_{it} = a_0 q_{it} - a_1 q_{it}^2 - a_1 q_{it} q_{-it} - \gamma (q_{i,t+1} - q_{it})^2$$
To map the duopoly model into coupled linear-quadratic dynamic programming problems, define the state and controls as

$$x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix} \quad \text{and} \quad u_{it} := q_{i,t+1} - q_{it}, \quad i = 1, 2$$
If we write

$$x_t' R_i x_t + u_{it}' Q_i u_{it}$$

where $Q_1 = Q_2 = \gamma$,

$$R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix} \quad \text{and} \quad R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix}$$
then we recover the one-period payoffs (11) for the two firms in the duopoly model.
The law of motion for the state 𝑥𝑡 is 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵1 𝑢1𝑡 + 𝐵2 𝑢2𝑡 where
$$A := \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad B_1 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad B_2 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
A robust decision rule of firm $i$ will take the form $u_{it} = -F_i x_t$, inducing the following closed-loop system for the evolution of $x$ in the Markov perfect equilibrium:

$$x_{t+1} = (A - B_1 F_1 - B_2 F_2) x_t$$
In [3]: """
        @authors: Chase Coleman, Thomas Sargent, John Stachurski
        """
        import numpy as np
        import quantecon as qe

        # Parameters
        a0 = 10.0
        a1 = 2.0
        β = 0.96
        γ = 12.0

        # In LQ form
        A = np.eye(3)
        B1 = np.array([[0.], [1.], [0.]])
        B2 = np.array([[0.], [0.], [1.]])

        R1 = [[      0.,  -a0 / 2,       0.],
              [-a0 / 2.,       a1,  a1 / 2.],
              [      0.,  a1 / 2.,       0.]]

        R2 = [[      0.,       0.,  -a0 / 2],
              [      0.,       0.,  a1 / 2.],
              [-a0 / 2.,  a1 / 2.,       a1]]

        Q1 = Q2 = γ
        S1 = S2 = W1 = W2 = M1 = M2 = 0.0

        # Solve using QE's nnash function
        F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1,
                                  Q2, S1, S2, W1, W2, M1,
                                  M2, beta=β)

        # Display policies
        print("Computed policies for firm 1 and firm 2:\n")
        print(f"F1 = {F1}")
        print(f"F2 = {F2}")
        print("\n")
We add robustness concerns to the Markov perfect equilibrium model by extending the function qe.nnash into a robust version, adding the maximization operator $\mathcal{D}(P)$ into the backward induction.

The MPE with robustness function is nnash_robust.

The function's code is as follows
In [4]: from scipy.linalg import solve

        def nnash_robust(A, C, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2,
                         θ1, θ2, beta=1.0, tol=1e-8, max_iter=1000):
            """
            Compute the limit of a Nash linear quadratic dynamic game with
            robustness concerns.

            Each player's objective includes the entropy penalty term

                \sum_{t=0}^{\infty} \beta^{t+1} \theta_i w_{it+1}' w_{it+1}
Parameters
----------
A : scalar(float) or array_like(float)
Corresponds to the MPE equations, should be of size (n, n)
C : scalar(float) or array_like(float)
As above, size (n, c), c is the size of w
B1 : scalar(float) or array_like(float)
As above, size (n, k_1)
B2 : scalar(float) or array_like(float)
As above, size (n, k_2)
R1 : scalar(float) or array_like(float)
As above, size (n, n)
R2 : scalar(float) or array_like(float)
As above, size (n, n)
Q1 : scalar(float) or array_like(float)
As above, size (k_1, k_1)
Q2 : scalar(float) or array_like(float)
As above, size (k_2, k_2)
S1 : scalar(float) or array_like(float)
As above, size (k_1, k_1)
S2 : scalar(float) or array_like(float)
As above, size (k_2, k_2)
W1 : scalar(float) or array_like(float)
As above, size (n, k_1)
W2 : scalar(float) or array_like(float)
As above, size (n, k_2)
M1 : scalar(float) or array_like(float)
As above, size (k_2, k_1)
M2 : scalar(float) or array_like(float)
As above, size (k_1, k_2)
θ1 : scalar(float)
Robustness parameter of player 1
θ2 : scalar(float)
Robustness parameter of player 2
beta : scalar(float), optional(default=1.0)
Discount factor
tol : scalar(float), optional(default=1e-8)
This is the tolerance level for convergence
max_iter : scalar(int), optional(default=1000)
This is the maximum number of iterations allowed
Returns
-------
F1 : array_like, dtype=float, shape=(k_1, n)
Feedback law for agent 1
F2 : array_like, dtype=float, shape=(k_2, n)
Feedback law for agent 2
            P1 : array_like, dtype=float, shape=(n, n)
                The steady-state solution to the associated discrete matrix
                Riccati equation for player 1
            P2 : array_like, dtype=float, shape=(n, n)
                The steady-state solution to the associated discrete matrix
                Riccati equation for player 2
            """
# Initial values
n = A.shape[0]
k_1 = B1.shape[1]
k_2 = B2.shape[1]
v1 = np.eye(k_1)
v2 = np.eye(k_2)
P1 = np.eye(n) * 1e-5
P2 = np.eye(n) * 1e-5
F1 = np.random.randn(k_1, n)
F2 = np.random.randn(k_2, n)
for it in range(max_iter):
# Update
F10 = F1
F20 = F2
I = np.eye(C.shape[1])
# D1(P1)
# Note: INV1 may not be solved if the matrix is singular
INV1 = solve(θ1 * I - C.T @ P1 @ C, I)
D1P1 = P1 + P1 @ C @ INV1 @ C.T @ P1
# D2(P2)
# Note: INV2 may not be solved if the matrix is singular
INV2 = solve(θ2 * I - C.T @ P2 @ C, I)
D2P2 = P2 + P2 @ C @ INV2 @ C.T @ P2
Λ1 = A - B2 @ F2
Λ2 = A - B1 @ F1
Π1 = R1 + F2.T @ S1 @ F2
Π2 = R2 + F1.T @ S2 @ F1
Γ1 = W1.T - M1.T @ F2
Γ2 = W2.T - M2.T @ F1
                # Compute F1 and F2
                F1 = solve(Q1 + beta * B1.T @ D1P1 @ B1,
                           beta * B1.T @ D1P1 @ Λ1 + Γ1)
                F2 = solve(Q2 + beta * B2.T @ D2P2 @ B2,
                           beta * B2.T @ D2P2 @ Λ2 + Γ2)

                # Update P1 and P2
                P1 = Π1 - (beta * B1.T @ D1P1 @ Λ1 + Γ1).T @ F1 + \
                     beta * Λ1.T @ D1P1 @ Λ1
                P2 = Π2 - (beta * B2.T @ D2P2 @ Λ2 + Γ2).T @ F2 + \
                     beta * Λ2.T @ D2P2 @ Λ2

                # Check convergence
                dd = np.max(np.abs(F10 - F1)) + np.max(np.abs(F20 - F2))
                if dd < tol:
                    break
            else:
                raise ValueError(f'No convergence: Iteration limit of {max_iter} \
reached in nnash_robust')

            return F1, F2, P1, P2
$$\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' R_i x_t + u_{it}' Q_i u_{it} + u_{-it}' S_i u_{-it} + 2 x_t' W_i u_{it} + 2 u_{-it}' M_i u_{it} \right\}$$

where

$$x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix} \quad \text{and} \quad u_{it} := q_{i,t+1} - q_{it}, \quad i = 1, 2$$

and

$$R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix}, \quad R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix}, \quad Q_1 = Q_2 = \gamma, \quad S_1 = S_2 = 0, \quad W_1 = W_2 = 0, \quad M_1 = M_2 = 0$$
• 𝑎0 = 10
• 𝑎1 = 2
• 𝛽 = 0.96
• 𝛾 = 12
In [5]: # Parameters
        a0 = 10.0
        a1 = 2.0
        β = 0.96
        γ = 12.0

        # In LQ form
        A = np.eye(3)
        B1 = np.array([[0.], [1.], [0.]])
        B2 = np.array([[0.], [0.], [1.]])

        R1 = [[      0.,  -a0 / 2,       0.],
              [-a0 / 2.,       a1,  a1 / 2.],
              [      0.,  a1 / 2.,       0.]]

        R2 = [[      0.,       0.,  -a0 / 2],
              [      0.,       0.,  a1 / 2.],
              [-a0 / 2.,  a1 / 2.,       a1]]

        Q1 = Q2 = γ
        S1 = S2 = W1 = W2 = M1 = M2 = 0.0
Consistency Check
We first conduct a comparison test to check if nnash_robust agrees with qe.nnash in the
non-robustness case in which each 𝜃𝑖 ≈ +∞
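A minimal sketch of such a check (the zero $C$ matrix and the large $\theta_i$ values below are illustrative stand-ins for "no misspecification channel" and $\theta_i \approx +\infty$):

In [6]: # nnash_robust expects matrices, so use explicit zero arrays here
        S1 = S2 = np.zeros((1, 1))
        W1 = W2 = np.zeros((3, 1))
        M1 = M2 = np.zeros((1, 1))

        C = np.zeros((3, 1))       # shuts down the misspecification channel
        θ1 = θ2 = 1e10             # approximates θi = +∞

        F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1, Q2,
                                  S1, S2, W1, W2, M1, M2, beta=β)
        F1r, F2r, P1r, P2r = nnash_robust(A, C, B1, B2, R1, R2, Q1, Q2,
                                          S1, S2, W1, W2, M1, M2,
                                          θ1, θ2, beta=β)

        print('F1 close to F1r:', np.allclose(F1, F1r))
        print('F2 close to F2r:', np.allclose(F2, F2r))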
We can see that the results are consistent across the two functions.
We want to compare the dynamics of price and output under the baseline MPE model with
those under the baseline model under the robust decision rules within the robust MPE.
This means that we simulate the state dynamics under the MPE equilibrium closed-loop
transition matrix
𝐴𝑜 = 𝐴 − 𝐵1 𝐹1 − 𝐵2 𝐹2
where $F_1$ and $F_2$ are the firms' robust decision rules within the robust Markov perfect equilibrium
• by simulating under the baseline model transition dynamics and the robust
MPE rules we are assuming that at the end of the day firms' concerns
about misspecification of the baseline model do not materialize.
• a short way of saying this is that misspecification fears are all ‘just in the minds’ of the
firms.
• simulating under the baseline model is a common practice in the literature.
• note that some assumption about the model that actually governs the data has to be
made in order to create a simulation.
• later we will describe the (erroneous) beliefs of the two firms that justify their robust
decisions as best responses to transition laws that are distorted relative to the baseline
model.
After simulating 𝑥𝑡 under the baseline transition dynamics and robust decision rules 𝐹𝑖 , 𝑖 =
1, 2, we extract and plot industry output 𝑞𝑡 = 𝑞1𝑡 + 𝑞2𝑡 and price 𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡 .
Here we set the robustness and volatility matrix parameters as follows:
• 𝜃1 = 0.02
• 𝜃2 = 0.04
• $C = \begin{pmatrix} 0 \\ 0.01 \\ 0.01 \end{pmatrix}$
Because we have set $\theta_1 < \theta_2 < +\infty$ we know that

• both firms fear that the baseline specification of the state transition dynamics is incorrect.
• firm 1 fears misspecification more than firm 2.
The following code prepares graphs that compare market-wide output 𝑞1𝑡 + 𝑞2𝑡 and the price
of the good 𝑝𝑡 under equilibrium decision rules 𝐹𝑖 , 𝑖 = 1, 2 from an ordinary Markov perfect
equilibrium and the decision rules under a Markov perfect equilibrium with robust firms with
multiplier parameters 𝜃𝑖 , 𝑖 = 1, 2 set as described above.
Both industry output and price are under the transition dynamics associated with the base-
line model; only the decision rules 𝐹𝑖 differ across the two equilibrium objects presented.
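A minimal sketch of how these series can be generated, assuming the parameter values above (variable names are chosen to match the plotting code below):

In [7]: import matplotlib.pyplot as plt

        # Robust decision rules, given the C and θi values above
        C = np.array([[0.], [0.01], [0.01]])
        θ1, θ2 = 0.02, 0.04
        F1r, F2r, P1r, P2r = nnash_robust(A, C, B1, B2, R1, R2, Q1, Q2,
                                          S1, S2, W1, W2, M1, M2,
                                          θ1, θ2, beta=β)

        # Closed loops under the baseline transition dynamics
        AF = A - B1 @ F1 - B2 @ F2          # ordinary MPE
        AFr = A - B1 @ F1r - B2 @ F2r       # robust MPE

        n = 20
        x = np.empty((3, n))
        xr = np.empty((3, n))
        x[:, 0] = xr[:, 0] = 1, 1, 1
        for t in range(n - 1):
            x[:, t+1] = AF @ x[:, t]
            xr[:, t+1] = AFr @ xr[:, t]

        q = x[1, :] + x[2, :]               # MPE industry output
        p = a0 - a1 * q                     # MPE price
        qr = xr[1, :] + xr[2, :]            # robust MPE industry output
        pr = a0 - a1 * qr                   # robust MPE price

        fig, axes = plt.subplots(2, 1, figsize=(9, 9))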
ax = axes[0]
ax.plot(q, 'g-', lw=2, alpha=0.75, label='MPE output')
ax.plot(qr, 'm-', lw=2, alpha=0.75, label='RMPE output')
ax.set(ylabel="output", xlabel="time", ylim=(2, 4))
ax.legend(loc='upper left', frameon=0)
ax = axes[1]
ax.plot(p, 'g-', lw=2, alpha=0.75, label='MPE price')
ax.plot(pr, 'm-', lw=2, alpha=0.75, label='RMPE price')
ax.set(ylabel="price", xlabel="time")
ax.legend(loc='upper right', frameon=0)
plt.show()
Under the dynamics associated with the baseline model, the price path is higher with the
Markov perfect equilibrium robust decision rules than it is with decision rules for the ordinary
Markov perfect equilibrium.
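For the firm-level comparison, a short sketch assuming x and xr from the simulation sketch above:

In [8]: # Firm-level outputs under the two equilibria
        q1, q2 = x[1, :], x[2, :]
        qr1, qr2 = xr[1, :], xr[2, :]

        fig, axes = plt.subplots(2, 1, figsize=(9, 9))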
ax = axes[0]
ax.plot(q1, 'g-', lw=2, alpha=0.75, label='firm 1 MPE output')
ax.plot(qr1, 'b-', lw=2, alpha=0.75, label='firm 1 RMPE output')
ax.set(ylabel="output", xlabel="time", ylim=(1, 2))
ax.legend(loc='upper left', frameon=0)
ax = axes[1]
ax.plot(q2, 'g-', lw=2, alpha=0.75, label='firm 2 MPE output')
ax.plot(qr2, 'r-', lw=2, alpha=0.75, label='firm 2 RMPE output')
ax.set(ylabel="output", xlabel="time", ylim=(1, 2))
ax.legend(loc='upper left', frameon=0)
plt.show()
Evidently, firm 1's output path is substantially lower when firms are robust, while firm
2's output path is virtually the same as it would be in an ordinary Markov perfect equilibrium with no robust firms.
Recall that we have set 𝜃1 = .02 and 𝜃2 = .04, so that firm 1 fears misspecification of the
baseline model substantially more than does firm 2
• but also please notice that firm 2’s behavior in the Markov perfect equilibrium with ro-
bust firms responds to the decision rule 𝐹1 𝑥𝑡 employed by firm 1.
• thus it is something of a coincidence that its output is almost the same in the two equi-
libria.
Larger concerns about misspecification induce firm 1 to be more cautious than firm 2 in pre-
dicting market price and the output of the other firm.
To explore this, we study next how ex-post the two firms’ beliefs about state dynamics differ
in the Markov perfect equilibrium with robust firms.
(by ex-post we mean after extremization of each firm’s intertemporal objective)
Heterogeneous Beliefs
As before, let $A^o = A - B_1 F_1^r - B_2 F_2^r$, where in a robust MPE, $F_i^r$ is a robust decision rule for firm $i$.
Worst-case forecasts of 𝑥𝑡 starting from 𝑡 = 0 differ between the two firms.
This means that worst-case forecasts of industry output 𝑞1𝑡 + 𝑞2𝑡 and price 𝑝𝑡 also differ be-
tween the two firms.
To find these worst-case beliefs, we compute the following three “closed-loop” transition ma-
trices
• $A^o$
• $A^o + C K_1$
• $A^o + C K_2$
We call the first transition law, namely, 𝐴𝑜 , the baseline transition under firms’ robust deci-
sion rules.
We call the second and third the worst-case transitions under robust decision rules for firms 1 and 2.
From {𝑥𝑡 } paths generated by each of these transition laws, we pull off the associated price
and total output sequences.
The following code plots them
In [11]: # == Plot == #
fig, axes = plt.subplots(2, 1, figsize=(9, 9))
ax = axes[0]
ax.plot(qrp1, 'b--', lw=2, alpha=0.75,
label='RMPE worst-case belief output player 1')
ax.plot(qrp2, 'r:', lw=2, alpha=0.75,
label='RMPE worst-case belief output player 2')
ax.plot(qr, 'm-', lw=2, alpha=0.75, label='RMPE output')
ax.set(ylabel="output", xlabel="time", ylim=(2, 4))
ax.legend(loc='upper left', frameon=0)
ax = axes[1]
ax.plot(prp1, 'b--', lw=2, alpha=0.75,
label='RMPE worst-case belief price player 1')
ax.plot(prp2, 'r:', lw=2, alpha=0.75,
label='RMPE worst-case belief price player 2')
ax.plot(pr, 'm-', lw=2, alpha=0.75, label='RMPE price')
ax.set(ylabel="price", xlabel="time")
ax.legend(loc='upper right', frameon=0)
plt.show()
We see from the above graph that under robustness concerns, player 1 and player 2 have het-
erogeneous beliefs about total output and the goods price even though they share the same
baseline model and information
• firm 1 thinks that total output will be higher and price lower than does firm
2
Chapter 72

Uncertainty Traps
72.1 Contents
• Overview 72.2
• The Model 72.3
• Implementation 72.4
• Results 72.5
• Exercises 72.6
• Solutions 72.7
72.2 Overview
The original model described in [55] has many interesting moving parts.
Here we examine a simplified version that nonetheless captures many of the key ideas.
72.3.1 Fundamentals

The evolution of the fundamental process $\{\theta_t\}$ is given by

$$\theta_{t+1} = \rho \theta_t + \sigma_\theta w_{t+1}$$

where

• $\sigma_\theta > 0$ and $0 < \rho < 1$
• $\{w_t\}$ is IID and standard normal
The random variable 𝜃𝑡 is not observable at any time.
72.3.2 Output

There is a total $\bar{M}$ of risk-averse entrepreneurs. Output of the $m$-th entrepreneur, conditional on being active in the market at time $t$, equals

$$x_m = \theta + \epsilon_m \quad \text{where} \quad \epsilon_m \sim N(0, \gamma_x^{-1}) \tag{1}$$

72.3.3 Beliefs

Dropping time subscripts, beliefs for current $\theta$ are represented by the normal distribution
$N(\mu, \gamma^{-1})$.
Here 𝛾 is the precision of beliefs; its inverse is the degree of uncertainty.
These parameters are updated by Kalman filtering.
Let
• 𝕄 ⊂ {1, … , 𝑀̄ } denote the set of currently active firms.
• 𝑀 ∶= |𝕄| denote the number of currently active firms.
• $X$ be the average output $\frac{1}{M} \sum_{m \in \mathbb{M}} x_m$ of the active firms.
With this notation and primes for next period values, we can write the updating of the mean
and precision via
$$\mu' = \rho \, \frac{\gamma \mu + M \gamma_x X}{\gamma + M \gamma_x} \tag{2}$$

$$\gamma' = \left( \frac{\rho^2}{\gamma + M \gamma_x} + \sigma_\theta^2 \right)^{-1} \tag{3}$$
These are standard Kalman filtering results applied to the current setting.
Exercise 1 provides more details on how (2) and (3) are derived and then asks you to fill in
remaining steps.
The next figure plots the law of motion for the precision in (3) as a 45 degree diagram, with
one curve for each 𝑀 ∈ {0, … , 6}.
The other parameter values are 𝜌 = 0.99, 𝛾𝑥 = 0.5, 𝜎𝜃 = 0.5
Points where the curves hit the 45 degree lines are long-run steady states for precision for dif-
ferent values of 𝑀 .
Thus, if one of these values for 𝑀 remains fixed, a corresponding steady state is the equilib-
rium level of precision
• high values of 𝑀 correspond to greater information about the fundamental, and hence
more precision in steady state
• low values of 𝑀 correspond to less information and more uncertainty in steady state
In practice, as we’ll see, the number of active firms fluctuates stochastically.
72.3.4 Participation
Omitting time subscripts once more, entrepreneurs enter the market in the current period if

$$\mathbb{E}[u(x_m - F_m)] > c \tag{4}$$

Here
• the mathematical expectation of 𝑥𝑚 is based on (1) and beliefs 𝑁 (𝜇, 𝛾 −1 ) for 𝜃
• 𝐹𝑚 is a stochastic but pre-visible fixed cost, independent across time and firms
• 𝑐 is a constant reflecting opportunity costs
The statement that $F_m$ is pre-visible means that it is realized at the start of the period and treated as a constant in the participation decision.

The utility function has the constant absolute risk aversion form

$$u(x) = \frac{1}{a} \left( 1 - \exp(-a x) \right) \tag{5}$$
Combining (4) and (5), entrepreneur $m$ participates if

$$\frac{1}{a} \left\{ 1 - \mathbb{E}\left[ \exp\left( -a(\theta + \epsilon_m - F_m) \right) \right] \right\} > c$$
Using standard formulas for expectations of lognormal random variables, this is equivalent to
the condition

$$\psi(\mu, \gamma, F_m) := \frac{1}{a} \left( 1 - \exp\left( -a\mu + a F_m + \frac{a^2 \left( \frac{1}{\gamma} + \frac{1}{\gamma_x} \right)}{2} \right) \right) - c > 0 \tag{6}$$
72.4 Implementation

In [1]: import numpy as np
        import matplotlib.pyplot as plt

In [2]: class UncertaintyTrapEcon:

            def __init__(self,
a=1.5, # Risk aversion
γ_x=0.5, # Production shock precision
ρ=0.99, # Correlation coefficient for θ
σ_θ=0.5, # Standard dev of θ shock
num_firms=100, # Number of firms
σ_F=1.5, # Standard dev of fixed costs
c=-420, # External opportunity cost
μ_init=0, # Initial value for μ
γ_init=4, # Initial value for γ
θ_init=0): # Initial value for θ
                # == Record values == #
                self.a, self.γ_x, self.ρ, self.σ_θ = a, γ_x, ρ, σ_θ
                self.num_firms, self.σ_F, self.c = num_firms, σ_F, c
                self.σ_x = np.sqrt(1 / γ_x)

                # == Initialize states == #
                self.γ, self.μ, self.θ = γ_init, μ_init, θ_init
def gen_aggregates(self):
"""
Generate aggregates based on current beliefs (μ, γ). This
is a simulation step that depends on the draws for F.
"""
F_vals = self.σ_F * np.random.randn(self.num_firms)
M = np.sum(self.ψ(F_vals) > 0) # Counts number of active firms
if M > 0:
x_vals = self.θ + self.σ_x * np.random.randn(M)
X = x_vals.mean()
else:
X = 0
return X, M
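The simulation below also calls ψ, update_beliefs and update_θ. A minimal sketch of these methods (to sit inside the class), consistent with (2), (3) and (6), with implementation details assumed:

    def ψ(self, F):
        """Evaluate the participation criterion (6) at fixed cost(s) F."""
        temp1 = -self.a * (self.μ - F)
        temp2 = self.a**2 * (1 / self.γ + 1 / self.γ_x) / 2
        return (1 / self.a) * (1 - np.exp(temp1 + temp2)) - self.c

    def update_beliefs(self, X, M):
        """Update beliefs (μ, γ) via (2) and (3), given aggregates X and M."""
        γ_x, ρ, σ_θ = self.γ_x, self.ρ, self.σ_θ
        self.μ = ρ * (self.γ * self.μ + M * γ_x * X) / (self.γ + M * γ_x)
        self.γ = 1 / (ρ**2 / (self.γ + M * γ_x) + σ_θ**2)

    def update_θ(self, w):
        """Update the fundamental, given a standard normal shock w."""
        self.θ = self.ρ * self.θ + self.σ_θ * w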
In the results below we use this code to simulate time series for the major variables.
72.5 Results
Let’s look first at the dynamics of 𝜇, which the agents use to track 𝜃
We see that 𝜇 tracks 𝜃 well when there are sufficient firms in the market.
However, there are times when 𝜇 tracks 𝜃 poorly due to insufficient information.
These are episodes where the uncertainty traps take hold.
During these episodes
• precision is low and uncertainty is high
• few firms are in the market
To get a clearer idea of the dynamics, let’s look at all the main time series at once, for a given
set of shocks
Notice how the traps only take hold after a sequence of bad draws for the fundamental.
Thus, the model gives us a propagation mechanism that maps bad random draws into long
downturns in economic activity.
72.6 Exercises
72.6.1 Exercise 1
Fill in the details behind (2) and (3) based on the following standard result (see, e.g., p. 24 of
[166]).
Fact Let $\mathbf{x} = (x_1, \ldots, x_M)$ be a vector of IID draws from common distribution $N(\theta, 1/\gamma_x)$ and
let $\bar{x}$ be the sample mean. If $\gamma_x$ is known and the prior for $\theta$ is $N(\mu, 1/\gamma)$, then the posterior
distribution of $\theta$ given $\mathbf{x}$ is

$$N(\mu_0, 1/\gamma_0)$$

where

$$\mu_0 = \frac{\mu \gamma + M \bar{x} \gamma_x}{\gamma + M \gamma_x} \quad \text{and} \quad \gamma_0 = \gamma + M \gamma_x$$
72.6.2 Exercise 2

Modulo randomness, replicate the simulation figures shown above, using the parameter values listed as defaults in the __init__ method of the UncertaintyTrapEcon class.
72.7 Solutions
72.7.1 Exercise 1
This exercise asked you to validate the laws of motion for 𝛾 and 𝜇 given in the lecture, based
on the stated result about Bayesian updating in a scalar Gaussian setting. The stated result
tells us that after observing average output 𝑋 of the 𝑀 firms, our posterior beliefs will be
𝑁 (𝜇0 , 1/𝛾0 )
where

$$\mu_0 = \frac{\mu \gamma + M X \gamma_x}{\gamma + M \gamma_x} \quad \text{and} \quad \gamma_0 = \gamma + M \gamma_x$$
If we take a random variable 𝜃 with this distribution and then evaluate the distribution of
𝜌𝜃 + 𝜎𝜃 𝑤 where 𝑤 is independent and standard normal, we get the expressions for 𝜇′ and 𝛾 ′
given in the lecture.
72.7.2 Exercise 2
First, let’s replicate the plot that illustrates the law of motion for precision, which is
$$\gamma_{t+1} = \left( \frac{\rho^2}{\gamma_t + M \gamma_x} + \sigma_\theta^2 \right)^{-1}$$
Here 𝑀 is the number of active firms. The next figure plots 𝛾𝑡+1 against 𝛾𝑡 on a 45 degree
diagram for different values of 𝑀
In [3]: econ = UncertaintyTrapEcon()
        ρ, σ_θ, γ_x = econ.ρ, econ.σ_θ, econ.γ_x   # Simplify names
        γ = np.linspace(1e-10, 3, 200)             # γ grid
        fig, ax = plt.subplots(figsize=(9, 9))
        ax.plot(γ, γ, 'k-')                        # 45 degree line

        for M in range(7):
γ_next = 1 / (ρ**2 / (γ + M * γ_x) + σ_θ**2)
label_string = f"$M = {M}$"
ax.plot(γ, γ_next, lw=2, label=label_string)
ax.legend(loc='lower right', fontsize=14)
ax.set_xlabel(r'$\gamma$', fontsize=16)
ax.set_ylabel(r"$\gamma'$", fontsize=16)
ax.grid()
plt.show()
The points where the curves hit the 45 degree lines are the long-run steady states correspond-
ing to each 𝑀 , if that value of 𝑀 was to remain fixed. As the number of firms falls, so does
the long-run steady state of precision.
Next let’s generate time series for beliefs and the aggregates – that is, the number of active
firms and average output
In [4]: sim_length=2000
μ_vec = np.empty(sim_length)
θ_vec = np.empty(sim_length)
γ_vec = np.empty(sim_length)
X_vec = np.empty(sim_length)
M_vec = np.empty(sim_length)
μ_vec[0] = econ.μ
γ_vec[0] = econ.γ
θ_vec[0] = 0
w_shocks = np.random.randn(sim_length)
for t in range(sim_length-1):
X, M = econ.gen_aggregates()
X_vec[t] = X
M_vec[t] = M
econ.update_beliefs(X, M)
econ.update_θ(w_shocks[t])
μ_vec[t+1] = econ.μ
γ_vec[t+1] = econ.γ
θ_vec[t+1] = econ.θ
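A sketch of the plotting step that produces the panels described above (the panel layout is assumed):

        fig, axes = plt.subplots(4, 1, figsize=(12, 20))
        plot_data = zip(axes, (μ_vec, γ_vec, M_vec, X_vec), ('μ', 'γ', 'M', 'X'))
        for ax, vals, name in plot_data:
            ax.plot(range(sim_length), vals, alpha=0.6, lw=2)
            ax.set_title(f"time series for {name}")
            ax.grid()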
plt.show()
If you run the code above you’ll get different plots, of course.
Try experimenting with different parameters to see the effects on the time series.
(It would also be interesting to experiment with non-Gaussian distributions for the shocks,
but this is a big exercise since it takes us outside the world of the standard Kalman filter)
Chapter 73

The Aiyagari Model
73.1 Contents
• Overview 73.2
• The Economy 73.3
• Firms 73.4
• Code 73.5
In addition to what’s in Anaconda, this lecture will need the following libraries:
73.2 Overview
In this lecture, we describe the structure of a class of models that build on work by Truman
Bewley [21].
We begin by discussing an example of a Bewley model due to Rao Aiyagari.
The model features
• Heterogeneous agents
• A single exogenous vehicle for borrowing and lending
• Limits on amounts individual agents may borrow
The Aiyagari model has been used to investigate many topics, including
• precautionary savings and the effect of liquidity constraints [6]
• risk sharing and asset pricing [82]
• the shape of the wealth distribution [18]
• etc., etc., etc.
Let’s start with some imports:
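A minimal set, inferred from the code below:

In [1]: import numpy as np
        import matplotlib.pyplot as plt
        from numba import jit
        from quantecon.markov import DiscreteDP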
73.2.1 References

The primary reference for this lecture is [6].
73.3.1 Households
$$\max \, \mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t)$$
subject to

$$a_{t+1} + c_t \leq w z_t + (1 + r) a_t, \quad c_t \geq 0, \quad \text{and} \quad a_t \geq -B$$
where
• 𝑐𝑡 is current consumption
• 𝑎𝑡 is assets
• 𝑧𝑡 is an exogenous component of labor income capturing stochastic unemployment risk,
etc.
• 𝑤 is a wage rate
• 𝑟 is a net interest rate
• 𝐵 is the maximum amount that the agent is allowed to borrow
The exogenous process {𝑧𝑡 } follows a finite state Markov chain with given stochastic matrix
𝑃.
The wage and interest rate are fixed over time.
In this simple version of the model, households supply labor inelastically because they do not
value leisure.
73.4 Firms
$$Y_t = A K_t^\alpha N^{1-\alpha}$$
where
• 𝐴 and 𝛼 are parameters with 𝐴 > 0 and 𝛼 ∈ (0, 1)
• 𝐾𝑡 is aggregate capital
• 𝑁 is total labor supply (which is constant in this simple version of the model)
The firm's problem is

$$\max_{K, N} \left\{ A K^\alpha N^{1-\alpha} - (r + \delta) K - w N \right\}$$

The parameter $\delta$ is the depreciation rate. From the first-order condition with respect to capital, the firm's inverse demand for capital is

$$r = A \alpha \left( \frac{N}{K} \right)^{1-\alpha} - \delta \tag{1}$$
Using this expression and the firm's first-order condition for labor, we can pin down the equilibrium wage rate as a function of $r$ as

$$w(r) = A (1 - \alpha) \left( \frac{A \alpha}{r + \delta} \right)^{\alpha / (1 - \alpha)} \tag{2}$$
73.4.1 Equilibrium
73.5 Code

In [2]: class Household:
            """
            This class takes the parameters that define a household asset
            accumulation problem and computes the corresponding reward and
            transition matrices R and Q required to generate an instance of
            DiscreteDP, and thereby solve for the optimal policy.
            """
def __init__(self,
r=0.01, # Interest rate
w=1.0, # Wages
β=0.96, # Discount factor
a_min=1e-10,
Π=[[0.9, 0.1], [0.1, 0.9]], # Markov chain
z_vals=[0.1, 1.0], # Exogenous states
a_max=18,
a_size=200):
                self.r, self.w, self.β = r, w, β
                self.a_min, self.a_max, self.a_size = a_min, a_max, a_size

                self.Π = np.asarray(Π)
                self.z_vals = np.asarray(z_vals)
                self.z_size = len(z_vals)

                self.a_vals = np.linspace(a_min, a_max, a_size)
                self.n = a_size * self.z_size

                # Build the arrays R and Q
                self.Q = np.zeros((self.n, a_size, self.n))
                self.build_Q()
                self.R = np.empty((self.n, a_size))
                self.build_R()

            def set_prices(self, r, w):
                """Reset prices; calling this method triggers a re-build of R."""
                self.r, self.w = r, w
                self.build_R()
def build_Q(self):
populate_Q(self.Q, self.a_size, self.z_size, self.Π)
def build_R(self):
self.R.fill(-np.inf)
populate_R(self.R,
self.a_size,
self.z_size,
self.a_vals,
self.z_vals,
self.r,
self.w)
@jit(nopython=True)
def populate_R(R, a_size, z_size, a_vals, z_vals, r, w):
n = a_size * z_size
for s_i in range(n):
a_i = s_i // z_size
z_i = s_i % z_size
a = a_vals[a_i]
z = z_vals[z_i]
for new_a_i in range(a_size):
a_new = a_vals[new_a_i]
c = w * z + (1 + r) * a - a_new
if c > 0:
R[s_i, new_a_i] = np.log(c) # Utility
@jit(nopython=True)
def populate_Q(Q, a_size, z_size, Π):
n = a_size * z_size
for s_i in range(n):
z_i = s_i % z_size
for a_i in range(a_size):
for next_z_i in range(z_size):
Q[s_i, a_i, a_i*z_size + next_z_i] = Π[z_i, next_z_i]
@jit(nopython=True)
def asset_marginal(s_probs, a_size, z_size):
a_probs = np.zeros(a_size)
for a_i in range(a_size):
for z_i in range(z_size):
a_probs[a_i] += s_probs[a_i*z_size + z_i]
return a_probs
As a first example of what we can do, let’s compute and plot an optimal accumulation policy
at fixed prices.
In [4]: # Example prices
        r = 0.03
        w = 0.956

        # Create an instance of Household
        am = Household(a_max=20, r=r, w=w)

        # Use the instance to build a discrete dynamic program
        am_ddp = DiscreteDP(am.R, am.Q, am.β)

        # Solve using policy function iteration
        results = am_ddp.solve(method='policy_iteration')

        # Simplify names
        z_size, a_size = am.z_size, am.a_size
z_vals, a_vals = am.z_vals, am.a_vals
n = a_size * z_size
        # Get all optimal actions across the set of a indices with z fixed in each row
a_star = np.empty((z_size, a_size))
for s_i in range(n):
a_i = s_i // z_size
z_i = s_i % z_size
a_star[z_i, a_i] = a_vals[results.sigma[s_i]]
        # Plot the asset accumulation policies
        fig, ax = plt.subplots(figsize=(9, 9))
        ax.plot(a_vals, a_vals, 'k--')  # 45 degree line
        for i, z in enumerate(z_vals):
            ax.plot(a_vals, a_star[i, :], lw=2, alpha=0.6, label=f'$z = {z}$')
        ax.set(xlabel='current assets', ylabel='next period assets')
        ax.legend(loc='upper left')
        plt.show()
The plot shows asset accumulation policies at different values of the exogenous state.
Now we want to calculate the equilibrium.
Let’s do this visually as a first pass.
The following code draws aggregate supply and demand curves.
The intersection gives equilibrium interest rates and capital.
In [5]: A = 1.0
N = 1.0
α = 0.33
β = 0.96
δ = 0.05
def r_to_w(r):
"""
Equilibrium wages associated with a given interest rate r.
"""
return A * (1 - α) * (A * α / (r + δ))**(α / (1 - α))
def rd(K):
"""
Inverse demand curve for capital. The interest rate associated with a
given demand for capital K.
"""
return A * α * (N / K)**(1 - α) - δ
        def prices_to_capital_stock(am, r):
            """
            Map prices to the induced level of capital stock.

            Parameters:
----------
am : Household
An instance of an aiyagari_household.Household
r : float
The interest rate
"""
w = r_to_w(r)
am.set_prices(r, w)
aiyagari_ddp = DiscreteDP(am.R, am.Q, β)
# Compute the optimal policy
results = aiyagari_ddp.solve(method='policy_iteration')
# Compute the stationary distribution
stationary_probs = results.mc.stationary_distributions[0]
# Extract the marginal distribution for assets
asset_probs = asset_marginal(stationary_probs, am.a_size, am.z_size)
# Return K
return np.sum(asset_probs * am.a_vals)
In [6]: # Create a grid of r values at which to compute demand and supply of capital
        num_points = 20
        r_vals = np.linspace(0.005, 0.04, num_points)

        # Compute supply of capital
        k_vals = np.empty(num_points)
        for i, r in enumerate(r_vals):
            k_vals[i] = prices_to_capital_stock(am, r)

        # Plot against demand for capital by firms
        fig, ax = plt.subplots(figsize=(11, 8))
        ax.plot(k_vals, r_vals, lw=2, alpha=0.6, label='supply of capital')
        ax.plot(k_vals, rd(k_vals), lw=2, alpha=0.6, label='demand for capital')
        ax.grid()
        ax.set_xlabel('capital')
        ax.set_ylabel('interest rate')
        ax.legend(loc='upper right')
        plt.show()
Chapter 74

Default Risk and Income Fluctuations
74.1 Contents
• Overview 74.2
• Structure 74.3
• Equilibrium 74.4
• Computation 74.5
• Results 74.6
• Exercises 74.7
• Solutions 74.8
In addition to what’s in Anaconda, this lecture will need the following libraries:
74.2 Overview
74.3 Structure
A small open economy is endowed with an exogenous stochastically fluctuating potential out-
put stream {𝑦𝑡 }.
Potential output is realized only in periods in which the government honors its sovereign
debt.
The output good can be traded or consumed.
The sequence {𝑦𝑡 } is described by a Markov process with stochastic density kernel 𝑝(𝑦, 𝑦′ ).
Households within the country are identical and rank stochastic consumption streams accord-
ing to
$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t) \tag{1}$$
Here
• 0 < 𝛽 < 1 is a time discount factor
• 𝑢 is an increasing and strictly concave utility function
Consumption sequences enjoyed by households are affected by the government’s decision to
borrow or lend internationally.
The government is benevolent in the sense that its aim is to maximize (1).
The government is the only domestic actor with access to foreign credit.
Because household are averse to consumption fluctuations, the government will try to smooth
consumption by borrowing from (and lending to) foreign creditors.
The only credit instrument available to the government is a one-period bond traded in inter-
national credit markets.
The bond market has the following features
• The bond matures in one period and is not state contingent.
• A purchase of a bond with face value 𝐵′ is a claim to 𝐵′ units of the consumption good
next period.
• To purchase $B'$ next period costs $qB'$ now, or, what is equivalent,
• For selling −𝐵′ units of next period goods the seller earns −𝑞𝐵′ of today’s goods.
– If 𝐵′ < 0, then −𝑞𝐵′ units of the good are received in the current period, for a
promise to repay −𝐵′ units next period.
– There is an equilibrium price function 𝑞(𝐵′ , 𝑦) that makes 𝑞 depend on both 𝐵′
and 𝑦.
Earnings on the government portfolio are distributed (or, if negative, taxed) lump sum to
households.
When the government is not excluded from financial markets, the one-period national budget
constraint is

$$c = y + B - q(B', y) B' \tag{2}$$
Here and below, a prime denotes a next period value or a claim maturing next period.
To rule out Ponzi schemes, we also require that 𝐵 ≥ −𝑍 in every period.
• 𝑍 is chosen to be sufficiently large that the constraint never binds in equilibrium.
Foreign creditors
• are risk neutral
• know the domestic output stochastic process {𝑦𝑡 } and observe 𝑦𝑡 , 𝑦𝑡−1 , … , at time 𝑡
• can borrow or lend without limit in an international credit market at a constant inter-
national interest rate 𝑟
• receive full payment if the government chooses to pay
• receive zero if the government defaults on its one-period debt due
When a government is expected to default next period with probability 𝛿, the expected value
of a promise to pay one unit of consumption next period is 1 − 𝛿.
Therefore, the discounted expected value of a promise to pay $B$ next period is

$$q = \frac{1 - \delta}{1 + r} \tag{3}$$
Next we turn to how the government in effect chooses the default probability 𝛿.
Each period, a government that is not currently in default chooses between

1. defaulting
2. meeting its current obligations and purchasing or selling an optimal quantity of one-period sovereign debt
• While in default, output is reduced below potential.
• It returns to $y$ only after the country regains access to international credit markets.
While in a state of default, the economy regains access to foreign credit in each subsequent
period with probability 𝜃.
74.4 Equilibrium
1. The interest rate on the government’s debt includes a risk-premium sufficient to make
foreign creditors expect on average to earn the constant risk-free international interest
rate.
To express these ideas more precisely, consider first the choices of the government, which

1. enters a period with initial assets $B$, or what is the same thing, initial debt to be repaid now of $-B$
2. observes current output $y$, and
3. chooses either
   1. to default, or
   2. to pay $-B$ and set next period's debt due to $-B'$
In a recursive formulation,
• state variables for the government comprise the pair (𝐵, 𝑦)
• 𝑣(𝐵, 𝑦) is the optimum value of the government’s problem when at the beginning of a
period it faces the choice of whether to honor or default
• 𝑣𝑐 (𝐵, 𝑦) is the value of choosing to pay obligations falling due
• 𝑣𝑑 (𝑦) is the value of choosing to default
𝑣𝑑 (𝑦) does not depend on 𝐵 because, when access to credit is eventually regained, net foreign
assets equal 0.
Expressed recursively, the value of defaulting is

$$v_d(y) = u(h(y)) + \beta \int \left\{ \theta v(0, y') + (1 - \theta) v_d(y') \right\} p(y, y') \, dy'$$

where $h(y)$ is output while in default. The value of choosing to pay obligations falling due is
$$v_c(B, y) = \max_{B' \geq -Z} \left\{ u(y - q(B', y) B' + B) + \beta \int v(B', y') p(y, y') \, dy' \right\}$$

The government defaults when $v_c(B, y) < v_d(y)$, so the ex ante default probability, given a debt choice $B'$ and current output $y$, is

$$\delta(B', y) := \int \mathbb{1}\{ v_c(B', y') < v_d(y') \} \, p(y, y') \, dy' \tag{4}$$
Given zero profits for foreign creditors in equilibrium, we can combine (3) and (4) to pin
down the bond price function:
$$q(B', y) = \frac{1 - \delta(B', y)}{1 + r} \tag{5}$$
An equilibrium is

• a pricing function $q(B', y)$,
• a triple of value functions $(v_c(B, y), v_d(y), v(B, y))$,
• a decision rule telling the government when to default and when to pay as a function of the state $(B, y)$, and
• an asset accumulation rule that, conditional on choosing not to default, tells the government how much to borrow or lend

such that the value functions and decision rules solve the government's optimization problem and the pricing function satisfies equation (5).
74.5 Computation
1. Update the value function 𝑣(𝐵, 𝑦), the default rule, the implied ex ante default probabil-
ity, and the price function.
In [3]: import numpy as np
        import random
        import quantecon as qe
        from numba import jit

        class Arellano_Economy:
"""
Arellano 2008 deals with a small open economy whose government
invests in foreign assets in order to smooth the consumption of
domestic households. Domestic households receive a stochastic
path of income.
Parameters
----------
β : float
Time discounting parameter
γ : float
Risk-aversion parameter
r : float
int lending rate
ρ : float
Persistence in the income process
η : float
Standard deviation of the income process
θ : float
Probability of re-entering financial markets in each period
ny : int
Number of points in y grid
nB : int
Number of points in B grid
tol : float
Error tolerance in iteration
maxit : int
Maximum number of iterations
"""
def __init__(self,
β=.953, # time discount rate
γ=2., # risk aversion
r=0.017, # international interest rate
ρ=.945, # persistence in output
η=0.025, # st dev of output shock
θ=0.282, # prob of regaining access
ny=21, # number of points in y grid
nB=251, # number of points in B grid
tol=1e-8, # error tolerance in iteration
maxit=10000):
# Save parameters
self.β, self.γ, self.r = β, γ, r
self.ρ, self.η, self.θ = ρ, η, θ
                self.ny, self.nB = ny, nB

                # Create grids and discretize the income process
                self.Bgrid = np.linspace(-0.45, 0.45, nB)
                self.mc = qe.markov.tauchen(ρ, η, 0, 3, ny)
                self.ygrid = np.exp(self.mc.state_values)
                self.Py = self.mc.P

                # Output when in default
                self.def_y = np.minimum(0.969 * np.mean(self.ygrid), self.ygrid)
# Allocate memory
self.Vd = np.zeros(ny)
self.Vc = np.zeros((ny, nB))
self.V = np.zeros((ny, nB))
self.Q = np.ones((ny, nB)) * .95 # Initial guess for prices
self.default_prob = np.empty((ny, nB))
            def solve(self, tol=1e-8, maxit=10000):
                it, dist = 0, 10.0

                # Main loop
                while dist > tol and maxit > it:
                    # Update prices
                    Vd_compat = np.repeat(self.Vd, self.nB).reshape(self.ny, self.nB)
                    default_states = Vd_compat > self.Vc
                    self.default_prob[:, :] = self.Py @ default_states
                    self.Q[:, :] = (1 - self.default_prob)/(1 + self.r)

                    # Update value functions via the jitted inner loop
                    EV = self.Py @ self.V
                    EVd = self.Py @ self.Vd
                    EVc = self.Py @ self.Vc
                    _inner_loop(self.ygrid, self.def_y, self.Bgrid, self.Vd,
                                self.Vc, EVc, EVd, EV, self.Q,
                                self.β, self.θ, self.γ)

                    # Measure progress
                    V_upd = np.maximum(self.Vc, np.repeat(self.Vd,
                                       self.nB).reshape(self.ny, self.nB))
                    dist = np.max(np.abs(V_upd - self.V))
                    self.V[:, :] = V_upd

                    it += 1
                    if it % 25 == 0:
                        print(f"Running iteration {it} with dist of {dist}")

                return None
            def compute_savings_policy(self):
                """
                Compute optimal savings B', conditional on not defaulting.
                The policy is recorded as an index value in Bgrid.
                """
                # Allocate memory
                self.next_B_index = np.empty((self.ny, self.nB))
                EV = self.Py @ self.V

                _compute_savings_policy(self.ygrid, self.Bgrid, self.Q, EV,
                                        self.γ, self.β, self.next_B_index)
            def simulate(self, T, y_init=None, B_init=None):
                """
                Simulate time series of output, debt, prices and default status.
                """
                # Find index i such that Bgrid[i] is close to 0
                zero_B_index = np.searchsorted(self.Bgrid, 0)

                if y_init is None:
                    # Set to index near the mean of the ygrid
                    y_init = np.searchsorted(self.ygrid, self.ygrid.mean())
                if B_init is None:
                    B_init = zero_B_index

                # Draw output indices and allocate memory
                y_sim_indices = self.mc.simulate_indices(T, init=int(y_init))
                B_sim_indices = np.empty(T, dtype=np.int64)
                B_sim_indices[0] = B_init
                q_sim = np.empty(T)
                in_default_series = np.zeros(T, dtype=np.int64)

                # Start off not in default
                in_default = False
for t in range(T-1):
yi, Bi = y_sim_indices[t], B_sim_indices[t]
if not in_default:
if self.Vc[yi, Bi] < self.Vd[yi]:
in_default = True
Bi_next = zero_B_index
else:
new_index = self.next_B_index[yi, Bi]
Bi_next = new_index
else:
in_default_series[t] = 1
Bi_next = zero_B_index
if random.uniform(0, 1) < self.θ:
in_default = False
B_sim_indices[t+1] = Bi_next
q_sim[t] = self.Q[yi, int(Bi_next)]
                # Pick out the simulated values
                y_sim = self.ygrid[y_sim_indices]
                B_sim = self.Bgrid[B_sim_indices]
                return_vecs = (y_sim, B_sim, q_sim, in_default_series)

                return return_vecs
@jit(nopython=True)
def u(c, γ):
return c**(1-γ)/(1-γ)
@jit(nopython=True)
def _inner_loop(ygrid, def_y, Bgrid, Vd, Vc, EVc,
EVd, EV, qq, β, θ, γ):
"""
This is a numba version of the inner loop of the solve in the
Arellano class. It updates Vd and Vc in place.
"""
ny, nB = len(ygrid), len(Bgrid)
zero_ind = nB // 2 # Integer division
for iy in range(ny):
y = ygrid[iy] # Pull out current y
# Compute Vd
Vd[iy] = u(def_y[iy], γ) + \
β * (θ * EVc[iy, zero_ind] + (1 - θ) * EVd[iy])
# Compute Vc
for ib in range(nB):
B = Bgrid[ib] # Pull out current B
current_max = -1e14
for ib_next in range(nB):
c = max(y - qq[iy, ib_next] * Bgrid[ib_next] + B, 1e-14)
m = u(c, γ) + β * EV[iy, ib_next]
if m > current_max:
current_max = m
Vc[iy, ib] = current_max
return None
@jit(nopython=True)
def _compute_savings_policy(ygrid, Bgrid, Q, EV, γ, β, next_B_index):
# Compute best index in Bgrid given iy, ib
ny, nB = len(ygrid), len(Bgrid)
for iy in range(ny):
y = ygrid[iy]
for ib in range(nB):
B = Bgrid[ib]
current_max = -1e10
for ib_next in range(nB):
c = max(y - Q[iy, ib_next] * Bgrid[ib_next] + B, 1e-14)
m = u(c, γ) + β * EV[iy, ib_next]
if m > current_max:
current_max = m
current_max_index = ib_next
next_B_index[iy, ib] = current_max_index
return None
74.6 Results
We can use the results of the computation to study the default probability 𝛿(𝐵′ , 𝑦) defined in
(4).
The next plot shows these default probabilities over (𝐵′ , 𝑦) as a heat map.
As anticipated, the probability that the government chooses to default in the following period
increases with indebtedness and falls with income.
Next let’s run a time series simulation of {𝑦𝑡 }, {𝐵𝑡 } and 𝑞(𝐵𝑡+1 , 𝑦𝑡 ).
The grey vertical bars correspond to periods when the economy is excluded from financial
markets because of a past default.
One notable feature of the simulated data is the nonlinear response of interest rates.
Periods of relative stability are followed by sharp spikes in the discount rate on government
debt.
74.7 Exercises
74.7.1 Exercise 1
To the extent that you can, replicate the figures shown above
• Use the parameter values listed as defaults in the __init__ method of the
Arellano_Economy.
• The time series will of course vary depending on the shock draws.
74.8 Solutions
In [5]: # Create "Y High" and "Y Low" values as 5% devs from mean
high, low = np.mean(ae.ygrid) * 1.05, np.mean(ae.ygrid) * .95
iy_high, iy_low = (np.searchsorted(ae.ygrid, x) for x in (high, low))
In [6]: # Create figure
fig, ax = plt.subplots(figsize=(10, 6.5))
hm = ax.pcolormesh(xx, yy, zz)
cax = fig.add_axes([.92, .1, .02, .8])
fig.colorbar(hm, cax=cax)
ax.axis([xx.min(), 0.05, yy.min(), yy.max()])
ax.set(xlabel="$B'$", ylabel="$y$", title="Probability of Default")
plt.show()
In [8]: T = 250
y_vec, B_vec, q_vec, default_vec = ae.simulate(T)
color='k', alpha=0.3)
ax.grid()
ax.plot(range(T), series, lw=2, alpha=0.7)
ax.set(title=title, xlabel="time")
plt.show()
Chapter 75

Globalization and Cycles
75.1 Contents
• Overview 75.2
• Key Ideas 75.3
• Model 75.4
• Simulation 75.5
• Exercises 75.6
• Solutions 75.7
This lecture is coauthored with Chase Coleman.
75.2 Overview
In this lecture, we review the paper Globalization and Synchronization of Innovation Cycles
by Kiminori Matsuyama, Laura Gardini and Iryna Sushko.
This model helps us understand several interesting stylized facts about the world economy.
One of these is synchronized business cycles across different countries.
Most existing models that generate synchronized business cycles do so by assumption, since
they tie output in each country to a common shock.
They also fail to explain certain features of the data, such as the fact that the degree of syn-
chronization tends to increase with trade ties.
By contrast, in the model we consider in this lecture, synchronization is both endogenous and
increasing with the extent of trade integration.
In particular, as trade costs fall and international competition increases, innovation incentives
become aligned and countries synchronize their innovation cycles.
Let’s start with some imports:
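A minimal set, inferred from the code below:

In [1]: import numpy as np
        import matplotlib.pyplot as plt
        from numba import jit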
75.2.1 Background
The model builds on work by Judd [93], Deneckner and Judd [44] and Helpman and Krugman
[83] by developing a two-country model with trade and innovation.
On the technical side, the paper introduces the concept of coupled oscillators to economic
modeling.
As we will see, coupled oscillators arise endogenously within the model.
Below we review the model and replicate some of the results on synchronization of innovation
across countries.
75.3 Key Ideas

75.3.1 Innovation Cycles

As discussed above, two countries produce and trade with each other.
In each country, firms innovate, producing new varieties of goods and, in doing so, receiving
temporary monopoly power.
Imitators follow and, after one period of monopoly, what had previously been new varieties
now enter competitive production.
Firms have incentives to innovate and produce new goods when the mass of varieties of goods
currently in production is relatively low.
In addition, there are strategic complementarities in the timing of innovation.
Firms have incentives to innovate in the same period, so as to avoid competing with substi-
tutes that are competitively produced.
This leads to temporal clustering in innovations in each country.
After a burst of innovation, the mass of goods currently in production increases.
However, goods also become obsolete, so that not all survive from period to period.
This mechanism generates a cycle, where the mass of varieties increases through simultaneous
innovation and then falls through obsolescence.
75.3.2 Synchronization
In the absence of trade, the timing of innovation cycles in each country is decoupled.
This will be the case when trade costs are prohibitively high.
If trade costs fall, then goods produced in each country penetrate each other’s markets.
As illustrated below, this leads to synchronization of business cycles across the two countries.
75.4 Model
$$Y_{k,t} = C_{k,t} = \left( \frac{X^o_{k,t}}{1 - \alpha} \right)^{1-\alpha} \left( \frac{X_{k,t}}{\alpha} \right)^{\alpha}$$
𝑜
Here 𝑋𝑘,𝑡 is a homogeneous input which can be produced from labor using a linear, one-for-
one technology.
It is freely tradeable, competitively supplied, and homogeneous across countries.
By choosing the price of this good as numeraire and assuming both countries find it optimal
to always produce the homogeneous good, we can set 𝑤1,𝑡 = 𝑤2,𝑡 = 1.
The good $X_{k,t}$ is a composite, built from many differentiated goods via

$$X_{k,t}^{1 - \frac{1}{\sigma}} = \int_{\Omega_t} \left[ x_{k,t}(\nu) \right]^{1 - \frac{1}{\sigma}} d\nu$$
Here 𝑥𝑘,𝑡 (𝜈) is the total amount of a differentiated good 𝜈 ∈ Ω𝑡 that is produced.
The parameter 𝜎 > 1 is the direct partial elasticity of substitution between a pair of varieties
and Ω𝑡 is the set of varieties available in period 𝑡.
We can split the varieties into those which are supplied competitively and those supplied monopolistically; that is, $\Omega_t = \Omega_t^c + \Omega_t^m$.
75.4.1 Prices

Demand for differentiated inputs is

$$x_{k,t}(\nu) = \left( \frac{p_{k,t}(\nu)}{P_{k,t}} \right)^{-\sigma} \frac{\alpha L_k}{P_{k,t}}$$
Here
• 𝑝𝑘,𝑡 (𝜈) is the price of the variety 𝜈 and
• 𝑃𝑘,𝑡 is the price index for differentiated inputs in 𝑘, defined by
$$\left[ P_{k,t} \right]^{1-\sigma} = \int_{\Omega_t} \left[ p_{k,t}(\nu) \right]^{1-\sigma} d\nu$$
The price of a variety also depends on the origin, 𝑗, and destination, 𝑘, of the goods because
shipping varieties between countries incurs an iceberg trade cost 𝜏𝑗,𝑘 .
Thus the effective price in country 𝑘 of a variety 𝜈 produced in country 𝑗 becomes 𝑝𝑘,𝑡 (𝜈) =
𝜏𝑗,𝑘 𝑝𝑗,𝑡 (𝜈).
Using these expressions, we can derive the total demand for each variety, which is

$$D_{j,t} = \sum_k \tau_{j,k} \, x_{k,t}(\nu) = \alpha A_{j,t} \left( p_{j,t}(\nu) \right)^{-\sigma}$$
where

$$A_{j,t} := \sum_k \frac{\rho_{j,k} L_k}{(P_{k,t})^{1-\sigma}} \quad \text{and} \quad \rho_{j,k} = (\tau_{j,k})^{1-\sigma} \leq 1$$
It is assumed that $\tau_{1,1} = \tau_{2,2} = 1$ and $\tau_{1,2} = \tau_{2,1} = \tau$ for some $\tau > 1$, so that $\rho_{1,2} = \rho_{2,1} = \rho := \tau^{1-\sigma} < 1$.

Competitively supplied varieties are priced at marginal cost $\psi$, so, for all $\nu \in \Omega^c$,

$$p_{j,t}(\nu) = p_{j,t}^c := \psi \quad \text{and} \quad D_{j,t} = y_{j,t}^c := \alpha A_{j,t} (p_{j,t}^c)^{-\sigma}$$
Monopolists will have the same marked-up price, so, for all $\nu \in \Omega^m$,

$$p_{j,t}(\nu) = p_{j,t}^m := \frac{\psi}{1 - \frac{1}{\sigma}} \quad \text{and} \quad D_{j,t} = y_{j,t}^m := \alpha A_{j,t} (p_{j,t}^m)^{-\sigma}$$
Define

$$\theta := \frac{p_{j,t}^c}{p_{j,t}^m} \frac{y_{j,t}^c}{y_{j,t}^m} = \left( 1 - \frac{1}{\sigma} \right)^{1-\sigma}$$
Using the preceding definitions and some algebra, the price indices can now be rewritten as

$$\left( \frac{P_{k,t}}{\psi} \right)^{1-\sigma} = M_{k,t} + \rho M_{j,t} \quad \text{where} \quad M_{j,t} := N_{j,t}^c + \frac{N_{j,t}^m}{\theta}$$
The symbols $N_{j,t}^c$ and $N_{j,t}^m$ will denote the measures of $\Omega^c$ and $\Omega^m$ respectively.
To introduce a new variety, a firm must hire 𝑓 units of labor per variety in each country.
Monopolist profits must be less than or equal to zero in expectation, so
$$N_{j,t}^m \geq 0, \quad \pi_{j,t}^m := (p_{j,t}^m - \psi) y_{j,t}^m - f \leq 0 \quad \text{and} \quad \pi_{j,t}^m N_{j,t}^m = 0$$

With further manipulation, these conditions can be expressed as

$$N_{j,t}^m = \theta(M_{j,t} - N_{j,t}^c) \geq 0, \quad \frac{1}{\sigma} \left[ \frac{\alpha L_j}{\theta(M_{j,t} + \rho M_{k,t})} + \frac{\alpha L_k}{\theta(M_{j,t} + M_{k,t}/\rho)} \right] \leq f$$
With 𝛿 as the exogenous probability of a variety becoming obsolete, the dynamic equation for
the measure of firms becomes
$$N_{j,t+1}^c = \delta(N_{j,t}^c + N_{j,t}^m) = \delta\left( N_{j,t}^c + \theta(M_{j,t} - N_{j,t}^c) \right)$$
To write the dynamics in a compact form, define

$$n_{j,t} := \frac{\theta \sigma f N_{j,t}^c}{\alpha (L_1 + L_2)}, \quad i_{j,t} := \frac{\theta \sigma f N_{j,t}^m}{\alpha (L_1 + L_2)}, \quad m_{j,t} := \frac{\theta \sigma f M_{j,t}}{\alpha (L_1 + L_2)} = n_{j,t} + \frac{i_{j,t}}{\theta}$$
We also let $s_j := \frac{L_j}{L_1 + L_2}$ denote the share of labor employed in country $j$.
We can use these definitions and the preceding expressions to obtain a law of motion for
𝑛𝑡 ∶= (𝑛1,𝑡 , 𝑛2,𝑡 ).
In particular, given an initial condition, 𝑛0 = (𝑛1,0 , 𝑛2,0 ) ∈ ℝ2+ , the equilibrium trajectory,
{𝑛𝑡 }∞ ∞ 2 2
𝑡=0 = {(𝑛1,𝑡 , 𝑛2,𝑡 )}𝑡=0 , is obtained by iterating on 𝑛𝑡+1 = 𝐹 (𝑛𝑡 ) where 𝐹 ∶ ℝ+ → ℝ+ is
given by

$$F(n_t) = \begin{cases} \left( \delta(\theta s_1(\rho) + (1-\theta) n_{1,t}), \; \delta(\theta s_2(\rho) + (1-\theta) n_{2,t}) \right) & \text{if } n_t \in D_{LL} \\ \left( \delta n_{1,t}, \; \delta n_{2,t} \right) & \text{if } n_t \in D_{HH} \\ \left( \delta n_{1,t}, \; \delta(\theta h_2(n_{1,t}) + (1-\theta) n_{2,t}) \right) & \text{if } n_t \in D_{HL} \\ \left( \delta(\theta h_1(n_{2,t}) + (1-\theta) n_{1,t}), \; \delta n_{2,t} \right) & \text{if } n_t \in D_{LH} \end{cases}$$

Here

$$\begin{aligned} D_{LL} &:= \{ (n_1, n_2) : n_j \leq s_j(\rho) \} \\ D_{HH} &:= \{ (n_1, n_2) : n_j \geq h_j(n_{-j}) \} \\ D_{HL} &:= \{ (n_1, n_2) : n_1 \geq s_1(\rho) \text{ and } n_2 \leq h_2(n_1) \} \\ D_{LH} &:= \{ (n_1, n_2) : n_1 \leq h_1(n_2) \text{ and } n_2 \geq s_2(\rho) \} \end{aligned}$$

while

$$s_1(\rho) = 1 - s_2(\rho) = \min \left\{ \frac{s_1 - \rho s_2}{1 - \rho}, \, 1 \right\}$$
and $h_j(n_k)$ is the unique solution of

$$1 = \frac{s_j}{h_j(n_k) + \rho n_k} + \frac{s_k}{h_j(n_k) + n_k / \rho}$$
Expanding this implicit relation shows that $h_j(n_k)$ is the positive root of the quadratic

$$h_j(n_k)^2 + \left( \left( \rho + \frac{1}{\rho} \right) n_k - s_j - s_k \right) h_j(n_k) + \left( n_k^2 - \frac{s_j n_k}{\rho} - s_k n_k \rho \right) = 0$$
75.5 Simulation
In [2]: @jit(nopython=True)
def _hj(j, nk, s1, s2, θ, δ, ρ):
"""
If we expand the implicit function for h_j(n_k) then we find that
it is quadratic. We know that h_j(n_k) > 0 so we can get its
value by using the quadratic form
"""
# Find out who's h we are evaluating
if j == 1:
sj = s1
sk = s2
else:
sj = s2
sk = s1
            # Coefficients on the quadratic a x^2 + b x + c = 0
            a = 1.0
            b = (ρ + 1 / ρ) * nk - sj - sk
            c = nk * nk - (sj * nk) / ρ - sk * ρ * nk

            # Positive solution of the quadratic form
            root = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)

            return root
@jit(nopython=True)
def DLL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
"Determine whether (n1, n2) is in the set DLL"
return (n1 <= s1_ρ) and (n2 <= s2_ρ)
@jit(nopython=True)
def DHH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
"Determine whether (n1, n2) is in the set DHH"
return (n1 >= _hj(1, n2, s1, s2, θ, δ, ρ)) and \
(n2 >= _hj(2, n1, s1, s2, θ, δ, ρ))
@jit(nopython=True)
def DHL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
"Determine whether (n1, n2) is in the set DHL"
return (n1 >= s1_ρ) and (n2 <= _hj(2, n1, s1, s2, θ, δ, ρ))
@jit(nopython=True)
def DLH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
"Determine whether (n1, n2) is in the set DLH"
return (n1 <= _hj(1, n2, s1, s2, θ, δ, ρ)) and (n2 >= s2_ρ)
@jit(nopython=True)
def one_step(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
"""
Takes a current value for (n_{1, t}, n_{2, t}) and returns the
values (n_{1, t+1}, n_{2, t+1}) according to the law of motion.
"""
# Depending on where we are, evaluate the right branch
if DLL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
n1_tp1 = δ * (θ * s1_ρ + (1 - θ) * n1)
n2_tp1 = δ * (θ * s2_ρ + (1 - θ) * n2)
elif DHH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
n1_tp1 = δ * n1
n2_tp1 = δ * n2
elif DHL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
n1_tp1 = δ * n1
n2_tp1 = δ * (θ * _hj(2, n1, s1, s2, θ, δ, ρ) + (1 - θ) * n2)
elif DLH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
n1_tp1 = δ * (θ * _hj(1, n2, s1, s2, θ, δ, ρ) + (1 - θ) * n1)
            n2_tp1 = δ * n2

        return n1_tp1, n2_tp1
@jit(nopython=True)
def n_generator(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
"""
Given an initial condition, continues to yield new values of
n1 and n2
"""
n1_t, n2_t = n1_0, n2_0
while True:
n1_tp1, n2_tp1 = one_step(n1_t, n2_t, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ)
yield (n1_tp1, n2_tp1)
n1_t, n2_t = n1_tp1, n2_tp1
@jit(nopython=True)
def _pers_till_sync(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ, maxiter, npers):
"""
If countries are symmetric then as soon as the two countries have the
same measure of firms then they will be synchronized -- However, if
they are not symmetric then it is possible they have the same measure
of firms but are not yet synchronized. To address this, we check whether
firms stay synchronized for `npers` periods with Euclidean norm
Parameters
----------
n1_0 : scalar(Float)
Initial normalized measure of firms in country one
n2_0 : scalar(Float)
Initial normalized measure of firms in country two
maxiter : scalar(Int)
Maximum number of periods to simulate
npers : scalar(Int)
Number of periods we would like the countries to have the
same measure for
Returns
-------
synchronized : scalar(Bool)
Did the two economies end up synchronized
pers_2_sync : scalar(Int)
The number of periods required until they synchronized
"""
# Initialize the status of synchronization
synchronized = False
pers_2_sync = maxiter
iters = 0
# Initialize generator
    n_gen = n_generator(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ)

    # Count how many consecutive periods the two countries coincide
    nsync = 0

    while (not synchronized) and (iters < maxiter):
        # Increment the number of iterations and get next values
        iters += 1
        n1_t, n2_t = next(n_gen)

        # Check whether the measures are the same in this period
        if abs(n1_t - n2_t) < 1e-8:
            nsync += 1
        else:
            nsync = 0

        # If synchronized for npers periods, stop; the countries became
        # synchronized nsync periods ago
        if nsync > npers:
            synchronized = True
            pers_2_sync = iters - nsync

    return (synchronized, pers_2_sync)
@jit(nopython=True)
def _create_attraction_basis(s1_ρ, s2_ρ, s1, s2, θ, δ, ρ,
maxiter, npers, npts):
# Create unit range with npts
    synchronized, pers_2_sync = False, 0
    unit_range = np.linspace(0.0, 1.0, npts)

    # Allocate space to store time to synchronization
    time_2_sync = np.empty((npts, npts))

    # Iterate over initial conditions
    for (i, n1_0) in enumerate(unit_range):
        for (j, n2_0) in enumerate(unit_range):
            synchronized, pers_2_sync = _pers_till_sync(n1_0, n2_0,
                                                        s1_ρ, s2_ρ, s1, s2,
                                                        θ, δ, ρ, maxiter, npers)
            time_2_sync[i, j] = pers_2_sync

    return time_2_sync
class MSGSync:
"""
The paper "Globalization and Synchronization of Innovation Cycles"�
↪presents
Parameters
----------
s1 : scalar(Float)
Amount of total labor in country 1 relative to total worldwide labor
θ : scalar(Float)
A measure of how much more of the competitive variety is used in
production of final goods
δ : scalar(Float)
Percentage of firms that are not exogenously destroyed every period
ρ : scalar(Float)
Measure of how expensive it is to trade between countries
"""
    def __init__(self, s1=0.5, θ=2.5, δ=0.7, ρ=0.2):
        # Store model parameters
        self.s1, self.θ, self.δ, self.ρ = s1, θ, δ, ρ

        # Store derived quantities used throughout
        self.s2 = 1 - s1
        self.s1_ρ = self._calc_s1_ρ()
        self.s2_ρ = 1 - self.s1_ρ
def _unpack_params(self):
return self.s1, self.s2, self.θ, self.δ, self.ρ
def _calc_s1_ρ(self):
# Unpack params
s1, s2, θ, δ, ρ = self._unpack_params()
# s_1(ρ) = min(val, 1)
val = (s1 - ρ * s2) / (1 - ρ)
return min(val, 1)
    def simulate_n(self, n1_0, n2_0, T):
        """
        Simulates the values of (n1, n2) for T periods
        (the def line and loop are reconstructed -- they were lost
        in extraction)

        Parameters
        ----------
        n1_0 : scalar(Float)
            Initial normalized measure of firms in country one
        n2_0 : scalar(Float)
            Initial normalized measure of firms in country two
        T : scalar(Int)
            Number of periods to simulate

        Returns
        -------
        n1 : Array(Float64, ndim=1)
            A history of normalized measures of firms in country one
        n2 : Array(Float64, ndim=1)
            A history of normalized measures of firms in country two
        """
        # Unpack parameters
        s1, s2, θ, δ, ρ = self._unpack_params()
        s1_ρ, s2_ρ = self.s1_ρ, self.s2_ρ

        # Allocate space
        n1 = np.empty(T)
        n2 = np.empty(T)

        # Simulate and store in arrays
        n_gen = n_generator(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ)
        for t in range(T):
            n1_tp1, n2_tp1 = next(n_gen)
            n1[t] = n1_tp1
            n2[t] = n2_tp1

        return n1, n2
    def pers_till_sync(self, n1_0, n2_0, maxiter=500, npers=3):
        """
        If countries are symmetric then as soon as the two countries have
        the same measure of firms then they will be synchronized -- However,
        if they are not symmetric then it is possible they have the same
        measure of firms but are not yet synchronized. To address this, we
        check whether firms stay synchronized for `npers` periods with
        Euclidean norm

        Parameters
        ----------
        n1_0 : scalar(Float)
            Initial normalized measure of firms in country one
        n2_0 : scalar(Float)
            Initial normalized measure of firms in country two
        maxiter : scalar(Int)
            Maximum number of periods to simulate
        npers : scalar(Int)
            Number of periods we would like the countries to have the
            same measure for

        Returns
        -------
        synchronized : scalar(Bool)
            Did the two economies end up synchronized
        pers_2_sync : scalar(Int)
            The number of periods required until they synchronized
        """
        # Unpack parameters and delegate to the jitted helper
        # (method signatures reconstructed from the surrounding code)
        s1, s2, θ, δ, ρ = self._unpack_params()
        s1_ρ, s2_ρ = self.s1_ρ, self.s2_ρ
        return _pers_till_sync(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2,
                               θ, δ, ρ, maxiter, npers)

    def create_attraction_basis(self, maxiter=250, npers=3, npts=50):
        # Unpack parameters
        s1, s2, θ, δ, ρ = self._unpack_params()
        s1_ρ, s2_ρ = self.s1_ρ, self.s2_ρ
        ab = _create_attraction_basis(s1_ρ, s2_ρ, s1, s2, θ, δ,
                                      ρ, maxiter, npers, npts)
        return ab
We write a short function below that exploits the preceding code and plots two time series.
Each time series gives the dynamics for the two countries.
The time series share parameters but differ in their initial condition.
Here’s the function
def plot_timeseries(n1_0, n2_0, s1=0.5, θ=2.5, δ=0.7, ρ=0.2,
                    ax=None, title=''):
    # Simulate the model from the given initial condition and plot
    # (the function header and simulation lines are reconstructed --
    # the original cell was truncated in extraction)
    if ax is None:
        fig, ax = plt.subplots()
    model = MSGSync(s1, θ, δ, ρ)
    n1, n2 = model.simulate_n(n1_0, n2_0, 25_000)
    ax.plot(n1, label="$n_1$", lw=2)
    ax.plot(n2, label="$n_2$", lw=2)
    ax.legend()
    ax.set(title=title, ylim=(0.15, 0.8))
    return ax

# Create figure (initial conditions reconstructed for illustration)
fig, ax = plt.subplots(2, 1, figsize=(10, 8))
plot_timeseries(0.15, 0.35, ax=ax[0], title='Not synchronized')
plot_timeseries(0.4, 0.3, ax=ax[1], title='Synchronized')
fig.tight_layout()
plt.show()
In the first case, innovation in the two countries does not synchronize.
In the second case, different initial conditions are chosen, and the cycles become synchronized.
Next, let’s study the initial conditions that lead to synchronized cycles more systematically.
We generate time series from a large collection of different initial conditions and mark those
conditions with different colors according to whether synchronization occurs or not.
The next display shows exactly this for four different parameterizations (one for each subfigure).
Dark colors indicate synchronization, while light colors indicate failure to synchronize.
75.6 Exercises
75.6.1 Exercise 1
Replicate the figure shown above by coloring initial conditions according to whether or not
synchronization occurs from those conditions.
75.7 Solutions
Additionally, instead of viewing four plots at once, we might want to vary 𝜌 manually and see how the figure changes in real time. Below we use an interactive plot to do this.
Note that interactive plotting requires the ipywidgets module to be installed and enabled.
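Here is a minimal sketch of such an interactive figure, assuming the MSGSync class defined above and the usual matplotlib import; the slider ranges and color map are arbitrary choices, not values from the lecture.

from ipywidgets import interact

def interact_attraction_basis(ρ=0.2, maxiter=250, npts=250):
    # Create the attraction basis for a given level of trade costs ρ
    model = MSGSync(ρ=ρ)
    ab = model.create_attraction_basis(maxiter=maxiter, npts=npts)
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.pcolormesh(ab, cmap="viridis")
    plt.show()

interact(interact_attraction_basis,
         ρ=(0.0, 0.95, 0.05), maxiter=(50, 600, 50), npts=(25, 750, 25))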
76.1 Contents
• Overview 76.2
• The Model 76.3
• Equilibrium 76.4
• Existence, Uniqueness and Computation of Equilibria 76.5
• Implementation 76.6
• Exercises 76.7
• Solutions 76.8
76.2 Overview
In 1937, Ronald Coase wrote a brilliant essay on the nature of the firm [36].
Coase was writing at a time when the Soviet Union was rising to become a significant industrial power.
At the same time, many free-market economies were afflicted by a severe and painful depression.
This contrast led to an intensive debate on the relative merits of decentralized, price-based
allocation versus top-down planning.
In the midst of this debate, Coase made an important observation: even in free-market
economies, a great deal of top-down planning does in fact take place.
This is because firms form an integral part of free-market economies and, within firms, allocation is by planning.
In other words, free-market economies blend both planning (within firms) and decentralized
production coordinated by prices.
The question Coase asked is this: if prices and free markets are so efficient, then why do firms
even exist?
Couldn’t the associated within-firm planning be done more efficiently by the market?
We’ll use the following imports:
On top of asking a deep and fascinating question, Coase also supplied an illuminating answer:
firms exist because of transaction costs.
Here’s one example of a transaction cost:
Suppose agent A is considering setting up a small business and needs a web developer to construct and help run an online store.
She can use the labor of agent B, a web developer, by writing up a freelance contract for
these tasks and agreeing on a suitable price.
But contracts like this can be time-consuming and difficult to verify
• How will agent A be able to specify exactly what she wants, to the finest detail, when
she herself isn’t sure how the business will evolve?
• And what if she isn’t familiar with web technology? How can she specify all the relevant
details?
• And, if things go badly, will failure to comply with the contract be verifiable in court?
In this situation, perhaps it will be easier to employ agent B under a simple labor contract.
The cost of this contract is far smaller because such contracts are simpler and more standard.
The basic agreement in a labor contract is: B will do what A asks him to do for the term of
the contract, in return for a given salary.
Making this agreement is much easier than trying to map every task out in advance in a contract that will hold up in a court of law.
So agent A decides to hire agent B and a firm of nontrivial size appears, due to transaction
costs.
76.2.2 A Trade-Off
76.2.3 Summary
76.3.1 Subcontracting
The subcontracting scheme by which tasks are allocated across firms is illustrated in the figure below
In this example,
• Firm 1 receives a contract to sell one unit of the completed good to a final buyer.
• Firm 1 then forms a contract with firm 2 to purchase the partially completed good at
stage 𝑡1 , with the intention of implementing the remaining 1 − 𝑡1 tasks in-house (i.e.,
processing from stage 𝑡1 to stage 1).
• Firm 2 repeats this procedure, forming a contract with firm 3 to purchase the good at
stage 𝑡2 .
• Firm 3 decides to complete the chain, selecting 𝑡3 = 0.
At this point, production unfolds in the opposite direction (i.e., from upstream to downstream).
• Firm 3 completes processing stages from 𝑡3 = 0 up to 𝑡2 and transfers the good to firm
2.
• Firm 2 then processes from 𝑡2 up to 𝑡1 and transfers the good to firm 1,
• Firm 1 processes from 𝑡1 to 1 and delivers the completed good to the final buyer.
The length of the interval of stages (range of tasks) carried out by firm 𝑖 is denoted by ℓ𝑖 .
Each firm chooses only its upstream boundary, treating its downstream boundary as given.
The benefit of this formulation is that it implies a recursive structure for the decision problem
for each firm.
In choosing how many processing stages to subcontract, each successive firm faces essentially the same decision problem as the firm above it in the chain, with the only difference being that the decision space is a subinterval of the decision space for the firm above.
We will exploit this recursive structure in our study of equilibrium.
76.3.2 Costs
76.4 Equilibrium
We assume that all firms are ex-ante identical and act as price takers.
As price takers, they face a price function 𝑝, which is a map from [0, 1] to ℝ+, with 𝑝(𝑡) interpreted as the price of the good at processing stage 𝑡.
There is a countable infinity of firms indexed by 𝑖 and no barriers to entry.
The cost of supplying the initial input (the good processed up to stage zero) is set to zero for
simplicity.
Free entry and the infinite fringe of competitors rule out positive profits for incumbents, since
any incumbent could be replaced by a member of the competitive fringe filling the same role
in the production chain.
Profits are never negative in equilibrium because firms can freely exit.
An equilibrium in this setting is an allocation of firms and a price function such that
1. all active firms in the chain make zero profits, including suppliers of raw materials
In particular, 𝑡𝑖−1 is the downstream boundary of firm 𝑖 and 𝑡𝑖 is its upstream boundary.
As transaction costs are incurred only by the buyer, its profits are
1. 𝑝(0) = 0,
2. 𝜋𝑖 = 0 for all 𝑖, and
3. $p(s) - c(s - t) - \delta p(t) \le 0$ for any pair $s, t$ with $0 \le t \le s \le 1$.
The rationale behind these conditions was given in our informal definition of equilibrium
above.
We have defined an equilibrium but does one exist? Is it unique? And, if so, how can we compute it?
By definition, 𝑡∗ (𝑠) is the cost-minimizing upstream boundary for a firm that is contracted to
deliver the good at stage 𝑠 and faces the price function 𝑝∗ .
Since 𝑝∗ lies in 𝒫 and since 𝑐 is strictly convex, it follows that the right-hand side of (4) is
continuous and strictly convex in 𝑡.
Hence the minimizer 𝑡∗ (𝑠) exists and is uniquely defined.
We can use 𝑡∗ to construct an equilibrium allocation as follows:
Recall that firm 1 sells the completed good at stage 𝑠 = 1, so its optimal upstream boundary is 𝑡∗(1).
Hence firm 2’s optimal upstream boundary is 𝑡∗ (𝑡∗ (1)).
Continuing in this way produces the sequence {𝑡∗𝑖 } defined by
The sequence ends when a firm chooses to complete all remaining tasks.
We label this firm (and hence the number of firms in the chain) as
The task allocation corresponding to (5) is given by ℓ𝑖∗ ∶= 𝑡∗𝑖−1 − 𝑡∗𝑖 for all 𝑖.
In [97] it is shown that
3. the price function 𝑝∗ and this allocation together form an equilibrium for the production chain.
While the proofs are too long to repeat here, much of the insight can be obtained by observing that, as a fixed point of 𝑇, the equilibrium price function must satisfy
From this equation, it is clear that profits are zero for all incumbent firms.
We can develop some additional insights on the behavior of firms by examining marginal conditions associated with the equilibrium.
As a first step, let ℓ∗ (𝑠) ∶= 𝑠 − 𝑡∗ (𝑠).
This is the cost-minimizing range of in-house tasks for a firm with downstream boundary 𝑠.
In [97] it is shown that 𝑡∗ and ℓ∗ are increasing and continuous, while 𝑝∗ is continuously differentiable at all 𝑠 ∈ (0, 1) with
Equation (8) follows from 𝑝∗ (𝑠) = min𝑡≤𝑠 {𝑐(𝑠 − 𝑡) + 𝛿𝑝∗ (𝑡)} and the envelope theorem for
derivatives.
A related equation is the first order condition for 𝑝∗ (𝑠) = min𝑡≤𝑠 {𝑐(𝑠 − 𝑡) + 𝛿𝑝∗ (𝑡)}, the
minimization problem for a firm with upstream boundary 𝑠, which is
This condition matches the marginal condition expressed verbally by Coase that we stated
above:
“A firm will tend to expand until the costs of organizing an extra transaction
within the firm become equal to the costs of carrying out the same transaction
by means of an exchange on the open market…”
Combining (8) and (9) and evaluating at 𝑠 = 𝑡𝑖 , we see that active firms that are adjacent
satisfy
$$\delta\, c'(\ell_{i+1}^*) = c'(\ell_i^*) \tag{10}$$
In other words, the marginal in-house cost per task at a given firm is equal to that of its upstream partner multiplied by the gross transaction cost.
76.6 Implementation
For most specifications of primitives, there is no closed-form solution for the equilibrium as
far as we are aware.
However, we know that we can compute the equilibrium corresponding to a given transaction
cost parameter 𝛿 and a cost function 𝑐 by applying the results stated above.
In particular, we can
As we step between iterates, we will use linear interpolation of functions, as we did in our lecture on optimal growth and several other places.
To begin, here’s a class to store primitives and a grid
def __init__(self,
n=1000,
delta=1.05,
c=lambda t: np.exp(10 * t) - 1):
def compute_prices(pc, tol=1e-5, max_iter=2000):
    """
    Computes the equilibrium price function by successive approximation
    (the def line and loop structure are reconstructed -- they were
    truncated in extraction)

    * pc is an instance of ProductionChain
    * The initial condition is p = c
    """
    delta, c, n, grid = pc.delta, pc.c, pc.n, pc.grid
    p = c(grid)  # Initial condition is c(s), as an array
    new_p = np.empty_like(p)
    error = tol + 1
    i = 0

    while error > tol and i < max_iter:
        for j, s in enumerate(grid):
            Tp = lambda t: delta * interp(grid, p, t) + c(s - t)
            new_p[j] = Tp(fminbound(Tp, 0, s))
        error = np.max(np.abs(p - new_p))
        p = new_p.copy()
        i = i + 1

    if i < max_iter:
        print(f"Iteration converged in {i} steps")
    else:
        print(f"Warning: iteration hit upper bound {max_iter}")

    # Return an interpolating function, since p* is evaluated off-grid below
    return lambda x: interp(grid, p, x)
The next function computes the optimal choice of upstream boundary and the range of tasks implemented for a firm facing price function p_function and with downstream boundary 𝑠.

def optimal_choices(pc, p_function, s):
    # The def line is reconstructed -- it was lost in extraction
    delta, c = pc.delta, pc.c
    f = lambda t: delta * p_function(t) + c(s - t)
    t_star = max(fminbound(f, -1, s), 0)
    ell_star = s - t_star
    return t_star, ell_star
The allocation of firms can be computed by recursively stepping through firms’ choices of
their respective upstream boundary, treating the previous firm’s upstream boundary as their
own downstream boundary.
In doing so, we start with firm 1, who has downstream boundary 𝑠 = 1.
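The routine that recovers these stages is not reproduced in this extract. A minimal sketch consistent with optimal_choices above (the name compute_stages matches the calls below; the body is our reconstruction) is:

def compute_stages(pc, p_function):
    # Step upstream from s = 1, recording each firm's upstream boundary,
    # until some firm chooses to complete all remaining tasks (t* = 0)
    s = 1.0
    transaction_stages = [s]
    while s > 0:
        s, ell = optimal_choices(pc, p_function, s)
        transaction_stages.append(s)
    return np.array(transaction_stages)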
In [6]: pc = ProductionChain()
        p_star = compute_prices(pc)
        # Recover the transaction stages (line reconstructed -- it is
        # needed for the loop below)
        transaction_stages = compute_stages(pc, p_star)

        fig, ax = plt.subplots()
        ax.plot(pc.grid, p_star(pc.grid))
        ax.set_xlim(0.0, 1.0)
        ax.set_ylim(0.0)
        for s in transaction_stages:
            ax.axvline(x=s, c="0.5")
        plt.show()
Here’s the function ℓ∗ , which shows how large a firm with downstream boundary 𝑠 chooses to
be
ell_star = np.empty(pc.n)
# Compute ℓ*(s) on the grid (these lines are reconstructed)
for i, s in enumerate(pc.grid):
    t, e = optimal_choices(pc, p_star, s)
    ell_star[i] = e

fig, ax = plt.subplots()
ax.plot(pc.grid, ell_star, label="$\ell^*$")
ax.legend(fontsize=14)
plt.show()
76.7 Exercises
76.7.1 Exercise 1
76.7.2 Exercise 2
76.8 Solutions
76.8.1 Exercise 1
for delta in (1.01, 1.05, 1.1):   # outer loop reconstructed; δ values illustrative
    pc = ProductionChain(delta=delta)
    p_star = compute_prices(pc)
    transaction_stages = compute_stages(pc, p_star)
    num_firms = len(transaction_stages)
    print(f"When delta={delta} there are {num_firms} firms")
76.8.2 Exercise 2
Firm size increases with downstreamness because 𝑝∗, the equilibrium price function, is increasing and strictly convex.
This means that, for a given producer, the marginal cost of the input purchased from the producer just upstream from itself in the chain increases as we go further downstream.
Hence downstream firms choose to do more in house than upstream firms — and are therefore larger.
The equilibrium price function is strictly convex due to both transaction costs and diminishing returns to management.
One way to put this is that firms are prevented from completely mitigating the costs associated with diminishing returns to management — which induce convexity — by transaction costs. This is because transaction costs force firms to have nontrivial size.
Here’s one way to compute and graph value added across firms
In [9]: pc = ProductionChain()
        p_star = compute_prices(pc)
        stages = compute_stages(pc, p_star)

        va = []
        # One simple notion of value added: the gap between the price a
        # firm sells at and the price it pays upstream (loop reconstructed)
        for i in range(len(stages) - 1):
            va.append(p_star(stages[i]) - p_star(stages[i + 1]))

        fig, ax = plt.subplots()
        ax.plot(va, label="value added by firm")
        ax.set_xticks((5, 25))
        ax.set_xticklabels(("downstream firms", "upstream firms"))
        plt.show()
Chapter 77
Recursive Models of Dynamic Linear Economies
77.1 Contents
“Mathematics is the art of giving the same name to different things” – Henri
Poincare
“Complete market economies are all alike” – Robert E. Lucas, Jr., (1989)
5. Cattle cycles
In saying that “complete markets are all alike”, Robert E. Lucas, Jr. was noting that all of
them have
• a commodity space.
• a space dual to the commodity space in which prices reside.
• endowments of resources.
• peoples’ preferences over goods.
• physical technologies for transforming resources into goods.
• random processes that govern shocks to technologies and preferences and associated information flows.
• a single budget constraint per person.
• the existence of a representative consumer even when there are many people in the
model.
• a concept of competitive equilibrium.
• theorems connecting competitive equilibrium allocations to allocations that would be
chosen by a benevolent social planner.
The models have no frictions such as …
• Enforcement difficulties
• Information asymmetries
• Other forms of transactions costs
• Externalities
The models extensively use the powerful ideas of
• Indexing commodities and their prices by time (John R. Hicks).
• Indexing commodities and their prices by chance (Kenneth Arrow).
Much of the imperialism of complete markets models comes from applying these two tricks.
The Hicks trick of indexing commodities by time is the idea that dynamics are a special
case of statics.
The Arrow trick of indexing commodities by chance is the idea that analysis of trade under uncertainty is a special case of the analysis of trade under certainty.
The [78] class of models specifies the commodity space, preferences, technologies, stochastic
shocks and information flows in ways that allow the models to be analyzed completely using
only the tools of linear time series models and linear-quadratic optimal control described in
the two lectures Linear State Space Models and Linear Quadratic Control.
There are costs and benefits associated with the simplifications and specializations needed to
make a particular model fit within the [78] class
• the costs are that linear-quadratic structures are sometimes too confining.
• benefits include computational speed, simplicity, and ability to analyze many model features analytically or nearly analytically.
A variety of superficially different models are all instances of the [78] class of models
• Lucas asset pricing model
• Lucas-Prescott model of investment under uncertainty
• Asset pricing models with habit persistence
• Rosen-Topel equilibrium model of housing
• Rosen schooling models
• Rosen-Murphy-Scheinkman model of cattle cycles
• Hansen-Sargent-Tallarini model of robustness and asset pricing
• Many more …
The diversity of these models conceals an essential unity that illustrates the quotation by
Robert E. Lucas, Jr., with which we began this lecture.
77.2.2 Forecasting?
A consequence of a single budget constraint per person plus the Hicks-Arrow tricks is that
households and firms need not forecast.
But there exist equivalent structures called recursive competitive equilibria in which they
do appear to need to forecast.
In these structures, to forecast, households and firms use:
• equilibrium pricing functions, and
• knowledge of the Markov structure of the economy’s state vector.
For an application of the [78] class of models, the outcome of theorizing is a stochastic process, i.e., a probability distribution over sequences of prices and quantities, indexed by parameters describing preferences, technologies, and information flows.
Another name for that object is a likelihood function, a key object of both frequentist and
Bayesian statistics.
There are two important uses of an equilibrium stochastic process or likelihood function.
The first is to solve the direct problem.
The direct problem takes as inputs values of the parameters that define preferences, technologies, and information flows and as an output characterizes or simulates random paths of quantities and prices.
The second use of an equilibrium stochastic process or likelihood function is to solve the inverse problem.
The inverse problem takes as an input a time series sample of observations on a subset of prices and quantities determined by the model and from them makes inferences about the parameters that define the model's preferences, technologies, and information flows.
A [78] economy consists of lists of matrices that describe peoples' household technologies, their preferences over consumption services, their production technologies, and their information sets.
There are complete markets in history-contingent commodities.
Competitive equilibrium allocations and prices
• satisfy equations that are easy to write down and solve
• have representations that are convenient econometrically
Different example economies manifest themselves simply as different settings for various matrices.
[78] use these tools:
• A theory of recursive dynamic competitive economies
• Linear optimal control theory
• Recursive methods for estimating and interpreting vector autoregressions
The models are flexible enough to express alternative senses of a representative household
• A single ‘stand-in’ household of the type used to good effect by Edward C. Prescott.
• Heterogeneous households satisfying conditions for Gorman aggregation into a representative household.
• Heterogeneous household technologies that violate conditions for Gorman aggregation but are still susceptible to aggregation into a single representative household via 'non-Gorman' or 'mongrel' aggregation.
These three alternative types of aggregation have different consequences in terms of how
prices and allocations can be computed.
In particular, can prices and an aggregate allocation be computed before the equilibrium allocation to individual heterogeneous households is computed?
• Answers are “Yes” for Gorman aggregation, “No” for non-Gorman aggregation.
In summary, the insights and practical benefits from economics to be introduced in this lecture are
• Deeper understandings that come from recognizing common underlying structures.
• Speed and ease of computation that comes from unleashing a common suite of Python
programs.
We’ll use the following mathematical tools
• Stochastic Difference Equations (Linear).
• Duality: LQ Dynamic Programming and Linear Filtering are the same things mathematically.
• The Spectral Factorization Identity (for understanding vector autoregressions and non-Gorman aggregation).
So here is our roadmap.
We’ll describe sets of matrices that pin down
• Information
• Technologies
• Preferences
Then we’ll describe
• Equilibrium concept and computation
• Econometric representation and estimation
We’ll use stochastic linear difference equations to describe information flows and equilibrium
outcomes.
The sequence {𝑤𝑡 ∶ 𝑡 = 1, 2, …} is said to be a martingale difference sequence adapted to
{𝐽𝑡 ∶ 𝑡 = 0, 1, …} if 𝐸(𝑤𝑡+1 |𝐽𝑡 ) = 0 for 𝑡 = 0, 1, … .
The sequence $\{w_t : t = 1, 2, \ldots\}$ is said to be conditionally homoskedastic if $E(w_{t+1} w_{t+1}' \mid J_t) = I$ for $t = 0, 1, \ldots$.
We assume that the {𝑤𝑡 ∶ 𝑡 = 1, 2, …} process is conditionally homoskedastic.
Let {𝑥𝑡 ∶ 𝑡 = 1, 2, …} be a sequence of 𝑛-dimensional random vectors, i.e. an 𝑛-dimensional
stochastic process.
The process $\{x_t : t = 1, 2, \ldots\}$ is constructed recursively using an initial random vector $x_0 \sim \mathcal{N}(\hat x_0, \Sigma_0)$ and a time-invariant law of motion

$$x_{t+1} = A x_t + C w_{t+1}$$

Evidently,

$$E(x_{t+1} \mid J_t) = A x_t$$

Iterating on the law of motion gives

$$x_t = A x_{t-1} + C w_t = A^2 x_{t-2} + A C w_{t-1} + C w_t = \Big[ \sum_{\tau=0}^{t-1} A^\tau C w_{t-\tau} \Big] + A^t x_0$$

Shifting forward $j$ periods,

$$x_{t+j} = \sum_{s=0}^{j-1} A^s C w_{t+j-s} + A^j x_t$$

so that

$$E_t\, x_{t+j} = A^j x_t$$

and the covariance matrix of the $j$-step-ahead forecast error is

$$E_t (x_{t+j} - E_t x_{t+j})(x_{t+j} - E_t x_{t+j})' = \sum_{k=0}^{j-1} A^k C C' A^{k\prime} \equiv v_j$$

These covariance matrices can be computed recursively:

$$v_1 = CC', \qquad v_j = CC' + A v_{j-1} A', \quad j \ge 2$$
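The recursion for $v_j$ is easy to compute directly. Here is a minimal NumPy sketch; the matrices $A$ and $C$ below are illustrative stand-ins, not values from any model in this lecture.

import numpy as np

def forecast_error_covs(A, C, J=10):
    # v_1 = CC'; v_j = CC' + A v_{j-1} A' for j >= 2
    CC = C @ C.T
    v = [CC]
    for _ in range(2, J + 1):
        v.append(CC + A @ v[-1] @ A.T)
    return v

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
C = np.array([[1.0],
              [0.5]])
v = forecast_error_covs(A, C, J=5)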
$$\upsilon_{j,\tau} = \sum_{k=0}^{j-1} A^k C i_\tau i_\tau' C' A^{k\prime}$$

Note that $\sum_{\tau=1}^{N} i_\tau i_\tau' = I$, so that we have

$$\sum_{\tau=1}^{N} \upsilon_{j,\tau} = \upsilon_j$$
𝑏𝑡 = 𝑈𝑏 𝑧𝑡 and 𝑑𝑡 = 𝑈𝑑 𝑧𝑡 ,
$U_b$ and $U_d$ are matrices that select entries of $z_t$. The law of motion for $\{z_t : t = 0, 1, \ldots\}$ is

$$z_{t+1} = A_{22} z_t + C_2 w_{t+1}$$

where $z_0$ is a given initial condition and the eigenvalues of the matrix $A_{22}$ have absolute values that are less than or equal to one.
Thus, in summary, our model of information and shocks is
$$\Phi_c c_t + \Phi_g g_t + \Phi_i i_t = \Gamma k_{t-1} + d_t$$
$$k_t = \Delta_k k_{t-1} + \Theta_k i_t$$
$$g_t \cdot g_t = \ell_t^2$$
Here Φ𝑐 , Φ𝑔 , Φ𝑖 , Γ, Δ𝑘 , Θ𝑘 are all matrices conformable to the vectors they multiply and ℓ𝑡 is a
disutility generating resource supplied by the household.
For technical reasons that facilitate computations, we make the following assumption.
Assumption: [Φ𝑐 Φ𝑔 ] is nonsingular.
Households confront a technology that allows them to devote consumption goods to construct a vector $h_t$ of household capital goods and a vector $s_t$ of utility-generating household services
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
77.2.12 Preferences
Where 𝑏𝑡 is a stochastic process of preference shocks that will play the role of demand
shifters, the representative household orders stochastic processes of consumption services 𝑠𝑡
according to
$$-\Big(\frac{1}{2}\Big) E \sum_{t=0}^{\infty} \beta^t \big[(s_t - b_t) \cdot (s_t - b_t) + \ell_t^2\big] \,\Big|\, J_0, \quad 0 < \beta < 1$$
We now proceed to give examples of production and household technologies that appear in various models in the literature.
First, we give examples of production technologies:
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
∣ 𝑔𝑡 ∣≤ ℓ𝑡
$$c_t + i_t = d_{1t}, \qquad g_t = \phi_1 i_t$$

$$\Phi_c = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \Phi_i = \begin{bmatrix} 1 \\ \phi_1 \end{bmatrix}, \quad \Phi_g = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad \Gamma = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad d_t = \begin{bmatrix} d_{1t} \\ 0 \end{bmatrix}$$
We can use this specification to create a linear-quadratic version of Lucas's (1978) asset pricing model.
There is a single consumption good, a single intermediate good, and a single investment good.
The technology is described by
Set

$$\Phi_c = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \Phi_g = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad \Phi_i = \begin{bmatrix} 0 \\ \phi_1 \end{bmatrix}, \quad \Gamma = \begin{bmatrix} \gamma \\ 0 \end{bmatrix}, \quad \Delta_k = \delta_k, \quad \Theta_k = 1$$
We set 𝐴22 , 𝐶2 and 𝑈𝑑 to make (𝑑1𝑡 , 𝑑2𝑡 )′ = 𝑑𝑡 follow a desired stochastic process.
Now we describe some examples of preferences, which as we have seen are ordered by
$$-\Big(\frac{1}{2}\Big) E \sum_{t=0}^{\infty} \beta^t \big[(s_t - b_t) \cdot (s_t - b_t) + (\ell_t)^2\big] \,\Big|\, J_0, \quad 0 < \beta < 1$$
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
and we make
Assumption: The absolute values of the eigenvalues of Δℎ are less than or equal to one.
Later we shall introduce canonical household technologies that satisfy an ‘invertibility’ re-
quirement relating sequences {𝑠𝑡 } of services and {𝑐𝑡 } of consumption flows.
And we’ll describe how to obtain a canonical representation of a household technology from
one that is not canonical.
Here are some examples of household preferences.
Time Separable preferences
$$-\frac{1}{2} E \sum_{t=0}^{\infty} \beta^t \big[(c_t - b_t)^2 + \ell_t^2\big] \,\Big|\, J_0, \quad 0 < \beta < 1$$
Consumer Durables
Services at 𝑡 are related to the stock of durables at the beginning of the period:
$$s_t = \lambda h_{t-1}, \quad \lambda > 0$$

$$-\frac{1}{2} E \sum_{t=0}^{\infty} \beta^t \big[(\lambda h_{t-1} - b_t)^2 + \ell_t^2\big] \,\Big|\, J_0$$
Set Δℎ = 𝛿ℎ , Θℎ = 1, Λ = 𝜆, Π = 0.
Habit Persistence
$$-\Big(\frac{1}{2}\Big) E \sum_{t=0}^{\infty} \beta^t \Big[\Big(c_t - \lambda (1 - \delta_h) \sum_{j=0}^{\infty} \delta_h^j c_{t-j-1} - b_t\Big)^2 + \ell_t^2\Big] \,\Big|\, J_0$$

$$h_t = (1 - \delta_h) \sum_{j=0}^{t} \delta_h^j c_{t-j} + \delta_h^{t+1} h_{-1}$$

$$s_t = -\lambda h_{t-1} + c_t, \quad \lambda > 0$$
$$-\Big(\frac{1}{2}\Big) E \sum_{t=0}^{\infty} \beta^t \Big[\Big(c_t - \lambda (1 - \delta_h) \sum_{j=0}^{\infty} \delta_h^j c_{t-4j-4} - b_t\Big)^2 + \ell_t^2\Big]$$
To implement, set
$$h_t = \begin{bmatrix} \tilde h_t \\ \tilde h_{t-1} \\ \tilde h_{t-2} \\ \tilde h_{t-3} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & \delta_h \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} \tilde h_{t-1} \\ \tilde h_{t-2} \\ \tilde h_{t-3} \\ \tilde h_{t-4} \end{bmatrix} + \begin{bmatrix} 1 - \delta_h \\ 0 \\ 0 \\ 0 \end{bmatrix} c_t$$
Adjustment Costs.
Recall
$$-\Big(\frac{1}{2}\Big) E \sum_{t=0}^{\infty} \beta^t \big[(c_t - b_{1t})^2 + \lambda^2 (c_t - c_{t-1})^2 + \ell_t^2\big] \,\Big|\, J_0, \quad 0 < \beta < 1, \ \lambda > 0$$

$$h_t = c_t$$

$$s_t = \begin{bmatrix} 0 \\ -\lambda \end{bmatrix} h_{t-1} + \begin{bmatrix} 1 \\ \lambda \end{bmatrix} c_t$$
so that

$$s_{1t} = c_t, \qquad s_{2t} = \lambda (c_t - c_{t-1})$$
We set the first component 𝑏1𝑡 of 𝑏𝑡 to capture the stochastic bliss process and set the second
component identically equal to zero.
Thus, we set Δℎ = 0, Θℎ = 1
$$\Lambda = \begin{bmatrix} 0 \\ -\lambda \end{bmatrix}, \qquad \Pi = \begin{bmatrix} 1 \\ \lambda \end{bmatrix}$$
$$\Lambda = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \text{and} \quad \Pi = \begin{bmatrix} \pi_1 & 0 \\ \pi_2 & \pi_3 \end{bmatrix}$$
$$-\frac{1}{2} \beta^t (\Pi c_t - b_t)'(\Pi c_t - b_t)$$

$$mu_t = -\beta^t \big[\Pi' \Pi\, c_t - \Pi' b_t\big]$$
Production Technology
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
𝑔𝑡 ⋅ 𝑔𝑡 = ℓ𝑡2
Household Technology
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
Preferences
$$-\Big(\frac{1}{2}\Big) E \sum_{t=0}^{\infty} \beta^t \big[(s_t - b_t) \cdot (s_t - b_t) + \ell_t^2\big] \,\Big|\, J_0, \quad 0 < \beta < 1$$

Using the technology constraint $g_t \cdot g_t = \ell_t^2$, this can equivalently be written

$$-(1/2)\, E \sum_{t=0}^{\infty} \beta^t \big[(s_t - b_t) \cdot (s_t - b_t) + g_t \cdot g_t\big] \,\Big|\, J_0$$
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡 ,
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡 ,
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡 ,
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡 ,
𝑧𝑡+1 = 𝐴22 𝑧𝑡 + 𝐶2 𝑤𝑡+1 , 𝑏𝑡 = 𝑈𝑏 𝑧𝑡 , and 𝑑𝑡 = 𝑈𝑑 𝑧𝑡
$$E \sum_{t=0}^{\infty} \beta^t h_t \cdot h_t \,\Big|\, J_0 < \infty \quad \text{and} \quad E \sum_{t=0}^{\infty} \beta^t k_t \cdot k_t \,\Big|\, J_0 < \infty$$
Define:
$$L_0^2 = \Big[ \{y_t\} : y_t \text{ is a random variable in } J_t \text{ and } E \sum_{t=0}^{\infty} \beta^t y_t^2 \,\Big|\, J_0 < +\infty \Big]$$
Thus, we require that each component of $h_t$ and each component of $k_t$ belong to $L_0^2$.
We shall compare and utilize two approaches to solving the planning problem
• Lagrangian formulation
• Dynamic programming
$$\mathcal{L} = -E \sum_{t=0}^{\infty} \beta^t \Big[ \Big(\frac{1}{2}\Big) \big[(s_t - b_t) \cdot (s_t - b_t) + g_t \cdot g_t\big] + M_t^{d\prime} \cdot (\Phi_c c_t + \Phi_g g_t + \Phi_i i_t - \Gamma k_{t-1} - d_t) + M_t^{k\prime} \cdot (k_t - \Delta_k k_{t-1} - \Theta_k i_t) + M_t^{h\prime} \cdot (h_t - \Delta_h h_{t-1} - \Theta_h c_t) + M_t^{s\prime} \cdot (s_t - \Lambda h_{t-1} - \Pi c_t) \Big] \,\Big|\, J_0$$
for 𝑡 = 0, 1, ….
In addition, we have the complementary slackness conditions (these recover the original tran-
sition equations) and also transversality conditions
$$\lim_{t \to \infty} \beta^t E[M_t^{k\prime} k_t] \,\big|\, J_0 = 0, \qquad \lim_{t \to \infty} \beta^t E[M_t^{h\prime} h_t] \,\big|\, J_0 = 0$$
The system formed by the FONCs and the transition equations can be handed over to
Python.
Python will solve the planning problem for fixed parameter values.
Here are the Python Ready Equations
$$M_t^s = b_t - s_t$$

$$M_t^h = E\Big[ \sum_{\tau=1}^{\infty} \beta^\tau (\Delta_h')^{\tau-1} \Lambda' M_{t+\tau}^s \,\Big|\, J_t \Big]$$

$$M_t^d = \begin{bmatrix} \Phi_c' \\ \Phi_g' \end{bmatrix}^{-1} \begin{bmatrix} \Theta_h' M_t^h + \Pi' M_t^s \\ -g_t \end{bmatrix}$$

$$M_t^k = E\Big[ \sum_{\tau=1}^{\infty} \beta^\tau (\Delta_k')^{\tau-1} \Gamma' M_{t+\tau}^d \,\Big|\, J_t \Big]$$
Although it is possible to use matrix operator methods to solve the above Python ready
equations, that is not the approach we’ll use.
Instead, we’ll use dynamic programming to get recursive representations for both quantities
and shadow prices.
Φ𝑐 𝑐0 + Φ𝑔 𝑔0 + Φ𝑖 𝑖0 = Γ𝑘−1 + 𝑑0 ,
𝑘0 = Δ𝑘 𝑘−1 + Θ𝑘 𝑖0 ,
ℎ0 = Δℎ ℎ−1 + Θℎ 𝑐0 ,
𝑠0 = Λℎ−1 + Π𝑐0 ,
𝑧1 = 𝐴22 𝑧0 + 𝐶2 𝑤1 , 𝑏0 = 𝑈𝑏 𝑧0 and 𝑑0 = 𝑈𝑑 𝑧0
Because this is a linear-quadratic dynamic programming problem, it turns out that the value
function has the form
𝑉 (𝑥) = 𝑥′ 𝑃 𝑥 + 𝜌
$$-E \sum_{t=0}^{\infty} \beta^t \big[x_t' R x_t + u_t' Q u_t + 2 u_t' W' x_t\big], \quad 0 < \beta < 1$$
subject to
𝑉 (𝑥𝑡 ) = −𝑥′𝑡 𝑃 𝑥𝑡 − 𝜌
𝑃 satisfies
The optimum decision rule for 𝑢𝑡 is independent of the parameters 𝐶, and so of the noise
statistics.
Iterating on the Bellman operator leads to
𝑉𝑗 (𝑥𝑡 ) = −𝑥′𝑡 𝑃𝑗 𝑥𝑡 − 𝜌𝑗
$$\max_{\{u_t, x_{t+1}\}} \; -E \sum_{t=0}^{\infty} \beta^t \big[x_t' R x_t + u_t' Q u_t + 2 u_t' W' x_t\big], \quad 0 < \beta < 1$$

$$x_t = \begin{bmatrix} h_{t-1} \\ k_{t-1} \\ z_t \end{bmatrix}, \qquad u_t = i_t$$
where
$$\begin{bmatrix} x_t \\ u_t \end{bmatrix}' S \begin{bmatrix} x_t \\ u_t \end{bmatrix} = \begin{bmatrix} x_t \\ u_t \end{bmatrix}' \begin{bmatrix} R & W \\ W' & Q \end{bmatrix} \begin{bmatrix} x_t \\ u_t \end{bmatrix}, \qquad S = (G'G + H'H)/2$$
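Discounted problems of this form can be handed directly to QuantEcon's LQ class. The following is a hedged sketch with illustrative stand-in matrices (not the matrices of any particular economy in this lecture); note that quantecon's LQ minimizes, so signs follow its convention:

import numpy as np
import quantecon as qe

n, k = 3, 1                          # state and control dimensions
R, Q = np.eye(n), np.eye(k)          # quadratic forms in x and u
W = np.zeros((k, n))                 # cross term
A, B = np.eye(n) * 0.95, np.ones((n, k))
C = np.zeros((n, 1))                 # no shocks in this toy example
β = 1 / 1.05

lq = qe.LQ(Q, R, A, B, C, N=W, beta=β)
P, F, d = lq.stationary_values()     # V(x) = -x'Px - d, u_t = -F x_t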
$$\mathcal{M}_t^d = M_d x_t \quad \text{where} \quad M_d = \begin{bmatrix} \Phi_c' \\ \Phi_g' \end{bmatrix}^{-1} \begin{bmatrix} \Theta_h' M_h + \Pi' M_s \\ -S_g \end{bmatrix}$$
We will use this fact and these equations to compute competitive equilibrium prices.
Let’s start with describing the commodity space and pricing functional for our competi-
tive equilibrium.
For the commodity space, we use
$$L_0^2 = \Big[ \{y_t\} : y_t \text{ is a random variable in } J_t \text{ and } E \sum_{t=0}^{\infty} \beta^t y_t^2 \,\Big|\, J_0 < +\infty \Big]$$

with pricing functional

$$\pi(c) = E \sum_{t=0}^{\infty} \beta^t p_t^0 \cdot c_t \,\Big|\, J_0$$
The representative household owns the endowment process and initial stocks of $h$ and $k$ and chooses stochastic processes for $\{c_t, s_t, h_t, \ell_t\}_{t=0}^{\infty}$, each element of which is in $L_0^2$, to maximize

$$-\frac{1}{2} E_0 \sum_{t=0}^{\infty} \beta^t \big[(s_t - b_t) \cdot (s_t - b_t) + \ell_t^2\big]$$
subject to
$$E \sum_{t=0}^{\infty} \beta^t p_t^0 \cdot c_t \,\Big|\, J_0 = E \sum_{t=0}^{\infty} \beta^t (w_t^0 \ell_t + \alpha_t^0 \cdot d_t) \,\Big|\, J_0 + v_0 \cdot k_{-1}$$

$$s_t = \Lambda h_{t-1} + \Pi c_t$$
We now describe the problems faced by two types of firms called type I and type II.
A type I firm rents capital and labor and endowments and produces $c_t, i_t$. It chooses stochastic processes for $\{c_t, i_t, k_t, \ell_t, g_t, d_t\}$, each element of which is in $L_0^2$, to maximize

$$E_0 \sum_{t=0}^{\infty} \beta^t (p_t^0 \cdot c_t + q_t^0 \cdot i_t - r_t^0 \cdot k_{t-1} - w_t^0 \ell_t - \alpha_t^0 \cdot d_t)$$

subject to

$$\Phi_c c_t + \Phi_g g_t + \Phi_i i_t = \Gamma k_{t-1} + d_t, \qquad -\ell_t^2 + g_t \cdot g_t = 0$$
A firm of type II acquires capital via investment and then rents stocks of capital to the $c,i$-producing type I firm.

A type II firm is a price taker facing the vector $v_0$ and the stochastic processes $\{r_t^0, q_t^0\}$. The firm chooses $k_{-1}$ and stochastic processes for $\{k_t, i_t\}_{t=0}^{\infty}$ to maximize

$$E \sum_{t=0}^{\infty} \beta^t (r_t^0 \cdot k_{t-1} - q_t^0 \cdot i_t) \,\Big|\, J_0 - v_0 \cdot k_{-1}$$

subject to

$$k_t = \Delta_k k_{t-1} + \Theta_k i_t$$
• Each component of the price system and the allocation resides in the space $L_0^2$.
• Given the price system and given ℎ−1 , 𝑘−1 , the allocation solves the representative
household’s problem and the problems of the two types of firms.
Versions of the two classical welfare theorems prevail under our assumptions.
We exploit that fact in our algorithm for computing a competitive equilibrium.
Step 1: Solve the planning problem by using dynamic programming.
The allocation (i.e., quantities) that solves the planning problem gives the competitive equilibrium quantities.
Step 2: use the following formulas to compute the equilibrium price system
$$w_t^0 = |S_g x_t| / \mu_0^w$$

$$v_0 = \Gamma' M_0^d / \mu_0^w + \Delta_k' M_0^k / \mu_0^w$$
Verification: With this price system, values can be assigned to the Lagrange multipliers for
each of our three classes of agents that cause all first-order necessary conditions to be satisfied
at these prices and at the quantities associated with the optimum of the planning problem.
$$y_t = U_a x_t$$

$$a_0 = E \sum_{t=0}^{\infty} \beta^t x_t' Z_a x_t \,\Big|\, J_0, \qquad Z_a = U_a' M_c / \mu_0^w$$

$$a_0 = x_0' \mu_a x_0 + \sigma_a$$
$$\mu_a = \sum_{\tau=0}^{\infty} \beta^\tau (A^{o\prime})^\tau Z_a A^{o\tau}$$

$$\sigma_a = \frac{\beta}{1 - \beta} \operatorname{trace}\Big( Z_a \sum_{\tau=0}^{\infty} \beta^\tau (A^o)^\tau C C' (A^{o\prime})^\tau \Big)$$
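Both $\mu_a$ and $\sigma_a$ can be approximated numerically by truncating the infinite sums; here is a minimal sketch (the truncation length T and the input arrays are up to the user):

import numpy as np

def asset_price_terms(A_o, C, Z_a, β, T=500):
    # Truncated versions of μ_a = Σ β^τ (A°')^τ Z_a (A°)^τ and the
    # trace term in σ_a
    n = A_o.shape[0]
    μ_a = np.zeros((n, n))
    cov_sum = np.zeros((n, n))
    Apow = np.eye(n)
    for τ in range(T):
        μ_a += β**τ * Apow.T @ Z_a @ Apow
        cov_sum += β**τ * Apow @ C @ C.T @ Apow.T
        Apow = A_o @ Apow
    σ_a = β / (1 - β) * np.trace(Z_a @ cov_sum)
    return μ_a, σ_a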
$$L_t^2 = \Big[ \{y_s\}_{s=t}^{\infty} : y_s \text{ is a random variable in } J_s \text{ for } s \ge t \text{ and } E \sum_{s=t}^{\infty} \beta^{s-t} y_s^2 \,\Big|\, J_t < +\infty \Big]$$
$$w_s^t = |S_g x_s| / [\bar e_j M_c x_t], \quad s \ge t$$
$$r_s^t = \Gamma' M_d x_s / [\bar e_j M_c x_t], \quad s \ge t$$
$$\alpha_s^t = M_d x_s / [\bar e_j M_c x_t], \quad s \ge t$$
77.3 Econometrics
Up to now, we have described how to solve the direct problem that maps model parameters
into an (equilibrium) stochastic process of prices and quantities.
Recall the inverse problem of inferring model parameters from a single realization of a time
series of some of the prices and quantities.
Another name for the inverse problem is econometrics.
An advantage of the [78] structure is that it comes with a self-contained theory of econometrics.
It is really just a tale of two state-space representations.
𝑥𝑡+1 = 𝐴𝑜 𝑥𝑡 + 𝐶𝑤𝑡+1
𝑦𝑡 = 𝐺𝑥𝑡 + 𝑣𝑡
where 𝑣𝑡 is a martingale difference sequence of measurement errors that satisfies 𝐸𝑣𝑡 𝑣𝑡′ =
𝑅, 𝐸𝑤𝑡+1 𝑣𝑠′ = 0 for all 𝑡 + 1 ≥ 𝑠 and
𝑥0 ∼ 𝒩(𝑥0̂ , Σ0 )
Innovations Representation:

$$\hat x_{t+1} = A^o \hat x_t + K_t a_t, \qquad y_t = G \hat x_t + a_t$$
• 𝑛𝑤 + 𝑛𝑦 versus 𝑛𝑦
• 𝐻(𝑦𝑡 ) ⊂ 𝐻(𝑤𝑡 , 𝑣𝑡 )
• 𝐻(𝑦𝑡 ) = 𝐻(𝑎𝑡 )
Kalman Filter:

Kalman gain:

$$K_t = A^o \Sigma_t G' (G \Sigma_t G' + R)^{-1}$$

Riccati recursion for the state covariance:

$$\Sigma_{t+1} = A^o \Sigma_t A^{o\prime} + CC' - A^o \Sigma_t G' (G \Sigma_t G' + R)^{-1} G \Sigma_t A^{o\prime}$$

The pair of equations

$$a_t = y_t - G \hat x_t, \qquad \hat x_{t+1} = A^o \hat x_t + K_t a_t$$

can be used recursively to construct a record of innovations $\{a_t\}_{t=0}^{T}$ from an $(\hat x_0, \Sigma_0)$ and a record of observations $\{y_t\}_{t=0}^{T}$.
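The recursion is straightforward to code directly; a minimal NumPy sketch (the inputs are whatever state-space arrays one has in hand):

import numpy as np

def kalman_innovations(A_o, C, G, R, y, x_hat0, Σ0):
    # Build the record of innovations a_t from observations y_t
    x_hat, Σ = x_hat0, Σ0
    innovations = []
    for y_t in y:
        a_t = y_t - G @ x_hat
        innovations.append(a_t)
        Ω = G @ Σ @ G.T + R
        K_t = A_o @ Σ @ G.T @ np.linalg.inv(Ω)
        x_hat = A_o @ x_hat + K_t @ a_t
        Σ = A_o @ Σ @ A_o.T + C @ C.T - K_t @ G @ Σ @ A_o.T
    return innovations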
Limiting Time-Invariant Innovations Representation

$$\Sigma = A^o \Sigma A^{o\prime} + CC' - A^o \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma A^{o\prime}$$

$$K = A^o \Sigma G' (G \Sigma G' + R)^{-1}$$

$$\hat x_{t+1} = A^o \hat x_t + K a_t, \qquad y_t = G \hat x_t + a_t$$
$$f(y_T, y_{T-1}, \ldots, y_0) = f_T(y_T \mid y_{T-1}, \ldots, y_0)\, f_{T-1}(y_{T-1} \mid y_{T-2}, \ldots, y_0) \cdots f_1(y_1 \mid y_0)\, f_0(y_0) = g_T(a_T)\, g_{T-1}(a_{T-1}) \cdots g_1(a_1)\, f_0(y_0)$$
Gaussian Log-Likelihood:
$$-.5 \sum_{t=0}^{T} \big\{ n_y \ln(2\pi) + \ln |\Omega_t| + a_t' \Omega_t^{-1} a_t \big\}$$
Key Insight: The zeros of the polynomial det[𝐺(𝑧𝐼 −𝐴𝑜 )−1 𝐾+𝐼] all lie inside the unit circle,
which means that 𝑎𝑡 lies in the space spanned by square summable linear combinations of 𝑦𝑡 .
𝐻(𝑎𝑡 ) = 𝐻(𝑦𝑡 )
$$L x_t \equiv x_{t-1}, \qquad L^{-1} x_t \equiv x_{t+1}$$

Applying the inverse of the operator on the right side and rearranging gives the vector autoregressive representation

$$y_t = \sum_{j=1}^{\infty} G (A^o - KG)^{j-1} K y_{t-j} + a_t$$
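The first few VAR coefficient matrices $G(A^o - KG)^{j-1}K$ are easy to compute from a time-invariant innovations representation; a minimal sketch (the arrays are placeholders supplied by the user):

import numpy as np

def var_coefficients(A_o, K, G, n_lags=5):
    # Coefficient on y_{t-j} is G (A° - K G)^{j-1} K
    coeffs = []
    M = np.eye(A_o.shape[0])
    for _ in range(n_lags):
        coeffs.append(G @ M @ K)
        M = (A_o - K @ G) @ M
    return coeffs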
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
• Π is nonsingular, and
• the absolute values of the eigenvalues of $(\Delta_h - \Theta_h \Pi^{-1} \Lambda)$ are strictly less than $1/\sqrt{\beta}$.
Key invertibility property: A canonical household service technology maps a service process $\{s_t\}$ in $L_0^2$ into a corresponding consumption process $\{c_t\}$ for which the implied household capital stock process $\{h_t\}$ is also in $L_0^2$.
An inverse household technology:
The restriction on the eigenvalues of the matrix $(\Delta_h - \Theta_h \Pi^{-1} \Lambda)$ keeps the household capital stock $\{h_t\}$ in $L_0^2$.
𝑠𝑖,𝑡 = Λℎ𝑖,𝑡−1
ℎ𝑖,𝑡 = Δℎ ℎ𝑖,𝑡−1
$$W_0 = E_0 \sum_{t=0}^{\infty} \beta^t (w_t^0 \ell_t + \alpha_t^0 \cdot d_t) + v_0 \cdot k_{-1}$$

$$\mu_0^w = \frac{E_0 \sum_{t=0}^{\infty} \beta^t \rho_t^0 \cdot (b_t - s_{i,t}) - W_0}{E_0 \sum_{t=0}^{\infty} \beta^t \rho_t^0 \cdot \rho_t^0}$$
This system expresses consumption demands at date $t$ as functions of: (i) time-$t$ conditional expectations of future scaled Arrow-Debreu prices $\{p_{t+s}^0\}_{s=0}^{\infty}$; (ii) the stochastic process for the household's endowment $\{d_t\}$ and preference shock $\{b_t\}$, as mediated through the multiplier $\mu_0^w$ and wealth $W_0$; and (iii) past values of consumption, as mediated through the state variable $h_{t-1}$.
We shall explore how the dynamic demand schedule for consumption goods opens up the possibility of satisfying Gorman's (1953) conditions for aggregation in a heterogeneous consumer model.
The first equation of our demand system is an Engel curve for consumption that is linear in the marginal utility $\mu_0^w$ of individual wealth with a coefficient on $\mu_0^w$ that depends only on prices.
The multiplier $\mu_0^w$ depends on wealth in an affine relationship, so that consumption is linear in wealth.
In a model with multiple consumers who have the same household technologies (Δℎ , Θℎ , Λ, Π)
but possibly different preference shock processes and initial values of household capital stocks,
the coefficient on the marginal utility of wealth is the same for all consumers.
Gorman showed that when Engel curves satisfy this property, there exists a unique community or aggregate preference ordering over aggregate consumption that is independent of the distribution of wealth.
𝑠𝑖,𝑡 = Λℎ𝑖,𝑡−1
ℎ𝑖,𝑡 = Δℎ ℎ𝑖,𝑡−1 ,
$$W_t = E_t \sum_{j=0}^{\infty} \beta^j (w_{t+j}^t \ell_{t+j} + \alpha_{t+j}^t \cdot d_{t+j}) + v_t \cdot k_{t-1}$$

$$\mu_t^w = \frac{E_t \sum_{j=0}^{\infty} \beta^j \rho_{t+j}^t \cdot (b_{t+j} - s_{i,t+j}) - W_t}{E_t \sum_{j=0}^{\infty} \beta^j \rho_{t+j}^t \cdot \rho_{t+j}^t}$$
$$\big[\Pi + \beta^{1/2} L^{-1} \Lambda (I - \beta^{1/2} L^{-1} \Delta_h)^{-1} \Theta_h\big]' \big[\Pi + \beta^{1/2} L \Lambda (I - \beta^{1/2} L \Delta_h)^{-1} \Theta_h\big] = \big[\hat\Pi + \beta^{1/2} L^{-1} \hat\Lambda (I - \beta^{1/2} L^{-1} \Delta_h)^{-1} \Theta_h\big]' \big[\hat\Pi + \beta^{1/2} L \hat\Lambda (I - \beta^{1/2} L \Delta_h)^{-1} \Theta_h\big]$$
The factorization identity guarantees that the $[\hat\Lambda, \hat\Pi]$ representation satisfies both requirements for a canonical representation.
Now we’ll provide quick overviews of examples of economies that fit within our framework
We provide details for a number of these examples in subsequent lectures
5. Cattle cycles
We’ll start with an example of a partial equilibrium in which we posit demand and supply
curves
Suppose that we want to capture the dynamic demand curve:
From material described earlier in this lecture, we know how to reverse engineer preferences
that generate this demand system
• note how the demand equations are cast in terms of the matrices in our standard preference representation
$$E_0 \sum_{t=0}^{\infty} \beta^t \big\{ p_t \cdot c_t - g_t \cdot g_t / 2 \big\}$$
Φ𝑐 𝑐𝑡 + Φ𝑖 𝑖𝑡 + Φ𝑔 𝑔𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡 .
$$E \sum_{t=0}^{\infty} \beta^t \big\{ p_t c_t - g_t^2 / 2 \big\}$$

$$c_t = \gamma k_{t-1}, \qquad k_t = \delta_k k_{t-1} + i_t, \qquad g_t = f_1 i_t + f_2 d_t$$
where 𝑑𝑡 is a cost shifter, 𝛾 > 0, and 𝑓1 > 0 is a cost parameter and 𝑓2 = 1. Demand is
governed by
𝑝 𝑡 = 𝛼 0 − 𝛼 1 𝑐𝑡 + 𝑢 𝑡
where 𝑢𝑡 is a demand shifter with mean zero and 𝛼0 , 𝛼1 are positive parameters.
Assume that 𝑢𝑡 , 𝑑𝑡 are uncorrelated first-order autoregressive processes.
$$R_t = b_t + \alpha h_t$$

$$p_t = E_t \sum_{\tau=0}^{\infty} (\beta \delta_h)^\tau R_{t+\tau}$$

where $h_t$ is the stock of housing at time $t$, $R_t$ is the rental rate for housing, $p_t$ is the price of new houses, and $b_t$ is a demand shifter; $\alpha < 0$ is a demand parameter, and $\delta_h$ is a depreciation factor for houses.
We cast this demand specification within our class of models by letting the stock of houses ℎ𝑡
evolve according to
ℎ𝑡 = 𝛿ℎ ℎ𝑡−1 + 𝑐𝑡 , 𝛿ℎ ∈ (0, 1)
𝑠𝑡 = 𝑏𝑡 − 𝜇0 𝜌𝑡0
where the price of new houses 𝑝𝑡 is related to 𝜌𝑡0 by 𝜌𝑡0 = 𝜋−1 [𝑝𝑡 − 𝛽𝛿ℎ 𝐸𝑡 𝑝𝑡+1 ].
Rosen, Murphy, and Scheinkman (1994). Let $p_t$ be the price of freshly slaughtered beef, $m_t$ the feeding cost of preparing an animal for slaughter, $\tilde h_t$ the one-period holding cost for a mature animal, $\gamma_1 \tilde h_t$ the one-period holding cost for a yearling, and $\gamma_0 \tilde h_t$ the one-period holding cost for a calf.
The cost processes $\{\tilde h_t, m_t\}_{t=0}^{\infty}$ are exogenous, while the stochastic process $\{p_t\}_{t=0}^{\infty}$ is determined by a rational expectations equilibrium. Let $\tilde x_t$ be the breeding stock, and $\tilde y_t$ be the total stock of animals.
The law of motion for cattle stocks is

$$\tilde x_t = (1 - \delta) \tilde x_{t-1} + g \tilde x_{t-3} - c_t$$
$$E_0 \sum_{t=0}^{\infty} \beta^t \Big\{ p_t c_t - \tilde h_t \tilde x_t - (\gamma_0 \tilde h_t)(g \tilde x_{t-1}) - (\gamma_1 \tilde h_t)(g \tilde x_{t-2}) - m_t c_t - \Psi(\tilde x_t, \tilde x_{t-1}, \tilde x_{t-2}, c_t) \Big\}$$

where

$$\Psi = \frac{\psi_1}{2} \tilde x_t^2 + \frac{\psi_2}{2} \tilde x_{t-1}^2 + \frac{\psi_3}{2} \tilde x_{t-2}^2 + \frac{\psi_4}{2} c_t^2$$
Demand is governed by
𝑐𝑡 = 𝛼0 − 𝛼1 𝑝𝑡 + 𝑑𝑡̃
We’ll describe the following pair of schooling models that view education as a time-to-build
process:
• Rosen schooling model for engineers
• Two-occupation model
Ryoo and Rosen’s (2004) [138] model consists of the following equations:
first, a demand curve for engineers
third, a definition of the discounted present value of each new engineering student
$$v_t = \beta^k E_t \sum_{j=0}^{\infty} (\beta \delta_N)^j w_{t+k+j};$$
𝑛𝑡 = 𝛼𝑠 𝑣𝑡 + 𝜖2𝑡 , 𝛼𝑠 > 0
Here {𝜖1𝑡 , 𝜖2𝑡 } are stochastic processes of labor demand and supply shocks.
Definition: A partial equilibrium is a stochastic process $\{w_t, N_t, v_t, n_t\}_{t=0}^{\infty}$ satisfying these four equations, and initial conditions $N_{-1}$ and $n_{-s}$, $s = 1, \ldots, k$.
We sweep the time-to-build structure and the demand for engineers into the household technology and put the supply of new engineers into the technology for producing goods.
$$s_t = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} h_{1t-1} \\ h_{2t-1} \\ \vdots \\ h_{k+1,t-1} \end{bmatrix} + 0 \cdot c_t$$

$$\begin{bmatrix} h_{1t} \\ h_{2t} \\ \vdots \\ h_{k,t} \\ h_{k+1,t} \end{bmatrix} = \begin{bmatrix} \delta_N & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} h_{1t-1} \\ h_{2t-1} \\ \vdots \\ h_{k,t-1} \\ h_{k+1,t-1} \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} c_t$$
This specification sets Rosen’s 𝑁𝑡 = ℎ1𝑡−1 , 𝑛𝑡 = 𝑐𝑡 , ℎ𝜏+1,𝑡−1 = 𝑛𝑡−𝜏 , 𝜏 = 1, … , 𝑘, and uses the
home-produced service to capture the demand for labor. Here 𝜆1 embodies Rosen’s demand
parameter 𝛼𝑑 .
• The supply of new workers becomes our consumption.
• The dynamic demand curve becomes Rosen’s dynamic supply curve for new workers.
Remark: This has an Imai-Keane flavor.
For more details and Python code see Rosen schooling model.
$$\begin{bmatrix} w_{ut} \\ w_{st} \end{bmatrix} = \alpha_d \begin{bmatrix} N_{ut} \\ N_{st} \end{bmatrix} + \epsilon_{1t}$$
where 𝑁𝑠𝑡 , 𝑁𝑢𝑡 are stocks of the two types of labor, and 𝑛𝑠𝑡 , 𝑛𝑢𝑡 are entry rates into the two
occupations.
third, definitions of discounted present values of new entrants to the skilled and unskilled occupations, respectively:
$$v_{st} = \beta^k E_t \sum_{j=0}^{\infty} (\beta \delta_N)^j w_{s,t+k+j}$$

$$v_{ut} = E_t \sum_{j=0}^{\infty} (\beta \delta_N)^j w_{u,t+j}$$
where $w_{ut}, w_{st}$ are wage rates for the two occupations; and fourth, supply curves for new entrants:

$$\begin{bmatrix} n_{st} \\ n_{ut} \end{bmatrix} = \alpha_s \begin{bmatrix} v_{st} \\ v_{ut} \end{bmatrix} + \epsilon_{2t}$$
Short Cut
As an alternative, Siow simply used the equalizing differences condition
𝑣𝑢𝑡 = 𝑣𝑠𝑡
$$\phi_c \cdot c_t + i_t = \gamma k_{t-1} + e_t$$
$$k_t = k_{t-1} + i_t$$
$$\phi_i i_t - g_t = 0$$
Implication One:
Equality of Present Values of Moving Average Coefficients of 𝑐 and 𝑒
$$k_{t-1} = \beta \sum_{j=0}^{\infty} \beta^j (\phi_c \cdot c_{t+j} - e_{t+j}) \;\Rightarrow\; k_{t-1} = \beta \sum_{j=0}^{\infty} \beta^j E\big[(\phi_c \cdot c_{t+j} - e_{t+j}) \mid J_t\big] \;\Rightarrow\; \sum_{j=0}^{\infty} \beta^j (\phi_c)' \chi_j = \sum_{j=0}^{\infty} \beta^j \epsilon_j$$

and Implication Two: the multiplier $\mathcal{M}_t^k$ is a martingale.
For more details see Permanent Income Using the DLE class
Testing Permanent Income Models:
We have two types of implications of permanent income models:
• Equality of present values of moving average coefficients.
• Martingale ℳ𝑘𝑡 .
These have been tested in work by Hansen, Sargent, and Roberts (1991) [140] and by Attanasio and Pavoni (2011) [12].
We now assume that there is a finite number of households, each with its own household technology and preferences over consumption services.
Household 𝑗 orders preferences over consumption processes according to
$$-\Big(\frac{1}{2}\Big) E \sum_{t=0}^{\infty} \beta^t \big[(s_{jt} - b_{jt}) \cdot (s_{jt} - b_{jt}) + \ell_{jt}^2\big] \,\Big|\, J_0$$
𝑏𝑗𝑡 = 𝑈𝑏𝑗 𝑧𝑡
$$E \sum_{t=0}^{\infty} \beta^t p_t^0 \cdot c_{jt} \,\Big|\, J_0 = E \sum_{t=0}^{\infty} \beta^t (w_t^0 \ell_{jt} + \alpha_t^0 \cdot d_{jt}) \,\Big|\, J_0 + v_0 \cdot k_{j,-1},$$
where 𝑘𝑗,−1 is given. The 𝑗th consumer owns an endowment process 𝑑𝑗𝑡 , governed by the
stochastic process 𝑑𝑗𝑡 = 𝑈𝑑𝑗 𝑧𝑡 .
We refer to this as a setting with Gorman heterogeneous households.
This specification confines heterogeneity among consumers to:
• differences in the preference processes {𝑏𝑗𝑡 }, represented by different selections of 𝑈𝑏𝑗
• differences in the endowment processes {𝑑𝑗𝑡 }, represented by different selections of 𝑈𝑑𝑗
• differences in ℎ𝑗,−1 and
• differences in 𝑘𝑗,−1
The matrices Λ, Π, Δℎ , Θℎ do not depend on 𝑗.
This makes everybody’s demand system have the form described earlier, with different 𝜇𝑤 𝑗0 ’s
(reflecting different wealth levels) and different 𝑏𝑗𝑡 preference shock processes and initial con-
ditions for household capital stocks.
Punchline: there exists a representative consumer.
We can use the representative consumer to compute a competitive equilibrium aggregate
allocation and price system.
With the equilibrium aggregate allocation and price system in hand, we can then compute
allocations to each household.
Computing Allocations to Individuals:
Set
$$\ell_{jt} = (\mu_{0j}^w / \mu_{0a}^w)\, \ell_{at}$$

$$\mu_{0j}^w \, E_0 \sum_{t=0}^{\infty} \beta^t \big\{ \rho_t^0 \cdot \rho_t^0 + (w_t^0 / \mu_{0a}^w)\, \ell_{at} \big\} = E_0 \sum_{t=0}^{\infty} \beta^t \big\{ \rho_t^0 \cdot (b_{jt} - s_{jt}) - \alpha_t^0 \cdot d_{jt} \big\} - v_0 \cdot k_{j,-1}$$

$$s_{jt} - b_{jt} = \mu_{0j}^w \rho_t^0$$
We now describe a less tractable type of heterogeneity across households that we dub Non-Gorman heterogeneity.
Here is the specification:
Preferences and Household Technologies:
$$-\frac{1}{2} E \sum_{t=0}^{\infty} \beta^t \big[(s_{it} - b_{it}) \cdot (s_{it} - b_{it}) + \ell_{it}^2\big] \,\Big|\, J_0$$
𝑏𝑖𝑡 = 𝑈𝑏𝑖 𝑧𝑡
Production Technology
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
𝑑𝑖𝑡 = 𝑈𝑑𝑖 𝑧𝑡 , 𝑖 = 1, 2
Pareto Problem:
$$-\frac{\lambda}{2} E_0 \sum_{t=0}^{\infty} \beta^t \big[(s_{1t} - b_{1t}) \cdot (s_{1t} - b_{1t}) + \ell_{1t}^2\big] - \frac{(1 - \lambda)}{2} E_0 \sum_{t=0}^{\infty} \beta^t \big[(s_{2t} - b_{2t}) \cdot (s_{2t} - b_{2t}) + \ell_{2t}^2\big]$$
$$p_t = \mu_0^{-1} \Pi' b_t - \mu_0^{-1} \Pi' \Pi c_t$$
Integrating the marginal utility vector shows that preferences can be taken to be
$$\mu_0^{-1} \Pi' \Pi = \big( \mu_{01} \Pi_1^{-1} \Pi_1^{-1\prime} + \mu_{02} \Pi_2^{-1} \Pi_2^{-1\prime} \big)^{-1}$$
Dynamic Analogue:
We now describe how to extend mongrel aggregation to a dynamic setting.
The key comparison is
• Static: factor a covariance matrix-like object
• Dynamic: factor a spectral-density matrix-like object
Programming Problem for Dynamic Mongrel Aggregation:
Our strategy for deducing the mongrel preference ordering over 𝑐𝑡 = 𝑐1𝑡 + 𝑐2𝑡 is to solve the
programming problem: choose {𝑐1𝑡 , 𝑐2𝑡 } to maximize the criterion
$$-\sum_{t=0}^{\infty} \beta^t \big[ \lambda (s_{1t} - b_{1t}) \cdot (s_{1t} - b_{1t}) + (1 - \lambda)(s_{2t} - b_{2t}) \cdot (s_{2t} - b_{2t}) \big]$$
subject to the two households' service technologies, with $(h_{1,-1}, h_{2,-1})$ given and $\{b_{1t}\}, \{b_{2t}\}, \{c_t\}$ being known and fixed sequences.
Substituting the {𝑐1𝑡 , 𝑐2𝑡 } sequences that solve this problem as functions of {𝑏1𝑡 , 𝑏2𝑡 , 𝑐𝑡 } into
the objective determines a mongrel preference ordering over {𝑐𝑡 } = {𝑐1𝑡 + 𝑐2𝑡 }.
In solving this problem, it is convenient to proceed by using Fourier transforms. For details,
please see [78] where they deploy a
Secret Weapon: Another application of the spectral factorization identity.
Concluding remark: The [78] class of models described in this lecture are all complete
markets models. We have exploited the fact that complete market models are all alike to
allow us to define a class that gives the same name to different things in the spirit of
Henri Poincare.
Could we create such a class for incomplete markets models?
That would be nice, but before trying it would be wise to contemplate the remainder of a
statement by Robert E. Lucas, Jr., with which we began this lecture.
“Complete market economies are all alike but each incomplete market economy is
incomplete in its own individual way.” Robert E. Lucas, Jr., (1989)
Chapter 78
Growth in Dynamic Linear Economies
78.1 Contents
This lecture describes several complete market economies having a common linear-quadratic-Gaussian structure.
Three examples of such economies show how the DLE class can be used to compute equilibria
of such economies in Python and to illustrate how different versions of these economies can or
cannot generate sustained growth.
We require the following imports
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
𝑑𝑡 = 𝑈 𝑑 𝑧𝑡
• Consumption and physical investment goods are produced using the following technology
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
$$g_t \cdot g_t = l_t^2$$
where 𝑐𝑡 is a vector of consumption goods, 𝑔𝑡 is a vector of intermediate goods, 𝑖𝑡 is a
vector of investment goods, 𝑘𝑡 is a vector of physical capital goods, and 𝑙𝑡 is the amount
of labor supplied by the representative household.
• Preferences of a representative household are described by
$$-\frac{1}{2} \mathbb{E} \sum_{t=0}^{\infty} \beta^t \big[(s_t - b_t) \cdot (s_t - b_t) + l_t^2\big], \quad 0 < \beta < 1$$
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
where 𝑠𝑡 is a vector of consumption services, and ℎ𝑡 is a vector of household capital
stocks.
Thus, an instance of this class of economies is described by the matrices
{𝐴22 , 𝐶2 , 𝑈𝑏 , 𝑈𝑑 , Φ𝑐 , Φ𝑔 , Φ𝑖 , Γ, Δ𝑘 , Θ𝑘 , Λ, Π, Δℎ , Θℎ }
The first welfare theorem asserts that a competitive equilibrium allocation solves the following planning problem: choose $\{c_t, s_t, i_t, h_t, k_t, g_t\}_{t=0}^{\infty}$ to maximize

$$-\frac{1}{2} \mathbb{E} \sum_{t=0}^{\infty} \beta^t \big[(s_t - b_t) \cdot (s_t - b_t) + g_t \cdot g_t\big]$$

subject to
Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
and
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
𝑑𝑡 = 𝑈 𝑑 𝑧𝑡
The DLE class in Python maps this planning problem into a linear-quadratic dynamic programming problem and then solves it by using QuantEcon's LQ class.
(See Section 5.5 of Hansen & Sargent (2013) [78] for a full description of how to map these
economies into an LQ setting, and how to use the solution to the LQ problem to construct
the output matrices in order to simulate the economies)
The state for the LQ problem is
$$x_t = \begin{bmatrix} h_{t-1} \\ k_{t-1} \\ z_t \end{bmatrix}$$
𝑥𝑡+1 = 𝐴𝑜 𝑥𝑡 + 𝐶𝑤𝑡+1
Each of the example economies shown here will share a number of components. In particular,
for each we will consider preferences of the form
$$-\frac{1}{2} \mathbb{E} \sum_{t=0}^{\infty} \beta^t \big[(s_t - b_t)^2 + l_t^2\big], \quad 0 < \beta < 1$$
𝑠𝑡 = 𝜆ℎ𝑡−1 + 𝜋𝑐𝑡
ℎ𝑡 = 𝛿ℎ ℎ𝑡−1 + 𝜃ℎ 𝑐𝑡
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
𝑐𝑡 + 𝑖𝑡 = 𝛾1 𝑘𝑡−1 + 𝑑1𝑡
𝑘𝑡 = 𝛿𝑘 𝑘𝑡−1 + 𝑖𝑡
𝑔𝑡 = 𝜙1 𝑖𝑡 , 𝜙1 > 0
$$\begin{bmatrix} d_{1t} \\ 0 \end{bmatrix} = U_d z_t$$
$$z_{t+1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.8 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} z_t + \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} w_{t+1}$$

$$U_b = \begin{bmatrix} 30 & 0 & 0 \end{bmatrix}, \qquad U_d = \begin{bmatrix} 5 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
We shall vary {𝜆, 𝜋, 𝛿ℎ , 𝜃ℎ , 𝛾1 , 𝛿𝑘 , 𝜙1 } and the initial state 𝑥0 across the three economies.
First, we set parameters such that consumption follows a random walk. In particular, we set
$$\lambda = 0, \quad \pi = 1, \quad \gamma_1 = 0.1, \quad \phi_1 = 0.00001, \quad \delta_k = 0.95, \quad \beta = \frac{1}{1.05}$$
(In this economy $\delta_h$ and $\theta_h$ are arbitrary as household capital does not enter the equation for consumption services. We set them to values that will become useful in Example 3.)
It is worth noting that this choice of parameter values ensures that 𝛽(𝛾1 + 𝛿𝑘 ) = 1.
For simulations of this economy, we choose an initial condition of
$$x_0 = \begin{bmatrix} 5 & 150 & 1 & 0 & 0 \end{bmatrix}'$$
# Initial condition
x0 = np.array([[5], [150], [1], [0], [0]])
These parameter values are used to define an economy of the DLE class.
We can then simulate the economy for a chosen length of time, from our initial state vector
𝑥0
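The construction cell itself is missing from this extract. As a hedged sketch, the economy can be built with the DLE class from the quantecon library roughly as follows; the tuple groupings and array values are assumptions patterned on the asset-pricing example in the next chapter, adjusted so that $\gamma_1 = 0.1$ and $\phi_1 = 0.00001$:

import numpy as np
from quantecon import DLE

# Technology, preference and information arrays for Example 1
γ = np.array([[0.1], [0]])
ϕ_c = np.array([[1], [0]])
ϕ_g = np.array([[0], [1]])
ϕ_i = np.array([[1], [-0.00001]])
δ_k, θ_k = np.array([[0.95]]), np.array([[1]])
β = np.array([[1 / 1.05]])
l_λ, π_h = np.array([[0]]), np.array([[1]])
δ_h, θ_h = np.array([[0.9]]), np.array([[0.1]])
a22 = np.array([[1, 0, 0], [0, 0.8, 0], [0, 0, 0.5]])
c2 = np.array([[0, 0], [1, 0], [0, 1]])
ub = np.array([[30, 0, 0]])
ud = np.array([[5, 1, 0], [0, 0, 0]])

info1 = (a22, c2, ub, ud)
tech1 = (ϕ_c, ϕ_g, ϕ_i, γ, δ_k, θ_k)
pref1 = (β, l_λ, π_h, δ_h, θ_h)

econ1 = DLE(info1, tech1, pref1)
econ1.compute_sequence(x0, ts_length=300)   # x0 as defined above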
The economy stores the simulated values for each variable. Below we plot consumption and
investment
In [6]: # This is the right panel of Fig 5.7.1 from p.105 of HS2013
plt.plot(econ1.c[0], label='Cons.')
plt.plot(econ1.i[0], label='Inv.')
plt.legend()
plt.show()
Inspection of the plot shows that the sample paths of consumption and investment drift in
ways that suggest that each has or nearly has a random walk or unit root component.
This is confirmed by checking the eigenvalues of 𝐴𝑜
The endogenous eigenvalue that appears to be unity reflects the random walk character of
consumption in Hall’s model.
• Actually, the largest endogenous eigenvalue is very slightly below 1.
• This outcome comes from the small adjustment cost 𝜙1 .
In [8]: econ1.endo[1]
Out[8]: 0.9999999999904767
The fact that the largest endogenous eigenvalue is strictly less than unity in modulus means
that it is possible to compute the non-stochastic steady state of consumption, investment and
capital.
In [9]: econ1.compute_steadystate()
np.set_printoptions(precision=3, suppress=True)
print(econ1.css, econ1.iss, econ1.kss)
However, the near-unity endogenous eigenvalue means that these steady state values are of
little relevance.
We generate our next economy by making two alterations to the parameters of Example 1.
• First, we raise 𝜙1 from 0.00001 to 1.
– This will lower the endogenous eigenvalue that is close to 1, causing the economy
to head more quickly to the vicinity of its non-stochastic steady-state.
• Second, we raise 𝛾1 from 0.1 to 0.15.
– This has the effect of raising the optimal steady-state value of capital.
We also start the economy off from an initial condition with a lower capital stock
$$x_0 = \begin{bmatrix} 5 & 20 & 1 & 0 & 0 \end{bmatrix}'$$
In [10]: γ2 = 0.15
γ22 = np.array([[γ2], [0]])
ϕ_12 = 1
ϕ_i2 = np.array([[1], [-ϕ_12]])
Creating the DLE class and then simulating gives the following plot for consumption and investment

# (economy creation reconstructed -- the original cell was truncated;
# info1 and pref1 are as in Example 1, with the altered technology)
tech2 = (ϕ_c, ϕ_g, ϕ_i2, γ22, δ_k, θ_k)
econ2 = DLE(info1, tech2, pref1)
x02 = np.array([[5], [20], [1], [0], [0]])

econ2.compute_sequence(x02, ts_length=300)
plt.plot(econ2.c[0], label='Cons.')
plt.plot(econ2.i[0], label='Inv.')
plt.legend()
plt.show()
Simulating our new economy shows that consumption grows quickly in the early stages of the
sample.
However, it then settles down around the new non-stochastic steady-state level of consumption of 17.5, which we find as follows
In [12]: econ2.compute_steadystate()
print(econ2.css, econ2.iss, econ2.kss)
The economy converges faster to this level than in Example 1 because the largest endogenous
eigenvalue of 𝐴𝑜 is now significantly lower than 1.
For our third economy, we choose parameter values with the aim of generating sustained growth in consumption, investment and capital.
To do this, we set parameters so that Jones and Manuelli's "growth condition" is just satisfied.
In our notation, just satisfying the growth condition is actually equivalent to setting $\beta(\gamma_1 + \delta_k) = 1$, the condition that was necessary for consumption to be a random walk in Hall's model.
Thus, we lower $\gamma_1$ back to 0.1.
In our model, this is a necessary but not sufficient condition for growth.
To generate growth we set preference parameters to reflect habit persistence.
In particular, we set 𝜆 = −1, 𝛿ℎ = 0.9 and 𝜃ℎ = 1 − 𝛿ℎ = 0.1.
This makes preferences assume the form
$$-\frac{1}{2} \mathbb{E} \sum_{t=0}^{\infty} \beta^t \Big[ \Big( c_t - b_t - (1 - \delta_h) \sum_{j=0}^{\infty} \delta_h^j c_{t-j-1} \Big)^2 + l_t^2 \Big]$$
Thus, adding habit persistence to the Hall model of Example 1 is enough to generate sustained growth in our economy.
The eigenvalues of 𝐴𝑜 in this new economy are
We now have two unit endogenous eigenvalues. One stems from satisfying the growth condition (as in Example 1).
The other unit eigenvalue results from setting $\lambda = -1$.
To show the importance of both of these for generating growth, we consider the following experiments.
econ4.compute_sequence(x0, ts_length=300)
plt.plot(econ4.c[0], label='Cons.')
plt.plot(econ4.i[0], label='Inv.')
plt.legend()
plt.show()
econ5.compute_sequence(x0, ts_length=300)
plt.plot(econ5.c[0], label='Cons.')
plt.plot(econ5.i[0], label='Inv.')
plt.legend()
plt.show()
Chapter 79
Lucas Asset Pricing Using DLE
79.1 Contents
This lecture uses the DLE class to price payout streams that are linear functions of the economy's state vector, as well as risk-free assets that pay out one unit of the first consumption good with certainty.
We assume basic knowledge of the class of economic environments that fall within the domain
of the DLE class.
Many details about the basic environment are contained in the lecture Growth in Dynamic
Linear Economies.
We’ll also need the following imports
We use a linear-quadratic version of an economy that Lucas (1978) [109] used to develop an
equilibrium theory of asset prices:
Preferences
$$-\frac{1}{2} \mathbb{E} \sum_{t=0}^{\infty} \beta^t \big[(c_t - b_t)^2 + l_t^2\big] \,\Big|\, J_0$$
𝑠𝑡 = 𝑐𝑡
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
Technology
𝑐𝑡 = 𝑑1𝑡
𝑘𝑡 = 𝛿𝑘 𝑘𝑡−1 + 𝑖𝑡
𝑔𝑡 = 𝜙1 𝑖𝑡 , 𝜙1 > 0
$$\begin{bmatrix} d_{1t} \\ 0 \end{bmatrix} = U_d z_t$$

Information

$$z_{t+1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.8 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} z_t + \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} w_{t+1}$$

$$U_b = \begin{bmatrix} 30 & 0 & 0 \end{bmatrix}, \qquad U_d = \begin{bmatrix} 5 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

$$x_0 = \begin{bmatrix} 5 & 150 & 1 & 0 & 0 \end{bmatrix}'$$
[78] show that the time $t$ value of a permanent claim to a stream $y_s = U_a x_s$, $s \ge t$ is:

$$a_t = (x_t' \mu_a x_t + \sigma_a) / (\bar e_1 M_c x_t)$$

with

$$\mu_a = \sum_{\tau=0}^{\infty} \beta^\tau (A^{o\prime})^\tau Z_a A^{o\tau}$$

$$\sigma_a = \frac{\beta}{1 - \beta} \operatorname{trace}\Big( Z_a \sum_{\tau=0}^{\infty} \beta^\tau (A^o)^\tau C C' (A^{o\prime})^\tau \Big)$$
where

$$Z_a = U_a' M_c$$
The use of 𝑒1̄ indicates that the first consumption good is the numeraire.
In [3]: gam = 0
γ = np.array([[gam], [0]])
ϕ_c = np.array([[1], [0]])
ϕ_g = np.array([[0], [1]])
ϕ_1 = 1e-4
ϕ_i = np.array([[0], [-ϕ_1]])
δ_k = np.array([[.95]])
θ_k = np.array([[1]])
β = np.array([[1 / 1.05]])
ud = np.array([[5, 1, 0],
[0, 0, 0]])
a22 = np.array([[1, 0, 0],
[0, 0.8, 0],
[0, 0, 0.5]])
c2 = np.array([[0, 1, 0],
[0, 0, 1]]).T
l_λ = np.array([[0]])
π_h = np.array([[1]])
δ_h = np.array([[.9]])
θ_h = np.array([[1]]) - δ_h
ub = np.array([[30, 0, 0]])
x0 = np.array([[5, 150, 1, 0, 0]]).T
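A hedged sketch of how these primitives are passed to the DLE class (the tuple groupings mirror the previous chapter; attribute and method details are assumptions, not a verbatim reproduction of the lecture's cell):

econ1 = DLE((a22, c2, ub, ud),
            (ϕ_c, ϕ_g, ϕ_i, γ, δ_k, θ_k),
            (β, l_λ, π_h, δ_h, θ_h))
econ1.compute_sequence(x0, ts_length=100)
# With the solved economy's A°, C and M_c in hand, the claim price
# a_t = (x_t' μ_a x_t + σ_a) / (e̅_1 M_c x_t) can be assembled from the
# formulas above, e.g. via the truncated sums sketched in Chapter 77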
The graph below plots the price of this claim over time:
The next plot displays the realized gross rate of return on this “Lucas tree” as well as on a
risk-free one-period bond:
Above we have also calculated the correlation coefficient between these two returns.
To give an idea of how the term structure of interest rates moves in this economy, the next
plot displays the net rates of return on one-period and five-period risk-free bonds:
From the above plot, we can see the tendency of the term structure to slope up when rates
are low and to slope down when rates are high.
Comparing it to the previous plot of the price of the “Lucas tree”, we can also see that net
rates of return are low when the price of the tree is high, and vice versa.
We now plot the realized gross rate of return on a “Lucas tree” as well as on a risk-free one-
period bond when the autoregressive parameter for the endowment process is reduced to 0.4:
The correlation between these two gross rates is now more negative.
Next, we again plot the net rates of return on one-period and five-period risk-free bonds:
We can see the tendency of the term structure to slope up when rates are low (and down
when rates are high) has been accentuated relative to the first instance of our economy.
Chapter 80

IRFs in Hall Models
80.1 Contents
This lecture shows how the DLE class can be used to create impulse response functions for
three related economies, starting from Hall (1978) [67].
Knowledge of the basic economic environment is assumed.
See the lecture “Growth in Dynamic Linear Economies” for more details.
$$\lambda = 0, \quad \pi = 1, \quad \gamma_1 = 0.1, \quad \phi_1 = 0.00001, \quad \delta_k = 0.95, \quad \beta = \frac{1}{1.05}$$
(In this example 𝛿ℎ and 𝜃ℎ are arbitrary as household capital does not enter the equation for
consumption services.
We set them to values that will become useful in Example 3)
It is worth noting that this choice of parameter values ensures that 𝛽(𝛾1 + 𝛿𝑘 ) = 1.
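A quick check, under the stated parameter values, that the growth condition indeed holds:

β, γ_1, δ_k = 1 / 1.05, 0.1, 0.95
assert abs(β * (γ_1 + δ_k) - 1) < 1e-12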
For simulations of this economy, we choose an initial condition of:
$$x_0 = \begin{bmatrix} 5 & 150 & 1 & 0 & 0 \end{bmatrix}'$$
These parameter values are used to define an economy of the DLE class.
We can then simulate the economy for a chosen length of time, from our initial state vector
𝑥0 .
The economy stores the simulated values for each variable. Below we plot consumption and
investment:
The DLE class can be used to create impulse response functions for each of the endogenous
variables: {𝑐𝑡 , 𝑠𝑡 , ℎ𝑡 , 𝑖𝑡 , 𝑘𝑡 , 𝑔𝑡 }.
If no selector vector for the shock is specified, the default choice is to give IRFs to the first
shock in 𝑤𝑡+1 .
Below we plot the impulse response functions of investment and consumption to an endow-
ment innovation (the first shock) in the Hall model:
It can be seen that the endowment shock has permanent effects on the level of both consump-
tion and investment, consistent with the endogenous unit eigenvalue in this economy.
Investment is much more responsive to the endowment shock at shorter time horizons.
We generate our next economy by making only one change to the parameters of Example 1: we raise the parameter associated with the cost of adjusting capital, $\phi_1$, from 0.00001 to 0.2.
This will lower the endogenous eigenvalue that is unity in Example 1 to a value slightly below
1.
In [7]: econ2.irf(ts_length=40, shock=None)
# This is the left panel of Fig 5.8.1 from p.106 of HS2013
plt.plot(econ2.c_irf, label='Cons.')
plt.plot(econ2.i_irf, label='Inv.')
plt.legend()
plt.show()
In [8]: econ2.endo
In [9]: econ2.compute_steadystate()
print(econ2.css, econ2.iss, econ2.kss)
The first graph shows that there seems to be a downward trend in both consumption and investment.
This is a consequence of the decrease in the largest endogenous eigenvalue from unity in the earlier economy, caused by the higher adjustment cost.
The present economy has a nonstochastic steady state value of 5 for consumption and 0 for
both capital and investment.
Because the largest endogenous eigenvalue is still close to 1, the economy heads only slowly
towards these mean values.
The impulse response functions now show that an endowment shock does not have a perma-
nent effect on the levels of either consumption or investment.
We generate our third economy by raising 𝜙1 further, to 1.0. We also raise the production
function parameter from 0.1 to 0.15 (which raises the non-stochastic steady state value of
capital above zero).
We also change the specification of preferences to make the consumption good durable.
Specifically, we allow for a single durable household good obeying:

$$h_t = \delta_h h_{t-1} + c_t, \qquad 0 < \delta_h < 1$$

Services are related to the stock of durables at the beginning of the period:

$$s_t = \lambda h_{t-1}, \qquad \lambda > 0$$

And preferences are ordered by:

$$- \frac{1}{2} \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left[ (\lambda h_{t-1} - b_t)^2 + l_t^2 \right] \Big| J_0$$
To implement this, we set 𝜆 = 0.1 and 𝜋 = 0 (we have already set 𝜃ℎ = 1 and 𝛿ℎ = 0.9).
We start from an initial condition that makes consumption begin near its non-stochastic steady state.
In [10]: ϕ_13 = 1
ϕ_i3 = np.array([[1], [-ϕ_13]])
γ_12 = 0.15
γ_2 = np.array([[γ_12], [0]])
l_λ2 = np.array([[0.1]])
π_h2 = np.array([[0]])
In contrast to Hall’s original model of Example 1, it is now investment that is much smoother
than consumption.
This illustrates how making consumption goods durable tends to undo the strong consump-
tion smoothing result that Hall obtained.
The impulse response functions confirm that consumption is now much more responsive to an
endowment shock (and investment less so) than in Example 1.
As in Example 2, the endowment shock has permanent effects on neither variable.
Chapter 81

Permanent Income Model Using the DLE Class
81.1 Contents
This lecture adds a third solution method for the linear-quadratic-Gaussian permanent in-
come model with 𝛽𝑅 = 1, complementing the other two solution methods described in
Optimal Savings I: The Permanent Income Model and Optimal Savings II: LQ Techniques
and this Jupyter notebook http://nbviewer.jupyter.org/github/QuantEcon/
QuantEcon.notebooks/blob/master/permanent_income.ipynb.
The additional solution method uses the DLE class.
In this way, we map the permanent income model into the framework of Hansen & Sargent
(2013) “Recursive Models of Dynamic Linear Economies” [78].
We’ll also require the following imports
np.set_printoptions(suppress=True, precision=4)
$$E_0 \sum_{t=0}^{\infty} \beta^t u(c_t) \qquad (1)$$
where 𝑤𝑡+1 is an IID process with mean zero and identity contemporaneous covariance ma-
trix, 𝐴22 is a stable matrix, its eigenvalues being strictly below unity in modulus, and 𝑈𝑦 is a
selection vector that identifies 𝑦 with a particular linear combination of the 𝑧𝑡 .
We impose the following condition on the consumption, borrowing plan:
$$E_0 \sum_{t=0}^{\infty} \beta^t b_t^2 < +\infty \qquad (4)$$
$$x_t = \begin{bmatrix} z_t \\ b_t \end{bmatrix}$$
where 𝑏𝑡 is its one-period debt falling due at the beginning of period 𝑡 and 𝑧𝑡 contains all
variables useful for forecasting its future endowment.
We assume that {𝑦𝑡 } follows a second order univariate autoregressive process:
One way of solving this model is to map the problem into the framework outlined in Section
4.8 of [78] by setting up our technology, information and preference matrices as follows:
Technology: $\phi_c = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\phi_g = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $\phi_i = \begin{bmatrix} -1 \\ -0.00001 \end{bmatrix}$, $\Gamma = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$, $\Delta_k = 0$, $\Theta_k = R$.

Information: $A_{22} = \begin{bmatrix} 1 & 0 & 0 \\ \alpha & \rho_1 & \rho_2 \\ 0 & 1 & 0 \end{bmatrix}$, $C_2 = \begin{bmatrix} 0 \\ \sigma \\ 0 \end{bmatrix}$, $U_b = \begin{bmatrix} \gamma & 0 & 0 \end{bmatrix}$, $U_d = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$.

Preferences: $\Lambda = 0$, $\Pi = 1$, $\Delta_h = 0$, $\Theta_h = 0$.
We set parameters

$$\alpha = 10, \quad \beta = 0.95, \quad \rho_1 = 0.9, \quad \rho_2 = 0, \quad \sigma = 1$$
(The value of 𝛾 does not affect the optimal decision rule)
The chosen matrices mean that the household’s technology is:
$$c_t + k_{t-1} = i_t + y_t$$

$$\frac{k_t}{R} = i_t$$

$$l_t^2 = (0.00001)^2 \, i_t^2$$
Combining the first two of these gives the budget constraint of the permanent income model,
where 𝑘𝑡 = 𝑏𝑡+1 .
The third equation is a very small penalty on debt-accumulation to rule out Ponzi schemes.
We set up this instance of the DLE class below:
γ = np.array([[-1], [0]])
ϕ_c = np.array([[1], [0]])
ϕ_g = np.array([[0], [1]])
ϕ_1 = 1e-5
ϕ_i = np.array([[-1], [-ϕ_1]])
δ_k = np.array([[0]])
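For completeness, here is a hedged sketch of the remaining matrices and of the construction of the economy, mirroring the mapping stated above. The tuple grouping (information, technology, preferences) and the DLE import follow the conventions used in the companion DLE lectures; x0 below is illustrative.

R = 1 / 0.95                       # βR = 1
θ_k = np.array([[R]])
β_val, α, ρ_1, ρ_2, σ = 0.95, 10, 0.9, 0, 1
a22 = np.array([[1,   0,   0],
                [α, ρ_1, ρ_2],
                [0,   1,   0]])
c2 = np.array([[0], [σ], [0]])
ud = np.array([[0, 1, 0],
               [0, 0, 0]])
ub = np.array([[100, 0, 0]])       # the constant γ in U_b; its value is irrelevant
l_λ = np.array([[0]])              # Λ = 0
π_h = np.array([[1]])              # Π = 1
δ_h = np.array([[0]])              # Δ_h = 0
θ_h = np.array([[0]])              # Θ_h = 0

info1 = (a22, c2, ub, ud)
tech1 = (ϕ_c, ϕ_g, ϕ_i, γ, δ_k, θ_k)
pref1 = (np.array([[β_val]]), l_λ, π_h, δ_h, θ_h)
econ1 = DLE(info1, tech1, pref1)   # from quantecon import DLE (assumed)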
To check the solution of this model with that from the LQ problem, we select the 𝑆𝑐 matrix
from the DLE class.
The solution to the DLE economy has:
𝑐𝑡 = 𝑆𝑐 𝑥𝑡
In [4]: econ1.Sc
where the state vector is

$$x_t = \begin{bmatrix} h_{t-1} \\ k_{t-1} \\ z_t \end{bmatrix}$$
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 5))  # restored figure setup
for i in range(25):
    econ1.compute_sequence(x0, ts_length=150)
    ax1.plot(econ1.c[0], c='g')
    ax1.plot(econ1.d[0], c='b')
ax1.plot(econ1.c[0], label='Consumption', c='g')
ax1.plot(econ1.d[0], label='Income', c='b')
ax1.legend()
for i in range(25):
    econ1.compute_sequence(x0, ts_length=150)
    ax2.plot(econ1.k[0], color='r')
ax2.plot(econ1.k[0], label='Debt', c='r')
ax2.legend()
plt.show()
Chapter 82

Rosen Schooling Model
82.1 Contents
• a demand curve for engineers:

$$w_t = -\alpha_d N_t + \epsilon_{dt}$$
• a time-to-build structure of the education process:
𝑁𝑡+𝑘 = 𝛿𝑁 𝑁𝑡+𝑘−1 + 𝑛𝑡
• a definition of the discounted present value of each new engineering student:

$$v_t = \beta^k \mathbb{E} \sum_{j=0}^{\infty} (\beta \delta_N)^j w_{t+k+j}$$

• a supply curve of new students driven by that present value:

$$n_t = \alpha_s v_t + \epsilon_{st}$$
82.3.1 Preferences
$$\Pi = 0, \quad \Lambda = \begin{bmatrix} \alpha_d & 0 & \cdots & 0 \end{bmatrix}, \quad \Delta_h = \begin{bmatrix} \delta_N & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}, \quad \Theta_h = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}$$
82.3.2 Technology
To capture Ryoo and Rosen’s [138] supply curve, we use the physical technology:
𝑐𝑡 = 𝑖𝑡 + 𝑑1𝑡
𝜓1 𝑖 𝑡 = 𝑔 𝑡
82.3.3 Information
$$A_{22} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \rho_s & 0 \\ 0 & 0 & \rho_d \end{bmatrix}, \quad C_2 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad U_b = \begin{bmatrix} 30 & 0 & 1 \end{bmatrix}, \quad U_d = \begin{bmatrix} 10 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

where $\rho_s$ and $\rho_d$ describe the persistence of the supply and demand shocks.
β = np.array([[1 / 1.05]])
α_d = np.array([[0.1]])
α_s = 1
ε_1 = 1e-7
k = 4  # baseline number of periods of schooling (assumed; see the experiments below)
λ_1 = np.ones((1, k)) * ε_1
# Use of ε_1 is a trick to acquire detectability, see HS2013 p. 228 footnote 4
l_λ = np.hstack((α_d, λ_1))
π_h = np.array([[0]])
δ_n = np.array([[0.95]])
d1 = np.vstack((δ_n, np.zeros((k - 1, 1))))
d2 = np.hstack((d1, np.eye(k)))
δ_h = np.vstack((d2, np.zeros((1, k + 1))))
ψ_1 = 1 / α_s
δ_k = np.array([[0]])
θ_k = np.array([[0]])
ρ_s = 0.8
ρ_d = 0.8
We then study how the IRFs change when we alter the economy in three ways:

1. Raising $\alpha_d$ to 2
2. Raising $k$ to 7
3. Raising $k$ to 10
α_d = np.array([[0.1]])
k = 7
λ_1 = np.ones((1, k)) * ε_1
l_λ = np.hstack((α_d, λ_1))
d1 = np.vstack((δ_n, np.zeros((k - 1, 1))))
d2 = np.hstack((d1, np.eye(k)))
δ_h = np.vstack((d2, np.zeros((1, k+1))))
θ_h = np.vstack((np.zeros((k, 1)),
np.ones((1, 1))))
k = 10
λ_1 = np.ones((1, k)) * ε_1
l_λ = np.hstack((α_d, λ_1))
d1 = np.vstack((δ_n, np.zeros((k - 1, 1))))
d2 = np.hstack((d1, np.eye(k)))
δ_h = np.vstack((d2, np.zeros((1, k + 1))))
θ_h = np.vstack((np.zeros((k, 1)),
np.ones((1, 1))))
econ1.irf(ts_length=25, shock=shock_demand)
econ2.irf(ts_length=25, shock=shock_demand)
econ3.irf(ts_length=25, shock=shock_demand)
econ4.irf(ts_length=25, shock=shock_demand)
The first figure plots the impulse response of 𝑛𝑡 (on the left) and 𝑁𝑡 (on the right) to a posi-
tive demand shock, for 𝛼𝑑 = 0.1 and 𝛼𝑑 = 2.
When 𝛼𝑑 = 2, the number of new students 𝑛𝑡 rises initially, but the response then turns nega-
tive.
A positive demand shock raises wages, drawing new students into the profession.
However, these new students raise 𝑁𝑡 .
The higher is 𝛼𝑑 , the larger the effect of this rise in 𝑁𝑡 on wages.
This counteracts the demand shock’s positive effect on wages, reducing the number of new
students in subsequent periods.
Consequently, when $\alpha_d$ is lower, the effect of a demand shock on $N_t$ is larger.
The next figure plots the impulse response of 𝑛𝑡 (on the left) and 𝑁𝑡 (on the right) to a posi-
tive demand shock, for 𝑘 = 4, 𝑘 = 7 and 𝑘 = 10 (with 𝛼𝑑 = 0.1)
ax2.plot(econ1.h_irf[:,0], label='$k=4$')
ax2.plot(econ3.h_irf[:,0], label='$k=7$')
ax2.plot(econ4.h_irf[:,0], label='$k=10$')
ax2.legend()
ax2.set_title('Response of $N_t$ to a demand shock')
plt.show()
Both panels in the above figure show that raising k lowers the effect of a positive demand
shock on entry into the engineering profession.
Increasing the number of periods of schooling lowers the number of new students in response
to a demand shock.
This occurs because with longer required schooling, new students ultimately benefit less from
the impact of that shock on wages.
Chapter 83
Cattle Cycles
83.1 Contents
This lecture uses the DLE class to construct instances of the “Cattle Cycles” model of Rosen,
Murphy and Scheinkman (1994) [133].
That paper constructs a rational expectations equilibrium model to understand sources of
recurrent cycles in US cattle stocks and prices.
We make the following imports:
The model features a static linear demand curve and a “time-to-grow” structure for cattle.
Let 𝑝𝑡 be the price of slaughtered beef, 𝑚𝑡 the cost of preparing an animal for slaughter, ℎ𝑡
the holding cost for a mature animal, 𝛾1 ℎ𝑡 the holding cost for a yearling, and 𝛾0 ℎ𝑡 the hold-
ing cost for a calf.
The stock of breeding cattle evolves according to

$$x_t = (1-\delta) x_{t-1} + g x_{t-3} - c_t$$
where 𝑔 < 1 is the number of calves that each member of the breeding stock has each year,
and 𝑐𝑡 is the number of cattle slaughtered.
The total headcount of cattle is
𝑦𝑡 = 𝑥𝑡 + 𝑔𝑥𝑡−1 + 𝑔𝑥𝑡−2
This equation states that the total number of cattle equals the sum of adults, calves and
yearlings, respectively.
A representative farmer chooses $\{c_t, x_t\}$ to maximize:

$$\mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t \left\{ p_t c_t - h_t x_t - \gamma_0 h_t (g x_{t-1}) - \gamma_1 h_t (g x_{t-2}) - m_t c_t - \frac{\psi_1}{2} x_t^2 - \frac{\psi_2}{2} x_{t-1}^2 - \frac{\psi_3}{2} x_{t-3}^2 - \frac{\psi_4}{2} c_t^2 \right\}$$
subject to the law of motion for 𝑥𝑡 , taking as given the stochastic laws of motion for the ex-
ogenous processes, the equilibrium price process, and the initial state [𝑥−1 , 𝑥−2 , 𝑥−3 ].
Remark The 𝜓𝑗 parameters are very small quadratic costs that are included for technical
reasons to make well posed and well behaved the linear quadratic dynamic programming
problem solved by the fictitious planner who in effect chooses equilibrium quantities and
shadow prices.
Demand for beef is governed by $c_t = a_0 - a_1 p_t + \tilde{d}_t$ where $\tilde{d}_t$ is a stochastic process with mean zero, representing a demand shifter.
83.3.1 Preferences
We set $\Lambda = 0$, $\Delta_h = 0$, $\Theta_h = 0$, $\Pi = \alpha_1^{-\frac{1}{2}}$ and $b_t = \Pi \tilde{d}_t + \Pi \alpha_0$.
With these settings, the FOC for the household’s problem becomes the demand curve of the
“Cattle Cycles” model.
83.3.2 Technology
$$\Delta_k = \begin{bmatrix} 1-\delta & 0 & g \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \qquad \Theta_k = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$
(where 𝑖𝑡 = −𝑐𝑡 ).
To capture the production of cattle, we set
$$\Phi_c = \begin{bmatrix} 1 \\ f_1 \\ 0 \\ 0 \\ -f_7 \end{bmatrix}, \quad \Phi_g = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad \Phi_i = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \Gamma = \begin{bmatrix} 0 & 0 & 0 \\ f_1(1-\delta) & 0 & g f_1 \\ f_3 & 0 & 0 \\ 0 & f_5 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
83.3.3 Information
We set
$$A_{22} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \rho_1 & 0 & 0 \\ 0 & 0 & \rho_2 & 0 \\ 0 & 0 & 0 & \rho_3 \end{bmatrix}, \quad C_2 = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 15 \end{bmatrix}, \quad U_b = \begin{bmatrix} \Pi \alpha_0 & 0 & 0 & \Pi \end{bmatrix}, \quad U_d = \begin{bmatrix} 0 \\ f_2 U_h \\ f_4 U_h \\ f_6 U_h \\ f_8 U_m \end{bmatrix}$$
To map this into our class, we set $f_1^2 = \frac{\Psi_1}{2}$, $f_2^2 = \frac{\Psi_2}{2}$, $f_3^2 = \frac{\Psi_3}{2}$, $2 f_1 f_2 = 1$, $2 f_3 f_4 = \gamma_0 g$, and $2 f_5 f_6 = \gamma_1 g$.
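In the code below, the $f_j$ with odd indices are set to tiny values and their even-indexed partners to the matching reciprocals, so that the cross products above are preserved while the pure quadratic costs $\Psi_j$ stay negligible. A quick check of the first pair:

f1 = 0.001
f2 = 1 / (2 * f1)                     # enforces 2*f1*f2 = 1
assert abs(2 * f1 * f2 - 1) < 1e-12   # while Ψ_1 = 2*f1**2 = 2e-6 ≈ 0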
In [4]: β = np.array([[0.909]])
lλ = np.array([[0]])
a1 = 0.5
πh = np.array([[1 / (sqrt(a1))]])
δh = np.array([[0]])
θh = np.array([[0]])
δ = 0.1
g = 0.85
f1 = 0.001
f3 = 0.001
f5 = 0.001
f7 = 0.001
ϕg = np.array([[0, 0, 0, 0],
[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 0, 1,0],
[0, 0, 0, 1]])
γ = np.array([[ 0, 0, 0],
[f1 * (1 - δ), 0, g * f1],
[ f3, 0, 0],
[ 0, f5, 0],
[ 0, 0, 0]])
δk = np.array([[1 - δ, 0, g],
[ 1, 0, 0],
[ 0, 1, 0]])
ρ1 = 0
ρ2 = 0
ρ3 = 0.6
a0 = 500
γ0 = 0.4
γ1 = 0.7
f2 = 1 / (2 * f1)
f4 = γ0 * g / (2 * f3)
f6 = γ1 * g / (2 * f5)
f8 = 1 / (2 * f7)
c2 = np.array([[0, 0, 0],
[1, 0, 0],
[0, 1, 0],
[0, 0, 15]])
Notice that we have set 𝜌1 = 𝜌2 = 0, so ℎ𝑡 and 𝑚𝑡 consist of a constant and a white noise
component.
We set up the economy using tuples for information, technology and preference matrices be-
low.
We also construct two extra information matrices, corresponding to cases when 𝜌3 = 1 and
𝜌3 = 0 (as opposed to the baseline case of 𝜌3 = 0.6).
ρ3_2 = 1
a22_2 = np.array([[1, 0, 0, 0],
[0, ρ1, 0, 0],
[0, 0, ρ2, 0],
[0, 0, 0, ρ3_2]])
ρ3_3 = 0
a22_3 = np.array([[1, 0, 0, 0],
[0, ρ1, 0, 0],
[0, 0, ρ2, 0],
[0, 0, 0, ρ3_3]])
Out[5]: array([[1. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.6]])
[133] use the model to understand the sources of recurrent cycles in total cattle stocks.
Plotting 𝑦𝑡 for a simulation of their model shows its ability to generate cycles in quantities
In their Figure 3, [133] plot the impulse response functions of consumption and the breeding
stock of cattle to the demand shock, 𝑑𝑡̃ , under the three different values of 𝜌3 .
We replicate their Figure 3 below
econ1.irf(ts_length=25, shock=shock_demand)
econ2.irf(ts_length=25, shock=shock_demand)
econ3.irf(ts_length=25, shock=shock_demand)
The above figures show how consumption patterns differ markedly, depending on the persis-
tence of the demand shock:
• If it is purely transitory (𝜌3 = 0) then consumption rises immediately but is later re-
duced to build stocks up again.
• If it is permanent (𝜌3 = 1), then consumption falls immediately, in order to build up
stocks to satisfy the permanent rise in future demand.
In Figure 4 of their paper, [133] plot the response to a demand shock of the breeding stock
and the total stock, for 𝜌3 = 0 and 𝜌3 = 0.6.
We replicate their Figure 4 below
The fact that 𝑦𝑡 is a weighted moving average of 𝑥𝑡 creates a humped shape response of the
total stock in response to demand shocks, contributing to the cyclicality seen in the first
graph of this lecture.
Chapter 84

Shock Non Invertibility
84.1 Contents
• Overview 84.2
• Model 84.3
• Code 84.4
84.2 Overview
This is another member of a suite of lectures that use the quantecon DLE class to instantiate
models within the [78] class of models described in detail in Recursive Models of Dynamic
Linear Economies.
In addition to what’s in Anaconda, this lecture uses the quantecon library.
This lecture can be viewed as introducing an early contribution to what is now often called a
news and noise issue.
In particular, it analyzes a shock-invertibility issue that is endemic within a class of perma-
nent income models.
Technically, the invertibility problem indicates a situation in which histories of the shocks in
an econometrician’s autoregressive or Wold moving average representation span a smaller in-
formation space than do the shocks that are seen by the agents inside the econometrician’s
model.
This situation sets the stage for an econometrician who is unaware of the problem and conse-
quently misinterprets shocks and likely responses to them.
A shock-invertibility issue that is technically close to the one studied here is discussed by Eric Leeper, Todd Walker, and Susan Yang [?] in their analysis of fiscal foresight.
A distinct shock-invertibility issue is present in the special LQ consumption smoothing model in this quantecon lecture.
84.3 Model
We consider the following modification of Robert Hall’s (1978) model [67] in which the en-
dowment process is the sum of two orthogonal autoregressive processes:
Preferences
1 ∞
− 𝔼 ∑ 𝛽 𝑡 [(𝑐𝑡 − 𝑏𝑡 )2 + 𝑙2𝑡 ]|𝐽0
2 𝑡=0
𝑠𝑡 = 𝑐𝑡
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
Technology
𝑐𝑡 + 𝑖𝑡 = 𝛾𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = 𝛿𝑘 𝑘𝑡−1 + 𝑖𝑡
𝑔𝑡 = 𝜙1 𝑖𝑡 , 𝜙1 > 0
𝑔𝑡 ⋅ 𝑔𝑡 = 𝑙2𝑡
Information
$$z_{t+1} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0.9 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix} z_t + \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 4 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} w_{t+1}$$
𝑈𝑏 = [ 30 0 0 0 0 0 ]
The preference shock is constant at 30, while the endowment process is the sum of a constant
and two orthogonal processes.
Specifically:
𝑑𝑡 = 5 + 𝑑1𝑡 + 𝑑2𝑡
𝑑1𝑡 is a first-order AR process, while 𝑑2𝑡 is a third-order pure moving average process.
The model implies that

$$\mathbb{E} \sum_{j=0}^{\infty} \beta^j (c_{t+j} - d_{t+j}) \Big| J_t = \beta^{-1} k_{t-1} \quad \forall t$$
The model implies the moving average representation, in terms of the economic shocks $w_t$,

$$\begin{bmatrix} c_t \\ c_t - d_t \end{bmatrix} = \begin{bmatrix} \sigma_1(L) \\ \sigma_2(L) \end{bmatrix} w_t$$

and the Wold moving average representation, in terms of the econometrician's innovations $u_t$,

$$\begin{bmatrix} c_t \\ c_t - d_t \end{bmatrix} = \begin{bmatrix} \sigma_1^*(L) \\ \sigma_2^*(L) \end{bmatrix} u_t$$
The Appendix of chapter 8 of [78] explains why the impulse response functions in the Wold
representation estimated by the econometrician do not resemble the impulse response func-
tions that depict the response of consumption and the deficit to innovations to agents’ infor-
mation.
Technically, 𝜎2 (𝛽) = [0 0] implies that the history of 𝑢𝑡 s spans a smaller linear space than
does the history of 𝑤𝑡 s.
This means that 𝑢𝑡 will typically be a distributed lag of 𝑤𝑡 that is not concentrated at zero
lag:
$$u_t = \sum_{j=0}^{\infty} \alpha_j w_{t-j}$$
84.4 Code
We will construct Figures from Chapter 8 Appendix E of [78] to illustrate these ideas:
econ1.irf(ts_length=40, shock=None)
ax2.plot(econ1.c_irf, label='Consumption')
ax2.plot(econ1.c_irf - econ1.d_irf[:,0].reshape(40, 1), label='Deficit')
ax2.legend()
ax2.set_title('Response to $w_{2t}$')
plt.show()
The above figure displays the impulse response of consumption and the deficit to the endow-
ment innovations.
Consumption displays the characteristic “random walk” response with respect to each innova-
tion.
Each endowment innovation leads to a temporary surplus followed by a permanent net-of-
interest deficit.
The temporary surplus just offsets the permanent deficit in terms of expected present value.
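The next cell applies a Kalman "whitener" to recover the econometrician's Wold representation. As a hedged sketch of what lss_hs denotes: it is assumed to be a LinearStateSpace representation of the $[c_t, c_t - d_t]$ process implied by the economy, built along the following lines (the attribute names A0, C, Sc and Sd on the solved DLE instance are assumptions):

import numpy as np
import quantecon as qe

A0, C = econ1.A0, econ1.C                        # law of motion of the state
G = np.vstack([econ1.Sc, econ1.Sc - econ1.Sd])   # rows selecting c_t and c_t - d_t
lss_hs = qe.LinearStateSpace(A0, C, G)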
hs_kal = qe.Kalman(lss_hs)
w_lss = hs_kal.whitener_lss()
ma_coefs = hs_kal.stationary_coefficients(50, 'ma')
jj = 50
y1_w1 = np.empty(jj)
y2_w1 = np.empty(jj)
y1_w2 = np.empty(jj)
y2_w2 = np.empty(jj)
for t in range(jj):
y1_w1[t] = ma_coefs[t][0, 0]
y1_w2[t] = ma_coefs[t][0, 1]
y2_w1[t] = ma_coefs[t][1, 0]
y2_w2[t] = ma_coefs[t][1, 1]
ax1.legend()
ax1.set_title('Response to $u_{1t}$')
ax2.plot(y1_w2, label='Consumption')
ax2.plot(y2_w2, label='Deficit')
ax2.legend()
ax2.set_title('Response to $u_{2t}$')
plt.show()
The above figure displays the impulse response of consumption and the deficit to the innova-
tions in the econometrician’s Wold representation
• this is the object that would be recovered from a high order vector autoregression on
the econometrician’s observations.
Consumption responds only to the first innovation
• this is indicative of the Granger causality imposed on the [𝑐𝑡 , 𝑐𝑡 − 𝑑𝑡 ] process by Hall’s
model: consumption Granger causes 𝑐𝑡 − 𝑑𝑡 , with no reverse causality.
jj = 20
irf_wlss = w_lss.impulse_response(jj)
ycoefs = irf_wlss[1]
# Pull out the shocks
a1_w1 = np.empty(jj)
a1_w2 = np.empty(jj)
a2_w1 = np.empty(jj)
a2_w2 = np.empty(jj)
for t in range(jj):
a1_w1[t] = ycoefs[t][0, 0]
a1_w2[t] = ycoefs[t][0, 1]
a2_w1[t] = ycoefs[t][1, 0]
a2_w2[t] = ycoefs[t][1, 1]
$$u_t = \sum_{j=0}^{\infty} \alpha_j w_{t-j}$$
While the responses of the innovations to consumption are concentrated at lag zero for both
components of 𝑤𝑡 , the responses of the innovations to (𝑐𝑡 −𝑑𝑡 ) are spread over time (especially
in response to 𝑤1𝑡 ).
Thus, the innovations to (𝑐𝑡 − 𝑑𝑡 ) as revealed by the vector autoregression depend on what the
economic agent views as “old news”.
Part XI
Chapter 85

Von Neumann Growth Model (and a Generalization)
85.1 Contents
• Notation 85.2
• Model Ingredients and Assumptions 85.3
• Dynamic Interpretation 85.4
• Duality 85.5
• Interpretation as a Game Theoretic Problem (Two-player Zero-sum Game) 85.6
Co-author: Balint Szoke
This notebook uses the class Neumann to calculate key objects of a linear growth model of
John von Neumann [161] that was generalized by Kemeny, Morgenstern and Thompson [96].
Objects of interest are the maximal expansion rate (𝛼), the interest factor (𝛽), and the opti-
mal intensities (𝑥) and prices (𝑝).
In addition to watching how the towering mind of John von Neumann formulated an equilibrium model of price and quantity vectors in balanced growth, this notebook shows how to fruitfully employ the following important tools:
• a zero-sum two-player game
• linear programming
• the Perron-Frobenius theorem
We’ll begin with some imports:
np.set_printoptions(precision=2)
"""
This class describes the Generalized von Neumann growth model as it was
discussed in Kemeny et al. (1956, ECTA) and Gale (1960, Chapter 9.5):
Let:
n ... number of goods
m ... number of activities
A ... input matrix is m-by-n
a_{i,j} - amount of good j consumed by activity i
B ... output matrix is m-by-n
b_{i,j} - amount of good j produced by activity i
Parameters
----------
A : array_like or scalar(float)
    Input matrix of the economy; it should be `m x n`
B : array_like or scalar(float)
    Output matrix of the economy; it should be `m x n`
"""
def __repr__(self):
    return self.__str__()
def __str__(self):
me = """
Generalized von Neumann expanding model:
- number of goods : {n}
- number of activities : {m}
Assumptions:
- AI: every column of B has a positive entry : {AI}
- AII: every row of A has a positive entry : {AII}
"""
# Irreducible : {irr}
return dedent(me.format(n=self.n, m=self.m,
AI=self.AI, AII=self.AII))
def bounds(self):
    """
    Calculate the trivial upper and lower bounds for alpha (expansion rate)
    and beta (interest factor). See the proof of Theorem 9.8 in Gale (1960).
    """
    n, m = self.n, self.m
    A, B = self.A, self.B
    # Roots of two piecewise-linear functions give the trivial bounds
    # (scipy.optimize.fsolve is assumed imported at the top of the file)
    f = lambda γ: ((B - γ * A) @ np.ones((n, 1))).max()
    g = lambda γ: (np.ones((1, m)) @ (B - γ * A)).min()
    UB = fsolve(f, 1).item()   # largest α for which the TEP is feasible
    LB = fsolve(g, 2).item()   # smallest β for which the EEP is feasible
    return LB, UB
M(gamma) = B - gamma * A
subject to
[-M', ones(n, 1)] @ (x', v)' <= 0
(x', v) @ (ones(m, 1), 0) = 1
(x', v) >= (0', -inf)
Outputs:
--------
value: scalar
value of the zero-sum game
strategy: vector
if dual = False, it is the intensity vector,
if dual = True, it is the price vector
"""
if not dual:
# Solve the primal LP (for details see the description)
# (1) Define the problem for v as a maximization (linprog�
↪ minimizes)
c = np.hstack([np.zeros(m), -1])
else:
# Solve the dual LP (for details see the description)
# (1) Define the problem for v as a maximization (linprog�
↪ minimizes)
c = np.hstack([np.zeros(n), 1])
if res.status != 0:
print(res.message)
Outputs:
--------
alpha: scalar
optimal expansion rate
"""
LB, UB = self.bounds()
γ = (LB + UB) / 2
ZS = self.zerosum(γ=γ)
V = ZS[0] # value of the game with γ
if V >= 0:
LB = γ
else:
UB = γ
return γ, x, p
vector
Outputs:
--------
beta: scalar
optimal interest rate
"""
LB, UB = self.bounds()
if V > 0:
LB = γ
else:
UB = γ
return γ, x, p
85.2 Notation

• Assumption I: (every good which is consumed is also produced)

$$b_{\cdot,j} > 0 \quad \forall j = 1, 2, \dots, n$$

• Assumption II: (no free lunch)

$$a_{i,\cdot} > 0 \quad \forall i = 1, 2, \dots, m$$
A semi-positive $m$-vector $x$ denotes the levels at which activities are operated (the intensity vector).
Therefore,
• vector 𝑥𝑇 𝐴 gives the total amount of goods used in production
• vector 𝑥𝑇 𝐵 gives total outputs
An economy (𝐴, 𝐵) is said to be productive, if there exists a non-negative intensity vector 𝑥 ≥
0 such that 𝑥𝑇 𝐵 > 𝑥𝑇 𝐴.
The semi-positive 𝑛-vector 𝑝 contains prices assigned to the 𝑛 goods.
The 𝑝 vector implies cost and revenue vectors
• the vector 𝐴𝑝 tells costs of the vector of activities
• the vector 𝐵𝑝 tells revenues from the vector of activities
A property of an input-output pair $(A, B)$ called irreducibility (or indecomposability) determines whether an economy can be decomposed into multiple "sub-economies".
Definition: Given an economy (𝐴, 𝐵), the set of goods 𝑆 ⊂ {1, 2, … , 𝑛} is called an indepen-
dent subset if it is possible to produce every good in 𝑆 without consuming any good outside
𝑆. Formally, the set 𝑆 is independent if ∃𝑇 ⊂ {1, 2, … , 𝑚} (subset of activities) such that
𝑎𝑖,𝑗 = 0, ∀𝑖 ∈ 𝑇 and 𝑗 ∈ 𝑆 𝑐 and for all 𝑗 ∈ 𝑆, ∃𝑖 ∈ 𝑇 , s.t. 𝑏𝑖,𝑗 > 0. The economy is irre-
ducible if there are no proper independent subsets.
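As a small illustration of the definition, consider the following hypothetical two-good, two-activity economy, which is reducible:

import numpy as np

# Activity 1 consumes and produces only good 1; activity 2 only good 2.
A_red = np.array([[0.5, 0.0],
                  [0.0, 0.5]])
B_red = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
# S = {1} is an independent subset: with T = {1}, activity 1 consumes no good
# outside S and produces good 1, so (A_red, B_red) is reducible.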
We study two examples, both coming from Chapter 9.6 of Gale [60]
B1 = np.array([[1, 0, 0, 0],
[0, 0, 2, 0],
[0, 1, 0, 1]])
[0, 0, 0, 0, 1, 0]])
B2 = np.array([[1, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 2, 0],
[0, 0, 0, 1, 0, 1]])
The following code sets up our first Neumann economy or Neumann instance
Out[4]:
Generalized von Neumann expanding model:
- number of goods : 4
- number of activities : 3
Assumptions:
- AI: every column of B has a positive entry : True
- AII: every row of A has a positive entry : True
Out[5]:
Generalized von Neumann expanding model:
- number of goods : 6
- number of activities : 5
Assumptions:
- AI: every column of B has a positive entry : True
- AII: every row of A has a positive entry : True
Attach a time index 𝑡 to the preceding objects, regard an economy as a dynamic system, and
study sequences
An interesting special case holds the technology process constant and investigates the dynam-
ics of quantities and prices only.
Accordingly, in the rest of this notebook, we assume that (𝐴𝑡 , 𝐵𝑡 ) = (𝐴, 𝐵) for all 𝑡 ≥ 0.
A crucial element of the dynamic interpretation involves the timing of production.
We assume that production (consumption of inputs) takes place in period 𝑡, while the associ-
ated output materializes in period 𝑡 + 1, i.e. consumption of 𝑥𝑇𝑡 𝐴 in period 𝑡 results in 𝑥𝑇𝑡 𝐵
amounts of output in period 𝑡 + 1.
$$x_t^T B \geq x_{t+1}^T A \quad \forall t \geq 1$$
which asserts that no more goods can be used today than were produced yesterday.
Accordingly, 𝐴𝑝𝑡 tells the costs of production in period 𝑡 and 𝐵𝑝𝑡 tells revenues in period 𝑡 +
1.
$$x_{t+1} ./ x_t = \alpha, \quad \forall t \geq 0$$
With balanced growth, the law of motion of 𝑥 is evidently 𝑥𝑡+1 = 𝛼𝑥𝑡 and so we can rewrite
the feasibility constraint as
$$x_t^T B \geq \alpha x_t^T A \quad \forall t$$
In the same spirit, define 𝛽 ∈ ℝ as the interest factor per unit of time.
We assume that it is always possible to earn a gross return equal to the constant interest fac-
tor 𝛽 by investing “outside the model”.
Under this assumption about outside investment opportunities, a no-arbitrage condition gives
rise to the following (no profit) restriction on the price sequence:
$$\beta A p_t \geq B p_t \quad \forall t$$
This says that production cannot yield a return greater than that offered by the investment
opportunity (note that we compare values in period 𝑡 + 1).
The balanced growth assumption allows us to drop time subscripts and conduct an analysis
purely in terms of a time-invariant growth rate 𝛼 and interest factor 𝛽.
85.5 Duality
The following two problems are connected by a remarkable dual relationship between the
technological and valuation characteristics of the economy:
Definition: The technological expansion problem (TEP) for the economy $(A, B)$ is to find a semi-positive $m$-vector $x > 0$ and a number $\alpha \in \mathbb{R}$, s.t.

$$\max_{\alpha} \alpha \quad \text{s.t.} \quad x^T B \geq \alpha x^T A$$
Theorem 9.3 of David Gale’s book [60] asserts that if Assumptions I and II are both satisfied,
then a maximum value of 𝛼 exists and it is positive.
It is called the technological expansion rate and is denoted by 𝛼0 . The associated intensity
vector 𝑥0 is the optimal intensity vector.
Definition: The economical expansion problem (EEP) for (𝐴, 𝐵) is to find a semi-positive
𝑛-vector 𝑝 > 0 and a number 𝛽 ∈ ℝ, such that
$$\min_{\beta} \beta \quad \text{s.t.} \quad B p \leq \beta A p$$
Assumptions I and II imply existence of a minimum value 𝛽0 > 0 called the economic expan-
sion rate.
The corresponding price vector 𝑝0 is the optimal price vector.
Evidently, the criterion functions in technological expansion problem and the economical ex-
pansion problem are both linearly homogeneous, so the optimality of 𝑥0 and 𝑝0 are defined
only up to a positive scale factor.
For simplicity (and to emphasize a close connection to zero-sum games), in the following, we
normalize both vectors 𝑥0 and 𝑝0 to have unit length.
A standard duality argument (see Lemma 9.4. in (Gale, 1960) [60]) implies that under As-
sumptions I and II, 𝛽0 ≤ 𝛼0 .
But in the other direction, that is 𝛽0 ≥ 𝛼0 , Assumptions I and II are not sufficient.
Nevertheless, von Neumann [161] proved the following remarkable “duality-type” result con-
necting TEP and EEP.
Theorem 1 (von Neumann): If the economy (𝐴, 𝐵) satisfies Assumptions I and II, then
there exists a set (𝛾 ∗ , 𝑥0 , 𝑝0 ), where 𝛾 ∗ ∈ [𝛽0 , 𝛼0 ] ⊂ ℝ, 𝑥0 > 0 is an 𝑚-vector, 𝑝0 > 0 is an
𝑛-vector and the following holds true
$$x_0^T B \geq \gamma^* x_0^T A$$

$$B p_0 \leq \gamma^* A p_0$$

$$x_0^T (B - \gamma^* A) p_0 = 0$$
Proof (Sketch): Assumption I and II imply that there exist (𝛼0 , 𝑥0 ) and (𝛽0 , 𝑝0 )
solving the TEP and EEP, respectively. If 𝛾 ∗ > 𝛼0 , then by definition of 𝛼0 , there
cannot exist a semi-positive 𝑥 that satisfies 𝑥𝑇 𝐵 ≥ 𝛾 ∗ 𝑥𝑇 𝐴. Similarly, if 𝛾 ∗ < 𝛽0 ,
there is no semi-positive 𝑝 so that 𝐵𝑝 ≤ 𝛾 ∗ 𝐴𝑝. Let 𝛾 ∗ ∈ [𝛽0 , 𝛼0 ], then 𝑥𝑇0 𝐵 ≥
𝛼0 𝑥𝑇0 𝐴 ≥ 𝛾 ∗ 𝑥𝑇0 𝐴. Moreover, 𝐵𝑝0 ≤ 𝛽0 𝐴𝑝0 ≤ 𝛾 ∗ 𝐴𝑝0 . These two inequalities imply
𝑥0 (𝐵 − 𝛾 ∗ 𝐴) 𝑝0 = 0.
Here the constant 𝛾 ∗ is both expansion and interest factor (not necessarily optimal).
We have already encountered and discussed the first two inequalities that represent feasibility
and no-profit conditions.
Moreover, the equality compactly captures the requirements that if any good grows at a rate
larger than 𝛾 ∗ (i.e., if it is oversupplied), then its price must be zero; and that if any activity
provides negative profit, it must be unused.
Therefore, these expressions encode all equilibrium conditions and Theorem I essentially
states that under Assumptions I and II there always exists an equilibrium (𝛾 ∗ , 𝑥0 , 𝑝0 ) with
balanced growth.
Note that Theorem I is silent about uniqueness of the equilibrium. In fact, it does not rule
out (trivial) cases with 𝑥𝑇0 𝐵𝑝0 = 0 so that nothing of value is produced.
To exclude such uninteresting cases, Kemeny, Morgenstern and Thompson [96] add an extra requirement: $x_0^T B p_0 > 0$.
Finding Nash equilibria of a finite two-player zero-sum game can be formulated as a linear
programming problem.
To see this, we introduce the following notation.

• For a fixed $x$, let $v$ be the value of the minimization problem: $v \equiv \min_p x^T C p = \min_j x^T C e_j$
• For a fixed $p$, let $u$ be the value of the maximization problem: $u \equiv \max_x x^T C p = \max_i (e_i)^T C p$
Then the max-min problem (the game from the maximizing player's point of view) can be written as the primal LP

$$V(C) = \max_{x, v} v \quad \text{s.t.} \quad v \iota_n^T \leq x^T C, \quad x \geq 0, \quad \iota_m^T x = 1$$

while the min-max problem (the game from the minimizing player's point of view) is the dual LP

$$V(C) = \min_{p, u} u \quad \text{s.t.} \quad u \iota_m \geq C p, \quad p \geq 0, \quad \iota_n^T p = 1$$
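The zerosum method above implements exactly this pair of LPs. As a standalone sketch of the primal problem, using SciPy's linprog (which minimizes, so we minimize $-v$) and mirroring the description in the class docstring:

import numpy as np
from scipy.optimize import linprog

def solve_game(C):
    # Value and optimal mixed strategy of the maximizing (row) player
    m, n = C.shape
    c = np.hstack([np.zeros(m), -1])             # objective: max v
    A_ub = np.hstack([-C.T, np.ones((n, 1))])    # v <= (x'C)_j for each column j
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])                       # x is a probability vector
    bounds = [(0, None)] * m + [(None, None)]    # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

Calling solve_game(B - γ * A) then returns $V(M(\gamma))$ together with the associated intensity vector.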
Hamburger, Thompson and Weil [69] view the input-output pair of the economy as payoff
matrices of two-player zero-sum games. Using this interpretation, they restate Assumption I
and II as follows
𝑀 (𝛾) ≡ 𝐵 − 𝛾𝐴
For fixed 𝛾, treating 𝑀 (𝛾) as a matrix game, we can calculate the solution of the game
• If 𝛾 > 𝛼0 , then for all 𝑥 > 0, there ∃𝑗 ∈ {1, … , 𝑛}, s.t. [𝑥𝑇 𝑀 (𝛾)]𝑗 < 0 implying that
𝑉 (𝑀 (𝛾)) < 0.
• If 𝛾 < 𝛽0 , then for all 𝑝 > 0, there ∃𝑖 ∈ {1, … , 𝑚}, s.t. [𝑀 (𝛾)𝑝]𝑖 > 0 implying that
𝑉 (𝑀 (𝛾)) > 0.
• If 𝛾 ∈ {𝛽0 , 𝛼0 }, then (by Theorem I) the optimal intensity and price vectors 𝑥0 and 𝑝0
satisfy
That is, (𝑥0 , 𝑝0 , 0) is a solution of the game 𝑀 (𝛾) so that 𝑉 (𝑀 (𝛽0 )) = 𝑉 (𝑀 (𝛼0 )) = 0.
• If 𝛽0 < 𝛼0 and 𝛾 ∈ (𝛽0 , 𝛼0 ), then 𝑉 (𝑀 (𝛾)) = 0.
Moreover, if 𝑥′ is optimal for the maximizing player in 𝑀 (𝛾 ′ ) for 𝛾 ′ ∈ (𝛽0 , 𝛼0 ) and 𝑝″ is op-
timal for the minimizing player in 𝑀 (𝛾 ″ ) where 𝛾 ″ ∈ (𝛽0 , 𝛾 ′ ), then (𝑥′ , 𝑝″ , 0) is a solution for
𝑀 (𝛾), ∀𝛾 ∈ (𝛾 ″ , 𝛾 ′ ).
This is because

$$M(\gamma) p'' = M(\gamma'') p'' + (\gamma'' - \gamma) A p'' \leq 0$$

hence $V(M(\gamma)) \leq 0$.
It is clear from the above argument that 𝛽0 , 𝛼0 are the minimal and maximal 𝛾 for which
𝑉 (𝑀 (𝛾)) = 0.
Moreover, Hamburger et al. [69] show that the function 𝛾 ↦ 𝑉 (𝑀 (𝛾)) is continuous and non-
increasing in 𝛾.
This suggests an algorithm to compute (𝛼0 , 𝑥0 ) and (𝛽0 , 𝑝0 ) for a given input-output pair
(𝐴, 𝐵).
85.6.2 Algorithm
Hamburger, Thompson and Weil [69] propose a simple bisection algorithm to find the mini-
mal and maximal roots (i.e. 𝛽0 and 𝛼0 ) of the function 𝛾 ↦ 𝑉 (𝑀 (𝛾)).
Step 1
First, notice that we can easily find trivial upper and lower bounds for 𝛼0 and 𝛽0 .
• TEP requires that 𝑥𝑇 (𝐵 − 𝛼𝐴) ≥ 0𝑇 and 𝑥 > 0, so if 𝛼 is so large that max𝑖 {[(𝐵 −
𝛼𝐴)𝜄𝑛 ]𝑖 } < 0, then TEP ceases to have a solution.
Accordingly, let UB be the 𝛼∗ that solves max𝑖 {[(𝐵 − 𝛼∗ 𝐴)𝜄𝑛 ]𝑖 } = 0.
• Similar to the upper bound, if 𝛽 is so low that min𝑗 {[𝜄𝑇𝑚 (𝐵 − 𝛽𝐴)]𝑗 } > 0, then the EEP
has no solution and so we can define LB as the 𝛽 ∗ that solves min𝑗 {[𝜄𝑇𝑚 (𝐵 − 𝛽 ∗ 𝐴)]𝑗 } = 0.
The bounds method calculates these trivial bounds for us
In [6]: n1.bounds()
Step 2
Compute 𝛼0 and 𝛽0
• Finding $\alpha_0$
1. Fix $\gamma = \frac{UB + LB}{2}$ and compute the solution of the two-player zero-sum game associated with $M(\gamma)$. We can use either the primal or the dual LP problem.
2. If $V(M(\gamma)) \geq 0$, then set $LB = \gamma$, otherwise let $UB = \gamma$.
3. Iterate on 1. and 2. until $|UB - LB| < \epsilon$.
• Finding $\beta_0$
1. Fix $\gamma = \frac{UB + LB}{2}$ and compute the solution of the two-player zero-sum game associated with $M(\gamma)$. We can use either the primal or the dual LP problem.
2. If $V(M(\gamma)) > 0$, then set $LB = \gamma$, otherwise let $UB = \gamma$.
3. Iterate on 1. and 2. until $|UB - LB| < \epsilon$.
Existence: Since 𝑉 (𝑀 (𝐿𝐵)) > 0 and 𝑉 (𝑀 (𝑈 𝐵)) < 0 and 𝑉 (𝑀 (⋅)) is a continuous,
nonincreasing function, there is at least one 𝛾 ∈ [𝐿𝐵, 𝑈 𝐵], s.t. 𝑉 (𝑀 (𝛾)) = 0.
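A compact sketch of the bisection loop for $\alpha_0$, assuming, as in the class above, that zerosum(γ=...) returns the value of the game $M(\gamma)$ as its first element:

def expansion_bisect(econ, tol=1e-8):
    LB, UB = econ.bounds()
    while UB - LB > tol:
        γ = (LB + UB) / 2
        if econ.zerosum(γ=γ)[0] >= 0:   # V(M(γ)) >= 0 means α_0 lies above γ
            LB = γ
        else:
            UB = γ
    return (LB + UB) / 2

The loop for $\beta_0$ is identical except that the branch condition uses a strict inequality.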
The zerosum method calculates the value and optimal strategies associated with a given 𝛾.
In [7]: γ = 2
value_ex1_grid = np.asarray([n1.zerosum(γ=γ_grid[i])[0]
for i in range(numb_grid)])
value_ex2_grid = np.asarray([n2.zerosum(γ=γ_grid[i])[0]
for i in range(numb_grid)])
ax.plot(γ_grid, grid)
ax.set(title=f'Example {i}', xlabel='$\gamma$')
ax.axhline(0, c='k', lw=1)
ax.axvline(N.bounds()[0], c='r', ls='--', label='lower bound')
ax.axvline(N.bounds()[1], c='g', ls='--', label='upper bound')
plt.show()
The expansion method implements the bisection algorithm for 𝛼0 (and uses the primal LP
problem for 𝑥0 )
α_0 = 1.2599210478365421
x_0 = [0.33 0.26 0.41]
The corresponding p from the dual = [4.13e-01 3.27e-01 2.60e-01 1.82e-10]
The interest method implements the bisection algorithm for 𝛽0 (and uses the dual LP prob-
lem for 𝑝0 )
β_0 = 1.2599210478365421
p_0 = [4.13e-01 3.27e-01 2.60e-01 1.82e-10]
The corresponding x from the primal = [0.33 0.26 0.41]
Of course, when 𝛾 ∗ is unique, it is irrelevant which one of the two methods we use.
In particular, as will be shown below, in case of an irreducible (𝐴, 𝐵) (like in Example 1), the
maximal and minimal roots of 𝑉 (𝑀 (𝛾)) necessarily coincide implying a “full duality” result,
i.e. 𝛼0 = 𝛽0 = 𝛾 ∗ , and that the expansion (and interest) rate 𝛾 ∗ is unique.
As an illustration, compute first the maximal and minimal roots of 𝑉 (𝑀 (⋅)) for Example 2,
which displays a reducible input-output pair (𝐴, 𝐵)
α_0 = 1.1343231229111552
x_0 = [1.67e-11 1.83e-11 3.24e-01 2.61e-01 4.15e-01]
The corresponding p from the dual = [5.04e-01 4.96e-01 2.96e-12 2.24e-12 3.08e-12
3.56e-12]
β_0 = 1.2579826870933175
p_0 = [5.11e-01 4.89e-01 2.73e-08 2.17e-08 1.88e-08 2.66e-09]
The corresponding x from the primal = [1.61e-09 1.65e-09 3.27e-01 2.60e-01 4.12e-01]
As we can see, with a reducible (𝐴, 𝐵), the roots found by the bisection algorithms might dif-
fer, so there might be multiple 𝛾 ∗ that make the value of the game with 𝑀 (𝛾 ∗ ) zero. (see the
figure above).
Indeed, although the von Neumann theorem assures existence of the equilibrium, Assump-
tions I and II are not sufficient for uniqueness. Nonetheless, Kemeny et al. (1967) show that
there are at most finitely many economic solutions, meaning that there are only finitely many
𝛾 ∗ that satisfy 𝑉 (𝑀 (𝛾 ∗ )) = 0 and 𝑥𝑇0 𝐵𝑝0 > 0 and that for each such 𝛾𝑖∗ , there is a self-
sufficient part of the economy (a sub-economy) that in equilibrium can expand independently
with the expansion coefficient 𝛾𝑖∗ .
The following theorem (see Theorem 9.10. in Gale [60]) asserts that imposing irreducibility is
sufficient for uniqueness of (𝛾 ∗ , 𝑥0 , 𝑝0 ).
Theorem II: Consider the conditions of Theorem 1. If the economy (𝐴, 𝐵) is irreducible,
then 𝛾 ∗ = 𝛼0 = 𝛽0 .
There is a special (𝐴, 𝐵) that allows us to simplify the solution method significantly by invok-
ing the powerful Perron-Frobenius theorem for non-negative matrices.
Definition: We call an economy simple if it satisfies:

1. $n = m$
2. Each activity produces exactly one good
3. Each good is produced by one and only one activity.

These assumptions imply that $B = I_n$, i.e., that $B$ can be written as an identity matrix (possibly after reshuffling its rows and columns).
The simple model has the following special property (Theorem 9.11. in Gale [60]): if $x_0$ and $\alpha_0 > 0$ solve the TEP with $(A, I_n)$, then

$$x_0^T = \alpha_0 x_0^T A \quad \Leftrightarrow \quad x_0^T A = \left( \frac{1}{\alpha_0} \right) x_0^T$$
The latter shows that 1/𝛼0 is a positive eigenvalue of 𝐴 and 𝑥0 is the corresponding non-
negative left eigenvector.
The classical result of Perron and Frobenius implies that a non-negative matrix always has
a non-negative eigenvalue-eigenvector pair.
Moreover, if 𝐴 is irreducible, then the optimal intensity vector 𝑥0 is positive and unique up to
multiplication by a positive scalar.
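A short numerical illustration under these assumptions, with a hypothetical irreducible $3 \times 3$ input matrix $A$ and $B = I_3$:

import numpy as np

A = np.array([[0.3, 0.2, 0.1],
              [0.1, 0.4, 0.2],
              [0.2, 0.1, 0.3]])
eigvals, eigvecs = np.linalg.eig(A.T)   # left eigenvectors of A
i = np.argmax(eigvals.real)             # Perron root of the non-negative matrix
α_0 = 1 / eigvals[i].real
x_0 = np.abs(eigvecs[:, i].real)
x_0 /= x_0.sum()                        # normalize the optimal intensity vector
print(α_0, x_0)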
Suppose that 𝐴 is reducible with 𝑘 irreducible subsets 𝑆1 , … , 𝑆𝑘 . Let 𝐴𝑖 be the submatrix
corresponding to 𝑆𝑖 and let 𝛼𝑖 and 𝛽𝑖 be the associated expansion and interest factors, re-
spectively. Then we have
Chapter 86

Covariance Stationary Processes
86.1 Contents
• Overview 86.2
• Introduction 86.3
• Spectral Analysis 86.4
• Implementation 86.5
In addition to what’s in Anaconda, this lecture will need the following libraries:
86.2 Overview
In this lecture we study covariance stationary linear stochastic processes, a class of models
routinely used to study economic and financial time series.
This class has the advantage of being

1. simple enough to be described by an elegant and comprehensive theory
2. relatively broad in terms of the kinds of dynamics it can represent
We will focus much of our attention on linear covariance stationary models with a finite num-
ber of parameters.
In particular, we will study stationary ARMA processes, which form a cornerstone of the
standard theory of time series analysis.
Every ARMA process can be represented in linear state space form.
However, ARMA processes have some important structure that makes it valuable to study
them separately.
86.3 Introduction
86.3.1 Definitions
A real-valued stochastic process $\{X_t\}$ is called covariance stationary if

1. Its mean $\mu := \mathbb{E} X_t$ does not depend on $t$, and
2. For all $k$ in $\mathbb{Z}$, the $k$-th autocovariance $\gamma(k) := \mathbb{E}(X_t - \mu)(X_{t+k} - \mu)$ is finite and depends only on $k$.
Perhaps the simplest class of covariance stationary processes is the class of white noise processes.
A process $\{\epsilon_t\}$ is called a white noise process if

1. $\mathbb{E}\epsilon_t = 0$, and
2. $\gamma(k) = \sigma^2 \mathbf{1}\{k = 0\}$ for some $\sigma > 0$
From the simple building block provided by white noise, we can construct a very flexible fam-
ily of covariance stationary processes — the general linear processes
$$X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}, \qquad t \in \mathbb{Z} \qquad (1)$$
where
• {𝜖𝑡 } is white noise
• $\{\psi_t\}$ is a square summable sequence in $\mathbb{R}$ (that is, $\sum_{t=0}^{\infty} \psi_t^2 < \infty$)
The sequence {𝜓𝑡 } is often called a linear filter.
Equation (1) is said to represent a moving average process or a moving average representation.
With some manipulations, it is possible to confirm that the autocovariance function for (1) is
$$\gamma(k) = \sigma^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+k} \qquad (2)$$
By the Cauchy–Schwarz inequality, one can show that $\gamma(k)$ satisfies equation (2).
Evidently, 𝛾(𝑘) does not depend on 𝑡.
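As a quick numerical illustration of (2), take the MA(1) filter $\psi = (1, \theta, 0, 0, \dots)$, for which (2) gives $\gamma(0) = \sigma^2 (1 + \theta^2)$, $\gamma(1) = \sigma^2 \theta$ and $\gamma(k) = 0$ for $k \geq 2$:

import numpy as np

σ, θ = 1.0, 0.5
ψ = np.zeros(50)
ψ[0], ψ[1] = 1.0, θ
γ = lambda k: σ**2 * np.sum(ψ[:len(ψ) - k] * ψ[k:])
print(γ(0), γ(1), γ(2))   # 1.25, 0.5, 0.0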
Remarkably, the class of general linear processes goes a long way towards describing the en-
tire class of zero-mean covariance stationary processes.
In particular, Wold’s decomposition theorem states that every zero-mean covariance station-
ary process {𝑋𝑡 } can be written as
$$X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j} + \eta_t$$
where
• {𝜖𝑡 } is white noise
• {𝜓𝑡 } is square summable
• 𝜓0 𝜖𝑡 is the one-step ahead prediction error in forecasting 𝑋𝑡 as a linear least-squares
function of the infinite history 𝑋𝑡−1 , 𝑋𝑡−2 , …
• 𝜂𝑡 can be expressed as a linear function of 𝑋𝑡−1 , 𝑋𝑡−2 , … and is perfectly predictable
over arbitrarily long horizons
For the method of constructing a Wold representation, intuition, and further discussion, see
[142], p. 286.
86.3.5 AR and MA
For the AR(1) process $X_t = \phi X_{t-1} + \epsilon_t$ with $|\phi| < 1$, the autocovariance function is

$$\gamma(k) = \frac{\sigma^2}{1 - \phi^2} \phi^k, \qquad k = 0, 1, \dots \qquad (4)$$
The next figure plots an example of this function for 𝜙 = 0.8 and 𝜙 = −0.8 with 𝜎 = 1.
fig, ax = plt.subplots(figsize=(10, 6))  # restored figure setup
for ϕ in (0.8, -0.8):
    times = list(range(16))
    acov = [ϕ**k / (1 - ϕ**2) for k in times]
    ax.plot(times, acov, 'bo-', alpha=0.6,
            label=f'autocovariance, $\phi = {ϕ:.2}$')
ax.legend(loc='upper right')
ax.set(xlabel='time', xlim=(0, 15))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
plt.show()
Another very simple process is the MA(1) process (here MA means “moving average”)
𝑋𝑡 = 𝜖𝑡 + 𝜃𝜖𝑡−1
The AR(1) can be generalized to an AR(𝑝) and likewise for the MA(1).
Putting all of this together, we get the following definition.

A stochastic process $\{X_t\}$ is called an autoregressive moving average process, or ARMA($p, q$), if it can be written as
$$L^0 X_t - \phi_1 L^1 X_t - \cdots - \phi_p L^p X_t = L^0 \epsilon_t + \theta_1 L^1 \epsilon_t + \cdots + \theta_q L^q \epsilon_t \qquad (6)$$
In what follows we always assume that the roots of the polynomial 𝜙(𝑧) lie outside the unit
circle in the complex plane.
This condition is sufficient to guarantee that the ARMA(𝑝, 𝑞) process is covariance stationary.
In fact, it implies that the process falls within the class of general linear processes described
above.
That is, given an ARMA($p, q$) process $\{X_t\}$ satisfying the unit circle condition, there exists a square summable sequence $\{\psi_t\}$ with $X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}$ for all $t$.
The sequence {𝜓𝑡 } can be obtained by a recursive procedure outlined on page 79 of [38].
The function 𝑡 ↦ 𝜓𝑡 is often called the impulse response function.
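For example, for an ARMA(1, 1) process $X_t = \phi X_{t-1} + \epsilon_t + \theta \epsilon_{t-1}$, the recursion reduces to $\psi_0 = 1$, $\psi_1 = \phi + \theta$ and $\psi_j = \phi \psi_{j-1}$ for $j \geq 2$; a sketch:

φ, θ = 0.8, 0.4
ψ = [1.0, φ + θ]
for _ in range(13):
    ψ.append(φ * ψ[-1])   # ψ_j = φ ψ_{j-1} for j >= 2
print(ψ[:5])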
Autocovariance functions provide a great deal of information about covariance stationary pro-
cesses.
In fact, for zero-mean Gaussian processes, the autocovariance function characterizes the entire
joint distribution.
Even for non-Gaussian processes, it provides a significant amount of information.
It turns out that there is an alternative representation of the autocovariance function of a
covariance stationary process, called the spectral density.
At times, the spectral density is easier to derive, easier to manipulate, and provides additional
intuition.
86.4 Spectral Analysis
Before discussing the spectral density, we invite you to recall the main properties of complex
numbers (or skip to the next section).
It can be helpful to remember that, in a formal sense, complex numbers are just points
(𝑥, 𝑦) ∈ ℝ2 endowed with a specific notion of multiplication.
When (𝑥, 𝑦) is regarded as a complex number, 𝑥 is called the real part and 𝑦 is called the
imaginary part.
The modulus or absolute value of a complex number 𝑧 = (𝑥, 𝑦) is just its Euclidean norm in
ℝ2 , but is usually written as |𝑧| instead of ‖𝑧‖.
The product of two complex numbers (𝑥, 𝑦) and (𝑢, 𝑣) is defined to be (𝑥𝑢−𝑣𝑦, 𝑥𝑣+𝑦𝑢), while
addition is standard pointwise vector addition.
When endowed with these notions of multiplication and addition, the set of complex numbers
forms a field — addition and multiplication play well together, just as they do in ℝ.
The complex number (𝑥, 𝑦) is often written as 𝑥 + 𝑖𝑦, where 𝑖 is called the imaginary unit and
is understood to obey 𝑖2 = −1.
The 𝑥 + 𝑖𝑦 notation provides an easy way to remember the definition of multiplication given
above, because, proceeding naively,
Converted back to our first notation, this becomes (𝑥𝑢 − 𝑣𝑦, 𝑥𝑣 + 𝑦𝑢) as promised.
Complex numbers can be represented in the polar form $re^{i\omega}$, where $r$ is the modulus and $\omega$ is the angle.
The spectral density $f$ of a covariance stationary process with autocovariance function $\gamma$ is defined as

$$f(\omega) := \sum_{k \in \mathbb{Z}} \gamma(k) e^{-i\omega k}, \qquad \omega \in \mathbb{R}$$

(Some authors normalize the expression on the right by constants such as $1/\pi$ — the convention chosen makes little difference provided you are consistent).

Using the fact that $\gamma$ is even, in the sense that $\gamma(t) = \gamma(-t)$ for all $t$, we can show that

$$f(\omega) = \gamma(0) + 2 \sum_{k \geq 1} \gamma(k) \cos(\omega k)$$

It is not difficult to confirm that $f$ is

• real-valued
• even ($f(\omega) = f(-\omega)$), and
• $2\pi$-periodic, in the sense that $f(2\pi + \omega) = f(\omega)$ for all $\omega$
It follows that the values of 𝑓 on [0, 𝜋] determine the values of 𝑓 on all of ℝ — the proof is an
exercise.
For this reason, it is standard to plot the spectral density only on the interval [0, 𝜋].
It is an exercise to show that the MA(1) process $X_t = \theta \epsilon_{t-1} + \epsilon_t$ has a spectral density

$$f(\omega) = \sigma^2 (1 + 2\theta \cos(\omega) + \theta^2) \qquad (10)$$
With a bit more effort, it’s possible to show (see, e.g., p. 261 of [142]) that the spectral den-
sity of the AR(1) process 𝑋𝑡 = 𝜙𝑋𝑡−1 + 𝜖𝑡 is
$$f(\omega) = \frac{\sigma^2}{1 - 2\phi \cos(\omega) + \phi^2} \qquad (11)$$
More generally, it can be shown that the spectral density of the ARMA process (5) is
$$f(\omega) = \left| \frac{\theta(e^{i\omega})}{\phi(e^{i\omega})} \right|^2 \sigma^2 \qquad (12)$$
where
• 𝜎 is the standard deviation of the white noise process {𝜖𝑡 }.
• the polynomials 𝜙(⋅) and 𝜃(⋅) are as defined in (7).
The derivation of (12) uses the fact that convolutions become products under Fourier trans-
formations.
The proof is elegant and can be found in many places — see, for example, [142], chapter 11,
section 4.
It’s a nice exercise to verify that (10) and (11) are indeed special cases of (12).
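For instance, a numerical spot check (taking $\phi(z) = 1 - \phi z$ and $\theta(z) = 1$ for the AR(1)) confirms that (11) agrees with (12):

import numpy as np

φ, σ = 0.8, 1.0
ω = np.linspace(0, np.pi, 5)
f_direct = σ**2 / (1 - 2 * φ * np.cos(ω) + φ**2)          # equation (11)
f_arma = np.abs(1 / (1 - φ * np.exp(1j * ω)))**2 * σ**2   # equation (12)
assert np.allclose(f_direct, f_arma)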
Plotting (11) reveals the shape of the spectral density for the AR(1) model when 𝜙 takes the
values 0.8 and -0.8 respectively.
These spectral densities correspond to the autocovariance functions for the AR(1) process
shown above.
Informally, we think of the spectral density as being large at those 𝜔 ∈ [0, 𝜋] at which the
autocovariance function seems approximately to exhibit big damped cycles.
To see the idea, let’s consider why, in the lower panel of the preceding figure, the spectral
density for the case 𝜙 = −0.8 is large at 𝜔 = 𝜋.
Recall that the spectral density can be expressed as

$$f(\omega) = \gamma(0) + 2 \sum_{k \geq 1} \gamma(k) \cos(\omega k) \qquad (13)$$

where, in the present case, $\gamma(k) = (-0.8)^k / (1 - (-0.8)^2)$.
When we evaluate this at 𝜔 = 𝜋, we get a large number because cos(𝜋𝑘) is large and positive
when (−0.8)𝑘 is positive, and large in absolute value and negative when (−0.8)𝑘 is negative.
Hence the product is always large and positive, and hence the sum of the products on the
right-hand side of (13) is large.
These ideas are illustrated in the next figure, which has 𝑘 on the horizontal axis.
In [5]: ϕ = -0.8
times = list(range(16))
y1 = [ϕ**k / (1 - ϕ**2) for k in times]
y2 = [np.cos(np.pi * k) for k in times]
y3 = [a * b for a, b in zip(y1, y2)]
num_rows, num_cols = 3, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 8))
plt.subplots_adjust(hspace=0.25)

# Autocovariance of the AR(1) process (restored first panel)
ax = axes[0]
ax.plot(times, y1, 'bo-', alpha=0.6, label='$\gamma(k)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)

# Cycles at frequency π
ax = axes[1]
ax.plot(times, y2, 'bo-', alpha=0.6, label='$\cos(\pi k)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), yticks=(-1, 0, 1))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
# Product
ax = axes[2]
ax.stem(times, y3, label='$\gamma(k) \cos(\pi k)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), ylim=(-3, 3), yticks=(-1, 0, 1, 2, 3))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
ax.set_xlabel("k")
plt.show()
On the other hand, if we evaluate 𝑓(𝜔) at 𝜔 = 𝜋/3, then the cycles are not matched, the
sequence 𝛾(𝑘) cos(𝜔𝑘) contains both positive and negative terms, and hence the sum of these
terms is much smaller.
In [6]: ϕ = -0.8
times = list(range(16))
y1 = [ϕ**k / (1 - ϕ**2) for k in times]
y2 = [np.cos(np.pi * k/3) for k in times]
y3 = [a * b for a, b in zip(y1, y2)]
num_rows, num_cols = 3, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 8))
plt.subplots_adjust(hspace=0.25)

# Autocovariance of the AR(1) process (restored first panel)
ax = axes[0]
ax.plot(times, y1, 'bo-', alpha=0.6, label='$\gamma(k)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)

# Cycles at frequency π
ax = axes[1]
ax.plot(times, y2, 'bo-', alpha=0.6, label='$\cos(\pi k/3)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), yticks=(-1, 0, 1))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
# Product
ax = axes[2]
ax.stem(times, y3, label='$\gamma(k) \cos(\pi k/3)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), ylim=(-3, 3), yticks=(-1, 0, 1, 2, 3))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
ax.set_xlabel("$k$")
plt.show()
In summary, the spectral density is large at frequencies 𝜔 where the autocovariance function
exhibits damped cycles.
We have just seen that the spectral density is useful in the sense that it provides a frequency-
based perspective on the autocovariance structure of a covariance stationary process.
Another reason that the spectral density is useful is that it can be “inverted” to recover the
autocovariance function via the inverse Fourier transform.
In particular, for all 𝑘 ∈ ℤ, we have
$$\gamma(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\omega) e^{i\omega k} \, d\omega \qquad (14)$$
This is convenient in situations where the spectral density is easier to calculate and manipu-
late than the autocovariance function.
(For example, the expression (12) for the ARMA spectral density is much easier to work with
than the expression for the ARMA autocovariance)
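As a numerical check of (14), take the MA(1) spectral density from (10) with $\theta = 0.5$ and $\sigma = 1$; the $k = 1$ integral should recover $\gamma(1) = \sigma^2 \theta = 0.5$:

import numpy as np

θ, σ = 0.5, 1.0
ω = np.linspace(-np.pi, np.pi, 20001)
f = σ**2 * (1 + 2 * θ * np.cos(ω) + θ**2)
γ1 = np.trapz(f * np.exp(1j * ω), ω).real / (2 * np.pi)
print(γ1)   # ≈ 0.5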
This section is loosely based on [142], p. 249-253, and included for those who
• would like a bit more insight into spectral densities
• and have at least some background in Hilbert space theory
Others should feel free to skip to the next section — none of this material is necessary to
progress to computation.
Recall that every separable Hilbert space $H$ has a countable orthonormal basis $\{h_k\}$.

The nice thing about such a basis is that every $f \in H$ satisfies

$$f = \sum_k \langle f, h_k \rangle h_k \qquad (15)$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product in $H$. In the present setting, the relevant space is $L_2[-\pi, \pi]$ and a convenient orthonormal basis is

$$h_k(\omega) = \frac{e^{i\omega k}}{\sqrt{2\pi}}, \qquad k \in \mathbb{Z}, \quad \omega \in [-\pi, \pi]$$

Define the linear operator $T : \ell_2 \to L_2$ by $T\gamma := \sum_{k \in \mathbb{Z}} \gamma(k) h_k$, a linear isometry that maps square summable sequences into $L_2$.
Using the definition of $T$ from above and the fact that $f$ is even, we now have

$$T\gamma = \sum_{k \in \mathbb{Z}} \gamma(k) \frac{e^{i\omega k}}{\sqrt{2\pi}} = \frac{1}{\sqrt{2\pi}} f(\omega) \qquad (16)$$
In other words, apart from a scalar multiple, the spectral density is just a transformation of
𝛾 ∈ ℓ2 under a certain linear isometry — a different way to view 𝛾.
In particular, it is an expansion of the autocovariance function with respect to the trigono-
metric basis functions in 𝐿2 .
As discussed above, the Fourier coefficients of 𝑇 𝛾 are given by the sequence 𝛾, and, in partic-
ular, 𝛾(𝑘) = ⟨𝑇 𝛾, ℎ𝑘 ⟩.
Transforming this inner product into its integral expression and using (16) gives (14), justify-
ing our earlier expression for the inverse transform.
86.5 Implementation
Most code for working with covariance stationary models deals with ARMA models.
Python code for studying ARMA models can be found in the tsa submodule of statsmodels.
Since this code doesn't quite cover our needs — particularly vis-à-vis spectral analysis — we've put together the module arma.py, which is part of the QuantEcon.py package.
The module provides functions for mapping ARMA($p, q$) models into their

1. impulse response function
2. simulated time series
3. autocovariance function
4. spectral density
86.5.1 Application
Let’s use this code to replicate the plots on pages 68–69 of [108].
Here are some functions to generate the plots
def plot_impulse_response(arma, ax=None):
    if ax is None:
        ax = plt.gca()
    yi = arma.impulse_response()
    ax.stem(list(range(len(yi))), yi)
    ax.set(xlim=(-0.5), ylim=(min(yi)-0.1, max(yi)+0.1),
           title='Impulse response', xlabel='time', ylabel='response')
    return ax
def quad_plot(arma):
    """
    Plots the impulse response, spectral density, autocovariance,
    and one realization of the process.
    """
    num_rows, num_cols = 2, 2
    fig, axes = plt.subplots(num_rows, num_cols, figsize=(12, 8))
    plot_functions = [plot_impulse_response,
                      plot_spectral_density,
                      plot_autocovariance,
                      plot_simulation]
    for plot_func, ax in zip(plot_functions, axes.flatten()):
        plot_func(arma, ax)
    plt.tight_layout()
    plt.show()
In [8]: ϕ = 0.0
θ = 0.0
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)
If we look carefully, things look good: the spectrum is the flat line at $10^0$ at the very top of the spectrum graph, which is as it should be.

Also

• the variance equals $1 = \frac{1}{2\pi} \int_{-\pi}^{\pi} 1 \, d\omega$, as it should.
In [10]: ϕ = 0.9
θ = -0.0
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)
In [12]: ϕ = .98
θ = -0.7
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)
86.5.2 Explanation
The call

arma = ARMA(ϕ, θ, σ)

creates an instance arma that represents the ARMA(1, 1) model

$$X_t = \phi X_{t-1} + \epsilon_t + \theta \epsilon_{t-1}$$
The two numerical packages most useful for working with ARMA models are scipy.signal
and numpy.fft.
86.5. IMPLEMENTATION 1481
The package scipy.signal expects the parameters to be passed into its functions in a
manner consistent with the alternative ARMA notation (8).
For example, the impulse response sequence {𝜓𝑡 } discussed above can be obtained using
scipy.signal.dimpulse, and the function call should be of the form
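A sketch of such a call, with illustrative parameter values for the ARMA(1, 1) model $X_t = 0.5 X_{t-1} + \epsilon_t + 0.4 \epsilon_{t-1}$, is

import numpy as np
from scipy.signal import dimpulse

ma_poly = np.array([1.0, 0.4])      # (1, θ_1)
ar_poly = np.array([1.0, -0.5])     # (1, -ϕ_1)
times, ψ = dimpulse((ma_poly, ar_poly, 1), n=10)
print(ψ[0].flatten())               # ψ_0, ψ_1, ..., ψ_9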
where ma_poly and ar_poly correspond to the polynomials in (7) — that is,
• ma_poly is the vector (1, 𝜃1 , 𝜃2 , … , 𝜃𝑞 )
• ar_poly is the vector (1, −𝜙1 , −𝜙2 , … , −𝜙𝑝 )
To this end, we also maintain the arrays ma_poly and ar_poly as instance data, with their
values computed automatically from the values of phi and theta supplied by the user.
If the user decides to change the value of either theta or phi ex-post by assignments such as arma.phi = (0.5, 0.2) or arma.theta = (0, -0.1), then ma_poly and ar_poly should update automatically to reflect these new parameters.
This is achieved in our implementation by using descriptors.
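To illustrate the idea (this is a minimal sketch, not QuantEcon's actual implementation), a property-based descriptor can keep ar_poly synchronized with phi:

import numpy as np

class ARMASketch:

    def __init__(self, phi):
        self.phi = phi                 # assignment triggers the setter below

    @property
    def phi(self):
        return self._phi

    @phi.setter
    def phi(self, new_value):
        self._phi = new_value
        # Rebuild ar_poly = (1, -ϕ_1, ..., -ϕ_p) whenever phi changes
        self.ar_poly = np.insert(-np.atleast_1d(new_value), 0, 1)

arma_sketch = ARMASketch((0.5, 0.2))
arma_sketch.phi = (0.4, 0.1)           # ar_poly updates automatically
print(arma_sketch.ar_poly)             # [ 1.  -0.4 -0.1]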
As discussed above, for ARMA processes the spectral density has a simple representation that
is relatively easy to calculate.
Given this fact, the easiest way to obtain the autocovariance function is to recover it from the
spectral density via the inverse Fourier transform.
Here we use NumPy’s Fourier transform package np.fft, which wraps a standard Fortran-
based package called FFTPACK.
A look at the np.fft documentation shows that the inverse transform np.fft.ifft takes a given sequence $A_0, A_1, \ldots, A_{n-1}$ and returns the sequence $a_0, a_1, \ldots, a_{n-1}$ defined by

$$a_k = \frac{1}{n} \sum_{t=0}^{n-1} A_t e^{ik 2\pi t / n}$$

Thus, if we set $A_t = f(\omega_t)$, where $f$ is the spectral density and $\omega_t := 2\pi t / n$, then

$$a_k = \frac{1}{n} \sum_{t=0}^{n-1} f(\omega_t) e^{i\omega_t k} = \frac{1}{2\pi} \frac{2\pi}{n} \sum_{t=0}^{n-1} f(\omega_t) e^{i\omega_t k}$$

For $n$ sufficiently large, we then have

$$a_k \approx \frac{1}{2\pi} \int_0^{2\pi} f(\omega) e^{i\omega k} \, d\omega = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\omega) e^{i\omega k} \, d\omega$$

(the last equality uses the fact that $f$ is even and $2\pi$-periodic), so that $a_k \approx \gamma(k)$.
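As a quick numerical check of this approximation, here is a sketch using unit-variance white noise, whose spectral density is identically 1 and whose autocovariances are $\gamma(0) = 1$ and $\gamma(k) = 0$ for $k \geq 1$:

import numpy as np

n = 512
f = np.ones(n)              # f(ω_t) = 1 at the points ω_t = 2πt/n
a = np.fft.ifft(f).real     # a_k ≈ γ(k) for small k
print(a[:3])                # ≈ [1. 0. 0.]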
Chapter 87

Estimation of Spectra
87.1 Contents
• Overview 87.2
• Periodograms 87.3
• Smoothing 87.4
• Exercises 87.5
• Solutions 87.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
87.2 Overview
87.3 Periodograms
Recall that the spectral density $f$ of a covariance stationary process with autocovariance function $\gamma$ can be written

$$f(\omega) = \gamma(0) + 2 \sum_{k \geq 1} \gamma(k) \cos(\omega k), \qquad \omega \in \mathbb{R}$$
Now consider the problem of estimating the spectral density of a given time series, when 𝛾 is
unknown.
In particular, let 𝑋0 , … , 𝑋𝑛−1 be 𝑛 consecutive observations of a single time series that is as-
sumed to be covariance stationary.
The most common estimator of the spectral density of this process is the periodogram of
𝑋0 , … , 𝑋𝑛−1 , which is defined as
$$I(\omega) := \frac{1}{n} \left| \sum_{t=0}^{n-1} X_t e^{it\omega} \right|^2, \qquad \omega \in \mathbb{R} \tag{1}$$

Using the fact that $|z|^2 = \text{Re}(z)^2 + \text{Im}(z)^2$ for any complex number $z$, we can also write the periodogram as

$$I(\omega) = \frac{1}{n} \left\{ \left[ \sum_{t=0}^{n-1} X_t \cos(\omega t) \right]^2 + \left[ \sum_{t=0}^{n-1} X_t \sin(\omega t) \right]^2 \right\}$$
It is straightforward to show that the function 𝐼 is even and 2𝜋-periodic (i.e., 𝐼(𝜔) = 𝐼(−𝜔)
and 𝐼(𝜔 + 2𝜋) = 𝐼(𝜔) for all 𝜔 ∈ ℝ).
From these two results, you will be able to verify that the values of 𝐼 on [0, 𝜋] determine the
values of 𝐼 on all of ℝ.
The next section helps to explain the connection between the periodogram and the spectral
density.
87.3.1 Interpretation
To interpret the periodogram, it is convenient to focus on its values at the Fourier frequencies

$$\omega_j := \frac{2\pi j}{n}, \qquad j = 0, \ldots, n-1$$

Note that for any $j$ in this range with $j \neq 0$,

$$\sum_{t=0}^{n-1} e^{it\omega_j} = \sum_{t=0}^{n-1} \exp\left\{ i 2\pi j \frac{t}{n} \right\} = 0$$
Letting $\bar{X}$ denote the sample mean $n^{-1} \sum_{t=0}^{n-1} X_t$, we then have

$$n I(\omega_j) = \left| \sum_{t=0}^{n-1} (X_t - \bar{X}) e^{it\omega_j} \right|^2 = \sum_{t=0}^{n-1} (X_t - \bar{X}) e^{it\omega_j} \sum_{r=0}^{n-1} (X_r - \bar{X}) e^{-ir\omega_j}$$
Now let

$$\hat{\gamma}(k) := \frac{1}{n} \sum_{t=k}^{n-1} (X_t - \bar{X})(X_{t-k} - \bar{X}), \qquad k = 0, 1, \ldots, n-1$$
This is the sample autocovariance function, the natural “plug-in estimator” of the autocovari-
ance function 𝛾.
(“Plug-in estimator” is an informal term for an estimator found by replacing expectations
with sample means)
With this notation, we can now write
With this notation, we can now write

$$I(\omega_j) = \hat{\gamma}(0) + 2 \sum_{k=1}^{n-1} \hat{\gamma}(k) \cos(\omega_j k)$$
Recalling our expression for 𝑓 given above, we see that 𝐼(𝜔𝑗 ) is just a sample analog of 𝑓(𝜔𝑗 ).
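Here is a quick numerical sketch confirming that this expression agrees with the absolute-value form (1) at a Fourier frequency:

import numpy as np

np.random.seed(42)
n = 64
X = np.random.randn(n)
Xd = X - X.mean()
γ_hat = np.array([(Xd[k:] * Xd[:n - k]).sum() / n for k in range(n)])

j = 5                                   # any Fourier frequency with j ≠ 0
ω_j = 2 * np.pi * j / n
I_abs = np.abs(np.sum(X * np.exp(1j * np.arange(n) * ω_j)))**2 / n
I_cos = γ_hat[0] + 2 * np.sum(γ_hat[1:] * np.cos(ω_j * np.arange(1, n)))
print(np.allclose(I_abs, I_cos))        # True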
87.3.2 Calculation

The periodogram can be computed quickly via the discrete Fourier transform, which maps a sequence $a_0, \ldots, a_{n-1}$ into the sequence $A_0, \ldots, A_{n-1}$ defined by

$$A_j := \sum_{t=0}^{n-1} a_t \exp\left\{ i 2\pi \frac{tj}{n} \right\}, \qquad j = 0, \ldots, n-1$$
With numpy.fft.fft imported as fft and 𝑎0 , … , 𝑎𝑛−1 stored in NumPy array a, the func-
tion call fft(a) returns the values 𝐴0 , … , 𝐴𝑛−1 as a NumPy array.
It follows that when the data $X_0, \ldots, X_{n-1}$ are stored in array X, the values $I(\omega_j)$ at the Fourier frequencies, which are given by

$$\frac{1}{n} \left| \sum_{t=0}^{n-1} X_t \exp\left\{ i 2\pi \frac{tj}{n} \right\} \right|^2, \qquad j = 0, \ldots, n-1$$

can be computed by np.abs(fft(X))**2 / len(X).
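For instance, a bare-bones periodogram routine could read as follows (a sketch only; the library's periodogram() additionally restricts attention to $\omega \in [0, \pi]$ and supports smoothing):

import numpy as np

def periodogram_simple(X):
    "Periodogram (1) evaluated at the Fourier frequencies ω_j = 2πj/n."
    n = len(X)
    I_w = np.abs(np.fft.fft(X))**2 / n
    w = 2 * np.pi * np.arange(n) / n
    return w, I_w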
The code below runs the comparison for $n = 40$ observations drawn from the ARMA(1, 2) process

$$X_t = 0.5 X_{t-1} + \epsilon_t - 0.8 \epsilon_{t-2} \tag{2}$$

where $\{\epsilon_t\}$ is white noise with unit variance, and compares the periodogram to the actual spectral density
fig, ax = plt.subplots()
x, y = periodogram(X)
ax.plot(x, y, 'b-', lw=2, alpha=0.5, label='periodogram')
x_sd, y_sd = lp.spectral_density(two_pi=False, res=120)
ax.plot(x_sd, y_sd, 'r-', lw=2, alpha=0.8, label='spectral density')
ax.legend()
plt.show()
This estimate looks rather disappointing, but the data size is only 40, so perhaps it’s not sur-
prising that the estimate is poor.
However, if we try again with n = 1200 the outcome is not much better
The periodogram is far too irregular relative to the underlying spectral density.
This brings us to our next topic.
87.4 Smoothing

The standard remedy for the erratic behavior seen above is smoothing. Given the periodogram $I$ and a positive integer $p$, define the smoothed periodogram

$$I_S(\omega_j) := \sum_{\ell=-p}^{p} w(\ell)\, I(\omega_{j+\ell}) \tag{3}$$

where the weights $w(-p), \ldots, w(p)$ are a sequence of $2p + 1$ nonnegative values summing to one.

In general, larger values of $p$ indicate more smoothing — more on this below.
The next figure shows the kind of sequence typically used.
Note the smaller weights towards the edges and larger weights in the center, so that more dis-
tant values from 𝐼(𝜔𝑗 ) have less weight than closer ones in the sum (3).
Our next step is to provide code that will not only estimate the periodogram but also provide
smoothing as required.
Such functions have been written in estspec.py and are available once you’ve installed Quan-
tEcon.py.
The GitHub listing displays three functions, smooth(), periodogram(),
ar_periodogram(). We will discuss the first two here and the third one below.
The periodogram() function returns a periodogram, optionally smoothed via the
smooth() function.
Regarding the smooth() function, since smoothing adds a nontrivial amount of computa-
tion, we have applied a fairly terse array-centric method based around np.convolve.
Readers are left either to explore or simply to use this code according to their interests.
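To give the flavor of the np.convolve approach, here is a simplified sketch (the library's smooth() treats end points more carefully by reflecting the series, and supports several window shapes):

import numpy as np

def smooth_simple(I_w, window_len=7):
    weights = np.hanning(window_len)
    weights = weights / weights.sum()   # nonnegative weights summing to one, as in (3)
    # mode='same' keeps the output length; end points are handled crudely here
    return np.convolve(I_w, weights, mode='same')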
The next three figures each show smoothed and unsmoothed periodograms, as well as the
population or “true” spectral density.
(The model is the same as before — see equation (2) — and there are 400 observations)
From the top figure to bottom, the window length is varied from small to large.
In looking at the figure, we can see that for this model and data size, the window length cho-
sen in the middle figure provides the best fit.
Relative to this value, the first window length provides insufficient smoothing, while the third
gives too much smoothing.
Of course in real estimation problems, the true spectral density is not visible and the choice
of appropriate smoothing will have to be made based on judgement/priors or some other the-
ory.
In the code listing, we showed three functions from the file estspec.py.
The third function in the file (ar_periodogram()) adds a pre-processing step to peri-
odogram smoothing.
First, we describe the basic idea, and after that we give the code.
The essential idea is to
1. Transform the data in order to make estimation of the spectral density more efficient.
2. Compute the periodogram associated with the transformed data.
3. Reverse the effect of the transformation on the periodogram, so that it now estimates
the spectral density of the original process.
Let’s examine this idea more carefully in a particular setting — where the data are assumed
to be generated by an AR(1) process.
(More general ARMA settings can be handled using similar techniques to those described be-
low)
Suppose in particular that $\{X_t\}$ is covariance stationary and AR(1), with

$$X_{t+1} = \mu + \phi X_t + \epsilon_{t+1} \tag{4}$$

where $\mu$ and $\phi \in (-1, 1)$ are unknown parameters and $\{\epsilon_t\}$ is white noise.
It follows that if we regress 𝑋𝑡+1 on 𝑋𝑡 and an intercept, the residuals will approximate white
noise.
Let
• 𝑔 be the spectral density of {𝜖𝑡 } — a constant function, as discussed above
• 𝐼0 be the periodogram estimated from the residuals — an estimate of 𝑔
• 𝑓 be the spectral density of {𝑋𝑡 } — the object we are trying to estimate
In view of an earlier result we obtained while discussing ARMA processes, 𝑓 and 𝑔 are related
by
$$f(\omega) = \left| \frac{1}{1 - \phi e^{i\omega}} \right|^2 g(\omega) \tag{5}$$
This suggests that the recoloring step, which constructs an estimate $I$ of $f$ from $I_0$, should set

$$I(\omega) = \left| \frac{1}{1 - \hat{\phi} e^{i\omega}} \right|^2 I_0(\omega)$$

where $\hat{\phi}$ is the estimate of $\phi$ from the regression described above.
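Putting the three steps together, here is a sketch that assumes the periodogram_simple() function from the previous section (the library's ar_periodogram() also smooths the residual periodogram):

import numpy as np

def ar_periodogram_simple(X):
    # Step 1: whiten the data by regressing X_{t+1} on (1, X_t)
    regressors = np.column_stack((np.ones(len(X) - 1), X[:-1]))
    coefs, *_ = np.linalg.lstsq(regressors, X[1:], rcond=None)
    phi_hat = coefs[1]
    residuals = X[1:] - regressors @ coefs    # approximately white noise
    # Step 2: periodogram of the transformed data
    w, I0 = periodogram_simple(residuals)
    # Step 3: recolor, using the formula displayed above
    I_w = I0 / np.abs(1 - phi_hat * np.exp(1j * w))**2
    return w, I_w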
The periodograms are calculated from time series drawn from (4) with 𝜇 = 0 and 𝜙 = −0.9.
Each time series is of length 150.
The difference between the three subfigures is just randomness — each one uses a different draw of the time series.

In all cases, periodograms are fit with the "hamming" window and a window length of 65.
Overall, the fit of the AR smoothed periodogram is much better, in the sense of being closer
to the true spectral density.
87.5 Exercises
87.5.1 Exercise 1
87.5.2 Exercise 2
87.6 Solutions
87.6.1 Exercise 1
In [5]: ## Data
        n = 400
        ϕ = 0.5
        θ = 0, -0.8
        lp = ARMA(ϕ, θ)
        X = lp.simulation(ts_length=n)

        fig, ax = plt.subplots(3, 1, figsize=(10, 12))

        for i, wl in enumerate((15, 55, 175)):   # window lengths (assumed values)
            x, y = periodogram(X)
            ax[i].plot(x, y, 'b-', lw=2, alpha=0.5, label='periodogram')
            x_sd, y_sd = lp.spectral_density(two_pi=False, res=120)
            ax[i].plot(x_sd, y_sd, 'r-', lw=2, alpha=0.8, label='spectral density')
            x, y_smoothed = periodogram(X, window='hamming', window_len=wl)
            ax[i].plot(x, y_smoothed, 'k-', lw=2, label='smoothed periodogram')
            ax[i].legend()
            ax[i].set_title(f'window length = {wl}')
        plt.show()
87.6. SOLUTIONS 1495
87.6.2 Exercise 2
In [6]: lp = ARMA(-0.9)
        wl = 65
        fig, ax = plt.subplots(3, 1, figsize=(10, 12))
        for i in range(3):
            X = lp.simulation(ts_length=150)
            ax[i].set_xlim(0, np.pi)
            x, y = periodogram(X, window='hamming', window_len=wl)
            ax[i].semilogy(x, y, label='standard smoothed periodogram')
            x, y = ar_periodogram(X, window='hamming', window_len=wl)
            ax[i].semilogy(x, y, label='AR smoothed periodogram')
            ax[i].legend(loc='upper left')
        plt.show()
Chapter 88

Additive and Multiplicative Functionals

88.1 Contents
• Overview 88.2
• A Particular Additive Functional 88.3
• Dynamics 88.4
• Code 88.5
• More About the Multiplicative Martingale 88.6
Co-authors: Chase Coleman and Balint Szoke
In addition to what’s in Anaconda, this lecture will need the following libraries:
88.2 Overview
Many economic time series display persistent growth that prevents them from being asymp-
totically stationary and ergodic.
For example, outputs, prices, and dividends typically display irregular but persistent growth.
Asymptotic stationarity and ergodicity are key assumptions needed to make it possible to
learn by applying statistical methods.
Are there ways to model time series having persistent growth that still enables statistical
learning based on a law of large number for an asymptotically stationary and ergodic process?
The answer provided by Hansen and Scheinkman [79] is yes.
They described two classes of time series models that accommodate growth.
They are
1. additive functionals
2. multiplicative functionals

An additive functional in this setting is built from four components:

1. a constant
2. a trend component
3. an asymptotically stationary component
4. a martingale

In this lecture, the state that drives these components follows the vector autoregression

$$x_{t+1} = A x_t + B z_{t+1} \tag{1}$$

Here
• 𝑥𝑡 is an 𝑛 × 1 vector,
• 𝐴 is an 𝑛 × 𝑛 stable matrix (all eigenvalues lie within the open unit circle),
• 𝑧𝑡+1 ∼ 𝑁 (0, 𝐼) is an 𝑚 × 1 IID shock,
• 𝐵 is an 𝑛 × 𝑚 matrix, and
The increments of the additive functional $\{y_t\}$ are linear functions of

• a scalar constant $\nu$,
• the vector $x_t$, and
• the same Gaussian vector $z_{t+1}$ that appears in the VAR (1)

In particular,

$$y_{t+1} - y_t = \nu + D' x_t + F' z_{t+1} \tag{2}$$
A convenient way to represent our additive functional is to use a linear state space system.
To do this, we set up state and observation vectors
$$\hat{x}_t = \begin{bmatrix} 1 \\ x_t \\ y_t \end{bmatrix} \qquad \text{and} \qquad \hat{y}_t = \begin{bmatrix} x_t \\ y_t \end{bmatrix}$$

Next, under the dynamics in (1) and (2),

$$\begin{bmatrix} 1 \\ x_{t+1} \\ y_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & A & 0 \\ \nu & D' & 1 \end{bmatrix} \begin{bmatrix} 1 \\ x_t \\ y_t \end{bmatrix} + \begin{bmatrix} 0 \\ B \\ F' \end{bmatrix} z_{t+1}$$

and

$$\begin{bmatrix} x_t \\ y_t \end{bmatrix} = \begin{bmatrix} 0 & I & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ x_t \\ y_t \end{bmatrix}$$

This is a linear state space system of the form

$$\hat{x}_{t+1} = \hat{A} \hat{x}_t + \hat{B} z_{t+1}, \qquad \hat{y}_t = \hat{D} \hat{x}_t$$
88.4 Dynamics
Here we'll assume that the scalar state $\tilde{x}_t$ follows the fourth-order autoregression

$$\tilde{x}_{t+1} = \phi_1 \tilde{x}_t + \phi_2 \tilde{x}_{t-1} + \phi_3 \tilde{x}_{t-2} + \phi_4 \tilde{x}_{t-3} + \sigma z_{t+1} \tag{3}$$

with characteristic polynomial

$$\phi(z) = 1 - \phi_1 z - \phi_2 z^2 - \phi_3 z^3 - \phi_4 z^4$$
88.4.1 Simulation
In [3]: """
@authors: Chase Coleman, Balint Szoke, Tom Sargent
"""
class AMF_LSS_VAR:
"""
This class transforms an additive (multiplicative)
functional into a QuantEcon linear state space system.
"""
self.D = D
elif len(D.shape) > 1 and D.shape[0] == 1:
self.nm = 1
self.D = D
else:
self.nm = 1
self.D = np.expand_dims(D, 0)
# Set F
if not np.any(F):
self.F = np.zeros((self.nk, 1))
else:
self.F = F
# Set ν
if not np.any(ν):
self.ν = np.zeros((self.nm, 1))
elif type(ν) == float:
self.ν = np.asarray([[ν]])
elif len(ν.shape) == 1:
self.ν = np.expand_dims(ν, 1)
else:
self.ν = ν
if self.ν.shape[0] != self.D.shape[0]:
raise ValueError("The dimension of ν is inconsistent with D!")
def construct_ss(self):
"""
This creates the state space representation that can be passed
into the quantecon LSS class.
"""
# Pull out useful info
nx, nk, nm = self.nx, self.nk, self.nm
A, B, D, F, ν = self.A, self.B, self.D, self.F, self.ν
if self.add_decomp:
ν, H, g = self.add_decomp
else:
ν, H, g = self.additive_decomp()
# Auxiliary blocks with 0's and 1's to fill out the lss matrices
nx0c = np.zeros((nx, 1))
nx0r = np.zeros(nx)
nx1 = np.ones(nx)
nk0 = np.zeros(nk)
ny0c = np.zeros((nm, 1))
ny0r = np.zeros(nm)
ny1m = np.eye(nm)
ny0m = np.zeros((nm, nm))
nyx0m = np.zeros_like(D)
return lss
def additive_decomp(self):
"""
Return values for the martingale decomposition
- ν : unconditional mean difference in Y
- H : coefficient for the (linear) martingale component (κ_a)
- g : coefficient for the stationary component g(x)
- Y_0 : it should be the function of X_0 (for now set it to 0.0)
"""
I = np.identity(self.nx)
A_res = la.solve(I - self.A, I)
g = self.D @ A_res
H = self.F + self.D @ A_res @ self.B
return self.ν, H, g
def multiplicative_decomp(self):
"""
Return values for the multiplicative decomposition (Example 5.4.4.)
- ν_tilde : eigenvalue
return ν_tilde, H, g
return llh[-1]
"""
# Pull out right sizes so we know how to increment
nx, nk, nm = self.nx, self.nk, self.nm
sadd_dist = norm(ymeans[nx+2*nm+ii],
np.sqrt(yvar[nx+2*nm+ii, nx+2*nm+ii]))
sbounds[li:ui, t] = sadd_dist.ppf([0.01, .99])
add_figs = []
for ii in range(nm):
li, ui = npaths*(ii), npaths*(ii+1)
LI, UI = 2*(ii), 2*(ii+1)
add_figs.append(self.plot_given_paths(T,
ypath[li:ui,:],
mpath[li:ui,:],
spath[li:ui,:],
tpath[li:ui,:],
mbounds[LI:UI,:],
sbounds[LI:UI,:],
show_trend=show_trend))
return add_figs
"""
# Pull out right sizes so we know how to increment
nx, nk, nm = self.nx, self.nk, self.nm
# Matrices for the multiplicative decomposition
ν_tilde, H, g = self.multiplicative_decomp()
mult_figs = []
for ii in range(nm):
li, ui = npaths*(ii), npaths*(ii+1)
LI, UI = 2*(ii), 2*(ii+1)
mult_figs.append(self.plot_given_paths(T,
ypath_mult[li:ui,:],
mpath_mult[li:ui,:],
spath_mult[li:ui,:],
tpath_mult[li:ui,:],
mbounds_mult[LI:UI,:],
sbounds_mult[LI:UI,:],
1,
show_trend=show_trend))
            mult_figs[ii].suptitle(f'Multiplicative decomposition of $y_{ii+1}$',
                                   fontsize=14)
return mult_figs
mart_figs = []
for ii in range(nm):
li, ui = npaths*(ii), npaths*(ii+1)
LI, UI = 2*(ii), 2*(ii+1)
            mart_figs.append(self.plot_martingale_paths(T, mpath_mult[li:ui, :],
                                                        mbounds_mult[LI:UI, :],
                                                        horline=1))
            mart_figs[ii].suptitle(f'Martingale components for many paths of $y_{ii+1}$',
                                   fontsize=14)

        return mart_figs
# Allocate space
trange = np.arange(T)
# Create figure
fig, ax = plt.subplots(2, 2, sharey=True, figsize=(15, 8))
return fig
# Create figure
fig, ax = plt.subplots(1, 1, figsize=(10, 6))
return fig
For now, we just plot 𝑦𝑡 and 𝑥𝑡 , postponing until later a description of exactly how we com-
pute them.
In [4]: # Parameter values (assumed here for illustration)
        ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5
        σ = 0.01
        ν = 0.01   # growth rate

        # A matrix should be n x n
        A = np.array([[ϕ_1, ϕ_2, ϕ_3, ϕ_4],
                      [ 1,   0,   0,   0],
                      [ 0,   1,   0,   0],
                      [ 0,   0,   1,   0]])

        # B matrix should be n x k
        B = np.array([[σ, 0, 0, 0]]).T

        D = np.array([1, 0, 0, 0]) @ A
        F = np.array([1, 0, 0, 0]) @ B

        amf = AMF_LSS_VAR(A, B, D, F, ν=ν)

        T = 150
        x, y = amf.lss.simulate(T)
88.4.2 Decomposition
Hansen and Sargent [77] describe how to construct a decomposition of an additive functional
into four parts:
• a constant inherited from initial values 𝑥0 and 𝑦0
• a linear trend
• a martingale
• an (asymptotically) stationary component
To attain this decomposition for the particular class of additive functionals defined by (1) and
(2), we first construct the matrices
$$H := F + B'(I - A')^{-1} D \qquad \text{and} \qquad g := D'(I - A)^{-1}$$
With these matrices in hand, the decomposition is

$$y_t = \underbrace{t\nu}_{\text{trend component}} + \overbrace{\sum_{j=1}^{t} H z_j}^{\text{Martingale component}} \underbrace{-\, g x_t}_{\text{stationary component}} + \overbrace{g x_0 + y_0}^{\text{initial conditions}}$$
At this stage, you should pause and verify that 𝑦𝑡+1 − 𝑦𝑡 satisfies (2).
It is convenient for us to introduce the following notation:
• 𝜏𝑡 = 𝜈𝑡 , a linear, deterministic trend
𝑡
• 𝑚𝑡 = ∑𝑗=1 𝐻𝑧𝑗 , a martingale with time 𝑡 + 1 increment 𝐻𝑧𝑡+1
• 𝑠𝑡 = 𝑔𝑥𝑡 , an (asymptotically) stationary component
We want to characterize and simulate components 𝜏𝑡 , 𝑚𝑡 , 𝑠𝑡 of the decomposition.
A convenient way to do this is to construct an appropriate instance of a linear state space
system by using LinearStateSpace from QuantEcon.py.
This will allow us to use the routines in LinearStateSpace to study dynamics.
To start, observe that, under the dynamics in (1) and (2) and with the definitions just given,
$$\begin{bmatrix} 1 \\ t+1 \\ x_{t+1} \\ y_{t+1} \\ m_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & A & 0 & 0 \\ \nu & 0 & D' & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ t \\ x_t \\ y_t \\ m_t \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ B \\ F' \\ H' \end{bmatrix} z_{t+1}$$

and

$$\begin{bmatrix} x_t \\ y_t \\ \tau_t \\ m_t \\ s_t \end{bmatrix} = \begin{bmatrix} 0 & 0 & I & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & \nu & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & -g & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ t \\ x_t \\ y_t \\ m_t \end{bmatrix}$$

With

$$\tilde{x} := \begin{bmatrix} 1 \\ t \\ x_t \\ y_t \\ m_t \end{bmatrix} \qquad \text{and} \qquad \tilde{y} := \begin{bmatrix} x_t \\ y_t \\ \tau_t \\ m_t \\ s_t \end{bmatrix}$$

we can write this as the linear state space system

$$\tilde{x}_{t+1} = \tilde{A} \tilde{x}_t + \tilde{B} z_{t+1}, \qquad \tilde{y}_t = \tilde{D} \tilde{x}_t$$
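As a concrete illustration, here is a minimal scalar sketch (the parameter values are made up) of packaging these matrices into QuantEcon's LinearStateSpace class:

import numpy as np
import quantecon as qe

A, B, D, F, ν = 0.8, 1.0, 0.5, 0.2, 0.01     # assumed scalar parameters
H = F + D * B / (1 - A)                      # martingale increment coefficient
g = D / (1 - A)                              # stationary-component coefficient

A_tilde = np.array([[1, 0, 0, 0, 0],
                    [1, 1, 0, 0, 0],
                    [0, 0, A, 0, 0],
                    [ν, 0, D, 1, 0],
                    [0, 0, 0, 0, 1]])
B_tilde = np.array([[0], [0], [B], [F], [H]])
D_tilde = np.array([[0, 0, 1, 0, 0],         # x_t
                    [0, 0, 0, 1, 0],         # y_t
                    [0, ν, 0, 0, 0],         # τ_t = νt
                    [0, 0, 0, 0, 1],         # m_t
                    [0, 0, -g, 0, 0]])       # stationary part -g x_t

lss = qe.LinearStateSpace(A_tilde, B_tilde, D_tilde,
                          mu_0=np.array([1, 0, 0, 0, 0]))
x_path, y_path = lss.simulate(ts_length=100)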
88.5 Code
The class AMF_LSS_VAR mentioned above does all that we want to study our additive
functional.
In fact, AMF_LSS_VAR does more because it allows us to study an associated multiplicative
functional as well.
(A hint that it does more is the name of the class – here AMF stands for “additive and mul-
tiplicative functional” – the code computes and displays objects associated with multiplicative
functionals too.)
Let’s use this code (embedded above) to explore the example process described above.
If you run the code that first simulated the example above and then the following method call, you will generate (modulo randomness) the plot
In [5]: amf.plot_additive(T)
plt.show()
When we plot multiple realizations of a component in the 2nd, 3rd, and 4th panels, we also
plot the population 95% probability coverage sets computed using the LinearStateSpace class.
We have chosen to simulate many paths, all starting from the same non-random initial condi-
tions 𝑥0 , 𝑦0 (you can tell this from the shape of the 95% probability coverage shaded areas).
An associated multiplicative functional is $M_t = \exp(y_t)$, which by the additive decomposition above satisfies

$$\frac{M_t}{M_0} = \exp(t\nu) \exp\left( \sum_{j=1}^{t} H \cdot Z_j \right) \exp\left( D'(I-A)^{-1} x_0 - D'(I-A)^{-1} x_t \right)$$

or

$$\frac{M_t}{M_0} = \exp(\tilde{\nu} t) \left( \frac{\tilde{M}_t}{\tilde{M}_0} \right) \left( \frac{\tilde{e}(X_0)}{\tilde{e}(x_t)} \right)$$

where

$$\tilde{\nu} = \nu + \frac{H \cdot H}{2}, \qquad \tilde{M}_t = \exp\left( \sum_{j=1}^{t} \left( H \cdot z_j - \frac{H \cdot H}{2} \right) \right), \quad \tilde{M}_0 = 1$$

and

$$\tilde{e}(x) = \exp[g(x)] = \exp[D'(I-A)^{-1} x]$$
In [6]: amf.plot_multiplicative(T)
plt.show()
As before, when we plotted multiple realizations of a component in the 2nd, 3rd, and 4th
panels, we also plotted population 95% confidence bands computed using the LinearStateS-
pace class.
Comparing this figure and the last also helps show how geometric growth differs from arith-
metic growth.
The top right panel of the above graph shows a panel of martingales associated with the
panel of 𝑀𝑡 = exp(𝑦𝑡 ) that we have generated for a limited horizon 𝑇 .
It is interesting to see how the martingale behaves as $T \to +\infty$.
Let’s see what happens when we set 𝑇 = 12000 instead of 150.
Hansen and Sargent [77] (ch. 8) describe the following two properties of the martingale component $\tilde{M}_t$ of the multiplicative decomposition

• while $E_0 \tilde{M}_t = 1$ for all $t \geq 0$, nevertheless …
• as $t \to +\infty$, $\tilde{M}_t$ converges to zero almost surely

The first property follows from the fact that $\tilde{M}_t$ is a multiplicative martingale with initial condition $\tilde{M}_0 = 1$.

The second is a peculiar property noted and proved by Hansen and Sargent [77].

The following simulation of many paths of $\tilde{M}_t$ illustrates both properties
In [7]: np.random.seed(10021987)
amf.plot_martingales(12000)
plt.show()
The dotted line in the above graph is the mean $E \tilde{M}_t = 1$ of the martingale.
It remains constant at unity, illustrating the first property.
The purple 95 percent frequency coverage interval collapses around zero, illustrating the sec-
ond property.
Let's drill down and study the probability distribution of the multiplicative martingale $\{\tilde{M}_t\}_{t=0}^{\infty}$ in more detail.

As we have seen, it has representation

$$\tilde{M}_t = \exp\left( \sum_{j=1}^{t} \left( H \cdot z_j - \frac{H \cdot H}{2} \right) \right), \qquad \tilde{M}_0 = 1$$
In particular, we want to simulate 5000 sample paths of length 𝑇 for the case in which 𝑥 is a
scalar and [𝐴, 𝐵, 𝐷, 𝐹 ] = [0.8, 0.001, 1.0, 0.01] and 𝜈 = 0.005.
After accomplishing this, we want to display and study histograms of $\tilde{M}_T^i$ for various values of $T$.
We’ll do this by formulating the additive functional as a linear state space model and putting
the LinearStateSpace class to work.
In [8]: """
"""
class AMF_LSS_VAR:
"""
This class is written to transform a scalar additive functional
into a linear state space system.
"""
def __init__(self, A, B, D, F=0.0, ν=0.0):
# Unpack required elements
self.A, self.B, self.D, self.F, self.ν = A, B, D, F, ν
def construct_ss(self):
"""
This creates the state space representation that can be passed
into the quantecon LSS class.
"""
# Pull out useful info
A, B, D, F, ν = self.A, self.B, self.D, self.F, self.ν
nx, nk, nm = 1, 1, 1
if self.add_decomp:
ν, H, g = self.add_decomp
else:
ν, H, g = self.additive_decomp()
return lss
def additive_decomp(self):
"""
Return values for the martingale decomposition (Proposition 4.3.3.)
- ν : unconditional mean difference in Y
        - H : coefficient for the (linear) martingale component (κ_a)
        - g : coefficient for the stationary component g(x)
        - Y_0 : it should be the function of X_0 (for now set it to 0.0)
"""
A_res = 1 / (1 - self.A)
g = self.D * A_res
H = self.F + self.D * A_res * self.B
return self.ν, H, g
def multiplicative_decomp(self):
"""
Return values for the multiplicative decomposition (Example 5.4.4.)
- ν_tilde : eigenvalue
- H : vector for the Jensen term
"""
ν, H, g = self.additive_decomp()
ν_tilde = ν + (.5) * H**2
return ν_tilde, H, g
llh = self.loglikelihood_path(x, y)
return llh[-1]
return x, y
# Allocate space
storeX = np.empty((I, T))
storeY = np.empty((I, T))
for i in range(I):
# Do specific simulation
x, y = simulate_xy(amf, T)
Now that we have these functions in our tool kit, let's apply them to run some simulations.
# Allocate space
add_mart_comp = np.empty((I, T))
# Build model
        amf_2 = AMF_LSS_VAR(0.8, 0.001, 1.0, 0.01, .005)
# The distribution
mdist = lognorm(np.sqrt(t*H2), scale=np.exp(-t*H2/2))
x = np.linspace(xmin, xmax, npts)
pdf = mdist.pdf(x)
return x, pdf
# The distribution
lmdist = norm(-t*H2/2, np.sqrt(t*H2))
x = np.linspace(xmin, xmax, npts)
pdf = lmdist.pdf(x)
return x, pdf
plt.tight_layout()
plt.show()
These probability density functions help us understand mechanics underlying the peculiar
property of our multiplicative martingale
• As $T$ grows, most of the probability mass shifts leftward toward zero. For example, note that most mass is near 1 for $T = 10$ or $T = 100$, but most of it is near 0 for $T = 5000$.
• As $T$ grows, the tail of the density of $\tilde{M}_T$ lengthens toward the right.
• Enough mass moves toward the right tail to keep $E \tilde{M}_T = 1$ even as most mass in the distribution of $\tilde{M}_T$ collapses around 0.
Chapter 89

Classical Control with Linear Algebra

89.1 Contents
• Overview 89.2
• A Control Problem 89.3
• Finite Horizon Theory 89.4
• The Infinite Horizon Limit 89.5
• Undiscounted Problems 89.6
• Implementation 89.7
• Exercises 89.8
89.2 Overview
In this lecture and the sequel Classical Filtering with Linear Algebra, we mostly rely on ele-
mentary linear algebra.
The main tool from linear algebra we’ll put to work here is LU decomposition.
We’ll begin with discrete horizon problems.
Then we’ll view infinite horizon problems as appropriate limits of these finite horizon prob-
lems.
Later, we will examine the close connection between LQ control and least-squares prediction
and filtering problems.
These classes of problems are connected in the sense that to solve each, essentially the same
mathematics is used.
Let’s start with some standard imports:
89.2.1 References
Let 𝐿 be the lag operator, so that, for sequence {𝑥𝑡 } we have 𝐿𝑥𝑡 = 𝑥𝑡−1 .
More generally, let 𝐿𝑘 𝑥𝑡 = 𝑥𝑡−𝑘 with 𝐿0 𝑥𝑡 = 𝑥𝑡 and
𝑑(𝐿) = 𝑑0 + 𝑑1 𝐿 + … + 𝑑𝑚 𝐿𝑚
We consider the problem of choosing $\{y_t\}$ to maximize

$$\lim_{N \to \infty} \sum_{t=0}^{N} \beta^t \left\{ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} [d(L) y_t]^2 \right\} \tag{1}$$
where
• ℎ is a positive parameter and 𝛽 ∈ (0, 1) is a discount factor.
• $\{a_t\}_{t \geq 0}$ is a sequence of exponential order less than $\beta^{-1/2}$, by which we mean $\lim_{t \to \infty} \beta^{t/2} a_t = 0$.
Maximization in (1) is subject to initial conditions for 𝑦−1 , 𝑦−2 … , 𝑦−𝑚 .
Maximization is over infinite sequences {𝑦𝑡 }𝑡≥0 .
89.3.1 Example
The formulation of the LQ problem given above is broad enough to encompass many useful
models.
As a simple illustration, recall that in LQ Control: Foundations we consider a monopolist fac-
ing stochastic demand shocks and adjustment costs.
Let’s consider a deterministic version of this problem, where the monopolist maximizes the
discounted sum
∞
∑ 𝛽 𝑡 𝜋𝑡
𝑡=0
We first study a finite-horizon version of the problem, obtained by fixing $N > m$.

To work out the first-order conditions, write

$$J = \sum_{t=0}^{N} \beta^t [d(L)y_t][d(L)y_t] = \sum_{t=0}^{N} \beta^t (d_0 y_t + d_1 y_{t-1} + \cdots + d_m y_{t-m})(d_0 y_t + d_1 y_{t-1} + \cdots + d_m y_{t-m})$$
and note that for $t = 0, \ldots, N - m$,

$$\frac{\partial J}{\partial y_t} = 2\beta^t d_0\, d(L)y_t + 2\beta^{t+1} d_1\, d(L)y_{t+1} + \cdots + 2\beta^{t+m} d_m\, d(L)y_{t+m} = 2\beta^t \left( d_0 + d_1 \beta L^{-1} + d_2 \beta^2 L^{-2} + \cdots + d_m \beta^m L^{-m} \right) d(L) y_t$$

We can write this more compactly as

$$\frac{\partial J}{\partial y_t} = 2\beta^t\, d(\beta L^{-1})\, d(L) y_t \tag{2}$$
Similarly, for the last $m$ periods,

$$\begin{aligned}
\frac{\partial J}{\partial y_N} &= 2\beta^N d_0\, d(L) y_N \\
\frac{\partial J}{\partial y_{N-1}} &= 2\beta^{N-1} \left[ d_0 + \beta d_1 L^{-1} \right] d(L) y_{N-1} \\
&\ \ \vdots \\
\frac{\partial J}{\partial y_{N-m+1}} &= 2\beta^{N-m+1} \left[ d_0 + \beta L^{-1} d_1 + \cdots + \beta^{m-1} L^{-m+1} d_{m-1} \right] d(L) y_{N-m+1}
\end{aligned} \tag{3}$$
With these preliminaries under our belts, we are ready to differentiate (1).
Differentiating (1) with respect to $y_t$ for $t = 0, \ldots, N-m$ gives the Euler equations

$$[h + d(\beta L^{-1})\, d(L)]\, y_t = a_t, \qquad t = 0, 1, \ldots, N-m \tag{4}$$

The system of equations (4) forms a $2m$-th order linear difference equation that must hold for the values of $t$ indicated.
Differentiating (1) with respect to $y_t$ for $t = N-m+1, \ldots, N$ gives the terminal conditions

$$\begin{aligned}
&\beta^N \left( a_N - h y_N - d_0\, d(L) y_N \right) = 0 \\
&\beta^{N-1} \left( a_{N-1} - h y_{N-1} - [d_0 + \beta d_1 L^{-1}]\, d(L) y_{N-1} \right) = 0 \\
&\qquad \vdots \\
&\beta^{N-m+1} \left( a_{N-m+1} - h y_{N-m+1} - [d_0 + \beta L^{-1} d_1 + \cdots + \beta^{m-1} L^{-m+1} d_{m-1}]\, d(L) y_{N-m+1} \right) = 0
\end{aligned} \tag{5}$$
That is, for the finite 𝑁 problem, conditions (4) and (5) are necessary and sufficient for a
maximum, by concavity of the objective function.
Next, we describe how to obtain the solution using matrix methods.
Let’s look at how linear algebra can be used to tackle and shed light on the finite horizon LQ
control problem.
$$\begin{aligned}
&[h + d(\beta L^{-1})\, d(L)]\, y_t = a_t, \qquad t = 0, 1, \ldots, N-1 \\
&\beta^N \left[ a_N - h y_N - d_0\, d(L) y_N \right] = 0
\end{aligned} \tag{6}$$

where $d(L) = d_0 + d_1 L$.
These equations are to be solved for 𝑦0 , 𝑦1 , … , 𝑦𝑁 as functions of 𝑎0 , 𝑎1 , … , 𝑎𝑁 and 𝑦−1 .
Let

$$\begin{bmatrix}
(\phi_0 - d_1^2) & \phi_1 & 0 & 0 & \cdots & \cdots & 0 \\
\beta\phi_1 & \phi_0 & \phi_1 & 0 & \cdots & \cdots & 0 \\
0 & \beta\phi_1 & \phi_0 & \phi_1 & \cdots & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & \cdots & \cdots & \cdots & \beta\phi_1 & \phi_0 & \phi_1 \\
0 & \cdots & \cdots & \cdots & 0 & \beta\phi_1 & \phi_0
\end{bmatrix}
\begin{bmatrix} y_N \\ y_{N-1} \\ y_{N-2} \\ \vdots \\ y_1 \\ y_0 \end{bmatrix}
=
\begin{bmatrix} a_N \\ a_{N-1} \\ a_{N-2} \\ \vdots \\ a_1 \\ a_0 - \phi_1 y_{-1} \end{bmatrix} \tag{7}$$

or

$$W \bar{y} = \bar{a} \tag{8}$$
Notice that the first element of the diagonal of $W$ differs from the remaining diagonal elements, reflecting the terminal condition.

The solution is then

$$\bar{y} = W^{-1} \bar{a} \tag{9}$$
An Alternative Representation

An alternative way to express the solution to (7) or (8) is in so-called feedback-feedforward form, obtained from the LU decomposition $W = LU$:

$$U \bar{y} = L^{-1} \bar{a} \tag{10}$$

where, in the $m = 1$ case,

$$\begin{bmatrix}
1 & U_{12} & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & U_{23} & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & U_{34} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & U_{N,N+1} \\
0 & 0 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}
\begin{bmatrix} y_N \\ y_{N-1} \\ y_{N-2} \\ \vdots \\ y_1 \\ y_0 \end{bmatrix}
=
\begin{bmatrix}
L^{-1}_{11} & 0 & 0 & \cdots & 0 \\
L^{-1}_{21} & L^{-1}_{22} & 0 & \cdots & 0 \\
L^{-1}_{31} & L^{-1}_{32} & L^{-1}_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
L^{-1}_{N+1,1} & L^{-1}_{N+1,2} & L^{-1}_{N+1,3} & \cdots & L^{-1}_{N+1,N+1}
\end{bmatrix}
\begin{bmatrix} a_N \\ a_{N-1} \\ a_{N-2} \\ \vdots \\ a_0 - \phi_1 y_{-1} \end{bmatrix}$$

where $L^{-1}_{ij}$ is the $(i, j)$ element of $L^{-1}$ and $U_{ij}$ is the $(i, j)$ element of $U$.
Note how the left side for a given 𝑡 involves 𝑦𝑡 and one lagged value 𝑦𝑡−1 while the right side
involves all future values of the forcing process 𝑎𝑡 , 𝑎𝑡+1 , … , 𝑎𝑁 .
We briefly indicate how this approach extends to the problem with 𝑚 > 1.
Assume that 𝛽 = 1 and let 𝐷𝑚+1 be the (𝑚 + 1) × (𝑚 + 1) symmetric matrix whose elements
are determined from the following formula:
𝑦𝑁 𝑎𝑁 𝑦𝑁−𝑚+1
⎡𝑦 ⎤ ⎡𝑎 ⎤ ⎡𝑦 ⎤
(𝐷𝑚+1 + ℎ𝐼𝑚+1 ) ⎢ 𝑁−1 ⎥ = ⎢ 𝑁−1 ⎥ + 𝑀 ⎢ 𝑁−𝑚−2 ⎥
⎢ ⋮ ⎥ ⎢ ⋮ ⎥ ⎢ ⋮ ⎥
𝑦
⎣ 𝑁−𝑚 ⎦ ⎣𝑎𝑁−𝑚 ⎦ 𝑦
⎣ 𝑁−2𝑚 ⎦
where 𝑀 is (𝑚 + 1) × 𝑚 and
𝑈 𝑦 ̄ = 𝐿−1 𝑎̄ (11)
𝑡 𝑁−𝑡
∑ 𝑈−𝑡+𝑁+1, −𝑡+𝑁+𝑗+1 𝑦𝑡−𝑗 = ∑ 𝐿−𝑡+𝑁+1, −𝑡+𝑁+1−𝑗 𝑎𝑡+𝑗
̄ ,
𝑗=0 𝑗=0
𝑡 = 0, 1, … , 𝑁
where 𝐿−1
𝑡,𝑠 is the element in the (𝑡, 𝑠) position of 𝐿, and similarly for 𝑈 .
The left side of equation (11) is the “feedback” part of the optimal control law for 𝑦𝑡 , while
the right-hand side is the “feedforward” part.
We note that there is a different control law for each 𝑡.
Thus, in the finite horizon case, the optimal control law is time-dependent.
It is natural to suspect that as $N \to \infty$, (11) becomes equivalent to the solution of our infinite horizon problem, which below we shall show can be expressed as

$$c(L)\, y_t = c(\beta L^{-1})^{-1} a_t$$

so that as $N \to \infty$ we expect that for each fixed $t$, $U_{t,t-j} \to c_j$ and $L^{-1}_{t,t+j}$ approaches the coefficient on $L^{-j}$ in the expansion of $c(\beta L^{-1})^{-1}$.
This suspicion is true under general conditions that we shall study later.
For now, we note that by creating the matrix 𝑊 for large 𝑁 and factoring it into the 𝐿𝑈
form, good approximations to 𝑐(𝐿) and 𝑐(𝛽𝐿−1 )−1 can be obtained.
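Here is a sketch of that idea for the example $h = 1$, $d(L) = 1 - 0.5L$, $\beta = 1$, using $\phi_0 = h + d_0^2 + d_1^2$ and $\phi_1 = d_0 d_1$ to build $W$ as in (7); the renormalization of $U$ mirrors the one used in the code listing below:

import numpy as np
from scipy.linalg import lu

h, d0, d1, N = 1.0, 1.0, -0.5, 60
ϕ0, ϕ1 = h + d0**2 + d1**2, d0 * d1
W = ϕ0 * np.eye(N + 1) + ϕ1 * (np.eye(N + 1, k=1) + np.eye(N + 1, k=-1))
W[0, 0] = ϕ0 - d1**2                  # terminal-condition adjustment, as in (7)

L_mat, U = lu(W, permute_l=True)
U = np.diag(1 / np.diag(U)) @ U       # renormalize so U has unit diagonal
print(np.diag(U, k=1)[-4:])           # entries ≈ -0.2344 = -λ_1, with c(z) = c_0(1 - λ_1 z)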
For the infinite horizon problem, we propose to discover first-order necessary conditions by
taking the limits of (4) and (5) as 𝑁 → ∞.
This approach is valid, and the limits of (4) and (5) as 𝑁 approaches infinity are first-order
necessary conditions for a maximum.
However, for the infinite horizon problem with 𝛽 < 1, the limits of (4) and (5) are, in general,
not sufficient for a maximum.
That is, the limits of (5) do not provide enough information uniquely to determine the solu-
tion of the Euler equation (4) that maximizes (1).
As we shall see below, a side condition on the path of 𝑦𝑡 that together with (4) is sufficient
for an optimum is
$$\sum_{t=0}^{\infty} \beta^t h y_t^2 < \infty \tag{12}$$
All paths that satisfy the Euler equations, except the one that we shall select below, violate this condition and, therefore, evidently lead to (much) lower values of (1) than does the optimal path selected by the solution procedure below.
Consider the characteristic equation associated with the Euler equation

$$h + d(\beta z^{-1})\, d(z) = 0 \tag{13}$$

The roots of (13) come in $\beta$-reciprocal pairs, and (assuming that none lies on the circle $|z| = \sqrt{\beta}$) we can order them so that $|z_j| > \sqrt{\beta}$ for $j = 1, \ldots, m$ and write

$$h + d(\beta z^{-1})\, d(z) = z_0 z^{-m} \prod_{j=1}^{m} (z - z_j)(z - \beta z_j^{-1}) \tag{14}$$

where $z_0$ is a constant.

In (14), we substitute $(z - z_j) = -z_j (1 - \frac{1}{z_j} z)$ and $(z - \beta z_j^{-1}) = z (1 - \frac{\beta}{z_j} z^{-1})$ for $j = 1, \ldots, m$ to get

$$h + d(\beta z^{-1})\, d(z) = (-1)^m (z_0 z_1 \cdots z_m) \left(1 - \frac{1}{z_1} z\right) \cdots \left(1 - \frac{1}{z_m} z\right) \left(1 - \frac{\beta}{z_1} z^{-1}\right) \cdots \left(1 - \frac{\beta}{z_m} z^{-1}\right)$$
Now define $c(z) = \sum_{j=0}^{m} c_j z^j$ as

$$c(z) = \left[ (-1)^m z_0 z_1 \cdots z_m \right]^{1/2} \left(1 - \frac{z}{z_1}\right) \left(1 - \frac{z}{z_2}\right) \cdots \left(1 - \frac{z}{z_m}\right) \tag{15}$$

so that

$$h + d(\beta z^{-1})\, d(z) = c(\beta z^{-1})\, c(z) \tag{16}$$

Notice that (15) can also be written

$$c(z) = c_0 (1 - \lambda_1 z) \ldots (1 - \lambda_m z) \tag{17}$$

where

$$c_0 = \left[ (-1)^m z_0 z_1 \cdots z_m \right]^{1/2}; \qquad \lambda_j = \frac{1}{z_j}, \quad j = 1, \ldots, m$$

Since $|z_j| > \sqrt{\beta}$ for $j = 1, \ldots, m$ it follows that $|\lambda_j| < 1/\sqrt{\beta}$ for $j = 1, \ldots, m$.
In sum, we have constructed a factorization (16) of the characteristic polynomial for the Euler
equation in which the zeros of 𝑐(𝑧) exceed 𝛽 1/2 in modulus, and the zeros of 𝑐 (𝛽𝑧 −1 ) are less
than 𝛽 1/2 in modulus.
Using (16), we now write the Euler equation as

$$c(\beta L^{-1})\, c(L)\, y_t = a_t$$
The unique solution of the Euler equation that satisfies condition (12) is

$$(1 - \lambda_1 L) \cdots (1 - \lambda_m L)\, y_t = \frac{c_0^{-2}}{(1 - \beta\lambda_1 L^{-1}) \cdots (1 - \beta\lambda_m L^{-1})}\, a_t \tag{19}$$
Using partial fractions, we can write the characteristic polynomial on the right side of (19) as

$$\sum_{j=1}^{m} \frac{A_j}{1 - \lambda_j \beta L^{-1}} \qquad \text{where} \qquad A_j := \frac{c_0^{-2}}{\prod_{i \neq j} \left(1 - \frac{\lambda_i}{\lambda_j}\right)}$$

Then (19) can be written

$$(1 - \lambda_1 L) \cdots (1 - \lambda_m L)\, y_t = \sum_{j=1}^{m} \frac{A_j}{1 - \lambda_j \beta L^{-1}}\, a_t$$
or
$$(1 - \lambda_1 L) \cdots (1 - \lambda_m L)\, y_t = \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} (\lambda_j \beta)^k a_{t+k} \tag{20}$$
Equation (20) expresses the optimum sequence for 𝑦𝑡 in terms of 𝑚 lagged 𝑦’s, and 𝑚
weighted infinite geometric sums of future 𝑎𝑡 ’s.
Furthermore, (20) is the unique solution of the Euler equation that satisfies the initial condi-
tions and condition (12).
In effect, condition (12) compels us to solve the “unstable” roots of ℎ + 𝑑(𝛽𝑧 −1 )𝑑(𝑧) forward
(see [142]).
The step of factoring the polynomial $h + d(\beta z^{-1})\, d(z)$ into $c(\beta z^{-1})\, c(z)$, where the zeros of $c(z)$ all have modulus exceeding $\sqrt{\beta}$, is central to solving the problem.
We note two features of the solution (20)
• Since $|\lambda_j| < 1/\sqrt{\beta}$ for all $j$, it follows that $|\lambda_j \beta| < \sqrt{\beta}$.
• The assumption that $\{a_t\}$ is of exponential order less than $1/\sqrt{\beta}$ is sufficient to guarantee that the geometric sums of future $a_t$'s on the right side of (20) converge.
We immediately see that those sums will converge under the weaker condition that {𝑎𝑡 } is of
exponential order less than 𝜙−1 where 𝜙 = max {𝛽𝜆𝑖 , 𝑖 = 1, … , 𝑚}.
Note that with 𝑎𝑡 identically zero, (20) implies that in general |𝑦𝑡 | eventually grows exponen-
tially at a rate given by max𝑖 |𝜆𝑖 |.
The condition $\max_i |\lambda_i| < 1/\sqrt{\beta}$ guarantees that condition (12) is satisfied.

In fact, $\max_i |\lambda_i| < 1/\sqrt{\beta}$ is a necessary condition for (12) to hold.
Were (12) not satisfied, the objective function would diverge to −∞, implying that the 𝑦𝑡
path could not be optimal.
For example, with $a_t = 0$ for all $t$, it is easy to describe a naive (nonoptimal) policy for $\{y_t, t \geq 0\}$ that gives a finite value of (1).
We can simply let 𝑦𝑡 = 0 for 𝑡 ≥ 0.
This policy involves at most 𝑚 nonzero values of ℎ𝑦𝑡2 and [𝑑(𝐿)𝑦𝑡 ]2 , and so yields a finite
value of (1).
Therefore it is easy to dominate a path that violates (12).
89.6 Undiscounted Problems

It is worthwhile focusing on a special case of the LQ problems above: the undiscounted problem that emerges when $\beta = 1$.

In this case, the Euler equation is
$$(h + d(L^{-1})\, d(L))\, y_t = a_t$$

The factorization of the characteristic polynomial becomes

$$(h + d(z^{-1})\, d(z)) = c(z^{-1})\, c(z)$$

where

$$\begin{aligned}
c(z) &= c_0 (1 - \lambda_1 z) \ldots (1 - \lambda_m z) \\
c_0 &= \left[ (-1)^m z_0 z_1 \cdots z_m \right]^{1/2} \\
|\lambda_j| &< 1 \quad \text{for } j = 1, \ldots, m \\
\lambda_j &= \frac{1}{z_j} \quad \text{for } j = 1, \ldots, m \\
z_0 &= \text{constant}
\end{aligned}$$

The solution of the problem is then

$$(1 - \lambda_1 L) \cdots (1 - \lambda_m L)\, y_t = \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} \lambda_j^k\, a_{t+k}$$
Discounted problems can always be converted into undiscounted problems via a simple trans-
formation.
Consider problem (1) with 0 < 𝛽 < 1.
Define the transformed variables

$$\tilde{a}_t = \beta^{t/2} a_t, \qquad \tilde{y}_t = \beta^{t/2} y_t \tag{21}$$

Then notice that $\beta^t [d(L) y_t]^2 = [\tilde{d}(L) \tilde{y}_t]^2$ with $\tilde{d}(L) = \sum_{j=0}^{m} \tilde{d}_j L^j$ and $\tilde{d}_j = \beta^{j/2} d_j$.
Then the original criterion function (1) is equivalent to
$$\lim_{N \to \infty} \sum_{t=0}^{N} \left\{ \tilde{a}_t \tilde{y}_t - \frac{1}{2} h \tilde{y}_t^2 - \frac{1}{2} [\tilde{d}(L)\, \tilde{y}_t]^2 \right\} \tag{22}$$
whose solution is

$$(1 - \tilde{\lambda}_1 L) \cdots (1 - \tilde{\lambda}_m L)\, \tilde{y}_t = \sum_{j=1}^{m} \tilde{A}_j \sum_{k=0}^{\infty} \tilde{\lambda}_j^k\, \tilde{a}_{t+k}$$
or
𝑚 ∞
𝑦𝑡̃ = 𝑓1̃ 𝑦𝑡−1
̃ + ⋯ + 𝑓𝑚̃ 𝑦𝑡−𝑚
̃ + ∑ 𝐴𝑗̃ ∑ 𝜆̃ 𝑘𝑗 𝑎𝑡+𝑘
̃ , (23)
𝑗=1 𝑘=0
1/2
[(−1)𝑚 𝑧0̃ 𝑧1̃ … 𝑧𝑚
̃ ] (1 − 𝜆̃ 1 𝑧) … (1 − 𝜆̃ 𝑚 𝑧) = 𝑐 ̃ (𝑧), where |𝜆̃ 𝑗 | < 1
We leave it to the reader to show that (23) implies the equivalent form of the solution
$$y_t = f_1 y_{t-1} + \cdots + f_m y_{t-m} + \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} (\lambda_j \beta)^k a_{t+k}$$
where

$$f_j = \tilde{f}_j \beta^{-j/2}, \qquad A_j = \tilde{A}_j, \qquad \lambda_j = \tilde{\lambda}_j \beta^{-1/2} \tag{24}$$

The transformations (21) and the inverse formulas (24) allow us to solve a discounted problem by first solving a related undiscounted problem.
89.7 Implementation
Code that computes solutions to the LQ problem using the methods described above can be
found in file control_and_filter.py.
Here’s how it looks
In [2]: """
"""
import numpy as np
import scipy.stats as spst
import scipy.linalg as la
class LQFilter:

    def __init__(self, d, h, y_m, r=None, h_eps=None, β=None):
        """
        Parameters
        ----------
d : list or numpy.array (1-D or a 2-D column vector)
The order of the coefficients: [d_0, d_1, ..., d_m]
h : scalar
Parameter of the objective function (corresponding to the
quadratic term)
y_m : list or numpy.array (1-D or a 2-D column vector)
Initial conditions for y
r : list or numpy.array (1-D or a 2-D column vector)
The order of the coefficients: [r_0, r_1, ..., r_k]
(optional, if not defined -> deterministic problem)
β : scalar
Discount factor (optional, default value is one)
"""
self.h = h
self.d = np.asarray(d)
self.m = self.d.shape[0] - 1
self.y_m = np.asarray(y_m)
if self.m == self.y_m.shape[0]:
self.y_m = self.y_m.reshape(self.m, 1)
else:
raise ValueError("y_m must be of length m = {self.m:d}")
#---------------------------------------------
        # Define the coefficients of ϕ upfront
#---------------------------------------------
ϕ = np.zeros(2 * self.m + 1)
        for i in range(- self.m, self.m + 1):
            ϕ[self.m - i] = np.sum(np.diag(self.d.reshape(self.m + 1, 1) \
                                           @ self.d.reshape(1, self.m + 1),
                                           k=-i
                                           )
                                   )
        ϕ[self.m] = ϕ[self.m] + self.h
        self.ϕ = ϕ
#-----------------------------------------------------
# If r is given calculate the vector _r
#-----------------------------------------------------
if r is None:
pass
else:
self.r = np.asarray(r)
self.k = self.r.shape[0] - 1
ϕ_r = np.zeros(2 * self.k + 1)
for i in range(- self.k, self.k + 1):
                ϕ_r[self.k - i] = np.sum(np.diag(self.r.reshape(self.k + 1, 1) \
                                                 @ self.r.reshape(1, self.k + 1),
                                                 k=-i
                                                 )
                                         )
if h_eps is None:
self.ϕ_r = ϕ_r
else:
ϕ_r[self.k] = ϕ_r[self.k] + h_eps
self.ϕ_r = ϕ_r
#-----------------------------------------------------
# If β is given, define the transformed variables
#-----------------------------------------------------
if β is None:
self.β = 1
else:
self.β = β
self.d = self.β**(np.arange(self.m + 1)/2) * self.d
self.y_m = self.y_m * (self.β**(- np.arange(1, self.m + 1)/2)) \
.reshape(self.m, 1)
    def construct_W_and_Wm(self, N):
        """
        This constructs the matrices W and W_m for a given number of periods N
        """
        m = self.m
d = self.d
W = np.zeros((N + 1, N + 1))
W_m = np.zeros((N + 1, m))
#---------------------------------------
# Terminal conditions
#---------------------------------------
M = np.zeros((m + 1, m))
for j in range(m):
for i in range(j + 1, m + 1):
M[i, j] = D_m1[i - j - 1, m]
#----------------------------------------------
# Euler equations for t = 0, 1, ..., N-(m+1)
#----------------------------------------------
ϕ = self.ϕ
for i in range(m):
W_m[N - i, :(m - i)] = ϕ[(m + 1 + i):]
return W, W_m
def roots_of_characteristic(self):
"""
This function calculates z_0 and the 2m roots of the characteristic
equation associated with the Euler equation (1.7)
Note:
------
        numpy.poly1d(roots, True) defines a polynomial using its roots that can
        be evaluated at any point. If x_1, x_2, ... , x_m are the roots then
        p(x) = (x - x_1)(x - x_2)...(x - x_m)
"""
m = self.m
ϕ = self.ϕ
λ = 1 / z_1_to_m
def coeffs_of_c(self):
'''
This function computes the coefficients {c_j, j = 0, 1, ..., m} for
c(z) = sum_{j = 0}^{m} c_j z^j
return c_coeffs[::-1]
def solution(self):
"""
This function calculates {λ_j, j=1,...,m} and {A_j, j=1,...,m}
of the expression (1.15)
"""
λ = self.roots_of_characteristic()[2]
c_0 = self.coeffs_of_c()[-1]
A = np.zeros(self.m, dtype=complex)
for j in range(self.m):
denom = 1 - λ/λ[j]
A[j] = c_0**(-2) / np.prod(denom[np.arange(self.m) != j])
return λ, A
for i in range(N):
for j in range(N):
if abs(i-j) <= self.k:
V[i, j] = ϕ_r[self.k + abs(i-j)]
return V
V = self.construct_V(N + 1)
d = spst.multivariate_normal(np.zeros(N + 1), V)
return d.rvs()
formed
N = np.asarray(a_hist).shape[0] - 1
a_hist = np.asarray(a_hist).reshape(N + 1, 1)
V = self.construct_V(N + 1)
return Ea_hist
Note:
------
scipy.linalg.lu normalizes L, U so that L has unit diagonal elements
To make things consistent with the lecture, we need an auxiliary
diagonal matrix D which renormalizes L and U
"""
N = np.asarray(a_hist).shape[0] - 1
W, W_m = self.construct_W_and_Wm(N)
L, U = la.lu(W, permute_l=True)
D = np.diag(1 / np.diag(U))
U = D @ U
L = L @ np.diag(1 / np.diag(D))
J = np.fliplr(np.eye(N + 1))
a_hist = J @ np.asarray(a_hist).reshape(N + 1, 1)
#--------------------------------------------
# Transform the 'a' sequence if β is given
#--------------------------------------------
if self.β != 1:
a_hist = a_hist * (self.β**(np.arange(N + 1) / 2))[::-1] \
.reshape(N + 1, 1)
#--------------------------------------------
# Transform the optimal sequence back if β is given
#--------------------------------------------
if self.β != 1:
y_hist = y_hist * (self.β**(- np.arange(-self.m, N + 1)/2)) \
.reshape(N + 1 + self.m, 1)
89.7.1 Example

In this example we set $d(L) = \gamma(1 - L)$ (the code below sets d = γ * [1, -1]), so that the objective penalizes changes in $y_t$.

As we increase $\gamma$, the agent gives greater weight to a smooth time path.
Hence {𝑦𝑡 } evolves as a smoothed version of {𝑎𝑡 }.
The {𝑎𝑡 } sequence we’ll choose as a stationary cyclic process plus some white noise.
Here’s some code that generates a plot when 𝛾 = 0.8
d = γ * np.asarray([1, -1])
y_m = np.asarray(y_m).reshape(m, 1)
plot_simulation()
Here's what happens when $\gamma = 5$

In [4]: plot_simulation(γ=5)
And here’s 𝛾 = 10
In [5]: plot_simulation(γ=10)
89.8 Exercises
89.8.1 Exercise 1
$$(1 - \tilde{\lambda}_1 L) \cdots (1 - \tilde{\lambda}_m L)\, \tilde{y}_t = \sum_{j=1}^{m} \tilde{A}_j \sum_{k=0}^{\infty} \tilde{\lambda}_j^k\, \tilde{a}_{t+k}$$
or
or

$$\tilde{y}_t = \tilde{f}_1 \tilde{y}_{t-1} + \cdots + \tilde{f}_m \tilde{y}_{t-m} + \sum_{j=1}^{m} \tilde{A}_j \sum_{k=0}^{\infty} \tilde{\lambda}_j^k\, \tilde{a}_{t+k} \tag{25}$$
Here

• $h + \tilde{d}(z^{-1})\, \tilde{d}(z) = \tilde{c}(z^{-1})\, \tilde{c}(z)$
• $\tilde{c}(z) = \left[ (-1)^m \tilde{z}_0 \tilde{z}_1 \cdots \tilde{z}_m \right]^{1/2} (1 - \tilde{\lambda}_1 z) \cdots (1 - \tilde{\lambda}_m z)$

where the $\tilde{z}_j$ are the zeros of $h + \tilde{d}(z^{-1})\, \tilde{d}(z)$.
Prove that (25) implies that the solution for $y_t$ in feedback form is

$$y_t = f_1 y_{t-1} + \ldots + f_m y_{t-m} + \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} \beta^k \lambda_j^k\, a_{t+k}$$
89.8.2 Exercise 2
$$\sum_{t=0}^{N} \left\{ a_t y_t - \frac{1}{2} [(1 - 2L) y_t]^2 \right\}$$
89.8.3 Exercise 3
$$\lim_{N \to \infty} \sum_{t=0}^{N} -\frac{1}{2} [(1 - 2L) y_t]^2,$$
89.8.4 Exercise 4
$$\lim_{N \to \infty} \sum_{t=0}^{N} \left\{ (.0000001)\, y_t^2 - \frac{1}{2} [(1 - 2L) y_t]^2 \right\}$$
subject to 𝑦−1 given. Prove that the solution 𝑦𝑡 = 2𝑦𝑡−1 violates condition (12), and so is not
optimal.
Prove that the optimal solution is approximately 𝑦𝑡 = .5𝑦𝑡−1 .
Chapter 90

Classical Prediction and Filtering with Linear Algebra

90.1 Contents
• Overview 90.2
• Finite Dimensional Prediction 90.3
• Combined Finite Dimensional Control and Prediction 90.4
• Infinite Horizon Prediction and Filtering Problems 90.5
• Exercises 90.6
90.2 Overview
This is a sequel to the earlier lecture Classical Control with Linear Algebra.
That lecture used linear algebra – in particular, the LU decomposition – to formulate and
solve a class of linear-quadratic optimal control problems.
In this lecture, we’ll be using a closely related decomposition, the Cholesky decomposition, to
solve linear prediction and filtering problems.
We exploit the useful fact that there is an intimate connection between two superficially dif-
ferent classes of problems:
• deterministic linear-quadratic (LQ) optimal control problems
• linear least squares prediction and filtering problems
The first class of problems involves no randomness, while the second is all about randomness.
Nevertheless, essentially the same mathematics solves both types of problem.
This connection, which is often termed “duality,” is present whether one uses “classical” or
“recursive” solution procedures.
In fact, we saw duality at work earlier when we formulated control and prediction problems
recursively in lectures LQ dynamic programming problems, A first look at the Kalman filter,
and The permanent income model.
A useful consequence of duality is that
• With every LQ control problem, there is implicitly affiliated a linear least squares prediction problem.
90.2.1 References
The key insight here comes from noting that because the covariance matrix $V$ of $x$ is positive definite and symmetric, there exists a (Cholesky) decomposition of $V$ such that

$$V = L^{-1} (L^{-1})'$$

and

$$L V L' = I$$

where $L$ and $L^{-1}$ are both lower triangular.
Setting $\varepsilon = L x$ and writing the system out gives

$$\begin{aligned}
L_{11} x_1 &= \varepsilon_1 \\
L_{21} x_1 + L_{22} x_2 &= \varepsilon_2 \\
&\ \ \vdots \\
L_{T1} x_1 + \cdots + L_{TT} x_T &= \varepsilon_T
\end{aligned} \tag{1}$$

or

$$\sum_{j=0}^{t-1} L_{t,t-j}\, x_{t-j} = \varepsilon_t, \qquad t = 1, 2, \ldots, T \tag{2}$$
$$\begin{aligned}
x_1 &= L^{-1}_{11} \varepsilon_1 \\
x_2 &= L^{-1}_{22} \varepsilon_2 + L^{-1}_{21} \varepsilon_1 \\
&\ \ \vdots \\
x_T &= L^{-1}_{TT} \varepsilon_T + L^{-1}_{T,T-1} \varepsilon_{T-1} + \cdots + L^{-1}_{T,1} \varepsilon_1
\end{aligned} \tag{3}$$

or

$$x_t = \sum_{j=0}^{t-1} L^{-1}_{t,t-j} \varepsilon_{t-j} \tag{4}$$

where $L^{-1}_{i,j}$ denotes the $(i, j)$ element of $L^{-1}$.
To proceed, it is useful to drill down and note that for $t - 1 \geq m \geq 1$ we can rewrite (4) in the form of the moving average representation

$$x_t = \sum_{j=0}^{m-1} L^{-1}_{t,t-j} \varepsilon_{t-j} + \sum_{j=m}^{t-1} L^{-1}_{t,t-j} \varepsilon_{t-j} \tag{6}$$

Representation (6) is an orthogonal decomposition of $x_t$ into a part $\sum_{j=m}^{t-1} L^{-1}_{t,t-j} \varepsilon_{t-j}$ that lies in the space spanned by $[x_{t-m}, x_{t-m+1}, \ldots, x_1]$ and an orthogonal component $\sum_{j=0}^{m-1} L^{-1}_{t,t-j} \varepsilon_{t-j}$ that does not lie in that space but instead in a linear space known as its orthogonal complement.

It follows that

$$\hat{\mathbb{E}}[x_t \mid x_{t-m}, x_{t-m-1}, \ldots, x_1] = \sum_{j=m}^{t-1} L^{-1}_{t,t-j} \varepsilon_{t-j}$$
90.3.1 Implementation
Code that computes solutions to LQ control and filtering problems using the methods de-
scribed here and in Classical Control with Linear Algebra can be found in the file con-
trol_and_filter.py.
Here’s how it looks
In [2]: """
"""
import numpy as np
import scipy.stats as spst
import scipy.linalg as la
class LQFilter:

    def __init__(self, d, h, y_m, r=None, h_eps=None, β=None):
        # Parameters as documented in the Chapter 89 listing above
self.h = h
self.d = np.asarray(d)
self.m = self.d.shape[0] - 1
self.y_m = np.asarray(y_m)
if self.m == self.y_m.shape[0]:
self.y_m = self.y_m.reshape(self.m, 1)
else:
raise ValueError("y_m must be of length m = {self.m:d}")
#---------------------------------------------
        # Define the coefficients of ϕ upfront
#---------------------------------------------
ϕ = np.zeros(2 * self.m + 1)
for i in range(- self.m, self.m + 1):
ϕ[self.m - i] = np.sum(np.diag(self.d.reshape(self.m + 1, 1) \
@ self.d.reshape(1, self.m + 1),
k=-i
)
)
ϕ[self.m] = ϕ[self.m] + self.h
self.ϕ = ϕ
#-----------------------------------------------------
# If r is given calculate the vector _r
#-----------------------------------------------------
if r is None:
pass
else:
self.r = np.asarray(r)
self.k = self.r.shape[0] - 1
ϕ_r = np.zeros(2 * self.k + 1)
for i in range(- self.k, self.k + 1):
                ϕ_r[self.k - i] = np.sum(np.diag(self.r.reshape(self.k + 1, 1) \
                                                 @ self.r.reshape(1, self.k + 1),
                                                 k=-i
                                                 )
                                         )
if h_eps is None:
self.ϕ_r = ϕ_r
else:
ϕ_r[self.k] = ϕ_r[self.k] + h_eps
self.ϕ_r = ϕ_r
#-----------------------------------------------------
# If β is given, define the transformed variables
#-----------------------------------------------------
if β is None:
self.β = 1
else:
self.β = β
self.d = self.β**(np.arange(self.m + 1)/2) * self.d
self.y_m = self.y_m * (self.β**(- np.arange(1, self.m + 1)/2)) \
.reshape(self.m, 1)
    def construct_W_and_Wm(self, N):
        """
        This constructs the matrices W and W_m for a given number of periods N
        """
        m = self.m
d = self.d
W = np.zeros((N + 1, N + 1))
W_m = np.zeros((N + 1, m))
#---------------------------------------
# Terminal conditions
#---------------------------------------
        M = np.zeros((m + 1, m))
        for j in range(m):
for i in range(j + 1, m + 1):
M[i, j] = D_m1[i - j - 1, m]
#----------------------------------------------
# Euler equations for t = 0, 1, ..., N-(m+1)
#----------------------------------------------
ϕ = self.ϕ
for i in range(m):
W_m[N - i, :(m - i)] = ϕ[(m + 1 + i):]
return W, W_m
def roots_of_characteristic(self):
"""
This function calculates z_0 and the 2m roots of the characteristic
equation associated with the Euler equation (1.7)
Note:
------
        numpy.poly1d(roots, True) defines a polynomial using its roots that can
        be evaluated at any point. If x_1, x_2, ... , x_m are the roots then
        p(x) = (x - x_1)(x - x_2)...(x - x_m)
"""
m = self.m
ϕ = self.ϕ
λ = 1 / z_1_to_m
def coeffs_of_c(self):
'''
This function computes the coefficients {c_j, j = 0, 1, ..., m} for
c(z) = sum_{j = 0}^{m} c_j z^j
return c_coeffs[::-1]
def solution(self):
"""
This function calculates {λ_j, j=1,...,m} and {A_j, j=1,...,m}
of the expression (1.15)
"""
λ = self.roots_of_characteristic()[2]
c_0 = self.coeffs_of_c()[-1]
A = np.zeros(self.m, dtype=complex)
for j in range(self.m):
denom = 1 - λ/λ[j]
A[j] = c_0**(-2) / np.prod(denom[np.arange(self.m) != j])
return λ, A
for i in range(N):
for j in range(N):
if abs(i-j) <= self.k:
V[i, j] = ϕ_r[self.k + abs(i-j)]
return V
return d.rvs()
formed
N = np.asarray(a_hist).shape[0] - 1
a_hist = np.asarray(a_hist).reshape(N + 1, 1)
V = self.construct_V(N + 1)
return Ea_hist
deterministic a_t
- if t is given, it solves the combined control prediction problem
(section 7)(by default, t == None -> deterministic)
Note:
------
scipy.linalg.lu normalizes L, U so that L has unit diagonal elements
To make things consistent with the lecture, we need an auxiliary
diagonal matrix D which renormalizes L and U
"""
N = np.asarray(a_hist).shape[0] - 1
W, W_m = self.construct_W_and_Wm(N)
L, U = la.lu(W, permute_l=True)
D = np.diag(1 / np.diag(U))
U = D @ U
L = L @ np.diag(1 / np.diag(D))
J = np.fliplr(np.eye(N + 1))
a_hist = J @ np.asarray(a_hist).reshape(N + 1, 1)
#--------------------------------------------
# Transform the 'a' sequence if β is given
#--------------------------------------------
if self.β != 1:
a_hist = a_hist * (self.β**(np.arange(N + 1) / 2))[::-1] \
.reshape(N + 1, 1)
#--------------------------------------------
# Transform the optimal sequence back if β is given
#--------------------------------------------
if self.β != 1:
y_hist = y_hist * (self.β**(- np.arange(-self.m, N + 1)/2)) \
.reshape(N + 1 + self.m, 1)
90.3.2 Example 1
𝑥𝑡 = (1 − 2𝐿)𝜀𝑡
where 𝜀𝑡 is a serially uncorrelated random process with mean zero and variance unity.
If we were to use the tools associated with infinite dimensional prediction and filtering to be
described below, we would use the Wiener-Kolmogorov formula (21) to compute the linear
least squares forecasts 𝔼[𝑥𝑡+𝑗 ∣ 𝑥𝑡 , 𝑥𝑡−1 , …], for 𝑗 = 1, 2.
But we can do everything we want by instead using our finite dimensional tools and setting
𝑑 = 𝑟, generating an instance of LQFilter, then invoking pertinent methods of LQFilter.
In [3]: m = 1
y_m = np.asarray([.0]).reshape(m, 1)
d = np.asarray([1, -2])
r = np.asarray([1, -2])
h = 0.0
example = LQFilter(d, h, y_m, r=d)
In [4]: example.coeffs_of_c()
In [5]: example.roots_of_characteristic()
Now let’s form the covariance matrix of a time series vector of length 𝑁 and put it in 𝑉 .
Then we’ll take a Cholesky decomposition of 𝑉 = 𝐿−1 𝐿−1 and use it to form the vector of
“moving average representations” 𝑥 = 𝐿−1 𝜀 and the vector of “autoregressive representations”
𝐿𝑥 = 𝜀.
In [6]: V = example.construct_V(N=5)
print(V)
[[ 5. -2. 0. 0. 0.]
[-2. 5. -2. 0. 0.]
[ 0. -2. 5. -2. 0.]
[ 0. 0. -2. 5. -2.]
[ 0. 0. 0. -2. 5.]]
Notice how the lower rows of the “moving average representations” are converging to the ap-
propriate infinite history Wold representation to be described below when we study infinite
horizon-prediction and filtering
In [7]: Li = np.linalg.cholesky(V)
print(Li)
[[ 2.23606798 0. 0. 0. 0. ]
[-0.89442719 2.04939015 0. 0. 0. ]
[ 0. -0.97590007 2.01186954 0. 0. ]
[ 0. 0. -0.99410024 2.00293902 0. ]
[ 0. 0. 0. -0.99853265 2.000733 ]]
Notice how the lower rows of the “autoregressive representations” are converging to the ap-
propriate infinite-history autoregressive representation to be described below when we study
infinite horizon-prediction and filtering
In [8]: L = np.linalg.inv(Li)
print(L)
[[0.4472136 0. 0. 0. 0. ]
[0.19518001 0.48795004 0. 0. 0. ]
[0.09467621 0.23669053 0.49705012 0. 0. ]
[0.04698977 0.11747443 0.2466963 0.49926632 0. ]
[0.02345182 0.05862954 0.12312203 0.24917554 0.49981682]]
90.3.3 Example 2
Consider a stochastic process with moving average representation

$$X_t = (1 - \sqrt{2} L^2)\, \varepsilon_t$$
where 𝜀𝑡 is a serially uncorrelated random process with mean zero and variance unity.
Let’s find a Wold moving average representation for 𝑥𝑡 that will prevail in the infinite-history
context to be studied in detail below.
To do this, we’ll use the Wiener-Kolomogorov formula (21) presented below to compute the
linear least squares forecasts 𝔼̂ [𝑋𝑡+𝑗 ∣ 𝑋𝑡−1 , …] for 𝑗 = 1, 2, 3.
We proceed in the same way as in example 1
In [9]: m = 2
y_m = np.asarray([.0, .0]).reshape(m, 1)
d = np.asarray([1, 0, -np.sqrt(2)])
r = np.asarray([1, 0, -np.sqrt(2)])
h = 0.0
example = LQFilter(d, h, y_m, r=d)
example.coeffs_of_c()
In [10]: example.roots_of_characteristic()
In [11]: V = example.construct_V(N=8)
print(V)
[[ 3. 0. -1.41421356 0. 0. 0.
0. 0. ]
[ 0. 3. 0. -1.41421356 0. 0.
0. 0. ]
[-1.41421356 0. 3. 0. -1.41421356 0.
0. 0. ]
[ 0. -1.41421356 0. 3. 0. -1.41421356
0. 0. ]
[ 0. 0. -1.41421356 0. 3. 0.
-1.41421356 0. ]
[ 0. 0. 0. -1.41421356 0. 3.
0. -1.41421356]
[ 0. 0. 0. 0. -1.41421356 0.
3. 0. ]
[ 0. 0. 0. 0. 0. -1.41421356
0. 3. ]]
In [12]: Li = np.linalg.cholesky(V)
print(Li[-3:, :])
[[ 0. 0. 0. -0.9258201 0. 1.46385011
0. 0. ]
[ 0. 0. 0. 0. -0.96609178 0.
1.43759058 0. ]
[ 0. 0. 0. 0. 0. -0.96609178
0. 1.43759058]]
In [13]: L = np.linalg.inv(Li)
print(L)
[[0.57735027 0. 0. 0. 0. 0.
0. 0. ]
[0. 0.57735027 0. 0. 0. 0.
0. 0. ]
[0.3086067 0. 0.65465367 0. 0. 0.
0. 0. ]
[0. 0.3086067 0. 0.65465367 0. 0.
0. 0. ]
[0.19518001 0. 0.41403934 0. 0.68313005 0.
0. 0. ]
[0. 0.19518001 0. 0.41403934 0. 0.68313005
0. 0. ]
[0.13116517 0. 0.27824334 0. 0.45907809 0.
0.69560834 0. ]
[0. 0.13116517 0. 0.27824334 0. 0.45907809
0. 0.69560834]]
90.3.4 Prediction
It immediately follows from the “orthogonality principle” of least squares (see [11] or [142]
[ch. X]) that
$$\hat{\mathbb{E}}[x_t \mid x_{t-m}, x_{t-m+1}, \ldots, x_1] = \sum_{j=m}^{t-1} L^{-1}_{t,t-j} \varepsilon_{t-j} = \left[ L^{-1}_{t,1}\; L^{-1}_{t,2}, \ldots, L^{-1}_{t,t-m}\; 0\; 0 \ldots 0 \right] L x \tag{7}$$
This formula will be convenient in representing the solution of control problems under uncer-
tainty.
Equation (4) can be recognized as a finite-dimensional version of a moving average representation.

Equation (2) can be viewed as a finite-dimensional version of an autoregressive representation.
Notice that even if the 𝑥𝑡 process is covariance stationary, so that 𝑉 is such that 𝑉𝑖𝑗 depends
only on |𝑖 − 𝑗|, the coefficients in the moving average representation are time-dependent, there
being a different moving average for each 𝑡.
If 𝑥𝑡 is a covariance stationary process, the last row of 𝐿−1 converges to the coefficients in the
Wold moving average representation for {𝑥𝑡 } as 𝑇 → ∞.
Further, if 𝑥𝑡 is covariance stationary, for fixed 𝑘 and 𝑗 > 0, 𝐿−1 −1
𝑇 ,𝑇 −𝑗 converges to 𝐿𝑇 −𝑘,𝑇 −𝑘−𝑗
as 𝑇 → ∞.
That is, the “bottom” rows of 𝐿−1 converge to each other and to the Wold moving average
coefficients as 𝑇 → ∞.
This last observation gives one simple and widely-used practical way of forming a finite 𝑇 ap-
proximation to a Wold moving average representation.
First, form the covariance matrix $\mathbb{E} x x' = V$, then obtain the Cholesky decomposition $L^{-1} L^{-1'}$ of $V$, which can be accomplished quickly on a computer.
The last row of 𝐿−1 gives the approximate Wold moving average coefficients.
This method can readily be generalized to multivariate systems.
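A minimal sketch of this recipe in code (the autocovariance function below encodes Example 1, $x_t = \varepsilon_t - 2\varepsilon_{t-1}$, and is an illustrative assumption):

        import numpy as np

        def wold_ma_coeffs(acov, T=50):
            # V[i, j] = acov(|i - j|) for a covariance stationary process
            V = np.array([[acov(abs(i - j)) for j in range(T)] for i in range(T)])
            Li = np.linalg.cholesky(V)   # V = Li Li', with Li playing the role of L^{-1}
            return Li[-1, ::-1]          # last row, reordered from lag 0 backwards

        acov = lambda k: {0: 5.0, 1: -2.0}.get(k, 0.0)
        print(wold_ma_coeffs(acov)[:4])  # ≈ [2, -1, 0, 0], the Wold coefficients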
Consider the stochastic control problem of maximizing

$$\mathbb{E} \sum_{t=0}^{N} \left\{ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} [d(L) y_t]^2 \right\}, \qquad h > 0$$

The certainty-equivalent solution under a known sequence $\bar{a}$ takes the form

$$U \bar{y} = L^{-1} \bar{a} + K \begin{bmatrix} y_{-1} \\ \vdots \\ y_{-m} \end{bmatrix}$$

When $\bar{a}$ must instead be predicted from its own history, the linear least squares predictor is

$$\hat{\mathbb{E}}[\bar{a} \mid a_s, a_{s-1}, \ldots, a_0] = \tilde{U}^{-1} \begin{bmatrix} 0 & 0 \\ 0 & I_{(s+1)} \end{bmatrix} \tilde{U} \bar{a}$$

so that the solution of the combined control and prediction problem becomes

$$U \bar{y} = L^{-1} \tilde{U}^{-1} \begin{bmatrix} 0 & 0 \\ 0 & I_{(t+1)} \end{bmatrix} \tilde{U} \bar{a} + K \begin{bmatrix} y_{-1} \\ \vdots \\ y_{-m} \end{bmatrix}$$
90.5 Infinite Horizon Prediction and Filtering Problems

We now shift to an infinite-horizon setting. Suppose that

$$Y_t = d(L) u_t \tag{9}$$

where $d(L) = \sum_{j=0}^{m} d_j L^j$, and $u_t$ is a serially uncorrelated stationary random process satisfying

$$\mathbb{E} u_t = 0, \qquad \mathbb{E} u_t u_s = \begin{cases} 1 & \text{if } t = s \\ 0 & \text{otherwise} \end{cases} \tag{10}$$
𝑋𝑡 = 𝑌𝑡 + 𝜀𝑡 (11)
where 𝜀𝑡 is a serially uncorrelated stationary random process with 𝔼𝜀𝑡 = 0 and 𝔼𝜀𝑡 𝜀𝑠 = 0 for
all distinct 𝑡 and 𝑠.
We also assume that 𝔼𝜀𝑡 𝑢𝑠 = 0 for all 𝑡 and 𝑠.
The linear least squares prediction problem is to find the 𝐿2 random variable 𝑋̂ 𝑡+𝑗
among linear combinations of {𝑋𝑡 , 𝑋𝑡−1 , …} that minimizes 𝔼(𝑋̂ 𝑡+𝑗 − 𝑋𝑡+𝑗 )2 .
That is, the problem is to find a $\gamma_j(L) = \sum_{k=0}^{\infty} \gamma_{jk} L^k$ such that $\sum_{k=0}^{\infty} |\gamma_{jk}|^2 < \infty$ and $\mathbb{E}[\gamma_j(L) X_t - X_{t+j}]^2$ is minimized.

The linear least squares filtering problem is to find a $b(L) = \sum_{j=0}^{\infty} b_j L^j$ such that $\sum_{j=0}^{\infty} |b_j|^2 < \infty$ and $\mathbb{E}[b(L) X_t - Y_t]^2$ is minimized.
Interesting versions of these problems related to the permanent income theory were studied
by [120].
Define the covariograms as

$$C_X(\tau) = \mathbb{E} X_t X_{t-\tau}, \quad C_Y(\tau) = \mathbb{E} Y_t Y_{t-\tau}, \quad C_{Y,X}(\tau) = \mathbb{E} Y_t X_{t-\tau}, \qquad \tau = 0, \pm 1, \pm 2, \ldots \tag{12}$$
The covariance and cross-covariance generating functions are defined as

$$g_X(z) = \sum_{\tau=-\infty}^{\infty} C_X(\tau) z^{\tau}, \quad g_Y(z) = \sum_{\tau=-\infty}^{\infty} C_Y(\tau) z^{\tau}, \quad g_{YX}(z) = \sum_{\tau=-\infty}^{\infty} C_{YX}(\tau) z^{\tau} \tag{13}$$
For processes with moving average representations such as

$$y_t = A(L) v_{1t} + B(L) v_{2t}, \qquad x_t = C(L) v_{1t} + D(L) v_{2t}$$

the covariance generating functions can be computed directly. Applying this to the present setting gives

$$g_Y(z) = d(z) d(z^{-1}), \quad g_X(z) = d(z) d(z^{-1}) + h, \quad g_{YX}(z) = d(z) d(z^{-1}) \tag{15}$$
The key step in obtaining solutions to our problems is to factor the covariance generating
function 𝑔𝑋 (𝑧) of 𝑋.
The solutions of our problems are given by formulas due to Wiener and Kolmogorov.
These formulas utilize the Wold moving average representation of the 𝑋𝑡 process,
$$X_t = c(L) \eta_t \tag{16}$$

where $c(L) = \sum_{j=0}^{m} c_j L^j$, with

$$c_0 \eta_t = X_t - \hat{\mathbb{E}}[X_t \mid X_{t-1}, X_{t-2}, \ldots] \tag{17}$$
Condition (17) requires that 𝜂𝑡 lie in the closed linear space spanned by [𝑋𝑡 , 𝑋𝑡−1 , …].
This will be true if and only if the zeros of 𝑐(𝑧) do not lie inside the unit circle.
It is an implication of (17) that 𝜂𝑡 is a serially uncorrelated random process and that normal-
ization can be imposed so that 𝔼𝜂𝑡2 = 1.
Consequently, an implication of (16) is that the covariance generating function of $X_t$ can be expressed as

$$g_X(z) = c(z) c(z^{-1})$$

Therefore, we have already shown constructively how to factor the covariance generating function $g_X(z) = d(z) d(z^{-1}) + h$.
We now introduce the annihilation operator:
$$\left[ \sum_{j=-\infty}^{\infty} f_j L^j \right]_+ \equiv \sum_{j=0}^{\infty} f_j L^j \tag{20}$$

In words, $[\,\cdot\,]_+$ eliminates negative powers of $L$. The Wiener-Kolmogorov formula for the prediction problem is

$$\gamma_j(L) = \left[ \frac{c(L)}{L^j} \right]_+ c(L)^{-1} \tag{21}$$
We have defined the solution of the filtering problem as $\hat{\mathbb{E}}[Y_t \mid X_t, X_{t-1}, \ldots] = b(L) X_t$.

The Wiener-Kolmogorov formula for $b(L)$ is

$$b(L) = \left[ \frac{g_{YX}(L)}{c(L^{-1})} \right]_+ c(L)^{-1}$$
or
$$b(L) = \left[ \frac{d(L) d(L^{-1})}{c(L^{-1})} \right]_+ c(L)^{-1} \tag{22}$$
Formulas (21) and (22) are discussed in detail in [164] and [142].
The interested reader can find several examples of the use of these formulas in economics there. Some classic examples using these formulas are due to [120].
As an example of the usefulness of formula (22), we let 𝑋𝑡 be a stochastic process with Wold
moving average representation
𝑋𝑡 = 𝑐(𝐿)𝜂𝑡
Suppose that at time $t$, we wish to predict a geometric sum of future $X$'s, namely

$$y_t \equiv \sum_{j=0}^{\infty} \delta^j X_{t+j} = \frac{1}{1 - \delta L^{-1}} X_t$$

given knowledge of $X_t, X_{t-1}, \ldots$. Applying the Wiener-Kolmogorov formula gives

$$b(L) = \left[ \frac{c(L)}{1 - \delta L^{-1}} \right]_+ c(L)^{-1} \tag{23}$$
In order to evaluate the term in the annihilation operator, we use the following result from
[74].
Proposition Let
• $g(z) = \sum_{j=0}^{\infty} g_j z^j$ where $\sum_{j=0}^{\infty} |g_j|^2 < +\infty$.
• $h(z^{-1}) = (1 - \delta_1 z^{-1}) \cdots (1 - \delta_n z^{-1})$, where $|\delta_j| < 1$ for $j = 1, \ldots, n$.

Then

$$\left[ \frac{g(z)}{h(z^{-1})} \right]_+ = \frac{g(z)}{h(z^{-1})} - \sum_{j=1}^{n} \frac{\delta_j g(\delta_j)}{\prod_{k=1,\, k \neq j}^{n} (\delta_j - \delta_k)} \left( \frac{1}{z - \delta_j} \right) \tag{24}$$
and, alternatively,
$$\left[ \frac{g(z)}{h(z^{-1})} \right]_+ = \sum_{j=1}^{n} B_j \left( \frac{z g(z) - \delta_j g(\delta_j)}{z - \delta_j} \right) \tag{25}$$

where $B_j = 1 / \prod_{k=1,\, k \neq j}^{n} (1 - \delta_k / \delta_j)$.
Applying formula (25) of the proposition to evaluating (23) with 𝑔(𝑧) = 𝑐(𝑧) and ℎ(𝑧 −1 ) =
1 − 𝛿𝑧 −1 gives
$$b(L) = \left[ \frac{L c(L) - \delta c(\delta)}{L - \delta} \right] c(L)^{-1}$$
or
$$b(L) = \frac{1 - \delta c(\delta) L^{-1} c(L)^{-1}}{1 - \delta L^{-1}}$$
Thus, we have
$$\hat{\mathbb{E}} \left[ \sum_{j=0}^{\infty} \delta^j X_{t+j} \;\Big|\; X_t, X_{t-1}, \ldots \right] = \left[ \frac{1 - \delta c(\delta) L^{-1} c(L)^{-1}}{1 - \delta L^{-1}} \right] X_t \tag{26}$$
This formula is useful in solving stochastic versions of problem 1 of lecture Classical Control
with Linear Algebra in which the randomness emerges because {𝑎𝑡 } is a stochastic process.
The problem is to maximize
$$\mathbb{E}_0 \lim_{N \to \infty} \sum_{t=0}^{N} \beta^t \left[ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} [d(L) y_t]^2 \right] \tag{27}$$
$$a_t = c(L) \eta_t$$

where

$$c(L) = \sum_{j=0}^{\tilde{n}} c_j L^j$$

and

$$\eta_t = a_t - \hat{\mathbb{E}}[a_t \mid a_{t-1}, \ldots]$$
The problem is to maximize (27) with respect to a contingency plan expressing 𝑦𝑡 as a func-
tion of information known at 𝑡, which is assumed to be (𝑦𝑡−1 , 𝑦𝑡−2 , … , 𝑎𝑡 , 𝑎𝑡−1 , …).
The solution of this problem can be achieved in two steps.
First, ignoring the uncertainty, we can solve the problem assuming that {𝑎𝑡 } is a known se-
quence.
The solution is, from above,

$$(1 - \lambda_1 L) \cdots (1 - \lambda_m L) y_t = \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} (\lambda_j \beta)^k a_{t+k} \tag{28}$$
Second, the solution of the problem under uncertainty is obtained by replacing the terms on
the right-hand side of the above expressions with their linear least squares predictors.
$$(1 - \lambda_1 L) \cdots (1 - \lambda_m L) y_t = \sum_{j=1}^{m} A_j \left[ \frac{1 - \beta \lambda_j c(\beta \lambda_j) L^{-1} c(L)^{-1}}{1 - \beta \lambda_j L^{-1}} \right] a_t$$
Blaschke factors
The following is a useful piece of mathematics underlying “root flipping”.
Let $\pi(z) = \sum_{j=0}^{m} \pi_j z^j$ and let $z_1, \ldots, z_k$ be the zeros of $\pi(z)$ that are inside the unit circle, $k < m$.

Then define

$$\theta(z) = \pi(z) \left( \frac{z_1 z - 1}{z - z_1} \right) \left( \frac{z_2 z - 1}{z - z_2} \right) \cdots \left( \frac{z_k z - 1}{z - z_k} \right)$$

It can be verified directly that $\theta(z^{-1}) \theta(z) = \pi(z^{-1}) \pi(z)$ and that the zeros of $\theta(z)$ are not inside the unit circle.
90.6 Exercises
90.6.1 Exercise 1
Let 𝑌𝑡 = (1 − 2𝐿)𝑢𝑡 where 𝑢𝑡 is a mean zero white noise with 𝔼𝑢2𝑡 = 1. Let
𝑋𝑡 = 𝑌𝑡 + 𝜀𝑡
where 𝜀𝑡 is a serially uncorrelated white noise with 𝔼𝜀2𝑡 = 9, and 𝔼𝜀𝑡 𝑢𝑠 = 0 for all 𝑡 and 𝑠.
Find the Wold moving average representation for 𝑋𝑡 .
Find a formula for the $A_{1j}$'s in

$$\hat{\mathbb{E}}[X_{t+1} \mid X_t, X_{t-1}, \ldots] = \sum_{j=0}^{\infty} A_{1j} X_{t-j}$$

Find a formula for the $A_{2j}$'s in

$$\hat{\mathbb{E}}[X_{t+2} \mid X_t, X_{t-1}, \ldots] = \sum_{j=0}^{\infty} A_{2j} X_{t-j}$$
90.6.2 Exercise 2
Let

$$Y_t = D(L) U_t$$

where $D(L) = \sum_{j=0}^{m} D_j L^j$, $D_j$ an $n \times n$ matrix, $U_t$ an $(n \times 1)$ vector white noise with $\mathbb{E} U_t = 0$ for all $t$, $\mathbb{E} U_t U_s' = 0$ for all $s \neq t$, and $\mathbb{E} U_t U_t' = I$ for all $t$.
Let 𝜀𝑡 be an 𝑛 × 1 vector white noise with mean 0 and contemporaneous covariance matrix 𝐻,
where 𝐻 is a positive definite matrix.
Let 𝑋𝑡 = 𝑌𝑡 + 𝜀𝑡 .
Define the covariograms as $C_X(\tau) = \mathbb{E} X_t X'_{t-\tau}$, $C_Y(\tau) = \mathbb{E} Y_t Y'_{t-\tau}$, $C_{YX}(\tau) = \mathbb{E} Y_t X'_{t-\tau}$.
Then define the matrix covariance generating function, as in (21), only interpret all the ob-
jects in (21) as matrices.
Show that the covariance generating functions are given by

$$g_Y(z) = D(z) D(z^{-1})', \quad g_X(z) = D(z) D(z^{-1})' + H, \quad g_{YX}(z) = D(z) D(z^{-1})'$$

A factorization of $g_X(z)$ can be found of the form

$$D(z) D(z^{-1})' + H = C(z) C(z^{-1})', \qquad C(z) = \sum_{j=0}^{m} C_j z^j$$
where the zeros of |𝐶(𝑧)| do not lie inside the unit circle.
A vector Wold moving average representation of 𝑋𝑡 is then
$$X_t = C(L) \eta_t$$

where $\eta_t$ is an $(n \times 1)$ vector white noise. The optimal predictor is

$$\hat{\mathbb{E}}[X_{t+j} \mid X_t, X_{t-1}, \ldots] = \left[ \frac{C(L)}{L^j} \right]_+ \eta_t$$

If $C(L)$ is invertible, i.e., if the zeros of $\det C(z)$ lie strictly outside the unit circle, then this formula can be written

$$\hat{\mathbb{E}}[X_{t+j} \mid X_t, X_{t-1}, \ldots] = \left[ \frac{C(L)}{L^j} \right]_+ C(L)^{-1} X_t$$
Part XIII
Chapter 91

Asset Pricing I: Finite State Models
91.1 Contents
• Overview 91.2
• Pricing Models 91.3
• Prices in the Risk-Neutral Case 91.4
• Asset Prices under Risk Aversion 91.5
• Exercises 91.6
• Solutions 91.7
“A little knowledge of geometric series goes a long way” – Robert E. Lucas, Jr.
In addition to what’s in Anaconda, this lecture will need the following libraries:
91.2 Overview
What happens if for some reason traders discount payouts differently depending on the state
of the world?
Michael Harrison and David Kreps [81] and Lars Peter Hansen and Scott Richard [73] showed that in quite general settings the price of an ex-dividend asset obeys

$$p_t = \mathbb{E}_t [m_{t+1} (d_{t+1} + p_{t+1})] \tag{2}$$

for some stochastic discount factor $m_{t+1}$.
We give examples of how the stochastic discount factor has been modeled below.
Recall that, from the definition of a conditional covariance $\mathrm{cov}_t(x_{t+1}, y_{t+1})$, we have

$$\mathbb{E}_t (x_{t+1} y_{t+1}) = \mathrm{cov}_t (x_{t+1}, y_{t+1}) + \mathbb{E}_t x_{t+1} \, \mathbb{E}_t y_{t+1} \tag{3}$$
Aside from prices, another quantity of interest is the price-dividend ratio 𝑣𝑡 ∶= 𝑝𝑡 /𝑑𝑡 .
Let’s write down an expression that this ratio should satisfy.
We can divide both sides of (2) by 𝑑𝑡 to get
$$v_t = \mathbb{E}_t \left[ m_{t+1} \frac{d_{t+1}}{d_t} (1 + v_{t+1}) \right] \tag{5}$$
What can we say about price dynamics on the basis of the models described above?
The answer to this question depends on the process we specify for dividends and on the stochastic discount factor (and how it correlates with dividends).
For now let’s focus on the risk-neutral case, where the stochastic discount factor is constant,
and study how prices depend on the dividend process.
The simplest case is risk-neutral pricing in the face of a constant, non-random dividend
stream 𝑑𝑡 = 𝑑 > 0.
Removing the expectation from (1) and iterating forward gives

$$\begin{aligned} p_t &= \beta (d + p_{t+1}) \\ &= \beta (d + \beta (d + p_{t+2})) \\ &\;\;\vdots \\ &= \beta (d + \beta d + \beta^2 d + \cdots + \beta^{k-2} d + \beta^{k-1} p_{t+k}) \end{aligned}$$

If $\lim_{k \to +\infty} \beta^{k-1} p_{t+k} = 0$, this sequence converges to the constant price

$$\bar{p} := \frac{\beta d}{1 - \beta} \tag{6}$$
Consider a growing, non-random dividend process $d_{t+1} = g d_t$ where $0 < g\beta < 1$.

While prices are not usually constant when dividends grow over time, the price-dividend ratio might be.
If we guess this, substituting $v_t = v$ into (5) as well as our other assumptions, we get $v = \beta g (1 + v)$.

Since $\beta g < 1$, we have a unique positive solution:

$$v = \frac{\beta g}{1 - \beta g}$$

The price is then

$$p_t = \frac{\beta g}{1 - \beta g} d_t$$
If, in this example, we take 𝑔 = 1 + 𝜅 and let 𝜌 ∶= 1/𝛽 − 1, then the price becomes
$$p_t = \frac{1 + \kappa}{\rho - \kappa} d_t$$
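A quick numerical check of this fixed-point logic (the parameter values below are illustrative assumptions):

        β, g = 0.96, 1.02           # discount factor and gross dividend growth, βg < 1
        v = 0.0
        for _ in range(2000):       # iterate v <- βg(1 + v) to its fixed point
            v = β * g * (1 + v)
        print(v, β * g / (1 - β * g))   # both ≈ 47.08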
Consider a dividend process whose growth rate is Markovian:

$$g_t = g(X_t), \qquad t = 1, 2, \ldots$$

where

1. $\{X_t\}$ is a finite Markov chain with state space $S$ and transition probabilities $P(x, y) := \mathbb{P}\{X_{t+1} = y \mid X_t = x\}$, and
2. $g$ is a given function on $S$ taking positive values.
Pricing
To obtain asset prices in this setting, let’s adapt our analysis from the case of deterministic
growth.
In that case, we found that 𝑣 is constant.
This encourages us to guess that, in the current case, 𝑣𝑡 is constant given the state 𝑋𝑡 .
In other words, we are looking for a fixed function 𝑣 such that the price-dividend ratio satis-
fies 𝑣𝑡 = 𝑣(𝑋𝑡 ).
We can substitute this guess into (5) to get

$$v(X_t) = \beta \, \mathbb{E}_t [g(X_{t+1}) (1 + v(X_{t+1}))]$$

or, conditioning on $X_t = x$ and letting $K(x, y) := g(y) P(x, y)$,

$$v(x) = \beta \sum_{y \in S} K(x, y) (1 + v(y))$$

In vector form, this is

$$v = \beta K (\mathbb{1} + v) \tag{9}$$
Here
• 𝑣 is understood to be the column vector (𝑣(𝑥1 ), … , 𝑣(𝑥𝑛 ))′ .
• 𝐾 is the matrix (𝐾(𝑥𝑖 , 𝑥𝑗 ))1≤𝑖,𝑗≤𝑛 .
• 𝟙 is a column vector of ones.
When does (9) have a unique solution?
From the Neumann series lemma and Gelfand’s formula, this will be the case if 𝛽𝐾 has spec-
tral radius strictly less than one.
In other words, we require that the eigenvalues of $K$ be strictly less than $\beta^{-1}$ in modulus.

The solution is then

$$v = (I - \beta K)^{-1} \beta K \mathbb{1} \tag{10}$$
91.4.4 Code
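The snippet below presupposes a MarkovChain instance mc, a discount factor β, and a linear solver solve. A minimal self-contained setup (the chain and parameter values here are illustrative assumptions, not the lecture's calibration) is:

        import numpy as np
        import quantecon as qe
        from numpy.linalg import solve

        n = 3                                   # number of states (assumed)
        P = np.array([[0.7, 0.2, 0.1],          # an illustrative transition matrix
                      [0.2, 0.6, 0.2],
                      [0.1, 0.2, 0.7]])
        mc = qe.MarkovChain(P, state_values=np.array([-0.02, 0.0, 0.02]))
        β = 0.96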
K = mc.P * np.exp(mc.state_values)
I = np.identity(n)
v = solve(I - β * K, β * K @ np.ones(n))
Now let’s turn to the case where agents are risk averse.
We’ll price several distinct assets, including
• The price of an endowment stream
• A consol (a type of bond issued by the UK government in the 19th century)
• Call options on a consol
Let’s start with a version of the celebrated asset pricing model of Robert E. Lucas, Jr. [109].
As in [109], suppose that the stochastic discount factor takes the form
$$m_{t+1} = \beta \frac{u'(c_{t+1})}{u'(c_t)} \tag{11}$$

where $u$ is a concave utility function and $c_t$ is time $t$ consumption of a representative consumer. For now we'll assume the CRRA specification

$$u(c) = \frac{c^{1-\gamma}}{1 - \gamma} \quad \text{with } \gamma > 0 \tag{12}$$

When the representative consumer consumes the dividend ($c_t = d_t$), the stochastic discount factor becomes

$$m_{t+1} = \beta \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} = \beta g_{t+1}^{-\gamma} \tag{13}$$
If we let $J(x, y) := g(y)^{1-\gamma} P(x, y)$, then the price-dividend ratio in this risk-averse setting satisfies

$$v = \beta J (\mathbb{1} + v)$$
Assuming that the spectral radius of 𝐽 is strictly less than 𝛽 −1 , this equation has the unique
solution
𝑣 = (𝐼 − 𝛽𝐽 )−1 𝛽𝐽 𝟙 (14)
We will define a function tree_price to solve for $v$ given parameters stored in the class AssetPriceModel (shown below in a minimal form; the default Markov chain supplied when none is given is an illustrative assumption):

        class AssetPriceModel:
            """
            A class that stores the primitives of the asset pricing model.

            Parameters
            ----------
            β : scalar, float
                Discount factor
            mc : MarkovChain
                Contains the transition matrix and set of state values for the state
                process
            γ : scalar(float)
                Coefficient of risk aversion
            g : callable
                The function mapping states to growth rates
            """
            def __init__(self, β=0.96, mc=None, γ=2.0, g=np.exp):
                self.β, self.γ = β, γ
                self.g = g

                # Default process for the Markov chain (an assumed stand-in)
                if mc is None:
                    P = np.array([[0.9, 0.1],
                                  [0.1, 0.9]])
                    self.mc = qe.MarkovChain(P, state_values=np.array([-0.02, 0.02]))
                else:
                    self.mc = mc

                self.n = self.mc.P.shape[0]
def tree_price(ap):
"""
Computes the price-dividend ratio of the Lucas tree.
Parameters
----------
ap: AssetPriceModel
An instance of AssetPriceModel containing primitives
Returns
-------
v : array_like(float)
Lucas tree price-dividend ratio
"""
# Simplify names, set up matrices
β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
J = P * ap.g(y)**(1 - γ)
# Compute v
I = np.identity(ap.n)
Ones = np.ones(ap.n)
v = solve(I - β * J, β * J @ Ones)
return v
Here's a plot of $v$ as a function of the state for several values of $\gamma$, with a positively correlated Markov process and $g(x) = \exp(x)$ (the plotting scaffolding below is a minimal reconstruction):

        γs = [1.2, 1.4, 1.6, 1.8, 2.0]
        ap = AssetPriceModel()
        states = ap.mc.state_values

        fig, ax = plt.subplots()

        for γ in γs:
            ap.γ = γ
            v = tree_price(ap)
            ax.plot(states, v, lw=2, alpha=0.6, label=rf"$\gamma = {γ}$")

        ax.set_xlabel("state")
        ax.set_ylabel("price-dividend ratio")
        ax.legend(loc='upper right')
        plt.show()

Notice that $v$ is decreasing in each case.
In the stochastic discount factor (13), higher growth decreases the discount factor, lowering
the weight placed on future returns.
Special Cases
In the special case $\gamma = 1$, we have $J = P$. Recalling that $P^i \mathbb{1} = \mathbb{1}$ for all $i$, it follows that

$$v = \beta (I - \beta P)^{-1} \mathbb{1} = \beta \sum_{i=0}^{\infty} \beta^i P^i \mathbb{1} = \frac{\beta}{1 - \beta} \mathbb{1}$$
Thus, with log preferences, the price-dividend ratio for a Lucas tree is constant.
Alternatively, if 𝛾 = 0, then 𝐽 = 𝐾 and we recover the risk-neutral solution (10).
This is as expected, since 𝛾 = 0 implies 𝑢(𝑐) = 𝑐 (and hence agents are risk-neutral).
The price of a consol that pays a constant coupon $\zeta$ satisfies

$$p_t = \mathbb{E}_t [m_{t+1} (\zeta + p_{t+1})]$$

With the stochastic discount factor (13), this becomes

$$p_t = \mathbb{E}_t [\beta g_{t+1}^{-\gamma} (\zeta + p_{t+1})] \tag{15}$$

Letting $M(x, y) = P(x, y) g(y)^{-\gamma}$ and rewriting in vector notation yields the solution

$$p = (I - \beta M)^{-1} \beta M \zeta \mathbb{1} \tag{16}$$
The function consol_price computes these prices (its header is a minimal reconstruction):

        def consol_price(ap, ζ):
            """
            Computes price of a consol bond with payoff ζ

            Parameters
            ----------
            ap: AssetPriceModel
                An instance of AssetPriceModel containing primitives
            ζ : scalar(float)
                Coupon of the consol

            Returns
            -------
            p : array_like(float)
                Consol bond prices
            """
            # Simplify names, set up matrices
            β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
            M = P * ap.g(y)**(- γ)

            # Compute price
            I = np.identity(ap.n)
            Ones = np.ones(ap.n)
            p = solve(I - β * M, β * ζ * M @ Ones)

            return p
Let's now price options of varying maturity that give the right to purchase a consol at a price $p_S$.

At each period, the owner of such an option has two choices:

1. Exercise the option now, purchasing the consol at the strike price $p_S$, or
2. Not exercise the option now but retain the right to exercise it later.

Thus, the owner either exercises the option now or chooses not to exercise and waits until next period.
This is termed an infinite-horizon call option with strike price 𝑝𝑆 .
The owner of the option is entitled to purchase the consol at the price 𝑝𝑆 at the beginning of
any period, after the coupon has been paid to the previous owner of the bond.
The fundamentals of the economy are identical with the one above, including the stochastic
discount factor and the process for consumption.
Let 𝑤(𝑋𝑡 , 𝑝𝑆 ) be the value of the option when the time 𝑡 growth state is known to be 𝑋𝑡 but
before the owner has decided whether or not to exercise the option at time 𝑡 (i.e., today).
Recalling that 𝑝(𝑋𝑡 ) is the value of the consol when the initial growth state is 𝑋𝑡 , the value
of the option satisfies
$$w(X_t, p_S) = \max \left\{ \beta \, \mathbb{E}_t \frac{u'(c_{t+1})}{u'(c_t)} w(X_{t+1}, p_S), \; p(X_t) - p_S \right\}$$
The first term on the right is the value of waiting, while the second is the value of exercising
now.
We can also write this as

$$w(x, p_S) = \max \left\{ \beta \sum_{y \in S} P(x, y) g(y)^{-\gamma} w(y, p_S), \; p(x) - p_S \right\} \tag{17}$$

With $M(x, y) = P(x, y) g(y)^{-\gamma}$ and $w$ as the vector of values $(w(x_i, p_S))_{i=1}^{n}$, we can express (17) as the nonlinear vector equation

$$w = \max\{\beta M w, \; p - p_S \mathbb{1}\} \tag{18}$$
To solve (18), form the operator $T$ mapping vector $w$ into vector $Tw$ via

$$T w = \max\{\beta M w, \; p - p_S \mathbb{1}\}$$

Start at some initial $w$ and iterate to convergence with $T$. The function below does this (its iteration loop is a minimal reconstruction):

        def call_option(ap, ζ, p_s, ϵ=1e-8):
            """
            Computes price of a call option on a consol bond.

            Parameters
            ----------
            ap: AssetPriceModel
                An instance of AssetPriceModel containing primitives
            ζ : scalar(float)
                Coupon of the consol
            p_s : scalar(float)
                Strike price
            ϵ : scalar(float), optional(default=1e-8)
                Tolerance for infinite horizon problem

            Returns
            -------
            w : array_like(float)
                Infinite horizon call option prices
            """
            # Simplify names, set up matrices
            β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
            M = P * ap.g(y)**(- γ)

            # Consol prices
            p = consol_price(ap, ζ)

            # Iterate to convergence on w = max{βMw, p - p_s}
            w = np.zeros(ap.n)
            error = ϵ + 1
            while error > ϵ:
                # Maximize across columns
                w_new = np.maximum(β * M @ w, p - p_s)
                # Find maximal difference of each component and update
                error = np.amax(np.abs(w - w_new))
                w = w_new

            return w
In [9]: ap = AssetPriceModel(β=0.9)
ζ = 1.0
strike_price = 40
x = ap.mc.state_values
p = consol_price(ap, ζ)
w = call_option(ap, ζ, strike_price)
As before, the stochastic discount factor is $m_{t+1} = \beta g_{t+1}^{-\gamma}$.

It follows that the reciprocal $R_t^{-1}$ of the gross risk-free interest rate $R_t$ in state $x$ is

$$\mathbb{E}_t m_{t+1} = \beta \sum_{y \in S} P(x, y) g(y)^{-\gamma}$$

or, in vector form,

$$m_1 = \beta M \mathbb{1}$$
where the 𝑖-th element of 𝑚1 is the reciprocal of the one-period gross risk-free interest rate in
state 𝑥𝑖 .
Other Terms
Let 𝑚𝑗 be an 𝑛 × 1 vector whose 𝑖 th component is the reciprocal of the 𝑗 -period gross risk-
free interest rate in state 𝑥𝑖 .
Then $m_1 = \beta M \mathbb{1}$, and $m_{j+1} = M m_j$ for $j \geq 1$.
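A sketch of computing the implied term structure from these recursions, assuming an AssetPriceModel instance ap as constructed above:

        # Reciprocals of j-period gross risk-free rates: m_1 = βM1, m_{j+1} = M m_j
        β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
        M = P * ap.g(y)**(-γ)

        m = β * M @ np.ones(ap.n)       # m_1: one-period reciprocal gross rates
        for j in range(1, 4):
            print(f"j = {j}, gross risk-free rates by state:", 1 / m)
            m = M @ m                   # m_{j+1} = M m_j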
91.6 Exercises
91.6.1 Exercise 1

In the lecture, we considered ex-dividend assets. A cum-dividend asset comes with the right to the current dividend at the time of purchase. Derive the price of a cum-dividend asset under risk-neutral pricing when (a) the dividend stream is constant and non-random, $d_t = d > 0$, and (b) the dividend stream grows deterministically, $d_{t+1} = g d_t$ with $0 < g\beta < 1$.
91.6.2 Exercise 2

Consider the following primitives, where the state is interpreted as the growth rate itself, so that $g(x) = x$. Compute the price of the Lucas tree; then do the same for the price of the risk-free consol with coupon $\zeta = 1$.

In [10]: n = 5
P = 0.0125 * np.ones((n, n))
P += np.diag(0.95 - 0.0125 * np.ones(5))
# State values of the Markov chain
s = np.array([0.95, 0.975, 1.0, 1.025, 1.05])
γ = 2.0
β = 0.94
91.6.3 Exercise 3
Let’s consider finite horizon call options, which are more common than the infinite horizon
variety.
Finite horizon options obey functional equations closely related to (17).
A 𝑘 period option expires after 𝑘 periods.
If we view today as date zero, a 𝑘 period option gives the owner the right to exercise the op-
tion to purchase the risk-free consol at the strike price 𝑝𝑆 at dates 0, 1, … , 𝑘 − 1.
The option expires at time 𝑘.
Thus, for $k = 1, 2, \ldots$, let $w(x, k)$ be the value of a $k$-period option.

It obeys

$$w(x, k) = \max \left\{ \beta \sum_{y \in S} P(x, y) g(y)^{-\gamma} w(y, k-1), \; p(x) - p_S \right\}$$

where $w(x, 0) = 0$ for all $x$.
91.7 Solutions
91.7.1 Exercise 1

For a cum-dividend asset, the basic risk-neutral asset pricing equation is

$$p_t = d_t + \beta \mathbb{E}_t [p_{t+1}]$$

With a constant, non-random dividend stream $d_t = d$, iterating forward gives

$$p_t = \frac{1}{1 - \beta} d_t$$

With a growing, non-random dividend process $d_{t+1} = g d_t$, the analogous argument gives

$$p_t = \frac{1}{1 - \beta g} d_t$$
91.7.2 Exercise 2
In [11]: n = 5
         P = 0.0125 * np.ones((n, n))
         P += np.diag(0.95 - 0.0125 * np.ones(5))
         s = np.array([0.95, 0.975, 1.0, 1.025, 1.05])  # State values
         mc = qe.MarkovChain(P, state_values=s)

         γ = 2.0
         β = 0.94
         ζ = 1.0
         p_s = 150.0

In [12]: apm = AssetPriceModel(β=β, mc=mc, γ=γ, g=lambda x: x)
In [13]: tree_price(apm)
In [14]: consol_price(apm, ζ)
91.7.3 Exercise 3

The following function computes the value of a $k$-period option by iterating $k$ times from $w(x, 0) = 0$ (a minimal sketch consistent with the recursion above):

         def finite_horizon_call_option(ap, ζ, p_s, k):
             """
             Computes k-period option value.
             """
             # Simplify names, set up matrices
             β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
             M = P * ap.g(y)**(- γ)

             # Consol prices
             p = consol_price(ap, ζ)

             # Compute option price by iterating k times from w(x, 0) = 0
             w = np.zeros(ap.n)
             for _ in range(k):
                 # Maximize across columns
                 w = np.maximum(β * M @ w, p - p_s)

             return w
Chapter 92

Asset Pricing II: The Lucas Asset Pricing Model

92.1 Contents
• Overview 92.2
• The Lucas Model 92.3
• Exercises 92.4
• Solutions 92.5
In addition to what’s in Anaconda, this lecture will need the following libraries:
92.2 Overview
Lucas studied a pure exchange economy with a representative consumer (or household), where
• Pure exchange means that all endowments are exogenous.
• Representative consumer means that either
– there is a single consumer (sometimes also referred to as a household), or
– all consumers have identical endowments and preferences
Either way, the assumption of a representative agent means that prices adjust to eradicate
desires to trade.
This makes it very easy to compute competitive equilibrium prices.
Assets
There is a single "productive unit" that costlessly generates a sequence of consumption goods $\{y_t\}_{t=0}^{\infty}$.

We will assume that this endowment is Markovian, following the exogenous process

$$y_{t+1} = G(y_t, z_{t+1})$$

where $\{z_t\}$ is an IID shock sequence with known distribution $\phi$.
Consumers
A representative consumer ranks consumption streams {𝑐𝑡 } according to the time separable
utility functional
$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t) \tag{1}$$
Here $\beta \in (0, 1)$ is a fixed discount factor, and $u$ is a strictly increasing, strictly concave, continuously differentiable period utility function.
The consumer faces the budget constraint

$$c_t + \pi_{t+1} p_t \leq \pi_t y_t + \pi_t p_t \tag{2}$$

where $\pi_t$ is the consumer's share of the "tree" at the start of $t$ and $p_t$ is its price.
The Bellman equation for the consumer's problem is

$$v(\pi, y) = \max_{c, \pi'} \left\{ u(c) + \beta \int v(\pi', G(y, z)) \phi(dz) \right\}$$

subject to
subject to
We can invoke the fact that utility is increasing to claim equality in (2) and hence eliminate
the constraint, obtaining
$$v(\pi, y) = \max_{\pi'} \left\{ u[\pi (y + p(y)) - \pi' p(y)] + \beta \int v(\pi', G(y, z)) \phi(dz) \right\} \tag{3}$$
The solution to this dynamic programming problem is an optimal policy expressing either 𝜋′
or 𝑐 as a function of the state (𝜋, 𝑦).
• Each one determines the other, since 𝑐(𝜋, 𝑦) = 𝜋(𝑦 + 𝑝(𝑦)) − 𝜋′ (𝜋, 𝑦)𝑝(𝑦)
Next Steps

It seems that to obtain equilibrium prices, we will have to

1. Solve this two-dimensional dynamic programming problem for the optimal policy.
2. Impose equilibrium constraints.
3. Solve out for the price function $p(y)$ directly.

However, as Lucas showed, there is a related but more straightforward way to do this.
Equilibrium Constraints
Since the consumption good is not storable, in equilibrium we must have 𝑐𝑡 = 𝑦𝑡 for all 𝑡.
In addition, since there is one representative consumer (alternatively, since all consumers are
identical), there should be no trade in equilibrium.
In particular, the representative consumer owns the whole tree in every period, so 𝜋𝑡 = 1 for
all 𝑡.
Prices must adjust to satisfy these two constraints.
Now observe that the first-order condition for (3) can be written as

$$u'(c) p(y) = \beta \int v_1'(\pi', G(y, z)) \phi(dz)$$

where $v_1'$ is the derivative of $v$ with respect to its first argument.

To obtain $v_1'$ we can simply differentiate the right-hand side of (3) with respect to $\pi$, yielding

$$v_1'(\pi, y) = u'(c) (y + p(y))$$

Next, we impose the equilibrium constraints while combining the last two equations to get

$$p(y) = \beta \int \frac{u'[G(y, z)]}{u'(y)} [G(y, z) + p(G(y, z))] \phi(dz) \tag{4}$$
In sequential rather than functional notation, this can also be written as

$$p_t = \mathbb{E}_t \left[ \beta \frac{u'(c_{t+1})}{u'(c_t)} (y_{t+1} + p_{t+1}) \right] \tag{5}$$
Instead of solving for it directly we'll follow Lucas' indirect approach, first setting

$$f(y) := u'(y) p(y) \tag{6}$$

so that (4) becomes

$$f(y) = h(y) + \beta \int f[G(y, z)] \phi(dz) \tag{7}$$

Here $h(y) := \beta \int u'[G(y, z)] G(y, z) \phi(dz)$ is a function that depends only on the primitives.

Equation (7) is a functional equation in $f$.

The plan is to solve out for $f$ and convert back to $p$ via (6).
To solve (7) we’ll use a standard method: convert it to a fixed point problem.
First, we introduce the operator $T$ mapping $f$ into $Tf$ as defined by

$$(Tf)(y) = h(y) + \beta \int f[G(y, z)] \phi(dz) \tag{8}$$

In what follows we will exploit the facts that

1. $T$ is a contraction mapping on a suitable space of continuous bounded functions, so that
2. it has exactly one fixed point $f^*$, which is the unique solution of (7).

(Note: If you find the mathematics heavy going you can take 1–2 as given and skip to the next section)
Recall the Banach contraction mapping theorem.
It tells us that the previous statements will be true if we can find an $\alpha < 1$ such that

$$\| Tf - Tg \| \leq \alpha \| f - g \| \tag{9}$$

where $\|h\| := \sup_{y \geq 0} |h(y)|$. To see that (9) is valid, observe that for any $y$,

$$|Tf(y) - Tg(y)| \leq \beta \int |f[G(y, z)] - g[G(y, z)]| \phi(dz) \leq \beta \int \| f - g \| \phi(dz) = \beta \| f - g \|$$
Since the right-hand side is an upper bound, taking the sup over all $y$ on the left-hand side gives (9) with $\alpha := \beta$.

The preceding discussion tells us that we can compute $f^*$ by picking any arbitrary continuous bounded $f$ and then iterating with $T$.

The equilibrium price function $p^*$ can then be recovered by $p^*(y) = f^*(y) / u'(y)$.
Let’s try this when ln 𝑦𝑡+1 = 𝛼 ln 𝑦𝑡 + 𝜎𝜖𝑡+1 where {𝜖𝑡 } is IID and standard normal.
Utility will take the isoelastic form 𝑢(𝑐) = 𝑐1−𝛾 /(1 − 𝛾), where 𝛾 > 0 is the coefficient of
relative risk aversion.
We will set up a LucasTree class to hold parameters of the model (the grid and shock-draw setup below is a minimal reconstruction):

        from scipy.stats import lognorm   # assumed import

        class LucasTree:
            """
            Class to store parameters of the Lucas tree model.
            """
            def __init__(self,
                         γ=2,            # CRRA utility parameter
                         β=0.95,         # Discount factor
                         α=0.90,         # Correlation coefficient
                         σ=0.1,          # Volatility coefficient
                         grid_size=100):

                self.γ, self.β, self.α, self.σ = γ, β, α, σ

                # Set the grid interval to contain most of the mass of the
                # stationary distribution of the endowment process
                ssd = self.σ / np.sqrt(1 - self.α**2)
                grid_min, grid_max = np.exp(-4 * ssd), np.exp(4 * ssd)
                self.grid = np.linspace(grid_min, grid_max, grid_size)
                self.grid_size = grid_size

                # Set up the shock distribution and Monte Carlo draws
                self.ϕ = lognorm(σ)
                self.draws = self.ϕ.rvs(500)

                # h(y) = β * E[(y^α z)^(1 - γ)], computed by Monte Carlo
                self.h = np.empty(self.grid_size)
                for i, y in enumerate(self.grid):
                    self.h[i] = β * np.mean((y**α * self.draws)**(1 - γ))
The following function takes an instance of the LucasTree and generates a jitted version of
the Lucas operator
"""
Returns approximate Lucas operator, which computes and returns the
updated function Tf on the grid points.
"""
@njit(parallel=parallel_flag)
def T(f):
"""
The Lucas operator
"""
Tf = np.empty_like(f)
1598 CHAPTER 92. ASSET PRICING II: THE LUCAS ASSET PRICING MODEL
return Tf
return T
To solve the model, we write a function that iterates using the Lucas operator to find the
fixed point.
"""
# Simplify notation
grid, grid_size = tree.grid, tree.grid_size
γ = tree.γ
T = operator_factory(tree)
i = 0
f = np.ones_like(grid) # Initial guess of f
error = tol + 1
while error > tol and i < max_iter:
Tf = T(f)
error = np.max(np.abs(Tf - f))
f = Tf
i += 1
return price
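Solving an instance and plotting the resulting price function might look as follows (a usage sketch):

        tree = LucasTree()
        price_vals = solve_model(tree)

        fig, ax = plt.subplots(figsize=(10, 6))
        ax.plot(tree.grid, price_vals, label='$p^*(y)$')
        ax.set_xlabel('$y$')
        ax.set_ylabel('price')
        ax.legend(loc='upper left')
        plt.show()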
We see that the price is increasing, even if we remove all serial correlation from the endow-
ment process.
The reason is that a larger current endowment reduces current marginal utility.
The price must therefore rise to induce the household to consume the entire endowment (and
hence satisfy the resource constraint).
What happens with a more patient consumer?
Here the orange line corresponds to the previous parameters and the green line is price when
𝛽 = 0.98.
We see that when consumers are more patient the asset becomes more valuable, and the price
of the Lucas tree shifts up.
Exercise 1 asks you to replicate this figure.
92.4 Exercises
92.4.1 Exercise 1

Replicate the figure above, which shows how the price function shifts up when consumers become more patient (compare $\beta = 0.95$ with $\beta = 0.98$).
92.5 Solutions
92.5.1 Exercise 1

        fig, ax = plt.subplots(figsize=(10, 6))

        for β in (.95, 0.98):
            tree = LucasTree(β=β)
            grid = tree.grid
            price_vals = solve_model(tree)
            ax.plot(grid, price_vals, lw=2, alpha=0.7, label=rf'$\beta = {β}$')

        ax.legend(loc='upper left')
        ax.set(xlabel='$y$', ylabel='price', xlim=(min(grid), max(grid)))
        plt.show()
Chapter 93

Asset Pricing III: Incomplete Markets
93.1 Contents
• Overview 93.2
• Structure of the Model 93.3
• Solving the Model 93.4
• Exercises 93.5
• Solutions 93.6
In addition to what’s in Anaconda, this lecture will need the following libraries:
93.2 Overview
93.2.1 References
Prior to reading the following, you might like to review our lectures on
• Markov chains
• Asset pricing with finite state space
93.2.2 Bubbles
The model simplifies by ignoring alterations in the distribution of wealth among investors
having different beliefs about the fundamentals that determine asset payouts.
There is a fixed number 𝐴 of shares of an asset.
Each share entitles its owner to a stream of dividends $\{d_t\}$ governed by a Markov chain defined on a state space $S = \{0, 1\}$.
The dividend obeys

$$d_t = \begin{cases} 0 & \text{if } s_t = 0 \\ 1 & \text{if } s_t = 1 \end{cases}$$
The owner of a share at the beginning of time 𝑡 is entitled to the dividend paid at time 𝑡.
The owner of the share at the beginning of time 𝑡 is also entitled to sell the share to another
investor during time 𝑡.
Two types ℎ = 𝑎, 𝑏 of investors differ only in their beliefs about a Markov transition matrix 𝑃
with typical element
𝑃 (𝑖, 𝑗) = ℙ{𝑠𝑡+1 = 𝑗 ∣ 𝑠𝑡 = 𝑖}
The two types' beliefs are

$$P_a = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\[4pt] \frac{2}{3} & \frac{1}{3} \end{bmatrix}, \qquad P_b = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\[4pt] \frac{1}{4} & \frac{3}{4} \end{bmatrix}$$
The stationary (i.e., invariant) distributions of these two matrices can be calculated as follows (the construction of mca and mcb is a minimal reconstruction):

In [3]: qa = np.array([[1/2, 1/2], [2/3, 1/3]])
        qb = np.array([[2/3, 1/3], [1/4, 3/4]])
        mca = qe.MarkovChain(qa)
        mcb = qe.MarkovChain(qb)
        mca.stationary_distributions

In [4]: mcb.stationary_distributions
An owner of the asset at the end of time 𝑡 is entitled to the dividend at time 𝑡 + 1 and also
has the right to sell the asset at time 𝑡 + 1.
Both types of investors are risk-neutral and both have the same fixed discount factor 𝛽 ∈
(0, 1).
In our numerical example, we’ll set 𝛽 = .75, just as Harrison and Kreps did.
We’ll eventually study the consequences of two different assumptions about the number of
shares 𝐴 relative to the resources that our two types of investors can invest in the stock.
1. Both types of investors have enough resources (either wealth or the capacity to borrow) so that they can purchase the entire available stock of the asset (see footnote [1] at the end of this chapter).
2. No single type of investor has sufficient resources to purchase the entire stock.
The above specifications of the perceived transition matrices 𝑃𝑎 and 𝑃𝑏 , taken directly from
Harrison and Kreps, build in stochastically alternating temporary optimism and pessimism.
Remember that state 1 is the high dividend state.
• In state 0, a type 𝑎 agent is more optimistic about next period’s dividend than a type 𝑏
agent.
• In state 1, a type 𝑏 agent is more optimistic about next period’s dividend.
However, the stationary distributions $\pi_a = \begin{bmatrix} .57 & .43 \end{bmatrix}$ and $\pi_b = \begin{bmatrix} .43 & .57 \end{bmatrix}$ tell us that a type $b$ person is more optimistic about the dividend process in the long run than is a type $a$ person.
Transition matrices for the temporarily optimistic and pessimistic investors are constructed as
follows.
Temporarily optimistic investors (i.e., the investor with the most optimistic beliefs in each
state) believe the transition matrix
$$P_o = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\[4pt] \frac{1}{4} & \frac{3}{4} \end{bmatrix}$$

Temporarily pessimistic investors (i.e., the investor with the least optimistic beliefs in each state) believe the transition matrix

$$P_p = \begin{bmatrix} \frac{2}{3} & \frac{1}{3} \\[4pt] \frac{2}{3} & \frac{1}{3} \end{bmatrix}$$
93.3.4 Information
Investors know a price function mapping the state 𝑠𝑡 at 𝑡 into the equilibrium price 𝑝(𝑠𝑡 ) that
prevails in that state.
This price function is endogenous and to be determined below.
When investors choose whether to purchase or sell the asset at 𝑡, they also know 𝑠𝑡 .
We will consider three scenarios:

1. All agents share identical beliefs (a homogeneous beliefs benchmark).
2. There are two types of agents differentiated only by their beliefs. Each type of agent has sufficient resources to purchase all of the asset (Harrison and Kreps's setting).
3. There are two types of agents with different beliefs, but because of limited wealth and/or limited leverage, both types of investors hold the asset each period.
𝑠𝑡 0 1
𝑝𝑎 1.33 1.22
𝑝𝑏 1.45 1.91
𝑝𝑜 1.85 2.08
𝑝𝑝 1 1
𝑝𝑎̂ 1.85 1.69
𝑝𝑏̂ 1.69 2.08
Here
• 𝑝𝑎 is the equilibrium price function under homogeneous beliefs 𝑃𝑎
• 𝑝𝑏 is the equilibrium price function under homogeneous beliefs 𝑃𝑏
• 𝑝𝑜 is the equilibrium price function under heterogeneous beliefs with optimistic marginal
investors
• 𝑝𝑝 is the equilibrium price function under heterogeneous beliefs with pessimistic
marginal investors
• 𝑝𝑎̂ is the amount type 𝑎 investors are willing to pay for the asset
• 𝑝𝑏̂ is the amount type 𝑏 investors are willing to pay for the asset
We’ll explain these values and how they are calculated one row at a time.
Under homogeneous beliefs $P_h$ (with $h = a$ or $h = b$), the asset price satisfies

$$\begin{bmatrix} p_h(0) \\ p_h(1) \end{bmatrix} = \beta [I - \beta P_h]^{-1} P_h \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{1}$$
The first two rows of the table report 𝑝𝑎 (𝑠) and 𝑝𝑏 (𝑠).
Here's a function that can be used to compute these values (a minimal reconstruction):

In [5]: def price_single_beliefs(transition, dividend_payoff, β=.75):
            """
            Function to solve the single-beliefs standard asset pricing problem
            """
            # First compute the inverse piece
            imbq_inv = np.linalg.inv(np.eye(transition.shape[0])
                                     - β * transition)

            # Next compute prices
            prices = β * imbq_inv @ transition @ dividend_payoff

            return prices
These equilibrium prices under homogeneous beliefs are important benchmarks for the subse-
quent analysis.
• 𝑝ℎ (𝑠) tells what investor ℎ thinks is the “fundamental value” of the asset.
• Here “fundamental value” means the expected discounted present value of future divi-
dends.
We will compare these fundamental values of the asset with equilibrium values when traders
have different beliefs.
When each type has sufficient wealth, the asset is priced in each state by whichever type values it most highly, so the equilibrium price function $\bar{p}$ satisfies

$$\bar{p}(s) = \beta \max \left\{ P_a(s, 0) \bar{p}(0) + P_a(s, 1)(1 + \bar{p}(1)), \; P_b(s, 0) \bar{p}(0) + P_b(s, 1)(1 + \bar{p}(1)) \right\} \tag{2}$$
for 𝑠 = 0, 1.
The marginal investor who prices the asset in state $s$ is of type $a$ if

$$P_a(s, 0) \bar{p}(0) + P_a(s, 1)(1 + \bar{p}(1)) > P_b(s, 0) \bar{p}(0) + P_b(s, 1)(1 + \bar{p}(1))$$

The marginal investor is of type $b$ if

$$P_a(s, 0) \bar{p}(0) + P_a(s, 1)(1 + \bar{p}(1)) < P_b(s, 0) \bar{p}(0) + P_b(s, 1)(1 + \bar{p}(1))$$
To find the equilibrium price, iterate to convergence on

$$\bar{p}_{j+1}(s) = \beta \max \left\{ P_a(s, 0) \bar{p}_j(0) + P_a(s, 1)(1 + \bar{p}_j(1)), \; P_b(s, 0) \bar{p}_j(0) + P_b(s, 1)(1 + \bar{p}_j(1)) \right\} \tag{3}$$

for $s = 0, 1$.
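The function price_optimistic_beliefs used in the solutions below iterates on (3); a sketch consistent with that usage (its details are a reconstruction) is:

        def price_optimistic_beliefs(transitions, dividend_payoff, β=.75, tol=1e-16):
            """
            Solve for prices when the optimistic investor is marginal,
            iterating on equation (3)
            """
            p_new = np.array([[0.0], [0.0]])
            p_old = np.array([[10.0], [10.0]])

            while np.abs(p_old - p_new).max() > tol:
                p_old = p_new
                p_new = β * np.max([q @ (p_old + dividend_payoff)
                                    for q in transitions], axis=0)

            # Each type's own (lower) valuation, used to read off
            # willingness-to-pay in the state where it is the pessimist
            ptwiddle = β * np.min([q @ (p_old + dividend_payoff)
                                   for q in transitions], axis=0)

            phat_a = np.array([p_new[0], ptwiddle[1]])
            phat_b = np.array([ptwiddle[0], p_new[1]])

            return p_new, phat_a, phat_b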
The third row of the table reports equilibrium prices that solve the functional equation when
𝛽 = .75.
Here the type that is optimistic about 𝑠𝑡+1 prices the asset in state 𝑠𝑡 .
It is instructive to compare these prices with the equilibrium prices for the homogeneous belief economies that prevail under beliefs $P_a$ and $P_b$.
Equilibrium prices 𝑝̄ in the heterogeneous beliefs economy exceed what any prospective in-
vestor regards as the fundamental value of the asset in each possible state.
Nevertheless, the economy recurrently visits a state that makes each investor want to pur-
chase the asset for more than he believes its future dividends are worth.
The reason is that he expects to have the option to sell the asset later to another investor
who will value the asset more highly than he will.
• Investors of type $a$ are willing to pay the following price for the asset

$$\hat{p}_a(s) = \begin{cases} \bar{p}(0) & \text{if } s_t = 0 \\ \beta (P_a(1, 0) \bar{p}(0) + P_a(1, 1)(1 + \bar{p}(1))) & \text{if } s_t = 1 \end{cases}$$

• Investors of type $b$ are willing to pay the following price for the asset

$$\hat{p}_b(s) = \begin{cases} \beta (P_b(0, 0) \bar{p}(0) + P_b(0, 1)(1 + \bar{p}(1))) & \text{if } s_t = 0 \\ \bar{p}(1) & \text{if } s_t = 1 \end{cases}$$
Outcomes differ when the more optimistic type of investor has insufficient wealth — or insuf-
ficient ability to borrow enough — to hold the entire stock of the asset.
In this case, the asset price must adjust to attract pessimistic investors.
Instead of equation (2), the equilibrium price satisfies
$$\check{p}(s) = \beta \min \left\{ P_a(s, 0) \check{p}(0) + P_a(s, 1)(1 + \check{p}(1)), \; P_b(s, 0) \check{p}(0) + P_b(s, 1)(1 + \check{p}(1)) \right\} \tag{4}$$
and the marginal investor who prices the asset is always the one that values it less highly
than does the other type.
Now the marginal investor is always the (temporarily) pessimistic type.
Notice from the table that the pessimistic price $\check{p}$ is lower than the homogeneous belief prices $p_a$ and $p_b$ in both states.
When pessimistic investors price the asset according to (4), optimistic investors think that
the asset is underpriced.
If they could, optimistic investors would willingly borrow at the one-period gross interest rate
𝛽 −1 to purchase more of the asset.
Implicit constraints on leverage prohibit them from doing so.
When optimistic investors price the asset as in equation (2), pessimistic investors think that
the asset is overpriced and would like to sell the asset short.
Constraints on short sales prevent that.
Here's code to solve for $\check{p}$ using iteration (a minimal reconstruction):

In [6]: def price_pessimistic_beliefs(transitions, dividend_payoff, β=.75,
                                      tol=1e-16):
            """
            Solve for prices when the pessimistic investor is marginal,
            iterating on equation (4)
            """
            p_new = np.array([[0.0], [0.0]])
            p_old = np.array([[10.0], [10.0]])

            while np.abs(p_old - p_new).max() > tol:
                p_old = p_new
                p_new = β * np.min([q @ (p_old + dividend_payoff)
                                    for q in transitions], axis=0)

            return p_new
93.5 Exercises
93.5.1 Exercise 1
Recreate the summary table using the functions we have built above.
𝑠𝑡 0 1
𝑝𝑎 1.33 1.22
𝑝𝑏 1.45 1.91
𝑝𝑜 1.85 2.08
𝑝𝑝 1 1
𝑝𝑎̂ 1.85 1.69
𝑝𝑏̂ 1.69 2.08
You will first need to define the transition matrices and dividend payoff vector.
93.6 Solutions
93.6.1 Exercise 1
First, we will obtain equilibrium price vectors with homogeneous beliefs, including when all investors are optimistic or pessimistic (the driver code below is a minimal reconstruction):

In [7]: qa = np.array([[1/2, 1/2], [2/3, 1/3]])    # Type a transition matrix
        qb = np.array([[2/3, 1/3], [1/4, 3/4]])    # Type b transition matrix
        qopt = np.array([[1/2, 1/2], [1/4, 3/4]])  # Optimistic beliefs
        qpess = np.array([[2/3, 1/3], [2/3, 1/3]]) # Pessimistic beliefs

        dividendreturn = np.array([[0], [1]])

        transitions = [qa, qb, qopt, qpess]
        labels = ['p_a', 'p_b', 'p_optimistic', 'p_pessimistic']

        for transition, label in zip(transitions, labels):
            print(label)
            print("=" * 20)
            s0, s1 = np.round(price_single_beliefs(transition, dividendreturn), 2)
            print(f"State 0: {s0}")
            print(f"State 1: {s1}")
            print("-" * 20)
p_a
====================
State 0: [1.33]
State 1: [1.22]
--------------------
p_b
====================
State 0: [1.45]
State 1: [1.91]
--------------------
p_optimistic
====================
State 0: [1.85]
State 1: [2.08]
--------------------
p_pessimistic
====================
State 0: [1.]
State 1: [1.]
--------------------
We will use the price_optimistic_beliefs function to find the price under heterogeneous beliefs (the driver code is again a minimal reconstruction):

In [8]: opt_beliefs = price_optimistic_beliefs([qa, qb], dividendreturn)
        labels = ['p_optimistic', 'p_hat_a', 'p_hat_b']

        for p, label in zip(opt_beliefs, labels):
            print(label)
            print("=" * 20)
            s0, s1 = np.round(p, 2)
            print(f"State 0: {s0}")
            print(f"State 1: {s1}")
            print("-" * 20)
p_optimistic
====================
State 0: [1.85]
State 1: [2.08]
--------------------
p_hat_a
====================
State 0: [1.85]
State 1: [1.69]
--------------------
p_hat_b
====================
State 0: [1.69]
State 1: [2.08]
--------------------
Notice that the equilibrium price with heterogeneous beliefs equals the price under homogeneous beliefs with optimistic investors. This is because the marginal investor is always the temporarily optimistic type.
Footnotes
[1] By assuming that both types of agents always have “deep enough pockets” to purchase
all of the asset, the model takes wealth dynamics off the table. The Harrison-Kreps model
generates high trading volume when the state changes either from 0 to 1 or from 1 to 0.
Chapter 94

Two Modifications of Mean-Variance Portfolio Theory
94.1 Contents
• Overview 94.2
• Appendix 94.3
Authors: Daniel Csaba, Thomas J. Sargent and Balint Szoke
94.2 Overview
The famous Black-Litterman (1992) [26] portfolio choice model that we describe in this lec-
ture is motivated by the finding that with high or moderate frequency data, means are more
difficult to estimate than variances.
A model of robust portfolio choice that we’ll describe also begins from the same starting
point.
To begin, we'll take for granted that means are more difficult to estimate than covariances and will focus on how Black and Litterman, on the one hand, and robust control theorists, on the other, would recommend modifying the mean-variance portfolio choice model to take that into account.
At the end of this lecture, we shall use some rates of convergence results and some simula-
tions to verify how means are more difficult to estimate than variances.
Among the ideas in play in this lecture will be
• Mean-variance portfolio theory
• Bayesian approaches to estimating linear regressions
• A risk-sensitivity operator and its connection to robust control theory
Let’s start with some imports:
This lecture describes two lines of thought that modify the classic mean-variance portfolio
choice model in ways designed to make its recommendations more plausible.
As we mentioned above, the two approaches build on a common and widespread hunch: because it is much easier statistically to estimate covariances of excess returns than it is to estimate their means, it makes sense to contemplate the consequences of adjusting investors' subjective beliefs about mean returns in order to render more sensible decisions.
Both of the adjustments that we describe are designed to confront a widely recognized em-
barrassment to mean-variance portfolio theory, namely, that it usually implies taking very
extreme long-short portfolio positions.
The excess returns on $N$ risky assets are modeled as

$$\vec{r} - r_f \mathbf{1} \sim \mathcal{N}(\mu, \Sigma)$$

or

$$\vec{r} - r_f \mathbf{1} = \mu + C \epsilon$$

where $\epsilon \sim \mathcal{N}(0, I)$ and $\Sigma = CC'$. A portfolio with weight vector $w$ then has excess return

$$w'(\vec{r} - r_f \mathbf{1}) \sim \mathcal{N}(w' \mu, w' \Sigma w)$$
The mean-variance portfolio choice problem is to choose $w$ to maximize

$$U(\mu, \Sigma; w) = w' \mu - \frac{\delta}{2} w' \Sigma w \tag{1}$$
where 𝛿 > 0 is a risk-aversion parameter. The first-order condition for maximizing (1) with
respect to the vector 𝑤 is
$$\mu = \delta \Sigma w$$

which implies the following design of a risky portfolio:

$$w = (\delta \Sigma)^{-1} \mu \tag{2}$$
The key inputs into the portfolio choice model (2) are
• estimates of the parameters 𝜇, Σ of the random excess return vector(𝑟 ⃗ − 𝑟𝑓 1)
• the risk-aversion parameter 𝛿
A standard way of estimating 𝜇 is maximum-likelihood or least squares; that amounts to es-
timating 𝜇 by a sample mean of excess returns and estimating Σ by a sample covariance ma-
trix.
When estimates of 𝜇 and Σ from historical sample means and covariances have been com-
bined with reasonable values of the risk-aversion parameter 𝛿 to compute an optimal port-
folio from formula (2), a typical outcome has been 𝑤’s with extreme long and short posi-
tions.
A common reaction to these outcomes is that they are so unreasonable that a portfolio man-
ager cannot recommend them to a customer.
In [2]: np.random.seed(12)

        N = 10    # Number of assets
        T = 200   # Sample size

        # Random market portfolio and true moments of excess returns
        # (an assumed setup, mirroring the 2-asset example later in this lecture)
        w_m = np.random.rand(N)
        w_m = w_m / w_m.sum()

        μ = (np.random.randn(N) + 5) / 100
        S = np.random.randn(N, N)
        V = S @ S.T
        Σ = V * (w_m @ μ)**2 / (w_m @ V @ w_m)

        # Generate a sample of excess returns
        excess_return = stat.multivariate_normal(μ, Σ)
        sample = excess_return.rvs(T)

        # Estimate μ and Σ
        μ_est = sample.mean(0).reshape(N, 1)
        Σ_est = np.cov(sample.T)
The Black-Litterman response is to find a vector of mean excess returns $\mu^{BL}$ that makes the market portfolio optimal under rule (2):

$$w_m = (\delta \Sigma)^{-1} \mu^{BL}$$
94.2.6 Details
Let's define

$$w_m' \mu \equiv (r_m - r_f)$$

as the (excess) market return,

$$\sigma^2 = w_m' \Sigma w_m$$

as the variance of the (excess) market return, and

$$\text{SR}_m = \frac{r_m - r_f}{\sigma}$$

as the Sharpe ratio on the market portfolio $w_m$.
Let 𝛿𝑚 be the value of the risk aversion parameter that induces an investor to hold the mar-
ket portfolio in light of the optimal portfolio choice rule (2).
Evidently, portfolio rule (2) then implies that $r_m - r_f = \delta_m \sigma^2$, or

$$\delta_m = \frac{r_m - r_f}{\sigma^2}$$

or

$$\delta_m = \frac{\text{SR}_m}{\sigma}$$
Following the Black-Litterman philosophy, our first step will be to back out a value of $\delta_m$ from

• an estimate of the Sharpe ratio, and
• our maximum likelihood estimate of $\sigma$ drawn from our estimates of $w_m$ and $\Sigma$
The second key Black-Litterman step is then to use this value of $\delta$ together with the maximum likelihood estimate of $\Sigma$ to deduce a $\mu^{BL}$ that verifies portfolio rule (2) at the market portfolio $w = w_m$:

$$\mu_m = \delta_m \Sigma w_m$$
The starting point of the Black-Litterman portfolio choice model is thus a pair (𝛿𝑚 , 𝜇𝑚 ) that
tells the customer to hold the market portfolio.
In [3]: # Observed mean and variance of excess market return, using the
        # sample estimates above (a reconstruction)
        r_m = w_m @ μ_est
        σ_m = w_m @ Σ_est @ w_m

        # Sharpe-ratio
        sr_m = r_m / np.sqrt(σ_m)

        # Implied risk aversion and the Black-Litterman "view" μ_m
        d_m = r_m / σ_m
        μ_m = (d_m * Σ_est @ w_m).reshape(N, 1)
Black and Litterman start with a baseline customer who asserts that he or she shares the
market’s views, which means that he or she believes that excess returns are governed by
$$\vec{r} - r_f \mathbf{1} \sim \mathcal{N}(\mu^{BL}, \Sigma) \tag{3}$$
Black and Litterman would advise that customer to hold the market portfolio of risky securi-
ties.
Black and Litterman then imagine a consumer who would like to express a view that differs
from the market’s.
The consumer wants appropriately to mix his view with the market’s before using (2) to
choose a portfolio.
Suppose that the customer’s view is expressed by a hunch that rather than (3), excess returns
are governed by
𝑟 ⃗ − 𝑟𝑓 1 ∼ 𝒩(𝜇,̂ 𝜏 Σ)
where 𝜏 > 0 is a scalar parameter that determines how the decision maker wants to mix his
view 𝜇̂ with the market’s view 𝜇BL .
Black and Litterman would then use a formula like the following one to mix the views $\hat{\mu}$ and $\mu^{BL}$

$$\tilde{\mu} = (\Sigma^{-1} + (\tau \Sigma)^{-1})^{-1} (\Sigma^{-1} \mu^{BL} + (\tau \Sigma)^{-1} \hat{\mu}) \tag{4}$$
Black and Litterman would then advise the customer to hold the portfolio associated with
these views implied by rule (2):
𝑤̃ = (𝛿Σ)−1 𝜇̃
This portfolio 𝑤̃ will deviate from the portfolio 𝑤𝐵𝐿 in amounts that depend on the mixing
parameter 𝜏 .
If 𝜇̂ is the maximum likelihood estimator and 𝜏 is chosen heavily to weight this view, then the
customer’s portfolio will involve big short-long positions.
In [4]: def black_litterman(λ, μ1, μ2, Σ1, Σ2):
            """
            Computes the Black-Litterman mixture of the mean excess
            return views μ1 and μ2 (a reconstruction)
            """
            Σ1_inv = np.linalg.inv(Σ1)
            Σ2_inv = np.linalg.inv(Σ2)
            μ_tilde = np.linalg.solve(Σ1_inv + λ * Σ2_inv,
                                      Σ1_inv @ μ1 + λ * Σ2_inv @ μ2)
            return μ_tilde

        τ = 1
        δ = d_m   # risk aversion backed out above
        μ_tilde = black_litterman(1, μ_m, μ_est, Σ_est, τ * Σ_est)

        # interact and FloatSlider from ipywidgets are assumed imports
        τ_slider = FloatSlider(min=0.05, max=10, step=0.5, value=τ)

        @interact(τ=τ_slider)
        def BL_plot(τ):
            μ_tilde = black_litterman(1, μ_m, μ_est, Σ_est, τ * Σ_est)
            w_tilde = np.linalg.solve(δ * Σ_est, μ_tilde)
            # plotting of μ_tilde and w_tilde would follow here
The Black-Litterman recommendation also has a Bayesian interpretation. Suppose the prior over mean excess returns is

$$\mu \sim \mathcal{N}(\mu^{BL}, \Sigma)$$
Given a particular realization of the mean excess returns 𝜇 one observes the average excess
returns 𝜇̂ on the market according to the distribution
𝜇̂ ∣ 𝜇, Σ ∼ 𝒩(𝜇, 𝜏 Σ)
where 𝜏 is typically small capturing the idea that the variation in the mean is smaller than
the variation of the individual random variable.
Given the realized excess returns one should then update the prior over the mean excess re-
turns according to Bayes rule.
The corresponding posterior over mean excess returns is normally distributed with mean

$$(\Sigma^{-1} + (\tau \Sigma)^{-1})^{-1} (\Sigma^{-1} \mu^{BL} + (\tau \Sigma)^{-1} \hat{\mu})$$

and covariance matrix $(\Sigma^{-1} + (\tau \Sigma)^{-1})^{-1}$.

Hence, the Black-Litterman recommendation is consistent with the Bayes update of the prior over the mean excess returns in light of the realized average excess returns on the market.
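A small numerical check of this equivalence (the matrices below are illustrative assumptions):

        import numpy as np

        Σ = np.array([[0.04, 0.01],
                      [0.01, 0.09]])
        μ_BL = np.array([0.03, 0.05])
        μ_hat = np.array([0.06, 0.02])
        τ = 0.8

        A = np.linalg.inv(Σ)        # prior precision
        B = np.linalg.inv(τ * Σ)    # likelihood precision
        μ_post = np.linalg.solve(A + B, A @ μ_BL + B @ μ_hat)

        # Simplified form, valid because the two covariances are proportional
        μ_mix = (μ_BL + μ_hat / τ) / (1 + 1 / τ)
        print(μ_post, μ_mix)        # identical up to rounding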
Consider next the two alternative views of excess returns,

$$\vec{r}_e \sim \mathcal{N}(\mu^{BL}, \Sigma) \quad \text{and} \quad \vec{r}_e \sim \mathcal{N}(\hat{\mu}, \tau \Sigma)$$
A special feature of the multivariate normal random variable $Z$ is that its density function depends only on the (Euclidean) length of its standardized realization.
Formally, let the $k$-dimensional random vector be

$$Z \sim \mathcal{N}(\mu, \Sigma)$$

then

$$\bar{Z} \equiv \Sigma^{-1/2} (Z - \mu) \sim \mathcal{N}(0, I)$$

and so the points where the density takes the same value can be described by the ellipse

$$\bar{z} \cdot \bar{z} = (z - \mu)' \Sigma^{-1} (z - \mu) = \bar{d} \tag{5}$$
Remark: More generally there is a class of density functions that possesses this feature, i.e., whose density depends on the realization $z$ only through the quadratic form $(z - \mu)' \Sigma^{-1} (z - \mu)$.

This property is called spherical symmetry (see p. 81 in Leamer (1978) [104]).
In our specific example, we can use the pair $(\bar{d}_1, \bar{d}_2)$ as two "likelihood" values for which the corresponding iso-likelihood ellipses in the excess return space are given by

$$(\vec{r}_e - \mu^{BL})' \Sigma^{-1} (\vec{r}_e - \mu^{BL}) = \bar{d}_1, \qquad (\vec{r}_e - \hat{\mu})' (\tau \Sigma)^{-1} (\vec{r}_e - \hat{\mu}) = \bar{d}_2$$
Notice that for particular 𝑑1̄ and 𝑑2̄ values the two ellipses have a tangency point.
These tangency points, indexed by the pairs (𝑑1̄ , 𝑑2̄ ), characterize points 𝑟𝑒⃗ from which there
exists no deviation where one can increase the likelihood of one view without decreasing the
likelihood of the other view.
The pairs (𝑑1̄ , 𝑑2̄ ) for which there is such a point outlines a curve in the excess return space.
This curve is reminiscent of the Pareto curve in an Edgeworth-box setting.
Dickey (1975) [45] calls it a curve decolletage.
Leamer (1978) [104] calls it an information contract curve and describes it by the following
program: maximize the likelihood of one view, say the Black-Litterman recommendation
while keeping the likelihood of the other view at least at a prespecified constant 𝑑2̄
The solution to this program takes the form

$$\vec{r}_e = (\Sigma^{-1} + \lambda (\tau \Sigma)^{-1})^{-1} (\Sigma^{-1} \mu^{BL} + \lambda (\tau \Sigma)^{-1} \hat{\mu}) \tag{6}$$
Note that if 𝜆 = 1, (6) is equivalent with (4) and it identifies one point on the information
contract curve.
Furthermore, because 𝜆 is a function of the minimum likelihood 𝑑2̄ on the RHS of the con-
straint, by varying 𝑑2̄ (or 𝜆 ), we can trace out the whole curve as the figure below illustrates.
In [5]: np.random.seed(1987102)

        N = 2    # Number of assets
        T = 200  # Sample size
        τ = 0.8

        # Random market portfolio (an assumed reconstruction of lost lines)
        w_m = np.random.rand(N)
        w_m = w_m / w_m.sum()

        μ = (np.random.randn(N) + 5) / 100
        S = np.random.randn(N, N)
        V = S @ S.T
        Σ = V * (w_m @ μ)**2 / (w_m @ V @ w_m)

        excess_return = stat.multivariate_normal(μ, Σ)
        sample = excess_return.rvs(T)

        μ_est = sample.mean(0).reshape(N, 1)
        Σ_est = np.cov(sample.T)

        σ_m = w_m @ Σ_est @ w_m
        d_m = (w_m @ μ_est) / σ_m
        μ_m = (d_m * Σ_est @ w_m).reshape(N, 1)
        # Grid and slider setup (assumed values)
        N_r1, N_r2 = 100, 100
        r1 = np.linspace(-0.04, .1, N_r1)
        r2 = np.linspace(-0.02, .15, N_r2)
        λ_slider = FloatSlider(min=.1, max=7, step=.5, value=1)

        @interact(λ=λ_slider)
        def decolletage(λ):
            dist_r_BL = stat.multivariate_normal(μ_m.squeeze(), Σ_est)
            dist_r_hat = stat.multivariate_normal(μ_est.squeeze(), τ * Σ_est)

            X, Y = np.meshgrid(r1, r2)
            Z_BL = np.zeros((N_r1, N_r2))
            Z_hat = np.zeros((N_r1, N_r2))

            for i in range(N_r1):
                for j in range(N_r2):
                    Z_BL[i, j] = dist_r_BL.pdf(np.hstack([X[i, j], Y[i, j]]))
                    Z_hat[i, j] = dist_r_hat.pdf(np.hstack([X[i, j], Y[i, j]]))

            # contour plotting of Z_BL and Z_hat would follow here
Note that the curve connecting the two points $\hat{\mu}$ and $\mu^{BL}$ is a straight line, which comes from the fact that the covariance matrices of the two competing distributions (views) are proportional to each other.
To illustrate the fact that this is not necessarily the case, consider another example using the
same parameter values, except that the “second view” constituting the constraint has covari-
ance matrix 𝜏 𝐼 instead of 𝜏 Σ.
This leads to the following figure, on which the curve connecting $\hat{\mu}$ and $\mu^{BL}$ is bending
@interact(λ=λ_slider)
def decolletage(λ):
dist_r_BL = stat.multivariate_normal(μ_m.squeeze(), Σ_est)
dist_r_hat = stat.multivariate_normal(μ_est.squeeze(), τ * np.eye(N))
X, Y = np.meshgrid(r1, r2)
Z_BL = np.zeros((N_r1, N_r2))
Z_hat = np.zeros((N_r1, N_r2))
for i in range(N_r1):
for j in range(N_r2):
Z_BL[i, j] = dist_r_BL.pdf(np.hstack([X[i, j], Y[i, j]]))
Z_hat[i, j] = dist_r_hat.pdf(np.hstack([X[i, j], Y[i, j]]))
The OLS estimator of the true coefficient vector $\beta_0$ is

$$\hat{\beta}_{OLS} = (X'X)^{-1} X'y$$

and its mean squared error decomposes into variance and bias terms:

$$\text{mse}(\hat{\beta}_{OLS}, \beta_0) := \mathbb{E} \| \hat{\beta}_{OLS} - \beta_0 \|^2 = \underbrace{\mathbb{E} \| \hat{\beta}_{OLS} - \mathbb{E} \hat{\beta}_{OLS} \|^2}_{\text{variance}} + \underbrace{\| \mathbb{E} \hat{\beta}_{OLS} - \beta_0 \|^2}_{\text{bias}}$$
From this decomposition, one can see that in order for the MSE to be small, both the bias
and the variance terms must be small.
For example, consider the case when $X$ is a $T$-vector of ones (where $T$ is the sample size), so that $\hat{\beta}_{OLS}$ is simply the sample average, while $\beta_0 \in \mathbb{R}$ is defined by the true mean of $y$.

In this example the MSE is

$$\text{mse}(\hat{\beta}_{OLS}, \beta_0) = \underbrace{\frac{1}{T^2} \, \mathbb{E} \left( \sum_{t=1}^{T} (y_t - \beta_0) \right)^2}_{\text{variance}} + \underbrace{0}_{\text{bias}}$$
However, because there is a trade-off between the estimator’s bias and variance, there are
cases when by permitting a small bias we can substantially reduce the variance so overall the
MSE gets smaller.
A typical scenario when this proves to be useful is when the number of coefficients to be esti-
mated is large relative to the sample size.
In these cases, one approach to handle the bias-variance trade-off is the so called Tikhonov
regularization.
A general form with regularization matrix $\Gamma$ can be written as

$$\min_{\beta} \left\{ \| X\beta - y \|^2 + \| \Gamma (\beta - \tilde{\beta}) \|^2 \right\}$$

which yields the solution

$$\hat{\beta}_{Reg} = (X'X + \Gamma'\Gamma)^{-1} (X'y + \Gamma'\Gamma \tilde{\beta})$$

Substituting the value of $\hat{\beta}_{OLS}$ yields

$$\hat{\beta}_{Reg} = (X'X + \Gamma'\Gamma)^{-1} (X'X \hat{\beta}_{OLS} + \Gamma'\Gamma \tilde{\beta})$$
Often, the regularization matrix takes the form Γ = 𝜆𝐼 with 𝜆 > 0 and 𝛽 ̃ = 0.
Then the Tikhonov regularization is equivalent to what is called ridge regression in statistics.
To illustrate how this estimator addresses the bias-variance trade-off, we compute the MSE of the ridge estimator:

$$\text{mse}(\hat{\beta}_{\text{ridge}}, \beta_0) = \underbrace{\frac{1}{(T + \lambda)^2} \, \mathbb{E} \left( \sum_{t=1}^{T} (y_t - \beta_0) \right)^2}_{\text{variance}} + \underbrace{\left( \frac{\lambda}{T + \lambda} \right)^2 \beta_0^2}_{\text{bias}}$$
The ridge regression shrinks the coefficients of the estimated vector towards zero relative to
the OLS estimates thus reducing the variance term at the cost of introducing a “small” bias.
However, there is nothing special about the zero vector.
When 𝛽 ̃ ≠ 0 shrinkage occurs in the direction of 𝛽.̃
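A Monte Carlo sketch of this trade-off (all parameter values below are illustrative assumptions):

        import numpy as np

        np.random.seed(0)
        β0, T, λ, n_sims = 0.5, 20, 5.0, 100_000

        y = β0 + np.random.randn(n_sims, T)        # y_t = β0 + ε_t, ε ~ N(0, 1)
        β_ols = y.mean(axis=1)                     # sample average = OLS on a constant
        β_ridge = (T / (T + λ)) * β_ols            # ridge shrinkage toward zero

        print("OLS   MSE:", np.mean((β_ols - β0)**2))     # ≈ 1/T = 0.05
        print("ridge MSE:", np.mean((β_ridge - β0)**2))   # ≈ 0.042, smaller despite bias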
Now, we can give a regularization interpretation of the Black-Litterman portfolio recommen-
dation.
To this end, first simplify equation (4), which characterizes the Black-Litterman recommendation:

$$\tilde{\mu} = (\Sigma^{-1} + (\tau \Sigma)^{-1})^{-1} (\Sigma^{-1} \mu^{BL} + (\tau \Sigma)^{-1} \hat{\mu}) = (1 + \tau^{-1})^{-1} (\mu^{BL} + \tau^{-1} \hat{\mu})$$

In our case, $\hat{\mu}$ is the estimated mean excess returns of securities, which plays the role of $\hat{\beta}_{OLS}$. Taking the regularization matrix to satisfy $\Gamma'\Gamma = \tau X'X$, we can write

$$\begin{aligned} \hat{\beta}_{Reg} &= (X'X + \Gamma'\Gamma)^{-1} (X'X \hat{\beta}_{OLS} + \Gamma'\Gamma \tilde{\beta}) \\ &= (1 + \tau)^{-1} X'X (X'X)^{-1} (\hat{\beta}_{OLS} + \tau \tilde{\beta}) \\ &= (1 + \tau)^{-1} (\hat{\beta}_{OLS} + \tau \tilde{\beta}) \\ &= (1 + \tau^{-1})^{-1} (\tau^{-1} \hat{\beta}_{OLS} + \tilde{\beta}) \end{aligned}$$
Given that $\hat{\beta}_{OLS} = \hat{\mu}$ and $\tilde{\beta} = \mu^{BL}$ in the Black-Litterman model, we have the following interpretation of the model's recommendation.
The estimated (personal) view of the mean excess returns, 𝜇̂ that would lead to extreme
short-long positions are “shrunk” towards the conservative market view, 𝜇𝐵𝐿 , that leads to
the more conservative market portfolio.
So the Black-Litterman procedure results in a recommendation that is a compromise between
the conservative market portfolio and the more extreme portfolio that is implied by estimated
“personal” views.
The Black-Litterman approach is partly inspired by the econometric insight that it is easier
to estimate covariances of excess returns than the means.
That is what gave Black and Litterman license to adjust investors’ perception of mean excess
returns while not tampering with the covariance matrix of excess returns.
The robust control theory is another approach that also hinges on adjusting mean excess re-
turns but not covariances.
Associated with a robust control problem is what Hansen and Sargent [76], [71] call a T oper-
ator.
Let’s define the T operator as it applies to the problem at hand.
Let $x$ be an $n \times 1$ Gaussian random vector with mean vector $\mu$ and covariance matrix $\Sigma = CC'$. This means that $x$ can be represented as

$$x = \mu + C \epsilon$$

where $\epsilon \sim \mathcal{N}(0, I)$.

Let $\phi(\epsilon)$ denote the associated standard normal density, and let a distorted density be

$$\tilde{\phi}(\epsilon) = m(\epsilon, \mu) \phi(\epsilon)$$

where $m(\epsilon, \mu)$ is a likelihood ratio.
The next concept that we need is the entropy of the distorted distribution 𝜙 ̃ with respect to
𝜙.
Entropy is defined as

$$\text{ent} = \int \log m(\epsilon, \mu) \, m(\epsilon, \mu) \, \phi(\epsilon) \, d\epsilon$$

or

$$\text{ent} = \int \log m(\epsilon, \mu) \, \tilde{\phi}(\epsilon) \, d\epsilon$$
That is, relative entropy is the expected value of the likelihood ratio 𝑚 where the expectation
is taken with respect to the twisted density 𝜙.̃
Relative entropy is non-negative. It is a measure of the discrepancy between two probability
distributions.
As such, it plays an important role in governing the behavior of statistical tests designed to
discriminate one probability distribution from another.
We are ready to define the T operator.
Let 𝑉 (𝑥) be a value function.
Define

$$\mathsf{T}(V(x)) = -\theta \log \int \exp \left( \frac{-V(\mu + C\epsilon)}{\theta} \right) \phi(\epsilon) \, d\epsilon$$
This asserts that T is an indirect utility function for a minimization problem in which an evil
agent chooses a distorted probability distribution 𝜙 ̃ to lower expected utility, subject to a
penalty term that gets bigger the larger is relative entropy.
Here the penalty parameter

$$\theta \in [\underline{\theta}, +\infty]$$

is a robustness parameter: when it is $+\infty$, there is no scope for the minimizing agent to distort the distribution, so no robustness to alternative distributions is acquired. As $\theta$ is lowered, more robustness is achieved.
Note: The T operator is sometimes called a risk-sensitivity operator.
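A quick Monte Carlo check (parameter values are assumptions) of the Gaussian identity behind this operator, namely that for a linear payoff $V = w'(\mu + C\epsilon)$ with $\epsilon \sim \mathcal{N}(0, I)$, $-\theta \log \mathbb{E}[\exp(-V/\theta)] = w'\mu - \frac{1}{2\theta} w'\Sigma w$:

        import numpy as np

        rng = np.random.default_rng(0)
        n, θ = 3, 2.0
        w = np.array([0.5, -0.2, 0.3])
        μ = np.array([0.02, 0.03, 0.01])
        C = np.array([[0.10, 0.00, 0.00],
                      [0.02, 0.08, 0.00],
                      [0.01, 0.03, 0.09]])
        Σ = C @ C.T

        ε = rng.standard_normal((1_000_000, n))
        V = (μ + ε @ C.T) @ w                      # V = w'(μ + Cε)
        T_mc = -θ * np.log(np.mean(np.exp(-V / θ)))
        T_closed = w @ μ - (w @ Σ @ w) / (2 * θ)
        print(T_mc, T_closed)                      # agree up to Monte Carlo error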
We shall apply $\mathsf{T}$ to the special case of a linear value function $w'(\vec{r} - r_f \mathbf{1})$ where $\vec{r} - r_f \mathbf{1} \sim \mathcal{N}(\mu, \Sigma)$ or $\vec{r} - r_f \mathbf{1} = \mu + C\epsilon$ and $\epsilon \sim \mathcal{N}(0, I)$.
The associated worst-case distribution of 𝜖 is Gaussian with mean 𝑣 = −𝜃−1 𝐶 ′ 𝑤 and co-
variance matrix 𝐼 (When the value function is affine, the worst-case distribution distorts the
mean vector of 𝜖 but not the covariance matrix of 𝜖).
For utility function argument $w'(\vec{r} - r_f \mathbf{1})$,

$$\mathsf{T}(\vec{r} - r_f \mathbf{1}) = w'\mu + \zeta - \frac{1}{2\theta} w' \Sigma w$$

and entropy is

$$\frac{v'v}{2} = \frac{1}{2\theta^2} w' CC' w$$
According to criterion (1), the mean-variance portfolio choice problem chooses $w$ to maximize

$$\mathbb{E}[w (\vec{r} - r_f \mathbf{1})] - \frac{\delta}{2} \text{var}[w (\vec{r} - r_f \mathbf{1})]$$

which equals

$$w'\mu - \frac{\delta}{2} w' \Sigma w$$
A robust decision maker can be modeled as replacing the mean return $\mathbb{E}[w(\vec{r} - r_f \mathbf{1})]$ with the risk-sensitive criterion

$$\mathsf{T}[w(\vec{r} - r_f \mathbf{1})] = w'\mu - \frac{1}{2\theta} w' \Sigma w$$

that comes from replacing the mean $\mu$ of $\vec{r} - r_f \mathbf{1}$ with the worst-case mean

$$\mu - \theta^{-1} \Sigma w$$
The robust version of the mean-variance portfolio choice problem is then to choose $w$ to maximize

$$\mathsf{T}[w(\vec{r} - r_f \mathbf{1})] - \frac{\delta}{2} w' \Sigma w$$

or

$$w'(\mu - \theta^{-1} \Sigma w) - \frac{\delta}{2} w' \Sigma w \tag{7}$$

The maximizer is

$$w_{\text{rob}} = \frac{1}{\delta + \gamma} \Sigma^{-1} \mu$$

where $\gamma \equiv \theta^{-1}$ is sometimes called the risk-sensitivity parameter.
94.3 Appendix
We want to illustrate the “folk theorem” that with high or moderate frequency data, it is
more difficult to estimate means than variances.
In order to operationalize this statement, we take two analog estimators:
• sample average: $\bar{X}_N = \frac{1}{N} \sum_{i=1}^{N} X_i$
• sample variance: $S_N = \frac{1}{N - 1} \sum_{i=1}^{N} (X_i - \bar{X}_N)^2$
to estimate the unconditional mean and unconditional variance of the random variable 𝑋,
respectively.
To measure the “difficulty of estimation”, we use mean squared error (MSE), that is the aver-
age squared difference between the estimator and the true value.
Assuming that the process $\{X_i\}$ is ergodic, both analog estimators are known to converge to their true values as the sample size $N$ goes to infinity. More precisely, for all $\varepsilon > 0$,

$$\lim_{N \to \infty} \mathbb{P} \left\{ \left| \bar{X}_N - \mu \right| > \varepsilon \right\} = 0$$

and

$$\lim_{N \to \infty} \mathbb{P} \left\{ \left| S_N - \sigma^2 \right| > \varepsilon \right\} = 0$$

A necessary condition for these convergence results is that the associated MSEs vanish as $N$ goes to infinity, or in other words,

$$\text{MSE}(\bar{X}_N, \mu) = o(1) \quad \text{and} \quad \text{MSE}(S_N, \sigma^2) = o(1)$$
Even if the MSEs converge to zero, the associated rates might be different. Looking at the limit of the relative MSE (as the sample size grows to infinity),

$$\frac{\text{MSE}(S_N, \sigma^2)}{\text{MSE}(\bar{X}_N, \mu)} \to B \quad \text{as } N \to \infty$$

can inform us about the relative asymptotic rates of convergence of the two estimators.
In particular, we find that the rate of convergence of the variance estimator is less sensitive to
increased sampling frequency than the rate of convergence of the mean estimator.
Hence, we can expect the relative asymptotic rate, 𝐵, to get smaller with higher frequency
data, illustrating that “it is more difficult to estimate means than variances”.
That is, we need significantly more data to obtain a given precision of the mean estimate
than for our variance estimate.
We start our analysis with the benchmark case of IID data. Consider a sample of size 𝑁 gen-
erated by the following IID process,
$$X_i \sim \mathcal{N}(\mu, \sigma^2)$$

It is a standard result that in this case

$$\text{MSE}(\bar{X}_N, \mu) = \frac{\sigma^2}{N}, \qquad \text{MSE}(S_N, \sigma^2) = \frac{2\sigma^4}{N - 1}$$
Both estimators are unbiased and hence the MSEs reflect the corresponding variances of the
estimators.
Furthermore, both MSEs are 𝑜(1) with a (multiplicative) factor of difference in their rates of
convergence:
$$\frac{\text{MSE}(S_N, \sigma^2)}{\text{MSE}(\bar{X}_N, \mu)} = \frac{N \, 2\sigma^2}{N - 1} \;\longrightarrow\; 2\sigma^2 \quad \text{as } N \to \infty$$
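A quick IID check of this ratio (the parameter values are assumptions):

        import numpy as np

        np.random.seed(1)
        σ, N, sims = 0.5, 100, 200_000

        x = σ * np.random.randn(sims, N)                  # μ = 0
        mse_mean = np.mean(x.mean(1)**2)                  # MSE of the sample average
        mse_var = np.mean((x.var(1, ddof=1) - σ**2)**2)   # MSE of the sample variance
        print(mse_var / mse_mean, 2 * σ**2)               # both ≈ 0.5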
We are interested in how this (asymptotic) relative rate of convergence changes as increasing
sampling frequency puts dependence into the data.
To investigate how sampling frequency affects relative rates of convergence, we assume that the data are generated by a mean-reverting continuous time process of the form

$$dX_t = -\kappa (X_t - \mu) \, dt + \sigma \, dW_t$$

where $\mu$ is the unconditional mean, $\kappa > 0$ is a persistence parameter, and $\{W_t\}$ is a standardized Brownian motion.

Observations arising from this system in particular discrete periods $\mathcal{T}(h) \equiv \{nh : n \in \mathbb{Z}\}$ with $h > 0$ can be described by the following process

$$X_{t+1} = (1 - \exp(-\kappa h)) \mu + \exp(-\kappa h) X_t + \epsilon_{t,h}$$
where

$$\epsilon_{t,h} \sim \mathcal{N}(0, \Sigma_h) \quad \text{with} \quad \Sigma_h = \frac{\sigma^2 (1 - \exp(-2\kappa h))}{2\kappa}$$
We call ℎ the frequency parameter, whereas 𝑛 represents the number of lags between observa-
tions.
Hence, the effective distance between two observations 𝑋𝑡 and 𝑋𝑡+𝑛 in the discrete time nota-
tion is equal to ℎ ⋅ 𝑛 in terms of the underlying continuous time process.
Straightforward calculations show that the autocovariance function for the stochastic process $\{X_t\}_{t \in \mathcal{T}(h)}$ is

$$\gamma_h(n) \equiv \text{cov}(X_{t+hn}, X_t) = \frac{\exp(-\kappa h n) \sigma^2}{2\kappa}$$

It follows that if $n = 0$, the unconditional variance is given by $\gamma_h(0) = \frac{\sigma^2}{2\kappa}$, irrespective of the sampling frequency.
The following figure illustrates how the dependence between the observations is related to the
sampling frequency
• For any given ℎ, the autocorrelation converges to zero as we increase the distance – 𝑛–
between the observations. This represents the “weak dependence” of the 𝑋 process.
• Moreover, for a fixed lag length, 𝑛, the dependence vanishes as the sampling frequency
goes to infinity. In fact, letting ℎ go to ∞ gives back the case of IID data.
In [7]: μ = .0
        κ = .1
        σ = .5
        var_uncond = σ**2 / (2 * κ)

        # Autocovariance at several sampling frequencies h
        # (plotting scaffolding below is assumed)
        n_grid = np.linspace(0, 40, 100)
        autocorr_h1 = np.exp(-κ * n_grid * 1) * var_uncond
        autocorr_h2 = np.exp(-κ * n_grid * 2) * var_uncond
        autocorr_h5 = np.exp(-κ * n_grid * 5) * var_uncond
        autocorr_h1000 = np.exp(-κ * n_grid * 1e8) * var_uncond

        fig, ax = plt.subplots(figsize=(8, 4))
        for vals, lab in [(autocorr_h1, '$h = 1$'), (autocorr_h2, '$h = 2$'),
                          (autocorr_h5, '$h = 5$'), (autocorr_h1000, '$h \\to \\infty$')]:
            ax.plot(n_grid, vals, label=lab, lw=2)
        ax.legend()
        ax.set_xlabel(r'Lags between observations, $n$')
        ax.set_ylabel('Autocovariance')
        plt.show()
Consider again the AR(1) process generated by discrete sampling with frequency ℎ. Assume
that we have a sample of size 𝑁 and we would like to estimate the unconditional mean – in
our case the true mean is 𝜇.
Again, the sample average is an unbiased estimator of the unconditional mean
$$\mathbb{E}[\bar{X}_N] = \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}[X_i] = \mathbb{E}[X_0] = \mu$$
The variance of the sample mean, however, is affected by the dependence across observations:

$$\begin{aligned} \mathbb{V}(\bar{X}_N) &= \mathbb{V} \left( \frac{1}{N} \sum_{i=1}^{N} X_i \right) \\ &= \frac{1}{N^2} \left( \sum_{i=1}^{N} \mathbb{V}(X_i) + 2 \sum_{i=1}^{N-1} \sum_{s=i+1}^{N} \text{cov}(X_i, X_s) \right) \\ &= \frac{1}{N^2} \left( N \gamma(0) + 2 \sum_{i=1}^{N-1} i \cdot \gamma(h \cdot (N - i)) \right) \\ &= \frac{1}{N^2} \left( N \frac{\sigma^2}{2\kappa} + 2 \sum_{i=1}^{N-1} i \cdot \exp(-\kappa h (N - i)) \frac{\sigma^2}{2\kappa} \right) \end{aligned}$$
It is explicit in the above equation that time dependence in the data inflates the variance of
the mean estimator through the covariance terms. Moreover, as we can see, a higher sampling
frequency—smaller ℎ—makes all the covariance terms larger, everything else being fixed. This
implies a relatively slower rate of convergence of the sample average for high-frequency data.
Intuitively, the stronger dependence across observations for high-frequency data reduces the
“information content” of each observation relative to the IID case.
We can upper bound the variance term in the following way:

$$\begin{aligned} \mathbb{V}(\bar{X}_N) &= \frac{1}{N^2} \left( N \frac{\sigma^2}{2\kappa} + 2 \sum_{i=1}^{N-1} i \cdot \exp(-\kappa h (N - i)) \frac{\sigma^2}{2\kappa} \right) \\ &\leq \underbrace{\frac{\sigma^2}{2\kappa N}}_{\text{IID case}} \left( 1 + 2 \sum_{i=1}^{N-1} \exp(-\kappa h i) \right) \\ &= \frac{\sigma^2}{2\kappa N} \left( 1 + 2 \, \frac{1 - \exp(-\kappa h)^{N-1}}{1 - \exp(-\kappa h)} \right) \end{aligned}$$
Asymptotically, the $\exp(-\kappa h)^{N-1}$ term vanishes and the dependence in the data inflates the benchmark IID variance by a factor of

$$\left( 1 + \frac{2}{1 - \exp(-\kappa h)} \right)$$
This long run factor is larger the higher is the frequency (the smaller is ℎ).
Therefore, we expect the asymptotic relative MSEs, 𝐵, to change with time-dependent data.
We just saw that the mean estimator's rate is roughly changing by a factor of

$$\left( 1 + \frac{2}{1 - \exp(-\kappa h)} \right)$$
The simulation below (whose generator function is a minimal reconstruction) samples discrete observations from the process at different frequencies and stores the analog estimates:

In [8]: def sample_generator(h, N, M):
            ϕ = (1 - np.exp(-κ * h)) * μ
            ρ = np.exp(-κ * h)
            s = σ**2 * (1 - np.exp(-2 * κ * h)) / (2 * κ)

            mean_uncond = μ
            std_uncond = np.sqrt(σ**2 / (2 * κ))

            ε_path = stat.norm(0, np.sqrt(s)).rvs((M, N))

            y_path = np.zeros((M, N + 1))
            y_path[:, 0] = stat.norm(mean_uncond, std_uncond).rvs(M)

            for i in range(N):
                y_path[:, i + 1] = ϕ + ρ * y_path[:, i] + ε_path[:, i]

            return y_path

In [9]: # Generate samples over a grid of frequencies (grid values assumed)
        N_app, M_app = 1000, 30000    # Sample size, number of simulations
        h_grid = np.linspace(0.1, 80, 30)

        var_est_store = []
        mean_est_store = []
        labels = []

        for h in h_grid:
            labels.append(h)
            sample = sample_generator(h, N_app, M_app)
            mean_est_store.append(np.mean(sample, 1))
            var_est_store.append(np.var(sample, 1))

        var_est_store = np.array(var_est_store)
        mean_est_store = np.array(mean_est_store)
The above figure illustrates the relationship between the asymptotic relative MSEs and the
sampling frequency
• We can see that with low-frequency data – large values of ℎ – the ratio of asymptotic
rates approaches the IID case.
• As ℎ gets smaller – the higher the frequency – the relative performance of the variance
estimator is better in the sense that the ratio of asymptotic rates gets smaller. That
is, as the time dependence gets more pronounced, the rate of convergence of the mean
estimator’s MSE deteriorates more than that of the variance estimator.
Part XIV
Chapter 95
Stackelberg Plans
95.1 Contents
• Overview 95.2
• Duopoly 95.3
• The Stackelberg Problem 95.4
• Stackelberg Plan 95.5
• Recursive Representation of Stackelberg Plan 95.6
• Computing the Stackelberg Plan 95.7
• Exhibiting Time Inconsistency of Stackelberg Plan 95.8
• Recursive Formulation of the Follower’s Problem 95.9
• Markov Perfect Equilibrium 95.10
• MPE vs. Stackelberg 95.11
In addition to what’s in Anaconda, this lecture will need the following libraries:
95.2 Overview
This notebook formulates and computes a plan that a Stackelberg leader uses to manip-
ulate forward-looking decisions of a Stackelberg follower that depend on continuation se-
quences of decisions made once and for all by the Stackelberg leader at time 0.
To facilitate computation and interpretation, we formulate things in a context that allows us
to apply linear optimal dynamic programming.
From the beginning, we carry along a linear-quadratic model of duopoly in which firms face
adjustment costs that make them want to forecast actions of other firms that influence future
prices.
Let’s start with some standard imports:
95.3 Duopoly

Time $t$ output of the two firms sells at the price given by the inverse demand curve

$$
p_t = a_0 - a_1(q_{1t} + q_{2t})
$$

where $q_{it}$ is output of firm $i$ at time $t$ and $a_0$ and $a_1$ are both positive.
𝑞10 , 𝑞20 are given numbers that serve as initial conditions at time 0.
By incurring a cost of change

$$
\gamma v_{it}^2 \qquad \text{where} \qquad v_{it} = q_{it+1} - q_{it},
$$

firm $i$ can adjust its output between periods. Firm $i$'s profits at time $t$ are

$$
\pi_{it} = p_t q_{it} - \gamma v_{it}^2
$$

and firm $i$ wants to maximize the present value of its profits

$$
\sum_{t=0}^{\infty}\beta^t \pi_{it}
$$
Knowing that firm 2 has chosen $\{q_{2t+1}\}_{t=0}^{\infty}$, the follower firm 1 goes second and chooses $\{q_{1t+1}\}_{t=0}^{\infty}$ once and for all at time 0.
In choosing 𝑞2⃗ , firm 2 takes into account that firm 1 will base its choice of 𝑞1⃗ on firm 2’s
choice of 𝑞2⃗ .
Firm 1, taking $\vec q_2$ as given, faces an objective that we can write as $\Pi_1(\vec q_1; \vec q_2)$, where the appearance behind the semi-colon indicates that $\vec q_2$ is given.

Firm 1's problem induces the best response mapping

$$
\vec q_1 = B(\vec q_2)
$$

The Stackelberg leader, firm 2, chooses $\vec q_2$ to maximize its own payoff taking the mapping $B$ into account; the maximizer is a sequence $\vec q_2$ that depends on the initial conditions $q_{10}, q_{20}$ and the parameters of the model $a_0, a_1, \gamma$.
This formulation captures key features of the model
• Both firms make once-and-for-all choices at time 0.
• This is true even though both firms are choosing sequences of quantities that are in-
dexed by time.
• The Stackelberg leader chooses first within time 0, knowing that the Stackelberg fol-
lower will choose second within time 0.
While our abstract formulation reveals the timing protocol and equilibrium concept well, it
obscures details that must be addressed when we want to compute and interpret a Stackel-
berg plan and the follower’s best response to it.
To gain insights about these things, we study them in more detail.
Firm 2 knows that firm 1 chooses second and takes this into account in choosing $\{q_{2t+1}\}_{t=0}^{\infty}$.

In the spirit of working backward, we study firm 1's problem first, taking $\{q_{2t+1}\}_{t=0}^{\infty}$ as given.
Firm 1's Lagrangian is

$$
L = \sum_{t=0}^{\infty}\beta^t\left\{a_0 q_{1t} - a_1 q_{1t}^2 - a_1 q_{1t}q_{2t} - \gamma v_{1t}^2 + \lambda_t[q_{1t} + v_{1t} - q_{1t+1}]\right\}
$$

First-order conditions for this problem are

$$
\begin{aligned}
\frac{\partial L}{\partial q_{1t}} &= a_0 - 2a_1 q_{1t} - a_1 q_{2t} + \lambda_t - \beta^{-1}\lambda_{t-1} = 0, \quad t \geq 1 \\
\frac{\partial L}{\partial v_{1t}} &= -2\gamma v_{1t} + \lambda_t = 0, \quad t \geq 0
\end{aligned}
$$
These first-order conditions and the constraint $q_{1t+1} = q_{1t} + v_{1t}$ can be rearranged to take the form

$$
\begin{aligned}
v_{1t} &= \beta v_{1t+1} + \frac{\beta a_0}{2\gamma} - \frac{\beta a_1}{\gamma}q_{1t+1} - \frac{\beta a_1}{2\gamma}q_{2t+1} \\
q_{1t+1} &= q_{1t} + v_{1t}
\end{aligned}
$$
We can substitute the second equation into the first equation to obtain an equation expressed entirely in terms of $\{q_{1t}\}$ and $\{q_{2t}\}$.

This equation can in turn be rearranged to become the second-order difference equation

$$
\beta q_{1t+2} - (1 + \beta + c_1)q_{1t+1} + q_{1t} = -c_0 + c_2 q_{2t+1} \tag{1}
$$

Equation (1) is a second-order difference equation in the sequence $\vec q_1$ whose solution we want.
It satisfies two boundary conditions:
• an initial condition for $q_{1,0}$, which is given
• a terminal condition requiring that $\lim_{T\to+\infty}\beta^T q_{1t}^2 < +\infty$
Using the lag operators described in chapter IX of Macroeconomic Theory, Second edition (1987), difference equation (1) can be written as

$$
\beta\left(1 - \frac{1+\beta+c_1}{\beta}L + \beta^{-1}L^2\right)q_{1t+2} = -c_0 + c_2 q_{2t+1}
$$

The polynomial in the lag operator on the left side can be factored as

$$
\left(1 - \frac{1+\beta+c_1}{\beta}L + \beta^{-1}L^2\right) = (1-\delta_1 L)(1-\delta_2 L) \tag{2}
$$
Because $\delta_2 > \frac{1}{\sqrt{\beta}}$, the operator $(1-\delta_2 L)$ contributes an unstable component if solved backwards but a stable component if solved forwards.

Mechanically, write $(1-\delta_2 L) = -\delta_2 L(1 - \delta_2^{-1}L^{-1})$ and compute the inverse operator

$$
\left[-\delta_2 L(1-\delta_2^{-1}L^{-1})\right]^{-1} = -\delta_2^{-1}\left(1-\delta_2^{-1}L^{-1}\right)^{-1}L^{-1}
$$
Operating on both sides of equation (2) with 𝛽 −1 times this inverse operator gives the fol-
lower’s decision rule for setting 𝑞1𝑡+1 in the feedback-feedforward form.
$$
q_{1t+1} = \delta_1 q_{1t} - c_0\delta_2^{-1}\beta^{-1}\frac{1}{1-\delta_2^{-1}} + c_2\delta_2^{-1}\beta^{-1}\sum_{j=0}^{\infty}\delta_2^{-j}q_{2t+j+1}, \qquad t \geq 0 \tag{3}
$$
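To make the structure of the feedback-feedforward rule (3) concrete, here is a minimal sketch that evaluates one step of the rule. The coefficients δ1, δ2, c0, c2 and the truncated future path for q2 are stand-in values, not quantities computed from the model:

import numpy as np

δ1, δ2 = 0.9, 1.2          # stand-ins; δ2 > 1/√β is the unstable root
c0, c2, β = 1.0, 0.5, 0.96

def q1_next(q1, q2_future):
    # One step of equation (3), with the infinite sum truncated at len(q2_future)
    j = np.arange(len(q2_future))
    constant = -c0 * δ2**(-1) * β**(-1) / (1 - δ2**(-1))
    feedforward = c2 * δ2**(-1) * β**(-1) * np.sum(δ2**(-j) * q2_future)
    return δ1 * q1 + constant + feedforward

print(q1_next(1.0, np.full(200, 1.0)))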
95.4 The Stackelberg Problem
The problem of the Stackelberg leader firm 2 is to choose the sequence $\{q_{2t+1}\}_{t=0}^{\infty}$ to maximize its discounted profits

$$
\sum_{t=0}^{\infty}\beta^t\left\{(a_0 - a_1(q_{1t}+q_{2t}))q_{2t} - \gamma(q_{2t+1}-q_{2t})^2\right\}
$$
To attack this problem, firm 2 attaches a sequence of Lagrange multipliers $\{\theta_t\}$ to firm 1's decision rule (3) and forms the Lagrangian

$$
\begin{aligned}
\tilde L = {}& \sum_{t=0}^{\infty}\beta^t\left\{(a_0 - a_1(q_{1t}+q_{2t}))q_{2t} - \gamma(q_{2t+1}-q_{2t})^2\right\} \\
&+ \sum_{t=0}^{\infty}\beta^t\theta_t\left\{\delta_1 q_{1t} - c_0\delta_2^{-1}\beta^{-1}\frac{1}{1-\delta_2^{-1}} + c_2\delta_2^{-1}\beta^{-1}\sum_{j=0}^{\infty}\delta_2^{-j}q_{2t+j+1} - q_{1t+1}\right\}
\end{aligned} \tag{4}
$$
We take

$$
y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}
$$

as a state vector, where $z_t$ is a vector of natural state variables and $x_t$ is a vector of endogenous forward-looking variables, and

$$
r(y, u) = y'Ry + u'Qu
$$

as a one-period loss function.
Subject to an initial condition for $z_0$, but not for $x_0$, the Stackelberg leader wants to maximize

$$
-\sum_{t=0}^{\infty}\beta^t r(y_t, u_t) \tag{5}
$$

subject to the law of motion

$$
\begin{bmatrix} I & 0 \\ G_{21} & G_{22} \end{bmatrix}\begin{bmatrix} z_{t+1} \\ x_{t+1} \end{bmatrix} = \begin{bmatrix} \hat A_{11} & \hat A_{12} \\ \hat A_{21} & \hat A_{22} \end{bmatrix}\begin{bmatrix} z_t \\ x_t \end{bmatrix} + \hat B u_t \tag{6}
$$
We assume that the matrix $\begin{bmatrix} I & 0 \\ G_{21} & G_{22} \end{bmatrix}$ on the left side of equation (6) is invertible, so that we can multiply both sides by its inverse to obtain

$$
\begin{bmatrix} z_{t+1} \\ x_{t+1} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} z_t \\ x_t \end{bmatrix} + Bu_t \tag{7}
$$

or

$$
y_{t+1} = Ay_t + Bu_t \tag{8}
$$
The Stackelberg follower’s best response mapping is summarized by the second block of equa-
tions of (7).
In particular, these equations are the first-order conditions of the Stackelberg follower’s opti-
mization problem (i.e., its Euler equations).
These Euler equations summarize the forward-looking aspect of the follower’s behavior and
express how its time 𝑡 decision depends on the leader’s actions at times 𝑠 ≥ 𝑡.
When combined with a stability condition to be imposed below, the Euler equations summa-
rize the follower’s best response to the sequence of actions by the leader.
The Stackelberg leader maximizes (5) by choosing sequences $\{u_t, x_t, z_{t+1}\}_{t=0}^{\infty}$ subject to (8) and an initial condition for $z_0$.
Note that we have an initial condition for 𝑧0 but not for 𝑥0 .
𝑥0 is among the variables to be chosen at time 0 by the Stackelberg leader.
The Stackelberg leader uses its understanding of the responses restricted by (8) to manipulate
the follower’s decisions.
Please remember that the follower’s Euler equation is embedded in the system of dynamic
equations 𝑦𝑡+1 = 𝐴𝑦𝑡 + 𝐵𝑢𝑡 .
We define $\Omega(y_0) = \{(\vec y_1, \vec u_0) : y_{t+1} = Ay_t + Bu_t, \ \forall t \geq 0\}$; note that in the definition of $\Omega(y_0)$, $y_0$ is taken as given.

Although it is taken as given in $\Omega(y_0)$, eventually, the $x_0$ component of $y_0$ will be chosen by the Stackelberg leader.
Subproblem 1

$$
v(y_0) = \max_{(\vec y_1, \vec u_0) \in \Omega(y_0)} -\sum_{t=0}^{\infty}\beta^t r(y_t, u_t)
$$

Subproblem 2

$$
w(z_0) = \max_{x_0} v(y_0)
$$

Subproblem 1 can be solved with the Bellman equation

$$
v(y) = \max_{u, y^*}\left\{-r(y, u) + \beta v(y^*)\right\} \tag{9}
$$

where the maximization is subject to

$$
y^* = Ay + Bu
$$
which as in lecture linear regulator gives rise to the algebraic matrix Riccati equation

$$
P = R + \beta A'PA - \beta^2 A'PB(Q + \beta B'PB)^{-1}B'PA
$$

and the optimal decision rule

$$
u_t = -Fy_t, \qquad \text{where} \qquad F = \beta(Q + \beta B'PB)^{-1}B'PA
$$
Subproblem 2

The value function from subproblem 1 satisfies $v(y_0) = -y_0'Py_0$, so an optimal $x_0$ solves the first-order condition

$$
-2P_{21}z_0 - 2P_{22}x_0 = 0,
$$

which implies

$$
x_0 = -P_{22}^{-1}P_{21}z_0
$$
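In code, subproblem 2 reduces to a single linear solve once the Riccati solution is partitioned. A minimal sketch; the matrix P below is a random stand-in for an actual Riccati solution, with a 3-1 split between z and x as in the duopoly application that follows:

import numpy as np

nz, nx = 3, 1
rng = np.random.default_rng(0)
M = rng.standard_normal((nz + nx, nz + nx))
P = M @ M.T                              # stand-in for the Riccati solution P

P21 = P[nz:, :nz]
P22 = P[nz:, nz:]
z0 = np.ones((nz, 1))
x0 = -np.linalg.solve(P22, P21 @ z0)     # x0 = -P22^{-1} P21 z0
print(x0)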
95.5 Stackelberg Plan

Now let's map our duopoly model into the above setup.

We will formulate a state space system

$$
y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}
$$

where in this instance $x_t = v_{1t}$, the time $t$ decision of the follower firm 1.
Now we’ll proceed to cast our duopoly model within the framework of the more general
linear-quadratic structure described above.
That will allow us to compute a Stackelberg plan simply by enlisting a Riccati equation to
solve a linear-quadratic dynamic program.
As emphasized above, firm 1 acts as if firm 2's decisions $\{q_{2t+1}, v_{2t}\}_{t=0}^{\infty}$ are given and beyond its control.
Firm 1's Lagrangian is

$$
L = \sum_{t=0}^{\infty}\beta^t\left\{a_0 q_{1t} - a_1 q_{1t}^2 - a_1 q_{1t}q_{2t} - \gamma v_{1t}^2 + \lambda_t[q_{1t} + v_{1t} - q_{1t+1}]\right\}
$$

First-order conditions for this problem are

$$
\begin{aligned}
\frac{\partial L}{\partial q_{1t}} &= a_0 - 2a_1 q_{1t} - a_1 q_{2t} + \lambda_t - \beta^{-1}\lambda_{t-1} = 0, \quad t \geq 1 \\
\frac{\partial L}{\partial v_{1t}} &= -2\gamma v_{1t} + \lambda_t = 0, \quad t \geq 0
\end{aligned}
$$
These first-order conditions and the constraint $q_{1t+1} = q_{1t} + v_{1t}$ can be rearranged to take the form

$$
\begin{aligned}
v_{1t} &= \beta v_{1t+1} + \frac{\beta a_0}{2\gamma} - \frac{\beta a_1}{\gamma}q_{1t+1} - \frac{\beta a_1}{2\gamma}q_{2t+1} \\
q_{1t+1} &= q_{1t} + v_{1t}
\end{aligned}
$$
We use these two equations as components of the following linear system that confronts a
Stackelberg continuation leader at time 𝑡
$$
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \frac{\beta a_0}{2\gamma} & -\frac{\beta a_1}{2\gamma} & -\frac{\beta a_1}{\gamma} & \beta \end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t+1} \\ q_{1t+1} \\ v_{1t+1} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \\ v_{1t} \end{bmatrix}
+
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} v_{2t}
$$
Time $t$ revenues of firm 2 are $\pi_{2t} = a_0 q_{2t} - a_1 q_{2t}^2 - a_1 q_{1t}q_{2t}$, which evidently equal

$$
z_t' R_1 z_t \equiv \begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \end{bmatrix}'
\begin{bmatrix} 0 & \frac{a_0}{2} & 0 \\ \frac{a_0}{2} & -a_1 & -\frac{a_1}{2} \\ 0 & -\frac{a_1}{2} & 0 \end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \end{bmatrix}
$$
where

$$
y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}, \qquad R = \begin{bmatrix} R_1 & 0 \\ 0 & 0 \end{bmatrix}
$$
The Stackelberg plan can then be represented as

$$
\begin{aligned}
\check x_0 &= -P_{22}^{-1}P_{21}z_0 \\
u_t &= -F\check y_t, \quad t \geq 0 \\
\check y_{t+1} &= (A - BF)\check y_t, \quad t \geq 0
\end{aligned}
$$
From this representation, we can deduce the sequence of functions $\sigma = \{\sigma_t(\check z^t)\}_{t=0}^{\infty}$ that comprise a Stackelberg plan.

For convenience, let $\check A \equiv A - BF$ and partition $\check A$ conformably to the partition $y_t = \begin{bmatrix} \check z_t \\ \check x_t \end{bmatrix}$ as

$$
\begin{bmatrix} \check A_{11} & \check A_{12} \\ \check A_{21} & \check A_{22} \end{bmatrix}
$$
$$
x_t = \sum_{j=1}^{t} H_j^t \check z_{t-j}
$$

where

$$
\begin{aligned}
H_1^t &= \check A_{21} \\
H_2^t &= \check A_{22}\check A_{21} \\
&\;\;\vdots \\
H_{t-1}^t &= \check A_{22}^{t-2}\check A_{21} \\
H_t^t &= \check A_{22}^{t-1}\left(\check A_{21} + \check A_{22}H_0^0\right)
\end{aligned}
$$
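A sketch of the H recursion in code, using stand-in blocks for the partition of Ǎ = A − BF; the shapes match a 3-dimensional ž and a scalar x:

import numpy as np

A21 = np.array([[0.1, 0.2, 0.3]])    # stand-in 1×3 block of Ǎ
A22 = np.array([[0.9]])              # stand-in 1×1 block of Ǎ
H00 = np.array([[0.5, -0.2, 0.1]])   # stand-in for H_0^0 = -P22^{-1} P21

def H(j, t):
    # Coefficient H_j^t in x_t = Σ_{j=1}^t H_j^t ž_{t-j}
    if j < t:
        return np.linalg.matrix_power(A22, j - 1) @ A21
    # j == t picks up the initial condition through H_0^0
    return np.linalg.matrix_power(A22, t - 1) @ (A21 + A22 @ H00)

print(H(1, 5), H(5, 5))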
An optimal decision rule for the Stackelberg leader is

$$
u_t = -F\check y_t \equiv -\begin{bmatrix} F_z & F_x \end{bmatrix}\begin{bmatrix} \check z_t \\ x_t \end{bmatrix}
$$

or

$$
u_t = -F_z\check z_t - F_x\sum_{j=1}^{t}H_j^t\check z_{t-j} = \sigma_t(\check z^t) \tag{10}
$$
Representation (10) confirms that whenever $F_x \neq 0$, the typical situation, the time $t$ component $\sigma_t$ of a Stackelberg plan is history-dependent, meaning that the Stackelberg leader's choice $u_t$ depends not just on $\check z_t$ but also on components of the history $\check z^{t-1}$.
After all, at the end of the day, it will turn out that because we set 𝑧0̌ = 𝑧0 , it will be true
that 𝑧𝑡 = 𝑧𝑡̌ for all 𝑡 ≥ 0.
Then why did we distinguish 𝑧𝑡̌ from 𝑧𝑡 ?
The answer is that if we want to present to the Stackelberg follower a history-dependent
representation of the Stackelberg leader’s sequence 𝑞2⃗ , we must use representation (10) cast
in terms of the history 𝑧𝑡̌ and not a corresponding representation cast in terms of 𝑧𝑡 .
Given the sequence 𝑞2⃗ chosen by the Stackelberg leader in our duopoly model, it turns out
that the Stackelberg follower’s problem is recursive in the natural state variables that con-
front a follower at any time 𝑡 ≥ 0.
This means that the follower’s plan is time consistent.
To verify these claims, we’ll formulate a recursive version of a follower’s problem that builds
on our recursive representation of the Stackelberg leader’s plan and our use of the Big K,
little k idea.
We now use what amounts to another “Big 𝐾, little 𝑘” trick (see rational expectations equi-
librium) to formulate a recursive version of a follower’s problem cast in terms of an ordinary
Bellman equation.
Firm 1, the follower, faces $\{q_{2t}\}_{t=0}^{\infty}$ as a given quantity sequence chosen by the leader and believes that its output price at $t$ satisfies

$$
p_t = a_0 - a_1(q_{1t} + q_{2t}), \qquad t \geq 0
$$
To do so, recall that under the Stackelberg plan, firm 2 sets output according to the $q_{2t}$ component of

$$
y_t = \begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \\ x_t \end{bmatrix}
$$

which is governed by

$$
y_{t+1} = (A - BF)y_t
$$
Define

$$
\tilde y_t = \begin{bmatrix} 1 \\ q_{2t} \\ \tilde q_{1t} \\ \tilde x_t \end{bmatrix}
$$

which evolves according to

$$
\tilde y_{t+1} = (A - BF)\tilde y_t
$$

subject to the initial conditions $\tilde q_{10} = q_{10}$ and $\tilde x_0 = x_0$, where $x_0 = -P_{22}^{-1}P_{21}z_0$ as stated above.
Firm 1's state vector is

$$
X_t = \begin{bmatrix} \tilde y_t \\ q_{1t} \end{bmatrix}
$$

and follows the law of motion

$$
\begin{bmatrix} \tilde y_{t+1} \\ q_{1t+1} \end{bmatrix} = \begin{bmatrix} A - BF & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \tilde y_t \\ q_{1t} \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix}x_t \tag{11}
$$
This specification assures that from the point of view of firm 1, $q_{2t}$ is an exogenous process.
Here
• 𝑞1𝑡
̃ , 𝑥𝑡̃ play the role of Big K
• 𝑞1𝑡 , 𝑥𝑡 play the role of little k
The time $t$ component of firm 1's objective is

$$
X_t'\tilde R X_t - \tilde Q x_t^2 =
\begin{bmatrix} 1 \\ q_{2t} \\ \tilde q_{1t} \\ \tilde x_t \\ q_{1t} \end{bmatrix}'
\begin{bmatrix}
0 & 0 & 0 & 0 & \frac{a_0}{2} \\
0 & 0 & 0 & 0 & -\frac{a_1}{2} \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
\frac{a_0}{2} & -\frac{a_1}{2} & 0 & 0 & -a_1
\end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t} \\ \tilde q_{1t} \\ \tilde x_t \\ q_{1t} \end{bmatrix}
- \gamma x_t^2
$$
$$
x_t = -\tilde F X_t
$$

$$
\tilde X_{t+1} = (\tilde A - \tilde B\tilde F)X_t
$$

With initial state

$$
X_0 = \begin{bmatrix} 1 \\ q_{20} \\ q_{10} \\ x_0 \\ q_{10} \end{bmatrix}
$$

we recover

$$
x_0 = -\tilde F\tilde X_0
$$
which will verify that we have properly set up a recursive representation of the follower’s
problem facing the Stackelberg leader’s 𝑞2⃗ .
Since the follower can solve its problem using dynamic programming, its problem is recursive in what for it are the natural state variables, namely
$$
\begin{bmatrix} 1 \\ q_{2t} \\ \tilde q_{1t} \\ \tilde x_t \end{bmatrix}
$$
95.7 Computing the Stackelberg Plan

Here is our code to compute a Stackelberg plan via a linear-quadratic dynamic program as outlined above
In [3]: # Parameters
a0 = 10
a1 = 2
β = 0.96
γ = 120
n = 300
tol0 = 1e-8
tol1 = 1e-16
tol2 = 1e-2
βs = np.ones(n)
βs[1:] = β
βs = βs.cumprod()
In [4]: # In LQ form
        Alhs = np.eye(4)
        # Euler equation coefficients (fourth row of the system above)
        Alhs[3, :] = β * a0 / (2 * γ), -β * a1 / (2 * γ), -β * a1 / γ, β
        Arhs = np.eye(4)
        Arhs[2, 3] = 1
        Alhsinv = la.inv(Alhs)
        A = Alhsinv @ Arhs
        B = Alhsinv @ np.array([[0, 1, 0, 0]]).T
        # Signs of R and Q are flipped because LQ solves a minimization problem
        R = np.array([[0, -a0 / 2, 0, 0],
                      [-a0 / 2, a1, a1 / 2, 0],
                      [0, a1 / 2, 0, 0],
                      [0, 0, 0, 0]])
        Q = np.array([[γ]])
        lq = LQ(Q, R, A, B, beta=β)
        P, F, d = lq.stationary_values(method='doubling')
# Construct the leader's feedback rule for x0 from the partition of P
P22 = P[3:, 3:]
P21 = P[3:, :3]
H_0_0 = -la.inv(P22) @ P21   # x̌0 = H_0_0 z0

# Simulate forward
π_leader = np.zeros(n)
z0 = np.array([[1, 1, 1]]).T
x0 = H_0_0 @ z0
y0 = np.vstack((z0, x0))
yt, ut = lq.compute_sequence(y0, ts_length=n)[:2]
π_matrix = (R + F.T @ Q @ F)
for t in range(n):
    π_leader[t] = -(yt[:, t].T @ π_matrix @ yt[:, t])
# Display policies
print("Computed policy for Stackelberg leader\n")
print(f"F = {F}")
# Display values
print("Computed values for the Stackelberg leader at t=0:\n")
print(f"v_leader_forward(forward sim) = {v_leader_forward:.4f}")
print(f"v_leader_direct (direct) = {v_leader_direct:.4f}")
Out[7]: True
Out[8]: True
yt_reset = yt.copy()
yt_reset[-1, :] = (H_0_0 @ yt[:3, :])
for t in range(n):
vt_leader[t] = -yt[:, t].T @ P @ yt[:, t]
vt_reset_leader[t] = -yt_reset[:, t].T @ P @ yt_reset[:, t]
plt.tight_layout()
plt.show()
95.9 Recursive Formulation of the Follower's Problem

We now formulate and compute the recursive version of the follower's problem.
We check that the recursive Big 𝐾 , little 𝑘 formulation of the follower’s problem produces
the same output path 𝑞1⃗ that we computed when we solved the Stackelberg problem
Q_tilde = Q
B_tilde = np.array([[0, 0, 0, 0, 1]]).T
In [12]: # Checks that the recursive formulation of the follower's problem gives
# the same solution as the original Stackelberg problem
fig, ax = plt.subplots()
Note: Variables with _tilde are obtained from solving the follower's problem – those without are from the Stackelberg problem.
Out[13]: 6.661338147750939e-16
In [14]: # x0 == x0_tilde
yt[:, 0][-1] - (yt_tilde[:, 1] - yt_tilde[:, 0])[-1] < tol0
Out[14]: True
If we inspect the coefficients in the decision rule $-\tilde F$, we can spot the reason that the follower chooses to set $x_t = \tilde x_t$ when it sets $x_t = -\tilde F X_t$ in the recursive formulation of the follower's problem.
Can you spot what features of 𝐹 ̃ imply this?
Hint: remember the components of 𝑋𝑡
Out[18]: True
P_guess = np.zeros((5, 5))   # initial guess (assumed) for the fixed point
for i in range(1000):
    P_guess = ((R_tilde + F_tilde_star.T @ Q @ F_tilde_star) +
               β * (A_tilde - B_tilde @ F_tilde_star).T @ P_guess
               @ (A_tilde - B_tilde @ F_tilde_star))
Out[20]: 112.65590740578058
Out[21]: 112.6559074057807
F_iter = np.zeros((1, 5))    # initial guess (assumed) for the policy
for i in range(100):
    # Compute P_iter by iterating the value recursion to convergence
    P_iter = np.zeros((5, 5))
    for j in range(1000):
        P_iter = ((R_tilde + F_iter.T @ Q @ F_iter) + β
                  * (A_tilde - B_tilde @ F_iter).T @ P_iter
                  @ (A_tilde - B_tilde @ F_iter))

    # Update F_iter using the LQ policy formula
    F_iter = (β * la.inv(Q + β * B_tilde.T @ P_iter @ B_tilde)
              @ B_tilde.T @ P_iter @ A_tilde)
In [23]: # Simulate the system using `F_tilde_star` and check that it gives the
         # same result as the original solution
         yt_tilde_star = np.zeros((n, 5))
         yt_tilde_star[0, :] = yt_tilde[:, 0]   # same initial condition (assumed)

         for t in range(n-1):
             yt_tilde_star[t+1, :] = (A_tilde - B_tilde @ F_tilde_star) \
                 @ yt_tilde_star[t, :]

         fig, ax = plt.subplots()
         ax.plot(yt_tilde_star[:, 4], 'r', label="q_tilde")
         ax.plot(yt_tilde[2], 'b', label="q")
         ax.legend()
         plt.show()
Out[24]: 0.0
95.10 Markov Perfect Equilibrium

The state vector is

$$
z_t = \begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \end{bmatrix}
$$

and the two firms' controls enter through

$$
B_1 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \qquad B_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
$$

The closed-loop law of motion under decision rules $F_1$ and $F_2$ is

$$
z_{t+1} = (A - B_1F_1 - B_2F_2)z_t
$$
In [25]: # In LQ form
         A = np.eye(3)
         B1 = np.array([[0], [0], [1]])
         B2 = np.array([[0], [1], [0]])

         # Payoff matrices (signs flipped because nnash minimizes)
         R1 = -np.array([[0, 0, a0 / 2],
                         [0, 0, -a1 / 2],
                         [a0 / 2, -a1 / 2, -a1]])
         R2 = -np.array([[0, a0 / 2, 0],
                         [a0 / 2, -a1, -a1 / 2],
                         [0, -a1 / 2, 0]])

         Q1 = Q2 = γ
         S1 = S2 = W1 = W2 = M1 = M2 = 0.0

         # Solve for the Markov perfect equilibrium with quantecon's nnash
         F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1, Q2,
                                   S1, S2, W1, W2, M1, M2, beta=β)

         # Simulate forward
         AF = A - B1 @ F1 - B2 @ F2
         z = np.empty((3, n))
         z[:, 0] = 1, 1, 1
         for t in range(n-1):
             z[:, t+1] = AF @ z[:, t]

         # Display policies
         print("Computed policies for firm 1 and firm 2:\n")
         print(f"F1 = {F1}")
         print(f"F2 = {F2}")
In [26]: q1 = z[1, :]
q2 = z[2, :]
q = q1 + q2 # Total output, MPE
p = a0 - a1 * q # Price, MPE
In [27]: # Computes the maximum difference between the quantities of the two firms
         np.max(np.abs(q1 - q2))
Out[27]: 6.8833827526759706e-15
π_1 = p * q1 - γ * (u1) ** 2
π_2 = p * q2 - γ * (u2) ** 2

# Display values
print("Computed values for firm 1 and firm 2:\n")
print(f"v1 (forward sim) = {v1_forward:.4f}; v1 (direct) = {v1_direct:.4f}")
print(f"v2 (forward sim) = {v2_forward:.4f}; v2 (direct) = {v2_direct:.4f}")
Out[29]: True
for t in range(n):
vt_MPE[t] = -z[:, t].T @ P1 @ z[:, t]
vt_follower[t] = -yt_tilde[:, t].T @ P_tilde @ yt_tilde[:, t]
fig, ax = plt.subplots()
ax.plot(vt_MPE, 'b', label='MPE')
ax.plot(vt_leader, 'r', label='Stackelberg leader')
ax.plot(vt_follower, 'g', label='Stackelberg follower')
ax.set_title(r'MPE vs. Stackelberg Value Function')
ax.set_xlabel('t')
ax.legend(loc=(1.05, 0))
plt.show()
Computed values:
vt_leader(y0) = 150.0324
vt_follower(y0) = 112.6559
vt_MPE(y0) = 133.3296
In [32]: # Compute the difference in total value between the Stackelberg and the MPE
vt_leader[0] + vt_follower[0] - 2 * vt_MPE[0]
Out[32]: -3.970942562087714
Chapter 96

Ramsey Plans, Time Inconsistency, Sustainable Plans
96.1 Contents
• Overview 96.2
• The Model 96.3
• Structure 96.4
• Intertemporal Influences 96.5
• Four Models of Government Policy 96.6
• A Ramsey Planner 96.7
• A Constrained-to-a-Constant-Growth-Rate Ramsey Government 96.8
• Markov Perfect Governments 96.9
• Equilibrium Outcomes for Three Models of Government Policy Making 96.10
• A Fourth Model of Government Decision Making 96.11
• Sustainable or Credible Plan 96.12
• Whose Credible Plan is it? 96.13
• Comparison of Equilibrium Values 96.14
• Note on Dynamic Programming Squared 96.15
Co-author: Sebastian Graves
In addition to what’s in Anaconda, this lecture will need the following libraries:
96.2 Overview
This lecture describes a linear-quadratic version of a model that Guillermo Calvo [29] used to
illustrate the time inconsistency of optimal government plans.
Like Chang [34], we use the model as a laboratory in which to explore the consequences of
different timing protocols for government decision making.
The model focuses attention on intertemporal tradeoffs between
• welfare benefits that anticipated deflation generates by increasing a representative
agent’s liquidity as measured by his or her real money balances, and
• costs associated with distorting taxes that must be used to withdraw money from the
economy in order to generate anticipated deflation
The model features
• rational expectations
• costly government actions at all dates 𝑡 ≥ 1 that increase household utilities at dates
before 𝑡
• two Bellman equations, one that expresses the private sector’s expectation of future in-
flation as a function of current and future government actions, another that describes
the value function of a Ramsey planner
A theme of this lecture is that timing protocols affect outcomes.
We’ll use ideas from papers by Cagan [28], Calvo [29], Stokey [150], [151], Chari and Kehoe
[35], Chang [34], and Abreu [1] as well as from chapter 19 of [108].
In addition, we’ll use ideas from linear-quadratic dynamic programming described in Linear
Quadratic Control as applied to Ramsey problems in Stackelberg problems.
In particular, we have specified the model in a way that allows us to use linear-quadratic
dynamic programming to compute an optimal government plan under a timing protocol in
which a government chooses an infinite sequence of money supply growth rates once and for
all at time 0.
We’ll start with some imports:
96.3 The Model

There is no uncertainty.
Let:
• 𝑝𝑡 be the log of the price level
• 𝑚𝑡 be the log of nominal money balances
• 𝜃𝑡 = 𝑝𝑡+1 − 𝑝𝑡 be the net rate of inflation between 𝑡 and 𝑡 + 1
• 𝜇𝑡 = 𝑚𝑡+1 − 𝑚𝑡 be the net rate of growth of nominal balances
The demand for real balances is governed by a perfect foresight version of the Cagan [28] demand function:

$$
m_t - p_t = -\alpha\theta_t, \qquad \alpha > 0 \tag{1}
$$

for $t \geq 0$.
Equation (1) asserts that the demand for real balances is inversely related to the public’s ex-
pected rate of inflation, which here equals the actual rate of inflation.
(When there is no uncertainty, an assumption of rational expectations simplifies to per-
fect foresight).
(See [141] for a rational expectations version of the model when there is uncertainty)
Subtracting the demand function at time $t$ from the demand function at $t+1$ gives:

$$
\mu_t - \theta_t = -\alpha\theta_{t+1} + \alpha\theta_t
$$

or

$$
\theta_t = \frac{\alpha}{1+\alpha}\theta_{t+1} + \frac{1}{1+\alpha}\mu_t \tag{2}
$$

Because $\alpha > 0$, $0 < \frac{\alpha}{1+\alpha} < 1$.
Definition: For a scalar $x_t$, let $L^2$ be the space of sequences $\{x_t\}_{t=0}^{\infty}$ satisfying

$$
\sum_{t=0}^{\infty}x_t^2 < +\infty
$$

When the sequence $\vec\mu = \{\mu_t\}_{t=0}^{\infty}$ is square summable, solving equation (2) forward gives

$$
\theta_t = \frac{1}{1+\alpha}\sum_{j=0}^{\infty}\left(\frac{\alpha}{1+\alpha}\right)^j\mu_{t+j} \tag{3}
$$
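Equation (3) is easy to evaluate numerically with a truncated μ⃗ path. A minimal sketch, where the value of α and the constant path are illustrative:

import numpy as np

α = 1.0
λ = α / (1 + α)

def θ0(μ_path):
    # Truncation of equation (3): θ_0 for a finite μ path
    j = np.arange(len(μ_path))
    return np.sum(λ**j * μ_path) / (1 + α)

# A constant path μ_t = μ̄ implies θ_0 = μ̄, as equation (2) suggests
print(θ0(0.05 * np.ones(500)))   # ≈ 0.05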
Insight: In the spirit of Chang [34], note that equations (1) and (3) show that $\theta_t$ intermediates how choices of $\mu_{t+j}$, $j = 0, 1, \ldots$ impinge on time $t$ real balances $m_t - p_t = -\alpha\theta_t$.
We shall use this insight to help us simplify and analyze government policy problems.
That future rates of money creation influence earlier rates of inflation creates optimal govern-
ment policy problems in which timing protocols matter.
We can rewrite the model as:

$$
\begin{bmatrix} 1 \\ \theta_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & \frac{1+\alpha}{\alpha} \end{bmatrix}\begin{bmatrix} 1 \\ \theta_t \end{bmatrix} + \begin{bmatrix} 0 \\ -\frac{1}{\alpha} \end{bmatrix}\mu_t
$$

or

$$
x_{t+1} = Ax_t + B\mu_t \tag{4}
$$
We write the model in the state-space form (4) even though 𝜃0 is to be determined and so is
not an initial condition as it ordinarily would be in the state-space model described in Linear
Quadratic Control.
We write the model in the form (4) because we want to apply an approach described in
Stackelberg problems.
Assume that a representative household’s utility of real balances at time 𝑡 is:
$$
U(m_t - p_t) = a_0 + a_1(m_t - p_t) - \frac{a_2}{2}(m_t - p_t)^2, \qquad a_0 > 0,\ a_1 > 0,\ a_2 > 0 \tag{5}
$$
The "bliss level" of real balances is then $\frac{a_1}{a_2}$.

The money demand function (1) and the utility function (5) imply that the utility-maximizing or bliss level of real balances is attained when:

$$
\theta_t = \theta^* = -\frac{a_1}{a_2\alpha}
$$
Below, we introduce the discount factor 𝛽 ∈ (0, 1) that a representative household and a
benevolent government both use to discount future utilities.
(If we set parameters so that 𝜃∗ = log(𝛽), then we can regard a recommendation to set 𝜃𝑡 =
𝜃∗ as a “poor man’s Friedman rule” that attains Milton Friedman’s optimal quantity of
money)
Via equation (3), a government plan $\vec\mu = \{\mu_t\}_{t=0}^{\infty}$ leads to an equilibrium sequence of inflation outcomes $\vec\theta = \{\theta_t\}_{t=0}^{\infty}$.

We assume that social costs $\frac{c}{2}\mu_t^2$ are incurred at $t$ when the government changes the stock of nominal money balances at rate $\mu_t$.
Therefore, the one-period welfare function of a benevolent government is:

$$
-s(\theta_t, \mu_t) \equiv -r(x_t, \mu_t) = \begin{bmatrix} 1 \\ \theta_t \end{bmatrix}'\begin{bmatrix} a_0 & -\frac{a_1\alpha}{2} \\ -\frac{a_1\alpha}{2} & -\frac{a_2\alpha^2}{2} \end{bmatrix}\begin{bmatrix} 1 \\ \theta_t \end{bmatrix} - \frac{c}{2}\mu_t^2 = -x_t'Rx_t - Q\mu_t^2 \tag{6}
$$

The government's time 0 value is

$$
v_0 = -\sum_{t=0}^{\infty}\beta^t r(x_t, \mu_t) = -\sum_{t=0}^{\infty}\beta^t s(\theta_t, \mu_t) \tag{7}
$$
We can represent the dependence of $v_0$ on $(\vec\theta, \vec\mu)$ recursively via the linear difference equation

$$
v_t = -s(\theta_t, \mu_t) + \beta v_{t+1} \tag{8}
$$
96.4 Structure
The following structure is induced by private agents’ behavior as summarized by the demand
function for money (1) that leads to equation (3) that tells how future settings of 𝜇 affect the
current value of 𝜃.
Equation (3) maps a policy sequence of money growth rates $\vec\mu = \{\mu_t\}_{t=0}^{\infty}\in L^2$ into an inflation sequence $\vec\theta = \{\theta_t\}_{t=0}^{\infty}\in L^2$.
Criterion function (7) and the constraint system (4) exhibit the following structure:
• Setting $\mu_t \neq 0$ imposes costs $\frac{c}{2}\mu_t^2$ at time $t$ and at no other times; but
• The money growth rate $\mu_t$ affects the representative household's one-period utilities at all dates $s = 0, 1, \ldots, t$.
That settings of 𝜇 at one date affect household utilities at earlier dates sets the stage for the
emergence of a time-inconsistent optimal government plan under a Ramsey (also called a
Stackelberg) timing protocol.
We’ll study outcomes under a Ramsey timing protocol below.
But we’ll also study the consequences of other timing protocols.
$$
\Omega(x_0) = \{(\vec x_1, \vec\mu_0) : x_{t+1} = Ax_t + B\mu_t, \ \forall t \geq 0\}
$$
96.7.1 Subproblem 1

$$
J(x_0) = \max_{(\vec x_1, \vec\mu_0)\in\Omega(x_0)} -\sum_{t=0}^{\infty}\beta^t r(x_t, \mu_t)
$$

The value function $J(x)$ satisfies the Bellman equation

$$
J(x) = \max_{\mu, x'}\{-r(x, \mu) + \beta J(x')\}
$$

subject to:

$$
x' = Ax + B\mu
$$
As in Stackelberg problems, we map this problem into a linear-quadratic control problem and
then carefully use the optimal value function associated with it.
Guessing that $J(x) = -x'Px$ and substituting into the Bellman equation gives rise to the algebraic matrix Riccati equation:

$$
P = R + \beta A'PA - \beta^2 A'PB(Q + \beta B'PB)^{-1}B'PA
$$

and an optimal decision rule

$$
\mu_t = -Fx_t
$$

where

$$
F = \beta(Q + \beta B'PB)^{-1}B'PA
$$
96.7.2 Subproblem 2

$$
V = \max_{x_0} J(x_0)
$$

Writing $J(x_0)$ in terms of the partition of $P$,

$$
J(x_0) = -\begin{bmatrix} 1 & \theta_0 \end{bmatrix}\begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}\begin{bmatrix} 1 \\ \theta_0 \end{bmatrix} = -P_{11} - 2P_{21}\theta_0 - P_{22}\theta_0^2
$$

The first-order necessary condition for $\theta_0$ is

$$
-2P_{21} - 2P_{22}\theta_0 = 0
$$

which implies

$$
\theta_0^* = -\frac{P_{21}}{P_{22}}
$$
The preceding calculations indicate that we can represent a Ramsey plan $\vec\mu$ recursively with the following system created in the spirit of Chang [34]:

$$
\begin{aligned}
\theta_0 &= \theta_0^* \\
\mu_t &= b_0 + b_1\theta_t \\
\theta_{t+1} &= d_0 + d_1\theta_t
\end{aligned} \tag{9}
$$
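Once the coefficients (b0, b1, d0, d1) and θ0* have been computed, iterating on system (9) is mechanical. A minimal sketch with stand-in coefficient values:

import numpy as np

b0, b1 = 0.01, 0.8     # stand-ins for the μ policy coefficients
d0, d1 = 0.005, 0.85   # stand-ins for the θ transition coefficients
θ_star0 = 0.02         # stand-in for θ0*

T = 10
θ = np.empty(T)
μ = np.empty(T)
θ[0] = θ_star0
for t in range(T):
    μ[t] = b0 + b1 * θ[t]          # money growth implied by promised inflation
    if t < T - 1:
        θ[t + 1] = d0 + d1 * θ[t]  # next period's promised inflation

print(np.column_stack([θ, μ]))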
Multiple roles of 𝜃𝑡
The inflation rate 𝜃𝑡 that appears in the system (9) and equation (3) plays three roles simul-
taneously:
• In equation (3), 𝜃𝑡 is the actual rate of inflation between 𝑡 and 𝑡 + 1.
• In equation (2) and (3), 𝜃𝑡 is also the public’s expected rate of inflation between 𝑡 and
𝑡 + 1.
• In system (9), 𝜃𝑡 is a promised rate of inflation chosen by the Ramsey planner at time 0.
As discussed in Stackelberg problems and Optimal taxation with state-contingent debt, a con-
tinuation Ramsey plan is not a Ramsey plan.
This is a concise way of characterizing the time inconsistency of a Ramsey plan.
The time inconsistency of a Ramsey plan has motivated other models of government decision
making that alter either
• the timing protocol and/or
• assumptions about how government decision makers think their decisions affect private
agents’ beliefs about future government decisions
A government constrained to choose a constant money growth rate $\check\mu$ for all $t \geq 0$ chooses it to maximize

$$
U(-\alpha\check\mu) - \frac{c}{2}\check\mu^2
$$

Here we have imposed the perfect foresight outcome implied by equation (2) that $\theta_t = \check\mu$ when the government chooses a constant $\mu$ for all $t \geq 0$.
With the quadratic form (5) for the utility function $U$, the maximizing $\check\mu$ is

$$
\check\mu = -\frac{\alpha a_1}{\alpha^2 a_2 + c}
$$
In a Markov perfect equilibrium, the time $t$ government believes that future governments will set a constant money growth rate $\bar\mu$, so that

$$
\theta_t = \frac{\alpha}{1+\alpha}\bar\mu + \frac{1}{1+\alpha}\mu_t
$$

It then chooses $\mu_t$ to maximize

$$
W = U(-\alpha\theta_t) - \frac{c}{2}\mu_t^2 + \beta V(\bar\mu)
$$

where $V(\bar\mu)$ is the time 0 value $v_0$ of recursion (8) under a money supply growth rate that is forever constant at $\bar\mu$.

Substituting for $U$ and $\theta_t$ gives:

$$
W = a_0 + a_1\left(-\frac{\alpha^2}{1+\alpha}\bar\mu - \frac{\alpha}{1+\alpha}\mu_t\right) - \frac{a_2}{2}\left(-\frac{\alpha^2}{1+\alpha}\bar\mu - \frac{\alpha}{1+\alpha}\mu_t\right)^2 - \frac{c}{2}\mu_t^2 + \beta V(\bar\mu)
$$

The first-order necessary condition for $\mu_t$ is:

$$
-a_1\frac{\alpha}{1+\alpha} - a_2\left(-\frac{\alpha^2}{1+\alpha}\bar\mu - \frac{\alpha}{1+\alpha}\mu_t\right)\left(-\frac{\alpha}{1+\alpha}\right) - c\mu_t = 0
$$
Rearranging we get:

$$
\mu_t = \frac{-a_1}{\frac{1+\alpha}{\alpha}c + \frac{\alpha}{1+\alpha}a_2} - \frac{\alpha^2 a_2}{\left[\frac{1+\alpha}{\alpha}c + \frac{\alpha}{1+\alpha}a_2\right](1+\alpha)}\,\bar\mu
$$

A Markov perfect equilibrium (MPE) outcome sets $\mu_t = \bar\mu$:

$$
\mu_t = \bar\mu = \frac{-a_1}{\frac{1+\alpha}{\alpha}c + \frac{\alpha}{1+\alpha}a_2 + \frac{\alpha^2}{1+\alpha}a_2}
$$

In light of results presented in the previous section, this can be simplified to:

$$
\bar\mu = -\frac{\alpha a_1}{\alpha^2 a_2 + (1+\alpha)c}
$$
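A quick numerical comparison of the constant rates derived above; the parameter values are illustrative only:

α, a1, a2, c = 1.0, 0.5, 3.0, 2.0

μ_check = -α * a1 / (α**2 * a2 + c)             # constrained Ramsey rate μ̌
μ_bar = -α * a1 / (α**2 * a2 + (1 + α) * c)     # MPE rate μ̄
θ_star = -a1 / (a2 * α)                         # bliss inflation rate θ*

print(f"μ̌ = {μ_check:.4f}, μ̄ = {μ_bar:.4f}, θ* = {θ_star:.4f}")

Because $(1+\alpha)c > c$, the MPE rate satisfies $|\bar\mu| < |\check\mu|$: without commitment, the government deflates less aggressively than the constrained Ramsey government.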
Below we compute sequences {𝜃𝑡 , 𝜇𝑡 } under a Ramsey plan and compare these with the con-
stant levels of 𝜃 and 𝜇 in a) a Markov Perfect Equilibrium, and b) a Ramsey plan in which
the planner is restricted to choose 𝜇𝑡 = 𝜇̌ for all 𝑡 ≥ 0.
We denote the Ramsey sequence as $\theta^R, \mu^R$ and the MPE values as $\theta^{MPE}, \mu^{MPE}$.

The bliss level of inflation is denoted by $\theta^*$.
First, we will create a class ChangLQ that solves the models and stores their values

class ChangLQ:
    """
    Class to solve the LQ Chang model and store its values
    """
    def __init__(self, α, α0, α1, α2, c, T=1000, θ_n=200):
        # Record parameters
        self.α, self.α0, self.α1 = α, α0, α1
        self.α2, self.c, self.T, self.θ_n = α2, c, T, θ_n

        # LQ Matrices
        R = -np.array([[α0, -α1 * α / 2],
                       [-α1 * α / 2, -α2 * α**2 / 2]])
        Q = -np.array([[-c / 2]])
        A = np.array([[1, 0], [0, (1 + α) / α]])
        B = np.array([[0], [-1 / α]])
# Solve Subproblem 2
self.θ_R = -self.P[0, 1] / self.P[1, 1]
self.J_series = J_series
self.μ_series = μ_series
self.θ_series = θ_series
J_LB = min(J_space)
J_UB = max(J_space)
J_range = J_UB - J_LB
self.J_LB = J_LB - 0.05 * J_range
self.J_UB = J_UB + 0.05 * J_range
self.J_range = J_range
self.J_space = J_space
self.θ_space = θ_space
self.μ_space = μ_space
self.θ_prime = θ_prime
self.check_space = check_space
Out[4]: 0.8464817248906141
The following code generates a figure that plots the value function from the Ramsey Planner's problem, which is maximized at $\theta_0^R$.

The figure also shows the limiting value $\theta_\infty^R$ to which the inflation rate $\theta_t$ converges under the Ramsey plan and compares it to the MPE value and the bliss value.
"""
fig, ax = plt.subplots()
ax.set_xlim([clq.θ_LB, clq.θ_UB])
ax.set_ylim([clq.J_LB, clq.J_UB])
t1 = clq.θ_space[np.argmax(clq.J_space)]
tR = clq.θ_series[1, -1]
θ_points = [t1, tR, clq.θ_B, clq.θ_MPE]
labels = [r"$\theta_0^R$", r"$\theta_\infty^R$",
r"$\theta^*$", r"$\theta^{MPE}$"]
plot_value_function(clq)
The next code generates a figure that plots the value function from the Ramsey Planner's problem as well as that for a Ramsey planner that must choose a constant $\mu$ (that in turn equals an implied constant $\theta$).
plt.xlabel(r"$\theta$", fontsize=18)
ax.plot(clq.θ_space, clq.check_space,
lw=2, label=r"$V^\check(\theta)$")
plt.legend(fontsize=14, loc='upper left')
θ_points = [clq.θ_space[np.argmax(clq.J_space)],
clq.μ_check]
labels = [r"$\theta_0^R$", r"$\theta^\check$"]
compare_ramsey_check(clq)
The next code generates figures that plot the policy functions for a continuation Ramsey
planner.
The left figure shows the choice of 𝜃′ chosen by a continuation Ramsey planner who inherits
𝜃.
The right figure plots a continuation Ramsey planner’s choice of 𝜇 as a function of an inher-
ited 𝜃.
ax = axes[0]
ax.set_ylim([clq.θ_LB, clq.θ_UB])
ax.plot(clq.θ_space, clq.θ_prime,
label=r"$\theta'(\theta)$", lw=2)
x = np.linspace(clq.θ_LB, clq.θ_UB, 5)
ax.plot(x, x, 'k--', lw=2, alpha=0.7)
ax.set_ylabel(r"$\theta'$", fontsize=18)
θ_points = [clq.θ_space[np.argmax(clq.J_space)],
clq.θ_series[1, -1]]
ax = axes[1]
μ_min = min(clq.μ_space)
μ_max = max(clq.μ_space)
μ_range = μ_max - μ_min
ax.set_ylim([μ_min - 0.05 * μ_range, μ_max + 0.05 * μ_range])
ax.plot(clq.θ_space, clq.μ_space, lw=2)
ax.set_ylabel(r"$\mu(\theta)$", fontsize=18)
for ax in axes:
ax.set_xlabel(r"$\theta$", fontsize=18)
ax.set_xlim([clq.θ_LB, clq.θ_UB])
plot_policy_functions(clq)
The following code generates a figure that plots sequences of 𝜇 and 𝜃 in the Ramsey plan and
compares these to the constant levels in a MPE and in a Ramsey plan with a government re-
stricted to set 𝜇𝑡 to a constant for all 𝑡.
plt.tight_layout()
plt.show()
plot_ramsey_MPE(clq)
The variation over time in 𝜇⃗ chosen by the Ramsey planner is a symptom of time inconsis-
tency.
• The Ramsey planner reaps immediate benefits from promising lower inflation later to be
achieved by costly distorting taxes.
• These benefits are intermediated by reductions in expected inflation that precede the
reductions in money creation rates that rationalize them, as indicated by equation (3).
• A government authority offered the opportunity to ignore effects on past utilities and to
reoptimize at date 𝑡 ≥ 1 would, if allowed, want to deviate from a Ramsey plan.
Note: A modified Ramsey plan constructed under the restriction that 𝜇𝑡 must be constant
over time is time consistent (see 𝜇̌ and 𝜃 ̌ in the above graphs).
In settings in which governments actually choose sequentially, many economists regard a time
inconsistent plan implausible because of the incentives to deviate that occur along the plan.
A way to summarize this defect in a Ramsey plan is to say that it is not credible because
there endure incentives for policymakers to deviate from it.
For that reason, the Markov perfect equilibrium concept attracts many economists.
Research by Abreu [1], Chari and Kehoe [35] [150], and Stokey [151] discovered conditions
under which a Ramsey plan can be rescued from the complaint that it is not credible.
They accomplished this by expanding the description of a plan to include expectations about
adverse consequences of deviating from it that can serve to deter deviations.
We turn to such theories of sustainable plans next.
The government's one-period return function $s(\theta, \mu)$ described in equation (6) above has the property that for all $\theta$

$$
-s(\theta, 0) \geq -s(\theta, \mu)
$$
This inequality implies that whenever the policy calls for the government to set 𝜇 ≠ 0, the
government could raise its one-period payoff by setting 𝜇 = 0.
Disappointing private sector expectations in that way would increase the government’s cur-
rent payoff but would have adverse consequences for subsequent government payoffs be-
cause the private sector would alter its expectations about future settings of 𝜇.
The temporary gain constitutes the government’s temptation to deviate from a plan.
If the government at 𝑡 is to resist the temptation to raise its current payoff, it is only because
it forecasts adverse consequences that its setting of 𝜇𝑡 would bring for continuation govern-
ment payoffs via alterations in the private sector’s expectations.
A plan $\vec\mu^A$ (here the superscript $A$ is for Abreu) is said to be self-enforcing if

$$
\begin{aligned}
v_j^A &= -s(\theta_j^A, \mu_j^A) + \beta v_{j+1}^A \\
&\geq -s(\theta_j^A, 0) + \beta v_0^A \equiv v_j^{A,D}, \qquad j \geq 0
\end{aligned} \tag{10}
$$
(Here it is useful to recall that setting 𝜇 = 0 is the maximizing choice for the government’s
one-period return function)
The first line tells the consequences of confirming private agents’ expectations by following
the plan, while the second line tells the consequences of disappointing private agents’ expecta-
tions by deviating from the plan.
A consequence of the inequality stated in the definition is that a self-enforcing plan is credi-
ble.
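The self-enforcement inequality (10) can be checked numerically once a candidate plan's one-period returns are in hand. The following sketch uses stand-in return sequences and, as a simplifying assumption, treats the final period's return as repeating forever:

import numpy as np

β = 0.85
s_plan = np.array([0.5, 0.3, 0.2, 0.1])    # stand-ins for s(θ_j^A, μ_j^A)
s_dev = np.array([0.4, 0.25, 0.18, 0.09])  # stand-ins for s(θ_j^A, 0)

# Value of following the plan from each j
v = np.zeros(len(s_plan))
v[-1] = -s_plan[-1] / (1 - β)
for j in range(len(s_plan) - 2, -1, -1):
    v[j] = -s_plan[j] + β * v[j + 1]

# Deviation values v_j^{A,D} = -s(θ_j^A, 0) + β v_0^A
v_dev = -s_dev + β * v[0]

print(np.all(v >= v_dev))   # the plan is self-enforcing iff this is True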
Self-enforcing plans can be used to construct other credible plans, including ones with better
values.
Thus, where $\vec v^A$ is the value associated with a self-enforcing plan $\vec\mu^A$, a sufficient condition for another plan $\vec\mu$ associated with inflation $\vec\theta$ and value $\vec v$ to be credible is that

$$
\begin{aligned}
v_j &= -s(\theta_j, \mu_j) + \beta v_{j+1} \\
&\geq -s(\theta_j, 0) + \beta v_0^A \qquad \forall j \geq 0
\end{aligned} \tag{11}
$$

For this condition to hold, it is necessary and sufficient that

$$
-s(\theta_j, 0) - (-s(\theta_j, \mu_j)) \leq \beta(v_{j+1} - v_0^A)
$$

The left side of the above inequality is the government's gain from deviating from the plan, while the right side is the government's loss from deviating from the plan.
A government never wants to deviate from a credible plan.
Abreu taught us that a key step in constructing a credible plan is first constructing a self-enforcing plan that has a low time 0 value.
The idea is to use the self-enforcing plan as a continuation plan whenever the government’s
choice at time 𝑡 fails to confirm private agents’ expectation.
We shall use a construction featured in Abreu ([1]) to construct a self-enforcing plan with low
time 0 value.
Abreu ([1]) invented a way to create a self-enforcing plan with a low initial value.
Imitating his idea, we can construct a self-enforcing plan 𝜇⃗ with a low time 0 value to the
government by insisting that future government decision makers set 𝜇𝑡 to a value yielding
low one-period utilities to the household for a long time, after which government decisions
thereafter yield high one-period utilities.
• Low one-period utilities early are a stick
• High one-period utilities later are a carrot
Consider a candidate plan $\vec\mu^A$ that sets $\mu_t^A = \bar\mu$ (a high positive number) for $T_A$ periods, and then reverts to the Ramsey plan.

Denote this sequence by $\{\mu_t^A\}_{t=0}^{\infty}$.
The implied inflation path is

$$
\theta_t^A = \frac{1}{1+\alpha}\sum_{j=0}^{\infty}\left(\frac{\alpha}{1+\alpha}\right)^j\mu_{t+j}^A
$$

and the plan's time 0 value is

$$
v_0^A = -\sum_{t=0}^{T_A-1}\beta^t s(\theta_t^A, \mu_t^A) + \beta^{T_A}J(\theta_0^R)
$$
For an appropriate 𝑇𝐴 , this plan can be verified to be self-enforcing and therefore credible.
clq.V_A = np.zeros(T)
for t in range(T):
clq.V_A[t] = sum(U_A[t:] / clq.β**t)
plt.tight_layout()
plt.show()
abreu_plan(clq)
To confirm that the plan $\vec\mu^A$ is self-enforcing, we plot an object that we call $V_t^{A,D}$, defined in the key inequality in the second line of equation (10) above.

$V_t^{A,D}$ is the value at $t$ of deviating from the self-enforcing plan $\vec\mu^A$ by setting $\mu_t = 0$ and then restarting the plan at $v_0^A$ at $t+1$.
Out[10]: True
check_ramsey(clq)
Out[11]: True
We can represent a sustainable plan recursively by taking the continuation value $v_t$ as a state variable.

We form the following 3-tuple of functions:

$$
\begin{aligned}
\hat\mu_t &= \nu_\mu(v_t) \\
\theta_t &= \nu_\theta(v_t) \\
v_{t+1} &= \nu_v(v_t, \mu_t)
\end{aligned} \tag{12}
$$
In [12]: clq.J_series[0]
Out[12]: 6.67918822960449
In [13]: clq.J_check
Out[13]: 6.676729524674898
In [14]: clq.J_MPE
Out[14]: 6.663435886995107
We have also computed credible plans for a government or sequence of governments that
choose sequentially.
These include
• a self-enforcing plan that gives a low initial value 𝑣0 .
• a better plan – possibly one that attains values associated with a Ramsey plan – that is not self-enforcing.
96.15 Note on Dynamic Programming Squared
The theory deployed in this lecture is an application of what we nickname dynamic pro-
gramming squared.
The nickname refers to the fact that a value satisfying one Bellman equation is itself an argu-
ment in a second Bellman equation.
Thus, our models have involved two Bellman equations:
• equation (2) expresses how $\theta_t$ depends on $\mu_t$ and $\theta_{t+1}$
• recursion (8) expresses how value $v_t$ depends on $(\mu_t, \theta_t)$ and $v_{t+1}$
A value $\theta$ from one Bellman equation appears as an argument of a second Bellman equation for another value $v$.
Chapter 97

Optimal Taxation with State-Contingent Debt
97.1 Contents
• Overview 97.2
• A Competitive Equilibrium with Distorting Taxes 97.3
• Recursive Formulation of the Ramsey Problem 97.4
• Examples 97.5
In addition to what’s in Anaconda, this lecture will need the following libraries:
97.2 Overview
This lecture describes a celebrated model of optimal fiscal policy by Robert E. Lucas, Jr., and
Nancy Stokey [111].
The model revisits classic issues about how to pay for a war.
Here a war means a more or less temporary surge in an exogenous government expenditure
process.
The model features
• a government that must finance an exogenous stream of government expenditures with
either
– a flat rate tax on labor, or
– purchases and sales from a full array of Arrow state-contingent securities
• a representative household that values consumption and leisure
• a linear production function mapping labor into a single good
• a Ramsey planner who at time 𝑡 = 0 chooses a plan for taxes and trades of Arrow secu-
rities for all 𝑡 ≥ 0
After first presenting the model in a space of sequences, we shall represent it recursively
in terms of two Bellman equations formulated along lines that we encountered in Dynamic
Stackelberg models.
As in Dynamic Stackelberg models, to apply dynamic programming we shall define the state
vector artfully.
In particular, we shall include forward-looking variables that summarize optimal responses of
private agents to a Ramsey plan.
See Optimal taxation for analysis within a linear-quadratic setting.
Let’s start with some standard imports:
97.3 A Competitive Equilibrium with Distorting Taxes

For $t \geq 0$, a history $s^t = [s_t, s_{t-1}, \ldots, s_0]$ of an exogenous state $s_t$ has joint probability density $\pi_t(s^t)$.
We begin by assuming that government purchases 𝑔𝑡 (𝑠𝑡 ) at time 𝑡 ≥ 0 depend on 𝑠𝑡 .
Let 𝑐𝑡 (𝑠𝑡 ), ℓ𝑡 (𝑠𝑡 ), and 𝑛𝑡 (𝑠𝑡 ) denote consumption, leisure, and labor supply, respectively, at
history 𝑠𝑡 and date 𝑡.
A representative household is endowed with one unit of time that can be divided between leisure $\ell_t$ and labor $n_t$:

$$
n_t(s^t) + \ell_t(s^t) = 1 \tag{1}
$$

Output equals $n_t(s^t)$ and can be divided between $c_t(s^t)$ and $g_t(s^t)$

$$
c_t(s^t) + g_t(s^t) = n_t(s^t) \tag{2}
$$
A representative household ranks consumption and leisure streams according to

$$
\sum_{t=0}^{\infty}\sum_{s^t}\beta^t\pi_t(s^t)u[c_t(s^t), \ell_t(s^t)] \tag{3}
$$
where the utility function 𝑢 is increasing, strictly concave, and three times continuously dif-
ferentiable in both arguments.
The technology pins down a pre-tax wage rate to unity for all 𝑡, 𝑠𝑡 .
The government imposes a flat-rate tax 𝜏𝑡 (𝑠𝑡 ) on labor income at time 𝑡, history 𝑠𝑡 .
There are complete markets in one-period Arrow securities.
One unit of an Arrow security issued at time 𝑡 at history 𝑠𝑡 and promising to pay one unit of
time 𝑡 + 1 consumption in state 𝑠𝑡+1 costs 𝑝𝑡+1 (𝑠𝑡+1 |𝑠𝑡 ).
The government issues one-period Arrow securities each period.
The government has a sequence of budget constraints whose time $t \geq 0$ component is

$$
g_t(s^t) = \tau_t(s^t)n_t(s^t) + \sum_{s_{t+1}}p_{t+1}(s_{t+1}|s^t)b_{t+1}(s_{t+1}|s^t) - b_t(s_t|s^{t-1}) \tag{4}
$$
where
• 𝑝𝑡+1 (𝑠𝑡+1 |𝑠𝑡 ) is a competitive equilibrium price of one unit of consumption at date 𝑡 + 1
in state 𝑠𝑡+1 at date 𝑡 and history 𝑠𝑡 .
• 𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) is government debt falling due at time 𝑡, history 𝑠𝑡 .
Government debt 𝑏0 (𝑠0 ) is an exogenous initial condition.
The representative household has a sequence of budget constraints whose time $t \geq 0$ component is

$$
c_t(s^t) + \sum_{s_{t+1}}p_{t+1}(s_{t+1}|s^t)b_{t+1}(s_{t+1}|s^t) = [1-\tau_t(s^t)]n_t(s^t) + b_t(s_t|s^{t-1}) \qquad \forall t \geq 0 \tag{5}
$$
The household faces the price system as a price-taker and takes the government policy as
given.
The household chooses $\{c_t(s^t), \ell_t(s^t)\}_{t=0}^{\infty}$ to maximize (3) subject to (5) and (1) for all $t, s^t$.
We find it convenient sometimes to work with the Arrow-Debreu price system that is implied
by a sequence of Arrow securities prices.
Let 𝑞𝑡0 (𝑠𝑡 ) be the price at time 0, measured in time 0 consumption goods, of one unit of con-
sumption at time 𝑡, history 𝑠𝑡 .
The following recursion relates Arrow-Debreu prices $\{q_t^0(s^t)\}_{t=0}^{\infty}$ to Arrow securities prices $\{p_{t+1}(s_{t+1}|s^t)\}_{t=0}^{\infty}$

$$
q_{t+1}^0(s^{t+1}) = p_{t+1}(s_{t+1}|s^t)\,q_t^0(s^t) \quad \text{s.t.} \quad q_0^0(s^0) = 1 \tag{6}
$$
Arrow-Debreu prices are useful when we want to compress a sequence of budget constraints
into a single intertemporal budget constraint, as we shall find it convenient to do below.
We apply a popular approach to solving a Ramsey problem, called the primal approach.
The idea is to use first-order conditions for household optimization to eliminate taxes and
prices in favor of quantities, then pose an optimization problem cast entirely in terms of
quantities.
After Ramsey quantities have been found, taxes and prices can then be unwound from the
allocation.
The primal approach uses four steps:

1. Obtain first-order conditions of the household's problem and solve them for $\{q_t^0(s^t), \tau_t(s^t)\}_{t=0}^{\infty}$ as functions of the allocation $\{c_t(s^t), n_t(s^t)\}_{t=0}^{\infty}$.

2. Substitute these expressions for taxes and prices in terms of the allocation into the household's present-value budget constraint.

   • This intertemporal constraint involves only the allocation and is regarded as an implementability constraint.

3. Find the allocation that maximizes the utility of the representative household (3) subject to the feasibility constraints (1) and (2) and the implementability condition derived in step 2.

4. Use the Ramsey allocation together with the formulas from step 1 to find taxes and prices.
By sequential substitution of one one-period budget constraint (5) into another, we can obtain the household's present-value budget constraint:

$$
\sum_{t=0}^{\infty}\sum_{s^t}q_t^0(s^t)c_t(s^t) = \sum_{t=0}^{\infty}\sum_{s^t}q_t^0(s^t)[1-\tau_t(s^t)]n_t(s^t) + b_0 \tag{7}
$$
The household's first-order conditions imply

$$
(1 - \tau_t(s^t)) = \frac{u_l(s^t)}{u_c(s^t)} \tag{8}
$$

and

$$
p_{t+1}(s_{t+1}|s^t) = \beta\pi(s_{t+1}|s^t)\left(\frac{u_c(s^{t+1})}{u_c(s^t)}\right) \tag{9}
$$
$$
q_t^0(s^t) = \beta^t\pi_t(s^t)\frac{u_c(s^t)}{u_c(s^0)} \tag{10}
$$
Using the first-order conditions (8) and (9) to eliminate taxes and prices from (7), we derive the implementability condition

$$
\sum_{t=0}^{\infty}\sum_{s^t}\beta^t\pi_t(s^t)\left[u_c(s^t)c_t(s^t) - u_\ell(s^t)n_t(s^t)\right] - u_c(s^0)b_0 = 0 \tag{11}
$$
The Ramsey problem is to choose an allocation that maximizes

$$
\sum_{t=0}^{\infty}\sum_{s^t}\beta^t\pi_t(s^t)u[c_t(s^t), 1 - n_t(s^t)] \tag{12}
$$

subject to (11).
To approach the Ramsey problem, define the pseudo one-period utility

$$
V[c_t(s^t), n_t(s^t), \Phi] = u[c_t(s^t), 1 - n_t(s^t)] + \Phi\left[u_c(s^t)c_t(s^t) - u_\ell(s^t)n_t(s^t)\right] \tag{13}
$$

where $\Phi$ is a Lagrange multiplier on the implementability condition (11).
Next form the Lagrangian

$$
J = \sum_{t=0}^{\infty}\sum_{s^t}\beta^t\pi_t(s^t)\left\{V[c_t(s^t), n_t(s^t), \Phi] + \theta_t(s^t)[n_t(s^t) - c_t(s^t) - g_t(s^t)]\right\} - \Phi u_c(0)b_0 \tag{14}
$$
where $\{\theta_t(s^t); \forall s^t\}_{t\geq 0}$ is a sequence of Lagrange multipliers on the feasibility conditions (2).
Given an initial government debt 𝑏0 , we want to maximize 𝐽 with respect to
{𝑐𝑡 (𝑠𝑡 ), 𝑛𝑡 (𝑠𝑡 ); ∀𝑠𝑡 }𝑡≥0 and to minimize with respect to {𝜃(𝑠𝑡 ); ∀𝑠𝑡 }𝑡≥0 .
The first-order conditions for the Ramsey problem for periods $t \geq 1$ and $t = 0$, respectively, are

$$
\begin{aligned}
c_t(s^t){:}\quad &(1+\Phi)u_c(s^t) + \Phi\left[u_{cc}(s^t)c_t(s^t) - u_{\ell c}(s^t)n_t(s^t)\right] - \theta_t(s^t) = 0, \quad t \geq 1 \\
n_t(s^t){:}\quad &-(1+\Phi)u_\ell(s^t) - \Phi\left[u_{c\ell}(s^t)c_t(s^t) - u_{\ell\ell}(s^t)n_t(s^t)\right] + \theta_t(s^t) = 0, \quad t \geq 1
\end{aligned} \tag{15}
$$

and

$$
\begin{aligned}
c_0(s^0, b_0){:}\quad &(1+\Phi)u_c(s^0, b_0) + \Phi\left[u_{cc}(s^0, b_0)c_0(s^0, b_0) - u_{\ell c}(s^0, b_0)n_0(s^0, b_0)\right] - \theta_0(s^0, b_0) \\
&\quad - \Phi u_{cc}(s^0, b_0)b_0 = 0 \\
n_0(s^0, b_0){:}\quad &-(1+\Phi)u_\ell(s^0, b_0) - \Phi\left[u_{c\ell}(s^0, b_0)c_0(s^0, b_0) - u_{\ell\ell}(s^0, b_0)n_0(s^0, b_0)\right] + \theta_0(s^0, b_0) \\
&\quad + \Phi u_{c\ell}(s^0, b_0)b_0 = 0
\end{aligned} \tag{16}
$$
Please note how these first-order conditions differ between 𝑡 = 0 and 𝑡 ≥ 1.
It is instructive to use first-order conditions (15) for 𝑡 ≥ 1 to eliminate the multipliers 𝜃𝑡 (𝑠𝑡 ).
For convenience, we suppress the time subscript and the index 𝑠𝑡 and obtain
If two histories $s^t$ and $\tilde s^\tau$ are such that

$$
g_t(s^t) = g_\tau(\tilde s^\tau) = g
$$

then it follows from (17) that the Ramsey choices of consumption and leisure, $(c_t(s^t), \ell_t(s^t))$ and $(c_\tau(\tilde s^\tau), \ell_\tau(\tilde s^\tau))$, are identical.
The proposition asserts that the optimal allocation is a function of the currently realized
quantity of government purchases 𝑔 only and does not depend on the specific history that
preceded that realization of 𝑔.
Also, assume that government purchases 𝑔 are an exact time-invariant function 𝑔(𝑠) of 𝑠.
We maintain these assumptions throughout the remainder of this lecture.
We complete the Ramsey plan by computing the Lagrange multiplier Φ on the implementabil-
ity constraint (11).
Government budget balance restricts Φ via the following line of reasoning.
The household’s first-order conditions imply
$$
(1 - \tau_t(s^t)) = \frac{u_l(s^t)}{u_c(s^t)} \tag{19}
$$

and

$$
p_{t+1}(s_{t+1}|s^t) = \beta\Pi(s_{t+1}|s^t)\frac{u_c(s^{t+1})}{u_c(s^t)} \tag{20}
$$
Substituting from (19), (20), and the feasibility condition (2) into the recursive version (5) of the household budget constraint gives

$$
u_c(s^t)[n_t(s^t) - g_t(s^t)] + \beta\sum_{s_{t+1}}\Pi(s_{t+1}|s^t)u_c(s^{t+1})b_{t+1}(s_{t+1}|s^t) = u_l(s^t)n_t(s^t) + u_c(s^t)b_t(s_t|s^{t-1}) \tag{21}
$$
Define $x_t(s^t) = u_c(s^t)b_t(s_t|s^{t-1})$.

Notice that $x_t(s^t)$ appears on the right side of (21) while $\beta$ times the conditional expectation of $x_{t+1}(s^{t+1})$ appears on the left side.
Hence the equation shares much of the structure of a simple asset pricing equation with 𝑥𝑡
being analogous to the price of the asset at time 𝑡.
We learned earlier that for a Ramsey allocation 𝑐𝑡 (𝑠𝑡 ), 𝑛𝑡 (𝑠𝑡 ) and 𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ), and therefore
also 𝑥𝑡 (𝑠𝑡 ), are each functions of 𝑠𝑡 only, being independent of the history 𝑠𝑡−1 for 𝑡 ≥ 1.
That means that we can express equation (21) as

$$
u_c(s)[n(s) - g(s)] + \beta\sum_{s'}\Pi(s'|s)x'(s') = u_l(s)n(s) + x(s) \tag{22}
$$

where $s'$ denotes a next period value of $s$ and $x'(s')$ denotes a next period value of $x$.
Equation (22) is easy to solve for 𝑥(𝑠) for 𝑠 = 1, … , 𝑆.
If we let $\vec n, \vec g, \vec x$ denote $S \times 1$ vectors whose $i$th elements are the respective $n$, $g$, and $x$ values when $s = i$, and let $\Pi$ be the transition matrix for the Markov state $s$, then we can express (22) as the matrix equation

$$
\vec u_c(\vec n - \vec g) + \beta\Pi\vec x = \vec u_l\vec n + \vec x \tag{23}
$$

with solution

$$
\vec x = (I - \beta\Pi)^{-1}\left[\vec u_c(\vec n - \vec g) - \vec u_l\vec n\right] \tag{24}
$$

In these equations, by $\vec u_c\vec n$, for example, we mean element-by-element multiplication of the two vectors.
After solving for $\vec x$, we can find $b(s_t|s^{t-1})$ in Markov state $s_t = s$ from $b(s) = \frac{x(s)}{u_c(s)}$ or the matrix equation

$$
\vec b = \frac{\vec x}{\vec u_c} \tag{25}
$$

where division here means an element-by-element division of the respective components of the $S \times 1$ vectors $\vec x$ and $\vec u_c$.
Here is a computational algorithm:

1. Start with a guess for the value for $\Phi$, then use the first-order conditions and the feasibility conditions to compute $c(s^t), n(s^t)$ for $s \in [1, \ldots, S]$ and $c_0(s^0, b_0)$ and $n_0(s^0, b_0)$, given $\Phi$.

2. Solve equation (24) for the $S$ elements of $\vec x$; these depend on $\Phi$.

3. Find a $\Phi$ that satisfies

$$
u_{c,0}b_0 = u_{c,0}(n_0 - g_0) - u_{l,0}n_0 + \beta\sum_{s=1}^{S}\Pi(s|s_0)x(s) \tag{26}
$$

by gradually raising $\Phi$ if the left side of (26) exceeds the right side and lowering $\Phi$ if the left side is less than the right side.

4. After computing a Ramsey allocation, recover the flat tax rate on labor from (8) and the implied one-period Arrow securities prices from (9).
In our calculations below and in a subsequent lecture based on an extension of the Lucas-
Stokey model by Aiyagari, Marcet, Sargent, and Seppälä (2002) [7], we shall modify the one-
period utility function assumed above.
(We adopted the preceding utility specification because it was the one used in the original
[111] paper)
We will modify their specification by instead assuming that the representative agent has utility function

$$
u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}
$$

and that the feasibility condition is

$$
c_t + g_t = n_t
$$
With these understandings, equations (17) and (18) simplify in the case of the CRRA utility
function.
They become

$$
(1+\Phi)\left[u_c(c) + u_n(c+g)\right] + \Phi\left[cu_{cc}(c) + (c+g)u_{nn}(c+g)\right] = 0 \tag{27}
$$

and

$$
(1+\Phi)\left[u_c(c_0) + u_n(c_0+g_0)\right] + \Phi\left[c_0u_{cc}(c_0) + (c_0+g_0)u_{nn}(c_0+g_0)\right] - \Phi u_{cc}(c_0)b_0 = 0 \tag{28}
$$
In equation (27), it is understood that 𝑐 and 𝑔 are each functions of the Markov state 𝑠.
In addition, the time 𝑡 = 0 budget constraint is satisfied at 𝑐0 and initial government debt 𝑏0 :
$$
b_0 + g_0 = \tau_0(c_0 + g_0) + \frac{\bar b}{R_0} \tag{29}
$$
where 𝑅0 is the gross interest rate for the Markov state 𝑠0 that is assumed to prevail at time
𝑡 = 0 and 𝜏0 is the time 𝑡 = 0 tax rate.
In equation (29), it is understood that
$$
\tau_0 = 1 - \frac{u_{l,0}}{u_{c,0}}
$$

and

$$
R_0^{-1} = \beta\sum_{s=1}^{S}\Pi(s|s_0)\frac{u_c(s)}{u_{c,0}}
$$
class SequentialAllocation:
    '''
    Class that takes a CESutility or BGPutility object as input and
    returns the planner's allocation as a function of the multiplier
    on the implementability constraint μ.
    '''

    def find_first_best(self):
        '''
        Find the first best allocation
        '''
        model = self.model
        S, Θ, G = self.S, self.Θ, self.G
        Uc, Un = model.Uc, model.Un

        def res(z):
            c = z[:S]
            n = z[S:]
            return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])

        res = root(res, 0.5 * np.ones(2 * S))  # solve the first-best FOCs

        if not res.success:
            raise Exception('Could not find first best')

        self.cFB = res.x[:S]
        self.nFB = res.x[S:]
        def FOC(z):
            c = z[:S]
            n = z[S:2 * S]
            Ξ = z[2 * S:]
            # FOCs of c and n, plus feasibility
            return np.hstack([Uc(c, n) - μ * (Ucc(c, n) * c + Uc(c, n)) - Ξ,
                              Un(c, n) - μ * (Unn(c, n) * n + Un(c, n)) + Θ * Ξ,
                              Θ * n - c - G])

        # Compute x
        I = Uc(c, n) * c + Un(c, n) * n
        x = np.linalg.solve(np.eye(S) - self.β * self.π, I)
        return c, n, x, Ξ
# Find root
res = root(FOC, np.array(
[0, self.cFB[s_0], self.nFB[s_0], self.ΞFB[s_0]]))
if not res.success:
raise Exception('Could not find time 0 LS allocation.')
return res.x
Computes Τ given c, n
'''
model = self.model
Uc, Un = model.Uc(c, n), model.Un(c, n)
if sHist is None:
sHist = self.mc.simulate(T, s_0)
# Time 0
μ, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
Bhist[0] = B_
μHist[0] = μ
# Time 1 onward
for t in range(1, T):
c, n, x, Ξ = self.time1_allocation(μ)
Τ = self.Τ(c, n)
u_c = Uc(c, n)
s = sHist[t]
Eu_c = π[sHist[t - 1]] @ u_c
            cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x[s] / u_c[s], Τ[s]
RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (β * Eu_c)
μHist[t] = μ
97.4 Recursive Formulation of the Ramsey Problem

$x_t(s^t) = u_c(s^t)b_t(s_t|s^{t-1})$ in equation (21) appears to be a purely "forward-looking" variable.

But $x_t(s^t)$ is also a natural candidate for a state variable in a recursive formulation of the Ramsey problem.
To express a Ramsey plan recursively, we imagine that a time 0 Ramsey planner is followed
by a sequence of continuation Ramsey planners at times 𝑡 = 1, 2, ….
A "continuation Ramsey planner" at times $t \geq 1$ has a different objective function and faces different constraints and state variables than does the Ramsey planner at time $t = 0$.
A key step in representing a Ramsey plan recursively is to regard the marginal utility scaled
government debts 𝑥𝑡 (𝑠𝑡 ) = 𝑢𝑐 (𝑠𝑡 )𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) as predetermined quantities that continuation
Ramsey planners at times 𝑡 ≥ 1 are obligated to attain.
Continuation Ramsey planners do this by choosing continuation policies that induce the rep-
resentative household to make choices that imply that 𝑢𝑐 (𝑠𝑡 )𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) = 𝑥𝑡 (𝑠𝑡 ).
A time 𝑡 ≥ 1 continuation Ramsey planner faces 𝑥𝑡 , 𝑠𝑡 as state variables.
A time 𝑡 ≥ 1 continuation Ramsey planner delivers 𝑥𝑡 by choosing a suitable 𝑛𝑡 , 𝑐𝑡 pair and
a list of 𝑠𝑡+1 -contingent continuation quantities 𝑥𝑡+1 to bequeath to a time 𝑡 + 1 continuation
Ramsey planner.
While a time 𝑡 ≥ 1 continuation Ramsey planner faces 𝑥𝑡 , 𝑠𝑡 as state variables, the time 0
Ramsey planner faces 𝑏0 , not 𝑥0 , as a state variable.
Furthermore, the Ramsey planner cares about (𝑐0 (𝑠0 ), ℓ0 (𝑠0 )), while continuation Ramsey
planners do not.
The time 0 Ramsey planner hands a state-contingent function that make 𝑥1 a function of 𝑠1
to a time 1 continuation Ramsey planner.
These lines of delegated authorities and responsibilities across time express the continuation
Ramsey planners’ obligations to implement their parts of the original Ramsey plan, designed
once-and-for-all at time 0.
After 𝑠𝑡 has been realized at time 𝑡 ≥ 1, the state variables confronting the time 𝑡 continua-
tion Ramsey planner are (𝑥𝑡 , 𝑠𝑡 ).
• Let 𝑉 (𝑥, 𝑠) be the value of a continuation Ramsey plan at 𝑥𝑡 = 𝑥, 𝑠𝑡 = 𝑠 for 𝑡 ≥ 1.
• Let 𝑊 (𝑏, 𝑠) be the value of a Ramsey plan at time 0 at 𝑏0 = 𝑏 and 𝑠0 = 𝑠.
We work backward by presenting a Bellman equation for 𝑉 (𝑥, 𝑠) first, then a Bellman equa-
tion for 𝑊 (𝑏, 𝑠).
The Bellman equation for a time $t \geq 1$ continuation Ramsey planner is

$$
V(x, s) = \max_{n, \{x'(s')\}} u(n - g(s), 1 - n) + \beta\sum_{s'}\Pi(s'|s)V(x', s') \tag{30}
$$

where maximization over $n$ and the $S$ elements of $x'(s')$ is subject to the single implementability constraint for $t \geq 1$

$$
x = u_c(n - g(s)) - u_l n + \beta\sum_{s'}\Pi(s'|s)x'(s') \tag{31}
$$
Associated with a value function $V(x, s)$ that solves Bellman equation (30) are $S + 1$ time-invariant policy functions

$$
\begin{aligned}
n_t &= f(x_t, s_t), \quad t \geq 1 \\
x_{t+1}(s_{t+1}) &= h(s_{t+1}; x_t, s_t), \quad s_{t+1}\in S,\ t \geq 1
\end{aligned} \tag{32}
$$
The Bellman equation for the time 0 Ramsey planner is

$$
W(b_0, s_0) = \max_{n_0, \{x'(s_1)\}} u(n_0 - g_0, 1 - n_0) + \beta\sum_{s_1}\Pi(s_1|s_0)V(x'(s_1), s_1) \tag{33}
$$

where maximization over $n_0$ and the $S$ elements of $x'(s_1)$ is subject to the time 0 implementability constraint

$$
u_{c,0}b_0 = u_{c,0}(n_0 - g_0) - u_{l,0}n_0 + \beta\sum_{s_1}\Pi(s_1|s_0)x'(s_1) \tag{34}
$$

Associated with the value function $W(b_0, s_0)$ are $S + 1$ time 0 policy functions

$$
\begin{aligned}
n_0 &= f_0(b_0, s_0) \\
x_1(s_1) &= h_0(s_1; b_0, s_0)
\end{aligned} \tag{35}
$$
Notice the appearance of state variables (𝑏0 , 𝑠0 ) in the time 0 policy functions for the Ramsey
planner as compared to (𝑥𝑡 , 𝑠𝑡 ) in the policy functions (32) for the time 𝑡 ≥ 1 continuation
Ramsey planners.
The value function $V(x_t, s_t)$ of the time $t$ continuation Ramsey planner equals $E_t\sum_{\tau=t}^{\infty}\beta^{\tau-t}u(c_\tau, l_\tau)$, where the consumption and leisure processes are evaluated along the original time 0 Ramsey plan.
Attach a Lagrange multiplier Φ1 (𝑥, 𝑠) to constraint (31) and a Lagrange multiplier Φ0 to con-
straint (26).
Time $t \geq 1$: the first-order conditions for the time $t \geq 1$ constrained maximization problem on the right side of the continuation Ramsey planner's Bellman equation (30) are

$$
\beta\Pi(s'|s)V_x(x', s') - \beta\Pi(s'|s)\Phi_1 = 0 \tag{36}
$$

for $x'(s')$, and an analogous first-order condition (37) for $n$.
Given Φ1 , equation (37) is one equation to be solved for 𝑛 as a function of 𝑠 (or of 𝑔(𝑠)).
Equation (36) implies $V_x(x', s') = \Phi_1$, while an envelope condition is $V_x(x, s) = \Phi_1$, so it follows that the multiplier $\Phi_1$ remains constant across time and Markov states (equation (38)).
Time $t = 0$: For the time 0 problem on the right side of the Ramsey planner's Bellman equation (33), first-order conditions are

$$
V_x(x(s_1), s_1) = \Phi_0 \tag{39}
$$

for $x(s_1)$, together with a corresponding first-order condition (40) for $n_0$.
Notice similarities and differences between the first-order conditions for 𝑡 ≥ 1 and for 𝑡 = 0.
An additional term is present in (40) except in three special cases
• 𝑏0 = 0, or
• 𝑢𝑐 is constant (i.e., preferences are quasi-linear in consumption), or
• initial government assets are sufficiently large to finance all government purchases with
interest earnings from those assets so that Φ0 = 0
Except in these special cases, the allocation and the labor tax rate as functions of 𝑠𝑡 differ
between dates 𝑡 = 0 and subsequent dates 𝑡 ≥ 1.
Naturally, the first-order conditions in this recursive formulation of the Ramsey problem
agree with the first-order conditions derived when we first formulated the Ramsey plan in the
space of sequences.
𝑉𝑥 (𝑥𝑡 , 𝑠𝑡 ) = Φ0 (41)
for all 𝑡 ≥ 1.
When 𝑉 is concave in 𝑥, this implies state-variable degeneracy along a Ramsey plan in the
sense that for 𝑡 ≥ 1, 𝑥𝑡 will be a time-invariant function of 𝑠𝑡 .
Given Φ0 , this function mapping 𝑠𝑡 into 𝑥𝑡 can be expressed as a vector 𝑥⃗ that solves equa-
tion (34) for 𝑛 and 𝑐 as functions of 𝑔 that are associated with Φ = Φ0 .
While the marginal utility adjusted level of government debt 𝑥𝑡 is a key state variable for the
continuation Ramsey planners at 𝑡 ≥ 1, it is not a state variable at time 0.
class RecursiveAllocation:
'''
Compute the planner's allocation by solving Bellman
equation.
'''
def solve_time1_bellman(self):
'''
Solve the time 1 Bellman equation for calibration model and initial
grid μgrid0
'''
model, μgrid0 = self.model, self.μgrid
S = len(model.π)
# Create xgrid
xbar = [x.min(0).max(), x.max(0).min()]
xgrid = np.linspace(xbar[0], xbar[1], len(μgrid0))
self.xgrid = xgrid
if sHist is None:
sHist = self.mc.simulate(T, s_0)
# Time 0
cHist[0], nHist[0], xprime = self.time0_allocation(B_, s_0)
ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
Bhist[0] = B_
μHist[0] = 0
# Time 1 onward
for t in range(1, T):
s, x = sHist[t], xprime[sHist[t]]
c, n, xprime = np.empty(self.S), nf[s](x), np.empty(self.S)
Τ = self.Τ(c, n)[s]
u_c = Uc(c, n)
Eu_c = π[sHist[t - 1]] @ u_c
μHist[t] = self.Vf[s](x, 1)
class BellmanEquation:
'''
Bellman equation for the continuation of the Lucas-Stokey Problem
'''
self.z0 = {}
cf, nf, xprimef = policies0
for s in range(self.S):
for x in xgrid:
xprime0 = np.empty(self.S)
for sprime in range(self.S):
xprime0[sprime] = xprimef[s, sprime](x)
self.z0[x, s] = np.hstack([cf[s](x), nf[s](x), xprime0])
self.find_first_best()
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, Uc, Un, G = self.S, self.Θ, model.Uc, model.Un, self.G
def res(z):
c = z[:S]
n = z[S:]
        return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
    res = root(res, 0.5 * np.ones(2 * S))  # initial guess (assumed completion of the excerpt)
    if not res.success:
        raise Exception('Could not find first best')
    self.cFB = res.x[:S]
    self.nFB = res.x[S:]
    IFB = Uc(self.cFB, self.nFB) * self.cFB + Un(self.cFB, self.nFB) * self.nFB
self.xFB = np.linalg.solve(np.eye(S) - self.β * self.π, IFB)
self.zFB = {}
for s in range(S):
self.zFB[s] = np.hstack([self.cFB[s], self.nFB[s], self.xFB])
def objf(z):
    c, n, xprime = z[0], z[1], z[2:]
    Vprime = np.empty(S)
    for sprime in range(S):
        Vprime[sprime] = Vf[sprime](xprime[sprime])
    return -(U(c, n) + β * π[s] @ Vprime)  # (assumed completion of the excerpt) minimize the negative of value
def cons(z):
    c, n, xprime = z[0], z[1], z[2:]
    return np.hstack([x - Uc(c, n) * c - Un(c, n) * n - β * π[s] @ xprime,
                      (Θ * n - c - G)[s]])
if imode > 0:
raise Exception(smode)
self.z0[x, s] = out
return np.hstack([-fx, out])
def objf(z):
    c, n, xprime = z[0], z[1], z[2:]
    Vprime = np.empty(S)
    for sprime in range(S):
        Vprime[sprime] = Vf[sprime](xprime[sprime])
    return -(U(c, n) + β * π[s0] @ Vprime)  # (assumed completion of the excerpt) minimize the negative of value
def cons(z):
    c, n, xprime = z[0], z[1], z[2:]
    return np.hstack([-Uc(c, n) * (c - B_) - Un(c, n) * n - β * π[s0] @ xprime,
                      (Θ * n - c - G)[s0]])
if imode > 0:
raise Exception(smode)
97.5 Examples
This example illustrates in a simple setting how a Ramsey planner manages risk.
Government expenditures are known for sure in all periods except one
• For 𝑡 < 3 or 𝑡 > 3 we assume that 𝑔𝑡 = 𝑔𝑙 = 0.1.
• At 𝑡 = 3 a war occurs with probability 0.5.
– If there is war, 𝑔3 = 𝑔ℎ = 0.2
– If there is no war 𝑔3 = 𝑔𝑙 = 0.1
We define the components of the state vector as the following six (𝑡, 𝑔) pairs:
(0, 𝑔𝑙 ), (1, 𝑔𝑙 ), (2, 𝑔𝑙 ), (3, 𝑔𝑙 ), (3, 𝑔ℎ ), (𝑡 ≥ 4, 𝑔𝑙 ).
We think of these 6 states as corresponding to 𝑠 = 1, 2, 3, 4, 5, 6.
The transition matrix is

$$\Pi = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.5 & 0.5 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$

Government expenditures at each state are

$$g = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.2 \\ 0.1 \end{pmatrix}$$
We assume the one-period utility function

$$u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}$$
class CRRAutility:
def __init__(self,
β=0.9,
σ=2,
γ=2,
π=0.5*np.ones((2, 2)),
G=np.array([0.1, 0.2]),
Θ=np.ones(2),
transfers=False):
# Utility function
def U(self, c, n):
σ = self.σ
if σ == 1.:
U = np.log(c)
else:
U = (c**(1 - σ) - 1) / (1 - σ)
return U - n**(1 + self.γ) / (1 + self.γ)
# Output paths
sim_seq_l[5] = time_example.Θ[sHist_l] * sim_seq_l[1]
sim_seq_h[5] = time_example.Θ[sHist_h] * sim_seq_h[1]
plt.tight_layout()
plt.show()
Tax smoothing
• the tax rate is constant for all 𝑡 ≥ 1
– For 𝑡 ≥ 1, 𝑡 ≠ 3, this is a consequence of 𝑔𝑡 being the same at all those dates.
– For 𝑡 = 3, it is a consequence of the special one-period utility function that we
have assumed.
– Under other one-period utility functions, the time 𝑡 = 3 tax rate could be either
higher or lower than for dates 𝑡 ≥ 1, 𝑡 ≠ 3.
• the tax rate is the same at 𝑡 = 3 for both the high 𝑔𝑡 outcome and the low 𝑔𝑡 outcome
We have assumed that at 𝑡 = 0, the government owes positive debt 𝑏0 .
It sets the time 𝑡 = 0 tax rate partly with an eye to reducing the value 𝑢𝑐,0 𝑏0 of 𝑏0 .
It does this by increasing consumption at time 𝑡 = 0 relative to consumption in later periods.
This has the consequence of lowering the time 𝑡 = 0 value of the gross interest rate for risk-
free loans between periods 𝑡 and 𝑡 + 1, which equals
$$R_t = \frac{u_{c,t}}{\beta \,\mathbb{E}_t [u_{c,t+1}]}$$
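As a minimal numerical sketch of this formula (all numbers here are illustrative assumptions), the gross rate is the ratio of today's marginal utility to the discounted expected marginal utility tomorrow, exactly as computed for RHist in the simulation code above.

import numpy as np

β = 0.9
u_c_today = 1.2                   # u_{c,t}: marginal utility of time-t consumption (assumed)
π_row = np.array([0.5, 0.5])      # transition probabilities π(s'|s_t) (assumed)
u_c_next = np.array([1.1, 1.3])   # u_{c,t+1}(s') in each state tomorrow (assumed)

R_t = u_c_today / (β * π_row @ u_c_next)   # R_t = u_{c,t} / (β E_t u_{c,t+1})
print(R_t)   # raising time-t consumption lowers u_{c,t} and hence R_t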
A tax policy that makes time 𝑡 = 0 consumption higher than time 𝑡 = 1 consumption evidently decreases the risk-free one-period interest rate, 𝑅𝑡 , at 𝑡 = 0.
Lowering the time 𝑡 = 0 risk-free interest rate makes time 𝑡 = 0 consumption goods cheaper
relative to consumption goods at later dates, thereby lowering the value 𝑢𝑐,0 𝑏0 of initial gov-
ernment debt 𝑏0 .
We see this in a figure below that plots the time path for the risk-free interest rate under
both realizations of the time 𝑡 = 3 government expenditure shock.
The following plot illustrates how the government lowers the interest rate at time 0 by raising
consumption
At time 𝑡 = 1, the government evidently saves since it has set the tax rate sufficiently high to
allow it to set 𝑏2 < 𝑏1 .
At time 𝑡 = 2 the government trades state-contingent Arrow securities to hedge against war
at 𝑡 = 3.
We have seen that when 𝑏0 > 0, the Ramsey plan sets the time 𝑡 = 0 tax rate partly with
an eye toward lowering a risk-free interest rate for one-period loans between times 𝑡 = 0 and
𝑡 = 1.
By lowering this interest rate, the plan makes time 𝑡 = 0 goods cheap relative to consumption
goods at later times.
By doing this, it lowers the value of time 𝑡 = 0 debt that it has inherited and must finance.
In the preceding example, the Ramsey tax rate at time 0 differs from its value at time 1.
To explore what is going on here, let’s simplify things by removing the possibility of war at
time 𝑡 = 3.
The Ramsey problem then includes no randomness because 𝑔𝑡 = 𝑔𝑙 for all 𝑡.
The figure below plots the Ramsey tax rates and gross interest rates at time 𝑡 = 0 and time
𝑡 ≥ 1 as functions of the initial government debt (using the sequential allocation solution and
a CRRA utility function defined above)
n = 100
tax_policy = np.empty((n, 2))
interest_rate = np.empty((n, 2))
gov_debt = np.linspace(-1.5, 1, n)
for i in range(n):
tax_policy[i] = tax_sequence.simulate(gov_debt[i], 0, 2)[3]
interest_rate[i] = tax_sequence.simulate(gov_debt[i], 0, 3)[-1]
fig.tight_layout()
plt.show()
The figure indicates that if the government enters with positive debt, it sets a tax rate at 𝑡 =
0 that is less than all later tax rates.
By setting a lower tax rate at 𝑡 = 0, the government raises consumption, which reduces the
value 𝑢𝑐,0 𝑏0 of its initial debt.
It does this by increasing 𝑐0 and thereby lowering 𝑢𝑐,0 .
Conversely, if 𝑏0 < 0, the Ramsey planner sets the tax rate at 𝑡 = 0 higher than in subsequent
periods.
A side effect of lowering time 𝑡 = 0 consumption is that it lowers the one-period interest rate
at time 𝑡 = 0 below that of subsequent periods.
There are only two values of initial government debt at which the tax rate is constant for all
𝑡 ≥ 0.
The first is 𝑏0 = 0
• Here the government can’t use the 𝑡 = 0 tax rate to alter the value of the
initial debt.
The second occurs when the government enters with sufficiently large assets that the Ramsey
planner can achieve first best and sets 𝜏𝑡 = 0 for all 𝑡.
It is only for these two values of initial government debt that the Ramsey plan is time-
consistent.
Another way of saying this is that, except for these two values of initial government debt, a
continuation of a Ramsey plan is not a Ramsey plan.
To illustrate this, consider a Ramsey planner who starts with an initial government debt 𝑏1
associated with one of the Ramsey plans computed above.
Call 𝜏1𝑅 the time 𝑡 = 0 tax rate chosen by the Ramsey planner confronting this value for initial government debt.
The figure below shows both the tax rate at time 1 chosen by our original Ramsey planner
and what a new Ramsey planner would choose for its time 𝑡 = 0 tax rate
n = 100
tax_policy = np.empty((n, 2))
τ_reset = np.empty((n, 2))
gov_debt = np.linspace(-1.5, 1, n)
for i in range(n):
tax_policy[i] = tax_sequence.simulate(gov_debt[i], 0, 2)[3]
τ_reset[i] = tax_sequence.simulate(gov_debt[i], 0, 1)[3]
fig.tight_layout()
plt.show()
The tax rates in the figure are equal for only two values of initial government debt.
The complete tax smoothing for 𝑡 ≥ 1 in the preceding example is a consequence of our hav-
ing assumed CRRA preferences.
To see what is driving this outcome, we begin by noting that the Ramsey tax rate for 𝑡 ≥ 1
is a time-invariant function 𝜏 (Φ, 𝑔) of the Lagrange multiplier on the implementability con-
straint and government expenditures.
For CRRA preferences, we can exploit the relations 𝑈𝑐𝑐 𝑐 = −𝜎𝑈𝑐 and 𝑈𝑛𝑛 𝑛 = 𝛾𝑈𝑛 to derive
$$\frac{\bigl(1 + (1-\sigma)\Phi\bigr) U_c}{\bigl(1 + (1-\gamma)\Phi\bigr) U_n} = 1$$
class LogUtility:
def __init__(self,
β=0.9,
ψ=0.69,
π=0.5*np.ones((2, 2)),
G=np.array([0.1, 0.2]),
Θ=np.ones(2),
transfers=False):
# Utility function
def U(self, c, n):
return np.log(c) + self.ψ * np.log(1 - n)
Also, suppose that 𝑔𝑡 follows a two-state IID process with equal probabilities attached to 𝑔𝑙
and 𝑔ℎ .
To compute the tax rate, we will use both the sequential and recursive approaches described
above.
The figure below plots a sample path of the Ramsey tax rate
T = 20
sHist = np.array([0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 0, 0, 0, 1,
1, 1, 1, 1, 1, 0])
# Simulate
sim_seq = seq_log.simulate(0.5, 0, T, sHist)
sim_bel = bel_log.simulate(0.5, 0, T, sHist)
# Output paths
axes.flatten()[0].legend(('Sequential', 'Recursive'))
fig.tight_layout()
plt.show()
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:18:
RuntimeWarning: divide by zero encountered in log
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:22:
RuntimeWarning: divide by zero encountered in double_scalars
As should be expected, the recursive and sequential solutions produce almost identical alloca-
tions.
Unlike outcomes with CRRA preferences, the tax rate is not perfectly smoothed.
Instead, the government raises the tax rate when 𝑔𝑡 is high.
A related lecture describes an extension of the Lucas-Stokey model by Aiyagari, Marcet, Sargent, and Seppälä (2002) [7].
In the AMSS economy, only a risk-free bond is traded.
That lecture compares the recursive representation of the Lucas-Stokey model presented in
this lecture with one for an AMSS economy.
By comparing these recursive formulations, we shall glean a sense in which the dimension of
the state is lower in the Lucas Stokey model.
Accompanying that difference in dimension will be different dynamics of government debt.
Chapter 98
Optimal Taxation without State-Contingent Debt
98.1 Contents
• Overview 98.2
• Competitive Equilibrium with Distorting Taxes 98.3
• Recursive Version of AMSS Model 98.4
• Examples 98.5
In addition to what’s in Anaconda, this lecture will need the following libraries:
98.2 Overview
In an earlier lecture, we described a model of optimal taxation with state-contingent debt due
to Robert E. Lucas, Jr., and Nancy Stokey [111].
Aiyagari, Marcet, Sargent, and Seppälä [7] (hereafter, AMSS) studied optimal taxation in a
model without state-contingent debt.
In this lecture, we
• describe assumptions and equilibrium concepts
• solve the model
• implement the model numerically
• conduct some policy experiments
• compare outcomes with those in a corresponding complete-markets model
Many but not all features of the economy are identical to those of the Lucas-Stokey economy.
Let’s start with things that are identical.
For 𝑡 ≥ 0, a history of the state is represented by $s^t = [s_t, s_{t-1}, \ldots, s_0]$.
Government purchases 𝑔(𝑠) are an exact time-invariant function of 𝑠.
Let 𝑐𝑡 (𝑠𝑡 ), ℓ𝑡 (𝑠𝑡 ), and 𝑛𝑡 (𝑠𝑡 ) denote consumption, leisure, and labor supply, respectively, at
history 𝑠𝑡 at time 𝑡.
Each period a representative household is endowed with one unit of time that can be divided between leisure ℓ𝑡 and labor 𝑛𝑡 :

$$n_t(s^t) + \ell_t(s^t) = 1 \tag{1}$$

Output equals 𝑛𝑡 (𝑠𝑡 ) and can be divided between consumption 𝑐𝑡 (𝑠𝑡 ) and 𝑔(𝑠𝑡 ):

$$c_t(s^t) + g(s_t) = n_t(s^t) \tag{2}$$

A representative household's preferences over $\{c_t(s^t), \ell_t(s^t)\}$ are ordered by

$$\sum_{t=0}^{\infty} \sum_{s^t} \beta^t \pi_t(s^t)\, u[c_t(s^t), \ell_t(s^t)] \tag{3}$$
where
• 𝜋𝑡 (𝑠𝑡 ) is a joint probability distribution over the sequence 𝑠𝑡 , and
• the utility function 𝑢 is increasing, strictly concave, and three times continuously differ-
entiable in both arguments.
The government imposes a flat rate tax 𝜏𝑡 (𝑠𝑡 ) on labor income at time 𝑡, history 𝑠𝑡 .
Lucas and Stokey assumed that there are complete markets in one-period Arrow securities;
also see smoothing models.
It is at this point that AMSS [7] modify the Lucas and Stokey economy.
AMSS allow the government to issue only one-period risk-free debt each period.
Ruling out complete markets in this way is a step in the direction of making total tax collec-
tions behave more like that prescribed in [14] than they do in [111].
Let
• 𝑏𝑡+1 (𝑠𝑡 ) be the amount of the time 𝑡 + 1 consumption good that at time 𝑡 the government promised to pay
• 𝑅𝑡 (𝑠𝑡 ) be the gross interest rate on risk-free one-period debt between periods 𝑡 and 𝑡 + 1
• 𝑇𝑡 (𝑠𝑡 ) be a non-negative lump-sum transfer to the representative household [1]
That 𝑏𝑡+1 (𝑠𝑡 ) is the same for all realizations of 𝑠𝑡+1 captures its risk-free character.
The market value at time 𝑡 of government debt maturing at time 𝑡 + 1 equals 𝑏𝑡+1 (𝑠𝑡 ) divided
by 𝑅𝑡 (𝑠𝑡 ).
The government's budget constraint in period 𝑡 at history 𝑠𝑡 is

$$b_t(s^{t-1}) = \tau^n_t(s^t)\, n_t(s^t) - g_t(s^t) - T_t(s^t) + \frac{b_{t+1}(s^t)}{R_t(s^t)} \equiv z(s^t) + \frac{b_{t+1}(s^t)}{R_t(s^t)} \tag{4}$$

where 𝑧(𝑠𝑡 ) is the net-of-interest government surplus. The household's Euler equation implies that the risk-free gross interest rate satisfies

$$\frac{1}{R_t(s^t)} = \sum_{s_{t+1}|s^t} \beta\, \pi_{t+1}(s^{t+1}|s^t)\, \frac{u_c(s^{t+1})}{u_c(s^t)}$$
Substituting this expression into the government’s budget constraint (4) yields:
$$b_t(s^{t-1}) = z(s^t) + \beta \sum_{s_{t+1}|s^t} \pi_{t+1}(s^{t+1}|s^t)\, \frac{u_c(s^{t+1})}{u_c(s^t)}\, b_{t+1}(s^t) \tag{5}$$
Components of 𝑧(𝑠𝑡 ) on the right side depend on 𝑠𝑡 , but the left side is required to depend on
𝑠𝑡−1 only.
This is what it means for one-period government debt to be risk-free.
Therefore, the sum on the right side of equation (5) also has to depend only on 𝑠𝑡−1 .
This requirement will give rise to measurability constraints on the Ramsey allocation to
be discussed soon.
If we replace 𝑏𝑡+1 (𝑠𝑡 ) on the right side of equation (5) by the right side of next period’s bud-
get constraint (associated with a particular realization 𝑠𝑡 ) we get
After making similar repeated substitutions for all future occurrences of government indebt-
edness, and by invoking the natural debt limit, we arrive at:
$$b_t(s^{t-1}) = \sum_{j=0}^{\infty} \sum_{s^{t+j}|s^t} \beta^j\, \pi_{t+j}(s^{t+j}|s^t)\, \frac{u_c(s^{t+j})}{u_c(s^t)}\, z(s^{t+j}) \tag{6}$$
Now let’s
• substitute the resource constraint into the net-of-interest government surplus, and
• use the household’s first-order condition 1 − 𝜏𝑡𝑛 (𝑠𝑡 ) = 𝑢ℓ (𝑠𝑡 )/𝑢𝑐 (𝑠𝑡 ) to eliminate the labor
tax rate
so that we can express the net-of-interest government surplus 𝑧(𝑠𝑡 ) as
$$z(s^t) = \left[1 - \frac{u_\ell(s^t)}{u_c(s^t)}\right] \left[c_t(s^t) + g_t(s^t)\right] - g_t(s^t) - T_t(s^t) \tag{7}$$
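To make the mapping from an allocation to the net-of-interest surplus concrete, here is a minimal sketch; the numerical inputs are illustrative assumptions, and the last line is exactly (7) after substituting the labor tax from the household's first-order condition.

import numpy as np

# Illustrative inputs (assumptions): marginal utilities and quantities at
# one history s^t, with transfers T set to zero as in most of this lecture
u_c, u_l = 1.25, 1.0       # marginal utilities of consumption and leisure
c, g, T = 0.8, 0.1, 0.0    # consumption, government spending, transfers

τ = 1 - u_l / u_c          # household FOC: 1 - τ = u_l / u_c
n = c + g                  # resource constraint: labor equals output
z = τ * n - g - T          # net-of-interest government surplus, equation (7)
print(τ, z)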
If we substitute the appropriate versions of the right side of (7) for 𝑧(𝑠𝑡+𝑗 ) into equation (6),
we obtain a sequence of implementability constraints on a Ramsey allocation in an AMSS
economy.
Expression (6) at time 𝑡 = 0 and initial state 𝑠0 was also an implementability constraint on a
Ramsey allocation in a Lucas-Stokey economy:
$$b_0(s_{-1}) = \mathbb{E}_0 \sum_{j=0}^{\infty} \beta^j\, \frac{u_c(s^j)}{u_c(s^0)}\, z(s^j) \tag{8}$$
The counterpart of (8) at an arbitrary time 𝑡 and history 𝑠𝑡 is

$$b_t(s^{t-1}) = \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j\, \frac{u_c(s^{t+j})}{u_c(s^t)}\, z(s^{t+j}) \tag{9}$$
The expression on the right side of (9) in the Lucas-Stokey (1983) economy would equal the
present value of a continuation stream of government surpluses evaluated at what would be
competitive equilibrium Arrow-Debreu prices at date 𝑡.
In the Lucas-Stokey economy, that present value is measurable with respect to 𝑠𝑡 .
In the AMSS economy, the restriction that government debt be risk-free imposes that that
same present value must be measurable with respect to 𝑠𝑡−1 .
In a language used in the literature on incomplete markets models, it can be said that the
AMSS model requires that at each (𝑡, 𝑠𝑡 ) what would be the present value of continuation
government surpluses in the Lucas-Stokey model must belong to the marketable subspace
of the AMSS model.
After we have substituted the resource constraint into the utility function, we can express the
Ramsey problem as being to choose an allocation that solves
$$\max_{\{c_t(s^t),\, b_{t+1}(s^t)\}} \mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t\, u\bigl(c_t(s^t),\, 1 - c_t(s^t) - g_t(s^t)\bigr)$$
subject to

$$\mathbb{E}_0 \sum_{j=0}^{\infty} \beta^j\, \frac{u_c(s^j)}{u_c(s^0)}\, z(s^j) \geq b_0(s_{-1}) \tag{10}$$
and
$$\mathbb{E}_t \sum_{j=0}^{\infty} \beta^j\, \frac{u_c(s^{t+j})}{u_c(s^t)}\, z(s^{t+j}) = b_t(s^{t-1}) \quad \forall\, s^t \tag{11}$$
given 𝑏0 (𝑠−1 ).
Lagrangian Formulation
Attach a Lagrange multiplier 𝛾0 (𝑠0 ) to constraint (10) and multipliers 𝛾𝑡 (𝑠𝑡 ) to the time 𝑡 ≥ 1 constraints (11). Depending on how the constraints bind, the multipliers on (11) can be positive or negative:
A negative multiplier 𝛾𝑡 (𝑠𝑡 ) < 0 means that if we could relax constraint (11), we would like to
increase the beginning-of-period indebtedness for that particular realization of history 𝑠𝑡 .
That would let us reduce the beginning-of-period indebtedness for some other history [2].
These features flow from the fact that the government cannot use state-contingent debt and
therefore cannot allocate its indebtedness efficiently across future states.
$$\begin{aligned}
J &= \mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t \Bigl\{ u\bigl(c_t(s^t), 1 - c_t(s^t) - g_t(s^t)\bigr) + \gamma_t(s^t)\Bigl[ \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j u_c(s^{t+j})\, z(s^{t+j}) - u_c(s^t)\, b_t(s^{t-1}) \Bigr] \Bigr\} \\
  &= \mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t \Bigl\{ u\bigl(c_t(s^t), 1 - c_t(s^t) - g_t(s^t)\bigr) + \Psi_t(s^t)\, u_c(s^t)\, z(s^t) - \gamma_t(s^t)\, u_c(s^t)\, b_t(s^{t-1}) \Bigr\}
\end{aligned} \tag{12}$$

where

$$\Psi_t(s^t) = \Psi_{t-1}(s^{t-1}) + \gamma_t(s^t) \quad \text{and} \quad \Psi_{-1}(s^{-1}) = 0 \tag{13}$$
In (12), the second equality uses the law of iterated expectations and Abel's summation formula (also called summation by parts).
First-order conditions with respect to 𝑐𝑡 (𝑠𝑡 ) can be expressed as
$$\begin{aligned}
u_c(s^t) - u_\ell(s^t) &+ \Psi_t(s^t) \left\{ \left[u_{cc}(s^t) - u_{c\ell}(s^t)\right] z(s^t) + u_c(s^t)\, z_c(s^t) \right\} \\
&- \gamma_t(s^t) \left[u_{cc}(s^t) - u_{c\ell}(s^t)\right] b_t(s^{t-1}) = 0
\end{aligned} \tag{14}$$
If we substitute 𝑧(𝑠𝑡 ) from (7) and its derivative 𝑧𝑐 (𝑠𝑡 ) into the first-order condition (14), we
find two differences from the corresponding condition for the optimal allocation in a Lucas-
Stokey economy with state-contingent government debt.
1. The term involving 𝑏𝑡 (𝑠𝑡−1 ) in the first-order condition (14) does not appear in the corresponding expression for the Lucas-Stokey economy.
2. The Lagrange multiplier Ψ𝑡 (𝑠𝑡 ) in the first-order condition (14) may change over time in response to realizations of the state, while the multiplier Φ in the Lucas-Stokey economy is time-invariant.
We need some code from an earlier lecture on optimal taxation with state-contingent debt, namely, the sequential allocation implementation:
class SequentialAllocation:
'''
Class that takes CESutility or BGPutility object as input and returns
planner's allocation as a function of the multiplier on the
implementability constraint μ.
'''
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, G = self.S, self.Θ, self.G
Uc, Un = model.Uc, model.Un
def res(z):
c = z[:S]
n = z[S:]
        return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
    res = root(res, 0.5 * np.ones(2 * S))  # initial guess (assumed completion of the excerpt)
    if not res.success:
        raise Exception('Could not find first best')
self.cFB = res.x[:S]
self.nFB = res.x[S:]
def FOC(z):
c = z[:S]
n = z[S:2 * S]
Ξ = z[2 * S:]
# FOC of c
        return np.hstack([Uc(c, n) - μ * (Ucc(c, n) * c + Uc(c, n)) - Ξ,
                          Un(c, n) - μ * (Unn(c, n) * n + Un(c, n)) \
                          + Θ * Ξ,  # FOC of n (restored from the identical code later in this document)
                          Θ * n - c - G])
    # Compute x
I = Uc(c, n) * c + Un(c, n) * n
x = np.linalg.solve(np.eye(S) - self.β * self.π, I)
return c, n, x, Ξ
# Find root
res = root(FOC, np.array(
[0, self.cFB[s_0], self.nFB[s_0], self.ΞFB[s_0]]))
if not res.success:
raise Exception('Could not find time 0 LS allocation.')
return res.x
model = self.model
Uc, Un = model.Uc(c, n), model.Un(c, n)
if sHist is None:
sHist = self.mc.simulate(T, s_0)
# Time 0
μ, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
Bhist[0] = B_
μHist[0] = μ
# Time 1 onward
for t in range(1, T):
c, n, x, Ξ = self.time1_allocation(μ)
Τ = self.Τ(c, n)
u_c = Uc(c, n)
s = sHist[t]
Eu_c = π[sHist[t - 1]] @ u_c
    cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x[s] / u_c[s], Τ[s]
RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (β * Eu_c)
μHist[t] = μ
To analyze the AMSS model, we find it useful to adopt a recursive formulation using tech-
niques like those in our lectures on dynamic Stackelberg models and optimal taxation with
state-contingent debt.
where 𝑅𝑡 (𝑠𝑡 ) is the gross risk-free rate of interest between 𝑡 and 𝑡 + 1 at history 𝑠𝑡 and 𝑇𝑡 (𝑠𝑡 )
are non-negative transfers.
Throughout this lecture, we shall set transfers to zero (for some issues about the limiting
behavior of debt, this makes a possibly important difference from AMSS [7], who restricted
transfers to be non-negative).
In this case, the household faces a sequence of budget constraints
𝑏𝑡 (𝑠𝑡−1 ) + (1 − 𝜏𝑡 (𝑠𝑡 ))𝑛𝑡 (𝑠𝑡 ) = 𝑐𝑡 (𝑠𝑡 ) + 𝑏𝑡+1 (𝑠𝑡 )/𝑅𝑡 (𝑠𝑡 ) (16)
The household’s first-order conditions are 𝑢𝑐,𝑡 = 𝛽𝑅𝑡 𝔼𝑡 𝑢𝑐,𝑡+1 and (1 − 𝜏𝑡 )𝑢𝑐,𝑡 = 𝑢𝑙,𝑡 .
Using these to eliminate 𝑅𝑡 and 𝜏𝑡 from budget constraint (16) gives
𝑢𝑐,𝑡 (𝑠𝑡 )𝑏𝑡 (𝑠𝑡−1 ) + 𝑢𝑙,𝑡 (𝑠𝑡 )𝑛𝑡 (𝑠𝑡 ) = 𝑢𝑐,𝑡 (𝑠𝑡 )𝑐𝑡 (𝑠𝑡 ) + 𝛽(𝔼𝑡 𝑢𝑐,𝑡+1 )𝑏𝑡+1 (𝑠𝑡 ) (18)
Now define

$$x_t \equiv \beta\, b_{t+1}(s^t)\, \mathbb{E}_t u_{c,t+1} = u_{c,t}(s^t)\, \frac{b_{t+1}(s^t)}{R_t(s^t)} \tag{19}$$

Using this definition in (18) and rearranging gives, for 𝑡 ≥ 1,

$$\frac{u_{c,t}\, x_{t-1}}{\beta\, \mathbb{E}_{t-1} u_{c,t}} = u_{c,t} c_t - u_{l,t} n_t + x_t \tag{20}$$
Dividing (20) by 𝑢𝑐,𝑡 and using 𝑏𝑡 (𝑠𝑡−1 ) = 𝑥𝑡−1 /(𝛽𝔼𝑡−1 𝑢𝑐,𝑡 ) gives

$$b_t(s^{t-1}) = c_t - \frac{u_{l,t}}{u_{c,t}}\, n_t + \frac{x_t}{u_{c,t}} \tag{21}$$

The right side of equation (21) expresses the time 𝑡 value of government debt in terms of a linear combination of terms whose individual components are measurable with respect to 𝑠𝑡 .
The sum of terms on the right side of equation (21) must equal 𝑏𝑡 (𝑠𝑡−1 ).
That implies that it has to be measurable with respect to 𝑠𝑡−1 .
Equations (21) are the measurability constraints that the AMSS model adds to the single time 0 implementability constraint imposed in the Lucas and Stokey model.
Let Π(𝑠|𝑠− ) be a Markov transition matrix whose entries tell probabilities of moving from
state 𝑠− to state 𝑠 in one period.
Let
• 𝑉 (𝑥− , 𝑠− ) be the continuation value of a continuation Ramsey plan at 𝑥𝑡−1 = 𝑥− , 𝑠𝑡−1 =
𝑠− for 𝑡 ≥ 1
• 𝑊 (𝑏, 𝑠) be the value of the Ramsey plan at time 0 at 𝑏0 = 𝑏 and 𝑠0 = 𝑠
We distinguish between two types of planners:
For 𝑡 ≥ 1, the value function for a continuation Ramsey planner satisfies a Bellman equation in which maximization is subject to the constraint that for each realized state 𝑠

$$\frac{u_c(s)\, x_-}{\beta \sum_{\tilde s} \Pi(\tilde s | s_-)\, u_c(\tilde s)} = u_c(s)\bigl(n(s) - g(s)\bigr) - u_l(s)\, n(s) + x(s) \tag{23}$$
A continuation Ramsey planner at 𝑡 ≥ 1 takes (𝑥𝑡−1 , 𝑠𝑡−1 ) = (𝑥− , 𝑠− ) as given and before 𝑠 is
realized chooses (𝑛𝑡 (𝑠𝑡 ), 𝑥𝑡 (𝑠𝑡 )) = (𝑛(𝑠), 𝑥(𝑠)) for 𝑠 ∈ 𝑆.
The Ramsey planner takes (𝑏0 , 𝑠0 ) as given and chooses (𝑛0 , 𝑥0 ).
The value function 𝑊 (𝑏0 , 𝑠0 ) for the time 𝑡 = 0 Ramsey planner satisfies the Bellman equa-
tion
Let 𝜇(𝑠|𝑠− )Π(𝑠|𝑠− ) be a Lagrange multiplier on the constraint (23) for state 𝑠.
After forming an appropriate Lagrangian, we find that the continuation Ramsey planner's first-order condition with respect to 𝑥(𝑠) is

$$\beta V_x(x(s), s) = \mu(s|s_-)$$

An envelope condition is

$$V_x(x_-, s_-) = \sum_s \Pi(s|s_-)\, \mu(s|s_-)\, \frac{u_c(s)}{\beta \sum_{\tilde s} \Pi(\tilde s|s_-)\, u_c(\tilde s)} \tag{27}$$

Substituting the first-order condition into the envelope condition gives

$$V_x(x_-, s_-) = \sum_s \left( \Pi(s|s_-)\, \frac{u_c(s)}{\sum_{\tilde s} \Pi(\tilde s|s_-)\, u_c(\tilde s)} \right) V_x(x(s), s) \tag{28}$$

Equation (28) says that $V_x(x, s)$ is a martingale under the twisted transition density

$$\check\Pi(s|s_-) \equiv \Pi(s|s_-)\, \frac{u_c(s)}{\sum_{\tilde s} \Pi(\tilde s|s_-)\, u_c(\tilde s)}$$
Exercise: Please verify that $\check\Pi(s|s_-)$ is a valid Markov transition density, i.e., that its elements are all non-negative and that for each 𝑠− , the sum over 𝑠 equals unity.
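A quick numerical version of this check is straightforward. The sketch below uses an illustrative Π and u_c (both are assumptions; any transition matrix and positive marginal utilities will do), forms the twisted kernel, and confirms that each row is a probability distribution.

import numpy as np

# Illustrative inputs (assumptions)
Π = np.array([[0.6, 0.4],
              [0.3, 0.7]])
u_c = np.array([1.2, 0.8])

# Twisted kernel: Π̌(s|s_-) = Π(s|s_-) u_c(s) / Σ_s̃ Π(s̃|s_-) u_c(s̃)
Π_twisted = Π * u_c / (Π @ u_c)[:, np.newaxis]

assert np.all(Π_twisted >= 0)                   # non-negative entries
assert np.allclose(Π_twisted.sum(axis=1), 1.0)  # each row sums to unity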
Along a Ramsey plan, the state variable 𝑥𝑡 = 𝑥𝑡 (𝑠𝑡 , 𝑏0 ) becomes a function of the history 𝑠𝑡
and initial government debt 𝑏0 .
In the Lucas-Stokey model, we found that
• a counterpart to 𝑉𝑥 (𝑥, 𝑠) is time-invariant and equal to the Lagrange multiplier on the
Lucas-Stokey implementability constraint
• time invariance of 𝑉𝑥 (𝑥, 𝑠) is the source of a key feature of the Lucas-Stokey model,
namely, state variable degeneracy (i.e., 𝑥𝑡 is an exact function of 𝑠𝑡 )
That 𝑉𝑥 (𝑥, 𝑠) varies over time according to a twisted martingale means that there is no state-
variable degeneracy in the AMSS model.
In the AMSS model, both 𝑥 and 𝑠 are needed to describe the state.
This property of the AMSS model transmits a twisted martingale component to consumption,
employment, and the tax rate.
Furthermore, when the Markov chain Π(𝑠|𝑠− ) and the government expenditure function 𝑔(𝑠)
are such that 𝑔𝑡 is perpetually random, 𝑉𝑥 (𝑥, 𝑠) almost surely converges to zero.
For quasi-linear preferences, the first-order condition with respect to 𝑛(𝑠) becomes
When 𝜇(𝑠|𝑠− ) = 𝛽𝑉𝑥 (𝑥(𝑠), 𝑠) converges to zero, in the limit 𝑢𝑙 (𝑠) = 1 = 𝑢𝑐 (𝑠), so that 𝜏 (𝑥(𝑠), 𝑠) = 0.
Thus, in the limit, if 𝑔𝑡 is perpetually random, the government accumulates sufficient assets
to finance all expenditures from earnings on those assets, returning any excess revenues to the
household as non-negative lump-sum transfers.
98.4.7 Code
class RecursiveAllocationAMSS:
def solve_time1_bellman(self):
'''
Solve the time 1 Bellman equation for calibration model and
initial grid μgrid0
'''
model, μgrid0 = self.model, self.μgrid
π = model.π
S = len(model.π)
for s_ in range(S):
    c, n, x, V = zip(*map(lambda μ: incomplete_allocation(μ, s_), μgrid0))
c, n = np.vstack(c).T, np.vstack(n).T
x, V = np.hstack(x), np.hstack(V)
xprimes = np.vstack([x] * S)
cf.append(interp(x, c))
nf.append(interp(x, n))
Vf.append(interp(x, V))
xgrid.append(x)
xprimef.append(interp(x, xprimes))
cf, nf, xprimef = fun_vstack(cf), fun_vstack(nf), fun_vstack(xprimef)
Vf = fun_hstack(Vf)
policies = [cf, nf, xprimef]
# Create xgrid
x = np.vstack(xgrid).T
xbar = [x.min(0).max(), x.max(0).min()]
xgrid = np.linspace(xbar[0], xbar[1], len(μgrid0))
self.xgrid = xgrid
print(diff)
Vf = Vfnew
'''
Computes Τ given c and n
'''
model = self.model
Uc, Un = model.Uc(c, n), model.Un(c, n)
if sHist is None:
sHist = simulate_markov(π, s_0, T)
# Time 1 onward
for t in range(1, T):
s_, x, s = sHist[t - 1], xHist[t - 1], sHist[t]
    c, n, xprime, T = cf[s_, :](x), nf[s_, :](x), xprimef[s_, :](x), Tf[s_, :](x)
Τ = self.Τ(c, n)[s]
u_c = Uc(c, n)
Eu_c = π[s_, :] @ u_c
μHist[t] = self.Vf[s](xprime[s])
class BellmanEquation:
'''
Bellman equation for the continuation of the Lucas-Stokey Problem
'''
self.z0 = {}
cf, nf, xprimef = policies0
for s_ in range(self.S):
for x in xgrid:
self.z0[x, s_] = np.hstack([cf[s_, :](x),
nf[s_, :](x),
xprimef[s_, :](x),
np.zeros(self.S)])
self.find_first_best()
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, Uc, Un, G = self.S, self.Θ, model.Uc, model.Un, self.G
def res(z):
c = z[:S]
n = z[S:]
        return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
    res = root(res, 0.5 * np.ones(2 * S))  # initial guess (assumed completion of the excerpt)
    if not res.success:
        raise Exception('Could not find first best')
    self.cFB = res.x[:S]
    self.nFB = res.x[S:]
IFB = Uc(self.cFB, self.nFB) * self.cFB + \
Un(self.cFB, self.nFB) * self.nFB
self.zFB = {}
for s in range(S):
self.zFB[s] = np.hstack(
[self.cFB[s], self.nFB[s], self.π[s] @ self.xFB, 0.])
if not self.time_0:
def PF(x, s): return self.get_policies_time1(x, s, Vf)
else:
def PF(B_, s0): return self.get_policies_time0(B_, s0, Vf)
return PF
def objf(z):
    c, n, xprime = z[:S], z[S:2 * S], z[2 * S:3 * S]
    Vprime = np.empty(S)
    for s in range(S):
        Vprime[s] = Vf[s](xprime[s])
    return -π[s_] @ (U(c, n) + β * Vprime)  # (assumed completion of the excerpt) minimize the negative of expected value
def cons(z):
c, n, xprime, T = z[:S], z[S:2 * S], z[2 * S:3 * S], z[3 * S:]
u_c = Uc(c, n)
Eu_c = π[s_] @ u_c
return np.hstack([
x * u_c / Eu_c - u_c * (c - T) - Un(c, n) * n - β * xprime,
Θ * n - c - G])
if model.transfers:
bounds = [(0., 100)] * S + [(0., 100)] * S + \
[self.xbar] * S + [(0., 100.)] * S
else:
bounds = [(0., 100)] * S + [(0., 100)] * S + \
[self.xbar] * S + [(0., 0.)] * S
out, fx, _, imode, smode = fmin_slsqp(objf, self.z0[x, s_],
f_eqcons=cons, bounds=bounds,
full_output=True, iprint=0,
acc=self.tol, iter=self.maxiter)
if imode > 0:
raise Exception(smode)
def objf(z):
    c, n, xprime = z[:-1]
    return -(U(c, n) + β * Vf[s0](xprime))  # (assumed completion of the excerpt)
def cons(z):
c, n, xprime, T = z
return np.hstack([
-Uc(c, n) * (c - B_ - T) - Un(c, n) * n - β * xprime,
(Θ * n - c - G)[s0]])
if model.transfers:
bounds = [(0., 100), (0., 100), self.xbar, (0., 100.)]
else:
bounds = [(0., 100), (0., 100), self.xbar, (0., 0.)]
out, fx, _, imode, smode = fmin_slsqp(objf, self.zFB[s0], f_eqcons=cons,
                                      bounds=bounds, full_output=True,
                                      iprint=0)
if imode > 0:
raise Exception(smode)
98.5 Examples
class interpolate_wrapper:
def transpose(self):
self.F = self.F.transpose()
def __len__(self):
return len(self.F)
else:
fhat = np.vstack([f(x) for f in self.F.flatten()])
return fhat.reshape(np.hstack((shape, len(x))))
class interpolator_factory:
def fun_vstack(fun_list):
def fun_hstack(fun_list):
return sHist
In our lecture on optimal taxation with state contingent debt we studied how the government
manages uncertainty in a simple setting.
As in that lecture, we assume the one-period utility function

$$u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}$$
Note
For convenience in matching our computer code, we have expressed utility as a
function of 𝑛 rather than leisure 𝑙.
We consider the same government expenditure process studied in the lecture on optimal taxa-
tion with state contingent debt.
Government expenditures are known for sure in all periods except one.
• For 𝑡 < 3 or 𝑡 > 3 we assume that 𝑔𝑡 = 𝑔𝑙 = 0.1.
• At 𝑡 = 3 a war occurs with probability 0.5.
– If there is war, 𝑔3 = 𝑔ℎ = 0.2.
– If there is no war 𝑔3 = 𝑔𝑙 = 0.1.
A useful trick is to define components of the state vector as the following six (𝑡, 𝑔) pairs:
(0, 𝑔𝑙 ), (1, 𝑔𝑙 ), (2, 𝑔𝑙 ), (3, 𝑔𝑙 ), (3, 𝑔ℎ ), (𝑡 ≥ 4, 𝑔𝑙 )
We think of these 6 states as corresponding to 𝑠 = 1, 2, 3, 4, 5, 6. The transition matrix is

$$P = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.5 & 0.5 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$

Government expenditures at each state are

$$g = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.2 \\ 0.1 \end{pmatrix}$$
class CRRAutility:
def __init__(self,
β=0.9,
σ=2,
γ=2,
π=0.5*np.ones((2, 2)),
G=np.array([0.1, 0.2]),
Θ=np.ones(2),
transfers=False):
# Utility function
def U(self, c, n):
σ = self.σ
if σ == 1.:
U = np.log(c)
else:
U = (c**(1 - σ) - 1) / (1 - σ)
return U - n**(1 + self.γ) / (1 + self.γ)
The following figure plots the Ramsey plan under both complete and incomplete markets for
both possible realizations of the state at time 𝑡 = 3.
Optimal policies when the government has access to state contingent debt are represented by
black lines, while the optimal policies when there is only a risk-free bond are in red.
Paths with circles are histories in which there is peace, while those with triangles denote war.
time_example = CRRAutility()
# Output paths
sim_seq_l[5] = time_example.Θ[sHist_l] * sim_seq_l[1]
sim_seq_h[5] = time_example.Θ[sHist_h] * sim_seq_h[1]
sim_bel_l[5] = time_example.Θ[sHist_l] * sim_bel_l[1]
sim_bel_h[5] = time_example.Θ[sHist_h] * sim_bel_h[1]
ax.set(title=title)
ax.grid()
plt.tight_layout()
plt.show()
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:24:
RuntimeWarning: divide by zero encountered in reciprocal
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:29:
RuntimeWarning: divide by zero encountered in power
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:235:
RuntimeWarning: invalid value encountered in true_divide
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:228:
RuntimeWarning: invalid value encountered in matmul
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:233:
RuntimeWarning: invalid value encountered in matmul
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:235:
RuntimeWarning: invalid value encountered in multiply
0.6029333236643755
0.11899588239403049
0.09881553212225772
0.08354106892508192
0.07149555120835548
0.06173036758118132
0.05366019901394205
0.04689112026451663
0.04115178347560931
0.036240012965927396
0.032006237992696515
0.028368464481562206
0.025192689677184087
0.022405843880195616
0.01994774715614924
0.017777614158738117
0.01586311426476452
0.014157556340393418
0.012655688350303772
0.011323561508356405
0.010134342587404501
0.009067133049314944
0.008133363039380094
0.007289176565901135
0.006541414713738157
0.005872916742002829
0.005262680193064001
0.0047307749771207785
0.00425304528362447
0.003818501528167009
0.0034264405600953744
0.003079364780532014
0.002768326786546087
0.002490427866931677
0.002240592066624134
0.0020186948255381727
0.001817134273040178
0.001636402035539666
0.0014731339707420147
0.0013228186455305523
0.0011905279885160533
0.001069923299755228
0.0009619064545164963
0.000866106560101833
0.0007801798498127538
0.0007044038334509719
0.001135820461718877
0.0005858462046557034
0.0005148785169405882
0.0008125646930954998
0.000419343630648423
0.0006110525605884945
0.0003393644339027041
0.00030505082851731526
0.0002748939327310508
0.0002466101258104514
0.00022217612526700695
0.00020017376735678401
0.00018111714263865545
0.00016358937979053516
0.00014736943218961575
0.00013236625616948046
0.00011853760872608077
0.00010958653853354627
9.594155330329376e-05
How a Ramsey planner responds to war depends on the structure of the asset market.
If it is able to trade state-contingent debt, then at time 𝑡 = 2
• the government purchases an Arrow security that pays off when 𝑔3 = 𝑔ℎ
• the government sells an Arrow security that pays off when 𝑔3 = 𝑔𝑙
• These purchases are designed in such a way that regardless of whether or not there is a
war at 𝑡 = 3, the government will begin period 𝑡 = 4 with the same government debt
This pattern facilitates smoothing tax rates across states.
The government without state contingent debt cannot do this.
Instead, it must enter time 𝑡 = 3 with the same level of debt falling due whether there is
peace or war at 𝑡 = 3.
It responds to this constraint by smoothing tax rates across time.
To finance a war it raises taxes and issues more debt.
To service the additional debt burden, it raises taxes in all future periods.
The absence of state contingent debt leads to an important difference in the optimal tax pol-
icy.
When the Ramsey planner has access to state contingent debt, the optimal tax policy is his-
tory independent
• the tax rate is a function of the current level of government spending only, given the
Lagrange multiplier on the implementability constraint
Without state contingent debt, the optimal tax rate is history dependent.
• A war at time 𝑡 = 3 causes a permanent increase in the tax rate.
History dependence occurs more dramatically in a case in which the government perpetually
faces the prospect of war.
This case was studied in the final example of the lecture on optimal taxation with state-
contingent debt.
There, each period the government faces a constant probability, 0.5, of war.
In addition, this example features the following preferences
class LogUtility:
def __init__(self,
β=0.9,
ψ=0.69,
π=0.5*np.ones((2, 2)),
G=np.array([0.1, 0.2]),
Θ=np.ones(2),
transfers=False):
# Utility function
def U(self, c, n):
return np.log(c) + self.ψ * np.log(1 - n)
With these preferences, Ramsey tax rates will vary even in the Lucas-Stokey model with
state-contingent debt.
The figure below plots optimal tax policies for both the economy with state contingent debt
(circles) and the economy with only a risk-free bond (triangles).
T = 20
sHist = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 1, 1, 1, 1, 1, 1, 0])
# Simulate
sim_seq = log_sequential.simulate(0.5, 0, T, sHist)
sim_bel = log_bellman.simulate(0.5, 0, T, sHist)
# Output paths
sim_seq[5] = log_example.Θ[sHist] * sim_seq[1]
sim_bel[5] = log_example.Θ[sHist] * sim_bel[1]
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:18:
RuntimeWarning: invalid value encountered in log
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:18:
RuntimeWarning: divide by zero encountered in log
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:22:
RuntimeWarning: divide by zero encountered in true_divide
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:235:
RuntimeWarning: invalid value encountered in true_divide
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:235:
RuntimeWarning: invalid value encountered in multiply
0.09444436241467027
0.05938723624807882
0.009418765429903522
0.008379498687574425
0.0074624123240604355
0.006647816620408291
0.005931361510280879
0.005294448322145543
0.0047253954106721
0.0042222775808757355
0.0037757367327914595
0.0033746180929005954
0.003017386825278821
0.002699930230109115
0.002417750826161132
0.002162259204654334
0.0019376221726160596
0.001735451076427532
0.0015551292357692775
0.0013916748907577743
0.0012464994947173087
0.0011179310440191763
0.0010013269547115295
0.0008961739076702308
0.0008040179931696027
0.0007206700039880485
0.0006461943981250373
0.0005794223638423901
0.0005197699346274911
0.0004655559524182191
0.00047793563176274804
0.00041453864841817386
0.0003355701386912934
0.0003008429779500316
0.00034467634902466326
0.00024187486910206502
0.0002748557784369297
0.0002832106657514851
0.00017453973560832125
0.00017016491393155364
0.00017694150942686578
0.00019907388038770387
0.00011291946575698052
0.00010121277902064972
9.094360747603131e-05
When the government experiences a prolonged period of peace, it is able to reduce govern-
ment debt and set permanently lower tax rates.
However, the government finances a long war by borrowing and raising taxes.
This results in a drift away from policies with state contingent debt that depends on the his-
tory of shocks.
This is even more evident in the following figure that plots the evolution of the two policies
over 200 periods.
# Output paths
sim_seq_long[5] = log_example.Θ[sHist_long] * sim_seq_long[1]
sim_bel_long[5] = log_example.Θ[sHist_long] * sim_bel_long[1]
Footnotes
[1] In an allocation that solves the Ramsey problem and that levies distorting taxes on labor,
why would the government ever want to hand revenues back to the private sector? It would
not in an economy with state-contingent debt, since any such allocation could be improved by
lowering distortionary taxes rather than handing out lump-sum transfers. But, without state-
contingent debt there can be circumstances when a government would like to make lump-sum
transfers to the private sector.
[2] From the first-order conditions for the Ramsey problem, there exists another realization $\tilde s^t$ with the same history up until the previous period, i.e., $\tilde s^{t-1} = s^{t-1}$, but where the multiplier on constraint (11) takes a positive value, so $\gamma_t(\tilde s^t) > 0$.
Chapter 99
Fluctuating Interest Rates Deliver Fiscal Insurance
99.1 Contents
• Overview 99.2
• Forces at Work 99.3
• Logical Flow of Lecture 99.4
• Example Economy 99.5
• Reverse Engineering Strategy 99.6
• Code for Reverse Engineering 99.7
• Short Simulation for Reverse-engineered Initial Debt 99.8
• Long Simulation 99.9
• BEGS Approximations of Limiting Debt and Convergence Rate 99.10
Co-authors: Anmol Bhandari and David Evans
In addition to what’s in Anaconda, this lecture will need the following libraries:
99.2 Overview
This lecture extends our investigations of how optimal policies for levying a flat-rate tax on
labor income and issuing government debt depend on whether there are complete markets for
debt.
A Ramsey allocation and Ramsey policy in the AMSS [7] model described in optimal taxation
without state-contingent debt generally differs from a Ramsey allocation and Ramsey policy
in the Lucas-Stokey [111] model described in optimal taxation with state-contingent debt.
This is because the implementability restriction that a competitive equilibrium with a distort-
ing tax imposes on allocations in the Lucas-Stokey model is just one among a set of imple-
mentability conditions imposed in the AMSS model.
These additional constraints require that time 𝑡 components of a Ramsey allocation for the
AMSS model be measurable with respect to time 𝑡 − 1 information.
The measurability constraints imposed by the AMSS model are inherited from the restriction that only one-period risk-free bonds can be traded. At a sufficiently large level of government assets, fluctuations in the one-period risk-free interest rate provide the government with complete insurance against stochastically varying government expenditures.
with complete insurance against stochastically varying government expenditures.
Let’s start with some imports:
The forces driving asymptotic outcomes here are examples of dynamics present in a more general class of incomplete markets models analyzed in [23] (BEGS).
BEGS provide conditions under which government debt under a Ramsey plan converges to an
invariant distribution.
BEGS construct approximations to that asymptotically invariant distribution of government
debt under a Ramsey plan.
BEGS also compute an approximation to a Ramsey plan’s rate of convergence to that limit-
ing invariant distribution.
We shall use the BEGS approximating limiting distribution and the approximating rate of
convergence to help interpret outcomes here.
For a long time, the Ramsey plan puts a nontrivial martingale-like component into the par
value of government debt as part of the way that the Ramsey plan imperfectly smooths dis-
tortions from the labor tax rate across time and Markov states.
But BEGS show that binding implementability constraints slowly push government debt in a direction designed to let the government use fluctuations in the equilibrium interest rate, rather than fluctuations in par values of debt, to insure against shocks to government expenditures.
• This is a weak (but unrelenting) force that, starting from an initial debt level, for a
long time is dominated by the stochastic martingale-like component of debt dynam-
ics that the Ramsey planner uses to facilitate imperfect tax-smoothing across time and
states.
• This weak force slowly drives the par value of government assets to a constant level
at which the government can completely insure against government expenditure shocks
while shutting down the stochastic component of debt dynamics.
• At that point, the tail of the par value of government debt becomes a trivial martingale:
it is constant over time.
Although we are studying an AMSS [7] economy, a Lucas-Stokey [111] economy plays an im-
portant role in the reverse-engineering calculation to be described below.
For that reason, it is helpful to have readily available some key equations underlying a Ram-
sey plan for the Lucas-Stokey economy.
Recall first-order conditions for a Ramsey allocation for the Lucas-Stokey economy.
For 𝑡 ≥ 1, these take the form
There is one such equation for each value of the Markov state 𝑠𝑡 .
In addition, given an initial Markov state, the time 𝑡 = 0 quantities 𝑐0 and 𝑏0 satisfy
$$b_0 + g_0 = \tau_0 (c_0 + g_0) + \frac{\bar b}{R_0} \tag{3}$$
where 𝑅0 is the gross interest rate for the Markov state 𝑠0 that is assumed to prevail at time
𝑡 = 0 and 𝜏0 is the time 𝑡 = 0 tax rate.
In equation (3), it is understood that
$$\tau_0 = 1 - \frac{u_{l,0}}{u_{c,0}}, \qquad R_0^{-1} = \beta \sum_{s=1}^{S} \Pi(s|s_0)\, \frac{u_c(s)}{u_{c,0}}$$
It is useful to transform some of the above equations to forms that are more natural for ana-
lyzing the case of a CRRA utility specification that we shall use in our example economies.
As in lectures optimal taxation without state-contingent debt and optimal taxation with
state-contingent debt, we assume that the representative agent has utility function
$$u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}$$

and that the economy's resource constraint is

$$c_t + g_t = n_t$$
The analysis of Lucas and Stokey prevails once we make the following replacements
With these understandings, equations (1) and (2) simplify in the case of the CRRA utility
function.
They become
and
$$(1+\Phi)\left[ u_c(c_0) + u_n(c_0 + g_0) \right] + \Phi\left[ c_0\, u_{cc}(c_0) + (c_0 + g_0)\, u_{nn}(c_0 + g_0) \right] - \Phi\, u_{cc}(c_0)\, b_0 = 0 \tag{5}$$
In equation (4), it is understood that 𝑐 and 𝑔 are each functions of the Markov state 𝑠.
The CRRA utility function is represented in the following class.
class CRRAutility:
def __init__(self,
β=0.9,
σ=2,
γ=2,
π=0.5*np.ones((2, 2)),
G=np.array([0.1, 0.2]),
Θ=np.ones(2),
transfers=False):
# Utility function
def U(self, c, n):
σ = self.σ
if σ == 1.:
U = np.log(c)
else:
U = (c**(1 - σ) - 1) / (1 - σ)
return U - n**(1 + self.γ) / (1 + self.γ)
We set 𝛽 = 0.9, 𝜎 = 2, and 𝛾 = 2.
Here are several classes that do most of the work for us.
The code is mostly taken or adapted from the earlier lectures optimal taxation without state-
contingent debt and optimal taxation with state-contingent debt.
class SequentialAllocation:
'''
Class that takes CESutility or BGPutility object as input and returns
planner's allocation as a function of the multiplier on the
implementability constraint μ.
'''
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, G = self.S, self.Θ, self.G
Uc, Un = model.Uc, model.Un
def res(z):
c = z[:S]
n = z[S:]
        return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
    res = root(res, 0.5 * np.ones(2 * S))  # initial guess (assumed completion of the excerpt)
    if not res.success:
        raise Exception('Could not find first best')
self.cFB = res.x[:S]
self.nFB = res.x[S:]
def FOC(z):
c = z[:S]
n = z[S:2 * S]
Ξ = z[2 * S:]
# FOC of c
return np.hstack([Uc(c, n) - μ * (Ucc(c, n) * c + Uc(c, n)) - Ξ,
Un(c, n) - μ * (Unn(c, n) * n + Un(c, n)) \
+ Θ * Ξ, # FOC of n
Θ * n - c - G])
# Compute x
I = Uc(c, n) * c + Un(c, n) * n
x = np.linalg.solve(np.eye(S) - self.β * self.π, I)
return c, n, x, Ξ
# Find root
res = root(FOC, np.array(
[0, self.cFB[s_0], self.nFB[s_0], self.ΞFB[s_0]]))
if not res.success:
raise Exception('Could not find time 0 LS allocation.')
return res.x
if sHist is None:
sHist = self.mc.simulate(T, s_0)
# Time 0
μ, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
Bhist[0] = B_
μHist[0] = μ
# Time 1 onward
for t in range(1, T):
c, n, x, Ξ = self.time1_allocation(μ)
Τ = self.Τ(c, n)
u_c = Uc(c, n)
s = sHist[t]
Eu_c = π[sHist[t - 1]] @ u_c
    cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x[s] / u_c[s], Τ[s]
RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (β * Eu_c)
μHist[t] = μ
class RecursiveAllocationAMSS:
def solve_time1_bellman(self):
'''
Solve the time 1 Bellman equation for calibration model and
initial grid μgrid0
'''
model, μgrid0 = self.model, self.μgrid
π = model.π
S = len(model.π)
# Create xgrid
x = np.vstack(xgrid).T
xbar = [x.min(0).max(), x.max(0).min()]
xgrid = np.linspace(xbar[0], xbar[1], len(μgrid0))
self.xgrid = xgrid
print(diff)
Vf = Vfnew
'''
S, xgrid = len(self.π), self.xgrid
interp = interpolator_factory(3, 0)
cf, nf, xprimef, Tf, Vf = [], [], [], [], []
for s_ in range(S):
PFvec = np.vstack([PF(x, s_) for x in self.xgrid]).T
Vf.append(interp(xgrid, PFvec[0, :]))
cf.append(interp(xgrid, PFvec[1:1 + S]))
nf.append(interp(xgrid, PFvec[1 + S:1 + 2 * S]))
xprimef.append(interp(xgrid, PFvec[1 + 2 * S:1 + 3 * S]))
Tf.append(interp(xgrid, PFvec[1 + 3 * S:]))
policies = fun_vstack(cf), fun_vstack(
nf), fun_vstack(xprimef), fun_vstack(Tf)
Vf = fun_hstack(Vf)
return Vf, policies
if sHist is None:
sHist = simulate_markov(π, s_0, T)
# Time 1 onward
for t in range(1, T):
s_, x, s = sHist[t - 1], xHist[t - 1], sHist[t]
    c, n, xprime, T = cf[s_, :](x), nf[s_, :](x), xprimef[s_, :](x), Tf[s_, :](x)
Τ = self.Τ(c, n)[s]
u_c = Uc(c, n)
Eu_c = π[s_, :] @ u_c
μHist[t] = self.Vf[s](xprime[s])
class BellmanEquation:
'''
Bellman equation for the continuation of the Lucas-Stokey Problem
'''
self.z0 = {}
cf, nf, xprimef = policies0
for s_ in range(self.S):
for x in xgrid:
self.z0[x, s_] = np.hstack([cf[s_, :](x),
nf[s_, :](x),
xprimef[s_, :](x),
np.zeros(self.S)])
self.find_first_best()
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, Uc, Un, G = self.S, self.Θ, model.Uc, model.Un, self.G
def res(z):
c = z[:S]
n = z[S:]
        return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
    res = root(res, 0.5 * np.ones(2 * S))  # initial guess (assumed completion of the excerpt)
    if not res.success:
        raise Exception('Could not find first best')
    self.cFB = res.x[:S]
    self.nFB = res.x[S:]
IFB = Uc(self.cFB, self.nFB) * self.cFB + \
Un(self.cFB, self.nFB) * self.nFB
self.zFB = {}
for s in range(S):
self.zFB[s] = np.hstack(
[self.cFB[s], self.nFB[s], self.π[s] @ self.xFB, 0.])
def objf(z):
    c, n, xprime = z[:S], z[S:2 * S], z[2 * S:3 * S]
    Vprime = np.empty(S)
    for s in range(S):
        Vprime[s] = Vf[s](xprime[s])
    return -π[s_] @ (U(c, n) + β * Vprime)  # (assumed completion of the excerpt) minimize the negative of expected value
def cons(z):
c, n, xprime, T = z[:S], z[S:2 * S], z[2 * S:3 * S], z[3 * S:]
u_c = Uc(c, n)
Eu_c = π[s_] @ u_c
return np.hstack([
x * u_c / Eu_c - u_c * (c - T) - Un(c, n) * n - β * xprime,
Θ * n - c - G])
if model.transfers:
bounds = [(0., 100)] * S + [(0., 100)] * S + \
[self.xbar] * S + [(0., 100.)] * S
else:
bounds = [(0., 100)] * S + [(0., 100)] * S + \
[self.xbar] * S + [(0., 0.)] * S
out, fx, _, imode, smode = fmin_slsqp(objf, self.z0[x, s_],
f_eqcons=cons, bounds=bounds,
full_output=True, iprint=0,
acc=self.tol, iter=self.maxiter)
if imode > 0:
raise Exception(smode)
def objf(z):
    c, n, xprime = z[:-1]
    return -(U(c, n) + β * Vf[s0](xprime))  # (assumed completion of the excerpt)
def cons(z):
c, n, xprime, T = z
return np.hstack([
-Uc(c, n) * (c - B_ - T) - Un(c, n) * n - β * xprime,
(Θ * n - c - G)[s0]])
if model.transfers:
bounds = [(0., 100), (0., 100), self.xbar, (0., 100.)]
else:
bounds = [(0., 100), (0., 100), self.xbar, (0., 0.)]
out, fx, _, imode, smode = fmin_slsqp(objf, self.zFB[s0], f_eqcons=cons,
                                      bounds=bounds, full_output=True,
                                      iprint=0)
if imode > 0:
raise Exception(smode)
class interpolate_wrapper:
def transpose(self):
self.F = self.F.transpose()
def __len__(self):
return len(self.F)
class interpolator_factory:
def fun_vstack(fun_list):
def fun_hstack(fun_list):
return sHist
We can reverse engineer a value 𝑏0 of initial debt due that renders the AMSS measurability
constraints not binding from time 𝑡 = 0 onward.
We accomplish this by recognizing that if the AMSS measurability constraints never bind, then the AMSS allocation and Ramsey plan are equivalent to those for a Lucas-Stokey economy in which for each period 𝑡 ≥ 0, the government promises to pay the same state-contingent amount 𝑏̄ in each state tomorrow.
This insight tells us to find a 𝑏0 and other fundamentals for the Lucas-Stokey [111] model
that make the Ramsey planner want to borrow the same value 𝑏̄ next period for all states and
all dates.
We accomplish this by using various equations for the Lucas-Stokey [111] model presented in
optimal taxation with state-contingent debt.
We use the following steps.
Step 1: Pick an initial Φ.
Step 2: Given that Φ, jointly solve two versions of equation (4) for 𝑐(𝑠), 𝑠 = 1, 2 associated
with the two values for 𝑔(𝑠), 𝑠 = 1, 2.
Step 3: Solve for 𝑥⃗ from the linear system $\vec x = (I - \beta \Pi)^{-1}\left[\vec u_c \circ \vec c + \vec u_n \circ \vec n\right]$, the same implementability recursion solved in the SequentialAllocation code above.
Step 4: After solving for 𝑥,⃗ we can find 𝑏(𝑠𝑡 |𝑠𝑡−1 ) in Markov state 𝑠𝑡 = 𝑠 from 𝑏(𝑠) = 𝑥(𝑠)/𝑢𝑐 (𝑠) or the matrix equation

$$\vec b = \frac{\vec x}{\vec u_c} \tag{7}$$

where division is elementwise.
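Here is a minimal sketch of Steps 3 and 4. The particular numbers are placeholders (assumptions standing in for the output of Steps 1 and 2), and the linear system is the one used in SequentialAllocation.time1_allocation above.

import numpy as np

# Placeholder inputs (assumptions) for the two Markov states
β, σ, γ = 0.9, 2, 2
Π = 0.5 * np.ones((2, 2))
c = np.array([0.94, 0.89])
g = np.array([0.1, 0.2])
n = c + g                 # resource constraint
u_c = c**(-σ)             # marginal utility of consumption
u_n = -n**γ               # marginal (dis)utility of labor

# Step 3: x solves x = u_c*c + u_n*n + β Π x
x = np.linalg.solve(np.eye(2) - β * Π, u_c * c + u_n * n)

# Step 4: equation (7), elementwise division
b = x / u_c
print(b)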
In [7]: u = CRRAutility()
def min_Φ(Φ):
# Solve Φ(c)
def equations(unknowns, Φ):
c1, c2 = unknowns
# First argument of .Uc and second argument of .Un are redundant
return loss
Out[8]: -1.0757576567504166
c0, b0 = unknowns
g0 = u.G[s-1]
The following graph shows simulations of outcomes for both a Lucas-Stokey economy and for
an AMSS economy starting from initial government debt equal to 𝑏0 = −1.038698407551764.
These graphs report outcomes for both the Lucas-Stokey economy with complete markets and
the AMSS economy with one-period risk-free debt only.
log_example = CRRAutility()
T = 20
sHist = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 1, 1, 1, 1, 1, 1, 0])
# Output paths
sim_seq[5] = log_example.Θ[sHist] * sim_seq[1]
sim_bel[5] = log_example.Θ[sHist] * sim_bel[1]
ax.grid()
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:24:
RuntimeWarning: divide by zero encountered in reciprocal
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:29:
RuntimeWarning: divide by zero encountered in power
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:235:
RuntimeWarning: invalid value encountered in true_divide
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:235:
RuntimeWarning: invalid value encountered in multiply
0.04094445433234912
0.0016732111459338028
0.0014846748487524172
0.0013137721375787164
0.001181403713496291
0.001055965336274255
0.0009446661646844358
0.0008463807322718293
0.0007560453780620191
0.0006756001036624751
0.0006041528458700388
0.0005396004512131591
0.0004820716911559142
0.0004308273211001684
0.0003848185136981698
0.0003438352175587286
0.000307243693715206
0.0002745009148200469
0.00024531773404782317
0.0002192332430448889
0.00019593539446980383
0.00017514303514117128
0.0001565593983558638
0.00013996737141091305
0.00012514457833358872
0.00011190070779369022
0.0001000702022487836
8.949728533921615e-05
8.004975220206986e-05
7.16059059036149e-05
6.40583656889648e-05
5.731162430892402e-05
5.127968193566545e-05
4.5886529754852955e-05
4.106387898823845e-05
3.675099365037568e-05
3.289361837628717e-05
2.9443289305467077e-05
2.635678797913085e-05
2.3595484132661966e-05
2.1124903957300157e-05
1.891424711454524e-05
1.6936003234214835e-05
1.5165596593393527e-05
1.358106697950504e-05
1.2162792578343118e-05
1.089323614045592e-05
9.756722989261432e-06
8.739240835382216e-06
7.828264537526775e-06
7.012590840428639e-06
6.282206099226885e-06
5.628151985858767e-06
5.042418443402312e-06
4.5178380641774095e-06
4.048002049270609e-06
3.6271748637111453e-06
3.25022483449945e-06
2.9125597419793e-06
2.6100730258792974e-06
2.33908472396273e-06
2.096307136505147e-06
1.8787904889257265e-06
1.6838997430816734e-06
1.509274819366032e-06
1.3528011889214775e-06
1.212587081653834e-06
1.0869381104429176e-06
9.743372244174285e-07
8.73426405689756e-07
7.829877314930334e-07
7.019331006223168e-07
6.292850109121352e-07
5.641704754646274e-07
5.058062142044674e-07
4.534908905846261e-07
4.0659614636622263e-07
3.6455917260464895e-07
3.2687571576858064e-07
2.9309400626589154e-07
2.628097110920697e-07
2.3565904692627078e-07
2.1131781852307158e-07
1.894947440294367e-07
1.699288361713118e-07
1.5238586063734686e-07
1.366568424325186e-07
1.2255365279755824e-07
1.0990783200082102e-07
9.856861272368773e-08
8.840091774987147e-08
7.928334532230156e-08
7.110738489161091e-08
6.377562438179933e-08
5.720073827118772e-08
5.1304550974155735e-08
4.6016827121093976e-08
4.127508285786482e-08
3.702254013429707e-08
3.3208575403099436e-08
2.9788031505649846e-08
2.6720125194025672e-08
2.3968551794263268e-08
2.1500634727809534e-08
1.928709568259096e-08
1.7301644673193848e-08
1.5520805495718083e-08
1.3923446503682317e-08
1.2490628141347746e-08
1.1205412924843752e-08
1.005255424847768e-08
9.018420064493843e-09
8.090776959812253e-09
7.2586201295038205e-09
6.512151645666916e-09
5.842497427160883e-09
5.2417739988686235e-09
4.702866830975856e-09
4.219410867722359e-09
3.7856971691602775e-09
3.3965991981299917e-09
3.047527271191316e-09
2.73435780104547e-09
2.4533959184694e-09
2.201325576919178e-09
1.975173912964314e-09
1.7722736943474094e-09
1.5902318528480405e-09
1.4269032326934397e-09
1.280361209635549e-09
1.1488803057922307e-09
1.030910807308611e-09
9.250638131182712e-10
8.30091415855734e-10
7.44876618462649e-10
6.684152536152628e-10
5.998085081044447e-10
5.382483192957509e-10
4.830097256567513e-10
4.3344408654246964e-10
3.88969172650052e-10
3.4905943032488643e-10
3.1324806778169217e-10
2.811122777111904e-10
2.5227584505600285e-10
2.2639906361282244e-10
2.0317838832934676e-10
1.8234104590203233e-10
1.6364103618734542e-10
1.468608707188693e-10
1.3180218471597189e-10
1.182881710076278e-10
1.0616062455371046e-10
9.527750852134792e-11
The Ramsey allocations and Ramsey outcomes are identical for the Lucas-Stokey and AMSS
economies.
This outcome confirms the success of our reverse-engineering exercises.
Notice how for 𝑡 ≥ 1, the tax rate is constant, and so is the par value of government debt.
However, output and labor supply are both nontrivial time-invariant functions of the Markov
state.
The following graph shows the par value of government debt and the flat rate tax on labor
income for a long simulation for our sample economy.
For the same realization of a government expenditure path, the graph reports outcomes for
two economies
• the gray lines are for the Lucas-Stokey economy with complete markets
• the blue lines are for the AMSS economy with risk-free one-period debt only
For both economies, initial government debt due at time 0 is 𝑏0 = .5.
For the Lucas-Stokey complete markets economy, the government debt plotted is 𝑏𝑡+1 (𝑠𝑡+1 ).
• Notice that this is a time-invariant function of the Markov state from the beginning.
For the AMSS incomplete markets economy, the government debt plotted is 𝑏𝑡+1 (𝑠𝑡 ).
• Notice that this is a martingale-like random process that eventually seems to converge
to a constant 𝑏̄ ≈ −1.07.
• Notice that the limiting value 𝑏̄ < 0 so that asymptotically the government makes a
constant level of risk-free loans to the public.
• In the simulation displayed, as well as in other simulations we have run, the par value of government debt converges to about −1.07 after between 1400 and 2000 periods.
For the AMSS incomplete markets economy, the marginal tax rate on labor income 𝜏𝑡 con-
verges to a constant
• labor supply and output each converge to time-invariant functions of the Markov state
sim_seq_long = log_sequential.simulate(0.5, 0, T)
sHist_long = sim_seq_long[-3]
sim_bel_long = log_bellman.simulate(0.5, 0, T, sHist_long)
As remarked above, after 𝑏𝑡+1 (𝑠𝑡 ) has converged to a constant, the measurability constraints
in the AMSS model cease to bind
• the associated Lagrange multipliers on those implementability constraints converge to
zero
This leads us to seek an initial value of government debt 𝑏0 that renders the measurability
constraints slack from time 𝑡 = 0 onward
• a tell-tale sign of this situation is that the Ramsey planner in a corresponding Lucas-
Stokey economy would instruct the government to issue a constant level of government
debt 𝑏𝑡+1 (𝑠𝑡+1 ) across the two Markov states
We now describe how to find such an initial level of government debt.
It is useful to link the outcome of our reverse engineering exercise to limiting approximations
constructed by [23].
[23] used a slightly different notation to represent a generalization of the AMSS model.
We’ll introduce a version of their notation so that readers can quickly relate notation that
appears in their key formulas to the notation that we have used.
BEGS work with objects 𝐵𝑡 , ℬ𝑡 , ℛ𝑡 , 𝒳𝑡 that are related to our notation by
$$\mathcal{R}_t = R_{t-1}\frac{u_{c,t}}{u_{c,t-1}} = \frac{u_{c,t}}{\beta E_{t-1} u_{c,t}}$$

$$B_t = \frac{b_{t+1}(s^t)}{R_t(s^t)}$$

$$b_t(s^{t-1}) = \mathcal{R}_{t-1} B_{t-1}$$

$$\mathcal{B}_t = u_{c,t} B_t = (\beta E_t u_{c,t+1})\, b_{t+1}(s^t)$$

$$\mathcal{X}_t = u_{c,t}[g_t - \tau_t n_t]$$
In terms of their notation, equation (44) of [23] expresses the time $t$, state $s$ government budget constraint as

$$\mathcal{B}(s) = \mathcal{R}_\tau(s, s_-)\mathcal{B}_- + \mathcal{X}_\tau(s) \tag{8}$$

where the dependence on $\tau$ is to remind us that these objects depend on the tax rate, and $s_-$ is last period's Markov state.
BEGS interpret random variations in the right side of (8) as a measure of fiscal risk com-
posed of
• interest-rate-driven fluctuations in time 𝑡 effective payments due on the government
portfolio, namely, ℛ𝜏 (𝑠, 𝑠− )ℬ− , and
• fluctuations in the effective government deficit 𝒳𝑡
BEGS give conditions under which the ergodic mean of $\mathcal{B}_t$ approximately satisfies the equation
$$\mathcal{B}^* = -\frac{\operatorname{cov}^\infty(\mathcal{R}, \mathcal{X})}{\operatorname{var}^\infty(\mathcal{R})} \tag{9}$$
where the superscript ∞ denotes a moment taken with respect to an ergodic distribution.
Formula (9) presents ℬ∗ as a regression coefficient of 𝒳𝑡 on ℛ𝑡 in the ergodic distribution.
This regression coefficient emerges as the minimizer for a variance-minimization problem:

$$\mathcal{B}^* = \operatorname{argmin}_{\mathcal{B}} \operatorname{var}(\mathcal{R}\mathcal{B} + \mathcal{X}) \tag{10}$$
The minimand in criterion (10) is the measure of fiscal risk associated with a given tax-debt
policy that appears on the right side of equation (8).
Expressing formula (9) in terms of our notation tells us that $\bar b$ should approximately equal

$$\hat b = \frac{\mathcal{B}^*}{\beta E_t u_{c,t+1}} \tag{11}$$
BEGS also derive the following approximation to the rate of convergence to ℬ∗ from an arbi-
trary initial condition.
$$\frac{E_t(\mathcal{B}_{t+1} - \mathcal{B}^*)}{\mathcal{B}_t - \mathcal{B}^*} \approx \frac{1}{1 + \beta^2 \operatorname{var}(\mathcal{R})} \tag{12}$$
For our example, we describe some code that we use to compute the steady state mean and
the rate of convergence to it.
The values of 𝜋(𝑠) are 0.5, 0.5.
We can then construct 𝒳(𝑠), ℛ(𝑠), 𝑢𝑐 (𝑠) for our two states using the definitions above.
We can then construct 𝛽𝐸𝑡−1 𝑢𝑐 = 𝛽 ∑𝑠 𝑢𝑐 (𝑠)𝜋(𝑠), cov(ℛ(𝑠), 𝒳(𝑠)) and var(ℛ(𝑠)) to be
plugged into formula (11).
We also want to compute var(𝒳).
To compute the variances and covariance, we use the following standard formulas.
Temporarily let $x(s), s = 1, 2$ be an arbitrary random variable.

Then we define
$$\mu_x = \sum_s x(s)\pi(s)$$

$$\operatorname{cov}(x, y) = \left(\sum_s x(s)y(s)\pi(s)\right) - \mu_x\mu_y$$

and $\operatorname{var}(x) = \operatorname{cov}(x, x)$.
After we compute these moments, we compute the BEGS approximation to the asymptotic
mean 𝑏̂ in formula (11).
After that, we move on to compute ℬ∗ in formula (9).
We'll also evaluate the minimand of the BEGS criterion (10) at the limiting value $\mathcal{B}^*$
$$J(\mathcal{B}^*) = \operatorname{var}(\mathcal{R})\,(\mathcal{B}^*)^2 + 2\mathcal{B}^*\operatorname{cov}(\mathcal{R}, \mathcal{X}) + \operatorname{var}(\mathcal{X}) \tag{13}$$
Here are some functions that we'll use to compute the key objects that we want.

def variance(x):
    # population variance under the probability vector u.π[s]
    x = np.array(x)
    return x**2 @ u.π[s] - mean(x)**2
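The companion helpers used by variance and in the computations below are not shown in this extract. A minimal sketch consistent with the moment formulas above, assuming u.π[s] is the relevant probability vector over Markov states, is:

def mean(x):
    # population mean under the probability vector u.π[s]
    x = np.array(x)
    return x @ u.π[s]

def covariance(x, y):
    # population covariance: E[xy] - E[x] E[y]
    x, y = np.array(x), np.array(y)
    return (x * y) @ u.π[s] - mean(x) * mean(y)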
Now let’s form the two random variables ℛ, 𝒳 appearing in the BEGS approximating formu-
las
In [14]: u = CRRAutility()
s = 0
c = [0.940580824225584, 0.8943592757759343]   # state-contingent consumption
g = u.G                                       # state-contingent government spending
n = c + g                                     # labor supply from feasibility c + g = n
τ = lambda s: 1 + u.Un(1, n[s]) / u.Uc(c[s], 1)   # labor tax rate in state s
R = [R_s(0), R_s(1)]   # effective returns ℛ(s); R_s is defined in cells not shown here
X = [X_s(0), X_s(1)]   # effective deficits 𝒳(s); X_s is defined in cells not shown here
Now let's compute the ingredients of the approximating limit and the approximating rate of convergence
Out[15]: -1.0757585378303758
So we have
Out[17]: -8.810799592140484e-07
These outcomes show that 𝑏̂ does a remarkably good job of approximating 𝑏.̄
Next, let’s compute the BEGS fiscal criterion that 𝑏̂ is minimizing
Out[18]: -9.020562075079397e-17
This is machine zero, a verification that 𝑏̂ succeeds in minimizing the nonnegative fiscal cost
criterion 𝐽 (ℬ∗ ) defined in BEGS and in equation (13) above.
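The input cells that produced Out[15]–Out[18] are not shown in this extract. A sketch consistent with formulas (9), (11), and (13), reusing the helper functions above (the helper names are assumptions), is:

B_star = -covariance(R, X) / variance(R)                 # regression coefficient, formula (9)
div = u.β * mean([u.Uc(c[0], n[0]), u.Uc(c[1], n[1])])   # β E u_c
b_hat = B_star / div                                     # approximate ergodic mean, formula (11)
# fiscal cost criterion evaluated at the minimizer, formula (13)
J_star = variance(R) * B_star**2 + 2 * B_star * covariance(R, X) + variance(X)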
Let’s push our luck and compute the mean reversion speed in the formula above equation
(47) in [23].
Now let's compute the implied mean time to get to within 0.01 of the limit
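The computation itself is not shown here. A sketch of the arithmetic, treating the per-period mean-reversion factor from formula (12) and the initial gap as hypothetical values, is:

import numpy as np

rate = 0.9931   # hypothetical per-period factor 1 / (1 + β² var(ℛ))
gap0 = 1.0      # hypothetical initial distance |b_0 - b̄|
# smallest T with rate**T * gap0 < 0.01
T = np.log(0.01 / gap0) / np.log(rate)
print(T)        # ≈ 665 periods for these hypothetical values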
The slow rate of convergence and the implied time of getting within one percent of the limit-
ing value do a good job of approximating our long simulation above.
Chapter 100

Fiscal Risk and Government Debt
100.1 Contents
• Overview 100.2
• The Economy 100.3
• Long Simulation 100.4
• Asymptotic Mean and Rate of Convergence 100.5
In addition to what’s in Anaconda, this lecture will need the following libraries:
100.2 Overview
This lecture studies government debt in an AMSS economy [7] of the type described in Opti-
mal Taxation without State-Contingent Debt.
We study the behavior of government debt as time 𝑡 → +∞.
We use these techniques
• simulations
• a regression coefficient from the tail of a long simulation that allows us to verify that
the asymptotic mean of government debt solves a fiscal-risk minimization problem
• an approximation to the mean of an ergodic distribution of government debt
• an approximation to the rate of convergence to an ergodic distribution of government
debt
We apply tools applicable to more general incomplete markets economies that are presented
on pages 648 - 650 in section III.D of [23] (BEGS).
We study an [7] economy with three Markov states driving government expenditures.
• In a previous lecture, we showed that with only two Markov states, it is pos-
sible that eventually endogenous interest rate fluctuations support complete
markets allocations and Ramsey outcomes.
• The presence of three states prevents the full spanning that eventually prevails in the
two-state example featured in Fiscal Insurance via Fluctuating Interest Rates.
The lack of full spanning means that the ergodic distribution of the par value of government
debt is nontrivial, in contrast to the situation in Fiscal Insurance via Fluctuating Interest
Rates where the ergodic distribution of the par value is concentrated on one point.
Nevertheless, [23] (BEGS) establish that, for general settings that include ours, the Ramsey planner steers government assets to a level that comes as close as possible to providing full spanning in a precise sense defined by BEGS that we describe below.
We use code constructed in a previous lecture.
Warning: Key equations in [23] section III.D carry typos that we correct below.
Let’s start with some imports:
As in Optimal Taxation without State-Contingent Debt and Optimal Taxation with State-
Contingent Debt, we assume that the representative agent has utility function
$$u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}$$

and that the feasibility condition

$$c_t + g_t = n_t$$

holds. In this lecture, we set $\beta = 0.9$, $\sigma = 2$, and $\gamma = 2$.
class CRRAutility:

    def __init__(self,
                 β=0.9,
                 σ=2,
                 γ=2,
                 π=0.5*np.ones((2, 2)),
                 G=np.array([0.1, 0.2]),
                 Θ=np.ones(2),
                 transfers=False):
        # store model parameters (the original assignments were lost in extraction)
        self.β, self.σ, self.γ = β, σ, γ
        self.π, self.G, self.Θ, self.transfers = π, G, Θ, transfers

    # Utility function
    def U(self, c, n):
        σ = self.σ
        if σ == 1.:
            U = np.log(c)
        else:
            U = (c**(1 - σ) - 1) / (1 - σ)
        return U - n**(1 + self.γ) / (1 + self.γ)
We’ll want first and second moments of some key random variables below.
The following code computes these moments; the code is recycled from Fiscal Insurance via
Fluctuating Interest Rates.
class SequentialAllocation:
'''
Class that takes CESutility or BGPutility object as input returns
planner's allocation as a function of the multiplier on the
implementability constraint μ.
'''
def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, G = self.S, self.Θ, self.G
Uc, Un = model.Uc, model.Un
def res(z):
c = z[:S]
n = z[S:]
return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
        res = root(res, 0.5 * np.ones(2 * S))   # from scipy.optimize; solves the FOCs
        if not res.success:
raise Exception('Could not find first best')
self.cFB = res.x[:S]
self.nFB = res.x[S:]
    def time1_allocation(self, μ):
        '''
        Computes optimal allocation for time t >= 1 for a given μ
        '''
model = self.model
S, Θ, G = self.S, self.Θ, self.G
Uc, Ucc, Un, Unn = model.Uc, model.Ucc, model.Un, model.Unn
def FOC(z):
c = z[:S]
n = z[S:2 * S]
Ξ = z[2 * S:]
# FOC of c
return np.hstack([Uc(c, n) - μ * (Ucc(c, n) * c + Uc(c, n)) - Ξ,
Un(c, n) - μ * (Unn(c, n) * n + Un(c, n)) \
+ Θ * Ξ, # FOC of n
Θ * n - c - G])
# Find root; first-best values serve as the initial guess
res = root(FOC, self.zFB)
c, n, Ξ = res.x[:S], res.x[S:2 * S], res.x[2 * S:]
# Compute x
I = Uc(c, n) * c + Un(c, n) * n
x = np.linalg.solve(np.eye(S) - self.β * self.π, I)
return c, n, x, Ξ
# Find root
res = root(FOC, np.array(
[0, self.cFB[s_0], self.nFB[s_0], self.ΞFB[s_0]]))
if not res.success:
raise Exception('Could not find time 0 LS allocation.')
return res.x
if sHist is None:
sHist = self.mc.simulate(T, s_0)
# Time 0
μ, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
Bhist[0] = B_
μHist[0] = μ
# Time 1 onward
for t in range(1, T):
c, n, x, Ξ = self.time1_allocation(μ)
Τ = self.Τ(c, n)
u_c = Uc(c, n)
s = sHist[t]
Eu_c = π[sHist[t - 1]] @ u_c
            cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x[s] / u_c[s], Τ[s]
RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (β * Eu_c)
μHist[t] = μ
class RecursiveAllocationAMSS:
def solve_time1_bellman(self):
'''
Solve the time 1 Bellman equation for calibration model and
initial grid μgrid0
'''
model, μgrid0 = self.model, self.μgrid
π = model.π
S = len(model.π)
# Create xgrid
x = np.vstack(xgrid).T
xbar = [x.min(0).max(), x.max(0).min()]
xgrid = np.linspace(xbar[0], xbar[1], len(μgrid0))
self.xgrid = xgrid
print(diff)
Vf = Vfnew
if sHist is None:
sHist = simulate_markov(π, s_0, T)
# Time 1 onward
for t in range(1, T):
s_, x, s = sHist[t - 1], xHist[t - 1], sHist[t]
            c, n, xprime, T = cf[s_, :](x), nf[s_, :](x), xprimef[s_, :](x), Tf[s_, :](x)
Τ = self.Τ(c, n)[s]
u_c = Uc(c, n)
Eu_c = π[s_, :] @ u_c
μHist[t] = self.Vf[s](xprime[s])
class BellmanEquation:
'''
Bellman equation for the continuation of the Lucas-Stokey Problem
'''
self.z0 = {}
cf, nf, xprimef = policies0
for s_ in range(self.S):
for x in xgrid:
self.z0[x, s_] = np.hstack([cf[s_, :](x),
nf[s_, :](x),
xprimef[s_, :](x),
np.zeros(self.S)])
self.find_first_best()
def find_first_best(self):
'''
def res(z):
c = z[:S]
n = z[S:]
return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])
self.cFB = res.x[:S]
self.nFB = res.x[S:]
IFB = Uc(self.cFB, self.nFB) * self.cFB + \
Un(self.cFB, self.nFB) * self.nFB
self.zFB = {}
for s in range(S):
self.zFB[s] = np.hstack(
[self.cFB[s], self.nFB[s], self.π[s] @ self.xFB, 0.])
def objf(z):
c, n, xprime = z[:S], z[S:2 * S], z[2 * S:3 * S]
Vprime = np.empty(S)
for s in range(S):
Vprime[s] = Vf[s](xprime[s])
def cons(z):
c, n, xprime, T = z[:S], z[S:2 * S], z[2 * S:3 * S], z[3 * S:]
u_c = Uc(c, n)
if model.transfers:
bounds = [(0., 100)] * S + [(0., 100)] * S + \
[self.xbar] * S + [(0., 100.)] * S
else:
bounds = [(0., 100)] * S + [(0., 100)] * S + \
[self.xbar] * S + [(0., 0.)] * S
out, fx, _, imode, smode = fmin_slsqp(objf, self.z0[x, s_],
f_eqcons=cons, bounds=bounds,
full_output=True, iprint=0,
acc=self.tol, iter=self.maxiter)
if imode > 0:
raise Exception(smode)
def objf(z):
c, n, xprime = z[:-1]
def cons(z):
c, n, xprime, T = z
return np.hstack([
-Uc(c, n) * (c - B_ - T) - Un(c, n) * n - β * xprime,
(Θ * n - c - G)[s0]])
if model.transfers:
bounds = [(0., 100), (0., 100), self.xbar, (0., 100.)]
else:
bounds = [(0., 100), (0., 100), self.xbar, (0., 0.)]
        out, fx, _, imode, smode = fmin_slsqp(objf, self.zFB[s0], f_eqcons=cons,
                                              bounds=bounds, full_output=True,
                                              iprint=0)
if imode > 0:
raise Exception(smode)
class interpolate_wrapper:
def transpose(self):
self.F = self.F.transpose()
def __len__(self):
return len(self.F)
class interpolator_factory:
    ...  # builds interpolating functions on a grid (body omitted in this extract)

def fun_vstack(fun_list):
    ...  # stacks a list of interpolators (body omitted in this extract)

def fun_hstack(fun_list):
    ...  # stacks a list of interpolators (body omitted in this extract)

def simulate_markov(π, s_0, T):
    # simulates a T-period Markov chain path starting from state s_0
    # (body omitted in this extract)
    return sHist
Next, we show the code that we use to generate a very long simulation starting from initial government debt equal to 0.5.
Here is a graph of a long simulation of 102000 periods.
sim_seq_long = log_sequential.simulate(0.5, 0, T)
sHist_long = sim_seq_long[-3]
sim_bel_long = log_bellman.simulate(0.5, 0, T, sHist_long)
(Iteration diagnostics printed during the solve: the reported error falls monotonically from about 3.83e-2 to about 9.05e-11.)
It takes about 1000 periods to reach the ergodic distribution – an outcome that is forecast by
approximations to rates of convergence that appear in [23] and that we discuss in a previous
lecture.
We discard the first 2000 observations of the simulation and construct the histogram of the par value of government debt.
We obtain the following graph for the histogram of the last 100,000 observations on the par
value of government debt.
The black vertical line denotes the sample mean for the last 100,000 observations included in the histogram; the green vertical line denotes the value of $\frac{\mathcal{B}^*}{E u_c}$, associated with a sample (presumably) from the ergodic distribution, where $\mathcal{B}^*$ is the regression coefficient described below; the red vertical line denotes an approximation by [23] to the mean of the ergodic distribution that can be computed before sampling from the ergodic distribution, as described below.
Before moving on to discuss the histogram and the vertical lines approximating the ergodic mean of government debt in more detail, the following graphs show government debt and taxes early in the simulation, for periods 1–100 and 101–200, respectively.
For the short samples early in our simulated sample of 102,000 observations, fluctuations in government debt and the tax rate conceal the weak but inexorable force that the Ramsey planner applies to both series, driving them toward ergodic distributions that lie far from these early observations
• early observations are more influenced by the initial value of the par value of
government debt than by the ergodic mean of the par value of government
debt
• much later observations are more influenced by the ergodic mean and are independent
of the initial value of the par value of government debt
100.5 Asymptotic Mean and Rate of Convergence

We now study the asymptotic mean of the ergodic distribution of the par value of government debt and the rate of convergence to the ergodic distribution from an arbitrary initial level of government debt.
We begin by computing objects required by the theory of section III.D of [23].
As in Fiscal Insurance via Fluctuating Interest Rates, we recall that [23] used a particular
notation to represent what we can regard as a generalization of the AMSS model.
We introduce some of the [23] notation so that readers can quickly relate notation that appears in their key formulas to the notation that we have used in previous lectures.
BEGS work with objects 𝐵𝑡 , ℬ𝑡 , ℛ𝑡 , 𝒳𝑡 that are related to notation that we used in earlier
lectures by
$$\mathcal{R}_t = R_{t-1}\frac{u_{c,t}}{u_{c,t-1}} = \frac{u_{c,t}}{\beta E_{t-1} u_{c,t}}$$

$$B_t = \frac{b_{t+1}(s^t)}{R_t(s^t)}$$

$$b_t(s^{t-1}) = \mathcal{R}_{t-1} B_{t-1}$$

$$\mathcal{B}_t = u_{c,t} B_t = (\beta E_t u_{c,t+1})\, b_{t+1}(s^t)$$

$$\mathcal{X}_t = u_{c,t}[g_t - \tau_t n_t]$$
[23] call 𝒳𝑡 the effective government deficit, and ℬ𝑡 the effective government debt.
Equation (44) of [23] expresses the time $t$, state $s$ government budget constraint as

$$\mathcal{B}(s) = \mathcal{R}_\tau(s, s_-)\mathcal{B}_- + \mathcal{X}_\tau(s) \tag{1}$$

where the dependence on $\tau$ is to remind us that these objects depend on the tax rate; $s_-$ is last period's Markov state.
BEGS interpret random variations in the right side of (1) as fiscal risks generated by
• interest-rate-driven fluctuations in time 𝑡 effective payments due on the government
portfolio, namely, ℛ𝜏 (𝑠, 𝑠− )ℬ− , and
• fluctuations in the effective government deficit 𝒳𝑡
BEGS give conditions under which the ergodic mean of ℬ𝑡 approximately satisfies the equa-
tion
$$\mathcal{B}^* = -\frac{\operatorname{cov}^\infty(\mathcal{R}_t, \mathcal{X}_t)}{\operatorname{var}^\infty(\mathcal{R}_t)} \tag{2}$$
where the superscript ∞ denotes a moment taken with respect to an ergodic distribution.
Formula (2) represents ℬ∗ as a regression coefficient of 𝒳𝑡 on ℛ𝑡 in the ergodic distribution.
Regression coefficient $\mathcal{B}^*$ solves a variance-minimization problem:

$$\mathcal{B}^* = \operatorname{argmin}_{\mathcal{B}} \operatorname{var}^\infty(\mathcal{R}\mathcal{B} + \mathcal{X}) \tag{3}$$
The minimand in criterion (3) measures fiscal risk associated with a given tax-debt policy
that appears on the right side of equation (1).
Expressing formula (2) in terms of our notation tells us that the ergodic mean of the par
value 𝑏 of government debt in the AMSS model should approximately equal
$$\hat b = \frac{\mathcal{B}^*}{\beta E(E_t u_{c,t+1})} = \frac{\mathcal{B}^*}{\beta E(u_{c,t+1})} \tag{4}$$
where mathematical expectations are taken with respect to the ergodic distribution.
BEGS also derive the following approximation to the rate of convergence to ℬ∗ from an arbi-
trary initial condition.
$$\frac{E_t(\mathcal{B}_{t+1} - \mathcal{B}^*)}{\mathcal{B}_t - \mathcal{B}^*} \approx \frac{1}{1 + \beta^2 \operatorname{var}^\infty(\mathcal{R})} \tag{5}$$
The remainder of this lecture is about technical material based on formulas from [23].
The topic is interpreting and extending formula (3) for the ergodic mean ℬ∗ .
Attributes of the ergodic distribution for ℬ𝑡 appear on the right side of formula (3) for the
ergodic mean ℬ∗ .
Thus, formula (3) is not useful for estimating the mean of the ergodic distribution in advance of actually computing the ergodic distribution
• we need to know the ergodic distribution to compute the right side of for-
mula (3)
So the primary use of equation (3) is to confirm that the ergodic distribution solves a fiscal-risk minimization problem.
As an example, notice how we used the formula for the mean of ℬ in the ergodic distribution
of the special AMSS economy in Fiscal Insurance via Fluctuating Interest Rates
[23] propose an approximation to ℬ∗ that can be computed without first knowing the ergodic
distribution.
To construct the BEGS approximation to ℬ∗ , we just follow steps set forth on pages 648 - 650
of section III.D of [23]
• notation in BEGS might be confusing at first sight, so it is important to stare and di-
gest before computing
• there are also some sign errors in the [23] text that we’ll want to correct
Here is a step-by-step description of the [23] approximation procedure.

Step 1: For a given $\tau$, compute a vector of values $c_\tau(s), s = 1, \ldots, S$ that satisfy

$$(1 - \tau)c_\tau(s)^{-\sigma} = (c_\tau(s) + g(s))^\gamma$$

Step 2: Knowing $c_\tau(s), s = 1, \ldots, S$ for a given $\tau$, we want to compute the random variables
$$\mathcal{R}_\tau(s) = \frac{c_\tau(s)^{-\sigma}}{\beta \sum_{s'=1}^S c_\tau(s')^{-\sigma}\pi(s')}$$

and

$$\mathcal{X}_\tau(s) = (c_\tau(s) + g(s))^{1+\gamma} - c_\tau(s)^{1-\sigma}$$

each for $s = 1, \ldots, S$.
BEGS call ℛ𝜏 (𝑠) the effective return on risk-free debt and they call 𝒳𝜏 (𝑠) the effective
government deficit.
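The function compute_R_X used in the execution cells below is not shown in this extract. A minimal sketch consistent with the Step 1 and Step 2 formulas (the bracketing interval for the root solver is an assumption) is:

import numpy as np
from scipy.optimize import brentq

def compute_R_X(τ, u, s):
    # Step 1: solve (1 - τ) c^(-σ) = (c + g)^γ for c in each Markov state
    S = len(u.G)
    c = np.array([brentq(lambda ci, j=j: (1 - τ) * ci**(-u.σ)
                         - (ci + u.G[j])**u.γ, 1e-6, 10) for j in range(S)])
    uc = c**(-u.σ)                       # marginal utility of consumption
    # Step 2: effective returns ℛ_τ(s') and effective deficits 𝒳_τ(s')
    R = uc / (u.β * u.π[s] @ uc)
    X = (c + u.G)**(1 + u.γ) - c**(1 - u.σ)
    return R, X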
Step 3: With the preceding objects in hand, for a given ℬ, we seek a 𝜏 that satisfies
$$\mathcal{B} = -\frac{\beta}{1-\beta} E\mathcal{X}_\tau \equiv -\frac{\beta}{1-\beta} \sum_s \mathcal{X}_\tau(s)\pi(s)$$
This equation says that at a constant discount factor 𝛽, equivalent government debt ℬ equals
the present value of the mean effective government surplus.
Typo alert: there is a sign error in equation (46) of [23]: the left side should be multiplied by −1.
For a given $\mathcal{B}$, let the $\tau$ that solves the above equation be called $\tau(\mathcal{B})$.

We'll use a Python root solver to find the $\tau$ that solves this equation for a given $\mathcal{B}$.

We'll use this function to induce a function $\tau(\mathcal{B})$.
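A sketch of this root-solving step, assuming a function X_of_tau that returns the vector 𝒳_τ(s) for a given τ and a stationary probability vector π_vec (both names are assumptions), is:

from scipy.optimize import brentq

def tau_of_B(B, X_of_tau, π_vec, β=0.9):
    # find τ satisfying ℬ = -β/(1-β) Σ_s 𝒳_τ(s) π(s)
    g = lambda τ: B + β / (1 - β) * (X_of_tau(τ) @ π_vec)
    return brentq(g, 0.05, 0.95)   # bracketing interval is an assumption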
Step 4: With a Python program that computes $\tau(\mathcal{B})$ in hand, next we write a Python function to compute the random variable

$$J(\mathcal{B})(s) = \mathcal{R}_{\tau(\mathcal{B})}(s)\,\mathcal{B} + \mathcal{X}_{\tau(\mathcal{B})}(s), \quad s = 1, \ldots, S$$
Step 5: Now that we have a machine to compute the random variable 𝐽 (ℬ)(𝑠), 𝑠 = 1, … , 𝑆,
via a composition of Python functions, we can use the population variance function that we
defined in the code above to construct a function var(𝐽 (ℬ)).
We put $\operatorname{var}(J(\mathcal{B}))$ into a function minimizer and compute

$$\mathcal{B}^* = \operatorname{argmin}_{\mathcal{B}} \operatorname{var}(J(\mathcal{B}))$$
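A sketch of this minimization, assuming a function var_J that composes τ(ℬ), the random variable J(ℬ)(s), and the population variance (names and bounds are assumptions), is:

from scipy.optimize import minimize_scalar

res = minimize_scalar(var_J, bounds=(-2.0, 0.0), method='bounded')
B_star = res.x   # the fiscal-risk-minimizing level ℬ*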
Step 6: Next we take the minimizer ℬ∗ and the Python functions for computing means and
variances and compute
$$\text{rate} = \frac{1}{1 + \beta^2 \operatorname{var}(\mathcal{R}_{\tau(\mathcal{B}^*)})}$$

and report the pair $(\mathcal{B}^*, \text{rate})$.
Step 7: Compute the divisor

$$div = \beta E u_{c,t+1}$$

and then compute the mean of the par value of government debt in the AMSS model

$$\hat b = \frac{\mathcal{B}^*}{div}$$
In the two-Markov-state AMSS economy in Fiscal Insurance via Fluctuating Interest Rates, $E_t u_{c,t+1} = E u_{c,t+1}$ in the ergodic distribution, and we have confirmed that this formula very accurately describes a constant par value of government debt:
• this is the red vertical line plotted in the histogram of the last 100,000 obser-
vations of our simulation of the par value of government debt plotted above
100.5.7 Execution
Step 1
Step 2
100.5.9 Code
In [15]: u.π
In [16]: s = 0
R, X = compute_R_X(τ, u, s)
In [17]: R
In [18]: mean(R, s)
Out[18]: 1.1111111111111112
In [19]: X
In [20]: mean(X, s)
Out[20]: 0.19134248445303795
In [21]: X @ u.π
Step 3
In [23]: s = 0
B = 1.0
Out[23]: 0.2740159773695818
Step 4
In [25]: min_J(B, u, s)
Out[25]: 0.035564405653720765
Step 6
Out[26]: -1.199483167941158
Out[29]: -1.0577661126390971
Out[30]: 0.09572916798461703
Out[32]: 0.9931353432732218
Chapter 101

Competitive Equilibria of Chang Model

101.1 Contents
• Overview 101.2
• Setting 101.3
• Competitive Equilibrium 101.4
• Inventory of Objects in Play 101.5
• Analysis 101.6
• Calculating all Promise-Value Pairs in CE 101.7
• Solving a Continuation Ramsey Planner’s Bellman Equation 101.8
Co-author: Sebastian Graves
In addition to what’s in Anaconda, this lecture will need the following libraries:
101.2 Overview
This lecture describes how Chang [34] analyzed competitive equilibria and a best competi-
tive equilibrium called a Ramsey plan.
He did this by
• characterizing a competitive equilibrium recursively in a way also employed in the dy-
namic Stackelberg problems and Calvo model lectures to pose Stackelberg problems in
linear economies, and then
• appropriately adapting an argument of Abreu, Pearce, and Stachetti [2] to describe key
features of the set of competitive equilibria
Roberto Chang [34] chose a model of Calvo [29] as a simple structure that conveys ideas that
apply more broadly.
A textbook version of Chang’s model appears in chapter 25 of [108].
This lecture and Credible Government Policies in Chang Model can be viewed as more sophisticated and complete treatments of the topics discussed in Ramsey plans, time inconsistency, sustainable plans.
An infinitely lived representative agent and an infinitely lived government exist at dates 𝑡 =
0, 1, ….
The objects in play are
• an initial quantity 𝑀−1 of nominal money holdings
• a sequence of inverse money growth rates ℎ⃗ and an associated sequence of nominal
money holdings 𝑀⃗
• a sequence of values of money 𝑞 ⃗
• a sequence of real money holdings 𝑚⃗
• a sequence of total tax collections 𝑥⃗
• a sequence of per capita rates of consumption 𝑐 ⃗
• a sequence of per capita incomes 𝑦 ⃗
A benevolent government chooses sequences (𝑀⃗ , ℎ,⃗ 𝑥)⃗ subject to a sequence of budget con-
straints and other constraints imposed by competitive equilibrium.
Given tax collection and price of money sequences, a representative household chooses sequences $(\vec c, \vec m)$ of consumption and real balances.
In competitive equilibrium, the price of money sequence 𝑞 ⃗ clears markets, thereby reconciling
decisions of the government and the representative household.
Chang adopts a version of a model that [29] designed to exhibit time-inconsistency of a Ram-
sey policy in a simple and transparent setting.
101.3 Setting
A representative household faces a nonnegative value of money sequence 𝑞 ⃗ and sequences 𝑦,⃗ 𝑥⃗
of income and total tax collections, respectively.
The household chooses nonnegative sequences 𝑐,⃗ 𝑀⃗ of consumption and nominal balances,
respectively, to maximize
$$\sum_{t=0}^\infty \beta^t [u(c_t) + v(q_t M_t)] \tag{1}$$
subject to
𝑞𝑡 𝑀𝑡 ≤ 𝑦𝑡 + 𝑞𝑡 𝑀𝑡−1 − 𝑐𝑡 − 𝑥𝑡 (2)
and
𝑞𝑡 𝑀𝑡 ≤ 𝑚̄ (3)
Here 𝑞𝑡 is the reciprocal of the price level at 𝑡, which we can also call the value of money.
Chang [34] assumes that
• 𝑢 ∶ ℝ+ → ℝ is twice continuously differentiable, strictly concave, and strictly increasing;
• 𝑣 ∶ ℝ+ → ℝ is twice continuously differentiable and strictly concave;
• $\lim_{c \to 0} u'(c) = \lim_{m \to 0} v'(m) = +\infty$;
• there is a finite level 𝑚 = 𝑚𝑓 such that 𝑣′ (𝑚𝑓 ) = 0
The household carries real balances out of a period equal to 𝑚𝑡 = 𝑞𝑡 𝑀𝑡 .
Inequality (2) is the household’s time 𝑡 budget constraint.
It tells how real balances 𝑞𝑡 𝑀𝑡 carried out of period 𝑡 depend on income, consumption, taxes,
and real balances 𝑞𝑡 𝑀𝑡−1 carried into the period.
Equation (3) imposes an exogenous upper bound 𝑚̄ on the household’s choice of real bal-
ances, where 𝑚̄ ≥ 𝑚𝑓 .
101.3.2 Government
The government chooses a sequence of inverse money growth rates with time $t$ component $h_t \equiv \frac{M_{t-1}}{M_t} \in \Pi \equiv [\underline\pi, \overline\pi]$, where $0 < \underline\pi < 1 < \frac{1}{\beta} \leq \overline\pi$.

The government's budget constraint at time $t$ is

$$-x_t = m_t(1 - h_t) \tag{4}$$
The restrictions $m_t \in [0, \bar m]$ and $h_t \in \Pi$ evidently imply that $x_t \in X \equiv [(\underline\pi - 1)\bar m, (\overline\pi - 1)\bar m]$.

We define the set $E \equiv [0, \bar m] \times \Pi \times X$, so that we require that $(m, h, x) \in E$.
To represent the idea that taxes are distorting, Chang makes the following assumption about
outcomes for per capita output:
𝑦𝑡 = 𝑓(𝑥𝑡 ), (5)
where 𝑓 ∶ ℝ → ℝ satisfies 𝑓(𝑥) > 0, is twice continuously differentiable, 𝑓 ″ (𝑥) < 0, and
𝑓(𝑥) = 𝑓(−𝑥) for all 𝑥 ∈ ℝ, so that subsidies and taxes are equally distorting.
Calvo’s and Chang’s purpose is not to model the causes of tax distortions in any detail but
simply to summarize the outcome of those distortions via the function 𝑓(𝑥).
A key part of the specification is that tax distortions are increasing in the absolute value of
tax revenues.
Ramsey plan: A Ramsey plan is a competitive equilibrium that maximizes (1).
Within-period timing of decisions is as follows:
• first, the government chooses ℎ𝑡 and 𝑥𝑡 ;
• then given 𝑞 ⃗ and its expectations about future values of 𝑥 and 𝑦’s, the household
chooses 𝑀𝑡 and therefore 𝑚𝑡 because 𝑚𝑡 = 𝑞𝑡 𝑀𝑡 ;
• then output 𝑦𝑡 = 𝑓(𝑥𝑡 ) is realized;
• finally 𝑐𝑡 = 𝑦𝑡
This within-period timing confronts the government with choices framed by how the private
sector wants to respond when the government takes time 𝑡 actions that differ from what the
private sector had expected.
This consideration will be important in lecture credible government policies when we study
credible government policies.
The model is designed to focus on the intertemporal trade-offs between the welfare benefits
of deflation and the welfare costs associated with the high tax collections required to retire
money at a rate that delivers deflation.
A benevolent time 0 government can promote utility generating increases in real balances
only by imposing sufficiently large distorting tax collections.
To promote the welfare increasing effects of high real balances, the government wants to in-
duce gradual deflation.
$$\mathcal{L} = \max_{\vec c, \vec M}\, \min_{\vec\lambda, \vec\mu} \sum_{t=0}^\infty \beta^t \Bigl\{ u(c_t) + v(M_t q_t) + \lambda_t[y_t - c_t - x_t + q_t M_{t-1} - q_t M_t] + \mu_t[\bar m - q_t M_t] \Bigr\}$$

First-order conditions with respect to $c_t$ and $M_t$, respectively, are

$$u'(c_t) = \lambda_t$$

$$q_t[u'(c_t) - v'(M_t q_t)] \leq \beta u'(c_{t+1})q_{t+1}, \quad = \text{ if } M_t q_t < \bar m$$
Using $h_t = \frac{M_{t-1}}{M_t}$ and $q_t = \frac{m_t}{M_t}$ in these first-order conditions and rearranging implies

$$m_t[u'(c_t) - v'(m_t)] \leq \beta u'(f(x_{t+1}))(m_{t+1} + x_{t+1}), \quad = \text{ if } m_t < \bar m \tag{6}$$

Define $\theta_{t+1} \equiv u'(f(x_{t+1}))(m_{t+1} + x_{t+1})$.

This is real money balances at time $t+1$ measured in units of marginal utility, which Chang refers to as 'the marginal utility of real balances'. Using this definition, (6) can be expressed as

$$m_t[u'(c_t) - v'(m_t)] \leq \beta\theta_{t+1}, \quad = \text{ if } m_t < \bar m \tag{7}$$
From the standpoint of the household at time $t$, equation (7) shows that $\theta_{t+1}$ intermediates the influences of $(\vec x_{t+1}, \vec m_{t+1})$ on the household's choice of real balances $m_t$.

By "intermediates" we mean that the future paths $(\vec x_{t+1}, \vec m_{t+1})$ influence $m_t$ entirely through their effects on the scalar $\theta_{t+1}$.
The observation that the one dimensional promised marginal utility of real balances 𝜃𝑡+1
functions in this way is an important step in constructing a class of competitive equilibria
that have a recursive representation.
A closely related observation pervaded the analysis of Stackelberg plans in lecture dynamic
Stackelberg problems.
Definition:
• A government policy is a pair of sequences (ℎ,⃗ 𝑥)⃗ where ℎ𝑡 ∈ Π ∀𝑡 ≥ 0.
• A price system is a nonnegative value of money sequence 𝑞.⃗
• An allocation is a triple of nonnegative sequences $(\vec c, \vec m, \vec y)$.
It is required that time 𝑡 components (𝑚𝑡 , 𝑥𝑡 , ℎ𝑡 ) ∈ 𝐸.
Definition:

Given $M_{-1}$, a government policy $(\vec h, \vec x)$, price system $\vec q$, and allocation $(\vec c, \vec m, \vec y)$ are said to be a competitive equilibrium if

• $m_t = q_t M_t$ and $y_t = f(x_t)$;

• the government budget constraint (4) is satisfied;

• given $\vec q, \vec x, \vec y$, the sequences $(\vec c, \vec M)$ solve the household's problem.
• Let Ω denote the set of initial promised marginal utilities of money 𝜃0 associated with
competitive equilibria.
• Chang exploits the fact that a competitive equilibrium consists of a first period outcome
(ℎ0 , 𝑚0 , 𝑥0 ) and a continuation competitive equilibrium with marginal utility of money
𝜃1 ∈ Ω.
In particular, a Ramsey plan can be represented recursively as

$$\begin{aligned} h_t &= h(\theta_t) \\ m_t &= m(\theta_t) \\ x_t &= x(\theta_t) \\ \theta_{t+1} &= \Psi(\theta_t) \end{aligned} \tag{8}$$

starting from $\theta_0$.

The range and domain of $\Psi(\cdot)$ are both $\Omega$.
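A small sketch of iterating on the recursion (8), assuming the policy functions $h, m, x$ and the transition $\Psi$ are available as Python callables (the toy functions below are purely illustrative):

def simulate_ramsey(θ0, Ψ, h, m, x, T=30):
    # iterate θ_{t+1} = Ψ(θ_t), recording (θ_t, h_t, m_t, x_t) along the way
    path, θ = [], θ0
    for t in range(T):
        path.append((θ, h(θ), m(θ), x(θ)))
        θ = Ψ(θ)
    return path

# toy example with made-up policy functions
path = simulate_ramsey(θ0=1.0,
                       Ψ=lambda θ: 0.9 * θ + 0.1,
                       h=lambda θ: 1.0, m=lambda θ: θ, x=lambda θ: 0.0)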
• Imagine that after a ‘revolution’ at time 𝑡 ≥ 1, a new Ramsey planner is given the op-
portunity to ignore history and solve a brand new Ramsey plan.
• This new planner would want to reset the 𝜃𝑡 associated with the original Ramsey plan
to 𝜃0 .
• The incentive to reinitialize 𝜃𝑡 associated with this revolution experiment indicates the
time-inconsistency of the Ramsey plan.
• By resetting 𝜃 to 𝜃0 , the new planner avoids the costs at time 𝑡 that the original Ram-
sey planner must pay to reap the beneficial effects that the original Ramsey plan for
𝑠 ≥ 𝑡 had achieved via its influence on the household’s decisions for 𝑠 = 0, … , 𝑡 − 1.
101.6 Analysis
A competitive equilibrium is a triple of sequences (𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐸 ∞ that satisfies (2), (3), and
(6).
Chang works with a set of competitive equilibria defined as follows.
Definition: 𝐶𝐸 = {(𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐸 ∞ such that (2), (3), and (6) are satisfied }.
𝐶𝐸 is not empty because there exists a competitive equilibrium with ℎ𝑡 = 1 for all 𝑡 ≥ 1,
namely, an equilibrium with a constant money supply and constant price level.
Chang establishes that 𝐶𝐸 is also compact.
Chang makes the following key observation that combines ideas of Abreu, Pearce, and Stac-
chetti [2] with insights of Kydland and Prescott [102].
Proposition: The continuation of a competitive equilibrium is a competitive equilibrium.
That is, (𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐶𝐸 implies that (𝑚⃗ 𝑡 , 𝑥𝑡⃗ , ℎ⃗ 𝑡 ) ∈ 𝐶𝐸 ∀ 𝑡 ≥ 1.
(Lecture dynamic Stackelberg problems also used a version of this insight)
We can now state that a Ramsey problem is to
$$\max_{(\vec m, \vec x, \vec h) \in CE} \sum_{t=0}^\infty \beta^t [u(c_t) + v(m_t)]$$
$$\Omega = \{\theta \in \mathbb{R} : \theta = u'(f(x_0))(m_0 + x_0) \text{ for some } (\vec m, \vec x, \vec h) \in CE\}$$
Equation (6) inherits from the household's Euler equation for money holdings the property that the value of $m_0$ consistent with the representative household's choices depends on $(\vec h_1, \vec m_1)$.

This dependence is captured in the definition above by making $\Omega$ be the set of first period values of $\theta_0$ satisfying $\theta_0 = u'(f(x_0))(m_0 + x_0)$ for first period component $(m_0, h_0)$ of competitive equilibrium sequences $(\vec m, \vec x, \vec h)$.
Chang establishes that Ω is a nonempty and compact subset of ℝ+ .
Next Chang advances:
$$w(\theta) = \max_{(\vec m, \vec x, \vec h) \in \Gamma(\theta)} \sum_{t=0}^\infty \beta^t [u(f(x_t)) + v(m_t)]$$
subject to

$$\theta = u'(f(x))(m + x) \tag{11}$$

and

$$-x = m(1 - h) \tag{12}$$

and

$$m[u'(f(x)) - v'(m)] \leq \beta\theta', \quad = \text{ if } m < \bar m \tag{13}$$
Before we use this proposition to recover a recursive representation of the Ramsey plan, note
that the proposition relies on knowing the set Ω.
To find Ω, Chang uses the insights of Kydland and Prescott [102] together with a method
based on the Abreu, Pearce, and Stacchetti [2] iteration to convergence on an operator 𝐵 that
maps continuation values into values.
We want an operator that maps a continuation 𝜃 into a current 𝜃.
Chang lets 𝑄 be a nonempty, bounded subset of ℝ.
Elements of the set 𝑄 are taken to be candidate values for continuation marginal utilities.
Proposition:

1. If $Q \subset B(Q)$, then $B(Q) \subset \Omega$ ('self-generation').

2. $\Omega = B(\Omega)$ ('factorization').
Let $\vec h^t = (h_0, h_1, \ldots, h_t)$ denote a history of inverse money creation rates with time $t$ component $h_t \in \Pi$.

A government strategy $\sigma = \{\sigma_t\}_{t=0}^\infty$ is a $\sigma_0 \in \Pi$ and, for $t \geq 1$, a sequence of functions $\sigma_t : \Pi^{t-1} \to \Pi$.
Chang restricts the government’s choice of strategies to the following space:
𝐶𝐸𝜋 = {ℎ⃗ ∈ Π∞ ∶ there is some (𝑚,⃗ 𝑥)⃗ such that (𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐶𝐸}
In words, 𝐶𝐸𝜋 is the set of money growth sequences consistent with the existence of competi-
tive equilibria.
Chang observes that 𝐶𝐸𝜋 is nonempty and compact.
Definition: 𝜎 is said to be admissible if for all 𝑡 ≥ 1 and after any history ℎ⃗ 𝑡−1 , the continua-
tion ℎ⃗ 𝑡 implied by 𝜎 belongs to 𝐶𝐸𝜋 .
Admissibility of 𝜎 means that anticipated policy choices associated with 𝜎 are consistent with
the existence of competitive equilibria after each possible subsequent history.
After any history $\vec h^{t-1}$, admissibility restricts the government's choice in period $t$ to the set

$$CE^0_\pi = \{h \in \Pi : \text{there is } \vec h \in CE_\pi \text{ with } h_0 = h\}$$

In words, $CE^0_\pi$ is the set of all first period money growth rates $h = h_0$, each of which is consistent with the existence of a sequence of money growth rates $\vec h$ starting from $h_0$ in the initial period and for which a competitive equilibrium exists.
Remark: $CE^0_\pi = \{h \in \Pi : \text{there is } (m, \theta') \in [0, \bar m] \times \Omega \text{ such that } m[u'(f((h-1)m)) - v'(m)] \leq \beta\theta' \text{ with equality if } m < \bar m\}$.
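As an illustration of how the Remark can be used computationally, here is a hedged sketch of a grid-based membership check for $CE^0_\pi$, assuming callables u_prime, v_prime, f and bounds $\bar m$ and $\Omega = [\theta_{lo}, \theta_{hi}]$ (all hypothetical):

import numpy as np

def in_CE0(h, u_prime, v_prime, f, mbar, θ_lo, θ_hi, β=0.8, n=200):
    # search for (m, θ') ∈ [0, m̄] × Ω satisfying the Remark's condition
    grid = np.linspace(1e-6, mbar, n)
    for m in grid:
        lhs = m * (u_prime(f((h - 1) * m)) - v_prime(m))
        if m < mbar and θ_lo <= lhs / β <= θ_hi:   # equality case: θ' = lhs/β ∈ Ω
            return True
        if m == mbar and lhs <= β * θ_hi:          # inequality allowed at m = m̄
            return True
    return False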
At this point it is convenient to introduce another operator that can be used to compute a
Ramsey plan.
For computing a Ramsey plan, this operator is wasteful because it works with a state vector
that is bigger than necessary.
We introduce this operator because it helps to prepare the way for Chang's operator called $\tilde D(Z)$ that we shall describe in lecture credible government policies.

It is also useful because a fixed point of the operator to be defined here provides a good guess for an initial set from which to initiate iterations on Chang's set-to-set operator $\tilde D(Z)$ to be described in lecture credible government policies.
Let 𝑆 be the set of all pairs (𝑤, 𝜃) of competitive equilibrium values and associated initial
marginal utilities.
Let 𝑊 be a bounded set of values in ℝ.
Let 𝑍 be a nonempty subset of 𝑊 × Ω.
Think of using pairs $(w', \theta')$ drawn from $Z$ as candidate continuation value and promised marginal utility pairs.

Define the operator $D(Z)$, written out explicitly below, that maps such candidate continuation pairs into current $(w, \theta)$ pairs.

It is possible to establish:
Proposition:

1. If $Z \subset D(Z)$, then $D(Z) \subset S$ ('self-generation').

2. $S = D(S)$ ('factorization').
Proposition: It can be shown that $S$ is compact and that therefore there exists a $(w, \theta)$ pair within this set that attains the highest possible value $w$.

This $(w, \theta)$ pair is associated with a Ramsey plan.
Further, we can compute 𝑆 by iterating to convergence on 𝐷 provided that one begins with a
sufficiently large initial set 𝑆0 .
As a very useful by-product, the algorithm that finds the largest fixed point 𝑆 = 𝐷(𝑆) also
produces the Ramsey plan, its value 𝑤, and the associated competitive equilibrium.
$$D(Z) = \{(w, \theta) : \exists\, h \in CE^0_\pi \text{ and } (m(h), x(h), w'(h), \theta'(h)) \in [0, \bar m] \times X \times Z$$

such that

$$w = u(f(x(h))) + v(m(h)) + \beta w'(h)$$

$$\theta = u'(f(x(h)))(m(h) + x(h))$$

$$x(h) = m(h)(h - 1)$$

$$m(h)\bigl(u'(f(x(h))) - v'(m(h))\bigr) \leq \beta\theta'(h) \text{ (with equality if } m(h) < \bar m)\}$$
We noted that the set 𝑆 can be found by iterating to convergence on 𝐷, provided that we
start with a sufficiently large initial set 𝑆0 .
Our implementation builds on ideas in this notebook.
To find 𝑆 we use a numerical algorithm called the outer hyperplane approximation algorithm.
It was invented by Judd, Yeltekin, Conklin [94].
This algorithm constructs the smallest convex set that contains the fixed point of the 𝐷(𝑆)
operator.
Given that we are finding the smallest convex set that contains 𝑆, we can represent it on a
computer as the intersection of a finite number of half-spaces.
Let 𝐻 be a set of subgradients, and 𝐶 be a set of hyperplane levels.
We approximate $S$ by:

$$\tilde S = \{(w, \theta) \mid H \cdot (w, \theta) \leq C\}$$
A key feature of this algorithm is that we discretize the action space, i.e., we create a grid of
possible values for 𝑚 and ℎ (note that 𝑥 is implied by 𝑚 and ℎ). This discretization simplifies
computation of 𝑆 ̃ by allowing us to find it by solving a sequence of linear programs.
The outer hyperplane approximation algorithm proceeds as follows:

1. Initialize subgradients $H$ and hyperplane levels $C_0$.

2. Given subgradients $H$ and hyperplane levels $C_t$, for each subgradient $h_i$:

• Solve a linear program (described below) for each action in the action space.

• Find the maximum and update the corresponding hyperplane level, $C_{i,t+1}$.

3. If $|C_{t+1} - C_t| > \epsilon$, return to step 2.
Step 2 requires us to solve, for each subgradient $h_i$ and each action $(m_j, h_j)$, the linear program

$$\max_{[w', \theta']} h_i \cdot (w, \theta)$$

subject to

$$H \cdot (w', \theta') \leq C_t$$

$$w = u(f(x_j)) + v(m_j) + \beta w'$$

$$\theta = u'(f(x_j))(m_j + x_j)$$

$$x_j = m_j(h_j - 1)$$
This problem maximizes the hyperplane level for a given set of actions.
The second part of Step 2 then finds the maximum possible hyperplane level across the action
space.
The algorithm constructs a sequence of progressively smaller sets 𝑆𝑡+1 ⊂ 𝑆𝑡 ⊂ 𝑆𝑡−1 ⋯ ⊂ 𝑆0 .
Step 3 ends the algorithm when the difference between these sets is small enough.
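To make the linear-programming step concrete, here is a hedged sketch of one hyperplane-level update for a single subgradient $h_i$ and a single action $j$, assuming the action pins down a within-period payoff $u_j$ and promised marginal utility $\theta_j$, with $w = u_j + \beta w'$ (all numerical values below are hypothetical):

import numpy as np
from scipy.optimize import linprog

β = 0.8
N = 8
angles = np.linspace(0, 2 * np.pi, N, endpoint=False)
H = np.column_stack([np.cos(angles), np.sin(angles)])  # subgradients on the unit circle
C = np.ones(N)                                         # hypothetical current levels C_t

u_j, θ_j = 0.5, 1.0   # hypothetical payoff and promised θ for action j
h_i = H[0]            # the subgradient whose level we update

# maximize h_i·(w, θ) = h_i[0](u_j + β w') + h_i[1] θ_j subject to H·(w', θ') ≤ C_t
res = linprog(c=-np.array([β * h_i[0], 0.0]), A_ub=H, b_ub=C,
              bounds=[(None, None), (None, None)])
level = h_i[0] * (u_j + β * res.x[0]) + h_i[1] * θ_j   # candidate C_{i,t+1} for this action
print(level)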
We have created a Python class that solves the model assuming the following functional
forms:
$$u(c) = \log(c)$$

$$v(m) = \frac{1}{500}(m\bar m - 0.5m^2)^{0.5}$$
In [3]: """
Author: Sebastian Graves
"""
import numpy as np
import quantecon as qe
import time
class ChangModel:
    """
    Class to solve for the competitive and sustainable sets in the
    Chang (1998) model.
    """
# Record parameters
self.β, self.mbar, self.h_min, self.h_max = β, mbar, h_min, h_max
self.n_h, self.n_m, self.N_g = n_h, n_m, N_g
w_space = np.array([min(w_vec[~np.isinf(w_vec)]),
max(w_vec[~np.isinf(w_vec)])])
p_space = np.array([0, max(p_vec[~np.isinf(w_vec)])])
self.p_space = p_space
# Points on circle
H = np.zeros((N, 2))
for i in range(N):
x = degrees[i]
H[i, 0] = np.cos(x)
H[i, 1] = np.sin(x)
return C, H, Z
def solve_worst_spe(self):
"""
Method to solve for BR(Z). See p.449 of Chang (1998)
"""
# Pre-compute constraints
aineq_mbar = np.vstack((self.H, np.array([0, -self.β])))
bineq_mbar = np.vstack((self.c0_s, 0))
aineq = self.H
bineq = self.c0_s
aeq = [[0, -self.β]]
for j in range(self.N_a):
# Only try if consumption is possible
if self.f_vec[j] > 0:
# If m = mbar, use inequality constraint
if self.A[j, 1] == self.mbar:
bineq_mbar[-1] = self.euler_vec[j]
res = linprog(c, A_ub=aineq_mbar, b_ub=bineq_mbar,
bounds=(self.w_bnds_s, self.p_bnds_s))
else:
beq = self.euler_vec[j]
                    res = linprog(c, A_ub=aineq, b_ub=bineq, A_eq=aeq, b_eq=beq,
                                  bounds=(self.w_bnds_s, self.p_bnds_s))
if res.status == 0:
p_vec[j] = self.u_vec[j] + self.β * res.x[0]
# Max over h and min over other variables (see Chang (1998) p.449)
        self.br_z = np.nanmax(np.nanmin(p_vec.reshape(self.n_m, self.n_h), 0))
def solve_subgradient(self):
"""
Method to solve for E(Z). See p.449 of Chang (1998)
"""
# Pre-compute constraints
aineq_C_mbar = np.vstack((self.H, np.array([0, -self.β])))
bineq_C_mbar = np.vstack((self.c0_c, 0))
aineq_C = self.H
bineq_C = self.c0_c
aeq_C = [[0, -self.β]]
for i in range(self.N_g):
c_a1a2_c, t_a1a2_c = np.full(self.N_a, -np.inf), \
np.zeros((self.N_a, 2))
c_a1a2_s, t_a1a2_s = np.full(self.N_a, -np.inf), \
np.zeros((self.N_a, 2))
for j in range(self.N_a):
# Only try if consumption is possible
if self.f_vec[j] > 0:
# COMPETITIVE EQUILIBRIA
# If m = mbar, use inequality constraint
if self.A[j, 1] == self.mbar:
bineq_C_mbar[-1] = self.euler_vec[j]
res = linprog(c, A_ub=aineq_C_mbar, b_ub=bineq_C_mbar,
bounds=(self.w_bnds_c, self.p_bnds_c))
# If m < mbar, use equality constraint
else:
beq_C = self.euler_vec[j]
                        res = linprog(c, A_ub=aineq_C, b_ub=bineq_C, A_eq=aeq_C,
                                      b_eq=beq_C, bounds=(self.w_bnds_c, self.p_bnds_c))
if res.status == 0:
c_a1a2_c[j] = self.H[i, 0] * (self.u_vec[j] \
+ self.β * res.x[0]) + self.H[i, 1] * self.Θ_vec[j]
t_a1a2_c[j] = res.x
# SUSTAINABLE EQUILIBRIA
# If m = mbar, use inequality constraint
if self.A[j, 1] == self.mbar:
bineq_S_mbar[-2] = self.euler_vec[j]
bineq_S_mbar[-1] = self.u_vec[j] - self.br_z
res = linprog(c, A_ub=aineq_S_mbar, b_ub=bineq_S_mbar,
bounds=(self.w_bnds_s, self.p_bnds_s))
# If m < mbar, use equality constraint
else:
bineq_S[-1] = self.u_vec[j] - self.br_z
beq_S = self.euler_vec[j]
                        res = linprog(c, A_ub=aineq_S, b_ub=bineq_S, A_eq=aeq_S,
                                      b_eq=beq_S, bounds=(self.w_bnds_s, self.p_bnds_s))
if res.status == 0:
c_a1a2_s[j] = self.H[i, 0] * (self.u_vec[j] \
+ self.β*res.x[0]) + self.H[i, 1] * self.Θ_vec[j]
t_a1a2_s[j] = res.x
self.Θ_vec[idx_s]])
for i in range(self.N_g):
self.c1_c[i] = np.dot(self.z1_c[:, i], self.H[i, :])
self.c1_s[i] = np.dot(self.z1_s[:, i], self.H[i, :])
t = time.time()
diff = tol + 1
iters = 0
# Save iteration
self.c_dic_c[iters], self.c_dic_s[iters] = np.copy(self.c1_c), \
np.copy(self.c1_s)
self.iters = iters
elapsed = time.time() - t
print('Convergence achieved after {} iterations and {} \
seconds'.format(iters, round(elapsed, 2)))
def p_fun2(x):
scale = -1 + 2*(x[1] - θ_min)/(θ_max - θ_min)
p_fun = - (u(x[0],mbar) \
+ self.β * np.dot(cheb.chebvander(scale, order - 1), c))
return p_fun
# Bellman Iterations
diff = 1
iters = 1
self.θ_grid = s
self.p_iter = p_iter1
self.Φ = Φ
self.c = c
print('Convergence achieved after {} iterations'.format(iters))
# Check residuals
θ_grid_fine = np.linspace(θ_min, θ_max, 100)
resid_grid = np.zeros(100)
p_grid = np.zeros(100)
θ_prime_grid = np.zeros(100)
m_grid = np.zeros(100)
h_grid = np.zeros(100)
for i in range(100):
θ = θ_grid_fine[i]
res = minimize(p_fun,
lb1 + (ub1-lb1) / 2,
method='SLSQP',
bounds=bnds1,
constraints=cons1,
tol=1e-10)
if res.success == True:
p = -p_fun(res.x)
p_grid[i] = p
θ_prime_grid[i] = res.x[2]
h_grid[i] = res.x[0]
m_grid[i] = res.x[1]
res = minimize(p_fun2,
lb2 + (ub2-lb2)/2,
method='SLSQP',
bounds=bnds2,
constraints=cons2,
tol=1e-10)
if -p_fun2(res.x) > p and res.success == True:
p = -p_fun2(res.x)
p_grid[i] = p
θ_prime_grid[i] = res.x[1]
h_grid[i] = res.x[0]
m_grid[i] = self.mbar
scale = -1 + 2 * (θ - θ_min)/(θ_max - θ_min)
resid_grid[i] = np.dot(cheb.chebvander(scale, order-1), c) - p
self.resid_grid = resid_grid
self.θ_grid_fine = θ_grid_fine
self.θ_prime_grid = θ_prime_grid
self.m_grid = m_grid
self.h_grid = h_grid
self.p_grid = p_grid
self.x_grid = m_grid * (h_grid - 1)
# Simulate
θ_series = np.zeros(31)
m_series = np.zeros(30)
h_series = np.zeros(30)
# Find initial θ
def ValFun(x):
scale = -1 + 2*(x - θ_min)/(θ_max - θ_min)
p_fun = np.dot(cheb.chebvander(scale, order - 1), c)
return -p_fun
res = minimize(ValFun,
(θ_min + θ_max)/2,
bounds=[(θ_min, θ_max)])
θ_series[0] = res.x
# Simulate
for i in range(30):
θ = θ_series[i]
res = minimize(p_fun,
lb1 + (ub1-lb1)/2,
1836 CHAPTER 101. COMPETITIVE EQUILIBRIA OF CHANG MODEL
method='SLSQP',
bounds=bnds1,
constraints=cons1,
tol=1e-10)
if res.success == True:
p = -p_fun(res.x)
h_series[i] = res.x[0]
m_series[i] = res.x[1]
θ_series[i+1] = res.x[2]
res2 = minimize(p_fun2,
lb2 + (ub2-lb2)/2,
method='SLSQP',
bounds=bnds2,
constraints=cons2,
tol=1e-10)
if -p_fun2(res2.x) > p and res2.success == True:
h_series[i] = res2.x[0]
m_series[i] = self.mbar
θ_series[i+1] = res2.x[1]
self.θ_series = θ_series
self.m_series = m_series
self.h_series = h_series
self.x_series = m_series * (h_series - 1)
In [4]: ch1 = ChangModel(β=0.3, mbar=30, h_min=0.9, h_max=2, n_h=8, n_m=35, N_g=10)
        ch1.solve_sustainable()

(The call to solve_sustainable raised an exception inside SciPy's linprog, within solve_subgradient; the full traceback is omitted here.)
ax.set_xlabel('w', fontsize=16)
ax.set_ylabel(r"$\theta$", fontsize=18)
plt.tight_layout()
plt.show()
plot_competitive(ch1)
In [6]: ch2 = ChangModel(β=0.8, mbar=30, h_min=0.9, h_max=1/0.8,
                         n_h=8, n_m=35, N_g=10)
        ch2.solve_sustainable()

(As above, solve_sustainable raised an exception inside SciPy's linprog, within solve_subgradient; the full traceback is omitted here.)
In [7]: plot_competitive(ch2)
In this section we solve the Bellman equation confronting a continuation Ramsey planner.

The construction of a Ramsey plan is decomposed into two subproblems in Ramsey plans, time inconsistency, sustainable plans and dynamic Stackelberg problems.
Subproblem 1 is the Bellman equation of the continuation Ramsey planner

$$w(\theta) = \max_{m, x, h, \theta'} \{u(f(x)) + v(m) + \beta w(\theta')\}$$

subject to:

$$\theta = u'(f(x))(m + x)$$

$$x = m(h - 1)$$

$$(m, x, h) \in E$$

$$\theta' \in \Omega$$
First, a quick check that our approximations of the value functions are good.
We do this by calculating the residuals between iterates on the value function on a fine grid:
The value functions plotted below trace out the right edges of the sets of equilibrium values
plotted above
plt.show()
The next figure plots the optimal policy functions; values of 𝜃′ , 𝑚, 𝑥, ℎ for each value of the
state 𝜃:
plt.show()
With the first set of parameter values, the value of 𝜃′ chosen by the Ramsey planner quickly
hits the upper limit of Ω.
But with the second set of parameters it converges to a value in the interior of the set.
Consequently, the choice of 𝜃 ̄ is clearly important with the first set of parameter values.
One way of seeing this is plotting 𝜃′ (𝜃) for each set of parameters.
With the first set of parameter values, this function does not intersect the 45-degree line until $\bar\theta$, whereas with the second set of parameter values it intersects the 45-degree line in the interior of the set.
axes[0].legend()
plt.show()
Subproblem 2 is equivalent to the planner choosing the initial value of 𝜃 (i.e. the value which
maximizes the value function).
From this starting point, we can then trace out the paths for $\{\theta_t, m_t, h_t, x_t\}_{t=0}^\infty$ that support this equilibrium.
These are shown below for both sets of parameters
plt.show()
In Credible Government Policies in Chang Model we shall find a subset of competitive equilibria that are sustainable in the sense that a sequence of government administrations that chooses sequentially, rather than once and for all at time 0, will choose to implement them.
In the process of constructing them, we shall construct another, smaller set of competitive
equilibria.
Chapter 102

Credible Government Policies in Chang Model
102.1 Contents
• Overview 102.2
• The Setting 102.3
• Calculating the Set of Sustainable Promise-Value Pairs 102.4
Co-author: Sebastian Graves
In addition to what’s in Anaconda, this lecture will need the following libraries:
102.2 Overview
Some of the material in this lecture and competitive equilibria in the Chang model can be
viewed as more sophisticated and complete treatments of the topics discussed in Ramsey
plans, time inconsistency, sustainable plans.
This lecture assumes almost the same economic environment analyzed in competitive equilib-
ria in the Chang model.
The only change – and it is a substantial one – is the timing protocol for making government
decisions.
In competitive equilibria in the Chang model, a Ramsey planner chose a comprehensive gov-
ernment policy once-and-for-all at time 0.
Now in this lecture, there is no time 0 Ramsey planner.
Instead there is a sequence of government decision-makers, one for each 𝑡.
The time $t$ government decision-maker chooses time $t$ government actions after forecasting what future governments will do.
We use the notion of a sustainable plan proposed in [35], also referred to as a credible public
policy in [150].
Technically, this lecture starts where lecture competitive equilibria in the Chang model left off.
We begin by reviewing the set up deployed in competitive equilibria in the Chang model.
Chang’s model, adopted from Calvo, is designed to focus on the intertemporal trade-offs be-
tween the welfare benefits of deflation and the welfare costs associated with the high tax col-
lections required to retire money at a rate that delivers deflation.
A benevolent time 0 government can promote utility generating increases in real balances
only by imposing an infinite sequence of sufficiently large distorting tax collections.
To promote the welfare increasing effects of high real balances, the government wants to in-
duce gradual deflation.
We start by reviewing notation.
For a sequence of scalars $\vec z \equiv \{z_t\}_{t=0}^\infty$, let $\vec z^t = (z_0, \ldots, z_t)$ and $\vec z_t = (z_t, z_{t+1}, \ldots)$.
An infinitely lived representative agent and an infinitely lived government exist at dates 𝑡 =
0, 1, ….
The objects in play are
• an initial quantity 𝑀−1 of nominal money holdings
A representative household faces a nonnegative value of money sequence 𝑞 ⃗ and sequences 𝑦,⃗ 𝑥⃗
of income and total tax collections, respectively.
The household chooses nonnegative sequences 𝑐,⃗ 𝑀⃗ of consumption and nominal balances,
respectively, to maximize
$$\sum_{t=0}^\infty \beta^t [u(c_t) + v(q_t M_t)] \tag{1}$$
subject to
𝑞𝑡 𝑀𝑡 ≤ 𝑦𝑡 + 𝑞𝑡 𝑀𝑡−1 − 𝑐𝑡 − 𝑥𝑡 (2)
and
𝑞𝑡 𝑀𝑡 ≤ 𝑚̄ (3)
Here 𝑞𝑡 is the reciprocal of the price level at 𝑡, also known as the value of money.
Chang [34] assumes that
• 𝑢 ∶ ℝ+ → ℝ is twice continuously differentiable, strictly concave, and strictly increasing;
• 𝑣 ∶ ℝ+ → ℝ is twice continuously differentiable and strictly concave;
• $\lim_{c \to 0} u'(c) = \lim_{m \to 0} v'(m) = +\infty$;
• there is a finite level 𝑚 = 𝑚𝑓 such that 𝑣′ (𝑚𝑓 ) = 0
Real balances carried out of a period equal 𝑚𝑡 = 𝑞𝑡 𝑀𝑡 .
Inequality (2) is the household’s time 𝑡 budget constraint.
It tells how real balances 𝑞𝑡 𝑀𝑡 carried out of period 𝑡 depend on income, consumption, taxes,
and real balances 𝑞𝑡 𝑀𝑡−1 carried into the period.
Equation (3) imposes an exogenous upper bound 𝑚̄ on the choice of real balances, where 𝑚̄ ≥
𝑚𝑓 .
102.3.2 Government
The government chooses a sequence of inverse money growth rates with time $t$ component $h_t \equiv \frac{M_{t-1}}{M_t} \in \Pi \equiv [\underline\pi, \overline\pi]$, where $0 < \underline\pi < 1 < \frac{1}{\beta} \leq \overline\pi$.

The government's budget constraint at time $t$ is

$$-x_t = m_t(1 - h_t) \tag{4}$$
The restrictions $m_t \in [0, \bar m]$ and $h_t \in \Pi$ evidently imply that $x_t \in X \equiv [(\underline\pi - 1)\bar m, (\overline\pi - 1)\bar m]$.

We define the set $E \equiv [0, \bar m] \times \Pi \times X$, so that we require that $(m, h, x) \in E$.
To represent the idea that taxes are distorting, Chang makes the following assumption about
outcomes for per capita output:
𝑦𝑡 = 𝑓(𝑥𝑡 ) (5)
where 𝑓 ∶ ℝ → ℝ satisfies 𝑓(𝑥) > 0, is twice continuously differentiable, 𝑓 ″ (𝑥) < 0, and
𝑓(𝑥) = 𝑓(−𝑥) for all 𝑥 ∈ ℝ, so that subsidies and taxes are equally distorting.
The purpose is not to model the causes of tax distortions in any detail but simply to summa-
rize the outcome of those distortions via the function 𝑓(𝑥).
A key part of the specification is that tax distortions are increasing in the absolute value of
tax revenues.
The government chooses a competitive equilibrium that maximizes (1).
For the results in this lecture, the timing of actions within a period is important because of
the incentives that it activates.
Chang assumed the following within-period timing of decisions:
• first, the government chooses ℎ𝑡 and 𝑥𝑡 ;
• then given 𝑞 ⃗ and its expectations about future values of 𝑥 and 𝑦’s, the household
chooses 𝑀𝑡 and therefore 𝑚𝑡 because 𝑚𝑡 = 𝑞𝑡 𝑀𝑡 ;
• then output 𝑦𝑡 = 𝑓(𝑥𝑡 ) is realized;
• finally 𝑐𝑡 = 𝑦𝑡
This within-period timing confronts the government with choices framed by how the private
sector wants to respond when the government takes time 𝑡 actions that differ from what the
private sector had expected.
This timing will shape the incentives confronting the government at each history that are to
be incorporated in the construction of the 𝐷̃ operator below.
$$\mathcal{L} = \max_{\vec c, \vec M}\, \min_{\vec\lambda, \vec\mu} \sum_{t=0}^\infty \beta^t \Bigl\{ u(c_t) + v(M_t q_t) + \lambda_t[y_t - c_t - x_t + q_t M_{t-1} - q_t M_t] + \mu_t[\bar m - q_t M_t] \Bigr\}$$

First-order conditions with respect to $c_t$ and $M_t$, respectively, are

$$u'(c_t) = \lambda_t$$

$$q_t[u'(c_t) - v'(M_t q_t)] \leq \beta u'(c_{t+1})q_{t+1}, \quad = \text{ if } M_t q_t < \bar m$$
Using $h_t = \frac{M_{t-1}}{M_t}$ and $q_t = \frac{m_t}{M_t}$ in these first-order conditions and rearranging implies

$$m_t[u'(c_t) - v'(m_t)] \leq \beta u'(f(x_{t+1}))(m_{t+1} + x_{t+1}), \quad = \text{ if } m_t < \bar m \tag{6}$$

Define $\theta_{t+1} \equiv u'(f(x_{t+1}))(m_{t+1} + x_{t+1})$.

This is real money balances at time $t+1$ measured in units of marginal utility, which Chang refers to as 'the marginal utility of real balances'. Using this definition, (6) can be expressed as

$$m_t[u'(c_t) - v'(m_t)] \leq \beta\theta_{t+1}, \quad = \text{ if } m_t < \bar m \tag{7}$$
From the standpoint of the household at time $t$, equation (7) shows that $\theta_{t+1}$ intermediates the influences of $(\vec x_{t+1}, \vec m_{t+1})$ on the household's choice of real balances $m_t$.

By "intermediates" we mean that the future paths $(\vec x_{t+1}, \vec m_{t+1})$ influence $m_t$ entirely through their effects on the scalar $\theta_{t+1}$.
The observation that the one dimensional promised marginal utility of real balances 𝜃𝑡+1
functions in this way is an important step in constructing a class of competitive equilibria
that have a recursive representation.
A closely related observation pervaded the analysis of Stackelberg plans in dynamic Stackel-
berg problems and the Calvo model.
Definition:
• A government policy is a pair of sequences (h⃗, x⃗) where h_t ∈ Π for all t ≥ 0.
• A price system is a non-negative value of money sequence q⃗.
• An allocation is a triple of non-negative sequences (c⃗, m⃗, y⃗).

It is required that time t components satisfy (m_t, x_t, h_t) ∈ E.
Definition:
Given M_{−1}, a government policy (h⃗, x⃗), price system q⃗, and allocation (c⃗, m⃗, y⃗) are said to be a competitive equilibrium if
• m_t = q_t M_t and y_t = f(x_t);
• the government budget constraint (4) is satisfied;
• given q⃗, x⃗, and y⃗, the sequences c⃗ and m⃗ solve the household's problem.
• Here it is to be understood that ĥ_t is the action that the government policy instructs the government to take, while h_t, possibly not equal to ĥ_t, is some other action that the government is free to take at time t.
The plan is credible if it is in the time 𝑡 government’s interest to execute it.
Credibility requires that the plan be such that, for all possible choices of h_t that are consistent with competitive equilibria, adhering to the plan is weakly better than deviating: at each instance and circumstance of choice, a government attains a weakly higher lifetime utility with continuation value w_{t+1} = Ψ(h_t, w_t, θ_t) by adhering to the plan and confirming the associated time t action ĥ_t that the public had expected earlier.
Please note the subtle change in arguments of the functions used to represent a competitive
equilibrium and a Ramsey plan, on the one hand, and a credible government plan, on the
other hand.
The extra arguments appearing in the functions used to represent a credible plan come from
allowing the government to contemplate disappointing the private sector’s expectation about
its time 𝑡 choice ℎ̂ 𝑡 .
A credible plan induces the government to confirm the private sector’s expectation.
The recursive representation of the plan uses the evolution of continuation values to deter the
government from wanting to disappoint the private sector’s expectations.
Technically, a Ramsey plan and a credible plan both incorporate history dependence.
For a Ramsey plan, this is encoded in the dynamics of the state variable 𝜃𝑡 , a promised
marginal utility that the Ramsey plan delivers to the private sector.
For a credible government plan, the two-dimensional state vector (w_t, θ_t) encodes history dependence.
A government strategy σ and an allocation rule α are said to constitute a sustainable plan (SP) if:

1. σ is admissible;
2. given σ, α is competitive;
3. after any history h⃗^{t−1}, the continuation of σ is optimal for the government; i.e., the sequence h⃗_t induced by σ after h⃗^{t−1} maximizes over CE_π given α.
Given any history h⃗^{t−1}, the continuation of a sustainable plan is a sustainable plan.

Let Θ = {(m⃗, x⃗, h⃗) ∈ CE : there is an SP whose outcome is (m⃗, x⃗, h⃗)}.

Sustainable outcomes are elements of Θ. Associated with each sustainable outcome is a pair (w, θ): define

S = {(w, θ) : there is a sustainable outcome (m⃗, x⃗, h⃗) ∈ Θ

with value w = ∑_{t=0}^∞ β^t [u(f(x_t)) + v(m_t)] and such that u′(f(x_0))(m_0 + x_0) = θ}

The space S is a compact subset of W × Ω, where W = [w̲, w̄] is the space of values associated with sustainable plans; here w̲ and w̄ are finite bounds on the set of values.
Because there is at least one sustainable plan, 𝑆 is nonempty.
Now recall the within-period timing protocol, which we can depict (ℎ, 𝑥) → 𝑚 = 𝑞𝑀 → 𝑦 = 𝑐.
With this timing protocol in mind, the time 0 component of an SP has the following components:
1. A period 0 action ĥ ∈ Π that the public expects the government to take, together with subsequent within-period consequences m(ĥ), x(ĥ) when the government acts as expected.
2. For any first-period action h ≠ ĥ with h ∈ CE^0_π, a pair of within-period consequences m(h), x(h) when the government does not act as the public had expected.
3. For every ℎ ∈ Π, a pair (𝑤′ (ℎ), 𝜃′ (ℎ)) ∈ 𝑆 to carry into next period.
These components must be such that it is optimal for the government to choose ℎ̂ as ex-
pected; and for every possible ℎ ∈ Π, the government budget constraint and the household’s
Euler equation must hold with continuation 𝜃 being 𝜃′ (ℎ).
Given the timing protocol within the model, the representative household’s response to a
government deviation to ℎ ≠ ℎ̂ from a prescribed ℎ̂ consists of a first-period action 𝑚(ℎ)
and associated subsequent actions, together with future equilibrium prices, captured by
(𝑤′ (ℎ), 𝜃′ (ℎ)).
At this point, Chang introduces an idea in the spirit of Abreu, Pearce, and Stacchetti [2].
Let 𝑍 be a nonempty subset of 𝑊 × Ω.
Think of using pairs (w′, θ′) drawn from Z as candidate (continuation value, promised marginal utility) pairs.
Define the following operator:

D̃(Z) = {(w, θ) : there is ĥ ∈ CE^0_π and for each h ∈ CE^0_π a four-tuple (m(h), x(h), w′(h), θ′(h)) ∈ [0, m̄] × X × Z    (9)

such that

w = u(f(x(ĥ))) + v(m(ĥ)) + β w′(ĥ)    (10)

θ = u′(f(x(ĥ))) (m(ĥ) + x(ĥ))    (11)

and for all h ∈ CE^0_π

w ≥ u(f(x(h))) + v(m(h)) + β w′(h)    (12)

and

x(h) = m(h)(h − 1)

m(h) (u′(f(x(h))) − v′(m(h))) ≤ β θ′(h),  with equality if m(h) < m̄}
This operator adds the key incentive constraint to the conditions that defined the earlier D(Z) operator from the lecture on competitive equilibria in the Chang model.
Condition (12) requires that the plan deter the government from wanting to take one-shot
deviations when candidate continuation values are drawn from 𝑍.
Proposition:

1. If Z ⊂ D̃(Z), then D̃(Z) ⊂ S ('self-generation').
2. S = D̃(S) ('factorization').
Proposition:

1. Monotonicity of D̃: Z ⊂ Z′ implies D̃(Z) ⊂ D̃(Z′).
2. Z compact implies that D̃(Z) is compact.
Chang establishes that 𝑆 is compact and that therefore there exists a highest value SP and a
lowest value SP.
Further, the preceding structure allows Chang to compute 𝑆 by iterating to convergence on 𝐷̃
provided that one begins with a sufficiently large initial set 𝑍0 .
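Schematically, and anticipating the outer hyperplane approximation used below, the computation is a set iteration in which each set Z_t is summarized by a vector of hyperplane levels C_t for a fixed matrix of subgradients. The sketch below is generic: apply_D_tilde is a hypothetical placeholder for the update that the ChangModel class implements later, not part of the source code.

import numpy as np

def iterate_to_convergence(C0, apply_D_tilde, tol=1e-5, max_iter=500):
    # Iterate Z_{t+1} = D̃(Z_t) on hyperplane levels until they stop moving
    C = np.asarray(C0, dtype=float)
    for it in range(max_iter):
        C_new = np.asarray(apply_D_tilde(C), dtype=float)
        if np.max(np.abs(C_new - C)) < tol:  # sup-norm distance between level vectors
            return C_new, it + 1
        C = C_new
    return C, max_iter

Starting from levels C0 that describe a sufficiently large Z_0 ensures, by the monotonicity of D̃, that the iterates shrink toward S.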
This structure delivers the following recursive representation of a sustainable outcome:

1. choose an initial (w_0, θ_0) ∈ S;

2. generate a sustainable outcome recursively by iterating on (8), which we repeat here for convenience:
ℎ̂ 𝑡 = ℎ(𝑤𝑡 , 𝜃𝑡 )
𝑚𝑡 = 𝑚(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝑥𝑡 = 𝑥(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝑤𝑡+1 = 𝜒(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝜃𝑡+1 = Ψ(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
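Once the five policy functions are in hand, simulating this recursion is mechanical. The sketch below is a hypothetical illustration: the callables h, m, x, χ, and Ψ are placeholders for the objects delivered by the fixed-point computation, not functions defined in this lecture's code.

def simulate_plan(w0, θ0, h, m, x, χ, Ψ, T=30):
    # Iterate the recursive representation (8) forward for T periods
    w, θ = w0, θ0
    path = []
    for t in range(T):
        h_hat = h(w, θ)                        # action the plan prescribes
        path.append((h_hat, m(h_hat, w, θ), x(h_hat, w, θ), w, θ))
        w, θ = χ(h_hat, w, θ), Ψ(h_hat, w, θ)  # update the two-dimensional state
    return path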
Above we defined the D̃(Z) operator as (9).
Chang (1998) provides a method for dealing with the final three constraints.
These incentive constraints ensure that the government wants to choose ℎ̂ as the private sec-
tor had expected it to.
Chang’s simplification starts from the idea that, when considering whether or not to confirm
the private sector’s expectation, the government only needs to consider the payoff of the best
possible deviation.
Equally, to provide incentives to the government, we only need to consider the harshest possi-
ble punishment.
Let h denote some possible deviation. Chang defines the worst sustainable value that can follow such a deviation:

P(h; Z) = min_{m, x, w′, θ′} u(f(x)) + v(m) + β w′

subject to

(w′, θ′) ∈ Z

x = m(h − 1)

m (u′(f(x)) − v′(m)) ≤ β θ′,  with equality if m < m̄
For a given deviation h, this problem finds the worst possible sustainable value. We then define

BR(Z) = max_{h ∈ CE^0_π} P(h; Z)

the payoff of the government's best possible deviation when continuations are punished as harshly as Z allows. With BR(Z) in hand, define

E(Z) = {(w, θ) : ∃ h ∈ CE^0_π and (m(h), x(h), w′(h), θ′(h)) ∈ [0, m̄] × X × Z

such that

w = u(f(x(h))) + v(m(h)) + β w′(h)

θ = u′(f(x(h))) (m(h) + x(h))

x(h) = m(h)(h − 1)

m(h) (u′(f(x(h))) − v′(m(h))) ≤ β θ′(h),  with equality if m(h) < m̄

and

w ≥ BR(Z)}
Aside from the final incentive constraint, this is the same as the operator defined in the lecture on competitive equilibria in the Chang model.

Consequently, to implement this operator we just need to add one step to our outer hyperplane approximation algorithm:
• Solve a linear program (described below) for each action in the action space.
• Find the maximum and update the corresponding hyperplane level, 𝐶𝑖,𝑡+1 .
For each point (m_j, h_j) in the action space, the linear program is

min_{[w′, θ′]} u(f(x_j)) + v(m_j) + β w′

subject to

H · (w′, θ′) ≤ C_t

x_j = m_j(h_j − 1)

m_j (u′(f(x_j)) − v′(m_j)) ≤ β θ′,  with equality if m_j < m̄
This gives us a matrix of possible values, corresponding to each point in the action space.
To find 𝐵𝑅(𝑍), we minimize over the 𝑚 dimension and maximize over the ℎ dimension.
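In code, this is one linear program per action followed by a reshape. The sketch below mirrors the logic of the solve_worst_spe method shown later, but all array names are hypothetical and, for brevity, it keeps only the equality branch of the Euler restriction (the m_j = m̄ inequality branch is omitted).

import numpy as np
from scipy.optimize import linprog

def worst_value(u_vec, euler_vec, H, C_t, β, n_m, n_h, bnds):
    # u_vec[j]    : hypothetical precomputed u(f(x_j)) + v(m_j)
    # euler_vec[j]: hypothetical precomputed m_j(u′(f(x_j)) − v′(m_j))
    # actions assumed enumerated with h varying fastest within each m
    N_a = n_m * n_h
    p_vec = np.full(N_a, np.nan)
    c = [1, 0]  # minimize over w′; θ′ enters only through the constraints
    for j in range(N_a):
        res = linprog(c, A_ub=H, b_ub=C_t,                   # stay in Z: H·(w′,θ′) ≤ C_t
                      A_eq=[[0, -β]], b_eq=[-euler_vec[j]],  # Euler equality: βθ′ = euler
                      bounds=bnds)
        if res.status == 0:
            p_vec[j] = u_vec[j] + β * res.x[0]
    # worst (min) over the m dimension, most tempting (max) over the h dimension
    return np.nanmax(np.nanmin(p_vec.reshape(n_m, n_h), axis=0))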
Step 3 then constructs the set 𝑆𝑡+1 = 𝐸(𝑆𝑡 ). The linear program in Step 3 is designed to
construct a set 𝑆𝑡+1 that is as large as possible while satisfying the constraints of the 𝐸(𝑆)
operator.
To do this, for each subgradient ℎ𝑖 , and for each point in the action space (𝑚𝑗 , ℎ𝑗 ), we solve
the following problem:
max_{[w′, θ′]} h_i · (w, θ)

subject to

H · (w′, θ′) ≤ C_t

θ = u′(f(x_j))(m_j + x_j)

x_j = m_j(h_j − 1)

w ≥ BR(Z)

where w = u(f(x_j)) + v(m_j) + β w′, and the household's Euler restriction m_j(u′(f(x_j)) − v′(m_j)) ≤ β θ′ (with equality if m_j < m̄) also holds.
This problem maximizes the hyperplane level for a given set of actions.
The second part of Step 3 then finds the maximum possible hyperplane level across the action
space.
The algorithm constructs a sequence of progressively smaller sets 𝑆𝑡+1 ⊂ 𝑆𝑡 ⊂ 𝑆𝑡−1 ⋯ ⊂ 𝑆0 .
Step 4 ends the algorithm when the difference between these sets is small enough.
We have created a Python class that solves the model assuming the following functional
forms:
u(c) = log(c)

v(m) = (1/500) (m m̄ − 0.5 m²)^{0.5}
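As a quick check, in the style of the SymPy examples earlier in these lectures, the satiation level m^f implied by this v coincides with m̄, since v′(m̄) = 0:

import sympy as sp

m, mbar = sp.symbols('m mbar', positive=True)
v = sp.sqrt(m * mbar - m**2 / 2) / 500

# v′(m) = (mbar − m) / (1000·sqrt(m·mbar − m²/2)) vanishes exactly at m = mbar
assert sp.simplify(sp.diff(v, m).subs(m, mbar)) == 0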
In [3]: """
Author: Sebastian Graves
import numpy as np
import quantecon as qe
import time
class ChangModel:
"""
Class to solve for the competitive and sustainable sets in the Chang�
↪(1998)
    # (Excerpt: lines from the constructor and from a helper that
    # initializes the subgradients H, hyperplane levels C, and extreme
    # points Z; intervening code did not survive extraction)
    w_space = np.array([min(w_vec[~np.isinf(w_vec)]),
                        max(w_vec[~np.isinf(w_vec)])])
    p_space = np.array([0, max(p_vec[~np.isinf(w_vec)])])
    self.p_space = p_space

    # Points on circle
    H = np.zeros((N, 2))
    for i in range(N):
        x = degrees[i]
        H[i, 0] = np.cos(x)
        H[i, 1] = np.sin(x)

    return C, H, Z
    def solve_worst_spe(self):
        """
        Method to solve for BR(Z). See p.449 of Chang (1998)
        """
        # (Initializations lost in this excerpt; a vector of per-action
        # values and an objective that minimizes w′ are assumed)
        p_vec = np.full(self.N_a, np.nan)
        c = [1, 0]

        # Pre-compute constraints
        aineq_mbar = np.vstack((self.H, np.array([0, -self.β])))
        bineq_mbar = np.vstack((self.c0_s, 0))

        aineq = self.H
        bineq = self.c0_s
        aeq = [[0, -self.β]]

        for j in range(self.N_a):
            # Only try if consumption is possible
            if self.f_vec[j] > 0:

                # If m = mbar, use inequality constraint
                if self.A[j, 1] == self.mbar:
                    bineq_mbar[-1] = self.euler_vec[j]
                    res = linprog(c, A_ub=aineq_mbar, b_ub=bineq_mbar,
                                  bounds=(self.w_bnds_s, self.p_bnds_s))

                # If m < mbar, use equality constraint
                else:
                    beq = self.euler_vec[j]
                    res = linprog(c, A_ub=aineq, b_ub=bineq,
                                  A_eq=aeq, b_eq=beq,
                                  bounds=(self.w_bnds_s, self.p_bnds_s))
                if res.status == 0:
                    p_vec[j] = self.u_vec[j] + self.β * res.x[0]

        # Max over h and min over other variables (see Chang (1998) p.449)
        self.br_z = np.nanmax(np.nanmin(p_vec.reshape(self.n_m,
                                                      self.n_h), 0))
    def solve_subgradient(self):
        """
        Method to solve for E(Z). See p.449 of Chang (1998)
        """
        # Pre-compute constraints
        aineq_C_mbar = np.vstack((self.H, np.array([0, -self.β])))
        bineq_C_mbar = np.vstack((self.c0_c, 0))

        aineq_C = self.H
        bineq_C = self.c0_c
        aeq_C = [[0, -self.β]]

        # (The analogous pre-computed constraint arrays for the sustainable
        # problem -- aineq_S_mbar, bineq_S_mbar, aineq_S, bineq_S, aeq_S --
        # and the enclosing loop over subgradients i, which also sets the
        # objective vector c, did not survive in this excerpt)

        c_a1a2_c, t_a1a2_c = np.full(self.N_a, -np.inf), \
            np.zeros((self.N_a, 2))
        c_a1a2_s, t_a1a2_s = np.full(self.N_a, -np.inf), \
            np.zeros((self.N_a, 2))

        for j in range(self.N_a):
            # Only try if consumption is possible
            if self.f_vec[j] > 0:

                # COMPETITIVE EQUILIBRIA
                # If m = mbar, use inequality constraint
                if self.A[j, 1] == self.mbar:
                    bineq_C_mbar[-1] = self.euler_vec[j]
                    res = linprog(c, A_ub=aineq_C_mbar, b_ub=bineq_C_mbar,
                                  bounds=(self.w_bnds_c, self.p_bnds_c))
                # If m < mbar, use equality constraint
                else:
                    beq_C = self.euler_vec[j]
                    res = linprog(c, A_ub=aineq_C, b_ub=bineq_C,
                                  A_eq=aeq_C, b_eq=beq_C,
                                  bounds=(self.w_bnds_c, self.p_bnds_c))
                if res.status == 0:
                    c_a1a2_c[j] = self.H[i, 0] * (self.u_vec[j]
                        + self.β * res.x[0]) + self.H[i, 1] * self.Θ_vec[j]
                    t_a1a2_c[j] = res.x

                # SUSTAINABLE EQUILIBRIA
                # If m = mbar, use inequality constraint
                if self.A[j, 1] == self.mbar:
                    bineq_S_mbar[-2] = self.euler_vec[j]
                    bineq_S_mbar[-1] = self.u_vec[j] - self.br_z
                    res = linprog(c, A_ub=aineq_S_mbar, b_ub=bineq_S_mbar,
                                  bounds=(self.w_bnds_s, self.p_bnds_s))
                # If m < mbar, use equality constraint
                else:
                    bineq_S[-1] = self.u_vec[j] - self.br_z
                    beq_S = self.euler_vec[j]
                    res = linprog(c, A_ub=aineq_S, b_ub=bineq_S,
                                  A_eq=aeq_S, b_eq=beq_S,
                                  bounds=(self.w_bnds_s, self.p_bnds_s))
                if res.status == 0:
                    c_a1a2_s[j] = self.H[i, 0] * (self.u_vec[j]
                        + self.β * res.x[0]) + self.H[i, 1] * self.Θ_vec[j]
                    t_a1a2_s[j] = res.x

        # Update hyperplane levels from the stored extreme points
        for i in range(self.N_g):
            self.c1_c[i] = np.dot(self.z1_c[:, i], self.H[i, :])
            self.c1_s[i] = np.dot(self.z1_s[:, i], self.H[i, :])
    # (Excerpt from the solve_sustainable method; the iteration loop that
    # alternates solve_worst_spe and solve_subgradient did not survive)
    t = time.time()
    diff = tol + 1
    iters = 0

    # Save iteration
    self.c_dic_c[iters], self.c_dic_s[iters] = np.copy(self.c1_c), \
        np.copy(self.c1_s)
    self.iters = iters

    elapsed = time.time() - t
    print('Convergence achieved after {} iterations and {} \
seconds'.format(iters, round(elapsed, 2)))
    # (Excerpt from the solve_bellman method, which computes the Ramsey
    # plan by Chebyshev-collocation value function iteration; p_fun and
    # the bound/constraint objects lb1, ub1, bnds1, cons1, lb2, ub2,
    # bnds2, cons2 are defined in lines lost from this excerpt)
    mbar = self.mbar

    def p_fun2(x):
        scale = -1 + 2 * (x[1] - θ_min) / (θ_max - θ_min)
        p_fun = - (u(x[0], mbar)
                   + self.β * np.dot(cheb.chebvander(scale, order - 1), c))
        return p_fun

    # Bellman Iterations
    diff = 1
    iters = 1

    self.θ_grid = s
    self.p_iter = p_iter1
    self.Φ = Φ
    self.c = c
    print('Convergence achieved after {} iterations'.format(iters))

    # Check residuals
    θ_grid_fine = np.linspace(θ_min, θ_max, 100)
    resid_grid = np.zeros(100)
    p_grid = np.zeros(100)
    θ_prime_grid = np.zeros(100)
    m_grid = np.zeros(100)
    h_grid = np.zeros(100)

    for i in range(100):
        θ = θ_grid_fine[i]
        res = minimize(p_fun,
                       lb1 + (ub1 - lb1) / 2,
                       method='SLSQP',
                       bounds=bnds1,
                       constraints=cons1,
                       tol=1e-10)
        if res.success:
            p = -p_fun(res.x)
            p_grid[i] = p
            θ_prime_grid[i] = res.x[2]
            h_grid[i] = res.x[0]
            m_grid[i] = res.x[1]
        res = minimize(p_fun2,
                       lb2 + (ub2 - lb2) / 2,
                       method='SLSQP',
                       bounds=bnds2,
                       constraints=cons2,
                       tol=1e-10)
        # Evaluate the m = mbar problem only when it solved successfully
        if res.success and -p_fun2(res.x) > p:
            p = -p_fun2(res.x)
            p_grid[i] = p
            θ_prime_grid[i] = res.x[1]
            h_grid[i] = res.x[0]
            m_grid[i] = self.mbar
        scale = -1 + 2 * (θ - θ_min) / (θ_max - θ_min)
        resid_grid[i] = np.dot(cheb.chebvander(scale, order - 1), c) - p

    self.resid_grid = resid_grid
    self.θ_grid_fine = θ_grid_fine
    self.θ_prime_grid = θ_prime_grid
    self.m_grid = m_grid
    self.h_grid = h_grid
    self.p_grid = p_grid
    self.x_grid = m_grid * (h_grid - 1)
    # Simulate
    θ_series = np.zeros(31)
    m_series = np.zeros(30)
    h_series = np.zeros(30)

    # Find initial θ
    def ValFun(x):
        scale = -1 + 2 * (x - θ_min) / (θ_max - θ_min)
        p_fun = np.dot(cheb.chebvander(scale, order - 1), c)
        return -p_fun

    res = minimize(ValFun,
                   (θ_min + θ_max) / 2,
                   bounds=[(θ_min, θ_max)])
    θ_series[0] = res.x

    # Simulate
    for i in range(30):
        θ = θ_series[i]
        res = minimize(p_fun,
                       lb1 + (ub1 - lb1) / 2,
                       method='SLSQP',
                       bounds=bnds1,
                       constraints=cons1,
                       tol=1e-10)
        if res.success:
            p = -p_fun(res.x)
            h_series[i] = res.x[0]
            m_series[i] = res.x[1]
            θ_series[i + 1] = res.x[2]
        res2 = minimize(p_fun2,
                        lb2 + (ub2 - lb2) / 2,
                        method='SLSQP',
                        bounds=bnds2,
                        constraints=cons2,
                        tol=1e-10)
        # Take the m = mbar solution when it solved and improves on p
        if res2.success and -p_fun2(res2.x) > p:
            h_series[i] = res2.x[0]
            m_series[i] = self.mbar
            θ_series[i + 1] = res2.x[1]

    self.θ_series = θ_series
    self.m_series = m_series
    self.h_series = h_series
    self.x_series = m_series * (h_series - 1)
The set of (𝑤, 𝜃) associated with sustainable plans is smaller than the set of (𝑤, 𝜃) pairs asso-
ciated with competitive equilibria, since the additional constraints associated with sustainabil-
ity must also be satisfied.
Let's compute two examples, one with a low β, another with a higher β.
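The construction of ch1 and ch2 does not survive in this excerpt. A plausible construction would look like the sketch below; the constructor signature and all parameter values here are assumptions for illustration, not taken from the source.

# Hypothetical parameterizations: β is the discount factor, mbar the bound m̄,
# (n_h, n_m) the action-grid sizes and N_g the number of subgradients
ch1 = ChangModel(β=0.3, mbar=30, h_min=0.9, h_max=2, n_h=8, n_m=35, N_g=10)
ch2 = ChangModel(β=0.8, mbar=30, h_min=0.9, h_max=2, n_h=8, n_m=35, N_g=10)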
In [5]: ch1.solve_sustainable()
The following plot shows both the set of 𝑤, 𝜃 pairs associated with competitive equilibria (in
red) and the smaller set of 𝑤, 𝜃 pairs associated with sustainable plans (in blue).
# (Tail of the plot_equilibria plotting routine; the construction of the
# figure and the scatter of the two sets did not survive this excerpt)
ax.set_xlabel('w', fontsize=16)
ax.set_ylabel(r"$\theta$", fontsize=18)
plt.tight_layout()
plt.show()

plot_equilibria(ch1)
In [8]: ch2.solve_sustainable()
In [9]: plot_equilibria(ch2)
Bibliography

[1] Dilip Abreu. On the theory of infinitely repeated games with discounting. Econometrica, 56:383–396, 1988.
[2] Dilip Abreu, David Pearce, and Ennio Stacchetti. Toward a theory of discounted re-
peated games with imperfect monitoring. Econometrica, 58(5):1041–1063, September
1990.
[3] Daron Acemoglu, Simon Johnson, and James A Robinson. The colonial origins of com-
parative development: An empirical investigation. The American Economic Review,
91(5):1369–1401, 2001.
[4] Daron Acemoglu and James A. Robinson. The political economy of the Kuznets curve.
Review of Development Economics, 6(2):183–203, 2002.
[5] SeHyoun Ahn, Greg Kaplan, Benjamin Moll, Thomas Winberry, and Christian Wolf.
When inequality matters for macro and macro matters for inequality. NBER Macroeco-
nomics Annual, 32(1):1–75, 2018.
[6] S Rao Aiyagari. Uninsured Idiosyncratic Risk and Aggregate Saving. The Quarterly
Journal of Economics, 109(3):659–684, 1994.
[7] S Rao Aiyagari, Albert Marcet, Thomas J Sargent, and Juha Seppälä. Optimal taxation
without state-contingent debt. Journal of Political Economy, 110(6):1220–1254, 2002.
[8] B. D. O. Anderson and J. B. Moore. Optimal Filtering. Dover Publications, 2005.
[9] E. W. Anderson, L. P. Hansen, E. R. McGrattan, and T. J. Sargent. Mechanics of
Forming and Estimating Dynamic Linear Economies. In Handbook of Computational
Economics. Elsevier, vol 1 edition, 1996.
[10] Cristina Arellano. Default risk and income fluctuations in emerging economies. The
American Economic Review, pages 690–712, 2008.
[11] Athanasios Papoulis and S Unnikrishna Pillai. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 1991.
[12] Orazio P Attanasio and Nicola Pavoni. Risk sharing in private information models with
asset accumulation: Explaining the excess smoothness of consumption. Econometrica,
79(4):1027–1068, 2011.
[13] Robert L Axtell. Zipf distribution of US firm sizes. Science, 293(5536):1818–1820, 2001.
[14] Robert J Barro. On the Determination of the Public Debt. Journal of Political Econ-
omy, 87(5):940–971, 1979.
[15] Robert J Barro. Determinants of democracy. Journal of Political Economy, 107(S6):S158–S183, 1999.
[16] Robert J Barro and Rachel McCleary. Religion and economic growth. Technical report,
National Bureau of Economic Research, 2003.
[17] Jess Benhabib and Alberto Bisin. Skewed wealth distributions: Theory and empirics.
Journal of Economic Literature, 56(4):1261–91, 2018.
[18] Jess Benhabib, Alberto Bisin, and Shenghao Zhu. The wealth distribution in Bewley economies with capital income risk. Journal of Economic Theory, 159:489–515, 2015.
[20] Dimitri P. Bertsekas. Dynamic Programming and Stochastic Control. Academic Press, New York, 1975.
[21] Truman Bewley. The permanent income hypothesis: A theoretical formulation. Journal
of Economic Theory, 16(2):252–292, 1977.
[23] Anmol Bhandari, David Evans, Mikhail Golosov, and Thomas J. Sargent. Fiscal Policy
and Debt Management with Incomplete Markets. The Quarterly Journal of Economics,
132(2):617–663, 2017.
[24] Anmol Bhandari, David Evans, Mikhail Golosov, and Thomas J Sargent. Inequality,
business cycles, and monetary-fiscal policy. Technical report, National Bureau of Eco-
nomic Research, 2018.
[26] Fischer Black and Robert Litterman. Global portfolio optimization. Financial analysts
journal, 48(5):28–43, 1992.
[27] Dariusz Buraczewski, Ewa Damek, and Thomas Mikosch. Stochastic Models with Power-Law Tails. Springer, 2016.
[28] Philip Cagan. The monetary dynamics of hyperinflation. In Milton Friedman, editor,
Studies in the Quantity Theory of Money, pages 25–117. University of Chicago Press,
Chicago, 1956.
[29] Guillermo A. Calvo. On the time consistency of optimal policy in a monetary economy.
Econometrica, 46(6):1411–1428, 1978.
[30] Andrew S Caplin. The variability of aggregate demand with (s, s) inventory policies.
Econometrica, pages 1395–1409, 1985.
[31] Christopher D Carroll. A Theory of the Consumption Function, with and without Liq-
uidity Constraints. Journal of Economic Perspectives, 15(3):23–45, 2001.
[32] Christopher D Carroll. The method of endogenous gridpoints for solving dynamic
stochastic optimization problems. Economics Letters, 91(3):312–320, 2006.
[33] David Cass. Optimum growth in an aggregative model of capital accumulation. Review
of Economic Studies, 32(3):233–240, 1965.
[34] Roberto Chang. Credible monetary policy in an infinite horizon model: Recursive ap-
proaches. Journal of Economic Theory, 81(2):431–461, 1998.
[35] Varadarajan V Chari and Patrick J Kehoe. Sustainable plans. Journal of Political
Economy, pages 783–802, 1990.
[36] Ronald Harry Coase. The nature of the firm. Economica, 4(16):386–405, 1937.
[37] Wilbur John Coleman. Solving the Stochastic Growth Model by Policy-Function Itera-
tion. Journal of Business & Economic Statistics, 8(1):27–29, 1990.
[38] J. D. Cryer and K-S. Chan. Time Series Analysis. Springer, 2nd edition, 2008.
[39] Steven J Davis, R Jason Faberman, and John Haltiwanger. The flow approach to labor
markets: New data sources, micro-macro links and the recent downturn. Journal of
Economic Perspectives, 2006.
[40] Bruno de Finetti. La prévision: ses lois logiques, ses sources subjectives. Annales de l'Institut Henri Poincaré, 7:1–68, 1937. English translation in Kyburg and Smokler (eds.), Studies in Subjective Probability, Wiley, New York, 1964.
[41] Angus Deaton. Saving and Liquidity Constraints. Econometrica, 59(5):1221–1248, 1991.
[42] Angus Deaton and Christina Paxson. Intertemporal Choice and Inequality. Journal of
Political Economy, 102(3):437–467, 1994.
[43] Wouter J Den Haan. Comparison of solutions to the incomplete markets model with
aggregate uncertainty. Journal of Economic Dynamics and Control, 34(1):4–27, 2010.
[44] Raymond J Deneckere and Kenneth L Judd. Cyclical and chaotic behavior in a dy-
namic equilibrium model, with implications for fiscal policy. Cycles and chaos in eco-
nomic equilibrium, pages 308–329, 1992.
[45] J Dickey. Bayesian alternatives to the F-test and least-squares estimate in the normal linear model. In S.E. Fienberg and A. Zellner, editors, Studies in Bayesian Econometrics and Statistics, pages 515–554. North-Holland, Amsterdam, 1975.
[46] JBR Do Val, JC Geromel, and OLV Costa. Solutions for the linear-quadratic control problem of Markov jump linear systems. Journal of Optimization Theory and Applications, 103(2):283–311, 1999.
[47] Ulrich Doraszelski and Mark Satterthwaite. Computable Markov-perfect industry dynamics. The RAND Journal of Economics, 41(2):215–243, 2010.
[48] Y E Du, Ehud Lehrer, and A D Y Pauzner. Competitive economy as a ranking device
over networks. submitted, 2013.
[49] R M Dudley. Real Analysis and Probability. Cambridge Studies in Advanced Mathe-
matics. Cambridge University Press, 2002.
[50] Timothy Dunne, Mark J Roberts, and Larry Samuelson. The growth and failure of US manufacturing plants. The Quarterly Journal of Economics, 104(4):671–698, 1989.
[51] Robert F Engle and Clive W J Granger. Co-integration and Error Correction: Repre-
sentation, Estimation, and Testing. Econometrica, 55(2):251–276, 1987.
[52] Richard Ericson and Ariel Pakes. Markov-perfect industry dynamics: A framework for
empirical work. The Review of Economic Studies, 62(1):53–82, 1995.
[53] David S Evans. The relationship between firm growth, size, and age: Estimates for 100
manufacturing industries. The Journal of Industrial Economics, pages 567–581, 1987.
[57] Milton Friedman and Rose D Friedman. Two Lucky People. University of Chicago
Press, 1998.
[58] Yoshi Fujiwara, Corrado Di Guilmi, Hideaki Aoyama, Mauro Gallegati, and Wataru Souma. Do Pareto–Zipf and Gibrat laws hold true? An analysis with European firms. Physica A: Statistical Mechanics and its Applications, 335(1-2):197–216, 2004.
[59] Xavier Gabaix. Power laws in economics: An introduction. Journal of Economic Per-
spectives, 30(1):185–206, 2016.
[60] David Gale. The Theory of Linear Economic Models. University of Chicago Press, 1989.
[61] Albert Gallatin. Report on the finances, November 1807. In Reports of the Secretary of the Treasury of the United States, Vol 1. Government Printing Office, Washington, DC, 1837.
[62] Robert Gibrat. Les inégalités économiques: Applications d’une loi nouvelle, la loi de
l’effet proportionnel. PhD thesis, Recueil Sirey, 1931.
[63] Edward Glaeser, Jose Scheinkman, and Andrei Shleifer. The injustice of inequality.
Journal of Monetary Economics, 50(1):199–222, 2003.
[65] Olle Häggström. Finite Markov chains and algorithmic applications, volume 52. Cam-
bridge University Press, 2002.
[66] Bronwyn H Hall. The relationship between firm size and firm growth in the us manu-
facturing sector. The Journal of Industrial Economics, pages 583–606, 1987.
[67] Robert E Hall. Stochastic Implications of the Life Cycle-Permanent Income Hypothesis:
Theory and Evidence. Journal of Political Economy, 86(6):971–987, 1978.
[68] Robert E Hall and Frederic S Mishkin. The Sensitivity of Consumption to Transitory
Income: Estimates from Panel Data on Households. National Bureau of Economic Re-
search Working Paper Series, No. 505, 1982.
[69] Michael J Hamburger, Gerald L Thompson, and Roman L Weil. Computation of expansion rates for the generalized von Neumann model of an expanding economy. Econometrica, Journal of the Econometric Society, pages 542–547, 1967.
[70] James D Hamilton. What’s real about the business cycle? Federal Reserve Bank of St.
Louis Review, (July-August):435–452, 2005.
[72] L P Hansen and T J Sargent. Recursive Models of Dynamic Linear Economies. The
Gorman Lectures in Economics. Princeton University Press, 2013.
[73] Lars Peter Hansen and Scott F Richard. The Role of Conditioning Information in Deducing Testable Restrictions Implied by Dynamic Asset Pricing Models. Econometrica, 55(3):587–613, May 1987.
[74] Lars Peter Hansen and Thomas J Sargent. Formulating and estimating dynamic linear
rational expectations models. Journal of Economic Dynamics and control, 2:7–46, 1980.
[75] Lars Peter Hansen and Thomas J Sargent. Wanting robustness in macroeconomics. Manuscript, Department of Economics, Stanford University, 4, 2000.
[76] Lars Peter Hansen and Thomas J. Sargent. Robust control and model uncertainty.
American Economic Review, 91(2):60–66, 2001.
[77] Lars Peter Hansen and Thomas J Sargent. Robustness. Princeton University Press, 2008.
[78] Lars Peter Hansen and Thomas J. Sargent. Recursive Linear Models of Dynamic Eco-
nomics. Princeton University Press, Princeton, New Jersey, 2013.
[79] Lars Peter Hansen and José A Scheinkman. Long-term risk: An operator approach.
Econometrica, 77(1):177–234, 2009.
[80] J. Michael Harrison and David M. Kreps. Speculative investor behavior in a stock mar-
ket with heterogeneous expectations. The Quarterly Journal of Economics, 92(2):323–
336, 1978.
[81] J. Michael Harrison and David M. Kreps. Martingales and arbitrage in multiperiod
securities markets. Journal of Economic Theory, 20(3):381–408, June 1979.
[82] John Heaton and Deborah J Lucas. Evaluating the effects of incomplete markets on risk
sharing and asset pricing. Journal of Political Economy, pages 443–487, 1996.
[83] Elhanan Helpman and Paul Krugman. Market structure and international trade. MIT
Press Cambridge, 1985.
[85] Hugo A Hopenhayn. Entry, exit, and firm dynamics in long run equilibrium. Economet-
rica: Journal of the Econometric Society, pages 1127–1150, 1992.
[86] Hugo A Hopenhayn and Edward C Prescott. Stochastic Monotonicity and Stationary
Distributions for Dynamic Economies. Econometrica, 60(6):1387–1406, 1992.
[87] Hugo A Hopenhayn and Richard Rogerson. Job Turnover and Policy Evaluation: A
General Equilibrium Analysis. Journal of Political Economy, 101(5):915–938, 1993.
[89] K Jänich. Linear Algebra. Springer Undergraduate Texts in Mathematics and Technol-
ogy. Springer, 1994.
[90] John Y. Campbell and Robert J. Shiller. The Dividend-Price Ratio and Expectations of Future Dividends and Discount Factors. Review of Financial Studies, 1(3):195–228, 1988.
[91] Boyan Jovanovic. Firm-specific capital and turnover. Journal of Political Economy,
87(6):1246–1260, 1979.
[92] K L Judd. Cournot versus Bertrand: A dynamic resolution. Technical report, Hoover Institution, Stanford University, 1990.
[93] Kenneth L Judd. On the performance of patents. Econometrica, pages 567–585, 1985.
[94] Kenneth L. Judd, Sevin Yeltekin, and James Conklin. Computing Supergame Equilib-
ria. Econometrica, 71(4):1239–1254, 07 2003.
[95] Takashi Kamihigashi. Elementary results on solutions to the Bellman equation of dynamic programming: existence, uniqueness, and convergence. Technical report, Kobe University, 2012.
[96] John G Kemeny, Oskar Morgenstern, and Gerald L Thompson. A generalization of the von Neumann model of an expanding economy. Econometrica, Journal of the Econometric Society, pages 115–135, 1956.
[97] Tomoo Kikuchi, Kazuo Nishimura, and John Stachurski. Span of control, transaction
costs, and the structure of production chains. Theoretical Economics, 13(2):729–760,
2018.
[98] Illenin Kondo, Logan T Lewis, and Andrea Stella. On the US firm and establishment size distributions. Technical report, SSRN, 2018.
[100] David M. Kreps. Notes on the Theory of Choice. Westview Press, Boulder, Colorado,
1988.
[102] Finn E Kydland and Edward C Prescott. Dynamic optimal taxation, rational expecta-
tions and optimal control. Journal of Economic Dynamics and Control, 2:79–91, 1980.
[103] A Lasota and M C MacKey. Chaos, Fractals, and Noise: Stochastic Aspects of Dynam-
ics. Applied Mathematical Sciences. Springer-Verlag, 1994.
[104] Edward E Leamer. Specification searches: Ad hoc inference with nonexperimental data,
volume 53. John Wiley & Sons Incorporated, 1978.
[105] Martin Lettau and Sydney Ludvigson. Consumption, Aggregate Wealth, and Expected
Stock Returns. Journal of Finance, 56(3):815–849, 06 2001.
[106] Martin Lettau and Sydney C. Ludvigson. Understanding Trend and Cycle in Asset
Values: Reevaluating the Wealth Effect on Consumption. American Economic Review,
94(1):276–299, March 2004.
[107] David Levhari and Leonard J Mirman. The great fish war: An example using a dynamic Cournot-Nash solution. The Bell Journal of Economics, pages 322–334, 1980.
[108] L Ljungqvist and T J Sargent. Recursive Macroeconomic Theory. MIT Press, 4 edition,
2018.
[109] Robert E Lucas, Jr. Asset prices in an exchange economy. Econometrica: Journal of the
Econometric Society, 46(6):1429–1445, 1978.
[110] Robert E Lucas, Jr. and Edward C Prescott. Investment under uncertainty. Economet-
rica: Journal of the Econometric Society, pages 659–681, 1971.
[111] Robert E Lucas, Jr. and Nancy L Stokey. Optimal Fiscal and Monetary Policy in an Economy without Capital. Journal of Monetary Economics, 12(3):55–93, 1983.
[112] Benoit Mandelbrot. The variation of certain speculative prices. The Journal of Busi-
ness, 36(4):394–419, 1963.
[113] Albert Marcet and Thomas J Sargent. Convergence of Least-Squares Learning in En-
vironments with Hidden State Variables and Private Information. Journal of Political
Economy, 97(6):1306–1322, 1989.
[114] V Filipe Martins-da Rocha and Yiannis Vailakis. Existence and Uniqueness of a Fixed
Point for Local Contractions. Econometrica, 78(3):1127–1141, 2010.
[116] J J McCall. Economics of Information and Job Search. The Quarterly Journal of Eco-
nomics, 84(1):113–126, 1970.
[117] S P Meyn and R L Tweedie. Markov Chains and Stochastic Stability. Cambridge Uni-
versity Press, 2009.
[118] Mario J Miranda and P L Fackler. Applied Computational Economics and Finance.
Cambridge: MIT Press, 2002.
[119] F. Modigliani and R. Brumberg. Utility analysis and the consumption function: An interpretation of cross-section data. In K. K. Kurihara, editor, Post-Keynesian Economics. 1954.
[120] John F Muth. Optimal properties of exponentially weighted forecasts. Journal of the American Statistical Association, 55(290):299–306, 1960.
[121] Derek Neal. The Complexity of Job Mobility among Young Men. Journal of Labor
Economics, 17(2):237–261, 1999.
[122] Y Nishiyama, S Osada, and K Morimune. Estimation and testing for rank size rule regression under Pareto distribution. In Proceedings of the International Environmental Modelling and Software Society iEMSs 2004 International Conference. Citeseer, 2004.
[124] Jenő Pál and John Stachurski. Fitted value function iteration with probability one con-
tractions. Journal of Economic Dynamics and Control, 37(1):251–264, 2013.
[126] Martin L Puterman. Markov decision processes: discrete stochastic dynamic program-
ming. John Wiley & Sons, 2005.
[127] Guillaume Rabault. When do borrowing constraints bind? Some new results on the
income fluctuation problem. Journal of Economic Dynamics and Control, 26(2):217–
245, 2002.
[128] Svetlozar Todorov Rachev. Handbook of heavy tailed distributions in finance: Handbooks
in finance, volume 1. Elsevier, 2003.
[130] Kevin L Reffett. Production-based asset pricing in monetary economies with transac-
tions costs. Economica, pages 427–443, 1996.
[133] Sherwin Rosen, Kevin M Murphy, and Jose A Scheinkman. Cattle cycles. Journal of
Political Economy, 102(3):468–492, 1994.
[135] Hernán D Rozenfeld, Diego Rybski, Xavier Gabaix, and Hernán A Makse. The area
and population of cities: New insights from a different perspective on cities. American
Economic Review, 101(5):2205–25, 2011.
[138] Jaewoo Ryoo and Sherwin Rosen. The engineering labor market. Journal of Political Economy, 112(S1):S110–S140, 2004.
[139] Paul A. Samuelson. Interactions between the multiplier analysis and the principle of acceleration. Review of Economic Statistics, 21(2):75–78, 1939.
[140] Thomas Sargent, Lars Peter Hansen, and Will Roberts. Observable implications of
present value budget balance. In Rational Expectations Econometrics. Westview Press,
1991.
[141] Thomas J Sargent. The Demand for Money During Hyperinflations under Rational
Expectations: I. International Economic Review, 18(1):59–82, February 1977.
[142] Thomas J Sargent. Macroeconomic Theory. Academic Press, New York, 2nd edition,
1987.
[143] Jack Schechtman and Vera L S Escudero. Some results on an income fluctuation prob-
lem. Journal of Economic Theory, 16(2):151–166, 1977.
[144] Jose A. Scheinkman. Speculation, Trading, and Bubbles. Columbia University Press,
New York, 2014.
[146] Christian Schluter and Mark Trede. Size distributions reconsidered. Econometric Re-
views, 38(6):695–710, 2019.
[148] John Stachurski. Continuous state dynamic programming via nonexpansive approxima-
tion. Computational Economics, 31(2):141–160, 2008.
[150] Nancy L Stokey. Reputation and time consistency. The American Economic Review,
pages 134–139, 1989.
[151] Nancy L. Stokey. Credible public policy. Journal of Economic Dynamics and Control,
15(4):627–656, October 1991.
[152] Kjetil Storesletten, Christopher I Telmer, and Amir Yaron. Consumption and risk shar-
ing over the life cycle. Journal of Monetary Economics, 51(3):609–633, 2004.
[154] Lars E.O. Svensson and Noah Williams. Optimal Monetary Policy under Uncertainty in DSGE Models: A Markov Jump-Linear-Quadratic Approach. In Klaus Schmidt-Hebbel and Carl E. Walsh, editors, Monetary Policy under Uncertainty and Learning, volume 13 of Central Banking, Analysis, and Economic Policies Book Series, chapter 3, pages 77–114. Central Bank of Chile, March 2009.
[155] Lars E.O. Svensson and Noah Williams. Optimal monetary policy under uncertainty: A Markov jump-linear-quadratic approach. Federal Reserve Bank of St. Louis Review, 90(4):275–293, 2008.
[156] George Tauchen. Finite state Markov-chain approximations to univariate and vector autoregressions. Economics Letters, 20(2):177–181, 1986.
[157] Daniel Treisman. Russia’s billionaires. The American Economic Review, 106(5):236–241,
2016.
[158] Ngo Van Long. Dynamic games in the economics of natural resources: a survey. Dy-
namic Games and Applications, 1(1):115–148, 2011.
[160] John von Neumann. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100(1):295–320, 1928.
[161] John von Neumann. Über ein ökonomisches Gleichungssystem und eine Verallgemeinerung des Brouwerschen Fixpunktsatzes. In Ergebn. Math. Kolloq., volume 8, pages 73–83, 1937.
[162] Abraham Wald. Sequential Analysis. John Wiley and Sons, New York, 1947.
[163] Peter Whittle. Prediction and regulation by linear least-square methods. English Univ.
Press, 1963.
[164] Peter Whittle. Prediction and Regulation by Linear Least Squares Methods. University
of Minnesota Press, Minneapolis, Minnesota, 2nd edition, 1983.
[166] G Alastair Young and Richard L Smith. Essentials of statistical inference. Cambridge
University Press, 2005.