
Lectures in Quantitative Economics

with Python

Thomas J. Sargent and John Stachurski1

July 1, 2019

1 https://lectures.quantecon.org/py/
Contents

I Introduction to Python 1

1 About Python 3

2 Setting up Your Python Environment 13

3 An Introductory Example 35

4 Python Essentials 55

5 OOP I: Introduction to Object Oriented Programming 73

II The Scientific Libraries 79

6 NumPy 81

7 Matplotlib 99

8 SciPy 111

9 Numba 123

10 Other Scientific Libraries 135

III Advanced Python Programming 145

11 Writing Good Code 147

12 OOP II: Building Classes 155

13 OOP III: Samuelson Multiplier Accelerator 171

14 More Language Features 205

15 Debugging 237

IV Data and Empirics 243

16 Pandas 245

17 Pandas for Panel Data 259


18 Linear Regression in Python 279

19 Maximum Likelihood Estimation 295

V Tools and Techniques 315

20 Geometric Series for Elementary Economics 317

21 Linear Algebra 337

22 Complex Numbers and Trigonometry 361

23 Orthogonal Projections and Their Applications 371

24 LLN and CLT 387

25 Linear State Space Models 405

26 Finite Markov Chains 429

27 Continuous State Markov Chains 455

28 Cass-Koopmans Optimal Growth Model 477

29 A First Look at the Kalman Filter 505

30 Reverse Engineering a la Muth 523

VI Dynamic Programming 531

31 Shortest Paths 533

32 Job Search I: The McCall Search Model 541

33 Job Search II: Search and Separation 553

34 A Problem that Stumped Milton Friedman 565

35 Job Search III: Search with Learning 583

36 Job Search IV: Modeling Career Choice 599

37 Job Search V: On-the-Job Search 611

38 Optimal Growth I: The Stochastic Optimal Growth Model 621

39 Optimal Growth II: Time Iteration 639

40 Optimal Growth III: The Endogenous Grid Method 657

41 LQ Dynamic Programming Problems 665

42 Optimal Savings I: The Permanent Income Model 693



43 Optimal Savings II: LQ Techniques 711

44 Consumption and Tax Smoothing with Complete and Incomplete Markets 729

45 Optimal Savings III: Occasionally Binding Constraints 747

46 Robustness 763

47 Discrete State Dynamic Programming 783

VII Multiple Agent Models 807

48 Schelling’s Segregation Model 809

49 A Lake Model of Employment and Unemployment 821

50 Rational Expectations Equilibrium 845

51 Markov Perfect Equilibrium 859

52 Robust Markov Perfect Equilibrium 875

53 Uncertainty Traps 893

54 The Aiyagari Model 907

55 Default Risk and Income Fluctuations 915

56 Globalization and Cycles 933

57 Coase’s Theory of the Firm 949

VIII Recursive Models of Dynamic Linear Economies 963

58 Recursive Models of Dynamic Linear Economies 965

59 Growth in Dynamic Linear Economies 1001

60 Lucas Asset Pricing Using DLE 1013

61 IRFs in Hall Models 1021

62 Permanent Income Model using the DLE Class 1029

63 Rosen Schooling Model 1035

64 Cattle Cycles 1041

65 Shock Non Invertibility 1049

IX Classic Linear Models 1055

66 Von Neumann Growth Model (and a Generalization) 1057



X Time Series Models 1073

67 Covariance Stationary Processes 1075

68 Estimation of Spectra 1095

69 Additive and Multiplicative Functionals 1109

70 Classical Control with Linear Algebra 1131

71 Classical Prediction and Filtering With Linear Algebra 1151

XI Asset Pricing and Finance 1171

72 Asset Pricing I: Finite State Models 1173

73 Asset Pricing II: The Lucas Asset Pricing Model 1193

74 Asset Pricing III: Incomplete Markets 1203

75 Two Modifications of Mean-variance Portfolio Theory 1215

XII Dynamic Programming Squared 1239

76 Stackelberg Plans 1241

77 Ramsey Plans, Time Inconsistency, Sustainable Plans 1265

78 Optimal Taxation in an LQ Economy 1289

79 Optimal Taxation with State-Contingent Debt 1309

80 Optimal Taxation without State-Contingent Debt 1339

81 Fluctuating Interest Rates Deliver Fiscal Insurance 1365

82 Fiscal Risk and Government Debt 1389

83 Competitive Equilibria of Chang Model 1415

84 Credible Government Policies in Chang Model 1443


Part I

Introduction to Python

1 About Python

1.1 Contents

• Overview 1.2

• What’s Python? 1.3

• Scientific Programming 1.4

• Learn More 1.5

1.2 Overview

In this lecture we will

• Outline what Python is


• Showcase some of its abilities
• Compare it to some other languages

At this stage, it’s not our intention that you try to replicate all you see
We will work through what follows at a slow pace later in the lecture series
Our only objective for this lecture is to give you some feel of what Python is, and what it can
do

1.3 What’s Python?

Python is a general-purpose programming language conceived in 1989 by Dutch programmer Guido van Rossum
Python is free and open source, with development coordinated through the Python Software
Foundation
Python has experienced rapid adoption in the last decade and is now one of the most popular
programming languages


1.3.1 Common Uses

Python is a general-purpose language used in almost all application domains

• communications
• web development
• CGI and graphical user interfaces
• games
• multimedia, data processing, security, etc., etc., etc.

Used extensively by Internet service and high tech companies such as

• Google
• Dropbox
• Reddit
• YouTube
• Walt Disney Animation, etc., etc.

Often used to teach computer science and programming


For reasons we will discuss, Python is particularly popular within the scientific community

• academia, NASA, CERN, Wall St., etc., etc.

1.3.2 Relative Popularity

The following chart, produced using Stack Overflow Trends, shows one measure of the relative
popularity of Python

The figure indicates not only that Python is widely used but also that adoption of Python
has accelerated significantly since 2012
We suspect this is driven at least in part by uptake in the scientific domain, particularly in
rapidly growing fields like data science

For example, the popularity of pandas, a library for data analysis with Python, has exploded, as seen here
(The corresponding time path for MATLAB is shown for comparison)

Note that pandas takes off in 2012, which is the same year that we see Python’s popularity
begin to spike in the first figure
Overall, it’s clear that

• Python is one of the most popular programming languages worldwide


• Python is a major tool for scientific computing, accounting for a rapidly rising share of
scientific work around the globe

1.3.3 Features

Python is a high-level language suitable for rapid development


It has a relatively small core language supported by many libraries
Other features:

• A multiparadigm language, in that multiple programming styles are supported (procedural, object-oriented, functional, etc.)
• Interpreted rather than compiled

1.3.4 Syntax and Design

One nice feature of Python is its elegant syntax — we’ll see many examples later on
Elegant code might sound superfluous but in fact it’s highly beneficial because it makes the
syntax easy to read and easy to remember
Remembering how to read from files, sort dictionaries and other such routine tasks means
that you don’t need to break your flow in order to hunt down correct syntax
Closely related to elegant syntax is an elegant design

Features like iterators, generators, decorators, list comprehensions, etc. make Python highly
expressive, allowing you to get more done with less code
Namespaces improve productivity by cutting down on bugs and syntax errors
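To make the features named above concrete, here is a minimal sketch that uses only the standard library
The names squares, running_total, logged and add are ours, chosen purely for illustration

```python
import functools

# List comprehension: build a list in a single readable expression
squares = [n**2 for n in range(5)]


def running_total(values):
    """Generator: yield partial sums one at a time, on demand."""
    total = 0
    for v in values:
        total += v
        yield total


def logged(func):
    """Decorator: wrap a function so that each call is reported."""
    @functools.wraps(func)
    def wrapper(*args):
        result = func(*args)
        print(f"{func.__name__}{args} = {result}")
        return result
    return wrapper


@logged
def add(a, b):
    return a + b


partial_sums = list(running_total(squares))  # consume the generator
total = add(2, 3)                            # the call is reported by the decorator
```

Each construct replaces several lines of explicit looping or bookkeeping, which is the sense in which Python lets you get more done with less code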

1.4 Scientific Programming

Python has become one of the core languages of scientific computing


It’s either the dominant player or a major player in

• Machine learning and data science


• Astronomy
• Artificial intelligence
• Chemistry
• Computational biology
• Meteorology
• etc., etc.

Its popularity in economics is also beginning to rise


This section briefly showcases some examples of Python for scientific programming

• All of these topics will be covered in detail later on

1.4.1 Numerical Programming

Fundamental matrix and array processing capabilities are provided by the excellent NumPy
library
NumPy provides the basic array data type plus some simple processing operations
For example, let’s build some arrays

In [1]: import numpy as np # Load the library

a = np.linspace(-np.pi, np.pi, 100) # Create even grid from -π to π


b = np.cos(a) # Apply cosine to each element of a
c = np.sin(a) # Apply sin to each element of a

Now let’s take the inner product:

In [2]: b @ c

Out[2]: 1.5265566588595902e-16

The number you see here might vary slightly but it’s essentially zero
(For older versions of Python and NumPy you need to use the np.dot function)
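As a rough check on the remark above, the following sketch computes the same inner product three ways
The variable names via_operator, via_dot and via_sum are ours
The @ operator requires Python 3.5 or later, while np.dot also works on older versions

```python
import numpy as np

a = np.linspace(-np.pi, np.pi, 100)   # same grid as above
b, c = np.cos(a), np.sin(a)

via_operator = b @ c                  # matrix-multiplication operator
via_dot = np.dot(b, c)                # function form of the same product
via_sum = np.sum(b * c)               # elementwise product, then sum

print(via_operator, via_dot, via_sum)
```

All three agree up to floating point rounding
The result is essentially zero because the grid is symmetric about zero and cos(a) * sin(a) is an odd function, so the terms cancel in pairs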
The SciPy library is built on top of NumPy and provides additional functionality
For example, let’s calculate $\int_{-2}^{2} \phi(z)\,dz$ where $\phi$ is the standard normal density

In [3]: from scipy.stats import norm
from scipy.integrate import quad

ϕ = norm()
value, error = quad(ϕ.pdf, -2, 2) # Integrate using Gaussian quadrature
value

Out[3]: 0.9544997361036417
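One way to sanity-check this number is to use the closed form for the standard normal CDF in terms of the error function
The sketch below uses only the standard library, and the helper name std_normal_cdf is ours

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# P(-2 <= Z <= 2) for a standard normal Z
prob = std_normal_cdf(2.0) - std_normal_cdf(-2.0)
print(prob)
```

The value should agree with the quadrature result to many decimal places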

SciPy includes many of the standard routines used in

• linear algebra
• integration
• interpolation
• optimization
• distributions and random number generation
• signal processing
• etc., etc.

1.4.2 Graphics

The most popular and comprehensive Python library for creating figures and graphs is Matplotlib

• Plots, histograms, contour images, 3D, bar charts, etc., etc.


• Output in many formats (PDF, PNG, EPS, etc.)
• LaTeX integration

Example 2D plot with embedded LaTeX annotations

Example contour plot



Example 3D plot

More examples can be found in the Matplotlib thumbnail gallery


Other graphics libraries include

• Plotly
• Bokeh
• VPython — 3D graphics and animations

1.4.3 Symbolic Algebra

It’s useful to be able to manipulate symbolic expressions, as in Mathematica or Maple


The SymPy library provides this functionality from within the Python shell

In [4]: from sympy import Symbol

x, y = Symbol('x'), Symbol('y') # Treat 'x' and 'y' as algebraic symbols


x + x + x + y

Out[4]: 3*x + y

We can manipulate expressions

In [5]: expression = (x + y)**2


expression.expand()

Out[5]: x**2 + 2*x*y + y**2

solve polynomials

In [6]: from sympy import solve

solve(x**2 + x + 2)

Out[6]: [-1/2 - sqrt(7)*I/2, -1/2 + sqrt(7)*I/2]

and calculate limits, derivatives and integrals

In [7]: from sympy import limit, sin, diff

limit(1 / x, x, 0)

Out[7]: oo

In [8]: limit(sin(x) / x, x, 0)

Out[8]: 1

In [9]: diff(sin(x), x)

Out[9]: cos(x)

The beauty of importing this functionality into Python is that we are working within a fully
fledged programming language
We can easily create tables of derivatives, generate LaTeX output, add it to figures, etc., etc.
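As a sketch of what this can look like, the following builds a small table of derivatives and prints each entry as LaTeX source
The list of functions and the names functions and table are our own choices for illustration

```python
from sympy import Symbol, cos, diff, exp, latex, log, sin

x = Symbol('x')

functions = [sin(x), cos(x), exp(x), log(x), x**3]

# Pair each function with its derivative with respect to x
table = [(f, diff(f, x)) for f in functions]

for f, df in table:
    # latex() returns the LaTeX source of an expression as a string
    print(f"{latex(f)}  ->  {latex(df)}")
```

Because the table is an ordinary Python list, it can be looped over, filtered or written to a file like any other data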

1.4.4 Statistics

Python’s data manipulation and statistics libraries have improved rapidly over the last few
years
Pandas
One of the most popular libraries for working with data is pandas
Pandas is fast, efficient, flexible and well designed
Here’s a simple example, using some fake data

In [10]: import pandas as pd
np.random.seed(1234)

data = np.random.randn(5, 2) # 5x2 matrix of N(0, 1) random draws
dates = pd.date_range('2010-12-28', periods=5) # Unambiguous ISO date

df = pd.DataFrame(data, columns=('price', 'weight'), index=dates)
print(df)

               price    weight
2010-12-28  0.471435 -1.190976
2010-12-29  1.432707 -0.312652
2010-12-30 -0.720589  0.887163
2010-12-31  0.859588 -0.636524
2011-01-01  0.015696 -2.242685

In [11]: df.mean()

Out[11]: price     0.411768
         weight   -0.699135
         dtype: float64

Other Useful Statistics Libraries


- statsmodels: various statistical routines
- scikit-learn: machine learning in Python (sponsored by Google, among others)
- PyMC: for Bayesian data analysis
- PyStan: Bayesian analysis based on Stan

1.4.5 Networks and Graphs

Python has many libraries for studying graphs


One well-known example is NetworkX

• Standard graph algorithms for analyzing network structure, etc.


• Plotting routines
• etc., etc.

Here’s some example code that generates and plots a random graph, with node color determined by shortest path length from a central node

In [12]: import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
np.random.seed(1234)

# Generate a random graph
p = dict((i, (np.random.uniform(0, 1), np.random.uniform(0, 1))) for i in range(200))
G = nx.random_geometric_graph(200, 0.12, pos=p)
pos = nx.get_node_attributes(G, 'pos')

# Find node nearest the center point (0.5, 0.5)
dists = [(x - 0.5)**2 + (y - 0.5)**2 for x, y in list(pos.values())]
ncenter = np.argmin(dists)

# Plot graph, coloring by path length from central node
p = nx.single_source_shortest_path_length(G, ncenter)
plt.figure()
nx.draw_networkx_edges(G, pos, alpha=0.4)
nx.draw_networkx_nodes(G,
                       pos,
                       nodelist=list(p.keys()),
                       node_size=120, alpha=0.5,
                       node_color=list(p.values()),
                       cmap=plt.cm.jet_r)
plt.show()


1.4.6 Cloud Computing

Running your Python code on massive servers in the cloud is becoming easier and easier
A nice example is Anaconda Enterprise

See also
- Amazon Elastic Compute Cloud
- The Google App Engine (Python, Java, PHP or Go)
- Pythonanywhere
- Sagemath Cloud

1.4.7 Parallel Processing

Apart from the cloud computing options listed above, you might like to consider
- Parallel computing through IPython clusters
- The Starcluster interface to Amazon’s EC2
- GPU programming through PyCuda, PyOpenCL, Theano or similar

1.4.8 Other Developments

There are many other interesting developments with scientific programming in Python
Some representative examples include
- Jupyter: Python in your browser with code cells, embedded images, etc.
- Numba: speed up numerical Python code by compiling it to native machine code
- Blaze: a generalization of NumPy
- PyTables: manage large data sets
- CVXPY: convex optimization in Python

1.5 Learn More

• Browse some Python projects on GitHub
• Have a look at some of the Jupyter notebooks people have shared on various scientific topics
• Visit the Python Package Index
• View some of the questions people are asking about Python on Stack Overflow
• Keep up to date on what’s happening in the Python community with the Python subreddit

2 Setting up Your Python Environment

2.1 Contents

• Overview 2.2

• Anaconda 2.3

• Jupyter Notebooks 2.4

• Installing Libraries 2.5

• Working with Files 2.6

• Editors and IDEs 2.7

• Exercises 2.8

2.2 Overview

In this lecture, you will learn how to

1. get a Python environment up and running with all the necessary tools
2. execute simple Python commands
3. run a sample program
4. install the code libraries that underpin these lectures

2.3 Anaconda

The core Python package is easy to install but not what you should choose for these lectures
These lectures require the entire scientific programming ecosystem, which

• the core installation doesn’t provide


• is painful to install one piece at a time


Hence the best approach for our purposes is to install a free Python distribution that contains

1. the core Python language and


2. the most popular scientific libraries

The best such distribution is Anaconda


Anaconda is

• very popular
• cross platform
• comprehensive
• completely unrelated to the Nicki Minaj song of the same name

Anaconda also comes with a great package management system to organize your code libraries
All of what follows assumes that you adopt this recommendation!

2.3.1 Installing Anaconda

Installing Anaconda is straightforward: download the binary and follow the instructions
Important points:

• Install the latest version


• If you are asked during the installation process whether you’d like to make Anaconda
your default Python installation, say yes
• Otherwise, you can accept all of the defaults

2.3.2 Updating Anaconda

Anaconda supplies a tool called conda to manage and upgrade your Anaconda packages
One conda command you should execute regularly is the one that updates the whole Anaconda distribution
As a practice run, please execute the following

1. Open up a terminal
2. Type conda update anaconda

For more information on conda, type conda help in a terminal

2.4 Jupyter Notebooks

Jupyter notebooks are one of the many possible ways to interact with Python and the scientific libraries
They use a browser-based interface to Python with

• The ability to write and execute Python commands


• Formatted output in the browser, including tables, figures, animation, etc.
• The option to mix in formatted text and mathematical expressions

Because of these possibilities, Jupyter is fast turning into a major player in the scientific computing ecosystem
Here’s an image showing execution of some code (borrowed from here) in a Jupyter notebook

You can find a nice example of the kinds of things you can do in a Jupyter notebook (such as
include maths and text) here
While Jupyter isn’t the only way to code in Python, it’s great for when you wish to

• start coding in Python


• test new ideas or interact with small pieces of code
• share scientific ideas or collaborate on them with students or colleagues

These lectures are designed for executing in Jupyter notebooks



2.4.1 Starting the Jupyter Notebook

Once you have installed Anaconda, you can start the Jupyter notebook
Either

• search for Jupyter in your applications menu, or

• open up a terminal and type jupyter notebook

– Windows users should substitute “Anaconda command prompt” for “terminal” in the previous line

If you use the second option, you will see something like this

The output tells us the notebook is running at http://localhost:8888/

• localhost is the name of the local machine


• 8888 refers to port number 8888 on your computer

Thus, the Jupyter notebook server is listening for requests on port 8888 of our local machine
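To see how the address splits into a host name and a port, and to check whether anything is listening there, one could use a sketch like the following
It uses only the standard library, the helper is_listening is ours, and the URL is simply the example address from above

```python
import socket
from urllib.parse import urlsplit

url = "http://localhost:8888/"      # example address from the startup message
parts = urlsplit(url)
host, port = parts.hostname, parts.port


def is_listening(host, port, timeout=0.5):
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:                 # refused, timed out, or host not found
        return False


print(host, port, is_listening(host, port))
```

The last value printed is True only while a server, such as a running Jupyter notebook, is accepting connections on that port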
Hopefully, your default browser has also opened up with a web page that looks something like this

What you see here is called the Jupyter dashboard


If you look at the URL at the top, it should be localhost:8888 or similar, matching the
message above
Assuming all this has worked OK, you can now click on New at the top right and select
Python 3 or similar
Here’s what shows up on our machine:

The notebook displays an active cell, into which you can type Python commands

2.4.2 Notebook Basics

Let’s start with how to edit code and run simple programs
Running Cells
Notice that in the previous figure the cell is surrounded by a green border
This means that the cell is in edit mode
As a result, you can type in Python code and it will appear in the cell
When you’re ready to execute the code in a cell, hit Shift-Enter instead of the usual Enter

(Note: There are also menu and button options for running code in a cell that you can find
by exploring)
Modal Editing
The next thing to understand about the Jupyter notebook is that it uses a modal editing system
This means that the effect of typing at the keyboard depends on which mode you are in
The two modes are

1. Edit mode

• Indicated by a green border around one cell


• Whatever you type appears as is in that cell

2. Command mode

• The green border is replaced by a grey border
• Key strokes are interpreted as commands; for example, typing b adds a new cell below the current one

To switch to

• command mode from edit mode, hit the Esc key or Ctrl-M

• edit mode from command mode, hit Enter or click in a cell

The modal behavior of the Jupyter notebook is a little tricky at first but very efficient when
you get used to it
User Interface Tour
At this stage, we recommend you take your time to

• look at the various options in the menus and see what they do
• take the “user interface tour”, which can be accessed through the help menu

Inserting Unicode (e.g., Greek Letters)


Python 3 supports Unicode characters in identifiers, allowing the use of characters such as α and β in your code
Unicode characters can be typed quickly in Jupyter using the tab key
Try creating a new code cell and typing \alpha, then hitting the tab key on your keyboard
A Test Program
Let’s run a test program
Here’s an arbitrary program we can use: http://matplotlib.org/1.4.1/examples/pie_and_polar_charts/polar_bar_demo.html
On that page, you’ll see the following code

In [1]: import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

N = 20
θ = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 10 * np.random.rand(N)
width = np.pi / 4 * np.random.rand(N)

ax = plt.subplot(111, polar=True)
bars = ax.bar(θ, radii, width=width, bottom=0.0)

# Use custom colors and opacity
for r, bar in zip(radii, bars):
    bar.set_facecolor(plt.cm.jet(r / 10.))
    bar.set_alpha(0.5)

plt.show()

Don’t worry about the details for now — let’s just run it and see what happens
The easiest way to run this code is to copy and paste into a cell in the notebook
(In older versions of Jupyter you might need to add the command %matplotlib inline
before you generate the figure)

2.4.3 Working with the Notebook

Here are a few more tips on working with Jupyter notebooks


Tab Completion
In the previous program, we executed the line import numpy as np

• NumPy is a numerical library we’ll work with in depth

After this import command, functions in NumPy can be accessed with


np.<function_name> type syntax

• For example, try np.random.randn(3)

We can explore these attributes of np using the Tab key


For example, here we type np.ran and hit Tab

Jupyter offers up the possible completions, here random and rank (the exact list depends on your NumPy version)
In this way, the Tab key helps remind you of what’s available and also saves you typing
On-Line Help
To get help on np.random.randn, say, we can execute np.random.randn?
Documentation appears in a split window of the browser, like so

Clicking on the top right of the lower split closes the on-line help
Other Content
In addition to executing code, the Jupyter notebook allows you to embed text, equations, figures and even videos in the page
For example, here we enter a mixture of plain text and LaTeX instead of code

Next we press Esc to enter command mode and then type m to indicate that we are writing Markdown, a markup language similar to (but simpler than) LaTeX
(You can also use your mouse to select Markdown from the Code drop-down box just below
the list of menu items)
Now we press Shift+Enter to produce this

2.4.4 Sharing Notebooks

Notebook files are just text files structured in JSON and typically ending with .ipynb
You can share them in the usual way that you share files — or by using web services such as
nbviewer
The notebooks you see on that site are static html representations
To run one, download it as an ipynb file by clicking on the download icon at the top right
Save it somewhere, navigate to it from the Jupyter dashboard and then run as discussed
above
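Since notebook files are plain JSON, they can be inspected with Python's built-in json module
The sketch below builds a tiny hand-written notebook in the version 4 layout and pulls out the source of its code cells
The cell contents and the names notebook_text and code_cells are made up for illustration

```python
import json

# A hand-written minimal notebook in the nbformat 4 layout (illustrative only)
notebook_text = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# A title"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["x = 1 + 1\n", "print(x)"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
})

nb = json.loads(notebook_text)      # same as parsing the text of an .ipynb file

# Collect the source of every code cell
code_cells = ["".join(cell["source"])
              for cell in nb["cells"] if cell["cell_type"] == "code"]

print(code_cells)
```

For a real notebook, replace notebook_text with the text read from the .ipynb file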

2.4.5 QuantEcon Notes

QuantEcon has its own site for sharing Jupyter notebooks related to economics: QuantEcon Notes
Notebooks submitted to QuantEcon Notes can be shared with a link, and are open to comments and votes by the community

2.5 Installing Libraries

Most of the libraries we need come in Anaconda


Other libraries can be installed with pip
One library we’ll be using is QuantEcon.py
You can install QuantEcon.py by starting Jupyter and typing

!pip install quantecon

into a cell
Alternatively, you can type the following into a terminal

pip install quantecon

More instructions can be found on the library page


To upgrade to the latest version, which you should do regularly, use

pip install --upgrade quantecon

Another library we will be using is interpolation.py


This can be installed by typing in Jupyter

!pip install interpolation
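After installing, it can be handy to confirm which version, if any, is present
One possible check, assuming Python 3.8 or later where importlib.metadata is part of the standard library, is sketched below
The helper name installed_version is ours

```python
from importlib import metadata   # standard library from Python 3.8


def installed_version(package):
    """Return the installed version string of package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("quantecon", "interpolation"):
    print(name, installed_version(name))
```

A result of None means the package is not installed in the environment that is running the code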

2.6 Working with Files

How does one run a locally saved Python file?


There are a number of ways to do this but let’s focus on methods using Jupyter notebooks

2.6.1 Option 1: Copy and Paste

The steps are:

1. Navigate to your file with your mouse/trackpad using a file browser


2. Click on your file to open it with a text editor
3. Copy and paste into a cell and Shift-Enter

2.6.2 Option 2: Run

Using the run command is often easier than copy and paste

• For example, %run test.py will run the file test.py



(You might find that the % is unnecessary — use %automagic to toggle the need for %)
Note that Jupyter only looks for test.py in the present working directory (PWD)
If test.py isn’t in that directory, you will get an error
Let’s look at a successful example, where we run a file test.py with contents:

In [2]: for i in range(5):
    print('foobar')

foobar
foobar
foobar
foobar
foobar

Here’s the notebook

Here

• pwd asks Jupyter to show the PWD (or %pwd — see the comment about automagic
above)

– This is where Jupyter is going to look for files to run


– Your output will look a bit different depending on your OS

• ls asks Jupyter to list files in the PWD (or %ls)



– Note that test.py is there (on our computer, because we saved it there earlier)

• cat test.py asks Jupyter to print the contents of test.py (or !type test.py on
Windows)

• run test.py runs the file and prints any output

2.6.3 But File X isn’t in my PWD!

If you’re trying to run a file not in the present working directory, you’ll get an error
To fix this error you need to either

1. Shift the file into the PWD, or


2. Change the PWD to where the file lives

One way to achieve the first option is to use the Upload button

• The button is on the top level dashboard, where Jupyter first opened to
• Look where the pointer is in this picture

The second option can be achieved using the cd command

• On Windows it might look like this cd C:/Python27/Scripts/dir


• On Linux / OSX it might look like this cd /home/user/scripts/dir

Note: You can type the first letter or two of each directory name and then use the tab key to
expand
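The same idea can be sketched in plain Python, where os.chdir plays the role of cd
The temporary directory and the file written into it are throwaway stand-ins for your own script folder

```python
import os
import tempfile
from pathlib import Path

# Create a throwaway directory containing a small script (illustrative only)
workdir = Path(tempfile.mkdtemp())
(workdir / "test.py").write_text("print('foobar')\n")

start = Path.cwd()                        # the present working directory
found_before = Path("test.py").exists()   # is test.py visible from here?

os.chdir(workdir)                         # change the PWD, as cd does
found_after = Path("test.py").exists()    # now the relative path resolves

os.chdir(start)                           # return to where we started
print(found_before, found_after)
```

A relative file name such as test.py is always resolved against the PWD, which is why changing directory makes the file visible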

2.6.4 Loading Files

It’s often convenient to be able to see your code before you run it

In the following example, we execute load white_noise_plot.py where


white_noise_plot.py is in the PWD
(Use %load if automagic is off)
Now the code from the file appears in a cell ready to execute

2.6.5 Saving Files

To save the contents of a cell as file foo.py

• put %%file foo.py as the first line of the cell


• Shift+Enter

Here %%file is an example of a cell magic
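For intuition, here is a rough plain-Python analogue of saving a cell with %%file and then running the file
It is only a sketch of the idea, not how Jupyter implements these magics, and the file contents are made up

```python
import contextlib
import io
import runpy
import tempfile
from pathlib import Path

source = "for i in range(3):\n    print('foobar')\n"

# Roughly what %%file foo.py does: write the cell body to a file
script = Path(tempfile.mkdtemp()) / "foo.py"
script.write_text(source)

# Roughly what running the file does: execute it as a script
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    runpy.run_path(str(script), run_name="__main__")

output = buffer.getvalue()          # everything the script printed
print(output, end="")
```

Here the output is captured in a string only so that it can be inspected afterwards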

2.7 Editors and IDEs

The preceding discussion covers most of what you need to know to interact with these lectures
However, as you start to write longer programs, you might want to experiment with your
workflow
There are many different options and we mention them only in passing

2.7.1 JupyterLab

JupyterLab is an integrated development environment centered around Jupyter notebooks


It is available through Anaconda and will soon be made the default environment for Jupyter
notebooks
Reading the docs or searching for a recent YouTube video will give you more information

2.7.2 Text Editors

A text editor is an application that is specifically designed to work with text files — such as
Python programs
Nothing beats the power and efficiency of a good text editor for working with program text
A good text editor will provide

• efficient text editing commands (e.g., copy, paste, search and replace)
• syntax highlighting, etc.

Among the most popular are Sublime Text and Atom


For a top quality open source text editor with a steeper learning curve, try Emacs
If you want an outstanding free text editor and don’t mind a seemingly vertical learning
curve plus long days of pain and suffering while all your neural pathways are rewired, try
Vim

2.7.3 Text Editors Plus IPython Shell

A text editor is for writing programs


To run them you can continue to use Jupyter as described above
Another option is to use the excellent IPython shell
To use an IPython shell, open up a terminal and type ipython
You should see something like this

The IPython shell has many of the features of the notebook: tab completion, color syntax,
etc.
It also has command history through the arrow key
The up arrow key brings previously typed commands to the prompt
This saves a lot of typing…
Here’s one set up, on a Linux box, with

• a file being edited in Vim


• An IPython shell next to it, to run the file

2.7.4 IDEs

IDEs are Integrated Development Environments, which allow you to edit, execute and interact with code from an integrated environment
One of the most popular in recent times is VS Code, which is now available via Anaconda
We hear good things about VS Code — please tell us about your experiences on the forum

2.8 Exercises

2.8.1 Exercise 1

If Jupyter is still running, quit by using Ctrl-C at the terminal where you started it
Now launch again, but this time using jupyter notebook --no-browser
This should start the notebook server without launching the browser
Note also the startup message: It should give you a URL such as http://localhost:8888 where the notebook is running
Now

1. Start your browser — or open a new tab if it’s already running


2. Enter the URL from above (e.g. http://localhost:8888) in the address bar at the top

You should now be able to run a standard Jupyter notebook session


This is an alternative way to start the notebook that can also be handy

2.8.2 Exercise 2

This exercise will familiarize you with git and GitHub


Git is a version control system — a piece of software used to manage digital projects such as
code libraries
In many cases, the associated collections of files — called repositories — are stored on
GitHub
GitHub is a wonderland of collaborative coding projects
For example, it hosts many of the scientific libraries we’ll be using later on, such as this one
Git is the underlying software used to manage these projects
Git is an extremely powerful tool for distributed collaboration — for example, we use it to
share and synchronize all the source files for these lectures
There are two main flavors of Git

1. the plain vanilla command line Git version


2. the various point-and-click GUI versions

• See, for example, the GitHub version

As an exercise, try

1. Installing Git
2. Getting a copy of QuantEcon.py using Git

For example, if you’ve installed the command line version, open up a terminal and enter

git clone https://github.com/QuantEcon/QuantEcon.py

(This is just git clone in front of the URL for the repository)
Even better,

1. Sign up to GitHub
2. Look into ‘forking’ GitHub repositories (forking means making your own copy of a
GitHub repository, stored on GitHub)
3. Fork QuantEcon.py
4. Clone your fork to some local directory, make edits, commit them, and push them back
up to your forked GitHub repo
5. If you made a valuable improvement, send us a pull request!

For reading on these and other topics, try

• The official Git documentation


• Reading through the docs on GitHub
• Pro Git Book by Scott Chacon and Ben Straub
• One of the thousands of Git tutorials on the Net
3

An Introductory Example

3.1 Contents

• Overview 3.2

• The Task: Plotting a White Noise Process 3.3

• Version 1 3.4

• Alternative Versions 3.5

• Exercises 3.6

• Solutions 3.7

We’re now ready to start learning the Python language itself


The level of this and the next few lectures will suit those with some basic knowledge of pro-
gramming
But don’t give up if you have none—you are not excluded
You just need to cover a few of the fundamentals of programming before returning here
Good references for first time programmers include:

• The first 5 or 6 chapters of How to Think Like a Computer Scientist


• Automate the Boring Stuff with Python
• The start of Dive into Python 3

Note: These references offer help on installing Python but you should probably stick with the
method on our set up page
You’ll then have an outstanding scientific computing environment (Anaconda) and be ready
to move on to the rest of our course

3.2 Overview

In this lecture, we will write and then pick apart small Python programs


The objective is to introduce you to basic Python syntax and data structures
Deeper concepts will be covered in later lectures

3.2.1 Prerequisites

The lecture on getting started with Python

3.3 The Task: Plotting a White Noise Process

Suppose we want to simulate and plot the white noise process $\epsilon_0, \epsilon_1, \ldots, \epsilon_T$, where each draw
$\epsilon_t$ is independent standard normal
In other words, we want to generate figures that look something like this:

We’ll do this in several different ways

3.4 Version 1

Here are a few lines of code that perform the task we set

In [1]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

x = np.random.randn(100)
plt.plot(x)
plt.show()

Let’s break this program down and see how it works

3.4.1 Import Statements

The first two lines of the program import functionality


The first line imports NumPy, a favorite Python package for tasks like

• working with arrays (vectors and matrices)


• common mathematical functions like cos and sqrt
• generating random numbers
• linear algebra, etc.

After import numpy as np we have access to these attributes via the syntax np.
Here’s another example

In [2]: import numpy as np

np.sqrt(4)

Out[2]: 2.0

We could also just write

In [3]: import numpy

numpy.sqrt(4)

Out[3]: 2.0

But the former method is convenient and more standard


Why all the Imports?
Remember that Python is a general-purpose language
The core language is quite small so it’s easy to learn and maintain
When you want to do something interesting with Python, you almost always need to import
additional functionality
Scientific work in Python is no exception
Most of our programs start off with lines similar to the import statements seen above
Packages
As stated above, NumPy is a Python package
Packages are used by developers to organize a code library
In fact, a package is just a directory containing

1. files with Python code — called modules in Python speak


2. possibly some compiled code that can be accessed by Python (e.g., functions compiled
from C or FORTRAN code)
3. a file called __init__.py that specifies what will be executed when we type import
package_name

In fact, you can find and explore the directory for NumPy on your computer easily enough if
you look around
On this machine, it’s located in

anaconda3/lib/python3.6/site-packages/numpy

Subpackages
Consider the line x = np.random.randn(100)
Here np refers to the package NumPy, while random is a subpackage of NumPy
You can see the contents here
Subpackages are just packages that are subdirectories of another package

3.4.2 Importing Names Directly

Recall this code that we saw above

In [4]: import numpy as np

np.sqrt(4)

Out[4]: 2.0

Here’s another way to access NumPy’s square root function



In [5]: from numpy import sqrt

sqrt(4)

Out[5]: 2.0

This is also fine


The advantage is less typing if we use sqrt often in our code
The disadvantage is that, in a long program, these two lines might be separated by many
other lines
Then it’s harder for readers to know where sqrt came from, should they wish to
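As a small sketch of this trade-off, the following uses the standard library math module in place of NumPy (an assumption made only to keep the example free of third-party dependencies). It shows both import styles side by side, and one way a reader can trace an imported name back to its source.

```python
import math              # Access names through the module: math.sqrt
from math import sqrt    # Import the name directly: sqrt

a = math.sqrt(4)         # The origin of sqrt is visible at the call site
b = sqrt(4)              # Less typing, but the origin is not visible here

# A function records the name of the module it was defined in
origin = sqrt.__module__

print(a, b, origin)
```

Both calls reach the same function, so the choice between them is about readability rather than behavior.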

3.5 Alternative Versions

Let’s try writing some alternative versions of our first program


Our aim in doing this is to illustrate some more Python syntax and semantics
The programs below are less efficient but

• help us understand basic constructs like loops


• illustrate common data types like lists

3.5.1 A Version with a For Loop

Here’s a version that illustrates loops and Python lists

In [6]: ts_length = 100
ϵ_values = []  # Empty list

for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)

plt.plot(ϵ_values)
plt.show()

In brief,

• The imports from Version 1 (np and plt) are reused, so they are not repeated here
• The first line sets the desired length of the time series
• The next line creates an empty list called ϵ_values that will store the $\epsilon_t$ values as we
generate them
• The next three lines are the for loop, which repeatedly draws a new random number $\epsilon_t$
and appends it to the end of the list ϵ_values
• The last two lines generate the plot and display it to the user

Let’s study some parts of this program in more detail

3.5.2 Lists

Consider the statement ϵ_values = [], which creates an empty list


Lists are a native Python data structure used to group a collection of objects
For example, try

In [7]: x = [10, 'foo', False] # We can include heterogeneous data inside a list
type(x)

Out[7]: list

The first element of x is an integer, the next is a string and the third is a Boolean value
When adding a value to a list, we can use the syntax list_name.append(some_value)

In [8]: x

Out[8]: [10, 'foo', False]

In [9]: x.append(2.5)
x

Out[9]: [10, 'foo', False, 2.5]

Here append() is what’s called a method, which is a function “attached to” an object—in
this case, the list x
We’ll learn all about methods later on, but just to give you some idea,

• Python objects such as lists, strings, etc. all have methods that are used to manipulate
the data contained in the object
• String objects have string methods, list objects have list methods, etc.
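To make this concrete, here is a brief sketch of a few string and list methods. The variable names and values are illustrative only.

```python
s = 'white noise'
s_upper = s.upper()        # String method: returns a new, upper-case string
words = s.split(' ')       # String method: returns a list of substrings

x = [3, 1, 2]
x.sort()                   # List method: sorts the list in place
x.append(10)               # List method: adds an element to the end
position = x.index(10)     # List method: position of the first match

print(s_upper, words, x, position)
```

Note that the string methods return new objects, while sort() and append() change the list x itself.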

Another useful list method is pop()

In [10]: x

Out[10]: [10, 'foo', False, 2.5]

In [11]: x.pop()

Out[11]: 2.5

In [12]: x

Out[12]: [10, 'foo', False]

The full set of list methods can be found here


Following C, C++, Java, etc., lists in Python are zero-based

In [13]: x

Out[13]: [10, 'foo', False]

In [14]: x[0]

Out[14]: 10

In [15]: x[1]

Out[15]: 'foo'

3.5.3 The For Loop

Now let’s consider the for loop from the program above, which was

In [16]: for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)

Python executes the two indented lines ts_length times before moving on
These two lines are called a code block, since they comprise the “block” of code that we
are looping over
Unlike most other languages, Python knows the extent of the code block only from indenta-
tion
In our program, indentation decreases after line ϵ_values.append(e), telling Python that
this line marks the lower limit of the code block
More on indentation below—for now, let’s look at another example of a for loop

In [17]: animals = ['dog', 'cat', 'bird']


for animal in animals:
    print("The plural of " + animal + " is " + animal + "s")

The plural of dog is dogs


The plural of cat is cats
The plural of bird is birds

This example helps to clarify how the for loop works: When we execute a loop of the form

for variable_name in sequence:
    <code block>

The Python interpreter performs the following:

• For each element of the sequence, it “binds” the name variable_name to that ele-
ment and then executes the code block

The sequence object can in fact be a very general object, as we’ll see soon enough
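As a sketch of how general the sequence can be, the loops below step through a string, a list and a range object. The names used here are illustrative.

```python
letters = []
for ch in 'abc':               # A string yields its characters
    letters.append(ch)

total = 0
for value in [10, 20, 30]:     # A list yields its elements
    total = total + value

squares = []
for i in range(4):             # A range object yields 0, 1, 2, 3
    squares.append(i * i)

print(letters, total, squares)
```

In each case the loop variable is bound to one element at a time, exactly as described above.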

3.5.4 Code Blocks and Indentation

In discussing the for loop, we explained that the code blocks being looped over are delimited
by indentation
In fact, in Python, all code blocks (i.e., those occurring inside loops, if clauses, function defi-
nitions, etc.) are delimited by indentation
Thus, unlike most other languages, whitespace in Python code affects the output of the pro-
gram
Once you get used to it, this is a good thing: It

• forces clean, consistent indentation, improving readability


• removes clutter, such as the brackets or end statements used in other languages

On the other hand, it takes a bit of care to get right, so please remember:

• The line before the start of a code block always ends in a colon

– for i in range(10):
– if x > y:
– while x < 100:
– etc., etc.

• All lines in a code block must have the same amount of indentation

• The Python standard is 4 spaces, and that’s what you should use

Tabs vs Spaces
One small “gotcha” here is the mixing of tabs and spaces, which often leads to errors
(Important: Within text files, the internal representation of tabs and spaces is not the same)
You can use your Tab key to insert 4 spaces, but you need to make sure it’s configured to do
so
If you are using a Jupyter notebook you will have no problems here
Also, good text editors will allow you to configure the Tab key to insert spaces instead of tabs
(try searching online)
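To see how indentation changes what a program does, consider the following sketch. The two functions (whose names are made up for this illustration) differ only in the indentation of one line.

```python
def count_in_loop(values):
    count = 0
    for v in values:
        doubled = 2 * v
        count = count + 1    # Inside the loop: runs once per element
    return count

def count_after_loop(values):
    count = 0
    for v in values:
        doubled = 2 * v
    count = count + 1        # Outside the loop: runs once, after the loop ends
    return count

a = count_in_loop([1, 2, 3])
b = count_after_loop([1, 2, 3])
print(a, b)
```

Both versions are valid Python, so no error is raised; only the results differ.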

3.5.5 While Loops

The for loop is the most common technique for iteration in Python
But, for the purpose of illustration, let’s modify the program above to use a while loop in-
stead

In [18]: ts_length = 100
ϵ_values = []
i = 0
while i < ts_length:
    e = np.random.randn()
    ϵ_values.append(e)
    i = i + 1
plt.plot(ϵ_values)
plt.show()

Note that

• the code block for the while loop is again delimited only by indentation
• the statement i = i + 1 can be replaced by i += 1
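A while loop is most natural when the number of iterations is not known in advance. The sketch below keeps drawing standard normals until one exceeds a threshold. It uses the standard library random module rather than NumPy, and the threshold and seed are arbitrary choices for illustration.

```python
import random

random.seed(1234)          # Fix the seed so the run is reproducible
threshold = 1.0

draws = []
e = random.gauss(0, 1)     # One standard normal draw
while e <= threshold:
    draws.append(e)
    e = random.gauss(0, 1)
draws.append(e)            # Record the draw that ended the loop

print(len(draws))
```

Here no counter is needed, because the stopping condition depends on the data rather than on the number of steps taken.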

3.5.6 User-Defined Functions

Now let’s go back to the for loop, but restructure our program to make the logic clearer
To this end, we will break our program into two parts:

1. A user-defined function that generates a list of random variables

2. The main part of the program that

– calls this function to get data

– plots the data

This is accomplished in the next program

In [19]: def generate_data(n):
    ϵ_values = []
    for i in range(n):
        e = np.random.randn()
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100)
plt.plot(data)
plt.show()

Let’s go over this carefully, in case you’re not familiar with functions and how they work
We have defined a function called generate_data() as follows

• def is a Python keyword used to start function definitions


• def generate_data(n): indicates that the function is called generate_data and
that it has a single argument n
• The indented code is a code block called the function body—in this case, it creates an
IID list of random draws using the same logic as before
• The return keyword indicates that ϵ_values is the object that should be returned to
the calling code

This whole function definition is read by the Python interpreter and stored in memory
When the interpreter gets to the expression generate_data(100), it executes the function
body with n set equal to 100
The net result is that the name data is bound to the list ϵ_values returned by the function

3.5.7 Conditions

Our function generate_data() is rather limited


Let’s make it slightly more useful by giving it the ability to return either standard normals or
uniform random variables on (0, 1) as required
This is achieved in the next piece of code

In [20]: def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        if generator_type == 'U':
            e = np.random.uniform(0, 1)
        else:
            e = np.random.randn()
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, 'U')


plt.plot(data)
plt.show()

Hopefully, the syntax of the if/else clause is self-explanatory, with indentation again delimit-
ing the extent of the code blocks
Notes

• We are passing the argument U as a string, which is why we write it as 'U'

• Notice that equality is tested with the == syntax, not =

– For example, the statement a = 10 assigns the name a to the value 10


– The expression a == 10 evaluates to either True or False, depending on the
value of a

Now, there are several ways that we can simplify the code above
For example, we can get rid of the conditionals altogether by just passing the desired generator
type as a function
To understand this, consider the following version

In [21]: def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        e = generator_type()
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, np.random.uniform)


plt.plot(data)
plt.show()

Now, when we call the function generate_data(), we pass np.random.uniform as the
second argument
This object is a function
When the function call generate_data(100, np.random.uniform) is executed,
Python runs the function code block with n equal to 100 and the name generator_type
“bound” to the function np.random.uniform

• While these lines are executed, the names generator_type and
np.random.uniform are “synonyms”, and can be used in identical ways

This principle works more generally—for example, consider the following piece of code

In [22]: max(7, 2, 4) # max() is a built-in Python function

Out[22]: 7

In [23]: m = max
m(7, 2, 4)

Out[23]: 7

Here we created another name for the built-in function max(), which could then be used in
identical ways
In the context of our program, the ability to bind new names to functions means that there is
no problem passing a function as an argument to another function—as we did above
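The same idea can be sketched with the standard library alone. Here random.random plays the role of np.random.uniform, and standard_normal is a helper defined only for this illustration; neither choice comes from the lecture itself.

```python
import random

def generate_data(n, generator_type):
    values = []
    for i in range(n):
        e = generator_type()      # Call whatever function was passed in
        values.append(e)
    return values

def standard_normal():
    return random.gauss(0, 1)     # One standard normal draw

uniform_draws = generate_data(5, random.random)     # Uniform on [0, 1)
normal_draws = generate_data(5, standard_normal)    # Standard normal

print(uniform_draws)
print(normal_draws)
```

Because generate_data only ever calls generator_type(), any function that can be called with no arguments will work as the second argument.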

3.5.8 List Comprehensions

We can also simplify the code for generating the list of random draws considerably by using
something called a list comprehension
List comprehensions are an elegant Python tool for creating lists
Consider the following example, where the list comprehension is on the right-hand side of the
second line

In [24]: animals = ['dog', 'cat', 'bird']


plurals = [animal + 's' for animal in animals]
plurals

Out[24]: ['dogs', 'cats', 'birds']

Here’s another example

In [25]: range(8)

Out[25]: range(0, 8)

In [26]: doubles = [2 * x for x in range(8)]


doubles

Out[26]: [0, 2, 4, 6, 8, 10, 12, 14]
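List comprehensions can also filter as they build. The sketch below, with illustrative names, shows a comprehension with an if clause next to the equivalent for loop.

```python
# Loop version: keep the squares of the even numbers below 10
even_squares_loop = []
for x in range(10):
    if x % 2 == 0:
        even_squares_loop.append(x**2)

# Comprehension version of the same computation
even_squares = [x**2 for x in range(10) if x % 2 == 0]

print(even_squares)
```

The two versions produce the same list; the comprehension simply states the rule in one line.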

With the list comprehension syntax, we can simplify the lines

ϵ_values = []
for i in range(n):
    e = generator_type()
    ϵ_values.append(e)

into

ϵ_values = [generator_type() for i in range(n)]

3.6 Exercises

3.6.1 Exercise 1

Recall that 𝑛! is read as “𝑛 factorial” and defined as 𝑛! = 𝑛 × (𝑛 − 1) × ⋯ × 2 × 1


There are functions to compute this in various modules, but let’s write our own version as an
exercise
In particular, write a function factorial such that factorial(n) returns 𝑛! for any posi-
tive integer 𝑛

3.6.2 Exercise 2

The binomial random variable 𝑌 ∼ 𝐵𝑖𝑛(𝑛, 𝑝) represents the number of successes in 𝑛 binary
trials, where each trial succeeds with probability 𝑝
Without any import besides from numpy.random import uniform, write a function
binomial_rv such that binomial_rv(n, p) generates one draw of 𝑌
Hint: If 𝑈 is uniform on (0, 1) and 𝑝 ∈ (0, 1), then the expression U < p evaluates to True
with probability 𝑝

3.6.3 Exercise 3

Compute an approximation to 𝜋 using Monte Carlo. Use no imports besides

In [27]: import numpy as np

Your hints are as follows:

• If $U$ is a bivariate uniform random variable on the unit square $(0, 1)^2$, then the probability
that $U$ lies in a subset $B$ of $(0, 1)^2$ is equal to the area of $B$
• If $U_1, \ldots, U_n$ are IID copies of $U$, then, as $n$ gets large, the fraction that falls in $B$
converges to the probability of landing in $B$
• For a circle, area = pi * radius^2

3.6.4 Exercise 4

Write a program that prints one realization of the following random device:

• Flip an unbiased coin 10 times


• If 3 consecutive heads occur one or more times within this sequence, pay one dollar
• If not, pay nothing

Use no import besides from numpy.random import uniform

3.6.5 Exercise 5

Your next task is to simulate and plot the correlated time series

$x_{t+1} = \alpha \, x_t + \epsilon_{t+1}$ where $x_0 = 0$ and $t = 0, \ldots, T$

The sequence of shocks $\{\epsilon_t\}$ is assumed to be IID and standard normal


In your solution, restrict your import statements to

In [28]: import numpy as np


import matplotlib.pyplot as plt

Set 𝑇 = 200 and 𝛼 = 0.9



3.6.6 Exercise 6

To do the next exercise, you will need to know how to produce a plot legend
The following example should be sufficient to convey the idea

In [29]: import numpy as np


import matplotlib.pyplot as plt

x = [np.random.randn() for i in range(100)]


plt.plot(x, label="white noise")
plt.legend()
plt.show()

Now, starting with your solution to exercise 5, plot three simulated time series, one for each
of the cases 𝛼 = 0, 𝛼 = 0.8 and 𝛼 = 0.98
In particular, you should produce (modulo randomness) a figure that looks as follows

(The figure nicely illustrates how time series with the same one-step-ahead conditional volatil-
ities, as these three processes have, can have very different unconditional volatilities.)
Use a for loop to step through the 𝛼 values
Important hints:

• If you call the plot() function multiple times before calling show(), all of the lines
you produce will end up on the same figure

– And if you omit the argument 'b-' to the plot function, Matplotlib will automati-
cally select different colors for each line

• The expression 'foo' + str(42) evaluates to 'foo42'

3.7 Solutions

3.7.1 Exercise 1
In [30]: def factorial(n):
    k = 1
    for i in range(n):
        k = k * (i + 1)
    return k

factorial(4)

Out[30]: 24
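One way to check a solution like this is to compare it with math.factorial from the standard library over a range of inputs. A minimal sketch, with the range 1 to 10 chosen arbitrarily:

```python
import math

def factorial(n):
    k = 1
    for i in range(n):
        k = k * (i + 1)
    return k

# Compare with the library version for n = 1, ..., 10
agrees = True
for n in range(1, 11):
    if factorial(n) != math.factorial(n):
        agrees = False

print(agrees)
```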

3.7.2 Exercise 2
In [31]: from numpy.random import uniform

def binomial_rv(n, p):
    count = 0
    for i in range(n):
        U = uniform()
        if U < p:
            count = count + 1    # Or count += 1
    return count

binomial_rv(10, 0.5)

Out[31]: 5
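A quick sanity check on binomial_rv is that the average of many draws should be close to n times p, the mean of the binomial distribution. The sketch below uses random.random from the standard library as a stand-in for numpy.random.uniform; the seed and the number of replications are arbitrary.

```python
import random

def binomial_rv(n, p):
    count = 0
    for i in range(n):
        U = random.random()       # Uniform on [0, 1)
        if U < p:
            count += 1
    return count

random.seed(42)
num_reps = 10000
draws = [binomial_rv(10, 0.5) for i in range(num_reps)]
sample_mean = sum(draws) / num_reps   # Should be close to 10 * 0.5 = 5

print(sample_mean)
```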

3.7.3 Exercise 3

Consider the circle of diameter 1 embedded in the unit square


Let $A$ be its area and let $r = 1/2$ be its radius
If we know $\pi$ then we can compute $A$ via $A = \pi r^2$
But here the point is to compute $\pi$, which we can do by $\pi = A / r^2$
Summary: If we can estimate the area of this circle, then dividing by $r^2 = (1/2)^2 = 1/4$
gives an estimate of $\pi$
We estimate the area by sampling bivariate uniforms and looking at the fraction that falls
into the circle

In [32]: n = 100000

count = 0
for i in range(n):
    u, v = np.random.uniform(), np.random.uniform()
    d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
    if d < 0.5:
        count += 1

area_estimate = count / n

print(area_estimate * 4)  # dividing by radius**2

3.13976

3.7.4 Exercise 4
In [33]: from numpy.random import uniform

payoff = 0
count = 0

for i in range(10):
    U = uniform()
    count = count + 1 if U < 0.5 else 0
    if count == 3:
        payoff = 1

print(payoff)

1

3.7.5 Exercise 5

The figures appear in the browser itself because of the %matplotlib inline line used in Version 1

In [34]: α = 0.9
ts_length = 200
current_x = 0

x_values = []
for i in range(ts_length + 1):
    x_values.append(current_x)
    current_x = α * current_x + np.random.randn()
plt.plot(x_values)
plt.show()

3.7.6 Exercise 6

In [35]: αs = [0.0, 0.8, 0.98]
ts_length = 200

for α in αs:
    x_values = []
    current_x = 0
    for i in range(ts_length):
        x_values.append(current_x)
        current_x = α * current_x + np.random.randn()
    plt.plot(x_values, label=f'α = {α}')
plt.legend()
plt.show()
4

Python Essentials

4.1 Contents

• Data Types 4.2

• Input and Output 4.3

• Iterating 4.4

• Comparisons and Logical Operators 4.5

• More Functions 4.6

• Coding Style and PEP8 4.7

• Exercises 4.8

• Solutions 4.9

In this lecture, we’ll cover features of the language that are essential to reading and writing
Python code

4.2 Data Types

We’ve already met several built-in Python data types, such as strings, integers, floats and
lists
Let’s learn a bit more about them

4.2.1 Primitive Data Types

One simple data type is Boolean values, which can be either True or False

In [1]: x = True
x

Out[1]: True


In the next line of code, the interpreter evaluates the expression on the right of = and binds y
to this value

In [2]: y = 100 < 10


y

Out[2]: False

In [3]: type(y)

Out[3]: bool

In arithmetic expressions, True is converted to 1 and False is converted to 0


This is called Boolean arithmetic and is often useful in programming
Here are some examples

In [4]: x + y

Out[4]: 1

In [5]: x * y

Out[5]: 0

In [6]: True + True

Out[6]: 2

In [7]: bools = [True, True, False, True] # List of Boolean values

sum(bools)

Out[7]: 3
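Boolean arithmetic gives a compact way to count how many elements satisfy a condition. A short sketch with made-up data:

```python
data = [0.5, -1.2, 3.3, 0.0, -0.7]

is_positive = [x > 0 for x in data]          # List of Boolean values
num_positive = sum(is_positive)              # True counts as 1, False as 0
share_positive = num_positive / len(data)    # Fraction of positive entries

print(num_positive, share_positive)
```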

The two most common data types used to represent numbers are integers and floats

In [8]: a, b = 1, 2
c, d = 2.5, 10.0
type(a)

Out[8]: int

In [9]: type(c)

Out[9]: float

Computers distinguish between the two because, while floats are more informative, arithmetic
operations on integers are faster and more accurate
As long as you’re using Python 3.x, division of integers yields floats

In [10]: 1 / 2

Out[10]: 0.5

But be careful! If you’re still using Python 2.x, division of two integers returns only the inte-
ger part
For integer division in Python 3.x use this syntax:

In [11]: 1 // 2

Out[11]: 0
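Integer division pairs naturally with the remainder operator %. The brief sketch below also shows that // rounds toward negative infinity rather than toward zero, which matters for negative numbers.

```python
q = 7 // 2               # Quotient, rounded down
r = 7 % 2                # Remainder
both = divmod(7, 2)      # Built-in function returning (quotient, remainder)
neg = -7 // 2            # Rounds down, toward negative infinity

print(q, r, both, neg)
```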

Complex numbers are another primitive data type in Python

In [12]: x = complex(1, 2)
y = complex(2, 1)
x * y

Out[12]: 5j

4.2.2 Containers

Python has several basic types for storing collections of (possibly heterogeneous) data
We’ve already discussed lists
A related data type is tuples, which are “immutable” lists

In [13]: x = ('a', 'b') # Parentheses instead of the square brackets


x = 'a', 'b' # Or no brackets --- the meaning is identical
x

Out[13]: ('a', 'b')

In [14]: type(x)

Out[14]: tuple

In Python, an object is called immutable if, once created, the object cannot be changed
Conversely, an object is mutable if it can still be altered after creation
Python lists are mutable

In [15]: x = [1, 2]
x[0] = 10
x

Out[15]: [10, 2]

But tuples are not

In [16]: x = (1, 2)
x[0] = 10

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-16-d1b2647f6c81> in <module>
1 x = (1, 2)
----> 2 x[0] = 10

TypeError: 'tuple' object does not support item assignment

We’ll say more about the role of mutable and immutable data a bit later
Tuples (and lists) can be “unpacked” as follows

In [17]: integers = (10, 20, 30)


x, y, z = integers
x

Out[17]: 10

In [18]: y

Out[18]: 20

You’ve actually seen an example of this already


Tuple unpacking is convenient and we’ll use it often
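Two common uses of unpacking, sketched here with illustrative names, are swapping values and receiving several return values from a function.

```python
a, b = 1, 2
a, b = b, a                  # Swap without a temporary variable

def min_and_max(values):
    return min(values), max(values)     # Returns a tuple

lo, hi = min_and_max([3, 9, 4])         # Unpack the returned tuple

print(a, b, lo, hi)
```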
Slice Notation
To access multiple elements of a list or tuple, you can use Python’s slice notation
For example,

In [19]: a = [2, 4, 6, 8]
a[1:]

Out[19]: [4, 6, 8]

In [20]: a[1:3]

Out[20]: [4, 6]

The general rule is that a[m:n] returns n - m elements, starting at a[m]


Negative numbers are also permissible

In [21]: a[-2:] # Last two elements of the list

Out[21]: [6, 8]

The same slice notation works on tuples and strings

In [22]: s = 'foobar'
s[-3:] # Select the last three elements

Out[22]: 'bar'
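Slices also accept an omitted start and an optional step, as in the following sketch.

```python
a = [2, 4, 6, 8]

first_two = a[:2]        # Omitted start: begin at a[0]
every_other = a[::2]     # Step of 2
reversed_a = a[::-1]     # Negative step: a reversed copy

print(first_two, every_other, reversed_a)
```

Each slice returns a new list and leaves a unchanged.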

Sets and Dictionaries


Two other container types we should mention before moving on are sets and dictionaries
Dictionaries are much like lists, except that the items are named instead of numbered

In [23]: d = {'name': 'Frodo', 'age': 33}


type(d)

Out[23]: dict

In [24]: d['age']

Out[24]: 33

The names 'name' and 'age' are called the keys


The objects that the keys are mapped to ('Frodo' and 33) are called the values
Sets are unordered collections without duplicates, and set methods provide the usual set-
theoretic operations

In [25]: s1 = {'a', 'b'}


type(s1)

Out[25]: set

In [26]: s2 = {'b', 'c'}


s1.issubset(s2)

Out[26]: False

In [27]: s1.intersection(s2)

Out[27]: {'b'}

The set() function creates sets from sequences

In [28]: s3 = set(('foo', 'bar', 'foo'))


s3

Out[28]: {'bar', 'foo'}
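A few more operations on these two types are sketched below. The dictionary and sets repeat the toy values used above, and the extra key 'home' is added purely for illustration.

```python
d = {'name': 'Frodo', 'age': 33}
d['home'] = 'Shire'                 # Add a new key-value pair
keys = list(d.keys())               # Keys, in insertion order (Python 3.7+)
has_age = 'age' in d                # Membership is tested against the keys

s1 = {'a', 'b'}
s2 = {'b', 'c'}
union = s1.union(s2)                # Elements in either set
difference = s1.difference(s2)      # Elements in s1 but not in s2

print(keys, has_age, union, difference)
```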

4.3 Input and Output

Let’s briefly review reading and writing to text files, starting with writing

In [29]: f = open('newfile.txt', 'w') # Open 'newfile.txt' for writing


f.write('Testing\n') # Here '\n' means new line
f.write('Testing again')
f.close()

Here

• The built-in function open() creates a file object for writing to


• Both write() and close() are methods of file objects

Where is this file that we’ve created?


Recall that Python maintains a concept of the present working directory (pwd) that can be
located from within Jupyter or IPython via

In [30]: %pwd

Out[30]: '/home/anju/Desktop/lecture-source-py/_build/jupyter/executed'

If a path is not specified, then this is where Python writes to


We can also use Python to read the contents of newfile.txt as follows

In [31]: f = open('newfile.txt', 'r')


out = f.read()
out

Out[31]: 'Testing\nTesting again'

In [32]: print(out)

Testing
Testing again

4.3.1 Paths

Note that if newfile.txt is not in the present working directory then this call to open()
fails
In this case, you can shift the file to the pwd or specify the full path to the file

f = open('insert_full_path_to_file/newfile.txt', 'r')
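A common alternative to calling close() by hand is the with statement, which closes the file automatically when the block ends. The sketch below specifies the full path explicitly and writes to a temporary directory, created with the standard library tempfile module (an assumption made so that the example leaves no files behind).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'newfile.txt')   # Full path to the file

    with open(path, 'w') as f:                    # Closed automatically
        f.write('Testing\n')
        f.write('Testing again')

    with open(path, 'r') as f:
        out = f.read()

is_closed = f.closed       # True: the with block closed the file
print(out)
```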

4.4 Iterating

One of the most important tasks in computing is stepping through a sequence of data and
performing a given action
One of Python’s strengths is its simple, flexible interface to this kind of iteration via the for
loop

4.4.1 Looping over Different Objects

Many Python objects are “iterable”, in the sense that they can be looped over
To give an example, let’s write the file us_cities.txt, which lists US cities and their popula-
tion, to the present working directory

In [33]: %%file us_cities.txt


new york: 8244910
los angeles: 3819702
chicago: 2707120
houston: 2145146
philadelphia: 1536471
phoenix: 1469471
san antonio: 1359758
san diego: 1326179
dallas: 1223229

Overwriting us_cities.txt

Suppose that we want to make the information more readable, by capitalizing names and
adding commas to mark thousands
The program us_cities.py reads the data in and makes the conversion:

In [34]: data_file = open('us_cities.txt', 'r')
for line in data_file:
    city, population = line.split(':')         # Tuple unpacking
    city = city.title()                        # Capitalize city names
    population = f'{int(population):,}'        # Add commas to numbers
    print(city.ljust(15) + population)
data_file.close()

New York 8,244,910


Los Angeles 3,819,702
Chicago 2,707,120
Houston 2,145,146
Philadelphia 1,536,471
Phoenix 1,469,471
San Antonio 1,359,758
San Diego 1,326,179
Dallas 1,223,229

Here the f-string f'{int(population):,}' inserts the population into a string, with the format specifier :, adding commas to mark thousands
The reformatting of each line is the result of three different string methods, the details of
which can be left till later
The interesting part of this program for us is line 2, which shows that

1. The file object data_file is iterable, in the sense that it can be placed to the right of in within
a for loop
2. Iteration steps through each line in the file

This leads to the clean, convenient syntax shown in our program


Many other kinds of objects are iterable, and we’ll discuss some of them later on

4.4.2 Looping without Indices

One thing you might have noticed is that Python tends to favor looping without explicit in-
dexing
For example,

In [35]: x_values = [1, 2, 3] # Some iterable x


for x in x_values:
    print(x * x)

1
4
9

is preferred to

In [36]: for i in range(len(x_values)):
    print(x_values[i] * x_values[i])

1
4
9

When you compare these two alternatives, you can see why the first one is preferred
Python provides some facilities to simplify looping without indices
One is zip(), which is used for stepping through pairs from two sequences
For example, try running the following code

In [37]: countries = ('Japan', 'Korea', 'China')


cities = ('Tokyo', 'Seoul', 'Beijing')
for country, city in zip(countries, cities):
    print(f'The capital of {country} is {city}')

The capital of Japan is Tokyo


The capital of Korea is Seoul
The capital of China is Beijing

The zip() function is also useful for creating dictionaries — for example

In [38]: names = ['Tom', 'John']


marks = ['E', 'F']
dict(zip(names, marks))

Out[38]: {'Tom': 'E', 'John': 'F'}

If we actually need the index from a list, one option is to use enumerate()
To understand what enumerate() does, consider the following example

In [39]: letter_list = ['a', 'b', 'c']


for index, letter in enumerate(letter_list):
    print(f"letter_list[{index}] = '{letter}'")

letter_list[0] = 'a'
letter_list[1] = 'b'
letter_list[2] = 'c'


4.5 Comparisons and Logical Operators

4.5.1 Comparisons

Many different kinds of expressions evaluate to one of the Boolean values (i.e., True or
False)
A common type is comparisons, such as

In [41]: x, y = 1, 2
x < y

Out[41]: True

In [42]: x > y

Out[42]: False

One of the nice features of Python is that we can chain inequalities

In [43]: 1 < 2 < 3

Out[43]: True

In [44]: 1 <= 2 <= 3

Out[44]: True

As we saw earlier, when testing for equality we use ==

In [45]: x = 1 # Assignment
x == 2 # Comparison

Out[45]: False

For “not equal” use !=

In [46]: 1 != 2

Out[46]: True

Note that when testing conditions, we can use any valid Python expression

In [47]: x = 'yes' if 42 else 'no'


x

Out[47]: 'yes'

In [48]: x = 'yes' if [] else 'no'


x

Out[48]: 'no'

What’s going on here?


The rule is:

• Expressions that evaluate to zero, empty sequences or containers (strings, lists, etc.)
and None are all equivalent to False

– for example, [] and () are equivalent to False in an if clause

• All other values are equivalent to True

– for example, 42 is equivalent to True in an if clause
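The built-in function bool() shows how any value is treated in a condition. A short sketch with a handful of sample values:

```python
values = [0, 42, '', 'foo', [], [0], (), None]
truth = [bool(v) for v in values]     # How each value behaves in an if clause

print(truth)
```

Note that [0] counts as True: the list is nonempty, even though its only element is zero.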

4.5.2 Combining Expressions

We can combine expressions using and, or and not


These are the standard logical connectives (conjunction, disjunction and denial)

In [49]: 1 < 2 and 'f' in 'foo'

Out[49]: True

In [50]: 1 < 2 and 'g' in 'foo'

Out[50]: False

In [51]: 1 < 2 or 'g' in 'foo'

Out[51]: True

In [52]: not True

Out[52]: False

In [53]: not not True

Out[53]: True

Remember

• P and Q is True if both are True, else False


• P or Q is False if both are False, else True

4.6 More Functions

Let’s talk a bit more about functions, which are all important for good programming style
Python has a number of built-in functions that are available without import
We have already met some

In [54]: max(19, 20)

Out[54]: 20

In [55]: range(4) # In Python 3 this returns a range object, not a list

Out[55]: range(0, 4)

In [56]: list(range(4)) # Steps through the range object and creates a list

Out[56]: [0, 1, 2, 3]

In [57]: str(22)

Out[57]: '22'

In [58]: type(22)

Out[58]: int

Two more useful built-in functions are any() and all()

In [59]: bools = False, True, True


all(bools) # True if all are True and False otherwise

Out[59]: False

In [60]: any(bools) # False if all are False and True otherwise

Out[60]: True
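These functions combine well with list comprehensions when checking a condition across a collection. A sketch with made-up data:

```python
data = [3, 8, 1, 12]

all_positive = all([x > 0 for x in data])    # Every element exceeds 0
any_large = any([x > 10 for x in data])      # At least one exceeds 10
all_small = all([x < 10 for x in data])      # Fails because of 12

print(all_positive, any_large, all_small)
```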

The full list of Python built-ins is here


Now let’s talk some more about user-defined functions constructed using the keyword def

4.6.1 Why Write Functions?

User-defined functions are important for improving the clarity of your code by

• separating different strands of logic


• facilitating code reuse

(Writing the same thing twice is almost always a bad idea)


The basics of user-defined functions were discussed here

4.6.2 The Flexibility of Python Functions

As we discussed in the previous lecture, Python functions are very flexible


In particular

• Any number of functions can be defined in a given file


• Functions can be (and often are) defined inside other functions
• Any object can be passed to a function as an argument, including other functions
• A function can return any kind of object, including functions

We already gave an example of how straightforward it is to pass a function to a function


Note that a function can have arbitrarily many return statements (including zero)
Execution of the function terminates when the first return is hit, allowing code like the following
example

In [61]: def f(x):
    if x < 0:
        return 'negative'
    return 'nonnegative'

Functions without a return statement automatically return the special Python object None
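The points listed above can be illustrated in a few lines
This is only a sketch, and the names apply_twice, make_adder and no_return are hypothetical helpers introduced here

```python
def apply_twice(g, x):
    """Apply the function g to x two times."""
    return g(g(x))

def make_adder(k):
    """Return a new function that adds k to its argument."""
    def adder(x):          # a function defined inside another function
        return x + k
    return adder

def no_return(x):
    x + 1                  # the result is discarded; nothing is returned

add_three = make_adder(3)            # a function returned by a function
print(apply_twice(add_three, 10))    # a function passed as an argument
print(no_return(1) is None)
```

Here apply_twice(add_three, 10) evaluates to 16, and the call to no_return yields None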

4.6.3 Docstrings

Python has a system for adding comments to functions, modules, etc. called docstrings
The nice thing about docstrings is that they are available at run-time
Try running this

In [62]: def f(x):
    """
    This function squares its argument
    """
    return x**2

After running this code, the docstring is available

In [63]: f?

Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Docstring: This function squares its argument

In [64]: f??

Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Source:
def f(x):
    """
    This function squares its argument
    """
    return x**2

With one question mark we bring up the docstring, and with two we get the source code as
well
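The question mark syntax is specific to IPython and Jupyter
In a plain Python session the docstring can still be reached at run-time, for instance through the __doc__ attribute or the standard library's inspect module, as the following sketch suggests

```python
import inspect

def f(x):
    """
    This function squares its argument
    """
    return x**2

print(repr(f.__doc__))      # raw docstring, including surrounding whitespace
print(inspect.getdoc(f))    # same text with indentation cleaned up
```

The built-in help(f) displays similar information interactively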

4.6.4 One-Line Functions: lambda

The lambda keyword is used to create simple functions on one line


For example, the definitions

In [65]: def f(x):
    return x**3

and

In [66]: f = lambda x: x**3

are entirely equivalent


To see why lambda is useful, suppose that we want to calculate $\int_0^2 x^3 \, dx$ (and have forgotten
our high-school calculus)
The SciPy library has a function called quad that will do this calculation for us
The syntax of the quad function is quad(f, a, b) where f is a function and a and b are
numbers
To create the function $f(x) = x^3$ we can use lambda as follows

In [67]: from scipy.integrate import quad

quad(lambda x: x**3, 0, 2)

Out[67]: (4.0, 4.440892098500626e-14)

Here the function created by lambda is said to be anonymous because it was never given a
name
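The same pattern works with any function that accepts another function as an argument
As a rough illustration that avoids SciPy, here is a hand-written midpoint rule; integrate is a hypothetical helper, not a library function, and is far less accurate and robust than quad

```python
def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# The integrand is passed as an anonymous function
approx = integrate(lambda x: x**3, 0, 2)
print(approx)   # close to the exact value 2**4 / 4 = 4
```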

4.6.5 Keyword Arguments

If you did the exercises in the previous lecture, you would have come across the statement

plt.plot(x, 'b-', label="white noise")



In this call to Matplotlib’s plot function, notice that the last argument is passed in
name=argument syntax
This is called a keyword argument, with label being the keyword
Non-keyword arguments are called positional arguments, since their meaning is determined by
order

• plot(x, 'b-', label="white noise") is different from plot('b-', x, label="white noise")

Keyword arguments are particularly useful when a function has a lot of arguments, in which
case it’s hard to remember the right order
You can adopt keyword arguments in user-defined functions with no difficulty
The next example illustrates the syntax

In [68]: def f(x, a=1, b=1):
    return a + b * x

The keyword argument values we supplied in the definition of f become the default values

In [69]: f(2)

Out[69]: 3

They can be modified as follows

In [70]: f(2, a=4, b=5)

Out[70]: 14
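A few further calls to the same f show how positional and keyword arguments can be mixed
This is a sketch for illustration only

```python
def f(x, a=1, b=1):
    return a + b * x

print(f(2))                # defaults: 1 + 1 * 2
print(f(2, b=5))           # override only b: 1 + 5 * 2
print(f(2, 4, 5))          # positional: a=4, b=5
print(f(b=5, a=4, x=2))    # keywords may appear in any order
```

The four calls return 3, 11, 14 and 14 respectively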

4.7 Coding Style and PEP8

To learn more about the Python programming philosophy type import this at the prompt
Among other things, Python strongly favors consistency in programming style
We’ve all heard the saying about consistency and little minds
In programming, as in mathematics, the opposite is true

• A mathematical paper where the symbols ∪ and ∩ were reversed would be very hard to
read, even if the author told you so on the first page

In Python, the standard style is set out in PEP8


(Occasionally we’ll deviate from PEP8 in these lectures to better match mathematical notation)

4.8 Exercises

Solve the following exercises


(For some, the built-in function sum() comes in handy)

4.8.1 Exercise 1

Part 1: Given two numeric lists or tuples x_vals and y_vals of equal length, compute their
inner product using zip()
Part 2: In one line, count the number of even numbers in 0,…,99

• Hint: x % 2 returns 0 if x is even, 1 otherwise

Part 3: Given pairs = ((2, 5), (4, 2), (9, 8), (12, 10)), count the number of
pairs (a, b) such that both a and b are even

4.8.2 Exercise 2

Consider the polynomial

$$
p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \sum_{i=0}^n a_i x^i \tag{1}
$$

Write a function p such that p(x, coeff) computes the value in Eq. (1) given a point
x and a list of coefficients coeff
Try to use enumerate() in your loop

4.8.3 Exercise 3

Write a function that takes a string as an argument and returns the number of capital letters
in the string
Hint: 'foo'.upper() returns 'FOO'

4.8.4 Exercise 4

Write a function that takes two sequences seq_a and seq_b as arguments and returns True
if every element in seq_a is also an element of seq_b, else False

• By “sequence” we mean a list, a tuple or a string


• Do the exercise without using sets and set methods

4.8.5 Exercise 5

When we cover the numerical libraries, we will see they include many alternatives for interpolation
and function approximation

Nevertheless, let’s write our own function approximation routine as an exercise


In particular, without using any imports, write a function linapprox that takes as arguments

• A function f mapping some interval $[a, b]$ into $\mathbb{R}$
• Two scalars a and b providing the limits of this interval
• An integer n determining the number of grid points
• A number x satisfying a <= x <= b

and returns the piecewise linear interpolation of f at x, based on n evenly spaced grid points
a = point[0] < point[1] < ... < point[n-1] = b
Aim for clarity, not efficiency

4.9 Solutions

4.9.1 Exercise 1

Part 1 Solution:
Here’s one possible solution

In [71]: x_vals = [1, 2, 3]


y_vals = [1, 1, 1]
sum([x * y for x, y in zip(x_vals, y_vals)])

Out[71]: 6

This also works

In [72]: sum(x * y for x, y in zip(x_vals, y_vals))

Out[72]: 6

Part 2 Solution:
One solution is

In [73]: sum([x % 2 == 0 for x in range(100)])

Out[73]: 50

This also works:

In [74]: sum(x % 2 == 0 for x in range(100))

Out[74]: 50

Some less natural alternatives that nonetheless help to illustrate the flexibility of list
comprehensions are

In [75]: len([x for x in range(100) if x % 2 == 0])

Out[75]: 50

and

In [76]: sum([1 for x in range(100) if x % 2 == 0])

Out[76]: 50

Part 3 Solution
Here’s one possibility

In [77]: pairs = ((2, 5), (4, 2), (9, 8), (12, 10))
sum([x % 2 == 0 and y % 2 == 0 for x, y in pairs])

Out[77]: 2

4.9.2 Exercise 2

In [78]: def p(x, coeff):
    return sum(a * x**i for i, a in enumerate(coeff))

In [79]: p(1, (2, 4))

Out[79]: 6

4.9.3 Exercise 3

Here’s one solution:

In [80]: def f(string):
    count = 0
    for letter in string:
        if letter == letter.upper() and letter.isalpha():
            count += 1
    return count

f('The Rain in Spain')

Out[80]: 3

4.9.4 Exercise 4

Here’s a solution:

In [81]: def f(seq_a, seq_b):
    is_subset = True
    for a in seq_a:
        if a not in seq_b:
            is_subset = False
    return is_subset

# == test == #

print(f([1, 2], [1, 2, 3]))
print(f([1, 2, 3], [1, 2]))

True
False

Of course, if we use the sets data type then the solution is easier

In [82]: def f(seq_a, seq_b):
    return set(seq_a).issubset(set(seq_b))

4.9.5 Exercise 5

In [83]: def linapprox(f, a, b, n, x):
    """
    Evaluates the piecewise linear interpolant of f at x on the interval
    [a, b], with n evenly spaced grid points.

    Parameters
    ===========
    f : function
        The function to approximate

    x, a, b : scalars (floats or integers)
        Evaluation point and endpoints, with a <= x <= b

    n : integer
        Number of grid points

    Returns
    =========
    A float. The interpolant evaluated at x

    """
    length_of_interval = b - a
    num_subintervals = n - 1
    step = length_of_interval / num_subintervals

    # === find first grid point larger than x === #
    point = a
    while point <= x:
        point += step

    # === x must lie between the gridpoints (point - step) and point === #
    u, v = point - step, point

    return f(u) + (x - u) * (f(v) - f(u)) / (v - u)


5 OOP I: Introduction to Object Oriented Programming

5.1 Contents

• Overview 5.2

• Objects 5.3

• Summary 5.4

5.2 Overview

OOP is one of the major paradigms in programming


The traditional programming paradigm (think Fortran, C, MATLAB, etc.) is called procedural
It works as follows

• The program has a state corresponding to the values of its variables


• Functions are called to act on these data
• Data are passed back and forth via function calls

In contrast, in the OOP paradigm

• data and functions are “bundled together” into “objects”

(Functions in this context are referred to as methods)

5.2.1 Python and OOP

Python is a pragmatic language that blends object-oriented and procedural styles, rather than
taking a purist approach
However, at a foundational level, Python is object-oriented


In particular, in Python, everything is an object


In this lecture, we explain what that statement means and why it matters

5.3 Objects

In Python, an object is a collection of data and instructions held in computer memory that
consists of

1. a type
2. a unique identity
3. data (i.e., content)
4. methods

These concepts are defined and discussed sequentially below

5.3.1 Type

Python provides for different types of objects, to accommodate different categories of data
For example

In [1]: s = 'This is a string'


type(s)

Out[1]: str

In [2]: x = 42 # Now let's create an integer


type(x)

Out[2]: int

The type of an object matters for many expressions


For example, the addition operator between two strings means concatenation

In [3]: '300' + 'cc'

Out[3]: '300cc'

On the other hand, between two numbers it means ordinary addition

In [4]: 300 + 400

Out[4]: 700

Consider the following expression

In [5]: '300' + 400



---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-5-263a89d2d982> in <module>
----> 1 '300' + 400

TypeError: can only concatenate str (not "int") to str

Here we are mixing types, and it’s unclear to Python whether the user wants to

• convert '300' to an integer and then add it to 400, or


• convert 400 to string and then concatenate it with '300'

Some languages might try to guess but Python is strongly typed

• Type is important, and implicit type conversion is rare


• Python will respond instead by raising a TypeError

To avoid the error, you need to clarify by changing the relevant type
For example,

In [6]: int('300') + 400 # To add as numbers, change the string to an integer

Out[6]: 700
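Both readings of the ambiguous expression can be spelled out explicitly
The sketch below shows the two conversions and, for completeness, catches the TypeError raised by the unconverted version

```python
print(int('300') + 400)     # numeric addition
print('300' + str(400))     # string concatenation
print(float('3.5') + 1)     # strings can also be parsed as floats

try:
    '300' + 400
except TypeError as err:
    print('TypeError:', err)
```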

5.3.2 Identity

In Python, each object has a unique identifier, which helps Python (and us) keep track of the
object
The identity of an object can be obtained via the id() function

In [7]: y = 2.5
z = 2.5
id(y)

Out[7]: 140535456630128

In [8]: id(z)

Out[8]: 140535456630080

In this example, y and z happen to have the same value (i.e., 2.5), but they are not the
same object
In CPython, the standard implementation of Python, the identity of an object is just the address of the object in memory
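The distinction between equal value and same identity is what the operators == and is test, respectively
A small sketch using lists, with variable names chosen here for illustration

```python
a = [1, 2, 3]
b = [1, 2, 3]    # a separate list with equal contents
c = a            # a second name for the same list

print(a == b, a is b)            # equal values, different objects
print(a is c, id(a) == id(c))    # same object, same identity
```

The first line prints True False and the second prints True True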

5.3.3 Object Content: Data and Attributes

If we set x = 42 then we create an object of type int that contains the data 42
In fact, it contains more, as the following example shows

In [9]: x = 42
x

Out[9]: 42

In [10]: x.imag

Out[10]: 0

In [11]: x.__class__

Out[11]: int

When Python creates this integer object, it stores with it various auxiliary information, such
as the imaginary part, and the type
Any name following a dot is called an attribute of the object to the left of the dot

• e.g., imag and __class__ are attributes of x

We see from this example that objects have attributes that contain auxiliary information
They also have attributes that act like functions, called methods
These attributes are important, so let’s discuss them in-depth

5.3.4 Methods

Methods are functions that are bundled with objects


Formally, methods are attributes of objects that are callable (i.e., can be called as functions)

In [12]: x = ['foo', 'bar']


callable(x.append)

Out[12]: True

In [13]: callable(x.__doc__)

Out[13]: False

Methods typically act on the data contained in the object they belong to, or combine that
data with other data

In [14]: x = ['a', 'b']


x.append('c')
s = 'This is a string'
s.upper()

Out[14]: 'THIS IS A STRING'

In [15]: s.lower()

Out[15]: 'this is a string'

In [16]: s.replace('This', 'That')

Out[16]: 'That is a string'

A great deal of Python functionality is organized around method calls


For example, consider the following piece of code

In [17]: x = ['a', 'b']


x[0] = 'aa' # Item assignment using square bracket notation
x

Out[17]: ['aa', 'b']

It doesn’t look like there are any methods used here, but in fact the square bracket assignment
notation is just a convenient interface to a method call
What actually happens is that Python calls the __setitem__ method, as follows

In [18]: x = ['a', 'b']


x.__setitem__(0, 'aa') # Equivalent to x[0] = 'aa'
x

Out[18]: ['aa', 'b']

(If you wanted to you could modify the __setitem__ method, so that square bracket assignment
does something totally different)
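One way to do this is to subclass list and override __setitem__, since the method of the built-in list type itself cannot be reassigned
The class LoudList below is a hypothetical example written for this purpose

```python
class LoudList(list):
    """A list that records every item assignment (illustrative only)."""

    def __init__(self, *args):
        super().__init__(*args)
        self.log = []

    def __setitem__(self, index, value):
        self.log.append((index, value))       # extra behavior
        super().__setitem__(index, value)     # then the usual assignment

x = LoudList(['a', 'b'])
x[0] = 'aa'      # Python calls LoudList.__setitem__(x, 0, 'aa')
print(x, x.log)
```

After the assignment, x holds ['aa', 'b'] and x.log holds [(0, 'aa')]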

5.4 Summary

In Python, everything in memory is treated as an object


This includes not just lists, strings, etc., but also less obvious things, such as

• functions (once they have been read into memory)


• modules (ditto)
• files opened for reading or writing
• integers, etc.

Consider, for example, functions


When Python reads a function definition, it creates a function object and stores it in memory
The following code illustrates

In [19]: def f(x): return x**2


f

Out[19]: <function __main__.f(x)>

In [20]: type(f)

Out[20]: function

In [21]: id(f)

Out[21]: 140535456543336

In [22]: f.__name__

Out[22]: 'f'

We can see that f has type, identity, attributes and so on—just like any other object
It also has methods
One example is the __call__ method, which just evaluates the function

In [23]: f.__call__(3)

Out[23]: 9

Another is the __dir__ method, which returns a list of attributes


Modules loaded into memory are also treated as objects

In [24]: import math

id(math)

Out[24]: 140535632790936

This uniform treatment of data in Python (everything is an object) helps keep the language
simple and consistent
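One practical consequence is that functions can be stored in containers and given attributes like any other object
The following sketch uses names (square, transforms, note) introduced here for illustration

```python
import math

def square(x):
    return x**2

# Functions stored as dictionary values
transforms = {'square': square, 'sqrt': math.sqrt, 'abs': abs}
for name, func in transforms.items():
    print(name, func(16), type(func).__name__)

# An attribute attached to a function object
square.note = 'squares its argument'
print(square.note)
```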
Part II

The Scientific Libraries

6 NumPy

6.1 Contents

• Overview 6.2

• Introduction to NumPy 6.3

• NumPy Arrays 6.4

• Operations on Arrays 6.5

• Additional Functionality 6.6

• Exercises 6.7

• Solutions 6.8

“Let’s be clear: the work of science has nothing whatever to do with consensus.
Consensus is the business of politics. Science, on the contrary, requires only one
investigator who happens to be right, which means that he or she has results that
are verifiable by reference to the real world. In science consensus is irrelevant.
What is relevant is reproducible results.” – Michael Crichton

6.2 Overview

NumPy is a first-rate library for numerical programming

• Widely used in academia, finance and industry


• Mature, fast, stable and under continuous development

In this lecture, we introduce NumPy arrays and the fundamental array processing operations
provided by NumPy

6.2.1 References

• The official NumPy documentation


6.3 Introduction to NumPy

The essential problem that NumPy solves is fast array processing


For example, suppose we want to create an array of 1 million random draws from a uniform
distribution and compute the mean
If we did this in pure Python it would be orders of magnitude slower than C or Fortran
This is because

• Loops in Python over Python data types like lists carry significant overhead
• C and Fortran code contains a lot of type information that can be used for optimization
• Various optimizations can be carried out during compilation when the compiler sees the
instructions as a whole

However, for a task like the one described above, there’s no need to switch back to C or Fortran
Instead, we can use NumPy, where the instructions look like this:

In [1]: import numpy as np

x = np.random.uniform(0, 1, size=1000000)
x.mean()

Out[1]: 0.5004892850074708

The operations of creating the array and computing its mean are both passed out to carefully
optimized machine code compiled from C
More generally, NumPy sends operations in batches to optimized C and Fortran code
This is similar in spirit to MATLAB, which provides an interface to fast Fortran routines
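To make the comparison concrete, here is the same mean computed with an explicit Python loop and with NumPy
This is a sketch only: the sample size is an arbitrary choice, and relative timings (not measured here) will depend on the machine

```python
import random
import numpy as np

n = 100_000
draws = [random.uniform(0, 1) for _ in range(n)]   # pure Python list

# Pure Python: an explicit loop over list elements
total = 0.0
for d in draws:
    total += d
mean_loop = total / n

# NumPy: the same computation delegated to compiled code
mean_np = np.array(draws).mean()

print(mean_loop, mean_np)
```

The two results agree up to floating-point rounding; in IPython, the %timeit magic can be used to compare their speed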

6.3.1 A Comment on Vectorization

NumPy is great for operations that are naturally vectorized


Vectorized operations are precompiled routines that can be sent in batches, like

• matrix multiplication and other linear algebra routines


• generating a vector of random numbers
• applying a fixed transformation (e.g., sine or cosine) to an entire array

In a later lecture, we’ll discuss code that isn’t easy to vectorize and how such routines can
also be optimized

6.4 NumPy Arrays

The most important thing that NumPy defines is an array data type formally called a
numpy.ndarray

NumPy arrays power a large proportion of the scientific Python ecosystem


To create a NumPy array containing only zeros we use np.zeros

In [2]: a = np.zeros(3)
a

Out[2]: array([0., 0., 0.])

In [3]: type(a)

Out[3]: numpy.ndarray

NumPy arrays are somewhat like native Python lists, except that

• Data must be homogeneous (all elements of the same type)


• These types must be one of the data types (dtypes) provided by NumPy

The most important of these dtypes are:

• float64: 64 bit floating-point number


• int64: 64 bit integer
• bool: 8 bit True or False

There are also dtypes to represent complex numbers, unsigned integers, etc
On modern machines, the default dtype for arrays is float64

In [4]: a = np.zeros(3)
type(a[0])

Out[4]: numpy.float64

If we want to use integers we can specify as follows:

In [5]: a = np.zeros(3, dtype=int)


type(a[0])

Out[5]: numpy.int64

6.4.1 Shape and Dimension

Consider the following assignment

In [6]: z = np.zeros(10)

Here z is a flat (one-dimensional) array, neither a row nor a column vector
The dimension is recorded in the shape attribute, which is a tuple

In [7]: z.shape

Out[7]: (10,)

Here the shape tuple has only one element, which is the length of the array (tuples with one
element end with a comma)
To give it dimension, we can change the shape attribute

In [8]: z.shape = (10, 1)


z

Out[8]: array([[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.]])

In [9]: z = np.zeros(4)
z.shape = (2, 2)
z

Out[9]: array([[0., 0.],


[0., 0.]])

In the last case, to make the 2 by 2 array, we could also pass a tuple to the zeros() function,
as in z = np.zeros((2, 2))
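An alternative to assigning to the shape attribute is the reshape method, which returns a reshaped array and leaves the shape of the original unchanged
A brief sketch, with array names chosen for illustration

```python
import numpy as np

z = np.zeros(6)
print(z.ndim, z.shape)       # one dimension, shape (6,)

col = z.reshape(6, 1)        # column: 6 rows, 1 column
row = z.reshape(1, 6)        # row: 1 row, 6 columns
grid = z.reshape(2, 3)
print(col.shape, row.shape, grid.shape)

print(z.shape)               # the original array is still flat
```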

6.4.2 Creating Arrays

As we’ve seen, the np.zeros function creates an array of zeros


You can probably guess what np.ones creates
Related is np.empty, which creates arrays in memory that can later be populated with data

In [10]: z = np.empty(3)
z

Out[10]: array([0., 0., 0.])

The numbers you see here are garbage values


(NumPy allocates 3 contiguous 64 bit pieces of memory, and the existing contents of those
memory slots are interpreted as float64 values)
To set up a grid of evenly spaced numbers use np.linspace

In [11]: z = np.linspace(2, 4, 5) # From 2 to 4, with 5 elements

To create an identity matrix use either np.identity or np.eye

In [12]: z = np.identity(2)
z

Out[12]: array([[1., 0.],


[0., 1.]])

In addition, NumPy arrays can be created from Python lists, tuples, etc. using np.array

In [13]: z = np.array([10, 20]) # ndarray from Python list


z

Out[13]: array([10, 20])

In [14]: type(z)

Out[14]: numpy.ndarray

In [15]: z = np.array((10, 20), dtype=float) # Here 'float' is equivalent to 'np.float64'


z

Out[15]: array([10., 20.])

In [16]: z = np.array([[1, 2], [3, 4]]) # 2D array from a list of lists


z

Out[16]: array([[1, 2],


[3, 4]])

See also np.asarray, which performs a similar function, but does not make a distinct copy
of data already in a NumPy array

In [17]: na = np.linspace(10, 20, 2)


na is np.asarray(na) # Does not copy NumPy arrays

Out[17]: True

In [18]: na is np.array(na) # Does make a new copy --- perhaps unnecessarily

Out[18]: False
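The practical difference shows up when the original array is modified afterwards
In the sketch below, same and copy are illustrative names

```python
import numpy as np

na = np.linspace(10, 20, 2)     # array([10., 20.])
same = np.asarray(na)           # no copy: same object as na
copy = np.array(na)             # new array with its own data

na[0] = -1.0
print(same[0])    # reflects the change, since same is na
print(copy[0])    # still 10.0

# For non-array input, np.asarray has to build a new array
print(type(np.asarray([1, 2, 3])))
```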

To read in the array data from a text file containing numeric data use np.loadtxt or
np.genfromtxt—see the documentation for details

6.4.3 Array Indexing

For a flat array, indexing is the same as Python sequences:

In [19]: z = np.linspace(1, 2, 5)
z

Out[19]: array([1. , 1.25, 1.5 , 1.75, 2. ])

In [20]: z[0]

Out[20]: 1.0

In [21]: z[0:2] # Two elements, starting at element 0

Out[21]: array([1. , 1.25])

In [22]: z[-1]

Out[22]: 2.0

For 2D arrays the index syntax is as follows:

In [23]: z = np.array([[1, 2], [3, 4]])


z

Out[23]: array([[1, 2],


[3, 4]])

In [24]: z[0, 0]

Out[24]: 1

In [25]: z[0, 1]

Out[25]: 2

And so on
Note that indices are still zero-based, to maintain compatibility with Python sequences
Columns and rows can be extracted as follows

In [26]: z[0, :]

Out[26]: array([1, 2])

In [27]: z[:, 1]

Out[27]: array([2, 4])

NumPy arrays of integers can also be used to extract elements

In [28]: z = np.linspace(2, 4, 5)
z

Out[28]: array([2. , 2.5, 3. , 3.5, 4. ])

In [29]: indices = np.array((0, 2, 3))


z[indices]

Out[29]: array([2. , 3. , 3.5])

Finally, an array of dtype bool can be used to extract elements

In [30]: z

Out[30]: array([2. , 2.5, 3. , 3.5, 4. ])

In [31]: d = np.array([0, 1, 1, 0, 0], dtype=bool)


d

Out[31]: array([False, True, True, False, False])

In [32]: z[d]

Out[32]: array([2.5, 3. ])

We’ll see why this is useful below


An aside: all elements of an array can be set equal to one number using slice notation

In [33]: z = np.empty(3)
z

Out[33]: array([2. , 3. , 3.5])

In [34]: z[:] = 42
z

Out[34]: array([42., 42., 42.])

6.4.4 Array Methods

Arrays have useful methods, all of which are carefully optimized

In [35]: a = np.array((4, 3, 2, 1))


a

Out[35]: array([4, 3, 2, 1])

In [36]: a.sort() # Sorts a in place


a

Out[36]: array([1, 2, 3, 4])

In [37]: a.sum() # Sum

Out[37]: 10

In [38]: a.mean() # Mean

Out[38]: 2.5

In [39]: a.max() # Max

Out[39]: 4

In [40]: a.argmax() # Returns the index of the maximal element



Out[40]: 3

In [41]: a.cumsum() # Cumulative sum of the elements of a

Out[41]: array([ 1, 3, 6, 10])

In [42]: a.cumprod() # Cumulative product of the elements of a

Out[42]: array([ 1, 2, 6, 24])

In [43]: a.var() # Variance

Out[43]: 1.25

In [44]: a.std() # Standard deviation

Out[44]: 1.118033988749895

In [45]: a.shape = (2, 2)


a.T # Equivalent to a.transpose()

Out[45]: array([[1, 3],


[2, 4]])

Another method worth knowing is searchsorted()


If z is a nondecreasing array, then z.searchsorted(a) returns the index of the first element
of z that is >= a

In [46]: z = np.linspace(2, 4, 5)
z

Out[46]: array([2. , 2.5, 3. , 3.5, 4. ])

In [47]: z.searchsorted(2.2)

Out[47]: 1
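A few more calls on the same grid help to pin down the behavior at exact matches and beyond the largest element
This is a sketch using the method's default settings

```python
import numpy as np

z = np.linspace(2, 4, 5)    # array([2. , 2.5, 3. , 3.5, 4. ])

i = z.searchsorted(2.2)
print(i, z[i])              # first element >= 2.2 is z[1] = 2.5

print(z.searchsorted(3.0))  # an exact match returns that element's index

print(z.searchsorted(10.0)) # larger than every element: returns len(z)

print(z.searchsorted([2.0, 2.6, 3.9]))  # several query values at once
```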

Many of the methods discussed above have equivalent functions in the NumPy namespace

In [48]: a = np.array((4, 3, 2, 1))

In [49]: np.sum(a)

Out[49]: 10

In [50]: np.mean(a)

Out[50]: 2.5

6.5 Operations on Arrays

6.5.1 Arithmetic Operations

The operators +, -, *, / and ** all act elementwise on arrays

In [51]: a = np.array([1, 2, 3, 4])


b = np.array([5, 6, 7, 8])
a + b

Out[51]: array([ 6, 8, 10, 12])

In [52]: a * b

Out[52]: array([ 5, 12, 21, 32])

We can add a scalar to each element as follows

In [53]: a + 10

Out[53]: array([11, 12, 13, 14])

Scalar multiplication is similar

In [54]: a * 10

Out[54]: array([10, 20, 30, 40])

The two-dimensional arrays follow the same general rules

In [55]: A = np.ones((2, 2))


B = np.ones((2, 2))
A + B

Out[55]: array([[2., 2.],


[2., 2.]])

In [56]: A + 10

Out[56]: array([[11., 11.],


[11., 11.]])

In [57]: A * B

Out[57]: array([[1., 1.],


[1., 1.]])

In particular, A * B is not the matrix product, it is an element-wise product



6.5.2 Matrix Multiplication

With Anaconda’s scientific Python package based around Python 3.5 and above, one can use
the @ symbol for matrix multiplication, as follows:

In [58]: A = np.ones((2, 2))


B = np.ones((2, 2))
A @ B

Out[58]: array([[2., 2.],


[2., 2.]])

(For older versions of Python and NumPy you need to use the np.dot function)
We can also use @ to take the inner product of two flat arrays

In [59]: A = np.array((1, 2))


B = np.array((10, 20))
A @ B

Out[59]: 50

In fact, we can use @ when one element is a Python list or tuple

In [60]: A = np.array(((1, 2), (3, 4)))


A

Out[60]: array([[1, 2],


[3, 4]])

In [61]: A @ (0, 1)

Out[61]: array([2, 4])

Since we are post-multiplying, the tuple is treated as a column vector
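By the same logic, a flat array placed on the left of @ is treated as a row vector
The following sketch contrasts the two cases

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
v = np.array([0, 1])

print(A @ v)    # v acts as a column: picks out the second column of A
print(v @ A)    # v acts as a row: picks out the second row of A
print(v @ v)    # inner product of two flat arrays
```

Here A @ v gives [2 4], v @ A gives [3 4], and v @ v gives 1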

6.5.3 Mutability and Copying Arrays

NumPy arrays are mutable data types, like Python lists


In other words, their contents can be altered (mutated) in memory after initialization
We already saw examples above
Here’s another example:

In [62]: a = np.array([42, 44])


a

Out[62]: array([42, 44])

In [63]: a[-1] = 0 # Change last element to 0


a

Out[63]: array([42, 0])



Mutability leads to the following behavior (which can be shocking to MATLAB programmers…)

In [64]: a = np.random.randn(3)
a

Out[64]: array([ 1.05287718, -0.90366748, -1.51731058])

In [65]: b = a
b[0] = 0.0
a

Out[65]: array([ 0. , -0.90366748, -1.51731058])

What’s happened is that we have changed a by changing b


The name b is bound to the same array object as a and becomes just another reference to it (the Python
assignment model is described in more detail later in the course)
Hence, it has equal rights to make changes to that array
This is in fact the most sensible default behavior!
It means that we pass around only pointers to data, rather than making copies
Making copies is expensive in terms of both speed and memory
Making Copies
It is of course possible to make b an independent copy of a when required
This can be done using np.copy

In [66]: a = np.random.randn(3)
a

Out[66]: array([-0.19842005, 0.08435544, -0.34056112])

In [67]: b = np.copy(a)
b

Out[67]: array([-0.19842005, 0.08435544, -0.34056112])

Now b is an independent copy (called a deep copy)

In [68]: b[:] = 1
b

Out[68]: array([1., 1., 1.])

In [69]: a

Out[69]: array([-0.19842005, 0.08435544, -0.34056112])

Note that the change to b has not affected a
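To check whether two names refer to the same data, one option is the is operator together with np.shares_memory
A short sketch contrasting plain assignment with np.copy

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = a             # another name for the same array
c = np.copy(a)    # independent copy

print(b is a, np.shares_memory(a, b))
print(c is a, np.shares_memory(a, c))

b[0] = 0.0
print(a[0], c[0])    # a sees the change, c does not
```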



6.6 Additional Functionality

Let’s look at some other useful things we can do with NumPy

6.6.1 Vectorized Functions

NumPy provides versions of the standard functions log, exp, sin, etc. that act element-wise
on arrays

In [70]: z = np.array([1, 2, 3])


np.sin(z)

Out[70]: array([0.84147098, 0.90929743, 0.14112001])

This eliminates the need for explicit element-by-element loops such as

In [71]: n = len(z)
y = np.empty(n)
for i in range(n):
    y[i] = np.sin(z[i])

Because they act element-wise on arrays, these functions are called vectorized functions
In NumPy-speak, they are also called ufuncs, which stands for “universal functions”
As we saw above, the usual arithmetic operations (+, *, etc.) also work element-wise, and
combining these with the ufuncs gives a very large set of fast element-wise functions

In [72]: z

Out[72]: array([1, 2, 3])

In [73]: (1 / np.sqrt(2 * np.pi)) * np.exp(- 0.5 * z**2)

Out[73]: array([0.24197072, 0.05399097, 0.00443185])

Not all user-defined functions will act element-wise


For example, passing the function f defined below a NumPy array causes a ValueError

In [74]: def f(x):
    return 1 if x > 0 else 0

The NumPy function np.where provides a vectorized alternative:

In [75]: x = np.random.randn(4)
x

Out[75]: array([ 1.61695912, -0.70388772, 0.17046687, 0.89294672])

In [76]: np.where(x > 0, 1, 0) # Insert 1 if x > 0 true, otherwise 0

Out[76]: array([1, 0, 1, 1])



You can also use np.vectorize to vectorize a given function

In [77]: def f(x): return 1 if x > 0 else 0

f = np.vectorize(f)
f(x) # Passing the same vector x as in the previous example

Out[77]: array([1, 0, 1, 1])

However, this approach doesn’t always obtain the same speed as a more carefully crafted
vectorized function
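The three behaviors discussed in this section can be placed side by side
In the sketch below, f_scalar is an illustrative name for the scalar function, and the input array is fixed rather than random so that the results are reproducible

```python
import numpy as np

def f_scalar(x):
    return 1 if x > 0 else 0

x = np.array([1.5, -0.7, 0.2, 0.0])

# Calling the scalar function on an array fails
try:
    f_scalar(x)
except ValueError as err:
    print('ValueError:', err)

via_where = np.where(x > 0, 1, 0)
via_vectorize = np.vectorize(f_scalar)(x)

print(via_where)
print(via_vectorize)
print(np.array_equal(via_where, via_vectorize))
```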

6.6.2 Comparisons

As a rule, comparisons on arrays are done element-wise

In [78]: z = np.array([2, 3])


y = np.array([2, 3])
z == y

Out[78]: array([ True, True])

In [79]: y[0] = 5
z == y

Out[79]: array([False, True])

In [80]: z != y

Out[80]: array([ True, False])

The situation is similar for >, <, >= and <=


We can also do comparisons against scalars

In [81]: z = np.linspace(0, 10, 5)


z

Out[81]: array([ 0. , 2.5, 5. , 7.5, 10. ])

In [82]: z > 3

Out[82]: array([False, False, True, True, True])

This is particularly useful for conditional extraction

In [83]: b = z > 3
b

Out[83]: array([False, False, True, True, True])

In [84]: z[b]

Out[84]: array([ 5. , 7.5, 10. ])

Of course we can—and frequently do—perform this in one step

In [85]: z[z > 3]

Out[85]: array([ 5. , 7.5, 10. ])



6.6.3 Sub-packages

NumPy provides some additional functionality related to scientific programming through its
sub-packages
We’ve already seen how we can generate random variables using np.random

In [86]: z = np.random.randn(10000) # Generate standard normals


y = np.random.binomial(10, 0.5, size=1000) # 1,000 draws from Bin(10, 0.5)
y.mean()

Out[86]: 5.034

Another commonly used subpackage is np.linalg

In [87]: A = np.array([[1, 2], [3, 4]])

np.linalg.det(A) # Compute the determinant

Out[87]: -2.0000000000000004

In [88]: np.linalg.inv(A) # Compute the inverse

Out[88]: array([[-2. , 1. ],
[ 1.5, -0.5]])

Much of this functionality is also available in SciPy, a collection of modules that are built on
top of NumPy
We’ll cover the SciPy versions in more detail soon
For a comprehensive list of what’s available in NumPy see this documentation

6.7 Exercises

6.7.1 Exercise 1

Consider the polynomial expression

$$
p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N = \sum_{n=0}^N a_n x^n \tag{1}
$$

Earlier, you wrote a simple function p(x, coeff) to evaluate Eq. (1) without considering
efficiency
Now write a new function that does the same job, but uses NumPy arrays and array operations
for its computations, rather than any form of Python loop
(Such functionality is already implemented as np.poly1d, but for the sake of the exercise
don’t use this class)

• Hint: Use np.cumprod()



6.7.2 Exercise 2

Let q be a NumPy array of length n with q.sum() == 1


Suppose that q represents a probability mass function
We wish to generate a discrete random variable $x$ such that $\mathbb{P}\{x = i\} = q_i$
In other words, x takes values in range(len(q)) and x = i with probability q[i]
The standard (inverse transform) algorithm is as follows:

• Divide the unit interval $[0, 1]$ into $n$ subintervals $I_0, I_1, \ldots, I_{n-1}$ such that the length of
$I_i$ is $q_i$
• Draw a uniform random variable $U$ on $[0, 1]$ and return the $i$ such that $U \in I_i$

The probability of drawing $i$ is the length of $I_i$, which is equal to $q_i$


We can implement the algorithm as follows

In [89]: from random import uniform

def sample(q):
    a = 0.0
    U = uniform(0, 1)
    for i in range(len(q)):
        if a < U <= a + q[i]:
            return i
        a = a + q[i]

If you can’t see how this works, try thinking through the flow for a simple example, such as
q = [0.25, 0.75]
It helps to sketch the intervals on paper
Your exercise is to speed it up using NumPy, avoiding explicit loops

• Hint: Use np.searchsorted and np.cumsum

If you can, implement the functionality as a class called DiscreteRV, where

• the data for an instance of the class is the vector of probabilities q
• the class has a draw() method, which returns one draw according to the algorithm described above

If you can, write the method so that draw(k) returns k draws from q

6.7.3 Exercise 3

Recall our earlier discussion of the empirical cumulative distribution function


Your task is to

1. Make the __call__ method more efficient using NumPy
2. Add a method that plots the ECDF over [𝑎, 𝑏], where 𝑎 and 𝑏 are method parameters

6.8 Solutions
In [90]: import matplotlib.pyplot as plt
%matplotlib inline

6.8.1 Exercise 1

This code does the job

In [91]: def p(x, coef):
    X = np.empty(len(coef))
    X[0] = 1
    X[1:] = x
    y = np.cumprod(X)   # y = [1, x, x**2,...]
    return coef @ y

Let’s test it

In [92]: coef = np.ones(3)
print(coef)
print(p(1, coef))
# For comparison (np.poly1d expects the highest-order coefficient first)
q = np.poly1d(np.flip(coef))
print(q(1))

[1. 1. 1.]
3.0
3.0
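
One detail worth checking when comparing against np.poly1d is the ordering of coefficients: np.poly1d expects them from the highest power down to the constant term, whereas p takes them in increasing order
The sketch below repeats p so that it is self-contained and uses asymmetric coefficients (chosen here only for illustration) so that the ordering matters

```python
import numpy as np

def p(x, coef):
    # Evaluate a_0 + a_1 x + ... + a_N x^N without a Python loop
    X = np.empty(len(coef))
    X[0] = 1
    X[1:] = x
    y = np.cumprod(X)              # y = [1, x, x**2, ...]
    return coef @ y

coef = np.array([1.0, 2.0, 3.0])   # represents 1 + 2x + 3x**2
val = p(2.0, coef)                 # 1 + 4 + 12
ref = np.poly1d(coef[::-1])(2.0)   # reverse so the highest power comes first
print(val, ref)
```

Both values should agree; passing coef to np.poly1d without reversing it would instead evaluate 1x**2 + 2x + 3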

6.8.2 Exercise 2

Here’s our first pass at a solution:

In [93]: from numpy import cumsum
from numpy.random import uniform

class DiscreteRV:
    """
    Generates an array of draws from a discrete random variable with vector of
    probabilities given by q.
    """

    def __init__(self, q):
        """
        The argument q is a NumPy array, or array like, nonnegative and sums
        to 1
        """
        self.q = q
        self.Q = cumsum(q)

    def draw(self, k=1):
        """
        Returns k draws from q. For each such draw, the value i is returned
        with probability q[i].
        """
        return self.Q.searchsorted(uniform(0, 1, size=k))

The logic is not obvious, but if you take your time and read it slowly, you will understand
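
To make the logic concrete, here is a small sketch with q = [0.25, 0.75] and a few hand-picked values standing in for the uniform draws (the values in u are assumptions chosen for illustration)
The cumulative sums mark the right endpoints of the intervals, and searchsorted returns the index of the interval each value falls into

```python
import numpy as np

q = np.array([0.25, 0.75])
Q = np.cumsum(q)                         # right endpoints: [0.25, 1.0]

u = np.array([0.1, 0.25, 0.3, 0.99])     # stand-ins for uniform draws
idx = Q.searchsorted(u)                  # first i with u <= Q[i]
print(idx)
```

Values in (0, 0.25] map to 0 and values in (0.25, 1] map to 1, matching the condition a < U <= a + q[i] in the loop-based version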
There is a problem here, however
Suppose that q is altered after an instance of DiscreteRV is created, for example by

In [94]: q = (0.1, 0.9)
d = DiscreteRV(q)
d.q = (0.5, 0.5)

The problem is that Q does not change accordingly, and Q is the data used in the draw
method
To deal with this, one option is to compute Q every time the draw method is called
But this is inefficient relative to computing Q once-off
A better option is to use descriptors
A solution from the quantecon library using descriptors that behaves as we desire can be
found here
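
As a rough illustration of the idea, the sketch below uses a Python property (properties are one kind of descriptor) so that the cumulative sums are recomputed whenever q is assigned
This is an assumption-laden sketch, not the quantecon implementation; the class name DiscreteRVSketch and the private attributes _q and _Q are made up for this example

```python
import numpy as np

class DiscreteRVSketch:
    "Hypothetical variant of DiscreteRV that keeps Q in sync with q"

    def __init__(self, q):
        self.q = q                      # routed through the setter below

    @property
    def q(self):
        return self._q

    @q.setter
    def q(self, val):
        self._q = np.asarray(val, dtype=float)
        self._Q = np.cumsum(self._q)    # recomputed on every assignment to q

    def draw(self, k=1):
        return self._Q.searchsorted(np.random.uniform(0, 1, size=k))

d = DiscreteRVSketch((0.1, 0.9))
d.q = (0.5, 0.5)                        # the cumulative sums now update too
draws = d.draw(1000)
print(d._Q, draws.mean())
```

With this design Q is computed once per change to q rather than once per call to draw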

6.8.3 Exercise 3

An example solution is given below


In essence, we’ve just taken this code from QuantEcon and added in a plot method

In [95]: """
Modifies ecdf.py from QuantEcon to add in a plot method

"""

class ECDF:
    """
    One-dimensional empirical distribution function given a vector of
    observations.

    Parameters
    ----------
    observations : array_like
        An array of observations

    Attributes
    ----------
    observations : array_like
        An array of observations

    """

    def __init__(self, observations):
        self.observations = np.asarray(observations)

    def __call__(self, x):
        """
        Evaluates the ecdf at x

        Parameters
        ----------
        x : scalar(float)
            The x at which the ecdf is evaluated

        Returns
        -------
        scalar(float)
            Fraction of the sample less than or equal to x

        """
        return np.mean(self.observations <= x)

    def plot(self, a=None, b=None):
        """
        Plot the ecdf on the interval [a, b].

        Parameters
        ----------
        a : scalar(float), optional(default=None)
            Lower endpoint of the plot interval
        b : scalar(float), optional(default=None)
            Upper endpoint of the plot interval

        """

        # === choose reasonable interval if [a, b] not specified === #
        if a is None:
            a = self.observations.min() - self.observations.std()
        if b is None:
            b = self.observations.max() + self.observations.std()

        # === generate plot === #
        x_vals = np.linspace(a, b, num=100)
        f = np.vectorize(self.__call__)
        plt.plot(x_vals, f(x_vals))
        plt.show()

Here’s an example of usage

In [96]: X = np.random.randn(1000)
F = ECDF(X)
F.plot()
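
The plot method above relies on np.vectorize, which is convenient but still loops in Python
One possible alternative, sketched here under the assumption that the sample can be sorted once up front, evaluates the ECDF at many points in a single call using np.searchsorted (the function name ecdf_values and the sample data are made up for illustration)

```python
import numpy as np

def ecdf_values(observations, x):
    "Fraction of observations <= x, for scalar or array x (illustrative helper)"
    obs_sorted = np.sort(np.asarray(observations))
    # side='right' counts the observations that are <= each x
    counts = np.searchsorted(obs_sorted, x, side='right')
    return counts / len(obs_sorted)

obs = np.array([0.3, -1.2, 0.3, 2.0, 0.9])
x_vals = np.array([-2.0, 0.3, 1.0, 2.0])
vals = ecdf_values(obs, x_vals)
print(vals)
```

The results agree with np.mean(obs <= x) evaluated point by point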
7

Matplotlib

7.1 Contents

• Overview 7.2

• The APIs 7.3

• More Features 7.4

• Further Reading 7.5

• Exercises 7.6

• Solutions 7.7

7.2 Overview

We’ve already generated quite a few figures in these lectures using Matplotlib
Matplotlib is an outstanding graphics library, designed for scientific computing, with

• high-quality 2D and 3D plots


• output in all the usual formats (PDF, PNG, etc.)
• LaTeX integration
• fine-grained control over all aspects of presentation
• animation, etc.

7.2.1 Matplotlib’s Split Personality

Matplotlib is unusual in that it offers two different interfaces to plotting


One is a simple MATLAB-style API (Application Programming Interface) that was written to
help MATLAB refugees find a ready home
The other is a more “Pythonic” object-oriented API
For reasons described below, we recommend that you use the second API
But first, let’s discuss the difference


7.3 The APIs

7.3.1 The MATLAB-style API

Here’s the kind of easy example you might find in introductory treatments

In [1]: import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

x = np.linspace(0, 10, 200)
y = np.sin(x)

plt.plot(x, y, 'b-', linewidth=2)
plt.show()

This is simple and convenient, but also somewhat limited and un-Pythonic
For example, in the function calls, a lot of objects get created and passed around without
making themselves known to the programmer
Python programmers tend to prefer a more explicit style of programming (run import this
in a code block and look at the second line)
This leads us to the alternative, object-oriented Matplotlib API

7.3.2 The Object-Oriented API

Here’s the code corresponding to the preceding figure using the object-oriented API

In [2]: fig, ax = plt.subplots()
ax.plot(x, y, 'b-', linewidth=2)
plt.show()

Here the call fig, ax = plt.subplots() returns a pair, where

• fig is a Figure instance—like a blank canvas


• ax is an Axes instance (reported as AxesSubplot in older Matplotlib versions), which you can think of as a frame for plotting in

The plot() function is actually a method of ax


While there’s a bit more typing, the more explicit use of objects gives us better control
This will become more clear as we go along
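
As a small illustration of that control, the sketch below keeps references to the objects returned by the calls and modifies them afterwards
The Agg backend is selected only so that the snippet runs as a plain script without a display; that choice, and the labels used, are assumptions for this example

```python
import matplotlib
matplotlib.use("Agg")                # non-interactive backend for scripts
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)
fig, ax = plt.subplots()
line, = ax.plot(x, np.sin(x), 'b-', linewidth=2)   # ax.plot returns a list of lines

# Each object can be inspected and adjusted after it has been created
ax.set_xlabel('x')
ax.set_ylabel('sin(x)')
line.set_linewidth(3)
print(type(fig).__name__, len(ax.lines), line.get_linewidth())
```

Nothing here relies on hidden global state: the figure, the axes and the line are all ordinary Python objects that we hold references to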

7.3.3 Tweaks

Here we’ve changed the line to red and added a legend

In [3]: fig, ax = plt.subplots()
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend()
plt.show()

We’ve also used alpha to make the line slightly transparent—which makes it look smoother
The location of the legend can be changed by replacing ax.legend() with
ax.legend(loc='upper center')

In [4]: fig, ax = plt.subplots()
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend(loc='upper center')
plt.show()

If everything is properly configured, then adding LaTeX is trivial


In [5]: fig, ax = plt.subplots()
ax.plot(x, y, 'r-', linewidth=2, label=r'$y=\sin(x)$', alpha=0.6)
ax.legend(loc='upper center')
plt.show()

Controlling the ticks, adding titles and so on is also straightforward

In [6]: fig, ax = plt.subplots()
ax.plot(x, y, 'r-', linewidth=2, label=r'$y=\sin(x)$', alpha=0.6)
ax.legend(loc='upper center')
ax.set_yticks([-1, 0, 1])
ax.set_title('Test plot')
plt.show()

7.4 More Features

Matplotlib has a huge array of functions and features, which you can discover over time as
you have need for them
We mention just a few

7.4.1 Multiple Plots on One Axis

It’s straightforward to generate multiple plots on the same axes


Here’s an example that randomly generates three normal densities and adds a label with their
mean

In [7]: from scipy.stats import norm
from random import uniform

fig, ax = plt.subplots()
x = np.linspace(-4, 4, 150)
for i in range(3):
    m, s = uniform(-1, 1), uniform(1, 2)
    y = norm.pdf(x, loc=m, scale=s)
    current_label = rf'$\mu = {m:.2}$'
    ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
ax.legend()
plt.show()

7.4.2 Multiple Subplots

Sometimes we want multiple subplots in one figure



Here’s an example that generates 6 histograms

In [8]: num_rows, num_cols = 3, 2
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 12))
for i in range(num_rows):
    for j in range(num_cols):
        m, s = uniform(-1, 1), uniform(1, 2)
        x = norm.rvs(loc=m, scale=s, size=100)
        axes[i, j].hist(x, alpha=0.6, bins=20)
        t = rf'$\mu = {m:.2}, \quad \sigma = {s:.2}$'
        axes[i, j].set(title=t, xticks=[-4, 0, 4], yticks=[])
plt.show()

7.4.3 3D Plots

Matplotlib does a nice job of 3D plots — here is one example

In [9]: from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib import cm

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

xgrid = np.linspace(-3, 3, 50)
ygrid = xgrid
x, y = np.meshgrid(xgrid, ygrid)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x,
                y,
                f(x, y),
                rstride=2, cstride=2,
                cmap=cm.jet,
                alpha=0.7,
                linewidth=0.25)
ax.set_zlim(-0.5, 1.0)
plt.show()

7.4.4 A Customizing Function

Perhaps you will find a set of customizations that you regularly use
Suppose we usually prefer our axes to go through the origin, and to have a grid

Here’s a nice example from Matthew Doty of how the object-oriented API can be used to
build a custom subplots function that implements these changes
Read carefully through the code and see if you can follow what’s going on

In [10]: def subplots():
    "Custom subplots with axes through the origin"
    fig, ax = plt.subplots()

    # Set the axes through the origin
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')

    ax.grid()
    return fig, ax

fig, ax = subplots()  # Call the local version, not plt.subplots()
x = np.linspace(-2, 10, 200)
y = np.sin(x)
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend(loc='lower right')
plt.show()

The custom subplots function

1. calls the standard plt.subplots function internally to generate the fig, ax pair,
2. makes the desired customizations to ax, and
3. passes the fig, ax pair back to the calling code

7.5 Further Reading

• The Matplotlib gallery provides many examples


• A nice Matplotlib tutorial by Nicolas Rougier, Mike Muller and Gael Varoquaux

• mpltools allows easy switching between plot styles


• Seaborn facilitates common statistics plots in Matplotlib

7.6 Exercises

7.6.1 Exercise 1

Plot the function

𝑓(𝑥) = cos(𝜋𝜃𝑥) exp(−𝑥)

over the interval [0, 5] for each 𝜃 in np.linspace(0, 2, 10)


Place all the curves in the same figure
The output should look like this

7.7 Solutions

7.7.1 Exercise 1

Here’s one solution

In [11]: θ_vals = np.linspace(0, 2, 10)
x = np.linspace(0, 5, 200)
fig, ax = plt.subplots()

for θ in θ_vals:
    ax.plot(x, np.cos(np.pi * θ * x) * np.exp(- x))

plt.show()
8

SciPy

8.1 Contents

• SciPy versus NumPy 8.2

• Statistics 8.3

• Roots and Fixed Points 8.4

• Optimization 8.5

• Integration 8.6

• Linear Algebra 8.7

• Exercises 8.8

• Solutions 8.9

SciPy builds on top of NumPy to provide common tools for scientific programming such as

• linear algebra
• numerical integration
• interpolation
• optimization
• distributions and random number generation
• signal processing
• etc., etc

Like NumPy, SciPy is stable, mature and widely used


Many SciPy routines are thin wrappers around industry-standard Fortran libraries such as
LAPACK, BLAS, etc.
It’s not really necessary to “learn” SciPy as a whole
A more common approach is to get some idea of what’s in the library and then look up
documentation as required
In this lecture, we aim only to highlight some useful parts of the package


8.2 SciPy versus NumPy

SciPy is a package that contains various tools that are built on top of NumPy, using its array
data type and related functionality
In fact, when we import SciPy we also get NumPy, as can be seen from the SciPy
initialization file

In [1]: # Import numpy symbols to scipy namespace


import numpy as _num
linalg = None
from numpy import *
from numpy.random import rand, randn
from numpy.fft import fft, ifft
from numpy.lib.scimath import *

__all__ = []
__all__ += _num.__all__
__all__ += ['randn', 'rand', 'fft', 'ifft']

del _num
# Remove the linalg imported from numpy so that the scipy.linalg package can be
# imported.
del linalg
__all__.remove('linalg')

However, it’s more common and better practice to use NumPy functionality explicitly

In [2]: import numpy as np

a = np.identity(3)

What is useful in SciPy is the functionality in its sub-packages

• scipy.optimize, scipy.integrate, scipy.stats, etc.

These sub-packages and their attributes need to be imported separately

In [3]: from scipy.integrate import quad


from scipy.optimize import brentq
# etc

Let’s explore some of the major sub-packages

8.3 Statistics

The scipy.stats subpackage supplies

• numerous random variable objects (densities, cumulative distributions, random
sampling, etc.)
• some estimation procedures
• some statistical tests

8.3.1 Random Variables and Distributions

Recall that numpy.random provides functions for generating random variables

In [4]: np.random.beta(5, 5, size=3)

Out[4]: array([0.46025917, 0.2775525 , 0.25400856])

This generates draws from the distribution below when a, b = 5, 5

$$
f(x; a, b) = \frac{x^{(a - 1)} (1 - x)^{(b - 1)}}{\int_0^1 u^{(a - 1)} (1 - u)^{(b - 1)} \, du}
\qquad (0 \leq x \leq 1) \tag{1}
$$

Sometimes we need access to the density itself, or the cdf, the quantiles, etc.
For this, we can use scipy.stats, which provides all of this functionality as well as random
number generation in a single consistent interface
Here’s an example of usage

In [5]: from scipy.stats import beta
import matplotlib.pyplot as plt
%matplotlib inline

q = beta(5, 5)       # Beta(a, b), with a = b = 5
obs = q.rvs(2000)    # 2000 observations
grid = np.linspace(0.01, 0.99, 100)

fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(obs, bins=40, density=True)
ax.plot(grid, q.pdf(grid), 'k-', linewidth=2)
plt.show()

In this code, we created a so-called rv_frozen object, via the call q = beta(5, 5)
The “frozen” part of the notation implies that q represents a particular distribution with a
particular set of parameters
Once we’ve done so, we can then generate random numbers, evaluate the density, etc., all
from this fixed distribution

In [6]: q.cdf(0.4) # Cumulative distribution function

Out[6]: 0.26656768000000003

In [7]: q.pdf(0.4) # Density function

Out[7]: 2.0901888000000013

In [8]: q.ppf(0.8) # Quantile (inverse cdf) function

Out[8]: 0.6339134834642708

In [9]: q.mean()

Out[9]: 0.5

The general syntax for creating these objects is

identifier = scipy.stats.distribution_name(shape_parameters)

where distribution_name is one of the distribution names in scipy.stats


There are also two keyword arguments, loc and scale, which, following our example above,
are passed as

identifier = scipy.stats.distribution_name(shape_parameters,
loc=c, scale=d)

These transform the original random variable 𝑋 into 𝑌 = 𝑐 + 𝑑𝑋


The methods rvs, pdf, cdf, etc. are transformed accordingly
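
A quick numerical check of this transformation, using the Beta(5, 5) distribution from above with illustrative values c = 2 and d = 3 (these values are assumptions chosen for the example)

```python
from scipy.stats import beta

c, d = 2.0, 3.0
X = beta(5, 5)                       # original random variable
Y = beta(5, 5, loc=c, scale=d)       # Y = c + d X

# Mean shifts and scales: E[Y] = c + d E[X]
print(Y.mean(), c + d * X.mean())

# CDF: P{Y <= c + d x} = P{X <= x}
print(Y.cdf(c + d * 0.4), X.cdf(0.4))

# Density picks up a factor 1/d from the change of variables
print(Y.pdf(c + d * 0.4), X.pdf(0.4) / d)
```

Each printed pair should agree up to floating point error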
Before finishing this section, we note that there is an alternative way of calling the methods
described above
For example, the previous code can be replaced by

In [10]: obs = beta.rvs(5, 5, size=2000)
grid = np.linspace(0.01, 0.99, 100)

fig, ax = plt.subplots()
ax.hist(obs, bins=40, density=True)
ax.plot(grid, beta.pdf(grid, 5, 5), 'k-', linewidth=2)
plt.show()

8.3.2 Other Goodies in scipy.stats

There are a variety of other statistical functions in scipy.stats


For example, scipy.stats.linregress implements simple linear regression

In [11]: from scipy.stats import linregress

x = np.random.randn(200)
y = 2 * x + 0.1 * np.random.randn(200)
gradient, intercept, r_value, p_value, std_err = linregress(x, y)
gradient, intercept

Out[11]: (2.0015196606243273, 0.009718239356687364)
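
The value returned by linregress can also be kept as a single object and its fields accessed by name, which avoids having to remember the order of the tuple
The sketch below uses noise-free data on an assumed line y = 1 + 2x so that the estimates are easy to check

```python
import numpy as np
from scipy.stats import linregress

x = np.linspace(0, 1, 50)
y = 1.0 + 2.0 * x                  # exact line, no noise
result = linregress(x, y)

# Fields are available by name as well as by position
print(result.slope, result.intercept, result.rvalue)
```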

To see the full list, consult the documentation

8.4 Roots and Fixed Points

A root of a real function 𝑓 on [𝑎, 𝑏] is an 𝑥 ∈ [𝑎, 𝑏] such that 𝑓(𝑥) = 0


For example, if we plot the function

$$
f(x) = \sin(4 (x - 1/4)) + x + x^{20} - 1 \tag{2}
$$

with 𝑥 ∈ [0, 1] we get

In [12]: f = lambda x: np.sin(4 * (x - 1/4)) + x + x**20 - 1
x = np.linspace(0, 1, 100)

plt.figure(figsize=(10, 8))
plt.plot(x, f(x))
plt.axhline(ls='--', c='k')
plt.show()

The unique root is approximately 0.408


Let’s consider some numerical techniques for finding roots

8.4.1 Bisection

One of the most common algorithms for numerical root-finding is bisection


To understand the idea, recall the well-known game where

• Player A thinks of a secret number between 1 and 100

• Player B asks if it’s less than 50

– If yes, B asks if it’s less than 25


– If no, B asks if it’s less than 75

And so on
This is bisection
Here’s a fairly simplistic implementation of the algorithm in Python
It works for all sufficiently well-behaved increasing continuous functions with 𝑓(𝑎) < 0 < 𝑓(𝑏)

In [13]: def bisect(f, a, b, tol=10e-5):
    """
    Implements the bisection root finding algorithm, assuming that f is a
    real-valued function on [a, b] satisfying f(a) < 0 < f(b).
    """
    lower, upper = a, b

    while upper - lower > tol:
        middle = 0.5 * (upper + lower)
        # === if root is between lower and middle === #
        if f(middle) > 0:
            lower, upper = lower, middle
        # === if root is between middle and upper === #
        else:
            lower, upper = middle, upper

    return 0.5 * (upper + lower)

In fact, SciPy provides its own bisection function, which we now test using the function 𝑓
defined in Eq. (2)

In [14]: from scipy.optimize import bisect

bisect(f, 0, 1)

Out[14]: 0.4082935042806639

8.4.2 The Newton-Raphson Method

Another very common root-finding algorithm is the Newton-Raphson method


In SciPy this algorithm is implemented by scipy.optimize.newton
Unlike bisection, the Newton-Raphson method uses local slope information
This is a double-edged sword:

• When the function is well-behaved, the Newton-Raphson method is faster than bisection
• When the function is less well-behaved, the Newton-Raphson method might fail

Let’s investigate this using the same function 𝑓, first looking at potential instability

In [15]: from scipy.optimize import newton

newton(f, 0.2) # Start the search at initial condition x = 0.2

Out[15]: 0.40829350427935673

In [16]: newton(f, 0.7) # Start the search at x = 0.7 instead

Out[16]: 0.7001700000000279

The second initial condition leads to failure of convergence
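
To see what "local slope information" means in practice, here is a minimal sketch of the textbook Newton-Raphson update x ← x − f(x)/f′(x) applied to the same function
The derivative f_prime is worked out by hand, and the function name newton_sketch and its tolerances are assumptions for this example (when no derivative is supplied, scipy.optimize.newton uses the secant method instead)

```python
import numpy as np

f = lambda x: np.sin(4 * (x - 1/4)) + x + x**20 - 1
f_prime = lambda x: 4 * np.cos(4 * (x - 1/4)) + 1 + 20 * x**19

def newton_sketch(f, f_prime, x0, tol=1e-12, max_iter=50):
    "Illustrative Newton-Raphson iteration with an analytical derivative"
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)     # uses the slope at the current point
        x = x - step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")

root = newton_sketch(f, f_prime, 0.2)
print(root, f(root))
```

Starting from 0.2 the iterates move quickly towards the root near 0.408; a poor starting point can send the iterates elsewhere, which is why the sketch raises an error after max_iter steps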


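On the other hand, we can compare speeds using IPython’s timeit magic: newton can be much faster than bisect, although timings depend on the SciPy version and machine, and in the run shown here newton was in fact slower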

In [17]: %timeit bisect(f, 0, 1)


62.4 µs ± 4.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [18]: %timeit newton(f, 0.2)

149 µs ± 5.77 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

8.4.3 Hybrid Methods

So far we have seen that the Newton-Raphson method is fast but not robust
The bisection algorithm is robust but relatively slow
This illustrates a general principle

• If you have specific knowledge about your function, you might be able to exploit it to
generate efficiency
• If not, then the algorithm choice involves a trade-off between the speed of convergence
and robustness

In practice, most default algorithms for root-finding, optimization and fixed points use hybrid
methods
These methods typically combine a fast method with a robust method in the following
manner:

1. Attempt to use a fast method
2. Check diagnostics
3. If diagnostics are bad, then switch to a more robust algorithm
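
The three steps above can be sketched as a safeguarded Newton iteration: try a Newton step, check whether it stays inside the current bracket, and fall back to a bisection step if it does not
This is only an illustration of the pattern and not the algorithm used by any SciPy routine; the function name hybrid_root, its arguments and the hand-coded derivative are assumptions

```python
import numpy as np

def hybrid_root(f, f_prime, a, b, tol=1e-10, max_iter=100):
    "Illustrative hybrid root finder, assuming f(a) < 0 < f(b)"
    lower, upper = a, b
    x = 0.5 * (lower + upper)
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        # Keep a bracket around the root, as in bisection
        if fx > 0:
            upper = x
        else:
            lower = x
        # 1. Attempt the fast method (a Newton step)
        slope = f_prime(x)
        candidate = x - fx / slope if slope != 0 else np.nan
        # 2. Check diagnostics: did the step stay inside the bracket?
        if lower < candidate < upper:
            x = candidate
        # 3. If not, switch to the robust method (a bisection step)
        else:
            x = 0.5 * (lower + upper)
    return x

f = lambda x: np.sin(4 * (x - 1/4)) + x + x**20 - 1
f_prime = lambda x: 4 * np.cos(4 * (x - 1/4)) + 1 + 20 * x**19

root = hybrid_root(f, f_prime, 0, 1)
print(root)
```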

In scipy.optimize, the function brentq is such a hybrid method and a good default

In [19]: brentq(f, 0, 1)

Out[19]: 0.40829350427936706

In [20]: %timeit brentq(f, 0, 1)

15.6 µs ± 840 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Here the correct solution is found and, in the runs shown here, brentq is the fastest of the three methods

8.4.4 Multivariate Root-Finding

Use scipy.optimize.fsolve, a wrapper for a hybrid method in MINPACK


See the documentation for details
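
As a small illustration, the sketch below solves a two-equation system chosen for this example: the intersection of the circle x² + y² = 4 with the line x = y
The system and the initial guess are assumptions, not taken from the lecture

```python
import numpy as np
from scipy.optimize import fsolve

def g(v):
    "Residuals of the system; both are zero at a solution"
    x, y = v
    return [x**2 + y**2 - 4, x - y]

sol = fsolve(g, x0=[1.0, 1.0])     # x0 is the initial guess
print(sol, g(sol))
```

From this starting point the solver should settle on the solution in the positive quadrant, where x = y = √2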

8.4.5 Fixed Points

SciPy has a function for finding (scalar) fixed points too

In [21]: from scipy.optimize import fixed_point

fixed_point(lambda x: x**2, 10.0) # 10.0 is an initial guess

Out[21]: array(1.)

If you don’t get good results, you can always switch back to the brentq root finder, since
the fixed point of a function 𝑓 is the root of 𝑔(𝑥) ∶= 𝑥 − 𝑓(𝑥)
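
For the function 𝑓(𝑥) = 𝑥² used above, this reformulation can be sketched as follows
The bracket [0.5, 2] is an assumption chosen so that 𝑔 changes sign on it and so that the other fixed point at 0 is excluded

```python
from scipy.optimize import brentq

f = lambda x: x**2
g = lambda x: x - f(x)          # roots of g are fixed points of f

x_star = brentq(g, 0.5, 2.0)    # g(0.5) > 0 > g(2.0)
print(x_star, f(x_star))
```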

8.5 Optimization

Most numerical packages provide only functions for minimization


Maximization can be performed by recalling that the maximizer of a function 𝑓 on domain 𝐷
is the minimizer of −𝑓 on 𝐷
Minimization is closely related to root-finding: For smooth functions, interior optima
correspond to roots of the first derivative
The speed/robustness trade-off described above is present with numerical optimization too
Unless you have some prior information you can exploit, it’s usually best to use hybrid
methods
For constrained, univariate (i.e., scalar) minimization, a good hybrid option is fminbound

In [22]: from scipy.optimize import fminbound

fminbound(lambda x: x**2, -1, 2) # Search in [-1, 2]

Out[22]: 0.0
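
To maximize rather than minimize, we can hand fminbound the negative of the function, as noted above
The objective below, with its maximum at 𝑥 = 1, is an assumption made up for illustration

```python
from scipy.optimize import fminbound

f = lambda x: 3 - (x - 1)**2                 # maximized at x = 1

# The maximizer of f on [-1, 2] is the minimizer of -f on [-1, 2]
x_max = fminbound(lambda x: -f(x), -1, 2)
print(x_max, f(x_max))
```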

8.5.1 Multivariate Optimization

Multivariate local optimizers include minimize, fmin, fmin_powell, fmin_cg,
fmin_bfgs, and fmin_ncg
Constrained multivariate local optimizers include fmin_l_bfgs_b, fmin_tnc,
fmin_cobyla
See the documentation for details
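
Here is a minimal sketch using minimize with its default settings on a simple quadratic bowl
The objective h and the starting point are assumptions chosen so that the answer, (1, −2), is known in advance

```python
import numpy as np
from scipy.optimize import minimize

def h(v):
    "Quadratic bowl with its minimum at (1, -2)"
    return (v[0] - 1)**2 + (v[1] + 2)**2

res = minimize(h, x0=np.zeros(2))    # x0 is the initial guess
print(res.x, res.success)
```

The returned object bundles the minimizer (res.x) with diagnostics such as res.success and res.message, which are worth checking before trusting the result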

8.6 Integration

Most numerical integration methods work by computing the integral of an approximating
polynomial
The resulting error depends on how well the polynomial fits the integrand, which in turn
depends on how “regular” the integrand is

In SciPy, the relevant module for numerical integration is scipy.integrate


A good default for univariate integration is quad

In [23]: from scipy.integrate import quad

integral, error = quad(lambda x: x**2, 0, 1)


integral

Out[23]: 0.33333333333333337

In fact, quad is an interface to a very standard numerical integration routine in the Fortran
library QUADPACK
By default it uses adaptive Gauss-Kronrod quadrature, while Clenshaw-Curtis quadrature, based on expansion in terms of Chebyshev polynomials, is used for certain weighted integrands
There are other options for univariate integration—a useful one is fixed_quad, which is fast
and hence works well inside for loops
There are also functions for multivariate integration
See the documentation for more details
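
As a brief sketch of these alternatives, the code below applies fixed_quad to the same integrand as above and dblquad to a simple double integral
The integrands, limits and the order n=5 are assumptions picked so that the exact answers (1/3 and 1/4) are known

```python
from scipy.integrate import fixed_quad, dblquad

# Fixed-order Gaussian quadrature; the integrand must accept arrays
val, _ = fixed_quad(lambda x: x**2, 0, 1, n=5)

# Double integral of x * y over the unit square
# Note the argument order of the integrand: inner variable y first
area, err = dblquad(lambda y, x: x * y, 0, 1, lambda x: 0, lambda x: 1)

print(val, area)
```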

8.7 Linear Algebra

We saw that NumPy provides a module for linear algebra called linalg
SciPy also provides a module for linear algebra with the same name
The latter is not an exact superset of the former, but overall it has more functionality
We leave you to investigate the set of available routines
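
As a starting point, here is a short sketch using scipy.linalg to solve a linear system and to compute an LU decomposition, a routine that numpy.linalg does not provide
The matrix A and vector b are assumptions chosen for illustration

```python
import numpy as np
from scipy import linalg

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([1.0, 1.0])

x = linalg.solve(A, b)          # solves A x = b
P, L, U = linalg.lu(A)          # factorization with A = P L U

print(x)
print(np.allclose(P @ L @ U, A))
```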

8.8 Exercises

8.8.1 Exercise 1

Previously we discussed the concept of recursive function calls


Write a recursive implementation of the bisection function described above, which we repeat
here for convenience

In [24]: def bisect(f, a, b, tol=10e-5):
    """
    Implements the bisection root finding algorithm, assuming that f is a
    real-valued function on [a, b] satisfying f(a) < 0 < f(b).
    """
    lower, upper = a, b

    while upper - lower > tol:
        middle = 0.5 * (upper + lower)
        # === if root is between lower and middle === #
        if f(middle) > 0:
            lower, upper = lower, middle
        # === if root is between middle and upper === #
        else:
            lower, upper = middle, upper

    return 0.5 * (upper + lower)


Test it on the function f = lambda x: np.sin(4 * (x - 0.25)) + x + x**20 - 1
discussed above

8.9 Solutions

8.9.1 Exercise 1

Here’s a reasonable solution:

In [25]: def bisect(f, a, b, tol=10e-5):
    """
    Implements the bisection root-finding algorithm, assuming that f is a
    real-valued function on [a, b] satisfying f(a) < 0 < f(b).
    """
    lower, upper = a, b
    if upper - lower < tol:
        return 0.5 * (upper + lower)
    else:
        middle = 0.5 * (upper + lower)
        print(f'Current mid point = {middle}')
        if f(middle) > 0:   # Implies root is between lower and middle
            return bisect(f, lower, middle, tol)   # pass tol along
        else:               # Implies root is between middle and upper
            return bisect(f, middle, upper, tol)

We can test it as follows

In [26]: f = lambda x: np.sin(4 * (x - 0.25)) + x + x**20 - 1
bisect(f, 0, 1)

Current mid point = 0.5
Current mid point = 0.25
Current mid point = 0.375
Current mid point = 0.4375
Current mid point = 0.40625
Current mid point = 0.421875
Current mid point = 0.4140625
Current mid point = 0.41015625
Current mid point = 0.408203125
Current mid point = 0.4091796875
Current mid point = 0.40869140625
Current mid point = 0.408447265625
Current mid point = 0.4083251953125
Current mid point = 0.40826416015625

Out[26]: 0.408294677734375
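
The number of midpoints printed above can be predicted in advance
Each step halves the interval, so after n steps its width is (b − a) / 2ⁿ, and the recursion stops once this falls below tol, which requires n ≥ log₂((b − a) / tol)
The sketch below checks this for the values used here (the variable names are chosen for illustration)

```python
import math

a, b, tol = 0, 1, 10e-5

# Smallest n with (b - a) / 2**n below tol
n_predicted = math.ceil(math.log2((b - a) / tol))

# Direct count: halve the width until it drops below tol
width, n_counted = b - a, 0
while width >= tol:
    width /= 2
    n_counted += 1

print(n_predicted, n_counted)
```

Both counts equal 14, matching the 14 midpoints in the output above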
9

Numba

9.1 Contents

• Overview 9.2

• Where are the Bottlenecks? 9.3

• Vectorization 9.4

• Numba 9.5

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

9.2 Overview

In our lecture on NumPy, we learned one method to improve speed and efficiency in
numerical work
That method, called vectorization, involved sending array processing operations in batch to
efficient low-level code
This clever idea dates back to Matlab, which uses it extensively
Unfortunately, vectorization is limited and has several weaknesses
One weakness is that it is highly memory-intensive
Another problem is that only some algorithms can be vectorized
In the last few years, a new Python library called Numba has appeared that solves many of
these problems
It does so through something called just in time (JIT) compilation
JIT compilation is effective in many numerical settings and can generate extremely fast,
efficient code
It can also do other tricks such as facilitate multithreading (a form of parallelization well
suited to numerical work)


9.2.1 The Need for Speed

To understand what Numba does and why, we need some background knowledge
Let’s start by thinking about higher-level languages, such as Python
These languages are optimized for humans
This means that the programmer can leave many details to the runtime environment

• specifying variable types


• memory allocation/deallocation, etc.

The upside is that, compared to low-level languages, Python is typically faster to write, less
error-prone and easier to debug
The downside is that Python is harder to optimize — that is, turn into fast machine code —
than languages like C or Fortran
Indeed, the standard implementation of Python (called CPython) cannot match the speed of
compiled languages such as C or Fortran
Does that mean that we should just switch to C or Fortran for everything?
The answer is no, no and one hundred times no
High productivity languages should be chosen over high-speed languages for the great
majority of scientific computing tasks
This is because

1. Of any given program, relatively few lines are ever going to be time-critical
2. For those lines of code that are time-critical, we can achieve C-like speed using a
combination of NumPy and Numba

This lecture provides a guide

9.3 Where are the Bottlenecks?

Let’s start by trying to understand why high-level languages like Python are slower than
compiled code

9.3.1 Dynamic Typing

Consider this Python operation

In [2]: a, b = 10, 10
a + b

Out[2]: 20

Even for this simple operation, the Python interpreter has a fair bit of work to do
For example, in the statement a + b, the interpreter has to know which operation to invoke
If a and b are strings, then a + b requires string concatenation

In [3]: a, b = 'foo', 'bar'


a + b

Out[3]: 'foobar'

If a and b are lists, then a + b requires list concatenation

In [4]: a, b = ['foo'], ['bar']


a + b

Out[4]: ['foo', 'bar']

(We say that the operator + is overloaded — its action depends on the type of the objects on
which it acts)
As a result, Python must check the type of the objects and then call the correct operation
This involves substantial overheads
Static Types
Compiled languages avoid these overheads with explicit, static types
For example, consider the following C code, which sums the integers from 1 to 10

#include <stdio.h>

int main(void) {
    int i;
    int sum = 0;
    for (i = 1; i <= 10; i++) {
        sum = sum + i;
    }
    printf("sum = %d\n", sum);
    return 0;
}

The variables i and sum are explicitly declared to be integers


Hence, the meaning of addition here is completely unambiguous

9.3.2 Data Access

Another drag on speed for high-level languages is data access


To illustrate, let’s consider the problem of summing some data — say, a collection of integers
Summing with Compiled Code
In C or Fortran, these integers would typically be stored in an array, which is a simple data
structure for storing homogeneous data
Such an array is stored in a single contiguous block of memory

• In modern computers, memory addresses are allocated to each byte (one byte = 8 bits)
• For example, a 64 bit integer is stored in 8 bytes of memory
• An array of 𝑛 such integers occupies 8𝑛 consecutive memory slots

Moreover, the compiler is made aware of the data type by the programmer

• In this case 64 bit integers

Hence, each successive data point can be accessed by shifting forward in memory space by a
known and fixed amount

• In this case 8 bytes
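
NumPy arrays follow the same layout, which we can inspect directly
The sketch below creates a small array of 64 bit integers (the array itself is an assumption for illustration) and reports the bytes per element, the total bytes, and the step in bytes between successive elements

```python
import numpy as np

a = np.arange(5, dtype=np.int64)

print(a.itemsize)                 # bytes per element
print(a.nbytes)                   # total bytes for the data: 8 * 5
print(a.strides)                  # bytes to step to reach the next element
print(a.flags['C_CONTIGUOUS'])    # stored in one contiguous block
```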

Summing in Pure Python


Python tries to replicate these ideas to some degree
For example, in the standard Python implementation (CPython), list elements are placed in
memory locations that are in a sense contiguous
However, these list elements are more like pointers to data rather than actual data
Hence, there is still overhead involved in accessing the data values themselves
This is a considerable drag on speed
In fact, it’s generally true that memory traffic is a major culprit when it comes to slow
execution
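
One rough way to see the extra memory involved is to compare a Python list of floats with a NumPy array holding the same values
The numbers reported by sys.getsizeof are specific to CPython and to the platform, so treat the sketch below as indicative only

```python
import sys
import numpy as np

n = 1000
py_list = [float(i) for i in range(n)]
np_array = np.arange(n, dtype=np.float64)

# The list stores references; the float objects live elsewhere in memory
reference_bytes = sys.getsizeof(py_list)
object_bytes = sum(sys.getsizeof(v) for v in py_list)

print(reference_bytes + object_bytes)   # list plus the objects it points to
print(np_array.nbytes)                  # raw data in the array: 8 bytes each
```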
Let’s look at some ways around these problems

9.4 Vectorization

Vectorization is about sending batches of related operations to native machine code

• The machine code itself is typically compiled from carefully optimized C or Fortran

This can greatly accelerate many (but not all) numerical computations

9.4.1 Operations on Arrays

First, let’s run some imports

In [5]: import random


import numpy as np
import quantecon as qe

Now let’s try this non-vectorized code

In [6]: qe.util.tic()   # Start timing
n = 100_000
sum = 0
for i in range(n):
    x = random.uniform(0, 1)
    sum += x**2
qe.util.toc()   # End timing

TOC: Elapsed: 0:00:0.04

Out[6]: 0.04178762435913086

Now compare this vectorized code

In [7]: qe.util.tic()
n = 100_000
x = np.random.uniform(0, 1, n)
np.sum(x**2)
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[7]: 0.0038301944732666016

The second code block — which achieves the same thing as the first — runs much faster
The reason is that in the second implementation we have broken the loop down into three
basic operations

1. draw n uniforms
2. square them
3. sum them

These are sent as batch operators to optimized machine code


Apart from minor overheads associated with sending data back and forth, the result is C or
Fortran-like speed
When we run batch operations on arrays like this, we say that the code is vectorized
Vectorized code is typically fast and efficient
It is also surprisingly flexible, in the sense that many operations can be vectorized
The next section illustrates this point

9.4.2 Universal Functions

Many functions provided by NumPy are so-called universal functions — also called ufuncs
This means that they

• map scalars into scalars, as expected


• map arrays into arrays, acting element-wise

For example, np.cos is a ufunc:

In [8]: np.cos(1.0)

Out[8]: 0.5403023058681398

In [9]: np.cos(np.linspace(0, 1, 3))

Out[9]: array([1. , 0.87758256, 0.54030231])

By exploiting ufuncs, many operations can be vectorized


For example, consider the problem of maximizing a function 𝑓 of two variables (𝑥, 𝑦) over the
square [−𝑎, 𝑎] × [−𝑎, 𝑎]
For 𝑓 and 𝑎 let’s choose

$$
f(x, y) = \frac{\cos(x^2 + y^2)}{1 + x^2 + y^2}
\quad \text{and} \quad a = 3
$$

Here’s a plot of 𝑓

In [10]: import matplotlib.pyplot as plt
         %matplotlib inline
         from mpl_toolkits.mplot3d.axes3d import Axes3D
         from matplotlib import cm

         def f(x, y):
             return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

         xgrid = np.linspace(-3, 3, 50)
         ygrid = xgrid
         x, y = np.meshgrid(xgrid, ygrid)

         fig = plt.figure(figsize=(8, 6))
         ax = fig.add_subplot(111, projection='3d')
         ax.plot_surface(x,
                         y,
                         f(x, y),
                         rstride=2, cstride=2,
                         cmap=cm.jet,
                         alpha=0.7,
                         linewidth=0.25)
         ax.set_zlim(-0.5, 1.0)
         plt.show()

To maximize it, we’re going to use a naive grid search:

1. Evaluate 𝑓 for all (𝑥, 𝑦) in a grid on the square


2. Return the maximum of observed values

Here’s a non-vectorized version that uses Python loops

In [11]: def f(x, y):
             return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

         grid = np.linspace(-3, 3, 1000)
         m = -np.inf

         qe.tic()
         for x in grid:
             for y in grid:
                 z = f(x, y)
                 if z > m:
                     m = z

         qe.toc()

TOC: Elapsed: 0:00:2.74

Out[11]: 2.7486989498138428

And here’s a vectorized version

In [12]: def f(x, y):
             return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

         grid = np.linspace(-3, 3, 1000)
         x, y = np.meshgrid(grid, grid)

         qe.tic()
         np.max(f(x, y))
         qe.toc()

TOC: Elapsed: 0:00:0.02

Out[12]: 0.02516627311706543

In the vectorized version, all the looping takes place in compiled code
As you can see, the second version is much faster
(We’ll make it even faster again below when we discuss Numba)
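A grid search can also report where the maximum occurs, not just its value
The sketch below is our addition, assuming the same f and grid as above; np.argmax and np.unravel_index are standard NumPy functions, while the names z, i, j, x_star and y_star are chosen here for illustration

```python
import numpy as np

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

grid = np.linspace(-3, 3, 1000)
x, y = np.meshgrid(grid, grid)

z = f(x, y)                                      # evaluate f on the whole grid at once
i, j = np.unravel_index(np.argmax(z), z.shape)   # row and column of the largest value
x_star, y_star = x[i, j], y[i, j]                # grid point where the maximum is attained

print(z[i, j], x_star, y_star)
```

Since the numerator is at most 1 and the denominator at least 1, the true maximum of f is 1 at the origin, so the grid maximizer should sit next to (0, 0)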

9.4.3 Pros and Cons of Vectorization

At its best, vectorization yields fast, simple code


However, it’s not without disadvantages
One issue is that it can be highly memory-intensive
For example, the vectorized maximization routine above is far more memory intensive than
the non-vectorized version that preceded it
Another issue is that not all algorithms can be vectorized
In these kinds of settings, we need to go back to loops
Fortunately, there are nice ways to speed up Python loops
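To make the memory cost concrete, here is a small sketch (our addition, using the same 1000-point grid as the maximization example) that counts the bytes held by the arrays the vectorized version allocates
The nbytes attribute is standard NumPy; the variable names are illustrative

```python
import numpy as np

grid = np.linspace(-3, 3, 1000)
x, y = np.meshgrid(grid, grid)

# Each meshgrid output is a 1000 x 1000 array of 8-byte floats
print(x.shape, x.dtype)
print(x.nbytes + y.nbytes)   # bytes held by the two input arrays alone

# The loop version only ever needs the 1-D grid plus a few scalars
print(grid.nbytes)
```

Evaluating f(x, y) on these inputs creates further temporary arrays of the same size, which is why the vectorized routine needs far more memory than the loop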

9.5 Numba

One exciting development in this direction is Numba


Numba aims to automatically compile functions to native machine code instructions on the
fly
The process isn’t flawless, since Numba needs to infer type information on all variables to
generate pure machine instructions
Such inference isn’t possible in every setting
But for simple routines, Numba infers types very well
Moreover, the “hot loops” at the heart of our code that we need to speed up are often such
simple routines

9.5.1 Prerequisites

If you followed our set up instructions, then Numba should be installed


Make sure you have the latest version of Anaconda by running conda update anaconda
from a terminal (Mac, Linux) / Anaconda command prompt (Windows)

9.5.2 An Example

Let’s consider some problems that are difficult to vectorize


One is generating the trajectory of a difference equation given an initial condition
Let’s take the difference equation to be the quadratic map

$$x_{t+1} = 4 x_t (1 - x_t)$$

Here’s the plot of a typical trajectory, starting from 𝑥0 = 0.1, with 𝑡 on the x-axis

In [13]: def qm(x0, n):
             x = np.empty(n+1)
             x[0] = x0
             for t in range(n):
                 x[t+1] = 4 * x[t] * (1 - x[t])
             return x

         x = qm(0.1, 250)
         fig, ax = plt.subplots(figsize=(10, 6))
         ax.plot(x, 'b-', lw=2, alpha=0.8)
         ax.set_xlabel('time', fontsize=16)
         plt.show()

Speeding this up with Numba is trivial: we just apply Numba’s jit function

In [14]: from numba import jit

qm_numba = jit(qm) # qm_numba is now a 'compiled' version of qm

Let’s time and compare identical function calls across these two versions:

In [15]: qe.util.tic()
qm(0.1, int(10**5))
time1 = qe.util.toc()

TOC: Elapsed: 0:00:0.06

In [16]: qe.util.tic()
qm_numba(0.1, int(10**5))
time2 = qe.util.toc()

TOC: Elapsed: 0:00:0.11

The first execution is relatively slow because of JIT compilation (see below)
Next time and all subsequent times it runs much faster:

In [17]: qe.util.tic()
qm_numba(0.1, int(10**5))
time2 = qe.util.toc()

TOC: Elapsed: 0:00:0.00

In [18]: time1 / time2 # Calculate speed gain

Out[18]: 174.51294400963275

That’s a speed increase of two orders of magnitude!


Your mileage will of course vary depending on hardware and so on
Nonetheless, two orders of magnitude is huge relative to how simple and clear the
implementation is
Decorator Notation
If you don’t need a separate name for the “numbafied” version of qm, you can just put @jit
before the function

In [19]: @jit
         def qm(x0, n):
             x = np.empty(n+1)
             x[0] = x0
             for t in range(n):
                 x[t+1] = 4 * x[t] * (1 - x[t])
             return x

This is equivalent to qm = jit(qm)
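The equivalence holds for any decorator, not just jit
Here is a minimal pure-Python sketch of the two forms side by side; the decorator announce and the functions square and cube are hypothetical names introduced only for this illustration

```python
def announce(func):
    "Hypothetical decorator: wraps func and counts how often it is called."
    def wrapper(*args):
        wrapper.calls += 1
        return func(*args)
    wrapper.calls = 0
    return wrapper

def square(x):
    return x * x

square = announce(square)    # explicit form: rebind the name to the wrapped function

@announce                    # decorator form: same effect in one line
def cube(x):
    return x * x * x

print(square(3), cube(3))
print(square.calls, cube.calls)
```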

9.5.3 How and When it Works

Numba attempts to generate fast machine code using the infrastructure provided by the
LLVM Project
It does this by inferring type information on the fly
As you can imagine, this is easier for simple Python objects (simple scalar data types, such as
floats, integers, etc.)
Numba also plays well with NumPy arrays, which it treats as typed memory regions

In an ideal setting, Numba can infer all necessary type information


This allows it to generate native machine code, without having to call the Python runtime
environment
In such a setting, Numba will be on par with machine code from low-level languages
When Numba cannot infer all type information, some Python objects are given generic
object status, and some code is generated using the Python runtime
In this second setting, Numba typically provides only minor speed gains — or none at all
Hence, it’s prudent when using Numba to focus on speeding up small, time-critical snippets of
code
This will give you much better performance than blanketing your Python programs with
@jit statements
A Gotcha: Global Variables
Consider the following example

In [20]: a = 1

         @jit
         def add_x(x):
             return a + x

         print(add_x(10))

11

In [21]: a = 2

print(add_x(10))

11

Notice that changing the global had no effect on the value returned by the function
When Numba compiles machine code for functions, it treats global variables as constants to
ensure type stability
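One way to sidestep this gotcha, offered here as our suggestion rather than something stated in the lecture, is to pass the value in as an argument instead of reading it from a global
In the sketch below, the try/except fallback is an assumption added only so that the code still runs where Numba is not installed

```python
try:
    from numba import jit
except ImportError:
    # Fallback (assumption): a do-nothing decorator so the sketch runs without Numba
    def jit(func):
        return func

@jit
def add_x(x, a):
    # a is now an ordinary argument, so each call sees the current value
    return a + x

a = 1
print(add_x(10, a))

a = 2
print(add_x(10, a))
```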

9.5.4 Numba for Vectorization

Numba can also be used to create custom ufuncs with the @vectorize decorator
To illustrate the advantage of using Numba to vectorize a function, we return to a maximiza-
tion problem discussed above

In [22]: from numba import vectorize

         @vectorize
         def f_vec(x, y):
             return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

         grid = np.linspace(-3, 3, 1000)
         x, y = np.meshgrid(grid, grid)

         np.max(f_vec(x, y))  # Run once to compile

         qe.tic()
         np.max(f_vec(x, y))
         qe.toc()

TOC: Elapsed: 0:00:0.03

Out[22]: 0.030055522918701172

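This can be faster than our vectorized version using NumPy’s ufuncs, although in the run shown here the two timings (about 0.030 versus 0.025 seconds) are too close to tell apart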


Why should that be? After all, anything vectorized with NumPy will be running in fast C or
Fortran code
The reason is that it’s much less memory-intensive
For example, when NumPy computes np.cos(x**2 + y**2) it first creates the intermedi-
ate arrays x**2 and y**2, then it creates the array np.cos(x**2 + y**2)
In our @vectorize version using Numba, the entire operator is reduced to a single vector-
ized process and none of these intermediate arrays are created
We can gain further speed improvements using Numba’s automatic parallelization feature by
specifying target='parallel'
In this case, we need to specify the types of our inputs and outputs

In [23]: @vectorize('float64(float64, float64)', target='parallel')
         def f_vec(x, y):
             return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

         np.max(f_vec(x, y))  # Run once to compile

         qe.tic()
         np.max(f_vec(x, y))
         qe.toc()

TOC: Elapsed: 0:00:0.02

Out[23]: 0.023700714111328125

This is a striking speed up with very little effort


10

Other Scientific Libraries

10.1 Contents

• Overview 10.2

• Cython 10.3

• Joblib 10.4

• Other Options 10.5

• Exercises 10.6

• Solutions 10.7

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

10.2 Overview

In this lecture, we review some other scientific libraries that are useful for economic research
and analysis
We have, however, already picked most of the low hanging fruit in terms of economic research
Hence you should feel free to skip this lecture on first pass

10.3 Cython

Like Numba, Cython provides an approach to generating fast compiled code that can be used
from Python
As was the case with Numba, a key problem is the fact that Python is dynamically typed
As you’ll recall, Numba solves this problem (where possible) by inferring type
Cython’s approach is different — programmers add type definitions directly to their “Python”
code


As such, the Cython language can be thought of as Python with type definitions
In addition to a language specification, Cython is also a language translator, transforming
Cython code into optimized C and C++ code
Cython also takes care of building language extensions — the wrapper code that interfaces
between the resulting compiled code and Python
Important Note:
In what follows code is executed in a Jupyter notebook
This is to take advantage of a Cython cell magic that makes Cython particularly easy to use
Some modifications are required to run the code outside a notebook

• See the book Cython by Kurt Smith or the online documentation

10.3.1 A First Example

Let’s start with a rather artificial example


Suppose that we want to compute the sum $\sum_{i=0}^{n} \alpha^i$ for given $\alpha, n$
Suppose further that we’ve forgotten the basic formula

$$\sum_{i=0}^{n} \alpha^i = \frac{1 - \alpha^{n+1}}{1 - \alpha}$$

for a geometric progression and hence have resolved to rely on a loop


Python vs C
Here’s a pure Python function that does the job

In [2]: def geo_prog(alpha, n):
            current = 1.0
            sum = current
            for i in range(n):
                current = current * alpha
                sum = sum + current
            return sum

This works fine but for large 𝑛 it is slow
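As a sanity check on the loop, the following sketch (our addition) compares it with the closed-form expression for a geometric progression
The function names geo_sum_loop and geo_sum_closed and the test values of alpha and n are chosen here for illustration, and the closed form assumes alpha is not equal to 1

```python
def geo_sum_loop(alpha, n):
    "Sum of alpha**i for i = 0, ..., n, computed with a loop."
    current = 1.0
    total = current
    for i in range(n):
        current = current * alpha
        total = total + current
    return total

def geo_sum_closed(alpha, n):
    "Same sum via the closed form (requires alpha != 1)."
    return (1 - alpha**(n + 1)) / (1 - alpha)

alpha, n = 0.99, 1000
print(geo_sum_loop(alpha, n), geo_sum_closed(alpha, n))
```

The two values should agree up to floating point rounding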


Here’s a C function that will do the same thing

double geo_prog(double alpha, int n) {
    double current = 1.0;
    double sum = current;
    int i;
    for (i = 1; i <= n; i++) {
        current = current * alpha;
        sum = sum + current;
    }
    return sum;
}

If you’re not familiar with C, the main thing you should take notice of is the type definitions

• int means integer


• double means double precision floating-point number
• the double in double geo_prog(... indicates that the function will return a double

Not surprisingly, the C code is faster than the Python code


A Cython Implementation
Cython implementations look like a convex combination of Python and C
We’re going to run our Cython code in the Jupyter notebook, so we’ll start by loading the
Cython extension in a notebook cell

In [3]: %load_ext Cython

In the next cell, we execute the following

In [4]: %%cython
        def geo_prog_cython(double alpha, int n):
            cdef double current = 1.0
            cdef double sum = current
            cdef int i
            for i in range(n):
                current = current * alpha
                sum = sum + current
            return sum

Here cdef is a Cython keyword indicating a variable declaration and is followed by a type
The %%cython line at the top is not actually Cython code — it’s a Jupyter cell magic indi-
cating the start of Cython code
After executing the cell, you can now call the function geo_prog_cython from within
Python
What you are in fact calling is compiled C code with a Python call interface

In [5]: import quantecon as qe


qe.util.tic()
geo_prog(0.99, int(10**6))
qe.util.toc()

TOC: Elapsed: 0:00:0.08

Out[5]: 0.0884397029876709

In [6]: qe.util.tic()
geo_prog_cython(0.99, int(10**6))
qe.util.toc()

TOC: Elapsed: 0:00:0.03

Out[6]: 0.03421354293823242

10.3.2 Example 2: Cython with NumPy Arrays

Let’s go back to the first problem that we worked with: generating the iterates of the
quadratic map

$$x_{t+1} = 4 x_t (1 - x_t)$$

The problem of computing iterates and returning a time series requires us to work with ar-
rays
The natural array type to work with is NumPy arrays
Here’s a Cython implementation that initializes, populates and returns a NumPy array

In [7]: %%cython
        import numpy as np

        def qm_cython_first_pass(double x0, int n):
            cdef int t
            x = np.zeros(n+1, float)
            x[0] = x0
            for t in range(n):
                x[t+1] = 4.0 * x[t] * (1 - x[t])
            return np.asarray(x)

If you run this code and time it, you will see that its performance is disappointing — nothing
like the speed gain we got from Numba

In [8]: qe.util.tic()
qm_cython_first_pass(0.1, int(10**5))
qe.util.toc()

TOC: Elapsed: 0:00:0.03

Out[8]: 0.03150629997253418

This example was also computed in the Numba lecture, and you can see Numba is around 90
times faster
The reason is that working with NumPy arrays incurs substantial Python overheads
We can do better by using Cython’s typed memoryviews, which provide more direct access to
arrays in memory
When using them, the first step is to create a NumPy array
Next, we declare a memoryview and bind it to the NumPy array
Here’s an example:

In [9]: %%cython
        import numpy as np
        from numpy cimport float_t

        def qm_cython(double x0, int n):
            cdef int t
            x_np_array = np.zeros(n+1, dtype=float)
            cdef float_t [:] x = x_np_array
            x[0] = x0
            for t in range(n):
                x[t+1] = 4.0 * x[t] * (1 - x[t])
            return np.asarray(x)

Here

• cimport pulls in some compile-time information from NumPy


• cdef float_t [:] x = x_np_array creates a memoryview on the NumPy array
x_np_array
• the return statement uses np.asarray(x) to convert the memoryview back to a
NumPy array

Let’s time it:

In [10]: qe.util.tic()
qm_cython(0.1, int(10**5))
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[10]: 0.0006136894226074219

This is fast, although still slightly slower than qm_numba

10.3.3 Summary

Cython requires more expertise than Numba, and is a little more fiddly in terms of getting
good performance
In fact, it’s surprising how difficult it is to beat the speed improvements provided by Numba
Nonetheless,

• Cython is a very mature, stable and widely used tool


• Cython can be more useful than Numba when working with larger, more sophisticated
applications

10.4 Joblib

Joblib is a popular Python library for caching and parallelization


To install it, start Jupyter and type

In [11]: !pip install joblib

Requirement already satisfied: joblib in /home/anju/anaconda3/lib/python3.7/site-packages (0.13.2)

from within a notebook


Here we review just the basics

10.4.1 Caching

Perhaps, like us, you sometimes run a long computation that simulates a model at a given set
of parameters — to generate a figure, say, or a table
20 minutes later you realize that you want to tweak the figure and now you have to do it all
again
What caching will do is automatically store results at each parameterization
With Joblib, results are compressed and stored on file, and automatically served back up to
you when you repeat the calculation
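The idea of caching can be seen in miniature with the standard library
The sketch below is our addition and swaps in functools.lru_cache, which keeps results in memory for the current session only; it is not Joblib and does not write anything to disk
The function name fraction_below and its threshold argument are illustrative

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fraction_below(x0, n, threshold=0.1):
    "Fraction of a quadratic map trajectory x_0, ..., x_n lying below threshold."
    x, below = x0, 0
    for t in range(n + 1):
        below += x < threshold
        x = 4 * x * (1 - x)
    return below / (n + 1)

first = fraction_below(0.2, 10_000)    # computed by running the loop
second = fraction_below(0.2, 10_000)   # same arguments, so served from the cache

print(first == second)
print(fraction_below.cache_info())     # reports one miss and one hit
```

The key point carries over to Joblib: results are looked up by the arguments of the call, so only a new parameterization triggers a fresh computation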

10.4.2 An Example

Let’s look at a toy example, related to the quadratic map model discussed above
Let’s say we want to generate a long trajectory from a certain initial condition 𝑥0 and see
what fraction of the sample is below 0.1
(We’ll omit JIT compilation or other speedups for simplicity)
Here’s our code

In [12]: from joblib import Memory

         location = './joblib_cache'
         memory = Memory(location=location)

         @memory.cache
         def qm(x0, n):
             x = np.empty(n+1)
             x[0] = x0
             for t in range(n):
                 x[t+1] = 4 * x[t] * (1 - x[t])
             return np.mean(x < 0.1)

We are using joblib to cache the result of calling qm at a given set of parameters
With the argument location='./joblib_cache', any call to this function results in both the
input values and output values being stored in a subdirectory joblib_cache of the present
working directory
(In UNIX shells, . refers to the present working directory)
The first time we call the function with a given set of parameters we see some extra output
that notes information being cached

In [13]: qe.util.tic()
n = int(1e7)
qm(0.2, n)
qe.util.toc()

________________________________________________________________________________
[Memory] Calling __main__--home-anju-Desktop-lecture-source-py-_build-jupyter-executed-__ipython-input__.qm…
qm(0.2, 10000000)
_______________________________________________________________qm - 8.9s, 0.1min
TOC: Elapsed: 0:00:8.85

Out[13]: 8.85545039176941

The next time we call the function with the same set of parameters, the result is returned
almost instantaneously

In [14]: qe.util.tic()
n = int(1e7)
qm(0.2, n)
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[14]: 0.0007827281951904297

10.5 Other Options

There are in fact many other approaches to speeding up your Python code
One is interfacing with Fortran
If you are comfortable writing Fortran you will find it very easy to create extension modules
from Fortran code using F2Py
F2Py is a Fortran-to-Python interface generator that is particularly simple to use
Robert Johansson provides a very nice introduction to F2Py, among other things
Recently, a Jupyter cell magic for Fortran has been developed — you might want to give it a
try

10.6 Exercises

10.6.1 Exercise 1

Later we’ll learn all about finite-state Markov chains


For now, let’s just concentrate on simulating a very simple example of such a chain
Suppose that the volatility of returns on an asset can be in one of two regimes — high or low
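The transition probabilities across states are as follows: from the high state, the chain stays high with probability 0.8 and switches to low with probability 0.2, while from the low state it stays low with probability 0.9 and switches to high with probability 0.1

For example, let the period length be one month, and suppose the current state is high
Then the state next month will be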

• high with probability 0.8


• low with probability 0.2

Your task is to simulate a sequence of monthly volatility states according to this rule
Set the length of the sequence to n = 100000 and start in the high state
Implement a pure Python version, a Numba version and a Cython version, and compare
speeds
To test your code, evaluate the fraction of time that the chain spends in the low state
If your code is correct, it should be about 2/3

10.7 Solutions

10.7.1 Exercise 1

We let

• 0 represent “low”
• 1 represent “high”

In [15]: p, q = 0.1, 0.2 # Prob of leaving low and high state respectively

Here’s a pure Python version of the function

In [16]: def compute_series(n):
             x = np.empty(n, dtype=int)
             x[0] = 1  # Start in state 1
             U = np.random.uniform(0, 1, size=n)
             for t in range(1, n):
                 current_x = x[t-1]
                 if current_x == 0:
                     x[t] = U[t] < p
                 else:
                     x[t] = U[t] > q
             return x

Let’s run this code and check that the fraction of time spent in the low state is about 0.666

In [17]: n = 100000
x = compute_series(n)
print(np.mean(x == 0)) # Fraction of time x is in state 0

0.6629
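The target value of 2/3 can also be checked directly from the transition probabilities, without simulation
The sketch below is our addition: it computes the long-run probability of the low state in closed form and then cross-checks it by pushing the distribution forward from the high state (the names pi_low, low and high are illustrative)

```python
p, q = 0.1, 0.2    # Prob of leaving low and high state respectively

# In the long run the flows between the two states balance:
# pi_low * p = pi_high * q, with pi_low + pi_high = 1
pi_low = q / (p + q)

# Cross-check: start in the high state and update the distribution repeatedly
low, high = 0.0, 1.0
for _ in range(500):
    low, high = low * (1 - p) + high * q, low * p + high * (1 - q)

print(pi_low, low)
```

Both numbers should be close to 2/3, matching the simulated fraction above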

Now let’s time it

In [18]: qe.util.tic()
compute_series(n)
qe.util.toc()

TOC: Elapsed: 0:00:0.07

Out[18]: 0.0751335620880127

Next let’s implement a Numba version, which is easy

In [19]: from numba import jit

compute_series_numba = jit(compute_series)

Let’s check we still get the right numbers

In [20]: x = compute_series_numba(n)
print(np.mean(x == 0))

0.66566

Let’s see the time

In [21]: qe.util.tic()
compute_series_numba(n)
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[21]: 0.0015265941619873047

This is a nice speed improvement for one line of code


Now let’s implement a Cython version

In [22]: %load_ext Cython

The Cython extension is already loaded. To reload it, use:


%reload_ext Cython

In [23]: %%cython
         import numpy as np
         from numpy cimport int_t, float_t

         def compute_series_cy(int n):
             # == Create NumPy arrays first == #
             x_np = np.empty(n, dtype=int)
             U_np = np.random.uniform(0, 1, size=n)
             # == Now create memoryviews of the arrays == #
             cdef int_t [:] x = x_np
             cdef float_t [:] U = U_np
             # == Other variable declarations == #
             cdef float p = 0.1
             cdef float q = 0.2
             cdef int t
             # == Main loop == #
             x[0] = 1
             for t in range(1, n):
                 current_x = x[t-1]
                 if current_x == 0:
                     x[t] = U[t] < p
                 else:
                     x[t] = U[t] > q
             return np.asarray(x)

In [24]: compute_series_cy(10)

Out[24]: array([1, 1, 1, 1, 0, 0, 1, 0, 0, 0])

In [25]: x = compute_series_cy(n)
print(np.mean(x == 0))

0.66746

In [26]: qe.util.tic()
compute_series_cy(n)
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[26]: 0.0033597946166992188

The Cython implementation is fast but not as fast as Numba


Part III

Advanced Python Programming

11

Writing Good Code

11.1 Contents

• Overview 11.2

• An Example of Bad Code 11.3

• Good Coding Practice 11.4

• Revisiting the Example 11.5

• Summary 11.6

11.2 Overview

When computer programs are small, poorly written code is not overly costly
But more data, more sophisticated models, and more computer power are enabling us to take
on more challenging problems that involve writing longer programs
For such programs, investment in good coding practices will pay high returns
The main payoffs are higher productivity and faster code
In this lecture, we review some elements of good coding practice
We also touch on modern developments in scientific computing — such as just in time compi-
lation — and how they affect good program design

11.3 An Example of Bad Code

Let’s have a look at some poorly written code


The job of the code is to generate and plot time series of the simplified Solow model

$$k_{t+1} = s k_t^{\alpha} + (1 - \delta) k_t, \quad t = 0, 1, 2, \ldots \tag{1}$$

Here


• 𝑘𝑡 is capital at time 𝑡 and


• 𝑠, 𝛼, 𝛿 are parameters (savings, a productivity parameter and depreciation)

For each parameterization, the code

1. sets 𝑘0 = 1
2. iterates using Eq. (1) to produce a sequence $k_0, k_1, k_2, \ldots, k_T$
3. plots the sequence

The plots will be grouped into three subfigures


In each subfigure, two parameters are held fixed while another varies

In [1]: import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline

        # Allocate memory for time series
        k = np.empty(50)

        fig, axes = plt.subplots(3, 1, figsize=(12, 15))

        # Trajectories with different α
        δ = 0.1
        s = 0.4
        α = (0.25, 0.33, 0.45)

        for j in range(3):
            k[0] = 1
            for t in range(49):
                k[t+1] = s * k[t]**α[j] + (1 - δ) * k[t]
            axes[0].plot(k, 'o-', label=rf"$\alpha = {α[j]},\; s = {s},\; \delta={δ}$")

        axes[0].grid(lw=0.2)
        axes[0].set_ylim(0, 18)
        axes[0].set_xlabel('time')
        axes[0].set_ylabel('capital')
        axes[0].legend(loc='upper left', frameon=True, fontsize=14)

        # Trajectories with different s
        δ = 0.1
        α = 0.33
        s = (0.3, 0.4, 0.5)

        for j in range(3):
            k[0] = 1
            for t in range(49):
                k[t+1] = s[j] * k[t]**α + (1 - δ) * k[t]
            axes[1].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s[j]},\; \delta={δ}$")

        axes[1].grid(lw=0.2)
        axes[1].set_xlabel('time')
        axes[1].set_ylabel('capital')
        axes[1].set_ylim(0, 18)
        axes[1].legend(loc='upper left', frameon=True, fontsize=14)

        # Trajectories with different δ
        δ = (0.05, 0.1, 0.15)
        α = 0.33
        s = 0.4

        for j in range(3):
            k[0] = 1
            for t in range(49):
                k[t+1] = s * k[t]**α + (1 - δ[j]) * k[t]
            axes[2].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta={δ[j]}$")

        axes[2].set_ylim(0, 18)
        axes[2].set_xlabel('time')
        axes[2].set_ylabel('capital')
        axes[2].grid(lw=0.2)
        axes[2].legend(loc='upper left', frameon=True, fontsize=14)

        plt.show()

True, the code more or less follows PEP8


At the same time, it’s very poorly structured
Let’s talk about why that’s the case, and what we can do about it

11.4 Good Coding Practice

There are usually many different ways to write a program that accomplishes a given task
For small programs, like the one above, the way you write code doesn’t matter too much
But if you are ambitious and want to produce useful things, you’ll write medium to large pro-
grams too
In those settings, coding style matters a great deal
Fortunately, lots of smart people have thought about the best way to write code
Here are some basic precepts

11.4.1 Don’t Use Magic Numbers

If you look at the code above, you’ll see numbers like 50 and 49 and 3 scattered through the
code
These kinds of numeric literals in the body of your code are sometimes called “magic
numbers”
This is not a compliment
While numeric literals are not all evil, the numbers shown in the program above should cer-
tainly be replaced by named constants
For example, the code above could declare the variable time_series_length = 50
Then in the loops, 49 should be replaced by time_series_length - 1
The advantages are:

• the meaning is much clearer throughout


• to alter the time series length, you only need to change one value
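Here is a minimal sketch of the idea, using the Solow iteration from the example above with one set of its parameter values
It uses a plain Python list rather than a NumPy array to stay self-contained; time_series_length is the name suggested in the text, and the rest of the layout is our own

```python
time_series_length = 50      # the only place the length is specified
s, α, δ = 0.4, 0.33, 0.1     # parameter values taken from the example above

k = [1.0]                    # k_0 = 1
for t in range(time_series_length - 1):
    k.append(s * k[t]**α + (1 - δ) * k[t])

print(len(k), k[-1])
```

Changing the length of the series now means editing a single line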

11.4.2 Don’t Repeat Yourself

The other mortal sin in the code snippet above is repetition


Blocks of logic (such as the loop to generate time series) are repeated with only minor
changes
This violates a fundamental tenet of programming: Don’t repeat yourself (DRY)

• Also called DIE (duplication is evil)

Yes, we realize that you can just cut and paste and change a few symbols
But as a programmer, your aim should be to automate repetition, not do it yourself
More importantly, repeating the same logic in different places means that eventually one of
them will likely be wrong
If you want to know more, read the excellent summary found on this page
We’ll talk about how to avoid repetition below

11.4.3 Minimize Global Variables

Sure, global variables (i.e., names assigned to values outside of any function or class) are con-
venient
Rookie programmers typically use global variables with abandon — as we once did ourselves
But global variables are dangerous, especially in medium to large size programs, since

• they can affect what happens in any part of your program


• they can be changed by any function

This makes it much harder to be certain about what some small part of a given piece of code
actually commands
Here’s a useful discussion on the topic
While the odd global in small scripts is no big deal, we recommend that you teach yourself to
avoid them
(We’ll discuss how just below)
JIT Compilation
In fact, there’s now another good reason to avoid global variables
In scientific computing, we’re witnessing the rapid growth of just in time (JIT) compilation
JIT compilation can generate excellent performance for scripting languages like Python and
Julia
But the task of the compiler used for JIT compilation becomes much harder when many
global variables are present
(This is because data type instability hinders the generation of efficient machine code — we’ll
learn more about such topics later on)

11.4.4 Use Functions or Classes

Fortunately, we can easily avoid the evils of global variables and WET code

• WET stands for “we love typing” and is the opposite of DRY

We can do this by making frequent use of functions or classes


In fact, functions and classes are designed specifically to help us avoid shaming ourselves by
repeating code or excessive use of global variables
Which One, Functions or Classes?
Both can be useful, and in fact they work well with each other
We’ll learn more about these topics over time
(Personal preference is part of the story too)
What’s really important is that you use one or the other or both

11.5 Revisiting the Example

Here’s some code that reproduces the plot above with better coding style
It uses a function to avoid repetition
Note also that

• global variables are quarantined by collecting them together at the end, not the start, of
the program
• magic numbers are avoided
• the loop at the end where the actual work is done is short and relatively simple

In [2]: from itertools import product

        def plot_path(ax, αs, s_vals, δs, series_length=50):
            """
            Add a time series plot to the axes ax for all given parameters.
            """
            k = np.empty(series_length)

            for (α, s, δ) in product(αs, s_vals, δs):
                k[0] = 1
                for t in range(series_length-1):
                    k[t+1] = s * k[t]**α + (1 - δ) * k[t]
                ax.plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta = {δ}$")

            ax.grid(lw=0.2)
            ax.set_xlabel('time')
            ax.set_ylabel('capital')
            ax.set_ylim(0, 18)
            ax.legend(loc='upper left', frameon=True, fontsize=14)

        fig, axes = plt.subplots(3, 1, figsize=(12, 15))

        # Parameters (αs, s_vals, δs)
        set_one = ([0.25, 0.33, 0.45], [0.4], [0.1])
        set_two = ([0.33], [0.3, 0.4, 0.5], [0.1])
        set_three = ([0.33], [0.4], [0.05, 0.1, 0.15])

        for (ax, params) in zip(axes, (set_one, set_two, set_three)):
            αs, s_vals, δs = params
            plot_path(ax, αs, s_vals, δs)

        plt.show()

11.6 Summary

Writing decent code isn’t hard


It’s also fun and intellectually satisfying
We recommend that you cultivate good habits and style even when you write relatively short
programs
12

OOP II: Building Classes

12.1 Contents

• Overview 12.2

• OOP Review 12.3

• Defining Your Own Classes 12.4

• Special Methods 12.5

• Exercises 12.6

• Solutions 12.7

12.2 Overview

In an earlier lecture, we learned some foundations of object-oriented programming


The objectives of this lecture are

• cover OOP in more depth


• learn how to build our own objects, specialized to our needs

For example, you already know how to

• create lists, strings and other Python objects


• use their methods to modify their contents

So imagine now you want to write a program with consumers, who can

• hold and spend cash


• consume goods
• work and earn cash

A natural solution in Python would be to create consumers as objects with


• data, such as cash on hand


• methods, such as buy or work that affect this data

Python makes it easy to do this, by providing you with class definitions


Classes are blueprints that help you build objects according to your own specifications
It takes a little while to get used to the syntax so we’ll provide plenty of examples

12.3 OOP Review

OOP is supported in many languages:

• Java and Ruby are relatively pure OOP
• Python supports both procedural and object-oriented programming
• Fortran and MATLAB are mainly procedural, with some OOP recently tacked on
• C is a procedural language, while C++ is C with OOP added on top

Let’s cover general OOP concepts before we specialize to Python

12.3.1 Key Concepts

As discussed in an earlier lecture, in the OOP paradigm, data and functions are bundled
together into “objects”
An example is a Python list, which not only stores data but also knows how to sort itself, etc.

In [1]: x = [1, 5, 4]
x.sort()
x

Out[1]: [1, 4, 5]

As we now know, sort is a function that is “part of” the list object — and hence called a
method
If we want to make our own types of objects we need to use class definitions
A class definition is a blueprint for a particular class of objects (e.g., lists, strings or complex
numbers)
It describes

• What kind of data the class stores


• What methods it has for acting on these data

An object or instance is a realization of the class, created from the blueprint

• Each instance has its own unique data


• Methods set out in the class definition act on this (and other) data

In Python, the data and methods of an object are collectively referred to as attributes
Attributes are accessed via “dotted attribute notation”

• object_name.data
• object_name.method_name()

In the example

In [2]: x = [1, 5, 4]
x.sort()
x.__class__

Out[2]: list

• x is an object or instance, created from the definition for Python lists, but with its own
particular data
• x.sort() and x.__class__ are two attributes of x
• dir(x) can be used to view all the attributes of x
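As a quick illustration of these points (our addition), the sketch below lists the ordinary methods of a list and then looks one of them up by name
dir and getattr are Python built-ins; the names public and sorter are chosen here

```python
x = [1, 5, 4]

# Names without a leading underscore are the list's ordinary methods
public = [name for name in dir(x) if not name.startswith('_')]
print(public)

# Methods are themselves attributes, so they can be looked up by name
sorter = getattr(x, 'sort')
sorter()
print(x)
```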

12.3.2 Why is OOP Useful?

OOP is useful for the same reason that abstraction is useful: for recognizing and exploiting
the common structure
For example,

• a Markov chain consists of a set of states and a collection of transition probabilities for
moving across states
• a general equilibrium theory consists of a commodity space, preferences, technologies,
and an equilibrium definition
• a game consists of a list of players, lists of actions available to each player, player pay-
offs as functions of all players’ actions, and a timing protocol

These are all abstractions that collect together “objects” of the same “type”
Recognizing common structure allows us to employ common tools
In economic theory, this might be a proposition that applies to all games of a certain type
In Python, this might be a method that’s useful for all Markov chains (e.g., simulate)
When we use OOP, the simulate method is conveniently bundled together with the Markov
chain object
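To make this concrete, here is a minimal sketch of what such a bundle might look like
The class MarkovChain, its simulate method and the transition matrix are hypothetical and written only for this illustration; they are not the API of any particular library

```python
import random

class MarkovChain:
    "Hypothetical example: data (the matrix P) bundled with a method that uses it."

    def __init__(self, P):
        self.P = P    # P[i][j] = probability of moving from state i to state j

    def simulate(self, n, init=0, seed=None):
        "Return a list of n states, starting from init."
        rng = random.Random(seed)
        path = [init]
        for _ in range(n - 1):
            weights = self.P[path[-1]]     # row for the current state
            path.append(rng.choices(range(len(weights)), weights=weights)[0])
        return path

mc = MarkovChain([[0.9, 0.1],
                  [0.2, 0.8]])
print(mc.simulate(10, init=1, seed=0))
```

Any other chain is simulated the same way, by creating a new instance with a different matrix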

12.4 Defining Your Own Classes

Let’s build some simple classes to start off



12.4.1 Example: A Consumer Class

First, we’ll build a Consumer class with

• a wealth attribute that stores the consumer’s wealth (data)
• an earn method, where earn(y) increments the consumer’s wealth by y
• a spend method, where spend(x) either decreases wealth by x or prints an error message if
insufficient funds exist

Admittedly a little contrived, this example of a class helps us internalize some new syntax
Here’s one implementation

In [3]: class Consumer:

            def __init__(self, w):
                "Initialize consumer with w dollars of wealth"
                self.wealth = w

            def earn(self, y):
                "The consumer earns y dollars"
                self.wealth += y

            def spend(self, x):
                "The consumer spends x dollars if feasible"
                new_wealth = self.wealth - x
                if new_wealth < 0:
                    print("Insufficent funds")
                else:
                    self.wealth = new_wealth

There’s some special syntax here so let’s step through carefully

• The class keyword indicates that we are building a class

This class defines instance data wealth and three methods: __init__, earn and spend

• wealth is instance data because each consumer we create (each instance of the
Consumer class) will have its own separate wealth data

The ideas behind the earn and spend methods were discussed above
Both of these act on the instance data wealth
The __init__ method is a constructor method
Whenever we create an instance of the class, this method will be called automatically
Calling __init__ sets up a “namespace” to hold the instance data — more on this soon
We’ll also discuss the role of self just below
Usage
Here’s an example of usage

In [4]: c1 = Consumer(10)  # Create instance with initial wealth 10
        c1.spend(5)
        c1.wealth

Out[4]: 5

In [5]: c1.earn(15)
c1.spend(100)

Insufficent funds

We can of course create multiple instances each with its own data

In [6]: c1 = Consumer(10)
c2 = Consumer(12)
c2.spend(4)
c2.wealth

Out[6]: 8

In [7]: c1.wealth

Out[7]: 10

In fact, each instance stores its data in a separate namespace dictionary

In [8]: c1.__dict__

Out[8]: {'wealth': 10}

In [9]: c2.__dict__

Out[9]: {'wealth': 8}

When we access or set attributes we’re actually just modifying the dictionary maintained by
the instance
Self
If you look at the Consumer class definition again you’ll see the word self throughout the
code
The rules with self are that

• Any instance data should be prepended with self

– e.g., the earn method references self.wealth rather than just wealth

• Any method defined within the class should have self as its first argument

– e.g., def earn(self, y) rather than just def earn(y)

• Any method referenced within the class should be called as self.method_name

There are no examples of the last rule in the preceding code but we will see some shortly
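In the meantime, here is a minimal sketch of the last rule. The class `SimpleConsumer` and its method `earn_twice` are hypothetical names introduced only for this illustration.

```python
class SimpleConsumer:

    def __init__(self, w):
        "Initialize consumer with w dollars of wealth"
        self.wealth = w

    def earn(self, y):
        "The consumer earns y dollars"
        self.wealth += y

    def earn_twice(self, y):
        "Hypothetical helper: earn y dollars two times over"
        self.earn(y)   # another method of the class, referenced through self
        self.earn(y)

c = SimpleConsumer(10)
c.earn_twice(5)
print(c.wealth)   # 20
```

Writing `earn(y)` instead of `self.earn(y)` inside `earn_twice` would raise a `NameError`, because `earn` is not a name in the local or global scope.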
Details
In this section, we look at some more formal details related to classes and self

• You might wish to skip to the next section on first pass of this lecture
• You can return to these details after you’ve familiarized yourself with more examples

Methods actually live inside a class object formed when the interpreter reads the class defini-
tion

In [10]: print(Consumer.__dict__) # Show __dict__ attribute of class object

{'__module__': '__main__', '__init__': <function Consumer.__init__ at 0x7f89127b42f0>, 'earn': <function Consu

Note how the three methods __init__, earn and spend are stored in the class object
Consider the following code

In [11]: c1 = Consumer(10)
c1.earn(10)
c1.wealth

Out[11]: 20

When you call earn via c1.earn(10) the interpreter passes the instance c1 and the argu-
ment 10 to Consumer.earn
In fact, the following are equivalent

• c1.earn(10)
• Consumer.earn(c1, 10)

In the function call Consumer.earn(c1, 10) note that c1 is the first argument
Recall that in the definition of the earn method, self is the first parameter

In [12]: def earn(self, y):
             "The consumer earns y dollars"
             self.wealth += y

The end result is that self is bound to the instance c1 inside the function call
That’s why the statement self.wealth += y inside earn ends up modifying c1.wealth
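The same mechanism can be checked on any class. The sketch below uses a hypothetical `Counter` class, introduced only for this illustration, and calls its method both ways.

```python
class Counter:
    "Hypothetical class used only to illustrate how self is bound"

    def __init__(self):
        self.total = 0

    def add(self, y):
        self.total += y

a, b = Counter(), Counter()
a.add(3)             # self is bound to a
Counter.add(b, 4)    # same function, with the instance passed explicitly

print(a.__dict__, b.__dict__)   # each instance keeps its own namespace
```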

12.4.2 Example: The Solow Growth Model

For our next example, let’s write a simple class to implement the Solow growth model
The Solow growth model is a neoclassical growth model where the amount of capital stock
per capita 𝑘𝑡 evolves according to the rule

$$k_{t+1} = \frac{s z k_t^{\alpha} + (1 - \delta) k_t}{1 + n} \tag{1}$$

Here

• 𝑠 is an exogenously given savings rate


• 𝑧 is a productivity parameter
• 𝛼 is capital’s share of income
• 𝑛 is the population growth rate
• 𝛿 is the depreciation rate

The steady state of the model is the 𝑘 that solves Eq. (1) when 𝑘𝑡+1 = 𝑘𝑡 = 𝑘
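For reference, the steady state can be worked out by hand. Setting $k_{t+1} = k_t = k$ in Eq. (1) and assuming $k > 0$, the steps are roughly as follows.

```latex
\begin{aligned}
(1 + n) k &= s z k^{\alpha} + (1 - \delta) k \\
(n + \delta) k &= s z k^{\alpha} \\
k^{1 - \alpha} &= \frac{s z}{n + \delta} \\
k^* &= \left( \frac{s z}{n + \delta} \right)^{1 / (1 - \alpha)}
\end{aligned}
```

The last line is the expression returned by the steady_state method in the class below.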
Here’s a class that implements this model
Some points of interest in the code are

• An instance maintains a record of its current capital stock in the variable self.k

• The h method implements the right-hand side of Eq. (1)

• The update method uses h to update capital as per Eq. (1)

– Notice how inside update the reference to the local method h is self.h

The methods steady_state and generate_sequence are fairly self-explanatory

In [13]: class Solow:
             r"""
             Implements the Solow growth model with the update rule

                 k_{t+1} = [(s z k^α_t) + (1 - δ)k_t] /(1 + n)

             """
             def __init__(self, n=0.05,  # population growth rate
                                s=0.25,  # savings rate
                                δ=0.1,   # depreciation rate
                                α=0.3,   # capital's share of income
                                z=2.0,   # productivity
                                k=1.0):  # current capital stock

                 self.n, self.s, self.δ, self.α, self.z = n, s, δ, α, z
                 self.k = k

             def h(self):
                 "Evaluate the h function"
                 # Unpack parameters (get rid of self to simplify notation)
                 n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
                 # Apply the update rule
                 return (s * z * self.k**α + (1 - δ) * self.k) / (1 + n)

             def update(self):
                 "Update the current state (i.e., the capital stock)."
                 self.k = self.h()

             def steady_state(self):
                 "Compute the steady state value of capital."
                 # Unpack parameters (get rid of self to simplify notation)
                 n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
                 # Compute and return steady state
                 return ((s * z) / (n + δ))**(1 / (1 - α))

             def generate_sequence(self, t):
                 "Generate and return a time series of length t"
                 path = []
                 for i in range(t):
                     path.append(self.k)
                     self.update()
                 return path

Here’s a little program that uses the class to compute time series from two different initial
conditions
The common steady state is also plotted for comparison

In [14]: import matplotlib.pyplot as plt
         %matplotlib inline

         s1 = Solow()
         s2 = Solow(k=8.0)

         T = 60
         fig, ax = plt.subplots(figsize=(9, 6))

         # Plot the common steady state value of capital
         ax.plot([s1.steady_state()]*T, 'k-', label='steady state')

         # Plot time series for each economy
         for s in s1, s2:
             lb = f'capital series from initial state {s.k}'
             ax.plot(s.generate_sequence(T), 'o-', lw=2, alpha=0.6, label=lb)

         ax.legend()
         plt.show()

12.4.3 Example: A Market

Next, let’s write a class for a simple one-good market where agents are price takers
The market consists of the following objects:

• A linear demand curve $Q = a_d - b_d p$
• A linear supply curve $Q = a_z + b_z (p - t)$

Here

• 𝑝 is price paid by the consumer, 𝑄 is quantity and 𝑡 is a per-unit tax
• Other symbols are demand and supply parameters

The class provides methods to compute various values of interest, including competitive equi-
librium price and quantity, tax revenue raised, consumer surplus and producer surplus
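The equilibrium price and quantity can be obtained by hand by equating the demand and supply curves stated above. A sketch of the algebra, writing $a_d, b_d, a_z, b_z$ for the demand and supply parameters and $t$ for the tax:

```latex
\begin{aligned}
a_d - b_d p &= a_z + b_z (p - t) \\
p^* &= \frac{a_d - a_z + b_z t}{b_d + b_z} \\
Q^* &= a_d - b_d p^*
\end{aligned}
```

These two expressions correspond to the price and quantity methods in the implementation that follows.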
Here’s our implementation

In [15]: from scipy.integrate import quad

         class Market:

             def __init__(self, ad, bd, az, bz, tax):
                 """
                 Set up market parameters. All parameters are scalars. See
                 https://lectures.quantecon.org/py/python_oop.html for interpretation.

                 """
                 self.ad, self.bd, self.az, self.bz, self.tax = ad, bd, az, bz, tax
                 if ad < az:
                     raise ValueError('Insufficient demand.')

             def price(self):
                 "Return equilibrium price"
                 return (self.ad - self.az + self.bz * self.tax) / (self.bd + self.bz)

             def quantity(self):
                 "Compute equilibrium quantity"
                 return self.ad - self.bd * self.price()

             def consumer_surp(self):
                 "Compute consumer surplus"
                 # == Compute area under inverse demand function == #
                 integrand = lambda x: (self.ad / self.bd) - (1 / self.bd) * x
                 area, error = quad(integrand, 0, self.quantity())
                 return area - self.price() * self.quantity()

             def producer_surp(self):
                 "Compute producer surplus"
                 # == Compute area above inverse supply curve, excluding tax == #
                 integrand = lambda x: -(self.az / self.bz) + (1 / self.bz) * x
                 area, error = quad(integrand, 0, self.quantity())
                 return (self.price() - self.tax) * self.quantity() - area

             def taxrev(self):
                 "Compute tax revenue"
                 return self.tax * self.quantity()

             def inverse_demand(self, x):
                 "Compute inverse demand"
                 return self.ad / self.bd - (1 / self.bd) * x

             def inverse_supply(self, x):
                 "Compute inverse supply curve"
                 return -(self.az / self.bz) + (1 / self.bz) * x + self.tax

             def inverse_supply_no_tax(self, x):
                 "Compute inverse supply curve without tax"
                 return -(self.az / self.bz) + (1 / self.bz) * x

Here’s a sample of usage

In [16]: baseline_params = 15, .5, -2, .5, 3
         m = Market(*baseline_params)
         print("equilibrium price = ", m.price())

equilibrium price = 18.5

In [17]: print("consumer surplus = ", m.consumer_surp())

consumer surplus = 33.0625

Here’s a short program that uses this class to plot an inverse demand curve together with in-
verse supply curves with and without taxes

In [18]: import numpy as np

# Baseline ad, bd, az, bz, tax


baseline_params = 15, .5, -2, .5, 3
m = Market(*baseline_params)

q_max = m.quantity() * 2
q_grid = np.linspace(0.0, q_max, 100)
pd = m.inverse_demand(q_grid)
ps = m.inverse_supply(q_grid)
psno = m.inverse_supply_no_tax(q_grid)

fig, ax = plt.subplots()
ax.plot(q_grid, pd, lw=2, alpha=0.6, label='demand')
ax.plot(q_grid, ps, lw=2, alpha=0.6, label='supply')
ax.plot(q_grid, psno, '--k', lw=2, alpha=0.6, label='supply without tax')
ax.set_xlabel('quantity', fontsize=14)
ax.set_xlim(0, q_max)
ax.set_ylabel('price', fontsize=14)
ax.legend(loc='lower right', frameon=False, fontsize=14)
plt.show()

The next program provides a function that

• takes an instance of Market as a parameter
• computes deadweight loss from the imposition of the tax

In [19]: def deadw(m):
             "Computes deadweight loss for market m."
             # == Create analogous market with no tax == #
             m_no_tax = Market(m.ad, m.bd, m.az, m.bz, 0)
             # == Compare surplus, return difference == #
             surp1 = m_no_tax.consumer_surp() + m_no_tax.producer_surp()
             surp2 = m.consumer_surp() + m.producer_surp() + m.taxrev()
             return surp1 - surp2

Here’s an example of usage

In [20]: baseline_params = 15, .5, -2, .5, 3


m = Market(*baseline_params)
deadw(m) # Show deadweight loss

Out[20]: 1.125
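As a cross-check, with linear curves the deadweight loss is the area of a triangle whose height is the tax and whose base is the fall in equilibrium quantity. The sketch below recomputes it directly from the baseline parameters without using the Market class; the helper `eq_quantity` is a name introduced here for illustration.

```python
# Baseline parameters from the example above
ad, bd, az, bz, tax = 15, .5, -2, .5, 3

def eq_quantity(t):
    "Equilibrium quantity under a per-unit tax t"
    p = (ad - az + bz * t) / (bd + bz)
    return ad - bd * p

q_no_tax, q_tax = eq_quantity(0), eq_quantity(tax)

# Area of the triangle between demand and (untaxed) supply over the lost trades
dwl = 0.5 * tax * (q_no_tax - q_tax)
print(dwl)
```

This should agree with the value returned by deadw(m), up to the numerical integration error in quad.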

12.4.4 Example: Chaos

Let’s look at one more example, related to chaotic dynamics in nonlinear systems
One simple transition rule that can generate complex dynamics is the logistic map

$$x_{t+1} = r x_t (1 - x_t), \qquad x_0 \in [0, 1], \quad r \in [0, 4] \tag{2}$$

Let’s write a class for generating time series from this model
Here’s one implementation

In [21]: class Chaos:
             """
             Models the dynamical system with :math:`x_{t+1} = r x_t (1 - x_t)`
             """
             def __init__(self, x0, r):
                 """
                 Initialize with state x0 and parameter r
                 """
                 self.x, self.r = x0, r

             def update(self):
                 "Apply the map to update state."
                 self.x = self.r * self.x * (1 - self.x)

             def generate_sequence(self, n):
                 "Generate and return a sequence of length n."
                 path = []
                 for i in range(n):
                     path.append(self.x)
                     self.update()
                 return path

Here’s an example of usage

In [22]: ch = Chaos(0.1, 4.0)     # x0 = 0.1 and r = 4.0
         ch.generate_sequence(5)  # First 5 iterates

Out[22]: [0.1, 0.36000000000000004, 0.9216, 0.28901376000000006, 0.8219392261226498]

This piece of code plots a longer trajectory

In [23]: ch = Chaos(0.1, 4.0)


ts_length = 250

fig, ax = plt.subplots()
ax.set_xlabel('$t$', fontsize=14)
ax.set_ylabel('$x_t$', fontsize=14)
x = ch.generate_sequence(ts_length)
ax.plot(range(ts_length), x, 'bo-', alpha=0.5, lw=2, label='$x_t$')
plt.show()

The next piece of code provides a bifurcation diagram

In [24]: fig, ax = plt.subplots()
         ch = Chaos(0.1, 4)
         r = 2.5
         while r < 4:
             ch.r = r
             t = ch.generate_sequence(1000)[950:]
             ax.plot([r] * len(t), t, 'b.', ms=0.6)
             r = r + 0.005

         ax.set_xlabel('$r$', fontsize=16)
         plt.show()

On the horizontal axis is the parameter 𝑟 in Eq. (2)


The vertical axis is the state space [0, 1]
For each 𝑟 we compute a long time series and then plot the tail (the last 50 points)
The tail of the sequence shows us where the trajectory concentrates after settling down to
some kind of steady state, if a steady state exists
Whether it settles down, and the character of the steady state to which it does settle down,
depend on the value of 𝑟
For 𝑟 between about 2.5 and 3, the time series settles into a single fixed point plotted on the
vertical axis
For 𝑟 between about 3 and 3.45, the time series settles down to oscillating between the two
values plotted on the vertical axis
For 𝑟 a little bit higher than 3.45, the time series settles down to oscillating among the four
values plotted on the vertical axis
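Notice that this period-doubling sequence passes from one to two to four values without
producing a cycle among three values (a cycle of period three appears only in a narrow
window near 𝑟 ≈ 3.83, after the onset of chaotic behavior)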
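The number of values visited in the long run can also be counted numerically. The following is a rough sketch: the function `tail_values`, the choice of 1000 iterations and the rounding to 6 digits are assumptions made for this illustration.

```python
def tail_values(r, x0=0.1, n=1000, keep=50, digits=6):
    "Iterate the logistic map and return the distinct (rounded) values in the tail"
    x, path = x0, []
    for _ in range(n):
        path.append(x)
        x = r * x * (1 - x)
    return sorted(set(round(v, digits) for v in path[-keep:]))

for r in (2.8, 3.2, 3.5):
    print(r, len(tail_values(r)))
```

For these three values of r the counts should be 1, 2 and 4, matching the fixed point, the two-cycle and the four-cycle described above.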

12.5 Special Methods

Python provides special methods with which some neat tricks can be performed
For example, recall that lists and tuples have a notion of length and that this length can be
queried via the len function

In [25]: x = (10, 20)
         len(x)

Out[25]: 2

If you want to provide a return value for the len function when applied to your user-defined
object, use the __len__ special method

In [26]: class Foo:

             def __len__(self):
                 return 42

Now we get

In [27]: f = Foo()
len(f)

Out[27]: 42

A special method we will use regularly is the __call__ method


This method can be used to make your instances callable, just like functions

In [28]: class Foo:

             def __call__(self, x):
                 return x + 42

After running we get

In [29]: f = Foo()
f(8) # Exactly equivalent to f.__call__(8)

Out[29]: 50

Exercise 1 provides a more useful example

12.6 Exercises

12.6.1 Exercise 1

The empirical cumulative distribution function (ecdf) corresponding to a sample $\{X_i\}_{i=1}^n$ is
defined as

$$F_n(x) := \frac{1}{n} \sum_{i=1}^n \mathbb{1}\{X_i \leq x\} \qquad (x \in \mathbb{R}) \tag{3}$$

Here 1{𝑋𝑖 ≤ 𝑥} is an indicator function (one if 𝑋𝑖 ≤ 𝑥 and zero otherwise) and hence 𝐹𝑛 (𝑥)
is the fraction of the sample that falls below 𝑥
The Glivenko–Cantelli Theorem states that, provided that the sample is IID, the ecdf 𝐹𝑛 con-
verges to the true distribution function 𝐹
Implement 𝐹𝑛 as a class called ECDF, where

• A given sample $\{X_i\}_{i=1}^n$ is the instance data, stored as self.observations
• The class implements a __call__ method that returns 𝐹𝑛 (𝑥) for any 𝑥

Your code should work as follows (modulo randomness)

from random import uniform

samples = [uniform(0, 1) for i in range(10)]


F = ECDF(samples)
F(0.5) # Evaluate ecdf at x = 0.5

F.observations = [uniform(0, 1) for i in range(1000)]


F(0.5)

Aim for clarity, not efficiency

12.6.2 Exercise 2

In an earlier exercise, you wrote a function for evaluating polynomials


This exercise is an extension, where the task is to build a simple class called Polynomial for
representing and manipulating polynomial functions such as

$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N = \sum_{n=0}^N a_n x^n \qquad (x \in \mathbb{R}) \tag{4}$$

The instance data for the class Polynomial will be the coefficients (in the case of Eq. (4),
the numbers 𝑎0 , … , 𝑎𝑁 )
Provide methods that

1. Evaluate the polynomial Eq. (4), returning 𝑝(𝑥) for any 𝑥


2. Differentiate the polynomial, replacing the original coefficients with those of its deriva-
tive 𝑝′

Avoid using any import statements

12.7 Solutions

12.7.1 Exercise 1
In [30]: class ECDF:

             def __init__(self, observations):
                 self.observations = observations

             def __call__(self, x):
                 counter = 0.0
                 for obs in self.observations:
                     if obs <= x:
                         counter += 1
                 return counter / len(self.observations)

In [31]: # == test == #

from random import uniform

samples = [uniform(0, 1) for i in range(10)]


F = ECDF(samples)

print(F(0.5)) # Evaluate ecdf at x = 0.5

F.observations = [uniform(0, 1) for i in range(1000)]

print(F(0.5))

0.4
0.484

12.7.2 Exercise 2
In [32]: class Polynomial:

             def __init__(self, coefficients):
                 """
                 Creates an instance of the Polynomial class representing

                     p(x) = a_0 x^0 + ... + a_N x^N,

                 where a_i = coefficients[i].
                 """
                 self.coefficients = coefficients

             def __call__(self, x):
                 "Evaluate the polynomial at x."
                 y = 0
                 for i, a in enumerate(self.coefficients):
                     y += a * x**i
                 return y

             def differentiate(self):
                 "Reset self.coefficients to those of p' instead of p."
                 new_coefficients = []
                 for i, a in enumerate(self.coefficients):
                     new_coefficients.append(i * a)
                 # Remove the first element, which is zero
                 del new_coefficients[0]
                 # And reset coefficients data to new values
                 self.coefficients = new_coefficients
                 return new_coefficients
13

OOP III: Samuelson Multiplier Accelerator

13.1 Contents

• Overview 13.2
• Details 13.3
• Implementation 13.4
• Stochastic Shocks 13.5
• Government Spending 13.6
• Wrapping Everything Into a Class 13.7
• Using the LinearStateSpace Class 13.8
• Pure Multiplier Model 13.9
• Summary 13.10

Co-author: Natasha Watkins


In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

13.2 Overview

This lecture creates non-stochastic and stochastic versions of Paul Samuelson’s celebrated
multiplier accelerator model [115]
In doing so, we extend the example of the Solow model class in our second OOP lecture
Our objectives are to

• provide a more detailed example of OOP and classes


• review a famous model
• review linear difference equations, both deterministic and stochastic


13.2.1 Samuelson’s Model

Samuelson used a second-order linear difference equation to represent a model of national
output based on the following components:

• a national output identity asserting that national output is the sum of consumption
plus investment plus government purchases
• a Keynesian consumption function asserting that consumption at time 𝑡 is equal to a
constant times national output at time 𝑡 − 1
• an investment accelerator asserting that investment at time 𝑡 equals a constant called
the accelerator coefficient times the difference in output between period 𝑡 − 1 and 𝑡 − 2
• the idea that consumption plus investment plus government purchases constitute aggre-
gate demand, which automatically calls forth an equal amount of aggregate supply

(To read about linear difference equations see here or chapter IX of [118])
Samuelson used the model to analyze how particular values of the marginal propensity to
consume and the accelerator coefficient might give rise to transient business cycles in national
output
Possible dynamic properties include

• smooth convergence to a constant level of output


• damped business cycles that eventually converge to a constant level of output
• persistent business cycles that neither dampen nor explode

Later we present an extension that adds a random shock to the right side of the national in-
come identity representing random fluctuations in aggregate demand
This modification makes national output become governed by a second-order stochastic linear
difference equation that, with appropriate parameter values, gives rise to recurrent irregular
business cycles
(To read about stochastic linear difference equations see chapter XI of [118])

13.3 Details

Let’s assume that

• {𝐺𝑡 } is a sequence of levels of government expenditures – we’ll start by setting 𝐺𝑡 = 𝐺
for all 𝑡
• {𝐶𝑡 } is a sequence of levels of aggregate consumption expenditures, a key endogenous
variable in the model
• {𝐼𝑡 } is a sequence of rates of investment, another key endogenous variable
• {𝑌𝑡 } is a sequence of levels of national income, yet another endogenous variable
• 𝑎 is the marginal propensity to consume in the Keynesian consumption function
$C_t = a Y_{t-1} + \gamma$
• 𝑏 is the “accelerator coefficient” in the “investment accelerator”
$I_t = b(Y_{t-1} - Y_{t-2})$
• {𝜖𝑡 } is an IID sequence of standard normal random variables
• 𝜎 ≥ 0 is a “volatility” parameter; setting 𝜎 = 0 recovers the non-stochastic case that
we’ll start with

The model combines the consumption function

$$C_t = a Y_{t-1} + \gamma \tag{1}$$

with the investment accelerator

$$I_t = b (Y_{t-1} - Y_{t-2}) \tag{2}$$

and the national income identity

$$Y_t = C_t + I_t + G_t \tag{3}$$

• The parameter 𝑎 is people’s marginal propensity to consume out of income: Eq. (1)
asserts that people consume a fraction 𝑎 ∈ (0, 1) of each additional dollar of income
• The parameter 𝑏 > 0 is the investment accelerator coefficient: Eq. (2) asserts
that people invest in physical capital when income is increasing and disinvest when it is
decreasing

Equations Eq. (1), Eq. (2), and Eq. (3) imply the following second-order linear difference
equation for national income:

$$Y_t = (a + b) Y_{t-1} - b Y_{t-2} + (\gamma + G_t)$$

or

$$Y_t = \rho_1 Y_{t-1} + \rho_2 Y_{t-2} + (\gamma + G_t) \tag{4}$$

where $\rho_1 = (a + b)$ and $\rho_2 = -b$
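The substitution behind this equation is short. Inserting the consumption function Eq. (1) and the investment accelerator Eq. (2) into the identity Eq. (3) gives

```latex
\begin{aligned}
Y_t &= C_t + I_t + G_t \\
    &= (a Y_{t-1} + \gamma) + b (Y_{t-1} - Y_{t-2}) + G_t \\
    &= (a + b) Y_{t-1} - b Y_{t-2} + (\gamma + G_t)
\end{aligned}
```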
To complete the model, we require two initial conditions
If the model is to generate time series for $t = 0, \ldots, T$, we require initial values

$$Y_{-1} = \bar Y_{-1}, \qquad Y_{-2} = \bar Y_{-2}$$

We’ll ordinarily set the parameters $(a, b)$ so that starting from an arbitrary pair of initial
conditions $(\bar Y_{-1}, \bar Y_{-2})$, national income $Y_t$ converges to a constant value as $t$ becomes large

We are interested in studying

• the transient fluctuations in 𝑌𝑡 as it converges to its steady state level



• the rate at which it converges to a steady state level

The deterministic version of the model described so far — meaning that no random shocks
hit aggregate demand — has only transient fluctuations
We can convert the model to one that has persistent irregular fluctuations by adding a ran-
dom shock to aggregate demand

13.3.1 Stochastic Version of the Model

We create a random or stochastic version of the model by adding a random process of
shocks or disturbances {𝜎𝜖𝑡 } to the right side of equation Eq. (4), leading to the
second-order scalar linear stochastic difference equation:

$$Y_t = \gamma + G_t + (a + b) Y_{t-1} - b Y_{t-2} + \sigma \epsilon_t \tag{5}$$
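A minimal simulation sketch of this stochastic version, obtained by adding the shock 𝜎𝜖𝑡 to the right side of Eq. (4) with constant government spending. The function name `simulate_y`, its default parameter values and the use of NumPy's random generator are assumptions made for this illustration.

```python
import numpy as np

def simulate_y(a=0.92, b=0.5, γ=10, G=0, σ=0.0, y_m1=80, y_m2=100, n=200, seed=0):
    """
    Simulate Y_t = γ + G + (a + b) Y_{t-1} - b Y_{t-2} + σ ε_t for t = 0, ..., n-1.
    y_m1 and y_m2 are the initial values Y_{-1} and Y_{-2}.
    """
    rng = np.random.default_rng(seed)
    ε = rng.standard_normal(n)          # IID standard normal shocks
    y = [y_m2, y_m1]
    for t in range(n):
        y.append(γ + G + (a + b) * y[-1] - b * y[-2] + σ * ε[t])
    return np.array(y[2:])

path_det = simulate_y(σ=0.0)    # non-stochastic case
path_sto = simulate_y(σ=2.0)    # persistent irregular fluctuations

print(path_det[-1], (10 + 0) / (1 - 0.92))   # last value vs. (γ + G) / (1 - a)
```

With 𝜎 = 0 and these parameter values the path settles at the constant (𝛾 + 𝐺)/(1 − 𝑎), which is the value of 𝑌 that reproduces itself in Eq. (4).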

13.3.2 Mathematical Analysis of the Model

To get started, let’s set 𝐺𝑡 ≡ 0, 𝜎 = 0, and 𝛾 = 0


Then we can write equation Eq. (5) as

$$Y_t = \rho_1 Y_{t-1} + \rho_2 Y_{t-2}$$

or

$$Y_{t+2} - \rho_1 Y_{t+1} - \rho_2 Y_t = 0 \tag{6}$$

To discover the properties of the solution of Eq. (6), it is useful first to form the characteris-
tic polynomial for Eq. (6):

$$z^2 - \rho_1 z - \rho_2 \tag{7}$$

where 𝑧 is possibly a complex number


We want to find the two zeros (a.k.a. roots) – namely 𝜆1 , 𝜆2 – of the characteristic polyno-
mial
These are two special values of 𝑧, say 𝑧 = 𝜆1 and 𝑧 = 𝜆2 , such that if we set 𝑧 equal to one of
these values in expression Eq. (7), the characteristic polynomial Eq. (7) equals zero:

$$z^2 - \rho_1 z - \rho_2 = (z - \lambda_1)(z - \lambda_2) = 0 \tag{8}$$

Equation Eq. (8) is said to factor the characteristic polynomial
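The zeros can also be found numerically. The sketch below uses `numpy.roots` on the coefficients of the characteristic polynomial Eq. (7); the parameter values are illustrative (they correspond to 𝑎 = 0.92 and 𝑏 = 0.5).

```python
import numpy as np

ρ1, ρ2 = 1.42, -0.5     # illustrative values: ρ1 = a + b, ρ2 = -b

# Coefficients of z^2 - ρ1 z - ρ2, highest power first
λ1, λ2 = np.roots([1, -ρ1, -ρ2])

print(λ1, λ2)

# Expanding (z - λ1)(z - λ2) shows that λ1 + λ2 = ρ1 and λ1 λ2 = -ρ2
print(np.isclose(λ1 + λ2, ρ1), np.isclose(λ1 * λ2, -ρ2))
```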


When the roots are complex, they will occur as a complex conjugate pair
When the roots are complex, it is convenient to represent them in the polar form

$$\lambda_1 = r e^{i\omega}, \qquad \lambda_2 = r e^{-i\omega}$$

where 𝑟 is the amplitude of the complex number and 𝜔 is its angle or phase
These can also be represented as

$$\lambda_1 = r(\cos(\omega) + i \sin(\omega))$$

$$\lambda_2 = r(\cos(\omega) - i \sin(\omega))$$

(To read about the polar form, see here)


Given initial conditions 𝑌−1 , 𝑌−2 , we want to generate a solution of the difference equation
Eq. (6)
It can be represented as

$$Y_t = \lambda_1^t c_1 + \lambda_2^t c_2$$

where 𝑐1 and 𝑐2 are constants that depend on the two initial conditions and on 𝜌1 , 𝜌2
When the roots are complex, it is useful to pursue the following calculations
Notice that

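$$\begin{aligned}
Y_t &= c_1 (r e^{i\omega})^t + c_2 (r e^{-i\omega})^t \\
    &= c_1 r^t e^{i\omega t} + c_2 r^t e^{-i\omega t} \\
    &= c_1 r^t [\cos(\omega t) + i \sin(\omega t)] + c_2 r^t [\cos(\omega t) - i \sin(\omega t)] \\
    &= (c_1 + c_2) r^t \cos(\omega t) + i (c_1 - c_2) r^t \sin(\omega t)
\end{aligned}$$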

The only way that 𝑌𝑡 can be a real number for each 𝑡 is if 𝑐1 + 𝑐2 is a real number and 𝑐1 − 𝑐2
is an imaginary number
This happens only when 𝑐1 and 𝑐2 are complex conjugates, in which case they can be written
in the polar forms

$$c_1 = v e^{i\theta}, \qquad c_2 = v e^{-i\theta}$$

So we can write

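$$\begin{aligned}
Y_t &= v e^{i\theta} r^t e^{i\omega t} + v e^{-i\theta} r^t e^{-i\omega t} \\
    &= v r^t [e^{i(\omega t + \theta)} + e^{-i(\omega t + \theta)}] \\
    &= 2 v r^t \cos(\omega t + \theta)
\end{aligned}$$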

where 𝑣 and 𝜃 are constants that must be chosen to satisfy initial conditions for 𝑌−1 , 𝑌−2
This formula shows that when the roots are complex, $Y_t$ displays oscillations with period
$\check p = 2\pi / \omega$ and damping factor $r$

We say that $\check p$ is the period because in that amount of time the cosine wave $\cos(\omega t + \theta)$ goes
through exactly one complete cycle
(Draw a cosine function to convince yourself of this please)
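For a concrete case, the sketch below computes the modulus, phase and implied period for one pair of complex roots. The values 𝜌1 = 1 and 𝜌2 = −0.5 are chosen purely for illustration because they give simple numbers.

```python
import cmath

ρ1, ρ2 = 1.0, -0.5      # illustrative values with ρ1**2 + 4 ρ2 < 0

# Roots of z^2 - ρ1 z - ρ2 via the quadratic formula
disc = cmath.sqrt(ρ1**2 + 4 * ρ2)
λ1, λ2 = (ρ1 + disc) / 2, (ρ1 - disc) / 2

r, ω = cmath.polar(λ1)          # modulus (damping factor) and phase of λ1
period = 2 * cmath.pi / ω       # time for one complete cycle

print(r, ω, period)
```

Here the roots are (1 ± 𝑖)/2, so the damping factor is √0.5 ≈ 0.707, the phase is 𝜋/4 and the period is 8.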

Remark: Following [115], we want to choose the parameters 𝑎, 𝑏 of the model so that the ab-
solute values (of the possibly complex) roots 𝜆1 , 𝜆2 of the characteristic polynomial are both
strictly less than one:

|𝜆𝑗 | < 1 for 𝑗 = 1, 2

Remark: When both roots 𝜆1 , 𝜆2 of the characteristic polynomial have absolute values
strictly less than one, the absolute value of the larger one governs the rate of convergence to
the steady state of the non stochastic version of the model

13.3.3 Things This Lecture Does

We write a function to generate simulations of a {𝑌𝑡 } sequence as a function of time


The function requires that we put in initial conditions for 𝑌−1 , 𝑌−2
The function checks that 𝑎, 𝑏 are set so that 𝜆1 , 𝜆2 are less than unity in absolute value
(also called “modulus”)
The function also tells us whether the roots are complex, and, if they are complex, returns
both their real and imaginary parts
If the roots are both real, the function returns their values
We also use our function to simulate paths that are stochastic (when 𝜎 > 0)
We have written the function in a way that allows us to input {𝐺𝑡 } paths of a few simple
forms, e.g.,

• one time jumps in 𝐺 at some time


• a permanent jump in 𝐺 that occurs at some time

We proceed to use the Samuelson multiplier-accelerator model as a laboratory to make a
simple OOP example
The “state” that determines next period’s 𝑌𝑡+1 is now not just the current value 𝑌𝑡 but also
the once lagged value 𝑌𝑡−1
This involves a little more bookkeeping than is required in the Solow model class definition
We use the Samuelson multiplier-accelerator model as a vehicle for teaching how we can grad-
ually add more features to the class
We want to have a method in the class that automatically generates a simulation, either non-
stochastic (𝜎 = 0) or stochastic (𝜎 > 0)
We also show how to map the Samuelson model into a simple instance of the
LinearStateSpace class described here
We can use a LinearStateSpace instance to do various things that we did above with our
homemade function and class
Among other things, we show by example that the eigenvalues of the matrix 𝐴 that we use to
form the instance of the LinearStateSpace class for the Samuelson model equal the roots
of the characteristic polynomial Eq. (7) for the Samuelson multiplier accelerator model

Here is the formula for the matrix 𝐴 in the linear state space system in the case that govern-
ment expenditures are a constant 𝐺:

$$A = \begin{bmatrix} 1 & 0 & 0 \\ \gamma + G & \rho_1 & \rho_2 \\ 0 & 1 & 0 \end{bmatrix}$$
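The link between the eigenvalues of 𝐴 and the roots of the characteristic polynomial Eq. (7) can be checked numerically with plain NumPy. The parameter values below are illustrative assumptions, and the sketch does not use the LinearStateSpace class itself.

```python
import numpy as np

ρ1, ρ2 = 1.42, -0.5     # illustrative values: ρ1 = a + b, ρ2 = -b
γ, G = 10, 10           # illustrative constants

A = np.array([[1,     0,  0],
              [γ + G, ρ1, ρ2],
              [0,     1,  0]])

eigvals = np.linalg.eigvals(A)
roots = np.roots([1, -ρ1, -ρ2])   # zeros of z^2 - ρ1 z - ρ2

print(np.sort(eigvals))
print(np.sort(roots))
```

Because 𝐴 is 3 × 3, it has one additional eigenvalue equal to 1, which comes from the constant term in the first row; the other two eigenvalues coincide with the roots.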

13.4 Implementation

We’ll start by drawing an informative graph from page 189 of [118]

In [2]: import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline

        def param_plot():

            """this function creates the graph on page 189 of Sargent Macroeconomic Theory, second edition, 19..."""

            fig, ax = plt.subplots(figsize=(10, 6))
            ax.set_aspect('equal')

            # Set axis
            xmin, ymin = -3, -2
            xmax, ymax = -xmin, -ymin
            plt.axis([xmin, xmax, ymin, ymax])

            # Set axis labels
            ax.set(xticks=[], yticks=[])
            ax.set_xlabel(r'$\rho_2$', fontsize=16)
            ax.xaxis.set_label_position('top')
            ax.set_ylabel(r'$\rho_1$', rotation=0, fontsize=16)
            ax.yaxis.set_label_position('right')

            # Draw (t1, t2) points
            ρ1 = np.linspace(-2, 2, 100)
            ax.plot(ρ1, -abs(ρ1) + 1, c='black')
            ax.plot(ρ1, np.ones_like(ρ1) * -1, c='black')
            ax.plot(ρ1, -(ρ1**2 / 4), c='black')

            # Turn normal axes off
            for spine in ['left', 'bottom', 'top', 'right']:
                ax.spines[spine].set_visible(False)

            # Add arrows to represent axes
            axes_arrows = {'arrowstyle': '<|-|>', 'lw': 1.3}
            ax.annotate('', xy=(xmin, 0), xytext=(xmax, 0), arrowprops=axes_arrows)
            ax.annotate('', xy=(0, ymin), xytext=(0, ymax), arrowprops=axes_arrows)

            # Annotate the plot with equations
            plot_arrowsl = {'arrowstyle': '-|>', 'connectionstyle': "arc3, rad=-0.2"}
            plot_arrowsr = {'arrowstyle': '-|>', 'connectionstyle': "arc3, rad=0.2"}
            ax.annotate(r'$\rho_1 + \rho_2 < 1$', xy=(0.5, 0.3), xytext=(0.8, 0.6),
                        arrowprops=plot_arrowsr, fontsize='12')
            ax.annotate(r'$\rho_1 + \rho_2 = 1$', xy=(0.38, 0.6), xytext=(0.6, 0.8),
                        arrowprops=plot_arrowsr, fontsize='12')
            ax.annotate(r'$\rho_2 < 1 + \rho_1$', xy=(-0.5, 0.3), xytext=(-1.3, 0.6),
                        arrowprops=plot_arrowsl, fontsize='12')
            ax.annotate(r'$\rho_2 = 1 + \rho_1$', xy=(-0.38, 0.6), xytext=(-1, 0.8),
                        arrowprops=plot_arrowsl, fontsize='12')
            ax.annotate(r'$\rho_2 = -1$', xy=(1.5, -1), xytext=(1.8, -1.3),
                        arrowprops=plot_arrowsl, fontsize='12')
            ax.annotate(r'${\rho_1}^2 + 4\rho_2 = 0$', xy=(1.15, -0.35),
                        xytext=(1.5, -0.3), arrowprops=plot_arrowsr, fontsize='12')
            ax.annotate(r'${\rho_1}^2 + 4\rho_2 < 0$', xy=(1.4, -0.7),
                        xytext=(1.8, -0.6), arrowprops=plot_arrowsr, fontsize='12')

            # Label categories of solutions
            ax.text(1.5, 1, 'Explosive\n growth', ha='center', fontsize=16)
            ax.text(-1.5, 1, 'Explosive\n oscillations', ha='center', fontsize=16)
            ax.text(0.05, -1.5, 'Explosive oscillations', ha='center', fontsize=16)
            ax.text(0.09, -0.5, 'Damped oscillations', ha='center', fontsize=16)

            # Add small marker to y-axis
            ax.axhline(y=1.005, xmin=0.495, xmax=0.505, c='black')
            ax.text(-0.12, -1.12, '-1', fontsize=10)
            ax.text(-0.12, 0.98, '1', fontsize=10)

            return fig

        param_plot()
        plt.show()

The graph portrays regions in which the (𝜆1 , 𝜆2 ) root pairs implied by the (𝜌1 = (𝑎 + 𝑏), 𝜌2 =
−𝑏) difference equation parameter pairs in the Samuelson model are such that:

• (𝜆1 , 𝜆2 ) are complex with modulus less than 1 - in this case, the {𝑌𝑡 } sequence displays
damped oscillations
• (𝜆1 , 𝜆2 ) are both real, but one is strictly greater than 1 - this leads to explosive growth
• (𝜆1 , 𝜆2 ) are both real, but one is strictly less than −1 - this leads to explosive oscilla-
tions
• (𝜆1 , 𝜆2 ) are both real and both are less than 1 in absolute value - in this case, there is
smooth convergence to the steady state without damped cycles

Later we’ll present the graph with a red mark showing the particular point implied by the
setting of (𝑎, 𝑏)

13.4.1 Function to Describe Implications of Characteristic Polynomial


In [3]: def categorize_solution(ρ1, ρ2):
            """this function takes values of ρ1 and ρ2 and uses them to classify the type of solution"""

            discriminant = ρ1 ** 2 + 4 * ρ2
            if ρ2 > 1 + ρ1 or ρ2 < -1:
                print('Explosive oscillations')
            elif ρ1 + ρ2 > 1:
                print('Explosive growth')
            elif discriminant < 0:
                print('Roots are complex with modulus less than one; therefore damped oscillations')
            else:
                print('Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state')

In [4]: ### Test the categorize_solution function

categorize_solution(1.3, -.4)

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state

13.4.2 Function for Plotting Paths

A useful function for our work below is

In [5]: def plot_y(function=None):
            """function plots path of Y_t"""
            plt.subplots(figsize=(10, 6))
            plt.plot(function)
            plt.xlabel('Time $t$')
            plt.ylabel('$Y_t$', rotation=0)
            plt.grid()
            plt.show()

13.4.3 Manual or “by hand” Root Calculations

The following function calculates roots of the characteristic polynomial using high school algebra
(We’ll calculate the roots in other ways later)
The function also computes a path of 𝑌𝑡 starting from initial conditions that we set, which we then plot

In [6]: from cmath import sqrt

        ##=== This is a 'manual' method ===#

        def y_nonstochastic(y_0=100, y_1=80, α=.92, β=.5, γ=10, n=80):

            """Takes values of parameters and computes the roots of characteristic polynomial.
            It tells whether they are real or complex and whether they are less than unity in absolute value.
            It also computes a simulation of length n starting from the two given initial conditions for national income"""

            roots = []

            ρ1 = α + β
            ρ2 = -β

            print(f'ρ_1 is {ρ1}')
            print(f'ρ_2 is {ρ2}')

            # Roots of z**2 - ρ1 * z - ρ2 = 0 are (ρ1 ± sqrt(ρ1**2 + 4 * ρ2)) / 2
            discriminant = ρ1 ** 2 + 4 * ρ2

            if discriminant == 0:
                roots.append(ρ1 / 2)
                print('Single real root: ')
                print(''.join(str(roots)))
            elif discriminant > 0:
                roots.append((ρ1 + sqrt(discriminant).real) / 2)
                roots.append((ρ1 - sqrt(discriminant).real) / 2)
                print('Two real roots: ')
                print(''.join(str(roots)))
            else:
                roots.append((ρ1 + sqrt(discriminant)) / 2)
                roots.append((ρ1 - sqrt(discriminant)) / 2)
                print('Two complex roots: ')
                print(''.join(str(roots)))

            if all(abs(root) < 1 for root in roots):
                print('Absolute values of roots are less than one')
            else:
                print('Absolute values of roots are not less than one')

            def transition(x, t):
                return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ

            y_t = [y_0, y_1]

            for t in range(2, n):
                y_t.append(transition(y_t, t))

            return y_t

        plot_y(y_nonstochastic())

ρ_1 is 1.42
ρ_2 is -0.5
Two real roots:
[0.7740312423743284, 0.6459687576256715]
Absolute values of roots are less than one

13.4.4 Reverse-Engineering Parameters to Generate Damped Cycles

The next cell writes code that takes as inputs the modulus 𝑟 and phase 𝜙 of a conjugate pair
of complex numbers in polar form

𝜆1 = 𝑟 exp(𝑖𝜙), 𝜆2 = 𝑟 exp(−𝑖𝜙)

• The code assumes that these two complex numbers are the roots of the characteristic
polynomial
• It then reverse-engineers (𝑎, 𝑏) and (𝜌1 , 𝜌2 ), pairs that would generate those roots

In [7]: ### code to reverse-engineer a cycle
        ### y_t = r^t (c_1 cos(ϕ t) + c2 sin(ϕ t))
        ###

        import cmath
        import math

        def f(r, ϕ):
            """
            Takes modulus r and angle ϕ of complex number r exp(j ϕ)
            and creates ρ1 and ρ2 of characteristic polynomial for which
            r exp(j ϕ) and r exp(- j ϕ) are complex roots.

            Returns the multiplier coefficient a and the accelerator coefficient b
            that verifies those roots.
            """
            g1 = cmath.rect(r, ϕ)  # Generate two complex roots
            g2 = cmath.rect(r, -ϕ)
            ρ1 = g1 + g2           # Implied ρ1, ρ2
            ρ2 = -g1 * g2
            b = -ρ2                # Reverse-engineer a and b that validate these
            a = ρ1 - b
            return ρ1, ρ2, a, b

        ## Now let's use the function in an example
        ## Here are the example parameters

        r = .95
        period = 10                # Length of cycle in units of time
        ϕ = 2 * math.pi/period

        ## Apply the function

        ρ1, ρ2, a, b = f(r, ϕ)

        print(f"a, b = {a}, {b}")
        print(f"ρ1, ρ2 = {ρ1}, {ρ2}")

a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = (1.5371322893124+0j), (-0.9024999999999999+0j)

In [8]: ## Print the real components of ρ1 and ρ2

ρ1 = ρ1.real
ρ2 = ρ2.real

ρ1, ρ2

Out[8]: (1.5371322893124, -0.9024999999999999)
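As a cross-check on the numbers above, the reverse-engineering step has a simple closed form: since 𝜆1 + 𝜆2 = 2𝑟 cos 𝜙 and 𝜆1 𝜆2 = 𝑟², we get 𝜌1 = 2𝑟 cos 𝜙, 𝜌2 = −𝑟², 𝑏 = 𝑟² and 𝑎 = 2𝑟 cos 𝜙 − 𝑟². The sketch below evaluates these formulas with real arithmetic only, so no imaginary parts need to be dropped afterwards. The helper name `ab_from_polar` is ours, not the lecture's

```python
import math

def ab_from_polar(r, phi):
    """Closed-form (rho1, rho2, a, b) for conjugate roots r*exp(±i*phi). Illustrative helper."""
    rho1 = 2 * r * math.cos(phi)   # sum of the two roots
    rho2 = -r ** 2                 # minus the product of the two roots
    b = -rho2                      # accelerator coefficient
    a = rho1 - b                   # multiplier coefficient
    return rho1, rho2, a, b

rho1, rho2, a, b = ab_from_polar(0.95, 2 * math.pi / 10)
print(f"a, b = {a}, {b}")
print(f"rho1, rho2 = {rho1}, {rho2}")
```

Up to floating point rounding, these agree with the real parts printed above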



13.4.5 Root Finding Using Numpy

Here we’ll use numpy to compute the roots of the characteristic polynomial

In [9]: r1, r2 = np.roots([1, -ρ1, -ρ2])

        p1 = cmath.polar(r1)
        p2 = cmath.polar(r2)

        print(f"r, ϕ = {r}, {ϕ}")
        print(f"p1, p2 = {p1}, {p2}")
        # print(f"g1, g2 = {g1}, {g2}")

        print(f"a, b = {a}, {b}")
        print(f"ρ1, ρ2 = {ρ1}, {ρ2}")

r, ϕ = 0.95, 0.6283185307179586
p1, p2 = (0.95, 0.6283185307179586), (0.95, -0.6283185307179586)
a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = 1.5371322893124, -0.9024999999999999

In [10]: ##=== This method uses numpy to calculate roots ===#

         def y_nonstochastic(y_0=100, y_1=80, α=.9, β=.8, γ=10, n=80):

             """ Rather than computing the roots of the characteristic polynomial by hand as we did earlier, this function
             enlists numpy to do the work for us """

             # Useful constants
             ρ1 = α + β
             ρ2 = -β

             categorize_solution(ρ1, ρ2)

             # Find roots of polynomial
             roots = np.roots([1, -ρ1, -ρ2])
             print(f'Roots are {roots}')

             # Check if real or complex
             if all(isinstance(root, complex) for root in roots):
                 print('Roots are complex')
             else:
                 print('Roots are real')

             # Check if roots are less than one
             if all(abs(root) < 1 for root in roots):
                 print('Roots are less than one')
             else:
                 print('Roots are not less than one')

             # Define transition equation
             def transition(x, t):
                 return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ

             # Set initial conditions
             y_t = [y_0, y_1]

             # Generate y_t series
             for t in range(2, n):
                 y_t.append(transition(y_t, t))

             return y_t

         plot_y(y_nonstochastic())

Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.85+0.27838822j 0.85-0.27838822j]
Roots are complex

Roots are less than one

13.4.6 Reverse-Engineered Complex Roots: Example

The next cell studies the implications of reverse-engineered complex roots


We’ll generate an undamped cycle of period 10

In [11]: r = 1        # generates undamped, nonexplosive cycles

         period = 10  # length of cycle in units of time
         ϕ = 2 * math.pi/period

         ## Apply the reverse-engineering function f

         ρ1, ρ2, a, b = f(r, ϕ)

         a = a.real   # drop the imaginary part so that it is a valid input into y_nonstochastic
         b = b.real

         print(f"a, b = {a}, {b}")

         ytemp = y_nonstochastic(α=a, β=b, y_0=20, y_1=30)
         plot_y(ytemp)

a, b = 0.6180339887498949, 1.0
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.80901699+0.58778525j 0.80901699-0.58778525j]
Roots are complex
Roots are less than one

13.4.7 Digression: Using Sympy to Find Roots

We can also use sympy to compute analytic formulas for the roots

In [12]: import sympy


from sympy import Symbol, init_printing
init_printing()

r1 = Symbol("ρ_1")
r2 = Symbol("ρ_2")
z = Symbol("z")

sympy.solve(z**2 - r1*z - r2, z)

Out[12]:

$$
\left[ \frac{\rho_1}{2} - \frac{1}{2}\sqrt{\rho_1^2 + 4\rho_2}, \quad \frac{\rho_1}{2} + \frac{1}{2}\sqrt{\rho_1^2 + 4\rho_2} \right]
$$

In [13]: a = Symbol("α")
b = Symbol("β")
r1 = a + b
r2 = -b

sympy.solve(z**2 - r1*z - r2, z)

Out[13]:

$$
\left[ \frac{\alpha}{2} + \frac{\beta}{2} - \frac{1}{2}\sqrt{\alpha^2 + 2\alpha\beta + \beta^2 - 4\beta}, \quad \frac{\alpha}{2} + \frac{\beta}{2} + \frac{1}{2}\sqrt{\alpha^2 + 2\alpha\beta + \beta^2 - 4\beta} \right]
$$

13.5 Stochastic Shocks

Now we’ll construct some code to simulate the stochastic version of the model that emerges
when we add a random shock process to aggregate demand

In [14]: def y_stochastic(y_0=0, y_1=0, α=0.8, β=0.2, γ=10, n=100, σ=5):

             """This function takes parameters of a stochastic version of the model and proceeds to analyze
             the roots of the characteristic polynomial and also generate a simulation"""

             # Useful constants
             ρ1 = α + β
             ρ2 = -β

             # Categorize solution
             categorize_solution(ρ1, ρ2)

             # Find roots of polynomial
             roots = np.roots([1, -ρ1, -ρ2])
             print(roots)

             # Check if real or complex
             if all(isinstance(root, complex) for root in roots):
                 print('Roots are complex')
             else:
                 print('Roots are real')

             # Check if roots are less than one
             if all(abs(root) < 1 for root in roots):
                 print('Roots are less than one')
             else:
                 print('Roots are not less than one')

             # Generate shocks
             ϵ = np.random.normal(0, 1, n)

             # Define transition equation
             def transition(x, t):
                 return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + σ * ϵ[t]

             # Set initial conditions
             y_t = [y_0, y_1]

             # Generate y_t series
             for t in range(2, n):
                 y_t.append(transition(y_t, t))

             return y_t

         plot_y(y_stochastic())

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one

Let’s do a simulation in which there are shocks and the characteristic polynomial has complex
roots

In [15]: r = .97

         period = 10  # length of cycle in units of time
         ϕ = 2 * math.pi/period

         ### apply the reverse-engineering function f

         ρ1, ρ2, a, b = f(r, ϕ)

         a = a.real   # drop the imaginary part so that it is a valid input into y_stochastic
         b = b.real

         print(f"a, b = {a}, {b}")
         plot_y(y_stochastic(y_0=40, y_1=42, α=a, β=b, σ=2, n=100))

a, b = 0.6285929690873979, 0.9409000000000001
Roots are complex with modulus less than one; therefore damped oscillations
[0.78474648+0.57015169j 0.78474648-0.57015169j]
Roots are complex
Roots are less than one

13.6 Government Spending

This function computes a response to either a permanent or one-off increase in government
expenditures

In [16]: def y_stochastic_g(y_0=20,
                            y_1=20,
                            α=0.8,
                            β=0.2,
                            γ=10,
                            n=100,
                            σ=2,
                            g=0,
                            g_t=0,
                            duration='permanent'):

             """This program computes a response to a permanent or one-off increase in
             government expenditures that occurs at time g_t"""

             # Useful constants
             ρ1 = α + β
             ρ2 = -β

             # Categorize solution
             categorize_solution(ρ1, ρ2)

             # Find roots of polynomial
             roots = np.roots([1, -ρ1, -ρ2])
             print(roots)

             # Check if real or complex
             if all(isinstance(root, complex) for root in roots):
                 print('Roots are complex')
             else:
                 print('Roots are real')

             # Check if roots are less than one
             if all(abs(root) < 1 for root in roots):
                 print('Roots are less than one')
             else:
                 print('Roots are not less than one')

             # Generate shocks once, and only when they are needed
             ϵ = np.random.normal(0, 1, n) if σ != 0 else np.zeros(n)

             def transition(x, t, g):
                 return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + g + σ * ϵ[t]

             # Create list and set initial conditions
             y_t = [y_0, y_1]

             # Generate y_t series
             for t in range(2, n):

                 # No government spending
                 if g == 0:
                     y_t.append(transition(y_t, t, g=0))

                 # Government spending in every period (no shock)
                 elif duration is None:
                     y_t.append(transition(y_t, t, g=g))

                 # Permanent government spending shock
                 elif duration == 'permanent':
                     if t < g_t:
                         y_t.append(transition(y_t, t, g=0))
                     else:
                         y_t.append(transition(y_t, t, g=g))

                 # One-off government spending shock
                 elif duration == 'one-off':
                     if t == g_t:
                         y_t.append(transition(y_t, t, g=g))
                     else:
                         y_t.append(transition(y_t, t, g=0))
             return y_t

A permanent government spending shock can be simulated as follows

In [17]: plot_y(y_stochastic_g(g=10, g_t=20, duration='permanent'))

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one
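One way to read the plot: with 𝜎 = 0, setting 𝑌𝑡 = 𝑌𝑡−1 = 𝑌𝑡−2 = 𝑌* in 𝑌𝑡 = 𝜌1 𝑌𝑡−1 + 𝜌2 𝑌𝑡−2 + 𝛾 + 𝑔 gives 𝑌* = (𝛾 + 𝑔)/(1 − 𝜌1 − 𝜌2) = (𝛾 + 𝑔)/(1 − 𝛼), so under the default parameters a permanent 𝑔 = 10 moves the steady state from 50 to 100. The self-contained sketch below does not call y_stochastic_g; it simply iterates the non-stochastic recursion with illustrative ASCII variable names and compares the end of the path with that formula

```python
alpha, beta, gamma = 0.8, 0.2, 10   # default parameters of y_stochastic_g
g, g_t, n = 10, 20, 100             # permanent spending increase at t = 20
rho1, rho2 = alpha + beta, -beta

y = [20, 20]                        # initial conditions y_0, y_1
for t in range(2, n):
    g_now = g if t >= g_t else 0    # spending switches on at g_t and stays on
    y.append(rho1 * y[t - 1] + rho2 * y[t - 2] + gamma + g_now)

steady_state = (gamma + g) / (1 - alpha)
print(y[-1], steady_state)
```

The stochastic path in the figure fluctuates around this level once the shock has occurred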

We can also see the response to a one time jump in government expenditures

In [18]: plot_y(y_stochastic_g(g=500, g_t=50, duration='one-off'))

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one

13.7 Wrapping Everything Into a Class

Up to now, we have written functions to do the work


Now we’ll roll up our sleeves and write a Python class called Samuelson for the Samuelson
model

In [19]: class Samuelson():

             r"""This class represents the Samuelson model, otherwise known as the
             multiplier-accelerator model. The model combines the Keynesian multiplier
             with the accelerator theory of investment.

             The path of output is governed by a linear second-order difference equation

             .. math::

                 Y_t = \gamma + G_t + (\alpha + \beta) Y_{t-1} - \beta Y_{t-2} + \sigma \epsilon_t

             Parameters
             ----------
             y_0 : scalar
                 Initial condition for Y_0
             y_1 : scalar
                 Initial condition for Y_1
             α : scalar
                 Marginal propensity to consume
             β : scalar
                 Accelerator coefficient
             n : int
                 Number of iterations
             σ : scalar
                 Volatility parameter. It must be greater than or equal to 0. Set
                 equal to 0 for a non-stochastic model.
             g : scalar
                 Government spending shock
             g_t : int
                 Time at which government spending shock occurs. Must be specified
                 when duration != None.
             duration : {None, 'permanent', 'one-off'}
                 Specifies type of government spending shock. If None, government
                 spending equal to g for all t.

             """

             def __init__(self,
                          y_0=100,
                          y_1=50,
                          α=1.3,
                          β=0.2,
                          γ=10,
                          n=100,
                          σ=0,
                          g=0,
                          g_t=0,
                          duration=None):

                 self.y_0, self.y_1, self.α, self.β = y_0, y_1, α, β
                 self.n, self.g, self.g_t, self.duration = n, g, g_t, duration
                 self.γ, self.σ = γ, σ
                 self.ρ1 = α + β
                 self.ρ2 = -β
                 self.roots = np.roots([1, -self.ρ1, -self.ρ2])

             def root_type(self):
                 if all(isinstance(root, complex) for root in self.roots):
                     return 'Complex conjugate'
                 elif len(self.roots) > 1:
                     return 'Double real'
                 else:
                     return 'Single real'

             def root_less_than_one(self):
                 return all(abs(root) < 1 for root in self.roots)

             def solution_type(self):
                 ρ1, ρ2 = self.ρ1, self.ρ2
                 discriminant = ρ1 ** 2 + 4 * ρ2
                 if ρ2 >= 1 + ρ1 or ρ2 <= -1:
                     return 'Explosive oscillations'
                 elif ρ1 + ρ2 >= 1:
                     return 'Explosive growth'
                 elif discriminant < 0:
                     return 'Damped oscillations'
                 else:
                     return 'Steady state'

             def _transition(self, x, t, g):

                 # Non-stochastic - separated to avoid generating random numbers when not needed
                 if self.σ == 0:
                     return self.ρ1 * x[t - 1] + self.ρ2 * x[t - 2] + self.γ + g

                 # Stochastic - draw one standard normal shock for period t
                 else:
                     ϵ = np.random.normal(0, 1)
                     return self.ρ1 * x[t - 1] + self.ρ2 * x[t - 2] + self.γ + g + self.σ * ϵ

             def generate_series(self):

                 # Create list and set initial conditions
                 y_t = [self.y_0, self.y_1]

                 # Generate y_t series
                 for t in range(2, self.n):

                     # No government spending
                     if self.g == 0:
                         y_t.append(self._transition(y_t, t, g=0))

                     # Government spending in every period (no shock)
                     elif self.duration is None:
                         y_t.append(self._transition(y_t, t, g=self.g))

                     # Permanent government spending shock
                     elif self.duration == 'permanent':
                         if t < self.g_t:
                             y_t.append(self._transition(y_t, t, g=0))
                         else:
                             y_t.append(self._transition(y_t, t, g=self.g))

                     # One-off government spending shock
                     elif self.duration == 'one-off':
                         if t == self.g_t:
                             y_t.append(self._transition(y_t, t, g=self.g))
                         else:
                             y_t.append(self._transition(y_t, t, g=0))
                 return y_t

             def summary(self):
                 print('Summary\n' + '-' * 50)
                 print(f'Root type: {self.root_type()}')
                 print(f'Solution type: {self.solution_type()}')
                 print(f'Roots: {str(self.roots)}')

                 if self.root_less_than_one():
                     print('Absolute value of roots is less than one')
                 else:
                     print('Absolute value of roots is not less than one')

                 if self.σ > 0:
                     print('Stochastic series with σ = ' + str(self.σ))
                 else:
                     print('Non-stochastic series')

                 if self.g != 0:
                     print('Government spending equal to ' + str(self.g))

                 if self.duration is not None:
                     print(self.duration.capitalize() +
                           ' government spending shock at t = ' + str(self.g_t))

             def plot(self):
                 fig, ax = plt.subplots(figsize=(10, 6))
                 ax.plot(self.generate_series())
                 ax.set(xlabel='Iteration', xlim=(0, self.n))
                 ax.set_ylabel('$Y_t$', rotation=0)
                 ax.grid()

                 # Add parameter values to plot
                 paramstr = (f'$\\alpha={self.α:.2f}$ \n $\\beta={self.β:.2f}$ \n $\\gamma={self.γ:.2f}$ \n '
                             f'$\\sigma={self.σ:.2f}$ \n $\\rho_1={self.ρ1:.2f}$ \n $\\rho_2={self.ρ2:.2f}$')
                 props = dict(fc='white', pad=10, alpha=0.5)
                 ax.text(0.87, 0.05, paramstr, transform=ax.transAxes,
                         fontsize=12, bbox=props, va='bottom')

                 return fig

             def param_plot(self):

                 # Uses the param_plot() function defined earlier (it is then able
                 # to be used standalone or as part of the model)

                 fig = param_plot()
                 ax = fig.gca()

                 # Add λ values to legend
                 for i, root in enumerate(self.roots):
                     if isinstance(root, complex):
                         operator = ['+', '']  # Need to fill operator for positive as string is split apart
                         label = rf'$\lambda_{i+1} = {root.real:.2f} {operator[i]} {root.imag:.2f}i$'
                     else:
                         label = rf'$\lambda_{i+1} = {root.real:.2f}$'
                     ax.scatter(0, 0, 0, label=label)  # dummy to add to legend

                 # Add ρ pair to plot
                 ax.scatter(self.ρ1, self.ρ2, 100, 'red', '+', label=r'$(\ \rho_1, \ \rho_2 \ )$', zorder=5)

                 plt.legend(fontsize=12, loc=3)

                 return fig

13.7.1 Illustration of Samuelson Class

Now we’ll put our Samuelson class to work on an example

In [20]: sam = Samuelson(α=0.8, β=0.5, σ=2, g=10, g_t=20, duration='permanent')


sam.summary()

Summary
--------------------------------------------------
Root type: Complex conjugate
Solution type: Damped oscillations
Roots: [0.65+0.27838822j 0.65-0.27838822j]
Absolute value of roots is less than one
Stochastic series with σ = 2
Government spending equal to 10
Permanent government spending shock at t = 20

In [21]: sam.plot()
plt.show()

13.7.2 Using the Graph

We’ll use our graph to show where the parameter pair (𝜌1, 𝜌2) lies and how its location is consistent with the
behavior of the path just graphed
The red + sign shows the location of (𝜌1, 𝜌2); the implied roots (𝜆1, 𝜆2) are reported in the legend

In [22]: sam.param_plot()
plt.show()

13.8 Using the LinearStateSpace Class

It turns out that we can use the QuantEcon.py LinearStateSpace class to do much of the
work that we have done from scratch above
Here is how we map the Samuelson model into an instance of a LinearStateSpace class

In [23]: from quantecon import LinearStateSpace

""" This script maps the Samuelson model into the ``LinearStateSpace`` class"""
α = 0.8
β = 0.9
ρ1 = α + β
ρ2 = -β
γ = 10
σ = 1
g = 10
n = 100

A = [[1, 0, 0],
[γ + g, ρ1, ρ2],
[0, 1, 0]]

G = [[γ + g, ρ1, ρ2], # this is Y_{t+1}


[γ, α, 0], # this is C_{t+1}
[0, β, -β]] # this is I_{t+1}

μ_0 = [1, 100, 100]


C = np.zeros((3,1))
C[1] = σ # stochastic

sam_t = LinearStateSpace(A, C, G, mu_0=μ_0)

x, y = sam_t.simulate(ts_length=n)

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))


titles = ['Output ($Y_t$)', 'Consumption ($C_t$)', 'Investment ($I_t$)']
colors = ['darkblue', 'red', 'purple']
for ax, series, title, color in zip(axes, y, titles, colors):
    ax.plot(series, color=color)
    ax.set(title=title, xlim=(0, n))
    ax.grid()

axes[-1].set_xlabel('Iteration')

plt.show()

13.8.1 Other Methods in the LinearStateSpace Class

Let’s plot impulse response functions for the instance of the Samuelson model using a
method in the LinearStateSpace class

In [24]: imres = sam_t.impulse_response()


imres = np.asarray(imres)
y1 = imres[:, :, 0]
y2 = imres[:, :, 1]
y1.shape

Out[24]: (2, 6, 1)

Now let’s compute the zeros of the characteristic polynomial by simply calculating the eigen-
values of 𝐴

In [25]: A = np.asarray(A)
w, v = np.linalg.eig(A)
print(w)

[0.85+0.42130749j 0.85-0.42130749j 1. +0.j ]
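The unit eigenvalue belongs to the constant first element of the state; the other two are the roots of the characteristic polynomial. A small self-contained check, assuming the same parameter values as the script above (variable names are ASCII stand-ins for the Greek ones):

```python
import numpy as np

alpha, beta, gamma, g = 0.8, 0.9, 10, 10
rho1, rho2 = alpha + beta, -beta

# Same transition matrix as in the LinearStateSpace script
A = np.array([[1, 0, 0],
              [gamma + g, rho1, rho2],
              [0, 1, 0]])

eigenvalues = np.linalg.eigvals(A)
roots = np.roots([1, -rho1, -rho2])

# Drop the unit eigenvalue, then order both sets by imaginary part to compare
non_unit = sorted(eigenvalues[~np.isclose(eigenvalues, 1)], key=lambda z: z.imag)
roots_sorted = sorted(roots, key=lambda z: z.imag)

print(non_unit)
print(roots_sorted)
```

The match is no accident: the lower-right 2 × 2 block of 𝐴 is the companion matrix of the polynomial 𝑧² − 𝜌1 𝑧 − 𝜌2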



13.8.2 Inheriting Methods from LinearStateSpace

We could also create a subclass of LinearStateSpace (inheriting all its methods and at-
tributes) to add more functions to use

In [26]: class SamuelsonLSS(LinearStateSpace):

             """
             this subclass creates a Samuelson multiplier-accelerator model
             as a linear state space system
             """
             def __init__(self,
                          y_0=100,
                          y_1=100,
                          α=0.8,
                          β=0.9,
                          γ=10,
                          σ=1,
                          g=10):

                 self.α, self.β = α, β
                 self.y_0, self.y_1, self.g = y_0, y_1, g
                 self.γ, self.σ = γ, σ

                 # Define initial conditions
                 self.μ_0 = [1, y_0, y_1]

                 self.ρ1 = α + β
                 self.ρ2 = -β

                 # Define transition matrix
                 self.A = [[1, 0, 0],
                           [γ + g, self.ρ1, self.ρ2],
                           [0, 1, 0]]

                 # Define output matrix
                 self.G = [[γ + g, self.ρ1, self.ρ2],  # this is Y_{t+1}
                           [γ, α, 0],                  # this is C_{t+1}
                           [0, β, -β]]                 # this is I_{t+1}

                 self.C = np.zeros((3, 1))
                 self.C[1] = σ  # stochastic

                 # Initialize LSS with parameters from Samuelson model
                 LinearStateSpace.__init__(self, self.A, self.C, self.G, mu_0=self.μ_0)

             def plot_simulation(self, ts_length=100, stationary=True):

                 # Temporarily store original parameters
                 temp_μ = self.μ_0
                 temp_Σ = self.Sigma_0

                 # Set distribution parameters equal to their stationary values for simulation
                 if stationary:
                     try:
                         self.μ_x, self.μ_y, self.σ_x, self.σ_y = self.stationary_distributions()
                         self.μ_0 = self.μ_y
                         self.Σ_0 = self.σ_y
                     # Exception where no convergence achieved when calculating stationary distributions
                     except ValueError:
                         print('Stationary distribution does not exist')

                 x, y = self.simulate(ts_length)

                 fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))
                 titles = ['Output ($Y_t$)', 'Consumption ($C_t$)', 'Investment ($I_t$)']
                 colors = ['darkblue', 'red', 'purple']
                 for ax, series, title, color in zip(axes, y, titles, colors):
                     ax.plot(series, color=color)
                     ax.set(title=title, xlim=(0, ts_length))
                     ax.grid()

                 axes[-1].set_xlabel('Iteration')

                 # Reset distribution parameters to their initial values
                 self.μ_0 = temp_μ
                 self.Sigma_0 = temp_Σ

                 return fig

             def plot_irf(self, j=5):

                 x, y = self.impulse_response(j)

                 # Reshape into 3 x (j + 1) matrix for plotting purposes
                 yimf = np.array(y).flatten().reshape(j+1, 3).T

                 fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))
                 labels = ['$Y_t$', '$C_t$', '$I_t$']
                 colors = ['darkblue', 'red', 'purple']
                 for ax, series, label, color in zip(axes, yimf, labels, colors):
                     ax.plot(series, color=color)
                     ax.set(xlim=(0, j))
                     ax.set_ylabel(label, rotation=0, fontsize=14, labelpad=10)
                     ax.grid()

                 axes[0].set_title('Impulse Response Functions')
                 axes[-1].set_xlabel('Iteration')

                 return fig

             def multipliers(self, j=5):
                 x, y = self.impulse_response(j)
                 return np.sum(np.array(y).flatten().reshape(j+1, 3), axis=0)

13.8.3 Illustrations

Let’s show how we can use the SamuelsonLSS

In [27]: samlss = SamuelsonLSS()

In [28]: samlss.plot_simulation(100, stationary=False)


plt.show()

In [29]: samlss.plot_simulation(100, stationary=True)


plt.show()

In [30]: samlss.plot_irf(100)
plt.show()

In [31]: samlss.multipliers()

Out[31]: array([7.414389, 6.835896, 0.578493])

13.9 Pure Multiplier Model

Let’s shut down the accelerator by setting 𝑏 = 0 to get a pure multiplier model

• the absence of cycles gives an idea about why Samuelson included the accelerator

In [32]: pure_multiplier = SamuelsonLSS(α=0.95, β=0)

In [33]: pure_multiplier.plot_simulation()

Stationary distribution does not exist

Out[33]:

In [34]: pure_multiplier = SamuelsonLSS(α=0.8, β=0)

In [35]: pure_multiplier.plot_simulation()

Out[35]:

In [36]: pure_multiplier.plot_irf(100)

Out[36]:

13.10 Summary

In this lecture, we wrote functions and classes to represent non-stochastic and stochastic ver-
sions of the Samuelson (1939) multiplier-accelerator model, described in [115]
We saw that different parameter values led to different output paths, which could either be
stationary, explosive, or oscillating
We also were able to represent the model using the QuantEcon.py LinearStateSpace class
14

More Language Features

14.1 Contents

• Overview 14.2
• Iterables and Iterators 14.3
• Names and Name Resolution 14.4
• Handling Errors 14.5
• Decorators and Descriptors 14.6
• Generators 14.7
• Recursive Function Calls 14.8
• Exercises 14.9
• Solutions 14.10

14.2 Overview

With this last lecture, our advice is to skip it on first pass, unless you have a burning de-
sire to read it
It’s here

1. as a reference, so we can link back to it when required, and


2. for those who have worked through a number of applications, and now want to learn
more about the Python language

A variety of topics are treated in the lecture, including generators, exceptions and descriptors

14.3 Iterables and Iterators

We’ve already said something about iterating in Python


Now let’s look more closely at how it all works, focusing on Python’s implementation of the
for loop


14.3.1 Iterators

Iterators are a uniform interface to stepping through elements in a collection


Here we’ll talk about using iterators—later we’ll learn how to build our own
Formally, an iterator is an object with a __next__ method
For example, file objects are iterators
To see this, let’s have another look at the US cities data, which is written to the present
working directory in the following cell

In [1]: %%file us_cities.txt


new york: 8244910
los angeles: 3819702
chicago: 2707120
houston: 2145146
philadelphia: 1536471
phoenix: 1469471
san antonio: 1359758
san diego: 1326179
dallas: 1223229

Writing us_cities.txt

In [2]: f = open('us_cities.txt')
f.__next__()

Out[2]: 'new york: 8244910\n'

In [3]: f.__next__()

Out[3]: 'los angeles: 3819702\n'

We see that file objects do indeed have a __next__ method, and that calling this method
returns the next line in the file
The __next__ method can also be accessed via the builtin function next(), which directly calls
this method

In [4]: next(f)

Out[4]: 'chicago: 2707120\n'

The objects returned by enumerate() are also iterators

In [5]: e = enumerate(['foo', 'bar'])


next(e)

Out[5]: (0, 'foo')

In [6]: next(e)

Out[6]: (1, 'bar')



as are the reader objects from the csv module


Let’s create a small csv file that contains data from the NIKKEI index

In [7]: %%file test_table.csv


Date,Open,High,Low,Close,Volume,Adj Close
2009-05-21,9280.35,9286.35,9189.92,9264.15,133200,9264.15
2009-05-20,9372.72,9399.40,9311.61,9344.64,143200,9344.64
2009-05-19,9172.56,9326.75,9166.97,9290.29,167000,9290.29
2009-05-18,9167.05,9167.82,8997.74,9038.69,147800,9038.69
2009-05-15,9150.21,9272.08,9140.90,9265.02,172000,9265.02
2009-05-14,9212.30,9223.77,9052.41,9093.73,169400,9093.73
2009-05-13,9305.79,9379.47,9278.89,9340.49,176000,9340.49
2009-05-12,9358.25,9389.61,9298.61,9298.61,188400,9298.61
2009-05-11,9460.72,9503.91,9342.75,9451.98,230800,9451.98
2009-05-08,9351.40,9464.43,9349.57,9432.83,220200,9432.83

Writing test_table.csv

In [8]: from csv import reader

f = open('test_table.csv', 'r')
nikkei_data = reader(f)
next(nikkei_data)

Out[8]: ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']

In [9]: next(nikkei_data)

Out[9]: ['2009-05-21', '9280.35', '9286.35', '9189.92', '9264.15', '133200', '9264.15']

14.3.2 Iterators in For Loops

All iterators can be placed to the right of the in keyword in for loop statements
In fact this is how the for loop works: If we write

for x in iterator:
    <code block>

then the interpreter

• calls iterator.__next__() and binds x to the result
• executes the code block
• repeats until a StopIteration error occurs

So now you know how this magical looking syntax works

f = open('somefile.txt', 'r')
for line in f:
    # do something

The interpreter just keeps

1. calling f.__next__() and binding line to the result


2. executing the body of the loop

This continues until a StopIteration error occurs
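The steps above can be written out by hand. The following sketch mimics what a for loop does with an iterator, using an explicit while loop, next() and a StopIteration handler; the list of strings is just a stand-in for the lines of a file

```python
lines = iter(['first\n', 'second\n', 'third\n'])  # any iterator will do

collected = []
while True:
    try:
        line = next(lines)           # same as lines.__next__()
    except StopIteration:            # raised once the iterator is exhausted
        break
    collected.append(line.strip())   # the "body" of the loop

print(collected)
```

Writing `for line in lines: collected.append(line.strip())` produces the same list; the for statement simply hides the try/except bookkeeping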



14.3.3 Iterables

You already know that we can put a Python list to the right of in in a for loop

In [10]: for i in ['spam', 'eggs']:
             print(i)

spam
eggs

So does that mean that a list is an iterator?


The answer is no

In [11]: x = ['foo', 'bar']


type(x)

Out[11]: list

In [12]: next(x)

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-12-92de4e9f6b1e> in <module>
----> 1 next(x)

TypeError: 'list' object is not an iterator

So why can we iterate over a list in a for loop?


The reason is that a list is iterable (as opposed to an iterator)
Formally, an object is iterable if it can be converted to an iterator using the built-in function
iter()
Lists are one such object

In [13]: x = ['foo', 'bar']


type(x)

Out[13]: list

In [14]: y = iter(x)
type(y)

Out[14]: list_iterator

In [15]: next(y)

Out[15]: 'foo'

In [16]: next(y)

Out[16]: 'bar'

In [17]: next(y)

---------------------------------------------------------------------------

StopIteration Traceback (most recent call last)

<ipython-input-17-81b9d2f0f16a> in <module>
----> 1 next(y)

StopIteration:

Many other objects are iterable, such as dictionaries and tuples


Of course, not all objects are iterable

In [18]: iter(42)

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-18-ef50b48e4398> in <module>
----> 1 iter(42)

TypeError: 'int' object is not iterable

To conclude our discussion of for loops

• for loops work on either iterators or iterables


• In the second case, the iterable is converted into an iterator before the loop starts
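A short sketch of the distinction: calling iter() on a list hands back a fresh iterator each time, while calling iter() on an iterator returns that same iterator, which is why a for loop can accept both

```python
x = ['foo', 'bar']

it1 = iter(x)
it2 = iter(x)                  # a second, independent iterator over the same list

first_from_it1 = next(it1)     # advances it1 only
first_from_it2 = next(it2)     # it2 still starts from the beginning

same_object = iter(it1) is it1 # an iterator is its own iterator

print(first_from_it1, first_from_it2, same_object)
```

Because each call to iter(x) starts afresh, the list itself can be looped over as many times as we like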

14.3.4 Iterators and built-ins

Some built-in functions that act on sequences also work with iterables

• max(), min(), sum(), all(), any()

For example

In [19]: x = [10, -10]


max(x)

Out[19]: 10

In [20]: y = iter(x)
type(y)

Out[20]: list_iterator

In [21]: max(y)

Out[21]: 10

One thing to remember about iterators is that they are depleted by use

In [22]: x = [10, -10]


y = iter(x)
max(y)

Out[22]: 10

In [23]: max(y)

---------------------------------------------------------------------------

ValueError Traceback (most recent call last)

<ipython-input-23-062424e6ec08> in <module>
----> 1 max(y)

ValueError: max() arg is an empty sequence
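Depletion does not always announce itself with an error. In the sketch below (values chosen only for illustration), a second call to sum() on a used-up iterator quietly returns 0; if the data are needed more than once, one option is to store them in a list first

```python
x = [10, -4]
y = iter(x)

first = sum(y)            # consumes the iterator
second = sum(y)           # nothing left: the sum over no elements is 0

values = list(iter(x))    # materialize the elements so they can be reused
spread = max(values) - min(values)

print(first, second, spread)
```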

14.4 Names and Name Resolution

14.4.1 Variable Names in Python

Consider the Python statement

In [24]: x = 42

We now know that when this statement is executed, Python creates an object of type int in
your computer’s memory, containing

• the value 42
• some associated attributes

But what is x itself?


In Python, x is called a name, and the statement x = 42 binds the name x to the integer
object we have just discussed
Under the hood, this process of binding names to objects is implemented as a dictionary—
more about this in a moment
There is no problem binding two or more names to the one object, regardless of what that
object is

In [25]: def f(string):      # Create a function called f
             print(string)   # that prints any string it's passed

         g = f
         id(g) == id(f)

Out[25]: True

In [26]: g('test')

test

In the first step, a function object is created, and the name f is bound to it
After binding the name g to the same object, we can use it anywhere we would use f
What happens when the number of names bound to an object goes to zero?
Here’s an example of this situation, where the name x is first bound to one object and then
rebound to another

In [27]: x = 'foo'
id(x)

Out[27]: 139979150881488

In [28]: x = 'bar' # No names bound to the first object

What happens here is that the first object is garbage collected
In other words, the memory slot that stores that object is deallocated and becomes available
for reuse
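One way to watch this happen is with a weak reference, which lets us look at an object without counting as a name bound to it. The sketch below assumes CPython, where an object is freed as soon as its reference count drops to zero; other implementations may collect it later. The class `Thing` is a made-up placeholder (strings such as 'foo' do not support weak references)

```python
import weakref

class Thing:
    """Placeholder class used only for this illustration."""
    pass

x = Thing()
watcher = weakref.ref(x)               # observes the object without keeping it alive

alive_before = watcher() is not None   # the name x is still bound to the object
x = 'bar'                              # rebind x: no names refer to the Thing any more
alive_after = watcher() is not None    # in CPython the object is already gone

print(alive_before, alive_after)
```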

14.4.2 Namespaces

Recall from the preceding discussion that the statement

In [29]: x = 42

binds the name x to the integer object on the right-hand side


We also mentioned that this process of binding x to the correct object is implemented as a
dictionary
This dictionary is called a namespace
Definition: A namespace is a symbol table that maps names to objects in memory
Python uses multiple namespaces, creating them on the fly as necessary
For example, every time we import a module, Python creates a namespace for that module
To see this in action, suppose we write a script math2.py with a single line

In [30]: %%file math2.py


pi = 'foobar'

Writing math2.py

Now we start the Python interpreter and import it



In [31]: import math2

Next let’s import the math module from the standard library

In [32]: import math

Both of these modules have an attribute called pi

In [33]: math.pi

Out[33]: 3.141592653589793

In [34]: math2.pi

Out[34]: 'foobar'

These two different bindings of pi exist in different namespaces, each one implemented as a
dictionary
We can look at the dictionary directly, using module_name.__dict__

In [35]: import math

math.__dict__.items()

Out[35]: dict_items([('__name__', 'math'), ('__doc__', 'This module is always available. It provides access t

In [36]: import math2

math2.__dict__.items()

Out[36]: dict_items([('__name__', 'math2'), ('__doc__', None), ('__package__', ''), ('__loader__', <_frozen_im



As you know, we access elements of the namespace using the dotted attribute notation

In [37]: math.pi

Out[37]: 3.141592653589793

In fact this is entirely equivalent to math.__dict__['pi']

In [38]: math.__dict__['pi'] == math.pi

Out[38]: True
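The equivalence between attribute access and dictionary lookup works in both directions. Here is a small sketch using an in-memory module built with `types.ModuleType`; the module name `demo` is a hypothetical stand-in for a file such as math2.py

```python
import types

# A hypothetical in-memory module, standing in for a file like math2.py
demo = types.ModuleType("demo")

demo.pi = 'foobar'              # attribute assignment ...
print(demo.__dict__['pi'])      # ... shows up in the namespace dictionary

demo.__dict__['e'] = 2.718      # writing to the dictionary ...
print(demo.e)                   # ... creates an attribute
```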

14.4.3 Viewing Namespaces

As we saw above, the math namespace can be printed by typing math.__dict__


Another way to see its contents is to type vars(math)

In [39]: vars(math).items()

Out[39]: dict_items([('__name__', 'math'), ('__doc__', 'This module is always available. It provides access t

If you just want to see the names, you can type

In [40]: dir(math)[0:10]

Out[40]: ['__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'acos',
'acosh',
'asin',
'asinh']

Notice the special names __doc__ and __name__


These are initialized in the namespace when any module is imported

• __doc__ is the doc string of the module


• __name__ is the name of the module

In [41]: print(math.__doc__)

This module is always available. It provides access to the


mathematical functions defined by the C standard.

In [42]: math.__name__

Out[42]: 'math'

14.4.4 Interactive Sessions

In Python, all code executed by the interpreter runs in some module


What about commands typed at the prompt?
These are also regarded as being executed within a module — in this case, a module called
__main__
To check this, we can look at the current module name via the value of __name__ given at
the prompt

In [43]: print(__name__)

__main__

When we run a script using IPython’s run command, the contents of the file are executed as
part of __main__ too
To see this, let’s create a file mod.py that prints its own __name__ attribute

In [44]: %%file mod.py


print(__name__)

Writing mod.py

Now let’s look at two different ways of running it in IPython

In [45]: import mod # Standard import

mod

In [46]: %run mod.py # Run interactively

__main__

In the second case, the code is executed as part of __main__, so __name__ is equal to
__main__
To see the contents of the namespace of __main__ we use vars() rather than
vars(__main__)
If you do this in IPython, you will see a whole lot of variables that IPython needs, and has
initialized when you started up your session
If you prefer to see only the variables you have initialized, use whos

In [47]: x = 2
y = 3

import numpy as np

%whos

Variable      Type                          Data/Info
-----------------------------------------------------
e             enumerate                     <enumerate object at 0x7f4f6c16f708>
f             function                      <function f at 0x7f4f6c1c7048>
g             function                      <function f at 0x7f4f6c1c7048>
i             str                           eggs
math          module                        <module 'math' from '/hom<…>37m-x86_64-linux-gnu.so'>
math2         module                        <module 'math2' from '/ho<…>pyter/executed/math2.py'>
mod           module                        <module 'mod' from '/home<…>jupyter/executed/mod.py'>
nikkei_data   reader                        <_csv.reader object at 0x7f4f6c178588>
np            module                        <module 'numpy' from '/ho<…>kages/numpy/__init__.py'>
reader        builtin_function_or_method    <built-in function reader>
x             int                           2
y             int                           3

14.4.5 The Global Namespace

Python documentation often makes reference to the “global namespace”


The global namespace is the namespace of the module currently being executed
For example, suppose that we start the interpreter and begin making assignments
We are now working in the module __main__, and hence the namespace for __main__ is
the global namespace
Next, we import a module called amodule

import amodule

At this point, the interpreter creates a namespace for the module amodule and starts
executing commands in the module
While this occurs, the namespace amodule.__dict__ is the global namespace
Once execution of the module finishes, the interpreter returns to the module from where the
import statement was made
In this case it’s __main__, so the namespace of __main__ again becomes the global
namespace
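A function keeps using the global namespace of the module in which it was defined. The sketch below illustrates this under one assumption: instead of a real file amodule.py, a hypothetical in-memory module is built with `types.ModuleType` and populated with `exec`

```python
import types

source = """
a = 1

def show_a():
    return a      # looked up in the global namespace of the defining module
"""

amodule = types.ModuleType("amodule")   # hypothetical stand-in for amodule.py
exec(source, amodule.__dict__)          # run the code with amodule's namespace as globals

a = 100                                 # a different a, in the current module

print(amodule.show_a())                                  # 1
print(amodule.show_a.__globals__ is amodule.__dict__)    # True
```

The call returns 1 rather than 100 because the global namespace of `show_a` is `amodule.__dict__`, not the namespace of the caller.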

14.4.6 Local Namespaces

Important fact: When we call a function, the interpreter creates a local namespace for that
function, and registers the variables in that namespace
The reason for this will be explained in just a moment
Variables in the local namespace are called local variables
After the function returns, the namespace is deallocated and lost
While the function is executing, we can view the contents of the local namespace with
locals()
For example, consider

In [48]: def f(x):
    a = 2
    print(locals())
    return a * x

Now let’s call the function

In [49]: f(1)

{'x': 1, 'a': 2}

Out[49]: 2

You can see the local namespace of f before it is destroyed



14.4.7 The __builtins__ Namespace

We have been using various built-in functions, such as max(), dir(), str(), list(),
len(), range(), type(), etc.
How does access to these names work?

• These definitions are stored in a module called builtins (named __builtin__ in Python 2)

• They have their own namespace, which is available as __builtins__

In [50]: dir()[0:10]

Out[50]: ['In', 'Out', '_', '_11', '_13', '_14', '_15', '_16', '_19', '_2']

In [51]: dir(__builtins__)[0:10]

Out[51]: ['ArithmeticError',
'AssertionError',
'AttributeError',
'BaseException',
'BlockingIOError',
'BrokenPipeError',
'BufferError',
'BytesWarning',
'ChildProcessError',
'ConnectionAbortedError']

We can access elements of the namespace as follows

In [52]: __builtins__.max

Out[52]: <function max>

But __builtins__ is special, because we can always access its contents directly as well

In [53]: max

Out[53]: <function max>

In [54]: __builtins__.max == max

Out[54]: True

The next section explains how this works …
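As a side note, the module behind this namespace can also be imported explicitly. A short sketch, assuming Python 3, where the module is named `builtins`

```python
import builtins

print(builtins.__name__)            # builtins
print(builtins.max is max)          # True
print('len' in vars(builtins))      # True
```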

14.4.8 Name Resolution

Namespaces are great because they help us organize variable names


(Type import this at the prompt and look at the last item that’s printed)
However, we do need to understand how the Python interpreter works with multiple
namespaces
At any point of execution, there are in fact at least two namespaces that can be accessed
directly
(“Accessed directly” means without using a dot, as in pi rather than math.pi)
These namespaces are

• The global namespace (of the module being executed)


• The builtin namespace

If the interpreter is executing a function, then the directly accessible namespaces are

• The local namespace of the function


• The global namespace (of the module being executed)
• The builtin namespace

Sometimes functions are defined within other functions, like so

In [55]: def f():
    a = 2
    def g():
        b = 4
        print(a * b)
    g()

Here f is the enclosing function for g, and each function gets its own namespace
Now we can give the rule for how name resolution works:
The order in which the interpreter searches for names is

1. the local namespace (if it exists)


2. the hierarchy of enclosing namespaces (if they exist)
3. the global namespace
4. the builtin namespace

If the name is not in any of these namespaces, the interpreter raises a NameError
This is called the LEGB rule (local, enclosing, global, builtin)
Here’s an example that helps to illustrate
Consider a script test.py that looks as follows

In [56]: %%file test.py

def g(x):
    a = 1
    x = x + a
    return x

a = 0
y = g(10)
print("a = ", a, "y = ", y)

Writing test.py

What happens when we run this script?



In [57]: %run test.py

a = 0 y = 11

In [58]: x

Out[58]: 2

First,

• The global namespace {} is created


• The function object is created, and g is bound to it within the global namespace
• The name a is bound to 0, again in the global namespace

Next g is called via y = g(10), leading to the following sequence of actions

• The local namespace for the function is created


• Local names x and a are bound, so that the local namespace becomes {'x': 10,
'a': 1}
• Statement x = x + a uses the local a and local x to compute x + a, and binds local
name x to the result
• This value is returned, and y is bound to it in the global namespace
• Local x and a are discarded (and the local namespace is deallocated)

Note that the global a was not affected by the local a
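The search order in the LEGB rule can be traced with one name bound at several levels. This is an illustrative sketch; all function names in it are hypothetical

```python
name = 'global'               # G: module-level binding

def outer():
    name = 'enclosing'        # E: binding in the enclosing function

    def inner_local():
        name = 'local'        # L: binding in the innermost function
        return name

    def inner_enclosing():
        return name           # no local binding, so the enclosing one is found

    return inner_local(), inner_enclosing()

def uses_global():
    return name               # no local or enclosing binding: the global is found

def uses_builtin():
    return len('abc')         # len is found only in the builtin namespace

print(outer())                # ('local', 'enclosing')
print(uses_global())          # global
print(uses_builtin())         # 3
```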

14.4.9 Mutable Versus Immutable Parameters

This is a good time to say a little more about mutable vs immutable objects
Consider the code segment

In [59]: def f(x):
    x = x + 1
    return x

x = 1
print(f(x), x)

2 1

We now understand what will happen here: The code prints 2 as the value of f(x) and 1 as
the value of x
First f and x are registered in the global namespace
The call f(x) creates a local namespace and adds x to it, bound to 1
Next, this local x is rebound to the new integer object 2, and this value is returned
None of this affects the global x
However, it’s a different story when we use a mutable data type such as a list

In [60]: def f(x):
    x[0] = x[0] + 1
    return x

x = [1]
print(f(x), x)

[2] [2]

This prints [2] as the value of f(x) and the same for x
Here’s what happens

• f is registered as a function in the global namespace
• x is bound to [1] in the global namespace
• The call f(x)

– Creates a local namespace
– Adds x to the local namespace, bound to [1]
– The list [1] is modified to [2]
– Returns the list [2]
– The local namespace is deallocated, and local x is lost

• Global x has been modified
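If the caller’s list should stay untouched, one common option is to copy it inside the function. The sketch below contrasts the two behaviors; the function names are hypothetical

```python
def increment_in_place(x):
    x[0] = x[0] + 1        # mutates the caller's list
    return x

def increment_copy(x):
    y = list(x)            # shallow copy: a new list object
    y[0] = y[0] + 1        # only the copy is modified
    return y

a = [1]
b = increment_in_place(a)
print(a, b, a is b)        # [2] [2] True

c = [1]
d = increment_copy(c)
print(c, d, c is d)        # [1] [2] False
```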

14.5 Handling Errors

Sometimes it’s possible to anticipate errors as we’re writing code


For example, the unbiased sample variance of sample $y_1, \ldots, y_n$ is defined as

$$
s^2 := \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar y)^2,
\qquad \bar y = \text{sample mean}
$$

This can be calculated in NumPy using np.var (with the argument ddof=1, since the default
divides by n rather than n - 1)
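For reference, the formula can be transcribed almost term by term. This is a plain-Python sketch rather than the NumPy routine; the function name `sample_variance` is hypothetical, and the result is compared against `statistics.variance` from the standard library

```python
import statistics

def sample_variance(y):
    n = len(y)
    y_bar = sum(y) / n                                      # sample mean
    return sum((y_i - y_bar) ** 2 for y_i in y) / (n - 1)   # divide by n - 1

data = [1, 2, 3, 4]
print(sample_variance(data))
print(statistics.variance(data))
```

Note the division by n - 1 in the last line of the function, which is where a sample of size one causes trouble.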


But if you were writing a function to handle such a calculation, you might anticipate a divide-
by-zero error when the sample size is one
One possible action is to do nothing — the program will just crash, and spit out an error
message
But sometimes it’s worth writing your code in a way that anticipates and deals with runtime
errors that you think might arise
Why?

• Because the debugging information provided by the interpreter is often less useful than
the information on possible errors you have in your head when writing code
• Because errors causing execution to stop are frustrating if you’re in the middle of a
large computation
• Because it reduces confidence in your code on the part of your users (if you are writing
for others)

14.5.1 Assertions

A relatively easy way to handle checks is with the assert keyword


For example, pretend for a moment that the np.var function doesn’t exist and we need to
write our own

In [61]: def var(y):
    n = len(y)
    assert n > 1, 'Sample size must be greater than one.'
    return np.sum((y - y.mean())**2) / float(n-1)

If we run this with an array of length one, the program will terminate and print our error
message

In [62]: var([1])

---------------------------------------------------------------------------

AssertionError Traceback (most recent call last)

<ipython-input-62-8419b6ab38ec> in <module>
----> 1 var([1])

<ipython-input-61-e6ffb16a7098> in var(y)
1 def var(y):
2 n = len(y)
----> 3 assert n > 1, 'Sample size must be greater than one.'
4 return np.sum((y - y.mean())**2) / float(n-1)

AssertionError: Sample size must be greater than one.

The advantage is that we can

• fail early, as soon as we know there will be a problem


• supply specific information on why a program is failing

14.5.2 Handling Errors During Runtime

The approach used above is a bit limited, because it always leads to termination
Sometimes we can handle errors more gracefully, by treating special cases
Let’s look at how this is done
Exceptions
Here’s an example of a common error type

In [63]: def f:

File "<ipython-input-63-262a7e387ba5>", line 1


def f:
^
SyntaxError: invalid syntax

Since illegal syntax cannot be executed, a syntax error terminates execution of the program
Here’s a different kind of error, unrelated to syntax

In [64]: 1 / 0

---------------------------------------------------------------------------

ZeroDivisionError Traceback (most recent call last)

<ipython-input-64-bc757c3fda29> in <module>
----> 1 1 / 0

ZeroDivisionError: division by zero

Here’s another

In [65]: x1 = y1

---------------------------------------------------------------------------

NameError Traceback (most recent call last)

<ipython-input-65-a7b8d65e9e45> in <module>
----> 1 x1 = y1

NameError: name 'y1' is not defined

And another

In [66]: 'foo' + 6

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-66-216809d6e6fe> in <module>
----> 1 'foo' + 6

TypeError: can only concatenate str (not "int") to str

And another

In [67]: X = []
x = X[0]

---------------------------------------------------------------------------

IndexError Traceback (most recent call last)

<ipython-input-67-082a18d7a0aa> in <module>
1 X = []
----> 2 x = X[0]

IndexError: list index out of range



On each occasion, the interpreter informs us of the error type

• NameError, TypeError, IndexError, ZeroDivisionError, etc.

In Python, these errors are called exceptions


Catching Exceptions
We can catch and deal with exceptions using try – except blocks
Here’s a simple example

In [68]: def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print('Error: division by zero. Returned None')
    return None

When we call f we get the following output

In [69]: f(2)

Out[69]: 0.5

In [70]: f(0)

Error: division by zero. Returned None

In [71]: f(0.0)

Error: division by zero. Returned None

The error is caught and execution of the program is not terminated


Note that other error types are not caught
If we are worried the user might pass in a string, we can catch that error too

In [72]: def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print('Error: Division by zero. Returned None')
    except TypeError:
        print('Error: Unsupported operation. Returned None')
    return None

Here’s what happens

In [73]: f(2)

Out[73]: 0.5

In [74]: f(0)

Error: Division by zero. Returned None

In [75]: f('foo')

Error: Unsupported operation. Returned None

If we feel lazy we can catch these errors together

In [76]: def f(x):
    try:
        return 1.0 / x
    except (TypeError, ZeroDivisionError):
        print('Error: Unsupported operation. Returned None')
    return None

Here’s what happens

In [77]: f(2)

Out[77]: 0.5

In [78]: f(0)

Error: Unsupported operation. Returned None

In [79]: f('foo')

Error: Unsupported operation. Returned None

If we feel extra lazy we can catch all error types as follows

In [80]: def f(x):
    try:
        return 1.0 / x
    except:
        print('Error. Returned None')
    return None

In general it’s better to be specific
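One reason to be specific is that a bare except also swallows errors you did not anticipate, such as typos. The following sketch makes the point with a deliberately misspelled variable `xx`; both function names are hypothetical

```python
def inverse_sloppy(x):
    try:
        return 1.0 / xx             # bug: xx is a typo for x
    except:                         # catches everything, including the NameError
        print('Error. Returned None')
    return None

def inverse_specific(x):
    try:
        return 1.0 / xx             # same bug
    except ZeroDivisionError:       # only the anticipated error is handled
        print('Error: division by zero. Returned None')
    return None

print(inverse_sloppy(2))            # the typo is hidden behind a generic message

try:
    inverse_specific(2)
except NameError as err:
    print('Bug surfaced:', err)
```

In the first version the typo goes unnoticed, while in the second the NameError propagates and points to the faulty line.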

14.6 Decorators and Descriptors

Let’s look at some special syntax elements that are routinely used by Python developers
You might not need the following concepts immediately, but you will see them in other
people’s code
Hence you need to understand them at some stage of your Python education

14.6.1 Decorators

Decorators are a bit of syntactic sugar that, while easily avoided, have turned out to be
popular
It’s very easy to say what decorators do
On the other hand it takes a bit of effort to explain why you might use them
An Example
Suppose we are working on a program that looks something like this

In [81]: import numpy as np

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

# Program continues with various calculations using f and g

Now suppose there’s a problem: occasionally negative numbers get fed to f and g in the
calculations that follow
If you try it, you’ll see that when these functions are called with negative numbers they
return a NumPy object called nan
This stands for “not a number” (and indicates that you are trying to evaluate a mathematical
function at a point where it is not defined)
Perhaps this isn’t what we want, because it causes other problems that are hard to pick up
later on
Suppose that instead we want the program to terminate whenever this happens, with a
sensible error message
This change is easy enough to implement

In [82]: import numpy as np

def f(x):
    assert x >= 0, "Argument must be nonnegative"
    return np.log(np.log(x))

def g(x):
    assert x >= 0, "Argument must be nonnegative"
    return np.sqrt(42 * x)

# Program continues with various calculations using f and g

Notice however that there is some repetition here, in the form of two identical lines of code
Repetition makes our code longer and harder to maintain, and hence is something we try
hard to avoid
Here it’s not a big deal, but imagine now that instead of just f and g, we have 20 such
functions that we need to modify in exactly the same way
This means we need to repeat the test logic (i.e., the assert line testing nonnegativity) 20
times
The situation is still worse if the test logic is longer and more complicated
In this kind of scenario the following approach would be neater

In [83]: import numpy as np

def check_nonneg(func):
    def safe_function(x):
        assert x >= 0, "Argument must be nonnegative"
        return func(x)
    return safe_function

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)
# Program continues with various calculations using f and g

This looks complicated so let’s work through it slowly


To unravel the logic, consider what happens when we say f = check_nonneg(f)
This calls the function check_nonneg with parameter func set equal to f
Now check_nonneg creates a new function called safe_function that verifies x as
nonnegative and then calls func on it (which is the same as f)
Finally, the global name f is then set equal to safe_function
Now the behavior of f is as we desire, and the same is true of g
At the same time, the test logic is written only once
Enter Decorators
The last version of our code is still not ideal
For example, if someone is reading our code and wants to know how f works, they will be
looking for the function definition, which is

In [84]: def f(x):
    return np.log(np.log(x))

They may well miss the line f = check_nonneg(f)


For this and other reasons, decorators were introduced to Python
With decorators, we can replace the lines

In [85]: def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)

with

In [86]: @check_nonneg
def f(x):
    return np.log(np.log(x))

@check_nonneg
def g(x):
    return np.sqrt(42 * x)

These two pieces of code do exactly the same thing


If they do the same thing, do we really need decorator syntax?
Well, notice that the decorators sit right on top of the function definitions
Hence anyone looking at the definition of the function will see them and be aware that the
function is modified
In the opinion of many people, this makes the decorator syntax a significant improvement to
the language
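One practical refinement, not used in the code above, is `functools.wraps` from the standard library, which copies the name and docstring of the original function onto the wrapper. The sketch below is an assumption about how one might extend check_nonneg, with math.sqrt used in place of np.sqrt to keep it dependency-free

```python
import functools
import math

def check_nonneg(func):
    @functools.wraps(func)          # copy func's name and docstring to the wrapper
    def safe_function(x):
        assert x >= 0, "Argument must be nonnegative"
        return func(x)
    return safe_function

@check_nonneg
def g(x):
    """Square root of 42 * x."""
    return math.sqrt(42 * x)

print(g(42))            # 42.0
print(g.__name__)       # g (it would be safe_function without functools.wraps)
print(g.__doc__)        # Square root of 42 * x.
```

Without the extra line, tools that display `g.__name__` or the docstring would report the wrapper instead of the decorated function.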

14.6.2 Descriptors

Descriptors solve a common problem regarding management of variables


To understand the issue, consider a Car class, that simulates a car
Suppose that this class defines the variables miles and kms, which give the distance traveled
in miles and kilometers respectively
A highly simplified version of the class might look as follows

In [87]: class Car:

    def __init__(self, miles=1000):
        self.miles = miles
        self.kms = miles * 1.61

    # Some other functionality, details omitted

One potential problem we might have here is that a user alters one of these variables but not
the other

In [88]: car = Car()


car.miles

Out[88]: 1000

In [89]: car.kms

Out[89]: 1610.0

In [90]: car.miles = 6000


car.kms

Out[90]: 1610.0

In the last two lines we see that miles and kms are out of sync

What we really want is some mechanism whereby each time a user sets one of these variables,
the other is automatically updated
A Solution
In Python, this issue is solved using descriptors
A descriptor is just a Python object that implements certain methods
These methods are triggered when the object is accessed through dotted attribute notation
The best way to understand this is to see it in action
Consider this alternative version of the Car class

In [91]: class Car:

    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    def set_miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    def set_kms(self, value):
        self._kms = value
        self._miles = value / 1.61

    def get_miles(self):
        return self._miles

    def get_kms(self):
        return self._kms

    miles = property(get_miles, set_miles)
    kms = property(get_kms, set_kms)

First let’s check that we get the desired behavior

In [92]: car = Car()


car.miles

Out[92]: 1000

In [93]: car.miles = 6000


car.kms

Out[93]: 9660.0

Yep, that’s what we want — car.kms is automatically updated


How it Works
The names _miles and _kms are arbitrary names we are using to store the values of the
variables
The objects miles and kms are properties, a common kind of descriptor
The methods get_miles, set_miles, get_kms and set_kms define what happens when
you get (i.e. access) or set (bind) these variables

• So-called “getter” and “setter” methods



The builtin Python function property takes getter and setter methods and creates a
property
For example, after car is created as an instance of Car, the attribute car.miles is managed
by a property (the property object itself is stored on the class, as Car.miles)
Being a property, when we set its value via car.miles = 6000 its setter method is
triggered (in this case set_miles)
Decorators and Properties
These days it’s very common to see the property function used via a decorator
Here’s another version of our Car class that works as before but now uses decorators to set
up the properties

In [94]: class Car:

    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    @property
    def miles(self):
        return self._miles

    @property
    def kms(self):
        return self._kms

    @miles.setter
    def miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    @kms.setter
    def kms(self, value):
        self._kms = value
        self._miles = value / 1.61

We won’t go through all the details here


For further information you can refer to the descriptor documentation
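To give a flavor of what lies underneath property, here is a hand-written descriptor. It is a sketch only: the class `Distance` is hypothetical and not part of the lecture, while `__get__` and `__set__` are the methods of Python’s descriptor protocol

```python
class Distance:
    """Hypothetical descriptor that stores a distance in miles on the instance."""

    def __init__(self, factor):
        self.factor = factor            # units per mile (1.0 for miles, 1.61 for kms)

    def __get__(self, instance, owner=None):
        if instance is None:            # accessed on the class, not an instance
            return self
        return instance._miles * self.factor

    def __set__(self, instance, value):
        instance._miles = value / self.factor


class Car:
    miles = Distance(1.0)
    kms = Distance(1.61)

    def __init__(self, miles=1000):
        self.miles = miles              # triggers Distance.__set__


car = Car()
car.miles = 6000
print(car.kms)                          # kms follows miles automatically
car.kms = 161
print(car.miles)                        # roughly 100
```

Because only one value is stored (`_miles`), the two attributes cannot drift out of sync.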

14.7 Generators

A generator is a kind of iterator (i.e., it works with a next function)


We will study two ways to build generators: generator expressions and generator functions

14.7.1 Generator Expressions

The easiest way to build generators is using generator expressions


Just like a list comprehension, but with round brackets
Here is the list comprehension:

In [95]: singular = ('dog', 'cat', 'bird')


type(singular)

Out[95]: tuple

In [96]: plural = [string + 's' for string in singular]


plural

Out[96]: ['dogs', 'cats', 'birds']

In [97]: type(plural)

Out[97]: list

And here is the generator expression

In [98]: singular = ('dog', 'cat', 'bird')


plural = (string + 's' for string in singular)
type(plural)

Out[98]: generator

In [99]: next(plural)

Out[99]: 'dogs'

In [100]: next(plural)

Out[100]: 'cats'

In [101]: next(plural)

Out[101]: 'birds'

Since sum() can be called on iterators, we can do this

In [102]: sum((x * x for x in range(10)))

Out[102]: 285

The function sum() calls next() to get the items, and adds successive terms
In fact, we can omit the outer brackets in this case

In [103]: sum(x * x for x in range(10))

Out[103]: 285
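One difference from lists is worth keeping in mind: a generator can be consumed only once. A short sketch of this behavior

```python
squares_list = [x * x for x in range(5)]
squares_gen = (x * x for x in range(5))

print(sum(squares_list), sum(squares_list))   # 30 30
print(sum(squares_gen), sum(squares_gen))     # 30 0
```

The second call to sum() on the generator returns 0 because all items were already used up by the first call.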

14.7.2 Generator Functions

The most flexible way to create generator objects is to use generator functions
Let’s look at some examples
Example 1
Here’s a very simple example of a generator function

In [104]: def f():
    yield 'start'
    yield 'middle'
    yield 'end'

It looks like a function, but uses a keyword yield that we haven’t met before
Let’s see how it works after running this code

In [105]: type(f)

Out[105]: function

In [106]: gen = f()


gen

Out[106]: <generator object f at 0x7f4f6c1bb1b0>

In [107]: next(gen)

Out[107]: 'start'

In [108]: next(gen)

Out[108]: 'middle'

In [109]: next(gen)

Out[109]: 'end'

In [110]: next(gen)

---------------------------------------------------------------------------

StopIteration Traceback (most recent call last)

<ipython-input-110-6e72e47198db> in <module>
----> 1 next(gen)

StopIteration:

The generator function f() is used to create generator objects (in this case gen)
Generators are iterators, because they support a next method (named __next__ in Python 3,
and invoked by the built-in function next)
The first call to next(gen)

• Executes code in the body of f() until it meets a yield statement


• Returns that value to the caller of next(gen)

The second call to next(gen) starts executing from the next line

In [111]: def f():
    yield 'start'
    yield 'middle' # This line!
    yield 'end'

and continues until the next yield statement


At that point it returns the value following yield to the caller of next(gen), and so on
When the code block ends, the generator throws a StopIteration error
Example 2
Our next example receives an argument x from the caller

In [112]: def g(x):
    while x < 100:
        yield x
        x = x * x

Let’s see how it works

In [113]: g

Out[113]: <function __main__.g(x)>

In [114]: gen = g(2)


type(gen)

Out[114]: generator

In [115]: next(gen)

Out[115]: 2

In [116]: next(gen)

Out[116]: 4

In [117]: next(gen)

Out[117]: 16

In [118]: next(gen)

---------------------------------------------------------------------------

StopIteration Traceback (most recent call last)

<ipython-input-118-6e72e47198db> in <module>
----> 1 next(gen)

StopIteration:

The call gen = g(2) binds gen to a generator


Inside the generator, the name x is bound to 2
When we call next(gen)

• The body of g() executes until the line yield x, and the value of x is returned

Note that the value of x is retained inside the generator


When we call next(gen) again, execution continues from where it left off

In [119]: def g(x):
    while x < 100:
        yield x
        x = x * x # execution continues from here

When x < 100 fails, the generator throws a StopIteration error


Incidentally, the loop inside the generator can be infinite

In [120]: def g(x):
    while 1:
        yield x
        x = x * x
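An infinite generator is harmless as long as we only ask for finitely many items. One way to do so, sketched here with `itertools.islice` from the standard library, is

```python
from itertools import islice

def g(x):
    while True:
        yield x
        x = x * x

first_five = list(islice(g(2), 5))   # take the first five items and stop
print(first_five)                    # [2, 4, 16, 256, 65536]
```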

14.7.3 Advantages of Iterators

What’s the advantage of using an iterator here?


Suppose we want to sample a binomial(n,0.5)
One way to do it is as follows

In [121]: import random


n = 10000000
draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
sum(draws)

Out[121]: 5001162

But we are creating a huge list here, draws (in Python 3, range(n) is lazy and is not a list)
This uses lots of memory and is very slow
If we make n even bigger then this happens

In [122]: n = 100000000
draws = [random.uniform(0, 1) < 0.5 for i in range(n)]

We can avoid these problems using iterators


Here is the generator function

In [123]: def f(n):
    i = 1
    while i <= n:
        yield random.uniform(0, 1) < 0.5
        i += 1

Now let’s do the sum

In [124]: n = 10000000
draws = f(n)
draws

Out[124]: <generator object f at 0x7f4f4fdfbb88>

In [125]: sum(draws)

Out[125]: 5000216

In summary, iterators

• avoid the need to create big lists/tuples, and


• provide a uniform interface to iteration that can be used transparently in for loops
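The memory point can be checked with `sys.getsizeof`. The sketch below uses a deterministic sequence instead of random draws so that the result is reproducible; the reported sizes are specific to the interpreter and exclude the size of the list elements themselves

```python
import sys

n = 100_000
as_list = [i % 2 == 0 for i in range(n)]    # all n values are stored at once
as_gen = (i % 2 == 0 for i in range(n))     # values are produced one at a time

print(sys.getsizeof(as_list))               # grows with n
print(sys.getsizeof(as_gen))                # small and independent of n
print(sum(as_list) == sum(as_gen))          # True
```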

14.8 Recursive Function Calls

This is not something that you will use every day, but it is still useful — you should learn it
at some stage
Basically, a recursive function is a function that calls itself
For example, consider the problem of computing $x_t$ for some $t$ when

$$
x_{t+1} = 2 x_t, \quad x_0 = 1 \tag{1}
$$

Obviously the answer is $2^t$


We can compute this easily enough with a loop

In [126]: def x_loop(t):
    x = 1
    for i in range(t):
        x = 2 * x
    return x

We can also use a recursive solution, as follows

In [127]: def x(t):
    if t == 0:
        return 1
    else:
        return 2 * x(t-1)

What happens here is that each successive call uses its own frame in the stack

• a frame is where the local variables of a given function call are held
• stack is memory used to process function calls
– a First In Last Out (FILO) queue

This example is somewhat contrived, since the first (iterative) solution would usually be
preferred to the recursive solution
We’ll meet less contrived applications of recursion later on
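Since every call occupies a frame, the depth of recursion is bounded. The following sketch, assuming CPython’s default settings, shows the limit being hit

```python
import sys

def x(t):
    if t == 0:
        return 1
    else:
        return 2 * x(t - 1)

print(x(10))                             # 1024
print(sys.getrecursionlimit())           # commonly 1000, but this can vary

try:
    x(10 * sys.getrecursionlimit())      # far more frames than the limit allows
except RecursionError as err:
    print('RecursionError:', err)
```

The loop-based version has no such limit, which is one more reason to prefer it here.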

14.9 Exercises

14.9.1 Exercise 1

The Fibonacci numbers are defined by

$$
x_{t+1} = x_t + x_{t-1}, \quad x_0 = 0, \; x_1 = 1 \tag{2}
$$

The first few numbers in the sequence are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
Write a function to recursively compute the $t$-th Fibonacci number for any $t$

14.9.2 Exercise 2

Complete the following code, and test it using this csv file, which we assume that you’ve put
in your current working directory

def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    that steps through the elements of column column_number in file
    target_file.
    """
    # put your code here

dates = column_iterator('test_table.csv', 1)

for date in dates:
    print(date)

14.9.3 Exercise 3

Suppose we have a text file numbers.txt containing the following lines

prices
3
8

7
21

Using try – except, write a program to read in the contents of the file and sum the
numbers, ignoring lines without numbers

14.10 Solutions

14.10.1 Exercise 1

Here’s the standard solution

In [128]: def x(t):
    if t == 0:
        return 0
    if t == 1:
        return 1
    else:
        return x(t-1) + x(t-2)

Let’s test it

In [129]: print([x(i) for i in range(10)])

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
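The standard solution recomputes the same values many times, so its running time grows very quickly with t. One possible remedy, shown here as a sketch that goes beyond what the exercise asks for, is to cache results with the decorator `functools.lru_cache` from the standard library; the name `fib` is hypothetical

```python
from functools import lru_cache

@lru_cache(maxsize=None)        # remember every value already computed
def fib(t):
    if t < 2:
        return t
    return fib(t - 1) + fib(t - 2)

print([fib(i) for i in range(10)])   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(fib(50))                       # 12586269025
```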

14.10.2 Exercise 2

One solution is as follows

In [130]: def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    which steps through the elements of column column_number in file
    target_file.
    """
    f = open(target_file, 'r')
    for line in f:
        yield line.split(',')[column_number - 1]
    f.close()

dates = column_iterator('test_table.csv', 1)

i = 1
for date in dates:
    print(date)
    if i == 10:
        break
    i += 1

Date
2009-05-21
2009-05-20
2009-05-19
2009-05-18
2009-05-15
2009-05-14
2009-05-13
2009-05-12
2009-05-11

14.10.3 Exercise 3

Let’s save the data first



In [131]: %%file numbers.txt


prices
3
8

7
21

Writing numbers.txt

In [132]: f = open('numbers.txt')

total = 0.0
for line in f:
    try:
        total += float(line)
    except ValueError:
        pass

f.close()

print(total)

39.0
15

Debugging

15.1 Contents

• Overview 15.2

• Debugging 15.3

• Other Useful Magics 15.4

“Debugging is twice as hard as writing the code in the first place. Therefore, if
you write the code as cleverly as possible, you are, by definition, not smart enough
to debug it.” – Brian Kernighan

15.2 Overview

Are you one of those programmers who fills their code with print statements when trying to
debug their programs?
Hey, we all used to do that
(OK, sometimes we still do that…)
But once you start writing larger programs you’ll need a better system
Debugging tools for Python vary across platforms, IDEs and editors
Here we’ll focus on Jupyter and leave you to explore other settings
We’ll need the following imports

In [1]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

15.3 Debugging

15.3.1 The debug Magic

Let’s consider a simple (and rather contrived) example


In [2]: def plot_log():
    fig, ax = plt.subplots(2, 1)
    x = np.linspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log() # Call the function, generate plot

---------------------------------------------------------------------------

AttributeError Traceback (most recent call last)

<ipython-input-2-c32a2280f47b> in <module>
5 plt.show()
6
----> 7 plot_log() # Call the function, generate plot

<ipython-input-2-c32a2280f47b> in plot_log()
2 fig, ax = plt.subplots(2, 1)
3 x = np.linspace(1, 2, 10)
----> 4 ax.plot(x, np.log(x))
5 plt.show()
6

AttributeError: 'numpy.ndarray' object has no attribute 'plot'

This code is intended to plot the log function over the interval [1, 2]
But there’s an error here: plt.subplots(2, 1) should be just plt.subplots()
(The call plt.subplots(2, 1) returns a NumPy array containing two axes objects, suitable for having two subplots on the same figure)
The traceback shows that the error occurs at the method call ax.plot(x, np.log(x))
The error occurs because we have mistakenly made ax a NumPy array, and a NumPy array
has no plot method

But let’s pretend that we don’t understand this for the moment
We might suspect there’s something wrong with ax, but when we try to investigate this object, we get the following exception:

In [3]: ax

---------------------------------------------------------------------------

NameError Traceback (most recent call last)

<ipython-input-3-b00e77935981> in <module>
----> 1 ax

NameError: name 'ax' is not defined

The problem is that ax was defined inside plot_log(), and the name is lost once that function terminates
Let’s try doing it a different way
We run the first cell block again, generating the same error

In [4]: def plot_log():
    fig, ax = plt.subplots(2, 1)
    x = np.linspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log() # Call the function, generate plot

---------------------------------------------------------------------------

AttributeError Traceback (most recent call last)

<ipython-input-4-c32a2280f47b> in <module>
5 plt.show()
6
----> 7 plot_log() # Call the function, generate plot

<ipython-input-4-c32a2280f47b> in plot_log()
2 fig, ax = plt.subplots(2, 1)
3 x = np.linspace(1, 2, 10)
----> 4 ax.plot(x, np.log(x))
5 plt.show()
6

AttributeError: 'numpy.ndarray' object has no attribute 'plot'



But this time we type in the following cell block

%debug

You should be dropped into a new prompt that looks something like this

ipdb>

(You might see pdb> instead)


Now we can investigate the value of our variables at this point in the program, step forward
through the code, etc.
For example, here we simply type the name ax to see what’s happening with this object:

ipdb> ax
array([<matplotlib.axes.AxesSubplot object at 0x290f5d0>,
<matplotlib.axes.AxesSubplot object at 0x2930810>], dtype=object)

It’s now very clear that ax is an array, which clarifies the source of the problem
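
Post-mortem inspection is possible because the traceback keeps a reference to the frame in which the exception was raised, so the local variables of that frame are still available
The following is a minimal sketch of that idea using only the standard library; the function faulty and the list standing in for the array of axes are hypothetical, and this is not meant as a description of how %debug itself is implemented

```python
import sys


def faulty():
    ax = [object(), object()]   # Stand-in for the array of two axes
    ax.plot()                   # A list has no plot method -> AttributeError


try:
    faulty()
except AttributeError:
    tb = sys.exc_info()[2]
    # Walk down to the innermost frame, where the exception was raised
    while tb.tb_next is not None:
        tb = tb.tb_next
    local_vars = tb.tb_frame.f_locals
    ax_type = type(local_vars['ax']).__name__
    print(ax_type)
```

The name ax is not visible at the top level, but it can still be read from the locals of the frame stored in the traceback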
To find out what else you can do from inside ipdb (or pdb), use the online help

ipdb> h

Documented commands (type help <topic>):
========================================
EOF    bt         cont      enable  jump  pdef   r        tbreak   w
a      c          continue  exit    l     pdoc   restart  u        whatis
alias  cl         d         h       list  pinfo  return   unalias  where
args   clear      debug     help    n     pp     run      unt
b      commands   disable   ignore  next  q      s        until
break  condition  down      j       p     quit   step     up

Miscellaneous help topics:
==========================
exec  pdb

Undocumented commands:
======================
retval  rv

ipdb> h c
c(ont(inue))
Continue execution, only stop when a breakpoint is encountered.

15.3.2 Setting a Break Point

The preceding approach is handy but sometimes insufficient


Consider the following modified version of our function above

In [5]: def plot_log():
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()

Here the original problem is fixed, but we’ve accidentally written np.logspace(1, 2, 10) instead of np.linspace(1, 2, 10)

Now there won’t be any exception, but the plot won’t look right
To investigate, it would be helpful if we could inspect variables like x during execution of the
function
To this end, we add a “break point” by inserting breakpoint() inside the function code
block

def plot_log():
    breakpoint()
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()

Now let’s run the script, and investigate via the debugger

> <ipython-input-6-a188074383b7>(6)plot_log()
-> fig, ax = plt.subplots()
(Pdb) n
> <ipython-input-6-a188074383b7>(7)plot_log()
-> x = np.logspace(1, 2, 10)
(Pdb) n
> <ipython-input-6-a188074383b7>(8)plot_log()
-> ax.plot(x, np.log(x))
(Pdb) x
array([ 10. , 12.91549665, 16.68100537, 21.5443469 ,
27.82559402, 35.93813664, 46.41588834, 59.94842503,
77.42636827, 100. ])

We used n twice to step forward through the code (one line at a time)
Then we printed the value of x to see what was happening with that variable
To exit from the debugger, use q
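
The values printed in the debugger run from 10 to 100 rather than from 1 to 2, which is what gives the bug away
As a small sketch of the difference between the two NumPy functions (the variable names are chosen here for illustration):

```python
import numpy as np

x_lin = np.linspace(1, 2, 10)   # 10 evenly spaced points between 1 and 2
x_log = np.logspace(1, 2, 10)   # 10 points between 10**1 and 10**2

print(x_lin.min(), x_lin.max())
print(x_log.min(), x_log.max())

# With the default base of 10, logspace(a, b, n) matches 10 ** linspace(a, b, n)
same = np.allclose(x_log, 10 ** np.linspace(1, 2, 10))
print(same)
```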

15.4 Other Useful Magics

In this lecture, we used the %debug IPython magic


There are many other useful magics:

• %precision 4 sets printed precision for floats to 4 decimal places
• %whos gives a list of variables and their values
• %quickref gives a list of magics

The full list of magics is here


Part IV

Data and Empirics

16

Pandas

16.1 Contents

• Overview 16.2

• Series 16.3

• DataFrames 16.4

• On-Line Data Sources 16.5

• Exercises 16.6

• Solutions 16.7

16.2 Overview

Pandas is a package of fast, efficient data analysis tools for Python


Its popularity has surged in recent years, coincident with the rise of fields such as data science
and machine learning
Here’s a popularity comparison over time against STATA and SAS, courtesy of Stack Overflow Trends

Just as NumPy provides the basic array data type plus core array operations, pandas

1. defines fundamental structures for working with data and


2. endows them with methods that facilitate operations such as

• reading in data
• adjusting indices
• working with dates and time series
• sorting, grouping, re-ordering and general data munging [1]
• dealing with missing values, etc., etc.

More sophisticated statistical functionality is left to other packages, such as statsmodels and scikit-learn, which can work with pandas data structures
This lecture will provide a basic introduction to pandas
Throughout the lecture, we will assume that the following imports have taken place

In [1]: import pandas as pd


import numpy as np

16.3 Series

Two important data types defined by pandas are Series and DataFrame
You can think of a Series as a “column” of data, such as a collection of observations on a
single variable
A DataFrame is an object for storing related columns of data
Let’s start with Series

In [2]: s = pd.Series(np.random.randn(4), name='daily returns')


s

Out[2]: 0 0.246617
1 1.616297

2 1.371344
3 -0.854713
Name: daily returns, dtype: float64

Here you can imagine the indices 0, 1, 2, 3 as indexing four listed companies, and the
values being daily returns on their shares
Pandas Series are built on top of NumPy arrays and support many similar operations

In [3]: s * 100

Out[3]: 0 24.661661
1 161.629724
2 137.134394
3 -85.471300
Name: daily returns, dtype: float64

In [4]: np.abs(s)

Out[4]: 0 0.246617
1 1.616297
2 1.371344
3 0.854713
Name: daily returns, dtype: float64

But Series provide more than NumPy arrays


Not only do they have some additional (statistically oriented) methods

In [5]: s.describe()

Out[5]: count 4.000000


mean 0.594886
std 1.135605
min -0.854713
25% -0.028716
50% 0.808980
75% 1.432582
max 1.616297
Name: daily returns, dtype: float64

But their indices are more flexible

In [6]: s.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']


s

Out[6]: AMZN 0.246617


AAPL 1.616297
MSFT 1.371344
GOOG -0.854713
Name: daily returns, dtype: float64

Viewed in this way, Series are like fast, efficient Python dictionaries (with the restriction
that the items in the dictionary all have the same type—in this case, floats)
In fact, you can use much of the same syntax as Python dictionaries

In [7]: s['AMZN']

Out[7]: 0.24661661104520952

In [8]: s['AMZN'] = 0
s

Out[8]: AMZN 0.000000


AAPL 1.616297
MSFT 1.371344
GOOG -0.854713
Name: daily returns, dtype: float64

In [9]: 'AAPL' in s

Out[9]: True
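
The dictionary-like behavior can be sketched in a self-contained way as follows
The values below are made up for illustration rather than drawn at random as in the lecture

```python
import pandas as pd

s_demo = pd.Series([0.25, 1.62, 1.37, -0.85],
                   index=['AMZN', 'AAPL', 'MSFT', 'GOOG'],
                   name='daily returns')   # Made-up values

aapl = s_demo['AAPL']           # Look up a value by its label
has_ibm = 'IBM' in s_demo       # Membership tests the index labels, not the values
positive = s_demo[s_demo > 0]   # Boolean filtering, as with NumPy arrays
as_dict = s_demo.to_dict()      # Convert to an ordinary Python dictionary

print(aapl, has_ibm)
print(positive)
```

Note that the in operator checks the labels in the index, just as it checks the keys of a dictionary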

16.4 DataFrames

While a Series is a single column of data, a DataFrame is several columns, one for each
variable
In essence, a DataFrame in pandas is analogous to a (highly optimized) Excel spreadsheet
Thus, it is a powerful tool for representing and analyzing data that are naturally organized
into rows and columns, often with descriptive indexes for individual rows and individual
columns
Let’s look at an example that reads data from the CSV file pandas/data/test_pwt.csv
that can be downloaded here
Here’s the content of test_pwt.csv

"country","country isocode","year","POP","XRAT","tcgdp","cc","cg"
"Argentina","ARG","2000","37335.653","0.9995","295072.21869","75.716805379","5.5
"Australia","AUS","2000","19053.186","1.72483","541804.6521","67.759025993","6.7
"India","IND","2000","1006300.297","44.9416","1728144.3748","64.575551328","14.0
"Israel","ISR","2000","6114.57","4.07733","129253.89423","64.436450847","10.2666
"Malawi","MWI","2000","11801.505","59.543808333","5026.2217836","74.707624181","
"South Africa","ZAF","2000","45064.098","6.93983","227242.36949","72.718710427",
"United States","USA","2000","282171.957","1","9898700","72.347054303","6.032453
"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","

If you have this data saved as test_pwt.csv in the present working directory (type %pwd in Jupyter to see what this is), you can pass that file name to pd.read_csv; here we instead read the data directly from its URL:

In [10]: df = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas/data/test_pw
type(df)

Out[10]: pandas.core.frame.DataFrame

In [11]: df

Out[11]: country country isocode year POP XRAT tcgdp \


0 Argentina ARG 2000 37335.653 0.999500 2.950722e+05
1 Australia AUS 2000 19053.186 1.724830 5.418047e+05

2 India IND 2000 1006300.297 44.941600 1.728144e+06


3 Israel ISR 2000 6114.570 4.077330 1.292539e+05
4 Malawi MWI 2000 11801.505 59.543808 5.026222e+03
5 South Africa ZAF 2000 45064.098 6.939830 2.272424e+05
6 United States USA 2000 282171.957 1.000000 9.898700e+06
7 Uruguay URY 2000 3219.793 12.099592 2.525596e+04

cc cg
0 75.716805 5.578804
1 67.759026 6.720098
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954
5 72.718710 5.726546
6 72.347054 6.032454
7 78.978740 5.108068

We can select particular rows using standard Python array slicing notation

In [12]: df[2:5]

Out[12]: country country isocode year POP XRAT tcgdp \


2 India IND 2000 1006300.297 44.941600 1.728144e+06
3 Israel ISR 2000 6114.570 4.077330 1.292539e+05
4 Malawi MWI 2000 11801.505 59.543808 5.026222e+03

cc cg
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954

To select columns, we can pass a list containing the names of the desired columns represented
as strings

In [13]: df[['country', 'tcgdp']]

Out[13]: country tcgdp


0 Argentina 2.950722e+05
1 Australia 5.418047e+05
2 India 1.728144e+06
3 Israel 1.292539e+05
4 Malawi 5.026222e+03
5 South Africa 2.272424e+05
6 United States 9.898700e+06
7 Uruguay 2.525596e+04

To select both rows and columns using integers, the iloc attribute should be used with the
format .iloc[rows, columns]

In [14]: df.iloc[2:5, 0:4]

Out[14]: country country isocode year POP


2 India IND 2000 1006300.297
3 Israel ISR 2000 6114.570
4 Malawi MWI 2000 11801.505

To select rows and columns using a mixture of integers and labels, the loc attribute can be
used in a similar way

In [15]: df.loc[df.index[2:5], ['country', 'tcgdp']]



Out[15]: country tcgdp


2 India 1.728144e+06
3 Israel 1.292539e+05
4 Malawi 5.026222e+03
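
One difference between the two attributes is easy to miss: integer slices in iloc exclude the end point, while label slices in loc include it
Here is a minimal sketch on a small made-up data frame (the names df_demo, w, x, y and z are hypothetical)

```python
import pandas as pd

df_demo = pd.DataFrame({'country': ['A', 'B', 'C', 'D'],
                        'pop': [1.0, 2.0, 3.0, 4.0],
                        'gdp': [10.0, 20.0, 30.0, 40.0]},
                       index=['w', 'x', 'y', 'z'])

# Positions: rows 1 and 2, columns 0 and 1 (end points excluded)
by_position = df_demo.iloc[1:3, 0:2]

# Labels: rows 'x' through 'y' (end label included), two named columns
by_label = df_demo.loc['x':'y', ['country', 'gdp']]

# Integer positions can be converted to labels via the index
mixed = df_demo.loc[df_demo.index[1:3], ['country', 'gdp']]

print(by_position)
print(by_label)
```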

Let’s imagine that we’re only interested in population and total GDP (tcgdp)
One way to strip the data frame df down to only these variables is to overwrite the
dataframe using the selection method described above

In [16]: df = df[['country', 'POP', 'tcgdp']]


df

Out[16]: country POP tcgdp


0 Argentina 37335.653 2.950722e+05
1 Australia 19053.186 5.418047e+05
2 India 1006300.297 1.728144e+06
3 Israel 6114.570 1.292539e+05
4 Malawi 11801.505 5.026222e+03
5 South Africa 45064.098 2.272424e+05
6 United States 282171.957 9.898700e+06
7 Uruguay 3219.793 2.525596e+04

Here the index 0, 1,..., 7 is redundant because we can use the country names as an index
To do this, we set the index to be the country variable in the dataframe

In [17]: df = df.set_index('country')
df

Out[17]: POP tcgdp


country
Argentina 37335.653 2.950722e+05
Australia 19053.186 5.418047e+05
India 1006300.297 1.728144e+06
Israel 6114.570 1.292539e+05
Malawi 11801.505 5.026222e+03
South Africa 45064.098 2.272424e+05
United States 282171.957 9.898700e+06
Uruguay 3219.793 2.525596e+04

Let’s give the columns slightly better names

In [18]: df.columns = 'population', 'total GDP'


df

Out[18]: population total GDP


country
Argentina 37335.653 2.950722e+05
Australia 19053.186 5.418047e+05
India 1006300.297 1.728144e+06
Israel 6114.570 1.292539e+05
Malawi 11801.505 5.026222e+03
South Africa 45064.098 2.272424e+05
United States 282171.957 9.898700e+06
Uruguay 3219.793 2.525596e+04

Population is in thousands, so let’s convert it to single units

In [19]: df['population'] = df['population'] * 1e3


df

Out[19]: population total GDP


country
Argentina 3.733565e+07 2.950722e+05
Australia 1.905319e+07 5.418047e+05
India 1.006300e+09 1.728144e+06
Israel 6.114570e+06 1.292539e+05
Malawi 1.180150e+07 5.026222e+03
South Africa 4.506410e+07 2.272424e+05
United States 2.821720e+08 9.898700e+06
Uruguay 3.219793e+06 2.525596e+04

Next, we’re going to add a column showing real GDP per capita, multiplying by 1,000,000 as
we go because total GDP is in millions

In [20]: df['GDP percap'] = df['total GDP'] * 1e6 / df['population']


df

Out[20]: population total GDP GDP percap


country
Argentina 3.733565e+07 2.950722e+05 7903.229085
Australia 1.905319e+07 5.418047e+05 28436.433261
India 1.006300e+09 1.728144e+06 1717.324719
Israel 6.114570e+06 1.292539e+05 21138.672749
Malawi 1.180150e+07 5.026222e+03 425.896679
South Africa 4.506410e+07 2.272424e+05 5042.647686
United States 2.821720e+08 9.898700e+06 35080.381854
Uruguay 3.219793e+06 2.525596e+04 7843.970620
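
To check the unit conversions, the calculation can be repeated by hand for a single country
The sketch below uses the POP and tcgdp entries for Argentina from test_pwt.csv; the variable names are chosen here for illustration

```python
pop_thousands = 37335.653      # POP for Argentina, in thousands of people
gdp_millions = 295072.21869    # tcgdp for Argentina, in millions

population = pop_thousands * 1e3               # Convert to single persons
gdp_percap = gdp_millions * 1e6 / population   # Convert to single units, then divide

print(round(gdp_percap, 2))
```

The result should agree with the GDP percap entry for Argentina in the table above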

One of the nice things about pandas DataFrame and Series objects is that they have
methods for plotting and visualization that work through Matplotlib
For example, we can easily generate a bar plot of GDP per capita

In [21]: import matplotlib.pyplot as plt


%matplotlib inline

df['GDP percap'].plot(kind='bar')
plt.show()

At the moment the data frame is ordered alphabetically on the countries—let’s change it to
GDP per capita

In [22]: df = df.sort_values(by='GDP percap', ascending=False)


df

Out[22]: population total GDP GDP percap


country
United States 2.821720e+08 9.898700e+06 35080.381854
Australia 1.905319e+07 5.418047e+05 28436.433261
Israel 6.114570e+06 1.292539e+05 21138.672749
Argentina 3.733565e+07 2.950722e+05 7903.229085
Uruguay 3.219793e+06 2.525596e+04 7843.970620
South Africa 4.506410e+07 2.272424e+05 5042.647686
India 1.006300e+09 1.728144e+06 1717.324719
Malawi 1.180150e+07 5.026222e+03 425.896679

Plotting as before now yields

In [23]: df['GDP percap'].plot(kind='bar')


plt.show()

16.5 On-Line Data Sources

Python makes it straightforward to query online databases programmatically


An important database for economists is FRED, a vast collection of time series data maintained by the St. Louis Fed
For example, suppose that we are interested in the unemployment rate
Via FRED, the entire series for the US civilian unemployment rate can be downloaded directly by entering this URL into your browser (note that this requires an internet connection)

https://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv

This request returns a CSV file, which will be handled by your default application for this
class of files
Alternatively, we can access the CSV file from within a Python program
This can be done with a variety of methods
We start with a relatively low-level method and then return to pandas

16.5.1 Accessing Data with requests

One option is to use requests, a widely used third-party Python library for requesting data over the Internet
To begin, try the following code on your computer

In [24]: import requests

r = requests.get('http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv')

If there’s no error message, then the call has succeeded


If you do get an error, then there are two likely causes

1. You are not connected to the Internet — hopefully, this isn’t the case
2. Your machine is accessing the Internet through a proxy server, and Python isn’t aware
of this

In the second case, you can either

• switch to another machine


• solve your proxy problem by reading the documentation

Assuming that all is working, you can now proceed to use the source object returned by the call to requests.get

In [25]: url = 'http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv'


source = requests.get(url).content.decode().split("\n")
source[0]

Out[25]: 'DATE,VALUE\r'

In [26]: source[1]

Out[26]: '1948-01-01,3.4\r'

In [27]: source[2]

Out[27]: '1948-02-01,3.8\r'

We could now write some additional code to parse this text and store it as an array
But this is unnecessary — pandas’ read_csv function can handle the task for us
We use parse_dates=True so that pandas recognizes our dates column, allowing for simple
date filtering

In [28]: data = pd.read_csv(url, index_col=0, parse_dates=True)
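
The effect of index_col and parse_dates can be tried without an internet connection by reading from a string instead of a URL
The following sketch assumes a tiny made-up sample in the same DATE,VALUE layout (the third row is hypothetical)

```python
import io
import pandas as pd

csv_text = "DATE,VALUE\n1948-01-01,3.4\n1948-02-01,3.8\n1949-01-01,4.3\n"

# index_col=0 uses the DATE column as the index,
# parse_dates=True asks pandas to parse that index as dates
demo = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

is_datetime = isinstance(demo.index, pd.DatetimeIndex)
year_1948 = demo.loc['1948']   # Select all rows from 1948 using a partial date string

print(is_datetime)
print(year_1948)
```

With a date index in place, rows can be selected by year or by a range of dates using strings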

The data has been read into a pandas DataFrame called data that we can now manipulate in
the usual way

In [29]: type(data)

Out[29]: pandas.core.frame.DataFrame

In [30]: data.head() # A useful method to get a quick look at a data frame

Out[30]: VALUE
DATE
1948-01-01 3.4
1948-02-01 3.8
1948-03-01 4.0
1948-04-01 3.9
1948-05-01 3.5

In [31]: pd.set_option('display.precision', 1)
data.describe() # Your output might differ slightly

Out[31]: VALUE
count 857.0
mean 5.8
std 1.6
min 2.5
25% 4.6
50% 5.6
75% 6.8
max 10.8

We can also plot the unemployment rate from 2006 to 2012 as follows

In [32]: data['2006':'2012'].plot()
plt.show()

16.5.2 Accessing World Bank Data

Let’s look at one more example of downloading and manipulating data — this time from the
World Bank
The World Bank collects and organizes data on a huge range of indicators
For example, here’s some data on government debt as a ratio to GDP
If you click on “DOWNLOAD DATA” you will be given the option to download the data as
an Excel file
The next program does this for you, reads an Excel file into a pandas DataFrame, and plots
time series for the US and Australia

In [33]: import matplotlib.pyplot as plt
import requests
import pandas as pd

# == Get data and read into file gd.xls == #
wb_data_query = "http://api.worldbank.org/v2/en/indicator/gc.dod.totl.gd.zs?downloadformat=excel"
r = requests.get(wb_data_query)
with open('gd.xls', 'wb') as output:
    output.write(r.content)

# == Parse data into a DataFrame == #
govt_debt = pd.read_excel('gd.xls', sheet_name='Data', skiprows=3, index_col=1)

# == Take desired values and plot == #
govt_debt = govt_debt.transpose()
govt_debt = govt_debt[['AUS', 'USA']]
govt_debt = govt_debt[38:]
govt_debt.plot(lw=2)
plt.show()

(The file is pandas/wb_download.py, and can be downloaded here)


16.6 Exercises

16.6.1 Exercise 1

Write a program to calculate the percentage price change over 2013 for the following shares

In [34]: ticker_list = {'INTC': 'Intel',
                        'MSFT': 'Microsoft',
                        'IBM': 'IBM',
                        'BHP': 'BHP',
                        'TM': 'Toyota',
                        'AAPL': 'Apple',
                        'AMZN': 'Amazon',
                        'BA': 'Boeing',
                        'QCOM': 'Qualcomm',
                        'KO': 'Coca-Cola',
                        'GOOG': 'Google',
                        'SNE': 'Sony',
                        'PTR': 'PetroChina'}

A dataset of daily closing prices for the above firms can be found in pandas/data/ticker_data.csv and can be downloaded here
Plot the result as a bar graph

16.7 Solutions

16.7.1 Exercise 1
In [35]: ticker = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas/data/tic
ticker.set_index('Date', inplace=True)

ticker_list = {'INTC': 'Intel',
               'MSFT': 'Microsoft',
               'IBM': 'IBM',
               'BHP': 'BHP',
               'TM': 'Toyota',
               'AAPL': 'Apple',
               'AMZN': 'Amazon',
               'BA': 'Boeing',
               'QCOM': 'Qualcomm',
               'KO': 'Coca-Cola',
               'GOOG': 'Google',
               'SNE': 'Sony',
               'PTR': 'PetroChina'}

price_change = pd.Series(dtype=float)  # An explicit dtype avoids a warning for empty Series

for tick in ticker_list:
    change = 100 * (ticker.loc[ticker.index[-1], tick] - ticker.loc[ticker.index[0], tick]) / ticker.
    name = ticker_list[tick]
    price_change[name] = change

price_change.sort_values(inplace=True)
fig, ax = plt.subplots(figsize=(10,8))
price_change.plot(kind='bar', ax=ax)
plt.show()
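
The same percentage change can also be computed for all columns at once, without an explicit loop
Below is a minimal sketch of this vectorized alternative on made-up prices; the tickers AAA and BBB and all of the numbers are hypothetical

```python
import pandas as pd

prices = pd.DataFrame({'AAA': [10.0, 11.0, 12.0],
                       'BBB': [20.0, 19.0, 15.0]},
                      index=pd.to_datetime(['2013-01-02',
                                            '2013-06-28',
                                            '2013-12-31']))

first, last = prices.iloc[0], prices.iloc[-1]   # First and last rows as Series
pct_change = 100 * (last - first) / first       # Element-wise across the columns
pct_change = pct_change.sort_values()

print(pct_change)
```

Arithmetic between two Series is aligned on their labels, so each ticker is compared with itself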

Footnotes
[1] Wikipedia defines munging as cleaning data from one raw form into a structured, purged
one.
17

Pandas for Panel Data

17.1 Contents

• Overview 17.2

• Slicing and Reshaping Data 17.3

• Merging Dataframes and Filling NaNs 17.4

• Grouping and Summarizing Data 17.5

• Final Remarks 17.6

• Exercises 17.7

• Solutions 17.8

17.2 Overview

In an earlier lecture on pandas, we looked at working with simple data sets


Econometricians often need to work with more complex data sets, such as panels
Common tasks include

• Importing data, cleaning it and reshaping it across several axes


• Selecting a time series or cross-section from a panel
• Grouping and summarizing data

pandas (derived from ‘panel’ and ‘data’) contains powerful and easy-to-use tools for solving
exactly these kinds of problems
In what follows, we will use a panel data set of real minimum wages from the OECD to create:

• summary statistics over multiple dimensions of our data


• a time series of the average minimum wage of countries in the dataset
• kernel density estimates of wages by continent


We will begin by reading in our long format panel data from a CSV file and reshaping the
resulting DataFrame with pivot_table to build a MultiIndex
Additional detail will be added to our DataFrame using pandas’ merge function, and data
will be summarized with the groupby function
Most of this lecture was created by Natasha Watkins

17.3 Slicing and Reshaping Data

We will read in a dataset from the OECD of real minimum wages in 32 countries and assign
it to realwage
The dataset pandas_panel/realwage.csv can be downloaded here
Make sure the file is in your current working directory

In [1]: import pandas as pd

# Display 6 columns for viewing purposes


pd.set_option('display.max_columns', 6)

# Reduce decimal points to 2


pd.options.display.float_format = '{:,.2f}'.format

realwage = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/r

Let’s have a look at what we’ve got to work with

In [2]: realwage.head() # Show first 5 rows

Out[2]: Unnamed: 0 Time Country Series \


0 0 2006-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
1 1 2007-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
2 2 2008-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
3 3 2009-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
4 4 2010-01-01 Ireland In 2015 constant prices at 2015 USD PPPs

Pay period value


0 Annual 17,132.44
1 Annual 18,100.92
2 Annual 17,747.41
3 Annual 18,580.14
4 Annual 18,755.83

The data is currently in long format, which is difficult to analyze when there are several dimensions to the data
We will use pivot_table to create a wide format panel, with a MultiIndex to handle
higher dimensional data
pivot_table arguments should specify the data (values), the index, and the columns we
want in our resulting dataframe
By passing a list in columns, we can create a MultiIndex in our column axis

In [3]: realwage = realwage.pivot_table(values='value',
                                        index='Time',
                                        columns=['Country', 'Series', 'Pay period'])
realwage.head()

Out[3]: Country Australia \


Series In 2015 constant prices at 2015 USD PPPs
Pay period Annual Hourly
Time
2006-01-01 20,410.65 10.33
2007-01-01 21,087.57 10.67
2008-01-01 20,718.24 10.48
2009-01-01 20,984.77 10.62
2010-01-01 20,879.33 10.57

Country … \
Series In 2015 constant prices at 2015 USD exchange rates …
Pay period Annual …
Time …
2006-01-01 23,826.64 …
2007-01-01 24,616.84 …
2008-01-01 24,185.70 …
2009-01-01 24,496.84 …
2010-01-01 24,373.76 …

Country United States \


Series In 2015 constant prices at 2015 USD PPPs
Pay period Hourly
Time
2006-01-01 6.05
2007-01-01 6.24
2008-01-01 6.78
2009-01-01 7.58
2010-01-01 7.88

Country
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88

[5 rows x 128 columns]
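
To see the reshaping on a scale that fits on one screen, here is a minimal sketch with a made-up long format table of two countries, two years and two pay periods (all names and numbers are hypothetical)

```python
import pandas as pd

long_df = pd.DataFrame({
    'Time': ['2006', '2006', '2006', '2006', '2007', '2007', '2007', '2007'],
    'Country': ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
    'Pay period': ['Annual', 'Hourly'] * 4,
    'value': [100.0, 1.0, 200.0, 2.0, 110.0, 1.1, 210.0, 2.1],
})

# One row per date, one column per (Country, Pay period) pair
wide = long_df.pivot_table(values='value',
                           index='Time',
                           columns=['Country', 'Pay period'])

print(wide)
print(wide.columns.names)
```

By default pivot_table averages any duplicate entries for the same cell; in this sketch each cell has exactly one value, so nothing is averaged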

To make it easier to filter our time series data later on, we will convert the index into a DatetimeIndex

In [4]: realwage.index = pd.to_datetime(realwage.index)


type(realwage.index)

Out[4]: pandas.core.indexes.datetimes.DatetimeIndex

The columns contain multiple levels of indexing, known as a MultiIndex, with levels being
ordered hierarchically (Country > Series > Pay period)
A MultiIndex is the simplest and most flexible way to manage panel data in pandas

In [5]: type(realwage.columns)

Out[5]: pandas.core.indexes.multi.MultiIndex

In [6]: realwage.columns.names

Out[6]: FrozenList(['Country', 'Series', 'Pay period'])

Like before, we can select the country (the top level of our MultiIndex)

In [7]: realwage['United States'].head()

Out[7]: Series In 2015 constant prices at 2015 USD PPPs \


Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88

Series In 2015 constant prices at 2015 USD exchange rates


Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88

Stacking and unstacking levels of the MultiIndex will be used throughout this lecture to
reshape our dataframe into a format we need
.stack() rotates the lowest level of the column MultiIndex to the row index (.unstack() works in the opposite direction - try it out)

In [8]: realwage.stack().head()

Out[8]: Country Australia \


Series In 2015 constant prices at 2015 USD PPPs
Time Pay period
2006-01-01 Annual 20,410.65
Hourly 10.33
2007-01-01 Annual 21,087.57
Hourly 10.67
2008-01-01 Annual 20,718.24

Country \
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 23,826.64
Hourly 12.06
2007-01-01 Annual 24,616.84
Hourly 12.46
2008-01-01 Annual 24,185.70

Country Belgium … \
Series In 2015 constant prices at 2015 USD PPPs …
Time Pay period …
2006-01-01 Annual 21,042.28 …
Hourly 10.09 …
2007-01-01 Annual 21,310.05 …
Hourly 10.22 …
2008-01-01 Annual 21,416.96 …

Country United Kingdom \


Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 20,376.32
Hourly 9.81
2007-01-01 Annual 20,954.13
Hourly 10.07
2008-01-01 Annual 20,902.87

Country United States \


Series In 2015 constant prices at 2015 USD PPPs
Time Pay period
2006-01-01 Annual 12,594.40
Hourly 6.05

2007-01-01 Annual 12,974.40


Hourly 6.24
2008-01-01 Annual 14,097.56

Country
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 12,594.40
Hourly 6.05
2007-01-01 Annual 12,974.40
Hourly 6.24
2008-01-01 Annual 14,097.56

[5 rows x 64 columns]

We can also pass in an argument to select the level we would like to stack

In [9]: realwage.stack(level='Country').head()

Out[9]: Series In 2015 constant prices at 2015 USD PPPs \


Pay period Annual Hourly
Time Country
2006-01-01 Australia 20,410.65 10.33
Belgium 21,042.28 10.09
Brazil 3,310.51 1.41
Canada 13,649.69 6.56
Chile 5,201.65 2.22

Series In 2015 constant prices at 2015 USD exchange rates


Pay period Annual Hourly
Time Country
2006-01-01 Australia 23,826.64 12.06
Belgium 20,228.74 9.70
Brazil 2,032.87 0.87
Canada 14,335.12 6.89
Chile 3,333.76 1.42
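
The effect of stacking is easier to follow on a small example
The sketch below builds a made-up wide table with a two-level column MultiIndex and checks the shapes after each operation (the names and numbers are hypothetical)

```python
import pandas as pd

cols = pd.MultiIndex.from_product([['A', 'B'], ['Annual', 'Hourly']],
                                  names=['Country', 'Pay period'])
wide = pd.DataFrame([[100.0, 1.0, 200.0, 2.0],
                     [110.0, 1.1, 210.0, 2.1]],
                    index=pd.Index(['2006', '2007'], name='Time'),
                    columns=cols)

stacked = wide.stack()                     # Lowest column level moves to the rows
by_country = wide.stack(level='Country')   # Move a chosen level instead
restored = stacked.unstack()               # Back to the wide layout

print(stacked.shape, by_country.shape, restored.shape)
```

Depending on the pandas version, stack may emit a warning about a change in its implementation, which does not affect this small example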

Using a DatetimeIndex makes it easy to select a particular time period


Selecting one year and stacking the two lower levels of the MultiIndex creates a cross-
section of our panel data

In [10]: realwage.loc['2015'].stack(level=(1, 2)).transpose().head()

Out[10]: Time 2015-01-01 \


Series In 2015 constant prices at 2015 USD PPPs
Pay period Annual Hourly
Country
Australia 21,715.53 10.99
Belgium 21,588.12 10.35
Brazil 4,628.63 2.00
Canada 16,536.83 7.95
Chile 6,633.56 2.80

Time
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Country
Australia 25,349.90 12.83
Belgium 20,753.48 9.95
Brazil 2,842.28 1.21
Canada 17,367.24 8.35
Chile 4,251.49 1.81

For the rest of the lecture, we will work with a dataframe of the hourly real minimum wages across countries and time, measured in 2015 US dollars

To create our filtered dataframe (realwage_f), we can use the xs method to select values
at lower levels in the multiindex, while keeping the higher levels (countries in this case)

In [11]: realwage_f = realwage.xs(('Hourly', 'In 2015 constant prices at 2015 USD exchange rates'),
                                  level=('Pay period', 'Series'), axis=1)
realwage_f.head()

Out[11]: Country Australia Belgium Brazil … Turkey United Kingdom \


Time …
2006-01-01 12.06 9.70 0.87 … 2.27 9.81
2007-01-01 12.46 9.82 0.92 … 2.26 10.07
2008-01-01 12.24 9.87 0.96 … 2.22 10.04
2009-01-01 12.40 10.21 1.03 … 2.28 10.15
2010-01-01 12.34 10.05 1.08 … 2.30 9.96

Country United States


Time
2006-01-01 6.05
2007-01-01 6.24
2008-01-01 6.78
2009-01-01 7.58
2010-01-01 7.88

[5 rows x 32 columns]
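
As a rough sketch of what xs does, the following made-up example selects one combination of the two lower column levels and keeps the country level (the labels FX and PPP and the numbers are hypothetical)

```python
import pandas as pd

cols = pd.MultiIndex.from_product([['A', 'B'], ['FX', 'PPP'], ['Annual', 'Hourly']],
                                  names=['Country', 'Series', 'Pay period'])
wide = pd.DataFrame([list(range(8)), list(range(8, 16))],
                    index=pd.Index(['2006', '2007'], name='Time'),
                    columns=cols)

# Keep only the columns with Pay period == 'Hourly' and Series == 'FX'
hourly_fx = wide.xs(('Hourly', 'FX'), level=['Pay period', 'Series'], axis=1)

print(hourly_fx)
```

The levels used for the selection are dropped from the result, so the remaining columns are labeled by country alone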

17.4 Merging Dataframes and Filling NaNs

Similar to relational databases queried with SQL, pandas has built-in methods to merge datasets together
Using country information from WorldData.info, we’ll add the continent of each country to
realwage_f with the merge function
The CSV file can be found in pandas_panel/countries.csv and can be downloaded
here

In [12]: worlddata = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel


worlddata.head()

Out[12]: Country (en) Country (de) Country (local) … Deathrate \


0 Afghanistan Afghanistan Afganistan/Afqanestan … 13.70
1 Egypt Ägypten Misr … 4.70
2 Åland Islands Ålandinseln Åland … 0.00
3 Albania Albanien Shqipëria … 6.70
4 Algeria Algerien Al-Jaza’ir/Algérie … 4.30

Life expectancy Url


0 51.30 https://www.laenderdaten.info/Asien/Afghanista…
1 72.70 https://www.laenderdaten.info/Afrika/Aegypten/…
2 0.00 https://www.laenderdaten.info/Europa/Aland/ind…
3 78.30 https://www.laenderdaten.info/Europa/Albanien/…
4 76.80 https://www.laenderdaten.info/Afrika/Algerien/…

[5 rows x 17 columns]

First, we’ll select just the country and continent variables from worlddata and rename the
column to ‘Country’

In [13]: worlddata = worlddata[['Country (en)', 'Continent']]


worlddata = worlddata.rename(columns={'Country (en)': 'Country'})
worlddata.head()

Out[13]: Country Continent


0 Afghanistan Asia
1 Egypt Africa
2 Åland Islands Europe
3 Albania Europe
4 Algeria Africa

We want to merge our new dataframe, worlddata, with realwage_f


The pandas merge function allows dataframes to be joined together by rows
Our dataframes will be merged using country names, requiring us to use the transpose of
realwage_f so that rows correspond to country names in both dataframes

In [14]: realwage_f.transpose().head()

Out[14]: Time 2006-01-01 2007-01-01 2008-01-01 … 2014-01-01 2015-01-01 \


Country …
Australia 12.06 12.46 12.24 … 12.67 12.83
Belgium 9.70 9.82 9.87 … 10.01 9.95
Brazil 0.87 0.92 0.96 … 1.21 1.21
Canada 6.89 6.96 7.24 … 8.22 8.35
Chile 1.42 1.45 1.44 … 1.76 1.81

Time 2016-01-01
Country
Australia 12.98
Belgium 9.76
Brazil 1.24
Canada 8.48
Chile 1.91

[5 rows x 11 columns]

We can use either left, right, inner, or outer join to merge our datasets:

• left join includes only countries from the left dataset


• right join includes only countries from the right dataset
• outer join includes countries that are in either the left or the right dataset
• inner join includes only countries common to both the left and right datasets

By default, merge will use an inner join
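To make the four join types concrete, here is a minimal sketch on two small made-up dataframes. The names wages and continents and all values in them are illustrative assumptions, not the lecture's data. The sketch counts how many rows each join type returns when merging on a shared Country column.

```python
import pandas as pd

# Hypothetical toy data (not the lecture's dataset)
wages = pd.DataFrame({'Country': ['Australia', 'Belgium', 'Korea'],
                      'wage': [12.06, 9.70, 3.42]})
continents = pd.DataFrame({'Country': ['Australia', 'Belgium', 'Egypt'],
                           'Continent': ['Australia', 'Europe', 'Africa']})

# Number of rows returned by each join type
join_sizes = {how: len(pd.merge(wages, continents, how=how, on='Country'))
              for how in ['left', 'right', 'inner', 'outer']}
print(join_sizes)
```

Korea appears only in the left frame and Egypt only in the right one, so the left and right joins each keep three rows, the inner join keeps two, and the outer join keeps four.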


Here we will pass how='left' to keep all countries in realwage_f, but discard countries
in worlddata that do not have a corresponding data entry in realwage_f
This is illustrated by the red shading in the following diagram

We will also need to specify where the country name is located in each dataframe, which will
be the key that is used to merge the dataframes ‘on’
Our ‘left’ dataframe (realwage_f.transpose()) contains countries in the index, so we
set left_index=True
Our ‘right’ dataframe (worlddata) contains countries in the ‘Country’ column, so we set
right_on='Country'

In [15]: merged = pd.merge(realwage_f.transpose(), worlddata,


how='left', left_index=True, right_on='Country')
merged.head()

Out[15]: 2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 … \


17 12.06 12.46 12.24 …
23 9.70 9.82 9.87 …
32 0.87 0.92 0.96 …
100 6.89 6.96 7.24 …
38 1.42 1.45 1.44 …

2016-01-01 00:00:00 Country Continent


17 12.98 Australia Australia
23 9.76 Belgium Europe
32 1.24 Brazil South America
100 8.48 Canada North America
38 1.91 Chile South America

[5 rows x 13 columns]

Countries that appeared in realwage_f but not in worlddata will have NaN in the Continent
column
To check whether this has occurred, we can use .isnull() on the continent column and
filter the merged dataframe

In [16]: merged[merged['Continent'].isnull()]

Out[16]: 2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 … \


247 3.42 3.74 3.87 …
247 0.23 0.45 0.39 …
247 1.50 1.64 1.71 …

2016-01-01 00:00:00 Country Continent


247 5.28 Korea NaN
247 0.55 Russian Federation NaN
247 2.08 Slovak Republic NaN

[3 rows x 13 columns]

We have three missing values!


One option to deal with NaN values is to create a dictionary containing these countries and
their respective continents
.map() will match countries in merged['Country'] with their continent from the dictionary
Notice how countries not in our dictionary are mapped to NaN

In [17]: missing_continents = {'Korea': 'Asia',


'Russian Federation': 'Europe',
'Slovak Republic': 'Europe'}

merged['Country'].map(missing_continents)

Out[17]: 17 NaN
23 NaN
32 NaN
100 NaN
38 NaN
108 NaN
41 NaN
225 NaN
53 NaN
58 NaN
45 NaN
68 NaN
233 NaN
86 NaN
88 NaN
91 NaN
247 Asia
117 NaN
122 NaN
123 NaN
138 NaN
153 NaN
151 NaN
174 NaN
175 NaN
247 Europe
247 Europe
198 NaN
200 NaN
227 NaN
241 NaN
240 NaN
Name: Country, dtype: object

We don’t want to overwrite the entire series with this mapping


.fillna() only fills in NaN values in merged['Continent'] with the mapping, while
leaving other values in the column unchanged

In [18]: merged['Continent'] = merged['Continent'].fillna(merged['Country'].map(missing_continents))

# Check for whether continents were correctly mapped

merged[merged['Country'] == 'Korea']

Out[18]: 2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 … \


247 3.42 3.74 3.87 …

2016-01-01 00:00:00 Country Continent


247 5.28 Korea Asia

[1 rows x 13 columns]
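The interplay of .map() and .fillna() can be seen in isolation on a three-row example. This is a sketch on made-up data: the frame toy and the dictionary lookup are hypothetical stand-ins for merged and missing_continents.

```python
import pandas as pd

# Hypothetical stand-in for `merged`: two continents are missing
toy = pd.DataFrame({'Country': ['Chile', 'Korea', 'Slovak Republic'],
                    'Continent': ['South America', None, None]})
lookup = {'Korea': 'Asia', 'Slovak Republic': 'Europe'}

# .map() returns NaN for any country that is not a key of the dictionary
mapped = toy['Country'].map(lookup)

# .fillna() only uses `mapped` where Continent is missing,
# so the existing 'South America' entry is left untouched
filled = toy['Continent'].fillna(mapped)
print(filled)
```

Assigning mapped directly to the Continent column would have replaced Chile's continent with NaN, which is why the mapping is routed through .fillna().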

We will also combine the Americas into a single continent - this will make our visualization
nicer later on
To do this, we will use .replace() and loop through a list of the continent values we want
to replace

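In [19]: replace = ['Central America', 'North America', 'South America']

# Assign the result back to the column rather than using inplace=True
# on a column selection, which may not update merged in newer pandas
for continent in replace:
    merged['Continent'] = merged['Continent'].replace(to_replace=continent,
                                                      value='America')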

Now that we have all the data we want in a single DataFrame, we will reshape it back into
panel form with a MultiIndex
We should also make sure to sort the index using .sort_index() so that we can efficiently
filter our dataframe later on
By default, levels will be sorted top-down

In [20]: merged = merged.set_index(['Continent', 'Country']).sort_index()


merged.head()

Out[20]: 2006-01-01 2007-01-01 2008-01-01 … 2014-01-01 \


Continent Country …
America Brazil 0.87 0.92 0.96 … 1.21
Canada 6.89 6.96 7.24 … 8.22
Chile 1.42 1.45 1.44 … 1.76
Colombia 1.01 1.02 1.01 … 1.13
Costa Rica nan nan nan … 2.41

2015-01-01 2016-01-01
Continent Country
America Brazil 1.21 1.24
Canada 8.35 8.48
Chile 1.81 1.91
Colombia 1.13 1.12
Costa Rica 2.56 2.63

[5 rows x 11 columns]

While merging, we lost our DatetimeIndex, as we merged columns that were not in datetime
format

In [21]: merged.columns

Out[21]: Index([2006-01-01 00:00:00, 2007-01-01 00:00:00, 2008-01-01 00:00:00,


2009-01-01 00:00:00, 2010-01-01 00:00:00, 2011-01-01 00:00:00,
2012-01-01 00:00:00, 2013-01-01 00:00:00, 2014-01-01 00:00:00,
2015-01-01 00:00:00, 2016-01-01 00:00:00],
dtype='object')

Now that the Continent and Country columns have been moved into the index, the remaining
columns are all dates, so we can recreate a DatetimeIndex using pd.to_datetime()

In [22]: merged.columns = pd.to_datetime(merged.columns)


merged.columns = merged.columns.rename('Time')
merged.columns

Out[22]: DatetimeIndex(['2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01',


'2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01',
'2014-01-01', '2015-01-01', '2016-01-01'],
dtype='datetime64[ns]', name='Time', freq=None)

The DatetimeIndex tends to work more smoothly in the row axis, so we will go ahead and
transpose merged

In [23]: merged = merged.transpose()


merged.head()

Out[23]: Continent America … Europe


Country Brazil Canada Chile … Slovenia Spain United Kingdom
Time …
2006-01-01 0.87 6.89 1.42 … 3.92 3.99 9.81
2007-01-01 0.92 6.96 1.45 … 3.88 4.10 10.07
2008-01-01 0.96 7.24 1.44 … 3.96 4.14 10.04
2009-01-01 1.03 7.67 1.52 … 4.08 4.32 10.15
2010-01-01 1.08 7.94 1.56 … 4.81 4.30 9.96

[5 rows x 32 columns]
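One reason a DatetimeIndex is convenient on the row axis is partial string indexing with .loc. The following is a minimal sketch on a one-column frame; the name toy is illustrative, and the three values are Chile's 2014 to 2016 figures from the output above.

```python
import pandas as pd

# Small frame with dates in the rows, mirroring the layout of `merged`
idx = pd.to_datetime(['2014-01-01', '2015-01-01', '2016-01-01'])
toy = pd.DataFrame({'Chile': [1.76, 1.81, 1.91]}, index=idx)
toy.index = toy.index.rename('Time')

# A year string selects every row that falls in that year
row_2015 = toy.loc['2015']

# String slices on a DatetimeIndex include both endpoints
recent = toy.loc['2015':'2016']
print(row_2015)
print(recent)
```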

17.5 Grouping and Summarizing Data

Grouping and summarizing data can be particularly useful for understanding large panel
datasets
A simple way to summarize data is to call an aggregation method on the dataframe, such as
.mean() or .max()
For example, we can calculate the average real minimum wage for each country over the
period 2006 to 2016 (the default is to aggregate over rows)

In [24]: merged.mean().head(10)

Out[24]: Continent Country


America Brazil 1.09
Canada 7.82
Chile 1.62
Colombia 1.07
Costa Rica 2.53
Mexico 0.53
United States 7.15
Asia Israel 5.95
Japan 6.18
Korea 4.22
dtype: float64

Using this series, we can plot the average real minimum wage over the past decade for each
country in our data set

In [25]: import matplotlib.pyplot as plt


%matplotlib inline
import matplotlib
matplotlib.style.use('seaborn')

merged.mean().sort_values(ascending=False).plot(kind='bar', title="Average real minimum wage 2006 - 2

#Set country labels


country_labels = merged.mean().sort_values(ascending=False).index.get_level_values('Country').tolist(
plt.xticks(range(0, len(country_labels)), country_labels)
plt.xlabel('Country')

plt.show()

Passing in axis=1 to .mean() will aggregate over columns (giving the average minimum
wage for all countries over time)

In [26]: merged.mean(axis=1).head()

Out[26]: Time
2006-01-01 4.69
2007-01-01 4.84
2008-01-01 4.90
2009-01-01 5.08
2010-01-01 5.11
dtype: float64

We can plot this time series as a line graph

In [27]: merged.mean(axis=1).plot()
plt.title('Average real minimum wage 2006 - 2016')

plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()

We can also specify a level of the MultiIndex (in the column axis) to aggregate over

In [28]: merged.mean(level='Continent', axis=1).head()

Out[28]: Continent America Asia Australia Europe


Time
2006-01-01 2.80 4.29 10.25 4.80
2007-01-01 2.85 4.44 10.73 4.94
2008-01-01 2.99 4.45 10.76 4.99
2009-01-01 3.23 4.53 10.97 5.16
2010-01-01 3.34 4.53 10.95 5.17
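The level argument of aggregation methods such as .mean() is, to the best of our knowledge, no longer available in recent pandas releases (2.0 onward). An equivalent that should work across versions is to transpose, group the rows by the Continent level, aggregate, and transpose back. A minimal sketch on made-up numbers (toy and its values are illustrative assumptions):

```python
import pandas as pd

# Hypothetical frame with the same layout as `merged`:
# dates in the rows, (Continent, Country) MultiIndex in the columns
cols = pd.MultiIndex.from_tuples(
    [('America', 'Brazil'), ('America', 'Canada'), ('Asia', 'Japan')],
    names=['Continent', 'Country'])
toy = pd.DataFrame([[1.0, 7.0, 6.0],
                    [2.0, 8.0, 6.5]],
                   index=pd.to_datetime(['2006-01-01', '2007-01-01']),
                   columns=cols)

# Group the transposed rows by continent, average, then transpose back
continent_means = toy.T.groupby(level='Continent').mean().T
print(continent_means)
```

The result has one column per continent and one row per date, matching the shape of the output above.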

We can plot the average minimum wages in each continent as a time series

In [29]: merged.mean(level='Continent', axis=1).plot()


plt.title('Average real minimum wage')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()

We will drop Australia as a continent for plotting purposes

In [30]: merged = merged.drop('Australia', level='Continent', axis=1)


merged.mean(level='Continent', axis=1).plot()
plt.title('Average real minimum wage')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()

.describe() is useful for quickly retrieving a number of common summary statistics

In [31]: merged.stack().describe()

Out[31]: Continent America Asia Europe


count 69.00 44.00 200.00
mean 3.19 4.70 5.15
std 3.02 1.56 3.82
min 0.52 2.22 0.23
25% 1.03 3.37 2.02
50% 1.44 5.48 3.54
75% 6.96 5.95 9.70
max 8.48 6.65 12.39

This is a simplified way to use groupby


Using groupby generally follows a ‘split-apply-combine’ process:

• split: data is grouped based on one or more keys


• apply: a function is called on each group independently
• combine: the results of the function calls are combined into a new data structure

The groupby method achieves the first step of this process, creating a new
DataFrameGroupBy object with data split into groups
Let’s split merged by continent again, this time using the groupby function, and name the
resulting object grouped

In [32]: grouped = merged.groupby(level='Continent', axis=1)


grouped

Out[32]: <pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f59c27f9da0>

Calling an aggregation method on the object applies the function to each group, the results of
which are combined in a new data structure
For example, we can return the number of countries in our dataset for each continent using
.size()
In this case, our new data structure is a Series

In [33]: grouped.size()

Out[33]: Continent
America 7
Asia 4
Europe 19
dtype: int64
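The split-apply-combine steps can also be carried out by hand, which may help clarify what an aggregation method does behind the scenes. This sketch uses a small long-format frame with made-up contents (toy, grouped_toy, and the wage values are illustrative, not the lecture's objects).

```python
import pandas as pd

# Hypothetical long-format data
toy = pd.DataFrame({'Continent': ['America', 'America', 'Asia'],
                    'Country': ['Brazil', 'Canada', 'Japan'],
                    'wage': [1.24, 8.48, 6.65]})

# split: one group per continent
grouped_toy = toy.groupby('Continent')

# apply + combine by hand: average each group, collect results in a Series
manual = pd.Series({name: group['wage'].mean() for name, group in grouped_toy})

# the built-in aggregation performs the same three steps in one call
auto = grouped_toy['wage'].mean()

sizes = grouped_toy.size()
print(manual, auto, sizes, sep='\n')
```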

Calling .get_group() to return just the countries in a single group, we can create a kernel
density estimate of the distribution of real minimum wages in 2015 for each continent
grouped.groups.keys() will return the keys from the groupby object

In [34]: import seaborn as sns

continents = grouped.groups.keys()

for continent in continents:
    # .loc['2015'] selects the 2015 row(s) from the DatetimeIndex
    sns.kdeplot(grouped.get_group(continent).loc['2015'].unstack(), label=continent, shade=True)

plt.title('Real minimum wages in 2015')


plt.xlabel('US dollars')
plt.show()

17.6 Final Remarks

This lecture has provided an introduction to some of pandas' more advanced features,
including multiindices, merging, grouping and plotting
Other tools that may be useful in panel data analysis include xarray, a Python package that
extends pandas to N-dimensional data structures

17.7 Exercises

17.7.1 Exercise 1

In these exercises, you’ll work with a dataset of employment rates in Europe by age and sex
from Eurostat
The dataset pandas_panel/employ.csv can be downloaded here
Reading in the CSV file returns a panel dataset in long format. Use .pivot_table() to
construct a wide format dataframe with a MultiIndex in the columns

Start off by exploring the dataframe and the variables available in the MultiIndex levels
Write a program that quickly returns all values in the MultiIndex
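As a reminder of how long-to-wide reshaping works, here is a minimal sketch on a tiny made-up frame. The column names mimic those of the exercise data, but long_df and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical long-format data: one row per (DATE, GEO, SEX) observation
long_df = pd.DataFrame({
    'DATE': ['2007-01-01', '2007-01-01', '2008-01-01', '2008-01-01'],
    'SEX': ['Females', 'Males', 'Females', 'Males'],
    'GEO': ['Austria'] * 4,
    'Value': [56.0, 62.9, 56.2, 62.9]})

# Wide format: dates in the rows, a (GEO, SEX) MultiIndex in the columns
wide_df = long_df.pivot_table(values='Value', index='DATE', columns=['GEO', 'SEX'])
print(wide_df)
print(wide_df.columns.names)
```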

17.7.2 Exercise 2

Filter the above dataframe to only include employment as a percentage of ‘active population’
Create a grouped boxplot using seaborn of employment rates in 2015 by age group and sex
Hint: GEO includes both areas and countries

17.8 Solutions

17.8.1 Exercise 1
In [35]: employ = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/em
employ = employ.pivot_table(values='Value',
index=['DATE'],
columns=['UNIT','AGE', 'SEX', 'INDIC_EM', 'GEO'])
employ.index = pd.to_datetime(employ.index) # ensure that dates are datetime format
employ.head()

Out[35]: UNIT Percentage of total population … \


AGE From 15 to 24 years …
SEX Females …
INDIC_EM Active population …
GEO Austria Belgium Bulgaria …
DATE …
2007-01-01 56.00 31.60 26.00 …
2008-01-01 56.20 30.80 26.10 …
2009-01-01 56.20 29.90 24.80 …
2010-01-01 54.00 29.80 26.60 …
2011-01-01 54.80 29.80 24.80 …

UNIT Thousand persons \


AGE From 55 to 64 years
SEX Total
INDIC_EM Total employment (resident population concept - LFS)
GEO Switzerland Turkey
DATE
2007-01-01 nan 1,282.00
2008-01-01 nan 1,354.00
2009-01-01 nan 1,449.00
2010-01-01 640.00 1,583.00
2011-01-01 661.00 1,760.00

UNIT
AGE
SEX
INDIC_EM
GEO United Kingdom
DATE
2007-01-01 4,131.00
2008-01-01 4,204.00
2009-01-01 4,193.00
2010-01-01 4,186.00
2011-01-01 4,164.00

[5 rows x 1440 columns]

This is a large dataset so it is useful to explore the levels and variables available

In [36]: employ.columns.names

Out[36]: FrozenList(['UNIT', 'AGE', 'SEX', 'INDIC_EM', 'GEO'])

Variables within levels can be quickly retrieved with a loop

In [37]: for name in employ.columns.names:
    print(name, employ.columns.get_level_values(name).unique())

UNIT Index(['Percentage of total population', 'Thousand persons'], dtype='object', name='UNIT')


AGE Index(['From 15 to 24 years', 'From 25 to 54 years', 'From 55 to 64 years'], dtype='object', name='AGE')
SEX Index(['Females', 'Males', 'Total'], dtype='object', name='SEX')
INDIC_EM Index(['Active population', 'Total employment (resident population concept - LFS)'], dtype='object',
GEO Index(['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic',
'Denmark', 'Estonia', 'Euro area (17 countries)',
'Euro area (18 countries)', 'Euro area (19 countries)',
'European Union (15 countries)', 'European Union (27 countries)',
'European Union (28 countries)', 'Finland',
'Former Yugoslav Republic of Macedonia, the', 'France',
'France (metropolitan)',
'Germany (until 1990 former territory of the FRG)', 'Greece', 'Hungary',
'Iceland', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg',
'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania',
'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'Turkey',
'United Kingdom'],
dtype='object', name='GEO')

17.8.2 Exercise 2

To easily filter by country, swap GEO to the top level and sort the MultiIndex

In [38]: employ.columns = employ.columns.swaplevel(0,-1)


employ = employ.sort_index(axis=1)

We need to get rid of a few items in GEO which are not countries
A fast way to get rid of the EU areas is to use a list comprehension to find the level values in
GEO that begin with ‘Euro’

In [39]: geo_list = employ.columns.get_level_values('GEO').unique().tolist()


countries = [x for x in geo_list if not x.startswith('Euro')]
employ = employ[countries]
employ.columns.get_level_values('GEO').unique()

Out[39]: Index(['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic',


'Denmark', 'Estonia', 'Finland',
'Former Yugoslav Republic of Macedonia, the', 'France',
'France (metropolitan)',
'Germany (until 1990 former territory of the FRG)', 'Greece', 'Hungary',
'Iceland', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg',
'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania',
'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'Turkey',
'United Kingdom'],
dtype='object', name='GEO')

Select only percentage employed in the active population from the dataframe

In [40]: employ_f = employ.xs(('Percentage of total population', 'Active population'),


level=('UNIT', 'INDIC_EM'),
axis=1)
employ_f.head()

Out[40]: GEO Austria … United Kingdom \


AGE From 15 to 24 years … From 55 to 64 years
SEX Females Males Total … Females Males
DATE …
2007-01-01 56.00 62.90 59.40 … 49.90 68.90
2008-01-01 56.20 62.90 59.50 … 50.20 69.80
2009-01-01 56.20 62.90 59.50 … 50.60 70.30
2010-01-01 54.00 62.60 58.30 … 51.10 69.20
2011-01-01 54.80 63.60 59.20 … 51.30 68.40

GEO
AGE
SEX Total
DATE
2007-01-01 59.30
2008-01-01 59.80
2009-01-01 60.30
2010-01-01 60.00
2011-01-01 59.70

[5 rows x 306 columns]

Drop the ‘Total’ value before creating the grouped boxplot

In [41]: employ_f = employ_f.drop('Total', level='SEX', axis=1)

In [42]: box = employ_f.loc['2015'].unstack().reset_index()


sns.boxplot(x="AGE", y=0, hue="SEX", data=box, palette=("husl"), showfliers=False)
plt.xlabel('')
plt.xticks(rotation=35)
plt.ylabel('Percentage of population (%)')
plt.title('Employment in Europe (2015)')
plt.legend(bbox_to_anchor=(1,0.5))
plt.show()
18

Linear Regression in Python

18.1 Contents

• Overview 18.2

• Simple Linear Regression 18.3

• Extending the Linear Regression Model 18.4

• Endogeneity 18.5

• Summary 18.6

• Exercises 18.7

• Solutions 18.8

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install linearmodels

18.2 Overview

Linear regression is a standard tool for analyzing the relationship between two or more
variables
In this lecture, we'll use the Python package statsmodels to estimate, interpret, and
visualize linear regression models
Along the way, we’ll discuss a variety of topics, including

• simple and multivariate linear regression


• visualization
• endogeneity and omitted variable bias
• two-stage least squares

As an example, we will replicate results from Acemoglu, Johnson and Robinson's seminal
paper [3]


• You can download a copy here

In the paper, the authors emphasize the importance of institutions in economic development
The main contribution is the use of settler mortality rates as a source of exogenous variation
in institutional differences
Such variation is needed to determine whether it is institutions that give rise to greater eco-
nomic growth, rather than the other way around

18.2.1 Prerequisites

This lecture assumes you are familiar with basic econometrics


For an introductory text covering these topics, see, for example, [135]

18.2.2 Comments

This lecture is coauthored with Natasha Watkins

18.3 Simple Linear Regression

[3] wish to determine whether or not differences in institutions can help to explain observed
economic outcomes
How do we measure institutional differences and economic outcomes?
In this paper,

• economic outcomes are proxied by log GDP per capita in 1995, adjusted for exchange
rates
• institutional differences are proxied by an index of protection against expropriation on
average over 1985-95, constructed by the Political Risk Services Group

These variables and other data used in the paper are available for download on Daron Ace-
moglu’s webpage
We will use pandas’ .read_stata() function to read in data contained in the .dta files to
dataframes

In [2]: import pandas as pd

df1 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable1.dt
df1.head()

Out[2]: shortnam euro1900 excolony avexpr logpgp95 cons1 cons90 democ00a \


0 AFG 0.000000 1.0 NaN NaN 1.0 2.0 1.0
1 AGO 8.000000 1.0 5.363636 7.770645 3.0 3.0 0.0
2 ARE 0.000000 1.0 7.181818 9.804219 NaN NaN NaN
3 ARG 60.000004 1.0 6.386364 9.133459 1.0 6.0 3.0
4 ARM 0.000000 0.0 NaN 7.682482 NaN NaN NaN

cons00a extmort4 logem4 loghjypl baseco


0 1.0 93.699997 4.540098 NaN NaN
1 1.0 280.000000 5.634789 -3.411248 1.0

2 NaN NaN NaN NaN NaN


3 3.0 68.900002 4.232656 -0.872274 1.0
4 NaN NaN NaN NaN NaN

Let’s use a scatterplot to see whether any obvious relationship exists between GDP per capita
and the protection against expropriation index

In [3]: import matplotlib.pyplot as plt


%matplotlib inline
plt.style.use('seaborn')

df1.plot(x='avexpr', y='logpgp95', kind='scatter')


plt.show()

The plot shows a fairly strong positive relationship between protection against expropriation
and log GDP per capita
Specifically, if higher protection against expropriation is a measure of institutional quality,
then better institutions appear to be positively correlated with better economic outcomes
(higher GDP per capita)
Given the plot, choosing a linear model to describe this relationship seems like a reasonable
assumption
We can write our model as

$$ logpgp95_i = \beta_0 + \beta_1 \, avexpr_i + u_i $$

where:

• 𝛽0 is the intercept of the linear trend line on the y-axis



• $\beta_1$ is the slope of the linear trend line, representing the marginal effect of protection
against expropriation risk on log GDP per capita
• $u_i$ is a random error term (deviations of observations from the linear trend due to
factors not included in the model)

Visually, this linear model involves choosing a straight line that best fits the data, as in the
following plot (Figure 2 in [3])

In [4]: import numpy as np

# Dropping NA's is required to use numpy's polyfit


df1_subset = df1.dropna(subset=['logpgp95', 'avexpr'])

# Use only 'base sample' for plotting purposes


df1_subset = df1_subset[df1_subset['baseco'] == 1]

X = df1_subset['avexpr']
y = df1_subset['logpgp95']
labels = df1_subset['shortnam']

# Replace markers with country labels


plt.scatter(X, y, marker='')

for i, label in enumerate(labels):
    plt.annotate(label, (X.iloc[i], y.iloc[i]))

# Fit a linear trend line


plt.plot(np.unique(X),
np.poly1d(np.polyfit(X, y, 1))(np.unique(X)),
color='black')

plt.xlim([3.3,10.5])
plt.ylim([4,10.5])
plt.xlabel('Average Expropriation Risk 1985-95')
plt.ylabel('Log GDP per capita, PPP, 1995')
plt.title('Figure 2: OLS relationship between expropriation risk and income')
plt.show()

The most common technique to estimate the parameters (𝛽’s) of the linear model is Ordinary
Least Squares (OLS)
As the name implies, an OLS model is solved by finding the parameters that minimize the
sum of squared residuals, i.e.

$$ \min_{\hat{\beta}} \sum_{i=1}^{N} \hat{u}_i^2 $$

where $\hat{u}_i$ is the difference between the observation and the predicted value of the dependent
variable
To estimate the constant term 𝛽0 , we need to add a column of 1’s to our dataset (consider
the equation if 𝛽0 was replaced with 𝛽0 𝑥𝑖 and 𝑥𝑖 = 1)

In [5]: df1['const'] = 1
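The role of the column of 1's can be seen by solving a small least squares problem directly. The sketch below uses NumPy on four made-up data points (x and y are illustrative values, not the lecture's data) and computes the coefficient vector that minimizes the sum of squared residuals.

```python
import numpy as np

# Made-up data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

# Design matrix: the column of 1's multiplies the intercept beta_0
X = np.column_stack([np.ones_like(x), x])

# Least squares solution [beta_0_hat, beta_1_hat]
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta_hat
ssr = residuals @ residuals   # sum of squared residuals at the minimum
print(beta_hat, ssr)
```

Any other choice of coefficients gives a larger sum of squared residuals, and because the design matrix contains a constant column the residuals sum to zero (up to floating point error).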

Now we can construct our model in statsmodels using the OLS function
We will use pandas dataframes with statsmodels; however, standard arrays can also be
used as arguments

In [6]: import statsmodels.api as sm

reg1 = sm.OLS(endog=df1['logpgp95'], exog=df1[['const', 'avexpr']], missing='drop')


type(reg1)

Out[6]: statsmodels.regression.linear_model.OLS

So far we have simply constructed our model


We need to use .fit() to obtain parameter estimates $\hat{\beta}_0$ and $\hat{\beta}_1$

In [7]: results = reg1.fit()


type(results)

Out[7]: statsmodels.regression.linear_model.RegressionResultsWrapper

We now have the fitted regression model stored in results


To view the OLS regression results, we can call the .summary() method
Note that an observation was mistakenly dropped from the results in the original paper (see
the note located in maketable2.do from Acemoglu’s webpage), and thus the coefficients differ
slightly

In [8]: print(results.summary())

OLS Regression Results


==============================================================================
Dep. Variable: logpgp95 R-squared: 0.611
Model: OLS Adj. R-squared: 0.608
Method: Least Squares F-statistic: 171.4
Date: Fri, 21 Jun 2019 Prob (F-statistic): 4.16e-24
Time: 15:39:14 Log-Likelihood: -119.71

No. Observations: 111 AIC: 243.4


Df Residuals: 109 BIC: 248.8
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 4.6261 0.301 15.391 0.000 4.030 5.222
avexpr 0.5319 0.041 13.093 0.000 0.451 0.612
==============================================================================
Omnibus: 9.251 Durbin-Watson: 1.689
Prob(Omnibus): 0.010 Jarque-Bera (JB): 9.170
Skew: -0.680 Prob(JB): 0.0102
Kurtosis: 3.362 Cond. No. 33.2
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

From our results, we see that

• The intercept $\hat{\beta}_0 = 4.63$
• The slope $\hat{\beta}_1 = 0.53$
• The positive $\hat{\beta}_1$ parameter estimate implies that institutional quality has a positive
effect on economic outcomes, as we saw in the figure
• The p-value of 0.000 for $\hat{\beta}_1$ implies that the effect of institutions on GDP is statistically
significant (using p < 0.05 as a rejection rule)
• The R-squared value of 0.611 indicates that around 61% of variation in log GDP per
capita is explained by protection against expropriation

Using our parameter estimates, we can now write our estimated relationship as

$$ \widehat{logpgp95}_i = 4.63 + 0.53 \, avexpr_i $$

This equation describes the line that best fits our data, as shown in Figure 2
We can use this equation to predict the level of log GDP per capita for a value of the index of
expropriation protection
For example, for a country with an index value of 7.07, the predicted level of log GDP per
capita in 1995 is 8.38 (note that the mean of avexpr in the base sample, computed below, is
6.52 rather than 7.07)

In [9]: mean_expr = np.mean(df1_subset['avexpr'])


mean_expr

Out[9]: 6.515625

In [10]: predicted_logpgp95 = 4.63 + 0.53 * 7.07
predicted_logpgp95

Out[10]: 8.3771

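An easier (and more accurate) way to obtain a prediction is to use .predict() and set
const = 1 and avexpr = mean_expr; because mean_expr is 6.52 rather than 7.07, the
result (8.09) differs from the hand calculation above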

In [11]: results.predict(exog=[1, mean_expr])



Out[11]: array([8.09156367])
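For a linear model, a prediction is just the inner product of the regressor values with the estimated coefficients. As a rough check, the sketch below plugs the coefficients as printed in the summary table (rounded to four decimals) into that inner product; the variable names are illustrative.

```python
import numpy as np

# Coefficients as printed in the regression summary (rounded): const, avexpr
params = np.array([4.6261, 0.5319])

# Regressor values: 1 for the constant, then the base-sample mean of avexpr
exog = np.array([1, 6.515625])

prediction = exog @ params
print(prediction)
```

Because the printed coefficients are rounded, the result agrees with the .predict() output only to about three decimal places.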

We can obtain an array of predicted 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 for every value of 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 in our dataset by
calling .predict() on our results
Plotting the predicted values against 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 shows that the predicted values lie along the
straight line that we fitted above
The observed values of 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 are also plotted for comparison purposes

In [12]: # Drop missing observations from whole sample

df1_plot = df1.dropna(subset=['logpgp95', 'avexpr'])

# Plot predicted values

plt.scatter(df1_plot['avexpr'], results.predict(), alpha=0.5, label='predicted')

# Plot observed values

plt.scatter(df1_plot['avexpr'], df1_plot['logpgp95'], alpha=0.5, label='observed')

plt.legend()
plt.title('OLS predicted values')
plt.xlabel('avexpr')
plt.ylabel('logpgp95')
plt.show()

18.4 Extending the Linear Regression Model

So far we have only accounted for institutions affecting economic performance - almost
certainly there are numerous other factors affecting GDP that are not included in our model

Leaving out variables that affect 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 will result in omitted variable bias, yielding
biased and inconsistent parameter estimates
We can extend our bivariate regression model to a multivariate regression model by
adding in other factors that may affect 𝑙𝑜𝑔𝑝𝑔𝑝95𝑖
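Omitted variable bias can be illustrated with a short simulation. The sketch below uses made-up data in which the true slope on x is 0.5 and a second factor z affects y and is correlated with x; all names and parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

z = rng.normal(size=n)                  # factor that will be omitted
x = 0.8 * z + rng.normal(size=n)        # regressor correlated with z
y = 1.0 + 0.5 * x + 1.0 * z + rng.normal(size=n)   # true slope on x is 0.5

def ols(y, *cols):
    """Least squares coefficients, with a constant placed first."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

beta_short = ols(y, x)      # bivariate regression that omits z
beta_long = ols(y, x, z)    # multivariate regression that includes z

print(beta_short[1], beta_long[1])
```

In this setup the bivariate slope absorbs part of the effect of z and lands well above 0.5, while the multivariate slope is close to the true value.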
[3] consider other factors such as:

• the effect of climate on economic outcomes; latitude is used to proxy this


• differences that affect both economic performance and institutions, e.g. cultural,
historical, etc.; controlled for with the use of continent dummies

Let’s estimate some of the extended models considered in the paper (Table 2) using data from
maketable2.dta

In [13]: df2 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable2.d

# Add constant term to dataset


df2['const'] = 1

# Create lists of variables to be used in each regression


X1 = ['const', 'avexpr']
X2 = ['const', 'avexpr', 'lat_abst']
X3 = ['const', 'avexpr', 'lat_abst', 'asia', 'africa', 'other']

# Estimate an OLS regression for each set of variables


reg1 = sm.OLS(df2['logpgp95'], df2[X1], missing='drop').fit()
reg2 = sm.OLS(df2['logpgp95'], df2[X2], missing='drop').fit()
reg3 = sm.OLS(df2['logpgp95'], df2[X3], missing='drop').fit()

Now that we have fitted our models, we will use summary_col to display the results in a
single table (model numbers correspond to those in the paper)

In [14]: from statsmodels.iolib.summary2 import summary_col

info_dict={'R-squared' : lambda x: f"{x.rsquared:.2f}",


'No. observations' : lambda x: f"{int(x.nobs):d}"}

results_table = summary_col(results=[reg1,reg2,reg3],
float_format='%0.2f',
stars = True,
model_names=['Model 1',
'Model 3',
'Model 4'],
info_dict=info_dict,
regressor_order=['const',
'avexpr',
'lat_abst',
'asia',
'africa'])

results_table.add_title('Table 2 - OLS Regressions')

print(results_table)

Table 2 - OLS Regressions


=========================================
Model 1 Model 3 Model 4
-----------------------------------------
const 4.63*** 4.87*** 5.85***
(0.30) (0.33) (0.34)
avexpr 0.53*** 0.46*** 0.39***
(0.04) (0.06) (0.05)
lat_abst 0.87* 0.33

(0.49) (0.45)
asia -0.15
(0.15)
africa -0.92***
(0.17)
other 0.30
(0.37)
R-squared 0.61 0.62 0.72
No. observations 111 111 111
=========================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01

18.5 Endogeneity

As [3] discuss, the OLS models likely suffer from endogeneity issues, resulting in biased and
inconsistent model estimates
Namely, there is likely a two-way relationship between institutions and economic outcomes:

• richer countries may be able to afford or prefer better institutions


• variables that affect income may also be correlated with institutional differences
• the construction of the index may be biased; analysts may be biased towards seeing
countries with higher income having better institutions

To deal with endogeneity, we can use two-stage least squares (2SLS) regression, which
is an extension of OLS regression
This method requires replacing the endogenous variable 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 with a variable that is:

1. correlated with 𝑎𝑣𝑒𝑥𝑝𝑟𝑖


2. not correlated with the error term (i.e. it should not directly affect the dependent
variable, otherwise it would be correlated with $u_i$ due to omitted variable bias)

The new set of regressors is called an instrument, which aims to remove endogeneity in our
proxy of institutional differences
The main contribution of [3] is the use of settler mortality rates to instrument for
institutional differences
They hypothesize that higher mortality rates of colonizers led to the establishment of
institutions that were more extractive in nature (less protection against expropriation), and these
institutions still persist today
Using a scatterplot (Figure 3 in [3]), we can see protection against expropriation is negatively
correlated with settler mortality rates, coinciding with the authors’ hypothesis and satisfying
the first condition of a valid instrument

In [15]: # Dropping NA's is required to use numpy's polyfit


df1_subset2 = df1.dropna(subset=['logem4', 'avexpr'])

X = df1_subset2['logem4']
y = df1_subset2['avexpr']
labels = df1_subset2['shortnam']

# Replace markers with country labels



plt.scatter(X, y, marker='')

for i, label in enumerate(labels):
    plt.annotate(label, (X.iloc[i], y.iloc[i]))

# Fit a linear trend line


plt.plot(np.unique(X),
np.poly1d(np.polyfit(X, y, 1))(np.unique(X)),
color='black')

plt.xlim([1.8,8.4])
plt.ylim([3.3,10.4])
plt.xlabel('Log of Settler Mortality')
plt.ylabel('Average Expropriation Risk 1985-95')
plt.title('Figure 3: First-stage relationship between settler mortality and expropriation risk')
plt.show()

The second condition may not be satisfied if settler mortality rates in the 17th to 19th cen-
turies have a direct effect on current GDP (in addition to their indirect effect through institu-
tions)
For example, settler mortality rates may be related to the current disease environment in a
country, which could affect current economic performance
[3] argue this is unlikely because:

• The majority of settler deaths were due to malaria and yellow fever and had a limited
effect on local people
• The disease burden on local people in Africa or India, for example, did not appear to
be higher than average, supported by relatively high population densities in these areas
before colonization

As we appear to have a valid instrument, we can use 2SLS regression to obtain consistent parameter estimates
First stage

The first stage involves regressing the endogenous variable (𝑎𝑣𝑒𝑥𝑝𝑟𝑖 ) on the instrument
The instrument is the set of all exogenous variables in our model (and not just the variable
we have replaced)
Using model 1 as an example, our instrument is simply a constant and settler mortality rates
𝑙𝑜𝑔𝑒𝑚4𝑖
Therefore, we will estimate the first-stage regression as

𝑎𝑣𝑒𝑥𝑝𝑟𝑖 = 𝛿0 + 𝛿1 𝑙𝑜𝑔𝑒𝑚4𝑖 + 𝑣𝑖

The data we need to estimate this equation is located in maketable4.dta (only complete
data, indicated by baseco = 1, is used for estimation)

In [16]: # Import and select the data


df4 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable4.d
df4 = df4[df4['baseco'] == 1]

# Add a constant variable


df4['const'] = 1

# Fit the first stage regression and print summary


results_fs = sm.OLS(df4['avexpr'],
df4[['const', 'logem4']],
missing='drop').fit()
print(results_fs.summary())

OLS Regression Results


==============================================================================
Dep. Variable: avexpr R-squared: 0.270
Model: OLS Adj. R-squared: 0.258
Method: Least Squares F-statistic: 22.95
Date: Fri, 21 Jun 2019 Prob (F-statistic): 1.08e-05
Time: 15:39:17 Log-Likelihood: -104.83
No. Observations: 64 AIC: 213.7
Df Residuals: 62 BIC: 218.0
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 9.3414 0.611 15.296 0.000 8.121 10.562
logem4 -0.6068 0.127 -4.790 0.000 -0.860 -0.354
==============================================================================
Omnibus: 0.035 Durbin-Watson: 2.003
Prob(Omnibus): 0.983 Jarque-Bera (JB): 0.172
Skew: 0.045 Prob(JB): 0.918
Kurtosis: 2.763 Cond. No. 19.4
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Second stage
We need to retrieve the predicted values of 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 using .predict()
We then replace the endogenous variable $avexpr_i$ with the predicted values $\widehat{avexpr}_i$ in the original linear model
Our second stage regression is thus

$$logpgp95_i = \beta_0 + \beta_1 \widehat{avexpr}_i + u_i$$


In [17]: df4['predicted_avexpr'] = results_fs.predict()

results_ss = sm.OLS(df4['logpgp95'],
df4[['const', 'predicted_avexpr']]).fit()
print(results_ss.summary())

OLS Regression Results


==============================================================================
Dep. Variable: logpgp95 R-squared: 0.477
Model: OLS Adj. R-squared: 0.469
Method: Least Squares F-statistic: 56.60
Date: Fri, 21 Jun 2019 Prob (F-statistic): 2.66e-10
Time: 15:39:17 Log-Likelihood: -72.268
No. Observations: 64 AIC: 148.5
Df Residuals: 62 BIC: 152.9
Df Model: 1
Covariance Type: nonrobust
====================================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------------
const 1.9097 0.823 2.320 0.024 0.264 3.555
predicted_avexpr 0.9443 0.126 7.523 0.000 0.693 1.195
==============================================================================
Omnibus: 10.547 Durbin-Watson: 2.137
Prob(Omnibus): 0.005 Jarque-Bera (JB): 11.010
Skew: -0.790 Prob(JB): 0.00407
Kurtosis: 4.277 Cond. No. 58.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The second-stage regression results give us a consistent estimate of the effect of institutions on economic outcomes
The result suggests a stronger positive relationship than what the OLS results indicated
Note that while our parameter estimates are correct, our standard errors are not and for this
reason, computing 2SLS ‘manually’ (in stages with OLS) is not recommended
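To see where the standard errors go wrong, here is a minimal sketch on simulated data (all variable names and coefficients are hypothetical). The point estimates from the manual second stage are the 2SLS estimates, but the residuals used for the standard errors should be computed with the original regressors rather than the first-stage fitted values. The sketch assumes homoskedastic errors and divides by `n` with no degrees-of-freedom correction

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical simulated data: one endogenous regressor x, one instrument z
z = rng.normal(size=n)
e = rng.normal(size=n)
x = 0.5 * z + e + rng.normal(size=n)
y = 1.0 + 2.0 * x + 2.0 * e + rng.normal(size=n)

Z = np.column_stack([np.ones(n), z])    # constant + instrument
X = np.column_stack([np.ones(n), x])    # constant + endogenous regressor

# First stage: fitted values of X from a regression on Z
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)

# Second stage: regress y on the fitted values
beta = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
XhXh_inv = np.linalg.inv(X_hat.T @ X_hat)

# Naive standard errors: residuals built from the fitted values X_hat
resid_naive = y - X_hat @ beta
se_naive = np.sqrt(resid_naive @ resid_naive / n * np.diag(XhXh_inv))

# 2SLS standard errors: residuals built from the original regressors X
resid_2sls = y - X @ beta
se_2sls = np.sqrt(resid_2sls @ resid_2sls / n * np.diag(XhXh_inv))

print('beta     =', beta)
print('naive se =', se_naive)
print('2SLS se  =', se_2sls)
```

The two sets of standard errors share the same coefficient vector but use different residual variances, which is why running the second stage through an ordinary OLS routine reports the wrong uncertainty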
We can correctly estimate a 2SLS regression in one step using the linearmodels package, an
extension of statsmodels

In [18]: from linearmodels.iv import IV2SLS

Note that when using IV2SLS, the exogenous and instrument variables are split up in the
function arguments (whereas before the instrument included exogenous variables)

In [19]: iv = IV2SLS(dependent=df4['logpgp95'],
exog=df4['const'],
endog=df4['avexpr'],
instruments=df4['logem4']).fit(cov_type='unadjusted')

print(iv.summary)

IV-2SLS Estimation Summary


==============================================================================
Dep. Variable: logpgp95 R-squared: 0.1870
Estimator: IV-2SLS Adj. R-squared: 0.1739
No. Observations: 64 F-statistic: 37.568
Date: Fri, Jun 21 2019 P-value (F-stat) 0.0000
Time: 15:39:17 Distribution: chi2(1)
Cov. Estimator: unadjusted

Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
const 1.9097 1.0106 1.8897 0.0588 -0.0710 3.8903
avexpr 0.9443 0.1541 6.1293 0.0000 0.6423 1.2462
==============================================================================

Endogenous: avexpr
Instruments: logem4
Unadjusted Covariance (Homoskedastic)
Debiased: False

Given that we now have consistent estimates, we can infer from the model we have estimated that institutional differences (stemming from institutions set up during colonization) can help to explain differences in income levels across countries today
[3] use a marginal effect of 0.94 to calculate that the difference in the index between Chile and Nigeria (i.e., institutional quality) implies up to a 7-fold difference in income, emphasizing the significance of institutions in economic development

18.6 Summary

We have demonstrated basic OLS and 2SLS regression in statsmodels and linearmodels
If you are familiar with R, you may want to use the formula interface to statsmodels, or consider using rpy2 to call R from within Python

18.7 Exercises

18.7.1 Exercise 1

In the lecture, we argued that the original model likely suffers from endogeneity bias due to the effect income is likely to have on institutional development
Although endogeneity is often best identified by thinking about the data and model, we can
formally test for endogeneity using the Hausman test
We want to test for correlation between the endogenous variable, 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 , and the errors, 𝑢𝑖

𝐻0 ∶ 𝐶𝑜𝑣(𝑎𝑣𝑒𝑥𝑝𝑟𝑖 , 𝑢𝑖 ) = 0 (𝑛𝑜 𝑒𝑛𝑑𝑜𝑔𝑒𝑛𝑒𝑖𝑡𝑦)


𝐻1 ∶ 𝐶𝑜𝑣(𝑎𝑣𝑒𝑥𝑝𝑟𝑖 , 𝑢𝑖 ) ≠ 0 (𝑒𝑛𝑑𝑜𝑔𝑒𝑛𝑒𝑖𝑡𝑦)

This test is run in two stages


First, we regress 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 on the instrument, 𝑙𝑜𝑔𝑒𝑚4𝑖

𝑎𝑣𝑒𝑥𝑝𝑟𝑖 = 𝜋0 + 𝜋1 𝑙𝑜𝑔𝑒𝑚4𝑖 + 𝜐𝑖

Second, we retrieve the residuals 𝜐𝑖̂ and include them in the original equation

𝑙𝑜𝑔𝑝𝑔𝑝95𝑖 = 𝛽0 + 𝛽1 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 + 𝛼𝜐𝑖̂ + 𝑢𝑖



If 𝛼 is statistically significant (with a p-value < 0.05), then we reject the null hypothesis and
conclude that 𝑎𝑣𝑒𝑥𝑝𝑟𝑖 is endogenous
Using the above information, estimate a Hausman test and interpret your results

18.7.2 Exercise 2

The OLS parameter 𝛽 can also be estimated using matrix algebra and numpy (you may need
to review the numpy lecture to complete this exercise)
The linear equation we want to estimate is (written in matrix form)

𝑦 = 𝑋𝛽 + 𝑢

To solve for the unknown parameter 𝛽, we want to minimize the sum of squared residuals

$$\min_{\hat{\beta}} \hat{u}' \hat{u}$$

Rearranging the first equation and substituting into the second equation, we can write

$$\min_{\hat{\beta}} \; (y - X \hat{\beta})' (y - X \hat{\beta})$$

Solving this optimization problem gives the solution for the 𝛽 ̂ coefficients

𝛽 ̂ = (𝑋 ′ 𝑋)−1 𝑋 ′ 𝑦

Using the above information, compute 𝛽 ̂ from model 1 using numpy - your results should be
the same as those in the statsmodels output from earlier in the lecture

18.8 Solutions

18.8.1 Exercise 1
In [20]: # Load in data
df4 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable4.d

# Add a constant term


df4['const'] = 1

# Estimate the first stage regression


reg1 = sm.OLS(endog=df4['avexpr'],
exog=df4[['const', 'logem4']],
missing='drop').fit()

# Retrieve the residuals


df4['resid'] = reg1.resid

# Estimate the second stage regression, including the first-stage residuals


reg2 = sm.OLS(endog=df4['logpgp95'],
exog=df4[['const', 'avexpr', 'resid']],
missing='drop').fit()

print(reg2.summary())

OLS Regression Results


==============================================================================
Dep. Variable: logpgp95 R-squared: 0.689
Model: OLS Adj. R-squared: 0.679
Method: Least Squares F-statistic: 74.05
Date: Fri, 21 Jun 2019 Prob (F-statistic): 1.07e-17
Time: 15:39:17 Log-Likelihood: -62.031
No. Observations: 70 AIC: 130.1
Df Residuals: 67 BIC: 136.8
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 2.4782 0.547 4.530 0.000 1.386 3.570
avexpr 0.8564 0.082 10.406 0.000 0.692 1.021
resid -0.4951 0.099 -5.017 0.000 -0.692 -0.298
==============================================================================
Omnibus: 17.597 Durbin-Watson: 2.086
Prob(Omnibus): 0.000 Jarque-Bera (JB): 23.194
Skew: -1.054 Prob(JB): 9.19e-06
Kurtosis: 4.873 Cond. No. 53.8
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The output shows that the coefficient on the residuals is statistically significant, indicating
𝑎𝑣𝑒𝑥𝑝𝑟𝑖 is endogenous

18.8.2 Exercise 2
In [21]: # Load in data
df1 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable1.d
df1 = df1.dropna(subset=['logpgp95', 'avexpr'])

# Add a constant term


df1['const'] = 1

# Define the X and y variables


y = np.asarray(df1['logpgp95'])
X = np.asarray(df1[['const', 'avexpr']])

# Compute β_hat
β_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Print out the results from the 2 x 1 vector β_hat


print(f'β_0 = {β_hat[0]:.2}')
print(f'β_1 = {β_hat[1]:.2}')

β_0 = 4.6
β_1 = 0.53

It is also possible to use np.linalg.inv(X.T @ X) @ X.T @ y to solve for 𝛽, however .solve() is preferred as it involves fewer computations
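As a quick check that the alternatives agree, the sketch below compares three ways of computing the OLS coefficients on a small simulated design matrix (the data here are hypothetical, not the lecture's dataset)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix (constant plus two regressors) and outcome
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=200)

beta_solve = np.linalg.solve(X.T @ X, X.T @ y)       # solves the normal equations
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y          # forms the inverse explicitly
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]    # least squares without forming X'X

print(np.allclose(beta_solve, beta_inv))
print(np.allclose(beta_solve, beta_lstsq))
```

np.linalg.lstsq works on X directly, which can be the more stable choice when the columns of X are close to collinear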
19

Maximum Likelihood Estimation

19.1 Contents

• Overview 19.2

• Set Up and Assumptions 19.3

• Conditional Distributions 19.4

• Maximum Likelihood Estimation 19.5

• MLE with Numerical Methods 19.6

• Maximum Likelihood Estimation with statsmodels 19.7

• Summary 19.8

• Exercises 19.9

• Solutions 19.10

19.2 Overview

In a previous lecture, we estimated the relationship between dependent and explanatory vari-
ables using linear regression
But what if a linear relationship is not an appropriate assumption for our model?
One widely used alternative is maximum likelihood estimation, which involves specifying a
class of distributions, indexed by unknown parameters, and then using the data to pin down
these parameter values
The benefit relative to linear regression is that it allows more flexibility in the probabilistic
relationships between variables
Here we illustrate maximum likelihood by replicating Daniel Treisman’s (2016) paper, Rus-
sia’s Billionaires, which connects the number of billionaires in a country to its economic char-
acteristics
The paper concludes that Russia has a higher number of billionaires than economic factors
such as market size and tax rate predict


19.2.1 Prerequisites

We assume familiarity with basic probability and multivariate calculus

19.2.2 Comments

This lecture is co-authored with Natasha Watkins

19.3 Set Up and Assumptions

Let’s consider the steps we need to go through in maximum likelihood estimation and how
they pertain to this study

19.3.1 Flow of Ideas

The first step with maximum likelihood estimation is to choose the probability distribution
believed to be generating the data
More precisely, we need to make an assumption as to which parametric class of distributions
is generating the data

• e.g., the class of all normal distributions, or the class of all gamma distributions

Each such class is a family of distributions indexed by a finite number of parameters

• e.g., the class of normal distributions is a family of distributions indexed by its mean
𝜇 ∈ (−∞, ∞) and standard deviation 𝜎 ∈ (0, ∞)

We’ll let the data pick out a particular element of the class by pinning down the parameters
The parameter estimates so produced will be called maximum likelihood estimates

19.3.2 Counting Billionaires

Treisman [129] is interested in estimating the number of billionaires in different countries


The number of billionaires is integer-valued
Hence we consider distributions that take values only in the nonnegative integers
(This is one reason least squares regression is not the best tool for the present problem, since
the dependent variable in linear regression is not restricted to integer values)
One integer distribution is the Poisson distribution, the probability mass function (pmf) of
which is

$$f(y) = \frac{\mu^{y}}{y!} e^{-\mu}, \qquad y = 0, 1, 2, \ldots, \infty$$

We can plot the Poisson distribution over 𝑦 for different values of 𝜇 as follows

In [1]: from numpy import exp
from scipy.special import factorial
import matplotlib.pyplot as plt
%matplotlib inline

# Poisson pmf evaluated at y for a given mean μ
poisson_pmf = lambda y, μ: μ**y / factorial(y) * exp(-μ)
y_values = range(0, 25)

fig, ax = plt.subplots(figsize=(12, 8))

for μ in [1, 5, 10]:
    distribution = []
    for y_i in y_values:
        distribution.append(poisson_pmf(y_i, μ))
    ax.plot(y_values,
            distribution,
            label=rf'$\mu$={μ}',
            alpha=0.5,
            marker='o',
            markersize=8)

ax.grid()
ax.set_xlabel('$y$', fontsize=14)
ax.set_ylabel(r'$f(y \mid \mu)$', fontsize=14)
ax.axis(xmin=0, ymin=0)
ax.legend(fontsize=14)

plt.show()

Notice that the Poisson distribution begins to resemble a normal distribution as the mean of
𝑦 increases
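One way to put a number on this resemblance is to compare the Poisson pmf with a normal density that has the same mean and variance (both equal to 𝜇). The sketch below uses only the standard library; the helper names `poisson_pmf_log`, `normal_pdf` and `max_gap` are introduced here for illustration

```python
from math import exp, lgamma, log, pi, sqrt

def poisson_pmf_log(y, mu):
    # Computed in logs to avoid overflow in mu**y and y!
    return exp(y * log(mu) - mu - lgamma(y + 1))

def normal_pdf(y, mean, var):
    return exp(-(y - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

def max_gap(mu):
    # Largest absolute gap between the Poisson pmf and the matching
    # normal density over the integers from 0 to roughly mu + 10 sd
    upper = int(mu + 10 * sqrt(mu)) + 1
    return max(abs(poisson_pmf_log(y, mu) - normal_pdf(y, mu, mu))
               for y in range(upper))

gaps = {mu: max_gap(mu) for mu in [1, 10, 100]}
for mu, gap in gaps.items():
    print(f'mu = {mu:>3}: max gap = {gap:.4f}')
```

The largest gap shrinks as 𝜇 grows, in line with the plot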
Let’s have a look at the distribution of the data we’ll be working with in this lecture
Treisman’s main source of data is Forbes’ annual rankings of billionaires and their estimated
net worth
The dataset mle/fp.dta can be downloaded here or from its AER page

In [2]: import pandas as pd


pd.options.display.max_columns = 10

# Load in data and view


df = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/mle/fp.dta')
df.head()

Out[2]: country ccode year cyear numbil … topint08 rintr \


0 United States 2.0 1990.0 21990.0 NaN … 39.799999 4.988405
1 United States 2.0 1991.0 21991.0 NaN … 39.799999 4.988405
2 United States 2.0 1992.0 21992.0 NaN … 39.799999 4.988405
3 United States 2.0 1993.0 21993.0 NaN … 39.799999 4.988405
4 United States 2.0 1994.0 21994.0 NaN … 39.799999 4.988405

noyrs roflaw nrrents


0 20.0 1.61 NaN
1 20.0 1.61 NaN
2 20.0 1.61 NaN
3 20.0 1.61 NaN
4 20.0 1.61 NaN

[5 rows x 36 columns]

Using a histogram, we can view the distribution of the number of billionaires per country,
numbil0, in 2008 (the United States is dropped for plotting purposes)

In [3]: numbil0_2008 = df[(df['year'] == 2008) & (
    df['country'] != 'United States')].loc[:, 'numbil0']

plt.subplots(figsize=(12, 8))
plt.hist(numbil0_2008, bins=30)
plt.xlim(left=0)  # `left` replaces the deprecated `xmin` argument
plt.grid()
plt.xlabel('Number of billionaires in 2008')
plt.ylabel('Count')
plt.show()

From the histogram, it appears that the Poisson assumption is not unreasonable (albeit with
a very low 𝜇 and some outliers)

19.4 Conditional Distributions

In Treisman’s paper, the dependent variable (the number of billionaires 𝑦𝑖 in country 𝑖) is modeled as a function of GDP per capita, population size, and years of membership in GATT and WTO
Hence, the distribution of 𝑦𝑖 needs to be conditioned on the vector of explanatory variables x𝑖
The standard formulation, the so-called Poisson regression model, is as follows:

$$f(y_i \mid \mathbf{x}_i) = \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}; \qquad y_i = 0, 1, 2, \ldots, \infty. \tag{1}$$

where 𝜇𝑖 = exp(x′𝑖 𝛽) = exp(𝛽0 + 𝛽1 𝑥𝑖1 + … + 𝛽𝑘 𝑥𝑖𝑘 )

To illustrate the idea that the distribution of 𝑦𝑖 depends on x𝑖 let’s run a simple simulation
We use our poisson_pmf function from above and arbitrary values for 𝛽 and x𝑖

In [4]: import numpy as np

y_values = range(0, 20)

# Define a parameter vector with estimates


β = np.array([0.26, 0.18, 0.25, -0.1, -0.22])

# Create some observations X


datasets = [np.array([0, 1, 1, 1, 2]),
np.array([2, 3, 2, 4, 0]),
np.array([3, 4, 5, 3, 2]),
np.array([6, 5, 4, 4, 7])]

fig, ax = plt.subplots(figsize=(12, 8))

for X in datasets:
    μ = exp(X @ β)
    distribution = []
    for y_i in y_values:
        distribution.append(poisson_pmf(y_i, μ))
    ax.plot(y_values,
            distribution,
            label=rf'$\mu_i$={μ:.1}',
            marker='o',
            markersize=8,
            alpha=0.5)

ax.grid()
ax.legend()
ax.set_xlabel(r'$y \mid x_i$')
ax.set_ylabel(r'$f(y \mid x_i; \beta )$')
ax.axis(xmin=0, ymin=0)
plt.show()

We can see that the distribution of 𝑦𝑖 is conditional on x𝑖 (𝜇𝑖 is no longer constant)

19.5 Maximum Likelihood Estimation

In our model for number of billionaires, the conditional distribution contains 4 (𝑘 = 4) pa-
rameters that we need to estimate
We will label our entire parameter vector as 𝛽 where

$$\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}$$

To estimate the model using MLE, we want to maximize the likelihood that our estimate 𝛽̂ is
the true parameter 𝛽
Intuitively, we want to find the 𝛽̂ that best fits our data
First, we need to construct the likelihood function ℒ(𝛽), which is similar to a joint probabil-
ity density function
Assume we have some data 𝑦𝑖 = {𝑦1 , 𝑦2 } and 𝑦𝑖 ∼ 𝑓(𝑦𝑖 )
If 𝑦1 and 𝑦2 are independent, the joint pmf of these data is 𝑓(𝑦1 , 𝑦2 ) = 𝑓(𝑦1 ) ⋅ 𝑓(𝑦2 )
If 𝑦𝑖 follows a Poisson distribution with 𝜇 = 7, we can visualize the joint pmf like so

In [5]: from mpl_toolkits.mplot3d import Axes3D

def plot_joint_poisson(μ=7, y_n=20):
    yi_values = np.arange(0, y_n, 1)

    # Create coordinate points of X and Y
    X, Y = np.meshgrid(yi_values, yi_values)

    # Multiply distributions together
    Z = poisson_pmf(X, μ) * poisson_pmf(Y, μ)

    fig = plt.figure(figsize=(12, 8))
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_surface(X, Y, Z.T, cmap='terrain', alpha=0.6)
    ax.scatter(X, Y, Z.T, color='black', alpha=0.5, linewidths=1)
    ax.set(xlabel='$y_1$', ylabel='$y_2$')
    ax.set_zlabel('$f(y_1, y_2)$', labelpad=10)
    plt.show()

plot_joint_poisson(μ=7, y_n=20)

Similarly, the joint pmf of our data (which is distributed as a conditional Poisson distribu-
tion) can be written as

$$f(y_1, y_2, \ldots, y_n \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n; \beta) = \prod_{i=1}^{n} \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}$$

𝑦𝑖 is conditional on both the values of x𝑖 and the parameters 𝛽


The likelihood function is the same as the joint pmf, but is viewed as a function of the parameter 𝛽, taking the observations (𝑦𝑖 , x𝑖 ) as given

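$$
\begin{aligned}
\mathcal{L}(\beta \mid y_1, y_2, \ldots, y_n; \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)
&= \prod_{i=1}^{n} \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i} \\
&= f(y_1, y_2, \ldots, y_n \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n; \beta)
\end{aligned}
$$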

Now that we have our likelihood function, we want to find the 𝛽̂ that yields the maximum
likelihood value

$$\max_{\beta} \mathcal{L}(\beta)$$

In doing so it is generally easier to maximize the log-likelihood (consider differentiating $f(x) = x \exp(x)$ vs. $f(x) = \log(x) + x$)
Given that taking a logarithm is a monotone increasing transformation, a maximizer of the
likelihood function will also be a maximizer of the log-likelihood function
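A small numerical check of this claim, sketched for the simplest case of i.i.d. Poisson counts with a single parameter 𝜇 (the sample `y` and the grid `mu_grid` are made up for illustration): the likelihood and the log-likelihood are evaluated on the same grid and their maximizers compared

```python
import numpy as np
from math import lgamma

y = np.array([3, 5, 4, 6, 2])            # hypothetical i.i.d. Poisson counts
mu_grid = np.linspace(0.5, 10, 1901)     # candidate values of mu (step 0.005)

log_fact = np.array([lgamma(v + 1) for v in y])   # log(y_i!)

# Log-likelihood at each candidate mu: sum_i (y_i log mu - mu - log y_i!)
logL = np.array([np.sum(y * np.log(mu) - mu - log_fact) for mu in mu_grid])
L = np.exp(logL)                         # the likelihood itself

mu_from_L = mu_grid[np.argmax(L)]
mu_from_logL = mu_grid[np.argmax(logL)]

print(mu_from_L, mu_from_logL, y.mean())
```

Both searches pick the same grid point, which here also coincides with the sample mean, the known closed-form MLE for an i.i.d. Poisson sample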
In our case the log-likelihood is

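$$
\begin{aligned}
\log \mathcal{L}(\beta)
&= \log \big( f(y_1; \beta) \cdot f(y_2; \beta) \cdot \ldots \cdot f(y_n; \beta) \big) \\
&= \sum_{i=1}^{n} \log f(y_i; \beta) \\
&= \sum_{i=1}^{n} \log \Big( \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i} \Big) \\
&= \sum_{i=1}^{n} y_i \log \mu_i - \sum_{i=1}^{n} \mu_i - \sum_{i=1}^{n} \log y_i!
\end{aligned}
$$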

The MLE 𝛽̂ of the Poisson model can be obtained by solving

$$\max_{\beta} \Big( \sum_{i=1}^{n} y_i \log \mu_i - \sum_{i=1}^{n} \mu_i - \sum_{i=1}^{n} \log y_i! \Big)$$

However, no analytical solution exists to the above problem – to find the MLE we need to use
numerical methods

19.6 MLE with Numerical Methods

Many distributions do not have nice, analytical solutions and therefore require numerical
methods to solve for parameter estimates
One such numerical method is the Newton-Raphson algorithm
Our goal is to find the maximum likelihood estimate 𝛽̂
At 𝛽,̂ the first derivative of the log-likelihood function will be equal to 0
Let’s illustrate this by supposing

$$\log \mathcal{L}(\beta) = -(\beta - 10)^2 - 10$$

In [6]: β = np.linspace(1, 20)


logL = -(β - 10) ** 2 - 10
dlogL = -2 * β + 20

fig, (ax1, ax2) = plt.subplots(2, sharex=True, figsize=(12, 8))



ax1.plot(β, logL, lw=2)


ax2.plot(β, dlogL, lw=2)

ax1.set_ylabel(r'$log \mathcal{L(\beta)}$',
rotation=0,
labelpad=35,
fontsize=15)
ax2.set_ylabel(r'$\frac{dlog \mathcal{L(\beta)}}{d \beta}$ ',
rotation=0,
labelpad=35,
fontsize=19)
ax2.set_xlabel(r'$\beta$', fontsize=15)
ax1.grid(), ax2.grid()
plt.axhline(c='black')
plt.show()

The plot shows that the maximum likelihood value (the top plot) occurs when $\frac{d \log \mathcal{L}(\beta)}{d \beta} = 0$ (the bottom plot)
Therefore, the likelihood is maximized when 𝛽 = 10
We can also ensure that this value is a maximum (as opposed to a minimum) by checking
that the second derivative (slope of the bottom plot) is negative
The Newton-Raphson algorithm finds a point where the first derivative is 0
To use the algorithm, we take an initial guess at the maximum value, 𝛽0 (the OLS parameter
estimates might be a reasonable guess), then

1. Use the updating rule to iterate the algorithm

$$\beta_{(k+1)} = \beta_{(k)} - H^{-1}(\beta_{(k)}) \, G(\beta_{(k)})$$

where:

$$G(\beta_{(k)}) = \frac{d \log \mathcal{L}(\beta_{(k)})}{d \beta_{(k)}}$$

$$H(\beta_{(k)}) = \frac{d^2 \log \mathcal{L}(\beta_{(k)})}{d \beta_{(k)} \, d \beta'_{(k)}}$$
2. Check whether 𝛽 (𝑘+1) − 𝛽 (𝑘) < 𝑡𝑜𝑙

• If true, then stop iterating and set 𝛽̂ = 𝛽 (𝑘+1)


• If false, then update 𝛽 (𝑘+1)

As can be seen from the updating equation, 𝛽 (𝑘+1) = 𝛽 (𝑘) only when 𝐺(𝛽 (𝑘) ) = 0, i.e., where the first derivative is equal to 0
(In practice, we stop iterating when the difference is below a small tolerance threshold)
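Before moving to the vector case, the steps above can be sketched for a single parameter, using the quadratic log-likelihood from the earlier illustration. The function name `newton_raphson_scalar` and its arguments are introduced here for illustration only, and the stopping rule compares the absolute size of the step with the tolerance

```python
def newton_raphson_scalar(G, H, beta0, tol=1e-8, max_iter=100):
    """Iterate beta <- beta - G(beta) / H(beta) until the step is smaller than tol."""
    beta = beta0
    for k in range(max_iter):
        beta_new = beta - G(beta) / H(beta)
        if abs(beta_new - beta) < tol:
            return beta_new, k + 1       # estimate and number of iterations used
        beta = beta_new
    raise RuntimeError('Newton-Raphson did not converge')

# Gradient and Hessian of log L(beta) = -(beta - 10)**2 - 10
G = lambda beta: -2 * (beta - 10)
H = lambda beta: -2.0

beta_hat, n_iter = newton_raphson_scalar(G, H, beta0=1.0)
print(beta_hat, n_iter)
```

Because this log-likelihood is exactly quadratic, the first step already lands on the maximizer and the second step only confirms that the update no longer moves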
Let’s have a go at implementing the Newton-Raphson algorithm
First, we’ll create a class called PoissonRegression so we can easily recompute the values
of the log likelihood, gradient and Hessian for every iteration
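For reference, the gradient and Hessian used in the class can be derived from the log-likelihood above by substituting $\mu_i = \exp(\mathbf{x}_i' \beta)$. This is a sketch of the standard derivation, written with the data matrix $X$ (rows $\mathbf{x}_i'$) and the vector $\mu = (\mu_1, \ldots, \mu_n)'$

```latex
\begin{aligned}
\log \mathcal{L}(\beta)
  &= \sum_{i=1}^{n} \left( y_i \, \mathbf{x}_i' \beta - \exp(\mathbf{x}_i' \beta) - \log y_i! \right) \\
G(\beta) = \frac{d \log \mathcal{L}(\beta)}{d \beta}
  &= \sum_{i=1}^{n} (y_i - \mu_i) \, \mathbf{x}_i = X' (y - \mu) \\
H(\beta) = \frac{d^2 \log \mathcal{L}(\beta)}{d \beta \, d \beta'}
  &= -\sum_{i=1}^{n} \mu_i \, \mathbf{x}_i \mathbf{x}_i' = -X' \operatorname{diag}(\mu) \, X
\end{aligned}
```

Since every $\mu_i > 0$, the Hessian is negative semidefinite, and negative definite when $X$ has full column rank, so a point where $G(\beta) = 0$ is a maximum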

In [7]: class PoissonRegression:

    def __init__(self, y, X, β):
        self.X = X
        self.n, self.k = X.shape
        self.y = y.reshape(self.n,1)  # Reshape y as a n_by_1 column vector
        self.β = β.reshape(self.k,1)  # Reshape β as a k_by_1 column vector

    def μ(self):
        return np.exp(self.X @ self.β)

    def logL(self):
        y = self.y
        μ = self.μ()
        return np.sum(y * np.log(μ) - μ - np.log(factorial(y)))

    def G(self):
        X = self.X  # Use the instance's data rather than a global X
        y = self.y
        μ = self.μ()
        return X.T @ (y - μ)

    def H(self):
        X = self.X
        μ = self.μ()
        return -(X.T @ (μ * X))

Our function newton_raphson will take a PoissonRegression object that has an initial
guess of the parameter vector 𝛽 0
The algorithm will update the parameter vector according to the updating rule, and recalcu-
late the gradient and Hessian matrices at the new parameter estimates
Iteration will end when either:

• The difference between the parameter and the updated parameter is below a tolerance
level
• The maximum number of iterations has been reached (meaning convergence was not achieved)

So we can get an idea of what’s going on while the algorithm is running, an option display=True is added to print out values at each iteration

In [8]: def newton_raphson(model, tol=1e-3, max_iter=1000, display=True):

    i = 0
    error = 100  # Initial error value

    # Print header of output
    if display:
        header = f'{"Iteration_k":<13}{"Log-likelihood":<16}{"θ":<60}'
        print(header)
        print("-" * len(header))

    # While loop runs while any value in error is greater
    # than the tolerance until max iterations are reached
    while np.any(error > tol) and i < max_iter:
        H, G = model.H(), model.G()
        β_new = model.β - (np.linalg.inv(H) @ G)
        error = β_new - model.β
        model.β = β_new

        # Print iterations
        if display:
            β_list = [f'{t:.3}' for t in list(model.β.flatten())]
            update = f'{i:<13}{model.logL():<16.8}{β_list}'
            print(update)

        i += 1

    print(f'Number of iterations: {i}')
    print(f'β_hat = {model.β.flatten()}')

    return model.β.flatten()  # Return a flat array for β (instead of a k_by_1 column vector)

Let’s try out our algorithm with a small dataset of 5 observations and 3 variables in X

In [9]: X = np.array([[1, 2, 5],


[1, 1, 3],
[1, 4, 2],
[1, 5, 2],
[1, 3, 1]])

y = np.array([1, 0, 1, 1, 0])

# Take a guess at initial βs


init_β = np.array([0.1, 0.1, 0.1])

# Create an object with Poisson model values


poi = PoissonRegression(y, X, β=init_β)

# Use newton_raphson to find the MLE


β_hat = newton_raphson(poi, display=True)

Iteration_k Log-likelihood θ
-----------------------------------------------------------------------------------------
0 -4.3447622 ['-1.49', '0.265', '0.244']
1 -3.5742413 ['-3.38', '0.528', '0.474']
2 -3.3999526 ['-5.06', '0.782', '0.702']
3 -3.3788646 ['-5.92', '0.909', '0.82']
4 -3.3783559 ['-6.07', '0.933', '0.843']
5 -3.3783555 ['-6.08', '0.933', '0.843']
Number of iterations: 6
β_hat = [-6.07848205 0.93340226 0.84329625]

As this was a simple model with few observations, the algorithm achieved convergence in only
6 iterations

You can see that with each iteration, the log-likelihood value increased
Remember, our objective was to maximize the log-likelihood function, which the algorithm
has worked to achieve
Also, note that the increase in log ℒ(𝛽 (𝑘) ) becomes smaller with each iteration
This is because the gradient is approaching 0 as we reach the maximum, and therefore the
numerator in our updating equation is becoming smaller
The gradient vector should be close to 0 at 𝛽̂

In [10]: poi.G()

Out[10]: array([[-3.95169228e-07],
[-1.00114805e-06],
[-7.73114562e-07]])

The iterative process can be visualized in the following diagram, where the maximum is found
at 𝛽 = 10

In [11]: logL = lambda x: -(x - 10) ** 2 - 10

def find_tangent(β, a=0.01):
    y1 = logL(β)
    y2 = logL(β+a)
    x = np.array([[β, 1], [β+a, 1]])
    m, c = np.linalg.lstsq(x, np.array([y1, y2]), rcond=None)[0]
    return m, c

β = np.linspace(2, 18)
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(β, logL(β), lw=2, c='black')

for β in [7, 8.5, 9.5, 10]:
    β_line = np.linspace(β-2, β+2)
    m, c = find_tangent(β)
    y = m * β_line + c
    ax.plot(β_line, y, '-', c='purple', alpha=0.8)
    ax.text(β+2.05, y[-1], f'$G({β}) = {abs(m):.0f}$', fontsize=12)
    ax.vlines(β, -24, logL(β), linestyles='--', alpha=0.5)
    ax.hlines(logL(β), 6, β, linestyles='--', alpha=0.5)

ax.set(ylim=(-24, -4), xlim=(6, 13))
ax.set_xlabel(r'$\beta$', fontsize=15)
ax.set_ylabel(r'$log \mathcal{L(\beta)}$',
              rotation=0,
              labelpad=25,
              fontsize=15)
ax.grid(alpha=0.3)
plt.show()

Note that our implementation of the Newton-Raphson algorithm is rather basic — for more
robust implementations see, for example, scipy.optimize
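As a rough sketch of that route, the small example above can be re-estimated with scipy.optimize.minimize by minimizing the negative log-likelihood. The helper names `neg_logL` and `neg_G` are introduced here for illustration, and BFGS is just one possible choice of method, not the algorithm used earlier in this lecture

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[1, 2, 5],
              [1, 1, 3],
              [1, 4, 2],
              [1, 5, 2],
              [1, 3, 1]], dtype=float)
y = np.array([1, 0, 1, 1, 0], dtype=float)

def neg_logL(beta):
    # Negative Poisson log-likelihood; log(y_i!) = 0 here
    # because every y_i in this sample is 0 or 1
    xb = X @ beta
    return np.sum(np.exp(xb) - y * xb)

def neg_G(beta):
    # Gradient of the negative log-likelihood: -X'(y - mu)
    mu = np.exp(X @ beta)
    return -X.T @ (y - mu)

res = minimize(neg_logL, x0=np.full(3, 0.1), jac=neg_G, method='BFGS')

print(res.x)       # parameter estimates
print(-res.fun)    # maximized log-likelihood
```

The estimates and the maximized log-likelihood should be close to the values found by our own newton_raphson function, up to the optimizer's tolerance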

19.7 Maximum Likelihood Estimation with statsmodels

Now that we know what’s going on under the hood, we can apply MLE to an interesting ap-
plication
We’ll use the Poisson regression model in statsmodels to obtain a richer output with stan-
dard errors, test values, and more
statsmodels uses the same algorithm as above to find the maximum likelihood estimates
Before we begin, let’s re-estimate our simple model with statsmodels to confirm we obtain
the same coefficients and log-likelihood value

In [12]: from statsmodels.api import Poisson


from scipy import stats

X = np.array([[1, 2, 5],
[1, 1, 3],
[1, 4, 2],
[1, 5, 2],
[1, 3, 1]])

y = np.array([1, 0, 1, 1, 0])

stats_poisson = Poisson(y, X).fit()


print(stats_poisson.summary())

Optimization terminated successfully.


Current function value: 0.675671
Iterations 7
Poisson Regression Results
==============================================================================

Dep. Variable: y No. Observations: 5


Model: Poisson Df Residuals: 2
Method: MLE Df Model: 2
Date: Fri, 21 Jun 2019 Pseudo R-squ.: 0.2546
Time: 15:37:09 Log-Likelihood: -3.3784
converged: True LL-Null: -4.5325
LLR p-value: 0.3153
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -6.0785 5.279 -1.151 0.250 -16.425 4.268
x1 0.9334 0.829 1.126 0.260 -0.691 2.558
x2 0.8433 0.798 1.057 0.291 -0.720 2.407
==============================================================================

Now let’s replicate results from Daniel Treisman’s paper, Russia’s Billionaires, mentioned ear-
lier in the lecture
Treisman starts by estimating Eq. (1), where:

• 𝑦𝑖 is 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑏𝑖𝑙𝑙𝑖𝑜𝑛𝑎𝑖𝑟𝑒𝑠𝑖
• 𝑥𝑖1 is log 𝐺𝐷𝑃 𝑝𝑒𝑟 𝑐𝑎𝑝𝑖𝑡𝑎𝑖
• 𝑥𝑖2 is log 𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛𝑖
• 𝑥𝑖3 is 𝑦𝑒𝑎𝑟𝑠 𝑖𝑛 𝐺𝐴𝑇 𝑇 𝑖 – years of membership in GATT and WTO (to proxy access to international markets)

The paper only considers the year 2008 for estimation


We will set up our variables for estimation like so (you should have the data assigned to df
from earlier in the lecture)

In [13]: # Keep only year 2008


df = df[df['year'] == 2008]

# Add a constant
df['const'] = 1

# Variable sets
reg1 = ['const', 'lngdppc', 'lnpop', 'gattwto08']
reg2 = ['const', 'lngdppc', 'lnpop',
'gattwto08', 'lnmcap08', 'rintr', 'topint08']
reg3 = ['const', 'lngdppc', 'lnpop', 'gattwto08', 'lnmcap08',
'rintr', 'topint08', 'nrrents', 'roflaw']

Then we can use the Poisson function from statsmodels to fit the model
We’ll use robust standard errors as in the author’s paper

In [14]: import statsmodels.api as sm

# Specify model
poisson_reg = sm.Poisson(df[['numbil0']], df[reg1],
missing='drop').fit(cov_type='HC0')
print(poisson_reg.summary())

Optimization terminated successfully.


Current function value: 2.226090
Iterations 9
Poisson Regression Results
==============================================================================
Dep. Variable: numbil0 No. Observations: 197
Model: Poisson Df Residuals: 193

Method: MLE Df Model: 3


Date: Fri, 21 Jun 2019 Pseudo R-squ.: 0.8574
Time: 15:37:10 Log-Likelihood: -438.54
converged: True LL-Null: -3074.7
LLR p-value: 0.000
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -29.0495 2.578 -11.268 0.000 -34.103 -23.997
lngdppc 1.0839 0.138 7.834 0.000 0.813 1.355
lnpop 1.1714 0.097 12.024 0.000 0.980 1.362
gattwto08 0.0060 0.007 0.868 0.386 -0.008 0.019
==============================================================================

Success! The algorithm was able to achieve convergence in 9 iterations


Our output indicates that GDP per capita, population, and years of membership in the Gen-
eral Agreement on Tariffs and Trade (GATT) are positively related to the number of billion-
aires a country has, as expected
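Because the Poisson conditional mean is exp(x′β) and GDP per capita and population enter in logs, their coefficients can be read as elasticities
The sketch below illustrates this with the Model 1 point estimates reported above, copied by hand; the helper `pct_change_in_mean` is a hypothetical name introduced only for this illustration

```python
# Model 1 point estimates copied from the regression output above
coefs = {'lngdppc': 1.0839, 'lnpop': 1.1714}

def pct_change_in_mean(coef, pct_increase):
    """
    Percent change in the Poisson mean exp(x'β) when the level of a
    logged regressor rises by pct_increase percent, other things equal.
    """
    return ((1 + pct_increase / 100) ** coef - 1) * 100

for name, coef in coefs.items():
    change = pct_change_in_mean(coef, 10)
    print(f"10% rise in exp({name}): predicted count changes by {change:.2f}%")
```

A coefficient of exactly 1 would mean that the predicted count moves one-for-one in percentage terms with the regressor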
Let’s also estimate the author’s more full-featured models and display them in a single table

In [15]: from statsmodels.iolib.summary2 import summary_col

regs = [reg1, reg2, reg3]


reg_names = ['Model 1', 'Model 2', 'Model 3']
info_dict = {'Pseudo R-squared': lambda x: f"{x.prsquared:.2f}",
'No. observations': lambda x: f"{int(x.nobs):d}"}
regressor_order = ['const',
'lngdppc',
'lnpop',
'gattwto08',
'lnmcap08',
'rintr',
'topint08',
'nrrents',
'roflaw']
results = []

for reg in regs:
    result = sm.Poisson(df[['numbil0']], df[reg],
                        missing='drop').fit(cov_type='HC0', maxiter=100, disp=0)
    results.append(result)

results_table = summary_col(results=results,
float_format='%0.3f',
stars=True,
model_names=reg_names,
info_dict=info_dict,
regressor_order=regressor_order)
results_table.add_title('Table 1 - Explaining the Number of Billionaires in 2008')
print(results_table)

Table 1 - Explaining the Number of Billionaires in 2008
=================================================
                  Model 1    Model 2    Model 3
-------------------------------------------------
const            -29.050*** -19.444*** -20.858***
                 (2.578)    (4.820)    (4.255)
lngdppc          1.084***   0.717***   0.737***
                 (0.138)    (0.244)    (0.233)
lnpop            1.171***   0.806***   0.929***
                 (0.097)    (0.213)    (0.195)
gattwto08        0.006      0.007      0.004
                 (0.007)    (0.006)    (0.006)
lnmcap08                    0.399**    0.286*
                            (0.172)    (0.167)
rintr                       -0.010     -0.009
                            (0.010)    (0.010)
topint08                    -0.051***  -0.058***
                            (0.011)    (0.012)
nrrents                                -0.005
                                       (0.010)
roflaw                                 0.203
                                       (0.372)
Pseudo R-squared 0.86       0.90       0.90
No. observations 197        131        131
=================================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01

The output suggests that the frequency of billionaires is positively correlated with GDP
per capita, population size, stock market capitalization, and negatively correlated with top
marginal income tax rate
To analyze our results by country, we can plot the difference between the actual and predicted values, then sort from highest to lowest and plot the first 15

In [16]: data = ['const', 'lngdppc', 'lnpop', 'gattwto08', 'lnmcap08', 'rintr',


'topint08', 'nrrents', 'roflaw', 'numbil0', 'country']
results_df = df[data].dropna()

# Use last model (model 3)


results_df['prediction'] = results[-1].predict()

# Calculate difference
results_df['difference'] = results_df['numbil0'] - results_df['prediction']

# Sort in descending order


results_df.sort_values('difference', ascending=False, inplace=True)

# Plot the first 15 data points


results_df[:15].plot('country', 'difference', kind='bar', figsize=(12,8), legend=False)
plt.ylabel('Number of billionaires above predicted level')
plt.xlabel('Country')
plt.show()

As we can see, Russia has by far the highest number of billionaires in excess of what is pre-
dicted by the model (around 50 more than expected)
Treisman uses this empirical result to discuss possible reasons for Russia’s excess of billion-
aires, including the origination of wealth in Russia, the political climate, and the history of
privatization in the years after the USSR

19.8 Summary

In this lecture, we used Maximum Likelihood Estimation to estimate the parameters of a Poisson model
statsmodels contains other built-in likelihood models such as Probit and Logit
For further flexibility, statsmodels provides a way to specify the distribution manually us-
ing the GenericLikelihoodModel class - an example notebook can be found here

19.9 Exercises

19.9.1 Exercise 1

Suppose we wanted to estimate the probability of an event 𝑦𝑖 occurring, given some observa-
tions

We could use a probit regression model, where the pmf of $y_i$ is

$$
f(y_i; \beta) = \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}, \quad y_i = 0, 1
$$

where $\mu_i = \Phi(\mathbf{x}_i' \beta)$

Φ represents the cumulative normal distribution and constrains the predicted 𝑦𝑖 to be be-
tween 0 and 1 (as required for a probability)
𝛽 is a vector of coefficients
Following the example in the lecture, write a class to represent the Probit model
To begin, find the log-likelihood function and derive the gradient and Hessian
The scipy module stats.norm contains the functions needed to compute the cdf and pdf of the normal distribution

19.9.2 Exercise 2

Use the following dataset and initial values of 𝛽 to estimate the MLE with the Newton-
Raphson algorithm developed earlier in the lecture

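$$
\mathbf{X} =
\begin{bmatrix}
1 & 2 & 4 \\
1 & 1 & 1 \\
1 & 4 & 3 \\
1 & 5 & 6 \\
1 & 3 & 5
\end{bmatrix}
\quad
y =
\begin{bmatrix}
1 \\ 0 \\ 1 \\ 1 \\ 0
\end{bmatrix}
\quad
\beta_{(0)} =
\begin{bmatrix}
0.1 \\ 0.1 \\ 0.1
\end{bmatrix}
$$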

Verify your results with statsmodels - you can import the Probit function with the follow-
ing import statement

In [17]: from statsmodels.discrete.discrete_model import Probit

Note that the simple Newton-Raphson algorithm developed in this lecture is very sensitive to
initial values, and therefore you may fail to achieve convergence with different starting values

19.10 Solutions

19.10.1 Exercise 1

The log-likelihood can be written as

$$
\log \mathcal{L} = \sum_{i=1}^n \left[ y_i \log \Phi(\mathbf{x}_i' \beta) + (1 - y_i) \log (1 - \Phi(\mathbf{x}_i' \beta)) \right]
$$

Using the fundamental theorem of calculus, the derivative of a cumulative probability distribution is its density function

$$
\frac{\partial}{\partial s} \Phi(s) = \phi(s)
$$

where $\phi$ is the standard normal density


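The gradient vector of the Probit model is

$$
\frac{\partial \log \mathcal{L}}{\partial \beta} =
\sum_{i=1}^n \left[ y_i \frac{\phi(\mathbf{x}_i' \beta)}{\Phi(\mathbf{x}_i' \beta)} - (1 - y_i) \frac{\phi(\mathbf{x}_i' \beta)}{1 - \Phi(\mathbf{x}_i' \beta)} \right] \mathbf{x}_i
$$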

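The Hessian of the Probit model is

$$
\frac{\partial^2 \log \mathcal{L}}{\partial \beta \partial \beta'} =
- \sum_{i=1}^n \phi(\mathbf{x}_i' \beta)
\left[
y_i \frac{\phi(\mathbf{x}_i' \beta) + \mathbf{x}_i' \beta \, \Phi(\mathbf{x}_i' \beta)}{[\Phi(\mathbf{x}_i' \beta)]^2}
+ (1 - y_i) \frac{\phi(\mathbf{x}_i' \beta) - \mathbf{x}_i' \beta \, (1 - \Phi(\mathbf{x}_i' \beta))}{[1 - \Phi(\mathbf{x}_i' \beta)]^2}
\right] \mathbf{x}_i \mathbf{x}_i'
$$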

Using these results, we can write a class for the Probit model as follows

In [18]: from scipy.stats import norm
# numpy is assumed to be imported as np earlier in the lecture

class ProbitRegression:

    def __init__(self, y, X, β):
        self.X, self.y, self.β = X, y, β
        self.n, self.k = X.shape

    def μ(self):
        # Φ(x'β): predicted probabilities
        return norm.cdf(self.X @ self.β.T)

    def ϕ(self):
        # Standard normal density evaluated at x'β
        return norm.pdf(self.X @ self.β.T)

    def logL(self):
        y = self.y
        μ = self.μ()
        return np.sum(y * np.log(μ) + (1 - y) * np.log(1 - μ))

    def G(self):
        # Gradient of the log-likelihood
        X, y = self.X, self.y
        μ = self.μ()
        ϕ = self.ϕ()
        return np.sum((X.T * y * ϕ / μ - X.T * (1 - y) * ϕ / (1 - μ)), axis=1)

    def H(self):
        # Hessian of the log-likelihood
        X, y = self.X, self.y
        β = self.β
        μ = self.μ()
        ϕ = self.ϕ()
        a = (ϕ + (X @ β.T) * μ) / μ**2
        b = (ϕ - (X @ β.T) * (1 - μ)) / (1 - μ)**2
        return -(ϕ * (y * a + (1 - y) * b) * X.T) @ X
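One way to gain confidence in a hand-derived gradient is to compare it with a finite-difference approximation of the log-likelihood
The sketch below does this for the Probit gradient formula, using the data and starting values from Exercise 2; the function names `probit_logL`, `probit_grad` and `numerical_grad` are hypothetical helpers written for this check and are not part of the lecture's class

```python
import numpy as np
from scipy.stats import norm

def probit_logL(beta, y, X):
    """Probit log-likelihood at coefficient vector beta."""
    mu = norm.cdf(X @ beta)
    return np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))

def probit_grad(beta, y, X):
    """Analytic gradient from the formula derived above."""
    s = X @ beta
    mu, phi = norm.cdf(s), norm.pdf(s)
    return X.T @ (y * phi / mu - (1 - y) * phi / (1 - mu))

def numerical_grad(f, beta, h=1e-6):
    """Central finite-difference approximation to the gradient of f."""
    grad = np.zeros_like(beta)
    for j in range(len(beta)):
        e = np.zeros_like(beta)
        e[j] = h
        grad[j] = (f(beta + e) - f(beta - e)) / (2 * h)
    return grad

# Data and initial values from Exercise 2
X = np.array([[1, 2, 4],
              [1, 1, 1],
              [1, 4, 3],
              [1, 5, 6],
              [1, 3, 5]], dtype=float)
y = np.array([1, 0, 1, 1, 0], dtype=float)
beta = np.array([0.1, 0.1, 0.1])

analytic = probit_grad(beta, y, X)
numeric = numerical_grad(lambda b: probit_logL(b, y, X), beta)
print("Largest discrepancy:", np.max(np.abs(analytic - numeric)))
```

A discrepancy close to zero suggests the analytic gradient is coded consistently with the log-likelihood; the same idea can be applied to the Hessian by differencing the gradient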

19.10.2 Exercise 2
In [19]: X = np.array([[1, 2, 4],
[1, 1, 1],
[1, 4, 3],
[1, 5, 6],
[1, 3, 5]])

y = np.array([1, 0, 1, 1, 0])

# Take a guess at initial βs


β = np.array([0.1, 0.1, 0.1])

# Create instance of Probit regression class


prob = ProbitRegression(y, X, β)

# Run Newton-Raphson algorithm


newton_raphson(prob)

Iteration_k Log-likelihood θ
-----------------------------------------------------------------------------------------
0 -2.3796884 ['-1.34', '0.775', '-0.157']
1 -2.3687526 ['-1.53', '0.775', '-0.0981']
2 -2.3687294 ['-1.55', '0.778', '-0.0971']
3 -2.3687294 ['-1.55', '0.778', '-0.0971']
Number of iterations: 4
β_hat = [-1.54625858 0.77778952 -0.09709757]

Out[19]: array([-1.54625858, 0.77778952, -0.09709757])

In [20]: # Use statsmodels to verify results

print(Probit(y, X).fit().summary())

Optimization terminated successfully.


Current function value: 0.473746
Iterations 6
Probit Regression Results
==============================================================================
Dep. Variable: y No. Observations: 5
Model: Probit Df Residuals: 2
Method: MLE Df Model: 2
Date: Fri, 21 Jun 2019 Pseudo R-squ.: 0.2961
Time: 15:37:10 Log-Likelihood: -2.3687
converged: True LL-Null: -3.3651
LLR p-value: 0.3692
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -1.5463 1.866 -0.829 0.407 -5.204 2.111
x1 0.7778 0.788 0.986 0.324 -0.768 2.323
x2 -0.0971 0.590 -0.165 0.869 -1.254 1.060
==============================================================================
Part V

Tools and Techniques

20

Geometric Series for Elementary Economics

20.1 Contents

• Overview 20.2
• Key Formulas 20.3
• Example: The Money Multiplier in Fractional Reserve Banking 20.4
• Example: The Keynesian Multiplier 20.5
• Example: Interest Rates and Present Values 20.6
• Back to the Keynesian Multiplier 20.7

20.2 Overview

This lecture describes important ideas in economics that use the mathematics of geometric series
Among these are

• the Keynesian multiplier


• the money multiplier that prevails in fractional reserve banking systems
• interest rates and present values of streams of payouts from assets

(As we shall see below, the term multiplier comes down to meaning sum of a convergent
geometric series)
These and other applications prove the truth of the wisecrack that

“in economics, a little knowledge of geometric series goes a long way”

Below we’ll use the following imports

In [1]: import matplotlib.pyplot as plt


import numpy as np


20.3 Key Formulas

To start, let 𝑐 be a real number that lies strictly between −1 and 1

• We often write this as 𝑐 ∈ (−1, 1)


• Here (−1, 1) denotes the collection of all real numbers that are strictly less than 1 and
strictly greater than −1
• The symbol ∈ means in or belongs to the set after the symbol

We want to evaluate geometric series of two types – infinite and finite

20.3.1 Infinite Geometric Series

The first type of geometric series that interests us is the infinite series

$$
1 + c + c^2 + c^3 + \cdots
$$

where $\cdots$ means that the series continues without limit

The key formula is

$$
1 + c + c^2 + c^3 + \cdots = \frac{1}{1 - c} \tag{1}
$$

To prove key formula Eq. (1), multiply both sides by $(1 - c)$ and verify that if $c \in (-1, 1)$, then the outcome is the equation $1 = 1$
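The convergence behind Eq. (1) can also be seen numerically by adding up more and more terms
The following is a minimal sketch; the function name `partial_sum` and the value c = 0.5 are illustrative choices

```python
def partial_sum(c, n_terms):
    """Sum of the first n_terms terms: 1 + c + c**2 + ... + c**(n_terms - 1)."""
    return sum(c**k for k in range(n_terms))

c = 0.5
limit = 1 / (1 - c)   # right side of the key formula

for n_terms in (5, 20, 60):
    print(n_terms, partial_sum(c, n_terms), limit)
```

The partial sums approach 1 / (1 - c) as the number of terms grows, which is what the key formula asserts for c strictly between -1 and 1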

20.3.2 Finite Geometric Series

The second series that interests us is the finite geometric series

$$
1 + c + c^2 + c^3 + \cdots + c^T
$$

where $T$ is a positive integer

The key formula here is

$$
1 + c + c^2 + c^3 + \cdots + c^T = \frac{1 - c^{T+1}}{1 - c}
$$

Remark: The above formula works for any value of the scalar $c$ other than $c = 1$ (in which case the sum is simply $T + 1$). We don’t have to restrict $c$ to be in the set $(-1, 1)$
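To illustrate the remark, the sketch below compares a term-by-term sum with the closed-form expression for values of c both inside and outside (-1, 1)
The function names are hypothetical and chosen only for this check

```python
def finite_sum_direct(c, T):
    """Add up 1 + c + ... + c**T term by term."""
    return sum(c**t for t in range(T + 1))

def finite_sum_formula(c, T):
    """Closed-form expression (1 - c**(T+1)) / (1 - c), valid when c != 1."""
    return (1 - c**(T + 1)) / (1 - c)

T = 10
for c in (0.5, -0.9, 2.0):
    print(c, finite_sum_direct(c, T), finite_sum_formula(c, T))
```

The two columns agree even for c = 2.0, where the infinite series would not converge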
We now move on to describe some famous economic applications of geometric series

20.4 Example: The Money Multiplier in Fractional Reserve Banking

In a fractional reserve banking system, banks hold only a fraction 𝑟 ∈ (0, 1) of cash behind
each deposit receipt that they issue

• In recent times

– cash consists of pieces of paper issued by the government and called dollars or
pounds or …
– a deposit is a balance in a checking or savings account that entitles the owner to
ask the bank for immediate payment in cash

• When the UK and France and the US were on either a gold or silver standard (before
1914, for example)

– cash was a gold or silver coin


– a deposit receipt was a bank note that the bank promised to convert into gold or silver on demand (sometimes it was also a checking or savings account balance)

Economists and financiers often define the supply of money as an economy-wide sum of
cash plus deposits
In a fractional reserve banking system (one in which the reserve ratio 𝑟 satisfies 0 < 𝑟 < 1), banks create money by issuing deposits backed by fractional reserves plus loans that they make to their customers
A geometric series is a key tool for understanding how banks create money (i.e., deposits) in
a fractional reserve system
The geometric series formula Eq. (1) is at the heart of the classic model of the money cre-
ation process – one that leads us to the celebrated money multiplier

20.4.1 A Simple Model

There is a set of banks named 𝑖 = 0, 1, 2, …


Bank 𝑖’s loans 𝐿𝑖 , deposits 𝐷𝑖 , and reserves 𝑅𝑖 must satisfy the balance sheet equation (be-
cause balance sheets balance):

𝐿𝑖 + 𝑅𝑖 = 𝐷𝑖

The left side of the above equation is the sum of the bank’s assets, namely, the loans 𝐿𝑖 it
has outstanding plus its reserves of cash 𝑅𝑖
The right side records bank 𝑖’s liabilities, namely, the deposits 𝐷𝑖 held by its depositors; these are IOUs from the bank to its depositors in the form of either checking accounts or savings accounts (or before 1914, bank notes issued by a bank promising to redeem the note for gold or silver on demand)
Each bank 𝑖 sets its reserves to satisfy the equation

𝑅𝑖 = 𝑟𝐷𝑖 (2)

where 𝑟 ∈ (0, 1) is its reserve-deposit ratio or reserve ratio for short

• the reserve ratio is either set by a government or chosen by banks for precautionary rea-
sons

Next we add a theory stating that bank 𝑖 + 1’s deposits depend entirely on loans made by
bank 𝑖, namely

𝐷𝑖+1 = 𝐿𝑖 (3)

Thus, we can think of the banks as being arranged along a line with loans from bank 𝑖 being
immediately deposited in 𝑖 + 1

• in this way, the debtors to bank 𝑖 become creditors of bank 𝑖 + 1

Finally, we add an initial condition about an exogenous level of bank 0’s deposits

𝐷0 is given exogenously

We can think of 𝐷0 as being the amount of cash that a first depositor put into the first bank
in the system, bank number 𝑖 = 0
Now we do a little algebra
Combining the balance sheet equation with Eq. (2) tells us that

𝐿𝑖 = (1 − 𝑟)𝐷𝑖 (4)

This states that bank 𝑖 loans a fraction (1 − 𝑟) of its deposits and keeps a fraction 𝑟 as cash
reserves
Combining equation Eq. (4) with equation Eq. (3) tells us that

𝐷𝑖+1 = (1 − 𝑟)𝐷𝑖 for 𝑖 ≥ 0

which implies that

$$
D_i = (1 - r)^i D_0 \quad \text{for } i \geq 0 \tag{5}
$$

Equation Eq. (5) expresses $D_i$ as the $i$th term in the product of $D_0$ and the geometric series

$$
1, (1 - r), (1 - r)^2, \cdots
$$

Therefore, the sum of all deposits in our banking system $i = 0, 1, 2, \ldots$ is

$$
\sum_{i=0}^{\infty} (1 - r)^i D_0 = \frac{D_0}{1 - (1 - r)} = \frac{D_0}{r} \tag{6}
$$

20.4.2 Money Multiplier

The money multiplier is a number that tells the multiplicative factor by which an exoge-
nous injection of cash into bank 0 leads to an increase in the total deposits in the banking
system
Equation Eq. (6) asserts that the money multiplier is $1/r$

• an initial deposit of cash of $D_0$ in bank 0 leads the banking system to create total deposits of $D_0 / r$
• the initial deposit $D_0$ is held as reserves, distributed throughout the banking system according to $D_0 = \sum_{i=0}^{\infty} R_i$
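The chain of banks can be simulated directly to check both claims
The sketch below is a minimal illustration; the function `simulate_banks`, the initial deposit of 100 and the reserve ratio of 0.1 are assumptions made for the example, and the infinite chain is truncated at 500 banks

```python
def simulate_banks(D_0, r, n_banks):
    """
    Follow an initial deposit D_0 through a chain of banks that each hold
    a fraction r of deposits as reserves and lend out the rest.
    """
    deposits, reserves = [], []
    D = D_0
    for _ in range(n_banks):
        R = r * D      # Eq. (2): reserves held by this bank
        L = D - R      # balance sheet: loans = deposits - reserves
        deposits.append(D)
        reserves.append(R)
        D = L          # Eq. (3): loans become the next bank's deposits
    return deposits, reserves

D_0, r = 100.0, 0.1
deposits, reserves = simulate_banks(D_0, r, n_banks=500)

print("Total deposits:", sum(deposits), " D_0 / r:", D_0 / r)
print("Total reserves:", sum(reserves), " D_0:", D_0)
```

Total deposits come out at (approximately) D_0 / r, while total reserves add up to the original cash deposit D_0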

20.5 Example: The Keynesian Multiplier

The famous economist John Maynard Keynes and his followers created a simple model in-
tended to determine national income 𝑦 in circumstances in which

• there are substantial unemployed resources, in particular excess supply of labor and
capital
• prices and interest rates fail to adjust to make aggregate supply equal demand (e.g.,
prices and interest rates are frozen)
• national income is entirely determined by aggregate demand

20.5.1 Static Version

An elementary Keynesian model of national income determination consists of three equations that describe aggregate demand for 𝑦 and its components
The first equation is a national income identity asserting that consumption 𝑐 plus investment
𝑖 equals national income 𝑦:

𝑐+𝑖 = 𝑦

The second equation is a Keynesian consumption function asserting that people consume a
fraction 𝑏 ∈ (0, 1) of their income:

𝑐 = 𝑏𝑦

The fraction 𝑏 ∈ (0, 1) is called the marginal propensity to consume


The fraction 1 − 𝑏 ∈ (0, 1) is called the marginal propensity to save
The third equation simply states that investment is exogenous at level 𝑖

• exogenous means determined outside this model

Substituting the second equation into the first gives (1 − 𝑏)𝑦 = 𝑖


Solving this equation for $y$ gives

$$
y = \frac{1}{1 - b} i
$$

The quantity $\frac{1}{1 - b}$ is called the investment multiplier or simply the multiplier
Applying the formula for the sum of an infinite geometric series, we can write the above equation as

$$
y = i \sum_{t=0}^{\infty} b^t
$$

where $t$ is a nonnegative integer

So we arrive at the following equivalent expressions for the multiplier:

$$
\frac{1}{1 - b} = \sum_{t=0}^{\infty} b^t
$$

The expression $\sum_{t=0}^{\infty} b^t$ motivates an interpretation of the multiplier as the outcome of a dynamic process that we describe next

20.5.2 Dynamic Version

We arrive at a dynamic version by interpreting the nonnegative integer 𝑡 as indexing time and
changing our specification of the consumption function to take time into account

• we add a one-period lag in how income affects consumption

We let 𝑐𝑡 be consumption at time 𝑡 and 𝑖𝑡 be investment at time 𝑡


We modify our consumption function to assume the form

$$
c_t = b y_{t-1}
$$

so that $b$ is the marginal propensity to consume (now) out of last period’s income
We begin with an initial condition stating that

$$
y_{-1} = 0
$$

We also assume that

$$
i_t = i \quad \text{for all } t \geq 0
$$

so that investment is constant over time

It follows that

$$
y_0 = i + c_0 = i + b y_{-1} = i
$$

and

$$
y_1 = c_1 + i = b y_0 + i = (1 + b) i
$$

and

$$
y_2 = c_2 + i = b y_1 + i = (1 + b + b^2) i
$$

and more generally

$$
y_t = b y_{t-1} + i = (1 + b + b^2 + \cdots + b^t) i
$$

or

$$
y_t = \frac{1 - b^{t+1}}{1 - b} i
$$

Evidently, as $t \to +\infty$,

$$
y_t \to \frac{1}{1 - b} i
$$
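The recursion and the closed-form expression can be checked against each other numerically
The sketch below is a minimal illustration with government spending left out; the function name `income_path` is hypothetical, and the values i = 0.3 and b = 2/3 are the ones used for the plots later in this lecture

```python
def income_path(i, b, T):
    """Iterate y_t = b * y_{t-1} + i from y_{-1} = 0 and return [y_0, ..., y_T]."""
    path, y_prev = [], 0.0
    for _ in range(T + 1):
        y = b * y_prev + i
        path.append(y)
        y_prev = y
    return path

i, b, T = 0.3, 2/3, 50
path = income_path(i, b, T)

# Closed form from the finite geometric series formula
closed_form = [(1 - b**(t + 1)) / (1 - b) * i for t in range(T + 1)]

print("Largest gap:", max(abs(p - q) for p, q in zip(path, closed_form)))
print("y_T:", path[-1], " limit i / (1 - b):", i / (1 - b))
```

The two sequences coincide up to rounding error, and by t = 50 income is already very close to its limit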

Remark 1: The above formula is often applied to assert that an exogenous increase in investment of $\Delta i$ at time 0 ignites a dynamic process of increases in national income by amounts

$$
\Delta i, (1 + b) \Delta i, (1 + b + b^2) \Delta i, \cdots
$$

at times $0, 1, 2, \ldots$
Remark 2: Let $g_t$ be an exogenous sequence of government expenditures
If we generalize the model so that the national income identity becomes

$$
c_t + i_t + g_t = y_t
$$

then a version of the preceding argument shows that the government expenditures multiplier is also $\frac{1}{1 - b}$, so that a permanent increase in government expenditures ultimately leads to an increase in national income equal to the multiplier times the increase in government expenditures

20.6 Example: Interest Rates and Present Values

We can apply our formula for geometric series to study how interest rates affect values of
streams of dollar payments that extend over time
We work in discrete time and assume that 𝑡 = 0, 1, 2, … indexes time
We let 𝑟 ∈ (0, 1) be a one-period net nominal interest rate

• if the nominal interest rate is 5 percent, then 𝑟 = .05

A one-period gross nominal interest rate 𝑅 is defined as

𝑅 = 1 + 𝑟 ∈ (1, 2)

• if 𝑟 = .05, then 𝑅 = 1.05



Remark: The gross nominal interest rate $R$ is an exchange rate or relative price of dollars between times $t$ and $t + 1$. The units of $R$ are dollars at time $t + 1$ per dollar at time $t$
When people borrow and lend, they trade dollars now for dollars later or dollars later for dol-
lars now
The price at which these exchanges occur is the gross nominal interest rate

• If I sell 𝑥 dollars to you today, you pay me 𝑅𝑥 dollars tomorrow
• This means that you borrowed 𝑥 dollars from me at a gross interest rate 𝑅 and a net interest rate 𝑟

We assume that the net nominal interest rate 𝑟 is fixed over time, so that 𝑅 is the gross nom-
inal interest rate at times 𝑡 = 0, 1, 2, …
Two important geometric sequences are

$$
1, R, R^2, \cdots \tag{7}
$$

and

$$
1, R^{-1}, R^{-2}, \cdots \tag{8}
$$

Sequence Eq. (7) tells us how dollar values of an investment accumulate through time
Sequence Eq. (8) tells us how to discount future dollars to get their values in terms of to-
day’s dollars

20.6.1 Accumulation

Geometric sequence Eq. (7) tells us how one dollar invested and re-invested in a project with gross one-period nominal rate of return $R$ accumulates

• here we assume that net interest payments are reinvested in the project
• thus, 1 dollar invested at time 0 pays interest $r$ dollars after one period, so we have $r + 1 = R$ dollars at time 1
• at time 1 we reinvest $1 + r = R$ dollars and receive interest of $rR$ dollars at time 2 plus the principal $R$ dollars, so we receive $rR + R = (1 + r)R = R^2$ dollars at the end of period 2
• and so on

Evidently, if we invest $x$ dollars at time 0 and reinvest the proceeds, then the sequence

$$
x, xR, xR^2, \cdots
$$

tells how our account accumulates at dates $t = 0, 1, 2, \ldots$

20.6.2 Discounting

Geometric sequence Eq. (8) tells us how much future dollars are worth in terms of today’s
dollars
Remember that the units of 𝑅 are dollars at 𝑡 + 1 per dollar at 𝑡
It follows that

• the units of $R^{-1}$ are dollars at $t$ per dollar at $t + 1$
• the units of $R^{-2}$ are dollars at $t$ per dollar at $t + 2$
• and so on; the units of $R^{-j}$ are dollars at $t$ per dollar at $t + j$

So if someone has a claim on $x$ dollars at time $t + j$, it is worth $x R^{-j}$ dollars at time $t$ (e.g., today)
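Accumulation and discounting are inverse operations, which a few lines of arithmetic make concrete
The sketch below uses the 5 percent rate mentioned earlier; the amount of 100 dollars and the three-period horizon are arbitrary choices for illustration

```python
r = 0.05     # net one-period interest rate
R = 1 + r    # gross one-period interest rate
x = 100.0    # dollars

# Sequence (7) scaled by x: value at dates 0, 1, 2, 3 of x dollars invested at date 0
accumulated = [x * R**t for t in range(4)]

# Sequence (8) scaled by x: value today of x dollars received j = 0, 1, 2, 3 periods ahead
discounted = [x * R**(-j) for j in range(4)]

print("Accumulated:", accumulated)
print("Discounted: ", discounted)

# Discounting the date-3 balance back three periods recovers the original amount
round_trip = accumulated[3] * R**(-3)
print("Round trip: ", round_trip)
```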

20.6.3 Application to Asset Pricing

A lease requires a payment stream of $x_t$ dollars at times $t = 0, 1, 2, \ldots$ where

$$
x_t = G^t x_0
$$

where $G = (1 + g)$ and $g \in (0, 1)$

Thus, lease payments grow at the net rate $g$ per period
For a reason soon to be revealed, we assume that $G < R$
The present value of the lease is

$$
\begin{aligned}
p_0 &= x_0 + x_1 / R + x_2 / R^2 + \cdots \\
    &= x_0 (1 + G R^{-1} + G^2 R^{-2} + \cdots) \\
    &= x_0 \frac{1}{1 - G R^{-1}}
\end{aligned}
$$

where the last line uses the formula for an infinite geometric series
Recall that $R = 1 + r$ and $G = 1 + g$ and that $R > G$ and $r > g$ and that $r$ and $g$ are typically small numbers, e.g., .05 or .03
Use the Taylor series of $\frac{1}{1 + r}$ about $r = 0$, namely,

$$
\frac{1}{1 + r} = 1 - r + r^2 - r^3 + \cdots
$$

and the fact that $r$ is small to approximate $\frac{1}{1 + r} \approx 1 - r$
Use this approximation to write $p_0$ as

$$
\begin{aligned}
p_0 &= x_0 \frac{1}{1 - G R^{-1}} \\
    &\approx x_0 \frac{1}{1 - (1 + g)(1 - r)} \\
    &= x_0 \frac{1}{1 - (1 + g - r - rg)} \\
    &\approx x_0 \frac{1}{r - g}
\end{aligned}
$$

where the last step uses the approximation 𝑟𝑔 ≈ 0


The approximation

$$
p_0 = \frac{x_0}{r - g}
$$

is known as the Gordon formula for the present value or current price of an infinite payment stream $x_0 G^t$ when the nominal one-period interest rate is $r$ and when $r > g$
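To get a feel for how close the Gordon formula is to the exact present value, we can evaluate both at the illustrative rates r = 0.05 and g = 0.03 mentioned above
This is a minimal sketch; the function names are hypothetical and x_0 = 1 is an arbitrary normalization

```python
def infinite_lease_exact(g, r, x_0):
    """Exact present value x_0 / (1 - G / R) of the infinite payment stream."""
    return x_0 / (1 - (1 + g) / (1 + r))

def gordon_approx(g, r, x_0):
    """Gordon approximation x_0 / (r - g)."""
    return x_0 / (r - g)

g, r, x_0 = 0.03, 0.05, 1.0
exact = infinite_lease_exact(g, r, x_0)
approx = gordon_approx(g, r, x_0)

print("Exact: ", exact)
print("Gordon:", approx)
print("Ratio: ", exact / approx)
```

Since 1 / (1 - G/R) = (1 + r) / (r - g), the exact value is (1 + r) times the Gordon value, so the approximation error is small whenever r is small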
We can also extend the asset pricing formula so that it applies to finite leases
Let the payment stream on the lease now be $x_t$ for $t = 0, 1, \ldots, T$, where again

$$
x_t = G^t x_0
$$

The present value of this lease is:

$$
\begin{aligned}
p_0 &= x_0 + x_1 / R + \cdots + x_T / R^T \\
    &= x_0 (1 + G R^{-1} + \cdots + G^T R^{-T}) \\
    &= \frac{x_0 (1 - G^{T+1} R^{-(T+1)})}{1 - G R^{-1}}
\end{aligned}
$$

Applying the Taylor series to $R^{-(T+1)}$ about $r = 0$ we get:

$$
\frac{1}{(1 + r)^{T+1}} = 1 - r(T + 1) + \frac{1}{2} r^2 (T + 1)(T + 2) + \cdots \approx 1 - r(T + 1)
$$

Similarly, applying the Taylor series to $G^{T+1}$ about $g = 0$:

$$
(1 + g)^{T+1} = 1 + (T + 1) g + \frac{1}{2} (T + 1) T g^2 + \cdots \approx 1 + (T + 1) g
$$
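These first-order approximations drop terms that grow with T, so their accuracy depends on the lease length
The sketch below measures the error of each approximation for a few horizons, using r = 0.03 and g = 0.02; the function name `first_order_errors` is hypothetical

```python
def first_order_errors(T, r, g):
    """
    Return the gaps between the exact values of R**-(T+1) and G**(T+1)
    and their first-order Taylor approximations.
    """
    err_R = (1 + r)**(-(T + 1)) - (1 - r * (T + 1))
    err_G = (1 + g)**(T + 1) - (1 + g * (T + 1))
    return err_R, err_G

r, g = 0.03, 0.02
for T in (1, 5, 50):
    err_R, err_G = first_order_errors(T, r, g)
    print(f"T = {T:2d}: error in R term = {err_R:.4f}, error in G term = {err_G:.4f}")
```

Both errors are tiny for short leases and become large for long ones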

Thus, we get the following approximation:

$$
p_0 = \frac{x_0 (1 - (1 + (T + 1) g)(1 - r(T + 1)))}{1 - (1 - r)(1 + g)}
$$

Expanding:

$$
\begin{aligned}
p_0 &= \frac{x_0 (1 - 1 + (T + 1)^2 rg + r(T + 1) - g(T + 1))}{1 - 1 + r - g + rg} \\
    &= \frac{x_0 (T + 1)((T + 1) rg + r - g)}{r - g + rg} \\
    &\approx \frac{x_0 (T + 1)(r - g)}{r - g} + \frac{x_0 rg (T + 1)}{r - g} \\
    &= x_0 (T + 1) + \frac{x_0 rg (T + 1)}{r - g}
\end{aligned}
$$

We could have also approximated by removing the second term 𝑟𝑔𝑥0 (𝑇 + 1) when 𝑇 is rela-
tively small compared to 1/(𝑟𝑔) to get 𝑥0 (𝑇 + 1) as in the finite stream approximation
We will plot the true finite stream present-value and the two approximations, under different values of 𝑇 , and 𝑔 and 𝑟 in Python
First we plot the true finite stream present-value after computing it below

In [2]: # True present value of a finite lease
def finite_lease_pv(T, g, r, x_0):
    G = (1 + g)
    R = (1 + r)
    return (x_0 * (1 - G**(T + 1) * R**(-T - 1))) / (1 - G * R**(-1))

# First approximation for our finite lease
def finite_lease_pv_approx_f(T, g, r, x_0):
    p = x_0 * (T + 1) + x_0 * r * g * (T + 1) / (r - g)
    return p

# Second approximation for our finite lease
def finite_lease_pv_approx_s(T, g, r, x_0):
    return (x_0 * (T + 1))

# Infinite lease
def infinite_lease(g, r, x_0):
    G = (1 + g)
    R = (1 + r)
    return x_0 / (1 - G * R**(-1))

Now that we have defined our functions, we can plot some outcomes
First we study the quality of our approximations

In [3]: g = 0.02
r = 0.03
x_0 = 1
T_max = 50
T = np.arange(0, T_max+1)
fig, ax = plt.subplots()
ax.set_title('Finite Lease Present Value $T$ Periods Ahead')
y_1 = finite_lease_pv(T, g, r, x_0)
y_2 = finite_lease_pv_approx_f(T, g, r, x_0)
y_3 = finite_lease_pv_approx_s(T, g, r, x_0)
ax.plot(T, y_1, label='True T-period Lease PV')
ax.plot(T, y_2, label='T-period Lease First-order Approx.')
ax.plot(T, y_3, label='T-period Lease First-order Approx. adj.')
ax.legend()
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
plt.show()

Evidently our approximations perform well for small values of 𝑇
However, holding 𝑔 and 𝑟 fixed, our approximations deteriorate as 𝑇 increases
Next we compare the infinite and finite duration lease present values over different lease
lengths 𝑇

In [4]: # Convergence of infinite and finite


T_max = 1000
T = np.arange(0, T_max+1)
fig, ax = plt.subplots()
ax.set_title('Infinite and Finite Lease Present Value $T$ Periods Ahead')
y_1 = finite_lease_pv(T, g, r, x_0)
y_2 = np.ones(T_max+1)*infinite_lease(g, r, x_0)
ax.plot(T, y_1, label='T-period lease PV')
ax.plot(T, y_2, '--', label='Infinite lease PV')
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
ax.legend()
plt.show()

The graph above shows that as duration 𝑇 → +∞, the value of a lease of duration 𝑇 approaches the value of a perpetual lease
Now we consider two different views of what happens as 𝑟 and 𝑔 covary

In [5]: # First view


# Changing r and g
fig, ax = plt.subplots()
ax.set_title('Value of lease of length $T$')
ax.set_ylabel('Present Value, $p_0$')
ax.set_xlabel('$T$ periods ahead')
T_max = 10
T=np.arange(0, T_max+1)
# r >> g, much bigger than g
r = 0.9
g = 0.4
ax.plot(finite_lease_pv(T, g, r, x_0), label=r'$r\gg g$')
# r > g
r = 0.5
g = 0.4
ax.plot(finite_lease_pv(T, g, r, x_0), label='$r>g$', color='green')

# r ~ g, not defined when r = g, but approximately goes to straight line with slope 1
r = 0.4001
g = 0.4
ax.plot(finite_lease_pv(T, g, r, x_0), label=r'$r \approx g$', color='orange')

# r < g
r = 0.4
g = 0.5
ax.plot(finite_lease_pv(T, g, r, x_0), label='$r<g$', color='red')
ax.legend()
plt.show()

The graph above gives a big hint for why the condition 𝑟 > 𝑔 is necessary if a lease of length 𝑇 = +∞ is to have finite value
For fans of 3-d graphs the same point comes through in the following graph
If you aren’t enamored of 3-d graphs, feel free to skip the next visualization!

In [6]: # Second view


from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
T = 3
ax = fig.add_subplot(projection='3d')
r = np.arange(0.01, 0.99, 0.005)
g = np.arange(0.01, 0.99, 0.005)

rr, gg = np.meshgrid(r, g)
z = finite_lease_pv(T, gg, rr, x_0)

# Removes points where undefined


same = (rr == gg)
z[same] = np.nan
surf = ax.plot_surface(rr, gg, z, cmap=cm.coolwarm, antialiased=True, clim=(0, 15))
fig.colorbar(surf, shrink=0.5, aspect=5)
ax.set_xlabel('$r$')
ax.set_ylabel('$g$')
ax.set_zlabel('Present Value, $p_0$')
ax.view_init(20, 10)
ax.set_title('Three Period Lease PV with Varying $g$ and $r$')
plt.show()


We can use a little calculus to study how the present value 𝑝0 of a lease varies with 𝑟 and 𝑔
We will use a library called SymPy
SymPy enables us to do symbolic math calculations including computing derivatives of alge-
braic equations.
We will illustrate how it works by creating a symbolic expression that represents our present
value formula for an infinite lease
After that, we’ll use SymPy to compute derivatives

In [7]: import sympy as sym


from sympy import init_printing

# Creates algebraic symbols that can be used in an algebraic expression


g, r, x0 = sym.symbols('g, r, x0')
G = (1 + g)
R = (1 + r)
p0 = x0 / (1 - G * R**(-1))
init_printing()
print('Our formula is:')
p0

Our formula is:

Out[7]:

$$
\frac{x_0}{-\frac{g + 1}{r + 1} + 1}
$$

In [8]: print('dp0 / dg is:')


dp_dg = sym.diff(p0, g)
dp_dg

dp0 / dg is:
Out[8]:

$$
\frac{x_0}{(r + 1) \left( -\frac{g + 1}{r + 1} + 1 \right)^2}
$$

In [9]: print('dp0 / dr is:')


dp_dr = sym.diff(p0, r)
dp_dr

dp0 / dr is:

Out[9]:

$$
\frac{x_0 (-g - 1)}{(r + 1)^2 \left( -\frac{g + 1}{r + 1} + 1 \right)^2}
$$

We can see that $\frac{\partial p_0}{\partial r} < 0$ as long as $r > g$, $r > 0$, $g > 0$ and $x_0$ is positive, so the present value always falls when $r$ rises
Similarly, $\frac{\partial p_0}{\partial g} > 0$ under the same conditions, so the present value always rises when $g$ rises
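The signs of the two derivatives can be spot-checked by substituting numbers into the symbolic expressions
The sketch below rebuilds the present value formula and evaluates the derivatives at the illustrative values r = 0.05, g = 0.03 and x_0 = 1; declaring the symbols positive is an assumption added here to match the stated conditions

```python
import sympy as sym

# Symbols declared positive to match the conditions r > 0, g > 0, x0 > 0
g, r, x0 = sym.symbols('g, r, x0', positive=True)
p0 = x0 / (1 - (1 + g) / (1 + r))

dp_dr = sym.simplify(sym.diff(p0, r))
dp_dg = sym.simplify(sym.diff(p0, g))

# Evaluate at illustrative values with r > g
values = {g: 0.03, r: 0.05, x0: 1}
dp_dr_val = float(dp_dr.subs(values))
dp_dg_val = float(dp_dg.subs(values))

print("dp0/dr at these values:", dp_dr_val)
print("dp0/dg at these values:", dp_dg_val)
```

A single numerical evaluation does not prove the sign for all admissible parameter values, but it is a quick sanity check on the algebra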

20.7 Back to the Keynesian Multiplier

We will now go back to the case of the Keynesian multiplier and plot the time path of 𝑦𝑡 ,
given that consumption is a constant fraction of national income, and investment is fixed

In [10]: # Function that calculates a path of y
def calculate_y(i, b, g, T, y_init):
    y = np.zeros(T+1)
    y[0] = i + b * y_init + g
    for t in range(1, T+1):
        y[t] = b * y[t-1] + i + g
    return y

# Initial values
i_0 = 0.3
g_0 = 0.3
# 2/3 of income goes towards consumption
b = 2/3
y_init = 0
T = 100

fig, ax = plt.subplots()
ax.set_title('Path of Aggregate Output Over Time')
ax.set_xlabel('$t$')
ax.set_ylabel('$y_t$')
ax.plot(np.arange(0, T+1), calculate_y(i_0, b, g_0, T, y_init))
# Output predicted by geometric series
ax.hlines(i_0 / (1 - b) + g_0 / (1 - b), xmin=-1, xmax=101, linestyles='--')
plt.show()

In this model, income grows over time, until it gradually converges to the infinite geometric
series sum of income
We now examine what will happen if we vary the so-called marginal propensity to con-
sume, i.e., the fraction of income that is consumed

In [11]: # Changing fraction of consumption


b_0 = 1/3
b_1 = 2/3
b_2 = 5/6
b_3 = 0.9

fig,ax = plt.subplots()
ax.set_title('Changing Consumption as a Fraction of Income')
ax.set_ylabel('$y_t$')
ax.set_xlabel('$t$')
x = np.arange(0, T+1)
for b in (b_0, b_1, b_2, b_3):
    y = calculate_y(i_0, b, g_0, T, y_init)
    ax.plot(x, y, label=r'$b=$'+f"{b:.2f}")
ax.legend()
plt.show()

Increasing the marginal propensity to consume 𝑏 increases the path of output over time

In [12]: x = np.arange(0, T+1)


y_0 = calculate_y(i_0, b, g_0, T, y_init)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(6, 10))
fig.subplots_adjust(hspace=0.3)

# Changing initial investment:


i_1 = 0.4
y_1 = calculate_y(i_1, b, g_0, T, y_init)
ax1.set_title('An Increase in Investment on Output')
ax1.plot(x, y_0, label=r'$i=0.3$', linestyle='--')
ax1.plot(x, y_1, label=r'$i=0.4$')
ax1.legend()
ax1.set_ylabel('$y_t$')
ax1.set_xlabel('$t$')

# Changing government spending


g_1 = 0.4
y_1 = calculate_y(i_0, b, g_1, T, y_init)
ax2.set_title('An Increase in Government Spending on Output')
ax2.plot(x, y_0, label=r'$g=0.3$', linestyle='--')
ax2.plot(x, y_1, label=r'$g=0.4$')
ax2.legend()
ax2.set_ylabel('$y_t$')
ax2.set_xlabel('$t$')
plt.show()

Notice here, whether government spending increases from 0.3 to 0.4 or investment increases
from 0.3 to 0.4, the shifts in the graphs are identical
21

Linear Algebra

21.1 Contents

• Overview 21.2

• Vectors 21.3

• Matrices 21.4

• Solving Systems of Equations 21.5

• Eigenvalues and Eigenvectors 21.6

• Further Topics 21.7

• Exercises 21.8

• Solutions 21.9

21.2 Overview

Linear algebra is one of the most useful branches of applied mathematics for economists to
invest in
For example, many applied problems in economics and finance require the solution of a linear
system of equations, such as

𝑦1 = 𝑎𝑥1 + 𝑏𝑥2
𝑦2 = 𝑐𝑥1 + 𝑑𝑥2

or, more generally,

𝑦1 = 𝑎11 𝑥1 + 𝑎12 𝑥2 + ⋯ + 𝑎1𝑘 𝑥𝑘


⋮ (1)
𝑦𝑛 = 𝑎𝑛1 𝑥1 + 𝑎𝑛2 𝑥2 + ⋯ + 𝑎𝑛𝑘 𝑥𝑘

The objective here is to solve for the “unknowns” 𝑥1 , … , 𝑥𝑘 given 𝑎11 , … , 𝑎𝑛𝑘 and 𝑦1 , … , 𝑦𝑛


When considering such problems, it is essential that we first consider at least some of the following questions

• Does a solution actually exist?


• Are there in fact many solutions, and if so how should we interpret them?
• If no solution exists, is there a best “approximate” solution?
• If a solution exists, how should we compute it?

These are the kinds of topics addressed by linear algebra


In this lecture we will cover the basics of linear and matrix algebra, treating both theory and
computation
We admit some overlap with the earlier NumPy lecture, where operations on NumPy arrays were first explained
Note that this lecture is more theoretical than most, and contains background material that
will be used in applications as we go along

21.3 Vectors

A vector of length 𝑛 is just a sequence (or array, or tuple) of 𝑛 numbers, which we write as
𝑥 = (𝑥1 , … , 𝑥𝑛 ) or 𝑥 = [𝑥1 , … , 𝑥𝑛 ]
We will write these sequences either horizontally or vertically as we please
(Later, when we wish to perform certain matrix operations, it will become necessary to distinguish between the two)
The set of all 𝑛-vectors is denoted by R𝑛
For example, R2 is the plane, and a vector in R2 is just a point in the plane
Traditionally, vectors are represented visually as arrows from the origin to the point
The following figure represents three vectors in this manner

In [1]: import matplotlib.pyplot as plt
%matplotlib inline

fig, ax = plt.subplots(figsize=(10, 8))

# Set the axes through the origin
for spine in ['left', 'bottom']:
    ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
    ax.spines[spine].set_color('none')

ax.set(xlim=(-5, 5), ylim=(-5, 5))
ax.grid()

# Draw each vector as an arrow from the origin and label its endpoint
vecs = ((2, 4), (-3, 3), (-4, -3.5))
for v in vecs:
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(facecolor='blue',
                                shrink=0,
                                alpha=0.7,
                                width=0.5))
    ax.text(1.1 * v[0], 1.1 * v[1], str(v))
plt.show()

21.3.1 Vector Operations

The two most common operators for vectors are addition and scalar multiplication, which we
now describe
As a matter of definition, when we add two vectors, we add them element-by-element

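$$
x + y =
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} +
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} :=
\begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}
$$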

Scalar multiplication is an operation that takes a number 𝛾 and a vector 𝑥 and produces

$$
\gamma x :=
\begin{bmatrix} \gamma x_1 \\ \gamma x_2 \\ \vdots \\ \gamma x_n \end{bmatrix}
$$

Scalar multiplication is illustrated in the next figure

In [2]: import numpy as np

fig, ax = plt.subplots(figsize=(10, 8))

# Set the axes through the origin
for spine in ['left', 'bottom']:
    ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
    ax.spines[spine].set_color('none')

ax.set(xlim=(-5, 5), ylim=(-5, 5))

# The original vector x
x = (2, 2)
ax.annotate('', xy=x, xytext=(0, 0),
            arrowprops=dict(facecolor='blue',
                            shrink=0,
                            alpha=1,
                            width=0.5))
ax.text(x[0] + 0.4, x[1] - 0.2, '$x$', fontsize='16')

scalars = (-2, 2)
x = np.array(x)

# Scalar multiples of x
for s in scalars:
    v = s * x
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(facecolor='red',
                                shrink=0,
                                alpha=0.5,
                                width=0.5))
    ax.text(v[0] + 0.4, v[1] - 0.2, f'${s} x$', fontsize='16')
plt.show()

In Python, a vector can be represented as a list or tuple, such as x = (2, 4, 6), but is
more commonly represented as a NumPy array
One advantage of NumPy arrays is that scalar multiplication and addition have very natural
syntax

In [3]: x = np.ones(3) # Vector of three ones


y = np.array((2, 4, 6)) # Converts tuple (2, 4, 6) into array
x + y

Out[3]: array([3., 5., 7.])

In [4]: 4 * x

Out[4]: array([4., 4., 4.])

21.3.2 Inner Product and Norm

The inner product of vectors 𝑥, 𝑦 ∈ R𝑛 is defined as

$$
x' y := \sum_{i=1}^n x_i y_i
$$

Two vectors are called orthogonal if their inner product is zero


The norm of a vector 𝑥 represents its “length” (i.e., its distance from the zero vector) and is
defined as

$$
\| x \| := \sqrt{x' x} := \left( \sum_{i=1}^n x_i^2 \right)^{1/2}
$$

The expression ‖𝑥 − 𝑦‖ is thought of as the distance between 𝑥 and 𝑦


Continuing on from the previous example, the inner product and norm can be computed as
follows

In [5]: np.sum(x * y) # Inner product of x and y

Out[5]: 12.0

In [6]: np.sqrt(np.sum(x**2)) # Norm of x, take one

Out[6]: 1.7320508075688772

In [7]: np.linalg.norm(x) # Norm of x, take two

Out[7]: 1.7320508075688772
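As a supplementary sketch (not part of the lecture's own code), the same inner product can be written with the `@` operator, and the distance ‖𝑥 − 𝑦‖ and the orthogonality condition can be checked directly. The vectors `u` and `v` are illustrative choices made here, not taken from the text.

```python
import numpy as np

x = np.ones(3)
y = np.array((2, 4, 6))

inner = x @ y                   # same value as np.sum(x * y)
dist = np.linalg.norm(x - y)    # distance between x and y

# Two illustrative vectors (an assumption of this sketch) with zero inner product
u = np.array((1.0, 1.0))
v = np.array((1.0, -1.0))
orthogonal = bool(np.isclose(u @ v, 0.0))

print(inner, dist, orthogonal)
```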

21.3.3 Span

Given a set of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in R𝑛 , it’s natural to think about the new vectors we
can create by performing linear operations
New vectors created in this manner are called linear combinations of 𝐴
In particular, 𝑦 ∈ R𝑛 is a linear combination of 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } if

𝑦 = 𝛽1 𝑎1 + ⋯ + 𝛽𝑘 𝑎𝑘 for some scalars 𝛽1 , … , 𝛽𝑘



In this context, the values 𝛽1 , … , 𝛽𝑘 are called the coefficients of the linear combination
The set of linear combinations of 𝐴 is called the span of 𝐴
The next figure shows the span of 𝐴 = {𝑎1 , 𝑎2 } in R3
The span is a two-dimensional plane passing through these two points and the origin

In [8]: from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D  # Registers the 3D projection on older Matplotlib

fig = plt.figure(figsize=(10, 8))
# fig.gca(projection='3d') is no longer accepted by recent Matplotlib releases
ax = fig.add_subplot(111, projection='3d')

x_min, x_max = -5, 5
y_min, y_max = -5, 5

α, β = 0.2, 0.1

ax.set(xlim=(x_min, x_max), ylim=(x_min, x_max), zlim=(x_min, x_max),
       xticks=(0,), yticks=(0,), zticks=(0,))

# Draw the three coordinate axes
gs = 3
z = np.linspace(x_min, x_max, gs)
x = np.zeros(gs)
y = np.zeros(gs)
ax.plot(x, y, z, 'k-', lw=2, alpha=0.5)
ax.plot(z, x, y, 'k-', lw=2, alpha=0.5)
ax.plot(y, z, x, 'k-', lw=2, alpha=0.5)

# Fixed linear function, to generate a plane
def f(x, y):
    return α * x + β * y

# Vector locations, by coordinate
x_coords = np.array((3, 3))
y_coords = np.array((4, -4))
z = f(x_coords, y_coords)
for i in (0, 1):
    ax.text(x_coords[i], y_coords[i], z[i], f'$a_{i+1}$', fontsize=14)

# Lines to vectors
for i in (0, 1):
    x = (0, x_coords[i])
    y = (0, y_coords[i])
    z = (0, f(x_coords[i], y_coords[i]))
    ax.plot(x, y, z, 'b-', lw=1.5, alpha=0.6)

# Draw the plane
grid_size = 20
xr2 = np.linspace(x_min, x_max, grid_size)
yr2 = np.linspace(y_min, y_max, grid_size)
x2, y2 = np.meshgrid(xr2, yr2)
z2 = f(x2, y2)
ax.plot_surface(x2, y2, z2, rstride=1, cstride=1, cmap=cm.jet,
                linewidth=0, antialiased=True, alpha=0.2)
plt.show()

Examples
If 𝐴 contains only one vector 𝑎1 ∈ R2 , then its span is just the scalar multiples of 𝑎1 , which is
the unique line passing through both 𝑎1 and the origin
If 𝐴 = {𝑒1 , 𝑒2 , 𝑒3 } consists of the canonical basis vectors of R3 , that is

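$$
e_1 := \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
e_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
e_3 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
$$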

then the span of 𝐴 is all of R3 , because, for any 𝑥 = (𝑥1 , 𝑥2 , 𝑥3 ) ∈ R3 , we can write

𝑥 = 𝑥1 𝑒1 + 𝑥2 𝑒2 + 𝑥3 𝑒3

Now consider 𝐴0 = {𝑒1 , 𝑒2 , 𝑒1 + 𝑒2 }


If 𝑦 = (𝑦1 , 𝑦2 , 𝑦3 ) is any linear combination of these vectors, then 𝑦3 = 0 (check it)
Hence 𝐴0 fails to span all of R3
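The two examples above can be checked numerically. The sketch below relies on the standard fact that the rank of a matrix equals the dimension of the span of its columns; the random coefficients `coeffs` are an assumption of this illustration.

```python
import numpy as np

e1 = np.array((1.0, 0.0, 0.0))
e2 = np.array((0.0, 1.0, 0.0))
e3 = np.array((0.0, 0.0, 1.0))

A = np.column_stack((e1, e2, e3))         # canonical basis vectors as columns
A0 = np.column_stack((e1, e2, e1 + e2))   # the set A_0 from the text

rank_A = np.linalg.matrix_rank(A)     # 3: the span is all of R^3
rank_A0 = np.linalg.matrix_rank(A0)   # 2: the span is only a plane

# Any linear combination of the columns of A0 has third coordinate zero
rng = np.random.default_rng(0)
coeffs = rng.standard_normal(3)
y = A0 @ coeffs

print(rank_A, rank_A0, y[2])
```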

21.3.4 Linear Independence

As we’ll see, it’s often desirable to find families of vectors with relatively large span, so that
many vectors can be described by linear operators on a few vectors

The condition we need for a set of vectors to have a large span is what’s called linear independence
In particular, a collection of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in R𝑛 is said to be

• linearly dependent if some strict subset of 𝐴 has the same span as 𝐴


• linearly independent if it is not linearly dependent

Put differently, a set of vectors is linearly independent if no vector is redundant to the span
and linearly dependent otherwise
To illustrate the idea, recall the figure that showed the span of vectors {𝑎1 , 𝑎2 } in R3 as a
plane through the origin
If we take a third vector 𝑎3 and form the set {𝑎1 , 𝑎2 , 𝑎3 }, this set will be

• linearly dependent if 𝑎3 lies in the plane


• linearly independent otherwise

As another illustration of the concept, since R𝑛 can be spanned by 𝑛 vectors (see the discussion of canonical basis vectors above), any collection of 𝑚 > 𝑛 vectors in R𝑛 must be linearly dependent
The following statements are equivalent to linear independence of 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } ⊂ R𝑛

1. No vector in 𝐴 can be formed as a linear combination of the other elements


2. If 𝛽1 𝑎1 + ⋯ + 𝛽𝑘 𝑎𝑘 = 0 for scalars 𝛽1 , … , 𝛽𝑘 , then 𝛽1 = ⋯ = 𝛽𝑘 = 0

(The zero in the first expression is the origin of R𝑛 )

21.3.5 Unique Representations

Another nice thing about sets of linearly independent vectors is that each element in the span
has a unique representation as a linear combination of these vectors
In other words, if 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } ⊂ R𝑛 is linearly independent and

𝑦 = 𝛽1 𝑎1 + ⋯ + 𝛽𝑘 𝑎𝑘

then no other coefficient sequence 𝛾1 , … , 𝛾𝑘 will produce the same vector 𝑦


Indeed, if we also have 𝑦 = 𝛾1 𝑎1 + ⋯ + 𝛾𝑘 𝑎𝑘 , then

(𝛽1 − 𝛾1 )𝑎1 + ⋯ + (𝛽𝑘 − 𝛾𝑘 )𝑎𝑘 = 0

Linear independence now implies 𝛾𝑖 = 𝛽𝑖 for all 𝑖
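A minimal numerical sketch of linear independence and unique representation follows. The vectors `a1`, `a2` and the coefficients `beta` are illustrative assumptions; independence is tested by comparing the rank with the number of columns, and the coefficients are recovered with NumPy's least-squares routine.

```python
import numpy as np

a1 = np.array((1.0, 2.0, 0.0))
a2 = np.array((0.0, 1.0, 1.0))
A = np.column_stack((a1, a2))

# Columns are linearly independent iff the rank equals the number of columns
independent = np.linalg.matrix_rank(A) == A.shape[1]

beta = np.array((2.0, -3.0))
y = A @ beta                  # y = 2 * a1 - 3 * a2

# Because the columns are independent, the coefficients can be recovered from y
beta_hat, _, _, _ = np.linalg.lstsq(A, y, rcond=None)

print(independent, beta_hat)
```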

21.4 Matrices

Matrices are a neat way of organizing data for use in linear operations

An 𝑛 × 𝑘 matrix is a rectangular array 𝐴 of numbers with 𝑛 rows and 𝑘 columns:

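$$
A =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nk}
\end{bmatrix}
$$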
Often, the numbers in the matrix represent coefficients in a system of linear equations, as discussed at the start of this lecture
For obvious reasons, the matrix 𝐴 is also called a vector if either 𝑛 = 1 or 𝑘 = 1
In the former case, 𝐴 is called a row vector, while in the latter it is called a column vector
If 𝑛 = 𝑘, then 𝐴 is called square
The matrix formed by replacing 𝑎𝑖𝑗 by 𝑎𝑗𝑖 for every 𝑖 and 𝑗 is called the transpose of 𝐴 and
denoted 𝐴′ or 𝐴⊤
If 𝐴 = 𝐴′ , then 𝐴 is called symmetric
For a square matrix 𝐴, the 𝑛 elements of the form 𝑎𝑖𝑖 for 𝑖 = 1, … , 𝑛 are called the principal diagonal
𝐴 is called diagonal if the only nonzero entries are on the principal diagonal
If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then
𝐴 is called the identity matrix and denoted by 𝐼
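These definitions map onto NumPy as in the following sketch. The matrix `A` is an illustrative choice, and the fact that 𝐴 + 𝐴′ is always symmetric is used only to produce an example of a symmetric matrix.

```python
import numpy as np

A = np.array(((1, 2),
              (3, 4)))

A_T = A.T                                 # transpose: swaps a_ij and a_ji
S = A + A.T                               # A + A' is symmetric for any square A
is_symmetric = np.array_equal(S, S.T)

principal_diag = np.diag(A)               # elements a_11, a_22
D = np.diag(principal_diag)               # diagonal matrix with the same diagonal
I = np.identity(2)                        # identity matrix

print(A_T, is_symmetric, principal_diag, D, I, sep='\n')
```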

21.4.1 Matrix Operations

Just as was the case for vectors, a number of algebraic operations are defined for matrices
Scalar multiplication and addition are immediate generalizations of the vector case:

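$$
\gamma A = \gamma
\begin{bmatrix}
a_{11} & \cdots & a_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} & \cdots & a_{nk}
\end{bmatrix} :=
\begin{bmatrix}
\gamma a_{11} & \cdots & \gamma a_{1k} \\
\vdots & \vdots & \vdots \\
\gamma a_{n1} & \cdots & \gamma a_{nk}
\end{bmatrix}
$$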
and

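$$
A + B =
\begin{bmatrix}
a_{11} & \cdots & a_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} & \cdots & a_{nk}
\end{bmatrix} +
\begin{bmatrix}
b_{11} & \cdots & b_{1k} \\
\vdots & \vdots & \vdots \\
b_{n1} & \cdots & b_{nk}
\end{bmatrix} :=
\begin{bmatrix}
a_{11} + b_{11} & \cdots & a_{1k} + b_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} + b_{n1} & \cdots & a_{nk} + b_{nk}
\end{bmatrix}
$$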
In the latter case, the matrices must have the same shape in order for the definition to make
sense
We also have a convention for multiplying two matrices
The rule for matrix multiplication generalizes the idea of inner products discussed above and
is designed to make multiplication play well with basic linear operations
If 𝐴 and 𝐵 are two matrices, then their product 𝐴𝐵 is formed by taking as its 𝑖, 𝑗-th element
the inner product of the 𝑖-th row of 𝐴 and the 𝑗-th column of 𝐵
There are many tutorials to help you visualize this operation, along with the discussion on the Wikipedia page for matrix multiplication

If 𝐴 is 𝑛 × 𝑘 and 𝐵 is 𝑗 × 𝑚, then to multiply 𝐴 and 𝐵 we require 𝑘 = 𝑗, and the resulting matrix 𝐴𝐵 is 𝑛 × 𝑚
As perhaps the most important special case, consider multiplying 𝑛 × 𝑘 matrix 𝐴 and 𝑘 × 1
column vector 𝑥
According to the preceding rule, this gives us an 𝑛 × 1 column vector

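$$
A x =
\begin{bmatrix}
a_{11} & \cdots & a_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} & \cdots & a_{nk}
\end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ x_k \end{bmatrix} :=
\begin{bmatrix}
a_{11} x_1 + \cdots + a_{1k} x_k \\
\vdots \\
a_{n1} x_1 + \cdots + a_{nk} x_k
\end{bmatrix}
\tag{2}
$$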

Note
𝐴𝐵 and 𝐵𝐴 are not generally the same thing

Another important special case is the identity matrix


You should check that if 𝐴 is 𝑛 × 𝑘 and 𝐼 is the 𝑘 × 𝑘 identity matrix, then 𝐴𝐼 = 𝐴
If 𝐼 is the 𝑛 × 𝑛 identity matrix, then 𝐼𝐴 = 𝐴
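The multiplication rule can be spelled out entry by entry. The helper `matmul_by_definition` below is a hypothetical name introduced for illustration (it is not a NumPy function); it forms each element as the inner product of a row of 𝐴 and a column of 𝐵, and the result is compared with NumPy's built-in `@` operator.

```python
import numpy as np

def matmul_by_definition(A, B):
    """Form AB entry by entry: element (i, j) is row i of A times column j of B."""
    n, k = A.shape
    j, m = B.shape
    if k != j:
        raise ValueError("number of columns of A must equal number of rows of B")
    C = np.empty((n, m))
    for row in range(n):
        for col in range(m):
            C[row, col] = A[row, :] @ B[:, col]
    return C

A = np.array(((1.0, 2.0), (3.0, 4.0)))
B = np.array(((0.0, 1.0), (1.0, 0.0)))
I = np.identity(2)

AB = matmul_by_definition(A, B)
BA = matmul_by_definition(B, A)

print(AB)
print(BA)
```

In this example `AB` and `BA` differ, while multiplying by `I` on either side leaves `A` unchanged.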

21.4.2 Matrices in NumPy

NumPy arrays are also used as matrices, and have fast, efficient functions and methods for all
the standard matrix operations [1]
You can create them manually from tuples of tuples (or lists of lists) as follows

In [9]: A = ((1, 2),


(3, 4))

type(A)

Out[9]: tuple

In [10]: A = np.array(A)

type(A)

Out[10]: numpy.ndarray

In [11]: A.shape

Out[11]: (2, 2)

The shape attribute is a tuple giving the number of rows and columns — see here for more
discussion
To get the transpose of A, use A.transpose() or, more simply, A.T
There are many convenient functions for creating common matrices (matrices of zeros, ones,
etc.) — see here
Since operations are performed elementwise by default, scalar multiplication and addition
have very natural syntax

In [12]: A = np.identity(3)
B = np.ones((3, 3))
2 * A

Out[12]: array([[2., 0., 0.],


[0., 2., 0.],
[0., 0., 2.]])

In [13]: A + B

Out[13]: array([[2., 1., 1.],


[1., 2., 1.],
[1., 1., 2.]])

To multiply matrices we use the @ symbol


In particular, A @ B is matrix multiplication, whereas A * B is element-by-element multiplication
See here for more discussion

21.4.3 Matrices as Maps

Each 𝑛 × 𝑘 matrix 𝐴 can be identified with a function 𝑓(𝑥) = 𝐴𝑥 that maps 𝑥 ∈ R𝑘 into
𝑦 = 𝐴𝑥 ∈ R𝑛
These kinds of functions have a special property: they are linear
A function 𝑓 ∶ R𝑘 → R𝑛 is called linear if, for all 𝑥, 𝑦 ∈ R𝑘 and all scalars 𝛼, 𝛽, we have

𝑓(𝛼𝑥 + 𝛽𝑦) = 𝛼𝑓(𝑥) + 𝛽𝑓(𝑦)

You can check that this holds for the function 𝑓(𝑥) = 𝐴𝑥 + 𝑏 when 𝑏 is the zero vector and
fails when 𝑏 is nonzero
In fact, it’s known that 𝑓 is linear if and only if there exists a matrix 𝐴 such that 𝑓(𝑥) = 𝐴𝑥
for all 𝑥
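The linearity claim can be checked numerically, as in this sketch. The matrix, the vectors, and the scalars `alpha`, `beta` are arbitrary illustrative values; note that `alpha + beta` is deliberately not equal to one, since otherwise the check would not detect the failure when 𝑏 is nonzero.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))          # an arbitrary 3 x 2 matrix
x = rng.standard_normal(2)
y = rng.standard_normal(2)
alpha, beta = 1.5, -0.7

def f(v, b):
    """The map v -> A v + b."""
    return A @ v + b

def satisfies_linearity(b):
    lhs = f(alpha * x + beta * y, b)
    rhs = alpha * f(x, b) + beta * f(y, b)
    return bool(np.allclose(lhs, rhs))

linear_when_b_zero = satisfies_linearity(np.zeros(3))
linear_when_b_nonzero = satisfies_linearity(np.ones(3))

print(linear_when_b_zero, linear_when_b_nonzero)
```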

21.5 Solving Systems of Equations

Recall again the system of equations Eq. (1)


If we compare Eq. (1) and Eq. (2), we see that Eq. (1) can now be written more conveniently
as

𝑦 = 𝐴𝑥 (3)

The problem we face is to determine a vector 𝑥 ∈ R𝑘 that solves Eq. (3), taking 𝑦 and 𝐴 as
given
This is a special case of a more general problem: Find an 𝑥 such that 𝑦 = 𝑓(𝑥)
Given an arbitrary function 𝑓 and a 𝑦, is there always an 𝑥 such that 𝑦 = 𝑓(𝑥)?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows

In [14]: def f(x):
    return 0.6 * np.cos(4 * x) + 1.4

xmin, xmax = -1, 1
x = np.linspace(xmin, xmax, 160)
y = f(x)
ya, yb = np.min(y), np.max(y)

fig, axes = plt.subplots(2, 1, figsize=(10, 10))

for ax in axes:
    # Set the axes through the origin
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')

    ax.set(ylim=(-0.6, 3.2), xlim=(xmin, xmax),
           yticks=(), xticks=())

    ax.plot(x, y, 'k-', lw=2, label='$f$')
    ax.fill_between(x, ya, yb, facecolor='blue', alpha=0.05)
    ax.vlines([0], ya, yb, lw=3, color='blue', label='range of $f$')
    ax.text(0.04, -0.3, '$0$', fontsize=16)

# First panel: y lies in the range of f, with two solutions
ax = axes[0]

ax.legend(loc='upper right', frameon=False)
ybar = 1.5
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.05, 0.8 * ybar, '$y$', fontsize=16)
for i, z in enumerate((-0.35, 0.35)):
    ax.vlines(z, 0, f(z), linestyle='--', alpha=0.5)
    ax.text(z, -0.2, f'$x_{i}$', fontsize=16)

# Second panel: y lies outside the range of f, so there is no solution
ax = axes[1]

ybar = 2.6
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.04, 0.91 * ybar, '$y$', fontsize=16)

plt.show()

In the first plot, there are multiple solutions, as the function is not one-to-one, while in the
second there are no solutions, since 𝑦 lies outside the range of 𝑓
Can we impose conditions on 𝐴 in Eq. (3) that rule out these problems?
In this context, the most important thing to recognize about the expression 𝐴𝑥 is that it corresponds to a linear combination of the columns of 𝐴
In particular, if 𝑎1 , … , 𝑎𝑘 are the columns of 𝐴, then

𝐴𝑥 = 𝑥1 𝑎1 + ⋯ + 𝑥𝑘 𝑎𝑘

Hence the range of 𝑓(𝑥) = 𝐴𝑥 is exactly the span of the columns of 𝐴


We want the range to be large so that it contains arbitrary 𝑦
As you might recall, the condition that we want for the span to be large is linear independence
A happy fact is that linear independence of the columns of 𝐴 also gives us uniqueness
Indeed, it follows from our earlier discussion that if {𝑎1 , … , 𝑎𝑘 } are linearly independent and
𝑦 = 𝐴𝑥 = 𝑥1 𝑎1 + ⋯ + 𝑥𝑘 𝑎𝑘 , then no 𝑧 ≠ 𝑥 satisfies 𝑦 = 𝐴𝑧
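The identity 𝐴𝑥 = 𝑥1 𝑎1 + ⋯ + 𝑥𝑘 𝑎𝑘 is easy to confirm numerically. The matrix and vector below are illustrative values chosen for this sketch.

```python
import numpy as np

A = np.array(((1.0, 2.0),
              (3.0, 4.0),
              (5.0, 6.0)))
x = np.array((2.0, -1.0))

a1, a2 = A[:, 0], A[:, 1]          # columns of A
combo = x[0] * a1 + x[1] * a2      # linear combination of the columns

print(A @ x, combo)
```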

21.5.1 The Square Matrix Case

Let’s discuss some more details, starting with the case where 𝐴 is 𝑛 × 𝑛
This is the familiar case where the number of unknowns equals the number of equations
For arbitrary 𝑦 ∈ R𝑛 , we hope to find a unique 𝑥 ∈ R𝑛 such that 𝑦 = 𝐴𝑥
In view of the observations immediately above, if the columns of 𝐴 are linearly independent,
then their span, and hence the range of 𝑓(𝑥) = 𝐴𝑥, is all of R𝑛
Hence there always exists an 𝑥 such that 𝑦 = 𝐴𝑥
Moreover, the solution is unique
In particular, the following are equivalent

1. The columns of 𝐴 are linearly independent


2. For any 𝑦 ∈ R𝑛 , the equation 𝑦 = 𝐴𝑥 has a unique solution

The property of having linearly independent columns is sometimes expressed as having full
column rank
Inverse Matrices
Can we give some sort of expression for the solution?
If 𝑦 and 𝐴 are scalar with 𝐴 ≠ 0, then the solution is 𝑥 = 𝐴⁻¹𝑦
A similar expression is available in the matrix case
In particular, if square matrix 𝐴 has full column rank, then it possesses a multiplicative inverse matrix 𝐴⁻¹, with the property that 𝐴𝐴⁻¹ = 𝐴⁻¹𝐴 = 𝐼
As a consequence, if we pre-multiply both sides of 𝑦 = 𝐴𝑥 by 𝐴⁻¹, we get 𝑥 = 𝐴⁻¹𝑦
This is the solution that we’re looking for
Determinants
Another quick comment about square matrices is that to every such matrix we assign a
unique number called the determinant of the matrix — you can find the expression for it here
If the determinant of 𝐴 is not zero, then we say that 𝐴 is nonsingular
Perhaps the most important fact about determinants is that 𝐴 is nonsingular if and only if 𝐴
is of full column rank
This gives us a useful one-number summary of whether or not a square matrix can be inverted
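To illustrate the link between the determinant and column rank, the sketch below compares a nonsingular matrix with a singular one. Both matrices are illustrative choices; in floating-point arithmetic it is safer to compare the determinant with zero using a tolerance, as done here.

```python
import numpy as np

A = np.array(((1.0, 2.0), (3.0, 4.0)))   # linearly independent columns
S = np.array(((1.0, 2.0), (2.0, 4.0)))   # second column is twice the first

det_A = np.linalg.det(A)
det_S = np.linalg.det(S)
rank_A = np.linalg.matrix_rank(A)
rank_S = np.linalg.matrix_rank(S)

print(det_A, rank_A)   # nonzero determinant, full column rank
print(det_S, rank_S)   # zero determinant (up to rounding), rank deficient
```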

21.5.2 More Rows than Columns

This is the 𝑛 × 𝑘 case with 𝑛 > 𝑘


This case is very important in many settings, not least in the setting of linear regression
(where 𝑛 is the number of observations, and 𝑘 is the number of explanatory variables)
Given arbitrary 𝑦 ∈ R𝑛 , we seek an 𝑥 ∈ R𝑘 such that 𝑦 = 𝐴𝑥
In this setting, the existence of a solution is highly unlikely

Without much loss of generality, let’s go over the intuition focusing on the case where the
columns of 𝐴 are linearly independent
It follows that the span of the columns of 𝐴 is a 𝑘-dimensional subspace of R𝑛
This span is very “unlikely” to contain arbitrary 𝑦 ∈ R𝑛
To see why, recall the figure above, where 𝑘 = 2 and 𝑛 = 3
Imagine an arbitrarily chosen 𝑦 ∈ R3 , located somewhere in that three-dimensional space
What’s the likelihood that 𝑦 lies in the span of {𝑎1 , 𝑎2 } (i.e., the two dimensional plane
through these points)?
In a sense, it must be very small, since this plane has zero “thickness”
As a result, in the 𝑛 > 𝑘 case we usually give up on existence
However, we can still seek the best approximation, for example, an 𝑥 that makes the distance
‖𝑦 − 𝐴𝑥‖ as small as possible
To solve this problem, one can use either calculus or the theory of orthogonal projections
The solution is known to be 𝑥̂ = (𝐴′𝐴)⁻¹𝐴′𝑦 (see, for example, chapter 3 of these notes)

21.5.3 More Columns than Rows

This is the 𝑛 × 𝑘 case with 𝑛 < 𝑘, so there are fewer equations than unknowns
In this case there are either no solutions or infinitely many — in other words, uniqueness
never holds
For example, consider the case where 𝑘 = 3 and 𝑛 = 2
Thus, the columns of 𝐴 consist of 3 vectors in R2
This set can never be linearly independent, since it is possible to find two vectors that span
R2
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two
For example, let’s say that 𝑎1 = 𝛼𝑎2 + 𝛽𝑎3
Then if 𝑦 = 𝐴𝑥 = 𝑥1 𝑎1 + 𝑥2 𝑎2 + 𝑥3 𝑎3 , we can also write

𝑦 = 𝑥1 (𝛼𝑎2 + 𝛽𝑎3 ) + 𝑥2 𝑎2 + 𝑥3 𝑎3 = (𝑥1 𝛼 + 𝑥2 )𝑎2 + (𝑥1 𝛽 + 𝑥3 )𝑎3

In other words, uniqueness fails
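The argument above can be made concrete. In this sketch the columns `a2`, `a3` and the scalars `alpha`, `beta` are illustrative assumptions; the second solution `z` is built exactly as in the displayed equation, by moving the weight on 𝑎1 onto 𝑎2 and 𝑎3.

```python
import numpy as np

a2 = np.array((1.0, 0.0))
a3 = np.array((0.0, 1.0))
alpha, beta = 2.0, 3.0
a1 = alpha * a2 + beta * a3              # a1 is a combination of a2 and a3

A = np.column_stack((a1, a2, a3))        # a 2 x 3 matrix

x = np.array((1.0, 1.0, 1.0))
# Alternative coefficients from the text: zero weight on a1
z = np.array((0.0, x[0] * alpha + x[1], x[0] * beta + x[2]))

print(A @ x, A @ z)                      # same y from two different vectors
```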

21.5.4 Linear Equations with SciPy

Here’s an illustration of how to solve linear equations with SciPy’s linalg submodule
All of these routines are Python front ends to time-tested and highly optimized FORTRAN
code

In [15]: from scipy.linalg import inv, solve, det

A = ((1, 2), (3, 4))
A = np.array(A)
y = np.ones((2, 1))  # Column vector
det(A)  # Check that A is nonsingular, and hence invertible

Out[15]: -2.0

In [16]: A_inv = inv(A) # Compute the inverse


A_inv

Out[16]: array([[-2. , 1. ],
[ 1.5, -0.5]])

In [17]: x = A_inv @ y # Solution


A @ x # Should equal y

Out[17]: array([[1.],
[1.]])

In [18]: solve(A, y) # Produces the same solution

Out[18]: array([[-1.],
[ 1.]])

Observe how we can solve for 𝑥 = 𝐴⁻¹𝑦 either via inv(A) @ y or using solve(A, y)
The latter method uses a different algorithm (LU decomposition) that is numerically more
stable, and hence should almost always be preferred
To obtain the least-squares solution 𝑥̂ = (𝐴′ 𝐴)−1 𝐴′ 𝑦, use scipy.linalg.lstsq(A, y)
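As a hedged sketch of the overdetermined case, the following compares the formula 𝑥̂ = (𝐴′𝐴)⁻¹𝐴′𝑦 with `scipy.linalg.lstsq`. The data are random and purely illustrative; the normal equations are solved with `solve` rather than by forming the inverse explicitly.

```python
import numpy as np
from scipy.linalg import lstsq, solve

rng = np.random.default_rng(0)
n, k = 20, 3                             # more rows than columns
A = rng.standard_normal((n, k))
y = rng.standard_normal(n)

# Solve the normal equations (A'A) x = A'y
x_normal = solve(A.T @ A, A.T @ y)

# Least-squares routine; also returns residues, rank and singular values
x_lstsq, residues, rank, sing_vals = lstsq(A, y)

# At the minimizer the residual is orthogonal to every column of A
orth_check = A.T @ (y - A @ x_lstsq)

print(x_normal, x_lstsq)
```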

21.6 Eigenvalues and Eigenvectors

Let 𝐴 be an 𝑛 × 𝑛 square matrix


If 𝜆 is scalar and 𝑣 is a non-zero vector in R𝑛 such that

𝐴𝑣 = 𝜆𝑣

then we say that 𝜆 is an eigenvalue of 𝐴, and 𝑣 is an eigenvector


Thus, an eigenvector of 𝐴 is a vector such that when the map 𝑓(𝑥) = 𝐴𝑥 is applied, 𝑣 is
merely scaled
The next figure shows two eigenvectors (blue arrows) and their images under 𝐴 (red arrows)
As expected, the image 𝐴𝑣 of each 𝑣 is just a scaled version of the original

In [19]: from scipy.linalg import eig

A = ((1, 2),
     (2, 1))
A = np.array(A)
evals, evecs = eig(A)
evecs = evecs[:, 0], evecs[:, 1]

fig, ax = plt.subplots(figsize=(10, 8))

# Set the axes through the origin
for spine in ['left', 'bottom']:
    ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
    ax.spines[spine].set_color('none')
ax.grid(alpha=0.4)

xmin, xmax = -3, 3
ymin, ymax = -3, 3
ax.set(xlim=(xmin, xmax), ylim=(ymin, ymax))

# Plot each eigenvector
for v in evecs:
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(facecolor='blue',
                                shrink=0,
                                alpha=0.6,
                                width=0.5))

# Plot the image of each eigenvector
for v in evecs:
    v = A @ v
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(facecolor='red',
                                shrink=0,
                                alpha=0.6,
                                width=0.5))

# Plot the lines they run through
x = np.linspace(xmin, xmax, 3)
for v in evecs:
    a = v[1] / v[0]
    ax.plot(x, a * x, 'b-', lw=0.4)

plt.show()

The eigenvalue equation is equivalent to (𝐴 − 𝜆𝐼)𝑣 = 0, and this has a nonzero solution 𝑣 only
when the columns of 𝐴 − 𝜆𝐼 are linearly dependent
This in turn is equivalent to stating that the determinant is zero
Hence to find all eigenvalues, we can look for 𝜆 such that the determinant of 𝐴 − 𝜆𝐼 is zero
This problem can be expressed as one of solving for the roots of a polynomial in 𝜆 of degree 𝑛
This in turn implies the existence of 𝑛 solutions in the complex plane, although some might
be repeated
Some nice facts about the eigenvalues of a square matrix 𝐴 are as follows

1. The determinant of 𝐴 equals the product of the eigenvalues


2. The trace of 𝐴 (the sum of the elements on the principal diagonal) equals the sum of
the eigenvalues
3. If 𝐴 is symmetric, then all of its eigenvalues are real
4. If 𝐴 is invertible and 𝜆1 , … , 𝜆𝑛 are its eigenvalues, then the eigenvalues of 𝐴−1 are
1/𝜆1 , … , 1/𝜆𝑛

A corollary of the first statement is that a matrix is invertible if and only if all its eigenvalues
are nonzero
Using SciPy, we can solve for the eigenvalues and eigenvectors of a matrix as follows

In [20]: A = ((1, 2),


(2, 1))

A = np.array(A)
evals, evecs = eig(A)
evals

Out[20]: array([ 3.+0.j, -1.+0.j])

In [21]: evecs

Out[21]: array([[ 0.70710678, -0.70710678],


[ 0.70710678, 0.70710678]])

Note that the columns of evecs are the eigenvectors


Since any scalar multiple of an eigenvector is an eigenvector with the same eigenvalue (check
it), the eig routine normalizes the length of each eigenvector to one
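The facts listed above can be verified for the same matrix. This sketch uses `np.linalg.eig` instead of the SciPy routine from the lecture (a substitution made here for brevity); for this symmetric matrix it returns real eigenvalues.

```python
import numpy as np

A = np.array(((1.0, 2.0), (2.0, 1.0)))
evals, evecs = np.linalg.eig(A)

# Fact 1: determinant equals the product of the eigenvalues
det_matches = bool(np.isclose(np.prod(evals), np.linalg.det(A)))
# Fact 2: trace equals the sum of the eigenvalues
trace_matches = bool(np.isclose(np.sum(evals), np.trace(A)))
# Fact 4: eigenvalues of the inverse are the reciprocals
inv_evals = np.linalg.eigvals(np.linalg.inv(A))
inverse_matches = bool(np.allclose(np.sort(inv_evals), np.sort(1 / evals)))

# Each column of evecs satisfies A v = λ v and has unit length
eigen_equation_holds = all(
    np.allclose(A @ evecs[:, i], evals[i] * evecs[:, i]) for i in range(2)
)
unit_length = bool(np.allclose(np.linalg.norm(evecs, axis=0), 1.0))

print(det_matches, trace_matches, inverse_matches, eigen_equation_holds, unit_length)
```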

21.6.1 Generalized Eigenvalues

It is sometimes useful to consider the generalized eigenvalue problem, which, for given matrices 𝐴 and 𝐵, seeks generalized eigenvalues 𝜆 and eigenvectors 𝑣 such that

𝐴𝑣 = 𝜆𝐵𝑣

This can be solved in SciPy via scipy.linalg.eig(A, B)


Of course, if 𝐵 is square and invertible, then we can treat the generalized eigenvalue problem
as an ordinary eigenvalue problem 𝐵−1 𝐴𝑣 = 𝜆𝑣, but this is not always the case
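A small sketch of the generalized problem follows. The matrix `B` is an illustrative invertible choice, which lets us compare the generalized eigenvalues with the ordinary eigenvalues of 𝐵⁻¹𝐴.

```python
import numpy as np
from scipy.linalg import eig, inv

A = np.array(((1.0, 2.0), (2.0, 1.0)))
B = np.array(((2.0, 0.0), (0.0, 1.0)))   # illustrative invertible B

gen_evals, gen_evecs = eig(A, B)         # solves A v = λ B v

# Since B is invertible here, the same λ solve the ordinary problem for B^{-1} A
ord_evals = np.linalg.eigvals(inv(B) @ A)

# Check the defining equation column by column
defining_equation_holds = all(
    np.allclose(A @ gen_evecs[:, i], gen_evals[i] * (B @ gen_evecs[:, i]))
    for i in range(2)
)

print(np.sort(gen_evals.real), np.sort(ord_evals.real))
```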

21.7 Further Topics

We round out our discussion by briefly mentioning several other important topics

21.7.1 Series Expansions

Recall the usual summation formula for a geometric progression, which states that if |𝑎| < 1, then $\sum_{k=0}^{\infty} a^k = (1 - a)^{-1}$
A generalization of this idea exists in the matrix setting
Matrix Norms
Let 𝐴 be a square matrix, and let

$$
\| A \| := \max_{\| x \| = 1} \| A x \|
$$

The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand
side is a matrix norm — in this case, the so-called spectral norm
For example, for a square matrix 𝑆, the condition ‖𝑆‖ < 1 means that 𝑆 is contractive, in the
sense that it pulls all vectors towards the origin [2]
Neumann’s Theorem
Let 𝐴 be a square matrix and let $A^k := A A^{k-1}$ with $A^1 := A$
In other words, $A^k$ is the 𝑘-th power of 𝐴
Neumann’s theorem states the following: If $\| A^k \| < 1$ for some $k \in \mathbb{N}$, then $I - A$ is invertible, and

$$
(I - A)^{-1} = \sum_{k=0}^{\infty} A^k \tag{4}
$$

Spectral Radius
A result known as Gelfand’s formula tells us that, for any square matrix 𝐴,

$$
\rho(A) = \lim_{k \to \infty} \| A^k \|^{1/k}
$$

Here $\rho(A)$ is the spectral radius, defined as $\max_i |\lambda_i|$, where $\{\lambda_i\}_i$ is the set of eigenvalues of 𝐴
As a consequence of Gelfand’s formula, if all eigenvalues are strictly less than one in modulus, there exists a 𝑘 with $\| A^k \| < 1$, in which case Eq. (4) is valid
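The series expansion can be checked numerically. In this sketch the matrix `A` is an illustrative choice with spectral radius below one, and the infinite sum is truncated at 200 terms, which is an arbitrary cutoff assumed to be large enough for the partial sum to converge to machine precision here.

```python
import numpy as np

A = np.array(((0.4, 0.3),
              (0.2, 0.1)))

spectral_radius = np.max(np.abs(np.linalg.eigvals(A)))
spectral_norm = np.linalg.norm(A, 2)     # largest singular value of A

# Partial sum I + A + A^2 + ... of the Neumann series
total = np.zeros_like(A)
term = np.identity(2)                    # A^0
for k in range(200):
    total += term
    term = term @ A

direct = np.linalg.inv(np.identity(2) - A)

print(spectral_radius, spectral_norm)
print(total)
print(direct)
```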

21.7.2 Positive Definite Matrices

Let 𝐴 be a symmetric 𝑛 × 𝑛 matrix


We say that 𝐴 is

1. positive definite if 𝑥′ 𝐴𝑥 > 0 for every 𝑥 ∈ R𝑛 ∖ {0}


2. positive semi-definite or nonnegative definite if 𝑥′ 𝐴𝑥 ≥ 0 for every 𝑥 ∈ R𝑛

Analogous definitions exist for negative definite and negative semi-definite matrices
It is notable that if 𝐴 is positive definite, then all of its eigenvalues are strictly positive, and
hence 𝐴 is invertible (with positive definite inverse)
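For a symmetric matrix, positive definiteness is equivalent to all eigenvalues being strictly positive, which suggests the simple check sketched below. The function name `is_positive_definite` and the two test matrices are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def is_positive_definite(A):
    """Check a symmetric matrix for positive definiteness via its eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(A) > 0))

A = np.array(((2.0, -1.0), (-1.0, 2.0)))   # eigenvalues 1 and 3
B = np.array(((1.0, 2.0), (2.0, 1.0)))     # eigenvalues -1 and 3

# Spot-check the quadratic form x'Ax at a random nonzero x
rng = np.random.default_rng(0)
x = rng.standard_normal(2)
quad_form = x @ A @ x

A_inv = np.linalg.inv(A)

print(is_positive_definite(A), is_positive_definite(B), quad_form)
```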

21.7.3 Differentiating Linear and Quadratic Forms

The following formulas are useful in many economic contexts. Let

• 𝑧, 𝑥 and 𝑎 all be 𝑛 × 1 vectors


• 𝐴 be an 𝑛 × 𝑛 matrix
• 𝐵 be an 𝑚 × 𝑛 matrix and 𝑦 be an 𝑚 × 1 vector

Then

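1. $\frac{\partial a' x}{\partial x} = a$
2. $\frac{\partial A x}{\partial x} = A'$
3. $\frac{\partial x' A x}{\partial x} = (A + A') x$
4. $\frac{\partial y' B z}{\partial y} = B z$
5. $\frac{\partial y' B z}{\partial B} = y z'$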

Exercise 1 below asks you to apply these formulas
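Formulas 1 and 3 can be sanity-checked with finite differences. The helper `numerical_gradient` is a hypothetical function written for this sketch (a plain central-difference approximation), and the matrices and vectors are random illustrative values.

```python
import numpy as np

def numerical_gradient(g, x, h=1e-6):
    """Central-difference approximation to the gradient of a scalar function g at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (g(x + step) - g(x - step)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # not symmetric in general
a = rng.standard_normal(n)
x = rng.standard_normal(n)

grad_linear = numerical_gradient(lambda v: a @ v, x)         # should be close to a
grad_quadratic = numerical_gradient(lambda v: v @ A @ v, x)  # close to (A + A') x

print(grad_linear - a)
print(grad_quadratic - (A + A.T) @ x)
```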

21.7.4 Further Reading

The documentation of the scipy.linalg submodule can be found here


Chapters 2 and 3 of the Econometric Theory contain a discussion of linear algebra along the same lines as above, with solved exercises
If you don’t mind a slightly abstract approach, a nice intermediate-level text on linear algebra
is [69]

21.8 Exercises

21.8.1 Exercise 1

Let 𝑥 be a given 𝑛 × 1 vector and consider the problem

$$
v(x) = \max_{y, u} \left\{ -y' P y - u' Q u \right\}
$$

subject to the linear constraint

𝑦 = 𝐴𝑥 + 𝐵𝑢

Here

• 𝑃 is an 𝑛 × 𝑛 matrix and 𝑄 is an 𝑚 × 𝑚 matrix


• 𝐴 is an 𝑛 × 𝑛 matrix and 𝐵 is an 𝑛 × 𝑚 matrix
• both 𝑃 and 𝑄 are symmetric and positive semidefinite

(What must the dimensions of 𝑦 and 𝑢 be to make this a well-posed problem?)


One way to solve the problem is to form the Lagrangian

ℒ = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]

where 𝜆 is an 𝑛 × 1 vector of Lagrange multipliers


Try applying the formulas given above for differentiating quadratic and linear forms to obtain the first-order conditions for maximizing ℒ with respect to 𝑦, 𝑢 and minimizing it with respect to 𝜆
Show that these conditions imply that

1. 𝜆 = −2𝑃 𝑦
2. The optimizing choice of 𝑢 satisfies $u = -(Q + B' P B)^{-1} B' P A x$
3. The function 𝑣 satisfies $v(x) = -x' \tilde{P} x$ where $\tilde{P} = A' P A - A' P B (Q + B' P B)^{-1} B' P A$

As we will see, in economic contexts Lagrange multipliers often are shadow prices

Note
If we don’t care about the Lagrange multipliers, we can substitute the constraint
into the objective function, and then just maximize −(𝐴𝑥 + 𝐵𝑢)′ 𝑃 (𝐴𝑥 + 𝐵𝑢) −
𝑢′ 𝑄𝑢 with respect to 𝑢. You can verify that this leads to the same maximizer.

21.9 Solutions

21.9.1 Solution to Exercise 1

We have an optimization problem:

$$
v(x) = \max_{y, u} \left\{ -y' P y - u' Q u \right\}
$$

s.t.

𝑦 = 𝐴𝑥 + 𝐵𝑢

with primitives

• 𝑃 is a symmetric and positive semidefinite 𝑛 × 𝑛 matrix
• 𝑄 is a symmetric and positive semidefinite 𝑚 × 𝑚 matrix
• 𝐴 is an 𝑛 × 𝑛 matrix
• 𝐵 is an 𝑛 × 𝑚 matrix

The associated Lagrangian is

𝐿 = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]

1.
Differentiating Lagrangian equation w.r.t y and setting its derivative equal to zero yields

$$
\frac{\partial L}{\partial y} = -(P + P') y - \lambda = -2 P y - \lambda = 0 ,
$$

since P is symmetric
Accordingly, the first-order condition for maximizing L w.r.t. y implies

𝜆 = −2𝑃 𝑦

2.
Differentiating Lagrangian equation w.r.t. u and setting its derivative equal to zero yields

$$
\frac{\partial L}{\partial u} = -(Q + Q') u + B' \lambda = -2 Q u + B' \lambda = 0
$$
Substituting 𝜆 = −2𝑃 𝑦 gives

𝑄𝑢 + 𝐵′ 𝑃 𝑦 = 0

Substituting the linear constraint 𝑦 = 𝐴𝑥 + 𝐵𝑢 into above equation gives

𝑄𝑢 + 𝐵′ 𝑃 (𝐴𝑥 + 𝐵𝑢) = 0

(𝑄 + 𝐵′ 𝑃 𝐵)𝑢 + 𝐵′ 𝑃 𝐴𝑥 = 0

which is the first-order condition for maximizing L w.r.t. u


Thus, the optimal choice of u must satisfy

$$
u = -(Q + B' P B)^{-1} B' P A x ,
$$

which follows from the definition of the first-order conditions for Lagrangian equation
3.
Rewriting our problem by substituting the constraint into the objective function, we get

$$
v(x) = \max_{u} \left\{ -(A x + B u)' P (A x + B u) - u' Q u \right\}
$$

Since we know the optimal choice of 𝑢 satisfies $u = -(Q + B' P B)^{-1} B' P A x$, then

$$
v(x) = -(A x + B u)' P (A x + B u) - u' Q u
\quad \text{with} \quad u = -(Q + B' P B)^{-1} B' P A x
$$



To evaluate the function

$$
\begin{aligned}
v(x) &= -(A x + B u)' P (A x + B u) - u' Q u \\
&= -(x' A' + u' B') P (A x + B u) - u' Q u \\
&= -x' A' P A x - u' B' P A x - x' A' P B u - u' B' P B u - u' Q u \\
&= -x' A' P A x - 2 u' B' P A x - u' (Q + B' P B) u
\end{aligned}
$$

For simplicity, denote by $S := (Q + B' P B)^{-1} B' P A$, then $u = -Sx$


Regarding the second term $-2 u' B' P A x$,

$$
\begin{aligned}
-2 u' B' P A x &= 2 x' S' B' P A x \\
&= 2 x' A' P B (Q + B' P B)^{-1} B' P A x
\end{aligned}
$$

Notice that the term (𝑄 + 𝐵′ 𝑃 𝐵)−1 is symmetric as both P and Q are symmetric
Regarding the third term $-u' (Q + B' P B) u$,

$$
\begin{aligned}
-u' (Q + B' P B) u &= -x' S' (Q + B' P B) S x \\
&= -x' A' P B (Q + B' P B)^{-1} B' P A x
\end{aligned}
$$

Hence, the sum of the second and third terms is $x' A' P B (Q + B' P B)^{-1} B' P A x$


This implies that

$$
\begin{aligned}
v(x) &= -x' A' P A x - 2 u' B' P A x - u' (Q + B' P B) u \\
&= -x' A' P A x + x' A' P B (Q + B' P B)^{-1} B' P A x \\
&= -x' \left[ A' P A - A' P B (Q + B' P B)^{-1} B' P A \right] x
\end{aligned}
$$

Therefore, the solution to the optimization problem $v(x) = -x' \tilde{P} x$ follows the above result by denoting $\tilde{P} := A' P A - A' P B (Q + B' P B)^{-1} B' P A$
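The closed-form solution can be spot-checked numerically. Everything in this sketch is an illustrative assumption: the dimensions, the random primitives, and the helper `random_psd`, which builds a symmetric positive semidefinite matrix as 𝑀𝑀′. The check compares the formula for 𝑣(𝑥) with the objective evaluated at the candidate maximizer, and confirms that a random perturbation of 𝑢 does not raise the objective.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2

def random_psd(size):
    """Return a random symmetric positive semidefinite matrix M M'."""
    M = rng.standard_normal((size, size))
    return M @ M.T

P, Q = random_psd(n), random_psd(m)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
x = rng.standard_normal(n)

def objective(u):
    """Objective after substituting the constraint y = A x + B u."""
    y = A @ x + B @ u
    return -y @ P @ y - u @ Q @ u

M = Q + B.T @ P @ B
u_star = -np.linalg.solve(M, B.T @ P @ A @ x)
P_tilde = A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(M, B.T @ P @ A)

value_formula = -x @ P_tilde @ x
value_direct = objective(u_star)
value_perturbed = objective(u_star + 0.1 * rng.standard_normal(m))

print(value_formula, value_direct, value_perturbed)
```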
Footnotes
[1] Although there is a specialized matrix data type defined in NumPy, it’s more standard to
work with ordinary NumPy arrays. See this discussion.
[2] Suppose that ‖𝑆‖ < 1. Take any nonzero vector 𝑥, and let 𝑟 ∶= ‖𝑥‖. We have ‖𝑆𝑥‖ =
𝑟‖𝑆(𝑥/𝑟)‖ ≤ 𝑟‖𝑆‖ < 𝑟 = ‖𝑥‖. Hence every point is pulled towards the origin.
22

Complex Numbers and Trigonometry

22.1 Contents

• Overview 22.2

• De Moivre’s Theorem 22.3

• Applications of de Moivre’s Theorem 22.4

22.2 Overview

This lecture introduces some elementary mathematics and trigonometry


Useful and interesting in their own right, these concepts reap substantial rewards when studying dynamics generated by linear difference equations or linear differential equations
For example, these tools are keys to understanding outcomes attained by Paul Samuelson
(1939) [115] in his classic paper on interactions between the investment accelerator and the
Keynesian consumption function, our topic in the lecture Samuelson Multiplier Accelerator
In addition to providing foundations for Samuelson’s work and extensions of it, this lecture can be read as a stand-alone quick reminder of key results from elementary high school trigonometry
So let’s dive in

22.2.1 Complex Numbers

A complex number has a real part 𝑥 and a purely imaginary part 𝑦


The Euclidean, polar, and trigonometric forms of a complex number $z$ are:

$$ z = x + iy = re^{i\theta} = r(\cos\theta + i\sin\theta) $$

The second equality above is known as Euler’s formula

• Euler contributed many other formulas too!


The complex conjugate $\bar z$ of $z$ is defined as

$$ \bar z = re^{-i\theta} = r(\cos\theta - i\sin\theta) $$

The value 𝑥 is the real part of 𝑧 and 𝑦 is the imaginary part of 𝑧


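The symbol $|z| = \sqrt{\bar z z} = r$ represents the modulus of $z$
The value $r$ is the Euclidean distance of vector $(x, y)$ from the origin:

$$ r = |z| = \sqrt{x^2 + y^2} $$

The value $\theta$ is the angle of $(x, y)$ with respect to the real axis
Evidently, the tangent of $\theta$ is $\left(\frac{y}{x}\right)$
Therefore,

$$ \theta = \tan^{-1}\left(\frac{y}{x}\right) $$

Three elementary trigonometric functions are

$$
\cos\theta = \frac{x}{r} = \frac{e^{i\theta} + e^{-i\theta}}{2}, \quad
\sin\theta = \frac{y}{r} = \frac{e^{i\theta} - e^{-i\theta}}{2i}, \quad
\tan\theta = \frac{y}{x}
$$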
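As a quick numerical illustration, the sketch below converts a complex number between its three forms using Python’s standard-library `cmath` module
The value $z = 3 + 4i$ is an arbitrary choice made here for illustration and does not come from the lecture

```python
import cmath
import math

z = 3 + 4j                      # illustrative value: x = 3, y = 4

r, theta = cmath.polar(z)       # modulus r and angle θ of z
z_polar = r * cmath.exp(1j * theta)                      # r e^{iθ}
z_trig = r * (math.cos(theta) + 1j * math.sin(theta))    # r (cos θ + i sin θ)

# |z| equals the square root of z times its conjugate
modulus_from_conjugate = math.sqrt((z * z.conjugate()).real)

print(r, theta)
print(cmath.isclose(z, z_polar), cmath.isclose(z, z_trig))
print(math.isclose(r, modulus_from_conjugate))
```

Note that `cmath.polar` computes the angle with a two-argument arctangent, so it picks the correct quadrant even when $x \leq 0$, where the plain formula $\tan^{-1}(y/x)$ needs adjusting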

We’ll need the following imports

In [1]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

22.2.2 An Example

Consider the complex number $z = 1 + \sqrt{3} i$
For $z = 1 + \sqrt{3} i$, $x = 1$, $y = \sqrt{3}$
It follows that $r = 2$ and $\theta = \tan^{-1}(\sqrt{3}) = \frac{\pi}{3} = 60^o$

Let’s use Python to plot the trigonometric form of the complex number $z = 1 + \sqrt{3} i$

In [2]: # Abbreviate useful values and functions


π = np.pi
zeros = np.zeros
ones = np.ones

# Set parameters
r = 2
θ = π/3
x = r * np.cos(θ)
x_range = np.linspace(0, x, 1000)
θ_range = np.linspace(0, θ, 1000)

# Plot
fig = plt.figure(figsize=(8, 8))
ax = plt.subplot(111, projection='polar')

ax.plot((0, θ), (0, r), marker='o', color='b') # plot r



ax.plot(zeros(x_range.shape), x_range, color='b') # plot x


ax.plot(θ_range, x / np.cos(θ_range), color='b') # plot y
ax.plot(θ_range, ones(θ_range.shape) * 0.1, color='r') # plot θ

ax.margins(0) # Let the plot starts at origin

ax.set_title("Trigonometry of complex numbers", va='bottom', fontsize='x-large')

ax.set_rmax(2)
ax.set_rticks((0.5, 1, 1.5, 2)) # less radial ticks
ax.set_rlabel_position(-88.5) # get radial labels away from plotted line

ax.text(θ, r+0.01 , r'$z = x + iy = 1 + \sqrt{3}\, i$') # label z


ax.text(θ+0.2, 1 , '$r = 2$') # label r
ax.text(0-0.2, 0.5, '$x = 1$') # label x
ax.text(0.5, 1.2, r'$y = \sqrt{3}$') # label y
ax.text(0.25, 0.15, r'$\theta = 60^o$') # label θ

ax.grid(True)
plt.show()

22.3 De Moivre’s Theorem

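de Moivre’s theorem states that:

$$ (r(\cos\theta + i\sin\theta))^n = r^n e^{in\theta} = r^n(\cos n\theta + i\sin n\theta) $$

To prove de Moivre’s theorem, note that

$$ (r(\cos\theta + i\sin\theta))^n = \left(re^{i\theta}\right)^n $$

and compute

$$ \left(re^{i\theta}\right)^n = r^n e^{in\theta} = r^n(\cos n\theta + i\sin n\theta) $$

where the last equality applies Euler’s formula to the angle $n\theta$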
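The theorem is easy to check numerically
The following sketch compares both sides for one arbitrary choice of $r$, $\theta$ and $n$ (these values are assumptions made for illustration)

```python
import cmath
import math

r, theta, n = 1.5, math.pi / 7, 5      # illustrative values

# Left side: raise the trigonometric form to the n-th power
lhs = (r * (math.cos(theta) + 1j * math.sin(theta))) ** n

# Right side: r^n (cos nθ + i sin nθ)
rhs = r**n * (math.cos(n * theta) + 1j * math.sin(n * theta))

print(lhs, rhs)
print(cmath.isclose(lhs, rhs))
```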

22.4 Applications of de Moivre’s Theorem

22.4.1 Example 1

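We can use de Moivre’s theorem to show that $r = \sqrt{x^2 + y^2}$

We have

$$
\begin{aligned}
1 &= e^{i\theta} e^{-i\theta} \\
&= (\cos\theta + i\sin\theta)(\cos(-\theta) + i\sin(-\theta)) \\
&= (\cos\theta + i\sin\theta)(\cos\theta - i\sin\theta) \\
&= \cos^2\theta + \sin^2\theta \\
&= \frac{x^2}{r^2} + \frac{y^2}{r^2}
\end{aligned}
$$

and thus

$$ x^2 + y^2 = r^2 $$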

We recognize this as a theorem of Pythagoras

22.4.2 Example 2

Let $z = re^{i\theta}$ and $\bar z = re^{-i\theta}$ so that $\bar z$ is the complex conjugate of $z$

$(z, \bar z)$ form a complex conjugate pair of complex numbers
Let $a = pe^{i\omega}$ and $\bar a = pe^{-i\omega}$ be another complex conjugate pair
For each element of a sequence of integers $n = 0, 1, 2, \ldots$, we want to compute $x_n = az^n + \bar a \bar z^n$
To do so, we can apply de Moivre’s formula
Thus,

$$
\begin{aligned}
x_n &= az^n + \bar a \bar z^n \\
&= pe^{i\omega}(re^{i\theta})^n + pe^{-i\omega}(re^{-i\theta})^n \\
&= pr^n e^{i(\omega + n\theta)} + pr^n e^{-i(\omega + n\theta)} \\
&= pr^n[\cos(\omega + n\theta) + i\sin(\omega + n\theta) + \cos(\omega + n\theta) - i\sin(\omega + n\theta)] \\
&= 2pr^n\cos(\omega + n\theta)
\end{aligned}
$$
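The following sketch checks this identity numerically with NumPy
The values of $p$, $\omega$, $r$ and $\theta$ are arbitrary choices made for illustration

```python
import numpy as np

p, omega = 2.0, 0.3            # a = p e^{iω}  (illustrative values)
r, theta = 0.9, np.pi / 4      # z = r e^{iθ}  (illustrative values)

a = p * np.exp(1j * omega)
z = r * np.exp(1j * theta)

n = np.arange(10)

# Direct evaluation of a z^n + conj(a) conj(z)^n
x_direct = a * z**n + np.conj(a) * np.conj(z)**n

# Closed form 2 p r^n cos(ω + nθ)
x_formula = 2 * p * r**n * np.cos(omega + n * theta)

print(np.allclose(x_direct.imag, 0))        # imaginary parts cancel
print(np.allclose(x_direct.real, x_formula))
```

The imaginary parts cancel because the two terms are complex conjugates of one another, which is why $x_n$ is real for every $n$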

22.4.3 Example 3

This example provides machinery that is at the heart of Samuelson’s analysis of his
multiplier-accelerator model [115]
Thus, consider a second-order linear difference equation

$$ x_{n+2} = c_1 x_{n+1} + c_2 x_n $$

whose characteristic polynomial is

$$ z^2 - c_1 z - c_2 = 0 $$

or

$$ (z^2 - c_1 z - c_2) = (z - z_1)(z - z_2) = 0 $$

and has roots $z_1, z_2$
A solution is a sequence $\{x_n\}_{n=0}^\infty$ that satisfies the difference equation

Under the following circumstances, we can apply our example 2 formula to solve the differ-
ence equation

• the roots 𝑧1 , 𝑧2 of the characteristic polynomial of the difference equation form a com-
plex conjugate pair
• the values 𝑥0 , 𝑥1 are given initial conditions

To solve the difference equation, recall from example 2 that

$$ x_n = 2pr^n\cos(\omega + n\theta) $$

where $\omega, p$ are coefficients to be determined from information encoded in the initial conditions
$x_1, x_0$
Since $x_0 = 2p\cos\omega$ and $x_1 = 2pr\cos(\omega + \theta)$, the ratio of $x_1$ to $x_0$ is

$$ \frac{x_1}{x_0} = \frac{r\cos(\omega + \theta)}{\cos\omega} $$

We can solve this equation for $\omega$ and then solve for $p$ using $x_0 = 2pr^0\cos(\omega + 0 \cdot \theta) = 2p\cos\omega$
With the sympy package in Python, we are able to solve and plot the dynamics of $x_n$ given
different values of $n$
In this example, we set the initial values:

• $r = 0.9$
• $\theta = \frac{1}{4}\pi$
• $x_0 = 4$
• $x_1 = r \cdot 2\sqrt{2} = 1.8\sqrt{2}$

We first numerically solve for 𝜔 and 𝑝 using nsolve in the sympy package based on the
above initial condition:

In [3]: from sympy import *

# Set parameters
r = 0.9
θ = π/4
x0 = 4
x1 = 2 * r * sqrt(2)

# Define symbols to be calculated


ω, p = symbols('ω p', real=True)

# Solve for ω
## Note: we choose the solution near 0
eq1 = Eq(x1/x0 - r * cos(ω+θ) / cos(ω), 0)  # state the right-hand side explicitly
ω = nsolve(eq1, ω, 0)
ω = float(ω)  # built-in float (np.float has been removed from NumPy)
print(f'ω = {ω:1.3f}')

# Solve for p
eq2 = Eq(x0 - 2 * p * cos(ω), 0)
p = nsolve(eq2, p, 0)
p = float(p)
print(f'p = {p:1.3f}')

ω = 0.000
p = 2.000

Using the code above, we compute that 𝜔 = 0 and 𝑝 = 2


Then we plug in the values we solved for $\omega$ and $p$ and plot the dynamics

In [4]: # Define range of n


max_n = 30
n = np.arange(0, max_n+1, 0.01)

# Define x_n
x = lambda n: 2 * p * r**n * np.cos(ω + n * θ)

# Plot
fig, ax = plt.subplots(figsize=(12, 8))

ax.plot(n, x(n))
ax.set(xlim=(0, max_n), ylim=(-5, 5), xlabel='$n$', ylabel='$x_n$')

ax.spines['bottom'].set_position('center') # Set x-axis in the middle of the plot


ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')

ticklab = ax.xaxis.get_ticklabels()[0] # Set x-label position


trans = ticklab.get_transform()
ax.xaxis.set_label_coords(31, 0, transform=trans)

ticklab = ax.yaxis.get_ticklabels()[0] # Set y-label position


trans = ticklab.get_transform()
ax.yaxis.set_label_coords(0, 5, transform=trans)

ax.grid()
plt.show()
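As a cross-check, we can iterate the difference equation directly and compare with the closed form
The sketch below assumes the coefficients $c_1 = 2r\cos\theta$ and $c_2 = -r^2$, which are the values implied by a characteristic polynomial with roots $re^{\pm i\theta}$, since $(z - re^{i\theta})(z - re^{-i\theta}) = z^2 - 2r\cos\theta \, z + r^2$
The parameter values repeat those of the example above

```python
import numpy as np

r, theta = 0.9, np.pi / 4
omega, p = 0.0, 2.0             # values found in the example above

# Coefficients implied by roots r e^{±iθ}
c1 = 2 * r * np.cos(theta)
c2 = -r**2

# Iterate x_{n+2} = c1 x_{n+1} + c2 x_n from the given initial conditions
N = 30
x = np.empty(N + 1)
x[0] = 4.0
x[1] = 2 * r * np.sqrt(2)
for n in range(N - 1):
    x[n + 2] = c1 * x[n + 1] + c2 * x[n]

# Closed form from example 2
n_grid = np.arange(N + 1)
x_closed = 2 * p * r**n_grid * np.cos(omega + n_grid * theta)

print(np.allclose(x, x_closed))
```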

22.4.4 Trigonometric Identities

We can obtain a complete suite of trigonometric identities by appropriately manipulating polar forms of complex numbers
We’ll get many of them by deducing implications of the equality

$$ e^{i(\omega + \theta)} = e^{i\omega} e^{i\theta} $$

For example, we’ll calculate identities for $\cos(\omega + \theta)$ and $\sin(\omega + \theta)$
Using the sine and cosine formulas presented at the beginning of this lecture, we have:

$$
\begin{aligned}
\cos(\omega + \theta) &= \frac{e^{i(\omega + \theta)} + e^{-i(\omega + \theta)}}{2} \\
\sin(\omega + \theta) &= \frac{e^{i(\omega + \theta)} - e^{-i(\omega + \theta)}}{2i}
\end{aligned}
$$

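We can also obtain the trigonometric identities as follows:

$$
\begin{aligned}
\cos(\omega + \theta) + i\sin(\omega + \theta) &= e^{i(\omega + \theta)} \\
&= e^{i\omega} e^{i\theta} \\
&= (\cos\omega + i\sin\omega)(\cos\theta + i\sin\theta) \\
&= (\cos\omega\cos\theta - \sin\omega\sin\theta) + i(\cos\omega\sin\theta + \sin\omega\cos\theta)
\end{aligned}
$$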

Since both real and imaginary parts of the above formula should be equal, we get:

cos (𝜔 + 𝜃) = cos 𝜔 cos 𝜃 − sin 𝜔 sin 𝜃


sin (𝜔 + 𝜃) = cos 𝜔 sin 𝜃 + sin 𝜔 cos 𝜃

The equations above are also known as the angle sum identities. We can verify the equa-
tions using the simplify function in the sympy package:

In [5]: # Define symbols


ω, θ = symbols('ω θ', real=True)

# Verify
print("cos(ω)cos(θ) - sin(ω)sin(θ) =", simplify(cos(ω)*cos(θ) - sin(ω) * sin(θ)))
print("cos(ω)sin(θ) + sin(ω)cos(θ) =", simplify(cos(ω)*sin(θ) + sin(ω) * cos(θ)))

cos(ω)cos(θ) - sin(ω)sin(θ) = cos(θ + ω)


cos(ω)sin(θ) + sin(ω)cos(θ) = sin(θ + ω)

22.4.5 Trigonometric Integrals

We can also compute the trigonometric integrals using polar forms of complex numbers
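For example, we want to solve the following integral:

$$ \int_{-\pi}^{\pi} \cos(\omega)\sin(\omega) \, d\omega $$

Using Euler’s formula, we have:

$$
\begin{aligned}
\int \cos(\omega)\sin(\omega) \, d\omega
&= \int \frac{(e^{i\omega} + e^{-i\omega})}{2} \frac{(e^{i\omega} - e^{-i\omega})}{2i} \, d\omega \\
&= \frac{1}{4i} \int e^{2i\omega} - e^{-2i\omega} \, d\omega \\
&= \frac{1}{4i} \left( \frac{-i}{2} e^{2i\omega} - \frac{i}{2} e^{-2i\omega} + C_1 \right) \\
&= -\frac{1}{8} \left[ \left(e^{i\omega}\right)^2 + \left(e^{-i\omega}\right)^2 - 2 \right] + C_2 \\
&= -\frac{1}{8} \left(e^{i\omega} - e^{-i\omega}\right)^2 + C_2 \\
&= \frac{1}{2} \left( \frac{e^{i\omega} - e^{-i\omega}}{2i} \right)^2 + C_2 \\
&= \frac{1}{2} \sin^2(\omega) + C_2
\end{aligned}
$$

and thus:

$$ \int_{-\pi}^{\pi} \cos(\omega)\sin(\omega) \, d\omega = \frac{1}{2}\sin^2(\pi) - \frac{1}{2}\sin^2(-\pi) = 0 $$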

We can verify the analytical as well as numerical results using integrate in the sympy
package:

In [6]: # Set initial printing


init_printing()

ω = Symbol('ω')
print('The analytical solution for integral of cos(ω)sin(ω) is:')
integrate(cos(ω) * sin(ω), ω)

The analytical solution for integral of cos(ω)sin(ω) is:

Out[6]:

$\frac{\sin^2(\omega)}{2}$

In [7]: print('The numerical solution for the integral of cos(ω)sin(ω) from -π to π is:')
integrate(cos(ω) * sin(ω), (ω, -π, π))

The numerical solution for the integral of cos(ω)sin(ω) from -π to π is:

Out[7]:

0
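A purely numerical check is also possible
The sketch below approximates the integral with a simple midpoint rule written in NumPy; the helper `midpoint_rule` and the grid size are assumptions introduced here for illustration, not part of the lecture

```python
import numpy as np

def integrand(w):
    return np.cos(w) * np.sin(w)

def midpoint_rule(f, a, b, m=100_000):
    "Approximate the integral of f over [a, b] using m midpoints."
    edges = np.linspace(a, b, m + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    return np.sum(f(mid)) * (b - a) / m

# Integral over the full interval [-π, π]
full = midpoint_rule(integrand, -np.pi, np.pi)

# Integral over [0, 1], compared with the antiderivative sin²(ω)/2
partial = midpoint_rule(integrand, 0.0, 1.0)
exact_partial = 0.5 * np.sin(1.0)**2 - 0.5 * np.sin(0.0)**2

print(full)
print(partial, exact_partial)
```

The second comparison is included because a zero integral alone would not distinguish the antiderivative $\frac{1}{2}\sin^2(\omega)$ from other candidates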
23

Orthogonal Projections and Their Applications

23.1 Contents

• Overview 23.2

• Key Definitions 23.3

• The Orthogonal Projection Theorem 23.4

• Orthonormal Basis 23.5

• Projection Using Matrix Algebra 23.6

• Least Squares Regression 23.7

• Orthogonalization and Decomposition 23.8

• Exercises 23.9

• Solutions 23.10

23.2 Overview

Orthogonal projection is a cornerstone of vector space methods, with many diverse applica-
tions
These include, but are not limited to,

• Least squares projection, also known as linear regression


• Conditional expectations for multivariate normal (Gaussian) distributions
• Gram–Schmidt orthogonalization
• QR decomposition
• Orthogonal polynomials
• etc

In this lecture, we focus on


• key ideas
• least squares regression

23.2.1 Further Reading

For background and foundational concepts, see our lecture on linear algebra
For more proofs and greater theoretical detail, see A Primer in Econometric Theory
For a complete set of proofs in a general setting, see, for example, [109]
For an advanced treatment of projection in the context of least squares prediction, see this
book chapter

23.3 Key Definitions

Assume $x, z \in \mathbb{R}^n$
Define $\langle x, z \rangle = \sum_i x_i z_i$
Recall $\|x\|^2 = \langle x, x \rangle$
The law of cosines states that $\langle x, z \rangle = \|x\| \|z\| \cos(\theta)$ where $\theta$ is the angle between the vectors
$x$ and $z$
When $\langle x, z \rangle = 0$, then $\cos(\theta) = 0$ and $x$ and $z$ are said to be orthogonal and we write $x \perp z$

For a linear subspace $S \subset \mathbb{R}^n$, we call $x \in \mathbb{R}^n$ orthogonal to $S$ if $x \perp z$ for all $z \in S$, and write $x \perp S$

The orthogonal complement of linear subspace $S \subset \mathbb{R}^n$ is the set $S^\perp := \{x \in \mathbb{R}^n : x \perp S\}$

𝑆 ⟂ is a linear subspace of R𝑛

• To see this, fix 𝑥, 𝑦 ∈ 𝑆 ⟂ and 𝛼, 𝛽 ∈ R


• Observe that if 𝑧 ∈ 𝑆, then

⟨𝛼𝑥 + 𝛽𝑦, 𝑧⟩ = 𝛼⟨𝑥, 𝑧⟩ + 𝛽⟨𝑦, 𝑧⟩ = 𝛼 × 0 + 𝛽 × 0 = 0

• Hence 𝛼𝑥 + 𝛽𝑦 ∈ 𝑆 ⟂ , as was to be shown



A set of vectors $\{x_1, \ldots, x_k\} \subset \mathbb{R}^n$ is called an orthogonal set if $x_i \perp x_j$ whenever $i \neq j$

If $\{x_1, \ldots, x_k\}$ is an orthogonal set, then the Pythagorean Law states that

$$ \|x_1 + \cdots + x_k\|^2 = \|x_1\|^2 + \cdots + \|x_k\|^2 $$

For example, when $k = 2$, $x_1 \perp x_2$ implies

$$ \|x_1 + x_2\|^2 = \langle x_1 + x_2, x_1 + x_2 \rangle = \langle x_1, x_1 \rangle + 2\langle x_2, x_1 \rangle + \langle x_2, x_2 \rangle = \|x_1\|^2 + \|x_2\|^2 $$
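Here is a small numerical check of the Pythagorean Law with $k = 3$
The three vectors are arbitrary illustrative choices that happen to be pairwise orthogonal

```python
import numpy as np

# Three pairwise orthogonal vectors in R^3 (illustrative values)
xs = [np.array([1.0, 2.0, 0.0]),
      np.array([-2.0, 1.0, 0.0]),
      np.array([0.0, 0.0, 3.0])]

# Matrix of inner products; off-diagonal entries should all be zero
gram = np.array([[xi @ xj for xj in xs] for xi in xs])
off_diagonal = gram - np.diag(np.diag(gram))

lhs = np.linalg.norm(sum(xs))**2                 # ||x1 + x2 + x3||^2
rhs = sum(np.linalg.norm(x)**2 for x in xs)      # ||x1||^2 + ||x2||^2 + ||x3||^2

print(np.allclose(off_diagonal, 0))
print(lhs, rhs)
```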

23.3.1 Linear Independence vs Orthogonality

If 𝑋 ⊂ R𝑛 is an orthogonal set and 0 ∉ 𝑋, then 𝑋 is linearly independent


Proving this is a nice exercise
While the converse is not true, a kind of partial converse holds, as we’ll see below

23.4 The Orthogonal Projection Theorem

What vector within a linear subspace of R𝑛 best approximates a given vector in R𝑛 ?


The next theorem provides an answer to this question
Theorem (OPT) Given $y \in \mathbb{R}^n$ and linear subspace $S \subset \mathbb{R}^n$, there exists a unique solution
to the minimization problem

$$ \hat y := \operatorname*{argmin}_{z \in S} \|y - z\| $$

The minimizer 𝑦 ̂ is the unique vector in R𝑛 that satisfies

• 𝑦̂ ∈ 𝑆
• 𝑦 − 𝑦̂ ⟂ 𝑆

The vector 𝑦 ̂ is called the orthogonal projection of 𝑦 onto 𝑆


The next figure provides some intuition
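To make the theorem concrete, the sketch below uses a deliberately simple subspace of $\mathbb{R}^3$, the plane of vectors whose third coordinate is zero
For this $S$ the projection just sets the third coordinate to zero; the vector $y$ and the random test points are illustrative assumptions

```python
import numpy as np

rng = np.random.default_rng(0)

y = np.array([1.0, 3.0, -3.0])
basis = np.eye(3)[:, :2]                # columns e1, e2 span the plane S

# Candidate projection: keep the first two coordinates, zero the third
y_hat = np.array([y[0], y[1], 0.0])

# Condition 2 of the OPT: the residual is orthogonal to every basis vector of S
residual_inner_products = basis.T @ (y - y_hat)

# Compare with 1000 randomly drawn points of S (stored as columns of Z)
Z = basis @ rng.normal(size=(2, 1000))
distances = np.linalg.norm(y[:, None] - Z, axis=0)
best_distance = np.linalg.norm(y - y_hat)

print(residual_inner_products)
print(np.all(distances >= best_distance))
```

No randomly drawn point of $S$ comes closer to $y$ than $\hat y$ does, as the theorem predicts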

23.4.1 Proof of Sufficiency

We’ll omit the full proof.


But we will prove sufficiency of the asserted conditions
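To this end, let $y \in \mathbb{R}^n$ and let $S$ be a linear subspace of $\mathbb{R}^n$
Let $\hat y$ be a vector in $\mathbb{R}^n$ such that $\hat y \in S$ and $y - \hat y \perp S$
Let $z$ be any other point in $S$ and use the fact that $S$ is a linear subspace to deduce

$$ \|y - z\|^2 = \|(y - \hat y) + (\hat y - z)\|^2 = \|y - \hat y\|^2 + \|\hat y - z\|^2 $$

(The second equality is the Pythagorean law, which applies because $\hat y - z \in S$ and $y - \hat y \perp S$)
Hence $\|y - z\| \geq \|y - \hat y\|$, which completes the proof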

23.4.2 Orthogonal Projection as a Mapping

For a linear space 𝑌 and a fixed linear subspace 𝑆, we have a functional relationship

𝑦 ∈ 𝑌 ↦ its orthogonal projection 𝑦 ̂ ∈ 𝑆

By the OPT, this is a well-defined mapping or operator from R𝑛 to R𝑛


In what follows we denote this operator by a matrix 𝑃

• 𝑃 𝑦 represents the projection 𝑦 ̂


• This is sometimes expressed as $\hat E_S y = P y$, where $\hat E$ denotes a wide-sense expectations operator and the subscript $S$ indicates that we are projecting $y$ onto the linear subspace $S$

The operator 𝑃 is called the orthogonal projection mapping onto 𝑆



It is immediate from the OPT that for any 𝑦 ∈ R𝑛

1. 𝑃 𝑦 ∈ 𝑆 and
2. 𝑦 − 𝑃 𝑦 ⟂ 𝑆

From this, we can deduce additional useful properties, such as

1. ‖𝑦‖2 = ‖𝑃 𝑦‖2 + ‖𝑦 − 𝑃 𝑦‖2 and


2. ‖𝑃 𝑦‖ ≤ ‖𝑦‖

For example, to prove 1, observe that 𝑦 = 𝑃 𝑦 + 𝑦 − 𝑃 𝑦 and apply the Pythagorean law
Orthogonal Complement
Let 𝑆 ⊂ R𝑛 .
The orthogonal complement of 𝑆 is the linear subspace 𝑆 ⟂ that satisfies 𝑥1 ⟂ 𝑥2 for every
𝑥1 ∈ 𝑆 and 𝑥2 ∈ 𝑆 ⟂
Let 𝑌 be a linear space with linear subspace 𝑆 and its orthogonal complement 𝑆 ⟂
We write

𝑌 = 𝑆 ⊕ 𝑆⟂

to indicate that for every $y \in Y$ there is a unique $x_1 \in S$ and a unique $x_2 \in S^\perp$ such that
$y = x_1 + x_2$
Moreover, $x_1 = \hat E_S y$ and $x_2 = y - \hat E_S y$
This amounts to another version of the OPT:
Theorem. If $S$ is a linear subspace of $\mathbb{R}^n$, $\hat E_S y = P y$ and $\hat E_{S^\perp} y = M y$, then

$$ P y \perp M y \quad \text{and} \quad y = P y + M y \quad \text{for all } y \in \mathbb{R}^n $$



The next figure illustrates

23.5 Orthonormal Basis

An orthogonal set of vectors 𝑂 ⊂ R𝑛 is called an orthonormal set if ‖𝑢‖ = 1 for all 𝑢 ∈ 𝑂


Let 𝑆 be a linear subspace of R𝑛 and let 𝑂 ⊂ 𝑆
If 𝑂 is orthonormal and span 𝑂 = 𝑆, then 𝑂 is called an orthonormal basis of 𝑆
𝑂 is necessarily a basis of 𝑆 (being independent by orthogonality and the fact that no ele-
ment is the zero vector)
One example of an orthonormal set is the canonical basis {𝑒1 , … , 𝑒𝑛 } that forms an orthonor-
mal basis of R𝑛 , where 𝑒𝑖 is the 𝑖 th unit vector
If $\{u_1, \ldots, u_k\}$ is an orthonormal basis of linear subspace $S$, then

$$ x = \sum_{i=1}^k \langle x, u_i \rangle u_i \quad \text{for all} \quad x \in S $$

To see this, observe that since $x \in \operatorname{span}\{u_1, \ldots, u_k\}$, we can find scalars $\alpha_1, \ldots, \alpha_k$ that verify

$$ x = \sum_{j=1}^k \alpha_j u_j \tag{1} $$

Taking the inner product with respect to $u_i$ gives

$$ \langle x, u_i \rangle = \sum_{j=1}^k \alpha_j \langle u_j, u_i \rangle = \alpha_i $$

Combining this result with Eq. (1) verifies the claim



23.5.1 Projection onto an Orthonormal Basis

When we have an orthonormal basis for the subspace onto which we are projecting, computing the projection simplifies:
Theorem If $\{u_1, \ldots, u_k\}$ is an orthonormal basis for $S$, then

$$ P y = \sum_{i=1}^k \langle y, u_i \rangle u_i, \quad \forall \; y \in \mathbb{R}^n \tag{2} $$

Proof: Fix $y \in \mathbb{R}^n$ and let $P y$ be defined as in Eq. (2)
Clearly, $P y \in S$
We claim that $y - P y \perp S$ also holds
It suffices to show that $y - P y \perp$ any basis vector $u_i$ (why?)
This is true because

$$ \left\langle y - \sum_{i=1}^k \langle y, u_i \rangle u_i, u_j \right\rangle = \langle y, u_j \rangle - \sum_{i=1}^k \langle y, u_i \rangle \langle u_i, u_j \rangle = 0 $$

23.6 Projection Using Matrix Algebra

Let 𝑆 be a linear subspace of R𝑛 and let 𝑦 ∈ R𝑛


We want to compute the matrix 𝑃 that verifies

𝐸𝑆̂ 𝑦 = 𝑃 𝑦

Evidently 𝑃 𝑦 is a linear function from 𝑦 ∈ R𝑛 to 𝑃 𝑦 ∈ R𝑛


This reference is useful https://en.wikipedia.org/wiki/Linear_map#Matrices
Theorem. Let the columns of 𝑛 × 𝑘 matrix 𝑋 form a basis of 𝑆. Then

𝑃 = 𝑋(𝑋 ′ 𝑋)−1 𝑋 ′

Proof: Given arbitrary 𝑦 ∈ R𝑛 and 𝑃 = 𝑋(𝑋 ′ 𝑋)−1 𝑋 ′ , our claim is that

1. 𝑃 𝑦 ∈ 𝑆, and
2. 𝑦 − 𝑃 𝑦 ⟂ 𝑆

Claim 1 is true because

𝑃 𝑦 = 𝑋(𝑋 ′ 𝑋)−1 𝑋 ′ 𝑦 = 𝑋𝑎 when 𝑎 ∶= (𝑋 ′ 𝑋)−1 𝑋 ′ 𝑦

An expression of the form 𝑋𝑎 is precisely a linear combination of the columns of 𝑋, and


hence an element of 𝑆
Claim 2 is equivalent to the statement

𝑦 − 𝑋(𝑋 ′ 𝑋)−1 𝑋 ′ 𝑦 ⟂ 𝑋𝑏 for all 𝑏 ∈ R𝐾

This is true: If 𝑏 ∈ R𝐾 , then

(𝑋𝑏)′ [𝑦 − 𝑋(𝑋 ′ 𝑋)−1 𝑋 ′ 𝑦] = 𝑏′ [𝑋 ′ 𝑦 − 𝑋 ′ 𝑦] = 0

The proof is now complete
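The two claims in the proof can be checked numerically
The sketch below builds $P$ for the arrays $X$ and $y$ that appear in Exercise 3 of this lecture and confirms that $Py$ lies in the column space of $X$ while $y - Py$ is orthogonal to it

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, -6.0],
              [2.0, 2.0]])
y = np.array([1.0, 3.0, -3.0])

# Projection matrix onto the column space of X
P = X @ np.linalg.inv(X.T @ X) @ X.T
Py = P @ y

# Claim 1: Py = Xa for a = (X'X)^{-1} X'y
a = np.linalg.solve(X.T @ X, X.T @ y)
claim_1 = np.allclose(Py, X @ a)

# Claim 2: y - Py is orthogonal to every column of X, hence to all Xb
claim_2 = np.allclose(X.T @ (y - Py), 0)

print(Py)
print(claim_1, claim_2)
```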

23.6.1 Starting with the Basis

It is common in applications to start with $n \times k$ matrix $X$ with linearly independent columns and let

$$ S := \operatorname{span} X := \operatorname{span}\{\operatorname{col}_1 X, \ldots, \operatorname{col}_k X\} $$

Then the columns of $X$ form a basis of $S$
From the preceding theorem, $P y = X(X'X)^{-1}X'y$ is the projection of $y$ onto $S$
In this context, $P$ is often called the projection matrix

• The matrix $M = I - P$ satisfies $M y = \hat E_{S^\perp} y$ and is sometimes called the annihilator matrix

23.6.2 The Orthonormal Case

Suppose that 𝑈 is 𝑛 × 𝑘 with orthonormal columns


Let $u_i := \operatorname{col}_i U$ for each $i$, let $S := \operatorname{span} U$ and let $y \in \mathbb{R}^n$
We know that the projection of $y$ onto $S$ is

$$ P y = U(U'U)^{-1}U'y $$

Since $U$ has orthonormal columns, we have $U'U = I$
Hence

$$ P y = UU'y = \sum_{i=1}^k \langle u_i, y \rangle u_i $$

We have recovered our earlier result about projecting onto the span of an orthonormal basis

23.6.3 Application: Overdetermined Systems of Equations

Let $y \in \mathbb{R}^n$ and let $X$ be $n \times k$ with linearly independent columns
Given $X$ and $y$, we seek $b \in \mathbb{R}^k$ satisfying the system of linear equations $Xb = y$
If $n > k$ (more equations than unknowns), then the system is said to be overdetermined

Intuitively, we may not be able to find a 𝑏 that satisfies all 𝑛 equations


The best approach here is to

• Accept that an exact solution may not exist


• Look instead for an approximate solution

By approximate solution, we mean a 𝑏 ∈ R𝑘 such that 𝑋𝑏 is as close to 𝑦 as possible


The next theorem shows that the solution is well defined and unique
The proof uses the OPT
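Theorem The unique minimizer of $\|y - Xb\|$ over $b \in \mathbb{R}^k$ is

$$ \hat\beta := (X'X)^{-1}X'y $$

Proof: Note that

$$ X\hat\beta = X(X'X)^{-1}X'y = P y $$

Since $P y$ is the orthogonal projection onto $\operatorname{span}(X)$ we have

$$ \|y - P y\| \leq \|y - z\| \text{ for any } z \in \operatorname{span}(X) $$

Because $Xb \in \operatorname{span}(X)$

$$ \|y - X\hat\beta\| \leq \|y - Xb\| \text{ for any } b \in \mathbb{R}^k $$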

This is what we aimed to show
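A short simulation illustrates the theorem
The sketch below draws an arbitrary overdetermined system (the dimensions and random seed are assumptions made for illustration), computes $\hat\beta$ from the formula above, and compares it with NumPy’s least squares routine `np.linalg.lstsq`

```python
import numpy as np

rng = np.random.default_rng(1234)

n, k = 50, 3                            # more equations than unknowns
X = rng.normal(size=(n, k))
y = rng.normal(size=n)

# The formula from the theorem, solving (X'X) b = X'y rather than inverting
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# NumPy's built-in least squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Any other coefficient vector does no better
b_other = beta_hat + 0.1 * rng.normal(size=k)
dist_hat = np.linalg.norm(y - X @ beta_hat)
dist_other = np.linalg.norm(y - X @ b_other)

print(np.allclose(beta_hat, beta_lstsq))
print(dist_hat, dist_other)
```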

23.7 Least Squares Regression

Let’s apply the theory of orthogonal projection to least squares regression


This approach provides insights about many geometric properties of linear regression
We treat only some examples

23.7.1 Squared Risk Measures

Given pairs (𝑥, 𝑦) ∈ R𝐾 × R, consider choosing 𝑓 ∶ R𝐾 → R to minimize the risk

𝑅(𝑓) ∶= E [(𝑦 − 𝑓(𝑥))2 ]

If probabilities and hence E are unknown, we cannot solve this problem directly
However, if a sample is available, we can estimate the risk with the empirical risk:

$$ \min_{f \in \mathcal{F}} \frac{1}{N} \sum_{n=1}^N (y_n - f(x_n))^2 $$


Minimizing this expression is called empirical risk minimization


The set ℱ is sometimes called the hypothesis space
The theory of statistical learning tells us that to prevent overfitting we should take the set ℱ
to be relatively simple
If we let $\mathcal{F}$ be the class of linear functions and drop the constant $1/N$, the problem is

$$ \min_{b \in \mathbb{R}^K} \sum_{n=1}^N (y_n - b'x_n)^2 $$

This is the sample linear least squares problem

23.7.2 Solution

Define the matrices

$$
y := \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix},
\quad
x_n := \begin{pmatrix} x_{n1} \\ x_{n2} \\ \vdots \\ x_{nK} \end{pmatrix}
= n\text{-th obs on all regressors}
$$

and

$$
X := \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_N' \end{pmatrix}
:=: \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1K} \\
x_{21} & x_{22} & \cdots & x_{2K} \\
\vdots & \vdots & & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{NK}
\end{pmatrix}
$$

We assume throughout that $N > K$ and $X$ is full column rank

If you work through the algebra, you will be able to verify that $\|y - Xb\|^2 = \sum_{n=1}^N (y_n - b'x_n)^2$
Since monotone transforms don’t affect minimizers, we have

$$ \operatorname*{argmin}_{b \in \mathbb{R}^K} \sum_{n=1}^N (y_n - b'x_n)^2 = \operatorname*{argmin}_{b \in \mathbb{R}^K} \|y - Xb\| $$

By our results about overdetermined linear systems of equations, the solution is

$$ \hat\beta := (X'X)^{-1}X'y $$

Let $P$ and $M$ be the projection and annihilator associated with $X$:

$$ P := X(X'X)^{-1}X' \quad \text{and} \quad M := I - P $$

The vector of fitted values is

$$ \hat y := X\hat\beta = P y $$

The vector of residuals is

𝑢̂ ∶= 𝑦 − 𝑦 ̂ = 𝑦 − 𝑃 𝑦 = 𝑀 𝑦

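Here are some more standard definitions:

• The total sum of squares is $\text{TSS} := \|y\|^2$
• The sum of squared residuals is $\text{SSR} := \|\hat u\|^2$
• The explained sum of squares is $\text{ESS} := \|\hat y\|^2$

These quantities satisfy

$$ \text{TSS} = \text{ESS} + \text{SSR} $$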

We can prove this easily using the OPT


From the OPT we have 𝑦 = 𝑦 ̂ + 𝑢̂ and 𝑢̂ ⟂ 𝑦 ̂
Applying the Pythagorean law completes the proof
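The decomposition is easy to confirm on simulated data
In the sketch below the sample size, the coefficient vector used to generate $y$, and the seed are all illustrative assumptions

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 100, 4
X = rng.normal(size=(N, K))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat            # fitted values, P y
u_hat = y - y_hat               # residuals, M y

tss = y @ y                     # total sum of squares
ess = y_hat @ y_hat             # explained sum of squares
ssr = u_hat @ u_hat             # sum of squared residuals

print(tss, ess + ssr)
print(u_hat @ y_hat)            # close to zero: residuals ⟂ fitted values
```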

23.8 Orthogonalization and Decomposition

Let’s return to the connection between linear independence and orthogonality touched on
above
A result of much interest is a famous algorithm for constructing orthonormal sets from lin-
early independent sets
The next section gives details

23.8.1 Gram-Schmidt Orthogonalization

Theorem For each linearly independent set {𝑥1 , … , 𝑥𝑘 } ⊂ R𝑛 , there exists an orthonormal
set {𝑢1 , … , 𝑢𝑘 } with

span{𝑥1 , … , 𝑥𝑖 } = span{𝑢1 , … , 𝑢𝑖 } for 𝑖 = 1, … , 𝑘

The Gram-Schmidt orthogonalization procedure constructs an orthonormal set $\{u_1, u_2, \ldots, u_k\}$
One description of this procedure is as follows:

• For $i = 1, \ldots, k$, form $S_i := \operatorname{span}\{x_1, \ldots, x_i\}$ and $S_i^\perp$
• Set $v_1 = x_1$
• For $i \geq 2$ set $v_i := \hat E_{S_{i-1}^\perp} x_i$
• For $i = 1, \ldots, k$ set $u_i := v_i / \|v_i\|$

The sequence $u_1, \ldots, u_k$ has the stated properties


A Gram-Schmidt orthogonalization construction is a key idea behind the Kalman filter de-
scribed in A First Look at the Kalman filter
In some exercises below, you are asked to implement this algorithm and test it using projec-
tion

23.8.2 QR Decomposition

The following result uses the preceding algorithm to produce a useful decomposition
Theorem If 𝑋 is 𝑛 × 𝑘 with linearly independent columns, then there exists a factorization
𝑋 = 𝑄𝑅 where

• 𝑅 is 𝑘 × 𝑘, upper triangular, and nonsingular


• 𝑄 is 𝑛 × 𝑘 with orthonormal columns

Proof sketch: Let

• $x_j := \operatorname{col}_j(X)$
• $\{u_1, \ldots, u_k\}$ be orthonormal with the same span as $\{x_1, \ldots, x_k\}$ (to be constructed using Gram–Schmidt)
• $Q$ be formed from cols $u_i$

Since $x_j \in \operatorname{span}\{u_1, \ldots, u_j\}$, we have

$$ x_j = \sum_{i=1}^j \langle u_i, x_j \rangle u_i \quad \text{for } j = 1, \ldots, k $$

Some rearranging gives 𝑋 = 𝑄𝑅

23.8.3 Linear Regression via QR Decomposition

For matrices $X$ and $y$ that overdetermine $\beta$ in the linear equation system $y = X\beta$, we
found the least squares approximator $\hat\beta = (X'X)^{-1}X'y$
Using the QR decomposition $X = QR$ gives

$$
\begin{aligned}
\hat\beta &= (R'Q'QR)^{-1}R'Q'y \\
&= (R'R)^{-1}R'Q'y \\
&= R^{-1}(R')^{-1}R'Q'y = R^{-1}Q'y
\end{aligned}
$$

Numerical routines would in this case use the alternative form $R\hat\beta = Q'y$ and back substitution
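The following sketch shows one way this could look in NumPy
The helper `back_substitute` is a hypothetical function written here for illustration, and the data are randomly generated; production code would more likely call a library triangular solver

```python
import numpy as np

def back_substitute(R, c):
    "Solve R b = c for upper triangular, nonsingular R."
    k = len(c)
    b = np.zeros(k)
    for i in range(k - 1, -1, -1):
        # Subtract the already-solved components, then divide by the pivot
        b[i] = (c[i] - R[i, i + 1:] @ b[i + 1:]) / R[i, i]
    return b

rng = np.random.default_rng(0)
n, k = 20, 3
X = rng.normal(size=(n, k))
y = rng.normal(size=n)

Q, R = np.linalg.qr(X)                   # reduced form: Q is n x k, R is k x k
beta_qr = back_substitute(R, Q.T @ y)    # solves R β = Q'y

# Compare with the normal-equations formula (X'X)^{-1} X'y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_qr)
print(np.allclose(beta_qr, beta_normal))
```

Working with $R$ directly avoids forming $X'X$, which is the usual numerical motivation for the QR route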

23.9 Exercises

23.9.1 Exercise 1

Show that, for any linear subspace 𝑆 ⊂ R𝑛 , 𝑆 ∩ 𝑆 ⟂ = {0}

23.9.2 Exercise 2

Let 𝑃 = 𝑋(𝑋 ′ 𝑋)−1 𝑋 ′ and let 𝑀 = 𝐼 − 𝑃 . Show that 𝑃 and 𝑀 are both idempotent and
symmetric. Can you give any intuition as to why they should be idempotent?

23.9.3 Exercise 3

Using Gram-Schmidt orthogonalization, produce a linear projection of $y$ onto the column space of $X$ and verify this using the projection matrix $P := X(X'X)^{-1}X'$ and also using
QR decomposition, where:

$$ y := \begin{pmatrix} 1 \\ 3 \\ -3 \end{pmatrix}, $$

and

$$ X := \begin{pmatrix} 1 & 0 \\ 0 & -6 \\ 2 & 2 \end{pmatrix} $$

23.10 Solutions

23.10.1 Exercise 1

If $x \in S$ and $x \in S^\perp$, then we have in particular that $\langle x, x \rangle = 0$, but then $x = 0$

23.10.2 Exercise 2

Symmetry and idempotence of 𝑀 and 𝑃 can be established using standard rules for matrix
algebra. The intuition behind idempotence of 𝑀 and 𝑃 is that both are orthogonal projec-
tions. After a point is projected into a given subspace, applying the projection again makes
no difference. (A point inside the subspace is not shifted by orthogonal projection onto that
space because it is already the closest point in the subspace to itself.)

23.10.3 Exercise 3

Here’s a function that computes the orthonormal vectors using the GS algorithm given in the
lecture

In [1]: import numpy as np

def gram_schmidt(X):
    """
    Implements Gram-Schmidt orthogonalization.

    Parameters
    ----------
    X : an n x k array with linearly independent columns

    Returns
    -------
    U : an n x k array with orthonormal columns

    """

    # Set up
    n, k = X.shape
    U = np.empty((n, k))
    I = np.eye(n)

    # The first col of U is just the normalized first col of X
    v1 = X[:, 0]
    U[:, 0] = v1 / np.sqrt(np.sum(v1 * v1))

    for i in range(1, k):
        # Set up
        b = X[:, i]       # The vector we're going to project
        Z = X[:, 0:i]     # The first i columns of X (indices 0, ..., i-1)

        # Project onto the orthogonal complement of the col span of Z
        M = I - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
        u = M @ b

        # Normalize
        U[:, i] = u / np.sqrt(np.sum(u * u))

    return U

Here are the arrays we’ll work with

In [2]: y = [1, 3, -3]

X = [[1, 0],
[0, -6],
[2, 2]]

X, y = [np.asarray(z) for z in (X, y)]

First, let’s try projection of 𝑦 onto the column space of 𝑋 using the ordinary matrix expres-
sion:

In [3]: Py1 = X @ np.linalg.inv(X.T @ X) @ X.T @ y


Py1

Out[3]: array([-0.56521739, 3.26086957, -2.2173913 ])

Now let’s do the same using an orthonormal basis created from our gram_schmidt function

In [4]: U = gram_schmidt(X)
U

Out[4]: array([[ 0.4472136 , -0.13187609],


[ 0. , -0.98907071],
[ 0.89442719, 0.06593805]])

In [5]: Py2 = U @ U.T @ y


Py2

Out[5]: array([-0.56521739, 3.26086957, -2.2173913 ])

This is the same answer. So far so good. Finally, let’s try the same thing but with the basis
obtained via QR decomposition:

In [6]: from scipy.linalg import qr

Q, R = qr(X, mode='economic')
Q

Out[6]: array([[-0.4472136 , -0.13187609],


[-0. , -0.98907071],
[-0.89442719, 0.06593805]])

In [7]: Py3 = Q @ Q.T @ y


Py3

Out[7]: array([-0.56521739, 3.26086957, -2.2173913 ])

Again, we obtain the same answer


24

LLN and CLT

24.1 Contents

• Overview 24.2

• Relationships 24.3

• LLN 24.4

• CLT 24.5

• Exercises 24.6

• Solutions 24.7

24.2 Overview

This lecture illustrates two of the most important theorems of probability and statistics: The
law of large numbers (LLN) and the central limit theorem (CLT)
These beautiful theorems lie behind many of the most fundamental results in econometrics
and quantitative economic modeling
The lecture is based around simulations that show the LLN and CLT in action
We also demonstrate how the LLN and CLT break down when the assumptions they are
based on do not hold
In addition, we examine several useful extensions of the classical theorems, such as

• The delta method, for smooth functions of random variables


• The multivariate case

Some of these extensions are presented as exercises

24.3 Relationships

The CLT refines the LLN


The LLN gives conditions under which sample moments converge to population moments as
sample size increases
The CLT provides information about the rate at which sample moments converge to popula-
tion moments as sample size increases

24.4 LLN

We begin with the law of large numbers, which tells us when sample averages will converge to
their population means

24.4.1 The Classical LLN

The classical law of large numbers concerns independent and identically distributed (IID)
random variables
Here is the strongest version of the classical LLN, known as Kolmogorov’s strong law
Let 𝑋1 , … , 𝑋𝑛 be independent and identically distributed scalar random variables, with com-
mon distribution 𝐹
When it exists, let 𝜇 denote the common mean of this sample:

𝜇 ∶= E𝑋 = ∫ 𝑥𝐹 (𝑑𝑥)

In addition, let

$$ \bar X_n := \frac{1}{n} \sum_{i=1}^n X_i $$

Kolmogorov’s strong law states that, if E|𝑋| is finite, then

P {𝑋̄ 𝑛 → 𝜇 as 𝑛 → ∞} = 1 (1)

What does this last expression mean?


Let’s think about it from a simulation perspective, imagining for a moment that our com-
puter can generate perfect random samples (which of course it can’t)
Let’s also imagine that we can generate infinite sequences so that the statement 𝑋̄ 𝑛 → 𝜇 can
be evaluated
In this setting, Eq. (1) should be interpreted as meaning that the probability of the computer
producing a sequence where 𝑋̄ 𝑛 → 𝜇 fails to occur is zero

24.4.2 Proof

The proof of Kolmogorov’s strong law is nontrivial – see, for example, theorem 8.3.5 of [38]
On the other hand, we can prove a weaker version of the LLN very easily and still get most of
the intuition
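The version we prove is as follows: If $X_1, \ldots, X_n$ is IID with $\mathbb{E} X_i^2 < \infty$, then, for any $\epsilon > 0$,
we have

$$ \mathbb{P}\left\{ |\bar X_n - \mu| \geq \epsilon \right\} \to 0 \quad \text{as} \quad n \to \infty \tag{2} $$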

(This version is weaker because we claim only convergence in probability rather than almost
sure convergence, and assume a finite second moment)
To see that this is so, fix 𝜖 > 0, and let 𝜎2 be the variance of each 𝑋𝑖
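Recall the Chebyshev inequality, which tells us that

$$ \mathbb{P}\left\{ |\bar X_n - \mu| \geq \epsilon \right\} \leq \frac{\mathbb{E}[(\bar X_n - \mu)^2]}{\epsilon^2} \tag{3} $$

Now observe that

$$
\begin{aligned}
\mathbb{E}[(\bar X_n - \mu)^2]
&= \mathbb{E}\left\{ \left[ \frac{1}{n} \sum_{i=1}^n (X_i - \mu) \right]^2 \right\} \\
&= \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \mathbb{E}(X_i - \mu)(X_j - \mu) \\
&= \frac{1}{n^2} \sum_{i=1}^n \mathbb{E}(X_i - \mu)^2 \\
&= \frac{\sigma^2}{n}
\end{aligned}
$$

Here the crucial step is at the third equality, which follows from independence
Independence means that if $i \neq j$, then the covariance term $\mathbb{E}(X_i - \mu)(X_j - \mu)$ drops out
As a result, $n^2 - n$ terms vanish, leading us to a final expression that goes to zero in $n$
Combining our last result with Eq. (3), we come to the estimate

$$ \mathbb{P}\left\{ |\bar X_n - \mu| \geq \epsilon \right\} \leq \frac{\sigma^2}{n\epsilon^2} \tag{4} $$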

The claim in Eq. (2) is now clear
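The bound in Eq. (4) can be compared with simulated frequencies
The sketch below assumes draws from the uniform distribution on $[0, 1]$, for which $\mu = 1/2$ and $\sigma^2 = 1/12$; the tolerance $\epsilon$, the sample sizes and the number of replications are illustrative choices

```python
import numpy as np

rng = np.random.default_rng(42)

eps = 0.1
mu, sigma2 = 0.5, 1 / 12        # mean and variance of Uniform(0, 1)
reps = 2000                     # number of simulated samples per n

results = {}
for n in (10, 100, 1000):
    # Each row is one sample of size n; take the mean of every row
    sample_means = rng.uniform(size=(reps, n)).mean(axis=1)
    freq = np.mean(np.abs(sample_means - mu) >= eps)   # simulated probability
    bound = sigma2 / (n * eps**2)                      # right side of Eq. (4)
    results[n] = (freq, bound)
    print(n, freq, bound)
```

The simulated frequencies sit below the Chebyshev bound, which is typically far from tight, and both shrink as $n$ grows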


Of course, if the sequence 𝑋1 , … , 𝑋𝑛 is correlated, then the cross-product terms E(𝑋𝑖 −
𝜇)(𝑋𝑗 − 𝜇) are not necessarily zero
While this doesn’t mean that the same line of argument is impossible, it does mean that if we
want a similar result then the covariances should be “almost zero” for “most” of these terms
In a long sequence, this would be true if, for example, E(𝑋𝑖 − 𝜇)(𝑋𝑗 − 𝜇) approached zero
when the difference between 𝑖 and 𝑗 became large
In other words, the LLN can still work if the sequence 𝑋1 , … , 𝑋𝑛 has a kind of “asymptotic
independence”, in the sense that correlation falls to zero as variables become further apart in
the sequence
This idea is very important in time series analysis, and we’ll come across it again soon enough

24.4.3 Illustration

Let’s now illustrate the classical IID law of large numbers using simulation
In particular, we aim to generate some sequences of IID random variables and plot the evolu-
tion of 𝑋̄ 𝑛 as 𝑛 increases
Below is a figure that does just this (as usual, you can click on it to expand it)
It shows IID observations from three different distributions and plots 𝑋̄ 𝑛 against 𝑛 in each
case
The dots represent the underlying observations 𝑋𝑖 for 𝑖 = 1, … , 100
In each of the three cases, convergence of 𝑋̄ 𝑛 to 𝜇 occurs as predicted

In [1]: import random


import numpy as np
from scipy.stats import t, beta, lognorm, expon, gamma, poisson
import matplotlib.pyplot as plt
%matplotlib inline

n = 100

# == Arbitrary collection of distributions == #


distributions = {"student's t with 10 degrees of freedom": t(10),
"β(2, 2)": beta(2, 2),
"lognormal LN(0, 1/2)": lognorm(0.5),
"γ(5, 1/2)": gamma(5, scale=2),
"poisson(4)": poisson(4),
"exponential with λ = 1": expon(1)}

# == Create a figure and some axes == #


num_plots = 3
fig, axes = plt.subplots(num_plots, 1, figsize=(20, 20))

# == Set some plotting parameters to improve layout == #


bbox = (0., 1.02, 1., .102)
legend_args = {'ncol': 2,
'bbox_to_anchor': bbox,
'loc': 3,
'mode': 'expand'}
plt.subplots_adjust(hspace=0.5)

for ax in axes:
    # == Choose a randomly selected distribution == #
    name = random.choice(list(distributions.keys()))
    distribution = distributions.pop(name)

    # == Generate n draws from the distribution == #
    data = distribution.rvs(n)

    # == Compute sample mean at each n == #
    sample_mean = np.empty(n)
    for i in range(n):
        sample_mean[i] = np.mean(data[:i+1])

    # == Plot == #
    ax.plot(list(range(n)), data, 'o', color='grey', alpha=0.5)
    axlabel = '$\\bar X_n$ for $X_i \\sim$' + name
    ax.plot(list(range(n)), sample_mean, 'g-', lw=3, alpha=0.6, label=axlabel)
    m = distribution.mean()
    ax.plot(list(range(n)), [m] * n, 'k--', lw=1.5, label='$\\mu$')
    ax.vlines(list(range(n)), m, data, lw=0.2)
    ax.legend(**legend_args)

plt.show()

The three distributions are chosen at random from a selection stored in the dictionary distributions

24.4.4 Infinite Mean

What happens if the condition E|𝑋| < ∞ in the statement of the LLN is not satisfied?
This might be the case if the underlying distribution is heavy-tailed; the best-known example is the Cauchy distribution, which has density

$$
f(x) = \frac{1}{\pi (1 + x^2)} \qquad (x \in \mathbb R)
$$

The next figure shows 100 independent draws from this distribution

In [2]: from scipy.stats import cauchy

n = 100
distribution = cauchy()

fig, ax = plt.subplots(figsize=(10, 6))
data = distribution.rvs(n)

ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5)
ax.vlines(list(range(n)), 0, data, lw=0.2)
ax.set_title(f"{n} observations from the Cauchy distribution")

plt.show()

Notice how extreme observations are far more prevalent here than in the previous figure
Let’s now have a look at the behavior of the sample mean

In [3]: n = 1000
distribution = cauchy()

fig, ax = plt.subplots(figsize=(10, 6))
data = distribution.rvs(n)

# == Compute sample mean at each n == #
sample_mean = np.empty(n)
for i in range(n):
    sample_mean[i] = np.mean(data[:i+1])  # mean of the first i+1 draws

# == Plot == #
ax.plot(list(range(n)), sample_mean, 'r-', lw=3, alpha=0.6,
        label=r'$\bar X_n$')
ax.plot(list(range(n)), [0] * n, 'k--', lw=0.5)
ax.legend()

plt.show()

Here we’ve increased 𝑛 to 1000, but the sequence still shows no sign of converging
Will convergence become visible if we take 𝑛 even larger?
The answer is no
To see this, recall that the characteristic function of the Cauchy distribution is

$$
\phi(t) = \mathbb E e^{itX} = \int e^{itx} f(x) \, dx = e^{-|t|}
\tag{5}
$$

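Using independence, the characteristic function of the sample mean becomes

$$
\begin{aligned}
\mathbb E e^{it \bar X_n}
& = \mathbb E \exp \left\{ i \frac{t}{n} \sum_{j=1}^n X_j \right\} \\
& = \mathbb E \prod_{j=1}^n \exp \left\{ i \frac{t}{n} X_j \right\} \\
& = \prod_{j=1}^n \mathbb E \exp \left\{ i \frac{t}{n} X_j \right\}
  = [\phi(t/n)]^n
\end{aligned}
$$

In view of Eq. (5), this is just $[e^{-|t|/n}]^n = e^{-|t|}$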


Thus, in the case of the Cauchy distribution, the sample mean itself has the very same
Cauchy distribution, regardless of 𝑛
In particular, the sequence 𝑋̄ 𝑛 does not converge to a point
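
The claim that the sample mean has the same Cauchy distribution for every $n$ can be checked numerically. The sketch below is only illustrative: the helper name `cauchy_mean_quartiles`, the number of replications and the seed are assumptions made for this example. It compares the quartiles of simulated sample means with the quartiles of the standard Cauchy distribution, which are $-1$ and $1$.

```python
import numpy as np

def cauchy_mean_quartiles(n, reps=20_000, seed=0):
    """
    Return the lower and upper quartiles of `reps` simulated sample means,
    each computed from n IID standard Cauchy draws.
    """
    rng = np.random.default_rng(seed)
    # Each row is one sample X_1, ..., X_n; average across each row
    sample_means = rng.standard_cauchy(size=(reps, n)).mean(axis=1)
    return np.quantile(sample_means, [0.25, 0.75])

# The standard Cauchy distribution has quartiles -1 and 1
for n in (1, 10, 100):
    q25, q75 = cauchy_mean_quartiles(n)
    print(f"n = {n:3d}: quartiles of the sample mean = ({q25:.2f}, {q75:.2f})")
```

If the sample mean were concentrating around a point, the quartiles would move toward each other as $n$ grows; here they should stay near $\pm 1$ up to simulation error.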

24.5 CLT

Next, we turn to the central limit theorem, which tells us about the distribution of the deviation between sample averages and population means

24.5.1 Statement of the Theorem

The central limit theorem is one of the most remarkable results in all of mathematics
In the classical IID setting, it tells us the following:
If the sequence $X_1, \ldots, X_n$ is IID, with common mean $\mu$ and common variance $\sigma^2 \in (0, \infty)$, then

$$
\sqrt{n} (\bar X_n - \mu) \stackrel{d}{\to} N(0, \sigma^2)
\quad \text{as} \quad n \to \infty
\tag{6}
$$

Here $\stackrel{d}{\to} N(0, \sigma^2)$ indicates convergence in distribution to a centered (i.e., zero mean) normal with standard deviation $\sigma$

24.5.2 Intuition

The striking implication of the CLT is that for any distribution with finite second moment,
the simple operation of adding independent copies always leads to a Gaussian curve
A relatively simple proof of the central limit theorem can be obtained by working with characteristic functions (see, e.g., theorem 9.5.6 of [38])
The proof is elegant but almost anticlimactic, and it provides surprisingly little intuition
In fact, all of the proofs of the CLT that we know are similar in this respect
Why does adding independent copies produce a bell-shaped distribution?
Part of the answer can be obtained by investigating the addition of independent Bernoulli
random variables
In particular, let $X_i$ be binary, with $\mathbb P\{X_i = 0\} = \mathbb P\{X_i = 1\} = 0.5$, and let $X_1, \ldots, X_n$ be independent
Think of $X_i = 1$ as a “success”, so that $Y_n = \sum_{i=1}^n X_i$ is the number of successes in $n$ trials
The next figure plots the probability mass function of $Y_n$ for $n = 1, 2, 4, 8$

In [4]: from scipy.stats import binom

fig, axes = plt.subplots(2, 2, figsize=(10, 6))
plt.subplots_adjust(hspace=0.4)
axes = axes.flatten()
ns = [1, 2, 4, 8]
dom = list(range(9))

for ax, n in zip(axes, ns):
    b = binom(n, 0.5)
    ax.bar(dom, b.pmf(dom), alpha=0.6, align='center')
    ax.set(xlim=(-0.5, 8.5), ylim=(0, 0.55),
           xticks=list(range(9)), yticks=(0, 0.2, 0.4),
           title=f'$n = {n}$')

plt.show()

When 𝑛 = 1, the distribution is flat — one success or no successes have the same probability
When 𝑛 = 2 we can either have 0, 1 or 2 successes
Notice the peak in probability mass at the mid-point 𝑘 = 1
The reason is that there are more ways to get 1 success (“fail then succeed” or “succeed then
fail”) than to get zero or two successes
Moreover, the two trials are independent, so the outcomes “fail then succeed” and “succeed
then fail” are just as likely as the outcomes “fail then fail” and “succeed then succeed”
(If there were positive correlation, say, then “succeed then fail” would be less likely than “succeed then succeed”)
Here, already we have the essence of the CLT: addition under independence leads probability
mass to pile up in the middle and thin out at the tails
For $n = 4$ and $n = 8$ we again get a peak at the “middle” value (halfway between the minimum and the maximum possible value)
The intuition is the same — there are simply more ways to get these middle outcomes
If we continue, the bell-shaped curve becomes even more pronounced
We are witnessing the normal approximation to the binomial distribution
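
The following sketch quantifies how closely the binomial probability mass function tracks a normal density as $n$ grows. The helper `max_pmf_gap` is a hypothetical name introduced for this illustration; it evaluates the $N(np, np(1-p))$ density at the integers $0, \ldots, n$ and reports the largest absolute gap to the binomial pmf.

```python
import numpy as np
from scipy.stats import binom, norm

def max_pmf_gap(n, p=0.5):
    """
    Largest absolute gap between the Binomial(n, p) pmf and the
    N(np, np(1-p)) density, both evaluated at the integers 0, ..., n.
    """
    k = np.arange(n + 1)
    pmf = binom.pmf(k, n, p)
    pdf = norm.pdf(k, loc=n * p, scale=np.sqrt(n * p * (1 - p)))
    return np.max(np.abs(pmf - pdf))

for n in (1, 2, 4, 8, 64, 512):
    print(f"n = {n:3d}: max gap = {max_pmf_gap(n):.5f}")
```

The gap need not shrink monotonically for very small $n$, but it becomes tiny once $n$ is moderately large.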

24.5.3 Simulation 1

Since the CLT seems almost magical, running simulations that verify its implications is one
good way to build intuition
To this end, we now perform the following simulation

1. Choose an arbitrary distribution $F$ for the underlying observations $X_i$
2. Generate independent draws of $Y_n := \sqrt{n} (\bar X_n - \mu)$
3. Use these draws to compute some measure of their distribution, such as a histogram
4. Compare the latter to $N(0, \sigma^2)$

Here’s some code that does exactly this for the exponential distribution $F(x) = 1 - e^{-\lambda x}$
(Please experiment with other choices of $F$, but remember that, to conform with the conditions of the CLT, the distribution must have a finite second moment)

In [5]: from scipy.stats import norm

# == Set parameters == #
n = 250                          # Choice of n
k = 100000                       # Number of draws of Y_n
distribution = expon(scale=2)    # Exponential distribution, λ = 1/2 (scale = 1/λ)
μ, s = distribution.mean(), distribution.std()

# == Draw underlying RVs. Each row contains a draw of X_1,..,X_n == #
data = distribution.rvs((k, n))
# == Compute mean of each row, producing k draws of \bar X_n == #
sample_means = data.mean(axis=1)
# == Generate observations of Y_n == #
Y = np.sqrt(n) * (sample_means - μ)

# == Plot == #
fig, ax = plt.subplots(figsize=(10, 6))
xmin, xmax = -3 * s, 3 * s
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, norm.pdf(xgrid, scale=s), 'k-', lw=2, label=r'$N(0, \sigma^2)$')
ax.legend()

plt.show()

Notice the absence of for loops: every operation is vectorized, meaning that the major calculations are all shifted to highly optimized C code

The fit to the normal density is already tight and can be further improved by increasing n
You can also experiment with other specifications of 𝐹

24.5.4 Simulation 2

Our next simulation is somewhat like the first, except that we aim to track the distribution of $Y_n := \sqrt{n} (\bar X_n - \mu)$ as $n$ increases
In the simulation, we’ll be working with random variables having $\mu = 0$
Thus, when $n = 1$, we have $Y_1 = X_1$, so the first distribution is just the distribution of the underlying random variable
For $n = 2$, the distribution of $Y_2$ is that of $(X_1 + X_2) / \sqrt{2}$, and so on
What we expect is that, regardless of the distribution of the underlying random variable, the
distribution of 𝑌𝑛 will smooth out into a bell-shaped curve
The next figure shows this process for $X_i \sim f$, where $f$ was specified as the convex combination of three different beta densities
(Taking a convex combination is an easy way to produce an irregular shape for 𝑓)
In the figure, the closest density is that of 𝑌1 , while the furthest is that of 𝑌5

In [6]: from scipy.stats import gaussian_kde
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.collections import PolyCollection

beta_dist = beta(2, 2)

def gen_x_draws(k):
    """
    Returns a flat array containing k independent draws from the
    distribution of X, the underlying random variable.  This distribution is
    itself a convex combination of three beta distributions.
    """
    bdraws = beta_dist.rvs((3, k))
    # == Transform rows, so each represents a different distribution == #
    bdraws[0, :] -= 0.5
    bdraws[1, :] += 0.6
    bdraws[2, :] -= 1.1
    # == Set X[i] = bdraws[j, i], where j is a random draw from {0, 1, 2} == #
    js = np.random.randint(0, 3, size=k)  # upper bound is exclusive
    X = bdraws[js, np.arange(k)]
    # == Rescale, so that the random variable has zero mean and unit variance == #
    m, sigma = X.mean(), X.std()
    return (X - m) / sigma

nmax = 5
reps = 100000
ns = list(range(1, nmax + 1))

# == Form a matrix Z such that each column is reps independent draws of X == #
Z = np.empty((reps, nmax))
for i in range(nmax):
    Z[:, i] = gen_x_draws(reps)
# == Take cumulative sum across columns == #
S = Z.cumsum(axis=1)
# == Divide j-th column by sqrt j == #
Y = (1 / np.sqrt(ns)) * S

# == Plot == #
fig = plt.figure(figsize=(10, 6))
# Recent Matplotlib versions no longer accept keyword arguments in fig.gca()
ax = fig.add_subplot(projection='3d')

a, b = -3, 3
gs = 100
xs = np.linspace(a, b, gs)

# == Build verts == #
greys = np.linspace(0.3, 0.7, nmax)
verts = []
for n in ns:
    density = gaussian_kde(Y[:, n-1])
    ys = density(xs)
    verts.append(list(zip(xs, ys)))

poly = PolyCollection(verts, facecolors=[str(g) for g in greys])
poly.set_alpha(0.85)
ax.add_collection3d(poly, zs=ns, zdir='x')

ax.set(xlim3d=(1, nmax), xticks=(ns), ylabel='$Y_n$', zlabel='$p(y_n)$',
       xlabel=("n"), yticks=((-3, 0, 3)), ylim3d=(a, b),
       zlim3d=(0, 0.4), zticks=((0.2, 0.4)))
ax.invert_xaxis()
ax.view_init(30, 45)  # Elevation of 30 degrees and azimuth of 45 degrees
plt.show()

As expected, the distribution smooths out into a bell curve as 𝑛 increases


We leave you to investigate the contents of the code above if you wish to know more
If you run the code from the ordinary IPython shell, the figure should pop up in a window that you can rotate with your mouse, giving different views on the density sequence

24.5.5 The Multivariate Case

The law of large numbers and central limit theorem work just as nicely in multidimensional
settings
To state the results, let’s recall some elementary facts about random vectors
A random vector X is just a sequence of 𝑘 random variables (𝑋1 , … , 𝑋𝑘 )

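Each realization of $\mathbf X$ is an element of $\mathbb R^k$
A collection of random vectors $\mathbf X_1, \ldots, \mathbf X_n$ is called independent if, given any $n$ vectors $\mathbf x_1, \ldots, \mathbf x_n$ in $\mathbb R^k$, we have

$$
\mathbb P\{\mathbf X_1 \leq \mathbf x_1, \ldots, \mathbf X_n \leq \mathbf x_n\}
= \mathbb P\{\mathbf X_1 \leq \mathbf x_1\} \times \cdots \times \mathbb P\{\mathbf X_n \leq \mathbf x_n\}
$$

(The vector inequality $\mathbf X \leq \mathbf x$ means that $X_j \leq x_j$ for $j = 1, \ldots, k$)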


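Let $\mu_j := \mathbb E[X_j]$ for all $j = 1, \ldots, k$
The expectation $\mathbb E[\mathbf X]$ of $\mathbf X$ is defined to be the vector of expectations:

$$
\mathbb E[\mathbf X] :=
\begin{pmatrix}
    \mathbb E[X_1] \\
    \mathbb E[X_2] \\
    \vdots \\
    \mathbb E[X_k]
\end{pmatrix}
=
\begin{pmatrix}
    \mu_1 \\
    \mu_2 \\
    \vdots \\
    \mu_k
\end{pmatrix}
=: \mu
$$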

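The variance-covariance matrix of random vector $\mathbf X$ is defined as

$$
\mathrm{Var}[\mathbf X] := \mathbb E[(\mathbf X - \mu)(\mathbf X - \mu)']
$$

Expanding this out, we get

$$
\mathrm{Var}[\mathbf X] =
\begin{pmatrix}
    \mathbb E[(X_1 - \mu_1)(X_1 - \mu_1)] & \cdots & \mathbb E[(X_1 - \mu_1)(X_k - \mu_k)] \\
    \mathbb E[(X_2 - \mu_2)(X_1 - \mu_1)] & \cdots & \mathbb E[(X_2 - \mu_2)(X_k - \mu_k)] \\
    \vdots & \vdots & \vdots \\
    \mathbb E[(X_k - \mu_k)(X_1 - \mu_1)] & \cdots & \mathbb E[(X_k - \mu_k)(X_k - \mu_k)]
\end{pmatrix}
$$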

The 𝑗, 𝑘-th term is the scalar covariance between 𝑋𝑗 and 𝑋𝑘


With this notation, we can proceed to the multivariate LLN and CLT
Let X1 , … , X𝑛 be a sequence of independent and identically distributed random vectors, each
one taking values in R𝑘
Let 𝜇 be the vector E[X𝑖 ], and let Σ be the variance-covariance matrix of X𝑖
Interpreting vector addition and scalar multiplication in the usual way (i.e., pointwise), let

$$
\bar{\mathbf X}_n := \frac{1}{n} \sum_{i=1}^n \mathbf X_i
$$

In this setting, the LLN tells us that

P {X̄ 𝑛 → 𝜇 as 𝑛 → ∞} = 1 (7)

Here X̄ 𝑛 → 𝜇 means that ‖X̄ 𝑛 − 𝜇‖ → 0, where ‖ ⋅ ‖ is the standard Euclidean norm


The CLT tells us that, provided $\Sigma$ is finite,

$$
\sqrt{n} (\bar{\mathbf X}_n - \mu) \stackrel{d}{\to} N(\mathbf 0, \Sigma)
\quad \text{as} \quad n \to \infty
\tag{8}
$$
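
A quick numerical check of the multivariate statement is sketched below. The example distribution is an assumption chosen only because its moments are easy to compute by hand: $\mathbf X_i = (V_i, V_i^2)$ with $V_i$ uniform on $[0, 1]$, so that $\mu = (1/2, 1/3)$, $\mathrm{Var}[V_i] = 1/12$, $\mathrm{Cov}[V_i, V_i^2] = 1/4 - 1/6 = 1/12$ and $\mathrm{Var}[V_i^2] = 1/5 - 1/9 = 4/45$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 10_000

# Hypothetical example: X_i = (V_i, V_i^2) with V_i uniform on [0, 1]
μ = np.array([1/2, 1/3])
Σ = np.array([[1/12, 1/12],
              [1/12, 4/45]])

V = rng.uniform(size=(reps, n))
# Each row of X_bar is one draw of the vector sample mean
X_bar = np.stack([V.mean(axis=1), (V**2).mean(axis=1)], axis=1)
Y = np.sqrt(n) * (X_bar - μ)       # reps draws of sqrt(n) (X̄_n - μ)

print(np.cov(Y, rowvar=False))     # sample variance-covariance matrix of the draws
print(Σ)                           # matrix implied by the CLT
```

The two printed matrices should agree up to simulation error; checking normality of the draws themselves would require a further step, such as a histogram of each coordinate.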

24.6 Exercises

24.6.1 Exercise 1

One very useful consequence of the central limit theorem is as follows


Assume the conditions of the CLT as stated above
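If $g \colon \mathbb R \to \mathbb R$ is differentiable at $\mu$ and $g'(\mu) \neq 0$, then

$$
\sqrt{n} \{ g(\bar X_n) - g(\mu) \} \stackrel{d}{\to} N(0, g'(\mu)^2 \sigma^2)
\quad \text{as} \quad n \to \infty
\tag{9}
$$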

This theorem is used frequently in statistics to obtain the asymptotic distribution of estimators, many of which can be expressed as functions of sample means
(These kinds of results are often said to use the “delta method”)
The proof is based on a Taylor expansion of 𝑔 around the point 𝜇
Taking the result as given, let the distribution $F$ of each $X_i$ be uniform on $[0, \pi/2]$ and let $g(x) = \sin(x)$
Derive the asymptotic distribution of $\sqrt{n} \{ g(\bar X_n) - g(\mu) \}$ and illustrate convergence in the same spirit as the program illustrate_clt.py discussed above
What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
What is the source of the problem?

24.6.2 Exercise 2

Here’s a result that’s often used in developing statistical tests, and is connected to the multivariate central limit theorem
If you study econometric theory, you will see this result used again and again
Assume the setting of the multivariate CLT discussed above, so that

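1. $\mathbf X_1, \ldots, \mathbf X_n$ is a sequence of IID random vectors, each taking values in $\mathbb R^k$
2. $\mu := \mathbb E[\mathbf X_i]$, and $\Sigma$ is the variance-covariance matrix of $\mathbf X_i$
3. The convergence

$$
\sqrt{n} (\bar{\mathbf X}_n - \mu) \stackrel{d}{\to} N(\mathbf 0, \Sigma)
\tag{10}
$$

is valid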
In a statistical setting, one often wants the right-hand side to be standard normal so that
confidence intervals are easily computed
This normalization can be achieved on the basis of three observations
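First, if $\mathbf X$ is a random vector in $\mathbb R^k$ and $\mathbf A$ is constant and $k \times k$, then

$$
\mathrm{Var}[\mathbf A \mathbf X] = \mathbf A \mathrm{Var}[\mathbf X] \mathbf A'
$$

Second, by the continuous mapping theorem, if $\mathbf Z_n \stackrel{d}{\to} \mathbf Z$ in $\mathbb R^k$ and $\mathbf A$ is constant and $k \times k$, then

$$
\mathbf A \mathbf Z_n \stackrel{d}{\to} \mathbf A \mathbf Z
$$

Third, if $\mathbf S$ is a $k \times k$ symmetric positive definite matrix, then there exists a symmetric positive definite matrix $\mathbf Q$, called the inverse square root of $\mathbf S$, such that

$$
\mathbf Q \mathbf S \mathbf Q' = \mathbf I
$$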

Here I is the 𝑘 × 𝑘 identity matrix


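Putting these things together, your first exercise is to show that if $\mathbf Q$ is the inverse square root of $\Sigma$, then

$$
\mathbf Z_n := \sqrt{n} \mathbf Q (\bar{\mathbf X}_n - \mu) \stackrel{d}{\to} \mathbf Z \sim N(\mathbf 0, \mathbf I)
$$

Applying the continuous mapping theorem one more time tells us that

$$
\| \mathbf Z_n \|^2 \stackrel{d}{\to} \| \mathbf Z \|^2
$$

Given the distribution of $\mathbf Z$, we conclude that

$$
n \| \mathbf Q (\bar{\mathbf X}_n - \mu) \|^2 \stackrel{d}{\to} \chi^2(k)
\tag{11}
$$

where $\chi^2(k)$ is the chi-squared distribution with $k$ degrees of freedom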


(Recall that 𝑘 is the dimension of X𝑖 , the underlying random vectors)
Your second exercise is to illustrate the convergence in Eq. (11) with a simulation
In doing so, let

$$
\mathbf X_i :=
\begin{pmatrix}
    W_i \\
    U_i + W_i
\end{pmatrix}
$$

where

• each 𝑊𝑖 is an IID draw from the uniform distribution on [−1, 1]


• each 𝑈𝑖 is an IID draw from the uniform distribution on [−2, 2]
• 𝑈𝑖 and 𝑊𝑖 are independent of each other

Hints:

1. scipy.linalg.sqrtm(A) computes the square root of A. You still need to invert it


2. You should be able to work out Σ from the preceding information
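
As a small illustration of the first hint, the sketch below computes an inverse square root and checks the defining property $\mathbf Q \mathbf S \mathbf Q' = \mathbf I$. The matrix `S` is a made-up symmetric positive definite example, not the $\Sigma$ from the exercise.

```python
import numpy as np
from scipy.linalg import sqrtm, inv

# A hypothetical symmetric positive definite matrix, chosen only for illustration
S = np.array([[2.0, 0.5],
              [0.5, 1.0]])

Q = inv(sqrtm(S))     # invert the matrix square root to get the inverse square root

# Check the defining property Q S Q' = I
print(np.allclose(Q @ S @ Q.T, np.identity(2)))
```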

24.7 Solutions

24.7.1 Exercise 1

Here is one solution

In [7]: """
Illustrates the delta method, a consequence of the central limit theorem.
"""

from scipy.stats import uniform

# == Set parameters == #
n = 250
replications = 100000
distribution = uniform(loc=0, scale=(np.pi / 2))
μ, s = distribution.mean(), distribution.std()

g = np.sin
g_prime = np.cos

# == Generate obs of sqrt{n} (g(X_n) - g(μ)) == #


data = distribution.rvs((replications, n))
sample_means = data.mean(axis=1) # Compute mean of each row
error_obs = np.sqrt(n) * (g(sample_means) - g(μ))

# == Plot == #
asymptotic_sd = g_prime(μ) * s
fig, ax = plt.subplots(figsize=(10, 6))
xmin = -3 * g_prime(μ) * s
xmax = -xmin
ax.set_xlim(xmin, xmax)
ax.hist(error_obs, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
lb = r"$N(0, g'(\mu)^2 \sigma^2)$"
ax.plot(xgrid, norm.pdf(xgrid, scale=asymptotic_sd), 'k-', lw=2, label=lb)
ax.legend()
plt.show()

What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
In this case, the mean 𝜇 of this distribution is 𝜋/2, and since 𝑔′ = cos, we have 𝑔′ (𝜇) = 0
Hence the conditions of the delta theorem are not satisfied

24.7.2 Exercise 2

First we want to verify the claim that

$$
\sqrt{n} \mathbf Q (\bar{\mathbf X}_n - \mu) \stackrel{d}{\to} N(\mathbf 0, \mathbf I)
$$

This is straightforward given the facts presented in the exercise


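Let

$$
\mathbf Y_n := \sqrt{n} (\bar{\mathbf X}_n - \mu)
\quad \text{and} \quad
\mathbf Y \sim N(\mathbf 0, \Sigma)
$$

By the multivariate CLT and the continuous mapping theorem, we have

$$
\mathbf Q \mathbf Y_n \stackrel{d}{\to} \mathbf Q \mathbf Y
$$

Since linear combinations of normal random variables are normal, the vector $\mathbf Q \mathbf Y$ is also normal
Its mean is clearly $\mathbf 0$, and its variance-covariance matrix is

$$
\mathrm{Var}[\mathbf Q \mathbf Y] = \mathbf Q \mathrm{Var}[\mathbf Y] \mathbf Q' = \mathbf Q \Sigma \mathbf Q' = \mathbf I
$$

In conclusion, $\mathbf Q \mathbf Y_n \stackrel{d}{\to} \mathbf Q \mathbf Y \sim N(\mathbf 0, \mathbf I)$, which is what we aimed to show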
Now we turn to the simulation exercise
Our solution is as follows

In [8]: from scipy.stats import chi2
from scipy.linalg import inv, sqrtm

# == Set parameters == #
n = 250
replications = 50000
dw = uniform(loc=-1, scale=2)  # Uniform(-1, 1)
du = uniform(loc=-2, scale=4)  # Uniform(-2, 2)
sw, su = dw.std(), du.std()
vw, vu = sw**2, su**2
Σ = ((vw, vw), (vw, vw + vu))
Σ = np.array(Σ)

# == Compute Σ^{-1/2} == #
Q = inv(sqrtm(Σ))

# == Generate observations of the normalized sample mean == #
error_obs = np.empty((2, replications))
for i in range(replications):
    # == Generate one sequence of bivariate shocks == #
    X = np.empty((2, n))
    W = dw.rvs(n)
    U = du.rvs(n)
    # == Construct the n observations of the random vector == #
    X[0, :] = W
    X[1, :] = W + U
    # == Construct the i-th observation of Y_n == #
    error_obs[:, i] = np.sqrt(n) * X.mean(axis=1)

# == Premultiply by Q and then take the squared norm == #
temp = Q @ error_obs
chisq_obs = np.sum(temp**2, axis=0)

# == Plot == #
fig, ax = plt.subplots(figsize=(10, 6))
xmax = 8
ax.set_xlim(0, xmax)
xgrid = np.linspace(0, xmax, 200)
lb = "Chi-squared with 2 degrees of freedom"
ax.plot(xgrid, chi2.pdf(xgrid, 2), 'k-', lw=2, label=lb)
ax.legend()
ax.hist(chisq_obs, bins=50, density=True)
plt.show()
25

Linear State Space Models

25.1 Contents

• Overview 25.2

• The Linear State Space Model 25.3

• Distributions and Moments 25.4

• Stationarity and Ergodicity 25.5

• Noisy Observations 25.6

• Prediction 25.7

• Code 25.8

• Exercises 25.9

• Solutions 25.10

“We may regard the present state of the universe as the effect of its past and the
cause of its future” – Marquis de Laplace

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

25.2 Overview

This lecture introduces the linear state space dynamic system


This model is a workhorse that carries a powerful theory of prediction
Its many applications include:

• representing dynamics of higher-order linear systems

• predicting the position of a system 𝑗 steps into the future


• predicting a geometric sum of future values of a variable like

– non-financial income
– dividends on a stock
– the money supply
– a government deficit or surplus, etc.

• key ingredient of useful models

– Friedman’s permanent income model of consumption smoothing


– Barro’s model of smoothing total tax collections
– Rational expectations version of Cagan’s model of hyperinflation
– Sargent and Wallace’s “unpleasant monetarist arithmetic,” etc.

25.3 The Linear State Space Model

The objects in play are:

• An 𝑛 × 1 vector 𝑥𝑡 denoting the state at time 𝑡 = 0, 1, 2, …


• An IID sequence of 𝑚 × 1 random vectors 𝑤𝑡 ∼ 𝑁 (0, 𝐼)
• A 𝑘 × 1 vector 𝑦𝑡 of observations at time 𝑡 = 0, 1, 2, …
• An 𝑛 × 𝑛 matrix 𝐴 called the transition matrix
• An 𝑛 × 𝑚 matrix 𝐶 called the volatility matrix
• A 𝑘 × 𝑛 matrix 𝐺 sometimes called the output matrix

Here is the linear state-space system

$$
\begin{aligned}
    x_{t+1} & = A x_t + C w_{t+1} \\
    y_t & = G x_t \\
    x_0 & \sim N(\mu_0, \Sigma_0)
\end{aligned}
\tag{1}
$$

25.3.1 Primitives

The primitives of the model are

1. the matrices 𝐴, 𝐶, 𝐺
2. shock distribution, which we have specialized to 𝑁 (0, 𝐼)
3. the distribution of the initial condition 𝑥0 , which we have set to 𝑁 (𝜇0 , Σ0 )

Given 𝐴, 𝐶, 𝐺 and draws of 𝑥0 and 𝑤1 , 𝑤2 , …, the model Eq. (1) pins down the values of the
sequences {𝑥𝑡 } and {𝑦𝑡 }
Even without these draws, the primitives 1–3 pin down the probability distributions of {𝑥𝑡 }
and {𝑦𝑡 }
Later we’ll see how to compute these distributions and their moments
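
To make the statement that the primitives and the draws pin down $\{x_t\}$ and $\{y_t\}$ concrete, here is a minimal NumPy sketch of a simulator for Eq. (1). The function name `simulate_lss` and the example parameter values are assumptions made for this illustration; this is not the implementation in the quantecon library.

```python
import numpy as np

def simulate_lss(A, C, G, μ_0, Σ_0, ts_length, seed=None):
    """
    Simulate x_{t+1} = A x_t + C w_{t+1} and y_t = G x_t, where w_t ~ N(0, I)
    and x_0 ~ N(μ_0, Σ_0).  A, C and G should be 2-D arrays.

    Returns x with shape (n, ts_length) and y with shape (k, ts_length).
    """
    rng = np.random.default_rng(seed)
    A, C, G = map(np.atleast_2d, (A, C, G))
    n, m = C.shape
    x = np.empty((n, ts_length))
    x[:, 0] = rng.multivariate_normal(μ_0, Σ_0)    # draw the initial condition
    for t in range(ts_length - 1):
        w = rng.standard_normal(m)                 # shock w_{t+1}
        x[:, t+1] = A @ x[:, t] + C @ w
    y = G @ x
    return x, y

# Hypothetical primitives with n = 2, m = 1, k = 1
A = np.array([[0.9, 0.0],
              [0.1, 0.5]])
C = np.array([[0.5],
              [0.0]])
G = np.array([[1.0, 0.0]])
x, y = simulate_lss(A, C, G, μ_0=[1.0, 1.0], Σ_0=0.01 * np.identity(2),
                    ts_length=50, seed=0)
print(x.shape, y.shape)
```

Fixing the seed fixes the draws of $x_0$ and $w_1, w_2, \ldots$, so repeated calls with the same seed return the same paths.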
Martingale Difference Shocks
We’ve made the common assumption that the shocks are independent standardized normal
vectors

But some of what we say will be valid under the assumption that {𝑤𝑡+1 } is a martingale
difference sequence
A martingale difference sequence is a sequence that is zero mean when conditioned on past
information
In the present case, since {𝑥𝑡 } is our state sequence, this means that it satisfies

E[𝑤𝑡+1 |𝑥𝑡 , 𝑥𝑡−1 , …] = 0

This is a weaker condition than that {𝑤𝑡 } is IID with 𝑤𝑡+1 ∼ 𝑁 (0, 𝐼)

25.3.2 Examples

By appropriate choice of the primitives, a variety of dynamics can be represented in terms of


the linear state space model
The following examples help to highlight this point
They also illustrate the wise dictum “finding the state is an art”
Second-order Difference Equation
Let {𝑦𝑡 } be a deterministic sequence that satisfies

𝑦𝑡+1 = 𝜙0 + 𝜙1 𝑦𝑡 + 𝜙2 𝑦𝑡−1 s.t. 𝑦0 , 𝑦−1 given (2)

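To map Eq. (2) into our state space system Eq. (1), we set

$$
x_t =
\begin{bmatrix}
    1 \\
    y_t \\
    y_{t-1}
\end{bmatrix}
\qquad
A =
\begin{bmatrix}
    1 & 0 & 0 \\
    \phi_0 & \phi_1 & \phi_2 \\
    0 & 1 & 0
\end{bmatrix}
\qquad
C =
\begin{bmatrix}
    0 \\
    0 \\
    0
\end{bmatrix}
\qquad
G = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}
$$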

You can confirm that under these definitions, Eq. (1) and Eq. (2) agree
The next figure shows the dynamics of this process when 𝜙0 = 1.1, 𝜙1 = 0.8, 𝜙2 = −0.8, 𝑦0 =
𝑦−1 = 1

Later you’ll be asked to recreate this figure
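
The agreement between Eq. (1) and Eq. (2) under this mapping can also be confirmed numerically. The sketch below uses the parameter values quoted above, iterates the state space form and the difference equation separately, and compares the two paths (the variable names and the horizon `T` are choices made for this example).

```python
import numpy as np

φ_0, φ_1, φ_2 = 1.1, 0.8, -0.8
A = np.array([[1,   0,   0],
              [φ_0, φ_1, φ_2],
              [0,   1,   0]])
G = np.array([0, 1, 0])
T = 50

# State space iteration with x_t = (1, y_t, y_{t-1})' and y_0 = y_{-1} = 1
x = np.array([1.0, 1.0, 1.0])
y_ss = np.empty(T)
for t in range(T):
    y_ss[t] = G @ x
    x = A @ x                 # C = 0, so there is no shock term

# Direct iteration on the second-order difference equation
y_direct = np.empty(T)
y_prev, y_curr = 1.0, 1.0
for t in range(T):
    y_direct[t] = y_curr
    y_prev, y_curr = y_curr, φ_0 + φ_1 * y_curr + φ_2 * y_prev

print(np.allclose(y_ss, y_direct))
```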


Univariate Autoregressive Processes
We can use Eq. (1) to represent the model

𝑦𝑡+1 = 𝜙1 𝑦𝑡 + 𝜙2 𝑦𝑡−1 + 𝜙3 𝑦𝑡−2 + 𝜙4 𝑦𝑡−3 + 𝜎𝑤𝑡+1 (3)

where {𝑤𝑡 } is IID and standard normal



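To put this in the linear state space format we take $x_t = \begin{bmatrix} y_t & y_{t-1} & y_{t-2} & y_{t-3} \end{bmatrix}'$ and

$$
A =
\begin{bmatrix}
    \phi_1 & \phi_2 & \phi_3 & \phi_4 \\
    1 & 0 & 0 & 0 \\
    0 & 1 & 0 & 0 \\
    0 & 0 & 1 & 0
\end{bmatrix}
\qquad
C =
\begin{bmatrix}
    \sigma \\
    0 \\
    0 \\
    0
\end{bmatrix}
\qquad
G = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}
$$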

The matrix 𝐴 has the form of the companion matrix to the vector [𝜙1 𝜙2 𝜙3 𝜙4 ]
The next figure shows the dynamics of this process when

𝜙1 = 0.5, 𝜙2 = −0.2, 𝜙3 = 0, 𝜙4 = 0.5, 𝜎 = 0.2, 𝑦0 = 𝑦−1 = 𝑦−2 = 𝑦−3 = 1
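
The companion form can be built programmatically. The following sketch uses the parameter values just listed, constructs $A$, $C$ and $G$, and checks that one step of the state space system reproduces Eq. (3) in its first entry while shifting the lags down in the remaining entries (the seed and variable names are choices made for this example).

```python
import numpy as np

φ = np.array([0.5, -0.2, 0.0, 0.5])    # φ_1, ..., φ_4
σ = 0.2

A = np.zeros((4, 4))
A[0, :] = φ                    # first row holds the AR coefficients
A[1:, :-1] = np.identity(3)    # identity block below shifts the lags down
C = np.array([σ, 0, 0, 0])
G = np.array([1, 0, 0, 0])

rng = np.random.default_rng(0)
x = np.ones(4)                 # x_0 = (y_0, y_{-1}, y_{-2}, y_{-3})' = (1, 1, 1, 1)'
w = rng.standard_normal()      # scalar shock w_1
x_next = A @ x + C * w

# First entry matches Eq. (3); remaining entries are the shifted lags
print(np.isclose(G @ x_next, φ @ x + σ * w))
print(np.allclose(x_next[1:], x[:-1]))
```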

Vector Autoregressions
Now suppose that

• 𝑦𝑡 is a 𝑘 × 1 vector
• 𝜙𝑗 is a 𝑘 × 𝑘 matrix and
• 𝑤𝑡 is 𝑘 × 1

Then Eq. (3) is termed a vector autoregression


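To map this into Eq. (1), we set

$$
x_t =
\begin{bmatrix}
    y_t \\
    y_{t-1} \\
    y_{t-2} \\
    y_{t-3}
\end{bmatrix}
\qquad
A =
\begin{bmatrix}
    \phi_1 & \phi_2 & \phi_3 & \phi_4 \\
    I & 0 & 0 & 0 \\
    0 & I & 0 & 0 \\
    0 & 0 & I & 0
\end{bmatrix}
\qquad
C =
\begin{bmatrix}
    \sigma \\
    0 \\
    0 \\
    0
\end{bmatrix}
\qquad
G = \begin{bmatrix} I & 0 & 0 & 0 \end{bmatrix}
$$

where $I$ is the $k \times k$ identity matrix and $\sigma$ is a $k \times k$ matrix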


Seasonals
We can use Eq. (1) to represent

1. the deterministic seasonal 𝑦𝑡 = 𝑦𝑡−4


2. the indeterministic seasonal 𝑦𝑡 = 𝜙4 𝑦𝑡−4 + 𝑤𝑡

In fact, both are special cases of Eq. (3)


With the deterministic seasonal, the transition matrix becomes

$$
A =
\begin{bmatrix}
    0 & 0 & 0 & 1 \\
    1 & 0 & 0 & 0 \\
    0 & 1 & 0 & 0 \\
    0 & 0 & 1 & 0
\end{bmatrix}
$$

It is easy to check that $A^4 = I$, which implies that $x_t$ is strictly periodic with period 4:[1]

$$
x_{t+4} = x_t
$$

Such an 𝑥𝑡 process can be used to model deterministic seasonals in quarterly time series
The indeterministic seasonal produces recurrent, but aperiodic, seasonal fluctuations
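
The periodicity of the deterministic seasonal is easy to confirm with a few lines of NumPy. In the sketch below the initial state is a made-up example; the check is that the fourth power of the transition matrix is the identity and that the state returns to its starting value every four periods.

```python
import numpy as np

A = np.array([[0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]])

# Fourth power of the transition matrix is the identity
print(np.array_equal(np.linalg.matrix_power(A, 4), np.identity(4)))

# Hypothetical initial state (y_0, y_{-1}, y_{-2}, y_{-3})'
x = np.array([4.0, 3.0, 2.0, 1.0])
path = [x]
for t in range(8):
    path.append(A @ path[-1])

print(np.array([p[0] for p in path]))   # first component of each state is y_t
```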
Time Trends
The model 𝑦𝑡 = 𝑎𝑡 + 𝑏 is known as a linear time trend
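We can represent this model in the linear state space form by taking

$$
A =
\begin{bmatrix}
    1 & 1 \\
    0 & 1
\end{bmatrix}
\qquad
C =
\begin{bmatrix}
    0 \\
    0
\end{bmatrix}
\qquad
G = \begin{bmatrix} a & b \end{bmatrix}
\tag{4}
$$

and starting at initial condition $x_0 = \begin{bmatrix} 0 & 1 \end{bmatrix}'$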
In fact, it’s possible to use the state-space system to represent polynomial trends of any order
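For instance, let

$$
x_0 =
\begin{bmatrix}
    0 \\
    0 \\
    1
\end{bmatrix}
\qquad
A =
\begin{bmatrix}
    1 & 1 & 0 \\
    0 & 1 & 1 \\
    0 & 0 & 1
\end{bmatrix}
\qquad
C =
\begin{bmatrix}
    0 \\
    0 \\
    0
\end{bmatrix}
$$

It follows that

$$
A^t =
\begin{bmatrix}
    1 & t & t(t-1)/2 \\
    0 & 1 & t \\
    0 & 0 & 1
\end{bmatrix}
$$

Then $x_t' = \begin{bmatrix} t(t-1)/2 & t & 1 \end{bmatrix}$, so that $x_t$ contains linear and quadratic time trends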
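
The closed form for the powers of this transition matrix can be verified directly for small $t$. The sketch below is a check under the definitions just given, with an arbitrary range of $t$ values.

```python
import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
x_0 = np.array([0, 0, 1])

for t in range(6):
    A_t = np.linalg.matrix_power(A, t)
    closed_form = np.array([[1, t, t * (t - 1) // 2],
                            [0, 1, t],
                            [0, 0, 1]])
    assert np.array_equal(A_t, closed_form)
    print(t, A_t @ x_0)       # the state x_t = (t(t-1)/2, t, 1)'
```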

25.3.3 Moving Average Representations

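A nonrecursive expression for $x_t$ as a function of $x_0, w_1, w_2, \ldots, w_t$ can be found by using Eq. (1) repeatedly to obtain

$$
\begin{aligned}
    x_t & = A x_{t-1} + C w_t \\
        & = A^2 x_{t-2} + A C w_{t-1} + C w_t \\
        & \qquad \vdots \\
        & = \sum_{j=0}^{t-1} A^j C w_{t-j} + A^t x_0
\end{aligned}
\tag{5}
$$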

Representation Eq. (5) is a moving average representation


It expresses {𝑥𝑡 } as a linear function of

1. current and past values of the process {𝑤𝑡 } and


2. the initial condition 𝑥0

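As an example of a moving average representation, let the model be

$$
A =
\begin{bmatrix}
    1 & 1 \\
    0 & 1
\end{bmatrix}
\qquad
C =
\begin{bmatrix}
    1 \\
    0
\end{bmatrix}
$$

You will be able to show that $A^t = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$ and $A^j C = \begin{bmatrix} 1 & 0 \end{bmatrix}'$
Substituting into the moving average representation Eq. (5), we obtain

$$
x_{1t} = \sum_{j=0}^{t-1} w_{t-j} + \begin{bmatrix} 1 & t \end{bmatrix} x_0
$$

where $x_{1t}$ is the first entry of $x_t$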


The first term on the right is a cumulated sum of martingale differences and is therefore a
martingale
The second term is a translated linear function of time
For this reason, 𝑥1𝑡 is called a martingale with drift
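
The martingale-with-drift formula can be checked against a direct simulation of the recursion. In the sketch below the initial condition, horizon and seed are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1, 1],
              [0, 1]])
C = np.array([1, 0])
x_0 = np.array([0.5, 0.1])         # hypothetical initial condition

T = 25
w = rng.standard_normal(T + 1)     # w[t] plays the role of w_t; w[0] is unused
x = x_0.copy()
for t in range(1, T + 1):
    x = A @ x + C * w[t]           # x_t = A x_{t-1} + C w_t

# Moving average representation for the first entry of x_T
x1_ma = w[1:].sum() + np.array([1, T]) @ x_0
print(np.isclose(x[0], x1_ma))
```

The second entry of the state never changes here, which is what makes the drift term linear in time.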

25.4 Distributions and Moments

25.4.1 Unconditional Moments

Using Eq. (1), it’s easy to obtain expressions for the (unconditional) means of 𝑥𝑡 and 𝑦𝑡
We’ll explain what unconditional and conditional mean soon

Letting 𝜇𝑡 ∶= E[𝑥𝑡 ] and using linearity of expectations, we find that

𝜇𝑡+1 = 𝐴𝜇𝑡 with 𝜇0 given (6)

Here 𝜇0 is a primitive given in Eq. (1)


The variance-covariance matrix of 𝑥𝑡 is Σ𝑡 ∶= E[(𝑥𝑡 − 𝜇𝑡 )(𝑥𝑡 − 𝜇𝑡 )′ ]
Using 𝑥𝑡+1 − 𝜇𝑡+1 = 𝐴(𝑥𝑡 − 𝜇𝑡 ) + 𝐶𝑤𝑡+1 , we can determine this matrix recursively via

Σ𝑡+1 = 𝐴Σ𝑡 𝐴′ + 𝐶𝐶 ′ with Σ0 given (7)

As with 𝜇0 , the matrix Σ0 is a primitive given in Eq. (1)


As a matter of terminology, we will sometimes call

• 𝜇𝑡 the unconditional mean of 𝑥𝑡


• Σ𝑡 the unconditional variance-covariance matrix of 𝑥𝑡

This is to distinguish 𝜇𝑡 and Σ𝑡 from related objects that use conditioning information, to be
defined below
However, you should be aware that these “unconditional” moments do depend on the initial
distribution 𝑁 (𝜇0 , Σ0 )
Moments of the Observations
Using linearity of expectations again we have

E[𝑦𝑡 ] = E[𝐺𝑥𝑡 ] = 𝐺𝜇𝑡 (8)

The variance-covariance matrix of 𝑦𝑡 is easily shown to be

Var[𝑦𝑡 ] = Var[𝐺𝑥𝑡 ] = 𝐺Σ𝑡 𝐺′ (9)
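
The recursions Eq. (6) and Eq. (7), together with Eq. (8) and Eq. (9), translate directly into a short loop. The function name `moment_sequence` and the scalar example are assumptions made for this sketch; matrices are expected as 2-D arrays.

```python
import numpy as np

def moment_sequence(A, C, G, μ_0, Σ_0, T):
    """
    Iterate Eq. (6) and Eq. (7) and return two lists holding
    E[y_t] and Var[y_t] for t = 0, ..., T.
    """
    μ = np.asarray(μ_0, dtype=float)
    Σ = np.asarray(Σ_0, dtype=float)
    means, variances = [], []
    for t in range(T + 1):
        means.append(G @ μ)              # Eq. (8)
        variances.append(G @ Σ @ G.T)    # Eq. (9)
        μ = A @ μ                        # Eq. (6)
        Σ = A @ Σ @ A.T + C @ C.T        # Eq. (7)
    return means, variances

# Hypothetical scalar example: x_{t+1} = 0.9 x_t + 0.5 w_{t+1}, y_t = x_t
A = np.array([[0.9]])
C = np.array([[0.5]])
G = np.array([[1.0]])
means, variances = moment_sequence(A, C, G, μ_0=[1.0], Σ_0=[[0.0]], T=200)
print(means[-1], variances[-1])
```

In this scalar example the variance recursion is $\Sigma_{t+1} = 0.81 \Sigma_t + 0.25$, so the printed variance should be close to $0.25 / (1 - 0.81)$ and the mean close to zero.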

25.4.2 Distributions

In general, knowing the mean and variance-covariance matrix of a random vector is not quite
as good as knowing the full distribution
However, there are some situations where these moments alone tell us all we need to know
These are situations in which the mean vector and covariance matrix are sufficient statistics for the population distribution
(Sufficient statistics form a list of objects that characterize a population distribution)
One such situation is when the vector in question is Gaussian (i.e., normally distributed)
This is the case here, given

1. our Gaussian assumptions on the primitives


2. the fact that normality is preserved under linear operations

In fact, it’s well-known that

$$
u \sim N(\bar u, S)
\quad \text{and} \quad
v = a + B u
\implies
v \sim N(a + B \bar u, B S B')
\tag{10}
$$

In particular, given our Gaussian assumptions on the primitives and the linearity of Eq. (1)
we can see immediately that both 𝑥𝑡 and 𝑦𝑡 are Gaussian for all 𝑡 ≥ 0 [2]
Since $x_t$ is Gaussian, to find the distribution, all we need to do is find its mean and variance-covariance matrix
But in fact we’ve already done this, in Eq. (6) and Eq. (7)
Letting 𝜇𝑡 and Σ𝑡 be as defined by these equations, we have

𝑥𝑡 ∼ 𝑁 (𝜇𝑡 , Σ𝑡 ) (11)

By similar reasoning combined with Eq. (8) and Eq. (9),

𝑦𝑡 ∼ 𝑁 (𝐺𝜇𝑡 , 𝐺Σ𝑡 𝐺′ ) (12)

25.4.3 Ensemble Interpretations

How should we interpret the distributions defined by Eq. (11)–Eq. (12)?


Intuitively, the probabilities in a distribution correspond to relative frequencies in a large
population drawn from that distribution
Let’s apply this idea to our setting, focusing on the distribution of 𝑦𝑇 for fixed 𝑇
We can generate independent draws of $y_T$ by repeatedly simulating the evolution of the system up to time $T$, using an independent set of shocks each time
The next figure shows 20 simulations, producing 20 time series for {𝑦𝑡 }, and hence 20 draws
of 𝑦𝑇
The system in question is the univariate autoregressive model Eq. (3)
The values of 𝑦𝑇 are represented by black dots in the left-hand figure

In the right-hand figure, these values are converted into a rotated histogram that shows relative frequencies from our sample of 20 $y_T$’s
(The parameters and source code for the figures can be found in file linear_models/paths_and_hist.py)
Here is another figure, this time with 100 observations

Let’s now try with 500,000 observations, showing only the histogram (without rotation)

The black line is the population density of 𝑦𝑇 calculated from Eq. (12)
The histogram and population distribution are close, as expected
By looking at the figures and experimenting with parameters, you will gain a feel for how the
population distribution depends on the model primitives listed above, as intermediated by the
distribution’s sufficient statistics
Ensemble Means
In the preceding figure, we approximated the population distribution of 𝑦𝑇 by

1. generating $I$ sample paths (i.e., time series) where $I$ is a large number
2. recording each observation $y_T^i$
3. histogramming this sample

Just as the histogram approximates the population distribution, the ensemble or cross-sectional average

$$
\bar y_T := \frac{1}{I} \sum_{i=1}^I y_T^i
$$

approximates the expectation $\mathbb E[y_T] = G \mu_T$ (as implied by the law of large numbers)
Here’s a simulation comparing the ensemble averages and population means at time points
𝑡 = 0, … , 50

The parameters are the same as for the preceding figures, and the sample size is relatively
small (𝐼 = 20)

The ensemble mean for $x_t$ is

$$
\bar x_T := \frac{1}{I} \sum_{i=1}^I x_T^i \to \mu_T
\qquad (I \to \infty)
$$

The limit $\mu_T$ is a “long-run average”
(By long-run average we mean the average for an infinite ($I = \infty$) number of sample $x_T$’s)
Another application of the law of large numbers assures us that

$$
\frac{1}{I} \sum_{i=1}^I (x_T^i - \bar x_T)(x_T^i - \bar x_T)' \to \Sigma_T
\qquad (I \to \infty)
$$
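
Both limits can be illustrated with a vectorized simulation in which the $I$ sample paths are stored as the columns of one array. The primitives below are hypothetical values chosen for this sketch, and $x_0$ is taken to be deterministic ($\Sigma_0 = 0$) to keep the code short.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.8, 0.1],
              [0.0, 0.5]])
C = np.array([[0.3],
              [0.2]])
μ_0, Σ_0 = np.array([1.0, 2.0]), np.zeros((2, 2))   # deterministic x_0
T, I = 10, 100_000

# Population moments from Eq. (6) and Eq. (7)
μ, Σ = μ_0, Σ_0
for t in range(T):
    μ, Σ = A @ μ, A @ Σ @ A.T + C @ C.T

# I independent sample paths, one per column, advanced T steps
x = np.tile(μ_0[:, None], (1, I))
for t in range(T):
    x = A @ x + C @ rng.standard_normal((1, I))

x_bar = x.mean(axis=1)                       # ensemble mean
dev = x - x_bar[:, None]
Σ_hat = dev @ dev.T / I                      # ensemble variance-covariance matrix

print(x_bar, μ)
print(Σ_hat, Σ)
```

Raising `I` should tighten the agreement, while lowering it to a value like 20 shows how noisy small ensembles are.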

25.4.4 Joint Distributions

In the preceding discussion, we looked at the distributions of 𝑥𝑡 and 𝑦𝑡 in isolation


This gives us useful information but doesn’t allow us to answer questions like

• what’s the probability that 𝑥𝑡 ≥ 0 for all 𝑡?


• what’s the probability that the process {𝑦𝑡 } exceeds some value 𝑎 before falling below
𝑏?
• etc., etc.

Such questions concern the joint distributions of these sequences


To compute the joint distribution of 𝑥0 , 𝑥1 , … , 𝑥𝑇 , recall that joint and conditional densities
are linked by the rule

𝑝(𝑥, 𝑦) = 𝑝(𝑦 | 𝑥)𝑝(𝑥) (joint = conditional × marginal)



From this rule we get 𝑝(𝑥0 , 𝑥1 ) = 𝑝(𝑥1 | 𝑥0 )𝑝(𝑥0 )


The Markov property $p(x_t \,|\, x_{t-1}, \ldots, x_0) = p(x_t \,|\, x_{t-1})$ and repeated applications of the preceding rule lead us to

$$
p(x_0, x_1, \ldots, x_T) = p(x_0) \prod_{t=0}^{T-1} p(x_{t+1} \,|\, x_t)
$$

The marginal 𝑝(𝑥0 ) is just the primitive 𝑁 (𝜇0 , Σ0 )


In view of Eq. (1), the conditional densities are

𝑝(𝑥𝑡+1 | 𝑥𝑡 ) = 𝑁 (𝐴𝑥𝑡 , 𝐶𝐶 ′ )

Autocovariance Functions
An important object related to the joint distribution is the autocovariance function

$$
\Sigma_{t+j, t} := \mathbb E[(x_{t+j} - \mu_{t+j})(x_t - \mu_t)']
\tag{13}
$$

Elementary calculations show that

$$
\Sigma_{t+j, t} = A^j \Sigma_t
\tag{14}
$$

Notice that Σ𝑡+𝑗,𝑡 in general depends on both 𝑗, the gap between the two dates, and 𝑡, the
earlier date

25.5 Stationarity and Ergodicity

Stationarity and ergodicity are two properties that, when they hold, greatly aid analysis of
linear state space models
Let’s start with the intuition

25.5.1 Visualizing Stability

Let’s look at some more time series from the same model that we analyzed above
This picture shows cross-sectional distributions for 𝑦 at times 𝑇 , 𝑇 ′ , 𝑇 ″

Note how the time series “settle down” in the sense that the distributions at 𝑇 ′ and 𝑇 ″ are
relatively similar to each other — but unlike the distribution at 𝑇
Apparently, the distributions of 𝑦𝑡 converge to a fixed long-run distribution as 𝑡 → ∞
When such a distribution exists it is called a stationary distribution

25.5.2 Stationary Distributions

In our setting, a distribution 𝜓∞ is said to be stationary for 𝑥𝑡 if

𝑥𝑡 ∼ 𝜓∞ and 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1 ⟹ 𝑥𝑡+1 ∼ 𝜓∞

Since

1. in the present case, all distributions are Gaussian


2. a Gaussian distribution is pinned down by its mean and variance-covariance matrix

we can restate the definition as follows: 𝜓∞ is stationary for 𝑥𝑡 if

𝜓∞ = 𝑁 (𝜇∞ , Σ∞ )

where 𝜇∞ and Σ∞ are fixed points of Eq. (6) and Eq. (7) respectively

25.5.3 Covariance Stationary Processes

Let’s see what happens to the preceding figure if we start 𝑥0 at the stationary distribution

Now the differences in the observed distributions at 𝑇 , 𝑇 ′ and 𝑇 ″ come entirely from random
fluctuations due to the finite sample size
By

• our choosing 𝑥0 ∼ 𝑁 (𝜇∞ , Σ∞ )


• the definitions of 𝜇∞ and Σ∞ as fixed points of Eq. (6) and Eq. (7) respectively

we’ve ensured that

𝜇𝑡 = 𝜇∞ and Σ𝑡 = Σ∞ for all 𝑡

Moreover, in view of Eq. (14), the autocovariance function takes the form Σ𝑡+𝑗,𝑡 = 𝐴𝑗 Σ∞ ,
which depends on 𝑗 but not on 𝑡
This motivates the following definition
A process {𝑥𝑡 } is said to be covariance stationary if

• both 𝜇𝑡 and Σ𝑡 are constant in 𝑡


• Σ𝑡+𝑗,𝑡 depends on the time gap 𝑗 but not on time 𝑡

In our setting, {𝑥𝑡 } will be covariance stationary if 𝜇0 , Σ0 , 𝐴, 𝐶 assume values that imply that
none of 𝜇𝑡 , Σ𝑡 , Σ𝑡+𝑗,𝑡 depends on 𝑡

25.5.4 Conditions for Stationarity

The Globally Stable Case


The difference equation 𝜇𝑡+1 = 𝐴𝜇𝑡 is known to have unique fixed point 𝜇∞ = 0 if all eigen-
values of 𝐴 have moduli strictly less than unity
That is, if (np.absolute(np.linalg.eigvals(A)) < 1).all() == True

The difference equation Eq. (7) also has a unique fixed point in this case, and, moreover

𝜇𝑡 → 𝜇∞ = 0 and Σ𝑡 → Σ∞ as 𝑡→∞

regardless of the initial conditions 𝜇0 and Σ0
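
A minimal numerical sketch of the globally stable case, assuming Eq. (7) is the recursion Σ𝑡+1 = 𝐴Σ𝑡 𝐴′ + 𝐶𝐶 ′. The matrices and the function `iterate_moments` are hypothetical. Starting from two different initial conditions, the iterates should end up at (approximately) the same limit.

```python
import numpy as np

# Hypothetical parameters for illustration
A = np.array([[0.9, 0.2],
              [0.0, 0.6]])
C = np.array([[0.5],
              [1.0]])

# Stability condition: all eigenvalues of A strictly inside the unit circle
is_stable = bool((np.absolute(np.linalg.eigvals(A)) < 1).all())

def iterate_moments(mu, Sigma, A, C, num_iter=500):
    """Iterate mu_{t+1} = A mu_t and Sigma_{t+1} = A Sigma_t A' + C C'."""
    for _ in range(num_iter):
        mu = A @ mu
        Sigma = A @ Sigma @ A.T + C @ C.T
    return mu, Sigma

# Two arbitrary initial conditions
mu_a, Sigma_a = iterate_moments(np.array([5.0, -3.0]), np.eye(2), A, C)
mu_b, Sigma_b = iterate_moments(np.array([-1.0, 10.0]), 4 * np.eye(2), A, C)

print(is_stable)
print(mu_a, mu_b)        # both close to zero
print(Sigma_a)           # approximately the same as Sigma_b
```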


This is the globally stable case; see these notes for a more theoretical treatment
However, global stability is more than we need for stationary solutions, and often more than
we want
To illustrate, consider our second order difference equation example

Here the state is 𝑥𝑡 = [1 𝑦𝑡 𝑦𝑡−1 ]′
Because of the constant first component in the state vector, we will never have 𝜇𝑡 → 0
How can we find stationary solutions that respect a constant state component?
Processes with a Constant State Component
To investigate such a process, suppose that 𝐴 and 𝐶 take the form

$$A = \begin{bmatrix} A_1 & a \\ 0 & 1 \end{bmatrix} \qquad C = \begin{bmatrix} C_1 \\ 0 \end{bmatrix}$$

where

• 𝐴1 is an (𝑛 − 1) × (𝑛 − 1) matrix
• 𝑎 is an (𝑛 − 1) × 1 column vector


Let 𝑥𝑡 = [𝑥′1𝑡 1]′ where 𝑥1𝑡 is (𝑛 − 1) × 1
It follows that

𝑥1,𝑡+1 = 𝐴1 𝑥1𝑡 + 𝑎 + 𝐶1 𝑤𝑡+1

Let 𝜇1𝑡 = E[𝑥1𝑡 ] and take expectations on both sides of this expression to get

𝜇1,𝑡+1 = 𝐴1 𝜇1,𝑡 + 𝑎 (15)

Assume now that the moduli of the eigenvalues of 𝐴1 are all strictly less than one
Then Eq. (15) has a unique stationary solution, namely,

𝜇1∞ = (𝐼 − 𝐴1 )−1 𝑎


The stationary value of 𝜇𝑡 itself is then 𝜇∞ ∶= [𝜇′1∞ 1]′
The stationary values of Σ𝑡 and Σ𝑡+𝑗,𝑡 satisfy

$$\begin{aligned} \Sigma_\infty &= A \Sigma_\infty A' + C C' \\ \Sigma_{t+j,t} &= A^j \Sigma_\infty \end{aligned} \tag{16}$$

Notice that here Σ𝑡+𝑗,𝑡 depends on the time gap 𝑗 but not on calendar time 𝑡
In conclusion, if

• 𝑥0 ∼ 𝑁 (𝜇∞ , Σ∞ ) and
• the moduli of the eigenvalues of 𝐴1 are all strictly less than unity

then the {𝑥𝑡 } process is covariance stationary, with constant state component

Note
If the eigenvalues of 𝐴1 are less than unity in modulus, then (a) starting from any
initial value, the mean and variance-covariance matrix both converge to their sta-
tionary values; and (b) iterations on Eq. (7) converge to the fixed point of the dis-
crete Lyapunov equation in the first line of Eq. (16)
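
To make the stationary mean concrete, here is a small sketch that applies 𝜇1∞ = (𝐼 − 𝐴1 )−1 𝑎 to a second order difference equation with coefficients 1.1, 0.8 and −0.8 (the values used in the solution to Exercise 1). The ordering of the state, with the constant placed last, and the variable names are assumptions made for this example.

```python
import numpy as np

# Coefficients of y_{t+1} = phi_0 + phi_1 y_t + phi_2 y_{t-1}
phi_0, phi_1, phi_2 = 1.1, 0.8, -0.8

# x_{1t} = [y_t, y_{t-1}]' and the full state is x_t = [x_{1t}', 1]'
A_1 = np.array([[phi_1, phi_2],
                [1.0,   0.0]])
a = np.array([phi_0, 0.0])

# The eigenvalues of A_1 must have moduli strictly less than one
spectral_radius = np.absolute(np.linalg.eigvals(A_1)).max()

# Stationary mean of x_1: solve (I - A_1) mu = a
mu_1_inf = np.linalg.solve(np.eye(2) - A_1, a)
mu_inf = np.append(mu_1_inf, 1.0)

# Full transition matrix in the block form [[A_1, a], [0, 1]]
A = np.block([[A_1, a.reshape(2, 1)],
              [np.zeros((1, 2)), np.ones((1, 1))]])

print(spectral_radius)
print(mu_inf)
```

A quick check is that `A @ mu_inf` returns `mu_inf`, so the mean is a fixed point even though 𝐴 itself has a unit eigenvalue.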

25.5.5 Ergodicity

Let’s suppose that we’re working with a covariance stationary process


In this case, we know that the ensemble mean will converge to 𝜇∞ as the sample size 𝐼 ap-
proaches infinity
Averages over Time
Ensemble averages across simulations are interesting theoretically, but in real life, we usually
observe only a single realization {𝑥𝑡 , 𝑦𝑡 }𝑇𝑡=0
So now let’s take a single realization and form the time-series averages

$$\bar x := \frac{1}{T} \sum_{t=1}^T x_t \quad \text{and} \quad \bar y := \frac{1}{T} \sum_{t=1}^T y_t$$

Do these time series averages converge to something interpretable in terms of our basic state-
space representation?
The answer depends on something called ergodicity
Ergodicity is the property that time series and ensemble averages coincide
More formally, ergodicity implies that time series sample averages converge to their expecta-
tion under the stationary distribution
In particular,

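• $\frac{1}{T} \sum_{t=1}^T x_t \to \mu_\infty$
• $\frac{1}{T} \sum_{t=1}^T (x_t - \bar x_T)(x_t - \bar x_T)' \to \Sigma_\infty$
• $\frac{1}{T} \sum_{t=1}^T (x_{t+j} - \bar x_T)(x_t - \bar x_T)' \to A^j \Sigma_\infty$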

In our linear Gaussian setting, any covariance stationary process is also ergodic
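
The following sketch illustrates ergodicity by simulation in the simplest possible case: a scalar state 𝑥𝑡+1 = 𝜌𝑥𝑡 + 𝑎 + 𝑐𝑤𝑡+1 with hypothetical parameters. For this scalar model the stationary mean is 𝑎/(1 − 𝜌) and the stationary variance is 𝑐2 /(1 − 𝜌2 ). The seed and sample length are arbitrary choices.

```python
import numpy as np

rho, a, c = 0.5, 1.0, 1.0            # hypothetical scalar parameters, |rho| < 1
mu_inf = a / (1 - rho)               # stationary mean
Sigma_inf = c**2 / (1 - rho**2)      # stationary variance

rng = np.random.default_rng(1234)
T = 200_000
w = rng.standard_normal(T)

# Draw x_0 from the stationary distribution, then simulate one long path
x = np.empty(T)
x_prev = mu_inf + np.sqrt(Sigma_inf) * rng.standard_normal()
for t in range(T):
    x[t] = rho * x_prev + a + c * w[t]
    x_prev = x[t]

# Time series averages from the single realization
time_mean = x.mean()
time_var = x.var()
time_acov_1 = np.mean((x[1:] - time_mean) * (x[:-1] - time_mean))

print(time_mean, mu_inf)
print(time_var, Sigma_inf)
print(time_acov_1, rho * Sigma_inf)   # A^j Sigma_inf with j = 1
```

Each time average should be close to its population counterpart, with the gap shrinking as `T` grows.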

25.6 Noisy Observations

In some settings, the observation equation 𝑦𝑡 = 𝐺𝑥𝑡 is modified to include an error term
Often this error term represents the idea that the true state can only be observed imperfectly
To include an error term in the observation we introduce

• An IID sequence of ℓ × 1 random vectors 𝑣𝑡 ∼ 𝑁 (0, 𝐼)


• A 𝑘 × ℓ matrix 𝐻

and extend the linear state-space system to

$$\begin{aligned} x_{t+1} &= A x_t + C w_{t+1} \\ y_t &= G x_t + H v_t \\ x_0 &\sim N(\mu_0, \Sigma_0) \end{aligned} \tag{17}$$

The sequence {𝑣𝑡 } is assumed to be independent of {𝑤𝑡 }


The process {𝑥𝑡 } is not modified by noise in the observation equation and its moments, distri-
butions and stability properties remain the same
The unconditional moments of 𝑦𝑡 from Eq. (8) and Eq. (9) now become

E[𝑦𝑡 ] = E[𝐺𝑥𝑡 + 𝐻𝑣𝑡 ] = 𝐺𝜇𝑡 (18)

The variance-covariance matrix of 𝑦𝑡 is easily shown to be

Var[𝑦𝑡 ] = Var[𝐺𝑥𝑡 + 𝐻𝑣𝑡 ] = 𝐺Σ𝑡 𝐺′ + 𝐻𝐻 ′ (19)

The distribution of 𝑦𝑡 is therefore

𝑦𝑡 ∼ 𝑁 (𝐺𝜇𝑡 , 𝐺Σ𝑡 𝐺′ + 𝐻𝐻 ′ )
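
As a small worked example of Eq. (18) and Eq. (19), the sketch below computes the mean and variance of 𝑦𝑡 with and without observation noise. All numerical values are hypothetical.

```python
import numpy as np

# Hypothetical moments of x_t and observation matrices
mu_t = np.array([1.0, 0.5])
Sigma_t = np.array([[2.0, 0.3],
                    [0.3, 1.0]])
G = np.array([[1.0, 0.0]])    # observe the first state component
H = np.array([[0.5]])         # loading on the observation noise v_t

mean_y = G @ mu_t                              # Eq. (18): noise has zero mean
var_y_clean = G @ Sigma_t @ G.T                # variance when H = 0
var_y_noisy = G @ Sigma_t @ G.T + H @ H.T      # Eq. (19)

print(mean_y)
print(var_y_clean, var_y_noisy)
```

The noise leaves the mean of 𝑦𝑡 unchanged and raises its variance by 𝐻𝐻 ′.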

25.7 Prediction

The theory of prediction for linear state space systems is elegant and simple

25.7.1 Forecasting Formulas – Conditional Means

The natural way to predict variables is to use conditional distributions


For example, the optimal forecast of 𝑥𝑡+1 given information known at time 𝑡 is

E𝑡 [𝑥𝑡+1 ] ∶= E[𝑥𝑡+1 ∣ 𝑥𝑡 , 𝑥𝑡−1 , … , 𝑥0 ] = 𝐴𝑥𝑡

The right-hand side follows from 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1 and the fact that 𝑤𝑡+1 is zero mean and
independent of 𝑥𝑡 , 𝑥𝑡−1 , … , 𝑥0
That E𝑡 [𝑥𝑡+1 ] = E[𝑥𝑡+1 ∣ 𝑥𝑡 ] is an implication of {𝑥𝑡 } having the Markov property

The one-step-ahead forecast error is

𝑥𝑡+1 − E𝑡 [𝑥𝑡+1 ] = 𝐶𝑤𝑡+1

The covariance matrix of the forecast error is

E[(𝑥𝑡+1 − E𝑡 [𝑥𝑡+1 ])(𝑥𝑡+1 − E𝑡 [𝑥𝑡+1 ])′ ] = 𝐶𝐶 ′

More generally, we’d like to compute the 𝑗-step ahead forecasts E𝑡 [𝑥𝑡+𝑗 ] and E𝑡 [𝑦𝑡+𝑗 ]
With a bit of algebra, we obtain

𝑥𝑡+𝑗 = 𝐴𝑗 𝑥𝑡 + 𝐴𝑗−1 𝐶𝑤𝑡+1 + 𝐴𝑗−2 𝐶𝑤𝑡+2 + ⋯ + 𝐴0 𝐶𝑤𝑡+𝑗

In view of the IID property, current and past state values provide no information about fu-
ture values of the shock
Hence E𝑡 [𝑤𝑡+𝑘 ] = E[𝑤𝑡+𝑘 ] = 0
It now follows from linearity of expectations that the 𝑗-step ahead forecast of 𝑥 is

E𝑡 [𝑥𝑡+𝑗 ] = 𝐴𝑗 𝑥𝑡

The 𝑗-step ahead forecast of 𝑦 is therefore

E𝑡 [𝑦𝑡+𝑗 ] = E𝑡 [𝐺𝑥𝑡+𝑗 + 𝐻𝑣𝑡+𝑗 ] = 𝐺𝐴𝑗 𝑥𝑡
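
The forecasting formulas translate directly into code. The sketch below uses hypothetical matrices and helper names (`forecast_x`, `forecast_y`) and relies on `np.linalg.matrix_power` for 𝐴𝑗 .

```python
import numpy as np

# Hypothetical model and current state
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])
G = np.array([[1.0, 1.0]])
x_t = np.array([1.0, 2.0])

def forecast_x(A, x_t, j):
    """j-step ahead forecast E_t[x_{t+j}] = A^j x_t."""
    return np.linalg.matrix_power(A, j) @ x_t

def forecast_y(A, G, x_t, j):
    """j-step ahead forecast E_t[y_{t+j}] = G A^j x_t."""
    return G @ forecast_x(A, x_t, j)

for j in range(4):
    print(j, forecast_x(A, x_t, j), forecast_y(A, G, x_t, j))
```

Because the observation noise has zero mean, the matrix 𝐻 does not appear in the forecast of 𝑦.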

25.7.2 Covariance of Prediction Errors

It is useful to obtain the covariance matrix of the vector of 𝑗-step-ahead prediction errors

$$x_{t+j} - \mathbb{E}_t[x_{t+j}] = \sum_{s=0}^{j-1} A^s C w_{t-s+j} \tag{20}$$

Evidently,

$$V_j := \mathbb{E}_t\left[(x_{t+j} - \mathbb{E}_t[x_{t+j}])(x_{t+j} - \mathbb{E}_t[x_{t+j}])'\right] = \sum_{k=0}^{j-1} A^k C C' (A^k)' \tag{21}$$

𝑉𝑗 defined in Eq. (21) can be calculated recursively via 𝑉1 = 𝐶𝐶 ′ and

𝑉𝑗 = 𝐶𝐶 ′ + 𝐴𝑉𝑗−1 𝐴′ , 𝑗≥2 (22)

𝑉𝑗 is the conditional covariance matrix of the errors in forecasting 𝑥𝑡+𝑗 , conditioned on time 𝑡
information 𝑥𝑡
Under particular conditions, 𝑉𝑗 converges to

𝑉∞ = 𝐶𝐶 ′ + 𝐴𝑉∞ 𝐴′ (23)

Eq. (23) is an example of a discrete Lyapunov equation in the covariance matrix 𝑉∞
A sufficient condition for 𝑉𝑗 to converge is that the eigenvalues of 𝐴 be strictly less than one
in modulus
Weaker sufficient conditions for convergence associate eigenvalues equaling or exceeding one
in modulus with elements of 𝐶 that equal 0
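
The recursion in Eq. (22) and the sum in Eq. (21) can be compared numerically. This is a sketch with hypothetical matrices and function names; for a stable 𝐴, iterating long enough should also give an approximate solution of Eq. (23).

```python
import numpy as np

# Hypothetical stable model
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])
C = np.array([[1.0],
              [0.3]])

def V_recursive(A, C, j):
    """V_1 = CC' and V_j = CC' + A V_{j-1} A' for j >= 2, as in Eq. (22)."""
    CC = C @ C.T
    V = CC
    for _ in range(j - 1):
        V = CC + A @ V @ A.T
    return V

def V_direct(A, C, j):
    """Direct evaluation of the sum in Eq. (21)."""
    CC = C @ C.T
    total = np.zeros_like(CC)
    for k in range(j):
        A_k = np.linalg.matrix_power(A, k)
        total = total + A_k @ CC @ A_k.T
    return total

V_5 = V_recursive(A, C, 5)
V_inf_approx = V_recursive(A, C, 500)   # approximate fixed point of Eq. (23)
print(V_5)
print(V_inf_approx)
```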

25.7.3 Forecasts of Geometric Sums

In several contexts, we want to compute forecasts of geometric sums of future random vari-
ables governed by the linear state-space system Eq. (1)
We want the following objects


• Forecast of a geometric sum of future 𝑥’s, or $\mathbb{E}_t\left[\sum_{j=0}^\infty \beta^j x_{t+j}\right]$

• Forecast of a geometric sum of future 𝑦’s, or $\mathbb{E}_t\left[\sum_{j=0}^\infty \beta^j y_{t+j}\right]$

These objects are important components of some famous and interesting dynamic models
For example,


• if {𝑦𝑡 } is a stream of dividends, then $\mathbb{E}\left[\sum_{j=0}^\infty \beta^j y_{t+j} \mid x_t\right]$ is a model of a stock price

• if {𝑦𝑡 } is the money supply, then $\mathbb{E}\left[\sum_{j=0}^\infty \beta^j y_{t+j} \mid x_t\right]$ is a model of the price level

Formulas
Fortunately, it is easy to use a little matrix algebra to compute these objects
Suppose that every eigenvalue of $A$ has modulus strictly less than $1/\beta$
It then follows that $I + \beta A + \beta^2 A^2 + \cdots = [I - \beta A]^{-1}$
This leads to our formulas:

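• Forecast of a geometric sum of future 𝑥’s

$$\mathbb{E}_t\left[\sum_{j=0}^\infty \beta^j x_{t+j}\right] = [I + \beta A + \beta^2 A^2 + \cdots] x_t = [I - \beta A]^{-1} x_t$$

• Forecast of a geometric sum of future 𝑦’s

$$\mathbb{E}_t\left[\sum_{j=0}^\infty \beta^j y_{t+j}\right] = G [I + \beta A + \beta^2 A^2 + \cdots] x_t = G [I - \beta A]^{-1} x_t$$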
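
As a check on these formulas, the sketch below compares the closed form with a long truncated sum. The matrices, the discount factor and the truncation length are hypothetical choices.

```python
import numpy as np

beta = 0.95
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])
G = np.array([[1.0, 1.0]])
x_t = np.array([1.0, 2.0])

# Condition for the geometric series to converge
max_modulus = np.absolute(np.linalg.eigvals(A)).max()
condition_holds = bool(max_modulus < 1 / beta)

# Closed form: solve (I - beta A) z = x_t rather than forming the inverse
sum_x_closed = np.linalg.solve(np.eye(2) - beta * A, x_t)
sum_y_closed = G @ sum_x_closed

# Truncated sum of beta^j A^j x_t, built up term by term
sum_x_truncated = np.zeros(2)
term = x_t.copy()
for j in range(2000):
    sum_x_truncated += term
    term = beta * (A @ term)

print(condition_holds)
print(sum_x_closed, sum_x_truncated)
print(sum_y_closed)
```

Solving the linear system is generally preferable to computing the inverse explicitly, although both give the same answer here.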

25.8 Code

Our preceding simulations and calculations are based on code in the file lss.py from the
QuantEcon.py package

The code implements a class for handling linear state space models (simulations, calculating
moments, etc.)
One Python construct you might not be familiar with is the use of a generator function in the
method moment_sequence()
Go back and read the relevant documentation if you’ve forgotten how generator functions
work
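
As a reminder of how generator functions behave, here is a hypothetical generator that yields the pair (𝜇𝑡 , Σ𝑡 ) for 𝑡 = 0, 1, 2, … This is only a sketch and not the code in lss.py; the method moment_sequence() in QuantEcon.py yields four objects per step, as used in the solutions below. The name `moment_generator` and the parameter values are made up for the example.

```python
import numpy as np

def moment_generator(A, C, mu_0, Sigma_0):
    """Hypothetical generator yielding (mu_t, Sigma_t) for t = 0, 1, 2, ..."""
    mu, Sigma = mu_0, Sigma_0
    while True:
        yield mu, Sigma
        # Update only when the next value is requested
        mu = A @ mu
        Sigma = A @ Sigma @ A.T + C @ C.T

A = np.array([[0.9, 0.0],
              [0.0, 0.5]])
C = np.array([[1.0],
              [1.0]])
mu_0 = np.array([1.0, 1.0])
Sigma_0 = np.zeros((2, 2))

gen = moment_generator(A, C, mu_0, Sigma_0)
first = next(gen)     # moments at t = 0
second = next(gen)    # moments at t = 1
print(first[0], second[0])
```

The infinite loop is harmless because values are produced lazily, one call to `next` at a time.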
Examples of usage are given in the solutions to the exercises

25.9 Exercises

25.9.1 Exercise 1

Replicate this figure using the LinearStateSpace class from lss.py

25.9.2 Exercise 2

Replicate this figure modulo randomness using the same class

25.9.3 Exercise 3

Replicate this figure modulo randomness using the same class


The state space model and parameters are the same as for the preceding exercise

25.9.4 Exercise 4

Replicate this figure modulo randomness using the same class


The state space model and parameters are the same as for the preceding exercise, except that
the initial condition is the stationary distribution
Hint: You can use the stationary_distributions method to get the initial conditions
The number of sample paths is 80, and the time horizon in the figure is 100
Producing the vertical bars and dots is optional, but if you wish to try, the bars are at dates
10, 50 and 75

25.10 Solutions
In [2]: import numpy as np
import matplotlib.pyplot as plt
from quantecon import LinearStateSpace

25.10.1 Exercise 1
In [3]: ϕ_0, ϕ_1, ϕ_2 = 1.1, 0.8, -0.8

A = [[1,   0,   0  ],
     [ϕ_0, ϕ_1, ϕ_2],
     [0,   1,   0  ]]
C = np.zeros((3, 1))
G = [0, 1, 0]

ar = LinearStateSpace(A, C, G, mu_0=np.ones(3))
x, y = ar.simulate(ts_length=50)

fig, ax = plt.subplots(figsize=(10, 6))
y = y.flatten()
ax.plot(y, 'b-', lw=2, alpha=0.7)
ax.grid()
ax.set_xlabel('time')
ax.set_ylabel('$y_t$', fontsize=16)
plt.show()

25.10.2 Exercise 2
In [4]: ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5
σ = 0.2

A = [[ϕ_1, ϕ_2, ϕ_3, ϕ_4],
     [1,   0,   0,   0  ],
     [0,   1,   0,   0  ],
     [0,   0,   1,   0  ]]
C = [[σ],
     [0],
     [0],
     [0]]
G = [1, 0, 0, 0]

ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
x, y = ar.simulate(ts_length=200)

fig, ax = plt.subplots(figsize=(10, 6))
y = y.flatten()
ax.plot(y, 'b-', lw=2, alpha=0.7)
ax.grid()
ax.set_xlabel('time')
ax.set_ylabel('$y_t$', fontsize=16)
plt.show()

25.10.3 Exercise 3
In [5]: from scipy.stats import norm
import random

ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5
σ = 0.1

A = [[ϕ_1, ϕ_2, ϕ_3, ϕ_4],
     [1,   0,   0,   0  ],
     [0,   1,   0,   0  ],
     [0,   0,   1,   0  ]]
C = [[σ],
     [0],
     [0],
     [0]]
G = [1, 0, 0, 0]

I = 20
T = 50
ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
ymin, ymax = -0.5, 1.15

fig, ax = plt.subplots(figsize=(8, 5))
ax.set_ylim(ymin, ymax)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel('$y_t$', fontsize=16)

# Simulate I sample paths and accumulate the ensemble mean
ensemble_mean = np.zeros(T)
for i in range(I):
    x, y = ar.simulate(ts_length=T)
    y = y.flatten()
    ax.plot(y, 'c-', lw=0.8, alpha=0.5)
    ensemble_mean = ensemble_mean + y

ensemble_mean = ensemble_mean / I
ax.plot(ensemble_mean, color='b', lw=2, alpha=0.8, label=r'$\bar y_t$')

# Population means from the moment sequence generator
m = ar.moment_sequence()
population_means = []
for t in range(T):
    μ_x, μ_y, Σ_x, Σ_y = next(m)
    population_means.append(float(μ_y))
ax.plot(population_means, color='g', lw=2, alpha=0.8, label=r'$G\mu_t$')
ax.legend(ncol=2)
plt.show()

25.10.4 Exercise 4
In [6]: import random

ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5
σ = 0.1

A = [[ϕ_1, ϕ_2, ϕ_3, ϕ_4],
     [1,   0,   0,   0  ],
     [0,   1,   0,   0  ],
     [0,   0,   1,   0  ]]
C = [[σ],
     [0],
     [0],
     [0]]
G = [1, 0, 0, 0]

T0 = 10
T1 = 50
T2 = 75
T4 = 100

# Initial moments are placeholders; they are replaced by the
# stationary values below
ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
ymin, ymax = -0.6, 0.6

fig, ax = plt.subplots(figsize=(8, 5))
ax.grid(alpha=0.4)
ax.set_ylim(ymin, ymax)
ax.set_ylabel('$y_t$', fontsize=16)
ax.vlines((T0, T1, T2), -1.5, 1.5)

ax.set_xticks((T0, T1, T2))
ax.set_xticklabels(("$T$", "$T'$", "$T''$"), fontsize=14)

# Start the process at the stationary distribution
μ_x, μ_y, Σ_x, Σ_y = ar.stationary_distributions()
ar.mu_0 = μ_x
ar.Sigma_0 = Σ_x

for i in range(80):
    rcolor = random.choice(('c', 'g', 'b'))
    x, y = ar.simulate(ts_length=T4)
    y = y.flatten()
    ax.plot(y, color=rcolor, lw=0.8, alpha=0.5)
    ax.plot((T0, T1, T2), (y[T0], y[T1], y[T2],), 'ko', alpha=0.5)
plt.show()

Footnotes
[1] The eigenvalues of 𝐴 are (1, −1, 𝑖, −𝑖).
[2] The correct way to argue this is by induction. Suppose that 𝑥𝑡 is Gaussian. Then Eq. (1)
and Eq. (10) imply that 𝑥𝑡+1 is Gaussian. Since 𝑥0 is assumed to be Gaussian, it follows that
every 𝑥𝑡 is Gaussian. Evidently, this implies that each 𝑦𝑡 is Gaussian.
26

Finite Markov Chains

26.1 Contents

• Overview 26.2

• Definitions 26.3

• Simulation 26.4

• Marginal Distributions 26.5

• Irreducibility and Aperiodicity 26.6

• Stationary Distributions 26.7

• Ergodicity 26.8

• Computing Expectations 26.9

• Exercises 26.10

• Solutions 26.11

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

26.2 Overview

Markov chains are one of the most useful classes of stochastic processes, being

• simple, flexible and supported by many elegant theoretical results


• valuable for building intuition about random dynamic models
• central to quantitative modeling in their own right

You will find them in many of the workhorse models of economics and finance
In this lecture, we review some of the theory of Markov chains


We will also introduce some of the high-quality routines for working with Markov chains
available in QuantEcon.py
Prerequisite knowledge is basic probability and linear algebra

26.3 Definitions

The following concepts are fundamental

26.3.1 Stochastic Matrices

A stochastic matrix (or Markov matrix) is an 𝑛 × 𝑛 square matrix 𝑃 such that

1. each element of 𝑃 is nonnegative, and


2. each row of 𝑃 sums to one

Each row of 𝑃 can be regarded as a probability mass function over 𝑛 possible outcomes
It is not too difficult to check [1] that if 𝑃 is a stochastic matrix, then so is the 𝑘-th power 𝑃 𝑘
for all 𝑘 ∈ N
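
A quick numerical check of this claim, as a sketch: the helper `is_stochastic` is hypothetical, and the matrix is a small example chosen for illustration.

```python
import numpy as np

def is_stochastic(P):
    """Check that P is square, nonnegative and has rows summing to one."""
    P = np.asarray(P, dtype=float)
    return bool(P.ndim == 2
                and P.shape[0] == P.shape[1]
                and (P >= 0).all()
                and np.allclose(P.sum(axis=1), 1.0))

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])

# Check the first ten powers of P
powers_ok = all(is_stochastic(np.linalg.matrix_power(P, k))
                for k in range(1, 11))
print(powers_ok)
```

This is only evidence for particular powers, of course, not a proof.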

26.3.2 Markov Chains

There is a close connection between stochastic matrices and Markov chains


To begin, let 𝑆 be a finite set with 𝑛 elements {𝑥1 , … , 𝑥𝑛 }
The set 𝑆 is called the state space and 𝑥1 , … , 𝑥𝑛 are the state values
A Markov chain {𝑋𝑡 } on 𝑆 is a sequence of random variables on 𝑆 that have the Markov
property
This means that, for any date 𝑡 and any state 𝑦 ∈ 𝑆,

P{𝑋𝑡+1 = 𝑦 | 𝑋𝑡 } = P{𝑋𝑡+1 = 𝑦 | 𝑋𝑡 , 𝑋𝑡−1 , …} (1)

In other words, knowing the current state is enough to know probabilities for future states
In particular, the dynamics of a Markov chain are fully determined by the set of values

𝑃 (𝑥, 𝑦) ∶= P{𝑋𝑡+1 = 𝑦 | 𝑋𝑡 = 𝑥} (𝑥, 𝑦 ∈ 𝑆) (2)

By construction,

• 𝑃 (𝑥, 𝑦) is the probability of going from 𝑥 to 𝑦 in one unit of time (one step)
• 𝑃 (𝑥, ⋅) is the conditional distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥

We can view 𝑃 as a stochastic matrix where

𝑃𝑖𝑗 = 𝑃 (𝑥𝑖 , 𝑥𝑗 ) 1 ≤ 𝑖, 𝑗 ≤ 𝑛

Going the other way, if we take a stochastic matrix 𝑃 , we can generate a Markov chain {𝑋𝑡 }
as follows:

• draw 𝑋0 from some specified distribution


• for each 𝑡 = 0, 1, …, draw 𝑋𝑡+1 from 𝑃 (𝑋𝑡 , ⋅)

By construction, the resulting process satisfies Eq. (2)

26.3.3 Example 1

Consider a worker who, at any given time 𝑡, is either unemployed (state 0) or employed (state
1)
Suppose that, over a one month period,

1. An unemployed worker finds a job with probability 𝛼 ∈ (0, 1)


2. An employed worker loses her job and becomes unemployed with probability 𝛽 ∈ (0, 1)

In terms of a Markov model, we have

• 𝑆 = {0, 1}
• 𝑃 (0, 1) = 𝛼 and 𝑃 (1, 0) = 𝛽

We can write out the transition probabilities in matrix form as

$$P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}$$

Once we have the values 𝛼 and 𝛽, we can address a range of questions, such as

• What is the average duration of unemployment?


• Over the long-run, what fraction of time does a worker find herself unemployed?
• Conditional on employment, what is the probability of becoming unemployed at least
once over the next 12 months?

We’ll cover such applications below

26.3.4 Example 2

Using US unemployment data, Hamilton [51] estimated the stochastic matrix

$$P = \begin{pmatrix} 0.971 & 0.029 & 0 \\ 0.145 & 0.778 & 0.077 \\ 0 & 0.508 & 0.492 \end{pmatrix}$$

where

• the frequency is monthly



• the first state represents “normal growth”


• the second state represents “mild recession”
• the third state represents “severe recession”

For example, the matrix tells us that when the state is normal growth, the state will again be
normal growth next month with probability 0.97
In general, large values on the main diagonal indicate persistence in the process {𝑋𝑡 }
This Markov process can also be represented as a directed graph, with edges labeled by tran-
sition probabilities

Here “ng” is normal growth, “mr” is mild recession, etc.

26.4 Simulation

One natural way to answer questions about Markov chains is to simulate them
(To approximate the probability of event 𝐸, we can simulate many times and count the frac-
tion of times that 𝐸 occurs)
Nice functionality for simulating Markov chains exists in QuantEcon.py

• Efficient, bundled with lots of other useful routines for handling Markov chains

However, it’s also a good exercise to roll our own routines — let’s do that first and then come
back to the methods in QuantEcon.py
In these exercises, we’ll take the state space to be 𝑆 = {0, … , 𝑛 − 1}

26.4.1 Rolling Our Own

To simulate a Markov chain, we need its stochastic matrix 𝑃 and either an initial state or a
probability distribution 𝜓 from which the initial state is drawn
The Markov chain is then constructed as discussed above. To repeat:

1. At time 𝑡 = 0, 𝑋0 is set to some fixed state or chosen from 𝜓


2. At each subsequent time 𝑡, the new state 𝑋𝑡+1 is drawn from 𝑃 (𝑋𝑡 , ⋅)

In order to implement this simulation procedure, we need a method for generating draws from
a discrete distribution
For this task, we’ll use random.draw from QuantEcon, which takes a cumulative distribution as its argument

In [2]: import quantecon as qe


import numpy as np

ψ = (0.1, 0.9) # Probabilities over sample space {0, 1}


cdf = np.cumsum(ψ)
qe.random.draw(cdf, 5) # Generate 5 independent draws from ψ

Out[2]: array([1, 1, 1, 1, 1])

We’ll write our code as a function that takes the following three arguments

• A stochastic matrix P
• An initial state init
• A positive integer sample_size representing the length of the time series the function
should return

In [3]: def mc_sample_path(P, init=0, sample_size=1000):
    # === make sure P is a NumPy array === #
    P = np.asarray(P)
    # === allocate memory === #
    X = np.empty(sample_size, dtype=int)
    X[0] = init
    # === convert each row of P into a distribution === #
    # In particular, P_dist[i] = the cumulative distribution corresponding to P[i, :]
    n = len(P)
    P_dist = [np.cumsum(P[i, :]) for i in range(n)]

    # === generate the sample path === #
    for t in range(sample_size - 1):
        X[t+1] = qe.random.draw(P_dist[X[t]])

    return X

Let’s see how it works using the small matrix

$$P := \begin{pmatrix} 0.4 & 0.6 \\ 0.2 & 0.8 \end{pmatrix} \tag{3}$$

As we’ll see later, for a long series drawn from P, the fraction of the sample that takes value 0
will be about 0.25
If you run the following code you should get roughly that answer

In [4]: P = [[0.4, 0.6], [0.2, 0.8]]


X = mc_sample_path(P, sample_size=100000)
np.mean(X == 0)

Out[4]: 0.25109

26.4.2 Using QuantEcon’s Routines

As discussed above, QuantEcon.py has routines for handling Markov chains, including simula-
tion
Here’s an illustration using the same P as the preceding example

In [5]: P = [[0.4, 0.6], [0.2, 0.8]]


mc = qe.MarkovChain(P)
X = mc.simulate(ts_length=1000000)
np.mean(X == 0)

Out[5]: 0.249741

In fact the QuantEcon.py routine is JIT compiled and much faster


(Because it’s JIT compiled the first run takes a bit longer — the function has to be compiled
and stored in memory)

In [6]: %timeit mc_sample_path(P, sample_size=1000000) # our version

678 ms ± 9.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [7]: %timeit mc.simulate(ts_length=1000000) # qe version

30.2 ms ± 396 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Adding State Values and Initial Conditions


If we wish to, we can provide a specification of state values to MarkovChain
These state values can be integers, floats, or even strings
The following code illustrates

In [8]: mc = qe.MarkovChain(P, state_values=('unemployed', 'employed'))


mc.simulate(ts_length=4, init='employed')

Out[8]: array(['employed', 'employed', 'employed', 'employed'], dtype='<U10')

In [9]: mc.simulate(ts_length=4, init='unemployed')

Out[9]: array(['unemployed', 'employed', 'employed', 'employed'], dtype='<U10')

In [10]: mc.simulate(ts_length=4) # Start at randomly chosen initial state

Out[10]: array(['unemployed', 'employed', 'unemployed', 'employed'], dtype='<U10')

If we want to simulate with output as indices rather than state values we can use

In [11]: mc.simulate_indices(ts_length=4)

Out[11]: array([1, 1, 1, 1])

26.5 Marginal Distributions

Suppose that

1. {𝑋𝑡 } is a Markov chain with stochastic matrix 𝑃


2. the distribution of 𝑋𝑡 is known to be 𝜓𝑡

What then is the distribution of 𝑋𝑡+1 , or, more generally, of 𝑋𝑡+𝑚 ?



26.5.1 Solution

Let 𝜓𝑡 be the distribution of 𝑋𝑡 for 𝑡 = 0, 1, 2, …


Our first aim is to find 𝜓𝑡+1 given 𝜓𝑡 and 𝑃
To begin, pick any 𝑦 ∈ 𝑆
Using the law of total probability, we can decompose the probability that 𝑋𝑡+1 = 𝑦 as follows:

$$\mathbb{P}\{X_{t+1} = y\} = \sum_{x \in S} \mathbb{P}\{X_{t+1} = y \mid X_t = x\} \cdot \mathbb{P}\{X_t = x\}$$

In words, to get the probability of being at 𝑦 tomorrow, we account for all ways this can hap-
pen and sum their probabilities
Rewriting this statement in terms of marginal and conditional probabilities gives
$$\psi_{t+1}(y) = \sum_{x \in S} P(x, y) \psi_t(x)$$

There are 𝑛 such equations, one for each 𝑦 ∈ 𝑆


If we think of 𝜓𝑡+1 and 𝜓𝑡 as row vectors (as is traditional in this literature), these 𝑛 equa-
tions are summarized by the matrix expression

𝜓𝑡+1 = 𝜓𝑡 𝑃 (4)

In other words, to move the distribution forward one unit of time, we postmultiply by 𝑃
By repeating this 𝑚 times we move forward 𝑚 steps into the future
Hence, iterating on Eq. (4), the expression 𝜓𝑡+𝑚 = 𝜓𝑡 𝑃 𝑚 is also valid — here 𝑃 𝑚 is the 𝑚-th
power of 𝑃
As a special case, we see that if 𝜓0 is the initial distribution from which 𝑋0 is drawn, then
𝜓0 𝑃 𝑚 is the distribution of 𝑋𝑚
This is very important, so let’s repeat it

𝑋0 ∼ 𝜓 0 ⟹ 𝑋𝑚 ∼ 𝜓0 𝑃 𝑚 (5)

and, more generally,

𝑋𝑡 ∼ 𝜓𝑡 ⟹ 𝑋𝑡+𝑚 ∼ 𝜓𝑡 𝑃 𝑚 (6)
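
A short sketch of Eq. (4) to Eq. (6) in NumPy, treating distributions as row vectors. The matrix is the small example used earlier in this lecture; the initial distribution and the horizon `m` are arbitrary choices.

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
psi_0 = np.array([1.0, 0.0])    # start in state 0 with probability one
m = 5

# Move the distribution forward one step at a time: psi_{t+1} = psi_t P
psi = psi_0.copy()
for _ in range(m):
    psi = psi @ P

# Equivalent one-shot calculation using the m-th power of P
psi_direct = psi_0 @ np.linalg.matrix_power(P, m)

print(psi)
print(psi_direct)
```

Both routes give the distribution of 𝑋𝑚 , and each result is itself a probability distribution.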

26.5.2 Multiple Step Transition Probabilities

We know that the probability of transitioning from 𝑥 to 𝑦 in one step is 𝑃 (𝑥, 𝑦)


It turns out that the probability of transitioning from 𝑥 to 𝑦 in 𝑚 steps is 𝑃 𝑚 (𝑥, 𝑦), the
(𝑥, 𝑦)-th element of the 𝑚-th power of 𝑃
To see why, consider again Eq. (6), but now with 𝜓𝑡 putting all probability on state 𝑥

• 1 in the 𝑥-th position and zero elsewhere

Inserting this into Eq. (6), we see that, conditional on 𝑋𝑡 = 𝑥, the distribution of 𝑋𝑡+𝑚 is the
𝑥-th row of 𝑃 𝑚
In particular

P{𝑋𝑡+𝑚 = 𝑦 | 𝑋𝑡 = 𝑥} = 𝑃 𝑚 (𝑥, 𝑦) = (𝑥, 𝑦)-th element of 𝑃 𝑚

26.5.3 Example: Probability of Recession

Recall the stochastic matrix 𝑃 for recession and growth considered above
Suppose that the current state is unknown — perhaps statistics are available only at the end
of the current month
We estimate the probability that the economy is in state 𝑥 to be 𝜓(𝑥)
The probability of being in recession (either mild or severe) in 6 months time is given by the
inner product

$$\psi P^6 \cdot \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$$
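
The calculation can be sketched as follows, using Hamilton's matrix from above. The current-state estimate `psi` is a made-up example, not a value from the text.

```python
import numpy as np

# Hamilton's estimated stochastic matrix
P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])

# Hypothetical estimate of the current distribution over
# (normal growth, mild recession, severe recession)
psi = np.array([0.8, 0.15, 0.05])

# Distribution six months ahead
psi_6 = psi @ np.linalg.matrix_power(P, 6)

# Inner product with the indicator of the two recession states
recession = np.array([0, 1, 1])
prob_recession = psi_6 @ recession
print(prob_recession)
```

Equivalently, the probability is one minus the first element of `psi_6`.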

26.5.4 Example 2: Cross-Sectional Distributions

The marginal distributions we have been studying can be viewed either as probabilities or as
cross-sectional frequencies in large samples
To illustrate, recall our model of employment/unemployment dynamics for a given worker
discussed above
Consider a large (i.e., tending to infinite) population of workers, each of whose lifetime expe-
rience is described by the specified dynamics, independent of one another
Let 𝜓 be the current cross-sectional distribution over {0, 1}

• For example, 𝜓(0) is the unemployment rate

The cross-sectional distribution records the fractions of workers employed and unemployed at
a given moment
The same distribution also describes the fractions of a particular worker’s career spent being
employed and unemployed, respectively

26.6 Irreducibility and Aperiodicity

Irreducibility and aperiodicity are central concepts of modern Markov chain theory
Let’s see what they’re about

26.6.1 Irreducibility

Let 𝑃 be a fixed stochastic matrix


Two states 𝑥 and 𝑦 are said to communicate with each other if there exist positive integers
𝑗 and 𝑘 such that

𝑃 𝑗 (𝑥, 𝑦) > 0 and 𝑃 𝑘 (𝑦, 𝑥) > 0

In view of our discussion above, this means precisely that

• state 𝑥 can be reached eventually from state 𝑦, and


• state 𝑦 can be reached eventually from state 𝑥

The stochastic matrix 𝑃 is called irreducible if all states communicate; that is, if 𝑥 and 𝑦
communicate for all (𝑥, 𝑦) in 𝑆 × 𝑆
For example, consider the following transition probabilities for wealth of a fictitious set of
households

We can translate this into a stochastic matrix, putting zeros where there’s no edge between
nodes

$$P := \begin{pmatrix} 0.9 & 0.1 & 0 \\ 0.4 & 0.4 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$$

It’s clear from the graph that this stochastic matrix is irreducible: we can reach any state
from any other state eventually
We can also test this using QuantEcon.py’s MarkovChain class

In [12]: P = [[0.9, 0.1, 0.0],


[0.4, 0.4, 0.2],
[0.1, 0.1, 0.8]]

mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))


mc.is_irreducible

Out[12]: True

Here’s a more pessimistic scenario, where the poor are poor forever

This stochastic matrix is not irreducible, since, for example, rich is not accessible from poor
Let’s confirm this

In [13]: P = [[1.0, 0.0, 0.0],


[0.1, 0.8, 0.1],
[0.0, 0.2, 0.8]]

mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))


mc.is_irreducible

Out[13]: False

We can also determine the “communication classes”

In [14]: mc.communication_classes

Out[14]: [array(['poor'], dtype='<U6'), array(['middle', 'rich'], dtype='<U6')]

It might be clear to you already that irreducibility is going to be important in terms of long
run outcomes
For example, poverty is a life sentence in the second graph but not the first
We’ll come back to this a bit later
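
For readers who want to see what such a test might involve, here is a sketch in plain NumPy. It is not how QuantEcon.py implements `is_irreducible`. It uses the standard fact that an 𝑛 × 𝑛 nonnegative matrix 𝑃 is irreducible exactly when every element of (𝐼 + 𝑃 )𝑛−1 is strictly positive; the function name is hypothetical.

```python
import numpy as np

def is_irreducible_sketch(P):
    """Test irreducibility via positivity of (I + B)^(n-1), where B marks
    the positive entries of P."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    B = (P > 0).astype(float)            # which one-step transitions are possible
    reach = np.linalg.matrix_power(np.eye(n) + B, n - 1)
    # reach[x, y] > 0 iff y can be reached from x in at most n - 1 steps
    return bool((reach > 0).all())

P_mobile = [[0.9, 0.1, 0.0],
            [0.4, 0.4, 0.2],
            [0.1, 0.1, 0.8]]

P_trap = [[1.0, 0.0, 0.0],
          [0.1, 0.8, 0.1],
          [0.0, 0.2, 0.8]]

print(is_irreducible_sketch(P_mobile))
print(is_irreducible_sketch(P_trap))
```

The second matrix fails the test because no other state can be reached from the first state.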

26.6.2 Aperiodicity

Loosely speaking, a Markov chain is called periodic if it cycles in a predictable way, and
aperiodic otherwise
Here’s a trivial example with three states

The chain cycles with period 3:



In [15]: P = [[0, 1, 0],


[0, 0, 1],
[1, 0, 0]]

mc = qe.MarkovChain(P)
mc.period

Out[15]: 3

More formally, the period of a state 𝑥 is the greatest common divisor of the set of integers

𝐷(𝑥) ∶= {𝑗 ≥ 1 ∶ 𝑃 𝑗 (𝑥, 𝑥) > 0}

In the last example, 𝐷(𝑥) = {3, 6, 9, …} for every state 𝑥, so the period is 3
A stochastic matrix is called aperiodic if the period of every state is 1, and periodic other-
wise
For example, the stochastic matrix associated with the transition probabilities below is peri-
odic because, for example, state 𝑎 has period 2

We can confirm that the stochastic matrix is periodic as follows

In [16]: P = [[0.0, 1.0, 0.0, 0.0],


[0.5, 0.0, 0.5, 0.0],
[0.0, 0.5, 0.0, 0.5],
[0.0, 0.0, 1.0, 0.0]]

mc = qe.MarkovChain(P)
mc.period

Out[16]: 2

In [17]: mc.is_aperiodic

Out[17]: False
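
The definition of the period can be turned into a rough calculation by collecting return times up to a finite cutoff and taking their greatest common divisor. This is a sketch, not QuantEcon.py's algorithm; the function name and the default cutoff of 3𝑛 steps are assumptions made for this example.

```python
import numpy as np
from math import gcd

def period_of_state(P, x, max_steps=None):
    """GCD of the return times j <= max_steps with P^j(x, x) > 0.

    Returns 0 if the chain never returns to x within max_steps.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    if max_steps is None:
        max_steps = 3 * n                 # assumed cutoff for this sketch
    B = (P > 0).astype(int)               # pattern of possible one-step moves
    reach = np.eye(n, dtype=int)
    d = 0
    for j in range(1, max_steps + 1):
        # reach[x, y] = 1 iff y can be reached from x in exactly j steps
        reach = ((reach @ B) > 0).astype(int)
        if reach[x, x] > 0:
            d = gcd(d, j)
    return d

P_cycle = [[0, 1, 0],
           [0, 0, 1],
           [1, 0, 0]]

P_walk = [[0.0, 1.0, 0.0, 0.0],
          [0.5, 0.0, 0.5, 0.0],
          [0.0, 0.5, 0.0, 0.5],
          [0.0, 0.0, 1.0, 0.0]]

print([period_of_state(P_cycle, x) for x in range(3)])
print([period_of_state(P_walk, x) for x in range(4)])
```

Working with the 0/1 pattern of 𝑃 rather than 𝑃 itself avoids mistaking very small probabilities for zeros.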

26.7 Stationary Distributions

As seen in Eq. (4), we can shift probabilities forward one unit of time via postmultiplication
by 𝑃
Some distributions are invariant under this updating process — for example,

In [18]: P = np.array([[.4, .6], [.2, .8]])


ψ = (0.25, 0.75)
ψ @ P

Out[18]: array([0.25, 0.75])

Such distributions are called stationary, or invariant


Formally, a distribution 𝜓∗ on 𝑆 is called stationary for 𝑃 if 𝜓∗ = 𝜓∗ 𝑃

From this equality, we immediately get 𝜓∗ = 𝜓∗ 𝑃 𝑡 for all 𝑡


This tells us an important fact: If the distribution of 𝑋0 is a stationary distribution, then 𝑋𝑡
will have this same distribution for all 𝑡
Hence stationary distributions have a natural interpretation as stochastic steady states —
we’ll discuss this more in just a moment
Mathematically, a stationary distribution is a fixed point of 𝑃 when 𝑃 is thought of as the
map 𝜓 ↦ 𝜓𝑃 from (row) vectors to (row) vectors
Theorem. Every stochastic matrix 𝑃 has at least one stationary distribution
(We are assuming here that the state space 𝑆 is finite; if not, more assumptions are required)
For proof of this result, you can apply Brouwer’s fixed point theorem, or see EDTC, theorem
4.3.5
There may in fact be many stationary distributions corresponding to a given stochastic ma-
trix 𝑃

• For example, if 𝑃 is the identity matrix, then all distributions are stationary

Since stationary distributions are long run equilibria, to get uniqueness we require that initial
conditions are not infinitely persistent
Infinite persistence of initial conditions occurs if certain regions of the state space cannot be
accessed from other regions, which is the opposite of irreducibility
This gives some intuition for the following fundamental theorem
Theorem. If 𝑃 is both aperiodic and irreducible, then

1. 𝑃 has exactly one stationary distribution 𝜓∗


2. For any initial distribution 𝜓0 , we have ‖𝜓0 𝑃 𝑡 − 𝜓∗ ‖ → 0 as 𝑡 → ∞

For a proof, see, for example, theorem 5.2 of [47]


(Note that part 1 of the theorem requires only irreducibility, whereas part 2 requires both
irreducibility and aperiodicity)
A stochastic matrix satisfying the conditions of the theorem is sometimes called uniformly
ergodic
One easy sufficient condition for aperiodicity and irreducibility is that every element of 𝑃 is
strictly positive

• Try to convince yourself of this

26.7.1 Example

Recall our model of employment/unemployment dynamics for a given worker discussed above
Assuming 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), the uniform ergodicity condition is satisfied
Let 𝜓∗ = (𝑝, 1 − 𝑝) be the stationary distribution, so that 𝑝 corresponds to unemployment
(state 0)

Using 𝜓∗ = 𝜓∗ 𝑃 and a bit of algebra yields

$$p = \frac{\beta}{\alpha + \beta}$$

This is, in some sense, a steady state probability of unemployment — more on interpretation
below
Not surprisingly it tends to zero as 𝛽 → 0, and to one as 𝛼 → 0
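
The formula is easy to verify numerically. In this sketch the values of 𝛼 and 𝛽 are hypothetical.

```python
import numpy as np

alpha, beta = 0.1, 0.05      # hypothetical job finding and separation rates

P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

# Candidate stationary distribution (unemployed, employed)
p = beta / (alpha + beta)
psi_star = np.array([p, 1 - p])

# Stationarity means psi_star is unchanged by postmultiplication by P
print(psi_star)
print(psi_star @ P)
```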

26.7.2 Calculating Stationary Distributions

As discussed above, a given Markov matrix 𝑃 can have many stationary distributions
That is, there can be many row vectors 𝜓 such that 𝜓 = 𝜓𝑃
In fact if 𝑃 has two distinct stationary distributions 𝜓1 , 𝜓2 then it has infinitely many, since
in this case, as you can verify,

𝜓3 ∶= 𝜆𝜓1 + (1 − 𝜆)𝜓2

is a stationary distribution for 𝑃 for any 𝜆 ∈ [0, 1]


If we restrict attention to the case where only one stationary distribution exists, one option
for finding it is to try to solve the linear system 𝜓(𝐼𝑛 − 𝑃 ) = 0 for 𝜓, where 𝐼𝑛 is the 𝑛 × 𝑛
identity
But the zero vector solves this equation
Hence we need to impose the restriction that the solution must be a probability distribution
A suitable algorithm is implemented in QuantEcon.py — the next code block illustrates

In [19]: P = [[0.4, 0.6], [0.2, 0.8]]


mc = qe.MarkovChain(P)
mc.stationary_distributions # Show all stationary distributions

Out[19]: array([[0.25, 0.75]])

The stationary distribution is unique
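For intuition, the same answer can be recovered with plain NumPy by stacking the linear system 𝜓(𝐼𝑛 − 𝑃 ) = 0 with the normalization that the elements of 𝜓 sum to one. This is only a sketch that assumes a unique stationary distribution; it is not a description of the algorithm QuantEcon.py uses.

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
n = P.shape[0]

# Stack the n equations (I - P)' ψ' = 0 on top of the normalization 1' ψ' = 1
A = np.vstack([(np.eye(n) - P).T, np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0

# Least squares solves the (consistent) overdetermined system
ψ, *_ = np.linalg.lstsq(A, b, rcond=None)
print(ψ)
```

The normalization row is what rules out the zero vector as a solution.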

26.7.3 Convergence to Stationarity

Part 2 of the Markov chain convergence theorem stated above tells us that the distribution of
𝑋𝑡 converges to the stationary distribution regardless of where we start off
This adds considerable weight to our interpretation of 𝜓∗ as a stochastic steady state
The convergence in the theorem is illustrated in the next figure

In [20]: from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
%matplotlib inline

P = ((0.971, 0.029, 0.000),
     (0.145, 0.778, 0.077),
     (0.000, 0.508, 0.492))
P = np.array(P)

ψ = (0.0, 0.2, 0.8)        # Initial condition

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')

ax.set(xlim=(0, 1), ylim=(0, 1), zlim=(0, 1),
       xticks=(0.25, 0.5, 0.75),
       yticks=(0.25, 0.5, 0.75),
       zticks=(0.25, 0.5, 0.75))

x_vals, y_vals, z_vals = [], [], []
for t in range(20):
    x_vals.append(ψ[0])
    y_vals.append(ψ[1])
    z_vals.append(ψ[2])
    ψ = ψ @ P

ax.scatter(x_vals, y_vals, z_vals, c='r', s=60)
ax.view_init(30, 210)

mc = qe.MarkovChain(P)
ψ_star = mc.stationary_distributions[0]
ax.scatter(ψ_star[0], ψ_star[1], ψ_star[2], c='k', s=60)

plt.show()

Here

• 𝑃 is the stochastic matrix for recession and growth considered above


• The highest red dot is an arbitrarily chosen initial probability distribution 𝜓, represented
as a vector in R3

• The other red dots are the distributions 𝜓𝑃 𝑡 for 𝑡 = 1, 2, …


• The black dot is 𝜓∗

The code for the figure can be found here. You might like to try experimenting with different
initial conditions

26.8 Ergodicity

Under irreducibility, yet another important result obtains: For all 𝑥 ∈ 𝑆,

$$ \frac{1}{m} \sum_{t=1}^{m} \mathbf{1}\{X_t = x\} \to \psi^*(x) \quad \text{as } m \to \infty \tag{7} $$

Here

• 1{𝑋𝑡 = 𝑥} = 1 if 𝑋𝑡 = 𝑥 and zero otherwise


• convergence is with probability one
• the result does not depend on the distribution (or value) of 𝑋0

The result tells us that the fraction of time the chain spends at state 𝑥 converges to 𝜓∗ (𝑥) as
time goes to infinity
This gives us another way to interpret the stationary distribution, provided that the
convergence result in Eq. (7) is valid
The convergence in Eq. (7) is a special case of a law of large numbers result for Markov
chains — see EDTC, section 4.3.4 for some additional information
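The convergence in Eq. (7) can be illustrated by simulation. The sketch below uses the three-state matrix from the figure above and plain NumPy; the sample length, the seed and the inverse-transform sampling scheme are illustrative assumptions rather than anything prescribed by the lecture.

```python
import numpy as np

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])
n = P.shape[0]

# Approximate ψ* by iterating a long way from a uniform initial distribution
ψ_star = (np.ones(n) / n) @ np.linalg.matrix_power(P, 1_000)

m = 100_000                          # length of the simulated path
rng = np.random.default_rng(0)
u = rng.random(m)                    # one uniform draw per transition
cdf = P.cumsum(axis=1)
cdf[:, -1] = 1.0                     # guard against floating point round-off

x = 0                                # X_0, chosen arbitrarily
visits = np.zeros(n)
for t in range(m):
    # Inverse transform sampling from row x of P
    x = np.searchsorted(cdf[x], u[t], side='right')
    visits[x] += 1

freq = visits / m                    # fraction of time spent in each state
print(freq)
print(ψ_star)
```

Changing the initial state `x` should leave the long-run frequencies essentially unchanged, in line with the remark that the result does not depend on 𝑋0.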

26.8.1 Example

Recall our cross-sectional interpretation of the employment/unemployment model discussed
above
Assume that 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), so that irreducibility and aperiodicity both hold
We saw that the stationary distribution is (𝑝, 1 − 𝑝), where

$$ p = \frac{\beta}{\alpha + \beta} $$

In the cross-sectional interpretation, this is the fraction of people unemployed


In view of our latest (ergodicity) result, it is also the fraction of time that a worker can
expect to spend unemployed
Thus, in the long-run, cross-sectional averages for a population and time-series averages for a
given person coincide
This is one interpretation of the notion of ergodicity

26.9 Computing Expectations

We are interested in computing expectations of the form

E[ℎ(𝑋𝑡 )] (8)

and conditional expectations such as

E[ℎ(𝑋𝑡+𝑘 ) ∣ 𝑋𝑡 = 𝑥] (9)

where

• {𝑋𝑡 } is a Markov chain generated by 𝑛 × 𝑛 stochastic matrix 𝑃


• ℎ is a given function, which, in expressions involving matrix algebra, we’ll think of as
the column vector

$$ h = \begin{pmatrix} h(x_1) \\ \vdots \\ h(x_n) \end{pmatrix} $$

The unconditional expectation Eq. (8) is easy: We just sum over the distribution of 𝑋𝑡 to get

$$ \mathbb{E}[h(X_t)] = \sum_{x \in S} (\psi P^t)(x) h(x) $$

Here 𝜓 is the distribution of 𝑋0


Since 𝜓 and hence 𝜓𝑃 𝑡 are row vectors, we can also write this as

E[ℎ(𝑋𝑡 )] = 𝜓𝑃 𝑡 ℎ

For the conditional expectation Eq. (9), we need to sum over the conditional distribution of
𝑋𝑡+𝑘 given 𝑋𝑡 = 𝑥
We already know that this is 𝑃 𝑘 (𝑥, ⋅), so

E[ℎ(𝑋𝑡+𝑘 ) ∣ 𝑋𝑡 = 𝑥] = (𝑃 𝑘 ℎ)(𝑥) (10)

The vector 𝑃 𝑘 ℎ stores the conditional expectation E[ℎ(𝑋𝑡+𝑘 ) ∣ 𝑋𝑡 = 𝑥] over all 𝑥
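In code, both expectations reduce to matrix products. The following sketch uses the three-state matrix from earlier in the lecture; the function values in `h`, the initial distribution and the horizons `t` and `k` are hypothetical choices made for illustration.

```python
import numpy as np

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])

h = np.array([1.0, 0.0, -1.0])       # hypothetical h, stored as a vector
ψ = np.array([0.0, 0.2, 0.8])        # distribution of X_0
t, k = 5, 3

# Unconditional expectation E[h(X_t)] = ψ P^t h (a scalar)
uncond = ψ @ np.linalg.matrix_power(P, t) @ h

# Conditional expectations E[h(X_{t+k}) | X_t = x] for every x, stored as P^k h
cond = np.linalg.matrix_power(P, k) @ h

print(uncond)
print(cond)
```

Element `cond[i]` is the conditional expectation given that the chain currently sits in the state with index `i`.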

26.9.1 Expectations of Geometric Sums

Sometimes we also want to compute expectations of a geometric sum, such as ∑𝑡 𝛽 𝑡 ℎ(𝑋𝑡 )


In view of the preceding discussion, this is


$$ \mathbb{E}\left[ \sum_{j=0}^{\infty} \beta^j h(X_{t+j}) \mid X_t = x \right] = [(I - \beta P)^{-1} h](x) $$

where

$$ (I - \beta P)^{-1} = I + \beta P + \beta^2 P^2 + \cdots $$

Premultiplication by (𝐼 − 𝛽𝑃 )−1 amounts to “applying the resolvent operator”
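As a quick numerical check of this identity, the sketch below compares a linear solve against a truncated version of the series. The discount factor, the vector `h` and the truncation point are illustrative assumptions.

```python
import numpy as np

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])
n = P.shape[0]
h = np.array([1.0, 0.0, -1.0])       # hypothetical h
β = 0.96                             # hypothetical discount factor in (0, 1)

# Solve (I - βP) v = h rather than forming the inverse explicitly
v = np.linalg.solve(np.eye(n) - β * P, h)

# Truncated series: h + βPh + β²P²h + ..., built up term by term
v_series = np.zeros(n)
term = h.copy()
for j in range(500):
    v_series += term
    term = β * (P @ term)

print(v)
print(v_series)
```

Using `solve` instead of an explicit inverse is a common numerical choice; either route should give the same vector up to rounding.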

26.10 Exercises

26.10.1 Exercise 1

According to the discussion above, if a worker’s employment dynamics obey the stochastic
matrix

$$ P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix} $$

with 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), then, in the long-run, the fraction of time spent unemployed
will be

$$ p := \frac{\beta}{\alpha + \beta} $$

In other words, if {𝑋𝑡 } represents the Markov chain for employment, then 𝑋̄ 𝑚 → 𝑝 as 𝑚 →
∞, where

$$ \bar X_m := \frac{1}{m} \sum_{t=1}^{m} \mathbf{1}\{X_t = 0\} $$

Your exercise is to illustrate this convergence


First,

• generate one simulated time series {𝑋𝑡 } of length 10,000, starting at 𝑋0 = 0


• plot 𝑋̄ 𝑚 − 𝑝 against 𝑚, where 𝑝 is as defined above

Second, repeat the first step, but this time taking 𝑋0 = 1


In both cases, set 𝛼 = 𝛽 = 0.1
The result should look something like the following — modulo randomness, of course

(You don’t need to add the fancy touches to the graph—see the solution if you’re interested)

26.10.2 Exercise 2

A topic of interest for economics and many other disciplines is ranking


Let’s now consider one of the most practical and important ranking problems: the rank
assigned to web pages by search engines
(Although the problem is motivated from outside of economics, there is in fact a deep
connection between search ranking systems and prices in certain competitive equilibria; see [37])
To understand the issue, consider the set of results returned by a query to a web search
engine
For the user, it is desirable to

1. receive a large set of accurate matches


2. have the matches returned in order, where the order corresponds to some measure of
“importance”

Ranking according to a measure of importance is the problem we now consider


The methodology developed to solve this problem by Google founders Larry Page and Sergey
Brin is known as PageRank
To illustrate the idea, consider the following diagram

Imagine that this is a miniature version of the WWW, with

• each node representing a web page


• each arrow representing the existence of a link from one page to another

Now let’s think about which pages are likely to be important, in the sense of being valuable
to a search engine user
One possible criterion for the importance of a page is the number of inbound links, an
indication of popularity
By this measure, m and j are the most important pages, with 5 inbound links each
However, what if the pages linking to m, say, are not themselves important?
Thinking this way, it seems appropriate to weight the inbound nodes by relative importance
The PageRank algorithm does precisely this
A slightly simplified presentation that captures the basic idea is as follows
Letting 𝑗 be (the integer index of) a typical page and 𝑟𝑗 be its ranking, we set

$$ r_j = \sum_{i \in L_j} \frac{r_i}{\ell_i} $$

where

• ℓ𝑖 is the total number of outbound links from 𝑖


• 𝐿𝑗 is the set of all pages 𝑖 such that 𝑖 has a link to 𝑗

This is a measure of the number of inbound links, weighted by their own ranking (and
normalized by 1/ℓ𝑖 )
There is, however, another interpretation, and it brings us back to Markov chains
Let 𝑃 be the matrix given by 𝑃 (𝑖, 𝑗) = 1{𝑖 → 𝑗}/ℓ𝑖 where 1{𝑖 → 𝑗} = 1 if 𝑖 has a link to 𝑗
and zero otherwise
The matrix 𝑃 is a stochastic matrix provided that each page has at least one outbound link

With this definition of 𝑃 we have

$$ r_j = \sum_{i \in L_j} \frac{r_i}{\ell_i} = \sum_{\text{all } i} \mathbf{1}\{i \to j\} \frac{r_i}{\ell_i} = \sum_{\text{all } i} P(i, j) r_i $$

Writing 𝑟 for the row vector of rankings, this becomes 𝑟 = 𝑟𝑃


Hence 𝑟 is the stationary distribution of the stochastic matrix 𝑃
Let’s think of 𝑃 (𝑖, 𝑗) as the probability of “moving” from page 𝑖 to page 𝑗
The value 𝑃 (𝑖, 𝑗) has the interpretation

• 𝑃 (𝑖, 𝑗) = 1/𝑘 if 𝑖 has 𝑘 outbound links and 𝑗 is one of them


• 𝑃 (𝑖, 𝑗) = 0 if 𝑖 has no direct link to 𝑗

Thus, motion from page to page is that of a web surfer who moves from one page to another
by randomly clicking on one of the links on that page
Here “random” means that each link is selected with equal probability
Since 𝑟 is the stationary distribution of 𝑃 , assuming that the uniform ergodicity condition is
valid, we can interpret 𝑟𝑗 as the fraction of time that a (very persistent) random surfer spends
at page 𝑗
Your exercise is to apply this ranking algorithm to the graph pictured above and return the
list of pages ordered by rank
The data for this graph is in the web_graph_data.txt file — you can also view it here
There is a total of 14 nodes (i.e., web pages), the first named a and the last named n
A typical line from the file has the form

d -> h;

This should be interpreted as meaning that there exists a link from d to h


To parse this file and extract the relevant information, you can use regular expressions
The following code snippet provides a hint as to how you can go about this

In [21]: import re

re.findall(r'\w', 'x +++ y ****** z') # \w matches alphanumerics; raw string keeps the backslash

Out[21]: ['x', 'y', 'z']

In [22]: re.findall(r'\w', 'a ^^ b &&& $$ c')

Out[22]: ['a', 'b', 'c']

When you solve for the ranking, you will find that the highest ranked node is in fact g, while
the lowest is a

26.10.3 Exercise 3

In numerical work, it is sometimes convenient to replace a continuous model with a discrete
one
In particular, Markov chains are routinely generated as discrete approximations to AR(1)
processes of the form

𝑦𝑡+1 = 𝜌𝑦𝑡 + 𝑢𝑡+1

Here 𝑢𝑡 is assumed to be IID and 𝑁 (0, 𝜎𝑢2 )


The variance of the stationary probability distribution of {𝑦𝑡 } is

$$ \sigma_y^2 := \frac{\sigma_u^2}{1 - \rho^2} $$

Tauchen’s method [128] is the most common method for approximating this continuous state
process with a finite state Markov chain
A routine for this already exists in QuantEcon.py but let’s write our own version as an
exercise
As a first step, we choose

• 𝑛, the number of states for the discrete approximation


• 𝑚, an integer that parameterizes the width of the state space

Next, we create a state space {𝑥0 , … , 𝑥𝑛−1 } ⊂ R and a stochastic 𝑛 × 𝑛 matrix 𝑃 such that

• 𝑥0 = −𝑚 𝜎𝑦
• 𝑥𝑛−1 = 𝑚 𝜎𝑦
• 𝑥𝑖+1 = 𝑥𝑖 + 𝑠 where 𝑠 = (𝑥𝑛−1 − 𝑥0 )/(𝑛 − 1)

Let 𝐹 be the cumulative distribution function of the normal distribution 𝑁 (0, 𝜎𝑢2 )
The values 𝑃 (𝑥𝑖 , 𝑥𝑗 ) are computed to approximate the AR(1) process. Omitting the
derivation, the rules are as follows:

1. If 𝑗 = 0, then set

$$ P(x_i, x_j) = P(x_i, x_0) = F(x_0 - \rho x_i + s/2) $$

2. If 𝑗 = 𝑛 − 1, then set

$$ P(x_i, x_j) = P(x_i, x_{n-1}) = 1 - F(x_{n-1} - \rho x_i - s/2) $$

3. Otherwise, set

$$ P(x_i, x_j) = F(x_j - \rho x_i + s/2) - F(x_j - \rho x_i - s/2) $$

The exercise is to write a function approx_markov(rho, sigma_u, m=3, n=7) that
returns {𝑥0 , … , 𝑥𝑛−1 } ⊂ R and 𝑛 × 𝑛 matrix 𝑃 as described above

• Even better, write a function that returns an instance of QuantEcon.py’s MarkovChain
class

26.11 Solutions

In [23]: import numpy as np


import matplotlib.pyplot as plt
from quantecon import MarkovChain

26.11.1 Exercise 1

Compute the fraction of time that the worker spends unemployed, and compare it to the
stationary probability

In [24]: α = β = 0.1
N = 10000
p = β / (α + β)

P = ((1 - α,       α),               # Careful: P and p are distinct
     (    β,   1 - β))
P = np.array(P)
mc = MarkovChain(P)

fig, ax = plt.subplots(figsize=(9, 6))
ax.set_ylim(-0.25, 0.25)
ax.grid()
ax.hlines(0, 0, N, lw=2, alpha=0.6)   # Horizontal line at zero

for x0, col in ((0, 'blue'), (1, 'green')):
    # == Generate time series for worker that starts at x0 == #
    X = mc.simulate(N, init=x0)
    # == Compute fraction of time spent unemployed, for each n == #
    X_bar = (X == 0).cumsum() / (1 + np.arange(N, dtype=float))
    # == Plot == #
    ax.fill_between(range(N), np.zeros(N), X_bar - p, color=col, alpha=0.1)
    # Raw f-string so that the LaTeX spacing command \, is not read as an escape
    ax.plot(X_bar - p, color=col, label=rf'$X_0 = \, {x0} $')
    ax.plot(X_bar - p, 'k-', alpha=0.6)  # Overlay in black--make lines clearer

ax.legend(loc='upper right')
plt.show()

26.11.2 Exercise 2

First, save the data into a file called web_graph_data.txt by executing the next cell

In [25]: %%file web_graph_data.txt
a -> d;
a -> f;
b -> j;
b -> k;
b -> m;
c -> c;
c -> g;
c -> j;
c -> m;
d -> f;
d -> h;
d -> k;
e -> d;
e -> h;
e -> l;
f -> a;
f -> b;
f -> j;
f -> l;
g -> b;
g -> j;
h -> d;
h -> g;
h -> l;
h -> m;
i -> g;
i -> h;
i -> n;
j -> e;
j -> i;
j -> k;
k -> n;
l -> m;
m -> g;
n -> c;
n -> j;
n -> m;

Writing web_graph_data.txt

In [26]: """
Return list of pages, ordered by rank
"""
import re
import numpy as np
from operator import itemgetter

infile = 'web_graph_data.txt'
alphabet = 'abcdefghijklmnopqrstuvwxyz'

n = 14 # Total number of web pages (nodes)

# == Create a matrix Q indicating existence of links == #
#  * Q[i, j] = 1 if there is a link from i to j
#  * Q[i, j] = 0 otherwise
Q = np.zeros((n, n), dtype=int)
with open(infile, 'r') as f:
    edges = f.readlines()
for edge in edges:
    # Each line looks like 'd -> h;', so \w picks out the two node names
    from_node, to_node = re.findall(r'\w', edge)
    i, j = alphabet.index(from_node), alphabet.index(to_node)
    Q[i, j] = 1
# == Create the corresponding Markov matrix P == #
P = np.empty((n, n))
for i in range(n):
    P[i, :] = Q[i, :] / Q[i, :].sum()
mc = MarkovChain(P)
# == Compute the stationary distribution r == #
r = mc.stationary_distributions[0]
ranked_pages = {alphabet[i] : r[i] for i in range(n)}
# == Print solution, sorted from highest to lowest rank == #
print('Rankings\n ***')
for name, rank in sorted(ranked_pages.items(), key=itemgetter(1), reverse=True):
    print(f'{name}: {rank:.4}')

Rankings
***
g: 0.1607
j: 0.1594
m: 0.1195
n: 0.1088
k: 0.09106
b: 0.08326
e: 0.05312
i: 0.05312
c: 0.04834
h: 0.0456
l: 0.03202
d: 0.03056
f: 0.01164
a: 0.002911

26.11.3 Exercise 3

A solution from the QuantEcon.py library can be found here
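For readers who want something self-contained, the three rules stated in the exercise translate fairly directly into code. The following is only a sketch under that reading of the exercise, not the QuantEcon.py routine: the function name and signature come from the exercise statement, while the use of `statistics.NormalDist` for the normal cdf 𝐹 and the explicit loops are choices made here for clarity.

```python
import numpy as np
from statistics import NormalDist

def approx_markov(rho, sigma_u, m=3, n=7):
    """
    Discretize y_{t+1} = rho * y_t + u_{t+1}, u ~ N(0, sigma_u^2),
    following the rules stated in the exercise.
    Returns the state values x (length n) and the n x n matrix P.
    """
    F = NormalDist(mu=0.0, sigma=sigma_u).cdf        # cdf of N(0, sigma_u^2)
    sigma_y = np.sqrt(sigma_u**2 / (1 - rho**2))     # stationary std of y
    x = np.linspace(-m * sigma_y, m * sigma_y, n)    # evenly spaced states
    s = x[1] - x[0]                                  # step size (x_{n-1} - x_0) / (n - 1)
    half = 0.5 * s

    P = np.empty((n, n))
    for i in range(n):
        # Rule 1: left end point
        P[i, 0] = F(x[0] - rho * x[i] + half)
        # Rule 2: right end point
        P[i, n - 1] = 1 - F(x[n - 1] - rho * x[i] - half)
        # Rule 3: interior points
        for j in range(1, n - 1):
            z = x[j] - rho * x[i]
            P[i, j] = F(z + half) - F(z - half)
    return x, P

x, P = approx_markov(0.9, 0.1)
print(x)
print(P.sum(axis=1))     # each row should sum to one
```

The row sums equal one because the cdf terms telescope across neighbouring states.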


Footnotes

[1] Hint: First show that if 𝑃 and 𝑄 are stochastic matrices then so is their product — to
check the row sums, try post multiplying by a column vector of ones. Finally, argue that 𝑃 𝑛
is a stochastic matrix using induction.
27

Continuous State Markov Chains

27.1 Contents

• Overview 27.2

• The Density Case 27.3

• Beyond Densities 27.4

• Stability 27.5

• Exercises 27.6

• Solutions 27.7

• Appendix 27.8

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

27.2 Overview

In a previous lecture, we learned about finite Markov chains, a relatively elementary class of
stochastic dynamic models
The present lecture extends this analysis to continuous (i.e., uncountable) state Markov
chains
Most stochastic dynamic models studied by economists either fit directly into this class or can
be represented as continuous state Markov chains after minor modifications
In this lecture, our focus will be on continuous Markov models that

• evolve in discrete-time
• are often nonlinear

The fact that we accommodate nonlinear models here is significant, because linear stochastic
models have their own highly developed toolset, as we’ll see later on


The question that interests us most is: Given a particular stochastic dynamic model, how will
the state of the system evolve over time?
In particular,

• What happens to the distribution of the state variables?


• Is there anything we can say about the “average behavior” of these variables?
• Is there a notion of “steady state” or “long-run equilibrium” that’s applicable to the
model?

– If so, how can we compute it?

Answering these questions will lead us to revisit many of the topics that occupied us in the
finite state case, such as simulation, distribution dynamics, stability, ergodicity, etc.

Note
For some people, the term “Markov chain” always refers to a process with a finite
or discrete state space. We follow the mainstream mathematical literature (e.g.,
[95]) in using the term to refer to any discrete time Markov process

27.3 The Density Case

You are probably aware that some distributions can be represented by densities and some
cannot
(For example, distributions on the real numbers R that put positive probability on individual
points have no density representation)
We are going to start our analysis by looking at Markov chains where the one-step transition
probabilities have density representations
The benefit is that the density case offers a very direct parallel to the finite case in terms of
notation and intuition
Once we’ve built some intuition we’ll cover the general case

27.3.1 Definitions and Basic Properties

In our lecture on finite Markov chains, we studied discrete-time Markov chains that evolve on
a finite state space 𝑆
In this setting, the dynamics of the model are described by a stochastic matrix, that is, a
nonnegative square matrix 𝑃 = 𝑃 [𝑖, 𝑗] such that each row 𝑃 [𝑖, ⋅] sums to one
The interpretation of 𝑃 is that 𝑃 [𝑖, 𝑗] represents the probability of transitioning from state 𝑖
to state 𝑗 in one unit of time
In symbols,

P{𝑋𝑡+1 = 𝑗 | 𝑋𝑡 = 𝑖} = 𝑃 [𝑖, 𝑗]

Equivalently,

• 𝑃 can be thought of as a family of distributions 𝑃 [𝑖, ⋅], one for each 𝑖 ∈ 𝑆


• 𝑃 [𝑖, ⋅] is the distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑖

(As you probably recall, when using NumPy arrays, 𝑃 [𝑖, ⋅] is expressed as P[i, :])
In this section, we’ll allow 𝑆 to be a subset of R, such as

• R itself
• the positive reals (0, ∞)
• a bounded interval (𝑎, 𝑏)

The family of discrete distributions 𝑃 [𝑖, ⋅] will be replaced by a family of densities 𝑝(𝑥, ⋅), one
for each 𝑥 ∈ 𝑆
Analogous to the finite state case, 𝑝(𝑥, ⋅) is to be understood as the distribution (density) of
𝑋𝑡+1 given 𝑋𝑡 = 𝑥
More formally, a stochastic kernel on 𝑆 is a function 𝑝 ∶ 𝑆 × 𝑆 → R with the property that

1. 𝑝(𝑥, 𝑦) ≥ 0 for all 𝑥, 𝑦 ∈ 𝑆


2. ∫ 𝑝(𝑥, 𝑦)𝑑𝑦 = 1 for all 𝑥 ∈ 𝑆

(Integrals are over the whole space unless otherwise specified)


For example, let 𝑆 = R and consider the particular stochastic kernel 𝑝𝑤 defined by

$$ p_w(x, y) := \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{(y - x)^2}{2} \right\} \tag{1} $$

What kind of model does 𝑝𝑤 represent?


The answer is, the (normally distributed) random walk

$$ X_{t+1} = X_t + \xi_{t+1} \quad \text{where} \quad \{\xi_t\} \stackrel{\text{IID}}{\sim} N(0, 1) \tag{2} $$

To see this, let’s find the stochastic kernel 𝑝 corresponding to Eq. (2)
Recall that 𝑝(𝑥, ⋅) represents the distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥
Letting 𝑋𝑡 = 𝑥 in Eq. (2) and considering the distribution of 𝑋𝑡+1 , we see that 𝑝(𝑥, ⋅) =
𝑁 (𝑥, 1)
In other words, 𝑝 is exactly 𝑝𝑤 , as defined in Eq. (1)

27.3.2 Connection to Stochastic Difference Equations

In the previous section, we made the connection between stochastic difference equation
Eq. (2) and stochastic kernel Eq. (1)
In economics and time-series analysis we meet stochastic difference equations of all different
shapes and sizes
It will be useful for us if we have some systematic methods for converting stochastic difference
equations into stochastic kernels

To this end, consider the generic (scalar) stochastic difference equation given by

𝑋𝑡+1 = 𝜇(𝑋𝑡 ) + 𝜎(𝑋𝑡 ) 𝜉𝑡+1 (3)

Here we assume that

• {𝜉𝑡 } is an IID sequence with 𝜉𝑡 ∼ 𝜙, where 𝜙 is a given density on R
• 𝜇 and 𝜎 are given functions on 𝑆, with 𝜎(𝑥) > 0 for all 𝑥

Example 1: The random walk Eq. (2) is a special case of Eq. (3), with 𝜇(𝑥) = 𝑥 and 𝜎(𝑥) =
1
Example 2: Consider the ARCH model

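$$ X_{t+1} = \alpha X_t + \sigma_t \xi_{t+1}, \qquad \sigma_t^2 = \beta + \gamma X_t^2, \qquad \beta, \gamma > 0 $$

Alternatively, we can write the model as

$$ X_{t+1} = \alpha X_t + (\beta + \gamma X_t^2)^{1/2} \xi_{t+1} \tag{4} $$

This is a special case of Eq. (3) with $\mu(x) = \alpha x$ and $\sigma(x) = (\beta + \gamma x^2)^{1/2}$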
Example 3: With stochastic production and a constant savings rate, the one-sector
neoclassical growth model leads to a law of motion for capital per worker such as

𝑘𝑡+1 = 𝑠𝐴𝑡+1 𝑓(𝑘𝑡 ) + (1 − 𝛿)𝑘𝑡 (5)

Here

• 𝑠 is the rate of savings

• 𝐴𝑡+1 is a production shock

– The 𝑡 + 1 subscript indicates that 𝐴𝑡+1 is not visible at time 𝑡

• 𝛿 is a depreciation rate

• 𝑓 ∶ R+ → R+ is a production function satisfying 𝑓(𝑘) > 0 whenever 𝑘 > 0

(The fixed savings rate can be rationalized as the optimal policy for a particular set of
technologies and preferences (see [87], section 3.1.2), although we omit the details here)
Eq. (5) is a special case of Eq. (3) with 𝜇(𝑥) = (1 − 𝛿)𝑥 and 𝜎(𝑥) = 𝑠𝑓(𝑥)
Now let’s obtain the stochastic kernel corresponding to the generic model Eq. (3)
To find it, note first that if 𝑈 is a random variable with density 𝑓𝑈 , and 𝑉 = 𝑎 + 𝑏𝑈 for some
constants 𝑎, 𝑏 with 𝑏 > 0, then the density of 𝑉 is given by

$$ f_V(v) = \frac{1}{b} f_U\left( \frac{v - a}{b} \right) \tag{6} $$

(The proof is below. For a multidimensional version see EDTC, theorem 8.1.3)

Taking Eq. (6) as given for the moment, we can obtain the stochastic kernel 𝑝 for Eq. (3) by
recalling that 𝑝(𝑥, ⋅) is the conditional density of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥
In the present case, this is equivalent to stating that 𝑝(𝑥, ⋅) is the density of 𝑌 ∶= 𝜇(𝑥) +
𝜎(𝑥) 𝜉𝑡+1 when 𝜉𝑡+1 ∼ 𝜙
Hence, by Eq. (6),

$$ p(x, y) = \frac{1}{\sigma(x)} \phi\left( \frac{y - \mu(x)}{\sigma(x)} \right) \tag{7} $$

For example, the growth model in Eq. (5) has stochastic kernel

$$ p(x, y) = \frac{1}{s f(x)} \phi\left( \frac{y - (1 - \delta) x}{s f(x)} \right) \tag{8} $$

where 𝜙 is the density of 𝐴𝑡+1


(Regarding the state space 𝑆 for this model, a natural choice is (0, ∞), in which case
𝜎(𝑥) = 𝑠𝑓(𝑥) is strictly positive for all 𝑥 ∈ 𝑆 as required)
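To make Eq. (7) concrete, here is a small sketch that builds a stochastic kernel from given functions 𝜇, 𝜎 and a shock density 𝜙, and then checks numerically that 𝑝(𝑥, ⋅) integrates to one. The helper name `make_kernel`, the ARCH parameter values and the integration grid are all hypothetical choices made for illustration.

```python
import numpy as np

def make_kernel(μ, σ, ϕ):
    "Return p(x, y) = ϕ((y - μ(x)) / σ(x)) / σ(x), as in Eq. (7)."
    def p(x, y):
        return ϕ((y - μ(x)) / σ(x)) / σ(x)
    return p

def std_normal_pdf(z):
    "Density of N(0, 1)."
    return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

# ARCH model of Eq. (4) with hypothetical values for α, β, γ
a, b, c = 0.5, 1.0, 0.3
p_arch = make_kernel(lambda x: a * x,
                     lambda x: np.sqrt(b + c * x**2),
                     std_normal_pdf)

# Random walk of Eq. (2): μ(x) = x and σ(x) = 1
p_rw = make_kernel(lambda x: x, lambda x: 1.0, std_normal_pdf)

# Riemann-sum check that p(x, ·) is a density for a fixed x
y = np.linspace(-30, 30, 20_001)
dy = y[1] - y[0]
mass = p_arch(2.0, y).sum() * dy
print(mass)
```

The random walk kernel built this way coincides with 𝑝𝑤 from Eq. (1), which gives a simple cross-check of the construction.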

27.3.3 Distribution Dynamics

In this section of our lecture on finite Markov chains, we asked the following question: If

1. {𝑋𝑡 } is a Markov chain with stochastic matrix 𝑃


2. the distribution of 𝑋𝑡 is known to be 𝜓𝑡

then what is the distribution of 𝑋𝑡+1 ?


Letting 𝜓𝑡+1 denote the distribution of 𝑋𝑡+1 , the answer we gave was that

$$ \psi_{t+1}[j] = \sum_{i \in S} P[i, j] \psi_t[i] $$

This intuitive equality states that the probability of being at 𝑗 tomorrow is the probability of
visiting 𝑖 today and then going on to 𝑗, summed over all possible 𝑖
In the density case, we just replace the sum with an integral and probability mass functions
with densities, yielding

𝜓𝑡+1 (𝑦) = ∫ 𝑝(𝑥, 𝑦)𝜓𝑡 (𝑥) 𝑑𝑥, ∀𝑦 ∈ 𝑆 (9)

It is convenient to think of this updating process in terms of an operator


(An operator is just a function, but the term is usually reserved for a function that sends
functions into functions)
Let 𝒟 be the set of all densities on 𝑆, and let 𝑃 be the operator from 𝒟 to itself that takes
density 𝜓 and sends it into new density 𝜓𝑃 , where the latter is defined by

(𝜓𝑃 )(𝑦) = ∫ 𝑝(𝑥, 𝑦)𝜓(𝑥)𝑑𝑥 (10)



This operator is usually called the Markov operator corresponding to 𝑝

Note
Unlike most operators, we write 𝑃 to the right of its argument, instead of to the
left (i.e., 𝜓𝑃 instead of 𝑃 𝜓). This is a common convention, with the intention
being to maintain the parallel with the finite case (see here)

With this notation, we can write Eq. (9) more succinctly as 𝜓𝑡+1 (𝑦) = (𝜓𝑡 𝑃 )(𝑦) for all 𝑦, or,
dropping the 𝑦 and letting “=” indicate equality of functions,

𝜓𝑡+1 = 𝜓𝑡 𝑃 (11)

Eq. (11) tells us that if we specify a distribution for 𝜓0 , then the entire sequence of
future distributions can be obtained by iterating with 𝑃
It’s interesting to note that Eq. (11) is a deterministic difference equation
Thus, by converting a stochastic difference equation such as Eq. (3) into a stochastic kernel 𝑝
and hence an operator 𝑃 , we convert a stochastic difference equation into a deterministic one
(albeit in a much higher dimensional space)
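To see the parallel with the finite case, the Markov operator can be approximated on a grid, so that 𝜓𝑃 becomes an ordinary vector-matrix product. The sketch below does this for the random walk kernel 𝑝𝑤 ; the truncation of the state space to [−10, 10] and the grid size are assumptions made purely for illustration.

```python
import numpy as np

def normal_pdf(z, var=1.0):
    "Density of N(0, var) evaluated at z."
    return np.exp(-z**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

grid = np.linspace(-10, 10, 1001)      # truncated, discretized state space
dx = grid[1] - grid[0]

# K[i, j] = p_w(x_i, y_j), the random walk kernel evaluated on the grid
K = normal_pdf(grid[np.newaxis, :] - grid[:, np.newaxis])

ψ_0 = normal_pdf(grid)                 # initial density: N(0, 1)
ψ_1 = (ψ_0 @ K) * dx                   # Riemann-sum version of (ψP)(y) = ∫ p(x, y) ψ(x) dx

exact = normal_pdf(grid, var=2.0)      # X_1 = X_0 + ξ_1 is N(0, 2)
print(np.max(np.abs(ψ_1 - exact)))
```

Repeating the last update step iterates Eq. (11) forward, although a grid of this kind is only a rough stand-in for the true operator.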

Note
Some people might be aware that discrete Markov chains are in fact a special case
of the continuous Markov chains we have just described. The reason is that
probability mass functions are densities with respect to the counting measure.

27.3.4 Computation

To learn about the dynamics of a given process, it’s useful to compute and study the
sequences of densities generated by the model
One way to do this is to try to implement the iteration described by Eq. (10) and Eq. (11)
using numerical integration
However, to produce 𝜓𝑃 from 𝜓 via Eq. (10), you would need to integrate at every 𝑦, and
there is a continuum of such 𝑦
Another possibility is to discretize the model, but this introduces errors of unknown size
A nicer alternative in the present setting is to combine simulation with an elegant estimator
called the look-ahead estimator
Let’s go over the ideas with reference to the growth model discussed above, the dynamics of
which we repeat here for convenience:

𝑘𝑡+1 = 𝑠𝐴𝑡+1 𝑓(𝑘𝑡 ) + (1 − 𝛿)𝑘𝑡 (12)

Our aim is to compute the sequence {𝜓𝑡 } associated with this model and fixed initial
condition 𝜓0
To approximate 𝜓𝑡 by simulation, recall that, by definition, 𝜓𝑡 is the density of 𝑘𝑡 given 𝑘0 ∼
𝜓0
If we wish to generate observations of this random variable, all we need to do is

1. draw 𝑘0 from the specified initial condition 𝜓0


2. draw the shocks 𝐴1 , … , 𝐴𝑡 from their specified density 𝜙
3. compute 𝑘𝑡 iteratively via Eq. (12)

If we repeat this 𝑛 times, we get 𝑛 independent observations $k_t^1, \ldots, k_t^n$


With these draws in hand, the next step is to generate some kind of representation of their
distribution 𝜓𝑡
A naive approach would be to use a histogram, or perhaps a smoothed histogram using
SciPy’s gaussian_kde function
However, in the present setting, there is a much better way to do this, based on the look-
ahead estimator
With this estimator, to construct an estimate of 𝜓𝑡 , we actually generate 𝑛 observations of
𝑘𝑡−1 , rather than 𝑘𝑡
Now we take these 𝑛 observations $k_{t-1}^1, \ldots, k_{t-1}^n$ and form the estimate

$$ \psi_t^n(y) = \frac{1}{n} \sum_{i=1}^n p(k_{t-1}^i, y) \tag{13} $$

where 𝑝 is the growth model stochastic kernel in Eq. (8)


What is the justification for this slightly surprising estimator?
The idea is that, by the strong law of large numbers,

$$ \frac{1}{n} \sum_{i=1}^n p(k_{t-1}^i, y) \to \mathbb{E}\, p(k_{t-1}^i, y) = \int p(x, y) \psi_{t-1}(x) \, dx = \psi_t(y) $$

with probability one as 𝑛 → ∞


Here the first equality is by the definition of 𝜓𝑡−1 , and the second is by Eq. (9)
We have just shown that our estimator 𝜓𝑡𝑛 (𝑦) in Eq. (13) converges almost surely to 𝜓𝑡 (𝑦),
which is just what we want to compute
In fact, much stronger convergence results are true (see, for example, this paper)

27.3.5 Implementation

A class called LAE for estimating densities by this technique can be found in lae.py
Given our use of the __call__ method, an instance of LAE acts as a callable object, which
is essentially a function that can store its own data (see this discussion)
This function returns the right-hand side of Eq. (13) using

• the data and stochastic kernel that it stores as its instance data
• the value 𝑦 as its argument

The function is vectorized, in the sense that if psi is such an instance and y is an array, then
the call psi(y) acts elementwise

(This is the reason that we reshaped X and y inside the class — to make vectorization work)
Because the implementation is fully vectorized, it is about as efficient as it would be in C or
Fortran
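To fix ideas, here is a minimal sketch of what such a callable class might look like. It is written for this note and is not the code in lae.py: the class name `SimpleLAE` is hypothetical, and the example applies it to the random walk kernel 𝑝𝑤 from Eq. (1), for which the true next-period density is known when 𝑋𝑡 is standard normal.

```python
import numpy as np

class SimpleLAE:
    "Look-ahead estimator: averages p(X_i, y) over the stored draws X_i."

    def __init__(self, p, X):
        self.p = p
        self.X = np.asarray(X).reshape(-1, 1)      # n x 1 column of observations

    def __call__(self, y):
        y = np.asarray(y).reshape(1, -1)           # 1 x k row of evaluation points
        # Broadcasting gives an n x k array; average over the n observations
        return self.p(self.X, y).mean(axis=0)

def p_w(x, y):
    "Random walk kernel from Eq. (1)."
    return np.exp(-(y - x)**2 / 2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(0)
X = rng.standard_normal(5_000)                     # draws of X_t ~ N(0, 1)
ψ_hat = SimpleLAE(p_w, X)                          # estimate of the density of X_{t+1}

y = np.linspace(-3, 3, 7)
print(ψ_hat(y))
print(np.exp(-y**2 / 4) / np.sqrt(4 * np.pi))      # exact N(0, 2) density for comparison
```

Reshaping the data into a column and the evaluation points into a row is what lets a single call handle a whole array of 𝑦 values at once.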

27.3.6 Example

The following code is an example of usage for the stochastic growth model described above

In [2]: import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import lognorm, beta
from quantecon import LAE

# == Define parameters == #
s = 0.2
δ = 0.1
a_σ = 0.4                    # A = exp(B) where B ~ N(0, a_σ**2)
α = 0.4                      # We set f(k) = k**α
ψ_0 = beta(5, 5, scale=0.5)  # Initial distribution
ϕ = lognorm(a_σ)

def p(x, y):
    """
    Stochastic kernel for the growth model with Cobb-Douglas production.
    Both x and y must be strictly positive.
    """
    d = s * x**α
    return ϕ.pdf((y - (1 - δ) * x) / d) / d

n = 10000  # Number of observations at each date t
T = 30     # Compute density of k_t at 1,...,T

# == Generate matrix s.t. t-th column is n observations of k_t == #
k = np.empty((n, T))
A = ϕ.rvs((n, T))
k[:, 0] = ψ_0.rvs(n)  # Draw first column from initial distribution
for t in range(T-1):
    k[:, t+1] = s * A[:, t] * k[:, t]**α + (1 - δ) * k[:, t]

# == Generate T instances of LAE using this data, one for each date t == #
laes = [LAE(p, k[:, t]) for t in range(T)]

# == Plot == #
fig, ax = plt.subplots()
ygrid = np.linspace(0.01, 4.0, 200)
greys = [str(g) for g in np.linspace(0.0, 0.8, T)]
greys.reverse()
for ψ, g in zip(laes, greys):
    ax.plot(ygrid, ψ(ygrid), color=g, lw=2, alpha=0.6)
ax.set_xlabel('capital')
ax.set_title(f'Density of $k_1$ (lighter) to $k_T$ (darker) for $T={T}$')
plt.show()

The figure shows part of the density sequence {𝜓𝑡 }, with each density computed via the
look-ahead estimator
Notice that the sequence of densities shown in the figure seems to be converging — more on
this in just a moment
Another quick comment is that each of these distributions could be interpreted as a
cross-sectional distribution (recall this discussion)

27.4 Beyond Densities

Up until now, we have focused exclusively on continuous state Markov chains where all
conditional distributions 𝑝(𝑥, ⋅) are densities
As discussed above, not all distributions can be represented as densities
If the conditional distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥 cannot be represented as a density for
some 𝑥 ∈ 𝑆, then we need a slightly different theory
The ultimate option is to switch from densities to probability measures, but not all readers
will be familiar with measure theory
We can, however, construct a fairly general theory using distribution functions

27.4.1 Example and Definitions

To illustrate the issues, recall that Hopenhayn and Rogerson [67] study a model of firm
dynamics where individual firm productivity follows the exogenous process

$$ X_{t+1} = a + \rho X_t + \xi_{t+1}, \quad \text{where} \quad \{\xi_t\} \stackrel{\text{IID}}{\sim} N(0, \sigma^2) $$

As is, this fits into the density case we treated above


However, the authors wanted this process to take values in [0, 1], so they added boundaries at
the endpoints 0 and 1
One way to write this is

𝑋𝑡+1 = ℎ(𝑎 + 𝜌𝑋𝑡 + 𝜉𝑡+1 ) where ℎ(𝑥) ∶= 𝑥 1{0 ≤ 𝑥 ≤ 1} + 1{𝑥 > 1}

If you think about it, you will see that for any given 𝑥 ∈ [0, 1], the conditional distribution of
𝑋𝑡+1 given 𝑋𝑡 = 𝑥 puts positive probability mass on 0 and 1
Hence it cannot be represented as a density
What we can do instead is use cumulative distribution functions (cdfs)
To this end, set

𝐺(𝑥, 𝑦) ∶= P{ℎ(𝑎 + 𝜌𝑥 + 𝜉𝑡+1 ) ≤ 𝑦} (0 ≤ 𝑥, 𝑦 ≤ 1)

This family of cdfs 𝐺(𝑥, ⋅) plays a role analogous to the stochastic kernel in the density case
The distribution dynamics in Eq. (9) are then replaced by

𝐹𝑡+1 (𝑦) = ∫ 𝐺(𝑥, 𝑦)𝐹𝑡 (𝑑𝑥) (14)

Here 𝐹𝑡 and 𝐹𝑡+1 are cdfs representing the distribution of the current state and next period
state
The intuition behind Eq. (14) is essentially the same as for Eq. (9)

27.4.2 Computation

If you wish to compute these cdfs, you cannot use the look-ahead estimator as before
Indeed, you should not use any density estimator, since the objects you are
estimating/computing are not densities
One good option is simulation as before, combined with the empirical distribution function
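A minimal sketch of this approach for the bounded productivity process above follows. The parameter values, the initial distribution, the number of draws and the horizon are all hypothetical; the function ℎ is implemented with `np.clip`, which maps values below 0 to 0 and values above 1 to 1.

```python
import numpy as np

a, ρ, σ = 0.1, 0.8, 0.2              # hypothetical parameter values
n, T = 10_000, 50                    # number of simulated firms and time horizon
rng = np.random.default_rng(1)

X = rng.uniform(0, 1, size=n)        # hypothetical draws from the initial distribution
for _ in range(T):
    # X_{t+1} = h(a + ρ X_t + ξ_{t+1}), with h clipping to [0, 1]
    X = np.clip(a + ρ * X + σ * rng.standard_normal(n), 0.0, 1.0)

def ecdf(y):
    "Empirical distribution function: fraction of draws less than or equal to y."
    return np.mean(X <= y)

for y in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(y, ecdf(y))
```

The strictly positive value of `ecdf(0.0)` reflects the probability mass at the lower boundary, which is exactly what a density cannot represent.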

27.5 Stability

In our lecture on finite Markov chains, we also studied stationarity, stability and ergodicity
Here we will cover the same topics for the continuous case
We will, however, treat only the density case (as in this section), where the stochastic kernel
is a family of densities
The general case is relatively similar — references are given below

27.5.1 Theoretical Results

Analogous to the finite case, given a stochastic kernel 𝑝 and corresponding Markov operator
as defined in Eq. (10), a density 𝜓∗ on 𝑆 is called stationary for 𝑃 if it is a fixed point of the
operator 𝑃
In other words,

𝜓∗ (𝑦) = ∫ 𝑝(𝑥, 𝑦)𝜓∗ (𝑥) 𝑑𝑥, ∀𝑦 ∈ 𝑆 (15)

As with the finite case, if 𝜓∗ is stationary for 𝑃 , and the distribution of 𝑋0 is 𝜓∗ , then, in
view of Eq. (11), 𝑋𝑡 will have this same distribution for all 𝑡
Hence 𝜓∗ is the stochastic equivalent of a steady state
In the finite case, we learned that at least one stationary distribution exists, although there
may be many
When the state space is infinite, the situation is more complicated
Even existence can fail very easily
For example, the random walk model has no stationary density (see, e.g., EDTC, p. 210)
However, there are well-known conditions under which a stationary density 𝜓∗ exists
With additional conditions, we can also get a unique stationary density (𝜓 ∈ 𝒟 and 𝜓 =
𝜓𝑃 ⟹ 𝜓 = 𝜓∗ ), and also global convergence in the sense that

$$ \forall \, \psi \in \mathcal{D}, \quad \psi P^t \to \psi^* \quad \text{as } t \to \infty \tag{16} $$

This combination of existence, uniqueness and global convergence in the sense of Eq. (16) is
often referred to as global stability
Under very similar conditions, we get ergodicity, which means that

$$ \frac{1}{n} \sum_{t=1}^n h(X_t) \to \int h(x) \psi^*(x) \, dx \quad \text{as } n \to \infty \tag{17} $$

for any (measurable) function ℎ ∶ 𝑆 → R such that the right-hand side is finite
Note that the convergence in Eq. (17) does not depend on the distribution (or value) of 𝑋0
This is actually very important for simulation: it means we can learn about 𝜓∗ (i.e.,
approximate the right-hand side of Eq. (17) via the left-hand side) without requiring any special
knowledge about what to do with 𝑋0
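The following sketch illustrates this independence from 𝑋0 for a simple case, assuming the linear Gaussian model 𝑋𝑡+1 = 𝜌𝑋𝑡 + 𝜉𝑡+1 with ℎ(𝑥) = 𝑥²
For that model the right-hand side of Eq. (17) equals 1/(1 − 𝜌²); the model, the parameter values and the function name `time_average` are assumptions made for illustration

```python
import numpy as np

ρ, n = 0.5, 100_000
rng = np.random.default_rng(1)

def time_average(x0):
    "Sample mean of h(X_t) = X_t**2 along one simulated path started at x0."
    x, total = x0, 0.0
    shocks = rng.standard_normal(n).tolist()
    for ξ in shocks:
        x = ρ * x + ξ
        total += x**2
    return total / n

target = 1 / (1 - ρ**2)     # ∫ h(x) ψ*(x) dx when ψ* is N(0, 1/(1 - ρ²))
averages = [time_average(x0) for x0 in (-20.0, 0.0, 20.0)]
print(target, averages)
```

All three time averages should land close to the same target, even though the starting points are far apart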
So what are these conditions we require to get global stability and ergodicity?
In essence, it must be the case that

1. Probability mass does not drift off to the “edges” of the state space
2. Sufficient “mixing” obtains

For one such set of conditions see theorem 8.2.14 of EDTC


In addition

• [123] contains a classic (but slightly outdated) treatment of these topics


• From the mathematical literature, [82] and [95] give outstanding in-depth treatments
• Section 8.1.2 of EDTC provides detailed intuition, and section 8.3 gives additional
references
• EDTC, section 11.3.4 provides a specific treatment for the growth model we considered
in this lecture

27.5.2 An Example of Stability

As stated above, the growth model treated here is stable under mild conditions on the
primitives

• See EDTC, section 11.3.4 for more details

We can see this stability in action, in particular the convergence in Eq. (16), by
simulating the path of densities from various initial conditions
Here is such a figure

All sequences are converging towards the same limit, regardless of their initial condition
The details regarding initial conditions and so on are given in this exercise, where you are
asked to replicate the figure

27.5.3 Computing Stationary Densities

In the preceding figure, each sequence of densities is converging towards the unique stationary
density 𝜓∗
Even from this figure, we can get a fair idea what 𝜓∗ looks like, and where its mass is located
However, there is a much more direct way to estimate the stationary density, and it involves
only a slight modification of the look-ahead estimator

Let’s say that we have a model of the form Eq. (3) that is stable and ergodic
Let 𝑝 be the corresponding stochastic kernel, as given in Eq. (7)
To approximate the stationary density 𝜓∗ , we can simply generate a long time-series
𝑋0 , 𝑋1 , … , 𝑋𝑛 and estimate 𝜓∗ via

$$ \psi_n^*(y) = \frac{1}{n} \sum_{t=1}^n p(X_t, y) \tag{18} $$

This is essentially the same as the look-ahead estimator Eq. (13), except that now the
observations we generate are a single time-series, rather than a cross-section
The justification for Eq. (18) is that, with probability one as 𝑛 → ∞,

$$ \frac{1}{n} \sum_{t=1}^n p(X_t, y) \to \int p(x, y) \psi^*(x) \, dx = \psi^*(y) $$

where the convergence is by Eq. (17) and the equality on the right is by Eq. (15)
The right-hand side is exactly what we want to compute
On top of this asymptotic result, it turns out that the rate of convergence for the look-ahead
estimator is very good
The first exercise helps illustrate this point
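Before turning to the exercise, here is a minimal NumPy-only sketch of the estimator in Eq. (18) for a case where the answer is known
It assumes the linear Gaussian model 𝑋𝑡+1 = 𝜌𝑋𝑡 + 𝜉𝑡+1 with 𝜉𝑡 ∼ 𝑁(0, 1), whose stationary density is 𝑁(0, 1/(1 − 𝜌²)); the model, the parameter values and the names `normal_pdf` and `ψ_n` are illustrative assumptions

```python
import numpy as np

def normal_pdf(z, scale=1.0):
    "Density of N(0, scale**2) evaluated at z."
    return np.exp(-0.5 * (z / scale)**2) / (scale * np.sqrt(2 * np.pi))

ρ, n = 0.5, 2000
rng = np.random.default_rng(2)

# Simulate one long time series X_1, ..., X_n
X = np.empty(n)
x = 0.0
for t in range(n):
    x = ρ * x + rng.standard_normal()
    X[t] = x

def ψ_n(y):
    "Look-ahead estimate as in Eq. (18): average of p(X_t, y) over the path."
    y = np.atleast_1d(y)
    # p(x, y) = normal_pdf(y - ρ x); rows index observations, columns index y
    return normal_pdf(y[None, :] - ρ * X[:, None]).mean(axis=0)

y_grid = np.linspace(-3, 3, 61)
ψ_true = normal_pdf(y_grid, 1 / np.sqrt(1 - ρ**2))
max_err = np.max(np.abs(ψ_n(y_grid) - ψ_true))
print(max_err)
```

Because each term 𝑝(𝑋𝑡 , 𝑦) is itself a smooth density in 𝑦, the average is smooth even for moderate 𝑛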

27.6 Exercises

27.6.1 Exercise 1

Consider the simple threshold autoregressive model

$$ X_{t+1} = \theta |X_t| + (1 - \theta^2)^{1/2} \xi_{t+1} \quad \text{where} \quad \{\xi_t\} \stackrel{\textrm{IID}}{\sim} N(0, 1) \tag{19} $$

This is one of those rare nonlinear stochastic models where an analytical expression for the
stationary density is available
In particular, provided that |𝜃| < 1, there is a unique stationary density 𝜓∗ given by

$$ \psi^*(y) = 2 \, \phi(y) \, \Phi \left[ \frac{\theta y}{(1 - \theta^2)^{1/2}} \right] \tag{20} $$

Here 𝜙 is the standard normal density and Φ is the standard normal cdf
As an exercise, compute the look-ahead estimate of 𝜓∗ , as defined in Eq. (18), and compare it
with 𝜓∗ in Eq. (20) to see whether they are indeed close for large 𝑛
In doing so, set 𝜃 = 0.8 and 𝑛 = 500
The next figure shows the result of such a computation

The additional density (black line) is a nonparametric kernel density estimate, added to the
solution for illustration
(You can try to replicate it before looking at the solution if you want to)
As you can see, the look-ahead estimator is a much tighter fit than the kernel density
estimator
If you repeat the simulation you will see that this is consistently the case

27.6.2 Exercise 2

Replicate the figure on global convergence shown above


The densities come from the stochastic growth model treated at the start of the lecture
Begin with the code found in stochasticgrowth.py
Use the same parameters
For the four initial distributions, use the shifted beta distributions

ψ_0 = beta(5, 5, scale=0.5, loc=i*2)

27.6.3 Exercise 3

A common way to compare distributions visually is with boxplots


To illustrate, let’s generate three artificial data sets and compare them with a boxplot
The three data sets we will use are:

{𝑋1 , … , 𝑋𝑛 } ∼ 𝐿𝑁 (0, 1), {𝑌1 , … , 𝑌𝑛 } ∼ 𝑁 (2, 1), and {𝑍1 , … , 𝑍𝑛 } ∼ 𝑁 (4, 1),

Here is the code and figure:



In [3]: n = 500
x = np.random.randn(n) # N(0, 1)
x = np.exp(x) # Map x to lognormal
y = np.random.randn(n) + 2.0 # N(2, 1)
z = np.random.randn(n) + 4.0 # N(4, 1)

fig, ax = plt.subplots(figsize=(10, 6.6))


ax.boxplot([x, y, z])
ax.set_xticks((1, 2, 3))
ax.set_ylim(-2, 14)
ax.set_xticklabels(('$X$', '$Y$', '$Z$'), fontsize=16)
plt.show()

Each data set is represented by a box, where the top and bottom of the box are the third and
first quartiles of the data, and the red line in the center is the median
The boxes give some indication as to

• the location of probability mass for each sample


• whether the distribution is right-skewed (as is the lognormal distribution), etc
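The quantities that define each box can also be computed directly
The sketch below does this for a lognormal sample like the one above; the seed and variable names are arbitrary choices made for illustration

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.exp(rng.standard_normal(500))      # Lognormal sample, as in the example above

# First quartile, median and third quartile: the bottom, center line and top of the box
q1, med, q3 = np.percentile(x, [25, 50, 75])

# For a right-skewed sample the upper half of the box is taller than the lower half
print(q1, med, q3, (q3 - med) > (med - q1))
```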

Now let’s put these ideas to use in a simulation


Consider the threshold autoregressive model in Eq. (19)
We know that the distribution of 𝑋𝑡 will converge to Eq. (20) whenever |𝜃| < 1
Let’s observe this convergence from different initial conditions using boxplots
In particular, the exercise is to generate J boxplot figures, one for each initial condition 𝑋0 in

initial_conditions = np.linspace(8, 0, J)

For each 𝑋0 in this set,



1. Generate 𝑘 time-series of length 𝑛, each starting at 𝑋0 and obeying Eq. (19)


2. Create a boxplot representing 𝑛 distributions, where the 𝑡-th distribution shows the 𝑘
observations of 𝑋𝑡

Use 𝜃 = 0.9, 𝑛 = 20, 𝑘 = 5000, 𝐽 = 8

27.7 Solutions

27.7.1 Exercise 1

Look-ahead estimation of a TAR stationary density, where the TAR model is

$$ X_{t+1} = \theta |X_t| + (1 - \theta^2)^{1/2} \xi_{t+1} $$

and 𝜉𝑡 ∼ 𝑁 (0, 1)
Try running at n = 10, 100, 1000, 10000 to get an idea of the speed of convergence

In [4]: from scipy.stats import norm, gaussian_kde

ϕ = norm()
n = 500
θ = 0.8
# == Frequently used constants == #
d = np.sqrt(1 - θ**2)
δ = θ / d

def ψ_star(y):
    "True stationary density of the TAR Model"
    return 2 * norm.pdf(y) * norm.cdf(δ * y)

def p(x, y):
    "Stochastic kernel for the TAR model."
    return ϕ.pdf((y - θ * np.abs(x)) / d) / d

Z = ϕ.rvs(n)
X = np.empty(n)
X[0] = 0.0  # Set the initial state explicitly (np.empty leaves it arbitrary)
for t in range(n-1):
    X[t+1] = θ * np.abs(X[t]) + d * Z[t]
ψ_est = LAE(p, X)
k_est = gaussian_kde(X)

fig, ax = plt.subplots(figsize=(10, 7))
ys = np.linspace(-3, 3, 200)
ax.plot(ys, ψ_star(ys), 'b-', lw=2, alpha=0.6, label='true')
ax.plot(ys, ψ_est(ys), 'g-', lw=2, alpha=0.6, label='look-ahead estimate')
ax.plot(ys, k_est(ys), 'k-', lw=2, alpha=0.6, label='kernel based estimate')
ax.legend(loc='upper left')
plt.show()

27.7.2 Exercise 2

Here’s one program that does the job

In [5]: # == Define parameters == #
s = 0.2
δ = 0.1
a_σ = 0.4  # A = exp(B) where B ~ N(0, a_σ)
α = 0.4    # f(k) = k**α

ϕ = lognorm(a_σ)

def p(x, y):
    "Stochastic kernel, vectorized in x. Both x and y must be positive."
    d = s * x**α
    return ϕ.pdf((y - (1 - δ) * x) / d) / d

n = 1000  # Number of observations at each date t
T = 40    # Compute density of k_t at 1,...,T

fig, axes = plt.subplots(2, 2, figsize=(11, 8))
axes = axes.flatten()
xmax = 6.5

for i in range(4):
    ax = axes[i]
    ax.set_xlim(0, xmax)
    ψ_0 = beta(5, 5, scale=0.5, loc=i*2)  # Initial distribution

    # == Generate matrix s.t. t-th column is n observations of k_t == #
    k = np.empty((n, T))
    A = ϕ.rvs((n, T))
    k[:, 0] = ψ_0.rvs(n)
    for t in range(T-1):
        k[:, t+1] = s * A[:,t] * k[:, t]**α + (1 - δ) * k[:, t]

    # == Generate T instances of lae using this data, one for each t == #
    laes = [LAE(p, k[:, t]) for t in range(T)]

    ygrid = np.linspace(0.01, xmax, 150)
    greys = [str(g) for g in np.linspace(0.0, 0.8, T)]
    greys.reverse()
    for ψ, g in zip(laes, greys):
        ax.plot(ygrid, ψ(ygrid), color=g, lw=2, alpha=0.6)
    ax.set_xlabel('capital')
plt.show()

27.7.3 Exercise 3

Here’s a possible solution


Note the way we use vectorized code to simulate the 𝑘 time series for one boxplot all at once

In [6]: n = 20
k = 5000
J = 8

θ = 0.9
d = np.sqrt(1 - θ**2)
δ = θ / d

fig, axes = plt.subplots(J, 1, figsize=(10, 4*J))
initial_conditions = np.linspace(8, 0, J)
X = np.empty((k, n))

for j in range(J):

    axes[j].set_ylim(-4, 8)
    axes[j].set_title(f'time series from $X_0$ = {initial_conditions[j]}')

    Z = np.random.randn(k, n)
    X[:, 0] = initial_conditions[j]
    for t in range(1, n):
        X[:, t] = θ * np.abs(X[:, t-1]) + d * Z[:, t]
    axes[j].boxplot(X)

plt.show()

27.8 Appendix

Here’s the proof of Eq. (6)


Let 𝐹𝑈 and 𝐹𝑉 be the cumulative distributions of 𝑈 and 𝑉 respectively
By the definition of 𝑉 , we have 𝐹𝑉 (𝑣) = P{𝑎 + 𝑏𝑈 ≤ 𝑣} = P{𝑈 ≤ (𝑣 − 𝑎)/𝑏}
In other words, 𝐹𝑉 (𝑣) = 𝐹𝑈 ((𝑣 − 𝑎)/𝑏)
Differentiating with respect to 𝑣 yields Eq. (6)
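The argument can be checked numerically
The sketch below assumes that 𝑈 is standard normal, that 𝑏 > 0 (which the inequality step above requires) and that Eq. (6) is the change-of-variable formula stating that the density of 𝑉 at 𝑣 is (1/𝑏) times the density of 𝑈 at (𝑣 − 𝑎)/𝑏; the constants and function names are illustrative

```python
import numpy as np
from math import erf, sqrt, pi

a, b = 1.0, 2.0                                   # Illustrative constants with b > 0

F_U = lambda u: 0.5 * (1 + erf(u / sqrt(2)))      # Standard normal cdf
f_U = lambda u: np.exp(-u**2 / 2) / sqrt(2 * pi)  # Standard normal density
F_V = lambda v: F_U((v - a) / b)                  # cdf of V = a + b U

v, h = 0.3, 1e-5
numerical = (F_V(v + h) - F_V(v - h)) / (2 * h)   # Central-difference derivative of F_V
formula = f_U((v - a) / b) / b                    # Density implied by the chain rule
print(numerical, formula)
```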
28 Cass-Koopmans Optimal Growth Model

28.1 Contents

• Overview 28.2

• The Growth Model 28.3

• Competitive Equilibrium 28.4

Coauthor: Brandon Kaplowitz

28.2 Overview

This lecture describes a model that Tjalling Koopmans [78] and David Cass [24] used to
analyze optimal growth
The model can be viewed as an extension of the model of Robert Solow described in an
earlier lecture but adapted to make the savings rate the outcome of an optimal choice
(Solow assumed a constant saving rate determined outside the model)
We describe two versions of the model to illustrate what is, in fact, a more general connection
between a planned economy and an economy organized as a competitive equilibrium
The lecture uses important ideas including

• Hicks-Arrow prices named after John R. Hicks and Kenneth Arrow


• A min-max problem for solving a planning problem
• A shooting algorithm for solving difference equations subject to initial and terminal
conditions
• A connection between some Lagrange multipliers in the min-max problem and the
Hicks-Arrow prices
• A Big 𝐾 , little 𝑘 trick widely used in macroeconomic dynamics

• We shall encounter this trick in this lecture and also in this lecture


• An application of a guess and verify method for solving a system of difference
equations
• The intimate connection between the cases for the optimality of two competing visions
of good ways to organize an economy, namely:

• socialism in which a central planner commands the allocation of resources, and
• capitalism (also known as a free markets economy) in which competitive
equilibrium prices induce individual consumers and producers to choose a
socially optimal allocation as an unintended consequence of their completely
selfish decisions

• A turnpike property that describes optimal paths for long-but-finite horizon economies
• A non-stochastic version of a theory of the term structure of interest rates

Let’s start with some imports

In [1]: from numba import njit


import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

28.3 The Growth Model

Time is discrete and takes values 𝑡 = 0, 1, … , 𝑇


(We leave open the possibility that 𝑇 = +∞, but that will require special care in interpreting
and using a terminal condition on 𝐾𝑡 at 𝑡 = 𝑇 + 1 to be described below)
A single good can either be consumed or invested in physical capital
The consumption good is not durable and depreciates completely if not consumed
immediately
The capital good is durable but depreciates each period at rate 𝛿 ∈ (0, 1)
We let 𝐶𝑡 be a nondurable consumption good at time t
Let 𝐾𝑡 be the stock of physical capital at time t
Let 𝐶 ⃗ = {𝐶0 , … , 𝐶𝑇 } and 𝐾⃗ = {𝐾1 , … , 𝐾𝑇 +1 }
A representative household is endowed with one unit of labor at each 𝑡 and likes the
consumption good at each 𝑡
The representative household inelastically supplies a single unit of labor 𝑁𝑡 at each 𝑡, so that
𝑁𝑡 = 1 for all 𝑡 ∈ [0, 𝑇 ]
The representative household has preferences over consumption bundles ordered by the utility
functional:

$$ U(\vec{C}) = \sum_{t=0}^{T} \beta^t \frac{C_t^{1-\gamma}}{1-\gamma} \tag{1} $$

where 𝛽 ∈ (0, 1) is a discount factor and 𝛾 > 0 governs the curvature of the one-period utility
function

Note that

$$ u(C_t) = \frac{C_t^{1-\gamma}}{1-\gamma} \tag{2} $$

satisfies 𝑢′ > 0, 𝑢″ < 0


𝑢′ > 0 asserts the consumer prefers more to less
𝑢″ < 0 asserts that marginal utility declines with increases in 𝐶𝑡
We assume that 𝐾0 > 0 is a given exogenous level of initial capital
There is an economy-wide production function

$$ F(K_t, N_t) = A K_t^{\alpha} N_t^{1-\alpha} \tag{3} $$

with 0 < 𝛼 < 1, 𝐴 > 0


A feasible allocation 𝐶,⃗ 𝐾⃗ satisfies

𝐶𝑡 + 𝐾𝑡+1 ≤ 𝐹 (𝐾𝑡 , 𝑁𝑡 ) + (1 − 𝛿)𝐾𝑡 , for all 𝑡 ∈ [0, 𝑇 ] (4)

where 𝛿 ∈ (0, 1) is a depreciation rate of capital

28.3.1 Planning Problem

A planner chooses an allocation {𝐶⃗, 𝐾⃗} to maximize Eq. (1) subject to Eq. (4)

Let 𝜇⃗ = {𝜇0 , … , 𝜇𝑇 } be a sequence of nonnegative Lagrange multipliers


To find an optimal allocation, we form a Lagrangian

$$ \mathcal{L}(\vec{C}, \vec{K}, \vec{\mu}) = \sum_{t=0}^{T} \beta^t \left\{ u(C_t) + \mu_t \left( F(K_t, 1) + (1 - \delta) K_t - C_t - K_{t+1} \right) \right\} $$

and then solve the following min-max problem:

$$ \min_{\vec{\mu}} \max_{\vec{C}, \vec{K}} \mathcal{L}(\vec{C}, \vec{K}, \vec{\mu}) \tag{5} $$

Useful Properties of Linearly Homogeneous Production Function


The following technicalities will help us
Notice that

$$ F(K_t, N_t) = A K_t^{\alpha} N_t^{1-\alpha} = N_t A \left( \frac{K_t}{N_t} \right)^{\alpha} $$

Define the output per-capita production function

$$ f\left( \frac{K_t}{N_t} \right) = A \left( \frac{K_t}{N_t} \right)^{\alpha} $$

whose argument is capital per-capita


Evidently,

$$ F(K_t, N_t) = N_t f\left( \frac{K_t}{N_t} \right) $$

Now for some useful calculations


First

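$$
\begin{aligned}
\frac{\partial F}{\partial K} &= \frac{\partial N_t f\left( \frac{K_t}{N_t} \right)}{\partial K_t} \\
&= N_t f'\left( \frac{K_t}{N_t} \right) \frac{1}{N_t} \quad \text{(Chain rule)} \\
&= f'\left( \frac{K_t}{N_t} \right) \Big|_{N_t = 1} \\
&= f'(K_t)
\end{aligned} \tag{6}
$$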

Also

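$$
\begin{aligned}
\frac{\partial F}{\partial N} &= \frac{\partial N_t f\left( \frac{K_t}{N_t} \right)}{\partial N_t} \quad \text{(Product rule)} \\
&= f\left( \frac{K_t}{N_t} \right) + N_t f'\left( \frac{K_t}{N_t} \right) \frac{-K_t}{N_t^2} \quad \text{(Chain rule)} \\
&= f\left( \frac{K_t}{N_t} \right) - \frac{K_t}{N_t} f'\left( \frac{K_t}{N_t} \right) \Big|_{N_t = 1} \\
&= f(K_t) - f'(K_t) K_t
\end{aligned}
$$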
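These two results can be confirmed with finite differences
The sketch below evaluates both partial derivatives of 𝐹 numerically at 𝑁𝑡 = 1 and compares them with 𝑓′(𝐾𝑡) and 𝑓(𝐾𝑡) − 𝑓′(𝐾𝑡)𝐾𝑡; the parameter values and the evaluation point are arbitrary choices for illustration

```python
A, α = 1.0, 0.33                            # Illustrative parameter values

F = lambda K, N: A * K**α * N**(1 - α)      # Production function, Eq. (3)
f = lambda k: A * k**α                      # Per-capita production function
f_prime = lambda k: α * A * k**(α - 1)      # Its derivative

K, N, h = 2.0, 1.0, 1e-6
dF_dK = (F(K + h, N) - F(K - h, N)) / (2 * h)   # Central difference in K
dF_dN = (F(K, N + h) - F(K, N - h)) / (2 * h)   # Central difference in N

print(dF_dK, f_prime(K))                    # Marginal product of capital
print(dF_dN, f(K) - f_prime(K) * K)         # Marginal product of labor
```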

Back to Solving the Problem


To solve the Lagrangian extremization problem, we compute first derivatives of the
Lagrangian and set them equal to 0

• Note: Our objective function and constraints satisfy conditions that work to assure
that required second-order conditions are satisfied at an allocation that satisfies the
first-order conditions that we are about to compute

Here are the first order necessary conditions for extremization (i.e., maximization with
respect to 𝐶,⃗ 𝐾,⃗ minimization with respect to 𝜇):

𝐶𝑡 ∶ 𝑢′ (𝐶𝑡 ) − 𝜇𝑡 = 0 for all 𝑡 = 0, 1, … , 𝑇 (7)

𝐾𝑡 ∶ 𝛽𝜇𝑡 [(1 − 𝛿) + 𝑓 ′ (𝐾𝑡 )] − 𝜇𝑡−1 = 0 for all 𝑡 = 1, 2, … , 𝑇 (8)

𝜇𝑡 ∶ 𝐹 (𝐾𝑡 , 1) + (1 − 𝛿)𝐾𝑡 − 𝐶𝑡 − 𝐾𝑡+1 = 0 for all 𝑡 = 0, 1, … , 𝑇 (9)



𝐾𝑇 +1 ∶ −𝜇𝑇 ≤ 0, ≤ 0 if 𝐾𝑇 +1 = 0; = 0 if 𝐾𝑇 +1 > 0 (10)

Note that in Eq. (8) we plugged in for 𝜕𝐹/𝜕𝐾 using our formula Eq. (6) above
Because 𝑁𝑡 = 1 for 𝑡 = 1, … , 𝑇 , we need not differentiate with respect to those arguments
Note that Eq. (8) comes from the occurrence of 𝐾𝑡 in both the period 𝑡 and period 𝑡 − 1
feasibility constraints
Eq. (10) comes from differentiating with respect to 𝐾𝑇 +1 in the last period and applying the
following condition called a Karush-Kuhn-Tucker condition (KKT):

𝜇𝑇 𝐾𝑇 +1 = 0 (11)

See Karush-Kuhn-Tucker conditions


Combining Eq. (7) and Eq. (8) gives

$$ \beta u'(C_t) [(1 - \delta) + f'(K_t)] - u'(C_{t-1}) = 0 \quad \text{for all } t = 1, 2, \ldots, T $$

Rewriting gives

$$ \beta u'(C_{t+1}) [(1 - \delta) + f'(K_{t+1})] = u'(C_t) \quad \text{for all } t = 0, 1, \ldots, T - 1 \tag{12} $$

Applying the inverse of the marginal utility function 𝑢′ to both sides of the above equation gives

$$ C_{t+1} = u'^{-1} \left( \left( \frac{\beta}{u'(C_t)} [f'(K_{t+1}) + (1 - \delta)] \right)^{-1} \right) $$

or using our utility function Eq. (2)

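$$
\begin{aligned}
C_{t+1} &= \left( \beta C_t^{\gamma} [f'(K_{t+1}) + (1 - \delta)] \right)^{1/\gamma} \\
&= C_t \left( \beta [f'(K_{t+1}) + (1 - \delta)] \right)^{1/\gamma}
\end{aligned}
$$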

The above first-order condition for consumption is called an Euler equation


It tells us how consumption levels in adjacent periods are optimally related to each other and to
capital next period
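As a quick numerical illustration, the sketch below takes one step of the closed-form Euler equation and checks that the result satisfies the marginal condition 𝛽𝑢′(𝐶𝑡+1)[𝑓′(𝐾𝑡+1) + (1 − 𝛿)] = 𝑢′(𝐶𝑡)
The parameter values, the starting values and the function name `next_consumption` are assumptions chosen for illustration

```python
γ, β, δ, α, A = 2.0, 0.95, 0.02, 0.33, 1.0   # Parameter values assumed for illustration

u_prime = lambda c: c**(-γ)                  # Marginal utility for the utility function in Eq. (2)
f_prime = lambda k: α * A * k**(α - 1)       # Marginal product of capital

def next_consumption(c_t, k_next):
    "Consumption at t+1 implied by the Euler equation, given C_t and K_{t+1}."
    gross_return = f_prime(k_next) + (1 - δ)
    return c_t * (β * gross_return)**(1 / γ)

c_t, k_next = 0.2, 0.5
c_next = next_consumption(c_t, k_next)

# Residual of β u'(C_{t+1}) [f'(K_{t+1}) + (1 - δ)] - u'(C_t); should be close to 0
residual = β * u_prime(c_next) * (f_prime(k_next) + 1 - δ) - u_prime(c_t)
print(c_next, residual)
```

Consumption rises between the two periods whenever 𝛽[𝑓′(𝐾𝑡+1) + (1 − 𝛿)] exceeds one, which is the case here because capital is scarce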
We now use some of the equations above to calculate some variables and functions that we’ll
soon use to solve the planning problem with Python

In [2]: @njit
def u(c, γ):
    '''
    Utility function
    ASIDE: If you have a utility function that is hard to differentiate by hand
    you can use automatic or symbolic differentiation
    See https://github.com/HIPS/autograd
    '''
    if γ == 1:
        # If γ = 1 we can show via L'Hôpital's rule that the utility becomes log
        return np.log(c)
    else:
        return c**(1 - γ) / (1 - γ)

@njit
def u_prime(c, γ):
    '''Derivative of utility'''
    if γ == 1:
        return 1 / c
    else:
        return c**(-γ)

@njit
def u_prime_inv(c, γ):
    '''Inverse of marginal utility'''
    if γ == 1:
        return 1 / c  # The inverse of 1 / c is again 1 / c
    else:
        return c**(-1 / γ)

@njit
def f(A, k, α):
    '''Production function'''
    return A * k**α

@njit
def f_prime(A, k, α):
    '''Derivative of production function'''
    return α * A * k**(α - 1)

@njit
def f_prime_inv(A, k, α):
    '''Inverse of the derivative of the production function'''
    return (k / (A * α))**(1 / (α - 1))

28.3.2 Shooting Method

We shall use a shooting method to compute an optimal allocation 𝐶,⃗ 𝐾⃗ and an associated
Lagrange multiplier sequence 𝜇⃗
The first-order necessary conditions for the planning problem, namely, equations Eq. (7),
Eq. (8), and Eq. (9), form a system of difference equations with two boundary conditions:

• 𝐾0 is a given initial condition for capital


• 𝐾𝑇 +1 = 0 is a terminal condition for capital that we deduced from the first-order
necessary condition for 𝐾𝑇 +1 and the KKT condition Eq. (11)

We have no initial condition for the Lagrange multiplier 𝜇0


If we did, solving for the allocation would be simple:

• Given 𝜇0 and 𝑘0 , we could compute 𝑐0 from equation Eq. (7) and then 𝑘1 from equation
Eq. (9) and 𝜇1 from equation Eq. (8)
• We could then iterate on to compute the remaining elements of 𝐶,⃗ 𝐾,⃗ 𝜇⃗

But we don’t have an initial condition for 𝜇0 , so this won’t work


But a simple modification called the shooting algorithm will work
The shooting algorithm is an instance of a guess and verify algorithm
It proceeds as follows:

• Guess a value for the initial Lagrange multiplier 𝜇0



• Apply the simple algorithm described above


• Compute the implied value of 𝑘𝑇 +1 and check whether it equals zero
• If the implied 𝐾𝑇 +1 = 0, we have solved the problem
• If 𝐾𝑇 +1 > 0, lower 𝜇0 and try again
• If 𝐾𝑇 +1 < 0, raise 𝜇0 and try again

The following Python code implements the shooting algorithm for the planning problem
We make a slight modification starting with a guess of 𝑐0 but since 𝑐0 is a function of 𝜇0
there is no difference to the procedure above
We’ll apply it with an initial guess that will turn out not to be perfect, as we’ll soon see

In [3]: # Parameters
γ = 2
δ = 0.02
β = 0.95
α = 0.33
A = 1

# Initial guesses
T = 10
c = np.zeros(T+1)  # T+1 periods of consumption initialized to 0
k = np.zeros(T+2)  # T+2 periods of capital initialized to 0 (T+2 to include t+1 variable as well)
k[0] = 0.3  # Initial k
c[0] = 0.2  # Guess of c_0

@njit
def shooting_method(c, # Initial consumption
                    k, # Initial capital
                    γ, # Coefficient of relative risk aversion
                    δ, # Depreciation rate on capital
                    β, # Discount factor
                    α, # Return to capital per capita
                    A): # Technology

    T = len(c) - 1

    for t in range(T):
        k[t+1] = f(A=A, k=k[t], α=α) + (1 - δ) * k[t] - c[t]  # Feasibility constraint, Eq. (9)
        if k[t+1] < 0:  # Ensure nonnegativity
            k[t+1] = 0

        # Euler equation: We keep in the general form to show how we would
        # solve if we didn't want to do any simplification

        if β * (f_prime(A=A, k=k[t+1], α=α) + (1 - δ)) == np.inf:
            # This only occurs if k[t+1] is 0, in which case, we won't
            # produce anything next period, so consumption will have to be 0
            c[t+1] = 0
        else:
            c[t+1] = u_prime_inv(u_prime(c=c[t], γ=γ) / (β * (f_prime(A=A, k=k[t+1], α=α) + (1 - δ))), γ=γ)

    # Terminal condition calculation
    k[T+1] = f(A=A, k=k[T], α=α) + (1 - δ) * k[T] - c[T]

    return c, k

paths = shooting_method(c, k, γ, δ, β, α, A)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
colors = ['blue', 'red']
titles = ['Consumption', 'Capital']
ylabels = ['$c_t$', '$k_t$']

for path, color, title, y, ax in zip(paths, colors, titles, ylabels, axes):
    ax.plot(path, c=color, alpha=0.7)
    ax.set(title=title, ylabel=y, xlabel='t')

# Mark the terminal target K_{T+1} = 0 on the capital panel
ax.scatter(T+1, 0, s=80)
ax.axvline(T+1, color='k', ls='--', lw=1)

plt.tight_layout()
plt.show()

Evidently, our initial guess for 𝜇0 is too high and makes initial consumption too low
We know this because we miss our 𝐾𝑇 +1 = 0 target on the high side
Now we automate things with a search-for-a-good 𝜇0 algorithm that stops when we hit the
target 𝐾𝑇 +1 = 0
The search procedure is to use a bisection method
Here is how we apply the bisection method
We take an initial guess for 𝐶0 (we can eliminate 𝜇0 because 𝐶0 is an exact function of 𝜇0 )
We know that the lowest 𝐶0 can ever be is 0 and the largest it can be is initial output 𝑓(𝐾0 )
We take a 𝐶0 guess and shoot forward to 𝑇 + 1
If 𝐾𝑇 +1 > 0, let it be our new lower bound on 𝐶0
If 𝐾𝑇 +1 < 0, let it be our new upper bound
Make a new guess for 𝐶0 exactly halfway between our new upper and lower bounds
Shoot forward again and iterate the procedure
When 𝐾𝑇 +1 gets close enough to 0 (within some error tolerance bounds), stop and declare
victory
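The steps above can be sketched in a few lines of plain Python
This is a condensed illustration rather than the implementation used in this lecture: the helper name `terminal_capital`, the tolerance and the convention of returning −1 when capital is exhausted early are assumptions, and the parameter values repeat those used earlier in this section

```python
γ, β, δ, α, A = 2.0, 0.95, 0.02, 0.33, 1.0   # Same parameter values as earlier in this section
T, k0 = 10, 0.3

def terminal_capital(c0):
    "Shoot forward from (k0, c0) and return the implied K_{T+1}."
    k, c = k0, c0
    for t in range(T):
        k_next = A * k**α + (1 - δ) * k - c      # Feasibility constraint
        if k_next <= 0:
            return -1.0                          # Capital ran out early: c0 is too high
        c = c * (β * (α * A * k_next**(α - 1) + 1 - δ))**(1 / γ)   # Euler equation
        k = k_next
    return A * k**α + (1 - δ) * k - c

c_low, c_high = 0.0, A * k0**α     # Bracket for c0: between 0 and initial output
for i in range(200):
    c0 = (c_low + c_high) / 2      # Guess halfway between the bounds
    k_terminal = terminal_capital(c0)
    if abs(k_terminal) < 1e-8:
        break
    if k_terminal > 0:
        c_low = c0                 # Capital left over: c0 becomes the new lower bound
    else:
        c_high = c0                # Capital fell below zero: c0 becomes the new upper bound
print(i, c0, k_terminal)
```

Bisection works here because the implied 𝐾𝑇 +1 falls as 𝐶0 rises, so the sign of the miss tells us which half of the bracket to keep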

In [4]: @njit
def bisection_method(c,
                     k,
                     γ, # Coefficient of relative risk aversion
                     δ, # Depreciation rate on capital
                     β, # Discount factor
                     α, # Return to capital per capita
                     A, # Technology
                     tol=1e-4,
                     max_iter=1e4,
                     terminal=0): # Value we are shooting towards

    T = len(c) - 1
    i = 1                            # Initial iteration
    c_high = f(k=k[0], α=α, A=A)     # Initial high value of c
    c_low = 0                        # Initial low value of c

    path_c, path_k = shooting_method(c, k, γ, δ, β, α, A)

    while (np.abs((path_k[T+1] - terminal)) > tol or path_k[T] == terminal) and i < max_iter:

        if path_k[T+1] - terminal > tol:
            # If assets are too high the c[0] we chose is now a lower bound on possible values of c[0]
            c_low = c[0]
        elif path_k[T+1] - terminal < -tol:
            # If assets fell too quickly, the c[0] we chose is now an upper bound on possible values of c[0]
            c_high=c[0]
        elif path_k[T] == terminal:
            # If assets fell too quickly, the c[0] we chose is now an upper bound on possible values of c[0]
            c_high=c[0]

        c[0] = (c_high + c_low) / 2  # This is the bisection part
        path_c, path_k = shooting_method(c, k, γ, δ, β, α, A)
        i += 1

    if np.abs(path_k[T+1] - terminal) < tol and path_k[T] != terminal:
        print('Converged successfully on iteration', i-1)
    else:
        print('Failed to converge and hit maximum iteration')

    μ = u_prime(c=path_c, γ=γ)
    return path_c, path_k, μ

Now we can plot

In [5]: T = 10
c = np.zeros(T+1)  # T+1 periods of consumption initialized to 0
k = np.zeros(T+2)  # T+2 periods of capital initialized to 0. T+2 to include t+1 variable as well.

k[0] = 0.3  # initial k
c[0] = 0.3  # our guess of c_0

paths = bisection_method(c, k, γ, δ, β, α, A)

def plot_paths(paths, axes=None, ss=None):

    T = len(paths[0])

    if axes is None:
        fig, axes = plt.subplots(1, 3, figsize=(13, 3))

    ylabels = ['$c_t$', '$k_t$', r'$\mu_t$']
    titles = ['Consumption', 'Capital', 'Lagrange Multiplier']

    for path, y, title, ax in zip(paths, ylabels, titles, axes):
        ax.plot(path)
        ax.set(ylabel=y, title=title, xlabel='t')

    # Plot steady state value of capital
    if ss is not None:
        axes[1].axhline(ss, c='k', ls='--', lw=1)

    axes[1].axvline(T, c='k', ls='--', lw=1)
    axes[1].scatter(T, paths[1][-1], s=80)
    plt.tight_layout()

plot_paths(paths)

Converged successfully on iteration 18



28.3.3 Setting Initial Capital to the Steady State

If 𝑇 → +∞, the optimal allocation converges to steady state values of 𝐶𝑡 and 𝐾𝑡


It is instructive to compute these and then to set 𝐾0 equal to its steady state value
In a steady state, 𝐾𝑡+1 = 𝐾𝑡 = 𝐾̄ for all very large 𝑡, so the feasibility constraint Eq. (4) becomes

$$ f(\bar{K}) - \delta \bar{K} = \bar{C} \tag{13} $$

Substituting 𝐾𝑡 = 𝐾̄ and 𝐶𝑡 = 𝐶̄ for all 𝑡 into Eq. (12) gives

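$$ 1 = \beta \frac{u'(\bar{C})}{u'(\bar{C})} [f'(\bar{K}) + (1 - \delta)] $$

Defining 𝛽 = 1/(1 + 𝜌) and cancelling gives

$$ 1 + \rho = 1 [f'(\bar{K}) + (1 - \delta)] $$

Simplifying gives

$$ f'(\bar{K}) = \rho + \delta $$

and

$$ \bar{K} = f'^{-1}(\rho + \delta) $$

Using our production function Eq. (3) with 𝐴 = 1 gives

$$ \alpha \bar{K}^{\alpha - 1} = \rho + \delta $$

Finally, using 𝛼 = .33, 𝜌 = 1/𝛽 − 1 = 1/(19/20) − 1 = 20/19 − 19/19 = 1/19, 𝛿 = 1/50, we get

$$ \bar{K} = \left( \frac{\frac{33}{100}}{\frac{1}{50} + \frac{1}{19}} \right)^{\frac{100}{67}} \approx 9.57583 $$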

Let’s verify this with Python and then use this steady state 𝐾̄ as our initial capital stock 𝐾0

In [6]: ρ = 1 / β - 1
k_ss = f_prime_inv(k=ρ+δ, A=A, α=α)

print(f'steady state for capital is: {k_ss}')

steady state for capital is: 9.57583816331462

Now we plot

In [7]: T = 150
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_ss # Start at steady state
paths = bisection_method(c, k, γ, δ, β, α, A)

plot_paths(paths, ss=k_ss)

Converged successfully on iteration 39

Evidently, in this economy with a large value of 𝑇 , 𝐾𝑡 stays near its initial value until
the end of time approaches closely
Evidently, the planner likes the steady state capital stock and wants to stay near there for a
long time
Let’s see what happens when we push the initial 𝐾0 below 𝐾̄

In [8]: k_init = k_ss / 3 # Below our steady state


T = 150
c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_init
paths = bisection_method(c, k, γ, δ, β, α, A)

plot_paths(paths, ss=k_ss)

Converged successfully on iteration 39



Notice how the planner pushes capital toward the steady state, stays near there for a while,
then pushes 𝐾𝑡 toward the terminal value 𝐾𝑇 +1 = 0 as 𝑡 gets close to 𝑇
The following graphs compare outcomes as we vary 𝑇

In [9]: T_list = (150, 75, 50, 25)

fig, axes = plt.subplots(1, 3, figsize=(13, 3))

for T in T_list:
    c = np.zeros(T+1)
    k = np.zeros(T+2)
    c[0] = 0.3
    k[0] = k_init
    paths = bisection_method(c, k, γ, δ, β, α, A)
    plot_paths(paths, ss=k_ss, axes=axes)

Converged successfully on iteration 39


Converged successfully on iteration 26
Converged successfully on iteration 25
Converged successfully on iteration 22

The following calculation shows that when we set 𝑇 very large the planner makes the capital
stock spend most of its time close to its steady state value

In [10]: T_list = (250, 150, 50, 25)

fig, axes = plt.subplots(1, 3, figsize=(13, 3))

for T in T_list:
    c = np.zeros(T+1)
    k = np.zeros(T+2)
    c[0] = 0.3
    k[0] = k_init
    paths = bisection_method(c, k, γ, δ, β, α, A)
    plot_paths(paths, ss=k_ss, axes=axes)

Failed to converge and hit maximum iteration


Converged successfully on iteration 39
Converged successfully on iteration 25
Converged successfully on iteration 22

The different colors in the above graphs are tied to outcomes with different horizons 𝑇
Notice that as the horizon increases, the planner puts 𝐾𝑡 closer to the steady state value 𝐾̄
for longer
This pattern reflects a turnpike property of the steady state
A rule of thumb for the planner is

• for whatever 𝐾0 you start with, push 𝐾𝑡 toward the steady state and stay there for as
long as you can

In loose language: head for the turnpike and stay near it for as long as you can
As we drive 𝑇 toward +∞, the planner keeps 𝐾𝑡 very close to its steady state for all dates
after some transition toward the steady state
The planner makes the saving rate (𝑓(𝐾𝑡 ) − 𝐶𝑡 )/𝑓(𝐾𝑡 ) vary over time
Let’s calculate it

In [11]: @njit
def S(K):
    '''Aggregate savings'''
    T = len(K) - 2
    S = np.zeros(T+1)
    for t in range(T+1):
        S[t] = K[t+1] - (1 - δ) * K[t]
    return S

@njit
def s(K):
    '''Savings rate'''
    T = len(K) - 2
    Y = f(A, K, α)
    Y = Y[0:T+1]
    s = S(K) / Y
    return s

def plot_savings(paths, c_ss=None, k_ss=None, s_ss=None, axes=None):

    T = len(paths[0])
    k_star = paths[1]
    savings_path = s(k_star)
    new_paths = (paths[0], paths[1], savings_path)

    if axes is None:
        fig, axes = plt.subplots(1, 3, figsize=(13, 3))

    ylabels = ['$c_t$', '$k_t$', '$s_t$']
    titles = ['Consumption', 'Capital', 'Savings Rate']

    for path, y, title, ax in zip(new_paths, ylabels, titles, axes):
        ax.plot(path)
        ax.set(ylabel=y, title=title, xlabel='t')

    # Plot steady state value of consumption
    if c_ss is not None:
        axes[0].axhline(c_ss, c='k', ls='--', lw=1)

    # Plot steady state value of capital
    if k_ss is not None:
        axes[1].axhline(k_ss, c='k', ls='--', lw=1)

    # Plot steady state value of savings
    if s_ss is not None:
        axes[2].axhline(s_ss, c='k', ls='--', lw=1)

    axes[1].axvline(T, c='k', ls='--', lw=1)
    axes[1].scatter(T, k_star[-1], s=80)
    plt.tight_layout()

T_list = (250, 150, 75, 50)

fig, axes = plt.subplots(1, 3, figsize=(13, 3))

for T in T_list:
    c = np.zeros(T+1)
    k = np.zeros(T+2)
    c[0] = 0.3
    k[0] = k_init
    paths = bisection_method(c, k, γ, δ, β, α, A)
    plot_savings(paths, k_ss=k_ss, axes=axes)

Failed to converge and hit maximum iteration


Converged successfully on iteration 39
Converged successfully on iteration 26
Converged successfully on iteration 25

28.3.4 The Limiting Economy

We now consider an economy in which 𝑇 = +∞


The appropriate thing to do is to replace terminal condition Eq. (10) with

$$ \lim_{T \to +\infty} \beta^T u'(C_T) K_{T+1} = 0 $$

which is sometimes called a transversality condition


This condition will be satisfied by a path that converges to an optimal steady state
We can approximate the optimal path by starting from an arbitrary initial 𝐾0 and shooting towards the
optimal steady state 𝐾̄ at a large but finite 𝑇 + 1
In the following code, we do this for a large 𝑇 ; we shoot towards the steady state and plot
consumption, capital and the savings rate
We know that in the steady state the saving rate must be fixed and that 𝑠̄ = (𝑓(𝐾̄) − 𝐶̄)/𝑓(𝐾̄)

From Eq. (13) the steady state saving rate equals

$$ \bar{s} = \frac{\delta \bar{K}}{f(\bar{K})} $$

The steady state savings level 𝑆̄ = 𝑠̄𝑓(𝐾̄) is the amount required to offset capital depreciation
each period
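With the Cobb-Douglas technology of Eq. (3), the steady state saving rate also has a simple closed form
Since 𝑓′(𝐾̄) = 𝛼𝐴𝐾̄^(𝛼−1) = 𝜌 + 𝛿, we have 𝐾̄^(1−𝛼) = 𝛼𝐴/(𝜌 + 𝛿), and therefore 𝑠̄ = 𝛿𝐾̄/(𝐴𝐾̄^𝛼) = 𝛼𝛿/(𝜌 + 𝛿)
The short check below confirms this numerically, using the parameter values from earlier in this section; the variable names `K_bar` and `s_bar` are illustrative

```python
α, δ, β, A = 0.33, 0.02, 0.95, 1.0             # Parameter values used earlier in this section
ρ = 1 / β - 1

K_bar = ((ρ + δ) / (α * A))**(1 / (α - 1))     # Solves f'(K) = ρ + δ
s_bar = δ * K_bar / (A * K_bar**α)             # Steady state saving rate δ K / f(K)

print(K_bar, s_bar, α * δ / (ρ + δ))
```

The closed form shows that the steady state saving rate rises with the capital share 𝛼 and falls with the rate of time preference 𝜌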
We first study optimal capital paths that start below the steady state

In [12]: T = 130

# Steady states
S_ss = δ * k_ss
c_ss = f(A, k_ss, α) - S_ss
s_ss = S_ss / f(A, k_ss, α)

c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_ss / 3 # Start below steady state
paths = bisection_method(c, k, γ, δ, β, α, A, terminal=k_ss)
plot_savings(paths, k_ss=k_ss, s_ss=s_ss, c_ss=c_ss)

Converged successfully on iteration 35

Since 𝐾0 < 𝐾̄, 𝑓 ′ (𝐾0 ) > 𝜌 + 𝛿


The planner chooses a saving rate above the steady state level, so that saving more than offsets
depreciation and the capital stock grows
Note, 𝑓 ″ (𝐾) < 0, so as 𝐾 rises, 𝑓 ′ (𝐾) declines
The planner slowly lowers the savings rate until reaching a steady state where 𝑓 ′ (𝐾) = 𝜌 + 𝛿

28.3.5 Exercise

• Plot the optimal consumption, capital, and savings paths when the initial capital level
begins at 1.5 times the steady state level as we shoot towards the steady state at 𝑇 =
130
• Why does the savings rate respond like it does?

28.3.6 Solution
In [13]: T = 130

c = np.zeros(T+1)
k = np.zeros(T+2)
c[0] = 0.3
k[0] = k_ss * 1.5 # Start above steady state
paths = bisection_method(c, k, γ, δ, β, α, A, terminal=k_ss)
plot_savings(paths, k_ss=k_ss, s_ss=s_ss, c_ss=c_ss)

Converged successfully on iteration 31

28.4 Competitive Equilibrium

Next, we study a decentralized version of an economy with the same technology and prefer-
ence structure as our planned economy
But now there is no planner
Market prices adjust to reconcile distinct decisions that are made separately by a representa-
tive household and a representative firm
The technology for producing goods and accumulating capital via physical investment re-
mains as in our planned economy
There is a representative consumer who has the same preferences over consumption plans as
did the consumer in the planned economy
Instead of being told what to consume and save by a planner, the household chooses for itself
subject to a budget constraint

• At each time 𝑡, the household receives wages and rentals of capital from a firm – these
comprise its income at time 𝑡
• The consumer decides how much income to allocate to consumption or to savings
• The household can save either by acquiring additional physical capital (it trades one
for one with time 𝑡 consumption) or by acquiring claims on consumption at dates other
than 𝑡
• A utility-maximizing household owns all physical capital and labor and rents them to
the firm
• The household consumes, supplies labor, and invests in physical capital
• A profit-maximizing representative firm operates the production technology
• The firm rents labor and capital each period from the representative household and sells
its output each period to the household
• The representative household and the representative firm are both price takers:

– they (correctly) believe that prices are not affected by their choices

Note: We are free to think of there being a large number 𝑀 of identical representative con-
sumers and 𝑀 identical representative firms

28.4.1 Firm Problem

At time 𝑡 the representative firm hires labor 𝑛̃ 𝑡 and capital 𝑘̃ 𝑡


The firm’s profits at time 𝑡 are

𝐹 (𝑘̃ 𝑡 , 𝑛̃ 𝑡 ) − 𝑤𝑡 𝑛̃ 𝑡 − 𝜂𝑡 𝑘̃ 𝑡

where 𝑤𝑡 is a wage rate at 𝑡 and 𝜂𝑡 is the rental rate on capital at 𝑡


As in the planned economy model

$$F(\tilde k_t, \tilde n_t) = A \tilde k_t^\alpha \tilde n_t^{1-\alpha}$$

Zero Profit Conditions

Zero-profit conditions for capital and labor are

𝐹𝑘 (𝑘̃ 𝑡 , 𝑛̃ 𝑡 ) = 𝜂𝑡

and

𝐹𝑛 (𝑘̃ 𝑡 , 𝑛̃ 𝑡 ) = 𝑤𝑡 (14)

These conditions emerge from a no-arbitrage requirement


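To describe this line of reasoning, we begin by applying a theorem of Euler about linearly homogeneous functions
The theorem applies to the Cobb-Douglas production function because it is assumed to display constant returns to scale:

$$\alpha F(\tilde k_t, \tilde n_t) = F(\alpha \tilde k_t, \alpha \tilde n_t)$$

for $\alpha \in (0, 1)$ (here $\alpha$ denotes a scaling factor, not the Cobb-Douglas exponent)
Taking the partial derivative $\frac{\partial}{\partial \alpha}$ on both sides of the above equation gives

$$F(\tilde k_t, \tilde n_t) \underset{\text{chain rule}}{=} \frac{\partial F}{\partial \tilde k_t} \tilde k_t + \frac{\partial F}{\partial \tilde n_t} \tilde n_t$$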

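Rewrite the firm's profits as

$$\frac{\partial F}{\partial \tilde k_t} \tilde k_t + \frac{\partial F}{\partial \tilde n_t} \tilde n_t - w_t \tilde n_t - \eta_t \tilde k_t$$

or

$$\left( \frac{\partial F}{\partial \tilde k_t} - \eta_t \right) \tilde k_t + \left( \frac{\partial F}{\partial \tilde n_t} - w_t \right) \tilde n_t$$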

Because $F$ is homogeneous of degree 1, it follows that $\frac{\partial F}{\partial \tilde k_t}$ and $\frac{\partial F}{\partial \tilde n_t}$ are homogeneous of degree 0 and therefore unchanged when $\tilde k_t$ and $\tilde n_t$ are scaled in proportion
If $\frac{\partial F}{\partial \tilde k_t} > \eta_t$, then the firm makes positive profits on each additional unit of $\tilde k_t$, so it will want to make $\tilde k_t$ arbitrarily large
But setting $\tilde k_t = +\infty$ is not physically feasible, so presumably equilibrium prices will assume values that present the firm with no such arbitrage opportunity
A related argument applies if $\frac{\partial F}{\partial \tilde n_t} > w_t$
If $\frac{\partial F}{\partial \tilde k_t} < \eta_t$, the firm will set $\tilde k_t$ to zero

Again, equilibrium prices won't incentivize the firm to do that


And so on…
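The zero-profit argument rests on Euler's theorem, which is easy to check numerically. The sketch below assumes the Cobb-Douglas form $F(k, n) = A k^\alpha n^{1-\alpha}$ with illustrative parameter values. The helper names `F`, `F_k` and `F_n` are hypothetical and are not the lecture's functions

```python
def F(k, n, A=1.0, α=0.33):
    """Cobb-Douglas output (assumed functional form and parameters)."""
    return A * k**α * n**(1 - α)

def F_k(k, n, A=1.0, α=0.33):
    """Marginal product of capital."""
    return α * A * k**(α - 1) * n**(1 - α)

def F_n(k, n, A=1.0, α=0.33):
    """Marginal product of labor."""
    return (1 - α) * A * k**α * n**(-α)

k, n = 2.5, 1.3   # arbitrary input levels

# Euler's theorem: F = F_k * k + F_n * n
euler_gap = F(k, n) - (F_k(k, n) * k + F_n(k, n) * n)

# Profits when factor prices equal marginal products
η, w = F_k(k, n), F_n(k, n)
profit = F(k, n) - w * n - η * k

# Marginal products are homogeneous of degree 0: scaling both inputs leaves them unchanged
scale = 3.0
degree_zero_gap = F_k(scale * k, scale * n) - F_k(k, n)

print(euler_gap, profit, degree_zero_gap)
```

All three printed numbers should be zero up to floating point error, so the firm earns zero profits at any scale once prices equal marginal products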
It is convenient to define $\vec w = \{w_0, \dots, w_T\}$ and $\vec \eta = \{\eta_0, \dots, \eta_T\}$

28.4.2 Household Problem

A representative household lives at 𝑡 = 0, 1, … , 𝑇


At 𝑡, the household rents 1 unit of labor and 𝑘𝑡 units of capital to a firm and receives income

𝑤𝑡 1 + 𝜂 𝑡 𝑘 𝑡

At 𝑡 the household allocates its income to the following purchases

(𝑐𝑡 + (𝑘𝑡+1 − (1 − 𝛿)𝑘𝑡 ))

Here (𝑘𝑡+1 − (1 − 𝛿)𝑘𝑡 ) is the household’s net investment in physical capital and 𝛿 ∈ (0, 1) is
again a depreciation rate of capital
In period $t$ the household is free to purchase more goods to be consumed and invested in physical capital
than its income from supplying capital and labor to the firm, provided that in some other
periods its income exceeds its purchases
A household’s net excess demand for time 𝑡 consumption goods is the gap

𝑒𝑡 ≡ (𝑐𝑡 + (𝑘𝑡+1 − (1 − 𝛿)𝑘𝑡 )) − (𝑤𝑡 1 + 𝜂𝑡 𝑘𝑡 )

Let 𝑐 ⃗ = {𝑐0 , … , 𝑐𝑇 } and let 𝑘⃗ = {𝑘1 , … , 𝑘𝑇 +1 }


𝑘0 is given to the household

28.4.3 Market Structure for Intertemporal Trades

There is a single grand competitive market in which a representative household can trade
date 0 goods for goods at all other dates 𝑡 = 1, 2, … , 𝑇
What matters are not bilateral trades of the good at one date 𝑡 for the good at another date
𝑡 ̃ ≠ 𝑡.

Instead, think of there being multilateral and multitemporal trades in which bundles of
goods at some dates can be traded for bundles of goods at some other dates.
There exist complete markets in such bundles with associated market prices

28.4.4 Market Prices

Let $q_t^0$ be the price of a good at date $t$ relative to a good at date 0

$\{q_t^0\}_{t=0}^T$ is a vector of Hicks-Arrow prices, named after the 1972 joint economics Nobel prize winners who used such prices in some of their important work
Evidently,

$$q_t^0 = \frac{\text{number of time 0 goods}}{\text{number of time } t \text{ goods}}$$

Because 𝑞𝑡0 is a relative price, the units in terms of which prices are quoted are arbitrary –
we can normalize them without substantial consequence
If we use the price vector $\{q_t^0\}_{t=0}^T$ to evaluate a stream of excess demands $\{e_t\}_{t=0}^T$ we compute the present value of $\{e_t\}_{t=0}^T$ to be $\sum_{t=0}^T q_t^0 e_t$
That the market is multitemporal is reflected in the situation that the household faces a single budget constraint
It states that the present value of the household's net excess demands cannot be positive:

$$\sum_{t=0}^T q_t^0 e_t \leq 0$$

or

$$\sum_{t=0}^T q_t^0 \left( c_t + (k_{t+1} - (1-\delta) k_t) - (w_t 1 + \eta_t k_t) \right) \leq 0$$
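To make the single budget constraint concrete, here is a small sketch that evaluates the present value of a stream of excess demands. The function names and all numbers are hypothetical and chosen purely for illustration. The point is that the household may run a positive excess demand at one date as long as it is offset, in present value, at another

```python
def excess_demand(c, k, w, η, δ):
    """e_t = c_t + k_{t+1} - (1 - δ) k_t - (w_t + η_t k_t) for t = 0, ..., T.

    c, w, η have length T + 1; k has length T + 2 (it includes k_{T+1}).
    """
    T = len(c) - 1
    return [c[t] + k[t + 1] - (1 - δ) * k[t] - (w[t] + η[t] * k[t])
            for t in range(T + 1)]

def present_value(q, e):
    """Sum of q_t * e_t over t."""
    return sum(q_t * e_t for q_t, e_t in zip(q, e))

# Hypothetical stationary environment
T, δ, β = 4, 0.1, 0.95
k = [2.0] * (T + 2)
w = [1.0] * (T + 1)
η = [0.15] * (T + 1)
q = [β**t for t in range(T + 1)]          # illustrative price vector with q_0 = 1

# Plan 1: consume exactly income net of depreciation every period
c = [w[t] + η[t] * k[t] - δ * k[t] for t in range(T + 1)]
pv = present_value(q, excess_demand(c, k, w, η, δ))

# Plan 2: consume 0.1 more at t = 0 and repay 0.1 / q_1 at t = 1
c2 = list(c)
c2[0] += 0.1
c2[1] -= 0.1 / q[1]
e2 = excess_demand(c2, k, w, η, δ)
pv2 = present_value(q, e2)

print(pv, e2[0], pv2)
```

Plan 2 has a positive excess demand at date 0, yet its present value is still zero, so it satisfies the same budget constraint as Plan 1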

28.4.5 Household Problem

The household faces the constrained optimization problem:

$$\max_{\vec c, \vec k} \sum_{t=0}^T \beta^t u(c_t)$$

$$\text{subject to} \quad \sum_{t=0}^T q_t^0 \left( c_t + (k_{t+1} - (1-\delta) k_t) - w_t - \eta_t k_t \right) \leq 0$$

28.4.6 Definitions

• A price system is a sequence $\{q_t^0, \eta_t, w_t\}_{t=0}^T = \{\vec q, \vec \eta, \vec w\}$

• An allocation is a sequence $\{c_t, k_{t+1}, n_t = 1\}_{t=0}^T = \{\vec c, \vec k, \vec n = 1\}$

• A competitive equilibrium is a price system and an allocation for which

– Given the price system, the allocation solves the household’s problem
– Given the price system, the allocation solves the firm’s problem

28.4.7 Computing a Competitive Equilibrium

We shall compute a competitive equilibrium using a guess and verify approach

• We shall guess equilibrium price sequences {𝑞,⃗ 𝜂,⃗ 𝑤}



• We shall then verify that at those prices, the household and the firm choose the same
allocation

Guess for Price System


We have computed an allocation $\{\vec C, \vec K, \vec 1\}$ that solves the planning problem
We use that allocation to construct our guess for the equilibrium price system
In particular, we guess that for $t = 0, \dots, T$:

$$\lambda q_t^0 = \beta^t u'(C_t) = \beta^t \mu_t \tag{15}$$

$$w_t = f(K_t) - K_t f'(K_t) \tag{16}$$

$$\eta_t = f'(K_t) \tag{17}$$

At these prices, let the capital chosen by the household be

$$k_t^*(\vec q, \vec w, \vec \eta), \quad t \geq 0 \tag{18}$$

and let the allocation chosen by the firm be

$$\tilde k_t^*(\vec q, \vec w, \vec \eta), \quad t \geq 0$$

and so on
If our guess for the equilibrium price system is correct, then it must occur that

$$k_t^* = \tilde k_t^* \tag{19}$$

$$1 = \tilde n_t^* \tag{20}$$

$$c_t^* + k_{t+1}^* - (1-\delta) k_t^* = F(\tilde k_t^*, \tilde n_t^*)$$

We shall verify that for 𝑡 = 0, … , 𝑇 the allocations chosen by the household and the firm both
equal the allocation that solves the planning problem:

𝑘𝑡∗ = 𝑘̃ 𝑡∗ = 𝐾𝑡 , 𝑛̃ 𝑡 = 1, 𝑐𝑡∗ = 𝐶𝑡 (21)



28.4.8 Verification Procedure

Our approach is to stare at first-order necessary conditions for the optimization problems of
the household and the firm
At the price system we have guessed, both sets of first-order conditions are satisfied at the
allocation that solves the planning problem

28.4.9 Household’s Lagrangian

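To solve the household's problem, we formulate the appropriate Lagrangian and pose the min-max problem:

$$\min_\lambda \max_{\vec c, \vec k} \mathcal{L}(\vec c, \vec k, \lambda) = \sum_{t=0}^T \beta^t u(c_t) + \lambda \left( \sum_{t=0}^T q_t^0 \left( ((1-\delta) k_t + w_t) + \eta_t k_t - c_t - k_{t+1} \right) \right)$$

First-order conditions are

$$c_t: \quad \beta^t u'(c_t) - \lambda q_t^0 = 0, \quad t = 0, 1, \dots, T \tag{22}$$

$$k_t: \quad -\lambda q_t^0 [(1-\delta) + \eta_t] + \lambda q_{t-1}^0 = 0, \quad t = 1, 2, \dots, T+1 \tag{23}$$

$$\lambda: \quad \left( \sum_{t=0}^T q_t^0 \left( c_t + (k_{t+1} - (1-\delta) k_t) - w_t - \eta_t k_t \right) \right) \leq 0 \tag{24}$$

$$k_{T+1}: \quad -\lambda q_{T+1}^0 \leq 0, \quad \leq 0 \text{ if } k_{T+1} = 0; \; = 0 \text{ if } k_{T+1} > 0 \tag{25}$$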

Now we plug in for our guesses of prices and derive all the FONC of the planner problem
Eq. (7)-Eq. (10):
Combining Eq. (22) and Eq. (15), we get:

𝑢′ (𝐶𝑡 ) = 𝜇𝑡

which is Eq. (7).


Combining Eq. (23), Eq. (15), and Eq. (17) we get:

−𝜆𝛽 𝑡 𝜇𝑡 [(1 − 𝛿) + 𝑓 ′ (𝐾𝑡 )] + 𝜆𝛽 𝑡−1 𝜇𝑡−1 = 0 (26)

Rewriting Eq. (26) by dividing by $\lambda$ on both sides (which is nonzero due to $u' > 0$) we get:

$$\beta^t \mu_t [(1-\delta) + f'(K_t)] = \beta^{t-1} \mu_{t-1}$$

or

$$\beta \mu_t [(1-\delta) + f'(K_t)] = \mu_{t-1}$$


which is Eq. (8).
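As a sanity check on this step, consider a steady state in which $\mu_t$ is constant. The condition then reduces to $\beta [(1-\delta) + f'(\bar K)] = 1$. The sketch below verifies that Eq. (23) holds at the guessed prices in that case. It assumes $f(K) = A K^\alpha$, $\rho = 1/\beta - 1$, the normalization $\lambda = 1$, and illustrative parameter values; `mu_bar` is an arbitrary positive constant

```python
A, α, β, δ = 1.0, 0.33, 0.95, 0.02   # illustrative values (assumed)

def f_prime(K):
    """Marginal product of capital under f(K) = A * K**α (assumed)."""
    return α * A * K ** (α - 1)

ρ = 1 / β - 1
K_bar = ((ρ + δ) / (α * A)) ** (1 / (α - 1))   # solves f'(K) = ρ + δ
mu_bar = 1.7                                   # any positive constant multiplier

def q(t):
    """Guessed Hicks-Arrow price β^t μ_t with λ normalized to 1."""
    return β**t * mu_bar

η_bar = f_prime(K_bar)                         # guessed rental rate, Eq. (17)

# Left side of Eq. (23) (with λ = 1) for a few dates
residuals = [-q(t) * ((1 - δ) + η_bar) + q(t - 1) for t in range(1, 6)]

print(residuals)
```

Each residual should be zero up to rounding, confirming that the household's first-order condition for capital is satisfied at the planner's steady state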


Combining Eq. (24), Eq. (15), Eq. (16) and Eq. (17) after multiplying both sides of Eq. (24)
by 𝜆, we get:

𝑇
∑ 𝛽 𝑡 𝜇𝑡 (𝐶𝑡 + (𝐾𝑡+1 − (1 − 𝛿)𝐾𝑡 ) − 𝑓(𝐾𝑡 ) + 𝐾𝑡 𝑓 ′ (𝐾𝑡 ) − 𝑓 ′ (𝐾𝑡 )𝐾𝑡 ) ≤ 0
𝑡=0

Cancelling,

𝑇
∑ 𝛽 𝑡 𝜇𝑡 (𝐶𝑡 + 𝐾𝑡+1 − (1 − 𝛿)𝐾𝑡 − 𝐹 (𝐾𝑡 , 1)) ≤ 0
𝑡=0

Since 𝛽 𝑡 and 𝜇𝑡 are always positive here, (excepting perhaps the T+1 period) we get:

𝐶𝑡 + 𝐾𝑡+1 − (1 − 𝛿)𝐾𝑡 − 𝐹 (𝐾𝑡 , 1) = 0 for all 𝑡 in 0, … , 𝑇

which is Eq. (9)


Combining Eq. (25) and Eq. (15), we get:

−𝛽 𝑇 +1 𝜇𝑇 +1 ≤ 0

Dividing both sides by 𝛽 𝑇 +1 which will be strictly positive here, we get:

−𝜇𝑇 +1 ≤ 0

which is the Eq. (10) of our planning problem


Thus, at our guess of the equilibrium price system, the allocation that solves the
planning problem also solves the problem faced by a representative household liv-
ing in a competitive equilibrium
We now consider the problem faced by a firm in a competitive equilibrium:
If we plug Eq. (21) into Eq. (14) for all $t$, we get

$$\frac{\partial F(K_t, 1)}{\partial K_t} = f'(K_t) = \eta_t$$

which is Eq. (17)

If we now plug Eq. (21) into Eq. (14) for all $t$, we get:

$$\frac{\partial F(\tilde K_t, 1)}{\partial \tilde L} = f(K_t) - f'(K_t) K_t = w_t$$

which is exactly Eq. (16)


Thus, at our guess of the equilibrium price system, the allocation that solves the
planning problem also solves the problem faced by a firm within a competitive
equilibrium

By Eq. (19) and Eq. (20) this allocation is identical to the one that solves the consumer’s
problem
Note: Because budget sets are affected only by relative prices, $\{q_t^0\}$ is determined only up to multiplication by a positive constant
Normalization: We are free to choose a $\{q_t^0\}$ that makes $\lambda = 1$, thereby making $q_t^0$ be measured in units of the marginal utility of time 0 goods
We will also plot q, w and 𝜂 below to show the prices that induce the same aggregate move-
ments we saw earlier in the planning problem.

In [14]: @njit
def q_func(β, c, γ):
    # Here we choose numeraire to be u'(c_0) -- this is q^(t_0)_t
    T = len(c) - 1
    q = np.zeros(T+1)
    q[0] = 1
    for t in range(1, T+1):
        # Divide by u'(c_0) so that q[0] = 1 matches the chosen numeraire
        q[t] = β**t * u_prime(c[t], γ) / u_prime(c[0], γ)
    return q

@njit
def w_func(A, k, α):
    w = f(A, k, α) - k * f_prime(A, k, α)
    return w

@njit
def η_func(A, k, α):
    η = f_prime(A, k, α)
    return η

Now we calculate and plot for each 𝑇

In [15]: T_list = (250, 150, 75, 50)

fig, axes = plt.subplots(2, 3, figsize=(13, 6))

titles = ['Hicks-Arrow Prices', 'Labor Rental Rate', 'Capital Rental Rate',
          'Consumption', 'Capital', 'Lagrange Multiplier']
ylabels = [r'$q_t^0$', r'$w_t$', r'$\eta_t$', r'$c_t$', r'$k_t$', r'$\mu_t$']

for T in T_list:
    c = np.zeros(T+1)
    k = np.zeros(T+2)
    c[0] = 0.3
    k[0] = k_ss / 3
    c, k, μ = bisection_method(c, k, γ, δ, β, α, A)

    q = q_func(β, c, γ)
    w = w_func(A, k, α)[:-1]   # The wage depends on technology A, not on β
    η = η_func(A, k, α)[:-1]
    plots = [q, w, η, c, k, μ]

    for ax, plot, title, y in zip(axes.flatten(), plots, titles, ylabels):
        ax.plot(plot)
        ax.set(title=title, ylabel=y, xlabel='t')
        if title == 'Capital':
            ax.axhline(k_ss, lw=1, ls='--', c='k')
        if title == 'Consumption':
            ax.axhline(c_ss, lw=1, ls='--', c='k')

plt.tight_layout()
plt.show()

Failed to converge and hit maximum iteration


Converged successfully on iteration 39

Converged successfully on iteration 26


Converged successfully on iteration 25

Varying Curvature
Now we see how our results change if we keep $T$ constant but allow the curvature parameter $\gamma$ to vary, starting with $K_0$ below the steady state.
We plot the results for 𝑇 = 150

In [16]: γ_list = (1.1, 4, 6, 8)

T = 150

fig, axes = plt.subplots(2, 3, figsize=(13, 6))

for γ in γ_list:
    c = np.zeros(T+1)
    k = np.zeros(T+2)
    c[0] = 0.3
    k[0] = k_ss / 3
    c, k, μ = bisection_method(c, k, γ, δ, β, α, A)

    q = q_func(β, c, γ)
    w = w_func(A, k, α)[:-1]   # The wage depends on technology A, not on β
    η = η_func(A, k, α)[:-1]
    plots = [q, w, η, c, k, μ]

    for ax, plot, title, y in zip(axes.flatten(), plots, titles, ylabels):
        ax.plot(plot, label=rf'$\gamma = {γ}$')
        ax.set(title=title, ylabel=y, xlabel='t')
        if title == 'Capital':
            ax.axhline(k_ss, lw=1, ls='--', c='k')
        if title == 'Consumption':
            ax.axhline(c_ss, lw=1, ls='--', c='k')

axes[0, 0].legend()
plt.tight_layout()
plt.show()

Converged successfully on iteration 44


Converged successfully on iteration 37
Converged successfully on iteration 37
Converged successfully on iteration 37

Adjusting 𝛾 means adjusting how much individuals prefer to smooth consumption


Higher $\gamma$ means individuals prefer to smooth more, resulting in slower adjustments to the steady state allocations
Vice-versa for lower 𝛾

28.4.10 Yield Curves and Hicks-Arrow Prices Again

Now, we compute Hicks-Arrow prices again, but also calculate the implied yields to maturity
This will let us plot a yield curve
The key formulas are:
The yield to maturity

$$r_{t_0, t} = -\frac{\log q_t^{t_0}}{t - t_0}$$

A generic Hicks-Arrow price for any base-year $t_0 \leq t$

$$q_t^{t_0} = \beta^{t-t_0} \frac{u'(c_t)}{u'(c_{t_0})} = \beta^{t-t_0} \frac{c_t^{-\gamma}}{c_{t_0}^{-\gamma}}$$
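Two special cases give a quick sanity check on these formulas. With constant consumption the yield at every maturity is $-\log \beta$, and with consumption growing at a constant gross rate $1 + g$ it is $-\log \beta + \gamma \log(1 + g)$. The sketch below uses plain Python lists. The function names and the consumption paths are hypothetical and are not the lecture's `q_func` and `r_func`

```python
import math

def hicks_arrow_prices(c, β, γ, t_0=0):
    """q_t^{t_0} = β^(t - t_0) * (c_t / c_{t_0})^(-γ) for t = t_0, ..., len(c) - 1."""
    return [β ** (t - t_0) * (c[t] / c[t_0]) ** (-γ) for t in range(t_0, len(c))]

def yields(q):
    """Yield to maturity -log(q_j) / j for j >= 1; the j = 0 entry is undefined (nan)."""
    return [float('nan')] + [-math.log(q[j]) / j for j in range(1, len(q))]

β, γ = 0.95, 2.0

# Case 1: constant consumption
c_flat = [1.0] * 10
r_flat = yields(hicks_arrow_prices(c_flat, β, γ))

# Case 2: consumption growing at rate g, base year t_0 = 3
g = 0.02
c_grow = [(1 + g) ** t for t in range(10)]
r_grow = yields(hicks_arrow_prices(c_grow, β, γ, t_0=3))

print(r_flat[1], -math.log(β))
print(r_grow[1], -math.log(β) + γ * math.log(1 + g))
```

In both cases the yield curve is flat, so any slope in the plotted yield curves reflects consumption growth that changes over time along the transition path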

We redefine our function for 𝑞 to allow arbitrary base years, and define a new function for 𝑟,
then plot both
First, we plot when 𝑡0 = 0 as before, for different values of 𝑇 , with 𝐾0 below the steady state

In [17]: @njit
def q_func(t_0, β, c, γ):
    # Here we choose numeraire to be u'(c_{t_0}) -- this is q^(t_0)_t
    T = len(c) - 1
    q = np.zeros(T+1-t_0)
    q[0] = 1
    for t in range(t_0+1, T+1):
        q[t-t_0] = β**(t - t_0) * u_prime(c[t], γ) / u_prime(c[t_0], γ)
    return q

@njit
def r_func(t_0, β, c, γ):
    '''Yield to maturity'''
    T = len(c) - 1
    q = q_func(t_0, β, c, γ)   # Compute the prices once, outside the loop
    r = np.zeros(T+1-t_0)
    for t in range(t_0+1, T+1):
        r[t-t_0] = -np.log(q[t-t_0]) / (t - t_0)
    return r

t_0 = 0
T_list = [150, 75, 50]
γ = 2
titles = ['Hicks-Arrow Prices', 'Yields']
ylabels = [r'$q_t^0$', r'$r_t^0$']

fig, axes = plt.subplots(1, 2, figsize=(10, 5))

for T in T_list:
    c = np.zeros(T+1)
    k = np.zeros(T+2)
    c[0] = 0.3
    k[0] = k_ss / 3
    c, k, μ = bisection_method(c, k, γ, δ, β, α, A)
    q = q_func(t_0, β, c, γ)
    r = r_func(t_0, β, c, γ)

    for ax, plot, title, y in zip(axes, (q, r), titles, ylabels):
        ax.plot(plot)
        ax.set(title=title, ylabel=y, xlabel='t')

plt.tight_layout()
plt.show()

Converged successfully on iteration 39


Converged successfully on iteration 26
Converged successfully on iteration 25

Now we plot when 𝑡0 = 20

In [18]: t_0 = 20

fig, axes = plt.subplots(1, 2, figsize=(10, 5))

for T in T_list:
    c = np.zeros(T+1)
    k = np.zeros(T+2)
    c[0] = 0.3
    k[0] = k_ss / 3
    c, k, μ = bisection_method(c, k, γ, δ, β, α, A)
    q = q_func(t_0, β, c, γ)
    r = r_func(t_0, β, c, γ)

    for ax, plot, title, y in zip(axes, (q, r), titles, ylabels):
        ax.plot(plot)
        ax.set(title=title, ylabel=y, xlabel='t')

axes[1].set_title(f'Yields at $t_0 = {t_0}$')

plt.tight_layout()
plt.show()

Converged successfully on iteration 39


Converged successfully on iteration 26
Converged successfully on iteration 25

We shall have more to say about the term structure of interest rates in a later lecture on the
topic
29

A First Look at the Kalman Filter

29.1 Contents

• Overview 29.2

• The Basic Idea 29.3

• Convergence 29.4

• Implementation 29.5

• Exercises 29.6

• Solutions 29.7

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

29.2 Overview

This lecture provides a simple and intuitive introduction to the Kalman filter, for those who
either

• have heard of the Kalman filter but don’t know how it works, or
• know the Kalman filter equations, but don’t know where they come from

For additional (more advanced) reading on the Kalman filter, see

• [87], section 2.7


• [6]

The second reference presents a comprehensive treatment of the Kalman filter


Required knowledge: Familiarity with matrix manipulations, multivariate normal distribu-
tions, covariance matrices, etc.


29.3 The Basic Idea

The Kalman filter has many applications in economics, but for now let’s pretend that we are
rocket scientists
A missile has been launched from country Y and our mission is to track it
Let 𝑥 ∈ R2 denote the current location of the missile—a pair indicating latitude-longitude
coordinates on a map
At the present moment in time, the precise location 𝑥 is unknown, but we do have some be-
liefs about 𝑥
One way to summarize our knowledge is a point prediction 𝑥̂

• But what if the President wants to know the probability that the missile is currently
over the Sea of Japan?

• Then it is better to summarize our initial beliefs with a bivariate probability density 𝑝

– ∫𝐸 𝑝(𝑥)𝑑𝑥 indicates the probability that we attach to the missile being in region 𝐸

The density 𝑝 is called our prior for the random variable 𝑥


To keep things tractable in our example, we assume that our prior is Gaussian
In particular, we take

𝑝 = 𝑁 (𝑥,̂ Σ) (1)

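where $\hat x$ is the mean of the distribution and $\Sigma$ is a $2 \times 2$ covariance matrix. In our simulations, we will suppose that

$$\hat x = \begin{pmatrix} 0.2 \\ -0.2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 0.4 & 0.3 \\ 0.3 & 0.45 \end{pmatrix} \tag{2}$$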

This density 𝑝(𝑥) is shown below as a contour map, with the center of the red ellipse being
equal to 𝑥̂

In [2]: from scipy import linalg


import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt
%matplotlib inline

# == Set up the Gaussian prior density p == #


Σ = [[0.4, 0.3], [0.3, 0.45]]
Σ = np.matrix(Σ)
x_hat = np.matrix([0.2, -0.2]).T
# == Define the matrices G and R from the equation y = G x + N(0, R) == #
G = [[1, 0], [0, 1]]
G = np.matrix(G)
R = 0.5 * Σ
# == The matrices A and Q == #
A = [[1.2, 0], [0, -0.2]]
A = np.matrix(A)
Q = 0.3 * Σ
# == The observed value of y == #
y = np.matrix([2.3, -1.9]).T

# == Set up grid for plotting == #


x_grid = np.linspace(-1.5, 2.9, 100)
y_grid = np.linspace(-3.1, 1.7, 100)
X, Y = np.meshgrid(x_grid, y_grid)

def bivariate_normal(x, y, σ_x=1.0, σ_y=1.0, μ_x=0.0, μ_y=0.0, σ_xy=0.0):
    """
    Compute and return the probability density function of bivariate normal
    distribution of normal random variables x and y

    Parameters
    ----------
    x : array_like(float)
        Random variable

    y : array_like(float)
        Random variable

    σ_x : array_like(float)
        Standard deviation of random variable x

    σ_y : array_like(float)
        Standard deviation of random variable y

    μ_x : scalar(float)
        Mean value of random variable x

    μ_y : scalar(float)
        Mean value of random variable y

    σ_xy : array_like(float)
        Covariance of random variables x and y

    """

    x_μ = x - μ_x
    y_μ = y - μ_y

    ρ = σ_xy / (σ_x * σ_y)
    z = x_μ**2 / σ_x**2 + y_μ**2 / σ_y**2 - 2 * ρ * x_μ * y_μ / (σ_x * σ_y)
    denom = 2 * np.pi * σ_x * σ_y * np.sqrt(1 - ρ**2)
    return np.exp(-z / (2 * (1 - ρ**2))) / denom

def gen_gaussian_plot_vals(μ, C):
    "Z values for plotting the bivariate Gaussian N(μ, C)"
    m_x, m_y = float(μ[0]), float(μ[1])
    s_x, s_y = np.sqrt(C[0, 0]), np.sqrt(C[1, 1])
    s_xy = C[0, 1]
    return bivariate_normal(X, Y, s_x, s_y, m_x, m_y, s_xy)

# Plot the figure

fig, ax = plt.subplots(figsize=(10, 8))


ax.grid()

Z = gen_gaussian_plot_vals(x_hat, Σ)
ax.contourf(X, Y, Z, 6, alpha=0.6, cmap=cm.jet)
cs = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs, inline=1, fontsize=10)

plt.show()

29.3.1 The Filtering Step

We are now presented with some good news and some bad news
The good news is that the missile has been located by our sensors, which report that the cur-
rent location is 𝑦 = (2.3, −1.9)
The next figure shows the original prior 𝑝(𝑥) and the new reported location 𝑦

In [3]: fig, ax = plt.subplots(figsize=(10, 8))


ax.grid()

Z = gen_gaussian_plot_vals(x_hat, Σ)
ax.contourf(X, Y, Z, 6, alpha=0.6, cmap=cm.jet)
cs = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs, inline=1, fontsize=10)
ax.text(float(y[0]), float(y[1]), "$y$", fontsize=20, color="black")

plt.show()

The bad news is that our sensors are imprecise


In particular, we should interpret the output of our sensor not as 𝑦 = 𝑥, but rather as

𝑦 = 𝐺𝑥 + 𝑣, where 𝑣 ∼ 𝑁 (0, 𝑅) (3)

Here 𝐺 and 𝑅 are 2 × 2 matrices with 𝑅 positive definite. Both are assumed known, and the
noise term 𝑣 is assumed to be independent of 𝑥
How then should we combine our prior 𝑝(𝑥) = 𝑁 (𝑥,̂ Σ) and this new information 𝑦 to improve
our understanding of the location of the missile?
As you may have guessed, the answer is to use Bayes’ theorem, which tells us to update our
prior 𝑝(𝑥) to 𝑝(𝑥 | 𝑦) via

𝑝(𝑦 | 𝑥) 𝑝(𝑥)
𝑝(𝑥 | 𝑦) =
𝑝(𝑦)

where 𝑝(𝑦) = ∫ 𝑝(𝑦 | 𝑥) 𝑝(𝑥)𝑑𝑥


In solving for 𝑝(𝑥 | 𝑦), we observe that

• 𝑝(𝑥) = 𝑁 (𝑥,̂ Σ)
• In view of Eq. (3), the conditional density 𝑝(𝑦 | 𝑥) is 𝑁 (𝐺𝑥, 𝑅)
• 𝑝(𝑦) does not depend on 𝑥, and enters into the calculations only as a normalizing con-
stant

Because we are in a linear and Gaussian framework, the updated density can be computed by
calculating population linear regressions
In particular, the solution is known [1] to be

$$p(x \mid y) = N(\hat x^F, \Sigma^F)$$

where

$$\hat x^F := \hat x + \Sigma G' (G \Sigma G' + R)^{-1} (y - G \hat x) \quad \text{and} \quad \Sigma^F := \Sigma - \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma \tag{4}$$

Here $\Sigma G' (G \Sigma G' + R)^{-1}$ is the matrix of population regression coefficients of the hidden object $x - \hat x$ on the surprise $y - G \hat x$
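Eq. (4) can be evaluated directly at the values used in this lecture's figures. The following is a minimal sketch using plain NumPy arrays rather than `np.matrix`. The function name `filter_step` is a hypothetical helper and not part of QuantEcon.py

```python
import numpy as np

def filter_step(x_hat, Σ, y, G, R):
    """Return the filtered mean and covariance (x_hat_F, Σ_F) from Eq. (4)."""
    S = G @ Σ @ G.T + R                       # covariance of the surprise y - G x_hat
    M = np.linalg.solve(S.T, (Σ @ G.T).T).T   # M = Σ G' S^{-1} without forming the inverse
    x_hat_F = x_hat + M @ (y - G @ x_hat)
    Σ_F = Σ - M @ G @ Σ
    return x_hat_F, Σ_F

# Values from Eq. (2) and the sensor setup in this lecture
Σ = np.array([[0.4, 0.3], [0.3, 0.45]])
x_hat = np.array([0.2, -0.2])
G = np.eye(2)
R = 0.5 * Σ
y = np.array([2.3, -1.9])

x_hat_F, Σ_F = filter_step(x_hat, Σ, y, G, R)
print(x_hat_F)
print(Σ_F)
```

With $G = I$ and $R = 0.5 \Sigma$ the regression coefficient matrix is $\frac{2}{3} I$, so the filtered mean moves two thirds of the way from $\hat x$ towards $y$ and the covariance shrinks to $\Sigma / 3$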
This new density 𝑝(𝑥 | 𝑦) = 𝑁 (𝑥𝐹̂ , Σ𝐹 ) is shown in the next figure via contour lines and the
color map
The original density is left in as contour lines for comparison

In [4]: fig, ax = plt.subplots(figsize=(10, 8))


ax.grid()

Z = gen_gaussian_plot_vals(x_hat, Σ)
cs1 = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs1, inline=1, fontsize=10)
M = Σ * G.T * linalg.inv(G * Σ * G.T + R)
x_hat_F = x_hat + M * (y - G * x_hat)
Σ_F = Σ - M * G * Σ
new_Z = gen_gaussian_plot_vals(x_hat_F, Σ_F)
cs2 = ax.contour(X, Y, new_Z, 6, colors="black")
ax.clabel(cs2, inline=1, fontsize=10)
ax.contourf(X, Y, new_Z, 6, alpha=0.6, cmap=cm.jet)
ax.text(float(y[0]), float(y[1]), "$y$", fontsize=20, color="black")

plt.show()

Our new density twists the prior $p(x)$ in a direction determined by the new information $y - G \hat x$
In generating the figure, we set $G$ to the identity matrix and $R = 0.5 \Sigma$ for $\Sigma$ defined in Eq. (2)

29.3.2 The Forecast Step

What have we achieved so far?


We have obtained probabilities for the current location of the state (missile) given prior and
current information
This is called “filtering” rather than forecasting because we are filtering out noise rather than
looking into the future

• 𝑝(𝑥 | 𝑦) = 𝑁 (𝑥𝐹̂ , Σ𝐹 ) is called the filtering distribution

But now let’s suppose that we are given another task: to predict the location of the missile
after one unit of time (whatever that may be) has elapsed
To do this we need a model of how the state evolves
Let’s suppose that we have one, and that it’s linear and Gaussian. In particular,

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝑤𝑡+1 , where 𝑤𝑡 ∼ 𝑁 (0, 𝑄) (5)



Our aim is to combine this law of motion and our current distribution 𝑝(𝑥 | 𝑦) = 𝑁 (𝑥𝐹̂ , Σ𝐹 ) to
come up with a new predictive distribution for the location in one unit of time
In view of Eq. (5), all we have to do is introduce a random vector 𝑥𝐹 ∼ 𝑁 (𝑥𝐹̂ , Σ𝐹 ) and work
out the distribution of 𝐴𝑥𝐹 + 𝑤 where 𝑤 is independent of 𝑥𝐹 and has distribution 𝑁 (0, 𝑄)
Since linear combinations of Gaussians are Gaussian, 𝐴𝑥𝐹 + 𝑤 is Gaussian
Elementary calculations and the expressions in Eq. (4) tell us that

E[𝐴𝑥𝐹 + 𝑤] = 𝐴E𝑥𝐹 + E𝑤 = 𝐴𝑥𝐹̂ = 𝐴𝑥̂ + 𝐴Σ𝐺′ (𝐺Σ𝐺′ + 𝑅)−1 (𝑦 − 𝐺𝑥)̂

and

Var[𝐴𝑥𝐹 + 𝑤] = 𝐴 Var[𝑥𝐹 ]𝐴′ + 𝑄 = 𝐴Σ𝐹 𝐴′ + 𝑄 = 𝐴Σ𝐴′ − 𝐴Σ𝐺′ (𝐺Σ𝐺′ + 𝑅)−1 𝐺Σ𝐴′ + 𝑄

The matrix 𝐴Σ𝐺′ (𝐺Σ𝐺′ + 𝑅)−1 is often written as 𝐾Σ and called the Kalman gain

• The subscript Σ has been added to remind us that 𝐾Σ depends on Σ, but not 𝑦 or 𝑥̂

Using this notation, we can summarize our results as follows

Our updated prediction is the density $N(\hat x_{new}, \Sigma_{new})$ where

$$\begin{aligned} \hat x_{new} &:= A \hat x + K_\Sigma (y - G \hat x) \\ \Sigma_{new} &:= A \Sigma A' - K_\Sigma G \Sigma A' + Q \end{aligned} \tag{6}$$

• The density $p_{new}(x) = N(\hat x_{new}, \Sigma_{new})$ is called the predictive distribution

The predictive distribution is the new density shown in the following figure, where the update
has used parameters

$$A = \begin{pmatrix} 1.2 & 0.0 \\ 0.0 & -0.2 \end{pmatrix}, \qquad Q = 0.3 \Sigma$$

In [5]: fig, ax = plt.subplots(figsize=(10, 8))


ax.grid()

# Density 1
Z = gen_gaussian_plot_vals(x_hat, Σ)
cs1 = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs1, inline=1, fontsize=10)

# Density 2
M = Σ * G.T * linalg.inv(G * Σ * G.T + R)
x_hat_F = x_hat + M * (y - G * x_hat)
Σ_F = Σ - M * G * Σ
Z_F = gen_gaussian_plot_vals(x_hat_F, Σ_F)
cs2 = ax.contour(X, Y, Z_F, 6, colors="black")
ax.clabel(cs2, inline=1, fontsize=10)

# Density 3
new_x_hat = A * x_hat_F
new_Σ = A * Σ_F * A.T + Q
new_Z = gen_gaussian_plot_vals(new_x_hat, new_Σ)
cs3 = ax.contour(X, Y, new_Z, 6, colors="black")

ax.clabel(cs3, inline=1, fontsize=10)


ax.contourf(X, Y, new_Z, 6, alpha=0.6, cmap=cm.jet)
ax.text(float(y[0]), float(y[1]), "$y$", fontsize=20, color="black")

plt.show()
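The filtering and forecast steps can also be folded into a single function that implements Eq. (6) directly through the Kalman gain $K_\Sigma$. The sketch below uses plain NumPy arrays and the parameter values from this section. The function name `kalman_step` is hypothetical and is not the QuantEcon.py implementation

```python
import numpy as np

def kalman_step(x_hat, Σ, y, A, G, Q, R):
    """One prior-to-prior update (x_hat, Σ) -> (x_hat_new, Σ_new) as in Eq. (6)."""
    S = G @ Σ @ G.T + R
    K = A @ Σ @ G.T @ np.linalg.inv(S)           # Kalman gain K_Σ
    x_hat_new = A @ x_hat + K @ (y - G @ x_hat)
    Σ_new = A @ Σ @ A.T - K @ G @ Σ @ A.T + Q
    return x_hat_new, Σ_new

# Parameter values used for the figures in this section
Σ = np.array([[0.4, 0.3], [0.3, 0.45]])
x_hat = np.array([0.2, -0.2])
G = np.eye(2)
R = 0.5 * Σ
A = np.array([[1.2, 0.0], [0.0, -0.2]])
Q = 0.3 * Σ
y = np.array([2.3, -1.9])

x_hat_new, Σ_new = kalman_step(x_hat, Σ, y, A, G, Q, R)
print(x_hat_new)
print(Σ_new)
```

The result should agree with first filtering and then forecasting, that is, with $A \hat x^F$ and $A \Sigma^F A' + Q$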

29.3.3 The Recursive Procedure

Let’s look back at what we’ve done


We started the current period with a prior 𝑝(𝑥) for the location 𝑥 of the missile
We then used the current measurement 𝑦 to update to 𝑝(𝑥 | 𝑦)
Finally, we used the law of motion Eq. (5) for {𝑥𝑡 } to update to 𝑝𝑛𝑒𝑤 (𝑥)
If we now step into the next period, we are ready to go round again, taking 𝑝𝑛𝑒𝑤 (𝑥) as the
current prior
Swapping notation 𝑝𝑡 (𝑥) for 𝑝(𝑥) and 𝑝𝑡+1 (𝑥) for 𝑝𝑛𝑒𝑤 (𝑥), the full recursive procedure is:

1. Start the current period with prior $p_t(x) = N(\hat x_t, \Sigma_t)$
2. Observe current measurement $y_t$
3. Compute the filtering distribution $p_t(x \mid y) = N(\hat x_t^F, \Sigma_t^F)$ from $p_t(x)$ and $y_t$, applying Bayes rule and the conditional distribution Eq. (3)
4. Compute the predictive distribution $p_{t+1}(x) = N(\hat x_{t+1}, \Sigma_{t+1})$ from the filtering distribution and Eq. (5)
5. Increment $t$ by one and go to step 1

Repeating Eq. (6), the dynamics for $\hat x_t$ and $\Sigma_t$ are as follows

$$\begin{aligned} \hat x_{t+1} &= A \hat x_t + K_{\Sigma_t} (y_t - G \hat x_t) \\ \Sigma_{t+1} &= A \Sigma_t A' - K_{\Sigma_t} G \Sigma_t A' + Q \end{aligned} \tag{7}$$

These are the standard dynamic equations for the Kalman filter (see, for example, [87], page
58)

29.4 Convergence

The matrix Σ𝑡 is a measure of the uncertainty of our prediction 𝑥𝑡̂ of 𝑥𝑡


Apart from special cases, this uncertainty will never be fully resolved, regardless of how much
time elapses
One reason is that our prediction 𝑥𝑡̂ is made based on information available at 𝑡 − 1, not 𝑡
Even if we know the precise value of 𝑥𝑡−1 (which we don’t), the transition equation Eq. (5)
implies that 𝑥𝑡 = 𝐴𝑥𝑡−1 + 𝑤𝑡
Since the shock 𝑤𝑡 is not observable at 𝑡 − 1, any time 𝑡 − 1 prediction of 𝑥𝑡 will incur some
error (unless 𝑤𝑡 is degenerate)
However, it is certainly possible that Σ𝑡 converges to a constant matrix as 𝑡 → ∞
To study this topic, let’s expand the second equation in Eq. (7):

Σ𝑡+1 = 𝐴Σ𝑡 𝐴′ − 𝐴Σ𝑡 𝐺′ (𝐺Σ𝑡 𝐺′ + 𝑅)−1 𝐺Σ𝑡 𝐴′ + 𝑄 (8)

This is a nonlinear difference equation in Σ𝑡


A fixed point of Eq. (8) is a constant matrix Σ such that

Σ = 𝐴Σ𝐴′ − 𝐴Σ𝐺′ (𝐺Σ𝐺′ + 𝑅)−1 𝐺Σ𝐴′ + 𝑄 (9)

Eq. (8) is known as a discrete-time Riccati difference equation

Eq. (9) is known as a discrete-time algebraic Riccati equation
Conditions under which a fixed point exists and the sequence {Σ𝑡 } converges to it are dis-
cussed in [7] and [6], chapter 4
A sufficient (but not necessary) condition is that all the eigenvalues 𝜆𝑖 of 𝐴 satisfy |𝜆𝑖 | < 1
(cf. e.g., [6], p. 77)
(This strong condition assures that the unconditional distribution of 𝑥𝑡 converges as 𝑡 → +∞)
In this case, for any initial choice of Σ0 that is both non-negative and symmetric, the se-
quence {Σ𝑡 } in Eq. (8) converges to a non-negative symmetric matrix Σ that solves Eq. (9)
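Convergence of this kind can be illustrated by iterating on Eq. (8) until successive matrices stop changing. The following is a minimal sketch that borrows the parameter values from Exercise 3 below, where the eigenvalues of $A$ are 0.9 and $-0.1$. The function names, tolerance and iteration cap are assumptions made for illustration

```python
import numpy as np

def riccati_step(Σ, A, G, Q, R):
    """One application of the Riccati difference equation, Eq. (8)."""
    S = G @ Σ @ G.T + R
    return A @ Σ @ A.T - A @ Σ @ G.T @ np.linalg.solve(S, G @ Σ @ A.T) + Q

def iterate_to_fixed_point(Σ_0, A, G, Q, R, tol=1e-12, max_iter=10_000):
    """Iterate Eq. (8) from Σ_0 until the largest elementwise change falls below tol."""
    Σ = Σ_0
    for i in range(max_iter):
        Σ_next = riccati_step(Σ, A, G, Q, R)
        if np.max(np.abs(Σ_next - Σ)) < tol:
            return Σ_next, i + 1
        Σ = Σ_next
    raise RuntimeError("Riccati iteration did not converge")

A = np.array([[0.5, 0.4], [0.6, 0.3]])
G = np.eye(2)
Q = 0.3 * np.eye(2)
R = 0.5 * np.eye(2)

Σ_inf, n_iter = iterate_to_fixed_point(np.array([[0.9, 0.3], [0.3, 0.9]]), A, G, Q, R)
Σ_alt, _ = iterate_to_fixed_point(5.0 * np.eye(2), A, G, Q, R)   # a different starting point

print(n_iter)
print(Σ_inf)
```

Both starting points should lead to the same symmetric matrix, and that matrix should satisfy Eq. (9) up to numerical tolerance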

29.5 Implementation

The class Kalman from the QuantEcon.py package implements the Kalman filter

• Instance data consists of:

– the moments (𝑥𝑡̂ , Σ𝑡 ) of the current prior


– An instance of the LinearStateSpace class from QuantEcon.py

The latter represents a linear state space model of the form

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1


𝑦𝑡 = 𝐺𝑥𝑡 + 𝐻𝑣𝑡

where the shocks 𝑤𝑡 and 𝑣𝑡 are IID standard normals


To connect this with the notation of this lecture we set

𝑄 ∶= 𝐶𝐶 ′ and 𝑅 ∶= 𝐻𝐻 ′

• The class Kalman from the QuantEcon.py package has a number of methods, some that
we will wait to use until we study more advanced applications in subsequent lectures

• Methods pertinent for this lecture are:

– prior_to_filtered, which updates $(\hat x_t, \Sigma_t)$ to $(\hat x_t^F, \Sigma_t^F)$
– filtered_to_forecast, which updates the filtering distribution to the predictive distribution, which becomes the new prior $(\hat x_{t+1}, \Sigma_{t+1})$
– update, which combines the last two methods
– stationary_values, which computes the solution to Eq. (9) and the corresponding (stationary) Kalman gain

You can view the program on GitHub

29.6 Exercises

29.6.1 Exercise 1

Consider the following simple application of the Kalman filter, loosely based on [87], section
2.9.2
Suppose that

• all variables are scalars


• the hidden state {𝑥𝑡 } is in fact constant, equal to some 𝜃 ∈ R unknown to the modeler

State dynamics are therefore given by Eq. (5) with 𝐴 = 1, 𝑄 = 0 and 𝑥0 = 𝜃


The measurement equation is 𝑦𝑡 = 𝜃 + 𝑣𝑡 where 𝑣𝑡 is 𝑁 (0, 1) and IID

The task of this exercise is to simulate the model and, using the code from kalman.py, plot the first five predictive densities $p_t(x) = N(\hat x_t, \Sigma_t)$
As shown in [87], sections 2.9.1–2.9.2, these distributions asymptotically put all mass on the
unknown value 𝜃
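In this scalar case the recursion in Eq. (7) can be written out by hand, which shows why the mass concentrates. With $A = G = 1$, $Q = 0$ and $R = 1$ the variance obeys $\Sigma_{t+1} = \Sigma_t / (\Sigma_t + 1)$, so starting from $\Sigma_0 = 1$ we get $\Sigma_t = 1/(t+1)$. The sketch below is a hand-rolled illustration rather than the QuantEcon.py code, and it feeds in hypothetical noise-free observations $y_t = \theta$ purely to isolate the deterministic part of the dynamics

```python
def scalar_filter_path(y, x_hat_0=8.0, Σ_0=1.0, R=1.0):
    """Kalman recursion for the scalar model with A = G = 1 and Q = 0."""
    x_hat, Σ = x_hat_0, Σ_0
    path = [(x_hat, Σ)]
    for y_t in y:
        K = Σ / (Σ + R)                    # Kalman gain when A = G = 1
        x_hat = x_hat + K * (y_t - x_hat)  # mean update
        Σ = Σ - K * Σ                      # variance update (Q = 0)
        path.append((x_hat, Σ))
    return path

θ = 10.0
y = [θ] * 50                 # hypothetical noise-free observations, for illustration only
path = scalar_filter_path(y)
means = [m for m, _ in path]
variances = [v for _, v in path]

print(variances[:5])
print(means[-1])
```

The variance path does not depend on the observations at all, so it shrinks to zero at rate $1/(t+1)$ whatever data arrive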
In the simulation, take 𝜃 = 10, 𝑥0̂ = 8 and Σ0 = 1
Your figure should – modulo randomness – look something like this

29.6.2 Exercise 2

The preceding figure gives some support to the idea that probability mass converges to 𝜃
To get a better idea, choose a small $\epsilon > 0$ and calculate

$$z_t := 1 - \int_{\theta - \epsilon}^{\theta + \epsilon} p_t(x) \, dx$$

for $t = 0, 1, 2, \dots, T$
Plot $z_t$ against $t$, setting $\epsilon = 0.1$ and $T = 600$
Your figure should show error erratically declining something like this
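Because $p_t$ is a normal density, $z_t$ also has a closed form in terms of the normal distribution function, which offers a cross-check on numerical integration. The sketch below uses only the standard library. The function names are hypothetical, and the inputs are illustrative values rather than output from a simulated filter

```python
import math

def normal_cdf(x, mean, var):
    """Distribution function of N(mean, var) evaluated at x."""
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * var)))

def mass_outside(x_hat, Σ, θ, ϵ):
    """z = 1 - P(θ - ϵ < x < θ + ϵ) when x ~ N(x_hat, Σ)."""
    inside = normal_cdf(θ + ϵ, x_hat, Σ) - normal_cdf(θ - ϵ, x_hat, Σ)
    return 1 - inside

# Initial prior from Exercise 1: almost all mass lies outside the interval
z_wide = mass_outside(x_hat=8.0, Σ=1.0, θ=10.0, ϵ=0.1)

# A tight density centered on θ: almost no mass lies outside
z_tight = mass_outside(x_hat=10.0, Σ=1e-4, θ=10.0, ϵ=0.1)

# Centered on θ with standard deviation equal to ϵ: the one-sigma rule
z_one_sigma = mass_outside(x_hat=10.0, Σ=0.01, θ=10.0, ϵ=0.1)

print(z_wide, z_tight, z_one_sigma)
```

The last value is the familiar share of a normal distribution lying more than one standard deviation from its mean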

29.6.3 Exercise 3

As discussed above, if the shock sequence {𝑤𝑡 } is not degenerate, then it is not in general
possible to predict 𝑥𝑡 without error at time 𝑡 − 1 (and this would be the case even if we could
observe 𝑥𝑡−1 )
Let’s now compare the prediction 𝑥𝑡̂ made by the Kalman filter against a competitor who is
allowed to observe 𝑥𝑡−1
This competitor will use the conditional expectation E[𝑥𝑡 | 𝑥𝑡−1 ], which in this case is 𝐴𝑥𝑡−1
The conditional expectation is known to be the optimal prediction method in terms of mini-
mizing mean squared error
(More precisely, the minimizer of E ‖𝑥𝑡 − 𝑔(𝑥𝑡−1 )‖2 with respect to 𝑔 is 𝑔∗ (𝑥𝑡−1 ) ∶=
E[𝑥𝑡 | 𝑥𝑡−1 ])
Thus we are comparing the Kalman filter against a competitor who has more information (in
the sense of being able to observe the latent state) and behaves optimally in terms of mini-
mizing squared error
Our horse race will be assessed in terms of squared error
In particular, your task is to generate a graph plotting observations of both ‖𝑥𝑡 − 𝐴𝑥𝑡−1 ‖2 and
‖𝑥𝑡 − 𝑥𝑡̂ ‖2 against 𝑡 for 𝑡 = 1, … , 50
For the parameters, set 𝐺 = 𝐼, 𝑅 = 0.5𝐼 and 𝑄 = 0.3𝐼, where 𝐼 is the 2 × 2 identity
Set

$$
A = \begin{pmatrix} 0.5 & 0.4 \\ 0.6 & 0.3 \end{pmatrix}
$$

To initialize the prior density, set


$$
\Sigma_0 = \begin{pmatrix} 0.9 & 0.3 \\ 0.3 & 0.9 \end{pmatrix}
$$

and 𝑥0̂ = (8, 8)


Finally, set 𝑥0 = (0, 0)
You should end up with a figure similar to the following (modulo randomness)

Observe how, after an initial learning period, the Kalman filter performs quite well, even rela-
tive to the competitor who predicts optimally with knowledge of the latent state

29.6.4 Exercise 4

Try varying the coefficient 0.3 in 𝑄 = 0.3𝐼 up and down


Observe how the diagonal values in the stationary solution Σ (see Eq. (9)) increase and de-
crease in line with this coefficient
The interpretation is that more randomness in the law of motion for 𝑥𝑡 causes more (perma-
nent) uncertainty in prediction
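Exercise 4 can also be explored numerically without the quantecon library. The sketch below iterates the Riccati recursion for the prediction error covariance until it settles, for a few values of the coefficient in 𝑄. It assumes the recursion has the standard form Σ' = AΣA' − AΣG'(GΣG' + R)⁻¹GΣA' + Q; the helper name `stationary_sigma`, the starting matrix and the grid of coefficients are illustrative choices, and the remaining parameters are those of Exercise 3.

```python
import numpy as np

# Parameters from Exercise 3
A = np.array([[0.5, 0.4],
              [0.6, 0.3]])
G = np.identity(2)
R = 0.5 * np.identity(2)


def stationary_sigma(q_coef, tol=1e-12, max_iter=10_000):
    """Iterate the Riccati recursion until successive iterates agree.

    Hypothetical helper: q_coef is the coefficient in Q = q_coef * I.
    """
    Q = q_coef * np.identity(2)
    Σ = np.identity(2)  # starting point, chosen for convenience
    for _ in range(max_iter):
        gain_term = A @ Σ @ G.T @ np.linalg.inv(G @ Σ @ G.T + R)
        Σ_new = A @ Σ @ A.T - gain_term @ G @ Σ @ A.T + Q
        if np.max(np.abs(Σ_new - Σ)) < tol:
            return Σ_new
        Σ = Σ_new
    return Σ


q_grid = [0.1, 0.3, 0.6]  # illustrative coefficients
diagonals = {q: np.diag(stationary_sigma(q)) for q in q_grid}
for q, d in diagonals.items():
    print(f"Q = {q}I  ->  diag(Σ) = {d}")
```

For the coefficient 0.3 the result should agree with the stationary prediction error variance printed in the solution to Exercise 3, and the diagonal entries should rise as the coefficient rises.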

29.7 Solutions
In [6]: from quantecon import Kalman
from quantecon import LinearStateSpace
from scipy.stats import norm

29.7.1 Exercise 1
In [7]: # == parameters == #
θ = 10 # Constant value of state x_t
A, C, G, H = 1, 0, 1, 1
ss = LinearStateSpace(A, C, G, H, mu_0=θ)

# == set prior, initialize kalman filter == #


x_hat_0, Σ_0 = 8, 1
kalman = Kalman(ss, x_hat_0, Σ_0)

# == draw observations of y from state space model == #


N = 5
x, y = ss.simulate(N)
y = y.flatten()

# == set up plot == #
fig, ax = plt.subplots(figsize=(10,8))
xgrid = np.linspace(θ - 5, θ + 2, 200)

for i in range(N):
    # == record the current predicted mean and variance == #
    m, v = [float(z) for z in (kalman.x_hat, kalman.Sigma)]
    # == plot, update filter == #
    ax.plot(xgrid, norm.pdf(xgrid, loc=m, scale=np.sqrt(v)), label=f'$t={i}$')
    kalman.update(y[i])

ax.set_title(f'First {N} densities when $\\theta = {θ:.1f}$')


ax.legend(loc='upper left')
plt.show()
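One way to see why the densities tighten is to look at the variance recursion on its own. With 𝐴 = 𝐺 = 1, 𝑄 = 0 and unit measurement noise, the predictive variance update reduces to Σ' = Σ/(Σ + 1), so starting from Σ0 = 1 the variance at time 𝑡 is 1/(𝑡 + 1). The following is a minimal sketch under that scalar reduction; the variable names are illustrative.

```python
# Scalar predictive-variance recursion for Exercise 1 (A = G = 1, Q = 0, R = 1)
Σ = 1.0          # prior variance Σ_0 from the exercise
variances = []
for t in range(5):
    variances.append(Σ)
    # Σ' = Σ - Σ**2 / (Σ + R) with R = 1, which simplifies to Σ / (Σ + 1)
    Σ = Σ / (Σ + 1)

for t, v in enumerate(variances):
    print(f"t={t}: variance = {v:.4f}")
```

The variance does not depend on the observations, which is why the densities narrow at the same rate in every simulation even though their means move around.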

29.7.2 Exercise 2
In [8]: from scipy.integrate import quad

ϵ = 0.1
θ = 10 # Constant value of state x_t
A, C, G, H = 1, 0, 1, 1
ss = LinearStateSpace(A, C, G, H, mu_0=θ)

x_hat_0, Σ_0 = 8, 1
kalman = Kalman(ss, x_hat_0, Σ_0)

T = 600
z = np.empty(T)
x, y = ss.simulate(T)
y = y.flatten()

for t in range(T):
    # Record the current predicted mean and variance
    m, v = [float(temp) for temp in (kalman.x_hat, kalman.Sigma)]

    # Mass that the predictive density puts outside [θ - ϵ, θ + ϵ]
    f = lambda x: norm.pdf(x, loc=m, scale=np.sqrt(v))
    integral, error = quad(f, θ - ϵ, θ + ϵ)
    z[t] = 1 - integral

    kalman.update(y[t])

fig, ax = plt.subplots(figsize=(9, 7))


ax.set_ylim(0, 1)
ax.set_xlim(0, T)
ax.plot(range(T), z)
ax.fill_between(range(T), np.zeros(T), z, color="blue", alpha=0.2)
plt.show()
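Because each predictive density is Gaussian, the integral in 𝑧𝑡 also has a closed form in terms of the normal CDF, which avoids numerical quadrature. The sketch below compares the two approaches for a single mean and variance. The function names and the values of `m` and `v` are made up for illustration and are not taken from a simulation.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def tail_mass_quad(m, v, θ=10, ε=0.1):
    "1 minus the N(m, v) mass on [θ - ε, θ + ε], by numerical quadrature."
    f = lambda x: norm.pdf(x, loc=m, scale=np.sqrt(v))
    integral, _ = quad(f, θ - ε, θ + ε)
    return 1 - integral


def tail_mass_cdf(m, v, θ=10, ε=0.1):
    "The same quantity computed from the normal CDF."
    s = np.sqrt(v)
    inside = norm.cdf(θ + ε, loc=m, scale=s) - norm.cdf(θ - ε, loc=m, scale=s)
    return 1 - inside


m, v = 9.8, 0.05   # illustrative mean and variance
print(tail_mass_quad(m, v), tail_mass_cdf(m, v))
```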

29.7.3 Exercise 3
In [9]: from numpy.random import multivariate_normal
from scipy.linalg import eigvals

# === Define A, C, G, H === #


G = np.identity(2)
H = np.sqrt(0.5) * np.identity(2)

A = [[0.5, 0.4],
[0.6, 0.3]]
C = np.sqrt(0.3) * np.identity(2)

# === Set up state space mode, initial value x_0 set to zero === #
ss = LinearStateSpace(A, C, G, H, mu_0 = np.zeros(2))

# === Define the prior density === #


Σ = [[0.9, 0.3],
[0.3, 0.9]]
Σ = np.array(Σ)
x_hat = np.array([8, 8])

# === Initialize the Kalman filter === #


kn = Kalman(ss, x_hat, Σ)

# == Print eigenvalues of A == #
print("Eigenvalues of A:")
print(eigvals(A))

# == Print stationary Σ == #
S, K = kn.stationary_values()
print("Stationary prediction error variance:")
print(S)

# === Generate the plot === #


T = 50
x, y = ss.simulate(T)

e1 = np.empty(T-1)
e2 = np.empty(T-1)

for t in range(1, T):
    kn.update(y[:,t])
    e1[t-1] = np.sum((x[:, t] - kn.x_hat.flatten())**2)
    e2[t-1] = np.sum((x[:, t] - A @ x[:, t-1])**2)

fig, ax = plt.subplots(figsize=(9,6))
ax.plot(range(1, T), e1, 'k-', lw=2, alpha=0.6, label='Kalman filter error')
ax.plot(range(1, T), e2, 'g-', lw=2, alpha=0.6, label='Conditional expectation error')
ax.legend()
plt.show()

Eigenvalues of A:
[ 0.9+0.j -0.1+0.j]
Stationary prediction error variance:
[[0.40329108 0.1050718 ]
[0.1050718 0.41061709]]

Footnotes
[1] See, for example, page 93 of [18]. To get from his expressions to the ones used above, you
will also need to apply the Woodbury matrix identity.
30

Reverse Engineering a la Muth

30.1 Contents

• Friedman (1956) and Muth (1960) 30.2

Co-author: Chase Coleman


In addition to what’s in Anaconda, this lecture uses the quantecon library

In [1]: !pip install quantecon

We’ll also need the following imports

In [2]: import matplotlib.pyplot as plt


import numpy as np
import scipy.linalg as la

from quantecon import Kalman


from quantecon import LinearStateSpace
from scipy.stats import norm

%matplotlib inline
np.set_printoptions(linewidth=120, precision=4, suppress=True)

This lecture uses the Kalman filter to reformulate John F. Muth’s first paper [98] about ratio-
nal expectations
Muth used classical prediction methods to reverse engineer a stochastic process that renders
optimal Milton Friedman’s [43] “adaptive expectations” scheme

30.2 Friedman (1956) and Muth (1960)

Milton Friedman [43] (1956) posited that consumers forecast their future disposable income
with the adaptive expectations scheme



$$
y_{t+i,t} = K \sum_{j=0}^{\infty} (1 - K)^j y_{t-j} \tag{1}
$$


where 𝐾 ∈ (0, 1) and 𝑦𝑡+𝑖,𝑡 is a forecast of future 𝑦 over horizon 𝑖
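The scheme in Eq. (1) can be computed recursively: the new forecast equals the previous forecast plus 𝐾 times the latest forecast error. The sketch below checks this equivalence on made-up data. It truncates the infinite sum at the start of the sample (equivalently, it treats earlier observations as zero), and both the series and the value 𝐾 = 0.2 are arbitrary choices for illustration.

```python
import numpy as np

K = 0.2                                       # illustrative smoothing weight
rng = np.random.default_rng(0)
y = 10 + rng.standard_normal(200).cumsum()    # made-up income series

# Direct evaluation of K * sum_j (1 - K)**j * y[t - j], truncated at j = t
direct = np.array([
    K * sum((1 - K)**j * y[t - j] for j in range(t + 1))
    for t in range(len(y))
])

# Recursive form: forecast_t = forecast_{t-1} + K * (y_t - forecast_{t-1})
recursive = np.empty_like(y)
forecast = 0.0                                # pre-sample forecast, matching the truncation
for t, y_t in enumerate(y):
    forecast = forecast + K * (y_t - forecast)
    recursive[t] = forecast

print(np.max(np.abs(direct - recursive)))
```

The recursive form is the reason the scheme is called adaptive: the forecaster only needs the last forecast and the newest observation.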


Milton Friedman justified the exponential smoothing forecasting scheme Eq. (1) informally,
noting that it seemed a plausible way to use past income to forecast future income
In his first paper about rational expectations, John F. Muth [98] reverse-engineered a univariate
stochastic process $\{y_t\}_{t=-\infty}^{\infty}$ for which Milton Friedman’s adaptive expectations scheme
gives linear least squares forecasts of 𝑦𝑡+𝑖 for any horizon 𝑖
Muth sought a setting and a sense in which Friedman’s forecasting scheme is optimal
That is, Muth asked for what optimal forecasting question is Milton Friedman’s adaptive
expectation scheme the answer
Muth (1960) used classical prediction methods based on lag-operators and 𝑧-transforms to
find the answer to his question
Please see lectures Classical Control with Linear Algebra and Classical Filtering and Predic-
tion with Linear Algebra for an introduction to the classical tools that Muth used
Rather than using those classical tools, in this lecture we apply the Kalman filter to express
the heart of Muth’s analysis concisely
The lecture First Look at Kalman Filter describes the Kalman filter
We’ll use limiting versions of the Kalman filter corresponding to what are called stationary
values in that lecture

30.2.1 A Process for Which Adaptive Expectations are Optimal

Suppose that an observable 𝑦𝑡 is the sum of an unobserved random walk 𝑥𝑡 and an IID shock
𝜖2,𝑡 :

$$
\begin{aligned}
x_{t+1} &= x_t + \sigma_x \epsilon_{1,t+1} \\
y_t &= x_t + \sigma_y \epsilon_{2,t}
\end{aligned}
\tag{2}
$$

where

$$
\begin{bmatrix} \epsilon_{1,t+1} \\ \epsilon_{2,t} \end{bmatrix} \sim \mathcal{N}(0, I)
$$

is an IID process
Note: A property of the state-space representation Eq. (2) is that in general neither 𝜖1,𝑡 nor
𝜖2,𝑡 is in the space spanned by square-summable linear combinations of 𝑦𝑡 , 𝑦𝑡−1 , …
In general $\begin{bmatrix} \epsilon_{1,t} \\ \epsilon_{2,t} \end{bmatrix}$ has more information about future 𝑦𝑡+𝑗 ’s than is contained in 𝑦𝑡 , 𝑦𝑡−1 , …
We can use the asymptotic or stationary values of the Kalman gain and the one-step-ahead
conditional state covariance matrix to compute a time-invariant innovations representation

$$
\begin{aligned}
\hat x_{t+1} &= \hat x_t + K a_t \\
y_t &= \hat x_t + a_t
\end{aligned}
\tag{3}
$$

where 𝑥𝑡̂ = 𝐸[𝑥𝑡 |𝑦𝑡−1 , 𝑦𝑡−2 , …] and 𝑎𝑡 = 𝑦𝑡 − 𝐸[𝑦𝑡 |𝑦𝑡−1 , 𝑦𝑡−2 , …]



Note: A key property about an innovations representation is that 𝑎𝑡 is in the space spanned
by square summable linear combinations of 𝑦𝑡 , 𝑦𝑡−1 , …
For more ramifications of this property, see the lectures Shock Non-Invertibility and Recursive
Models of Dynamic Linear Economies
Later we’ll stack these state-space systems Eq. (2) and Eq. (3) to display some classic findings
of Muth
But first, let’s create an instance of the state-space system Eq. (2), apply the quantecon
Kalman class to it, and then use the result to construct the associated “innovations representation”

In [3]: # Make some parameter choices


# sigx/sigy are state noise std err and measurement noise std err
μ_0, σ_x, σ_y = 10, 1, 5

# Create a LinearStateSpace object


A, C, G, H = 1, σ_x, 1, σ_y
ss = LinearStateSpace(A, C, G, H, mu_0=μ_0)

# Set prior and initialize the Kalman type


x_hat_0, Σ_0 = 10, 1
kmuth = Kalman(ss, x_hat_0, Σ_0)

# Computes stationary values which we need for the innovation representation


S1, K1 = kmuth.stationary_values()

# Form innovation representation state-space


Ak, Ck, Gk, Hk = A, K1, G, 1

ssk = LinearStateSpace(Ak, Ck, Gk, Hk, mu_0=x_hat_0)

30.2.2 Some Useful State-Space Math

Now we want to map the time-invariant innovations representation Eq. (3) and the original
state-space system Eq. (2) into a convenient form for deducing the impulse responses from
the original shocks to the 𝑥𝑡 and 𝑥𝑡̂
Putting both of these representations into a single state-space system is yet another applica-
tion of the insight that “finding the state is an art”
We’ll define a state vector and appropriate state-space matrices that allow us to represent
both systems in one fell swoop
Note that

𝑎𝑡 = 𝑥𝑡 + 𝜎𝑦 𝜖2,𝑡 − 𝑥𝑡̂

so that

$$
\begin{aligned}
\hat x_{t+1} &= \hat x_t + K (x_t + \sigma_y \epsilon_{2,t} - \hat x_t) \\
&= (1 - K) \hat x_t + K x_t + K \sigma_y \epsilon_{2,t}
\end{aligned}
$$

The stacked system

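$$
\begin{bmatrix} x_{t+1} \\ \hat x_{t+1} \\ \epsilon_{2,t+1} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ K & (1 - K) & K \sigma_y \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_t \\ \hat x_t \\ \epsilon_{2,t} \end{bmatrix}
+
\begin{bmatrix} \sigma_x & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \epsilon_{1,t+1} \\ \epsilon_{2,t+1} \end{bmatrix}
$$

$$
\begin{bmatrix} y_t \\ a_t \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & \sigma_y \\ 1 & -1 & \sigma_y \end{bmatrix}
\begin{bmatrix} x_t \\ \hat x_t \\ \epsilon_{2,t} \end{bmatrix}
$$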

is a state-space system that tells us how the shocks $\begin{bmatrix} \epsilon_{1,t+1} \\ \epsilon_{2,t+1} \end{bmatrix}$ affect states $\hat x_{t+1}$, 𝑥𝑡 , the observable 𝑦𝑡 , and the innovation 𝑎𝑡
With this tool at our disposal, let’s form the composite system and simulate it

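In [4]: # Create grand state-space for y_t, a_t as observed vars -- Use stacking trick above
K = K1.item() # stationary_values returns the gain as a 1x1 array; take the scalar
Af = np.array([[1, 0, 0],
               [K, 1 - K, K * σ_y],
               [0, 0, 0]])
# The x̂ row of Cf is zero: ε_{2,t} reaches x̂_{t+1} through Af, as in the stacked system above
Cf = np.array([[σ_x, 0],
               [ 0, 0],
               [ 0, 1]])
Gf = np.array([[1, 0, σ_y],
               [1, -1, σ_y]])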

μ_true, μ_prior = 10, 10


μ_f = np.array([μ_true, μ_prior, 0]).reshape(3, 1)

# Create the state-space


ssf = LinearStateSpace(Af, Cf, Gf, mu_0=μ_f)

# Draw observations of y from the state-space model


N = 50
xf, yf = ssf.simulate(N)

print(f"Kalman gain = {K1}")


print(f"Conditional variance = {S1}")

Kalman gain = [[0.181]]


Conditional variance = [[5.5249]]
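These two numbers can be cross-checked by hand. For the scalar system Eq. (2), the stationary one-step-ahead variance Σ solves Σ² − σ_x²Σ − σ_x²σ_y² = 0 and the gain is 𝐾 = Σ/(Σ + σ_y²). This assumes the standard scalar Riccati equation with 𝐴 = 𝐺 = 1, 𝑄 = σ_x² and 𝑅 = σ_y²; the names `Σ_stat` and `K_stat` are illustrative.

```python
import numpy as np

σ_x, σ_y = 1, 5   # the parameter choices made earlier in the lecture

# Positive root of Σ**2 - σ_x**2 * Σ - σ_x**2 * σ_y**2 = 0
Σ_stat = (σ_x**2 + np.sqrt(σ_x**4 + 4 * σ_x**2 * σ_y**2)) / 2
K_stat = Σ_stat / (Σ_stat + σ_y**2)

print(f"Stationary variance = {Σ_stat:.4f}")
print(f"Kalman gain = {K_stat:.4f}")
```

The formula makes the comparative statics visible: a larger measurement noise σ_y raises Σ but lowers 𝐾, so noisier observations receive less weight in the forecast update.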

Now that we have simulated our joint system, we have 𝑥𝑡 , 𝑥𝑡̂ , and 𝑦𝑡
We can now investigate how these variables are related by plotting some key objects

30.2.3 Estimates of Unobservables

First, let’s plot the hidden state 𝑥𝑡 and the filtered version 𝑥𝑡̂ that is the linear least squares
projection of 𝑥𝑡 on the history 𝑦𝑡−1 , 𝑦𝑡−2 , …

In [5]: plt.plot(xf[0, :], label="$x_t$")


plt.plot(xf[1, :], label="Filtered $x_t$")
plt.legend()
plt.xlabel("Time")
plt.title(r"$x$ vs $\hat{x}$")
plt.show()

Note how 𝑥𝑡 and 𝑥𝑡̂ differ


For Friedman, 𝑥𝑡̂ and not 𝑥𝑡 is the consumer’s idea about her/his permanent income

30.2.4 Relation between Unobservable and Observable

Now let’s plot 𝑥𝑡 and 𝑦𝑡


Recall that 𝑦𝑡 is just 𝑥𝑡 plus white noise

In [6]: plt.plot(yf[0, :], label="y")


plt.plot(xf[0, :], label="x")
plt.legend()
plt.title(r"$x$ and $y$")
plt.xlabel("Time")
plt.show()

We see above that 𝑦 seems to look like white noise around the values of 𝑥

30.2.5 Innovations

Recall that we wrote down the innovation representation that depended on 𝑎𝑡 . We now plot
the innovations {𝑎𝑡 }:

In [7]: plt.plot(yf[1, :], label="a")


plt.legend()
plt.title(r"Innovation $a_t$")
plt.xlabel("Time")
plt.show()

30.2.6 MA and AR Representations

Now we shall extract from the Kalman instance kmuth coefficients of

• a fundamental moving average representation that represents 𝑦𝑡 as a one-sided moving
sum of current and past 𝑎𝑡 s that are square summable linear combinations of 𝑦𝑡 , 𝑦𝑡−1 , …
• a univariate autoregression representation that depicts the coefficients in a linear least
squares projection of 𝑦𝑡 on the semi-infinite history 𝑦𝑡−1 , 𝑦𝑡−2 , …

Then we’ll plot each of them

In [8]: # Kalman Methods for MA and VAR


coefs_ma = kmuth.stationary_coefficients(5, "ma")
coefs_var = kmuth.stationary_coefficients(5, "var")

# Coefficients come in a list of arrays, but we


# want to plot them and so need to stack into an array
coefs_ma_array = np.vstack(coefs_ma)
coefs_var_array = np.vstack(coefs_var)

fig, ax = plt.subplots(2)
ax[0].plot(coefs_ma_array, label="MA")
ax[0].legend()
ax[1].plot(coefs_var_array, label="VAR")
ax[1].legend()

plt.show()

The moving average coefficients in the top panel show tell-tale signs of 𝑦𝑡 being a process
whose first difference is a first-order moving average
The autoregressive coefficients decline geometrically with decay rate (1 − 𝐾)
These are exactly the target outcomes that Muth (1960) aimed to reverse engineer

In [9]: print(f'decay parameter 1 - K1 = {1 - K1}')

decay parameter 1 - K1 = [[0.819]]
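Both properties can be checked directly from the innovations representation Eq. (3). Substituting one line of Eq. (3) into the other gives 𝑦𝑡+1 − 𝑦𝑡 = 𝑎𝑡+1 − (1 − 𝐾)𝑎𝑡, and iterating on 𝑥̂ gives geometric weights 𝐾(1 − 𝐾)^𝑗 on past 𝑦 plus a term in the initial condition that dies out at rate (1 − 𝐾). The sketch below simulates Eq. (3) with made-up Gaussian innovations; 𝐾 is set to the rounded stationary gain reported earlier and the initial value 10 is an arbitrary choice.

```python
import numpy as np

K = 0.181                       # stationary Kalman gain reported above (rounded)
T = 300
rng = np.random.default_rng(1)
a = rng.standard_normal(T)      # made-up innovations

# Simulate Eq. (3): x̂_{t+1} = x̂_t + K a_t,  y_t = x̂_t + a_t
x_hat = np.empty(T + 1)
x_hat[0] = 10.0
for t in range(T):
    x_hat[t + 1] = x_hat[t] + K * a[t]
y = x_hat[:T] + a

# The first difference of y is a first-order moving average in a
diff_y = y[1:] - y[:-1]
ma1 = a[1:] - (1 - K) * a[:-1]
print(np.max(np.abs(diff_y - ma1)))

# Geometric weights K (1 - K)^j on past y reproduce x̂, up to a vanishing initial term
j = np.arange(T)
weights = K * (1 - K)**j
x_hat_T_from_y = weights @ y[::-1] + (1 - K)**T * x_hat[0]
print(abs(x_hat_T_from_y - x_hat[T]))
```

The geometric weights are the same ones that appear in Friedman's scheme Eq. (1), which is the sense in which adaptive expectations are optimal for this process.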


Part VI

Dynamic Programming

31

Shortest Paths

31.1 Contents

• Overview 31.2

• Outline of the Problem 31.3

• Finding Least-Cost Paths 31.4

• Solving for Minimum Cost-to-Go 31.5

• Exercises 31.6

• Solutions 31.7

31.2 Overview

The shortest path problem is a classic problem in mathematics and computer science with
applications in

• Economics (sequential decision making, analysis of social networks, etc.)


• Operations research and transportation
• Robotics and artificial intelligence
• Telecommunication network design and routing
• etc., etc.

Variations of the methods we discuss in this lecture are used millions of times every day, in
applications such as

• Google Maps
• routing packets on the internet

For us, the shortest path problem also provides a nice introduction to the logic of dynamic
programming
Dynamic programming is an extremely powerful optimization technique that we apply in
many lectures on this site


31.3 Outline of the Problem

The shortest path problem is one of finding how to traverse a graph from one specified node
to another at minimum cost
Consider the following graph

We wish to travel from node (vertex) A to node G at minimum cost

• Arrows (edges) indicate the movements we can take


• Numbers on edges indicate the cost of traveling that edge

Possible interpretations of the graph include

• Minimum cost for supplier to reach a destination


• Routing of packets on the internet (minimize time)
• Etc., etc.

For this simple graph, a quick scan of the edges shows that the optimal paths are

• A, C, F, G at cost 8
• A, D, F, G at cost 8

31.4 Finding Least-Cost Paths

For large graphs, we need a systematic solution


Let 𝐽 (𝑣) denote the minimum cost-to-go from node 𝑣, understood as the total cost from 𝑣 if
we take the best route
Suppose that we know 𝐽 (𝑣) for each node 𝑣, as shown below for the graph from the preceding
example

Note that 𝐽 (𝐺) = 0


The best path can now be found as follows

• Start at A
• From node v, move to any node that solves

$$
\min_{w \in F_v} \{ c(v, w) + J(w) \} \tag{1}
$$

where

• 𝐹𝑣 is the set of nodes that can be reached from 𝑣 in one step


• 𝑐(𝑣, 𝑤) is the cost of traveling from 𝑣 to 𝑤

Hence, if we know the function 𝐽 , then finding the best path is almost trivial
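The rule in Eq. (1) is short to code once 𝐽 is in hand. The sketch below uses a small made-up graph (not the one in the figure), with 𝐽 worked out by hand for destination D; the function name `best_path` and the dictionary layout are illustrative choices.

```python
# Hypothetical graph: graph[v] lists (w, cost) pairs for nodes w reachable from v
graph = {
    'A': [('B', 1), ('C', 5)],
    'B': [('C', 1), ('D', 4)],
    'C': [('D', 1)],
    'D': [],
}
# Cost-to-go to destination D, computed by hand for this graph
J = {'A': 3, 'B': 2, 'C': 1, 'D': 0}


def best_path(graph, J, start, destination):
    "From each node v, move to a node w that minimizes c(v, w) + J(w)."
    path, total = [start], 0
    v = start
    while v != destination:
        w, cost = min(graph[v], key=lambda edge: edge[1] + J[edge[0]])
        path.append(w)
        total += cost
        v = w
    return path, total


path, total = best_path(graph, J, 'A', 'D')
print(path, total)
```

Note that the direct edge from B to D is never used: it is cheaper to pass through C, and 𝐽 encodes exactly that information.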
But how to find 𝐽 ?
Some thought will convince you that, for every node 𝑣, the function 𝐽 satisfies

$$
J(v) = \min_{w \in F_v} \{ c(v, w) + J(w) \} \tag{2}
$$

This is known as the Bellman equation, after the mathematician Richard Bellman

31.5 Solving for Minimum Cost-to-Go

The standard algorithm for finding 𝐽 is to start with

𝐽0 (𝑣) = 𝑀 if 𝑣 ≠ destination, else 𝐽0 (𝑣) = 0 (3)

where 𝑀 is some large number


Now we use the following algorithm

1. Set 𝑛 = 0
2. Set $J_{n+1}(v) = \min_{w \in F_v} \{c(v, w) + J_n(w)\}$ for all 𝑣
3. If 𝐽𝑛+1 and 𝐽𝑛 are not equal then increment 𝑛, go to 2

In general, this sequence converges to 𝐽 —the proof is omitted
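The three steps above can be sketched on a small made-up graph before tackling the exercise. The graph, the node names and the dictionary layout below are hypothetical; 𝑀 plays the role of the large number in Eq. (3).

```python
# Hypothetical graph: graph[v] lists (w, cost) pairs for nodes w reachable from v
graph = {
    'A': [('B', 1), ('C', 5)],
    'B': [('C', 1), ('D', 4)],
    'C': [('D', 1)],
    'D': [],
}
destination = 'D'
M = 1e10   # "some large number"

# Initial condition: J_0(v) = M for v != destination, J_0(destination) = 0
J = {v: (0 if v == destination else M) for v in graph}

n = 0
while True:
    # Step 2: apply the update at every node
    J_next = {
        v: 0 if v == destination else min(c + J[w] for w, c in graph[v])
        for v in graph
    }
    # Step 3: stop once the iterates coincide
    if J_next == J:
        break
    J, n = J_next, n + 1

print(J, "after", n, "updates")
```

Information about the destination spreads backwards one edge per update, so nodes that are more steps away from the destination take more updates to settle.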

31.6 Exercises

31.6.1 Exercise 1

Use the algorithm given above to find the optimal path (and its cost) for the following graph
You can put it in a Jupyter notebook cell and hit Shift-Enter — it will be saved in the local
directory as file graph.txt

In [1]: %%file graph.txt


node0, node1 0.04, node8 11.11, node14 72.21
node1, node46 1247.25, node6 20.59, node13 64.94
node2, node66 54.18, node31 166.80, node45 1561.45
node3, node20 133.65, node6 2.06, node11 42.43
node4, node75 3706.67, node5 0.73, node7 1.02
node5, node45 1382.97, node7 3.33, node11 34.54

node6, node31 63.17, node9 0.72, node10 13.10


node7, node50 478.14, node9 3.15, node10 5.85
node8, node69 577.91, node11 7.45, node12 3.18
node9, node70 2454.28, node13 4.42, node20 16.53
node10, node89 5352.79, node12 1.87, node16 25.16
node11, node94 4961.32, node18 37.55, node20 65.08
node12, node84 3914.62, node24 34.32, node28 170.04
node13, node60 2135.95, node38 236.33, node40 475.33
node14, node67 1878.96, node16 2.70, node24 38.65
node15, node91 3597.11, node17 1.01, node18 2.57
node16, node36 392.92, node19 3.49, node38 278.71
node17, node76 783.29, node22 24.78, node23 26.45
node18, node91 3363.17, node23 16.23, node28 55.84
node19, node26 20.09, node20 0.24, node28 70.54
node20, node98 3523.33, node24 9.81, node33 145.80
node21, node56 626.04, node28 36.65, node31 27.06
node22, node72 1447.22, node39 136.32, node40 124.22
node23, node52 336.73, node26 2.66, node33 22.37
node24, node66 875.19, node26 1.80, node28 14.25
node25, node70 1343.63, node32 36.58, node35 45.55
node26, node47 135.78, node27 0.01, node42 122.00
node27, node65 480.55, node35 48.10, node43 246.24
node28, node82 2538.18, node34 21.79, node36 15.52
node29, node64 635.52, node32 4.22, node33 12.61
node30, node98 2616.03, node33 5.61, node35 13.95
node31, node98 3350.98, node36 20.44, node44 125.88
node32, node97 2613.92, node34 3.33, node35 1.46
node33, node81 1854.73, node41 3.23, node47 111.54
node34, node73 1075.38, node42 51.52, node48 129.45
node35, node52 17.57, node41 2.09, node50 78.81
node36, node71 1171.60, node54 101.08, node57 260.46
node37, node75 269.97, node38 0.36, node46 80.49
node38, node93 2767.85, node40 1.79, node42 8.78
node39, node50 39.88, node40 0.95, node41 1.34
node40, node75 548.68, node47 28.57, node54 53.46
node41, node53 18.23, node46 0.28, node54 162.24
node42, node59 141.86, node47 10.08, node72 437.49
node43, node98 2984.83, node54 95.06, node60 116.23
node44, node91 807.39, node46 1.56, node47 2.14
node45, node58 79.93, node47 3.68, node49 15.51
node46, node52 22.68, node57 27.50, node67 65.48
node47, node50 2.82, node56 49.31, node61 172.64
node48, node99 2564.12, node59 34.52, node60 66.44
node49, node78 53.79, node50 0.51, node56 10.89
node50, node85 251.76, node53 1.38, node55 20.10
node51, node98 2110.67, node59 23.67, node60 73.79
node52, node94 1471.80, node64 102.41, node66 123.03
node53, node72 22.85, node56 4.33, node67 88.35
node54, node88 967.59, node59 24.30, node73 238.61
node55, node84 86.09, node57 2.13, node64 60.80
node56, node76 197.03, node57 0.02, node61 11.06
node57, node86 701.09, node58 0.46, node60 7.01
node58, node83 556.70, node64 29.85, node65 34.32
node59, node90 820.66, node60 0.72, node71 0.67
node60, node76 48.03, node65 4.76, node67 1.63
node61, node98 1057.59, node63 0.95, node64 4.88
node62, node91 132.23, node64 2.94, node76 38.43
node63, node66 4.43, node72 70.08, node75 56.34
node64, node80 47.73, node65 0.30, node76 11.98
node65, node94 594.93, node66 0.64, node73 33.23
node66, node98 395.63, node68 2.66, node73 37.53
node67, node82 153.53, node68 0.09, node70 0.98
node68, node94 232.10, node70 3.35, node71 1.66
node69, node99 247.80, node70 0.06, node73 8.99
node70, node76 27.18, node72 1.50, node73 8.37
node71, node89 104.50, node74 8.86, node91 284.64
node72, node76 15.32, node84 102.77, node92 133.06
node73, node83 52.22, node76 1.40, node90 243.00
node74, node81 1.07, node76 0.52, node78 8.08
node75, node92 68.53, node76 0.81, node77 1.19
node76, node85 13.18, node77 0.45, node78 2.36
node77, node80 8.94, node78 0.98, node86 64.32
node78, node98 355.90, node81 2.59

node79, node81 0.09, node85 1.45, node91 22.35


node80, node92 121.87, node88 28.78, node98 264.34
node81, node94 99.78, node89 39.52, node92 99.89
node82, node91 47.44, node88 28.05, node93 11.99
node83, node94 114.95, node86 8.75, node88 5.78
node84, node89 19.14, node94 30.41, node98 121.05
node85, node97 94.51, node87 2.66, node89 4.90
node86, node97 85.09
node87, node88 0.21, node91 11.14, node92 21.23
node88, node93 1.31, node91 6.83, node98 6.12
node89, node97 36.97, node99 82.12
node90, node96 23.53, node94 10.47, node99 50.99
node91, node97 22.17
node92, node96 10.83, node97 11.24, node99 34.68
node93, node94 0.19, node97 6.71, node99 32.77
node94, node98 5.91, node96 2.03
node95, node98 6.17, node99 0.27
node96, node98 3.32, node97 0.43, node99 5.87
node97, node98 0.30
node98, node99 0.33
node99,

Writing graph.txt

Here the line node0, node1 0.04, node8 11.11, node14 72.21 means that from
node0 we can go to

• node1 at cost 0.04


• node8 at cost 11.11
• node14 at cost 72.21

and so on
According to our calculations, the optimal path and its cost are like this
Your code should replicate this result

31.7 Solutions

31.7.1 Exercise 1
In [2]: def read_graph(in_file):
    """ Read in the graph from the data file. The graph is stored
    as a dictionary, where the keys are the nodes and the values
    are a list of pairs (d, c), where d is a node and c is a number.
    If (d, c) is in the list for node n, then d can be reached from
    n at cost c.
    """
    graph = {}
    infile = open(in_file)
    for line in infile:
        elements = line.split(',')
        node = elements.pop(0)
        graph[node] = []
        if node != 'node99':
            for element in elements:
                destination, cost = element.split()
                graph[node].append((destination, float(cost)))
    infile.close()
    return graph

def update_J(J, graph):
    "The Bellman operator."
    next_J = {}
    for node in graph:
        if node == 'node99':
            next_J[node] = 0
        else:
            next_J[node] = min(cost + J[dest] for dest, cost in graph[node])
    return next_J

def print_best_path(J, graph):
    """ Given a cost-to-go function, computes the best path. At each node n,
    the function prints the current location, looks at all nodes that can be
    reached from n, and moves to the node m which minimizes c + J[m], where c
    is the cost of moving to m.
    """
    sum_costs = 0
    current_location = 'node0'
    while current_location != 'node99':
        print(current_location)
        running_min = 1e100 # Any big number
        for destination, cost in graph[current_location]:
            cost_of_path = cost + J[destination]
            if cost_of_path < running_min:
                running_min = cost_of_path
                minimizer_cost = cost
                minimizer_dest = destination
        current_location = minimizer_dest
        sum_costs += minimizer_cost

    print('node99\n')
    print('Cost: ', sum_costs)

## Main loop

graph = read_graph('graph.txt')
M = 1e10
J = {}
for node in graph:
    J[node] = M
J['node99'] = 0

while True:
    next_J = update_J(J, graph)
    if next_J == J:
        break
    else:
        J = next_J

print_best_path(J, graph)

node0
node8
node11
node18
node23
node33
node41
node53
node56
node57
node60
node67
node70
node73
node76
node85
node87
node88
node93
node94
node96
node97

node98
node99

Cost: 160.55000000000007
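As a sanity check on the reported cost, the edge costs along the printed path can be read off graph.txt and added up. The list below was copied by hand from the data file, one entry per step of the path; the variable names are illustrative.

```python
# (from, to, cost) for each step of the printed path, copied from graph.txt
steps = [
    ('node0', 'node8', 11.11), ('node8', 'node11', 7.45),
    ('node11', 'node18', 37.55), ('node18', 'node23', 16.23),
    ('node23', 'node33', 22.37), ('node33', 'node41', 3.23),
    ('node41', 'node53', 18.23), ('node53', 'node56', 4.33),
    ('node56', 'node57', 0.02), ('node57', 'node60', 7.01),
    ('node60', 'node67', 1.63), ('node67', 'node70', 0.98),
    ('node70', 'node73', 8.37), ('node73', 'node76', 1.40),
    ('node76', 'node85', 13.18), ('node85', 'node87', 2.66),
    ('node87', 'node88', 0.21), ('node88', 'node93', 1.31),
    ('node93', 'node94', 0.19), ('node94', 'node96', 2.03),
    ('node96', 'node97', 0.43), ('node97', 'node98', 0.30),
    ('node98', 'node99', 0.33),
]

# Each step should start where the previous one ended
assert all(a[1] == b[0] for a, b in zip(steps, steps[1:]))

total = sum(cost for _, _, cost in steps)
print(round(total, 2))
```

The trailing digits in the printed cost (160.55000000000007) are floating point rounding from adding the decimals in sequence, not part of the answer.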
32

Job Search I: The McCall Search Model

32.1 Contents

• Overview 32.2

• The McCall Model 32.3

• Computing the Optimal Policy: Take 1 32.4

• Computing the Optimal Policy: Take 2 32.5

• Exercises 32.6

• Solutions 32.7

“Questioning a McCall worker is like having a conversation with an out-of-work


friend: ‘Maybe you are setting your sights too high’, or ‘Why did you quit your
old job before you had a new one lined up?’ This is real social science: an attempt
to model, to understand, human behavior by visualizing the situation people find
themselves in, the options they face and the pros and cons as they themselves see
them.” – Robert E. Lucas, Jr.

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

32.2 Overview

The McCall search model [94] helped transform economists’ way of thinking about labor mar-
kets
To clarify vague notions such as “involuntary” unemployment, McCall modeled the decision
problem of unemployed agents directly, in terms of factors such as

• current and likely future wages


• impatience
• unemployment compensation

To solve the decision problem he used dynamic programming


Here we set up McCall’s model and adopt the same solution method
As we’ll see, McCall’s model is not only interesting in its own right but also an excellent vehi-
cle for learning dynamic programming
Let’s start with some imports

In [2]: import numpy as np


from numba import jit
import matplotlib.pyplot as plt
import quantecon as qe
from quantecon.distributions import BetaBinomial

32.3 The McCall Model

An unemployed worker receives in each period a job offer at wage 𝑊𝑡


At time 𝑡, our worker has two choices:

1. Accept the offer and work permanently at constant wage 𝑊𝑡


2. Reject the offer, receive unemployment compensation 𝑐, and reconsider next period

The wage sequence is assumed to be IID with probability mass function 𝜙


Thus 𝜙(𝑤) is the probability of observing wage offer 𝑤 in the set 𝑤1 , … , 𝑤𝑛
The worker is infinitely lived and aims to maximize the expected discounted sum of earnings


$$
\mathbb{E} \sum_{t=0}^{\infty} \beta^t Y_t
$$

The constant 𝛽 lies in (0, 1) and is called a discount factor


The smaller is 𝛽, the more the worker discounts future utility relative to current utility
The variable 𝑌𝑡 is income, equal to

• his wage 𝑊𝑡 when employed


• unemployment compensation 𝑐 when unemployed

32.3.1 A Trade-Off

The worker faces a trade-off:

• Waiting too long for a good offer is costly, since the future is discounted
• Accepting too early is costly, since better offers might arrive in the future

To decide optimally in the face of this trade-off, we use dynamic programming


Dynamic programming can be thought of as a two-step procedure that

1. first assigns values to “states” and


2. then deduces optimal actions given those values

We’ll go through these steps in turn

32.3.2 The Value Function

In order to optimally trade off current and future rewards, we need to think about two things:

1. the current payoffs we get from different choices


2. the different states that those choices will lead to in next period (in this case, either em-
ployment or unemployment)

To weigh these two aspects of the decision problem, we need to assign values to states
To this end, let 𝑣∗ (𝑤) be the total lifetime value accruing to an unemployed worker who en-
ters the current period unemployed but with wage offer 𝑤 in hand
More precisely, 𝑣∗ (𝑤) denotes the value of the objective function $\mathbb{E} \sum_{t=0}^{\infty} \beta^t Y_t$ when an agent in this
situation makes optimal decisions now and at all future points in time
Of course 𝑣∗ (𝑤) is not trivial to calculate because we don’t yet know what decisions are opti-
mal and what aren’t!
But think of 𝑣∗ as a function that assigns to each possible wage 𝑤 the maximal lifetime value
that can be obtained with that offer in hand
A crucial observation is that this function 𝑣∗ must satisfy the recursion

$$
v^*(w) = \max \left\{ \frac{w}{1 - \beta}, \; c + \beta \sum_{w'} v^*(w') \phi(w') \right\} \tag{1}
$$

for every possible 𝑤 in 𝑤1 , … , 𝑤𝑛


This important equation is a version of the Bellman equation, which is ubiquitous in eco-
nomic dynamics and other fields involving planning over time
The intuition behind it is as follows:

• the first term inside the max operation is the lifetime payoff from accepting current of-
fer 𝑤, since

$$
w + \beta w + \beta^2 w + \cdots = \frac{w}{1 - \beta}
$$

• the second term inside the max operation is the continuation value, which is the life-
time payoff from rejecting the current offer and then behaving optimally in all subse-
quent periods

If we optimize and pick the best of these two options, we obtain maximal lifetime value from
today, given current offer 𝑤
But this is precisely 𝑣∗ (𝑤), which is the l.h.s. of Eq. (1)

32.3.3 The Optimal Policy

Suppose for now that we are able to solve Eq. (1) for the unknown function 𝑣∗
Once we have this function in hand we can behave optimally (i.e., make the right choice be-
tween accept and reject)
All we have to do is select the maximal choice on the r.h.s. of Eq. (1)
The optimal action is best thought of as a policy, which is, in general, a map from states to
actions
In our case, the state is the current wage offer 𝑤
Given any 𝑤, we can read off the corresponding best choice (accept or reject) by picking the
max on the r.h.s. of Eq. (1)
Thus, we have a map from R to {0, 1}, with 1 meaning accept and 0 meaning reject
We can write the policy as follows

$$
\sigma(w) := \mathbf{1} \left\{ \frac{w}{1 - \beta} \geq c + \beta \sum_{w'} v^*(w') \phi(w') \right\}
$$

Here 1{𝑃 } = 1 if statement 𝑃 is true and equals 0 otherwise


We can also write this as

$$
\sigma(w) := \mathbf{1} \{ w \geq \bar w \}
$$

where

$$
\bar w := (1 - \beta) \left\{ c + \beta \sum_{w'} v^*(w') \phi(w') \right\}
$$

Here 𝑤̄ is a constant depending on 𝛽, 𝑐 and the wage distribution called the reservation wage
The agent should accept if and only if the current wage offer exceeds the reservation wage
Clearly, we can compute this reservation wage if we can compute the value function

32.4 Computing the Optimal Policy: Take 1

To put the above ideas into action, we need to compute the value function at points
𝑤1 , … , 𝑤 𝑛
In doing so, we can identify these values with the vector 𝑣∗ = (𝑣𝑖∗ ) where 𝑣𝑖∗ ∶= 𝑣∗ (𝑤𝑖 )
In view of Eq. (1), this vector satisfies the nonlinear system of equations

$$
v^*_i = \max \left\{ \frac{w_i}{1 - \beta}, \; c + \beta \sum_j v^*_j \phi(w_j) \right\}
\quad \text{for } i = 1, \ldots, n \tag{2}
$$

32.4.1 The Algorithm

To compute this vector, we proceed as follows:


Step 1: pick an arbitrary initial guess 𝑣 ∈ R𝑛
Step 2: compute a new vector 𝑣′ ∈ R𝑛 via

$$
v'_i = \max \left\{ \frac{w_i}{1 - \beta}, \; c + \beta \sum_j v_j \phi(w_j) \right\}
\quad \text{for } i = 1, \ldots, n \tag{3}
$$

Step 3: calculate a measure of the deviation between 𝑣 and 𝑣′ , such as max𝑖 |𝑣𝑖 − 𝑣𝑖′ |
Step 4: if the deviation is larger than some fixed tolerance, set 𝑣 = 𝑣′ and go to step 2, else
continue
Step 5: return 𝑣
This algorithm returns an arbitrarily good approximation to the true solution to Eq. (2),
which represents the value function
(Arbitrarily good means here that the approximation converges to the true solution as the
tolerance goes to zero)
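Steps 1 to 5 can be written in a few lines of NumPy. The sketch below uses a uniform distribution over 51 wage offers purely for illustration (it is not the distribution used later in this lecture), and the names `w_grid`, `ϕ_unif` and `bellman_update` are hypothetical. It also exploits the fact that the second term inside the max in Eq. (3) is the same scalar for every 𝑖.

```python
import numpy as np

# Illustrative primitives: 51 wage offers on [10, 60], each equally likely
w_grid = np.linspace(10, 60, 51)
ϕ_unif = np.full(w_grid.size, 1 / w_grid.size)
c, β, tol = 25, 0.99, 1e-6


def bellman_update(v):
    "One application of the update rule in Eq. (3)."
    stop_vals = w_grid / (1 - β)
    cont_val = c + β * np.sum(v * ϕ_unif)   # the same scalar for every i
    return np.maximum(stop_vals, cont_val)


v = w_grid / (1 - β)                        # Step 1: initial guess
while True:
    v_new = bellman_update(v)               # Step 2: compute the new vector
    deviation = np.max(np.abs(v_new - v))   # Step 3: measure the deviation
    v = v_new                               # Step 4: update the guess
    if deviation <= tol:
        break                               # Step 5: v is the returned approximation

print(deviation, np.max(np.abs(bellman_update(v) - v)))
```

The second printed number is the residual of the returned vector in Eq. (2); it should be no larger than the tolerance.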

32.4.2 The Fixed Point Theory

What’s the math behind these ideas?


First, one defines a mapping 𝑇 from R𝑛 to itself via

$$
(Tv)_i = \max \left\{ \frac{w_i}{1 - \beta}, \; c + \beta \sum_j v_j \phi(w_j) \right\}
\quad \text{for } i = 1, \ldots, n \tag{4}
$$

(A new vector 𝑇 𝑣 is obtained from given vector 𝑣 by evaluating the r.h.s. at each 𝑖)
One can show that the conditions of the Banach contraction mapping theorem are satisfied by
𝑇 as a self-mapping on R𝑛
One implication is that 𝑇 has a unique fixed point in R𝑛
Moreover, it’s immediate from the definition of 𝑇 that this fixed point is precisely the value
function
The iterative algorithm presented above corresponds to iterating with 𝑇 from some initial
guess 𝑣
The Banach contraction mapping theorem tells us that this iterative process generates a se-
quence that converges to the fixed point
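The contraction property can be spot-checked numerically: for any two vectors 𝑢 and 𝑣, the sup-norm distance between 𝑇𝑢 and 𝑇𝑣 should be at most 𝛽 times the distance between 𝑢 and 𝑣. The sketch below tries this on random pairs. It is a check rather than a proof, and it uses an illustrative uniform wage distribution instead of the one used later in this lecture; all names are hypothetical.

```python
import numpy as np

# Illustrative primitives (not the lecture's wage distribution)
w_grid = np.linspace(10, 60, 51)
ϕ_unif = np.full(w_grid.size, 1 / w_grid.size)
c, β = 25, 0.99


def T(v):
    "The operator in Eq. (4)."
    return np.maximum(w_grid / (1 - β), c + β * np.sum(v * ϕ_unif))


rng = np.random.default_rng(0)
ratios = []
for _ in range(1000):
    u = rng.uniform(0, 6000, size=w_grid.size)
    v = rng.uniform(0, 6000, size=w_grid.size)
    # Ratio of the distance after applying T to the distance before
    ratios.append(np.max(np.abs(T(u) - T(v))) / np.max(np.abs(u - v)))

print(max(ratios), "<=", β)
```

In practice the observed ratios tend to sit well below 𝛽, because 𝑇 depends on 𝑣 only through its 𝜙-weighted average.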

32.4.3 Implementation

Here’s the distribution of wage offers we’ll work with

In [3]: n, a, b = 50, 200, 100


w_min, w_max = 10, 60
w_vals = np.linspace(w_min, w_max, n+1)
dist = BetaBinomial(n, a, b)
ϕ_vals = dist.pdf()

fig, ax = plt.subplots(figsize=(9, 6.5))
ax.stem(w_vals, ϕ_vals, label=r"$\phi (w')$")
ax.set_xlabel('wages')
ax.set_ylabel('probabilities')

plt.show()

First, let’s have a look at the sequence of approximate value functions that the algorithm
above generates
Default parameter values are embedded in the function
Our initial guess 𝑣 is the value of accepting at every given wage

In [4]: def plot_value_function_seq(ax,
                            c=25,
                            β=0.99,
                            w_vals=w_vals,
                            ϕ_vals=ϕ_vals,
                            num_plots=6):

    v = w_vals / (1 - β)
    v_next = np.empty_like(v)
    for i in range(num_plots):
        ax.plot(w_vals, v, label=f"iterate {i}")
        # Update guess
        for j, w in enumerate(w_vals):
            stop_val = w / (1 - β)
            cont_val = c + β * np.sum(v * ϕ_vals)
            v_next[j] = max(stop_val, cont_val)
        v[:] = v_next

    ax.legend(loc='lower right')

fig, ax = plt.subplots(figsize=(9, 6.5))
plot_value_function_seq(ax)
plt.show()

Here’s a more serious iteration effort, which continues until the measured deviation between
successive iterates is below tol
We’ll be using JIT compilation via Numba to turbocharge our loops

In [5]: @jit(nopython=True)
def compute_reservation_wage(c=25,
                             β=0.99,
                             w_vals=w_vals,
                             ϕ_vals=ϕ_vals,
                             max_iter=500,
                             tol=1e-6):

    # == First compute the value function == #

    v = w_vals / (1 - β)
    v_next = np.empty_like(v)
    i = 0
    error = tol + 1
    while i < max_iter and error > tol:

        for j, w in enumerate(w_vals):
            stop_val = w / (1 - β)
            cont_val = c + β * np.sum(v * ϕ_vals)
            v_next[j] = max(stop_val, cont_val)

        error = np.max(np.abs(v_next - v))
        i += 1

        v[:] = v_next  # copy contents into v

    # == Now compute the reservation wage == #

    return (1 - β) * (c + β * np.sum(v * ϕ_vals))

Let’s compute the reservation wage at the default parameters

In [6]: compute_reservation_wage()

Out[6]: 47.316499710024964

32.4.4 Comparative Statics

Now that we know how to compute the reservation wage, let’s see how it varies with parameters
In particular, let’s look at what happens when we change 𝛽 and 𝑐

In [7]: grid_size = 25
R = np.empty((grid_size, grid_size))

c_vals = np.linspace(10.0, 30.0, grid_size)
β_vals = np.linspace(0.9, 0.99, grid_size)

for i, c in enumerate(c_vals):
    for j, β in enumerate(β_vals):
        R[i, j] = compute_reservation_wage(c=c, β=β)

In [8]: fig, ax = plt.subplots(figsize=(10, 5.7))

cs1 = ax.contourf(c_vals, β_vals, R.T, alpha=0.75)


ctr1 = ax.contour(c_vals, β_vals, R.T)

plt.clabel(ctr1, inline=1, fontsize=13)


plt.colorbar(cs1, ax=ax)

ax.set_title("reservation wage")
ax.set_xlabel("$c$", fontsize=16)
ax.set_ylabel("$β$", fontsize=16)

ax.ticklabel_format(useOffset=False)

plt.show()

As expected, the reservation wage increases both with patience and with unemployment com-
pensation

32.5 Computing the Optimal Policy: Take 2

The approach to dynamic programming just described is very standard and broadly applica-
ble
For this particular problem, there’s also an easier way, which circumvents the need to com-
pute the value function
Let ℎ denote the value of not accepting a job in this period but then behaving optimally in
all subsequent periods
That is,

$$
h = c + \beta \sum_{w'} v^*(w') \phi(w') \tag{5}
$$

where 𝑣∗ is the value function


By the Bellman equation, we then have

$$
v^*(w') = \max \left\{ \frac{w'}{1 - \beta}, \; h \right\}
$$

Substituting this last equation into Eq. (5) gives

$$
h = c + \beta \sum_{w'} \max \left\{ \frac{w'}{1 - \beta}, \; h \right\} \phi(w') \tag{6}
$$

This is a nonlinear equation that we can solve for ℎ


The natural solution method for this kind of nonlinear equation is iterative
That is,
Step 1: pick an initial guess ℎ
Step 2: compute the update ℎ′ via

$$
h' = c + \beta \sum_{w'} \max \left\{ \frac{w'}{1 - \beta}, \; h \right\} \phi(w') \tag{7}
$$

Step 3: calculate the deviation |ℎ − ℎ′ |


Step 4: if the deviation is larger than some fixed tolerance, set ℎ = ℎ′ and go to step 2, else
continue
Step 5: return ℎ
Once again, one can use the Banach contraction mapping theorem to show that this process
always converges
The big difference here, however, is that we’re iterating on a single number, rather than an
𝑛-vector
Here’s an implementation:

In [9]: @jit(nopython=True)
def compute_reservation_wage_two(c=25,
                                 β=0.99,
                                 w_vals=w_vals,
                                 ϕ_vals=ϕ_vals,
                                 max_iter=500,
                                 tol=1e-5):

    # == First compute h == #

    h = np.sum(w_vals * ϕ_vals) / (1 - β)
    i = 0
    error = tol + 1
    while i < max_iter and error > tol:

        s = np.maximum(w_vals / (1 - β), h)
        h_next = c + β * np.sum(s * ϕ_vals)

        error = np.abs(h_next - h)
        i += 1

        h = h_next

    # == Now compute the reservation wage == #

    return (1 - β) * h

You can use this code to solve the exercise below
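As a sanity check on the two approaches, the following sketch compares value function iteration with the scalar iteration on ℎ. It is written in plain Python so that it runs on its own. The wage grid, the probabilities, 𝑐 and 𝛽 are hypothetical (𝛽 = 0.9 is chosen only so that convergence is quick), and the function names are ours. Since both procedures are fixed-point iterations for the same Bellman equation, the two reservation wages should agree up to the tolerance

```python
def reservation_wage_vfi(w, phi, c, beta, tol=1e-10, max_iter=10_000):
    """Take 1: iterate on the whole value function, then back out w_bar."""
    v = [w_i / (1 - beta) for w_i in w]
    for _ in range(max_iter):
        cont_val = c + beta * sum(v_j * p_j for v_j, p_j in zip(v, phi))
        v_next = [max(w_i / (1 - beta), cont_val) for w_i in w]
        error = max(abs(a - b) for a, b in zip(v, v_next))
        v = v_next
        if error < tol:
            break
    return (1 - beta) * (c + beta * sum(v_j * p_j for v_j, p_j in zip(v, phi)))


def reservation_wage_scalar(w, phi, c, beta, tol=1e-10, max_iter=10_000):
    """Take 2: iterate on the single number h using Eq. (7)."""
    h = sum(w_i * p_i for w_i, p_i in zip(w, phi)) / (1 - beta)
    for _ in range(max_iter):
        h_next = c + beta * sum(max(w_i / (1 - beta), h) * p_i
                                for w_i, p_i in zip(w, phi))
        error = abs(h_next - h)
        h = h_next
        if error < tol:
            break
    return (1 - beta) * h


# Hypothetical inputs, for illustration only
w = [10.0, 20.0, 30.0, 40.0]
phi = [0.1, 0.4, 0.4, 0.1]

w_bar_vfi = reservation_wage_vfi(w, phi, c=15.0, beta=0.9)
w_bar_scalar = reservation_wage_scalar(w, phi, c=15.0, beta=0.9)
print(w_bar_vfi, w_bar_scalar)
```

The scalar version does the same work per iteration on one number instead of a vector, which is the saving described above.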



32.6 Exercises

32.6.1 Exercise 1

Compute the average duration of unemployment when 𝛽 = 0.99 and 𝑐 takes the following
values

c_vals = np.linspace(10, 40, 25)

That is, start the agent off as unemployed, compute their reservation wage given the parameters,
and then simulate to see how long it takes to accept
Repeat a large number of times and take the average
Plot mean unemployment duration as a function of 𝑐 in c_vals

32.7 Solutions

32.7.1 Exercise 1

Here’s one solution

In [10]: cdf = np.cumsum(ϕ_vals)

@jit(nopython=True)
def compute_stopping_time(w_bar, seed=1234):

    np.random.seed(seed)
    t = 1
    while True:
        # Generate a wage draw
        w = w_vals[qe.random.draw(cdf)]
        if w >= w_bar:
            stopping_time = t
            break
        else:
            t += 1
    return stopping_time

@jit(nopython=True)
def compute_mean_stopping_time(w_bar, num_reps=100000):
    obs = np.empty(num_reps)
    for i in range(num_reps):
        obs[i] = compute_stopping_time(w_bar, seed=i)
    return obs.mean()

c_vals = np.linspace(10, 40, 25)
stop_times = np.empty_like(c_vals)
for i, c in enumerate(c_vals):
    w_bar = compute_reservation_wage_two(c=c)
    stop_times[i] = compute_mean_stopping_time(w_bar)

fig, ax = plt.subplots(figsize=(9, 6.5))

ax.plot(c_vals, stop_times, label="mean unemployment duration")
ax.set(xlabel="unemployment compensation", ylabel="months")
ax.legend()

plt.show()
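One way to check a simulation like this: because wage draws are IID, the number of periods until the first acceptable offer is geometrically distributed with success probability 𝑝 = P{𝑤 ≥ 𝑤̄}, so its mean is 1/𝑝. The sketch below, in plain Python with a hypothetical wage grid and reservation wage (the function names are ours), compares a simulated mean with this analytic value

```python
import random


def mean_duration_analytic(w_vals, phi_vals, w_bar):
    """Mean of a geometric stopping time: 1 / P(w >= w_bar)."""
    p_accept = sum(p for w, p in zip(w_vals, phi_vals) if w >= w_bar)
    return 1.0 / p_accept


def mean_duration_simulated(w_vals, phi_vals, w_bar, num_reps=20_000, seed=0):
    """Average number of draws until the first offer with w >= w_bar."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_reps):
        t = 1
        while rng.choices(w_vals, weights=phi_vals)[0] < w_bar:
            t += 1
        total += t
    return total / num_reps


# Hypothetical inputs, for illustration only
w_vals = [10.0, 20.0, 30.0, 40.0]
phi_vals = [0.1, 0.4, 0.4, 0.1]
w_bar = 30.0

analytic = mean_duration_analytic(w_vals, phi_vals, w_bar)
simulated = mean_duration_simulated(w_vals, phi_vals, w_bar)
print(analytic, simulated)
```

The same formula explains the shape of the plot: as 𝑐 rises, 𝑤̄ rises, the acceptance probability falls, and mean duration 1/𝑝 increases.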
33

Job Search II: Search and Separation

33.1 Contents

• Overview 33.2

• The Model 33.3

• Solving the Model using Dynamic Programming 33.4

• Implementation 33.5

• The Reservation Wage 33.6

• Exercises 33.7

• Solutions 33.8

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

33.2 Overview

Previously we looked at the McCall job search model [94] as a way of understanding unem-
ployment and worker decisions
One unrealistic feature of the model is that every job is permanent
In this lecture, we extend the McCall model by introducing job separation
Once separation enters the picture, the agent comes to view

• the loss of a job as a capital loss, and


• a spell of unemployment as an investment in searching for an acceptable job

We’ll need the following imports


In [2]: import numpy as np


from quantecon.distributions import BetaBinomial
from numba import njit
import matplotlib.pyplot as plt
%matplotlib inline

33.3 The Model

The model concerns the life of an infinitely lived worker and

• the opportunities he or she (let’s say he to save one character) has to work at different
wages
• exogenous events that destroy his current job
• his decision making process while unemployed

The worker can be in one of two states: employed or unemployed


He wants to maximize


$$
\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(Y_t) \tag{1}
$$

The only difference from the baseline model is that we’ve added some flexibility over prefer-
ences by introducing a utility function 𝑢
It satisfies 𝑢′ > 0 and 𝑢″ < 0

33.3.1 Timing and Decisions

Here’s what happens at the start of a given period in our model with search and separation
If currently employed, the worker consumes his wage 𝑤, receiving utility 𝑢(𝑤)
If currently unemployed, he

• receives and consumes unemployment compensation 𝑐


• receives an offer to start work next period at a wage 𝑤′ drawn from a known distribu-
tion 𝜙

He can either accept or reject the offer


If he accepts the offer, he enters next period employed with wage 𝑤′
If he rejects the offer, he enters next period unemployed
When employed, the agent faces a constant probability 𝛼 of becoming unemployed at the end
of the period
(Note: we do not allow for job search while employed—this topic is taken up in a later lec-
ture)

33.4 Solving the Model using Dynamic Programming

Let

• 𝑣(𝑤) be the total lifetime value accruing to a worker who enters the current period em-
ployed with wage 𝑤
• ℎ be the total lifetime value accruing to a worker who is unemployed this period

Here value means the value of the objective function Eq. (1) when the worker makes optimal
decisions at all future points in time
Suppose for now that the worker can calculate the function 𝑣 and the constant ℎ and use
them in his decision making
Then 𝑣 and ℎ should satisfy

𝑣(𝑤) = 𝑢(𝑤) + 𝛽[(1 − 𝛼)𝑣(𝑤) + 𝛼ℎ] (2)

and

$$
h = u(c) + \beta \sum_{w'} \max \{ h, v(w') \} \phi(w') \tag{3}
$$

Let’s interpret these two equations in light of the fact that today’s tomorrow is tomorrow’s
today

• The left-hand sides of equations Eq. (2) and Eq. (3) are the values of a worker in a par-
ticular situation today
• The right-hand sides of the equations are the discounted (by 𝛽) expected values of the
possible situations that worker can be in tomorrow
• But tomorrow the worker can be in only one of the situations whose values today are on
the left sides of our two equations

Equation Eq. (3) incorporates the fact that a currently unemployed worker will maximize his
own welfare
In particular, if his next period wage offer is 𝑤′ , he will choose to remain unemployed unless
ℎ < 𝑣(𝑤′ )
Equations Eq. (2) and Eq. (3) are the Bellman equations for this model
Equations Eq. (2) and Eq. (3) provide enough information to solve out for both 𝑣 and ℎ
Before discussing this, however, let’s make a small extension to the model

33.4.1 Stochastic Offers

Let’s suppose now that unemployed workers don’t always receive job offers
Instead, let’s suppose that unemployed workers only receive an offer with probability 𝛾
If our worker does receive an offer, the wage offer is drawn from 𝜙 as before
He either accepts or rejects the offer

Otherwise, the model is the same


With some thought, you will be able to convince yourself that 𝑣 and ℎ should now satisfy

𝑣(𝑤) = 𝑢(𝑤) + 𝛽[(1 − 𝛼)𝑣(𝑤) + 𝛼ℎ] (4)

and

$$
h = u(c) + \beta (1 - \gamma) h + \beta \gamma \sum_{w'} \max \{ h, v(w') \} \phi(w') \tag{5}
$$

33.4.2 Solving the Bellman Equations

We’ll use the same iterative approach to solving the Bellman equations that we adopted in
the first job search lecture
Here this amounts to

1. make guesses for ℎ and 𝑣


2. plug these guesses into the right-hand sides of Eq. (4) and Eq. (5)
3. update the left-hand sides from this rule and then repeat

In other words, we are iterating using the rules

𝑣𝑛+1 (𝑤′ ) = 𝑢(𝑤′ ) + 𝛽[(1 − 𝛼)𝑣𝑛 (𝑤′ ) + 𝛼ℎ𝑛 ] (6)

and

$$
h_{n+1} = u(c) + \beta (1 - \gamma) h_n + \beta \gamma \sum_{w'} \max \{ h_n, v_n(w') \} \phi(w') \tag{7}
$$

starting from some initial conditions ℎ0 , 𝑣0


As before, the system always converges to the true solutions—in this case, the 𝑣 and ℎ that
solve Eq. (4) and Eq. (5)
A proof can be obtained via the Banach contraction mapping theorem
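As an aside, Eq. (4) is linear in 𝑣(𝑤) for each fixed 𝑤, so it can be rearranged to express 𝑣 in terms of ℎ. This is a worked rearrangement, not the method used in the code below, and it assumes 0 < 𝛽 < 1 and 0 ≤ 𝛼 ≤ 1. It can serve as a check on a computed solution

```latex
\begin{aligned}
v(w) &= u(w) + \beta (1 - \alpha) v(w) + \alpha \beta h \\
v(w) \, [1 - \beta (1 - \alpha)] &= u(w) + \alpha \beta h \\
v(w) &= \frac{u(w) + \alpha \beta h}{1 - \beta (1 - \alpha)}
\end{aligned}
```

Under the stated assumptions the denominator is strictly positive, so 𝑣 is increasing in 𝑤 whenever 𝑢 is.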

33.5 Implementation

Let’s implement this iterative process


In the code, you’ll see that we use a class to store the various parameters and other objects
associated with a given model
This helps to tidy up the code and provides an object that’s easy to pass to functions
The default utility function is a CRRA utility function
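Reading the formula off the code below, the default utility function with parameter 𝜎 ≠ 1 is the first expression here. Differentiating confirms the conditions 𝑢′ > 0 and 𝑢″ < 0 for 𝑐 > 0, assuming 𝜎 > 0

```latex
u(c) = \frac{c^{1 - \sigma} - 1}{1 - \sigma},
\qquad
u'(c) = c^{-\sigma} > 0,
\qquad
u''(c) = -\sigma c^{-\sigma - 1} < 0
```

For 𝑐 ≤ 0 the code returns a large negative number instead of evaluating this formula.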

In [3]: # A default utility function

@njit
def u(c, σ):
    if c > 0:
        return (c**(1 - σ) - 1) / (1 - σ)
    else:
        return -10e6

class McCallModel:
    """
    Stores the parameters and functions associated with a given model.
    """

    def __init__(self,
                 α=0.2,         # Job separation rate
                 β=0.98,        # Discount factor
                 γ=0.7,         # Job offer rate
                 c=6.0,         # Unemployment compensation
                 σ=2.0,         # Utility parameter
                 w_vals=None,   # Possible wage values
                 ϕ_vals=None):  # Probabilities over w_vals

        self.α, self.β, self.γ, self.c = α, β, γ, c
        self.σ = σ

        # Add a default wage vector and probabilities over the vector using
        # the beta-binomial distribution
        if w_vals is None:
            n = 60  # number of possible outcomes for wage
            self.w_vals = np.linspace(10, 20, n)  # wages between 10 and 20
            a, b = 600, 400  # shape parameters
            dist = BetaBinomial(n-1, a, b)
            self.ϕ_vals = dist.pdf()
        else:
            self.w_vals = w_vals
            self.ϕ_vals = ϕ_vals

The following defines a jitted function that applies one step of the updates in Eq. (6) and Eq. (7) to 𝑣 and ℎ

In [4]: @njit
def Q(v, h, paras):
    """
    A jitted function to update the Bellman equations

    """

    α, β, γ, c, σ, w_vals, ϕ_vals = paras

    v_new = np.empty_like(v)

    for i in range(len(w_vals)):
        w = w_vals[i]
        v_new[i] = u(w, σ) + β * ((1 - α) * v[i] + α * h)

    h_new = u(c, σ) + β * (1 - γ) * h + \
            β * γ * np.sum(np.maximum(h, v) * ϕ_vals)

    return v_new, h_new

The approach is to iterate until successive iterates are closer together than some small toler-
ance level
We then return the current iterate as an approximate solution

In [5]: def solve_model(mcm, tol=1e-5, max_iter=2000):
    """
    Iterates to convergence on the Bellman equations

    mcm is an instance of McCallModel
    """

    v = np.ones_like(mcm.w_vals)    # Initial guess of v
    h = 1                           # Initial guess of h
    i = 0
    error = tol + 1

    while error > tol and i < max_iter:
        v_new, h_new = Q(v, h, (mcm.α, mcm.β, mcm.γ, mcm.c, mcm.σ,
                                mcm.w_vals, mcm.ϕ_vals))
        error_1 = np.max(np.abs(v_new - v))
        error_2 = np.abs(h_new - h)
        error = max(error_1, error_2)
        v = v_new
        h = h_new
        i += 1

    return v, h

Let’s plot the approximate solutions 𝑣 and ℎ to see what they look like
We’ll use the default parameterizations found in the code above

In [6]: mcm = McCallModel()


v, h = solve_model(mcm)

fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(mcm.w_vals, v, 'b-', lw=2, alpha=0.7, label='$v$')


ax.plot(mcm.w_vals, [h] * len(mcm.w_vals), 'g-', lw=2, alpha=0.7, label='$h$')
ax.set_xlim(min(mcm.w_vals), max(mcm.w_vals))
ax.legend()
ax.grid()

plt.show()

The value 𝑣 is increasing because higher 𝑤 generates a higher wage flow conditional on stay-
ing employed

33.6 The Reservation Wage

Once 𝑣 and ℎ are known, the agent can use them to make decisions in the face of a given
wage offer
If 𝑣(𝑤) > ℎ, then working at wage 𝑤 is preferred to unemployment
If 𝑣(𝑤) < ℎ, then remaining unemployed will generate greater lifetime value
Suppose in particular that 𝑣 crosses ℎ (as it does in the preceding figure)
Then, since 𝑣 is increasing, there is a unique smallest 𝑤 in the set of possible wages such that
𝑣(𝑤) ≥ ℎ
We denote this wage 𝑤̄ and call it the reservation wage
Optimal behavior for the worker is characterized by 𝑤̄

• if the wage offer 𝑤 in hand is greater than or equal to 𝑤,̄ then the worker accepts
• if the wage offer 𝑤 in hand is less than 𝑤,̄ then the worker rejects

Here’s a function compute_reservation_wage that takes an instance of McCallModel
and returns the reservation wage associated with a given model
It uses np.searchsorted to obtain the first 𝑤 in the set of possible wages such that 𝑣(𝑤) ≥ ℎ
If 𝑣(𝑤) < ℎ for all 𝑤, then the function returns np.inf
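To see how this index lookup behaves, here is a small sketch using the standard library's bisect.bisect_left, which follows the same left-insertion rule as np.searchsorted with its default side='left'. The values of 𝑣, ℎ and the wage grid are hypothetical, and the helper name is ours

```python
from bisect import bisect_left
import math


def reservation_wage_from_values(w_vals, v, h):
    """Return the smallest wage with v(w) >= h, or math.inf if there is none.

    Assumes v is increasing in w, so that v - h is sorted.
    """
    diffs = [v_i - h for v_i in v]
    idx = bisect_left(diffs, 0)   # first index with diffs[idx] >= 0
    return math.inf if idx == len(v) else w_vals[idx]


# Hypothetical values, for illustration only
w_vals = [10.0, 12.0, 14.0, 16.0]
v = [95.0, 100.0, 105.0, 110.0]

print(reservation_wage_from_values(w_vals, v, h=102.0))  # v crosses h inside the grid
print(reservation_wage_from_values(w_vals, v, h=200.0))  # v(w) < h everywhere
print(reservation_wage_from_values(w_vals, v, h=50.0))   # v(w) > h everywhere
```

The lookup relies on 𝑣 being increasing. If 𝑣 − ℎ were not sorted, a binary search would not be guaranteed to find the first crossing.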

In [7]: def compute_reservation_wage(mcm, return_values=False):
    """
    Computes the reservation wage of an instance of the McCall model
    by finding the smallest w such that v(w) >= h.

    If v(w) >= h for all w, then the reservation wage w_bar is set to
    the lowest wage in mcm.w_vals.

    If v(w) < h for all w, then w_bar is set to np.inf.

    """

    v, h = solve_model(mcm)
    w_idx = np.searchsorted(v - h, 0)

    if w_idx == len(v):
        w_bar = np.inf
    else:
        w_bar = mcm.w_vals[w_idx]

    if not return_values:
        return w_bar
    else:
        return w_bar, v, h

Let’s use it to look at how the reservation wage varies with parameters
In each instance below, we’ll show you a figure and then ask you to reproduce it in the exer-
cises

33.6.1 The Reservation Wage and Unemployment Compensation

First, let’s look at how 𝑤̄ varies with unemployment compensation



In the figure below, we use the default parameters in the McCallModel class, apart from c
(which takes the values given on the horizontal axis)

As expected, higher unemployment compensation causes the worker to hold out for higher
wages
In effect, the cost of continuing job search is reduced

33.6.2 The Reservation Wage and Discounting

Next, let’s investigate how 𝑤̄ varies with the discount factor


The next figure plots the reservation wage associated with different values of 𝛽

Again, the results are intuitive: More patient workers will hold out for higher wages

33.6.3 The Reservation Wage and Job Destruction

Finally, let’s look at how 𝑤̄ varies with the job separation rate 𝛼

Higher 𝛼 translates to a greater chance that a worker will face termination in each period
once employed

Once more, the results are in line with our intuition


If the separation rate is high, then the benefit of holding out for a higher wage falls
Hence the reservation wage is lower

33.7 Exercises

33.7.1 Exercise 1

Reproduce all the reservation wage figures shown above

33.7.2 Exercise 2

Plot the reservation wage against the job offer rate 𝛾


Use

In [8]: grid_size = 25
γ_vals = np.linspace(0.05, 0.95, grid_size)

Interpret your results

33.8 Solutions

33.8.1 Exercise 1

Using the compute_reservation_wage function mentioned earlier in the lecture, we can
create an array for reservation wages for different values of 𝑐, 𝛽 and 𝛼 and plot the results like
so

In [9]: grid_size = 25
c_vals = np.linspace(2, 12, grid_size)  # values of unemployment compensation
w_bar_vals = np.empty_like(c_vals)

mcm = McCallModel()

fig, ax = plt.subplots(figsize=(10, 6))

for i, c in enumerate(c_vals):
    mcm.c = c
    w_bar = compute_reservation_wage(mcm)
    w_bar_vals[i] = w_bar

ax.set(xlabel='unemployment compensation',
       ylabel='reservation wage')
ax.plot(c_vals, w_bar_vals, label=r'$\bar w$ as a function of $c$')
ax.grid()

plt.show()

33.8.2 Exercise 2

Similar to above, we can plot 𝑤̄ against 𝛾 as follows

In [10]: grid_size = 25
γ_vals = np.linspace(0.05, 0.95, grid_size)
w_bar_vals = np.empty_like(γ_vals)

mcm = McCallModel()

fig, ax = plt.subplots(figsize=(10, 6))

for i, γ in enumerate(γ_vals):
    mcm.γ = γ
    w_bar = compute_reservation_wage(mcm)
    w_bar_vals[i] = w_bar

ax.plot(γ_vals, w_bar_vals, label=r'$\bar w$ as a function of $\gamma$')
ax.set(xlabel='job offer rate', ylabel='reservation wage')
ax.grid()

plt.show()

As expected, the reservation wage increases in 𝛾


This is because higher 𝛾 translates to a more favorable job search environment
Hence workers are less willing to accept lower offers
34

A Problem that Stumped Milton Friedman

34.1 Contents

• Overview 34.2

• Origin of the Problem 34.3

• A Dynamic Programming Approach 34.4

• Implementation 34.5

• Analysis 34.6

• Comparison with Neyman-Pearson Formulation 34.7

Co-authors: Chase Coleman


In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon


!pip install interpolation

34.2 Overview

This lecture describes a statistical decision problem encountered by Milton Friedman and W.
Allen Wallis during World War II when they were analysts at the U.S. Government’s Statisti-
cal Research Group at Columbia University
This problem led Abraham Wald [132] to formulate sequential analysis, an approach to
statistical decision problems intimately related to dynamic programming
In this lecture, we apply dynamic programming algorithms to Friedman and Wallis and
Wald’s problem
Key ideas in play will be:

• Bayes’ Law


• Dynamic programming
• Type I and type II statistical errors
– a type I error occurs when you reject a null hypothesis that is true
– a type II error is when you accept a null hypothesis that is false
• Abraham Wald’s sequential probability ratio test
• The power of a statistical test
• The critical region of a statistical test
• A uniformly most powerful test

We’ll begin with some imports

In [2]: import numpy as np


import matplotlib.pyplot as plt
from numba import njit, prange, vectorize
from interpolation import interp
from math import gamma

34.3 Origin of the Problem

On pages 137-139 of his 1998 book Two Lucky People with Rose Friedman [44], Milton Fried-
man described a problem presented to him and Allen Wallis during World War II, when they
worked at the US Government’s Statistical Research Group at Columbia University
Let’s listen to Milton Friedman tell us what happened

In order to understand the story, it is necessary to have an idea of a simple statistical
problem, and of the standard procedure for dealing with it. The actual problem
out of which sequential analysis grew will serve. The Navy has two alternative
designs (say A and B) for a projectile. It wants to determine which is superior. To
do so it undertakes a series of paired firings. On each round, it assigns the value
1 or 0 to A accordingly as its performance is superior or inferior to that of B and
conversely 0 or 1 to B. The Navy asks the statistician how to conduct the test and
how to analyze the results.

The standard statistical answer was to specify a number of firings (say 1,000) and
a pair of percentages (e.g., 53% and 47%) and tell the client that if A receives a 1
in more than 53% of the firings, it can be regarded as superior; if it receives a 1 in
fewer than 47%, B can be regarded as superior; if the percentage is between 47%
and 53%, neither can be so regarded.

When Allen Wallis was discussing such a problem with (Navy) Captain Garret L.
Schyler, the captain objected that such a test, to quote from Allen’s account, may
prove wasteful. If a wise and seasoned ordnance officer like Schyler were on the
premises, he would see after the first few thousand or even few hundred [rounds]
that the experiment need not be completed either because the new method is ob-
viously inferior or because it is obviously superior beyond what was hoped for …

Friedman and Wallis struggled with the problem but, after realizing that they were not able
to solve it, described the problem to Abraham Wald
That started Wald on the path that led him to Sequential Analysis [132]
We’ll formulate the problem using dynamic programming

34.4 A Dynamic Programming Approach

The following presentation of the problem closely follows Dimitri Bertsekas’s treatment in
Dynamic Programming and Stochastic Control [14]
A decision-maker observes IID draws of a random variable 𝑧
He (or she) wants to know which of two probability distributions 𝑓0 or 𝑓1 governs 𝑧
After a number of draws, also to be determined, he makes a decision as to which of the distri-
butions is generating the draws he observes
He starts with prior

𝜋−1 = P{𝑓 = 𝑓0 ∣ no observations} ∈ (0, 1)

After observing 𝑘 + 1 observations 𝑧𝑘 , 𝑧𝑘−1 , … , 𝑧0 , he updates this value to

𝜋𝑘 = P{𝑓 = 𝑓0 ∣ 𝑧𝑘 , 𝑧𝑘−1 , … , 𝑧0 }

which is calculated recursively by applying Bayes’ law:

$$
\pi_{k+1} = \frac{\pi_k f_0(z_{k+1})}{\pi_k f_0(z_{k+1}) + (1 - \pi_k) f_1(z_{k+1})},
\quad k = -1, 0, 1, \ldots
$$
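The recursion above can be sketched in a few lines of plain Python. The two candidate densities (a uniform and a beta density concentrated around 0.5), the prior of 0.5 and the observations are all hypothetical choices for illustration, and the function names are ours

```python
from math import gamma


def beta_pdf(z, a, b):
    """Density of the beta distribution with parameters a and b at z."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * z**(a - 1) * (1 - z)**(b - 1)


def update_belief(pi, z, f0, f1):
    """One step of Bayes' law: posterior probability that f = f0 after seeing z."""
    num = pi * f0(z)
    return num / (num + (1 - pi) * f1(z))


# Hypothetical candidate densities, for illustration only
f0 = lambda z: beta_pdf(z, 1, 1)   # uniform on (0, 1)
f1 = lambda z: beta_pdf(z, 9, 9)   # concentrated around 0.5

pi = 0.5                           # prior pi_{-1}
for z in [0.2, 0.8, 0.15]:         # made-up observations away from 0.5
    pi = update_belief(pi, z, f0, f1)
    print(round(pi, 4))
```

Observations far from 0.5 are much more likely under the uniform density, so each one pushes the belief toward 𝑓0. An observation at a point where the two densities coincide would leave the belief unchanged.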

After observing 𝑧𝑘 , 𝑧𝑘−1 , … , 𝑧0 , the decision-maker believes that 𝑧𝑘+1 has probability distribu-
tion

𝑓𝜋𝑘 (𝑣) = 𝜋𝑘 𝑓0 (𝑣) + (1 − 𝜋𝑘 )𝑓1 (𝑣)

This is a mixture of distributions 𝑓0 and 𝑓1 , with the weight on 𝑓0 being the posterior proba-
bility that 𝑓 = 𝑓0 [1]
To help illustrate this kind of distribution, let’s inspect some mixtures of beta distributions
The density of a beta probability distribution with parameters 𝑎 and 𝑏 is


$$
f(z; a, b) = \frac{\Gamma(a + b) z^{a-1} (1 - z)^{b-1}}{\Gamma(a) \Gamma(b)}
\quad \text{where} \quad
\Gamma(t) := \int_0^{\infty} x^{t-1} e^{-x} \, dx
$$

The next figure shows two beta distributions in the top panel
The bottom panel presents mixtures of these distributions, with various mixing probabilities
𝜋𝑘

In [3]: def beta_function_factory(a, b):

    @vectorize
    def p(x):
        r = gamma(a + b) / (gamma(a) * gamma(b))
        return r * x**(a-1) * (1 - x)**(b-1)

    @njit
    def p_rvs():
        return np.random.beta(a, b)

    return p, p_rvs

f0, _ = beta_function_factory(1, 1)
f1, _ = beta_function_factory(9, 9)
grid = np.linspace(0, 1, 50)

fig, axes = plt.subplots(2, figsize=(10, 8))

axes[0].set_title("Original Distributions")
axes[0].plot(grid, f0(grid), lw=2, label="$f_0$")
axes[0].plot(grid, f1(grid), lw=2, label="$f_1$")

axes[1].set_title("Mixtures")
for π in 0.25, 0.5, 0.75:
    y = π * f0(grid) + (1 - π) * f1(grid)
    # Plot against grid so the horizontal axis shows z rather than the array index
    axes[1].plot(grid, y, lw=2, label=fr"$\pi_k$ = {π}")

for ax in axes:
    ax.legend()
    ax.set(xlabel="$z$ values", ylabel="probability of $z_k$")

plt.tight_layout()
plt.show()

34.4.1 Losses and Costs

After observing 𝑧𝑘 , 𝑧𝑘−1 , … , 𝑧0 , the decision-maker chooses among three distinct actions:

• He decides that 𝑓 = 𝑓0 and draws no more 𝑧’s



• He decides that 𝑓 = 𝑓1 and draws no more 𝑧’s


• He postpones deciding now and instead chooses to draw a 𝑧𝑘+1

Associated with these three actions, the decision-maker can suffer three kinds of losses:

• A loss 𝐿0 if he decides 𝑓 = 𝑓0 when actually 𝑓 = 𝑓1


• A loss 𝐿1 if he decides 𝑓 = 𝑓1 when actually 𝑓 = 𝑓0
• A cost 𝑐 if he postpones deciding and chooses instead to draw another 𝑧

34.4.2 Digression on Type I and Type II Errors

If we regard 𝑓 = 𝑓0 as a null hypothesis and 𝑓 = 𝑓1 as an alternative hypothesis, then 𝐿1 and
𝐿0 are losses associated with two types of statistical errors

• a type I error is an incorrect rejection of a true null hypothesis (a “false positive”)


• a type II error is a failure to reject a false null hypothesis (a “false negative”)

So when we treat 𝑓 = 𝑓0 as the null hypothesis

• We can think of 𝐿1 as the loss associated with a type I error


• We can think of 𝐿0 as the loss associated with a type II error

34.4.3 Intuition

Let’s try to guess what an optimal decision rule might look like before we go further
Suppose at some given point in time that 𝜋 is close to 1
Then our prior beliefs and the evidence so far point strongly to 𝑓 = 𝑓0
If, on the other hand, 𝜋 is close to 0, then 𝑓 = 𝑓1 is strongly favored
Finally, if 𝜋 is in the middle of the interval [0, 1], then we have little information in either di-
rection
This reasoning suggests a decision rule such as the one shown in the figure

As we’ll see, this is indeed the correct form of the decision rule
The key problem is to determine the threshold values 𝛼, 𝛽, which will depend on the parame-
ters listed above
You might like to pause at this point and try to predict the impact of a parameter such as 𝑐
or 𝐿0 on 𝛼 or 𝛽

34.4.4 A Bellman Equation

Let 𝐽 (𝜋) be the total loss for a decision-maker with current belief 𝜋 who chooses optimally

With some thought, you will agree that 𝐽 should satisfy the Bellman equation

𝐽 (𝜋) = min {(1 − 𝜋)𝐿0 , 𝜋𝐿1 , 𝑐 + E[𝐽 (𝜋′ )]} (1)

where 𝜋′ is the random variable defined by

$$
\pi' = \kappa(z', \pi) = \frac{\pi f_0(z')}{\pi f_0(z') + (1 - \pi) f_1(z')}
$$

when 𝜋 is fixed and 𝑧′ is drawn from the current best guess, which is the distribution 𝑓𝜋
defined by

𝑓𝜋 (𝑣) = 𝜋𝑓0 (𝑣) + (1 − 𝜋)𝑓1 (𝑣)

In the Bellman equation, minimization is over three actions:

1. Accept the hypothesis that 𝑓 = 𝑓0


2. Accept the hypothesis that 𝑓 = 𝑓1
3. Postpone deciding and draw again

We can represent the Bellman equation as

𝐽 (𝜋) = min {(1 − 𝜋)𝐿0 , 𝜋𝐿1 , ℎ(𝜋)} (2)

where 𝜋 ∈ [0, 1] and

• (1 − 𝜋)𝐿0 is the expected loss associated with accepting 𝑓0 (i.e., the cost of making a
type II error)
• 𝜋𝐿1 is the expected loss associated with accepting 𝑓1 (i.e., the cost of making a type I
error)
• ℎ(𝜋) ∶= 𝑐 + E[𝐽 (𝜋′ )] the continuation value; i.e., the expected cost associated with draw-
ing one more 𝑧

The optimal decision rule is characterized by two numbers (𝛼, 𝛽) ∈ (0, 1) × (0, 1) that satisfy

(1 − 𝜋)𝐿0 < min{𝜋𝐿1 , 𝑐 + E[𝐽 (𝜋′ )]} if 𝜋 ≥ 𝛼

and

𝜋𝐿1 < min{(1 − 𝜋)𝐿0 , 𝑐 + E[𝐽 (𝜋′ )]} if 𝜋 ≤ 𝛽

The optimal decision rule is then

accept 𝑓 = 𝑓0 if 𝜋 ≥ 𝛼
accept 𝑓 = 𝑓1 if 𝜋 ≤ 𝛽
draw another 𝑧 if 𝛽 ≤ 𝜋 ≤ 𝛼
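Written as code, the rule is a simple threshold test. In the sketch below the cutoffs are hypothetical numbers picked for illustration (the actual cutoffs have to be computed from the value function), and the function name is ours

```python
def decide(pi, alpha, beta):
    """Map a belief pi into an action, given cutoffs with beta < alpha."""
    if pi >= alpha:
        return "accept f0"
    if pi <= beta:
        return "accept f1"
    return "draw another z"


# Hypothetical cutoffs, for illustration only
alpha, beta = 0.8, 0.2
for pi in (0.95, 0.5, 0.1):
    print(pi, decide(pi, alpha, beta))
```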

Our aim is to compute the value function 𝐽 , and from it the associated cutoffs 𝛼 and 𝛽

To make our computations simpler, using Eq. (2), we can write the continuation value ℎ(𝜋) as

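$$
\begin{aligned}
h(\pi) &= c + \mathbb{E}[J(\pi')] \\
&= c + \mathbb{E}_{\pi'} \min \{ (1 - \pi') L_0, \pi' L_1, h(\pi') \} \\
&= c + \int \min \{ (1 - \kappa(z', \pi)) L_0, \kappa(z', \pi) L_1, h(\kappa(z', \pi)) \} f_\pi(z') \, dz'
\end{aligned}
\tag{3}
$$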

The equality

ℎ(𝜋) = 𝑐 + ∫ min{(1 − 𝜅(𝑧 ′ , 𝜋))𝐿0 , 𝜅(𝑧 ′ , 𝜋)𝐿1 , ℎ(𝜅(𝑧 ′ , 𝜋))}𝑓𝜋 (𝑧 ′ )𝑑𝑧′ (4)

can be understood as a functional equation, where ℎ is the unknown


Using the functional equation, Eq. (4), for the continuation value, we can back out optimal
choices using the RHS of Eq. (2)
This functional equation can be solved by taking an initial guess and iterating to find the
fixed point
In other words, we iterate with an operator 𝑄, where

𝑄ℎ(𝜋) = 𝑐 + ∫ min{(1 − 𝜅(𝑧 ′ , 𝜋))𝐿0 , 𝜅(𝑧 ′ , 𝜋)𝐿1 , ℎ(𝜅(𝑧 ′ , 𝜋))}𝑓𝜋 (𝑧 ′ )𝑑𝑧 ′

34.5 Implementation

First, we will construct a class to store the parameters of the model

In [4]: class WaldFriedman:

    def __init__(self,
                 c=1.25,         # Cost of another draw
                 a0=1,
                 b0=1,
                 a1=3,
                 b1=1.2,
                 L0=25,          # Cost of selecting f0 when f1 is true
                 L1=25,          # Cost of selecting f1 when f0 is true
                 π_grid_size=200,
                 mc_size=1000):

        self.c, self.π_grid_size = c, π_grid_size
        self.L0, self.L1 = L0, L1
        self.π_grid = np.linspace(0, 1, π_grid_size)
        self.mc_size = mc_size

        # Set up distributions
        self.f0, self.f0_rvs = beta_function_factory(a0, b0)
        self.f1, self.f1_rvs = beta_function_factory(a1, b1)

        self.z0 = np.random.beta(a0, b0, mc_size)
        self.z1 = np.random.beta(a1, b1, mc_size)

As in the optimal growth lecture, to approximate a continuous value function

• We iterate at a finite grid of possible values of 𝜋


• When we evaluate E[𝐽 (𝜋′ )] between grid points, we use linear interpolation
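For readers who want to see what linear interpolation on a grid involves, here is a plain-Python stand-in. The lecture code uses interp from the interpolation package. The helper below is ours, and its behavior outside the grid (clamping to the end values) is our own choice, not a claim about that package. The grid and function values are hypothetical

```python
from bisect import bisect_right


def lin_interp(grid, vals, x):
    """Piecewise-linear interpolation of (grid, vals) at x.

    grid must be sorted in increasing order; x is clamped to the grid's range.
    """
    if x <= grid[0]:
        return vals[0]
    if x >= grid[-1]:
        return vals[-1]
    j = bisect_right(grid, x)          # grid[j-1] <= x < grid[j]
    weight = (x - grid[j - 1]) / (grid[j] - grid[j - 1])
    return (1 - weight) * vals[j - 1] + weight * vals[j]


# Hypothetical grid and function values, for illustration only
pi_grid = [0.0, 0.25, 0.5, 0.75, 1.0]
h_vals = [0.0, 4.0, 6.0, 4.0, 0.0]

print(lin_interp(pi_grid, h_vals, 0.3))
```

At a grid point the interpolant returns the stored value exactly, so the approximation error comes only from points between grid nodes.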

The function operator_factory returns the operator Q

In [5]: def operator_factory(wf, parallel_flag=True):
    """
    Returns a jitted version of the Q operator.

    * wf is an instance of the WaldFriedman class
    """

    c, π_grid = wf.c, wf.π_grid
    L0, L1 = wf.L0, wf.L1
    f0, f1 = wf.f0, wf.f1
    z0, z1 = wf.z0, wf.z1
    mc_size = wf.mc_size

    @njit
    def κ(z, π):
        """
        Updates π using Bayes' rule and the current observation z.
        """
        π_f0, π_f1 = π * f0(z), (1 - π) * f1(z)
        π_new = π_f0 / (π_f0 + π_f1)

        return π_new

    @njit(parallel=parallel_flag)
    def Q(h):
        h_new = np.empty_like(π_grid)
        h_func = lambda p: interp(π_grid, h, p)

        for i in prange(len(π_grid)):
            π = π_grid[i]

            # Find the expected value of J by integrating over z
            integral_f0, integral_f1 = 0, 0
            for m in range(mc_size):
                π_0 = κ(z0[m], π)  # Draw z from f0 and update π
                integral_f0 += min((1 - π_0) * L0, π_0 * L1, h_func(π_0))

                π_1 = κ(z1[m], π)  # Draw z from f1 and update π
                integral_f1 += min((1 - π_1) * L0, π_1 * L1, h_func(π_1))

            integral = (π * integral_f0 + (1 - π) * integral_f1) / mc_size

            h_new[i] = c + integral

        return h_new

    return Q
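The weighted sum π * integral_f0 + (1 - π) * integral_f1 in Q rests on the fact that an expectation under the mixture 𝑓𝜋 equals 𝜋 times the expectation under 𝑓0 plus (1 − 𝜋) times the expectation under 𝑓1. The sketch below checks this numerically in a case where the answer is known in closed form. The two beta distributions, the test function 𝑔(𝑧) = 𝑧 and the value of 𝜋 are hypothetical choices, and the function name is ours

```python
import random


def mixture_expectation_mc(g, draw_f0, draw_f1, pi, mc_size=50_000, seed=0):
    """Estimate E[g(z)] for z ~ pi * f0 + (1 - pi) * f1 from separate samples."""
    rng = random.Random(seed)
    mean_f0 = sum(g(draw_f0(rng)) for _ in range(mc_size)) / mc_size
    mean_f1 = sum(g(draw_f1(rng)) for _ in range(mc_size)) / mc_size
    return pi * mean_f0 + (1 - pi) * mean_f1


# Hypothetical choices, for illustration only
draw_f0 = lambda rng: rng.betavariate(1, 1)   # Beta(1, 1): mean 1/2
draw_f1 = lambda rng: rng.betavariate(2, 1)   # Beta(2, 1): mean 2/3
pi = 0.3

estimate = mixture_expectation_mc(lambda z: z, draw_f0, draw_f1, pi)
exact = pi * 0.5 + (1 - pi) * 2 / 3
print(estimate, exact)
```

Drawing the two samples once and reusing them for every 𝜋, as the class above does with z0 and z1, avoids resampling from a different mixture at each grid point.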

To solve the model, we will iterate using Q to find the fixed point

In [6]: def solve_model(wf,
                use_parallel=True,
                tol=1e-4,
                max_iter=1000,
                verbose=True,
                print_skip=25):
    """
    Compute the continuation value function

    * wf is an instance of WaldFriedman
    """

    Q = operator_factory(wf, parallel_flag=use_parallel)

    # Set up loop
    h = np.zeros(len(wf.π_grid))
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        h_new = Q(h)
        error = np.max(np.abs(h - h_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        h = h_new

    if i == max_iter:
        print("Failed to converge!")

    if verbose and i < max_iter:
        print(f"\nConverged in {i} iterations.")

    return h_new

34.6 Analysis

Let’s inspect the model’s solutions


We will be using the default parameterization with distributions like so

In [7]: wf = WaldFriedman()

fig, ax = plt.subplots(figsize=(10, 6))

# Plot against the grid on [0, 1] so the horizontal axis shows z
ax.plot(wf.π_grid, wf.f0(wf.π_grid), label="$f_0$")
ax.plot(wf.π_grid, wf.f1(wf.π_grid), label="$f_1$")
ax.set(ylabel="probability of $z_k$", xlabel="$z$", title="Distributions")
ax.legend()

plt.show()

34.6.1 Value Function

To solve the model, we will call our solve_model function

In [8]: h_star = solve_model(wf) # solve the model

Error at iteration 25 is 8.333273109428774e-05.

Converged in 25 iterations.

We will also set up a function to compute the cutoffs 𝛼 and 𝛽 and plot these on our value
function plot

In [9]: def find_cutoff_rule(wf, h):

            """
            This function takes a continuation value function and returns the
            corresponding cutoffs of where you transition between continuing and
            choosing a specific model
            """
            π_grid = wf.π_grid
            L0, L1 = wf.L0, wf.L1

            # Evaluate cost at all points on grid for choosing a model
            payoff_f0 = (1 - π_grid) * L0
            payoff_f1 = π_grid * L1

            # The cutoff points can be found by differencing these costs with
            # the Bellman equation (J never exceeds either stopping cost)
            β = π_grid[np.searchsorted(payoff_f1 - np.minimum(h, payoff_f0), 1e-10) - 1]
            α = π_grid[np.searchsorted(np.minimum(h, payoff_f1) - payoff_f0, 1e-10) - 1]

            return (β, α)

        β, α = find_cutoff_rule(wf, h_star)
        cost_L0 = (1 - wf.π_grid) * wf.L0
        cost_L1 = wf.π_grid * wf.L1

        fig, ax = plt.subplots(figsize=(10, 6))

        ax.plot(wf.π_grid, h_star, label='continuation value')
        ax.plot(wf.π_grid, cost_L1, label='choose f1')
        ax.plot(wf.π_grid, cost_L0, label='choose f0')
        ax.plot(wf.π_grid, np.amin(np.column_stack([h_star, cost_L0, cost_L1]), axis=1),
                lw=15, alpha=0.1, color='b', label='minimum cost')

        ax.annotate(r"$\beta$", xy=(β + 0.01, 0.5), fontsize=14)
        ax.annotate(r"$\alpha$", xy=(α + 0.01, 0.5), fontsize=14)

        # Dashed lines run up to the stopping cost at each cutoff:
        # π * L1 at β (choose f1) and (1 - π) * L0 at α (choose f0)
        plt.vlines(β, 0, β * wf.L1, linestyle="--")
        plt.vlines(α, 0, (1 - α) * wf.L0, linestyle="--")

        ax.set(xlim=(0, 1), ylim=(0, 0.5 * max(wf.L0, wf.L1)), ylabel="cost",
               xlabel=r"$\pi$", title="Value function")

        plt.legend(borderpad=1.1)
        plt.show()

The value function equals 𝜋𝐿1 for 𝜋 ≤ 𝛽, and (1 − 𝜋)𝐿0 for 𝜋 ≥ 𝛼


The slopes of the two linear pieces of the value function are determined by 𝐿1 and −𝐿0
The value function is smooth in the interior region, where the posterior probability assigned
to 𝑓0 is in the indecisive region 𝜋 ∈ (𝛽, 𝛼)
The decision-maker continues to sample until the probability that he attaches to model 𝑓0
falls below 𝛽 or above 𝛼

34.6.2 Simulations

The next figure shows the outcomes of 500 simulations of the decision process
On the left is a histogram of the stopping times, which equal the number of draws of 𝑧𝑘
required to make a decision
The average number of draws is around 6.6
On the right is the fraction of correct decisions at the stopping time
In this case, the decision-maker is correct 80% of the time

In [10]: def simulate(wf, true_dist, h_star, π_0=0.5):


"""
This function takes an initial condition and simulates until it
stops (when a decision is made).
"""

f0, f1 = wf.f0, wf.f1


f0_rvs, f1_rvs = wf.f0_rvs, wf.f1_rvs
π_grid = wf.π_grid

def κ(z, π):


"""
Updates π using Bayes' rule and the current observation z.
"""

π_f0, π_f1 = π * f0(z), (1 - π) * f1(z)


π_new = π_f0 / (π_f0 + π_f1)

return π_new

if true_dist == "f0":
f, f_rvs = wf.f0, wf.f0_rvs
elif true_dist == "f1":
f, f_rvs = wf.f1, wf.f1_rvs

# Find cutoffs
β, α = find_cutoff_rule(wf, h_star)

# Initialize a couple of useful variables


decision_made = False
π = π_0
t = 0

while decision_made is False:


# Maybe should specify which distribution is correct one so that
# the draws come from the "right" distribution
z = f_rvs()
t = t + 1
π = κ(z, π)
if π < β:
decision_made = True
decision = 1
elif π > α:
decision_made = True
decision = 0

if true_dist == "f0":
if decision == 0:
correct = True
else:
correct = False

elif true_dist == "f1":


if decision == 1:
correct = True
else:
correct = False

return correct, π, t

def stopping_dist(wf, h_star, ndraws=250, true_dist="f0"):


"""
Simulates repeatedly to get distributions of time needed to make a
decision and how often they are correct.
"""

tdist = np.empty(ndraws, int)


cdist = np.empty(ndraws, bool)

for i in range(ndraws):
correct, π, t = simulate(wf, true_dist, h_star)
tdist[i] = t
cdist[i] = correct

return cdist, tdist

def simulation_plot(wf):
h_star = solve_model(wf)
ndraws = 500
cdist, tdist = stopping_dist(wf, h_star, ndraws)

fig, ax = plt.subplots(1, 2, figsize=(16, 5))

ax[0].hist(tdist, bins=np.max(tdist))
ax[0].set_title(f"Stopping times over {ndraws} replications")
ax[0].set(xlabel="time", ylabel="number of stops")
ax[0].annotate(f"mean = {np.mean(tdist)}", xy=(max(tdist) / 2,
max(np.histogram(tdist, bins=max(tdist))[0]) / 2))

ax[1].hist(cdist.astype(int), bins=2)
ax[1].set_title(f"Correct decisions over {ndraws} replications")
ax[1].annotate(f"fraction correct = {np.mean(cdist)}",
xy=(0.05, ndraws / 2))

plt.show()

simulation_plot(wf)

Error at iteration 25 is 8.333273109428774e-05.

Converged in 25 iterations.

34.6.3 Comparative Statics

Now let’s consider the following exercise


We double the cost of drawing an additional observation
Before you look, think about what will happen:

• Will the decision-maker be correct more or less often?


• Will he make decisions sooner or later?

In [11]: wf = WaldFriedman(c=2.5)
simulation_plot(wf)

Converged in 13 iterations.

Increased cost per draw has induced the decision-maker to take fewer draws before deciding
Because he decides with fewer observations, the percentage of time he is correct drops
This leads to him having a higher expected loss when he puts equal weight on both models

34.6.4 A Notebook Implementation

To facilitate comparative statics, we provide a Jupyter notebook that generates the same
plots, but with sliders
With these sliders, you can adjust parameters and immediately observe

• effects on the smoothness of the value function in the indecisive middle range as we
increase the number of grid points in the piecewise linear approximation
• effects of different settings for the cost parameters 𝐿0 , 𝐿1 , 𝑐, the parameters of two beta
distributions 𝑓0 and 𝑓1 , and the number of points and linear functions 𝑚 to use in the
piece-wise continuous approximation to the value function
• various simulations from 𝑓0 and associated distributions of waiting times to making a
decision
• associated histograms of correct and incorrect decisions

34.7 Comparison with Neyman-Pearson Formulation

For several reasons, it is useful to describe the theory underlying the test that Navy Captain
G. S. Schuyler had been told to use and that led him to approach Milton Friedman and Allan
Wallis to convey his conjecture that superior practical procedures existed
Evidently, the Navy had told Captain Schuyler to use what it knew to be a state-of-the-art
Neyman-Pearson test
We’ll rely on Abraham Wald’s [132] elegant summary of Neyman-Pearson theory
For our purposes, watch for these features of the setup:

• the assumption of a fixed sample size 𝑛


• the application of laws of large numbers, conditioned on alternative probability models,
to interpret the probabilities 𝛼 and 𝛽 defined in the Neyman-Pearson theory

Recall that in the sequential analytic formulation above

• The sample size 𝑛 is not fixed but rather an object to be chosen; technically 𝑛 is a
random variable
• The parameters 𝛽 and 𝛼 characterize cut-off rules used to determine 𝑛 as a random
variable
• Laws of large numbers make no appearances in the sequential construction

In chapter 1 of Sequential Analysis [132], Abraham Wald summarizes the Neyman-Pearson
approach to hypothesis testing

Wald frames the problem as making a decision about a probability distribution that is
partially known
(You have to assume that something is already known in order to state a well-posed problem
– usually, something means a lot)
By limiting what is unknown, Wald uses the following simple structure to illustrate the main
ideas:

• A decision-maker wants to decide which of two distributions 𝑓0 , 𝑓1 governs an IID
random variable 𝑧
• The null hypothesis 𝐻0 is the statement that 𝑓0 governs the data
• The alternative hypothesis 𝐻1 is the statement that 𝑓1 governs the data
• The problem is to devise and analyze a test of hypothesis 𝐻0 against the alternative
hypothesis 𝐻1 on the basis of a sample of a fixed number 𝑛 of independent observations
𝑧1 , 𝑧2 , … , 𝑧𝑛 of the random variable 𝑧

To quote Abraham Wald,

A test procedure leading to the acceptance or rejection of the [null] hypothesis in


question is simply a rule specifying, for each possible sample of size 𝑛, whether the
[null] hypothesis should be accepted or rejected on the basis of the sample. This
may also be expressed as follows: A test procedure is simply a subdivision of the
totality of all possible samples of size 𝑛 into two mutually exclusive parts, say part
1 and part 2, together with the application of the rule that the [null] hypothesis
be accepted if the observed sample is contained in part 2. Part 1 is also called the
critical region. Since part 2 is the totality of all samples of size 𝑛 which are not
included in part 1, part 2 is uniquely determined by part 1. Thus, choosing a test
procedure is equivalent to determining a critical region.

Let’s listen to Wald longer:

As a basis for choosing among critical regions the following considerations have
been advanced by Neyman and Pearson: In accepting or rejecting 𝐻0 we may
commit errors of two kinds. We commit an error of the first kind if we reject 𝐻0
when it is true; we commit an error of the second kind if we accept 𝐻0 when 𝐻1
is true. After a particular critical region 𝑊 has been chosen, the probability of
committing an error of the first kind, as well as the probability of committing an
error of the second kind is uniquely determined. The probability of committing an
error of the first kind is equal to the probability, determined by the assumption
that 𝐻0 is true, that the observed sample will be included in the critical region 𝑊 .
The probability of committing an error of the second kind is equal to the probability,
determined on the assumption that 𝐻1 is true, that the sample will fall
outside the critical region 𝑊 . For any given critical region 𝑊 we shall denote the
probability of an error of the first kind by 𝛼 and the probability of an error of the
second kind by 𝛽.

Let’s listen carefully to how Wald applies the law of large numbers to interpret 𝛼 and 𝛽:

The probabilities 𝛼 and 𝛽 have the following important practical interpretation:


Suppose that we draw a large number of samples of size 𝑛. Let 𝑀 be the num-
ber of such samples drawn. Suppose that for each of these 𝑀 samples we reject

𝐻0 if the sample is included in 𝑊 and accept 𝐻0 if the sample lies outside 𝑊 . In


this way we make 𝑀 statements of rejection or acceptance. Some of these state-
ments will in general be wrong. If 𝐻0 is true and if 𝑀 is large, the probability is
nearly 1 (i.e., it is practically certain) that the proportion of wrong statements
(i.e., the number of wrong statements divided by 𝑀 ) will be approximately 𝛼. If
𝐻1 is true, the probability is nearly 1 that the proportion of wrong statements will
be approximately 𝛽. Thus, we can say that in the long run [ here Wald applies
law of large numbers by driving 𝑀 → ∞ (our comment, not Wald’s) ] the propor-
tion of wrong statements will be 𝛼 if 𝐻0 is true and 𝛽 if 𝐻1 is true.

The quantity 𝛼 is called the size of the critical region, and the quantity 1 − 𝛽 is called the
power of the critical region
Wald notes that

one critical region 𝑊 is more desirable than another if it has smaller values of 𝛼
and 𝛽. Although either 𝛼 or 𝛽 can be made arbitrarily small by a proper choice of
the critical region 𝑊 , it is impossible to make both 𝛼 and 𝛽 arbitrarily small for a
fixed value of 𝑛, i.e., a fixed sample size.

Wald summarizes Neyman and Pearson’s setup as follows:

Neyman and Pearson show that a region consisting of all samples (𝑧1 , 𝑧2 , … , 𝑧𝑛 )
which satisfy the inequality

$$
\frac{f_1(z_1) \cdots f_1(z_n)}{f_0(z_1) \cdots f_0(z_n)} \geq k
$$

is a most powerful critical region for testing the hypothesis 𝐻0 against the alternative hy-
pothesis 𝐻1 . The term 𝑘 on the right side is a constant chosen so that the region will have
the required size 𝛼.
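The construction above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two Beta densities, the sample size `n`, the number of replications `M` and the target size are illustrative choices rather than values taken from Wald. The constant 𝑘 is picked from simulated samples under 𝐻0 so that the region has roughly the required size, and 𝛼 and 𝛽 are then estimated as long-run frequencies in the spirit of Wald's interpretation.

```python
import numpy as np
from scipy.stats import beta

# Illustrative assumptions (not from the text)
a0, b0 = 1, 1        # parameters of f0, the density under H0
a1, b1 = 3, 1.2      # parameters of f1, the density under H1
n = 10               # fixed sample size
M = 20_000           # number of simulated samples of size n
target_size = 0.05   # required size of the critical region

rng = np.random.default_rng(0)

def log_lr(z):
    "Log of f1(z_1)...f1(z_n) / f0(z_1)...f0(z_n), computed for each row of z."
    return np.sum(beta.logpdf(z, a1, b1) - beta.logpdf(z, a0, b0), axis=1)

# M samples of size n drawn under each hypothesis
z_H0 = rng.beta(a0, b0, size=(M, n))
z_H1 = rng.beta(a1, b1, size=(M, n))

# Choose log k so that about a fraction target_size of the H0 samples lie in W
log_k = np.quantile(log_lr(z_H0), 1 - target_size)

# Long-run frequencies of the two kinds of error
α_hat = np.mean(log_lr(z_H0) >= log_k)   # reject H0 when H0 is true
β_hat = np.mean(log_lr(z_H1) < log_k)    # accept H0 when H1 is true

print(f"estimated size α = {α_hat:.3f}, estimated power 1 - β = {1 - β_hat:.3f}")
```

Working with the log of the likelihood ratio avoids numerical underflow in the product of densities and leaves the critical region unchanged, because the logarithm is monotone.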
Wald goes on to discuss Neyman and Pearson’s concept of a uniformly most powerful test
Here is how Wald introduces the notion of a sequential test

A rule is given for making one of the following three decisions at any stage of the
experiment (at the m th trial for each integral value of m ): (1) to accept the hy-
pothesis H , (2) to reject the hypothesis H , (3) to continue the experiment by
making an additional observation. Thus, such a test procedure is carried out se-
quentially. On the basis of the first observation, one of the aforementioned deci-
sion is made. If the first or second decision is made, the process is terminated. If
the third decision is made, a second trial is performed. Again, on the basis of the
first two observations, one of the three decision is made. If the third decision is
made, a third trial is performed, and so on. The process is continued until either
the first or the second decisions is made. The number n of observations required
by such a test procedure is a random variable, since the value of n depends on the
outcome of the observations.
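A rule of this three-way kind can be sketched in a few lines. The quoted passage does not say which statistic drives the decisions, so the sketch below assumes one common choice: track the likelihood ratio of the observations seen so far and stop once it leaves an interval. The function name `sequential_test`, the thresholds `lower` and `upper`, and the two Beta densities are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.stats import beta

def sequential_test(draw, logpdf_0, logpdf_1, lower, upper, max_draws=10_000):
    """
    Hypothetical helper: sample one observation at a time and stop as soon as
    the likelihood ratio f1(z_1)...f1(z_m) / f0(z_1)...f0(z_m) leaves the
    interval (lower, upper).

    Returns (decision, m), where m is the number of observations used.
    """
    log_lr = 0.0
    for m in range(1, max_draws + 1):
        z = draw()
        log_lr += logpdf_1(z) - logpdf_0(z)
        if log_lr >= np.log(upper):      # decision (2): reject the hypothesis
            return "reject H0", m
        if log_lr <= np.log(lower):      # decision (1): accept the hypothesis
            return "accept H0", m
        # otherwise decision (3): take an additional observation
    return "undecided", max_draws

rng = np.random.default_rng(1)

# Illustrative run in which the data are actually generated by f1
decision, m = sequential_test(
    draw=lambda: rng.beta(3, 1.2),
    logpdf_0=beta(1, 1).logpdf,
    logpdf_1=beta(3, 1.2).logpdf,
    lower=1 / 20,
    upper=20,
)
print(decision, "after", m, "observations")
```

Rerunning with different seeds shows that `m` varies from run to run, which is the sense in which the number of observations is a random variable.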

Footnotes

[1] Because the decision-maker believes that 𝑧𝑘+1 is drawn from a mixture of two IID
distributions, he does not believe that the sequence [𝑧𝑘+1 , 𝑧𝑘+2 , …] is IID. Instead, he believes that
it is exchangeable. See [79], chapter 11, for a discussion of exchangeability.
35

Job Search III: Search with Learning

35.1 Contents

• Overview 35.2

• Model 35.3

• Take 1: Solution by VFI 35.4

• Take 2: A More Efficient Method 35.5

• Exercises 35.6

• Solutions 35.7

• Appendix 35.8

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install interpolation

35.2 Overview

In this lecture, we consider an extension of the previously studied job search model of McCall
[94]
In the McCall model, an unemployed worker decides when to accept a permanent position at
a specified wage, given

• his or her discount factor


• the level of unemployment compensation
• the distribution from which wage offers are drawn

In the version considered below, the wage distribution is unknown and must be learned

• The following is based on the presentation in [87], section 6.6


Let’s start with some imports

In [2]: from numba import njit, prange, vectorize


from interpolation import mlinterp, interp
from math import gamma
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import cm

35.2.1 Model Features

• Infinite horizon dynamic programming with two states and one binary control
• Bayesian updating to learn the unknown distribution

35.3 Model

Let’s first review the basic McCall model [94] and then add the variation we want to consider

35.3.1 The Basic McCall Model

Recall that, in the baseline model, an unemployed worker is presented in each period with a
permanent job offer at wage 𝑊𝑡
At time 𝑡, our worker either

1. accepts the offer and works permanently at constant wage 𝑊𝑡


2. rejects the offer, receives unemployment compensation 𝑐 and reconsiders next period

The wage sequence {𝑊𝑡 } is IID and generated from known density 𝑞

The worker aims to maximize the expected discounted sum of earnings $\mathbb{E} \sum_{t=0}^{\infty} \beta^t y_t$
The value function 𝑣 satisfies the recursion

$$
v(w) = \max \left\{ \frac{w}{1 - \beta}, \; c + \beta \int v(w') q(w') \, dw' \right\} \tag{1}
$$

The optimal policy has the form 1{𝑤 ≥ 𝑤̄}, where 𝑤̄ is a constant called the reservation wage
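To make the reservation wage concrete, here is a minimal sketch for the baseline model. It relies on a step not spelled out above: at 𝑤 = 𝑤̄ the two terms inside the max in Eq. (1) are equal, and 𝑣(𝑤′) = max{𝑤′, 𝑤̄}/(1 − 𝛽), so 𝑤̄ solves 𝑤̄ = (1 − 𝛽)𝑐 + 𝛽 ∫ max{𝑤′, 𝑤̄} 𝑞(𝑤′) 𝑑𝑤′. The function name `reservation_wage`, the uniform offer density and the midpoint grid used to approximate the integral are assumptions made for illustration.

```python
import numpy as np

def reservation_wage(w_nodes, β=0.95, c=0.3, tol=1e-10, max_iter=10_000):
    """
    Iterate on w_bar = (1 - β) c + β E[max(w', w_bar)], with the expectation
    approximated by an average over the equally weighted nodes w_nodes.
    """
    w_bar = 1.0                                   # arbitrary starting guess
    for _ in range(max_iter):
        w_new = (1 - β) * c + β * np.mean(np.maximum(w_nodes, w_bar))
        if abs(w_new - w_bar) < tol:
            break
        w_bar = w_new
    return w_new

# Assumption for illustration: q is uniform on [0, 1], approximated by midpoints
n_nodes = 10_000
w_nodes = (np.arange(n_nodes) + 0.5) / n_nodes

w_bar = reservation_wage(w_nodes)
print(f"reservation wage ≈ {w_bar:.4f}")
```

With uniform offers the integral equals (1 + 𝑤̄²)/2, so the fixed point can also be obtained from a quadratic equation, which gives a simple check on the iteration.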

35.3.2 Offer Distribution Unknown

Now let’s extend the model by considering the variation presented in [87], section 6.6
The model is as above, apart from the fact that

• the density 𝑞 is unknown


• the worker learns about 𝑞 by starting with a prior and updating based on wage offers
that he/she observes

The worker knows there are two possible distributions 𝐹 and 𝐺 — with densities 𝑓 and 𝑔
At the start of time, “nature” selects 𝑞 to be either 𝑓 or 𝑔 — the wage distribution from
which the entire sequence {𝑊𝑡 } will be drawn
This choice is not observed by the worker, who puts prior probability 𝜋0 on 𝑓 being chosen
Update rule: worker’s time 𝑡 estimate of the distribution is 𝜋𝑡 𝑓 + (1 − 𝜋𝑡 )𝑔, where 𝜋𝑡 updates
via

$$
\pi_{t+1} = \frac{\pi_t f(w_{t+1})}{\pi_t f(w_{t+1}) + (1 - \pi_t) g(w_{t+1})} \tag{2}
$$

This last expression follows from Bayes’ rule, which tells us that

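$$
\mathbb{P}\{q = f \mid W = w\} = \frac{\mathbb{P}\{W = w \mid q = f\} \, \mathbb{P}\{q = f\}}{\mathbb{P}\{W = w\}}
\quad \text{and} \quad
\mathbb{P}\{W = w\} = \sum_{\omega \in \{f, g\}} \mathbb{P}\{W = w \mid q = \omega\} \, \mathbb{P}\{q = \omega\}
$$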

The fact that Eq. (2) is recursive allows us to progress to a recursive solution method
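The update in Eq. (2) is easy to trace by hand in code. The sketch below uses `scipy.stats.beta` for the densities and takes 𝑓 = Beta(1, 1) and 𝑔 = Beta(3, 1.2), which is the baseline parameterization of this lecture; the function name `update_belief` and the wage observations are hypothetical.

```python
from scipy.stats import beta

f = beta(1, 1).pdf       # density f
g = beta(3, 1.2).pdf     # density g

def update_belief(w, π):
    "Apply Eq. (2): revise the probability π placed on f after observing wage w."
    pf, pg = π * f(w), (1 - π) * g(w)
    return pf / (pf + pg)

# Starting from π = 0.5, feed in a short sequence of low wage offers
π = 0.5
for w in (0.2, 0.3, 0.25):
    π = update_belief(w, π)
    print(f"observed w = {w:.2f}, updated π = {π:.3f}")
```

Low offers are more likely under 𝑓 than under 𝑔, so each of these observations raises the weight on 𝑓, while a high offer such as 0.9 would lower it.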
Letting

$$
q_\pi(w) := \pi f(w) + (1 - \pi) g(w)
\quad \text{and} \quad
\kappa(w, \pi) := \frac{\pi f(w)}{\pi f(w) + (1 - \pi) g(w)}
$$

we can express the value function for the unemployed worker recursively as follows

$$
v(w, \pi) = \max \left\{ \frac{w}{1 - \beta}, \; c + \beta \int v(w', \pi') \, q_\pi(w') \, dw' \right\}
\quad \text{where} \quad \pi' = \kappa(w', \pi) \tag{3}
$$

Notice that the current guess 𝜋 is a state variable, since it affects the worker’s perception of
probabilities for future rewards

35.3.3 Parameterization

Following section 6.6 of [87], our baseline parameterization will be

• 𝑓 is Beta(1, 1)
• 𝑔 is Beta(3, 1.2)
• 𝛽 = 0.95 and 𝑐 = 0.3

The densities 𝑓 and 𝑔 have the following shape

In [3]: def beta_function_factory(a, b):

            @vectorize
            def p(x):
                r = gamma(a + b) / (gamma(a) * gamma(b))
                return r * x**(a-1) * (1 - x)**(b-1)

            return p

        x_grid = np.linspace(0, 1, 100)
        f = beta_function_factory(1, 1)
        g = beta_function_factory(3, 1.2)

        plt.figure(figsize=(10, 8))
        plt.plot(x_grid, f(x_grid), label='$f$', lw=2)
        plt.plot(x_grid, g(x_grid), label='$g$', lw=2)

        plt.legend()
        plt.show()

35.3.4 Looking Forward

What kind of optimal policy might result from Eq. (3) and the parameterization specified
above?
Intuitively, if we accept at 𝑤𝑎 and 𝑤𝑎 ≤ 𝑤𝑏 , then — all other things being given — we should
also accept at 𝑤𝑏
This suggests a policy of accepting whenever 𝑤 exceeds some threshold value 𝑤̄
But 𝑤̄ should depend on 𝜋 — in fact, it should be decreasing in 𝜋 because

• 𝑓 is a less attractive offer distribution than 𝑔


• larger 𝜋 means more weight on 𝑓 and less on 𝑔

Thus larger 𝜋 depresses the worker’s assessment of her future prospects, and relatively low
current offers become more attractive

Summary: We conjecture that the optimal policy is of the form 1{𝑤 ≥ 𝑤̄(𝜋)} for some
decreasing function 𝑤̄
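One quick way to see why 𝑓 is the less attractive distribution is to compare mean offers. This is a rough check only, since the mean is just one summary of a distribution; the variable names below are illustrative.

```python
import numpy as np
from scipy.stats import beta

mean_f = beta(1, 1).mean()      # mean wage offer under f = Beta(1, 1)
mean_g = beta(3, 1.2).mean()    # mean wage offer under g = Beta(3, 1.2)

# Mean offer under the mixture π f + (1 - π) g for a few beliefs π
π_vals = np.linspace(0, 1, 5)
expected_offer = π_vals * mean_f + (1 - π_vals) * mean_g

for π, m in zip(π_vals, expected_offer):
    print(f"π = {π:.2f}: mean offer = {m:.3f}")
```

The mean offer perceived by the worker falls as 𝜋 rises, which is consistent with the conjecture that 𝑤̄ is decreasing in 𝜋.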

35.4 Take 1: Solution by VFI

Let’s set about solving the model and see how our results match with our intuition
We begin by solving via value function iteration (VFI), which is natural but ultimately turns
out to be second best
The class SearchProblem is used to store parameters and methods needed to compute
optimal actions

In [4]: class SearchProblem:


"""
A class to store a given parameterization of the "offer distribution
unknown" model.

"""

def __init__(self,
β=0.95, # Discount factor
c=0.3, # Unemployment compensation
F_a=1,
F_b=1,
G_a=3,
G_b=1.2,
w_max=1, # Maximum wage possible
w_grid_size=100,
π_grid_size=100,
mc_size=500):

self.β, self.c, self.w_max = β, c, w_max

self.f = beta_function_factory(F_a, F_b)


self.g = beta_function_factory(G_a, G_b)

self.π_min, self.π_max = 1e-3, 1-1e-3 # Avoids instability


self.w_grid = np.linspace(0, w_max, w_grid_size)
self.π_grid = np.linspace(self.π_min, self.π_max, π_grid_size)

self.mc_size = mc_size

self.w_f = np.random.beta(F_a, F_b, mc_size)


self.w_g = np.random.beta(G_a, G_b, mc_size)

The following function takes an instance of this class and returns jitted versions of the
Bellman operator T, and a get_greedy() function to compute the approximate optimal policy
from a guess v of the value function

In [5]: def operator_factory(sp, parallel_flag=True):

f, g = sp.f, sp.g
w_f, w_g = sp.w_f, sp.w_g
β, c = sp.β, sp.c
mc_size = sp.mc_size
w_grid, π_grid = sp.w_grid, sp.π_grid

@njit
def κ(w, π):
"""
Updates π using Bayes' rule and the current wage observation w.
"""
pf, pg = π * f(w), (1 - π) * g(w)

π_new = pf / (pf + pg)

return π_new

@njit(parallel=parallel_flag)
def T(v):
"""
The Bellman operator.

"""
v_func = lambda x, y: mlinterp((w_grid, π_grid), v, (x, y))
v_new = np.empty_like(v)

for i in prange(len(w_grid)):
for j in prange(len(π_grid)):
w = w_grid[i]
π = π_grid[j]

v_1 = w / (1 - β)

integral_f, integral_g = 0.0, 0.0


for m in prange(mc_size):
integral_f += v_func(w_f[m], κ(w_f[m], π))
integral_g += v_func(w_g[m], κ(w_g[m], π))
integral = (π * integral_f + (1 - π) * integral_g) / mc_size

v_2 = c + β * integral
v_new[i, j] = max(v_1, v_2)

return v_new

@njit(parallel=parallel_flag)
def get_greedy(v):
"""
Compute optimal actions taking v as the value function.

"""

v_func = lambda x, y: mlinterp((w_grid, π_grid), v, (x, y))


σ = np.empty_like(v)

for i in prange(len(w_grid)):
for j in prange(len(π_grid)):
w = w_grid[i]
π = π_grid[j]

v_1 = w / (1 - β)

integral_f, integral_g = 0.0, 0.0


for m in prange(mc_size):
integral_f += v_func(w_f[m], κ(w_f[m], π))
integral_g += v_func(w_g[m], κ(w_g[m], π))
integral = (π * integral_f + (1 - π) * integral_g) / mc_size

v_2 = c + β * integral

σ[i, j] = v_1 > v_2 # Evaluates to 1 or 0

return σ

return T, get_greedy

We will omit a detailed discussion of the code because there is a more efficient solution
method that we will use later
To solve the model we will use the following function that iterates using T to find a fixed
point

In [6]: def solve_model(sp,


use_parallel=True,

tol=1e-4,
max_iter=1000,
verbose=True,
print_skip=5):

"""
Solves for the value function

* sp is an instance of SearchProblem
"""

T, _ = operator_factory(sp, use_parallel)

# Set up loop
i = 0
error = tol + 1
m, n = len(sp.w_grid), len(sp.π_grid)

# Initialize v
v = np.zeros((m, n)) + sp.c / (1 - sp.β)

while i < max_iter and error > tol:


v_new = T(v)
error = np.max(np.abs(v - v_new))
i += 1
if verbose and i % print_skip == 0:
print(f"Error at iteration {i} is {error}.")
v = v_new

if i == max_iter:
print("Failed to converge!")

if verbose and i < max_iter:


print(f"\nConverged in {i} iterations.")

return v_new

Let’s look at solutions computed from value function iteration

In [7]: sp = SearchProblem()
v_star = solve_model(sp)
fig, ax = plt.subplots(figsize=(6, 6))
ax.contourf(sp.π_grid, sp.w_grid, v_star, 12, alpha=0.6, cmap=cm.jet)
cs = ax.contour(sp.π_grid, sp.w_grid, v_star, 12, colors="black")
ax.clabel(cs, inline=1, fontsize=10)
ax.set(xlabel=r'$\pi$', ylabel='$w$')

plt.show()

Error at iteration 5 is 0.6125085986473717.


Error at iteration 10 is 0.09459335987390638.
Error at iteration 15 is 0.02032106390906918.
Error at iteration 20 is 0.0045807183791186645.
Error at iteration 25 is 0.0010329776108157773.
Error at iteration 30 is 0.00023294063488954464.

Converged in 33 iterations.

We will also plot the optimal policy

In [8]: T, get_greedy = operator_factory(sp)


σ_star = get_greedy(v_star)

fig, ax = plt.subplots(figsize=(6, 6))


ax.contourf(sp.π_grid, sp.w_grid, σ_star, 1, alpha=0.6, cmap=cm.jet)
ax.contour(sp.π_grid, sp.w_grid, σ_star, 1, colors="black")
ax.set(xlabel=r'$\pi$', ylabel='$w$')

ax.text(0.5, 0.6, 'reject')


ax.text(0.7, 0.9, 'accept')

plt.show()

The results fit well with our intuition from the section Looking Forward

• The black line in the figure above corresponds to the function 𝑤̄(𝜋) introduced there
• It is decreasing as expected

35.5 Take 2: A More Efficient Method

Let’s consider another method to solve for the optimal policy


We will use iteration with an operator that has the same contraction rate as the Bellman
operator, but

• one dimensional rather than two dimensional


• no maximization step

As a consequence, the algorithm is orders of magnitude faster than VFI


This section illustrates the point that when it comes to programming, a bit of mathematical
analysis goes a long way

35.5.1 Another Functional Equation

To begin, note that when 𝑤 = 𝑤̄(𝜋), the worker is indifferent between accepting and rejecting
Hence the two choices on the right-hand side of Eq. (3) have equal value:

$$
\frac{\bar w(\pi)}{1 - \beta} = c + \beta \int v(w', \pi') \, q_\pi(w') \, dw' \tag{4}
$$

Together, Eq. (3) and Eq. (4) give

$$
v(w, \pi) = \max \left\{ \frac{w}{1 - \beta}, \frac{\bar w(\pi)}{1 - \beta} \right\} \tag{5}
$$

Combining Eq. (4) and Eq. (5), we obtain

$$
\frac{\bar w(\pi)}{1 - \beta} = c + \beta \int \max \left\{ \frac{w'}{1 - \beta}, \frac{\bar w(\pi')}{1 - \beta} \right\} q_\pi(w') \, dw'
$$

Multiplying by 1 − 𝛽, substituting in 𝜋′ = 𝜅(𝑤′ , 𝜋) and using ∘ for composition of functions
yields

$$
\bar w(\pi) = (1 - \beta) c + \beta \int \max \left\{ w', \bar w \circ \kappa(w', \pi) \right\} q_\pi(w') \, dw' \tag{6}
$$

Eq. (6) can be understood as a functional equation, where 𝑤̄ is the unknown function

• Let’s call it the reservation wage functional equation (RWFE)


• The solution 𝑤̄ to the RWFE is the object that we wish to compute

35.5.2 Solving the RWFE

To solve the RWFE, we will first show that its solution is the fixed point of a contraction
mapping
To this end, let

• 𝑏[0, 1] be the bounded real-valued functions on [0, 1]


• $\| \omega \| := \sup_{x \in [0, 1]} | \omega(x) |$

Consider the operator 𝑄 mapping 𝜔 ∈ 𝑏[0, 1] into 𝑄𝜔 ∈ 𝑏[0, 1] via

(𝑄𝜔)(𝜋) = (1 − 𝛽)𝑐 + 𝛽 ∫ max {𝑤′ , 𝜔 ∘ 𝜅(𝑤′ , 𝜋)} 𝑞𝜋 (𝑤′ ) 𝑑𝑤′ (7)

Comparing Eq. (6) and Eq. (7), we see that the set of fixed points of 𝑄 exactly coincides with
the set of solutions to the RWFE

• If 𝑄𝑤̄ = 𝑤̄ then 𝑤̄ solves Eq. (6) and vice versa



Moreover, for any 𝜔, 𝜔′ ∈ 𝑏[0, 1], basic algebra and the triangle inequality for integrals tell us
that

|(𝑄𝜔)(𝜋) − (𝑄𝜔′ )(𝜋)| ≤ 𝛽 ∫ |max {𝑤′ , 𝜔 ∘ 𝜅(𝑤′ , 𝜋)} − max {𝑤′ , 𝜔′ ∘ 𝜅(𝑤′ , 𝜋)}| 𝑞𝜋 (𝑤′ ) 𝑑𝑤′ (8)

Working case by case, it is easy to check that for real numbers 𝑎, 𝑏, 𝑐 we always have

| max{𝑎, 𝑏} − max{𝑎, 𝑐}| ≤ |𝑏 − 𝑐| (9)
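Inequality (9) can also be spot-checked numerically. The sketch below is not a proof; it simply evaluates both sides at randomly drawn triples and counts violations. The sample size, the seed and the small tolerance are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 100_000))   # random triples (a, b, c)

lhs = np.abs(np.maximum(a, b) - np.maximum(a, c))
rhs = np.abs(b - c)

# Count triples where the left side exceeds the right side
violations = np.sum(lhs > rhs + 1e-12)
print(f"violations of inequality (9): {violations}")
```

The case-by-case argument explains the result: when 𝑎 is below both 𝑏 and 𝑐 the two sides coincide, when 𝑎 is above both the left side is zero, and when 𝑎 lies between them the left side is the distance from 𝑎 to the larger of the two, which is at most |𝑏 − 𝑐|.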

Combining Eq. (8) and Eq. (9) yields

|(𝑄𝜔)(𝜋) − (𝑄𝜔′ )(𝜋)| ≤ 𝛽 ∫ |𝜔 ∘ 𝜅(𝑤′ , 𝜋) − 𝜔′ ∘ 𝜅(𝑤′ , 𝜋)| 𝑞𝜋 (𝑤′ ) 𝑑𝑤′ ≤ 𝛽‖𝜔 − 𝜔′ ‖ (10)

Taking the supremum over 𝜋 now gives us

‖𝑄𝜔 − 𝑄𝜔′ ‖ ≤ 𝛽‖𝜔 − 𝜔′ ‖ (11)

In other words, 𝑄 is a contraction of modulus 𝛽 on the complete metric space (𝑏[0, 1], ‖ ⋅ ‖)
Hence

• A unique solution 𝑤̄ to the RWFE exists in 𝑏[0, 1]


• $Q^k \omega \to \bar w$ uniformly as $k \to \infty$, for any $\omega \in b[0, 1]$

Implementation
The following function takes an instance of SearchProblem and returns the operator Q

In [9]: def Q_factory(sp, parallel_flag=True):

            f, g = sp.f, sp.g
            w_f, w_g = sp.w_f, sp.w_g
            β, c = sp.β, sp.c
            mc_size = sp.mc_size
            w_grid, π_grid = sp.w_grid, sp.π_grid

            @njit
            def κ(w, π):
                """
                Updates π using Bayes' rule and the current wage observation w.
                """
                pf, pg = π * f(w), (1 - π) * g(w)
                π_new = pf / (pf + pg)

                return π_new

            @njit(parallel=parallel_flag)
            def Q(ω):
                """
                Updates the reservation wage function guess ω via the operator
                Q.
                """
                ω_func = lambda p: interp(π_grid, ω, p)
                ω_new = np.empty_like(ω)

                for i in prange(len(π_grid)):
                    π = π_grid[i]
                    integral_f, integral_g = 0.0, 0.0

                    for m in prange(mc_size):
                        integral_f += max(w_f[m], ω_func(κ(w_f[m], π)))
                        integral_g += max(w_g[m], ω_func(κ(w_g[m], π)))
                    integral = (π * integral_f + (1 - π) * integral_g) / mc_size

                    ω_new[i] = (1 - β) * c + β * integral

                return ω_new

            return Q

In the next exercise, you are asked to compute an approximation to 𝑤̄

35.6 Exercises

35.6.1 Exercise 1

Use the default parameters and Q_factory to compute an optimal policy


Your result should coincide closely with the figure for the optimal policy shown above
Try experimenting with different parameters, and confirm that the change in the optimal
policy coincides with your intuition

35.7 Solutions

35.7.1 Exercise 1

This code solves the “Offer Distribution Unknown” model by iterating on a guess of the
reservation wage function
You should find that the run time is shorter than that of the value function approach
Similar to above, we set up a function to iterate with Q to find the fixed point

In [10]: def solve_wbar(sp,


use_parallel=True,
tol=1e-4,
max_iter=1000,
verbose=True,
print_skip=5):

Q = Q_factory(sp, use_parallel)

# Set up loop
i = 0
error = tol + 1
m, n = len(sp.w_grid), len(sp.π_grid)

# Initialize w
w = np.ones_like(sp.π_grid)

while i < max_iter and error > tol:


w_new = Q(w)
error = np.max(np.abs(w - w_new))
i += 1
if verbose and i % print_skip == 0:

print(f"Error at iteration {i} is {error}.")


w = w_new

if i == max_iter:
print("Failed to converge!")

if verbose and i < max_iter:


print(f"\nConverged in {i} iterations.")

return w_new

The solution can be plotted as follows

In [11]: sp = SearchProblem()
w_bar = solve_wbar(sp)

fig, ax = plt.subplots(figsize=(9, 7))

ax.plot(sp.π_grid, w_bar, color='k')


ax.fill_between(sp.π_grid, 0, w_bar, color='blue', alpha=0.15)
ax.fill_between(sp.π_grid, w_bar, sp.w_max, color='green', alpha=0.15)
ax.text(0.5, 0.6, 'reject')
ax.text(0.7, 0.9, 'accept')
ax.set(xlabel=r'$\pi$', ylabel='$w$')
ax.grid()
plt.show()

Error at iteration 5 is 0.02082617137586329.


Error at iteration 10 is 0.006108837557638802.
Error at iteration 15 is 0.0014079505485764532.
Error at iteration 20 is 0.0003011528470184821.

Converged in 24 iterations.

35.8 Appendix

The next piece of code is just a fun simulation to see how a change in the underlying
distribution affects the unemployment rate
At a point in the simulation, the distribution becomes significantly worse
It takes a while for agents to learn this, and in the meantime, they are too optimistic and
turn down too many jobs
As a result, the unemployment rate spikes

In [12]: F_a, F_b, G_a, G_b = 1, 1, 3, 1.2

         sp = SearchProblem(F_a=F_a, F_b=F_b, G_a=G_a, G_b=G_b)
         f, g = sp.f, sp.g

         # Solve for reservation wage
         w_bar = solve_wbar(sp, verbose=False)

         # Interpolate reservation wage function
         π_grid = sp.π_grid
         w_func = njit(lambda x: interp(π_grid, w_bar, x))

         @njit
         def update(a, b, e, π):
             "Update e and π by drawing wage offer from beta distribution with parameters a and b"

             if e == False:
                 w = np.random.beta(a, b)       # Draw random wage
                 if w >= w_func(π):
                     e = True                   # Take new job
                 else:
                     π = 1 / (1 + ((1 - π) * g(w)) / (π * f(w)))

             return e, π

         @njit
         def simulate_path(F_a=F_a,
                           F_b=F_b,
                           G_a=G_a,
                           G_b=G_b,
                           N=5000,       # Number of agents
                           T=600,        # Simulation length
                           d=200,        # Change date
                           s=0.025):     # Separation rate

             """Simulates path of employment for N workers over T periods"""

             e = np.ones((N, T+1))
             π = np.ones((N, T+1)) * 1e-3

             a, b = G_a, G_b   # Initial distribution parameters

             # e and π have T+1 columns, so stop at T-1 to keep t+1 in bounds
             for t in range(T):

                 if t == d:
                     a, b = F_a, F_b  # Change distribution parameters

                 # Update each agent
                 for n in range(N):
                     if e[n, t] == 1:                    # If agent is currently employed
                         p = np.random.uniform(0, 1)
                         if p <= s:                      # Randomly separate with probability s
                             e[n, t] = 0

                     new_e, new_π = update(a, b, e[n, t], π[n, t])
                     e[n, t+1] = new_e
                     π[n, t+1] = new_π

             return e[:, 1:]

         d = 200  # Change distribution at time d
         unemployment_rate = 1 - simulate_path(d=d).mean(axis=0)

         plt.figure(figsize=(10, 6))
         plt.plot(unemployment_rate)
         plt.axvline(d, color='r', alpha=0.6, label='Change date')
         plt.xlabel('Time')
         plt.title('Unemployment rate')
         plt.legend()
         plt.show()
36

Job Search IV: Modeling Career Choice

36.1 Contents

• Overview 36.2

• Model 36.3

• Implementation 36.4

• Exercises 36.5

• Solutions 36.6

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

36.2 Overview

Next, we study a computational problem concerning career and job choices


The model is originally due to Derek Neal [99]
This exposition draws on the presentation in [87], section 6.5
We begin with some imports

In [2]: import matplotlib.pyplot as plt


%matplotlib inline
import numpy as np
import quantecon as qe
from numba import njit, prange
from quantecon.distributions import BetaBinomial
from scipy.special import binom, beta
from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib import cm


36.2.1 Model Features

• Career and job within career both chosen to maximize expected discounted wage flow
• Infinite horizon dynamic programming with two state variables

36.3 Model

In what follows we distinguish between a career and a job, where

• a career is understood to be a general field encompassing many possible jobs, and


• a job is understood to be a position with a particular firm

For workers, wages can be decomposed into the contribution of job and career

• 𝑤𝑡 = 𝜃𝑡 + 𝜖𝑡 , where

– 𝜃𝑡 is the contribution of career at time 𝑡


– 𝜖𝑡 is the contribution of the job at time 𝑡

At the start of time 𝑡, a worker has the following options

• retain a current (career, job) pair (𝜃𝑡 , 𝜖𝑡 ) — referred to hereafter as “stay put”
• retain a current career 𝜃𝑡 but redraw a job 𝜖𝑡 — referred to hereafter as “new job”
• redraw both a career 𝜃𝑡 and a job 𝜖𝑡 — referred to hereafter as “new life”

Draws of 𝜃 and 𝜖 are independent of each other and past values, with

• 𝜃𝑡 ∼ 𝐹
• 𝜖𝑡 ∼ 𝐺

Notice that the worker does not have the option to retain a job but redraw a career: starting a new career always requires starting a new job
A young worker aims to maximize the expected sum of discounted wages


$$\mathbb{E} \sum_{t=0}^{\infty} \beta^t w_t \tag{1}$$

subject to the choice restrictions specified above


Let 𝑣(𝜃, 𝜖) denote the value function, which is the maximum of Eq. (1) over all feasible (career, job) policies, given the initial state (𝜃, 𝜖)
The value function obeys

𝑣(𝜃, 𝜖) = max{𝐼, 𝐼𝐼, 𝐼𝐼𝐼}

where

𝐼 = 𝜃 + 𝜖 + 𝛽𝑣(𝜃, 𝜖)

𝐼𝐼 = 𝜃 + ∫ 𝜖′ 𝐺(𝑑𝜖′ ) + 𝛽 ∫ 𝑣(𝜃, 𝜖′ )𝐺(𝑑𝜖′ ) (2)

𝐼𝐼𝐼 = ∫ 𝜃′ 𝐹 (𝑑𝜃′ ) + ∫ 𝜖′ 𝐺(𝑑𝜖′ ) + 𝛽 ∫ ∫ 𝑣(𝜃′ , 𝜖′ )𝐺(𝑑𝜖′ )𝐹 (𝑑𝜃′ )

Evidently 𝐼, 𝐼𝐼 and 𝐼𝐼𝐼 correspond to “stay put”, “new job” and “new life”, respectively

36.3.1 Parameterization

As in [87], section 6.5, we will focus on a discrete version of the model, parameterized as follows:

• both 𝜃 and 𝜖 take values in the set np.linspace(0, B, grid_size), an even grid of points between 0 and 𝐵 inclusive
• grid_size = 50
• B = 5
• β = 0.95

The distributions 𝐹 and 𝐺 are discrete distributions generating draws from the grid points
np.linspace(0, B, grid_size)
A very useful family of discrete distributions is the Beta-binomial family, with probability
mass function

$$p(k \,|\, n, a, b) = \binom{n}{k} \frac{B(k + a, n - k + b)}{B(a, b)}, \qquad k = 0, \ldots, n$$

Interpretation:

• draw 𝑞 from a Beta distribution with shape parameters (𝑎, 𝑏)


• run 𝑛 independent binary trials, each with success probability 𝑞
• 𝑝(𝑘 | 𝑛, 𝑎, 𝑏) is the probability of 𝑘 successes in these 𝑛 trials
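The two-stage interpretation above can be checked numerically. The following is only a sketch: the helper names `beta_binomial_pmf` and `simulate_beta_binomial`, the seed, the sample size and the choice 𝑛 = 10, 𝑎 = 𝑏 = 2 are assumptions made here for illustration and are not part of the lecture's code.

```python
import numpy as np
from scipy.special import binom, beta


def beta_binomial_pmf(n, a, b):
    """Return the Beta-binomial pmf p(k | n, a, b) for k = 0, ..., n."""
    k = np.arange(n + 1)
    return binom(n, k) * beta(k + a, n - k + b) / beta(a, b)


def simulate_beta_binomial(n, a, b, size, seed=0):
    """Draw q ~ Beta(a, b), then count successes in n trials with probability q."""
    rng = np.random.default_rng(seed)
    q = rng.beta(a, b, size=size)      # one success probability per draw
    return rng.binomial(n, q)          # number of successes for each q


n, a, b = 10, 2.0, 2.0                 # illustrative values (assumed)
pmf = beta_binomial_pmf(n, a, b)
draws = simulate_beta_binomial(n, a, b, size=200_000)
freq = np.bincount(draws, minlength=n + 1) / draws.size

print("largest gap between pmf and simulated frequencies:",
      np.abs(pmf - freq).max())
print("pmf with a = b = 1:", beta_binomial_pmf(n, 1.0, 1.0))
```

The simulated frequencies should be close to the closed-form pmf, and with 𝑎 = 𝑏 = 1 every outcome has probability 1/(𝑛 + 1).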

Nice properties:

• very flexible class of distributions, including uniform, symmetric unimodal, etc.


• only three parameters

Here’s a figure showing the effect on the pmf of different shape parameters when 𝑛 = 50

In [3]: def gen_probs(n, a, b):
    probs = np.zeros(n+1)
    for k in range(n+1):
        probs[k] = binom(n, k) * beta(k + a, n - k + b) / beta(a, b)
    return probs

n = 50
a_vals = [0.5, 1, 100]
b_vals = [0.5, 1, 100]
fig, ax = plt.subplots(figsize=(10, 6))
for a, b in zip(a_vals, b_vals):
    ab_label = f'$a = {a:.1f}$, $b = {b:.1f}$'
    ax.plot(list(range(0, n+1)), gen_probs(n, a, b), '-o', label=ab_label)
ax.legend()
plt.show()

36.4 Implementation

We will first create a class CareerWorkerProblem which will hold the default parameterization of the model, the grids for 𝜃 and 𝜖, and the probabilities and means of 𝐹 and 𝐺

In [4]: class CareerWorkerProblem:

    def __init__(self,
                 B=5.0,          # Upper bound
                 β=0.95,         # Discount factor
                 grid_size=50,   # Grid size
                 F_a=1,
                 F_b=1,
                 G_a=1,
                 G_b=1):

        self.β, self.grid_size, self.B = β, grid_size, B

        self.θ = np.linspace(0, B, grid_size)     # set of θ values
        self.ϵ = np.linspace(0, B, grid_size)     # set of ϵ values

        self.F_probs = BetaBinomial(grid_size - 1, F_a, F_b).pdf()
        self.G_probs = BetaBinomial(grid_size - 1, G_a, G_b).pdf()
        self.F_mean = np.sum(self.θ * self.F_probs)
        self.G_mean = np.sum(self.ϵ * self.G_probs)

        # Store these parameters for str and repr methods
        self._F_a, self._F_b = F_a, F_b
        self._G_a, self._G_b = G_a, G_b

The following function takes an instance of CareerWorkerProblem and returns the corresponding Bellman operator 𝑇 and the greedy policy function

In this model, 𝑇 is defined by 𝑇 𝑣(𝜃, 𝜖) = max{𝐼, 𝐼𝐼, 𝐼𝐼𝐼}, where 𝐼, 𝐼𝐼 and 𝐼𝐼𝐼 are as given in
Eq. (2)

In [5]: def operator_factory(cw, parallel_flag=True):

    """
    Returns jitted versions of the Bellman operator and the
    greedy policy function

    cw is an instance of ``CareerWorkerProblem``
    """

    θ, ϵ, β = cw.θ, cw.ϵ, cw.β
    F_probs, G_probs = cw.F_probs, cw.G_probs
    F_mean, G_mean = cw.F_mean, cw.G_mean

    @njit(parallel=parallel_flag)
    def T(v):
        "The Bellman operator"

        v_new = np.empty_like(v)

        for i in prange(len(v)):
            for j in prange(len(v)):
                v1 = θ[i] + ϵ[j] + β * v[i, j]                    # stay put
                v2 = θ[i] + G_mean + β * v[i, :] @ G_probs        # new job
                v3 = G_mean + F_mean + β * F_probs @ v @ G_probs  # new life
                v_new[i, j] = max(v1, v2, v3)

        return v_new

    @njit
    def get_greedy(v):
        "Computes the v-greedy policy"

        σ = np.empty(v.shape)

        for i in range(len(v)):
            for j in range(len(v)):
                v1 = θ[i] + ϵ[j] + β * v[i, j]
                v2 = θ[i] + G_mean + β * v[i, :] @ G_probs
                v3 = G_mean + F_mean + β * F_probs @ v @ G_probs
                if v1 > max(v2, v3):
                    action = 1
                elif v2 > max(v1, v3):
                    action = 2
                else:
                    action = 3
                σ[i, j] = action

        return σ

    return T, get_greedy
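As a cross-check on the loops above, the three candidate values 𝐼, 𝐼𝐼 and 𝐼𝐼𝐼 can also be computed for the whole grid at once with NumPy broadcasting. The sketch below is not the lecture's implementation: it skips Numba, the names `candidate_values`, `T_vec` and `greedy_vec` are hypothetical, and it assumes uniform 𝐹 and 𝐺 on the grid (the Beta-binomial case 𝑎 = 𝑏 = 1).

```python
import numpy as np

# Assumed stand-alone setup: uniform F and G on a 50-point grid
B, β, grid_size = 5.0, 0.95, 50
θ = np.linspace(0, B, grid_size)
ϵ = np.linspace(0, B, grid_size)
F_probs = np.full(grid_size, 1 / grid_size)
G_probs = np.full(grid_size, 1 / grid_size)
F_mean, G_mean = θ @ F_probs, ϵ @ G_probs


def candidate_values(v):
    """Values of 'stay put', 'new job' and 'new life' at every (θ_i, ϵ_j)."""
    v1 = θ[:, None] + ϵ[None, :] + β * v                   # stay put
    new_job = θ + G_mean + β * (v @ G_probs)               # depends on θ only
    v2 = np.broadcast_to(new_job[:, None], v.shape)        # new job
    new_life = F_mean + G_mean + β * (F_probs @ v @ G_probs)
    v3 = np.full(v.shape, new_life)                        # new life (a constant)
    return np.stack([v1, v2, v3])


def T_vec(v):
    """Vectorized Bellman operator: pointwise max of the three options."""
    return candidate_values(v).max(axis=0)


def greedy_vec(v):
    """Greedy action (1, 2 or 3) at each grid point; ties go to the lower label."""
    return candidate_values(v).argmax(axis=0) + 1


# Iterate to an approximate fixed point
v = np.full((grid_size, grid_size), 100.0)
error = 1.0
while error > 1e-4:
    v_new = T_vec(v)
    error = np.max(np.abs(v_new - v))
    v = v_new

σ = greedy_vec(v)
print("value at the best (career, job) pair:", v[-1, -1])
print("action at the worst pair:", σ[0, 0], " action at the best pair:", σ[-1, -1])
```

At the best pair the worker earns 10 forever, so the value there should be close to 10 / (1 − 𝛽) = 200. Exact ties are broken differently here than in get_greedy, which only matters at grid points where two options have identical values.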

Lastly, solve_model will take an instance of CareerWorkerProblem and iterate with the Bellman operator to find its fixed point, the value function

In [6]: def solve_model(cw,
                use_parallel=True,
                tol=1e-4,
                max_iter=1000,
                verbose=True,
                print_skip=25):

    T, _ = operator_factory(cw, parallel_flag=use_parallel)

    # Set up loop
    v = np.ones((cw.grid_size, cw.grid_size)) * 100  # Initial guess
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        v_new = T(v)
        error = np.max(np.abs(v - v_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        v = v_new

    if i == max_iter:
        print("Failed to converge!")

    if verbose and i < max_iter:
        print(f"\nConverged in {i} iterations.")

    return v_new

Here’s the solution to the model – an approximate value function

In [7]: cw = CareerWorkerProblem()
T, get_greedy = operator_factory(cw)
v_star = solve_model(cw, verbose=False)
greedy_star = get_greedy(v_star)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
tg, eg = np.meshgrid(cw.θ, cw.ϵ)
ax.plot_surface(tg,
                eg,
                v_star.T,
                cmap=cm.jet,
                alpha=0.5,
                linewidth=0.25)
ax.set(xlabel='θ', ylabel='ϵ', zlim=(150, 200))
ax.view_init(ax.elev, 225)
plt.show()

And here is the optimal policy

In [8]: fig, ax = plt.subplots(figsize=(6, 6))
tg, eg = np.meshgrid(cw.θ, cw.ϵ)
lvls = (0.5, 1.5, 2.5, 3.5)
ax.contourf(tg, eg, greedy_star.T, levels=lvls, cmap=cm.winter, alpha=0.5)
ax.contour(tg, eg, greedy_star.T, colors='k', levels=lvls, linewidths=2)
ax.set(xlabel='θ', ylabel='ϵ')
ax.text(1.8, 2.5, 'new life', fontsize=14)
ax.text(4.5, 2.5, 'new job', fontsize=14, rotation='vertical')
ax.text(4.0, 4.5, 'stay put', fontsize=14)
plt.show()

Interpretation:

• If both job and career are poor or mediocre, the worker will experiment with a new job
and new career
• If career is sufficiently good, the worker will hold it and experiment with new jobs until
a sufficiently good one is found
• If both job and career are good, the worker will stay put

Notice that the worker will always hold on to a sufficiently good career, but not necessarily
hold on to even the best paying job
The reason is that high lifetime wages require both variables to be large, and the worker cannot change careers without changing jobs

• Sometimes a good job must be sacrificed in order to change to a better career

36.5 Exercises

36.5.1 Exercise 1

Using the default parameterization in the class CareerWorkerProblem, generate and plot
typical sample paths for 𝜃 and 𝜖 when the worker follows the optimal policy
In particular, modulo randomness, reproduce the following figure (where the horizontal axis
represents time)

Hint: To generate the draws from the distributions 𝐹 and 𝐺, use quantecon.random.draw()

36.5.2 Exercise 2

Let’s now consider how long it takes for the worker to settle down to a permanent job, given
a starting point of (𝜃, 𝜖) = (0, 0)
In other words, we want to study the distribution of the random variable

𝑇 ∗ ∶= the first point in time from which the worker’s job no longer changes

Evidently, the worker’s job becomes permanent if and only if (𝜃𝑡 , 𝜖𝑡 ) enters the “stay put”
region of (𝜃, 𝜖) space

Letting 𝑆 denote this region, 𝑇 ∗ can be expressed as the first passage time to 𝑆 under the
optimal policy:

𝑇 ∗ ∶= inf{𝑡 ≥ 0 | (𝜃𝑡 , 𝜖𝑡 ) ∈ 𝑆}

Collect 25,000 draws of this random variable and compute the median (which should be
about 7)
Repeat the exercise with 𝛽 = 0.99 and interpret the change

36.5.3 Exercise 3

Set the parameterization to G_a = G_b = 100 and generate a new optimal policy figure –
interpret

36.6 Solutions

36.6.1 Exercise 1

Simulate job/career paths


In reading the code, recall that optimal_policy[i, j] = policy at (𝜃𝑖 , 𝜖𝑗 ) = either 1, 2
or 3; meaning ‘stay put’, ‘new job’ and ‘new life’

In [9]: F = np.cumsum(cw.F_probs)
G = np.cumsum(cw.G_probs)
v_star = solve_model(cw, verbose=False)
T, get_greedy = operator_factory(cw)
greedy_star = get_greedy(v_star)

def gen_path(optimal_policy, F, G, t=20):
    i = j = 0
    θ_index = []
    ϵ_index = []
    for _ in range(t):
        if optimal_policy[i, j] == 1:       # Stay put
            pass
        elif optimal_policy[i, j] == 2:     # New job
            j = int(qe.random.draw(G))
        else:                               # New life
            i, j = int(qe.random.draw(F)), int(qe.random.draw(G))
        θ_index.append(i)
        ϵ_index.append(j)
    return cw.θ[θ_index], cw.ϵ[ϵ_index]

fig, axes = plt.subplots(2, 1, figsize=(10, 8))

for ax in axes:
    θ_path, ϵ_path = gen_path(greedy_star, F, G)
    ax.plot(ϵ_path, label='ϵ')
    ax.plot(θ_path, label='θ')
    ax.set_ylim(0, 6)

plt.legend()
plt.show()

36.6.2 Exercise 2

The median for the original parameterization can be computed as follows

In [10]: cw = CareerWorkerProblem()
F = np.cumsum(cw.F_probs)
G = np.cumsum(cw.G_probs)
T, get_greedy = operator_factory(cw)
v_star = solve_model(cw, verbose=False)
greedy_star = get_greedy(v_star)

@njit
def passage_time(optimal_policy, F, G):
    t = 0
    i = j = 0
    while True:
        if optimal_policy[i, j] == 1:    # Stay put
            return t
        elif optimal_policy[i, j] == 2:  # New job
            j = int(qe.random.draw(G))
        else:                            # New life
            i, j = int(qe.random.draw(F)), int(qe.random.draw(G))
        t += 1

@njit(parallel=True)
def median_time(optimal_policy, F, G, M=25000):
    samples = np.empty(M)
    for i in prange(M):
        samples[i] = passage_time(optimal_policy, F, G)
    return np.median(samples)

median_time(greedy_star, F, G)

Out[10]: 7.0

To compute the median with 𝛽 = 0.99 instead of the default value 𝛽 = 0.95, replace cw = CareerWorkerProblem() with cw = CareerWorkerProblem(β=0.99)
The medians are subject to randomness but should be about 7 and 14 respectively
Not surprisingly, more patient workers will wait longer to settle down to their final job

36.6.3 Exercise 3

In [11]: cw = CareerWorkerProblem(G_a=100, G_b=100)
T, get_greedy = operator_factory(cw)
v_star = solve_model(cw, verbose=False)
greedy_star = get_greedy(v_star)

fig, ax = plt.subplots(figsize=(6, 6))
tg, eg = np.meshgrid(cw.θ, cw.ϵ)
lvls = (0.5, 1.5, 2.5, 3.5)
ax.contourf(tg, eg, greedy_star.T, levels=lvls, cmap=cm.winter, alpha=0.5)
ax.contour(tg, eg, greedy_star.T, colors='k', levels=lvls, linewidths=2)
ax.set(xlabel='θ', ylabel='ϵ')
ax.text(1.8, 2.5, 'new life', fontsize=14)
ax.text(4.5, 2.5, 'new job', fontsize=14, rotation='vertical')
ax.text(4.0, 4.5, 'stay put', fontsize=14)
plt.show()

In the new figure, you see that the region for which the worker stays put has grown because the distribution for 𝜖 has become more concentrated around the mean, making high-paying jobs less likely
37

Job Search V: On-the-Job Search

37.1 Contents

• Overview 37.2
• Model 37.3
• Implementation 37.4
• Solving for Policies 37.5
• Exercises 37.6
• Solutions 37.7

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon


!pip install interpolation

37.2 Overview

In this section, we solve a simple on-the-job search model

• based on [87], exercise 6.18, and [71]

Let’s start with some imports

In [2]: import numpy as np


import scipy.stats as stats
from interpolation import interp
from numba import njit, prange
import matplotlib.pyplot as plt
%matplotlib inline
from math import gamma

37.2.1 Model Features

• job-specific human capital accumulation combined with on-the-job search
• infinite-horizon dynamic programming with one state variable and two controls


37.3 Model

Let

• 𝑥𝑡 denote the time-𝑡 job-specific human capital of a worker employed at a given firm
• 𝑤𝑡 denote current wages

Let 𝑤𝑡 = 𝑥𝑡 (1 − 𝑠𝑡 − 𝜙𝑡 ), where

• 𝜙𝑡 is investment in job-specific human capital for the current role


• 𝑠𝑡 is search effort, devoted to obtaining new offers from other firms

For as long as the worker remains in the current job, evolution of {𝑥𝑡 } is given by 𝑥𝑡+1 = 𝑔(𝑥𝑡 , 𝜙𝑡 )
When search effort at 𝑡 is 𝑠𝑡 , the worker receives a new job offer with probability 𝜋(𝑠𝑡 ) ∈ [0, 1]
Value of offer is 𝑢𝑡+1 , where {𝑢𝑡 } is IID with common distribution 𝑓
Worker has the right to reject the current offer and continue with existing job
In particular, 𝑥𝑡+1 = 𝑢𝑡+1 if accepts and 𝑥𝑡+1 = 𝑔(𝑥𝑡 , 𝜙𝑡 ) if rejects
Letting 𝑏𝑡+1 ∈ {0, 1} be binary with 𝑏𝑡+1 = 1 indicating an offer, we can write

𝑥𝑡+1 = (1 − 𝑏𝑡+1 )𝑔(𝑥𝑡 , 𝜙𝑡 ) + 𝑏𝑡+1 max{𝑔(𝑥𝑡 , 𝜙𝑡 ), 𝑢𝑡+1 } (1)
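To make Eq. (1) concrete, here is a minimal sketch of a single transition. The function name `next_capital`, the specific forms chosen for 𝑔 and 𝜋, the Beta(2, 2) offer distribution and the numerical values are all assumptions made for this illustration.

```python
import numpy as np

A, α = 1.4, 0.6                      # assumed parameters of the transition function


def g(x, ϕ):
    """Assumed transition function for capital when the worker stays."""
    return A * (x * ϕ)**α


def π(s):
    """Assumed offer probability as a function of search effort."""
    return np.sqrt(s)


def next_capital(x, s, ϕ, rng):
    """One draw of x_{t+1} given x_t = x and controls (s, ϕ), following Eq. (1)."""
    b = 1 if rng.uniform() < π(s) else 0   # b = 1 means an offer arrives
    u = rng.beta(2, 2)                     # value of the offer (assumed Beta(2, 2))
    stay = g(x, ϕ)
    return (1 - b) * stay + b * max(stay, u)


rng = np.random.default_rng(0)
x = 0.4
no_search = next_capital(x, s=0.0, ϕ=0.6, rng=rng)
with_search = [next_capital(x, s=0.4, ϕ=0.6, rng=rng) for _ in range(1000)]

print("no search:", no_search, " g(x, ϕ):", g(x, 0.6))
print("smallest draw with search:", min(with_search))
```

With 𝑠 = 0 no offer ever arrives, so next period capital is exactly 𝑔(𝑥, 𝜙). With positive search effort the draw is never below 𝑔(𝑥, 𝜙), because an offer is only accepted when it beats staying.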

Agent’s objective: maximize expected discounted sum of wages via controls {𝑠𝑡 } and {𝜙𝑡 }
Taking the expectation of 𝑣(𝑥𝑡+1 ) and using Eq. (1), the Bellman equation for this problem
can be written as

$$v(x) = \max_{s + \phi \leq 1} \left\{ x(1 - s - \phi) + \beta(1 - \pi(s)) v[g(x, \phi)] + \beta \pi(s) \int v[g(x, \phi) \vee u] f(du) \right\} \tag{2}$$

Here nonnegativity of 𝑠 and 𝜙 is understood, while 𝑎 ∨ 𝑏 ∶= max{𝑎, 𝑏}

37.3.1 Parameterization

In the implementation below, we will focus on the parameterization


$$g(x, \phi) = A (x \phi)^{\alpha}, \quad \pi(s) = \sqrt{s} \quad \text{and} \quad f = \text{Beta}(2, 2)$$

with default parameter values

• 𝐴 = 1.4
• 𝛼 = 0.6
• 𝛽 = 0.96

The Beta(2, 2) distribution is supported on (0, 1) and has a unimodal, symmetric density peaked at 0.5

37.3.2 Back-of-the-Envelope Calculations

Before we solve the model, let’s make some quick calculations that provide intuition on what
the solution should look like
To begin, observe that the worker has two instruments to build capital and hence wages:

1. invest in capital specific to the current job via 𝜙


2. search for a new job with better job-specific capital match via 𝑠

Since wages are 𝑥(1 − 𝑠 − 𝜙), marginal cost of investment via either 𝜙 or 𝑠 is identical
Our risk-neutral worker should focus on whatever instrument has the highest expected return
The relative expected return will depend on 𝑥
For example, suppose first that 𝑥 = 0.05

• If 𝑠 = 1 and 𝜙 = 0, then since 𝑔(𝑥, 𝜙) = 0, taking expectations of Eq. (1) gives expected
next period capital equal to 𝜋(𝑠)E𝑢 = E𝑢 = 0.5
• If 𝑠 = 0 and 𝜙 = 1, then next period capital is 𝑔(𝑥, 𝜙) = 𝑔(0.05, 1) ≈ 0.23

Both rates of return are good, but the return from search is better
Next, suppose that 𝑥 = 0.4

• If 𝑠 = 1 and 𝜙 = 0, then expected next period capital is again 0.5


• If 𝑠 = 0 and 𝜙 = 1, then 𝑔(𝑥, 𝜙) = 𝑔(0.4, 1) ≈ 0.8

Return from investment via 𝜙 dominates expected return from search
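The figures quoted in these two cases can be reproduced in a few lines. This is only an arithmetic check using the default values 𝐴 = 1.4 and 𝛼 = 0.6; the variable names, and the extra step locating the level of 𝑥 at which the two extreme strategies give the same expected capital, are additions made here.

```python
A, α = 1.4, 0.6                 # default parameter values


def g(x, ϕ):
    """Transition function g(x, ϕ) = A (x ϕ)^α."""
    return A * (x * ϕ)**α


mean_u = 2 / (2 + 2)            # mean of a Beta(2, 2) draw

for x in (0.05, 0.4):
    invest_only = g(x, 1.0)     # s = 0, ϕ = 1
    search_only = mean_u        # s = 1, ϕ = 0, so g(x, 0) = 0 and π(1) = 1
    print(f"x = {x}: invest only -> {invest_only:.3f}, "
          f"search only -> {search_only:.3f}")

# Level of x at which full investment matches the expected offer: g(x, 1) = mean_u
x_cross = (mean_u / A)**(1 / α)
print(f"g(x, 1) equals E u at x = {x_cross:.3f}")
```

Comparing only these two extreme strategies, search has the higher expected next period capital below the crossover level and full investment has the higher one above it.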


Combining these observations gives us two informal predictions:

1. At any given state 𝑥, the two controls 𝜙 and 𝑠 will function primarily as substitutes —
worker will focus on whichever instrument has the higher expected return
2. For sufficiently small 𝑥, search will be preferable to investment in job-specific human
capital. For larger 𝑥, the reverse will be true

Now let’s turn to implementation, and see if we can match our predictions

37.4 Implementation

We will set up a class JVWorker that holds the parameters of the model described above

In [3]: class JVWorker:
    r"""
    A Jovanovic-type model of employment with on-the-job search.

    """

    def __init__(self,
                 A=1.4,
                 α=0.6,
                 β=0.96,         # Discount factor
                 π=np.sqrt,      # Search effort function
                 a=2,            # Parameter of f
                 b=2,            # Parameter of f
                 grid_size=50,
                 mc_size=100,
                 ε=1e-4):

        self.A, self.α, self.β, self.π = A, α, β, π
        self.mc_size, self.ε = mc_size, ε

        self.g = njit(lambda x, ϕ: A * (x * ϕ)**α)    # Transition function
        self.f_rvs = np.random.beta(a, b, mc_size)

        # Max of grid is the max of a large quantile value for f and the
        # fixed point y = g(y, 1)
        ε = 1e-4
        grid_max = max(A**(1 / (1 - α)), stats.beta(a, b).ppf(1 - ε))

        # Human capital
        self.x_grid = np.linspace(ε, grid_max, grid_size)

The function operator_factory takes an instance of this class and returns a jitted version of the Bellman operator T, i.e.,

$$T v(x) = \max_{s + \phi \leq 1} w(s, \phi)$$

where

𝑤(𝑠, 𝜙) ∶= 𝑥(1 − 𝑠 − 𝜙) + 𝛽(1 − 𝜋(𝑠))𝑣[𝑔(𝑥, 𝜙)] + 𝛽𝜋(𝑠) ∫ 𝑣[𝑔(𝑥, 𝜙) ∨ 𝑢]𝑓(𝑑𝑢) (3)

When we represent 𝑣, it will be with a NumPy array v giving values on grid x_grid
But to evaluate the right-hand side of Eq. (3), we need a function, so we replace the arrays v
and x_grid with a function v_func that gives linear interpolation of v on x_grid
Inside the for loop, for each x in the grid over the state space, we set up the function 𝑤(𝑧) =
𝑤(𝑠, 𝜙) defined in Eq. (3)
The function is maximized over all feasible (𝑠, 𝜙) pairs
Another function, get_greedy, returns the optimal policies for 𝑠 and 𝜙 given a value function
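The two numerical ingredients just described, linear interpolation of v on x_grid and a sample average in place of the integral in Eq. (3), can be sketched in isolation. Note the assumptions: `np.interp` is used here as a stand-in for the `interp` function from the interpolation package (the argument order differs), and the grid, the values in `v` and the number `gx` are made up for illustration.

```python
import numpy as np

x_grid = np.linspace(1e-4, 2.0, 6)     # a coarse, made-up grid
v = np.sqrt(x_grid)                    # made-up values standing in for a value function


def v_func(x):
    """Linear interpolation of the array v on x_grid (np.interp as a stand-in)."""
    return np.interp(x, x_grid, v)


# Monte Carlo approximation of the integral of v[g(x, ϕ) ∨ u] over u ~ Beta(2, 2)
rng = np.random.default_rng(0)
u_draws = rng.beta(2, 2, size=100)
gx = 0.5                               # stand-in for the value of g(x, ϕ)
integral = np.mean(v_func(np.maximum(gx, u_draws)))

midpoint = 0.5 * (x_grid[0] + x_grid[1])
print("interpolated value at a midpoint:", v_func(midpoint))
print("Monte Carlo integral:", integral, " v_func(g):", v_func(gx))
```

Because the made-up v is increasing, the Monte Carlo integral cannot fall below v_func(gx): receiving an offer can only help.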

In [4]: def operator_factory(jv, parallel_flag=True):

    """
    Returns a jitted version of the Bellman operator T

    jv is an instance of JVWorker

    """

    π, β = jv.π, jv.β
    x_grid, ε, mc_size = jv.x_grid, jv.ε, jv.mc_size
    f_rvs, g = jv.f_rvs, jv.g

    @njit
    def objective(z, x, v):
        s, ϕ = z
        v_func = lambda x: interp(x_grid, v, x)

        integral = 0
        for m in range(mc_size):
            u = f_rvs[m]
            integral += v_func(max(g(x, ϕ), u))
        integral = integral / mc_size

        q = π(s) * integral + (1 - π(s)) * v_func(g(x, ϕ))
        return x * (1 - ϕ - s) + β * q

    @njit(parallel=parallel_flag)
    def T(v):
        """
        The Bellman operator
        """

        v_new = np.empty_like(v)
        for i in prange(len(x_grid)):
            x = x_grid[i]

            # === Search on a grid === #
            search_grid = np.linspace(ε, 1, 15)
            max_val = -1
            for s in search_grid:
                for ϕ in search_grid:
                    current_val = objective((s, ϕ), x, v) if s + ϕ <= 1 else -1
                    if current_val > max_val:
                        max_val = current_val
            v_new[i] = max_val

        return v_new

    @njit
    def get_greedy(v):
        """
        Computes the v-greedy policy of a given function v
        """
        s_policy, ϕ_policy = np.empty_like(v), np.empty_like(v)

        for i in range(len(x_grid)):
            x = x_grid[i]
            # === Search on a grid === #
            search_grid = np.linspace(ε, 1, 15)
            max_val = -1
            for s in search_grid:
                for ϕ in search_grid:
                    current_val = objective((s, ϕ), x, v) if s + ϕ <= 1 else -1
                    if current_val > max_val:
                        max_val = current_val
                        max_s, max_ϕ = s, ϕ
                        s_policy[i], ϕ_policy[i] = max_s, max_ϕ
        return s_policy, ϕ_policy

    return T, get_greedy

To solve the model, we will write a function that uses the Bellman operator and iterates to
find a fixed point

In [5]: def solve_model(jv,
                use_parallel=True,
                tol=1e-4,
                max_iter=1000,
                verbose=True,
                print_skip=25):

    """
    Solves the model by value function iteration

    * jv is an instance of JVWorker

    """

    T, _ = operator_factory(jv, parallel_flag=use_parallel)

    # Set up loop
    v = jv.x_grid * 0.5  # Initial condition
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        v_new = T(v)
        error = np.max(np.abs(v - v_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        v = v_new

    if i == max_iter:
        print("Failed to converge!")

    if verbose and i < max_iter:
        print(f"\nConverged in {i} iterations.")

    return v_new

37.5 Solving for Policies

Let’s plot the optimal policies and see what they look like

In [6]: jv = JVWorker()
T, get_greedy = operator_factory(jv)
v_star = solve_model(jv)
s_star, ϕ_star = get_greedy(v_star)
plots = [s_star, ϕ_star, v_star]
titles = ["s policy", "ϕ policy", "value function"]   # Same order as plots

fig, axes = plt.subplots(3, 1, figsize=(12, 12))

for ax, plot, title in zip(axes, plots, titles):
    ax.plot(jv.x_grid, plot)
    ax.set(title=title)
    ax.grid()

axes[-1].set_xlabel("x")
plt.show()

Error at iteration 25 is 0.15111043979991212.


Error at iteration 50 is 0.0544597063611576.
Error at iteration 75 is 0.019627099373614953.
Error at iteration 100 is 0.007073542175699998.
Error at iteration 125 is 0.0025492813766803124.
Error at iteration 150 is 0.0009187526385048272.
Error at iteration 175 is 0.00033111543452513104.
Error at iteration 200 is 0.00011933291550647596.

Converged in 205 iterations.



The horizontal axis is the state 𝑥, while the vertical axis gives 𝑠(𝑥) and 𝜙(𝑥)
Overall, the policies match well with our predictions from above

• Worker switches from one investment strategy to the other depending on relative return
• For low values of 𝑥, the best option is to search for a new job
• Once 𝑥 is larger, worker does better by investing in human capital specific to the current position

37.6 Exercises

37.6.1 Exercise 1

Let’s look at the dynamics for the state process {𝑥𝑡 } associated with these policies
The dynamics are given by Eq. (1) when 𝜙𝑡 and 𝑠𝑡 are chosen according to the optimal policies, and P{𝑏𝑡+1 = 1} = 𝜋(𝑠𝑡 )
Since the dynamics are random, analysis is a bit subtle

One way to do it is to plot, for each 𝑥 in a relatively fine grid called plot_grid, a large
number 𝐾 of realizations of 𝑥𝑡+1 given 𝑥𝑡 = 𝑥
Plot this with one dot for each realization, in the form of a 45 degree diagram, setting

jv = JVWorker(grid_size=25, mc_size=50)
plot_grid_max, plot_grid_size = 1.2, 100
plot_grid = np.linspace(0, plot_grid_max, plot_grid_size)
fig, ax = plt.subplots()
ax.set_xlim(0, plot_grid_max)
ax.set_ylim(0, plot_grid_max)

By examining the plot, argue that under the optimal policies, the state 𝑥𝑡 will converge to a
constant value 𝑥̄ close to unity
Argue that at the steady state, 𝑠𝑡 ≈ 0 and 𝜙𝑡 ≈ 0.6

37.6.2 Exercise 2

In the preceding exercise, we found that 𝑠𝑡 converges to zero and 𝜙𝑡 converges to about 0.6
Since these results were calculated at a value of 𝛽 close to one, let’s compare them to the best
choice for an infinitely patient worker
Intuitively, an infinitely patient worker would like to maximize steady state wages, which are
a function of steady state capital
You can take it as given—it’s certainly true—that the infinitely patient worker does not
search in the long run (i.e., 𝑠𝑡 = 0 for large 𝑡)
Thus, given 𝜙, steady state capital is the positive fixed point 𝑥∗ (𝜙) of the map 𝑥 ↦ 𝑔(𝑥, 𝜙)
Steady state wages can be written as 𝑤∗ (𝜙) = 𝑥∗ (𝜙)(1 − 𝜙)
Graph 𝑤∗ (𝜙) with respect to 𝜙, and examine the best choice of 𝜙
Can you give a rough interpretation for the value that you see?

37.7 Solutions

37.7.1 Exercise 1

Here’s code to produce the 45 degree diagram

In [7]: jv = JVWorker(grid_size=25, mc_size=50)
π, g, f_rvs, x_grid = jv.π, jv.g, jv.f_rvs, jv.x_grid
T, get_greedy = operator_factory(jv)
v_star = solve_model(jv, verbose=False)
s_policy, ϕ_policy = get_greedy(v_star)

# Turn the policy function arrays into actual functions
s = lambda y: interp(x_grid, s_policy, y)
ϕ = lambda y: interp(x_grid, ϕ_policy, y)

def h(x, b, u):
    return (1 - b) * g(x, ϕ(x)) + b * max(g(x, ϕ(x)), u)

plot_grid_max, plot_grid_size = 1.2, 100
plot_grid = np.linspace(0, plot_grid_max, plot_grid_size)
fig, ax = plt.subplots(figsize=(8, 8))
ticks = (0.25, 0.5, 0.75, 1.0)
ax.set(xticks=ticks, yticks=ticks,
       xlim=(0, plot_grid_max),
       ylim=(0, plot_grid_max),
       xlabel='$x_t$', ylabel='$x_{t+1}$')

ax.plot(plot_grid, plot_grid, 'k--', alpha=0.6)  # 45 degree line
for x in plot_grid:
    for i in range(jv.mc_size):
        b = 1 if np.random.uniform(0, 1) < π(s(x)) else 0
        u = f_rvs[i]
        y = h(x, b, u)
        ax.plot(x, y, 'go', alpha=0.25)

plt.show()

Looking at the dynamics, we can see that

• If 𝑥𝑡 is below about 0.2 the dynamics are random, but 𝑥𝑡+1 > 𝑥𝑡 is very likely
• As 𝑥𝑡 increases the dynamics become deterministic, and 𝑥𝑡 converges to a steady state
value close to 1

Referring back to the policy plots in Section 37.5, we see that 𝑥𝑡 ≈ 1 means that 𝑠𝑡 = 𝑠(𝑥𝑡 ) ≈ 0 and 𝜙𝑡 = 𝜙(𝑥𝑡 ) ≈ 0.6

37.7.2 Exercise 2

The figure can be produced as follows

In [8]: jv = JVWorker()

def xbar(ϕ):
    A, α = jv.A, jv.α
    return (A * ϕ**α)**(1 / (1 - α))

ϕ_grid = np.linspace(0, 1, 100)
fig, ax = plt.subplots(figsize=(9, 7))
ax.set(xlabel=r'$\phi$')   # Raw strings avoid invalid escape sequences
ax.plot(ϕ_grid, [xbar(ϕ) * (1 - ϕ) for ϕ in ϕ_grid], label=r'$w^*(\phi)$')
ax.legend()

plt.show()

Observe that the maximizer is around 0.6


This is similar to the long-run value for 𝜙 obtained in exercise 1
Hence the behavior of the infinitely patient worker is similar to that of the worker with 𝛽 = 0.96
This seems reasonable and helps us confirm that our dynamic programming solutions are
probably correct
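The location of the maximizer can also be checked directly. Since 𝑥∗(𝜙) = (𝐴𝜙^𝛼)^{1/(1−𝛼)}, steady state wages are proportional to 𝜙^{𝛼/(1−𝛼)}(1 − 𝜙), and setting the derivative to zero gives 𝜙 = 𝛼, which equals 0.6 under the default parameters. The sketch below confirms this on a fine grid; the grid size and variable names are choices made here.

```python
import numpy as np

A, α = 1.4, 0.6                          # default parameter values


def xbar(ϕ):
    """Positive fixed point of x -> g(x, ϕ) = A (x ϕ)^α."""
    return (A * ϕ**α)**(1 / (1 - α))


ϕ_grid = np.linspace(0, 1, 1001)
w_star = xbar(ϕ_grid) * (1 - ϕ_grid)     # steady state wages w*(ϕ)
ϕ_best = ϕ_grid[np.argmax(w_star)]

print("grid maximizer of w*(ϕ):", ϕ_best)
print("first-order condition predicts ϕ = α =", α)
```

So the value of about 0.6 seen in the figure reflects the exponent 𝛼 in the transition function.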
38

Optimal Growth I: The Stochastic Optimal Growth Model

38.1 Contents

• Overview 38.2

• The Model 38.3

• Computation 38.4

• Exercises 38.5

• Solutions 38.6

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon


!pip install interpolation

38.2 Overview

In this lecture, we’re going to study a simple optimal growth model with one agent
The model is a version of the standard one sector infinite horizon growth model studied in

• [123], chapter 2
• [87], section 3.1
• EDTC, chapter 1
• [127], chapter 12

The technique we use to solve the model is dynamic programming


Our treatment of dynamic programming follows on from earlier treatments in our lectures on
shortest paths and job search
We’ll discuss some of the technical details of dynamic programming as we go along
Let’s start with some imports


We use an interpolation function from the interpolation.py package because it comes in handy later when we want to just-in-time compile our code
This library can be installed with the following command in Jupyter: !pip install interpolation

In [2]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline
from interpolation import interp
from numba import njit, prange
from quantecon.optimize.scalar_maximization import brent_max

38.3 The Model

Consider an agent who owns an amount 𝑦𝑡 ∈ R+ ∶= [0, ∞) of a consumption good at time 𝑡


This output can either be consumed or invested
When the good is invested it is transformed one-for-one into capital
The resulting capital stock, denoted here by 𝑘𝑡+1 , will then be used for production
Production is stochastic, in that it also depends on a shock 𝜉𝑡+1 realized at the end of the
current period
Next period output is

𝑦𝑡+1 ∶= 𝑓(𝑘𝑡+1 )𝜉𝑡+1

where 𝑓 ∶ R+ → R+ is called the production function


The resource constraint is

𝑘𝑡+1 + 𝑐𝑡 ≤ 𝑦𝑡 (1)

and all variables are required to be nonnegative

38.3.1 Assumptions and Comments

In what follows,

• The sequence {𝜉𝑡 } is assumed to be IID


• The common distribution of each 𝜉𝑡 will be denoted 𝜙
• The production function 𝑓 is assumed to be increasing and continuous
• Depreciation of capital is not made explicit but can be incorporated into the production
function

While many other treatments of the stochastic growth model use 𝑘𝑡 as the state variable, we
will use 𝑦𝑡
This will allow us to treat a stochastic model while maintaining only one state variable
We consider alternative states and timing specifications in some of our other lectures

38.3.2 Optimization

Taking 𝑦0 as given, the agent wishes to maximize


$$\mathbb{E} \left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right] \tag{2}$$

subject to

𝑦𝑡+1 = 𝑓(𝑦𝑡 − 𝑐𝑡 )𝜉𝑡+1 and 0 ≤ 𝑐𝑡 ≤ 𝑦 𝑡 for all 𝑡 (3)

where

• 𝑢 is a bounded, continuous and strictly increasing utility function and


• 𝛽 ∈ (0, 1) is a discount factor

In Eq. (3) we are assuming that the resource constraint Eq. (1) holds with equality — which
is reasonable because 𝑢 is strictly increasing and no output will be wasted at the optimum
In summary, the agent’s aim is to select a path 𝑐0 , 𝑐1 , 𝑐2 , … for consumption that is

1. nonnegative,
2. feasible in the sense of Eq. (1),
3. optimal, in the sense that it maximizes Eq. (2) relative to all other feasible consumption
sequences, and
4. adapted, in the sense that the action 𝑐𝑡 depends only on observable outcomes, not on
future outcomes such as 𝜉𝑡+1

In the present context

• 𝑦𝑡 is called the state variable — it summarizes the “state of the world” at the start of
each period
• 𝑐𝑡 is called the control variable, a value chosen by the agent each period after observing the state

38.3.3 The Policy Function Approach

One way to think about solving this problem is to look for the best policy function
A policy function is a map from past and present observables into current action
We’ll be particularly interested in Markov policies, which are maps from the current state
𝑦𝑡 into a current action 𝑐𝑡
For dynamic programming problems such as this one (in fact for any Markov decision process), the optimal policy is always a Markov policy
In other words, the current state 𝑦𝑡 provides a sufficient statistic for the history in terms of
making an optimal decision today
This is quite intuitive but if you wish you can find proofs in texts such as [123] (section 4.1)

Hereafter we focus on finding the best Markov policy


In our context, a Markov policy is a function 𝜎 ∶ R+ → R+ , with the understanding that
states are mapped to actions via

𝑐𝑡 = 𝜎(𝑦𝑡 ) for all 𝑡

In what follows, we will call 𝜎 a feasible consumption policy if it satisfies

0 ≤ 𝜎(𝑦) ≤ 𝑦 for all 𝑦 ∈ R+ (4)

In other words, a feasible consumption policy is a Markov policy that respects the resource
constraint
The set of all feasible consumption policies will be denoted by Σ
Each 𝜎 ∈ Σ determines a continuous state Markov process {𝑦𝑡 } for output via

𝑦𝑡+1 = 𝑓(𝑦𝑡 − 𝜎(𝑦𝑡 ))𝜉𝑡+1 , 𝑦0 given (5)

This is the time path for output when we choose and stick with the policy 𝜎
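As an illustration of Eq. (5), the sketch below simulates an output path under one particular feasible policy. Everything specific in it is an assumption made for the example: the production function 𝑓(𝑘) = 𝑘^𝛼 with 𝛼 = 0.4, lognormal shocks, the policy 𝜎(𝑦) = 0.5𝑦, and the function name `simulate_output`.

```python
import numpy as np

α = 0.4                                   # assumed production parameter


def f(k):
    """Assumed production function f(k) = k^α."""
    return k**α


def σ(y):
    """An assumed feasible policy: consume half of current output."""
    return 0.5 * y


def simulate_output(y0, T, seed=0, shock_scale=0.1):
    """Simulate y_0, ..., y_T from y_{t+1} = f(y_t - σ(y_t)) ξ_{t+1}."""
    rng = np.random.default_rng(seed)
    y = np.empty(T + 1)
    y[0] = y0
    for t in range(T):
        ξ = np.exp(shock_scale * rng.standard_normal())   # lognormal shock
        y[t + 1] = f(y[t] - σ(y[t])) * ξ
    return y


y = simulate_output(1.0, 50)
c = σ(y)                                  # consumption along the path
print("first five values of output:", y[:5])
print("first five values of consumption:", c[:5])
```

Any other 𝜎 satisfying Eq. (4) could be substituted for the one used here.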
We insert this process into the objective function to get

$$\mathbb{E} \left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right] = \mathbb{E} \left[ \sum_{t=0}^{\infty} \beta^t u(\sigma(y_t)) \right] \tag{6}$$

This is the total expected present value of following policy 𝜎 forever, given initial income 𝑦0
The aim is to select a policy that makes this number as large as possible
The next section covers these ideas more formally

38.3.4 Optimality

The policy value function 𝑣𝜎 associated with a given policy 𝜎 is the mapping defined by

$$v_{\sigma}(y) = \mathbb{E} \left[ \sum_{t=0}^{\infty} \beta^t u(\sigma(y_t)) \right] \tag{7}$$

when {𝑦𝑡 } is given by Eq. (5) with 𝑦0 = 𝑦


In other words, it is the lifetime value of following policy 𝜎 starting at initial condition 𝑦
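A rough way to approximate 𝑣𝜎(𝑦) is to average discounted utility over many simulated paths, truncating the infinite sum at a long horizon. The sketch below does this under assumptions chosen only for illustration: 𝑓(𝑘) = 𝑘^0.4, lognormal shocks, the policy 𝜎(𝑦) = 0.5𝑦, the bounded utility function 𝑢(𝑐) = 1 − exp(−𝑐), 𝛽 = 0.96, and the function name `policy_value`.

```python
import numpy as np

α, β = 0.4, 0.96                          # assumed parameters


def f(k):
    return k**α                           # assumed production function


def σ(y):
    return 0.5 * y                        # assumed feasible policy


def u(c):
    return 1 - np.exp(-c)                 # bounded, continuous, strictly increasing


def policy_value(y0, num_paths=500, T=300, seed=0):
    """Monte Carlo estimate of v_σ(y0), truncating the sum at T periods."""
    rng = np.random.default_rng(seed)
    y = np.full(num_paths, float(y0))     # every path starts at y0
    total = np.zeros(num_paths)
    for t in range(T):
        c = σ(y)
        total += β**t * u(c)
        ξ = np.exp(0.1 * rng.standard_normal(num_paths))
        y = f(y - c) * ξ
    return total.mean()


v_low, v_high = policy_value(1.0), policy_value(2.0)
print("estimated v_σ(1.0):", v_low)
print("estimated v_σ(2.0):", v_high)
```

Since 0 ≤ 𝑢 ≤ 1 here, the estimate must lie between 0 and 1/(1 − 𝛽), and the error from truncating at 𝑇 periods is at most 𝛽^𝑇/(1 − 𝛽). Using the same seed for both starting points means the two estimates share the same shocks, so the higher initial output gives the higher estimate.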
The value function is then defined as

$$v^*(y) := \sup_{\sigma \in \Sigma} v_{\sigma}(y) \tag{8}$$

The value function gives the maximal value that can be obtained from state 𝑦, after considering all feasible policies
A policy 𝜎 ∈ Σ is called optimal if it attains the supremum in Eq. (8) for all 𝑦 ∈ R+

38.3.5 The Bellman Equation

With our assumptions on utility and production function, the value function as defined in
Eq. (8) also satisfies a Bellman equation
For this problem, the Bellman equation takes the form

$$v(y) = \max_{0 \leq c \leq y} \left\{ u(c) + \beta \int v(f(y - c) z) \phi(dz) \right\} \qquad (y \in \mathbb{R}_+) \tag{9}$$

This is a functional equation in 𝑣


The term ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧)𝜙(𝑑𝑧) can be understood as the expected next period value when

• 𝑣 is used to measure value


• the state is 𝑦
• consumption is set to 𝑐

As shown in EDTC, theorem 10.1.11 and a range of other texts

The value function 𝑣∗ satisfies the Bellman equation

In other words, Eq. (9) holds when 𝑣 = 𝑣∗


The intuition is that maximal value from a given state can be obtained by optimally trading
off

• current reward from a given action, vs


• expected discounted future value of the state resulting from that action

The Bellman equation is important because it gives us more information about the value
function
It also suggests a way of computing the value function, which we discuss below

38.3.6 Greedy Policies

The primary importance of the value function is that we can use it to compute optimal
policies
The details are as follows
Given a continuous function 𝑣 on R+ , we say that 𝜎 ∈ Σ is 𝑣-greedy if 𝜎(𝑦) is a solution to

max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧)𝜙(𝑑𝑧)} (10)

for every 𝑦 ∈ R+
In other words, 𝜎 ∈ Σ is 𝑣-greedy if it optimally trades off current and future rewards when 𝑣
is taken to be the value function
In our setting, we have the following key result

• A feasible consumption policy is optimal if and only if it is 𝑣∗ -greedy

The intuition is similar to the intuition for the Bellman equation, which was provided after
Eq. (9)
See, for example, theorem 10.1.11 of EDTC
Hence, once we have a good approximation to 𝑣∗ , we can compute the (approximately)
optimal policy by computing the corresponding greedy policy
The advantage is that we are now solving a much lower dimensional optimization problem
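To make the definition of a 𝑣-greedy policy concrete, here is a rough sketch that solves the maximization in Eq. (10) by brute-force search over a grid of consumption choices. It is an illustration under stated assumptions, not the method used in the lecture's code: utility is log, 𝑓(𝑘) = 𝑘^𝛼, shocks are lognormal, and the candidate function 𝑣 = ln is chosen purely for convenience. The names `greedy` and `num_c` are hypothetical.

```python
import numpy as np

α, β, μ, s = 0.4, 0.96, 0.0, 0.1
rng = np.random.default_rng(0)
shocks = np.exp(μ + s * rng.standard_normal(250))    # draws from φ

def v(y):
    "Hypothetical candidate value function (an assumption for illustration)"
    return np.log(y)

def greedy(y, num_c=10_000):
    "Approximate v-greedy consumption at y by searching over candidate c"
    c = np.linspace(1e-8, y - 1e-8, num_c)           # candidates in (0, y)
    k = y - c                                        # implied savings
    # Monte Carlo estimate of ∫ v(f(y - c) z) φ(dz) for every candidate c
    expected_v = np.mean(v(np.outer(k**α, shocks)), axis=1)
    return c[np.argmax(np.log(c) + β * expected_v)]

y_vals = np.array([0.5, 1.0, 2.0])
σ_vals = np.array([greedy(y) for y in y_vals])
```

For this particular 𝑣 the first order condition can be solved by hand, giving 𝑐 = 𝑦/(1 + 𝛼𝛽), which offers a check on the search.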

38.3.7 The Bellman Operator

How, then, should we compute the value function?


One way is to use the so-called Bellman operator
(An operator is a map that sends functions into functions)
The Bellman operator is denoted by 𝑇 and defined by

𝑇 𝑣(𝑦) ∶= max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧)𝜙(𝑑𝑧)} (𝑦 ∈ R+ ) (11)

In other words, 𝑇 sends the function 𝑣 into the new function 𝑇 𝑣 defined by Eq. (11)
By construction, the set of solutions to the Bellman equation Eq. (9) exactly coincides with
the set of fixed points of 𝑇
For example, if 𝑇 𝑣 = 𝑣, then, for any 𝑦 ≥ 0,

𝑣(𝑦) = 𝑇 𝑣(𝑦) = max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧)𝜙(𝑑𝑧)}

which says precisely that 𝑣 is a solution to the Bellman equation


It follows that 𝑣∗ is a fixed point of 𝑇

38.3.8 Review of Theoretical Results

One can also show that 𝑇 is a contraction mapping on the set of continuous bounded
functions on R+ under the supremum distance

𝜌(𝑔, ℎ) = sup_{𝑦≥0} |𝑔(𝑦) − ℎ(𝑦)|

See EDTC, lemma 10.1.18


Hence it has exactly one fixed point in this set, which we know is equal to the value function
It follows that

• The value function 𝑣∗ is bounded and continuous


• Starting from any bounded and continuous 𝑣, the sequence 𝑣, 𝑇 𝑣, 𝑇^2 𝑣, … generated by
iteratively applying 𝑇 converges uniformly to 𝑣∗

This iterative method is called value function iteration
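The contraction property can be seen in a small numerical experiment. The sketch below is an illustration under simplifying assumptions that are not part of the lecture: the model is deterministic (no shock), utility is the bounded function 𝑢(𝑐) = 1 − exp(−𝑐), the maximization is a search over a finite set of consumption levels, and `np.interp` stands in for the interpolation routine. With these choices the discretized operator is a contraction of modulus 𝛽 in the supremum distance, so successive distances should shrink by a factor of at most 𝛽.

```python
import numpy as np

α, β = 0.4, 0.9
grid = np.linspace(1e-3, 2.0, 60)           # grid of income levels

def u(c):
    "A bounded utility function (assumed for this illustration)"
    return 1 - np.exp(-c)

def T(v):
    "Discretized Bellman operator acting on values stored on the grid"
    v_new = np.empty_like(v)
    for i, y in enumerate(grid):
        c = np.linspace(0, y, 100)          # candidate consumption levels
        k = np.maximum(y - c, 0)            # savings, guarded against round-off
        v_new[i] = np.max(u(c) + β * np.interp(k**α, grid, v))
    return v_new

v = np.zeros_like(grid)                     # initial condition
errors = []
for _ in range(30):
    v_next = T(v)
    errors.append(np.max(np.abs(v_next - v)))   # sup distance between iterates
    v = v_next

ratios = np.array(errors[1:]) / np.array(errors[:-1])
```

Each entry of `ratios` compares the distance between consecutive iterates with the previous such distance, which is the quantity bounded by 𝛽 in the contraction argument.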


We also know that a feasible policy is optimal if and only if it is 𝑣∗ -greedy
It’s not too hard to show that a 𝑣∗ -greedy policy exists (see EDTC, theorem 10.1.11 if you
get stuck)
Hence at least one optimal policy exists
Our problem now is how to compute it

38.3.9 Unbounded Utility

The results stated above assume that the utility function is bounded
In practice economists often work with unbounded utility functions — and so will we
In the unbounded setting, various optimality theories exist
Unfortunately, they tend to be case-specific, as opposed to valid for a large range of
applications
Nevertheless, their main conclusions are usually in line with those stated for the bounded case
just above (as long as we drop the word “bounded”)
Consult, for example, section 12.2 of EDTC, [75] or [92]

38.4 Computation

Let’s now look at computing the value function and the optimal policy

38.4.1 Fitted Value Iteration

The first step is to compute the value function by value function iteration
In theory, the algorithm is as follows

1. Begin with a function 𝑣 — an initial condition


2. Solving Eq. (11), obtain the function 𝑇 𝑣
3. Unless some stopping condition is satisfied, set 𝑣 = 𝑇 𝑣 and go to step 2

This generates the sequence 𝑣, 𝑇 𝑣, 𝑇^2 𝑣, …


However, there is a problem we must confront before we implement this procedure: The
iterates can neither be calculated exactly nor stored on a computer
To see the issue, consider Eq. (11)
Even if 𝑣 is a known function, unless 𝑇 𝑣 can be shown to have some special structure, the
only way to store it is to record the value 𝑇 𝑣(𝑦) for every 𝑦 ∈ R+
Clearly, this is impossible
What we will do instead is use fitted value function iteration
The procedure is to record the value of the function 𝑇 𝑣 at only finitely many “grid” points
𝑦1 < 𝑦2 < ⋯ < 𝑦𝐼 and reconstruct it from this information when required

More precisely, the algorithm will be


1. Begin with an array of values {𝑣1 , … , 𝑣𝐼 } representing the values of some initial function 𝑣
on the grid points {𝑦1 , … , 𝑦𝐼 }
2. Build a function 𝑣̂ on the state space R+ by interpolation or approximation, based on
these data points
3. Obtain and record the value 𝑇 𝑣̂(𝑦𝑖 ) on each grid point 𝑦𝑖 by repeatedly solving Eq. (11)
4. Unless some stopping condition is satisfied, set {𝑣1 , … , 𝑣𝐼 } = {𝑇 𝑣̂(𝑦1 ), … , 𝑇 𝑣̂(𝑦𝐼 )} and go to
step 2
How should we go about step 2?
This is a problem of function approximation, and there are many ways to approach it
What’s important here is that the function approximation scheme must not only produce a
good approximation to 𝑇 𝑣, but also combine well with the broader iteration algorithm
described above
One good choice in both respects is continuous piecewise linear interpolation (see this
paper for further discussion)
The next figure illustrates piecewise linear interpolation of an arbitrary function on grid
points 0, 0.2, 0.4, 0.6, 0.8, 1

In [3]: def f(x):
            y1 = 2 * np.cos(6 * x) + np.sin(14 * x)
            return y1 + 2.5

        def Af(x):
            # Piecewise linear interpolant of f built from the coarse grid
            return interp(c_grid, f(c_grid), x)

        c_grid = np.linspace(0, 1, 6)
        f_grid = np.linspace(0, 1, 150)

        fig, ax = plt.subplots(figsize=(10, 6))

        ax.plot(f_grid, f(f_grid), 'b-', label='true function')
        ax.plot(f_grid, Af(f_grid), 'g-', label='linear approximation')
        ax.vlines(c_grid, c_grid * 0, f(c_grid), linestyle='dashed', alpha=0.5)

        ax.set(xlim=(0, 1), ylim=(0, 6))
        plt.show()

Another advantage of piecewise linear interpolation is that it preserves useful shape properties
such as monotonicity and concavity/convexity
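The shape-preservation claim is easy to check numerically. The sketch below is only an illustration: it uses `np.interp` as a stand-in for the `interp` routine used in the lecture, and the test function 𝑔(𝑥) = √𝑥 is an arbitrary increasing, concave choice.

```python
import numpy as np

def g(x):
    "An increasing, concave test function (chosen for illustration)"
    return np.sqrt(x)

coarse = np.linspace(0, 1, 6)                 # interpolation nodes
fine = np.linspace(0, 1, 501)                 # evaluation points
approx = np.interp(fine, coarse, g(coarse))   # piecewise linear interpolant

first_diff = np.diff(approx)                  # positive if strictly increasing
second_diff = np.diff(approx, n=2)            # nonpositive if concave
max_error = np.max(np.abs(approx - g(fine)))  # sup-norm approximation error
```

The interpolant inherits monotonicity and concavity from 𝑔 even though it only uses six data points; the price is the approximation error recorded in `max_error`, which is largest where 𝑔 has the most curvature.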

38.4.2 Optimal Growth Model

We will hold the primitives of the optimal growth model in a class


The distribution 𝜙 of the shock is assumed to be lognormal, so a draw from 𝜙 can be generated
as exp(𝜇 + 𝜎𝜁) where 𝜁 is standard normal (the scale parameter 𝜎 is stored as s in the code)

In [4]: class OptimalGrowthModel:

            def __init__(self,
                         f,                # Production function
                         u,                # Utility function
                         β=0.96,           # Discount factor
                         μ=0,              # Shock location parameter
                         s=0.1,            # Shock scale parameter
                         grid_max=4,
                         grid_size=200,
                         shock_size=250):

                self.β, self.μ, self.s = β, μ, s
                self.f, self.u = f, u

                self.grid = np.linspace(1e-5, grid_max, grid_size)         # Set up grid
                self.shocks = np.exp(μ + s * np.random.randn(shock_size))  # Store shocks

38.4.3 The Bellman Operator

Here’s a function that generates a Bellman operator using linear interpolation

In [5]: def operator_factory(og, parallel_flag=True):
            """
            A function factory for building the Bellman operator, as well as
            a function that computes greedy policies.

            Here og is an instance of OptimalGrowthModel.
            """

            f, u, β = og.f, og.u, og.β
            grid, shocks = og.grid, og.shocks

            @njit
            def objective(c, v, y):
                """
                The right-hand side of the Bellman equation
                """
                # First turn v into a function via interpolation
                v_func = lambda x: interp(grid, v, x)
                return u(c) + β * np.mean(v_func(f(y - c) * shocks))

            @njit(parallel=parallel_flag)
            def T(v):
                """
                The Bellman operator
                """
                v_new = np.empty_like(v)
                for i in prange(len(grid)):
                    y = grid[i]
                    # Maximize the right-hand side at y and keep the maximized value
                    v_max = brent_max(objective, 1e-10, y, args=(v, y))[1]
                    v_new[i] = v_max
                return v_new

            @njit
            def get_greedy(v):
                """
                Computes the v-greedy policy of a given function v
                """
                σ = np.empty_like(v)
                for i in range(len(grid)):
                    y = grid[i]
                    # Solve for optimal c at y
                    c_max = brent_max(objective, 1e-10, y, args=(v, y))[0]
                    σ[i] = c_max
                return σ

            return T, get_greedy

The function operator_factory takes an instance of the class that represents the growth
model and returns the operator T and a function get_greedy that we will use to solve the model
Notice that the expectation in Eq. (11) is computed via Monte Carlo, using the approximation

∫ 𝑣(𝑓(𝑦 − 𝑐)𝑧)𝜙(𝑑𝑧) ≈ (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑣(𝑓(𝑦 − 𝑐)𝜉𝑖 )

where {𝜉𝑖 }_{𝑖=1}^{𝑛} are IID draws from 𝜙


Monte Carlo is not always the most efficient way to compute integrals numerically but it does
have some theoretical advantages in the present setting
(For example, it preserves the contraction mapping property of the Bellman operator — see,
e.g., [102])
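The quality of the Monte Carlo approximation can be gauged in a case where the integral is known exactly. In the sketch below, which is an illustration and not part of the lecture's code, we take 𝑣 = ln and 𝑓(𝑘) = 𝑘^𝛼, so that the integral equals 𝛼 ln 𝑘 + 𝜇 because E ln 𝜉 = 𝜇 for the lognormal shock. The savings level `k`, the seed and the sample sizes are arbitrary assumptions.

```python
import numpy as np

α, μ, s = 0.4, 0.0, 0.1
k = 0.5                                   # savings y - c (arbitrary value)
rng = np.random.default_rng(1234)

def mc_estimate(n):
    "Sample average of v(f(k) ξ) with v = log and f(k) = k**α"
    ξ = np.exp(μ + s * rng.standard_normal(n))
    return np.mean(np.log(k**α * ξ))

exact = α * np.log(k) + μ                 # closed form, since E[ln ξ] = μ
errors = {n: abs(mc_estimate(n) - exact) for n in (250, 10_000, 1_000_000)}
```

The entry for 𝑛 = 250 corresponds to the default `shock_size` in `OptimalGrowthModel`; the error typically shrinks at rate 1/√𝑛 as the sample grows.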

38.4.4 An Example

Let’s test out our operator when



• 𝑓(𝑘) = 𝑘^𝛼
• 𝑢(𝑐) = ln 𝑐
• 𝜙 is the distribution of exp(𝜇 + 𝜎𝜁) when 𝜁 is standard normal

As is well-known (see [87], section 3.1.2), for this particular problem an exact analytical
solution is available, with

𝑣∗(𝑦) = ln(1 − 𝛼𝛽)/(1 − 𝛽) + [(𝜇 + 𝛼 ln(𝛼𝛽))/(1 − 𝛼)] [1/(1 − 𝛽) − 1/(1 − 𝛼𝛽)] + [1/(1 − 𝛼𝛽)] ln 𝑦 (12)

The optimal consumption policy is

𝜎∗ (𝑦) = (1 − 𝛼𝛽)𝑦

We will define functions to compute the closed-form solutions to check our answers

In [6]: def σ_star(y, α, β):
            """
            True optimal policy
            """
            return (1 - α * β) * y

        def v_star(y, α, β, μ):
            """
            True value function
            """
            c1 = np.log(1 - α * β) / (1 - β)
            c2 = (μ + α * np.log(α * β)) / (1 - α)
            c3 = 1 / (1 - β)
            c4 = 1 / (1 - α * β)
            return c1 + c2 * (c3 - c4) + c4 * np.log(y)
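As a sanity check on Eq. (12), one can confirm numerically that the closed-form value function satisfies the Bellman equation when consumption is set to (1 − 𝛼𝛽)𝑦. The sketch below is self-contained and only illustrative; the helper `bellman_rhs` is a hypothetical name, and the expectation is evaluated analytically rather than by simulation, using the fact that 𝑣∗ is affine in ln 𝑦 and E ln 𝜉 = 𝜇.

```python
import numpy as np

α, β, μ = 0.4, 0.96, 0.0

def v_star(y):
    "Closed-form value function from Eq. (12)"
    c1 = np.log(1 - α * β) / (1 - β)
    c2 = (μ + α * np.log(α * β)) / (1 - α)
    c3 = 1 / (1 - β)
    c4 = 1 / (1 - α * β)
    return c1 + c2 * (c3 - c4) + c4 * np.log(y)

def bellman_rhs(c, y):
    """
    u(c) + β E[v*(f(y - c) ξ)] with u = log and f(k) = k**α.
    Because v* is affine in ln y and E[ln ξ] = μ, the expectation
    equals v*(f(y - c) * exp(μ)), so no simulation is needed.
    """
    return np.log(c) + β * v_star((y - c)**α * np.exp(μ))

y = np.linspace(0.1, 4, 50)
residual = bellman_rhs((1 - α * β) * y, y) - v_star(y)
```

The residual should vanish up to floating point error at every grid point.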

38.4.5 A First Test

To test our code, we want to see if we can replicate the analytical solution numerically, using
fitted value function iteration
First, having run the code for the general model shown above, let’s generate an instance of
the model and generate its Bellman operator
We first need to define a jitted version of the production function

In [7]: α = 0.4  # Production function parameter

        @njit
        def f(k):
            """
            Cobb-Douglas production function
            """
            return k**α

Now we will create an instance of the model and assign it to the variable og
This instance will use the Cobb-Douglas production function and log utility

In [8]: og = OptimalGrowthModel(f=f, u=np.log)

We will use og to generate the Bellman operator and a function that computes greedy
policies

In [9]: T, get_greedy = operator_factory(og)

Now let’s do some tests


As one preliminary test, let’s see what happens when we apply our Bellman operator to the
exact solution 𝑣∗
In theory, the resulting function should again be 𝑣∗
In practice, we expect some small numerical error

In [10]: grid = og.grid


β, μ = og.β, og.μ

v_init = v_star(grid, α, β, μ) # Start at the solution


v = T(v_init) # Apply the Bellman operator once

fig, ax = plt.subplots(figsize=(9, 5))


ax.set_ylim(-35, -24)
ax.plot(grid, v, lw=2, alpha=0.6, label='$Tv^*$')
ax.plot(grid, v_init, lw=2, alpha=0.6, label='$v^*$')
ax.legend()
plt.show()

The two functions are essentially indistinguishable, so we are off to a good start
Now let’s have a look at iterating with the Bellman operator, starting off from an arbitrary
initial condition
The initial condition we’ll start with is 𝑣(𝑦) = 5 ln(𝑦)

In [11]: v = 5 * np.log(grid)  # An initial condition
         n = 35

         fig, ax = plt.subplots(figsize=(9, 6))

         ax.plot(grid, v, color=plt.cm.jet(0),
                 lw=2, alpha=0.6, label='Initial condition')

         for i in range(n):
             v = T(v)  # Apply the Bellman operator
             ax.plot(grid, v, color=plt.cm.jet(i / n), lw=2, alpha=0.6)

         ax.plot(grid, v_star(grid, α, β, μ), 'k-', lw=2,
                 alpha=0.8, label='True value function')

         ax.legend()
         ax.set(ylim=(-40, 10), xlim=(np.min(grid), np.max(grid)))
         plt.show()

The figure shows

1. the first 36 functions generated by the fitted value function iteration algorithm, with
hotter colors given to higher iterates
2. the true value function 𝑣∗ drawn in black

The sequence of iterates converges towards 𝑣∗


We are clearly getting closer
We can write a function that iterates until the difference is below a particular tolerance level

In [12]: def solve_model(og,
                         use_parallel=True,
                         tol=1e-4,
                         max_iter=1000,
                         verbose=True,
                         print_skip=25):

             T, _ = operator_factory(og, parallel_flag=use_parallel)

             # Set up loop
             v = np.log(og.grid)  # Initial condition
             i = 0
             error = tol + 1

             while i < max_iter and error > tol:
                 v_new = T(v)
                 error = np.max(np.abs(v - v_new))
                 i += 1
                 if verbose and i % print_skip == 0:
                     print(f"Error at iteration {i} is {error}.")
                 v = v_new

             # Report failure only if the tolerance was not reached
             if error > tol:
                 print("Failed to converge!")
             elif verbose:
                 print(f"\nConverged in {i} iterations.")

             return v_new

We can check our result by plotting it against the true value

In [13]: v_solution = solve_model(og)

fig, ax = plt.subplots(figsize=(9, 5))

ax.plot(grid, v_solution, lw=2, alpha=0.6,


label='Approximate value function')

ax.plot(grid, v_star(grid, α, β, μ), lw=2,


alpha=0.6, label='True value function')

ax.legend()
ax.set_ylim(-35, -24)
plt.show()

Error at iteration 25 is 0.4141798429821719.


Error at iteration 50 is 0.14926464561285258.
Error at iteration 75 is 0.053794488219015335.
Error at iteration 100 is 0.019387356939361666.
Error at iteration 125 is 0.006987139789480068.
Error at iteration 150 is 0.0025181422403903753.
Error at iteration 175 is 0.0009075301960557169.
Error at iteration 200 is 0.0003270709031326646.
Error at iteration 225 is 0.00011787527991558022.

Converged in 230 iterations.



The figure shows that we are pretty much on the money

38.4.6 The Policy Function

To compute an approximate optimal policy, we will use the second function returned from
operator_factory that backs out the optimal policy from the solution to the Bellman
equation
The next figure compares the result to the exact solution, which, as mentioned above, is
𝜎(𝑦) = (1 − 𝛼𝛽)𝑦

In [14]: fig, ax = plt.subplots(figsize=(9, 5))

ax.plot(grid, get_greedy(v_solution), lw=2,


alpha=0.6, label='Approximate policy function')

ax.plot(grid, σ_star(grid, α, β),


lw=2, alpha=0.6, label='True policy function')

ax.legend()
plt.show()

The figure shows that we’ve done a good job in this instance of approximating the true policy

38.5 Exercises

38.5.1 Exercise 1

Once an optimal consumption policy 𝜎 is given, income follows Eq. (5)


The next figure shows a simulation of 100 elements of this sequence for three different dis-
count factors (and hence three different policies)

In each sequence, the initial condition is 𝑦0 = 0.1


The discount factors are discount_factors = (0.8, 0.9, 0.98)
We have also dialed down the shocks a bit with s = 0.05
Otherwise, the parameters and primitives are the same as the log-linear model discussed
earlier in the lecture
Notice that more patient agents typically have higher wealth
Replicate the figure modulo randomness

38.6 Solutions

38.6.1 Exercise 1

Here’s one solution (assuming as usual that you’ve executed everything above)

In [15]: def simulate_og(σ_func, og, α, y0=0.1, ts_length=100):
             '''
             Compute a time series given consumption policy σ.
             '''
             y = np.empty(ts_length)
             ξ = np.random.randn(ts_length-1)
             y[0] = y0
             for t in range(ts_length-1):
                 y[t+1] = (y[t] - σ_func(y[t]))**α * np.exp(og.μ + og.s * ξ[t])
             return y

In [16]: fig, ax = plt.subplots(figsize=(9, 6))

         for β in (0.8, 0.9, 0.98):

             og = OptimalGrowthModel(f, np.log, β=β, s=0.05)
             grid = og.grid

             v_solution = solve_model(og, verbose=False)

             # Rebuild get_greedy so that it uses the current β and shocks
             _, get_greedy = operator_factory(og)
             σ_vals = get_greedy(v_solution)
             σ_func = lambda x: interp(grid, σ_vals, x)  # Define an optimal policy function

             y = simulate_og(σ_func, og, α)
             ax.plot(y, lw=2, alpha=0.6, label=rf'$\beta = {β}$')

         ax.legend(loc='lower right')
         plt.show()
39

Optimal Growth II: Time Iteration

39.1 Contents

• Overview 39.2

• The Euler Equation 39.3

• Comparison with Value Function Iteration 39.4

• Implementation 39.5

• Exercises 39.6

• Solutions 39.7

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon


!pip install interpolation

39.2 Overview

In this lecture, we’ll continue our earlier study of the stochastic optimal growth model
In that lecture, we solved the associated discounted dynamic programming problem using
value function iteration
The beauty of this technique is its broad applicability
With numerical problems, however, we can often attain higher efficiency in specific
applications by deriving methods that are carefully tailored to the application at hand
The stochastic optimal growth model has plenty of structure to exploit for this purpose,
especially when we adopt some concavity and smoothness assumptions over primitives
We’ll use this structure to obtain an Euler equation based method that’s more efficient
than value function iteration for this and some other closely related applications
In a subsequent lecture, we’ll see that the numerical implementation part of the Euler
equation method can be further adjusted to obtain even more efficiency
Let’s start with some imports


In [2]: import numpy as np


import quantecon as qe
from interpolation import interp
from numba import njit, prange
from quantecon.optimize import brentq
from quantecon.optimize.scalar_maximization import brent_max
import matplotlib.pyplot as plt
%matplotlib inline

39.3 The Euler Equation

Let’s take the model set out in the stochastic growth model lecture and add the assumptions
that

1. 𝑢 and 𝑓 are continuously differentiable and strictly concave


2. 𝑓(0) = 0
3. lim_{𝑐→0} 𝑢′(𝑐) = ∞ and lim_{𝑐→∞} 𝑢′(𝑐) = 0
4. lim_{𝑘→0} 𝑓′(𝑘) = ∞ and lim_{𝑘→∞} 𝑓′(𝑘) = 0

The last two conditions are usually called Inada conditions


Recall the Bellman equation

𝑣∗(𝑦) = max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑣∗(𝑓(𝑦 − 𝑐)𝑧)𝜙(𝑑𝑧)} for all 𝑦 ∈ R+ (1)

Let the optimal consumption policy be denoted by 𝜎∗


We know that 𝜎∗ is a 𝑣∗-greedy policy, so that 𝜎∗ (𝑦) is the maximizer in Eq. (1)
The conditions above imply that

• 𝜎∗ is the unique optimal policy for the stochastic optimal growth model
• the optimal policy is continuous, strictly increasing and also interior, in the sense that
0 < 𝜎∗ (𝑦) < 𝑦 for all strictly positive 𝑦, and
• the value function is strictly concave and continuously differentiable, with

(𝑣∗ )′ (𝑦) = 𝑢′ (𝜎∗ (𝑦)) ∶= (𝑢′ ∘ 𝜎∗ )(𝑦) (2)

The last result is called the envelope condition due to its relationship with the envelope
theorem
To see why Eq. (2) might be valid, write the Bellman equation in the equivalent form

𝑣∗(𝑦) = max_{0≤𝑘≤𝑦} {𝑢(𝑦 − 𝑘) + 𝛽 ∫ 𝑣∗(𝑓(𝑘)𝑧)𝜙(𝑑𝑧)} ,

differentiate naively with respect to 𝑦, and then evaluate at the optimum


Section 12.1 of EDTC contains full proofs of these results, and closely related discussions can
be found in many other texts
Differentiability of the value function and interiority of the optimal policy imply that optimal
consumption satisfies the first order condition associated with Eq. (1), which is

𝑢′ (𝜎∗ (𝑦)) = 𝛽 ∫(𝑣∗ )′ (𝑓(𝑦 − 𝜎∗ (𝑦))𝑧)𝑓 ′ (𝑦 − 𝜎∗ (𝑦))𝑧𝜙(𝑑𝑧) (3)

Combining Eq. (2) and the first-order condition Eq. (3) gives the famous Euler equation

(𝑢′ ∘ 𝜎∗ )(𝑦) = 𝛽 ∫(𝑢′ ∘ 𝜎∗ )(𝑓(𝑦 − 𝜎∗ (𝑦))𝑧)𝑓 ′ (𝑦 − 𝜎∗ (𝑦))𝑧𝜙(𝑑𝑧) (4)

We can think of the Euler equation as a functional equation

(𝑢′ ∘ 𝜎)(𝑦) = 𝛽 ∫(𝑢′ ∘ 𝜎)(𝑓(𝑦 − 𝜎(𝑦))𝑧)𝑓 ′ (𝑦 − 𝜎(𝑦))𝑧𝜙(𝑑𝑧) (5)

over interior consumption policies 𝜎, one solution of which is the optimal policy 𝜎∗
Our aim is to solve the functional equation Eq. (5) and hence obtain 𝜎∗
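One way to build intuition for the functional equation Eq. (5) is to evaluate its residual, left side minus right side, for a candidate policy. The sketch below does this by Monte Carlo under assumptions chosen for illustration: log utility, 𝑓(𝑘) = 𝑘^𝛼 and lognormal shocks. It compares the closed-form optimal policy (1 − 𝛼𝛽)𝑦 of the log / Cobb-Douglas case with a hypothetical alternative that consumes half of income; the names `euler_residual`, `σ_opt` and `σ_half` are not from the lecture.

```python
import numpy as np

α, β, μ, s = 0.4, 0.96, 0.0, 0.1
rng = np.random.default_rng(0)
shocks = np.exp(μ + s * rng.standard_normal(250))

u_prime = lambda c: 1 / c               # marginal utility for u = log
f = lambda k: k**α                      # production function
f_prime = lambda k: α * k**(α - 1)      # its derivative

def euler_residual(σ, y):
    "Left side minus right side of Eq. (5) at income y, by Monte Carlo"
    k = y - σ(y)
    rhs = β * np.mean(u_prime(σ(f(k) * shocks)) * f_prime(k) * shocks)
    return u_prime(σ(y)) - rhs

σ_opt = lambda y: (1 - α * β) * y       # closed-form optimal policy
σ_half = lambda y: 0.5 * y              # hypothetical alternative policy

y_vals = (0.5, 1.0, 2.0)
res_opt = [euler_residual(σ_opt, y) for y in y_vals]
res_half = [euler_residual(σ_half, y) for y in y_vals]
```

In this special case the shocks cancel inside the integral, so the residual of the optimal policy is zero up to rounding while the alternative policy leaves a clearly nonzero residual.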

39.3.1 The Coleman-Reffett Operator

Recall the Bellman operator

𝑇 𝑤(𝑦) ∶= max_{0≤𝑐≤𝑦} {𝑢(𝑐) + 𝛽 ∫ 𝑤(𝑓(𝑦 − 𝑐)𝑧)𝜙(𝑑𝑧)} (6)

Just as we introduced the Bellman operator to solve the Bellman equation, we will now
introduce an operator over policies to help us solve the Euler equation
This operator 𝐾 will act on the set of all 𝜎 ∈ Σ that are continuous, strictly increasing and
interior (i.e., 0 < 𝜎(𝑦) < 𝑦 for all strictly positive 𝑦)
Henceforth we denote this set of policies by 𝒫

1. The operator 𝐾 takes as its argument a 𝜎 ∈ 𝒫 and


2. returns a new function 𝐾𝜎, where 𝐾𝜎(𝑦) is the 𝑐 ∈ (0, 𝑦) that solves

𝑢′ (𝑐) = 𝛽 ∫(𝑢′ ∘ 𝜎)(𝑓(𝑦 − 𝑐)𝑧)𝑓 ′ (𝑦 − 𝑐)𝑧𝜙(𝑑𝑧) (7)

We call this operator the Coleman-Reffett operator to acknowledge the work of [28] and
[107]
In essence, 𝐾𝜎 is the consumption policy that the Euler equation tells you to choose today
when your future consumption policy is 𝜎
The important thing to note about 𝐾 is that, by construction, its fixed points coincide with
solutions to the functional equation Eq. (5)
In particular, the optimal policy 𝜎∗ is a fixed point
Indeed, for fixed 𝑦, the value 𝐾𝜎∗ (𝑦) is the 𝑐 that solves

𝑢′ (𝑐) = 𝛽 ∫(𝑢′ ∘ 𝜎∗ )(𝑓(𝑦 − 𝑐)𝑧)𝑓 ′ (𝑦 − 𝑐)𝑧𝜙(𝑑𝑧)

In view of the Euler equation, this is exactly 𝜎∗ (𝑦)



39.3.2 Is the Coleman-Reffett Operator Well Defined?

In particular, is there always a unique 𝑐 ∈ (0, 𝑦) that solves Eq. (7)?


The answer is yes, under our assumptions
For any 𝜎 ∈ 𝒫, the right side of Eq. (7)

• is continuous and strictly increasing in 𝑐 on (0, 𝑦)


• diverges to +∞ as 𝑐 ↑ 𝑦

The left side of Eq. (7)

• is continuous and strictly decreasing in 𝑐 on (0, 𝑦)


• diverges to +∞ as 𝑐 ↓ 0

Sketching these curves and using the information above will convince you that they cross
exactly once as 𝑐 ranges over (0, 𝑦)
With a bit more analysis, one can show in addition that 𝐾𝜎 ∈ 𝒫 whenever 𝜎 ∈ 𝒫
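The single-crossing argument suggests a simple way to evaluate 𝐾𝜎(𝑦): search for the root of the difference between the two sides of Eq. (7) on (0, 𝑦). The sketch below is a rough illustration under assumed primitives (log utility, 𝑓(𝑘) = 𝑘^𝛼, lognormal shocks). Plain bisection, written here as the hypothetical helper `bisect`, stands in for a production root finder, and 𝜎(𝑦) = 𝑦 is used as the input policy.

```python
import numpy as np

α, β, μ, s = 0.4, 0.96, 0.0, 0.1
rng = np.random.default_rng(0)
shocks = np.exp(μ + s * rng.standard_normal(250))

def euler_diff(c, σ, y):
    "Left side minus right side of Eq. (7), with u = log and f(k) = k**α"
    k = y - c
    vals = (1 / σ(k**α * shocks)) * α * k**(α - 1) * shocks
    return 1 / c - β * np.mean(vals)

def bisect(func, a, b, args=(), tol=1e-12):
    "Plain bisection; assumes func(a) > 0 > func(b) and func decreasing"
    while b - a > tol:
        m = 0.5 * (a + b)
        if func(m, *args) > 0:
            a = m                       # root lies to the right of m
        else:
            b = m                       # root lies to the left of m
    return 0.5 * (a + b)

σ = lambda y: y                         # the policy that consumes everything
y_vals = np.array([0.5, 1.0, 2.0])
Kσ = np.array([bisect(euler_diff, 1e-10, y - 1e-10, args=(σ, y))
               for y in y_vals])
```

With these primitives and this 𝜎, Eq. (7) reduces to 1/𝑐 = 𝛼𝛽/(𝑦 − 𝑐), so the computed values can be compared with 𝑦/(1 + 𝛼𝛽).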

39.4 Comparison with Value Function Iteration

How does Euler equation time iteration compare with value function iteration?
Both can be used to compute the optimal policy, but is one faster or more accurate?
There are two parts to this story
First, on a theoretical level, the two methods are essentially isomorphic
In particular, they converge at the same rate
We’ll prove this in just a moment
The other side of the story is the accuracy of the numerical implementation
It turns out that, once we actually implement these two routines, time iteration is more
accurate than value function iteration
More on this below

39.4.1 Equivalent Dynamics

Let’s talk about the theory first


To explain the connection between the two algorithms, it helps to understand the notion of
equivalent dynamics
(This concept is very helpful in many other contexts as well)
Suppose that we have a function 𝑔 ∶ 𝑋 → 𝑋 where 𝑋 is a given set
The pair (𝑋, 𝑔) is sometimes called a dynamical system and we associate it with
trajectories of the form

𝑥𝑡+1 = 𝑔(𝑥𝑡 ), 𝑥0 given

Equivalently, 𝑥𝑡 = 𝑔^𝑡(𝑥0 ), where 𝑔^𝑡 is the 𝑡-th composition of 𝑔 with itself


Here’s the picture

Now let ℎ ∶ 𝑌 → 𝑌 be another function, where 𝑌 is another set


Suppose further that

• there exists a bijection 𝜏 from 𝑋 to 𝑌


• the two functions commute under 𝜏 , which is to say that 𝜏 (𝑔(𝑥)) = ℎ(𝜏 (𝑥)) for all
𝑥∈𝑋

The last statement can be written more simply as

𝜏 ∘𝑔 = ℎ∘𝜏

or, by applying 𝜏^{−1} to both sides

𝑔 = 𝜏^{−1} ∘ ℎ ∘ 𝜏 (8)

Here’s a commutative diagram that illustrates

Here’s a similar figure that traces out the action of the maps on a point 𝑥 ∈ 𝑋

Now, it’s easy to check from Eq. (8) that 𝑔^2 = 𝜏^{−1} ∘ ℎ^2 ∘ 𝜏 holds



In fact, if you like proofs by induction, you won’t have trouble showing that

𝑔^𝑛 = 𝜏^{−1} ∘ ℎ^𝑛 ∘ 𝜏

is valid for all 𝑛


What does this tell us?
It tells us that the following are equivalent

• iterate 𝑛 times with 𝑔, starting at 𝑥


• shift 𝑥 to 𝑌 using 𝜏 , iterate 𝑛 times with ℎ starting at 𝜏 (𝑥) and shift the result
ℎ^𝑛(𝜏 (𝑥)) back to 𝑋 using 𝜏^{−1}

We end up with exactly the same object
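A toy example may help fix ideas. In the sketch below, which is purely illustrative and unrelated to the growth model, we take 𝑋 = (0, ∞) with 𝑔(𝑥) = 𝑥², 𝑌 = R with ℎ(𝑦) = 2𝑦, and the bijection 𝜏 = ln. These maps commute under 𝜏 because ln(𝑥²) = 2 ln 𝑥.

```python
import math

def g(x):            # map on X = (0, ∞)
    return x**2

def h(y):            # map on Y = R
    return 2 * y

τ = math.log         # bijection from X to Y
τ_inv = math.exp     # its inverse, from Y back to X

def iterate(func, x, n):
    "Apply func to x a total of n times"
    for _ in range(n):
        x = func(x)
    return x

x0, n = 1.1, 5
direct = iterate(g, x0, n)                 # g^n(x0)
via_h = τ_inv(iterate(h, τ(x0), n))        # τ^{-1}(h^n(τ(x0)))
```

Both routes produce the same number up to rounding error, which is the content of the identity 𝑔^𝑛 = 𝜏^{−1} ∘ ℎ^𝑛 ∘ 𝜏.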

39.4.2 Back to Economics

Have you guessed where this is leading?


What we’re going to show now is that the operators 𝑇 and 𝐾 commute under a certain
bijection
The implication is that they have exactly the same rate of convergence
To make life a little easier, we’ll assume in the following analysis (although not always in our
applications) that 𝑢(0) = 0
A Bijection
Let 𝒱 be the set of all strictly concave, continuously differentiable functions 𝑣 mapping R+ to
itself and satisfying 𝑣(0) = 0 and 𝑣′ (𝑦) > 𝑢′ (𝑦) for all positive 𝑦
For 𝑣 ∈ 𝒱 let

𝑀 𝑣 ∶= ℎ ∘ 𝑣′ where ℎ ∶= (𝑢′ )−1

Although we omit details, 𝜎 ∶= 𝑀 𝑣 is actually the unique 𝑣-greedy policy

• See proposition 12.1.18 of EDTC

It turns out that 𝑀 is a bijection from 𝒱 to 𝒫


A (solved) exercise below asks you to confirm this
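The formula 𝑀𝑣 = (𝑢′)^{−1} ∘ 𝑣′ can be tried out numerically. The sketch below is an illustration only: it uses log utility, for which (𝑢′)^{−1}(𝑥) = 1/𝑥, together with the closed-form value function of the log / Cobb-Douglas model from the previous lecture, and it approximates 𝑣′ by finite differences with `np.gradient`. Note that log utility violates the simplifying assumption 𝑢(0) = 0, so this 𝑣 is not literally an element of 𝒱; the point is just to see the map 𝑀 at work.

```python
import numpy as np

α, β, μ = 0.4, 0.96, 0.0

def v_star(y):
    "Closed-form value function for the log / Cobb-Douglas model"
    c1 = np.log(1 - α * β) / (1 - β)
    c2 = (μ + α * np.log(α * β)) / (1 - α)
    c3 = 1 / (1 - β)
    c4 = 1 / (1 - α * β)
    return c1 + c2 * (c3 - c4) + c4 * np.log(y)

def M(v_vals, grid):
    "Apply M v = (u')^{-1} ∘ v' on a grid, with u = log so (u')^{-1}(x) = 1/x"
    v_prime = np.gradient(v_vals, grid)     # finite-difference derivative
    return 1 / v_prime

grid = np.linspace(0.5, 4, 2001)
σ_from_v = M(v_star(grid), grid)
σ_true = (1 - α * β) * grid
max_gap = np.max(np.abs(σ_from_v[1:-1] - σ_true[1:-1]))   # skip end points
```

Up to the finite-difference error, applying 𝑀 to the value function recovers the policy (1 − 𝛼𝛽)𝑦, in line with the statement that 𝑀𝑣 is the 𝑣-greedy policy.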
Commutative Operators
It is an additional solved exercise (see below) to show that 𝑇 and 𝐾 commute under 𝑀 , in
the sense that

𝑀 ∘𝑇 = 𝐾 ∘𝑀 (9)

In view of the preceding discussion, this implies that



𝑇^𝑛 = 𝑀^{−1} ∘ 𝐾^𝑛 ∘ 𝑀

Hence, 𝑇 and 𝐾 converge at exactly the same rate!

39.5 Implementation

We’ve just shown that the operators 𝑇 and 𝐾 have the same rate of convergence
However, it turns out that, once numerical approximation is taken into account, significant
differences arise
In particular, the image of policy functions under 𝐾 can be calculated faster and with greater
accuracy than the image of value functions under 𝑇
Our intuition for this result is that

• the Coleman-Reffett operator exploits more information because it uses first order and
envelope conditions
• policy functions generally have less curvature than value functions, and hence admit
more accurate approximations based on grid point information

First, we’ll store the parameters of the model in a class OptimalGrowthModel

In [3]: class OptimalGrowthModel:

            def __init__(self,
                         f,
                         f_prime,
                         u,
                         u_prime,
                         β=0.96,
                         μ=0,
                         s=0.1,
                         grid_max=4,
                         grid_size=200,
                         shock_size=250):

                self.β, self.μ, self.s = β, μ, s
                self.f, self.u = f, u
                self.f_prime, self.u_prime = f_prime, u_prime

                self.grid = np.linspace(1e-5, grid_max, grid_size)         # Set up grid
                self.shocks = np.exp(μ + s * np.random.randn(shock_size))  # Store shocks

Here’s some code that returns the Coleman-Reffett operator, 𝐾

In [4]: def time_operator_factory(og, parallel_flag=True):
            """
            A function factory for building the Coleman-Reffett operator.
            Here og is an instance of OptimalGrowthModel.
            """
            β = og.β
            f, u = og.f, og.u
            f_prime, u_prime = og.f_prime, og.u_prime
            grid, shocks = og.grid, og.shocks

            @njit
            def objective(c, σ, y):
                """
                The left side minus the right side of Eq. (7)
                """
                # First turn σ into a function via interpolation
                σ_func = lambda x: interp(grid, σ, x)
                vals = u_prime(σ_func(f(y - c) * shocks)) * f_prime(y - c) * shocks
                return u_prime(c) - β * np.mean(vals)

            @njit(parallel=parallel_flag)
            def K(σ):
                """
                The Coleman-Reffett operator
                """
                σ_new = np.empty_like(σ)
                for i in prange(len(grid)):
                    y = grid[i]
                    # Solve for optimal c at y
                    c_star = brentq(objective, 1e-10, y-1e-10, args=(σ, y))[0]
                    σ_new[i] = c_star

                return σ_new

            return K

It has some similarities to the code for the Bellman operator in our optimal growth lecture
For example, it evaluates integrals by Monte Carlo and approximates functions using linear
interpolation
Here’s that Bellman operator code again, which needs to be executed because we’ll use it in
some tests below

In [5]: def operator_factory(og, parallel_flag=True):
            """
            A function factory for building the Bellman operator, as well as
            a function that computes greedy policies.

            Here og is an instance of OptimalGrowthModel.
            """

            f, u, β = og.f, og.u, og.β
            grid, shocks = og.grid, og.shocks

            @njit
            def objective(c, v, y):
                """
                The right-hand side of the Bellman equation
                """
                # First turn v into a function via interpolation
                v_func = lambda x: interp(grid, v, x)
                return u(c) + β * np.mean(v_func(f(y - c) * shocks))

            @njit(parallel=parallel_flag)
            def T(v):
                """
                The Bellman operator
                """
                v_new = np.empty_like(v)
                for i in prange(len(grid)):
                    y = grid[i]
                    # Maximize the right-hand side at y and keep the maximized value
                    v_max = brent_max(objective, 1e-10, y, args=(v, y))[1]
                    v_new[i] = v_max
                return v_new

            @njit
            def get_greedy(v):
                """
                Computes the v-greedy policy of a given function v
                """
                σ = np.empty_like(v)
                for i in range(len(grid)):
                    y = grid[i]
                    # Solve for optimal c at y
                    c_max = brent_max(objective, 1e-10, y, args=(v, y))[0]
                    σ[i] = c_max
                return σ

            return T, get_greedy

39.5.1 Testing on the Log / Cobb–Douglas Case

As we did for value function iteration, let’s start by testing our method in the presence of a
model that does have an analytical solution
First, we generate an instance of OptimalGrowthModel and return the corresponding
Coleman-Reffett operator

In [6]: α = 0.3

        @njit
        def f(k):
            "Deterministic part of production function"
            return k**α

        @njit
        def f_prime(k):
            return α * k**(α - 1)

        og = OptimalGrowthModel(f=f, f_prime=f_prime,
                                u=np.log, u_prime=njit(lambda x: 1/x))

        K = time_operator_factory(og)

As a preliminary test, let’s see if 𝐾𝜎∗ = 𝜎∗ , as implied by the theory

In [7]: @njit
        def σ_star(y, α, β):
            "True optimal policy"
            return (1 - α * β) * y

        grid, β = og.grid, og.β
        σ_star_new = K(σ_star(grid, α, β))

        # Raw strings keep the LaTeX backslashes from being read as escapes
        fig, ax = plt.subplots()
        ax.plot(grid, σ_star(grid, α, β), label=r"optimal policy $\sigma^*$")
        ax.plot(grid, σ_star_new, label=r"$K\sigma^*$")

        ax.legend()
        plt.show()

We can’t really distinguish the two plots, so we are looking good, at least for this test
Next, let’s try iterating from an arbitrary initial condition and see if we converge towards 𝜎∗
The initial condition we’ll use is the one that eats the whole pie: 𝜎(𝑦) = 𝑦

In [8]: n = 15
        σ = grid.copy()  # Set initial condition
        fig, ax = plt.subplots(figsize=(9, 6))
        lb = r'initial condition $\sigma(y) = y$'
        ax.plot(grid, σ, color=plt.cm.jet(0), alpha=0.6, label=lb)

        for i in range(n):
            σ = K(σ)
            ax.plot(grid, σ, color=plt.cm.jet(i / n), alpha=0.6)

        lb = r'true policy function $\sigma^*$'
        ax.plot(grid, σ_star(grid, α, β), 'k-', alpha=0.8, label=lb)
        ax.legend()

        plt.show()

We see that the policy has converged nicely, in only a few steps
Now let’s compare the accuracy of iteration between the operators
We’ll generate

1. 𝐾^𝑛 𝜎 where 𝜎(𝑦) = 𝑦
2. (𝑀 ∘ 𝑇^𝑛 ∘ 𝑀^{−1})𝜎 where 𝜎(𝑦) = 𝑦

In each case, we’ll compare the resulting policy to 𝜎∗


The theory on equivalent dynamics says we will get the same policy function and hence the
same errors
But in fact we expect the first method to be more accurate for reasons discussed above

In [9]: T, get_greedy = operator_factory(og)  # Return the Bellman operator

        σ = grid          # Set initial condition for σ
        v = og.u(grid)    # Set initial condition for v
        sim_length = 20

        for i in range(sim_length):
            σ = K(σ)  # Time iteration
            v = T(v)  # Value function iteration

        # Calculate difference with actual solution
        σ_error = σ_star(grid, α, β) - σ
        v_error = σ_star(grid, α, β) - get_greedy(v)

        plt.plot(grid, σ_error, alpha=0.6, label="time iteration error")
        plt.plot(grid, v_error, alpha=0.6, label="value iteration error")
        plt.legend()
        plt.show()

As you can see, time iteration is much more accurate for a given number of iterations

39.6 Exercises

39.6.1 Exercise 1

Show that Eq. (9) is valid. In particular,

• Let 𝑣 be strictly concave and continuously differentiable on (0, ∞)


• Fix 𝑦 ∈ (0, ∞) and show that 𝑀 𝑇 𝑣(𝑦) = 𝐾𝑀 𝑣(𝑦)

39.6.2 Exercise 2

Show that 𝑀 is a bijection from 𝒱 to 𝒫

39.6.3 Exercise 3

Consider the same model as above but with the CRRA utility function

𝑢(𝑐) = (𝑐^{1−𝛾} − 1) / (1 − 𝛾)

Iterate 20 times with Bellman iteration and Euler equation time iteration

• start time iteration from 𝜎(𝑦) = 𝑦


• start value function iteration from 𝑣(𝑦) = 𝑢(𝑦)
• set 𝛾 = 1.5

Compare the resulting policies and check that they are close

39.6.4 Exercise 4

Solve the above model as we did in the previous lecture using the operators 𝑇 and 𝐾, and
check that the solutions are similar by plotting

39.7 Solutions

39.7.1 Exercise 1

Let 𝑇 , 𝐾, 𝑀 , 𝑣 and 𝑦 be as stated in the exercise


Using the envelope theorem, one can show that (𝑇 𝑣)′ (𝑦) = 𝑢′ (𝜎(𝑦)) where 𝜎(𝑦) solves

𝑢′ (𝜎(𝑦)) = 𝛽 ∫ 𝑣′ (𝑓(𝑦 − 𝜎(𝑦))𝑧)𝑓 ′ (𝑦 − 𝜎(𝑦))𝑧𝜙(𝑑𝑧) (10)

Hence 𝑀 𝑇 𝑣(𝑦) = (𝑢′ )−1 (𝑢′ (𝜎(𝑦))) = 𝜎(𝑦)


On the other hand, 𝐾𝑀 𝑣(𝑦) is the 𝜎(𝑦) that solves

𝑢′ (𝜎(𝑦)) = 𝛽 ∫(𝑢′ ∘ (𝑀 𝑣))(𝑓(𝑦 − 𝜎(𝑦))𝑧)𝑓 ′ (𝑦 − 𝜎(𝑦))𝑧𝜙(𝑑𝑧)

= 𝛽 ∫(𝑢′ ∘ ((𝑢′ )−1 ∘ 𝑣′ ))(𝑓(𝑦 − 𝜎(𝑦))𝑧)𝑓 ′ (𝑦 − 𝜎(𝑦))𝑧𝜙(𝑑𝑧)

= 𝛽 ∫ 𝑣′ (𝑓(𝑦 − 𝜎(𝑦))𝑧)𝑓 ′ (𝑦 − 𝜎(𝑦))𝑧𝜙(𝑑𝑧)

We see that 𝜎(𝑦) is the same in each case

39.7.2 Exercise 2

We need to show that 𝑀 is a bijection from 𝒱 to 𝒫


To see this, first observe that, in view of our assumptions above, 𝑢′ is a strictly decreasing
continuous bijection from (0, ∞) to itself
It follows that ℎ has the same properties
Moreover, for fixed 𝑣 ∈ 𝒱, the derivative 𝑣′ is a continuous, strictly decreasing function
Hence, for fixed 𝑣 ∈ 𝒱, the map 𝑀 𝑣 = ℎ ∘ 𝑣′ is strictly increasing and continuous, taking
values in (0, ∞)
Moreover, interiority holds because 𝑣′ strictly dominates 𝑢′ , implying that

(𝑀 𝑣)(𝑦) = ℎ(𝑣′ (𝑦)) < ℎ(𝑢′ (𝑦)) = 𝑦

In particular, 𝜎(𝑦) ∶= (𝑀 𝑣)(𝑦) is an element of 𝒫


To see that each 𝜎 ∈ 𝒫 has a preimage 𝑣 ∈ 𝒱 with 𝑀 𝑣 = 𝜎, fix any 𝜎 ∈ 𝒫

Let $v(y) := \int_0^y u'(\sigma(x)) \, dx$ with $v(0) = 0$
With a small amount of effort, you will be able to show that 𝑣 ∈ 𝒱 and 𝑀 𝑣 = 𝜎
It’s also true that 𝑀 is one-to-one on 𝒱
To see this, suppose that 𝑣 and 𝑤 are elements of 𝒱 satisfying 𝑀 𝑣 = 𝑀 𝑤
Then 𝑣(0) = 𝑤(0) = 0 and 𝑣′ = 𝑤′ on (0, ∞)
The fundamental theorem of calculus then implies that 𝑣 = 𝑤 on R+

39.7.3 Exercise 3

Here’s the code, which will execute if you’ve run all the code above

In [10]: γ = 1.5  # Preference parameter

@njit
def u(c):
    return (c**(1 - γ) - 1) / (1 - γ)

@njit
def u_prime(c):
    return c**(-γ)

og = OptimalGrowthModel(f=f, f_prime=f_prime, u=u, u_prime=u_prime)

T, get_greedy = operator_factory(og)
K = time_operator_factory(og)

σ = grid       # Initial condition for σ
v = u(grid)    # Initial condition for v
sim_length = 20

for i in range(sim_length):
    σ = K(σ)  # Time iteration
    v = T(v)  # Value function iteration

plt.plot(grid, σ, alpha=0.6, label="policy iteration")
plt.plot(grid, get_greedy(v), alpha=0.6, label="value iteration")
plt.legend()
plt.show()

The policies are indeed close

39.7.4 Exercise 4

Here’s the function we need to solve the model using value function iteration, copied from
the previous lecture

In [11]: def solve_model(og,
                use_parallel=True,
                tol=1e-4,
                max_iter=1000,
                verbose=True,
                print_skip=25):

    T, _ = operator_factory(og, parallel_flag=use_parallel)

    # Set up loop
    v = np.log(og.grid)  # Initial condition
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        v_new = T(v)
        error = np.max(np.abs(v - v_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        v = v_new

    if i == max_iter:
        print("Failed to converge!")

    if verbose and i < max_iter:
        print(f"\nConverged in {i} iterations.")

    return v_new

Similarly, we can write a function that uses K to solve the model



In [12]: def solve_model_time(og,
                     use_parallel=True,
                     tol=1e-4,
                     max_iter=1000,
                     verbose=True,
                     print_skip=25):

    K = time_operator_factory(og, parallel_flag=use_parallel)

    # Set up loop
    σ = og.grid  # Initial condition
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        σ_new = K(σ)
        error = np.max(np.abs(σ - σ_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        σ = σ_new

    if i == max_iter:
        print("Failed to converge!")

    if verbose and i < max_iter:
        print(f"\nConverged in {i} iterations.")

    return σ_new

Solving both models and plotting

In [13]: v_star = solve_model(og)


σ_star = solve_model_time(og)

plt.plot(grid, get_greedy(v_star), alpha=0.6, label='Bellman operator')


plt.plot(grid, σ_star, alpha=0.6, label='Coleman-Reffett operator')
plt.legend()
plt.show()

Error at iteration 25 is 0.4127173266072077.


Error at iteration 50 is 0.14874196945970652.
Error at iteration 75 is 0.05360611746223398.
Error at iteration 100 is 0.019319468733378642.
Error at iteration 125 is 0.006962673103384276.
Error at iteration 150 is 0.0025093245369021133.
Error at iteration 175 is 0.0009043523215481741.
Error at iteration 200 is 0.00032592560758359923.
Error at iteration 225 is 0.00011746251896127546.

Converged in 229 iterations.

Converged in 10 iterations.

Time iteration is numerically far more accurate for a given number of iterations
40

Optimal Growth III: The Endogenous Grid Method

40.1 Contents

• Overview 40.2

• Key Idea 40.3

• Implementation 40.4

• Speed 40.5

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon


!pip install interpolation

40.2 Overview

We solved the stochastic optimal growth model using

1. value function iteration


2. Euler equation based time iteration

We found time iteration to be significantly more accurate at each step


In this lecture, we’ll look at an ingenious twist on the time iteration technique called the
endogenous grid method (EGM)
EGM is a numerical method for implementing policy iteration invented by Chris Carroll
It is a good example of how a clever algorithm can save a massive amount of computer time
(Massive when we multiply saved CPU cycles on each implementation times the number of
implementations worldwide)
The original reference is [23]
Let’s start with some imports


In [2]: import numpy as np


import quantecon as qe
from interpolation import interp
from numba import njit, prange
from quantecon.optimize import brentq
import matplotlib.pyplot as plt
%matplotlib inline

40.3 Key Idea

Let’s start by reminding ourselves of the theory and then see how the numerics fit in

40.3.1 Theory

Take the model set out in the time iteration lecture, following the same terminology and
notation
The Euler equation is

(𝑢′ ∘ 𝜎∗ )(𝑦) = 𝛽 ∫(𝑢′ ∘ 𝜎∗ )(𝑓(𝑦 − 𝜎∗ (𝑦))𝑧)𝑓 ′ (𝑦 − 𝜎∗ (𝑦))𝑧𝜙(𝑑𝑧) (1)

As we saw, the Coleman-Reffett operator is a nonlinear operator 𝐾 engineered so that 𝜎∗ is a
fixed point of 𝐾
It takes as its argument a continuous strictly increasing consumption policy 𝜎 ∈ Σ
It returns a new function 𝐾𝜎, where (𝐾𝜎)(𝑦) is the 𝑐 ∈ (0, ∞) that solves

𝑢′ (𝑐) = 𝛽 ∫(𝑢′ ∘ 𝜎)(𝑓(𝑦 − 𝑐)𝑧)𝑓 ′ (𝑦 − 𝑐)𝑧𝜙(𝑑𝑧) (2)

40.3.2 Exogenous Grid

As discussed in the lecture on time iteration, to implement the method on a computer we
need a numerical approximation
In particular, we represent a policy function by a set of values on a finite grid
The function itself is reconstructed from this representation when necessary, using interpolation
or some other method
Previously, to obtain a finite representation of an updated consumption policy we

• fixed a grid of income points {𝑦𝑖 }


• calculated the consumption value 𝑐𝑖 corresponding to each 𝑦𝑖 using Eq. (2) and a root-
finding routine

Each 𝑐𝑖 is then interpreted as the value of the function 𝐾𝜎 at 𝑦𝑖


Thus, with the points {𝑦𝑖 , 𝑐𝑖 } in hand, we can reconstruct 𝐾𝜎 via approximation
Iteration then continues…

40.3.3 Endogenous Grid

The method discussed above requires a root-finding routine to find the 𝑐𝑖 corresponding to a
given income value 𝑦𝑖
Root-finding is costly because it typically involves a significant number of function
evaluations
As pointed out by Carroll [23], we can avoid this if 𝑦𝑖 is chosen endogenously
The only assumption required is that 𝑢′ is invertible on (0, ∞)
The idea is this:
First, we fix an exogenous grid {𝑘𝑖 } for capital (𝑘 = 𝑦 − 𝑐)
Then we obtain 𝑐𝑖 via

𝑐𝑖 = (𝑢′ )−1 {𝛽 ∫(𝑢′ ∘ 𝜎)(𝑓(𝑘𝑖 )𝑧) 𝑓 ′ (𝑘𝑖 ) 𝑧 𝜙(𝑑𝑧)} (3)

where (𝑢′ )−1 is the inverse function of 𝑢′


Finally, for each 𝑐𝑖 we set 𝑦𝑖 = 𝑐𝑖 + 𝑘𝑖
It is clear that each (𝑦𝑖 , 𝑐𝑖 ) pair constructed in this manner satisfies Eq. (2)
With the points {𝑦𝑖 , 𝑐𝑖 } in hand, we can reconstruct 𝐾𝜎 via approximation as before
The name EGM comes from the fact that the grid {𝑦𝑖 } is determined endogenously
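The three steps above can be sketched in a few lines of plain NumPy. This is only an illustration under stated assumptions, not the implementation used in the next section: the parameter values, the grid, the random seed and the helper name `egm_step` are chosen here for the example, and the integral in Eq. (3) is replaced by a sample mean over simulated shocks. The check uses the log / Cobb-Douglas case, where the optimal policy is known to be 𝜎∗(𝑦) = (1 − 𝛼𝛽)𝑦.

```python
import numpy as np

# Illustrative parameter values (assumptions for this sketch)
α, β = 0.4, 0.96
μ, s = 0.0, 0.1

f = lambda k: k**α                    # Cobb-Douglas production
f_prime = lambda k: α * k**(α - 1)    # f'(k)
u_prime = lambda c: 1 / c             # Log utility: u'(c) = 1/c
u_prime_inv = lambda x: 1 / x         # Inverse of u'

rng = np.random.default_rng(0)
shocks = np.exp(μ + s * rng.standard_normal(500))  # Monte Carlo draws of z
k_grid = np.linspace(1e-4, 2.0, 100)               # Exogenous grid for capital


def egm_step(σ):
    "One EGM update: returns the endogenous grid y and consumption c."
    c = np.empty_like(k_grid)
    for i, k in enumerate(k_grid):
        # Right-hand side of Eq. (3), with the integral replaced by a sample mean
        vals = u_prime(σ(f(k) * shocks)) * f_prime(k) * shocks
        c[i] = u_prime_inv(β * np.mean(vals))
    y = k_grid + c   # y_i = c_i + k_i
    return y, c


σ_star = lambda y: (1 - α * β) * y   # Known solution in the log / Cobb-Douglas case
y, c = egm_step(σ_star)
print(np.max(np.abs(c - σ_star(y))))  # Gap between Kσ* and σ* on the endogenous grid
```

No root-finder is called anywhere: each 𝑐𝑖 comes from one evaluation of the inverse marginal utility, and the income grid is simply read off as 𝑦𝑖 = 𝑐𝑖 + 𝑘𝑖.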

40.4 Implementation

Let’s implement this version of the Coleman-Reffett operator and see how it performs
First, we will construct a class OptimalGrowthModel to hold the parameters of the model

In [3]: class OptimalGrowthModel:
    """
    The class holds the model primitives, the grid and the shock draws.
    """

    def __init__(self,
                 f,                # Production function
                 f_prime,          # f'(k)
                 u,                # Utility function
                 u_prime,          # Marginal utility
                 u_prime_inv,      # Inverse marginal utility
                 β=0.96,           # Discount factor
                 μ=0,              # Shock location parameter
                 s=0.1,            # Shock scale parameter
                 grid_max=4,
                 grid_size=200,
                 shock_size=250):

        self.β, self.μ, self.s = β, μ, s
        self.f, self.u = f, u
        self.f_prime, self.u_prime, self.u_prime_inv = f_prime, u_prime, u_prime_inv

        self.grid = np.linspace(1e-5, grid_max, grid_size)  # Set up grid
        self.shocks = np.exp(μ + s * np.random.randn(shock_size))  # Store shocks

40.4.1 The Operator

Here’s an implementation of 𝐾 using EGM as described above


Unlike the previous lecture, we do not just-in-time compile the operator because we want to
return the policy function
Despite this, the EGM method is still faster than the standard Coleman-Reffett operator, as
we will see later on

In [4]: def egm_operator_factory(og):
    """
    A function factory for building the Coleman-Reffett operator

    Here og is an instance of OptimalGrowthModel.
    """

    f, u, β = og.f, og.u, og.β
    f_prime, u_prime, u_prime_inv = og.f_prime, og.u_prime, og.u_prime_inv
    grid, shocks = og.grid, og.shocks

    def K(σ):
        """
        The Coleman-Reffett operator, implemented via EGM

        * σ is a function
        """
        # Allocate memory for value of consumption on endogenous grid points
        c = np.empty_like(grid)

        # Solve for updated consumption value
        for i, k in enumerate(grid):
            vals = u_prime(σ(f(k) * shocks)) * f_prime(k) * shocks
            c[i] = u_prime_inv(β * np.mean(vals))

        # Determine endogenous grid
        y = grid + c  # y_i = k_i + c_i

        # Update policy function and return
        σ_new = lambda x: interp(y, c, x)

        return σ_new

    return K

Note the lack of any root-finding algorithm


We’ll also run our original implementation, which uses an exogenous grid and requires root-
finding, so we can perform some comparisons

In [5]: def time_operator_factory(og, parallel_flag=True):
    """
    A function factory for building the Coleman-Reffett operator.
    Here og is an instance of OptimalGrowthModel.
    """
    β = og.β
    f, u = og.f, og.u
    f_prime, u_prime = og.f_prime, og.u_prime
    grid, shocks = og.grid, og.shocks

    @njit
    def objective(c, σ, y):
        """
        The right hand side of the operator
        """
        # First turn σ into a function via interpolation
        σ_func = lambda x: interp(grid, σ, x)
        vals = u_prime(σ_func(f(y - c) * shocks)) * f_prime(y - c) * shocks
        return u_prime(c) - β * np.mean(vals)

    @njit(parallel=parallel_flag)
    def K(σ):
        """
        The Coleman-Reffett operator
        """
        σ_new = np.empty_like(σ)
        for i in prange(len(grid)):
            y = grid[i]
            # Solve for optimal c at y
            c_star = brentq(objective, 1e-10, y-1e-10, args=(σ, y))[0]
            σ_new[i] = c_star

        return σ_new

    return K

Let’s test out the code above on some example parameterizations

40.4.2 Testing on the Log / Cobb–Douglas Case

As we did for value function iteration and time iteration, let’s start by testing our method
with the log-linear benchmark
First, we generate an instance

In [6]: α = 0.4  # Production function parameter

@njit
def f(k):
    """
    Cobb-Douglas production function
    """
    return k**α

@njit
def f_prime(k):
    """
    First derivative of the production function
    """
    return α * k**(α - 1)

@njit
def u_prime(c):
    return 1 / c

og = OptimalGrowthModel(f=f,
                        f_prime=f_prime,
                        u=np.log,
                        u_prime=u_prime,
                        u_prime_inv=u_prime)

Notice that we’re passing u_prime twice


The reason is that, in the case of log utility, 𝑢′ (𝑐) = (𝑢′ )−1 (𝑐) = 1/𝑐
Hence u_prime and u_prime_inv are the same
As a preliminary test, let’s see if 𝐾𝜎∗ = 𝜎∗ , as implied by the theory

In [7]: β, grid = og.β, og.grid

def c_star(y):
    "True optimal policy"
    return (1 - α * β) * y

K = egm_operator_factory(og)  # Return the operator K with endogenous grid

fig, ax = plt.subplots(figsize=(9, 6))

# Raw strings keep the LaTeX backslashes from being read as escape sequences
ax.plot(grid, c_star(grid), label=r"optimal policy $\sigma^*$")
ax.plot(grid, K(c_star)(grid), label=r"$K\sigma^*$")

ax.legend()
plt.show()

We can’t really distinguish the two plots


In fact it’s easy to see that the difference is essentially zero:

In [8]: max(abs(K(c_star)(grid) - c_star(grid)))

Out[8]: 9.881666666666672e-06

Next, let’s try iterating from an arbitrary initial condition and see if we converge towards 𝜎∗
Let’s start from the consumption policy that eats the whole pie: 𝜎(𝑦) = 𝑦

In [9]: σ = lambda x: x
n = 15
fig, ax = plt.subplots(figsize=(9, 6))

ax.plot(grid, σ(grid), color=plt.cm.jet(0),
        alpha=0.6, label=r'initial condition $\sigma(y) = y$')

for i in range(n):
    σ = K(σ)  # Update policy
    ax.plot(grid, σ(grid), color=plt.cm.jet(i / n), alpha=0.6)

ax.plot(grid, c_star(grid), 'k-',
        alpha=0.8, label=r'true policy function $\sigma^*$')

ax.legend()
plt.show()

We see that the policy has converged nicely, in only a few steps

40.5 Speed

Now let’s compare the clock times per iteration for the standard Coleman-Reffett operator
(with exogenous grid) and the EGM version
We’ll do so using the CRRA model adopted in the exercises of the Euler equation time
iteration lecture

In [10]: γ = 1.5  # Preference parameter

@njit
def u(c):
    return (c**(1 - γ) - 1) / (1 - γ)

@njit
def u_prime(c):
    return c**(-γ)

@njit
def u_prime_inv(c):
    return c**(-1 / γ)

og = OptimalGrowthModel(f=f,
                        f_prime=f_prime,
                        u=u,
                        u_prime=u_prime,
                        u_prime_inv=u_prime_inv)

K_time = time_operator_factory(og)  # Standard Coleman-Reffett operator
K_time(grid)  # Call once to compile jitted version
K_egm = egm_operator_factory(og)  # Coleman-Reffett operator with endogenous grid

Here’s the result

In [11]: sim_length = 20

print("Timing standard Coleman policy function iteration")
σ = grid  # Initial policy
qe.util.tic()
for i in range(sim_length):
    σ_new = K_time(σ)
    σ = σ_new
qe.util.toc()

print("Timing policy function iteration with endogenous grid")
σ = lambda x: x  # Initial policy
qe.util.tic()
for i in range(sim_length):
    σ_new = K_egm(σ)
    σ = σ_new
qe.util.toc()

Timing standard Coleman policy function iteration


TOC: Elapsed: 0:00:0.63
Timing policy function iteration with endogenous grid
TOC: Elapsed: 0:00:0.36

Out[11]: 0.3692951202392578

We see that the EGM version is significantly faster (about 0.36 seconds versus 0.63 seconds for 20 iterations in the run above), even without jit compilation!
The absence of numerical root-finding means that it is typically more accurate at each step as
well
41

LQ Dynamic Programming Problems

41.1 Contents

• Overview 41.2

• Introduction 41.3

• Optimality – Finite Horizon 41.4

• Implementation 41.5

• Extensions and Comments 41.6

• Further Applications 41.7

• Exercises 41.8

• Solutions 41.9

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

41.2 Overview

Linear quadratic (LQ) control refers to a class of dynamic optimization problems that have
found applications in almost every scientific field
This lecture provides an introduction to LQ control and its economic applications
As we will see, LQ systems have a simple structure that makes them an excellent workhorse
for a wide variety of economic problems
Moreover, while the linear-quadratic structure is restrictive, it is in fact far more flexible than
it may appear initially
These themes appear repeatedly below
Mathematically, LQ control problems are closely related to the Kalman filter


• Recursive formulations of linear-quadratic control problems and Kalman filtering problems
both involve matrix Riccati equations
• Classical formulations of linear control and linear filtering problems make use of similar
matrix decompositions (see for example this lecture and this lecture)

In reading what follows, it will be useful to have some familiarity with

• matrix manipulations
• vectors of random variables
• dynamic programming and the Bellman equation (see for example this lecture and this
lecture)

For additional reading on LQ control, see, for example,

• [87], chapter 5
• [52], chapter 4
• [65], section 3.5

In order to focus on computation, we leave longer proofs to these sources (while trying to
provide as much intuition as possible)

41.3 Introduction

The “linear” part of LQ is a linear law of motion for the state, while the “quadratic” part
refers to preferences
Let’s begin with the former, move on to the latter, and then put them together into an
optimization problem

41.3.1 The Law of Motion

Let 𝑥𝑡 be a vector describing the state of some economic system


Suppose that 𝑥𝑡 follows a linear law of motion given by

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡 + 𝐶𝑤𝑡+1 , 𝑡 = 0, 1, 2, … (1)

Here

• 𝑢𝑡 is a “control” vector, incorporating choices available to a decision-maker confronting
the current state 𝑥𝑡
• {𝑤𝑡 } is an uncorrelated zero mean shock process satisfying E𝑤𝑡 𝑤𝑡′ = 𝐼, where the
right-hand side is the identity matrix

Regarding the dimensions

• 𝑥𝑡 is 𝑛 × 1, 𝐴 is 𝑛 × 𝑛

• 𝑢𝑡 is 𝑘 × 1, 𝐵 is 𝑛 × 𝑘
• 𝑤𝑡 is 𝑗 × 1, 𝐶 is 𝑛 × 𝑗

Example 1
Consider a household budget constraint given by

𝑎𝑡+1 + 𝑐𝑡 = (1 + 𝑟)𝑎𝑡 + 𝑦𝑡

Here 𝑎𝑡 is assets, 𝑟 is a fixed interest rate, 𝑐𝑡 is current consumption, and 𝑦𝑡 is current non-
financial income
If we suppose that {𝑦𝑡 } is serially uncorrelated and 𝑁 (0, 𝜎²), then, taking {𝑤𝑡 } to be
standard normal, we can write the system as

𝑎𝑡+1 = (1 + 𝑟)𝑎𝑡 − 𝑐𝑡 + 𝜎𝑤𝑡+1

This is clearly a special case of Eq. (1), with assets being the state and consumption being
the control
Example 2
One unrealistic feature of the previous model is that non-financial income has a zero mean
and is often negative
This can easily be overcome by adding a sufficiently large mean
Hence in this example, we take 𝑦𝑡 = 𝜎𝑤𝑡+1 + 𝜇 for some positive real number 𝜇
Another alteration that’s useful to introduce (we’ll see why soon) is to change the control
variable from consumption to the deviation of consumption from some “ideal” quantity 𝑐̄
(Most parameterizations will be such that 𝑐̄ is large relative to the amount of consumption
that is attainable in each period, and hence the household wants to increase consumption)
For this reason, we now take our control to be 𝑢𝑡 ∶= 𝑐𝑡 − 𝑐̄
In terms of these variables, the budget constraint 𝑎𝑡+1 = (1 + 𝑟)𝑎𝑡 − 𝑐𝑡 + 𝑦𝑡 becomes

$$a_{t+1} = (1 + r) a_t - u_t - \bar{c} + \sigma w_{t+1} + \mu \tag{2}$$

How can we write this new system in the form of equation Eq. (1)?
If, as in the previous example, we take 𝑎𝑡 as the state, then we run into a problem: the law of
motion contains some constant terms on the right-hand side
This means that we are dealing with an affine function, not a linear one (recall this
discussion)
Fortunately, we can easily circumvent this problem by adding an extra state variable
In particular, if we write

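$$\begin{pmatrix} a_{t+1} \\ 1 \end{pmatrix} = \begin{pmatrix} 1 + r & -\bar{c} + \mu \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a_t \\ 1 \end{pmatrix} + \begin{pmatrix} -1 \\ 0 \end{pmatrix} u_t + \begin{pmatrix} \sigma \\ 0 \end{pmatrix} w_{t+1} \tag{3}$$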

then the first row is equivalent to Eq. (2)



Moreover, the model is now linear and can be written in the form of Eq. (1) by setting

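$$x_t := \begin{pmatrix} a_t \\ 1 \end{pmatrix}, \quad A := \begin{pmatrix} 1 + r & -\bar{c} + \mu \\ 0 & 1 \end{pmatrix}, \quad B := \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \quad C := \begin{pmatrix} \sigma \\ 0 \end{pmatrix} \tag{4}$$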

In effect, we’ve bought ourselves linearity by adding another state
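As a quick numerical sanity check, the sketch below builds 𝐴, 𝐵 and 𝐶 as in Eq. (4) and confirms that one step of the linear law of motion Eq. (1) reproduces the scalar recursion Eq. (2). The parameter values and the particular asset level, control and shock are arbitrary choices made for this illustration.

```python
import numpy as np

# Illustrative values (assumptions for this sketch)
r, c_bar, μ, σ = 0.05, 2.0, 1.0, 0.25

A = np.array([[1 + r, -c_bar + μ],
              [0.0,   1.0]])
B = np.array([[-1.0],
              [0.0]])
C = np.array([[σ],
              [0.0]])

a, u, w = 3.0, -0.5, 0.8          # Arbitrary assets, control and shock
x = np.array([[a], [1.0]])        # Augmented state (a_t, 1)'

x_next = A @ x + B * u + C * w    # Eq. (1) with scalar u_t and w_{t+1}
a_next = (1 + r) * a - u - c_bar + σ * w + μ   # Eq. (2)

print(x_next.ravel(), a_next)
```

The first entry of `x_next` matches `a_next`, and the second entry stays equal to one, which is what lets the constant terms ride along inside a linear system.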

41.3.2 Preferences

In the LQ model, the aim is to minimize a flow of losses, where time-𝑡 loss is given by the
quadratic expression

𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 (5)

Here

• 𝑅 is assumed to be 𝑛 × 𝑛, symmetric and nonnegative definite


• 𝑄 is assumed to be 𝑘 × 𝑘, symmetric and positive definite

Note
In fact, for many economic problems, the definiteness conditions on 𝑅 and 𝑄 can
be relaxed. It is sufficient that certain submatrices of 𝑅 and 𝑄 be nonnegative
definite. See [52] for details

Example 1
A very simple example that satisfies these assumptions is to take 𝑅 and 𝑄 to be identity
matrices so that current loss is

$$x_t' I x_t + u_t' I u_t = \| x_t \|^2 + \| u_t \|^2$$

Thus, for both the state and the control, loss is measured as squared distance from the origin
(In fact, the general case Eq. (5) can also be understood in this way, but with 𝑅 and 𝑄
identifying other – non-Euclidean – notions of “distance” from the zero vector)
Intuitively, we can often think of the state 𝑥𝑡 as representing deviation from a target, such as

• deviation of inflation from some target level


• deviation of a firm’s capital stock from some desired quantity

The aim is to put the state close to the target, while using controls parsimoniously
Example 2
In the household problem studied above, setting 𝑅 = 0 and 𝑄 = 1 yields preferences

$$x_t' R x_t + u_t' Q u_t = u_t^2 = (c_t - \bar{c})^2$$

Under this specification, the household’s current loss is the squared deviation of consumption
from the ideal level 𝑐̄
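To make the household example concrete, here is a minimal sketch that evaluates the loss Eq. (5) with 𝑅 = 0 and 𝑄 = 1 and compares it with the squared consumption gap. The helper `loss`, the value of 𝑐̄ and the chosen state and consumption level are assumptions made for the illustration.

```python
import numpy as np

c_bar = 2.0                 # Illustrative "ideal" consumption (assumption)
R = np.zeros((2, 2))        # No direct loss from the state
Q = np.array([[1.0]])       # Unit weight on the control


def loss(x, u):
    "Time-t loss x'Rx + u'Qu for column vectors x and u."
    return (x.T @ R @ x + u.T @ Q @ u).item()


x = np.array([[3.0], [1.0]])        # State (a_t, 1)'
c = 1.4                             # Some consumption level
u = np.array([[c - c_bar]])         # Control u_t = c_t - c_bar

print(loss(x, u), (c - c_bar)**2)
```

Because 𝑅 is the zero matrix, the asset level in `x` has no effect on the current loss: only the distance of consumption from 𝑐̄ is penalized.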

41.4 Optimality – Finite Horizon

Let’s now be precise about the optimization problem we wish to consider, and look at how to
solve it

41.4.1 The Objective

We will begin with the finite horizon case, with terminal time 𝑇 ∈ N
In this case, the aim is to choose a sequence of controls {𝑢0 , … , 𝑢𝑇 −1 } to minimize the
objective

$$\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (x_t' R x_t + u_t' Q u_t) + \beta^T x_T' R_f x_T \right\} \tag{6}$$

subject to the law of motion Eq. (1) and initial state 𝑥0


The new objects introduced here are 𝛽 and the matrix 𝑅𝑓
The scalar 𝛽 is the discount factor, while 𝑥′ 𝑅𝑓 𝑥 gives terminal loss associated with state 𝑥
Comments:

• We assume 𝑅𝑓 to be 𝑛 × 𝑛, symmetric and nonnegative definite


• We allow 𝛽 = 1, and hence include the undiscounted case
• 𝑥0 may itself be random, in which case we require it to be independent of the shock
sequence 𝑤1 , … , 𝑤𝑇

41.4.2 Information

There’s one constraint we’ve neglected to mention so far, which is that the decision-maker
who solves this LQ problem knows only the present and the past, not the future
To clarify this point, consider the sequence of controls {𝑢0 , … , 𝑢𝑇 −1 }
When choosing these controls, the decision-maker is permitted to take into account the effects
of the shocks {𝑤1 , … , 𝑤𝑇 } on the system
However, it is typically assumed — and will be assumed here — that the time-𝑡 control 𝑢𝑡
can be made with knowledge of past and present shocks only
The fancy measure-theoretic way of saying this is that 𝑢𝑡 must be measurable with respect to
the 𝜎-algebra generated by 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡
This is in fact equivalent to stating that 𝑢𝑡 can be written in the form 𝑢𝑡 =
𝑔𝑡 (𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 ) for some Borel measurable function 𝑔𝑡
(Just about every function that’s useful for applications is Borel measurable, so, for the
purposes of intuition, you can read that last phrase as “for some function 𝑔𝑡 ”)
Now note that 𝑥𝑡 will ultimately depend on the realizations of 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡
In fact, it turns out that 𝑥𝑡 summarizes all the information about these historical shocks that
the decision-maker needs to set controls optimally

More precisely, it can be shown that any optimal control 𝑢𝑡 can always be written as a
function of the current state alone
Hence in what follows we restrict attention to control policies (i.e., functions) of the form
𝑢𝑡 = 𝑔𝑡 (𝑥𝑡 )
Actually, the preceding discussion applies to all standard dynamic programming problems
What’s special about the LQ case is that – as we shall soon see — the optimal 𝑢𝑡 turns out
to be a linear function of 𝑥𝑡

41.4.3 Solution

To solve the finite horizon LQ problem we can use a dynamic programming strategy based on
backward induction that is conceptually similar to the approach adopted in this lecture
For reasons that will soon become clear, we first introduce the notation 𝐽𝑇 (𝑥) = 𝑥′ 𝑅𝑓 𝑥
Now consider the problem of the decision-maker in the second to last period
In particular, let the time be 𝑇 − 1, and suppose that the state is 𝑥𝑇 −1
The decision-maker must trade off current and (discounted) final losses, and hence solves

$$\min_u \{ x_{T-1}' R x_{T-1} + u' Q u + \beta \, \mathbb{E} J_T(A x_{T-1} + B u + C w_T) \}$$

At this stage, it is convenient to define the function

$$J_{T-1}(x) = \min_u \{ x' R x + u' Q u + \beta \, \mathbb{E} J_T(A x + B u + C w_T) \} \tag{7}$$

The function 𝐽𝑇 −1 will be called the 𝑇 − 1 value function, and 𝐽𝑇 −1 (𝑥) can be thought of as
representing total “loss-to-go” from state 𝑥 at time 𝑇 − 1 when the decision-maker behaves
optimally
Now let’s step back to 𝑇 − 2
For a decision-maker at 𝑇 − 2, the value 𝐽𝑇 −1 (𝑥) plays a role analogous to that played by the
terminal loss 𝐽𝑇 (𝑥) = 𝑥′ 𝑅𝑓 𝑥 for the decision-maker at 𝑇 − 1
That is, 𝐽𝑇 −1 (𝑥) summarizes the future loss associated with moving to state 𝑥
The decision-maker chooses her control 𝑢 to trade off current loss against future loss, where

• the next period state is 𝑥𝑇 −1 = 𝐴𝑥𝑇 −2 + 𝐵𝑢 + 𝐶𝑤𝑇 −1 , and hence depends on the choice
of current control
• the “cost” of landing in state 𝑥𝑇 −1 is 𝐽𝑇 −1 (𝑥𝑇 −1 )

Her problem is therefore

$$\min_u \{ x_{T-2}' R x_{T-2} + u' Q u + \beta \, \mathbb{E} J_{T-1}(A x_{T-2} + B u + C w_{T-1}) \}$$

Letting

$$J_{T-2}(x) = \min_u \{ x' R x + u' Q u + \beta \, \mathbb{E} J_{T-1}(A x + B u + C w_{T-1}) \}$$

the pattern for backward induction is now clear


In particular, we define a sequence of value functions {𝐽0 , … , 𝐽𝑇 } via

$$J_{t-1}(x) = \min_u \{ x' R x + u' Q u + \beta \, \mathbb{E} J_t(A x + B u + C w_t) \} \quad \text{and} \quad J_T(x) = x' R_f x$$

The first equality is the Bellman equation from dynamic programming theory specialized to
the finite horizon LQ problem
Now that we have {𝐽0 , … , 𝐽𝑇 }, we can obtain the optimal controls
As a first step, let’s find out what the value functions look like
It turns out that every 𝐽𝑡 has the form 𝐽𝑡 (𝑥) = 𝑥′ 𝑃𝑡 𝑥 + 𝑑𝑡 where 𝑃𝑡 is an 𝑛 × 𝑛 matrix and 𝑑𝑡
is a constant
We can show this by induction, starting from 𝑃𝑇 ∶= 𝑅𝑓 and 𝑑𝑇 = 0
Using this notation, Eq. (7) becomes

$$J_{T-1}(x) = \min_u \{ x' R x + u' Q u + \beta \, \mathbb{E} (A x + B u + C w_T)' P_T (A x + B u + C w_T) \} \tag{8}$$

To obtain the minimizer, we can take the derivative of the r.h.s. with respect to 𝑢 and set it
equal to zero
Applying the relevant rules of matrix calculus, this gives

$$u = -(Q + \beta B' P_T B)^{-1} \beta B' P_T A x \tag{9}$$
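As a sketch of where Eq. (9) comes from (assuming that 𝑃𝑇 is symmetric, and using E𝑤𝑇 = 0 and E𝑤𝑇 𝑤𝑇′ = 𝐼), the expectation in Eq. (8) expands to

$$\beta \, \mathbb{E} (Ax + Bu + Cw_T)' P_T (Ax + Bu + Cw_T) = \beta (Ax + Bu)' P_T (Ax + Bu) + \beta \, \mathrm{trace}(C' P_T C)$$

The cross terms vanish because the shock has zero mean. Differentiating the right-hand side of Eq. (8) with respect to 𝑢 and setting the result to zero then gives

$$2 Q u + 2 \beta B' P_T (Ax + Bu) = 0 \quad \Longleftrightarrow \quad (Q + \beta B' P_T B) u = -\beta B' P_T A x$$

Since 𝑄 is positive definite and 𝑃𝑇 = 𝑅𝑓 is nonnegative definite, the matrix 𝑄 + 𝛽𝐵′ 𝑃𝑇 𝐵 is invertible, and solving for 𝑢 recovers Eq. (9)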

Plugging this back into Eq. (8) and rearranging yields

𝐽𝑇 −1 (𝑥) = 𝑥′ 𝑃𝑇 −1 𝑥 + 𝑑𝑇 −1

where

$$P_{T-1} = R - \beta^2 A' P_T B (Q + \beta B' P_T B)^{-1} B' P_T A + \beta A' P_T A \tag{10}$$

and

$$d_{T-1} := \beta \, \mathrm{trace}(C' P_T C) \tag{11}$$

(The algebra is a good exercise — we’ll leave it up to you)


If we continue working backwards in this manner, it soon becomes clear that 𝐽𝑡 (𝑥) = 𝑥′ 𝑃𝑡 𝑥 +
𝑑𝑡 as claimed, where {𝑃𝑡 } and {𝑑𝑡 } satisfy the recursions

$$P_{t-1} = R - \beta^2 A' P_t B (Q + \beta B' P_t B)^{-1} B' P_t A + \beta A' P_t A \quad \text{with} \quad P_T = R_f \tag{12}$$

and

$$d_{t-1} = \beta (d_t + \mathrm{trace}(C' P_t C)) \quad \text{with} \quad d_T = 0 \tag{13}$$



Recalling Eq. (9), the minimizers from these backward steps are

$$u_t = -F_t x_t \quad \text{where} \quad F_t := (Q + \beta B' P_{t+1} B)^{-1} \beta B' P_{t+1} A \tag{14}$$

These are the linear optimal control policies we discussed above


In particular, the sequence of controls given by Eq. (14) and Eq. (1) solves our finite horizon
LQ problem
Rephrasing this more precisely, the sequence 𝑢0 , … , 𝑢𝑇 −1 given by

𝑢𝑡 = −𝐹𝑡 𝑥𝑡 with 𝑥𝑡+1 = (𝐴 − 𝐵𝐹𝑡 )𝑥𝑡 + 𝐶𝑤𝑡+1 (15)

for 𝑡 = 0, … , 𝑇 − 1 attains the minimum of Eq. (6) subject to our constraints
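The backward recursions Eq. (12), Eq. (13) and Eq. (14) translate almost line by line into NumPy. The sketch below is a minimal illustration, not the QuantEcon implementation: the function name `backward_recursion` and its return convention are hypothetical, and the parameter values (a small household savings problem with state (𝑎𝑡 , 1)′) are chosen only as an example.

```python
import numpy as np


def backward_recursion(Q, R, A, B, C, Rf, β, T):
    """
    Iterate Eq. (12), Eq. (13) and Eq. (14) backwards from P_T = Rf, d_T = 0.

    All arguments except β and T are 2D NumPy arrays.
    Returns lists P (length T+1), d (length T+1) and F (length T).
    """
    P, d, F = [None] * (T + 1), [0.0] * (T + 1), [None] * T
    P[T] = Rf
    for t in range(T, 0, -1):
        S = Q + β * B.T @ P[t] @ B                    # Q + β B' P_t B
        G = np.linalg.solve(S, B.T @ P[t] @ A)        # S^{-1} B' P_t A
        F[t - 1] = β * G                              # Eq. (14)
        P[t - 1] = R + β * A.T @ P[t] @ A - β**2 * A.T @ P[t] @ B @ G   # Eq. (12)
        d[t - 1] = β * (d[t] + np.trace(C.T @ P[t] @ C))                # Eq. (13)
    return P, d, F


# Illustrative household parameterization (values are assumptions of this sketch)
r, c_bar, μ, σ, q, T = 0.05, 2.0, 1.0, 0.25, 1e6, 45
β = 1 / (1 + r)
Q = np.array([[1.0]])
R = np.zeros((2, 2))
Rf = np.array([[q, 0.0], [0.0, 0.0]])
A = np.array([[1 + r, -c_bar + μ], [0.0, 1.0]])
B = np.array([[-1.0], [0.0]])
C = np.array([[σ], [0.0]])

P, d, F = backward_recursion(Q, R, A, B, C, Rf, β, T)
print(F[0])   # Time-0 feedback rule: u_0 = -F[0] @ x_0
```

Using `np.linalg.solve` instead of forming the inverse of 𝑄 + 𝛽𝐵′ 𝑃𝑡 𝐵 explicitly is a design choice for numerical stability; the result is the same object that appears in Eq. (12) and Eq. (14).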

41.5 Implementation

We will use code from lqcontrol.py in QuantEcon.py to solve finite and infinite horizon linear
quadratic control problems
In the module, the various updating, simulation and fixed point methods are wrapped in a
class called LQ, which includes

• Instance data:

– The required parameters 𝑄, 𝑅, 𝐴, 𝐵 and optional parameters C, β, T, R_f, N specifying
a given LQ model
* set 𝑇 and 𝑅𝑓 to None in the infinite horizon case
* set C = None (or zero) in the deterministic case
– the value function and policy data
* 𝑑𝑡 , 𝑃𝑡 , 𝐹𝑡 in the finite horizon case
* 𝑑, 𝑃 , 𝐹 in the infinite horizon case

• Methods:

– update_values: shifts 𝑑𝑡 , 𝑃𝑡 , 𝐹𝑡 to their 𝑡 − 1 values via Eq. (12), Eq. (13) and
Eq. (14)
– stationary_values: computes 𝑃 , 𝑑, 𝐹 in the infinite horizon case
– compute_sequence: simulates the dynamics of 𝑥𝑡 , 𝑢𝑡 , 𝑤𝑡 given 𝑥0 and assuming
standard normal shocks

41.5.1 An Application

Early Keynesian models assumed that households have a constant marginal propensity to
consume from current income
Data contradicted the constancy of the marginal propensity to consume
In response, Milton Friedman, Franco Modigliani and others built models based on a
consumer’s preference for an intertemporally smooth consumption stream
(See, for example, [43] or [97])


One property of those models is that households purchase and sell financial assets to make
consumption streams smoother than income streams
The household savings problem outlined above captures these ideas
The optimization problem for the household is to choose a consumption sequence in order to
minimize

$$\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (c_t - \bar{c})^2 + \beta^T q a_T^2 \right\} \tag{16}$$

subject to the sequence of budget constraints 𝑎𝑡+1 = (1 + 𝑟)𝑎𝑡 − 𝑐𝑡 + 𝑦𝑡 , 𝑡 ≥ 0


Here 𝑞 is a large positive constant, the role of which is to induce the consumer to target zero
debt at the end of her life
(Without such a constraint, the optimal choice is to choose 𝑐𝑡 = 𝑐̄ in each period, letting
assets adjust accordingly)
As before we set 𝑦𝑡 = 𝜎𝑤𝑡+1 + 𝜇 and 𝑢𝑡 ∶= 𝑐𝑡 − 𝑐̄, after which the constraint can be written as
in Eq. (2)
We saw how this constraint could be manipulated into the LQ formulation 𝑥𝑡+1 = 𝐴𝑥𝑡 +
𝐵𝑢𝑡 + 𝐶𝑤𝑡+1 by setting 𝑥𝑡 = (𝑎𝑡 1)′ and using the definitions in Eq. (4)
To match with this state and control, the objective function Eq. (16) can be written in the
form of Eq. (6) by choosing

$$Q := 1, \quad R := \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad \text{and} \quad R_f := \begin{pmatrix} q & 0 \\ 0 & 0 \end{pmatrix}$$

Now that the problem is expressed in LQ form, we can proceed to the solution by applying
Eq. (12) and Eq. (14)
After generating shocks 𝑤1 , … , 𝑤𝑇 , the dynamics for assets and consumption can be
simulated via Eq. (15)
The following figure was computed using 𝑟 = 0.05, 𝛽 = 1/(1 + 𝑟), 𝑐̄ = 2, 𝜇 = 1, 𝜎 = 0.25, 𝑇 = 45
and 𝑞 = 10⁶
The shocks {𝑤𝑡 } were taken to be IID and standard normal

In [2]: import numpy as np


import matplotlib.pyplot as plt
from quantecon import LQ

# == Model parameters == #
r = 0.05
β = 1/(1 + r)
T = 45
c_bar = 2
σ = 0.25
μ = 1
q = 1e6

# == Formulate as an LQ problem == #
Q = 1
R = np.zeros((2, 2))
Rf = np.zeros((2, 2))
Rf[0, 0] = q

A = [[1 + r, -c_bar + μ],


[0, 1]]
B = [[-1],
[ 0]]
C = [[σ],
[0]]

# == Compute solutions and simulate == #


lq = LQ(Q, R, A, B, C, beta=β, T=T, Rf=Rf)
x0 = (0, 1)
xp, up, wp = lq.compute_sequence(x0)

# == Convert back to assets, consumption and income == #


assets = xp[0, :] # a_t
c = up.flatten() + c_bar # c_t
income = σ * wp[0, 1:] + μ # y_t

# == Plot results == #
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))

plt.subplots_adjust(hspace=0.5)

bbox = (0., 1.02, 1., .102)


legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}

axes[0].plot(list(range(1, T+1)), income, 'g-', label="non-financial income",


**p_args)
axes[0].plot(list(range(T)), c, 'k-', label="consumption", **p_args)

axes[1].plot(list(range(1, T+1)), np.cumsum(income - μ), 'r-',


label="cumulative unanticipated income", **p_args)
axes[1].plot(list(range(T+1)), assets, 'b-', label="assets", **p_args)
axes[1].plot(list(range(T)), np.zeros(T), 'k-')

for ax in axes:
    ax.grid()
    ax.set_xlabel('Time')
    ax.legend(ncol=2, **legend_args)

plt.show()

<Figure size 1200x1000 with 2 Axes>

The top panel shows the time path of consumption 𝑐𝑡 and income 𝑦𝑡 in the simulation
As anticipated by the discussion on consumption smoothing, the time path of consumption is
much smoother than that for income
(But note that consumption becomes more irregular towards the end of life, when the zero
final asset requirement impinges more on consumption choices)
The second panel in the figure shows that the time path of assets 𝑎𝑡 is closely correlated with
cumulative unanticipated income, where the latter is defined as

$$z_t := \sum_{j=0}^{t} \sigma w_j$$

A key message is that unanticipated windfall gains are saved rather than consumed, while
unanticipated negative shocks are met by reducing assets
(Again, this relationship breaks down towards the end of life due to the zero final asset re-
quirement)

These results are relatively robust to changes in parameters


For example, let’s increase 𝛽 from 1/(1 + 𝑟) ≈ 0.952 to 0.96 while keeping other parameters
fixed
This consumer is slightly more patient than the last one, and hence puts relatively more
weight on later consumption values

In [3]: # == Compute solutions and simulate == #


lq = LQ(Q, R, A, B, C, beta=0.96, T=T, Rf=Rf)
x0 = (0, 1)
xp, up, wp = lq.compute_sequence(x0)

# == Convert back to assets, consumption and income == #


assets = xp[0, :] # a_t
c = up.flatten() + c_bar # c_t
income = σ * wp[0, 1:] + μ # y_t

# == Plot results == #
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))

plt.subplots_adjust(hspace=0.5)

bbox = (0., 1.02, 1., .102)


legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}

axes[0].plot(list(range(1, T+1)), income, 'g-', label="non-financial income",
             **p_args)
axes[0].plot(list(range(T)), c, 'k-', label="consumption", **p_args)

axes[1].plot(list(range(1, T+1)), np.cumsum(income - μ), 'r-',
             label="cumulative unanticipated income", **p_args)
axes[1].plot(list(range(T+1)), assets, 'b-', label="assets", **p_args)
axes[1].plot(list(range(T)), np.zeros(T), 'k-')

for ax in axes:
    ax.grid()
    ax.set_xlabel('Time')
    ax.legend(ncol=2, **legend_args)

plt.show()

We now have a slowly rising consumption stream and a hump-shaped build-up of assets in the
middle periods to fund rising consumption
However, the essential features are the same: consumption is smooth relative to income, and
assets are strongly positively correlated with cumulative unanticipated income

41.6 Extensions and Comments

Let’s now consider a number of standard extensions to the LQ problem treated above

41.6.1 Time-Varying Parameters

In some settings, it can be desirable to allow 𝐴, 𝐵, 𝐶, 𝑅 and 𝑄 to depend on 𝑡


For the sake of simplicity, we’ve chosen not to treat this extension in our implementation
given below
However, the loss of generality is not as large as you might first imagine
In fact, we can tackle many models with time-varying parameters by suitable choice of state
variables
One illustration is given below

For further examples and a more systematic treatment, see [53], section 2.4

41.6.2 Adding a Cross-Product Term

In some LQ problems, preferences include a cross-product term 𝑢′𝑡 𝑁 𝑥𝑡 , so that the objective
function becomes

$$
\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (x_t' R x_t + u_t' Q u_t + 2 u_t' N x_t) + \beta^T x_T' R_f x_T \right\} \tag{17}
$$

Our results extend to this case in a straightforward way


The sequence {𝑃𝑡 } from Eq. (12) becomes

$$
P_{t-1} = R - (\beta B' P_t A + N)' (Q + \beta B' P_t B)^{-1} (\beta B' P_t A + N) + \beta A' P_t A
\quad \text{with } P_T = R_f \tag{18}
$$

The policies in Eq. (14) are modified to

$$
u_t = -F_t x_t \quad \text{where } F_t := (Q + \beta B' P_{t+1} B)^{-1} (\beta B' P_{t+1} A + N) \tag{19}
$$

The sequence {𝑑𝑡 } is unchanged from Eq. (13)


We leave interested readers to confirm these results (the calculations are long but not overly
difficult)
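
As a rough illustration of Eq. (18) and Eq. (19), the following NumPy sketch runs the backward recursion with a cross-product term. The function names `riccati_step` and `policy` and all numerical values are hypothetical choices made for this example; they are not taken from the lecture's own implementation.

```python
import numpy as np

def riccati_step(P_next, A, B, Q, R, N, beta):
    """One backward step of Eq. (18): map P_t into P_{t-1}."""
    S = beta * B.T @ P_next @ A + N          # βB'PA + N
    M = Q + beta * B.T @ P_next @ B          # Q + βB'PB
    return R - S.T @ np.linalg.solve(M, S) + beta * A.T @ P_next @ A

def policy(P_next, A, B, Q, N, beta):
    """Feedback matrix F_t of Eq. (19), computed from P_{t+1}."""
    S = beta * B.T @ P_next @ A + N
    M = Q + beta * B.T @ P_next @ B
    return np.linalg.solve(M, S)

# Hypothetical example: two states, one control
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.array([[1.0]])
R = np.eye(2)
N = np.array([[0.1, 0.0]])     # shape (controls, states)
Rf = np.eye(2)
beta, T = 0.95, 10

P = Rf                         # start from P_T = R_f
F_seq = []
for _ in range(T):
    F_seq.append(policy(P, A, B, Q, N, beta))    # F_{T-1}, F_{T-2}, ...
    P = riccati_step(P, A, B, Q, R, N, beta)
F_seq.reverse()                # now F_seq[t] is F_t
P0 = P                         # value matrix at t = 0
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a numerical convenience only; the recursion is otherwise a direct transcription of the two equations.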

41.6.3 Infinite Horizon

Finally, we consider the infinite horizon case, with cross-product term, unchanged dynamics
and objective function given by


$$
\mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t (x_t' R x_t + u_t' Q u_t + 2 u_t' N x_t) \right\} \tag{20}
$$

In the infinite horizon case, optimal policies can depend on time only if time itself is a compo-
nent of the state vector 𝑥𝑡
In other words, there exists a fixed matrix 𝐹 such that 𝑢𝑡 = −𝐹 𝑥𝑡 for all 𝑡
That decision rules are constant over time is intuitive — after all, the decision-maker faces
the same infinite horizon at every stage, with only the current state changing
Not surprisingly, 𝑃 and 𝑑 are also constant
The stationary matrix 𝑃 is the solution to the discrete-time algebraic Riccati equation

$$
P = R - (\beta B' P A + N)' (Q + \beta B' P B)^{-1} (\beta B' P A + N) + \beta A' P A \tag{21}
$$

Eq. (21) is also called the LQ Bellman equation, and the map that sends a given $P$
into the right-hand side of Eq. (21) is called the LQ Bellman operator
The stationary optimal policy for this model is

$$
u = -F x \quad \text{where } F = (Q + \beta B' P B)^{-1} (\beta B' P A + N) \tag{22}
$$

The sequence {𝑑𝑡 } from Eq. (13) is replaced by the constant value

$$
d := \frac{\beta}{1 - \beta} \operatorname{trace}(C' P C) \tag{23}
$$

The state evolves according to the time-homogeneous process 𝑥𝑡+1 = (𝐴 − 𝐵𝐹 )𝑥𝑡 + 𝐶𝑤𝑡+1
An example infinite horizon problem is treated below
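
One simple (though not the fastest) way to approximate the stationary $P$ in Eq. (21) is to iterate the LQ Bellman operator until successive iterates stop changing. The sketch below does this with NumPy and then evaluates Eq. (22) and Eq. (23). The helper names, the tolerance, and the example matrices are assumptions made for illustration.

```python
import numpy as np

def bellman_operator(P, A, B, Q, R, N, beta):
    """Right-hand side of Eq. (21) evaluated at P."""
    S = beta * B.T @ P @ A + N
    M = Q + beta * B.T @ P @ B
    return R - S.T @ np.linalg.solve(M, S) + beta * A.T @ P @ A

def solve_stationary(A, B, C, Q, R, N, beta, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman operator to an approximate fixed point,
    then compute F from Eq. (22) and d from Eq. (23)."""
    P = np.zeros_like(R)
    for _ in range(max_iter):
        P_new = bellman_operator(P, A, B, Q, R, N, beta)
        converged = np.max(np.abs(P_new - P)) < tol
        P = P_new
        if converged:
            break
    else:
        raise RuntimeError("Bellman iteration did not converge")
    F = np.linalg.solve(Q + beta * B.T @ P @ B, beta * B.T @ P @ A + N)
    d = beta / (1 - beta) * np.trace(C.T @ P @ C)
    return P, F, d

# Hypothetical example: two states, one control, scalar shock
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[0.0],
              [0.2]])
Q = np.array([[1.0]])
R = np.eye(2)
N = np.zeros((1, 2))
beta = 0.95

P, F, d = solve_stationary(A, B, C, Q, R, N, beta)
```

In this sketch the shock loading `C` enters only through the constant `d`, so rescaling `C` leaves `F` unchanged.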

41.6.4 Certainty Equivalence

Linear quadratic control problems of the class discussed above have the property of certainty
equivalence
By this, we mean that the optimal policy 𝐹 is not affected by the parameters in 𝐶, which
specify the shock process
This can be confirmed by inspecting Eq. (22) or Eq. (19)
It follows that we can ignore uncertainty when solving for optimal behavior, and plug it back
in when examining optimal state dynamics

41.7 Further Applications

41.7.1 Application 1: Age-Dependent Income Process

Previously we studied a permanent income model that generated consumption smoothing


One unrealistic feature of that model is the assumption that the mean of the random income
process does not depend on the consumer’s age
A more realistic income profile is one that rises in early working life, peaks towards the
middle, perhaps declines toward the end of working life, and falls further during retirement
In this section, we will model this rise and fall as a symmetric inverted “U” using a polyno-
mial in age
As before, the consumer seeks to minimize

$$
\mathbb{E} \left\{ \sum_{t=0}^{T-1} \beta^t (c_t - \bar c)^2 + \beta^T q a_T^2 \right\} \tag{24}
$$

subject to 𝑎𝑡+1 = (1 + 𝑟)𝑎𝑡 − 𝑐𝑡 + 𝑦𝑡 , 𝑡 ≥ 0


For income we now take $y_t = p(t) + \sigma w_{t+1}$ where $p(t) := m_0 + m_1 t + m_2 t^2$
(In the next section we employ some tricks to implement a more sophisticated model)
The coefficients $m_0, m_1, m_2$ are chosen such that $p(0) = 0$, $p(T/2) = \mu$, and $p(T) = 0$
You can confirm that the specification $m_0 = 0$, $m_1 = T\mu/(T/2)^2$, $m_2 = -\mu/(T/2)^2$ satisfies
these constraints
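
A quick numerical check of these three constraints can be sketched as follows. The values of `T` and `μ` are illustrative assumptions; any positive values should behave the same way.

```python
# Illustrative values (assumed for this check)
T, μ = 50, 2.0

m0 = 0.0
m1 = T * μ / (T / 2)**2
m2 = -μ / (T / 2)**2

def p(t):
    """Quadratic mean-income profile p(t) = m0 + m1 t + m2 t^2."""
    return m0 + m1 * t + m2 * t**2

# p should vanish at both ends and equal μ at the midpoint
print(p(0), p(T / 2), p(T))
```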

To put this into an LQ setting, consider the budget constraint, which becomes

$$
a_{t+1} = (1 + r) a_t - u_t - \bar c + m_1 t + m_2 t^2 + \sigma w_{t+1} \tag{25}
$$

The fact that $a_{t+1}$ is a linear function of $(a_t, 1, t, t^2)$ suggests taking these four variables as
the state vector $x_t$
Once a good choice of state and control (recall $u_t = c_t - \bar c$) has been made, the remaining
specifications fall into place relatively easily
Thus, for the dynamics we set

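$$
x_t := \begin{pmatrix} a_t \\ 1 \\ t \\ t^2 \end{pmatrix}, \quad
A := \begin{pmatrix}
1 + r & -\bar c & m_1 & m_2 \\
0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 \\
0 & 1 & 2 & 1
\end{pmatrix}, \quad
B := \begin{pmatrix} -1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad
C := \begin{pmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{26}
$$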

If you expand the expression 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡 + 𝐶𝑤𝑡+1 using this specification, you will find
that assets follow Eq. (25) as desired and that the other state variables also update appropri-
ately
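
The expansion can also be checked numerically. The sketch below builds the matrices in Eq. (26) for some assumed parameter values, applies one step of the law of motion to a state $(a_t, 1, t, t^2)$, and compares the result with Eq. (25) and with $(1, t+1, (t+1)^2)$. All numbers are hypothetical.

```python
import numpy as np

# Assumed parameter values for the check
r, c_bar, σ = 0.05, 1.5, 0.15
m1, m2 = 0.16, -0.0032

A = np.array([[1 + r, -c_bar, m1, m2],
              [0,      1,     0,  0],
              [0,      1,     1,  0],
              [0,      1,     2,  1]])
B = np.array([[-1.0], [0.0], [0.0], [0.0]])
C = np.array([[σ], [0.0], [0.0], [0.0]])

def step(x, u, w):
    """One step of x_{t+1} = A x_t + B u_t + C w_{t+1} for scalar u and w."""
    return A @ x + B.flatten() * u + C.flatten() * w

# An arbitrary date, asset level, control and shock
t, a, u, w = 7, 3.0, 0.4, -1.2
x = np.array([a, 1.0, t, t**2])
x_next = step(x, u, w)

# Assets according to Eq. (25)
a_next = (1 + r) * a - u - c_bar + m1 * t + m2 * t**2 + σ * w
expected = np.array([a_next, 1.0, t + 1, (t + 1)**2])
print(np.allclose(x_next, expected))
```

The third and fourth rows of `A` work because $t + 1 = 1 \cdot 1 + 1 \cdot t$ and $(t+1)^2 = 1 + 2t + t^2$.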
To implement preference specification Eq. (24) we take

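$$
Q := 1, \quad
R := \begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\quad \text{and} \quad
R_f := \begin{pmatrix}
q & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix} \tag{27}
$$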

The next figure shows a simulation of consumption and assets computed using the
compute_sequence method of lqcontrol.py with initial assets set to zero

Once again, smooth consumption is a dominant feature of the sample paths


The asset path exhibits dynamics consistent with standard life cycle theory
Exercise 1 gives the full set of parameters used here and asks you to replicate the figure

41.7.2 Application 2: A Permanent Income Model with Retirement

In the previous application, we generated income dynamics with an inverted U shape using
polynomials and placed them in an LQ framework
It is arguably the case that this income process still contains unrealistic features
A more common earning profile is where

1. income grows over working life, fluctuating around an increasing trend, with growth
flattening off in later years
2. retirement follows, with lower but relatively stable (non-financial) income

Letting 𝐾 be the retirement date, we can express these income dynamics by

$$
y_t =
\begin{cases}
p(t) + \sigma w_{t+1} & \text{if } t \leq K \\
s & \text{otherwise}
\end{cases} \tag{28}
$$

Here

• $p(t) := m_1 t + m_2 t^2$ with the coefficients $m_1, m_2$ chosen such that $p(K) = \mu$ and $p(0) = p(2K) = 0$
• $s$ is retirement income

We suppose that preferences are unchanged and given by Eq. (16)


The budget constraint is also unchanged and given by 𝑎𝑡+1 = (1 + 𝑟)𝑎𝑡 − 𝑐𝑡 + 𝑦𝑡
Our aim is to solve this problem and simulate paths using the LQ techniques described in this
lecture
In fact, this is a nontrivial problem, as the kink in the dynamics Eq. (28) at 𝐾 makes it very
difficult to express the law of motion as a fixed-coefficient linear system
However, we can still use our LQ methods here by suitably linking two component LQ problems
These two LQ problems describe the consumer’s behavior during her working life
(lq_working) and retirement (lq_retired)
(This is possible because, in the two separate periods of life, the respective income processes
[polynomial trend and constant] each fit the LQ framework)
The basic idea is that although the whole problem is not a single time-invariant LQ problem,
it is still a dynamic programming problem, and hence we can use appropriate Bellman equa-
tions at every stage
Based on this logic, we can

1. solve lq_retired by the usual backward induction procedure, iterating back to the
start of retirement
2. take the start-of-retirement value function generated by this process, and use it as the
terminal condition 𝑅𝑓 to feed into the lq_working specification
3. solve lq_working by backward induction from this choice of 𝑅𝑓 , iterating back to the
start of working life

This process gives the entire life-time sequence of value functions and optimal policies
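
The linking step can be sketched with plain NumPy, independently of the LQ class used in this lecture. In the sketch below, `backward_induction` is a hypothetical helper that applies the finite-horizon Riccati recursion; to keep the example small, the state is $(a_t, 1)$ and working-life income has a constant mean `μ` rather than the polynomial trend, which is a simplifying assumption. The constant terms $d_t$ are omitted because they do not affect policies.

```python
import numpy as np

def backward_induction(A, B, Q, R, Rf, beta, T):
    """Apply the Riccati recursion T times from terminal matrix Rf
    and return the value matrix at the start of the stage."""
    P = Rf
    for _ in range(T):
        S = beta * B.T @ P @ A
        M = Q + beta * B.T @ P @ B
        P = R - S.T @ np.linalg.solve(M, S) + beta * A.T @ P @ A
    return P

# Assumed parameters (two-state simplification of the retirement model)
r, c_bar, μ, s, q = 0.05, 4.0, 4.0, 1.0, 1e4
β = 1 / (1 + r)
K, T = 40, 60

B = np.array([[-1.0], [0.0]])
Q = np.array([[1.0]])
R = np.zeros((2, 2))
Rf = np.array([[q, 0.0],
               [0.0, 0.0]])

A_retired = np.array([[1 + r, s - c_bar],
                      [0.0,   1.0]])
A_working = np.array([[1 + r, μ - c_bar],
                      [0.0,   1.0]])

# Step 1: solve the retirement stage back to the retirement date
P_retire = backward_induction(A_retired, B, Q, R, Rf, β, T - K)

# Steps 2 and 3: use it as the terminal condition for working life
P_work = backward_induction(A_working, B, Q, R, P_retire, β, K)
```

A useful sanity check on this construction is that, when both stages share the same dynamics, chaining `T - K` and `K` steps gives the same matrix as a single pass of `T` steps.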

The next figure shows one simulation based on this procedure

The full set of parameters used in the simulation is discussed in Exercise 2, where you are
asked to replicate the figure
Once again, the dominant feature observable in the simulation is consumption smoothing
The asset path fits well with standard life cycle theory, with dissaving early in life followed by
later saving
Assets peak at retirement and subsequently decline

41.7.3 Application 3: Monopoly with Adjustment Costs

Consider a monopolist facing a stochastic inverse demand function

𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡 + 𝑑𝑡

Here 𝑞𝑡 is output, and the demand shock 𝑑𝑡 follows

𝑑𝑡+1 = 𝜌𝑑𝑡 + 𝜎𝑤𝑡+1

where {𝑤𝑡 } is IID and standard normal


The monopolist maximizes the expected discounted sum of present and future profits


$$
\mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t \pi_t \right\}
\quad \text{where} \quad
\pi_t := p_t q_t - c q_t - \gamma (q_{t+1} - q_t)^2 \tag{29}
$$

Here

• $\gamma (q_{t+1} - q_t)^2$ represents adjustment costs


• 𝑐 is average cost of production

This can be formulated as an LQ problem and then solved and simulated, but first let’s study
the problem and try to get some intuition
One way to start thinking about the problem is to consider what would happen if 𝛾 = 0
Without adjustment costs there is no intertemporal trade-off, so the monopolist will choose
output to maximize current profit in each period
It’s not difficult to show that profit-maximizing output is

$$
\bar q_t := \frac{a_0 - c + d_t}{2 a_1}
$$

In light of this discussion, what we might expect for general 𝛾 is that

• if $\gamma$ is close to zero, then $q_t$ will track the time path of $\bar q_t$ relatively closely
• if $\gamma$ is larger, then $q_t$ will be smoother than $\bar q_t$, as the monopolist seeks to avoid adjustment costs

This intuition turns out to be correct


The following figures show simulations produced by solving the corresponding LQ problem
The only difference in parameters across the figures is the size of 𝛾

To produce these figures we converted the monopolist problem into an LQ problem


The key to this conversion is to choose the right state — which can be a bit of an art
Here we take $x_t = (\bar q_t \;\, q_t \;\, 1)'$, while the control is chosen as $u_t = q_{t+1} - q_t$
We also manipulated the profit function slightly
In Eq. (29), current profits are $\pi_t := p_t q_t - c q_t - \gamma (q_{t+1} - q_t)^2$
Let’s now replace $\pi_t$ in Eq. (29) with $\hat \pi_t := \pi_t - a_1 \bar q_t^2$
This makes no difference to the solution, since $a_1 \bar q_t^2$ does not depend on the controls
(In fact, we are just adding a constant term to Eq. (29), and optimizers are not affected by
constant terms)

The reason for making this substitution is that, as you will be able to verify, $\hat \pi_t$ reduces to the
simple quadratic

$$
\hat \pi_t = -a_1 (q_t - \bar q_t)^2 - \gamma u_t^2
$$
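
The reduction can be confirmed numerically before doing the algebra. The following sketch draws arbitrary values for output and the demand shock, computes $\hat \pi_t$ from its definition, and compares it with the quadratic form. The parameter values and the random draws are assumptions made only for this check.

```python
import numpy as np

# Assumed parameter values
a0, a1, c, γ = 5.0, 0.5, 2.0, 10.0

# Arbitrary test points for q_t, q_{t+1} and d_t
rng = np.random.default_rng(0)
q = rng.uniform(0, 5, size=1000)
q_next = rng.uniform(0, 5, size=1000)
d = rng.normal(size=1000)

q_bar = (a0 - c + d) / (2 * a1)      # static profit-maximizing output
u = q_next - q                       # control
p = a0 - a1 * q + d                  # inverse demand

π = p * q - c * q - γ * u**2         # current profit from Eq. (29)
π_hat = π - a1 * q_bar**2            # shifted profit
quadratic = -a1 * (q - q_bar)**2 - γ * u**2

print(np.allclose(π_hat, quadratic))
```

The key step in the algebra is that $a_0 - c + d_t = 2 a_1 \bar q_t$, so that $p_t q_t - c q_t = 2 a_1 \bar q_t q_t - a_1 q_t^2$.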

After negation to convert to a minimization problem, the objective becomes


$$
\min \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left\{ a_1 (q_t - \bar q_t)^2 + \gamma u_t^2 \right\} \tag{30}
$$

It’s now relatively straightforward to find 𝑅 and 𝑄 such that Eq. (30) can be written as
Eq. (20)
Furthermore, the matrices 𝐴, 𝐵 and 𝐶 from Eq. (1) can be found by writing down the dy-
namics of each element of the state
Exercise 3 asks you to complete this process, and reproduce the preceding figures

41.8 Exercises

41.8.1 Exercise 1

Replicate the figure with polynomial income shown above


The parameters are $r = 0.05$, $\beta = 1/(1 + r)$, $\bar c = 1.5$, $\mu = 2$, $\sigma = 0.15$, $T = 50$ and $q = 10^4$

41.8.2 Exercise 2

Replicate the figure on work and retirement shown above


The parameters are $r = 0.05$, $\beta = 1/(1 + r)$, $\bar c = 4$, $\mu = 4$, $\sigma = 0.35$, $K = 40$, $T = 60$, $s = 1$ and
$q = 10^4$
To understand the overall procedure, carefully read the section containing that figure
Some hints are as follows:
First, in order to make our approach work, we must ensure that both LQ problems have the
same state variables and control
As with previous applications, the control can be set to 𝑢𝑡 = 𝑐𝑡 − 𝑐 ̄
For lq_working, 𝑥𝑡 , 𝐴, 𝐵, 𝐶 can be chosen as in Eq. (26)

• Recall that 𝑚1 , 𝑚2 are chosen so that 𝑝(𝐾) = 𝜇 and 𝑝(2𝐾) = 0

For lq_retired, use the same definition of 𝑥𝑡 and 𝑢𝑡 , but modify 𝐴, 𝐵, 𝐶 to correspond to
constant income 𝑦𝑡 = 𝑠
For lq_retired, set preferences as in Eq. (27)
For lq_working, preferences are the same, except that 𝑅𝑓 should be replaced by the final
value function that emerges from iterating lq_retired back to the start of retirement

With some careful footwork, the simulation can be generated by patching together the simu-
lations from these two separate models

41.8.3 Exercise 3

Reproduce the figures from the monopolist application given above


For parameters, use 𝑎0 = 5, 𝑎1 = 0.5, 𝜎 = 0.15, 𝜌 = 0.9, 𝛽 = 0.95 and 𝑐 = 2, while 𝛾 varies
between 1 and 50 (see figures)

41.9 Solutions

41.9.1 Exercise 1

Here’s one solution


We use some fancy plot commands to get a certain style — feel free to use simpler ones
The model is an LQ permanent income / life-cycle model with hump-shaped income

$$
y_t = m_1 t + m_2 t^2 + \sigma w_{t+1}
$$

where $\{w_t\}$ is IID $N(0, 1)$ and the coefficients $m_1$ and $m_2$ are chosen so that $p(t) = m_1 t + m_2 t^2$ has an inverted U shape with

• $p(0) = 0$, $p(T/2) = \mu$, and
• $p(T) = 0$

In [4]: # == Model parameters == #


r = 0.05
β = 1/(1 + r)
T = 50
c_bar = 1.5
σ = 0.15
μ = 2
q = 1e4
m1 = T * (μ/(T/2)**2)
m2 = -(μ/(T/2)**2)

# == Formulate as an LQ problem == #
Q = 1
R = np.zeros((4, 4))
Rf = np.zeros((4, 4))
Rf[0, 0] = q
A = [[1 + r, -c_bar, m1, m2],
[0, 1, 0, 0],
[0, 1, 1, 0],
[0, 1, 2, 1]]
B = [[-1],
[ 0],
[ 0],
[ 0]]
C = [[σ],
[0],
[0],
[0]]

# == Compute solutions and simulate == #


lq = LQ(Q, R, A, B, C, beta=β, T=T, Rf=Rf)
x0 = (0, 1, 0, 0)
xp, up, wp = lq.compute_sequence(x0)

# == Convert results back to assets, consumption and income == #


ap = xp[0, :] # Assets
c = up.flatten() + c_bar # Consumption
time = np.arange(1, T+1)
income = σ * wp[0, 1:] + m1 * time + m2 * time**2 # Income

# == Plot results == #
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))

plt.subplots_adjust(hspace=0.5)

bbox = (0., 1.02, 1., .102)


legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}

axes[0].plot(range(1, T+1), income, 'g-', label="non-financial income", **p_args)


axes[0].plot(range(T), c, 'k-', label="consumption", **p_args)

axes[1].plot(range(T+1), ap.flatten(), 'b-', label="assets", **p_args)


axes[1].plot(range(T+1), np.zeros(T+1), 'k-')

for ax in axes:
    ax.grid()
    ax.set_xlabel('Time')
    ax.legend(ncol=2, **legend_args)

plt.show()

41.9.2 Exercise 2

This is a permanent income / life-cycle model with polynomial growth in income over work-
ing life followed by a fixed retirement income
The model is solved by combining two LQ programming problems as described in the lecture

In [5]: # == Model parameters == #


r = 0.05
β = 1/(1 + r)
T = 60
K = 40
c_bar = 4
σ = 0.35
μ = 4
q = 1e4
s = 1
m1 = 2 * μ/K
m2 = -μ/K**2

# == Formulate LQ problem 1 (retirement) == #


Q = 1
R = np.zeros((4, 4))
Rf = np.zeros((4, 4))
Rf[0, 0] = q
A = [[1 + r, s - c_bar, 0, 0],
[0, 1, 0, 0],
[0, 1, 1, 0],
[0, 1, 2, 1]]
B = [[-1],
[ 0],
[ 0],
[ 0]]
C = [[0],
[0],
[0],
[0]]

# == Initialize LQ instance for retired agent == #


lq_retired = LQ(Q, R, A, B, C, beta=β, T=T-K, Rf=Rf)
# == Iterate back to start of retirement, record final value function == #
for i in range(T-K):
    lq_retired.update_values()
Rf2 = lq_retired.P

# == Formulate LQ problem 2 (working life) == #


R = np.zeros((4, 4))
A = [[1 + r, -c_bar, m1, m2],
[0, 1, 0, 0],
[0, 1, 1, 0],
[0, 1, 2, 1]]
B = [[-1],
[ 0],
[ 0],
[ 0]]
C = [[σ],
[0],
[0],
[0]]

# == Set up working life LQ instance with terminal Rf from lq_retired == #


lq_working = LQ(Q, R, A, B, C, beta=β, T=K, Rf=Rf2)

# == Simulate working state / control paths == #


x0 = (0, 1, 0, 0)
xp_w, up_w, wp_w = lq_working.compute_sequence(x0)
# == Simulate retirement paths (note the initial condition) == #
xp_r, up_r, wp_r = lq_retired.compute_sequence(xp_w[:, K])

# == Convert results back to assets, consumption and income == #


xp = np.column_stack((xp_w, xp_r[:, 1:]))
assets = xp[0, :] # Assets

up = np.column_stack((up_w, up_r))
c = up.flatten() + c_bar # Consumption

time = np.arange(1, K+1)


income_w = σ * wp_w[0, 1:K+1] + m1 * time + m2 * time**2 # Income
income_r = np.ones(T-K) * s
income = np.concatenate((income_w, income_r))

# == Plot results == #
n_rows = 2
fig, axes = plt.subplots(n_rows, 1, figsize=(12, 10))

plt.subplots_adjust(hspace=0.5)

bbox = (0., 1.02, 1., .102)


legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.7}

axes[0].plot(range(1, T+1), income, 'g-', label="non-financial income", **p_args)


axes[0].plot(range(T), c, 'k-', label="consumption", **p_args)

axes[1].plot(range(T+1), assets, 'b-', label="assets", **p_args)


axes[1].plot(range(T+1), np.zeros(T+1), 'k-')

for ax in axes:
    ax.grid()
    ax.set_xlabel('Time')
    ax.legend(ncol=2, **legend_args)

plt.show()

41.9.3 Exercise 3

The first task is to find the matrices 𝐴, 𝐵, 𝐶, 𝑄, 𝑅 that define the LQ problem
Recall that $x_t = (\bar q_t \;\, q_t \;\, 1)'$, while $u_t = q_{t+1} - q_t$
Letting $m_0 := (a_0 - c)/(2 a_1)$ and $m_1 := 1/(2 a_1)$, we can write $\bar q_t = m_0 + m_1 d_t$, and then, with
some manipulation

$$
\bar q_{t+1} = m_0 (1 - \rho) + \rho \bar q_t + m_1 \sigma w_{t+1}
$$

By our definition of $u_t$, the dynamics of $q_t$ are $q_{t+1} = q_t + u_t$


Using these facts you should be able to build the correct 𝐴, 𝐵, 𝐶 matrices (and then check
them against those found in the solution code below)
Suitable 𝑅, 𝑄 matrices can be found by inspecting the objective function, which we repeat
here for convenience:


$$
\min \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left\{ a_1 (q_t - \bar q_t)^2 + \gamma u_t^2 \right\}
$$

Our solution code is

In [6]: # == Model parameters == #


a0 = 5
a1 = 0.5
σ = 0.15
ρ = 0.9
γ = 1
β = 0.95
c = 2
T = 120

# == Useful constants == #
m0 = (a0-c)/(2 * a1)
m1 = 1/(2 * a1)

# == Formulate LQ problem == #
Q = γ
R = [[ a1, -a1, 0],
[-a1, a1, 0],
[ 0, 0, 0]]
A = [[ρ, 0, m0 * (1 - ρ)],
[0, 1, 0],
[0, 0, 1]]

B = [[0],
[1],
[0]]
C = [[m1 * σ],
[ 0],
[ 0]]

lq = LQ(Q, R, A, B, C=C, beta=β)

# == Simulate state / control paths == #


x0 = (m0, 2, 1)
xp, up, wp = lq.compute_sequence(x0, ts_length=150)
q_bar = xp[0, :]
q = xp[1, :]

# == Plot simulation results == #


fig, ax = plt.subplots(figsize=(10, 6.5))

# == Some fancy plotting stuff -- simplify if you prefer == #


bbox = (0., 1.01, 1., .101)
legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
p_args = {'lw': 2, 'alpha': 0.6}

time = range(len(q))
ax.set(xlabel='Time', xlim=(0, max(time)))
ax.plot(time, q_bar, 'k-', lw=2, alpha=0.6, label=r'$\bar q_t$')
ax.plot(time, q, 'b-', lw=2, alpha=0.6, label='$q_t$')
ax.legend(ncol=2, **legend_args)
s = rf'dynamics with $\gamma = {γ}$'
ax.text(max(time) * 0.6, 1 * q_bar.max(), s, fontsize=14)
plt.show()
42

Optimal Savings I: The Permanent Income Model

42.1 Contents

• Overview 42.2

• The Savings Problem 42.3

• Alternative Representations 42.4

• Two Classic Examples 42.5

• Further Reading 42.6

• Appendix: The Euler Equation 42.7

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

42.2 Overview

This lecture describes a rational expectations version of the famous permanent income model
of Milton Friedman [43]
Robert Hall cast Friedman’s model within a linear-quadratic setting [48]
Like Hall, we formulate an infinite-horizon linear-quadratic savings problem
We use the model as a vehicle for illustrating

• alternative formulations of the state of a dynamic system


• the idea of cointegration
• impulse response functions
• the idea that changes in consumption are useful as predictors of movements in income

Background readings on the linear-quadratic-Gaussian permanent income model are Hall’s
[48] and chapter 2 of [87]


Let’s start with some imports

In [2]: import matplotlib.pyplot as plt


%matplotlib inline
import numpy as np
import random
from numba import njit

42.3 The Savings Problem

In this section, we state and solve the savings and consumption problem faced by the con-
sumer

42.3.1 Preliminaries

We use a class of stochastic processes called martingales


A discrete-time martingale is a stochastic process (i.e., a sequence of random variables) {𝑋𝑡 }
with finite mean at each 𝑡 and satisfying

E𝑡 [𝑋𝑡+1 ] = 𝑋𝑡 , 𝑡 = 0, 1, 2, …

Here $\mathbb{E}_t := \mathbb{E}[\cdot \,|\, \mathcal{F}_t]$ is a mathematical expectation conditional on the time $t$
information set $\mathcal{F}_t$
The latter is just a collection of random variables that the modeler declares to be visible at 𝑡

• When not explicitly defined, it is usually understood that ℱ𝑡 = {𝑋𝑡 , 𝑋𝑡−1 , … , 𝑋0 }

Martingales have the feature that the history of past outcomes provides no predictive power
for changes between current and future outcomes
For example, the current wealth of a gambler engaged in a “fair game” has this property
One common class of martingales is the family of random walks
A random walk is a stochastic process {𝑋𝑡 } that satisfies

𝑋𝑡+1 = 𝑋𝑡 + 𝑤𝑡+1

for some IID zero mean innovation sequence {𝑤𝑡 }


Evidently, 𝑋𝑡 can also be expressed as

$$
X_t = \sum_{j=1}^t w_j + X_0
$$

Not every martingale arises as a random walk (see, for example, Wald’s martingale)
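
To make the two representations of a random walk concrete, the sketch below simulates the recursion $X_{t+1} = X_t + w_{t+1}$ and compares it with the cumulative-sum form. The seed, the horizon and the initial value are arbitrary assumptions, and the shocks are taken to be standard normal purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1234)
T, X0 = 200, 1.0

# Shocks w_1, ..., w_T; the entry w[0] is a placeholder set to zero
w = rng.standard_normal(T + 1)
w[0] = 0.0

# Recursive form: X_{t+1} = X_t + w_{t+1}
X = np.empty(T + 1)
X[0] = X0
for t in range(T):
    X[t + 1] = X[t] + w[t + 1]

# Closed form: X_t = X_0 + sum_{j=1}^t w_j
X_closed = X0 + np.cumsum(w)

print(np.allclose(X, X_closed))
```

Because the innovations have zero mean and are independent of the past, the best forecast of `X[t + 1]` given the history up to `t` is `X[t]`, which is the martingale property.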

42.3.2 The Decision Problem

A consumer has preferences over consumption streams that are ordered by the utility func-
tional


$$
\mathbb{E}_0 \left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right] \tag{1}
$$

where

• E𝑡 is the mathematical expectation conditioned on the consumer’s time 𝑡 information


• 𝑐𝑡 is time 𝑡 consumption
• 𝑢 is a strictly concave one-period utility function
• 𝛽 ∈ (0, 1) is a discount factor

The consumer maximizes Eq. (1) by choosing a consumption, borrowing plan $\{c_t, b_{t+1}\}_{t=0}^{\infty}$
subject to the sequence of budget constraints

$$
c_t + b_t = \frac{1}{1 + r} b_{t+1} + y_t \quad t \geq 0 \tag{2}
$$

Here

• 𝑦𝑡 is an exogenous endowment process


• 𝑟 > 0 is a time-invariant risk-free net interest rate
• 𝑏𝑡 is one-period risk-free debt maturing at 𝑡

The consumer also faces initial conditions 𝑏0 and 𝑦0 , which can be fixed or random

42.3.3 Assumptions

For the remainder of this lecture, we follow Friedman and Hall in assuming that $(1 + r)^{-1} = \beta$
Regarding the endowment process, we assume it has the state-space representation

$$
\begin{aligned}
z_{t+1} &= A z_t + C w_{t+1} \\
y_t &= U z_t
\end{aligned} \tag{3}
$$

where

• $\{w_t\}$ is an IID vector process with $\mathbb{E} w_t = 0$ and $\mathbb{E} w_t w_t' = I$
• the spectral radius of $A$ satisfies $\rho(A) < \sqrt{1/\beta}$
• $U$ is a selection vector that pins down $y_t$ as a particular linear combination of components of $z_t$

The restriction on 𝜌(𝐴) prevents income from growing so fast that discounted geometric sums
of some quadratic forms to be described below become infinite
Regarding preferences, we assume the quadratic utility function

$$
u(c_t) = -(c_t - \gamma)^2
$$

where 𝛾 is a bliss level of consumption

Note
Along with this quadratic utility specification, we allow consumption to be nega-
tive. However, by choosing parameters appropriately, we can make the probability
that the model generates negative consumption paths over finite time horizons as
low as desired

Finally, we impose the no Ponzi scheme condition


$$
\mathbb{E}_0 \left[ \sum_{t=0}^{\infty} \beta^t b_t^2 \right] < \infty \tag{4}
$$

This condition rules out an always-borrow scheme that would allow the consumer to enjoy
bliss consumption forever

42.3.4 First-Order Conditions

First-order conditions for maximizing Eq. (1) subject to Eq. (2) are

E𝑡 [𝑢′ (𝑐𝑡+1 )] = 𝑢′ (𝑐𝑡 ), 𝑡 = 0, 1, … (5)

These optimality conditions are also known as Euler equations


If you’re not sure where they come from, you can find a proof sketch in the appendix
With our quadratic preference specification, Eq. (5) has the striking implication that con-
sumption follows a martingale:

E𝑡 [𝑐𝑡+1 ] = 𝑐𝑡 (6)

(In fact, quadratic preferences are necessary for this conclusion [1])
One way to interpret Eq. (6) is that consumption will change only when “new information”
about permanent income is revealed
These ideas will be clarified below

42.3.5 The Optimal Decision Rule

Now let’s deduce the optimal decision rule [2]

Note
One way to solve the consumer’s problem is to apply dynamic programming as
in this lecture. We do this later. But first we use an alternative approach that is
revealing and shows the work that dynamic programming does for us behind the
scenes

In doing so, we need to combine

1. the optimality condition Eq. (6)


2. the period-by-period budget constraint Eq. (2), and
3. the boundary condition Eq. (4)

To accomplish this, observe first that Eq. (4) implies $\lim_{t \to \infty} \beta^{t/2} b_{t+1} = 0$
Using this restriction on the debt path and solving Eq. (2) forward yields


$$
b_t = \sum_{j=0}^{\infty} \beta^j (y_{t+j} - c_{t+j}) \tag{7}
$$

Take conditional expectations on both sides of Eq. (7) and use the martingale property of
consumption and the law of iterated expectations to deduce


$$
b_t = \sum_{j=0}^{\infty} \beta^j \mathbb{E}_t [y_{t+j}] - \frac{c_t}{1 - \beta} \tag{8}
$$

Expressed in terms of 𝑐𝑡 we get

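$$
c_t = (1 - \beta) \left[ \sum_{j=0}^{\infty} \beta^j \mathbb{E}_t [y_{t+j}] - b_t \right]
    = \frac{r}{1 + r} \left[ \sum_{j=0}^{\infty} \beta^j \mathbb{E}_t [y_{t+j}] - b_t \right] \tag{9}
$$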

where the last equality uses (1 + 𝑟)𝛽 = 1


These last two equations assert that consumption equals economic income

• financial wealth equals $-b_t$
• non-financial wealth equals $\sum_{j=0}^{\infty} \beta^j \mathbb{E}_t [y_{t+j}]$
• total wealth equals the sum of financial and non-financial wealth
• a marginal propensity to consume out of total wealth equals the interest factor $\frac{r}{1+r}$
• economic income equals
    – a constant marginal propensity to consume times the sum of non-financial wealth
      and financial wealth
    – the amount the consumer can consume while leaving its wealth intact

Responding to the State


The state vector confronting the consumer at 𝑡 is [𝑏𝑡 𝑧𝑡 ]
Here

• 𝑧𝑡 is an exogenous component, unaffected by consumer behavior


• 𝑏𝑡 is an endogenous component (since it depends on the decision rule)

Note that 𝑧𝑡 contains all variables useful for forecasting the consumer’s future endowment
It is plausible that current decisions 𝑐𝑡 and 𝑏𝑡+1 should be expressible as functions of 𝑧𝑡 and 𝑏𝑡

This is indeed the case


In fact, from this discussion, we see that

$$
\sum_{j=0}^{\infty} \beta^j \mathbb{E}_t [y_{t+j}]
= \mathbb{E}_t \left[ \sum_{j=0}^{\infty} \beta^j y_{t+j} \right]
= U (I - \beta A)^{-1} z_t
$$

Combining this with Eq. (9) gives

$$
c_t = \frac{r}{1 + r} \left[ U (I - \beta A)^{-1} z_t - b_t \right] \tag{10}
$$

Using this equality to eliminate 𝑐𝑡 in the budget constraint Eq. (2) gives

$$
\begin{aligned}
b_{t+1} &= (1 + r)(b_t + c_t - y_t) \\
        &= (1 + r) b_t + r [U (I - \beta A)^{-1} z_t - b_t] - (1 + r) U z_t \\
        &= b_t + U [r (I - \beta A)^{-1} - (1 + r) I] z_t \\
        &= b_t + U (I - \beta A)^{-1} (A - I) z_t
\end{aligned}
$$

To get from the second last to the last expression in this chain of equalities is not trivial
A key is to use the fact that $(1 + r) \beta = 1$ and $(I - \beta A)^{-1} = \sum_{j=0}^{\infty} \beta^j A^j$
We’ve now successfully written 𝑐𝑡 and 𝑏𝑡+1 as functions of 𝑏𝑡 and 𝑧𝑡
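
The non-trivial last step, namely $r (I - \beta A)^{-1} - (1 + r) I = (I - \beta A)^{-1} (A - I)$ when $(1 + r) \beta = 1$, can be checked numerically for any particular $A$. In the sketch below the matrix `A` is a hypothetical example whose spectral radius satisfies the restriction assumed earlier.

```python
import numpy as np

r = 0.05
β = 1 / (1 + r)

# Hypothetical transition matrix with spectral radius 1 < sqrt(1/β)
A = np.array([[0.9, 0.1],
              [0.0, 1.0]])
I = np.eye(2)

inv = np.linalg.inv(I - β * A)

lhs = r * inv - (1 + r) * I        # coefficient in the second last line
rhs = inv @ (A - I)                # coefficient in the last line

print(np.allclose(lhs, rhs))
```

Algebraically, factoring out $(I - \beta A)^{-1}$ on the left gives $(I - \beta A)^{-1} [r I - (1 + r)(I - \beta A)]$, and the bracket collapses to $A - I$ once $(1 + r) \beta = 1$ is imposed.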
A State-Space Representation
We can summarize our dynamics in the form of a linear state-space system governing con-
sumption, debt and income:

$$
\begin{aligned}
z_{t+1} &= A z_t + C w_{t+1} \\
b_{t+1} &= b_t + U [(I - \beta A)^{-1} (A - I)] z_t \\
y_t &= U z_t \\
c_t &= (1 - \beta) [U (I - \beta A)^{-1} z_t - b_t]
\end{aligned} \tag{11}
$$

To write this more succinctly, let

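$$
x_t = \begin{bmatrix} z_t \\ b_t \end{bmatrix}, \quad
\tilde A = \begin{bmatrix} A & 0 \\ U (I - \beta A)^{-1} (A - I) & 1 \end{bmatrix}, \quad
\tilde C = \begin{bmatrix} C \\ 0 \end{bmatrix}
$$

and

$$
\tilde U = \begin{bmatrix} U & 0 \\ (1 - \beta) U (I - \beta A)^{-1} & -(1 - \beta) \end{bmatrix}, \quad
\tilde y_t = \begin{bmatrix} y_t \\ c_t \end{bmatrix}
$$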

Then we can express equation Eq. (11) as

$$
\begin{aligned}
x_{t+1} &= \tilde A x_t + \tilde C w_{t+1} \\
\tilde y_t &= \tilde U x_t
\end{aligned} \tag{12}
$$

We can use the following formulas from linear state space models to compute population
mean 𝜇𝑡 = E𝑥𝑡 and covariance Σ𝑡 ∶= E[(𝑥𝑡 − 𝜇𝑡 )(𝑥𝑡 − 𝜇𝑡 )′ ]

$$
\mu_{t+1} = \tilde A \mu_t \quad \text{with } \mu_0 \text{ given} \tag{13}
$$

$$
\Sigma_{t+1} = \tilde A \Sigma_t \tilde A' + \tilde C \tilde C' \quad \text{with } \Sigma_0 \text{ given} \tag{14}
$$

We can then compute the mean and covariance of 𝑦𝑡̃ from

$$
\begin{aligned}
\mu_{y,t} &= \tilde U \mu_t \\
\Sigma_{y,t} &= \tilde U \Sigma_t \tilde U'
\end{aligned} \tag{15}
$$

A Simple Example with IID Income


To gain some preliminary intuition on the implications of Eq. (11), let’s look at a highly styl-
ized example where income is just IID
(Later examples will investigate more realistic income streams)
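In particular, let $\{w_t\}_{t=1}^{\infty}$ be IID and scalar standard normal, and let

$$
z_t = \begin{bmatrix} z_t^1 \\ 1 \end{bmatrix}, \quad
A = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \quad
U = \begin{bmatrix} 1 & \mu \end{bmatrix}, \quad
C = \begin{bmatrix} \sigma \\ 0 \end{bmatrix}
$$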

Finally, let $b_0 = z_0^1 = 0$

Under these assumptions, we have $y_t = \mu + \sigma w_t \sim N(\mu, \sigma^2)$
Further, if you work through the state space representation, you will see that

$$
\begin{aligned}
b_t &= -\sigma \sum_{j=1}^{t-1} w_j \\
c_t &= \mu + (1 - \beta) \sigma \sum_{j=1}^{t} w_j
\end{aligned}
$$

Thus income is IID and debt and consumption are both Gaussian random walks
Defining assets as −𝑏𝑡 , we see that assets are just the cumulative sum of unanticipated in-
comes prior to the present date
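
One way to "work through the state space representation" is to build $\tilde A$, $\tilde C$ and $\tilde U$ for this IID example and simulate the system directly, then compare the simulated debt and consumption with the closed-form expressions. The sketch below does this with NumPy; the seed and the parameter values are assumptions, and the convention $w_0 = 0$ is used so that $z_0^1 = 0$.

```python
import numpy as np

# Assumed parameter values
r, μ, σ, T = 0.05, 1.0, 0.15, 60
β = 1 / (1 + r)

# Primitives of the IID income example
A = np.array([[0.0, 0.0],
              [0.0, 1.0]])
C = np.array([[σ],
              [0.0]])
U = np.array([[1.0, μ]])
I = np.eye(2)

H = U @ np.linalg.inv(I - β * A)           # U (I - βA)^{-1}

# Stacked system for x_t = (z_t, b_t)
A_tilde = np.block([[A,           np.zeros((2, 1))],
                    [H @ (A - I), np.ones((1, 1))]])
C_tilde = np.vstack([C, [[0.0]]])
U_tilde = np.block([[U,           np.zeros((1, 1))],
                    [(1 - β) * H, -(1 - β) * np.ones((1, 1))]])

rng = np.random.default_rng(0)
w = rng.standard_normal(T + 1)
w[0] = 0.0                                  # convention: w_0 = 0

x = np.array([0.0, 1.0, 0.0])               # (z^1_0, 1, b_0)
b_path, c_path = [], []
for t in range(T + 1):
    y_t, c_t = U_tilde @ x                  # income and consumption at t
    b_path.append(x[2])
    c_path.append(c_t)
    if t < T:
        x = A_tilde @ x + C_tilde.flatten() * w[t + 1]

b_path, c_path = np.array(b_path), np.array(c_path)

# Closed forms from the text
b_closed = -σ * np.concatenate(([0.0], np.cumsum(w)[:-1]))
c_closed = μ + (1 - β) * σ * np.cumsum(w)

print(np.allclose(b_path, b_closed), np.allclose(c_path, c_closed))
```

In this example the debt row of `A_tilde` reduces to $b_{t+1} = b_t - z_t^1$, which is why debt accumulates the negative of past income shocks.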
The next figure shows a typical realization with 𝑟 = 0.05, 𝜇 = 1, and 𝜎 = 0.15

In [3]: r = 0.05
β = 1 / (1 + r)
σ = 0.15
μ = 1
T = 60

@njit
def time_path(T):
    w = np.random.randn(T+1)  # w_0, w_1, ..., w_T
    w[0] = 0
    b = np.zeros(T+1)
    for t in range(1, T+1):
        b[t] = w[1:t].sum()
    b = -σ * b
    c = μ + (1 - β) * (σ * w - b)
    return w, b, c

w, b, c = time_path(T)

fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(μ + σ * w, 'g-', label="Non-financial income")


ax.plot(c, 'k-', label="Consumption")
ax.plot( b, 'b-', label="Debt")
ax.legend(ncol=3, mode='expand', bbox_to_anchor=(0., 1.02, 1., .102))
ax.grid()
ax.set_xlabel('Time')

plt.show()

Observe that consumption is considerably smoother than income


The figure below shows the consumption paths of 250 consumers with independent income
streams

In [4]: fig, ax = plt.subplots(figsize=(10, 6))

b_sum = np.zeros(T+1)
for i in range(250):
    w, b, c = time_path(T)  # Generate new time path
    rcolor = random.choice(('c', 'g', 'b', 'k'))
    ax.plot(c, color=rcolor, lw=0.8, alpha=0.7)

ax.grid()
ax.set(xlabel='Time', ylabel='Consumption')

plt.show()

42.4 Alternative Representations

In this section, we shed more light on the evolution of savings, debt and consumption by rep-
resenting their dynamics in several different ways

42.4.1 Hall’s Representation

Hall [48] suggested an insightful way to summarize the implications of LQ permanent income
theory
First, to represent the solution for 𝑏𝑡 , shift Eq. (9) forward one period and eliminate 𝑏𝑡+1 by
using Eq. (2) to obtain


𝑐𝑡+1 = (1 − 𝛽) ∑ 𝛽 𝑗 E𝑡+1 [𝑦𝑡+𝑗+1 ] − (1 − 𝛽) [𝛽 −1 (𝑐𝑡 + 𝑏𝑡 − 𝑦𝑡 )]
𝑗=0


If we add and subtract $\beta^{-1} (1 - \beta) \sum_{j=0}^{\infty} \beta^j E_t y_{t+j}$ from the right side of the preceding equation
and rearrange, we obtain


$$
c_{t+1} - c_t = (1 - \beta) \sum_{j=0}^{\infty} \beta^j \left\{ E_{t+1}[y_{t+j+1}] - E_t[y_{t+j+1}] \right\}
\tag{16}
$$

The right side is the time 𝑡 + 1 innovation to the expected present value of the endowment
process {𝑦𝑡 }
We can represent the optimal decision rule for (𝑐𝑡 , 𝑏𝑡+1 ) in the form of Eq. (16) and Eq. (8),
which we repeat:
$$
b_t = \sum_{j=0}^{\infty} \beta^j E_t[y_{t+j}] - \frac{1}{1 - \beta} c_t
\tag{17}
$$

Equation Eq. (17) asserts that the consumer’s debt due at 𝑡 equals the expected present value
of its endowment minus the expected present value of its consumption stream
A high debt thus indicates a large expected present value of surpluses 𝑦𝑡 − 𝑐𝑡
Recalling again our discussion on forecasting geometric sums, we have


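$$
\begin{aligned}
E_t \sum_{j=0}^{\infty} \beta^j y_{t+j} &= U (I - \beta A)^{-1} z_t \\
E_{t+1} \sum_{j=0}^{\infty} \beta^j y_{t+j+1} &= U (I - \beta A)^{-1} z_{t+1} \\
E_t \sum_{j=0}^{\infty} \beta^j y_{t+j+1} &= U (I - \beta A)^{-1} A z_t
\end{aligned}
$$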

Using these formulas together with Eq. (3) and substituting into Eq. (16) and Eq. (17) gives
the following representation for the consumer’s optimum decision rule:

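$$
\begin{aligned}
c_{t+1} &= c_t + (1 - \beta) U (I - \beta A)^{-1} C w_{t+1} \\
b_t &= U (I - \beta A)^{-1} z_t - \frac{1}{1 - \beta} c_t \\
y_t &= U z_t \\
z_{t+1} &= A z_t + C w_{t+1}
\end{aligned}
\tag{18}
$$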

Representation Eq. (18) makes clear that

• The state can be taken as (𝑐𝑡 , 𝑧𝑡 )

– The endogenous part is 𝑐𝑡 and the exogenous part is 𝑧𝑡


– Debt 𝑏𝑡 has disappeared as a component of the state because it is encoded in 𝑐𝑡

• Consumption is a random walk with innovation (1 − 𝛽)𝑈 (𝐼 − 𝛽𝐴)−1 𝐶𝑤𝑡+1

– This is a more explicit representation of the martingale result in Eq. (6)

42.4.2 Cointegration

Representation Eq. (18) reveals that the joint process {𝑐𝑡 , 𝑏𝑡 } possesses the property that En-
gle and Granger [39] called cointegration
Cointegration is a tool that allows us to apply powerful results from the theory of stationary
stochastic processes to (certain transformations of) nonstationary models
To apply cointegration in the present context, suppose that 𝑧𝑡 is asymptotically stationary [4]
Despite this, both 𝑐𝑡 and 𝑏𝑡 will be non-stationary because they have unit roots (see Eq. (11)
for 𝑏𝑡 )
Nevertheless, there is a linear combination of 𝑐𝑡 , 𝑏𝑡 that is asymptotically stationary

In particular, from the second equality in Eq. (18) we have

(1 − 𝛽)𝑏𝑡 + 𝑐𝑡 = (1 − 𝛽)𝑈 (𝐼 − 𝛽𝐴)−1 𝑧𝑡 (19)

Hence the linear combination (1 − 𝛽)𝑏𝑡 + 𝑐𝑡 is asymptotically stationary


Accordingly, Granger and Engle would call [(1 − 𝛽) 1] a cointegrating vector for the state

When applied to the nonstationary vector process $[b_t \;\; c_t]'$, it yields a process that is asymptotically
stationary
Equation Eq. (19) can be rearranged to take the form


$$
(1 - \beta) b_t + c_t = (1 - \beta) E_t \sum_{j=0}^{\infty} \beta^j y_{t+j}
\tag{20}
$$

Equation Eq. (20) asserts that the cointegrating residual on the left side equals the condi-
tional expectation of the geometric sum of future incomes on the right [6]
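As a quick numerical illustration of cointegration, the following sketch revisits the IID-income case simulated earlier, where $y_t = \mu + \sigma w_t$. It assumes the closed forms for debt and consumption from that example, and the seed and sample length are arbitrary choices. In that special case the right side of Eq. (19) reduces to $\mu + (1 - \beta) \sigma w_t$, so the cointegrating residual should be IID even though $c_t$ and $b_t$ each wander.

```python
import numpy as np

# Parameters match the earlier IID-income figure; T and the seed are arbitrary
r = 0.05
β = 1 / (1 + r)
μ, σ, T = 1.0, 0.15, 500

rng = np.random.default_rng(0)
w = rng.standard_normal(T + 1)
w[0] = 0.0

cum = np.cumsum(w)                 # cum[t] = w_1 + ... + w_t
b = np.zeros(T + 1)
b[1:] = -σ * cum[:-1]              # b_t = -σ (w_1 + ... + w_{t-1})
c = μ + (1 - β) * σ * cum          # c_t = μ + (1 - β) σ (w_1 + ... + w_t)

residual = (1 - β) * b + c         # left side of Eq. (19)
stationary_part = μ + (1 - β) * σ * w

print(np.allclose(residual, stationary_part))
print(c.std(), residual.std())     # the residual is far less dispersed than c_t
```

The two unit-root components cancel exactly in the linear combination, which is what the cointegrating vector is designed to achieve.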

42.4.3 Cross-Sectional Implications

Consider again Eq. (18), this time in light of our discussion of distribution dynamics in the
lecture on linear systems
The dynamics of 𝑐𝑡 are given by

𝑐𝑡+1 = 𝑐𝑡 + (1 − 𝛽)𝑈 (𝐼 − 𝛽𝐴)−1 𝐶𝑤𝑡+1 (21)

or

$$
c_t = c_0 + \sum_{j=1}^{t} \hat{w}_j
\quad \text{for} \quad
\hat{w}_{t+1} := (1 - \beta) U (I - \beta A)^{-1} C w_{t+1}
$$

The unit root affecting 𝑐𝑡 causes the time 𝑡 variance of 𝑐𝑡 to grow linearly with 𝑡
In particular, since {𝑤̂ 𝑡 } is IID, we have

Var[𝑐𝑡 ] = Var[𝑐0 ] + 𝑡 𝜎̂ 2 (22)

where

$$
\hat{\sigma}^2 := (1 - \beta)^2 U (I - \beta A)^{-1} C C' (I - \beta A')^{-1} U'
$$

When 𝜎̂ > 0, {𝑐𝑡 } has no asymptotic distribution


Let’s consider what this means for a cross-section of ex-ante identical consumers born at time
0
Let the distribution of 𝑐0 represent the cross-section of initial consumption values
Equation Eq. (22) tells us that the variance of 𝑐𝑡 increases over time at a rate proportional to
𝑡

A number of different studies have investigated this prediction and found some support for it
(see, e.g., [32], [126])
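The linear growth of cross-sectional variance in Eq. (22) can be checked by simulation. The sketch below is only illustrative: it assumes the IID-income case, in which $\hat{\sigma} = (1 - \beta) \sigma$, starts every consumer at the same $c_0$ so that $\mathrm{Var}[c_0] = 0$, and uses an arbitrary panel size and seed.

```python
import numpy as np

β, σ = 1 / 1.05, 0.15
σ_hat = (1 - β) * σ                # innovation scale for consumption (IID-income case)

N, T = 20_000, 50                  # hypothetical panel size and horizon
rng = np.random.default_rng(1)

# Each row is one consumer; consumption is a random walk started at c_0 = 1
shocks = σ_hat * rng.standard_normal((N, T))
c = 1.0 + np.cumsum(shocks, axis=1)

sample_var = c.var(axis=0)                      # cross-section variance at t = 1, ..., T
theory_var = σ_hat**2 * np.arange(1, T + 1)     # Eq. (22) with Var[c_0] = 0

print(sample_var[[0, 9, 49]])
print(theory_var[[0, 9, 49]])
```

With a large panel, the sample variances should track the straight line $t \hat{\sigma}^2$ closely.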

42.4.4 Impulse Response Functions

Impulse response functions measure responses to various impulses (i.e., temporary shocks)
The impulse response function of {𝑐𝑡 } to the innovation {𝑤𝑡 } is a box
In particular, the response of 𝑐𝑡+𝑗 to a unit increase in the innovation 𝑤𝑡+1 is $(1 - \beta) U (I - \beta A)^{-1} C$
for all 𝑗 ≥ 1

42.4.5 Moving Average Representation

It’s useful to express the innovation to the expected present value of the endowment process
in terms of a moving average representation for income 𝑦𝑡
The endowment process defined by Eq. (3) has the moving average representation

𝑦𝑡+1 = 𝑑(𝐿)𝑤𝑡+1 (23)

where


• $d(L) = \sum_{j=0}^{\infty} d_j L^j$ for some sequence $d_j$, where 𝐿 is the lag operator [3]
• at time 𝑡, the consumer has an information set [5] $w^t = [w_t, w_{t-1}, \ldots]$

Notice that

𝑦𝑡+𝑗 − E𝑡 [𝑦𝑡+𝑗 ] = 𝑑0 𝑤𝑡+𝑗 + 𝑑1 𝑤𝑡+𝑗−1 + ⋯ + 𝑑𝑗−1 𝑤𝑡+1

It follows that

E𝑡+1 [𝑦𝑡+𝑗 ] − E𝑡 [𝑦𝑡+𝑗 ] = 𝑑𝑗−1 𝑤𝑡+1 (24)

Using Eq. (24) in Eq. (16) gives

𝑐𝑡+1 − 𝑐𝑡 = (1 − 𝛽)𝑑(𝛽)𝑤𝑡+1 (25)

The object 𝑑(𝛽) is the present value of the moving average coefficients in the represen-
tation for the endowment process 𝑦𝑡
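To make $d(\beta)$ concrete, here is a small sketch that uses the formula $d(L) = U (I - AL)^{-1} C$ from footnote [3], so that $d_j = U A^j C$ and $d(\beta) = U (I - \beta A)^{-1} C$. The scalar AR(1) income process and its parameter values are hypothetical choices made only for illustration.

```python
import numpy as np

β, ρ, σ = 0.95, 0.9, 1.0           # hypothetical discount factor and AR(1) parameters
A = np.array([[ρ]])
C = np.array([[σ]])
U = np.array([[1.0]])

# Closed form: d(β) = U (I - βA)^{-1} C
d_beta = (U @ np.linalg.solve(np.eye(1) - β * A, C)).item()

# Truncated series: sum_j β^j d_j with d_j = U A^j C
J = 2000
d_series = sum(β**j * (U @ np.linalg.matrix_power(A, j) @ C).item()
               for j in range(J))

print(d_beta, d_series)
print((1 - β) * d_beta)            # response of consumption to a unit shock, Eq. (25)
```

For this process $d(\beta) = \sigma / (1 - \beta \rho)$, so a more persistent income process (larger $\rho$) produces a larger consumption response to a given shock.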

42.5 Two Classic Examples

We illustrate some of the preceding ideas with two examples


In both examples, the endowment follows the process 𝑦𝑡 = 𝑧1𝑡 + 𝑧2𝑡 where

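$$
\begin{bmatrix} z_{1t+1} \\ z_{2t+1} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} z_{1t} \\ z_{2t} \end{bmatrix}
+
\begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}
\begin{bmatrix} w_{1t+1} \\ w_{2t+1} \end{bmatrix}
$$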

Here

• 𝑤𝑡+1 is an IID 2 × 1 process distributed as 𝑁 (0, 𝐼)


• 𝑧1𝑡 is a permanent component of 𝑦𝑡
• 𝑧2𝑡 is a purely transitory component of 𝑦𝑡

42.5.1 Example 1

Assume as before that the consumer observes the state 𝑧𝑡 at time 𝑡


In view of Eq. (18) we have

𝑐𝑡+1 − 𝑐𝑡 = 𝜎1 𝑤1𝑡+1 + (1 − 𝛽)𝜎2 𝑤2𝑡+1 (26)

Formula Eq. (26) shows how an increment 𝜎1 𝑤1𝑡+1 to the permanent component of income
𝑧1𝑡+1 leads to

• a permanent one-for-one increase in consumption and


• no increase in savings −𝑏𝑡+1

But the purely transitory component of income 𝜎2 𝑤2𝑡+1 leads to a permanent increment in
consumption by a fraction 1 − 𝛽 of transitory income
The remaining fraction 𝛽 is saved, leading to a permanent increment in −𝑏𝑡+1
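The coefficients in Eq. (26) can be recovered directly from the general formula $(1 - \beta) U (I - \beta A)^{-1} C$ in Eq. (18). The sketch below does this for the two-component endowment process above, using the same illustrative values of $r$, $\sigma_1$ and $\sigma_2$ as the impulse-response code that follows.

```python
import numpy as np

r = 0.05
β = 1 / (1 + r)
σ1 = σ2 = 0.15

A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
C = np.diag([σ1, σ2])
U = np.array([[1.0, 1.0]])         # y_t = z_1t + z_2t

# Row vector of consumption responses to (w_1, w_2)
coeffs = (1 - β) * U @ np.linalg.solve(np.eye(2) - β * A, C)

print(coeffs)                      # compare with [σ1, (1 - β) σ2]
print(σ1, (1 - β) * σ2)
```

The first entry equals $\sigma_1$ (a one-for-one response to the permanent shock) and the second equals $(1 - \beta) \sigma_2$ (the annuity value of a transitory shock).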
Application of the formula for debt in Eq. (11) to this example shows that

𝑏𝑡+1 − 𝑏𝑡 = −𝑧2𝑡 = −𝜎2 𝑤2𝑡 (27)

This confirms that none of 𝜎1 𝑤1𝑡 is saved, while all of 𝜎2 𝑤2𝑡 is saved
The next figure illustrates these very different reactions to transitory and permanent income
shocks using impulse-response functions

In [5]: r = 0.05
        β = 1 / (1 + r)
        S = 5   # Impulse date
        σ1 = σ2 = 0.15

        @njit
        def time_path(T, permanent=False):
            "Time path of consumption and debt given shock sequence"
            w1 = np.zeros(T+1)
            w2 = np.zeros(T+1)
            b = np.zeros(T+1)
            c = np.zeros(T+1)
            if permanent:
                w1[S+1] = 1.0
            else:
                w2[S+1] = 1.0
            for t in range(1, T):
                b[t+1] = b[t] - σ2 * w2[t]
                c[t+1] = c[t] + σ1 * w1[t+1] + (1 - β) * σ2 * w2[t+1]
            return b, c

        fig, axes = plt.subplots(2, 1, figsize=(10, 8))

        # Titles are ordered to match the (True, False) values of `permanent` below
        titles = ['permanent', 'transitory']

        L = 0.175

        for ax, truefalse, title in zip(axes, (True, False), titles):
            b, c = time_path(T=20, permanent=truefalse)
            ax.set_title(f'Impulse response: {title} income shock')
            ax.plot(c, 'g-', label="consumption")
            ax.plot(b, 'b-', label="debt")
            ax.plot((S, S), (-L, L), 'k-', lw=0.5)
            ax.grid(alpha=0.5)
            ax.set(xlabel=r'Time', ylim=(-L, L))

        axes[0].legend(loc='lower right')

        plt.tight_layout()
        plt.show()

42.5.2 Example 2

Assume now that at time 𝑡 the consumer observes 𝑦𝑡 , and its history up to 𝑡, but not 𝑧𝑡
Under this assumption, it is appropriate to use an innovation representation to form 𝐴, 𝐶, 𝑈
in Eq. (18)
The discussion in sections 2.9.1 and 2.11.3 of [87] shows that the pertinent state space repre-
sentation for 𝑦𝑡 is
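$$
\begin{aligned}
\begin{bmatrix} y_{t+1} \\ a_{t+1} \end{bmatrix}
&=
\begin{bmatrix} 1 & -(1 - K) \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} y_t \\ a_t \end{bmatrix}
+
\begin{bmatrix} 1 \\ 1 \end{bmatrix} a_{t+1} \\
y_t &= \begin{bmatrix} 1 & 0 \end{bmatrix}
\begin{bmatrix} y_t \\ a_t \end{bmatrix}
\end{aligned}
$$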

where

• 𝐾 ∶= the stationary Kalman gain


• 𝑎𝑡 ∶= 𝑦𝑡 − 𝐸[𝑦𝑡 | 𝑦𝑡−1 , … , 𝑦0 ]

In the same discussion in [87] it is shown that 𝐾 ∈ [0, 1] and that 𝐾 increases as 𝜎1 /𝜎2 does
In other words, 𝐾 increases as the ratio of the standard deviation of the permanent shock to
that of the transitory shock increases
For background on the Kalman gain, please see the lecture A First Look at the Kalman Filter
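The dependence of 𝐾 on 𝜎1/𝜎2 can be illustrated numerically. The sketch below is not taken from [87]; it assumes that 𝐾 is the steady-state gain of the standard Kalman filter for this signal-extraction problem, in which the permanent component is a random walk with innovation variance $\sigma_1^2$ and is observed with noise of variance $\sigma_2^2$. The function name and the grid of ratios are hypothetical.

```python
import numpy as np

def stationary_gain(σ1, σ2, n_iter=5_000):
    "Iterate the scalar Riccati equation for the prior variance, then return the gain"
    Σ = σ1**2
    for _ in range(n_iter):
        Σ = Σ - Σ**2 / (Σ + σ2**2) + σ1**2
    return Σ / (Σ + σ2**2)

def closed_form_gain(σ1, σ2):
    "Gain implied by the positive root of Σ^2 - σ1^2 Σ - σ1^2 σ2^2 = 0"
    Σ = (σ1**2 + np.sqrt(σ1**4 + 4 * σ1**2 * σ2**2)) / 2
    return Σ / (Σ + σ2**2)

ratios = [0.1, 0.5, 1.0, 2.0, 10.0]          # values of σ1 / σ2, holding σ2 = 1
gains = [stationary_gain(ρ, 1.0) for ρ in ratios]

for ρ, K in zip(ratios, gains):
    print(f"σ1/σ2 = {ρ:5.1f}   K = {K:.4f}")
```

Under these assumptions the gain lies strictly between 0 and 1 and rises with the ratio, consistent with the statement above: the noisier the transitory component relative to the permanent one, the less of each surprise is attributed to the permanent component.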
Applying formulas Eq. (18) implies

𝑐𝑡+1 − 𝑐𝑡 = [1 − 𝛽(1 − 𝐾)]𝑎𝑡+1 (28)

where the endowment process can now be represented in terms of the univariate innovation to
𝑦𝑡 as

𝑦𝑡+1 − 𝑦𝑡 = 𝑎𝑡+1 − (1 − 𝐾)𝑎𝑡 (29)

Equation Eq. (29) indicates that the consumer regards

• fraction 𝐾 of an innovation 𝑎𝑡+1 to 𝑦𝑡+1 as permanent


• fraction 1 − 𝐾 as purely transitory

The consumer permanently increases his consumption by the full amount of his estimate of
the permanent part of 𝑎𝑡+1 , but by only (1 − 𝛽) times his estimate of the purely transitory
part of 𝑎𝑡+1
Therefore, in total, he permanently increments his consumption by a fraction 𝐾 + (1 − 𝛽)(1 −
𝐾) = 1 − 𝛽(1 − 𝐾) of 𝑎𝑡+1
He saves the remaining fraction 𝛽(1 − 𝐾)
According to equation Eq. (29), the first difference of income is a first-order moving average
Equation Eq. (28) asserts that the first difference of consumption is IID
Application of the formula for debt to this example shows that

𝑏𝑡+1 − 𝑏𝑡 = (𝐾 − 1)𝑎𝑡 (30)

This indicates how the fraction 𝐾 of the innovation to 𝑦𝑡 that is regarded as permanent influ-
ences the fraction of the innovation that is saved

42.6 Further Reading

The model described above significantly changed how economists think about consumption
While Hall’s model does a remarkably good job as a first approximation to consumption data,
it’s widely believed that it doesn’t capture important aspects of some consumption/savings
data
For example, liquidity constraints and precautionary savings appear to be present sometimes
Further discussion can be found in, e.g., [49], [103], [31], [22]

42.7 Appendix: The Euler Equation

Where does the first-order condition Eq. (5) come from?


Here we’ll give a proof for the two-period case, which is representative of the general argu-
ment
The finite horizon equivalent of the no-Ponzi condition is that the agent cannot end her life in
debt, so 𝑏2 = 0
From the budget constraint Eq. (2) we then have

$$
c_0 = \frac{b_1}{1 + r} - b_0 + y_0
\quad \text{and} \quad
c_1 = y_1 - b_1
$$

Here 𝑏0 and 𝑦0 are given constants


Substituting these constraints into our two-period objective 𝑢(𝑐0 ) + 𝛽E0 [𝑢(𝑐1 )] gives

$$
\max_{b_1} \left\{ u \left( \frac{b_1}{R} - b_0 + y_0 \right) + \beta E_0 [u(y_1 - b_1)] \right\}
$$

You will be able to verify that the first-order condition is

𝑢′ (𝑐0 ) = 𝛽𝑅 E0 [𝑢′ (𝑐1 )]

Using 𝛽𝑅 = 1 gives Eq. (5) in the two-period case
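For readers who want the intermediate step, the first-order condition can be sketched as follows, writing $R = 1 + r$ and assuming an interior optimum and that the derivative can be passed through the expectation:

```latex
\frac{d}{d b_1}
\left\{ u\left( \frac{b_1}{R} - b_0 + y_0 \right) + \beta E_0 [u(y_1 - b_1)] \right\}
= \frac{1}{R} u'(c_0) - \beta E_0 [u'(c_1)] = 0
\quad \Longrightarrow \quad
u'(c_0) = \beta R \, E_0 [u'(c_1)]
```

Here $c_0$ and $c_1$ are the expressions given by the budget constraints above.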


The proof for the general case is similar
Footnotes
[1] A linear marginal utility is essential for deriving Eq. (6) from Eq. (5). Suppose instead
that we had imposed the following more standard assumptions on the utility function:
𝑢′ (𝑐) > 0, 𝑢″ (𝑐) < 0, 𝑢‴ (𝑐) > 0 and required that 𝑐 ≥ 0. The Euler equation remains Eq. (5).
But the fact that 𝑢‴ > 0 implies via Jensen’s inequality that E𝑡 [𝑢′ (𝑐𝑡+1 )] > 𝑢′ (E𝑡 [𝑐𝑡+1 ]). This
inequality together with Eq. (5) implies that E𝑡 [𝑐𝑡+1 ] > 𝑐𝑡 (consumption is said to be a
‘submartingale’), so that consumption stochastically diverges to +∞. The consumer’s savings also
diverge to +∞.
[2] An optimal decision rule is a map from the current state into current actions—in this case,
consumption
[3] Representation Eq. (3) implies that 𝑑(𝐿) = 𝑈 (𝐼 − 𝐴𝐿)−1 𝐶.

[4] This would be the case if, for example, the spectral radius of 𝐴 is strictly less than one
[5] A moving average representation for a process 𝑦𝑡 is said to be fundamental if the linear
space spanned by the history $y^t$ is equal to the linear space spanned by the history $w^t$. A time-invariant innovations
representation, attained via the Kalman filter, is by construction fundamental.
[6] See [70], [84], [85] for interesting applications of related ideas.
43

Optimal Savings II: LQ Techniques

43.1 Contents

• Overview 43.2

• Setup 43.3

• The LQ Approach 43.4

• Implementation 43.5

• Two Example Economies 43.6

Co-author: Chase Coleman


In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

43.2 Overview

This lecture continues our analysis of the linear-quadratic (LQ) permanent income model of
savings and consumption
As we saw in our previous lecture on this topic, Robert Hall [48] used the LQ permanent in-
come model to restrict and interpret intertemporal comovements of nondurable consumption,
nonfinancial income, and financial wealth
For example, we saw how the model asserts that for any covariance stationary process for
nonfinancial income

• consumption is a random walk


• financial wealth has a unit root and is cointegrated with consumption

Other applications use the same LQ framework


For example, a model isomorphic to the LQ permanent income model has been used by
Robert Barro [11] to interpret intertemporal comovements of a government’s tax collections,
its expenditures net of debt service, and its public debt


This isomorphism means that in analyzing the LQ permanent income model, we are in effect
also analyzing the Barro tax smoothing model
It is just a matter of appropriately relabeling the variables in Hall’s model
In this lecture, we’ll

• show how the solution to the LQ permanent income model can be obtained using LQ
control methods
• represent the model as a linear state space system as in this lecture
• apply QuantEcon’s LinearStateSpace class to characterize statistical features of the con-
sumer’s optimal consumption and borrowing plans

We’ll then use these characterizations to construct a simple model of cross-section wealth and
consumption dynamics in the spirit of Truman Bewley [16]
(Later we’ll study other Bewley models—see this lecture)
The model will prove useful for illustrating concepts such as

• stationarity
• ergodicity
• ensemble moments and cross-section observations

43.3 Setup

Let’s recall the basic features of the model discussed in the permanent income model
Consumer preferences are ordered by


$$
E_0 \sum_{t=0}^{\infty} \beta^t u(c_t)
\tag{1}
$$

where $u(c) = -(c - \gamma)^2$


The consumer maximizes Eq. (1) by choosing a consumption, borrowing plan $\{c_t, b_{t+1}\}_{t=0}^{\infty}$
subject to the sequence of budget constraints

$$
c_t + b_t = \frac{1}{1 + r} b_{t+1} + y_t, \quad t \geq 0
\tag{2}
$$

and the no-Ponzi condition


$$
E_0 \sum_{t=0}^{\infty} \beta^t b_t^2 < \infty
\tag{3}
$$

The interpretations of all variables and parameters are the same as in the previous lecture
We continue to assume that (1 + 𝑟)𝛽 = 1
The dynamics of {𝑦𝑡 } again follow the linear state space model
$$
\begin{aligned}
z_{t+1} &= A z_t + C w_{t+1} \\
y_t &= U z_t
\end{aligned}
\tag{4}
$$

The restrictions on the shock process and parameters are the same as in our previous lecture

43.3.1 Digression on a Useful Isomorphism

The LQ permanent income model of consumption is mathematically isomorphic with a version
of Barro’s [11] model of tax smoothing
In the LQ permanent income model

• the household faces an exogenous process of nonfinancial income


• the household wants to smooth consumption across states and time

In the Barro tax smoothing model

• a government faces an exogenous sequence of government purchases (net of interest payments
on its debt)
• a government wants to smooth tax collections across states and time

If we set

• 𝑇𝑡 , total tax collections in Barro’s model to consumption 𝑐𝑡 in the LQ permanent income
model
• 𝐺𝑡 , exogenous government expenditures in Barro’s model to nonfinancial income 𝑦𝑡 in
the permanent income model
• 𝐵𝑡 , government risk-free one-period assets falling due in Barro’s model to risk-free one-
period consumer debt 𝑏𝑡 falling due in the LQ permanent income model
• 𝑅, the gross rate of return on risk-free one-period government debt in Barro’s model
to the gross rate of return 1 + 𝑟 on financial assets in the permanent income model of
consumption

then the two models are mathematically equivalent


All characterizations of a {𝑐𝑡 , 𝑦𝑡 , 𝑏𝑡 } process in the LQ permanent income model automatically apply
to a {𝑇𝑡 , 𝐺𝑡 , 𝐵𝑡 } process in the Barro model of tax smoothing
See consumption and tax smoothing models for further exploitation of an isomorphism be-
tween consumption and tax smoothing models

43.3.2 A Specification of the Nonfinancial Income Process

For the purposes of this lecture, let’s assume {𝑦𝑡 } is a second-order univariate autoregressive
process:

𝑦𝑡+1 = 𝛼 + 𝜌1 𝑦𝑡 + 𝜌2 𝑦𝑡−1 + 𝜎𝑤𝑡+1



We can map this into the linear state space framework in Eq. (4), as discussed in our lecture
on linear models
To do so we take

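$$
z_t = \begin{bmatrix} 1 \\ y_t \\ y_{t-1} \end{bmatrix},
\quad
A = \begin{bmatrix} 1 & 0 & 0 \\ \alpha & \rho_1 & \rho_2 \\ 0 & 1 & 0 \end{bmatrix},
\quad
C = \begin{bmatrix} 0 \\ \sigma \\ 0 \end{bmatrix},
\quad \text{and} \quad
U = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}
$$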
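As a sanity check on this mapping, the following sketch simulates the state space system and the scalar AR(2) recursion with the same shocks and confirms that they generate the same income path. The parameter values are those used later in the lecture; the zero initial conditions, horizon and seed are illustrative assumptions.

```python
import numpy as np

α, ρ1, ρ2, σ = 10.0, 0.9, 0.0, 1.0

A = np.array([[1.0, 0.0, 0.0],
              [α,   ρ1,  ρ2],
              [0.0, 1.0, 0.0]])
C = np.array([[0.0], [σ], [0.0]])
U = np.array([[0.0, 1.0, 0.0]])

T = 50
rng = np.random.default_rng(0)
w = rng.standard_normal(T)

# State space simulation, starting from z_0 = (1, y_0, y_{-1}) = (1, 0, 0)
z = np.array([[1.0], [0.0], [0.0]])
y_ss = []
for t in range(T):
    y_ss.append((U @ z).item())
    z = A @ z + C * w[t]

# Direct AR(2) recursion with the same shocks
y_prev, y_curr = 0.0, 0.0
y_direct = []
for t in range(T):
    y_direct.append(y_curr)
    y_prev, y_curr = y_curr, α + ρ1 * y_curr + ρ2 * y_prev + σ * w[t]

print(np.allclose(y_ss, y_direct))
```

The first row of $A$ simply carries the constant 1 forward so that the intercept $\alpha$ can be written as part of a linear map.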

43.4 The LQ Approach

Previously we solved the permanent income model by solving a system of linear expectational
difference equations subject to two boundary conditions
Here we solve the same model using LQ methods based on dynamic programming
After confirming that answers produced by the two methods agree, we apply QuantEcon’s
LinearStateSpace class to illustrate features of the model
Why solve a model in two distinct ways?
Because by doing so we gather insights about the structure of the model
Our earlier approach based on solving a system of expectational difference equations brought
to the fore the role of the consumer’s expectations about future nonfinancial income
On the other hand, formulating the model in terms of an LQ dynamic programming problem
reminds us that

• finding the state (of a dynamic programming problem) is an art, and


• iterations on a Bellman equation implicitly jointly solve both a forecasting problem and
a control problem

43.4.1 The LQ Problem

Recall from our lecture on LQ theory that the optimal linear regulator problem is to choose a
decision rule for 𝑢𝑡 to minimize


$$
E \sum_{t=0}^{\infty} \beta^t \{ x_t' R x_t + u_t' Q u_t \},
$$

subject to 𝑥0 given and the law of motion

$$
x_{t+1} = \tilde{A} x_t + \tilde{B} u_t + \tilde{C} w_{t+1}, \quad t \geq 0,
\tag{5}
$$

where 𝑤𝑡+1 is IID with mean vector zero and E𝑤𝑡 𝑤𝑡′ = 𝐼

The tildes in $\tilde{A}, \tilde{B}, \tilde{C}$ are to avoid clashing with notation in Eq. (4)
The value function for this problem is 𝑣(𝑥) = −𝑥′ 𝑃 𝑥 − 𝑑, where

• 𝑃 is the unique positive semidefinite solution of the corresponding matrix Riccati equa-
tion
• The scalar 𝑑 is given by $d = \beta (1 - \beta)^{-1} \mathrm{trace}(P \tilde{C} \tilde{C}')$

The optimal policy is $u_t = -F x_t$, where $F := \beta (Q + \beta \tilde{B}' P \tilde{B})^{-1} \tilde{B}' P \tilde{A}$

Under an optimal decision rule 𝐹, the state vector 𝑥𝑡 evolves according to
$x_{t+1} = (\tilde{A} - \tilde{B} F) x_t + \tilde{C} w_{t+1}$
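To show how 𝑃 and 𝐹 fit together, here is a minimal sketch that iterates on the discounted matrix Riccati equation $P = R + \beta \tilde{A}' P \tilde{A} - \beta^2 \tilde{A}' P \tilde{B} (Q + \beta \tilde{B}' P \tilde{B})^{-1} \tilde{B}' P \tilde{A}$ until convergence and then forms 𝐹 and 𝑑 from the formulas above. It is not the algorithm used inside QuantEcon's LQ class; the function name `riccati_iterate` and the scalar test problem are hypothetical.

```python
import numpy as np

def riccati_iterate(A, B, R, Q, β, tol=1e-12, max_iter=10_000):
    "Iterate on the discounted Riccati equation from P = 0; return (P, F)"
    P = np.zeros_like(R)
    for _ in range(max_iter):
        S = Q + β * B.T @ P @ B
        P_new = R + β * A.T @ P @ A \
            - β**2 * A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
        converged = np.max(np.abs(P_new - P)) < tol
        P = P_new
        if converged:
            break
    F = β * np.linalg.solve(Q + β * B.T @ P @ B, B.T @ P @ A)
    return P, F

# A hypothetical scalar problem, written with 1 x 1 arrays
β = 0.95
A = np.array([[1.0]])
B = np.array([[1.0]])
C = np.array([[0.5]])
R = np.array([[1.0]])
Q = np.array([[1.0]])

P, F = riccati_iterate(A, B, R, Q, β)
d = β / (1 - β) * np.trace(P @ C @ C.T)

print(P, F, d)
print(A - B @ F)    # closed-loop coefficient; stable if inside the unit circle
```

In this scalar case the penalty on the state makes the closed-loop coefficient smaller than one in absolute value, so the controlled state is stable even though the uncontrolled one has a unit root.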

43.4.2 Mapping into the LQ Framework

To map into the LQ framework, we’ll use

$$
x_t := \begin{bmatrix} z_t \\ b_t \end{bmatrix}
= \begin{bmatrix} 1 \\ y_t \\ y_{t-1} \\ b_t \end{bmatrix}
$$

as the state vector and 𝑢𝑡 ∶= 𝑐𝑡 − 𝛾 as the control


With this notation and 𝑈𝛾 ∶= [𝛾 0 0], we can write the state dynamics as in Eq. (5) when

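$$
\tilde{A} := \begin{bmatrix} A & 0 \\ (1 + r)(U_\gamma - U) & 1 + r \end{bmatrix},
\quad
\tilde{B} := \begin{bmatrix} 0 \\ 1 + r \end{bmatrix}
\quad \text{and} \quad
\tilde{C} := \begin{bmatrix} C \\ 0 \end{bmatrix}
$$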

Please confirm for yourself that, with these definitions, the LQ dynamics Eq. (5) match the
dynamics of 𝑧𝑡 and 𝑏𝑡 described above
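One way to carry out this check numerically is sketched below. It builds the three matrices for the AR(2) income process, picks an arbitrary state, consumption level and shock, and compares the LQ law of motion with the budget constraint Eq. (2) and the state dynamics Eq. (4). The value of 𝛾 and the test point are hypothetical.

```python
import numpy as np

r, γ = 0.05, 1.0                          # γ chosen only for illustration
α, ρ1, ρ2, σ = 10.0, 0.9, 0.0, 1.0

A = np.array([[1.0, 0.0, 0.0],
              [α,   ρ1,  ρ2],
              [0.0, 1.0, 0.0]])
C = np.array([[0.0], [σ], [0.0]])
U = np.array([[0.0, 1.0, 0.0]])
U_γ = np.array([[γ, 0.0, 0.0]])

# Stacked LQ matrices
A_tilde = np.block([[A, np.zeros((3, 1))],
                    [(1 + r) * (U_γ - U), np.array([[1 + r]])]])
B_tilde = np.array([[0.0], [0.0], [0.0], [1 + r]])
C_tilde = np.vstack([C, [[0.0]]])

# An arbitrary test point: state z_t, debt b_t, consumption c_t, shock w_{t+1}
z = np.array([[1.0], [95.0], [90.0]])
b, c, w = 20.0, 80.0, 0.3

x = np.vstack([z, [[b]]])
u = c - γ                                 # the control is consumption minus bliss
x_next = A_tilde @ x + B_tilde * u + C_tilde * w

b_next_budget = (1 + r) * (c + b - (U @ z).item())   # from Eq. (2)
z_next = A @ z + C * w                               # from Eq. (4)

print(np.isclose(x_next[3, 0], b_next_budget), np.allclose(x_next[:3], z_next))
```

The 𝛾 term in $U_\gamma$ offsets the fact that the control is $c_t - \gamma$ rather than $c_t$, so the implied path for debt does not depend on 𝛾.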
To map utility into the quadratic form 𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 we can set

• 𝑄 ∶= 1 (remember that we are minimizing) and


• 𝑅 ∶= a 4 × 4 matrix of zeros

However, there is one problem remaining


We have no direct way to capture the non-recursive restriction Eq. (3) on the debt sequence
{𝑏𝑡 } from within the LQ framework
To try to enforce it, we’re going to use a trick: put a small penalty on $b_t^2$ in the criterion function
In the present setting, this means adding a small entry 𝜖 > 0 in the (4, 4) position of 𝑅
That will induce a (hopefully) small approximation error in the decision rule
We’ll check whether it really is small numerically soon

43.5 Implementation

Let’s write some code to solve the model


One comment before we start is that the bliss level of consumption 𝛾 in the utility function
has no effect on the optimal decision rule
We saw this in the previous lecture permanent income

The reason is that it drops out of the Euler equation for consumption
In what follows we set it equal to unity

43.5.1 The Exogenous Nonfinancial Income Process

First, we create the objects for the optimal linear regulator

In [2]: import quantecon as qe


import numpy as np
import scipy.linalg as la
import matplotlib.pyplot as plt
%matplotlib inline

# Set parameters
α, β, ρ1, ρ2, σ = 10.0, 0.95, 0.9, 0.0, 1.0

R = 1 / β
A = np.array([[1., 0., 0.],
[α, ρ1, ρ2],
[0., 1., 0.]])
C = np.array([[0.], [σ], [0.]])
G = np.array([[0., 1., 0.]])

# Form LinearStateSpace system and pull off steady state moments


μ_z0 = np.array([[1.0], [0.0], [0.0]])
Σ_z0 = np.zeros((3, 3))
Lz = qe.LinearStateSpace(A, C, G, mu_0=μ_z0, Sigma_0=Σ_z0)
μ_z, μ_y, Σ_z, Σ_y = Lz.stationary_distributions()

# Mean vector of state for the savings problem


mxo = np.vstack([μ_z, 0.0])

# Create stationary covariance matrix of x -- start everyone off at b=0


a1 = np.zeros((3, 1))
aa = np.hstack([Σ_z, a1])
bb = np.zeros((1, 4))
sxo = np.vstack([aa, bb])

# These choices will initialize the state vector of an individual at zero debt
# and the ergodic distribution of the endowment process. Use these to create
# the Bewley economy.
mxbewley = mxo
sxbewley = sxo

The next step is to create the matrices for the LQ system

In [3]: A12 = np.zeros((3,1))


ALQ_l = np.hstack([A, A12])
ALQ_r = np.array([[0, -R, 0, R]])
ALQ = np.vstack([ALQ_l, ALQ_r])

RLQ = np.array([[0., 0., 0., 0.],


[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 1e-9]])

QLQ = np.array([1.0])
BLQ = np.array([0., 0., 0., R]).reshape(4,1)
CLQ = np.array([0., σ, 0., 0.]).reshape(4,1)
β_LQ = β

Let’s print these out and have a look at them

In [4]: print(f"A = \n {ALQ}")


print(f"B = \n {BLQ}")

print(f"R = \n {RLQ}")
print(f"Q = \n {QLQ}")

A =
[[ 1. 0. 0. 0. ]
[10. 0.9 0. 0. ]
[ 0. 1. 0. 0. ]
[ 0. -1.05263158 0. 1.05263158]]
B =
[[0. ]
[0. ]
[0. ]
[1.05263158]]
R =
[[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 0.e+00]
[0.e+00 0.e+00 0.e+00 1.e-09]]
Q =
[1.]

Now create the appropriate instance of an LQ model

In [5]: LQPI = qe.LQ(QLQ, RLQ, ALQ, BLQ, C=CLQ, beta=β_LQ)

We’ll save the implied optimal policy function and soon compare it with what we get by
employing an alternative solution method

In [6]: P, F, d = LQPI.stationary_values() # Compute value function and decision rule


ABF = ALQ - BLQ @ F # Form closed loop system

43.5.2 Comparison with the Difference Equation Approach

In our first lecture on the infinite horizon permanent income problem we used a different solu-
tion method
The method was based around

• deducing the Euler equations that are the first-order conditions with respect to con-
sumption and savings
• using the budget constraints and boundary condition to complete a system of expecta-
tional linear difference equations
• solving those equations to obtain the solution

Expressed in state space notation, the solution took the form

$$
\begin{aligned}
z_{t+1} &= A z_t + C w_{t+1} \\
b_{t+1} &= b_t + U [(I - \beta A)^{-1} (A - I)] z_t \\
y_t &= U z_t \\
c_t &= (1 - \beta) [U (I - \beta A)^{-1} z_t - b_t]
\end{aligned}
$$

Now we’ll apply the formulas in this system



In [7]: # Use the above formulas to create the optimal policies for b_{t+1} and c_t
b_pol = G @ la.inv(np.eye(3, 3) - β * A) @ (A - np.eye(3, 3))
c_pol = (1 - β) * G @ la.inv(np.eye(3, 3) - β * A)

# Create the A matrix for a LinearStateSpace instance


A_LSS1 = np.vstack([A, b_pol])
A_LSS2 = np.eye(4, 1, -3)
A_LSS = np.hstack([A_LSS1, A_LSS2])

# Create the C matrix for LSS methods


C_LSS = np.vstack([C, np.zeros(1)])

# Create the G matrix for LSS methods


G_LSS1 = np.vstack([G, c_pol])
G_LSS2 = np.vstack([np.zeros(1), -(1 - β)])
G_LSS = np.hstack([G_LSS1, G_LSS2])

# Use the following values to start everyone off at b=0, initial incomes zero
μ_0 = np.array([1., 0., 0., 0.])
Σ_0 = np.zeros((4, 4))

A_LSS calculated as we have here should equal ABF calculated above using the LQ model

In [8]: ABF - A_LSS

Out[8]: array([[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,


0.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[-9.51248175e-06, 9.51247915e-08, 3.36117263e-17,
-1.99999923e-08]])

Now compare pertinent elements of c_pol and F

In [9]: print(c_pol, "\n", -F)

[[6.55172414e+01 3.44827586e-01 1.68058632e-18]]


[[ 6.55172323e+01 3.44827677e-01 -0.00000000e+00 -5.00000190e-02]]

We have verified that the two methods give the same solution
Now let’s create instances of the LinearStateSpace class and use it to do some interesting ex-
periments
To do this, we’ll use the outcomes from our second method

43.6 Two Example Economies

In the spirit of Bewley models [16], we’ll generate panels of consumers


The examples differ only in the initial states with which we endow the consumers
All other parameter values are kept the same in the two examples

• In the first example, all consumers begin with zero nonfinancial income and zero debt

– The consumers are thus ex-ante identical



• In the second example, while all begin with zero debt, we draw their initial income levels
from the invariant distribution of nonfinancial income

– Consumers are ex-ante heterogeneous

In the first example, consumers’ nonfinancial income paths display pronounced transients
early in the sample

• these will affect outcomes in striking ways

Those transient effects will not be present in the second example


We use methods affiliated with the LinearStateSpace class to simulate the model

43.6.1 First Set of Initial Conditions

We generate 25 paths of the exogenous non-financial income process and the associated opti-
mal consumption and debt paths.
In the first set of graphs, darker lines depict a particular sample path, while the lighter lines
describe 24 other paths
A second graph plots a collection of simulations against the population distribution that we
extract from the LinearStateSpace instance LSS
Comparing sample paths with population distributions at each date 𝑡 is a useful exercise—see
our discussion of the laws of large numbers

In [10]: LSS = qe.LinearStateSpace(A_LSS, C_LSS, G_LSS, mu_0=μ_0, Sigma_0=Σ_0)

43.6.2 Population and Sample Panels

In the code below, we use the LinearStateSpace class to

• compute and plot population quantiles of the distributions of consumption and debt for
a population of consumers
• simulate a group of 25 consumers and plot sample paths on the same graph as the pop-
ulation distribution

In [11]: def income_consumption_debt_series(A, C, G, μ_0, Σ_0, T=150, npaths=25):
             """
             This function takes initial conditions (μ_0, Σ_0) and uses the LinearStateSpace
             class from QuantEcon to simulate an economy npaths times for T periods.
             It then uses that information to generate some graphs related to the discussion
             below.
             """
             LSS = qe.LinearStateSpace(A, C, G, mu_0=μ_0, Sigma_0=Σ_0)

             # Simulation/Moment Parameters
             moment_generator = LSS.moment_sequence()

             # Simulate various paths
             bsim = np.empty((npaths, T))
             csim = np.empty((npaths, T))
             ysim = np.empty((npaths, T))

             for i in range(npaths):
                 sims = LSS.simulate(T)
                 bsim[i, :] = sims[0][-1, :]   # Debt is the last state variable
                 csim[i, :] = sims[1][1, :]    # Consumption is the second observable
                 ysim[i, :] = sims[1][0, :]    # Income is the first observable

             # Get the moments
             cons_mean = np.empty(T)
             cons_var = np.empty(T)
             debt_mean = np.empty(T)
             debt_var = np.empty(T)
             for t in range(T):
                 μ_x, μ_y, Σ_x, Σ_y = next(moment_generator)
                 cons_mean[t], cons_var[t] = μ_y[1], Σ_y[1, 1]
                 debt_mean[t], debt_var[t] = μ_x[3], Σ_x[3, 3]

             return bsim, csim, ysim, cons_mean, cons_var, debt_mean, debt_var

         def consumption_income_debt_figure(bsim, csim, ysim):

             # Get T
             T = bsim.shape[1]

             # Create the first figure
             fig, ax = plt.subplots(2, 1, figsize=(10, 8))
             xvals = np.arange(T)

             # Plot consumption and income
             ax[0].plot(csim[0, :], label="c", color="b")
             ax[0].plot(ysim[0, :], label="y", color="g")
             ax[0].plot(csim.T, alpha=.1, color="b")
             ax[0].plot(ysim.T, alpha=.1, color="g")
             ax[0].legend(loc=4)
             ax[0].set(title="Nonfinancial Income, Consumption, and Debt",
                       xlabel="t", ylabel="y and c")

             # Plot debt
             ax[1].plot(bsim[0, :], label="b", color="r")
             ax[1].plot(bsim.T, alpha=.1, color="r")
             ax[1].legend(loc=4)
             ax[1].set(xlabel="t", ylabel="debt")

             fig.tight_layout()
             return fig

         def consumption_debt_fanchart(csim, cons_mean, cons_var,
                                       bsim, debt_mean, debt_var):
             # Get T
             T = bsim.shape[1]

             # Create percentiles of cross-section distributions of consumption
             cmean = np.mean(cons_mean)
             c90 = 1.65 * np.sqrt(cons_var)
             c95 = 1.96 * np.sqrt(cons_var)
             c_perc_95p, c_perc_95m = cons_mean + c95, cons_mean - c95
             c_perc_90p, c_perc_90m = cons_mean + c90, cons_mean - c90

             # Create percentiles of cross-section distributions of debt
             dmean = np.mean(debt_mean)
             d90 = 1.65 * np.sqrt(debt_var)
             d95 = 1.96 * np.sqrt(debt_var)
             d_perc_95p, d_perc_95m = debt_mean + d95, debt_mean - d95
             d_perc_90p, d_perc_90m = debt_mean + d90, debt_mean - d90

             # Create second figure
             fig, ax = plt.subplots(2, 1, figsize=(10, 8))
             xvals = np.arange(T)

             # Consumption fan
             ax[0].plot(xvals, cons_mean, color="k")
             ax[0].plot(csim.T, color="k", alpha=.25)
             ax[0].fill_between(xvals, c_perc_95m, c_perc_95p, alpha=.25, color="b")
             ax[0].fill_between(xvals, c_perc_90m, c_perc_90p, alpha=.25, color="r")
             ax[0].set(title="Consumption/Debt over time",
                       ylim=(cmean-15, cmean+15), ylabel="consumption")

             # Debt fan
             ax[1].plot(xvals, debt_mean, color="k")
             ax[1].plot(bsim.T, color="k", alpha=.25)
             ax[1].fill_between(xvals, d_perc_95m, d_perc_95p, alpha=.25, color="b")
             ax[1].fill_between(xvals, d_perc_90m, d_perc_90p, alpha=.25, color="r")
             ax[1].set(xlabel="t", ylabel="debt")

             fig.tight_layout()
             return fig

Now let’s create figures with initial conditions of zero for 𝑦0 and 𝑏0

In [12]: out = income_consumption_debt_series(A_LSS, C_LSS, G_LSS, μ_0, Σ_0)


bsim0, csim0, ysim0 = out[:3]
cons_mean0, cons_var0, debt_mean0, debt_var0 = out[3:]

consumption_income_debt_figure(bsim0, csim0, ysim0)

plt.show()

In [13]: consumption_debt_fanchart(csim0, cons_mean0, cons_var0,


bsim0, debt_mean0, debt_var0)

plt.show()

Here is what is going on in the above graphs


For our simulation, we have set initial conditions 𝑏0 = 𝑦−1 = 𝑦−2 = 0
Because 𝑦−1 = 𝑦−2 = 0, nonfinancial income 𝑦𝑡 starts far below its stationary mean 𝜇𝑦,∞ and
rises early in each simulation
Recall from the previous lecture that we can represent the optimal decision rule for consump-
tion in terms of the co-integrating relationship


$$
(1 - \beta) b_t + c_t = (1 - \beta) E_t \sum_{j=0}^{\infty} \beta^j y_{t+j}
\tag{6}
$$

So at time 0 we have


$$
c_0 = (1 - \beta) E_0 \sum_{t=0}^{\infty} \beta^t y_t
$$

This tells us that consumption starts at the income that would be paid by an annuity whose
value equals the expected discounted value of nonfinancial income at time 𝑡 = 0
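To put a number on this, the sketch below evaluates the annuity value at time 0 for the parameter values used in this lecture, once with the closed form $(1 - \beta) U (I - \beta A)^{-1} z_0$ and once by summing discounted forecasts of income directly. The truncation point of the sum is an arbitrary choice.

```python
import numpy as np

α, β, ρ1, ρ2 = 10.0, 0.95, 0.9, 0.0

A = np.array([[1.0, 0.0, 0.0],
              [α,   ρ1,  ρ2],
              [0.0, 1.0, 0.0]])
U = np.array([[0.0, 1.0, 0.0]])
z0 = np.array([[1.0], [0.0], [0.0]])      # y_0 = y_{-1} = 0 and b_0 = 0

# Closed form for initial consumption
c0 = ((1 - β) * U @ np.linalg.solve(np.eye(3) - β * A, z0)).item()

# Direct evaluation: E_0[y_t] = U A^t z_0, discounted and summed
J = 5000
z, pv = z0.copy(), 0.0
for t in range(J):
    pv += β**t * (U @ z).item()
    z = A @ z
c0_series = (1 - β) * pv

print(c0, c0_series)
print(α / (1 - ρ1))                       # long-run mean of income when ρ2 = 0
```

Initial consumption comes out at roughly 65.5, which matches the first entry of c_pol printed earlier, while current income is zero and long-run mean income is 100. The gap between consumption and current income is what drives the early borrowing described next.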
To support that level of consumption, the consumer borrows a lot early and consequently
builds up substantial debt
In fact, he or she incurs so much debt that eventually, in the stochastic steady state, he con-
sumes less each period than his nonfinancial income

He uses the gap between consumption and nonfinancial income mostly to service the interest
payments due on his debt
Thus, when we look at the panel of debt in the accompanying graph, we see that this is a
group of ex-ante identical people each of whom starts with zero debt
All of them accumulate debt in anticipation of rising nonfinancial income
They expect their nonfinancial income to rise toward the invariant distribution of income, a
consequence of our having started them at 𝑦−1 = 𝑦−2 = 0
Cointegration Residual
The following figure plots realizations of the left side of Eq. (6), which, as discussed in our
last lecture, is called the cointegrating residual
As mentioned above, the right side can be thought of as an annuity payment on the expected
present value of future income $E_t \sum_{j=0}^{\infty} \beta^j y_{t+j}$
Early along a realization, 𝑐𝑡 is approximately constant while $(1 - \beta) b_t$ and
$(1 - \beta) E_t \sum_{j=0}^{\infty} \beta^j y_{t+j}$ both rise markedly as the household’s present value of income and
borrowing rise pretty much together
This example illustrates the following point: the definition of cointegration implies that the
cointegrating residual is asymptotically covariance stationary, not covariance stationary
The cointegrating residual for the specification with zero income and zero debt initially has a
notable transient component that dominates its behavior early in the sample
By altering initial conditions, we shall remove this transient in our second example to be pre-
sented below

In [14]:
def cointegration_figure(bsim, csim):
    """
    Plots the cointegrating residual (1 - β) * b + c
    """
    # Create figure
    fig, ax = plt.subplots(figsize=(10, 8))
    # First path in solid black, all paths in translucent black
    ax.plot((1 - β) * bsim[0, :] + csim[0, :], color="k")
    ax.plot((1 - β) * bsim.T + csim.T, color="k", alpha=.1)

    ax.set(title="Cointegration of Assets and Consumption", xlabel="t")

    return fig

In [15]: cointegration_figure(bsim0, csim0)


plt.show()
724 43. OPTIMAL SAVINGS II: LQ TECHNIQUES

43.6.3 A “Borrowers and Lenders” Closed Economy

When we set 𝑦−1 = 𝑦−2 = 0 and 𝑏0 = 0 in the preceding exercise, we make debt “head north”
early in the sample
Average debt in the cross-section rises and approaches the asymptote
We can regard these as outcomes of a “small open economy” that borrows from abroad at the
fixed gross interest rate 𝑅 = 𝑟 + 1 in anticipation of rising incomes
So with the economic primitives set as above, the economy converges to a steady state in
which there is an excess aggregate supply of risk-free loans at a gross interest rate of 𝑅
This excess supply is filled by “foreign lenders” willing to make those loans
We can use virtually the same code to rig a “poor man’s Bewley [16] model” in the following
way

• As before, we start everyone at $b_0 = 0$

• But instead of starting everyone at $y_{-1} = y_{-2} = 0$, we draw $\begin{bmatrix} y_{-1} \\ y_{-2} \end{bmatrix}$ from the invariant distribution of the $\{y_t\}$ process

This rigs a closed economy in which people are borrowing and lending with each other at a
gross risk-free interest rate of 𝑅 = 𝛽 −1
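To make the idea of drawing initial conditions from the invariant distribution concrete, here is a minimal sketch. It assumes income follows a stationary AR(2) process; the parameter values and all variable names are illustrative assumptions, not the ones behind `mxbewley` and `sxbewley`, which are defined earlier in the lecture and are not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical AR(2) income process (parameter values are assumptions):
#     y_{t+1} = α + ρ1 * y_t + ρ2 * y_{t-1} + σ * w_{t+1}
α, ρ1, ρ2, σ = 10.0, 0.9, 0.0, 1.0

# Stationary mean of y
μ_y = α / (1 - ρ1 - ρ2)

# Companion form for deviations from the mean,
# z_t = (y_t - μ_y, y_{t-1} - μ_y), with z_{t+1} = A z_t + C w_{t+1}
A = np.array([[ρ1, ρ2],
              [1.0, 0.0]])
C = np.array([[σ],
              [0.0]])

# The stationary covariance Σ solves Σ = A Σ A' + C C'
Σ = solve_discrete_lyapunov(A, C @ C.T)

# Draw (y_{-1}, y_{-2}) for a small cross-section of consumers
rng = np.random.default_rng(0)
n_consumers = 5
y_init = rng.multivariate_normal([μ_y, μ_y], Σ, size=n_consumers)

print(μ_y)
print(Σ)
print(y_init)
```

Each row of `y_init` is one consumer's initial pair, so consumers differ ex ante even though they face the same income process.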

Across the group of people being analyzed, risk-free loans are in zero excess supply
We have arranged primitives so that 𝑅 = 𝛽 −1 clears the market for risk-free loans at zero
aggregate excess supply
So the risk-free loans are being made from one person to another within our closed set of
agents
There is no need for foreigners to lend to our group
Let’s have a look at the corresponding figures

In [16]: out = income_consumption_debt_series(A_LSS, C_LSS, G_LSS, mxbewley, sxbewley)


bsimb, csimb, ysimb = out[:3]
cons_meanb, cons_varb, debt_meanb, debt_varb = out[3:]

consumption_income_debt_figure(bsimb, csimb, ysimb)

plt.show()

In [17]: consumption_debt_fanchart(csimb, cons_meanb, cons_varb,


bsimb, debt_meanb, debt_varb)

plt.show()

The graphs confirm the following outcomes:

• As before, the consumption distribution spreads out over time

But now there is some initial dispersion because there is ex-ante heterogeneity in the initial draws of $\begin{bmatrix} y_{-1} \\ y_{-2} \end{bmatrix}$

• As before, the cross-section distribution of debt spreads out over time


• Unlike before, the average level of debt stays at zero, confirming that this is a closed
borrower-and-lender economy
• Now the cointegrating residual seems stationary, and not just asymptotically stationary

Let’s have a look at the cointegration figure

In [18]: cointegration_figure(bsimb, csimb)


plt.show()
44

Consumption and Tax Smoothing with Complete and Incomplete Markets

44.1 Contents

• Overview 44.2

• Background 44.3

• Model 1 (Complete Markets) 44.4

• Model 2 (One-Period Risk-Free Debt Only) 44.5

• Example: Tax Smoothing with Complete Markets 44.6

• Linear State Space Version of Complete Markets Model 44.7

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

44.2 Overview

This lecture describes two types of consumption-smoothing and tax-smoothing models

• one is in the complete markets tradition of Lucas and Stokey [90]


• the other is in the incomplete markets tradition of Hall [48] and Barro [11]

Complete markets allow a consumer or government to buy or sell claims contingent on all
possible states of the world
Incomplete markets allow a consumer or government to buy or sell only a limited set of secu-
rities, often only a single risk-free security
Hall [48] and Barro [11] both assumed that the only asset that can be traded is a risk-free
one-period bond


Hall assumed an exogenous stochastic process of nonfinancial income and an exogenous gross
interest rate on one-period risk-free debt that equals $\beta^{-1}$, where $\beta \in (0, 1)$ is also the
consumer’s intertemporal discount factor
Barro [11] made an analogous assumption about the risk-free interest rate in a tax-smoothing
model that we regard as isomorphic to Hall’s consumption-smoothing model
We maintain Hall and Barro’s assumption about the interest rate when we describe an incom-
plete markets version of our model
In addition, we extend their assumption about the interest rate to an appropriate counterpart
that we use in a “complete markets” model in the style of Lucas and Stokey [90]
While we are equally interested in consumption-smoothing and tax-smoothing models, for the
most part, we focus explicitly on consumption-smoothing versions of these models
But for each version of the consumption-smoothing model, there is a natural tax-smoothing
counterpart obtained simply by

• relabeling consumption as tax collections and nonfinancial income as government expenditures
• relabeling the consumer’s debt as the government’s assets

For elaborations on this theme, please see Optimal Savings II: LQ Techniques and later parts
of this lecture
We’ll consider two closely related alternative assumptions about the consumer’s exogenous
nonfinancial income process (or in the tax-smoothing interpretation, the government’s exoge-
nous expenditure process):

• that it obeys a finite 𝑁 state Markov chain (setting 𝑁 = 2 most of the time)
• that it is described by a linear state space model with a continuous state vector in R𝑛
driven by a Gaussian vector IID shock process

We’ll spend most of this lecture studying the finite-state Markov specification, but will briefly
treat the linear state space specification before concluding

44.2.1 Relationship to Other Lectures

This lecture can be viewed as a followup to Optimal Savings II: LQ Techniques and a warm-
up for a model of tax smoothing described in Optimal Taxation with State-Contingent Debt
Linear-quadratic versions of the Lucas-Stokey tax-smoothing model are described in Optimal
Taxation in an LQ Economy
The key difference between those lectures and this one is

• Here the decision-maker takes all prices as exogenous, meaning that his decisions do not
affect them
• In Optimal Taxation in an LQ Economy and Optimal Taxation with State-Contingent
Debt, the decision-maker – the government in the case of these lectures – recognizes
that his decisions affect prices

So these later lectures are partly about how the government should manipulate prices of gov-
ernment debt

44.3 Background

Outcomes in consumption-smoothing (or tax-smoothing) models emerge from two sources:

• a decision-maker – a consumer in the consumption-smoothing model or a government
in the tax-smoothing model – who wants to maximize an intertemporal objective function
that expresses its preference for paths of consumption (or tax collections) that are
smooth in the sense of not varying across time and Markov states
• a set of trading opportunities that allow the optimizer to transform a possibly erratic
nonfinancial income (or government expenditure) process into a smoother consumption
(or tax collections) process by purchasing or selling financial securities

In the complete markets version of the model, each period the consumer can buy or sell one-
period ahead state-contingent securities whose payoffs depend on next period’s realization of
the Markov state
In the two-state Markov chain case, there are two such securities each period
In an 𝑁-state Markov chain version of the model, 𝑁 such securities are traded each period
These state-contingent securities are commonly called Arrow securities, after Kenneth Arrow
who first theorized about them
In the incomplete markets version of the model, the consumer can buy and sell only one secu-
rity each period, a risk-free bond with gross return 𝛽 −1

44.3.1 Finite State Markov Income Process

In each version of the consumption-smoothing model, nonfinancial income is governed by a
two-state Markov chain (it’s easy to generalize this to an 𝑁 state Markov chain)
In particular, the state of the world is given by 𝑠𝑡 that follows a Markov chain with transition
probability matrix

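$$
P_{ij} = \mathbb{P}\{s_{t+1} = \bar s_j \mid s_t = \bar s_i\}
$$

Nonfinancial income $\{y_t\}$ obeys

$$
y_t =
\begin{cases}
\bar y_1 & \text{if } s_t = \bar s_1 \\
\bar y_2 & \text{if } s_t = \bar s_2
\end{cases}
$$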

A consumer wishes to maximize


$$
\mathbb{E} \left[ \sum_{t=0}^\infty \beta^t u(c_t) \right]
\quad \text{where } u(c_t) = -(c_t - \gamma)^2 \text{ and } 0 < \beta < 1 \tag{1}
$$

Remark About Isomorphism


We can regard these as Barro [11] tax-smoothing models if we set 𝑐𝑡 = 𝑇𝑡 and 𝐺𝑡 = 𝑦𝑡 , where
𝑇𝑡 is total tax collections and {𝐺𝑡 } is an exogenous government expenditures process

44.3.2 Market Structure

The two models differ in how effectively the market structure allows the consumer to trans-
fer resources across time and Markov states, there being more transfer opportunities in the
complete markets setting than in the incomplete markets setting
Watch how these differences in opportunities affect

• how smooth consumption is across time and Markov states


• how the consumer chooses to make his levels of indebtedness behave over time and
across Markov states

44.4 Model 1 (Complete Markets)

At each date 𝑡 ≥ 0, the consumer trades one-period ahead Arrow securities


We assume that prices of these securities are exogenous to the consumer (or in the tax-
smoothing version of the model, to the government)
Exogenous means that they are unaffected by the decision-maker
In Markov state 𝑠𝑡 at time 𝑡, one unit of consumption in state 𝑠𝑡+1 at time 𝑡 + 1 costs
𝑞(𝑠𝑡+1 | 𝑠𝑡 ) units of the time 𝑡 consumption good
At time 𝑡 = 0, the consumer starts with an inherited level of debt due at time 0 of 𝑏0 units of
time 0 consumption goods
The consumer’s budget constraint at 𝑡 ≥ 0 in Markov state 𝑠𝑡 is

$$
c_t + b_t \leq y(s_t) + \sum_j q(\bar s_j \mid s_t) \, b_{t+1}(\bar s_j \mid s_t)
$$

where 𝑏𝑡 is the consumer’s one-period debt that falls due at time 𝑡 and 𝑏𝑡+1 (𝑠𝑗̄ | 𝑠𝑡 ) are the
consumer’s time 𝑡 sales of the time 𝑡 + 1 consumption good in Markov state 𝑠𝑗̄ , a source of
time 𝑡 revenues
An analog of Hall’s assumption that the one-period risk-free gross interest rate is 𝛽 −1 is

𝑞(𝑠𝑗̄ | 𝑠𝑖̄ ) = 𝛽𝑃𝑖𝑗 (2)

To understand this, observe that in state $\bar s_i$ it costs $\sum_j q(\bar s_j \mid \bar s_i)$ to purchase one unit of
consumption next period for sure, i.e., no matter what state of the world occurs at $t + 1$
Hence the implied price of a risk-free claim on one unit of consumption next period is

$$
\sum_j q(\bar s_j \mid \bar s_i) = \sum_j \beta P_{ij} = \beta
$$

This confirms that Eq. (2) is a natural analog of Hall’s assumption about the risk-free one-
period interest rate
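As a quick numerical check of this pricing logic, the sketch below builds Arrow prices from Eq. (2) and confirms that each row sums to 𝛽, so the implied gross risk-free rate is 𝛽⁻¹ in every state. The values of `β` and `P` are illustrative assumptions (they match the defaults used in the code later in this lecture), and the variable names are our own.

```python
import numpy as np

# Illustrative values (assumed; same as the defaults used later in the lecture)
β = 0.96
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])

Q = β * P                              # Arrow security prices, Eq. (2)
risk_free_price = Q.sum(axis=1)        # cost of one sure unit next period, by state
risk_free_rate = 1 / risk_free_price   # implied gross risk-free rate, by state

print(risk_free_price)
print(risk_free_rate)
```

Because each row of `P` sums to one, the result does not depend on the particular transition probabilities chosen.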
First-order necessary conditions for maximizing the consumer’s expected utility are

$$
\beta \frac{u'(c_{t+1})}{u'(c_t)} \mathbb{P}\{s_{t+1} \mid s_t\} = q(s_{t+1} \mid s_t)
$$

or, under our assumption Eq. (2) on Arrow security prices,

𝑐𝑡+1 = 𝑐𝑡 (3)

Thus, our consumer sets 𝑐𝑡 = 𝑐 ̄ for all 𝑡 ≥ 0 for some value 𝑐 ̄ that it is our job now to deter-
mine
Guess: We’ll make the plausible guess that

𝑏𝑡+1 (𝑠𝑗̄ | 𝑠𝑡 = 𝑠𝑖̄ ) = 𝑏(𝑠𝑗̄ ), 𝑖 = 1, 2; 𝑗 = 1, 2 (4)

so that the amount borrowed today turns out to depend only on tomorrow’s Markov state.
(Why is this a plausible guess?)
To determine 𝑐,̄ we shall pursue the implications of the consumer’s budget constraints in each
Markov state today and our guess Eq. (4) about the consumer’s debt level choices
For 𝑡 ≥ 1, these imply

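$$
\begin{aligned}
\bar c + b(\bar s_1) &= y(\bar s_1) + q(\bar s_1 \mid \bar s_1) b(\bar s_1) + q(\bar s_2 \mid \bar s_1) b(\bar s_2) \\
\bar c + b(\bar s_2) &= y(\bar s_2) + q(\bar s_1 \mid \bar s_2) b(\bar s_1) + q(\bar s_2 \mid \bar s_2) b(\bar s_2)
\end{aligned} \tag{5}
$$

or

$$
\begin{bmatrix} b(\bar s_1) \\ b(\bar s_2) \end{bmatrix}
+ \begin{bmatrix} \bar c \\ \bar c \end{bmatrix}
= \begin{bmatrix} y(\bar s_1) \\ y(\bar s_2) \end{bmatrix}
+ \beta \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}
\begin{bmatrix} b(\bar s_1) \\ b(\bar s_2) \end{bmatrix}
$$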

These are 2 equations in the 3 unknowns 𝑐,̄ 𝑏(𝑠1̄ ), 𝑏(𝑠2̄ )


To get a third equation, we assume that at time 𝑡 = 0, 𝑏0 is the debt due; and we assume that
at time 𝑡 = 0, the Markov state is 𝑠1̄
Then the budget constraint at time 𝑡 = 0 is

𝑐 ̄ + 𝑏0 = 𝑦(𝑠1̄ ) + 𝑞(𝑠1̄ | 𝑠1̄ )𝑏(𝑠1̄ ) + 𝑞(𝑠2̄ | 𝑠1̄ )𝑏(𝑠2̄ ) (6)

If we substitute Eq. (6) into the first equation of Eq. (5) and rearrange, we discover that

𝑏(𝑠1̄ ) = 𝑏0 (7)

We can then use the second equation of Eq. (5) to deduce the restriction

𝑦(𝑠1̄ ) − 𝑦(𝑠2̄ ) + [𝑞(𝑠1̄ | 𝑠1̄ ) − 𝑞(𝑠1̄ | 𝑠2̄ ) − 1]𝑏0 + [𝑞(𝑠2̄ | 𝑠1̄ ) + 1 − 𝑞(𝑠2̄ | 𝑠2̄ )]𝑏(𝑠2̄ ) = 0, (8)

an equation in the unknown 𝑏(𝑠2̄ )


Knowing 𝑏(𝑠1̄ ) and 𝑏(𝑠2̄ ), we can solve equation Eq. (6) for the constant level of consumption
𝑐̄
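The three equations can also be handed to a linear solver directly. The following sketch stacks the two equations of Eq. (5) and Eq. (6) into one system in the unknowns (𝑐̄, 𝑏(𝑠̄1), 𝑏(𝑠̄2)) and compares the result with the closed form implied by Eq. (8). The parameter values are illustrative assumptions (they mirror the defaults used in the code below), and the matrix name `M` is our own.

```python
import numpy as np

# Illustrative parameters (assumed; they mirror the defaults used in the code below)
β, b0 = 0.96, 3.0
y1, y2 = 2.0, 1.5
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])
Q = β * P                      # Arrow prices, Eq. (2)

# Unknowns ordered as (c_bar, b1, b2)
# Row 1: first equation of (5)
# Row 2: second equation of (5)
# Row 3: time-0 budget constraint, Eq. (6)
M = np.array([[1.0, 1 - Q[0, 0], -Q[0, 1]],
              [1.0, -Q[1, 0], 1 - Q[1, 1]],
              [1.0, -Q[0, 0], -Q[0, 1]]])
rhs = np.array([y1, y2, y1 - b0])

c_bar, b1, b2 = np.linalg.solve(M, rhs)

# Closed form for b2 implied by Eq. (8), for comparison
b2_closed = (y2 - y1 - (Q[0, 0] - Q[1, 0] - 1) * b0) / (Q[0, 1] + 1 - Q[1, 1])

print(c_bar, b1, b2, b2_closed)
```

Subtracting the third row from the first reproduces Eq. (7), which is why the solver returns `b1` equal to `b0`.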

44.4.1 Key Outcomes

The preceding calculations indicate that in the complete markets version of our model, we
obtain the following striking results:

• The consumer chooses to make consumption perfectly constant across time and Markov
states

We computed the constant level of consumption 𝑐 ̄ and indicated how that level depends on
the underlying specifications of preferences, Arrow securities prices, the stochastic process of
exogenous nonfinancial income, and the initial debt level 𝑏0

• The consumer’s debt neither accumulates, nor decumulates, nor drifts – instead, the
debt level each period is an exact function of the Markov state, so in the two-state
Markov case, it switches between two values
• We have verified guess Eq. (4)

We computed how one of those debt levels depends entirely on initial debt – it equals it – and
how the other value depends on virtually all remaining parameters of the model

44.4.2 Code

Here’s some code that, among other things, contains a function called
consumption_complete()
This function computes 𝑏(𝑠1̄ ), 𝑏(𝑠2̄ ), 𝑐 ̄ as outcomes given a set of parameters, under the as-
sumption of complete markets

In [2]:
import numpy as np
import quantecon as qe
import scipy.linalg as la

class ConsumptionProblem:
    """
    The data for a consumption problem, including some default values.
    """

    def __init__(self,
                 β=.96,
                 y=[2, 1.5],
                 b0=3,
                 P=np.asarray([[.8, .2],
                               [.4, .6]])):
        """

        Parameters
        ----------

        β : discount factor
        P : 2x2 transition matrix
        y : list containing the two income levels
        b0 : debt in period 0 (= state_1 debt level)

        """
        self.β = β
        self.y = y
        self.b0 = b0
        self.P = P

def consumption_complete(cp):
    """
    Computes endogenous values for the complete market case.

    Parameters
    ----------

    cp : instance of ConsumptionProblem

    Returns
    -------

    c_bar : constant consumption
    b1 : rolled over b0
    b2 : debt in state_2

    associated with the price system

    Q = β * P

    """
    β, P, y, b0 = cp.β, cp.P, cp.y, cp.b0   # Unpack

    y1, y2 = y                              # extract income levels
    b1 = b0                                 # b1 is known to be equal to b0
    Q = β * P                               # assumed price system

    # Using equation (8) calculate b2
    b2 = (y2 - y1 - (Q[0, 0] - Q[1, 0] - 1) * b1) / (Q[0, 1] + 1 - Q[1, 1])

    # Using equation (6) calculate c_bar
    c_bar = y1 - b0 + Q[0, :] @ np.asarray([b1, b2])

    return c_bar, b1, b2

def consumption_incomplete(cp, N_simul=150):
    """
    Computes endogenous values for the incomplete market case.

    Parameters
    ----------

    cp : instance of ConsumptionProblem
    N_simul : int

    """

    β, P, y, b0 = cp.β, cp.P, cp.y, cp.b0   # Unpack

    # For the simulation define a quantecon MC class
    mc = qe.MarkovChain(P)

    # Useful variables
    y = np.asarray(y).reshape(2, 1)
    v = np.linalg.inv(np.eye(2) - β * P) @ y

    # Simulate state path
    s_path = mc.simulate(N_simul, init=0)

    # Store consumption and debt path
    b_path, c_path = np.ones(N_simul + 1), np.ones(N_simul)
    b_path[0] = b0

    # Optimal decisions from (13) and (14)
    db = ((1 - β) * v - y) / β

    for i, s in enumerate(s_path):
        c_path[i] = (1 - β) * (v - b_path[i] * np.ones((2, 1)))[s, 0]
        b_path[i + 1] = b_path[i] + db[s, 0]

    return c_path, b_path[:-1], y[s_path], s_path



Let’s test by checking that 𝑐 ̄ and 𝑏2 satisfy the budget constraint

In [3]: cp = ConsumptionProblem()
c_bar, b1, b2 = consumption_complete(cp)
debt_complete = np.asarray([b1, b2])
np.isclose(c_bar + b2 - cp.y[1] - (cp.β * cp.P)[1, :] @ debt_complete, 0)

Out[3]: True

Below, we’ll take the outcomes produced by this code – in particular the implied consumption
and debt paths – and compare them with outcomes from an incomplete markets model in the
spirit of Hall [48] and Barro [11] (and also, for those who love history, Gallatin (1807) [46])

44.5 Model 2 (One-Period Risk-Free Debt Only)

This is a version of the original models of Hall (1978) and Barro (1979) in which the decision-
maker’s ability to substitute intertemporally is constrained by his ability to buy or sell only
one security, a risk-free one-period bond bearing a constant gross interest rate that equals 𝛽 −1
Given an initial debt 𝑏0 at time 0, the consumer faces a sequence of budget constraints

𝑐𝑡 + 𝑏𝑡 = 𝑦𝑡 + 𝛽𝑏𝑡+1 , 𝑡≥0

where $\beta$ is the price at time $t$ of a risk-free claim on one unit of consumption at time $t + 1$
First-order conditions for the consumer’s problem are

$$
\sum_j u'(c_{t+1,j}) P_{ij} = u'(c_{t,i})
$$

For our assumed quadratic utility function this implies

$$
\sum_j c_{t+1,j} P_{ij} = c_{t,i} \tag{9}
$$

which is Hall’s (1978) conclusion that consumption follows a random walk


As we saw in our first lecture on the permanent income model, this leads to


$$
b_t = \mathbb{E}_t \sum_{j=0}^\infty \beta^j y_{t+j} - (1 - \beta)^{-1} c_t \tag{10}
$$

and


$$
c_t = (1 - \beta) \left[ \mathbb{E}_t \sum_{j=0}^\infty \beta^j y_{t+j} - b_t \right] \tag{11}
$$

Eq. (11) expresses $c_t$ as a net interest rate factor $1 - \beta$ times the sum of the expected
present value of nonfinancial income $\mathbb{E}_t \sum_{j=0}^\infty \beta^j y_{t+j}$ and financial wealth $-b_t$

Substituting Eq. (11) into the one-period budget constraint and rearranging leads to


$$
b_{t+1} - b_t = \beta^{-1} \left[ (1 - \beta) \mathbb{E}_t \sum_{j=0}^\infty \beta^j y_{t+j} - y_t \right] \tag{12}
$$

Now let’s do a useful calculation that will yield a convenient expression for the key term
$\mathbb{E}_t \sum_{j=0}^\infty \beta^j y_{t+j}$ in our finite Markov chain setting
Define

$$
v_t := \mathbb{E}_t \sum_{j=0}^\infty \beta^j y_{t+j}
$$

In our finite Markov chain setting, 𝑣𝑡 = 𝑣(1) when 𝑠𝑡 = 𝑠1̄ and 𝑣𝑡 = 𝑣(2) when 𝑠𝑡 = 𝑠2̄
Therefore, we can write

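$$
\begin{aligned}
v(1) &= y(1) + \beta P_{11} v(1) + \beta P_{12} v(2) \\
v(2) &= y(2) + \beta P_{21} v(1) + \beta P_{22} v(2)
\end{aligned}
$$

or

$$
\vec v = \vec y + \beta P \vec v
$$

where $\vec v = \begin{bmatrix} v(1) \\ v(2) \end{bmatrix}$ and $\vec y = \begin{bmatrix} y(1) \\ y(2) \end{bmatrix}$
We can also write the last expression as

$$
\vec v = (I - \beta P)^{-1} \vec y
$$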
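To see that this closed form really is the expected discounted sum of income, the sketch below computes it two ways: by solving the linear system, and by accumulating the terms 𝛽ʲ𝑃ʲ𝑦 of the series directly. The parameter values are illustrative assumptions, and the truncation length `J` is an arbitrary choice of ours.

```python
import numpy as np

# Illustrative values (assumed)
β = 0.96
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])
y = np.array([2.0, 1.5])

# Closed form: solve (I - βP) v = y rather than forming the inverse
v = np.linalg.solve(np.eye(2) - β * P, y)

# Direct evaluation: v = sum over j of β^j P^j y, truncated at J terms
J = 2000
v_sum = np.zeros(2)
term = y.copy()                # holds β^j P^j y, starting at j = 0
for j in range(J):
    v_sum += term
    term = β * P @ term

print(v)
print(v_sum)
```

The two vectors agree to numerical precision because the series converges geometrically at rate 𝛽.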

In our finite Markov chain setting, from expression Eq. (11), consumption at date 𝑡 when
debt is 𝑏𝑡 and the Markov state today is 𝑠𝑡 = 𝑖 is evidently

$$
c(b_t, i) = (1 - \beta) \left( [(I - \beta P)^{-1} \vec y]_i - b_t \right) \tag{13}
$$

and the increment in debt is

$$
b_{t+1} - b_t = \beta^{-1} [(1 - \beta) v(i) - y(i)] \tag{14}
$$
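The decision rules Eq. (13) and Eq. (14) should reproduce the random walk property Eq. (9). The sketch below checks this numerically: starting from an arbitrary debt level in each Markov state, it compares today's consumption with the conditional expectation of next period's consumption. The parameter values, the starting debt, and the helper names `consumption` and `next_debt` are assumptions made for illustration.

```python
import numpy as np

# Illustrative values (assumed)
β = 0.96
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])
y = np.array([2.0, 1.5])
v = np.linalg.solve(np.eye(2) - β * P, y)   # expected present value of income

def consumption(b, i):
    "Consumption given debt b and Markov state i, Eq. (13)."
    return (1 - β) * (v[i] - b)

def next_debt(b, i):
    "Next period's debt given debt b and Markov state i, Eq. (14)."
    return b + ((1 - β) * v[i] - y[i]) / β

b = 3.0                                      # arbitrary current debt
for i in range(2):
    c_today = consumption(b, i)
    b_next = next_debt(b, i)
    # Expectation over tomorrow's state j, conditional on today's state i
    expected_c_next = sum(P[i, j] * consumption(b_next, j) for j in range(2))
    print(i, c_today, expected_c_next)
```

In each state the two printed consumption values coincide, which is the martingale property behind Hall's random walk result.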

44.5.1 Summary of Outcomes

In contrast to outcomes in the complete markets model, in the incomplete markets model

• consumption drifts over time as a random walk; the level of consumption at time 𝑡 de-
pends on the level of debt that the consumer brings into the period as well as the ex-
pected discounted present value of nonfinancial income at 𝑡
• the consumer’s debt drifts upward over time in response to low realizations of nonfinan-
cial income and drifts downward over time in response to high realizations of nonfinan-
cial income

• the drift over time in the consumer’s debt and the dependence of current consumption
on today’s debt level account for the drift over time in consumption

44.5.2 The Incomplete Markets Model

The code above also contains a function called consumption_incomplete() that uses Eq. (13)
and Eq. (14) to

• simulate paths of 𝑦𝑡 , 𝑐𝑡 , 𝑏𝑡+1


• plot these against values of 𝑐,̄ 𝑏(𝑠1 ), 𝑏(𝑠2 ) found in a corresponding complete markets
economy

Let’s try this, using the same parameters in both complete and incomplete markets economies

In [4]: import matplotlib.pyplot as plt

np.random.seed(1)
N_simul = 150
cp = ConsumptionProblem()

c_bar, b1, b2 = consumption_complete(cp)


debt_complete = np.asarray([b1, b2])

c_path, debt_path, y_path, s_path = consumption_incomplete(cp, N_simul=N_simul)

fig, ax = plt.subplots(1, 2, figsize=(15, 5))

ax[0].set_title('Consumption paths')
ax[0].plot(np.arange(N_simul), c_path, label='incomplete market')
ax[0].plot(np.arange(N_simul), c_bar * np.ones(N_simul), label='complete market')
ax[0].plot(np.arange(N_simul), y_path, label='income', alpha=.6, ls='--')
ax[0].legend()
ax[0].set_xlabel('Periods')

ax[1].set_title('Debt paths')
ax[1].plot(np.arange(N_simul), debt_path, label='incomplete market')
ax[1].plot(np.arange(N_simul), debt_complete[s_path], label='complete market')
ax[1].plot(np.arange(N_simul), y_path, label='income', alpha=.6, ls='--')
ax[1].legend()
ax[1].axhline(0, color='k', ls='--')
ax[1].set_xlabel('Periods')

plt.show()

<Figure size 1500x500 with 2 Axes>

In the graphs above, for the same sample path of nonfinancial income 𝑦𝑡 , notice that

• consumption is constant when there are complete markets, but it takes a random walk
in the incomplete markets version of the model
• the consumer’s debt oscillates between two values that are functions of the Markov state
in the complete markets model, while the consumer’s debt drifts in a “unit root” fashion
in the incomplete markets economy

Using the Isomorphism


We can simply relabel variables to acquire tax-smoothing interpretations of our two models

In [5]: fig, ax = plt.subplots(1, 2, figsize=(15, 5))

ax[0].set_title('Tax collection paths')


ax[0].plot(np.arange(N_simul), c_path, label='incomplete market')
ax[0].plot(np.arange(N_simul), c_bar * np.ones(N_simul), label='complete market')
ax[0].plot(np.arange(N_simul), y_path, label='govt expenditures', alpha=.6, ls='--')
ax[0].legend()
ax[0].set_xlabel('Periods')
ax[0].set_ylim([1.4, 2.1])

ax[1].set_title('Government assets paths')


ax[1].plot(np.arange(N_simul), debt_path, label='incomplete market')
ax[1].plot(np.arange(N_simul), debt_complete[s_path], label='complete market')
ax[1].plot(np.arange(N_simul), y_path, label='govt expenditures', ls='--')
ax[1].legend()
ax[1].axhline(0, color='k', ls='--')
ax[1].set_xlabel('Periods')

plt.show()

44.6 Example: Tax Smoothing with Complete Markets

It is useful to focus on a simple tax-smoothing example with complete markets


This example will illustrate how, in a complete markets model like that of Lucas and Stokey
[90], the government purchases insurance from the private sector.

• Purchasing insurance protects the government against the need to raise taxes
too high or issue too much debt in the high government expenditure event.

We assume that government expenditures move between two values 𝐺1 < 𝐺2 , where Markov
state 1 means “peace” and Markov state 2 means “war”
The government budget constraint in Markov state 𝑖 is

$$
T_i + b_i = G_i + \sum_j Q_{ij} b_j
$$

where

𝑄𝑖𝑗 = 𝛽𝑃𝑖𝑗

is the price of one unit of output next period in state 𝑗 when today’s Markov state is 𝑖 and 𝑏𝑖
is the government’s level of assets in Markov state 𝑖
That is, 𝑏𝑖 is the amount of the one-period loans owned by the government that fall due at
time 𝑡
As above, we’ll assume that the initial Markov state is state 1
In addition, to simplify our example, we’ll set the government’s initial asset level to 0, so that
𝑏1 = 0
Here’s our code to compute a quantitative example with zero debt in peace time:

In [6]: # Parameters

β = .96
y = [1, 2]
b0 = 0
P = np.asarray([[.8, .2],
[.4, .6]])

cp = ConsumptionProblem(β, y, b0, P)
Q = β * P
N_simul = 150

c_bar, b1, b2 = consumption_complete(cp)


debt_complete = np.asarray([b1, b2])

print(f"P \n {P}")
print(f"Q \n {Q}")
print(f"Govt expenditures in peace and war = {y}")
print(f"Constant tax collections = {c_bar}")
print(f"Govt assets in two states = {debt_complete}")

msg = """
Now let's check the government's budget constraint in peace and war.
Our assumptions imply that the government always purchases 0 units of the
Arrow peace security.
"""
print(msg)

AS1 = Q[0, 1] * b2
print(f"Spending on Arrow war security in peace = {AS1}")
AS2 = Q[1, 1] * b2
print(f"Spending on Arrow war security in war = {AS2}")

print("\n")
print("Government tax collections plus asset levels in peace and war")
TB1 = c_bar + b1
print(f"T+b in peace = {TB1}")
TB2 = c_bar + b2
print(f"T+b in war = {TB2}")

print("\n")
print("Total government spending in peace and war")
G1 = y[0] + AS1
G2 = y[1] + AS2
print(f"Peace = {G1}")
print(f"War = {G2}")

print("\n")
print("Let's see ex-post and ex-ante returns on Arrow securities")

Π = np.reciprocal(Q)
exret = Π
print(f"Ex-post returns to purchase of Arrow securities = {exret}")
exant = Π * P
print(f"Ex-ante returns to purchase of Arrow securities {exant}")

P
[[0.8 0.2]

[0.4 0.6]]
Q
[[0.768 0.192]
[0.384 0.576]]
Govt expenditures in peace and war = [1, 2]
Constant tax collections = 1.3116883116883118
Govt assets in two states = [0. 1.62337662]

Now let's check the government's budget constraint in peace and war.
Our assumptions imply that the government always purchases 0 units of the
Arrow peace security.

Spending on Arrow war security in peace = 0.3116883116883117


Spending on Arrow war security in war = 0.9350649350649349

Government tax collections plus asset levels in peace and war


T+b in peace = 1.3116883116883118
T+b in war = 2.9350649350649354

Total government spending in peace and war


Peace = 1.3116883116883118
War = 2.935064935064935

Let's see ex-post and ex-ante returns on Arrow securities


Ex-post returns to purchase of Arrow securities = [[1.30208333 5.20833333]
[2.60416667 1.73611111]]
Ex-ante returns to purchase of Arrow securities [[1.04166667 1.04166667]
[1.04166667 1.04166667]]

44.6.1 Explanation

In this example, the government always purchases 0 units of the Arrow security that pays off
in peace time (Markov state 1)
But it purchases a positive amount of the security that pays off in war time (Markov state 2)
We recommend plugging the quantities computed above into the government budget constraints
in the two Markov states and staring at the results
This is an example in which the government purchases insurance against the possibility that
war breaks out or continues

• the insurance does not pay off so long as peace continues


• the insurance pays off when there is war

Exercise: try changing the Markov transition matrix so that

$$
P = \begin{bmatrix} 1 & 0 \\ .2 & .8 \end{bmatrix}
$$

Also, start the system in Markov state 2 (war) with initial government assets −10, so that the
government starts the war in debt and 𝑏2 = −10

44.7 Linear State Space Version of Complete Markets Model

Now we’ll use a setting like that in the first lecture on the permanent income model

In that model, there were

• incomplete markets: the consumer could trade only a single risk-free one-period bond
bearing gross one-period risk-free interest rate equal to 𝛽 −1
• the consumer’s exogenous nonfinancial income was governed by a linear state space
model driven by Gaussian shocks, the kind of model studied in an earlier lecture about
linear state space models

We’ll write down a complete markets counterpart of that model


So now we’ll suppose that nonfinancial income is governed by the state space system

$$
\begin{aligned}
x_{t+1} &= A x_t + C w_{t+1} \\
y_t &= S_y x_t
\end{aligned}
$$

where 𝑥𝑡 is an 𝑛 × 1 vector and 𝑤𝑡+1 ∼ 𝑁 (0, 𝐼) is IID over time


Again, as a counterpart of the Hall-Barro assumption that the risk-free gross interest rate is
𝛽 −1 , we assume the scaled prices of one-period ahead Arrow securities are

𝑝𝑡+1 (𝑥𝑡+1 | 𝑥𝑡 ) = 𝛽𝜙(𝑥𝑡+1 | 𝐴𝑥𝑡 , 𝐶𝐶 ′ ) (15)

where 𝜙(⋅ | 𝜇, Σ) is a multivariate Gaussian distribution with mean vector 𝜇 and covariance
matrix Σ
Let 𝑏(𝑥𝑡+1 ) be a vector of state-contingent debt due at 𝑡 + 1 as a function of the 𝑡 + 1 state
𝑥𝑡+1 .
Using the pricing function assumed in Eq. (15), the value at 𝑡 of 𝑏(𝑥𝑡+1 ) is

𝛽 ∫ 𝑏(𝑥𝑡+1 )𝜙(𝑥𝑡+1 | 𝐴𝑥𝑡 , 𝐶𝐶 ′ )𝑑𝑥𝑡+1 = 𝛽E𝑡 𝑏𝑡+1

In the complete markets setting, the consumer faces a sequence of budget constraints

𝑐𝑡 + 𝑏𝑡 = 𝑦𝑡 + 𝛽E𝑡 𝑏𝑡+1 , 𝑡 ≥ 0

We can solve the time 𝑡 budget constraint forward to obtain


$$
b_t = \mathbb{E}_t \sum_{j=0}^\infty \beta^j (y_{t+j} - c_{t+j})
$$

We assume as before that the consumer cares about the expected value of


$$
\sum_{t=0}^\infty \beta^t u(c_t), \quad 0 < \beta < 1
$$

In the incomplete markets version of the model, we assumed that 𝑢(𝑐𝑡 ) = −(𝑐𝑡 − 𝛾)2 , so that
the above utility functional became

$$
-\sum_{t=0}^\infty \beta^t (c_t - \gamma)^2, \quad 0 < \beta < 1
$$

But in the complete markets version, we can assume a more general form of utility function
that satisfies 𝑢′ > 0 and 𝑢″ < 0
The first-order condition for the consumer’s problem with complete markets and our assump-
tion about Arrow securities prices is

𝑢′ (𝑐𝑡+1 ) = 𝑢′ (𝑐𝑡 ) for all 𝑡 ≥ 0

which again implies 𝑐𝑡 = 𝑐 ̄ for some 𝑐 ̄


So it follows that


$$
b_t = \mathbb{E}_t \sum_{j=0}^\infty \beta^j (y_{t+j} - \bar c)
$$

or

$$
b_t = S_y (I - \beta A)^{-1} x_t - \frac{1}{1 - \beta} \bar c \tag{16}
$$

where the value of $\bar c$ satisfies

$$
\bar b_0 = S_y (I - \beta A)^{-1} x_0 - \frac{1}{1 - \beta} \bar c \tag{17}
$$

where 𝑏̄0 is an initial level of the consumer’s debt, specified as a parameter of the problem
Thus, in the complete markets version of the consumption-smoothing model, 𝑐𝑡 = 𝑐,̄ ∀𝑡 ≥ 0 is
determined by Eq. (17) and the consumer’s debt is a fixed function of the state 𝑥𝑡 described
by Eq. (16)
Here’s an example that shows how in this setting the availability of insurance against fluctu-
ating nonfinancial income allows the consumer completely to smooth consumption across time
and across states of the world

In [7]:
def complete_ss(β, b0, x0, A, C, S_y, T=12):
    """
    Computes the path of consumption and debt for the previously described
    complete markets model where exogenous income follows a linear
    state space
    """
    # Create a linear state space for simulation purposes
    # This adds "b" as a state to the linear state space system
    # so that setting the seed places shocks in same place for
    # both the complete and incomplete markets economy
    # Atilde = np.vstack([np.hstack([A, np.zeros((A.shape[0], 1))]),
    #                     np.zeros((1, A.shape[1] + 1))])
    # Ctilde = np.vstack([C, np.zeros((1, 1))])
    # S_ytilde = np.hstack([S_y, np.zeros((1, 1))])

    lss = qe.LinearStateSpace(A, C, S_y, mu_0=x0)

    # Add extra state to initial condition
    # x0 = np.hstack([x0, np.zeros(1)])

    # Compute the (I - β * A)^{-1}
    rm = la.inv(np.eye(A.shape[0]) - β * A)

    # Constant level of consumption, from Eq. (17)
    cbar = (1 - β) * (S_y @ rm @ x0 - b0)
    c_hist = np.ones(T) * cbar

    # Debt, from Eq. (16)
    x_hist, y_hist = lss.simulate(T)
    b_hist = np.squeeze(S_y @ rm @ x_hist - cbar / (1 - β))

    return c_hist, b_hist, np.squeeze(y_hist), x_hist

# Define parameters
N_simul = 150
α, ρ1, ρ2 = 10.0, 0.9, 0.0
σ = 1.0

A = np.array([[1., 0., 0.],


[α, ρ1, ρ2],
[0., 1., 0.]])
C = np.array([[0.], [σ], [0.]])
S_y = np.array([[1, 1.0, 0.]])
β, b0 = 0.95, -10.0
x0 = np.array([1.0, α / (1 - ρ1), α / (1 - ρ1)])

# Do simulation for complete markets


s = np.random.randint(0, 10000)
np.random.seed(s) # Seeds get set the same for both economies
out = complete_ss(β, b0, x0, A, C, S_y, 150)
c_hist_com, b_hist_com, y_hist_com, x_hist_com = out

fig, ax = plt.subplots(1, 2, figsize=(15, 5))

# Consumption plots
ax[0].set_title('Cons and income', fontsize=17)
ax[0].plot(np.arange(N_simul), c_hist_com, label='consumption')
ax[0].plot(np.arange(N_simul), y_hist_com, label='income', alpha=.6, linestyle='--')
ax[0].legend()
ax[0].set_xlabel('Periods')
ax[0].set_ylim([-5.0, 110])

# Debt plots
ax[1].set_title('Debt and income')
ax[1].plot(np.arange(N_simul), b_hist_com, label='debt')
ax[1].plot(np.arange(N_simul), y_hist_com, label='Income', alpha=.6, linestyle='--')
ax[1].legend()
ax[1].axhline(0, color='k')
ax[1].set_xlabel('Periods')

plt.show()

44.7.1 Interpretation of Graph

In the above graph, please note that:

• nonfinancial income fluctuates in a stationary manner


• consumption is completely constant
• the consumer’s debt fluctuates in a stationary manner; in fact, in this case, because
nonfinancial income is a first-order autoregressive process, the consumer’s debt is an
exact affine function (meaning linear plus a constant) of the consumer’s nonfinancial
income
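The affine relationship between debt and income claimed in the last bullet can be checked numerically. The following is a minimal sketch that reuses the parameter values from the code above but simulates the state directly with NumPy rather than with `qe.LinearStateSpace`; the random seed, the simulation length and the use of `np.polyfit` are choices made for this illustration only.

```python
import numpy as np

# Parameters copied from the simulation above
α, ρ1, ρ2, σ = 10.0, 0.9, 0.0, 1.0
β, b0 = 0.95, -10.0
A = np.array([[1., 0., 0.],
              [α, ρ1, ρ2],
              [0., 1., 0.]])
C = np.array([[0.], [σ], [0.]])
S_y = np.array([[1., 1., 0.]])
x0 = np.array([1.0, α / (1 - ρ1), α / (1 - ρ1)])

rm = np.linalg.inv(np.eye(3) - β * A)
cbar = ((1 - β) * (S_y @ rm @ x0 - b0)).item()

# Simulate x_{t+1} = A x_t + C w_{t+1} (seed and length are arbitrary)
rng = np.random.default_rng(0)
T = 200
x = np.empty((3, T))
x[:, 0] = x0
for t in range(T - 1):
    x[:, t + 1] = A @ x[:, t] + C[:, 0] * rng.standard_normal()

y = (S_y @ x).ravel()                          # nonfinancial income
b = (S_y @ rm @ x).ravel() - cbar / (1 - β)    # debt, as in complete_ss

# Fit b = intercept + slope * y and inspect the residuals
slope, intercept = np.polyfit(y, b, 1)
max_resid = np.max(np.abs(b - (intercept + slope * y)))
```

With `ρ2 = 0` the fitted line should reproduce debt up to rounding error, and the slope should agree with `1 / (1 - β * ρ1)`.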

44.7.2 Incomplete Markets Version

The incomplete markets version of the model with nonfinancial income being governed by a
linear state space system is described in the first lecture on the permanent income model and
the followup lecture on the permanent income model
In that version, consumption follows a random walk and the consumer’s debt follows a pro-
cess with a unit root
We leave it to the reader to apply the usual isomorphism to deduce the corresponding impli-
cations for a tax-smoothing model like Barro’s [11]

44.7.3 Government Manipulation of Arrow Securities Prices

In optimal taxation in an LQ economy and recursive optimal taxation, we study complete-


markets models in which the government recognizes that it can manipulate Arrow securities
prices
In optimal taxation with incomplete markets, we study an incomplete-markets model in
which the government manipulates asset prices
45

Optimal Savings III: Occasionally Binding Constraints

45.1 Contents

• Overview 45.2

• The Optimal Savings Problem 45.3

• Computation 45.4

• Exercises 45.5

• Solutions 45.6

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon


!pip install interpolation

45.2 Overview

Next, we study an optimal savings problem for an infinitely lived consumer—the “common
ancestor” described in [87], section 1.3
This is an essential sub-problem for many representative macroeconomic models

• [4]
• [68]
• etc.

It is related to the decision problem in the stochastic optimal growth model and yet differs in
important ways
For example, the choice problem for the agent includes an additive income term that leads to
an occasionally binding constraint
Our presentation of the model will be relatively brief


• For further details on economic intuition, implication and models, see [87]
• Proofs of all mathematical results stated below can be found in this paper

To solve the model we will use Euler equation based time iteration, similar to this lecture
This method turns out to be globally convergent under mild assumptions, even when utility is
unbounded (both above and below)
We’ll need the following imports

In [2]: import numpy as np


from quantecon.optimize import brent_max, brentq
from interpolation import interp
from numba import njit
import matplotlib.pyplot as plt
%matplotlib inline

45.2.1 References

Other useful references include [31], [33], [80], [105], [108] and [119]

45.3 The Optimal Savings Problem

Let’s write down the model and then discuss how to solve it

45.3.1 Set-Up

Consider a household that chooses a state-contingent consumption plan {𝑐𝑡 }𝑡≥0 to maximize


$$ \mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t) $$

subject to

𝑐𝑡 + 𝑎𝑡+1 ≤ 𝑅𝑎𝑡 + 𝑧𝑡 , 𝑐𝑡 ≥ 0, 𝑎𝑡 ≥ −𝑏 𝑡 = 0, 1, … (1)

Here

• 𝛽 ∈ (0, 1) is the discount factor


• 𝑎𝑡 is asset holdings at time 𝑡, with ad-hoc borrowing constraint 𝑎𝑡 ≥ −𝑏
• 𝑐𝑡 is consumption
• 𝑧𝑡 is non-capital income (wages, unemployment compensation, etc.)
• 𝑅 ∶= 1 + 𝑟, where 𝑟 > 0 is the interest rate on savings

Non-capital income {𝑧𝑡 } is assumed to be a Markov process taking values in 𝑍 ⊂ (0, ∞) with
stochastic kernel Π
This means that Π(𝑧, 𝐵) is the probability that 𝑧𝑡+1 ∈ 𝐵 given 𝑧𝑡 = 𝑧
The expectation of 𝑓(𝑧𝑡+1 ) given 𝑧𝑡 = 𝑧 is written as

$$ \int f(\acute{z}) \, \Pi(z, d\acute{z}) $$
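When the income process is a finite Markov chain, the integral against the stochastic kernel collapses to a finite sum over next-period states. The sketch below assumes the two-state chain that the lecture later uses as the default in ConsumerProblem; the helper name `conditional_expectation` is hypothetical.

```python
# Two-state income chain (the default values used later in the lecture)
Π = ((0.6, 0.4),
     (0.05, 0.95))
z_vals = (0.5, 1.0)

def conditional_expectation(f, i_z):
    """
    E[f(z') | z = z_vals[i_z]]: the integral of f against Π(z, dz')
    becomes a probability-weighted sum over the rows of Π.
    """
    return sum(f(z_next) * Π[i_z][j] for j, z_next in enumerate(z_vals))

# Expected next-period income given low and high current income
mean_low = conditional_expectation(lambda z: z, 0)
mean_high = conditional_expectation(lambda z: z, 1)
```

Here `mean_low` is 0.6 × 0.5 + 0.4 × 1.0 and `mean_high` is 0.05 × 0.5 + 0.95 × 1.0.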

We further assume that

1. 𝑟 > 0 and 𝛽𝑅 < 1


2. 𝑢 is smooth, strictly increasing and strictly concave with lim𝑐→0 𝑢′ (𝑐) = ∞ and
lim𝑐→∞ 𝑢′ (𝑐) = 0

The asset space is [−𝑏, ∞) and the state is the pair (𝑎, 𝑧) ∈ 𝑆 ∶= [−𝑏, ∞) × 𝑍
A feasible consumption path from (𝑎, 𝑧) ∈ 𝑆 is a consumption sequence {𝑐𝑡 } such that {𝑐𝑡 }
and its induced asset path {𝑎𝑡 } satisfy

1. (𝑎0 , 𝑧0 ) = (𝑎, 𝑧)
2. the feasibility constraints in Eq. (1), and
3. measurability of 𝑐𝑡 w.r.t. the filtration generated by {𝑧1 , … , 𝑧𝑡 }

The meaning of the third point is just that consumption at time 𝑡 can only be a function of
outcomes that have already been observed

45.3.2 Value Function and Euler Equation

The value function 𝑉 ∶ 𝑆 → R is defined by


$$ V(a, z) := \sup \, \mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t u(c_t) \right\} \tag{2} $$

where the supremum is over all feasible consumption paths from (𝑎, 𝑧)


An optimal consumption path from (𝑎, 𝑧) is a feasible consumption path from (𝑎, 𝑧) that at-
tains the supremum in Eq. (2)
To pin down such paths we can use a version of the Euler equation, which in the present set-
ting is

𝑢′ (𝑐𝑡 ) ≥ 𝛽𝑅 E𝑡 [𝑢′ (𝑐𝑡+1 )] (3)

and

𝑢′ (𝑐𝑡 ) = 𝛽𝑅 E𝑡 [𝑢′ (𝑐𝑡+1 )] whenever 𝑐𝑡 < 𝑅𝑎𝑡 + 𝑧𝑡 + 𝑏 (4)

In essence, this says that the natural “arbitrage” relation 𝑢′ (𝑐𝑡 ) = 𝛽𝑅 E𝑡 [𝑢′ (𝑐𝑡+1 )] holds when
the choice of current consumption is interior
Interiority means that 𝑐𝑡 is strictly less than its upper bound 𝑅𝑎𝑡 + 𝑧𝑡 + 𝑏
(The lower boundary case 𝑐𝑡 = 0 never arises at the optimum because 𝑢′ (0) = ∞)
When 𝑐𝑡 does hit the upper bound 𝑅𝑎𝑡 + 𝑧𝑡 + 𝑏, the strict inequality 𝑢′ (𝑐𝑡 ) > 𝛽𝑅 E𝑡 [𝑢′ (𝑐𝑡+1 )]
can occur because 𝑐𝑡 cannot increase sufficiently to attain equality

With some thought and effort, one can show that Eq. (3) and Eq. (4) are equivalent to

𝑢′ (𝑐𝑡 ) = max {𝛽𝑅 E𝑡 [𝑢′ (𝑐𝑡+1 )] , 𝑢′ (𝑅𝑎𝑡 + 𝑧𝑡 + 𝑏)} (5)
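One way to read Eq. (5): because 𝑢′ is strictly decreasing, taking the max of two marginal utilities is the same as taking the min of the two corresponding consumption levels. The sketch below illustrates this for log utility. It treats the discounted expected marginal utility as a given number `m`, which is a simplification made only for illustration, since in the model that term itself depends on current consumption through next period's assets. The function names are hypothetical.

```python
def du(c):
    """Marginal utility for u(c) = log(c)."""
    return 1.0 / c

def du_inv(m):
    """Inverse of marginal utility for u(c) = log(c)."""
    return 1.0 / m

def consumption_from_euler(m, c_max):
    """
    Solve u'(c) = max(m, u'(c_max)) for c, where m stands in for
    βR E_t[u'(c_{t+1})] and c_max = R a_t + z_t + b is the upper bound.
    """
    return min(du_inv(m), c_max)

# Interior case: the bound is slack and the Euler equation holds with equality
c_interior = consumption_from_euler(m=0.5, c_max=3.0)

# Binding case: consumption is stuck at the bound and u'(c) exceeds m
c_binding = consumption_from_euler(m=0.5, c_max=1.5)
```

In the interior case `du(c_interior)` equals `m`, matching Eq. (4); in the binding case `du(c_binding)` is strictly larger than `m`, matching the strict inequality allowed by Eq. (3).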

45.3.3 Optimality Results

Given our assumptions, it is known that

1. For each (𝑎, 𝑧) ∈ 𝑆, a unique optimal consumption path from (𝑎, 𝑧) exists
2. This path is the unique feasible path from (𝑎, 𝑧) satisfying the Euler equality Eq. (5)
and the transversality condition

$$ \lim_{t \to \infty} \beta^t \, \mathbb{E} \left[ u'(c_t) a_{t+1} \right] = 0 \tag{6} $$

Moreover, there exists an optimal consumption function 𝜎∗ ∶ 𝑆 → [0, ∞) such that the path
from (𝑎, 𝑧) generated by

(𝑎0 , 𝑧0 ) = (𝑎, 𝑧), 𝑧𝑡+1 ∼ Π(𝑧𝑡 , 𝑑𝑦), 𝑐𝑡 = 𝜎∗ (𝑎𝑡 , 𝑧𝑡 ) and 𝑎𝑡+1 = 𝑅𝑎𝑡 + 𝑧𝑡 − 𝑐𝑡

satisfies both Eq. (5) and Eq. (6), and hence is the unique optimal path from (𝑎, 𝑧)
In summary, to solve the optimization problem, we need to compute 𝜎∗

45.4 Computation

There are two standard ways to solve for 𝜎∗

1. Time iteration (TI) using the Euler equality


2. Value function iteration (VFI)

Let’s look at these in turn

45.4.1 Time Iteration

We can rewrite Eq. (5) to make it a statement about functions rather than random variables
In particular, consider the functional equation

$$ u' \circ \sigma(a, z) = \max \left\{ \gamma \int u' \circ \sigma \{R a + z - \sigma(a, z), \, \acute{z}\} \, \Pi(z, d\acute{z}), \; u'(R a + z + b) \right\} \tag{7} $$

where 𝛾 ∶= 𝛽𝑅 and 𝑢′ ∘ 𝜎(𝑠) ∶= 𝑢′ (𝜎(𝑠))


Equation Eq. (7) is a functional equation in 𝜎
In order to identify a solution, let 𝒞 be the set of candidate consumption functions 𝜎 ∶ 𝑆 → R
such that

• each 𝜎 ∈ 𝒞 is continuous and (weakly) increasing


• min 𝑍 ≤ 𝜎(𝑎, 𝑧) ≤ 𝑅𝑎 + 𝑧 + 𝑏 for all (𝑎, 𝑧) ∈ 𝑆

In addition, let 𝐾 ∶ 𝒞 → 𝒞 be defined as follows


For given 𝜎 ∈ 𝒞, the value 𝐾𝜎(𝑎, 𝑧) is the unique 𝑡 ∈ 𝐽 (𝑎, 𝑧) that solves

$$ u'(t) = \max \left\{ \gamma \int u' \circ \sigma \{R a + z - t, \, \acute{z}\} \, \Pi(z, d\acute{z}), \; u'(R a + z + b) \right\} \tag{8} $$

where

𝐽 (𝑎, 𝑧) ∶= {𝑡 ∈ R ∶ min 𝑍 ≤ 𝑡 ≤ 𝑅𝑎 + 𝑧 + 𝑏} (9)

We refer to 𝐾 as Coleman’s policy function operator [28]


It is known that

• 𝐾 is a contraction mapping on 𝒞 under the metric

$$ \rho(\sigma_1, \sigma_2) := \| u' \circ \sigma_1 - u' \circ \sigma_2 \| := \sup_{s \in S} | u'(\sigma_1(s)) - u'(\sigma_2(s)) | \qquad (\sigma_1, \sigma_2 \in \mathscr{C}) $$

• The metric 𝜌 is complete on 𝒞


• Convergence in 𝜌 implies uniform convergence on compacts
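To make the metric concrete, the following sketch evaluates the distance in marginal utilities between two candidate policies on a handful of states. It assumes log utility, 𝑅 = 1.01, 𝑏 = 0 and income values (0.5, 1.0); the two policies and the state grid are made up for illustration, and the maximum over a finite grid only approximates the supremum over 𝑆.

```python
def du(c):
    """Marginal utility for u(c) = log(c)."""
    return 1.0 / c

def rho(σ1, σ2, states):
    """Largest gap in marginal utilities across the given states."""
    return max(abs(du(σ1(a, z)) - du(σ2(a, z))) for a, z in states)

R, b, z_min = 1.01, 0.0, 0.5

# Two candidate policies; both respect min Z <= σ(a, z) <= R a + z + b
def σ_all(a, z):
    return R * a + z + b                       # consume all cash on hand

def σ_half(a, z):
    return max(z_min, 0.5 * (R * a + z + b))   # consume half, floored at min Z

# A small illustrative set of states (a, z)
states = [(a, z) for a in (0.0, 1.0, 4.0, 16.0) for z in (0.5, 1.0)]

distance = rho(σ_all, σ_half, states)
```

On this grid the largest gap occurs at (𝑎, 𝑧) = (0, 1), where the two policies consume 1 and 0.5 and the marginal utilities are 1 and 2.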

In consequence, 𝐾 has a unique fixed point 𝜎∗ ∈ 𝒞 and 𝐾 𝑛 𝜎 → 𝜎∗ as 𝑛 → ∞ for any 𝜎 ∈ 𝒞


By the definition of 𝐾, the fixed points of 𝐾 in 𝒞 coincide with the solutions to Eq. (7) in 𝒞
In particular, it can be shown that the path {𝑐𝑡 } generated from (𝑎0 , 𝑧0 ) ∈ 𝑆 using policy
function 𝜎∗ is the unique optimal path from (𝑎0 , 𝑧0 ) ∈ 𝑆
TL;DR The unique optimal policy can be computed by picking any 𝜎 ∈ 𝒞 and iterating with
the operator 𝐾 defined in Eq. (8)

45.4.2 Value Function Iteration

The Bellman operator for this problem is given by

$$ T v(a, z) = \max_{0 \leq c \leq R a + z + b} \left\{ u(c) + \beta \int v(R a + z - c, \acute{z}) \, \Pi(z, d\acute{z}) \right\} \tag{10} $$

We have to be careful with VFI (i.e., iterating with 𝑇 ) in this setting because 𝑢 is not as-
sumed to be bounded

• In fact typically unbounded both above and below — e.g. 𝑢(𝑐) = log 𝑐
• In which case, the standard DP theory does not apply
• 𝑇 𝑛 𝑣 is not guaranteed to converge to the value function for arbitrary continuous
bounded 𝑣

Nonetheless, we can always try the popular strategy “iterate and hope”
We can then check the outcome by comparing with that produced by TI
The latter is known to converge, as described above

45.4.3 Implementation

First, we build a class called ConsumerProblem that stores the model primitives

In [3]: class ConsumerProblem:
    """
    A class that stores primitives for the income fluctuation problem. The
    income process is assumed to be a finite state Markov chain.
    """
    def __init__(self,
                 r=0.01,                      # Interest rate
                 β=0.96,                      # Discount factor
                 Π=((0.6, 0.4),
                    (0.05, 0.95)),            # Markov matrix for z_t
                 z_vals=(0.5, 1.0),           # State space of z_t
                 b=0,                         # Borrowing constraint
                 grid_max=16,
                 grid_size=50,
                 u=np.log,                    # Utility function
                 du=njit(lambda x: 1/x)):     # Derivative of utility

        self.u, self.du = u, du
        self.r, self.R = r, 1 + r
        self.β, self.b = β, b
        self.Π, self.z_vals = np.array(Π), tuple(z_vals)
        self.asset_grid = np.linspace(-b, grid_max, grid_size)

The function operator_factory returns the operator K as specified above

In [4]: def operator_factory(cp):
    """
    A function factory for building operator K.

    Here cp is an instance of ConsumerProblem.
    """
    # === Simplify names, set up arrays === #
    R, Π, β, u, b, du = cp.R, cp.Π, cp.β, cp.u, cp.b, cp.du
    asset_grid, z_vals = cp.asset_grid, cp.z_vals
    γ = R * β

    @njit
    def euler_diff(c, a, z, i_z, σ):
        """
        The difference of the left-hand side and the right-hand side
        of the Euler Equation.
        """
        lhs = du(c)
        expectation = 0
        for i in range(len(z_vals)):
            expectation += du(interp(asset_grid, σ[:, i], R * a + z - c)) * Π[i_z, i]
        rhs = max(γ * expectation, du(R * a + z + b))

        return lhs - rhs

    @njit
    def K(σ):
        """
        The operator K.

        Iteration with this operator corresponds to time iteration on the Euler
        equation. Computes and returns the updated consumption policy
        σ. Here σ is an array of consumption values on the asset grid, with
        one column per value of z; euler_diff applies univariate linear
        interpolation over the asset grid for each possible value of z.
        """
        σ_new = np.empty_like(σ)
        for i_a in range(len(asset_grid)):
            a = asset_grid[i_a]
            for i_z in range(len(z_vals)):
                z = z_vals[i_z]
                c_star = brentq(euler_diff, 1e-8, R * a + z + b, args=(a, z, i_z, σ)).root
                σ_new[i_a, i_z] = c_star

        return σ_new

    return K

K uses linear interpolation along the asset grid to approximate the value and consumption
functions
To solve for the optimal policy function, we will write a function solve_model to iterate
and find the optimal 𝜎

In [5]: def solve_model(cp,
                tol=1e-4,
                max_iter=1000,
                verbose=True,
                print_skip=25):
    """
    Solves for the optimal policy using time iteration

    * cp is an instance of ConsumerProblem
    """
    u, β, b, R = cp.u, cp.β, cp.b, cp.R
    asset_grid, z_vals = cp.asset_grid, cp.z_vals

    # initial guess of σ
    σ = np.empty((len(asset_grid), len(z_vals)))
    for i_a, a in enumerate(asset_grid):
        for i_z, z in enumerate(z_vals):
            c_max = R * a + z + b
            σ[i_a, i_z] = c_max

    K = operator_factory(cp)

    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        σ_new = K(σ)
        error = np.max(np.abs(σ - σ_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        σ = σ_new

    if i == max_iter:
        print("Failed to converge!")

    if verbose and i < max_iter:
        print(f"\nConverged in {i} iterations.")

    return σ_new

Plotting the result using the default parameters of the ConsumerProblem class

In [6]: cp = ConsumerProblem()
σ_star = solve_model(cp)

fig, ax = plt.subplots(figsize=(10, 6))


ax.plot(cp.asset_grid, σ_star[:, 0], label=r'$\sigma^*$')
ax.set(xlabel='asset level', ylabel='optimal consumption')
ax.legend()
plt.show()

Error at iteration 25 is 0.007773142982545167.

Converged in 41 iterations.

The following exercises walk you through several applications where policy functions are com-
puted

45.5 Exercises

45.5.1 Exercise 1

Next, let’s consider how the interest rate affects consumption


Reproduce the following figure, which shows (approximately) optimal consumption policies
for different interest rates

- Other than r, all parameters are at their default values


- r steps through np.linspace(0, 0.04, 4)
- Consumption is plotted against assets for income shock fixed at the smallest value
The figure shows that higher interest rates boost savings and hence suppress consumption

45.5.2 Exercise 2

Now let’s consider the long run asset levels held by households
We’ll take r = 0.03 and otherwise use default parameters
The following figure is a 45 degree diagram showing the law of motion for assets when con-
sumption is optimal

In [7]: m = ConsumerProblem(r=0.03, grid_max=4)


K = operator_factory(m)

σ_star = solve_model(m, verbose=False)


a = m.asset_grid
R, z_vals = m.R, m.z_vals

fig, ax = plt.subplots(figsize=(10, 8))


ax.plot(a, R * a + z_vals[0] - σ_star[:, 0], label='Low income')
ax.plot(a, R * a + z_vals[1] - σ_star[:, 1], label='High income')
ax.plot(a, a, 'k--')
ax.set(xlabel='Current assets',
ylabel='Next period assets',
xlim=(0, 4), ylim=(0, 4))
ax.legend()
plt.show()

The blue line and orange line represent the function

𝑎′ = ℎ(𝑎, 𝑧) ∶= 𝑅𝑎 + 𝑧 − 𝜎∗ (𝑎, 𝑧)

when income 𝑧 takes its low and high values respectively


The dashed line is the 45 degree line
We can see from the figure that the dynamics will be stable — assets do not diverge
In fact there is a unique stationary distribution of assets that we can calculate by simulation

• Can be proved via theorem 2 of [66]


• Represents the long run dispersion of assets across households when households have
idiosyncratic shocks

Ergodicity is valid here, so stationary probabilities can be calculated by averaging over a sin-
gle long time series
Hence to approximate the stationary distribution we can simulate a long time series for assets
and histogram, as in the following figure
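The ergodicity idea can be illustrated on the income chain alone, without solving the savings problem. The sketch below simulates the default two-state chain with the standard library and compares the fraction of time spent in the low-income state with the exact stationary probability, which for a two-state chain has a closed form. The seed, the sample length and the function name are arbitrary choices for this illustration, and the sketch deliberately stops short of the asset simulation that the exercise asks for.

```python
import random

# Default Markov matrix for income from ConsumerProblem
Π = ((0.6, 0.4),
     (0.05, 0.95))

def simulate_chain(T, i0=0, seed=1234):
    """Simulate T periods of the two-state chain, starting from state i0."""
    rng = random.Random(seed)
    path = [i0]
    for _ in range(T - 1):
        i = path[-1]
        # Move to state 0 with probability Π[i][0], otherwise to state 1
        path.append(0 if rng.random() < Π[i][0] else 1)
    return path

path = simulate_chain(200_000)
freq_low = path.count(0) / len(path)      # time average from one long series

# Exact stationary probability of state 0 for a two-state chain
π_low = Π[1][0] / (Π[0][1] + Π[1][0])
```

The time average `freq_low` should be close to `π_low` = 1/9, and the gap shrinks as the series gets longer.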

Your task is to replicate the figure

• Parameters are as discussed above


• The histogram in the figure used a single time series {𝑎𝑡 } of length 500,000
• Given the length of this time series, the initial condition (𝑎0 , 𝑧0 ) will not matter
• You might find it helpful to use the MarkovChain class from quantecon

45.5.3 Exercise 3

Following on from exercises 1 and 2, let’s look at how savings and aggregate asset holdings
vary with the interest rate

• Note: [87] section 18.6 can be consulted for more background on the topic treated in
this exercise

For a given parameterization of the model, the mean of the stationary distribution can be in-
terpreted as aggregate capital in an economy with a unit mass of ex-ante identical households
facing idiosyncratic shocks
Let’s look at how this measure of aggregate capital varies with the interest rate and borrow-
ing constraint
The next figure plots aggregate capital against the interest rate for b in (1, 3)

As is traditional, the price (interest rate) is on the vertical axis


The horizontal axis is aggregate capital computed as the mean of the stationary distribution
Exercise 3 is to replicate the figure, making use of code from previous exercises
Try to explain why the measure of aggregate capital is equal to −𝑏 when 𝑟 = 0 for both cases
shown here

45.6 Solutions

45.6.1 Exercise 1

In [8]: r_vals = np.linspace(0, 0.04, 4)

fig, ax = plt.subplots(figsize=(10, 8))


for r_val in r_vals:
    cp = ConsumerProblem(r=r_val)
    σ_star = solve_model(cp, verbose=False)
    ax.plot(cp.asset_grid, σ_star[:, 0], label=f'$r = {r_val:.3f}$')

ax.set(xlabel='asset level', ylabel='consumption (low income)')


ax.legend()
plt.show()

45.6.2 Exercise 2

In [9]: from quantecon import MarkovChain

def compute_asset_series(cp, T=500000, verbose=False):
    """
    Simulates a time series of length T for assets, given optimal savings
    behavior.

    cp is an instance of ConsumerProblem
    """
    Π, z_vals, R = cp.Π, cp.z_vals, cp.R  # Simplify names
    mc = MarkovChain(Π)
    σ_star = solve_model(cp, verbose=False)
    cf = lambda a, i_z: interp(cp.asset_grid, σ_star[:, i_z], a)
    a = np.zeros(T+1)
    z_seq = mc.simulate(T)
    for t in range(T):
        i_z = z_seq[t]
        a[t+1] = R * a[t] + z_vals[i_z] - cf(a[t], i_z)
    return a

cp = ConsumerProblem(r=0.03, grid_max=4)
a = compute_asset_series(cp)

fig, ax = plt.subplots(figsize=(10, 8))


ax.hist(a, bins=20, alpha=0.5, density=True)
ax.set(xlabel='assets', xlim=(-0.05, 0.75))
plt.show()

45.6.3 Exercise 3

In [10]: M = 25
r_vals = np.linspace(0, 0.04, M)
fig, ax = plt.subplots(figsize=(10, 8))

for b in (1, 3):
    asset_mean = []
    for r_val in r_vals:
        cp = ConsumerProblem(r=r_val, b=b)
        mean = np.mean(compute_asset_series(cp, T=250000))
        asset_mean.append(mean)
    ax.plot(asset_mean, r_vals, label=f'$b = {b:d}$')
    print(f"Finished iteration b = {b:d}")

ax.set(xlabel='capital', ylabel='interest rate')


ax.grid()
ax.legend()
plt.show()

Finished iteration b = 1
Finished iteration b = 3
46

Robustness

46.1 Contents

• Overview 46.2

• The Model 46.3

• Constructing More Robust Policies 46.4

• Robustness as Outcome of a Two-Person Zero-Sum Game 46.5

• The Stochastic Case 46.6

• Implementation 46.7

• Application 46.8

• Appendix 46.9

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

46.2 Overview

This lecture modifies a Bellman equation to express a decision-maker’s doubts about transi-
tion dynamics
His specification doubts make the decision-maker want a robust decision rule
Robust means insensitive to misspecification of transition dynamics
The decision-maker has a single approximating model
He calls it approximating to acknowledge that he doesn’t completely trust it
He fears that outcomes will actually be determined by another model that he cannot describe
explicitly
All that he knows is that the actual data-generating model is in some (uncountable) set of
models that surrounds his approximating model


He quantifies the discrepancy between his approximating model and the genuine data-
generating model by using a quantity called entropy
(We’ll explain what entropy means below)
He wants a decision rule that will work well enough no matter which of those other models
actually governs outcomes
This is what it means for his decision rule to be “robust to misspecification of an approximat-
ing model”
This may sound like too much to ask for, but …
… a secret weapon is available to design robust decision rules
The secret weapon is max-min control theory
A value-maximizing decision-maker enlists the aid of an (imaginary) value-minimizing model
chooser to construct bounds on the value attained by a given decision rule under different
models of the transition dynamics
The original decision-maker uses those bounds to construct a decision rule with an assured
performance level, no matter which model actually governs outcomes

Note

In reading this lecture, please don’t think that our decision-maker is paranoid
when he conducts a worst-case analysis. By designing a rule that works well
against a worst-case, his intention is to construct a rule that will work well across
a set of models.

46.2.1 Sets of Models Imply Sets Of Values

Our “robust” decision-maker wants to know how well a given rule will work when he does not
know a single transition law …
… he wants to know sets of values that will be attained by a given decision rule 𝐹 under a set
of transition laws
Ultimately, he wants to design a decision rule 𝐹 that shapes these sets of values in ways that
he prefers
With this in mind, consider the following graph, which relates to a particular decision prob-
lem to be explained below

The figure shows a value-entropy correspondence for a particular decision rule 𝐹


The shaded set is the graph of the correspondence, which maps entropy to a set of values as-
sociated with a set of models that surround the decision-maker’s approximating model
Here

• Value refers to a sum of discounted rewards obtained by applying the decision rule 𝐹
when the state starts at some fixed initial state 𝑥0

• Entropy is a non-negative number that measures the size of a set of models surrounding
the decision-maker’s approximating model

– Entropy is zero when the set includes only the approximating model, indicating
that the decision-maker completely trusts the approximating model
– Entropy is bigger, and the set of surrounding models is bigger, the less the
decision-maker trusts the approximating model

The shaded region indicates that for all models having entropy less than or equal to the num-
ber on the horizontal axis, the value obtained will be somewhere within the indicated set of
values
Now let’s compare sets of values associated with two different decision rules, 𝐹𝑟 and 𝐹𝑏
In the next figure,

• The red set shows the value-entropy correspondence for decision rule 𝐹𝑟
• The blue set shows the value-entropy correspondence for decision rule 𝐹𝑏

The blue correspondence is skinnier than the red correspondence


This conveys the sense in which the decision rule 𝐹𝑏 is more robust than the decision rule 𝐹𝑟

• more robust means that the set of values is less sensitive to increasing misspecification
as measured by entropy

Notice that the less robust rule 𝐹𝑟 promises higher values for small misspecifications (small
entropy)
(But it is more fragile in the sense that it is more sensitive to perturbations of the approxi-
mating model)
Below we’ll explain in detail how to construct these sets of values for a given 𝐹 , but for now
here is a hint about the secret weapons we’ll use to construct these sets

• We’ll use some min problems to construct the lower bounds


• We’ll use some max problems to construct the upper bounds

We will also describe how to choose 𝐹 to shape the sets of values


This will involve crafting a skinnier set at the cost of a lower level (at least for low values of
entropy)

46.2.2 Inspiring Video

If you want to understand more about why one serious quantitative researcher is interested in
this approach, we recommend Lars Peter Hansen’s Nobel lecture

46.2.3 Other References

Our discussion in this lecture is based on

• [56]
• [52]

46.3 The Model

For simplicity, we present ideas in the context of a class of problems with linear transition
laws and quadratic objective functions
To fit in with our earlier lecture on LQ control, we will treat loss minimization rather than
value maximization
To begin, recall the infinite horizon LQ problem, where an agent chooses a sequence of con-
trols {𝑢𝑡 } to minimize


$$ \sum_{t=0}^{\infty} \beta^t \left\{ x_t' R x_t + u_t' Q u_t \right\} \tag{1} $$

subject to the linear law of motion

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡 + 𝐶𝑤𝑡+1 , 𝑡 = 0, 1, 2, … (2)

As before,

• 𝑥𝑡 is 𝑛 × 1, 𝐴 is 𝑛 × 𝑛
• 𝑢𝑡 is 𝑘 × 1, 𝐵 is 𝑛 × 𝑘
• 𝑤𝑡 is 𝑗 × 1, 𝐶 is 𝑛 × 𝑗
• 𝑅 is 𝑛 × 𝑛 and 𝑄 is 𝑘 × 𝑘

Here 𝑥𝑡 is the state, 𝑢𝑡 is the control, and 𝑤𝑡 is a shock vector


For now, we take $\{w_t\} := \{w_t\}_{t=1}^{\infty}$ to be deterministic, that is, a single fixed sequence

We also allow for model uncertainty on the part of the agent solving this optimization prob-
lem
In particular, the agent takes 𝑤𝑡 = 0 for all 𝑡 ≥ 0 as a benchmark model but admits the
possibility that this model might be wrong
As a consequence, she also considers a set of alternative models expressed in terms of se-
quences {𝑤𝑡 } that are “close” to the zero sequence
She seeks a policy that will do well enough for a set of alternative models whose members are
pinned down by sequences {𝑤𝑡 }
Soon we’ll quantify the quality of a model specification in terms of the maximal size of the
expression $\sum_{t=0}^{\infty} \beta^{t+1} w_{t+1}' w_{t+1}$

46.4 Constructing More Robust Policies

If our agent takes {𝑤𝑡 } as a given deterministic sequence, then, drawing on intuition from
earlier lectures on dynamic programming, we can anticipate Bellman equations such as

$$ J_{t-1}(x) = \min_u \left\{ x' R x + u' Q u + \beta \, J_t(A x + B u + C w_t) \right\} $$

(Here 𝐽 depends on 𝑡 because the sequence {𝑤𝑡 } is not recursive)


Our tool for studying robustness is to construct a rule that works well even if an adverse se-
quence {𝑤𝑡 } occurs
In our framework, “adverse” means “loss increasing”
As we’ll see, this will eventually lead us to construct the Bellman equation

$$ J(x) = \min_u \max_w \left\{ x' R x + u' Q u + \beta \, [J(A x + B u + C w) - \theta w' w] \right\} \tag{3} $$

Notice that we’ve added the penalty term −𝜃𝑤′ 𝑤


Since 𝑤′ 𝑤 = ‖𝑤‖², this term becomes influential when 𝑤 moves away from the origin
The penalty parameter 𝜃 controls how much we penalize the maximizing agent for “harming”
the minimizing agent
By raising 𝜃 more and more, we increasingly limit the ability of the maximizing agent to distort
outcomes relative to the approximating model
So bigger 𝜃 is implicitly associated with smaller distortion sequences {𝑤𝑡 }

46.4.1 Analyzing the Bellman Equation

So what does 𝐽 in Eq. (3) look like?


As with the ordinary LQ control model, 𝐽 takes the form 𝐽 (𝑥) = 𝑥′ 𝑃 𝑥 for some symmetric
positive definite matrix 𝑃
One of our main tasks will be to analyze and compute the matrix 𝑃
Related tasks will be to study associated feedback rules for 𝑢𝑡 and 𝑤𝑡+1
First, using matrix calculus, you will be able to verify that

$$ \max_w \left\{ (Ax + Bu + Cw)' P (Ax + Bu + Cw) - \theta w' w \right\} = (Ax + Bu)' \mathcal{D}(P) (Ax + Bu) \tag{4} $$

where

$$ \mathcal{D}(P) := P + P C (\theta I - C' P C)^{-1} C' P \tag{5} $$

and 𝐼 is a 𝑗 × 𝑗 identity matrix. Substituting this expression for the maximum into Eq. (3)
yields

$$ x' P x = \min_u \left\{ x' R x + u' Q u + \beta \, (Ax + Bu)' \mathcal{D}(P) (Ax + Bu) \right\} \tag{6} $$

Using similar mathematics, the solution to this minimization problem is 𝑢 = −𝐹 𝑥 where


$$ F := (Q + \beta B' \mathcal{D}(P) B)^{-1} \beta B' \mathcal{D}(P) A $$

Substituting this minimizer back into Eq. (6) and working through the algebra gives 𝑥′ 𝑃 𝑥 =
𝑥′ ℬ(𝒟(𝑃 ))𝑥 for all 𝑥, or, equivalently,

𝑃 = ℬ(𝒟(𝑃 ))

where 𝒟 is the operator defined in Eq. (5) and

$$ \mathcal{B}(P) := R - \beta^2 A' P B (Q + \beta B' P B)^{-1} B' P A + \beta A' P A $$

The operator ℬ is the standard (i.e., non-robust) LQ Bellman operator, and 𝑃 = ℬ(𝑃 ) is the
standard matrix Riccati equation coming from the Bellman equation — see this discussion
Under some regularity conditions (see [52]), the operator ℬ ∘ 𝒟 has a unique positive definite
fixed point, which we denote below by 𝑃 ̂
A robust policy, indexed by 𝜃, is 𝑢 = −𝐹 ̂ 𝑥 where

$$ \hat{F} := (Q + \beta B' \mathcal{D}(\hat{P}) B)^{-1} \beta B' \mathcal{D}(\hat{P}) A \tag{7} $$

We also define

$$ \hat{K} := (\theta I - C' \hat{P} C)^{-1} C' \hat{P} (A - B \hat{F}) \tag{8} $$

The interpretation of $\hat{K}$ is that $w_{t+1} = \hat{K} x_t$ on the worst-case path of $\{x_t\}$, in the sense that
this vector is the maximizer of Eq. (4) evaluated at the fixed rule $u = -\hat{F} x$
Note that 𝑃 ̂ , 𝐹 ̂ , 𝐾̂ are all determined by the primitives and 𝜃
Note also that if 𝜃 is very large, then 𝒟 is approximately equal to the identity mapping
Hence, when 𝜃 is large, 𝑃 ̂ and 𝐹 ̂ are approximately equal to their standard LQ values
Furthermore, when 𝜃 is large, 𝐾̂ is approximately equal to zero
Conversely, smaller 𝜃 is associated with greater fear of model misspecification and greater
concern for robustness
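The objects in this section can be computed by simple iteration on small examples. The sketch below is written under stated assumptions: the matrices and the values of 𝛽 and 𝜃 are invented for illustration, the fixed point of ℬ ∘ 𝒟 is found by naive iteration from 𝑃 = 0 (convergence is assumed for this well-behaved example, not guaranteed in general), and the function names are hypothetical. It is not the routine used by the quantecon library.

```python
import numpy as np

# Illustrative primitives (assumed for this sketch, not from the lecture)
β, θ = 0.95, 10.0
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.1], [0.1]])
R = np.eye(2)
Q = np.array([[1.0]])

def D_op(P, θ):
    """The operator D of Eq. (5)."""
    M = θ * np.eye(C.shape[1]) - C.T @ P @ C
    return P + P @ C @ np.linalg.solve(M, C.T @ P)

def B_op(P):
    """The standard LQ Bellman operator B."""
    S = Q + β * B.T @ P @ B
    return R - β**2 * A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A) + β * A.T @ P @ A

def solve_riccati(θ=None, tol=1e-12, max_iter=10_000):
    """Iterate P <- B(D(P)) from P = 0; θ=None skips D (standard LQ)."""
    D = (lambda P: P) if θ is None else (lambda P: D_op(P, θ))
    P = np.zeros_like(R)
    for _ in range(max_iter):
        P_new = B_op(D(P))
        done = np.max(np.abs(P_new - P)) < tol
        P = P_new
        if done:
            break
    DP = D(P)
    F = np.linalg.solve(Q + β * B.T @ DP @ B, β * B.T @ DP @ A)   # Eq. (7)
    return P, F

def worst_case_K(P, F, θ):
    """The worst-case shock rule of Eq. (8)."""
    M = θ * np.eye(C.shape[1]) - C.T @ P @ C
    return np.linalg.solve(M, C.T @ P @ (A - B @ F))

P_hat, F_hat = solve_riccati(θ)
K_hat = worst_case_K(P_hat, F_hat, θ)
P_std, F_std = solve_riccati()              # ordinary LQ solution
P_big, F_big = solve_riccati(1e8)           # very large θ
K_big = worst_case_K(P_big, F_big, 1e8)

# Numerical check of Eq. (4) at P_hat, with y standing in for Ax + Bu
rng = np.random.default_rng(0)
y = rng.standard_normal(2)

def objective(w):
    v = y + C @ w
    return v @ P_hat @ v - θ * w @ w

M = θ * np.eye(C.shape[1]) - C.T @ P_hat @ C
w_star = np.linalg.solve(M, C.T @ P_hat @ y)   # maximizer of the inner problem
lhs = objective(w_star)
rhs = y @ D_op(P_hat, θ) @ y
```

In this example `lhs` and `rhs` should agree up to rounding, `F_big` should be close to `F_std`, and `K_big` should be close to zero, in line with the large-𝜃 remarks above.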

46.5 Robustness as Outcome of a Two-Person Zero-Sum Game

What we have done above can be interpreted in terms of a two-person zero-sum game in
which 𝐹 ̂ , 𝐾̂ are Nash equilibrium objects
Agent 1 is our original agent, who seeks to minimize loss in the LQ program while admitting
the possibility of misspecification
Agent 2 is an imaginary malevolent player
Agent 2’s malevolence helps the original agent to compute bounds on his value function
across a set of models
We begin with agent 2’s problem

46.5.1 Agent 2’s Problem

Agent 2

1. knows a fixed policy 𝐹 specifying the behavior of agent 1, in the sense that 𝑢𝑡 = −𝐹 𝑥𝑡
for all 𝑡
2. responds by choosing a shock sequence {𝑤𝑡 } from a set of paths sufficiently close to the
benchmark sequence {0, 0, 0, …}

A natural way to say “sufficiently close to the zero sequence” is to restrict the summed inner
product $\sum_{t=1}^{\infty} w_t' w_t$ to be small
However, to obtain a time-invariant recursive formulation, it turns out to be convenient to
restrict a discounted inner product


$$ \sum_{t=1}^{\infty} \beta^t w_t' w_t \leq \eta \tag{9} $$

Now let 𝐹 be a fixed policy, and let 𝐽𝐹 (𝑥0 , w) be the present-value cost of that policy given
sequence w ∶= {𝑤𝑡 } and initial condition 𝑥0 ∈ R𝑛
Substituting −𝐹 𝑥𝑡 for 𝑢𝑡 in Eq. (1), this value can be written as


$$ J_F(x_0, \mathbf{w}) := \sum_{t=0}^{\infty} \beta^t x_t' (R + F' Q F) x_t \tag{10} $$

where

𝑥𝑡+1 = (𝐴 − 𝐵𝐹 )𝑥𝑡 + 𝐶𝑤𝑡+1 (11)

and the initial condition 𝑥0 is as specified in the left side of Eq. (10)
Agent 2 chooses w to maximize agent 1’s loss 𝐽𝐹 (𝑥0 , w) subject to Eq. (9)
Using a Lagrangian formulation, we can express this problem as


$$ \max_{\mathbf{w}} \sum_{t=0}^{\infty} \beta^t \left\{ x_t' (R + F' Q F) x_t - \beta \theta (w_{t+1}' w_{t+1} - \eta) \right\} $$

where {𝑥𝑡 } satisfies Eq. (11) and 𝜃 is a Lagrange multiplier on constraint Eq. (9)
For the moment, let’s take 𝜃 as fixed, allowing us to drop the constant 𝛽𝜃𝜂 term in the objec-
tive function, and hence write the problem as


$$ \max_{\mathbf{w}} \sum_{t=0}^{\infty} \beta^t \left\{ x_t' (R + F' Q F) x_t - \beta \theta w_{t+1}' w_{t+1} \right\} $$

or, equivalently,


$$ \min_{\mathbf{w}} \sum_{t=0}^{\infty} \beta^t \left\{ -x_t' (R + F' Q F) x_t + \beta \theta w_{t+1}' w_{t+1} \right\} \tag{12} $$

subject to Eq. (11)


What’s striking about this optimization problem is that it is once again an LQ discounted
dynamic programming problem, with w = {𝑤𝑡 } as the sequence of controls
The expression for the optimal policy can be found by applying the usual LQ formula (see
here)
We denote it by 𝐾(𝐹 , 𝜃), with the interpretation 𝑤𝑡+1 = 𝐾(𝐹 , 𝜃)𝑥𝑡
The remaining step for agent 2’s problem is to set 𝜃 to enforce the constraint Eq. (9), which
can be done by choosing 𝜃 = 𝜃𝜂 such that


$$ \beta \sum_{t=0}^{\infty} \beta^t x_t' K(F, \theta_\eta)' K(F, \theta_\eta) x_t = \eta \tag{13} $$

Here 𝑥𝑡 is given by Eq. (11) — which in this case becomes 𝑥𝑡+1 = (𝐴 − 𝐵𝐹 + 𝐶𝐾(𝐹 , 𝜃))𝑥𝑡

46.5.2 Using Agent 2’s Problem to Construct Bounds on the Value Sets

The Lower Bound


Define the minimized object on the right side of problem Eq. (12) as 𝑅𝜃 (𝑥0 , 𝐹 )
Because “minimizers minimize” we have

$$ R_\theta(x_0, F) \leq \sum_{t=0}^{\infty} \beta^t \left\{ -x_t' (R + F' Q F) x_t \right\} + \beta \theta \sum_{t=0}^{\infty} \beta^t w_{t+1}' w_{t+1}, $$

where 𝑥𝑡+1 = (𝐴 − 𝐵𝐹 + 𝐶𝐾(𝐹 , 𝜃))𝑥𝑡 and 𝑥0 is a given initial condition


This inequality in turn implies the inequality


$$ R_\theta(x_0, F) - \theta \, \mathrm{ent} \leq \sum_{t=0}^{\infty} \beta^t \left\{ -x_t' (R + F' Q F) x_t \right\} \tag{14} $$

where


$$ \mathrm{ent} := \beta \sum_{t=0}^{\infty} \beta^t w_{t+1}' w_{t+1} $$

The left side of inequality Eq. (14) is a straight line with slope −𝜃
Technically, it is a “separating hyperplane”
At a particular value of entropy, the line is tangent to the lower bound of values as a function
of entropy
In particular, the lower bound on the left side of Eq. (14) is attained when


$$ \mathrm{ent} = \beta \sum_{t=0}^{\infty} \beta^t x_t' K(F, \theta)' K(F, \theta) x_t \tag{15} $$

To construct the lower bound on the set of values associated with all perturbations w satisfy-
ing the entropy constraint Eq. (9) at a given entropy level, we proceed as follows:

• For a given 𝜃, solve the minimization problem Eq. (12)


• Compute the minimized value 𝑅𝜃 (𝑥0 , 𝐹 ) and the associated entropy using Eq. (15)
• Compute the lower bound on the value function 𝑅𝜃 (𝑥0 , 𝐹 ) − 𝜃 ent and plot it
against ent
• Repeat the preceding three steps for a range of values of 𝜃 to trace out the
lower bound

Note
This procedure sweeps out a set of separating hyperplanes indexed by different
values for the Lagrange multiplier 𝜃

The Upper Bound


To construct an upper bound we use a very similar procedure
We simply replace the minimization problem Eq. (12) with the maximization problem

$$
V_{\tilde\theta}(x_0, F) = \max_{\mathbf w} \sum_{t=0}^{\infty} \beta^t \left\{ -x_t' (R + F'QF) x_t - \beta \tilde\theta w_{t+1}' w_{t+1} \right\} \tag{16}
$$

where now 𝜃 ̃ > 0 penalizes the choice of w with larger entropy


(Notice that 𝜃 ̃ = −𝜃 in problem Eq. (12))
Because “maximizers maximize” we have

$$
V_{\tilde\theta}(x_0, F) \geq \sum_{t=0}^{\infty} \beta^t \left\{ -x_t' (R + F'QF) x_t \right\} - \beta \tilde\theta \sum_{t=0}^{\infty} \beta^t w_{t+1}' w_{t+1}
$$

which in turn implies the inequality

$$
V_{\tilde\theta}(x_0, F) + \tilde\theta \, \mathrm{ent} \geq \sum_{t=0}^{\infty} \beta^t \left\{ -x_t' (R + F'QF) x_t \right\} \tag{17}
$$

where


$$
\mathrm{ent} \equiv \beta \sum_{t=0}^{\infty} \beta^t w_{t+1}' w_{t+1}
$$

The left side of inequality Eq. (17) is a straight line with slope 𝜃̃
The upper bound on the left side of Eq. (17) is attained when

$$
\mathrm{ent} = \beta \sum_{t=0}^{\infty} \beta^t x_t' K(F, \tilde\theta)' K(F, \tilde\theta) x_t \tag{18}
$$

To construct the upper bound on the set of values associated with all perturbations w with a given entropy we proceed much as we did for the lower bound

• For a given 𝜃̃, solve the maximization problem Eq. (16)
• Compute the maximized value 𝑉𝜃̃ (𝑥0 , 𝐹 ) and the associated entropy using Eq. (18)
• Compute the upper bound on the value function 𝑉𝜃̃ (𝑥0 , 𝐹 ) + 𝜃̃ ent and plot it against ent
• Repeat the preceding three steps for a range of values of 𝜃̃ to trace out the upper bound

Reshaping the Set of Values


Now in the interest of reshaping these sets of values by choosing 𝐹 , we turn to agent 1’s problem

46.5.3 Agent 1’s Problem

Now we turn to agent 1, who solves


$$
\min_{\{u_t\}} \sum_{t=0}^{\infty} \beta^t \left\{ x_t' R x_t + u_t' Q u_t - \beta \theta w_{t+1}' w_{t+1} \right\} \tag{19}
$$

where {𝑤𝑡+1 } satisfies 𝑤𝑡+1 = 𝐾𝑥𝑡


In other words, agent 1 minimizes


$$
\sum_{t=0}^{\infty} \beta^t \left\{ x_t' (R - \beta \theta K'K) x_t + u_t' Q u_t \right\} \tag{20}
$$

subject to

𝑥𝑡+1 = (𝐴 + 𝐶𝐾)𝑥𝑡 + 𝐵𝑢𝑡 (21)

Once again, the expression for the optimal policy can be found here — we denote it by 𝐹 ̃

46.5.4 Nash Equilibrium

Clearly, the 𝐹 ̃ we have obtained depends on 𝐾, which, in agent 2’s problem, depended on an
initial policy 𝐹
Holding all other parameters fixed, we can represent this relationship as a mapping Φ, where

𝐹 ̃ = Φ(𝐾(𝐹 , 𝜃))

The map 𝐹 ↦ Φ(𝐾(𝐹 , 𝜃)) corresponds to a situation in which

1. agent 1 uses an arbitrary initial policy 𝐹


2. agent 2 best responds to agent 1 by choosing 𝐾(𝐹 , 𝜃)
3. agent 1 best responds to agent 2 by choosing 𝐹 ̃ = Φ(𝐾(𝐹 , 𝜃))

As you may have already guessed, the robust policy 𝐹 ̂ defined in Eq. (7) is a fixed point of
the mapping Φ
In particular, for any given 𝜃,
1. 𝐾(𝐹̂, 𝜃) = 𝐾̂, where 𝐾̂ is as given in Eq. (8)
2. Φ(𝐾̂) = 𝐹̂

A sketch of the proof is given in the appendix

46.6 The Stochastic Case

Now we turn to the stochastic case, where the sequence {𝑤𝑡 } is treated as an IID sequence of
random vectors
In this setting, we suppose that our agent is uncertain about the conditional probability distribution of 𝑤𝑡+1
The agent takes the standard normal distribution 𝑁 (0, 𝐼) as the baseline conditional distribution, while admitting the possibility that other “nearby” distributions prevail
These alternative conditional distributions of 𝑤𝑡+1 might depend nonlinearly on the history
𝑥𝑠 , 𝑠 ≤ 𝑡
To implement this idea, we need a notion of what it means for one distribution to be near
another one
Here we adopt a very useful measure of closeness for distributions known as the relative entropy, or Kullback-Leibler divergence
For densities 𝑝, 𝑞, the Kullback-Leibler divergence of 𝑞 from 𝑝 is defined as

$$
D_{KL}(p, q) := \int \ln \left[ \frac{p(x)}{q(x)} \right] p(x) \, dx
$$
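To get a feel for this definition, the integral can be approximated on a grid and compared with the well-known closed form for two univariate normal densities. This is only an illustrative sketch: the helper names, the grid, and the parameter values are assumptions chosen for the example.

```python
import numpy as np

def normal_pdf(x, μ, σ):
    return np.exp(-0.5 * ((x - μ) / σ)**2) / (σ * np.sqrt(2 * np.pi))

def kl_numeric(p, q, grid):
    """Riemann-sum approximation of ∫ ln[p(x)/q(x)] p(x) dx on a uniform grid."""
    dx = grid[1] - grid[0]
    px, qx = p(grid), q(grid)
    return np.sum(np.log(px / qx) * px) * dx

def kl_normal(μ1, σ1, μ2, σ2):
    """Closed form for the divergence between N(μ1, σ1²) and N(μ2, σ2²)."""
    return np.log(σ2 / σ1) + (σ1**2 + (μ1 - μ2)**2) / (2 * σ2**2) - 0.5

grid = np.linspace(-12, 12, 200_001)
p = lambda x: normal_pdf(x, 0.3, 1.2)    # a "nearby" distribution
q = lambda x: normal_pdf(x, 0.0, 1.0)    # the benchmark N(0, 1)

d_num = kl_numeric(p, q, grid)
d_exact = kl_normal(0.3, 1.2, 0.0, 1.0)
d_same = kl_numeric(q, q, grid)          # zero when the two densities coincide

print(d_num, d_exact, d_same)
```

The divergence is zero when the two densities coincide and grows as the mean or variance of 𝑝 moves away from the benchmark.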

Using this notation, we replace Eq. (3) with the stochastic analog

$$
J(x) = \min_u \max_{\psi \in \mathcal P} \left\{ x'Rx + u'Qu + \beta \left[ \int J(Ax + Bu + Cw) \, \psi(dw) - \theta D_{KL}(\psi, \phi) \right] \right\} \tag{22}
$$

Here 𝒫 represents the set of all densities on ℝⁿ and 𝜙 is the benchmark distribution 𝑁 (0, 𝐼)
The distribution 𝜓 is chosen as the least desirable conditional distribution in terms of next period outcomes, while taking into account the penalty term 𝜃𝐷𝐾𝐿 (𝜓, 𝜙)
This penalty term plays a role analogous to the one played by the deterministic penalty 𝜃𝑤′ 𝑤
in Eq. (3), since it discourages large deviations from the benchmark

46.6.1 Solving the Model

The maximization problem in Eq. (22) appears highly nontrivial — after all, we are maximizing over an infinite dimensional space consisting of the entire set of densities
However, it turns out that the solution is tractable, and in fact also falls within the class of
normal distributions
First, we note that 𝐽 has the form 𝐽 (𝑥) = 𝑥′ 𝑃 𝑥 + 𝑑 for some positive definite matrix 𝑃 and
constant real number 𝑑
Moreover, it turns out that if 𝐼 − 𝜃⁻¹𝐶′𝑃𝐶 is nonsingular, then

$$
\begin{aligned}
\max_{\psi \in \mathcal P} & \left\{ \int (Ax + Bu + Cw)' P (Ax + Bu + Cw) \, \psi(dw) - \theta D_{KL}(\psi, \phi) \right\} \\
& = (Ax + Bu)' \mathcal D(P) (Ax + Bu) + \kappa(\theta, P)
\end{aligned} \tag{23}
$$

where

$$
\kappa(\theta, P) := \theta \ln [ \det(I - \theta^{-1} C'PC)^{-1} ]
$$

and the maximizer is the Gaussian distribution

$$
\psi = N \left( (\theta I - C'PC)^{-1} C'P(Ax + Bu), \; (I - \theta^{-1} C'PC)^{-1} \right) \tag{24}
$$
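The objects in Eq. (23) and Eq. (24) are simple matrix expressions. The sketch below codes them with NumPy, assuming the definition 𝒟(𝑃) = 𝑃 + 𝑃𝐶(𝜃𝐼 − 𝐶′𝑃𝐶)⁻¹𝐶′𝑃 given earlier in this lecture; the function names and the small matrices are hypothetical and are not the `RBLQ` methods.

```python
import numpy as np

def d_operator(P, C, θ):
    """𝒟(P) = P + P C (θ I - C'PC)^{-1} C'P (definition assumed from earlier)."""
    j = C.shape[1]
    S = θ * np.eye(j) - C.T @ P @ C
    return P + P @ C @ np.linalg.solve(S, C.T @ P)

def kappa(P, C, θ):
    """κ(θ, P) = θ ln det[(I - C'PC/θ)^{-1}] = -θ ln det(I - C'PC/θ)."""
    j = C.shape[1]
    sign, logdet = np.linalg.slogdet(np.eye(j) - C.T @ P @ C / θ)
    if sign <= 0:
        # a positive determinant is necessary for the formula to make sense
        raise ValueError("det(I - C'PC/θ) is not positive; θ is too small")
    return -θ * logdet

def worst_case_distribution(P, C, θ, z):
    """Mean and covariance in Eq. (24), with z standing in for Ax + Bu."""
    j = C.shape[1]
    mean = np.linalg.solve(θ * np.eye(j) - C.T @ P @ C, C.T @ P @ z)
    cov = np.linalg.inv(np.eye(j) - C.T @ P @ C / θ)
    return mean, cov

# Hypothetical inputs
P = np.array([[2.0, 0.3],
              [0.3, 1.0]])
C = np.array([[0.5],
              [0.1]])
z = np.array([1.0, -1.0])

for θ in (1.0, 10.0, 1e6):
    mean, cov = worst_case_distribution(P, C, θ, z)
    print(θ, kappa(P, C, θ), mean, cov)
```

As 𝜃 grows, the worst-case mean shrinks to zero, the covariance approaches the identity, 𝒟(𝑃) approaches 𝑃, and 𝜅 approaches trace(𝐶′𝑃𝐶), which is the constant term of the ordinary LQ problem.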

Substituting the expression for the maximum into Bellman equation Eq. (22) and using
𝐽 (𝑥) = 𝑥′ 𝑃 𝑥 + 𝑑 gives

$$
x'Px + d = \min_u \left\{ x'Rx + u'Qu + \beta (Ax + Bu)' \mathcal D(P) (Ax + Bu) + \beta [d + \kappa(\theta, P)] \right\} \tag{25}
$$

Since constant terms do not affect minimizers, the solution is the same as Eq. (6), leading to

𝑥′ 𝑃 𝑥 + 𝑑 = 𝑥′ ℬ(𝒟(𝑃 ))𝑥 + 𝛽 [𝑑 + 𝜅(𝜃, 𝑃 )]

To solve this Bellman equation, we take 𝑃̂ to be the positive definite fixed point of ℬ ∘ 𝒟
In addition, we take 𝑑̂ as the real number solving 𝑑 = 𝛽 [𝑑 + 𝜅(𝜃, 𝑃̂)], which is

$$
\hat d := \frac{\beta}{1 - \beta} \kappa(\theta, \hat P) \tag{26}
$$

The robust policy in this stochastic case is the minimizer in Eq. (25), which is once again 𝑢 =
−𝐹 ̂ 𝑥 for 𝐹 ̂ given by Eq. (7)
Substituting the robust policy into Eq. (24) we obtain the worst-case shock distribution:

$$
w_{t+1} \sim N \left( \hat K x_t, \; (I - \theta^{-1} C' \hat P C)^{-1} \right)
$$

where 𝐾̂ is given by Eq. (8)


Note that the mean of the worst-case shock distribution is equal to the same worst-case 𝑤𝑡+1
as in the earlier deterministic setting

46.6.2 Computing Other Quantities

Before turning to implementation, we briefly outline how to compute several other quantities
of interest
Worst-Case Value of a Policy
One thing we will be interested in doing is holding a policy fixed and computing the discounted loss associated with that policy

So let 𝐹 be a given policy and let 𝐽𝐹 (𝑥) be the associated loss, which, by analogy with
Eq. (22), satisfies

$$
J_F(x) = \max_{\psi \in \mathcal P} \left\{ x' (R + F'QF) x + \beta \left[ \int J_F((A - BF)x + Cw) \, \psi(dw) - \theta D_{KL}(\psi, \phi) \right] \right\}
$$

Writing 𝐽𝐹 (𝑥) = 𝑥′ 𝑃𝐹 𝑥 + 𝑑𝐹 and applying the same argument used to derive Eq. (23) we get

𝑥′ 𝑃𝐹 𝑥 + 𝑑𝐹 = 𝑥′ (𝑅 + 𝐹 ′ 𝑄𝐹 )𝑥 + 𝛽 [𝑥′ (𝐴 − 𝐵𝐹 )′ 𝒟(𝑃𝐹 )(𝐴 − 𝐵𝐹 )𝑥 + 𝑑𝐹 + 𝜅(𝜃, 𝑃𝐹 )]

To solve this we take 𝑃𝐹 to be the fixed point

𝑃𝐹 = 𝑅 + 𝐹 ′ 𝑄𝐹 + 𝛽(𝐴 − 𝐵𝐹 )′ 𝒟(𝑃𝐹 )(𝐴 − 𝐵𝐹 )

and

$$
d_F := \frac{\beta}{1 - \beta} \kappa(\theta, P_F) = \frac{\beta}{1 - \beta} \, \theta \ln [ \det(I - \theta^{-1} C' P_F C)^{-1} ] \tag{27}
$$
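One simple way to approximate 𝑃𝐹 and 𝑑𝐹 is to iterate on the fixed-point equation above. The sketch below does this with NumPy, again assuming 𝒟(𝑃) = 𝑃 + 𝑃𝐶(𝜃𝐼 − 𝐶′𝑃𝐶)⁻¹𝐶′𝑃 from earlier in the lecture. Plain iteration from 𝑃 = 0 is an assumption of this sketch, not the method used by `RBLQ.evaluate_F`, and it is only expected to converge when 𝜃 is comfortably above the breakdown point; all matrices are hypothetical.

```python
import numpy as np

def worst_case_value(F, R, Q, A, B, C, β, θ, tol=1e-10, max_iter=10_000):
    """
    Iterate P <- R + F'QF + β (A - BF)' 𝒟(P) (A - BF) to a fixed point,
    then recover d_F from Eq. (27).
    """
    j = C.shape[1]
    A_F = A - B @ F
    P = np.zeros_like(A)
    for _ in range(max_iter):
        S = θ * np.eye(j) - C.T @ P @ C
        DP = P + P @ C @ np.linalg.solve(S, C.T @ P)      # 𝒟(P)
        P_new = R + F.T @ Q @ F + β * A_F.T @ DP @ A_F
        if np.max(np.abs(P_new - P)) < tol:
            break
        P = P_new
    else:
        raise RuntimeError("no convergence; θ may be below the breakdown point")
    # κ(θ, P_F) = -θ ln det(I - C' P_F C / θ); assumes the determinant is positive
    sign, logdet = np.linalg.slogdet(np.eye(j) - C.T @ P_new @ C / θ)
    d = β / (1 - β) * (-θ * logdet)
    return P_new, d

# Hypothetical two-state example with a stable closed loop A - BF
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[0.0],
              [0.1]])
R = np.eye(2)
Q = np.array([[1.0]])
F = np.array([[0.1, 0.3]])
β, θ = 0.95, 5.0

P_F, d_F = worst_case_value(F, R, Q, A, B, C, β, θ)
print(P_F, d_F)
```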

If you skip ahead to the appendix, you will be able to verify that −𝑃𝐹 is the solution to the
Bellman equation in agent 2’s problem discussed above — we use this in our computations

46.7 Implementation

The QuantEcon.py package provides a class called RBLQ for implementation of robust LQ
optimal control
The code can be found on GitHub
Here is a brief description of the methods of the class

• d_operator() and b_operator() implement 𝒟 and ℬ respectively

• robust_rule() and robust_rule_simple() both solve for the triple 𝐹̂, 𝐾̂, 𝑃̂, as described in equations Eq. (7) – Eq. (8) and the surrounding discussion

– robust_rule() is more efficient
– robust_rule_simple() is more transparent and easier to follow

• K_to_F() and F_to_K() solve the decision problems of agent 1 and agent 2 respectively

• compute_deterministic_entropy() computes the left-hand side of Eq. (13)

• evaluate_F() computes the loss and entropy associated with a given policy — see
this discussion

46.8 Application

Let us consider a monopolist similar to this one, but now facing model uncertainty
The inverse demand function is 𝑝𝑡 = 𝑎0 − 𝑎1 𝑦𝑡 + 𝑑𝑡 , where

$$
d_{t+1} = \rho d_t + \sigma_d w_{t+1}, \quad \{w_t\} \stackrel{\text{IID}}{\sim} N(0, 1)
$$

and all parameters are strictly positive


The period return function for the monopolist is

$$
r_t = p_t y_t - \gamma \frac{(y_{t+1} - y_t)^2}{2} - c y_t
$$

Its objective is to maximize expected discounted profits, or, equivalently, to minimize

$$
\mathbb E \sum_{t=0}^{\infty} \beta^t (-r_t)
$$
To form a linear regulator problem, we take the state and control to be

$$
x_t = \begin{bmatrix} 1 \\ y_t \\ d_t \end{bmatrix}
\quad \text{and} \quad
u_t = y_{t+1} - y_t
$$

Setting 𝑏 ∶= (𝑎0 − 𝑐)/2 we define

$$
R = - \begin{bmatrix} 0 & b & 0 \\ b & -a_1 & 1/2 \\ 0 & 1/2 & 0 \end{bmatrix}
\quad \text{and} \quad
Q = \gamma / 2
$$

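For the transition matrices, we set

$$
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \rho \end{bmatrix},
\qquad
B = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},
\qquad
C = \begin{bmatrix} 0 \\ 0 \\ \sigma_d \end{bmatrix}
$$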
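It is easy to check numerically that these matrices reproduce the monopolist's problem, that is, that 𝑥′𝑅𝑥 + 𝑢′𝑄𝑢 equals −𝑟𝑡 and that 𝐴𝑥 + 𝐵𝑢 + 𝐶𝑤 delivers the next state. The following sketch uses the parameter values listed just below; the helper `period_return` and the random test point are illustrative additions.

```python
import numpy as np

a_0, a_1, ρ, σ_d, c, γ = 100, 0.5, 0.9, 0.05, 2, 50.0
b = (a_0 - c) / 2

R = -np.array([[0., b, 0.],
               [b, -a_1, 0.5],
               [0., 0.5, 0.]])
Q = γ / 2
A = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., ρ]])
B = np.array([0., 1., 0.])
C = np.array([0., 0., σ_d])

def period_return(y, y_next, d):
    """r_t = p_t y_t - γ (y_{t+1} - y_t)^2 / 2 - c y_t with p_t = a_0 - a_1 y_t + d_t."""
    p = a_0 - a_1 * y + d
    return p * y - γ * (y_next - y)**2 / 2 - c * y

# An arbitrary test point
rng = np.random.default_rng(0)
y, y_next, d, w = rng.normal(size=4)
x, u = np.array([1., y, d]), y_next - y

loss = x @ R @ x + Q * u**2           # x'Rx + u'Qu
x_next = A @ x + B * u + C * w        # law of motion

print(loss, -period_return(y, y_next, d))
print(x_next, [1., y_next, ρ * d + σ_d * w])
```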

Our aim is to compute the value-entropy correspondences shown above


The parameters are

𝑎0 = 100, 𝑎1 = 0.5, 𝜌 = 0.9, 𝜎𝑑 = 0.05, 𝛽 = 0.95, 𝑐 = 2, 𝛾 = 50.0

The standard normal distribution for 𝑤𝑡 is understood as the agent’s baseline, with uncertainty parameterized by 𝜃
We compute value-entropy correspondences for two policies

1. The no concern for robustness policy 𝐹0 , which is the ordinary LQ loss minimizer
2. A “moderate” concern for robustness policy 𝐹𝑏 , with 𝜃 = 0.02

The code for producing the graph shown above, with blue being for the robust policy, is as
follows

In [2]: """
Authors: Chase Coleman, Spencer Lyon, Thomas Sargent, John Stachurski
"""
import pandas as pd
import numpy as np
from scipy.linalg import eig
import matplotlib.pyplot as plt
import quantecon as qe

# == model parameters == #

a_0 = 100
a_1 = 0.5
ρ = 0.9
σ_d = 0.05
β = 0.95
c = 2
γ = 50.0

θ = 0.002
ac = (a_0 - c) / 2.0

# == Define LQ matrices == #

R = np.array([[0., ac, 0.],
              [ac, -a_1, 0.5],
              [0., 0.5, 0.]])

R = -R  # For minimization
Q = γ / 2

A = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., ρ]])
B = np.array([[0.],
              [1.],
              [0.]])
C = np.array([[0.],
              [0.],
              [σ_d]])

# -------------------------------------------------------------------------- #
# Functions
# -------------------------------------------------------------------------- #

def evaluate_policy(θ, F):
    """
    Given θ (scalar, dtype=float) and policy F (array_like), returns the
    value associated with that policy under the worst case path for {w_t}, as
    well as the entropy level.
    """
    rlq = qe.robustlq.RBLQ(Q, R, A, B, C, β, θ)
    K_F, P_F, d_F, O_F, o_F = rlq.evaluate_F(F)
    x0 = np.array([[1.], [0.], [0.]])
    value = - x0.T @ P_F @ x0 - d_F
    entropy = x0.T @ O_F @ x0 + o_F
    return list(map(float, (value, entropy)))

def value_and_entropy(emax, F, bw, grid_size=1000):
    """
    Compute the value function and entropy levels for a θ path
    increasing until it reaches the specified target entropy value.

    Parameters
    ==========
    emax: scalar
        The target entropy value

    F: array_like
        The policy function to be evaluated

    bw: str
        A string specifying whether the implied shock path follows best
        or worst assumptions. The only acceptable values are 'best' and
        'worst'.

    Returns
    =======
    df: pd.DataFrame
        A pandas DataFrame containing the value function and entropy
        values up to the emax parameter. The columns are 'value' and
        'entropy'.

    """
    if bw == 'worst':
        θs = 1 / np.linspace(1e-8, 1000, grid_size)
    else:
        θs = -1 / np.linspace(1e-8, 1000, grid_size)

    df = pd.DataFrame(index=θs, columns=('value', 'entropy'))

    for θ in θs:
        df.loc[θ] = evaluate_policy(θ, F)
        if df.loc[θ, 'entropy'] >= emax:
            break

    df = df.dropna(how='any')
    return df

# -------------------------------------------------------------------------- #
# Main
# -------------------------------------------------------------------------- #

# == Compute the optimal rule == #

optimal_lq = qe.lqcontrol.LQ(Q, R, A, B, C, beta=β)
Po, Fo, do = optimal_lq.stationary_values()

# == Compute a robust rule given θ == #

baseline_robust = qe.robustlq.RBLQ(Q, R, A, B, C, β, θ)
Fb, Kb, Pb = baseline_robust.robust_rule()

# == Check the positive definiteness of worst-case covariance matrix to == #
# == ensure that θ exceeds the breakdown point == #
test_matrix = np.identity(Pb.shape[0]) - (C.T @ Pb @ C) / θ
eigenvals, eigenvecs = eig(test_matrix)
assert (eigenvals >= 0).all(), 'θ below breakdown point.'

emax = 1.6e6

optimal_best_case = value_and_entropy(emax, Fo, 'best')
robust_best_case = value_and_entropy(emax, Fb, 'best')
optimal_worst_case = value_and_entropy(emax, Fo, 'worst')
robust_worst_case = value_and_entropy(emax, Fb, 'worst')

fig, ax = plt.subplots()

ax.set_xlim(0, emax)
ax.set_ylabel("Value")
ax.set_xlabel("Entropy")
ax.grid()

for axis in 'x', 'y':
    plt.ticklabel_format(style='sci', axis=axis, scilimits=(0, 0))

plot_args = {'lw': 2, 'alpha': 0.7}

colors = 'r', 'b'

df_pairs = ((optimal_best_case, optimal_worst_case),
            (robust_best_case, robust_worst_case))

class Curve:

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __call__(self, z):
        return np.interp(z, self.x, self.y)

for c, df_pair in zip(colors, df_pairs):
    curves = []
    for df in df_pair:
        # == Plot curves == #
        x, y = df['entropy'], df['value']
        x, y = (np.asarray(a, dtype='float') for a in (x, y))
        egrid = np.linspace(0, emax, 100)
        curve = Curve(x, y)
        print(ax.plot(egrid, curve(egrid), color=c, **plot_args))
        curves.append(curve)
    # == Color fill between curves == #
    ax.fill_between(egrid,
                    curves[0](egrid),
                    curves[1](egrid),
                    color=c, alpha=0.1)

plt.show()

[<matplotlib.lines.Line2D object at 0x7f2e32cfa7f0>]


[<matplotlib.lines.Line2D object at 0x7f2e32cfad68>]
[<matplotlib.lines.Line2D object at 0x7f2e32cfa780>]
[<matplotlib.lines.Line2D object at 0x7f2e32d0d7b8>]

<Figure size 640x480 with 1 Axes>

Here’s another such figure, with 𝜃 = 0.002 instead of 0.02


Can you explain the different shape of the value-entropy correspondence for the robust policy?

46.9 Appendix

We sketch the proof only of the first claim in this section, which is that, for any given 𝜃,
𝐾(𝐹 ̂ , 𝜃) = 𝐾,̂ where 𝐾̂ is as given in Eq. (8)
This is the content of the next lemma
Lemma. If 𝑃 ̂ is the fixed point of the map ℬ ∘ 𝒟 and 𝐹 ̂ is the robust policy as given in
Eq. (7), then

𝐾(𝐹 ̂ , 𝜃) = (𝜃𝐼 − 𝐶 ′ 𝑃 ̂ 𝐶)−1 𝐶 ′ 𝑃 ̂ (𝐴 − 𝐵𝐹 ̂ ) (28)

Proof: As a first step, observe that when 𝐹 = 𝐹̂, the Bellman equation associated with the LQ problem Eq. (11) – Eq. (12) is

$$
\tilde P = -R - \hat F' Q \hat F - \beta^2 (A - B \hat F)' \tilde P C (\beta \theta I + \beta C' \tilde P C)^{-1} C' \tilde P (A - B \hat F) + \beta (A - B \hat F)' \tilde P (A - B \hat F) \tag{29}
$$

(revisit this discussion if you don’t know where Eq. (29) comes from) and the optimal policy
is

$$
w_{t+1} = -\beta (\beta \theta I + \beta C' \tilde P C)^{-1} C' \tilde P (A - B \hat F) x_t
$$

Suppose for a moment that −𝑃 ̂ solves the Bellman equation Eq. (29)
In this case, the policy becomes

$$
w_{t+1} = (\theta I - C' \hat P C)^{-1} C' \hat P (A - B \hat F) x_t
$$

which is exactly the claim in Eq. (28)


Hence it remains only to show that −𝑃̂ solves Eq. (29), or, in other words,

$$
\hat P = R + \hat F' Q \hat F + \beta (A - B \hat F)' \hat P C (\theta I - C' \hat P C)^{-1} C' \hat P (A - B \hat F) + \beta (A - B \hat F)' \hat P (A - B \hat F)
$$

Using the definition of 𝒟, we can rewrite the right-hand side more simply as

$$
R + \hat F' Q \hat F + \beta (A - B \hat F)' \mathcal D(\hat P) (A - B \hat F)
$$

Although it involves a substantial amount of algebra, it can be shown that the latter is just 𝑃 ̂
(Hint: Use the fact that 𝑃 ̂ = ℬ(𝒟(𝑃 ̂ )))
47

Discrete State Dynamic Programming

47.1 Contents

• Overview 47.2

• Discrete DPs 47.3

• Solving Discrete DPs 47.4

• Example: A Growth Model 47.5

• Exercises 47.6

• Solutions 47.7

• Appendix: Algorithms 47.8

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

47.2 Overview

In this lecture we discuss a family of dynamic programming problems with the following features:

1. a discrete state space and discrete choices (actions)


2. an infinite horizon
3. discounted rewards
4. Markov state transitions

We call such problems discrete dynamic programs or discrete DPs


Discrete DPs are the workhorses in much of modern quantitative economics, including

• monetary economics


• search and labor economics


• household savings and consumption theory
• investment theory
• asset pricing
• industrial organization, etc.

When a given model is not inherently discrete, it is common to replace it with a discretized
version in order to use discrete DP techniques
This lecture covers

• the theory of dynamic programming in a discrete setting, plus examples and applications
• a powerful set of routines for solving discrete DPs from the QuantEcon code library

47.2.1 How to Read this Lecture

We use dynamic programming in many applied lectures, such as

• The shortest path lecture


• The McCall search model lecture
• The optimal growth lecture

The objective of this lecture is to provide a more systematic and theoretical treatment, including algorithms and implementation, while focusing on the discrete case

47.2.2 Code

The code discussed below was authored primarily by Daisuke Oyama


Among other things, it offers

• a flexible, well-designed interface


• multiple solution methods, including value function and policy function iteration
• high-speed operations via carefully optimized JIT-compiled functions
• the ability to scale to large problems by minimizing vectorized operators and allowing
operations on sparse matrices

JIT compilation relies on Numba, which should work seamlessly if you are using Anaconda as
suggested

47.2.3 References

For background reading on dynamic programming and additional applications, see, for example,

• [87]
• [65], section 3.5

• [104]
• [123]
• [112]
• [96]
• EDTC, chapter 5

47.3 Discrete DPs

Loosely speaking, a discrete DP is a maximization problem with an objective function of the form

$$
\mathbb E \sum_{t=0}^{\infty} \beta^t r(s_t, a_t) \tag{1}
$$

where

• 𝑠𝑡 is the state variable


• 𝑎𝑡 is the action
• 𝛽 is a discount factor
• 𝑟(𝑠𝑡 , 𝑎𝑡 ) is interpreted as a current reward when the state is 𝑠𝑡 and the action chosen is
𝑎𝑡

Each pair (𝑠𝑡 , 𝑎𝑡 ) pins down transition probabilities 𝑄(𝑠𝑡 , 𝑎𝑡 , 𝑠𝑡+1 ) for the next period state
𝑠𝑡+1
Thus, actions influence not only current rewards but also the future time path of the state
The essence of dynamic programming problems is to trade off current rewards vs favorable
positioning of the future state (modulo randomness)
Examples:

• consuming today vs saving and accumulating assets


• accepting a job offer today vs seeking a better one in the future
• exercising an option now vs waiting

47.3.1 Policies

The most fruitful way to think about solutions to discrete DP problems is to compare policies
In general, a policy is a randomized map from past actions and states to current action
In the setting formalized below, it suffices to consider so-called stationary Markov policies,
which consider only the current state
In particular, a stationary Markov policy is a map 𝜎 from states to actions

• 𝑎𝑡 = 𝜎(𝑠𝑡 ) indicates that 𝑎𝑡 is the action to be taken in state 𝑠𝑡

It is known that, for any arbitrary policy, there exists a stationary Markov policy that dominates it at least weakly

• See section 5.5 of [104] for discussion and proofs

In what follows, stationary Markov policies are referred to simply as policies


The aim is to find an optimal policy, in the sense of one that maximizes Eq. (1)
Let’s now step through these ideas more carefully

47.3.2 Formal Definition

Formally, a discrete dynamic program consists of the following components:

1. A finite set of states 𝑆 = {0, … , 𝑛 − 1}
2. A finite set of feasible actions 𝐴(𝑠) for each state 𝑠 ∈ 𝑆, and a corresponding set of feasible state-action pairs

SA ∶= {(𝑠, 𝑎) ∣ 𝑠 ∈ 𝑆, 𝑎 ∈ 𝐴(𝑠)}

3. A reward function 𝑟 ∶ SA → ℝ
4. A transition probability function 𝑄 ∶ SA → Δ(𝑆), where Δ(𝑆) is the set of probability distributions over 𝑆
5. A discount factor 𝛽 ∈ [0, 1)

We also use the notation 𝐴 ∶= ⋃𝑠∈𝑆 𝐴(𝑠) = {0, … , 𝑚 − 1} and call this set the action space
A policy is a function 𝜎 ∶ 𝑆 → 𝐴
A policy is called feasible if it satisfies 𝜎(𝑠) ∈ 𝐴(𝑠) for all 𝑠 ∈ 𝑆
Denote the set of all feasible policies by Σ
If a decision-maker uses a policy 𝜎 ∈ Σ, then

• the current reward at time 𝑡 is 𝑟(𝑠𝑡 , 𝜎(𝑠𝑡 ))


• the probability that 𝑠𝑡+1 = 𝑠′ is 𝑄(𝑠𝑡 , 𝜎(𝑠𝑡 ), 𝑠′ )

For each 𝜎 ∈ Σ, define

• 𝑟𝜎 by 𝑟𝜎 (𝑠) ∶= 𝑟(𝑠, 𝜎(𝑠))


• 𝑄𝜎 by 𝑄𝜎 (𝑠, 𝑠′ ) ∶= 𝑄(𝑠, 𝜎(𝑠), 𝑠′ )

Notice that 𝑄𝜎 is a stochastic matrix on 𝑆


It gives transition probabilities of the controlled chain when we follow policy 𝜎
If we think of 𝑟𝜎 as a column vector, then so is $Q_\sigma^t r_\sigma$, and the 𝑠-th row of the latter has the interpretation

$$
(Q_\sigma^t r_\sigma)(s) = \mathbb E [ r(s_t, \sigma(s_t)) \mid s_0 = s ] \quad \text{when } \{s_t\} \sim Q_\sigma \tag{2}
$$

Comments

• {𝑠𝑡 } ∼ 𝑄𝜎 means that the state is generated by stochastic matrix 𝑄𝜎


• See this discussion on computing expectations of Markov chains for an explanation of
the expression in Eq. (2)

Notice that we’re not really distinguishing between functions from 𝑆 to ℝ and vectors in ℝⁿ
This is natural because they are in one to one correspondence

47.3.3 Value and Optimality

Let 𝑣𝜎 (𝑠) denote the discounted sum of expected reward flows from policy 𝜎 when the initial
state is 𝑠
To calculate this quantity we pass the expectation through the sum in Eq. (1) and use Eq. (2)
to get


𝑣𝜎 (𝑠) = ∑ 𝛽 𝑡 (𝑄𝑡𝜎 𝑟𝜎 )(𝑠) (𝑠 ∈ 𝑆)
𝑡=0

This function is called the policy value function for the policy 𝜎
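Because 𝑄𝜎 is a stochastic matrix and 𝛽 < 1, the series above is a convergent Neumann series, so the policy value function can also be computed as 𝑣𝜎 = (𝐼 − 𝛽𝑄𝜎)⁻¹𝑟𝜎. The sketch below checks this on a small, randomly generated problem in which every action is feasible in every state; the arrays and the policy are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, β = 4, 3, 0.9

# A hypothetical discrete DP: R[s, a] = r(s, a), Q[s, a, s'] = Q(s, a, s')
R = rng.uniform(size=(n, m))
Q = rng.uniform(size=(n, m, n))
Q /= Q.sum(axis=2, keepdims=True)       # each Q[s, a, :] is a distribution

σ = np.array([0, 2, 1, 0])              # an arbitrary policy
states = np.arange(n)
r_σ = R[states, σ]                      # r_σ(s) = r(s, σ(s))
Q_σ = Q[states, σ, :]                   # Q_σ(s, s') = Q(s, σ(s), s')

# Exact value: v_σ = (I - β Q_σ)^{-1} r_σ
v_exact = np.linalg.solve(np.eye(n) - β * Q_σ, r_σ)

# Truncated version of the series Σ_t β^t Q_σ^t r_σ
v_series, term = np.zeros(n), r_σ.copy()
for t in range(500):
    v_series += β**t * term
    term = Q_σ @ term                   # term becomes Q_σ^{t+1} r_σ

print(v_exact)
print(v_series)
```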
The optimal value function, or simply value function, is the function 𝑣∗ ∶ 𝑆 → R defined by

$$
v^*(s) = \max_{\sigma \in \Sigma} v_\sigma(s) \qquad (s \in S)
$$

(We can use max rather than sup here because the domain is a finite set)
A policy 𝜎 ∈ Σ is called optimal if 𝑣𝜎 (𝑠) = 𝑣∗ (𝑠) for all 𝑠 ∈ 𝑆
Given any 𝑤 ∶ 𝑆 → R, a policy 𝜎 ∈ Σ is called 𝑤-greedy if

$$
\sigma(s) \in \operatorname*{arg\,max}_{a \in A(s)} \left\{ r(s, a) + \beta \sum_{s' \in S} w(s') Q(s, a, s') \right\} \qquad (s \in S)
$$

As discussed in detail below, optimal policies are precisely those that are 𝑣∗ -greedy

47.3.4 Two Operators

It is useful to define the following operators:

• The Bellman operator 𝑇 ∶ ℝ^𝑆 → ℝ^𝑆 is defined by

$$
(Tv)(s) = \max_{a \in A(s)} \left\{ r(s, a) + \beta \sum_{s' \in S} v(s') Q(s, a, s') \right\} \qquad (s \in S)
$$

• For any policy function 𝜎 ∈ Σ, the operator 𝑇𝜎 ∶ ℝ^𝑆 → ℝ^𝑆 is defined by

$$
(T_\sigma v)(s) = r(s, \sigma(s)) + \beta \sum_{s' \in S} v(s') Q(s, \sigma(s), s') \qquad (s \in S)
$$

This can be written more succinctly in operator notation as

𝑇𝜎 𝑣 = 𝑟𝜎 + 𝛽𝑄𝜎 𝑣

The two operators are both monotone

• 𝑣 ≤ 𝑤 implies 𝑇 𝑣 ≤ 𝑇 𝑤 pointwise on 𝑆, and similarly for 𝑇𝜎

They are also contraction mappings with modulus 𝛽

• ‖𝑇 𝑣 − 𝑇 𝑤‖ ≤ 𝛽‖𝑣 − 𝑤‖ and similarly for 𝑇𝜎 , where ‖⋅‖ is the max norm

For any policy 𝜎, its value 𝑣𝜎 is the unique fixed point of 𝑇𝜎


For proofs of these results and those in the next section, see, for example, EDTC, chapter 10
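Both operators are easy to write with NumPy when 𝑅 is stored as an 𝑛 × 𝑚 array and 𝑄 as an 𝑛 × 𝑚 × 𝑛 array. The following sketch, with hypothetical function names and a random test problem, also checks the contraction and monotonicity properties at one pair of points.

```python
import numpy as np

def bellman_operator(v, R, Q, β):
    """(Tv)(s) = max_a { R[s, a] + β Σ_s' v(s') Q[s, a, s'] }."""
    return np.max(R + β * Q @ v, axis=1)

def policy_operator(v, σ, R, Q, β):
    """T_σ v = r_σ + β Q_σ v."""
    s = np.arange(len(v))
    return R[s, σ] + β * Q[s, σ, :] @ v

# A random test problem in which every action is feasible
rng = np.random.default_rng(2)
n, m, β = 5, 3, 0.95
R = rng.normal(size=(n, m))
Q = rng.uniform(size=(n, m, n))
Q /= Q.sum(axis=2, keepdims=True)

v, w = rng.normal(size=n), rng.normal(size=n)

# Contraction: ||Tv - Tw|| <= β ||v - w|| in the max norm
gap_before = np.max(np.abs(v - w))
gap_after = np.max(np.abs(bellman_operator(v, R, Q, β)
                          - bellman_operator(w, R, Q, β)))

# Monotonicity: lower <= v pointwise implies T lower <= T v pointwise
lower = np.minimum(v, w)
monotone = np.all(bellman_operator(lower, R, Q, β)
                  <= bellman_operator(v, R, Q, β))

# T and T_σ agree at v when σ is v-greedy
σ_greedy = np.argmax(R + β * Q @ v, axis=1)

print(gap_after, β * gap_before, monotone)
```

Infeasible state-action pairs can be handled in this layout by setting the corresponding entries of 𝑅 to −∞, which is the device used in the growth model example later in this lecture.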

47.3.5 The Bellman Equation and the Principle of Optimality

The main principle of the theory of dynamic programming is that

• the optimal value function 𝑣∗ is the unique solution to the Bellman equation

$$
v(s) = \max_{a \in A(s)} \left\{ r(s, a) + \beta \sum_{s' \in S} v(s') Q(s, a, s') \right\} \qquad (s \in S)
$$

or in other words, 𝑣∗ is the unique fixed point of 𝑇 , and

• 𝜎∗ is an optimal policy function if and only if it is 𝑣∗ -greedy

By the definition of greedy policies given above, this means that

$$
\sigma^*(s) \in \operatorname*{arg\,max}_{a \in A(s)} \left\{ r(s, a) + \beta \sum_{s' \in S} v^*(s') Q(s, a, s') \right\} \qquad (s \in S)
$$

47.4 Solving Discrete DPs

Now that the theory has been set out, let’s turn to solution methods
The code for solving discrete DPs is available in ddp.py from the QuantEcon.py code library
It implements the three most important solution methods for discrete dynamic programs,
namely

• value function iteration


• policy function iteration
• modified policy function iteration

Let’s briefly review these algorithms and their implementation

47.4.1 Value Function Iteration

Perhaps the most familiar method for solving all manner of dynamic programs is value function iteration
This algorithm uses the fact that the Bellman operator 𝑇 is a contraction mapping with fixed
point 𝑣∗
Hence, iterative application of 𝑇 to any initial function 𝑣0 ∶ 𝑆 → R converges to 𝑣∗
The details of the algorithm can be found in the appendix
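A bare-bones version of value function iteration takes only a few lines. The sketch below is not the QuantEcon implementation; it starts from 𝑣 = 0, uses a simple sup-norm stopping rule, and is applied to the storage model set out in the growth model example later in this lecture (with its default parameters).

```python
import numpy as np

def value_function_iteration(R, Q, β, tol=1e-8, max_iter=10_000):
    """Iterate v <- Tv from v = 0 until successive iterates differ by less than tol."""
    v = np.zeros(R.shape[0])
    for i in range(max_iter):
        v_new = np.max(R + β * Q @ v, axis=1)
        err = np.max(np.abs(v_new - v))
        v = v_new
        if err < tol:
            break
    σ = np.argmax(R + β * Q @ v, axis=1)     # a v-greedy policy
    return v, σ, i + 1

# The storage model from the growth model example
B, M, α, β = 10, 5, 0.5, 0.9
n, m = B + M + 1, M + 1
c = np.arange(n)[:, None] - np.arange(m)[None, :]        # consumption s - a
R = np.where(c >= 0, np.maximum(c, 0)**α, -np.inf)       # -inf marks infeasible pairs
Q = np.zeros((n, m, n))
for a in range(m):
    Q[:, a, a:a + B + 1] = 1 / (B + 1)

v, σ, num_iter = value_function_iteration(R, Q, β)
print(num_iter)
print(σ)
```

Unlike policy iteration, this returns only an approximation to 𝑣∗, with accuracy governed by `tol`.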

47.4.2 Policy Function Iteration

This routine, also known as Howard’s policy improvement algorithm, exploits more closely the
particular structure of a discrete DP problem
Each iteration consists of

1. A policy evaluation step that computes the value 𝑣𝜎 of a policy 𝜎 by solving the linear
equation 𝑣 = 𝑇𝜎 𝑣
2. A policy improvement step that computes a 𝑣𝜎 -greedy policy

In the current setting, policy iteration computes an exact optimal policy in finitely many iterations

• See theorem 10.2.6 of EDTC for a proof

The details of the algorithm can be found in the appendix
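The two steps can be sketched as follows. This is a minimal illustration rather than the QuantEcon routine: it starts from the myopic policy, stops when the greedy policy no longer changes, and is tested on a random problem in which all actions are feasible.

```python
import numpy as np

def policy_iteration(R, Q, β, max_iter=250):
    n = R.shape[0]
    s = np.arange(n)
    σ = R.argmax(axis=1)                    # start from the myopic policy
    for i in range(max_iter):
        # Policy evaluation: solve v = r_σ + β Q_σ v
        v = np.linalg.solve(np.eye(n) - β * Q[s, σ, :], R[s, σ])
        # Policy improvement: compute a v-greedy policy
        σ_new = (R + β * Q @ v).argmax(axis=1)
        if np.array_equal(σ_new, σ):
            break
        σ = σ_new
    return v, σ, i + 1

# A hypothetical random problem
rng = np.random.default_rng(3)
n, m, β = 6, 4, 0.95
R = rng.normal(size=(n, m))
Q = rng.uniform(size=(n, m, n))
Q /= Q.sum(axis=2, keepdims=True)

v, σ, num_iter = policy_iteration(R, Q, β)

# At termination v should satisfy the Bellman equation up to rounding error
bellman_residual = np.max(np.abs(np.max(R + β * Q @ v, axis=1) - v))
print(num_iter, bellman_residual)
```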

47.4.3 Modified Policy Function Iteration

Modified policy iteration replaces the policy evaluation step in policy iteration with “partial
policy evaluation”
The latter computes an approximation to the value of a policy 𝜎 by iterating 𝑇𝜎 for a specified number of times
This approach can be useful when the state space is very large and the linear system in the
policy evaluation step of policy iteration is correspondingly difficult to solve
The details of the algorithm can be found in the appendix
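A compact sketch of this idea is given below. The number of partial evaluation steps `k`, the stopping rule, and the starting point 𝑣 = 0 are choices made for this illustration and need not match the QuantEcon implementation; rewards are drawn from [0, 1] so that 𝑇𝑣 ≥ 𝑣 holds at the starting point.

```python
import numpy as np

def modified_policy_iteration(R, Q, β, k=20, tol=1e-8, max_iter=1_000):
    n = R.shape[0]
    s = np.arange(n)
    v = np.zeros(n)
    for i in range(max_iter):
        vals = R + β * Q @ v
        σ = vals.argmax(axis=1)             # v-greedy policy
        Tv = vals.max(axis=1)
        if np.max(np.abs(Tv - v)) < tol:
            break
        # Partial policy evaluation: apply T_σ k times, starting from Tv
        r_σ, Q_σ = R[s, σ], Q[s, σ, :]
        v = Tv
        for _ in range(k):
            v = r_σ + β * Q_σ @ v
    return v, σ, i + 1

# A hypothetical random problem with nonnegative rewards
rng = np.random.default_rng(4)
n, m, β = 6, 4, 0.95
R = rng.uniform(size=(n, m))
Q = rng.uniform(size=(n, m, n))
Q /= Q.sum(axis=2, keepdims=True)

v, σ, num_iter = modified_policy_iteration(R, Q, β)
bellman_residual = np.max(np.abs(np.max(R + β * Q @ v, axis=1) - v))
print(num_iter, bellman_residual)
```

Setting `k = 0` reduces this to value function iteration, while a very large `k` makes each pass behave like a full policy evaluation step.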

47.5 Example: A Growth Model

Let’s consider a simple consumption-saving model


A single household either consumes or stores its own output of a single consumption good
The household starts each period with current stock 𝑠
Next, the household chooses a quantity 𝑎 to store and consumes 𝑐 = 𝑠 − 𝑎

• Storage is limited by a global upper bound 𝑀


• Flow utility is 𝑢(𝑐) = 𝑐^𝛼

Output is drawn from a discrete uniform distribution on {0, … , 𝐵}


The next period stock is therefore

𝑠′ = 𝑎 + 𝑈 where 𝑈 ∼ 𝑈 [0, … , 𝐵]

The discount factor is 𝛽 ∈ [0, 1)

47.5.1 Discrete DP Representation

We want to represent this model in the format of a discrete dynamic program


To this end, we take

• the state variable to be the stock 𝑠

• the state space to be 𝑆 = {0, … , 𝑀 + 𝐵}

– hence 𝑛 = 𝑀 + 𝐵 + 1

• the action to be the storage quantity 𝑎

• the set of feasible actions at 𝑠 to be 𝐴(𝑠) = {0, … , min{𝑠, 𝑀 }}

– hence 𝐴 = {0, … , 𝑀 } and 𝑚 = 𝑀 + 1

• the reward function to be 𝑟(𝑠, 𝑎) = 𝑢(𝑠 − 𝑎)

• the transition probabilities to be

$$
Q(s, a, s') :=
\begin{cases}
\frac{1}{B + 1} & \text{if } a \leq s' \leq a + B \\
0 & \text{otherwise}
\end{cases} \tag{3}
$$

47.5.2 Defining a DiscreteDP Instance

This information will be used to create an instance of DiscreteDP by passing the following
information

1. An 𝑛 × 𝑚 reward array 𝑅
2. An 𝑛 × 𝑚 × 𝑛 transition probability array 𝑄
3. A discount factor 𝛽

For 𝑅 we set 𝑅[𝑠, 𝑎] = 𝑢(𝑠 − 𝑎) if 𝑎 ≤ 𝑠 and −∞ otherwise


For 𝑄 we follow the rule in Eq. (3)
Note:

• The feasibility constraint is embedded into 𝑅 by setting 𝑅[𝑠, 𝑎] = −∞ for 𝑎 ∉ 𝐴(𝑠)


• Probability distributions for (𝑠, 𝑎) with 𝑎 ∉ 𝐴(𝑠) can be arbitrary

The following code sets up these objects for us

In [2]: import numpy as np

class SimpleOG:

    def __init__(self, B=10, M=5, α=0.5, β=0.9):
        """
        Set up R, Q and β, the three elements that define an instance of
        the DiscreteDP class.
        """

        self.B, self.M, self.α, self.β = B, M, α, β
        self.n = B + M + 1
        self.m = M + 1

        self.R = np.empty((self.n, self.m))
        self.Q = np.zeros((self.n, self.m, self.n))

        self.populate_Q()
        self.populate_R()

    def u(self, c):
        return c**self.α

    def populate_R(self):
        """
        Populate the R matrix, with R[s, a] = -np.inf for infeasible
        state-action pairs.
        """
        for s in range(self.n):
            for a in range(self.m):
                self.R[s, a] = self.u(s - a) if a <= s else -np.inf

    def populate_Q(self):
        """
        Populate the Q matrix by setting

            Q[s, a, s'] = 1 / (1 + B) if a <= s' <= a + B

        and zero otherwise.
        """

        for a in range(self.m):
            self.Q[:, a, a:(a + self.B + 1)] = 1.0 / (self.B + 1)

Let’s run this code and create an instance of SimpleOG

In [3]: g = SimpleOG() # Use default parameters

Instances of DiscreteDP are created using the signature DiscreteDP(R, Q, β)


Let’s create an instance using the objects stored in g

In [4]: import quantecon as qe

ddp = qe.markov.DiscreteDP(g.R, g.Q, g.β)

Now that we have an instance ddp of DiscreteDP we can solve it as follows

In [5]: results = ddp.solve(method='policy_iteration')

Let’s see what we’ve got here

In [6]: dir(results)

Out[6]: ['max_iter', 'mc', 'method', 'num_iter', 'sigma', 'v']

(In IPython version 4.0 and above you can also type results. and hit the tab key)
The most important attributes are v, the value function, and sigma, the optimal policy

In [7]: results.v

Out[7]: array([19.01740222, 20.01740222, 20.43161578, 20.74945302, 21.04078099,


21.30873018, 21.54479816, 21.76928181, 21.98270358, 22.18824323,
22.3845048 , 22.57807736, 22.76109127, 22.94376708, 23.11533996,
23.27761762])

In [8]: results.sigma

Out[8]: array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 5])

Since we’ve used policy iteration, these results will be exact unless we hit the iteration bound
max_iter
Let’s make sure this didn’t happen

In [9]: results.max_iter

Out[9]: 250

In [10]: results.num_iter

Out[10]: 3

Another interesting object is results.mc, which is the controlled chain defined by 𝑄𝜎∗ ,
where 𝜎∗ is the optimal policy
In other words, it gives the dynamics of the state when the agent follows the optimal policy
Since this object is an instance of MarkovChain from QuantEcon.py (see this lecture for more
discussion), we can easily simulate it, compute its stationary distribution and so on

In [11]: results.mc.stationary_distributions

Out[11]: array([[0.01732187, 0.04121063, 0.05773956, 0.07426848, 0.08095823,


0.09090909, 0.09090909, 0.09090909, 0.09090909, 0.09090909,
0.09090909, 0.07358722, 0.04969846, 0.03316953, 0.01664061,
0.00995086]])

Here’s the same information in a bar graph
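As a cross-check that does not rely on QuantEcon.py, the stationary distribution can be recovered directly from the optimal policy printed above. The sketch below builds 𝑄𝜎 from Eq. (3) and solves 𝜓(𝐼 − 𝑄𝜎 + 𝐸) = 𝟙′, where 𝐸 is a matrix of ones; this standard device assumes the controlled chain has a unique stationary distribution.

```python
import numpy as np

B, n = 10, 16
# The optimal policy reported by results.sigma above
σ = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 5])

# Q_σ[s, s'] = 1/(B+1) if σ(s) <= s' <= σ(s) + B, following Eq. (3)
cols = np.arange(n)
Q_σ = ((cols >= σ[:, None]) & (cols <= σ[:, None] + B)) / (B + 1)

# Solve ψ (I - Q_σ + E) = 1', written as a linear system in ψ
lhs = (np.eye(n) - Q_σ + np.ones((n, n))).T
ψ = np.linalg.solve(lhs, np.ones(n))

print(ψ)
```

The middle entries equal 1/11 ≈ 0.0909 because every storage level 𝑎 ∈ {0, …, 5} places probability 1/(𝐵 + 1) on each of the states 5, …, 10.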

What happens if the agent is more patient?

In [12]: ddp = qe.markov.DiscreteDP(g.R, g.Q, 0.99) # Increase β to 0.99


results = ddp.solve(method='policy_iteration')
results.mc.stationary_distributions

Out[12]: array([[0.00546913, 0.02321342, 0.03147788, 0.04800681, 0.05627127,


0.09090909, 0.09090909, 0.09090909, 0.09090909, 0.09090909,
0.09090909, 0.08543996, 0.06769567, 0.05943121, 0.04290228,
0.03463782]])

If we look at the bar graph we can see the rightward shift in probability mass

47.5.3 State-Action Pair Formulation

The DiscreteDP class in fact provides a second interface to set up an instance


One of the advantages of this alternative set up is that it permits the use of a sparse matrix
for Q
(An example of using sparse matrices is given in the exercises below)
The call signature of the second formulation is DiscreteDP(R, Q, β, s_indices, a_indices) where

• s_indices and a_indices are arrays of equal length L enumerating all feasible
state-action pairs
• R is an array of length L giving corresponding rewards
• Q is an L x n transition probability array

Here’s how we could set up these objects for the preceding example

In [13]: B, M, α, β = 10, 5, 0.5, 0.9
n = B + M + 1
m = M + 1

def u(c):
    return c**α

s_indices = []
a_indices = []
Q = []
R = []
b = 1.0 / (B + 1)

for s in range(n):
    for a in range(min(M, s) + 1):  # All feasible a at this s
        s_indices.append(s)
        a_indices.append(a)
        q = np.zeros(n)
        q[a:(a + B + 1)] = b  # b on these values, otherwise 0
        Q.append(q)
        R.append(u(s - a))

ddp = qe.markov.DiscreteDP(R, Q, β, s_indices, a_indices)

For larger problems, you might need to write this code more efficiently by vectorizing or using
Numba
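One possible vectorized construction of the same four objects is sketched here. It relies on boolean masks over a full state-action grid, so it trades some memory for the removal of the Python loops; the variable names `S_grid`, `A_grid` and `feasible` are illustrative.

```python
import numpy as np

B, M, α, β = 10, 5, 0.5, 0.9
n, m = B + M + 1, M + 1

# All (s, a) pairs on the full grid, then keep the feasible ones (a <= s)
S_grid, A_grid = np.meshgrid(np.arange(n), np.arange(m), indexing='ij')
feasible = A_grid <= S_grid
s_indices, a_indices = S_grid[feasible], A_grid[feasible]
L = len(s_indices)

# Rewards u(s - a) for the feasible pairs
R = (s_indices - a_indices)**α

# Row i of Q puts mass 1/(B+1) on columns a_i, ..., a_i + B
cols = np.arange(n)
Q = ((cols >= a_indices[:, None]) &
     (cols <= a_indices[:, None] + B)) / (B + 1)

print(L, R.shape, Q.shape)
```

The resulting arrays are ordered in the same way as in the loop version (states ascending, then actions ascending), so they can be passed to DiscreteDP(R, Q, β, s_indices, a_indices) in the same manner.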

47.6 Exercises

In the stochastic optimal growth lecture, we solve a benchmark model that has an analytical solution, to check that we can replicate it numerically
The exercise is to replicate this solution using DiscreteDP

47.7 Solutions

Written jointly with Daisuke Oyama


Let’s start with some imports

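In [14]: import scipy.sparse as sparse
import scipy.sparse.linalg  # load the submodule explicitly; DiscreteDP's sparse solver relies on it
import matplotlib.pyplot as plt
%matplotlib inline
from quantecon import compute_fixed_point
from quantecon.markov import DiscreteDP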

47.7.1 Setup

Details of the model can be found in the lecture on optimal growth


As in the lecture, we let $f(k) = k^{\alpha}$ with $\alpha = 0.65$, $u(c) = \log c$, and $\beta = 0.95$

In [15]: α = 0.65
f = lambda k: k**α
u = np.log
β = 0.95

Here we want to solve a finite state version of the continuous state model above
We discretize the state space into a grid of size grid_size=500, from $10^{-6}$ to grid_max=2

In [16]: grid_max = 2
grid_size = 500
grid = np.linspace(1e-6, grid_max, grid_size)

We choose the action to be the amount of capital to save for the next period (the state is the
capital stock at the beginning of the period)
Thus the state indices and the action indices are both 0, …, grid_size-1
Action (indexed by) a is feasible at state (indexed by) s if and only if grid[a] <
f(grid[s]) (zero consumption is not allowed because of the log utility)
Thus the Bellman equation is:

$$ v(k) = \max_{0 < k' < f(k)} u(f(k) - k') + \beta v(k'), $$

where 𝑘′ is the capital stock in the next period


The transition probability array Q will be highly sparse (in fact it is degenerate as the model
is deterministic), so we formulate the problem with state-action pairs, to represent Q in scipy
sparse matrix format
We first construct indices for state-action pairs:

In [17]: # Consumption matrix, with nonpositive consumption included
C = f(grid).reshape(grid_size, 1) - grid.reshape(1, grid_size)

# State-action indices
s_indices, a_indices = np.where(C > 0)

# Number of state-action pairs
L = len(s_indices)

print(L)
print(s_indices)
print(a_indices)

118841
[ 0 1 1 … 499 499 499]
[ 0 0 1 … 389 390 391]

Reward vector R (of length L):

In [18]: R = u(C[s_indices, a_indices])



(Degenerate) transition probability matrix Q (of shape (L, grid_size)). We choose
the scipy.sparse.lil_matrix format here, although any format will do (internally it will be
converted to the csr format):

In [19]: Q = sparse.lil_matrix((L, grid_size))


Q[np.arange(L), a_indices] = 1

(If you are familiar with the data structure of scipy.sparse.csr_matrix, the following is the
most efficient way to create the Q matrix in the current case)

In [20]: # data = np.ones(L)


# indptr = np.arange(L+1)
# Q = sparse.csr_matrix((data, a_indices, indptr), shape=(L, grid_size))
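To see why the csr construction works, note that each row of Q has exactly one nonzero entry, so the row pointer array is just 0, 1, ..., L and the column indices are a_indices. The following sketch checks on a small, made-up a_indices array (not the one from the growth model) that both constructions give the same matrix.

```python
import numpy as np
import scipy.sparse as sparse

# Hypothetical next-state index for each of L = 6 state-action pairs, n = 3 states
a_indices = np.array([0, 0, 1, 0, 1, 2])
L, n = len(a_indices), 3

# Construction used in the text: fill a lil_matrix entry by entry
Q_lil = sparse.lil_matrix((L, n))
Q_lil[np.arange(L), a_indices] = 1

# Direct csr construction: one stored value per row
data = np.ones(L)               # the nonzero values
indptr = np.arange(L + 1)       # row l occupies data[l:l+1]
Q_csr = sparse.csr_matrix((data, a_indices, indptr), shape=(L, n))

same = (Q_lil.toarray() == Q_csr.toarray()).all()
print(same)
```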

Discrete growth model:

In [21]: ddp = DiscreteDP(R, Q, β, s_indices, a_indices)


Notes
Here we vectorized the operations on arrays extensively to simplify the code
As noted, however, vectorization is memory intensive, and it can be prohibitively so for
large grids

47.7.2 Solving the Model

Solve the dynamic optimization problem:

In [22]: res = ddp.solve(method='policy_iteration')


v, σ, num_iter = res.v, res.sigma, res.num_iter
num_iter

Out[22]: 3

Note that sigma contains the indices of the optimal capital stocks to save for the next period
The following translates sigma to the corresponding consumption vector

In [23]: # Optimal consumption in the discrete version
c = f(grid) - grid[σ]

# Exact solution of the continuous version
ab = α * β
c1 = (np.log(1 - ab) + np.log(ab) * ab / (1 - ab)) / (1 - β)
c2 = α / (1 - ab)

def v_star(k):
    return c1 + c2 * np.log(k)

def c_star(k):
    return (1 - ab) * k**α


Let us compare the solution of the discrete model with that of the original continuous model

In [24]: fig, ax = plt.subplots(1, 2, figsize=(14, 4))


ax[0].set_ylim(-40, -32)
ax[0].set_xlim(grid[0], grid[-1])
ax[1].set_xlim(grid[0], grid[-1])

lb0 = 'discrete value function'


ax[0].plot(grid, v, lw=2, alpha=0.6, label=lb0)

lb0 = 'continuous value function'


ax[0].plot(grid, v_star(grid), 'k-', lw=1.5, alpha=0.8, label=lb0)
ax[0].legend(loc='upper left')

lb1 = 'discrete optimal consumption'


ax[1].plot(grid, c, 'b-', lw=2, alpha=0.6, label=lb1)

lb1 = 'continuous optimal consumption'


ax[1].plot(grid, c_star(grid), 'k-', lw=1.5, alpha=0.8, label=lb1)
ax[1].legend(loc='upper left')
plt.show()


The outcomes appear very close to those of the continuous version


Except for the “boundary” point, the value functions are very close:

In [25]: np.abs(v - v_star(grid)).max()


In [26]: np.abs(v - v_star(grid))[1:].max()


The optimal consumption functions are close as well:

In [27]: np.abs(c - c_star(grid)).max()


In fact, the optimal consumption obtained in the discrete version is not really monotone, but
the decrements are quite small:

In [28]: diff = np.diff(c)


(diff >= 0).all()


In [29]: dec_ind = np.where(diff < 0)[0]


len(dec_ind)


In [30]: np.abs(diff[dec_ind]).max()


The value function is monotone:

In [31]: (np.diff(v) > 0).all()

Out[31]: True

47.7.3 Comparison of the Solution Methods

Let us solve the problem with the other two methods


Value Iteration

In [32]: ddp.epsilon = 1e-4


ddp.max_iter = 500
res1 = ddp.solve(method='value_iteration')
res1.num_iter

Out[32]: 123

In [33]: np.array_equal(σ, res1.sigma)

Out[33]: True

Modified Policy Iteration

In [34]: res2 = ddp.solve(method='modified_policy_iteration')


res2.num_iter

Out[34]: 5

In [35]: np.array_equal(σ, res2.sigma)

Out[35]: True

Speed Comparison

%timeit ddp.solve(method='value_iteration')
%timeit ddp.solve(method='policy_iteration')
%timeit ddp.solve(method='modified_policy_iteration')

As is often the case, policy iteration and modified policy iteration are much faster than value
iteration

47.7.4 Replication of the Figures

Using DiscreteDP we replicate the figures shown in the lecture


Convergence of Value Iteration
Let us first visualize the convergence of the value iteration algorithm as in the lecture, where
we use ddp.bellman_operator implemented as a method of DiscreteDP

In [36]: w = 5 * np.log(grid) - 25  # Initial condition
n = 35
fig, ax = plt.subplots(figsize=(8,5))
ax.set_ylim(-40, -20)
ax.set_xlim(np.min(grid), np.max(grid))
lb = 'initial condition'
ax.plot(grid, w, color=plt.cm.jet(0), lw=2, alpha=0.6, label=lb)
for i in range(n):
    w = ddp.bellman_operator(w)
    ax.plot(grid, w, color=plt.cm.jet(i / n), lw=2, alpha=0.6)
lb = 'true value function'
ax.plot(grid, v_star(grid), 'k-', lw=2, alpha=0.8, label=lb)
ax.legend(loc='upper left')

plt.show()


We next plot the consumption policies along with the value iteration

In [37]: w = 5 * u(grid) - 25  # Initial condition

fig, ax = plt.subplots(3, 1, figsize=(8, 10))
true_c = c_star(grid)

for i, n in enumerate((2, 4, 6)):
    ax[i].set_ylim(0, 1)
    ax[i].set_xlim(0, 2)
    ax[i].set_yticks((0, 1))
    ax[i].set_xticks((0, 2))

    w = 5 * u(grid) - 25  # Initial condition
    # Use the returned iterate; w itself is not expected to be modified in place
    w = compute_fixed_point(ddp.bellman_operator, w, max_iter=n, print_skip=1)
    σ = ddp.compute_greedy(w)  # Policy indices
    c_policy = f(grid) - grid[σ]

    ax[i].plot(grid, c_policy, 'b-', lw=2, alpha=0.8,
               label='approximate optimal consumption policy')
    ax[i].plot(grid, true_c, 'k-', lw=2, alpha=0.8,
               label='true optimal consumption policy')
    ax[i].legend(loc='upper left')
    ax[i].set_title(f'{n} value function iterations')
plt.show()


Dynamics of the Capital Stock


Finally, let us work on Exercise 2, where we plot the trajectories of the capital stock for three
different discount factors, 0.9, 0.94, and 0.98, with initial condition 𝑘0 = 0.1

In [38]: discount_factors = (0.9, 0.94, 0.98)
k_init = 0.1

# Search for the index corresponding to k_init
k_init_ind = np.searchsorted(grid, k_init)

sample_size = 25

fig, ax = plt.subplots(figsize=(8,5))
ax.set_xlabel("time")
ax.set_ylabel("capital")
ax.set_ylim(0.10, 0.30)

# Create a new instance, not to modify the one used above
ddp0 = DiscreteDP(R, Q, β, s_indices, a_indices)

for beta in discount_factors:
    ddp0.beta = beta
    res0 = ddp0.solve()
    k_path_ind = res0.mc.simulate(init=k_init_ind, ts_length=sample_size)
    k_path = grid[k_path_ind]
    ax.plot(k_path, 'o-', lw=2, alpha=0.75, label=f'$\\beta = {beta}$')

ax.legend(loc='lower right')
plt.show()


47.8 Appendix: Algorithms

This appendix covers the details of the solution algorithms implemented for DiscreteDP
We will make use of the following notions of approximate optimality:

• For $\varepsilon > 0$, $v$ is called an $\varepsilon$-approximation of $v^*$ if $\|v - v^*\| < \varepsilon$
• A policy $\sigma \in \Sigma$ is called $\varepsilon$-optimal if $v_\sigma$ is an $\varepsilon$-approximation of $v^*$

47.8.1 Value Iteration

The DiscreteDP value iteration method implements value function iteration as follows

1. Choose any $v_0 \in \mathbb{R}^n$, and specify $\varepsilon > 0$; set $i = 0$
2. Compute $v_{i+1} = T v_i$
3. If $\|v_{i+1} - v_i\| < [(1 - \beta)/(2\beta)]\varepsilon$, then go to step 4; otherwise, set $i = i + 1$ and go to step 2
4. Compute a $v_{i+1}$-greedy policy $\sigma$, and return $v_{i+1}$ and $\sigma$

Given 𝜀 > 0, the value iteration algorithm

• terminates in a finite number of iterations


• returns an $\varepsilon/2$-approximation of the optimal value function and an $\varepsilon$-optimal policy
function (unless max_iter is reached)

(While not explicit, in the actual implementation each algorithm is terminated if the number
of iterations reaches max_iter)
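The steps above can be sketched in NumPy as follows. This is a minimal illustration and not the DiscreteDP implementation: it assumes the product formulation with a reward array R of shape (n, m) and a transition array Q of shape (n, m, n), every action feasible at every state, and a made-up two-state, two-action example in which action 0 keeps the current state and action 1 switches state.

```python
import numpy as np

def value_iteration(R, Q, beta, epsilon=1e-4, max_iter=10_000):
    "Value iteration with the stopping rule ||v_{i+1} - v_i|| < [(1-beta)/(2 beta)] epsilon."
    n = R.shape[0]
    v = np.zeros(n)                                  # step 1: any v_0
    tol = (1 - beta) / (2 * beta) * epsilon
    for i in range(max_iter):
        v_new = (R + beta * Q @ v).max(axis=1)       # step 2: v_{i+1} = T v_i
        done = np.max(np.abs(v_new - v)) < tol       # step 3: stopping test
        v = v_new
        if done:
            break
    sigma = (R + beta * Q @ v).argmax(axis=1)        # step 4: greedy policy
    return v, sigma, i + 1

# Hypothetical example: staying in state 0 pays 1, staying in state 1 pays 2,
# switching pays 0
R = np.array([[1.0, 0.0],
              [2.0, 0.0]])
Q = np.zeros((2, 2, 2))
Q[0, 0, 0] = Q[1, 0, 1] = 1.0    # action 0: stay
Q[0, 1, 1] = Q[1, 1, 0] = 1.0    # action 1: switch
beta = 0.9

v, sigma, num_iter = value_iteration(R, Q, beta)
print(v, sigma, num_iter)
```

In this example the optimal policy is to move to state 1 and stay there, so the exact value function is (18, 20); the returned v should be within ε/2 of it.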

47.8.2 Policy Iteration

The DiscreteDP policy iteration method runs as follows

1. Choose any $v_0 \in \mathbb{R}^n$ and compute a $v_0$-greedy policy $\sigma_0$; set $i = 0$
2. Compute the value $v_{\sigma_i}$ by solving the equation $v = T_{\sigma_i} v$
3. Compute a $v_{\sigma_i}$-greedy policy $\sigma_{i+1}$; let $\sigma_{i+1} = \sigma_i$ if possible
4. If $\sigma_{i+1} = \sigma_i$, then return $v_{\sigma_i}$ and $\sigma_{i+1}$; otherwise, set $i = i + 1$ and go to step 2

The policy iteration algorithm terminates in a finite number of iterations


It returns an optimal value function and an optimal policy function (unless max_iter is
reached)
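A compact sketch of these steps, under the same simplifying assumptions as before (dense arrays R of shape (n, m) and Q of shape (n, m, n), all actions feasible, and a made-up two-state example), might look as follows. It is an illustration only, not the library's code; the policy evaluation in step 2 is done with a dense linear solve.

```python
import numpy as np

def policy_iteration(R, Q, beta, max_iter=1_000):
    n = R.shape[0]
    states = np.arange(n)
    # Step 1: v_0 = 0 and a v_0-greedy policy
    sigma = (R + beta * Q @ np.zeros(n)).argmax(axis=1)
    for i in range(max_iter):
        # Step 2: solve v = r_sigma + beta * Q_sigma v
        r_sigma = R[states, sigma]
        Q_sigma = Q[states, sigma]                   # shape (n, n)
        v_sigma = np.linalg.solve(np.eye(n) - beta * Q_sigma, r_sigma)
        # Step 3: greedy policy, keeping the old action wherever it is still optimal
        vals = R + beta * Q @ v_sigma
        new_sigma = vals.argmax(axis=1)
        keep = np.isclose(vals[states, sigma], vals.max(axis=1))
        new_sigma[keep] = sigma[keep]
        # Step 4: stop when the policy no longer changes
        if np.array_equal(new_sigma, sigma):
            return v_sigma, new_sigma, i + 1
        sigma = new_sigma
    return v_sigma, sigma, max_iter

# Hypothetical example: action 0 stays (payoff 1 in state 0, 2 in state 1),
# action 1 switches state (payoff 0)
R = np.array([[1.0, 0.0],
              [2.0, 0.0]])
Q = np.zeros((2, 2, 2))
Q[0, 0, 0] = Q[1, 0, 1] = 1.0
Q[0, 1, 1] = Q[1, 1, 0] = 1.0

v, sigma, num_iter = policy_iteration(R, Q, beta=0.9)
print(v, sigma, num_iter)
```

Because each evaluation step is exact, the loop stops as soon as the greedy policy repeats, which typically takes far fewer iterations than value iteration.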

47.8.3 Modified Policy Iteration

The DiscreteDP modified policy iteration method runs as follows:

1. Choose any $v_0 \in \mathbb{R}^n$, and specify $\varepsilon > 0$ and $k \geq 0$; set $i = 0$
2. Compute a $v_i$-greedy policy $\sigma_{i+1}$; let $\sigma_{i+1} = \sigma_i$ if possible (for $i \geq 1$)
3. Compute $u = T v_i$ ($= T_{\sigma_{i+1}} v_i$). If $\mathrm{span}(u - v_i) < [(1 - \beta)/\beta]\varepsilon$, then go to step 5; otherwise go to step 4
   • Span is defined by $\mathrm{span}(z) = \max(z) - \min(z)$
4. Compute $v_{i+1} = (T_{\sigma_{i+1}})^k u$ ($= (T_{\sigma_{i+1}})^{k+1} v_i$); set $i = i + 1$ and go to step 2
5. Return $v = u + [\beta/(1 - \beta)][(\min(u - v_i) + \max(u - v_i))/2]\mathbf{1}$ and $\sigma_{i+1}$

Given $\varepsilon > 0$, provided that $v_0$ is such that $T v_0 \geq v_0$, the modified policy iteration algorithm
terminates in a finite number of iterations
It returns an $\varepsilon/2$-approximation of the optimal value function and an $\varepsilon$-optimal policy function
(unless max_iter is reached)
See also the documentation for DiscreteDP
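The modified policy iteration steps can be sketched in the same style. Again this is only an illustration under assumptions that are not from the lecture: dense arrays R (n, m) and Q (n, m, n), all actions feasible, a made-up two-state example, and no tie-breaking in favor of the previous policy. The initial guess is the constant vector min(R)/(1 − β), which satisfies T v_0 ≥ v_0.

```python
import numpy as np

def modified_policy_iteration(R, Q, beta, epsilon=1e-4, k=20, max_iter=1_000):
    n = R.shape[0]
    states = np.arange(n)
    v = np.full(n, R.min() / (1 - beta))        # step 1: v_0 with T v_0 >= v_0
    for i in range(max_iter):
        vals = R + beta * Q @ v
        sigma = vals.argmax(axis=1)             # step 2: v_i-greedy policy
        u = vals.max(axis=1)                    # step 3: u = T v_i
        diff = u - v
        if diff.max() - diff.min() < (1 - beta) / beta * epsilon:
            break                               # span test passed: go to step 5
        # Step 4: apply T_sigma k more times, starting from u
        r_sigma, Q_sigma = R[states, sigma], Q[states, sigma]
        v = u
        for _ in range(k):
            v = r_sigma + beta * Q_sigma @ v
    # Step 5: midpoint correction of u
    v_out = u + beta / (1 - beta) * (diff.min() + diff.max()) / 2
    return v_out, sigma, i + 1

# Hypothetical example: action 0 stays (payoff 1 in state 0, 2 in state 1),
# action 1 switches state (payoff 0)
R = np.array([[1.0, 0.0],
              [2.0, 0.0]])
Q = np.zeros((2, 2, 2))
Q[0, 0, 0] = Q[1, 0, 1] = 1.0
Q[0, 1, 1] = Q[1, 1, 0] = 1.0

v, sigma, num_iter = modified_policy_iteration(R, Q, beta=0.9)
print(v, sigma, num_iter)
```

With k = 0 the inner loop disappears and the method reduces to value iteration with a span-based stopping rule; larger k moves it toward policy iteration.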
Part VII

Multiple Agent Models

48

Schelling’s Segregation Model

48.1 Contents

• Outline 48.2
• The Model 48.3
• Results 48.4
• Exercises 48.5
• Solutions 48.6

48.2 Outline

In 1969, Thomas C. Schelling developed a simple but striking model of racial segregation
[121]
His model studies the dynamics of racially mixed neighborhoods
Like much of Schelling’s work, the model shows how local interactions can lead to surprising
aggregate structure
In particular, it shows that relatively mild preference for neighbors of similar race can lead in
aggregate to the collapse of mixed neighborhoods, and high levels of segregation
In recognition of this and other research, Schelling was awarded the 2005 Nobel Prize in
Economic Sciences (joint with Robert Aumann)
In this lecture, we (in fact you) will build and run a version of Schelling’s model

48.3 The Model

We will cover a variation of Schelling’s model that is easy to program and captures the main
idea

48.3.1 Set-Up

Suppose we have two types of people: orange people and green people


For the purpose of this lecture, we will assume there are 250 of each type
These agents all live on a single unit square
The location of an agent is just a point (𝑥, 𝑦), where 0 < 𝑥, 𝑦 < 1

48.3.2 Preferences

We will say that an agent is happy if half or more of her 10 nearest neighbors are of the same
type
Here ‘nearest’ is in terms of Euclidean distance
An agent who is not happy is called unhappy
An important point here is that agents are not averse to living in mixed areas
They are perfectly happy if half their neighbors are of the other color
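The happiness rule can be written down directly with NumPy. The sketch below is one possible vectorized formulation, with names (is_happy, locations, types) chosen here for illustration; it assumes agent locations are stored as rows of an array and types as integers 0 and 1.

```python
import numpy as np

def is_happy(i, locations, types, num_neighbors=10, require_same_type=5):
    "True if at least require_same_type of agent i's nearest neighbors share her type."
    # Euclidean distance from agent i to every agent
    d = np.sqrt(((locations - locations[i]) ** 2).sum(axis=1))
    d[i] = np.inf                               # exclude the agent herself
    nearest = np.argsort(d)[:num_neighbors]     # indices of the nearest neighbors
    return (types[nearest] == types[i]).sum() >= require_same_type

rng = np.random.default_rng(0)
locations = rng.uniform(size=(500, 2))          # independent uniform draws on the unit square
types = np.repeat([0, 1], 250)                  # 250 agents of each type

share_happy = np.mean([is_happy(i, locations, types) for i in range(500)])
print(share_happy)
```

The printed number is the fraction of agents who are already happy in a random initial configuration.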

48.3.3 Behavior

Initially, agents are mixed together (integrated)


In particular, the initial location of each agent is an independent draw from a bivariate
uniform distribution on $S = (0, 1)^2$
Now, cycling through the set of all agents, each agent is given the chance to stay or move
We assume that each agent will stay put if they are happy and move if unhappy
The algorithm for moving is as follows

1. Draw a random location in 𝑆


2. If happy at new location, move there
3. Else, go to step 1

In this way, we cycle continuously through the agents, moving as required


We continue to cycle until no one wishes to move

48.4 Results

Let’s have a look at the results we got when we coded and ran this model
As discussed above, agents are initially mixed randomly together

But after several cycles, they become segregated into distinct regions

In this instance, the program terminated after 4 cycles through the set of agents, indicating
that all agents had reached a state of happiness
What is striking about the pictures is how rapidly racial integration breaks down
This is despite the fact that people in the model don’t actually mind living mixed with the
other type
Even with these preferences, the outcome is a high degree of segregation

48.5 Exercises

48.5.1 Exercise 1

Implement and run this simulation for yourself


Consider the following structure for your program
Agents can be modeled as objects
Here’s an indication of how they might look

* Data:
    * type (green or orange)
    * location
* Methods:
    * determine whether happy or not given locations of other agents
    * If not happy, move
        * find a new location where happy

And here’s some pseudocode for the main loop

while agents are still moving
    for agent in agents
        give agent the opportunity to move

Use 250 agents of each type

48.6 Solutions

48.6.1 Exercise 1

Here’s one solution that does the job we want


If you feel like a further exercise, you can probably speed up some of the computations and
then increase the number of agents

In [1]: from random import uniform, seed
from math import sqrt
import matplotlib.pyplot as plt
%matplotlib inline

seed(10)  # for reproducible random numbers

class Agent:

    def __init__(self, type):
        self.type = type
        self.draw_location()

    def draw_location(self):
        self.location = uniform(0, 1), uniform(0, 1)

    def get_distance(self, other):
        "Computes the euclidean distance between self and other agent."
        a = (self.location[0] - other.location[0])**2
        b = (self.location[1] - other.location[1])**2
        return sqrt(a + b)

    def happy(self, agents):
        "True if sufficient number of nearest neighbors are of the same type."
        distances = []
        # distances is a list of pairs (d, agent), where d is distance from
        # agent to self
        for agent in agents:
            if self != agent:
                distance = self.get_distance(agent)
                distances.append((distance, agent))
        # == Sort from smallest to largest, according to distance == #
        distances.sort()
        # == Extract the neighboring agents == #
        neighbors = [agent for d, agent in distances[:num_neighbors]]
        # == Count how many neighbors have the same type as self == #
        num_same_type = sum(self.type == agent.type for agent in neighbors)
        return num_same_type >= require_same_type

    def update(self, agents):
        "If not happy, then randomly choose new locations until happy."
        while not self.happy(agents):
            self.draw_location()


def plot_distribution(agents, cycle_num):
    "Plot the distribution of agents after cycle_num rounds of the loop."
    x_values_0, y_values_0 = [], []
    x_values_1, y_values_1 = [], []
    # == Obtain locations of each type == #
    for agent in agents:
        x, y = agent.location
        if agent.type == 0:
            x_values_0.append(x)
            y_values_0.append(y)
        else:
            x_values_1.append(x)
            y_values_1.append(y)
    fig, ax = plt.subplots(figsize=(8, 8))
    plot_args = {'markersize': 8, 'alpha': 0.6}
    ax.set_facecolor('azure')
    ax.plot(x_values_0, y_values_0, 'o', markerfacecolor='orange', **plot_args)
    ax.plot(x_values_1, y_values_1, 'o', markerfacecolor='green', **plot_args)
    ax.set_title(f'Cycle {cycle_num-1}')
    plt.show()

# == Main == #

num_of_type_0 = 250
num_of_type_1 = 250
num_neighbors = 10      # Number of agents regarded as neighbors
require_same_type = 5   # Want at least this many neighbors to be same type

# == Create a list of agents == #
agents = [Agent(0) for i in range(num_of_type_0)]
agents.extend(Agent(1) for i in range(num_of_type_1))

count = 1
# == Loop until none wishes to move == #
while True:
    print('Entering loop ', count)
    plot_distribution(agents, count)
    count += 1
    no_one_moved = True
    for agent in agents:
        old_location = agent.location
        agent.update(agents)
        if agent.location != old_location:
            no_one_moved = False
    if no_one_moved:
        break

print('Converged, terminating.')

Entering loop 1

Entering loop 2

Entering loop 3

Entering loop 4

Converged, terminating.
49

A Lake Model of Employment and Unemployment

49.1 Contents

• Overview 49.2

• The Model 49.3

• Implementation 49.4

• Dynamics of an Individual Worker 49.5

• Endogenous Job Finding Rate 49.6

• Exercises 49.7

• Solutions 49.8

• Lake Model Solutions 49.9

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

49.2 Overview

This lecture describes what has come to be called a lake model


The lake model is a basic tool for modeling unemployment
It allows us to analyze

• flows between unemployment and employment


• how these flows influence steady state employment and unemployment rates

It is a good model for interpreting monthly labor department reports on gross and net jobs
created and jobs destroyed


The “lakes” in the model are the pools of employed and unemployed
The “flows” between the lakes are caused by

• firing and hiring


• entry and exit from the labor force

For the first part of this lecture, the parameters governing transitions into and out of
unemployment and employment are exogenous
Later, we’ll determine some of these transition rates endogenously using the McCall search
model
We’ll also use some nifty concepts like ergodicity, which provides a fundamental link between
cross-sectional and long run time series distributions
These concepts will help us build an equilibrium model of ex-ante homogeneous workers
whose different luck generates variations in their ex post experiences

49.2.1 Prerequisites

Before working through what follows, we recommend you read the lecture on finite Markov
chains
You will also need some basic linear algebra and probability

49.3 The Model

The economy is inhabited by a very large number of ex-ante identical workers


The workers live forever, spending their lives moving between unemployment and employment
Their rates of transition between employment and unemployment are governed by the
following parameters:

• 𝜆, the job finding rate for currently unemployed workers


• 𝛼, the dismissal rate for currently employed workers
• 𝑏, the entry rate into the labor force
• 𝑑, the exit rate from the labor force

The growth rate of the labor force evidently equals 𝑔 = 𝑏 − 𝑑

49.3.1 Aggregate Variables

We want to derive the dynamics of the following aggregates

• 𝐸𝑡 , the total number of employed workers at date 𝑡


• 𝑈𝑡 , the total number of unemployed workers at 𝑡
• 𝑁𝑡 , the number of workers in the labor force at 𝑡

We also want to know the values of the following objects



• The employment rate 𝑒𝑡 ∶= 𝐸𝑡 /𝑁𝑡


• The unemployment rate 𝑢𝑡 ∶= 𝑈𝑡 /𝑁𝑡

(Here and below, capital letters represent stocks and lowercase letters represent rates)

49.3.2 Laws of Motion for Stock Variables

We begin by constructing laws of motion for the aggregate variables 𝐸𝑡 , 𝑈𝑡 , 𝑁𝑡


Of the mass of workers 𝐸𝑡 who are employed at date 𝑡,

• (1 − 𝑑)𝐸𝑡 will remain in the labor force


• of these, (1 − 𝛼)(1 − 𝑑)𝐸𝑡 will remain employed

Of the mass of workers 𝑈𝑡 who are currently unemployed,

• (1 − 𝑑)𝑈𝑡 will remain in the labor force


• of these, (1 − 𝑑)𝜆𝑈𝑡 will become employed

Therefore, the number of workers who will be employed at date $t + 1$ will be

$$ E_{t+1} = (1 - d)(1 - \alpha) E_t + (1 - d) \lambda U_t $$

A similar analysis implies

$$ U_{t+1} = (1 - d) \alpha E_t + (1 - d)(1 - \lambda) U_t + b (E_t + U_t) $$

The value $b(E_t + U_t)$ is the mass of new workers entering the labor force unemployed
The total stock of workers $N_t = E_t + U_t$ evolves as

$$ N_{t+1} = (1 + b - d) N_t = (1 + g) N_t $$

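Letting $X_t := \begin{pmatrix} U_t \\ E_t \end{pmatrix}$, the law of motion for $X$ is

$$ X_{t+1} = A X_t \quad \text{where} \quad A := \begin{pmatrix} (1 - d)(1 - \lambda) + b & (1 - d)\alpha + b \\ (1 - d)\lambda & (1 - d)(1 - \alpha) \end{pmatrix} $$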

This law tells us how total employment and unemployment evolve over time
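As a quick numerical check of this law of motion, the sketch below builds A for one illustrative set of parameter values and an illustrative initial condition (both chosen here only for the example) and confirms that one step of X_{t+1} = A X_t grows the labor force by the factor 1 + g.

```python
import numpy as np

# Illustrative parameter values (assumed for this example)
λ, α, b, d = 0.283, 0.013, 0.0124, 0.00822
g = b - d

A = np.array([[(1 - d) * (1 - λ) + b, (1 - d) * α + b],
              [(1 - d) * λ,           (1 - d) * (1 - α)]])

X = np.array([12.0, 138.0])     # (U_t, E_t), illustrative stocks
X_next = A @ X                  # (U_{t+1}, E_{t+1})

growth = X_next.sum() / X.sum() # should equal 1 + g
print(growth, 1 + g)
```

The two printed numbers should agree, since each column of A sums to 1 + b − d.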

49.3.3 Laws of Motion for Rates

Now let’s derive the law of motion for rates


To get these we can divide both sides of $X_{t+1} = A X_t$ by $N_{t+1}$ to get

$$ \begin{pmatrix} U_{t+1}/N_{t+1} \\ E_{t+1}/N_{t+1} \end{pmatrix} = \frac{1}{1 + g} A \begin{pmatrix} U_t/N_t \\ E_t/N_t \end{pmatrix} $$

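Letting

$$ x_t := \begin{pmatrix} u_t \\ e_t \end{pmatrix} = \begin{pmatrix} U_t/N_t \\ E_t/N_t \end{pmatrix} $$

we can also write this as

$$ x_{t+1} = \hat A x_t \quad \text{where} \quad \hat A := \frac{1}{1 + g} A $$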

You can check that $e_t + u_t = 1$ implies that $e_{t+1} + u_{t+1} = 1$
This follows from the fact that the columns of $\hat A$ sum to 1
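The column-sum claim can be verified directly from the definition of $A$. Each column of $A$ sums to $1 + g$:

```latex
\begin{aligned}
\text{column 1:}\quad & (1-d)(1-\lambda) + b + (1-d)\lambda = (1-d) + b = 1 + g \\
\text{column 2:}\quad & (1-d)\alpha + b + (1-d)(1-\alpha) = (1-d) + b = 1 + g
\end{aligned}
```

Dividing by $1 + g$, the columns of $\hat A$ sum to 1, so adding the two rows of $x_{t+1} = \hat A x_t$ gives $u_{t+1} + e_{t+1} = u_t + e_t$.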

49.4 Implementation

Let’s code up these equations


To do this we’re going to use a class that we’ll call LakeModel
This class will

1. store the primitives $\alpha, \lambda, b, d$
2. compute and store the implied objects $g$, $A$, $\hat A$
3. provide methods to simulate dynamics of the stocks and rates
4. provide a method to compute the steady state of the rates

To write a nice implementation, there’s an issue we have to address


Derived data such as 𝐴 depend on the primitives like 𝛼 and 𝜆
If a user alters these primitives, we would ideally like derived data to update automatically
(For example, if a user changes the value of 𝑏 for a given instance of the class, we would like
𝑔 = 𝑏 − 𝑑 to update automatically)
To achieve this outcome, we’re going to use descriptors and decorators such as @property
If you need to refresh your understanding of how these work, consult this lecture
Here’s the code:

In [2]: import numpy as np

class LakeModel:
    """
    Solves the lake model and computes dynamics of unemployment stocks and
    rates.

    Parameters:
    ------------
    λ : scalar
        The job finding rate for currently unemployed workers
    α : scalar
        The dismissal rate for currently employed workers
    b : scalar
        Entry rate into the labor force
    d : scalar
        Exit rate from the labor force

    """
    def __init__(self, λ=0.283, α=0.013, b=0.0124, d=0.00822):
        self._λ, self._α, self._b, self._d = λ, α, b, d
        self.compute_derived_values()

    def compute_derived_values(self):
        # Unpack names to simplify expression
        λ, α, b, d = self._λ, self._α, self._b, self._d

        self._g = b - d
        self._A = np.array([[(1-d) * (1-λ) + b,   (1 - d) * α + b],
                            [        (1-d) * λ, (1 - d) * (1 - α)]])

        self._A_hat = self._A / (1 + self._g)

    @property
    def g(self):
        return self._g

    @property
    def A(self):
        return self._A

    @property
    def A_hat(self):
        return self._A_hat

    @property
    def λ(self):
        return self._λ

    @λ.setter
    def λ(self, new_value):
        self._λ = new_value  # update λ itself (not α)
        self.compute_derived_values()

    @property
    def α(self):
        return self._α

    @α.setter
    def α(self, new_value):
        self._α = new_value
        self.compute_derived_values()

    @property
    def b(self):
        return self._b

    @b.setter
    def b(self, new_value):
        self._b = new_value
        self.compute_derived_values()

    @property
    def d(self):
        return self._d

    @d.setter
    def d(self, new_value):
        self._d = new_value
        self.compute_derived_values()

    def rate_steady_state(self, tol=1e-6):
        r"""
        Finds the steady state of the system :math:`x_{t+1} = \hat A x_{t}`

        Returns
        --------
        xbar : steady state vector of unemployment and employment rates
        """
        x = 0.5 * np.ones(2)
        error = tol + 1
        # Iterate on x -> A_hat x until successive iterates are within tol
        while error > tol:
            new_x = self.A_hat @ x
            error = np.max(np.abs(new_x - x))
            x = new_x
        return x

    def simulate_stock_path(self, X0, T):
        """
        Simulates the sequence of unemployment and employment stocks

        Parameters
        ------------
        X0 : array
            Contains initial values (U0, E0)
        T : int
            Number of periods to simulate

        Returns
        ---------
        X : iterator
            Contains sequence of unemployment and employment stocks
        """

        X = np.atleast_1d(X0)  # Recast as array just in case
        for t in range(T):
            yield X
            X = self.A @ X

    def simulate_rate_path(self, x0, T):
        """
        Simulates the sequence of unemployment and employment rates

        Parameters
        ------------
        x0 : array
            Contains initial values (u0, e0)
        T : int
            Number of periods to simulate

        Returns
        ---------
        x : iterator
            Contains sequence of unemployment and employment rates

        """
        x = np.atleast_1d(x0)  # Recast as array just in case
        for t in range(T):
            yield x
            x = self.A_hat @ x

As desired, if we create an instance and update a primitive like 𝛼, derived objects like 𝐴 will
also change

In [3]: lm = LakeModel()
lm.α

Out[3]: 0.013

In [4]: lm.A

Out[4]: array([[0.72350626, 0.02529314],
               [0.28067374, 0.97888686]])

In [5]: lm.α = 2
lm.A

Out[5]: array([[ 0.72350626,  1.99596   ],
               [ 0.28067374, -0.99178   ]])

49.4.1 Aggregate Dynamics

Let’s run a simulation under the default parameters (see above) starting from 𝑋0 = (12, 138)

In [6]: import matplotlib.pyplot as plt
%matplotlib inline

lm = LakeModel()
N_0 = 150      # Population
e_0 = 0.92     # Initial employment rate
u_0 = 1 - e_0  # Initial unemployment rate
T = 50         # Simulation length

U_0 = u_0 * N_0
E_0 = e_0 * N_0

fig, axes = plt.subplots(3, 1, figsize=(10, 8))
X_0 = (U_0, E_0)
# Collect the generator into a tuple before stacking
X_path = np.vstack(tuple(lm.simulate_stock_path(X_0, T)))

axes[0].plot(X_path[:, 0], lw=2)
axes[0].set_title('Unemployment')

axes[1].plot(X_path[:, 1], lw=2)
axes[1].set_title('Employment')

axes[2].plot(X_path.sum(1), lw=2)
axes[2].set_title('Labor force')

for ax in axes:
    ax.grid()

plt.tight_layout()
plt.show()


The aggregates 𝐸𝑡 and 𝑈𝑡 don’t converge because their sum 𝐸𝑡 + 𝑈𝑡 grows at rate 𝑔
On the other hand, the vector of employment and unemployment rates $x_t$ can be in a steady
state $\bar{x}$ if there exists an $\bar{x}$ such that

• $\bar{x} = \hat{A} \bar{x}$
• the components satisfy $\bar{e} + \bar{u} = 1$

This equation tells us that a steady state level 𝑥̄ is an eigenvector of 𝐴 ̂ associated with a unit
eigenvalue
We also have $x_t \to \bar{x}$ as $t \to \infty$ provided that the remaining eigenvalue of
$\hat{A}$ has modulus less than 1
This is the case for our default parameters:

In [7]: lm = LakeModel()
e, f = np.linalg.eigvals(lm.A_hat)
abs(e), abs(f)

Out[7]: (0.6953067378358462, 1.0)
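The eigenvector characterization can also be checked directly with NumPy. The following is only a sketch and not the `LakeModel` implementation: it rebuilds the rate transition matrix from the default parameters, assuming the definitions of the stock transition matrix and of the growth rate $g = b - d$ given earlier in this lecture, and all variable names are illustrative.

```python
import numpy as np

# Default parameters quoted in this lecture
α, λ, b, d = 0.013, 0.283, 0.0124, 0.00822

# Assumed from the earlier definition of the model: stock transition
# matrix A, growth rate g = b - d, and rate transition matrix A_hat
g = b - d
A = np.array([[(1 - d) * (1 - λ) + b, (1 - d) * α + b],
              [(1 - d) * λ,           (1 - d) * (1 - α)]])
A_hat = A / (1 + g)

# Pick the eigenvector associated with the eigenvalue closest to one
eigvals, eigvecs = np.linalg.eig(A_hat)
k = np.argmin(np.abs(eigvals - 1))
x_bar = np.real(eigvecs[:, k])
x_bar = x_bar / x_bar.sum()    # normalize so that u_bar + e_bar = 1

print(x_bar)                   # (u_bar, e_bar)
```

Dividing by the sum of the components fixes both the scale and the sign of the eigenvector, so the result can be read as a pair of rates. It should be close to, but not digit-for-digit identical with, the output of `rate_steady_state`, which stops iterating once successive iterates differ by less than its tolerance.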

Let’s look at the convergence of the unemployment and employment rates to their steady state
levels (dashed red line)

In [8]: lm = LakeModel()
e_0 = 0.92     # Initial employment rate
u_0 = 1 - e_0  # Initial unemployment rate
T = 50         # Simulation length

xbar = lm.rate_steady_state()

fig, axes = plt.subplots(2, 1, figsize=(10, 8))
x_0 = (u_0, e_0)
# Collect the generator into a tuple before stacking
x_path = np.vstack(tuple(lm.simulate_rate_path(x_0, T)))

titles = ['Unemployment rate', 'Employment rate']

for i, title in enumerate(titles):
    axes[i].plot(x_path[:, i], lw=2, alpha=0.5)
    axes[i].hlines(xbar[i], 0, T, 'r', '--')
    axes[i].set_title(title)
    axes[i].grid()

plt.tight_layout()
plt.show()


49.5 Dynamics of an Individual Worker

An individual worker’s employment dynamics are governed by a finite state Markov process
The worker can be in one of two states:

• 𝑠𝑡 = 0 means unemployed
• 𝑠𝑡 = 1 means employed

Let’s start off under the assumption that 𝑏 = 𝑑 = 0


The associated transition matrix is then

$$P = \begin{pmatrix} 1 - \lambda & \lambda \\ \alpha & 1 - \alpha \end{pmatrix}$$

Let 𝜓𝑡 denote the marginal distribution over employment/unemployment states for the
worker at time 𝑡
As usual, we regard it as a row vector
We know from an earlier discussion that 𝜓𝑡 follows the law of motion

𝜓𝑡+1 = 𝜓𝑡 𝑃

We also know from the lecture on finite Markov chains that if 𝛼 ∈ (0, 1) and 𝜆 ∈ (0, 1), then
𝑃 has a unique stationary distribution, denoted here by 𝜓∗
The unique stationary distribution satisfies

$$\psi^*[0] = \frac{\alpha}{\alpha + \lambda}$$

Not surprisingly, probability mass on the unemployment state increases with the dismissal
rate and falls with the job finding rate
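As a quick numerical check of this formula, the sketch below solves the stationarity condition together with the requirement that probabilities sum to one, using the default values of the dismissal and job finding rates quoted elsewhere in this lecture. The least squares formulation is just one convenient way to solve this small linear system and is an assumption of the sketch.

```python
import numpy as np

α, λ = 0.013, 0.283          # default dismissal and job finding rates

# Transition matrix with state 0 = unemployed, state 1 = employed
P = np.array([[1 - λ, λ],
              [α,     1 - α]])

# Solve ψ P = ψ together with ψ[0] + ψ[1] = 1 as a least squares system
M = np.vstack([P.T - np.eye(2), np.ones(2)])
rhs = np.array([0.0, 0.0, 1.0])
ψ_star, *_ = np.linalg.lstsq(M, rhs, rcond=None)

print(ψ_star[0], α / (α + λ))   # the two numbers should agree
```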

49.5.1 Ergodicity

Let’s look at a typical lifetime of employment-unemployment spells


We want to compute the average amounts of time an infinitely lived worker would spend em-
ployed and unemployed
Let

$$\bar{s}_{u,T} := \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\{s_t = 0\}$$

and

$$\bar{s}_{e,T} := \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\{s_t = 1\}$$

(As usual, 1{𝑄} = 1 if statement 𝑄 is true and 0 otherwise)


These are the fractions of time a worker spends unemployed and employed, respectively, up
until period $T$
If 𝛼 ∈ (0, 1) and 𝜆 ∈ (0, 1), then 𝑃 is ergodic, and hence we have

$$\lim_{T \to \infty} \bar{s}_{u,T} = \psi^*[0] \quad \text{and} \quad \lim_{T \to \infty} \bar{s}_{e,T} = \psi^*[1]$$

with probability one


Inspection tells us that 𝑃 is exactly the transpose of 𝐴 ̂ under the assumption 𝑏 = 𝑑 = 0
Thus, the percentages of time that an infinitely lived worker spends employed and unem-
ployed equal the fractions of workers employed and unemployed in the steady state distribu-
tion
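The ergodicity claim can be illustrated with a plain NumPy simulation. This is a minimal sketch that does not use the `LakeModel` or `MarkovChain` classes; the sample length and the random seed are arbitrary choices made here for illustration.

```python
import numpy as np

α, λ = 0.013, 0.283            # default rates; b = d = 0 is assumed here
T = 200_000                    # illustrative sample length
rng = np.random.default_rng(0)
draws = rng.random(T)

s = np.empty(T, dtype=int)
state = 1                      # start employed
for t in range(T):
    s[t] = state
    if state == 0:             # unemployed: find a job with probability λ
        state = 1 if draws[t] < λ else 0
    else:                      # employed: lose the job with probability α
        state = 0 if draws[t] < α else 1

s_bar_u = np.mean(s == 0)      # fraction of time unemployed
print(s_bar_u, α / (α + λ))    # time average vs stationary probability
```

For a long enough sample the two printed numbers should be close, although the gap shrinks slowly because the chain is persistent.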

49.5.2 Convergence Rate

How long does it take for time series sample averages to converge to cross-sectional averages?
We can use QuantEcon.py’s MarkovChain class to investigate this
Let’s plot the path of the sample averages over 5,000 periods

In [9]: from quantecon import MarkovChain

lm = LakeModel(d=0, b=0)
T = 5000  # Simulation length

α, λ = lm.α, lm.λ

P = [[1 - λ,     λ],
     [    α, 1 - α]]

mc = MarkovChain(P)

xbar = lm.rate_steady_state()

fig, axes = plt.subplots(2, 1, figsize=(10, 8))
s_path = mc.simulate(T, init=1)
s_bar_e = s_path.cumsum() / range(1, T+1)
s_bar_u = 1 - s_bar_e

to_plot = [s_bar_u, s_bar_e]
titles = ['Percent of time unemployed', 'Percent of time employed']

for i, plot in enumerate(to_plot):
    axes[i].plot(plot, lw=2, alpha=0.5)
    axes[i].hlines(xbar[i], 0, T, 'r', '--')
    axes[i].set_title(titles[i])
    axes[i].grid()

plt.tight_layout()
plt.show()

The stationary probabilities are given by the dashed red line


In this case it takes much of the sample for these two objects to converge
This is largely due to the high persistence in the Markov chain
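One rough way to quantify this persistence, offered here as an illustration rather than as part of the original argument, is to look at the non-unit eigenvalue of $P$ and at the expected length of an employment spell. For a two-state chain the non-unit eigenvalue equals $1 - \alpha - \lambda$, and a spell that ends with probability $\alpha$ each period lasts $1/\alpha$ periods on average.

```python
import numpy as np

α, λ = 0.013, 0.283
P = np.array([[1 - λ, λ],
              [α,     1 - α]])

eigvals = np.sort(np.abs(np.linalg.eigvals(P)))
second = eigvals[0]                  # modulus of the non-unit eigenvalue
print(second, 1 - α - λ)

# Expected length of an employment spell, in periods, is 1 / α
print(1 / α)
```

With long employment spells, a sample of 5,000 periods contains relatively few transitions into unemployment, which is consistent with the slow convergence of the sample averages.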

49.6 Endogenous Job Finding Rate

We now make the hiring rate endogenous


The transition rate from unemployment to employment will be determined by the McCall
search model [94]
All details relevant to the following discussion can be found in our treatment of that model

49.6.1 Reservation Wage

The most important thing to remember about the model is that optimal decisions are charac-
terized by a reservation wage 𝑤̄

• If the wage offer $w$ in hand is greater than or equal to $\bar{w}$, then the worker accepts
• Otherwise, the worker rejects

As we saw in our discussion of the model, the reservation wage depends on the wage offer dis-
tribution and the parameters

• 𝛼, the separation rate


• 𝛽, the discount factor
• 𝛾, the offer arrival rate
• 𝑐, unemployment compensation

49.6.2 Linking the McCall Search Model to the Lake Model

Suppose that all workers inside a lake model behave according to the McCall search model
The exogenous probability of leaving employment remains 𝛼
But their optimal decision rules determine the probability 𝜆 of leaving unemployment
This is now

$$\lambda = \gamma \mathbb{P}\{w_t \geq \bar{w}\} = \gamma \sum_{w' \geq \bar{w}} p(w') \tag{1}$$
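To make the mapping from a reservation wage to a job finding rate concrete, here is a minimal sketch for a discrete wage offer distribution. The helper `job_finding_rate` and the three-point distribution are hypothetical and serve only to illustrate Eq. (1).

```python
import numpy as np

def job_finding_rate(w_vec, p_vec, w_bar, γ=1.0):
    """Return γ times the probability that an offer is at least w_bar."""
    w_vec, p_vec = np.asarray(w_vec), np.asarray(p_vec)
    return γ * p_vec[w_vec >= w_bar].sum()

# Hypothetical three-point wage distribution
w_vec = [10.0, 15.0, 20.0]
p_vec = [0.2, 0.5, 0.3]

λ = job_finding_rate(w_vec, p_vec, w_bar=15.0, γ=0.7)
print(λ)
```

Raising the reservation wage removes mass from the sum and so lowers the job finding rate, which is the channel through which policy affects unemployment in what follows.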

49.6.3 Fiscal Policy

We can use the McCall search version of the Lake Model to find an optimal level of unem-
ployment insurance
We assume that the government sets unemployment compensation 𝑐
The government imposes a lump-sum tax 𝜏 sufficient to finance total unemployment pay-
ments
To attain a balanced budget at a steady state, taxes, the steady state unemployment rate 𝑢,
and the unemployment compensation rate must satisfy

𝜏 = 𝑢𝑐

The lump-sum tax applies to everyone, including unemployed workers


Thus, the post-tax income of an employed worker with wage 𝑤 is 𝑤 − 𝜏
The post-tax income of an unemployed worker is 𝑐 − 𝜏
For each specification (𝑐, 𝜏 ) of government policy, we can solve for the worker’s optimal reser-
vation wage
This determines 𝜆 via Eq. (1) evaluated at post tax wages, which in turn determines a steady
state unemployment rate 𝑢(𝑐, 𝜏 )
For a given level of unemployment benefit 𝑐, we can solve for a tax that balances the budget
in the steady state

𝜏 = 𝑢(𝑐, 𝜏 )𝑐

To evaluate alternative government tax-unemployment compensation pairs, we require a
welfare criterion
We use a steady state welfare criterion

𝑊 ∶= 𝑒 E[𝑉 | employed] + 𝑢 𝑈

where the notation 𝑉 and 𝑈 is as defined in the McCall search model lecture
The wage offer distribution will be a discretized version of the lognormal distribution
𝐿𝑁 (log(20), 1), as shown in the next figure

We take a period to be a month


We set 𝑏 and 𝑑 to match monthly birth and death rates, respectively, in the U.S. population

• 𝑏 = 0.0124
• 𝑑 = 0.00822

Following [30], we set 𝛼, the hazard rate of leaving employment, to

• 𝛼 = 0.013

49.6.4 Fiscal Policy Code

We will make use of code we wrote in the McCall model lecture, embedded below for conve-
nience
The first piece of code, repeated below, implements value function iteration

In [10]: import numpy as np
from quantecon.distributions import BetaBinomial
from numba import jit

# A default utility function

@jit
def u(c, σ):
    if c > 0:
        return (c**(1 - σ) - 1) / (1 - σ)
    else:
        return -10e6

class McCallModel:
    """
    Stores the parameters and functions associated with a given model.
    """

    def __init__(self,
                 α=0.2,        # Job separation rate
                 β=0.98,       # Discount factor
                 γ=0.7,        # Job offer rate
                 c=6.0,        # Unemployment compensation
                 σ=2.0,        # Utility parameter
                 w_vec=None,   # Possible wage values
                 p_vec=None):  # Probabilities over w_vec

        self.α, self.β, self.γ, self.c = α, β, γ, c
        self.σ = σ

        # Add a default wage vector and probabilities over the vector using
        # the beta-binomial distribution
        if w_vec is None:
            n = 60  # number of possible outcomes for wage
            self.w_vec = np.linspace(10, 20, n)  # wages between 10 and 20
            a, b = 600, 400  # shape parameters
            dist = BetaBinomial(n-1, a, b)
            self.p_vec = dist.pdf()
        else:
            self.w_vec = w_vec
            self.p_vec = p_vec

@jit
def _update_bellman(α, β, γ, c, σ, w_vec, p_vec, V, V_new, U):
    """
    A jitted function to update the Bellman equations. Note that V_new is
    modified in place (i.e., modified by this function). The new value of U is
    returned.

    """
    for w_idx, w in enumerate(w_vec):
        # w_idx indexes the vector of possible wages
        V_new[w_idx] = u(w, σ) + β * ((1 - α) * V[w_idx] + α * U)

    U_new = u(c, σ) + β * (1 - γ) * U + \
        β * γ * np.sum(np.maximum(U, V) * p_vec)

    return U_new

def solve_mccall_model(mcm, tol=1e-5, max_iter=2000):
    """
    Iterates to convergence on the Bellman equations

    Parameters
    ----------
    mcm : an instance of McCallModel
    tol : float
        error tolerance
    max_iter : int
        the maximum number of iterations
    """

    V = np.ones(len(mcm.w_vec))  # Initial guess of V
    V_new = np.empty_like(V)     # To store updates to V
    U = 1                        # Initial guess of U
    i = 0
    error = tol + 1

    while error > tol and i < max_iter:
        U_new = _update_bellman(mcm.α, mcm.β, mcm.γ,
                                mcm.c, mcm.σ, mcm.w_vec, mcm.p_vec, V, V_new, U)
        error_1 = np.max(np.abs(V_new - V))
        error_2 = np.abs(U_new - U)
        error = max(error_1, error_2)
        V[:] = V_new
        U = U_new
        i += 1

    return V, U

The second piece of code repeated from the McCall model lecture is used to compute the
reservation wage

In [11]: def compute_reservation_wage(mcm, return_values=False):
    """
    Computes the reservation wage of an instance of the McCall model
    by finding the smallest w such that V(w) > U.

    If V(w) > U for all w, then the reservation wage w_bar is set to
    the lowest wage in mcm.w_vec.

    If V(w) < U for all w, then w_bar is set to np.inf.

    Parameters
    ----------
    mcm : an instance of McCallModel
    return_values : bool (optional, default=False)
        Return the value functions as well

    Returns
    -------
    w_bar : scalar
        The reservation wage

    """

    V, U = solve_mccall_model(mcm)
    w_idx = np.searchsorted(V - U, 0)

    if w_idx == len(V):
        w_bar = np.inf
    else:
        w_bar = mcm.w_vec[w_idx]

    if not return_values:
        return w_bar
    else:
        return w_bar, V, U

Now let’s compute and plot welfare, employment, unemployment, and tax revenue as a func-
tion of the unemployment compensation rate

In [12]: from scipy.stats import norm
from scipy.optimize import brentq

# Some global variables that will stay constant
α = 0.013
α_q = (1-(1-α)**3)  # Quarterly (α is monthly)
b = 0.0124
d = 0.00822
β = 0.98
γ = 1.0
σ = 2.0

# The default wage distribution --- a discretized lognormal
log_wage_mean, wage_grid_size, max_wage = 20, 200, 170
logw_dist = norm(np.log(log_wage_mean), 1)
w_vec = np.linspace(1e-8, max_wage, wage_grid_size + 1)
cdf = logw_dist.cdf(np.log(w_vec))
pdf = cdf[1:] - cdf[:-1]
p_vec = pdf / pdf.sum()
w_vec = (w_vec[1:] + w_vec[:-1]) / 2

def compute_optimal_quantities(c, τ):
    """
    Compute the reservation wage, job finding rate and value functions of the
    workers given c and τ.

    """

    mcm = McCallModel(α=α_q,
                      β=β,
                      γ=γ,
                      c=c-τ,          # post tax compensation
                      σ=σ,
                      w_vec=w_vec-τ,  # post tax wages
                      p_vec=p_vec)

    w_bar, V, U = compute_reservation_wage(mcm, return_values=True)
    λ = γ * np.sum(p_vec[w_vec - τ > w_bar])
    return w_bar, λ, V, U

def compute_steady_state_quantities(c, τ):
    """
    Compute the steady state unemployment rate given c and τ using optimal
    quantities from the McCall model and computing corresponding steady state
    quantities

    """
    w_bar, λ, V, U = compute_optimal_quantities(c, τ)

    # Compute steady state employment and unemployment rates
    lm = LakeModel(α=α_q, λ=λ, b=b, d=d)
    x = lm.rate_steady_state()
    u, e = x

    # Compute steady state welfare
    accept = w_vec - τ > w_bar
    w = np.sum(V * p_vec * accept) / np.sum(p_vec * accept)
    welfare = e * w + u * U

    return e, u, welfare

def find_balanced_budget_tax(c):
    """
    Find the tax level that will induce a balanced budget.

    """
    def steady_state_budget(t):
        e, u, w = compute_steady_state_quantities(c, t)
        return t - u * c

    τ = brentq(steady_state_budget, 0.0, 0.9 * c)
    return τ

# Levels of unemployment insurance we wish to study
c_vec = np.linspace(5, 140, 60)

tax_vec = []
unempl_vec = []
empl_vec = []
welfare_vec = []

for c in c_vec:
    t = find_balanced_budget_tax(c)
    e_rate, u_rate, welfare = compute_steady_state_quantities(c, t)
    tax_vec.append(t)
    unempl_vec.append(u_rate)
    empl_vec.append(e_rate)
    welfare_vec.append(welfare)

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

plots = [unempl_vec, empl_vec, tax_vec, welfare_vec]
titles = ['Unemployment', 'Employment', 'Tax', 'Welfare']

for ax, plot, title in zip(axes.flatten(), plots, titles):
    ax.plot(c_vec, plot, lw=2, alpha=0.7)
    ax.set_title(title)
    ax.grid()

plt.tight_layout()
plt.show()

Welfare first increases and then decreases as unemployment benefits rise


The level that maximizes steady state welfare is approximately 62

49.7 Exercises

49.7.1 Exercise 1

Consider an economy with an initial stock of workers 𝑁0 = 100 at the steady state level of
employment in the baseline parameterization

• 𝛼 = 0.013
• 𝜆 = 0.283
• 𝑏 = 0.0124
• 𝑑 = 0.00822

(The values for 𝛼 and 𝜆 follow [30])


Suppose that in response to new legislation the hiring rate reduces to 𝜆 = 0.2
Plot the transition dynamics of the unemployment and employment stocks for 50 periods
Plot the transition dynamics for the rates
How long does the economy take to converge to its new steady state?
What is the new steady state level of employment?

49.7.2 Exercise 2

Consider an economy with an initial stock of workers 𝑁0 = 100 at the steady state level of
employment in the baseline parameterization
Suppose that for 20 periods the birth rate was temporarily high (𝑏 = 0.0025) and then re-
turned to its original level
Plot the transition dynamics of the unemployment and employment stocks for 50 periods
Plot the transition dynamics for the rates
How long does the economy take to return to its original steady state?

49.8 Solutions

49.9 Lake Model Solutions

49.9.1 Exercise 1

We begin by constructing the class containing the default parameters and assigning the
steady state values to x0

In [13]: lm = LakeModel()
x0 = lm.rate_steady_state()
print(f"Initial Steady State: {x0}")

Initial Steady State: [0.08266806 0.91733194]

Initialize the simulation values

In [14]: N0 = 100
T = 50

New legislation changes 𝜆 to 0.2

In [15]: lm.λ = 0.2

xbar = lm.rate_steady_state()  # new steady state
# Collect each generator into a tuple before stacking
X_path = np.vstack(tuple(lm.simulate_stock_path(x0 * N0, T)))
x_path = np.vstack(tuple(lm.simulate_rate_path(x0, T)))
print(f"New Steady State: {xbar}")

Now plot stocks

In [16]: fig, axes = plt.subplots(3, 1, figsize=[10, 9])

axes[0].plot(X_path[:, 0])
axes[0].set_title('Unemployment')

axes[1].plot(X_path[:, 1])
axes[1].set_title('Employment')

axes[2].plot(X_path.sum(1))
axes[2].set_title('Labor force')

for ax in axes:
    ax.grid()

plt.tight_layout()
plt.show()

And how the rates evolve



In [17]: fig, axes = plt.subplots(2, 1, figsize=(10, 8))

titles = ['Unemployment rate', 'Employment rate']

for i, title in enumerate(titles):
    axes[i].plot(x_path[:, i])
    axes[i].hlines(xbar[i], 0, T, 'r', '--')
    axes[i].set_title(title)
    axes[i].grid()

plt.tight_layout()
plt.show()

We see that it takes 20 periods for the economy to converge to its new steady state levels

49.9.2 Exercise 2

This next exercise has the economy experiencing a boom in entrances to the labor market and
then later returning to the original levels
For 20 periods the economy has a new entry rate into the labor market
Let’s start off at the baseline parameterization and record the steady state

In [18]: lm = LakeModel()
x0 = lm.rate_steady_state()

Here are the other parameters:



In [19]: b_hat = 0.003


T_hat = 20

Let’s increase 𝑏 to the new value and simulate for 20 periods

In [20]: lm.b = b_hat
# Collect each generator into a tuple before stacking
X_path1 = np.vstack(tuple(lm.simulate_stock_path(x0 * N0, T_hat)))  # simulate stocks
x_path1 = np.vstack(tuple(lm.simulate_rate_path(x0, T_hat)))        # simulate rates

Now we reset 𝑏 to the original value and then, using the state after 20 periods for the new
initial conditions, we simulate for the additional 30 periods

In [21]: lm.b = 0.0124
# Collect each generator into a tuple before stacking
X_path2 = np.vstack(tuple(lm.simulate_stock_path(X_path1[-1, :2], T-T_hat+1)))  # simulate stocks
x_path2 = np.vstack(tuple(lm.simulate_rate_path(x_path1[-1, :2], T-T_hat+1)))   # simulate rates

Finally, we combine these two paths and plot

In [22]: x_path = np.vstack([x_path1, x_path2[1:]]) # note [1:] to avoid doubling period 20


X_path = np.vstack([X_path1, X_path2[1:]])

fig, axes = plt.subplots(3, 1, figsize=[10, 9])

axes[0].plot(X_path[:, 0])
axes[0].set_title('Unemployment')

axes[1].plot(X_path[:, 1])
axes[1].set_title('Employment')

axes[2].plot(X_path.sum(1))
axes[2].set_title('Labor force')

for ax in axes:
    ax.grid()

plt.tight_layout()
plt.show()

And the rates

In [23]: fig, axes = plt.subplots(2, 1, figsize=[10, 6])

titles = ['Unemployment rate', 'Employment rate']

for i, title in enumerate(titles):
    axes[i].plot(x_path[:, i])
    axes[i].hlines(x0[i], 0, T, 'r', '--')
    axes[i].set_title(title)
    axes[i].grid()

plt.tight_layout()
plt.show()
50

Rational Expectations Equilibrium

50.1 Contents

• Overview 50.2

• Defining Rational Expectations Equilibrium 50.3

• Computation of an Equilibrium 50.4

• Exercises 50.5

• Solutions 50.6

“If you’re so smart, why aren’t you rich?”

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

50.2 Overview

This lecture introduces the concept of rational expectations equilibrium


To illustrate it, we describe a linear quadratic version of a famous and important model due
to Lucas and Prescott [89]
This 1971 paper is one of a small number of research articles that kicked off the rational ex-
pectations revolution
We follow Lucas and Prescott by employing a setting that is readily “Bellmanized” (i.e., capa-
ble of being formulated in terms of dynamic programming problems)
Because we use linear quadratic setups for demand and costs, we can adapt the LQ program-
ming techniques described in this lecture
We will learn about how a representative agent’s problem differs from a planner’s, and how a
planning problem can be used to compute rational expectations quantities
We will also learn about how a rational expectations equilibrium can be characterized as a
fixed point of a mapping from a perceived law of motion to an actual law of motion


Equality between a perceived and an actual law of motion for endogenous market-wide ob-
jects captures in a nutshell what the rational expectations equilibrium concept is all about
Finally, we will learn about the important “Big 𝐾, little 𝑘” trick, a modeling device widely
used in macroeconomics
Except that for us

• Instead of “Big 𝐾” it will be “Big 𝑌 ”


• Instead of “little 𝑘” it will be “little 𝑦”

50.2.1 The Big Y, Little y Trick

This widely used method applies in contexts in which a “representative firm” or agent is a
“price taker” operating within a competitive equilibrium
We want to impose that

• The representative firm or individual takes aggregate 𝑌 as given when it chooses indi-
vidual 𝑦, but …
• At the end of the day, 𝑌 = 𝑦, so that the representative firm is indeed representative

The Big 𝑌 , little 𝑦 trick accomplishes these two goals by

• Taking $Y$ as beyond control when posing the choice problem of the agent who chooses $y$; but …
• Imposing 𝑌 = 𝑦 after having solved the individual’s optimization problem

Please watch for how this strategy is applied as the lecture unfolds
We begin by applying the Big 𝑌 , little 𝑦 trick in a very simple static context
A Simple Static Example of the Big Y, Little y Trick
Consider a static model in which a collection of 𝑛 firms produce a homogeneous good that is
sold in a competitive market
Each of these $n$ firms sells output $y$
The price $p$ of the good lies on an inverse demand curve

$$p = a_0 - a_1 Y \tag{1}$$

where

• 𝑎𝑖 > 0 for 𝑖 = 0, 1
• 𝑌 = 𝑛𝑦 is the market-wide level of output

Each firm has a total cost function

$$c(y) = c_1 y + 0.5 c_2 y^2, \quad c_i > 0 \text{ for } i = 1, 2$$

The profits of a representative firm are 𝑝𝑦 − 𝑐(𝑦)



Using Eq. (1), we can express the problem of the representative firm as

$$\max_{y} \left[ (a_0 - a_1 Y) y - c_1 y - 0.5 c_2 y^2 \right] \tag{2}$$

In posing problem Eq. (2), we want the firm to be a price taker


We do that by regarding 𝑝 and therefore 𝑌 as exogenous to the firm
The essence of the Big 𝑌 , little 𝑦 trick is not to set 𝑌 = 𝑛𝑦 before taking the first-order condi-
tion with respect to 𝑦 in problem Eq. (2)
This assures that the firm is a price taker
The first-order condition for problem Eq. (2) is

$$a_0 - a_1 Y - c_1 - c_2 y = 0 \tag{3}$$

At this point, but not before, we substitute $Y = ny$ into Eq. (3) to obtain the following linear
equation

$$a_0 - c_1 - (a_1 + n^{-1} c_2) Y = 0 \tag{4}$$

to be solved for the competitive equilibrium market-wide output 𝑌


After solving for 𝑌 , we can compute the competitive equilibrium price 𝑝 from the inverse de-
mand curve Eq. (1)
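The steps of the static example can be traced numerically. The parameter values below are hypothetical and chosen only for illustration; the sketch solves Eq. (4) for market-wide output, recovers the price from Eq. (1), and confirms that the firm's first-order condition Eq. (3) holds when each firm produces its share of aggregate output.

```python
# Hypothetical parameter values, chosen only for illustration
a0, a1 = 100.0, 0.05      # inverse demand curve p = a0 - a1 * Y
c1, c2 = 10.0, 1.0        # cost function c(y) = c1 * y + 0.5 * c2 * y**2
n = 50                    # number of firms

# Solve the linear equation (4) for market-wide output
Y = (a0 - c1) / (a1 + c2 / n)
y = Y / n                 # each firm's output
p = a0 - a1 * Y           # equilibrium price from Eq. (1)

# The firm's first-order condition (3) should hold at these values
foc = a0 - a1 * Y - c1 - c2 * y
print(Y, y, p, foc)
```

Note that the aggregate is held fixed when the first-order condition is formed and only afterwards set equal to the sum of firm outputs, which is the order of operations the trick prescribes.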

50.2.2 Further Reading

References for this lecture include

• [89]
• [118], chapter XIV
• [87], chapter 7

50.3 Defining Rational Expectations Equilibrium

Our first illustration of a rational expectations equilibrium involves a market with 𝑛 firms,
each of which seeks to maximize the discounted present value of profits in the face of adjust-
ment costs
The adjustment costs induce the firms to make gradual adjustments, which in turn requires
consideration of future prices
Individual firms understand that, via the inverse demand curve, the price is determined by
the amounts supplied by other firms
Hence each firm wants to forecast future total industry supplies
In our context, a forecast is generated by a belief about the law of motion for the aggregate
state

Rational expectations equilibrium prevails when this belief coincides with the actual law of
motion generated by production choices induced by this belief
We formulate a rational expectations equilibrium in terms of a fixed point of an operator that
maps beliefs into optimal beliefs

50.3.1 Competitive Equilibrium with Adjustment Costs

To illustrate, consider a collection of $n$ firms producing a homogeneous good that is sold in a
competitive market
Each of these $n$ firms sells output $y_t$
The price 𝑝𝑡 of the good lies on the inverse demand curve

𝑝𝑡 = 𝑎0 − 𝑎1 𝑌𝑡 (5)

where

• 𝑎𝑖 > 0 for 𝑖 = 0, 1
• 𝑌𝑡 = 𝑛𝑦𝑡 is the market-wide level of output

The Firm’s Problem


Each firm is a price taker
While it faces no uncertainty, it does face adjustment costs
In particular, it chooses a production plan to maximize

$$\sum_{t=0}^{\infty} \beta^t r_t \tag{6}$$

where

$$r_t := p_t y_t - \frac{\gamma (y_{t+1} - y_t)^2}{2}, \quad y_0 \text{ given} \tag{7}$$

Regarding the parameters,

• 𝛽 ∈ (0, 1) is a discount factor


• 𝛾 > 0 measures the cost of adjusting the rate of output

Regarding timing, the firm observes 𝑝𝑡 and 𝑦𝑡 when it chooses 𝑦𝑡+1 at time 𝑡
To state the firm’s optimization problem completely requires that we specify dynamics for all
state variables
This includes ones that the firm cares about but does not control like 𝑝𝑡
We turn to this problem now
Prices and Aggregate Output

In view of Eq. (5), the firm’s incentive to forecast the market price translates into an incen-
tive to forecast aggregate output 𝑌𝑡
Aggregate output depends on the choices of other firms
We assume that 𝑛 is such a large number that the output of any single firm has a negligible
effect on aggregate output
That justifies firms in regarding their forecasts of aggregate output as being unaffected by
their own output decisions
The Firm’s Beliefs
We suppose the firm believes that market-wide output 𝑌𝑡 follows the law of motion

𝑌𝑡+1 = 𝐻(𝑌𝑡 ) (8)

where 𝑌0 is a known initial condition


The belief function 𝐻 is an equilibrium object, and hence remains to be determined
Optimal Behavior Given Beliefs
For now, let’s fix a particular belief 𝐻 in Eq. (8) and investigate the firm’s response to it
Let 𝑣 be the optimal value function for the firm’s problem given 𝐻
The value function satisfies the Bellman equation

$$v(y, Y) = \max_{y'} \left\{ a_0 y - a_1 y Y - \frac{\gamma (y' - y)^2}{2} + \beta v(y', H(Y)) \right\} \tag{9}$$

Let’s denote the firm’s optimal policy function by ℎ, so that

𝑦𝑡+1 = ℎ(𝑦𝑡 , 𝑌𝑡 ) (10)

where

$$h(y, Y) := \operatorname*{argmax}_{y'} \left\{ a_0 y - a_1 y Y - \frac{\gamma (y' - y)^2}{2} + \beta v(y', H(Y)) \right\} \tag{11}$$

Evidently 𝑣 and ℎ both depend on 𝐻


A First-Order Characterization
In what follows it will be helpful to have a second characterization of ℎ, based on first-order
conditions
The first-order necessary condition for choosing $y'$ is

$$-\gamma (y' - y) + \beta v_y(y', H(Y)) = 0 \tag{12}$$

A useful envelope result of Benveniste-Scheinkman [13] implies that to differentiate $v$ with
respect to $y$ we can naively differentiate the right side of Eq. (9), giving

$$v_y(y, Y) = a_0 - a_1 Y + \gamma (y' - y)$$

Substituting this equation into Eq. (12) gives the Euler equation

−𝛾(𝑦𝑡+1 − 𝑦𝑡 ) + 𝛽[𝑎0 − 𝑎1 𝑌𝑡+1 + 𝛾(𝑦𝑡+2 − 𝑦𝑡+1 )] = 0 (13)

The firm optimally sets an output path that satisfies Eq. (13), taking Eq. (8) as given, and
subject to

• the initial conditions for (𝑦0 , 𝑌0 )


• the terminal condition $\lim_{t \to \infty} \beta^t y_t v_y(y_t, Y_t) = 0$

This last condition is called the transversality condition, and acts as a first-order necessary
condition “at infinity”
The firm’s decision rule solves the difference equation Eq. (13) subject to the given initial
condition 𝑦0 and the transversality condition
Note that solving the Bellman equation Eq. (9) for 𝑣 and then ℎ in Eq. (11) yields a decision
rule that automatically imposes both the Euler equation Eq. (13) and the transversality con-
dition
The Actual Law of Motion for Output
As we’ve seen, a given belief translates into a particular decision rule ℎ
Recalling that 𝑌𝑡 = 𝑛𝑦𝑡 , the actual law of motion for market-wide output is then

𝑌𝑡+1 = 𝑛ℎ(𝑌𝑡 /𝑛, 𝑌𝑡 ) (14)

Thus, when firms believe that the law of motion for market-wide output is Eq. (8), their opti-
mizing behavior makes the actual law of motion be Eq. (14)

50.3.2 Definition of Rational Expectations Equilibrium

A rational expectations equilibrium or recursive competitive equilibrium of the model with ad-
justment costs is a decision rule ℎ and an aggregate law of motion 𝐻 such that

1. Given belief 𝐻, the map ℎ is the firm’s optimal policy function


2. The law of motion 𝐻 satisfies 𝐻(𝑌 ) = 𝑛ℎ(𝑌 /𝑛, 𝑌 ) for all 𝑌

Thus, a rational expectations equilibrium equates the perceived and actual laws of motion
Eq. (8) and Eq. (14)
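The second condition of the definition is easy to check numerically once a decision rule and a candidate law of motion are in hand. The sketch below is only an illustration: the helper `is_consistent` and the linear decision rule are hypothetical, and the check says nothing about the first condition, namely whether the rule is actually optimal given the belief.

```python
import numpy as np

def is_consistent(h, H, n, Y_grid, tol=1e-8):
    """Check H(Y) = n * h(Y / n, Y) at every point of Y_grid."""
    gap = [abs(H(Y) - n * h(Y / n, Y)) for Y in Y_grid]
    return max(gap) < tol

# Hypothetical linear decision rule, used only to exercise the check
n = 10
h = lambda y, Y: 2.0 + 0.5 * y + 0.04 * Y

# Aggregating h over n identical firms gives this law of motion
H_actual = lambda Y: n * 2.0 + (0.5 + n * 0.04) * Y
# A belief that differs from it
H_other = lambda Y: 15.0 + 0.9 * Y

Y_grid = np.linspace(0.0, 100.0, 11)
print(is_consistent(h, H_actual, n, Y_grid))   # beliefs confirmed
print(is_consistent(h, H_other, n, Y_grid))    # beliefs not confirmed
```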
Fixed Point Characterization
As we’ve seen, the firm’s optimum problem induces a mapping Φ from a perceived law of mo-
tion 𝐻 for market-wide output to an actual law of motion Φ(𝐻)
The mapping Φ is the composition of two operations, taking a perceived law of motion into a
decision rule via Eq. (9)–Eq. (11), and a decision rule into an actual law via Eq. (14)
The 𝐻 component of a rational expectations equilibrium is a fixed point of Φ

50.4 Computation of an Equilibrium

Now let’s consider the problem of computing the rational expectations equilibrium

50.4.1 Failure of Contractivity

Readers accustomed to dynamic programming arguments might try to address this problem
by choosing some guess 𝐻0 for the aggregate law of motion and then iterating with Φ
Unfortunately, the mapping Φ is not a contraction
In particular, there is no guarantee that direct iterations on Φ converge [1]
Fortunately, there is another method that works here
The method exploits a general connection between equilibrium and Pareto optimality
expressed in the fundamental theorems of welfare economics (see, e.g., [93])
Lucas and Prescott [89] used this method to construct a rational expectations equilibrium
The details follow

50.4.2 A Planning Problem Approach

Our plan of attack is to match the Euler equations of the market problem with those for a
single-agent choice problem
As we’ll see, this planning problem can be solved by LQ control (linear regulator)
The optimal quantities from the planning problem are rational expectations equilibrium
quantities
The rational expectations equilibrium price can be obtained as a shadow price in the planning
problem
For convenience, in this section, we set 𝑛 = 1
We first compute a sum of consumer and producer surplus at time $t$

$$s(Y_t, Y_{t+1}) := \int_0^{Y_t} (a_0 - a_1 x) \, dx - \frac{\gamma (Y_{t+1} - Y_t)^2}{2} \tag{15}$$

The first term is the area under the demand curve, while the second measures the social costs
of changing output
The planning problem is to choose a production plan $\{Y_t\}$ to maximize

$$\sum_{t=0}^{\infty} \beta^t s(Y_t, Y_{t+1})$$

subject to an initial condition for 𝑌0

50.4.3 Solution of the Planning Problem

Evaluating the integral in Eq. (15) yields the quadratic form $a_0 Y_t - a_1 Y_t^2 / 2$

As a result, the Bellman equation for the planning problem is

$$V(Y) = \max_{Y'} \left\{ a_0 Y - \frac{a_1}{2} Y^2 - \frac{\gamma (Y' - Y)^2}{2} + \beta V(Y') \right\} \tag{16}$$

The associated first-order condition is

−𝛾(𝑌 ′ − 𝑌 ) + 𝛽𝑉 ′ (𝑌 ′ ) = 0 (17)

Applying the same Benveniste-Scheinkman formula gives

𝑉 ′ (𝑌 ) = 𝑎0 − 𝑎1 𝑌 + 𝛾(𝑌 ′ − 𝑌 )

Substituting this into equation Eq. (17) and rearranging leads to the Euler equation

𝛽𝑎0 + 𝛾𝑌𝑡 − [𝛽𝑎1 + 𝛾(1 + 𝛽)]𝑌𝑡+1 + 𝛾𝛽𝑌𝑡+2 = 0 (18)

50.4.4 The Key Insight

Return to equation Eq. (13) and set 𝑦𝑡 = 𝑌𝑡 for all 𝑡


(Recall that for this section we’ve set 𝑛 = 1 to simplify the calculations)
A small amount of algebra will convince you that when 𝑦𝑡 = 𝑌𝑡 , equations Eq. (18) and
Eq. (13) are identical
Thus, the Euler equation for the planning problem matches the second-order difference equa-
tion that we derived by

1. finding the Euler equation of the representative firm and


2. substituting into it the expression 𝑌𝑡 = 𝑛𝑦𝑡 that “makes the representative firm be rep-
resentative”

If it is appropriate to apply the same terminal conditions for these two difference equations,
which it is, then we have verified that a solution of the planning problem is also a rational
expectations equilibrium quantity sequence
It follows that for this example we can compute equilibrium quantities by forming the optimal
linear regulator problem corresponding to the Bellman equation Eq. (16)
The optimal policy function for the planning problem is the aggregate law of motion 𝐻 that
the representative firm faces within a rational expectations equilibrium
Structure of the Law of Motion
As you are asked to show in the exercises, the fact that the planner’s problem is an LQ prob-
lem implies an optimal policy — and hence aggregate law of motion — taking the form

𝑌𝑡+1 = 𝜅0 + 𝜅1 𝑌𝑡 (19)

for some parameter pair 𝜅0 , 𝜅1
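One way to see where the pair 𝜅0, 𝜅1 comes from is to substitute the linear law Eq. (19) into the Euler equation Eq. (18) and match coefficients. The sketch below does this under the assumption that Eq. (18) must hold for every 𝑌𝑡 and that the stable root is the relevant one; the parameter values are borrowed from Exercise 1, and the helper name `euler_residual` is hypothetical.

```python
import math

# Parameter values borrowed from Exercise 1 (an assumption of this sketch)
a0, a1, beta, gamma = 100, 0.05, 0.95, 10

# Substituting Y_{t+1} = kappa0 + kappa1 * Y_t into Eq. (18) and matching the
# coefficient on Y_t gives a quadratic in kappa1:
#   gamma*beta*k**2 - (beta*a1 + gamma*(1 + beta))*k + gamma = 0
qa = gamma * beta
qb = -(beta * a1 + gamma * (1 + beta))
qc = gamma
disc = math.sqrt(qb**2 - 4 * qa * qc)
kappa1 = (-qb - disc) / (2 * qa)      # smaller root, inside the unit interval

# Matching the constant terms then pins down kappa0
kappa0 = beta * a0 / (-qb - gamma * beta * (1 + kappa1))


def euler_residual(Y):
    """Left-hand side of Eq. (18) along the candidate law of motion."""
    Y1 = kappa0 + kappa1 * Y
    Y2 = kappa0 + kappa1 * Y1
    return beta * a0 + gamma * Y + qb * Y1 + gamma * beta * Y2


print(f"kappa0 = {kappa0:.6f}, kappa1 = {kappa1:.6f}")
print(f"Euler residual at Y = 500: {euler_residual(500.0):.2e}")
```

The resulting pair should be close to the values reported in the solution to Exercise 3, which obtains them from the LQ solver instead.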



Now that we know the aggregate law of motion is linear, we can see from the firm’s Bellman
equation Eq. (9) that the firm’s problem can also be framed as an LQ problem
As you’re asked to show in the exercises, the LQ formulation of the firm’s problem implies a
law of motion that looks as follows

𝑦𝑡+1 = ℎ0 + ℎ1 𝑦𝑡 + ℎ2 𝑌𝑡 (20)

Hence a rational expectations equilibrium will be defined by the parameters (𝜅0 , 𝜅1 , ℎ0 , ℎ1 , ℎ2 ) in Eq. (19)–Eq. (20)

50.5 Exercises

50.5.1 Exercise 1

Consider the firm problem described above


Let the firm’s belief function 𝐻 be as given in Eq. (19)
Formulate the firm’s problem as a discounted optimal linear regulator problem, being careful
to describe all of the objects needed
Use the class LQ from the QuantEcon.py package to solve the firm’s problem for the following
parameter values:

𝑎0 = 100, 𝑎1 = 0.05, 𝛽 = 0.95, 𝛾 = 10, 𝜅0 = 95.5, 𝜅1 = 0.95

Express the solution of the firm’s problem in the form Eq. (20) and give the values for each
ℎ𝑗
If there were 𝑛 identical competitive firms all behaving according to Eq. (20), what would
Eq. (20) imply for the actual law of motion Eq. (8) for market supply?

50.5.2 Exercise 2

Consider the following 𝜅0 , 𝜅1 pairs as candidates for the aggregate law of motion component
of a rational expectations equilibrium (see Eq. (19))
Extending the program that you wrote for exercise 1, determine which if any satisfy the defi-
nition of a rational expectations equilibrium

• (94.0886298678, 0.923409232937)
• (93.2119845412, 0.984323478873)
• (95.0818452486, 0.952459076301)

Describe an iterative algorithm that uses the program that you wrote for exercise 1 to com-
pute a rational expectations equilibrium
(You are not being asked actually to use the algorithm you are suggesting)

50.5.3 Exercise 3

Recall the planner’s problem described above

1. Formulate the planner’s problem as an LQ problem

2. Solve it using the same parameter values as in exercise 1

   • 𝑎0 = 100, 𝑎1 = 0.05, 𝛽 = 0.95, 𝛾 = 10

3. Represent the solution in the form 𝑌𝑡+1 = 𝜅0 + 𝜅1 𝑌𝑡

4. Compare your answer with the results from exercise 2

50.5.4 Exercise 4

A monopolist faces the industry demand curve Eq. (5) and chooses {𝑌𝑡 } to maximize

$\sum_{t=0}^{\infty} \beta^t r_t$ where

$$
r_t = p_t Y_t - \frac{\gamma (Y_{t+1} - Y_t)^2}{2}
$$

Formulate this problem as an LQ problem


Compute the optimal policy using the same parameters as the previous exercise
In particular, solve for the parameters in

𝑌𝑡+1 = 𝑚0 + 𝑚1 𝑌𝑡

Compare your results with the previous exercise – comment

50.6 Solutions
In [2]: import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

We’ll use the LQ class from quantecon

In [3]: from quantecon import LQ

50.6.1 Exercise 1

To map a problem into a discounted optimal linear control problem, we need to define

• state vector 𝑥𝑡 and control vector 𝑢𝑡


• matrices 𝐴, 𝐵, 𝑄, 𝑅 that define preferences and the law of motion for the state

For the state and control vectors, we choose

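$$
x_t = \begin{bmatrix} y_t \\ Y_t \\ 1 \end{bmatrix},
\qquad
u_t = y_{t+1} - y_t
$$

For 𝐴, 𝐵, 𝑄, 𝑅 we set

$$
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \kappa_1 & \kappa_0 \\ 0 & 0 & 1 \end{bmatrix},
\quad
B = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},
\quad
R = \begin{bmatrix} 0 & a_1/2 & -a_0/2 \\ a_1/2 & 0 & 0 \\ -a_0/2 & 0 & 0 \end{bmatrix},
\quad
Q = \gamma / 2
$$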

By multiplying out you can confirm that

• 𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 = −𝑟𝑡


• 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡

We’ll use the module lqcontrol.py to solve the firm’s problem at the stated parameter
values
This will return an LQ policy 𝐹 with the interpretation 𝑢𝑡 = −𝐹 𝑥𝑡 , or

𝑦𝑡+1 − 𝑦𝑡 = −𝐹0 𝑦𝑡 − 𝐹1 𝑌𝑡 − 𝐹2

Matching parameters with 𝑦𝑡+1 = ℎ0 + ℎ1 𝑦𝑡 + ℎ2 𝑌𝑡 leads to

ℎ0 = −𝐹2 , ℎ1 = 1 − 𝐹0 , ℎ2 = −𝐹1

Here’s our solution

In [4]: # == Model parameters == #

a0 = 100
a1 = 0.05
β = 0.95
γ = 10.0

# == Beliefs == #

κ0 = 95.5
κ1 = 0.95

# == Formulate the LQ problem == #

A = np.array([[1, 0, 0], [0, κ1, κ0], [0, 0, 1]])


B = np.array([1, 0, 0])
B.shape = 3, 1
R = np.array([[0, a1/2, -a0/2], [a1/2, 0, 0], [-a0/2, 0, 0]])
Q = 0.5 * γ

# == Solve for the optimal policy == #

lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()
F = F.flatten()
out1 = f"F = [{F[0]:.3f}, {F[1]:.3f}, {F[2]:.3f}]"
h0, h1, h2 = -F[2], 1 - F[0], -F[1]
out2 = f"(h0, h1, h2) = ({h0:.3f}, {h1:.3f}, {h2:.3f})"

print(out1)
print(out2)

F = [-0.000, 0.046, -96.949]


(h0, h1, h2) = (96.949, 1.000, -0.046)

The implication is that

𝑦𝑡+1 = 96.949 + 𝑦𝑡 − 0.046 𝑌𝑡

For the case 𝑛 > 1, recall that 𝑌𝑡 = 𝑛𝑦𝑡 , which, combined with the previous equation, yields

𝑌𝑡+1 = 𝑛 (96.949 + 𝑦𝑡 − 0.046 𝑌𝑡 ) = 𝑛96.949 + (1 − 𝑛0.046)𝑌𝑡
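The aggregation step above can be written as a small helper. This is only a sketch: the function name `aggregate_law` is hypothetical, and the coefficients are the rounded values printed above.

```python
def aggregate_law(h0, h1, h2, n):
    """Aggregate law of motion implied by n identical firms.

    Each firm follows y' = h0 + h1 * y + h2 * Y. Multiplying by n and
    using Y = n * y gives Y' = n * h0 + (h1 + n * h2) * Y.
    """
    return n * h0, h1 + n * h2


# Rounded firm-level coefficients from the solution above
h0, h1, h2 = 96.949, 1.000, -0.046

for n in (1, 2, 5):
    k0, k1 = aggregate_law(h0, h1, h2, n)
    print(f"n = {n}: Y' = {k0:.3f} + {k1:.3f} * Y")
```

With ℎ1 = 1 this reduces to the expression 𝑛 96.949 + (1 − 𝑛 0.046)𝑌𝑡 given above.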

50.6.2 Exercise 2

To determine whether a 𝜅0 , 𝜅1 pair forms the aggregate law of motion component of a ratio-
nal expectations equilibrium, we can proceed as follows:

• Determine the corresponding firm law of motion 𝑦𝑡+1 = ℎ0 + ℎ1 𝑦𝑡 + ℎ2 𝑌𝑡


• Test whether the associated aggregate law 𝑌𝑡+1 = 𝑛ℎ(𝑌𝑡 /𝑛, 𝑌𝑡 ) evaluates to 𝑌𝑡+1 =
𝜅0 + 𝜅1 𝑌𝑡

In the second step, we can use 𝑌𝑡 = 𝑛𝑦𝑡 = 𝑦𝑡 (recall that 𝑛 = 1 here), so that 𝑌𝑡+1 = 𝑛ℎ(𝑌𝑡 /𝑛, 𝑌𝑡 ) becomes

𝑌𝑡+1 = ℎ(𝑌𝑡 , 𝑌𝑡 ) = ℎ0 + (ℎ1 + ℎ2 )𝑌𝑡

Hence to test the second step we can test 𝜅0 = ℎ0 and 𝜅1 = ℎ1 + ℎ2


The following code implements this test

In [5]: candidates = ((94.0886298678, 0.923409232937),
                      (93.2119845412, 0.984323478873),
                      (95.0818452486, 0.952459076301))

for κ0, κ1 in candidates:

    # == Form the associated law of motion == #
    A = np.array([[1, 0, 0], [0, κ1, κ0], [0, 0, 1]])

    # == Solve the LQ problem for the firm == #
    lq = LQ(Q, R, A, B, beta=β)
    P, F, d = lq.stationary_values()
    F = F.flatten()
    h0, h1, h2 = -F[2], 1 - F[0], -F[1]

    # == Test the equilibrium condition == #
    if np.allclose((κ0, κ1), (h0, h1 + h2)):
        print(f'Equilibrium pair = {κ0}, {κ1}')
        print(f'(h0, h1, h2) = ({h0:.4f}, {h1:.4f}, {h2:.4f})')
        break

Equilibrium pair = 95.0818452486, 0.952459076301
(h0, h1, h2) = (95.0819, 1.0000, -0.0475)

The output tells us that the answer is pair (iii), which implies (ℎ0 , ℎ1 , ℎ2 ) =
(95.0819, 1.0000, −.0475)

(Notice we use np.allclose to test equality of floating-point numbers, since exact equality
is too strict)
Regarding the iterative algorithm, one could loop from a given (𝜅0 , 𝜅1 ) pair to the associated
firm law and then to a new (𝜅0 , 𝜅1 ) pair
This amounts to implementing the operator Φ described in the lecture
(There is in general no guarantee that this iterative process will converge to a rational expec-
tations equilibrium)
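The loop just described can be sketched without the LQ solver. The sketch below rests on an assumption that is not part of the lecture's method: it solves the firm's Euler equation forward under the belief 𝑌𝑡+1 = 𝜅0 + 𝜅1 𝑌𝑡 , which gives 𝑦𝑡+1 − 𝑦𝑡 as the discounted sum of expected future prices divided by 𝛾. The function names and the relaxation weight `relax` are hypothetical; the damping follows the idea mentioned in footnote [1].

```python
# Parameter values from Exercise 1
a0, a1, beta, gamma = 100, 0.05, 0.95, 10


def firm_response(kappa0, kappa1):
    """Coefficients (h0, h1, h2) of y' = h0 + h1*y + h2*Y for a firm that
    believes Y' = kappa0 + kappa1*Y (sketch; requires |kappa1| < 1)."""
    s0 = beta / (1 - beta)                     # sum over j >= 1 of beta^j
    s1 = beta * kappa1 / (1 - beta * kappa1)   # sum of (beta * kappa1)^j
    Y_bar = kappa0 / (1 - kappa1)              # fixed point of believed law
    h0 = (a0 * s0 - a1 * Y_bar * (s0 - s1)) / gamma
    h2 = -a1 * s1 / gamma
    return h0, 1.0, h2


def Phi(kappa0, kappa1, n=1):
    """Map a believed aggregate law into the implied actual aggregate law."""
    h0, h1, h2 = firm_response(kappa0, kappa1)
    return n * h0, h1 + n * h2


def damped_iteration(kappa0, kappa1, relax=0.5, tol=1e-10, max_iter=10_000):
    """Iterate on relax * Phi + (1 - relax) * identity until the pair settles."""
    for _ in range(max_iter):
        new0, new1 = Phi(kappa0, kappa1)
        next0 = relax * new0 + (1 - relax) * kappa0
        next1 = relax * new1 + (1 - relax) * kappa1
        change = max(abs(next0 - kappa0), abs(next1 - kappa1))
        kappa0, kappa1 = next0, next1
        if change < tol:
            break
    return kappa0, kappa1


print(firm_response(95.5, 0.95))      # compare with the Exercise 1 output
print(damped_iteration(95.5, 0.95))   # compare with pair (iii) above
```

Damping is used here because convergence of the undamped map is not guaranteed in general; other relaxation weights may behave differently.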

50.6.3 Exercise 3

We are asked to write the planner problem as an LQ problem


For the state and control vectors, we choose

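$$
x_t = \begin{bmatrix} Y_t \\ 1 \end{bmatrix},
\qquad
u_t = Y_{t+1} - Y_t
$$

For the LQ matrices, we set

$$
A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
\quad
B = \begin{bmatrix} 1 \\ 0 \end{bmatrix},
\quad
R = \begin{bmatrix} a_1/2 & -a_0/2 \\ -a_0/2 & 0 \end{bmatrix},
\quad
Q = \gamma / 2
$$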

By multiplying out you can confirm that

• 𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 = −𝑠(𝑌𝑡 , 𝑌𝑡+1 )


• 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡

By obtaining the optimal policy and using 𝑢𝑡 = −𝐹 𝑥𝑡 or

𝑌𝑡+1 − 𝑌𝑡 = −𝐹0 𝑌𝑡 − 𝐹1

we can obtain the implied aggregate law of motion via 𝜅0 = −𝐹1 and 𝜅1 = 1 − 𝐹0
The Python code to solve this problem is below:

In [6]: # == Formulate the planner's LQ problem == #

A = np.array([[1, 0], [0, 1]])


B = np.array([[1], [0]])
R = np.array([[a1 / 2, -a0 / 2], [-a0 / 2, 0]])
Q = γ / 2

# == Solve for the optimal policy == #

lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()

# == Print the results == #

F = F.flatten()
κ0, κ1 = -F[1], 1 - F[0]
print(κ0, κ1)

95.08187459215002 0.9524590627039248

The output yields the same (𝜅0 , 𝜅1 ) pair obtained as an equilibrium from the previous exer-
cise

50.6.4 Exercise 4

The monopolist’s LQ problem is almost identical to the planner’s problem from the previous
exercise, except that

$$
R = \begin{bmatrix} a_1 & -a_0/2 \\ -a_0/2 & 0 \end{bmatrix}
$$

The problem can be solved as follows

In [7]: A = np.array([[1, 0], [0, 1]])


B = np.array([[1], [0]])
R = np.array([[a1, -a0 / 2], [-a0 / 2, 0]])
Q = γ / 2

lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values()

F = F.flatten()
m0, m1 = -F[1], 1 - F[0]
print(m0, m1)

73.47294403502818 0.9265270559649701

We see that the law of motion for the monopolist is approximately 𝑌𝑡+1 = 73.4729 + 0.9265𝑌𝑡
In the rational expectations case, the law of motion was approximately 𝑌𝑡+1 = 95.0818 +
0.9525𝑌𝑡
One way to compare these two laws of motion is by their fixed points, which give long-run
equilibrium output in each case
For laws of the form 𝑌𝑡+1 = 𝑐0 + 𝑐1 𝑌𝑡 , the fixed point is 𝑐0 /(1 − 𝑐1 )
If you crunch the numbers, you will see that the monopolist adopts a lower long-run quantity
than obtained by the competitive market, implying a higher market price
This is analogous to the elementary static-case results
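The comparison can be made concrete with a few lines of arithmetic. The sketch below plugs in the coefficients printed above and assumes the linear inverse demand curve 𝑝 = 𝑎0 − 𝑎1 𝑌 with the Exercise 1 parameters; the function name `fixed_point` is hypothetical.

```python
def fixed_point(c0, c1):
    """Long-run level of Y under Y' = c0 + c1 * Y (requires |c1| < 1)."""
    return c0 / (1 - c1)


a0, a1 = 100, 0.05   # demand parameters from Exercise 1

# Coefficients as printed in the solutions to Exercises 4 and 3
Y_monopoly = fixed_point(73.47294403502818, 0.9265270559649701)
Y_competitive = fixed_point(95.08187459215002, 0.9524590627039248)

# Long-run prices from the inverse demand curve p = a0 - a1 * Y
p_monopoly = a0 - a1 * Y_monopoly
p_competitive = a0 - a1 * Y_competitive

print(f"monopoly:    Y = {Y_monopoly:.2f}, p = {p_monopoly:.2f}")
print(f"competitive: Y = {Y_competitive:.2f}, p = {p_competitive:.2f}")
```

Long-run monopoly output comes out at roughly half the competitive level, with a correspondingly higher price.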
Footnotes
[1] A literature that studies whether models populated with agents who learn can converge to
rational expectations equilibria features iterations on a modification of the mapping Φ that
can be approximated as 𝛾Φ + (1 − 𝛾)𝐼. Here 𝐼 is the identity operator and 𝛾 ∈ (0, 1) is a
relaxation parameter. See [91] and [41] for statements and applications of this approach to
establish conditions under which collections of adaptive agents who use least squares learning
converge to a rational expectations equilibrium.
51

Markov Perfect Equilibrium

51.1 Contents

• Overview 51.2

• Background 51.3

• Linear Markov Perfect Equilibria 51.4

• Application 51.5

• Exercises 51.6

• Solutions 51.7

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

51.2 Overview

This lecture describes the concept of Markov perfect equilibrium


Markov perfect equilibrium is a key notion for analyzing economic problems involving dy-
namic strategic interaction, and a cornerstone of applied game theory
In this lecture, we teach Markov perfect equilibrium by example
We will focus on settings with

• two players
• quadratic payoff functions
• linear transition rules for the state

Other references include chapter 7 of [87]


51.3 Background

Markov perfect equilibrium is a refinement of the concept of Nash equilibrium


It is used to study settings where multiple decision-makers interact non-cooperatively over
time, each seeking to pursue its own objective
The agents in the model face a common state vector, the time path of which is influenced by
– and influences – their decisions
In particular, the transition law for the state that confronts each agent is affected by decision
rules of other agents
Individual payoff maximization requires that each agent solve a dynamic programming prob-
lem that includes this transition law
Markov perfect equilibrium prevails when no agent wishes to revise its policy, taking as given
the policies of all other agents
Well known examples include

• Choice of price, output, location or capacity for firms in an industry (e.g., [40], [113],
[36])
• Rate of extraction from a shared natural resource, such as a fishery (e.g., [86], [130])

Let’s examine a model of the first type

51.3.1 Example: A Duopoly Model

Two firms are the only producers of a good the demand for which is governed by a linear in-
verse demand function

𝑝 = 𝑎0 − 𝑎1 (𝑞1 + 𝑞2 ) (1)

Here 𝑝 = 𝑝𝑡 is the price of the good, 𝑞𝑖 = 𝑞𝑖𝑡 is the output of firm 𝑖 = 1, 2 at time 𝑡 and
𝑎0 > 0, 𝑎1 > 0
In Eq. (1) and what follows,

• the time subscript is suppressed when possible to simplify notation


• 𝑥̂ denotes a next period value of variable 𝑥

Each firm recognizes that its output affects total output and therefore the market price
The one-period payoff function of firm 𝑖 is price times quantity minus adjustment costs:

$$
\pi_i = p q_i - \gamma (\hat q_i - q_i)^2, \quad \gamma > 0 \tag{2}
$$

Substituting the inverse demand curve Eq. (1) into Eq. (2) lets us express the one-period pay-
off as

$$
\pi_i(q_i, q_{-i}, \hat q_i) = a_0 q_i - a_1 q_i^2 - a_1 q_i q_{-i} - \gamma (\hat q_i - q_i)^2 \tag{3}
$$



where 𝑞−𝑖 denotes the output of the firm other than 𝑖



The objective of the firm is to maximize $\sum_{t=0}^{\infty} \beta^t \pi_{it}$
Firm 𝑖 chooses a decision rule that sets next period quantity 𝑞𝑖̂ as a function 𝑓𝑖 of the current
state (𝑞𝑖 , 𝑞−𝑖 )
An essential aspect of a Markov perfect equilibrium is that each firm takes the decision rule
of the other firm as known and given
Given 𝑓−𝑖 , the Bellman equation of firm 𝑖 is

$$
v_i(q_i, q_{-i}) = \max_{\hat q_i} \left\{ \pi_i(q_i, q_{-i}, \hat q_i) + \beta v_i(\hat q_i, f_{-i}(q_{-i}, q_i)) \right\} \tag{4}
$$

Definition A Markov perfect equilibrium of the duopoly model is a pair of value functions
(𝑣1 , 𝑣2 ) and a pair of policy functions (𝑓1 , 𝑓2 ) such that, for each 𝑖 ∈ {1, 2} and each possible
state,

• The value function 𝑣𝑖 satisfies the Bellman equation Eq. (4)


• The maximizer on the right side of Eq. (4) is equal to 𝑓𝑖 (𝑞𝑖 , 𝑞−𝑖 )

The adjective “Markov” denotes that the equilibrium decision rules depend only on the cur-
rent values of the state variables, not other parts of their histories
“Perfect” means complete, in the sense that the equilibrium is constructed by backward in-
duction and hence builds in optimizing behavior for each firm at all possible future states

• These include many states that will not be reached when we iterate forward
on the pair of equilibrium strategies 𝑓𝑖 starting from a given initial state

51.3.2 Computation

One strategy for computing a Markov perfect equilibrium is iterating to convergence on pairs
of Bellman equations and decision rules
In particular, let $v_i^j, f_i^j$ be the value function and policy function for firm 𝑖 at the 𝑗-th iteration
Imagine constructing the iterates

$$
v_i^{j+1}(q_i, q_{-i}) = \max_{\hat q_i} \left\{ \pi_i(q_i, q_{-i}, \hat q_i) + \beta v_i^j(\hat q_i, f_{-i}(q_{-i}, q_i)) \right\} \tag{5}
$$

These iterations can be challenging to implement computationally


However, they simplify for the case in which the one-period payoff functions are quadratic
and the transition laws are linear — which takes us to our next topic

51.4 Linear Markov Perfect Equilibria

As we saw in the duopoly example, the study of Markov perfect equilibria in games with two
players leads us to an interrelated pair of Bellman equations

In linear-quadratic dynamic games, these “stacked Bellman equations” become “stacked Ric-
cati equations” with a tractable mathematical structure
We’ll lay out that structure in a general setup and then apply it to some simple problems

51.4.1 Coupled Linear Regulator Problems

We consider a general linear-quadratic regulator game with two players


For convenience, we’ll start with a finite horizon formulation, where 𝑡0 is the initial date and
𝑡1 is the common terminal date
Player 𝑖 takes {𝑢−𝑖𝑡 } as given and minimizes

$$
\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' R_i x_t + u_{it}' Q_i u_{it} + u_{-it}' S_i u_{-it} + 2 x_t' W_i u_{it} + 2 u_{-it}' M_i u_{it} \right\} \tag{6}
$$

while the state evolves according to

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵1 𝑢1𝑡 + 𝐵2 𝑢2𝑡 (7)

Here

• 𝑥𝑡 is an 𝑛 × 1 state vector and 𝑢𝑖𝑡 is a 𝑘𝑖 × 1 vector of controls for player 𝑖


• 𝑅𝑖 is 𝑛 × 𝑛
• 𝑆𝑖 is 𝑘−𝑖 × 𝑘−𝑖
• 𝑄𝑖 is 𝑘𝑖 × 𝑘𝑖
• 𝑊𝑖 is 𝑛 × 𝑘𝑖
• 𝑀𝑖 is 𝑘−𝑖 × 𝑘𝑖
• 𝐴 is 𝑛 × 𝑛
• 𝐵𝑖 is 𝑛 × 𝑘𝑖

51.4.2 Computing Equilibrium

We formulate a linear Markov perfect equilibrium as follows


Player 𝑖 employs linear decision rules 𝑢𝑖𝑡 = −𝐹𝑖𝑡 𝑥𝑡 , where 𝐹𝑖𝑡 is a 𝑘𝑖 × 𝑛 matrix
A Markov perfect equilibrium is a pair of sequences {𝐹1𝑡 , 𝐹2𝑡 } over 𝑡 = 𝑡0 , … , 𝑡1 − 1 such that

• {𝐹1𝑡 } solves player 1’s problem, taking {𝐹2𝑡 } as given, and


• {𝐹2𝑡 } solves player 2’s problem, taking {𝐹1𝑡 } as given

If we take 𝑢2𝑡 = −𝐹2𝑡 𝑥𝑡 and substitute it into Eq. (6) and Eq. (7), then player 1’s problem
becomes minimization of

$$
\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' \Pi_{1t} x_t + u_{1t}' Q_1 u_{1t} + 2 u_{1t}' \Gamma_{1t} x_t \right\} \tag{8}
$$

subject to

𝑥𝑡+1 = Λ1𝑡 𝑥𝑡 + 𝐵1 𝑢1𝑡 , (9)

where

• $\Lambda_{it} := A - B_{-i} F_{-it}$
• $\Pi_{it} := R_i + F_{-it}' S_i F_{-it}$
• $\Gamma_{it} := W_i' - M_i' F_{-it}$

This is an LQ dynamic programming problem that can be solved by working backwards


The policy rule that solves this problem is

𝐹1𝑡 = (𝑄1 + 𝛽𝐵1′ 𝑃1𝑡+1 𝐵1 )−1 (𝛽𝐵1′ 𝑃1𝑡+1 Λ1𝑡 + Γ1𝑡 ) (10)

where 𝑃1𝑡 solves the matrix Riccati difference equation

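$$
P_{1t} = \Pi_{1t} - (\beta B_1' P_{1t+1} \Lambda_{1t} + \Gamma_{1t})' (Q_1 + \beta B_1' P_{1t+1} B_1)^{-1} (\beta B_1' P_{1t+1} \Lambda_{1t} + \Gamma_{1t}) + \beta \Lambda_{1t}' P_{1t+1} \Lambda_{1t} \tag{11}
$$
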
Similarly, the policy that solves player 2’s problem is

𝐹2𝑡 = (𝑄2 + 𝛽𝐵2′ 𝑃2𝑡+1 𝐵2 )−1 (𝛽𝐵2′ 𝑃2𝑡+1 Λ2𝑡 + Γ2𝑡 ) (12)

where 𝑃2𝑡 solves

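$$
P_{2t} = \Pi_{2t} - (\beta B_2' P_{2t+1} \Lambda_{2t} + \Gamma_{2t})' (Q_2 + \beta B_2' P_{2t+1} B_2)^{-1} (\beta B_2' P_{2t+1} \Lambda_{2t} + \Gamma_{2t}) + \beta \Lambda_{2t}' P_{2t+1} \Lambda_{2t} \tag{13}
$$
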
Here in all cases 𝑡 = 𝑡0 , … , 𝑡1 − 1 and the terminal conditions are $P_{i t_1} = 0$
The solution procedure is to use equations Eq. (10), Eq. (11), Eq. (12), and Eq. (13), and
“work backwards” from time 𝑡1 − 1
Since we’re working backward, 𝑃1𝑡+1 and 𝑃2𝑡+1 are taken as given at each stage
Moreover, since

• some terms on the right-hand side of Eq. (10) contain 𝐹2𝑡


• some terms on the right-hand side of Eq. (12) contain 𝐹1𝑡

we need to solve these 𝑘1 + 𝑘2 equations simultaneously


Key Insight
A key insight is that equations Eq. (10) and Eq. (12) are linear in 𝐹1𝑡 and 𝐹2𝑡
After these equations have been solved, we can take 𝐹𝑖𝑡 and solve for 𝑃𝑖𝑡 in Eq. (11) and
Eq. (13)
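The backward step described above can be sketched in NumPy for the special case 𝑆𝑖 = 𝑊𝑖 = 𝑀𝑖 = 0, so that Π𝑖𝑡 = 𝑅𝑖 and Γ𝑖𝑡 = 0. This is a hand-rolled illustration and not the QuantEcon.py routine: the function name `backward_step` and the stopping rule are assumptions of the sketch, and the matrices are those of the duopoly example used in this lecture.

```python
import numpy as np

# Duopoly example; S = W = M = 0 is assumed, so Pi_i = R_i and Gamma_i = 0
a0, a1, beta, gamma = 10.0, 2.0, 0.96, 12.0
A = np.eye(3)
B = [np.array([[0.], [1.], [0.]]), np.array([[0.], [0.], [1.]])]
R = [np.array([[0., -a0 / 2, 0.], [-a0 / 2, a1, a1 / 2], [0., a1 / 2, 0.]]),
     np.array([[0., 0., -a0 / 2], [0., 0., a1 / 2], [-a0 / 2, a1 / 2, a1]])]
Q = [np.array([[gamma]]), np.array([[gamma]])]


def backward_step(P):
    """Map next-period (P_1, P_2) into current (F_1, F_2) and (P_1, P_2)."""
    # H_i = (Q_i + beta B_i' P_i B_i)^{-1} beta B_i' P_i, so that
    # Eqs. (10) and (12) read F_i = H_i (A - B_j F_j), linear in (F_1, F_2)
    H = [np.linalg.solve(Q[i] + beta * B[i].T @ P[i] @ B[i],
                         beta * B[i].T @ P[i]) for i in range(2)]
    k1, k2 = B[0].shape[1], B[1].shape[1]
    lhs = np.block([[np.eye(k1), H[0] @ B[1]],
                    [H[1] @ B[0], np.eye(k2)]])
    rhs = np.vstack([H[0] @ A, H[1] @ A])
    sol = np.linalg.solve(lhs, rhs)          # solve both rules jointly
    F = [sol[:k1], sol[k1:]]
    # Riccati updates, Eqs. (11) and (13), with Lambda_i = A - B_j F_j
    P_new = []
    for i in range(2):
        j = 1 - i
        Lam = A - B[j] @ F[j]
        D = Q[i] + beta * B[i].T @ P[i] @ B[i]
        P_new.append(R[i] - F[i].T @ D @ F[i] + beta * Lam.T @ P[i] @ Lam)
    return F, P_new


# Start from the terminal condition P_i = 0 and work backwards
P = [np.zeros((3, 3)), np.zeros((3, 3))]
for _ in range(5000):
    F, P_next = backward_step(P)
    done = max(np.max(np.abs(P_next[i] - P[i])) for i in range(2)) < 1e-10
    P = P_next
    if done:
        break

F1, F2 = F
print("F1 =", F1)
print("F2 =", F2)
```

Because Γ𝑖𝑡 = 0 here, the middle term of the Riccati update is written as 𝐹′𝐷𝐹, which is algebraically the same as the expression in Eq. (11).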
Infinite Horizon

We often want to compute the solutions of such games for infinite horizons, in the hope that
the decision rules 𝐹𝑖𝑡 settle down to be time-invariant as 𝑡1 → +∞
In practice, we usually fix 𝑡1 and compute the equilibrium of an infinite horizon game by driv-
ing 𝑡0 → −∞
This is the approach we adopt in the next section

51.4.3 Implementation

We use the function nnash from QuantEcon.py that computes a Markov perfect equilibrium
of the infinite horizon linear-quadratic dynamic game in the manner described above

51.5 Application

Let’s use these procedures to treat some applications, starting with the duopoly model

51.5.1 A Duopoly Model

To map the duopoly model into coupled linear-quadratic dynamic programming problems,
define the state and controls as

$$
x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix}
\quad \text{and} \quad
u_{it} := q_{i,t+1} - q_{it}, \quad i = 1, 2
$$

If we write

𝑥′𝑡 𝑅𝑖 𝑥𝑡 + 𝑢′𝑖𝑡 𝑄𝑖 𝑢𝑖𝑡

where 𝑄1 = 𝑄2 = 𝛾,

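$$
R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix}
\quad \text{and} \quad
R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix}
$$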

then we recover the one-period payoffs in expression Eq. (3)
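A quick numerical check makes the mapping concrete. Since each player minimizes in the LQ formulation, the quadratic form should equal the negative of the payoff in Eq. (3). The sketch below evaluates both sides at an arbitrary test point, using the parameter values of the duopoly example; the helper names are hypothetical.

```python
a0, a1, gamma = 10.0, 2.0, 12.0

# R_1 as defined above, for the state x = (1, q1, q2)
R1 = [[0.0, -a0 / 2, 0.0],
      [-a0 / 2, a1, a1 / 2],
      [0.0, a1 / 2, 0.0]]


def quad_form(x, M):
    """Compute x' M x for a length-3 vector and a 3 x 3 nested list."""
    return sum(x[i] * M[i][j] * x[j] for i in range(3) for j in range(3))


def payoff_1(q1, q2, q1_next):
    """One-period payoff of firm 1, Eq. (3)."""
    return a0 * q1 - a1 * q1**2 - a1 * q1 * q2 - gamma * (q1_next - q1)**2


q1, q2, q1_next = 1.3, 0.7, 1.5      # arbitrary test point
x = [1.0, q1, q2]
u1 = q1_next - q1

loss_1 = quad_form(x, R1) + gamma * u1**2   # x' R_1 x + u_1' Q_1 u_1
print(loss_1, -payoff_1(q1, q2, q1_next))   # the two numbers should agree
```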


The law of motion for the state 𝑥𝑡 is 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵1 𝑢1𝑡 + 𝐵2 𝑢2𝑡 where

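$$
A := \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\quad
B_1 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},
\quad
B_2 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
$$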

The optimal decision rule of firm 𝑖 will take the form 𝑢𝑖𝑡 = −𝐹𝑖 𝑥𝑡 , inducing the following
closed-loop system for the evolution of 𝑥 in the Markov perfect equilibrium:

𝑥𝑡+1 = (𝐴 − 𝐵1 𝐹1 − 𝐵2 𝐹2 )𝑥𝑡 (14)



51.5.2 Parameters and Solution

Consider the previously presented duopoly model with parameter values of:

• 𝑎0 = 10
• 𝑎1 = 2
• 𝛽 = 0.96
• 𝛾 = 12

From these, we compute the infinite horizon MPE using the preceding code

In [2]: """

@authors: Chase Coleman, Thomas Sargent, John Stachurski

"""
import numpy as np
import quantecon as qe

# == Parameters == #
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0

# == In LQ form == #
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])

R1 = [[ 0., -a0 / 2, 0.],


[-a0 / 2., a1, a1 / 2.],
[ 0, a1 / 2., 0.]]

R2 = [[ 0., 0., -a0 / 2],


[ 0., 0., a1 / 2.],
[-a0 / 2, a1 / 2., a1]]

Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0

# == Solve using QE's nnash function == #


F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1,
Q2, S1, S2, W1, W2, M1,
M2, beta=β)

# == Display policies == #
print("Computed policies for firm 1 and firm 2:\n")
print(f"F1 = {F1}")
print(f"F2 = {F2}")
print("\n")

Running the code produces the following output

Computed policies for firm 1 and firm 2:

F1 = [[-0.66846615 0.29512482 0.07584666]]
F2 = [[-0.66846615 0.07584666 0.29512482]]


One way to see that 𝐹𝑖 is indeed optimal for firm 𝑖 taking 𝐹−𝑖 as given is to use QuantEcon.py’s LQ class

In particular, let’s take F2 as computed above, plug it into Eq. (8) and Eq. (9) to get firm 1’s
problem and solve it using LQ
We hope that the resulting policy will agree with F1 as computed above

In [3]: Λ1 = A - B2 @ F2
lq1 = qe.LQ(Q1, R1, Λ1, B1, beta=β)
P1_ih, F1_ih, d = lq1.stationary_values()
F1_ih

Out[3]: array([[-0.66846613, 0.29512482, 0.07584666]])

This is close enough for rock and roll, as they say in the trade
Indeed, np.allclose agrees with our assessment

In [4]: np.allclose(F1, F1_ih)

Out[4]: True

51.5.3 Dynamics

Let’s now investigate the dynamics of price and output in this simple duopoly model under
the MPE policies
Given our optimal policies 𝐹1 and 𝐹2, the state evolves according to Eq. (14)
The following program

• imports 𝐹1 and 𝐹2 from the previous program along with all parameters
• computes the evolution of 𝑥𝑡 using Eq. (14)
• extracts and plots industry output 𝑞𝑡 = 𝑞1𝑡 + 𝑞2𝑡 and price 𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡

In [5]: import matplotlib.pyplot as plt

AF = A - B1 @ F1 - B2 @ F2
n = 20
x = np.empty((3, n))
x[:, 0] = 1, 1, 1
for t in range(n-1):
    x[:, t+1] = AF @ x[:, t]
q1 = x[1, :]
q2 = x[2, :]
q = q1 + q2       # Total output, MPE
p = a0 - a1 * q   # Price, MPE

fig, ax = plt.subplots(figsize=(9, 5.8))
ax.plot(q, 'b-', lw=2, alpha=0.75, label='total output')
ax.plot(p, 'g-', lw=2, alpha=0.75, label='price')
ax.set_title('Output and prices, duopoly MPE')
ax.legend(frameon=False)
plt.show()

<Figure size 900x580 with 1 Axes>

Note that the initial condition has been set to 𝑞10 = 𝑞20 = 1.0
To gain some perspective we can compare this to what happens in the monopoly case

The first panel in the next figure compares output of the monopolist and industry output un-
der the MPE, as a function of time
The second panel shows analogous curves for price

Here parameters are the same as above for both the MPE and monopoly solutions
The monopolist initial condition is 𝑞0 = 2.0 to mimic the industry initial condition 𝑞10 =
𝑞20 = 1.0 in the MPE case
As expected, output is higher and prices are lower under duopoly than monopoly
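The long-run levels behind this comparison can be checked with a short pure-Python sketch. It uses the rounded MPE policies printed earlier and iterates the closed-loop system Eq. (14) until it settles; the monopoly benchmark 𝑎0 /(2𝑎1 ) is the long-run quantity used in the solution to Exercise 1. The function name `step` is hypothetical.

```python
a0, a1 = 10.0, 2.0

# Rounded MPE policies as printed above; the state is x = (1, q1, q2)
F1 = [-0.66846615, 0.29512482, 0.07584666]
F2 = [-0.66846615, 0.07584666, 0.29512482]


def step(q1, q2):
    """One period of the closed-loop system, using u_i = -F_i x."""
    x = (1.0, q1, q2)
    u1 = -sum(f * xi for f, xi in zip(F1, x))
    u2 = -sum(f * xi for f, xi in zip(F2, x))
    return q1 + u1, q2 + u2


q1, q2 = 1.0, 1.0                 # same initial condition as in the lecture
for _ in range(500):
    q1, q2 = step(q1, q2)

q_duopoly = q1 + q2
p_duopoly = a0 - a1 * q_duopoly

q_monopoly = a0 / (2 * a1)        # long-run monopoly output
p_monopoly = a0 - a1 * q_monopoly

print(f"duopoly:  q = {q_duopoly:.4f}, p = {p_duopoly:.4f}")
print(f"monopoly: q = {q_monopoly:.4f}, p = {p_monopoly:.4f}")
```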

51.6 Exercises

51.6.1 Exercise 1

Replicate the pair of figures showing the comparison of output and prices for the monopolist
and duopoly under MPE
Parameters are as in duopoly_mpe.py and you can use that code to compute MPE policies
under duopoly
The optimal policy in the monopolist case can be computed using QuantEcon.py’s LQ class

51.6.2 Exercise 2

In this exercise, we consider a slightly more sophisticated duopoly problem



It takes the form of an infinite horizon linear-quadratic game proposed by Judd [72]
Two firms set prices and quantities of two goods interrelated through their demand curves
Relevant variables are defined as follows:

• 𝐼𝑖𝑡 = inventories of firm 𝑖 at beginning of 𝑡


• 𝑞𝑖𝑡 = production of firm 𝑖 during period 𝑡
• 𝑝𝑖𝑡 = price charged by firm 𝑖 during period 𝑡
• 𝑆𝑖𝑡 = sales made by firm 𝑖 during period 𝑡
• 𝐸𝑖𝑡 = costs of production of firm 𝑖 during period 𝑡
• 𝐶𝑖𝑡 = costs of carrying inventories for firm 𝑖 during 𝑡

The firms’ cost functions are

• $C_{it} = c_{i1} + c_{i2} I_{it} + 0.5 c_{i3} I_{it}^2$
• $E_{it} = e_{i1} + e_{i2} q_{it} + 0.5 e_{i3} q_{it}^2$ where $e_{ij}, c_{ij}$ are positive scalars

Inventories obey the laws of motion

𝐼𝑖,𝑡+1 = (1 − 𝛿)𝐼𝑖𝑡 + 𝑞𝑖𝑡 − 𝑆𝑖𝑡

Demand is governed by the linear schedule

$$
S_t = D p_t + b
$$

where

• $S_t = [S_{1t} \;\; S_{2t}]'$
• $p_t = [p_{1t} \;\; p_{2t}]'$
• 𝐷 is a 2 × 2 negative definite matrix and
• 𝑏 is a vector of constants

Firm 𝑖 maximizes the undiscounted sum

$$
\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T} \left( p_{it} S_{it} - E_{it} - C_{it} \right)
$$

We can convert this to a linear-quadratic problem by taking

$$
u_{it} = \begin{bmatrix} p_{it} \\ q_{it} \end{bmatrix}
\quad \text{and} \quad
x_t = \begin{bmatrix} I_{1t} \\ I_{2t} \\ 1 \end{bmatrix}
$$

Decision rules for price and quantity take the form 𝑢𝑖𝑡 = −𝐹𝑖 𝑥𝑡
The Markov perfect equilibrium of Judd’s model can be computed by filling in the matrices
appropriately
The exercise is to calculate these matrices and compute the following figures
The first figure shows the dynamics of inventories for each firm when the parameters are

In [6]: δ = 0.02
D = np.array([[-1, 0.5], [0.5, -1]])
b = np.array([25, 25])
c1 = c2 = np.array([1, -2, 1])
e1 = e2 = np.array([10, 10, 3])

Inventories trend to a common steady state


If we increase the depreciation rate to 𝛿 = 0.05, then we expect steady state inventories to fall
This is indeed the case, as the next figure shows

51.7 Solutions

51.7.1 Exercise 1

First, let’s compute the duopoly MPE under the stated parameters

In [7]: # == Parameters == #
a0 = 10.0
a1 = 2.0

β = 0.96
γ = 12.0

# == In LQ form == #
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])
R1 = [[ 0., -a0/2, 0.],
[-a0 / 2., a1, a1 / 2.],
[ 0, a1 / 2., 0.]]

R2 = [[ 0., 0., -a0 / 2],


[ 0., 0., a1 / 2.],
[-a0 / 2, a1 / 2., a1]]

Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0

# == Solve using QE's nnash function == #


F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1,
Q2, S1, S2, W1, W2, M1,
M2, beta=β)

Now we evaluate the time path of industry output and prices given initial condition 𝑞10 =
𝑞20 = 1

In [8]: AF = A - B1 @ F1 - B2 @ F2
n = 20
x = np.empty((3, n))
x[:, 0] = 1, 1, 1
for t in range(n-1):
    x[:, t+1] = AF @ x[:, t]
q1 = x[1, :]
q2 = x[2, :]
q = q1 + q2       # Total output, MPE
p = a0 - a1 * q   # Price, MPE

Next, let’s have a look at the monopoly solution


For the state and control, we take

𝑥𝑡 = 𝑞𝑡 − 𝑞 ̄ and 𝑢𝑡 = 𝑞𝑡+1 − 𝑞𝑡

To convert to an LQ problem we set

𝑅 = 𝑎1 and 𝑄=𝛾

in the payoff function 𝑥′𝑡 𝑅𝑥𝑡 + 𝑢′𝑡 𝑄𝑢𝑡 and

𝐴=𝐵=1

in the law of motion 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡
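The choice of state can be motivated by completing the square: revenue 𝑎0 𝑞 − 𝑎1 𝑞² equals a constant minus 𝑎1 (𝑞 − 𝑞)̄ ² when 𝑞 ̄ = 𝑎0 /(2𝑎1 ), so minimizing 𝑅𝑥² + 𝑄𝑢² with 𝑅 = 𝑎1 is equivalent to maximizing the monopolist's payoff. The sketch below checks this identity numerically and then iterates the scalar Riccati equation by hand. It is a cross-check under the duopoly parameter values, not the QuantEcon.py routine, and the function name `revenue` is hypothetical.

```python
a0, a1, beta, gamma = 10.0, 2.0, 0.96, 12.0
q_bar = a0 / (2 * a1)       # revenue-maximizing quantity
R, Q = a1, gamma            # scalar LQ weights, with A = B = 1


def revenue(q):
    """One-period revenue p * q with p = a0 - a1 * q."""
    return (a0 - a1 * q) * q


# Completing the square: revenue(q) = a1 * q_bar^2 - R * (q - q_bar)^2
for q in (0.5, 2.0, 3.7):
    gap = revenue(q) - (a1 * q_bar**2 - R * (q - q_bar)**2)
    assert abs(gap) < 1e-12

# Scalar Riccati iteration: P = R + beta*P - (beta*P)^2 / (Q + beta*P)
P = 0.0
for _ in range(10_000):
    P_new = R + beta * P - (beta * P)**2 / (Q + beta * P)
    converged = abs(P_new - P) < 1e-12
    P = P_new
    if converged:
        break

F = beta * P / (Q + beta * P)   # stationary policy, u = -F * x
print(f"P = {P:.6f}, F = {F:.6f}")
```

Under this policy the deviation from 𝑞 ̄ shrinks by the factor 1 − 𝐹 each period.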


We solve for the optimal policy 𝑢𝑡 = −𝐹 𝑥𝑡 and track the resulting dynamics of {𝑞𝑡 }, starting
at 𝑞0 = 2.0

In [9]: R = a1
Q = γ
A = B = 1
lq_alt = qe.LQ(Q, R, A, B, beta=β)
P, F, d = lq_alt.stationary_values()

q_bar = a0 / (2.0 * a1)
qm = np.empty(n)
qm[0] = 2
x0 = qm[0] - q_bar
x = x0
for i in range(1, n):
    x = A * x - B * F * x
    qm[i] = float(x) + q_bar
pm = a0 - a1 * qm

Let’s have a look at the different time paths

In [10]: fig, axes = plt.subplots(2, 1, figsize=(9, 9))

ax = axes[0]
ax.plot(qm, 'b-', lw=2, alpha=0.75, label='monopolist output')
ax.plot(q, 'g-', lw=2, alpha=0.75, label='MPE total output')
ax.set(ylabel="output", xlabel="time", ylim=(2, 4))
ax.legend(loc='upper left', frameon=0)

ax = axes[1]
ax.plot(pm, 'b-', lw=2, alpha=0.75, label='monopolist price')
ax.plot(p, 'g-', lw=2, alpha=0.75, label='MPE price')
ax.set(ylabel="price", xlabel="time")
ax.legend(loc='upper right', frameon=0)
plt.show()

51.7.2 Exercise 2

We treat the case 𝛿 = 0.02

In [11]: δ = 0.02
D = np.array([[-1, 0.5], [0.5, -1]])
b = np.array([25, 25])
c1 = c2 = np.array([1, -2, 1])
e1 = e2 = np.array([10, 10, 3])

δ_1 = 1 - δ

Recalling that the control and state are

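$$
u_{it} = \begin{bmatrix} p_{it} \\ q_{it} \end{bmatrix}
\quad \text{and} \quad
x_t = \begin{bmatrix} I_{1t} \\ I_{2t} \\ 1 \end{bmatrix}
$$

we set up the matrices as follows: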

In [12]: # == Create matrices needed to compute the Nash feedback equilibrium == #

A = np.array([[δ_1, 0, -δ_1 * b[0]],


[ 0, δ_1, -δ_1 * b[1]],
[ 0, 0, 1]])

B1 = δ_1 * np.array([[1, -D[0, 0]],


[0, -D[1, 0]],
[0, 0]])
B2 = δ_1 * np.array([[0, -D[0, 1]],
[1, -D[1, 1]],
[0, 0]])

R1 = -np.array([[0.5 * c1[2], 0, 0.5 * c1[1]],


[ 0, 0, 0],
[0.5 * c1[1], 0, c1[0]]])
R2 = -np.array([[0, 0, 0],
[0, 0.5 * c2[2], 0.5 * c2[1]],
[0, 0.5 * c2[1], c2[0]]])

Q1 = np.array([[-0.5 * e1[2], 0], [0, D[0, 0]]])


Q2 = np.array([[-0.5 * e2[2], 0], [0, D[1, 1]]])

S1 = np.zeros((2, 2))
S2 = np.copy(S1)

W1 = np.array([[ 0, 0],
[ 0, 0],
[-0.5 * e1[1], b[0] / 2.]])
W2 = np.array([[ 0, 0],
[ 0, 0],
[-0.5 * e2[1], b[1] / 2.]])

M1 = np.array([[0, 0], [0, D[0, 1] / 2.]])


M2 = np.copy(M1)

We can now compute the equilibrium using qe.nnash

In [13]: F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1,


R2, Q1, Q2, S1,
S2, W1, W2, M1, M2)

print("\nFirm 1's feedback rule:\n")


print(F1)

print("\nFirm 2's feedback rule:\n")


print(F2)

Firm 1's feedback rule:

[[ 2.43666582e-01 2.72360627e-02 -6.82788293e+00]


[ 3.92370734e-01 1.39696451e-01 -3.77341073e+01]]

Firm 2's feedback rule:

[[ 2.72360627e-02 2.43666582e-01 -6.82788293e+00]


[ 1.39696451e-01 3.92370734e-01 -3.77341073e+01]]

Now let’s look at the dynamics of inventories, and reproduce the graph corresponding to 𝛿 =
0.02

In [14]: AF = A - B1 @ F1 - B2 @ F2
n = 25
x = np.empty((3, n))
x[:, 0] = 2, 0, 1
for t in range(n-1):
    x[:, t+1] = AF @ x[:, t]
I1 = x[0, :]
I2 = x[1, :]
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(I1, 'b-', lw=2, alpha=0.75, label='inventories, firm 1')
ax.plot(I2, 'g-', lw=2, alpha=0.75, label='inventories, firm 2')
ax.set_title(rf'$\delta = {δ}$')
ax.legend()
plt.show()
52

Robust Markov Perfect Equilibrium

52.1 Contents

• Overview 52.2
• Linear Markov Perfect Equilibria with Robust Agents 52.3
• Application 52.4

Co-author: Dongchen Zou


In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

52.2 Overview

This lecture describes a Markov perfect equilibrium with robust agents


We focus on special settings with

• two players
• quadratic payoff functions
• linear transition rules for the state vector

These specifications simplify calculations and allow us to give a simple example that illus-
trates basic forces
This lecture is based on ideas described in chapter 15 of [52] and in Markov perfect equilib-
rium and Robustness

52.2.1 Basic Setup

Decisions of two agents affect the motion of a state vector that appears as an argument of
payoff functions of both agents
As described in Markov perfect equilibrium, when decision-makers have no concerns about
the robustness of their decision rules to misspecifications of the state dynamics, a Markov
perfect equilibrium can be computed via backward recursion on two sets of equations


• a pair of Bellman equations, one for each agent


• a pair of equations that express linear decision rules for each agent as func-
tions of that agent’s continuation value function as well as parameters of
preferences and state transition matrices

This lecture shows how a similar equilibrium concept and similar computational procedures
apply when we impute concerns about robustness to both decision-makers
A Markov perfect equilibrium with robust agents will be characterized by

• a pair of Bellman equations, one for each agent


• a pair of equations that express linear decision rules for each agent as func-
tions of that agent’s continuation value function as well as parameters of
preferences and state transition matrices
• a pair of equations that express linear decision rules for worst-case shocks for
each agent as functions of that agent’s continuation value function as well as
parameters of preferences and state transition matrices

Below, we’ll construct a robust firms version of the classic duopoly model with adjustment
costs analyzed in Markov perfect equilibrium

52.3 Linear Markov Perfect Equilibria with Robust Agents

As we saw in Markov perfect equilibrium, the study of Markov perfect equilibria in dynamic
games with two players leads us to an interrelated pair of Bellman equations
In linear quadratic dynamic games, these “stacked Bellman equations” become “stacked Ric-
cati equations” with a tractable mathematical structure

52.3.1 Modified Coupled Linear Regulator Problems

We consider a general linear quadratic regulator game with two players, each of whom fears
model misspecifications
We often call the players agents
The agents share a common baseline model for the transition dynamics of the state vector

• this is a counterpart of a ‘rational expectations’ assumption of shared beliefs

But now one or more agents doubt that the baseline model is correctly specified
The agents express the possibility that their baseline specification is incorrect by adding a
contribution 𝐶𝑣𝑖𝑡 to the time 𝑡 transition law for the state

• 𝐶 is the usual volatility matrix that appears in stochastic versions of optimal
linear regulator problems
• 𝑣𝑖𝑡 is a possibly history-dependent vector of distortions to the dynamics of
the state that agent 𝑖 uses to represent misspecification of the original model

For convenience, we’ll start with a finite horizon formulation, where 𝑡0 is the initial date and
𝑡1 is the common terminal date
Player 𝑖 takes a sequence {𝑢−𝑖𝑡 } as given and chooses a sequence {𝑢𝑖𝑡 } to minimize and {𝑣𝑖𝑡 }
to maximize

$$
\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' R_i x_t + u_{it}' Q_i u_{it} + u_{-it}' S_i u_{-it} + 2 x_t' W_i u_{it} + 2 u_{-it}' M_i u_{it} - \theta_i v_{it}' v_{it} \right\} \tag{1}
$$

while thinking that the state evolves according to

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵1 𝑢1𝑡 + 𝐵2 𝑢2𝑡 + 𝐶𝑣𝑖𝑡 (2)

Here

• 𝑥𝑡 is an 𝑛 × 1 state vector, 𝑢𝑖𝑡 is a 𝑘𝑖 × 1 vector of controls for player 𝑖, and


• 𝑣𝑖𝑡 is an ℎ × 1 vector of distortions to the state dynamics that concern player 𝑖
• 𝑅𝑖 is 𝑛 × 𝑛
• 𝑆𝑖 is 𝑘−𝑖 × 𝑘−𝑖
• 𝑄𝑖 is 𝑘𝑖 × 𝑘𝑖
• 𝑊𝑖 is 𝑛 × 𝑘𝑖
• 𝑀𝑖 is 𝑘−𝑖 × 𝑘𝑖
• 𝐴 is 𝑛 × 𝑛
• 𝐵𝑖 is 𝑛 × 𝑘𝑖
• 𝐶 is 𝑛 × ℎ
• $\theta_i \in [\underline{\theta}_i, +\infty]$ is a scalar multiplier parameter of player 𝑖

If 𝜃𝑖 = +∞, player 𝑖 completely trusts the baseline model


If 𝜃𝑖 < +∞, player 𝑖 suspects that some other unspecified model actually governs the transition
dynamics

The term 𝜃𝑖 𝑣′𝑖𝑡 𝑣𝑖𝑡 is a time 𝑡 contribution to an entropy penalty that an (imaginary)
loss-maximizing agent inside agent 𝑖’s mind charges for distorting the law of motion in a way that
harms agent 𝑖

• the imaginary loss-maximizing agent helps the loss-minimizing agent by helping
him construct bounds on the behavior of his decision rule over a large set
of alternative models of state transition dynamics

52.3.2 Computing Equilibrium

We formulate a linear robust Markov perfect equilibrium as follows


Player 𝑖 employs linear decision rules 𝑢𝑖𝑡 = −𝐹𝑖𝑡 𝑥𝑡 , where 𝐹𝑖𝑡 is a 𝑘𝑖 × 𝑛 matrix
Player 𝑖’s malevolent alter ego employs decision rules 𝑣𝑖𝑡 = 𝐾𝑖𝑡 𝑥𝑡 where 𝐾𝑖𝑡 is an ℎ × 𝑛 matrix
A robust Markov perfect equilibrium is a pair of sequences {𝐹1𝑡 , 𝐹2𝑡 } and a pair of sequences
{𝐾1𝑡 , 𝐾2𝑡 } over 𝑡 = 𝑡0 , … , 𝑡1 − 1 that satisfy

• {𝐹1𝑡 , 𝐾1𝑡 } solves player 1’s robust decision problem, taking {𝐹2𝑡 } as given, and
• {𝐹2𝑡 , 𝐾2𝑡 } solves player 2’s robust decision problem, taking {𝐹1𝑡 } as given

If we substitute 𝑢2𝑡 = −𝐹2𝑡 𝑥𝑡 into Eq. (1) and Eq. (2), then player 1’s problem becomes
minimization-maximization of

$$
\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' \Pi_{1t} x_t + u_{1t}' Q_1 u_{1t} + 2 u_{1t}' \Gamma_{1t} x_t - \theta_1 v_{1t}' v_{1t} \right\} \tag{3}
$$

subject to

𝑥𝑡+1 = Λ1𝑡 𝑥𝑡 + 𝐵1 𝑢1𝑡 + 𝐶𝑣1𝑡 (4)

where

• Λ𝑖𝑡 ∶= 𝐴 − 𝐵−𝑖 𝐹−𝑖𝑡
• Π𝑖𝑡 ∶= 𝑅𝑖 + 𝐹′−𝑖𝑡 𝑆𝑖 𝐹−𝑖𝑡
• Γ𝑖𝑡 ∶= 𝑊′𝑖 − 𝑀′𝑖 𝐹−𝑖𝑡

This is an LQ robust dynamic programming problem of the type studied in the Robustness
lecture, which can be solved by working backward
Maximization with respect to distortion 𝑣1𝑡 leads to the following version of the 𝒟 operator
from the Robustness lecture, namely

$$
\mathcal{D}_1(P) := P + P C (\theta_1 I - C' P C)^{-1} C' P \tag{5}
$$
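
As a minimal numerical sketch of Eq. (5), the operator can be coded in a few lines. The helper name `d_operator` and the matrices below are assumptions made for illustration; they are not part of the lecture's code. The sketch also illustrates that a very large 𝜃 returns (approximately) 𝑃 itself.

```python
import numpy as np

def d_operator(P, C, θ):
    """Return D(P) = P + P C (θ I - C' P C)^{-1} C' P (hypothetical helper)."""
    P, C = np.atleast_2d(P), np.atleast_2d(C)
    I = np.eye(C.shape[1])
    # Solve (θ I - C' P C) Z = C' P rather than forming an explicit inverse
    Z = np.linalg.solve(θ * I - C.T @ P @ C, C.T @ P)
    return P + P @ C @ Z

# Illustrative values only (assumed, not taken from the lecture)
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])
C = np.array([[0.0],
              [0.1]])

D_small_θ = d_operator(P, C, θ=0.05)   # strong concern about misspecification
D_large_θ = d_operator(P, C, θ=1e12)   # θ close to +∞: essentially no concern
```

Note that the operator is only well defined when 𝜃𝐼 − 𝐶′𝑃𝐶 is nonsingular, which is why small values of 𝜃 can cause the computation to fail.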

The matrix 𝐹1𝑡 in the policy rule 𝑢1𝑡 = −𝐹1𝑡 𝑥𝑡 that solves agent 1’s problem satisfies

$$
F_{1t} = (Q_1 + \beta B_1' \mathcal{D}_1(P_{1t+1}) B_1)^{-1} (\beta B_1' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} + \Gamma_{1t}) \tag{6}
$$

where 𝑃1𝑡 solves the matrix Riccati difference equation

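$$
P_{1t} = \Pi_{1t} - (\beta B_1' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} + \Gamma_{1t})' (Q_1 + \beta B_1' \mathcal{D}_1(P_{1t+1}) B_1)^{-1} (\beta B_1' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} + \Gamma_{1t}) + \beta \Lambda_{1t}' \mathcal{D}_1(P_{1t+1}) \Lambda_{1t} \tag{7}
$$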
Similarly, the policy that solves player 2’s problem is

$$
F_{2t} = (Q_2 + \beta B_2' \mathcal{D}_2(P_{2t+1}) B_2)^{-1} (\beta B_2' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} + \Gamma_{2t}) \tag{8}
$$

where 𝑃2𝑡 solves

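$$
P_{2t} = \Pi_{2t} - (\beta B_2' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} + \Gamma_{2t})' (Q_2 + \beta B_2' \mathcal{D}_2(P_{2t+1}) B_2)^{-1} (\beta B_2' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} + \Gamma_{2t}) + \beta \Lambda_{2t}' \mathcal{D}_2(P_{2t+1}) \Lambda_{2t} \tag{9}
$$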
Here in all cases 𝑡 = 𝑡0 , … , 𝑡1 − 1 and the terminal conditions are 𝑃𝑖𝑡1 = 0

The solution procedure is to use equations Eq. (6), Eq. (7), Eq. (8), and Eq. (9), and “work
backwards” from time 𝑡1 − 1
Since we’re working backwards, 𝑃1𝑡+1 and 𝑃2𝑡+1 are taken as given at each stage
Moreover, since

• some terms on the right-hand side of Eq. (6) contain 𝐹2𝑡


• some terms on the right-hand side of Eq. (8) contain 𝐹1𝑡

we need to solve these 𝑘1 + 𝑘2 equations simultaneously

52.3.3 Key Insight

As in Markov perfect equilibrium, a key insight here is that equations Eq. (6) and Eq. (8) are
linear in 𝐹1𝑡 and 𝐹2𝑡
After these equations have been solved, we can take 𝐹𝑖𝑡 and solve for 𝑃𝑖𝑡 in Eq. (7) and
Eq. (9)
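
To make the linearity concrete, here is a hedged sketch of one backward-induction step. Substituting the definitions of Λ𝑖𝑡 and Γ𝑖𝑡 into Eq. (6) and Eq. (8) and moving every term that contains 𝐹1𝑡 or 𝐹2𝑡 to the left-hand side gives a single stacked linear system. The helper name `solve_stage` and the stand-in matrices `D1` and `D2` (playing the roles of 𝒟1(𝑃1𝑡+1) and 𝒟2(𝑃2𝑡+1)) are assumptions for illustration, and this is not how `nnash_robust` organizes the computation.

```python
import numpy as np

def solve_stage(A, B1, B2, Q1, Q2, W1, W2, M1, M2, D1, D2, β):
    """
    Jointly solve Eq. (6) and Eq. (8) for (F1, F2) at one date, taking
    D1 = D_1(P_{1,t+1}) and D2 = D_2(P_{2,t+1}) as given.
    (Hypothetical helper, not the lecture's nnash_robust.)
    """
    k1 = B1.shape[1]
    # Coefficients on (F1, F2) after moving all F terms to the left-hand side
    top = np.hstack([Q1 + β * B1.T @ D1 @ B1, β * B1.T @ D1 @ B2 + M1.T])
    bottom = np.hstack([β * B2.T @ D2 @ B1 + M2.T, Q2 + β * B2.T @ D2 @ B2])
    rhs = np.vstack([β * B1.T @ D1 @ A + W1.T,
                     β * B2.T @ D2 @ A + W2.T])
    F = np.linalg.solve(np.vstack([top, bottom]), rhs)
    return F[:k1, :], F[k1:, :]

# Duopoly-sized example; D1 and D2 are arbitrary symmetric stand-ins
β, γ = 0.96, 12.0
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])
Q1 = Q2 = np.array([[γ]])
W1 = W2 = np.zeros((3, 1))
M1 = M2 = np.zeros((1, 1))
D1 = np.array([[5.0, -1.0, 0.0], [-1.0, 2.0, 0.5], [0.0, 0.5, 1.0]])
D2 = np.array([[5.0, 0.0, -1.0], [0.0, 1.0, 0.5], [-1.0, 0.5, 2.0]])

F1, F2 = solve_stage(A, B1, B2, Q1, Q2, W1, W2, M1, M2, D1, D2, β)
```

Given `F1` and `F2`, the matrices Λ𝑖𝑡, Π𝑖𝑡 and Γ𝑖𝑡 can be formed and substituted into Eq. (7) and Eq. (9) to obtain 𝑃1𝑡 and 𝑃2𝑡.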
Notice how 𝑗’s control law 𝐹𝑗𝑡 is a function of {𝐹𝑖𝑠 , 𝑠 ≥ 𝑡, 𝑖 ≠ 𝑗}
Thus, agent 𝑖’s choice of {𝐹𝑖𝑡 ; 𝑡 = 𝑡0 , … , 𝑡1 − 1} influences agent 𝑗’s choice of control laws
However, in the Markov perfect equilibrium of this game, each agent is assumed to ignore the
influence that his choice exerts on the other agent’s choice
After these equations have been solved, we can also deduce associated sequences of worst-case
shocks

52.3.4 Worst-case Shocks

For agent 𝑖 the maximizing or worst-case shock 𝑣𝑖𝑡 is

𝑣𝑖𝑡 = 𝐾𝑖𝑡 𝑥𝑡

where

$$
K_{it} = \theta_i^{-1} (I - \theta_i^{-1} C' P_{i,t+1} C)^{-1} C' P_{i,t+1} (A - B_1 F_{1t} - B_2 F_{2t})
$$
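
The following sketch evaluates the worst-case shock rule for one agent. It uses the algebraically equivalent form (𝜃𝑖 𝐼 − 𝐶′𝑃𝐶)⁻¹ 𝐶′𝑃 (𝐴 − 𝐵1 𝐹1𝑡 − 𝐵2 𝐹2𝑡), which avoids dividing by 𝜃𝑖 twice. The function name `worst_case_K` and all numerical values are assumptions chosen for illustration only.

```python
import numpy as np

def worst_case_K(P_next, A, B1, B2, F1, F2, C, θ):
    """Worst-case shock rule K for one agent (hypothetical helper)."""
    I = np.eye(C.shape[1])
    A_cl = A - B1 @ F1 - B2 @ F2          # closed-loop transition matrix
    # (θ I - C'PC)^{-1} C'P A_cl equals θ^{-1} (I - θ^{-1} C'PC)^{-1} C'P A_cl
    return np.linalg.solve(θ * I - C.T @ P_next @ C, C.T @ P_next @ A_cl)

# Illustrative inputs (assumed values, not results from the lecture)
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])
F1 = np.array([[-0.67, 0.30, 0.08]])
F2 = np.array([[-0.67, 0.08, 0.30]])
C = np.array([[0.], [0.01], [0.01]])
P_next = np.diag([1.0, 2.0, 3.0])
θ = 0.02

K1 = worst_case_K(P_next, A, B1, B2, F1, F2, C, θ)   # shape (h, n) = (1, 3)
```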

52.3.5 Infinite Horizon

We often want to compute the solutions of such games for infinite horizons, in the hope that
the decision rules 𝐹𝑖𝑡 settle down to be time-invariant as 𝑡1 → +∞
In practice, we usually fix 𝑡1 and compute the equilibrium of an infinite horizon game by
driving 𝑡0 → −∞
This is the approach we adopt in the next section

52.3.6 Implementation

We use the function nnash_robust to compute a Markov perfect equilibrium of the infinite
horizon linear quadratic dynamic game with robust players in the manner described above

52.4 Application

52.4.1 A Duopoly Model

Without concerns for robustness, the model is identical to the duopoly model from the
Markov perfect equilibrium lecture
To begin, we briefly review the structure of that model
Two firms are the only producers of a good, the demand for which is governed by a linear
inverse demand function

𝑝 = 𝑎0 − 𝑎1 (𝑞1 + 𝑞2 ) (10)

Here 𝑝 = 𝑝𝑡 is the price of the good, 𝑞𝑖 = 𝑞𝑖𝑡 is the output of firm 𝑖 = 1, 2 at time 𝑡 and
𝑎0 > 0, 𝑎1 > 0
In Eq. (10) and what follows,

• the time subscript is suppressed when possible to simplify notation


• 𝑥̂ denotes a next period value of variable 𝑥

Each firm recognizes that its output affects total output and therefore the market price
The one-period payoff function of firm 𝑖 is price times quantity minus adjustment costs:

$$
\pi_i = p q_i - \gamma (\hat{q}_i - q_i)^2, \quad \gamma > 0 \tag{11}
$$

Substituting the inverse demand curve Eq. (10) into Eq. (11) lets us express the one-period
payoff as

$$
\pi_i(q_i, q_{-i}, \hat{q}_i) = a_0 q_i - a_1 q_i^2 - a_1 q_i q_{-i} - \gamma (\hat{q}_i - q_i)^2 \tag{12}
$$

where 𝑞−𝑖 denotes the output of the firm other than 𝑖



The objective of the firm is to maximize $\sum_{t=0}^{\infty} \beta^t \pi_{it}$
Firm 𝑖 chooses a decision rule that sets next period quantity 𝑞𝑖̂ as a function 𝑓𝑖 of the current
state (𝑞𝑖 , 𝑞−𝑖 )
This completes our review of the duopoly model without concerns for robustness
Now we activate robustness concerns of both firms
To map a robust version of the duopoly model into coupled robust linear-quadratic dynamic
programming problems, we again define the state and controls as

$$
x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix}
\quad \text{and} \quad
u_{it} := q_{i,t+1} - q_{it}, \quad i = 1, 2
$$

If we write

𝑥′𝑡 𝑅𝑖 𝑥𝑡 + 𝑢′𝑖𝑡 𝑄𝑖 𝑢𝑖𝑡

where 𝑄1 = 𝑄2 = 𝛾,

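$$
R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix}
\quad \text{and} \quad
R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix}
$$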

then we recover the one-period payoffs Eq. (11) for the two firms in the duopoly model
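
A quick numerical check of this mapping is sketched below. Because the LQ problem is stated as a minimization, the quadratic form 𝑥′𝑡 𝑅𝑖 𝑥𝑡 + 𝑢′𝑖𝑡 𝑄𝑖 𝑢𝑖𝑡 equals the negative of the payoff in Eq. (11). The function names `lq_loss` and `payoff` and the test quantities are assumptions introduced for this illustration.

```python
import numpy as np

a0, a1, γ = 10.0, 2.0, 12.0

R1 = np.array([[0.,      -a0 / 2, 0.],
               [-a0 / 2, a1,      a1 / 2],
               [0.,      a1 / 2,  0.]])
R2 = np.array([[0.,      0.,      -a0 / 2],
               [0.,      0.,      a1 / 2],
               [-a0 / 2, a1 / 2,  a1]])

def lq_loss(R, q1, q2, u):
    """x' R x + γ u² with x = (1, q1, q2) (hypothetical helper)."""
    x = np.array([1.0, q1, q2])
    return x @ R @ x + γ * u**2

def payoff(q_own, q_other, q_own_next):
    """One-period payoff from Eq. (11) (hypothetical helper)."""
    p = a0 - a1 * (q_own + q_other)
    return p * q_own - γ * (q_own_next - q_own)**2

# Arbitrary current and next-period outputs
q1, q2, q1_next, q2_next = 1.2, 0.8, 1.5, 0.7

loss_1 = lq_loss(R1, q1, q2, q1_next - q1)
loss_2 = lq_loss(R2, q1, q2, q2_next - q2)
```

Comparing `loss_1` with `-payoff(q1, q2, q1_next)` and `loss_2` with `-payoff(q2, q1, q2_next)` confirms that the two representations agree.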
The law of motion for the state 𝑥𝑡 is 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵1 𝑢1𝑡 + 𝐵2 𝑢2𝑡 where

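$$
A := \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
B_1 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
B_2 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
$$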

A robust decision rule of firm 𝑖 will take the form 𝑢𝑖𝑡 = −𝐹𝑖 𝑥𝑡 , inducing the following closed-
loop system for the evolution of 𝑥 in the Markov perfect equilibrium:

𝑥𝑡+1 = (𝐴 − 𝐵1 𝐹1 − 𝐵2 𝐹2 )𝑥𝑡 (13)

52.4.2 Parameters and Solution

Consider the duopoly model with parameter values of:

• 𝑎0 = 10
• 𝑎1 = 2
• 𝛽 = 0.96
• 𝛾 = 12

From these, we computed the infinite horizon MPE without robustness using the code

In [2]: """
@authors: Chase Coleman, Thomas Sargent, John Stachurski
"""
import numpy as np
import quantecon as qe

# == Parameters == #
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0

# == In LQ form == #
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])

R1 = [[ 0., -a0 / 2, 0.],
      [-a0 / 2., a1, a1 / 2.],
      [ 0, a1 / 2., 0.]]

R2 = [[ 0., 0., -a0 / 2],
      [ 0., 0., a1 / 2.],
      [-a0 / 2, a1 / 2., a1]]

Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0

# == Solve using QE's nnash function == #
F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1,
                          Q2, S1, S2, W1, W2, M1,
                          M2, beta=β)

# == Display policies == #
print("Computed policies for firm 1 and firm 2:\n")
print(f"F1 = {F1}")
print(f"F2 = {F2}")
print("\n")

Computed policies for firm 1 and firm 2:

F1 = [[-0.66846615 0.29512482 0.07584666]]


F2 = [[-0.66846615 0.07584666 0.29512482]]

Markov Perfect Equilibrium with Robustness


We add robustness concerns to the Markov perfect equilibrium model by extending the function
qe.nnash into a robust version that applies the maximization operator 𝒟(𝑃 ) at each step of
the backward induction
The resulting function is nnash_robust
The function’s code is as follows

In [3]: from scipy.linalg import solve
import matplotlib.pyplot as plt
%matplotlib inline

def nnash_robust(A, C, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2,
                 θ1, θ2, beta=1.0, tol=1e-8, max_iter=1000):
    r"""
Compute the limit of a Nash linear quadratic dynamic game with
robustness concern.

In this problem, player i minimizes


.. math::
\sum_{t=0}^{\infty}
\left\{
x_t' r_i x_t + 2 x_t' w_i
u_{it} +u_{it}' q_i u_{it} + u_{jt}' s_i u_{jt} + 2 u_{jt}'
m_i u_{it}
\right\}
subject to the law of motion
.. math::
x_{it+1} = A x_t + b_1 u_{1t} + b_2 u_{2t} + C w_{it+1}
and a perceived control law :math:`u_j(t) = - f_j x_t` for the other
player.

Player i is also concerned about model misspecification,
and maximizes
.. math::
\sum_{t=0}^{\infty}
\left\{
\beta^{t+1} \theta_{i} w_{it+1}'w_{it+1}
\right\}

The solution computed in this routine is the :math:`f_i` and


:math:`P_i` of the associated double optimal linear regulator
problem.

Parameters
----------
A : scalar(float) or array_like(float)
Corresponds to the MPE equations, should be of size (n, n)
C : scalar(float) or array_like(float)
As above, size (n, c), c is the size of w
B1 : scalar(float) or array_like(float)
As above, size (n, k_1)
B2 : scalar(float) or array_like(float)
As above, size (n, k_2)
R1 : scalar(float) or array_like(float)
As above, size (n, n)
R2 : scalar(float) or array_like(float)
As above, size (n, n)
Q1 : scalar(float) or array_like(float)
As above, size (k_1, k_1)
Q2 : scalar(float) or array_like(float)
As above, size (k_2, k_2)
S1 : scalar(float) or array_like(float)
As above, size (k_2, k_2)
S2 : scalar(float) or array_like(float)
As above, size (k_1, k_1)
W1 : scalar(float) or array_like(float)
As above, size (n, k_1)
W2 : scalar(float) or array_like(float)
As above, size (n, k_2)
M1 : scalar(float) or array_like(float)
As above, size (k_2, k_1)
M2 : scalar(float) or array_like(float)
As above, size (k_1, k_2)
θ1 : scalar(float)
Robustness parameter of player 1
θ2 : scalar(float)
Robustness parameter of player 2
beta : scalar(float), optional(default=1.0)
Discount factor
tol : scalar(float), optional(default=1e-8)
This is the tolerance level for convergence
max_iter : scalar(int), optional(default=1000)
This is the maximum number of iterations allowed

Returns
-------
F1 : array_like, dtype=float, shape=(k_1, n)
Feedback law for agent 1
F2 : array_like, dtype=float, shape=(k_2, n)
Feedback law for agent 2
P1 : array_like, dtype=float, shape=(n, n)
The steady-state solution to the associated discrete matrix
Riccati equation for agent 1
P2 : array_like, dtype=float, shape=(n, n)
The steady-state solution to the associated discrete matrix
Riccati equation for agent 2
"""
    # == Unload parameters and make sure everything is a matrix == #
    params = A, C, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2
    params = map(np.asmatrix, params)
    A, C, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2 = params

    # == Multiply A, B1, B2 by sqrt(beta) to enforce discounting == #
    A, B1, B2 = [np.sqrt(beta) * x for x in (A, B1, B2)]

    # == Initial values == #
    n = A.shape[0]
    k_1 = B1.shape[1]
    k_2 = B2.shape[1]

    v1 = np.eye(k_1)
    v2 = np.eye(k_2)
    P1 = np.eye(n) * 1e-5
    P2 = np.eye(n) * 1e-5
    F1 = np.random.randn(k_1, n)
    F2 = np.random.randn(k_2, n)

    for it in range(max_iter):
        # update
        F10 = F1
        F20 = F2

        I = np.eye(C.shape[1])

        # D1(P1)
        # Note: INV1 may not be solved if the matrix is singular
        INV1 = solve(θ1 * I - C.T @ P1 @ C, I)
        D1P1 = P1 + P1 @ C @ INV1 @ C.T @ P1

        # D2(P2)
        # Note: INV2 may not be solved if the matrix is singular
        INV2 = solve(θ2 * I - C.T @ P2 @ C, I)
        D2P2 = P2 + P2 @ C @ INV2 @ C.T @ P2

        G2 = solve(Q2 + B2.T @ D2P2 @ B2, v2)
        G1 = solve(Q1 + B1.T @ D1P1 @ B1, v1)
        H2 = G2 @ B2.T @ D2P2
        H1 = G1 @ B1.T @ D1P1

        # break up the computation of F1, F2
        F1_left = v1 - (H1 @ B2 + G1 @ M1.T) @ (H2 @ B1 + G2 @ M2.T)
        F1_right = H1 @ A + G1 @ W1.T - \
                   (H1 @ B2 + G1 @ M1.T) @ (H2 @ A + G2 @ W2.T)
        F1 = solve(F1_left, F1_right)
        F2 = H2 @ A + G2 @ W2.T - (H2 @ B1 + G2 @ M2.T) @ F1

        Λ1 = A - B2 @ F2
        Λ2 = A - B1 @ F1
        Π1 = R1 + F2.T @ S1 @ F2
        Π2 = R2 + F1.T @ S2 @ F1
        Γ1 = W1.T - M1.T @ F2
        Γ2 = W2.T - M2.T @ F1

        # Compute P1 and P2
        P1 = Π1 - (B1.T @ D1P1 @ Λ1 + Γ1).T @ F1 + \
             Λ1.T @ D1P1 @ Λ1
        P2 = Π2 - (B2.T @ D2P2 @ Λ2 + Γ2).T @ F2 + \
             Λ2.T @ D2P2 @ Λ2

        dd = np.max(np.abs(F10 - F1)) + np.max(np.abs(F20 - F2))

        if dd < tol:  # success!
            break

    else:
        raise ValueError(f'No convergence: Iteration limit of {max_iter} '
                         'reached in nnash_robust')

    return F1, F2, P1, P2

52.4.3 Some Details

Firm 𝑖 wants to minimize



$$
\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' R_i x_t + u_{it}' Q_i u_{it} + u_{-it}' S_i u_{-it} + 2 x_t' W_i u_{it} + 2 u_{-it}' M_i u_{it} \right\}
$$

where

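$$
x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix}
\quad \text{and} \quad
u_{it} := q_{i,t+1} - q_{it}, \quad i = 1, 2
$$

and

$$
R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix}, \quad
R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix},
$$

$$
Q_1 = Q_2 = \gamma, \quad S_1 = S_2 = 0, \quad W_1 = W_2 = 0, \quad M_1 = M_2 = 0
$$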

The parameters of the duopoly model are:

• 𝑎0 = 10
• 𝑎1 = 2
• 𝛽 = 0.96
• 𝛾 = 12

In [4]: # == Parameters == #
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0

# == In LQ form == #
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])

R1 = [[ 0., -a0 / 2, 0.],
      [-a0 / 2., a1, a1 / 2.],
      [ 0, a1 / 2., 0.]]

R2 = [[ 0., 0., -a0 / 2],
      [ 0., 0., a1 / 2.],
      [-a0 / 2, a1 / 2., a1]]

Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0

Consistency Check
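We first conduct a comparison test to check if nnash_robust agrees with qe.nnash in the
non-robustness case, which corresponds to 𝜃𝑖 = +∞ and which the code below imposes by
setting 𝐶 = 0 so that the distortions cannot affect the state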

In [5]: # == Solve using QE's nnash function == #
F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1,
                          Q2, S1, S2, W1, W2, M1,
                          M2, beta=β)

# == Solve using nnash_robust == #
# With C = 0 the operator D(P) reduces to P, so the values passed
# for θ1 and θ2 (here 1e-10) have no effect on the solution
F1r, F2r, P1r, P2r = nnash_robust(A, np.zeros((3, 1)), B1, B2, R1, R2, Q1,
                                  Q2, S1, S2, W1, W2, M1, M2, 1e-10,
                                  1e-10, beta=β)

print('F1 and F1r should be the same: ', np.allclose(F1, F1r))
print('F2 and F2r should be the same: ', np.allclose(F2, F2r))
print('P1 and P1r should be the same: ', np.allclose(P1, P1r))
print('P2 and P2r should be the same: ', np.allclose(P2, P2r))

F1 and F1r should be the same: True


F2 and F2r should be the same: True
P1 and P1r should be the same: True
P2 and P2r should be the same: True

We can see that the results are consistent across the two functions
Comparative Dynamics under Baseline Transition Dynamics
We want to compare the dynamics of price and output under the ordinary MPE decision rules
with those under the robust MPE decision rules, in both cases simulating under the baseline model
For the robust case, this means that we simulate the state dynamics under the closed-loop
transition matrix

𝐴𝑜 = 𝐴 − 𝐵1 𝐹1 − 𝐵2 𝐹2

where 𝐹1 and 𝐹2 are the firms’ robust decision rules within the robust Markov perfect
equilibrium

• by simulating under the baseline model transition dynamics and the robust
MPE rules we are in effect assuming that at the end of the day firms’ concerns
about misspecification of the baseline model do not materialize
• a short way of saying this is that misspecification fears are all ‘just in the
minds’ of the firms
• simulating under the baseline model is a common practice in the literature
• note that some assumption about the model that actually governs the data
has to be made in order to create a simulation
• later we will describe the (erroneous) beliefs of the two firms that justify
their robust decisions as best responses to transition laws that are distorted
relative to the baseline model

After simulating 𝑥𝑡 under the baseline transition dynamics and robust decision rules 𝐹𝑖 , 𝑖 =
1, 2, we extract and plot industry output 𝑞𝑡 = 𝑞1𝑡 + 𝑞2𝑡 and price 𝑝𝑡 = 𝑎0 − 𝑎1 𝑞𝑡
Here we set the robustness and volatility matrix parameters as follows:

• 𝜃1 = 0.02
• 𝜃2 = 0.04
• $C = \begin{pmatrix} 0 \\ 0.01 \\ 0.01 \end{pmatrix}$

Because we have set 𝜃1 < 𝜃2 < +∞ we know that



• both firms fear that the baseline specification of the state transition dynamics
is incorrect
• firm 1 fears misspecification more than firm 2

In [6]: # == Robustness parameters and matrix == #


C = np.asmatrix([[0], [0.01], [0.01]])
θ1 = 0.02
θ2 = 0.04
n = 20

# == Solve using nnash_robust == #


F1r, F2r, P1r, P2r = nnash_robust(A, C, B1, B2, R1, R2, Q1,
Q2, S1, S2, W1, W2, M1, M2,
θ1, θ2, beta=β)

# == MPE output and price == #


AF = A - B1 @ F1 - B2 @ F2
x = np.empty((3, n))
x[:, 0] = 1, 1, 1
for t in range(n - 1):
    x[:, t + 1] = AF @ x[:, t]
q1 = x[1, :]
q2 = x[2, :]
q = q1 + q2 # Total output, MPE
p = a0 - a1 * q # Price, MPE

# == RMPE output and price == #


AO = A - B1 @ F1r - B2 @ F2r
xr = np.empty((3, n))
xr[:, 0] = 1, 1, 1
for t in range(n - 1):
    xr[:, t + 1] = AO @ xr[:, t]
qr1 = xr[1, :]
qr2 = xr[2, :]
qr = qr1 + qr2 # Total output, RMPE
pr = a0 - a1 * qr # Price, RMPE

# == RMPE heterogeneous beliefs output and price == #


I = np.eye(C.shape[1])
INV1 = solve(θ1 * I - C.T @ P1 @ C, I)
K1 = P1 @ C @ INV1 @ C.T @ P1 @ AO
AOCK1 = AO + C.T @ K1

INV2 = solve(θ2 * I - C.T @ P2 @ C, I)


K2 = P2 @ C @ INV2 @ C.T @ P2 @ AO
AOCK2 = AO + C.T @ K2
xrp1 = np.empty((3, n))
xrp2 = np.empty((3, n))
xrp1[:, 0] = 1, 1, 1
xrp2[:, 0] = 1, 1, 1
for t in range(n - 1):
    xrp1[:, t + 1] = AOCK1 @ xrp1[:, t]
    xrp2[:, t + 1] = AOCK2 @ xrp2[:, t]
qrp11 = xrp1[1, :]
qrp12 = xrp1[2, :]
qrp21 = xrp2[1, :]
qrp22 = xrp2[2, :]
qrp1 = qrp11 + qrp12 # Total output, RMPE from player 1's belief
qrp2 = qrp21 + qrp22 # Total output, RMPE from player 2's belief
prp1 = a0 - a1 * qrp1 # Price, RMPE from player 1's belief
prp2 = a0 - a1 * qrp2 # Price, RMPE from player 2's belief

The following code prepares graphs that compare market-wide output 𝑞1𝑡 + 𝑞2𝑡 and the price
of the good 𝑝𝑡 under equilibrium decision rules 𝐹𝑖 , 𝑖 = 1, 2 from an ordinary Markov perfect
equilibrium and the decision rules under a Markov perfect equilibrium with robust firms with
multiplier parameters 𝜃𝑖 , 𝑖 = 1, 2 set as described above

Both industry output and price are simulated under the transition dynamics associated with the
baseline model; only the decision rules 𝐹𝑖 differ across the two equilibrium objects presented

In [7]: fig, axes = plt.subplots(2, 1, figsize=(9, 9))

ax = axes[0]
ax.plot(q, 'g-', lw=2, alpha=0.75, label='MPE output')
ax.plot(qr, 'm-', lw=2, alpha=0.75, label='RMPE output')
ax.set(ylabel="output", xlabel="time", ylim=(2, 4))
ax.legend(loc='upper left', frameon=0)

ax = axes[1]
ax.plot(p, 'g-', lw=2, alpha=0.75, label='MPE price')
ax.plot(pr, 'm-', lw=2, alpha=0.75, label='RMPE price')
ax.set(ylabel="price", xlabel="time")
ax.legend(loc='upper right', frameon=0)
plt.show()

Under the dynamics associated with the baseline model, the price path is higher with the
Markov perfect equilibrium robust decision rules than it is with decision rules for the ordinary
Markov perfect equilibrium
Because 𝑝𝑡 = 𝑎0 − 𝑎1 (𝑞1𝑡 + 𝑞2𝑡 ) with 𝑎1 > 0, the industry output path is correspondingly lower

To dig a little beneath the forces driving these outcomes, we want to plot 𝑞1𝑡 and 𝑞2𝑡 in the
Markov perfect equilibrium with robust firms and to compare them with corresponding objects
in the Markov perfect equilibrium without robust firms

In [8]: fig, axes = plt.subplots(2, 1, figsize=(9, 9))

ax = axes[0]
ax.plot(q1, 'g-', lw=2, alpha=0.75, label='firm 1 MPE output')
ax.plot(qr1, 'b-', lw=2, alpha=0.75, label='firm 1 RMPE output')
ax.set(ylabel="output", xlabel="time", ylim=(1, 2))
ax.legend(loc='upper left', frameon=0)

ax = axes[1]
ax.plot(q2, 'g-', lw=2, alpha=0.75, label='firm 2 MPE output')
ax.plot(qr2, 'r-', lw=2, alpha=0.75, label='firm 2 RMPE output')
ax.set(ylabel="output", xlabel="time", ylim=(1, 2))
ax.legend(loc='upper left', frameon=0)
plt.show()

Evidently, firm 1’s output path is substantially lower when firms are robust firms while firm
2’s output path is virtually the same as it would be in an ordinary Markov perfect equilibrium
with no robust firms

Recall that we have set 𝜃1 = .02 and 𝜃2 = .04, so that firm 1 fears misspecification of the
baseline model substantially more than does firm 2

• but also please notice that firm 2’s behavior in the Markov perfect equilibrium with
robust firms responds to the decision rule 𝐹1 𝑥𝑡 employed by firm 1
• thus it is something of a coincidence that its output is almost the same in the two
equilibria

Larger concerns about misspecification induce firm 1 to be more cautious than firm 2 in
predicting market price and the output of the other firm
To explore this, we study next how ex-post the two firms’ beliefs about state dynamics differ
in the Markov perfect equilibrium with robust firms
(by ex-post we mean after extremization of each firm’s intertemporal objective)
Heterogeneous Beliefs
As before, let $A^o = A - B_1 F_1^r - B_2 F_2^r$, where in a robust MPE, $F_i^r$ is a robust
decision rule for firm 𝑖
Worst-case forecasts of 𝑥𝑡 starting from 𝑡 = 0 differ between the two firms
This means that worst-case forecasts of industry output 𝑞1𝑡 + 𝑞2𝑡 and price 𝑝𝑡 also differ be-
tween the two firms
To find these worst-case beliefs, we compute the following three “closed-loop” transition
matrices

• $A^o$
• $A^o + C K_1$
• $A^o + C K_2$

We call the first transition law, namely, $A^o$, the baseline transition under firms’ robust
decision rules
We call the second and third the worst-case transitions under robust decision rules for firms 1
and 2, respectively
From {𝑥𝑡 } paths generated by each of these transition laws, we pull off the associated price
and total output sequences
The following code prints the three transition matrices and then plots the sequences

In [9]: print('Baseline Robust transition matrix AO is: \n', np.round(AO, 3))


print('Player 1\'s worst-case transition matrix AOCK1 is: \n', np.round(AOCK1, 3))
print('Player 2\'s worst-case transition matrix AOCK2 is: \n', np.round(AOCK2, 3))

Baseline Robust transition matrix AO is:


[[ 1. 0. 0. ]
[ 0.666 0.682 -0.074]
[ 0.671 -0.071 0.694]]
Player 1's worst-case transition matrix AOCK1 is:
[[ 0.998 0.002 0. ]
[ 0.664 0.685 -0.074]
[ 0.669 -0.069 0.694]]
Player 2's worst-case transition matrix AOCK2 is:
[[ 0.999 0. 0.001]
[ 0.665 0.683 -0.073]
[ 0.67 -0.071 0.695]]

In [10]: # == Plot == #
fig, axes = plt.subplots(2, 1, figsize=(9, 9))

ax = axes[0]
ax.plot(qrp1, 'b--', lw=2, alpha=0.75, label='RMPE worst-case belief output player 1')
ax.plot(qrp2, 'r:', lw=2, alpha=0.75, label='RMPE worst-case belief output player 2')
ax.plot(qr, 'm-', lw=2, alpha=0.75, label='RMPE output')
ax.set(ylabel="output", xlabel="time", ylim=(2, 4))
ax.legend(loc='upper left', frameon=0)

ax = axes[1]
ax.plot(prp1, 'b--', lw=2, alpha=0.75, label='RMPE worst-case belief price player 1')
ax.plot(prp2, 'r:', lw=2, alpha=0.75, label='RMPE worst-case belief price player 2')
ax.plot(pr, 'm-', lw=2, alpha=0.75, label='RMPE price')
ax.set(ylabel="price", xlabel="time")
ax.legend(loc='upper right', frameon=0)
plt.show()

We see from the above graph that under robustness concerns, player 1 and player 2 have
heterogeneous beliefs about total output and the goods price even though they share the same
baseline model and information

• firm 1 thinks that total output will be higher and price lower than does firm 2
• this leads firm 1 to produce less than firm 2

These beliefs justify (or rationalize) the Markov perfect equilibrium robust decision rules
This means that the robust rules are the unique optimal rules (or best responses) to the
indicated worst-case transition dynamics
([52] discuss how this property of robust decision rules is connected to the concept of
admissibility in Bayesian statistical decision theory)
53

Uncertainty Traps

53.1 Contents

• Overview 53.2

• The Model 53.3

• Implementation 53.4

• Results 53.5

• Exercises 53.6

• Solutions 53.7

53.2 Overview

In this lecture, we study a simplified version of an uncertainty traps model of Fajgelbaum,
Schaal and Taschereau-Dumouchel [42]
The model features self-reinforcing uncertainty that has big impacts on economic activity
In the model,

• Fundamentals vary stochastically and are not fully observable
• At any moment there are both active and inactive entrepreneurs; only active
entrepreneurs produce
• Agents – active and inactive entrepreneurs – have beliefs about the fundamentals
expressed as probability distributions
• Greater uncertainty means greater dispersions of these distributions
• Entrepreneurs are risk-averse and hence less inclined to be active when uncertainty is
high
• The output of active entrepreneurs is observable, supplying a noisy signal that helps
everyone inside the model infer fundamentals
• Entrepreneurs update their beliefs about fundamentals using Bayes’ Law, implemented
via Kalman filtering

Uncertainty traps emerge because:


• High uncertainty discourages entrepreneurs from becoming active


• A low level of participation – i.e., a smaller number of active entrepreneurs – diminishes
the flow of information about fundamentals
• Less information translates to higher uncertainty, further discouraging entrepreneurs
from choosing to be active, and so on

Uncertainty traps stem from a positive externality: high aggregate economic activity levels
generate valuable information

53.3 The Model

The original model described in [42] has many interesting moving parts
Here we examine a simplified version that nonetheless captures many of the key ideas

53.3.1 Fundamentals

The evolution of the fundamental process {𝜃𝑡 } is given by

𝜃𝑡+1 = 𝜌𝜃𝑡 + 𝜎𝜃 𝑤𝑡+1

where

• 𝜎𝜃 > 0 and 0 < 𝜌 < 1


• {𝑤𝑡 } is IID and standard normal

The random variable 𝜃𝑡 is not observable at any time
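
As a small illustration of this law of motion, the sketch below simulates one path of the fundamental. The function name `simulate_θ`, the horizon `T` and the seed are assumptions; the values of 𝜌 and 𝜎𝜃 match those used for the figures later in the lecture.

```python
import numpy as np

def simulate_θ(ρ=0.99, σ_θ=0.5, θ_init=0.0, T=200, seed=0):
    """Simulate θ_{t+1} = ρ θ_t + σ_θ w_{t+1} with IID N(0, 1) shocks."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(T - 1)        # shocks w_1, ..., w_{T-1}
    θ = np.empty(T)
    θ[0] = θ_init
    for t in range(T - 1):
        θ[t + 1] = ρ * θ[t] + σ_θ * w[t]
    return θ, w

θ_path, shocks = simulate_θ()

# Stationary variance of an AR(1) process: σ_θ² / (1 - ρ²)
stationary_var = 0.5**2 / (1 - 0.99**2)
```

With 𝜌 close to one the process is highly persistent, so its stationary variance is much larger than the variance of a single shock.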

53.3.2 Output

There is a total 𝑀̄ of risk-averse entrepreneurs


Output of the 𝑚-th entrepreneur, conditional on being active in the market at time 𝑡, is equal
to

$$
x_m = \theta + \epsilon_m \quad \text{where} \quad \epsilon_m \sim N(0, \gamma_x^{-1}) \tag{1}
$$

Here the time subscript has been dropped to simplify notation


The inverse of the shock variance, 𝛾𝑥 , is called the shock’s precision
The higher is the precision, the more informative 𝑥𝑚 is about the fundamental
Output shocks are independent across time and firms

53.3.3 Information and Beliefs

All entrepreneurs start with identical beliefs about 𝜃0


Signals are publicly observable and hence all agents have identical beliefs always

Dropping time subscripts, beliefs for current 𝜃 are represented by the normal distribution
$N(\mu, \gamma^{-1})$
Here 𝛾 is the precision of beliefs; its inverse is the degree of uncertainty
These parameters are updated by Kalman filtering
Let

• M ⊂ {1, … , 𝑀̄ } denote the set of currently active firms
• 𝑀 ∶= |M| denote the number of currently active firms
• 𝑋 be the average output (1/𝑀) ∑𝑚∈M 𝑥𝑚 of the active firms

With this notation and primes for next period values, we can write the updating of the mean
and precision via

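$$
\mu' = \rho \frac{\gamma \mu + M \gamma_x X}{\gamma + M \gamma_x} \tag{2}
$$

$$
\gamma' = \left( \frac{\rho^2}{\gamma + M \gamma_x} + \sigma_\theta^2 \right)^{-1} \tag{3}
$$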

These are standard Kalman filtering results applied to the current setting
Exercise 1 provides more details on how Eq. (2) and Eq. (3) are derived and then asks you to
fill in remaining steps
The next figure plots the law of motion for the precision in Eq. (3) as a 45 degree diagram,
with one curve for each 𝑀 ∈ {0, … , 6}
The other parameter values are 𝜌 = 0.99, 𝛾𝑥 = 0.5, 𝜎𝜃 = 0.5

Points where the curves hit the 45 degree line are long-run steady states for precision for
different values of 𝑀
Thus, if one of these values for 𝑀 remains fixed, a corresponding steady state is the
equilibrium level of precision

• high values of 𝑀 correspond to greater information about the fundamental, and hence
more precision in steady state
• low values of 𝑀 correspond to less information and more uncertainty in steady state
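
The steady states described above can be approximated by iterating Eq. (3) with 𝑀 held fixed. The sketch below does this for each 𝑀 ∈ {0, … , 6} using the parameter values quoted for the figure; the function names `next_precision` and `steady_state_precision`, the starting value and the tolerance are assumptions made for illustration.

```python
def next_precision(γ, M, ρ=0.99, γ_x=0.5, σ_θ=0.5):
    """One application of the law of motion for precision, Eq. (3)."""
    return 1 / (ρ**2 / (γ + M * γ_x) + σ_θ**2)

def steady_state_precision(M, γ0=1.0, tol=1e-12, max_iter=10_000):
    """Iterate Eq. (3) with M held fixed until successive values agree."""
    γ = γ0
    for _ in range(max_iter):
        γ_new = next_precision(γ, M)
        if abs(γ_new - γ) < tol:
            return γ_new
        γ = γ_new
    raise RuntimeError("iteration on Eq. (3) did not converge")

# Steady-state precision for each fixed number of active firms
steady_states = {M: steady_state_precision(M) for M in range(7)}

for M, γ_ss in steady_states.items():
    print(f"M = {M}: steady-state precision ≈ {γ_ss:.4f}")
```

Setting 𝛾′ = 𝛾 in Eq. (3) also gives a quadratic equation in 𝛾, so the iteration can be checked against the positive root of 𝜎𝜃² 𝛾² + (𝜌² + 𝜎𝜃² 𝑀𝛾𝑥 − 1)𝛾 − 𝑀𝛾𝑥 = 0.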

In practice, as we’ll see, the number of active firms fluctuates stochastically

53.3.4 Participation

Omitting time subscripts once more, entrepreneurs enter the market in the current period if

E[𝑢(𝑥𝑚 − 𝐹𝑚 )] > 𝑐 (4)

Here

• the mathematical expectation of 𝑥𝑚 is based on Eq. (1) and beliefs $N(\mu, \gamma^{-1})$ for 𝜃
• 𝐹𝑚 is a stochastic but pre-visible fixed cost, independent across time and firms
• 𝑐 is a constant reflecting opportunity costs

The statement that 𝐹𝑚 is pre-visible means that it is realized at the start of the period and
treated as a constant in Eq. (4)
The utility function has the constant absolute risk aversion form

$$
u(x) = \frac{1}{a} (1 - \exp(-a x)) \tag{5}
$$

where 𝑎 is a positive parameter
Combining Eq. (4) and Eq. (5), entrepreneur 𝑚 participates in the market (or is said to be
active) when

$$
\frac{1}{a} \left\{ 1 - \mathbb{E}[\exp(-a(\theta + \epsilon_m - F_m))] \right\} > c
$$

Using standard formulas for expectations of lognormal random variables, this is equivalent to
the condition

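$$
\psi(\mu, \gamma, F_m) := \frac{1}{a} \left( 1 - \exp\left( -a\mu + a F_m + \frac{a^2 \left( \frac{1}{\gamma} + \frac{1}{\gamma_x} \right)}{2} \right) \right) - c > 0 \tag{6}
$$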
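
As a hedged check on this step, the sketch below compares the closed form in Eq. (6) with a direct numerical evaluation of the expectation by Gauss-Hermite quadrature. It assumes, as the derivation implicitly does, that 𝜃 and 𝜖𝑚 are independent, so that 𝜃 + 𝜖𝑚 is normal with mean 𝜇 and variance 1/𝛾 + 1/𝛾𝑥. The function name `ψ_quadrature`, the number of nodes and the test values of 𝜇, 𝛾 and 𝐹𝑚 are assumptions for illustration.

```python
import numpy as np

def ψ(μ, γ, F, a=1.5, γ_x=0.5, c=-420):
    """Closed form from Eq. (6)."""
    return (1 / a) * (1 - np.exp(-a * μ + a * F
                                 + a**2 * (1 / γ + 1 / γ_x) / 2)) - c

def ψ_quadrature(μ, γ, F, a=1.5, γ_x=0.5, c=-420, n_nodes=40):
    """Same object, with the expectation computed by Gauss-Hermite quadrature."""
    # Nodes and weights for integrals against the weight exp(-z² / 2)
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    s = np.sqrt(1 / γ + 1 / γ_x)          # std dev of θ + ε_m under beliefs
    y = μ + s * nodes                     # values of θ + ε_m at the nodes
    expected = weights @ np.exp(-a * (y - F)) / np.sqrt(2 * np.pi)
    return (1 / a) * (1 - expected) - c

closed_form = ψ(0.0, 4.0, 1.0)
numerical = ψ_quadrature(0.0, 4.0, 1.0)
```

The two values should agree up to quadrature error, and a positive value means that the entrepreneur with fixed cost 𝐹𝑚 chooses to be active.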

53.4 Implementation

We want to simulate this economy


As a first step, let’s put together a class that bundles

• the parameters, the current value of 𝜃 and the current values of the two belief
parameters 𝜇 and 𝛾
• methods to update 𝜃, 𝜇 and 𝛾, as well as to determine the number of active firms and
their outputs

The updating methods follow the laws of motion for 𝜃, 𝜇 and 𝛾 given above
The method to evaluate the number of active firms generates 𝐹1 , … , 𝐹𝑀̄ and tests condition
Eq. (6) for each firm
The __init__ method encodes as default values the parameters we’ll use in the simulations below

In [1]: import numpy as np

class UncertaintyTrapEcon:

    def __init__(self,
                 a=1.5,          # Risk aversion
                 γ_x=0.5,        # Production shock precision
                 ρ=0.99,         # Correlation coefficient for θ
                 σ_θ=0.5,        # Standard dev of θ shock
                 num_firms=100,  # Number of firms
                 σ_F=1.5,        # Standard dev of fixed costs
                 c=-420,         # External opportunity cost
                 μ_init=0,       # Initial value for μ
                 γ_init=4,       # Initial value for γ
                 θ_init=0):      # Initial value for θ

        # == Record values == #
        self.a, self.γ_x, self.ρ, self.σ_θ = a, γ_x, ρ, σ_θ
        self.num_firms, self.σ_F, self.c = num_firms, σ_F, c
        self.σ_x = np.sqrt(1/γ_x)

        # == Initialize states == #
        self.γ, self.μ, self.θ = γ_init, μ_init, θ_init

    def ψ(self, F):
        # Participation condition, Eq. (6), evaluated at fixed cost(s) F
        temp1 = -self.a * (self.μ - F)
        temp2 = self.a**2 * (1/self.γ + 1/self.γ_x) / 2
        return (1 / self.a) * (1 - np.exp(temp1 + temp2)) - self.c

    def update_beliefs(self, X, M):
        """
        Update beliefs (μ, γ) based on aggregates X and M.
        """
        # Simplify names
        γ_x, ρ, σ_θ = self.γ_x, self.ρ, self.σ_θ
        # Update μ
        temp1 = ρ * (self.γ * self.μ + M * γ_x * X)
        temp2 = self.γ + M * γ_x
        self.μ = temp1 / temp2
        # Update γ
        self.γ = 1 / (ρ**2 / (self.γ + M * γ_x) + σ_θ**2)

    def update_θ(self, w):
        """
        Update the fundamental state θ given shock w.
        """
        self.θ = self.ρ * self.θ + self.σ_θ * w

    def gen_aggregates(self):
        """
        Generate aggregates based on current beliefs (μ, γ). This
        is a simulation step that depends on the draws for F.
        """
        F_vals = self.σ_F * np.random.randn(self.num_firms)
        M = np.sum(self.ψ(F_vals) > 0)  # Counts number of active firms
        if M > 0:
            x_vals = self.θ + self.σ_x * np.random.randn(M)
            X = x_vals.mean()
        else:
            X = 0
        return X, M

In the results below we use this code to simulate time series for the major variables

53.5 Results

Let’s look first at the dynamics of 𝜇, which the agents use to track 𝜃

We see that 𝜇 tracks 𝜃 well when there are sufficient firms in the market
However, there are times when 𝜇 tracks 𝜃 poorly due to insufficient information
These are episodes where the uncertainty traps take hold
During these episodes

• precision is low and uncertainty is high


• few firms are in the market

To get a clearer idea of the dynamics, let’s look at all the main time series at once, for a given
set of shocks

Notice how the traps only take hold after a sequence of bad draws for the fundamental
Thus, the model gives us a propagation mechanism that maps bad random draws into long
downturns in economic activity

53.6 Exercises

53.6.1 Exercise 1

Fill in the details behind Eq. (2) and Eq. (3) based on the following standard result (see, e.g.,
p. 24 of [136])
Fact Let x = (𝑥1 , … , 𝑥𝑀 ) be a vector of IID draws from common distribution 𝑁 (𝜃, 1/𝛾𝑥 ) and
let 𝑥̄ be the sample mean. If 𝛾𝑥 is known and the prior for 𝜃 is 𝑁 (𝜇, 1/𝛾), then the posterior
distribution of 𝜃 given x is

𝜋(𝜃 | x) = 𝑁 (𝜇0 , 1/𝛾0 )

where

$$
\mu_0 = \frac{\mu \gamma + M \bar{x} \gamma_x}{\gamma + M \gamma_x}
\quad \text{and} \quad
\gamma_0 = \gamma + M \gamma_x
$$

53.6.2 Exercise 2

Modulo randomness, replicate the simulation figures shown above

• Use the parameter values listed as defaults in the __init__ method of the
UncertaintyTrapEcon class

53.7 Solutions
In [2]: import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import itertools

53.7.1 Exercise 1

This exercise asked you to validate the laws of motion for 𝛾 and 𝜇 given in the lecture, based
on the stated result about Bayesian updating in a scalar Gaussian setting. The stated result
tells us that after observing average output 𝑋 of the 𝑀 firms, our posterior beliefs will be

𝑁 (𝜇0 , 1/𝛾0 )

where

$$
\mu_0 = \frac{\mu \gamma + M X \gamma_x}{\gamma + M \gamma_x}
\quad \text{and} \quad
\gamma_0 = \gamma + M \gamma_x
$$

If we take a random variable 𝜃 with this distribution and then evaluate the distribution of
𝜌𝜃 + 𝜎𝜃 𝑤 where 𝑤 is independent and standard normal, we get the expressions for 𝜇′ and 𝛾 ′
given in the lecture
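As a numerical cross-check of this argument, the two steps (Bayesian update, then the shift through 𝜌𝜃 + 𝜎𝜃 𝑤) can be written out separately and compared with the one-line formulas used in update_beliefs. This is only a minimal sketch: the function name next_beliefs, the ASCII parameter names and the inputs passed to it are illustrative assumptions, not part of the lecture code

```python
def next_beliefs(mu, gamma, X, M, gamma_x=0.5, rho=0.99, sigma_theta=0.5):
    """
    Return (mu', gamma') given the prior N(mu, 1/gamma), the average
    output X of M active firms, and the model parameters.
    """
    # Step 1: posterior N(mu_0, 1/gamma_0) from the stated Gaussian result
    gamma_0 = gamma + M * gamma_x
    mu_0 = (mu * gamma + M * X * gamma_x) / gamma_0

    # Step 2: distribution of rho * theta + sigma_theta * w, w ~ N(0, 1).
    # The mean scales by rho; the variance is rho^2 / gamma_0 + sigma_theta^2.
    mu_next = rho * mu_0
    gamma_next = 1 / (rho**2 / gamma_0 + sigma_theta**2)
    return mu_next, gamma_next


# Illustrative inputs (assumed values, chosen only for the check)
mu_next, gamma_next = next_beliefs(mu=0.0, gamma=4.0, X=1.0, M=2)
print(mu_next, gamma_next)
```

With 𝑀 = 0 the first step leaves the prior unchanged, so only the second step moves beliefs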

53.7.2 Exercise 2

First, let’s replicate the plot that illustrates the law of motion for precision, which is

$$
\gamma_{t+1} = \left( \frac{\rho^2}{\gamma_t + M \gamma_x} + \sigma_\theta^2 \right)^{-1}
$$

Here 𝑀 is the number of active firms. The next figure plots 𝛾𝑡+1 against 𝛾𝑡 on a 45 degree
diagram for different values of 𝑀

In [3]: econ = UncertaintyTrapEcon()

ρ, σ_θ, γ_x = econ.ρ, econ.σ_θ, econ.γ_x    # simplify names
γ = np.linspace(1e-10, 3, 200)              # γ grid
fig, ax = plt.subplots(figsize=(9, 9))
ax.plot(γ, γ, 'k-')                         # 45 degree line

for M in range(7):
    γ_next = 1 / (ρ**2 / (γ + M * γ_x) + σ_θ**2)
    label_string = f"$M = {M}$"
    ax.plot(γ, γ_next, lw=2, label=label_string)
ax.legend(loc='lower right', fontsize=14)
ax.set_xlabel(r'$\gamma$', fontsize=16)
ax.set_ylabel(r"$\gamma'$", fontsize=16)
ax.grid()
plt.show()

The points where the curves hit the 45 degree line are the long-run steady states correspond-
ing to each 𝑀 , if that value of 𝑀 were to remain fixed. As the number of firms falls, so does
the long-run steady state of precision
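The steady states can also be computed directly by iterating the law of motion for precision until it stops changing. The sketch below does this for each 𝑀 using the default parameter values of the class. The function name steady_state_precision, the starting value and the tolerance are assumptions made for illustration

```python
def steady_state_precision(M, gamma_x=0.5, rho=0.99, sigma_theta=0.5,
                           gamma_init=1.0, tol=1e-12, max_iter=10_000):
    """
    Iterate gamma' = 1 / (rho^2 / (gamma + M * gamma_x) + sigma_theta^2)
    from gamma_init until successive values differ by less than tol.
    """
    gamma = gamma_init
    for _ in range(max_iter):
        gamma_next = 1 / (rho**2 / (gamma + M * gamma_x) + sigma_theta**2)
        if abs(gamma_next - gamma) < tol:
            return gamma_next
        gamma = gamma_next
    return gamma


steady_states = [steady_state_precision(M) for M in range(7)]
for M, g in enumerate(steady_states):
    print(f"M = {M}: steady state precision = {g:.4f}")
```

For 𝑀 = 0 the fixed point can be found by hand: setting 𝛾′ = 𝛾 gives 𝛾 = (1 − 𝜌²)/𝜎𝜃², which is a useful check on the iteration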
Next let’s generate time series for beliefs and the aggregates – that is, the number of active
firms and average output

In [4]: sim_length = 2000

μ_vec = np.empty(sim_length)
θ_vec = np.empty(sim_length)
γ_vec = np.empty(sim_length)
X_vec = np.empty(sim_length)
M_vec = np.empty(sim_length)

μ_vec[0] = econ.μ
γ_vec[0] = econ.γ
θ_vec[0] = 0

w_shocks = np.random.randn(sim_length)

for t in range(sim_length-1):
    X, M = econ.gen_aggregates()
    X_vec[t] = X
    M_vec[t] = M

    econ.update_beliefs(X, M)
    econ.update_θ(w_shocks[t])

    μ_vec[t+1] = econ.μ
    γ_vec[t+1] = econ.γ
    θ_vec[t+1] = econ.θ

# Record final values of aggregates
X, M = econ.gen_aggregates()
X_vec[-1] = X
M_vec[-1] = M

First, let’s see how well 𝜇 tracks 𝜃 in these simulations

In [5]: fig, ax = plt.subplots(figsize=(9, 6))


ax.plot(range(sim_length), θ_vec, alpha=0.6, lw=2, label=r"$\theta$")
ax.plot(range(sim_length), μ_vec, alpha=0.6, lw=2, label=r"$\mu$")
ax.legend(fontsize=16)
ax.grid()
plt.show()

Now let’s plot the whole thing together

In [6]: fig, axes = plt.subplots(4, 1, figsize=(12, 20))

# Add some spacing
fig.subplots_adjust(hspace=0.3)

series = (θ_vec, μ_vec, γ_vec, M_vec)
names = r'$\theta$', r'$\mu$', r'$\gamma$', r'$M$'

for ax, vals, name in zip(axes, series, names):
    # determine suitable y limits
    s_max, s_min = max(vals), min(vals)
    s_range = s_max - s_min
    y_max = s_max + s_range * 0.1
    y_min = s_min - s_range * 0.1
    ax.set_ylim(y_min, y_max)
    # Plot series
    ax.plot(range(sim_length), vals, alpha=0.6, lw=2)
    ax.set_title(f"time series for {name}", fontsize=16)
    ax.grid()

plt.show()

If you run the code above you’ll get different plots, of course

Try experimenting with different parameters to see the effects on the time series
(It would also be interesting to experiment with non-Gaussian distributions for the shocks,
but this is a big exercise since it takes us outside the world of the standard Kalman filter)
54

The Aiyagari Model

54.1 Contents

• Overview 54.2

• The Economy 54.3

• Firms 54.4

• Code 54.5

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

54.2 Overview

In this lecture, we describe the structure of a class of models that build on work by Truman
Bewley [15]
We begin by discussing an example of a Bewley model due to Rao Aiyagari
The model features

• Heterogeneous agents
• A single exogenous vehicle for borrowing and lending
• Limits on amounts individual agents may borrow

The Aiyagari model has been used to investigate many topics, including

• precautionary savings and the effect of liquidity constraints [4]


• risk sharing and asset pricing [63]
• the shape of the wealth distribution [12]
• etc., etc., etc.


54.2.1 References

The primary reference for this lecture is [4]


A textbook treatment is available in chapter 18 of [87]
A continuous time version of the model by SeHyoun Ahn and Benjamin Moll can be found
here

54.3 The Economy

54.3.1 Households

Infinitely lived households / consumers face idiosyncratic income shocks


A unit interval of ex-ante identical households face a common borrowing constraint
The savings problem faced by a typical household is


$$
\max \mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t)
$$

subject to

$$
a_{t+1} + c_t \leq w z_t + (1 + r) a_t,
\quad c_t \geq 0,
\quad \text{and} \quad a_t \geq -B
$$

where

• 𝑐𝑡 is current consumption
• 𝑎𝑡 is assets
• 𝑧𝑡 is an exogenous component of labor income capturing stochastic unemployment risk,
etc.
• 𝑤 is a wage rate
• 𝑟 is a net interest rate
• 𝐵 is the maximum amount that the agent is allowed to borrow

The exogenous process {𝑧𝑡 } follows a finite state Markov chain with given stochastic matrix
𝑃
The wage and interest rate are fixed over time
In this simple version of the model, households supply labor inelastically because they do not
value leisure
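To get a feel for the shock process, it helps to compute the stationary distribution of the Markov chain and the implied long-run average of 𝑧𝑡. The sketch below uses the two-state matrix and state values that appear as defaults in the Household class later in this lecture. The function name stationary_distribution and the least-squares approach are illustrative choices, not the method used by the lecture code

```python
import numpy as np


def stationary_distribution(P):
    """
    Solve psi @ P = psi with psi summing to one, for a stochastic matrix P
    that has a unique stationary distribution.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # Stack (I - P') psi = 0 on top of the normalization sum(psi) = 1
    lhs = np.vstack([np.eye(n) - P.T, np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    psi, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return psi


P = np.array([[0.9, 0.1],
              [0.1, 0.9]])       # default Markov matrix in Household
z_vals = np.array([0.1, 1.0])    # default exogenous states in Household

psi = stationary_distribution(P)
mean_z = psi @ z_vals            # long-run average labor endowment
print(psi, mean_z)
```

Because this matrix is symmetric, the chain spends half of its time in each state in the long run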

54.4 Firms

Firms produce output by hiring capital and labor


Firms act competitively and face constant returns to scale
Since returns to scale are constant the number of firms does not matter

Hence we can consider a single (but nonetheless competitive) representative firm


The firm’s output is

$$
Y_t = A K_t^{\alpha} N^{1 - \alpha}
$$

where

• 𝐴 and 𝛼 are parameters with 𝐴 > 0 and 𝛼 ∈ (0, 1)


• 𝐾𝑡 is aggregate capital
• 𝑁 is total labor supply (which is constant in this simple version of the model)

The firm’s problem is

$$
\max_{K, N} \left\{ A K_t^{\alpha} N^{1 - \alpha} - (r + \delta) K - w N \right\}
$$

The parameter 𝛿 is the depreciation rate


From the first-order condition with respect to capital, the firm’s inverse demand for capital is

$$
r = A \alpha \left( \frac{N}{K} \right)^{1 - \alpha} - \delta \tag{1}
$$

Using this expression and the firm’s first-order condition for labor, we can pin down the equi-
librium wage rate as a function of 𝑟 as

$$
w(r) = A (1 - \alpha) \left( A \alpha / (r + \delta) \right)^{\alpha / (1 - \alpha)} \tag{2}
$$
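The step from Eq. (1) to Eq. (2) can be verified numerically. Solving Eq. (1) for 𝐾/𝑁 and substituting into the labor first-order condition 𝑤 = 𝐴(1 − 𝛼)(𝐾/𝑁)^𝛼 should reproduce 𝑤(𝑟). The sketch below checks this at a few capital levels, using the parameter values adopted in the code later in this lecture. The function names and the test values of 𝐾 are assumptions made for illustration

```python
from math import isclose

A, N, alpha, delta = 1.0, 1.0, 0.33, 0.05


def r_from_K(K):
    "Eq. (1): interest rate from the first-order condition for capital."
    return A * alpha * (N / K)**(1 - alpha) - delta


def w_from_K(K):
    "Wage from the first-order condition for labor."
    return A * (1 - alpha) * (K / N)**alpha


def w_from_r(r):
    "Eq. (2): wage expressed as a function of the interest rate."
    return A * (1 - alpha) * (A * alpha / (r + delta))**(alpha / (1 - alpha))


checks = []
for K in (1.0, 5.0, 10.0):
    r = r_from_K(K)
    checks.append(isclose(w_from_K(K), w_from_r(r), rel_tol=1e-12))
    print(f"K = {K}: r = {r:.4f}, w = {w_from_r(r):.4f}")
```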

54.4.1 Equilibrium

We construct a stationary rational expectations equilibrium (SREE)


In such an equilibrium

• prices induce behavior that generates aggregate quantities consistent with the prices
• aggregate quantities and prices are constant over time

In more detail, an SREE lists a set of prices, savings and production policies such that

• households want to choose the specified savings policies taking the prices as given
• firms maximize profits taking the same prices as given
• the resulting aggregate quantities are consistent with the prices; in particular, the de-
mand for capital equals the supply
• aggregate quantities (defined as cross-sectional averages) are constant

In practice, once parameter values are set, we can check for an SREE by the following steps

1. pick a proposed quantity 𝐾 for aggregate capital



2. determine corresponding prices, with interest rate 𝑟 determined by Eq. (1) and a wage
rate 𝑤(𝑟) as given in Eq. (2)
3. determine the common optimal savings policy of the households given these prices
4. compute aggregate capital as the mean of steady state capital given this savings policy

If this final quantity agrees with 𝐾 then we have an SREE
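The four steps amount to searching for a fixed point in 𝐾. A minimal sketch of such a search by bisection is shown below. Steps 3 and 4 require solving the household problem, so here they are replaced by toy_capital_supply, a purely hypothetical closed-form stand-in that is not derived from the model. The bracketing interval and tolerance are also assumptions

```python
A, N, alpha, delta = 1.0, 1.0, 0.33, 0.05


def r_from_K(K):
    "Step 2a: interest rate from Eq. (1)."
    return A * alpha * (N / K)**(1 - alpha) - delta


def w_from_r(r):
    "Step 2b: wage from Eq. (2)."
    return A * (1 - alpha) * (A * alpha / (r + delta))**(alpha / (1 - alpha))


def toy_capital_supply(r, w):
    # HYPOTHETICAL stand-in for steps 3 and 4 (household policy and
    # stationary mean of assets); not derived from the model.
    return w * (2.0 + 200.0 * r)


def excess_supply(K):
    "Implied aggregate capital minus the proposed K."
    r = r_from_K(K)
    w = w_from_r(r)
    return toy_capital_supply(r, w) - K


def find_equilibrium_K(K_low=1.0, K_high=20.0, tol=1e-10):
    "Bisection on K; requires a sign change of excess_supply on the bracket."
    f_low = excess_supply(K_low)
    if f_low * excess_supply(K_high) > 0:
        raise ValueError("excess supply has the same sign at both ends")
    while K_high - K_low > tol:
        K_mid = 0.5 * (K_low + K_high)
        f_mid = excess_supply(K_mid)
        if f_low * f_mid <= 0:
            K_high = K_mid
        else:
            K_low, f_low = K_mid, f_mid
    return 0.5 * (K_low + K_high)


K_star = find_equilibrium_K()
print(K_star, r_from_K(K_star))
```

In an actual computation the stand-in would be replaced by a function that solves the household problem at the given prices and returns mean steady state assets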

54.5 Code

Let’s look at how we might compute such an equilibrium in practice


To solve the household’s dynamic programming problem we’ll use the DiscreteDP class from
QuantEcon.py
Our first task is the least exciting one: write code that maps parameters for a household
problem into the R and Q matrices needed to generate an instance of DiscreteDP
Below is a piece of boilerplate code that does just this
In reading the code, the following information will be helpful

• R needs to be a matrix where R[s, a] is the reward at state s under action a


• Q needs to be a three-dimensional array where Q[s, a, s'] is the probability of tran-
sitioning to state s' when the current state is s and the current action is a
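To see how arrays of this shape are used, here is a small value function iteration written directly in NumPy. It is a sketch of the idea only and is not how DiscreteDP is implemented. The two-state example, the function names and the tolerance are all assumptions made for illustration

```python
import numpy as np


def bellman_operator(v, R, Q, beta):
    """
    One step of the Bellman operator for rewards R[s, a] and
    transition probabilities Q[s, a, s'].
    """
    vals = R + beta * (Q @ v)        # Q @ v has shape (n_states, n_actions)
    return vals.max(axis=1), vals.argmax(axis=1)


def value_iteration(R, Q, beta, tol=1e-10, max_iter=10_000):
    v = np.zeros(R.shape[0])
    for _ in range(max_iter):
        v_new, sigma = bellman_operator(v, R, Q, beta)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, sigma
        v = v_new
    return v, sigma


# Toy problem: action a moves the system to state a with certainty
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
Q = np.zeros((2, 2, 2))
for s in range(2):
    for a in range(2):
        Q[s, a, a] = 1.0

v_star, sigma_star = value_iteration(R, Q, beta=0.9)
print(v_star, sigma_star)
```

In this toy problem the best plan is to move to state 1 and stay there, which gives values 18 and 20 in states 0 and 1 respectively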

(For a detailed discussion of DiscreteDP see this lecture)


Here we take the state to be 𝑠𝑡 ∶= (𝑎𝑡 , 𝑧𝑡 ), where 𝑎𝑡 is assets and 𝑧𝑡 is the shock
The action is the choice of next period asset level 𝑎𝑡+1
We use Numba to speed up the loops so we can update the matrices efficiently when the pa-
rameters change
The class also includes a default set of parameters that we’ll adopt unless otherwise specified
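The class flattens the pair of indices for (𝑎𝑡 , 𝑧𝑡 ) into a single state index, using the rule stated in its docstring. The short sketch below confirms that the rule and its inverse round-trip correctly on a small grid. The grid sizes are arbitrary values chosen for the check

```python
a_size, z_size = 3, 2        # small illustrative grid sizes
n = a_size * z_size

state_indices = []
round_trip_ok = True
for a_i in range(a_size):
    for z_i in range(z_size):
        s_i = a_i * z_size + z_i                  # forward map
        state_indices.append(s_i)
        # inverse map: integer division and remainder
        if (s_i // z_size, s_i % z_size) != (a_i, z_i):
            round_trip_ok = False

print(state_indices, round_trip_ok)
```

Every integer from 0 to n - 1 is hit exactly once, so the map is a valid enumeration of the state space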

In [2]: import numpy as np
from numba import jit

class Household:
    """
    This class takes the parameters that define a household asset accumulation
    problem and computes the corresponding reward and transition matrices R
    and Q required to generate an instance of DiscreteDP, and thereby solve
    for the optimal policy.

    Comments on indexing: We need to enumerate the state space S as a sequence
    S = {0, ..., n - 1}. To this end, (a_i, z_i) index pairs are mapped to s_i
    indices according to the rule

        s_i = a_i * z_size + z_i

    To invert this map, use

        a_i = s_i // z_size  (integer division)
        z_i = s_i % z_size

    """

    def __init__(self,

                 r=0.01,                      # interest rate
                 w=1.0,                       # wages
                 β=0.96,                      # discount factor
                 a_min=1e-10,
                 Π=[[0.9, 0.1], [0.1, 0.9]],  # Markov chain
                 z_vals=[0.1, 1.0],           # exogenous states
                 a_max=18,
                 a_size=200):

        # Store values, set up grids over a and z
        self.r, self.w, self.β = r, w, β
        self.a_min, self.a_max, self.a_size = a_min, a_max, a_size

        self.Π = np.asarray(Π)
        self.z_vals = np.asarray(z_vals)
        self.z_size = len(z_vals)

        self.a_vals = np.linspace(a_min, a_max, a_size)
        self.n = a_size * self.z_size

        # Build the array Q
        self.Q = np.zeros((self.n, a_size, self.n))
        self.build_Q()

        # Build the array R
        self.R = np.empty((self.n, a_size))
        self.build_R()

    def set_prices(self, r, w):
        """
        Use this method to reset prices. Calling the method will trigger a
        re-build of R.
        """
        self.r, self.w = r, w
        self.build_R()

    def build_Q(self):
        populate_Q(self.Q, self.a_size, self.z_size, self.Π)

    def build_R(self):
        self.R.fill(-np.inf)
        populate_R(self.R, self.a_size, self.z_size, self.a_vals,
                   self.z_vals, self.r, self.w)


# Do the hard work using JIT-ed functions

@jit(nopython=True)
def populate_R(R, a_size, z_size, a_vals, z_vals, r, w):
    n = a_size * z_size
    for s_i in range(n):
        a_i = s_i // z_size
        z_i = s_i % z_size
        a = a_vals[a_i]
        z = z_vals[z_i]
        for new_a_i in range(a_size):
            a_new = a_vals[new_a_i]
            c = w * z + (1 + r) * a - a_new
            if c > 0:
                R[s_i, new_a_i] = np.log(c)  # Utility

@jit(nopython=True)
def populate_Q(Q, a_size, z_size, Π):
    n = a_size * z_size
    for s_i in range(n):
        z_i = s_i % z_size
        for a_i in range(a_size):
            for next_z_i in range(z_size):
                Q[s_i, a_i, a_i * z_size + next_z_i] = Π[z_i, next_z_i]

@jit(nopython=True)
def asset_marginal(s_probs, a_size, z_size):
    a_probs = np.zeros(a_size)

    for a_i in range(a_size):
        for z_i in range(z_size):
            a_probs[a_i] += s_probs[a_i * z_size + z_i]
    return a_probs

As a first example of what we can do, let’s compute and plot an optimal accumulation policy
at fixed prices

In [3]: import quantecon as qe
import matplotlib.pyplot as plt
from quantecon.markov import DiscreteDP

# Example prices
r = 0.03
w = 0.956

# Create an instance of Household
am = Household(a_max=20, r=r, w=w)

# Use the instance to build a discrete dynamic program
am_ddp = DiscreteDP(am.R, am.Q, am.β)

# Solve using policy function iteration
results = am_ddp.solve(method='policy_iteration')

# Simplify names
z_size, a_size = am.z_size, am.a_size
z_vals, a_vals = am.z_vals, am.a_vals
n = a_size * z_size

# Get all optimal actions across the set of a indices with z fixed in each row
a_star = np.empty((z_size, a_size))
for s_i in range(n):
    a_i = s_i // z_size
    z_i = s_i % z_size
    a_star[z_i, a_i] = a_vals[results.sigma[s_i]]

fig, ax = plt.subplots(figsize=(9, 9))
ax.plot(a_vals, a_vals, 'k--')  # 45 degrees
for i in range(z_size):
    lb = f'$z = {z_vals[i]:.2}$'
    ax.plot(a_vals, a_star[i, :], lw=2, alpha=0.6, label=lb)
    ax.set_xlabel('current assets')
    ax.set_ylabel('next period assets')
ax.legend(loc='upper left')

plt.show()


The plot shows asset accumulation policies at different values of the exogenous state
Now we want to calculate the equilibrium
Let’s do this visually as a first pass
The following code draws aggregate supply and demand curves
The intersection gives equilibrium interest rates and capital

In [4]: A = 1.0
N = 1.0
α = 0.33
β = 0.96
δ = 0.05

def r_to_w(r):
    """
    Equilibrium wages associated with a given interest rate r.
    """
    return A * (1 - α) * (A * α / (r + δ))**(α / (1 - α))

def rd(K):
    """
    Inverse demand curve for capital. The interest rate associated with a
    given demand for capital K.
    """
    return A * α * (N / K)**(1 - α) - δ

def prices_to_capital_stock(am, r):
    """
    Map prices to the induced level of capital stock.

    Parameters:
    ----------

    am : Household
        An instance of an aiyagari_household.Household
    r : float
        The interest rate
    """
    w = r_to_w(r)
    am.set_prices(r, w)
    aiyagari_ddp = DiscreteDP(am.R, am.Q, β)
    # Compute the optimal policy
    results = aiyagari_ddp.solve(method='policy_iteration')
    # Compute the stationary distribution
    stationary_probs = results.mc.stationary_distributions[0]
    # Extract the marginal distribution for assets
    asset_probs = asset_marginal(stationary_probs, am.a_size, am.z_size)
    # Return K
    return np.sum(asset_probs * am.a_vals)

# Create an instance of Household
am = Household(a_max=20)

# Use the instance to build a discrete dynamic program
am_ddp = DiscreteDP(am.R, am.Q, am.β)

# Create a grid of r values at which to compute demand and supply of capital
num_points = 20
r_vals = np.linspace(0.005, 0.04, num_points)

# Compute supply of capital
k_vals = np.empty(num_points)
for i, r in enumerate(r_vals):
    k_vals[i] = prices_to_capital_stock(am, r)

# Plot against demand for capital by firms
fig, ax = plt.subplots(figsize=(11, 8))
ax.plot(k_vals, r_vals, lw=2, alpha=0.6, label='supply of capital')
ax.plot(k_vals, rd(k_vals), lw=2, alpha=0.6, label='demand for capital')
ax.grid()
ax.set_xlabel('capital')
ax.set_ylabel('interest rate')
ax.legend(loc='upper right')

plt.show()
55

Default Risk and Income Fluctuations

55.1 Contents

• Overview 55.2

• Structure 55.3

• Equilibrium 55.4

• Computation 55.5

• Results 55.6

• Exercises 55.7

• Solutions 55.8

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

55.2 Overview

This lecture computes versions of Arellano’s [8] model of sovereign default


The model describes interactions among default risk, output, and an equilibrium interest rate
that includes a premium for endogenous default risk
The decision maker is a government of a small open economy that borrows from risk-neutral
foreign creditors
The foreign lenders must be compensated for default risk
The government borrows and lends abroad in order to smooth the consumption of its citizens
The government repays its debt only if it wants to, but declining to pay has adverse conse-
quences


The interest rate on government debt adjusts in response to the state-dependent default
probability chosen by government
The model yields outcomes that help interpret sovereign default experiences, including

• countercyclical interest rates on sovereign debt


• countercyclical trade balances
• high volatility of consumption relative to output

Notably, long recessions caused by bad draws in the income process increase the government’s
incentive to default
This can lead to

• spikes in interest rates


• temporary losses of access to international credit markets
• large drops in output, consumption, and welfare
• large capital outflows during recessions

Such dynamics are consistent with experiences of many countries

55.3 Structure

In this section we describe the main features of the model

55.3.1 Output, Consumption and Debt

A small open economy is endowed with an exogenous stochastically fluctuating potential out-
put stream {𝑦𝑡 }
Potential output is realized only in periods in which the government honors its sovereign debt
The output good can be traded or consumed
The sequence {𝑦𝑡 } is described by a Markov process with stochastic density kernel 𝑝(𝑦, 𝑦′ )
Households within the country are identical and rank stochastic consumption streams accord-
ing to


$$
\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t) \tag{1}
$$

Here

• 0 < 𝛽 < 1 is a time discount factor


• 𝑢 is an increasing and strictly concave utility function

Consumption sequences enjoyed by households are affected by the government’s decision to
borrow or lend internationally
The government is benevolent in the sense that its aim is to maximize Eq. (1)

The government is the only domestic actor with access to foreign credit
Because households are averse to consumption fluctuations, the government will try to smooth
consumption by borrowing from (and lending to) foreign creditors

55.3.2 Asset Markets

The only credit instrument available to the government is a one-period bond traded in inter-
national credit markets
The bond market has the following features

• The bond matures in one period and is not state contingent

• A purchase of a bond with face value 𝐵′ is a claim to 𝐵′ units of the consumption good
next period

• To purchase 𝐵′ next period costs 𝑞𝐵′ now, or, what is equivalent

• For selling −𝐵′ units of next period goods the seller earns −𝑞𝐵′ of today’s goods

– if 𝐵′ < 0, then −𝑞𝐵′ units of the good are received in the current period, for a
promise to repay −𝐵′ units next period
– there is an equilibrium price function 𝑞(𝐵′ , 𝑦) that makes 𝑞 depend on both 𝐵′ and
𝑦

Earnings on the government portfolio are distributed (or, if negative, taxed) lump sum to
households
When the government is not excluded from financial markets, the one-period national budget
constraint is

𝑐 = 𝑦 + 𝐵 − 𝑞(𝐵′ , 𝑦)𝐵′ (2)

Here and below, a prime denotes a next period value or a claim maturing next period
To rule out Ponzi schemes, we also require that 𝐵 ≥ −𝑍 in every period

• 𝑍 is chosen to be sufficiently large that the constraint never binds in equilibrium

55.3.3 Financial Markets

Foreign creditors

• are risk neutral


• know the domestic output stochastic process {𝑦𝑡 } and observe 𝑦𝑡 , 𝑦𝑡−1 , … , at time 𝑡
• can borrow or lend without limit in an international credit market at a constant inter-
national interest rate 𝑟
• receive full payment if the government chooses to pay
• receive zero if the government defaults on its one-period debt due

When a government is expected to default next period with probability 𝛿, the expected value
of a promise to pay one unit of consumption next period is 1 − 𝛿
Therefore, the discounted expected value of a promise to pay one unit next period is

$$
q = \frac{1 - \delta}{1 + r} \tag{3}
$$
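Eq. (3) is simple enough to evaluate directly. The sketch below computes the bond price for several default probabilities and the interest rate implied by each price, using the international rate 𝑟 = 0.017 that appears as a default in the code later in this lecture. The function names and the probabilities tried are illustrative assumptions

```python
def bond_price(default_prob, r=0.017):
    "Eq. (3): price today of a promise to pay one unit next period."
    return (1 - default_prob) / (1 + r)


def implied_interest_rate(q):
    "Net interest rate paid by the borrower when the bond sells at price q."
    return 1 / q - 1


default_probs = (0.0, 0.05, 0.2)          # illustrative values
prices = [bond_price(d) for d in default_probs]
rates = [implied_interest_rate(q) for q in prices]

for d, q, i in zip(default_probs, prices, rates):
    print(f"default prob = {d:.2f}: q = {q:.4f}, implied rate = {i:.4f}")
```

With no default risk the implied rate equals 𝑟, and it rises as the default probability rises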

Next we turn to how the government in effect chooses the default probability 𝛿

55.3.4 Government’s Decisions

At each point in time 𝑡, the government chooses between

1. defaulting
2. meeting its current obligations and purchasing or selling an optimal quantity of one-
period sovereign debt

Defaulting means declining to repay all of its current obligations


If the government defaults in the current period, then consumption equals current output
But a sovereign default has two consequences:

1. Output immediately falls from 𝑦 to ℎ(𝑦), where 0 ≤ ℎ(𝑦) ≤ 𝑦

• it returns to 𝑦 only after the country regains access to international credit markets

2. The country loses access to foreign credit markets

55.3.5 Reentering International Credit Market

While in a state of default, the economy regains access to foreign credit in each subsequent
period with probability 𝜃
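Since re-entry occurs independently each period with probability 𝜃, the number of periods until access is regained follows a geometric distribution with mean 1/𝜃. The sketch below evaluates this for 𝜃 = 0.282, the default value in the code later in this lecture. The variable names and the horizons shown are illustrative assumptions

```python
theta = 0.282    # probability of regaining access in each period

# Mean of a geometric distribution on {1, 2, ...}
expected_exclusion = 1 / theta

# Cross-check the mean by direct summation of k * P(re-entry at period k)
approx_mean = sum(k * theta * (1 - theta)**(k - 1) for k in range(1, 500))

# Probability of still being excluded after k periods
still_excluded = {k: (1 - theta)**k for k in (1, 4, 8)}

print(expected_exclusion, approx_mean)
print(still_excluded)
```

Exclusion therefore lasts about three and a half periods on average at this value of 𝜃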

55.4 Equilibrium

Informally, an equilibrium is a sequence of interest rates on the government’s sovereign debt,
a stochastic sequence of government default decisions and an implied flow of household
consumption such that

1. Consumption and assets satisfy the national budget constraint


2. The government maximizes household utility taking into account

• the resource constraint


• the effect of its choices on the price of bonds
• consequences of defaulting now for future net output and future borrowing and lending
opportunities

3. The interest rate on the government’s debt includes a risk-premium sufficient to make
foreign creditors expect on average to earn the constant risk-free international interest
rate

To express these ideas more precisely, consider first the choices of the government, which

1. enters a period with initial assets 𝐵, or what is the same thing, initial debt to be repaid
now of −𝐵

2. observes current output 𝑦, and

3. chooses either

(a) to default, or

(b) to pay −𝐵 and set next period’s debt due to −𝐵′

In a recursive formulation,

• state variables for the government comprise the pair (𝐵, 𝑦)


• 𝑣(𝐵, 𝑦) is the optimum value of the government’s problem when at the beginning of a
period it faces the choice of whether to honor or default
• 𝑣𝑐 (𝐵, 𝑦) is the value of choosing to pay obligations falling due
• 𝑣𝑑 (𝑦) is the value of choosing to default

𝑣𝑑 (𝑦) does not depend on 𝐵 because, when access to credit is eventually regained, net foreign
assets equal 0
Expressed recursively, the value of defaulting is

$$
v_d(y) = u(h(y)) + \beta \int \left\{ \theta v(0, y') + (1 - \theta) v_d(y') \right\} p(y, y') \, dy'
$$

The value of paying is

$$
v_c(B, y) = \max_{B' \geq -Z}
\left\{ u(y - q(B', y) B' + B) + \beta \int v(B', y') p(y, y') \, dy' \right\}
$$

The three value functions are linked by

𝑣(𝐵, 𝑦) = max{𝑣𝑐 (𝐵, 𝑦), 𝑣𝑑 (𝑦)}

The government chooses to default when

𝑣𝑐 (𝐵, 𝑦) < 𝑣𝑑 (𝑦)

and hence given 𝐵′ the probability of default next period is

𝛿(𝐵′ , 𝑦) ∶= ∫ 1{𝑣𝑐 (𝐵′ , 𝑦′ ) < 𝑣𝑑 (𝑦′ )}𝑝(𝑦, 𝑦′ )𝑑𝑦′ (4)



Given zero profits for foreign creditors in equilibrium, we can combine Eq. (3) and Eq. (4) to
pin down the bond price function:

$$
q(B', y) = \frac{1 - \delta(B', y)}{1 + r} \tag{5}
$$
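On a discrete grid, the integral in Eq. (4) becomes a matrix product between the transition matrix for output and an indicator array of default states, and Eq. (5) then gives prices. The sketch below illustrates this on a tiny two-by-two example. The transition matrix and the values in Vc and Vd are made up for illustration and are not equilibrium objects

```python
import numpy as np

r = 0.017
Py = np.array([[0.8, 0.2],            # made-up transition matrix for output
               [0.3, 0.7]])

# Made-up value functions: rows index y, columns index B'
Vd = np.array([-1.0, -0.5])
Vc = np.array([[-2.0, -0.8],
               [-0.4, -0.2]])

# Indicator of default next period at each (y', B') pair
default_states = (Vc < Vd[:, None]).astype(float)

# Eq. (4): average the indicator over y' using the row of Py for current y
default_prob = Py @ default_states

# Eq. (5): bond price schedule
Q = (1 - default_prob) / (1 + r)

print(default_prob)
print(Q)
```

Here default occurs only in the low output state at the first value of 𝐵′, so the default probability at that 𝐵′ is just the probability of moving to the low output state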

55.4.1 Definition of Equilibrium

An equilibrium is

• a pricing function 𝑞(𝐵′ , 𝑦),


• a triple of value functions (𝑣𝑐 (𝐵, 𝑦), 𝑣𝑑 (𝑦), 𝑣(𝐵, 𝑦)),
• a decision rule telling the government when to default and when to pay as a function of
the state (𝐵, 𝑦), and
• an asset accumulation rule that, conditional on choosing not to default, maps (𝐵, 𝑦) into
𝐵′

such that

• The three Bellman equations for (𝑣𝑐 (𝐵, 𝑦), 𝑣𝑑 (𝑦), 𝑣(𝐵, 𝑦)) are satisfied
• Given the price function 𝑞(𝐵′ , 𝑦), the default decision rule and the asset accumulation
decision rule attain the optimal value function 𝑣(𝐵, 𝑦), and
• The price function 𝑞(𝐵′ , 𝑦) satisfies Eq. (5)

55.5 Computation

Let’s now compute an equilibrium of Arellano’s model


The equilibrium objects are the value function 𝑣(𝐵, 𝑦), the associated default decision rule,
and the pricing function 𝑞(𝐵′ , 𝑦)
We’ll use our code to replicate Arellano’s results
After that we’ll perform some additional simulations
The majority of the code below was written by Chase Coleman
It uses a slightly modified version of the algorithm recommended by Arellano

• The appendix to [8] recommends value function iteration until convergence, updating
the price, and then repeating
• Instead, we update the bond price at every value function iteration step

The second approach is faster and the two different procedures deliver very similar results
Here is a more detailed description of our algorithm:

1. Guess a value function 𝑣(𝐵, 𝑦) and price function 𝑞(𝐵′ , 𝑦)


2. At each pair (𝐵, 𝑦),

• update the value of defaulting 𝑣𝑑 (𝑦)



• update the value of continuing 𝑣𝑐 (𝐵, 𝑦)

3. Update the value function 𝑣(𝐵, 𝑦), the default rule, the implied ex ante default probabil-
ity, and the price function
4. Check for convergence. If converged, stop – if not, go to step 2

We use simple discretization on a grid of asset holdings and income levels


The output process is discretized using Tauchen’s quadrature method
Numba has been used in two places to speed up the code
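As an aside on the discretization step, the idea behind Tauchen's method can be sketched in a few lines. The function below is a simplified stand-alone version written for illustration. It is not the QuantEcon.py routine, and the name tauchen_sketch and its argument names are assumptions. It approximates a zero-mean Gaussian AR(1) process with persistence rho and shock standard deviation sigma on an evenly spaced grid of n points spanning m unconditional standard deviations on each side

```python
import numpy as np
from math import erf, sqrt


def std_norm_cdf(z):
    "Standard normal CDF."
    return 0.5 * (1 + erf(z / sqrt(2)))


def tauchen_sketch(rho, sigma, n=21, m=3):
    """
    Discretize x' = rho * x + eps, eps ~ N(0, sigma^2), returning the
    grid x and the transition matrix P.
    """
    sigma_x = sigma / sqrt(1 - rho**2)       # unconditional st. dev.
    x = np.linspace(-m * sigma_x, m * sigma_x, n)
    d = x[1] - x[0]                          # grid step
    P = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            upper = std_norm_cdf((x[j] - rho * x[i] + d / 2) / sigma)
            lower = std_norm_cdf((x[j] - rho * x[i] - d / 2) / sigma)
            if j == 0:
                P[i, j] = upper              # all mass below first midpoint
            elif j == n - 1:
                P[i, j] = 1 - lower          # all mass above last midpoint
            else:
                P[i, j] = upper - lower
    return x, P


# Persistence and shock size taken from the default parameters in this lecture
x, P = tauchen_sketch(rho=0.945, sigma=0.025, n=21, m=3)
y_grid = np.exp(x)                           # output levels
print(y_grid.min(), y_grid.max())
```

Each row of P sums to one by construction, because the interval probabilities telescope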

In [2]: """

Authors: Chase Coleman, John Stachurski

"""
import numpy as np
import random
import quantecon as qe
from numba import jit

class Arellano_Economy:
    """
    Arellano 2008 deals with a small open economy whose government
    invests in foreign assets in order to smooth the consumption of
    domestic households. Domestic households receive a stochastic
    path of income.

    Parameters
    ----------
    β : float
        Time discounting parameter
    γ : float
        Risk-aversion parameter
    r : float
        International lending rate
    ρ : float
        Persistence in the income process
    η : float
        Standard deviation of the income process
    θ : float
        Probability of re-entering financial markets in each period
    ny : int
        Number of points in y grid
    nB : int
        Number of points in B grid
    tol : float
        Error tolerance in iteration
    maxit : int
        Maximum number of iterations
    """

    def __init__(self,
                 β=.953,      # time discount factor
                 γ=2.,        # risk aversion
                 r=0.017,     # international interest rate
                 ρ=.945,      # persistence in output
                 η=0.025,     # st dev of output shock
                 θ=0.282,     # prob of regaining access
                 ny=21,       # number of points in y grid
                 nB=251,      # number of points in B grid
                 tol=1e-8,    # error tolerance in iteration
                 maxit=10000):

        # Save parameters
        self.β, self.γ, self.r = β, γ, r
        self.ρ, self.η, self.θ = ρ, η, θ

        self.ny, self.nB = ny, nB

        # Create grids and discretize Markov process
        self.Bgrid = np.linspace(-.45, .45, nB)
        self.mc = qe.markov.tauchen(ρ, η, 3, ny)
        self.ygrid = np.exp(self.mc.state_values)
        self.Py = self.mc.P

        # Output when in default
        ymean = np.mean(self.ygrid)
        self.def_y = np.minimum(0.969 * ymean, self.ygrid)

        # Allocate memory
        self.Vd = np.zeros(ny)
        self.Vc = np.zeros((ny, nB))
        self.V = np.zeros((ny, nB))
        self.Q = np.ones((ny, nB)) * .95  # Initial guess for prices
        self.default_prob = np.empty((ny, nB))

        # Compute the value functions, prices, and default prob
        self.solve(tol=tol, maxit=maxit)
        # Compute the optimal savings policy conditional on no default
        self.compute_savings_policy()

    def solve(self, tol=1e-8, maxit=10000):
        # Iteration Stuff
        it = 0
        dist = 10.

        # Alloc memory to store next iterate of value function
        V_upd = np.zeros((self.ny, self.nB))

        # == Main loop == #
        while dist > tol and maxit > it:

            # Compute expectations for this iteration
            Vs = self.V, self.Vd, self.Vc
            EV, EVd, EVc = (self.Py @ v for v in Vs)

            # Run inner loop to update value functions Vc and Vd.
            # Note that Vc and Vd are updated in place. Other objects
            # are not modified.
            _inner_loop(self.ygrid, self.def_y,
                        self.Bgrid, self.Vd, self.Vc,
                        EVc, EVd, EV, self.Q,
                        self.β, self.θ, self.γ)

            # Update prices
            Vd_compat = np.repeat(self.Vd, self.nB).reshape(self.ny, self.nB)
            default_states = Vd_compat > self.Vc
            self.default_prob[:, :] = self.Py @ default_states
            self.Q[:, :] = (1 - self.default_prob)/(1 + self.r)

            # Update main value function and distance
            V_upd[:, :] = np.maximum(self.Vc, Vd_compat)
            dist = np.max(np.abs(V_upd - self.V))
            self.V[:, :] = V_upd[:, :]

            it += 1
            if it % 25 == 0:
                print(f"Running iteration {it} with dist of {dist}")

        return None

    def compute_savings_policy(self):
        """
        Compute optimal savings B' conditional on not defaulting.
        The policy is recorded as an index value in Bgrid.
        """

        # Allocate memory
        self.next_B_index = np.empty((self.ny, self.nB))
        EV = self.Py @ self.V

        _compute_savings_policy(self.ygrid, self.Bgrid, self.Q, EV,
                                self.γ, self.β, self.next_B_index)

    def simulate(self, T, y_init=None, B_init=None):
        """
        Simulate time series for output, consumption, B'.
        """
        # Find index i such that Bgrid[i] is near 0
        zero_B_index = np.searchsorted(self.Bgrid, 0)

        if y_init is None:
            # Set to index near the mean of the ygrid
            y_init = np.searchsorted(self.ygrid, self.ygrid.mean())
        if B_init is None:
            B_init = zero_B_index
        # Start off not in default
        in_default = False

        y_sim_indices = self.mc.simulate_indices(T, init=y_init)
        B_sim_indices = np.empty(T, dtype=np.int64)
        B_sim_indices[0] = B_init
        q_sim = np.empty(T)
        in_default_series = np.zeros(T, dtype=np.int64)

        for t in range(T-1):
            yi, Bi = y_sim_indices[t], B_sim_indices[t]
            if not in_default:
                if self.Vc[yi, Bi] < self.Vd[yi]:
                    in_default = True
                    Bi_next = zero_B_index
                else:
                    new_index = self.next_B_index[yi, Bi]
                    Bi_next = new_index
            else:
                in_default_series[t] = 1
                Bi_next = zero_B_index
                if random.uniform(0, 1) < self.θ:
                    in_default = False
            B_sim_indices[t+1] = Bi_next
            q_sim[t] = self.Q[yi, int(Bi_next)]

        q_sim[-1] = q_sim[-2]  # Extrapolate for the last price
        return_vecs = (self.ygrid[y_sim_indices],
                       self.Bgrid[B_sim_indices],
                       q_sim,
                       in_default_series)

        return return_vecs


@jit(nopython=True)
def u(c, γ):
    return c**(1-γ)/(1-γ)

@jit(nopython=True)
def _inner_loop(ygrid, def_y, Bgrid, Vd, Vc, EVc,
                EVd, EV, qq, β, θ, γ):
    """
    This is a numba version of the inner loop of the solve in the
    Arellano class. It updates Vd and Vc in place.
    """
    ny, nB = len(ygrid), len(Bgrid)
    zero_ind = nB // 2  # Integer division
    for iy in range(ny):
        y = ygrid[iy]  # Pull out current y

        # Compute Vd
        Vd[iy] = u(def_y[iy], γ) + \
                β * (θ * EVc[iy, zero_ind] + (1 - θ) * EVd[iy])

        # Compute Vc

for ib in range(nB):
B = Bgrid[ib] # Pull out current B

current_max = -1e14
for ib_next in range(nB):
c = max(y - qq[iy, ib_next] * Bgrid[ib_next] + B, 1e-14)
m = u(c, γ) + β * EV[iy, ib_next]
if m > current_max:
current_max = m
Vc[iy, ib] = current_max

return None

@jit(nopython=True)
def _compute_savings_policy(ygrid, Bgrid, Q, EV, γ, β, next_B_index):
# Compute best index in Bgrid given iy, ib
ny, nB = len(ygrid), len(Bgrid)
for iy in range(ny):
y = ygrid[iy]
for ib in range(nB):
B = Bgrid[ib]
current_max = -1e10
for ib_next in range(nB):
c = max(y - Q[iy, ib_next] * Bgrid[ib_next] + B, 1e-14)
m = u(c, γ) + β * EV[iy, ib_next]
if m > current_max:
current_max = m
current_max_index = ib_next
next_B_index[iy, ib] = current_max_index
return None
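
The maximization over next-period bond holdings carried out by the loops above can also be expressed with NumPy broadcasting. The following is only a sketch and not part of the lecture's code: the function names `savings_policy_loops` and `savings_policy_vectorized` and the toy grids are hypothetical, and the vectorized version builds an array of shape (ny, nB, nB), so it trades memory for brevity.

```python
import numpy as np

def u(c, γ):
    return c**(1 - γ) / (1 - γ)

def savings_policy_loops(ygrid, Bgrid, Q, EV, γ, β):
    "Plain-Python restatement of the triple loop, used as a reference."
    ny, nB = len(ygrid), len(Bgrid)
    values = np.full((ny, nB), -np.inf)
    policy = np.zeros((ny, nB), dtype=np.int64)
    for iy in range(ny):
        for ib in range(nB):
            for ib_next in range(nB):
                c = max(ygrid[iy] - Q[iy, ib_next] * Bgrid[ib_next] + Bgrid[ib],
                        1e-14)
                m = u(c, γ) + β * EV[iy, ib_next]
                if m > values[iy, ib]:
                    values[iy, ib] = m
                    policy[iy, ib] = ib_next
    return values, policy

def savings_policy_vectorized(ygrid, Bgrid, Q, EV, γ, β):
    "Same maximization via broadcasting over axes (iy, ib, ib_next)."
    c = ygrid[:, None, None] - (Q * Bgrid)[:, None, :] + Bgrid[None, :, None]
    c = np.maximum(c, 1e-14)
    m = u(c, γ) + β * EV[:, None, :]
    return m.max(axis=2), m.argmax(axis=2)

# Hypothetical toy inputs, used only to compare the two versions
rng = np.random.default_rng(0)
ygrid = np.linspace(0.8, 1.2, 5)
Bgrid = np.linspace(-0.4, 0.4, 11)
Q = rng.uniform(0.5, 0.98, size=(5, 11))
EV = rng.normal(-20.0, 1.0, size=(5, 11))

values_loops, policy_loops = savings_policy_loops(ygrid, Bgrid, Q, EV, 2.0, 0.953)
values_vec, policy_vec = savings_policy_vectorized(ygrid, Bgrid, Q, EV, 2.0, 0.953)
print(np.allclose(values_loops, values_vec))
```

For the grid sizes used in this lecture the jitted loops are likely the better choice, since they avoid allocating the three-dimensional intermediate array.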

55.6 Results

Let’s start by trying to replicate the results obtained in [8]


In what follows, all results are computed using Arellano’s parameter values
The values can be seen in the __init__ method of the Arellano_Economy shown above

• For example, r=0.017 matches the average quarterly rate on a 5 year US treasury over
the period 1983–2001

Details on how to compute the figures are reported as solutions to the exercises
The first figure shows the bond price schedule and replicates Figure 3 of Arellano, where 𝑦𝐿
and 𝑦𝐻 are particular below average and above average values of output 𝑦

- 𝑦𝐿 is 5% below the mean of the 𝑦 grid values


- 𝑦𝐻 is 5% above the mean of the 𝑦 grid values
The grid used to compute this figure was relatively coarse (ny, nB = 21, 251) in order to
match Arellano’s findings
Here are the same relationships computed on a finer grid (ny, nB = 51, 551)

In either case, the figure shows that

• Higher levels of debt (larger −𝐵′ ) induce larger discounts on the face value, which correspond to higher interest rates
• Lower income also causes more discounting, as foreign creditors anticipate greater likelihood of default
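
To make the link between discounts and interest rates concrete, a one-period discount bond with price q implies a net interest rate of 1/q − 1. The sketch below is illustrative only: the function name `implied_interest_rate` is hypothetical, and the pricing rule q = (1 − default probability) / (1 + r) is an assumption made here for the example (risk-neutral lenders), using r = 0.017 from the parameter list.

```python
def implied_interest_rate(q):
    "Net one-period interest rate implied by a discount bond price q."
    if q <= 0:
        raise ValueError("bond price must be positive")
    return 1.0 / q - 1.0

r = 0.017                      # international interest rate from the lecture
q_riskless = 1 / (1 + r)

# Assumed pricing rule: lenders are compensated for default risk
for default_prob in (0.0, 0.05, 0.20):
    q = (1 - default_prob) * q_riskless
    print(default_prob, round(q, 4), round(implied_interest_rate(q), 4))
```

A lower price (a larger discount) maps into a higher implied rate, which is the relationship described in the first bullet.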

The next figure plots value functions and replicates the right hand panel of Figure 4 of [8]

We can use the results of the computation to study the default probability 𝛿(𝐵′ , 𝑦) defined in
Eq. (4)
The next plot shows these default probabilities over (𝐵′ , 𝑦) as a heat map

As anticipated, the probability that the government chooses to default in the following period
increases with indebtedness and falls with income
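
A small numerical sketch may help fix ideas about how such a default probability is assembled on a grid. Eq. (4) is not reproduced in this section, so the code assumes it takes the form 𝛿(𝐵′, 𝑦) = Σ over 𝑦′ of P(𝑦, 𝑦′) · 1{v_c(𝐵′, 𝑦′) < v_d(𝑦′)}; the arrays below are hypothetical toy values, not Arellano's calibration.

```python
import numpy as np

# Hypothetical toy inputs
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])               # P[i, j] = Prob(y' = y_j | y = y_i)
Vd = np.array([-22.0, -20.0])            # value of default at each y'
Vc = np.array([[-25.0, -23.0, -19.0],    # Vc[j, b]: value of repaying at
               [-21.0, -19.5, -18.0]])   # (y'_j, B'_b); b = 0 is the most debt

# 1 where the government would prefer default next period
default_states = (Vc < Vd[:, None]).astype(float)

# Probability of default next period, given today's y and chosen B'
default_prob = P @ default_states
print(default_prob)
```

In this toy example the probability is weakly higher in the low-income row and for the columns with more debt, in line with the pattern in the heat map.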
Next let’s run a time series simulation of {𝑦𝑡 }, {𝐵𝑡 } and 𝑞(𝐵𝑡+1 , 𝑦𝑡 )
The grey vertical bars correspond to periods when the economy is excluded from financial
markets because of a past default

One notable feature of the simulated data is the nonlinear response of interest rates
Periods of relative stability are followed by sharp spikes in the discount rate on government
debt

55.7 Exercises

55.7.1 Exercise 1

To the extent that you can, replicate the figures shown above

• Use the parameter values listed as defaults in the __init__ method of the Arellano_Economy
• The time series will of course vary depending on the shock draws

55.8 Solutions

Compute the value function, policy and equilibrium prices



In [3]: import matplotlib.pyplot as plt
%matplotlib inline

ae = Arellano_Economy(β=.953,     # time discount factor
                      γ=2.,       # risk aversion
                      r=0.017,    # international interest rate
                      ρ=.945,     # persistence in output
                      η=0.025,    # st dev of output shock
                      θ=0.282,    # prob of regaining access
                      ny=21,      # number of points in y grid
                      nB=251,     # number of points in B grid
                      tol=1e-8,   # error tolerance in iteration
                      maxit=10000)

Running iteration 25 with dist of 0.34324232989002823


Running iteration 50 with dist of 0.09839155779848241
Running iteration 75 with dist of 0.029212095591656606
Running iteration 100 with dist of 0.00874510696905162
Running iteration 125 with dist of 0.002623141215579494
Running iteration 150 with dist of 0.0007871926699110077
Running iteration 175 with dist of 0.00023625911163449587
Running iteration 200 with dist of 7.091000628989264e-05
Running iteration 225 with dist of 2.1282821137447172e-05
Running iteration 250 with dist of 6.387802962137812e-06
Running iteration 275 with dist of 1.917228964032347e-06
Running iteration 300 with dist of 5.754352905285032e-07
Running iteration 325 with dist of 1.7271061381052277e-07
Running iteration 350 with dist of 5.1837211856309295e-08
Running iteration 375 with dist of 1.555838480271632e-08

Compute the bond price schedule as seen in figure 3 of Arellano (2008)

In [4]: # Create "Y High" and "Y Low" values as 5% devs from mean
high, low = np.mean(ae.ygrid) * 1.05, np.mean(ae.ygrid) * .95
iy_high, iy_low = (np.searchsorted(ae.ygrid, x) for x in (high, low))

fig, ax = plt.subplots(figsize=(10, 6.5))
ax.set_title("Bond price schedule $q(y, B')$")

# Extract a suitable plot grid
x = []
q_low = []
q_high = []
for i in range(ae.nB):
    b = ae.Bgrid[i]
    if -0.35 <= b <= 0:  # To match fig 3 of Arellano
        x.append(b)
        q_low.append(ae.Q[iy_low, i])
        q_high.append(ae.Q[iy_high, i])
ax.plot(x, q_high, label="$y_H$", lw=2, alpha=0.7)
ax.plot(x, q_low, label="$y_L$", lw=2, alpha=0.7)
ax.set_xlabel("$B'$")
ax.legend(loc='upper left', frameon=False)
plt.show()

Draw a plot of the value functions

In [5]: # Create "Y High" and "Y Low" values as 5% devs from mean
high, low = np.mean(ae.ygrid) * 1.05, np.mean(ae.ygrid) * .95
iy_high, iy_low = (np.searchsorted(ae.ygrid, x) for x in (high, low))

fig, ax = plt.subplots(figsize=(10, 6.5))


ax.set_title("Value Functions")
ax.plot(ae.Bgrid, ae.V[iy_high], label="$y_H$", lw=2, alpha=0.7)
ax.plot(ae.Bgrid, ae.V[iy_low], label="$y_L$", lw=2, alpha=0.7)
ax.legend(loc='upper left')
ax.set(xlabel="$B$", ylabel="$V(y, B)$")
ax.set_xlim(ae.Bgrid.min(), ae.Bgrid.max())
plt.show()

Draw a heat map for default probability

In [6]: xx, yy = ae.Bgrid, ae.ygrid


zz = ae.default_prob

# Create figure
fig, ax = plt.subplots(figsize=(10, 6.5))
hm = ax.pcolormesh(xx, yy, zz)
cax = fig.add_axes([.92, .1, .02, .8])
fig.colorbar(hm, cax=cax)
ax.axis([xx.min(), 0.05, yy.min(), yy.max()])
ax.set(xlabel="$B'$", ylabel="$y$", title="Probability of Default")
plt.show()

Plot a time series of major variables simulated from the model

In [7]: T = 250
y_vec, B_vec, q_vec, default_vec = ae.simulate(T)

# Pick up default start and end dates
start_end_pairs = []
i = 0
while i < len(default_vec):
    if default_vec[i] == 0:
        i += 1
    else:
        # If we get to here we're in default
        start_default = i
        while i < len(default_vec) and default_vec[i] == 1:
            i += 1
        end_default = i - 1
        start_end_pairs.append((start_default, end_default))

plot_series = y_vec, B_vec, q_vec
titles = 'output', 'foreign assets', 'bond price'

fig, axes = plt.subplots(len(plot_series), 1, figsize=(10, 12))
fig.subplots_adjust(hspace=0.3)

for ax, series, title in zip(axes, plot_series, titles):
    # determine suitable y limits
    s_max, s_min = max(series), min(series)
    s_range = s_max - s_min
    y_max = s_max + s_range * 0.1
    y_min = s_min - s_range * 0.1
    ax.set_ylim(y_min, y_max)
    for pair in start_end_pairs:
        ax.fill_between(pair, (y_min, y_min), (y_max, y_max),
                        color='k', alpha=0.3)
    ax.grid()
    ax.plot(range(T), series, lw=2, alpha=0.7)
    ax.set(title=title, xlabel="time")

plt.show()
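
The nested while loops that collect the start and end dates of default episodes can be written more compactly. The sketch below is one alternative, assuming the default indicator is a sequence of 0s and 1s as returned by the simulation; the helper name `default_episodes` is hypothetical and not part of the lecture's code.

```python
from itertools import groupby

def default_episodes(default_flags):
    "Return inclusive (start, end) index pairs for each run of 1s."
    episodes = []
    position = 0
    for flag, run in groupby(default_flags):
        length = sum(1 for _ in run)
        if flag == 1:
            episodes.append((position, position + length - 1))
        position += length
    return episodes

# Hypothetical indicator series for illustration
example_flags = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
print(default_episodes(example_flags))
```

Each pair has the same meaning as an entry of `start_end_pairs`: the first and last period of a spell spent in default.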
56

Globalization and Cycles

56.1 Contents

• Overview 56.2

• Key Ideas 56.3

• Model 56.4

• Simulation 56.5

• Exercises 56.6

• Solutions 56.7

This lecture is coauthored with Chase Coleman

56.2 Overview

In this lecture, we review the paper Globalization and Synchronization of Innovation Cycles
by Kiminori Matsuyama, Laura Gardini and Iryna Sushko
This model helps us understand several interesting stylized facts about the world economy
One of these is synchronized business cycles across different countries
Most existing models that generate synchronized business cycles do so by assumption, since
they tie output in each country to a common shock
They also fail to explain certain features of the data, such as the fact that the degree of syn-
chronization tends to increase with trade ties
By contrast, in the model we consider in this lecture, synchronization is both endogenous and
increasing with the extent of trade integration
In particular, as trade costs fall and international competition increases, innovation incentives
become aligned and countries synchronize their innovation cycles


56.2.1 Background

The model builds on work by Judd [73], Deneckere and Judd [34] and Helpman and Krugman
[64] by developing a two-country model with trade and innovation
On the technical side, the paper introduces the concept of coupled oscillators to economic
modeling
As we will see, coupled oscillators arise endogenously within the model
Below we review the model and replicate some of the results on synchronization of innovation
across countries

56.3 Key Ideas

It is helpful to begin with an overview of the mechanism

56.3.1 Innovation Cycles

As discussed above, two countries produce and trade with each other
In each country, firms innovate, producing new varieties of goods and, in doing so, receiving
temporary monopoly power
Imitators follow and, after one period of monopoly, what had previously been new varieties
now enter competitive production
Firms have incentives to innovate and produce new goods when the mass of varieties of goods
currently in production is relatively low
In addition, there are strategic complementarities in the timing of innovation
Firms have incentives to innovate in the same period, so as to avoid competing with substi-
tutes that are competitively produced
This leads to temporal clustering in innovations in each country
After a burst of innovation, the mass of goods currently in production increases
However, goods also become obsolete, so that not all survive from period to period
This mechanism generates a cycle, where the mass of varieties increases through simultaneous
innovation and then falls through obsolescence

56.3.2 Synchronization

In the absence of trade, the timing of innovation cycles in each country is decoupled
This will be the case when trade costs are prohibitively high
If trade costs fall, then goods produced in each country penetrate each other’s markets
As illustrated below, this leads to synchronization of business cycles across the two countries

56.4 Model

Let’s write down the model more formally


(The treatment is relatively terse since full details can be found in the original paper)
Time is discrete with 𝑡 = 0, 1, …
There are two countries indexed by 𝑗 or 𝑘
In each country, a representative household inelastically supplies 𝐿𝑗 units of labor at wage
rate 𝑤𝑗,𝑡
Without loss of generality, it is assumed that 𝐿1 ≥ 𝐿2
Households consume a single nontradeable final good which is produced competitively
Its production involves combining two types of tradeable intermediate inputs via

$$
Y_{k,t} = C_{k,t} = \left( \frac{X^o_{k,t}}{1 - \alpha} \right)^{1-\alpha} \left( \frac{X_{k,t}}{\alpha} \right)^{\alpha}
$$

Here $X^o_{k,t}$ is a homogeneous input which can be produced from labor using a linear, one-for-one technology
It is freely tradeable, competitively supplied, and homogeneous across countries
By choosing the price of this good as numeraire and assuming both countries find it optimal
to always produce the homogeneous good, we can set 𝑤1,𝑡 = 𝑤2,𝑡 = 1
The good 𝑋𝑘,𝑡 is a composite, built from many differentiated goods via

$$
\left[ X_{k,t} \right]^{1 - \frac{1}{\sigma}} = \int_{\Omega_t} \left[ x_{k,t}(\nu) \right]^{1 - \frac{1}{\sigma}} d\nu
$$

Here 𝑥𝑘,𝑡 (𝜈) is the total amount of a differentiated good 𝜈 ∈ Ω𝑡 that is produced
The parameter 𝜎 > 1 is the direct partial elasticity of substitution between a pair of varieties
and Ω𝑡 is the set of varieties available in period 𝑡
We can split the varieties into those which are supplied competitively and those supplied monopolistically; that is, $\Omega_t = \Omega^c_t + \Omega^m_t$

56.4.1 Prices

Demand for differentiated inputs is

$$
x_{k,t}(\nu) = \left( \frac{p_{k,t}(\nu)}{P_{k,t}} \right)^{-\sigma} \frac{\alpha L_k}{P_{k,t}}
$$

Here

• 𝑝𝑘,𝑡 (𝜈) is the price of the variety 𝜈 and


• 𝑃𝑘,𝑡 is the price index for differentiated inputs in 𝑘, defined by

$$
\left[ P_{k,t} \right]^{1-\sigma} = \int_{\Omega_t} \left[ p_{k,t}(\nu) \right]^{1-\sigma} d\nu
$$

The price of a variety also depends on the origin, 𝑗, and destination, 𝑘, of the goods because
shipping varieties between countries incurs an iceberg trade cost 𝜏𝑗,𝑘
Thus the effective price in country 𝑘 of a variety 𝜈 produced in country 𝑗 becomes 𝑝𝑘,𝑡 (𝜈) =
𝜏𝑗,𝑘 𝑝𝑗,𝑡 (𝜈)
Using these expressions, we can derive the total demand for each variety, which is

$$
D_{j,t}(\nu) = \sum_k \tau_{j,k} x_{k,t}(\nu) = \alpha A_{j,t} (p_{j,t}(\nu))^{-\sigma}
$$

where

$$
A_{j,t} := \sum_k \frac{\rho_{j,k} L_k}{(P_{k,t})^{1-\sigma}}
\quad \text{and} \quad
\rho_{j,k} = (\tau_{j,k})^{1-\sigma} \leq 1
$$

It is assumed that 𝜏1,1 = 𝜏2,2 = 1 and 𝜏1,2 = 𝜏2,1 = 𝜏 for some 𝜏 > 1, so that

$$
\rho_{1,2} = \rho_{2,1} = \rho := \tau^{1-\sigma} < 1
$$

The value 𝜌 ∈ [0, 1) is a proxy for the degree of globalization
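
To see how the iceberg cost maps into this proxy, the short sketch below evaluates 𝜌 = 𝜏^(1−𝜎) for a few trade costs. The function name `rho_from_trade_cost` and the parameter values are hypothetical choices for illustration; 𝜏 = 1 is included only as the free-trade limit.

```python
def rho_from_trade_cost(τ, σ):
    "Globalization proxy ρ = τ**(1 - σ) for iceberg cost τ >= 1 and σ > 1."
    if τ < 1 or σ <= 1:
        raise ValueError("need τ >= 1 and σ > 1")
    return τ ** (1 - σ)

σ = 3.0   # illustrative elasticity of substitution
for τ in (1.0, 1.5, 2.0, 4.0):
    print(τ, rho_from_trade_cost(τ, σ))
```

Higher trade costs push 𝜌 toward zero, while 𝜌 approaches one as trade becomes costless.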


Producing one unit of each differentiated variety requires 𝜓 units of labor, so the marginal
cost is equal to 𝜓 for 𝜈 ∈ Ω𝑗,𝑡
Additionally, all competitive varieties will have the same price (because of equal marginal
cost), which means that, for all 𝜈 ∈ Ω𝑐 ,

$$
p^c_{j,t}(\nu) = p^c_{j,t} := \psi
\quad \text{and} \quad
D_{j,t} = y^c_{j,t} := \alpha A_{j,t} (p^c_{j,t})^{-\sigma}
$$

Monopolists will have the same marked-up price, so, for all 𝜈 ∈ Ω𝑚 ,

$$
p^m_{j,t}(\nu) = p^m_{j,t} := \frac{\psi}{1 - \frac{1}{\sigma}}
\quad \text{and} \quad
D_{j,t} = y^m_{j,t} := \alpha A_{j,t} (p^m_{j,t})^{-\sigma}
$$

Define

$$
\theta := \frac{p^c_{j,t} y^c_{j,t}}{p^m_{j,t} y^m_{j,t}} = \left( 1 - \frac{1}{\sigma} \right)^{1-\sigma}
$$
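
As a quick numerical check of this definition, the sketch below evaluates 𝜃 for a few values of 𝜎. The function name `theta_from_sigma` is hypothetical, and the values of 𝜎 are arbitrary illustrations rather than calibrated parameters.

```python
import math

def theta_from_sigma(σ):
    "Relative revenue of competitive to monopolistic varieties, θ = (1 - 1/σ)**(1 - σ)."
    if σ <= 1:
        raise ValueError("need σ > 1")
    return (1 - 1 / σ) ** (1 - σ)

for σ in (1.5, 2.0, 4.0, 10.0, 1000.0):
    print(σ, theta_from_sigma(σ))

print(math.e)  # for comparison with the values at large σ
```

For 𝜎 = 2 the formula gives 𝜃 = 2, and in these examples 𝜃 stays between 1 and e.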

Using the preceding definitions and some algebra, the price indices can now be rewritten as

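$$
\left( \frac{P_{k,t}}{\psi} \right)^{1-\sigma} = M_{k,t} + \rho M_{j,t}
\quad \text{where} \quad
M_{j,t} := N^c_{j,t} + \frac{N^m_{j,t}}{\theta}
$$

The symbols $N^c_{j,t}$ and $N^m_{j,t}$ will denote the measures of $\Omega^c$ and $\Omega^m$ respectively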

56.4.2 New Varieties

To introduce a new variety, a firm must hire 𝑓 units of labor per variety in each country
Monopolist profits must be less than or equal to zero in expectation, so

$$
N^m_{j,t} \geq 0, \quad
\pi^m_{j,t} := (p^m_{j,t} - \psi) y^m_{j,t} - f \leq 0
\quad \text{and} \quad
\pi^m_{j,t} N^m_{j,t} = 0
$$

With further manipulations, this becomes

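$$
N^m_{j,t} = \theta (M_{j,t} - N^c_{j,t}) \geq 0, \quad
\frac{1}{\sigma} \left[ \frac{\alpha L_j}{\theta (M_{j,t} + \rho M_{k,t})} + \frac{\alpha L_k}{\theta (M_{j,t} + M_{k,t} / \rho)} \right] \leq f
$$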

56.4.3 Law of Motion

With 𝛿 as the exogenous probability that a variety survives from one period to the next (that is, does not become obsolete), the dynamic equation for
the measure of firms becomes

$$
N^c_{j,t+1} = \delta (N^c_{j,t} + N^m_{j,t}) = \delta (N^c_{j,t} + \theta (M_{j,t} - N^c_{j,t}))
$$

We will work with a normalized measure of varieties

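$$
n_{j,t} := \frac{\theta \sigma f N^c_{j,t}}{\alpha (L_1 + L_2)}, \quad
i_{j,t} := \frac{\theta \sigma f N^m_{j,t}}{\alpha (L_1 + L_2)}, \quad
m_{j,t} := \frac{\theta \sigma f M_{j,t}}{\alpha (L_1 + L_2)} = n_{j,t} + \frac{i_{j,t}}{\theta}
$$

We also use $s_j := \frac{L_j}{L_1 + L_2}$ to be the share of labor employed in country $j$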
We can use these definitions and the preceding expressions to obtain a law of motion for
𝑛𝑡 ∶= (𝑛1,𝑡 , 𝑛2,𝑡 )
In particular, given an initial condition, $n_0 = (n_{1,0}, n_{2,0}) \in \mathbb{R}^2_+$, the equilibrium trajectory,
$\{n_t\}_{t=0}^{\infty} = \{(n_{1,t}, n_{2,t})\}_{t=0}^{\infty}$, is obtained by iterating on $n_{t+1} = F(n_t)$ where $F : \mathbb{R}^2_+ \to \mathbb{R}^2_+$ is
given by

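$$
F(n_t) =
\begin{cases}
\big( \delta (\theta s_1(\rho) + (1-\theta) n_{1,t}), \; \delta (\theta s_2(\rho) + (1-\theta) n_{2,t}) \big) & \text{for } n_t \in D_{LL} \\
\big( \delta n_{1,t}, \; \delta n_{2,t} \big) & \text{for } n_t \in D_{HH} \\
\big( \delta n_{1,t}, \; \delta (\theta h_2(n_{1,t}) + (1-\theta) n_{2,t}) \big) & \text{for } n_t \in D_{HL} \\
\big( \delta (\theta h_1(n_{2,t}) + (1-\theta) n_{1,t}), \; \delta n_{2,t} \big) & \text{for } n_t \in D_{LH}
\end{cases}
$$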

Here

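$$
\begin{aligned}
D_{LL} & := \{ (n_1, n_2) \in \mathbb{R}^2_+ \mid n_j \leq s_j(\rho) \} \\
D_{HH} & := \{ (n_1, n_2) \in \mathbb{R}^2_+ \mid n_j \geq h_j(n_k) \} \\
D_{HL} & := \{ (n_1, n_2) \in \mathbb{R}^2_+ \mid n_1 \geq s_1(\rho) \text{ and } n_2 \leq h_2(n_1) \} \\
D_{LH} & := \{ (n_1, n_2) \in \mathbb{R}^2_+ \mid n_1 \leq h_1(n_2) \text{ and } n_2 \geq s_2(\rho) \}
\end{aligned}
$$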

while

$$
s_1(\rho) = 1 - s_2(\rho) = \min \left\{ \frac{s_1 - \rho s_2}{1 - \rho}, 1 \right\}
$$

and ℎ𝑗 (𝑛𝑘 ) is defined implicitly by the equation

$$
1 = \frac{s_j}{h_j(n_k) + \rho n_k} + \frac{s_k}{h_j(n_k) + n_k / \rho}
$$

Rewriting the equation above gives us a quadratic equation in terms of ℎ𝑗 (𝑛𝑘 )


Since we know ℎ𝑗 (𝑛𝑘 ) > 0 then we can just solve the quadratic equation and return the posi-
tive root
This gives us

$$
h_j(n_k)^2 + \left( \left( \rho + \frac{1}{\rho} \right) n_k - s_j - s_k \right) h_j(n_k) + \left( n_k^2 - \frac{s_j n_k}{\rho} - s_k n_k \rho \right) = 0
$$
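
As a sanity check on the algebra, the sketch below solves the quadratic for its larger root and substitutes the result back into the implicit equation. The function names `h_positive_root` and `implicit_residual` and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def h_positive_root(n_k, s_j, s_k, ρ):
    "Larger root of h**2 + b*h + c = 0 with b, c taken from the quadratic above."
    b = (ρ + 1 / ρ) * n_k - s_j - s_k
    c = n_k**2 - s_j * n_k / ρ - s_k * n_k * ρ
    return (-b + math.sqrt(b * b - 4 * c)) / 2

def implicit_residual(h, n_k, s_j, s_k, ρ):
    "Right-hand side minus left-hand side of the implicit equation for h_j(n_k)."
    return s_j / (h + ρ * n_k) + s_k / (h + n_k / ρ) - 1

# Illustrative values: symmetric countries, ρ = 0.2, n_k = 0.3
h = h_positive_root(n_k=0.3, s_j=0.5, s_k=0.5, ρ=0.2)
residual = implicit_residual(h, n_k=0.3, s_j=0.5, s_k=0.5, ρ=0.2)
print(h, residual)
```

A residual that is zero up to floating point error confirms that the root of the quadratic solves the original implicit equation at these values.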

56.5 Simulation

Let’s try simulating some of these trajectories


We will focus in particular on whether or not innovation cycles synchronize across the two
countries
As we will see, this depends on initial conditions
For some parameterizations, synchronization will occur for “most” initial conditions, while for
others synchronization will be rare
The computational burden of testing synchronization across many initial conditions is not
trivial
In order to make our code fast, we will use just in time compiled functions that will get called
and handled by our class
These are the @jit statements that you see below (review this lecture if you don’t recall how
to use JIT compilation)
Here’s the main body of code

In [1]: import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

from numba import jit, vectorize

@jit(nopython=True)
def _hj(j, nk, s1, s2, θ, δ, ρ):
    """
    If we expand the implicit function for h_j(n_k) then we find that
    it is quadratic. We know that h_j(n_k) > 0 so we can get its
    value by using the quadratic formula
    """
    # Find out whose h we are evaluating
    if j == 1:
        sj = s1
        sk = s2
    else:
        sj = s2
        sk = s1

    # Coefficients on the quadratic a x^2 + b x + c = 0
    a = 1.0
    b = ((ρ + 1 / ρ) * nk - sj - sk)
    c = (nk * nk - (sj * nk) / ρ - sk * ρ * nk)

    # Positive solution from the quadratic formula
    root = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)

    return root

@jit(nopython=True)
def DLL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
    "Determine whether (n1, n2) is in the set DLL"
    return (n1 <= s1_ρ) and (n2 <= s2_ρ)

@jit(nopython=True)
def DHH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
    "Determine whether (n1, n2) is in the set DHH"
    return (n1 >= _hj(1, n2, s1, s2, θ, δ, ρ)) and (n2 >= _hj(2, n1, s1, s2, θ, δ, ρ))

@jit(nopython=True)
def DHL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
    "Determine whether (n1, n2) is in the set DHL"
    return (n1 >= s1_ρ) and (n2 <= _hj(2, n1, s1, s2, θ, δ, ρ))

@jit(nopython=True)
def DLH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
    "Determine whether (n1, n2) is in the set DLH"
    return (n1 <= _hj(1, n2, s1, s2, θ, δ, ρ)) and (n2 >= s2_ρ)

@jit(nopython=True)
def one_step(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
    """
    Takes a current value for (n_{1, t}, n_{2, t}) and returns the
    values (n_{1, t+1}, n_{2, t+1}) according to the law of motion.
    """
    # Depending on where we are, evaluate the right branch
    if DLL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
        n1_tp1 = δ * (θ * s1_ρ + (1 - θ) * n1)
        n2_tp1 = δ * (θ * s2_ρ + (1 - θ) * n2)
    elif DHH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
        n1_tp1 = δ * n1
        n2_tp1 = δ * n2
    elif DHL(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
        n1_tp1 = δ * n1
        n2_tp1 = δ * (θ * _hj(2, n1, s1, s2, θ, δ, ρ) + (1 - θ) * n2)
    elif DLH(n1, n2, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
        n1_tp1 = δ * (θ * _hj(1, n2, s1, s2, θ, δ, ρ) + (1 - θ) * n1)
        n2_tp1 = δ * n2

    return n1_tp1, n2_tp1

@jit(nopython=True)
def n_generator(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ):
    """
    Given an initial condition, continues to yield new values of
    n1 and n2
    """
    n1_t, n2_t = n1_0, n2_0
    while True:
        n1_tp1, n2_tp1 = one_step(n1_t, n2_t, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ)
        yield (n1_tp1, n2_tp1)
        n1_t, n2_t = n1_tp1, n2_tp1

@jit(nopython=True)
def _pers_till_sync(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ, maxiter, npers):
    """
    Takes initial values and iterates forward to see whether
    the histories eventually end up in sync.

    If countries are symmetric then as soon as the two countries have the
    same measure of firms then they will be synchronized -- However, if
    they are not symmetric then it is possible they have the same measure
    of firms but are not yet synchronized. To address this, we check whether
    firms stay synchronized for `npers` periods with Euclidean norm

    Parameters
    ----------
    n1_0 : scalar(Float)
        Initial normalized measure of firms in country one
    n2_0 : scalar(Float)
        Initial normalized measure of firms in country two
    maxiter : scalar(Int)
        Maximum number of periods to simulate
    npers : scalar(Int)
        Number of periods we would like the countries to have the
        same measure for

    Returns
    -------
    synchronized : scalar(Bool)
        Did the two economies end up synchronized
    pers_2_sync : scalar(Int)
        The number of periods required until they synchronized
    """
    # Initialize the status of synchronization
    synchronized = False
    pers_2_sync = maxiter
    iters = 0

    # Initialize generator
    n_gen = n_generator(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ)

    # Will use a counter to determine how many times in a row
    # the firm measures are the same
    nsync = 0

    while (not synchronized) and (iters < maxiter):
        # Increment the number of iterations and get next values
        iters += 1
        n1_t, n2_t = next(n_gen)

        # Check whether same in this period
        if abs(n1_t - n2_t) < 1e-8:
            nsync += 1
        # If not, then reset the nsync counter
        else:
            nsync = 0

        # If we have been in sync for npers then stop and countries
        # became synchronized nsync periods ago
        if nsync > npers:
            synchronized = True
            pers_2_sync = iters - nsync

    return synchronized, pers_2_sync

@jit(nopython=True)
def _create_attraction_basis(s1_ρ, s2_ρ, s1, s2, θ, δ, ρ, maxiter, npers, npts):
    # Create unit range with npts
    synchronized, pers_2_sync = False, 0
    unit_range = np.linspace(0.0, 1.0, npts)

    # Allocate space to store time to sync
    time_2_sync = np.empty((npts, npts))
    # Iterate over initial conditions
    for (i, n1_0) in enumerate(unit_range):
        for (j, n2_0) in enumerate(unit_range):
            synchronized, pers_2_sync = _pers_till_sync(n1_0, n2_0, s1_ρ, s2_ρ,
                                                        s1, s2, θ, δ, ρ,
                                                        maxiter, npers)
            time_2_sync[i, j] = pers_2_sync

    return time_2_sync

# == Now we define a class for the model == #



class MSGSync:
    """
    The paper "Globalization and Synchronization of Innovation Cycles" presents
    a two-country model with endogenous innovation cycles. Combines elements
    from Deneckere Judd (1985) and Helpman Krugman (1985) to allow for a
    model with trade that has firms who can introduce new varieties into
    the economy.

    We focus on being able to determine whether the two countries eventually
    synchronize their innovation cycles. To do this, we only need a few
    of the many parameters. In particular, we need the parameters listed
    below

    Parameters
    ----------
    s1 : scalar(Float)
        Amount of total labor in country 1 relative to total worldwide labor
    θ : scalar(Float)
        A measure of how much more of the competitive variety is used in
        production of final goods
    δ : scalar(Float)
        Percentage of firms that are not exogenously destroyed every period
    ρ : scalar(Float)
        Degree of globalization (inversely related to the cost of trading
        between countries)
    """
    def __init__(self, s1=0.5, θ=2.5, δ=0.7, ρ=0.2):
        # Store model parameters
        self.s1, self.θ, self.δ, self.ρ = s1, θ, δ, ρ

        # Store other cutoffs and parameters we use
        self.s2 = 1 - s1
        self.s1_ρ = self._calc_s1_ρ()
        self.s2_ρ = 1 - self.s1_ρ

    def _unpack_params(self):
        return self.s1, self.s2, self.θ, self.δ, self.ρ

    def _calc_s1_ρ(self):
        # Unpack params
        s1, s2, θ, δ, ρ = self._unpack_params()

        # s_1(ρ) = min(val, 1)
        val = (s1 - ρ * s2) / (1 - ρ)
        return min(val, 1)

    def simulate_n(self, n1_0, n2_0, T):
        """
        Simulates the values of (n1, n2) for T periods

        Parameters
        ----------
        n1_0 : scalar(Float)
            Initial normalized measure of firms in country one
        n2_0 : scalar(Float)
            Initial normalized measure of firms in country two
        T : scalar(Int)
            Number of periods to simulate

        Returns
        -------
        n1 : Array(Float64, ndim=1)
            A history of normalized measures of firms in country one
        n2 : Array(Float64, ndim=1)
            A history of normalized measures of firms in country two
        """
        # Unpack parameters
        s1, s2, θ, δ, ρ = self._unpack_params()
        s1_ρ, s2_ρ = self.s1_ρ, self.s2_ρ

        # Allocate space
        n1 = np.empty(T)
        n2 = np.empty(T)

        # Create the generator
        n1[0], n2[0] = n1_0, n2_0
        n_gen = n_generator(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ)

        # Simulate for T periods
        for t in range(1, T):
            # Get next values
            n1_tp1, n2_tp1 = next(n_gen)

            # Store in arrays
            n1[t] = n1_tp1
            n2[t] = n2_tp1

        return n1, n2

    def pers_till_sync(self, n1_0, n2_0, maxiter=500, npers=3):
        """
        Takes initial values and iterates forward to see whether
        the histories eventually end up in sync.

        If countries are symmetric then as soon as the two countries have the
        same measure of firms then they will be synchronized -- However, if
        they are not symmetric then it is possible they have the same measure
        of firms but are not yet synchronized. To address this, we check whether
        firms stay synchronized for `npers` periods with Euclidean norm

        Parameters
        ----------
        n1_0 : scalar(Float)
            Initial normalized measure of firms in country one
        n2_0 : scalar(Float)
            Initial normalized measure of firms in country two
        maxiter : scalar(Int)
            Maximum number of periods to simulate
        npers : scalar(Int)
            Number of periods we would like the countries to have the
            same measure for

        Returns
        -------
        synchronized : scalar(Bool)
            Did the two economies end up synchronized
        pers_2_sync : scalar(Int)
            The number of periods required until they synchronized
        """
        # Unpack parameters
        s1, s2, θ, δ, ρ = self._unpack_params()
        s1_ρ, s2_ρ = self.s1_ρ, self.s2_ρ

        return _pers_till_sync(n1_0, n2_0, s1_ρ, s2_ρ, s1, s2, θ, δ, ρ, maxiter, npers)

    def create_attraction_basis(self, maxiter=250, npers=3, npts=50):
        """
        Creates an attraction basis for values of n on [0, 1] X [0, 1] with npts in each dimension
        """
        # Unpack parameters
        s1, s2, θ, δ, ρ = self._unpack_params()
        s1_ρ, s2_ρ = self.s1_ρ, self.s2_ρ

        ab = _create_attraction_basis(s1_ρ, s2_ρ, s1, s2, θ, δ,
                                      ρ, maxiter, npers, npts)

        return ab

56.5.1 Time Series of Firm Measures

We write a short function below that exploits the preceding code and plots two time series
Each time series gives the dynamics for the two countries

The time series share parameters but differ in their initial condition
Here’s the function

In [2]: def plot_timeseries(n1_0, n2_0, s1=0.5, θ=2.5, δ=0.7, ρ=0.2, ax=None, title=''):
    """
    Plot a single time series with initial conditions
    """
    if ax is None:
        fig, ax = plt.subplots()

    # Create the MSG Model and simulate with initial conditions
    model = MSGSync(s1, θ, δ, ρ)
    n1, n2 = model.simulate_n(n1_0, n2_0, 25)

    ax.plot(np.arange(25), n1, label="$n_1$", lw=2)
    ax.plot(np.arange(25), n2, label="$n_2$", lw=2)

    ax.legend()
    ax.set(title=title, ylim=(0.15, 0.8))

    return ax

# Create figure
fig, ax = plt.subplots(2, 1, figsize=(10, 8))

plot_timeseries(0.15, 0.35, ax=ax[0], title='Not Synchronized')
plot_timeseries(0.4, 0.3, ax=ax[1], title='Synchronized')

fig.tight_layout()

plt.show()

In the first case, innovation in the two countries does not synchronize
In the second case, different initial conditions are chosen, and the cycles become synchronized

56.5.2 Basin of Attraction

Next, let’s study the initial conditions that lead to synchronized cycles more systematically
We generate time series from a large collection of different initial conditions and mark those
conditions with different colors according to whether synchronization occurs or not
The next display shows exactly this for four different parameterizations (one for each subfig-
ure)
Dark colors indicate synchronization, while light colors indicate failure to synchronize

As you can see, larger values of 𝜌 translate to more synchronization


You are asked to replicate this figure in the exercises
In the solution to the exercises, you’ll also find a figure with sliders, allowing you to experi-
ment with different parameters
Here’s one snapshot from the interactive figure

56.6 Exercises

56.6.1 Exercise 1

Replicate the figure shown above by coloring initial conditions according to whether or not
synchronization occurs from those conditions

56.7 Solutions

These solutions are written by Chase Coleman

In [3]: import matplotlib.pyplot as plt
%matplotlib inline

def plot_attraction_basis(s1=0.5, θ=2.5, δ=0.7, ρ=0.2, npts=250, ax=None):
    if ax is None:
        fig, ax = plt.subplots()

    # Create attraction basis
    unitrange = np.linspace(0, 1, npts)
    model = MSGSync(s1, θ, δ, ρ)
    ab = model.create_attraction_basis(npts=npts)
    cf = ax.pcolormesh(unitrange, unitrange, ab, cmap="viridis")

    return ab, cf

fig = plt.figure(figsize=(14, 12))

# Left - Bottom - Width - Height
ax0 = fig.add_axes((0.05, 0.475, 0.38, 0.35), label="axes0")
ax1 = fig.add_axes((0.5, 0.475, 0.38, 0.35), label="axes1")
ax2 = fig.add_axes((0.05, 0.05, 0.38, 0.35), label="axes2")
ax3 = fig.add_axes((0.5, 0.05, 0.38, 0.35), label="axes3")

params = [[0.5, 2.5, 0.7, 0.2],
          [0.5, 2.5, 0.7, 0.4],
          [0.5, 2.5, 0.7, 0.6],
          [0.5, 2.5, 0.7, 0.8]]

ab0, cf0 = plot_attraction_basis(*params[0], npts=500, ax=ax0)
ab1, cf1 = plot_attraction_basis(*params[1], npts=500, ax=ax1)
ab2, cf2 = plot_attraction_basis(*params[2], npts=500, ax=ax2)
ab3, cf3 = plot_attraction_basis(*params[3], npts=500, ax=ax3)

cbar_ax = fig.add_axes([0.9, 0.075, 0.03, 0.725])
plt.colorbar(cf0, cax=cbar_ax)

ax0.set_title(r"$s_1=0.5$, $\theta=2.5$, $\delta=0.7$, $\rho=0.2$",
              fontsize=22)
ax1.set_title(r"$s_1=0.5$, $\theta=2.5$, $\delta=0.7$, $\rho=0.4$",
              fontsize=22)
ax2.set_title(r"$s_1=0.5$, $\theta=2.5$, $\delta=0.7$, $\rho=0.6$",
              fontsize=22)
ax3.set_title(r"$s_1=0.5$, $\theta=2.5$, $\delta=0.7$, $\rho=0.8$",
              fontsize=22)

fig.suptitle("Synchronized versus Asynchronized 2-cycles",
             x=0.475, y=0.915, size=26)
plt.show()

56.7.1 Interactive Version

Additionally, instead of just seeing 4 plots at once, we might want to change 𝜌 manually
and see how it affects the plot in real time. Below we use an interactive plot to do this
Note, interactive plotting requires the ipywidgets module to be installed and enabled

In [4]: from ipywidgets import interact

def interact_attraction_basis(ρ=0.2, maxiter=250, npts=250):
    # Create the figure and axis that we will plot on
    fig, ax = plt.subplots(figsize=(12, 10))

    # Create model and attraction basis
    s1, θ, δ = 0.5, 2.5, 0.75
    model = MSGSync(s1, θ, δ, ρ)
    ab = model.create_attraction_basis(maxiter=maxiter, npts=npts)

    # Color map with colormesh
    unitrange = np.linspace(0, 1, npts)
    cf = ax.pcolormesh(unitrange, unitrange, ab, cmap="viridis")
    cbar_ax = fig.add_axes([0.95, 0.15, 0.05, 0.7])
    plt.colorbar(cf, cax=cbar_ax)
    plt.show()
    return None

In [5]: fig = interact(interact_attraction_basis,
               ρ=(0.0, 1.0, 0.05),
               maxiter=(50, 5000, 50),
               npts=(25, 750, 25))
57

Coase’s Theory of the Firm

57.1 Contents

• Overview 57.2
• The Model 57.3
• Equilibrium 57.4
• Existence, Uniqueness and Computation of Equilibria 57.5
• Implementation 57.6
• Exercises 57.7
• Solutions 57.8

57.2 Overview

In 1937, Ronald Coase wrote a brilliant essay on the nature of the firm [27]
Coase was writing at a time when the Soviet Union was rising to become a significant indus-
trial power
At the same time, many free-market economies were afflicted by a severe and painful depres-
sion
This contrast led to an intensive debate on the relative merits of decentralized, price-based
allocation versus top-down planning
In the midst of this debate, Coase made an important observation: even in free-market
economies, a great deal of top-down planning does in fact take place
This is because firms form an integral part of free-market economies and, within firms, alloca-
tion is by planning
In other words, free-market economies blend both planning (within firms) and decentralized
production coordinated by prices
The question Coase asked is this: if prices and free markets are so efficient, then why do firms
even exist?
Couldn’t the associated within-firm planning be done more efficiently by the market?


57.2.1 Why Firms Exist

On top of asking a deep and fascinating question, Coase also supplied an illuminating answer:
firms exist because of transaction costs
Here’s one example of a transaction cost:
Suppose agent A is considering setting up a small business and needs a web developer to con-
struct and help run an online store
She can use the labor of agent B, a web developer, by writing up a freelance contract for
these tasks and agreeing on a suitable price
But contracts like this can be time-consuming and difficult to verify

• How will agent A be able to specify exactly what she wants, to the finest detail, when
she herself isn’t sure how the business will evolve?
• And what if she isn’t familiar with web technology? How can she specify all the relevant
details?
• And, if things go badly, will failure to comply with the contract be verifiable in court?

In this situation, perhaps it will be easier to employ agent B under a simple labor contract
The cost of this contract is far smaller because such contracts are simpler and more standard
The basic agreement in a labor contract is: B will do what A asks him to do for the term of
the contract, in return for a given salary
Making this agreement is much easier than trying to map every task out in advance in a con-
tract that will hold up in a court of law
So agent A decides to hire agent B and a firm of nontrivial size appears, due to transaction
costs

57.2.2 A Trade-Off

Actually, we haven’t yet come to the heart of Coase’s investigation


The issue of why firms exist is a binary question: should firms have positive size or zero size?
A better and more general question is: what determines the size of firms?
The answer Coase came up with was that “a firm will tend to expand until the costs of or-
ganizing an extra transaction within the firm become equal to the costs of carrying out the
same transaction by means of an exchange on the open market…” ([27], p. 395)
But what are these internal and external costs?
In short, Coase envisaged a trade-off between

• transaction costs, which add to the expense of operating between firms, and
• diminishing returns to management, which adds to the expense of operating within
firms

We discussed an example of transaction costs above (contracts)


The other cost, diminishing returns to management, is a catch-all for the idea that big opera-
tions are increasingly costly to manage

For example, you could think of management as a pyramid, so hiring more workers to im-
plement more tasks requires expansion of the pyramid, and hence labor costs grow at a rate
more than proportional to the range of tasks
Diminishing returns to management makes in-house production expensive, favoring small
firms

57.2.3 Summary

Here’s a summary of our discussion:

• Firms grow because transaction costs encourage them to take some operations in house
• But as they get large, in-house operations become costly due to diminishing returns to
management
• The size of firms is determined by balancing these effects, thereby equalizing the
marginal costs of each form of operation

57.2.4 A Quantitative Interpretation

Coase’s ideas were expressed verbally, without any mathematics


In fact, his essay is a wonderful example of how far you can get with clear thinking and plain
English
However, plain English is not good for quantitative analysis, so let’s bring some mathematical
and computational tools to bear
In doing so we’ll add a bit more structure than Coase did, but this price will be worth paying
Our exposition is based on [77]
We use the following standard imports:

In [1]: import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

57.3 The Model

The model we study involves production of a single unit of a final good


Production requires a linearly ordered chain, requiring sequential completion of a large num-
ber of processing stages
The stages are indexed by 𝑡 ∈ [0, 1], with 𝑡 = 0 indicating that no tasks have been undertaken
and 𝑡 = 1 indicating that the good is complete

57.3.1 Subcontracting

The subcontracting scheme by which tasks are allocated across firms is illustrated in the fig-
ure below

In this example,

• Firm 1 receives a contract to sell one unit of the completed good to a final buyer
• Firm 1 then forms a contract with firm 2 to purchase the partially completed good at
stage 𝑡1 , with the intention of implementing the remaining 1 − 𝑡1 tasks in-house (i.e.,
processing from stage 𝑡1 to stage 1)
• Firm 2 repeats this procedure, forming a contract with firm 3 to purchase the good at
stage 𝑡2
• Firm 3 decides to complete the chain, selecting 𝑡3 = 0

At this point, production unfolds in the opposite direction (i.e., from upstream to down-
stream)

• Firm 3 completes processing stages from 𝑡3 = 0 up to 𝑡2 and transfers the good to firm
2
• Firm 2 then processes from 𝑡2 up to 𝑡1 and transfers the good to firm 1
• Firm 1 processes from 𝑡1 to 1 and delivers the completed good to the final buyer

The length of the interval of stages (range of tasks) carried out by firm 𝑖 is denoted by ℓ𝑖

Each firm chooses only its upstream boundary, treating its downstream boundary as given
The benefit of this formulation is that it implies a recursive structure for the decision problem
for each firm

In choosing how many processing stages to subcontract, each successive firm faces essentially
the same decision problem as the firm above it in the chain, with the only difference being
that the decision space is a subinterval of the decision space for the firm above
We will exploit this recursive structure in our study of equilibrium

57.3.2 Costs

Recall that we are considering a trade-off between two types of costs


Let’s discuss these costs and how we represent them mathematically
Diminishing returns to management means rising costs per task when a firm expands
the range of productive activities coordinated by its managers
We represent these ideas by taking the cost of carrying out ℓ tasks in-house to be 𝑐(ℓ), where
𝑐 is increasing and strictly convex
Thus, the average cost per task rises with the range of tasks performed in-house
We also assume that 𝑐 is continuously differentiable, with 𝑐(0) = 0 and 𝑐′ (0) > 0
Transaction costs are represented as a wedge between the buyer’s and seller’s prices
It matters little for us whether the transaction cost is borne by the buyer or the seller
Here we assume that the cost is borne only by the buyer
In particular, when two firms agree to a trade at face value 𝑣, the buyer’s total outlay is 𝛿𝑣,
where 𝛿 > 1
The seller receives only 𝑣, and the difference is paid to agents outside the model
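The two cost assumptions can be checked numerically. The sketch below assumes the exponential cost function 𝑐(ℓ) = exp(10ℓ) − 1 and 𝛿 = 1.05, which are the default values used in the implementation later in this lecture; the face value `v` is a hypothetical number chosen only for illustration.

```python
import numpy as np

delta = 1.05                            # transaction cost parameter, delta > 1
c = lambda ell: np.exp(10 * ell) - 1    # in-house cost of performing ell tasks

ell = np.linspace(0.01, 1, 100)         # task ranges (zero excluded to avoid 0/0)
avg_cost = c(ell) / ell                 # average cost per task

# Strict convexity together with c(0) = 0 implies rising average cost
avg_cost_is_increasing = bool(np.all(np.diff(avg_cost) > 0))

# A trade at face value v costs the buyer delta * v, while the seller receives v
v = 2.0                                 # hypothetical face value
buyer_outlay = delta * v
wedge = buyer_outlay - v                # paid to agents outside the model
```

Here `avg_cost_is_increasing` confirms that the cost per task rises with the range of tasks, and `wedge` is the part of the buyer's outlay that never reaches the seller.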

57.4 Equilibrium

We assume that all firms are ex-ante identical and act as price takers
As price takers, they face a price function 𝑝, which is a map from [0, 1] to R+ , with 𝑝(𝑡) inter-
preted as the price of the good at processing stage 𝑡
There is a countable infinity of firms indexed by 𝑖 and no barriers to entry
The cost of supplying the initial input (the good processed up to stage zero) is set to zero for
simplicity
Free entry and the infinite fringe of competitors rule out positive profits for incumbents, since
any incumbent could be replaced by a member of the competitive fringe filling the same role
in the production chain
Profits are never negative in equilibrium because firms can freely exit

57.4.1 Informal Definition of Equilibrium

An equilibrium in this setting is an allocation of firms and a price function such that

1. all active firms in the chain make zero profits, including suppliers of raw materials
2. no firm in the production chain has an incentive to deviate, and

3. no inactive firms can enter and extract positive profits

57.4.2 Formal Definition of Equilibrium

Let’s make this definition more formal


(You might like to skip this section on first reading)
An allocation of firms is a nonnegative sequence {ℓ𝑖 }𝑖∈N such that ℓ𝑖 = 0 for all sufficiently
large 𝑖
Recalling the figures above,

• ℓ𝑖 represents the range of tasks implemented by the 𝑖-th firm

As a labeling convention, we assume that firms enter in order, with firm 1 being the furthest
downstream
An allocation {ℓ𝑖 } is called feasible if ∑ 𝑖≥1 ℓ𝑖 = 1
In a feasible allocation, the entire production process is completed by finitely many firms
Given a feasible allocation, {ℓ𝑖 }, let {𝑡𝑖 } represent the corresponding transaction stages, de-
fined by

𝑡0 = 1 and 𝑡𝑖 = 𝑡𝑖−1 − ℓ𝑖 (1)

In particular, 𝑡𝑖−1 is the downstream boundary of firm 𝑖 and 𝑡𝑖 is its upstream boundary
As transaction costs are incurred only by the buyer, the profits of firm 𝑖 are

𝜋𝑖 = 𝑝(𝑡𝑖−1 ) − 𝑐(ℓ𝑖 ) − 𝛿𝑝(𝑡𝑖 ) (2)

Given a price function 𝑝 and a feasible allocation {ℓ𝑖 }, let

• {𝑡𝑖 } be the corresponding firm boundaries


• {𝜋𝑖 } be corresponding profits, as defined in Eq. (2)

This price-allocation pair is called an equilibrium for the production chain if

1. 𝑝(0) = 0,
2. 𝜋𝑖 = 0 for all 𝑖, and
3. 𝑝(𝑠) − 𝑐(𝑠 − 𝑡) − 𝛿𝑝(𝑡) ≤ 0 for any pair 𝑠, 𝑡 with 0 ≤ 𝑡 ≤ 𝑠 ≤ 1

The rationale behind these conditions was given in our informal definition of equilibrium
above
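To make the definitions concrete, here is a minimal sketch that maps an allocation into transaction stages and profits as in Eq. (1) and Eq. (2). It assumes 𝑡0 = 1 (firm 1 sells the completed good), and the function names, the two-firm allocation and the candidate price function are all hypothetical choices made for illustration.

```python
import numpy as np

def transaction_stages(ell):
    """Boundaries t_0 = 1 and t_i = t_{i-1} - ell_i for a feasible allocation."""
    ell = np.asarray(ell, dtype=float)
    return np.concatenate(([1.0], 1.0 - np.cumsum(ell)))

def profits(p, c, delta, ell):
    """Profit of each active firm: p(t_{i-1}) - c(ell_i) - delta * p(t_i)."""
    t = transaction_stages(ell)
    ell = np.asarray(ell, dtype=float)
    return p(t[:-1]) - c(ell) - delta * p(t[1:])

# Hypothetical primitives and a hypothetical two-firm allocation
c = lambda x: np.exp(10 * x) - 1
delta = 1.05
ell = [0.6, 0.4]
p_guess = c        # a candidate price function, not the equilibrium one

t = transaction_stages(ell)
pi = profits(p_guess, c, delta, ell)
```

With this candidate price function the most upstream firm earns zero profit while firm 1 earns a strictly positive profit, so the pair fails condition 2 and is not an equilibrium.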

57.5 Existence, Uniqueness and Computation of Equilibria

We have defined an equilibrium but does one exist? Is it unique? And, if so, how can we com-
pute it?

57.5.1 A Fixed Point Method

To address these questions, we introduce the operator 𝑇 mapping a nonnegative function 𝑝 on
[0, 1] to 𝑇 𝑝 via

$$ T p(s) = \min_{t \le s} \{ c(s - t) + \delta p(t) \} \quad \text{for all } s \in [0, 1] \tag{3} $$

Here and below, the restriction 0 ≤ 𝑡 in the minimum is understood


The operator 𝑇 is similar to a Bellman operator
Under this analogy, 𝑝 corresponds to a value function and 𝛿 to a discount factor
But 𝛿 > 1, so 𝑇 is not a contraction in any obvious metric, and in fact, 𝑇^𝑛 𝑝 diverges for
many choices of 𝑝
Nevertheless, there exists a domain on which 𝑇 is well-behaved: the set of convex increasing
continuous functions 𝑝 ∶ [0, 1] → R such that 𝑐′ (0)𝑠 ≤ 𝑝(𝑠) ≤ 𝑐(𝑠) for all 0 ≤ 𝑠 ≤ 1
We denote this set of functions by 𝒫
In [77] it is shown that the following statements are true:

1. 𝑇 maps 𝒫 into itself


2. 𝑇 has a unique fixed point in 𝒫, denoted below by 𝑝∗
3. For all 𝑝 ∈ 𝒫 we have 𝑇^𝑘 𝑝 → 𝑝∗ uniformly as 𝑘 → ∞
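The convergence result can be illustrated with a coarse grid version of 𝑇. The sketch below is not the implementation used later in the lecture: it restricts the minimization in Eq. (3) to grid points rather than using interpolation and a scalar minimizer, and the grid size, tolerance and parameter values are assumptions chosen for illustration.

```python
import numpy as np

def T(p, grid, c, delta):
    """
    Grid version of the operator in Eq. (3): for each s on the grid,
    minimize c(s - t) + delta * p(t) over grid points t <= s.
    """
    diff = grid[:, None] - grid[None, :]           # diff[j, i] = s_j - t_i
    cost = np.where(diff >= 0, c(diff), np.inf)    # rule out t > s
    return np.min(cost + delta * p[None, :], axis=1)

c = lambda x: np.exp(10 * x) - 1
delta = 1.05
grid = np.linspace(0, 1, 201)

p = c(grid)                    # initial condition p = c
for k in range(1000):
    new_p = T(p, grid, c, delta)
    if np.max(np.abs(new_p - p)) < 1e-10:
        break
    p = new_p
p_star = new_p                 # approximate fixed point on the grid
```

On this grid the limit `p_star` is increasing and stays between 𝑐′(0)𝑠 and 𝑐(𝑠), as membership of 𝒫 requires.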

Now consider the choice function

$$ t^*(s) := \text{the solution to } \min_{t \le s} \{ c(s - t) + \delta p^*(t) \} \tag{4} $$

By definition, 𝑡∗ (𝑠) is the cost-minimizing upstream boundary for a firm that is contracted to
deliver the good at stage 𝑠 and faces the price function 𝑝∗
Since 𝑝∗ lies in 𝒫 and since 𝑐 is strictly convex, it follows that the right-hand side of Eq. (4) is
continuous and strictly convex in 𝑡
Hence the minimizer 𝑡∗ (𝑠) exists and is uniquely defined
We can use 𝑡∗ to construct an equilibrium allocation as follows:
Since firm 1 sells the completed good at stage 𝑠 = 1, its optimal upstream boundary is
𝑡∗ (1)
Hence firm 2’s optimal upstream boundary is 𝑡∗ (𝑡∗ (1))
Continuing in this way produces the sequence {𝑡∗𝑖 } defined by

𝑡∗0 = 1 and 𝑡∗𝑖 = 𝑡∗ (𝑡∗𝑖−1 ) (5)

The sequence ends when a firm chooses to complete all remaining tasks
We label this firm (and hence the number of firms in the chain) as

𝑛∗ ∶= inf{𝑖 ∈ N ∶ 𝑡∗𝑖 = 0} (6)



The task allocation corresponding to Eq. (5) is given by ℓ𝑖∗ ∶= 𝑡∗𝑖−1 − 𝑡∗𝑖 for all 𝑖
In [77] it is shown that

1. The value 𝑛∗ in Eq. (6) is well-defined and finite,


2. the allocation {ℓ𝑖∗ } is feasible, and
3. the price function 𝑝∗ and this allocation together form an equilibrium for the produc-
tion chain

While the proofs are too long to repeat here, much of the insight can be obtained by observ-
ing that, as a fixed point of 𝑇 , the equilibrium price function must satisfy

$$ p^*(s) = \min_{t \le s} \{ c(s - t) + \delta p^*(t) \} \quad \text{for all } s \in [0, 1] \tag{7} $$

From this equation, it is clear that profits are zero for all incumbent firms

57.5.2 Marginal Conditions

We can develop some additional insights on the behavior of firms by examining marginal con-
ditions associated with the equilibrium
As a first step, let ℓ∗ (𝑠) ∶= 𝑠 − 𝑡∗ (𝑠)
This is the cost-minimizing range of in-house tasks for a firm with downstream boundary 𝑠
In [77] it is shown that 𝑡∗ and ℓ∗ are increasing and continuous, while 𝑝∗ is continuously dif-
ferentiable at all 𝑠 ∈ (0, 1) with

(𝑝∗ )′ (𝑠) = 𝑐′ (ℓ∗ (𝑠)) (8)

Eq. (8) follows from 𝑝∗ (𝑠) = min𝑡≤𝑠 {𝑐(𝑠 − 𝑡) + 𝛿𝑝∗ (𝑡)} and the envelope theorem for
derivatives
A related equation is the first order condition for 𝑝∗ (𝑠) = min𝑡≤𝑠 {𝑐(𝑠 − 𝑡) + 𝛿𝑝∗ (𝑡)}, the mini-
mization problem for a firm with downstream boundary 𝑠, which is

𝛿(𝑝∗ )′ (𝑡∗ (𝑠)) = 𝑐′ (𝑠 − 𝑡∗ (𝑠)) (9)

This condition matches the marginal condition expressed verbally by Coase that we stated
above:

“A firm will tend to expand until the costs of organizing an extra transaction
within the firm become equal to the costs of carrying out the same transaction
by means of an exchange on the open market…”

Combining Eq. (8) and Eq. (9) and evaluating at 𝑠 = 𝑡𝑖 , we see that active firms that are
adjacent satisfy

$$ \delta \, c'(\ell^*_{i+1}) = c'(\ell^*_i) \tag{10} $$

In other words, the marginal in-house cost per task at a given firm is equal to that of its up-
stream partner multiplied by the gross transaction cost 𝛿
This expression can be thought of as a Coase–Euler equation, which determines inter-firm
efficiency by indicating how two costly forms of coordination (markets and management) are
jointly minimized in equilibrium
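As a worked example of Eq. (10), suppose the cost function is exponential, 𝑐(ℓ) = exp(𝜃ℓ) − 1, so that 𝑐′(ℓ) = 𝜃 exp(𝜃ℓ). Then Eq. (10) reduces to ℓ∗𝑖+1 = ℓ∗𝑖 − ln(𝛿)/𝜃, meaning that adjacent active firms for which the first order condition holds with equality differ in size by a constant. The sketch below checks this numerically; the values 𝜃 = 10 and 𝛿 = 1.05 match the defaults used in the implementation, while the firm size `ell_i` is a hypothetical number.

```python
import numpy as np

theta, delta = 10.0, 1.05
c_prime = lambda ell: theta * np.exp(theta * ell)   # derivative of exp(theta * ell) - 1

def upstream_size(ell_i):
    """Solve delta * c'(ell_next) = c'(ell_i) for ell_next in closed form."""
    return ell_i - np.log(delta) / theta

ell_i = 0.09                        # hypothetical size of firm i
ell_next = upstream_size(ell_i)     # implied size of its upstream partner

gap = ell_i - ell_next              # equals log(delta) / theta
residual = delta * c_prime(ell_next) - c_prime(ell_i)
```

The gap is roughly 0.0049 here, so under these assumptions each firm is slightly smaller than its downstream neighbor.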

57.6 Implementation

For most specifications of primitives, there is no closed-form solution for the equilibrium as
far as we are aware
However, we know that we can compute the equilibrium corresponding to a given transaction
cost parameter 𝛿 and a cost function 𝑐 by applying the results stated above
In particular, we can

1. fix initial condition 𝑝 ∈ 𝒫,
2. iterate with 𝑇 until 𝑇^𝑛 𝑝 has converged to 𝑝∗ , and
3. recover firm choices via the choice function Eq. (4)

As we step between iterates, we will use linear interpolation of functions, as we did in our lec-
ture on optimal growth and several other places.
To begin, here’s a class to store primitives and a grid

In [2]: from scipy.optimize import fminbound
from interpolation import interp

class ProductionChain:

    def __init__(self,
                 n=1000,
                 delta=1.05,
                 c=lambda t: np.exp(10 * t) - 1):

        self.n, self.delta, self.c = n, delta, c
        self.grid = np.linspace(0, 1, n)

Now let’s implement and iterate with 𝑇 until convergence


Recalling that our initial condition must lie in 𝒫, we set 𝑝0 = 𝑐

In [3]: def compute_prices(pc, tol=1e-5, max_iter=5000):
    """
    Compute prices by iterating with T

        * pc is an instance of ProductionChain
        * The initial condition is p = c

    """
    delta, c, n, grid = pc.delta, pc.c, pc.n, pc.grid
    p = c(grid)  # Initial condition is c(s), as an array
    new_p = np.empty_like(p)
    error = tol + 1
    i = 0

    while error > tol and i < max_iter:
        # Index grid points with j so the iteration counter i is not overwritten
        for j, s in enumerate(grid):
            Tp = lambda t: delta * interp(grid, p, t) + c(s - t)
            new_p[j] = Tp(fminbound(Tp, 0, s))
        error = np.max(np.abs(p - new_p))
        p = new_p.copy()  # Copy so that p and new_p remain distinct arrays
        i = i + 1

    if i < max_iter:
        print(f"Iteration converged in {i} steps")
    else:
        print(f"Warning: iteration hit upper bound {max_iter}")

    p_func = lambda x: interp(grid, p, x)
    return p_func

The next function computes the optimal choice of upstream boundary and range of tasks imple-
mented for a firm facing price function p_function and with downstream boundary 𝑠

In [4]: def optimal_choices(pc, p_function, s):
    """
    Takes p_function as the true price function and minimizes on [0, s]

    Returns optimal upstream boundary t_star and optimal size of
    firm ell_star

    In fact, the algorithm minimizes on [-1, s] and then takes the
    max of the minimizer and zero. This gives better results
    close to zero

    """
    delta, c = pc.delta, pc.c
    f = lambda t: delta * p_function(t) + c(s - t)
    t_star = max(fminbound(f, -1, s), 0)
    ell_star = s - t_star
    return t_star, ell_star

The allocation of firms can be computed by recursively stepping through firms’ choices of
their respective upstream boundary, treating the previous firm’s upstream boundary as their
own downstream boundary
In doing so, we start with firm 1, who has downstream boundary 𝑠 = 1

In [5]: def compute_stages(pc, p_function):
    s = 1.0
    transaction_stages = [s]
    while s > 0:
        s, ell = optimal_choices(pc, p_function, s)
        transaction_stages.append(s)
    return np.array(transaction_stages)
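The array returned by compute_stages holds firm boundaries, starting at 1 and ending at 0, so it has one more entry than there are firms. The sketch below shows how firm sizes and the number of firms can be recovered from such an array; the boundary values are hypothetical and are not output from the model.

```python
import numpy as np

# Hypothetical boundaries in the format returned by compute_stages:
# the first entry is 1 (the finished good), the last is 0 (the raw input)
stages = np.array([1.0, 0.55, 0.25, 0.05, 0.0])

ell = stages[:-1] - stages[1:]     # ell_i = t_{i-1} - t_i
num_firms = len(stages) - 1        # one fewer firm than boundaries
```

In this example there are four firms whose sizes sum to one, as feasibility requires.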

Let’s try this at the default parameters


The next figure shows the equilibrium price function, as well as the boundaries of firms as
vertical lines

In [6]: pc = ProductionChain()
p_star = compute_prices(pc)

transaction_stages = compute_stages(pc, p_star)

fig, ax = plt.subplots()

ax.plot(pc.grid, p_star(pc.grid))
ax.set_xlim(0.0, 1.0)
ax.set_ylim(0.0)
for s in transaction_stages:
    ax.axvline(x=s, c="0.5")
plt.show()

Iteration converged in 1000 steps

Here’s the function ℓ∗ , which shows how large a firm with downstream boundary 𝑠 chooses to
be

In [7]: ell_star = np.empty(pc.n)

for i, s in enumerate(pc.grid):
    t, e = optimal_choices(pc, p_star, s)
    ell_star[i] = e

fig, ax = plt.subplots()
ax.plot(pc.grid, ell_star, label=r"$\ell^*$")  # raw string avoids an invalid escape
ax.legend(fontsize=14)
plt.show()

Note that downstream firms choose to be larger, a point we return to below

57.7 Exercises

57.7.1 Exercise 1

The number of firms is endogenously determined by the primitives


What do you think will happen in terms of the number of firms as 𝛿 increases? Why?
Check your intuition by computing the number of firms at delta in (1.01, 1.05, 1.1)

57.7.2 Exercise 2

The value added of firm 𝑖 is 𝑣𝑖 ∶= 𝑝∗ (𝑡𝑖−1 ) − 𝑝∗ (𝑡𝑖 )


One of the interesting predictions of the model is that value added is increasing with down-
streamness, as are several other measures of firm size
Can you give any intuition?
Try to verify this phenomenon (value added increasing with downstreamness) using the code
above

57.8 Solutions

57.8.1 Exercise 1
In [8]: for delta in (1.01, 1.05, 1.1):
    pc = ProductionChain(delta=delta)
    p_star = compute_prices(pc)
    transaction_stages = compute_stages(pc, p_star)
    num_firms = len(transaction_stages)
    print(f"When delta={delta} there are {num_firms} firms")

Iteration converged in 1000 steps


When delta=1.01 there are 46 firms
Iteration converged in 1000 steps
When delta=1.05 there are 22 firms
Iteration converged in 1000 steps
When delta=1.1 there are 16 firms

57.8.2 Exercise 2

Firm size increases with downstreamness because 𝑝∗ , the equilibrium price function, is in-
creasing and strictly convex
This means that, for a given producer, the marginal cost of the input purchased from the pro-
ducer just upstream from itself in the chain increases as we go further downstream
Hence downstream firms choose to do more in house than upstream firms — and are therefore
larger
The equilibrium price function is strictly convex due to both transaction costs and diminish-
ing returns to management
One way to put this is that firms are prevented from completely mitigating the costs associ-
ated with diminishing returns to management — which induce convexity — by transaction
costs. This is because transaction costs force firms to have nontrivial size
Here’s one way to compute and graph value added across firms

In [9]: pc = ProductionChain()
p_star = compute_prices(pc)
stages = compute_stages(pc, p_star)

va = []

for i in range(len(stages) - 1):
    va.append(p_star(stages[i]) - p_star(stages[i+1]))

fig, ax = plt.subplots()
ax.plot(va, label="value added by firm")
ax.set_xticks((5, 25))
ax.set_xticklabels(("downstream firms", "upstream firms"))
plt.show()

Iteration converged in 1000 steps


Part VIII

Recursive Models of Dynamic Linear Economies

58

Recursive Models of Dynamic Linear Economies

58.1 Contents

• A Suite of Models 58.2


• Econometrics 58.3
• Dynamic Demand Curves and Canonical Household Technologies 58.4
• Gorman Aggregation and Engel Curves 58.5
• Cattle Cycles 58.6
• Models of Occupational Choice and Pay 58.7
• Permanent Income Models 58.8
• Gorman Heterogeneous Households 58.9
• Non-Gorman Heterogeneous Households 58.10

“Mathematics is the art of giving the same name to different things” – Henri
Poincare

“Complete market economies are all alike” – Robert E. Lucas, Jr., (1989)

“Every partial equilibrium model can be reinterpreted as a general equilibrium
model.” – Anonymous

58.2 A Suite of Models

This lecture presents a class of linear-quadratic-Gaussian models of general economic equilib-
rium designed by Lars Peter Hansen and Thomas J. Sargent [59]
The class of models is implemented in a Python class DLE that is part of quantecon
Subsequent lectures use the DLE class to implement various instances that have appeared in
the economics literature


1. Growth in Dynamic Linear Economies


2. Lucas Asset Pricing using DLE
3. IRFs in Hall Model
4. Permanent Income Using the DLE class
5. Rosen schooling model
6. Cattle cycles
7. Shock Non Invertibility

58.2.1 Overview of the Models

In saying that “complete markets are all alike”, Robert E. Lucas, Jr. was noting that all of
them have

• a commodity space
• a space dual to the commodity space in which prices reside
• endowments of resources
• people’s preferences over goods
• physical technologies for transforming resources into goods
• random processes that govern shocks to technologies and preferences and associated in-
formation flows
• a single budget constraint per person
• the existence of a representative consumer even when there are many people in the
model
• a concept of competitive equilibrium
• theorems connecting competitive equilibrium allocations to allocations that would be
chosen by a benevolent social planner

The models have no frictions such as …

• Enforcement difficulties
• Information asymmetries
• Other forms of transactions costs
• Externalities

The models extensively use the powerful ideas of

• Indexing commodities and their prices by time (John R. Hicks)


• Indexing commodities and their prices by chance (Kenneth Arrow)

Much of the imperialism of complete markets models comes from applying these two tricks
The Hicks trick of indexing commodities by time is the idea that dynamics are a special
case of statics
The Arrow trick of indexing commodities by chance is the idea that analysis of trade un-
der uncertainty is a special case of the analysis of trade under certainty
The [59] class of models specifies the commodity space, preferences, technologies, stochastic
shocks and information flows in ways that allow the models to be analyzed completely using
only the tools of linear time series models and linear-quadratic optimal control described in
the two lectures Linear State Space Models and Linear Quadratic Control

There are costs and benefits associated with the simplifications and specializations needed to
make a particular model fit within the [59] class

• the costs are that linear-quadratic structures are sometimes too confining
• benefits include computational speed, simplicity, and ability to analyze many model fea-
tures analytically or nearly analytically

A variety of superficially different models are all instances of the [59] class of models

• Lucas asset pricing model


• Lucas-Prescott model of investment under uncertainty
• Asset pricing models with habit persistence
• Rosen-Topel equilibrium model of housing
• Rosen schooling models
• Rosen-Murphy-Scheinkman model of cattle cycles
• Hansen-Sargent-Tallarini model of robustness and asset pricing
• Many more …

The diversity of these models conceals an essential unity that illustrates the quotation by
Robert E. Lucas, Jr., with which we began this lecture

58.2.2 Forecasting?

A consequence of a single budget constraint per person plus the Hicks-Arrow tricks is that
households and firms need not forecast
But there exist equivalent structures called recursive competitive equilibria in which they
do appear to need to forecast
In these structures, to forecast, households and firms use:

• equilibrium pricing functions, and


• knowledge of the Markov structure of the economy’s state vector

58.2.3 Theory and Econometrics

For an application of the [59] class of models, the outcome of theorizing is a stochastic pro-
cess, i.e., a probability distribution over sequences of prices and quantities, indexed by param-
eters describing preferences, technologies, and information flows
Another name for that object is a likelihood function, a key object of both frequentist and
Bayesian statistics
There are two important uses of an equilibrium stochastic process or likelihood func-
tion
The first is to solve the direct problem
The direct problem takes as inputs values of the parameters that define preferences, tech-
nologies, and information flows and as an output characterizes or simulates random paths of
quantities and prices

The second use of an equilibrium stochastic process or likelihood function is to solve the in-
verse problem
The inverse problem takes as an input a time series sample of observations on a subset of
prices and quantities determined by the model and from them makes inferences about the
parameters that define the model’s preferences, technologies, and information flows

58.2.4 More Details

A [59] economy consists of lists of matrices that describe people’s household technologies,
their preferences over consumption services, their production technologies, and their informa-
tion sets
There are complete markets in history-contingent commodities
Competitive equilibrium allocations and prices

• satisfy equations that are easy to write down and solve


• have representations that are convenient econometrically

Different example economies manifest themselves simply as different settings for various ma-
trices
[59] use these tools:

• A theory of recursive dynamic competitive economies


• Linear optimal control theory
• Recursive methods for estimating and interpreting vector autoregressions

The models are flexible enough to express alternative senses of a representative household

• A single ‘stand-in’ household of the type used to good effect by Edward C. Prescott
• Heterogeneous households satisfying conditions for Gorman aggregation into a represen-
tative household
• Heterogeneous household technologies that violate conditions for Gorman aggregation
but are still susceptible to aggregation into a single representative household via ‘non-
Gorman’ or ‘mongrel’ aggregation

These three alternative types of aggregation have different consequences in terms of how
prices and allocations can be computed
In particular, can prices and an aggregate allocation be computed before the equilibrium allo-
cation to individual heterogeneous households is computed?

• Answers are “Yes” for Gorman aggregation, “No” for non-Gorman aggregation

In summary, the insights and practical benefits from economics to be introduced in this lec-
ture are

• Deeper understandings that come from recognizing common underlying structures



• Speed and ease of computation that comes from unleashing a common suite of Python
programs

We’ll use the following mathematical tools

• Stochastic Difference Equations (Linear)


• Duality: LQ Dynamic Programming and Linear Filtering are the same things mathe-
matically
• The Spectral Factorization Identity (for understanding vector autoregressions and non-
Gorman aggregation)

So here is our roadmap


We’ll describe sets of matrices that pin down

• Information
• Technologies
• Preferences

Then we’ll describe

• Equilibrium concept and computation


• Econometric representation and estimation

58.2.5 Stochastic Model of Information Flows and Outcomes

We’ll use stochastic linear difference equations to describe information flows and equilibrium
outcomes
The sequence {𝑤𝑡 ∶ 𝑡 = 1, 2, …} is said to be a martingale difference sequence adapted to
{𝐽𝑡 ∶ 𝑡 = 0, 1, …} if 𝐸(𝑤𝑡+1 |𝐽𝑡 ) = 0 for 𝑡 = 0, 1, …

The sequence {𝑤𝑡 ∶ 𝑡 = 1, 2, …} is said to be conditionally homoskedastic if 𝐸(𝑤𝑡+1 𝑤′𝑡+1 ∣
𝐽𝑡 ) = 𝐼 for 𝑡 = 0, 1, …
We assume that the {𝑤𝑡 ∶ 𝑡 = 1, 2, …} process is conditionally homoskedastic
Let {𝑥𝑡 ∶ 𝑡 = 1, 2, …} be a sequence of 𝑛-dimensional random vectors, i.e. an 𝑛-dimensional
stochastic process
The process {𝑥𝑡 ∶ 𝑡 = 1, 2, …} is constructed recursively using an initial random vector 𝑥0 ∼
𝒩(𝑥0̂ , Σ0 ) and a time-invariant law of motion:

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1

for 𝑡 = 0, 1, … where 𝐴 is an 𝑛 by 𝑛 matrix and 𝐶 is an 𝑛 by 𝑁 matrix


Evidently, the distribution of 𝑥𝑡+1 conditional on 𝑥𝑡 is 𝒩(𝐴𝑥𝑡 , 𝐶𝐶 ′ )
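A path of the state can be simulated directly from the law of motion. The sketch below assumes Gaussian shocks 𝑤𝑡+1 ∼ 𝒩(0, 𝐼), which is one process satisfying the martingale difference and conditional homoskedasticity conditions; the function name and the matrices are hypothetical.

```python
import numpy as np

def simulate_state(A, C, x0, periods, seed=0):
    """Simulate x_{t+1} = A x_t + C w_{t+1} with w_{t+1} ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    n, N = C.shape
    x = np.empty((periods + 1, n))
    x[0] = x0
    for t in range(periods):
        w = rng.standard_normal(N)     # shock vector of dimension N
        x[t + 1] = A @ x[t] + C @ w
    return x

# Hypothetical 2-dimensional state driven by a scalar shock
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])
C = np.array([[1.0],
              [0.5]])
x_path = simulate_state(A, C, x0=np.zeros(2), periods=50)
```

Each row of `x_path` is one realization of 𝑥𝑡, starting from the initial condition in the first row.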

58.2.6 Information Sets

Let 𝐽0 be generated by 𝑥0 and 𝐽𝑡 be generated by 𝑥0 , 𝑤1 , … , 𝑤𝑡 , which means that 𝐽𝑡 consists
of the set of all measurable functions of {𝑥0 , 𝑤1 , … , 𝑤𝑡 }

58.2.7 Prediction Theory

The optimal forecast of 𝑥𝑡+1 given current information is

𝐸(𝑥𝑡+1 ∣ 𝐽𝑡 ) = 𝐴𝑥𝑡

and the one-step-ahead forecast error is

𝑥𝑡+1 − 𝐸(𝑥𝑡+1 ∣ 𝐽𝑡 ) = 𝐶𝑤𝑡+1

The covariance matrix of 𝑥𝑡+1 conditioned on 𝐽𝑡 is

𝐸(𝑥𝑡+1 − 𝐸(𝑥𝑡+1 ∣ 𝐽𝑡 ))(𝑥𝑡+1 − 𝐸(𝑥𝑡+1 ∣ 𝐽𝑡 ))′ = 𝐶𝐶 ′

A nonrecursive expression for 𝑥𝑡 as a function of 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 is

$$
\begin{aligned}
x_t &= A x_{t-1} + C w_t \\
    &= A^2 x_{t-2} + A C w_{t-1} + C w_t \\
    &= \left[ \sum_{\tau=0}^{t-1} A^\tau C w_{t-\tau} \right] + A^t x_0
\end{aligned}
$$

Shift forward in time:

$$ x_{t+j} = \sum_{s=0}^{j-1} A^s C w_{t+j-s} + A^j x_t $$

Projecting on the information set {𝑥0 , 𝑤𝑡 , 𝑤𝑡−1 , … , 𝑤1 } gives

𝐸𝑡 𝑥𝑡+𝑗 = 𝐴𝑗 𝑥𝑡

where 𝐸𝑡 (⋅) ≡ 𝐸[(⋅) ∣ 𝑥0 , 𝑤𝑡 , 𝑤𝑡−1 , … , 𝑤1 ] = 𝐸[(⋅) ∣ 𝐽𝑡 ], and 𝑥𝑡 is in 𝐽𝑡
It is useful to obtain the covariance matrix of the 𝑗-step-ahead prediction error

$$ x_{t+j} - E_t x_{t+j} = \sum_{s=0}^{j-1} A^s C w_{t-s+j} $$

Evidently,

$$ E_t (x_{t+j} - E_t x_{t+j})(x_{t+j} - E_t x_{t+j})' = \sum_{k=0}^{j-1} A^k C C' (A^k)' \equiv v_j $$

𝑣𝑗 can be calculated recursively via

𝑣1 = 𝐶𝐶 ′
𝑣𝑗 = 𝐶𝐶 ′ + 𝐴𝑣𝑗−1 𝐴′ , 𝑗≥2
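The recursion can be checked against the direct sum. In the sketch below the function name and the matrices 𝐴 and 𝐶 are hypothetical choices made for illustration.

```python
import numpy as np

def forecast_error_cov(A, C, j):
    """Covariance v_j of the j-step-ahead prediction error, computed recursively."""
    CC = C @ C.T
    v = CC                          # v_1 = C C'
    for _ in range(j - 1):
        v = CC + A @ v @ A.T        # v_j = C C' + A v_{j-1} A'
    return v

# Hypothetical matrices
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])
C = np.array([[1.0, 0.0],
              [0.5, 0.2]])
j = 4

v_recursive = forecast_error_cov(A, C, j)

# Direct evaluation of sum_{k=0}^{j-1} A^k C C' (A^k)' for comparison
v_direct = sum(np.linalg.matrix_power(A, k) @ C @ C.T @ np.linalg.matrix_power(A, k).T
               for k in range(j))
```

The two calculations agree up to rounding, and the recursive version avoids recomputing matrix powers at each horizon.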

58.2.8 Orthogonal Decomposition

To decompose these covariances into parts attributable to the individual components of 𝑤𝑡 ,
we let 𝑖𝜏 be an 𝑁 -dimensional column vector of zeroes except in position 𝜏 , where there is a
one. Define a matrix 𝜐𝑗,𝜏

$$ \upsilon_{j,\tau} = \sum_{k=0}^{j-1} A^k C i_\tau i_\tau' C' (A^k)' $$

Note that ∑_{𝜏=1}^{𝑁} 𝑖𝜏 𝑖′𝜏 = 𝐼, so that we have

$$ \sum_{\tau=1}^{N} \upsilon_{j,\tau} = v_j $$

Evidently, the matrices {𝜐𝑗,𝜏 , 𝜏 = 1, … , 𝑁 } give an orthogonal decomposition of the covari-
ance matrix of 𝑗-step-ahead prediction errors into the parts attributable to each of the com-
ponents 𝜏 = 1, … , 𝑁
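A short numerical check of this decomposition follows. The function name and the matrices are hypothetical, and shock components are indexed from zero to follow Python conventions.

```python
import numpy as np

def shock_contribution(A, C, j, tau):
    """Part of the j-step-ahead error covariance due to shock component tau (0-based)."""
    N = C.shape[1]
    i_tau = np.zeros((N, 1))
    i_tau[tau] = 1.0                # selector vector for component tau
    total = np.zeros((A.shape[0], A.shape[0]))
    for k in range(j):
        Ak = np.linalg.matrix_power(A, k)
        total += Ak @ C @ i_tau @ i_tau.T @ C.T @ Ak.T
    return total

# Hypothetical matrices
A = np.array([[0.9, 0.1],
              [0.0, 0.5]])
C = np.array([[1.0, 0.0],
              [0.5, 0.2]])
j = 4

parts = [shock_contribution(A, C, j, tau) for tau in range(C.shape[1])]

# Full covariance of the j-step-ahead prediction error
v_j = sum(np.linalg.matrix_power(A, k) @ C @ C.T @ np.linalg.matrix_power(A, k).T
          for k in range(j))
```

Summing the entries of `parts` recovers `v_j`, so the diagonal of each part gives the share of forecast error variance attributable to one shock.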

58.2.9 Taste and Technology Shocks

𝐸(𝑤𝑡 ∣ 𝐽𝑡−1 ) = 0 and 𝐸(𝑤𝑡 𝑤𝑡′ ∣ 𝐽𝑡−1 ) = 𝐼 for 𝑡 = 1, 2, …

𝑏𝑡 = 𝑈𝑏 𝑧𝑡 and 𝑑𝑡 = 𝑈𝑑 𝑧𝑡 ,

𝑈𝑏 and 𝑈𝑑 are matrices that select entries of 𝑧𝑡 . The law of motion for {𝑧𝑡 ∶ 𝑡 = 0, 1, …} is

𝑧𝑡+1 = 𝐴22 𝑧𝑡 + 𝐶2 𝑤𝑡+1 for 𝑡 = 0, 1, …

where 𝑧0 is a given initial condition. The eigenvalues of the matrix 𝐴22 have absolute values
that are less than or equal to one
Thus, in summary, our model of information and shocks is

𝑧𝑡+1 = 𝐴22 𝑧𝑡 + 𝐶2 𝑤𝑡+1
𝑏𝑡 = 𝑈𝑏 𝑧𝑡
𝑑𝑡 = 𝑈𝑑 𝑧𝑡
We can now briefly summarize other components of our economies, in particular

• Production technologies
• Household technologies
• Household preferences

Production Technology
Where 𝑐𝑡 is a vector of consumption rates, 𝑘𝑡 is a vector of physical capital goods, 𝑔𝑡 is a vec-
tor of intermediate production goods, 𝑑𝑡 is a vector of technology shocks, the production tech-
nology is

Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
𝑔𝑡 ⋅ 𝑔𝑡 = ℓ𝑡²

Here Φ𝑐 , Φ𝑔 , Φ𝑖 , Γ, Δ𝑘 , Θ𝑘 are all matrices conformable to the vectors they multiply and ℓ𝑡 is a
disutility generating resource supplied by the household
For technical reasons that facilitate computations, we make the following
Assumption: [Φ𝑐 Φ𝑔 ] is nonsingular
Household Technology
Households confront a technology that allows them to devote consumption goods to construct
a vector ℎ𝑡 of household capital goods and a vector 𝑠𝑡 of utility generating household services

𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡

where Λ, Π, Δℎ , Θℎ are matrices that pin down the household technology


We make the following
Assumption: The absolute values of the eigenvalues of Δℎ are less than or equal to one
Below, we’ll outline further assumptions that we shall occasionally impose
Preferences
Where 𝑏𝑡 is a stochastic process of preference shocks that will play the role of demand
shifters, the representative household orders stochastic processes of consumption services 𝑠𝑡
according to


$$ -\left(\frac{1}{2}\right) E \sum_{t=0}^{\infty} \beta^t \left[ (s_t - b_t) \cdot (s_t - b_t) + \ell_t^2 \right] \Big| J_0 , \quad 0 < \beta < 1 $$

We now proceed to give examples of production and household technologies that appear in
various models in the literature
First, we give examples of production technologies

Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡

∣ 𝑔𝑡 ∣≤ ℓ𝑡

so we’ll be looking for specifications of the matrices Φ𝑐 , Φ𝑔 , Φ𝑖 , Γ, Δ𝑘 , Θ𝑘 that define them

58.2.10 Endowment Economy

There is a single consumption good that cannot be stored over time


In time period 𝑡, there is an endowment 𝑑𝑡 of this single good
There is neither a capital stock, nor an intermediate good, nor a rate of investment

So 𝑐𝑡 = 𝑑𝑡
To implement this specification, we can choose 𝐴22 , 𝐶2 , and 𝑈𝑑 to make 𝑑𝑡 follow any of a
variety of stochastic processes
To satisfy our earlier rank assumption, we set:

𝑐𝑡 + 𝑖𝑡 = 𝑑1𝑡

𝑔𝑡 = 𝜙1 𝑖𝑡

where 𝜙1 is a small positive number


To implement this version, we set Δ𝑘 = Θ𝑘 = 0 and

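$$ \Phi_c = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \Phi_i = \begin{bmatrix} 1 \\ \phi_1 \end{bmatrix}, \quad \Phi_g = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad \Gamma = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad d_t = \begin{bmatrix} d_{1t} \\ 0 \end{bmatrix} $$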

We can use this specification to create a linear-quadratic version of Lucas’s (1978) asset pric-
ing model
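The endowment-economy matrices can be checked numerically. The following is a minimal sketch, assuming scalar 𝑐𝑡 , 𝑔𝑡 , 𝑖𝑡 and an illustrative value for 𝜙1 ; the helper `solve_c_g` is a hypothetical name introduced here, not part of any library.

```python
import numpy as np

phi_1 = 1e-5  # "small positive number"; the value is an illustrative assumption

# Endowment-economy matrices (two resource constraints; scalar c, g, i)
Phi_c = np.array([[1.0], [0.0]])
Phi_g = np.array([[0.0], [-1.0]])
Phi_i = np.array([[1.0], [phi_1]])
Gamma = np.array([[0.0], [0.0]])

# Rank assumption from the text: [Phi_c  Phi_g] must be nonsingular
Phi_cg = np.hstack([Phi_c, Phi_g])
rank_ok = np.linalg.matrix_rank(Phi_cg) == 2

def solve_c_g(i_t, k_lag, d_t):
    """Solve Phi_c c + Phi_g g = Gamma k_{t-1} + d_t - Phi_i i_t for (c_t, g_t)."""
    rhs = Gamma * k_lag + d_t - Phi_i * i_t
    c_t, g_t = np.linalg.solve(Phi_cg, rhs).ravel()
    return c_t, g_t

d_t = np.array([[2.0], [0.0]])          # d_t = (d_1t, 0)'
c_t, g_t = solve_c_g(i_t=0.5, k_lag=0.0, d_t=d_t)
print(rank_ok, c_t, g_t)                # c_t = d_1t - i_t and g_t = phi_1 * i_t
```

Because the rank condition holds, 𝑐𝑡 and 𝑔𝑡 are pinned down by the state and the choice of 𝑖𝑡 , which is what lets the planning problem below be written with 𝑖𝑡 as the only control.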

58.2.11 Single-Period Adjustment Costs

There is a single consumption good, a single intermediate good, and a single investment good
The technology is described by

𝑐𝑡 = 𝛾𝑘𝑡−1 + 𝑑1𝑡 , 𝛾 > 0


𝜙1 𝑖𝑡 = 𝑔𝑡 + 𝑑2𝑡 , 𝜙1 > 0
ℓ𝑡2 = 𝑔𝑡2
𝑘𝑡 = 𝛿𝑘 𝑘𝑡−1 + 𝑖𝑡 , 0 < 𝛿𝑘 < 1

Set

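$$ \Phi_c = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \Phi_g = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad \Phi_i = \begin{bmatrix} 0 \\ \phi_1 \end{bmatrix} $$

$$ \Gamma = \begin{bmatrix} \gamma \\ 0 \end{bmatrix}, \quad \Delta_k = \delta_k, \quad \Theta_k = 1 $$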

We set 𝐴22 , 𝐶2 and 𝑈𝑑 to make (𝑑1𝑡 , 𝑑2𝑡 )′ = 𝑑𝑡 follow a desired stochastic process
Now we describe some examples of preferences, which as we have seen are ordered by


$$ -\left(\frac{1}{2}\right) E \sum_{t=0}^\infty \beta^t \left[(s_t - b_t) \cdot (s_t - b_t) + (\ell_t)^2\right] \Bigm| J_0, \quad 0 < \beta < 1 $$

where household services are produced via the household technology

ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡

𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡

and we make
Assumption: The absolute values of the eigenvalues of Δℎ are less than or equal to one
Later we shall introduce canonical household technologies that satisfy an ‘invertibility’ re-
quirement relating sequences {𝑠𝑡 } of services and {𝑐𝑡 } of consumption flows
And we’ll describe how to obtain a canonical representation of a household technology from
one that is not canonical
Here are some examples of household preferences
Time Separable preferences

$$ -\frac{1}{2} E \sum_{t=0}^\infty \beta^t \left[(c_t - b_t)^2 + \ell_t^2\right] \Bigm| J_0, \quad 0 < \beta < 1 $$

Consumer Durables

ℎ𝑡 = 𝛿ℎ ℎ𝑡−1 + 𝑐𝑡 , 0 < 𝛿ℎ < 1

Services at 𝑡 are related to the stock of durables at the beginning of the period:

𝑠𝑡 = 𝜆ℎ𝑡−1 , 𝜆 > 0

Preferences are ordered by

$$ -\frac{1}{2} E \sum_{t=0}^\infty \beta^t \left[(\lambda h_{t-1} - b_t)^2 + \ell_t^2\right] \Bigm| J_0 $$

Set Δℎ = 𝛿ℎ , Θℎ = 1, Λ = 𝜆, Π = 0
Habit Persistence

$$ -\left(\frac{1}{2}\right) E \sum_{t=0}^\infty \beta^t \Bigl[\bigl(c_t - \lambda(1-\delta_h) \sum_{j=0}^\infty \delta_h^j c_{t-j-1} - b_t\bigr)^2 + \ell_t^2\Bigr] \Bigm| J_0 $$

$$ 0 < \beta < 1, \quad 0 < \delta_h < 1, \quad \lambda > 0 $$

Here the effective bliss point $b_t + \lambda(1-\delta_h) \sum_{j=0}^\infty \delta_h^j c_{t-j-1}$ shifts in response to a moving
average of past consumption
Initial Conditions

Preferences of this form require an initial condition for the geometric sum $\sum_{j=0}^\infty \delta_h^j c_{t-j-1}$ that
we specify as an initial condition for the ‘stock of household durables,’ ℎ−1
Set

ℎ𝑡 = 𝛿ℎ ℎ𝑡−1 + (1 − 𝛿ℎ )𝑐𝑡 , 0 < 𝛿ℎ < 1


$$ h_t = (1-\delta_h) \sum_{j=0}^{t} \delta_h^j c_{t-j} + \delta_h^{t+1} h_{-1} $$

𝑠𝑡 = −𝜆ℎ𝑡−1 + 𝑐𝑡 , 𝜆 > 0

To implement, set Λ = −𝜆, Π = 1, Δℎ = 𝛿ℎ , Θℎ = 1 − 𝛿ℎ
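The habit-persistence specification can be simulated directly from the household technology. The sketch below is an illustration under assumed parameter values and an arbitrary consumption path; it iterates ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡 and 𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡 with Λ = −𝜆, Π = 1, Δℎ = 𝛿ℎ , Θℎ = 1 − 𝛿ℎ and compares the final stock with the closed-form geometric sum.

```python
import numpy as np

delta_h, lam = 0.9, 0.5          # illustrative values, not taken from the text
h_init = 1.0                     # initial condition h_{-1}
rng = np.random.default_rng(0)
c = rng.uniform(1.0, 2.0, size=20)   # an arbitrary consumption path c_0, ..., c_T

h = np.empty_like(c)
s = np.empty_like(c)
h_lag = h_init
for t, c_t in enumerate(c):
    s[t] = -lam * h_lag + c_t                      # s_t = Lambda h_{t-1} + Pi c_t
    h[t] = delta_h * h_lag + (1 - delta_h) * c_t   # h_t = Delta_h h_{t-1} + Theta_h c_t
    h_lag = h[t]

# Closed form: h_T = (1 - delta_h) sum_{j=0}^T delta_h^j c_{T-j} + delta_h^{T+1} h_{-1}
T = len(c) - 1
j = np.arange(T + 1)
h_closed = (1 - delta_h) * np.sum(delta_h**j * c[T - j]) + delta_h**(T + 1) * h_init
print(h[-1], h_closed)
```

The recursion and the closed form agree up to floating-point error, which confirms that ℎ−1 summarizes the whole pre-sample history of consumption.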


Seasonal Habit Persistence

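$$ -\left(\frac{1}{2}\right) E \sum_{t=0}^\infty \beta^t \Bigl[\bigl(c_t - \lambda(1-\delta_h) \sum_{j=0}^\infty \delta_h^j c_{t-4j-4} - b_t\bigr)^2 + \ell_t^2\Bigr] $$

$$ 0 < \beta < 1, \quad 0 < \delta_h < 1, \quad \lambda > 0 $$

Here the effective bliss point $b_t + \lambda(1-\delta_h) \sum_{j=0}^\infty \delta_h^j c_{t-4j-4}$ shifts in response to a moving
average of past consumptions of the same quarter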
To implement, set

ℎ̃ 𝑡 = 𝛿ℎ ℎ̃ 𝑡−4 + (1 − 𝛿ℎ )𝑐𝑡 , 0 < 𝛿ℎ < 1

This implies that

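$$ h_t = \begin{bmatrix} \tilde h_t \\ \tilde h_{t-1} \\ \tilde h_{t-2} \\ \tilde h_{t-3} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & \delta_h \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} \tilde h_{t-1} \\ \tilde h_{t-2} \\ \tilde h_{t-3} \\ \tilde h_{t-4} \end{bmatrix} + \begin{bmatrix} (1-\delta_h) \\ 0 \\ 0 \\ 0 \end{bmatrix} c_t $$

with consumption services

$$ s_t = -\begin{bmatrix} 0 & 0 & 0 & \lambda \end{bmatrix} h_{t-1} + c_t, \quad \lambda > 0 $$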

Adjustment Costs
Recall


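$$ -\left(\frac{1}{2}\right) E \sum_{t=0}^\infty \beta^t \left[(c_t - b_{1t})^2 + \lambda^2 (c_t - c_{t-1})^2 + \ell_t^2\right] \Bigm| J_0 $$

$$ 0 < \beta < 1, \quad \lambda > 0 $$

To capture adjustment costs, set

$$ h_t = c_t $$

$$ s_t = \begin{bmatrix} 0 \\ -\lambda \end{bmatrix} h_{t-1} + \begin{bmatrix} 1 \\ \lambda \end{bmatrix} c_t $$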

so that

𝑠1𝑡 = 𝑐𝑡

𝑠2𝑡 = 𝜆(𝑐𝑡 − 𝑐𝑡−1 )

We set the first component 𝑏1𝑡 of 𝑏𝑡 to capture the stochastic bliss process and set the second
component identically equal to zero.
Thus, we set Δℎ = 0, Θℎ = 1

$$ \Lambda = \begin{bmatrix} 0 \\ -\lambda \end{bmatrix}, \quad \Pi = \begin{bmatrix} 1 \\ \lambda \end{bmatrix} $$

Multiple Consumption Goods

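$$ \Lambda = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \text{and} \quad \Pi = \begin{bmatrix} \pi_1 & 0 \\ \pi_2 & \pi_3 \end{bmatrix} $$

Period 𝑡 utility is

$$ -\frac{1}{2} \beta^t (\Pi c_t - b_t)' (\Pi c_t - b_t) $$

so the vector of marginal utilities of consumption is

$$ mu_t = -\beta^t [\Pi' \Pi \, c_t - \Pi' b_t] $$

Solving for 𝑐𝑡 gives

$$ c_t = -(\Pi' \Pi)^{-1} \beta^{-t} mu_t + (\Pi' \Pi)^{-1} \Pi' b_t $$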

This is called the Frisch demand function for consumption


We can think of the vector 𝑚𝑢𝑡 as playing the role of prices, up to a common factor, for all
dates and states
The scale factor is determined by the choice of numeraire
Notions of substitutes and complements can be defined in terms of these Frisch demand
functions
Two goods can be said to be substitutes if the cross-price effect is positive and to be com-
plements if this effect is negative
Hence this classification is determined by the off-diagonal element of −(Π′ Π)−1 , which is
equal to 𝜋2 𝜋3 / det(Π′ Π)
If 𝜋2 and 𝜋3 have the same sign, the goods are substitutes
If they have opposite signs, the goods are complements
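The sign rule for substitutes and complements can be confirmed numerically. This is a small sketch with illustrative values for 𝜋1 , 𝜋2 , 𝜋3 ; the function name `frisch_cross_effect` is introduced here only for the example.

```python
import numpy as np

def frisch_cross_effect(pi_1, pi_2, pi_3):
    """Return the off-diagonal element of -(Pi'Pi)^{-1} and the formula pi_2 pi_3 / det(Pi'Pi)."""
    Pi = np.array([[pi_1, 0.0], [pi_2, pi_3]])
    PtP = Pi.T @ Pi
    slope = -np.linalg.inv(PtP)          # coefficient on beta^{-t} mu_t in the Frisch demand
    return slope[0, 1], pi_2 * pi_3 / np.linalg.det(PtP)

same_sign = frisch_cross_effect(1.0, 0.5, 2.0)       # pi_2, pi_3 same sign: substitutes
opposite_sign = frisch_cross_effect(1.0, -0.5, 2.0)  # opposite signs: complements
print(same_sign, opposite_sign)
```

In each pair the two numbers coincide, and the sign of the cross effect flips with the sign of 𝜋2 𝜋3 .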
To summarize, our economic structure consists of the matrices that define the following com-
ponents:
Information and shocks

𝑧𝑡+1 = 𝐴22 𝑧𝑡 + 𝐶2 𝑤𝑡+1


𝑏𝑡 = 𝑈𝑏 𝑧𝑡
𝑑𝑡 = 𝑈 𝑑 𝑧𝑡

Production Technology

Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡
𝑔𝑡 ⋅ 𝑔𝑡 = ℓ𝑡2

Household Technology

𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡

Preferences


$$ -\left(\frac{1}{2}\right) E \sum_{t=0}^\infty \beta^t \left[(s_t - b_t) \cdot (s_t - b_t) + \ell_t^2\right] \Bigm| J_0, \quad 0 < \beta < 1 $$

Next steps: we move on to discuss two closely connected concepts

• A Planning Problem or Optimal Resource Allocation Problem


• Competitive Equilibrium

58.2.12 Optimal Resource Allocation

Imagine a planner who chooses sequences $\{c_t, i_t, g_t\}_{t=0}^\infty$ to maximize

$$ -(1/2) E \sum_{t=0}^\infty \beta^t \left[(s_t - b_t) \cdot (s_t - b_t) + g_t \cdot g_t\right] \Bigm| J_0 $$

subject to the constraints

Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡 ,
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡 ,
ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡 ,
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡 ,
𝑧𝑡+1 = 𝐴22 𝑧𝑡 + 𝐶2 𝑤𝑡+1 , 𝑏𝑡 = 𝑈𝑏 𝑧𝑡 , and 𝑑𝑡 = 𝑈𝑑 𝑧𝑡

and initial conditions for ℎ−1 , 𝑘−1 , and 𝑧0


Throughout, we shall impose the following square summability conditions

$$ E \sum_{t=0}^\infty \beta^t h_t \cdot h_t \mid J_0 < \infty \quad \text{and} \quad E \sum_{t=0}^\infty \beta^t k_t \cdot k_t \mid J_0 < \infty $$

Define:

$$ L_0^2 = \Bigl[\{y_t\} : y_t \text{ is a random variable in } J_t \text{ and } E \sum_{t=0}^\infty \beta^t y_t^2 \mid J_0 < +\infty\Bigr] $$

Thus, we require that each component of ℎ𝑡 and each component of 𝑘𝑡 belong to 𝐿20
We shall compare and utilize two approaches to solving the planning problem

• Lagrangian
• Dynamic Programming

58.2.13 Lagrangian Formulation

Form the Lagrangian


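$$ \begin{aligned} \mathcal{L} = -E \sum_{t=0}^\infty \beta^t \Bigl[ & \left(\frac{1}{2}\right) [(s_t - b_t) \cdot (s_t - b_t) + g_t \cdot g_t] \\ & + M_t^{d\prime} \cdot (\Phi_c c_t + \Phi_g g_t + \Phi_i i_t - \Gamma k_{t-1} - d_t) \\ & + M_t^{k\prime} \cdot (k_t - \Delta_k k_{t-1} - \Theta_k i_t) \\ & + M_t^{h\prime} \cdot (h_t - \Delta_h h_{t-1} - \Theta_h c_t) \\ & + M_t^{s\prime} \cdot (s_t - \Lambda h_{t-1} - \Pi c_t) \Bigr] \Bigm| J_0 \end{aligned} $$

The planner maximizes ℒ with respect to the quantities $\{c_t, i_t, g_t\}_{t=0}^\infty$ and minimizes with
respect to the Lagrange multipliers $M_t^d, M_t^k, M_t^h, M_t^s$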
First-order necessary conditions for maximization with respect to 𝑐𝑡 , 𝑔𝑡 , ℎ𝑡 , 𝑖𝑡 , 𝑘𝑡 , and 𝑠𝑡 , re-
spectively, are:

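$$ \begin{aligned} -\Phi_c' M_t^d + \Theta_h' M_t^h + \Pi' M_t^s &= 0, \\ -g_t - \Phi_g' M_t^d &= 0, \\ -M_t^h + \beta E(\Delta_h' M_{t+1}^h + \Lambda' M_{t+1}^s) \mid J_t &= 0, \\ -\Phi_i' M_t^d + \Theta_k' M_t^k &= 0, \\ -M_t^k + \beta E(\Delta_k' M_{t+1}^k + \Gamma' M_{t+1}^d) \mid J_t &= 0, \\ -s_t + b_t - M_t^s &= 0 \end{aligned} $$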

for 𝑡 = 0, 1, …
In addition, we have the complementary slackness conditions (these recover the original tran-
sition equations) and also transversality conditions

$$ \lim_{t \to \infty} \beta^t E[M_t^{k\prime} k_t] \mid J_0 = 0 $$

$$ \lim_{t \to \infty} \beta^t E[M_t^{h\prime} h_t] \mid J_0 = 0 $$

The system formed by the FONCs and the transition equations can be handed over to
Python
Python will solve the planning problem for fixed parameter values
Here are the Python Ready Equations
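$$ \begin{aligned} -\Phi_c' M_t^d + \Theta_h' M_t^h + \Pi' M_t^s &= 0, \\ -g_t - \Phi_g' M_t^d &= 0, \\ -M_t^h + \beta E(\Delta_h' M_{t+1}^h + \Lambda' M_{t+1}^s) \mid J_t &= 0, \\ -\Phi_i' M_t^d + \Theta_k' M_t^k &= 0, \\ -M_t^k + \beta E(\Delta_k' M_{t+1}^k + \Gamma' M_{t+1}^d) \mid J_t &= 0, \\ -s_t + b_t - M_t^s &= 0 \\ \Phi_c c_t + \Phi_g g_t + \Phi_i i_t &= \Gamma k_{t-1} + d_t, \\ k_t &= \Delta_k k_{t-1} + \Theta_k i_t, \\ h_t &= \Delta_h h_{t-1} + \Theta_h c_t, \\ s_t &= \Lambda h_{t-1} + \Pi c_t, \\ z_{t+1} = A_{22} z_t + C_2 w_{t+1}, \quad b_t &= U_b z_t, \quad \text{and} \quad d_t = U_d z_t \end{aligned} $$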

The Lagrange multipliers or shadow prices satisfy

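$$ M_t^s = b_t - s_t $$

$$ M_t^h = E\Bigl[\sum_{\tau=1}^\infty \beta^\tau (\Delta_h')^{\tau-1} \Lambda' M_{t+\tau}^s \Bigm| J_t\Bigr] $$

$$ M_t^d = \begin{bmatrix} \Phi_c' \\ \Phi_g' \end{bmatrix}^{-1} \begin{bmatrix} \Theta_h' M_t^h + \Pi' M_t^s \\ -g_t \end{bmatrix} $$

$$ M_t^k = E\Bigl[\sum_{\tau=1}^\infty \beta^\tau (\Delta_k')^{\tau-1} \Gamma' M_{t+\tau}^d \Bigm| J_t\Bigr] $$

$$ M_t^i = \Theta_k' M_t^k $$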

Although it is possible to use matrix operator methods to solve the above Python ready
equations, that is not the approach we’ll use
Instead, we’ll use dynamic programming to get recursive representations for both quantities
and shadow prices

58.2.14 Dynamic Programming

Dynamic Programming always starts with the word let


Thus, let 𝑉 (𝑥0 ) be the optimal value function for the
planning problem as a function of the initial state vector 𝑥0
(Thus, in essence, dynamic programming amounts to an application of a guess and verify
method in which we begin with a guess about the answer to the problem we want to solve.
That’s why we start with let 𝑉 (𝑥0 ) be the (value of the) answer to the problem, then estab-
lish and verify a bunch of conditions 𝑉 (𝑥0 ) has to satisfy if indeed it is the answer)
The optimal value function 𝑉 (𝑥) satisfies the Bellman equation
$$ V(x_0) = \max_{c_0, i_0, g_0} \bigl[-.5[(s_0 - b_0) \cdot (s_0 - b_0) + g_0 \cdot g_0] + \beta E V(x_1)\bigr] $$

subject to the linear constraints

Φ𝑐 𝑐0 + Φ𝑔 𝑔0 + Φ𝑖 𝑖0 = Γ𝑘−1 + 𝑑0 ,
𝑘0 = Δ𝑘 𝑘−1 + Θ𝑘 𝑖0 ,
ℎ0 = Δℎ ℎ−1 + Θℎ 𝑐0 ,
𝑠0 = Λℎ−1 + Π𝑐0 ,
𝑧1 = 𝐴22 𝑧0 + 𝐶2 𝑤1 , 𝑏0 = 𝑈𝑏 𝑧0 and 𝑑0 = 𝑈𝑑 𝑧0

𝑉 (𝑥) = 𝑥′ 𝑃 𝑥 + 𝜌

This is a version of the following linear-quadratic dynamic programming problem


Choose a contingency plan for $\{x_{t+1}, u_t\}_{t=0}^\infty$ to maximize

$$ -E \sum_{t=0}^\infty \beta^t [x_t' R x_t + u_t' Q u_t + 2 u_t' W' x_t], \quad 0 < \beta < 1 $$

subject to

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡 + 𝐶𝑤𝑡+1 , 𝑡 ≥ 0

where 𝑥0 is given; 𝑥𝑡 is an 𝑛 × 1 vector of state variables, and 𝑢𝑡 is a 𝑘 × 1 vector of control
variables
We assume 𝑤𝑡+1 is a martingale difference sequence with 𝐸𝑤𝑡 𝑤𝑡′ = 𝐼, and that 𝐶 is a matrix
conformable to 𝑥 and 𝑤
The optimal value function 𝑉 (𝑥) satisfies the Bellman equation

$$ V(x_t) = \max_{u_t} \{-(x_t' R x_t + u_t' Q u_t + 2 u_t' W' x_t) + \beta E_t V(x_{t+1})\} $$

where maximization is subject to

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑢𝑡 + 𝐶𝑤𝑡+1 , 𝑡 ≥ 0

𝑉 (𝑥𝑡 ) = −𝑥′𝑡 𝑃 𝑥𝑡 − 𝜌

𝑃 satisfies

𝑃 = 𝑅 + 𝛽𝐴′ 𝑃 𝐴 − (𝛽𝐴′ 𝑃 𝐵 + 𝑊 )(𝑄 + 𝛽𝐵′ 𝑃 𝐵)−1 (𝛽𝐵′ 𝑃 𝐴 + 𝑊 ′ )

This equation in 𝑃 is called the algebraic matrix Riccati equation


The optimal decision rule is 𝑢𝑡 = −𝐹 𝑥𝑡 , where

𝐹 = (𝑄 + 𝛽𝐵′ 𝑃 𝐵)−1 (𝛽𝐵′ 𝑃 𝐴 + 𝑊 ′ )

The optimum decision rule for 𝑢𝑡 is independent of the parameters 𝐶, and so of the noise
statistics
Iterating on the Bellman operator leads to

$$ V_{j+1}(x_t) = \max_{u_t} \{-(x_t' R x_t + u_t' Q u_t + 2 u_t' W' x_t) + \beta E_t V_j(x_{t+1})\} $$

$$ V_j(x_t) = -x_t' P_j x_t - \rho_j $$

where 𝑃𝑗 and 𝜌𝑗 satisfy the equations

$$ P_{j+1} = R + \beta A' P_j A - (\beta A' P_j B + W)(Q + \beta B' P_j B)^{-1} (\beta B' P_j A + W') $$

$$ \rho_{j+1} = \beta \rho_j + \beta \, \text{trace} \, P_j C C' $$
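The iteration on 𝑃𝑗 and 𝜌𝑗 translates almost line by line into NumPy. The following is a minimal sketch, assuming 𝑊 is stored as an 𝑛 × 𝑘 array (so that the cross term is 2𝑢′𝑊′𝑥) and using small illustrative matrices that are not taken from the text; `solve_lq_by_iteration` is a hypothetical helper name, and production work would more likely rely on a dedicated Riccati solver.

```python
import numpy as np

def solve_lq_by_iteration(A, B, C, R, Q, W, beta, tol=1e-10, max_iter=50_000):
    """Iterate on (P_j, rho_j) from P_0 = 0, rho_0 = 0 until both settle down."""
    n = A.shape[0]
    P, rho = np.zeros((n, n)), 0.0
    for _ in range(max_iter):
        M = Q + beta * B.T @ P @ B              # Q + beta B'PB
        N = beta * B.T @ P @ A + W.T            # beta B'PA + W'
        # N.T equals beta A'PB + W because P stays symmetric along the iteration
        P_new = R + beta * A.T @ P @ A - N.T @ np.linalg.solve(M, N)
        rho_new = beta * rho + beta * np.trace(P @ C @ C.T)
        done = np.max(np.abs(P_new - P)) < tol and abs(rho_new - rho) < tol
        P, rho = P_new, rho_new
        if done:
            break
    # Optimal rule u_t = -F x_t
    F = np.linalg.solve(Q + beta * B.T @ P @ B, beta * B.T @ P @ A + W.T)
    return P, rho, F

# Illustrative two-state, one-control problem (all values are assumptions)
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.0], [0.1]])
R = np.eye(2)
Q = np.array([[1.0]])
W = np.zeros((2, 1))
beta = 0.95

P, rho, F = solve_lq_by_iteration(A, B, C, R, Q, W, beta)
print(P, rho, F, sep="\n")
```

Note that `C` enters only the recursion for `rho`, never the one for `P` or the formula for `F`, which is the certainty-equivalence property stated above.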

We can now state the planning problem as a dynamic programming problem


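$$ \max_{\{u_t, x_{t+1}\}} -E \sum_{t=0}^\infty \beta^t [x_t' R x_t + u_t' Q u_t + 2 u_t' W' x_t], \quad 0 < \beta < 1 $$

where maximization is subject to

$$ x_{t+1} = A x_t + B u_t + C w_{t+1}, \quad t \geq 0 $$

$$ x_t = \begin{bmatrix} h_{t-1} \\ k_{t-1} \\ z_t \end{bmatrix}, \quad u_t = i_t $$

where

$$ A = \begin{bmatrix} \Delta_h & \Theta_h U_c [\Phi_c \ \Phi_g]^{-1} \Gamma & \Theta_h U_c [\Phi_c \ \Phi_g]^{-1} U_d \\ 0 & \Delta_k & 0 \\ 0 & 0 & A_{22} \end{bmatrix} $$

$$ B = \begin{bmatrix} -\Theta_h U_c [\Phi_c \ \Phi_g]^{-1} \Phi_i \\ \Theta_k \\ 0 \end{bmatrix}, \quad C = \begin{bmatrix} 0 \\ 0 \\ C_2 \end{bmatrix} $$

$$ \begin{bmatrix} x_t \\ u_t \end{bmatrix}' S \begin{bmatrix} x_t \\ u_t \end{bmatrix} = \begin{bmatrix} x_t \\ u_t \end{bmatrix}' \begin{bmatrix} R & W \\ W' & Q \end{bmatrix} \begin{bmatrix} x_t \\ u_t \end{bmatrix} $$

$$ S = (G' G + H' H)/2 $$

$$ H = [\Lambda \ \vdots \ \Pi U_c [\Phi_c \ \Phi_g]^{-1} \Gamma \ \vdots \ \Pi U_c [\Phi_c \ \Phi_g]^{-1} U_d - U_b \ \vdots \ -\Pi U_c [\Phi_c \ \Phi_g]^{-1} \Phi_i] $$

$$ G = U_g [\Phi_c \ \Phi_g]^{-1} [0 \ \vdots \ \Gamma \ \vdots \ U_d \ \vdots \ -\Phi_i] $$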



For us a useful fact is that Lagrange multipliers equal gradients of the planner’s value func-
tion

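$$ \mathcal{M}_t^k = M_k x_t \quad \text{and} \quad M_t^h = M_h x_t \quad \text{where} $$

$$ M_k = 2\beta [0 \ I \ 0] P A^o $$

$$ M_h = 2\beta [I \ 0 \ 0] P A^o $$

$$ \mathcal{M}_t^s = M_s x_t \quad \text{where} \quad M_s = (S_b - S_s) \quad \text{and} \quad S_b = [0 \ 0 \ U_b] $$

$$ \mathcal{M}_t^d = M_d x_t \quad \text{where} \quad M_d = \begin{bmatrix} \Phi_c' \\ \Phi_g' \end{bmatrix}^{-1} \begin{bmatrix} \Theta_h' M_h + \Pi' M_s \\ -S_g \end{bmatrix} $$

$$ \mathcal{M}_t^c = M_c x_t \quad \text{where} \quad M_c = \Theta_h' M_h + \Pi' M_s $$

$$ \mathcal{M}_t^i = M_i x_t \quad \text{where} \quad M_i = \Theta_k' M_k $$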

We will use this fact and these equations to compute competitive equilibrium prices
Let’s start with describing the commodity space and pricing functional for our competi-
tive equilibrium
For the commodity space, we use


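$$ L_0^2 = \Bigl[\{y_t\} : y_t \text{ is a random variable in } J_t \text{ and } E \sum_{t=0}^\infty \beta^t y_t^2 \mid J_0 < +\infty\Bigr] $$

For pricing functionals, we express values as inner products

$$ \pi(c) = E \sum_{t=0}^\infty \beta^t p_t^0 \cdot c_t \mid J_0 $$

where $p_t^0$ belongs to $L_0^2$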


With these objects in our toolkit, we move on to state the problem of a Representative
Household in a competitive equilibrium

58.2.15 Representative Household

The representative household owns the endowment process and initial stocks of ℎ and 𝑘 and
chooses stochastic processes for $\{c_t, s_t, h_t, \ell_t\}_{t=0}^\infty$, each element of which is in $L_0^2$, to maximize

$$ -\frac{1}{2} E_0 \sum_{t=0}^\infty \beta^t \left[(s_t - b_t) \cdot (s_t - b_t) + \ell_t^2\right] $$

subject to

$$ E \sum_{t=0}^\infty \beta^t p_t^0 \cdot c_t \mid J_0 = E \sum_{t=0}^\infty \beta^t (w_t^0 \ell_t + \alpha_t^0 \cdot d_t) \mid J_0 + v_0 \cdot k_{-1} $$

$$ s_t = \Lambda h_{t-1} + \Pi c_t $$

$$ h_t = \Delta_h h_{t-1} + \Theta_h c_t, \quad h_{-1}, k_{-1} \ \text{given} $$

We now describe the problems faced by two types of firms called type I and type II

58.2.16 Type I Firm

A type I firm rents capital and labor and endowments and produces 𝑐𝑡 , 𝑖𝑡 .
It chooses stochastic processes for $\{c_t, i_t, k_t, \ell_t, g_t, d_t\}$, each element of which is in $L_0^2$, to maximize

$$ E_0 \sum_{t=0}^\infty \beta^t (p_t^0 \cdot c_t + q_t^0 \cdot i_t - r_t^0 \cdot k_{t-1} - w_t^0 \ell_t - \alpha_t^0 \cdot d_t) $$

subject to

Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡

− ℓ𝑡2 + 𝑔𝑡 ⋅ 𝑔𝑡 = 0

58.2.17 Type II Firm

A firm of type II acquires capital via investment and then rents stocks of capital to the 𝑐, 𝑖-
producing type I firm
A type II firm is a price taker facing the vector $v_0$ and the stochastic processes $\{r_t^0, q_t^0\}$.
The firm chooses $k_{-1}$ and stochastic processes for $\{k_t, i_t\}_{t=0}^\infty$ to maximize

$$ E \sum_{t=0}^\infty \beta^t (r_t^0 \cdot k_{t-1} - q_t^0 \cdot i_t) \mid J_0 - v_0 \cdot k_{-1} $$

subject to

𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡

We can now state the following


Definition: A competitive equilibrium is a price system $[v_0, \{p_t^0, w_t^0, \alpha_t^0, q_t^0, r_t^0\}_{t=0}^\infty]$ and an
allocation $\{c_t, i_t, k_t, h_t, g_t, d_t\}_{t=0}^\infty$ that satisfy the following conditions:

• Each component of the price system and the allocation resides in the space $L_0^2$
• Given the price system and given ℎ−1 , 𝑘−1 , the allocation solves the representative
household’s problem and the problems of the two types of firms

Versions of the two classical welfare theorems prevail under our assumptions
We exploit that fact in our algorithm for computing a competitive equilibrium

Step 1: Solve the planning problem by using dynamic programming

The allocation (i.e., quantities) that solve the planning problem are the competitive equilib-
rium quantities
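Step 2: Use the following formulas to compute the equilibrium price system

$$ p_t^0 = [\Pi' M_t^s + \Theta_h' M_t^h]/\mu_0^w = M_t^c/\mu_0^w $$

$$ w_t^0 = \mid S_g x_t \mid /\mu_0^w $$

$$ r_t^0 = \Gamma' M_t^d/\mu_0^w $$

$$ q_t^0 = \Theta_k' M_t^k/\mu_0^w = M_t^i/\mu_0^w $$

$$ \alpha_t^0 = M_t^d/\mu_0^w $$

$$ v_0 = \Gamma' M_0^d/\mu_0^w + \Delta_k' M_0^k/\mu_0^w $$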

Verification: With this price system, values can be assigned to the Lagrange multipliers for
each of our three classes of agents that cause all first-order necessary conditions to be satisfied
at these prices and at the quantities associated with the optimum of the planning problem
An important use of an equilibrium pricing system is to do asset pricing
Thus, imagine that we are presented a dividend stream: {𝑦𝑡 } ∈ 𝐿20 and want to compute the
value of a perpetual claim to this stream
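To value this asset we simply take price times quantity and add to get an asset value:

$$ a_0 = E \sum_{t=0}^\infty \beta^t p_t^0 \cdot y_t \mid J_0 $$

To compute $a_0$ we proceed as follows
We let

$$ y_t = U_a x_t $$

$$ a_0 = E \sum_{t=0}^\infty \beta^t x_t' Z_a x_t \mid J_0 $$

$$ Z_a = U_a' M_c/\mu_0^w $$

We have the following convenient formulas:

$$ a_0 = x_0' \mu_a x_0 + \sigma_a $$

$$ \mu_a = \sum_{\tau=0}^\infty \beta^\tau (A^{o\prime})^\tau Z_a A^{o\tau} $$

$$ \sigma_a = \frac{\beta}{1-\beta} \, \text{trace} \Bigl(Z_a \sum_{\tau=0}^\infty \beta^\tau (A^o)^\tau C C' (A^{o\prime})^\tau\Bigr) $$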
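Both infinite sums in the asset-pricing formulas solve discounted Lyapunov-type equations, so they can be computed with one linear solve each. The sketch below is an illustration under assumed values of 𝐴𝑜 , 𝐶 and 𝑍𝑎 (none taken from the text); it uses the identities $\mu_a = Z_a + \beta A^{o\prime} \mu_a A^o$ and $S = CC' + \beta A^o S A^{o\prime}$, and requires the eigenvalues of $\sqrt{\beta} A^o$ to lie inside the unit circle.

```python
import numpy as np

def asset_value_coeffs(A_o, C, Z_a, beta):
    """Return (mu_a, sigma_a) in a_0 = x_0' mu_a x_0 + sigma_a."""
    n = A_o.shape[0]
    I = np.eye(n * n)
    # Row-major vec: vec(A' X A) = kron(A', A') vec(X)
    mu_a = np.linalg.solve(I - beta * np.kron(A_o.T, A_o.T), Z_a.reshape(-1)).reshape(n, n)
    # S = sum_tau beta^tau A^tau C C' (A')^tau, i.e. vec(A S A') = kron(A, A) vec(S)
    S = np.linalg.solve(I - beta * np.kron(A_o, A_o), (C @ C.T).reshape(-1)).reshape(n, n)
    sigma_a = beta / (1 - beta) * np.trace(Z_a @ S)
    return mu_a, sigma_a

# Illustrative inputs (assumptions made for this example)
A_o = np.array([[0.9, 0.1], [0.0, 0.5]])
C = np.array([[0.1], [0.2]])
Z_a = np.array([[1.0, 0.2], [0.3, 0.5]])
beta = 0.95

mu_a, sigma_a = asset_value_coeffs(A_o, C, Z_a, beta)
x0 = np.array([1.0, 2.0])
a0 = x0 @ mu_a @ x0 + sigma_a
print(a0)
```

Solving the linear system avoids truncating the sums; for large state vectors a doubling algorithm would be a cheaper alternative to forming the Kronecker product.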

58.2.18 Re-Opening Markets

We have assumed that all trading occurs once-and-for-all at time 𝑡 = 0


If we were to re-open markets at some time 𝑡 > 0 at time 𝑡 wealth levels implicitly defined
by time 0 trades, we would obtain the same equilibrium allocation (i.e., quantities) and the
following time 𝑡 price system

$$ L_t^2 = \Bigl[\{y_s\}_{s=t}^\infty : y_s \text{ is a random variable in } J_s \text{ for } s \geq t \text{ and } E \sum_{s=t}^\infty \beta^{s-t} y_s^2 \mid J_t < +\infty\Bigr] $$

$$ p_s^t = M_c x_s/[\bar e_j M_c x_t], \quad s \geq t $$

$$ w_s^t = \mid S_g x_s \mid /[\bar e_j M_c x_t], \quad s \geq t $$

$$ r_s^t = \Gamma' M_d x_s/[\bar e_j M_c x_t], \quad s \geq t $$

$$ q_s^t = M_i x_s/[\bar e_j M_c x_t], \quad s \geq t $$

$$ \alpha_s^t = M_d x_s/[\bar e_j M_c x_t], \quad s \geq t $$

$$ v_t = [\Gamma' M_d + \Delta_k' M_k] x_t/[\bar e_j M_c x_t] $$

58.3 Econometrics

Up to now, we have described how to solve the direct problem that maps model parameters
into an (equilibrium) stochastic process of prices and quantities
Recall the inverse problem of inferring model parameters from a single realization of a time
series of some of the prices and quantities
Another name for the inverse problem is econometrics
An advantage of the [59] structure is that it comes with a self-contained theory of economet-
rics

It is really just a tale of two state-space representations


Here they are:
Original State-Space Representation:

𝑥𝑡+1 = 𝐴𝑜 𝑥𝑡 + 𝐶𝑤𝑡+1
𝑦𝑡 = 𝐺𝑥𝑡 + 𝑣𝑡

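where $v_t$ is a martingale difference sequence of measurement errors that satisfies $E v_t v_t' = R$, $E w_{t+1} v_s' = 0$ for all $t + 1 \geq s$ and

$$ x_0 \sim \mathcal{N}(\hat x_0, \Sigma_0) $$

Innovations Representation:

$$ \begin{aligned} \hat x_{t+1} &= A^o \hat x_t + K_t a_t \\ y_t &= G \hat x_t + a_t, \end{aligned} $$

where $a_t = y_t - E[y_t \mid y^{t-1}]$, $E a_t a_t' \equiv \Omega_t = G \Sigma_t G' + R$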


Compare numbers of shocks in the two representations:

• 𝑛𝑤 + 𝑛𝑦 versus 𝑛𝑦

Compare spaces spanned

• 𝐻(𝑦𝑡 ) ⊂ 𝐻(𝑤𝑡 , 𝑣𝑡 )
• 𝐻(𝑦𝑡 ) = 𝐻(𝑎𝑡 )

Kalman Filter:
Kalman gain:

𝐾𝑡 = 𝐴𝑜 Σ𝑡 𝐺′ (𝐺Σ𝑡 𝐺′ + 𝑅)−1

Riccati Difference Equation:

Σ𝑡+1 = 𝐴𝑜 Σ𝑡 𝐴𝑜′ + 𝐶𝐶 ′
− 𝐴𝑜 Σ𝑡 𝐺′ (𝐺Σ𝑡 𝐺′ + 𝑅)−1 𝐺Σ𝑡 𝐴𝑜′

Innovations Representation as Whitener


Whitening Filter:

$$ \begin{aligned} a_t &= y_t - G \hat x_t \\ \hat x_{t+1} &= A^o \hat x_t + K_t a_t \end{aligned} $$

can be used recursively to construct a record of innovations $\{a_t\}_{t=0}^T$ from an $(\hat x_0, \Sigma_0)$ and a
record of observations $\{y_t\}_{t=0}^T$
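The whitening filter, the Kalman gain, and the Riccati difference equation combine into a single loop. Here is a minimal sketch, assuming a small illustrative system with one observable and simulated data; the function name `whitening_filter` and all numerical values are assumptions made for the example.

```python
import numpy as np

def whitening_filter(y, A_o, C, G, R, x0_hat, Sigma0):
    """Run the Kalman filter and return innovations a_t, covariances Omega_t, final Sigma."""
    x_hat = np.array(x0_hat, dtype=float)
    Sigma = np.array(Sigma0, dtype=float)
    innovations, Omegas = [], []
    for y_t in y:
        Omega = G @ Sigma @ G.T + R                        # E a_t a_t'
        K = A_o @ Sigma @ G.T @ np.linalg.inv(Omega)       # Kalman gain K_t
        a_t = y_t - G @ x_hat                              # innovation
        x_hat = A_o @ x_hat + K @ a_t                      # state estimate update
        Sigma = A_o @ Sigma @ A_o.T + C @ C.T - K @ G @ Sigma @ A_o.T  # Riccati step
        innovations.append(a_t)
        Omegas.append(Omega)
    return np.array(innovations), np.array(Omegas), Sigma

# Illustrative system (assumed values)
A_o = np.array([[0.9, 0.0], [0.1, 0.5]])
C = np.array([[1.0], [0.5]])
G = np.array([[1.0, 0.0]])
R = np.array([[0.25]])

# Simulate observations from the original state-space representation
rng = np.random.default_rng(0)
T = 200
x = np.zeros(2)
y = np.empty((T, 1))
for t in range(T):
    y[t] = G @ x + 0.5 * rng.standard_normal(1)   # measurement error with variance R
    x = A_o @ x + C @ rng.standard_normal(1)

a, Omega, Sigma_T = whitening_filter(y, A_o, C, G, R, np.zeros(2), np.eye(2))
print(a[:3].ravel(), Omega[-1])
```

After enough periods `Sigma_T` stops changing, which is the time-invariant limit used in the limiting innovations representation.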

Limiting Time-Invariant Innovations Representation

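$$ \begin{aligned} \Sigma &= A^o \Sigma A^{o\prime} + C C' - A^o \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma A^{o\prime} \\ K &= A^o \Sigma G' (G \Sigma G' + R)^{-1} \end{aligned} $$

$$ \begin{aligned} \hat x_{t+1} &= A^o \hat x_t + K a_t \\ y_t &= G \hat x_t + a_t \end{aligned} $$

where $E a_t a_t' \equiv \Omega = G \Sigma G' + R$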

58.3.1 Factorization of Likelihood Function

Sample of observations $\{y_s\}_{s=0}^T$ on a $(n_y \times 1)$ vector.

$$ \begin{aligned} f(y_T, y_{T-1}, \ldots, y_0) &= f_T(y_T \mid y_{T-1}, \ldots, y_0) f_{T-1}(y_{T-1} \mid y_{T-2}, \ldots, y_0) \cdots f_1(y_1 \mid y_0) f_0(y_0) \\ &= g_T(a_T) g_{T-1}(a_{T-1}) \ldots g_1(a_1) f_0(y_0). \end{aligned} $$

Gaussian Log-Likelihood:

$$ -.5 \sum_{t=0}^T \{n_y \ln(2\pi) + \ln |\Omega_t| + a_t' \Omega_t^{-1} a_t\} $$
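Given a record of innovations and their covariance matrices, the Gaussian log-likelihood is a direct sum. The sketch below assumes the innovations are stacked in an array of shape (T+1, n_y) and the matrices Ω𝑡 in an array of shape (T+1, n_y, n_y); the function name and the two-observation example are illustrative.

```python
import numpy as np

def gaussian_loglik(innovations, Omegas):
    """Evaluate -0.5 * sum_t { n_y ln(2 pi) + ln|Omega_t| + a_t' Omega_t^{-1} a_t }."""
    n_y = innovations.shape[1]
    total = 0.0
    for a_t, Omega_t in zip(innovations, Omegas):
        _, logdet = np.linalg.slogdet(Omega_t)            # numerically stable ln|Omega_t|
        quad = a_t @ np.linalg.solve(Omega_t, a_t)        # a_t' Omega_t^{-1} a_t
        total += n_y * np.log(2 * np.pi) + logdet + quad
    return -0.5 * total

# Two scalar innovations with variances 4 and 1
a = np.array([[1.0], [-2.0]])
Omega = np.array([[[4.0]], [[1.0]]])
loglik = gaussian_loglik(a, Omega)
print(loglik)
```

Maximizing this function over the free parameters of (𝐴𝑜 , 𝐶, 𝐺, 𝑅), with the innovations regenerated by the filter at each trial parameter vector, is the usual route to maximum likelihood estimates.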

58.3.2 Covariance Generating Functions

Autocovariance: $C_x(\tau) = E x_t x_{t-\tau}'$

Generating Function: $S_x(z) = \sum_{\tau=-\infty}^\infty C_x(\tau) z^\tau$, $z \in C$

58.3.3 Spectral Factorization Identity

Original state-space representation has too many shocks and implies:

𝑆𝑦 (𝑧) = 𝐺(𝑧𝐼 − 𝐴𝑜 )−1 𝐶𝐶 ′ (𝑧 −1 𝐼 − (𝐴𝑜 )′ )−1 𝐺′ + 𝑅

Innovations representation has as many shocks as dimension of 𝑦𝑡 and implies

𝑆𝑦 (𝑧) = [𝐺(𝑧𝐼 − 𝐴𝑜 )−1 𝐾 + 𝐼][𝐺Σ𝐺′ + 𝑅][𝐾 ′ (𝑧 −1 𝐼 − 𝐴𝑜′ )−1 𝐺′ + 𝐼]

Equating these two leads to:

𝐺(𝑧𝐼 − 𝐴𝑜 )−1 𝐶𝐶 ′ (𝑧 −1 𝐼 − 𝐴𝑜′ )−1 𝐺′ + 𝑅 =


[𝐺(𝑧𝐼 − 𝐴𝑜 )−1 𝐾 + 𝐼][𝐺Σ𝐺′ + 𝑅][𝐾 ′ (𝑧 −1 𝐼 − 𝐴𝑜′ )−1 𝐺′ + 𝐼].

Key Insight: The zeros of the polynomial det[𝐺(𝑧𝐼 − 𝐴𝑜 )−1 𝐾 + 𝐼] all lie inside the unit cir-
cle, which means that 𝑎𝑡 lies in the space spanned by square summable linear combinations of
𝑦𝑡

𝐻(𝑎𝑡 ) = 𝐻(𝑦𝑡 )

Key Property: Invertibility

58.3.4 Wold and Vector Autoregressive Representations

Let’s start with some lag operator arithmetic


The lag operator 𝐿 and the inverse lag operator 𝐿−1 each map an infinite sequence into an
infinite sequence according to the transformation rules

𝐿𝑥𝑡 ≡ 𝑥𝑡−1

𝐿−1 𝑥𝑡 ≡ 𝑥𝑡+1

A Wold moving average representation for {𝑦𝑡 } is

𝑦𝑡 = [𝐺(𝐼 − 𝐴𝑜 𝐿)−1 𝐾𝐿 + 𝐼]𝑎𝑡

Applying the inverse of the operator on the right side and using

[𝐺(𝐼 − 𝐴𝑜 𝐿)−1 𝐾𝐿 + 𝐼]−1 = 𝐼 − 𝐺[𝐼 − (𝐴𝑜 − 𝐾𝐺)𝐿]−1 𝐾𝐿

gives the vector autoregressive representation


$$ y_t = \sum_{j=1}^\infty G (A^o - KG)^{j-1} K y_{t-j} + a_t $$
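The Wold and vector autoregressive coefficients are inverses of one another in the lag-operator sense, which can be verified coefficient by coefficient. This is a minimal sketch with illustrative matrices (the identity is algebraic, so any conformable 𝐴𝑜 , 𝐺, 𝐾 would do); the function names are introduced only for this example.

```python
import numpy as np

def wold_var_coefficients(A_o, G, K, n_lags):
    """Wold coefficients G A^{j-1} K and VAR coefficients G (A - KG)^{j-1} K, j = 1..n_lags."""
    A_bar = A_o - K @ G
    wold = [G @ np.linalg.matrix_power(A_o, j - 1) @ K for j in range(1, n_lags + 1)]
    var = [G @ np.linalg.matrix_power(A_bar, j - 1) @ K for j in range(1, n_lags + 1)]
    return wold, var

def inversion_residual(wold, var):
    """Largest violation of (I - sum_j var_j L^j)(I + sum_j wold_j L^j) = I."""
    worst = 0.0
    for n in range(1, len(wold) + 1):
        conv = sum(var[j - 1] @ wold[n - j - 1] for j in range(1, n))
        resid = wold[n - 1] - var[n - 1] - conv      # coefficient on L^n must vanish
        worst = max(worst, float(np.max(np.abs(resid))))
    return worst

A_o = np.array([[0.9, 0.0], [0.1, 0.5]])
G = np.array([[1.0, 0.0]])
K = np.array([[0.6], [0.2]])

wold, var = wold_var_coefficients(A_o, G, K, n_lags=10)
print(inversion_residual(wold, var))
```

The residual is zero up to rounding, consistent with the operator identity used to pass from the Wold representation to the autoregression.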

58.4 Dynamic Demand Curves and Canonical Household Technologies

58.4.1 Canonical Household Technologies

ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡
𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡
𝑏𝑡 = 𝑈𝑏 𝑧𝑡

Definition: A household service technology (Δℎ , Θℎ , Π, Λ, 𝑈𝑏 ) is said to be canonical if

• Π is nonsingular, and
• the absolute values of the eigenvalues of $(\Delta_h - \Theta_h \Pi^{-1} \Lambda)$ are strictly less than $1/\sqrt{\beta}$.

Key invertibility property: A canonical household service technology maps a service
process $\{s_t\}$ in $L_0^2$ into a corresponding consumption process $\{c_t\}$ for which the implied
household capital stock process $\{h_t\}$ is also in $L_0^2$

An inverse household technology:

𝑐𝑡 = −Π−1 Λℎ𝑡−1 + Π−1 𝑠𝑡


ℎ𝑡 = (Δℎ − Θℎ Π−1 Λ)ℎ𝑡−1 + Θℎ Π−1 𝑠𝑡

The restriction on the eigenvalues of the matrix (Δℎ − Θℎ Π−1 Λ) keeps the household capital
stock {ℎ𝑡 } in 𝐿20
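The two requirements for a canonical technology, and the inverse technology itself, are easy to check numerically. The sketch below is an illustration using the scalar habit-persistence and consumer-durables examples with assumed parameter values; the function names are hypothetical.

```python
import numpy as np

def is_canonical(Delta_h, Theta_h, Lambda, Pi, beta):
    """Check that Pi is nonsingular and eig(Delta_h - Theta_h Pi^{-1} Lambda) < 1/sqrt(beta)."""
    if Pi.shape[0] != Pi.shape[1] or np.linalg.matrix_rank(Pi) < Pi.shape[0]:
        return False
    A_inv = Delta_h - Theta_h @ np.linalg.solve(Pi, Lambda)
    return bool(np.max(np.abs(np.linalg.eigvals(A_inv))) < 1 / np.sqrt(beta))

def services_from_consumption(c, h_init, Delta_h, Theta_h, Lambda, Pi):
    h, s = h_init, []
    for c_t in c:
        s.append(Lambda @ h + Pi @ c_t)          # s_t = Lambda h_{t-1} + Pi c_t
        h = Delta_h @ h + Theta_h @ c_t          # h_t = Delta_h h_{t-1} + Theta_h c_t
    return np.array(s)

def consumption_from_services(s, h_init, Delta_h, Theta_h, Lambda, Pi):
    Pi_inv = np.linalg.inv(Pi)
    h, c = h_init, []
    for s_t in s:
        c_t = Pi_inv @ (s_t - Lambda @ h)        # c_t = -Pi^{-1} Lambda h_{t-1} + Pi^{-1} s_t
        c.append(c_t)
        h = Delta_h @ h + Theta_h @ c_t          # same as (Delta_h - Theta_h Pi^{-1} Lambda) h + Theta_h Pi^{-1} s_t
    return np.array(c)

beta, delta_h, lam = 0.95, 0.9, 0.5              # illustrative values
habit = dict(Delta_h=np.array([[delta_h]]), Theta_h=np.array([[1 - delta_h]]),
             Lambda=np.array([[-lam]]), Pi=np.array([[1.0]]))
durables = dict(Delta_h=np.array([[delta_h]]), Theta_h=np.array([[1.0]]),
                Lambda=np.array([[lam]]), Pi=np.array([[0.0]]))

habit_ok = is_canonical(beta=beta, **habit)          # Pi = 1 and eigenvalue 0.95
durables_ok = is_canonical(beta=beta, **durables)    # Pi = 0 is singular

c = np.linspace(1.0, 2.0, 15).reshape(-1, 1)
h_init = np.array([1.0])
s = services_from_consumption(c, h_init, **habit)
c_back = consumption_from_services(s, h_init, **habit)
print(habit_ok, durables_ok, np.max(np.abs(c - c_back)))
```

The consumer-durables technology fails the test only because Π = 0, which is the kind of case where a canonical representation must first be constructed.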

58.4.2 Dynamic Demand Functions



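$$ \rho_t^0 \equiv \Pi^{-1\prime} \Bigl[p_t^0 - \Theta_h' E_t \sum_{\tau=1}^\infty \beta^\tau (\Delta_h' - \Lambda' \Pi^{-1\prime} \Theta_h')^{\tau-1} \Lambda' \Pi^{-1\prime} p_{t+\tau}^0\Bigr] $$

$$ \begin{aligned} s_{i,t} &= \Lambda h_{i,t-1} \\ h_{i,t} &= \Delta_h h_{i,t-1} \end{aligned} $$

where $h_{i,-1} = h_{-1}$

$$ W_0 = E_0 \sum_{t=0}^\infty \beta^t (w_t^0 \ell_t + \alpha_t^0 \cdot d_t) + v_0 \cdot k_{-1} $$

$$ \mu_0^w = \frac{E_0 \sum_{t=0}^\infty \beta^t \rho_t^0 \cdot (b_t - s_{i,t}) - W_0}{E_0 \sum_{t=0}^\infty \beta^t \rho_t^0 \cdot \rho_t^0} $$

$$ \begin{aligned} c_t &= -\Pi^{-1} \Lambda h_{t-1} + \Pi^{-1} b_t - \Pi^{-1} \mu_0^w E_t \{\Pi'^{-1} - \Pi'^{-1} \Theta_h' [I - (\Delta_h' - \Lambda' \Pi'^{-1} \Theta_h') \beta L^{-1}]^{-1} \Lambda' \Pi'^{-1} \beta L^{-1}\} p_t^0 \\ h_t &= \Delta_h h_{t-1} + \Theta_h c_t \end{aligned} $$

This system expresses consumption demands at date 𝑡 as functions of: (i) time-𝑡 conditional
expectations of future scaled Arrow-Debreu prices $\{p_{t+s}^0\}_{s=0}^\infty$; (ii) the stochastic process for
the household’s endowment {𝑑𝑡 } and preference shock {𝑏𝑡 }, as mediated through the multiplier
$\mu_0^w$ and wealth $W_0$; and (iii) past values of consumption, as mediated through the state
variable ℎ𝑡−1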

58.5 Gorman Aggregation and Engel Curves

We shall explore how the dynamic demand schedule for consumption goods opens up the pos-
sibility of satisfying Gorman’s (1953) conditions for aggregation in a heterogeneous consumer
model
The first equation of our demand system is an Engel curve for consumption that is linear in
the marginal utility $\mu_0^w$ of individual wealth with a coefficient on $\mu_0^w$ that depends only on
prices
The multiplier $\mu_0^w$ depends on wealth in an affine relationship, so that consumption is linear
in wealth
In a model with multiple consumers who have the same household technologies (Δℎ , Θℎ , Λ, Π)
but possibly different preference shock processes and initial values of household capital stocks,
the coefficient on the marginal utility of wealth is the same for all consumers

Gorman showed that when Engel curves satisfy this property, there exists a unique commu-
nity or aggregate preference ordering over aggregate consumption that is independent of the
distribution of wealth

58.5.1 Re-Opened Markets



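$$ \rho_t^t \equiv \Pi^{-1\prime} \Bigl[p_t^t - \Theta_h' E_t \sum_{\tau=1}^\infty \beta^\tau (\Delta_h' - \Lambda' \Pi^{-1\prime} \Theta_h')^{\tau-1} \Lambda' \Pi^{-1\prime} p_{t+\tau}^t\Bigr] $$

$$ \begin{aligned} s_{i,t} &= \Lambda h_{i,t-1} \\ h_{i,t} &= \Delta_h h_{i,t-1}, \end{aligned} $$

where now $h_{i,t-1} = h_{t-1}$. Define time 𝑡 wealth $W_t$

$$ W_t = E_t \sum_{j=0}^\infty \beta^j (w_{t+j}^t \ell_{t+j} + \alpha_{t+j}^t \cdot d_{t+j}) + v_t \cdot k_{t-1} $$

$$ \mu_t^w = \frac{E_t \sum_{j=0}^\infty \beta^j \rho_{t+j}^t \cdot (b_{t+j} - s_{i,t+j}) - W_t}{E_t \sum_{j=0}^\infty \beta^j \rho_{t+j}^t \cdot \rho_{t+j}^t} $$

$$ \begin{aligned} c_t &= -\Pi^{-1} \Lambda h_{t-1} + \Pi^{-1} b_t - \Pi^{-1} \mu_t^w E_t \{\Pi'^{-1} - \Pi'^{-1} \Theta_h' [I - (\Delta_h' - \Lambda' \Pi'^{-1} \Theta_h') \beta L^{-1}]^{-1} \Lambda' \Pi'^{-1} \beta L^{-1}\} p_t^t \\ h_t &= \Delta_h h_{t-1} + \Theta_h c_t \end{aligned} $$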

58.5.2 Dynamic Demand

Define a time 𝑡 continuation of a sequence $\{z_t\}_{t=0}^\infty$ as the sequence $\{z_\tau\}_{\tau=t}^\infty$. The demand
system indicates that the time 𝑡 vector of demands for 𝑐𝑡 is influenced by:

• Through the multiplier $\mu_t^w$, the time 𝑡 continuation of the preference shock process {𝑏𝑡 } and
the time 𝑡 continuation of {𝑠𝑖,𝑡 }
• The time 𝑡 − 1 level of household durables ℎ𝑡−1
• Everything that affects the household’s time 𝑡 wealth, including its stock of physical capital
𝑘𝑡−1 and its value 𝑣𝑡 , the time 𝑡 continuation of the factor prices {𝑤𝑡 , 𝛼𝑡 }, the household’s
continuation endowment process, and the household’s continuation plan for {ℓ𝑡 }
• The time 𝑡 continuation of the vector of prices $\{p_t^t\}$

58.5.3 Attaining a Canonical Household Technology

Apply the following version of a factorization identity:


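$$ \begin{aligned} & [\Pi + \beta^{1/2} L^{-1} \Lambda (I - \beta^{1/2} L^{-1} \Delta_h)^{-1} \Theta_h]' [\Pi + \beta^{1/2} L \Lambda (I - \beta^{1/2} L \Delta_h)^{-1} \Theta_h] \\ & \quad = [\hat\Pi + \beta^{1/2} L^{-1} \hat\Lambda (I - \beta^{1/2} L^{-1} \Delta_h)^{-1} \Theta_h]' [\hat\Pi + \beta^{1/2} L \hat\Lambda (I - \beta^{1/2} L \Delta_h)^{-1} \Theta_h] \end{aligned} $$

The factorization identity guarantees that the $[\hat\Lambda, \hat\Pi]$ representation satisfies both
requirements for a canonical representation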

58.5.4 Examples: Partial Equilibrium

Demand:

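$$ \begin{aligned} c_t &= -\Pi^{-1} \Lambda h_{t-1} + \Pi^{-1} b_t - \Pi^{-1} \mu_0^w E_t \{\Pi'^{-1} - \Pi'^{-1} \Theta_h' [I - (\Delta_h' - \Lambda' \Pi'^{-1} \Theta_h') \beta L^{-1}]^{-1} \Lambda' \Pi'^{-1} \beta L^{-1}\} p_t \\ h_t &= \Delta_h h_{t-1} + \Theta_h c_t \end{aligned} $$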

Reverse engineer preferences that generate this demand system:


A representative firm takes as given and beyond its control the stochastic process $\{p_t\}_{t=0}^\infty$
The firm sells its output 𝑐𝑡 in a competitive market each period
Only spot markets convene at each date 𝑡 ≥ 0
The firm also faces an exogenous process of cost disturbances 𝑑𝑡
The firm chooses stochastic processes $\{c_t, g_t, i_t, k_t\}_{t=0}^\infty$ to maximize

$$ E_0 \sum_{t=0}^\infty \beta^t \{p_t \cdot c_t - g_t \cdot g_t/2\} $$

subject to given 𝑘−1 and

Φ𝑐 𝑐𝑡 + Φ𝑖 𝑖𝑡 + Φ𝑔 𝑔𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡
𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡 .

58.5.5 Equilibrium Investment Under Uncertainty

A representative firm maximizes


$$ E \sum_{t=0}^\infty \beta^t \{p_t c_t - g_t^2/2\} $$

subject to the technology

𝑐𝑡 = 𝛾𝑘𝑡−1
𝑘𝑡 = 𝛿𝑘 𝑘𝑡−1 + 𝑖𝑡
𝑔𝑡 = 𝑓1 𝑖𝑡 + 𝑓2 𝑑𝑡

where 𝑑𝑡 is a cost shifter, 𝛾 > 0, and 𝑓1 > 0 is a cost parameter and 𝑓2 = 1. Demand is
governed by

𝑝𝑡 = 𝛼0 − 𝛼1 𝑐𝑡 + 𝑢𝑡

where 𝑢𝑡 is a demand shifter with mean zero and 𝛼0 , 𝛼1 are positive parameters
Assume that 𝑢𝑡 , 𝑑𝑡 are uncorrelated first-order autoregressive processes

58.5.6 A Rosen-Topel Housing Model

$$ R_t = b_t + \alpha h_t $$

$$ p_t = E_t \sum_{\tau=0}^\infty (\beta \delta_h)^\tau R_{t+\tau} $$

where ℎ𝑡 is the stock of housing at time 𝑡, 𝑅𝑡 is the rental rate for housing, 𝑝𝑡 is the price of
new houses, and 𝑏𝑡 is a demand shifter; 𝛼 < 0 is a demand parameter, and 𝛿ℎ is a depreciation
factor for houses
We cast this demand specification within our class of models by letting the stock of houses ℎ𝑡
evolve according to

$$ h_t = \delta_h h_{t-1} + c_t, \quad \delta_h \in (0, 1) $$

where 𝑐𝑡 is the rate of production of new houses
Houses produce services $s_t$ according to $s_t = \bar\lambda h_t$ or $s_t = \lambda h_{t-1} + \pi c_t$, where $\lambda = \bar\lambda \delta_h$, $\pi = \bar\lambda$
We can take $\bar\lambda \rho_t^0 = R_t$ as the rental rate on housing at time 𝑡, measured in units of time 𝑡
consumption (housing)
Demand for housing services is

$$ s_t = b_t - \mu_0 \rho_t^0 $$

where the price of new houses $p_t$ is related to $\rho_t^0$ by $\rho_t^0 = \pi^{-1} [p_t - \beta \delta_h E_t p_{t+1}]$

58.6 Cattle Cycles

Rosen, Murphy, and Scheinkman (1994). Let 𝑝𝑡 be the price of freshly slaughtered beef, 𝑚𝑡
the feeding cost of preparing an animal for slaughter, ℎ̃ 𝑡 the one-period holding cost for a ma-
ture animal, 𝛾1 ℎ̃ 𝑡 the one-period holding cost for a yearling, and 𝛾0 ℎ̃ 𝑡 the one-period holding
cost for a calf
The cost processes $\{\tilde h_t, m_t\}_{t=0}^\infty$ are exogenous, while the stochastic process $\{p_t\}_{t=0}^\infty$ is
determined by a rational expectations equilibrium. Let $\tilde x_t$ be the breeding stock, and $\tilde y_t$ be the
total stock of animals
The law of motion for cattle stocks is

$$ \tilde x_t = (1 - \delta) \tilde x_{t-1} + g \tilde x_{t-3} - c_t $$

where 𝑐𝑡 is a rate of slaughtering. The total head-count of cattle

$$ \tilde y_t = \tilde x_t + g \tilde x_{t-1} + g \tilde x_{t-2} $$

is the sum of adults, calves, and yearlings, respectively
A representative farmer chooses $\{c_t, \tilde x_t\}$ to maximize

$$ E_0 \sum_{t=0}^\infty \beta^t \{p_t c_t - \tilde h_t \tilde x_t - (\gamma_0 \tilde h_t)(g \tilde x_{t-1}) - (\gamma_1 \tilde h_t)(g \tilde x_{t-2}) - m_t c_t - \Psi(\tilde x_t, \tilde x_{t-1}, \tilde x_{t-2}, c_t)\} $$

where

$$ \Psi = \frac{\psi_1}{2} \tilde x_t^2 + \frac{\psi_2}{2} \tilde x_{t-1}^2 + \frac{\psi_3}{2} \tilde x_{t-2}^2 + \frac{\psi_4}{2} c_t^2 $$

Demand is governed by

$$ c_t = \alpha_0 - \alpha_1 p_t + \tilde d_t $$

where $\alpha_0 > 0$, $\alpha_1 > 0$, and $\{\tilde d_t\}_{t=0}^\infty$ is a stochastic process with mean zero representing a
demand shifter

58.7 Models of Occupational Choice and Pay

• Rosen schooling model for engineers


• Two-occupation model

58.7.1 Market for Engineers

Ryoo and Rosen’s (2004) [114] model consists of the following equations:
first, a demand curve for engineers

𝑤𝑡 = −𝛼𝑑 𝑁𝑡 + 𝜖1𝑡 , 𝛼𝑑 > 0

second, a time-to-build structure of the education process

𝑁𝑡+𝑘 = 𝛿𝑁 𝑁𝑡+𝑘−1 + 𝑛𝑡 , 0 < 𝛿𝑁 < 1

third, a definition of the discounted present value of each new engineering student


$$ v_t = \beta^k E_t \sum_{j=0}^\infty (\beta \delta_N)^j w_{t+k+j}; $$

and fourth, a supply curve of new students driven by 𝑣𝑡

$$ n_t = \alpha_s v_t + \epsilon_{2t}, \quad \alpha_s > 0 $$

Here $\{\epsilon_{1t}, \epsilon_{2t}\}$ are stochastic processes of labor demand and supply shocks

Definition: A partial equilibrium is a stochastic process $\{w_t, N_t, v_t, n_t\}_{t=0}^\infty$ satisfying these
four equations, and initial conditions $N_{-1}, n_{-s}, s = 1, \ldots, k$
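We sweep the time-to-build structure and the demand for engineers into the household
technology and put the supply of new engineers into the technology for producing goods

$$ s_t = \begin{bmatrix} \lambda_1 & 0 & \ldots & 0 \end{bmatrix} \begin{bmatrix} h_{1t-1} \\ h_{2t-1} \\ \vdots \\ h_{k+1,t-1} \end{bmatrix} + 0 \cdot c_t $$

$$ \begin{bmatrix} h_{1t} \\ h_{2t} \\ \vdots \\ h_{k,t} \\ h_{k+1,t} \end{bmatrix} = \begin{bmatrix} \delta_N & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} h_{1t-1} \\ h_{2t-1} \\ \vdots \\ h_{k,t-1} \\ h_{k+1,t-1} \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} c_t $$

This specification sets Rosen’s $N_t = h_{1t-1}$, $n_t = c_t$, $h_{\tau+1,t-1} = n_{t-\tau}$, $\tau = 1, \ldots, k$, and uses the
home-produced service to capture the demand for labor. Here $\lambda_1$ embodies Rosen’s demand
parameter $\alpha_d$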
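The household-technology matrices for this time-to-build structure can be generated for any 𝑘. The following is a minimal sketch under assumed values of 𝑘, 𝛿𝑁 and 𝜆1 ; the builder function is a hypothetical helper, and the impulse experiment simply shows how many periods a unit of 𝑐𝑡 takes to reach the first component of ℎ.

```python
import numpy as np

def engineer_household_tech(k, delta_N, lambda_1):
    """Build (Delta_h, Theta_h, Lambda, Pi) for a (k+1)-dimensional time-to-build stock."""
    n = k + 1
    Delta_h = np.zeros((n, n))
    Delta_h[0, 0] = delta_N                 # depreciation of the stock in the first slot
    for r in range(n - 1):
        Delta_h[r, r + 1] = 1.0             # each cohort moves one slot closer per period
    Theta_h = np.zeros((n, 1))
    Theta_h[-1, 0] = 1.0                    # new entrants c_t enter the last slot
    Lambda = np.zeros((1, n))
    Lambda[0, 0] = lambda_1                 # services depend only on the first slot
    Pi = np.zeros((1, 1))
    return Delta_h, Theta_h, Lambda, Pi

k, delta_N, lambda_1 = 3, 0.9, 1.0          # illustrative values
Delta_h, Theta_h, Lambda, Pi = engineer_household_tech(k, delta_N, lambda_1)

# Impulse: one unit of c at t = 0, nothing afterwards, starting from h_{-1} = 0
T = 6
c = np.zeros(T)
c[0] = 1.0
h = np.zeros(k + 1)
h1_path = []
for t in range(T):
    h = Delta_h @ h + Theta_h[:, 0] * c[t]
    h1_path.append(h[0])
print(h1_path)
```

With 𝑘 = 3 the first component stays at zero for three periods, jumps to one, and then decays at rate 𝛿𝑁 , which is the delay-then-depreciate pattern that the shift structure is designed to encode.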

• The supply of new workers becomes our consumption


• The dynamic demand curve becomes Rosen’s dynamic supply curve for new workers

Remark: This has an Imai-Keane flavor


For more details and Python code see Rosen schooling model

58.7.2 Skilled and Unskilled Workers

First, a demand curve for labor

$$ \begin{bmatrix} w_{ut} \\ w_{st} \end{bmatrix} = \alpha_d \begin{bmatrix} N_{ut} \\ N_{st} \end{bmatrix} + \epsilon_{1t} $$

where $\alpha_d$ is a (2 × 2) matrix of demand parameters and $\epsilon_{1t}$ is a vector of demand shifters
second, time-to-train specifications for skilled and unskilled labor, respectively:

$$ N_{st+k} = \delta_N N_{st+k-1} + n_{st} $$

$$ N_{ut} = \delta_N N_{ut-1} + n_{ut}; $$

where $N_{st}, N_{ut}$ are stocks of the two types of labor, and $n_{st}, n_{ut}$ are entry rates into the two
occupations
third, definitions of discounted present values of new entrants to the skilled and unskilled
occupations, respectively:

$$ v_{st} = E_t \beta^k \sum_{j=0}^\infty (\beta \delta_N)^j w_{st+k+j} $$

$$ v_{ut} = E_t \sum_{j=0}^\infty (\beta \delta_N)^j w_{ut+j} $$

where $w_{ut}, w_{st}$ are wage rates for the two occupations; and fourth, supply curves for new
entrants:

$$ \begin{bmatrix} n_{st} \\ n_{ut} \end{bmatrix} = \alpha_s \begin{bmatrix} v_{ut} \\ v_{st} \end{bmatrix} + \epsilon_{2t} $$

Short Cut
As an alternative, Siow simply used the equalizing differences condition

$$ v_{ut} = v_{st} $$

58.8 Permanent Income Models

• Many consumption goods and services


• A single capital good with ‘𝑅𝛽 = 1’

𝜙𝑐 ⋅ 𝑐𝑡 + 𝑖𝑡 = 𝛾𝑘𝑡−1 + 𝑒𝑡
𝑘𝑡 = 𝑘𝑡−1 + 𝑖𝑡

𝜙𝑖 𝑖 𝑡 − 𝑔 𝑡 = 0

Implication One:
Equality of Present Values of Moving Average Coefficients of 𝑐 and 𝑒


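$$ k_{t-1} = \beta \sum_{j=0}^\infty \beta^j (\phi_c \cdot c_{t+j} - e_{t+j}) $$

$$ k_{t-1} = \beta \sum_{j=0}^\infty \beta^j E(\phi_c \cdot c_{t+j} - e_{t+j}) \mid J_t $$

$$ \sum_{j=0}^\infty \beta^j (\phi_c)' \chi_j = \sum_{j=0}^\infty \beta^j \epsilon_j $$

where $\chi_j w_t$ is the response of $c_{t+j}$ to $w_t$ and $\epsilon_j w_t$ is the response of endowment $e_{t+j}$ to $w_t$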


Implication Two:
Martingales

ℳ𝑘𝑡 = 𝐸(ℳ𝑘𝑡+1 |𝐽𝑡 )


ℳ𝑒𝑡 = 𝐸(ℳ𝑒𝑡+1 |𝐽𝑡 )

and

ℳ𝑐𝑡 = (Φ𝑐 )′ ℳ𝑑𝑡 = 𝜙𝑐 𝑀𝑡𝑒



Testing Permanent Income Models:


Test the two implications:

• Equality of present values of moving average coefficients


• Martingale ℳ𝑘𝑡

These have been tested in work by Hansen, Sargent, and Roberts (1991) [116] and by Attana-
sio and Pavoni (2011) [10]

58.9 Gorman Heterogeneous Households

We now assume that there is a finite number of households, each with its own household tech-
nology and preferences over consumption services
Household 𝑗 orders preferences over consumption processes according to


$$ -\left(\frac{1}{2}\right) E \sum_{t=0}^\infty \beta^t \left[(s_{jt} - b_{jt}) \cdot (s_{jt} - b_{jt}) + \ell_{jt}^2\right] \Bigm| J_0 $$

𝑠𝑗𝑡 = Λ ℎ𝑗,𝑡−1 + Π 𝑐𝑗𝑡

ℎ𝑗𝑡 = Δℎ ℎ𝑗,𝑡−1 + Θℎ 𝑐𝑗𝑡

and ℎ𝑗,−1 is given

𝑏𝑗𝑡 = 𝑈𝑏𝑗 𝑧𝑡

$$ E \sum_{t=0}^\infty \beta^t p_t^0 \cdot c_{jt} \mid J_0 = E \sum_{t=0}^\infty \beta^t (w_t^0 \ell_{jt} + \alpha_t^0 \cdot d_{jt}) \mid J_0 + v_0 \cdot k_{j,-1}, $$

where 𝑘𝑗,−1 is given. The 𝑗th consumer owns an endowment process 𝑑𝑗𝑡 , governed by the
stochastic process 𝑑𝑗𝑡 = 𝑈𝑑𝑗 𝑧𝑡
We refer to this as a setting with Gorman heterogeneous households
This specification confines heterogeneity among consumers to:

• differences in the preference processes {𝑏𝑗𝑡 }, represented by different selections of 𝑈𝑏𝑗


• differences in the endowment processes {𝑑𝑗𝑡 }, represented by different selections of 𝑈𝑑𝑗
• differences in ℎ𝑗,−1 and
• differences in 𝑘𝑗,−1

The matrices Λ, Π, Δℎ , Θℎ do not depend on 𝑗


This makes everybody’s demand system have the form described earlier, with different $\mu_{0j}^w$’s
(reflecting different wealth levels) and different $b_{jt}$ preference shock processes and initial
conditions for household capital stocks

Punchline: ∃ a representative consumer


We can use the representative consumer to compute a competitive equilibrium aggregate
allocation and price system
With the equilibrium aggregate allocation and price system in hand, we can then compute
allocations to each household
Computing Allocations to Individuals:
Set

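$$\ell_{jt} = (\mu_{0j}^w / \mu_{0a}^w) \ell_{at}$$

Then solve the following equation for $\mu_{0j}^w$:

$$\mu_{0j}^w E_0 \sum_{t=0}^\infty \beta^t \{\rho_t^0 \cdot \rho_t^0 + (w_t^0 / \mu_{0a}^w) \ell_{at}\} = E_0 \sum_{t=0}^\infty \beta^t \{\rho_t^0 \cdot (b_{jt} - s_{jt}^i) - \alpha_t^0 \cdot d_{jt}\} - v_0 k_{j,-1}$$

$$s_{jt} - b_{jt} = \mu_{0j}^w \rho_t^0$$

$$c_{jt} = -\Pi^{-1} \Lambda h_{j,t-1} + \Pi^{-1} s_{jt}$$

$$h_{jt} = (\Delta_h - \Theta_h \Pi^{-1} \Lambda) h_{j,t-1} + \Theta_h \Pi^{-1} s_{jt}$$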

Here ℎ𝑗,−1 is given

58.10 Non-Gorman Heterogeneous Households

We now describe a less tractable type of heterogeneity across households that we dub Non-
Gorman heterogeneity
Here is the specification
Preferences and Household Technologies:


$$-\frac{1}{2} E \sum_{t=0}^\infty \beta^t [(s_{it} - b_{it}) \cdot (s_{it} - b_{it}) + \ell_{it}^2] \mid J_0$$

𝑠𝑖𝑡 = Λ𝑖 ℎ𝑖𝑡−1 + Π𝑖 𝑐𝑖𝑡


ℎ𝑖𝑡 = Δℎ𝑖 ℎ𝑖𝑡−1 + Θℎ𝑖 𝑐𝑖𝑡 , 𝑖 = 1, 2.

𝑏𝑖𝑡 = 𝑈𝑏𝑖 𝑧𝑡

𝑧𝑡+1 = 𝐴22 𝑧𝑡 + 𝐶2 𝑤𝑡+1

Production Technology

Φ𝑐 (𝑐1𝑡 + 𝑐2𝑡 ) + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑1𝑡 + 𝑑2𝑡



𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡

$$g_t \cdot g_t = \ell_t^2, \qquad \ell_t = \ell_{1t} + \ell_{2t}$$

𝑑𝑖𝑡 = 𝑈𝑑𝑖 𝑧𝑡 , 𝑖 = 1, 2

Pareto Problem:


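$$-\frac{1}{2} \lambda E_0 \sum_{t=0}^\infty \beta^t [(s_{1t} - b_{1t}) \cdot (s_{1t} - b_{1t}) + \ell_{1t}^2] - \frac{1}{2} (1 - \lambda) E_0 \sum_{t=0}^\infty \beta^t [(s_{2t} - b_{2t}) \cdot (s_{2t} - b_{2t}) + \ell_{2t}^2]$$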

Mongrel Aggregation: Static


There is what we call a kind of mongrel aggregation in this setting
We first describe the idea within a simple static setting in which a single consumer has the static demand curve

$$c_t = \Pi^{-1} b_t - \mu_0 \Pi^{-1} \Pi^{-1\prime} p_t$$

The associated inverse demand curve is

$$p_t = \mu_0^{-1} \Pi' b_t - \mu_0^{-1} \Pi' \Pi c_t$$

Integrating the marginal utility vector shows that preferences can be taken to be

(−2𝜇0 )−1 (Π𝑐𝑡 − 𝑏𝑡 ) ⋅ (Π𝑐𝑡 − 𝑏𝑡 )

Key Insight: Factor the inverse of a ‘covariance matrix’


Now assume that there are two consumers, 𝑖 = 1, 2, with demand curves

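$$c_{it} = \Pi_i^{-1} b_{it} - \mu_{0i} \Pi_i^{-1} \Pi_i^{-1\prime} p_t$$

$$c_{1t} + c_{2t} = (\Pi_1^{-1} b_{1t} + \Pi_2^{-1} b_{2t}) - (\mu_{01} \Pi_1^{-1} \Pi_1^{-1\prime} + \mu_{02} \Pi_2^{-1} \Pi_2^{-1\prime}) p_t$$

Setting $c_{1t} + c_{2t} = c_t$ and solving for $p_t$ gives

$$p_t = (\mu_{01} \Pi_1^{-1} \Pi_1^{-1\prime} + \mu_{02} \Pi_2^{-1} \Pi_2^{-1\prime})^{-1} (\Pi_1^{-1} b_{1t} + \Pi_2^{-1} b_{2t}) - (\mu_{01} \Pi_1^{-1} \Pi_1^{-1\prime} + \mu_{02} \Pi_2^{-1} \Pi_2^{-1\prime})^{-1} c_t$$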

Punchline: choose Π associated with the aggregate ordering to satisfy

$$\mu_0^{-1} \Pi' \Pi = (\mu_{01} \Pi_1^{-1} \Pi_1^{-1\prime} + \mu_{02} \Pi_2^{-1} \Pi_2^{-1\prime})^{-1}$$
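The static punchline can be illustrated numerically. The sketch below is not part of the original lecture: the matrices `Pi1`, `Pi2` and the multipliers `mu01`, `mu02`, `mu0` are hypothetical values chosen only for illustration, and a Cholesky factorization is used as one convenient way (among many) to factor the matrix on the right-hand side.

```python
import numpy as np

# Hypothetical household matrices and multipliers (illustrative values only)
Pi1 = np.array([[1.0, 0.0], [0.5, 1.0]])
Pi2 = np.array([[2.0, 0.3], [0.0, 1.5]])
mu01, mu02, mu0 = 1.0, 2.0, 1.0

Pi1_inv, Pi2_inv = np.linalg.inv(Pi1), np.linalg.inv(Pi2)

# Slope of the aggregate demand curve:
# mu01 * Pi1^{-1} Pi1^{-1}' + mu02 * Pi2^{-1} Pi2^{-1}'
S = mu01 * Pi1_inv @ Pi1_inv.T + mu02 * Pi2_inv @ Pi2_inv.T

# We need Pi' Pi = mu0 * S^{-1}; symmetrize to remove rounding noise
target = mu0 * np.linalg.inv(S)
target = (target + target.T) / 2

# Cholesky returns lower-triangular L with L L' = target,
# so Pi = L' satisfies Pi' Pi = target
L = np.linalg.cholesky(target)
Pi = L.T

print(np.allclose(Pi.T @ Pi / mu0, np.linalg.inv(S)))
```

The factor is not unique: any orthogonal rotation of `Pi` satisfies the same condition, so the Cholesky factor is just one admissible choice.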

Dynamic Analogue:
We now describe how to extend mongrel aggregation to a dynamic setting
The key comparison is

• Static: factor a covariance matrix-like object


• Dynamic: factor a spectral-density matrix-like object

Programming Problem for Dynamic Mongrel Aggregation:


Our strategy for deducing the mongrel preference ordering over $c_t = c_{1t} + c_{2t}$ is to solve the programming problem: choose $\{c_{1t}, c_{2t}\}$ to minimize the criterion

$$\sum_{t=0}^\infty \beta^t [\lambda (s_{1t} - b_{1t}) \cdot (s_{1t} - b_{1t}) + (1 - \lambda)(s_{2t} - b_{2t}) \cdot (s_{2t} - b_{2t})]$$

subject to

ℎ𝑗𝑡 = Δℎ𝑗 ℎ𝑗𝑡−1 + Θℎ𝑗 𝑐𝑗𝑡 , 𝑗 = 1, 2


𝑠𝑗𝑡 = Λ𝑗 ℎ𝑗𝑡−1 + Π𝑗 𝑐𝑗𝑡 , 𝑗 = 1, 2
𝑐1𝑡 + 𝑐2𝑡 = 𝑐𝑡

subject to (ℎ1,−1 , ℎ2,−1 ) given and {𝑏1𝑡 }, {𝑏2𝑡 }, {𝑐𝑡 } being known and fixed sequences
Substituting the {𝑐1𝑡 , 𝑐2𝑡 } sequences that solve this problem as functions of {𝑏1𝑡 , 𝑏2𝑡 , 𝑐𝑡 } into
the objective determines a mongrel preference ordering over {𝑐𝑡 } = {𝑐1𝑡 + 𝑐2𝑡 }
In solving this problem, it is convenient to proceed by using Fourier transforms. For details,
please see [59] where they deploy a
Secret Weapon: Another application of the spectral factorization identity
Concluding remark: The [59] class of models described in this lecture are all complete
markets models. We have exploited the fact that complete market models are all alike to
allow us to define a class that gives the same name to different things in the spirit of
Henri Poincare
Could we create such a class for incomplete markets models?
That would be nice, but before trying it would be wise to contemplate the remainder of a
statement by Robert E. Lucas, Jr., with which we began this lecture

“Complete market economies are all alike but each incomplete market economy is
incomplete in its own individual way.” Robert E. Lucas, Jr., (1989)
59

Growth in Dynamic Linear Economies

59.1 Contents

• Common Structure 59.2

• A Planning Problem 59.3

• Example Economies 59.4

Co-author: Sebastian Graves


This is another member of a suite of lectures that use the quantecon DLE class to instantiate
models within the [59] class of models described in detail in Recursive Models of Dynamic
Linear Economies
In addition to what’s included in Anaconda, this lecture uses the quantecon library

In [1]: !pip install quantecon

This lecture describes several complete market economies having a common linear-quadratic-
Gaussian structure
Three examples of such economies show how the DLE class can be used to compute equilibria
of such economies in Python and to illustrate how different versions of these economies can or
cannot generate sustained growth
We require the following imports

In [2]: import numpy as np


import matplotlib.pyplot as plt
from quantecon import LQ
from quantecon import DLE
%matplotlib inline

59.2 Common Structure

Our example economies have the following features


• Information flows are governed by an exogenous stochastic process 𝑧𝑡 that follows

𝑧𝑡+1 = 𝐴22 𝑧𝑡 + 𝐶2 𝑤𝑡+1

where 𝑤𝑡+1 is a martingale difference sequence


• Preference shocks 𝑏𝑡 and technology shocks 𝑑𝑡 are linear functions of 𝑧𝑡

𝑏𝑡 = 𝑈𝑏 𝑧𝑡

𝑑𝑡 = 𝑈 𝑑 𝑧𝑡

• Consumption and physical investment goods are produced using the following technol-
ogy

Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡

𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡

$$g_t \cdot g_t = l_t^2$$

where 𝑐𝑡 is a vector of consumption goods, 𝑔𝑡 is a vector of intermediate goods, 𝑖𝑡 is a


vector of investment goods, 𝑘𝑡 is a vector of physical capital goods, and 𝑙𝑡 is the amount
of labor supplied by the representative household
• Preferences of a representative household are described by

$$-\frac{1}{2} E \sum_{t=0}^\infty \beta^t [(s_t - b_t) \cdot (s_t - b_t) + l_t^2], \quad 0 < \beta < 1$$

𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡

ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡

where 𝑠𝑡 is a vector of consumption services, and ℎ𝑡 is a vector of household capital stocks


Thus, an instance of this class of economies is described by the matrices

{𝐴22 , 𝐶2 , 𝑈𝑏 , 𝑈𝑑 , Φ𝑐 , Φ𝑔 , Φ𝑖 , Γ, Δ𝑘 , Θ𝑘 , Λ, Π, Δℎ , Θℎ }

and the scalar 𝛽

59.3 A Planning Problem

The first welfare theorem asserts that a competitive equilibrium allocation solves the follow-
ing planning problem
Choose $\{c_t, s_t, i_t, h_t, k_t, g_t\}_{t=0}^\infty$ to maximize

$$-\frac{1}{2} E \sum_{t=0}^\infty \beta^t [(s_t - b_t) \cdot (s_t - b_t) + g_t \cdot g_t]$$

subject to the linear constraints

Φ𝑐 𝑐𝑡 + Φ𝑔 𝑔𝑡 + Φ𝑖 𝑖𝑡 = Γ𝑘𝑡−1 + 𝑑𝑡

𝑘𝑡 = Δ𝑘 𝑘𝑡−1 + Θ𝑘 𝑖𝑡

ℎ𝑡 = Δℎ ℎ𝑡−1 + Θℎ 𝑐𝑡

𝑠𝑡 = Λℎ𝑡−1 + Π𝑐𝑡

and

𝑧𝑡+1 = 𝐴22 𝑧𝑡 + 𝐶2 𝑤𝑡+1

𝑏𝑡 = 𝑈𝑏 𝑧𝑡

𝑑𝑡 = 𝑈 𝑑 𝑧𝑡

The DLE class in Python maps this planning problem into a linear-quadratic dynamic pro-
gramming problem and then solves it by using QuantEcon’s LQ class
(See Section 5.5 of Hansen & Sargent (2013) [59] for a full description of how to map these
economies into an LQ setting, and how to use the solution to the LQ problem to construct
the output matrices in order to simulate the economies)
The state for the LQ problem is

$$x_t = \begin{bmatrix} h_{t-1} \\ k_{t-1} \\ z_t \end{bmatrix}$$

and the control variable is 𝑢𝑡 = 𝑖𝑡


Once the LQ problem has been solved, the law of motion for the state is

𝑥𝑡+1 = (𝐴 − 𝐵𝐹 )𝑥𝑡 + 𝐶𝑤𝑡+1

where the optimal control law is 𝑢𝑡 = −𝐹 𝑥𝑡


Letting 𝐴𝑜 = 𝐴 − 𝐵𝐹 we write this law of motion as

𝑥𝑡+1 = 𝐴𝑜 𝑥𝑡 + 𝐶𝑤𝑡+1
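To make the closed-loop law of motion concrete, here is a minimal simulation sketch. It does not use the DLE or LQ classes: the matrices `A`, `B`, `C` and the feedback rule `F` are hypothetical numbers, not the solution of any economy in this lecture, and the shocks are assumed to be standard normal.

```python
import numpy as np

# Hypothetical system: 2 states, 1 control, 1 shock (illustrative values only)
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.0], [0.5]])
F = np.array([[0.2, 0.3]])       # assumed feedback rule, u_t = -F x_t

A_o = A - B @ F                   # closed-loop transition matrix

def simulate(A_o, C, x0, ts_length, seed=0):
    """Simulate x_{t+1} = A_o x_t + C w_{t+1} with standard normal shocks."""
    rng = np.random.default_rng(seed)
    n, m = A_o.shape[0], C.shape[1]
    x = np.empty((n, ts_length))
    x[:, 0] = x0
    for t in range(ts_length - 1):
        w = rng.standard_normal(m)
        x[:, t + 1] = A_o @ x[:, t] + C @ w
    return x

x_path = simulate(A_o, C, x0=np.array([1.0, 0.0]), ts_length=50)
eigs = np.linalg.eigvals(A_o)     # moduli below 1 imply a stable closed loop
print(np.abs(eigs))
```

The eigenvalues of `A_o` play the same role here as the endogenous and exogenous eigenvalues reported for the example economies.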

59.4 Example Economies

Each of the example economies shown here will share a number of components. In particular,
for each we will consider preferences of the form

$$-\frac{1}{2} E \sum_{t=0}^\infty \beta^t [(s_t - b_t)^2 + l_t^2], \quad 0 < \beta < 1$$

𝑠𝑡 = 𝜆ℎ𝑡−1 + 𝜋𝑐𝑡

ℎ𝑡 = 𝛿ℎ ℎ𝑡−1 + 𝜃ℎ 𝑐𝑡

𝑏𝑡 = 𝑈𝑏 𝑧𝑡

Technology of the form

𝑐𝑡 + 𝑖𝑡 = 𝛾1 𝑘𝑡−1 + 𝑑1𝑡

𝑘𝑡 = 𝛿𝑘 𝑘𝑡−1 + 𝑖𝑡

𝑔𝑡 = 𝜙1 𝑖𝑡 , 𝜙1 > 0

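$$\begin{bmatrix} d_{1t} \\ 0 \end{bmatrix} = U_d z_t$$

And information of the form

$$z_{t+1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.8 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} z_t + \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} w_{t+1}$$

$$U_b = \begin{bmatrix} 30 & 0 & 0 \end{bmatrix}$$

$$U_d = \begin{bmatrix} 5 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$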

We shall vary {𝜆, 𝜋, 𝛿ℎ , 𝜃ℎ , 𝛾1 , 𝛿𝑘 , 𝜙1 } and the initial state 𝑥0 across the three economies

59.4.1 Example 1: Hall (1978)

First, we set parameters such that consumption follows a random walk. In particular, we set

$$\lambda = 0, \quad \pi = 1, \quad \gamma_1 = 0.1, \quad \phi_1 = 0.00001, \quad \delta_k = 0.95, \quad \beta = \frac{1}{1.05}$$

(In this economy $\delta_h$ and $\theta_h$ are arbitrary, as household capital does not enter the equation for consumption services. We set them to values that will become useful in Example 3.)
It is worth noting that this choice of parameter values ensures that 𝛽(𝛾1 + 𝛿𝑘 ) = 1
For simulations of this economy, we choose an initial condition of


𝑥0 = [ 5 150 1 0 0 ]

In [3]: # Parameter Matrices
γ_1 = 0.1
ϕ_1 = 1e-5

ϕ_c, ϕ_g, ϕ_i, γ, δ_k, θ_k = (np.array([[1], [0]]),
                              np.array([[0], [1]]),
                              np.array([[1], [-ϕ_1]]),
                              np.array([[γ_1], [0]]),
                              np.array([[.95]]),
                              np.array([[1]]))

β, l_λ, π_h, δ_h, θ_h = (np.array([[1 / 1.05]]),
                         np.array([[0]]),
                         np.array([[1]]),
                         np.array([[.9]]),
                         np.array([[1]]) - np.array([[.9]]))

a22, c2, ub, ud = (np.array([[1, 0, 0],
                             [0, 0.8, 0],
                             [0, 0, 0.5]]),
                   np.array([[0, 0],
                             [1, 0],
                             [0, 1]]),
                   np.array([[30, 0, 0]]),
                   np.array([[5, 1, 0],
                             [0, 0, 0]]))

# Initial condition
x0 = np.array([[5], [150], [1], [0], [0]])

Info1 = (a22, c2, ub, ud)
Tech1 = (ϕ_c, ϕ_g, ϕ_i, γ, δ_k, θ_k)
Pref1 = (β, l_λ, π_h, δ_h, θ_h)

These parameter values are used to define an economy of the DLE class

In [4]: Econ1 = DLE(Info1, Tech1, Pref1)

We can then simulate the economy for a chosen length of time, from our initial state vector
𝑥0

In [5]: Econ1.compute_sequence(x0, ts_length=300)

The economy stores the simulated values for each variable. Below we plot consumption and
investment

In [6]: # This is the right panel of Fig 5.7.1 from p.105 of HS2013
plt.plot(Econ1.c[0], label='Cons.')
plt.plot(Econ1.i[0], label='Inv.')
plt.legend()
plt.show()

Inspection of the plot shows that the sample paths of consumption and investment drift in
ways that suggest that each has or nearly has a random walk or unit root component
This is confirmed by checking the eigenvalues of 𝐴𝑜

In [7]: Econ1.endo, Econ1.exo

Out[7]: (array([0.9, 1. ]), array([1. , 0.8, 0.5]))

The endogenous eigenvalue that appears to be unity reflects the random walk character of
consumption in Hall’s model

• Actually, the largest endogenous eigenvalue is very slightly below 1


• This outcome comes from the small adjustment cost 𝜙1

In [8]: Econ1.endo[1]

Out[8]: 0.9999999999904767

The fact that the largest endogenous eigenvalue is strictly less than unity in modulus means
that it is possible to compute the non-stochastic steady state of consumption, investment and
capital

In [9]: Econ1.compute_steadystate()
np.set_printoptions(precision=3, suppress=True)
print(Econ1.css, Econ1.iss, Econ1.kss)

[[4.999]] [[-0.001]] [[-0.022]]

However, the near-unity endogenous eigenvalue means that these steady state values are of
little relevance

59.4.2 Example 2: Altered Growth Condition

We generate our next economy by making two alterations to the parameters of Example 1

• First, we raise 𝜙1 from 0.00001 to 1

– This will lower the endogenous eigenvalue that is close to 1, causing the economy
to head more quickly to the vicinity of its non-stochastic steady-state

• Second, we raise 𝛾1 from 0.1 to 0.15

– This has the effect of raising the optimal steady-state value of capital

We also start the economy off from an initial condition with a lower capital stock


𝑥0 = [ 5 20 1 0 0 ]

Therefore, we need to define the following new parameters

In [10]: γ2 = 0.15
γ22 = np.array([[γ2], [0]])

ϕ_12 = 1
ϕ_i2 = np.array([[1], [-ϕ_12]])

Tech2 = (ϕ_c, ϕ_g, ϕ_i2, γ22, δ_k, θ_k)

x02 = np.array([[5], [20], [1], [0], [0]])

Creating the DLE class and then simulating gives the following plot for consumption and in-
vestment

In [11]: Econ2 = DLE(Info1, Tech2, Pref1)

Econ2.compute_sequence(x02, ts_length=300)

plt.plot(Econ2.c[0], label='Cons.')
plt.plot(Econ2.i[0], label='Inv.')
plt.legend()
plt.show()

Simulating our new economy shows that consumption grows quickly in the early stages of the
sample
However, it then settles down around the new non-stochastic steady-state level of consump-
tion of 17.5, which we find as follows

In [12]: Econ2.compute_steadystate()
print(Econ2.css, Econ2.iss, Econ2.kss)

[[17.5]] [[6.25]] [[125.]]
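As a consistency check, the reported steady state can be reproduced by hand from the technology equations of Example 2. The sketch below takes the steady-state capital stock of 125 from the output above as given and assumes the non-stochastic steady state of the exogenous process is $z = (1, 0, 0)'$, so that $d_1 = 5$ from the first row of $U_d$.

```python
import numpy as np

# Values taken from Example 2 and its reported output
k_ss = 125.0
gamma_1, delta_k = 0.15, 0.95
d_1 = 5.0          # endowment when z = (1, 0, 0)', an assumption stated above

# k = delta_k * k + i   =>   i = (1 - delta_k) * k
i_ss = (1 - delta_k) * k_ss

# c + i = gamma_1 * k + d_1   =>   c = gamma_1 * k + d_1 - i
c_ss = gamma_1 * k_ss + d_1 - i_ss

print(np.round([c_ss, i_ss, k_ss], 3))
```

Up to floating-point rounding, this recovers consumption of 17.5 and investment of 6.25, matching `Econ2.css` and `Econ2.iss`.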

The economy converges faster to this level than in Example 1 because the largest endogenous
eigenvalue of 𝐴𝑜 is now significantly lower than 1

In [13]: Econ2.endo, Econ2.exo

Out[13]: (array([0.9 , 0.952]), array([1. , 0.8, 0.5]))

59.4.3 Example 3: A Jones-Manuelli (1990) Economy

For our third economy, we choose parameter values with the aim of generating sustained
growth in consumption, investment and capital
To do this, we set parameters so that Jones and Manuelli’s “growth condition” is just satisfied
In our notation, just satisfying the growth condition is actually equivalent to setting 𝛽(𝛾1 +
𝛿𝑘 ) = 1, the condition that was necessary for consumption to be a random walk in Hall’s
model
Thus, we lower 𝛾1 back to 0.1
In our model, this is a necessary but not sufficient condition for growth

To generate growth we set preference parameters to reflect habit persistence


In particular, we set 𝜆 = −1, 𝛿ℎ = 0.9 and 𝜃ℎ = 1 − 𝛿ℎ = 0.1
This makes preferences assume the form

$$-\frac{1}{2} E \sum_{t=0}^\infty \beta^t \left[ \left( c_t - b_t - (1 - \delta_h) \sum_{j=0}^\infty \delta_h^j c_{t-j-1} \right)^2 + l_t^2 \right]$$

These preferences reflect habit persistence

• the effective “bliss point” $b_t + (1 - \delta_h) \sum_{j=0}^\infty \delta_h^j c_{t-j-1}$ now shifts in response to a moving average of past consumption
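The link between the household capital recursion and the moving average of past consumption can be verified numerically. This is a sketch under stated assumptions: the consumption path `c` is arbitrary random data rather than output of the model, and household capital is started from zero, which is equivalent to truncating the infinite sum at the start of the sample.

```python
import numpy as np

delta_h = 0.9
theta_h = 1 - delta_h

rng = np.random.default_rng(0)
c = rng.uniform(4, 6, size=200)    # arbitrary consumption path (illustration only)

# Recursion h_t = delta_h * h_{t-1} + theta_h * c_t, starting from h_{-1} = 0
h = np.empty_like(c)
h_prev = 0.0
for t, c_t in enumerate(c):
    h[t] = delta_h * h_prev + theta_h * c_t
    h_prev = h[t]

# Closed form: h_{t-1} = (1 - delta_h) * sum_{j=0}^{t-1} delta_h^j * c_{t-j-1}
t = 150
weights = delta_h ** np.arange(t)          # delta_h^j for j = 0, ..., t-1
past_c = c[t - 1::-1]                      # c_{t-1}, c_{t-2}, ..., c_0
habit = (1 - delta_h) * weights @ past_c

print(np.isclose(habit, h[t - 1]))
```

With $\lambda = -1$ and $\pi = 1$, services are $s_t = c_t - h_{t-1}$, so `habit` is exactly the term subtracted from consumption in the preferences above.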

Since 𝛿ℎ and 𝜃ℎ were defined earlier, the only change we need to make from the parameters of
Example 1 is to define the new value of 𝜆

In [14]: l_λ2 = np.array([[-1]])


Pref2 = (β, l_λ2, π_h, δ_h, θ_h)

In [15]: Econ3 = DLE(Info1, Tech1, Pref2)

We simulate this economy from the original state vector

In [16]: Econ3.compute_sequence(x0, ts_length=300)

# This is the right panel of Fig 5.10.1 from p.110 of HS2013


plt.plot(Econ3.c[0], label='Cons.')
plt.plot(Econ3.i[0], label='Inv.')
plt.legend()
plt.show()

Thus, adding habit persistence to the Hall model of Example 1 is enough to generate sus-
tained growth in our economy
The eigenvalues of 𝐴𝑜 in this new economy are

In [17]: Econ3.endo, Econ3.exo

Out[17]: (array([1.+0.j, 1.-0.j]), array([1. , 0.8, 0.5]))

We now have two unit endogenous eigenvalues. One stems from satisfying the growth condi-
tion (as in Example 1)
The other unit eigenvalue results from setting 𝜆 = −1
To show the importance of both of these for generating growth, we consider the following ex-
periments

59.4.4 Example 3.1: Varying Sensitivity

Next we raise 𝜆 to -0.7

In [18]: l_λ3 = np.array([[-0.7]])


Pref3 = (β, l_λ3, π_h, δ_h, θ_h)

Econ4 = DLE(Info1, Tech1, Pref3)

Econ4.compute_sequence(x0, ts_length=300)

plt.plot(Econ4.c[0], label='Cons.')
plt.plot(Econ4.i[0], label='Inv.')
plt.legend()
plt.show()

We no longer achieve sustained growth if 𝜆 is raised from -1 to -0.7


This is related to the fact that one of the endogenous eigenvalues is now less than 1

In [19]: Econ4.endo, Econ4.exo

Out[19]: (array([0.97, 1. ]), array([1. , 0.8, 0.5]))

59.4.5 Example 3.2: More Impatience

Next let’s lower 𝛽 to 0.94

In [20]: β_2 = np.array([[0.94]])


Pref4 = (β_2, l_λ, π_h, δ_h, θ_h)

Econ5 = DLE(Info1, Tech1, Pref4)

Econ5.compute_sequence(x0, ts_length=300)

plt.plot(Econ5.c[0], label='Cons.')
plt.plot(Econ5.i[0], label='Inv.')
plt.legend()
plt.show()

Growth also fails if we lower 𝛽, since we now have 𝛽(𝛾1 + 𝛿𝑘 ) < 1


Consumption and investment explode downwards, as a lower value of 𝛽 causes the representa-
tive consumer to front-load consumption
This explosive path shows up in the second endogenous eigenvalue now being larger than one

In [21]: Econ5.endo, Econ5.exo

Out[21]: (array([0.9 , 1.013]), array([1. , 0.8, 0.5]))
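The growth condition $\beta(\gamma_1 + \delta_k)$ for the parameterizations used in this lecture can be tabulated directly. The snippet is a plain arithmetic sketch using the parameter values stated in the examples; the dictionary labels are only for display.

```python
# Parameter values as stated in the examples of this lecture
cases = {
    "Examples 1, 3, 3.1": dict(beta=1 / 1.05, gamma_1=0.10, delta_k=0.95),
    "Example 2":          dict(beta=1 / 1.05, gamma_1=0.15, delta_k=0.95),
    "Example 3.2":        dict(beta=0.94,     gamma_1=0.10, delta_k=0.95),
}

growth = {name: p["beta"] * (p["gamma_1"] + p["delta_k"])
          for name, p in cases.items()}

for name, value in growth.items():
    print(f"{name}: beta * (gamma_1 + delta_k) = {value:.4f}")
```

The product equals 1 (up to rounding) in Examples 1, 3 and 3.1, is roughly 1.048 in Example 2, and is 0.987 in Example 3.2.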


60

Lucas Asset Pricing Using DLE

60.1 Contents

• Asset Pricing Equations 60.2


• Asset Pricing Simulations 60.3

Co-author: Sebastian Graves


This is one of a suite of lectures that use the quantecon DLE class to instantiate models
within the [59] class of models described in detail in Recursive Models of Dynamic Linear
Economies
In addition to what’s in Anaconda, this lecture uses the quantecon library

In [1]: !pip install quantecon

This lecture uses the DLE class to price payout streams that are linear functions of the econ-
omy’s state vector, as well as risk-free assets that pay out one unit of the first consumption
good with certainty
We assume basic knowledge of the class of economic environments that fall within the domain
of the DLE class
Many details about the basic environment are contained in the lecture Growth in Dynamic
Linear Economies
We’ll also need the following imports

In [2]: import numpy as np


import matplotlib.pyplot as plt
from quantecon import LQ
from quantecon import DLE
%matplotlib inline

We use a linear-quadratic version of an economy that Lucas (1978) [88] used to develop an
equilibrium theory of asset prices:
Preferences

$$-\frac{1}{2} E \sum_{t=0}^\infty \beta^t [(c_t - b_t)^2 + l_t^2] \mid J_0$$


𝑠𝑡 = 𝑐𝑡

𝑏𝑡 = 𝑈𝑏 𝑧𝑡

Technology

𝑐𝑡 = 𝑑1𝑡

𝑘𝑡 = 𝛿𝑘 𝑘𝑡−1 + 𝑖𝑡

𝑔𝑡 = 𝜙1 𝑖𝑡 , 𝜙1 > 0

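$$\begin{bmatrix} d_{1t} \\ 0 \end{bmatrix} = U_d z_t$$

Information

$$z_{t+1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.8 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} z_t + \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} w_{t+1}$$

$$U_b = \begin{bmatrix} 30 & 0 & 0 \end{bmatrix}$$

$$U_d = \begin{bmatrix} 5 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$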


𝑥0 = [ 5 150 1 0 0 ]

60.2 Asset Pricing Equations

[59] show that the time t value of a permanent claim to a stream 𝑦𝑠 = 𝑈𝑎 𝑥𝑠 , 𝑠 ≥ 𝑡 is:

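$$a_t = (x_t' \mu_a x_t + \sigma_a) / (\bar{e}_1 M_c x_t)$$

with

$$\mu_a = \sum_{\tau=0}^\infty \beta^\tau (A^{o\prime})^\tau Z_a A^{o\tau}$$

$$\sigma_a = \frac{\beta}{1 - \beta} \, \mathrm{trace} \left( Z_a \sum_{\tau=0}^\infty \beta^\tau (A^o)^\tau C C' (A^{o\prime})^\tau \right)$$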

where


$$Z_a = U_a' M_c$$

The use of 𝑒1̄ indicates that the first consumption good is the numeraire
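The two infinite sums in the pricing formula are fixed points of linear matrix equations, so they can be approximated by simple iteration. The sketch below is not how the DLE class computes them: `discounted_sum` is a hypothetical helper, all matrices are made-up illustrative values, and $Z_a$ is taken to be the outer product $U_a' M_c$, which is an assumption about the intended transpose.

```python
import numpy as np

def discounted_sum(Z, A_left, A_right, beta, tol=1e-12, max_iter=10_000):
    """Approximate sum_{tau>=0} beta^tau A_left^tau Z A_right^tau
    by iterating S = Z + beta * A_left @ S @ A_right."""
    S = Z.copy()
    for _ in range(max_iter):
        S_new = Z + beta * A_left @ S @ A_right
        if np.max(np.abs(S_new - S)) < tol:
            return S_new
        S = S_new
    raise RuntimeError("no convergence; the discounted sum may not exist")

# Hypothetical inputs (illustrative values only)
beta = 0.95
A_o = np.array([[0.9, 0.1], [0.0, 0.5]])
C = np.array([[0.0], [1.0]])
U_a = np.array([[1.0, 0.5]])      # payout y_t = U_a x_t
M_c = np.array([[0.3, 0.2]])      # single consumption good, so e_1-bar is 1
Z_a = U_a.T @ M_c

mu_a = discounted_sum(Z_a, A_o.T, A_o, beta)
V = discounted_sum(C @ C.T, A_o, A_o.T, beta)
sigma_a = beta / (1 - beta) * np.trace(Z_a @ V)

x = np.array([1.0, 2.0])
price = (x @ mu_a @ x + sigma_a) / (M_c @ x)[0]
print(price)
```

The iteration converges here because $\beta$ times the squared spectral radius of `A_o` is below one; when an eigenvalue of $A^o$ is at or near unity, more care is needed.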

60.3 Asset Pricing Simulations

In [3]: gam = 0
γ = np.array([[gam], [0]])
ϕ_c = np.array([[1], [0]])
ϕ_g = np.array([[0], [1]])
ϕ_1 = 1e-4
ϕ_i = np.array([[0], [-ϕ_1]])
δ_k = np.array([[.95]])
θ_k = np.array([[1]])
β = np.array([[1 / 1.05]])
ud = np.array([[5, 1, 0],
               [0, 0, 0]])
a22 = np.array([[1, 0, 0],
                [0, 0.8, 0],
                [0, 0, 0.5]])
c2 = np.array([[0, 1, 0],
               [0, 0, 1]]).T
l_λ = np.array([[0]])
π_h = np.array([[1]])
δ_h = np.array([[.9]])
θ_h = np.array([[1]]) - δ_h
ub = np.array([[30, 0, 0]])
x0 = np.array([[5, 150, 1, 0, 0]]).T

Info1 = (a22, c2, ub, ud)
Tech1 = (ϕ_c, ϕ_g, ϕ_i, γ, δ_k, θ_k)
Pref1 = (β, l_λ, π_h, δ_h, θ_h)

In [4]: Econ1 = DLE(Info1, Tech1, Pref1)

After specifying a “Pay” matrix, we simulate the economy


The particular choice of “Pay” used below means that we are pricing a perpetual claim on the
endowment process 𝑑1𝑡

In [5]: Econ1.compute_sequence(x0, ts_length=100, Pay=np.array([Econ1.Sd[0, :]]))

The graph below plots the price of this claim over time:

In [6]: ### Fig 7.12.1 from p.147 of HS2013


plt.plot(Econ1.Pay_Price, label='Price of Tree')
plt.legend()
plt.show()

The next plot displays the realized gross rate of return on this “Lucas tree” as well as on a
risk-free one-period bond:

In [7]: ### Left panel of Fig 7.12.2 from p.148 of HS2013


plt.plot(Econ1.Pay_Gross, label='Tree')
plt.plot(Econ1.R1_Gross, label='Risk-Free')
plt.legend()
plt.show()

In [8]: np.corrcoef(Econ1.Pay_Gross[1:, 0], Econ1.R1_Gross[1:, 0])



Out[8]: array([[ 1.        , -0.43010995],
               [-0.43010995,  1.        ]])

Above we have also calculated the correlation coefficient between these two returns
To give an idea of how the term structure of interest rates moves in this economy, the next
plot displays the net rates of return on one-period and five-period risk-free bonds:

In [9]: ### Right panel of Fig 7.12.2 from p.148 of HS2013


plt.plot(Econ1.R1_Net, label='One-Period')
plt.plot(Econ1.R5_Net, label='Five-Period')
plt.legend()
plt.show()

From the above plot, we can see the tendency of the term structure to slope up when rates
are low and to slope down when rates are high
Comparing it to the previous plot of the price of the “Lucas tree”, we can also see that net
rates of return are low when the price of the tree is high, and vice versa
We now plot the realized gross rate of return on a “Lucas tree” as well as on a risk-free one-
period bond when the autoregressive parameter for the endowment process is reduced to 0.4:

In [10]: a22_2 = np.array([[1, 0, 0],


[0, 0.4, 0],
[0, 0, 0.5]])
Info2 = (a22_2, c2, ub, ud)

Econ2 = DLE(Info2, Tech1, Pref1)


Econ2.compute_sequence(x0, ts_length=100, Pay=np.array([Econ2.Sd[0, :]]))

In [11]: ### Left panel of Fig 7.12.3 from p.148 of HS2013


plt.plot(Econ2.Pay_Gross, label='Tree')
plt.plot(Econ2.R1_Gross, label='Risk-Free')
plt.legend()
plt.show()

In [12]: np.corrcoef(Econ2.Pay_Gross[1:, 0], Econ2.R1_Gross[1:, 0])

Out[12]: array([[ 1.        , -0.66759621],
                [-0.66759621,  1.        ]])

The correlation between these two gross rates is now more negative
Next, we again plot the net rates of return on one-period and five-period risk-free bonds:

In [13]: ### Right panel of Fig 7.12.3 from p.148 of HS2013


plt.plot(Econ2.R1_Net, label='One-Period')
plt.plot(Econ2.R5_Net, label='Five-Period')
plt.legend()
plt.show()

We can see the tendency of the term structure to slope up when rates are low (and down
when rates are high) has been accentuated relative to the first instance of our economy
61

IRFs in Hall Models

61.1 Contents

• Example 1: Hall (1978) 61.2

• Example 2: Higher Adjustment Costs 61.3

• Example 3: Durable Consumption Goods 61.4

Co-author: Sebastian Graves


This is another member of a suite of lectures that use the quantecon DLE class to instantiate
models within the [59] class of models described in detail in Recursive Models of Dynamic
Linear Economies
In addition to what’s in Anaconda, this lecture uses the quantecon library

In [1]: !pip install quantecon

We’ll make these imports

In [2]: import numpy as np


import matplotlib.pyplot as plt
from quantecon import LQ
from quantecon import DLE
%matplotlib inline

This lecture shows how the DLE class can be used to create impulse response functions for
three related economies, starting from Hall (1978) [48]
Knowledge of the basic economic environment is assumed
See the lecture “Growth in Dynamic Linear Economies” for more details

61.2 Example 1: Hall (1978)

First, we set parameters to make consumption (almost) follow a random walk


We set


$$\lambda = 0, \quad \pi = 1, \quad \gamma_1 = 0.1, \quad \phi_1 = 0.00001, \quad \delta_k = 0.95, \quad \beta = \frac{1}{1.05}$$

(In this example $\delta_h$ and $\theta_h$ are arbitrary, as household capital does not enter the equation for consumption services. We set them to values that will become useful in Example 3.)
It is worth noting that this choice of parameter values ensures that 𝛽(𝛾1 + 𝛿𝑘 ) = 1
For simulations of this economy, we choose an initial condition of:


𝑥0 = [ 5 150 1 0 0 ]

In [3]: γ_1 = 0.1
γ = np.array([[γ_1], [0]])
ϕ_c = np.array([[1], [0]])
ϕ_g = np.array([[0], [1]])
ϕ_1 = 1e-5
ϕ_i = np.array([[1], [-ϕ_1]])
δ_k = np.array([[.95]])
θ_k = np.array([[1]])
β = np.array([[1 / 1.05]])
l_λ = np.array([[0]])
π_h = np.array([[1]])
δ_h = np.array([[.9]])
θ_h = np.array([[1]])
a22 = np.array([[1, 0, 0],
                [0, 0.8, 0],
                [0, 0, 0.5]])
c2 = np.array([[0, 0],
               [1, 0],
               [0, 1]])
ud = np.array([[5, 1, 0],
               [0, 0, 0]])
ub = np.array([[30, 0, 0]])
x0 = np.array([[5], [150], [1], [0], [0]])

Info1 = (a22, c2, ub, ud)
Tech1 = (ϕ_c, ϕ_g, ϕ_i, γ, δ_k, θ_k)
Pref1 = (β, l_λ, π_h, δ_h, θ_h)

These parameter values are used to define an economy of the DLE class
We can then simulate the economy for a chosen length of time, from our initial state vector
𝑥0
The economy stores the simulated values for each variable. Below we plot consumption and
investment:

In [4]: Econ1 = DLE(Info1, Tech1, Pref1)


Econ1.compute_sequence(x0, ts_length=300)

# This is the right panel of Fig 5.7.1 from p.105 of HS2013


plt.plot(Econ1.c[0], label='Cons.')
plt.plot(Econ1.i[0], label='Inv.')
plt.legend()
plt.show()

The DLE class can be used to create impulse response functions for each of the endogenous
variables: {𝑐𝑡 , 𝑠𝑡 , ℎ𝑡 , 𝑖𝑡 , 𝑘𝑡 , 𝑔𝑡 }
If no selector vector for the shock is specified, the default choice is to give IRFs to the first
shock in 𝑤𝑡+1
Below we plot the impulse response functions of investment and consumption to an endow-
ment innovation (the first shock) in the Hall model:

In [5]: Econ1.irf(ts_length=40, shock=None)


# This is the left panel of Fig 5.7.1 from p.105 of HS2013
plt.plot(Econ1.c_irf, label='Cons.')
plt.plot(Econ1.i_irf, label='Inv.')
plt.legend()
plt.show()

It can be seen that the endowment shock has permanent effects on the level of both consump-
tion and investment, consistent with the endogenous unit eigenvalue in this economy
Investment is much more responsive to the endowment shock at shorter time horizons
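The mechanics behind an impulse response in a linear state-space system can be sketched without the DLE class. In the snippet below, `impulse_response` is a hypothetical helper, the matrices are illustrative rather than taken from the Hall economy, and the timing convention (the shock hits at horizon 0) is an assumption that may differ from the one used by `irf`.

```python
import numpy as np

def impulse_response(A_o, C, S, shock, ts_length):
    """Return S @ A_o^j @ C @ shock for j = 0, ..., ts_length - 1."""
    x = C @ shock
    out = []
    for _ in range(ts_length):
        out.append(S @ x)
        x = A_o @ x
    return np.array(out)

# Hypothetical system with one unit root and one stable root
A_o = np.array([[1.0, 0.0], [0.0, 0.8]])
C = np.array([[0.2], [1.0]])
S = np.array([[1.0, 1.0]])         # selector for an observable y_t = S x_t

irf = impulse_response(A_o, C, S, shock=np.array([1.0]), ts_length=40)

# The response is 0.2 + 0.8**j: the transitory part dies out,
# while the unit-root part leaves a permanent effect of 0.2
print(irf[0, 0], irf[-1, 0])
```

This mirrors the pattern in the Hall model: a unit eigenvalue in $A^o$ is what makes the effect of a shock permanent.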

61.3 Example 2: Higher Adjustment Costs

We generate our next economy by making only one change to the parameters of Example 1: we raise the parameter associated with the cost of adjusting capital, $\phi_1$, from 0.00001 to 0.2
This will lower the endogenous eigenvalue that is unity in Example 1 to a value slightly below
1

In [6]: ϕ_12 = 0.2
ϕ_i2 = np.array([[1], [-ϕ_12]])
Tech2 = (ϕ_c, ϕ_g, ϕ_i2, γ, δ_k, θ_k)

Econ2 = DLE(Info1, Tech2, Pref1)
Econ2.compute_sequence(x0, ts_length=300)

# This is the right panel of Fig 5.8.1 from p.106 of HS2013
plt.plot(Econ2.c[0], label='Cons.')
plt.plot(Econ2.i[0], label='Inv.')
plt.legend()
plt.show()

In [7]: Econ2.irf(ts_length=40, shock=None)
# This is the left panel of Fig 5.8.1 from p.106 of HS2013
plt.plot(Econ2.c_irf, label='Cons.')
plt.plot(Econ2.i_irf, label='Inv.')
plt.legend()
plt.show()

In [8]: Econ2.endo

Out[8]: array([0.9 , 0.99657126])

In [9]: Econ2.compute_steadystate()
print(Econ2.css, Econ2.iss, Econ2.kss)

[[5.]] [[2.02678791e-12]] [[4.05357139e-11]]

The first graph shows that there seems to be a downward trend in both consumption and in-
vestment
This is a consequence of the decrease in the largest endogenous eigenvalue from unity in the
earlier economy, caused by the higher adjustment cost
The present economy has a nonstochastic steady state value of 5 for consumption and 0 for
both capital and investment
Because the largest endogenous eigenvalue is still close to 1, the economy heads only slowly
towards these mean values
The impulse response functions now show that an endowment shock does not have a perma-
nent effect on the levels of either consumption or investment

61.4 Example 3: Durable Consumption Goods

We generate our third economy by raising 𝜙1 further, to 1.0. We also raise the production
function parameter from 0.1 to 0.15 (which raises the non-stochastic steady state value of
capital above zero)
We also change the specification of preferences to make the consumption good durable
Specifically, we allow for a single durable household good obeying:

ℎ𝑡 = 𝛿ℎ ℎ𝑡−1 + 𝑐𝑡 , 0 < 𝛿ℎ < 1

Services are related to the stock of durables at the beginning of the period:

𝑠𝑡 = 𝜆ℎ𝑡−1 , 𝜆 > 0

And preferences are ordered by:

$$-\frac{1}{2} E \sum_{t=0}^\infty \beta^t [(\lambda h_{t-1} - b_t)^2 + l_t^2] \mid J_0$$

To implement this, we set 𝜆 = 0.1 and 𝜋 = 0 (we have already set 𝜃ℎ = 1 and 𝛿ℎ = 0.9)
We start from an initial condition that makes consumption begin near its non-stochastic steady state

In [10]: ϕ_13 = 1
ϕ_i3 = np.array([[1], [-ϕ_13]])

γ_12 = 0.15
γ_2 = np.array([[γ_12], [0]])

l_λ2 = np.array([[0.1]])
π_h2 = np.array([[0]])

x01 = np.array([[150], [100], [1], [0], [0]])

Tech3 = (ϕ_c, ϕ_g, ϕ_i3, γ_2, δ_k, θ_k)
Pref2 = (β, l_λ2, π_h2, δ_h, θ_h)

Econ3 = DLE(Info1, Tech3, Pref2)
Econ3.compute_sequence(x01, ts_length=300)

# This is the right panel of Fig 5.11.1 from p.111 of HS2013
plt.plot(Econ3.c[0], label='Cons.')
plt.plot(Econ3.i[0], label='Inv.')
plt.legend()
plt.show()

In contrast to Hall’s original model of Example 1, it is now investment that is much smoother
than consumption
This illustrates how making consumption goods durable tends to undo the strong consump-
tion smoothing result that Hall obtained

In [11]: Econ3.irf(ts_length=40, shock=None)


# This is the left panel of Fig 5.11.1 from p.111 of HS2013
plt.plot(Econ3.c_irf, label='Cons.')
plt.plot(Econ3.i_irf, label='Inv.')
plt.legend()
plt.show()

The impulse response functions confirm that consumption is now much more responsive to an
endowment shock (and investment less so) than in Example 1
As in Example 2, the endowment shock has permanent effects on neither variable
62

Permanent Income Model using the DLE Class

62.1 Contents

• The Permanent Income Model 62.2

Co-author: Sebastian Graves


This lecture is part of a suite of lectures that use the quantecon DLE class to instantiate
models within the [59] class of models described in detail in Recursive Models of Dynamic
Linear Economies
In addition to what’s included in Anaconda, this lecture uses the quantecon library

In [1]: !pip install quantecon

This lecture adds a third solution method for the linear-quadratic-Gaussian permanent in-
come model with 𝛽𝑅 = 1, complementing the other two solution methods described in
Optimal Savings I: The Permanent Income Model and Optimal Savings II: LQ Techniques
and this Jupyter notebook http://nbviewer.jupyter.org/github/QuantEcon/QuantEcon.notebooks/blob/master/permanent_income.ipynb
The additional solution method uses the DLE class
In this way, we map the permanent income model into the framework of Hansen & Sargent
(2013) “Recursive Models of Dynamic Linear Economies” [59]
We’ll also require the following imports

In [2]: import quantecon as qe


import numpy as np
import scipy.linalg as la
import matplotlib.pyplot as plt
from quantecon import DLE

%matplotlib inline
np.set_printoptions(suppress=True, precision=4)


62.2 The Permanent Income Model

The LQ permanent income model is an example of a savings problem


A consumer has preferences over consumption streams that are ordered by the utility func-
tional


$$E_0 \sum_{t=0}^\infty \beta^t u(c_t) \tag{1}$$

where 𝐸𝑡 is the mathematical expectation conditioned on the consumer’s time 𝑡 information,


𝑐𝑡 is time 𝑡 consumption, 𝑢(𝑐) is a strictly concave one-period utility function, and 𝛽 ∈ (0, 1)
is a discount factor
The LQ model gets its name partly from assuming that the utility function 𝑢 is quadratic:

$$u(c) = -.5 (c - \gamma)^2$$

where 𝛾 > 0 is a bliss level of consumption


The consumer maximizes the utility functional Eq. (1) by choosing a consumption, borrowing plan $\{c_t, b_{t+1}\}_{t=0}^\infty$ subject to the sequence of budget constraints

𝑐𝑡 + 𝑏𝑡 = 𝑅−1 𝑏𝑡+1 + 𝑦𝑡 , 𝑡 ≥ 0 (2)

where 𝑦𝑡 is an exogenous stationary endowment process, 𝑅 is a constant gross risk-free inter-


est rate, 𝑏𝑡 is one-period risk-free debt maturing at 𝑡, and 𝑏0 is a given initial condition
We shall assume that 𝑅−1 = 𝛽
Eq. (2) is linear
We use another set of linear equations to model the endowment process
In particular, we assume that the endowment process has the state-space representation

$$\begin{aligned} z_{t+1} &= A_{22} z_t + C_2 w_{t+1} \\ y_t &= U_y z_t \end{aligned} \tag{3}$$

where 𝑤𝑡+1 is an IID process with mean zero and identity contemporaneous covariance ma-
trix, 𝐴22 is a stable matrix, its eigenvalues being strictly below unity in modulus, and 𝑈𝑦 is a
selection vector that identifies 𝑦 with a particular linear combination of the 𝑧𝑡
We impose the following condition on the consumption, borrowing plan:


$$E_0 \sum_{t=0}^\infty \beta^t b_t^2 < +\infty \tag{4}$$

This condition suffices to rule out Ponzi schemes


(We impose this condition to rule out a borrow-more-and-more plan that would allow the
household to enjoy bliss consumption forever)
The state vector confronting the household at 𝑡 is

$$
x_t = \begin{bmatrix} z_t \\ b_t \end{bmatrix}
$$

where 𝑏𝑡 is its one-period debt falling due at the beginning of period 𝑡 and 𝑧𝑡 contains all
variables useful for forecasting its future endowment
We assume that {𝑦𝑡 } follows a second order univariate autoregressive process:

$$
y_{t+1} = \alpha + \rho_1 y_t + \rho_2 y_{t-1} + \sigma w_{t+1}
$$
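As a rough illustration of how this AR(2) fits the state-space form Eq. (3), the sketch below stacks $z_t = (1, y_t, y_{t-1})'$ and simulates the endowment. The parameter values are the ones this lecture sets later; the function name `simulate_endowment`, the seed, and the choice to start at the unconditional mean are assumptions made only for this example.

```python
import numpy as np

# Parameter values quoted later in this lecture
α, ρ_1, ρ_2, σ = 10, 0.9, 0, 1

# State z_t = (1, y_t, y_{t-1})', so that y_t = U_y z_t
A22 = np.array([[1, 0, 0],
                [α, ρ_1, ρ_2],
                [0, 1, 0]])
C2 = np.array([[0], [σ], [0]])
U_y = np.array([[0, 1, 0]])


def simulate_endowment(z0, T, seed=0):
    """Simulate y_t = U_y z_t with z_{t+1} = A22 z_t + C2 w_{t+1} (illustrative helper)."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float).reshape(3, 1)
    y = np.empty(T)
    for t in range(T):
        y[t] = (U_y @ z).item()
        w = rng.standard_normal((1, 1))   # IID shock with unit variance
        z = A22 @ z + C2 @ w
    return y


# Unconditional mean implied by the AR(2): α / (1 - ρ_1 - ρ_2)
y_bar = α / (1 - ρ_1 - ρ_2)
y_path = simulate_endowment([1, y_bar, y_bar], T=200)
```

The second row of `A22` reproduces the autoregression, while the first and third rows simply carry the constant and the lagged endowment forward.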

62.2.1 Solution with the DLE Class

One way of solving this model is to map the problem into the framework outlined in Section
4.8 of [59] by setting up our technology, information and preference matrices as follows:
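Technology: $\phi_c = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\phi_g = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $\phi_i = \begin{bmatrix} -1 \\ -0.00001 \end{bmatrix}$, $\Gamma = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$, $\Delta_k = 0$, $\Theta_k = R$
Information: $A_{22} = \begin{bmatrix} 1 & 0 & 0 \\ \alpha & \rho_1 & \rho_2 \\ 0 & 1 & 0 \end{bmatrix}$, $C_2 = \begin{bmatrix} 0 \\ \sigma \\ 0 \end{bmatrix}$, $U_b = \begin{bmatrix} \gamma & 0 & 0 \end{bmatrix}$, $U_d = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$
Preferences: $\Lambda = 0$, $\Pi = 1$, $\Delta_h = 0$, $\Theta_h = 0$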
We set parameters
𝛼 = 10, 𝛽 = 0.95, 𝜌1 = 0.9, 𝜌2 = 0, 𝜎 = 1
(The value of 𝛾 does not affect the optimal decision rule)
The chosen matrices mean that the household’s technology is:

$$
c_t + k_{t-1} = i_t + y_t
$$

$$
\frac{k_t}{R} = i_t
$$

$$
l_t^2 = (0.00001)^2 i_t^2
$$

Combining the first two of these gives the budget constraint of the permanent income model,
where 𝑘𝑡 = 𝑏𝑡+1
The third equation is a very small penalty on debt-accumulation to rule out Ponzi schemes
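The substitution behind this claim can be written out in a few lines. This is only a restatement of the first two technology equations above, using $k_{t-1} = b_t$ (and hence $k_t = b_{t+1}$):

```latex
\begin{aligned}
c_t + k_{t-1} &= i_t + y_t, \qquad i_t = \frac{k_t}{R} \\
\Longrightarrow \quad c_t + k_{t-1} &= R^{-1} k_t + y_t \\
\Longrightarrow \quad c_t + b_t &= R^{-1} b_{t+1} + y_t
\end{aligned}
```

The last line is the budget constraint Eq. (2).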
We set up this instance of the DLE class below:

In [3]: α, β, ρ_1, ρ_2, σ = 10, 0.95, 0.9, 0, 1

γ = np.array([[-1], [0]])
ϕ_c = np.array([[1], [0]])
ϕ_g = np.array([[0], [1]])
ϕ_1 = 1e-5
ϕ_i = np.array([[-1], [-ϕ_1]])
δ_k = np.array([[0]])
θ_k = np.array([[1 / β]])
β = np.array([[β]])
l_λ = np.array([[0]])
π_h = np.array([[1]])
δ_h = np.array([[0]])
θ_h = np.array([[0]])

a22 = np.array([[1, 0, 0],
                [α, ρ_1, ρ_2],
                [0, 1, 0]])

c2 = np.array([[0], [σ], [0]])
ud = np.array([[0, 1, 0],
               [0, 0, 0]])
ub = np.array([[100, 0, 0]])

x0 = np.array([[0], [0], [1], [0], [0]])

Info1 = (a22, c2, ub, ud)
Tech1 = (ϕ_c, ϕ_g, ϕ_i, γ, δ_k, θ_k)
Pref1 = (β, l_λ, π_h, δ_h, θ_h)
Econ1 = DLE(Info1, Tech1, Pref1)

To compare the solution of this model with that of the LQ problem, we select the $S_c$ matrix
from the DLE class
The solution to the DLE economy has:

𝑐𝑡 = 𝑆𝑐 𝑥𝑡

In [4]: Econ1.Sc

Out[4]: array([[ 0. , -0.05 , 65.5172, 0.3448, 0. ]])

The state vector in the DLE class is:

$$
x_t = \begin{bmatrix} h_{t-1} \\ k_{t-1} \\ z_t \end{bmatrix}
$$

where $k_{t-1}$ is set up to be $b_t$ in the permanent income model

The state vector in the LQ problem is $\begin{bmatrix} z_t \\ b_t \end{bmatrix}$
Consequently, the relevant elements of Econ1.Sc are the same as those of $-F$ that occur when we
apply other approaches to the same model in the lecture Optimal Savings II: LQ Techniques
and this Jupyter notebook http://nbviewer.jupyter.org/github/QuantEcon/QuantEcon.notebooks/blob/master/permanent_income.ipynb
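One way to sanity-check these coefficients without the DLE class is to evaluate the standard permanent income consumption rule $c_t = (1 - \beta)\left[U_y (I - \beta A_{22})^{-1} z_t - b_t\right]$ directly. The sketch below assumes that rule (it is the textbook solution of this LQ problem rather than something derived in this lecture) and uses the parameter values set above; the names `coef_z` and `coef_b` are chosen only for this example.

```python
import numpy as np

α, β, ρ_1, ρ_2 = 10, 0.95, 0.9, 0

A22 = np.array([[1, 0, 0],
                [α, ρ_1, ρ_2],
                [0, 1, 0]])
U_y = np.array([[0, 1, 0]])

# Coefficients on z_t: (1 - β) U_y (I - β A22)^{-1}
coef_z = (1 - β) * U_y @ np.linalg.inv(np.eye(3) - β * A22)

# Coefficient on b_t (which is k_{t-1} in the DLE state vector)
coef_b = -(1 - β)

print(coef_z.round(4), round(coef_b, 4))
```

Up to rounding, `coef_b` and `coef_z` line up with the second through fifth entries of `Econ1.Sc` shown in Out[4]; the leading zero there multiplies $h_{t-1}$.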
The plot below quickly replicates the first two figures of that lecture and that notebook to
confirm that the solutions are the same

In [5]: plt.figure(figsize=(16, 5))

plt.subplot(121)
for i in range(25):
    Econ1.compute_sequence(x0, ts_length=150)
    plt.plot(Econ1.c[0], c='g')
    plt.plot(Econ1.d[0], c='b')
plt.plot(Econ1.c[0], label='Consumption', c='g')
plt.plot(Econ1.d[0], label='Income', c='b')
plt.legend()

plt.subplot(122)
for i in range(25):
    Econ1.compute_sequence(x0, ts_length=150)
    plt.plot(Econ1.k[0], color='r')
plt.plot(Econ1.k[0], label='Debt', c='r')
plt.legend()
plt.show()
63

Rosen Schooling Model

63.1 Contents

• A One-Occupation Model 63.2

• Mapping into HS2013 Framework 63.3

Co-author: Sebastian Graves


This lecture is yet another part of a suite of lectures that use the quantecon DLE class to in-
stantiate models within the [59] class of models described in detail in Recursive Models of
Dynamic Linear Economies
In addition to what’s included in Anaconda, this lecture uses the quantecon library

In [1]: !pip install quantecon

We’ll also need the following imports

In [2]: import numpy as np


import matplotlib.pyplot as plt
from quantecon import LQ
from collections import namedtuple
from quantecon import DLE
from math import sqrt
%matplotlib inline

63.2 A One-Occupation Model

Ryoo and Rosen’s (2004) [114] partial equilibrium model determines

• a stock of “Engineers” 𝑁𝑡
• a number of new entrants in engineering school, 𝑛𝑡
• the wage rate of engineers, 𝑤𝑡

It takes k periods of schooling to become an engineer


The model consists of the following equations:


• a demand curve for engineers:

𝑤𝑡 = −𝛼𝑑 𝑁𝑡 + 𝜖𝑑𝑡

• a time-to-build structure of the education process:

𝑁𝑡+𝑘 = 𝛿𝑁 𝑁𝑡+𝑘−1 + 𝑛𝑡

• a definition of the discounted present value of each new engineering student:


$$
v_t = \beta^k \mathbb{E} \sum_{j=0}^{\infty} (\beta \delta_N)^j w_{t+k+j}
$$

• a supply curve of new students driven by present value 𝑣𝑡 :

𝑛𝑡 = 𝛼𝑠 𝑣𝑡 + 𝜖𝑠𝑡
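To get a feel for the present-value term, the sketch below evaluates $v_t$ for a deterministic, constant wage path, so the expectation drops out and the sum has the closed form $\beta^k \bar w / (1 - \beta \delta_N)$. The function `present_value`, the constant wage `w_bar` and the truncation at 2000 terms are assumptions for illustration; the values of $\beta$, $\delta_N$ and $k$ are the ones used later in this lecture.

```python
def present_value(wages, β, δ_N, k):
    """Truncated v_t = β^k Σ_j (β δ_N)^j w_{t+k+j}, where wages[j] stands for w_{t+k+j}."""
    return β**k * sum((β * δ_N)**j * w for j, w in enumerate(wages))


β, δ_N, k = 1 / 1.05, 0.95, 4
w_bar = 1.0   # hypothetical constant wage

v_truncated = present_value([w_bar] * 2000, β, δ_N, k)
v_closed = β**k * w_bar / (1 - β * δ_N)
```

Raising `k` scales the whole present value by a further power of $\beta$, which is the channel through which longer schooling dampens the supply response discussed below.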

63.3 Mapping into HS2013 Framework

We represent this model in the [59] framework by

• sweeping the time-to-build structure and the demand for engineers into the household
technology, and
• putting the supply of engineers into the technology for producing goods

63.3.1 Preferences

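$$
\Pi = 0, \quad
\Lambda = \begin{bmatrix} \alpha_d & 0 & \cdots & 0 \end{bmatrix}, \quad
\Delta_h = \begin{bmatrix}
\delta_N & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & \cdots & 0 & 1 \\
0 & 0 & 0 & \cdots & 0
\end{bmatrix}, \quad
\Theta_h = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}
$$

where $\Lambda$ is a 1 x (k+1) matrix, $\Delta_h$ is a (k+1) x (k+1) matrix, and $\Theta_h$ is a (k+1) x 1 matrix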


This specification sets $N_t = h_{1t-1}$, $n_t = c_t$, $h_{\tau+1,t-1} = n_{t-(k-\tau)}$ for $\tau = 1, ..., k$
Below we set things up so that the number of years of education, k, can be varied
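One possible way to make the dependence on k explicit is to wrap the construction of $\Lambda$, $\Delta_h$ and $\Theta_h$ in a small function. The helper below, `build_preference_matrices`, is hypothetical (it is not part of quantecon) and simply mirrors the array-building steps used in the code cells of this lecture, including the tiny entries `ε_1` that pad $\Lambda$.

```python
import numpy as np


def build_preference_matrices(k, α_d=0.1, δ_n=0.95, ε_1=1e-7):
    """Return (Λ, Δ_h, Θ_h) for k periods of schooling (hypothetical helper)."""
    # Λ is 1 x (k+1): α_d followed by k tiny entries
    l_λ = np.hstack((np.array([[α_d]]), np.full((1, k), ε_1)))
    # Δ_h is (k+1) x (k+1): first column (δ_n, 0, ..., 0)', then a shifted identity
    first_col = np.vstack(([[δ_n]], np.zeros((k - 1, 1))))
    top = np.hstack((first_col, np.eye(k)))
    δ_h = np.vstack((top, np.zeros((1, k + 1))))
    # Θ_h is (k+1) x 1 with a one in the last position
    θ_h = np.vstack((np.zeros((k, 1)), np.ones((1, 1))))
    return l_λ, δ_h, θ_h


l_λ, δ_h, θ_h = build_preference_matrices(k=4)
```

With such a helper, the k = 7 and k = 10 cases differ only in the argument passed, which avoids repeating the stacking code for each economy.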

63.3.2 Technology

To capture Ryoo and Rosen’s [114] supply curve, we use the physical technology:

$$
c_t = i_t + d_{1t}
$$

$$
\psi_1 i_t = g_t
$$

where 𝜓1 is inversely proportional to 𝛼𝑠

63.3.3 Information

Because we want 𝑏𝑡 = 𝜖𝑑𝑡 and 𝑑1𝑡 = 𝜖𝑠𝑡 , we set

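$$
A_{22} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \rho_s & 0 \\ 0 & 0 & \rho_d \end{bmatrix}, \quad
C_2 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
U_b = \begin{bmatrix} 30 & 0 & 1 \end{bmatrix}, \quad
U_d = \begin{bmatrix} 10 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$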

where 𝜌𝑠 and 𝜌𝑑 describe the persistence of the supply and demand shocks

In [3]: Information = namedtuple('Information', ['a22', 'c2','ub','ud'])


Technology = namedtuple('Technology', ['ϕ_c', 'ϕ_g', 'ϕ_i', 'γ', 'δ_k', 'θ_k'])
Preferences = namedtuple('Preferences', ['β', 'l_λ', 'π_h', 'δ_h', 'θ_h'])

63.3.4 Effects of Changes in Education Technology and Demand

We now study how changing

• the number of years of education required to become an engineer and


• the slope of the demand curve

affects responses to demand shocks


To begin, we set 𝑘 = 4 and 𝛼𝑑 = 0.1

In [4]: k = 4 # Number of periods of schooling required to become an engineer

β = np.array([[1 / 1.05]])
α_d = np.array([[0.1]])
α_s = 1
ε_1 = 1e-7
λ_1 = np.ones((1, k)) * ε_1
l_λ = np.hstack((α_d, λ_1)) # Use of ε_1 is a trick to acquire detectability, see HS2013 p. 228 footnote
π_h = np.array([[0]])

δ_n = np.array([[0.95]])
d1 = np.vstack((δ_n, np.zeros((k - 1, 1))))
d2 = np.hstack((d1, np.eye(k)))
δ_h = np.vstack((d2, np.zeros((1, k + 1))))

θ_h = np.vstack((np.zeros((k, 1)),


np.ones((1, 1))))

ψ_1 = 1 / α_s

ϕ_c = np.array([[1], [0]])
ϕ_g = np.array([[0], [-1]])
ϕ_i = np.array([[-1], [ψ_1]])
γ = np.array([[0], [0]])

δ_k = np.array([[0]])
θ_k = np.array([[0]])

ρ_s = 0.8
ρ_d = 0.8

a22 = np.array([[1, 0, 0],
                [0, ρ_s, 0],
                [0, 0, ρ_d]])

c2 = np.array([[0, 0], [10, 0], [0, 10]])
ub = np.array([[30, 0, 1]])
ud = np.array([[10, 1, 0], [0, 0, 0]])

Info1 = Information(a22, c2, ub, ud)
Tech1 = Technology(ϕ_c, ϕ_g, ϕ_i, γ, δ_k, θ_k)
Pref1 = Preferences(β, l_λ, π_h, δ_h, θ_h)

Econ1 = DLE(Info1, Tech1, Pref1)

We create three other instances by:

1. Raising 𝛼𝑑 to 2
2. Raising k to 7
3. Raising k to 10

In [5]: α_d = np.array([[2]])


l_λ = np.hstack((α_d, λ_1))
Pref2 = Preferences(β, l_λ, π_h, δ_h, θ_h)
Econ2 = DLE(Info1, Tech1, Pref2)

α_d = np.array([[0.1]])

k = 7
λ_1 = np.ones((1, k)) * ε_1
l_λ = np.hstack((α_d, λ_1))
d1 = np.vstack((δ_n, np.zeros((k - 1, 1))))
d2 = np.hstack((d1, np.eye(k)))
δ_h = np.vstack((d2, np.zeros((1, k+1))))
θ_h = np.vstack((np.zeros((k, 1)),
np.ones((1, 1))))

Pref3 = Preferences(β, l_λ, π_h, δ_h, θ_h)


Econ3 = DLE(Info1, Tech1, Pref3)

k = 10
λ_1 = np.ones((1, k)) * ε_1
l_λ = np.hstack((α_d, λ_1))
d1 = np.vstack((δ_n, np.zeros((k - 1, 1))))
d2 = np.hstack((d1, np.eye(k)))
δ_h = np.vstack((d2, np.zeros((1, k + 1))))
θ_h = np.vstack((np.zeros((k, 1)),
np.ones((1, 1))))

Pref4 = Preferences(β, l_λ, π_h, δ_h, θ_h)


Econ4 = DLE(Info1, Tech1, Pref4)

shock_demand = np.array([[0], [1]])

Econ1.irf(ts_length=25, shock=shock_demand)
Econ2.irf(ts_length=25, shock=shock_demand)
Econ3.irf(ts_length=25, shock=shock_demand)
Econ4.irf(ts_length=25, shock=shock_demand)

The first figure plots the impulse response of 𝑛𝑡 (on the left) and 𝑁𝑡 (on the right) to a posi-
tive demand shock, for 𝛼𝑑 = 0.1 and 𝛼𝑑 = 2
When 𝛼𝑑 = 2, the number of new students 𝑛𝑡 rises initially, but the response then turns nega-
tive
A positive demand shock raises wages, drawing new students into the profession
However, these new students raise 𝑁𝑡
The higher is 𝛼𝑑 , the larger the effect of this rise in 𝑁𝑡 on wages
This counteracts the demand shock’s positive effect on wages, reducing the number of new
students in subsequent periods
Consequently, when 𝛼𝑑 is lower, the effect of a demand shock on 𝑁𝑡 is larger

In [6]: plt.figure(figsize=(12, 4))


plt.subplot(121)
plt.plot(Econ1.c_irf,label='$\\alpha_d = 0.1$')
plt.plot(Econ2.c_irf,label='$\\alpha_d = 2$')
plt.legend()
plt.title('Response of $n_t$ to a demand shock')

plt.subplot(122)
plt.plot(Econ1.h_irf[:, 0], label='$\\alpha_d = 0.1$')
plt.plot(Econ2.h_irf[:, 0], label='$\\alpha_d = 2$')
plt.legend()
plt.title('Response of $N_t$ to a demand shock')
plt.show()

The next figure plots the impulse response of 𝑛𝑡 (on the left) and 𝑁𝑡 (on the right) to a posi-
tive demand shock, for 𝑘 = 4, 𝑘 = 7 and 𝑘 = 10 (with 𝛼𝑑 = 0.1)

In [7]: plt.figure(figsize=(12, 4))


plt.subplot(121)
plt.plot(Econ1.c_irf, label='$k=4$')
plt.plot(Econ3.c_irf, label='$k=7$')
plt.plot(Econ4.c_irf, label='$k=10$')
plt.legend()
plt.title('Response of $n_t$ to a demand shock')

plt.subplot(122)
plt.plot(Econ1.h_irf[:,0], label='$k=4$')
plt.plot(Econ3.h_irf[:,0], label='$k=7$')
plt.plot(Econ4.h_irf[:,0], label='$k=10$')
plt.legend()
plt.title('Response of $N_t$ to a demand shock')
plt.show()

Both panels in the above figure show that raising k lowers the effect of a positive demand
shock on entry into the engineering profession
Increasing the number of periods of schooling lowers the number of new students in response
to a demand shock
This occurs because with longer required schooling, new students ultimately benefit less from
the impact of that shock on wages
64

Cattle Cycles

64.1 Contents

• The Model 64.2

• Mapping into HS2013 Framework 64.3

Co-author: Sebastian Graves


This is another member of a suite of lectures that use the quantecon DLE class to instantiate
models within the [59] class of models described in detail in Recursive Models of Dynamic
Linear Economies
In addition to what’s in Anaconda, this lecture uses the quantecon library

In [1]: !pip install quantecon

This lecture uses the DLE class to construct instances of the “Cattle Cycles” model of Rosen,
Murphy and Scheinkman (1994) [110]
That paper constructs a rational expectations equilibrium model to understand sources of
recurrent cycles in US cattle stocks and prices
We make the following imports

In [2]: import numpy as np


import matplotlib.pyplot as plt
from quantecon import LQ
from collections import namedtuple
from quantecon import DLE
from math import sqrt
%matplotlib inline

64.2 The Model

The model features a static linear demand curve and a “time-to-grow” structure for cattle
Let 𝑝𝑡 be the price of slaughtered beef, 𝑚𝑡 the cost of preparing an animal for slaughter, ℎ𝑡
the holding cost for a mature animal, 𝛾1 ℎ𝑡 the holding cost for a yearling, and 𝛾0 ℎ𝑡 the hold-
ing cost for a calf


The cost processes $\{h_t, m_t\}_{t=0}^{\infty}$ are exogenous, while the price process $\{p_t\}_{t=0}^{\infty}$ is determined
within a rational expectations equilibrium
Let 𝑥𝑡 be the breeding stock, and 𝑦𝑡 be the total stock of cattle
The law of motion for the breeding stock is

𝑥𝑡 = (1 − 𝛿)𝑥𝑡−1 + 𝑔𝑥𝑡−3 − 𝑐𝑡

where 𝑔 < 1 is the number of calves that each member of the breeding stock has each year,
and 𝑐𝑡 is the number of cattle slaughtered
The total headcount of cattle is

𝑦𝑡 = 𝑥𝑡 + 𝑔𝑥𝑡−1 + 𝑔𝑥𝑡−2

This equation states that the total number of cattle equals the sum of adults, calves and
yearlings, respectively
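The two equations above can be iterated directly. The sketch below does so for an exogenously given slaughter path, purely to illustrate the timing conventions; the function `simulate_stocks` and the constant slaughter path are assumptions for this example, while the values of $\delta$ and $g$ are the ones this lecture uses later. Setting $x_t = \bar x$ in the law of motion gives the steady-state slaughter $c = (g - \delta)\bar x$ used here.

```python
import numpy as np

δ, g = 0.1, 0.85   # values used later in this lecture


def simulate_stocks(x_init, c, T):
    """
    Iterate x_t = (1 - δ) x_{t-1} + g x_{t-3} - c_t and return (x, y).

    x_init = (x_{-3}, x_{-2}, x_{-1}); c is a length-T slaughter path.
    """
    x = list(x_init)
    for t in range(T):
        # x[-1] is x_{t-1} and x[-3] is x_{t-3} at this point
        x.append((1 - δ) * x[-1] + g * x[-3] - c[t])
    x = np.array(x)
    # y_t = x_t + g x_{t-1} + g x_{t-2}, for t = 0, ..., T-1
    y = x[3:] + g * x[2:-1] + g * x[1:-2]
    return x[3:], y


x_bar = 100.0
c_bar = (g - δ) * x_bar          # slaughter that keeps the breeding stock constant
x_path, y_path = simulate_stocks([x_bar] * 3, [c_bar] * 20, T=20)
```

In the model itself $c_t$ is chosen optimally rather than fixed, so this is only a bookkeeping check of the stock accounting.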
A representative farmer chooses {𝑐𝑡 , 𝑥𝑡 } to maximize:


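$$
\mathbb{E}_0 \sum_{t=0}^{\infty} \beta^t \left\{ p_t c_t - h_t x_t - \gamma_0 h_t (g x_{t-1}) - \gamma_1 h_t (g x_{t-2}) - m_t c_t - \frac{\psi_1}{2} x_t^2 - \frac{\psi_2}{2} x_{t-1}^2 - \frac{\psi_3}{2} x_{t-3}^2 - \frac{\psi_4}{2} c_t^2 \right\}
$$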

subject to the law of motion for 𝑥𝑡 , taking as given the stochastic laws of motion for the ex-
ogenous processes, the equilibrium price process, and the initial state [𝑥−1 , 𝑥−2 , 𝑥−3 ]
Remark The $\psi_j$ parameters are very small quadratic costs that are included for technical
reasons: they make the linear quadratic dynamic programming problem solved by the fictitious
planner, who in effect chooses equilibrium quantities and shadow prices, well posed and well
behaved
Demand for beef is governed by $c_t = a_0 - a_1 p_t + \tilde d_t$ where $\tilde d_t$ is a stochastic process with
mean zero, representing a demand shifter

64.3 Mapping into HS2013 Framework

64.3.1 Preferences
We set $\Lambda = 0$, $\Delta_h = 0$, $\Theta_h = 0$, $\Pi = \alpha_1^{-\frac{1}{2}}$ and $b_t = \Pi \tilde d_t + \Pi \alpha_0$
With these settings, the FOC for the household’s problem becomes the demand curve of the
“Cattle Cycles” model
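A sketch of that first-order condition follows. It assumes the household's period utility takes the form $-\frac{1}{2}[(s_t - b_t)^2 + l_t^2]$ used elsewhere in this suite of lectures, that the marginal utility of consumption is equated to the price $p_t$ (a normalization assumed here), and that $\alpha_0, \alpha_1$ denote the same demand parameters written $a_0, a_1$ above:

```latex
\begin{aligned}
s_t &= \Pi c_t \qquad (\Lambda = 0) \\
p_t &= \Pi (b_t - s_t)
     = \Pi \left( \Pi \tilde d_t + \Pi \alpha_0 - \Pi c_t \right)
     = \Pi^2 \left( \alpha_0 + \tilde d_t - c_t \right) \\
    &= \alpha_1^{-1} \left( \alpha_0 + \tilde d_t - c_t \right)
\quad \Longrightarrow \quad
c_t = \alpha_0 - \alpha_1 p_t + \tilde d_t
\end{aligned}
```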

64.3.2 Technology

To capture the law of motion for cattle, we set

$$
\Delta_k = \begin{bmatrix} (1 - \delta) & 0 & g \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad
\Theta_k = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
$$

(where 𝑖𝑡 = −𝑐𝑡 )
To capture the production of cattle, we set

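$$
\Phi_c = \begin{bmatrix} 1 \\ f_1 \\ 0 \\ 0 \\ -f_7 \end{bmatrix}, \quad
\Phi_g = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
\Phi_i = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
\Gamma = \begin{bmatrix} 0 & 0 & 0 \\ f_1 (1 - \delta) & 0 & g f_1 \\ f_3 & 0 & 0 \\ 0 & f_5 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$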

64.3.3 Information

We set

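$$
A_{22} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \rho_1 & 0 & 0 \\ 0 & 0 & \rho_2 & 0 \\ 0 & 0 & 0 & \rho_3 \end{bmatrix}, \quad
C_2 = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 15 \end{bmatrix}, \quad
U_b = \begin{bmatrix} \Pi \alpha_0 & 0 & 0 & \Pi \end{bmatrix}, \quad
U_d = \begin{bmatrix} 0 \\ f_2 U_h \\ f_4 U_h \\ f_6 U_h \\ f_8 U_h \end{bmatrix}
$$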

To map this into our class, we set $f_1^2 = \frac{\Psi_1}{2}$, $f_2^2 = \frac{\Psi_2}{2}$, $f_3^2 = \frac{\Psi_3}{2}$, $2 f_1 f_2 = 1$, $2 f_3 f_4 = \gamma_0 g$,
$2 f_5 f_6 = \gamma_1 g$

In [3]: # We define namedtuples in this way as it allows us to check, for example,


# what matrices are associated with a particular technology.

Information = namedtuple('Information', ['a22', 'c2', 'ub', 'ud'])


Technology = namedtuple('Technology', ['ϕ_c', 'ϕ_g', 'ϕ_i', 'γ', 'δ_k', 'θ_k'])
Preferences = namedtuple('Preferences', ['β', 'l_λ', 'π_h', 'δ_h', 'θ_h'])

We set parameters to those used by [110]

In [4]: β = np.array([[0.909]])
lλ = np.array([[0]])

a1 = 0.5
πh = np.array([[1 / (sqrt(a1))]])
δh = np.array([[0]])
θh = np.array([[0]])

δ = 0.1
g = 0.85
f1 = 0.001
f3 = 0.001
f5 = 0.001
f7 = 0.001

ϕc = np.array([[1], [f1], [0], [0], [-f7]])

ϕg = np.array([[0, 0, 0, 0],
               [1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]])

ϕi = np.array([[1], [0], [0], [0], [0]])

γ = np.array([[           0,  0,      0],
              [f1 * (1 - δ),  0, g * f1],
              [          f3,  0,      0],
              [           0, f5,      0],
              [           0,  0,      0]])

δk = np.array([[1 - δ, 0, g],
               [    1, 0, 0],
               [    0, 1, 0]])

θk = np.array([[1], [0], [0]])

ρ1 = 0
ρ2 = 0
ρ3 = 0.6
a0 = 500
γ0 = 0.4
γ1 = 0.7
f2 = 1 / (2 * f1)
f4 = γ0 * g / (2 * f3)
f6 = γ1 * g / (2 * f5)
f8 = 1 / (2 * f7)

a22 = np.array([[1, 0, 0, 0],
                [0, ρ1, 0, 0],
                [0, 0, ρ2, 0],
                [0, 0, 0, ρ3]])

c2 = np.array([[0, 0, 0],
               [1, 0, 0],
               [0, 1, 0],
               [0, 0, 15]])

ub = np.array([[πh * a0, 0, 0, πh]])
uh = np.array([[50, 1, 0, 0]])
um = np.array([[100, 0, 1, 0]])
ud = np.vstack(([0, 0, 0, 0],
                f2 * uh, f4 * uh, f6 * uh, f8 * um))

Notice that we have set $\rho_1 = \rho_2 = 0$, so $h_t$ and $m_t$ consist of a constant and a white noise
component
We set up the economy using tuples for information, technology and preference matrices below
We also construct two extra information matrices, corresponding to cases when $\rho_3 = 1$ and
$\rho_3 = 0$ (as opposed to the baseline case of $\rho_3 = 0.6$)

In [5]: Info1 = Information(a22, c2, ub, ud)
Tech1 = Technology(ϕc, ϕg, ϕi, γ, δk, θk)
Pref1 = Preferences(β, lλ, πh, δh, θh)

ρ3_2 = 1
a22_2 = np.array([[1, 0, 0, 0],
[0, ρ1, 0, 0],
[0, 0, ρ2, 0],
[0, 0, 0, ρ3_2]])

Info2 = Information(a22_2, c2, ub, ud)

ρ3_3 = 0
a22_3 = np.array([[1, 0, 0, 0],
[0, ρ1, 0, 0],
[0, 0, ρ2, 0],
[0, 0, 0, ρ3_3]])

Info3 = Information(a22_3, c2, ub, ud)

# Example of how we can look at the matrices associated with a given namedtuple
Info1.a22

Out[5]: array([[1. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.6]])

In [6]: # Use tuples to define DLE class


Econ1 = DLE(Info1, Tech1, Pref1)
Econ2 = DLE(Info2, Tech1, Pref1)
Econ3 = DLE(Info3, Tech1, Pref1)

# Calculate steady-state in baseline case and use to set the initial condition
Econ1.compute_steadystate(nnc=4)
x0 = Econ1.zz

In [7]: Econ1.compute_sequence(x0, ts_length=100)

[110] use the model to understand the sources of recurrent cycles in total cattle stocks
Plotting 𝑦𝑡 for a simulation of their model shows its ability to generate cycles in quantities

In [8]: TotalStock = Econ1.k[0] + g * Econ1.k[1] + g * Econ1.k[2] # Calculation of y_t


plt.plot(TotalStock)
plt.xlim((-1, 100))
plt.title('Total number of cattle')
plt.show()

In their Figure 3, [110] plot the impulse response functions of consumption and the breeding
stock of cattle to the demand shock, 𝑑𝑡̃ , under the three different values of 𝜌3
We replicate their Figure 3 below

In [9]: shock_demand = np.array([[0], [0], [1]])

Econ1.irf(ts_length=25, shock=shock_demand)
Econ2.irf(ts_length=25, shock=shock_demand)
Econ3.irf(ts_length=25, shock=shock_demand)

plt.figure(figsize=(12, 4))
plt.subplot(121)
plt.plot(Econ1.c_irf, label='$\\rho=0.6$')
plt.plot(Econ2.c_irf, label='$\\rho=1$')
plt.plot(Econ3.c_irf, label='$\\rho=0$')
plt.title('Consumption response to demand shock')
plt.legend()

plt.subplot(122)
plt.plot(Econ1.k_irf[:, 0], label='$\\rho=0.6$')
plt.plot(Econ2.k_irf[:, 0], label='$\\rho=1$')
plt.plot(Econ3.k_irf[:, 0], label='$\\rho=0$')
plt.title('Breeding stock response to demand shock')
plt.legend()
plt.show()

The above figures show how consumption patterns differ markedly, depending on the persis-
tence of the demand shock:

• If it is purely transitory (𝜌3 = 0) then consumption rises immediately but is later re-
duced to build stocks up again.
• If it is permanent (𝜌3 = 1), then consumption falls immediately, in order to build up
stocks to satisfy the permanent rise in future demand.

In Figure 4 of their paper, [110] plot the response to a demand shock of the breeding stock
and the total stock, for 𝜌3 = 0 and 𝜌3 = 0.6
We replicate their Figure 4 below

In [10]: Total1_irf = Econ1.k_irf[:, 0] + g * Econ1.k_irf[:, 1] + g * Econ1.k_irf[:, 2]


Total3_irf = Econ3.k_irf[:, 0] + g * Econ3.k_irf[:, 1] + g * Econ3.k_irf[:, 2]

plt.figure(figsize=(12, 4))
plt.subplot(121)
plt.plot(Econ1.k_irf[:, 0], label='Breeding Stock')
plt.plot(Total1_irf, label='Total Stock')
plt.title('$\\rho=0.6$')
plt.legend()  # Show the labels assigned above

plt.subplot(122)
plt.plot(Econ3.k_irf[:, 0], label='Breeding Stock')
plt.plot(Total3_irf, label='Total Stock')
plt.title('$\\rho=0$')
plt.legend()
plt.show()

The fact that $y_t$ is a weighted moving average of $x_t$ creates a hump-shaped response of the
total stock to demand shocks, contributing to the cyclicality seen in the first
graph of this lecture
65

Shock Non Invertibility

This is another member of a suite of lectures that use the quantecon DLE class to instantiate
models within the [59] class of models described in detail in Recursive Models of Dynamic
Linear Economies
In addition to what’s in Anaconda, this lecture uses the quantecon library

In [1]: !pip install quantecon

We’ll make these imports

In [2]: import numpy as np


import quantecon as qe
import matplotlib.pyplot as plt
from quantecon import LQ
from quantecon import DLE
from math import sqrt
%matplotlib inline

This lecture can be viewed as introducing an early contribution to what is now often called a
news and noise issue
In particular, it analyzes and illustrates an invertibility issue that is endemic within a class
of permanent income models
Technically, the invertibility problem indicates a situation in which histories of the shocks in
an econometrician’s autoregressive or Wold moving average representation span a smaller in-
formation space than do the shocks seen by the agent inside the econometrician’s model
This situation sets the stage for an econometrician who is unaware of the problem to misin-
terpret shocks and likely responses to them
We consider the following modification of Robert Hall’s (1978) model [48] in which the en-
dowment process is the sum of two orthogonal autoregressive processes:
Preferences

$$
-\frac{1}{2} \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left[ (c_t - b_t)^2 + l_t^2 \right] \Big| J_0
$$

𝑠𝑡 = 𝑐𝑡


𝑏𝑡 = 𝑈𝑏 𝑧𝑡

Technology

$$
c_t + i_t = \gamma k_{t-1} + d_t
$$

$$
k_t = \delta_k k_{t-1} + i_t
$$

$$
g_t = \phi_1 i_t, \quad \phi_1 > 0
$$

$$
g_t \cdot g_t = l_t^2
$$

Information

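$$
z_{t+1} = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.9 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix} z_t +
\begin{bmatrix}
0 & 0 \\
1 & 0 \\
0 & 4 \\
0 & 0 \\
0 & 0 \\
0 & 0
\end{bmatrix} w_{t+1}
$$

$$
U_b = \begin{bmatrix} 30 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}
$$

$$
U_d = \begin{bmatrix} 5 & 1 & 1 & 0.8 & 0.6 & 0.4 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}
$$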

The preference shock is constant at 30, while the endowment process is the sum of a constant
and two orthogonal processes
Specifically:

$$
d_t = 5 + d_{1t} + d_{2t}
$$

$$
d_{1t} = 0.9 d_{1t-1} + w_{1t}
$$

$$
d_{2t} = 4 w_{2t} + 0.8 (4 w_{2t-1}) + 0.6 (4 w_{2t-2}) + 0.4 (4 w_{2t-3})
$$

𝑑1𝑡 is a first-order AR process, while 𝑑2𝑡 is a third-order pure moving average process
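A quick way to see that the state-space matrices above encode these two processes is to compute the impulse responses $U_d A_{22}^j C_2$ directly. The sketch below is a stand-alone check with NumPy; the variable name `irf` is chosen only for this example, and only the first row of $U_d$ is used because the second row is zero.

```python
import numpy as np

A22 = np.zeros((6, 6))
A22[[0, 1, 3, 4, 5], [0, 1, 2, 3, 4]] = [1.0, 0.9, 1.0, 1.0, 1.0]
C2 = np.zeros((6, 2))
C2[[1, 2], [0, 1]] = [1.0, 4.0]
U_d = np.array([5, 1, 1, 0.8, 0.6, 0.4])   # first row of U_d

# Response of d_{t+j} to unit shocks in w_t is U_d A22^j C2, for j = 0, ..., 5
irf = np.array([U_d @ np.linalg.matrix_power(A22, j) @ C2 for j in range(6)])

print(irf)
```

The first column decays geometrically at rate 0.9, as expected for the AR(1) component, while the second column equals the moving average weights 4, 3.2, 2.4, 1.6 and is zero afterwards.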

In [3]: γ_1 = 0.05


γ = np.array([[γ_1], [0]])
ϕ_c = np.array([[1], [0]])
ϕ_g = np.array([[0], [1]])
ϕ_1 = 0.00001
ϕ_i = np.array([[1], [-ϕ_1]])
δ_k = np.array([[1]])
θ_k = np.array([[1]])
β = np.array([[1 / 1.05]])
l_λ = np.array([[0]])
π_h = np.array([[1]])
δ_h = np.array([[.9]])
θ_h = np.array([[1]]) - δ_h
ud = np.array([[5, 1, 1, 0.8, 0.6, 0.4],
               [0, 0, 0, 0, 0, 0]])
a22 = np.zeros((6, 6))
a22[[0, 1, 3, 4, 5], [0, 1, 2, 3, 4]] = np.array([1.0, 0.9, 1.0, 1.0, 1.0]) # Chase's great trick
c2 = np.zeros((6, 2))
c2[[1, 2], [0, 1]] = np.array([1.0, 4.0])
ub = np.array([[30, 0, 0, 0, 0, 0]])
x0 = np.array([[5], [150], [1], [0], [0], [0], [0], [0]])

Info1 = (a22, c2, ub, ud)
Tech1 = (ϕ_c, ϕ_g, ϕ_i, γ, δ_k, θ_k)
Pref1 = (β, l_λ, π_h, δ_h, θ_h)

Econ1 = DLE(Info1, Tech1, Pref1)

We define the household’s net of interest deficit as 𝑐𝑡 − 𝑑𝑡


Hall’s model imposes “expected present-value budget balance” in the sense that


$$
\mathbb{E} \sum_{j=0}^{\infty} \beta^j (c_{t+j} - d_{t+j}) \Big| J_t = \beta^{-1} k_{t-1} \quad \forall t
$$

If we define the moving average representation of (𝑐𝑡 , 𝑐𝑡 − 𝑑𝑡 ) in terms of the 𝑤𝑡 s to be:

$$
\begin{bmatrix} c_t \\ c_t - d_t \end{bmatrix} = \begin{bmatrix} \sigma_1(L) \\ \sigma_2(L) \end{bmatrix} w_t
$$

then Hall’s model imposes the restriction 𝜎2 (𝛽) = [0 0]


The agent inside this model sees histories of both components of the endowment process 𝑑1𝑡
and 𝑑2𝑡
The econometrician has data on the history of the pair [𝑐𝑡 , 𝑑𝑡 ], but not directly on the history
of 𝑤𝑡
The econometrician obtains a Wold representation for the process [𝑐𝑡 , 𝑐𝑡 − 𝑑𝑡 ]:

$$
\begin{bmatrix} c_t \\ c_t - d_t \end{bmatrix} = \begin{bmatrix} \sigma_1^*(L) \\ \sigma_2^*(L) \end{bmatrix} u_t
$$

The Appendix of chapter 8 of [59] explains why the impulse response functions in the Wold
representation estimated by the econometrician do not resemble the impulse response func-
tions that depict the response of consumption and the deficit to innovations to agents’ infor-
mation
Technically, 𝜎2 (𝛽) = [0 0] implies that the history of 𝑢𝑡 s spans a smaller linear space than
does the history of 𝑤𝑡 s
This means that 𝑢𝑡 will typically be a distributed lag of 𝑤𝑡 that is not concentrated at zero
lag:


$$
u_t = \sum_{j=0}^{\infty} \alpha_j w_{t-j}
$$

Thus, the econometrician’s news 𝑢𝑡 potentially responds belatedly to agents’ news 𝑤𝑡


We will construct Figures from Chapter 8 Appendix E of [59] to illustrate these ideas:

In [4]: # This is Fig 8.E.1 from p.188 of HS2013

Econ1.irf(ts_length=40, shock=None)

plt.figure(figsize=(12, 4))
plt.subplot(121)
plt.plot(Econ1.c_irf, label='Consumption')
plt.plot(Econ1.c_irf - Econ1.d_irf[:,0].reshape(40,1), label='Deficit')
plt.legend()
plt.title('Response to $w_{1t}$')

shock2 = np.array([[0], [1]])


Econ1.irf(ts_length=40, shock=shock2)

plt.subplot(122)
plt.plot(Econ1.c_irf, label='Consumption')
plt.plot(Econ1.c_irf - Econ1.d_irf[:,0].reshape(40, 1), label='Deficit')
plt.legend()
plt.title('Response to $w_{2t}$')
plt.show()

The above figure displays the impulse response of consumption and the deficit to the endow-
ment innovations
Consumption displays the characteristic “random walk” response with respect to each innova-
tion
Each endowment innovation leads to a temporary surplus followed by a permanent net-of-
interest deficit
The temporary surplus just offsets the permanent deficit in terms of expected present value

In [5]: G_HS = np.vstack([Econ1.Sc, Econ1.Sc-Econ1.Sd[0, :].reshape(1, 8)])


H_HS = 1e-8 * np.eye(2) # Set very small so there is no measurement error
LSS_HS = qe.LinearStateSpace(Econ1.A0, Econ1.C, G_HS, H_HS)

HS_kal = qe.Kalman(LSS_HS)
w_lss = HS_kal.whitener_lss()
ma_coefs = HS_kal.stationary_coefficients(50, 'ma')

# This is Fig 8.E.2 from p.189 of HS2013

jj = 50
y1_w1 = np.empty(jj)
y2_w1 = np.empty(jj)
y1_w2 = np.empty(jj)
y2_w2 = np.empty(jj)

for t in range(jj):
    y1_w1[t] = ma_coefs[t][0, 0]
    y1_w2[t] = ma_coefs[t][0, 1]
    y2_w1[t] = ma_coefs[t][1, 0]
    y2_w2[t] = ma_coefs[t][1, 1]

# This scales the impulse responses to match those in the book


y1_w1 = sqrt(HS_kal.stationary_innovation_covar()[0, 0]) * y1_w1
y2_w1 = sqrt(HS_kal.stationary_innovation_covar()[0, 0]) * y2_w1
y1_w2 = sqrt(HS_kal.stationary_innovation_covar()[1, 1]) * y1_w2
y2_w2 = sqrt(HS_kal.stationary_innovation_covar()[1, 1]) * y2_w2

plt.figure(figsize=(12, 4))
plt.subplot(121)
plt.plot(y1_w1, label='Consumption')
plt.plot(y2_w1, label='Deficit')
plt.legend()
plt.title('Response to $u_{1t}$')

plt.subplot(122)
plt.plot(y1_w2, label='Consumption')
plt.plot(y2_w2, label='Deficit')
plt.legend()
plt.title('Response to $u_{2t}$')
plt.show()

The above figure displays the impulse response of consumption and the deficit to the innova-
tions in the econometrician’s Wold representation

• this is the object that would be recovered from a high order vector autoregression on
the econometrician’s observations

Consumption responds only to the first innovation

• this is indicative of the Granger causality imposed on the [𝑐𝑡 , 𝑐𝑡 − 𝑑𝑡 ] process by Hall’s
model: consumption Granger causes 𝑐𝑡 − 𝑑𝑡 , with no reverse causality

In [6]: # This is Fig 8.E.3 from p.189 of HS2013

jj = 20
irf_wlss = w_lss.impulse_response(jj)
ycoefs = irf_wlss[1]

# Pull out the shocks
a1_w1 = np.empty(jj)
a1_w2 = np.empty(jj)
a2_w1 = np.empty(jj)
a2_w2 = np.empty(jj)

for t in range(jj):
    a1_w1[t] = ycoefs[t][0, 0]
    a1_w2[t] = ycoefs[t][0, 1]
    a2_w1[t] = ycoefs[t][1, 0]
    a2_w2[t] = ycoefs[t][1, 1]

plt.figure(figsize=(12, 4))
plt.subplot(121)
plt.plot(a1_w1, label='Consumption innov.')
plt.plot(a2_w1, label='Deficit innov.')
plt.title('Response to $w_{1t}$')
plt.legend()
plt.subplot(122)
plt.plot(a1_w2, label='Consumption innov.')
plt.plot(a2_w2, label='Deficit innov.')
plt.legend()
plt.title('Response to $w_{2t}$')
plt.show()

The above figure displays the impulse responses of 𝑢𝑡 to 𝑤𝑡 , as depicted in:


$$
u_t = \sum_{j=0}^{\infty} \alpha_j w_{t-j}
$$

While the responses of the innovations to consumption are concentrated at lag zero for both
components of 𝑤𝑡 , the responses of the innovations to (𝑐𝑡 − 𝑑𝑡 ) are spread over time (espe-
cially in response to 𝑤1𝑡 )
Thus, the innovations to (𝑐𝑡 − 𝑑𝑡 ) as revealed by the vector autoregression depend on what
the economic agent views as “old news”
Part IX

Classic Linear Models

66

Von Neumann Growth Model (and a Generalization)

66.1 Contents

• Notation 66.2

• Model Ingredients and Assumptions 66.3

• Dynamic Interpretation 66.4

• Duality 66.5

• Interpretation as a Game Theoretic Problem (Two-player Zero-sum Game) 66.6

Co-author: Balint Szoke


This notebook uses the class Neumann to calculate key objects of a linear growth model of
John von Neumann (1937) [131] that was generalized by Kemeny, Morgenstern and Thompson
(1956) [76]
Objects of interest are the maximal expansion rate (𝛼), the interest factor (𝛽), and the opti-
mal intensities (𝑥) and prices (𝑝)
In addition to watching how the towering mind of John von Neumann formulated an equilibrium
model of price and quantity vectors in balanced growth, this notebook shows how to
employ the following important tools fruitfully:

• a zero-sum two-player game


• linear programming
• the Perron-Frobenius theorem
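As a warm-up for the first two tools, here is a minimal sketch of how a zero-sum two-player game can be solved as a linear program with `scipy.optimize.linprog`. The payoff matrix (matching pennies) is an illustrative choice that does not come from the growth model; the LP layout follows the primal problem described in the `zerosum` docstring of the class below.

```python
import numpy as np
from scipy.optimize import linprog

# Payoff matrix to the row player (matching pennies, an illustrative choice)
M = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
m, n = M.shape

# Variables are (x_1, ..., x_m, v); maximizing v is minimizing -v
c = np.hstack([np.zeros(m), -1.0])
# For each column j: v - sum_i x_i M[i, j] <= 0
A_ub = np.hstack([-M.T, np.ones((n, 1))])
b_ub = np.zeros(n)
# Mixed strategy probabilities sum to one
A_eq = np.hstack([np.ones(m), 0.0]).reshape(1, m + 1)
b_eq = np.array([1.0])
# x >= 0, v unrestricted
bounds = [(0, None)] * m + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
value, strategy = res.x[-1], res.x[:-1]
```

For this matrix the game is fair, so the value is zero and the row player mixes evenly; in the growth model the same LP is applied to $M(\gamma) = B - \gamma A$.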

In [1]: import numpy as np


import matplotlib.pyplot as plt
from scipy.linalg import solve
from scipy.optimize import fsolve, linprog
from textwrap import dedent
%matplotlib inline

np.set_printoptions(precision=2)

The code below provides the Neumann class


In [2]: class Neumann(object):

    """
    This class describes the Generalized von Neumann growth model as it was
    discussed in Kemeny et al. (1956, ECTA) :cite:`kemeny1956generalization` and Gale (1960, Chapter 9

    Let:
    n ... number of goods
    m ... number of activities
    A ... input matrix is m-by-n
        a_{i,j} - amount of good j consumed by activity i
    B ... output matrix is m-by-n
        b_{i,j} - amount of good j produced by activity i

    x ... intensity vector (m-vector) with non-negative entries
        x'B - the vector of goods produced
        x'A - the vector of goods consumed
    p ... price vector (n-vector) with non-negative entries
        Bp - the revenue vector for every activity
        Ap - the cost of each activity

    Both A and B have non-negative entries. Moreover, we assume that
    (1) Assumption I (every good which is consumed is also produced):
        for all j, b_{.,j} > 0, i.e. at least one entry is strictly positive
    (2) Assumption II (no free lunch):
        for all i, a_{i,.} > 0, i.e. at least one entry is strictly positive

    Parameters
    ----------
    A : array_like or scalar(float)
        Input matrix. It should be `m x n`
    B : array_like or scalar(float)
        Output matrix. It should be `m x n`
    """

    def __init__(self, A, B):

        self.A, self.B = list(map(self.convert, (A, B)))
        self.m, self.n = self.A.shape

        # Check if (A, B) satisfy the basic assumptions
        assert self.A.shape == self.B.shape, 'The input and output matrices must have the same dimensi
        assert (self.A >= 0).all() and (self.B >= 0).all(), 'The input and output matrices must have o

        # (1) Check whether Assumption I is satisfied:
        if (np.sum(B, 0) <= 0).any():
            self.AI = False
        else:
            self.AI = True

        # (2) Check whether Assumption II is satisfied:
        if (np.sum(A, 1) <= 0).any():
            self.AII = False
        else:
            self.AII = True

    def __repr__(self):
        return self.__str__()

    def __str__(self):

        me = """
        Generalized von Neumann expanding model:
          - number of goods          : {n}
          - number of activities     : {m}

        Assumptions:
          - AI:  every column of B has a positive entry    : {AI}
          - AII: every row of A has a positive entry       : {AII}

        """
        # Irreducible                                       : {irr}
        return dedent(me.format(n=self.n, m=self.m,
                                AI=self.AI, AII=self.AII))

    def convert(self, x):
        """
        Convert array_like objects (lists of lists, floats, etc.) into
        well-formed 2D NumPy arrays
        """
        return np.atleast_2d(np.asarray(x))

    def bounds(self):
        """
        Calculate the trivial upper and lower bounds for alpha (expansion rate) and
        beta (interest factor). See the proof of Theorem 9.8 in Gale (1960) :cite:`gale1989theory`
        """

        n, m = self.n, self.m
        A, B = self.A, self.B

        f = lambda α: ((B - α * A) @ np.ones((n, 1))).max()
        g = lambda β: (np.ones((1, m)) @ (B - β * A)).min()

        # .item() extracts the scalar root (np.asscalar is no longer available in recent NumPy)
        UB = fsolve(f, 1).item()  # Upper bound for α, β
        LB = fsolve(g, 2).item()  # Lower bound for α, β

        return LB, UB

    def zerosum(self, γ, dual=False):
        """
        Given gamma, calculate the value and optimal strategies of a two-player
        zero-sum game given by the matrix

                M(gamma) = B - gamma * A

        Row player maximizing, column player minimizing

        Zero-sum game as an LP (primal --> α)

            max (0', 1) @ (x', v)
            subject to
            [-M', ones(n, 1)] @ (x', v)' <= 0
            (x', v) @ (ones(m, 1), 0) = 1
            (x', v) >= (0', -inf)

        Zero-sum game as an LP (dual --> beta)

            min (0', 1) @ (p', u)
            subject to
            [M, -ones(m, 1)] @ (p', u)' <= 0
            (p', u) @ (ones(n, 1), 0) = 1
            (p', u) >= (0', -inf)

        Outputs:
        --------
        value: scalar
            value of the zero-sum game

        strategy: vector
            if dual = False, it is the intensity vector,
            if dual = True, it is the price vector
        """

        A, B, n, m = self.A, self.B, self.n, self.m
        M = B - γ * A

        if not dual:
            # Solve the primal LP (for details see the description)
            # (1) Define the problem for v as a maximization (linprog minimizes)
            c = np.hstack([np.zeros(m), -1])

            # (2) Add constraints :
            # ... non-negativity constraints
            bounds = tuple(m * [(0, None)] + [(None, None)])
            # ... inequality constraints
            A_iq = np.hstack([-M.T, np.ones((n, 1))])
            b_iq = np.zeros((n, 1))
            # ... normalization
            A_eq = np.hstack([np.ones(m), 0]).reshape(1, m + 1)
            b_eq = 1

            res = linprog(c, A_ub=A_iq, b_ub=b_iq, A_eq=A_eq, b_eq=b_eq,
                          bounds=bounds, options=dict(bland=True, tol=1e-7))

        else:
            # Solve the dual LP (for details see the description)
            # (1) Define the problem for u as a minimization
            c = np.hstack([np.zeros(n), 1])

            # (2) Add constraints :
            # ... non-negativity constraints
            bounds = tuple(n * [(0, None)] + [(None, None)])
            # ... inequality constraints
            A_iq = np.hstack([M, -np.ones((m, 1))])
            b_iq = np.zeros((m, 1))
            # ... normalization
            A_eq = np.hstack([np.ones(n), 0]).reshape(1, n + 1)
            b_eq = 1

            res = linprog(c, A_ub=A_iq, b_ub=b_iq, A_eq=A_eq, b_eq=b_eq,
                          bounds=bounds, options=dict(bland=True, tol=1e-7))

        if res.status != 0:
            print(res.message)

        # Pull out the required quantities
        value = res.x[-1]
        strategy = res.x[:-1]

        return value, strategy

    def expansion(self, tol=1e-8, maxit=1000):
        """
        The algorithm used here is described in Hamburger-Thompson-Weil (1967, ECTA).
        It is based on a simple bisection argument and utilizes the idea that for
        a given γ (= α or β), the matrix "M = B - γ * A" defines a
        two-player zero-sum game, where the optimal strategies are the (normalized)
        intensity and price vector.

        Outputs:
        --------
        alpha: scalar
            optimal expansion rate
        """

        LB, UB = self.bounds()

        for _ in range(maxit):

            γ = (LB + UB) / 2
            V = self.zerosum(γ=γ)[0]  # value of the game with γ

            # V >= 0 keeps γ as a lower bound, so we converge to the maximal root
            if V >= 0:
                LB = γ
            else:
                UB = γ

            if abs(UB - LB) < tol:
                break

        # Strategies at the midpoint of the final bracket
        γ = (UB + LB) / 2
        x = self.zerosum(γ=γ)[1]
        p = self.zerosum(γ=γ, dual=True)[1]

        return γ, x, p

    def interest(self, tol=1e-8, maxit=1000):
        """
        The algorithm used here is described in Hamburger-Thompson-Weil (1967, ECTA).
        It is based on a simple bisection argument and utilizes the idea that for
        a given γ (= α or β), the matrix "M = B - γ * A" defines a
        two-player zero-sum game, where the optimal strategies are the (normalized)
        intensity and price vector.

        Outputs:
        --------
        beta: scalar
            optimal interest rate
        """

        LB, UB = self.bounds()

        for _ in range(maxit):

            γ = (LB + UB) / 2
            V = self.zerosum(γ=γ, dual=True)[0]  # value of the game with γ

            # V > 0 (strict) moves the lower bound, so we converge to the minimal root
            if V > 0:
                LB = γ
            else:
                UB = γ

            if abs(UB - LB) < tol:
                break

        # Strategies at the midpoint of the final bracket
        γ = (UB + LB) / 2
        p = self.zerosum(γ=γ, dual=True)[1]
        x = self.zerosum(γ=γ)[1]

        return γ, x, p

66.2 Notation

We use the following notation


$0$ denotes a vector of zeros. We call an $n$-vector $x$

• positive, or $x \gg 0$, if $x_i > 0$ for all $i = 1, 2, \dots, n$
• non-negative, or $x \geq 0$, if $x_i \geq 0$ for all $i = 1, 2, \dots, n$
• semi-positive, or $x > 0$, if $x \geq 0$ and $x \neq 0$

For two conformable vectors $x$ and $y$, $x \gg y$, $x \geq y$ and $x > y$ mean $x - y \gg 0$, $x - y \geq 0$, and $x - y > 0$
By default, all vectors are column vectors; $x^T$ denotes the transpose of $x$ (i.e. a row vector)
Let $\iota_n$ denote a column vector composed of $n$ ones, i.e. $\iota_n = (1, 1, \dots, 1)^T$
Let $e_i$ denote the vector (of arbitrary size) containing zeros except for the $i$-th position where it is one
We denote matrices by capital letters. For an arbitrary matrix $A$, $a_{i,j}$ represents the entry in its $i$-th row and $j$-th column
$a_{\cdot j}$ and $a_{i \cdot}$ denote the $j$-th column and $i$-th row of $A$, respectively

66.3 Model Ingredients and Assumptions

A pair (𝐴, 𝐵) of 𝑚 × 𝑛 non-negative matrices defines an economy.

• 𝑚 is the number of activities (or sectors)



• 𝑛 is the number of goods (produced and/or used in the economy)


• 𝐴 is called the input matrix; 𝑎𝑖,𝑗 denotes the amount of good 𝑗 consumed by activity 𝑖
• 𝐵 is called the output matrix; 𝑏𝑖,𝑗 represents the amount of good 𝑗 produced by activity
𝑖

Two key assumptions restrict economy (𝐴, 𝐵):

• Assumption I: (every good which is consumed is also produced)

𝑏.,𝑗 > 0 ∀𝑗 = 1, 2, … , 𝑛

• Assumption II: (no free lunch)

𝑎𝑖,. > 0 ∀𝑖 = 1, 2, … , 𝑚

A semi-positive $m$-vector $x$ denotes the levels at which activities are operated (intensity vector)
Therefore,

• vector 𝑥𝑇 𝐴 gives the total amount of goods used in production


• vector 𝑥𝑇 𝐵 gives total outputs

An economy $(A, B)$ is said to be productive if there exists a non-negative intensity vector $x \geq 0$ such that $x^T B > x^T A$
The semi-positive 𝑛-vector 𝑝 contains prices assigned to the 𝑛 goods
The 𝑝 vector implies cost and revenue vectors

• the vector 𝐴𝑝 tells costs of the vector of activities


• the vector 𝐵𝑝 tells revenues from the vector of activities
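The four vectors just described are plain matrix-vector products. The following is a minimal sketch of how they can be computed with NumPy. The matrices are those of the first example studied later in this lecture, while the intensity vector `x` and the price vector `p` are hypothetical values chosen only for illustration.

```python
import numpy as np

# Input and output matrices (m = 3 activities, n = 4 goods)
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]])
B = np.array([[1, 0, 0, 0],
              [0, 0, 2, 0],
              [0, 1, 0, 1]])

x = np.array([0.2, 0.3, 0.5])        # hypothetical intensity vector (length m)
p = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical price vector (length n)

inputs_used = x @ A    # x^T A: amount of each good used in production
outputs = x @ B        # x^T B: amount of each good produced
costs = A @ p          # A p: cost of running each activity at unit intensity
revenues = B @ p       # B p: revenue of each activity at unit intensity

print(inputs_used, outputs, costs, revenues)
```

Note that `x @ A` and `x @ B` have one entry per good, while `A @ p` and `B @ p` have one entry per activity.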

A property of an input-output pair $(A, B)$ called irreducibility (or indecomposability) determines whether an economy can be decomposed into multiple "sub-economies"
Definition: Given an economy $(A, B)$, the set of goods $S \subset \{1, 2, \dots, n\}$ is called an independent subset if it is possible to produce every good in $S$ without consuming any good outside $S$. Formally, the set $S$ is independent if $\exists T \subset \{1, 2, \dots, m\}$ (a subset of activities) such that $a_{i,j} = 0$, $\forall i \in T$ and $j \in S^c$, and for all $j \in S$, $\exists i \in T$, s.t. $b_{i,j} > 0$. The economy is irreducible if there are no proper independent subsets
We study two examples, both coming from Chapter 9.6 of Gale (1960) [45]

In [3]: # (1) Irreducible (A, B) example: α_0 = β_0

A1 = np.array([[0, 1, 0, 0],
               [1, 0, 0, 1],
               [0, 0, 1, 0]])

B1 = np.array([[1, 0, 0, 0],
               [0, 0, 2, 0],
               [0, 1, 0, 1]])

# (2) Reducible (A, B) example: β_0 < α_0

A2 = np.array([[0, 1, 0, 0, 0, 0],
               [1, 0, 1, 0, 0, 0],
               [0, 0, 0, 1, 0, 0],
               [0, 0, 1, 0, 0, 1],
               [0, 0, 0, 0, 1, 0]])

B2 = np.array([[1, 0, 0, 1, 0, 0],
               [0, 1, 0, 0, 0, 0],
               [0, 0, 1, 0, 0, 0],
               [0, 0, 0, 0, 2, 0],
               [0, 0, 0, 1, 0, 1]])

The following code sets up our first Neumann economy or Neumann instance

In [4]: N1 = Neumann(A1, B1)


N1

Out[4]:
Generalized von Neumann expanding model:
- number of goods : 4
- number of activities : 3

Assumptions:
- AI: every column of B has a positive entry : True
- AII: every row of A has a positive entry : True

In [5]: N2 = Neumann(A2, B2)


N2

Out[5]:
Generalized von Neumann expanding model:
- number of goods : 6
- number of activities : 5

Assumptions:
- AI: every column of B has a positive entry : True
- AII: every row of A has a positive entry : True

66.4 Dynamic Interpretation

Attach a time index 𝑡 to the preceding objects, regard an economy as a dynamic system, and
study sequences

{(𝐴𝑡 , 𝐵𝑡 )}𝑡≥0 , {𝑥𝑡 }𝑡≥0 , {𝑝𝑡 }𝑡≥0

An interesting special case holds the technology process constant and investigates the dynam-
ics of quantities and prices only
Accordingly, in the rest of this notebook, we assume that (𝐴𝑡 , 𝐵𝑡 ) = (𝐴, 𝐵) for all 𝑡 ≥ 0
A crucial element of the dynamic interpretation involves the timing of production
We assume that production (consumption of inputs) takes place in period $t$, while the associated output materializes in period $t + 1$, i.e. consumption of $x_t^T A$ in period $t$ results in $x_t^T B$ amounts of output in period $t + 1$
These timing conventions imply the following feasibility condition:

$$x_t^T B \geq x_{t+1}^T A \quad \forall t \geq 1$$

which asserts that no more goods can be used today than were produced yesterday
Accordingly, $A p_t$ tells the costs of production in period $t$ and $B p_t$ tells revenues in period $t + 1$

66.4.1 Balanced Growth

We follow John von Neumann in studying “balanced growth”


Let $./$ denote an elementwise division of one vector by another and let $\alpha > 0$ be a scalar
Then balanced growth is a situation in which

$$x_{t+1} ./ x_t = \alpha, \quad \forall t \geq 0$$

With balanced growth, the law of motion of $x$ is evidently $x_{t+1} = \alpha x_t$ and so we can rewrite the feasibility constraint as

$$x_t^T B \geq \alpha x_t^T A \quad \forall t$$

In the same spirit, define 𝛽 ∈ R as the interest factor per unit of time
We assume that it is always possible to earn a gross return equal to the constant interest fac-
tor 𝛽 by investing “outside the model”
Under this assumption about outside investment opportunities, a no-arbitrage condition gives
rise to the following (no profit) restriction on the price sequence:

$$\beta A p_t \geq B p_t \quad \forall t$$

This says that production cannot yield a return greater than that offered by the investment
opportunity (note that we compare values in period 𝑡 + 1)
The balanced growth assumption allows us to drop time subscripts and conduct an analysis
purely in terms of a time-invariant growth rate 𝛼 and interest factor 𝛽
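To make the feasibility condition under balanced growth concrete, here is a small sketch that builds a path $x_{t+1} = \alpha x_t$ and checks $x_t^T B \geq x_{t+1}^T A$ period by period. The matrices are those of the first example above. The growth factor $2^{1/3}$ and the starting vector proportional to $(\alpha, 1, \alpha^2)$ are worked out by hand for that example (they make the first three goods bind with equality) and should be read as assumptions of this illustration; the helper name `feasible_path` is hypothetical.

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]])
B = np.array([[1, 0, 0, 0],
              [0, 0, 2, 0],
              [0, 1, 0, 1]])


def feasible_path(α, x0, A, B, T=5, tol=1e-12):
    """Check x_t' B >= x_{t+1}' A along the balanced path x_{t+1} = α x_t."""
    path = [x0 * α**t for t in range(T + 1)]
    return all(np.all(x_t @ B - x_next @ A >= -tol)
               for x_t, x_next in zip(path[:-1], path[1:]))


α = 2 ** (1 / 3)                      # candidate growth factor (assumption)
x0 = np.array([α, 1, α**2])
x0 = x0 / x0.sum()                    # normalize the intensities to sum to one

print(feasible_path(α, x0, A, B))     # growth at factor α is feasible
print(feasible_path(1.5, x0, A, B))   # growing faster from the same x0 is not
```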

66.5 Duality

The following two problems are connected by a remarkable dual relationship between the
technological and valuation characteristics of the economy:
Definition: The technological expansion problem (TEP) for the economy $(A, B)$ is to find a semi-positive $m$-vector $x > 0$ and a number $\alpha \in \mathbb{R}$ that solve

$$\max_{\alpha} \; \alpha \quad \text{s.t.} \quad x^T B \geq \alpha x^T A$$

Theorem 9.3 of David Gale's book [45] asserts that if Assumptions I and II are both satisfied, then a maximum value of $\alpha$ exists and it is positive
It is called the technological expansion rate and is denoted by $\alpha_0$. The associated intensity vector $x_0$ is the optimal intensity vector
Definition: The economical expansion problem (EEP) for $(A, B)$ is to find a semi-positive $n$-vector $p > 0$ and a number $\beta \in \mathbb{R}$ that solve

$$\min_{\beta} \; \beta \quad \text{s.t.} \quad B p \leq \beta A p$$

Assumptions I and II imply existence of a minimum value 𝛽0 > 0 called the economic expan-
sion rate
The corresponding price vector 𝑝0 is the optimal price vector
Evidently, the criterion functions in the technological expansion problem and the economical expansion problem are both linearly homogeneous, so the optimal $x_0$ and $p_0$ are defined only up to a positive scale factor
For simplicity (and to emphasize a close connection to zero-sum games), in the following, we normalize both vectors $x_0$ and $p_0$ to have unit length
A standard duality argument (see Lemma 9.4 in Gale (1960) [45]) implies that under Assumptions I and II, $\beta_0 \leq \alpha_0$
For the reverse inequality, $\beta_0 \geq \alpha_0$, Assumptions I and II are not sufficient
Nevertheless, von Neumann (1937) [131] proved the following remarkable “duality-type” re-
sult connecting TEP and EEP
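Theorem I (von Neumann): If the economy $(A, B)$ satisfies Assumptions I and II, then there exists a set $(\gamma^*, x_0, p_0)$, where $\gamma^* \in [\beta_0, \alpha_0] \subset \mathbb{R}$, $x_0 > 0$ is an $m$-vector, $p_0 > 0$ is an $n$-vector and the following holds true

$$x_0^T B \geq \gamma^* x_0^T A$$
$$B p_0 \leq \gamma^* A p_0$$
$$x_0^T (B - \gamma^* A) p_0 = 0$$

Proof (Sketch): Assumptions I and II imply that there exist $(\alpha_0, x_0)$ and $(\beta_0, p_0)$ solving the TEP and EEP, respectively. If $\gamma^* > \alpha_0$, then by definition of $\alpha_0$, there cannot exist a semi-positive $x$ that satisfies $x^T B \geq \gamma^* x^T A$. Similarly, if $\gamma^* < \beta_0$, there is no semi-positive $p$ so that $B p \leq \gamma^* A p$. Let $\gamma^* \in [\beta_0, \alpha_0]$, then $x_0^T B \geq \alpha_0 x_0^T A \geq \gamma^* x_0^T A$. Moreover, $B p_0 \leq \beta_0 A p_0 \leq \gamma^* A p_0$. Postmultiplying the first inequality by $p_0 \geq 0$ and premultiplying the second by $x_0^T \geq 0^T$ gives $0 \leq x_0^T (B - \gamma^* A) p_0 \leq 0$, so these two inequalities imply $x_0^T (B - \gamma^* A) p_0 = 0$.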

Here the constant $\gamma^*$ is both an expansion factor and an interest factor (not necessarily optimal)
We have already encountered and discussed the first two inequalities that represent feasibility
and no-profit conditions
Moreover, the equality compactly captures the requirements that if any good grows at a rate
larger than 𝛾 ∗ (i.e., if it is oversupplied), then its price must be zero; and that if any activity
provides negative profit, it must be unused
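The three conditions of the theorem are easy to check numerically for a candidate triple $(\gamma^*, x_0, p_0)$. The sketch below does so for the first example. The candidate values ($\gamma^* = 2^{1/3}$, $x_0 \propto (\gamma^*, 1, (\gamma^*)^2)$, $p_0 \propto ((\gamma^*)^2, \gamma^*, 1, 0)$) are derived by hand for that example and are assumptions of this illustration rather than output of the class above.

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]])
B = np.array([[1, 0, 0, 0],
              [0, 0, 2, 0],
              [0, 1, 0, 1]])

γ = 2 ** (1 / 3)                       # candidate expansion / interest factor
x0 = np.array([γ, 1, γ**2])
x0 = x0 / x0.sum()                     # candidate intensity vector
p0 = np.array([γ**2, γ, 1, 0])
p0 = p0 / p0.sum()                     # candidate price vector

tol = 1e-12
feasibility = np.all(x0 @ B - γ * x0 @ A >= -tol)     # x0' B >= γ x0' A
no_profit = np.all(B @ p0 - γ * A @ p0 <= tol)        # B p0 <= γ A p0
slackness = abs(x0 @ (B - γ * A) @ p0) < tol          # x0' (B - γ A) p0 = 0

print(feasibility, no_profit, slackness)
print(x0.round(2), p0.round(2))
```

The only good in excess supply is the fourth one, and its price in `p0` is zero, which is exactly what the equality condition requires.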

Therefore, these expressions encode all equilibrium conditions and Theorem I essentially states that under Assumptions I and II there always exists an equilibrium $(\gamma^*, x_0, p_0)$ with balanced growth
Note that Theorem I is silent about uniqueness of the equilibrium. In fact, it does not rule out (trivial) cases with $x_0^T B p_0 = 0$ so that nothing of value is produced
To exclude such uninteresting cases, Kemeny, Morgenstern and Thompson (1956) add an extra requirement

$$x_0^T B p_0 > 0$$

and call the resulting equilibria economic solutions


They show that this extra condition does not affect the existence result, while it significantly
reduces the number of (relevant) solutions

66.6 Interpretation as a Game Theoretic Problem (Two-player Zero-sum Game)

To compute the equilibrium $(\gamma^*, x_0, p_0)$, we follow the algorithm proposed by Hamburger, Thompson and Weil (1967), building on the key insight that the equilibrium (with balanced growth) can be considered as a solution of a particular two-player zero-sum game. First, we introduce some notation
Consider the $m \times n$ matrix $C$ as a payoff matrix, with the entries representing payoffs from the minimizing column player to the maximizing row player and assume that the players can use mixed strategies:

• row player chooses the $m$-vector $x > 0$, s.t. $\iota_m^T x = 1$
• column player chooses the $n$-vector $p > 0$, s.t. $\iota_n^T p = 1$

Definition: The $m \times n$ matrix game $C$ has the solution $(x^*, p^*, V(C))$ in mixed strategies, if

$$(x^*)^T C e_j \geq V(C) \;\; \forall j \in \{1, \dots, n\} \quad \text{and} \quad (e_i)^T C p^* \leq V(C) \;\; \forall i \in \{1, \dots, m\}$$

The number 𝑉 (𝐶) is called the value of the game


From the above definition, it is clear that the value 𝑉 (𝐶) has two alternative interpretations:

• by playing the appropriate mixed strategy, the maximizing player can assure himself at least $V(C)$ (no matter what the column player chooses)
• by playing the appropriate mixed strategy, the minimizing player can make sure that the maximizing player will not get more than $V(C)$ (irrespective of the maximizing player's choice)

From the famous theorem of Nash (1951), it follows that there always exists a mixed strategy Nash equilibrium for any finite two-player zero-sum game
Moreover, von Neumann's Minimax Theorem (1928) [100] implies that

$$V(C) = \max_x \min_p \; x^T C p = \min_p \max_x \; x^T C p = (x^*)^T C p^*$$

66.6.1 Connection with Linear Programming (LP)

Finding Nash equilibria of a finite two-player zero-sum game can be formulated as a linear
programming problem
To see this, we introduce the following notation

• For a fixed $x$, let $v$ be the value of the minimization problem: $v \equiv \min_p x^T C p = \min_j x^T C e_j$
• For a fixed $p$, let $u$ be the value of the maximization problem: $u \equiv \max_x x^T C p = \max_i (e_i)^T C p$

Then the max-min problem (the game from the maximizing player's point of view) can be written as the primal LP

$$
\begin{aligned}
V(C) = \max \quad & v \\
\text{s.t.} \quad & v \iota_n^T \leq x^T C \\
& x \geq 0 \\
& \iota_m^T x = 1
\end{aligned}
$$

while the min-max problem (the game from the minimizing player's point of view) is the dual LP

$$
\begin{aligned}
V(C) = \min \quad & u \\
\text{s.t.} \quad & u \iota_m \geq C p \\
& p \geq 0 \\
& \iota_n^T p = 1
\end{aligned}
$$
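The primal LP can be handed directly to an LP solver. The following is a minimal, standalone sketch using `scipy.optimize.linprog` with its default settings; the function name `solve_matrix_game` is hypothetical, and the two payoff matrices are small textbook-style games chosen because their values are easy to verify by hand. Since `linprog` minimizes, the objective is $-v$.

```python
import numpy as np
from scipy.optimize import linprog


def solve_matrix_game(C):
    """Value of the game C and an optimal mixed strategy of the (maximizing) row player."""
    C = np.asarray(C, dtype=float)
    m, n = C.shape
    # Decision variables: (x_1, ..., x_m, v); maximize v by minimizing -v
    c = np.concatenate([np.zeros(m), [-1.0]])
    # v - (x' C)_j <= 0 for every column j
    A_ub = np.hstack([-C.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # The mixed strategy sums to one
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, m + 1)
    b_eq = np.array([1.0])
    # x >= 0, v unrestricted
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:-1]


# Matching pennies: value 0, optimal strategy (1/2, 1/2)
v1, x1 = solve_matrix_game([[1, -1], [-1, 1]])
# Row player maximizes min(2 x_1, x_2): value 2/3, optimal strategy (1/3, 2/3)
v2, x2 = solve_matrix_game([[2, 0], [0, 1]])
print(v1, x1)
print(v2, x2)
```

The dual LP is set up in the same way, with the roles of rows and columns exchanged and the objective $u$ minimized.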

Hamburger, Thompson and Weil (1967) [50] view the input-output pair of the economy as payoff matrices of two-player zero-sum games. Using this interpretation, they restate Assumptions I and II as follows

$$V(-A) < 0 \quad \text{and} \quad V(B) > 0$$

Proof (Sketch):

• ⇒ $V(B) > 0$ implies $x_0^T B \gg 0$, where $x_0$ is a maximizing vector. Since $B$ is non-negative, this requires that each column of $B$ has at least one positive entry, which is Assumption I
• ⇐ From Assumption I and the fact that $p > 0$, it follows that $B p > 0$. This implies that the maximizing player can always choose $x$ so that $x^T B p > 0$, that is, it must be the case that $V(B) > 0$

In order to (re)state Theorem I in terms of a particular two-player zero-sum game, we define the matrix for $\gamma \in \mathbb{R}$

$$M(\gamma) \equiv B - \gamma A$$

For fixed 𝛾, treating 𝑀 (𝛾) as a matrix game, we can calculate the solution of the game

• If 𝛾 > 𝛼0 , then for all 𝑥 > 0, there ∃𝑗 ∈ {1, … , 𝑛}, s.t. [𝑥𝑇 𝑀 (𝛾)]𝑗 < 0 implying that
𝑉 (𝑀 (𝛾)) < 0
• If 𝛾 < 𝛽0 , then for all 𝑝 > 0, there ∃𝑖 ∈ {1, … , 𝑚}, s.t. [𝑀 (𝛾)𝑝]𝑖 > 0 implying that
𝑉 (𝑀 (𝛾)) > 0

• If $\gamma \in \{\beta_0, \alpha_0\}$, then (by Theorem I) the optimal intensity and price vectors $x_0$ and $p_0$ satisfy

$$x_0^T M(\gamma) \geq 0^T \quad \text{and} \quad M(\gamma) p_0 \leq 0$$

That is, $(x_0, p_0, 0)$ is a solution of the game $M(\gamma)$ so that $V(M(\beta_0)) = V(M(\alpha_0)) = 0$

• If 𝛽0 < 𝛼0 and 𝛾 ∈ (𝛽0 , 𝛼0 ), then 𝑉 (𝑀 (𝛾)) = 0

Moreover, if 𝑥′ is optimal for the maximizing player in 𝑀 (𝛾 ′ ) for 𝛾 ′ ∈ (𝛽0 , 𝛼0 ) and 𝑝″ is op-
timal for the minimizing player in 𝑀 (𝛾 ″ ) where 𝛾 ″ ∈ (𝛽0 , 𝛾 ′ ), then (𝑥′ , 𝑝″ , 0) is a solution for
𝑀 (𝛾), ∀𝛾 ∈ (𝛾 ″ , 𝛾 ′ )

Proof (Sketch): If $x'$ is optimal for a maximizing player in game $M(\gamma')$, then $(x')^T M(\gamma') \geq 0^T$ and so for all $\gamma < \gamma'$

$$(x')^T M(\gamma) = (x')^T M(\gamma') + (x')^T (\gamma' - \gamma) A \geq 0^T$$

hence $V(M(\gamma)) \geq 0$. If $p''$ is optimal for a minimizing player in game $M(\gamma'')$, then $M(\gamma'') p'' \leq 0$ and so for all $\gamma'' < \gamma$

$$M(\gamma) p'' = M(\gamma'') p'' + (\gamma'' - \gamma) A p'' \leq 0$$

hence $V(M(\gamma)) \leq 0$
It is clear from the above argument that 𝛽0 , 𝛼0 are the minimal and maximal 𝛾 for which
𝑉 (𝑀 (𝛾)) = 0
Moreover, Hamburger et al. (1967) [50] show that the function 𝛾 ↦ 𝑉 (𝑀 (𝛾)) is continuous
and nonincreasing in 𝛾
This suggests an algorithm to compute (𝛼0 , 𝑥0 ) and (𝛽0 , 𝑝0 ) for a given input-output pair
(𝐴, 𝐵)

66.6.2 Algorithm

Hamburger, Thompson and Weil (1967) [50] propose a simple bisection algorithm to find the
minimal and maximal roots (i.e. 𝛽0 and 𝛼0 ) of the function 𝛾 ↦ 𝑉 (𝑀 (𝛾))
Step 1
First, notice that we can easily find trivial upper and lower bounds for 𝛼0 and 𝛽0

• TEP requires that 𝑥𝑇 (𝐵 − 𝛼𝐴) ≥ 0𝑇 and 𝑥 > 0, so if 𝛼 is so large that max𝑖 {[(𝐵 −
𝛼𝐴)𝜄𝑛 ]𝑖 } < 0, then TEP ceases to have a solution

Accordingly, let UB be the 𝛼∗ that solves max𝑖 {[(𝐵 − 𝛼∗ 𝐴)𝜄𝑛 ]𝑖 } = 0

• Similar to the upper bound, if 𝛽 is so low that min𝑗 {[𝜄𝑇𝑚 (𝐵 − 𝛽𝐴)]𝑗 } > 0, then the EEP
has no solution and so we can define LB as the 𝛽 ∗ that solves min𝑗 {[𝜄𝑇𝑚 (𝐵 − 𝛽 ∗ 𝐴)]𝑗 } = 0

The bounds method calculates these trivial bounds for us

In [6]: N1.bounds()

Out[6]: (1.0, 2.0)
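Because each entry of $(B - \alpha A)\iota_n$ is a row sum of $B$ minus $\alpha$ times a row sum of $A$, the two defining equations can also be solved in closed form: the upper bound is the largest ratio of row sums and the lower bound is the smallest ratio of column sums. The sketch below is an alternative to the root-finding used in the `bounds` method, not a description of it; the function name `trivial_bounds` is hypothetical and the formula relies on Assumptions I and II.

```python
import numpy as np


def trivial_bounds(A, B):
    """Closed-form version of the trivial bounds (LB, UB) under Assumptions I and II."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    row_A, row_B = A.sum(axis=1), B.sum(axis=1)
    col_A, col_B = A.sum(axis=0), B.sum(axis=0)
    # max_i [row_B_i - α row_A_i] = 0  <=>  α = max_i row_B_i / row_A_i
    UB = (row_B / row_A).max()
    # Goods that are never used as inputs (col_A = 0) impose no restriction on β
    used = col_A > 0
    # min_j [col_B_j - β col_A_j] = 0  <=>  β = min_j col_B_j / col_A_j
    LB = (col_B[used] / col_A[used]).min()
    return LB, UB


A1 = np.array([[0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]])
B1 = np.array([[1, 0, 0, 0], [0, 0, 2, 0], [0, 1, 0, 1]])
print(trivial_bounds(A1, B1))
```

For the first example this reproduces the pair (1.0, 2.0) reported above.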

Step 2
Compute $\alpha_0$ and $\beta_0$

• Finding $\alpha_0$
  1. Fix $\gamma = \frac{UB + LB}{2}$ and compute the solution of the two-player zero-sum game associated with $M(\gamma)$. We can use either the primal or the dual LP problem
  2. If $V(M(\gamma)) \geq 0$, then set $LB = \gamma$, otherwise let $UB = \gamma$
  3. Iterate on 1. and 2. until $|UB - LB| < \epsilon$
• Finding $\beta_0$
  1. Fix $\gamma = \frac{UB + LB}{2}$ and compute the solution of the two-player zero-sum game associated with $M(\gamma)$. We can use either the primal or the dual LP problem
  2. If $V(M(\gamma)) > 0$, then set $LB = \gamma$, otherwise let $UB = \gamma$
  3. Iterate on 1. and 2. until $|UB - LB| < \epsilon$

Existence: Since $V(M(LB)) > 0$ and $V(M(UB)) < 0$ and $V(M(\cdot))$ is a continuous, nonincreasing function, there is at least one $\gamma \in [LB, UB]$, s.t. $V(M(\gamma)) = 0$
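The only difference between the two searches is the weak versus strict inequality in step 2, and that is what selects the maximal or the minimal root when the function is flat at zero. The following sketch isolates this point with a toy function in place of $V(M(\gamma))$. Both `toy_value` (zero exactly on $[1, 1.5]$) and `bisect_root` are hypothetical names introduced for this illustration only.

```python
def toy_value(γ):
    """Stand-in for V(M(γ)): continuous, nonincreasing, equal to zero on [1, 1.5]."""
    return max(1.0 - γ, 0.0) + min(1.5 - γ, 0.0)


def bisect_root(value, LB, UB, maximal=True, tol=1e-8, maxit=1000):
    """Bisection for the maximal (V >= 0 test) or minimal (V > 0 test) root."""
    for _ in range(maxit):
        γ = (LB + UB) / 2
        V = value(γ)
        # Weak inequality pushes the bracket right, strict inequality pushes it left
        move_lower_bound = (V >= 0) if maximal else (V > 0)
        if move_lower_bound:
            LB = γ
        else:
            UB = γ
        if abs(UB - LB) < tol:
            break
    return (LB + UB) / 2


α_0 = bisect_root(toy_value, 0.5, 2.0, maximal=True)    # right end of the flat part
β_0 = bisect_root(toy_value, 0.5, 2.0, maximal=False)   # left end of the flat part
print(α_0, β_0)
```

With a strictly decreasing function the two searches would return the same number, which corresponds to the case of a unique $\gamma^*$.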

The zerosum method calculates the value and optimal strategies associated with a given 𝛾

In [7]: γ = 2

print(f'Value of the game with γ = {γ}')


print(N1.zerosum(γ=γ)[0])
print('Intensity vector (from the primal)')
print(N1.zerosum(γ=γ)[1])
print('Price vector (from the dual)')
print(N1.zerosum(γ=γ, dual=True)[1])

Value of the game with γ = 2


-0.24
Intensity vector (from the primal)
[0.32 0.28 0.4 ]
Price vector (from the dual)
[0.4 0.32 0.28 0. ]

In [8]: numb_grid = 100

γ_grid = np.linspace(0.4, 2.1, numb_grid)

# Value of the game M(γ) on the grid, for both examples
value_ex1_grid = np.asarray([N1.zerosum(γ=γ)[0] for γ in γ_grid])
value_ex2_grid = np.asarray([N2.zerosum(γ=γ)[0] for γ in γ_grid])

fig, axes = plt.subplots(1, 2, figsize=(14, 5), sharey=True)
fig.suptitle(r'The function $V(M(\gamma))$', fontsize=16)

for ax, grid, N, i in zip(axes, (value_ex1_grid, value_ex2_grid), (N1, N2), (1, 2)):
    LB, UB = N.bounds()
    ax.plot(γ_grid, grid)
    ax.set(title=f'Example {i}', xlabel=r'$\gamma$')
    ax.axhline(0, c='k', lw=1)
    ax.axvline(LB, c='r', ls='--', label='lower bound')
    ax.axvline(UB, c='g', ls='--', label='upper bound')

plt.show()

The expansion method implements the bisection algorithm for 𝛼0 (and uses the primal LP
problem for 𝑥0 )

In [9]: α_0, x, p = N1.expansion()


print(f'α_0 = {α_0}')
print(f'x_0 = {x}')
print(f'The corresponding p from the dual = {p}')

α_0 = 1.2599210478365421
x_0 = [0.33 0.26 0.41]
The corresponding p from the dual = [0.41 0.33 0.26 0. ]

The interest method implements the bisection algorithm for 𝛽0 (and uses the dual LP prob-
lem for 𝑝0 )

In [10]: β_0, x, p = N1.interest()


print(f'β_0 = {β_0}')
print(f'p_0 = {p}')
print(f'The corresponding x from the primal = {x}')

β_0 = 1.2599210478365421
p_0 = [0.41 0.33 0.26 0. ]
The corresponding x from the primal = [0.33 0.26 0.41]

Of course, when $\gamma^*$ is unique, it is irrelevant which one of the two methods we use
In particular, as will be shown below, in the case of an irreducible $(A, B)$ (as in Example 1), the maximal and minimal roots of $V(M(\gamma))$ necessarily coincide, implying a "full duality" result, i.e. $\alpha_0 = \beta_0 = \gamma^*$, and that the expansion (and interest) rate $\gamma^*$ is unique

66.6.3 Uniqueness and Irreducibility

As an illustration, compute first the maximal and minimal roots of 𝑉 (𝑀 (⋅)) for Example 2,
which displays a reducible input-output pair (𝐴, 𝐵)

In [11]: α_0, x, p = N2.expansion()


print(f'α_0 = {α_0}')
print(f'x_0 = {x}')
print(f'The corresponding p from the dual = {p}')

α_0 = 1.2556518474593759
x_0 = [0. 0. 0.33 0.26 0.41]
The corresponding p from the dual = [4.43e-01 5.57e-01 0.00e+00 8.49e-17 1.26e-17 0.00e+00]

In [12]: β_0, x, p = N2.interest()


print(f'β_0 = {β_0}')
print(f'p_0 = {p}')
print(f'The corresponding x from the primal = {x}')

β_0 = 1.0000000009313226
p_0 = [0.5 0.5 0. 0. 0. 0. ]
The corresponding x from the primal = [3.33e-01 3.33e-01 3.33e-01 1.45e-19 0.00e+00]

As we can see, with a reducible $(A, B)$, the roots found by the bisection algorithms might differ, so there might be multiple $\gamma^*$ that make the value of the game with $M(\gamma^*)$ zero (see the figure above)
Indeed, although the von Neumann theorem assures existence of the equilibrium, Assumptions I and II are not sufficient for uniqueness. Nonetheless, Kemeny et al. (1956) show that there are at most finitely many economic solutions, meaning that there are only finitely many $\gamma^*$ that satisfy $V(M(\gamma^*)) = 0$ and $x_0^T B p_0 > 0$ and that for each such $\gamma_i^*$, there is a self-sufficient part of the economy (a sub-economy) that in equilibrium can expand independently with the expansion coefficient $\gamma_i^*$
The following theorem (see Theorem 9.10. in Gale, 1960 [45]) asserts that imposing irre-
ducibility is sufficient for uniqueness of (𝛾 ∗ , 𝑥0 , 𝑝0 )
Theorem II: Consider the conditions of Theorem I. If the economy $(A, B)$ is irreducible, then $\gamma^* = \alpha_0 = \beta_0$

66.6.4 A Special Case

There is a special (𝐴, 𝐵) that allows us to simplify the solution method significantly by invok-
ing the powerful Perron-Frobenius theorem for non-negative matrices
Definition: We call an economy simple if it satisfies

1. $n = m$
2. Each activity produces exactly one good
3. Each good is produced by one and only one activity

These assumptions imply that $B = I_n$, i.e., that $B$ can be written as an identity matrix (possibly after reshuffling its rows and columns)
The simple model has the following special property (Theorem 9.11 in [45]): if $x_0$ and $\alpha_0 > 0$ solve the TEP with $(A, I_n)$, then

$$x_0^T = \alpha_0 x_0^T A \quad \Leftrightarrow \quad x_0^T A = \left( \frac{1}{\alpha_0} \right) x_0^T$$

The latter shows that 1/𝛼0 is a positive eigenvalue of 𝐴 and 𝑥0 is the corresponding non-
negative left eigenvector
The classical result of Perron and Frobenius implies that a non-negative matrix always has
a non-negative eigenvalue-eigenvector pair
Moreover, if 𝐴 is irreducible, then the optimal intensity vector 𝑥0 is positive and unique up to
multiplication by a positive scalar
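For a simple economy with an irreducible input matrix, this suggests reading $\alpha_0$ and $x_0$ off an eigendecomposition instead of solving LPs. The sketch below illustrates the idea with `numpy.linalg.eig`. The $2 \times 2$ matrix is a hypothetical example, and taking the eigenvalue with the largest real part as the relevant one is an assumption that rests on the Perron-Frobenius theorem for irreducible non-negative matrices.

```python
import numpy as np

# Hypothetical irreducible input matrix of a simple economy (B = I)
A = np.array([[0.0, 1.0],
              [0.5, 0.0]])

# Left eigenvectors of A are the (right) eigenvectors of A transposed
eigvals, eigvecs = np.linalg.eig(A.T)
k = np.argmax(eigvals.real)            # dominant (Perron) eigenvalue
λ = eigvals[k].real
x0 = eigvecs[:, k].real
x0 = x0 / x0.sum()                     # normalize; this also fixes the sign

α0 = 1 / λ                             # expansion factor implied by x0' A = (1/α0) x0'
print(α0, x0)

# Check the defining relation x0' = α0 x0' A
print(np.allclose(x0, α0 * x0 @ A))
```

Here the dominant eigenvalue is $\sqrt{0.5}$, so the implied expansion factor is $\sqrt{2}$.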

Suppose that $A$ is reducible with $k$ irreducible subsets $S_1, \dots, S_k$. Let $A_i$ be the submatrix corresponding to $S_i$ and let $\alpha_i$ and $\beta_i$ be the associated expansion and interest factors, respectively. Then we have

$$\alpha_0 = \max_i \{\alpha_i\} \quad \text{and} \quad \beta_0 = \min_i \{\beta_i\}$$
Part X

Time Series Models

67

Covariance Stationary Processes

67.1 Contents

• Overview 67.2
• Introduction 67.3
• Spectral Analysis 67.4
• Implementation 67.5

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

67.2 Overview

In this lecture we study covariance stationary linear stochastic processes, a class of models
routinely used to study economic and financial time series
This class has the advantage of being

1. simple enough to be described by an elegant and comprehensive theory


2. relatively broad in terms of the kinds of dynamics it can represent

We consider these models in both the time and frequency domain

67.2.1 ARMA Processes

We will focus much of our attention on linear covariance stationary models with a finite num-
ber of parameters
In particular, we will study stationary ARMA processes, which form a cornerstone of the
standard theory of time series analysis
Every ARMA process can be represented in linear state space form
However, ARMA processes have some important structure that makes it valuable to study
them separately


67.2.2 Spectral Analysis

Analysis in the frequency domain is also called spectral analysis


In essence, spectral analysis provides an alternative representation of the autocovariance func-
tion of a covariance stationary process
Having a second representation of this important object

• shines a light on the dynamics of the process in question


• allows for a simpler, more tractable representation in some important cases

The famous Fourier transform and its inverse are used to map between the two representa-
tions

67.2.3 Other Reading

For supplementary reading, see

• [87], chapter 2
• [118], chapter 11
• John Cochrane’s notes on time series analysis, chapter 8
• [122], chapter 6
• [29], all

67.3 Introduction

Consider a sequence of random variables {𝑋𝑡 } indexed by 𝑡 ∈ Z and taking values in R


Thus, {𝑋𝑡 } begins in the infinite past and extends to the infinite future — a convenient and
standard assumption
As in other fields, successful economic modeling typically assumes the existence of features
that are constant over time
If these assumptions are correct, then each new observation $X_t, X_{t+1}, \ldots$ can provide additional information about the time-invariant features, allowing us to learn from data as they arrive
For this reason, we will focus in what follows on processes that are stationary — or become so
after a transformation (see for example this lecture and this lecture)

67.3.1 Definitions

A real-valued stochastic process {𝑋𝑡 } is called covariance stationary if

1. Its mean 𝜇 ∶= E𝑋𝑡 does not depend on 𝑡


2. For all 𝑘 in Z, the 𝑘-th autocovariance 𝛾(𝑘) ∶= E(𝑋𝑡 − 𝜇)(𝑋𝑡+𝑘 − 𝜇) is finite and de-
pends only on 𝑘

The function 𝛾 ∶ Z → R is called the autocovariance function of the process



Throughout this lecture, we will work exclusively with zero-mean (i.e., 𝜇 = 0) covariance
stationary processes
The zero-mean assumption costs nothing in terms of generality since working with non-zero-
mean processes involves no more than adding a constant

67.3.2 Example 1: White Noise

Perhaps the simplest class of covariance stationary processes is the class of white noise processes
A process $\{\epsilon_t\}$ is called a white noise process if

1. $\mathbb{E} \epsilon_t = 0$
2. $\gamma(k) = \sigma^2 \mathbf{1}\{k = 0\}$ for some $\sigma > 0$

(Here $\mathbf{1}\{k = 0\}$ is defined to be 1 if $k = 0$ and zero otherwise)


White noise processes play the role of building blocks for processes with more complicated
dynamics

67.3.3 Example 2: General Linear Processes

From the simple building block provided by white noise, we can construct a very flexible family of covariance stationary processes, the general linear processes

$$X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}, \qquad t \in \mathbb{Z} \tag{1}$$

where

• $\{\epsilon_t\}$ is white noise
• $\{\psi_t\}$ is a square summable sequence in $\mathbb{R}$ (that is, $\sum_{t=0}^{\infty} \psi_t^2 < \infty$)

The sequence $\{\psi_t\}$ is often called a linear filter
Eq. (1) is said to present a moving average process or a moving average representation
With some manipulations, it is possible to confirm that the autocovariance function for Eq. (1) is

$$\gamma(k) = \sigma^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+k} \tag{2}$$

By the Cauchy-Schwarz inequality, one can show that the sum in Eq. (2) converges, so that $\gamma(k)$ is well defined
Evidently, $\gamma(k)$ does not depend on $t$
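For a filter with finitely many non-zero coefficients, the sum in Eq. (2) can be evaluated exactly. The sketch below does this for a short, hypothetical filter $\psi = (1, 0.5, 0.25)$; the function name `autocov` is likewise hypothetical. Negative lags are handled through the symmetry $\gamma(-k) = \gamma(k)$, which holds for any covariance stationary process.

```python
import numpy as np


def autocov(ψ, k, σ=1.0):
    """γ(k) = σ² Σ_j ψ_j ψ_{j+k} for a filter with finitely many non-zero terms."""
    ψ = np.asarray(ψ, dtype=float)
    k = abs(k)                      # γ(-k) = γ(k)
    if k >= len(ψ):
        return 0.0                  # no overlapping terms are left
    return σ**2 * np.sum(ψ[:len(ψ) - k] * ψ[k:])


ψ = [1.0, 0.5, 0.25]                # hypothetical filter coefficients
print([autocov(ψ, k) for k in range(4)])
```

The autocovariances vanish beyond lag 2 because the filter has only three non-zero coefficients.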

67.3.4 Wold’s Decomposition

Remarkably, the class of general linear processes goes a long way towards describing the en-
tire class of zero-mean covariance stationary processes
In particular, Wold's decomposition theorem states that every zero-mean covariance stationary process $\{X_t\}$ can be written as

$$X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j} + \eta_t$$

where

• {𝜖𝑡 } is white noise


• {𝜓𝑡 } is square summable
• 𝜂𝑡 can be expressed as a linear function of 𝑋𝑡−1 , 𝑋𝑡−2 , … and is perfectly predictable
over arbitrarily long horizons

For intuition and further discussion, see [118], p. 286

67.3.5 AR and MA

General linear processes are a very broad class of processes.


It often pays to specialize to those for which there exists a representation having only finitely
many parameters
(Experience and theory combine to indicate that models with a relatively small number of
parameters typically perform better than larger models, especially for forecasting)
One very simple example of such a model is the first-order autoregressive or AR(1) process

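$$X_t = \phi X_{t-1} + \epsilon_t \quad \text{where} \quad |\phi| < 1 \quad \text{and } \{\epsilon_t\} \text{ is white noise} \tag{3}$$

By direct substitution, it is easy to verify that $X_t = \sum_{j=0}^{\infty} \phi^j \epsilon_{t-j}$
Hence $\{X_t\}$ is a general linear process
Applying Eq. (2) to the previous expression for $X_t$, we get the AR(1) autocovariance function

$$\gamma(k) = \phi^k \frac{\sigma^2}{1 - \phi^2}, \qquad k = 0, 1, \ldots \tag{4}$$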

The next figure plots an example of this function for 𝜙 = 0.8 and 𝜙 = −0.8 with 𝜎 = 1

In [2]: import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

num_rows, num_cols = 2, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 8))
plt.subplots_adjust(hspace=0.4)

for i, ϕ in enumerate((0.8, -0.8)):
    ax = axes[i]
    times = list(range(16))
    # AR(1) autocovariance from Eq. (4) with σ = 1
    acov = [ϕ**k / (1 - ϕ**2) for k in times]
    ax.plot(times, acov, 'bo-', alpha=0.6,
            label=rf'autocovariance, $\phi = {ϕ:.2}$')
    ax.legend(loc='upper right')
    ax.set(xlabel='time', xlim=(0, 15))
    ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
plt.show()

Another very simple process is the MA(1) process (here MA means “moving average”)

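$$X_t = \epsilon_t + \theta \epsilon_{t-1}$$

You will be able to verify that

$$\gamma(0) = \sigma^2 (1 + \theta^2), \quad \gamma(1) = \sigma^2 \theta, \quad \text{and} \quad \gamma(k) = 0 \quad \forall k > 1$$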
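One informal way to check these expressions is to simulate a long MA(1) path and compare sample autocovariances with the formulas. The sketch below assumes Gaussian white noise, a hypothetical parameter $\theta = 0.5$ with $\sigma = 1$, and a fixed seed; the helper `sample_autocov` is also hypothetical. Sample moments only approximate the population values, so the comparison holds up to simulation error.

```python
import numpy as np

rng = np.random.default_rng(0)         # fixed seed for reproducibility
θ, σ, n = 0.5, 1.0, 200_000            # hypothetical parameter values

ϵ = rng.normal(0.0, σ, n + 1)          # Gaussian white noise (an assumption)
X = ϵ[1:] + θ * ϵ[:-1]                 # X_t = ϵ_t + θ ϵ_{t-1}


def sample_autocov(x, k):
    """Sample analogue of γ(k) = E (X_t - μ)(X_{t+k} - μ)."""
    x = x - x.mean()
    return np.mean(x[:len(x) - k] * x[k:])


theory = [σ**2 * (1 + θ**2), σ**2 * θ, 0.0]
sample = [sample_autocov(X, k) for k in range(3)]
for k in range(3):
    print(f'k = {k}: theory = {theory[k]:.3f}, sample = {sample[k]:.3f}')
```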

The AR(1) can be generalized to an AR(𝑝) and likewise for the MA(1)
Putting all of this together, we get the

67.3.6 ARMA Processes

A stochastic process {𝑋𝑡 } is called an autoregressive moving average process, or ARMA(𝑝, 𝑞),
if it can be written as

$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q} \tag{5}$$

where $\{\epsilon_t\}$ is white noise
An alternative notation for ARMA processes uses the lag operator $L$
Def. Given arbitrary variable $Y_t$, let $L^k Y_t := Y_{t-k}$
It turns out that

• lag operators facilitate succinct representations for linear stochastic processes
• algebraic manipulations that treat the lag operator as an ordinary scalar are legitimate

Using $L$, we can rewrite Eq. (5) as

$$L^0 X_t - \phi_1 L^1 X_t - \cdots - \phi_p L^p X_t = L^0 \epsilon_t + \theta_1 L^1 \epsilon_t + \cdots + \theta_q L^q \epsilon_t \tag{6}$$

If we let $\phi(z)$ and $\theta(z)$ be the polynomials

$$\phi(z) := 1 - \phi_1 z - \cdots - \phi_p z^p \quad \text{and} \quad \theta(z) := 1 + \theta_1 z + \cdots + \theta_q z^q \tag{7}$$

then Eq. (6) becomes

$$\phi(L) X_t = \theta(L) \epsilon_t \tag{8}$$

In what follows we always assume that the roots of the polynomial 𝜙(𝑧) lie outside the unit
circle in the complex plane
This condition is sufficient to guarantee that the ARMA(𝑝, 𝑞) process is covariance stationary
In fact, it implies that the process falls within the class of general linear processes described
above
That is, given an ARMA(𝑝, 𝑞) process {𝑋𝑡 } satisfying the unit circle condition, there exists a square summable sequence {𝜓𝑡 } with $X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}$ for all 𝑡
The sequence {𝜓𝑡 } can be obtained by a recursive procedure outlined on page 79 of [29]
The function 𝑡 ↦ 𝜓𝑡 is often called the impulse response function
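One standard way to compute the coefficients is to match powers of $L$ in $\phi(L)\psi(L) = \theta(L)$, which gives $\psi_0 = 1$ and $\psi_j = \theta_j + \sum_{i=1}^{\min(j,p)} \phi_i \psi_{j-i}$, with $\theta_j = 0$ for $j > q$. The sketch below implements that recursion. It is not necessarily the procedure in the cited reference, and the function name `impulse_response` is hypothetical.

```python
def impulse_response(phi, theta, n=10):
    """First n impulse response coefficients psi_0, ..., psi_{n-1} of an
    ARMA(p, q) model with AR coefficients phi and MA coefficients theta."""
    phi, theta = list(phi), list(theta)
    psi = []
    for j in range(n):
        if j == 0:
            value = 1.0
        else:
            # theta_j is zero beyond the MA order q
            value = theta[j - 1] if j <= len(theta) else 0.0
            # add phi_i * psi_{j-i} for i = 1, ..., min(j, p)
            value += sum(phi[i - 1] * psi[j - i]
                         for i in range(1, min(j, len(phi)) + 1))
        psi.append(value)
    return psi

psi_ar1 = impulse_response([0.9], [], n=5)        # AR(1): psi_j = 0.9**j
psi_arma = impulse_response([0.98], [-0.7], n=3)  # ARMA(1, 1)
print(psi_ar1)
print(psi_arma)
```

For the AR(1) case the recursion reproduces the geometric coefficients $\phi^j$ found earlier by direct substitution.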

67.4 Spectral Analysis

Autocovariance functions provide a great deal of information about covariance stationary processes
In fact, for zero-mean Gaussian processes, the autocovariance function characterizes the entire
joint distribution
Even for non-Gaussian processes, it provides a significant amount of information
It turns out that there is an alternative representation of the autocovariance function of a
covariance stationary process, called the spectral density
At times, the spectral density is easier to derive, easier to manipulate, and provides additional
intuition

67.4.1 Complex Numbers

Before discussing the spectral density, we invite you to recall the main properties of complex
numbers (or skip to the next section)
It can be helpful to remember that, in a formal sense, complex numbers are just points
(𝑥, 𝑦) ∈ ℝ² endowed with a specific notion of multiplication
When (𝑥, 𝑦) is regarded as a complex number, 𝑥 is called the real part and 𝑦 is called the
imaginary part
The modulus or absolute value of a complex number 𝑧 = (𝑥, 𝑦) is just its Euclidean norm in
ℝ², but is usually written as |𝑧| instead of ‖𝑧‖
The product of two complex numbers (𝑥, 𝑦) and (𝑢, 𝑣) is defined to be (𝑥𝑢 − 𝑣𝑦, 𝑥𝑣 + 𝑦𝑢),
while addition is standard pointwise vector addition
When endowed with these notions of multiplication and addition, the set of complex numbers
forms a field — addition and multiplication play well together, just as they do in R
The complex number (𝑥, 𝑦) is often written as 𝑥 + 𝑖𝑦, where 𝑖 is called the imaginary unit and
is understood to obey 𝑖² = −1
The 𝑥 + 𝑖𝑦 notation provides an easy way to remember the definition of multiplication given
above, because, proceeding naively,

(𝑥 + 𝑖𝑦)(𝑢 + 𝑖𝑣) = 𝑥𝑢 − 𝑦𝑣 + 𝑖(𝑥𝑣 + 𝑦𝑢)

Converted back to our first notation, this becomes (𝑥𝑢 − 𝑣𝑦, 𝑥𝑣 + 𝑦𝑢) as promised
Complex numbers can be represented in the polar form $r e^{i\omega}$ where

$$r e^{i\omega} := r(\cos(\omega) + i \sin(\omega)) = x + iy$$

with $x = r\cos(\omega)$, $y = r\sin(\omega)$, and $\omega = \arctan(y/x)$ or $\tan(\omega) = y/x$
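These facts are easy to try out with Python's built-in complex type and the standard `cmath` module. The short sketch below (the helper `multiply` is illustrative) checks the pairwise multiplication rule and converts between rectangular and polar form. Note that `cmath.polar` determines the angle from both coordinates, so it picks the correct quadrant, whereas `arctan(y/x)` alone does not.

```python
import cmath
import math

def multiply(a, b):
    """Multiply complex numbers represented as pairs (x, y) and (u, v)."""
    (x, y), (u, v) = a, b
    return (x * u - v * y, x * v + y * u)

# The pairwise rule agrees with Python's built-in complex multiplication
prod_pairs = multiply((1.0, 2.0), (3.0, -1.0))
prod_builtin = complex(1.0, 2.0) * complex(3.0, -1.0)

# Polar form: z = r * exp(i * omega)
z = complex(1.0, 1.0)
r, omega = cmath.polar(z)       # modulus and angle
z_back = cmath.rect(r, omega)   # back to x + iy

print(prod_pairs, prod_builtin)
print(r, omega, z_back)
```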

67.4.2 Spectral Densities

Let {𝑋𝑡 } be a covariance stationary process with autocovariance function 𝛾 satisfying $\sum_k \gamma(k)^2 < \infty$
The spectral density 𝑓 of {𝑋𝑡 } is defined as the discrete time Fourier transform of its autocovariance function 𝛾

$$f(\omega) := \sum_{k \in \mathbb{Z}} \gamma(k) e^{-i\omega k}, \qquad \omega \in \mathbb{R}$$

(Some authors normalize the expression on the right by constants such as 1/𝜋; the convention chosen makes little difference provided you are consistent)
Using the fact that 𝛾 is even, in the sense that 𝛾(𝑡) = 𝛾(−𝑡) for all 𝑡, we can show that

$$f(\omega) = \gamma(0) + 2 \sum_{k \geq 1} \gamma(k) \cos(\omega k) \tag{9}$$

It is not difficult to confirm that 𝑓 is



• real-valued
• even (𝑓(𝜔) = 𝑓(−𝜔) ), and
• 2𝜋-periodic, in the sense that 𝑓(2𝜋 + 𝜔) = 𝑓(𝜔) for all 𝜔

It follows that the values of 𝑓 on [0, 𝜋] determine the values of 𝑓 on all of R — the proof is an
exercise
For this reason, it is standard to plot the spectral density only on the interval [0, 𝜋]
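The cosine representation Eq. (9) can be evaluated directly by truncating the sum. The sketch below does this for the AR(1) autocovariance function of Eq. (4) with 𝜙 = 0.8 and 𝜎 = 1, and can be used to check the three properties listed above numerically. The truncation point `k_max` and the function names are assumptions made for illustration.

```python
import math

def spectral_density_from_acov(gamma, omega, k_max=200):
    """Truncated version of f(omega) = gamma(0) + 2 * sum_{k>=1} gamma(k) cos(omega k)."""
    return gamma(0) + 2 * sum(gamma(k) * math.cos(omega * k)
                              for k in range(1, k_max + 1))

phi, sigma = 0.8, 1.0

def gamma(k):
    # AR(1) autocovariance function from Eq. (4)
    return sigma**2 * phi**abs(k) / (1 - phi**2)

def f(omega):
    return spectral_density_from_acov(gamma, omega)

omega = 0.7
print(f(omega), f(-omega), f(omega + 2 * math.pi))
```

Because the terms decay geometrically, 200 terms are far more than needed here. A slowly decaying autocovariance function would call for a larger `k_max`.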

67.4.3 Example 1: White Noise

Consider a white noise process {𝜖𝑡 } with standard deviation 𝜎


It is easy to check that in this case 𝑓(𝜔) = 𝜎². So 𝑓 is a constant function
As we will see, this can be interpreted as meaning that “all frequencies are equally present”
(White light has this property when frequency refers to the visible spectrum, a connection
that provides the origins of the term “white noise”)

67.4.4 Example 2: AR and MA and ARMA

It is an exercise to show that the MA(1) process 𝑋𝑡 = 𝜃𝜖𝑡−1 + 𝜖𝑡 has a spectral density

$$f(\omega) = \sigma^2 (1 + 2\theta \cos(\omega) + \theta^2) \tag{10}$$

With a bit more effort, it’s possible to show (see, e.g., p. 261 of [118]) that the spectral density of the AR(1) process 𝑋𝑡 = 𝜙𝑋𝑡−1 + 𝜖𝑡 is

$$f(\omega) = \frac{\sigma^2}{1 - 2\phi \cos(\omega) + \phi^2} \tag{11}$$

More generally, it can be shown that the spectral density of the ARMA process Eq. (5) is

$$f(\omega) = \left| \frac{\theta(e^{i\omega})}{\phi(e^{i\omega})} \right|^2 \sigma^2 \tag{12}$$

where

• 𝜎 is the standard deviation of the white noise process {𝜖𝑡 }


• the polynomials 𝜙(⋅) and 𝜃(⋅) are as defined in Eq. (7)

The derivation of Eq. (12) uses the fact that convolutions become products under Fourier
transformations
The proof is elegant and can be found in many places — see, for example, [118], chapter 11,
section 4
It’s a nice exercise to verify that Eq. (10) and Eq. (11) are indeed special cases of Eq. (12)
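The special cases can also be confirmed numerically. The sketch below evaluates Eq. (12) with complex arithmetic and compares it with the closed forms Eq. (10) and Eq. (11) at a few frequencies. The function name `arma_spectral_density` and the parameter values are illustrative assumptions.

```python
import cmath
import math

def arma_spectral_density(phi, theta, omega, sigma=1.0):
    """Evaluate Eq. (12) at frequency omega, where phi and theta are
    sequences of AR and MA coefficients as in Eq. (7)."""
    z = cmath.exp(1j * omega)
    ar_poly = 1 - sum(p * z**(j + 1) for j, p in enumerate(phi))
    ma_poly = 1 + sum(t * z**(j + 1) for j, t in enumerate(theta))
    return sigma**2 * abs(ma_poly / ar_poly)**2

sigma, theta1, phi1 = 1.5, 0.4, 0.8
for omega in (0.0, 0.5, 1.0, 3.0):
    ma1_closed = sigma**2 * (1 + 2 * theta1 * math.cos(omega) + theta1**2)   # Eq. (10)
    ar1_closed = sigma**2 / (1 - 2 * phi1 * math.cos(omega) + phi1**2)       # Eq. (11)
    print(omega,
          arma_spectral_density([], [theta1], omega, sigma) - ma1_closed,
          arma_spectral_density([phi1], [], omega, sigma) - ar1_closed)
```

The printed differences should be zero up to floating point error.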

67.4.5 Interpreting the Spectral Density

Plotting Eq. (11) reveals the shape of the spectral density for the AR(1) model when 𝜙 takes
the values 0.8 and -0.8 respectively

In [3]: def ar1_sd(ϕ, ω):
    return 1 / (1 - 2 * ϕ * np.cos(ω) + ϕ**2)

ωs = np.linspace(0, np.pi, 180)
num_rows, num_cols = 2, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 8))
plt.subplots_adjust(hspace=0.4)

# Spectral density when ϕ = 0.8 (top panel) and ϕ = -0.8 (bottom panel)
for i, ϕ in enumerate((0.8, -0.8)):
    ax = axes[i]
    sd = ar1_sd(ϕ, ωs)
    ax.plot(ωs, sd, 'b-', alpha=0.6, lw=2, label=rf'spectral density, $\phi = {ϕ:.2}$')
    ax.legend(loc='upper center')
    ax.set(xlabel='frequency', xlim=(0, np.pi))
plt.show()

These spectral densities correspond to the autocovariance functions for the AR(1) process
shown above
Informally, we think of the spectral density as being large at those 𝜔 ∈ [0, 𝜋] at which the
autocovariance function seems approximately to exhibit big damped cycles
To see the idea, let’s consider why, in the lower panel of the preceding figure, the spectral
density for the case 𝜙 = −0.8 is large at 𝜔 = 𝜋

Recall that the spectral density can be expressed as (taking 𝜎 = 1)

$$f(\omega) = \gamma(0) + 2 \sum_{k \geq 1} \gamma(k) \cos(\omega k) = \gamma(0) + \frac{2}{1 - 0.8^2} \sum_{k \geq 1} (-0.8)^k \cos(\omega k) \tag{13}$$

When we evaluate this at 𝜔 = 𝜋, we get a large number because cos(𝜋𝑘) is large and positive when $(-0.8)^k$ is positive, and large in absolute value and negative when $(-0.8)^k$ is negative
Hence the product is always large and positive, and hence the sum of the products on the
right-hand side of Eq. (13) is large
These ideas are illustrated in the next figure, which has 𝑘 on the horizontal axis

In [4]: ϕ = -0.8
times = list(range(16))
y1 = [ϕ**k / (1 - ϕ**2) for k in times]
y2 = [np.cos(np.pi * k) for k in times]
y3 = [a * b for a, b in zip(y1, y2)]

num_rows, num_cols = 3, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 8))
plt.subplots_adjust(hspace=0.25)

# Autocovariance when ϕ = -0.8


ax = axes[0]
ax.plot(times, y1, 'bo-', alpha=0.6, label='$\gamma(k)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), yticks=(-2, 0, 2))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)

# Cycles at frequency π
ax = axes[1]
ax.plot(times, y2, 'bo-', alpha=0.6, label='$\cos(\pi k)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), yticks=(-1, 0, 1))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)

# Product
ax = axes[2]
ax.stem(times, y3, label='$\gamma(k) \cos(\pi k)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), ylim=(-3, 3), yticks=(-1, 0, 1, 2, 3))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
ax.set_xlabel("k")

plt.show()

On the other hand, if we evaluate 𝑓(𝜔) at 𝜔 = 𝜋/3, then the cycles are not matched, the
sequence 𝛾(𝑘) cos(𝜔𝑘) contains both positive and negative terms, and hence the sum of these
terms is much smaller

In [5]: ϕ = -0.8
times = list(range(16))
y1 = [ϕ**k / (1 - ϕ**2) for k in times]
y2 = [np.cos(np.pi * k/3) for k in times]
y3 = [a * b for a, b in zip(y1, y2)]

num_rows, num_cols = 3, 1
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 8))
plt.subplots_adjust(hspace=0.25)

# Autocovariance when phi = -0.8


ax = axes[0]
ax.plot(times, y1, 'bo-', alpha=0.6, label='$\gamma(k)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), yticks=(-2, 0, 2))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)

# Cycles at frequency π/3
ax = axes[1]
ax.plot(times, y2, 'bo-', alpha=0.6, label='$\cos(\pi k/3)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), yticks=(-1, 0, 1))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)

# Product
ax = axes[2]
ax.stem(times, y3, label='$\gamma(k) \cos(\pi k/3)$')
ax.legend(loc='upper right')
ax.set(xlim=(0, 15), ylim=(-3, 3), yticks=(-1, 0, 1, 2, 3))
ax.hlines(0, 0, 15, linestyle='--', alpha=0.5)
ax.set_xlabel("$k$")

plt.show()

In summary, the spectral density is large at frequencies 𝜔 where the autocovariance function
exhibits damped cycles

67.4.6 Inverting the Transformation

We have just seen that the spectral density is useful in the sense that it provides a frequency-based perspective on the autocovariance structure of a covariance stationary process
Another reason that the spectral density is useful is that it can be “inverted” to recover the
autocovariance function via the inverse Fourier transform
In particular, for all 𝑘 ∈ Z, we have

$$\gamma(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\omega) e^{i\omega k} \, d\omega \tag{14}$$

This is convenient in situations where the spectral density is easier to calculate and manipulate than the autocovariance function
(For example, the expression Eq. (12) for the ARMA spectral density is much easier to work
with than the expression for the ARMA autocovariance)
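To see the inversion at work, the sketch below approximates the integral in Eq. (14) with a midpoint rule for the AR(1) spectral density Eq. (11) and compares the result with the autocovariances in Eq. (4). Since 𝑓 is even, the imaginary part of the integral vanishes and only the cosine term is needed. The grid size `n` and the function names are assumptions made for illustration.

```python
import math

def ar1_sd(phi, omega, sigma=1.0):
    """AR(1) spectral density, Eq. (11)."""
    return sigma**2 / (1 - 2 * phi * math.cos(omega) + phi**2)

def acov_from_sd(f, k, n=2000):
    """Midpoint-rule approximation of (1 / 2 pi) * integral of f(w) cos(w k) over [-pi, pi]."""
    h = 2 * math.pi / n
    total = 0.0
    for m in range(n):
        omega = -math.pi + (m + 0.5) * h
        total += f(omega) * math.cos(omega * k)
    return total * h / (2 * math.pi)

phi = -0.8
recovered = [acov_from_sd(lambda w: ar1_sd(phi, w), k) for k in range(4)]
exact = [phi**k / (1 - phi**2) for k in range(4)]   # Eq. (4) with sigma = 1
print(recovered)
print(exact)
```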

67.4.7 Mathematical Theory

This section is loosely based on [118], p. 249-253, and included for those who

• would like a bit more insight into spectral densities


• and have at least some background in Hilbert space theory

Others should feel free to skip to the next section — none of this material is necessary to
progress to computation
Recall that every separable Hilbert space 𝐻 has a countable orthonormal basis {ℎ𝑘 }
The nice thing about such a basis is that every 𝑓 ∈ 𝐻 satisfies

$$f = \sum_k \alpha_k h_k \quad \text{where} \quad \alpha_k := \langle f, h_k \rangle \tag{15}$$

where ⟨⋅, ⋅⟩ denotes the inner product in 𝐻


Thus, 𝑓 can be represented to any degree of precision by linearly combining basis vectors
The scalar sequence 𝛼 = {𝛼𝑘 } is called the Fourier coefficients of 𝑓, and satisfies $\sum_k |\alpha_k|^2 < \infty$
In other words, 𝛼 is in ℓ2 , the set of square summable sequences
Consider an operator 𝑇 that maps 𝛼 ∈ ℓ2 into its expansion ∑𝑘 𝛼𝑘 ℎ𝑘 ∈ 𝐻
The Fourier coefficients of 𝑇 𝛼 are just 𝛼 = {𝛼𝑘 }, as you can verify by confirming that
⟨𝑇 𝛼, ℎ𝑘 ⟩ = 𝛼𝑘
Using elementary results from Hilbert space theory, it can be shown that

• 𝑇 is one-to-one — if 𝛼 and 𝛽 are distinct in ℓ2 , then so are their expansions in 𝐻


• 𝑇 is onto — if 𝑓 ∈ 𝐻 then its preimage in ℓ2 is the sequence 𝛼 given by 𝛼𝑘 = ⟨𝑓, ℎ𝑘 ⟩
• 𝑇 is a linear isometry — in particular, ⟨𝛼, 𝛽⟩ = ⟨𝑇 𝛼, 𝑇 𝛽⟩

Summarizing these results, we say that any separable Hilbert space is isometrically isomorphic to ℓ2
In essence, this says that each separable Hilbert space we consider is just a different way of
looking at the fundamental space ℓ2
With this in mind, let’s specialize to a setting where

• 𝛾 ∈ ℓ2 is the autocovariance function of a covariance stationary process, and 𝑓 is the spectral density
• 𝐻 = 𝐿2 , where 𝐿2 is the set of square integrable functions on the interval [−𝜋, 𝜋], with inner product $\langle g, h \rangle = \int_{-\pi}^{\pi} g(\omega) \overline{h(\omega)} \, d\omega$
• {ℎ𝑘 } = the orthonormal basis for 𝐿2 given by the set of trigonometric functions

$$h_k(\omega) = \frac{e^{i\omega k}}{\sqrt{2\pi}}, \quad k \in \mathbb{Z}, \quad \omega \in [-\pi, \pi]$$

Using the definition of 𝑇 from above and the fact that 𝑓 is even, we now have

$$T\gamma = \sum_{k \in \mathbb{Z}} \gamma(k) \frac{e^{i\omega k}}{\sqrt{2\pi}} = \frac{1}{\sqrt{2\pi}} f(\omega) \tag{16}$$

In other words, apart from a scalar multiple, the spectral density is just a transformation of
𝛾 ∈ ℓ2 under a certain linear isometry — a different way to view 𝛾
In particular, it is an expansion of the autocovariance function with respect to the trigonometric basis functions in 𝐿2
As discussed above, the Fourier coefficients of 𝑇 𝛾 are given by the sequence 𝛾, and, in particular, 𝛾(𝑘) = ⟨𝑇 𝛾, ℎ𝑘 ⟩
Transforming this inner product into its integral expression and using Eq. (16) gives Eq. (14),
justifying our earlier expression for the inverse transform

67.5 Implementation

Most code for working with covariance stationary models deals with ARMA models
Python code for studying ARMA models can be found in the tsa submodule of statsmodels
Since this code doesn’t quite cover our needs, particularly vis-a-vis spectral analysis, we’ve put together the module arma.py, which is part of the QuantEcon.py package
The module provides functions for mapping ARMA(𝑝, 𝑞) models into their

1. impulse response function


2. simulated time series
3. autocovariance function
4. spectral density

67.5.1 Application

Let’s use this code to replicate the plots on pages 68–69 of [87]
Here are some functions to generate the plots

In [6]: def plot_impulse_response(arma, ax=None):
    if ax is None:
        ax = plt.gca()
    yi = arma.impulse_response()
    ax.stem(list(range(len(yi))), yi)
    ax.set(xlim=(-0.5), ylim=(min(yi)-0.1, max(yi)+0.1),
           title='Impulse response', xlabel='time', ylabel='response')
    return ax

def plot_spectral_density(arma, ax=None):
    if ax is None:
        ax = plt.gca()
    w, spect = arma.spectral_density(two_pi=False)
    ax.semilogy(w, spect)
    ax.set(xlim=(0, np.pi), ylim=(0, np.max(spect)),
           title='Spectral density', xlabel='frequency', ylabel='spectrum')
    return ax

def plot_autocovariance(arma, ax=None):
    if ax is None:
        ax = plt.gca()
    acov = arma.autocovariance()
    ax.stem(list(range(len(acov))), acov)
    ax.set(xlim=(-0.5, len(acov) - 0.5), title='Autocovariance',
           xlabel='time', ylabel='autocovariance')
    return ax

def plot_simulation(arma, ax=None):
    if ax is None:
        ax = plt.gca()
    x_out = arma.simulation()
    ax.plot(x_out)
    ax.set(title='Sample path', xlabel='time', ylabel='state space')
    return ax

def quad_plot(arma):
    """
    Plots the impulse response, spectral_density, autocovariance,
    and one realization of the process.

    """
    num_rows, num_cols = 2, 2
    fig, axes = plt.subplots(num_rows, num_cols, figsize=(12, 8))
    plot_functions = [plot_impulse_response,
                      plot_spectral_density,
                      plot_autocovariance,
                      plot_simulation]
    for plot_func, ax in zip(plot_functions, axes.flatten()):
        plot_func(arma, ax)
    plt.tight_layout()
    plt.show()

Now let’s call these functions to generate plots


As a warmup, let’s make sure things look right when we use the pure white noise model 𝑋𝑡 = 𝜖𝑡

In [7]: import quantecon as qe

ϕ = 0.0
θ = 0.0
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)


If we look carefully, things look good: the spectrum is the flat line at $10^0$ at the very top of the spectrum graphs, which is as it should be
Also

• the variance equals $1 = \frac{1}{2\pi} \int_{-\pi}^{\pi} 1 \, d\omega$ as it should
• the covariogram and impulse response look as they should
• it is actually challenging to visualize a time series realization of white noise –
a sequence of surprises – but this too looks pretty good

To get some more examples, as our laboratory we’ll replicate quartets of graphs that [87] use
to teach “how to read spectral densities”
Ljungqvist and Sargent’s first model is 𝑋𝑡 = 1.3𝑋𝑡−1 − .7𝑋𝑡−2 + 𝜖𝑡

In [8]: ϕ = 1.3, -.7
θ = 0.0
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)

Ljungqvist and Sargent’s second model is 𝑋𝑡 = .9𝑋𝑡−1 + 𝜖𝑡

In [9]: ϕ = 0.9
θ = -0.0
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)

Ljungqvist and Sargent’s third model is 𝑋𝑡 = .8𝑋𝑡−4 + 𝜖𝑡


In [10]: ϕ = 0., 0., 0., .8
θ = -0.0
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)

Ljungqvist and Sargent’s fourth model is 𝑋𝑡 = .98𝑋𝑡−1 + 𝜖𝑡 − .7𝜖𝑡−1

In [11]: ϕ = .98
θ = -0.7
arma = qe.ARMA(ϕ, θ)
quad_plot(arma)

67.5.2 Explanation

The call

arma = ARMA(ϕ, θ, σ)

creates an instance arma that represents the ARMA(𝑝, 𝑞) model

𝑋𝑡 = 𝜙1 𝑋𝑡−1 + ... + 𝜙𝑝 𝑋𝑡−𝑝 + 𝜖𝑡 + 𝜃1 𝜖𝑡−1 + ... + 𝜃𝑞 𝜖𝑡−𝑞

If ϕ and θ are arrays or sequences, then the interpretation will be

• ϕ holds the vector of parameters (𝜙1 , 𝜙2 , ..., 𝜙𝑝 )
• θ holds the vector of parameters (𝜃1 , 𝜃2 , ..., 𝜃𝑞 )

The parameter σ is always a scalar, the standard deviation of the white noise
We also permit ϕ and θ to be scalars, in which case the model will be interpreted as

𝑋𝑡 = 𝜙𝑋𝑡−1 + 𝜖𝑡 + 𝜃𝜖𝑡−1

The two numerical packages most useful for working with ARMA models are scipy.signal
and numpy.fft
The package scipy.signal expects the parameters to be passed into its functions in a
manner consistent with the alternative ARMA notation Eq. (8)
For example, the impulse response sequence {𝜓𝑡 } discussed above can be obtained using
scipy.signal.dimpulse, and the function call should be of the form

times, ψ = dimpulse((ma_poly, ar_poly, 1), n=impulse_length)

where ma_poly and ar_poly correspond to the polynomials in Eq. (7) — that is,

• ma_poly is the vector (1, 𝜃1 , 𝜃2 , … , 𝜃𝑞 )


• ar_poly is the vector (1, −𝜙1 , −𝜙2 , … , −𝜙𝑝 )

To this end, we also maintain the arrays ma_poly and ar_poly as instance data, with their
values computed automatically from the values of phi and theta supplied by the user
If the user decides to change the value of either theta or phi ex-post by assignments such
as arma.phi = (0.5, 0.2) or arma.theta = (0, -0.1)
then ma_poly and ar_poly should update automatically to reflect these new parameters
This is achieved in our implementation by using descriptors

67.5.3 Computing the Autocovariance Function

As discussed above, for ARMA processes the spectral density has a simple representation that
is relatively easy to calculate
Given this fact, the easiest way to obtain the autocovariance function is to recover it from the
spectral density via the inverse Fourier transform
Here we use NumPy’s Fourier transform package np.fft, which wraps a standard Fortran-
based package called FFTPACK
A look at the np.fft documentation shows that the inverse transform np.fft.ifft takes a given
sequence 𝐴0 , 𝐴1 , … , 𝐴𝑛−1 and returns the sequence 𝑎0 , 𝑎1 , … , 𝑎𝑛−1 defined by

$$a_k = \frac{1}{n} \sum_{t=0}^{n-1} A_t e^{ik2\pi t/n}$$

Thus, if we set 𝐴𝑡 = 𝑓(𝜔𝑡 ), where 𝑓 is the spectral density and 𝜔𝑡 ∶= 2𝜋𝑡/𝑛, then

$$a_k = \frac{1}{n} \sum_{t=0}^{n-1} f(\omega_t) e^{i\omega_t k} = \frac{1}{2\pi} \frac{2\pi}{n} \sum_{t=0}^{n-1} f(\omega_t) e^{i\omega_t k}, \qquad \omega_t := 2\pi t/n$$

For 𝑛 sufficiently large, we then have

$$a_k \approx \frac{1}{2\pi} \int_0^{2\pi} f(\omega) e^{i\omega k} \, d\omega = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\omega) e^{i\omega k} \, d\omega$$

(You can check the last equality)


In view of Eq. (14), we have now shown that, for 𝑛 sufficiently large, 𝑎𝑘 ≈ 𝛾(𝑘) — which is
exactly what we want to compute
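The argument above can be tried out on a case where the answer is known. The sketch below evaluates the AR(1) spectral density Eq. (11) on the grid $\omega_t = 2\pi t/n$, applies `np.fft.ifft`, and compares the leading terms with the exact autocovariances from Eq. (4). It is an illustration under assumed parameter values, not the code used in arma.py.

```python
import numpy as np

phi, sigma, n = 0.8, 1.0, 512

# Spectral density of the AR(1) process on the grid omega_t = 2 pi t / n
omegas = 2 * np.pi * np.arange(n) / n
f_vals = sigma**2 / (1 - 2 * phi * np.cos(omegas) + phi**2)

# Inverse FFT: a_k = (1/n) * sum_t f(omega_t) * exp(i * omega_t * k)
acov = np.fft.ifft(f_vals).real

# Exact autocovariances gamma(k) = sigma^2 phi^k / (1 - phi^2)
exact = sigma**2 * phi**np.arange(5) / (1 - phi**2)

print(acov[:5])
print(exact)
```

The imaginary part returned by `ifft` is numerically zero here because the spectral density is real and even, which is why only the real part is kept.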
68

Estimation of Spectra

68.1 Contents

• Overview 68.2
• Periodograms 68.3
• Smoothing 68.4
• Exercises 68.5
• Solutions 68.6

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

68.2 Overview

In a previous lecture, we covered some fundamental properties of covariance stationary linear stochastic processes
One objective for that lecture was to introduce spectral densities, a standard and very useful technique for analyzing such processes
In this lecture, we turn to the problem of estimating spectral densities and other related
quantities from data
Estimates of the spectral density are computed using what is known as a periodogram —
which in turn is computed via the famous fast Fourier transform
Once the basic technique has been explained, we will apply it to the analysis of several key
macroeconomic time series
For supplementary reading, see [118] or [29]

68.3 Periodograms

Recall that the spectral density 𝑓 of a covariance stationary process with autocovariance function 𝛾 can be written

$$f(\omega) = \gamma(0) + 2 \sum_{k \geq 1} \gamma(k) \cos(\omega k), \qquad \omega \in \mathbb{R}$$

Now consider the problem of estimating the spectral density of a given time series, when 𝛾 is
unknown
In particular, let 𝑋0 , … , 𝑋𝑛−1 be 𝑛 consecutive observations of a single time series that is assumed to be covariance stationary
The most common estimator of the spectral density of this process is the periodogram of
𝑋0 , … , 𝑋𝑛−1 , which is defined as

$$I(\omega) := \frac{1}{n} \left| \sum_{t=0}^{n-1} X_t e^{it\omega} \right|^2, \qquad \omega \in \mathbb{R} \tag{1}$$

(Recall that |𝑧| denotes the modulus of complex number 𝑧)


Alternatively, 𝐼(𝜔) can be expressed as

$$I(\omega) = \frac{1}{n} \left\{ \left[ \sum_{t=0}^{n-1} X_t \cos(\omega t) \right]^2 + \left[ \sum_{t=0}^{n-1} X_t \sin(\omega t) \right]^2 \right\}$$

It is straightforward to show that the function 𝐼 is even and 2𝜋-periodic (i.e., 𝐼(𝜔) = 𝐼(−𝜔)
and 𝐼(𝜔 + 2𝜋) = 𝐼(𝜔) for all 𝜔 ∈ R)
From these two results, you will be able to verify that the values of 𝐼 on [0, 𝜋] determine the
values of 𝐼 on all of R
The next section helps to explain the connection between the periodogram and the spectral
density

68.3.1 Interpretation

To interpret the periodogram, it is convenient to focus on its values at the Fourier frequencies

$$\omega_j := \frac{2\pi j}{n}, \qquad j = 0, \ldots, n-1$$

In what sense is 𝐼(𝜔𝑗 ) an estimate of 𝑓(𝜔𝑗 )?


The answer is straightforward, although it does involve some algebra
With a bit of effort, one can show that for any integer 𝑗 > 0,

$$\sum_{t=0}^{n-1} e^{it\omega_j} = \sum_{t=0}^{n-1} \exp\left\{ i 2\pi j \frac{t}{n} \right\} = 0$$

Letting $\bar X$ denote the sample mean $n^{-1} \sum_{t=0}^{n-1} X_t$, we then have

$$n I(\omega_j) = \left| \sum_{t=0}^{n-1} (X_t - \bar X) e^{it\omega_j} \right|^2 = \sum_{t=0}^{n-1} (X_t - \bar X) e^{it\omega_j} \sum_{r=0}^{n-1} (X_r - \bar X) e^{-ir\omega_j}$$

By carefully working through the sums, one can transform this to

$$n I(\omega_j) = \sum_{t=0}^{n-1} (X_t - \bar X)^2 + 2 \sum_{k=1}^{n-1} \sum_{t=k}^{n-1} (X_t - \bar X)(X_{t-k} - \bar X) \cos(\omega_j k)$$

Now let

$$\hat\gamma(k) := \frac{1}{n} \sum_{t=k}^{n-1} (X_t - \bar X)(X_{t-k} - \bar X), \qquad k = 0, 1, \ldots, n-1$$

This is the sample autocovariance function, the natural “plug-in estimator” of the autocovariance function 𝛾
(“Plug-in estimator” is an informal term for an estimator found by replacing expectations
with sample means)
With this notation, we can now write

$$I(\omega_j) = \hat\gamma(0) + 2 \sum_{k=1}^{n-1} \hat\gamma(k) \cos(\omega_j k)$$

Recalling our expression for 𝑓 given above, we see that 𝐼(𝜔𝑗 ) is just a sample analog of 𝑓(𝜔𝑗 )
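The identity between the periodogram and the sample autocovariances can be confirmed numerically at a nonzero Fourier frequency. The sketch below uses simulated data with a nonzero mean, so that the cancellation of the sample mean is actually exercised. The helper names and the choice of `n` and `j` are illustrative.

```python
import cmath
import math
import random

random.seed(0)
n = 50
X = [1.0 + random.gauss(0, 1) for _ in range(n)]  # data with nonzero mean

def periodogram_at(X, omega):
    """I(omega) = (1/n) * |sum_t X_t exp(i t omega)|^2, as in Eq. (1)."""
    return abs(sum(x * cmath.exp(1j * t * omega) for t, x in enumerate(X)))**2 / len(X)

def sample_acov(X, k):
    """Sample autocovariance at lag k, normalized by n."""
    m = len(X)
    mean = sum(X) / m
    return sum((X[t] - mean) * (X[t - k] - mean) for t in range(k, m)) / m

j = 3                               # any j in 1, ..., n-1
omega_j = 2 * math.pi * j / n
lhs = periodogram_at(X, omega_j)
rhs = sample_acov(X, 0) + 2 * sum(sample_acov(X, k) * math.cos(omega_j * k)
                                  for k in range(1, n))
print(lhs, rhs)
```

At $j = 0$ the two sides differ, because the sum of the exponentials is then $n$ rather than zero.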

68.3.2 Calculation

Let’s now consider how to compute the periodogram as defined in Eq. (1)
There are already functions available that will do this for us; an example is statsmodels.tsa.stattools.periodogram in the statsmodels package
However, it is very simple to replicate their results, and this will give us a platform to make
useful extensions
The most common way to calculate the periodogram is via the discrete Fourier transform,
which in turn is implemented through the fast Fourier transform algorithm
In general, given a sequence 𝑎0 , … , 𝑎𝑛−1 , the discrete Fourier transform computes the sequence

$$A_j := \sum_{t=0}^{n-1} a_t \exp\left\{ i 2\pi \frac{tj}{n} \right\}, \qquad j = 0, \ldots, n-1$$

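With numpy.fft.fft imported as fft and 𝑎0 , … , 𝑎𝑛−1 stored in NumPy array a, the function call fft(a) returns the values 𝐴0 , … , 𝐴𝑛−1 as a NumPy array (NumPy uses the opposite sign in the exponent, which leaves the modulus unchanged when the data are real-valued)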
It follows that when the data 𝑋0 , … , 𝑋𝑛−1 are stored in array X, the values 𝐼(𝜔𝑗 ) at the
Fourier frequencies, which are given by

$$\frac{1}{n} \left| \sum_{t=0}^{n-1} X_t \exp\left\{ i 2\pi \frac{tj}{n} \right\} \right|^2, \qquad j = 0, \ldots, n-1$$

can be computed by np.abs(fft(X))**2 / len(X)



Note: The NumPy function abs acts elementwise, and correctly handles complex numbers
(by computing their modulus, which is exactly what we need)
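As a quick sanity check, the sketch below compares the FFT-based calculation with a direct evaluation of Eq. (1) at the Fourier frequencies. The data are simulated white noise and the variable names are illustrative. `np.fft.fft` uses a negative sign in the exponent, but for real-valued data the squared modulus is the same under either sign convention.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(64)
n = len(X)

# Periodogram at the Fourier frequencies via the FFT
I_fft = np.abs(np.fft.fft(X))**2 / n

# Direct evaluation of (1/n) * |sum_t X_t exp(i t omega_j)|^2
t = np.arange(n)
omegas = 2 * np.pi * np.arange(n) / n
I_direct = np.array([np.abs(np.sum(X * np.exp(1j * t * w)))**2 / n
                     for w in omegas])

print(np.max(np.abs(I_fft - I_direct)))
```

The direct version costs on the order of $n^2$ operations, which is why the FFT is preferred for long series.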
A function called periodogram that puts all this together can be found here
Let’s generate some data for this function using the ARMA class from QuantEcon.py (see the
lecture on linear processes for more details)
Here’s a code snippet that, once the preceding code has been run, generates data from the
process

𝑋𝑡 = 0.5𝑋𝑡−1 + 𝜖𝑡 − 0.8𝜖𝑡−2 (2)

where {𝜖𝑡 } is white noise with unit variance, and compares the periodogram to the actual
spectral density

In [2]: import matplotlib.pyplot as plt


%matplotlib inline
from quantecon import ARMA, periodogram

n = 40 # Data size
ϕ, θ = 0.5, (0, -0.8) # AR and MA parameters
lp = ARMA(ϕ, θ)
X = lp.simulation(ts_length=n)

fig, ax = plt.subplots()
x, y = periodogram(X)
ax.plot(x, y, 'b-', lw=2, alpha=0.5, label='periodogram')
x_sd, y_sd = lp.spectral_density(two_pi=False, res=120)
ax.plot(x_sd, y_sd, 'r-', lw=2, alpha=0.8, label='spectral density')
ax.legend()
plt.show()


This estimate looks rather disappointing, but the data size is only 40, so perhaps it’s not surprising that the estimate is poor
However, if we try again with n = 1200 the outcome is not much better

The periodogram is far too irregular relative to the underlying spectral density
This brings us to our next topic

68.4 Smoothing

There are two related issues here


One is that, given the way the fast Fourier transform is implemented, the number of points 𝜔
at which 𝐼(𝜔) is estimated increases in line with the amount of data
In other words, although we have more data, we are also using it to estimate more values
A second issue is that densities of all types are fundamentally hard to estimate without parametric assumptions
Typically, nonparametric estimation of densities requires some degree of smoothing
The standard way that smoothing is applied to periodograms is by taking local averages
In other words, the value 𝐼(𝜔𝑗 ) is replaced with a weighted average of the adjacent values

𝐼(𝜔𝑗−𝑝 ), 𝐼(𝜔𝑗−𝑝+1 ), … , 𝐼(𝜔𝑗 ), … , 𝐼(𝜔𝑗+𝑝 )

This weighted average can be written as

$$I_S(\omega_j) := \sum_{\ell=-p}^{p} w(\ell) I(\omega_{j+\ell}) \tag{3}$$

where the weights 𝑤(−𝑝), … , 𝑤(𝑝) are a sequence of 2𝑝 + 1 nonnegative values summing to
one
In general, larger values of 𝑝 indicate more smoothing — more on this below
The next figure shows the kind of sequence typically used
Note the smaller weights towards the edges and larger weights in the center, so that more distant values from 𝐼(𝜔𝑗 ) have less weight than closer ones in the sum Eq. (3)

In [3]: import numpy as np

def hanning_window(M):
    # Return an array so that the weights can be rescaled elementwise
    w = [0.5 - 0.5 * np.cos(2 * np.pi * n/(M-1)) for n in range(M)]
    return np.array(w)

window = hanning_window(25) / np.abs(sum(hanning_window(25)))
x = np.linspace(-12, 12, 25)
plt.figure(figsize=(9, 7))
plt.plot(x, window)
plt.title("Hanning window")
plt.ylabel("Weights")
plt.xlabel("Position in sequence of weights")
plt.show()

68.4.1 Estimation with Smoothing

Our next step is to provide code that will not only estimate the periodogram but also provide
smoothing as required

Such functions have been written in estspec.py and are available once you’ve installed QuantEcon.py
The GitHub listing displays three functions, smooth(), periodogram(),
ar_periodogram(). We will discuss the first two here and the third one below
The periodogram() function returns a periodogram, optionally smoothed via the
smooth() function
Regarding the smooth() function, since smoothing adds a nontrivial amount of computa-
tion, we have applied a fairly terse array-centric method based around np.convolve
Readers are left either to explore or simply to use this code according to their interests
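For readers who want to see the idea in isolation, here is a minimal sketch of local averaging with `np.convolve`. It is not the `smooth()` function from estspec.py. The function name `smooth_sketch`, the Hanning weights, and the reflection padding at the edges are all choices made for this illustration.

```python
import numpy as np

def smooth_sketch(values, window_len=7):
    """Replace each entry with a weighted average of its 2p + 1 neighbors,
    where p = window_len // 2 and the weights are normalized Hanning weights.
    Requires len(values) > p."""
    if window_len % 2 == 0:
        window_len += 1                     # force an odd length
    p = window_len // 2
    w = np.hanning(window_len + 2)[1:-1]    # drop the two zero endpoints
    w = w / w.sum()                         # nonnegative weights summing to one
    padded = np.pad(values, p, mode='reflect')    # extend the edges by reflection
    return np.convolve(padded, w, mode='valid')   # same length as the input

rng = np.random.default_rng(0)
noisy = 1.0 + 0.5 * rng.standard_normal(200)
smoothed = smooth_sketch(noisy, window_len=15)
print(noisy.std(), smoothed.std())
```

Because the weights sum to one, a constant sequence is left unchanged, while irregular sequences become visibly less variable.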
The next three figures each show smoothed and unsmoothed periodograms, as well as the
population or “true” spectral density
(The model is the same as before — see equation Eq. (2) — and there are 400 observations)
From the top figure to bottom, the window length is varied from small to large

In looking at the figure, we can see that for this model and data size, the window length chosen in the middle figure provides the best fit
Relative to this value, the first window length provides insufficient smoothing, while the third gives too much smoothing
Of course in real estimation problems, the true spectral density is not visible and the choice
of appropriate smoothing will have to be made based on judgement/priors or some other theory

68.4.2 Pre-Filtering and Smoothing

In the code listing, we showed three functions from the file estspec.py
The third function in the file (ar_periodogram()) adds a pre-processing step to periodogram smoothing
First, we describe the basic idea, and after that we give the code
The essential idea is to

1. Transform the data in order to make estimation of the spectral density more efficient
2. Compute the periodogram associated with the transformed data
3. Reverse the effect of the transformation on the periodogram, so that it now estimates
the spectral density of the original process

Step 1 is called pre-filtering or pre-whitening, while step 3 is called recoloring


The first step is called pre-whitening because the transformation is usually designed to turn
the data into something closer to white noise
Why would this be desirable in terms of spectral density estimation?
The reason is that we are smoothing our estimated periodogram based on estimated values at
nearby points — recall Eq. (3)
The underlying assumption that makes this a good idea is that the true spectral density is
relatively regular — the value of 𝐼(𝜔) is close to that of 𝐼(𝜔′ ) when 𝜔 is close to 𝜔′
This will not be true in all cases, but it is certainly true for white noise
For white noise, 𝐼 is as regular as possible — it is a constant function
In this case, values of 𝐼(𝜔′ ) at points 𝜔′ near to 𝜔 provide the maximum possible amount of information about the value 𝐼(𝜔)
Another way to put this is that if 𝐼 is relatively constant, then we can use a large amount of
smoothing without introducing too much bias

68.4.3 The AR(1) Setting

Let’s examine this idea more carefully in a particular setting — where the data are assumed
to be generated by an AR(1) process
(More general ARMA settings can be handled using similar techniques to those described below)
Suppose in particular that {𝑋𝑡 } is covariance stationary and AR(1), with

𝑋𝑡+1 = 𝜇 + 𝜙𝑋𝑡 + 𝜖𝑡+1 (4)



where 𝜇 and 𝜙 ∈ (−1, 1) are unknown parameters and {𝜖𝑡 } is white noise
It follows that if we regress 𝑋𝑡+1 on 𝑋𝑡 and an intercept, the residuals will approximate white
noise
Let

• 𝑔 be the spectral density of {𝜖𝑡 } — a constant function, as discussed above


• 𝐼0 be the periodogram estimated from the residuals — an estimate of 𝑔
• 𝑓 be the spectral density of {𝑋𝑡 } — the object we are trying to estimate

In view of an earlier result we obtained while discussing ARMA processes, 𝑓 and 𝑔 are related
by

$$f(\omega) = \left| \frac{1}{1 - \phi e^{i\omega}} \right|^2 g(\omega) \tag{5}$$

This suggests that the recoloring step, which constructs an estimate 𝐼 of 𝑓 from 𝐼0 , should set

$$I(\omega) = \left| \frac{1}{1 - \hat\phi e^{i\omega}} \right|^2 I_0(\omega)$$

where $\hat\phi$ is the OLS estimate of 𝜙


The code for ar_periodogram() — the third function in estspec.py — does exactly
this. (See the code here)
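The three steps can be sketched in a few lines of NumPy. This is a simplified illustration rather than the `ar_periodogram()` implementation: it simulates its own AR(1) data, omits the smoothing of the residual periodogram, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi_true = 2000, -0.9

# Simulate an AR(1) series with mu = 0
eps = rng.standard_normal(n)
X = np.empty(n)
X[0] = eps[0]
for t in range(1, n):
    X[t] = phi_true * X[t - 1] + eps[t]

# Step 1 (pre-whitening): regress X_{t+1} on an intercept and X_t by OLS
Z = np.column_stack([np.ones(n - 1), X[:-1]])
coef, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
mu_hat, phi_hat = coef
resid = X[1:] - Z @ coef

# Step 2: periodogram of the residuals at their Fourier frequencies
m = len(resid)
omegas = 2 * np.pi * np.arange(m) / m
I0 = np.abs(np.fft.fft(resid))**2 / m

# Step 3 (recoloring): rescale by |1 / (1 - phi_hat e^{i omega})|^2
I_recolored = I0 / np.abs(1 - phi_hat * np.exp(1j * omegas))**2

print(phi_hat)
```

In practice `I0` would be smoothed before recoloring. Since the residuals are close to white noise, heavy smoothing at that stage introduces little bias.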
The next figure shows realizations of the two kinds of smoothed periodograms

1. “standard smoothed periodogram”, the ordinary smoothed periodogram, and


2. “AR smoothed periodogram”, the pre-whitened and recolored one generated by
ar_periodogram()

The periodograms are calculated from time series drawn from Eq. (4) with 𝜇 = 0 and 𝜙 =
−0.9
Each time series is of length 150
The difference between the three subfigures is just randomness: each one uses a different draw of the time series

In all cases, periodograms are fit with the “hamming” window and window length of 65
Overall, the fit of the AR smoothed periodogram is much better, in the sense of being closer
to the true spectral density

68.5 Exercises

68.5.1 Exercise 1

Replicate this figure (modulo randomness)


The model is as in equation Eq. (2) and there are 400 observations
For the smoothed periodogram, the window type is “hamming”

68.5.2 Exercise 2

Replicate this figure (modulo randomness)



The model is as in equation Eq. (4), with 𝜇 = 0, 𝜙 = −0.9 and 150 observations in each time
series
All periodograms are fit with the “hamming” window and window length of 65

68.6 Solutions

In [4]: from quantecon import ar_periodogram

68.6.1 Exercise 1

In [5]: ## Data
n = 400
ϕ = 0.5
θ = 0, -0.8
lp = ARMA(ϕ, θ)
X = lp.simulation(ts_length=n)

fig, ax = plt.subplots(3, 1, figsize=(10, 12))

for i, wl in enumerate((15, 55, 175)):  # window lengths

    x, y = periodogram(X)
    ax[i].plot(x, y, 'b-', lw=2, alpha=0.5, label='periodogram')

    x_sd, y_sd = lp.spectral_density(two_pi=False, res=120)
    ax[i].plot(x_sd, y_sd, 'r-', lw=2, alpha=0.8, label='spectral density')

    x, y_smoothed = periodogram(X, window='hamming', window_len=wl)
    ax[i].plot(x, y_smoothed, 'k-', lw=2, label='smoothed periodogram')

    ax[i].legend()
    ax[i].set_title(f'window length = {wl}')
plt.show()

68.6.2 Exercise 2
In [6]: lp = ARMA(-0.9)
wl = 65

fig, ax = plt.subplots(3, 1, figsize=(10,12))

for i in range(3):
    X = lp.simulation(ts_length=150)
    ax[i].set_xlim(0, np.pi)

    x_sd, y_sd = lp.spectral_density(two_pi=False, res=180)
    ax[i].semilogy(x_sd, y_sd, 'r-', lw=2, alpha=0.75, label='spectral density')

    x, y_smoothed = periodogram(X, window='hamming', window_len=wl)
    ax[i].semilogy(x, y_smoothed, 'k-', lw=2, alpha=0.75, label='standard smoothed periodogram')

    x, y_ar = ar_periodogram(X, window='hamming', window_len=wl)
    ax[i].semilogy(x, y_ar, 'b-', lw=2, alpha=0.75, label='AR smoothed periodogram')

    ax[i].legend(loc='upper left')
plt.show()
69

Additive and Multiplicative Functionals

69.1 Contents

• Overview 69.2
• A Particular Additive Functional 69.3
• Dynamics 69.4
• Code 69.5
• More About the Multiplicative Martingale 69.6

Co-authors: Chase Coleman and Balint Szoke


In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

69.2 Overview

Many economic time series display persistent growth that prevents them from being asymp-
totically stationary and ergodic
For example, outputs, prices, and dividends typically display irregular but persistent growth
Asymptotic stationarity and ergodicity are key assumptions needed to make it possible to
learn by applying statistical methods
Are there ways to model time series that have persistent growth and still enable statistical
learning based on a law of large numbers for an asymptotically stationary and ergodic process?
The answer provided by Hansen and Scheinkman [60] is yes
They described two classes of time series models that accommodate growth
They are:

1. additive functionals that display random “arithmetic growth”


2. multiplicative functionals that display random “geometric growth”

These two classes of processes are closely connected


If a process {𝑦𝑡 } is an additive functional and 𝜙𝑡 = exp(𝑦𝑡 ), then {𝜙𝑡 } is a multiplicative func-
tional
Hansen and Sargent [58] (chs. 5 and 8) describe discrete time versions of additive and multi-
plicative functionals
In this lecture, we describe both additive functionals and multiplicative functionals
We also describe and compute decompositions of additive and multiplicative processes into
four components

1. a constant
2. a trend component
3. an asymptotically stationary component
4. a martingale

We describe how to construct, simulate, and interpret these components


More details about these concepts and algorithms can be found in Hansen and Sargent [58]

69.3 A Particular Additive Functional

Hansen and Sargent [58] describe a general class of additive functionals


This lecture focuses on a subclass of these: a scalar process $\{y_t\}_{t=0}^\infty$ whose increments are
driven by a Gaussian vector autoregression
Our special additive functional displays interesting time series behavior while also being easy
to construct, simulate, and analyze by using linear state-space tools
We construct our additive functional from two pieces, the first of which is a first-order vec-
tor autoregression (VAR)

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝑧𝑡+1 (1)

Here

• 𝑥𝑡 is an 𝑛 × 1 vector,
• 𝐴 is an 𝑛 × 𝑛 stable matrix (all eigenvalues lie within the open unit circle),
• 𝑧𝑡+1 ∼ 𝑁 (0, 𝐼) is an 𝑚 × 1 IID shock,
• 𝐵 is an 𝑛 × 𝑚 matrix, and
• 𝑥0 ∼ 𝑁 (𝜇0 , Σ0 ) is a random initial condition for 𝑥

The second piece is an equation that expresses increments of $\{y_t\}_{t=0}^\infty$ as linear functions of

• a scalar constant 𝜈,
• the vector 𝑥𝑡 , and
• the same Gaussian vector 𝑧𝑡+1 that appears in the VAR Eq. (1)

In particular,

𝑦𝑡+1 − 𝑦𝑡 = 𝜈 + 𝐷𝑥𝑡 + 𝐹 𝑧𝑡+1 (2)

Here 𝑦0 ∼ 𝑁 (𝜇𝑦0 , Σ𝑦0 ) is a random initial condition for 𝑦


The nonstationary random process $\{y_t\}_{t=0}^\infty$ displays systematic but random arithmetic growth

69.3.1 Linear State-Space Representation

A convenient way to represent our additive functional is to use a linear state space system
To do this, we set up state and observation vectors

$$
\hat x_t = \begin{bmatrix} 1 \\ x_t \\ y_t \end{bmatrix}
\quad \text{and} \quad
\hat y_t = \begin{bmatrix} x_t \\ y_t \end{bmatrix}
$$

Next we construct a linear system

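$$
\begin{bmatrix} 1 \\ x_{t+1} \\ y_{t+1} \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & A & 0 \\ \nu & D' & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ x_t \\ y_t \end{bmatrix} +
\begin{bmatrix} 0 \\ B \\ F' \end{bmatrix} z_{t+1}
$$

$$
\begin{bmatrix} x_t \\ y_t \end{bmatrix} =
\begin{bmatrix} 0 & I & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ x_t \\ y_t \end{bmatrix}
$$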

This can be written as

$$
\begin{aligned}
\hat x_{t+1} &= \hat A \hat x_t + \hat B z_{t+1} \\
\hat y_t &= \hat D \hat x_t
\end{aligned}
$$

which is a standard linear state space system


To study it, we could map it into an instance of LinearStateSpace from QuantEcon.py
But here we will use a different set of code for simulation, for reasons described below
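To make the stacking concrete, here is a minimal NumPy sketch that builds $\hat A$, $\hat B$ and $\hat D$ and checks that the simulated increments of $y_t$ satisfy Eq. (2). The parameter values are hypothetical and chosen only for illustration, $x_0$ and $y_0$ are set to zero for simplicity, and `D` and `F` are stored as row arrays (as the lecture's code does), so they play the role of $D'$ and $F'$ in the matrices above.

```python
import numpy as np

# Hypothetical parameter values, for illustration only
A = np.array([[0.8, 0.1],
              [0.0, 0.5]])      # stable n x n matrix
B = np.array([[0.1],
              [0.2]])           # n x m
D = np.array([[1.0, 0.5]])      # 1 x n row, so D @ x_t is a scalar
F = np.array([[0.3]])           # 1 x m
ν = 0.01
n, m = B.shape

# Stacked state is [1, x_t, y_t]
A_hat = np.block([
    [np.ones((1, 1)),    np.zeros((1, n)), np.zeros((1, 1))],
    [np.zeros((n, 1)),   A,                np.zeros((n, 1))],
    [np.full((1, 1), ν), D,                np.ones((1, 1))],
])
B_hat = np.vstack([np.zeros((1, m)), B, F])
D_hat = np.hstack([np.zeros((n + 1, 1)), np.eye(n + 1)])  # selects [x_t, y_t]

# Simulate the stacked system from the state [1, 0, 0]
rng = np.random.default_rng(0)
T = 50
z = rng.standard_normal((m, T))
state = np.zeros((n + 2, T))
state[0, 0] = 1.0
for t in range(T - 1):
    state[:, t + 1] = A_hat @ state[:, t] + B_hat @ z[:, t + 1]

obs = D_hat @ state
x, y = obs[:n], obs[n]

# Increments of y should equal ν + D x_t + F z_{t+1}
increments = y[1:] - y[:-1]
implied = ν + (D @ x[:, :-1]).ravel() + (F @ z[:, 1:]).ravel()
print(np.allclose(increments, implied))
```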

69.4 Dynamics

Let’s run some simulations to build intuition


In doing so we’ll assume that $z_{t+1}$ is scalar and that $\tilde x_t$ follows a 4th-order scalar autoregression

$$
\tilde x_{t+1} = \phi_1 \tilde x_t + \phi_2 \tilde x_{t-1} + \phi_3 \tilde x_{t-2} + \phi_4 \tilde x_{t-3} + \sigma z_{t+1} \tag{3}
$$

in which the zeros $z$ of the polynomial

$$
\phi(z) = (1 - \phi_1 z - \phi_2 z^2 - \phi_3 z^3 - \phi_4 z^4)
$$

are strictly greater than unity in absolute value


(Being a zero of 𝜙(𝑧) means that 𝜙(𝑧) = 0)
Let the increment in {𝑦𝑡 } obey

𝑦𝑡+1 − 𝑦𝑡 = 𝜈 + 𝑥𝑡̃ + 𝜎𝑧𝑡+1

with an initial condition for 𝑦0


While Eq. (3) is not a first order system like Eq. (1), we know that it can be mapped into a
first order system

• for an example of such a mapping, see this example

In fact, this whole model can be mapped into the additive functional system definition in
Eq. (1) – Eq. (2) by appropriate selection of the matrices 𝐴, 𝐵, 𝐷, 𝐹
You can try writing these matrices down now as an exercise — correct expressions appear in
the code below
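As a small check on the first-order mapping, the sketch below builds the companion matrix for the AR(4) in Eq. (3) and confirms that the condition on the zeros of $\phi(z)$ is the same as the stability condition on $A$ in Eq. (1). The coefficient values are the ones used in the example cell later in this lecture; the variable names `eigs` and `zeros` are just illustrative.

```python
import numpy as np

ϕ = np.array([0.5, -0.2, 0.0, 0.5])   # ϕ_1, ..., ϕ_4
σ = 0.01

# Companion form: the state is (x_t, x_{t-1}, x_{t-2}, x_{t-3})
A = np.zeros((4, 4))
A[0, :] = ϕ
A[1:, :-1] = np.eye(3)
B = np.array([[σ, 0, 0, 0]]).T

# Eigenvalues of A should lie strictly inside the unit circle
eigs = np.linalg.eigvals(A)

# Zeros of ϕ(z) = 1 - ϕ_1 z - ... - ϕ_4 z^4 (np.roots wants the highest power first)
zeros = np.roots(np.r_[-ϕ[::-1], 1.0])

print("max |eigenvalue| of A:", np.abs(eigs).max())
print("min |zero| of ϕ(z):   ", np.abs(zeros).min())
```

The zeros of $\phi(z)$ are the reciprocals of the eigenvalues of the companion matrix, which is why the two conditions coincide.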

69.4.1 Simulation

When simulating we embed our variables into a bigger system


This system also constructs the components of the decompositions of 𝑦𝑡 and of exp(𝑦𝑡 ) pro-
posed by Hansen and Scheinkman [60]
All of these objects are computed using the code below

In [2]: """
@authors: Chase Coleman, Balint Szoke, Tom Sargent

"""

import numpy as np
import scipy as sp
import scipy.linalg as la
import quantecon as qe
import matplotlib.pyplot as plt
from scipy.stats import norm, lognorm

class AMF_LSS_VAR:
    """
    This class transforms an additive (multiplicative)
    functional into a QuantEcon linear state space system.
    """

    def __init__(self, A, B, D, F=None, ν=None):
        # Unpack required elements
        self.nx, self.nk = B.shape
        self.A, self.B = A, B

        # checking the dimension of D (extended from the scalar case)
        if len(D.shape) > 1 and D.shape[0] != 1:
            self.nm = D.shape[0]
            self.D = D
        elif len(D.shape) > 1 and D.shape[0] == 1:
            self.nm = 1
            self.D = D
        else:
            self.nm = 1
            self.D = np.expand_dims(D, 0)

        # Create space for additive decomposition
        self.add_decomp = None
        self.mult_decomp = None

        # Set F
        if not np.any(F):
            self.F = np.zeros((self.nk, 1))
        else:
            self.F = F

        # Set ν
        if not np.any(ν):
            self.ν = np.zeros((self.nm, 1))
        elif type(ν) == float:
            self.ν = np.asarray([[ν]])
        elif len(ν.shape) == 1:
            self.ν = np.expand_dims(ν, 1)
        else:
            self.ν = ν

        if self.ν.shape[0] != self.D.shape[0]:
            raise ValueError("The dimension of ν is inconsistent with D!")

        # Construct BIG state space representation
        self.lss = self.construct_ss()

    def construct_ss(self):
        """
        This creates the state space representation that can be passed
        into the quantecon LSS class.
        """
        # Pull out useful info
        nx, nk, nm = self.nx, self.nk, self.nm
        A, B, D, F, ν = self.A, self.B, self.D, self.F, self.ν
        if self.add_decomp:
            ν, H, g = self.add_decomp
        else:
            ν, H, g = self.additive_decomp()

        # Auxiliary blocks with 0's and 1's to fill out the lss matrices
        nx0c = np.zeros((nx, 1))
        nx0r = np.zeros(nx)
        nx1 = np.ones(nx)
        nk0 = np.zeros(nk)
        ny0c = np.zeros((nm, 1))
        ny0r = np.zeros(nm)
        ny1m = np.eye(nm)
        ny0m = np.zeros((nm, nm))
        nyx0m = np.zeros_like(D)

        # Build A matrix for LSS
        # Order of states is: [1, t, xt, yt, mt]
        A1 = np.hstack([1, 0, nx0r, ny0r, ny0r])            # Transition for 1
        A2 = np.hstack([1, 1, nx0r, ny0r, ny0r])            # Transition for t
        A3 = np.hstack([nx0c, nx0c, A, nyx0m.T, nyx0m.T])   # Transition for x_{t+1}
        A4 = np.hstack([ν, ny0c, D, ny1m, ny0m])            # Transition for y_{t+1}
        A5 = np.hstack([ny0c, ny0c, nyx0m, ny0m, ny1m])     # Transition for m_{t+1}
        Abar = np.vstack([A1, A2, A3, A4, A5])

        # Build B matrix for LSS
        Bbar = np.vstack([nk0, nk0, B, F, H])

        # Build G matrix for LSS
        # Order of observation is: [xt, yt, mt, st, tt]
        G1 = np.hstack([nx0c, nx0c, np.eye(nx), nyx0m.T, nyx0m.T])  # Selector for x_{t}
        G2 = np.hstack([ny0c, ny0c, nyx0m, ny1m, ny0m])             # Selector for y_{t}
        G3 = np.hstack([ny0c, ny0c, nyx0m, ny0m, ny1m])             # Selector for martingale
        G4 = np.hstack([ny0c, ny0c, -g, ny0m, ny0m])                # Selector for stationary
        G5 = np.hstack([ny0c, ν, nyx0m, ny0m, ny0m])                # Selector for trend
        Gbar = np.vstack([G1, G2, G3, G4, G5])

        # Build H matrix for LSS
        Hbar = np.zeros((Gbar.shape[0], nk))

        # Build LSS type
        x0 = np.hstack([1, 0, nx0r, ny0r, ny0r])
        S0 = np.zeros((len(x0), len(x0)))
        lss = qe.lss.LinearStateSpace(Abar, Bbar, Gbar, Hbar, mu_0=x0, Sigma_0=S0)

        return lss

    def additive_decomp(self):
        """
        Return values for the martingale decomposition
            - ν         : unconditional mean difference in Y
            - H         : coefficient for the (linear) martingale component (κ_a)
            - g         : coefficient for the stationary component g(x)
            - Y_0       : it should be the function of X_0 (for now set it to 0.0)
        """
        I = np.identity(self.nx)
        A_res = la.solve(I - self.A, I)
        g = self.D @ A_res
        H = self.F + self.D @ A_res @ self.B

        return self.ν, H, g

    def multiplicative_decomp(self):
        """
        Return values for the multiplicative decomposition (Example 5.4.4.)
            - ν_tilde  : eigenvalue
            - H        : vector for the Jensen term
        """
        ν, H, g = self.additive_decomp()
        ν_tilde = ν + (.5)*np.expand_dims(np.diag(H @ H.T), 1)

        return ν_tilde, H, g

    def loglikelihood_path(self, x, y):
        A, B, D, F = self.A, self.B, self.D, self.F
        k, T = y.shape
        FF = F @ F.T
        FFinv = la.inv(FF)
        temp = y[:, 1:] - y[:, :-1] - D @ x[:, :-1]
        obs = temp * FFinv * temp
        obssum = np.cumsum(obs)
        scalar = (np.log(la.det(FF)) + k*np.log(2*np.pi))*np.arange(1, T)

        return -(.5)*(obssum + scalar)

    def loglikelihood(self, x, y):
        llh = self.loglikelihood_path(x, y)

        return llh[-1]

    def plot_additive(self, T, npaths=25, show_trend=True):
        """
        Plots for the additive decomposition

        """
        # Pull out right sizes so we know how to increment
        nx, nk, nm = self.nx, self.nk, self.nm

        # Allocate space (nm is the number of additive functionals - we want npaths for each)
        mpath = np.empty((nm*npaths, T))
        mbounds = np.empty((nm*2, T))
        spath = np.empty((nm*npaths, T))
        sbounds = np.empty((nm*2, T))
        tpath = np.empty((nm*npaths, T))
        ypath = np.empty((nm*npaths, T))

        # Simulate for as long as we wanted
        moment_generator = self.lss.moment_sequence()
        # Pull out population moments
        for t in range(T):
            tmoms = next(moment_generator)
            ymeans = tmoms[1]
            yvar = tmoms[3]

            # Lower and upper bounds - for each additive functional
            for ii in range(nm):
                li, ui = ii*2, (ii+1)*2
                madd_dist = norm(ymeans[nx+nm+ii], np.sqrt(yvar[nx+nm+ii, nx+nm+ii]))
                mbounds[li:ui, t] = madd_dist.ppf([0.01, .99])

                sadd_dist = norm(ymeans[nx+2*nm+ii], np.sqrt(yvar[nx+2*nm+ii, nx+2*nm+ii]))
                sbounds[li:ui, t] = sadd_dist.ppf([0.01, .99])

        # Pull out paths
        for n in range(npaths):
            x, y = self.lss.simulate(T)
            for ii in range(nm):
                ypath[npaths*ii+n, :] = y[nx+ii, :]
                mpath[npaths*ii+n, :] = y[nx+nm + ii, :]
                spath[npaths*ii+n, :] = y[nx+2*nm + ii, :]
                tpath[npaths*ii+n, :] = y[nx+3*nm + ii, :]

        add_figs = []

        for ii in range(nm):
            li, ui = npaths*(ii), npaths*(ii+1)
            LI, UI = 2*(ii), 2*(ii+1)
            add_figs.append(self.plot_given_paths(T, ypath[li:ui,:], mpath[li:ui,:], spath[li:ui,:],
                                                  tpath[li:ui,:], mbounds[LI:UI,:], sbounds[LI:UI,:],
                                                  show_trend=show_trend))

            add_figs[ii].suptitle(f'Additive decomposition of $y_{ii+1}$', fontsize=14)

        return add_figs

    def plot_multiplicative(self, T, npaths=25, show_trend=True):
        """
        Plots for the multiplicative decomposition

        """
        # Pull out right sizes so we know how to increment
        nx, nk, nm = self.nx, self.nk, self.nm
        # Matrices for the multiplicative decomposition
        ν_tilde, H, g = self.multiplicative_decomp()

        # Allocate space (nm is the number of functionals - we want npaths for each)
        mpath_mult = np.empty((nm*npaths, T))
        mbounds_mult = np.empty((nm*2, T))
        spath_mult = np.empty((nm*npaths, T))
        sbounds_mult = np.empty((nm*2, T))
        tpath_mult = np.empty((nm*npaths, T))
        ypath_mult = np.empty((nm*npaths, T))

        # Simulate for as long as we wanted
        moment_generator = self.lss.moment_sequence()
        # Pull out population moments
        for t in range(T):
            tmoms = next(moment_generator)
            ymeans = tmoms[1]
            yvar = tmoms[3]

            # Lower and upper bounds - for each multiplicative functional
            # (.item() replaces np.asscalar, which was removed from NumPy)
            for ii in range(nm):
                li, ui = ii*2, (ii+1)*2
                Mdist = lognorm(np.sqrt(yvar[nx+nm+ii, nx+nm+ii]).item(),
                                scale=np.exp(ymeans[nx+nm+ii]
                                             - t*(.5)*np.expand_dims(np.diag(H @ H.T), 1)[ii]).item())
                Sdist = lognorm(np.sqrt(yvar[nx+2*nm+ii, nx+2*nm+ii]).item(),
                                scale=np.exp(-ymeans[nx+2*nm+ii]).item())
                mbounds_mult[li:ui, t] = Mdist.ppf([.01, .99])
                sbounds_mult[li:ui, t] = Sdist.ppf([.01, .99])

        # Pull out paths
        for n in range(npaths):
            x, y = self.lss.simulate(T)
            for ii in range(nm):
                ypath_mult[npaths*ii+n, :] = np.exp(y[nx+ii, :])
                mpath_mult[npaths*ii+n, :] = np.exp(y[nx+nm + ii, :] - np.arange(T)*(.5)*np.expand_dim
                spath_mult[npaths*ii+n, :] = 1/np.exp(-y[nx+2*nm + ii, :])
                tpath_mult[npaths*ii+n, :] = np.exp(y[nx+3*nm + ii, :] + np.arange(T)*(.5)*np.expand_d

        mult_figs = []

        for ii in range(nm):
            li, ui = npaths*(ii), npaths*(ii+1)
            LI, UI = 2*(ii), 2*(ii+1)

            mult_figs.append(self.plot_given_paths(T, ypath_mult[li:ui,:], mpath_mult[li:ui,:],
                                                   spath_mult[li:ui,:], tpath_mult[li:ui,:],
                                                   mbounds_mult[LI:UI,:], sbounds_mult[LI:UI,:], 1,
                                                   show_trend=show_trend))
            mult_figs[ii].suptitle(f'Multiplicative decomposition of $y_{ii+1}$', fontsize=14)

        return mult_figs

    def plot_martingales(self, T, npaths=25):

        # Pull out right sizes so we know how to increment
        nx, nk, nm = self.nx, self.nk, self.nm
        # Matrices for the multiplicative decomposition
        ν_tilde, H, g = self.multiplicative_decomp()

        # Allocate space (nm is the number of functionals - we want npaths for each)
        mpath_mult = np.empty((nm*npaths, T))
        mbounds_mult = np.empty((nm*2, T))

        # Simulate for as long as we wanted
        moment_generator = self.lss.moment_sequence()
        # Pull out population moments
        for t in range(T):
            tmoms = next(moment_generator)
            ymeans = tmoms[1]
            yvar = tmoms[3]

            # Lower and upper bounds - for each functional
            # (.item() replaces np.asscalar, which was removed from NumPy)
            for ii in range(nm):
                li, ui = ii*2, (ii+1)*2
                Mdist = lognorm(np.sqrt(yvar[nx+nm+ii, nx+nm+ii]).item(),
                                scale=np.exp(ymeans[nx+nm+ii]
                                             - t*(.5)*np.expand_dims(np.diag(H @ H.T), 1)[ii]).item())
                mbounds_mult[li:ui, t] = Mdist.ppf([.01, .99])

        # Pull out paths
        for n in range(npaths):
            x, y = self.lss.simulate(T)
            for ii in range(nm):
                mpath_mult[npaths*ii+n, :] = np.exp(y[nx+nm + ii, :] - np.arange(T)*(.5)*np.expand_dim

        mart_figs = []

        for ii in range(nm):
            li, ui = npaths*(ii), npaths*(ii+1)
            LI, UI = 2*(ii), 2*(ii+1)
            mart_figs.append(self.plot_martingale_paths(T, mpath_mult[li:ui, :],
                                                        mbounds_mult[LI:UI, :],
                                                        horline=1))
            mart_figs[ii].suptitle(f'Martingale components for many paths of $y_{ii+1}$', fontsize=14)

        return mart_figs

    def plot_given_paths(self, T, ypath, mpath, spath, tpath,
                         mbounds, sbounds, horline=0, show_trend=True):

        # Allocate space
        trange = np.arange(T)

        # Create figure
        fig, ax = plt.subplots(2, 2, sharey=True, figsize=(15, 8))

        # Plot all paths together
        ax[0, 0].plot(trange, ypath[0, :], label="$y_t$", color="k")
        ax[0, 0].plot(trange, mpath[0, :], label="$m_t$", color="m")
        ax[0, 0].plot(trange, spath[0, :], label="$s_t$", color="g")
        if show_trend:
            ax[0, 0].plot(trange, tpath[0, :], label="$t_t$", color="r")
        ax[0, 0].axhline(horline, color="k", linestyle="-.")
        ax[0, 0].set_title("One Path of All Variables")
        ax[0, 0].legend(loc="upper left")

        # Plot Martingale Component
        ax[0, 1].plot(trange, mpath[0, :], "m")
        ax[0, 1].plot(trange, mpath.T, alpha=0.45, color="m")
        ub = mbounds[1, :]
        lb = mbounds[0, :]
        ax[0, 1].fill_between(trange, lb, ub, alpha=0.25, color="m")
        ax[0, 1].set_title("Martingale Components for Many Paths")
        ax[0, 1].axhline(horline, color="k", linestyle="-.")

        # Plot Stationary Component
        ax[1, 0].plot(spath[0, :], color="g")
        ax[1, 0].plot(spath.T, alpha=0.25, color="g")
        ub = sbounds[1, :]
        lb = sbounds[0, :]
        ax[1, 0].fill_between(trange, lb, ub, alpha=0.25, color="g")
        ax[1, 0].axhline(horline, color="k", linestyle="-.")
        ax[1, 0].set_title("Stationary Components for Many Paths")

        # Plot Trend Component
        if show_trend:
            ax[1, 1].plot(tpath.T, color="r")
        ax[1, 1].set_title("Trend Components for Many Paths")
        ax[1, 1].axhline(horline, color="k", linestyle="-.")

        return fig

    def plot_martingale_paths(self, T, mpath, mbounds,
                              horline=1, show_trend=False):
        # Allocate space
        trange = np.arange(T)

        # Create figure
        fig, ax = plt.subplots(1, 1, figsize=(10, 6))

        # Plot Martingale Component
        ub = mbounds[1, :]
        lb = mbounds[0, :]
        ax.fill_between(trange, lb, ub, color="#ffccff")
        ax.axhline(horline, color="k", linestyle="-.")
        ax.plot(trange, mpath.T, linewidth=0.25, color="#4c4c4c")

        return fig

For now, we just plot 𝑦𝑡 and 𝑥𝑡 , postponing until later a description of exactly how we com-
pute them

In [3]: ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5
σ = 0.01
ν = 0.01   # Growth rate

# A matrix should be n x n
A = np.array([[ϕ_1, ϕ_2, ϕ_3, ϕ_4],
              [  1,   0,   0,   0],
              [  0,   1,   0,   0],
              [  0,   0,   1,   0]])

# B matrix should be n x k
B = np.array([[σ, 0, 0, 0]]).T

D = np.array([1, 0, 0, 0]) @ A
F = np.array([1, 0, 0, 0]) @ B

amf = AMF_LSS_VAR(A, B, D, F, ν=ν)

T = 150
x, y = amf.lss.simulate(T)

fig, ax = plt.subplots(2, 1, figsize=(10, 9))

ax[0].plot(np.arange(T), y[amf.nx, :], color='k')
ax[0].set_title('Path of $y_t$')
ax[1].plot(np.arange(T), y[0, :], color='g')
ax[1].axhline(0, color='k', linestyle='-.')
ax[1].set_title('Associated path of $x_t$')
plt.show()

Notice the irregular but persistent growth in 𝑦𝑡



69.4.2 Decomposition

Hansen and Sargent [58] describe how to construct a decomposition of an additive functional
into four parts:

• a constant inherited from initial values 𝑥0 and 𝑦0


• a linear trend
• a martingale
• an (asymptotically) stationary component

To attain this decomposition for the particular class of additive functionals defined by Eq. (1)
and Eq. (2), we first construct the matrices

$$
\begin{aligned}
H &:= F + B'(I - A')^{-1} D \\
g &:= D'(I - A)^{-1}
\end{aligned}
$$

Then the Hansen-Scheinkman [60] decomposition is

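$$
y_t = \underbrace{t\nu}_{\text{trend component}}
+ \overbrace{\sum_{j=1}^t H z_j}^{\text{Martingale component}}
- \underbrace{g x_t}_{\text{stationary component}}
+ \overbrace{g x_0 + y_0}^{\text{initial conditions}}
$$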

At this stage, you should pause and verify that 𝑦𝑡+1 − 𝑦𝑡 satisfies Eq. (2)
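One way to carry out this verification numerically is sketched below. The parameter values are hypothetical, and `D` and `F` are stored as row arrays, following the convention of the `additive_decomp` method in the code above, so that `g = D (I - A)^{-1}` and `H = F + D (I - A)^{-1} B`. The sketch simulates Eq. (1) and Eq. (2) directly and compares the simulated path of $y_t$ with the sum of the four components.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter values, for illustration only
A = np.array([[0.8, 0.1],
              [0.0, 0.5]])
B = np.array([[0.1],
              [0.2]])
D = np.array([[1.0, 0.5]])
F = np.array([[0.3]])
ν = 0.01

# Decomposition coefficients, computed as in additive_decomp
A_res = np.linalg.inv(np.eye(2) - A)
g = D @ A_res
H = F + D @ A_res @ B

# Simulate x_t and y_t directly from Eq. (1) and Eq. (2)
T = 200
z = rng.standard_normal((1, T + 1))   # z[:, 0] is never used
x = np.zeros((2, T + 1))
y = np.zeros(T + 1)
x[:, 0] = rng.standard_normal(2)
y[0] = 1.0
for t in range(T):
    x[:, t + 1] = A @ x[:, t] + B @ z[:, t + 1]
    y[t + 1] = y[t] + ν + (D @ x[:, t]).item() + (F @ z[:, t + 1]).item()

# Rebuild y_t from the four components
trend = ν * np.arange(T + 1)
martingale = np.concatenate([[0.0], np.cumsum((H @ z[:, 1:]).ravel())])
stationary = -(g @ x).ravel()
constant = (g @ x[:, 0]).item() + y[0]
y_decomp = trend + martingale + stationary + constant

print(np.allclose(y, y_decomp))
```

The check works because $g(I - A) = D$ and $H - gB = F$, so differencing the decomposition returns $\nu + D x_t + F z_{t+1}$.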
It is convenient for us to introduce the following notation:

• $\tau_t = \nu t$, a linear, deterministic trend
• $m_t = \sum_{j=1}^t H z_j$, a martingale with time $t+1$ increment $H z_{t+1}$
• $s_t = g x_t$, an (asymptotically) stationary component

We want to characterize and simulate components 𝜏𝑡 , 𝑚𝑡 , 𝑠𝑡 of the decomposition


A convenient way to do this is to construct an appropriate instance of a linear state space
system by using LinearStateSpace from QuantEcon.py
This will allow us to use the routines in LinearStateSpace to study dynamics
To start, observe that, under the dynamics in Eq. (1) and Eq. (2) and with the definitions
just given,

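$$
\begin{bmatrix} 1 \\ t+1 \\ x_{t+1} \\ y_{t+1} \\ m_{t+1} \end{bmatrix} =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
0 & 0 & A & 0 & 0 \\
\nu & 0 & D' & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} 1 \\ t \\ x_t \\ y_t \\ m_t \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ B \\ F' \\ H' \end{bmatrix} z_{t+1}
$$

and

$$
\begin{bmatrix} x_t \\ y_t \\ \tau_t \\ m_t \\ s_t \end{bmatrix} =
\begin{bmatrix}
0 & 0 & I & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & \nu & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & -g & 0 & 0
\end{bmatrix}
\begin{bmatrix} 1 \\ t \\ x_t \\ y_t \\ m_t \end{bmatrix}
$$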

With

$$
\tilde x := \begin{bmatrix} 1 \\ t \\ x_t \\ y_t \\ m_t \end{bmatrix}
\quad \text{and} \quad
\tilde y := \begin{bmatrix} x_t \\ y_t \\ \tau_t \\ m_t \\ s_t \end{bmatrix}
$$

we can write this as the linear state space system

$$
\begin{aligned}
\tilde x_{t+1} &= \tilde A \tilde x_t + \tilde B z_{t+1} \\
\tilde y_t &= \tilde D \tilde x_t
\end{aligned}
$$

By picking out components of 𝑦𝑡̃ , we can track all variables of interest

69.5 Code

The class AMF_LSS_VAR mentioned above does all that we want to study our additive
functional
In fact, AMF_LSS_VAR does more because it allows us to study an associated multiplicative
functional as well
(A hint that it does more is the name of the class – here AMF stands for “additive and mul-
tiplicative functional” – the code computes and displays objects associated with multiplicative
functionals too)
Let’s use this code (embedded above) to explore the example process described above
If you run the code that first simulated that example again and then the method call you will
generate (modulo randomness) the plot

In [4]: amf.plot_additive(T)
plt.show()


When we plot multiple realizations of a component in the 2nd, 3rd, and 4th panels, we also
plot the population 95% probability coverage sets computed using the LinearStateSpace class
We have chosen to simulate many paths, all starting from the same non-random initial condi-
tions 𝑥0 , 𝑦0 (you can tell this from the shape of the 95% probability coverage shaded areas)
Notice tell-tale signs of these probability coverage shaded areas

• the purple one for the martingale component 𝑚𝑡 grows with 𝑡
• the green one for the stationary component 𝑠𝑡 converges to a constant band

69.5.1 Associated Multiplicative Functional

Where {𝑦𝑡 } is our additive functional, let 𝑀𝑡 = exp(𝑦𝑡 )


As mentioned above, the process {𝑀𝑡 } is called a multiplicative functional
Corresponding to the additive decomposition described above we have a multiplicative decomposition of $M_t$

$$
\frac{M_t}{M_0} = \exp(t\nu) \exp\Bigl(\sum_{j=1}^t H \cdot z_j\Bigr)
\exp\Bigl(D'(I - A)^{-1} x_0 - D'(I - A)^{-1} x_t\Bigr)
$$

or

$$
\frac{M_t}{M_0} = \exp(\tilde\nu t)
\Bigl(\frac{\tilde M_t}{\tilde M_0}\Bigr)
\Bigl(\frac{\tilde e(x_0)}{\tilde e(x_t)}\Bigr)
$$

where

$$
\tilde\nu = \nu + \frac{H \cdot H}{2}, \quad
\tilde M_t = \exp\Bigl(\sum_{j=1}^t \Bigl(H \cdot z_j - \frac{H \cdot H}{2}\Bigr)\Bigr), \quad
\tilde M_0 = 1
$$

and

$$
\tilde e(x) = \exp[g(x)] = \exp[D'(I - A)^{-1} x]
$$

An instance of class AMF_LSS_VAR includes this associated multiplicative functional as an
attribute
Let’s plot this multiplicative functional for our example
If you run the code that first simulated that example again and then the method call in the
cell below you’ll obtain the graph in the next cell

In [5]: amf.plot_multiplicative(T)
plt.show()

As before, when we plotted multiple realizations of a component in the 2nd, 3rd, and 4th
panels, we also plotted population 95% confidence bands computed using the LinearStateSpace
class
Comparing this figure and the last also helps show how geometric growth differs from arith-
metic growth
The top right panel of the above graph shows a panel of martingales associated with the
panel of 𝑀𝑡 = exp(𝑦𝑡 ) that we have generated for a limited horizon 𝑇
It is interesting to see how the martingale behaves as 𝑇 → +∞
Let’s see what happens when we set 𝑇 = 12000 instead of 150

69.5.2 Peculiar Large Sample Property

Hansen and Sargent [58] (ch. 8) note that the martingale component $\tilde M_t$ of the multiplicative
decomposition has the following properties

• While $E_0 \tilde M_t = 1$ for all $t \geq 0$, nevertheless …
• As $t \to +\infty$, $\tilde M_t$ converges to zero almost surely

The first property follows from $\tilde M_t$ being a multiplicative martingale with initial condition
$\tilde M_0 = 1$
The second is the peculiar property noted and proved by Hansen and Sargent [58]
The following simulation of many paths of $\tilde M_t$ illustrates both properties

In [6]: np.random.seed(10021987)
amf.plot_martingales(12000)
plt.show()

The dotted line in the above graph is the mean $E \tilde M_t = 1$ of the martingale
It remains constant at unity, illustrating the first property
The purple 95 percent coverage interval collapses around zero, illustrating the second property

69.6 More About the Multiplicative Martingale

Let’s drill down and study the probability distribution of the multiplicative martingale
$\{\tilde M_t\}_{t=0}^\infty$ in more detail
As we have seen, it has representation

$$
\tilde M_t = \exp\Bigl(\sum_{j=1}^t \Bigl(H \cdot z_j - \frac{H \cdot H}{2}\Bigr)\Bigr), \quad
\tilde M_0 = 1
$$

where $H = [F + B'(I - A')^{-1} D]$
It follows that $\log \tilde M_t \sim \mathcal{N}\bigl(-\frac{t H \cdot H}{2}, \, t H \cdot H\bigr)$ and that consequently $\tilde M_t$ is log normal
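The log normal result already explains both properties of the martingale in closed form. The sketch below is an illustration rather than part of the lecture's code: it uses the scalar parameter values from the simulation later in this section, and the threshold 0.01 and the variable names are arbitrary choices. It shows that the mean stays at one while the median and the bulk of the probability mass move toward zero as $t$ grows.

```python
import numpy as np
from scipy.stats import norm

# Scalar example values used later in this section
A, B, D, F = 0.8, 0.001, 1.0, 0.01
H = F + D * B / (1 - A)

times = (100, 1_000, 10_000, 100_000)
means, medians, probs = [], [], []
for t in times:
    mean_log, var_log = -t * H**2 / 2, t * H**2
    # Mean of a log normal variable: exp(mean_log + var_log / 2)
    means.append(np.exp(mean_log + var_log / 2))
    # Median of a log normal variable: exp(mean_log)
    medians.append(np.exp(mean_log))
    # P(M_t <= 0.01) = P(log M_t <= log 0.01)
    probs.append(norm.cdf((np.log(0.01) - mean_log) / np.sqrt(var_log)))

for t, mu, med, p in zip(times, means, medians, probs):
    print(f"t = {t:>6}: mean = {mu:.3f}, median = {med:.3g}, P(M_t <= 0.01) = {p:.3g}")
```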

69.6.1 Simulating a Multiplicative Martingale Again

Next, we want a program to simulate the likelihood ratio process $\{\tilde M_t\}_{t=0}^\infty$
In particular, we want to simulate 5000 sample paths of length $T$ for the case in which $x$ is a
scalar and $[A, B, D, F] = [0.8, 0.001, 1.0, 0.01]$ and $\nu = 0.005$
After accomplishing this, we want to display and stare at histograms of $\tilde M_T^i$ for various values
of $T$
Here is code that accomplishes these tasks

69.6.2 Sample Paths

Let’s write a program to simulate sample paths of $\{x_t, y_t\}_{t=0}^\infty$

We’ll do this by formulating the additive functional as a linear state space model and putting
the LinearStateSpace class to work

In [7]: """

@authors: Chase Coleman, Balint Szoke, Tom Sargent

"""

import numpy as np
import scipy as sp
import scipy.linalg as la
import quantecon as qe
import matplotlib.pyplot as plt
from scipy.stats import norm, lognorm

class AMF_LSS_VAR:
    """
    This class is written to transform a scalar additive functional
    into a linear state space system.
    """
    def __init__(self, A, B, D, F=0.0, ν=0.0):
        # Unpack required elements
        self.A, self.B, self.D, self.F, self.ν = A, B, D, F, ν

        # Create space for additive decomposition
        self.add_decomp = None
        self.mult_decomp = None

        # Construct BIG state space representation
        self.lss = self.construct_ss()

    def construct_ss(self):
        """
        This creates the state space representation that can be passed
        into the quantecon LSS class.
        """
        # Pull out useful info
        A, B, D, F, ν = self.A, self.B, self.D, self.F, self.ν
        nx, nk, nm = 1, 1, 1
        if self.add_decomp:
            ν, H, g = self.add_decomp
        else:
            ν, H, g = self.additive_decomp()

        # Build A matrix for LSS
        # Order of states is: [1, t, xt, yt, mt]
        A1 = np.hstack([1, 0, 0, 0, 0])   # Transition for 1
        A2 = np.hstack([1, 1, 0, 0, 0])   # Transition for t
        A3 = np.hstack([0, 0, A, 0, 0])   # Transition for x_{t+1}
        A4 = np.hstack([ν, 0, D, 1, 0])   # Transition for y_{t+1}
        A5 = np.hstack([0, 0, 0, 0, 1])   # Transition for m_{t+1}
        Abar = np.vstack([A1, A2, A3, A4, A5])

        # Build B matrix for LSS
        Bbar = np.vstack([0, 0, B, F, H])

        # Build G matrix for LSS
        # Order of observation is: [xt, yt, mt, st, tt]
        G1 = np.hstack([0, 0, 1, 0, 0])   # Selector for x_{t}
        G2 = np.hstack([0, 0, 0, 1, 0])   # Selector for y_{t}
        G3 = np.hstack([0, 0, 0, 0, 1])   # Selector for martingale
        G4 = np.hstack([0, 0, -g, 0, 0])  # Selector for stationary
        G5 = np.hstack([0, ν, 0, 0, 0])   # Selector for trend
        Gbar = np.vstack([G1, G2, G3, G4, G5])

        # Build H matrix for LSS
        Hbar = np.zeros((1, 1))

        # Build LSS type
        x0 = np.hstack([1, 0, 0, 0, 0])
        S0 = np.zeros((5, 5))
        lss = qe.lss.LinearStateSpace(Abar, Bbar, Gbar, Hbar, mu_0=x0, Sigma_0=S0)

        return lss

    def additive_decomp(self):
        """
        Return values for the martingale decomposition (Proposition 4.3.3.)
            - ν         : unconditional mean difference in Y
            - H         : coefficient for the (linear) martingale component (kappa_a)
            - g         : coefficient for the stationary component g(x)
            - Y_0       : it should be the function of X_0 (for now set it to 0.0)
        """
        A_res = 1 / (1 - self.A)
        g = self.D * A_res
        H = self.F + self.D * A_res * self.B

        return self.ν, H, g

    def multiplicative_decomp(self):
        """
        Return values for the multiplicative decomposition (Example 5.4.4.)
            - ν_tilde  : eigenvalue
            - H        : vector for the Jensen term
        """
        ν, H, g = self.additive_decomp()
        ν_tilde = ν + (.5) * H**2

        return ν_tilde, H, g

    def loglikelihood_path(self, x, y):
        A, B, D, F = self.A, self.B, self.D, self.F
        T = y.T.size
        FF = F**2
        FFinv = 1 / FF
        temp = y[1:] - y[:-1] - D * x[:-1]
        obs = temp * FFinv * temp
        obssum = np.cumsum(obs)
        scalar = (np.log(FF) + np.log(2 * np.pi)) * np.arange(1, T)

        return (-0.5) * (obssum + scalar)

    def loglikelihood(self, x, y):
        llh = self.loglikelihood_path(x, y)

        return llh[-1]

The heavy lifting is done inside the AMF_LSS_VAR class


The following code adds some simple functions that make it straightforward to generate sam-
ple paths from an instance of AMF_LSS_VAR

In [8]: def simulate_xy(amf, T):
    "Simulate individual paths."
    foo, bar = amf.lss.simulate(T)
    x = bar[0, :]
    y = bar[1, :]

    return x, y

def simulate_paths(amf, T=150, I=5000):
    "Simulate multiple independent paths."

    # Allocate space
    storeX = np.empty((I, T))
    storeY = np.empty((I, T))

    for i in range(I):
        # Do specific simulation
        x, y = simulate_xy(amf, T)

        # Fill in our storage matrices
        storeX[i, :] = x
        storeY[i, :] = y

    return storeX, storeY

def population_means(amf, T=150):
    # Allocate Space
    xmean = np.empty(T)
    ymean = np.empty(T)

    # Pull out moment generator
    moment_generator = amf.lss.moment_sequence()

    for tt in range(T):
        tmoms = next(moment_generator)
        ymeans = tmoms[1]
        xmean[tt] = ymeans[0]
        ymean[tt] = ymeans[1]

    return xmean, ymean

Now that we have these functions in our tool kit, let’s apply them to run some simulations

In [9]: def simulate_martingale_components(amf, T=1000, I=5000):
    # Get the multiplicative decomposition
    ν, H, g = amf.multiplicative_decomp()

    # Allocate space
    add_mart_comp = np.empty((I, T))

    # Simulate and pull out additive martingale component
    for i in range(I):
        foo, bar = amf.lss.simulate(T)

        # Martingale component is third component
        add_mart_comp[i, :] = bar[2, :]

    mul_mart_comp = np.exp(add_mart_comp - (np.arange(T) * H**2) / 2)

    return add_mart_comp, mul_mart_comp

# Build model
amf_2 = AMF_LSS_VAR(0.8, 0.001, 1.0, 0.01,.005)

amc, mmc = simulate_martingale_components(amf_2, 1000, 5000)

amcT = amc[:, -1]
mmcT = mmc[:, -1]

print("The (min, mean, max) of additive Martingale component in period T is")
print(f"\t ({np.min(amcT)}, {np.mean(amcT)}, {np.max(amcT)})")

print("The (min, mean, max) of multiplicative Martingale component in period T is")
print(f"\t ({np.min(mmcT)}, {np.mean(mmcT)}, {np.max(mmcT)})")

The (min, mean, max) of additive Martingale component in period T is


(-1.8379907335579106, 0.011040789361757435, 1.4697384727035145)
The (min, mean, max) of multiplicative Martingale component in period T is
(0.14222026893384476, 1.006753060146832, 3.8858858377907133)

Let’s plot the probability density functions for $\log \tilde{M}_t$ for $t = 100, 500, 1000, 10000, 100000$
Then let’s use the plots to investigate how these densities evolve through time
We will plot the densities of $\log \tilde{M}_t$ for different values of $t$
Note: scipy.stats.lognorm expects you to pass the standard deviation first ($\sqrt{t H \cdot H}$) and
then the exponent of the mean as a keyword argument scale (scale=np.exp(-t * H2 / 2))

• See the documentation here

This is peculiar, so make sure you are careful in working with the log normal distribution
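The parameterization can be checked on a small example. The sketch below is only an illustration: the values of `t` and `H` are assumptions chosen for convenience, not values taken from the model above. It compares the log normal density with the one implied by the normal density of the logarithm, and reports the mean and median.

```python
import numpy as np
from scipy.stats import lognorm, norm

t, H = 100, 0.1              # illustrative values (assumed, not from the model)
mu = -t * H**2 / 2           # mean of log M_t
sigma = np.sqrt(t * H**2)    # standard deviation of log M_t

mdist = lognorm(sigma, scale=np.exp(mu))   # distribution of M_t
lmdist = norm(mu, sigma)                   # distribution of log M_t

# Change of variables: f_M(x) = f_logM(log x) / x
x = 0.7
pdf_direct = mdist.pdf(x)
pdf_via_normal = lmdist.pdf(np.log(x)) / x

mean_M = mdist.mean()        # equals exp(mu + sigma**2 / 2)
median_M = mdist.median()    # equals exp(mu)
print(pdf_direct, pdf_via_normal, mean_M, median_M)
```

With the mean of the logarithm set to minus half its variance, the mean of the level stays at one while the median lies below one.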
Here is some code that tackles these tasks

In [10]: def Mtilde_t_density(amf, t, xmin=1e-8, xmax=5.0, npts=5000):

    # Pull out the multiplicative decomposition
    νtilde, H, g = amf.multiplicative_decomp()
    H2 = H * H

    # The distribution
    mdist = lognorm(np.sqrt(t * H2), scale=np.exp(-t * H2 / 2))
    x = np.linspace(xmin, xmax, npts)
    pdf = mdist.pdf(x)

    return x, pdf

def logMtilde_t_density(amf, t, xmin=-15.0, xmax=15.0, npts=5000):

    # Pull out the multiplicative decomposition
    νtilde, H, g = amf.multiplicative_decomp()
    H2 = H * H

    # The distribution
    lmdist = norm(-t * H2 / 2, np.sqrt(t * H2))
    x = np.linspace(xmin, xmax, npts)
    pdf = lmdist.pdf(x)

    return x, pdf

times_to_plot = [10, 100, 500, 1000, 2500, 5000]
dens_to_plot = map(lambda t: Mtilde_t_density(amf_2, t, xmin=1e-8, xmax=6.0), times_to_plot)
ldens_to_plot = map(lambda t: logMtilde_t_density(amf_2, t, xmin=-10.0, xmax=10.0), times_to_plot)

fig, ax = plt.subplots(3, 2, figsize=(8, 14))
ax = ax.flatten()

fig.suptitle(r"Densities of $\tilde{M}_t$", fontsize=18, y=1.02)
for (it, dens_t) in enumerate(dens_to_plot):
    x, pdf = dens_t
    ax[it].set_title(f"Density for time {times_to_plot[it]}")
    ax[it].fill_between(x, np.zeros_like(pdf), pdf)

plt.tight_layout()
plt.show()

These probability density functions help us understand mechanics underlying the peculiar
property of our multiplicative martingale

• As $T$ grows, most of the probability mass shifts leftward toward zero
• For example, note that most mass is near 1 for $T = 10$ or $T = 100$ but most of it is near 0 for $T = 5000$
• As $T$ grows, the tail of the density of $\tilde{M}_T$ lengthens toward the right
• Enough mass moves toward the right tail to keep $E \tilde{M}_T = 1$ even as most mass in the distribution of $\tilde{M}_T$ collapses around 0
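These three facts can be put into numbers using only the log normal formulas. The following is a minimal sketch that assumes $\log \tilde{M}_t \sim N(-tH^2/2, tH^2)$ with a scalar $H$; the value `H = 0.05` and the threshold `eps = 0.1` are illustrative assumptions, and `lognormal_summary` is a hypothetical helper name.

```python
from math import erf, exp, log, sqrt

def normal_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def lognormal_summary(t, H=0.05, eps=0.1):
    """Mean, median and P(M_t < eps) when log M_t ~ N(-t H^2 / 2, t H^2)."""
    mu = -t * H**2 / 2
    sigma = sqrt(t) * H
    mean = exp(mu + sigma**2 / 2)     # stays at one for every t
    median = exp(mu)                  # shrinks toward zero as t grows
    prob_small = normal_cdf((log(eps) - mu) / sigma)
    return mean, median, prob_small

summaries = {t: lognormal_summary(t) for t in (10, 100, 5000)}
for t, (mean, median, prob_small) in summaries.items():
    print(f"t = {t:5d}: mean = {mean:.3f}, median = {median:.4f}, "
          f"P(M < 0.1) = {prob_small:.3f}")
```

The mean is pinned at one, while the median falls and the probability of being close to zero rises as $t$ grows.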

69.6.3 Multiplicative Martingale as Likelihood Ratio Process

A forthcoming lecture studies likelihood processes and likelihood ratio processes


A likelihood ratio process is defined as a multiplicative martingale with mean unity
Likelihood ratio processes exhibit the peculiar property discussed here
We’ll discuss how to interpret that property in the forthcoming lecture
70

Classical Control with Linear Algebra

70.1 Contents

• Overview 70.2

• A Control Problem 70.3

• Finite Horizon Theory 70.4

• The Infinite Horizon Limit 70.5

• Undiscounted Problems 70.6

• Implementation 70.7

• Exercises 70.8

70.2 Overview

In an earlier lecture, Linear Quadratic Dynamic Programming Problems, we studied
how to solve a special class of dynamic optimization and prediction problems by applying the
method of dynamic programming. In this class of problems

• the objective function is quadratic in states and controls


• the one-step transition function is linear
• shocks are IID Gaussian or martingale differences

In this lecture and a companion lecture Classical Filtering with Linear Algebra, we study the
classical theory of linear-quadratic (LQ) optimal control problems.
The classical approach does not use the two closely related methods, dynamic programming
and Kalman filtering, that we describe in other lectures, namely, Linear Quadratic Dynamic
Programming Problems and A First Look at the Kalman Filter
Instead, it uses either


• 𝑧-transform and lag operator methods, or


• matrix decompositions applied to linear systems of first-order conditions for
optimum problems.

In this lecture and the sequel Classical Filtering with Linear Algebra, we mostly rely on elementary linear algebra
The main tool from linear algebra we’ll put to work here is LU decomposition
We’ll begin with finite horizon problems
Then we’ll view infinite horizon problems as appropriate limits of these finite horizon problems
Later, we will examine the close connection between LQ control and least-squares prediction
and filtering problems
These classes of problems are connected in the sense that to solve each, essentially the same
mathematics is used

70.2.1 References

Useful references include [133], [55], [101], [9], and [98]

70.3 A Control Problem

Let $L$ be the lag operator, so that, for sequence $\{x_t\}$ we have $L x_t = x_{t-1}$
More generally, let $L^k x_t = x_{t-k}$ with $L^0 x_t = x_t$ and

$$ d(L) = d_0 + d_1 L + \ldots + d_m L^m $$

where $d_0, d_1, \ldots, d_m$ is a given scalar sequence
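To make the notation concrete, here is a small sketch of how $d(L)$ acts on a finite sequence. The function name `apply_lag_polynomial` and its argument layout are hypothetical choices made for this illustration.

```python
import numpy as np

def apply_lag_polynomial(d, y, y_init):
    """
    Compute d(L) y_t = d_0 y_t + d_1 y_{t-1} + ... + d_m y_{t-m} for t = 0, ..., N.

    d      : coefficients [d_0, d_1, ..., d_m]
    y      : values [y_0, ..., y_N]
    y_init : initial conditions [y_{-1}, ..., y_{-m}]
    """
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(d) - 1
    # Full history in time order: y_{-m}, ..., y_{-1}, y_0, ..., y_N
    full = np.concatenate([np.asarray(y_init, dtype=float)[::-1], y])
    out = np.empty(len(y))
    for t in range(len(y)):
        # y_t sits at position t + m of `full`
        out[t] = sum(d[j] * full[t + m - j] for j in range(m + 1))
    return out

# d(L) = 1 - L is the first-difference operator
dy = apply_lag_polynomial([1.0, -1.0], y=[1.0, 3.0, 6.0], y_init=[0.0])
print(dy)
```

With $d(L) = 1 - L$ the output is the sequence of first differences $y_t - y_{t-1}$.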


Consider the discrete-time control problem

$$ \max_{\{y_t\}} \lim_{N \to \infty} \sum_{t=0}^{N} \beta^t \left\{ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} [d(L) y_t]^2 \right\}, \tag{1} $$

where

• ℎ is a positive parameter and 𝛽 ∈ (0, 1) is a discount factor


• $\{a_t\}_{t \geq 0}$ is a sequence of exponential order less than $\beta^{-1/2}$, by which we mean $\lim_{t \to \infty} \beta^{t/2} a_t = 0$

Maximization in Eq. (1) is subject to initial conditions for 𝑦−1 , 𝑦−2 … , 𝑦−𝑚
Maximization is over infinite sequences {𝑦𝑡 }𝑡≥0

70.3.1 Example

The formulation of the LQ problem given above is broad enough to encompass many useful
models
As a simple illustration, recall that in LQ Dynamic Programming Problems we consider a
monopolist facing stochastic demand shocks and adjustment costs
Let’s consider a deterministic version of this problem, where the monopolist maximizes the
discounted sum


$$ \sum_{t=0}^{\infty} \beta^t \pi_t $$

and

$$ \pi_t = p_t q_t - c q_t - \gamma (q_{t+1} - q_t)^2 \quad \text{with} \quad p_t = \alpha_0 - \alpha_1 q_t + d_t $$

In this expression, $q_t$ is output, $c$ is average cost of production, and $d_t$ is a demand shock
The term $\gamma (q_{t+1} - q_t)^2$ represents adjustment costs
You will be able to confirm that the objective function can be rewritten as Eq. (1) when

• $a_t := \alpha_0 + d_t - c$
• $h := 2 \alpha_1$
• $d(L) := \sqrt{2 \gamma} (I - L)$

Further examples of this problem for factor demand, economic growth, and government policy
problems are given in ch. IX of [118]

70.4 Finite Horizon Theory

We first study a finite 𝑁 version of the problem


Later we will study an infinite horizon problem solution as a limiting version of a finite horizon problem
(This will require being careful because the limits as $N \to \infty$ of the necessary and sufficient conditions for maximizing finite $N$ versions of Eq. (1) are not sufficient for maximizing Eq. (1))
We begin by

1. fixing 𝑁 > 𝑚,
2. differentiating the finite version of Eq. (1) with respect to 𝑦0 , 𝑦1 , … , 𝑦𝑁 , and
3. setting these derivatives to zero

For 𝑡 = 0, … , 𝑁 − 𝑚 these first-order necessary conditions are the Euler equations


For 𝑡 = 𝑁 − 𝑚 + 1, … , 𝑁 , the first-order conditions are a set of terminal conditions
Consider the term

$$
\begin{aligned}
J &= \sum_{t=0}^{N} \beta^t [d(L) y_t][d(L) y_t] \\
  &= \sum_{t=0}^{N} \beta^t (d_0 y_t + d_1 y_{t-1} + \cdots + d_m y_{t-m}) (d_0 y_t + d_1 y_{t-1} + \cdots + d_m y_{t-m})
\end{aligned}
$$

Differentiating 𝐽 with respect to 𝑦𝑡 for 𝑡 = 0, 1, … , 𝑁 − 𝑚 gives

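$$
\begin{aligned}
\frac{\partial J}{\partial y_t} &= 2 \beta^t d_0 d(L) y_t + 2 \beta^{t+1} d_1 d(L) y_{t+1} + \cdots + 2 \beta^{t+m} d_m d(L) y_{t+m} \\
  &= 2 \beta^t (d_0 + d_1 \beta L^{-1} + d_2 \beta^2 L^{-2} + \cdots + d_m \beta^m L^{-m}) d(L) y_t
\end{aligned}
$$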

We can write this more succinctly as

$$ \frac{\partial J}{\partial y_t} = 2 \beta^t d(\beta L^{-1}) d(L) y_t \tag{2} $$

Differentiating 𝐽 with respect to 𝑦𝑡 for 𝑡 = 𝑁 − 𝑚 + 1, … , 𝑁 gives

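$$
\begin{aligned}
\frac{\partial J}{\partial y_N} &= 2 \beta^N d_0 d(L) y_N \\
\frac{\partial J}{\partial y_{N-1}} &= 2 \beta^{N-1} [d_0 + \beta d_1 L^{-1}] d(L) y_{N-1} \\
\vdots \quad & \quad \vdots \\
\frac{\partial J}{\partial y_{N-m+1}} &= 2 \beta^{N-m+1} [d_0 + \beta L^{-1} d_1 + \cdots + \beta^{m-1} L^{-m+1} d_{m-1}] d(L) y_{N-m+1}
\end{aligned}
\tag{3}
$$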

With these preliminaries under our belts, we are ready to differentiate Eq. (1)
Differentiating Eq. (1) with respect to 𝑦𝑡 for 𝑡 = 0, … , 𝑁 − 𝑚 gives the Euler equations

$$ [h + d(\beta L^{-1}) d(L)] y_t = a_t, \quad t = 0, 1, \ldots, N - m \tag{4} $$

The system of equations Eq. (4) forms a linear difference equation of order $2m$ that must hold
for the values of $t$ indicated.
Differentiating Eq. (1) with respect to $y_t$ for $t = N - m + 1, \ldots, N$ gives the terminal conditions

$$
\begin{aligned}
\beta^N (a_N - h y_N - d_0 d(L) y_N) &= 0 \\
\beta^{N-1} (a_{N-1} - h y_{N-1} - (d_0 + \beta d_1 L^{-1}) d(L) y_{N-1}) &= 0 \\
\vdots \quad & \quad \vdots \\
\beta^{N-m+1} (a_{N-m+1} - h y_{N-m+1} - (d_0 + \beta L^{-1} d_1 + \cdots + \beta^{m-1} L^{-m+1} d_{m-1}) d(L) y_{N-m+1}) &= 0
\end{aligned}
\tag{5}
$$
In the finite $N$ problem, we want simultaneously to solve Eq. (4) subject to the $m$ initial conditions $y_{-1}, \ldots, y_{-m}$ and the $m$ terminal conditions Eq. (5)

These conditions uniquely pin down the solution of the finite 𝑁 problem
That is, for the finite 𝑁 problem, conditions Eq. (4) and Eq. (5) are necessary and sufficient
for a maximum, by concavity of the objective function
Next, we describe how to obtain the solution using matrix methods

70.4.1 Matrix Methods

Let’s look at how linear algebra can be used to tackle and shed light on the finite horizon LQ
control problem
A Single Lag Term
Let’s begin with the special case in which 𝑚 = 1
We want to solve the system of 𝑁 + 1 linear equations

$$
\begin{aligned}
[h + d(\beta L^{-1}) d(L)] y_t &= a_t, \quad t = 0, 1, \ldots, N - 1 \\
\beta^N [a_N - h y_N - d_0 d(L) y_N] &= 0
\end{aligned}
\tag{6}
$$

where 𝑑(𝐿) = 𝑑0 + 𝑑1 𝐿
These equations are to be solved for 𝑦0 , 𝑦1 , … , 𝑦𝑁 as functions of 𝑎0 , 𝑎1 , … , 𝑎𝑁 and 𝑦−1
Let

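$$ \phi(L) = \phi_0 + \phi_1 L + \beta \phi_1 L^{-1} = h + d(\beta L^{-1}) d(L) = (h + d_0^2 + \beta d_1^2) + d_1 d_0 L + d_1 d_0 \beta L^{-1} $$

Then we can represent Eq. (6) as the matrix equation

$$
\begin{bmatrix}
(\phi_0 - \beta d_1^2) & \phi_1 & 0 & 0 & \ldots & \ldots & 0 \\
\beta \phi_1 & \phi_0 & \phi_1 & 0 & \ldots & \ldots & 0 \\
0 & \beta \phi_1 & \phi_0 & \phi_1 & \ldots & \ldots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & \ldots & \ldots & \ldots & \beta \phi_1 & \phi_0 & \phi_1 \\
0 & \ldots & \ldots & \ldots & 0 & \beta \phi_1 & \phi_0
\end{bmatrix}
\begin{bmatrix}
y_N \\ y_{N-1} \\ y_{N-2} \\ \vdots \\ y_1 \\ y_0
\end{bmatrix}
=
\begin{bmatrix}
a_N \\ a_{N-1} \\ a_{N-2} \\ \vdots \\ a_1 \\ a_0 - \phi_1 y_{-1}
\end{bmatrix}
\tag{7}
$$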

or

$$ W \bar{y} = \bar{a} \tag{8} $$

Notice how we have chosen to arrange the $y_t$’s in reverse time order.
The matrix $W$ on the left side of Eq. (7) is “almost” a Toeplitz matrix (where each descending diagonal is constant)
There are two sources of deviation from the form of a Toeplitz matrix

1. The first element differs from the remaining diagonal elements, reflecting the terminal
condition
2. The sub-diagonal elements equal $\beta$ times the super-diagonal elements

The solution of Eq. (8) can be expressed in the form


$$ \bar{y} = W^{-1} \bar{a} \tag{9} $$

which represents each element $y_t$ of $\bar{y}$ as a function of the entire vector $\bar{a}$
That is, $y_t$ is a function of past, present, and future values of $a$’s, as well as of the initial condition $y_{-1}$
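The single-lag system can be assembled and solved directly. The sketch below is an illustration under stated assumptions: the parameter values are made up, `solve_single_lag` and `objective` are hypothetical helper names, and $\phi_0$ is computed as $h + d_0^2 + \beta d_1^2$, which is what differentiating the finite-horizon objective with respect to $y_t$ yields. As a sanity check, the solution is compared with a randomly perturbed path.

```python
import numpy as np

def solve_single_lag(a, d0, d1, h, β, y_init):
    """Solve the m = 1 first-order conditions; returns (y_0, ..., y_N) and W."""
    N = len(a) - 1
    ϕ0 = h + d0**2 + β * d1**2
    ϕ1 = d0 * d1
    W = np.zeros((N + 1, N + 1))
    for i in range(N + 1):                # row i corresponds to date t = N - i
        W[i, i] = ϕ0
        if i < N:
            W[i, i + 1] = ϕ1              # coefficient on y_{t-1} (super-diagonal)
            W[i + 1, i] = β * ϕ1          # coefficient on y_{t+1} (sub-diagonal)
    W[0, 0] = h + d0**2                   # terminal condition for y_N
    a_bar = np.asarray(a, dtype=float)[::-1].copy()
    a_bar[-1] -= ϕ1 * y_init              # move the initial condition to the right side
    y_bar = np.linalg.solve(W, a_bar)     # reverse time order
    return y_bar[::-1], W

def objective(y, a, d0, d1, h, β, y_init):
    """Finite-horizon value of the criterion with d(L) = d0 + d1 L."""
    y, a = np.asarray(y), np.asarray(a)
    y_lag = np.concatenate([[y_init], y[:-1]])
    t = np.arange(len(y))
    return np.sum(β**t * (a * y - 0.5 * h * y**2 - 0.5 * (d0 * y + d1 * y_lag)**2))

rng = np.random.default_rng(0)
a = 1 + rng.random(20)                    # illustrative forcing sequence
params = dict(d0=1.0, d1=-1.0, h=1.0, β=0.95, y_init=0.5)

y_star, W = solve_single_lag(a, **params)
v_star = objective(y_star, a, **params)
v_perturbed = objective(y_star + 0.01 * rng.standard_normal(20), a, **params)
print(v_star, v_perturbed)
```

Because the objective is strictly concave, any perturbation of the solution lowers its value.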
An Alternative Representation
An alternative way to express the solution to Eq. (7) or Eq. (8) is in so-called feedback-feedforward form
The idea here is to find a solution expressing $y_t$ as a function of past $y$’s and current and future $a$’s
To achieve this solution, one can use an LU decomposition of 𝑊
There always exists a decomposition of 𝑊 of the form 𝑊 = 𝐿𝑈 where

• 𝐿 is an (𝑁 + 1) × (𝑁 + 1) lower triangular matrix


• 𝑈 is an (𝑁 + 1) × (𝑁 + 1) upper triangular matrix.

The factorization can be normalized so that the diagonal elements of 𝑈 are unity
Using the LU representation in Eq. (9), we obtain

$$ U \bar{y} = L^{-1} \bar{a} \tag{10} $$

Since $L^{-1}$ is lower triangular, this representation expresses $y_t$ as a function of

• lagged $y$’s (via the term $U \bar{y}$), and
• current and future $a$’s (via the term $L^{-1} \bar{a}$)

Because there are zeros everywhere in the matrix on the left of Eq. (7) except on the diagonal, super-diagonal, and sub-diagonal, the $LU$ decomposition takes

• 𝐿 to be zero except in the diagonal and the leading sub-diagonal


• 𝑈 to be zero except on the diagonal and the super-diagonal

Thus, Eq. (10) has the form

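$$
\begin{bmatrix}
1 & U_{12} & 0 & 0 & \ldots & 0 & 0 \\
0 & 1 & U_{23} & 0 & \ldots & 0 & 0 \\
0 & 0 & 1 & U_{34} & \ldots & 0 & 0 \\
0 & 0 & 0 & 1 & \ldots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \ldots & 1 & U_{N,N+1} \\
0 & 0 & 0 & 0 & \ldots & 0 & 1
\end{bmatrix}
\begin{bmatrix}
y_N \\ y_{N-1} \\ y_{N-2} \\ y_{N-3} \\ \vdots \\ y_1 \\ y_0
\end{bmatrix}
=
\begin{bmatrix}
L^{-1}_{11} & 0 & 0 & \ldots & 0 \\
L^{-1}_{21} & L^{-1}_{22} & 0 & \ldots & 0 \\
L^{-1}_{31} & L^{-1}_{32} & L^{-1}_{33} & \ldots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
L^{-1}_{N,1} & L^{-1}_{N,2} & L^{-1}_{N,3} & \ldots & 0 \\
L^{-1}_{N+1,1} & L^{-1}_{N+1,2} & L^{-1}_{N+1,3} & \ldots & L^{-1}_{N+1,N+1}
\end{bmatrix}
\begin{bmatrix}
a_N \\ a_{N-1} \\ a_{N-2} \\ \vdots \\ a_1 \\ a_0 - \phi_1 y_{-1}
\end{bmatrix}
$$

where $L^{-1}_{ij}$ is the $(i, j)$ element of $L^{-1}$ and $U_{ij}$ is the $(i, j)$ element of $U$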
Note how the left side for a given 𝑡 involves 𝑦𝑡 and one lagged value 𝑦𝑡−1 while the right side
involves all future values of the forcing process 𝑎𝑡 , 𝑎𝑡+1 , … , 𝑎𝑁
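The banded structure of the factors can be seen in a small numerical sketch. Here `lu_unit_upper` is a hypothetical helper that performs a Crout-style factorization without pivoting, which is an assumption that works for the well-conditioned tridiagonal $W$ built below but not for arbitrary matrices; all parameter values are illustrative.

```python
import numpy as np

def lu_unit_upper(W):
    """Factor W = L @ U with L lower triangular and U unit upper triangular."""
    n = W.shape[0]
    L = np.zeros((n, n))
    U = np.eye(n)
    for j in range(n):
        for i in range(j, n):             # column j of L
            L[i, j] = W[i, j] - L[i, :j] @ U[:j, j]
        for k in range(j + 1, n):         # row j of U
            U[j, k] = (W[j, k] - L[j, :j] @ U[:j, k]) / L[j, j]
    return L, U

# Illustrative single-lag problem (all values assumed)
N, h, β, d0, d1, y_init = 6, 1.0, 0.95, 1.0, -1.0, 0.5
ϕ0, ϕ1 = h + d0**2 + β * d1**2, d0 * d1
W = ϕ0 * np.eye(N + 1) + ϕ1 * np.eye(N + 1, k=1) + β * ϕ1 * np.eye(N + 1, k=-1)
W[0, 0] = h + d0**2                       # terminal condition row

a = np.linspace(1.0, 2.0, N + 1)          # a_0, ..., a_N
a_bar = a[::-1].copy()                    # reverse time order
a_bar[-1] -= ϕ1 * y_init

L, U = lu_unit_upper(W)
rhs = np.linalg.solve(L, a_bar)           # feedforward part: L^{-1} a_bar
y_bar = np.linalg.solve(U, rhs)           # solves U y_bar = L^{-1} a_bar

# Feedback part: each row of U y_bar involves y_t and y_{t-1} only
feedback = y_bar[:-1] + np.diag(U, 1) * y_bar[1:]
print(np.allclose(feedback, rhs[:-1]))
```

The lecture's own implementation below uses `scipy.linalg.lu` and then rescales the factors so that $U$ has a unit diagonal.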
Additional Lag Terms
We briefly indicate how this approach extends to the problem with 𝑚 > 1
Assume that 𝛽 = 1 and let 𝐷𝑚+1 be the (𝑚 + 1) × (𝑚 + 1) symmetric matrix whose elements
are determined from the following formula:

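$$ D_{jk} = d_0 d_{k-j} + d_1 d_{k-j+1} + \ldots + d_{j-1} d_{k-1}, \qquad k \geq j $$

Let $I_{m+1}$ be the $(m + 1) \times (m + 1)$ identity matrix
Let $\phi_j$ be the coefficients in the expansion $\phi(L) = h + d(L^{-1}) d(L)$
Then the first order conditions Eq. (4) and Eq. (5) can be expressed as:

$$
(D_{m+1} + h I_{m+1})
\begin{bmatrix} y_N \\ y_{N-1} \\ \vdots \\ y_{N-m} \end{bmatrix}
=
\begin{bmatrix} a_N \\ a_{N-1} \\ \vdots \\ a_{N-m} \end{bmatrix}
+ M
\begin{bmatrix} y_{N-m-1} \\ y_{N-m-2} \\ \vdots \\ y_{N-2m} \end{bmatrix}
$$

where $M$ is $(m + 1) \times m$ and

$$
M_{ij} =
\begin{cases}
D_{i-j,\, m+1} & \text{for } i > j \\
0 & \text{for } i \leq j
\end{cases}
$$

$$
\begin{aligned}
\phi_m y_{N-1} + \phi_{m-1} y_{N-2} + \ldots + \phi_0 y_{N-m-1} + \phi_1 y_{N-m-2} + \ldots + \phi_m y_{N-2m-1} &= a_{N-m-1} \\
\phi_m y_{N-2} + \phi_{m-1} y_{N-3} + \ldots + \phi_0 y_{N-m-2} + \phi_1 y_{N-m-3} + \ldots + \phi_m y_{N-2m-2} &= a_{N-m-2} \\
& \;\; \vdots \\
\phi_m y_{m+1} + \phi_{m-1} y_m + \ldots + \phi_0 y_1 + \phi_1 y_0 + \ldots + \phi_m y_{-m+1} &= a_1 \\
\phi_m y_m + \phi_{m-1} y_{m-1} + \phi_{m-2} y_{m-2} + \ldots + \phi_0 y_0 + \phi_1 y_{-1} + \ldots + \phi_m y_{-m} &= a_0
\end{aligned}
$$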

As before, we can express this equation as $W \bar{y} = \bar{a}$
The matrix on the left of this equation is “almost” Toeplitz, the exception being the leading
$m \times m$ submatrix in the upper left-hand corner
We can represent the solution in feedback-feedforward form by obtaining a decomposition
$LU = W$, and obtain

$$ U \bar{y} = L^{-1} \bar{a} \tag{11} $$

$$
\sum_{j=0}^{t} U_{-t+N+1,\, -t+N+j+1} \, y_{t-j} = \sum_{j=0}^{N-t} L^{-1}_{-t+N+1,\, -t+N+1-j} \, \bar{a}_{t+j},
\qquad t = 0, 1, \ldots, N
$$

where $L^{-1}_{t,s}$ is the element in the $(t, s)$ position of $L^{-1}$, and similarly for $U$

The left side of equation Eq. (11) is the “feedback” part of the optimal control law for 𝑦𝑡 ,
while the right-hand side is the “feedforward” part
We note that there is a different control law for each 𝑡
Thus, in the finite horizon case, the optimal control law is time-dependent
It is natural to suspect that as 𝑁 → ∞, Eq. (11) becomes equivalent to the solution of our
infinite horizon problem, which below we shall show can be expressed as

$$ c(L) y_t = c(\beta L^{-1})^{-1} a_t, $$

so that as $N \to \infty$ we expect that for each fixed $t$, $U^{-1}_{t,t-j} \to c_j$ and $L_{t,t+j}$ approaches the
coefficient on $L^{-j}$ in the expansion of $c(\beta L^{-1})^{-1}$
This suspicion is true under general conditions that we shall study later
For now, we note that by creating the matrix 𝑊 for large 𝑁 and factoring it into the 𝐿𝑈
form, good approximations to 𝑐(𝐿) and 𝑐(𝛽𝐿−1 )−1 can be obtained

70.5 The Infinite Horizon Limit

For the infinite horizon problem, we propose to discover first-order necessary conditions by
taking the limits of Eq. (4) and Eq. (5) as 𝑁 → ∞
This approach is valid, and the limits of Eq. (4) and Eq. (5) as 𝑁 approaches infinity are
first-order necessary conditions for a maximum
However, for the infinite horizon problem with 𝛽 < 1, the limits of Eq. (4) and Eq. (5) are, in
general, not sufficient for a maximum
That is, the limits of Eq. (5) do not provide enough information uniquely to determine the
solution of the Euler equation Eq. (4) that maximizes Eq. (1)
As we shall see below, a side condition on the path of $y_t$ that together with Eq. (4) is sufficient for an optimum is

$$ \sum_{t=0}^{\infty} \beta^t h y_t^2 < \infty \tag{12} $$

All paths that satisfy the Euler equations, except the one that we shall select below, violate
this condition and, therefore, evidently lead to (much) lower values of Eq. (1) than does the
optimal path selected by the solution procedure below
Consider the characteristic equation associated with the Euler equation
$$ h + d(\beta z^{-1}) d(z) = 0 \tag{13} $$

Notice that if $\tilde{z}$ is a root of equation Eq. (13), then so is $\beta \tilde{z}^{-1}$
Thus, the roots of Eq. (13) come in “$\beta$-reciprocal” pairs
Assume that the roots of Eq. (13) are distinct
Let the roots be, in descending order according to their moduli, $z_1, z_2, \ldots, z_{2m}$
From the reciprocal pairs property and the assumption of distinct roots, it follows that $|z_j| > \sqrt{\beta}$ for $j \leq m$ and $|z_j| < \sqrt{\beta}$ for $j > m$
It also follows that $z_{2m-j} = \beta z_{j+1}^{-1}$, $j = 0, 1, \ldots, m - 1$
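The pairing of the roots is easy to confirm numerically. In the sketch below, `characteristic_roots` is a hypothetical helper that builds the polynomial $z^m [h + d(\beta z^{-1}) d(z)]$ by convolution and hands it to `numpy.roots`; the parameter values are illustrative assumptions.

```python
import numpy as np

def characteristic_roots(d, h, β):
    """Roots of z^m [h + d(β/z) d(z)], sorted by modulus in descending order."""
    d = np.asarray(d, dtype=float)
    m = len(d) - 1
    # z^m d(β/z) has ascending coefficients (d_j β^j) in reversed order
    d_beta_rev = (d * β**np.arange(m + 1))[::-1]
    poly = np.convolve(d, d_beta_rev)     # ascending coefficients of z^m d(β/z) d(z)
    poly[m] += h                          # add the h z^m term
    roots = np.roots(poly[::-1])          # np.roots expects the highest power first
    return roots[np.argsort(np.abs(roots))[::-1]]

β = 0.9
roots = characteristic_roots(d=[1.0, -1.0], h=1.0, β=β)
m = len(roots) // 2

# β-reciprocal pairs: the j-th largest root times the j-th smallest equals β
pair_products = roots[:m] * roots[::-1][:m]
print(roots, pair_products)
```

The first $m$ sorted roots are the ones with modulus above $\sqrt{\beta}$.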
Therefore, the characteristic polynomial on the left side of Eq. (13) can be expressed as

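$$
\begin{aligned}
h + d(\beta z^{-1}) d(z) &= z^{-m} z_0 (z - z_1) \cdots (z - z_m)(z - z_{m+1}) \cdots (z - z_{2m}) \\
  &= z^{-m} z_0 (z - z_1)(z - z_2) \cdots (z - z_m)(z - \beta z_m^{-1}) \cdots (z - \beta z_2^{-1})(z - \beta z_1^{-1})
\end{aligned}
\tag{14}
$$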

where 𝑧0 is a constant
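In Eq. (14), we substitute $(z - z_j) = -z_j (1 - \frac{1}{z_j} z)$ and $(z - \beta z_j^{-1}) = z (1 - \frac{\beta}{z_j} z^{-1})$ for $j = 1, \ldots, m$ to get

$$ h + d(\beta z^{-1}) d(z) = (-1)^m (z_0 z_1 \cdots z_m) \left(1 - \frac{1}{z_1} z\right) \cdots \left(1 - \frac{1}{z_m} z\right) \left(1 - \frac{1}{z_1} \beta z^{-1}\right) \cdots \left(1 - \frac{1}{z_m} \beta z^{-1}\right) $$

Now define $c(z) = \sum_{j=0}^{m} c_j z^j$ as

$$ c(z) = [(-1)^m z_0 z_1 \cdots z_m]^{1/2} \left(1 - \frac{z}{z_1}\right) \left(1 - \frac{z}{z_2}\right) \cdots \left(1 - \frac{z}{z_m}\right) \tag{15} $$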

Notice that Eq. (14) can be written

ℎ + 𝑑 (𝛽𝑧 −1 ) 𝑑 (𝑧) = 𝑐 (𝛽𝑧 −1 ) 𝑐 (𝑧) (16)

It is useful to write Eq. (15) as

𝑐(𝑧) = 𝑐0 (1 − 𝜆1 𝑧) … (1 − 𝜆𝑚 𝑧) (17)

where

$$ c_0 = [(-1)^m z_0 z_1 \cdots z_m]^{1/2}; \quad \lambda_j = \frac{1}{z_j}, \quad j = 1, \ldots, m $$

Since $|z_j| > \sqrt{\beta}$ for $j = 1, \ldots, m$ it follows that $|\lambda_j| < 1/\sqrt{\beta}$ for $j = 1, \ldots, m$
Using Eq. (17), we can express the factorization Eq. (16) as

$$ h + d(\beta z^{-1}) d(z) = c_0^2 (1 - \lambda_1 z) \cdots (1 - \lambda_m z)(1 - \lambda_1 \beta z^{-1}) \cdots (1 - \lambda_m \beta z^{-1}) $$

In sum, we have constructed a factorization Eq. (16) of the characteristic polynomial for the
Euler equation in which the zeros of $c(z)$ exceed $\beta^{1/2}$ in modulus, and the zeros of $c(\beta z^{-1})$
are less than $\beta^{1/2}$ in modulus
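For the single-lag case the factorization can be verified by hand or by machine. The following sketch uses illustrative parameter values and takes $z_0$ to be the leading coefficient of the quadratic $z [h + d(\beta z^{-1}) d(z)]$, an identification that follows from Eq. (14) with $m = 1$.

```python
import numpy as np

h, β = 1.0, 0.9                          # illustrative values (assumed)
d0, d1 = 1.0, -1.0                       # d(z) = d0 + d1 z, so m = 1

# z [h + d(β/z) d(z)] = ϕ1 z^2 + ϕ0 z + β ϕ1
ϕ0, ϕ1 = h + d0**2 + β * d1**2, d0 * d1
roots = np.roots([ϕ1, ϕ0, β * ϕ1])
z1 = roots[np.argmax(np.abs(roots))]     # the root with modulus above sqrt(β)
z0 = ϕ1                                  # leading coefficient of the quadratic

c0 = np.sqrt(-z0 * z1)                   # [(-1)^m z0 z1]^(1/2) with m = 1
λ1 = 1 / z1

def c(z):
    """c(z) = c0 (1 - λ1 z)."""
    return c0 * (1 - λ1 * z)

def characteristic(z):
    """h + d(β z^{-1}) d(z)."""
    return h + (d0 + d1 * β / z) * (d0 + d1 * z)

z = 0.7                                  # any nonzero test point
print(characteristic(z), c(β / z) * c(z))
```

The two printed numbers agree up to rounding, and the zero of $c(z)$ lies outside the circle of radius $\sqrt{\beta}$.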
Using Eq. (16), we now write the Euler equation as

𝑐(𝛽𝐿−1 ) 𝑐 (𝐿) 𝑦𝑡 = 𝑎𝑡

The unique solution of the Euler equation that satisfies condition Eq. (12) is

𝑐(𝐿) 𝑦𝑡 = 𝑐 (𝛽𝐿−1 )−1 𝑎𝑡 (18)

This can be established by using an argument paralleling that in chapter IX of [118]


To exhibit the solution in a form paralleling that of [118], we use Eq. (17) to write Eq. (18) as

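$$ (1 - \lambda_1 L) \cdots (1 - \lambda_m L) y_t = \frac{c_0^{-2} a_t}{(1 - \beta \lambda_1 L^{-1}) \cdots (1 - \beta \lambda_m L^{-1})} \tag{19} $$

Using partial fractions, we can write the characteristic polynomial on the right side of
Eq. (19) as

$$ \sum_{j=1}^{m} \frac{A_j}{1 - \lambda_j \beta L^{-1}} \quad \text{where} \quad A_j := \frac{c_0^{-2}}{\prod_{i \neq j} \left(1 - \frac{\lambda_i}{\lambda_j}\right)} $$

Then Eq. (19) can be written

$$ (1 - \lambda_1 L) \cdots (1 - \lambda_m L) y_t = \sum_{j=1}^{m} \frac{A_j}{1 - \lambda_j \beta L^{-1}} a_t $$

or

$$ (1 - \lambda_1 L) \cdots (1 - \lambda_m L) y_t = \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} (\lambda_j \beta)^k a_{t+k} \tag{20} $$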

Equation Eq. (20) expresses the optimum sequence for 𝑦𝑡 in terms of 𝑚 lagged 𝑦’s, and 𝑚
weighted infinite geometric sums of future 𝑎𝑡 ’s
Furthermore, Eq. (20) is the unique solution of the Euler equation that satisfies the initial
conditions and condition Eq. (12)
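A quick numerical experiment illustrates how the finite-horizon solution lines up with this rule far from the terminal date. The sketch below assumes $m = 1$, made-up parameter values, and a bounded forcing sequence; with a single lag the product defining $A_1$ is empty, so $A_1 = c_0^{-2}$.

```python
import numpy as np

h, β, d0, d1, y_init = 1.0, 0.9, 1.0, -1.0, 0.5    # illustrative values (assumed)
N = 200
a = 1 + 0.5 * np.sin(0.3 * np.arange(N + 1))        # bounded forcing sequence

ϕ0, ϕ1 = h + d0**2 + β * d1**2, d0 * d1

# Finite-horizon first-order conditions, written in natural time order
A = ϕ0 * np.eye(N + 1) + ϕ1 * np.eye(N + 1, k=-1) + β * ϕ1 * np.eye(N + 1, k=1)
A[N, N] = h + d0**2                                 # terminal condition
b = a.copy()
b[0] -= ϕ1 * y_init                                 # initial condition
y = np.linalg.solve(A, b)

# Ingredients of the infinite-horizon rule
roots = np.roots([ϕ1, ϕ0, β * ϕ1])
z1 = np.real(roots[np.argmax(np.abs(roots))])       # root with modulus above sqrt(β)
λ = 1 / z1
A1 = 1 / (-ϕ1 * z1)                                 # c_0^{-2} when m = 1

t = 5                                               # a date far from the terminal date N
k = np.arange(N - t + 1)
feedforward = A1 * np.sum((λ * β)**k * a[t + k])    # truncated geometric sum
feedback = y[t] - λ * y[t - 1]
print(feedback, feedforward)
```

Early in the sample the two sides agree to high precision, which is what the limiting argument leads us to expect.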
In effect, condition Eq. (12) compels us to solve the “unstable” roots of $h + d(\beta z^{-1}) d(z)$ forward (see [118])
The step of factoring the polynomial $h + d(\beta z^{-1}) d(z)$ into $c(\beta z^{-1}) c(z)$, where the zeros of
$c(z)$ all have modulus exceeding $\sqrt{\beta}$, is central to solving the problem
We note two features of the solution Eq. (20)

• Since $|\lambda_j| < 1/\sqrt{\beta}$ for all $j$, it follows that $(\lambda_j \beta) < \sqrt{\beta}$
• The assumption that $\{a_t\}$ is of exponential order less than $1/\sqrt{\beta}$ is sufficient to guarantee that the geometric sums of future $a_t$’s on the right side of Eq. (20) converge

We immediately see that those sums will converge under the weaker condition that $\{a_t\}$ is of
exponential order less than $\phi^{-1}$ where $\phi = \max \{\beta \lambda_i, i = 1, \ldots, m\}$
Note that with $a_t$ identically zero, Eq. (20) implies that in general $|y_t|$ eventually grows exponentially at a rate given by $\max_i |\lambda_i|$
The condition $\max_i |\lambda_i| < 1/\sqrt{\beta}$ guarantees that condition Eq. (12) is satisfied
In fact, $\max_i |\lambda_i| < 1/\sqrt{\beta}$ is a necessary condition for Eq. (12) to hold
Were Eq. (12) not satisfied, the objective function would diverge to $-\infty$, implying that the $y_t$
path could not be optimal
For example, with $a_t = 0$, for all $t$, it is easy to describe a naive (nonoptimal) policy for
$\{y_t, t \geq 0\}$ that gives a finite value of Eq. (1)
We can simply let 𝑦𝑡 = 0 for 𝑡 ≥ 0
This policy involves at most 𝑚 nonzero values of ℎ𝑦𝑡2 and [𝑑(𝐿)𝑦𝑡 ]2 , and so yields a finite
value of Eq. (1)
Therefore it is easy to dominate a path that violates Eq. (12)

70.6 Undiscounted Problems

It is worthwhile focusing on a special case of the LQ problems above: the undiscounted problem that emerges when $\beta = 1$
In this case, the Euler equation is

(ℎ + 𝑑(𝐿−1 )𝑑(𝐿)) 𝑦𝑡 = 𝑎𝑡

The factorization of the characteristic polynomial Eq. (16) becomes

(ℎ + 𝑑 (𝑧 −1 )𝑑(𝑧)) = 𝑐 (𝑧 −1 ) 𝑐 (𝑧)

where

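$$
\begin{aligned}
c(z) &= c_0 (1 - \lambda_1 z) \ldots (1 - \lambda_m z) \\
c_0 &= [(-1)^m z_0 z_1 \ldots z_m]^{1/2} \\
|\lambda_j| &< 1 \quad \text{for } j = 1, \ldots, m \\
\lambda_j &= \frac{1}{z_j} \quad \text{for } j = 1, \ldots, m \\
z_0 &= \text{constant}
\end{aligned}
$$

The solution of the problem becomes

$$ (1 - \lambda_1 L) \cdots (1 - \lambda_m L) y_t = \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} \lambda_j^k a_{t+k} $$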

70.6.1 Transforming Discounted to Undiscounted Problem

Discounted problems can always be converted into undiscounted problems via a simple transformation
Consider problem Eq. (1) with $0 < \beta < 1$
Define the transformed variables

$$ \tilde{a}_t = \beta^{t/2} a_t, \quad \tilde{y}_t = \beta^{t/2} y_t \tag{21} $$

Then notice that $\beta^t [d(L) y_t]^2 = [\tilde{d}(L) \tilde{y}_t]^2$ with $\tilde{d}(L) = \sum_{j=0}^{m} \tilde{d}_j L^j$ and $\tilde{d}_j = \beta^{j/2} d_j$
Then the original criterion function Eq. (1) is equivalent to

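$$ \lim_{N \to \infty} \sum_{t=0}^{N} \left\{ \tilde{a}_t \tilde{y}_t - \frac{1}{2} h \tilde{y}_t^2 - \frac{1}{2} [\tilde{d}(L) \tilde{y}_t]^2 \right\} \tag{22} $$

which is to be maximized over sequences $\{\tilde{y}_t, t = 0, \ldots\}$ subject to $\tilde{y}_{-1}, \cdots, \tilde{y}_{-m}$ given and
$\{\tilde{a}_t, t = 1, \ldots\}$ a known bounded sequence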
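The Euler equation for this problem is $[h + \tilde{d}(L^{-1}) \tilde{d}(L)] \tilde{y}_t = \tilde{a}_t$
The solution is

$$ (1 - \tilde{\lambda}_1 L) \cdots (1 - \tilde{\lambda}_m L) \tilde{y}_t = \sum_{j=1}^{m} \tilde{A}_j \sum_{k=0}^{\infty} \tilde{\lambda}_j^k \tilde{a}_{t+k} $$

or

$$ \tilde{y}_t = \tilde{f}_1 \tilde{y}_{t-1} + \cdots + \tilde{f}_m \tilde{y}_{t-m} + \sum_{j=1}^{m} \tilde{A}_j \sum_{k=0}^{\infty} \tilde{\lambda}_j^k \tilde{a}_{t+k}, \tag{23} $$

where $\tilde{c}(z^{-1}) \tilde{c}(z) = h + \tilde{d}(z^{-1}) \tilde{d}(z)$, and where

$$ [(-1)^m \tilde{z}_0 \tilde{z}_1 \ldots \tilde{z}_m]^{1/2} (1 - \tilde{\lambda}_1 z) \ldots (1 - \tilde{\lambda}_m z) = \tilde{c}(z), \quad \text{where } |\tilde{\lambda}_j| < 1 $$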

We leave it to the reader to show that Eq. (23) implies the equivalent form of the solution

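$$ y_t = f_1 y_{t-1} + \cdots + f_m y_{t-m} + \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} (\lambda_j \beta)^k a_{t+k} $$

where

$$ f_j = \tilde{f}_j \beta^{-j/2}, \quad A_j = \tilde{A}_j, \quad \lambda_j = \tilde{\lambda}_j \beta^{-1/2} \tag{24} $$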

The transformations Eq. (21) and the inverse formulas Eq. (24) allow us to solve a discounted
problem by first solving a related undiscounted problem
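The key identity behind the transformation, $\beta^t [d(L) y_t]^2 = [\tilde{d}(L) \tilde{y}_t]^2$, can be checked on random data. This is a minimal sketch: the coefficients in `d`, the horizon `T`, and the helper `lag_poly` are illustrative assumptions.

```python
import numpy as np

β = 0.9
d = np.array([1.0, -0.6, 0.2])            # illustrative d_0, d_1, d_2 (assumed)
m = len(d) - 1
d_tilde = β**(np.arange(m + 1) / 2) * d   # transformed coefficients β^(j/2) d_j

rng = np.random.default_rng(1)
T = 10
times = np.arange(-m, T + 1)              # dates -m, ..., -1, 0, ..., T
y = rng.standard_normal(len(times))       # arbitrary path, including initial conditions
y_tilde = β**(times / 2) * y              # transformed path β^(t/2) y_t

def lag_poly(coeffs, series, pos):
    """Apply coeffs(L) to `series` at array position `pos`."""
    return sum(c * series[pos - j] for j, c in enumerate(coeffs))

# Date t sits at array position t + m
lhs = np.array([β**t * lag_poly(d, y, t + m)**2 for t in range(T + 1)])
rhs = np.array([lag_poly(d_tilde, y_tilde, t + m)**2 for t in range(T + 1)])
print(np.allclose(lhs, rhs))
```

The same rescaling of `d` and of the initial conditions appears in the constructor of the `LQFilter` class in the next section when a discount factor is supplied.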

70.7 Implementation

Code that computes solutions to the LQ problem using the methods described above can be
found in file control_and_filter.py
Here’s how it looks

In [1]: """

Authors: Balint Skoze, Tom Sargent, John Stachurski

"""

import numpy as np
import scipy.stats as spst
import scipy.linalg as la

class LQFilter:

    def __init__(self, d, h, y_m, r=None, h_eps=None, β=None):
        """

        Parameters
        ----------
        d : list or numpy.array (1-D or a 2-D column vector)
            The order of the coefficients: [d_0, d_1, ..., d_m]
        h : scalar
            Parameter of the objective function (corresponding to the
            quadratic term)
        y_m : list or numpy.array (1-D or a 2-D column vector)
            Initial conditions for y
        r : list or numpy.array (1-D or a 2-D column vector)
            The order of the coefficients: [r_0, r_1, ..., r_k]
            (optional, if not defined -> deterministic problem)
        β : scalar
            Discount factor (optional, default value is one)
        """

        self.h = h
        self.d = np.asarray(d)
        self.m = self.d.shape[0] - 1

        self.y_m = np.asarray(y_m)

        if self.m == self.y_m.shape[0]:
            self.y_m = self.y_m.reshape(self.m, 1)
        else:
            raise ValueError(f"y_m must be of length m = {self.m:d}")

        #---------------------------------------------
        # Define the coefficients of ϕ upfront
        #---------------------------------------------
        ϕ = np.zeros(2 * self.m + 1)
        for i in range(- self.m, self.m + 1):
            ϕ[self.m - i] = np.sum(np.diag(self.d.reshape(self.m + 1, 1) @
                                           self.d.reshape(1, self.m + 1), k=-i))
        ϕ[self.m] = ϕ[self.m] + self.h
        self.ϕ = ϕ

        #-----------------------------------------------------
        # If r is given calculate the vector ϕ_r
        #-----------------------------------------------------
        if r is None:
            pass
        else:
            self.r = np.asarray(r)
            self.k = self.r.shape[0] - 1
            ϕ_r = np.zeros(2 * self.k + 1)
            for i in range(- self.k, self.k + 1):
                ϕ_r[self.k - i] = np.sum(np.diag(self.r.reshape(self.k + 1, 1) @
                                                 self.r.reshape(1, self.k + 1), k=-i))
            if h_eps is None:
                self.ϕ_r = ϕ_r
            else:
                ϕ_r[self.k] = ϕ_r[self.k] + h_eps
                self.ϕ_r = ϕ_r

        #-----------------------------------------------------
        # If β is given, define the transformed variables
        #-----------------------------------------------------
        if β is None:
            self.β = 1
        else:
            self.β = β
            self.d = self.β**(np.arange(self.m + 1)/2) * self.d
            self.y_m = self.y_m * (self.β**(- np.arange(1, self.m + 1)/2)).reshape(self.m, 1)

    def construct_W_and_Wm(self, N):
        """
        This constructs the matrices W and W_m for a given number of periods N
        """

        m = self.m
        d = self.d

        W = np.zeros((N + 1, N + 1))
        W_m = np.zeros((N + 1, m))

        #---------------------------------------
        # Terminal conditions
        #---------------------------------------

        D_m1 = np.zeros((m + 1, m + 1))
        M = np.zeros((m + 1, m))

        # (1) Construct the D_{m+1} matrix using the formula

        for j in range(m + 1):
            for k in range(j, m + 1):
                D_m1[j, k] = d[:j + 1] @ d[k - j: k + 1]

        # Make the matrix symmetric
        D_m1 = D_m1 + D_m1.T - np.diag(np.diag(D_m1))

        # (2) Construct the M matrix using the entries of D_m1

        for j in range(m):
            for i in range(j + 1, m + 1):
                M[i, j] = D_m1[i - j - 1, m]

        #----------------------------------------------
        # Euler equations for t = 0, 1, ..., N-(m+1)
        #----------------------------------------------
        ϕ = self.ϕ

        W[:(m + 1), :(m + 1)] = D_m1 + self.h * np.eye(m + 1)
        W[:(m + 1), (m + 1):(2 * m + 1)] = M

        for i, row in enumerate(np.arange(m + 1, N + 1 - m)):
            W[row, (i + 1):(2 * m + 2 + i)] = ϕ

        for i in range(1, m + 1):
            W[N - m + i, -(2 * m + 1 - i):] = ϕ[:-i]

        for i in range(m):
            W_m[N - i, :(m - i)] = ϕ[(m + 1 + i):]

        return W, W_m

    def roots_of_characteristic(self):
        """
        This function calculates z_0 and the 2m roots of the characteristic equation
        associated with the Euler equation (1.7)

        Note:
        ------
        numpy.poly1d(roots, True) defines a polynomial using its roots that can be
        evaluated at any point. If x_1, x_2, ... , x_m are the roots then
            p(x) = (x - x_1)(x - x_2)...(x - x_m)
        """
        m = self.m
        ϕ = self.ϕ

        # Calculate the roots of the 2m-polynomial
        roots = np.roots(ϕ)
        # sort the roots according to their length (in descending order)
        roots_sorted = roots[np.argsort(abs(roots))[::-1]]

        z_0 = ϕ.sum() / np.poly1d(roots, True)(1)
        z_1_to_m = roots_sorted[:m]     # we need only those outside the unit circle

        λ = 1 / z_1_to_m

        return z_1_to_m, z_0, λ

    def coeffs_of_c(self):
        '''
        This function computes the coefficients {c_j, j = 0, 1, ..., m} for
                c(z) = sum_{j = 0}^{m} c_j z^j

        Based on the expression (1.9). The order is
            c_coeffs = [c_0, c_1, ..., c_{m-1}, c_m]
        '''
        z_1_to_m, z_0 = self.roots_of_characteristic()[:2]

        c_0 = (z_0 * np.prod(z_1_to_m).real * (- 1)**self.m)**(.5)
        c_coeffs = np.poly1d(z_1_to_m, True).c * z_0 / c_0

        return c_coeffs[::-1]

    def solution(self):
        """
        This function calculates {λ_j, j=1,...,m} and {A_j, j=1,...,m}
        of the expression (1.15)
        """
        λ = self.roots_of_characteristic()[2]
        c_0 = self.coeffs_of_c()[-1]

        A = np.zeros(self.m, dtype=complex)
        for j in range(self.m):
            denom = 1 - λ/λ[j]
            A[j] = c_0**(-2) / np.prod(denom[np.arange(self.m) != j])

        return λ, A

    def construct_V(self, N):
        '''
        This function constructs the covariance matrix for x^N (see section 6)
        for a given period N
        '''
        V = np.zeros((N, N))
        ϕ_r = self.ϕ_r

        for i in range(N):
            for j in range(N):
                if abs(i-j) <= self.k:
                    V[i, j] = ϕ_r[self.k + abs(i-j)]

        return V

    def simulate_a(self, N):
        """
        Assuming that the u's are normal, this method draws a random path
        for x^N
        """
        V = self.construct_V(N + 1)
        d = spst.multivariate_normal(np.zeros(N + 1), V)

        return d.rvs()

    def predict(self, a_hist, t):
        """
        This function implements the prediction formula discussed in section 6 (1.59)
        It takes a realization for a^N, and the period in which the prediction is formed

        Output:  E[abar | a_t, a_{t-1}, ..., a_1, a_0]
        """

        N = np.asarray(a_hist).shape[0] - 1
        a_hist = np.asarray(a_hist).reshape(N + 1, 1)
        V = self.construct_V(N + 1)

        aux_matrix = np.zeros((N + 1, N + 1))
        aux_matrix[:(t + 1), :(t + 1)] = np.eye(t + 1)
        L = la.cholesky(V).T
        Ea_hist = la.inv(L) @ aux_matrix @ L @ a_hist

        return Ea_hist

    def optimal_y(self, a_hist, t=None):
        """
        - if t is NOT given it takes a_hist (list or numpy.array) as a deterministic a_t
        - if t is given, it solves the combined control prediction problem (section 7)
          (by default, t == None -> deterministic)

        for a given sequence of a_t (either deterministic or a particular realization),
        it calculates the optimal y_t sequence using the method of the lecture

        Note:
        ------
        scipy.linalg.lu normalizes L, U so that L has unit diagonal elements
        To make things consistent with the lecture, we need an auxiliary diagonal
        matrix D which renormalizes L and U
        """

        N = np.asarray(a_hist).shape[0] - 1
        W, W_m = self.construct_W_and_Wm(N)

        L, U = la.lu(W, permute_l=True)
        D = np.diag(1 / np.diag(U))
        U = D @ U
        L = L @ np.diag(1 / np.diag(D))

        J = np.fliplr(np.eye(N + 1))

        if t is None:   # if the problem is deterministic

            a_hist = J @ np.asarray(a_hist).reshape(N + 1, 1)

            #--------------------------------------------
            # Transform the 'a' sequence if β is given
            #--------------------------------------------
            if self.β != 1:
                a_hist = a_hist * (self.β**(np.arange(N + 1) / 2))[::-1].reshape(N + 1, 1)

            a_bar = a_hist - W_m @ self.y_m           # a_bar from the lecture
            Uy = np.linalg.solve(L, a_bar)            # U @ y_bar = L^{-1} a_bar
            y_bar = np.linalg.solve(U, Uy)            # y_bar = U^{-1} L^{-1} a_bar

            # Reverse the order of y_bar with the matrix J
            J = np.fliplr(np.eye(N + self.m + 1))
            y_hist = J @ np.vstack([y_bar, self.y_m])  # y_hist : concatenated y_m and y_bar

            #--------------------------------------------
            # Transform the optimal sequence back if β is given
            #--------------------------------------------
            if self.β != 1:
                y_hist = y_hist * (self.β**(- np.arange(-self.m, N + 1)/2)).reshape(N + 1 + self.m, 1)

            return y_hist, L, U, y_bar

        else:           # if the problem is stochastic and we look at it

            Ea_hist = self.predict(a_hist, t).reshape(N + 1, 1)
            Ea_hist = J @ Ea_hist

            a_bar = Ea_hist - W_m @ self.y_m          # a_bar from the lecture
            Uy = np.linalg.solve(L, a_bar)            # U @ y_bar = L^{-1} a_bar
            y_bar = np.linalg.solve(U, Uy)            # y_bar = U^{-1} L^{-1} a_bar

            # Reverse the order of y_bar with the matrix J
            J = np.fliplr(np.eye(N + self.m + 1))
            y_hist = J @ np.vstack([y_bar, self.y_m])  # y_hist : concatenated y_m and y_bar

            return y_hist, L, U, y_bar

70.7.1 Example

In this application, we’ll have one lag, with

𝑑(𝐿)𝑦𝑡 = 𝛾(𝐼 − 𝐿)𝑦𝑡 = 𝛾(𝑦𝑡 − 𝑦𝑡−1 )

Suppose for the moment that 𝛾 = 0


Then the intertemporal component of the LQ problem disappears, and the agent simply
wants to maximize 𝑎𝑡 𝑦𝑡 − ℎ𝑦𝑡2 /2 in each period
This means that the agent chooses 𝑦𝑡 = 𝑎𝑡 /ℎ
In the following we’ll set ℎ = 1, so that the agent just wants to track the {𝑎𝑡 } process
However, as we increase 𝛾, the agent gives greater weight to a smooth time path
Hence {𝑦𝑡 } evolves as a smoothed version of {𝑎𝑡 }
For the {𝑎𝑡 } sequence, we’ll choose a stationary cyclic process plus some white noise
Here’s some code that generates a plot when 𝛾 = 0.8

In [2]: import matplotlib.pyplot as plt
%matplotlib inline

# == Set seed and generate a_t sequence == #
np.random.seed(123)
n = 100
a_seq = np.sin(np.linspace(0, 5 * np.pi, n)) + 2 + 0.1 * np.random.randn(n)

def plot_simulation(γ=0.8, m=1, h=1, y_m=2):

    d = γ * np.asarray([1, -1])
    y_m = np.asarray(y_m).reshape(m, 1)

    testlq = LQFilter(d, h, y_m)
    y_hist, L, U, y = testlq.optimal_y(a_seq)
    y = y[::-1]  # reverse y

    # == Plot simulation results == #
    fig, ax = plt.subplots(figsize=(10, 6))
    p_args = {'lw' : 2, 'alpha' : 0.6}
    time = range(len(y))
    ax.plot(time, a_seq / h, 'k-o', ms=4, lw=2, alpha=0.6, label='$a_t$')
    ax.plot(time, y, 'b-o', ms=4, lw=2, alpha=0.6, label='$y_t$')
    ax.set(title=rf'Dynamics with $\gamma = {γ}$', xlabel='Time', xlim=(0, max(time)))
    ax.legend()
    ax.grid()
    plt.show()

plot_simulation()

Here’s what happens when we change 𝛾 to 5.0

In [3]: plot_simulation(γ=5)

And here’s 𝛾 = 10

In [4]: plot_simulation(γ=10)

70.8 Exercises

70.8.1 Exercise 1

Consider solving a discounted version (𝛽 < 1) of problem Eq. (1), as follows


Convert Eq. (1) to the undiscounted problem Eq. (22)
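Let the solution of Eq. (22) in feedback form be

$$
(1 - \tilde\lambda_1 L) \cdots (1 - \tilde\lambda_m L)\, \tilde y_t
= \sum_{j=1}^m \tilde A_j \sum_{k=0}^\infty \tilde\lambda_j^k \, \tilde a_{t+k}
$$

or

$$
\tilde y_t = \tilde f_1 \tilde y_{t-1} + \cdots + \tilde f_m \tilde y_{t-m}
+ \sum_{j=1}^m \tilde A_j \sum_{k=0}^\infty \tilde\lambda_j^k \, \tilde a_{t+k} \tag{25}
$$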

Here

• $h + \tilde d(z^{-1}) \tilde d(z) = \tilde c(z^{-1}) \tilde c(z)$
• $\tilde c(z) = [(-1)^m \tilde z_0 \tilde z_1 \cdots \tilde z_m]^{1/2} (1 - \tilde\lambda_1 z) \cdots (1 - \tilde\lambda_m z)$

where the $\tilde z_j$ are the zeros of $h + \tilde d(z^{-1}) \tilde d(z)$

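Prove that Eq. (25) implies that the solution for $y_t$ in feedback form is

$$
y_t = f_1 y_{t-1} + \ldots + f_m y_{t-m} + \sum_{j=1}^m A_j \sum_{k=0}^\infty \beta^k \lambda_j^k a_{t+k}
$$

where $f_j = \tilde f_j \beta^{-j/2}$, $A_j = \tilde A_j$, and $\lambda_j = \tilde\lambda_j \beta^{-1/2}$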

70.8.2 Exercise 2

Solve the optimal control problem, maximize

$$
\sum_{t=0}^{2} \left\{ a_t y_t - \frac{1}{2} [(1 - 2L) y_t]^2 \right\}
$$

subject to 𝑦−1 given, and {𝑎𝑡 } a known bounded sequence


Express the solution in the “feedback form” Eq. (20), giving numerical values for the coeffi-
cients
Make sure that the boundary conditions Eq. (5) are satisfied
(Note: this problem differs from the problem in the text in one important way: instead of
ℎ > 0 in Eq. (1), ℎ = 0. This has an important influence on the solution.)

70.8.3 Exercise 3

Solve the infinite time-optimal control problem to maximize

$$
\lim_{N \to \infty} \sum_{t=0}^{N} - \frac{1}{2} [(1 - 2L) y_t]^2,
$$

subject to $y_{-1}$ given. Prove that the solution is

$$
y_t = 2 y_{t-1} = 2^{t+1} y_{-1}, \qquad t > 0
$$

70.8.4 Exercise 4

Solve the infinite time problem, to maximize

$$
\lim_{N \to \infty} \sum_{t=0}^{N} (.0000001)\, y_t^2 - \frac{1}{2} [(1 - 2L) y_t]^2
$$

subject to $y_{-1}$ given. Prove that the solution $y_t = 2 y_{t-1}$ violates condition Eq. (12), and so is
not optimal
Prove that the optimal solution is approximately $y_t = .5 y_{t-1}$
71

Classical Prediction and Filtering With Linear Algebra

71.1 Contents

• Overview 71.2
• Finite Dimensional Prediction 71.3
• Combined Finite Dimensional Control and Prediction 71.4
• Infinite Horizon Prediction and Filtering Problems 71.5
• Exercises 71.6

71.2 Overview

This is a sequel to the earlier lecture Classical Control with Linear Algebra
That lecture used linear algebra – in particular, the LU decomposition – to formulate and
solve a class of linear-quadratic optimal control problems
In this lecture, we’ll be using a closely related decomposition, the Cholesky decomposition, to
solve linear prediction and filtering problems
We exploit the useful fact that there is an intimate connection between two superficially dif-
ferent classes of problems:

• deterministic linear-quadratic (LQ) optimal control problems


• linear least squares prediction and filtering problems

The first class of problems involves no randomness, while the second is all about randomness
Nevertheless, essentially the same mathematics solves both types of problem
This connection, which is often termed “duality,” is present whether one uses “classical” or
“recursive” solution procedures
In fact, we saw duality at work earlier when we formulated control and prediction problems
recursively in lectures LQ dynamic programming problems, A first look at the Kalman filter,
and The permanent income model


A useful consequence of duality is that

• With every LQ control problem, there is implicitly affiliated a linear least squares pre-
diction or filtering problem
• With every linear least squares prediction or filtering problem there is implicitly affili-
ated a LQ control problem

An understanding of these connections has repeatedly proved useful in cracking interesting
applied problems
For example, Sargent [118] [chs. IX, XIV] and Hansen and Sargent [55] formulated and solved
control and filtering problems using 𝑧-transform methods
In this lecture, we begin to investigate these ideas by using mostly elementary linear algebra
This is the main purpose and focus of the lecture
However, after showing matrix algebra formulas, we’ll summarize classic infinite-horizon for-
mulas built on 𝑧-transform and lag operator methods
And we’ll occasionally refer to some of these formulas from the infinite dimensional problems
as we present the finite time formulas and associated linear algebra

71.2.1 References

Useful references include [133], [55], [101], [9], and [98]

71.3 Finite Dimensional Prediction

Let $(x_1, x_2, \ldots, x_T)' = x$ be a $T \times 1$ vector of random variables with mean $E x = 0$ and covariance
matrix $E x x' = V$
Here $V$ is a $T \times T$ positive definite matrix
The $i, j$ component $E x_i x_j$ of $V$ is the inner product between $x_i$ and $x_j$
We regard the random variables as being ordered in time so that 𝑥𝑡 is thought of as the value
of some economic variable at time 𝑡
For example, 𝑥𝑡 could be generated by the random process described by the Wold represen-
tation presented in equation Eq. (16) in the section below on infinite dimensional prediction
and filtering
In that case, $V_{ij}$ is given by the coefficient on $z^{|i-j|}$ in the expansion of $g_x(z) = d(z)\, d(z^{-1}) + h$,
which equals $\sum_{k=0}^{\infty} d_k d_{k+|i-j|}$, plus $h$ when $i = j$
We want to construct 𝑗 step ahead linear least squares predictors of the form

E [𝑥𝑇 |𝑥𝑇 −𝑗 , 𝑥𝑇 −𝑗+1 , … , 𝑥1 ]

where E is the linear least squares projection operator


(Sometimes E is called the wide-sense expectations operator)
To find linear least squares predictors it is helpful first to construct a 𝑇 × 1 vector 𝜀 of ran-
dom variables that form an orthonormal basis for the vector of random variables 𝑥

The key insight here comes from noting that because the covariance matrix $V$ of $x$ is positive
definite and symmetric, there exists a (Cholesky) decomposition of $V$ such that

𝑉 = 𝐿−1 (𝐿−1 )′

and

𝐿 𝑉 𝐿′ = 𝐼

where 𝐿 and 𝐿−1 are both lower triangular


Form the 𝑇 × 1 random vector 𝜀 = 𝐿𝑥
The random vector 𝜀 is an orthonormal basis for 𝑥 because

• 𝐿 is nonsingular
• E 𝜀 𝜀′ = 𝐿E𝑥𝑥′ 𝐿′ = 𝐼
• 𝑥 = 𝐿−1 𝜀
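
The three properties above can be checked numerically
Here is a minimal sketch, assuming an illustrative tridiagonal covariance matrix (the matrix `V` below is a made-up example chosen for this check, not an object defined elsewhere in the lecture)
It uses `np.linalg.cholesky`, which returns the lower triangular factor that the text calls $L^{-1}$

```python
import numpy as np

# Illustrative covariance matrix (an assumption for this sketch):
# the covariance of x_t = (1 - 2L) eps_t over T periods
T = 4
V = 5.0 * np.eye(T) - 2.0 * (np.eye(T, k=1) + np.eye(T, k=-1))

L_inv = np.linalg.cholesky(V)   # lower triangular, V = L_inv @ L_inv.T
L = np.linalg.inv(L_inv)        # inverse of a lower triangular matrix is lower triangular

# eps = L x has covariance L V L', which should be the identity
LVLt = L @ V @ L.T
print(np.round(LVLt, 10))
```

Because `L` is lower triangular, each $\varepsilon_t$ depends only on $x_t, x_{t-1}, \ldots, x_1$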

It is enlightening to write out and interpret the equations 𝐿𝑥 = 𝜀 and 𝐿−1 𝜀 = 𝑥


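First, we’ll write $L x = \varepsilon$

$$
\begin{aligned}
L_{11} x_1 &= \varepsilon_1 \\
L_{21} x_1 + L_{22} x_2 &= \varepsilon_2 \\
&\;\;\vdots \\
L_{T1} x_1 + \ldots + L_{TT} x_T &= \varepsilon_T
\end{aligned}
\tag{1}
$$

or

$$
\sum_{j=0}^{t-1} L_{t,t-j}\, x_{t-j} = \varepsilon_t, \quad t = 1, 2, \ldots, T \tag{2}
$$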

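Next, we write $L^{-1} \varepsilon = x$

$$
\begin{aligned}
x_1 &= L^{-1}_{11} \varepsilon_1 \\
x_2 &= L^{-1}_{22} \varepsilon_2 + L^{-1}_{21} \varepsilon_1 \\
&\;\;\vdots \\
x_T &= L^{-1}_{TT} \varepsilon_T + L^{-1}_{T,T-1} \varepsilon_{T-1} + \ldots + L^{-1}_{T,1} \varepsilon_1
\end{aligned}
\tag{3}
$$

or

$$
x_t = \sum_{j=0}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} \tag{4}
$$

where $L^{-1}_{i,j}$ denotes the $i, j$ element of $L^{-1}$

From Eq. (2), it follows that $\varepsilon_t$ is in the linear subspace spanned by $x_t, x_{t-1}, \ldots, x_1$
From Eq. (4) it follows that $x_t$ is in the linear subspace spanned by $\varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_1$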

Equation Eq. (2) forms a sequence of autoregressions that for $t = 1, \ldots, T$ express $x_t$ as
linear functions of $x_s, s = 1, \ldots, t-1$ and a random variable $(L_{t,t})^{-1} \varepsilon_t$ that is orthogonal to
each component of $x_s, s = 1, \ldots, t-1$
(Here $(L_{t,t})^{-1}$ denotes the reciprocal of $L_{t,t}$ while $L^{-1}_{t,t}$ denotes the $t, t$ element of $L^{-1}$)

The equivalence of the subspaces spanned by $\varepsilon_t, \ldots, \varepsilon_1$ and $x_t, \ldots, x_1$ means that for $t - 1 \geq m \geq 1$

$$
E[x_t \mid x_{t-m}, x_{t-m-1}, \ldots, x_1] = E[x_t \mid \varepsilon_{t-m}, \varepsilon_{t-m-1}, \ldots, \varepsilon_1] \tag{5}
$$

To proceed, it is useful to drill down and note that for $t - 1 \geq m \geq 1$ we can rewrite Eq. (4)
in the form of the moving average representation

$$
x_t = \sum_{j=0}^{m-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} + \sum_{j=m}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} \tag{6}
$$

Representation Eq. (6) is an orthogonal decomposition of $x_t$ into a part $\sum_{j=m}^{t-1} L^{-1}_{t,t-j} \varepsilon_{t-j}$
that lies in the space spanned by $[x_{t-m}, x_{t-m-1}, \ldots, x_1]$ and an orthogonal component
$\sum_{j=0}^{m-1} L^{-1}_{t,t-j} \varepsilon_{t-j}$ that does not lie in that space but instead in a linear space known as its
orthogonal complement
It follows that

$$
E[x_t \mid x_{t-m}, x_{t-m-1}, \ldots, x_1] = \sum_{j=m}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j}
$$

71.3.1 Implementation

Code that computes solutions to LQ control and filtering problems using the methods de-
scribed here and in Classical Control with Linear Algebra can be found in the file con-
trol_and_filter.py
Here’s how it looks

In [1]: """

Authors: Balint Skoze, Tom Sargent, John Stachurski

"""

import numpy as np
import scipy.stats as spst
import scipy.linalg as la

class LQFilter:

    def __init__(self, d, h, y_m, r=None, h_eps=None, β=None):
        """

        Parameters
        ----------
        d : list or numpy.array (1-D or a 2-D column vector)
            The order of the coefficients: [d_0, d_1, ..., d_m]
        h : scalar
            Parameter of the objective function (corresponding to the
            quadratic term)
        y_m : list or numpy.array (1-D or a 2-D column vector)
            Initial conditions for y
        r : list or numpy.array (1-D or a 2-D column vector)
            The order of the coefficients: [r_0, r_1, ..., r_k]
            (optional, if not defined -> deterministic problem)
        β : scalar
            Discount factor (optional, default value is one)
        """

        self.h = h
        self.d = np.asarray(d)
        self.m = self.d.shape[0] - 1

        self.y_m = np.asarray(y_m)

        if self.m == self.y_m.shape[0]:
            self.y_m = self.y_m.reshape(self.m, 1)
        else:
            raise ValueError(f"y_m must be of length m = {self.m:d}")

        #---------------------------------------------
        # Define the coefficients of ϕ upfront
        #---------------------------------------------
        ϕ = np.zeros(2 * self.m + 1)
        for i in range(- self.m, self.m + 1):
            ϕ[self.m - i] = np.sum(np.diag(self.d.reshape(self.m + 1, 1) @ \
                                           self.d.reshape(1, self.m + 1), k=-i))
        ϕ[self.m] = ϕ[self.m] + self.h
        self.ϕ = ϕ

        #-----------------------------------------------------
        # If r is given calculate the vector ϕ_r
        #-----------------------------------------------------
        if r is None:
            pass
        else:
            self.r = np.asarray(r)
            self.k = self.r.shape[0] - 1
            ϕ_r = np.zeros(2 * self.k + 1)
            for i in range(- self.k, self.k + 1):
                ϕ_r[self.k - i] = np.sum(np.diag(self.r.reshape(self.k + 1, 1) @ \
                                                 self.r.reshape(1, self.k + 1), k=-i))
            if h_eps is None:
                self.ϕ_r = ϕ_r
            else:
                ϕ_r[self.k] = ϕ_r[self.k] + h_eps
                self.ϕ_r = ϕ_r

        #-----------------------------------------------------
        # If β is given, define the transformed variables
        #-----------------------------------------------------
        if β is None:
            self.β = 1
        else:
            self.β = β
            self.d = self.β**(np.arange(self.m + 1)/2) * self.d
            self.y_m = self.y_m * (self.β**(- np.arange(1, self.m + 1)/2)).reshape(self.m, 1)

    def construct_W_and_Wm(self, N):
        """
        This constructs the matrices W and W_m for a given number of periods N
        """

        m = self.m
        d = self.d

        W = np.zeros((N + 1, N + 1))
        W_m = np.zeros((N + 1, m))

        #---------------------------------------
        # Terminal conditions
        #---------------------------------------

        D_m1 = np.zeros((m + 1, m + 1))
        M = np.zeros((m + 1, m))

        # (1) Construct the D_{m+1} matrix using the formula

        for j in range(m + 1):
            for k in range(j, m + 1):
                D_m1[j, k] = d[:j + 1] @ d[k - j: k + 1]

        # Make the matrix symmetric
        D_m1 = D_m1 + D_m1.T - np.diag(np.diag(D_m1))

        # (2) Construct the M matrix using the entries of D_m1

        for j in range(m):
            for i in range(j + 1, m + 1):
                M[i, j] = D_m1[i - j - 1, m]

        #----------------------------------------------
        # Euler equations for t = 0, 1, ..., N-(m+1)
        #----------------------------------------------
        ϕ = self.ϕ

        W[:(m + 1), :(m + 1)] = D_m1 + self.h * np.eye(m + 1)
        W[:(m + 1), (m + 1):(2 * m + 1)] = M

        for i, row in enumerate(np.arange(m + 1, N + 1 - m)):
            W[row, (i + 1):(2 * m + 2 + i)] = ϕ

        for i in range(1, m + 1):
            W[N - m + i, -(2 * m + 1 - i):] = ϕ[:-i]

        for i in range(m):
            W_m[N - i, :(m - i)] = ϕ[(m + 1 + i):]

        return W, W_m

    def roots_of_characteristic(self):
        """
        This function calculates z_0 and the 2m roots of the characteristic equation
        associated with the Euler equation (1.7)

        Note:
        ------
        numpy.poly1d(roots, True) defines a polynomial using its roots that can be
        evaluated at any point. If x_1, x_2, ... , x_m are the roots then
            p(x) = (x - x_1)(x - x_2)...(x - x_m)
        """
        m = self.m
        ϕ = self.ϕ

        # Calculate the roots of the 2m-polynomial
        roots = np.roots(ϕ)
        # sort the roots according to their length (in descending order)
        roots_sorted = roots[np.argsort(abs(roots))[::-1]]

        z_0 = ϕ.sum() / np.poly1d(roots, True)(1)
        z_1_to_m = roots_sorted[:m]     # we need only those outside the unit circle

        λ = 1 / z_1_to_m

        return z_1_to_m, z_0, λ

    def coeffs_of_c(self):
        '''
        This function computes the coefficients {c_j, j = 0, 1, ..., m} for
                c(z) = sum_{j = 0}^{m} c_j z^j

        Based on the expression (1.9). The order is
            c_coeffs = [c_0, c_1, ..., c_{m-1}, c_m]
        '''
        z_1_to_m, z_0 = self.roots_of_characteristic()[:2]

        c_0 = (z_0 * np.prod(z_1_to_m).real * (- 1)**self.m)**(.5)
        c_coeffs = np.poly1d(z_1_to_m, True).c * z_0 / c_0

        return c_coeffs[::-1]

    def solution(self):
        """
        This function calculates {λ_j, j=1,...,m} and {A_j, j=1,...,m}
        of the expression (1.15)
        """
        λ = self.roots_of_characteristic()[2]
        c_0 = self.coeffs_of_c()[-1]

        A = np.zeros(self.m, dtype=complex)
        for j in range(self.m):
            denom = 1 - λ/λ[j]
            A[j] = c_0**(-2) / np.prod(denom[np.arange(self.m) != j])

        return λ, A

    def construct_V(self, N):
        '''
        This function constructs the covariance matrix for x^N (see section 6)
        for a given period N
        '''
        V = np.zeros((N, N))
        ϕ_r = self.ϕ_r

        for i in range(N):
            for j in range(N):
                if abs(i-j) <= self.k:
                    V[i, j] = ϕ_r[self.k + abs(i-j)]

        return V

    def simulate_a(self, N):
        """
        Assuming that the u's are normal, this method draws a random path
        for x^N
        """
        V = self.construct_V(N + 1)
        d = spst.multivariate_normal(np.zeros(N + 1), V)

        return d.rvs()

    def predict(self, a_hist, t):
        """
        This function implements the prediction formula discussed in section 6 (1.59)
        It takes a realization for a^N, and the period in which the prediction is formed

        Output:  E[abar | a_t, a_{t-1}, ..., a_1, a_0]
        """

        N = np.asarray(a_hist).shape[0] - 1
        a_hist = np.asarray(a_hist).reshape(N + 1, 1)
        V = self.construct_V(N + 1)

        aux_matrix = np.zeros((N + 1, N + 1))
        aux_matrix[:(t + 1), :(t + 1)] = np.eye(t + 1)
        L = la.cholesky(V).T
        Ea_hist = la.inv(L) @ aux_matrix @ L @ a_hist

        return Ea_hist

    def optimal_y(self, a_hist, t=None):
        """
        - if t is NOT given it takes a_hist (list or numpy.array) as a deterministic a_t
        - if t is given, it solves the combined control prediction problem (section 7)
          (by default, t == None -> deterministic)

        for a given sequence of a_t (either deterministic or a particular realization),
        it calculates the optimal y_t sequence using the method of the lecture

        Note:
        ------
        scipy.linalg.lu normalizes L, U so that L has unit diagonal elements
        To make things consistent with the lecture, we need an auxiliary diagonal
        matrix D which renormalizes L and U
        """

        N = np.asarray(a_hist).shape[0] - 1
        W, W_m = self.construct_W_and_Wm(N)

        L, U = la.lu(W, permute_l=True)
        D = np.diag(1 / np.diag(U))
        U = D @ U
        L = L @ np.diag(1 / np.diag(D))

        J = np.fliplr(np.eye(N + 1))

        if t is None:   # if the problem is deterministic

            a_hist = J @ np.asarray(a_hist).reshape(N + 1, 1)

            #--------------------------------------------
            # Transform the 'a' sequence if β is given
            #--------------------------------------------
            if self.β != 1:
                a_hist = a_hist * (self.β**(np.arange(N + 1) / 2))[::-1].reshape(N + 1, 1)

            a_bar = a_hist - W_m @ self.y_m        # a_bar from the lecture
            Uy = np.linalg.solve(L, a_bar)         # U @ y_bar = L^{-1} a_bar
            y_bar = np.linalg.solve(U, Uy)         # y_bar = U^{-1} L^{-1} a_bar

            # Reverse the order of y_bar with the matrix J
            J = np.fliplr(np.eye(N + self.m + 1))
            y_hist = J @ np.vstack([y_bar, self.y_m])  # y_hist : concatenated y_m and y_bar

            #--------------------------------------------
            # Transform the optimal sequence back if β is given
            #--------------------------------------------
            if self.β != 1:
                y_hist = y_hist * (self.β**(- np.arange(-self.m, N + 1)/2)).reshape(N + 1 + self.m, 1)

            return y_hist, L, U, y_bar

        else:           # if the problem is stochastic and we look at it

            Ea_hist = self.predict(a_hist, t).reshape(N + 1, 1)
            Ea_hist = J @ Ea_hist

            a_bar = Ea_hist - W_m @ self.y_m       # a_bar from the lecture
            Uy = np.linalg.solve(L, a_bar)         # U @ y_bar = L^{-1} a_bar
            y_bar = np.linalg.solve(U, Uy)         # y_bar = U^{-1} L^{-1} a_bar

            # Reverse the order of y_bar with the matrix J
            J = np.fliplr(np.eye(N + self.m + 1))
            y_hist = J @ np.vstack([y_bar, self.y_m])  # y_hist : concatenated y_m and y_bar

            return y_hist, L, U, y_bar

Let’s use this code to tackle two interesting examples

71.3.2 Example 1

Consider a stochastic process with moving average representation

𝑥𝑡 = (1 − 2𝐿)𝜀𝑡

where 𝜀𝑡 is a serially uncorrelated random process with mean zero and variance unity

If we were to use the tools associated with infinite dimensional prediction and filtering to be
described below, we would use the Wiener-Kolmogorov formula Eq. (21) to compute the lin-
ear least squares forecasts E[𝑥𝑡+𝑗 ∣ 𝑥𝑡 , 𝑥𝑡−1 , …], for 𝑗 = 1, 2
But we can do everything we want by instead using our finite dimensional tools and setting
𝑑 = 𝑟, generating an instance of LQFilter, then invoking pertinent methods of LQFilter

In [2]: m = 1
y_m = np.asarray([.0]).reshape(m, 1)
d = np.asarray([1, -2])
r = np.asarray([1, -2])
h = 0.0
example = LQFilter(d, h, y_m, r=d)

The Wold representation is computed by example.coeffs_of_c()


Let’s check that it “flips roots” as required

In [3]: example.coeffs_of_c()

Out[3]: array([ 2., -1.])

In [4]: example.roots_of_characteristic()

Out[4]: (array([2.]), -2.0, array([0.5]))

Now let’s form the covariance matrix of a time series vector of length 𝑁 and put it in 𝑉
Then we’ll take a Cholesky decomposition of $V = L^{-1} (L^{-1})'$ and use it to form the vector of
“moving average representations” $x = L^{-1} \varepsilon$ and the vector of “autoregressive representations”
$L x = \varepsilon$

In [5]: V = example.construct_V(N=5)
print(V)

[[ 5. -2. 0. 0. 0.]
[-2. 5. -2. 0. 0.]
[ 0. -2. 5. -2. 0.]
[ 0. 0. -2. 5. -2.]
[ 0. 0. 0. -2. 5.]]

Notice how the lower rows of the “moving average representations” are converging to the ap-
propriate infinite history Wold representation to be described below when we study infinite
horizon-prediction and filtering

In [6]: Li = np.linalg.cholesky(V)
print(Li)

[[ 2.23606798 0. 0. 0. 0. ]
[-0.89442719 2.04939015 0. 0. 0. ]
[ 0. -0.97590007 2.01186954 0. 0. ]
[ 0. 0. -0.99410024 2.00293902 0. ]
[ 0. 0. 0. -0.99853265 2.000733 ]]

Notice how the lower rows of the “autoregressive representations” are converging to the ap-
propriate infinite-history autoregressive representation to be described below when we study
infinite horizon-prediction and filtering

In [7]: L = np.linalg.inv(Li)
print(L)

[[ 0.4472136 0. 0. 0. 0. ]
[ 0.19518001 0.48795004 0. 0. 0. ]
[ 0.09467621 0.23669053 0.49705012 0. 0. ]
[ 0.04698977 0.11747443 0.2466963 0.49926632 -0. ]
[ 0.02345182 0.05862954 0.12312203 0.24917554 0.49981682]]

71.3.3 Example 2

Consider a stochastic process $X_t$ with moving average representation

$$
X_t = (1 - \sqrt{2} L^2) \varepsilon_t
$$

where $\varepsilon_t$ is a serially uncorrelated random process with mean zero and variance unity
Let’s find a Wold moving average representation for $X_t$ that will prevail in the infinite-history
context to be studied in detail below
To do this, we’ll use the Wiener-Kolmogorov formula Eq. (21) presented below to compute
the linear least squares forecasts $E[X_{t+j} \mid X_{t-1}, \ldots]$ for $j = 1, 2, 3$
We proceed in the same way as in example 1

In [8]: m = 2
y_m = np.asarray([.0, .0]).reshape(m, 1)
d = np.asarray([1, 0, -np.sqrt(2)])
r = np.asarray([1, 0, -np.sqrt(2)])
h = 0.0
example = LQFilter(d, h, y_m, r=d)
example.coeffs_of_c()

Out[8]: array([ 1.41421356, -0. , -1. ])

In [9]: example.roots_of_characteristic()

Out[9]: (array([ 1.18920712, -1.18920712]),


-1.4142135623731122,
array([ 0.84089642, -0.84089642]))

In [10]: V = example.construct_V(N=8)
print(V)

[[ 3. 0. -1.41421356 0. 0. 0.
0. 0. ]
[ 0. 3. 0. -1.41421356 0. 0.
0. 0. ]
[-1.41421356 0. 3. 0. -1.41421356 0.
0. 0. ]
[ 0. -1.41421356 0. 3. 0. -1.41421356
0. 0. ]
[ 0. 0. -1.41421356 0. 3. 0.
-1.41421356 0. ]
[ 0. 0. 0. -1.41421356 0. 3.
0. -1.41421356]
[ 0. 0. 0. 0. -1.41421356 0.
3. 0. ]
[ 0. 0. 0. 0. 0. -1.41421356
0. 3. ]]

In [11]: Li = np.linalg.cholesky(V)
print(Li[-3:, :])

[[ 0. 0. 0. -0.9258201 0. 1.46385011
0. 0. ]
[ 0. 0. 0. 0. -0.96609178 0.
1.43759058 0. ]
[ 0. 0. 0. 0. 0. -0.96609178
0. 1.43759058]]

In [12]: L = np.linalg.inv(Li)
print(L)

[[0.57735027 0. 0. 0. 0. 0.
0. 0. ]
[0. 0.57735027 0. 0. 0. 0.
0. 0. ]
[0.3086067 0. 0.65465367 0. 0. 0.
0. 0. ]
[0. 0.3086067 0. 0.65465367 0. 0.
0. 0. ]
[0.19518001 0. 0.41403934 0. 0.68313005 0.
0. 0. ]
[0. 0.19518001 0. 0.41403934 0. 0.68313005
0. 0. ]
[0.13116517 0. 0.27824334 0. 0.45907809 0.
0.69560834 0. ]
[0. 0.13116517 0. 0.27824334 0. 0.45907809
0. 0.69560834]]

71.3.4 Prediction

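It immediately follows from the “orthogonality principle” of least squares (see [9] or [118]
[ch. X]) that

$$
\begin{aligned}
E[x_t \mid x_{t-m}, x_{t-m-1}, \ldots, x_1] &= \sum_{j=m}^{t-1} L^{-1}_{t,t-j}\, \varepsilon_{t-j} \\
&= [L^{-1}_{t,1}, L^{-1}_{t,2}, \ldots, L^{-1}_{t,t-m}, 0, 0, \ldots, 0]\, L x
\end{aligned}
\tag{7}
$$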

This can be interpreted as a finite-dimensional version of the Wiener-Kolmogorov $m$-step
ahead prediction formula
We can use Eq. (7) to represent the linear least squares projection of the vector $x$ conditioned
on the first $s$ observations $[x_s, x_{s-1}, \ldots, x_1]$
We have

$$
E[x \mid x_s, x_{s-1}, \ldots, x_1] = L^{-1}
\begin{bmatrix} I_s & 0 \\ 0 & 0_{(T-s)} \end{bmatrix} L x \tag{8}
$$
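
The projection formula above can be compared with the textbook regression formula $E[x \mid x_{1:s}] = V_{\cdot, 1:s} V_{1:s,1:s}^{-1} x_{1:s}$
The following is a small sketch under stated assumptions: the covariance matrix `V` is an illustrative tridiagonal example, and the names `S`, `P` and `P_direct` are introduced here only for the comparison

```python
import numpy as np

# Illustrative covariance matrix (an assumption for this sketch)
T, s = 6, 3
V = 5.0 * np.eye(T) - 2.0 * (np.eye(T, k=1) + np.eye(T, k=-1))

L_inv = np.linalg.cholesky(V)   # V = L_inv @ L_inv.T
L = np.linalg.inv(L_inv)

# Selector matrix with I_s in the upper left block and zeros elsewhere
S = np.zeros((T, T))
S[:s, :s] = np.eye(s)

# Projection matrix from the Cholesky-based formula: E[x | x_s, ..., x_1] = P @ x
P = L_inv @ S @ L

# Projection matrix from the usual population regression formula
P_direct = np.zeros((T, T))
P_direct[:, :s] = V[:, :s] @ np.linalg.inv(V[:s, :s])

print(np.allclose(P, P_direct))
```

The first $s$ rows of `P` reproduce the observed $x_1, \ldots, x_s$, while the remaining rows give forecasts of the unobserved components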

This formula will be convenient in representing the solution of control problems under uncer-
tainty
Equation Eq. (4) can be recognized as a finite dimensional version of a moving average repre-
sentation
Equation Eq. (2) can be viewed as a finite dimensional version of an autoregressive representation

Notice that even if the 𝑥𝑡 process is covariance stationary, so that 𝑉 is such that 𝑉𝑖𝑗 depends
only on |𝑖 − 𝑗|, the coefficients in the moving average representation are time-dependent, there
being a different moving average for each 𝑡
If 𝑥𝑡 is a covariance stationary process, the last row of 𝐿−1 converges to the coefficients in the
Wold moving average representation for {𝑥𝑡 } as 𝑇 → ∞
Further, if $x_t$ is covariance stationary, for fixed $k$ and $j > 0$, $L^{-1}_{T,T-j}$ converges to $L^{-1}_{T-k,T-k-j}$
as $T \to \infty$
That is, the “bottom” rows of 𝐿−1 converge to each other and to the Wold moving average
coefficients as 𝑇 → ∞
This last observation gives one simple and widely-used practical way of forming a finite 𝑇 ap-
proximation to a Wold moving average representation
First, form the covariance matrix $E x x' = V$, then obtain the Cholesky decomposition
$L^{-1} (L^{-1})'$ of $V$, which can be accomplished quickly on a computer
The last row of 𝐿−1 gives the approximate Wold moving average coefficients
This method can readily be generalized to multivariate systems
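
Here is a minimal sketch of that recipe
The helper name `approx_wold_ma` and its argument layout are assumptions made for this illustration, not part of the LQFilter class

```python
import numpy as np

def approx_wold_ma(autocov, T):
    """
    Approximate Wold moving average coefficients from autocovariances

    autocov[k] = E x_t x_{t-k} for k = 0, 1, ..., with zeros beyond the list
    Returns the last row of the Cholesky factor of V, reversed so that the
    first entry is the coefficient on the most recent innovation
    """
    V = np.zeros((T, T))
    for k, g in enumerate(autocov):
        V += g * np.eye(T, k=k)
        if k > 0:
            V += g * np.eye(T, k=-k)
    L_inv = np.linalg.cholesky(V)      # V = L_inv @ L_inv.T
    return L_inv[-1, ::-1]

# Autocovariances of x_t = (1 - 2L) eps_t: C(0) = 5, C(1) = -2
coeffs = approx_wold_ma([5.0, -2.0], T=30)
print(coeffs[:3])
```

For this process the leading entries should be close to 2 and -1, the Wold coefficients reported by example.coeffs_of_c() in Example 1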

71.4 Combined Finite Dimensional Control and Prediction

Consider the finite-dimensional control problem, maximize

$$
E \sum_{t=0}^{N} \left\{ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} [d(L) y_t]^2 \right\}, \quad h > 0
$$

where $d(L) = d_0 + d_1 L + \ldots + d_m L^m$, $L$ is the lag operator, $\bar a = [a_N, a_{N-1}, \ldots, a_1, a_0]'$ a random
vector with mean zero and $E \bar a \bar a' = V$
The variables $y_{-1}, \ldots, y_{-m}$ are given
Maximization is over choices of $y_0, y_1, \ldots, y_N$, where $y_t$ is required to be a linear function of
$\{y_{t-s-1}, t + m - 1 \geq 0;\ a_{t-s}, t \geq s \geq 0\}$
We saw in the lecture Classical Control with Linear Algebra that the solution of this problem
under certainty could be represented in the feedback-feedforward form

$$
U \bar y = L^{-1} \bar a + K \begin{bmatrix} y_{-1} \\ \vdots \\ y_{-m} \end{bmatrix}
$$

for some $(N + 1) \times m$ matrix $K$

Using a version of formula Eq. (7), we can express $E[\bar a \mid a_s, a_{s-1}, \ldots, a_0]$ as

$$
E[\bar a \mid a_s, a_{s-1}, \ldots, a_0] = \tilde U^{-1}
\begin{bmatrix} 0 & 0 \\ 0 & I_{(s+1)} \end{bmatrix} \tilde U \bar a
$$

where $I_{(s+1)}$ is the $(s + 1) \times (s + 1)$ identity matrix, and $V = \tilde U^{-1} (\tilde U^{-1})'$, where $\tilde U$ is the upper
triangular Cholesky factor of the covariance matrix $V$



(We have reversed the time axis in dating the $a$’s relative to earlier)
The time axis can be reversed in representation Eq. (8) by replacing $L$ with $L^T$
The optimal decision rule to use at time $0 \leq t \leq N$ is then given by the $(N - t + 1)$th row of

$$
U \bar y = L^{-1} \tilde U^{-1}
\begin{bmatrix} 0 & 0 \\ 0 & I_{(t+1)} \end{bmatrix} \tilde U \bar a
+ K \begin{bmatrix} y_{-1} \\ \vdots \\ y_{-m} \end{bmatrix}
$$

71.5 Infinite Horizon Prediction and Filtering Problems

It is instructive to compare the finite-horizon formulas based on linear algebra decompositions


of finite-dimensional covariance matrices with classic formulas for infinite horizon and infinite
history prediction and control problems
These classic infinite horizon formulas used the mathematics of 𝑧-transforms and lag opera-
tors
We’ll meet interesting lag operator and 𝑧-transform counterparts to our finite horizon matrix
formulas
We pose two related prediction and filtering problems
We let $Y_t$ be a univariate $m$th order moving average, covariance stationary stochastic process,

$$
Y_t = d(L) u_t \tag{9}
$$

where $d(L) = \sum_{j=0}^{m} d_j L^j$, and $u_t$ is a serially uncorrelated stationary random process satisfying

$$
\begin{aligned}
E u_t &= 0 \\
E u_t u_s &= \begin{cases} 1 & \text{if } t = s \\ 0 & \text{otherwise} \end{cases}
\end{aligned}
\tag{10}
$$

We impose no conditions on the zeros of 𝑑(𝑧)


A second covariance stationary process is 𝑋𝑡 given by

𝑋𝑡 = 𝑌𝑡 + 𝜀𝑡 (11)

where 𝜀𝑡 is a serially uncorrelated stationary random process with E𝜀𝑡 = 0 and E𝜀𝑡 𝜀𝑠 = 0 for
all distinct 𝑡 and 𝑠
We also assume that E𝜀𝑡 𝑢𝑠 = 0 for all 𝑡 and 𝑠
The linear least squares prediction problem is to find the $L_2$ random variable $\hat X_{t+j}$
among linear combinations of $\{X_t, X_{t-1}, \ldots\}$ that minimizes $E(\hat X_{t+j} - X_{t+j})^2$
That is, the problem is to find a $\gamma_j(L) = \sum_{k=0}^{\infty} \gamma_{jk} L^k$ such that $\sum_{k=0}^{\infty} |\gamma_{jk}|^2 < \infty$ and
$E[\gamma_j(L) X_t - X_{t+j}]^2$ is minimized
The linear least squares filtering problem is to find a $b(L) = \sum_{j=0}^{\infty} b_j L^j$ such that
$\sum_{j=0}^{\infty} |b_j|^2 < \infty$ and $E[b(L) X_t - Y_t]^2$ is minimized

Interesting versions of these problems related to the permanent income theory were studied
by [98]

71.5.1 Problem Formulation

These problems are solved as follows


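The covariograms of $Y$ and $X$ and their cross covariogram are, respectively,

$$
\begin{aligned}
C_X(\tau) &= E X_t X_{t-\tau} \\
C_Y(\tau) &= E Y_t Y_{t-\tau} \qquad \tau = 0, \pm 1, \pm 2, \ldots \\
C_{Y,X}(\tau) &= E Y_t X_{t-\tau}
\end{aligned}
\tag{12}
$$

The covariance and cross-covariance generating functions are defined as

$$
\begin{aligned}
g_X(z) &= \sum_{\tau=-\infty}^{\infty} C_X(\tau) z^{\tau} \\
g_Y(z) &= \sum_{\tau=-\infty}^{\infty} C_Y(\tau) z^{\tau} \\
g_{YX}(z) &= \sum_{\tau=-\infty}^{\infty} C_{YX}(\tau) z^{\tau}
\end{aligned}
\tag{13}
$$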

The generating functions can be computed by using the following facts


Let $v_{1t}$ and $v_{2t}$ be two mutually and serially uncorrelated white noises with unit variances
That is, $E v_{1t}^2 = E v_{2t}^2 = 1$, $E v_{1t} = E v_{2t} = 0$, $E v_{1t} v_{2s} = 0$ for all $t$ and $s$, and
$E v_{1t} v_{1t-j} = E v_{2t} v_{2t-j} = 0$ for all $j \neq 0$
Let 𝑥𝑡 and 𝑦𝑡 be two random processes given by

𝑦𝑡 = 𝐴(𝐿)𝑣1𝑡 + 𝐵(𝐿)𝑣2𝑡
𝑥𝑡 = 𝐶(𝐿)𝑣1𝑡 + 𝐷(𝐿)𝑣2𝑡

Then, as shown for example in [118] [ch. XI], it is true that

$$
\begin{aligned}
g_y(z) &= A(z) A(z^{-1}) + B(z) B(z^{-1}) \\
g_x(z) &= C(z) C(z^{-1}) + D(z) D(z^{-1}) \\
g_{yx}(z) &= A(z) C(z^{-1}) + B(z) D(z^{-1})
\end{aligned}
\tag{14}
$$

Applying these formulas to Eq. (9) through Eq. (12), we have

$$
\begin{aligned}
g_Y(z) &= d(z) d(z^{-1}) \\
g_X(z) &= d(z) d(z^{-1}) + h \\
g_{YX}(z) &= d(z) d(z^{-1})
\end{aligned}
\tag{15}
$$
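
The coefficients of $d(z) d(z^{-1}) + h$ are easy to compute by convolving the coefficient vector of $d$ with its own reverse
The sketch below does this and stacks the resulting covariogram into a Toeplitz covariance matrix
The function names `covariogram_X` and `covariance_matrix` are hypothetical helpers introduced for illustration; the LQFilter method construct_V builds the same kind of matrix

```python
import numpy as np
from scipy.linalg import toeplitz

def covariogram_X(d, h):
    """
    Return [C_X(0), C_X(1), ..., C_X(m)] for X_t = d(L) u_t + eps_t,
    where h is the variance of eps_t
    """
    d = np.asarray(d, dtype=float)
    m = len(d) - 1
    g = np.convolve(d, d[::-1])   # coefficients of z^{-m}, ..., z^{m} in d(z) d(z^{-1})
    g[m] += h                     # h enters only the coefficient on z^0
    return g[m:]

def covariance_matrix(d, h, N):
    "N x N covariance matrix of (X_1, ..., X_N), which is Toeplitz by stationarity"
    C = covariogram_X(d, h)
    first_column = np.zeros(N)
    n = min(N, len(C))
    first_column[:n] = C[:n]
    return toeplitz(first_column)

V = covariance_matrix([1, -2], h=0.0, N=5)
print(V)
```

With `d = [1, -2]` and `h = 0` this reproduces the matrix printed by example.construct_V(N=5) in Example 1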

The key step in obtaining solutions to our problems is to factor the covariance generating
function 𝑔𝑋 (𝑧) of 𝑋
The solutions of our problems are given by formulas due to Wiener and Kolmogorov

These formulas utilize the Wold moving average representation of the $X_t$ process,

$$
X_t = c(L)\, \eta_t \tag{16}
$$

where $c(L) = \sum_{j=0}^{m} c_j L^j$, with

$$
c_0 \eta_t = X_t - E[X_t \mid X_{t-1}, X_{t-2}, \ldots] \tag{17}
$$

Here E is the linear least squares projection operator


Equation Eq. (17) is the condition that 𝑐0 𝜂𝑡 can be the one-step-ahead error in predicting 𝑋𝑡
from its own past values
Condition Eq. (17) requires that 𝜂𝑡 lie in the closed linear space spanned by [𝑋𝑡 , 𝑋𝑡−1 , …]
This will be true if and only if the zeros of 𝑐(𝑧) do not lie inside the unit circle
It is an implication of Eq. (17) that 𝜂𝑡 is a serially uncorrelated random process and that nor-
malization can be imposed so that E𝜂𝑡2 = 1
Consequently, an implication of Eq. (16) is that the covariance generating function of 𝑋𝑡 can
be expressed as

𝑔𝑋 (𝑧) = 𝑐 (𝑧) 𝑐 (𝑧 −1 ) (18)

It remains to discuss how 𝑐(𝐿) is to be computed


Combining Eq. (14) and Eq. (18) gives

𝑑(𝑧) 𝑑(𝑧 −1 ) + ℎ = 𝑐 (𝑧) 𝑐 (𝑧 −1 ) (19)

Therefore, we have already shown constructively how to factor the covariance generating
function 𝑔𝑋 (𝑧) = 𝑑(𝑧) 𝑑 (𝑧 −1 ) + ℎ
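
As a complement to the method coeffs_of_c, here is a self-contained sketch of the same factorization idea
It assumes that $d(z) d(z^{-1}) + h$ has no zeros on the unit circle, and the function name `wold_factor` is made up for this illustration
The approach is to find the zeros of the symmetric polynomial, keep those outside the unit circle, and scale so that $c(1)^2 = g_X(1)$

```python
import numpy as np

def wold_factor(d, h=0.0):
    """
    Factor d(z) d(z^{-1}) + h = c(z) c(z^{-1}) with the zeros of c(z)
    outside the unit circle

    Returns [c_0, c_1, ..., c_m] in ascending powers of z
    Assumes there are no zeros on the unit circle
    """
    d = np.asarray(d, dtype=float)
    m = len(d) - 1
    ϕ = np.convolve(d, d[::-1])          # coefficients of z^{-m}, ..., z^{m}
    ϕ[m] += h

    zeros = np.roots(ϕ)                  # zeros of z^m (d(z) d(z^{-1}) + h)
    outside = zeros[np.abs(zeros) > 1]   # these come in pairs (z_j, 1 / z_j)

    # Build prod_j (1 - z / z_j) in ascending powers of z
    poly = np.array([1.0])
    for z_j in outside:
        poly = np.convolve(poly, [1.0, -1.0 / z_j])
    poly = np.real(poly)

    # Choose the scale so that c(1)^2 equals the sum of the ϕ coefficients
    scale = np.sqrt(ϕ.sum()) / poly.sum()
    return scale * poly

c = wold_factor([1, -2])
print(c)
```

For `d = [1, -2]` and `h = 0` the zero of $d(z)$ at $z = 1/2$ is flipped to $z = 2$, which is the “root flipping” checked in Example 1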
We now introduce the annihilation operator:

$$
\left[ \sum_{j=-\infty}^{\infty} f_j L^j \right]_+ \equiv \sum_{j=0}^{\infty} f_j L^j \tag{20}
$$

In words, $[\ ]_+$ means “ignore negative powers of $L$”


We have defined the solution of the prediction problem as $E[X_{t+j} \mid X_t, X_{t-1}, \ldots] = \gamma_j(L) X_t$
Assuming that the roots of $c(z) = 0$ all lie outside the unit circle, the Wiener-Kolmogorov
formula for $\gamma_j(L)$ holds:

$$
\gamma_j(L) = \left[ \frac{c(L)}{L^j} \right]_+ c(L)^{-1} \tag{21}
$$
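
For a finite order moving average, the annihilation step simply drops the first $j$ coefficients of $c(L)$, and multiplying by $c(L)^{-1}$ is a power series division
The sketch below computes the first few coefficients of $\gamma_j(L)$ this way
The function name `wk_predictor` and the truncation length `n_terms` are assumptions of this illustration

```python
import numpy as np

def wk_predictor(c, j, n_terms=10):
    """
    First n_terms coefficients of γ_j(L) = [c(L) / L^j]_+ c(L)^{-1}

    c = [c_0, c_1, ..., c_m] are Wold coefficients in ascending powers of L,
    with the zeros of c(z) assumed to lie outside the unit circle
    """
    c = np.asarray(c, dtype=float)
    num = c[j:]                      # [c(L) / L^j]_+ = c_j + c_{j+1} L + ...

    # Power series division num(L) / c(L), one coefficient at a time
    γ = np.zeros(n_terms)
    for k in range(n_terms):
        acc = num[k] if k < len(num) else 0.0
        for i in range(1, min(k, len(c) - 1) + 1):
            acc -= c[i] * γ[k - i]
        γ[k] = acc / c[0]
    return γ

γ_1 = wk_predictor([2.0, -1.0], j=1, n_terms=4)
γ_2 = wk_predictor([2.0, -1.0], j=2, n_terms=4)
print(γ_1, γ_2)
```

With the Wold coefficients `[2, -1]` from Example 1, the one step ahead weights decay geometrically at rate 1/2, while the two step ahead forecast of a first order moving average is zero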

We have defined the solution of the filtering problem as $E[Y_t \mid X_t, X_{t-1}, \ldots] = b(L) X_t$
The Wiener-Kolmogorov formula for $b(L)$ is

$$
b(L) = \left[ \frac{g_{YX}(L)}{c(L^{-1})} \right]_+ c(L)^{-1}
$$

or

$$
b(L) = \left[ \frac{d(L) d(L^{-1})}{c(L^{-1})} \right]_+ c(L)^{-1} \tag{22}
$$

Formulas Eq. (21) and Eq. (22) are discussed in detail in [134] and [118]
The interested reader can there find several examples of the use of these formulas in economics
Some classic examples using these formulas are due to [98]
As an example of the usefulness of formula Eq. (22), we let $X_t$ be a stochastic process with
Wold moving average representation

$$
X_t = c(L) \eta_t
$$

where $E \eta_t^2 = 1$, and $c_0 \eta_t = X_t - E[X_t \mid X_{t-1}, \ldots]$, $c(L) = \sum_{j=0}^{m} c_j L^j$
Suppose that at time $t$, we wish to predict a geometric sum of future $X$’s, namely

$$
y_t \equiv \sum_{j=0}^{\infty} \delta^j X_{t+j} = \frac{1}{1 - \delta L^{-1}} X_t
$$

given knowledge of 𝑋𝑡 , 𝑋𝑡−1 , …


We shall use Eq. (22) to obtain the answer
Using the standard formulas Eq. (14), we have that

$$
\begin{aligned}
g_{yx}(z) &= (1 - \delta z^{-1})^{-1} c(z) c(z^{-1}) \\
g_x(z) &= c(z) c(z^{-1})
\end{aligned}
$$

Then Eq. (22) becomes

$$
b(L) = \left[ \frac{c(L)}{1 - \delta L^{-1}} \right]_+ c(L)^{-1} \tag{23}
$$

In order to evaluate the term in the annihilation operator, we use the following result from
[55]
Proposition Let

• $g(z) = \sum_{j=0}^{\infty} g_j z^j$ where $\sum_{j=0}^{\infty} |g_j|^2 < +\infty$
• $h(z^{-1}) = (1 - \delta_1 z^{-1}) \ldots (1 - \delta_n z^{-1})$, where $|\delta_j| < 1$, for $j = 1, \ldots, n$

Then

$$
\left[ \frac{g(z)}{h(z^{-1})} \right]_+ = \frac{g(z)}{h(z^{-1})}
- \sum_{j=1}^{n} \frac{\delta_j g(\delta_j)}{\prod_{k=1, k \neq j}^{n} (\delta_j - \delta_k)}
\left( \frac{1}{z - \delta_j} \right) \tag{24}
$$

and, alternatively,

$$
\left[ \frac{g(z)}{h(z^{-1})} \right]_+ = \sum_{j=1}^{n} B_j
\left( \frac{z g(z) - \delta_j g(\delta_j)}{z - \delta_j} \right) \tag{25}
$$

where $B_j = 1 / \prod_{k=1, k \neq j}^{n} (1 - \delta_k / \delta_j)$

Applying formula Eq. (25) of the proposition to evaluating Eq. (23) with $g(z) = c(z)$ and
$h(z^{-1}) = 1 - \delta z^{-1}$ gives

$$
b(L) = \left[ \frac{L c(L) - \delta c(\delta)}{L - \delta} \right] c(L)^{-1}
$$

or

$$
b(L) = \left[ \frac{1 - \delta c(\delta) L^{-1} c(L)^{-1}}{1 - \delta L^{-1}} \right]
$$

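Thus, we have

$$
E \left[ \sum_{j=0}^{\infty} \delta^j X_{t+j} \,\Big|\, X_t, X_{t-1}, \ldots \right]
= \left[ \frac{1 - \delta c(\delta) L^{-1} c(L)^{-1}}{1 - \delta L^{-1}} \right] X_t \tag{26}
$$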
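
For a first order moving average this formula can be checked by hand
Since $X_t = c_0 \eta_t + c_1 \eta_{t-1}$, the forecast of the geometric sum is $X_t + \delta E[X_{t+1} \mid X_t, \ldots] = (c_0 + \delta c_1) \eta_t + c_1 \eta_{t-1}$
The sketch below confirms that the polynomial $(L c(L) - \delta c(\delta)) / (L - \delta)$ has exactly these coefficients, using illustrative values for $c$ and $\delta$ that are assumptions of this example

```python
import numpy as np

c = np.array([2.0, -1.0])   # illustrative Wold coefficients c_0, c_1
δ = 0.9                     # illustrative discount factor

c_at_δ = np.polyval(c[::-1], δ)           # c(δ) = c_0 + c_1 δ

# Numerator L c(L) - δ c(δ), with coefficients in descending powers of L
numerator = np.concatenate([c[::-1], [0.0]])
numerator[-1] -= δ * c_at_δ

# Divide by (L - δ); the division should be exact
quotient, remainder = np.polydiv(numerator, [1.0, -δ])

# Coefficients implied by forecasting directly, in descending powers of L
direct = np.array([c[1], c[0] + δ * c[1]])
print(quotient, direct)
```

The quotient gives the weights on $\eta_{t-1}$ and $\eta_t$, so applying it to $\eta_t = c(L)^{-1} X_t$ recovers the forecast of the geometric sum for this special case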

This formula is useful in solving stochastic versions of problem 1 of lecture Classical Control
with Linear Algebra in which the randomness emerges because $\{a_t\}$ is a stochastic process
The problem is to maximize

$$
E_0 \lim_{N \to \infty} \sum_{t=0}^{N} \beta^t \left[ a_t y_t - \frac{1}{2} h y_t^2 - \frac{1}{2} [d(L) y_t]^2 \right] \tag{27}
$$

where $E_t$ is mathematical expectation conditioned on information known at $t$, and where $\{a_t\}$
is a covariance stationary stochastic process with Wold moving average representation

$$
a_t = c(L)\, \eta_t
$$

where

$$
c(L) = \sum_{j=0}^{\tilde n} c_j L^j
$$

and

$$
\eta_t = a_t - E[a_t \mid a_{t-1}, \ldots]
$$

The problem is to maximize Eq. (27) with respect to a contingency plan expressing $y_t$ as a
function of information known at $t$, which is assumed to be $(y_{t-1}, y_{t-2}, \ldots, a_t, a_{t-1}, \ldots)$
The solution of this problem can be achieved in two steps
The solution of this problem can be achieved in two steps

First, ignoring the uncertainty, we can solve the problem assuming that $\{a_t\}$ is a known sequence
The solution is, from above,

$$
c(L) y_t = c(\beta L^{-1})^{-1} a_t
$$

or

$$
(1 - \lambda_1 L) \ldots (1 - \lambda_m L) y_t = \sum_{j=1}^{m} A_j \sum_{k=0}^{\infty} (\lambda_j \beta)^k a_{t+k} \tag{28}
$$

Second, the solution of the problem under uncertainty is obtained by replacing the terms on
the right-hand side of the above expressions with their linear least squares predictors
Using Eq. (26) and Eq. (28), we have the following solution

$$
(1 - \lambda_1 L) \ldots (1 - \lambda_m L) y_t = \sum_{j=1}^{m} A_j
\left[ \frac{1 - \beta \lambda_j c(\beta \lambda_j) L^{-1} c(L)^{-1}}{1 - \beta \lambda_j L^{-1}} \right] a_t
$$

Blaschke factors
The following is a useful piece of mathematics underlying “root flipping”
𝑚
Let 𝜋(𝑧) = ∑𝑗=0 𝜋𝑗 𝑧𝑗 and let 𝑧1 , … , 𝑧𝑘 be the zeros of 𝜋(𝑧) that are inside the unit circle,
𝑘<𝑚
Then define

(𝑧1 𝑧 − 1) (𝑧 𝑧 − 1) (𝑧 𝑧 − 1)
𝜃(𝑧) = 𝜋(𝑧)( )( 2 )…( 𝑘 )
(𝑧 − 𝑧1 ) (𝑧 − 𝑧2 ) (𝑧 − 𝑧𝑘 )

The term multiplying 𝜋(𝑧) is termed a “Blaschke factor”


Then it can be proved directly that

𝜃(𝑧 −1 )𝜃(𝑧) = 𝜋(𝑧 −1 )𝜋(𝑧)

and that the zeros of 𝜃(𝑧) are not inside the unit circle
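The identity 𝜃(𝑧−1 )𝜃(𝑧) = 𝜋(𝑧−1 )𝜋(𝑧) can be checked numerically. The following is a minimal sketch, assuming an illustrative polynomial 𝜋(𝑧) = (𝑧 − 0.5)(𝑧 − 2) chosen for this example (it is not taken from the lecture), which has one zero inside the unit circle

```python
import numpy as np

# Illustrative polynomial (an assumption for this sketch):
# pi(z) = 1 - 2.5 z + z^2 = (z - 0.5)(z - 2), one zero inside the unit circle
pi_coeffs = np.array([1.0, -2.5, 1.0])   # coefficients of z^0, z^1, z^2

def pi_poly(z):
    # np.polyval expects the highest power first, so reverse the coefficients
    return np.polyval(pi_coeffs[::-1], z)

zeros = np.roots(pi_coeffs[::-1])
inside = zeros[np.abs(zeros) < 1]        # zeros to be "flipped"

def theta(z):
    # Multiply pi(z) by one Blaschke factor per zero inside the unit circle
    out = pi_poly(z)
    for zk in inside:
        out = out * (zk * z - 1) / (z - zk)
    return out

# Compare theta(1/z) theta(z) with pi(1/z) pi(z) at points on the unit circle
z = np.exp(1j * np.linspace(0.1, 3.0, 7))
lhs = theta(1 / z) * theta(z)
rhs = pi_poly(1 / z) * pi_poly(z)
identity_holds = np.allclose(lhs, rhs)
```

In this example 𝜃(𝑧) simplifies to 0.5(𝑧 − 2)², so its only zero lies outside the unit circle, while the product 𝜃(𝑧−1 )𝜃(𝑧) is unchanged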

71.6 Exercises

71.6.1 Exercise 1

Let 𝑌𝑡 = (1 − 2𝐿)𝑢𝑡 where 𝑢𝑡 is a mean zero white noise with E𝑢2𝑡 = 1. Let

𝑋𝑡 = 𝑌 𝑡 + 𝜀 𝑡

where 𝜀𝑡 is a serially uncorrelated white noise with E𝜀2𝑡 = 9, and E𝜀𝑡 𝑢𝑠 = 0 for all 𝑡 and 𝑠
Find the Wold moving average representation for 𝑋𝑡

Find a formula for the $A_{1j}$’s in

$$
\hat{\mathbb{E}} [X_{t+1} \mid X_t, X_{t-1}, \ldots] = \sum_{j=0}^{\infty} A_{1j} X_{t-j}
$$

Find a formula for the $A_{2j}$’s in

$$
\mathbb{E} [X_{t+2} \mid X_t, X_{t-1}, \ldots] = \sum_{j=0}^{\infty} A_{2j} X_{t-j}
$$

71.6.2 Exercise 2

Multivariable Prediction: Let 𝑌𝑡 be an (𝑛 × 1) vector stochastic process with moving average representation

𝑌𝑡 = 𝐷(𝐿)𝑈𝑡
where $D(L) = \sum_{j=0}^{m} D_j L^j$, $D_j$ an $n \times n$ matrix, $U_t$ an $(n \times 1)$ vector white noise with $\mathbb{E} U_t = 0$
for all $t$, $\mathbb{E} U_t U_s' = 0$ for all $s \neq t$, and $\mathbb{E} U_t U_t' = I$ for all $t$
Let 𝜀𝑡 be an 𝑛 × 1 vector white noise with mean 0 and contemporaneous covariance matrix 𝐻,
where 𝐻 is a positive definite matrix
Let 𝑋𝑡 = 𝑌𝑡 + 𝜀𝑡
Define the covariograms as $C_X(\tau) = \mathbb{E} X_t X_{t-\tau}'$, $C_Y(\tau) = \mathbb{E} Y_t Y_{t-\tau}'$, $C_{YX}(\tau) = \mathbb{E} Y_t X_{t-\tau}'$

Then define the matrix covariance generating function, as in (21), only interpret all the objects in (21) as matrices
Show that the covariance generating functions are given by

$$
\begin{aligned}
g_Y(z) &= D(z) D(z^{-1})' \\
g_X(z) &= D(z) D(z^{-1})' + H \\
g_{YX}(z) &= D(z) D(z^{-1})'
\end{aligned}
$$

A factorization of 𝑔𝑋 (𝑧) can be found (see [111] or [134]) of the form

$$
D(z) D(z^{-1})' + H = C(z) C(z^{-1})', \quad C(z) = \sum_{j=0}^{m} C_j z^j
$$

where the zeros of |𝐶(𝑧)| do not lie inside the unit circle
A vector Wold moving average representation of 𝑋𝑡 is then

𝑋𝑡 = 𝐶(𝐿)𝜂𝑡

where 𝜂𝑡 is an (𝑛 × 1) vector white noise that is “fundamental” for 𝑋𝑡


That is, 𝑋𝑡 − E [𝑋𝑡 ∣ 𝑋𝑡−1 , 𝑋𝑡−2 …] = 𝐶0 𝜂𝑡
The optimum predictor of 𝑋𝑡+𝑗 is

$$
\mathbb{E} [X_{t+j} \mid X_t, X_{t-1}, \ldots] = \left[ \frac{C(L)}{L^j} \right]_+ \eta_t
$$

If 𝐶(𝐿) is invertible, i.e., if the zeros of det 𝐶(𝑧) lie strictly outside the unit circle, then this
formula can be written

$$
\mathbb{E} [X_{t+j} \mid X_t, X_{t-1}, \ldots] = \left[ \frac{C(L)}{L^j} \right]_+ C(L)^{-1} X_t
$$
Part XI

Asset Pricing and Finance

72 Asset Pricing I: Finite State Models

72.1 Contents

• Overview 72.2
• Pricing Models 72.3
• Prices in the Risk-Neutral Case 72.4
• Asset Prices under Risk Aversion 72.5
• Exercises 72.6
• Solutions 72.7

“A little knowledge of geometric series goes a long way” – Robert E. Lucas, Jr.

“Asset pricing is all about covariances” – Lars Peter Hansen

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

72.2 Overview

An asset is a claim on one or more future payoffs


The spot price of an asset depends primarily on

• the anticipated dynamics for the stream of income accruing to the owners
• attitudes to risk
• rates of time preference

In this lecture, we consider some standard pricing models and dividend stream specifications
We study how prices and dividend-price ratios respond in these different scenarios
We also look at creating and pricing derivative assets by repackaging income streams
Key tools for the lecture are


• formulas for predicting future values of functions of a Markov state


• a formula for predicting the discounted sum of future values of a Markov state

72.3 Pricing Models

In what follows let {𝑑𝑡 }𝑡≥0 be a stream of dividends

• A time-𝑡 cum-dividend asset is a claim to the stream 𝑑𝑡 , 𝑑𝑡+1 , …


• A time-𝑡 ex-dividend asset is a claim to the stream 𝑑𝑡+1 , 𝑑𝑡+2 , …

Let’s look at some equations that we expect to hold for prices of assets under ex-dividend
contracts (we will consider cum-dividend pricing in the exercises)

72.3.1 Risk-Neutral Pricing

Our first scenario is risk-neutral pricing


Let 𝛽 = 1/(1 + 𝜌) be an intertemporal discount factor, where 𝜌 is the rate at which agents
discount the future
The basic risk-neutral asset pricing equation for pricing one unit of an ex-dividend asset is

𝑝𝑡 = 𝛽E𝑡 [𝑑𝑡+1 + 𝑝𝑡+1 ] (1)

This is a simple “cost equals expected benefit” relationship


Here E𝑡 [𝑦] denotes the best forecast of 𝑦, conditioned on information available at time 𝑡

72.3.2 Pricing with Random Discount Factor

What happens if for some reason traders discount payouts differently depending on the state
of the world?
Michael Harrison and David Kreps [62] and Lars Peter Hansen and Scott Richard [54] showed
that in quite general settings the price of an ex-dividend asset obeys

𝑝𝑡 = E𝑡 [𝑚𝑡+1 (𝑑𝑡+1 + 𝑝𝑡+1 )] (2)

for some stochastic discount factor 𝑚𝑡+1


The fixed discount factor 𝛽 in Eq. (1) has been replaced by the random variable 𝑚𝑡+1
The way anticipated future payoffs are evaluated can now depend on various random outcomes
One example of this idea is that assets that tend to have good payoffs in bad states of the
world might be regarded as more valuable
This is because they pay well when the funds are more urgently needed
We give examples of how the stochastic discount factor has been modeled below

72.3.3 Asset Pricing and Covariances

Recall that, from the definition of a conditional covariance cov𝑡 (𝑥𝑡+1 , 𝑦𝑡+1 ), we have

E𝑡 (𝑥𝑡+1 𝑦𝑡+1 ) = cov𝑡 (𝑥𝑡+1 , 𝑦𝑡+1 ) + E𝑡 𝑥𝑡+1 E𝑡 𝑦𝑡+1 (3)

If we apply this definition to the asset pricing equation Eq. (2) we obtain

𝑝𝑡 = E𝑡 𝑚𝑡+1 E𝑡 (𝑑𝑡+1 + 𝑝𝑡+1 ) + cov𝑡 (𝑚𝑡+1 , 𝑑𝑡+1 + 𝑝𝑡+1 ) (4)

It is useful to regard equation Eq. (4) as a generalization of equation Eq. (1)

• In equation Eq. (1), the stochastic discount factor 𝑚𝑡+1 = 𝛽, a constant


• In equation Eq. (1), the covariance term cov𝑡 (𝑚𝑡+1 , 𝑑𝑡+1 + 𝑝𝑡+1 ) is zero because 𝑚𝑡+1 = 𝛽

Equation Eq. (4) asserts that the covariance of the stochastic discount factor with the one
period payout 𝑑𝑡+1 + 𝑝𝑡+1 is an important determinant of the price 𝑝𝑡
We give examples of some models of stochastic discount factors that have been proposed later
in this lecture and also in a later lecture
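The decomposition in Eq. (3) can be illustrated with sample moments. The sketch below is only an illustration: the simulated discount factor and payout are hypothetical draws, and unconditional sample averages stand in for the conditional expectations in the text

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical draws of a stochastic discount factor and a one-period payout
m = 0.95 + 0.05 * rng.standard_normal(n)
payout = 1.0 - 2.0 * (m - 0.95) + 0.1 * rng.standard_normal(n)

# Left side of Eq. (3): the mean of the product
lhs = np.mean(m * payout)

# Right side: covariance plus product of means (bias=True divides by n)
cov = np.cov(m, payout, bias=True)[0, 1]
rhs = cov + np.mean(m) * np.mean(payout)
```

Here the payout is constructed to be low when the discount factor is high, so the covariance term is negative and the implied price is below the product of the two means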

72.3.4 The Price-Dividend Ratio

Aside from prices, another quantity of interest is the price-dividend ratio 𝑣𝑡 ∶= 𝑝𝑡 /𝑑𝑡
Let’s write down an expression that this ratio should satisfy
We can divide both sides of Eq. (2) by 𝑑𝑡 to get

$$
v_t = \mathbb{E}_t \left[ m_{t+1} \frac{d_{t+1}}{d_t} (1 + v_{t+1}) \right]
\tag{5}
$$

Below we’ll discuss the implication of this equation

72.4 Prices in the Risk-Neutral Case

What can we say about price dynamics on the basis of the models described above?
The answer to this question depends on

1. the process we specify for dividends


2. the stochastic discount factor and how it correlates with dividends

For now let’s focus on the risk-neutral case, where the stochastic discount factor is constant,
and study how prices depend on the dividend process

72.4.1 Example 1: Constant Dividends

The simplest case is risk-neutral pricing in the face of a constant, non-random dividend
stream 𝑑𝑡 = 𝑑 > 0
Removing the expectation from Eq. (1) and iterating forward gives

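$$
\begin{aligned}
p_t &= \beta (d + p_{t+1}) \\
    &= \beta (d + \beta (d + p_{t+2})) \\
    &\quad \vdots \\
    &= \beta (d + \beta d + \beta^2 d + \cdots + \beta^{k-2} d + \beta^{k-1} p_{t+k})
\end{aligned}
$$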

Unless prices explode in the future, this sequence converges to

$$
\bar p := \frac{\beta d}{1 - \beta}
\tag{6}
$$

This price is the equilibrium price in the constant dividend case


Indeed, simple algebra shows that setting 𝑝𝑡 = 𝑝̄ for all 𝑡 satisfies the equilibrium condition
𝑝𝑡 = 𝛽(𝑑 + 𝑝𝑡+1 )
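The convergence claim can be checked by iterating the pricing equation directly. This is a small sketch with illustrative parameter values (the numbers are assumptions, not values used in the lecture)

```python
β, d = 0.96, 1.0          # illustrative values
p_bar = β * d / (1 - β)   # closed form from Eq. (6)

# Iterate p <- β(d + p) starting from a terminal price of zero,
# which corresponds to ruling out exploding prices
p = 0.0
for _ in range(1000):
    p = β * (d + p)

gap = abs(p - p_bar)                           # distance from the closed form
fixed_point_gap = abs(p_bar - β * (d + p_bar)) # p_bar solves the equation
```

Both gaps are zero up to floating point error, which is consistent with 𝑝̄ being the limit of the sequence and a solution of the equilibrium condition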

72.4.2 Example 2: Dividends with Deterministic Growth Paths

Consider a growing, non-random dividend process 𝑑𝑡+1 = 𝑔𝑑𝑡 where 0 < 𝑔𝛽 < 1
While prices are not usually constant when dividends grow over time, the price-dividend ratio
might be
If we guess this, substituting 𝑣𝑡 = 𝑣 into Eq. (5) as well as our other assumptions, we get
𝑣 = 𝛽𝑔(1 + 𝑣)
Since 𝛽𝑔 < 1, we have a unique positive solution:

$$
v = \frac{\beta g}{1 - \beta g}
$$

The price is then

$$
p_t = \frac{\beta g}{1 - \beta g} d_t
$$

If, in this example, we take 𝑔 = 1 + 𝜅 and let 𝜌 ∶= 1/𝛽 − 1, then the price becomes

$$
p_t = \frac{1 + \kappa}{\rho - \kappa} d_t
$$

This is called the Gordon formula
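The equivalence between the two expressions for the price follows from substituting 𝛽 = 1/(1 + 𝜌) and 𝑔 = 1 + 𝜅. A quick numerical check, using illustrative values of 𝜌 and 𝜅 that are assumptions for this sketch, is

```python
ρ, κ = 0.05, 0.02              # illustrative discount and growth rates, κ < ρ
β, g = 1 / (1 + ρ), 1 + κ

v_ratio = β * g / (1 - β * g)  # coefficient on d_t from v = βg / (1 - βg)
v_gordon = (1 + κ) / (ρ - κ)   # coefficient on d_t in the Gordon formula
```

The condition 0 < 𝑔𝛽 < 1 is equivalent to 𝜅 < 𝜌 here, which keeps the denominator of the Gordon formula positive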

72.4.3 Example 3: Markov Growth, Risk-Neutral Pricing

Next, we consider a dividend process



𝑑𝑡+1 = 𝑔𝑡+1 𝑑𝑡 (7)

The stochastic growth factor {𝑔𝑡 } is given by

𝑔𝑡 = 𝑔(𝑋𝑡 ), 𝑡 = 1, 2, …

where

1. {𝑋𝑡 } is a finite Markov chain with state space 𝑆 and transition probabilities

𝑃 (𝑥, 𝑦) ∶= P{𝑋𝑡+1 = 𝑦 | 𝑋𝑡 = 𝑥} (𝑥, 𝑦 ∈ 𝑆)

2. 𝑔 is a given function on 𝑆 taking positive values

You can think of

• 𝑆 as 𝑛 possible “states of the world” and 𝑋𝑡 as the current state


• 𝑔 as a function that maps a given state 𝑋𝑡 into a growth factor 𝑔𝑡 = 𝑔(𝑋𝑡 ) for the endowment
• ln 𝑔𝑡+1 = ln(𝑑𝑡+1 /𝑑𝑡 ) as the growth rate of dividends

(For a refresher on notation and theory for finite Markov chains see this lecture)
The next figure shows a simulation, where

• {𝑋𝑡 } evolves as a discretized AR1 process produced using Tauchen’s method


• 𝑔𝑡 = exp(𝑋𝑡 ), so that ln 𝑔𝑡 = 𝑋𝑡 is the growth rate

In [2]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline
import quantecon as qe

mc = qe.tauchen(0.96, 0.25, n=25)


sim_length = 80

x_series = mc.simulate(sim_length, init=np.median(mc.state_values))


g_series = np.exp(x_series)
d_series = np.cumprod(g_series) # assumes d_0 = 1

series = [x_series, g_series, d_series, np.log(d_series)]


labels = ['$X_t$', '$g_t$', '$d_t$', r'$\log \, d_t$']

fig, axes = plt.subplots(2, 2, figsize=(12, 8))


for ax, s, label in zip(axes.flatten(), series, labels):
    ax.plot(s, 'b-', lw=2, label=label)
    ax.legend(loc='upper left', frameon=False)
plt.tight_layout()
plt.show()

Pricing
To obtain asset prices in this setting, let’s adapt our analysis from the case of deterministic
growth
In that case, we found that 𝑣 is constant
This encourages us to guess that, in the current case, 𝑣𝑡 is constant given the state 𝑋𝑡
In other words, we are looking for a fixed function 𝑣 such that the price-dividend ratio satisfies 𝑣𝑡 = 𝑣(𝑋𝑡 )
We can substitute this guess into Eq. (5) to get

𝑣(𝑋𝑡 ) = 𝛽E𝑡 [𝑔(𝑋𝑡+1 )(1 + 𝑣(𝑋𝑡+1 ))]

If we condition on 𝑋𝑡 = 𝑥, this becomes

$$
v(x) = \beta \sum_{y \in S} g(y) (1 + v(y)) P(x, y)
$$

or

$$
v(x) = \beta \sum_{y \in S} K(x, y) (1 + v(y))
\quad \text{where} \quad
K(x, y) := g(y) P(x, y)
\tag{8}
$$

Suppose that there are 𝑛 possible states 𝑥1 , … , 𝑥𝑛


We can then think of Eq. (8) as 𝑛 stacked equations, one for each state, and write it in matrix form as

𝑣 = 𝛽𝐾(1 + 𝑣) (9)

Here

• 𝑣 is understood to be the column vector (𝑣(𝑥1 ), … , 𝑣(𝑥𝑛 ))′


• 𝐾 is the matrix (𝐾(𝑥𝑖 , 𝑥𝑗 ))1≤𝑖,𝑗≤𝑛
• 1 is a column vector of ones

When does Eq. (9) have a unique solution?


From the Neumann series lemma and Gelfand’s formula, this will be the case if 𝛽𝐾 has spectral radius strictly less than one
In other words, we require that the eigenvalues of 𝐾 be strictly less than 𝛽 −1 in modulus
The solution is then

𝑣 = (𝐼 − 𝛽𝐾)−1 𝛽𝐾1 (10)
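To see how Eq. (10) relates to the Neumann series, the sketch below solves a small two-state example both directly and by successive approximation. The transition matrix and growth factors are hypothetical values chosen for illustration

```python
import numpy as np

β = 0.9
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])        # hypothetical two-state transition matrix
g = np.array([0.98, 1.03])       # hypothetical growth factors g(x_1), g(x_2)
K = P * g                        # K[i, j] = g(x_j) P[i, j] via broadcasting

# Spectral radius condition for a unique solution
spectral_radius = np.max(np.abs(np.linalg.eigvals(β * K)))

# Direct solution from Eq. (10)
ones = np.ones(2)
v_direct = np.linalg.solve(np.eye(2) - β * K, β * K @ ones)

# Successive approximation v <- βK(1 + v), which sums the Neumann series
v_iter = np.zeros(2)
for _ in range(2000):
    v_iter = β * K @ (ones + v_iter)
```

When the spectral radius of 𝛽𝐾 is below one, the iteration converges to the same vector as the direct linear solve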

72.4.4 Code

Let’s calculate and plot the price-dividend ratio at a set of parameters


As before, we’ll generate {𝑋𝑡 } as a discretized AR1 process and set 𝑔𝑡 = exp(𝑋𝑡 )
Here’s the code, including a test of the spectral radius condition

In [3]: from numpy.linalg import eigvals, solve

n = 25 # size of state space


β = 0.9
mc = qe.tauchen(0.96, 0.02, n=n)

K = mc.P * np.exp(mc.state_values)

warning_message = "Spectral radius condition fails"


assert np.max(np.abs(eigvals(K))) < 1 / β, warning_message

I = np.identity(n)
v = solve(I - β * K, β * K @ np.ones(n))

fig, ax = plt.subplots(figsize=(12, 8))


ax.plot(mc.state_values, v, 'g-o', lw=2, alpha=0.7, label='$v$')
ax.set_ylabel("price-dividend ratio")
ax.set_xlabel("state")
ax.legend(loc='upper left')
plt.show()

Why does the price-dividend ratio increase with the state?


The reason is that this Markov process is positively correlated, so high current states suggest
high future states
Moreover, dividend growth is increasing in the state
The anticipation of high future dividend growth leads to a high price-dividend ratio

72.5 Asset Prices under Risk Aversion

Now let’s turn to the case where agents are risk averse
We’ll price several distinct assets, including

• The price of an endowment stream


• A consol (a type of bond issued by the UK government in the 19th century)
• Call options on a consol

72.5.1 Pricing a Lucas Tree

Let’s start with a version of the celebrated asset pricing model of Robert E. Lucas, Jr. [88]
As in [88], suppose that the stochastic discount factor takes the form

$$
m_{t+1} = \beta \frac{u'(c_{t+1})}{u'(c_t)}
\tag{11}
$$

where 𝑢 is a concave utility function and 𝑐𝑡 is time 𝑡 consumption of a representative consumer

(A derivation of this expression is given in a later lecture)


Assume the existence of an endowment that follows Eq. (7)
The asset being priced is a claim on the endowment process
Following [88], suppose further that in equilibrium, consumption is equal to the endowment,
so that 𝑑𝑡 = 𝑐𝑡 for all 𝑡
For utility, we’ll assume the constant relative risk aversion (CRRA) specification

$$
u(c) = \frac{c^{1-\gamma}}{1 - \gamma} \quad \text{with} \quad \gamma > 0
\tag{12}
$$

When 𝛾 = 1 we let 𝑢(𝑐) = ln 𝑐


Inserting the CRRA specification into Eq. (11) and using 𝑐𝑡 = 𝑑𝑡 gives

$$
m_{t+1} = \beta \left( \frac{c_{t+1}}{c_t} \right)^{-\gamma} = \beta g_{t+1}^{-\gamma}
\tag{13}
$$

Substituting this into Eq. (5) gives the price-dividend ratio formula

$$
v(X_t) = \beta \mathbb{E}_t \left[ g(X_{t+1})^{1-\gamma} (1 + v(X_{t+1})) \right]
$$

Conditioning on 𝑋𝑡 = 𝑥, we can write this as

$$
v(x) = \beta \sum_{y \in S} g(y)^{1-\gamma} (1 + v(y)) P(x, y)
$$

If we let

$$
J(x, y) := g(y)^{1-\gamma} P(x, y)
$$

then we can rewrite in vector form as

𝑣 = 𝛽𝐽 (1 + 𝑣)

Assuming that the spectral radius of 𝐽 is strictly less than 𝛽 −1 , this equation has the unique
solution

𝑣 = (𝐼 − 𝛽𝐽 )−1 𝛽𝐽 1 (14)

We will define a function tree_price to solve for 𝑣 given parameters stored in the class AssetPriceModel

In [4]: class AssetPriceModel:
    """
    A class that stores the primitives of the asset pricing model.

    Parameters
    ----------
    β : scalar, float
        Discount factor
    mc : MarkovChain
        Contains the transition matrix and set of state values for the state
        process
    γ : scalar(float)
        Coefficient of risk aversion
    g : callable
        The function mapping states to growth rates

    """
    def __init__(self, β=0.96, mc=None, γ=2.0, g=np.exp):
        self.β, self.γ = β, γ
        self.g = g

        # == A default process for the Markov chain == #
        if mc is None:
            self.ρ = 0.9
            self.σ = 0.02
            self.mc = qe.tauchen(self.ρ, self.σ, n=25)
        else:
            self.mc = mc

        self.n = self.mc.P.shape[0]

    def test_stability(self, Q):
        """
        Stability test for a given matrix Q.
        """
        sr = np.max(np.abs(eigvals(Q)))
        if not sr < 1 / self.β:
            msg = f"Spectral radius condition failed with radius = {sr}"
            raise ValueError(msg)

def tree_price(ap):
    """
    Computes the price-dividend ratio of the Lucas tree.

    Parameters
    ----------
    ap: AssetPriceModel
        An instance of AssetPriceModel containing primitives

    Returns
    -------
    v : array_like(float)
        Lucas tree price-dividend ratio

    """
    # == Simplify names, set up matrices == #
    β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
    J = P * ap.g(y)**(1 - γ)

    # == Make sure that a unique solution exists == #
    ap.test_stability(J)

    # == Compute v == #
    I = np.identity(ap.n)
    Ones = np.ones(ap.n)
    v = solve(I - β * J, β * J @ Ones)

    return v

Here’s a plot of 𝑣 as a function of the state for several values of 𝛾, with a positively correlated
Markov process and 𝑔(𝑥) = exp(𝑥)

In [5]: γs = [1.2, 1.4, 1.6, 1.8, 2.0]


ap = AssetPriceModel()
states = ap.mc.state_values

fig, ax = plt.subplots(figsize=(12, 8))


for γ in γs:
    ap.γ = γ
    v = tree_price(ap)
    ax.plot(states, v, lw=2, alpha=0.6, label=rf"$\gamma = {γ}$")

ax.set_title('Price-dividend ratio as a function of the state')


ax.set_ylabel("price-dividend ratio")
ax.set_xlabel("state")
ax.legend(loc='upper right')
plt.show()

Notice that 𝑣 is decreasing in each case


This is because, with a positively correlated state process, higher states suggest higher future
consumption growth
In the stochastic discount factor Eq. (13), higher growth decreases the discount factor, lowering the weight placed on future returns
Special Cases
In the special case 𝛾 = 1, we have 𝐽 = 𝑃
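Recalling that $P^i \mathbb{1} = \mathbb{1}$ for all $i$ and applying Neumann’s geometric series lemma, we are led
to

$$
v = \beta (I - \beta P)^{-1} \mathbb{1}
= \beta \sum_{i=0}^{\infty} \beta^i P^i \mathbb{1}
= \beta \frac{1}{1 - \beta} \mathbb{1}
$$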

Thus, with log preferences, the price-dividend ratio for a Lucas tree is constant
Alternatively, if 𝛾 = 0, then 𝐽 = 𝐾 and we recover the risk-neutral solution Eq. (10)
This is as expected, since 𝛾 = 0 implies 𝑢(𝑐) = 𝑐 (and hence agents are risk-neutral)
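The log-preference case can be verified numerically. The sketch below assumes an arbitrary made-up stochastic matrix 𝑃, since the result should not depend on which transition matrix is used

```python
import numpy as np

β = 0.95
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])       # made-up stochastic matrix, rows sum to one
ones = np.ones(2)

# With γ = 1 we have J = P, so Eq. (14) reads v = (I - βP)^{-1} βP 1
v_log = np.linalg.solve(np.eye(2) - β * P, β * P @ ones)

# The claimed constant price-dividend ratio
v_const = β / (1 - β)
```

Every entry of the solution equals 𝛽/(1 − 𝛽), so the price-dividend ratio does not vary with the state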

72.5.2 A Risk-Free Consol

Consider the same pure exchange representative agent economy


A risk-free consol promises to pay a constant amount 𝜁 > 0 each period
Recycling notation, let 𝑝𝑡 now be the price of an ex-coupon claim to the consol
An ex-coupon claim to the consol entitles the owner at the end of period 𝑡 to

• 𝜁 in period 𝑡 + 1, plus
• the right to sell the claim for 𝑝𝑡+1 next period

The price satisfies Eq. (2) with 𝑑𝑡 = 𝜁, or

𝑝𝑡 = E𝑡 [𝑚𝑡+1 (𝜁 + 𝑝𝑡+1 )]

We maintain the stochastic discount factor Eq. (13), so this becomes

$$
p_t = \mathbb{E}_t \left[ \beta g_{t+1}^{-\gamma} (\zeta + p_{t+1}) \right]
\tag{15}
$$

Guessing a solution of the form 𝑝𝑡 = 𝑝(𝑋𝑡 ) and conditioning on 𝑋𝑡 = 𝑥, we get

$$
p(x) = \beta \sum_{y \in S} g(y)^{-\gamma} (\zeta + p(y)) P(x, y)
$$

Letting $M(x, y) = P(x, y) g(y)^{-\gamma}$ and rewriting in vector notation yields the solution

$$
p = (I - \beta M)^{-1} \beta M \zeta \mathbb{1}
\tag{16}
$$

The above is implemented in the function consol_price

In [6]: def consol_price(ap, ζ):
    """
    Computes price of a consol bond with payoff ζ

    Parameters
    ----------
    ap: AssetPriceModel
        An instance of AssetPriceModel containing primitives

    ζ : scalar(float)
        Coupon of the consol

    Returns
    -------
    p : array_like(float)
        Consol bond prices

    """
    # == Simplify names, set up matrices == #
    β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
    M = P * ap.g(y)**(- γ)

    # == Make sure that a unique solution exists == #
    ap.test_stability(M)

    # == Compute price == #
    I = np.identity(ap.n)
    Ones = np.ones(ap.n)
    p = solve(I - β * M, β * ζ * M @ Ones)

    return p

72.5.3 Pricing an Option to Purchase the Consol

Let’s now price options of varying maturity that give the right to purchase a consol at a price
𝑝𝑆
An Infinite Horizon Call Option
We want to price an infinite horizon option to purchase a consol at a price 𝑝𝑆
The option entitles the owner at the beginning of a period either to

1. purchase the bond at price 𝑝𝑆 now, or


2. not exercise the option now but retain the right to exercise it later

Thus, the owner either exercises the option now or chooses not to exercise and wait until next
period
This is termed an infinite-horizon call option with strike price 𝑝𝑆
The owner of the option is entitled to purchase the consol at the price 𝑝𝑆 at the beginning of
any period, after the coupon has been paid to the previous owner of the bond
The fundamentals of the economy are identical with the one above, including the stochastic
discount factor and the process for consumption
Let 𝑤(𝑋𝑡 , 𝑝𝑆 ) be the value of the option when the time 𝑡 growth state is known to be 𝑋𝑡 but
before the owner has decided whether or not to exercise the option at time 𝑡 (i.e., today)
Recalling that 𝑝(𝑋𝑡 ) is the value of the consol when the initial growth state is 𝑋𝑡 , the value
of the option satisfies

$$
w(X_t, p_S) = \max \left\{
\beta \, \mathbb{E}_t \frac{u'(c_{t+1})}{u'(c_t)} w(X_{t+1}, p_S), \;
p(X_t) - p_S
\right\}
$$

The first term on the right is the value of waiting, while the second is the value of exercising
now
We can also write this as

$$
w(x, p_S) = \max \left\{
\beta \sum_{y \in S} P(x, y) g(y)^{-\gamma} w(y, p_S), \;
p(x) - p_S
\right\}
\tag{17}
$$

With $M(x, y) = P(x, y) g(y)^{-\gamma}$ and $w$ as the vector of values $(w(x_i, p_S))_{i=1}^{n}$, we can express
Eq. (17) as the nonlinear vector equation

𝑤 = max{𝛽𝑀 𝑤, 𝑝 − 𝑝𝑆 1} (18)

To solve Eq. (18), form the operator 𝑇 mapping vector 𝑤 into vector 𝑇 𝑤 via

𝑇 𝑤 = max{𝛽𝑀 𝑤, 𝑝 − 𝑝𝑆 1}

Start at some initial 𝑤 and iterate to convergence with 𝑇


We can find the solution with the following function call_option

In [7]: def call_option(ap, ζ, p_s, ϵ=1e-7):
    """
    Computes price of a call option on a consol bond.

    Parameters
    ----------
    ap: AssetPriceModel
        An instance of AssetPriceModel containing primitives

    ζ : scalar(float)
        Coupon of the consol

    p_s : scalar(float)
        Strike price

    ϵ : scalar(float), optional(default=1e-7)
        Tolerance for infinite horizon problem

    Returns
    -------
    w : array_like(float)
        Infinite horizon call option prices

    """
    # == Simplify names, set up matrices == #
    β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
    M = P * ap.g(y)**(- γ)

    # == Make sure that a unique consol price exists == #
    ap.test_stability(M)

    # == Compute option price == #
    p = consol_price(ap, ζ)
    w = np.zeros(ap.n)
    error = ϵ + 1
    while error > ϵ:
        # == Elementwise max of continuation value and exercise value == #
        w_new = np.maximum(β * M @ w, p - p_s)
        # == Find maximal difference of each component and update == #
        error = np.amax(np.abs(w - w_new))
        w = w_new

    return w

Here’s a plot of 𝑤 compared to the consol price when 𝑝𝑆 = 40

In [8]: ap = AssetPriceModel(β=0.9)
ζ = 1.0
strike_price = 40

x = ap.mc.state_values
p = consol_price(ap, ζ)
w = call_option(ap, ζ, strike_price)

fig, ax = plt.subplots(figsize=(12, 8))


ax.plot(x, p, 'b-', lw=2, label='consol price')
ax.plot(x, w, 'g-', lw=2, label='value of call option')
ax.set_xlabel("state")
ax.legend(loc='upper right')
plt.show()

In large states, the value of the option is close to zero


This is despite the fact the Markov chain is irreducible and low states — where the consol
prices are high — will eventually be visited
The reason is that 𝛽 = 0.9, so the future is discounted relatively rapidly

72.5.4 Risk-Free Rates

Let’s look at risk-free interest rates over different periods


The One-period Risk-free Interest Rate
As before, the stochastic discount factor is $m_{t+1} = \beta g_{t+1}^{-\gamma}$
It follows that the reciprocal $R_t^{-1}$ of the gross risk-free interest rate $R_t$ in state $x$ is

$$
\mathbb{E}_t m_{t+1} = \beta \sum_{y \in S} P(x, y) g(y)^{-\gamma}
$$

We can write this as

𝑚1 = 𝛽𝑀 1

where the 𝑖-th element of 𝑚1 is the reciprocal of the one-period gross risk-free interest rate in
state 𝑥𝑖
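As a concrete illustration of the formula 𝑚1 = 𝛽𝑀 1, the following sketch computes one-period risk-free rates for a two-state chain. The transition matrix, growth factors and preference parameters are hypothetical values chosen for this example

```python
import numpy as np

β, γ = 0.96, 2.0
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])        # hypothetical transition matrix
g = np.array([0.98, 1.03])       # hypothetical growth factors by state

M = P * g**(-γ)                  # M[i, j] = P[i, j] g(x_j)^(-γ)
m1 = β * M @ np.ones(2)          # reciprocal of the gross one-period rate
R = 1 / m1                       # gross risk-free rate in each state
```

In this example the second state puts more probability on high growth next period, which lowers the expected discount factor and so raises the risk-free rate in that state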
Other Terms
Let $m_j$ be an $n \times 1$ vector whose $i$-th component is the reciprocal of the $j$-period gross risk-free interest rate in state $x_i$

Then $m_1 = \beta M \mathbb{1}$, and $m_{j+1} = \beta M m_j$ for $j \geq 1$

72.6 Exercises

72.6.1 Exercise 1

In the lecture, we considered ex-dividend assets


A cum-dividend asset is a claim to the stream 𝑑𝑡 , 𝑑𝑡+1 , …
Following Eq. (1), find the risk-neutral asset pricing equation for one unit of a cum-dividend
asset
With a constant, non-random dividend stream 𝑑𝑡 = 𝑑 > 0, what is the equilibrium price of a
cum-dividend asset?
With a growing, non-random dividend process 𝑑𝑡+1 = 𝑔𝑑𝑡 where 0 < 𝑔𝛽 < 1, what is the
equilibrium price of a cum-dividend asset?

72.6.2 Exercise 2

Consider the following primitives

In [9]: n = 5
P = 0.0125 * np.ones((n, n))
P += np.diag(0.95 - 0.0125 * np.ones(5))
s = np.array([0.95, 0.975, 1.0, 1.025, 1.05]) # state values of the Markov chain
γ = 2.0
β = 0.94

Let 𝑔 be defined by 𝑔(𝑥) = 𝑥 (that is, 𝑔 is the identity map)


Compute the price of the Lucas tree
Do the same for

• the price of the risk-free consol when 𝜁 = 1


• the call option on the consol when 𝜁 = 1 and 𝑝𝑆 = 150.0

72.6.3 Exercise 3

Let’s consider finite horizon call options, which are more common than the infinite horizon
variety
Finite horizon options obey functional equations closely related to Eq. (17)
A 𝑘 period option expires after 𝑘 periods
If we view today as date zero, a 𝑘 period option gives the owner the right to exercise the option
to purchase the risk-free consol at the strike price 𝑝𝑆 at dates 0, 1, … , 𝑘 − 1
The option expires at time 𝑘
Thus, for 𝑘 = 1, 2, …, let 𝑤(𝑥, 𝑘) be the value of a 𝑘-period option
It obeys
$$
w(x, k) = \max \left\{
\beta \sum_{y \in S} P(x, y) g(y)^{-\gamma} w(y, k-1), \;
p(x) - p_S
\right\}
$$

where 𝑤(𝑥, 0) = 0 for all 𝑥


We can express the preceding as the sequence of nonlinear vector equations

𝑤𝑘 = max{𝛽𝑀 𝑤𝑘−1 , 𝑝 − 𝑝𝑆 1} 𝑘 = 1, 2, … with 𝑤0 = 0

Write a function that computes 𝑤𝑘 for any given 𝑘


Compute the value of the option with k = 5 and k = 25 using parameter values as in Exercise 2
Is one higher than the other? Can you give intuition?

72.7 Solutions

72.7.1 Exercise 1

For a cum-dividend asset, the basic risk-neutral asset pricing equation is

𝑝𝑡 = 𝑑𝑡 + 𝛽E𝑡 [𝑝𝑡+1 ]

With constant dividends, the equilibrium price is

$$
p_t = \frac{1}{1 - \beta} d_t
$$

With a growing, non-random dividend process, the equilibrium price is

$$
p_t = \frac{1}{1 - \beta g} d_t
$$
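These formulas can be checked against a direct discounted sum of the dividend stream. The sketch below uses illustrative parameter values (assumptions for this example) and also compares the result with the ex-dividend price from the lecture

```python
β, d0, g = 0.94, 1.0, 1.02     # illustrative values with 0 < gβ < 1

# Discounted value of the stream d_t, d_{t+1}, ... where d_{t+j} = g^j d_t
p_cum_sum = sum((β * g)**j * d0 for j in range(5000))
p_cum_formula = d0 / (1 - β * g)

# The ex-dividend price omits the current dividend
p_ex_formula = β * g / (1 - β * g) * d0
```

The cum-dividend price exceeds the ex-dividend price by exactly the current dividend, as one would expect from the definitions of the two claims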

72.7.2 Exercise 2

First, let’s enter the parameters:

In [10]: n = 5
P = 0.0125 * np.ones((n, n))
P += np.diag(0.95 - 0.0125 * np.ones(5))
s = np.array([0.95, 0.975, 1.0, 1.025, 1.05]) # state values
mc = qe.MarkovChain(P, state_values=s)

γ = 2.0
β = 0.94
ζ = 1.0
p_s = 150.0

Next, we’ll create an instance of AssetPriceModel to feed into the functions

In [11]: apm = AssetPriceModel(β=β, mc=mc, γ=γ, g=lambda x: x)



Now we just need to call the relevant functions on the data:

In [12]: tree_price(apm)

Out[12]: array([29.47401578, 21.93570661, 17.57142236, 14.72515002, 12.72221763])

In [13]: consol_price(apm, ζ)

Out[13]: array([753.87100476, 242.55144082, 148.67554548, 109.25108965,


87.56860139])

In [14]: call_option(apm, ζ, p_s)

Out[14]: array([603.87100476, 176.8393343 , 108.67734499, 80.05179254,


64.30843748])

Let’s show the last two functions as a plot

In [15]: fig, ax = plt.subplots()


ax.plot(s, consol_price(apm, ζ), label='consol')
ax.plot(s, call_option(apm, ζ, p_s), label='call option')
ax.legend()
plt.show()

72.7.3 Exercise 3

Here’s a suitable function:



In [16]: def finite_horizon_call_option(ap, ζ, p_s, k):
    """
    Computes k period option value.
    """
    # == Simplify names, set up matrices == #
    β, γ, P, y = ap.β, ap.γ, ap.mc.P, ap.mc.state_values
    M = P * ap.g(y)**(- γ)

    # == Make sure that a unique solution exists == #
    ap.test_stability(M)

    # == Compute option price == #
    p = consol_price(ap, ζ)
    w = np.zeros(ap.n)
    for i in range(k):
        # == Elementwise max of continuation value and exercise value == #
        w = np.maximum(β * M @ w, p - p_s)

    return w

Now let’s compute the option values at k=5 and k=25

In [17]: fig, ax = plt.subplots()


for k in [5, 25]:
    w = finite_horizon_call_option(apm, ζ, p_s, k)
    ax.plot(s, w, label=rf'$k = {k}$')
ax.legend()
plt.show()

Not surprisingly, the option has greater value with larger 𝑘


This is because the owner has a longer time horizon over which he or she may exercise the
option
73 Asset Pricing II: The Lucas Asset Pricing Model

73.1 Contents

• Overview 73.2

• The Lucas Model 73.3

• Exercises 73.4

• Solutions 73.5

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install interpolation

73.2 Overview

As stated in an earlier lecture, an asset is a claim on a stream of prospective payments


What is the correct price to pay for such a claim?
The elegant asset pricing model of Lucas [88] attempts to answer this question in an equilibrium
setting with risk-averse agents
While we mentioned some consequences of Lucas’ model earlier, it is now time to work
through the model more carefully and try to understand where the fundamental asset pricing
equation comes from
A side benefit of studying Lucas’ model is that it provides a beautiful illustration of model
building in general and equilibrium pricing in competitive models in particular
Another difference to our first asset pricing lecture is that the state space and shock will be
continuous rather than discrete
Let’s start with some imports

In [2]: import numpy as np


from interpolation import interp


from numba import njit, prange


from scipy.stats import lognorm
import matplotlib.pyplot as plt
%matplotlib inline

73.3 The Lucas Model

Lucas studied a pure exchange economy with a representative consumer (or household), where

• Pure exchange means that all endowments are exogenous

• Representative consumer means that either

– there is a single consumer (sometimes also referred to as a household), or


– all consumers have identical endowments and preferences

Either way, the assumption of a representative agent means that prices adjust to eradicate
desires to trade
This makes it very easy to compute competitive equilibrium prices

73.3.1 Basic Setup

Let’s review the setup


Assets
There is a single “productive unit” that costlessly generates a sequence of consumption goods $\{y_t\}_{t=0}^{\infty}$

Another way to view $\{y_t\}_{t=0}^{\infty}$ is as a consumption endowment for this economy

We will assume that this endowment is Markovian, following the exogenous process

𝑦𝑡+1 = 𝐺(𝑦𝑡 , 𝜉𝑡+1 )

Here {𝜉𝑡 } is an IID shock sequence with known distribution 𝜙 and 𝑦𝑡 ≥ 0


An asset is a claim on all or part of this endowment stream
The consumption goods $\{y_t\}_{t=0}^{\infty}$ are nonstorable, so holding assets is the only way to transfer
wealth into the future
For the purposes of intuition, it’s common to think of the productive unit as a “tree” that
produces fruit
Based on this idea, a “Lucas tree” is a claim on the consumption endowment
Consumers
A representative consumer ranks consumption streams {𝑐𝑡 } according to the time separable
utility functional


$$
\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(c_t)
\tag{1}
$$

Here

• 𝛽 ∈ (0, 1) is a fixed discount factor


• 𝑢 is a strictly increasing, strictly concave, continuously differentiable period utility function
• E is a mathematical expectation

73.3.2 Pricing a Lucas Tree

What is an appropriate price for a claim on the consumption endowment?


We’ll price an ex-dividend claim, meaning that

• the seller retains this period’s dividend

• the buyer pays 𝑝𝑡 today to purchase a claim on

– 𝑦𝑡+1 and
– the right to sell the claim tomorrow at price 𝑝𝑡+1

Since this is a competitive model, the first step is to pin down consumer behavior, taking
prices as given
Next, we’ll impose equilibrium constraints and try to back out prices
In the consumer problem, the consumer’s control variable is the share 𝜋𝑡 of the claim held in
each period
Thus, the consumer problem is to maximize Eq. (1) subject to

𝑐𝑡 + 𝜋𝑡+1 𝑝𝑡 ≤ 𝜋𝑡 𝑦𝑡 + 𝜋𝑡 𝑝𝑡

along with 𝑐𝑡 ≥ 0 and 0 ≤ 𝜋𝑡 ≤ 1 at each 𝑡


The decision to hold share 𝜋𝑡 is actually made at time 𝑡 − 1
But this value is inherited as a state variable at time 𝑡, which explains the choice of subscript
The Dynamic Program
We can write the consumer problem as a dynamic programming problem
Our first observation is that prices depend on current information, and current information is
really just the endowment process up until the current period
In fact, the endowment process is Markovian, so that the only relevant information is the current
state 𝑦 ∈ R+ (dropping the time subscript)
This leads us to guess an equilibrium where price is a function 𝑝 of 𝑦
Remarks on the solution method

• Since this is a competitive (read: price taking) model, the consumer will take this function 𝑝 as given
• In this way, we determine consumer behavior given 𝑝 and then use equilibrium conditions to recover 𝑝

• This is the standard way to solve competitive equilibrium models

Using the assumption that price is a given function 𝑝 of 𝑦, we write the value function and
constraint as

$$
v(\pi, y) = \max_{c, \pi'}
\left\{ u(c) + \beta \int v(\pi', G(y, z)) \phi(dz) \right\}
$$

subject to

𝑐 + 𝜋′ 𝑝(𝑦) ≤ 𝜋𝑦 + 𝜋𝑝(𝑦) (2)

We can invoke the fact that utility is increasing to claim equality in Eq. (2) and hence eliminate
the constraint, obtaining

$$
v(\pi, y) = \max_{\pi'}
\left\{ u[\pi (y + p(y)) - \pi' p(y)] + \beta \int v(\pi', G(y, z)) \phi(dz) \right\}
\tag{3}
$$

The solution to this dynamic programming problem is an optimal policy expressing either 𝜋′
or 𝑐 as a function of the state (𝜋, 𝑦)

• Each one determines the other, since 𝑐(𝜋, 𝑦) = 𝜋(𝑦 + 𝑝(𝑦)) − 𝜋′ (𝜋, 𝑦)𝑝(𝑦)

Next Steps
What we need to do now is determine equilibrium prices
It seems that to obtain these, we will have to

1. Solve this two-dimensional dynamic programming problem for the optimal policy
2. Impose equilibrium constraints
3. Solve out for the price function 𝑝(𝑦) directly

However, as Lucas showed, there is a related but more straightforward way to do this
Equilibrium Constraints
Since the consumption good is not storable, in equilibrium we must have 𝑐𝑡 = 𝑦𝑡 for all 𝑡
In addition, since there is one representative consumer (alternatively, since all consumers are
identical), there should be no trade in equilibrium
In particular, the representative consumer owns the whole tree in every period, so 𝜋𝑡 = 1 for
all 𝑡
Prices must adjust to satisfy these two constraints
The Equilibrium Price Function
Now observe that the first-order condition for Eq. (3) can be written as

𝑢′ (𝑐)𝑝(𝑦) = 𝛽 ∫ 𝑣1′ (𝜋′ , 𝐺(𝑦, 𝑧))𝜙(𝑑𝑧)

where 𝑣1′ is the derivative of 𝑣 with respect to its first argument


73.3. THE LUCAS MODEL 1197

To obtain 𝑣1′ we can simply differentiate the right-hand side of Eq. (3) with respect to 𝜋,
yielding

𝑣1′ (𝜋, 𝑦) = 𝑢′ (𝑐)(𝑦 + 𝑝(𝑦))

Next, we impose the equilibrium constraints while combining the last two equations to get

$$
p(y) = \beta \int \frac{u'[G(y, z)]}{u'(y)} [G(y, z) + p(G(y, z))] \phi(dz) \tag{4}
$$

In sequential rather than functional notation, we can also write this as

$$
p_t = \mathbb{E}_t \left[ \beta \frac{u'(c_{t+1})}{u'(c_t)} (y_{t+1} + p_{t+1}) \right] \tag{5}
$$

This is the famous consumption-based asset pricing equation


Before discussing it further we want to solve out for prices

73.3.3 Solving the Model

Eq. (4) is a functional equation in the unknown function 𝑝


The solution is an equilibrium price function 𝑝∗
Let’s look at how to obtain it
Setting up the Problem
Instead of solving for it directly we’ll follow Lucas’ indirect approach, first setting

𝑓(𝑦) ∶= 𝑢′ (𝑦)𝑝(𝑦) (6)

so that Eq. (4) becomes

𝑓(𝑦) = ℎ(𝑦) + 𝛽 ∫ 𝑓[𝐺(𝑦, 𝑧)]𝜙(𝑑𝑧) (7)

Here ℎ(𝑦) ∶= 𝛽 ∫ 𝑢′ [𝐺(𝑦, 𝑧)]𝐺(𝑦, 𝑧)𝜙(𝑑𝑧) is a function that depends only on the primitives
Eq. (7) is a functional equation in 𝑓
The plan is to solve out for 𝑓 and convert back to 𝑝 via Eq. (6)
To solve Eq. (7) we’ll use a standard method: convert it to a fixed point problem
First, we introduce the operator 𝑇 mapping 𝑓 into 𝑇 𝑓 as defined by

(𝑇 𝑓)(𝑦) = ℎ(𝑦) + 𝛽 ∫ 𝑓[𝐺(𝑦, 𝑧)]𝜙(𝑑𝑧) (8)

In what follows, we refer to 𝑇 as the Lucas operator


The reason we do this is that a solution to Eq. (7) now corresponds to a function 𝑓 ∗ satisfy-
ing (𝑇 𝑓 ∗ )(𝑦) = 𝑓 ∗ (𝑦) for all 𝑦

In other words, a solution is a fixed point of 𝑇


This means that we can use fixed point theory to obtain and compute the solution
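As a quick illustration of the change of variables in Eq. (6), consider log utility, 𝑢(𝑐) = ln 𝑐. This is an assumption made only for this example, since the computations later in the lecture use CRRA utility. Then 𝑢′(𝑦) = 1/𝑦, the function ℎ is constant, and Eq. (7) can be solved by hand:

```latex
\begin{aligned}
h(y) &= \beta \int \frac{G(y, z)}{G(y, z)} \phi(dz) = \beta \\
f(y) &= \beta + \beta \int f[G(y, z)] \phi(dz)
  \quad \Longrightarrow \quad f^*(y) = \frac{\beta}{1 - \beta}
  \quad \text{(a constant function solves the equation)} \\
p^*(y) &= \frac{f^*(y)}{u'(y)} = \frac{\beta}{1 - \beta} \, y
\end{aligned}
```

In this special case the price is proportional to the current endowment, whatever the law of motion 𝐺.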
A Little Fixed Point Theory
Let 𝑐𝑏R+ be the set of continuous bounded functions 𝑓 ∶ R+ → R+
We now show that

1. 𝑇 has exactly one fixed point 𝑓 ∗ in 𝑐𝑏R+


2. For any 𝑓 ∈ 𝑐𝑏R+ , the sequence 𝑇^𝑘 𝑓 converges uniformly to 𝑓 ∗

(Note: If you find the mathematics heavy going you can take 1–2 as given and skip to the
next section)
Recall the Banach contraction mapping theorem
It tells us that the previous statements will be true if we can find an 𝛼 < 1 such that

‖𝑇 𝑓 − 𝑇 𝑔‖ ≤ 𝛼‖𝑓 − 𝑔‖, ∀ 𝑓, 𝑔 ∈ 𝑐𝑏R+ (9)

Here $\|h\| := \sup_{x \in \mathbb{R}_+} |h(x)|$

To see that Eq. (9) is valid, pick any 𝑓, 𝑔 ∈ 𝑐𝑏R+ and any 𝑦 ∈ R+
Observe that, since integrals get larger when absolute values are moved to the inside,

|𝑇 𝑓(𝑦) − 𝑇 𝑔(𝑦)| = ∣𝛽 ∫ 𝑓[𝐺(𝑦, 𝑧)]𝜙(𝑑𝑧) − 𝛽 ∫ 𝑔[𝐺(𝑦, 𝑧)]𝜙(𝑑𝑧)∣

≤ 𝛽 ∫ |𝑓[𝐺(𝑦, 𝑧)] − 𝑔[𝐺(𝑦, 𝑧)]| 𝜙(𝑑𝑧)

≤ 𝛽 ∫ ‖𝑓 − 𝑔‖𝜙(𝑑𝑧)

= 𝛽‖𝑓 − 𝑔‖

Since the right-hand side is an upper bound, taking the sup over all 𝑦 on the left-hand side
gives Eq. (9) with 𝛼 ∶= 𝛽
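The contraction bound can also be checked numerically. The sketch below is not the lecture's implementation: it discretizes 𝑇 on a grid with NumPy's linear interpolation and Monte Carlo draws, and the grid, parameters and test functions `f` and `g` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions for this sketch)
α, β, σ = 0.9, 0.95, 0.1
grid = np.linspace(0.4, 2.5, 50)
z = np.exp(σ * rng.standard_normal(200))   # lognormal shocks
h = np.zeros_like(grid)                    # h cancels in Tf - Tg, so any h works


def T(f):
    "Discretized Lucas operator: (Tf)(y) = h(y) + β * mean of f(y^α z)."
    Tf = np.empty_like(f)
    for i, y in enumerate(grid):
        # np.interp holds f constant outside the grid
        Tf[i] = h[i] + β * np.mean(np.interp(y**α * z, grid, f))
    return Tf


# Two arbitrary bounded functions represented by their grid values
f = rng.uniform(0, 5, grid.size)
g = rng.uniform(0, 5, grid.size)

lhs = np.max(np.abs(T(f) - T(g)))   # sup norm of Tf - Tg on the grid
rhs = β * np.max(np.abs(f - g))     # β times sup norm of f - g
print(lhs <= rhs)
```

Linear interpolation and sample averages are both weighted averages of the grid values, so the discretized operator inherits the bound in Eq. (9) with 𝛼 = 𝛽.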

73.3.4 Computation – An Example

The preceding discussion tells us that we can compute 𝑓 ∗ by picking an arbitrary 𝑓 ∈ 𝑐𝑏R+ and
then iterating with 𝑇
The equilibrium price function 𝑝∗ can then be recovered by 𝑝∗ (𝑦) = 𝑓 ∗ (𝑦)/𝑢′ (𝑦)
Let’s try this when ln 𝑦𝑡+1 = 𝛼 ln 𝑦𝑡 + 𝜎𝜖𝑡+1 where {𝜖𝑡 } is IID and standard normal
Utility will take the isoelastic form 𝑢(𝑐) = 𝑐^(1−𝛾) /(1 − 𝛾), where 𝛾 > 0 is the coefficient of
relative risk aversion
We will set up a LucasTree class to hold parameters of the model

In [3]: class LucasTree:
    """
    Class to store parameters of the Lucas tree model.

    """

    def __init__(self,
                 γ=2,            # CRRA utility parameter
                 β=0.95,         # Discount factor
                 α=0.90,         # Correlation coefficient
                 σ=0.1,          # Volatility coefficient
                 grid_size=100):

        self.γ, self.β, self.α, self.σ = γ, β, α, σ

        # == Set the grid interval to contain most of the mass of the
        # stationary distribution of the consumption endowment == #
        ssd = self.σ / np.sqrt(1 - self.α**2)
        grid_min, grid_max = np.exp(-4 * ssd), np.exp(4 * ssd)
        self.grid = np.linspace(grid_min, grid_max, grid_size)
        self.grid_size = grid_size

        # == set up distribution for shocks == #
        self.ϕ = lognorm(σ)
        self.draws = self.ϕ.rvs(500)

        # == h(y) = β * int G(y,z)^(1-γ) ϕ(dz) == #
        self.h = np.empty(self.grid_size)
        for i, y in enumerate(self.grid):
            self.h[i] = β * np.mean((y**α * self.draws)**(1 - γ))

The following function takes an instance of the LucasTree and generates a jitted version of
the Lucas operator

In [4]: def operator_factory(tree, parallel_flag=True):
    """
    Returns approximate Lucas operator, which computes and returns the
    updated function Tf on the grid points.

    tree is an instance of the LucasTree class

    """

    grid, h = tree.grid, tree.h
    α, β = tree.α, tree.β
    z_vec = tree.draws

    @njit(parallel=parallel_flag)
    def T(f):
        """
        The Lucas operator
        """

        # == turn f into a function == #
        Af = lambda x: interp(grid, f, x)

        Tf = np.empty_like(f)
        # == Apply the T operator to f using Monte Carlo integration == #
        for i in prange(len(grid)):
            y = grid[i]
            Tf[i] = h[i] + β * np.mean(Af(y**α * z_vec))

        return Tf

    return T

To solve the model, we write a function that iterates using the Lucas operator to find the
fixed point

In [5]: def solve_model(tree, tol=1e-6, max_iter=500):
    """
    Compute the equilibrium price function associated with Lucas
    tree

    * tree is an instance of LucasTree

    """
    # == simplify notation == #
    grid, grid_size = tree.grid, tree.grid_size
    γ = tree.γ

    T = operator_factory(tree)

    i = 0
    f = np.ones_like(grid)  # Initial guess of f
    error = tol + 1
    while error > tol and i < max_iter:
        Tf = T(f)
        error = np.max(np.abs(Tf - f))
        f = Tf
        i += 1

    price = f * grid**γ  # Back out price vector

    return price
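One way to sanity-check this kind of iteration is to run it in a case with a known answer. The sketch below is a NumPy-only stand-in for the jitted operator, not the lecture's code: it sets 𝛾 = 1, so that marginal utility is 1/𝑐 as under log utility, where the price should be 𝛽𝑦/(1 − 𝛽). The seed, tolerance and iteration cap are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1234)

α, β, σ, γ = 0.9, 0.95, 0.1, 1.0          # γ = 1 gives u'(c) = 1/c (log utility)
ssd = σ / np.sqrt(1 - α**2)
grid = np.linspace(np.exp(-4 * ssd), np.exp(4 * ssd), 100)
z = np.exp(σ * rng.standard_normal(500))  # lognormal shocks

# h(y) = β E[G(y, z)^(1-γ)], which equals β when γ = 1
h = np.array([β * np.mean((y**α * z)**(1 - γ)) for y in grid])


def T(f):
    "Lucas operator on the grid, using linear interpolation for f."
    return np.array([h[i] + β * np.mean(np.interp(y**α * z, grid, f))
                     for i, y in enumerate(grid)])


# Iterate from an arbitrary initial guess until successive iterates agree
f, error, i = np.ones_like(grid), 1.0, 0
while error > 1e-8 and i < 2000:
    Tf = T(f)
    error = np.max(np.abs(Tf - f))
    f, i = Tf, i + 1

price = f * grid**γ                        # p(y) = f(y) / u'(y)
closed_form = β / (1 - β) * grid
print(np.max(np.abs(price - closed_form)))
```

The printed gap between the iterated and closed-form prices should be small, which gives some confidence in the same loop when 𝛾 ≠ 1 and no closed form is available.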

Solving the model and plotting the resulting price function

In [6]: tree = LucasTree()


price_vals = solve_model(tree)

plt.figure(figsize=(12, 8))
plt.plot(tree.grid, price_vals, label='$p*(y)$')
plt.xlabel('$y$')
plt.ylabel('price')
plt.legend()
plt.show()

We see that the price is increasing, even if we remove all serial correlation from the endow-
ment process
The reason is that a larger current endowment reduces current marginal utility
The price must therefore rise to induce the household to consume the entire endowment (and
hence satisfy the resource constraint)
What happens with a more patient consumer?
Here the orange line corresponds to the previous parameters and the green line is price when
𝛽 = 0.98

We see that when consumers are more patient the asset becomes more valuable, and the price
of the Lucas tree shifts up
Exercise 1 asks you to replicate this figure

73.4 Exercises

73.4.1 Exercise 1

Replicate the figure to show how discount factors affect prices

73.5 Solutions

73.5.1 Exercise 1
In [7]: fig, ax = plt.subplots(figsize=(10, 6))

for β in (.95, 0.98):
    tree = LucasTree(β=β)
    grid = tree.grid
    price_vals = solve_model(tree)
    label = rf'$\beta = {β}$'
    ax.plot(grid, price_vals, lw=2, alpha=0.7, label=label)

ax.legend(loc='upper left')
ax.set(xlabel='$y$', ylabel='price', xlim=(min(grid), max(grid)))
plt.show()
74

Asset Pricing III: Incomplete Markets

74.1 Contents

• Overview 74.2

• Structure of the Model 74.3

• Solving the Model 74.4

• Exercises 74.5

• Solutions 74.6

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

74.2 Overview

This lecture describes a version of a model of Harrison and Kreps [61]


The model determines the price of a dividend-yielding asset that is traded by two types of
self-interested investors
The model features

• heterogeneous beliefs
• incomplete markets
• short sales constraints, and possibly …
• (leverage) limits on an investor’s ability to borrow in order to finance purchases of a
risky asset

74.2.1 References

Prior to reading the following, you might like to review our lectures on


• Markov chains
• Asset pricing with finite state space

74.2.2 Bubbles

Economists differ in how they define a bubble


The Harrison-Kreps model illustrates the following notion of a bubble that attracts many
economists:

A component of an asset price can be interpreted as a bubble when all investors


agree that the current price of the asset exceeds what they believe the asset’s un-
derlying dividend stream justifies

74.3 Structure of the Model

The model simplifies by ignoring alterations in the distribution of wealth among investors
having different beliefs about the fundamentals that determine asset payouts
There is a fixed number 𝐴 of shares of an asset
Each share entitles its owner to a stream of dividends {𝑑𝑡 } governed by a Markov chain
defined on a state space 𝑆 = {0, 1}
The dividend obeys

$$
d_t =
\begin{cases}
0 & \text{if } s_t = 0 \\
1 & \text{if } s_t = 1
\end{cases}
$$

The owner of a share at the beginning of time 𝑡 is entitled to the dividend paid at time 𝑡
The owner of the share at the beginning of time 𝑡 is also entitled to sell the share to another
investor during time 𝑡
Two types ℎ = 𝑎, 𝑏 of investors differ only in their beliefs about a Markov transition matrix 𝑃
with typical element

𝑃 (𝑖, 𝑗) = P{𝑠𝑡+1 = 𝑗 ∣ 𝑠𝑡 = 𝑖}

Investors of type 𝑎 believe the transition matrix

$$
P_a =
\begin{bmatrix}
\frac{1}{2} & \frac{1}{2} \\
\frac{2}{3} & \frac{1}{3}
\end{bmatrix}
$$

Investors of type 𝑏 think the transition matrix is

$$
P_b =
\begin{bmatrix}
\frac{2}{3} & \frac{1}{3} \\
\frac{1}{4} & \frac{3}{4}
\end{bmatrix}
$$

The stationary (i.e., invariant) distributions of these two matrices can be calculated as fol-
lows:

In [2]: import numpy as np


import quantecon as qe

qa = np.array([[1/2, 1/2], [2/3, 1/3]])


qb = np.array([[2/3, 1/3], [1/4, 3/4]])
mcA = qe.MarkovChain(qa)
mcB = qe.MarkovChain(qb)
mcA.stationary_distributions

Out[2]: array([[0.57142857, 0.42857143]])

In [3]: mcB.stationary_distributions

Out[3]: array([[0.42857143, 0.57142857]])

The stationary distribution of 𝑃𝑎 is approximately 𝜋𝐴 = [.57 .43]


The stationary distribution of 𝑃𝑏 is approximately 𝜋𝐵 = [.43 .57]
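For readers without quantecon installed, the same distributions can be recovered with plain NumPy. The sketch below is an alternative route, not the method used above: it solves the linear system 𝜋𝑃 = 𝜋 together with the normalization that the entries of 𝜋 sum to one, and the helper name `stationary` is hypothetical.

```python
import numpy as np


def stationary(P):
    "Solve π P = π with Σ π_i = 1 for an irreducible transition matrix P."
    n = P.shape[0]
    # (I - P') π = 0 and ones @ π = 1 combine into (I - P' + ones) π = 1
    A = np.eye(n) - P.T + np.ones((n, n))
    return np.linalg.solve(A, np.ones(n))


P_a = np.array([[1/2, 1/2], [2/3, 1/3]])
P_b = np.array([[2/3, 1/3], [1/4, 3/4]])

π_a, π_b = stationary(P_a), stationary(P_b)
print(π_a, π_b)
```

The exact values are 𝜋𝐴 = [4/7, 3/7] and 𝜋𝐵 = [3/7, 4/7], which agree with the rounded figures reported above.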

74.3.1 Ownership Rights

An owner of the asset at the end of time 𝑡 is entitled to the dividend at time 𝑡 + 1 and also
has the right to sell the asset at time 𝑡 + 1
Both types of investors are risk-neutral and both have the same fixed discount factor 𝛽 ∈
(0, 1)
In our numerical example, we’ll set 𝛽 = .75, just as Harrison and Kreps did
We’ll eventually study the consequences of two different assumptions about the number of
shares 𝐴 relative to the resources that our two types of investors can invest in the stock

1. Both types of investors have enough resources (either wealth or the capacity to borrow)
so that they can purchase the entire available stock of the asset [1]
2. No single type of investor has sufficient resources to purchase the entire stock

Case 1 is the case studied in Harrison and Kreps


In case 2, both types of investors always hold at least some of the asset

74.3.2 Short Sales Prohibited

No short sales are allowed


This matters because it limits pessimists from expressing their opinions

• They can express their views by selling their shares


• They cannot express their pessimism more loudly by artificially “manufacturing shares”
– that is, they cannot borrow shares from more optimistic investors and sell them im-
mediately

74.3.3 Optimism and Pessimism

The above specifications of the perceived transition matrices 𝑃𝑎 and 𝑃𝑏 , taken directly from
Harrison and Kreps, build in stochastically alternating temporary optimism and pessimism
Remember that state 1 is the high dividend state

• In state 0, a type 𝑎 agent is more optimistic about next period’s dividend than a type 𝑏
agent
• In state 1, a type 𝑏 agent is more optimistic about next period’s dividend

However, the stationary distributions 𝜋𝐴 = [.57 .43] and 𝜋𝐵 = [.43 .57] tell us that a
type 𝑏 person is more optimistic about the dividend process in the long run than is a type 𝑎
person
Transition matrices for the temporarily optimistic and pessimistic investors are constructed as
follows
Temporarily optimistic investors (i.e., the investor with the most optimistic beliefs in each
state) believe the transition matrix

$$
P_o =
\begin{bmatrix}
\frac{1}{2} & \frac{1}{2} \\
\frac{1}{4} & \frac{3}{4}
\end{bmatrix}
$$

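Temporarily pessimistic investors (i.e., the investor with the least optimistic beliefs in each
state) believe the transition matrix

$$
P_p =
\begin{bmatrix}
\frac{2}{3} & \frac{1}{3} \\
\frac{2}{3} & \frac{1}{3}
\end{bmatrix}
$$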

We’ll return to these matrices and their significance in the exercise
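The row-by-row construction of these two matrices can be sketched in a few lines. Because the dividend is 1 in state 1 and 0 in state 0, the sketch measures optimism in a given state by the probability assigned to moving to state 1; the variable name `a_more_optimistic` is an assumption for illustration.

```python
import numpy as np

P_a = np.array([[1/2, 1/2], [2/3, 1/3]])
P_b = np.array([[2/3, 1/3], [1/4, 3/4]])

# In each state (row), compare the probability each type assigns to
# next period's high-dividend state (column 1)
a_more_optimistic = P_a[:, 1] > P_b[:, 1]

# Pick, row by row, the beliefs of the more (less) optimistic type
P_o = np.where(a_more_optimistic[:, None], P_a, P_b)
P_p = np.where(a_more_optimistic[:, None], P_b, P_a)

print(P_o)
print(P_p)
```

Here type 𝑎 supplies the first row of the optimistic matrix and type 𝑏 the second, and the pessimistic matrix takes the opposite rows.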

74.3.4 Information

Investors know a price function mapping the state 𝑠𝑡 at 𝑡 into the equilibrium price 𝑝(𝑠𝑡 ) that
prevails in that state
This price function is endogenous and to be determined below
When investors choose whether to purchase or sell the asset at 𝑡, they also know 𝑠𝑡

74.4 Solving the Model

Now let’s turn to solving the model


This amounts to determining equilibrium prices under the different possible specifications of
beliefs and constraints listed above
In particular, we compare equilibrium price functions under the following alternative assump-
tions about beliefs:

1. There is only one type of agent, either 𝑎 or 𝑏



2. There are two types of agents differentiated only by their beliefs. Each type of agent
has sufficient resources to purchase all of the asset (Harrison and Kreps’s setting)
3. There are two types of agents with different beliefs, but because of limited wealth
and/or limited leverage, both types of investors hold the asset each period

74.4.1 Summary Table

The following table gives a summary of the findings obtained in the remainder of the lecture
(you will be asked to recreate the table in an exercise)
It records implications of Harrison and Kreps’s specifications of 𝑃𝑎 , 𝑃𝑏 , 𝛽

𝑠𝑡 0 1
𝑝𝑎 1.33 1.22
𝑝𝑏 1.45 1.91
𝑝𝑜 1.85 2.08
𝑝𝑝 1 1
𝑝𝑎̂ 1.85 1.69
𝑝𝑏̂ 1.69 2.08

Here

• 𝑝𝑎 is the equilibrium price function under homogeneous beliefs 𝑃𝑎


• 𝑝𝑏 is the equilibrium price function under homogeneous beliefs 𝑃𝑏
• 𝑝𝑜 is the equilibrium price function under heterogeneous beliefs with optimistic marginal
investors
• 𝑝𝑝 is the equilibrium price function under heterogeneous beliefs with pessimistic
marginal investors
• 𝑝𝑎̂ is the amount type 𝑎 investors are willing to pay for the asset
• 𝑝𝑏̂ is the amount type 𝑏 investors are willing to pay for the asset

We’ll explain these values and how they are calculated one row at a time

74.4.2 Single Belief Prices

We’ll start by pricing the asset under homogeneous beliefs


(This is the case treated in the lecture on asset pricing with finite Markov states)
Suppose that there is only one type of investor, either of type 𝑎 or 𝑏, and that this investor
always “prices the asset”
Let $p_h = \begin{bmatrix} p_h(0) \\ p_h(1) \end{bmatrix}$ be the equilibrium price vector when all investors are of type ℎ
The price today equals the expected discounted value of tomorrow’s dividend and tomorrow’s
price of the asset:

𝑝ℎ (𝑠) = 𝛽 (𝑃ℎ (𝑠, 0)(0 + 𝑝ℎ (0)) + 𝑃ℎ (𝑠, 1)(1 + 𝑝ℎ (1))) , 𝑠 = 0, 1

These equations imply that the equilibrium price vector is


$$
\begin{bmatrix} p_h(0) \\ p_h(1) \end{bmatrix}
= \beta [I - \beta P_h]^{-1} P_h \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{1}
$$

The first two rows of the table report 𝑝𝑎 (𝑠) and 𝑝𝑏 (𝑠)
Here’s a function that can be used to compute these values

In [4]: """
Authors: Chase Coleman, Tom Sargent

"""
import scipy.linalg as la

def price_single_beliefs(transition, dividend_payoff, β=.75):
    """
    Function to Solve Single Beliefs
    """
    # First compute inverse piece
    imbq_inv = la.inv(np.eye(transition.shape[0]) - β * transition)

    # Next compute prices
    prices = β * imbq_inv @ transition @ dividend_payoff

    return prices

Single Belief Prices as Benchmarks


These equilibrium prices under homogeneous beliefs are important benchmarks for the subse-
quent analysis

• 𝑝ℎ (𝑠) tells what investor ℎ thinks is the “fundamental value” of the asset
• Here “fundamental value” means the expected discounted present value of future divi-
dends

We will compare these fundamental values of the asset with equilibrium values when traders
have different beliefs
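As a cross-check on these benchmark values, the sketch below solves the same linear system with `np.linalg.solve` instead of an explicit inverse and confirms that the result satisfies the pricing recursion. The helper name `fundamental_value` is hypothetical and not part of the lecture's code.

```python
import numpy as np

β = 0.75
d = np.array([0.0, 1.0])                       # dividend in states 0 and 1
P_a = np.array([[1/2, 1/2], [2/3, 1/3]])
P_b = np.array([[2/3, 1/3], [1/4, 3/4]])


def fundamental_value(P):
    "Solve (I - βP) p = β P d, which is Eq. (1) without an explicit inverse."
    return np.linalg.solve(np.eye(2) - β * P, β * P @ d)


for name, P in (("p_a", P_a), ("p_b", P_b)):
    p = fundamental_value(P)
    # The solution should satisfy p(s) = β Σ_j P(s, j) (d(j) + p(j))
    assert np.allclose(p, β * P @ (d + p))
    print(name, np.round(p, 2))
```

The rounded values agree with the first two rows of the summary table.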

74.4.3 Pricing under Heterogeneous Beliefs

There are several cases to consider


The first is when both types of agents have sufficient wealth to purchase all of the asset them-
selves
In this case, the marginal investor who prices the asset is the more optimistic type so that the
equilibrium price 𝑝̄ satisfies Harrison and Kreps’s key equation:

$$
\bar p(s) = \beta \max \left\{ P_a(s, 0) \bar p(0) + P_a(s, 1)(1 + \bar p(1)), \; P_b(s, 0) \bar p(0) + P_b(s, 1)(1 + \bar p(1)) \right\} \tag{2}
$$

for 𝑠 = 0, 1
The marginal investor who prices the asset in state 𝑠 is of type 𝑎 if

$$
P_a(s, 0) \bar p(0) + P_a(s, 1)(1 + \bar p(1)) > P_b(s, 0) \bar p(0) + P_b(s, 1)(1 + \bar p(1))
$$

The marginal investor is of type 𝑏 if


$$
P_a(s, 0) \bar p(0) + P_a(s, 1)(1 + \bar p(1)) < P_b(s, 0) \bar p(0) + P_b(s, 1)(1 + \bar p(1))
$$

Thus the marginal investor is the (temporarily) optimistic type


Eq. (2) is a functional equation that, like a Bellman equation, can be solved by

• starting with a guess for the price vector 𝑝̄ and


• iterating to convergence on the operator that maps a guess 𝑝̄𝑗 into an updated guess
𝑝̄𝑗+1 defined by the right side of Eq. (2), namely

𝑝̄𝑗+1 (𝑠) = 𝛽 max {𝑃𝑎 (𝑠, 0)𝑝̄𝑗 (0) + 𝑃𝑎 (𝑠, 1)(1 + 𝑝̄𝑗 (1)), 𝑃𝑏 (𝑠, 0)𝑝̄𝑗 (0) + 𝑃𝑏 (𝑠, 1)(1 + 𝑝̄𝑗 (1))} (3)

for 𝑠 = 0, 1
The third row of the table reports equilibrium prices that solve the functional equation when
𝛽 = .75
Here the type that is optimistic about 𝑠𝑡+1 prices the asset in state 𝑠𝑡
It is instructive to compare these prices with the equilibrium prices for the homogeneous
belief economies that solve Eq. (1) under beliefs 𝑃𝑎 and 𝑃𝑏
Equilibrium prices 𝑝̄ in the heterogeneous beliefs economy exceed what any prospective in-
vestor regards as the fundamental value of the asset in each possible state
Nevertheless, the economy recurrently visits a state that makes each investor want to pur-
chase the asset for more than he believes its future dividends are worth
The reason is that he expects to have the option to sell the asset later to another investor
who will value the asset more highly than he will

• Investors of type 𝑎 are willing to pay the following price for the asset

$$
\hat p_a(s) =
\begin{cases}
\bar p(0) & \text{if } s_t = 0 \\
\beta (P_a(1, 0) \bar p(0) + P_a(1, 1)(1 + \bar p(1))) & \text{if } s_t = 1
\end{cases}
$$

• Investors of type 𝑏 are willing to pay the following price for the asset

$$
\hat p_b(s) =
\begin{cases}
\beta (P_b(0, 0) \bar p(0) + P_b(0, 1)(1 + \bar p(1))) & \text{if } s_t = 0 \\
\bar p(1) & \text{if } s_t = 1
\end{cases}
$$

Evidently, $\hat p_a(1) < \bar p(1)$ and $\hat p_b(0) < \bar p(0)$
Investors of type 𝑎 want to sell the asset in state 1 while investors of type 𝑏 want to sell it in
state 0

• The asset changes hands whenever the state changes from 0 to 1 or from 1 to 0
• The valuations 𝑝̂𝑎 (𝑠) and 𝑝̂𝑏 (𝑠) are displayed in the fifth and sixth rows of the table

• Even the pessimistic investors who don’t buy the asset think that it is worth more than
they think future dividends are worth

Here’s code to solve for 𝑝̄, 𝑝̂𝑎 and 𝑝̂𝑏 using the iterative method described above

In [5]: def price_optimistic_beliefs(transitions, dividend_payoff, β=.75,
                             max_iter=50000, tol=1e-12):
    """
    Function to Solve Optimistic Beliefs
    """
    # We will guess an initial price vector of [0, 0]
    p_new = np.array([[0.], [0.]])
    p_old = np.array([[10.], [10.]])

    # We know this is a contraction mapping, so we can iterate to conv
    for i in range(max_iter):
        p_old = p_new
        # Stack valuations by investor type, then take the max over types
        # (axis 0) separately in each state
        p_new = β * np.max([q @ p_old + q @ dividend_payoff for q in transitions], 0)

        # If we succeed in converging, break out of for loop
        if np.max(np.abs(p_new - p_old)) < tol:
            break

    # Valuation of the less optimistic type in each state
    ptwiddle = β * np.min([q @ p_old + q @ dividend_payoff for q in transitions], 0)

    phat_a = np.array([p_new[0], ptwiddle[1]])
    phat_b = np.array([ptwiddle[0], p_new[1]])

    return p_new, phat_a, phat_b

74.4.4 Insufficient Funds

Outcomes differ when the more optimistic type of investor has insufficient wealth — or insuf-
ficient ability to borrow enough — to hold the entire stock of the asset
In this case, the asset price must adjust to attract pessimistic investors
Instead of equation Eq. (2), the equilibrium price satisfies

$$
\check p(s) = \beta \min \left\{ P_a(s, 0) \check p(0) + P_a(s, 1)(1 + \check p(1)), \; P_b(s, 0) \check p(0) + P_b(s, 1)(1 + \check p(1)) \right\} \tag{4}
$$

and the marginal investor who prices the asset is always the one that values it less highly
than does the other type
Now the marginal investor is always the (temporarily) pessimistic type
Notice from the fourth row of the table that the pessimistic price 𝑝𝑝 is lower than the
homogeneous belief prices 𝑝𝑎 and 𝑝𝑏 in both states
When pessimistic investors price the asset according to Eq. (4), optimistic investors think
that the asset is underpriced
If they could, optimistic investors would willingly borrow at the one-period gross interest rate
𝛽 −1 to purchase more of the asset
Implicit constraints on leverage prohibit them from doing so
When optimistic investors price the asset as in equation Eq. (2), pessimistic investors think
that the asset is overpriced and would like to sell the asset short
Constraints on short sales prevent that
Here’s code to solve for 𝑝̌ using iteration

In [6]: def price_pessimistic_beliefs(transitions, dividend_payoff, β=.75,
                              max_iter=50000, tol=1e-12):
    """
    Function to Solve Pessimistic Beliefs
    """
    # We will guess an initial price vector of [0, 0]
    p_new = np.array([[0.], [0.]])
    p_old = np.array([[10.], [10.]])

    # We know this is a contraction mapping, so we can iterate to conv
    for i in range(max_iter):
        p_old = p_new
        # Stack valuations by investor type, then take the min over types
        # (axis 0) separately in each state
        p_new = β * np.min([q @ p_old + q @ dividend_payoff for q in transitions], 0)

        # If we succeed in converging, break out of for loop
        if np.max(np.abs(p_new - p_old)) < tol:
            break

    return p_new
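The ordering of the prices discussed in this section can be verified directly. The following sketch is a compact stand-in for the two functions above, not a replacement for them: the helper `iterate` and the flat 1-D price vectors are assumptions made to keep the example short.

```python
import numpy as np

β = 0.75
d = np.array([0.0, 1.0])
P_a = np.array([[1/2, 1/2], [2/3, 1/3]])
P_b = np.array([[2/3, 1/3], [1/4, 3/4]])


def iterate(pick, tol=1e-12, max_iter=10_000):
    "Iterate p -> β * pick over types of P_h (d + p), state by state."
    p = np.zeros(2)
    for _ in range(max_iter):
        valuations = np.array([P @ (d + p) for P in (P_a, P_b)])  # rows: types
        p_new = β * pick(valuations, axis=0)                      # reduce over types
        if np.max(np.abs(p_new - p)) < tol:
            break
        p = p_new
    return p_new


p_bar = iterate(np.max)      # optimists price the asset, Eq. (2)
p_check = iterate(np.min)    # pessimists price the asset, Eq. (4)

# Homogeneous-belief benchmarks from Eq. (1)
p_a, p_b = (np.linalg.solve(np.eye(2) - β * P, β * P @ d) for P in (P_a, P_b))

print(np.round(p_bar, 2), np.round(p_check, 2))
print(np.all(p_check <= np.minimum(p_a, p_b)),
      np.all(p_bar >= np.maximum(p_a, p_b)))
```

In both states the pessimistic price lies below both homogeneous-belief prices and the optimistic price lies above them, in line with the summary table.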

74.4.5 Further Interpretation

Scheinkman [120] interprets the Harrison-Kreps model as a model of a bubble, that is, a
situation in which an asset price exceeds what every investor thinks is merited by the asset’s
underlying dividend stream
Scheinkman stresses these features of the Harrison-Kreps model:

• Compared to the homogeneous beliefs setting leading to the pricing formula Eq. (1), high
volume occurs when the Harrison-Kreps pricing formula Eq. (2) prevails

Type 𝑎 investors sell the entire stock of the asset to type 𝑏 investors every time the state
switches from 𝑠𝑡 = 0 to 𝑠𝑡 = 1
Type 𝑏 investors sell the asset to type 𝑎 investors every time the state switches from 𝑠𝑡 = 1 to
𝑠𝑡 = 0
Scheinkman takes this as a strength of the model because he observes high volume during
famous bubbles

• If the supply of the asset is increased sufficiently either physically (more “houses” are
built) or artificially (ways are invented to short sell “houses”), bubbles end when the
supply has grown enough to outstrip optimistic investors’ resources for purchasing the
asset
• If optimistic investors finance purchases by borrowing, tightening leverage constraints
can extinguish a bubble

Scheinkman extracts insights about the effects of financial regulations on bubbles


He emphasizes how limiting short sales and limiting leverage have opposite effects

74.5 Exercises

74.5.1 Exercise 1

Recreate the summary table using the functions we have built above

𝑠𝑡 0 1
𝑝𝑎 1.33 1.22
𝑝𝑏 1.45 1.91
𝑝𝑜 1.85 2.08
𝑝𝑝 1 1
𝑝𝑎̂ 1.85 1.69
𝑝𝑏̂ 1.69 2.08

You will first need to define the transition matrices and dividend payoff vector

74.6 Solutions

74.6.1 Exercise 1

First, we will obtain equilibrium price vectors with homogeneous beliefs, including when all
investors are optimistic or pessimistic

In [7]: qa = np.array([[1/2, 1/2], [2/3, 1/3]])     # Type a transition matrix
qb = np.array([[2/3, 1/3], [1/4, 3/4]])     # Type b transition matrix
qopt = np.array([[1/2, 1/2], [1/4, 3/4]])   # Optimistic investor transition matrix
qpess = np.array([[2/3, 1/3], [2/3, 1/3]])  # Pessimistic investor transition matrix

dividendreturn = np.array([[0], [1]])

transitions = [qa, qb, qopt, qpess]
labels = ['p_a', 'p_b', 'p_optimistic', 'p_pessimistic']

for transition, label in zip(transitions, labels):
    print(label)
    print("=" * 20)
    s0, s1 = np.round(price_single_beliefs(transition, dividendreturn), 2)
    print(f"State 0: {s0}")
    print(f"State 1: {s1}")
    print("-" * 20)

p_a
====================
State 0: [1.33]
State 1: [1.22]
--------------------
p_b
====================
State 0: [1.45]
State 1: [1.91]
--------------------
p_optimistic
====================
State 0: [1.85]
State 1: [2.08]
--------------------
p_pessimistic
====================
State 0: [1.]
State 1: [1.]
--------------------

We will use the price_optimistic_beliefs function to find the price under heterogeneous be-
liefs

In [8]: opt_beliefs = price_optimistic_beliefs([qa, qb], dividendreturn)
labels = ['p_optimistic', 'p_hat_a', 'p_hat_b']

for p, label in zip(opt_beliefs, labels):
    print(label)
    print("=" * 20)
    s0, s1 = np.round(p, 2)
    print(f"State 0: {s0}")
    print(f"State 1: {s1}")
    print("-" * 20)

p_optimistic
====================
State 0: [1.85]
State 1: [2.08]
--------------------
p_hat_a
====================
State 0: [1.85]
State 1: [1.69]
--------------------
p_hat_b
====================
State 0: [1.69]
State 1: [2.08]
--------------------

Notice that the equilibrium price with heterogeneous beliefs is equal to the price under single
beliefs with optimistic investors - this is due to the marginal investor being the temporarily
optimistic type
Footnotes
[1] By assuming that both types of agents always have “deep enough pockets” to purchase
all of the asset, the model takes wealth dynamics off the table. The Harrison-Kreps model
generates high trading volume when the state changes either from 0 to 1 or from 1 to 0.
75

Two Modifications of Mean-variance Portfolio Theory

75.1 Contents

• Overview 75.2

• Appendix 75.3

Authors: Daniel Csaba, Thomas J. Sargent and Balint Szoke

75.2 Overview

75.2.1 Remarks About Estimating Means and Variances

The famous Black-Litterman (1992) [19] portfolio choice model that we describe in this lec-
ture is motivated by the finding that with high or moderate frequency data, means are more
difficult to estimate than variances
A model of robust portfolio choice that we’ll describe also begins from the same starting
point
To begin, we’ll take for granted that means are more difficult to estimate than covariances
and will focus on how Black and Litterman, on the one hand, and robust control theorists,
on the other, would recommend modifying the mean-variance portfolio choice model
to take that into account
At the end of this lecture, we shall use some rates of convergence results and some simula-
tions to verify how means are more difficult to estimate than variances
Among the ideas in play in this lecture will be

• Mean-variance portfolio theory


• Bayesian approaches to estimating linear regressions
• A risk-sensitivity operator and its connection to robust control theory

In [1]: import numpy as np
import scipy as sp
import scipy.stats as stat
import matplotlib.pyplot as plt
%matplotlib inline
from ipywidgets import interact, FloatSlider

75.2.2 Adjusting Mean-variance Portfolio Choice Theory for Distrust of Mean Excess Returns

This lecture describes two lines of thought that modify the classic mean-variance portfolio
choice model in ways designed to make its recommendations more plausible
As we mentioned above, the two approaches build on a common and widespread hunch – that
because it is much easier statistically to estimate covariances of excess returns than it is to
estimate their means, it makes sense to contemplate the consequences of adjusting investors’
subjective beliefs about mean returns in order to render more sensible decisions
Both of the adjustments that we describe are designed to confront a widely recognized em-
barrassment to mean-variance portfolio theory, namely, that it usually implies taking very
extreme long-short portfolio positions

75.2.3 Mean-variance Portfolio Choice

A risk-free security earns one-period net return 𝑟𝑓


An 𝑛 × 1 vector of risky securities earns an 𝑛 × 1 vector 𝑟 ⃗ − 𝑟𝑓 1 of excess returns, where 1 is
an 𝑛 × 1 vector of ones
The excess return vector is multivariate normal with mean 𝜇 and covariance matrix Σ, which
we express either as

𝑟 ⃗ − 𝑟𝑓 1 ∼ 𝒩(𝜇, Σ)

or

𝑟 ⃗ − 𝑟𝑓 1 = 𝜇 + 𝐶𝜖

where 𝜖 ∼ 𝒩(0, 𝐼) is an 𝑛 × 1 random vector and 𝐶 is an 𝑛 × 𝑛 matrix satisfying 𝐶𝐶′ = Σ


Let 𝑤 be an 𝑛 × 1 vector of portfolio weights
A portfolio with weights 𝑤 earns returns

𝑤′ (𝑟 ⃗ − 𝑟𝑓 1) ∼ 𝒩(𝑤′ 𝜇, 𝑤′ Σ𝑤)

The mean-variance portfolio choice problem is to choose 𝑤 to maximize

$$
U(\mu, \Sigma; w) = w' \mu - \frac{\delta}{2} w' \Sigma w \tag{1}
$$
where 𝛿 > 0 is a risk-aversion parameter. The first-order condition for maximizing Eq. (1)
with respect to the vector 𝑤 is

𝜇 = 𝛿Σ𝑤

which implies the following design of a risky portfolio:

$$
w = (\delta \Sigma)^{-1} \mu \tag{2}
$$
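To make the portfolio rule concrete, here is a minimal two-asset sketch. All numbers (the risk-aversion parameter, mean excess returns and covariance matrix) are made up for illustration and do not come from the lecture.

```python
import numpy as np

δ = 2.0                                   # risk-aversion parameter (hypothetical)
μ = np.array([0.05, 0.08])                # mean excess returns (hypothetical)
Σ = np.array([[0.04, 0.01],
              [0.01, 0.09]])              # covariance of excess returns (hypothetical)

w = np.linalg.solve(δ * Σ, μ)             # Eq. (2) without forming an inverse


def U(w):
    "Mean-variance objective from Eq. (1)."
    return w @ μ - δ / 2 * w @ Σ @ w


print(w)
print(np.allclose(μ, δ * Σ @ w))          # first-order condition μ = δ Σ w
print(U(w) >= U(w + np.array([0.01, -0.02])))
```

Solving the linear system rather than inverting 𝛿Σ is a design choice made here for numerical stability. Because Σ is positive definite, the objective is strictly concave, so any perturbation of 𝑤 lowers it.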

75.2.4 Estimating the Mean and Variance

The key inputs into the portfolio choice model Eq. (2) are

• estimates of the parameters 𝜇, Σ of the random excess return vector(𝑟 ⃗ − 𝑟𝑓 1)


• the risk-aversion parameter 𝛿

A standard way of estimating 𝜇 is maximum-likelihood or least squares; that amounts to
estimating 𝜇 by a sample mean of excess returns and estimating Σ by a sample covariance matrix

75.2.5 The Black-Litterman Starting Point

When estimates of 𝜇 and Σ from historical sample means and covariances have been com-
bined with reasonable values of the risk-aversion parameter 𝛿 to compute an optimal port-
folio from formula Eq. (2), a typical outcome has been 𝑤’s with extreme long and short
positions
A common reaction to these outcomes is that they are so unreasonable that a portfolio man-
ager cannot recommend them to a customer

In [2]: np.random.seed(12)

N = 10 # Number of assets
T = 200 # Sample size

# random market portfolio (sum is normalized to 1)


w_m = np.random.rand(N)
w_m = w_m / (w_m.sum())

# True risk premia and variance of excess return (constructed so that the Sharpe ratio is 1)
μ = (np.random.randn(N) + 5) /100 # Mean excess return (risk premium)
S = np.random.randn(N, N) # Random matrix for the covariance matrix
V = S @ S.T # Turn the random matrix into symmetric psd
Σ = V * (w_m @ μ)**2 / (w_m @ V @ w_m) # Make sure that the Sharpe ratio is one

# Risk aversion of market portfolio holder


δ = 1 / np.sqrt(w_m @ Σ @ w_m)

# Generate a sample of excess returns


excess_return = stat.multivariate_normal(μ, Σ)
sample = excess_return.rvs(T)

# Estimate μ and Σ
μ_est = sample.mean(0).reshape(N, 1)
Σ_est = np.cov(sample.T)

w = np.linalg.solve(δ * Σ_est, μ_est)

fig, ax = plt.subplots(figsize=(8, 5))


ax.set_title('Mean-variance portfolio weights recommendation and the market portfolio')
ax.plot(np.arange(N)+1, w, 'o', c='k', label='$w$ (mean-variance)')
ax.plot(np.arange(N)+1, w_m, 'o', c='r', label='$w_m$ (market portfolio)')
ax.vlines(np.arange(N)+1, 0, w, lw=1)
ax.vlines(np.arange(N)+1, 0, w_m, lw=1)
ax.axhline(0, c='k')
ax.axhline(-1, c='k', ls='--')

ax.axhline(1, c='k', ls='--')


ax.set_xlabel('Assets')
ax.xaxis.set_ticks(np.arange(1, N+1, 1))
plt.legend(numpoints=1, fontsize=11)
plt.show()

Black and Litterman responded to this situation in the following way:

• They continue to accept Eq. (2) as a good model for choosing an optimal portfolio 𝑤
• They want to continue to allow the customer to express his or her risk tolerance by set-
ting 𝛿
• Leaving Σ at its maximum-likelihood value, they push 𝜇 away from its maximum-likelihood
value in a way designed to make portfolio choices that are more plausible in terms of
conforming to what most people actually do

In particular, given Σ and a reasonable value of 𝛿, Black and Litterman reverse engineered
a vector 𝜇𝐵𝐿 of mean excess returns that makes the 𝑤 implied by formula Eq. (2) equal the
actual market portfolio 𝑤𝑚 , so that

𝑤𝑚 = (𝛿Σ)−1 𝜇𝐵𝐿

75.2.6 Details

Let’s define


$$w_m' \mu \equiv (r_m - r_f)$$

as the (scalar) excess return on the market portfolio 𝑤𝑚


Define

$$\sigma^2 = w_m' \Sigma w_m$$

as the variance of the excess return on the market portfolio 𝑤𝑚


Define

$$\text{SR}_m = \frac{r_m - r_f}{\sigma}$$

as the Sharpe-ratio on the market portfolio 𝑤𝑚

Let 𝛿𝑚 be the value of the risk aversion parameter that induces an investor to hold the market
portfolio in light of the optimal portfolio choice rule Eq. (2)

Evidently, portfolio rule Eq. (2) then implies that $r_m - r_f = \delta_m \sigma^2$ or

$$\delta_m = \frac{r_m - r_f}{\sigma^2}$$

or

$$\delta_m = \frac{\text{SR}_m}{\sigma}$$

Following the Black-Litterman philosophy, our first step will be to back out a value of 𝛿𝑚 from

• an estimate of the Sharpe-ratio, and
• our maximum likelihood estimate of 𝜎 drawn from our estimates of 𝑤𝑚 and Σ

The second key Black-Litterman step is then to use this value of 𝛿 together with the maxi-
mum likelihood estimate of Σ to deduce a 𝜇BL that verifies portfolio rule Eq. (2) at the mar-
ket portfolio 𝑤 = 𝑤𝑚

𝜇𝑚 = 𝛿𝑚 Σ𝑤𝑚

The starting point of the Black-Litterman portfolio choice model is thus a pair (𝛿𝑚 , 𝜇𝑚 ) that
tells the customer to hold the market portfolio
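
The two steps just described can be sketched on synthetic inputs. This is a minimal illustration, not the lecture's own cell: the names `w_mkt`, `sharpe`, `delta_m` and `mu_bl`, the assumed Sharpe ratio of 0.5, and the random covariance matrix are all hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4

# Hypothetical market portfolio (weights normalized to sum to one)
w_mkt = rng.random(N)
w_mkt /= w_mkt.sum()

# Hypothetical positive definite covariance matrix
A = rng.standard_normal((N, N))
Sigma = A @ A.T + N * np.eye(N)

sharpe = 0.5                                # assumed Sharpe ratio
sigma = np.sqrt(w_mkt @ Sigma @ w_mkt)      # std. dev. of the market excess return

delta_m = sharpe / sigma                    # step 1: delta_m = SR_m / sigma
mu_bl = delta_m * Sigma @ w_mkt             # step 2: mu_m = delta_m * Sigma * w_m

# Applying the portfolio rule w = (delta * Sigma)^{-1} mu recovers the market portfolio
w_implied = np.linalg.solve(delta_m * Sigma, mu_bl)
```

By construction `w_implied` coincides with `w_mkt`, and the Sharpe ratio implied by `mu_bl` equals the assumed value.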

In [3]: # Observed mean excess market return


r_m = w_m @ μ_est

# Estimated variance of the market portfolio


σ_m = w_m @ Σ_est @ w_m

# Sharpe-ratio
SR_m = r_m / np.sqrt(σ_m)

# Risk aversion of market portfolio holder


d_m = r_m / σ_m

# Derive "view" which would induce the market portfolio


μ_m = (d_m * Σ_est @ w_m).reshape(N, 1)

fig, ax = plt.subplots(figsize=(8, 5))


ax.set_title(r'Difference between $\hat{\mu}$ (estimate) and $\mu_{BL}$ (market implied)')
ax.plot(np.arange(N)+1, μ_est, 'o', c='k', label=r'$\hat{\mu}$')
ax.plot(np.arange(N)+1, μ_m, 'o', c='r', label=r'$\mu_{BL}$')


ax.vlines(np.arange(N) + 1, μ_m, μ_est, lw=1)
ax.axhline(0, c='k', ls='--')
ax.set_xlabel('Assets')
ax.xaxis.set_ticks(np.arange(1, N+1, 1))
plt.legend(numpoints=1)
plt.show()

75.2.7 Adding Views

Black and Litterman start with a baseline customer who asserts that he or she shares the
market’s views, which means that he or she believes that excess returns are governed by

$$\vec r - r_f \mathbf{1} \sim \mathcal{N}(\mu_{BL}, \Sigma) \tag{3}$$

Black and Litterman would advise that customer to hold the market portfolio of risky securi-
ties
Black and Litterman then imagine a consumer who would like to express a view that differs
from the market’s
The consumer wants appropriately to mix his view with the market’s before using Eq. (2) to
choose a portfolio
Suppose that the customer’s view is expressed by a hunch that rather than Eq. (3), excess
returns are governed by

$$\vec r - r_f \mathbf{1} \sim \mathcal{N}(\hat\mu, \tau\Sigma)$$

where 𝜏 > 0 is a scalar parameter that determines how the decision maker wants to mix his
view 𝜇̂ with the market’s view 𝜇BL
Black and Litterman would then use a formula like the following one to mix the views 𝜇̂ and
𝜇BL

$$\tilde\mu = (\Sigma^{-1} + (\tau\Sigma)^{-1})^{-1} (\Sigma^{-1}\mu_{BL} + (\tau\Sigma)^{-1}\hat\mu) \tag{4}$$

Black and Litterman would then advise the customer to hold the portfolio associated with
these views implied by rule Eq. (2):

𝑤̃ = (𝛿Σ)−1 𝜇̃

This portfolio 𝑤̃ will deviate from the portfolio 𝑤𝐵𝐿 in amounts that depend on the mixing
parameter 𝜏 .
If 𝜇̂ is the maximum likelihood estimator and 𝜏 is chosen heavily to weight this view, then the
customer’s portfolio will involve big short-long positions

In [4]: def black_litterman(λ, μ1, μ2, Σ1, Σ2):
    """
    This function calculates the Black-Litterman mixture
    mean excess return
    """
    Σ1_inv = np.linalg.inv(Σ1)
    Σ2_inv = np.linalg.inv(Σ2)

    μ_tilde = np.linalg.solve(Σ1_inv + λ * Σ2_inv,
                              Σ1_inv @ μ1 + λ * Σ2_inv @ μ2)
    return μ_tilde

τ = 1
μ_tilde = black_litterman(1, μ_m, μ_est, Σ_est, τ * Σ_est)

# The Black-Litterman recommendation for the portfolio weights
w_tilde = np.linalg.solve(δ * Σ_est, μ_tilde)

τ_slider = FloatSlider(min=0.05, max=10, step=0.5, value=τ)

@interact(τ=τ_slider)
def BL_plot(τ):
    μ_tilde = black_litterman(1, μ_m, μ_est, Σ_est, τ * Σ_est)
    w_tilde = np.linalg.solve(δ * Σ_est, μ_tilde)

    fig, ax = plt.subplots(1, 2, figsize=(16, 6))

    # Left panel: the three mean vectors
    ax[0].plot(np.arange(N)+1, μ_est, 'o', c='k', label=r'$\hat{\mu}$ (subj view)')
    ax[0].plot(np.arange(N)+1, μ_m, 'o', c='r', label=r'$\mu_{BL}$ (market)')
    ax[0].plot(np.arange(N)+1, μ_tilde, 'o', c='y', label=r'$\tilde{\mu}$ (mixture)')
    ax[0].vlines(np.arange(N)+1, μ_m, μ_est, lw=1)
    ax[0].axhline(0, c='k', ls='--')
    ax[0].set(xlim=(0, N+1), xlabel='Assets',
              title=r'Relationship between $\hat{\mu}$, $\mu_{BL}$ and $\tilde{\mu}$')
    ax[0].xaxis.set_ticks(np.arange(1, N+1, 1))
    ax[0].legend(numpoints=1)

    # Right panel: the implied portfolio weights
    ax[1].set_title('Black-Litterman portfolio weight recommendation')
    ax[1].plot(np.arange(N)+1, w, 'o', c='k', label=r'$w$ (mean-variance)')
    ax[1].plot(np.arange(N)+1, w_m, 'o', c='r', label=r'$w_{m}$ (market, BL)')
    ax[1].plot(np.arange(N)+1, w_tilde, 'o', c='y', label=r'$\tilde{w}$ (mixture)')
    ax[1].vlines(np.arange(N)+1, 0, w, lw=1)
    ax[1].vlines(np.arange(N)+1, 0, w_m, lw=1)
    ax[1].axhline(0, c='k')
    ax[1].axhline(-1, c='k', ls='--')
    ax[1].axhline(1, c='k', ls='--')
    ax[1].set(xlim=(0, N+1), xlabel='Assets',
              title='Black-Litterman portfolio weight recommendation')
    ax[1].xaxis.set_ticks(np.arange(1, N+1, 1))
    ax[1].legend(numpoints=1)
    plt.show()

75.2.8 Bayes Interpretation of the Black-Litterman Recommendation

Consider the following Bayesian interpretation of the Black-Litterman recommendation


The prior belief over the mean excess returns is consistent with the market portfolio and is
given by

𝜇 ∼ 𝒩(𝜇𝐵𝐿 , Σ)

Given a particular realization of the mean excess returns 𝜇 one observes the average excess
returns 𝜇̂ on the market according to the distribution

𝜇̂ ∣ 𝜇, Σ ∼ 𝒩(𝜇, 𝜏 Σ)

where 𝜏 is typically small capturing the idea that the variation in the mean is smaller than
the variation of the individual random variable
Given the realized excess returns one should then update the prior over the mean excess re-
turns according to Bayes rule
The corresponding posterior over mean excess returns is normally distributed with mean

(Σ−1 + (𝜏 Σ)−1 )−1 (Σ−1 𝜇𝐵𝐿 + (𝜏 Σ)−1 𝜇)̂

The covariance matrix is

(Σ−1 + (𝜏 Σ)−1 )−1

Hence, the Black-Litterman recommendation is consistent with the Bayes update of the prior
over the mean excess returns in light of the realized average excess returns on the market
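
The Bayesian update above can be checked numerically. The sketch below is only an illustration under assumed inputs: the helper `posterior` and the numbers for `Sigma`, `mu_bl`, `mu_hat` and `tau` are hypothetical, not taken from the lecture's cells.

```python
import numpy as np

def posterior(mu_prior, Sigma_prior, mu_obs, Sigma_obs):
    """Gaussian posterior mean and covariance from precision weighting."""
    P_prior = np.linalg.inv(Sigma_prior)      # prior precision
    P_obs = np.linalg.inv(Sigma_obs)          # precision of the observed average
    cov_post = np.linalg.inv(P_prior + P_obs)
    mean_post = cov_post @ (P_prior @ mu_prior + P_obs @ mu_obs)
    return mean_post, cov_post

# Hypothetical two-asset inputs
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
mu_bl = np.array([0.03, 0.05])       # prior (market-implied) mean
mu_hat = np.array([0.10, -0.02])     # observed average excess return
tau = 0.25

mean_post, cov_post = posterior(mu_bl, Sigma, mu_hat, tau * Sigma)

# Because both covariance matrices are proportional to Sigma, the update
# collapses to scalar weights: 1 / (1 + tau) on mu_hat, tau / (1 + tau) on mu_bl
weight_on_view = 1 / (1 + tau)
mean_scalar = weight_on_view * mu_hat + (1 - weight_on_view) * mu_bl
```

With a small 𝜏 the posterior mean sits close to `mu_hat`, and the posterior covariance equals 𝜏/(1 + 𝜏) times Σ.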

75.2.9 Curve Decolletage

Consider two independent “competing” views on the excess market returns

𝑟𝑒⃗ ∼ 𝒩(𝜇𝐵𝐿 , Σ)

and

𝑟𝑒⃗ ∼ 𝒩(𝜇,̂ 𝜏 Σ)

A special feature of the multivariate normal random variable 𝑍 is that its density function
depends on the realization 𝑧 only through the (Euclidean) length of the standardized realization
Formally, let the 𝑘-dimensional random vector be

$$Z \sim \mathcal{N}(\mu, \Sigma)$$

then

$$\bar Z \equiv \Sigma^{-1/2}(Z - \mu) \sim \mathcal{N}(0, I)$$

and so the points where the density takes the same value can be described by the ellipse

𝑧 ̄ ⋅ 𝑧 ̄ = (𝑧 − 𝜇)′ Σ−1 (𝑧 − 𝜇) = 𝑑 ̄ (5)

where 𝑑 ̄ ∈ R+ denotes the (transformation) of a particular density value


The curves defined by equation Eq. (5) can be labeled as iso-likelihood ellipses
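
As a small numerical check of Eq. (5), the sketch below standardizes a point with the symmetric inverse square root of Σ and compares the squared Euclidean length with the quadratic form. The inputs `mu`, `Sigma` and `z` are hypothetical, and using the symmetric square root (rather than, say, a Cholesky factor) is a choice made for this example.

```python
import numpy as np

mu = np.array([0.02, 0.05])
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])

# Symmetric inverse square root of Sigma via its eigendecomposition
eigval, eigvec = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T

z = np.array([0.10, -0.03])               # an arbitrary realization
z_bar = Sigma_inv_sqrt @ (z - mu)         # standardized realization

# The squared length of z_bar equals the quadratic form in Eq. (5)
d_bar = (z - mu) @ np.linalg.solve(Sigma, z - mu)
```

All points `z` sharing the same value of `d_bar` lie on one iso-likelihood ellipse centered at `mu`.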

Remark: More generally there is a class of density functions that possesses this
feature, i.e.

∃𝑔 ∶ R+ ↦ R+ and 𝑐 ≥ 0, s.t. the density 𝑓 of 𝑍 has the form 𝑓(𝑧) = 𝑐𝑔(𝑧 ⋅ 𝑧)

This property is called spherical symmetry (see p 81. in Leamer (1978) [83])
In our specific example, we can use the pair (𝑑1̄ , 𝑑2̄ ) as being two “likelihood” values for which
the corresponding iso-likelihood ellipses in the excess return space are given by

$$(\vec r_e - \mu_{BL})' \Sigma^{-1} (\vec r_e - \mu_{BL}) = \bar d_1$$

$$(\vec r_e - \hat\mu)' (\tau\Sigma)^{-1} (\vec r_e - \hat\mu) = \bar d_2$$

Notice that for particular 𝑑1̄ and 𝑑2̄ values the two ellipses have a tangency point
These tangency points, indexed by the pairs (𝑑1̄ , 𝑑2̄ ), characterize points 𝑟𝑒⃗ from which there
exists no deviation where one can increase the likelihood of one view without decreasing the
likelihood of the other view
The pairs (𝑑1̄ , 𝑑2̄ ) for which there is such a point outline a curve in the excess return space.
This curve is reminiscent of the Pareto curve in an Edgeworth-box setting
Dickey (1975) [35] calls it a curve decolletage
Leamer (1978) [83] calls it an information contract curve and describes it by the following
program: maximize the likelihood of one view, say the Black-Litterman recommendation
while keeping the likelihood of the other view at least at a prespecified constant 𝑑2̄

$$\bar d_1(\bar d_2) \equiv \max_{\vec r_e} \; (\vec r_e - \mu_{BL})' \Sigma^{-1} (\vec r_e - \mu_{BL})$$

subject to $(\vec r_e - \hat\mu)' (\tau\Sigma)^{-1} (\vec r_e - \hat\mu) \geq \bar d_2$

Denoting the multiplier on the constraint by 𝜆, the first-order condition is

2(𝑟𝑒⃗ − 𝜇𝐵𝐿 )′ Σ−1 + 𝜆2(𝑟𝑒⃗ − 𝜇)̂ ′ (𝜏 Σ)−1 = 0

which defines the information contract curve between 𝜇𝐵𝐿 and 𝜇̂

𝑟𝑒⃗ = (Σ−1 + 𝜆(𝜏 Σ)−1 )−1 (Σ−1 𝜇𝐵𝐿 + 𝜆(𝜏 Σ)−1 𝜇)̂ (6)

Note that if 𝜆 = 1, Eq. (6) is equivalent with Eq. (4) and it identifies one point on the infor-
mation contract curve.
Furthermore, because 𝜆 is a function of the minimum likelihood 𝑑2̄ on the RHS of the con-
straint, by varying 𝑑2̄ (or 𝜆 ), we can trace out the whole curve as the figure below illustrates

In [5]: np.random.seed(1987102)

N = 2 # Number of assets
T = 200 # Sample size
τ = 0.8

# Random market portfolio (sum is normalized to 1)


w_m = np.random.rand(N)
w_m = w_m / (w_m.sum())

μ = (np.random.randn(N) + 5) / 100
S = np.random.randn(N, N)
V = S @ S.T
Σ = V * (w_m @ μ)**2 / (w_m @ V @ w_m)

excess_return = stat.multivariate_normal(μ, Σ)
sample = excess_return.rvs(T)

μ_est = sample.mean(0).reshape(N, 1)
Σ_est = np.cov(sample.T)

σ_m = w_m @ Σ_est @ w_m


d_m = (w_m @ μ_est) / σ_m
μ_m = (d_m * Σ_est @ w_m).reshape(N, 1)

N_r1, N_r2 = 100, 100


r1 = np.linspace(-0.04, .1, N_r1)
r2 = np.linspace(-0.02, .15, N_r2)

λ_grid = np.linspace(.001, 20, 100)


curve = np.asarray([black_litterman(λ, μ_m, μ_est, Σ_est,
τ * Σ_est).flatten() for λ in λ_grid])

λ_slider = FloatSlider(min=.1, max=7, step=.5, value=1)

@interact(λ=λ_slider)
def decolletage(λ):
    dist_r_BL = stat.multivariate_normal(μ_m.squeeze(), Σ_est)
    dist_r_hat = stat.multivariate_normal(μ_est.squeeze(), τ * Σ_est)

    # Evaluate both densities on the (r1, r2) grid
    X, Y = np.meshgrid(r1, r2)
    Z_BL = np.zeros((N_r1, N_r2))
    Z_hat = np.zeros((N_r1, N_r2))

    for i in range(N_r1):
        for j in range(N_r2):
            Z_BL[i, j] = dist_r_BL.pdf(np.hstack([X[i, j], Y[i, j]]))
            Z_hat[i, j] = dist_r_hat.pdf(np.hstack([X[i, j], Y[i, j]]))

    μ_tilde = black_litterman(λ, μ_m, μ_est, Σ_est, τ * Σ_est).flatten()

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.contourf(X, Y, Z_hat, cmap='viridis', alpha=.4)
    ax.contourf(X, Y, Z_BL, cmap='viridis', alpha=.4)
    # Iso-likelihood ellipses passing through the mixture point
    ax.contour(X, Y, Z_BL, [dist_r_BL.pdf(μ_tilde)], cmap='viridis', alpha=.9)
    ax.contour(X, Y, Z_hat, [dist_r_hat.pdf(μ_tilde)], cmap='viridis', alpha=.9)
    ax.scatter(μ_est[0], μ_est[1])
    ax.scatter(μ_m[0], μ_m[1])
    ax.scatter(μ_tilde[0], μ_tilde[1], c='k', s=20*3)

    ax.plot(curve[:, 0], curve[:, 1], c='k')
    ax.axhline(0, c='k', alpha=.8)
    ax.axvline(0, c='k', alpha=.8)
    ax.set_xlabel(r'Excess return on the first asset, $r_{e, 1}$')
    ax.set_ylabel(r'Excess return on the second asset, $r_{e, 2}$')
    ax.text(μ_est[0] + 0.003, μ_est[1], r'$\hat{\mu}$')
    ax.text(μ_m[0] + 0.003, μ_m[1] + 0.005, r'$\mu_{BL}$')
    plt.show()

Note that the curve that connects the two points 𝜇̂ and 𝜇𝐵𝐿 is a straight line, which comes from
the fact that the covariance matrices of the two competing distributions (views) are proportional
to each other
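
A quick numerical check of this straight-line property is sketched below. The helper `contract_point` re-implements Eq. (6) in a self-contained way, and the inputs (`Sigma`, `mu_bl`, `mu_hat`, `tau`) are hypothetical values chosen for the example. The second curve swaps in 𝜏𝐼 for the second covariance matrix to show that proportionality is what drives the result.

```python
import numpy as np

def contract_point(lam, mu1, mu2, Sigma1, Sigma2):
    """Point on the information contract curve, Eq. (6), for multiplier lam."""
    P1, P2 = np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)
    return np.linalg.solve(P1 + lam * P2, P1 @ mu1 + lam * P2 @ mu2)

def max_dev_from_line(points, a, b):
    """Largest deviation of `points` from the straight line through a and b."""
    d = (b - a) / np.linalg.norm(b - a)
    rel = points - a
    perp = rel - np.outer(rel @ d, d)     # component orthogonal to the line
    return np.abs(perp).max()

Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
mu_bl = np.array([0.03, 0.05])
mu_hat = np.array([0.10, -0.02])
tau = 0.8
lams = np.linspace(0.01, 20, 50)

# Proportional covariances (Sigma and tau * Sigma)
curve_prop = np.array([contract_point(l, mu_bl, mu_hat, Sigma, tau * Sigma)
                       for l in lams])
# Non-proportional covariances (Sigma and tau * I)
curve_eye = np.array([contract_point(l, mu_bl, mu_hat, Sigma, tau * np.eye(2))
                      for l in lams])

dev_prop = max_dev_from_line(curve_prop, mu_bl, mu_hat)
dev_eye = max_dev_from_line(curve_eye, mu_bl, mu_hat)
```

Here `dev_prop` is zero up to rounding error, while `dev_eye` is visibly positive.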
To illustrate the fact that this is not necessarily the case, consider another example using the
same parameter values, except that the “second view” constituting the constraint has covari-
ance matrix 𝜏 𝐼 instead of 𝜏 Σ
This leads to the following figure, in which the curve connecting 𝜇̂ and 𝜇𝐵𝐿 bends

In [6]: λ_grid = np.linspace(.001, 20000, 1000)


curve = np.asarray([black_litterman(λ, μ_m, μ_est, Σ_est,
τ * np.eye(N)).flatten() for λ in λ_grid])

λ_slider = FloatSlider(min=5, max=1500, step=100, value=200)

@interact(λ=λ_slider)
def decolletage(λ):
    dist_r_BL = stat.multivariate_normal(μ_m.squeeze(), Σ_est)
    dist_r_hat = stat.multivariate_normal(μ_est.squeeze(), τ * np.eye(N))

    # Evaluate both densities on the (r1, r2) grid
    X, Y = np.meshgrid(r1, r2)
    Z_BL = np.zeros((N_r1, N_r2))
    Z_hat = np.zeros((N_r1, N_r2))

    for i in range(N_r1):
        for j in range(N_r2):
            Z_BL[i, j] = dist_r_BL.pdf(np.hstack([X[i, j], Y[i, j]]))
            Z_hat[i, j] = dist_r_hat.pdf(np.hstack([X[i, j], Y[i, j]]))

    μ_tilde = black_litterman(λ, μ_m, μ_est, Σ_est, τ * np.eye(N)).flatten()

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.contourf(X, Y, Z_hat, cmap='viridis', alpha=.4)
    ax.contourf(X, Y, Z_BL, cmap='viridis', alpha=.4)
    # Iso-likelihood contours passing through the mixture point
    ax.contour(X, Y, Z_BL, [dist_r_BL.pdf(μ_tilde)], cmap='viridis', alpha=.9)
    ax.contour(X, Y, Z_hat, [dist_r_hat.pdf(μ_tilde)], cmap='viridis', alpha=.9)
    ax.scatter(μ_est[0], μ_est[1])
    ax.scatter(μ_m[0], μ_m[1])
    ax.scatter(μ_tilde[0], μ_tilde[1], c='k', s=20*3)

    ax.plot(curve[:, 0], curve[:, 1], c='k')
    ax.axhline(0, c='k', alpha=.8)
    ax.axvline(0, c='k', alpha=.8)
    ax.set_xlabel(r'Excess return on the first asset, $r_{e, 1}$')
    ax.set_ylabel(r'Excess return on the second asset, $r_{e, 2}$')
    ax.text(μ_est[0] + 0.003, μ_est[1], r'$\hat{\mu}$')
    ax.text(μ_m[0] + 0.003, μ_m[1] + 0.005, r'$\mu_{BL}$')
    plt.show()

75.2.10 Black-Litterman Recommendation as Regularization

First, consider the OLS regression

$$\min_\beta \|X\beta - y\|^2$$

which yields the solution

$$\hat\beta_{OLS} = (X'X)^{-1}X'y$$

A common performance measure of estimators is the mean squared error (MSE)


An estimator is “good” if its MSE is relatively small. Suppose that 𝛽0 is the “true” value of
the coefficient, then the MSE of the OLS estimator is

$$\text{mse}(\hat\beta_{OLS}, \beta_0) := \mathbb{E}\|\hat\beta_{OLS} - \beta_0\|^2 = \underbrace{\mathbb{E}\|\hat\beta_{OLS} - \mathbb{E}\hat\beta_{OLS}\|^2}_{\text{variance}} + \underbrace{\|\mathbb{E}\hat\beta_{OLS} - \beta_0\|^2}_{\text{bias}}$$

From this decomposition, one can see that in order for the MSE to be small, both the bias
and the variance terms must be small
For example, consider the case when 𝑋 is a 𝑇-vector of ones (where 𝑇 is the sample size), so
$\hat\beta_{OLS}$ is simply the sample average, while $\beta_0 \in \mathbb{R}$ is defined by the true mean of 𝑦
In this example the MSE is

$$\text{mse}(\hat\beta_{OLS}, \beta_0) = \underbrace{\frac{1}{T^2}\mathbb{E}\left(\sum_{t=1}^T (y_t - \beta_0)\right)^2}_{\text{variance}} + \underbrace{0}_{\text{bias}}$$

However, because there is a trade-off between the estimator’s bias and variance, there are
cases when by permitting a small bias we can substantially reduce the variance so overall the
MSE gets smaller
A typical scenario when this proves to be useful is when the number of coefficients to be esti-
mated is large relative to the sample size
In these cases, one approach to handle the bias-variance trade-off is the so called Tikhonov
regularization
A general form with regularization matrix Γ can be written as

$$\min_\beta \left\{ \|X\beta - y\|^2 + \|\Gamma(\beta - \tilde\beta)\|^2 \right\}$$

which yields the solution

$$\hat\beta_{Reg} = (X'X + \Gamma'\Gamma)^{-1}(X'y + \Gamma'\Gamma\tilde\beta)$$

Substituting the value of $\hat\beta_{OLS}$ yields

$$\hat\beta_{Reg} = (X'X + \Gamma'\Gamma)^{-1}(X'X\hat\beta_{OLS} + \Gamma'\Gamma\tilde\beta)$$

Often, the regularization matrix takes the form Γ = 𝜆𝐼 with 𝜆 > 0 and 𝛽 ̃ = 0
Then the Tikhonov regularization is equivalent to what is called ridge regression in statistics
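To illustrate how this estimator addresses the bias-variance trade-off, we compute the MSE of
the ridge estimator

$$\text{mse}(\hat\beta_{ridge}, \beta_0) = \underbrace{\frac{1}{(T+\lambda)^2}\mathbb{E}\left(\sum_{t=1}^T (y_t - \beta_0)\right)^2}_{\text{variance}} + \underbrace{\left(\frac{\lambda}{T+\lambda}\right)^2 \beta_0^2}_{\text{bias}}$$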

The ridge regression shrinks the coefficients of the estimated vector towards zero relative to
the OLS estimates thus reducing the variance term at the cost of introducing a “small” bias
However, there is nothing special about the zero vector
When 𝛽 ̃ ≠ 0 shrinkage occurs in the direction of 𝛽 ̃
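
The trade-off in the ridge MSE formula can be made concrete with a short calculation. The sketch below assumes IID observations with variance `sigma2`, so that the expectation of the squared sum equals `T * sigma2`; this assumption, the function name `ridge_mse`, and the parameter values are all hypothetical and serve only as an illustration. Here `lam` is the penalty weight as it enters the MSE formula above.

```python
import numpy as np

def ridge_mse(T, lam, beta0, sigma2):
    """MSE of the ridge estimator of a mean under IID data (assumed)."""
    variance = T * sigma2 / (T + lam) ** 2
    bias_sq = (lam / (T + lam)) ** 2 * beta0 ** 2
    return variance + bias_sq

T, beta0, sigma2 = 20, 0.5, 1.0

mse_ols = ridge_mse(T, 0.0, beta0, sigma2)      # lam = 0 gives the sample average
lam_grid = np.linspace(0, 50, 501)
mse_grid = np.array([ridge_mse(T, lam, beta0, sigma2) for lam in lam_grid])
best_lam = lam_grid[mse_grid.argmin()]          # penalty with the smallest MSE
```

In this setup a positive penalty lowers the MSE below the OLS value `sigma2 / T`, and the minimizing penalty is close to `sigma2 / beta0**2`, which follows from differentiating the formula with respect to `lam`.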
Now, we can give a regularization interpretation of the Black-Litterman portfolio recommen-
dation
To this end, simplify first the equation Eq. (4) characterizing the Black-Litterman recommen-
dation

𝜇̃ = (Σ−1 + (𝜏 Σ)−1 )−1 (Σ−1 𝜇𝐵𝐿 + (𝜏 Σ)−1 𝜇)̂


= (1 + 𝜏 −1 )−1 ΣΣ−1 (𝜇𝐵𝐿 + 𝜏 −1 𝜇)̂
= (1 + 𝜏 −1 )−1 (𝜇𝐵𝐿 + 𝜏 −1 𝜇)̂

In our case, 𝜇̂ is the estimated mean excess returns of securities. This could be written as a
vector autoregression where

• 𝑦 is the stacked vector of observed excess returns of size (𝑁𝑇 × 1) – 𝑁 securities and 𝑇 observations

• 𝑋 = 𝑇 −1 (𝐼𝑁 ⊗ 𝜄𝑇 ) where 𝐼𝑁 is the identity matrix and 𝜄𝑇 is a column vector of ones.

Correspondingly, the OLS regression of 𝑦 on 𝑋 would yield the mean excess returns as coeffi-
cients

With $\Gamma = \tau T^{-1}(I_N \otimes \iota_T)$ we can write the regularized version of the mean excess return
estimation

$$
\begin{aligned}
\hat\beta_{Reg} &= (X'X + \Gamma'\Gamma)^{-1}(X'X\hat\beta_{OLS} + \Gamma'\Gamma\tilde\beta) \\
&= (1+\tau)^{-1} X'X (X'X)^{-1} (\hat\beta_{OLS} + \tau\tilde\beta) \\
&= (1+\tau)^{-1}(\hat\beta_{OLS} + \tau\tilde\beta) \\
&= (1+\tau^{-1})^{-1}(\tau^{-1}\hat\beta_{OLS} + \tilde\beta)
\end{aligned}
$$

Given that $\hat\beta_{OLS} = \hat\mu$ and $\tilde\beta = \mu_{BL}$ in the Black-Litterman model, we have the following
interpretation of the model’s recommendation
The estimated (personal) view of the mean excess returns, 𝜇̂, which would lead to extreme
short-long positions, is “shrunk” towards the conservative market view, 𝜇𝐵𝐿 , that leads to
the more conservative market portfolio
So the Black-Litterman procedure results in a recommendation that is a compromise between
the conservative market portfolio and the more extreme portfolio that is implied by estimated
“personal” views

75.2.11 Digression on A Robust Control Operator

The Black-Litterman approach is partly inspired by the econometric insight that it is easier
to estimate covariances of excess returns than the means
That is what gave Black and Litterman license to adjust investors’ perception of mean excess
returns while not tampering with the covariance matrix of excess returns
The robust control theory is another approach that also hinges on adjusting mean excess re-
turns but not covariances
Associated with a robust control problem is what Hansen and Sargent [57], [52] call a T oper-
ator
Let’s define the T operator as it applies to the problem at hand
Let 𝑥 be an 𝑛 × 1 Gaussian random vector with mean vector 𝜇 and covariance matrix Σ =
𝐶𝐶 ′ . This means that 𝑥 can be represented as

𝑥 = 𝜇 + 𝐶𝜖

where 𝜖 ∼ 𝒩(0, 𝐼)
Let 𝜙(𝜖) denote the associated standardized Gaussian density
Let 𝑚(𝜖, 𝜇) be a likelihood ratio, meaning that it satisfies

• 𝑚(𝜖, 𝜇) > 0
• ∫ 𝑚(𝜖, 𝜇)𝜙(𝜖)𝑑𝜖 = 1

That is, 𝑚(𝜖, 𝜇) is a non-negative random variable with mean 1


Multiplying 𝜙(𝜖) by the likelihood ratio 𝑚(𝜖, 𝜇) produces a distorted distribution for 𝜖,
namely

$$\tilde\phi(\epsilon) = m(\epsilon, \mu)\phi(\epsilon)$$

The next concept that we need is the entropy of the distorted distribution 𝜙 ̃ with respect to
𝜙
Entropy is defined as

ent = ∫ log 𝑚(𝜖, 𝜇)𝑚(𝜖, 𝜇)𝜙(𝜖)𝑑𝜖

or

$$\text{ent} = \int \log m(\epsilon, \mu)\, \tilde\phi(\epsilon)\, d\epsilon$$

That is, relative entropy is the expected value of the log likelihood ratio log 𝑚 where the
expectation is taken with respect to the twisted density 𝜙 ̃
Relative entropy is non-negative. It is a measure of the discrepancy between two probability
distributions
As such, it plays an important role in governing the behavior of statistical tests designed to
discriminate one probability distribution from another
We are ready to define the T operator
Let 𝑉 (𝑥) be a value function
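Define

$$
\begin{aligned}
\mathsf{T}(V(x)) &= \min_{m(\epsilon,\mu)} \int m(\epsilon,\mu)\left[V(\mu + C\epsilon) + \theta \log m(\epsilon,\mu)\right]\phi(\epsilon)\,d\epsilon \\
&= -\theta \log \int \exp\left(\frac{-V(\mu + C\epsilon)}{\theta}\right)\phi(\epsilon)\,d\epsilon
\end{aligned}
$$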

This asserts that T is an indirect utility function for a minimization problem in which an evil
agent chooses a distorted probability distribution 𝜙 ̃ to lower expected utility, subject to a
penalty term that gets bigger the larger is relative entropy
Here the penalty parameter

$$\theta \in [\underline\theta, +\infty]$$

is a robustness parameter. When it is +∞, there is no scope for the minimizing agent to distort
the distribution, so no robustness to alternative distributions is acquired. As 𝜃 is lowered,
more robustness is achieved
Note: The T operator is sometimes called a risk-sensitivity operator
We shall apply T to the special case of a linear value function $w'(\vec r - r_f \mathbf{1})$ where
$\vec r - r_f \mathbf{1} \sim \mathcal{N}(\mu, \Sigma)$ or $\vec r - r_f \mathbf{1} = \mu + C\epsilon$ and $\epsilon \sim \mathcal{N}(0, I)$
The associated worst-case distribution of 𝜖 is Gaussian with mean 𝑣 = −𝜃−1 𝐶 ′ 𝑤 and co-
variance matrix 𝐼 (When the value function is affine, the worst-case distribution distorts the
mean vector of 𝜖 but not the covariance matrix of 𝜖)
For utility function argument $w'(\vec r - r_f \mathbf{1})$

$$\mathsf{T}(\vec r - r_f \mathbf{1}) = w'\mu + \zeta - \frac{1}{2\theta} w'\Sigma w$$

and entropy is

$$\frac{v'v}{2} = \frac{1}{2\theta^2} w'CC'w$$
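
The worst-case mean and the entropy formula can be verified numerically in a scalar version of the problem. The sketch below is an illustration under assumptions: it takes a one-dimensional 𝑥 = 𝜇 + 𝑐𝜖 with linear value function 𝑉(𝑥) = 𝑤𝑥, evaluates the integrals by Gauss-Hermite quadrature, and compares against the form 𝑤′𝜇 − (1/2𝜃)𝑤′Σ𝑤 that appears in the robust portfolio discussion. The parameter values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

theta = 2.0
mu, c, w = 0.05, 0.2, 1.5        # scalar case: x = mu + c * eps, V(x) = w * x

# Nodes and weights for the weight function exp(-eps^2 / 2);
# dividing by sqrt(2 * pi) turns the weights into N(0, 1) probabilities
nodes, weights = hermegauss(40)
weights = weights / np.sqrt(2 * np.pi)

V = w * (mu + c * nodes)

# T operator: -theta * log E[exp(-V / theta)]
T_quad = -theta * np.log(weights @ np.exp(-V / theta))
T_closed = w * mu - (w * c) ** 2 / (2 * theta)

# Worst-case likelihood ratio m is proportional to exp(-V / theta)
m = np.exp(-V / theta)
m = m / (weights @ m)                      # normalize so that E[m] = 1

v_quad = weights @ (m * nodes)             # mean of eps under the distortion
v_closed = -c * w / theta                  # v = -theta^{-1} C' w

entropy_quad = weights @ (m * np.log(m))   # E[m log m]
entropy_closed = v_closed ** 2 / 2         # v'v / 2
```

The quadrature values agree with the closed forms, which illustrates that the distortion shifts the mean of 𝜖 while leaving its variance unchanged.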

75.2.12 A Robust Mean-variance Portfolio Model

According to criterion (1), the mean-variance portfolio choice problem chooses 𝑤 to maximize

$$E[w'(\vec r - r_f \mathbf{1})] - \frac{\delta}{2}\,\text{var}[w'(\vec r - r_f \mathbf{1})]$$

which equals

$$w'\mu - \frac{\delta}{2} w'\Sigma w$$

A robust decision maker can be modeled as replacing the mean return $E[w'(\vec r - r_f \mathbf{1})]$ with the
risk-sensitive criterion

$$\mathsf{T}[w'(\vec r - r_f \mathbf{1})] = w'\mu - \frac{1}{2\theta} w'\Sigma w$$

that comes from replacing the mean 𝜇 of $\vec r - r_f \mathbf{1}$ with the worst-case mean

$$\mu - \theta^{-1}\Sigma w$$

Notice how the worst-case mean vector depends on the portfolio 𝑤


The operator T is the indirect utility function that emerges from solving a problem in which
an agent who chooses probabilities does so in order to minimize the expected utility of a max-
imizing agent (in our case, the maximizing agent chooses portfolio weights 𝑤)
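The robust version of the mean-variance portfolio choice problem is then to choose a portfolio
𝑤 that maximizes

$$\mathsf{T}[w'(\vec r - r_f \mathbf{1})] - \frac{\delta}{2} w'\Sigma w$$

or

$$w'(\mu - \theta^{-1}\Sigma w) - \frac{\delta}{2} w'\Sigma w \tag{7}$$

The maximizer of Eq. (7) is

$$w_{\text{rob}} = \frac{1}{\delta + \gamma}\Sigma^{-1}\mu$$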

where 𝛾 ≡ 𝜃−1 is sometimes called the risk-sensitivity parameter


An increase in the risk-sensitivity parameter 𝛾 shrinks the portfolio weights toward zero in
the same way that an increase in risk aversion does
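
The shrinkage effect of 𝛾 can be seen in a few lines. The sketch below is an illustration with hypothetical inputs; `robust_weights` and `robust_objective` are names introduced here, and the objective uses the risk-sensitive form 𝑤′𝜇 − (1/2𝜃)𝑤′Σ𝑤 minus the usual variance penalty.

```python
import numpy as np

def robust_weights(mu, Sigma, delta, theta):
    """w_rob = (delta + gamma)^{-1} Sigma^{-1} mu with gamma = 1 / theta."""
    gamma = 1 / theta
    return np.linalg.solve((delta + gamma) * Sigma, mu)

def robust_objective(w, mu, Sigma, delta, theta):
    """Risk-sensitive mean minus (delta / 2) times the portfolio variance."""
    quad = w @ Sigma @ w
    return w @ mu - quad / (2 * theta) - delta / 2 * quad

# Hypothetical inputs
mu = np.array([0.03, 0.05])
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
delta, theta = 2.0, 0.5

w_rob = robust_weights(mu, Sigma, delta, theta)

# First-order condition: mu - (delta + gamma) * Sigma @ w = 0 at the optimum
grad = mu - (delta + 1 / theta) * Sigma @ w_rob

# With theta very large (gamma near 0) we are back at the non-robust weights
w_plain = robust_weights(mu, Sigma, delta, 1e12)
```

Comparing `w_rob` with `w_plain` shows every weight scaled down by the common factor 𝛿/(𝛿 + 𝛾).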

75.3 Appendix

We want to illustrate the “folk theorem” that with high or moderate frequency data, it is
more difficult to estimate means than variances
In order to operationalize this statement, we take two analog estimators:

• sample average: $\bar X_N = \frac{1}{N}\sum_{i=1}^N X_i$
• sample variance: $S_N = \frac{1}{N-1}\sum_{i=1}^N (X_i - \bar X_N)^2$

to estimate the unconditional mean and unconditional variance of the random variable 𝑋,
respectively
To measure the “difficulty of estimation”, we use mean squared error (MSE), that is the aver-
age squared difference between the estimator and the true value
Assuming that the process {𝑋𝑖 } is ergodic, both analog estimators are known to converge to
their true values as the sample size 𝑁 goes to infinity
More precisely for all 𝜀 > 0

$$\lim_{N\to\infty} P\{|\bar X_N - \mathbb{E}X| > \varepsilon\} = 0$$

and

$$\lim_{N\to\infty} P\{|S_N - \mathbb{V}X| > \varepsilon\} = 0$$

A sufficient condition for these convergence results is that the associated MSEs vanish as 𝑁
goes to infinity, or in other words,

MSE(𝑋̄ 𝑁 , E𝑋) = 𝑜(1) and MSE(𝑆𝑁 , V𝑋) = 𝑜(1)

Even if the MSEs converge to zero, the associated rates might be different. Looking at the
limit of the relative MSE (as the sample size grows to infinity)

$$\frac{\text{MSE}(S_N, \mathbb{V}X)}{\text{MSE}(\bar X_N, \mathbb{E}X)} = \frac{o(1)}{o(1)} \underset{N\to\infty}{\to} B$$

can inform us about the relative (asymptotic) rates


We will show that in general, with dependent data, the limit 𝐵 depends on the sampling fre-
quency.
In particular, we find that the rate of convergence of the variance estimator is less sensitive to
increased sampling frequency than the rate of convergence of the mean estimator.
Hence, we can expect the relative asymptotic rate, 𝐵, to get smaller with higher frequency
data, illustrating that “it is more difficult to estimate means than variances”.
That is, we need significantly more data to obtain a given precision of the mean estimate
than for our variance estimate

75.3.1 A Special Case – IID Sample

We start our analysis with the benchmark case of IID data. Consider a sample of size 𝑁 gen-
erated by the following IID process,

𝑋𝑖 ∼ 𝒩(𝜇, 𝜎2 )

Taking $\bar X_N$ to estimate the mean, the MSE is

$$\text{MSE}(\bar X_N, \mu) = \frac{\sigma^2}{N}$$

Taking $S_N$ to estimate the variance, the MSE is

$$\text{MSE}(S_N, \sigma^2) = \frac{2\sigma^4}{N-1}$$

Both estimators are unbiased and hence the MSEs reflect the corresponding variances of the
estimators
Furthermore, both MSEs are 𝑜(1) with a (multiplicative) factor of difference in their rates of
convergence:

$$\frac{\text{MSE}(S_N, \sigma^2)}{\text{MSE}(\bar X_N, \mu)} = \frac{N 2\sigma^2}{N-1} \underset{N\to\infty}{\to} 2\sigma^2$$

We are interested in how this (asymptotic) relative rate of convergence changes as increasing
sampling frequency puts dependence into the data
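
The closed-form IID benchmarks can be compared with a small Monte Carlo experiment. The sketch below is illustrative only: the seed, sample size `N`, number of replications `M`, and parameter values are hypothetical choices, and the simulated MSEs agree with the formulas only up to simulation error.

```python
import numpy as np

rng = np.random.default_rng(1234)
mu, sigma = 0.0, 0.5
N, M = 50, 20_000                  # sample size, number of replications

X = rng.normal(mu, sigma, size=(M, N))

# Simulated MSEs of the sample average and the (unbiased) sample variance
mse_mean_sim = np.mean((X.mean(axis=1) - mu) ** 2)
mse_var_sim = np.mean((X.var(axis=1, ddof=1) - sigma ** 2) ** 2)

# Closed-form values for IID normal data
mse_mean_theory = sigma ** 2 / N
mse_var_theory = 2 * sigma ** 4 / (N - 1)

ratio_theory = mse_var_theory / mse_mean_theory    # tends to 2 * sigma^2
```

Note the use of `ddof=1`, which matches the 1/(𝑁 − 1) normalization of 𝑆𝑁.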

75.3.2 Dependence and Sampling Frequency

To investigate how sampling frequency affects relative rates of convergence, we assume that
the data are generated by a mean-reverting continuous time process of the form

𝑑𝑋𝑡 = −𝜅(𝑋𝑡 − 𝜇)𝑑𝑡 + 𝜎𝑑𝑊𝑡

where 𝜇 is the unconditional mean, 𝜅 > 0 is a persistence parameter, and {𝑊𝑡 } is a standardized
Brownian motion
Observations arising from this system in particular discrete periods $\mathcal{T}(h) \equiv \{nh : n \in \mathbb{Z}\}$
with ℎ > 0 can be described by the following process

𝑋𝑡+1 = (1 − exp(−𝜅ℎ))𝜇 + exp(−𝜅ℎ)𝑋𝑡 + 𝜖𝑡,ℎ

where

$$\epsilon_{t,h} \sim \mathcal{N}(0, \Sigma_h) \quad \text{with} \quad \Sigma_h = \frac{\sigma^2 (1 - \exp(-2\kappa h))}{2\kappa}$$

We call ℎ the frequency parameter, whereas 𝑛 represents the number of lags between observa-
tions
Hence, the effective distance between two observations 𝑋𝑡 and 𝑋𝑡+𝑛 in the discrete time nota-
tion is equal to ℎ ⋅ 𝑛 in terms of the underlying continuous time process
Straightforward calculations show that the autocorrelation function for the stochastic process
{𝑋𝑡 }𝑡∈𝒯(ℎ) is

Γℎ (𝑛) ≡ corr(𝑋𝑡+ℎ𝑛 , 𝑋𝑡 ) = exp(−𝜅ℎ𝑛)

and the auto-covariance function is

$$\gamma_h(n) \equiv \text{cov}(X_{t+hn}, X_t) = \frac{\exp(-\kappa h n)\sigma^2}{2\kappa}.$$

It follows that if 𝑛 = 0, the unconditional variance is given by $\gamma_h(0) = \frac{\sigma^2}{2\kappa}$ irrespective of the
sampling frequency
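
The claim that the unconditional variance does not depend on ℎ can be confirmed from the discretized AR(1) coefficients. In the sketch below, `ar1_params` and `autocov` are hypothetical helper names and the parameter values are illustrative.

```python
import numpy as np

def ar1_params(kappa, sigma, mu, h):
    """Intercept, persistence and shock variance of the sampled process."""
    rho = np.exp(-kappa * h)
    intercept = (1 - rho) * mu
    shock_var = sigma ** 2 * (1 - np.exp(-2 * kappa * h)) / (2 * kappa)
    return intercept, rho, shock_var

def autocov(n, kappa, sigma, h):
    """Auto-covariance gamma_h(n) of the sampled process."""
    return np.exp(-kappa * h * n) * sigma ** 2 / (2 * kappa)

kappa, sigma, mu = 0.1, 0.5, 0.0

# Stationary variance of an AR(1): shock variance / (1 - rho^2)
stationary_vars = []
for h in (0.1, 1.0, 5.0):
    _, rho, shock_var = ar1_params(kappa, sigma, mu, h)
    stationary_vars.append(shock_var / (1 - rho ** 2))
```

Every entry of `stationary_vars` equals 𝜎²/(2𝜅), because 1 − 𝜌² = 1 − exp(−2𝜅ℎ) cancels the same factor in the shock variance.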
The following figure illustrates how the dependence between the observations is related to the
sampling frequency

• For any given ℎ, the autocorrelation converges to zero as we increase the distance – 𝑛–
between the observations. This represents the “weak dependence” of the 𝑋 process
• Moreover, for a fixed lag length, 𝑛, the dependence vanishes as the sampling interval ℎ
goes to infinity (that is, as observations are sampled less and less frequently). In fact,
letting ℎ go to ∞ gives back the case of IID data

In [7]: μ = .0
κ = .1
σ = .5
var_uncond = σ**2 / (2 * κ)

n_grid = np.linspace(0, 40, 100)


autocorr_h1 = np.exp(-κ * n_grid * 1)
autocorr_h2 = np.exp(-κ * n_grid * 2)
autocorr_h5 = np.exp(-κ * n_grid * 5)
autocorr_h1000 = np.exp(-κ * n_grid * 1e8)

fig, ax = plt.subplots(figsize=(8, 4))


ax.plot(n_grid, autocorr_h1, label=r'$h=1$', c='darkblue', lw=2)
ax.plot(n_grid, autocorr_h2, label=r'$h=2$', c='darkred', lw=2)
ax.plot(n_grid, autocorr_h5, label=r'$h=5$', c='orange', lw=2)
ax.plot(n_grid, autocorr_h1000, label=r'"$h=\infty$"', c='darkgreen', lw=2)
ax.legend()
ax.grid()
ax.set(title=r'Autocorrelation functions, $\Gamma_h(n)$',
xlabel=r'Lags between observations, $n$')
plt.show()

75.3.3 Frequency and the Mean Estimator

Consider again the AR(1) process generated by discrete sampling with frequency ℎ. Assume
that we have a sample of size 𝑁 and we would like to estimate the unconditional mean – in
our case the true mean is 𝜇


Again, the sample average is an unbiased estimator of the unconditional mean

$$\mathbb{E}[\bar X_N] = \frac{1}{N}\sum_{i=1}^N \mathbb{E}[X_i] = \mathbb{E}[X_0] = \mu$$

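The variance of the sample mean is given by

$$
\begin{aligned}
\mathbb{V}(\bar X_N) &= \mathbb{V}\left(\frac{1}{N}\sum_{i=1}^N X_i\right) \\
&= \frac{1}{N^2}\left(\sum_{i=1}^N \mathbb{V}(X_i) + 2\sum_{i=1}^{N-1}\sum_{s=i+1}^N \text{cov}(X_i, X_s)\right) \\
&= \frac{1}{N^2}\left(N\gamma(0) + 2\sum_{i=1}^{N-1} i \cdot \gamma(h \cdot (N-i))\right) \\
&= \frac{1}{N^2}\left(N\frac{\sigma^2}{2\kappa} + 2\sum_{i=1}^{N-1} i \cdot \exp(-\kappa h (N-i))\frac{\sigma^2}{2\kappa}\right)
\end{aligned}
$$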

It is explicit in the above equation that time dependence in the data inflates the variance of
the mean estimator through the covariance terms. Moreover, as we can see, a higher sampling
frequency—smaller ℎ—makes all the covariance terms larger, everything else being fixed. This
implies a relatively slower rate of convergence of the sample average for high-frequency data
Intuitively, the stronger dependence across observations for high-frequency data reduces the
“information content” of each observation relative to the IID case
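We can upper bound the variance term in the following way

$$
\begin{aligned}
\mathbb{V}(\bar X_N) &= \frac{1}{N^2}\left(N\frac{\sigma^2}{2\kappa} + 2\sum_{i=1}^{N-1} i \cdot \exp(-\kappa h (N-i))\frac{\sigma^2}{2\kappa}\right) \\
&\leq \frac{\sigma^2}{2\kappa N}\left(1 + 2\sum_{i=1}^{N-1} \exp(-\kappa h i)\right) \\
&\leq \underbrace{\frac{\sigma^2}{2\kappa N}}_{\text{IID case}}\left(1 + 2\,\frac{1 - \exp(-\kappa h)^{N-1}}{1 - \exp(-\kappa h)}\right)
\end{aligned}
$$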

Asymptotically the $\exp(-\kappa h)^{N-1}$ term vanishes and the dependence in the data inflates the
benchmark IID variance by a factor of

$$\left(1 + 2\,\frac{1}{1 - \exp(-\kappa h)}\right)$$

This long run factor is larger the higher is the frequency (the smaller is ℎ)
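
To see how the bound behaves, the sketch below computes the exact variance of the sample mean from the auto-covariance function and compares the resulting inflation relative to the IID case with the long run factor. The function names and the parameter values are hypothetical; the exact variance is obtained by summing 𝛾ℎ(|𝑖 − 𝑗|) over all pairs of observations.

```python
import numpy as np

def mean_var_exact(N, kappa, sigma, h):
    """Exact variance of the sample mean: (1/N^2) * sum_{i,j} gamma_h(|i - j|)."""
    idx = np.arange(N)
    lags = np.abs(np.subtract.outer(idx, idx))
    gamma = np.exp(-kappa * h * lags) * sigma ** 2 / (2 * kappa)
    return gamma.sum() / N ** 2

def inflation_bound(kappa, h):
    """Long run inflation factor 1 + 2 / (1 - exp(-kappa * h))."""
    return 1 + 2 / (1 - np.exp(-kappa * h))

kappa, sigma, N = 0.1, 0.5, 200
iid_var = sigma ** 2 / (2 * kappa) / N     # variance of the mean under IID data

h_values = (0.1, 1.0, 10.0)
exact_inflation = [mean_var_exact(N, kappa, sigma, h) / iid_var for h in h_values]
bounds = [inflation_bound(kappa, h) for h in h_values]
```

Both the exact inflation and the bound fall as ℎ rises, in line with the statement that higher frequency (smaller ℎ) inflates the variance of the mean estimator more.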
Therefore, we expect the asymptotic relative MSEs, 𝐵, to change with time-dependent data.
We just saw that the mean estimator’s rate is roughly changing by a factor of

$$\left(1 + 2\,\frac{1}{1 - \exp(-\kappa h)}\right)$$


Unfortunately, the variance estimator’s MSE is harder to derive


Nonetheless, we can approximate it by using (large sample) simulations, thus getting an idea
about how the asymptotic relative MSEs change with the sampling frequency ℎ relative to the
IID case that we compute in closed form

In [8]: def sample_generator(h, N, M):
    # Coefficients of the discretized AR(1) process
    ϕ = (1 - np.exp(-κ * h)) * μ                      # intercept
    ρ = np.exp(-κ * h)                                # persistence
    s = σ**2 * (1 - np.exp(-2 * κ * h)) / (2 * κ)     # shock variance

    mean_uncond = μ
    std_uncond = np.sqrt(σ**2 / (2 * κ))

    ε_path = stat.norm(0, np.sqrt(s)).rvs((M, N))

    # Start each of the M paths from the stationary distribution
    y_path = np.zeros((M, N + 1))
    y_path[:, 0] = stat.norm(mean_uncond, std_uncond).rvs(M)

    for i in range(N):
        y_path[:, i + 1] = ϕ + ρ * y_path[:, i] + ε_path[:, i]

    return y_path

In [9]: # Generate large sample for different frequencies


N_app, M_app = 1000, 30000 # Sample size, number of simulations
h_grid = np.linspace(.1, 80, 30)

var_est_store = []
mean_est_store = []
labels = []

for h in h_grid:
    labels.append(h)
    sample = sample_generator(h, N_app, M_app)
    mean_est_store.append(np.mean(sample, 1))
    var_est_store.append(np.var(sample, 1))

var_est_store = np.array(var_est_store)
mean_est_store = np.array(mean_est_store)

# Save mse of estimators


mse_mean = np.var(mean_est_store, 1) + (np.mean(mean_est_store, 1) - μ)**2
mse_var = np.var(var_est_store, 1) + (np.mean(var_est_store, 1) - var_uncond)**2

benchmark_rate = 2 * var_uncond # IID case

# Relative MSE for large samples


rate_h = mse_var / mse_mean

fig, ax = plt.subplots(figsize=(8, 5))


ax.plot(h_grid, rate_h, c='darkblue', lw=2,
label=r'large sample relative MSE, $B(h)$')
ax.axhline(benchmark_rate, c='k', ls='--', label=r'IID benchmark')
ax.set_title('Relative MSE for large samples as a function of sampling frequency \n MSE($S_N$) relativ
ax.set_xlabel('Sampling frequency, $h$')
ax.legend()
plt.show()

The above figure illustrates the relationship between the asymptotic relative MSEs and the
sampling frequency

• We can see that with low-frequency data – large values of ℎ – the ratio of asymptotic
rates approaches the IID case
• As ℎ gets smaller – the higher the frequency – the relative performance of the variance
estimator is better in the sense that the ratio of asymptotic rates gets smaller. That
is, as the time dependence gets more pronounced, the rate of convergence of the mean
estimator’s MSE deteriorates more than that of the variance estimator
Part XII

Dynamic Programming Squared

76

Stackelberg Plans

76.1 Contents

• Overview 76.2

• Duopoly 76.3

• The Stackelberg Problem 76.4

• Stackelberg Plan 76.5

• Recursive Representation of Stackelberg Plan 76.6

• Computing the Stackelberg Plan 76.7

• Exhibiting Time Inconsistency of Stackelberg Plan 76.8

• Recursive Formulation of the Follower’s Problem 76.9

• Markov Perfect Equilibrium 76.10

• MPE vs. Stackelberg 76.11

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

76.2 Overview

This notebook formulates and computes a plan that a Stackelberg leader uses to manipulate forward-looking decisions of a Stackelberg follower that depend on continuation sequences of decisions made once and for all by the Stackelberg leader at time 0
To facilitate computation and interpretation, we formulate things in a context that allows us to apply linear optimal dynamic programming
From the beginning, we carry along a linear-quadratic model of duopoly in which firms face adjustment costs that make them want to forecast actions of other firms that influence future prices


76.3 Duopoly

Time is discrete and is indexed by 𝑡 = 0, 1, …


Two firms produce a single good whose demand is governed by the linear inverse demand
curve

𝑝𝑡 = 𝑎0 − 𝑎1 (𝑞1𝑡 + 𝑞2𝑡 )

where 𝑞𝑖𝑡 is output of firm 𝑖 at time 𝑡 and 𝑎0 and 𝑎1 are both positive
𝑞10 , 𝑞20 are given numbers that serve as initial conditions at time 0
By incurring a cost of change

$$
\gamma v_{it}^2
$$

where 𝛾 > 0, firm 𝑖 can change its output according to

𝑞𝑖𝑡+1 = 𝑞𝑖𝑡 + 𝑣𝑖𝑡

Firm 𝑖’s profits at time 𝑡 equal

$$
\pi_{it} = p_t q_{it} - \gamma v_{it}^2
$$

Firm 𝑖 wants to maximize the present value of its profits


$$
\sum_{t=0}^\infty \beta^t \pi_{it}
$$

where 𝛽 ∈ (0, 1) is a time discount factor
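
The objective above can be made concrete with a small sketch. It assumes a finite horizon (the lecture's sum is infinite) and uses a hypothetical helper name, `present_value`; the parameter values are the ones used in the code later in this lecture.

```python
def present_value(q_own, q_other, a0, a1, gamma, beta):
    """Discounted profits of one firm over a finite horizon (illustration only).

    q_own and q_other hold outputs at t = 0, ..., T (equal lengths), so
    profits are summed over t = 0, ..., T-1 with v_t = q_own[t+1] - q_own[t].
    """
    total = 0.0
    for t in range(len(q_own) - 1):
        price = a0 - a1 * (q_own[t] + q_other[t])   # linear inverse demand
        v = q_own[t + 1] - q_own[t]                 # change in own output
        total += beta**t * (price * q_own[t] - gamma * v**2)
    return total


# Constant outputs, so no adjustment cost is incurred
pv = present_value([1, 1, 1, 1], [1, 1, 1, 1],
                   a0=10, a1=2, gamma=120, beta=0.96)
print(pv)
```

With constant outputs the adjustment cost term vanishes, so the result is just the discounted sum of per-period revenue.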

76.3.1 Stackelberg Leader and Follower

Each firm $i = 1, 2$ chooses a sequence $\vec q_i \equiv \{q_{it+1}\}_{t=0}^\infty$ once and for all at time 0

We let firm 2 be a Stackelberg leader and firm 1 be a Stackelberg follower


The leader firm 2 goes first and chooses $\{q_{2t+1}\}_{t=0}^\infty$ once and for all at time 0
Knowing that firm 2 has chosen $\{q_{2t+1}\}_{t=0}^\infty$, the follower firm 1 goes second and chooses $\{q_{1t+1}\}_{t=0}^\infty$ once and for all at time 0
In choosing 𝑞2⃗ , firm 2 takes into account that firm 1 will base its choice of 𝑞1⃗ on firm 2’s
choice of 𝑞2⃗

76.3.2 Abstract Statement of the Leader’s and Follower’s Problems

We can express firm 1’s problem as

$$
\max_{\vec q_1} \Pi_1(\vec q_1; \vec q_2)
$$

where the appearance of $\vec q_2$ behind the semicolon indicates that it is given
Firm 1’s problem induces the best response mapping

𝑞1⃗ = 𝐵(𝑞2⃗ )

(Here 𝐵 maps a sequence into a sequence)


The Stackelberg leader’s problem is

$$
\max_{\vec q_2} \Pi_2(B(\vec q_2), \vec q_2)
$$

whose maximizer is a sequence $\vec q_2$ that depends on the initial conditions $q_{10}, q_{20}$ and the parameters of the model $a_0, a_1, \gamma$
This formulation captures key features of the model

• Both firms make once-and-for-all choices at time 0
• This is true even though both firms are choosing sequences of quantities that are indexed by time
• The Stackelberg leader chooses first within time 0, knowing that the Stackelberg follower will choose second within time 0

While our abstract formulation reveals the timing protocol and equilibrium concept well, it obscures details that must be addressed when we want to compute and interpret a Stackelberg plan and the follower’s best response to it
To gain insights about these things, we study them in more detail

76.3.3 Firms’ Problems

Firm 1 acts as if firm 2’s sequence $\{q_{2t+1}\}_{t=0}^\infty$ is given and beyond its control
Firm 2 knows that firm 1 chooses second and takes this into account in choosing $\{q_{2t+1}\}_{t=0}^\infty$
In the spirit of working backward, we study firm 1’s problem first, taking $\{q_{2t+1}\}_{t=0}^\infty$ as given

We can formulate firm 1’s optimum problem in terms of the Lagrangian


$$
L = \sum_{t=0}^\infty \beta^t \{ a_0 q_{1t} - a_1 q_{1t}^2 - a_1 q_{1t} q_{2t} - \gamma v_{1t}^2 + \lambda_t [q_{1t} + v_{1t} - q_{1t+1}] \}
$$

Firm 1 seeks a maximum with respect to $\{q_{1t+1}, v_{1t}\}_{t=0}^\infty$ and a minimum with respect to $\{\lambda_t\}_{t=0}^\infty$

We approach this problem using methods described in Ljungqvist and Sargent RMT5 chapter
2, appendix A and Macroeconomic Theory, 2nd edition, chapter IX
First-order conditions for this problem are

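$$
\begin{aligned}
\frac{\partial L}{\partial q_{1t}} &= a_0 - 2 a_1 q_{1t} - a_1 q_{2t} + \lambda_t - \beta^{-1} \lambda_{t-1} = 0, \quad t \geq 1 \\
\frac{\partial L}{\partial v_{1t}} &= -2 \gamma v_{1t} + \lambda_t = 0, \quad t \geq 0
\end{aligned}
$$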

These first-order conditions and the constraint 𝑞1𝑡+1 = 𝑞1𝑡 + 𝑣1𝑡 can be rearranged to take the
form

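$$
\begin{aligned}
v_{1t} &= \beta v_{1t+1} + \frac{\beta a_0}{2 \gamma} - \frac{\beta a_1}{\gamma} q_{1t+1} - \frac{\beta a_1}{2 \gamma} q_{2t+1} \\
q_{1t+1} &= q_{1t} + v_{1t}
\end{aligned}
$$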

We can substitute the second equation into the first equation to obtain

(𝑞1𝑡+1 − 𝑞1𝑡 ) = 𝛽(𝑞1𝑡+2 − 𝑞1𝑡+1 ) + 𝑐0 − 𝑐1 𝑞1𝑡+1 − 𝑐2 𝑞2𝑡+1

where $c_0 = \frac{\beta a_0}{2 \gamma}$, $c_1 = \frac{\beta a_1}{\gamma}$, $c_2 = \frac{\beta a_1}{2 \gamma}$

This equation can in turn be rearranged to become the second-order difference equation

$$
-q_{1t} + (1 + \beta + c_1) q_{1t+1} - \beta q_{1t+2} = c_0 - c_2 q_{2t+1} \tag{1}
$$

Eq. (1) is a second-order difference equation in the sequence $\vec q_1$ whose solution we want
It satisfies two boundary conditions:

• an initial condition for $q_{1,0}$, which is given
• a terminal condition requiring that $\lim_{T \to +\infty} \beta^T q_{1T}^2 < +\infty$

Using the lag operators described in chapter IX of Macroeconomic Theory, Second edition
(1987), difference equation Eq. (1) can be written as

$$
\beta \left(1 - \frac{1 + \beta + c_1}{\beta} L + \beta^{-1} L^2 \right) q_{1t+2} = -c_0 + c_2 q_{2t+1}
$$

The polynomial in the lag operator on the left side can be factored as

$$
\left(1 - \frac{1 + \beta + c_1}{\beta} L + \beta^{-1} L^2 \right) = (1 - \delta_1 L)(1 - \delta_2 L) \tag{2}
$$

where $0 < \delta_1 < 1 < \frac{1}{\sqrt{\beta}} < \delta_2$
Because $\delta_2 > \frac{1}{\sqrt{\beta}}$, the operator $(1 - \delta_2 L)$ contributes an unstable component if solved backwards but a stable component if solved forwards
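
As a hedged numerical illustration of the factorization in Eq. (2): matching coefficients gives $\delta_1 + \delta_2 = (1 + \beta + c_1)/\beta$ and $\delta_1 \delta_2 = \beta^{-1}$, so the two roots solve a quadratic. The function name `lag_polynomial_roots` is a hypothetical helper, and the parameter values are those used in the code later in this lecture.

```python
import math


def lag_polynomial_roots(a1, beta, gamma):
    """Roots (delta1, delta2) of the factorization in Eq. (2)."""
    c1 = beta * a1 / gamma
    b = (1 + beta + c1) / beta        # equals delta1 + delta2
    disc = b**2 - 4 / beta            # uses delta1 * delta2 = 1 / beta
    root = math.sqrt(disc)
    return (b - root) / 2, (b + root) / 2


delta1, delta2 = lag_polynomial_roots(a1=2, beta=0.96, gamma=120)
print(delta1, delta2)
```

The smaller root is the one solved backwards and the larger root is the one solved forwards.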
Mechanically, write

$$
(1 - \delta_2 L) = -\delta_2 L (1 - \delta_2^{-1} L^{-1})
$$

and compute the following inverse operator

$$
\left[ -\delta_2 L (1 - \delta_2^{-1} L^{-1}) \right]^{-1} = -\delta_2^{-1} (1 - \delta_2^{-1} L^{-1})^{-1} L^{-1}
$$

Applying $\beta^{-1}$ times this inverse operator to both sides of the lag-operator form of Eq. (1), with its left side factored as in Eq. (2), gives the follower’s decision rule for setting $q_{1t+1}$ in the feedback-feedforward form


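$$
q_{1t+1} = \delta_1 q_{1t} + c_0 \delta_2^{-1} \beta^{-1} \frac{1}{1 - \delta_2^{-1}} - c_2 \delta_2^{-1} \beta^{-1} \sum_{j=0}^\infty \delta_2^{-j} q_{2t+j+1}, \quad t \geq 0 \tag{3}
$$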
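
A minimal sketch of the follower's rule in a special case, assuming the leader holds output constant at a hypothetical level `q2_bar`. In that case the forward-looking sum is a constant, and solving Eq. (1) with the stable root $\delta_1$ reduces to partial adjustment toward the rest point $q_1^* = (c_0 - c_2 \bar q_2)/c_1$ of Eq. (1). The code checks that the resulting path satisfies Eq. (1) period by period.

```python
import math

a0, a1, beta, gamma = 10, 2, 0.96, 120   # values used later in this lecture
c0 = beta * a0 / (2 * gamma)
c1 = beta * a1 / gamma
c2 = beta * a1 / (2 * gamma)

# Stable root of the factorization in Eq. (2)
b = (1 + beta + c1) / beta
delta1 = (b - math.sqrt(b**2 - 4 / beta)) / 2

q2_bar = 1.0                             # hypothetical constant leader output
q1_star = (c0 - c2 * q2_bar) / c1        # rest point of Eq. (1)

# Follower's path under the constant leader path, from a given q_{10}
q1 = [1.0]
for t in range(200):
    q1.append(delta1 * q1[-1] + (1 - delta1) * q1_star)

# Residuals of the Euler difference equation Eq. (1) along the path
residuals = [-q1[t] + (1 + beta + c1) * q1[t + 1] - beta * q1[t + 2]
             - (c0 - c2 * q2_bar) for t in range(len(q1) - 2)]
print(max(abs(r) for r in residuals))
```

The rest point coincides with the static best response $(a_0 - a_1 \bar q_2)/(2 a_1)$, which is a useful check on the signs of the constant and feedforward terms.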

The problem of the Stackelberg leader firm 2 is to choose the sequence $\{q_{2t+1}\}_{t=0}^\infty$ to maximize its discounted profits

$$
\sum_{t=0}^\infty \beta^t \{ (a_0 - a_1 (q_{1t} + q_{2t})) q_{2t} - \gamma (q_{2t+1} - q_{2t})^2 \}
$$

subject to the sequence of constraints Eq. (3) for 𝑡 ≥ 0


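We can put a sequence $\{\theta_t\}_{t=0}^\infty$ of Lagrange multipliers on the sequence of equations Eq. (3) and formulate the following Lagrangian for the Stackelberg leader firm 2’s problem

$$
\begin{aligned}
\tilde L = & \sum_{t=0}^\infty \beta^t \{ (a_0 - a_1 (q_{1t} + q_{2t})) q_{2t} - \gamma (q_{2t+1} - q_{2t})^2 \} \\
& + \sum_{t=0}^\infty \beta^t \theta_t \left\{ \delta_1 q_{1t} + c_0 \delta_2^{-1} \beta^{-1} \frac{1}{1 - \delta_2^{-1}} - c_2 \delta_2^{-1} \beta^{-1} \sum_{j=0}^\infty \delta_2^{-j} q_{2t+j+1} - q_{1t+1} \right\}
\end{aligned} \tag{4}
$$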

subject to initial conditions for 𝑞1𝑡 , 𝑞2𝑡 at 𝑡 = 0


Comments: We have formulated the Stackelberg problem in a space of sequences
The max-min problem associated with Lagrangian Eq. (4) is unpleasant because the time $t$ component of firm 1’s payoff function depends on the entire future of its choices of $\{q_{1t+j}\}_{j=0}^\infty$

This renders a direct attack on the problem cumbersome


Therefore, below, we will formulate the Stackelberg leader’s problem recursively
We’ll put our little duopoly model into a broader class of models with the same conceptual
structure

76.4 The Stackelberg Problem

We formulate a class of linear-quadratic Stackelberg leader-follower problems of which our duopoly model is an instance
We use the optimal linear regulator (a.k.a. the linear-quadratic dynamic programming prob-
lem described in LQ Dynamic Programming problems) to represent a Stackelberg leader’s
problem recursively
Let 𝑧𝑡 be an 𝑛𝑧 × 1 vector of natural state variables
Let 𝑥𝑡 be an 𝑛𝑥 × 1 vector of endogenous forward-looking variables that are physically free to
jump at 𝑡
In our duopoly example 𝑥𝑡 = 𝑣1𝑡 , the time 𝑡 decision of the Stackelberg follower
Let 𝑢𝑡 be a vector of decisions chosen by the Stackelberg leader at 𝑡
The 𝑧𝑡 vector is inherited physically from the past

But $x_t$ is a decision made by the Stackelberg follower at time $t$ that is the follower’s best response to the choice of an entire sequence of decisions made by the Stackelberg leader at time $t = 0$
Let

$$
y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}
$$

Represent the Stackelberg leader’s one-period loss function as

𝑟(𝑦, 𝑢) = 𝑦′ 𝑅𝑦 + 𝑢′ 𝑄𝑢

Subject to an initial condition for $z_0$, but not for $x_0$, the Stackelberg leader wants to maximize

$$
-\sum_{t=0}^\infty \beta^t r(y_t, u_t) \tag{5}
$$

The Stackelberg leader faces the model

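$$
\begin{bmatrix} I & 0 \\ G_{21} & G_{22} \end{bmatrix}
\begin{bmatrix} z_{t+1} \\ x_{t+1} \end{bmatrix}
= \begin{bmatrix} \hat A_{11} & \hat A_{12} \\ \hat A_{21} & \hat A_{22} \end{bmatrix}
\begin{bmatrix} z_t \\ x_t \end{bmatrix} + \hat B u_t \tag{6}
$$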

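We assume that the matrix $\begin{bmatrix} I & 0 \\ G_{21} & G_{22} \end{bmatrix}$ on the left side of equation Eq. (6) is invertible, so that we can multiply both sides by its inverse to obtain

$$
\begin{bmatrix} z_{t+1} \\ x_{t+1} \end{bmatrix}
= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} z_t \\ x_t \end{bmatrix} + B u_t \tag{7}
$$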

or

𝑦𝑡+1 = 𝐴𝑦𝑡 + 𝐵𝑢𝑡 (8)

76.4.1 Interpretation of the Second Block of Equations

The Stackelberg follower’s best response mapping is summarized by the second block of equa-
tions of Eq. (7)
In particular, these equations are the first-order conditions of the Stackelberg follower’s opti-
mization problem (i.e., its Euler equations)
These Euler equations summarize the forward-looking aspect of the follower’s behavior and
express how its time 𝑡 decision depends on the leader’s actions at times 𝑠 ≥ 𝑡
When combined with a stability condition to be imposed below, the Euler equations summa-
rize the follower’s best response to the sequence of actions by the leader
The Stackelberg leader maximizes Eq. (5) by choosing sequences $\{u_t, x_t, z_{t+1}\}_{t=0}^\infty$ subject to Eq. (8) and an initial condition for $z_0$

Note that we have an initial condition for 𝑧0 but not for 𝑥0


𝑥0 is among the variables to be chosen at time 0 by the Stackelberg leader
The Stackelberg leader uses its understanding of the responses restricted by Eq. (8) to manip-
ulate the follower’s decisions

76.4.2 More Mechanical Details

For any vector 𝑎𝑡 , define 𝑎𝑡⃗ = [𝑎𝑡 , 𝑎𝑡+1 …]


Define a feasible set of (𝑦1⃗ , 𝑢⃗0 ) sequences

Ω(𝑦0 ) = {(𝑦1⃗ , 𝑢⃗0 ) ∶ 𝑦𝑡+1 = 𝐴𝑦𝑡 + 𝐵𝑢𝑡 , ∀𝑡 ≥ 0}

Please remember that the follower’s Euler equation is embedded in the system of dynamic
equations 𝑦𝑡+1 = 𝐴𝑦𝑡 + 𝐵𝑢𝑡
Note that in the definition of Ω(𝑦0 ), 𝑦0 is taken as given
Although it is taken as given in Ω(𝑦0 ), eventually, the 𝑥0 component of 𝑦0 will be chosen by
the Stackelberg leader

76.4.3 Two Subproblems

Once again we use backward induction


We express the Stackelberg problem in terms of two subproblems
Subproblem 1 is solved by a continuation Stackelberg leader at each date 𝑡 ≥ 0
Subproblem 2 is solved by the Stackelberg leader at 𝑡 = 0
The two subproblems are designed

• to respect the protocol in which the follower chooses 𝑞1⃗ after seeing 𝑞2⃗ chosen by the
leader
• to make the leader choose 𝑞2⃗ while respecting that 𝑞1⃗ will be the follower’s best response
to 𝑞2⃗
• to represent the leader’s problem recursively by artfully choosing the state variables
confronting and the control variables available to the leader

Subproblem 1


$$
v(y_0) = \max_{(\vec y_1, \vec u_0) \in \Omega(y_0)} - \sum_{t=0}^\infty \beta^t r(y_t, u_t)
$$

Subproblem 2

$$
w(z_0) = \max_{x_0} v(y_0)
$$

Subproblem 1 takes the vector of forward-looking variables 𝑥0 as given



Subproblem 2 optimizes over 𝑥0


The value function 𝑤(𝑧0 ) tells the value of the Stackelberg plan as a function of the vector of
natural state variables at time 0, 𝑧0

76.4.4 Two Bellman Equations

We now describe Bellman equations for 𝑣(𝑦) and 𝑤(𝑧0 )


Subproblem 1
The value function 𝑣(𝑦) in subproblem 1 satisfies the Bellman equation

$$
v(y) = \max_{u, y^*} \{ -r(y, u) + \beta v(y^*) \} \tag{9}
$$

where the maximization is subject to

𝑦∗ = 𝐴𝑦 + 𝐵𝑢

and 𝑦∗ denotes next period’s value


Substituting $v(y) = -y' P y$ into Bellman equation Eq. (9) gives

$$
-y' P y = \max_{u, y^*} \{ -y' R y - u' Q u - \beta y^{*\prime} P y^* \}
$$

which, as in the linear regulator lecture, gives rise to the algebraic matrix Riccati equation

$$
P = R + \beta A' P A - \beta^2 A' P B (Q + \beta B' P B)^{-1} B' P A
$$

and the optimal decision rule coefficient vector

$$
F = \beta (Q + \beta B' P B)^{-1} B' P A
$$

where the optimal decision rule is

𝑢𝑡 = −𝐹 𝑦𝑡

Subproblem 2
We find an optimal 𝑥0 by equating to zero the gradient of 𝑣(𝑦0 ) with respect to 𝑥0 :

−2𝑃21 𝑧0 − 2𝑃22 𝑥0 = 0,

which implies that

$$
x_0 = -P_{22}^{-1} P_{21} z_0
$$
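
The two subproblems can be sketched numerically. The example below is an illustration under stated assumptions, not the lecture's method: it iterates directly on the Riccati equation above (the lecture's own code later relies on QuantEcon's `LQ` class instead), and the matrices `A`, `B`, `R`, `Q` are hypothetical values for a state $y = (z, x)$ with one natural state variable and one jump variable. The helper name `solve_riccati` is also hypothetical.

```python
import numpy as np


def solve_riccati(A, B, R, Q, beta, n_iter=2000):
    """Iterate on the discounted Riccati equation starting from P = 0."""
    P = np.zeros_like(R, dtype=float)
    for _ in range(n_iter):
        G = np.linalg.solve(Q + beta * B.T @ P @ B, B.T @ P @ A)
        P = R + beta * A.T @ P @ A - beta**2 * A.T @ P @ B @ G
    F = beta * np.linalg.solve(Q + beta * B.T @ P @ B, B.T @ P @ A)
    return P, F


# Hypothetical example matrices (not the duopoly model)
A = np.array([[0.9, 0.0],
              [0.1, 0.8]])
B = np.array([[0.0],
              [1.0]])
R = np.eye(2)
Q = np.array([[1.0]])
beta = 0.95

# Subproblem 1: value function -y'Py and decision rule u = -Fy
P, F = solve_riccati(A, B, R, Q, beta)

# Subproblem 2: choose x0 to maximize -y0'P y0 given z0
P21, P22 = P[1:, :1], P[1:, 1:]
z0 = np.array([[1.0]])
x0 = -np.linalg.solve(P22, P21 @ z0)
print(F, x0)
```

Plain iteration is used here only because it mirrors the Riccati equation term by term; it is slower than the doubling algorithm used in the lecture's code.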

76.5 Stackelberg Plan

Now let’s map our duopoly model into the above setup.
We will formulate a state space system

$$
y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}
$$

where in this instance 𝑥𝑡 = 𝑣1𝑡 , the time 𝑡 decision of the follower firm 1

76.5.1 Calculations to Prepare Duopoly Model

Now we’ll proceed to cast our duopoly model within the framework of the more general
linear-quadratic structure described above
That will allow us to compute a Stackelberg plan simply by enlisting a Riccati equation to
solve a linear-quadratic dynamic program
As emphasized above, firm 1 acts as if firm 2’s decisions $\{q_{2t+1}, v_{2t}\}_{t=0}^\infty$ are given and beyond its control

76.5.2 Firm 1’s Problem

We again formulate firm 1’s optimum problem in terms of the Lagrangian


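$$
L = \sum_{t=0}^\infty \beta^t \{ a_0 q_{1t} - a_1 q_{1t}^2 - a_1 q_{1t} q_{2t} - \gamma v_{1t}^2 + \lambda_t [q_{1t} + v_{1t} - q_{1t+1}] \}
$$

Firm 1 seeks a maximum with respect to $\{q_{1t+1}, v_{1t}\}_{t=0}^\infty$ and a minimum with respect to $\{\lambda_t\}_{t=0}^\infty$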
First-order conditions for this problem are

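$$
\begin{aligned}
\frac{\partial L}{\partial q_{1t}} &= a_0 - 2 a_1 q_{1t} - a_1 q_{2t} + \lambda_t - \beta^{-1} \lambda_{t-1} = 0, \quad t \geq 1 \\
\frac{\partial L}{\partial v_{1t}} &= -2 \gamma v_{1t} + \lambda_t = 0, \quad t \geq 0
\end{aligned}
$$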

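These first-order conditions and the constraint $q_{1t+1} = q_{1t} + v_{1t}$ can be rearranged to take the form

$$
\begin{aligned}
v_{1t} &= \beta v_{1t+1} + \frac{\beta a_0}{2 \gamma} - \frac{\beta a_1}{\gamma} q_{1t+1} - \frac{\beta a_1}{2 \gamma} q_{2t+1} \\
q_{1t+1} &= q_{1t} + v_{1t}
\end{aligned}
$$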

We use these two equations as components of the following linear system that confronts a
Stackelberg continuation leader at time 𝑡

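$$
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\frac{\beta a_0}{2 \gamma} & -\frac{\beta a_1}{2 \gamma} & -\frac{\beta a_1}{\gamma} & \beta
\end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t+1} \\ q_{1t+1} \\ v_{1t+1} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \\ v_{1t} \end{bmatrix}
+
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} v_{2t}
$$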
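
As a hedged check on this system, the sketch below builds the two coefficient matrices, takes one step from a hypothetical state and a hypothetical leader decision, and confirms that the fourth row reproduces the follower's Euler equation. The variable names `lhs`, `rhs`, and `B_hat` and the numerical state are illustrative assumptions.

```python
import numpy as np

a0, a1, beta, gamma = 10, 2, 0.96, 120   # values used later in this lecture
c0 = beta * a0 / (2 * gamma)
c1 = beta * a1 / gamma
c2 = beta * a1 / (2 * gamma)

lhs = np.eye(4)
lhs[3, :] = c0, -c2, -c1, beta           # Euler equation row
rhs = np.eye(4)
rhs[2, 3] = 1                            # q_{1t+1} = q_{1t} + v_{1t}
B_hat = np.array([0.0, 1.0, 0.0, 0.0])   # v_{2t} moves q_{2t+1}

y = np.array([1.0, 0.8, 1.2, 0.05])      # hypothetical (1, q_2t, q_1t, v_1t)
v2 = -0.1                                # hypothetical leader decision
y_next = np.linalg.solve(lhs, rhs @ y + B_hat * v2)

one, q2_next, q1_next, v1_next = y_next
# Gap in v_1t = beta v_1t+1 + c0 - c1 q_1t+1 - c2 q_2t+1
euler_gap = y[3] - (beta * v1_next + c0 - c1 * q1_next - c2 * q2_next)
print(y_next, euler_gap)
```

Solving the linear system with `np.linalg.solve` is equivalent to premultiplying by the inverse of the left-side matrix, which is the step that produces $A$ and $B$ in the general setup.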

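Time $t$ revenues of firm 2 are $\pi_{2t} = a_0 q_{2t} - a_1 q_{2t}^2 - a_1 q_{1t} q_{2t}$ which evidently equal

$$
z_t' R_1 z_t \equiv
\begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \end{bmatrix}'
\begin{bmatrix}
0 & \frac{a_0}{2} & 0 \\
\frac{a_0}{2} & -a_1 & -\frac{a_1}{2} \\
0 & -\frac{a_1}{2} & 0
\end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \end{bmatrix}
$$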
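
A quick numerical check of this quadratic form, using hypothetical output levels; the function name `revenue` is an illustrative helper.

```python
import numpy as np

a0, a1 = 10, 2
R1 = np.array([[0,       a0 / 2,  0],
               [a0 / 2, -a1,     -a1 / 2],
               [0,      -a1 / 2,  0]])


def revenue(q2, q1):
    """Firm 2's time-t revenue written out directly."""
    return a0 * q2 - a1 * q2**2 - a1 * q1 * q2


q2, q1 = 1.5, 0.7                  # hypothetical outputs
z = np.array([1.0, q2, q1])
quad = z @ R1 @ z                  # z' R1 z
print(quad, revenue(q2, q1))
```

Note that the code later in this lecture passes the negative of this matrix to the solver, because the solver minimizes a loss rather than maximizing profits.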

If we set $Q = \gamma$, then firm 2’s period $t$ profits can then be written

$$
y_t' R y_t - Q v_{2t}^2
$$

where

$$
y_t = \begin{bmatrix} z_t \\ x_t \end{bmatrix}
$$

with $x_t = v_{1t}$ and

$$
R = \begin{bmatrix} R_1 & 0 \\ 0 & 0 \end{bmatrix}
$$

We’ll report results of implementing this code soon


But first, we want to represent the Stackelberg leader’s optimal choices recursively
It is important to do this for several reasons:

• to interpret properly a representation of the Stackelberg leader’s choice as a sequence of history-dependent functions
• to formulate a recursive version of the follower’s choice problem

First, let’s get a recursive representation of the Stackelberg leader’s choice of 𝑞2⃗ for our
duopoly model

76.6 Recursive Representation of Stackelberg Plan

In order to attain an appropriate representation of the Stackelberg leader’s history-dependent plan, we will employ what amounts to a version of the Big K, little k device often used in macroeconomics by distinguishing $z_t$, which depends partly on decisions $x_t$ of the followers, from another vector $\check z_t$, which does not
We will use $\check z_t$ and its history $\check z^t = [\check z_t, \check z_{t-1}, \ldots, \check z_0]$ to describe the sequence of the Stackelberg leader’s decisions that the Stackelberg follower takes as given
Thus, we let $\check y_t' = [\check z_t' \ \check x_t']$ with initial condition $\check z_0 = z_0$ given

That we distinguish $\check z_t$ from $z_t$ is part and parcel of the Big K, little k device in this instance
We have demonstrated that a Stackelberg plan for $\{u_t\}_{t=0}^\infty$ has a recursive representation

$$
\begin{aligned}
\check x_0 &= -P_{22}^{-1} P_{21} z_0 \\
u_t &= -F \check y_t, \quad t \geq 0 \\
\check y_{t+1} &= (A - BF) \check y_t, \quad t \geq 0
\end{aligned}
$$

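From this representation, we can deduce the sequence of functions $\sigma = \{\sigma_t(\check z^t)\}_{t=0}^\infty$ that comprise a Stackelberg plan
For convenience, let $\check A \equiv A - BF$ and partition $\check A$ conformably to the partition $y_t = \begin{bmatrix} \check z_t \\ \check x_t \end{bmatrix}$ as

$$
\begin{bmatrix} \check A_{11} & \check A_{12} \\ \check A_{21} & \check A_{22} \end{bmatrix}
$$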

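Let $H_0^0 \equiv -P_{22}^{-1} P_{21}$ so that $\check x_0 = H_0^0 \check z_0$
Then iterations on $\check y_{t+1} = \check A \check y_t$ starting from initial condition $\check y_0 = \begin{bmatrix} \check z_0 \\ H_0^0 \check z_0 \end{bmatrix}$ imply that for $t \geq 1$

$$
x_t = \sum_{j=1}^t H_j^t \check z_{t-j}
$$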

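where

$$
\begin{aligned}
H_1^t &= \check A_{21} \\
H_2^t &= \check A_{22} \check A_{21} \\
\vdots \ & \quad \vdots \\
H_{t-1}^t &= \check A_{22}^{t-2} \check A_{21} \\
H_t^t &= \check A_{22}^{t-1} (\check A_{21} + \check A_{22} H_0^0)
\end{aligned}
$$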

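An optimal decision rule for the Stackelberg leader’s choice of $u_t$ is

$$
u_t = -F \check y_t \equiv - \begin{bmatrix} F_z & F_x \end{bmatrix} \begin{bmatrix} \check z_t \\ x_t \end{bmatrix}
$$

or

$$
u_t = -F_z \check z_t - F_x \sum_{j=1}^t H_j^t \check z_{t-j} = \sigma_t(\check z^t) \tag{10}
$$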

Representation Eq. (10) confirms that whenever $F_x \neq 0$, the typical situation, the time $t$ component $\sigma_t$ of a Stackelberg plan is history-dependent, meaning that the Stackelberg leader’s choice $u_t$ depends not just on $\check z_t$ but on components of $\check z^{t-1}$
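
The history-dependent representation of $x_t$ can be checked with a small sketch. Everything numerical here is a hypothetical stand-in: `A_check` plays the role of $\check A = A - BF$ with two natural state variables and one jump variable, and `H00` plays the role of $-P_{22}^{-1} P_{21}$. The code simulates the recursion and compares it with the weighted sum over the history of $\check z$.

```python
import numpy as np

# Hypothetical closed-loop matrix, partitioned with n_z = 2 and n_x = 1
A_check = np.array([[0.9,  0.1, 0.05],
                    [0.0,  0.8, 0.10],
                    [0.2, -0.1, 0.70]])
nz = 2
A21, A22 = A_check[nz:, :nz], A_check[nz:, nz:]
H00 = np.array([[0.3, -0.2]])            # hypothetical stand-in for H_0^0

# Simulate y_{t+1} = A_check y_t from y_0 = (z_0, H00 z_0)
T = 6
z = [np.array([[1.0], [0.5]])]
x = [H00 @ z[0]]
for t in range(T):
    y_next = A_check @ np.vstack((z[t], x[t]))
    z.append(y_next[:nz])
    x.append(y_next[nz:])


def x_from_history(t):
    """x_t as a weighted sum of z_{t-1}, ..., z_0 using the H_j^t weights."""
    total = np.zeros((1, 1))
    for j in range(1, t + 1):
        if j < t:
            H = np.linalg.matrix_power(A22, j - 1) @ A21
        else:
            H = np.linalg.matrix_power(A22, t - 1) @ (A21 + A22 @ H00)
        total += H @ z[t - j]
    return total


print(x_from_history(T), x[T])
```

The last weight differs from the others because it also carries the effect of the initial choice $x_0 = H_0^0 z_0$.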

76.6.1 Comments and Interpretations

At the end of the day, it will turn out that because we set $\check z_0 = z_0$, it will be true that $z_t = \check z_t$ for all $t \geq 0$
Then why did we distinguish $\check z_t$ from $z_t$?
The answer is that if we want to present to the Stackelberg follower a history-dependent representation of the Stackelberg leader’s sequence $\vec q_2$, we must use representation Eq. (10) cast in terms of the history $\check z^t$ and not a corresponding representation cast in terms of $z^t$

76.6.2 Dynamic Programming and Time Consistency of Follower’s Problem

Given the sequence 𝑞2⃗ chosen by the Stackelberg leader in our duopoly model, it turns out
that the Stackelberg follower’s problem is recursive in the natural state variables that con-
front a follower at any time 𝑡 ≥ 0
This means that the follower’s plan is time consistent
To verify these claims, we’ll formulate a recursive version of a follower’s problem that builds
on our recursive representation of the Stackelberg leader’s plan and our use of the Big K,
little k idea

76.6.3 Recursive Formulation of a Follower’s Problem

We now use what amounts to another “Big 𝐾, little 𝑘” trick (see rational expectations equi-
librium) to formulate a recursive version of a follower’s problem cast in terms of an ordinary
Bellman equation
Firm 1, the follower, faces $\{q_{2t}\}_{t=0}^\infty$ as a given quantity sequence chosen by the leader and believes that its output price at $t$ satisfies

$$
p_t = a_0 - a_1 (q_{1t} + q_{2t}), \quad t \geq 0
$$

Our challenge is to represent $\{q_{2t}\}_{t=0}^\infty$ as a given sequence

To do so, recall that under the Stackelberg plan, firm 2 sets output according to the $q_{2t}$ component of

$$
y_t = \begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \\ x_t \end{bmatrix}
$$

which is governed by

𝑦𝑡+1 = (𝐴 − 𝐵𝐹 )𝑦𝑡

To obtain a recursive representation of a $\{q_{2t}\}$ sequence that is exogenous to firm 1, we define a state $\tilde y_t$

$$
\tilde y_t = \begin{bmatrix} 1 \\ q_{2t} \\ \tilde q_{1t} \\ \tilde x_t \end{bmatrix}
$$

that evolves according to

$$
\tilde y_{t+1} = (A - BF) \tilde y_t
$$

subject to the initial condition $\tilde q_{10} = q_{10}$ and $\tilde x_0 = x_0$ where $x_0 = -P_{22}^{-1} P_{21} z_0$ as stated above
Firm 1’s state vector is

$$
X_t = \begin{bmatrix} \tilde y_t \\ q_{1t} \end{bmatrix}
$$

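It follows that the follower firm 1 faces law of motion

$$
\begin{bmatrix} \tilde y_{t+1} \\ q_{1t+1} \end{bmatrix}
= \begin{bmatrix} A - BF & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \tilde y_t \\ q_{1t} \end{bmatrix}
+ \begin{bmatrix} 0 \\ 1 \end{bmatrix} x_t \tag{11}
$$

This specification assures that from the point of view of firm 1, $q_{2t}$ is an exogenous process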
Here

• $\tilde q_{1t}, \tilde x_t$ play the role of Big K
• $q_{1t}, x_t$ play the role of little k

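The time $t$ component of firm 1’s objective is

$$
\tilde X_t' \tilde R \tilde X_t - x_t^2 \tilde Q =
\begin{bmatrix} 1 \\ q_{2t} \\ \tilde q_{1t} \\ \tilde x_t \\ q_{1t} \end{bmatrix}'
\begin{bmatrix}
0 & 0 & 0 & 0 & \frac{a_0}{2} \\
0 & 0 & 0 & 0 & -\frac{a_1}{2} \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
\frac{a_0}{2} & -\frac{a_1}{2} & 0 & 0 & -a_1
\end{bmatrix}
\begin{bmatrix} 1 \\ q_{2t} \\ \tilde q_{1t} \\ \tilde x_t \\ q_{1t} \end{bmatrix}
- \gamma x_t^2
$$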

Firm 1’s optimal decision rule is

$$
x_t = -\tilde F X_t
$$

and its state evolves according to

$$
\tilde X_{t+1} = (\tilde A - \tilde B \tilde F) X_t
$$

under its optimal decision rule


Later we shall compute $\tilde F$ and verify that when we set

$$
X_0 = \begin{bmatrix} 1 \\ q_{20} \\ q_{10} \\ x_0 \\ q_{10} \end{bmatrix}
$$

we recover

$$
x_0 = -\tilde F \tilde X_0
$$

which will verify that we have properly set up a recursive representation of the follower’s problem facing the Stackelberg leader’s $\vec q_2$

76.6.4 Time Consistency of Follower’s Plan

Since the follower can solve its problem using dynamic programming, its problem is recursive in what for it are the natural state variables, namely

$$
\begin{bmatrix} 1 \\ q_{2t} \\ \tilde q_{10} \\ \tilde x_0 \end{bmatrix}
$$

It follows that the follower’s plan is time consistent

76.7 Computing the Stackelberg Plan

Here is our code to compute a Stackelberg plan via a linear-quadratic dynamic program as
outlined above

In [2]: import numpy as np


import numpy.linalg as la
import quantecon as qe
from quantecon import LQ
import matplotlib.pyplot as plt
%matplotlib inline

In [3]: # == Parameters == #
a0 = 10
a1 = 2
β = 0.96
γ = 120
n = 300
tol0 = 1e-8
tol1 = 1e-16
tol2 = 1e-2

βs = np.ones(n)
βs[1:] = β
βs = βs.cumprod()

In [4]: # == In LQ form == #
Alhs = np.eye(4)

# Euler equation coefficients


Alhs[3, :] = β * a0 / (2 * γ), -β * a1 / (2 * γ), -β * a1 / γ, β

Arhs = np.eye(4)
Arhs[2, 3] = 1

Alhsinv = la.inv(Alhs)

A = Alhsinv @ Arhs

B = Alhsinv @ np.array([[0, 1, 0, 0]]).T

R = np.array([[0, -a0 / 2, 0, 0],


[-a0 / 2, a1, a1 / 2, 0],
[0, a1 / 2, 0, 0],
[0, 0, 0, 0]])

Q = np.array([[γ]])

# == Solve using QE's LQ class == #


# LQ solves minimization problems which is why the sign of R and Q was changed
lq = LQ(Q, R, A, B, beta=β)
P, F, d = lq.stationary_values(method='doubling')

P22 = P[3:, 3:]


P21 = P[3:, :3]
P22inv = la.inv(P22)
H_0_0 = -P22inv @ P21

# == Simulate forward == #

π_leader = np.zeros(n)

z0 = np.array([[1, 1, 1]]).T
x0 = H_0_0 @ z0
y0 = np.vstack((z0, x0))

yt, ut = lq.compute_sequence(y0, ts_length=n)[:2]

π_matrix = (R + F.T @ Q @ F)

for t in range(n):
    π_leader[t] = -(yt[:, t].T @ π_matrix @ yt[:, t])

# == Display policies == #
print("Computed policy for Stackelberg leader\n")
print(f"F = {F}")

Computed policy for Stackelberg leader

F = [[-1.58004454 0.29461313 0.67480938 6.53970594]]

76.7.1 Implied Time Series for Price and Quantities

The following code plots the price and quantities

In [5]: q_leader = yt[1, :-1]


q_follower = yt[2, :-1]
q = q_leader + q_follower # Total output, Stackelberg
p = a0 - a1 * q # Price, Stackelberg

fig, ax = plt.subplots(figsize=(9, 5.8))


ax.plot(range(n), q_leader, 'b-', lw=2, label='leader output')
ax.plot(range(n), q_follower, 'r-', lw=2, label='follower output')
ax.plot(range(n), p, 'g-', lw=2, label='price')
ax.set_title('Output and prices, Stackelberg duopoly')
ax.legend(frameon=False)
plt.xlabel('t')
plt.show()

76.7.2 Value of Stackelberg Leader

We’ll compute the present value earned by the Stackelberg leader


We’ll compute it two ways (they should agree up to the error from truncating the infinite sum after $n$ periods in the forward simulation – just a check on coding and thinking)

In [6]: v_leader_forward = np.sum(βs * π_leader)


v_leader_direct = -yt[:, 0].T @ P @ yt[:, 0]

# == Display values == #
print("Computed values for the Stackelberg leader at t=0:\n")
print(f"v_leader_forward(forward sim) = {v_leader_forward:.4f}")
print(f"v_leader_direct (direct) = {v_leader_direct:.4f}")

Computed values for the Stackelberg leader at t=0:

v_leader_forward(forward sim) = 150.0316


v_leader_direct (direct) = 150.0324

In [7]: # Manually checks whether P is approximately a fixed point


P_next = (R + F.T @ Q @ F + β * (A - B @ F).T @ P @ (A - B @ F))
(P - P_next < tol0).all()

Out[7]: True

In [8]: # Manually checks whether two different ways of computing the


# value function give approximately the same answer
v_expanded = -((y0.T @ R @ y0 + ut[:, 0].T @ Q @ ut[:, 0] +
β * (y0.T @ (A - B @ F).T @ P @ (A - B @ F) @ y0)))
(v_leader_direct - v_expanded < tol0)[0, 0]

Out[8]: True

76.8 Exhibiting Time Inconsistency of Stackelberg Plan

In the code below we compare two values

• the continuation value $-y_t' P y_t$ earned by a continuation Stackelberg leader who inherits state $y_t$ at $t$
• the value of a reborn Stackelberg leader who inherits state $z_t$ at $t$ and sets $x_t = -P_{22}^{-1} P_{21} z_t$

The difference between these two values is a tell-tale sign of the time inconsistency of the Stackelberg plan

In [9]: # Compute value function over time with a reset at time t


vt_leader = np.zeros(n)
vt_reset_leader = np.empty_like(vt_leader)

yt_reset = yt.copy()
yt_reset[-1, :] = (H_0_0 @ yt[:3, :])

for t in range(n):
    vt_leader[t] = -yt[:, t].T @ P @ yt[:, t]
    vt_reset_leader[t] = -yt_reset[:, t].T @ P @ yt_reset[:, t]

In [10]: fig, axes = plt.subplots(3, 1, figsize=(10, 7))

axes[0].plot(range(n+1), (- F @ yt).flatten(), 'bo', label='Stackelberg leader', ms=2)


axes[0].plot(range(n+1), (- F @ yt_reset).flatten(), 'ro', label='continuation leader at t', ms=2)
axes[0].set(title=r'Leader control variable $u_{t}$', xlabel='t')
axes[0].legend()

axes[1].plot(range(n+1), yt[3, :], 'bo', ms=2)


axes[1].plot(range(n+1), yt_reset[3, :], 'ro', ms=2)
axes[1].set(title=r'Follower control variable $x_{t}$', xlabel='t')

axes[2].plot(range(n), vt_leader, 'bo', ms=2)


axes[2].plot(range(n), vt_reset_leader, 'ro', ms=2)
axes[2].set(title=r'Leader value function $v(y_{t})$', xlabel='t')

plt.tight_layout()
plt.show()

76.9 Recursive Formulation of the Follower’s Problem

We now formulate and compute the recursive version of the follower’s problem
We check that the recursive Big 𝐾 , little 𝑘 formulation of the follower’s problem produces
the same output path 𝑞1⃗ that we computed when we solved the Stackelberg problem

In [11]: A_tilde = np.eye(5)


A_tilde[:4, :4] = A - B @ F

R_tilde = np.array([[0, 0, 0, 0, -a0 / 2],


[0, 0, 0, 0, a1 / 2],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[-a0 / 2, a1 / 2, 0, 0, a1]])

Q_tilde = Q
B_tilde = np.array([[0, 0, 0, 0, 1]]).T

lq_tilde = LQ(Q_tilde, R_tilde, A_tilde, B_tilde, beta=β)


P_tilde, F_tilde, d_tilde = lq_tilde.stationary_values(method='doubling')

y0_tilde = np.vstack((y0, y0[2]))


yt_tilde = lq_tilde.compute_sequence(y0_tilde, ts_length=n)[0]

In [12]: # Checks that the recursive formulation of the follower's problem gives
# the same solution as the original Stackelberg problem
plt.plot(yt_tilde[4], 'r', label="q_tilde")
plt.plot(yt_tilde[2], 'b', label="q")
plt.legend()
plt.show()

Note: Variables with _tilde are obtained from solving the follower’s problem – those with-
out are from the Stackelberg problem

In [13]: # Maximum absolute difference in quantities over time between the first and second solution methods
np.max(np.abs(yt_tilde[4] - yt_tilde[2]))

Out[13]: 1.7763568394002505e-15

In [14]: # x0 == x0_tilde
yt[:, 0][-1] - (yt_tilde[:, 1] - yt_tilde[:, 0])[-1] < tol0

Out[14]: True

76.9.1 Explanation of Alignment

If we inspect the coefficients in the decision rule −𝐹 ̃ , we can spot the reason that the follower
chooses to set 𝑥𝑡 = 𝑥𝑡̃ when it sets 𝑥𝑡 = −𝐹 ̃ 𝑋𝑡 in the recursive formulation of the follower
problem
Can you spot what features of 𝐹 ̃ imply this?
Hint: remember the components of 𝑋𝑡

In [15]: # Policy function in the follower's problem


F_tilde.round(4)

Out[15]: array([[ 0. , 0. , -0.1032, -1. , 0.1032]])

In [16]: # Value function in the Stackelberg problem


P

Out[16]: array([[ 963.54083615, -194.60534465, -511.62197962, -5258.22585724],


[ -194.60534465, 37.3535753 , 81.97712513, 784.76471234],
[ -511.62197962, 81.97712513, 247.34333344, 2517.05126111],
[-5258.22585724, 784.76471234, 2517.05126111, 25556.16504097]])

In [17]: # Value function in the follower's problem


P_tilde

Out[17]: array([[-1.81991134e+01, 2.58003020e+00, 1.56048755e+01,


1.51229815e+02, -5.00000000e+00],
[ 2.58003020e+00, -9.69465925e-01, -5.26007958e+00,
-5.09764310e+01, 1.00000000e+00],
[ 1.56048755e+01, -5.26007958e+00, -3.22759027e+01,
-3.12791908e+02, -1.23823802e+01],
[ 1.51229815e+02, -5.09764310e+01, -3.12791908e+02,
-3.03132584e+03, -1.20000000e+02],
[-5.00000000e+00, 1.00000000e+00, -1.23823802e+01,
-1.20000000e+02, 1.43823802e+01]])

In [18]: # Manually check that P is an approximate fixed point


(P - ((R + F.T @ Q @ F) + β * (A - B @ F).T @ P @ (A - B @ F)) < tol0).all()

Out[18]: True

In [19]: # Compute `P_guess` using `F_tilde_star`


F_tilde_star = -np.array([[0, 0, 0, 1, 0]])
P_guess = np.zeros((5, 5))

for i in range(1000):
    P_guess = ((R_tilde + F_tilde_star.T @ Q @ F_tilde_star) +
               β * (A_tilde - B_tilde @ F_tilde_star).T @ P_guess
               @ (A_tilde - B_tilde @ F_tilde_star))

In [20]: # Value function in the follower's problem


-(y0_tilde.T @ P_tilde @ y0_tilde)[0, 0]

Out[20]: 112.65590740578095

In [21]: # Value function with `P_guess`


-(y0_tilde.T @ P_guess @ y0_tilde)[0, 0]

Out[21]: 112.65590740578085

In [22]: # Compute policy using policy iteration algorithm


F_iter = (β * la.inv(Q + β * B_tilde.T @ P_guess @ B_tilde)
@ B_tilde.T @ P_guess @ A_tilde)

for i in range(100):
    # Compute P_iter
    P_iter = np.zeros((5, 5))
    for j in range(1000):
        P_iter = ((R_tilde + F_iter.T @ Q @ F_iter) + β *
                  (A_tilde - B_tilde @ F_iter).T @ P_iter @
                  (A_tilde - B_tilde @ F_iter))

    # Update F_iter
    F_iter = (β * la.inv(Q + β * B_tilde.T @ P_iter @ B_tilde)
              @ B_tilde.T @ P_iter @ A_tilde)

dist_vec = (P_iter - ((R_tilde + F_iter.T @ Q @ F_iter) +
                      β * (A_tilde - B_tilde @ F_iter).T @ P_iter @
                      (A_tilde - B_tilde @ F_iter)))

if np.max(np.abs(dist_vec)) < 1e-8:
    dist_vec2 = (F_iter - (β * la.inv(Q + β * B_tilde.T @ P_iter @ B_tilde)
                           @ B_tilde.T @ P_iter @ A_tilde))

    if np.max(np.abs(dist_vec2)) < 1e-8:
        F_iter
    else:
        print("The policy didn't converge: try increasing the number of outer loop iterations")
else:
    print("`P_iter` didn't converge: try increasing the number of inner loop iterations")

In [23]: # Simulate the system using `F_tilde_star` and check that it gives the same result as the original so

yt_tilde_star = np.zeros((n, 5))


yt_tilde_star[0, :] = y0_tilde.flatten()

for t in range(n-1):
    yt_tilde_star[t+1, :] = (A_tilde - B_tilde @ F_tilde_star) @ yt_tilde_star[t, :]

plt.plot(yt_tilde_star[:, 4], 'r', label="q_tilde")


plt.plot(yt_tilde[2], 'b', label="q")
plt.legend()
plt.show()

In [24]: # Maximum absolute difference


np.max(np.abs(yt_tilde_star[:, 4] - yt_tilde[2, :-1]))

Out[24]: 0.0

76.10 Markov Perfect Equilibrium

The state vector is

$$
z_t = \begin{bmatrix} 1 \\ q_{2t} \\ q_{1t} \end{bmatrix}
$$

and the state transition dynamics are

𝑧𝑡+1 = 𝐴𝑧𝑡 + 𝐵1 𝑣1𝑡 + 𝐵2 𝑣2𝑡

where $A$ is a $3 \times 3$ identity matrix and

$$
B_1 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad
B_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
$$

The Markov perfect decision rules are

𝑣1𝑡 = −𝐹1 𝑧𝑡 , 𝑣2𝑡 = −𝐹2 𝑧𝑡

and in the Markov perfect equilibrium, the state evolves according to

𝑧𝑡+1 = (𝐴 − 𝐵1 𝐹1 − 𝐵2 𝐹2 )𝑧𝑡

In [25]: # == In LQ form == #
A = np.eye(3)
B1 = np.array([[0], [0], [1]])
B2 = np.array([[0], [1], [0]])

R1 = np.array([[0, 0, -a0 / 2],


[0, 0, a1 / 2],
[-a0 / 2, a1 / 2, a1]])

R2 = np.array([[0, -a0 / 2, 0],


[-a0 / 2, a1, a1 / 2],
[0, a1 / 2, 0]])

Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0

# == Solve using QE's nnash function == #


F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1,
Q2, S1, S2, W1, W2, M1,
M2, beta=β, tol=tol1)

# == Simulate forward == #
AF = A - B1 @ F1 - B2 @ F2
z = np.empty((3, n))
z[:, 0] = 1, 1, 1
for t in range(n-1):
    z[:, t+1] = AF @ z[:, t]

# == Display policies == #
print("Computed policies for firm 1 and firm 2:\n")
print(f"F1 = {F1}")
print(f"F2 = {F2}")

Computed policies for firm 1 and firm 2:

F1 = [[-0.22701363 0.03129874 0.09447113]]


F2 = [[-0.22701363 0.09447113 0.03129874]]

In [26]: q2 = z[1, :] # z_t = (1, q_2t, q_1t), so row 1 holds firm 2's output
q1 = z[2, :] # Row 2 holds firm 1's output
q = q1 + q2 # Total output, MPE
p = a0 - a1 * q # Price, MPE

fig, ax = plt.subplots(figsize=(9, 5.8))


ax.plot(range(n), q, 'b-', lw=2, label='total output')
ax.plot(range(n), p, 'g-', lw=2, label='price')
ax.set_title('Output and prices, duopoly MPE')
ax.legend(frameon=False)
plt.xlabel('t')
plt.show()

In [27]: # Computes the maximum difference between the two quantities of the two firms
np.max(np.abs(q1 - q2))

Out[27]: 6.8833827526759706e-15

In [28]: # Compute values


u1 = (- F1 @ z).flatten()
u2 = (- F2 @ z).flatten()

π_1 = p * q1 - γ * (u1) ** 2
π_2 = p * q2 - γ * (u2) ** 2

v1_forward = np.sum(βs * π_1)


v2_forward = np.sum(βs * π_2)

v1_direct = (- z[:, 0].T @ P1 @ z[:, 0])


v2_direct = (- z[:, 0].T @ P2 @ z[:, 0])

# == Display values == #
print("Computed values for firm 1 and firm 2:\n")
print(f"v1(forward sim) = {v1_forward:.4f}; v1 (direct) = {v1_direct:.4f}")
print(f"v2 (forward sim) = {v2_forward:.4f}; v2 (direct) = {v2_direct:.4f}")

Computed values for firm 1 and firm 2:

v1(forward sim) = 133.3303; v1 (direct) = 133.3296


v2 (forward sim) = 133.3303; v2 (direct) = 133.3296

In [29]: # Sanity check


Λ1 = A - B2 @ F2
lq1 = qe.LQ(Q1, R1, Λ1, B1, beta=β)
P1_ih, F1_ih, d = lq1.stationary_values()

v2_direct_alt = - z[:, 0].T @ lq1.P @ z[:, 0] + lq1.d

(np.abs(v2_direct - v2_direct_alt) < tol2).all()



Out[29]: True

76.11 MPE vs. Stackelberg


In [30]: vt_MPE = np.zeros(n)
vt_follower = np.zeros(n)

for t in range(n):
    vt_MPE[t] = -z[:, t].T @ P1 @ z[:, t]
    vt_follower[t] = -yt_tilde[:, t].T @ P_tilde @ yt_tilde[:, t]

plt.plot(vt_MPE, 'b', label='MPE')


plt.plot(vt_leader, 'r', label='Stackelberg leader')
plt.plot(vt_follower, 'g', label='Stackelberg follower')
plt.title(r'MPE vs. Stackelberg Value Function')
plt.xlabel('t')
plt.legend(loc=(1.05, 0))
plt.show()

In [31]: # == Display values == #


print("Computed values:\n")
print(f"vt_leader(y0) = {vt_leader[0]:.4f}")
print(f"vt_follower(y0) = {vt_follower[0]:.4f}")
print(f"vt_MPE(y0) = {vt_MPE[0]:.4f}")

Computed values:

vt_leader(y0) = 150.0324
vt_follower(y0) = 112.6559
vt_MPE(y0) = 133.3296

In [32]: # Compute the difference in total value between the Stackelberg and the MPE
vt_leader[0] + vt_follower[0] - 2 * vt_MPE[0]

Out[32]: -3.9709425620912953
77

Ramsey Plans, Time Inconsistency, Sustainable Plans

77.1 Contents

• Overview 77.2

• The Model 77.3

• Structure 77.4

• Intertemporal Influences 77.5

• Four Models of Government Policy 77.6

• A Ramsey Planner 77.7

• A Constrained-to-a-Constant-Growth-Rate Ramsey Government 77.8

• Markov Perfect Governments 77.9

• Equilibrium Outcomes for Three Models of Government Policy Making 77.10

• A Fourth Model of Government Decision Making 77.11

• Sustainable or Credible Plan 77.12

• Comparison of Equilibrium Values 77.13

• Note on Dynamic Programming Squared 77.14

Co-author: Sebastian Graves


In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon


77.2 Overview

This lecture describes a linear-quadratic version of a model that Guillermo Calvo [21] used to
illustrate the time inconsistency of optimal government plans
Like Chang [25], we use the model as a laboratory in which to explore the consequences of
different timing protocols for government decision making
The model focuses attention on intertemporal tradeoffs between

• welfare benefits that anticipated deflation generates by increasing a representative agent’s liquidity as measured by his or her real money balances, and
• costs associated with distorting taxes that must be used to withdraw money from the economy in order to generate anticipated deflation

The model features

• rational expectations
• costly government actions at all dates 𝑡 ≥ 1 that increase household utilities at dates
before 𝑡
• two Bellman equations, one that expresses the private sector’s expectation of future inflation as a function of current and future government actions, another that describes the value function of a Ramsey planner

A theme of this lecture is that timing protocols affect outcomes


We’ll use ideas from papers by Cagan [20], Calvo [21], Stokey [124], [125], Chari and Kehoe
[26], Chang [25], and Abreu [1] as well as from chapter 19 of [87]
In addition, we’ll use ideas from linear-quadratic dynamic programming described in Linear
Quadratic Control as applied to Ramsey problems in Stackelberg problems
In particular, we have specified the model in a way that allows us to use linear-quadratic
dynamic programming to compute an optimal government plan under a timing protocol in
which a government chooses an infinite sequence of money supply growth rates once and for
all at time 0

77.3 The Model

There is no uncertainty
Let:

• 𝑝𝑡 be the log of the price level


• 𝑚𝑡 be the log of nominal money balances
• 𝜃𝑡 = 𝑝𝑡+1 − 𝑝𝑡 be the net rate of inflation between 𝑡 and 𝑡 + 1
• 𝜇𝑡 = 𝑚𝑡+1 − 𝑚𝑡 be the net rate of growth of nominal balances

The demand for real balances is governed by a perfect foresight version of the Cagan [20] demand function:

$$ m_t - p_t = -\alpha (p_{t+1} - p_t), \quad \alpha > 0 \tag{1} $$

for 𝑡 ≥ 0
Equation (1) asserts that the demand for real balances is inversely related to the public’s expected rate of inflation, which here equals the actual rate of inflation
(When there is no uncertainty, an assumption of rational expectations simplifies to perfect foresight)
(See [117] for a rational expectations version of the model when there is uncertainty)
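Subtracting the demand function at time $t$ from the demand function at $t + 1$ gives:

$$ \mu_t - \theta_t = -\alpha \theta_{t+1} + \alpha \theta_t $$

or

$$ \theta_t = \frac{\alpha}{1+\alpha} \theta_{t+1} + \frac{1}{1+\alpha} \mu_t \tag{2} $$

Because $\alpha > 0$, $0 < \frac{\alpha}{1+\alpha} < 1$
Definition: For a scalar $x_t$, let $L^2$ be the space of sequences $\{x_t\}_{t=0}^\infty$ satisfying

$$ \sum_{t=0}^\infty x_t^2 < +\infty $$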

We say that a sequence that belongs to 𝐿2 is square summable


When we assume that the sequence $\vec{\mu} = \{\mu_t\}_{t=0}^\infty$ is square summable and we require that the sequence $\vec{\theta} = \{\theta_t\}_{t=0}^\infty$ is square summable, the linear difference equation (2) can be solved forward to get:

$$ \theta_t = \frac{1}{1+\alpha} \sum_{j=0}^\infty \left( \frac{\alpha}{1+\alpha} \right)^j \mu_{t+j} \tag{3} $$

Insight: In the spirit of Chang [25], note that equations Eq. (1) and Eq. (3) show that 𝜃𝑡
intermediates how choices of 𝜇𝑡+𝑗 , 𝑗 = 0, 1, … impinge on time 𝑡 real balances 𝑚𝑡 − 𝑝𝑡 = −𝛼𝜃𝑡
We shall use this insight to help us simplify and analyze government policy problems
That future rates of money creation influence earlier rates of inflation creates optimal govern-
ment policy problems in which timing protocols matter
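The way future money growth feeds back into earlier inflation can be illustrated with a small sketch. The helper `inflation_path` below is hypothetical (it is not part of the lecture's code). It assumes that money growth stays constant at `μ_tail` after the listed values, so that inflation equals `μ_tail` from then on, and it applies recursion (2) backwards from that terminal condition.

```python
def inflation_path(μ, α, μ_tail=0.0):
    """
    Return θ_0, ..., θ_{T-1} implied by money growth rates μ_0, ..., μ_{T-1}.

    Assumption (for illustration only): μ_t = μ_tail for all t >= T,
    which implies θ_t = μ_tail for all t >= T.
    """
    λ = α / (1 + α)
    θ_next = μ_tail
    θ = [0.0] * len(μ)
    # Work backwards using θ_t = λ θ_{t+1} + μ_t / (1 + α)
    for t in reversed(range(len(μ))):
        θ[t] = λ * θ_next + μ[t] / (1 + α)
        θ_next = θ[t]
    return θ


# Money growth is zero except at t = 2, yet inflation rises already at t = 0
θ = inflation_path([0.0, 0.0, 0.1], α=1.0)
print(θ)
```

A single future burst of money creation raises inflation at every earlier date, with an effect that decays at rate α/(1+α) as we move back in time.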
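We can rewrite the model as:

$$ \begin{bmatrix} 1 \\ \theta_{t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & \frac{1+\alpha}{\alpha} \end{bmatrix} \begin{bmatrix} 1 \\ \theta_t \end{bmatrix} + \begin{bmatrix} 0 \\ -\frac{1}{\alpha} \end{bmatrix} \mu_t $$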

or

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐵𝜇𝑡 (4)



We write the model in the state-space form Eq. (4) even though 𝜃0 is to be determined and
so is not an initial condition as it ordinarily would be in the state-space model described in
Linear Quadratic Control
We write the model in the form Eq. (4) because we want to apply an approach described in
Stackelberg problems
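As a quick consistency check, the following sketch builds the matrices of the state-space form (4) and confirms that one step of the system reproduces the difference equation (2). The numerical values of `α`, `θ_t` and `μ_t` are arbitrary choices made for illustration.

```python
import numpy as np

α = 1.0                          # illustrative parameter value
A = np.array([[1.0, 0.0],
              [0.0, (1 + α) / α]])
B = np.array([[0.0],
              [-1 / α]])

θ_t, μ_t = -0.05, 0.02           # arbitrary numbers, for illustration only
x_t = np.array([1.0, θ_t])
x_next = A @ x_t + B.flatten() * μ_t
θ_next = x_next[1]

# Right side of Eq. (2), which should reproduce θ_t
rhs = α / (1 + α) * θ_next + μ_t / (1 + α)
print(θ_t, rhs)
```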
Assume that a representative household’s utility of real balances at time $t$ is:

$$ U(m_t - p_t) = a_0 + a_1 (m_t - p_t) - \frac{a_2}{2} (m_t - p_t)^2, \quad a_0 > 0, a_1 > 0, a_2 > 0 \tag{5} $$

The “bliss level” of real balances is then $\frac{a_1}{a_2}$

The money demand function (1) and the utility function (5) imply that utility maximizing or bliss level of real balances is attained when:

$$ \theta_t = \theta^* = -\frac{a_1}{a_2 \alpha} $$

Below, we introduce the discount factor 𝛽 ∈ (0, 1) that a representative household and a
benevolent government both use to discount future utilities
(If we set parameters so that 𝜃∗ = log(𝛽), then we can regard a recommendation to set 𝜃𝑡 =
𝜃∗ as a “poor man’s Friedman rule” that attains Milton Friedman’s optimal quantity of
money)
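A small numerical sketch of these formulas follows. The parameter values are the ones used for the `ChangLQ` instance later in the lecture (α = 1, a1 = 0.5, a2 = 3). The variable names are chosen here for illustration.

```python
import math

α, a1, a2 = 1.0, 0.5, 3.0

bliss_balances = a1 / a2           # bliss level of real balances m - p
θ_star = -a1 / (a2 * α)            # inflation rate that delivers the bliss level

# "Poor man's Friedman rule": choose β so that θ* = log(β)
β = math.exp(θ_star)

print(bliss_balances, θ_star, β)
```

With these values the implied discount factor is about 0.8465, which matches the value of `clq.β` reported further below.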
Via equation (3), a government plan $\vec{\mu} = \{\mu_t\}_{t=0}^\infty$ leads to an equilibrium sequence of inflation outcomes $\vec{\theta} = \{\theta_t\}_{t=0}^\infty$
We assume that social costs $\frac{c}{2} \mu_t^2$ are incurred at $t$ when the government changes the stock of nominal money balances at rate $\mu_t$
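Therefore, the one-period welfare function of a benevolent government is:

$$ -s(\theta_t, \mu_t) \equiv -r(x_t, \mu_t) = \begin{bmatrix} 1 \\ \theta_t \end{bmatrix}' \begin{bmatrix} a_0 & -\frac{a_1 \alpha}{2} \\ -\frac{a_1 \alpha}{2} & -\frac{a_2 \alpha^2}{2} \end{bmatrix} \begin{bmatrix} 1 \\ \theta_t \end{bmatrix} - \frac{c}{2} \mu_t^2 = -x_t' R x_t - Q \mu_t^2 \tag{6} $$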

Household welfare is summarized by:

$$ v_0 = -\sum_{t=0}^\infty \beta^t r(x_t, \mu_t) = -\sum_{t=0}^\infty \beta^t s(\theta_t, \mu_t) \tag{7} $$

We can represent the dependence of $v_0$ on $(\vec{\theta}, \vec{\mu})$ recursively via

𝑣𝑡 = 𝑠(𝜃𝑡 , 𝜇𝑡 ) + 𝛽𝑣𝑡+1 (8)

77.4 Structure

The following structure is induced by private agents’ behavior as summarized by the demand function for money (1) that leads to equation (3) that tells how future settings of $\mu$ affect the current value of $\theta$
Equation (3) maps a policy sequence of money growth rates $\vec{\mu} = \{\mu_t\}_{t=0}^\infty \in L^2$ into an inflation sequence $\vec{\theta} = \{\theta_t\}_{t=0}^\infty \in L^2$
These, in turn, induce a discounted value to a government sequence $\vec{v} = \{v_t\}_{t=0}^\infty \in L^2$ that satisfies the recursion

$$ v_t = s(\theta_t, \mu_t) + \beta v_{t+1} $$

where we have called $s(\theta_t, \mu_t) = r(x_t, \mu_t)$ as above
Thus, we have a triple of sequences $\vec{\mu}, \vec{\theta}, \vec{v}$ associated with a $\vec{\mu} \in L^2$
At this point $\vec{\mu} \in L^2$ is an arbitrary exogenous policy
To make $\vec{\mu}$ endogenous, we require a theory of government decisions

77.5 Intertemporal Influences

Criterion function Eq. (7) and the constraint system Eq. (4) exhibit the following structure:

• Setting $\mu_t \neq 0$ imposes costs $\frac{c}{2} \mu_t^2$ at time $t$ and at no other times; but
• The money growth rate $\mu_t$ affects the representative household’s one-period utilities at all dates $s = 0, 1, \ldots, t$

That settings of 𝜇 at one date affect household utilities at earlier dates sets the stage for the
emergence of a time-inconsistent optimal government plan under a Ramsey (also called a
Stackelberg) timing protocol
We’ll study outcomes under a Ramsey timing protocol below
But we’ll also study the consequences of other timing protocols

77.6 Four Models of Government Policy

We consider four models of policymakers that differ in

• what a policymaker is allowed to choose, either a sequence $\vec{\mu}$ or just a single period $\mu_t$
• when a policymaker chooses, either at time 0 or at times $t \geq 0$
• what a policymaker assumes about how its choice of $\mu_t$ affects private agents’ expectations about earlier and later inflation rates

In two of our models, a single policymaker chooses a sequence $\{\mu_t\}_{t=0}^\infty$ once and for all, taking into account how $\mu_t$ affects household one-period utilities at dates $s = 0, 1, \ldots, t - 1$

• these two models thus employ a Ramsey or Stackelberg timing protocol

In two other models, there is a sequence of policymakers, each of whom sets 𝜇𝑡 at one 𝑡 only

• Each such policymaker ignores effects that its choice of 𝜇𝑡 has on household one-period
utilities at dates 𝑠 = 0, 1, … , 𝑡 − 1

The four models differ with respect to timing protocols, constraints on government choices,
and government policymakers’ beliefs about how their decisions affect private agents’ beliefs
about future government decisions
The models are

• A single Ramsey planner chooses a sequence $\{\mu_t\}_{t=0}^\infty$ once and for all at time 0

• A single Ramsey planner chooses a sequence $\{\mu_t\}_{t=0}^\infty$ once and for all at time 0 subject to the constraint that $\mu_t = \mu$ for all $t \geq 0$

• A sequence of separate policymakers chooses 𝜇𝑡 for 𝑡 = 0, 1, 2, …

– a time 𝑡 policymaker chooses 𝜇𝑡 only and forecasts that future government deci-
sions are unaffected by its choice

• A sequence of separate policymakers chooses 𝜇𝑡 for 𝑡 = 0, 1, 2, …

– a time 𝑡 policymaker chooses only 𝜇𝑡 but believes that its choice of 𝜇𝑡 shapes pri-
vate agents’ beliefs about future rates of money creation and inflation, and through
them, future government actions

77.7 A Ramsey Planner

First, we consider a Ramsey planner that chooses $\{\mu_t, \theta_t\}_{t=0}^\infty$ to maximize (7) subject to the law of motion (4)
We can split this problem into two stages, as in Stackelberg problems and [87] Chapter 19
In the first stage, we take the initial inflation rate $\theta_0$ as given, and then solve the resulting LQ dynamic programming problem
In the second stage, we maximize over the initial inflation rate $\theta_0$
Define a feasible set of $(\vec{x}_1, \vec{\mu}_0)$ sequences:

$$ \Omega(x_0) = \{ (\vec{x}_1, \vec{\mu}_0) : x_{t+1} = A x_t + B \mu_t, \ \forall t \geq 0 \} $$

77.7.1 Subproblem 1

The value function

$$ J(x_0) = \max_{(\vec{x}_1, \vec{\mu}_0) \in \Omega(x_0)} -\sum_{t=0}^\infty \beta^t r(x_t, \mu_t) $$

satisfies the Bellman equation

$$ J(x) = \max_{\mu, x'} \{ -r(x, \mu) + \beta J(x') \} $$

subject to:

$$ x' = A x + B \mu $$

As in Stackelberg problems, we map this problem into a linear-quadratic control problem and
then carefully use the optimal value function associated with it
Guessing that $J(x) = -x' P x$ and substituting into the Bellman equation gives rise to the algebraic matrix Riccati equation:

$$ P = R + \beta A' P A - \beta^2 A' P B (Q + \beta B' P B)^{-1} B' P A $$

and the optimal decision rule

$$ \mu_t = -F x_t $$

where

$$ F = \beta (Q + \beta B' P B)^{-1} B' P A $$

The QuantEcon LQ class solves for 𝐹 and 𝑃 given inputs 𝑄, 𝑅, 𝐴, 𝐵, and 𝛽
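To make the Riccati equation concrete, here is a minimal sketch that iterates on it directly with NumPy, starting from $P = 0$. This is not how the QuantEcon `LQ` class solves the problem; it is a plain fixed-point iteration, and the function name `solve_riccati` is hypothetical. The matrices and parameter values are the ones used in the lecture's `ChangLQ` code (α = 1, a0 = 1, a1 = 0.5, a2 = 3, c = 2).

```python
import numpy as np

α, a0, a1, a2, c = 1.0, 1.0, 0.5, 3.0, 2.0
β = np.exp(-a1 / (α * a2))

R = -np.array([[a0, -a1 * α / 2],
               [-a1 * α / 2, -a2 * α**2 / 2]])
Q = np.array([[c / 2]])
A = np.array([[1.0, 0.0], [0.0, (1 + α) / α]])
B = np.array([[0.0], [-1 / α]])


def solve_riccati(A, B, R, Q, β, tol=1e-10, max_iter=10_000):
    "Iterate on the discounted Riccati equation until P stops changing."
    P = np.zeros_like(R, dtype=float)
    for _ in range(max_iter):
        # G = (Q + β B'PB)^{-1} B'PA
        G = np.linalg.solve(Q + β * B.T @ P @ B, B.T @ P @ A)
        P_new = R + β * A.T @ P @ A - β**2 * A.T @ P @ B @ G
        if np.max(np.abs(P_new - P)) < tol:
            P = P_new
            break
        P = P_new
    else:
        raise RuntimeError("Riccati iteration did not converge")
    F = β * np.linalg.solve(Q + β * B.T @ P @ B, B.T @ P @ A)
    return P, F


P, F = solve_riccati(A, B, R, Q, β)
print(P)
print(F)
```

In practice one would call `LQ(Q, R, A, B, beta=β).stationary_values()` as the lecture does. The iteration above is only meant to show what that call computes.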

77.7.2 Subproblem 2

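The value of the Ramsey problem is

$$ V = \max_{x_0} J(x_0) $$

The value function

$$ J(x_0) = -\begin{bmatrix} 1 & \theta_0 \end{bmatrix} \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix} \begin{bmatrix} 1 \\ \theta_0 \end{bmatrix} = -P_{11} - 2 P_{21} \theta_0 - P_{22} \theta_0^2 $$

Maximizing this with respect to $\theta_0$ yields the FOC:

$$ -2 P_{21} - 2 P_{22} \theta_0 = 0 $$

which implies

$$ \theta_0^* = -\frac{P_{21}}{P_{22}} $$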
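The first-order condition can be checked numerically. In the sketch below the entries of $P$ are made-up illustrative numbers, not output from the lecture's model; the only property that matters is $P_{22} > 0$, which makes $J$ concave in $\theta_0$.

```python
# Illustrative numbers only; in the lecture P comes from the LQ problem
P11, P21, P22 = -6.6, 0.38, 4.7


def J(θ0):
    "Value as a function of the initial inflation rate θ0."
    return -P11 - 2 * P21 * θ0 - P22 * θ0**2


θ0_star = -P21 / P22

# Compare with a brute-force search on a grid centered at θ0_star
grid = [θ0_star + 0.001 * k for k in range(-200, 201)]
θ0_grid = max(grid, key=J)

print(θ0_star, θ0_grid)
```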

77.7.3 Representation of Ramsey Plan

The preceding calculations indicate that we can represent a Ramsey plan $\vec{\mu}$ recursively with the following system created in the spirit of Chang [25]:

$$ \begin{aligned} \theta_0 &= \theta_0^* \\ \mu_t &= b_0 + b_1 \theta_t \\ \theta_{t+1} &= d_0 + d_1 \theta_t \end{aligned} \tag{9} $$

To interpret this system, think of the sequence $\{\theta_t\}_{t=0}^\infty$ as a sequence of synthetic promised inflation rates that are just computational devices for generating a sequence $\vec{\mu}$ of money growth rates that are to be substituted into equation (3) to form actual rates of inflation
It can be verified that if we substitute a plan $\vec{\mu} = \{\mu_t\}_{t=0}^\infty$ that satisfies these equations into equation (3), we obtain the same sequence $\vec{\theta}$ generated by the system (9)
(Here an application of the Big $K$, little $k$ trick could once again be enlightening)
Thus, our construction of a Ramsey plan guarantees that promised inflation equals actual inflation

77.7.4 Multiple roles of 𝜃𝑡

The inflation rate 𝜃𝑡 that appears in the system Eq. (9) and equation Eq. (3) plays three roles
simultaneously:

• In equation (3), $\theta_t$ is the actual rate of inflation between $t$ and $t + 1$
• In equations (2) and (3), $\theta_t$ is also the public’s expected rate of inflation between $t$ and $t + 1$
• In system (9), $\theta_t$ is a promised rate of inflation chosen by the Ramsey planner at time 0

77.7.5 Time Inconsistency

As discussed in Stackelberg problems and Optimal taxation with state-contingent debt, a con-
tinuation Ramsey plan is not a Ramsey plan
This is a concise way of characterizing the time inconsistency of a Ramsey plan
The time inconsistency of a Ramsey plan has motivated other models of government decision
making that alter either

• the timing protocol and/or


• assumptions about how government decision makers think their decisions affect private
agents’ beliefs about future government decisions

77.8 A Constrained-to-a-Constant-Growth-Rate Ramsey Government

We now consider the following peculiar model of optimal government behavior


We have created this model in order to highlight an aspect of an optimal government policy
associated with its time inconsistency, namely, the feature that optimal settings of the policy
instrument vary over time

Instead of allowing the Ramsey government to choose different settings of its instrument at different moments, we now assume that a Ramsey government at time 0 once and for all chooses a constant sequence $\mu_t = \check{\mu}$ for all $t \geq 0$ to maximize

$$ U(-\alpha \check{\mu}) - \frac{c}{2} \check{\mu}^2 $$

Here we have imposed the perfect foresight outcome implied by equation (2) that $\theta_t = \check{\mu}$ when the government chooses a constant $\mu$ for all $t \geq 0$
With the quadratic form (5) for the utility function $U$, the maximizing $\check{\mu}$ is

$$ \check{\mu} = -\frac{\alpha a_1}{\alpha^2 a_2 + c} $$

Summary: We have introduced the constrained-to-a-constant $\mu$ government in order to highlight time-variation of $\mu_t$ as a telltale sign of time inconsistency of a Ramsey plan
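The closed form for the constant money growth rate can be checked against a brute-force search. The sketch below uses the parameter values from the lecture's later `ChangLQ` example; the function names `U` and `W` and the grid are choices made here for illustration.

```python
α, a0, a1, a2, c = 1.0, 1.0, 0.5, 3.0, 2.0


def U(m):
    "Quadratic utility of real balances, Eq. (5)."
    return a0 + a1 * m - a2 / 2 * m**2


def W(μ):
    "One-period objective when μ_t = θ_t = μ forever."
    return U(-α * μ) - c / 2 * μ**2


# Closed-form maximizer
μ_check = -α * a1 / (α**2 * a2 + c)

# Grid search over [-0.5, 0.5] as an independent check
grid = [-0.5 + 0.0001 * k for k in range(10001)]
μ_grid = max(grid, key=W)

print(μ_check, μ_grid)
```

With these parameters the constrained planner chooses a money growth rate of -0.1, a mild deflation that stops short of the bliss rate of about -0.167 because deflation is costly to engineer.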

77.9 Markov Perfect Governments

We now change the timing protocol by considering a sequence of government policymakers, the time $t$ representative of which chooses $\mu_t$ and expects all future governments to set $\mu_{t+j} = \bar{\mu}$
This assumption mirrors an assumption made in a different setting in Markov Perfect Equilibrium
Further, a government policymaker at $t$ believes that $\bar{\mu}$ is unaffected by its choice of $\mu_t$
The time $t$ rate of inflation is then:

$$ \theta_t = \frac{\alpha}{1+\alpha} \bar{\mu} + \frac{1}{1+\alpha} \mu_t $$

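The time $t$ government policymaker then chooses $\mu_t$ to maximize:

$$ W = U(-\alpha \theta_t) - \frac{c}{2} \mu_t^2 + \beta V(\bar{\mu}) $$

where $V(\bar{\mu})$ is the time 0 value $v_0$ of recursion (8) under a money supply growth rate that is forever constant at $\bar{\mu}$
Substituting for $U$ and $\theta_t$ gives:

$$ W = a_0 + a_1 \left( -\frac{\alpha^2}{1+\alpha} \bar{\mu} - \frac{\alpha}{1+\alpha} \mu_t \right) - \frac{a_2}{2} \left( -\frac{\alpha^2}{1+\alpha} \bar{\mu} - \frac{\alpha}{1+\alpha} \mu_t \right)^2 - \frac{c}{2} \mu_t^2 + \beta V(\bar{\mu}) $$

The first-order necessary condition for $\mu_t$ is then:

$$ -\frac{\alpha}{1+\alpha} a_1 - a_2 \left( -\frac{\alpha^2}{1+\alpha} \bar{\mu} - \frac{\alpha}{1+\alpha} \mu_t \right) \left( -\frac{\alpha}{1+\alpha} \right) - c \mu_t = 0 $$

Rearranging we get:

$$ \mu_t = \frac{-a_1}{\frac{1+\alpha}{\alpha} c + \frac{\alpha}{1+\alpha} a_2} - \frac{\alpha^2 a_2}{\left[ \frac{1+\alpha}{\alpha} c + \frac{\alpha}{1+\alpha} a_2 \right] (1+\alpha)} \bar{\mu} $$

A Markov Perfect Equilibrium (MPE) outcome sets $\mu_t = \bar{\mu}$:

$$ \mu_t = \bar{\mu} = \frac{-a_1}{\frac{1+\alpha}{\alpha} c + \frac{\alpha}{1+\alpha} a_2 + \frac{\alpha^2}{1+\alpha} a_2} $$

In light of results presented in the previous section, this can be simplified to:

$$ \bar{\mu} = -\frac{\alpha a_1}{\alpha^2 a_2 + (1+\alpha) c} $$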
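The MPE outcome is a fixed point of the time $t$ policymaker's best response to $\bar{\mu}$. The sketch below iterates on that best response and compares the limit with the closed form. It uses the parameter values from the lecture's later example; the function name `best_response` and the starting guess are assumptions made for illustration.

```python
α, a1, a2, c = 1.0, 0.5, 3.0, 2.0


def best_response(μ_bar):
    "Time-t policymaker's optimal μ_t given the expected future rate μ_bar."
    D = (1 + α) / α * c + α / (1 + α) * a2
    return -a1 / D - α**2 * a2 / (D * (1 + α)) * μ_bar


# Iterate on the best response starting from an arbitrary guess
μ_bar = 0.0
for _ in range(200):
    μ_bar = best_response(μ_bar)

# Closed-form MPE money growth rate
μ_closed = -α * a1 / (α**2 * a2 + (1 + α) * c)

print(μ_bar, μ_closed)
```

With these parameters the best response has slope of about -0.27 in $\bar{\mu}$, so the iteration converges quickly, to roughly -0.0714. This is smaller in absolute value than the constant rate -0.1 chosen by the constrained Ramsey planner.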

77.10 Equilibrium Outcomes for Three Models of Government Policy Making

Below we compute sequences $\{\theta_t, \mu_t\}$ under a Ramsey plan and compare these with the constant levels of $\theta$ and $\mu$ in a) a Markov Perfect Equilibrium, and b) a Ramsey plan in which the planner is restricted to choose $\mu_t = \check{\mu}$ for all $t \geq 0$
We denote the Ramsey sequence as $\theta^R, \mu^R$ and the MPE values as $\theta^{MPE}, \mu^{MPE}$
The bliss level of inflation is denoted by $\theta^*$
First, we will create a class ChangLQ that solves the models and stores their values

In [2]: import numpy as np
        from quantecon import LQ
        import matplotlib.pyplot as plt
        %matplotlib inline

        class ChangLQ:
            """
            Class to solve LQ Chang model
            """
            def __init__(self, α, α0, α1, α2, c, T=1000, θ_n=200):

                # Record parameters
                self.α, self.α0, self.α1 = α, α0, α1
                self.α2, self.c, self.T, self.θ_n = α2, c, T, θ_n

                # Create β using "Poor Man's Friedman Rule"
                self.β = np.exp(-α1 / (α * α2))

                # Solve the Ramsey Problem #

                # LQ Matrices
                R = -np.array([[α0, -α1 * α / 2],
                               [-α1 * α/2, -α2 * α**2 / 2]])
                Q = -np.array([[-c / 2]])
                A = np.array([[1, 0], [0, (1 + α) / α]])
                B = np.array([[0], [-1 / α]])

                # Solve LQ Problem (Subproblem 1)
                lq = LQ(Q, R, A, B, beta=self.β)
                self.P, self.F, self.d = lq.stationary_values()

                # Solve Subproblem 2
                self.θ_R = -self.P[0, 1] / self.P[1, 1]

                # Find bliss level of θ
                self.θ_B = np.log(self.β)

                # Solve the Markov Perfect Equilibrium
                self.μ_MPE = -α1 / ((1 + α) / α * c + α / (1 + α)
                              * α2 + α**2 / (1 + α) * α2)
                self.θ_MPE = self.μ_MPE
                self.μ_check = -α * α1 / (α2 * α**2 + c)

                # Calculate value under MPE and Check economy
                self.J_MPE = (α0 + α1 * (-α * self.μ_MPE) - α2 / 2 *
                              (-α * self.μ_MPE)**2 - c/2 * self.μ_MPE**2) / (1 - self.β)
                self.J_check = (α0 + α1 * (-α * self.μ_check) - α2/2 *
                                (-α * self.μ_check)**2 - c / 2 * self.μ_check**2) / (1 - self.β)

                # Simulate Ramsey plan for large number of periods
                θ_series = np.vstack((np.ones((1, T)), np.zeros((1, T))))
                μ_series = np.zeros(T)
                J_series = np.zeros(T)
                θ_series[1, 0] = self.θ_R
                μ_series[0] = -self.F.dot(θ_series[:, 0])
                J_series[0] = -θ_series[:, 0] @ self.P @ θ_series[:, 0].T
                for i in range(1, T):
                    θ_series[:, i] = (A - B @ self.F) @ θ_series[:, i-1]
                    μ_series[i] = -self.F @ θ_series[:, i]
                    J_series[i] = -θ_series[:, i] @ self.P @ θ_series[:, i].T

                self.J_series = J_series
                self.μ_series = μ_series
                self.θ_series = θ_series

                # Find the range of θ in Ramsey plan
                θ_LB = min(θ_series[1, :])
                θ_LB = min(θ_LB, self.θ_B)
                θ_UB = max(θ_series[1, :])
                θ_UB = max(θ_UB, self.θ_MPE)
                θ_range = θ_UB - θ_LB
                self.θ_LB = θ_LB - 0.05 * θ_range
                self.θ_UB = θ_UB + 0.05 * θ_range
                self.θ_range = θ_range

                # Find value function and policy functions over range of θ
                θ_space = np.linspace(self.θ_LB, self.θ_UB, 200)
                J_space = np.zeros(200)
                check_space = np.zeros(200)
                μ_space = np.zeros(200)
                θ_prime = np.zeros(200)
                for i in range(200):
                    J_space[i] = - np.array((1, θ_space[i])) @ self.P @ np.array((1, θ_space[i])).T
                    μ_space[i] = - self.F @ np.array((1, θ_space[i]))
                    x_prime = (A - B @ self.F) @ np.array((1, θ_space[i]))
                    θ_prime[i] = x_prime[1]
                    check_space[i] = (α0 + α1 * (-α * θ_space[i]) -
                        α2/2 * (-α * θ_space[i])**2 - c/2 * θ_space[i]**2) / (1 - self.β)

                J_LB = min(J_space)
                J_UB = max(J_space)
                J_range = J_UB - J_LB
                self.J_LB = J_LB - 0.05 * J_range
                self.J_UB = J_UB + 0.05 * J_range
                self.J_range = J_range
                self.J_space = J_space
                self.θ_space = θ_space
                self.μ_space = μ_space
                self.θ_prime = θ_prime
                self.check_space = check_space

We will create an instance of ChangLQ with the following parameters

In [3]: clq = ChangLQ(α=1, α0=1, α1=0.5, α2=3, c=2)


clq.β

Out[3]: 0.8464817248906141

The following code generates a figure that plots the value function from the Ramsey Planner’s problem, which is maximized at $\theta_0^R$
The figure also shows the limiting value $\theta_\infty^R$ to which the inflation rate $\theta_t$ converges under the Ramsey plan and compares it to the MPE value and the bliss value

In [4]: def plot_value_function(clq):
            """
            Method to plot the value function over the relevant range of θ

            Here clq is an instance of ChangLQ
            """
            fig, ax = plt.subplots()

            ax.set_xlim([clq.θ_LB, clq.θ_UB])
            ax.set_ylim([clq.J_LB, clq.J_UB])

            # Plot value function
            ax.plot(clq.θ_space, clq.J_space, lw=2)
            plt.xlabel(r"$\theta$", fontsize=18)
            plt.ylabel(r"$J(\theta)$", fontsize=18)

            t1 = clq.θ_space[np.argmax(clq.J_space)]
            tR = clq.θ_series[1, -1]
            θ_points = [t1, tR, clq.θ_B, clq.θ_MPE]
            labels = [r"$\theta_0^R$", r"$\theta_\infty^R$",
                      r"$\theta^*$", r"$\theta^{MPE}$"]

            # Add points for θs (marker size, color and shape passed by keyword)
            for θ, label in zip(θ_points, labels):
                ax.scatter(θ, clq.J_LB + 0.02 * clq.J_range,
                           s=60, c='black', marker='v')
                ax.annotate(label,
                            xy=(θ, clq.J_LB + 0.01 * clq.J_range),
                            xytext=(θ - 0.01 * clq.θ_range,
                                    clq.J_LB + 0.08 * clq.J_range),
                            fontsize=18)
            plt.tight_layout()
            plt.show()

        plot_value_function(clq)

The next code generates a figure that plots the value function from the Ramsey Planner’s
problem as well as that for a Ramsey planner that must choose a constant 𝜇 (that in turn
equals an implied constant 𝜃)

In [5]: def compare_ramsey_check(clq):
            """
            Method to compare values of Ramsey and Check

            Here clq is an instance of ChangLQ
            """
            fig, ax = plt.subplots()
            check_min = min(clq.check_space)
            check_max = max(clq.check_space)
            check_range = check_max - check_min
            check_LB = check_min - 0.05 * check_range
            check_UB = check_max + 0.05 * check_range
            ax.set_xlim([clq.θ_LB, clq.θ_UB])
            ax.set_ylim([check_LB, check_UB])
            ax.plot(clq.θ_space, clq.J_space, lw=2, label=r"$J(\theta)$")

            plt.xlabel(r"$\theta$", fontsize=18)
            ax.plot(clq.θ_space, clq.check_space,
                    lw=2, label=r"$V^\check(\theta)$")
            plt.legend(fontsize=14, loc='upper left')

            θ_points = [clq.θ_space[np.argmax(clq.J_space)],
                        clq.μ_check]
            labels = [r"$\theta_0^R$", r"$\theta^\check$"]

            # Mark the two θ values (marker size, color and shape passed by keyword)
            for θ, label in zip(θ_points, labels):
                ax.scatter(θ, check_LB + 0.02 * check_range,
                           s=60, c='k', marker='v')
                ax.annotate(label,
                            xy=(θ, check_LB + 0.01 * check_range),
                            xytext=(θ - 0.02 * check_range, check_LB + 0.08 * check_range),
                            fontsize=18)
            plt.tight_layout()
            plt.show()

        compare_ramsey_check(clq)

The next code generates figures that plot the policy functions for a continuation Ramsey planner
The left figure shows the choice of $\theta'$ chosen by a continuation Ramsey planner who inherits $\theta$
The right figure plots a continuation Ramsey planner’s choice of $\mu$ as a function of an inherited $\theta$

In [6]: def plot_policy_functions(clq):
            """
            Method to plot the policy functions over the relevant range of θ

            Here clq is an instance of ChangLQ
            """
            fig, axes = plt.subplots(1, 2, figsize=(12, 4))

            labels = [r"$\theta_0^R$", r"$\theta_\infty^R$"]

            ax = axes[0]
            ax.set_ylim([clq.θ_LB, clq.θ_UB])
            ax.plot(clq.θ_space, clq.θ_prime,
                    label=r"$\theta'(\theta)$", lw=2)
            x = np.linspace(clq.θ_LB, clq.θ_UB, 5)
            ax.plot(x, x, 'k--', lw=2, alpha=0.7)
            ax.set_ylabel(r"$\theta'$", fontsize=18)

            θ_points = [clq.θ_space[np.argmax(clq.J_space)],
                        clq.θ_series[1, -1]]

            for θ, label in zip(θ_points, labels):
                ax.scatter(θ, clq.θ_LB + 0.02 * clq.θ_range,
                           s=60, c='k', marker='v')
                ax.annotate(label,
                            xy=(θ, clq.θ_LB + 0.01 * clq.θ_range),
                            xytext=(θ - 0.02 * clq.θ_range,
                                    clq.θ_LB + 0.08 * clq.θ_range),
                            fontsize=18)

            ax = axes[1]
            μ_min = min(clq.μ_space)
            μ_max = max(clq.μ_space)
            μ_range = μ_max - μ_min
            ax.set_ylim([μ_min - 0.05 * μ_range, μ_max + 0.05 * μ_range])
            ax.plot(clq.θ_space, clq.μ_space, lw=2)
            ax.set_ylabel(r"$\mu(\theta)$", fontsize=18)

            for ax in axes:
                ax.set_xlabel(r"$\theta$", fontsize=18)
                ax.set_xlim([clq.θ_LB, clq.θ_UB])

            # After the loop above, ax refers to the right-hand panel
            for θ, label in zip(θ_points, labels):
                ax.scatter(θ, μ_min - 0.03 * μ_range,
                           s=60, c='black', marker='v')
                ax.annotate(label, xy=(θ, μ_min - 0.03 * μ_range),
                            xytext=(θ - 0.02 * clq.θ_range,
                                    μ_min + 0.03 * μ_range),
                            fontsize=18)
            plt.tight_layout()
            plt.show()

        plot_policy_functions(clq)

The following code generates a figure that plots sequences of $\mu$ and $\theta$ in the Ramsey plan and compares these to the constant levels in a MPE and in a Ramsey plan with a government restricted to set $\mu_t$ to a constant for all $t$

In [7]: def plot_ramsey_MPE(clq, T=15):
            """
            Method to plot Ramsey plan against Markov Perfect Equilibrium

            Here clq is an instance of ChangLQ
            """
            fig, axes = plt.subplots(1, 2, figsize=(12, 4))

            plots = [clq.θ_series[1, 0:T], clq.μ_series[0:T]]
            MPEs = [clq.θ_MPE, clq.μ_MPE]
            labels = [r"\theta", r"\mu"]

            axes[0].hlines(clq.θ_B, 0, T-1, 'r', label=r"$\theta^*$")

            for ax, plot, MPE, label in zip(axes, plots, MPEs, labels):
                ax.plot(plot, label=r"$" + label + "^R$")
                ax.hlines(MPE, 0, T-1, 'orange', label=r"$" + label + "^{MPE}$")
                # Raw string keeps the backslash in \check intact
                ax.hlines(clq.μ_check, 0, T, 'g', label=r"$" + label + r"^\check$")
                ax.set_xlabel(r"$t$", fontsize=16)
                ax.set_ylabel(r"$" + label + "_t$", fontsize=18)
                ax.legend(loc='upper right')

            plt.tight_layout()
            plt.show()

        plot_ramsey_MPE(clq)

77.10.1 Time Inconsistency of Ramsey Plan

The variation over time in $\vec{\mu}$ chosen by the Ramsey planner is a symptom of time inconsistency

• The Ramsey planner reaps immediate benefits from promising lower inflation later to be achieved by costly distorting taxes
• These benefits are intermediated by reductions in expected inflation that precede the reductions in money creation rates that rationalize them, as indicated by equation (3)
• A government authority offered the opportunity to ignore effects on past utilities and to reoptimize at date $t \geq 1$ would, if allowed, want to deviate from a Ramsey plan

Note: A modified Ramsey plan constructed under the restriction that $\mu_t$ must be constant over time is time consistent (see $\check{\mu}$ and $\check{\theta}$ in the above graphs)

77.10.2 Meaning of Time Inconsistency

In settings in which governments actually choose sequentially, many economists regard a time inconsistent plan as implausible because of the incentives to deviate that occur along the plan
A way to summarize this defect in a Ramsey plan is to say that it is not credible because policymakers retain incentives to deviate from it
For that reason, the Markov perfect equilibrium concept attracts many economists

• A Markov perfect equilibrium plan is constructed to ensure that government policymakers who choose sequentially do not want to deviate from it

The no incentive to deviate from the plan property is what makes the Markov perfect equilibrium concept attractive

77.10.3 Ramsey Plan Strikes Back

Research by Abreu [1], Chari and Kehoe [26], and Stokey [124], [125] discovered conditions under which a Ramsey plan can be rescued from the complaint that it is not credible
They accomplished this by expanding the description of a plan to include expectations about adverse consequences of deviating from it that can serve to deter deviations
We turn to such theories of sustainable plans next

77.11 A Fourth Model of Government Decision Making

This is a model in which

• The government chooses $\{\mu_t\}_{t=0}^\infty$ not once and for all at $t = 0$ but chooses to set $\mu_t$ at time $t$, not before
• private agents’ forecasts of $\{\mu_{t+j+1}, \theta_{t+j+1}\}_{j=0}^\infty$ respond to whether the government at $t$ confirms or disappoints their forecasts of $\mu_t$ brought into period $t$ from period $t - 1$
• the government at each time $t$ understands how private agents’ forecasts will respond to its choice of $\mu_t$
• at each $t$, the government chooses $\mu_t$ to maximize a continuation discounted utility of a representative household

77.11.1 A Theory of Government Decision Making

𝜇⃗ is chosen by a sequence of government decision makers, one for each 𝑡 ≥ 0


We assume the following within-period and between-period timing protocol for each $t \geq 0$:

• at time $t - 1$, private agents expect that the government will set $\mu_t = \tilde{\mu}_t$, and more generally that it will set $\mu_{t+j} = \tilde{\mu}_{t+j}$ for all $j \geq 0$
• Those forecasts determine a $\theta_t = \tilde{\theta}_t$ and an associated log of real balances $m_t - p_t = -\alpha \tilde{\theta}_t$ at $t$
• Given those expectations and the associated $\theta_t$, at $t$ a government is free to set $\mu_t \in \mathbb{R}$
• If the government at $t$ confirms private agents’ expectations by setting $\mu_t = \tilde{\mu}_t$ at time $t$, private agents expect the continuation government policy $\{\tilde{\mu}_{t+j+1}\}_{j=0}^\infty$ and therefore bring expectation $\tilde{\theta}_{t+1}$ into period $t + 1$
• If the government at $t$ disappoints private agents by setting $\mu_t \neq \tilde{\mu}_t$, private agents expect $\{\mu_j^A\}_{j=0}^\infty$ as the continuation policy for $t + 1$, i.e., $\{\mu_{t+j+1}\} = \{\mu_j^A\}_{j=0}^\infty$ and therefore expect $\theta_0^A$ for $t + 1$. Here $\vec{\mu}^A = \{\mu_j^A\}_{j=0}^\infty$ is an alternative government plan to be described below

77.11.2 Temptation to Deviate from Plan

The government’s one-period return function $s(\theta, \mu)$ described in equation (6) above has the property that for all $\theta$

$$ s(\theta, 0) \geq s(\theta, \mu) $$

This inequality implies that whenever the policy calls for the government to set $\mu \neq 0$, the government could raise its one-period return by setting $\mu = 0$
Disappointing private sector expectations in that way would increase the government’s current payoff but would have adverse consequences for subsequent government payoffs because the private sector would alter its expectations about future settings of $\mu$
The temporary gain constitutes the government’s temptation to deviate from a plan
If the government at $t$ is to resist the temptation to raise its current payoff, it is only because it forecasts adverse consequences that its setting of $\mu_t$ would bring for subsequent government payoffs via alterations in the private sector’s expectations
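The size of the temptation can be computed directly. The sketch below treats the one-period return as $s(\theta, \mu) = U(-\alpha\theta) - \frac{c}{2}\mu^2$, which is the reading consistent with the inequality $s(\theta, 0) \geq s(\theta, \mu)$, and it uses the parameter values from the lecture's example. The function names are hypothetical.

```python
α, a0, a1, a2, c = 1.0, 1.0, 0.5, 3.0, 2.0


def s(θ, μ):
    "One-period return: utility of real balances minus the cost of money growth."
    m = -α * θ                      # real balances implied by expected inflation θ
    return a0 + a1 * m - a2 / 2 * m**2 - c / 2 * μ**2


def temptation(θ, μ):
    "One-period gain from setting money growth to 0 instead of μ, holding θ fixed."
    return s(θ, 0.0) - s(θ, μ)


for θ, μ in [(-0.08, -0.05), (0.1, 0.1), (0.0, 0.0)]:
    print(θ, μ, temptation(θ, μ))
```

Because expected inflation $\theta$ is already determined when the government moves, the gain equals $\frac{c}{2}\mu^2$ whatever the value of $\theta$. It vanishes only when the plan calls for $\mu = 0$.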

77.12 Sustainable or Credible Plan

We call a plan $\vec{\mu}$ sustainable or credible if at each $t \geq 0$ the government chooses to confirm private agents’ prior expectation of its setting for $\mu_t$
The government will choose to confirm prior expectations if the long-term loss from disappointing private sector expectations (coming from the government’s understanding of the way the private sector adjusts its expectations in response to having its prior expectations at $t$ disappointed) outweighs the short-term gain from disappointing those expectations
The theory of sustainable or credible plans assumes throughout that private sector expecta-
tions about what future governments will do are based on the assumption that governments
at times 𝑡 ≥ 0 will act to maximize the continuation discounted utilities that describe those
governments’ purposes
This aspect of the theory means that credible plans come in pairs:

• a credible (continuation) plan to be followed if the government at $t$ confirms private sector expectations
• a credible plan to be followed if the government at $t$ disappoints private sector expectations

That credible plans come in pairs seems to bring an explosion of plans to keep track of

• each credible plan itself consists of two credible plans


• therefore, the number of plans underlying one plan is unbounded

But Dilip Abreu showed how to render manageable the number of plans that must be kept
track of
The key is an object called a self-enforcing plan

77.12.1 Abreu’s Self-Enforcing Plan

A plan 𝜇𝐴
⃗ is said to be self-enforcing if

• the consequence of disappointing private agents’ expectations at time 𝑗 is to restart the


plan at time 𝑗 + 1
• that consequence is sufficiently adverse that it deters all deviations from the plan

More precisely, a government plan 𝜇𝐴


⃗ is self-enforcing if

𝑣𝑗𝐴 = 𝑠(𝜃𝑗𝐴 , 𝜇𝐴 𝐴
𝑗 ) + 𝛽𝑣𝑗+1
(10)
≥ 𝑠(𝜃𝑗𝐴 , 0) + 𝛽𝑣0𝐴 ≡ 𝑣𝑗𝐴,𝐷 , 𝑗≥0
77.12. SUSTAINABLE OR CREDIBLE PLAN 1283

(Here it is useful to recall that setting 𝜇 = 0 is the maximizing choice for the government’s
one-period return function)
The first line tells the consequences of confirming private agents’ expectations, while the sec-
ond line tells the consequences of disappointing private agents’ expectations
A consequence of the definition is that a self-enforcing plan is credible
Self-enforcing plans can be used to construct other credible plans, including ones with better
values
A sufficient condition for a plan $\vec{\mu}$ to be credible or sustainable is that

$$
\begin{aligned}
\tilde{v}_j &= s(\tilde{\theta}_j, \mu_j) + \beta \tilde{v}_{j+1} \\
&\geq s(\tilde{\theta}_j, 0) + \beta v_0^A \quad \forall j \geq 0
\end{aligned}
$$

Abreu taught us that a key step in constructing a credible plan is first constructing a self-enforcing plan that has a low time 0 value
The idea is to use the self-enforcing plan as a continuation plan whenever the government’s choice at time 𝑡 fails to confirm private agents’ expectations
We shall use a construction featured in [1] to construct a self-enforcing plan with low time 0
value

77.12.2 Abreu Carrot-Stick Plan

[1] invented a way to create a self-enforcing plan with a low initial value
Imitating his idea, we can construct a self-enforcing plan $\vec{\mu}$ with a low time 0 value to the government by insisting that future government decision makers set $\mu_t$ to a value yielding low one-period utilities to the household for a long time, after which government decisions yield high one-period utilities

• low one-period utilities early are a stick


• high one-period utilities later are a carrot

Consider a plan $\vec{\mu}^A$ that sets $\mu_t^A = \bar{\mu}$ (a high positive number) for $T_A$ periods, and then reverts to the Ramsey plan
Denote this sequence by $\{\mu_t^A\}_{t=0}^\infty$

The sequence of inflation rates implied by this plan, $\{\theta_t^A\}_{t=0}^\infty$, can be calculated using:

$$
\theta_t^A = \frac{1}{1 + \alpha} \sum_{j=0}^{\infty} \left( \frac{\alpha}{1 + \alpha} \right)^j \mu_{t+j}^A
$$
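The discounted sum above can be evaluated backwards in time, since it implies $\theta_t = \frac{1}{1+\alpha}\mu_t + \frac{\alpha}{1+\alpha}\theta_{t+1}$. The sketch below is only an illustration under stated assumptions: the function name `implied_inflation` is hypothetical, and it assumes that money growth stays at its last listed value beyond the end of the array (so that 𝜃 equals that value from then on).

```python
import numpy as np

def implied_inflation(μ, α):
    """
    Approximate θ_t = 1/(1+α) * Σ_j (α/(1+α))**j * μ_{t+j} for a finite
    array μ, assuming μ stays at μ[-1] beyond the end of the array.
    """
    μ = np.asarray(μ, dtype=float)
    T = len(μ)
    ratio = α / (1 + α)          # geometric weight α/(1+α)
    θ = np.empty(T)
    # With constant μ the weights sum to one, so θ = μ[-1] past the horizon
    θ_next = μ[-1]
    # Work backwards using θ_t = (1 - ratio) * μ_t + ratio * θ_{t+1}
    for t in reversed(range(T)):
        θ[t] = (1 - ratio) * μ[t] + ratio * θ_next
        θ_next = θ[t]
    return θ

# A stick phase of 10 periods followed (purely for illustration) by μ = 0
μ_stick = np.concatenate([np.full(10, 0.1), np.zeros(5)])
θ_stick = implied_inflation(μ_stick, α=1.0)
```

In this toy sequence inflation starts below 0.1 and falls towards zero as the end of the stick phase approaches, because agents anticipate the lower money growth to come.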

The value of $\{\theta_t^A, \mu_t^A\}_{t=0}^\infty$ is

$$
v_0^A = \sum_{t=0}^{T_A - 1} \beta^t s(\theta_t^A, \mu_t^A) + \beta^{T_A} J(\theta_0^R)
$$

77.12.3 Example of Self-Enforcing Plan

The following example implements an Abreu stick-and-carrot plan


The government sets $\mu_t^A = 0.1$ for $t = 0, 1, \ldots, 9$ and then starts the Ramsey plan
We have computed outcomes for this plan
For this plan, we plot the $\theta^A$ and $\mu^A$ sequences as well as the implied $v^A$ sequence
Notice that because the government sets money supply growth high for 10 periods, inflation starts high
Inflation then declines gradually because people expect the government to lower the money growth rate after period 10
From the 10th period onwards, the inflation rate $\theta_t^A$ associated with this Abreu plan starts the Ramsey plan from its beginning, i.e., $\theta_{t+10}^A = \theta_t^R \ \forall t \geq 0$

In [8]: def abreu_plan(clq, T=1000, T_A=10, μ_bar=0.1, T_Plot=20):

            # Append Ramsey μ series to stick μ series
            clq.μ_A = np.append(np.ones(T_A) * μ_bar, clq.μ_series[:-T_A])

            # Calculate implied stick θ series
            clq.θ_A = np.zeros(T)
            discount = np.zeros(T)
            for t in range(T):
                discount[t] = (clq.α / (1 + clq.α))**t
            for t in range(T):
                length = clq.μ_A[t:].shape[0]
                clq.θ_A[t] = 1 / (clq.α + 1) * sum(clq.μ_A[t:] * discount[0:length])

            # Calculate utility of stick plan
            U_A = np.zeros(T)
            for t in range(T):
                U_A[t] = clq.β**t * (clq.α0 + clq.α1 * (-clq.θ_A[t]) -
                         clq.α2 / 2 * (-clq.θ_A[t])**2 - clq.c * clq.μ_A[t]**2)

            clq.V_A = np.zeros(T)
            for t in range(T):
                clq.V_A[t] = sum(U_A[t:] / clq.β**t)

            # Make sure Abreu plan is self-enforcing
            clq.V_dev = np.zeros(T_Plot)
            for t in range(T_Plot):
                clq.V_dev[t] = (clq.α0 + clq.α1 * (-clq.θ_A[t]) -
                                clq.α2 / 2 * (-clq.θ_A[t])**2) + \
                                clq.β * clq.V_A[0]

            fig, axes = plt.subplots(3, 1, figsize=(8, 12))

            axes[2].plot(clq.V_dev[0:T_Plot], label="$V^{A, D}_t$", c="orange")

            plots = [clq.θ_A, clq.μ_A, clq.V_A]
            labels = [r"$\theta_t^A$", r"$\mu_t^A$", r"$V^A_t$"]

            for plot, ax, label in zip(plots, axes, labels):
                ax.plot(plot[0:T_Plot], label=label)
                ax.set(xlabel="$t$", ylabel=label)
                ax.legend()

            plt.tight_layout()
            plt.show()

        abreu_plan(clq)

To confirm that the plan $\vec{\mu}^A$ is self-enforcing, we plot an object that we call $V_t^{A,D}$, defined in the second line of equation Eq. (10) above
$V_t^{A,D}$ is the value at 𝑡 of deviating from the self-enforcing plan $\vec{\mu}^A$ by setting $\mu_t = 0$ and then restarting the plan at $v_0^A$ at 𝑡 + 1
Notice that $v_t^A > v_t^{A,D}$
This confirms that $\vec{\mu}^A$ is a self-enforcing plan
We can also verify numerically the inequalities required for $\vec{\mu}^A$ to be self-enforcing, as follows

In [9]: np.all(clq.V_A[0:20] > clq.V_dev[0:20])

Out[9]: True

Given that plan $\vec{\mu}^A$ is self-enforcing, we can check that the Ramsey plan $\vec{\mu}^R$ is sustainable by verifying that:

$$
v_t^R \geq s(\theta_t^R, 0) + \beta v_0^A, \quad \forall t \geq 0
$$

In [10]: def check_ramsey(clq, T=1000):

             # Make sure Ramsey plan is sustainable
             R_dev = np.zeros(T)
             for t in range(T):
                 R_dev[t] = (clq.α0 + clq.α1 * (-clq.θ_series[1, t]) -
                             clq.α2 / 2 * (-clq.θ_series[1, t])**2) + clq.β * clq.V_A[0]

             return np.all(clq.J_series > R_dev)

         check_ramsey(clq)

Out[10]: True

77.12.4 Recursive Representation of a Sustainable Plan

We can represent a sustainable plan recursively by taking the continuation value 𝑣𝑡 as a state
variable
We form the following 3-tuple of functions:

$$
\begin{aligned}
\hat{\mu}_t &= \nu_\mu(v_t) \\
\theta_t &= \nu_\theta(v_t) \\
v_{t+1} &= \nu_v(v_t, \mu_t)
\end{aligned} \tag{11}
$$

In addition to these equations, we need an initial value 𝑣0 to characterize a sustainable plan


The first equation of Eq. (11) tells the recommended value of 𝜇𝑡̂ as a function of the promised
value 𝑣𝑡
The second equation of Eq. (11) tells the inflation rate as a function of 𝑣𝑡
The third equation of Eq. (11) updates the continuation value in a way that depends on
whether the government at 𝑡 confirms private agents’ expectations by setting 𝜇𝑡 equal to the
recommended value 𝜇𝑡̂ , or whether it disappoints those expectations

77.13 Comparison of Equilibrium Values

We have computed plans for

• an ordinary (unrestricted) Ramsey planner who chooses a sequence $\{\mu_t\}_{t=0}^\infty$ at time 0
• a Ramsey planner restricted to choose a constant 𝜇 for all 𝑡 ≥ 0
• a Markov perfect sequence of governments

Below we compare equilibrium time zero values for these three


We confirm that the value delivered by the unrestricted Ramsey planner exceeds the value
delivered by the restricted Ramsey planner which in turn exceeds the value delivered by the
Markov perfect sequence of governments

In [11]: clq.J_series[0]

Out[11]: 6.67918822960449

In [12]: clq.J_check

Out[12]: 6.676729524674898

In [13]: clq.J_MPE

Out[13]: 6.663435886995107

We have also computed sustainable plans for a government or sequence of governments that choose sequentially
These include

• a self-enforcing plan that gives a low initial value $v_0$
• a better plan, possibly one that attains values associated with the Ramsey plan, that is not self-enforcing

77.14 Note on Dynamic Programming Squared

The theory deployed in this lecture is an application of what we nickname dynamic pro-
gramming squared
The nickname refers to the fact that a value satisfying one Bellman equation is itself an argu-
ment in a second Bellman equation
Thus, our models have involved two Bellman equations:

• equation Eq. (1) expresses how 𝜃𝑡 depends on 𝜇𝑡 and 𝜃𝑡+1


• equation Eq. (4) expresses how value 𝑣𝑡 depends on (𝜇𝑡 , 𝜃𝑡 ) and 𝑣𝑡+1

A value 𝜃 from one Bellman equation appears as an argument of a second Bellman equation
for another value 𝑣
78 Optimal Taxation in an LQ Economy

78.1 Contents

• Overview 78.2

• The Ramsey Problem 78.3

• Implementation 78.4

• Examples 78.5

• Exercises 78.6

• Solutions 78.7

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

78.2 Overview

In this lecture, we study optimal fiscal policy in a linear quadratic setting


We slightly modify a well-known model of Robert Lucas and Nancy Stokey [90] so that conve-
nient formulas for solving linear-quadratic models can be applied to simplify the calculations
The economy consists of a representative household and a benevolent government
The government finances an exogenous stream of government purchases with state-contingent
loans and a linear tax on labor income
A linear tax is sometimes called a flat-rate tax
The household maximizes utility by choosing paths for consumption and labor, taking prices
and the government’s tax rate and borrowing plans as given
Maximum attainable utility for the household depends on the government’s tax and borrow-
ing plans


The Ramsey problem [106] is to choose tax and borrowing plans that maximize the house-
hold’s welfare, taking the household’s optimizing behavior as given
There is a large number of competitive equilibria indexed by different government fiscal poli-
cies
The Ramsey planner chooses the best competitive equilibrium
We want to study the dynamics of tax rates, tax revenues, and government debt under a Ramsey plan
Because the Lucas and Stokey model features state-contingent government debt, the govern-
ment debt dynamics differ substantially from those in a model of Robert Barro [11]
The treatment given here closely follows this manuscript, prepared by Thomas J. Sargent and
Francois R. Velde
We cover only the key features of the problem in this lecture, leaving you to refer to that
source for additional results and intuition

78.2.1 Model Features

• Linear quadratic (LQ) model


• Representative household
• Stochastic dynamic programming over an infinite horizon
• Distortionary taxation

78.3 The Ramsey Problem

We begin by outlining the key assumptions regarding technology, households and the govern-
ment sector

78.3.1 Technology

Labor can be converted one-for-one into a single, non-storable consumption good


In the usual spirit of the LQ model, the amount of labor supplied in each period is unre-
stricted
This is unrealistic, but helpful when it comes to solving the model
Realistic labor supply can be induced by suitable parameter values

78.3.2 Households

Consider a representative household who chooses a path $\{\ell_t, c_t\}$ for labor and consumption to maximize

$$
-\mathbb{E} \frac{1}{2} \sum_{t=0}^{\infty} \beta^t \left[ (c_t - b_t)^2 + \ell_t^2 \right] \tag{1}
$$

subject to the budget constraint

$$
\mathbb{E} \sum_{t=0}^{\infty} \beta^t p_t^0 \left[ d_t + (1 - \tau_t) \ell_t + s_t - c_t \right] = 0 \tag{2}
$$

Here

• 𝛽 is a discount factor in (0, 1)


• $p_t^0$ is a scaled Arrow-Debreu price at time 0 of history contingent goods at time 𝑡
• 𝑏𝑡 is a stochastic preference parameter
• 𝑑𝑡 is an endowment process
• 𝜏𝑡 is a flat tax rate on labor income
• 𝑠𝑡 is a promised time-𝑡 coupon payment on debt issued by the government

The scaled Arrow-Debreu price $p_t^0$ is related to the unscaled Arrow-Debreu price as follows.
If we let $\pi_t^0(x^t)$ denote the probability (density) of a history $x^t = [x_t, x_{t-1}, \ldots, x_0]$ of the state $x_t$, then the Arrow-Debreu time 0 price of a claim on one unit of consumption at date 𝑡, history $x^t$ would be

$$
\frac{\beta^t p_t^0}{\pi_t^0(x^t)}
$$

Thus, our scaled Arrow-Debreu price is the ordinary Arrow-Debreu price multiplied by the
discount factor 𝛽 𝑡 and divided by an appropriate probability
The budget constraint Eq. (2) requires that the present value of consumption be restricted to
equal the present value of endowments, labor income and coupon payments on bond holdings

78.3.3 Government

The government imposes a linear tax on labor income, fully committing to a stochastic path
of tax rates at time zero
The government also issues state-contingent debt
Given government tax and borrowing plans, we can construct a competitive equilibrium with
distorting government taxes
Among all such competitive equilibria, the Ramsey plan is the one that maximizes the welfare
of the representative consumer

78.3.4 Exogenous Variables

Endowments, government expenditure, the preference shock process 𝑏𝑡 , and promised coupon
payments on initial government debt 𝑠𝑡 are all exogenous, and given by

• 𝑑𝑡 = 𝑆𝑑 𝑥𝑡
• 𝑔𝑡 = 𝑆𝑔 𝑥𝑡
• 𝑏𝑡 = 𝑆𝑏 𝑥𝑡
• 𝑠𝑡 = 𝑆𝑠 𝑥𝑡

The matrices $S_d, S_g, S_b, S_s$ are primitives and $\{x_t\}$ is an exogenous stochastic process taking values in $\mathbb{R}^k$
We consider two specifications for $\{x_t\}$

1. Discrete case: $\{x_t\}$ is a discrete state Markov chain with transition matrix 𝑃
2. VAR case: $\{x_t\}$ obeys $x_{t+1} = A x_t + C w_{t+1}$ where $\{w_t\}$ is independent zero-mean Gaussian with identity covariance matrix
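To make the VAR specification concrete, here is a minimal sketch of simulating $x_{t+1} = A x_t + C w_{t+1}$ with standard normal shocks. The function name `simulate_var`, its `seed` argument, and the choice of starting point are assumptions made for illustration; the parameter values mirror the AR(1) government spending process used in the continuous example of this lecture, with the second state component held constant at one.

```python
import numpy as np

def simulate_var(A, C, x0, T, seed=0):
    """
    Simulate x_{t+1} = A x_t + C w_{t+1} for T periods, where w_t is
    standard normal. Returns an array whose column t holds x_t.
    """
    rng = np.random.default_rng(seed)
    A, C = np.atleast_2d(A), np.atleast_2d(C)
    n, k = C.shape
    x = np.empty((n, T))
    x[:, 0] = x0
    for t in range(1, T):
        w = rng.standard_normal(k)        # fresh shock each period
        x[:, t] = A @ x[:, t-1] + C @ w
    return x

# State is (g_t, 1): first row is the AR(1) for g, second row keeps the constant
ρ, mg = 0.7, 0.35
A = np.array([[ρ, mg * (1 - ρ)],
              [0.0, 1.0]])
C = np.array([[np.sqrt(1 - ρ**2) * mg / 10],
              [0.0]])
x = simulate_var(A, C, x0=[mg, 1.0], T=50)
```

Stacking a constant into the state vector is what lets a process with nonzero mean fit the form $x_{t+1} = A x_t + C w_{t+1}$.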

78.3.5 Feasibility

The period-by-period feasibility restriction for this economy is

$$
c_t + g_t = d_t + \ell_t \tag{3}
$$

A labor-consumption process {ℓ𝑡 , 𝑐𝑡 } is called feasible if Eq. (3) holds for all 𝑡

78.3.6 Government Budget Constraint

Where $p_t^0$ is again a scaled Arrow-Debreu price, the time zero government budget constraint is

$$
\mathbb{E} \sum_{t=0}^{\infty} \beta^t p_t^0 (s_t + g_t - \tau_t \ell_t) = 0 \tag{4}
$$

78.3.7 Equilibrium

An equilibrium is a feasible allocation {ℓ𝑡 , 𝑐𝑡 }, a sequence of prices {𝑝𝑡0 }, and a tax system
{𝜏𝑡 } such that

1. The allocation {ℓ𝑡 , 𝑐𝑡 } is optimal for the household given {𝑝𝑡0 } and {𝜏𝑡 }
2. The government’s budget constraint Eq. (4) is satisfied

The Ramsey problem is to choose the equilibrium {ℓ𝑡 , 𝑐𝑡 , 𝜏𝑡 , 𝑝𝑡0 } that maximizes the house-
hold’s welfare
If {ℓ𝑡 , 𝑐𝑡 , 𝜏𝑡 , 𝑝𝑡0 } solves the Ramsey problem, then {𝜏𝑡 } is called the Ramsey plan
The solution procedure we adopt is

1. Use the first-order conditions from the household problem to pin down prices and allo-
cations given {𝜏𝑡 }
2. Use these expressions to rewrite the government budget constraint Eq. (4) in terms of
exogenous variables and allocations
3. Maximize the household’s objective function Eq. (1) subject to the constraint con-
structed in step 2 and the feasibility constraint Eq. (3)

The solution to this maximization problem pins down all quantities of interest

78.3.8 Solution

Step one is to obtain the first-order conditions for the household’s problem, taking taxes and prices as given
Letting 𝜇 be the Lagrange multiplier on Eq. (2), the first-order conditions are $p_t^0 = (b_t - c_t)/\mu$ and $\ell_t = (b_t - c_t)(1 - \tau_t)$
Rearranging and normalizing at $\mu = b_0 - c_0$, we can write these conditions as

$$
p_t^0 = \frac{b_t - c_t}{b_0 - c_0} \quad \text{and} \quad \tau_t = 1 - \frac{\ell_t}{b_t - c_t} \tag{5}
$$

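Substituting Eq. (5) into the government’s budget constraint Eq. (4) yields

$$
\mathbb{E} \sum_{t=0}^{\infty} \beta^t \left[ (b_t - c_t)(s_t + g_t - \ell_t) + \ell_t^2 \right] = 0 \tag{6}
$$

The Ramsey problem now amounts to maximizing Eq. (1) subject to Eq. (6) and Eq. (3)
The associated Lagrangian is

$$
\mathcal{L} = \mathbb{E} \sum_{t=0}^{\infty} \beta^t \left\{ -\frac{1}{2} \left[ (c_t - b_t)^2 + \ell_t^2 \right] + \lambda \left[ (b_t - c_t)(\ell_t - s_t - g_t) - \ell_t^2 \right] + \mu_t \left[ d_t + \ell_t - c_t - g_t \right] \right\} \tag{7}
$$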

The first-order conditions associated with 𝑐𝑡 and ℓ𝑡 are

−(𝑐𝑡 − 𝑏𝑡 ) + 𝜆[−ℓ𝑡 + (𝑔𝑡 + 𝑠𝑡 )] = 𝜇𝑡

and

ℓ𝑡 − 𝜆[(𝑏𝑡 − 𝑐𝑡 ) − 2ℓ𝑡 ] = 𝜇𝑡

Combining these last two equalities with Eq. (3) and working through the algebra, one can
show that

ℓ𝑡 = ℓ𝑡̄ − 𝜈𝑚𝑡 and 𝑐𝑡 = 𝑐𝑡̄ − 𝜈𝑚𝑡 (8)

where

• 𝜈 ∶= 𝜆/(1 + 2𝜆)
• ℓ𝑡̄ ∶= (𝑏𝑡 − 𝑑𝑡 + 𝑔𝑡 )/2
• 𝑐𝑡̄ ∶= (𝑏𝑡 + 𝑑𝑡 − 𝑔𝑡 )/2
• 𝑚𝑡 ∶= (𝑏𝑡 − 𝑑𝑡 − 𝑠𝑡 )/2

Apart from 𝜈, all of these quantities are expressed in terms of exogenous variables
To solve for 𝜈, we can use the government’s budget constraint again
The term inside the brackets in Eq. (6) is $(b_t - c_t)(s_t + g_t) - (b_t - c_t)\ell_t + \ell_t^2$
Using Eq. (8), the definitions above and the fact that $\bar{\ell} = b - \bar{c}$, this term can be rewritten as

$$
(b_t - \bar{c}_t)(g_t + s_t) + 2 m_t^2 (\nu^2 - \nu)
$$

Reinserting into Eq. (6), we get

$$
\mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t (b_t - \bar{c}_t)(g_t + s_t) \right\} + (\nu^2 - \nu) \, \mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t 2 m_t^2 \right\} = 0 \tag{9}
$$

Although it might not be clear yet, we are nearly there because:

• The two expectations terms in Eq. (9) can be solved for in terms of model primitives
• This in turn allows us to solve for 𝜈, and hence for the Lagrange multiplier 𝜆
• With 𝜈 in hand, we can go back and solve for the allocations via Eq. (8)
• Once we have the allocations, prices and the tax system can be derived from Eq. (5)
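The last two steps in this list can be sketched directly from Eq. (8) and Eq. (5). The helper below is illustrative only: the name `ramsey_allocation` is hypothetical, 𝜈 is taken as given rather than solved for, and inputs are scalars or arrays of realized values of the exogenous variables.

```python
import numpy as np

def ramsey_allocation(b, d, g, s, ν):
    """
    Given exogenous b, d, g, s and a value of ν, return consumption,
    labor and the tax rate implied by Eq. (8) and Eq. (5).
    """
    b, d, g, s = (np.asarray(z, dtype=float) for z in (b, d, g, s))
    l_bar = (b - d + g) / 2
    c_bar = (b + d - g) / 2
    m = (b - d - s) / 2
    l = l_bar - ν * m              # Eq. (8)
    c = c_bar - ν * m              # Eq. (8)
    τ = 1 - l / (b - c)            # Eq. (5)
    return c, l, τ

# Illustrative values: constant b, no endowment or coupons, two levels of g
g_vals = np.array([0.3, 0.4])
c, l, τ = ramsey_allocation(b=2.135, d=0.0, g=g_vals, s=0.0, ν=0.1)
```

Because $\bar{\ell}_t - \bar{c}_t = g_t - d_t$, the allocation satisfies the feasibility condition Eq. (3) for any 𝜈, and setting 𝜈 = 0 gives a zero tax rate since $\bar{\ell}_t = b_t - \bar{c}_t$.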

78.3.9 Computing the Quadratic Term

Let’s consider how to obtain the term 𝜈 in Eq. (9)


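If we can compute the two expected geometric sums

$$
b_0 := \mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t (b_t - \bar{c}_t)(g_t + s_t) \right\} \quad \text{and} \quad a_0 := \mathbb{E} \left\{ \sum_{t=0}^{\infty} \beta^t 2 m_t^2 \right\} \tag{10}
$$

then the problem reduces to solving

$$
b_0 + a_0 (\nu^2 - \nu) = 0
$$

for 𝜈
Provided that $0 < 4 b_0 < a_0$, there is a unique solution $\nu \in (0, 1/2)$, and a unique corresponding $\lambda > 0$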
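As a small worked sketch of this step, the quadratic can be solved in closed form by taking the smaller root, and 𝜆 then follows from inverting $\nu = \lambda / (1 + 2\lambda)$. The function name `solve_nu` and the error handling are assumptions for illustration, and $a_0 > 0$ is assumed.

```python
import math

def solve_nu(a0, b0):
    """
    Solve b0 + a0 * (ν**2 - ν) = 0 for the root in [0, 1/2) and
    recover λ from ν = λ / (1 + 2λ). Assumes a0 > 0.
    """
    disc = 1 - 4 * b0 / a0
    if disc <= 0:
        raise ValueError("No admissible root: need 4 * b0 < a0")
    ν = 0.5 * (1 - math.sqrt(disc))   # smaller of the two roots
    λ = ν / (1 - 2 * ν)               # inverse of ν = λ / (1 + 2λ)
    return ν, λ

ν, λ = solve_nu(a0=1.0, b0=0.1875)
```

With these illustrative numbers the discriminant is 0.25, so 𝜈 = 0.25 and 𝜆 = 0.5.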
Let’s work out how to compute mathematical expectations in Eq. (10)
For the first one, the random variable $(b_t - \bar{c}_t)(g_t + s_t)$ inside the summation can be expressed as

$$
\frac{1}{2} x_t' (S_b - S_d + S_g)' (S_g + S_s) x_t
$$

For the second expectation in Eq. (10), the random variable $2 m_t^2$ can be written as

$$
\frac{1}{2} x_t' (S_b - S_d - S_s)' (S_b - S_d - S_s) x_t
$$

It follows that both objects of interest are special cases of the expression

$$
q(x_0) = \mathbb{E} \sum_{t=0}^{\infty} \beta^t x_t' H x_t \tag{11}
$$

where 𝐻 is a matrix conformable to 𝑥𝑡 and 𝑥′𝑡 is the transpose of column vector 𝑥𝑡


Suppose first that {𝑥𝑡 } is the Gaussian VAR described above
In this case, the formula for computing 𝑞(𝑥0 ) is known to be 𝑞(𝑥0 ) = 𝑥′0 𝑄𝑥0 + 𝑣, where

• 𝑄 is the solution to 𝑄 = 𝐻 + 𝛽𝐴′ 𝑄𝐴, and


• 𝑣 = trace (𝐶 ′ 𝑄𝐶)𝛽/(1 − 𝛽)

The first equation is known as a discrete Lyapunov equation and can be solved using this
function
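The formula $q(x_0) = x_0' Q x_0 + v$ can be sketched with SciPy's discrete Lyapunov solver, which solves $X = a X a' + q$; setting $a = \sqrt{\beta} A'$ and $q = H$ gives $Q = H + \beta A' Q A$. This is a stand-in for illustration, not the routine the lecture's own code uses (that code calls `var_quadratic_sum` from `quantecon`), and the function name `var_discounted_quadratic` and the test matrices are assumptions.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def var_discounted_quadratic(A, C, H, β, x0):
    """
    Evaluate q(x0) = E Σ_t β**t x_t' H x_t when x_{t+1} = A x_t + C w_{t+1}
    with standard normal w, using q(x0) = x0' Q x0 + v.
    """
    A, C, H = (np.atleast_2d(M) for M in (A, C, H))
    x0 = np.asarray(x0, dtype=float).ravel()
    # Q = H + β A' Q A is the Lyapunov equation X = a X a' + q with a = sqrt(β) A'
    Q = solve_discrete_lyapunov(np.sqrt(β) * A.T, H)
    v = np.trace(C.T @ Q @ C) * β / (1 - β)
    return x0 @ Q @ x0 + v

A = np.array([[0.7, 0.105], [0.0, 1.0]])
C = np.array([[0.025], [0.0]])
H = np.array([[1.0, 0.5], [0.5, 2.0]])
q0 = var_discounted_quadratic(A, C, H, β=0.95, x0=[0.35, 1.0])
```

Note that 𝐴 here has a unit eigenvalue (the constant term), which is fine because only $\sqrt{\beta} A$ needs to be stable.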

78.3.10 Finite State Markov Case

Next, suppose that $\{x_t\}$ is the discrete Markov process described above
Suppose further that each $x_t$ takes values in the state space $\{x^1, \ldots, x^N\} \subset \mathbb{R}^k$
Let $h : \mathbb{R}^k \to \mathbb{R}$ be a given function, and suppose that we wish to evaluate

$$
q(x_0) = \mathbb{E} \sum_{t=0}^{\infty} \beta^t h(x_t) \quad \text{given} \quad x_0 = x^j
$$

For example, in the discussion above, $h(x_t) = x_t' H x_t$
It is legitimate to pass the expectation through the sum, leading to

$$
q(x_0) = \sum_{t=0}^{\infty} \beta^t (P^t h)[j] \tag{12}
$$

Here

• 𝑃 𝑡 is the 𝑡-th power of the transition matrix 𝑃


• ℎ is, with some abuse of notation, the vector (ℎ(𝑥1 ), … , ℎ(𝑥𝑁 ))
• (𝑃 𝑡 ℎ)[𝑗] indicates the 𝑗-th element of 𝑃 𝑡 ℎ

It can be shown that Eq. (12) is in fact equal to the 𝑗-th element of the vector $(I - \beta P)^{-1} h$
This last fact is applied in the calculations below
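A minimal sketch of this calculation follows. The function name `markov_discounted_sum` and the vector `h` are hypothetical; the transition matrix is the one used in the discrete example later in this lecture. Solving the linear system $(I - \beta P) q = h$ avoids forming the inverse explicitly.

```python
import numpy as np

def markov_discounted_sum(P, h, β):
    """
    Return the vector q with q[j] = E[Σ_t β**t h(x_t) | x_0 = x^j],
    computed as the solution of (I - β P) q = h.
    """
    P = np.asarray(P, dtype=float)
    h = np.asarray(h, dtype=float)
    return np.linalg.solve(np.eye(len(h)) - β * P, h)

β = 1 / 1.05
P = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
h = np.array([1.0, 2.0, 3.0])
q = markov_discounted_sum(P, h, β)
```

The third state is absorbing, so its entry is just the geometric sum $h_3 / (1 - \beta) = 3 \times 21 = 63$.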

78.3.11 Other Variables

We are interested in tracking several other variables besides the ones described above.
To prepare the way for this, we define

$$
p^t_{t+j} = \frac{b_{t+j} - c_{t+j}}{b_t - c_t}
$$

as the scaled Arrow-Debreu time 𝑡 price of a history contingent claim on one unit of consumption at time 𝑡 + 𝑗

These are prices that would prevail at time 𝑡 if markets were reopened at time 𝑡
These prices are constituents of the present value of government obligations outstanding at time 𝑡, which can be expressed as

$$
B_t := \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j p^t_{t+j} (\tau_{t+j} \ell_{t+j} - g_{t+j}) \tag{13}
$$

Using our expression for prices and the Ramsey plan, we can also write $B_t$ as

$$
B_t = \mathbb{E}_t \sum_{j=0}^{\infty} \beta^j \frac{(b_{t+j} - c_{t+j})(\ell_{t+j} - g_{t+j}) - \ell_{t+j}^2}{b_t - c_t}
$$

This version is more convenient for computation
Using the equation

$$
p^t_{t+j} = p^t_{t+1} p^{t+1}_{t+j}
$$

it is possible to verify that Eq. (13) implies that

$$
B_t = (\tau_t \ell_t - g_t) + E_t \sum_{j=1}^{\infty} \beta^j p^t_{t+j} (\tau_{t+j} \ell_{t+j} - g_{t+j})
$$

and

$$
B_t = (\tau_t \ell_t - g_t) + \beta E_t p^t_{t+1} B_{t+1} \tag{14}
$$

Define

$$
R_t^{-1} := \mathbb{E}_t \beta p^t_{t+1} \tag{15}
$$

$R_t$ is the gross 1-period risk-free rate for loans between 𝑡 and 𝑡 + 1

78.3.12 A Martingale

We now want to study the following two objects, namely,

𝜋𝑡+1 ∶= 𝐵𝑡+1 − 𝑅𝑡 [𝐵𝑡 − (𝜏𝑡 ℓ𝑡 − 𝑔𝑡 )]

and the cumulation of $\pi_t$

$$
\Pi_t := \sum_{s=0}^{t} \pi_s
$$

The term $\pi_{t+1}$ is the difference between two quantities:

• $B_{t+1}$, the value of government debt at the start of period 𝑡 + 1
• $R_t [B_t + g_t - \tau_t \ell_t]$, which is what the government would have owed at the beginning of period 𝑡 + 1 if it had simply borrowed at the one-period risk-free rate rather than selling state-contingent securities

Thus, 𝜋𝑡+1 is the excess payout on the actual portfolio of state-contingent government debt
relative to an alternative portfolio sufficient to finance 𝐵𝑡 + 𝑔𝑡 − 𝜏𝑡 ℓ𝑡 and consisting entirely of
risk-free one-period bonds
Use expressions Eq. (14) and Eq. (15) to obtain

$$
\pi_{t+1} = B_{t+1} - \frac{1}{\beta E_t p^t_{t+1}} \left[ \beta E_t p^t_{t+1} B_{t+1} \right]
$$

or

$$
\pi_{t+1} = B_{t+1} - \tilde{E}_t B_{t+1} \tag{16}
$$

where $\tilde{E}_t$ is the conditional mathematical expectation taken with respect to a one-step transition density that has been formed by multiplying the original transition density with the likelihood ratio

$$
m^t_{t+1} = \frac{p^t_{t+1}}{E_t p^t_{t+1}}
$$

It follows from equation Eq. (16) that

𝐸𝑡̃ 𝜋𝑡+1 = 𝐸𝑡̃ 𝐵𝑡+1 − 𝐸𝑡̃ 𝐵𝑡+1 = 0

which asserts that {𝜋𝑡+1 } is a martingale difference sequence under the distorted probability
measure, and that {Π𝑡 } is a martingale under the distorted probability measure
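The martingale difference property can be checked numerically in a finite-state setting. The sketch below is an illustration under assumptions: the function names are hypothetical, `p` holds a positive price-like quantity for each next-period state (the current-period denominator $b_t - c_t$ cancels in the likelihood ratio), and `B` is an arbitrary vector of state-contingent debt values rather than output from the model.

```python
import numpy as np

def distorted_transition(P, p):
    """
    Twist the transition matrix P by the likelihood ratio p[j] / E_i[p],
    giving P_tilde[i, j] = P[i, j] * p[j] / Σ_k P[i, k] * p[k].
    """
    P, p = np.asarray(P, dtype=float), np.asarray(p, dtype=float)
    weighted = P * p                      # entry (i, j) is P[i, j] * p[j]
    return weighted / weighted.sum(axis=1, keepdims=True)

def excess_payout(P, p, B):
    """
    Return π[i, j] = B[j] - E_tilde[B | state i], the analogue of Eq. (16).
    """
    B = np.asarray(B, dtype=float)
    expected_B = distorted_transition(P, p) @ B
    return B[np.newaxis, :] - expected_B[:, np.newaxis]

P = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]])
p = np.array([1.5, 1.2, 1.0])     # illustrative positive prices by state
B = np.array([0.3, -0.1, 0.8])    # illustrative debt values by state
P_tilde = distorted_transition(P, p)
π = excess_payout(P, p, B)
```

Averaging each row of `π` with the distorted probabilities in `P_tilde` gives zero, while averaging with the original probabilities in `P` generally does not.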
In the tax-smoothing model of Robert Barro [11], government debt is a random walk
In the current model, government debt {𝐵𝑡 } is not a random walk, but the excess payoff
{Π𝑡 } on it is

78.4 Implementation

The following code provides functions for

1. Solving for the Ramsey plan given a specification of the economy


2. Simulating the dynamics of the major variables

Description and clarifications are given below

In [2]: import sys
        import numpy as np
        from numpy import sqrt, eye, zeros, cumsum
        from numpy.random import randn
        import scipy.linalg
        import matplotlib.pyplot as plt
        from collections import namedtuple
        from quantecon import nullspace, mc_sample_path, var_quadratic_sum

        # == Set up a namedtuple to store data on the model economy == #
        Economy = namedtuple('economy',
                             ('β',          # Discount factor
                              'Sg',         # Govt spending selector matrix
                              'Sd',         # Exogenous endowment selector matrix
                              'Sb',         # Utility parameter selector matrix
                              'Ss',         # Coupon payments selector matrix
                              'discrete',   # Discrete or continuous -- boolean
                              'proc'))      # Stochastic process parameters

        # == Set up a namedtuple to store return values for compute_paths() == #
        Path = namedtuple('path',
                          ('g',     # Govt spending
                           'd',     # Endowment
                           'b',     # Utility shift parameter
                           's',     # Coupon payment on existing debt
                           'c',     # Consumption
                           'l',     # Labor
                           'p',     # Price
                           'τ',     # Tax rate
                           'rvn',   # Revenue
                           'B',     # Govt debt
                           'R',     # Risk-free gross return
                           'π',     # One-period risk-free interest rate
                           'Π',     # Cumulative rate of return, adjusted
                           'ξ'))    # Adjustment factor for Π

        def compute_paths(T, econ):
            """
            Compute simulated time paths for exogenous and endogenous variables.

            Parameters
            ===========
            T: int
                Length of the simulation

            econ: a namedtuple of type 'Economy', containing
                β        - Discount factor
                Sg       - Govt spending selector matrix
                Sd       - Exogenous endowment selector matrix
                Sb       - Utility parameter selector matrix
                Ss       - Coupon payments selector matrix
                discrete - Discrete exogenous process (True or False)
                proc     - Stochastic process parameters

            Returns
            ========
            path: a namedtuple of type 'Path', containing
                g   - Govt spending
                d   - Endowment
                b   - Utility shift parameter
                s   - Coupon payment on existing debt
                c   - Consumption
                l   - Labor
                p   - Price
                τ   - Tax rate
                rvn - Revenue
                B   - Govt debt
                R   - Risk-free gross return
                π   - One-period risk-free interest rate
                Π   - Cumulative rate of return, adjusted
                ξ   - Adjustment factor for Π

            The corresponding values are flat numpy ndarrays.

            """

            # == Simplify names == #
            β, Sg, Sd, Sb, Ss = econ.β, econ.Sg, econ.Sd, econ.Sb, econ.Ss

            if econ.discrete:
                P, x_vals = econ.proc
            else:
                A, C = econ.proc

            # == Simulate the exogenous process x == #
            if econ.discrete:
                state = mc_sample_path(P, init=0, sample_size=T)
                x = x_vals[:, state]
            else:
                # == Generate an initial condition x0 satisfying x0 = A x0 == #
                nx, nx = A.shape
                x0 = nullspace((eye(nx) - A))
                x0 = -x0 if (x0[nx-1] < 0) else x0
                x0 = x0 / x0[nx-1]

                # == Generate a time series x of length T starting from x0 == #
                nx, nw = C.shape
                x = zeros((nx, T))
                w = randn(nw, T)
                x[:, 0] = x0.T
                for t in range(1, T):
                    x[:, t] = A @ x[:, t-1] + C @ w[:, t]

            # == Compute exogenous variable sequences == #
            g, d, b, s = ((S @ x).flatten() for S in (Sg, Sd, Sb, Ss))

            # == Solve for Lagrange multiplier in the govt budget constraint == #
            # In fact we solve for ν = lambda / (1 + 2*lambda). Here ν is the
            # solution to a quadratic equation a(ν**2 - ν) + b = 0 where
            # a and b are expected discounted sums of quadratic forms of the state.
            Sm = Sb - Sd - Ss
            # == Compute a and b == #
            if econ.discrete:
                ns = P.shape[0]
                F = scipy.linalg.inv(eye(ns) - β * P)
                a0 = 0.5 * (F @ (x_vals.T @ Sm.T)**2)[0]
                # (Sg + Ss) matches the term g_t + s_t in Eq. (10) and the VAR branch
                H = ((Sb - Sd + Sg) @ x_vals) * ((Sg + Ss) @ x_vals)
                b0 = 0.5 * (F @ H.T)[0]
                a0, b0 = float(a0), float(b0)
            else:
                H = Sm.T @ Sm
                a0 = 0.5 * var_quadratic_sum(A, C, H, β, x0)
                H = (Sb - Sd + Sg).T @ (Sg + Ss)
                b0 = 0.5 * var_quadratic_sum(A, C, H, β, x0)

            # == Test that ν has a real solution before assigning == #
            warning_msg = """
            Hint: you probably set government spending too {}. Elect a {}
            Congress and start over.
            """
            disc = a0**2 - 4 * a0 * b0
            if disc >= 0:
                ν = 0.5 * (a0 - sqrt(disc)) / a0
            else:
                print("There is no Ramsey equilibrium for these parameters.")
                print(warning_msg.format('high', 'Republican'))
                sys.exit(0)

            # == Test that the Lagrange multiplier has the right sign == #
            if ν * (0.5 - ν) < 0:
                print("Negative multiplier on the government budget constraint.")
                print(warning_msg.format('low', 'Democratic'))
                sys.exit(0)

            # == Solve for the allocation given ν and x == #
            Sc = 0.5 * (Sb + Sd - Sg - ν * Sm)
            Sl = 0.5 * (Sb - Sd + Sg - ν * Sm)
            c = (Sc @ x).flatten()
            l = (Sl @ x).flatten()
            p = ((Sb - Sc) @ x).flatten()  # Price without normalization
            τ = 1 - l / (b - c)
            rvn = l * τ

            # == Compute remaining variables == #
            if econ.discrete:
                H = ((Sb - Sc) @ x_vals) * ((Sl - Sg) @ x_vals) - (Sl @ x_vals)**2
                temp = (F @ H.T).flatten()
                B = temp[state] / p
                H = (P[state, :] @ x_vals.T @ (Sb - Sc).T).flatten()
                R = p / (β * H)
                temp = ((P[state, :] @ x_vals.T @ (Sb - Sc).T)).flatten()
                ξ = p[1:] / temp[:T-1]
            else:
                H = Sl.T @ Sl - (Sb - Sc).T @ (Sl - Sg)
                L = np.empty(T)
                for t in range(T):
                    L[t] = var_quadratic_sum(A, C, H, β, x[:, t])
                B = L / p
                Rinv = (β * ((Sb - Sc) @ A @ x)).flatten() / p
                R = 1 / Rinv
                AF1 = (Sb - Sc) @ x[:, 1:]
                AF2 = (Sb - Sc) @ A @ x[:, :T-1]
                ξ = AF1 / AF2
                ξ = ξ.flatten()

            π = B[1:] - R[:T-1] * B[:T-1] - rvn[:T-1] + g[:T-1]
            Π = cumsum(π * ξ)

            # == Prepare return values == #
            path = Path(g=g, d=d, b=b, s=s, c=c, l=l, p=p,
                        τ=τ, rvn=rvn, B=B, R=R, π=π, Π=Π, ξ=ξ)

            return path

        def gen_fig_1(path):
            """
            The parameter is the path namedtuple returned by compute_paths(). See
            the docstring of that function for details.
            """

            T = len(path.c)

            # == Prepare axes == #
            num_rows, num_cols = 2, 2
            fig, axes = plt.subplots(num_rows, num_cols, figsize=(14, 10))
            plt.subplots_adjust(hspace=0.4)
            for i in range(num_rows):
                for j in range(num_cols):
                    axes[i, j].grid()
                    axes[i, j].set_xlabel('Time')
            bbox = (0., 1.02, 1., .102)
            legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
            p_args = {'lw': 2, 'alpha': 0.7}

            # == Plot consumption, govt expenditure and revenue == #
            ax = axes[0, 0]
            ax.plot(path.rvn, label=r'$\tau_t \ell_t$', **p_args)
            ax.plot(path.g, label='$g_t$', **p_args)
            ax.plot(path.c, label='$c_t$', **p_args)
            ax.legend(ncol=3, **legend_args)

            # == Plot govt expenditure and debt == #
            ax = axes[0, 1]
            ax.plot(list(range(1, T+1)), path.rvn, label=r'$\tau_t \ell_t$', **p_args)
            ax.plot(list(range(1, T+1)), path.g, label='$g_t$', **p_args)
            ax.plot(list(range(1, T)), path.B[1:T], label='$B_{t+1}$', **p_args)
            ax.legend(ncol=3, **legend_args)

            # == Plot risk-free return == #
            ax = axes[1, 0]
            ax.plot(list(range(1, T+1)), path.R - 1, label='$R_t - 1$', **p_args)
            ax.legend(ncol=1, **legend_args)

            # == Plot revenue, expenditure and risk free rate == #
            ax = axes[1, 1]
            ax.plot(list(range(1, T+1)), path.rvn, label=r'$\tau_t \ell_t$', **p_args)
            ax.plot(list(range(1, T+1)), path.g, label='$g_t$', **p_args)
            axes[1, 1].plot(list(range(1, T)), path.π, label=r'$\pi_{t+1}$', **p_args)
            ax.legend(ncol=3, **legend_args)

            plt.show()

        def gen_fig_2(path):
            """
            The parameter is the path namedtuple returned by compute_paths(). See
            the docstring of that function for details.
            """

            T = len(path.c)

            # == Prepare axes == #
            num_rows, num_cols = 2, 1
            fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 10))
            plt.subplots_adjust(hspace=0.5)
            bbox = (0., 1.02, 1., .102)
            legend_args = {'bbox_to_anchor': bbox, 'loc': 3, 'mode': 'expand'}
            p_args = {'lw': 2, 'alpha': 0.7}

            # == Plot adjustment factor == #
            ax = axes[0]
            ax.plot(list(range(2, T+1)), path.ξ, label=r'$\xi_t$', **p_args)
            ax.grid()
            ax.set_xlabel('Time')
            ax.legend(ncol=1, **legend_args)

            # == Plot adjusted cumulative return == #
            ax = axes[1]
            ax.plot(list(range(2, T+1)), path.Π, label=r'$\Pi_t$', **p_args)
            ax.grid()
            ax.set_xlabel('Time')
            ax.legend(ncol=1, **legend_args)

            plt.show()

78.4.1 Comments on the Code

The function var_quadratic_sum imported from quantecon is for computing the value of Eq. (11) when the exogenous process $\{x_t\}$ is of the VAR type described above
Just after the imports, you will see definitions of two namedtuple objects, Economy and Path
The first is used to collect all the parameters and primitives of a given LQ economy, while the second collects output of the computations
In Python, a namedtuple is a popular data type from the collections module of the standard library that replicates the functionality of a tuple, but also allows you to assign a name to each tuple element
These elements can then be referenced via dotted attribute notation (see, for example, the use of path in the functions gen_fig_1() and gen_fig_2())
The benefits of using namedtuples:

• Keeps content organized by meaning


• Helps reduce the number of global variables
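For readers who have not used namedtuples before, here is a tiny self-contained illustration. The `Point` type and its fields are hypothetical and unrelated to the model; they only show the attribute access and tuple behavior that Economy and Path rely on.

```python
from collections import namedtuple

# Define a tuple type whose elements can be accessed by name
Point = namedtuple('Point', ('x', 'y'))

pt = Point(x=1.0, y=2.5)
total = pt.x + pt.y            # dotted attribute access
x, y = pt                      # still unpacks like an ordinary tuple
shifted = pt._replace(x=3.0)   # returns a new object; pt itself is immutable
```

Because namedtuples are immutable, a function that receives one cannot silently modify the caller's parameters, which is part of why they suit storing model primitives.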

Other than that, our code is long but relatively straightforward

78.5 Examples

Let’s look at two examples of usage

78.5.1 The Continuous Case

Our first example adopts the VAR specification described above


Regarding the primitives, we set

• 𝛽 = 1/1.05
• 𝑏𝑡 = 2.135 and 𝑠𝑡 = 𝑑𝑡 = 0 for all 𝑡

Government spending evolves according to

𝑔𝑡+1 − 𝜇𝑔 = 𝜌(𝑔𝑡 − 𝜇𝑔 ) + 𝐶𝑔 𝑤𝑔,𝑡+1

with $\rho = 0.7$, $\mu_g = 0.35$ and $C_g = \mu_g \sqrt{1 - \rho^2} / 10$


Here’s the code

In [3]: # == Parameters == #
        β = 1 / 1.05
        ρ, mg = .7, .35
        A = eye(2)
        A[0, :] = ρ, mg * (1-ρ)
        C = np.zeros((2, 1))
        C[0, 0] = np.sqrt(1 - ρ**2) * mg / 10
        Sg = np.array((1, 0)).reshape(1, 2)
        Sd = np.array((0, 0)).reshape(1, 2)
        Sb = np.array((0, 2.135)).reshape(1, 2)
        Ss = np.array((0, 0)).reshape(1, 2)

        economy = Economy(β=β, Sg=Sg, Sd=Sd, Sb=Sb, Ss=Ss,
                          discrete=False, proc=(A, C))

        T = 50
        path = compute_paths(T, economy)
        gen_fig_1(path)

The legends on the figures indicate the variables being tracked


Most obvious from the figure is tax smoothing in the sense that tax revenue is much less vari-
able than government expenditure

In [4]: gen_fig_2(path)

See the original manuscript for comments and interpretation

78.5.2 The Discrete Case

Our second example adopts a discrete Markov specification for the exogenous process

In [5]: # == Parameters == #
        β = 1 / 1.05
        P = np.array([[0.8, 0.2, 0.0],
                      [0.0, 0.5, 0.5],
                      [0.0, 0.0, 1.0]])

        # == Possible states of the world == #
        # Each column is a state of the world. The rows are [g d b s 1]
        x_vals = np.array([[0.5, 0.5, 0.25],
                           [0.0, 0.0, 0.0],
                           [2.2, 2.2, 2.2],
                           [0.0, 0.0, 0.0],
                           [1.0, 1.0, 1.0]])

        Sg = np.array((1, 0, 0, 0, 0)).reshape(1, 5)
        Sd = np.array((0, 1, 0, 0, 0)).reshape(1, 5)
        Sb = np.array((0, 0, 1, 0, 0)).reshape(1, 5)
        Ss = np.array((0, 0, 0, 1, 0)).reshape(1, 5)

        economy = Economy(β=β, Sg=Sg, Sd=Sd, Sb=Sb, Ss=Ss,
                          discrete=True, proc=(P, x_vals))

        T = 15
        path = compute_paths(T, economy)
        gen_fig_1(path)

The call gen_fig_2(path) generates

In [6]: gen_fig_2(path)

See the original manuscript for comments and interpretation

78.6 Exercises

78.6.1 Exercise 1

Modify the VAR example given above, setting

𝑔𝑡+1 − 𝜇𝑔 = 𝜌(𝑔𝑡−3 − 𝜇𝑔 ) + 𝐶𝑔 𝑤𝑔,𝑡+1

with $\rho = 0.95$ and $C_g = 0.7 \sqrt{1 - \rho^2}$


Produce the corresponding figures

78.7 Solutions

78.7.1 Exercise 1

In [7]: # == Parameters == #
        β = 1 / 1.05
        ρ, mg = .95, .35
        A = np.array([[0, 0, 0, ρ, mg*(1-ρ)],
                      [1, 0, 0, 0,        0],
                      [0, 1, 0, 0,        0],
                      [0, 0, 1, 0,        0],
                      [0, 0, 0, 0,        1]])
        C = np.zeros((5, 1))
        C[0, 0] = np.sqrt(1 - ρ**2) * mg / 8
        Sg = np.array((1, 0, 0, 0, 0)).reshape(1, 5)
        Sd = np.array((0, 0, 0, 0, 0)).reshape(1, 5)
        Sb = np.array((0, 0, 0, 0, 2.135)).reshape(1, 5)  # Chosen st. (Sc + Sg) * x0 = 1
        Ss = np.array((0, 0, 0, 0, 0)).reshape(1, 5)

        economy = Economy(β=β, Sg=Sg, Sd=Sd, Sb=Sb,
                          Ss=Ss, discrete=False, proc=(A, C))

        T = 50
        path = compute_paths(T, economy)

        gen_fig_1(path)

In [8]: gen_fig_2(path)
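
To see what the companion-form matrices in this solution encode, the following minimal sketch simulates the state vector, whose first four entries hold $g_t, g_{t-1}, g_{t-2}, g_{t-3}$ and whose last entry is the constant 1, by iterating $x_{t+1} = A x_t + C w_{t+1}$ with standard normal shocks. The helper name `simulate_state`, the seed, and the starting point are assumptions made for illustration; the `Economy` class is not used here.

```python
import numpy as np

def simulate_state(A, C, x0, T, seed=0):
    """Iterate x_{t+1} = A x_t + C w_{t+1} with w ~ N(0, 1) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = np.empty((T, len(x0)))
    x[0] = x0
    for t in range(T - 1):
        w = rng.standard_normal(C.shape[1])
        x[t + 1] = A @ x[t] + C @ w
    return x

rho, mg = 0.95, 0.35
A = np.array([[0, 0, 0, rho, mg * (1 - rho)],
              [1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 1]])
C = np.zeros((5, 1))
C[0, 0] = np.sqrt(1 - rho**2) * mg / 8

# Start every lag of g at its mean mg; the last entry is the constant 1
x0 = np.array([mg, mg, mg, mg, 1.0])
x_path = simulate_state(A, C, x0, T=50)
g_path = x_path[:, 0]  # the first entry of the state is g_t
```

With the shock switched off (`C` replaced by zeros), the simulated $g_t$ stays at `mg`, which is a quick check that the last column of `A` carries the mean correctly.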
79 Optimal Taxation with State-Contingent Debt

79.1 Contents

• Overview 79.2

• A Competitive Equilibrium with Distorting Taxes 79.3

• Recursive Formulation of the Ramsey Problem 79.4

• Examples 79.5

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

79.2 Overview

This lecture describes a celebrated model of optimal fiscal policy by Robert E. Lucas, Jr., and
Nancy Stokey [90]
The model revisits classic issues about how to pay for a war
Here a war means a more or less temporary surge in an exogenous government expenditure
process
The model features

• a government that must finance an exogenous stream of government expenditures with either

    – a flat rate tax on labor, or
    – purchases and sales from a full array of Arrow state-contingent securities

• a representative household that values consumption and leisure

• a linear production function mapping labor into a single good


• a Ramsey planner who at time 𝑡 = 0 chooses a plan for taxes and trades of Arrow secu-
rities for all 𝑡 ≥ 0

After first presenting the model in a space of sequences, we shall represent it recursively
in terms of two Bellman equations formulated along lines that we encountered in Dynamic
Stackelberg models
As in Dynamic Stackelberg models, to apply dynamic programming we shall define the state
vector artfully
In particular, we shall include forward-looking variables that summarize optimal responses of
private agents to a Ramsey plan
See Optimal taxation for analysis within a linear-quadratic setting

79.3 A Competitive Equilibrium with Distorting Taxes

For $t \geq 0$, a history $s^t = [s_t, s_{t-1}, \ldots, s_0]$ of an exogenous state $s_t$ has joint probability density $\pi_t(s^t)$
We begin by assuming that government purchases $g_t(s^t)$ at time $t \geq 0$ depend on $s^t$
Let $c_t(s^t)$, $\ell_t(s^t)$, and $n_t(s^t)$ denote consumption, leisure, and labor supply, respectively, at history $s^t$ and date $t$
A representative household is endowed with one unit of time that can be divided between leisure $\ell_t$ and labor $n_t$:

$$
n_t(s^t) + \ell_t(s^t) = 1 \tag{1}
$$

Output equals $n_t(s^t)$ and can be divided between $c_t(s^t)$ and $g_t(s^t)$

$$
c_t(s^t) + g_t(s^t) = n_t(s^t) \tag{2}
$$

A representative household’s preferences over $\{c_t(s^t), \ell_t(s^t)\}_{t=0}^\infty$ are ordered by

$$
\sum_{t=0}^\infty \sum_{s^t} \beta^t \pi_t(s^t) u[c_t(s^t), \ell_t(s^t)] \tag{3}
$$

where the utility function $u$ is increasing, strictly concave, and three times continuously differentiable in both arguments
The technology pins down a pre-tax wage rate to unity for all $t, s^t$
The government imposes a flat-rate tax $\tau_t(s^t)$ on labor income at time $t$, history $s^t$
There are complete markets in one-period Arrow securities
One unit of an Arrow security issued at time $t$ at history $s^t$ and promising to pay one unit of time $t+1$ consumption in state $s_{t+1}$ costs $p_{t+1}(s_{t+1} | s^t)$
The government issues one-period Arrow securities each period
The government has a sequence of budget constraints whose time $t \geq 0$ component is

$$
g_t(s^t) = \tau_t(s^t) n_t(s^t) + \sum_{s_{t+1}} p_{t+1}(s_{t+1} | s^t) b_{t+1}(s_{t+1} | s^t) - b_t(s_t | s^{t-1}) \tag{4}
$$

where

• $p_{t+1}(s_{t+1} | s^t)$ is a competitive equilibrium price of one unit of consumption at date $t+1$ in state $s_{t+1}$ at date $t$ and history $s^t$
• $b_t(s_t | s^{t-1})$ is government debt falling due at time $t$, history $s^t$

Government debt $b_0(s_0)$ is an exogenous initial condition


The representative household has a sequence of budget constraints whose time $t \geq 0$ component is

$$
c_t(s^t) + \sum_{s_{t+1}} p_{t+1}(s_{t+1} | s^t) b_{t+1}(s_{t+1} | s^t) = [1 - \tau_t(s^t)] n_t(s^t) + b_t(s_t | s^{t-1}) \quad \forall t \geq 0 \tag{5}
$$

A government policy is an exogenous sequence $\{g(s_t)\}_{t=0}^\infty$, a tax rate sequence $\{\tau_t(s^t)\}_{t=0}^\infty$, and a government debt sequence $\{b_{t+1}(s^{t+1})\}_{t=0}^\infty$
A feasible allocation is a consumption-labor supply plan $\{c_t(s^t), n_t(s^t)\}_{t=0}^\infty$ that satisfies Eq. (2) at all $t, s^t$
A price system is a sequence of Arrow security prices $\{p_{t+1}(s_{t+1} | s^t)\}_{t=0}^\infty$
The household faces the price system as a price-taker and takes the government policy as given
The household chooses $\{c_t(s^t), \ell_t(s^t)\}_{t=0}^\infty$ to maximize Eq. (3) subject to Eq. (5) and Eq. (1) for all $t, s^t$
A competitive equilibrium with distorting taxes is a feasible allocation, a price system,
and a government policy such that

• Given the price system and the government policy, the allocation solves the household’s
optimization problem
• Given the allocation, government policy, and price system, the government’s budget
constraint is satisfied for all 𝑡, 𝑠𝑡

Note: There are many competitive equilibria with distorting taxes


They are indexed by different government policies
The Ramsey problem or optimal taxation problem is to choose a competitive equilib-
rium with distorting taxes that maximizes Eq. (3)

79.3.1 Arrow-Debreu Version of Price System

We find it convenient sometimes to work with the Arrow-Debreu price system that is implied
by a sequence of Arrow securities prices
Let $q_t^0(s^t)$ be the price at time 0, measured in time 0 consumption goods, of one unit of consumption at time $t$, history $s^t$
The following recursion relates Arrow-Debreu prices $\{q_t^0(s^t)\}_{t=0}^\infty$ to Arrow securities prices $\{p_{t+1}(s_{t+1} | s^t)\}_{t=0}^\infty$

$$
q_{t+1}^0(s^{t+1}) = p_{t+1}(s_{t+1} | s^t) q_t^0(s^t) \quad \text{s.t.} \quad q_0^0(s^0) = 1 \tag{6}
$$

Arrow-Debreu prices are useful when we want to compress a sequence of budget constraints
into a single intertemporal budget constraint, as we shall find it convenient to do below
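
As a small illustration of how Arrow-Debreu prices compound from one-period Arrow prices along a single realized history, here is a minimal sketch. The function name `arrow_debreu_prices` and the numerical prices are hypothetical; the only thing taken from the text is the rule that the time 0 price is normalized to one and each later price equals the previous one times the relevant one-period Arrow price.

```python
import numpy as np

def arrow_debreu_prices(p_path):
    """Return q with q[0] = 1 and q[t+1] = p_path[t] * q[t] (hypothetical helper)."""
    p_path = np.asarray(p_path, dtype=float)
    return np.concatenate(([1.0], np.cumprod(p_path)))

# Hypothetical one-period Arrow prices realized along one history
p_path = [0.9, 0.8, 0.95]
q = arrow_debreu_prices(p_path)
```

Because the recursion is multiplicative, a cumulative product is all that is needed; no loop over histories is required for a single path.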

79.3.2 Primal Approach

We apply a popular approach to solving a Ramsey problem, called the primal approach
The idea is to use first-order conditions for household optimization to eliminate taxes and
prices in favor of quantities, then pose an optimization problem cast entirely in terms of
quantities
After Ramsey quantities have been found, taxes and prices can then be unwound from the
allocation
The primal approach uses four steps:

1. Obtain first-order conditions of the household’s problem and solve them for $\{q_t^0(s^t), \tau_t(s^t)\}_{t=0}^\infty$ as functions of the allocation $\{c_t(s^t), n_t(s^t)\}_{t=0}^\infty$
2. Substitute these expressions for taxes and prices in terms of the allocation into the household’s present-value budget constraint

    • This intertemporal constraint involves only the allocation and is regarded as an implementability constraint

3. Find the allocation that maximizes the utility of the representative household Eq. (3) subject to the feasibility constraints Eq. (1) and Eq. (2) and the implementability condition derived in step 2

    • This optimal allocation is called the Ramsey allocation

4. Use the Ramsey allocation together with the formulas from step 1 to find taxes and prices

79.3.3 The Implementability Constraint

By sequential substitution of one one-period budget constraint Eq. (5) into another, we can obtain the household’s present-value budget constraint:

$$
\sum_{t=0}^\infty \sum_{s^t} q_t^0(s^t) c_t(s^t) = \sum_{t=0}^\infty \sum_{s^t} q_t^0(s^t) [1 - \tau_t(s^t)] n_t(s^t) + b_0 \tag{7}
$$

$\{q_t^0(s^t)\}_{t=1}^\infty$ can be interpreted as a time 0 Arrow-Debreu price system
To approach the Ramsey problem, we study the household’s optimization problem



First-order conditions for the household’s problem for $\ell_t(s^t)$ and $b_{t+1}(s_{t+1} | s^t)$, respectively, imply

$$
(1 - \tau_t(s^t)) = \frac{u_l(s^t)}{u_c(s^t)} \tag{8}
$$

and

$$
p_{t+1}(s_{t+1} | s^t) = \beta \pi(s_{t+1} | s^t) \left( \frac{u_c(s^{t+1})}{u_c(s^t)} \right) \tag{9}
$$

where $\pi(s_{t+1} | s^t)$ is the probability distribution of $s_{t+1}$ conditional on history $s^t$
Equation Eq. (9) implies that the Arrow-Debreu price system satisfies

$$
q_t^0(s^t) = \beta^t \pi_t(s^t) \frac{u_c(s^t)}{u_c(s^0)} \tag{10}
$$

Using the first-order conditions Eq. (8) and Eq. (9) to eliminate taxes and prices from Eq. (7), we derive the implementability condition

$$
\sum_{t=0}^\infty \sum_{s^t} \beta^t \pi_t(s^t) [u_c(s^t) c_t(s^t) - u_\ell(s^t) n_t(s^t)] - u_c(s^0) b_0 = 0 \tag{11}
$$

The Ramsey problem is to choose a feasible allocation that maximizes

$$
\sum_{t=0}^\infty \sum_{s^t} \beta^t \pi_t(s^t) u[c_t(s^t), 1 - n_t(s^t)] \tag{12}
$$

subject to Eq. (11)

79.3.4 Solution Details

First, define a “pseudo utility function”

$$
V[c_t(s^t), n_t(s^t), \Phi] = u[c_t(s^t), 1 - n_t(s^t)] + \Phi [u_c(s^t) c_t(s^t) - u_\ell(s^t) n_t(s^t)] \tag{13}
$$

where $\Phi$ is a Lagrange multiplier on the implementability condition Eq. (11)
Next form the Lagrangian

$$
J = \sum_{t=0}^\infty \sum_{s^t} \beta^t \pi_t(s^t) \left\{ V[c_t(s^t), n_t(s^t), \Phi] + \theta_t(s^t) [n_t(s^t) - c_t(s^t) - g_t(s^t)] \right\} - \Phi u_c(0) b_0 \tag{14}
$$

where $\{\theta_t(s^t); \forall s^t\}_{t \geq 0}$ is a sequence of Lagrange multipliers on the feasibility conditions Eq. (2)
Given an initial government debt $b_0$, we want to maximize $J$ with respect to $\{c_t(s^t), n_t(s^t); \forall s^t\}_{t \geq 0}$ and to minimize with respect to $\{\theta_t(s^t); \forall s^t\}_{t \geq 0}$

The first-order conditions for the Ramsey problem for periods $t \geq 1$ and $t = 0$, respectively, are

$$
\begin{aligned}
c_t(s^t) &: (1 + \Phi) u_c(s^t) + \Phi [u_{cc}(s^t) c_t(s^t) - u_{\ell c}(s^t) n_t(s^t)] - \theta_t(s^t) = 0, \quad t \geq 1 \\
n_t(s^t) &: -(1 + \Phi) u_\ell(s^t) - \Phi [u_{c\ell}(s^t) c_t(s^t) - u_{\ell\ell}(s^t) n_t(s^t)] + \theta_t(s^t) = 0, \quad t \geq 1
\end{aligned} \tag{15}
$$

and

$$
\begin{aligned}
c_0(s^0, b_0) &: (1 + \Phi) u_c(s^0, b_0) + \Phi [u_{cc}(s^0, b_0) c_0(s^0, b_0) - u_{\ell c}(s^0, b_0) n_0(s^0, b_0)] - \theta_0(s^0, b_0) \\
&\quad - \Phi u_{cc}(s^0, b_0) b_0 = 0 \\
n_0(s^0, b_0) &: -(1 + \Phi) u_\ell(s^0, b_0) - \Phi [u_{c\ell}(s^0, b_0) c_0(s^0, b_0) - u_{\ell\ell}(s^0, b_0) n_0(s^0, b_0)] + \theta_0(s^0, b_0) \\
&\quad + \Phi u_{c\ell}(s^0, b_0) b_0 = 0
\end{aligned} \tag{16}
$$

Please note how these first-order conditions differ between $t = 0$ and $t \geq 1$


It is instructive to use first-order conditions Eq. (15) for $t \geq 1$ to eliminate the multipliers $\theta_t(s^t)$
For convenience, we suppress the time subscript and the index $s^t$ and obtain

$$
\begin{aligned}
&(1 + \Phi) u_c(c, 1 - c - g) + \Phi [c u_{cc}(c, 1 - c - g) - (c + g) u_{\ell c}(c, 1 - c - g)] \\
&\quad = (1 + \Phi) u_\ell(c, 1 - c - g) + \Phi [c u_{c\ell}(c, 1 - c - g) - (c + g) u_{\ell\ell}(c, 1 - c - g)]
\end{aligned} \tag{17}
$$

where we have imposed conditions Eq. (1) and Eq. (2)
Equation Eq. (17) is one equation that can be solved to express the unknown $c$ as a function of the exogenous variable $g$
We also know that time $t = 0$ quantities $c_0$ and $n_0$ satisfy

$$
\begin{aligned}
&(1 + \Phi) u_c(c, 1 - c - g) + \Phi [c u_{cc}(c, 1 - c - g) - (c + g) u_{\ell c}(c, 1 - c - g)] \\
&\quad = (1 + \Phi) u_\ell(c, 1 - c - g) + \Phi [c u_{c\ell}(c, 1 - c - g) - (c + g) u_{\ell\ell}(c, 1 - c - g)] + \Phi (u_{cc} - u_{c,\ell}) b_0
\end{aligned} \tag{18}
$$

Notice that a counterpart to $b_0$ does not appear in Eq. (17), so $c$ does not depend on it for $t \geq 1$
But things are different for time $t = 0$
An analogous argument for the $t = 0$ equations Eq. (16) leads to one equation that can be solved for $c_0$ as a function of the pair $(g(s_0), b_0)$
These outcomes mean that the following statement would be true even when government purchases are history-dependent functions $g_t(s^t)$ of the history of $s_t$
Proposition: If government purchases are equal after two histories $s^t$ and $\tilde{s}^\tau$ for $t, \tau \geq 0$, i.e., if

$$
g_t(s^t) = g_\tau(\tilde{s}^\tau) = g
$$

then it follows from Eq. (17) that the Ramsey choices of consumption and leisure, $(c_t(s^t), \ell_t(s^t))$ and $(c_\tau(\tilde{s}^\tau), \ell_\tau(\tilde{s}^\tau))$, are identical

The proposition asserts that the optimal allocation is a function of the currently realized
quantity of government purchases 𝑔 only and does not depend on the specific history that
preceded that realization of 𝑔

79.3.5 The Ramsey Allocation for a Given Multiplier

Temporarily take Φ as given


We shall compute 𝑐0 (𝑠0 , 𝑏0 ) and 𝑛0 (𝑠0 , 𝑏0 ) from the first-order conditions Eq. (16)
Evidently, for 𝑡 ≥ 1, 𝑐 and 𝑛 depend on the time 𝑡 realization of 𝑔 only
But for 𝑡 = 0, 𝑐 and 𝑛 depend on both 𝑔0 and the government’s initial debt 𝑏0
Thus, while 𝑏0 influences 𝑐0 and 𝑛0 , there appears no analogous variable 𝑏𝑡 that influences 𝑐𝑡
and 𝑛𝑡 for 𝑡 ≥ 1
The absence of 𝑏𝑡 as a determinant of the Ramsey allocation for 𝑡 ≥ 1 and its presence for
𝑡 = 0 is a symptom of the time-inconsistency of a Ramsey plan
Φ has to take a value that assures that the household and the government’s budget con-
straints are both satisfied at a candidate Ramsey allocation and price system associated with
that Φ

79.3.6 Further Specialization

At this point, it is useful to specialize the model in the following ways


We assume that $s$ is governed by a finite state Markov chain with states $s \in [1, \ldots, S]$ and transition matrix $\Pi$, where

$$
\Pi(s' | s) = \text{Prob}(s_{t+1} = s' | s_t = s)
$$

Also, assume that government purchases 𝑔 are an exact time-invariant function 𝑔(𝑠) of 𝑠
We maintain these assumptions throughout the remainder of this lecture
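
To make the Markov specialization concrete, the sketch below draws a path of the state and maps it into government purchases through a time-invariant function of the state. The helper `simulate_markov`, the transition matrix, and the values in `g` are illustrative assumptions, and states are indexed from 0 rather than 1 to follow Python conventions.

```python
import numpy as np

def simulate_markov(Pi, s0, T, seed=0):
    """Draw a path s_0, ..., s_{T-1} from transition matrix Pi (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    Pi = np.asarray(Pi, dtype=float)
    S = Pi.shape[0]
    path = np.empty(T, dtype=int)
    path[0] = s0
    for t in range(1, T):
        # Row path[t-1] of Pi is the distribution of the next state
        path[t] = rng.choice(S, p=Pi[path[t - 1]])
    return path

Pi = np.array([[0.9, 0.1],
               [0.5, 0.5]])
g = np.array([0.1, 0.2])  # hypothetical government purchases in each state

s_path = simulate_markov(Pi, s0=0, T=10)
g_path = g[s_path]        # purchases depend on the current state only
```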

79.3.7 Determining the Multiplier

We complete the Ramsey plan by computing the Lagrange multiplier Φ on the implementabil-
ity constraint Eq. (11)
Government budget balance restricts Φ via the following line of reasoning
The household’s first-order conditions imply

$$
(1 - \tau_t(s^t)) = \frac{u_l(s^t)}{u_c(s^t)} \tag{19}
$$

and the implied one-period Arrow securities prices

$$
p_{t+1}(s_{t+1} | s^t) = \beta \Pi(s_{t+1} | s_t) \frac{u_c(s^{t+1})}{u_c(s^t)} \tag{20}
$$

Substituting from Eq. (19), Eq. (20), and the feasibility condition Eq. (2) into the recursive version Eq. (5) of the household budget constraint gives

$$
\begin{aligned}
u_c(s^t) [n_t(s^t) - g_t(s^t)] &+ \beta \sum_{s_{t+1}} \Pi(s_{t+1} | s_t) u_c(s^{t+1}) b_{t+1}(s_{t+1} | s^t) \\
&= u_l(s^t) n_t(s^t) + u_c(s^t) b_t(s_t | s^{t-1})
\end{aligned} \tag{21}
$$

Define $x_t(s^t) = u_c(s^t) b_t(s_t | s^{t-1})$
Notice that $x_t(s^t)$ appears on the right side of Eq. (21) while $\beta$ times the conditional expectation of $x_{t+1}(s^{t+1})$ appears on the left side
Hence the equation shares much of the structure of a simple asset pricing equation with $x_t$ being analogous to the price of the asset at time $t$
We learned earlier that for a Ramsey allocation $c_t(s^t)$, $n_t(s^t)$ and $b_t(s_t | s^{t-1})$, and therefore also $x_t(s^t)$, are each functions of $s_t$ only, being independent of the history $s^{t-1}$ for $t \geq 1$
That means that we can express equation Eq. (21) as

$$
u_c(s) [n(s) - g(s)] + \beta \sum_{s'} \Pi(s' | s) x'(s') = u_l(s) n(s) + x(s) \tag{22}
$$

where $s'$ denotes a next period value of $s$ and $x'(s')$ denotes a next period value of $x$
Equation Eq. (22) is easy to solve for $x(s)$ for $s = 1, \ldots, S$
If we let $\vec{n}, \vec{g}, \vec{x}$ denote $S \times 1$ vectors whose $i$th elements are the respective $n$, $g$, and $x$ values when $s = i$, and let $\Pi$ be the transition matrix for the Markov state $s$, then we can express Eq. (22) as the matrix equation

$$
\vec{u}_c (\vec{n} - \vec{g}) + \beta \Pi \vec{x} = \vec{u}_l \vec{n} + \vec{x} \tag{23}
$$

This is a system of $S$ linear equations in the $S \times 1$ vector $x$, whose solution is

$$
\vec{x} = (I - \beta \Pi)^{-1} [\vec{u}_c (\vec{n} - \vec{g}) - \vec{u}_l \vec{n}] \tag{24}
$$

In these equations, by $\vec{u}_c \vec{n}$, for example, we mean element-by-element multiplication of the two vectors
After solving for $\vec{x}$, we can find $b(s_t | s^{t-1})$ in Markov state $s_t = s$ from $b(s) = \frac{x(s)}{u_c(s)}$ or the matrix equation

$$
\vec{b} = \frac{\vec{x}}{\vec{u}_c} \tag{25}
$$

where division here means an element-by-element division of the respective components of the $S \times 1$ vectors $\vec{x}$ and $\vec{u}_c$
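
The linear solve for the marginal-utility-scaled debt vector and the elementwise division that recovers debt can be sketched in a few lines of NumPy. All numbers below (the discount factor, transition matrix, marginal utilities, labor, and purchases) are hypothetical placeholders rather than values from a solved Ramsey allocation.

```python
import numpy as np

beta = 0.9
Pi = np.array([[0.9, 0.1],
               [0.5, 0.5]])

# Hypothetical state-by-state values, one entry per Markov state
u_c = np.array([1.2, 1.5])   # marginal utility of consumption
u_l = np.array([1.0, 1.1])   # marginal utility of leisure
n = np.array([0.6, 0.7])     # labor supply
g = np.array([0.1, 0.2])     # government purchases

# Right side of the linear system: products are element by element
rhs = u_c * (n - g) - u_l * n

# Solve (I - beta * Pi) x = rhs without forming the inverse explicitly
x = np.linalg.solve(np.eye(len(g)) - beta * Pi, rhs)

# Recover debt by element-by-element division
b = x / u_c
```

Using `np.linalg.solve` rather than an explicit inverse is a common numerical choice; the class presented later in this lecture computes `x` the same way.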
Here is a computational algorithm:

1. Start with a guess for the value for $\Phi$, then use the first-order conditions and the feasibility conditions to compute $c(s_t), n(s_t)$ for $s \in [1, \ldots, S]$ and $c_0(s_0, b_0)$ and $n_0(s_0, b_0)$, given $\Phi$

    • these are $2(S + 1)$ equations in $2(S + 1)$ unknowns

2. Solve the $S$ equations Eq. (24) for the $S$ elements of $\vec{x}$

    • these depend on $\Phi$

3. Find a $\Phi$ that satisfies

$$
u_{c,0} b_0 = u_{c,0} (n_0 - g_0) - u_{l,0} n_0 + \beta \sum_{s=1}^S \Pi(s | s_0) x(s) \tag{26}
$$

    by gradually raising $\Phi$ if the left side of Eq. (26) exceeds the right side and lowering $\Phi$ if the left side is less than the right side

4. After computing a Ramsey allocation, recover the flat tax rate on labor from Eq. (8) and the implied one-period Arrow securities prices from Eq. (9)

In summary, when $g_t$ is a time-invariant function of a Markov state $s_t$, a Ramsey plan can be constructed by solving $3S + 3$ equations in $S$ components each of $\vec{c}$, $\vec{n}$, and $\vec{x}$ together with $n_0$, $c_0$, and $\Phi$

79.3.8 Time Inconsistency

Let $\{\tau_t(s^t)\}_{t=0}^\infty, \{b_{t+1}(s_{t+1} | s^t)\}_{t=0}^\infty$ be a time 0, state $s_0$ Ramsey plan
Then $\{\tau_j(s^j)\}_{j=t}^\infty, \{b_{j+1}(s_{j+1} | s^j)\}_{j=t}^\infty$ is a time $t$, history $s^t$ continuation of a time 0, state $s_0$ Ramsey plan
A time $t$, history $s^t$ Ramsey plan is a Ramsey plan that starts from initial conditions $s^t, b_t(s_t | s^{t-1})$
A time $t$, history $s^t$ continuation of a time 0, state $s_0$ Ramsey plan is not a time $t$, history $s^t$ Ramsey plan
This means that a Ramsey plan is not time consistent
Another way to say the same thing is that a Ramsey plan is time inconsistent
The reason is that a continuation Ramsey plan takes $u_{ct} b_t(s_t | s^{t-1})$ as given, not $b_t(s_t | s^{t-1})$
We shall discuss this more below

79.3.9 Specification with CRRA Utility

In our calculations below and in a subsequent lecture based on an extension of the Lucas-
Stokey model by Aiyagari, Marcet, Sargent, and Seppälä (2002) [5], we shall modify the one-
period utility function assumed above
(We adopted the preceding utility specification because it was the one used in the original
[90] paper)
We will modify their specification by instead assuming that the representative agent has utility function

$$
u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}
$$

where $\sigma > 0$, $\gamma > 0$


We continue to assume that

𝑐𝑡 + 𝑔𝑡 = 𝑛𝑡

We eliminate leisure from the model


We also eliminate Lucas and Stokey’s restriction that $\ell_t + n_t \leq 1$
We replace these two things with the assumption that labor $n_t \in [0, +\infty)$
With these adjustments, the analysis of Lucas and Stokey prevails once we make the following replacements

$$
\begin{aligned}
u_\ell(c, \ell) &\sim -u_n(c, n) \\
u_c(c, \ell) &\sim u_c(c, n) \\
u_{\ell,\ell}(c, \ell) &\sim u_{nn}(c, n) \\
u_{c,c}(c, \ell) &\sim u_{c,c}(c, n) \\
u_{c,\ell}(c, \ell) &\sim 0
\end{aligned}
$$

With these understandings, equations Eq. (17) and Eq. (18) simplify in the case of the CRRA
utility function
They become

$$
(1 + \Phi) [u_c(c) + u_n(c + g)] + \Phi [c u_{cc}(c) + (c + g) u_{nn}(c + g)] = 0 \tag{27}
$$

and

$$
(1 + \Phi) [u_c(c_0) + u_n(c_0 + g_0)] + \Phi [c_0 u_{cc}(c_0) + (c_0 + g_0) u_{nn}(c_0 + g_0)] - \Phi u_{cc}(c_0) b_0 = 0 \tag{28}
$$

In equation Eq. (27), it is understood that 𝑐 and 𝑔 are each functions of the Markov state 𝑠
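
For a given multiplier, the CRRA first-order condition for $t \geq 1$ is a single nonlinear equation in consumption, so a scalar root-finder is enough to trace out consumption as a function of government purchases. The sketch below assumes $\sigma = \gamma = 2$, a multiplier small enough that $1 + \Phi(1 - \sigma) > 0$ (so the bracket passed to `brentq` contains exactly one sign change), and a hypothetical helper name `crra_allocation`. The tax rate is backed out as $1 + u_n / u_c$, which is the household condition after replacing $u_\ell$ with $-u_n$.

```python
from scipy.optimize import brentq

def crra_allocation(Phi, g, sigma=2.0, gamma=2.0):
    """Solve the t >= 1 CRRA condition for c given Phi and g (illustrative sketch)."""
    def foc(c):
        n = c + g                                    # feasibility: c + g = n
        u_c, u_cc = c**(-sigma), -sigma * c**(-sigma - 1)
        u_n, u_nn = -n**gamma, -gamma * n**(gamma - 1)
        return (1 + Phi) * (u_c + u_n) + Phi * (c * u_cc + n * u_nn)

    c = brentq(foc, 1e-6, 10.0)                      # bracket is an assumption
    n = c + g
    tau = 1 - n**gamma / c**(-sigma)                 # tau = 1 + u_n / u_c
    return c, n, tau

c_fb, n_fb, tau_fb = crra_allocation(Phi=0.0, g=0.1)    # no distortion
c_ls, n_ls, tau_ls = crra_allocation(Phi=0.05, g=0.1)   # binding implementability
```

With a zero multiplier the tax rate is zero and the allocation coincides with the first best; a positive multiplier lowers consumption and implies a positive labor tax in this parameterization.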
In addition, the time $t = 0$ budget constraint is satisfied at $c_0$ and initial government debt $b_0$:

$$
b_0 + g_0 = \tau_0 (c_0 + g_0) + \frac{\bar{b}}{R_0} \tag{29}
$$

where $R_0$ is the gross interest rate for the Markov state $s_0$ that is assumed to prevail at time $t = 0$ and $\tau_0$ is the time $t = 0$ tax rate
In equation Eq. (29), it is understood that

$$
\begin{aligned}
\tau_0 &= 1 - \frac{u_{l,0}}{u_{c,0}} \\
R_0^{-1} &= \beta \sum_{s=1}^S \Pi(s | s_0) \frac{u_c(s)}{u_{c,0}}
\end{aligned}
$$

79.3.10 Sequence Implementation

The above steps are implemented in a class called SequentialAllocation

In [2]: import numpy as np
from scipy.optimize import root
from quantecon import MarkovChain

class SequentialAllocation:

    '''
    Class that takes CESutility or BGPutility object as input returns
    planner's allocation as a function of the multiplier on the
    implementability constraint μ.
    '''

    def __init__(self, model):

        # Initialize from model object attributes
        self.β, self.π, self.G = model.β, model.π, model.G
        self.mc, self.Θ = MarkovChain(self.π), model.Θ
        self.S = len(model.π)  # Number of states
        self.model = model

        # Find the first best allocation
        self.find_first_best()

    def find_first_best(self):
        '''
        Find the first best allocation
        '''
        model = self.model
        S, Θ, G = self.S, self.Θ, self.G
        Uc, Un = model.Uc, model.Un

        def res(z):
            c = z[:S]
            n = z[S:]
            return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])

        res = root(res, 0.5 * np.ones(2 * S))

        if not res.success:
            raise Exception('Could not find first best')

        self.cFB = res.x[:S]
        self.nFB = res.x[S:]

        # Multiplier on the resource constraint
        self.ΞFB = Uc(self.cFB, self.nFB)
        self.zFB = np.hstack([self.cFB, self.nFB, self.ΞFB])

    def time1_allocation(self, μ):
        '''
        Computes optimal allocation for time t >= 1 for a given μ
        '''
        model = self.model
        S, Θ, G = self.S, self.Θ, self.G
        Uc, Ucc, Un, Unn = model.Uc, model.Ucc, model.Un, model.Unn

        def FOC(z):
            c = z[:S]
            n = z[S:2 * S]
            Ξ = z[2 * S:]
            return np.hstack([Uc(c, n) - μ * (Ucc(c, n) * c + Uc(c, n)) - Ξ,  # FOC of c
                              Un(c, n) - μ * (Unn(c, n) * n + Un(c, n)) +
                              Θ * Ξ,  # FOC of n
                              Θ * n - c - G])

        # Find the root of the first-order condition
        res = root(FOC, self.zFB)
        if not res.success:
            raise Exception('Could not find LS allocation.')
        z = res.x
        c, n, Ξ = z[:S], z[S:2 * S], z[2 * S:]

        # Compute x
        I = Uc(c, n) * c + Un(c, n) * n
        x = np.linalg.solve(np.eye(S) - self.β * self.π, I)

        return c, n, x, Ξ

    def time0_allocation(self, B_, s_0):
        '''
        Finds the optimal allocation given initial government debt B_ and state s_0
        '''
        model, π, Θ, G, β = self.model, self.π, self.Θ, self.G, self.β
        Uc, Ucc, Un, Unn = model.Uc, model.Ucc, model.Un, model.Unn

        # First order conditions of planner's problem
        def FOC(z):
            μ, c, n, Ξ = z
            xprime = self.time1_allocation(μ)[2]
            return np.hstack([Uc(c, n) * (c - B_) + Un(c, n) * n + β * π[s_0] @ xprime,
                              Uc(c, n) - μ * (Ucc(c, n) *
                                              (c - B_) + Uc(c, n)) - Ξ,
                              Un(c, n) - μ * (Unn(c, n) * n +
                                              Un(c, n)) + Θ[s_0] * Ξ,
                              (Θ * n - c - G)[s_0]])

        # Find root
        res = root(FOC, np.array(
            [0, self.cFB[s_0], self.nFB[s_0], self.ΞFB[s_0]]))
        if not res.success:
            raise Exception('Could not find time 0 LS allocation.')

        return res.x

    def time1_value(self, μ):
        '''
        Find the value associated with multiplier μ
        '''
        c, n, x, Ξ = self.time1_allocation(μ)
        U = self.model.U(c, n)
        V = np.linalg.solve(np.eye(self.S) - self.β * self.π, U)
        return c, n, x, V

    def Τ(self, c, n):
        '''
        Computes Τ given c, n
        '''
        model = self.model
        Uc, Un = model.Uc(c, n), model.Un(c, n)

        return 1 + Un / (self.Θ * Uc)

    def simulate(self, B_, s_0, T, sHist=None):
        '''
        Simulates planners policies for T periods
        '''
        model, π, β = self.model, self.π, self.β
        Uc = model.Uc

        if sHist is None:
            sHist = self.mc.simulate(T, s_0)

        cHist, nHist, Bhist, ΤHist, μHist = np.zeros((5, T))
        RHist = np.zeros(T - 1)

        # Time 0
        μ, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
        ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
        Bhist[0] = B_
        μHist[0] = μ

        # Time 1 onward
        for t in range(1, T):
            c, n, x, Ξ = self.time1_allocation(μ)
            Τ = self.Τ(c, n)
            u_c = Uc(c, n)
            s = sHist[t]
            Eu_c = π[sHist[t - 1]] @ u_c
            cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x[s] / u_c[s], Τ[s]
            RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (β * Eu_c)
            μHist[t] = μ

        return np.array([cHist, nHist, Bhist, ΤHist, sHist, μHist, RHist])

79.4 Recursive Formulation of the Ramsey Problem

$x_t(s^t) = u_c(s^t) b_t(s_t | s^{t-1})$ in equation Eq. (21) appears to be a purely “forward-looking” variable
But $x_t(s^t)$ is also a natural candidate for a state variable in a recursive formulation of the Ramsey problem

79.4.1 Intertemporal Delegation

To express a Ramsey plan recursively, we imagine that a time 0 Ramsey planner is followed
by a sequence of continuation Ramsey planners at times 𝑡 = 1, 2, …
A “continuation Ramsey planner” has a different objective function and faces different con-
straints than a Ramsey planner
A key step in representing a Ramsey plan recursively is to regard the marginal utility scaled
government debts 𝑥𝑡 (𝑠𝑡 ) = 𝑢𝑐 (𝑠𝑡 )𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) as predetermined quantities that continuation
Ramsey planners at times 𝑡 ≥ 1 are obligated to attain
Continuation Ramsey planners do this by choosing continuation policies that induce the rep-
resentative household to make choices that imply that 𝑢𝑐 (𝑠𝑡 )𝑏𝑡 (𝑠𝑡 |𝑠𝑡−1 ) = 𝑥𝑡 (𝑠𝑡 )
A time 𝑡 ≥ 1 continuation Ramsey planner delivers 𝑥𝑡 by choosing a suitable 𝑛𝑡 , 𝑐𝑡 pair and
a list of 𝑠𝑡+1 -contingent continuation quantities 𝑥𝑡+1 to bequeath to a time 𝑡 + 1 continuation
Ramsey planner
A time 𝑡 ≥ 1 continuation Ramsey planner faces 𝑥𝑡 , 𝑠𝑡 as state variables
But the time 0 Ramsey planner faces 𝑏0 , not 𝑥0 , as a state variable
Furthermore, the Ramsey planner cares about (𝑐0 (𝑠0 ), ℓ0 (𝑠0 )), while continuation Ramsey
planners do not
The time 0 Ramsey planner hands 𝑥1 as a function of 𝑠1 to a time 1 continuation Ramsey
planner
These lines of delegated authorities and responsibilities across time express the continuation
Ramsey planners’ obligations to implement their parts of the original Ramsey plan, designed
once-and-for-all at time 0

79.4.2 Two Bellman Equations

After 𝑠𝑡 has been realized at time 𝑡 ≥ 1, the state variables confronting the time 𝑡 continua-
tion Ramsey planner are (𝑥𝑡 , 𝑠𝑡 )

• Let 𝑉 (𝑥, 𝑠) be the value of a continuation Ramsey plan at 𝑥𝑡 = 𝑥, 𝑠𝑡 = 𝑠 for 𝑡 ≥ 1


• Let 𝑊 (𝑏, 𝑠) be the value of a Ramsey plan at time 0 at 𝑏0 = 𝑏 and 𝑠0 = 𝑠

We work backward by presenting a Bellman equation for 𝑉 (𝑥, 𝑠) first, then a Bellman equa-
tion for 𝑊 (𝑏, 𝑠)

79.4.3 The Continuation Ramsey Problem

The Bellman equation for a time $t \geq 1$ continuation Ramsey planner is

$$
V(x, s) = \max_{n, \{x'(s')\}} u(n - g(s), 1 - n) + \beta \sum_{s' \in S} \Pi(s' | s) V(x', s') \tag{30}
$$

where maximization over $n$ and the $S$ elements of $x'(s')$ is subject to the single implementability constraint for $t \geq 1$

$$
x = u_c (n - g(s)) - u_l n + \beta \sum_{s' \in S} \Pi(s' | s) x'(s') \tag{31}
$$

Here $u_c$ and $u_l$ are today’s values of the marginal utilities
For each given value of $x, s$, the continuation Ramsey planner chooses $n$ and $x'(s')$ for each $s' \in S$
Associated with a value function $V(x, s)$ that solves Bellman equation Eq. (30) are $S + 1$ time-invariant policy functions

$$
\begin{aligned}
n_t &= f(x_t, s_t), \quad t \geq 1 \\
x_{t+1}(s_{t+1}) &= h(s_{t+1}; x_t, s_t), \quad s_{t+1} \in S, \ t \geq 1
\end{aligned} \tag{32}
$$

79.4.4 The Ramsey Problem

The Bellman equation for the time 0 Ramsey planner is

$$
W(b_0, s_0) = \max_{n_0, \{x'(s_1)\}} u(n_0 - g_0, 1 - n_0) + \beta \sum_{s_1 \in S} \Pi(s_1 | s_0) V(x'(s_1), s_1) \tag{33}
$$

where maximization over $n_0$ and the $S$ elements of $x'(s_1)$ is subject to the time 0 implementability constraint

$$
u_{c,0} b_0 = u_{c,0} (n_0 - g_0) - u_{l,0} n_0 + \beta \sum_{s_1 \in S} \Pi(s_1 | s_0) x'(s_1) \tag{34}
$$

coming from restriction Eq. (26)
Associated with a value function $W(b_0, s_0)$ that solves Bellman equation Eq. (33) are $S + 1$ time 0 policy functions

$$
\begin{aligned}
n_0 &= f_0(b_0, s_0) \\
x_1(s_1) &= h_0(s_1; b_0, s_0)
\end{aligned} \tag{35}
$$

Notice the appearance of state variables $(b_0, s_0)$ in the time 0 policy functions for the Ramsey planner as compared to $(x_t, s_t)$ in the policy functions Eq. (32) for the time $t \geq 1$ continuation Ramsey planners
The value function $V(x_t, s_t)$ of the time $t$ continuation Ramsey planner equals $E_t \sum_{\tau=t}^\infty \beta^{\tau - t} u(c_\tau, l_\tau)$, where the consumption and leisure processes are evaluated along the original time 0 Ramsey plan
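
When consumption and labor depend only on the current Markov state, the expected discounted sum of utilities satisfies the linear recursion $V = U + \beta \Pi V$ state by state, so it can be computed with one linear solve, much as the `time1_value` method above does. The sketch below uses the CRRA utility from this lecture with hypothetical parameter values and a hypothetical allocation; the helper name `discounted_value` is an assumption.

```python
import numpy as np

def discounted_value(U, Pi, beta):
    """Solve V = U + beta * Pi @ V for the state-by-state value vector (sketch)."""
    S = len(U)
    return np.linalg.solve(np.eye(S) - beta * Pi, U)

beta, sigma, gamma = 0.9, 2.0, 2.0
Pi = np.array([[0.9, 0.1],
               [0.5, 0.5]])

# Hypothetical state-contingent allocation with c + g = n
c = np.array([0.95, 0.90])
n = np.array([1.05, 1.10])

U = c**(1 - sigma) / (1 - sigma) - n**(1 + gamma) / (1 + gamma)
V = discounted_value(U, Pi, beta)
```

A quick sanity check on the helper: if utility equals one in every state, the value is $1 / (1 - \beta)$ in every state.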

79.4.5 First-Order Conditions

Attach a Lagrange multiplier $\Phi_1(x, s)$ to constraint Eq. (31) and a Lagrange multiplier $\Phi_0$ to constraint Eq. (26)
Time $t \geq 1$: the first-order conditions for the time $t \geq 1$ constrained maximization problem on the right side of the continuation Ramsey planner’s Bellman equation Eq. (30) are

$$
\beta \Pi(s' | s) V_x(x', s') - \beta \Pi(s' | s) \Phi_1 = 0 \tag{36}
$$

for $x'(s')$ and

$$
(1 + \Phi_1)(u_c - u_l) + \Phi_1 [n (u_{ll} - u_{lc}) + (n - g(s))(u_{cc} - u_{lc})] = 0 \tag{37}
$$

for $n$
Given $\Phi_1$, equation Eq. (37) is one equation to be solved for $n$ as a function of $s$ (or of $g(s)$)
Equation Eq. (36) implies $V_x(x', s') = \Phi_1$, while an envelope condition is $V_x(x, s) = \Phi_1$, so it follows that

$$
V_x(x', s') = V_x(x, s) = \Phi_1(x, s) \tag{38}
$$

Time $t = 0$: For the time 0 problem on the right side of the Ramsey planner’s Bellman equation Eq. (33), first-order conditions are

$$
V_x(x(s_1), s_1) = \Phi_0 \tag{39}
$$

for $x(s_1), s_1 \in S$, and

$$
\begin{aligned}
(1 + \Phi_0)(u_{c,0} - u_{l,0}) &+ \Phi_0 [n_0 (u_{ll,0} - u_{lc,0}) + (n_0 - g(s_0))(u_{cc,0} - u_{cl,0})] \\
&- \Phi_0 (u_{cc,0} - u_{cl,0}) b_0 = 0
\end{aligned} \tag{40}
$$

Notice similarities and differences between the first-order conditions for $t \geq 1$ and for $t = 0$
An additional term is present in Eq. (40) except in three special cases

• 𝑏0 = 0, or
• 𝑢𝑐 is constant (i.e., preferences are quasi-linear in consumption), or
• initial government assets are sufficiently large to finance all government purchases with
interest earnings from those assets so that Φ0 = 0

Except in these special cases, the allocation and the labor tax rate as functions of 𝑠𝑡 differ
between dates 𝑡 = 0 and subsequent dates 𝑡 ≥ 1
Naturally, the first-order conditions in this recursive formulation of the Ramsey problem
agree with the first-order conditions derived when we first formulated the Ramsey plan in the
space of sequences

79.4.6 State Variable Degeneracy

Equations Eq. (38) and Eq. (39) imply that $\Phi_0 = \Phi_1$ and that

$$
V_x(x_t, s_t) = \Phi_0 \tag{41}
$$

for all $t \geq 1$
When $V$ is concave in $x$, this implies state-variable degeneracy along a Ramsey plan in the sense that for $t \geq 1$, $x_t$ will be a time-invariant function of $s_t$
Given $\Phi_0$, this function mapping $s_t$ into $x_t$ can be expressed as a vector $\vec{x}$ that solves equation Eq. (34) for $n$ and $c$ as functions of $g$ that are associated with $\Phi = \Phi_0$

79.4.7 Manifestations of Time Inconsistency

While the marginal utility adjusted level of government debt 𝑥𝑡 is a key state variable for the
continuation Ramsey planners at 𝑡 ≥ 1, it is not a state variable at time 0
The time 0 Ramsey planner faces 𝑏0 , not 𝑥0 = 𝑢𝑐,0 𝑏0 , as a state variable
The discrepancy in state variables faced by the time 0 Ramsey planner and the time 𝑡 ≥ 1
continuation Ramsey planners captures the differing obligations and incentives faced by the
time 0 Ramsey planner and the time 𝑡 ≥ 1 continuation Ramsey planners

• The time 0 Ramsey planner is obligated to honor government debt 𝑏0 measured in time
0 consumption goods
• The time 0 Ramsey planner can manipulate the value of government debt as measured
by 𝑢𝑐,0 𝑏0
• In contrast, time 𝑡 ≥ 1 continuation Ramsey planners are obligated not to alter values
of debt, as measured by 𝑢𝑐,𝑡 𝑏𝑡 , that they inherit from a preceding Ramsey planner or
continuation Ramsey planner

When government expenditures $g_t$ are a time-invariant function of a Markov state $s_t$, a Ramsey plan and associated Ramsey allocation feature marginal utilities of consumption $u_c(s_t)$ that, given $\Phi$, for $t \geq 1$ depend only on $s_t$, but that for $t = 0$ depend on $b_0$ as well
This means that $u_c(s_t)$ will be a time-invariant function of $s_t$ for $t \geq 1$, but except when $b_0 = 0$, a different function for $t = 0$

This in turn means that prices of one-period Arrow securities 𝑝𝑡+1 (𝑠𝑡+1 |𝑠𝑡 ) = 𝑝(𝑠𝑡+1 |𝑠𝑡 ) will
be the same time-invariant functions of (𝑠𝑡+1 , 𝑠𝑡 ) for 𝑡 ≥ 1, but a different function 𝑝0 (𝑠1 |𝑠0 )
for 𝑡 = 0, except when 𝑏0 = 0
The differences between these time 0 and time 𝑡 ≥ 1 objects reflect the Ramsey planner’s
incentive to manipulate Arrow security prices and, through them, the value of initial govern-
ment debt 𝑏0
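
The contrast between the time-invariant Arrow price function for $t \geq 1$ and the distinct time 0 price function can be seen numerically. The sketch below builds both from the household's pricing condition, discount factor times transition probability times the ratio of next-period to current marginal utility. The marginal utilities, the time 0 value `u_c0`, and the helper name `arrow_prices` are hypothetical; in a solved model `u_c0` would come from the time 0 allocation and so would depend on initial debt.

```python
import numpy as np

def arrow_prices(Pi, u_c_next, u_c_now, beta):
    """Matrix of Arrow prices; rows index today's state, columns tomorrow's (sketch)."""
    Pi = np.asarray(Pi, dtype=float)
    u_c_next = np.asarray(u_c_next, dtype=float)
    u_c_now = np.asarray(u_c_now, dtype=float)
    return beta * Pi * u_c_next[None, :] / u_c_now[:, None]

beta = 0.9
Pi = np.array([[0.9, 0.1],
               [0.5, 0.5]])
u_c = np.array([1.2, 1.5])     # hypothetical marginal utilities for t >= 1

# Time-invariant price function for t >= 1
p = arrow_prices(Pi, u_c, u_c, beta)

# Time 0 prices use the time 0 marginal utility, which need not equal u_c[s0]
s0, u_c0 = 0, 1.3              # hypothetical values
p0 = beta * Pi[s0] * u_c / u_c0
```

Whenever `u_c0` differs from `u_c[s0]`, the vector `p0` differs from row `s0` of `p`, which is the price-manipulation channel described above.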

79.4.8 Recursive Implementation

The above steps are implemented in a class called RecursiveAllocation

In [3]: import numpy as np
from quantecon import MarkovChain
from scipy.interpolate import UnivariateSpline
from scipy.optimize import fmin_slsqp, root

# SequentialAllocation must already be defined (sequential implementation)


class RecursiveAllocation:

    '''
    Compute the planner's allocation by solving Bellman
    equation.
    '''

    def __init__(self, model, μgrid):

        self.β, self.π, self.G = model.β, model.π, model.G
        self.mc, self.S = MarkovChain(self.π), len(model.π)  # Number of states
        self.Θ, self.model, self.μgrid = model.Θ, model, μgrid

        # Solve the time 1 Bellman equation, then switch to the time 0 problem
        self.solve_time1_bellman()
        self.T.time_0 = True  # Bellman equation now solves time 0 problem

    def solve_time1_bellman(self):
        '''
        Solve the time 1 Bellman equation for calibration model and initial grid μgrid0
        '''
        model, μgrid0 = self.model, self.μgrid
        S = len(model.π)

        # First get initial fit
        PP = SequentialAllocation(model)
        c, n, x, V = map(np.vstack, zip(*map(lambda μ: PP.time1_value(μ), μgrid0)))

        Vf, cf, nf, xprimef = {}, {}, {}, {}
        for s in range(S):  # Loop over all S states
            ind = np.argsort(x[:, s])  # Sort x
            c, n, x, V = c[ind], n[ind], x[ind], V[ind]  # Sort arrays according to x
            cf[s] = UnivariateSpline(x[:, s], c[:, s])
            nf[s] = UnivariateSpline(x[:, s], n[:, s])
            Vf[s] = UnivariateSpline(x[:, s], V[:, s])
            for sprime in range(S):
                xprimef[s, sprime] = UnivariateSpline(x[:, s], x[:, s])
        policies = [cf, nf, xprimef]

        # Create xgrid
        xbar = [x.min(0).max(), x.max(0).min()]
        xgrid = np.linspace(xbar[0], xbar[1], len(μgrid0))
        self.xgrid = xgrid

        # Now iterate on bellman equation
        T = BellmanEquation(model, xgrid, policies)
        diff = 1
        while diff > 1e-7:
            PF = T(Vf)
            Vfnew, policies = self.fit_policy_function(PF)
            diff = 0
            for s in range(S):
                diff = max(diff, np.abs(
                    (Vf[s](xgrid) - Vfnew[s](xgrid)) / Vf[s](xgrid)).max())
            Vf = Vfnew

        # Store value function policies and Bellman Equations
        self.Vf = Vf
        self.policies = policies
        self.T = T

    def fit_policy_function(self, PF):
        '''
        Fits the policy functions PF using the points xgrid using UnivariateSpline
        '''
        xgrid, S = self.xgrid, self.S

        Vf, cf, nf, xprimef = {}, {}, {}, {}
        for s in range(S):
            # Build a list first: np.vstack does not accept a bare map object
            PFvec = np.vstack([PF(x, s) for x in xgrid])
            Vf[s] = UnivariateSpline(xgrid, PFvec[:, 0], s=0)
            cf[s] = UnivariateSpline(xgrid, PFvec[:, 1], s=0, k=1)
            nf[s] = UnivariateSpline(xgrid, PFvec[:, 2], s=0, k=1)
            for sprime in range(S):
                xprimef[s, sprime] = UnivariateSpline(
                    xgrid, PFvec[:, 3 + sprime], s=0, k=1)

        return Vf, [cf, nf, xprimef]

    def Τ(self, c, n):
        '''
        Computes Τ given c, n
        '''
        model = self.model
        Uc, Un = model.Uc(c, n), model.Un(c, n)

        return 1 + Un / (self.Θ * Uc)

    def time0_allocation(self, B_, s0):
        '''
        Finds the optimal allocation given initial government debt B_ and state s_0
        '''
        PF = self.T(self.Vf)
        z0 = PF(B_, s0)
        c0, n0, xprime0 = z0[1], z0[2], z0[3:]
        return c0, n0, xprime0

    def simulate(self, B_, s_0, T, sHist=None):
        '''
        Simulates Ramsey plan for T periods
        '''
        model, π = self.model, self.π
        Uc = model.Uc
        cf, nf, xprimef = self.policies

        if sHist is None:
            sHist = self.mc.simulate(T, s_0)

        cHist, nHist, Bhist, ΤHist, μHist = np.zeros((5, T))
        RHist = np.zeros(T - 1)

        # Time 0
        cHist[0], nHist[0], xprime = self.time0_allocation(B_, s_0)
        ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
        Bhist[0] = B_
        μHist[0] = 0

        # Time 1 onward
        for t in range(1, T):
            s, x = sHist[t], xprime[sHist[t]]
            c, n, xprime = np.empty(self.S), nf[s](x), np.empty(self.S)
            for shat in range(self.S):
                c[shat] = cf[shat](x)
            for sprime in range(self.S):
                xprime[sprime] = xprimef[s, sprime](x)

            Τ = self.Τ(c, n)[s]
            u_c = Uc(c, n)
            Eu_c = π[sHist[t - 1]] @ u_c
            μHist[t] = self.Vf[s](x, 1)  # Derivative of the value function at x

            RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (self.β * Eu_c)

            cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n, x / u_c[s], Τ

        # Return a list: RHist has length T - 1, so the series cannot be
        # stacked into a rectangular array
        return [cHist, nHist, Bhist, ΤHist, sHist, μHist, RHist]

class BellmanEquation:

    '''
    Bellman equation for the continuation of the Lucas-Stokey Problem
    '''

    def __init__(self, model, xgrid, policies0):

        self.β, self.π, self.G = model.β, model.π, model.G
        self.S = len(model.π)  # Number of states
        self.Θ, self.model = model.Θ, model

        self.xbar = [min(xgrid), max(xgrid)]
        self.time_0 = False

        # Initial guesses for the optimizer at each (x, s) grid point
        self.z0 = {}
        cf, nf, xprimef = policies0
        for s in range(self.S):
            for x in xgrid:
                xprime0 = np.empty(self.S)
                for sprime in range(self.S):
                    xprime0[sprime] = xprimef[s, sprime](x)
                self.z0[x, s] = np.hstack([cf[s](x), nf[s](x), xprime0])

        self.find_first_best()

    def find_first_best(self):
        '''
        Find the first best allocation
        '''
        model = self.model
        S, Θ, Uc, Un, G = self.S, self.Θ, model.Uc, model.Un, self.G

        def res(z):
            c = z[:S]
            n = z[S:]
            return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])

        res = root(res, 0.5 * np.ones(2 * S))

        if not res.success:
            raise Exception('Could not find first best')

        self.cFB = res.x[:S]
        self.nFB = res.x[S:]
        IFB = Uc(self.cFB, self.nFB) * self.cFB + Un(self.cFB, self.nFB) * self.nFB
        self.xFB = np.linalg.solve(np.eye(S) - self.β * self.π, IFB)
        self.zFB = {}

        for s in range(S):
            self.zFB[s] = np.hstack([self.cFB[s], self.nFB[s], self.xFB])

    def __call__(self, Vf):
        '''
        Given the next-period continuation value function Vf, return a function
        that computes this period's value T(V) and the optimal policies
        '''
        if not self.time_0:
            def PF(x, s): return self.get_policies_time1(x, s, Vf)
        else:
            def PF(B_, s0): return self.get_policies_time0(B_, s0, Vf)
        return PF

    def get_policies_time1(self, x, s, Vf):
        '''
        Finds the optimal policies for t >= 1 given state (x, s)
        '''
        model, β, Θ = self.model, self.β, self.Θ
        G, S, π = self.G, self.S, self.π
        U, Uc, Un = model.U, model.Uc, model.Un

        def objf(z):
            c, n, xprime = z[0], z[1], z[2:]
            Vprime = np.empty(S)
            for sprime in range(S):
                Vprime[sprime] = Vf[sprime](xprime[sprime])

            return -(U(c, n) + β * π[s] @ Vprime)

        def cons(z):
            c, n, xprime = z[0], z[1], z[2:]
            return np.hstack([x - Uc(c, n) * c - Un(c, n) * n - β * π[s] @ xprime,
                              (Θ * n - c - G)[s]])

        out, fx, _, imode, smode = fmin_slsqp(objf,
                                              self.z0[x, s],
                                              f_eqcons=cons,
                                              bounds=[(0, 100), (0, 100)] +
                                              [self.xbar] * S,
                                              full_output=True,
                                              iprint=0,
                                              acc=1e-10)

        if imode > 0:
            raise Exception(smode)

        self.z0[x, s] = out  # Warm start for the next call at this grid point
        return np.hstack([-fx, out])

    def get_policies_time0(self, B_, s0, Vf):
        '''
        Finds the optimal policies at t = 0 given initial debt B_ and state s0
        '''
        model, β, Θ = self.model, self.β, self.Θ
        G, S, π = self.G, self.S, self.π
        U, Uc, Un = model.U, model.Uc, model.Un

        def objf(z):
            c, n, xprime = z[0], z[1], z[2:]
            Vprime = np.empty(S)
            for sprime in range(S):
                Vprime[sprime] = Vf[sprime](xprime[sprime])

            return -(U(c, n) + β * π[s0] @ Vprime)

        def cons(z):
            c, n, xprime = z[0], z[1], z[2:]
            return np.hstack([-Uc(c, n) * (c - B_) - Un(c, n) * n - β * π[s0] @ xprime,
                              (Θ * n - c - G)[s0]])

        out, fx, _, imode, smode = fmin_slsqp(objf, self.zFB[s0], f_eqcons=cons,
                                              bounds=[(0, 100), (0, 100)] +
                                              [self.xbar] * S,
                                              full_output=True, iprint=0, acc=1e-10)

        if imode > 0:
            raise Exception(smode)

        return np.hstack([-fx, out])
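
The classes above lean on two features of UnivariateSpline: exact interpolation with s=0 (piecewise linear when k=1), and evaluation of a derivative through the second call argument, as in self.Vf[s](x, 1)
The following is a minimal sketch of those two features on a hypothetical grid and a made-up linear function, not part of the lecture's own code

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical grid and values, chosen only for illustration
xgrid = np.linspace(0.0, 2.0, 9)
values = 3.0 * xgrid + 1.0              # a linear function of x

# s=0 forces the spline through every data point; k=1 makes it piecewise linear
spline = UnivariateSpline(xgrid, values, s=0, k=1)

x_mid = 0.7                             # a point between grid nodes
fitted_value = float(spline(x_mid))     # interpolated level
fitted_slope = float(spline(x_mid, 1))  # second argument = order of derivative

print(fitted_value, fitted_slope)
```

Because the data are exactly linear, the fitted level and slope reproduce the underlying function, which makes the example easy to check by hand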

79.5 Examples

79.5.1 Anticipated One-Period War

This example illustrates in a simple setting how a Ramsey planner manages risk
Government expenditures are known for sure in all periods except one

• For 𝑡 < 3 and 𝑡 > 3 we assume that 𝑔𝑡 = 𝑔𝑙 = 0.1


• At 𝑡 = 3 a war occurs with probability 0.5
– If there is war, 𝑔3 = 𝑔ℎ = 0.2
– If there is no war 𝑔3 = 𝑔𝑙 = 0.1

We define the components of the state vector as the following six (𝑡, 𝑔) pairs:
(0, 𝑔𝑙 ), (1, 𝑔𝑙 ), (2, 𝑔𝑙 ), (3, 𝑔𝑙 ), (3, 𝑔ℎ ), (𝑡 ≥ 4, 𝑔𝑙 )
We think of these 6 states as corresponding to 𝑠 = 1, 2, 3, 4, 5, 6
The transition matrix is

$$
\Pi = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.5 & 0.5 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
$$

Government expenditures at each state are

$$
g = \begin{pmatrix}
0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.2 \\ 0.1
\end{pmatrix}
$$
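
As a quick check on this specification, the sketch below propagates the state distribution forward from 𝑠 = 1 and computes expected government expenditures at each date
It uses the state ordering of the text, and the variable names are illustrative only

```python
import numpy as np

# Transition matrix and expenditures, ordered as in the text
Π = np.array([[0, 1, 0, 0,   0,   0],
              [0, 0, 1, 0,   0,   0],
              [0, 0, 0, 0.5, 0.5, 0],
              [0, 0, 0, 0,   0,   1],
              [0, 0, 0, 0,   0,   1],
              [0, 0, 0, 0,   0,   1]])
g = np.array([0.1, 0.1, 0.1, 0.1, 0.2, 0.1])

# Each row of a transition matrix must sum to one
assert np.allclose(Π.sum(axis=1), 1.0)

# Start in the first state with probability one
dist = np.zeros(6)
dist[0] = 1.0

expected_g = []
for t in range(6):
    expected_g.append(dist @ g)   # E[g_t] under the time-t distribution
    dist = dist @ Π               # move the distribution one period ahead

print(np.round(expected_g, 3))
```

Expected expenditures equal 0.1 at every date except 𝑡 = 3, where the two equally likely outcomes average to 0.15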

We assume that the representative agent has utility function

$$
u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}
$$

and set 𝜎 = 2, 𝛾 = 2, and the discount factor 𝛽 = 0.9


Note: For convenience in terms of matching our code, we have expressed utility as a function
of 𝑛 rather than leisure 𝑙
This utility function is implemented in the class CRRAutility

In [4]: import numpy as np

class CRRAutility:

    def __init__(self,
                 β=0.9,
                 σ=2,
                 γ=2,
                 π=0.5*np.ones((2, 2)),
                 G=np.array([0.1, 0.2]),
                 Θ=np.ones(2),
                 transfers=False):

        self.β, self.σ, self.γ = β, σ, γ
        self.π, self.G, self.Θ, self.transfers = π, G, Θ, transfers

    # Utility function
    def U(self, c, n):
        σ = self.σ
        if σ == 1.:
            U = np.log(c)
        else:
            U = (c**(1 - σ) - 1) / (1 - σ)
        return U - n**(1 + self.γ) / (1 + self.γ)

    # Derivatives of utility function
    def Uc(self, c, n):
        return c**(-self.σ)

    def Ucc(self, c, n):
        return -self.σ * c**(-self.σ - 1)

    def Un(self, c, n):
        return -n**self.γ

    def Unn(self, c, n):
        return -self.γ * n**(self.γ - 1)

We set initial government debt 𝑏0 = 1


We can now plot the Ramsey tax under both realizations of time 𝑡 = 3 government expendi-
tures

• black when 𝑔3 = .1, and


• red when 𝑔3 = .2

In [5]: import matplotlib.pyplot as plt
%matplotlib inline

time_π = np.array([[0, 1, 0, 0,   0,   0],
                   [0, 0, 1, 0,   0,   0],
                   [0, 0, 0, 0.5, 0.5, 0],
                   [0, 0, 0, 0,   0,   1],
                   [0, 0, 0, 0,   0,   1],
                   [0, 0, 0, 0,   0,   1]])

# Note: in this array the war state (g = 0.2) has index 3 and the no-war
# time-3 state has index 4, the reverse of the ordering in the text above
time_G = np.array([0.1, 0.1, 0.1, 0.2, 0.1, 0.1])
time_Θ = np.ones(6)  # Θ can in principle be random
time_example = CRRAutility(π=time_π, G=time_G, Θ=time_Θ)

time_allocation = SequentialAllocation(time_example)  # Solve sequential problem

sHist_h = np.array([0, 1, 2, 3, 5, 5, 5])  # History with war at t = 3
sHist_l = np.array([0, 1, 2, 4, 5, 5, 5])  # History without war at t = 3
sim_seq_h = time_allocation.simulate(1, 0, 7, sHist_h)
sim_seq_l = time_allocation.simulate(1, 0, 7, sHist_l)

# Government spending paths
sim_seq_l[4] = time_example.G[sHist_l]
sim_seq_h[4] = time_example.G[sHist_h]

# Output paths
sim_seq_l[5] = time_example.Θ[sHist_l] * sim_seq_l[1]
sim_seq_h[5] = time_example.Θ[sHist_h] * sim_seq_h[1]

fig, axes = plt.subplots(3, 2, figsize=(14, 10))
titles = ['Consumption', 'Labor Supply', 'Government Debt',
          'Tax Rate', 'Government Spending', 'Output']

for ax, title, sim_l, sim_h in zip(axes.flatten(), titles, sim_seq_l, sim_seq_h):
    ax.set(title=title)
    ax.plot(sim_l, '-ok', sim_h, '-or', alpha=0.7)
    ax.grid()

plt.tight_layout()
plt.show()

Tax smoothing

• the tax rate is constant for all 𝑡 ≥ 1

– For 𝑡 ≥ 1, 𝑡 ≠ 3, this is a consequence of 𝑔𝑡 being the same at all those dates


– For 𝑡 = 3, it is a consequence of the special one-period utility function that we
have assumed
– Under other one-period utility functions, the time 𝑡 = 3 tax rate could be either
higher or lower than for dates 𝑡 ≥ 1, 𝑡 ≠ 3

• the tax rate is the same at 𝑡 = 3 for both the high 𝑔𝑡 outcome and the low 𝑔𝑡 outcome

We have assumed that at 𝑡 = 0, the government owes positive debt 𝑏0


It sets the time 𝑡 = 0 tax rate partly with an eye to reducing the value 𝑢𝑐,0 𝑏0 of 𝑏0

It does this by increasing consumption at time 𝑡 = 0 relative to consumption in later periods


This has the consequence of lowering the time 𝑡 = 0 value of the gross interest rate for risk-free
loans between periods 𝑡 and 𝑡 + 1, which equals

$$
R_t = \frac{u_{c,t}}{\beta \mathbb{E}_t [u_{c,t+1}]}
$$

A tax policy that makes time 𝑡 = 0 consumption higher than time 𝑡 = 1 consumption
lowers 𝑢𝑐,0 relative to 𝑢𝑐,1 and therefore decreases the risk-free one-period interest rate, 𝑅𝑡 , at 𝑡 = 0
Lowering the time 𝑡 = 0 risk-free interest rate makes time 𝑡 = 0 consumption goods cheaper
relative to consumption goods at later dates, thereby lowering the value 𝑢𝑐,0 𝑏0 of initial
government debt 𝑏0
We see this in a figure below that plots the time path for the risk-free interest rate under
both realizations of the time 𝑡 = 3 government expenditure shock
The following plot illustrates how the government lowers the interest rate at time 0 by raising
consumption

In [6]: plt.figure(figsize=(8, 5))


plt.title('Gross Interest Rate')
plt.plot(sim_seq_l[-1], '-ok', sim_seq_h[-1], '-or', alpha=0.7)
plt.grid()
plt.show()
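
The formula for the gross interest rate can also be evaluated by hand
The sketch below does so for the CRRA marginal utility 𝑢𝑐 = 𝑐^(−𝜎) with 𝜎 = 2 and 𝛽 = 0.9, using hypothetical consumption levels and a deterministic next period, so that the effect of a higher 𝑐0 on 𝑅0 can be read off directly

```python
import numpy as np

β, σ = 0.9, 2.0                      # parameter values used in this example


def u_c(c):
    """CRRA marginal utility of consumption."""
    return c ** (-σ)


def gross_rate(c_today, c_tomorrow, probs):
    """R_t = u_{c,t} / (β E_t[u_{c,t+1}])."""
    expected_uc = probs @ u_c(np.asarray(c_tomorrow))
    return u_c(c_today) / (β * expected_uc)


# Hypothetical numbers: a single, certain state next period
c_next = np.array([0.6])
probs = np.array([1.0])

R_flat = gross_rate(0.6, c_next, probs)      # c_0 equal to c_1
R_high_c0 = gross_rate(0.7, c_next, probs)   # c_0 above c_1

print(R_flat, R_high_c0)
```

With flat consumption the rate equals 1/𝛽, while consumption that is higher at 𝑡 = 0 than at 𝑡 = 1 gives a smaller 𝑢𝑐,0 and hence a smaller 𝑅0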

79.5.2 Government Saving

At time 𝑡 = 0 the government evidently dissaves since 𝑏1 > 𝑏0



• This is a consequence of it setting a lower tax rate at 𝑡 = 0, implying more
consumption at 𝑡 = 0

At time 𝑡 = 1, the government evidently saves since it has set the tax rate sufficiently high to
allow it to set 𝑏2 < 𝑏1

• Its motive for doing this is that it anticipates a likely war at 𝑡 = 3

At time 𝑡 = 2 the government trades state-contingent Arrow securities to hedge against war
at 𝑡 = 3

• It purchases a security that pays off when 𝑔3 = 𝑔ℎ


• It sells a security that pays off when 𝑔3 = 𝑔𝑙
• These purchases are designed in such a way that regardless of whether or not
there is a war at 𝑡 = 3, the government will begin period 𝑡 = 4 with the same
government debt
• The time 𝑡 = 4 debt level can be serviced with revenues from the constant
tax rate set at times 𝑡 ≥ 1

At times 𝑡 ≥ 4 the government rolls over its debt, knowing that the tax rate is set at the level
required to service the interest payments on the debt and government expenditures

79.5.3 Time 0 Manipulation of Interest Rate

We have seen that when 𝑏0 > 0, the Ramsey plan sets the time 𝑡 = 0 tax rate partly with an
eye toward lowering the risk-free interest rate for one-period loans between times 𝑡 = 0 and 𝑡 = 1
By lowering this interest rate, the plan makes time 𝑡 = 0 goods cheap relative to consumption
goods at later times
By doing this, it lowers the value of time 𝑡 = 0 debt that it has inherited and must finance

79.5.4 Time 0 and Time-Inconsistency

In the preceding example, the Ramsey tax rate at time 0 differs from its value at time 1
To explore what is going on here, let’s simplify things by removing the possibility of war at
time 𝑡 = 3
The Ramsey problem then includes no randomness because 𝑔𝑡 = 𝑔𝑙 for all 𝑡
The figure below plots the Ramsey tax rates and gross interest rates at time 𝑡 = 0 and time
𝑡 ≥ 1 as functions of the initial government debt (using the sequential allocation solution and
a CRRA utility function defined above)

In [7]: tax_sequence = SequentialAllocation(
    CRRAutility(G=0.15, π=np.ones((1, 1)), Θ=np.ones(1)))

n = 100
tax_policy = np.empty((n, 2))
interest_rate = np.empty((n, 2))
gov_debt = np.linspace(-1.5, 1, n)

for i in range(n):
    tax_policy[i] = tax_sequence.simulate(gov_debt[i], 0, 2)[3]
    interest_rate[i] = tax_sequence.simulate(gov_debt[i], 0, 3)[-1]

fig, axes = plt.subplots(2, 1, figsize=(10, 8), sharex=True)
titles = ['Tax Rate', 'Gross Interest Rate']

for ax, title, plot in zip(axes, titles, [tax_policy, interest_rate]):
    ax.plot(gov_debt, plot[:, 0], gov_debt, plot[:, 1], lw=2)
    ax.set(title=title, xlim=(min(gov_debt), max(gov_debt)))
    ax.grid()

# Raw string so that \g is not treated as an escape sequence
axes[0].legend(('Time $t=0$', r'Time $t \geq 1$'))
axes[1].set_xlabel('Initial Government Debt')

fig.tight_layout()
plt.show()

The figure indicates that if the government enters with positive debt, it sets a tax rate at 𝑡 =
0 that is less than all later tax rates
By setting a lower tax rate at 𝑡 = 0, the government raises consumption, which reduces the
value 𝑢𝑐,0 𝑏0 of its initial debt
It does this by increasing 𝑐0 and thereby lowering 𝑢𝑐,0
Conversely, if 𝑏0 < 0, the Ramsey planner sets the tax rate at 𝑡 = 0 higher than in subsequent
periods
A side effect of lowering time 𝑡 = 0 consumption is that it raises the one-period interest rate
at time 0 above that of subsequent periods

There are only two values of initial government debt at which the tax rate is constant for all
𝑡≥0
The first is 𝑏0 = 0

• Here the government can’t use the 𝑡 = 0 tax rate to alter the value of the
initial debt

The second occurs when the government enters with sufficiently large assets that the Ramsey
planner can achieve first best and sets 𝜏𝑡 = 0 for all 𝑡
It is only for these two values of initial government debt that the Ramsey plan is time-
consistent
Another way of saying this is that, except for these two values of initial government debt, a
continuation of a Ramsey plan is not a Ramsey plan
To illustrate this, consider a Ramsey planner who starts with an initial government debt 𝑏1
associated with one of the Ramsey plans computed above
Call 𝜏1𝑅 the time 𝑡 = 0 tax rate chosen by the Ramsey planner confronting this value for
initial government debt
The figure below shows both the tax rate at time 1 chosen by our original Ramsey planner
and what a new Ramsey planner would choose for its time 𝑡 = 0 tax rate

In [8]: tax_sequence = SequentialAllocation(
    CRRAutility(G=0.15, π=np.ones((1, 1)), Θ=np.ones(1)))

n = 100
tax_policy = np.empty((n, 2))
τ_reset = np.empty((n, 2))
gov_debt = np.linspace(-1.5, 1, n)

for i in range(n):
    tax_policy[i] = tax_sequence.simulate(gov_debt[i], 0, 2)[3]
    τ_reset[i] = tax_sequence.simulate(gov_debt[i], 0, 1)[3]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(gov_debt, tax_policy[:, 1], gov_debt, τ_reset, lw=2)
ax.set(xlabel='Initial Government Debt', title='Tax Rate',
       xlim=(min(gov_debt), max(gov_debt)))
ax.legend((r'$\tau_1$', r'$\tau_1^R$'))
ax.grid()

fig.tight_layout()
plt.show()

The tax rates in the figure are equal for only two values of initial government debt

79.5.5 Tax Smoothing and non-CRRA Preferences

The complete tax smoothing for 𝑡 ≥ 1 in the preceding example is a consequence of our hav-
ing assumed CRRA preferences
To see what is driving this outcome, we begin by noting that the Ramsey tax rate for 𝑡 ≥ 1
is a time-invariant function 𝜏 (Φ, 𝑔) of the Lagrange multiplier on the implementability con-
straint and government expenditures
For CRRA preferences, we can exploit the relations 𝑈𝑐𝑐 𝑐 = −𝜎𝑈𝑐 and 𝑈𝑛𝑛 𝑛 = 𝛾𝑈𝑛 to derive

$$
\frac{(1 + (1 - \sigma)\Phi) U_c}{(1 + (1 - \gamma)\Phi) U_n} = 1
$$

from the first-order conditions


This equation immediately implies that the tax rate is constant
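
The two CRRA relations used in this derivation can be confirmed numerically
The sketch below uses the same derivative formulas as the CRRAutility class with 𝜎 = 𝛾 = 2, evaluated on hypothetical grids for 𝑐 and 𝑛

```python
import numpy as np

σ, γ = 2.0, 2.0   # parameter values from the example


# Derivatives of u(c, n) = c**(1 - σ)/(1 - σ) - n**(1 + γ)/(1 + γ)
def Uc(c):
    return c ** (-σ)


def Ucc(c):
    return -σ * c ** (-σ - 1)


def Un(n):
    return -n ** γ


def Unn(n):
    return -γ * n ** (γ - 1)


# Hypothetical evaluation points
c_vals = np.linspace(0.2, 1.5, 5)
n_vals = np.linspace(0.2, 0.9, 5)

ratio_c = Ucc(c_vals) * c_vals / Uc(c_vals)   # U_cc c / U_c
ratio_n = Unn(n_vals) * n_vals / Un(n_vals)   # U_nn n / U_n

print(ratio_c, ratio_n)
```

Both ratios are constant across the grids (−𝜎 and 𝛾 respectively), which is the property that removes 𝑐 and 𝑛 from the bracketed terms of the first-order conditions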
For other preferences, the tax rate may not be constant
For example, let the period utility function be

𝑢(𝑐, 𝑛) = log(𝑐) + 0.69 log(1 − 𝑛)

We will create a new class LogUtility to represent this utility function

In [9]: class LogUtility:

    def __init__(self,
                 β=0.9,
                 ψ=0.69,
                 π=0.5*np.ones((2, 2)),
                 G=np.array([0.1, 0.2]),
                 Θ=np.ones(2),
                 transfers=False):

        self.β, self.ψ, self.π = β, ψ, π
        self.G, self.Θ, self.transfers = G, Θ, transfers

    # Utility function
    def U(self, c, n):
        return np.log(c) + self.ψ * np.log(1 - n)

    # Derivatives of utility function
    def Uc(self, c, n):
        return 1 / c

    def Ucc(self, c, n):
        return -c**(-2)

    def Un(self, c, n):
        return -self.ψ / (1 - n)

    def Unn(self, c, n):
        return -self.ψ / (1 - n)**2

Also, suppose that 𝑔𝑡 follows a two-state IID process with equal probabilities attached to 𝑔𝑙
and 𝑔ℎ
To compute the tax rate, we will use both the sequential and recursive approaches described
above
The figure below plots a sample path of the Ramsey tax rate

In [10]: log_example = LogUtility()
seq_log = SequentialAllocation(log_example)  # Solve sequential problem

# Initialize grid for value function iteration and solve
μ_grid = np.linspace(-0.6, 0.0, 200)
bel_log = RecursiveAllocation(log_example, μ_grid)  # Solve recursive problem

T = 20
sHist = np.array([0, 0, 0, 0, 0, 0, 0,
                  0, 1, 1, 0, 0, 0, 1,
                  1, 1, 1, 1, 1, 0])

# Simulate
sim_seq = seq_log.simulate(0.5, 0, T, sHist)
sim_bel = bel_log.simulate(0.5, 0, T, sHist)

# Government spending paths
sim_seq[4] = log_example.G[sHist]
sim_bel[4] = log_example.G[sHist]

# Output paths
sim_seq[5] = log_example.Θ[sHist] * sim_seq[1]
sim_bel[5] = log_example.Θ[sHist] * sim_bel[1]

fig, axes = plt.subplots(3, 2, figsize=(14, 10))
titles = ['Consumption', 'Labor Supply', 'Government Debt',
          'Tax Rate', 'Government Spending', 'Output']

for ax, title, sim_s, sim_b in zip(axes.flatten(), titles, sim_seq, sim_bel):
    ax.plot(sim_s, '-ob', sim_b, '-xk', alpha=0.7)
    ax.set(title=title)
    ax.grid()

axes.flatten()[0].legend(('Sequential', 'Recursive'))
fig.tight_layout()
plt.show()


As should be expected, the recursive and sequential solutions produce almost identical alloca-
tions
Unlike outcomes with CRRA preferences, the tax rate is not perfectly smoothed
Instead, the government raises the tax rate when 𝑔𝑡 is high

79.5.6 Further Comments

A related lecture describes an extension of the Lucas-Stokey model by Aiyagari, Marcet, Sar-
gent, and Seppälä (2002) [5]
In the AMSS economy, only a risk-free bond is traded
That lecture compares the recursive representation of the Lucas-Stokey model presented in
this lecture with one for an AMSS economy
By comparing these recursive formulations, we shall glean a sense in which the dimension of
the state is lower in the Lucas-Stokey model
Accompanying that difference in dimension will be different dynamics of government debt
80

Optimal Taxation without State-Contingent Debt

80.1 Contents

• Overview 80.2

• Competitive Equilibrium with Distorting Taxes 80.3

• Recursive Version of AMSS Model 80.4

• Examples 80.5

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

80.2 Overview

In an earlier lecture, we described a model of optimal taxation with state-contingent debt due
to Robert E. Lucas, Jr., and Nancy Stokey [90]
Aiyagari, Marcet, Sargent, and Seppälä [5] (hereafter, AMSS) studied optimal taxation in a
model without state-contingent debt
In this lecture, we

• describe assumptions and equilibrium concepts


• solve the model
• implement the model numerically
• conduct some policy experiments
• compare outcomes with those in a corresponding complete-markets model

We begin with an introduction to the model


80.3 Competitive Equilibrium with Distorting Taxes

Many but not all features of the economy are identical to those of the Lucas-Stokey economy
Let’s start with things that are identical
For 𝑡 ≥ 0, a history of the state is represented by $s^t = [s_t, s_{t-1}, \ldots, s_0]$
Government purchases 𝑔(𝑠) are an exact time-invariant function of 𝑠
Let 𝑐𝑡 (𝑠𝑡 ), ℓ𝑡 (𝑠𝑡 ), and 𝑛𝑡 (𝑠𝑡 ) denote consumption, leisure, and labor supply, respectively, at
history 𝑠𝑡 at time 𝑡
Each period a representative household is endowed with one unit of time that can be divided
between leisure ℓ𝑡 and labor 𝑛𝑡 :

𝑛𝑡 (𝑠𝑡 ) + ℓ𝑡 (𝑠𝑡 ) = 1 (1)

Output equals 𝑛𝑡 (𝑠𝑡 ) and can be divided between consumption 𝑐𝑡 (𝑠𝑡 ) and 𝑔(𝑠𝑡 )

𝑐𝑡 (𝑠𝑡 ) + 𝑔(𝑠𝑡 ) = 𝑛𝑡 (𝑠𝑡 ) (2)

Output is not storable


The technology pins down a pre-tax wage rate to unity for all 𝑡, 𝑠𝑡
A representative household’s preferences over $\{c_t(s^t), \ell_t(s^t)\}_{t=0}^\infty$ are ordered by

$$
\sum_{t=0}^\infty \sum_{s^t} \beta^t \pi_t(s^t) u[c_t(s^t), \ell_t(s^t)] \tag{3}
$$

where

• 𝜋𝑡 (𝑠𝑡 ) is a joint probability distribution over the sequence 𝑠𝑡 , and


• the utility function 𝑢 is increasing, strictly concave, and three times continuously differ-
entiable in both arguments

The government imposes a flat rate tax 𝜏𝑡 (𝑠𝑡 ) on labor income at time 𝑡, history 𝑠𝑡
Lucas and Stokey assumed that there are complete markets in one-period Arrow securities;
also see smoothing models
It is at this point that AMSS [5] modify the Lucas and Stokey economy
AMSS allow the government to issue only one-period risk-free debt each period
Ruling out complete markets in this way is a step in the direction of making total tax collec-
tions behave more like that prescribed in [11] than they do in [90]

80.3.1 Risk-free One-Period Debt Only

In period 𝑡 and history 𝑠𝑡 , let

• 𝑏𝑡+1 (𝑠𝑡 ) be the amount of the time 𝑡 + 1 consumption good that at time 𝑡 the govern-
ment promised to pay

• 𝑅𝑡 (𝑠𝑡 ) be the gross interest rate on risk-free one-period debt between periods 𝑡 and 𝑡 + 1
• 𝑇𝑡 (𝑠𝑡 ) be a non-negative lump-sum transfer to the representative household [1]

That 𝑏𝑡+1 (𝑠𝑡 ) is the same for all realizations of 𝑠𝑡+1 captures its risk-free character
The market value at time 𝑡 of government debt maturing at time 𝑡 + 1 equals 𝑏𝑡+1 (𝑠𝑡 ) divided
by 𝑅𝑡 (𝑠𝑡 )
The government’s budget constraint in period 𝑡 at history 𝑠𝑡 is

$$
\begin{aligned}
b_t(s^{t-1}) &= \tau^n_t(s^t) n_t(s^t) - g_t(s^t) - T_t(s^t) + \frac{b_{t+1}(s^t)}{R_t(s^t)} \\
&\equiv z(s^t) + \frac{b_{t+1}(s^t)}{R_t(s^t)},
\end{aligned}
\tag{4}
$$

where 𝑧(𝑠𝑡 ) is the net-of-interest government surplus


To rule out Ponzi schemes, we assume that the government is subject to a natural debt
limit (to be discussed in a forthcoming lecture)
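
To see how the budget constraint Eq. (4) moves debt forward, it can be rearranged as 𝑏𝑡+1 = 𝑅𝑡 (𝑏𝑡 − 𝑧𝑡 )
The sketch below iterates on this rearrangement in a deterministic setting with a constant gross interest rate 𝑅 = 1/𝛽
The constant interest rate, the surplus paths and the function name roll_debt are assumptions made only for this illustration

```python
β = 0.9
R = 1 / β              # hypothetical constant gross interest rate
b0 = 1.0               # initial debt
z = (1 - β) * b0       # surplus that exactly covers net interest on b0


def roll_debt(b_start, surpluses, R):
    """Iterate b_{t+1} = R * (b_t - z_t) along a given path of surpluses."""
    path = [b_start]
    for z_t in surpluses:
        path.append(R * (path[-1] - z_t))
    return path


steady = roll_debt(b0, [z] * 5, R)        # debt is rolled over unchanged
deficit = roll_debt(b0, [0.0] * 5, R)     # zero surplus: debt grows at rate R

print(steady)
print(deficit)
```

With a surplus of (1 − 𝛽)𝑏0 each period the debt stays at 𝑏0, while with zero surpluses it grows at the gross rate 𝑅, which is the kind of path a debt limit is meant to exclude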
The consumption Euler equation for a representative household able to trade only one-period
risk-free debt with one-period gross interest rate 𝑅𝑡 (𝑠𝑡 ) is

$$
\frac{1}{R_t(s^t)} = \sum_{s^{t+1}|s^t} \beta \pi_{t+1}(s^{t+1}|s^t) \frac{u_c(s^{t+1})}{u_c(s^t)}
$$

Substituting this expression into the government’s budget constraint Eq. (4) yields:

$$
b_t(s^{t-1}) = z(s^t) + \beta \sum_{s^{t+1}|s^t} \pi_{t+1}(s^{t+1}|s^t) \frac{u_c(s^{t+1})}{u_c(s^t)} b_{t+1}(s^t) \tag{5}
$$

Components of 𝑧(𝑠𝑡 ) on the right side depend on 𝑠𝑡 , but the left side is required to depend on
𝑠𝑡−1 only
This is what it means for one-period government debt to be risk-free
Therefore, the sum on the right side of equation Eq. (5) also has to depend only on 𝑠𝑡−1
This requirement will give rise to measurability constraints on the Ramsey allocation to
be discussed soon
If we replace 𝑏𝑡+1 (𝑠𝑡 ) on the right side of equation Eq. (5) by the right side of next period’s
budget constraint (associated with a particular realization 𝑠𝑡 ) we get

$$
b_t(s^{t-1}) = z(s^t) + \sum_{s^{t+1}|s^t} \beta \pi_{t+1}(s^{t+1}|s^t) \frac{u_c(s^{t+1})}{u_c(s^t)}
\left[ z(s^{t+1}) + \frac{b_{t+2}(s^{t+1})}{R_{t+1}(s^{t+1})} \right]
$$

After making similar repeated substitutions for all future occurrences of government indebt-
edness, and by invoking the natural debt limit, we arrive at:


$$
b_t(s^{t-1}) = \sum_{j=0}^\infty \sum_{s^{t+j}|s^t} \beta^j \pi_{t+j}(s^{t+j}|s^t) \frac{u_c(s^{t+j})}{u_c(s^t)} z(s^{t+j}) \tag{6}
$$

Now let’s

• substitute the resource constraint into the net-of-interest government surplus, and
• use the household’s first-order condition 1 − 𝜏𝑡𝑛 (𝑠𝑡 ) = 𝑢ℓ (𝑠𝑡 )/𝑢𝑐 (𝑠𝑡 ) to eliminate the
labor tax rate

so that we can express the net-of-interest government surplus 𝑧(𝑠𝑡 ) as

$$
z(s^t) = \left[ 1 - \frac{u_\ell(s^t)}{u_c(s^t)} \right] \left[ c_t(s^t) + g_t(s^t) \right] - g_t(s^t) - T_t(s^t) \tag{7}
$$

If we substitute the appropriate versions of the right side of Eq. (7) for 𝑧(𝑠𝑡+𝑗 ) into equation
Eq. (6), we obtain a sequence of implementability constraints on a Ramsey allocation in an
AMSS economy
Expression Eq. (6) at time 𝑡 = 0 and initial state 𝑠0 was also an implementability constraint
on a Ramsey allocation in a Lucas-Stokey economy:


$$
b_0(s^{-1}) = \mathbb{E}_0 \sum_{j=0}^\infty \beta^j \frac{u_c(s^j)}{u_c(s^0)} z(s^j) \tag{8}
$$

Indeed, it was the only implementability constraint there


But now we also have a large number of additional implementability constraints


$$
b_t(s^{t-1}) = \mathbb{E}_t \sum_{j=0}^\infty \beta^j \frac{u_c(s^{t+j})}{u_c(s^t)} z(s^{t+j}) \tag{9}
$$

Equation Eq. (9) must hold for each 𝑠𝑡 for each 𝑡 ≥ 1

80.3.2 Comparison with Lucas-Stokey Economy

The expression on the right side of Eq. (9) in the Lucas-Stokey (1983) economy would equal
the present value of a continuation stream of government surpluses evaluated at what would
be competitive equilibrium Arrow-Debreu prices at date 𝑡
In the Lucas-Stokey economy, that present value is measurable with respect to 𝑠𝑡
In the AMSS economy, the restriction that government debt be risk-free imposes that that
same present value must be measurable with respect to 𝑠𝑡−1
In a language used in the literature on incomplete markets models, it can be said that the
AMSS model requires that at each (𝑡, 𝑠𝑡 ) what would be the present value of continuation
government surpluses in the Lucas-Stokey model must belong to the marketable subspace
of the AMSS model
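
A small numerical sketch may help fix ideas
Suppose, purely as an assumption for this illustration, that the state follows a two-state Markov chain and that marginal utility and the surplus depend only on the current state
Then the present value on the right side of Eq. (9), scaled by 𝑢𝑐 , solves a linear system of the same form as the one used for x in the code of the earlier lecture
All numbers below are hypothetical

```python
import numpy as np

β = 0.9
π = 0.5 * np.ones((2, 2))          # hypothetical IID two-state chain
u_c = np.array([1.0, 1.2])         # hypothetical marginal utilities by state
z = np.array([0.05, -0.02])        # hypothetical net-of-interest surpluses

# x(s) = u_c(s) z(s) + β Σ_{s'} π(s'|s) x(s'), so x = (I - βπ)^{-1} (u_c z)
x = np.linalg.solve(np.eye(2) - β * π, u_c * z)

# Present value of surpluses in units of the current consumption good
pv = x / u_c

print(pv)
```

The present value differs across the two current states, so it is measurable with respect to 𝑠𝑡 but not 𝑠𝑡−1
In a Lucas-Stokey economy this is admissible because state-contingent debt can differ across states, while in an AMSS economy an allocation producing these numbers would violate the requirement that 𝑏𝑡 depend only on 𝑠𝑡−1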

80.3.3 Ramsey Problem Without State-contingent Debt

After we have substituted the resource constraint into the utility function, we can express the
Ramsey problem as being to choose an allocation that solves


$$
\max_{\{c_t(s^t), b_{t+1}(s^t)\}} \mathbb{E}_0 \sum_{t=0}^\infty \beta^t u\left(c_t(s^t), 1 - c_t(s^t) - g_t(s^t)\right)
$$

where the maximization is subject to

$$
\mathbb{E}_0 \sum_{j=0}^\infty \beta^j \frac{u_c(s^j)}{u_c(s^0)} z(s^j) \geq b_0(s^{-1}) \tag{10}
$$

and


$$
\mathbb{E}_t \sum_{j=0}^\infty \beta^j \frac{u_c(s^{t+j})}{u_c(s^t)} z(s^{t+j}) = b_t(s^{t-1}) \quad \forall \, s^t \tag{11}
$$

given 𝑏0 (𝑠−1 )
Lagrangian Formulation
Let 𝛾0 (𝑠0 ) be a non-negative Lagrange multiplier on constraint Eq. (10)
As in the Lucas-Stokey economy, this multiplier is strictly positive when the government must
resort to distortionary taxation; otherwise it equals zero
A consequence of the assumption that there are no markets in state-contingent securities and
that a market exists only in a risk-free security is that we have to attach stochastic processes
$\{\gamma_t(s^t)\}_{t=1}^\infty$ of Lagrange multipliers to the implementability constraints Eq. (11)

Depending on how the constraints bind, these multipliers can be positive or negative:

$$
\gamma_t(s^t) \geq (\leq) \; 0 \quad \text{if the constraint binds in this direction} \quad
\mathbb{E}_t \sum_{j=0}^\infty \beta^j \frac{u_c(s^{t+j})}{u_c(s^t)} z(s^{t+j}) \geq (\leq) \; b_t(s^{t-1})
$$

A negative multiplier 𝛾𝑡 (𝑠𝑡 ) < 0 means that if we could relax constraint Eq. (11), we would
like to increase the beginning-of-period indebtedness for that particular realization of history
𝑠𝑡
That would let us reduce the beginning-of-period indebtedness for some other history [2]
These features flow from the fact that the government cannot use state-contingent debt and
therefore cannot allocate its indebtedness efficiently across future states

80.3.4 Some Calculations

It is helpful to apply two transformations to the Lagrangian


Multiply constraint Eq. (10) by 𝑢𝑐 (𝑠0 ) and the constraints Eq. (11) by 𝛽 𝑡 𝑢𝑐 (𝑠𝑡 )
Then a Lagrangian for the Ramsey problem can be represented as


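$$
\begin{aligned}
J &= \mathbb{E}_0 \sum_{t=0}^\infty \beta^t \Big\{ u\left(c_t(s^t), 1 - c_t(s^t) - g_t(s^t)\right) \\
&\qquad + \gamma_t(s^t) \Big[ \mathbb{E}_t \sum_{j=0}^\infty \beta^j u_c(s^{t+j}) z(s^{t+j}) - u_c(s^t) b_t(s^{t-1}) \Big] \Big\} \\
&= \mathbb{E}_0 \sum_{t=0}^\infty \beta^t \Big\{ u\left(c_t(s^t), 1 - c_t(s^t) - g_t(s^t)\right) \\
&\qquad + \Psi_t(s^t) u_c(s^t) z(s^t) - \gamma_t(s^t) u_c(s^t) b_t(s^{t-1}) \Big\}
\end{aligned}
\tag{12}
$$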

where

$$
\Psi_t(s^t) = \Psi_{t-1}(s^{t-1}) + \gamma_t(s^t) \quad \text{and} \quad \Psi_{-1}(s^{-1}) = 0 \tag{13}
$$

In Eq. (12), the second equality uses the law of iterated expectations and Abel’s summation
formula (also called summation by parts, see this page)
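
The summation-by-parts step can be checked numerically in a simplified setting
The sketch below assumes a deterministic, finite-horizon version of the two sums, so it illustrates only the rearrangement of terms and not the law of iterated expectations
The multipliers and the sequence standing in for 𝑢𝑐 𝑧 are random numbers chosen for illustration

```python
import numpy as np

rng = np.random.default_rng(0)
T, β = 12, 0.9

γ = rng.normal(size=T + 1)   # hypothetical multipliers γ_0, ..., γ_T
a = rng.normal(size=T + 1)   # stands in for u_c(s^t) z(s^t)

# Ψ_t = Ψ_{t-1} + γ_t with Ψ_{-1} = 0 is a cumulative sum
Ψ = np.cumsum(γ)

# Left side: each γ_t multiplies the discounted sum of a from t onward
lhs = sum(β**t * γ[t] * sum(β**j * a[t + j] for j in range(T - t + 1))
          for t in range(T + 1))

# Right side: each a_t is weighted by the accumulated multiplier Ψ_t
rhs = sum(β**t * Ψ[t] * a[t] for t in range(T + 1))

print(lhs, rhs)
```

The two sides agree up to floating-point error because each term 𝛽^m a_m is collected once for every 𝛾𝑡 with 𝑡 ≤ m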
First-order conditions with respect to 𝑐𝑡 (𝑠𝑡 ) can be expressed as

$$
\begin{aligned}
u_c(s^t) - u_\ell(s^t) &+ \Psi_t(s^t) \left\{ [u_{cc}(s^t) - u_{c\ell}(s^t)] z(s^t) + u_c(s^t) z_c(s^t) \right\} \\
&- \gamma_t(s^t) [u_{cc}(s^t) - u_{c\ell}(s^t)] b_t(s^{t-1}) = 0
\end{aligned}
\tag{14}
$$

and with respect to 𝑏𝑡 (𝑠𝑡 ) as

E𝑡 [𝛾𝑡+1 (𝑠𝑡+1 ) 𝑢𝑐 (𝑠𝑡+1 )] = 0 (15)

If we substitute 𝑧(𝑠𝑡 ) from Eq. (7) and its derivative 𝑧𝑐 (𝑠𝑡 ) into the first-order condition
Eq. (14), we find two differences from the corresponding condition for the optimal allocation
in a Lucas-Stokey economy with state-contingent government debt

1. The term involving 𝑏𝑡 (𝑠𝑡−1 ) in the first-order condition Eq. (14) does not appear in the
corresponding expression for the Lucas-Stokey economy

• This term reflects the constraint that beginning-of-period government indebtedness


must be the same across all realizations of next period’s state, a constraint that would
not be present if government debt could be state contingent

2. The Lagrange multiplier Ψ𝑡 (𝑠𝑡 ) in the first-order condition Eq. (14) may change over
time in response to realizations of the state, while the multiplier Φ in the Lucas-Stokey
economy is time-invariant

We need some code from an earlier lecture on optimal taxation with state-contingent debt,
namely its sequential allocation implementation:

In [2]: import numpy as np
from scipy.optimize import root
from quantecon import MarkovChain

class SequentialAllocation:

    '''
    Class that takes CESutility or BGPutility object as input and returns
    planner's allocation as a function of the multiplier on the
    implementability constraint μ.
    '''

    def __init__(self, model):

        # Initialize from model object attributes
        self.β, self.π, self.G = model.β, model.π, model.G
        self.mc, self.Θ = MarkovChain(self.π), model.Θ
        self.S = len(model.π)  # Number of states
        self.model = model

        # Find the first best allocation
        self.find_first_best()

    def find_first_best(self):
        '''
        Find the first best allocation
        '''
        model = self.model
        S, Θ, G = self.S, self.Θ, self.G
        Uc, Un = model.Uc, model.Un

        def res(z):
            c = z[:S]
            n = z[S:]
            return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])

        res = root(res, 0.5 * np.ones(2 * S))

        if not res.success:
            raise Exception('Could not find first best')

        self.cFB = res.x[:S]
        self.nFB = res.x[S:]

        # Multiplier on the resource constraint
        self.ΞFB = Uc(self.cFB, self.nFB)
        self.zFB = np.hstack([self.cFB, self.nFB, self.ΞFB])

    def time1_allocation(self, μ):
        '''
        Computes optimal allocation for time t >= 1 for a given μ
        '''
        model = self.model
        S, Θ, G = self.S, self.Θ, self.G
        Uc, Ucc, Un, Unn = model.Uc, model.Ucc, model.Un, model.Unn

        def FOC(z):
            c = z[:S]
            n = z[S:2 * S]
            Ξ = z[2 * S:]
            return np.hstack([Uc(c, n) - μ * (Ucc(c, n) * c + Uc(c, n)) - Ξ,  # FOC of c
                              Un(c, n) - μ * (Unn(c, n) * n + Un(c, n)) +
                              Θ * Ξ,  # FOC of n
                              Θ * n - c - G])

        # Find the root of the first-order condition
        res = root(FOC, self.zFB)
        if not res.success:
            raise Exception('Could not find LS allocation.')
        z = res.x
        c, n, Ξ = z[:S], z[S:2 * S], z[2 * S:]

        # Compute x
        I = Uc(c, n) * c + Un(c, n) * n
        x = np.linalg.solve(np.eye(S) - self.β * self.π, I)

        return c, n, x, Ξ

    def time0_allocation(self, B_, s_0):
        '''
        Finds the optimal allocation given initial government debt B_ and state s_0
        '''
        model, π, Θ, G, β = self.model, self.π, self.Θ, self.G, self.β
        Uc, Ucc, Un, Unn = model.Uc, model.Ucc, model.Un, model.Unn

        # First order conditions of planner's problem
        def FOC(z):
            μ, c, n, Ξ = z
            xprime = self.time1_allocation(μ)[2]
            return np.hstack([Uc(c, n) * (c - B_) + Un(c, n) * n + β * π[s_0] @ xprime,
                              Uc(c, n) - μ * (Ucc(c, n) *
                                              (c - B_) + Uc(c, n)) - Ξ,
                              Un(c, n) - μ * (Unn(c, n) * n +
                                              Un(c, n)) + Θ[s_0] * Ξ,
                              (Θ * n - c - G)[s_0]])

        # Find root
        res = root(FOC, np.array(
            [0, self.cFB[s_0], self.nFB[s_0], self.ΞFB[s_0]]))
        if not res.success:
            raise Exception('Could not find time 0 LS allocation.')

        return res.x

    def time1_value(self, μ):
        '''
        Find the value associated with multiplier μ
        '''
        c, n, x, Ξ = self.time1_allocation(μ)
        U = self.model.U(c, n)
        V = np.linalg.solve(np.eye(self.S) - self.β * self.π, U)
        return c, n, x, V

    def Τ(self, c, n):
        '''
        Computes Τ given c, n
        '''
        model = self.model
        Uc, Un = model.Uc(c, n), model.Un(c, n)

        return 1 + Un / (self.Θ * Uc)

def simulate(self, B_, s_0, T, sHist=None):


'''
Simulates planners policies for T periods
'''
model, π, β = self.model, self.π, self.β
Uc = model.Uc

if sHist is None:
sHist = self.mc.simulate(T, s_0)

cHist, nHist, Bhist, ΤHist, μHist = np.zeros((5, T))


RHist = np.zeros(T - 1)

# Time 0
μ, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
Bhist[0] = B_
μHist[0] = μ

# Time 1 onward
for t in range(1, T):
c, n, x, Ξ = self.time1_allocation(μ)
Τ = self.Τ(c, n)
u_c = Uc(c, n)
s = sHist[t]
Eu_c = π[sHist[t - 1]] @ u_c
cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x[s] / \
u_c[s], Τ[s]
RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (β * Eu_c)
80.4. RECURSIVE VERSION OF AMSS MODEL 1347

μHist[t] = μ

return np.array([cHist, nHist, Bhist, ΤHist, sHist, μHist, RHist])
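
The first-best step inside `find_first_best` can be looked at in isolation. The following is a minimal sketch, not the lecture's own code: it assumes CRRA-style marginal utilities with illustrative parameter values (`sigma`, `gamma`, `Theta`, `G` are chosen here only for the example) and checks that the implied labor tax rate `1 + Un / (Theta * Uc)` is zero at the first-best allocation.

```python
import numpy as np
from scipy.optimize import root

# Illustrative (assumed) parameters: two states with different spending levels
sigma, gamma = 2.0, 2.0
Theta = np.ones(2)
G = np.array([0.1, 0.2])
S = len(G)

def Uc(c, n):
    return c**(-sigma)      # marginal utility of consumption

def Un(c, n):
    return -n**gamma        # marginal (dis)utility of labor

def residual(z):
    # Stack the efficiency condition and the resource constraint, state by state
    c, n = z[:S], z[S:]
    return np.hstack([Theta * Uc(c, n) + Un(c, n), Theta * n - c - G])

sol = root(residual, 0.5 * np.ones(2 * S))
c_fb, n_fb = sol.x[:S], sol.x[S:]

# Labor tax rate implied by (1 - tau) * Theta * Uc = -Un
tau_fb = 1 + Un(c_fb, n_fb) / (Theta * Uc(c_fb, n_fb))
print(c_fb, n_fb, tau_fb)
```

Because the first residual block is exactly the condition that the wedge between `Theta * Uc` and `-Un` vanishes, the tax rate at the root is zero up to solver tolerance.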

To analyze the AMSS model, we find it useful to adopt a recursive formulation using tech-
niques like those in our lectures on dynamic Stackelberg models and optimal taxation with
state-contingent debt

80.4 Recursive Version of AMSS Model

We now describe a recursive formulation of the AMSS economy


We have noted that from the point of view of the Ramsey planner, the restriction to one-
period risk-free securities

• leaves intact the single implementability constraint on allocations Eq. (8) from the
Lucas-Stokey economy, but
• adds measurability constraints Eq. (6) on functions of tails of allocations at each time
and history

We now explore how these constraints alter Bellman equations for a time 0 Ramsey planner
and for time 𝑡 ≥ 1, history 𝑠𝑡 continuation Ramsey planners

80.4.1 Recasting State Variables

In the AMSS setting, the government faces a sequence of budget constraints

𝜏𝑡 (𝑠𝑡 )𝑛𝑡 (𝑠𝑡 ) + 𝑇𝑡 (𝑠𝑡 ) + 𝑏𝑡+1 (𝑠𝑡 )/𝑅𝑡 (𝑠𝑡 ) = 𝑔𝑡 + 𝑏𝑡 (𝑠𝑡−1 )

where 𝑅𝑡 (𝑠𝑡 ) is the gross risk-free rate of interest between 𝑡 and 𝑡 + 1 at history 𝑠𝑡 and 𝑇𝑡 (𝑠𝑡 )
are non-negative transfers
Throughout this lecture, we shall set transfers to zero (for some issues about the limiting
behavior of debt, this makes a possibly important difference from AMSS [5], who restricted
transfers to be non-negative)
In this case, the household faces a sequence of budget constraints

𝑏𝑡 (𝑠𝑡−1 ) + (1 − 𝜏𝑡 (𝑠𝑡 ))𝑛𝑡 (𝑠𝑡 ) = 𝑐𝑡 (𝑠𝑡 ) + 𝑏𝑡+1 (𝑠𝑡 )/𝑅𝑡 (𝑠𝑡 ) (16)

The household’s first-order conditions are 𝑢𝑐,𝑡 = 𝛽𝑅𝑡 E𝑡 𝑢𝑐,𝑡+1 and (1 − 𝜏𝑡 )𝑢𝑐,𝑡 = 𝑢𝑙,𝑡
Using these to eliminate 𝑅𝑡 and 𝜏𝑡 from budget constraint Eq. (16) gives

$$
b_t(s^{t-1}) + \frac{u_{l,t}(s^t)}{u_{c,t}(s^t)} n_t(s^t) = c_t(s^t) + \frac{\beta (\mathbb{E}_t u_{c,t+1}) b_{t+1}(s^t)}{u_{c,t}(s^t)} \tag{17}
$$

or

𝑢𝑐,𝑡 (𝑠𝑡 )𝑏𝑡 (𝑠𝑡−1 ) + 𝑢𝑙,𝑡 (𝑠𝑡 )𝑛𝑡 (𝑠𝑡 ) = 𝑢𝑐,𝑡 (𝑠𝑡 )𝑐𝑡 (𝑠𝑡 ) + 𝛽(E𝑡 𝑢𝑐,𝑡+1 )𝑏𝑡+1 (𝑠𝑡 ) (18)
Now define

$$
x_t \equiv \beta b_{t+1}(s^t) \mathbb{E}_t u_{c,t+1} = u_{c,t}(s^t) \frac{b_{t+1}(s^t)}{R_t(s^t)} \tag{19}
$$

and represent the household’s budget constraint at time 𝑡, history 𝑠𝑡 as

$$
\frac{u_{c,t} x_{t-1}}{\beta \mathbb{E}_{t-1} u_{c,t}} = u_{c,t} c_t - u_{l,t} n_t + x_t \tag{20}
$$

for 𝑡 ≥ 1
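
As a quick numerical illustration of the definition of 𝑥𝑡, the sketch below checks that the two expressions for 𝑥𝑡 coincide once the risk-free rate is priced by the household's Euler equation. All numbers (`beta`, `pi_row`, `uc_next`, `uc_now`, `b_next`) are hypothetical values chosen only for the example.

```python
import numpy as np

beta = 0.9
pi_row = np.array([0.5, 0.5])     # conditional probabilities of next-period states
uc_next = np.array([1.1, 0.9])    # hypothetical next-period marginal utilities
uc_now = 1.0                      # hypothetical current marginal utility
b_next = 0.5                      # risk-free debt carried into next period

E_uc_next = pi_row @ uc_next

# Euler equation u_c = beta * R * E[u_c'] pins down the gross risk-free rate
R = uc_now / (beta * E_uc_next)

# Two equivalent ways of writing x_t
x_from_expectation = beta * b_next * E_uc_next
x_from_price = uc_now * b_next / R

print(R, x_from_expectation, x_from_price)
```

The state variable 𝑥𝑡 is thus debt measured in marginal-utility units, which is what makes it convenient to carry across periods.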

80.4.2 Measurability Constraints

Write equation Eq. (18) as

$$
b_t(s^{t-1}) = c_t(s^t) - \frac{u_{l,t}(s^t)}{u_{c,t}(s^t)} n_t(s^t) + \frac{\beta (\mathbb{E}_t u_{c,t+1}) b_{t+1}(s^t)}{u_{c,t}} \tag{21}
$$

The right side of equation Eq. (21) expresses the time 𝑡 value of government debt in terms of
a linear combination of terms whose individual components are measurable with respect to 𝑠𝑡
The sum of terms on the right side of equation Eq. (21) must equal 𝑏𝑡 (𝑠𝑡−1 )
That implies that it has to be measurable with respect to 𝑠𝑡−1
Equations Eq. (21) are the measurability constraints that the AMSS model adds to the single
time 0 implementability constraint imposed in the Lucas and Stokey model

80.4.3 Two Bellman Equations

Let Π(𝑠|𝑠− ) be a Markov transition matrix whose entries tell probabilities of moving from
state 𝑠− to state 𝑠 in one period
Let

• 𝑉 (𝑥− , 𝑠− ) be the continuation value of a continuation Ramsey plan at 𝑥𝑡−1 = 𝑥− , 𝑠𝑡−1 = 𝑠− for 𝑡 ≥ 1
• 𝑊 (𝑏, 𝑠) be the value of the Ramsey plan at time 0 at 𝑏0 = 𝑏 and 𝑠0 = 𝑠

We distinguish between two types of planners:


For 𝑡 ≥ 1, the value function for a continuation Ramsey planner satisfies the Bellman
equation

$$
V(x_-, s_-) = \max_{\{n(s), x(s)\}} \sum_s \Pi(s|s_-) \left[ u(n(s) - g(s), 1 - n(s)) + \beta V(x(s), s) \right] \tag{22}
$$

subject to the following collection of implementability constraints, one for each 𝑠 ∈ 𝑆:

$$
\frac{u_c(s) x_-}{\beta \sum_{\tilde s} \Pi(\tilde s|s_-) u_c(\tilde s)} = u_c(s)(n(s) - g(s)) - u_l(s) n(s) + x(s) \tag{23}
$$
A continuation Ramsey planner at 𝑡 ≥ 1 takes (𝑥𝑡−1 , 𝑠𝑡−1 ) = (𝑥− , 𝑠− ) as given and before 𝑠 is
realized chooses (𝑛𝑡 (𝑠𝑡 ), 𝑥𝑡 (𝑠𝑡 )) = (𝑛(𝑠), 𝑥(𝑠)) for 𝑠 ∈ 𝑆
The Ramsey planner takes (𝑏0 , 𝑠0 ) as given and chooses (𝑛0 , 𝑥0 ).
The value function 𝑊 (𝑏0 , 𝑠0 ) for the time 𝑡 = 0 Ramsey planner satisfies the Bellman equa-
tion

$$
W(b_0, s_0) = \max_{n_0, x_0} u(n_0 - g_0, 1 - n_0) + \beta V(x_0, s_0) \tag{24}
$$

where maximization is subject to

𝑢𝑐,0 𝑏0 = 𝑢𝑐,0 (𝑛0 − 𝑔0 ) − 𝑢𝑙,0 𝑛0 + 𝑥0 (25)

80.4.4 Martingale Supersedes State-Variable Degeneracy

Let 𝜇(𝑠|𝑠− )Π(𝑠|𝑠− ) be a Lagrange multiplier on the constraint Eq. (23) for state 𝑠
After forming an appropriate Lagrangian, we find that the continuation Ramsey planner’s
first-order condition with respect to 𝑥(𝑠) is

𝛽𝑉𝑥 (𝑥(𝑠), 𝑠) = 𝜇(𝑠|𝑠− ) (26)

Applying the envelope theorem to Bellman equation Eq. (22) gives

$$
V_x(x_-, s_-) = \sum_s \Pi(s|s_-) \mu(s|s_-) \frac{u_c(s)}{\beta \sum_{\tilde s} \Pi(\tilde s|s_-) u_c(\tilde s)} \tag{27}
$$

Equations Eq. (26) and Eq. (27) imply that

$$
V_x(x_-, s_-) = \sum_s \left( \Pi(s|s_-) \frac{u_c(s)}{\sum_{\tilde s} \Pi(\tilde s|s_-) u_c(\tilde s)} \right) V_x(x(s), s) \tag{28}
$$

Equation Eq. (28) states that 𝑉𝑥 (𝑥, 𝑠) is a risk-adjusted martingale


Saying that 𝑉𝑥 (𝑥, 𝑠) is a risk-adjusted martingale means that 𝑉𝑥 (𝑥, 𝑠) is a martingale with
respect to the probability distribution over $s^t$ sequences that are generated by the twisted
transition probability matrix:

$$
\check{\Pi}(s|s_-) \equiv \Pi(s|s_-) \frac{u_c(s)}{\sum_{\tilde s} \Pi(\tilde s|s_-) u_c(\tilde s)}
$$

Exercise: Please verify that $\check{\Pi}(s|s_-)$ is a valid Markov transition density, i.e., that its
elements are all non-negative and that for each $s_-$, the sum over $s$ equals unity
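
A numerical check can accompany the exercise. The sketch below builds the twisted matrix for a hypothetical two-state chain and a hypothetical vector of marginal utilities (`Pi` and `u_c` are assumptions made only for illustration) and confirms that each row is non-negative and sums to one.

```python
import numpy as np

# Hypothetical transition matrix and state-wise marginal utilities u_c(s) > 0
Pi = np.array([[0.9, 0.1],
               [0.3, 0.7]])
u_c = np.array([1.2, 0.8])

# Row s_- of the twisted matrix reweights Pi(s|s_-) by u_c(s) and renormalizes
# by the conditional expectation of u_c given s_-
E_uc = Pi @ u_c                        # one entry per s_-
Pi_check = Pi * u_c / E_uc[:, None]

print(Pi_check)
print(Pi_check.sum(axis=1))
```

The row sums equal one by construction, because the denominator for row 𝑠− is exactly the sum of that row's numerators; non-negativity follows from positive marginal utility.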

80.4.5 Absence of State Variable Degeneracy

Along a Ramsey plan, the state variable 𝑥𝑡 = 𝑥𝑡 (𝑠𝑡 , 𝑏0 ) becomes a function of the history 𝑠𝑡
and initial government debt 𝑏0
In the Lucas-Stokey model, we found that

• a counterpart to 𝑉𝑥 (𝑥, 𝑠) is time-invariant and equal to the Lagrange multiplier on the
Lucas-Stokey implementability constraint
• time invariance of 𝑉𝑥 (𝑥, 𝑠) is the source of a key feature of the Lucas-Stokey model,
namely, state variable degeneracy (i.e., 𝑥𝑡 is an exact function of 𝑠𝑡 )

That 𝑉𝑥 (𝑥, 𝑠) varies over time according to a twisted martingale means that there is no state-
variable degeneracy in the AMSS model
In the AMSS model, both 𝑥 and 𝑠 are needed to describe the state
This property of the AMSS model transmits a twisted martingale component to consumption,
employment, and the tax rate

80.4.6 Digression on Non-negative Transfers

Throughout this lecture, we have imposed that transfers 𝑇𝑡 = 0


AMSS [5] instead imposed a nonnegativity constraint 𝑇𝑡 ≥ 0 on transfers
They also considered a special case of quasi-linear preferences, 𝑢(𝑐, 𝑙) = 𝑐 + 𝐻(𝑙)
In this case, 𝑉𝑥 (𝑥, 𝑠) ≤ 0 is a non-positive martingale
By the martingale convergence theorem 𝑉𝑥 (𝑥, 𝑠) converges almost surely
Furthermore, when the Markov chain Π(𝑠|𝑠− ) and the government expenditure function 𝑔(𝑠)
are such that 𝑔𝑡 is perpetually random, 𝑉𝑥 (𝑥, 𝑠) almost surely converges to zero
For quasi-linear preferences, the first-order condition with respect to 𝑛(𝑠) becomes

(1 − 𝜇(𝑠|𝑠− ))(1 − 𝑢𝑙 (𝑠)) + 𝜇(𝑠|𝑠− )𝑛(𝑠)𝑢𝑙𝑙 (𝑠) = 0

When 𝜇(𝑠|𝑠− ) = 𝛽𝑉𝑥 (𝑥(𝑠), 𝑠) converges to zero, in the limit 𝑢𝑙 (𝑠) = 1 = 𝑢𝑐 (𝑠), so that
𝜏 (𝑥(𝑠), 𝑠) = 0
Thus, in the limit, if 𝑔𝑡 is perpetually random, the government accumulates sufficient assets
to finance all expenditures from earnings on those assets, returning any excess revenues to the
household as non-negative lump-sum transfers

80.4.7 Code

The recursive formulation is implemented as follows

In [3]: from scipy.optimize import fmin_slsqp


        class RecursiveAllocationAMSS:

            def __init__(self, model, μgrid, tol_diff=1e-4, tol=1e-4):

                self.β, self.π, self.G = model.β, model.π, model.G
                self.mc, self.S = MarkovChain(self.π), len(model.π)  # Number of states
                self.Θ, self.model, self.μgrid = model.Θ, model, μgrid
                self.tol_diff, self.tol = tol_diff, tol

                # Find the first best allocation
                self.solve_time1_bellman()
                self.T.time_0 = True  # Bellman equation now solves time 0 problem

            def solve_time1_bellman(self):
                '''
                Solve the time 1 Bellman equation for calibration model and
                initial grid μgrid0
                '''
                model, μgrid0 = self.model, self.μgrid
                π = model.π
                S = len(model.π)

                # First get initial fit from Lucas Stokey solution.
                # Need to change things to be ex ante
                PP = SequentialAllocation(model)
                interp = interpolator_factory(2, None)

                def incomplete_allocation(μ_, s_):
                    c, n, x, V = PP.time1_value(μ_)
                    return c, n, π[s_] @ x, π[s_] @ V
                cf, nf, xgrid, Vf, xprimef = [], [], [], [], []
                for s_ in range(S):
                    c, n, x, V = zip(*map(lambda μ: incomplete_allocation(μ, s_), μgrid0))
                    c, n = np.vstack(c).T, np.vstack(n).T
                    x, V = np.hstack(x), np.hstack(V)
                    xprimes = np.vstack([x] * S)
                    cf.append(interp(x, c))
                    nf.append(interp(x, n))
                    Vf.append(interp(x, V))
                    xgrid.append(x)
                    xprimef.append(interp(x, xprimes))
                cf, nf, xprimef = fun_vstack(cf), fun_vstack(nf), fun_vstack(xprimef)
                Vf = fun_hstack(Vf)
                policies = [cf, nf, xprimef]

                # Create xgrid
                x = np.vstack(xgrid).T
                xbar = [x.min(0).max(), x.max(0).min()]
                xgrid = np.linspace(xbar[0], xbar[1], len(μgrid0))
                self.xgrid = xgrid

                # Now iterate on Bellman equation
                T = BellmanEquation(model, xgrid, policies, tol=self.tol)
                diff = 1
                while diff > self.tol_diff:
                    PF = T(Vf)

                    Vfnew, policies = self.fit_policy_function(PF)
                    diff = np.abs((Vf(xgrid) - Vfnew(xgrid)) / Vf(xgrid)).max()

                    print(diff)
                    Vf = Vfnew

                # store value function policies and Bellman Equations
                self.Vf = Vf
                self.policies = policies
                self.T = T

            def fit_policy_function(self, PF):
                '''
                Fits the policy functions
                '''
                S, xgrid = len(self.π), self.xgrid
                interp = interpolator_factory(3, 0)
                cf, nf, xprimef, Tf, Vf = [], [], [], [], []
                for s_ in range(S):
                    PFvec = np.vstack([PF(x, s_) for x in self.xgrid]).T
                    Vf.append(interp(xgrid, PFvec[0, :]))
                    cf.append(interp(xgrid, PFvec[1:1 + S]))
                    nf.append(interp(xgrid, PFvec[1 + S:1 + 2 * S]))
                    xprimef.append(interp(xgrid, PFvec[1 + 2 * S:1 + 3 * S]))
                    Tf.append(interp(xgrid, PFvec[1 + 3 * S:]))
                policies = fun_vstack(cf), fun_vstack(
                    nf), fun_vstack(xprimef), fun_vstack(Tf)
                Vf = fun_hstack(Vf)
                return Vf, policies

            def Τ(self, c, n):
                '''
                Computes Τ given c and n
                '''
                model = self.model
                Uc, Un = model.Uc(c, n), model.Un(c, n)

                return 1 + Un / (self.Θ * Uc)

            def time0_allocation(self, B_, s0):
                '''
                Finds the optimal allocation given initial government debt B_ and
                state s_0
                '''
                PF = self.T(self.Vf)
                z0 = PF(B_, s0)
                c0, n0, xprime0, T0 = z0[1:]
                return c0, n0, xprime0, T0

            def simulate(self, B_, s_0, T, sHist=None):
                '''
                Simulates planners policies for T periods
                '''
                model, π = self.model, self.π
                Uc = model.Uc
                cf, nf, xprimef, Tf = self.policies

                if sHist is None:
                    sHist = simulate_markov(π, s_0, T)

                cHist, nHist, Bhist, xHist, ΤHist, THist, μHist = np.zeros((7, T))

                # time 0
                cHist[0], nHist[0], xHist[0], THist[0] = self.time0_allocation(B_, s_0)
                ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
                Bhist[0] = B_
                μHist[0] = self.Vf[s_0](xHist[0])

                # time 1 onward
                for t in range(1, T):
                    s_, x, s = sHist[t - 1], xHist[t - 1], sHist[t]
                    c, n, xprime, T = cf[s_, :](x), nf[s_, :](
                        x), xprimef[s_, :](x), Tf[s_, :](x)

                    Τ = self.Τ(c, n)[s]
                    u_c = Uc(c, n)
                    Eu_c = π[s_, :] @ u_c

                    μHist[t] = self.Vf[s](xprime[s])

                    cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x / Eu_c, Τ
                    xHist[t], THist[t] = xprime[s], T[s]
                return np.array([cHist, nHist, Bhist, ΤHist, THist, μHist, sHist, xHist])


        class BellmanEquation:
            '''
            Bellman equation for the continuation of the Lucas-Stokey Problem
            '''

            def __init__(self, model, xgrid, policies0, tol, maxiter=1000):

                self.β, self.π, self.G = model.β, model.π, model.G
                self.S = len(model.π)  # Number of states
                self.Θ, self.model, self.tol = model.Θ, model, tol
                self.maxiter = maxiter

                self.xbar = [min(xgrid), max(xgrid)]
                self.time_0 = False

                self.z0 = {}
                cf, nf, xprimef = policies0

                for s_ in range(self.S):
                    for x in xgrid:
                        self.z0[x, s_] = np.hstack([cf[s_, :](x),
                                                    nf[s_, :](x),
                                                    xprimef[s_, :](x),
                                                    np.zeros(self.S)])

                self.find_first_best()

            def find_first_best(self):
                '''
                Find the first best allocation
                '''
                model = self.model
                S, Θ, Uc, Un, G = self.S, self.Θ, model.Uc, model.Un, self.G

                def res(z):
                    c = z[:S]
                    n = z[S:]
                    return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])

                res = root(res, 0.5 * np.ones(2 * S))
                if not res.success:
                    raise Exception('Could not find first best')

                self.cFB = res.x[:S]
                self.nFB = res.x[S:]
                IFB = Uc(self.cFB, self.nFB) * self.cFB + \
                    Un(self.cFB, self.nFB) * self.nFB

                self.xFB = np.linalg.solve(np.eye(S) - self.β * self.π, IFB)

                self.zFB = {}
                for s in range(S):
                    self.zFB[s] = np.hstack(
                        [self.cFB[s], self.nFB[s], self.π[s] @ self.xFB, 0.])

            def __call__(self, Vf):
                '''
                Given continuation value function next period return value function this
                period return T(V) and optimal policies
                '''
                if not self.time_0:
                    def PF(x, s): return self.get_policies_time1(x, s, Vf)
                else:
                    def PF(B_, s0): return self.get_policies_time0(B_, s0, Vf)
                return PF

            def get_policies_time1(self, x, s_, Vf):
                '''
                Finds the optimal policies
                '''
                model, β, Θ, G, S, π = self.model, self.β, self.Θ, self.G, self.S, self.π
                U, Uc, Un = model.U, model.Uc, model.Un

                def objf(z):
                    c, n, xprime = z[:S], z[S:2 * S], z[2 * S:3 * S]

                    Vprime = np.empty(S)
                    for s in range(S):
                        Vprime[s] = Vf[s](xprime[s])

                    return -π[s_] @ (U(c, n) + β * Vprime)

                def cons(z):
                    c, n, xprime, T = z[:S], z[S:2 * S], z[2 * S:3 * S], z[3 * S:]
                    u_c = Uc(c, n)
                    Eu_c = π[s_] @ u_c
                    return np.hstack([
                        x * u_c / Eu_c - u_c * (c - T) - Un(c, n) * n - β * xprime,
                        Θ * n - c - G])

                if model.transfers:
                    bounds = [(0., 100)] * S + [(0., 100)] * S + \
                        [self.xbar] * S + [(0., 100.)] * S
                else:
                    bounds = [(0., 100)] * S + [(0., 100)] * S + \
                        [self.xbar] * S + [(0., 0.)] * S
                out, fx, _, imode, smode = fmin_slsqp(objf, self.z0[x, s_],
                                                      f_eqcons=cons, bounds=bounds,
                                                      full_output=True, iprint=0,
                                                      acc=self.tol, iter=self.maxiter)

                if imode > 0:
                    raise Exception(smode)

                self.z0[x, s_] = out
                return np.hstack([-fx, out])

            def get_policies_time0(self, B_, s0, Vf):
                '''
                Finds the optimal policies
                '''
                model, β, Θ, G = self.model, self.β, self.Θ, self.G
                U, Uc, Un = model.U, model.Uc, model.Un

                def objf(z):
                    c, n, xprime = z[:-1]

                    return -(U(c, n) + β * Vf[s0](xprime))

                def cons(z):
                    c, n, xprime, T = z
                    return np.hstack([
                        -Uc(c, n) * (c - B_ - T) - Un(c, n) * n - β * xprime,
                        (Θ * n - c - G)[s0]])

                if model.transfers:
                    bounds = [(0., 100), (0., 100), self.xbar, (0., 100.)]
                else:
                    bounds = [(0., 100), (0., 100), self.xbar, (0., 0.)]
                out, fx, _, imode, smode = fmin_slsqp(objf, self.zFB[s0], f_eqcons=cons,
                                                      bounds=bounds, full_output=True, iprint=0)

                if imode > 0:
                    raise Exception(smode)

                return np.hstack([-fx, out])
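
The Bellman step above relies on `scipy.optimize.fmin_slsqp` with equality constraints and box bounds. The calling pattern is easier to see on a toy problem. The sketch below is unrelated to the AMSS model itself: the objective, the constraint and the starting point are made up for illustration, and only the way the solver is invoked and its exit code checked mirrors the class above.

```python
import numpy as np
from scipy.optimize import fmin_slsqp

def objective(z):
    # Squared distance from the point (1, 2)
    x, y = z
    return (x - 1.0)**2 + (y - 2.0)**2

def eq_constraint(z):
    # Equality constraint x + y = 2, written as a residual that must equal zero
    x, y = z
    return np.array([x + y - 2.0])

bounds = [(0.0, 100.0), (0.0, 100.0)]

out, fx, its, imode, smode = fmin_slsqp(objective, np.array([1.0, 1.0]),
                                        f_eqcons=eq_constraint, bounds=bounds,
                                        full_output=True, iprint=0, acc=1e-10)

# A positive exit mode signals failure, as in the lecture's code
if imode > 0:
    raise Exception(smode)

print(out, fx)
```

With `full_output=True` the solver returns the minimizer, the objective value, the iteration count, an integer exit mode and a message, which is why the lecture's code unpacks five values and negates `fx` to recover the maximized value.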

80.5 Examples

We now turn to some examples


We will first build some useful functions for solving the model

In [4]: from scipy.interpolate import UnivariateSpline


        class interpolate_wrapper:

            def __init__(self, F):
                self.F = F

            def __getitem__(self, index):
                return interpolate_wrapper(np.asarray(self.F[index]))

            def reshape(self, *args):
                self.F = self.F.reshape(*args)
                return self

            def transpose(self):
                self.F = self.F.transpose()

            def __len__(self):
                return len(self.F)

            def __call__(self, xvec):
                x = np.atleast_1d(xvec)
                shape = self.F.shape
                if len(x) == 1:
                    fhat = np.hstack([f(x) for f in self.F.flatten()])
                    return fhat.reshape(shape)
                else:
                    fhat = np.vstack([f(x) for f in self.F.flatten()])
                    return fhat.reshape(np.hstack((shape, len(x))))


        class interpolator_factory:

            def __init__(self, k, s):
                self.k, self.s = k, s

            def __call__(self, xgrid, Fs):
                shape, m = Fs.shape[:-1], Fs.shape[-1]
                Fs = Fs.reshape((-1, m))
                F = []
                xgrid = np.sort(xgrid)  # Sort xgrid
                for Fhat in Fs:
                    F.append(UnivariateSpline(xgrid, Fhat, k=self.k, s=self.s))
                return interpolate_wrapper(np.array(F).reshape(shape))


        def fun_vstack(fun_list):

            Fs = [IW.F for IW in fun_list]
            return interpolate_wrapper(np.vstack(Fs))


        def fun_hstack(fun_list):

            Fs = [IW.F for IW in fun_list]
            return interpolate_wrapper(np.hstack(Fs))


        def simulate_markov(π, s_0, T):

            sHist = np.empty(T, dtype=int)
            sHist[0] = s_0
            S = len(π)
            for t in range(1, T):
                sHist[t] = np.random.choice(np.arange(S), p=π[sHist[t - 1]])

            return sHist
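
The helper classes above wrap `scipy.interpolate.UnivariateSpline`, whose arguments `k` (spline degree) and `s` (smoothing factor) are what `interpolator_factory(k, s)` passes through. As a small sketch on made-up data (the grid and the function `exp(-x)` are assumptions for illustration), setting `s=0` gives a spline that passes exactly through the grid values, which is the setting used by `interpolator_factory(3, 0)`.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Made-up grid and values, standing in for a value function evaluated on xgrid
xgrid = np.linspace(0.0, 1.0, 11)
values = np.exp(-xgrid)

# k=3 is a cubic spline; s=0 forces interpolation through every data point
spline = UnivariateSpline(xgrid, values, k=3, s=0)

at_nodes = spline(xgrid)        # reproduces the data at the grid points
off_node = float(spline(0.55))  # approximates exp(-0.55) between grid points

print(np.abs(at_nodes - values).max(), off_node, np.exp(-0.55))
```

Passing `s=None`, as in `interpolator_factory(2, None)`, instead lets SciPy pick a default smoothing factor, so the fitted curve need not pass through every point.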

80.5.1 Anticipated One-Period War

In our lecture on optimal taxation with state contingent debt we studied how the government
manages uncertainty in a simple setting
As in that lecture, we assume the one-period utility function

$$
u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}
$$

Note
For convenience in matching our computer code, we have expressed utility as a
function of 𝑛 rather than leisure 𝑙
We consider the same government expenditure process studied in the lecture on optimal taxa-
tion with state contingent debt
Government expenditures are known for sure in all periods except one

• For 𝑡 < 3 or 𝑡 > 3 we assume that 𝑔𝑡 = 𝑔𝑙 = 0.1


• At 𝑡 = 3 a war occurs with probability 0.5
– If there is war, 𝑔3 = 𝑔ℎ = 0.2
– If there is no war 𝑔3 = 𝑔𝑙 = 0.1

A useful trick is to define components of the state vector as the following six (𝑡, 𝑔) pairs:

(0, 𝑔𝑙 ), (1, 𝑔𝑙 ), (2, 𝑔𝑙 ), (3, 𝑔𝑙 ), (3, 𝑔ℎ ), (𝑡 ≥ 4, 𝑔𝑙 )

We think of these 6 states as corresponding to 𝑠 = 1, 2, 3, 4, 5, 6


The transition matrix is

$$
P = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.5 & 0.5 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
$$

The government expenditure at each state is

$$
g = \begin{pmatrix} 0.1 \\ 0.1 \\ 0.1 \\ 0.1 \\ 0.2 \\ 0.1 \end{pmatrix}
$$
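
To see how this six-state chain encodes a one-off war risk at 𝑡 = 3, the sketch below propagates the initial state forward with the transition matrix and computes expected government spending at each date. It follows the ordering of the 𝑔 vector displayed in the text; the variable names are chosen here for illustration.

```python
import numpy as np

P = np.array([[0, 1, 0, 0,   0,   0],
              [0, 0, 1, 0,   0,   0],
              [0, 0, 0, 0.5, 0.5, 0],
              [0, 0, 0, 0,   0,   1],
              [0, 0, 0, 0,   0,   1],
              [0, 0, 0, 0,   0,   1]])
g = np.array([0.1, 0.1, 0.1, 0.1, 0.2, 0.1])

# Start in the first state with probability one
dist0 = np.zeros(6)
dist0[0] = 1.0

# Distribution over states at t = 3 and t = 4
dist3 = dist0 @ np.linalg.matrix_power(P, 3)
dist4 = dist0 @ np.linalg.matrix_power(P, 4)

# Expected spending at t = 0, ..., 5
expected_g = [dist0 @ np.linalg.matrix_power(P, t) @ g for t in range(6)]

print(dist3, dist4, expected_g)
```

All probability mass at 𝑡 = 3 is split evenly between the two time-3 states, and from 𝑡 = 4 onward the chain sits in the absorbing last state, so uncertainty about spending arises only at 𝑡 = 3.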

We assume the same utility parameters as in the Lucas-Stokey economy


This utility function is implemented in the following class

In [5]: import numpy as np


        class CRRAutility:

            def __init__(self,
                         β=0.9,
                         σ=2,
                         γ=2,
                         π=0.5*np.ones((2, 2)),
                         G=np.array([0.1, 0.2]),
                         Θ=np.ones(2),
                         transfers=False):

                self.β, self.σ, self.γ = β, σ, γ
                self.π, self.G, self.Θ, self.transfers = π, G, Θ, transfers

            # Utility function
            def U(self, c, n):
                σ = self.σ
                if σ == 1.:
                    U = np.log(c)
                else:
                    U = (c**(1 - σ) - 1) / (1 - σ)
                return U - n**(1 + self.γ) / (1 + self.γ)

            # Derivatives of utility function
            def Uc(self, c, n):
                return c**(-self.σ)

            def Ucc(self, c, n):
                return -self.σ * c**(-self.σ - 1)

            def Un(self, c, n):
                return -n**self.γ

            def Unn(self, c, n):
                return -self.γ * n**(self.γ - 1)
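
Since the planner's first-order conditions use the analytical derivatives `Uc`, `Ucc`, `Un` and `Unn`, it can be reassuring to compare them with finite differences. The sketch below restates the CRRA formulas as standalone functions with ASCII names (an assumption made for self-containment, with σ = γ = 2) and checks them by central differences at an arbitrary point.

```python
import numpy as np

sigma, gamma = 2.0, 2.0

def U(c, n):
    return (c**(1 - sigma) - 1) / (1 - sigma) - n**(1 + gamma) / (1 + gamma)

def Uc(c, n):
    return c**(-sigma)

def Ucc(c, n):
    return -sigma * c**(-sigma - 1)

def Un(c, n):
    return -n**gamma

def Unn(c, n):
    return -gamma * n**(gamma - 1)

# Arbitrary evaluation point and step size for central differences
c, n, h = 0.9, 0.4, 1e-5

fd_Uc = (U(c + h, n) - U(c - h, n)) / (2 * h)
fd_Un = (U(c, n + h) - U(c, n - h)) / (2 * h)
fd_Ucc = (Uc(c + h, n) - Uc(c - h, n)) / (2 * h)
fd_Unn = (Un(c, n + h) - Un(c, n - h)) / (2 * h)

errors = np.abs([fd_Uc - Uc(c, n), fd_Un - Un(c, n),
                 fd_Ucc - Ucc(c, n), fd_Unn - Unn(c, n)])
print(errors)
```

The constant `-1 / (1 - sigma)` in `U` shifts the utility level relative to the formula in the text but leaves every derivative, and hence the optimal allocation, unchanged.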

The following figure plots the Ramsey plan under both complete and incomplete markets for
both possible realizations of the state at time 𝑡 = 3
Optimal policies when the government has access to state contingent debt are represented by
black lines, while the optimal policies when there is only a risk-free bond are in red
Paths with circles are histories in which there is peace, while those with triangles denote war

In [6]: import matplotlib.pyplot as plt
        %matplotlib inline

        # Initialize μgrid for value function iteration
        μ_grid = np.linspace(-0.7, 0.01, 200)

        time_example = CRRAutility()

        time_example.π = np.array([[0, 1, 0, 0, 0, 0],
                                   [0, 0, 1, 0, 0, 0],
                                   [0, 0, 0, 0.5, 0.5, 0],
                                   [0, 0, 0, 0, 0, 1],
                                   [0, 0, 0, 0, 0, 1],
                                   [0, 0, 0, 0, 0, 1]])

        time_example.G = np.array([0.1, 0.1, 0.1, 0.2, 0.1, 0.1])
        time_example.Θ = np.ones(6)  # Θ can in principle be random

        time_example.transfers = True  # Government can use transfers
        time_sequential = SequentialAllocation(time_example)  # Solve sequential problem
        time_bellman = RecursiveAllocationAMSS(time_example, μ_grid)  # Solve recursive problem

        sHist_h = np.array([0, 1, 2, 3, 5, 5, 5])
        sHist_l = np.array([0, 1, 2, 4, 5, 5, 5])

        sim_seq_h = time_sequential.simulate(1, 0, 7, sHist_h)
        sim_bel_h = time_bellman.simulate(1, 0, 7, sHist_h)
        sim_seq_l = time_sequential.simulate(1, 0, 7, sHist_l)
        sim_bel_l = time_bellman.simulate(1, 0, 7, sHist_l)

        # Government spending paths
        sim_seq_l[4] = time_example.G[sHist_l]
        sim_seq_h[4] = time_example.G[sHist_h]
        sim_bel_l[4] = time_example.G[sHist_l]
        sim_bel_h[4] = time_example.G[sHist_h]

        # Output paths
        sim_seq_l[5] = time_example.Θ[sHist_l] * sim_seq_l[1]
        sim_seq_h[5] = time_example.Θ[sHist_h] * sim_seq_h[1]
        sim_bel_l[5] = time_example.Θ[sHist_l] * sim_bel_l[1]
        sim_bel_h[5] = time_example.Θ[sHist_h] * sim_bel_h[1]

        fig, axes = plt.subplots(3, 2, figsize=(14, 10))
        titles = ['Consumption', 'Labor Supply', 'Government Debt',
                  'Tax Rate', 'Government Spending', 'Output']

        for ax, title, sim_l, sim_h, bel_l, bel_h in zip(axes.flatten(), titles,
                                                         sim_seq_l, sim_seq_h,
                                                         sim_bel_l, sim_bel_h):
            ax.plot(sim_l, '-ok', sim_h, '-^k', bel_l, '-or', bel_h, '-^r', alpha=0.7)
            ax.set(title=title)
            ax.grid()

        plt.tight_layout()
        plt.show()

0.6029333236643755
0.11899588239403049
0.09881553212225772
0.08354106892508192
0.07149555120835548
0.06173036758118132
0.05366019901394205
0.04689112026451663
0.04115178347560931
0.036240012965927396
0.032006237992696515
0.028368464481562206
0.025192689677184087
0.022405843880195616
0.01994774715614924
0.017777614158738117
0.01586311426476452
0.014157556340393418
0.012655688350303772
0.011323561508356405
0.010134342587404501
0.009067133049314944
0.008133363039380094
0.007289176565901135
0.006541414713738157
0.005872916742002829
0.005262680193064001
0.0047307749771207785
0.00425304528362447
0.003818501528167009
0.0034264405600953744
0.003079364780532014
0.002768326786546087
0.002490427866931677
0.002240592066624134
0.0020186948255381727
0.001817134273040178
0.001636402035539666
0.0014731339707420147
0.0013228186455305523
0.0011905279885160533
0.001069923299755228
0.0009619064545164963
0.000866106560101833
0.0007801798498127538
0.0007044038334509719
0.001135820461718877
0.0005858462046557034
0.0005148785169405882
0.0008125646930954998
0.000419343630648423
0.0006110525605884945
0.0003393644339027041
0.00030505082851731526
0.0002748939327310508
0.0002466101258104514
0.00022217612526700695
0.00020017376735678401
0.00018111714263865545
0.00016358937979053516
0.00014736943218961575
0.00013236625616948046
0.00011853760872608077
0.00010958653853354627
9.594155330329376e-05

How a Ramsey planner responds to war depends on the structure of the asset market
If it is able to trade state-contingent debt, then at time 𝑡 = 2

• the government purchases an Arrow security that pays off when 𝑔3 = 𝑔ℎ


• the government sells an Arrow security that pays off when 𝑔3 = 𝑔𝑙
• These purchases are designed in such a way that regardless of whether or not there is a
war at 𝑡 = 3, the government will begin period 𝑡 = 4 with the same government debt

This pattern facilitates smoothing tax rates across states


The government without state contingent debt cannot do this
Instead, it must enter time 𝑡 = 3 with the same level of debt falling due whether there is
peace or war at 𝑡 = 3
It responds to this constraint by smoothing tax rates across time


To finance a war it raises taxes and issues more debt
To service the additional debt burden, it raises taxes in all future periods
The absence of state contingent debt leads to an important difference in the optimal tax pol-
icy
When the Ramsey planner has access to state contingent debt, the optimal tax policy is his-
tory independent

• the tax rate is a function of the current level of government spending only, given the
Lagrange multiplier on the implementability constraint

Without state contingent debt, the optimal tax rate is history dependent

• A war at time 𝑡 = 3 causes a permanent increase in the tax rate

Perpetual War Alert


History dependence occurs more dramatically in a case in which the government perpetually
faces the prospect of war
This case was studied in the final example of the lecture on optimal taxation with state-
contingent debt
There, each period the government faces a constant probability, 0.5, of war
In addition, this example features the following preferences

𝑢(𝑐, 𝑛) = log(𝑐) + 0.69 log(1 − 𝑛)

In accordance, we will re-define our utility function

In [7]: class LogUtility:

            def __init__(self,
                         β=0.9,
                         ψ=0.69,
                         π=0.5*np.ones((2, 2)),
                         G=np.array([0.1, 0.2]),
                         Θ=np.ones(2),
                         transfers=False):

                self.β, self.ψ, self.π = β, ψ, π
                self.G, self.Θ, self.transfers = G, Θ, transfers

            # Utility function
            def U(self, c, n):
                return np.log(c) + self.ψ * np.log(1 - n)

            # Derivatives of utility function
            def Uc(self, c, n):
                return 1 / c

            def Ucc(self, c, n):
                return -c**(-2)

            def Un(self, c, n):
                return -self.ψ / (1 - n)

            def Unn(self, c, n):
                return -self.ψ / (1 - n)**2
With these preferences, Ramsey tax rates will vary even in the Lucas-Stokey model with
state-contingent debt
The figure below plots optimal tax policies for both the economy with state contingent debt
(circles) and the economy with only a risk-free bond (triangles)

In [8]: log_example = LogUtility()
        log_example.transfers = True  # Government can use transfers
        log_sequential = SequentialAllocation(log_example)  # Solve sequential problem
        log_bellman = RecursiveAllocationAMSS(log_example, μ_grid)

        T = 20
        sHist = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
                          0, 0, 0, 1, 1, 1, 1, 1, 1, 0])

        # Simulate
        sim_seq = log_sequential.simulate(0.5, 0, T, sHist)
        sim_bel = log_bellman.simulate(0.5, 0, T, sHist)

        titles = ['Consumption', 'Labor Supply', 'Government Debt',
                  'Tax Rate', 'Government Spending', 'Output']

        # Government spending paths
        sim_seq[4] = log_example.G[sHist]
        sim_bel[4] = log_example.G[sHist]

        # Output paths
        sim_seq[5] = log_example.Θ[sHist] * sim_seq[1]
        sim_bel[5] = log_example.Θ[sHist] * sim_bel[1]

        fig, axes = plt.subplots(3, 2, figsize=(14, 10))

        for ax, title, seq, bel in zip(axes.flatten(), titles, sim_seq, sim_bel):
            ax.plot(seq, '-ok', bel, '-^b')
            ax.set(title=title)
            ax.grid()

        axes[0, 0].legend(('Complete Markets', 'Incomplete Markets'))
        plt.tight_layout()
        plt.show()

0.09444436241467027
0.05938723624807882
0.009418765429903522
0.008379498687574425
0.0074624123240604355
0.006647816620408291
0.005931361510280879
0.005294448322145543
0.0047253954106721
0.0042222775808757355
0.0037757367327914595
0.0033746180929005954
0.003017386825278821
0.002699930230109115
0.002417750826161132
0.002162259204654334
0.0019376221726160596
0.001735451076427532
0.0015551292357692775
0.0013916748907577743
0.0012464994947173087
0.0011179310440191763
0.0010013269547115295
0.0008961739076702308
0.0008040179931696027
0.0007206700039880485
0.0006461943981250373
0.0005794223638423901
0.0005197699346274911
0.0004655559524182191
0.00047793563176274804
0.00041453864841817386
0.0003355701386912934
0.0003008429779500316
0.00034467634902466326
0.00024187486910206502
0.0002748557784369297
0.0002832106657514851
0.00017453973560832125
0.00017016491393155364
0.00017694150942686578
0.00019907388038770387
0.00011291946575698052
0.00010121277902064972
9.094360747603131e-05

When the government experiences a prolonged period of peace, it is able to reduce govern-
ment debt and set permanently lower tax rates
However, the government finances a long war by borrowing and raising taxes
This results in a drift away from policies with state contingent debt that depends on the his-
tory of shocks
This is even more evident in the following figure that plots the evolution of the two policies
over 200 periods
In [9]: T = 200  # Set T to 200 periods
        sim_seq_long = log_sequential.simulate(0.5, 0, T)
        sHist_long = sim_seq_long[-3]
        sim_bel_long = log_bellman.simulate(0.5, 0, T, sHist_long)

        titles = ['Consumption', 'Labor Supply', 'Government Debt',
                  'Tax Rate', 'Government Spending', 'Output']

        # Government spending paths
        sim_seq_long[4] = log_example.G[sHist_long]
        sim_bel_long[4] = log_example.G[sHist_long]

        # Output paths
        sim_seq_long[5] = log_example.Θ[sHist_long] * sim_seq_long[1]
        sim_bel_long[5] = log_example.Θ[sHist_long] * sim_bel_long[1]

        fig, axes = plt.subplots(3, 2, figsize=(14, 10))

        for ax, title, seq, bel in zip(axes.flatten(), titles, sim_seq_long, sim_bel_long):
            ax.plot(seq, '-k', bel, '-.b', alpha=0.5)
            ax.set(title=title)
            ax.grid()

        axes[0, 0].legend(('Complete Markets', 'Incomplete Markets'))
        plt.tight_layout()
        plt.show()

Footnotes
[1] In an allocation that solves the Ramsey problem and that levies distorting taxes on labor,
why would the government ever want to hand revenues back to the private sector? It would
not in an economy with state-contingent debt, since any such allocation could be improved by
lowering distortionary taxes rather than handing out lump-sum transfers. But, without state-
contingent debt there can be circumstances when a government would like to make lump-sum
transfers to the private sector.
[2] From the first-order conditions for the Ramsey problem, there exists another realization $\tilde{s}^t$
with the same history up until the previous period, i.e., $\tilde{s}^{t-1} = s^{t-1}$, but where the multiplier
on constraint Eq. (11) takes a positive value, so $\gamma_t(\tilde{s}^t) > 0$.
81

Fluctuating Interest Rates Deliver Fiscal Insurance

81.1 Contents

• Overview 81.2

• Forces at Work 81.3

• Logical Flow of Lecture 81.4

• Example Economy 81.5

• Reverse Engineering Strategy 81.6

• Code for Reverse Engineering 81.7

• Short Simulation for Reverse-engineered: Initial Debt 81.8

• Long Simulation 81.9

• BEGS Approximations of Limiting Debt and Convergence Rate 81.10

Co-authors: Anmol Bhandari and David Evans


In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

81.2 Overview

This lecture extends our investigations of how optimal policies for levying a flat-rate tax on
labor income and issuing government debt depend on whether there are complete markets for
debt
A Ramsey allocation and Ramsey policy in the AMSS [5] model described in optimal taxation
without state-contingent debt generally differ from a Ramsey allocation and Ramsey policy
in the Lucas-Stokey [90] model described in optimal taxation with state-contingent debt


This is because the implementability restriction that a competitive equilibrium with a distort-
ing tax imposes on allocations in the Lucas-Stokey model is just one among a set of imple-
mentability conditions imposed in the AMSS model
These additional constraints require that time 𝑡 components of a Ramsey allocation for the
AMSS model be measurable with respect to time 𝑡 − 1 information
The measurability constraints imposed by the AMSS model are inherited from the restriction
that only one-period risk-free bonds can be traded
Differences between the Ramsey allocations in the two models indicate that at least some
of the measurability constraints of the AMSS model of optimal taxation without state-
contingent debt are violated at the Ramsey allocation of a corresponding Lucas-Stokey [90]
model with state-contingent debt
Another way to say this is that differences between the Ramsey allocations of the two models
indicate that some of the measurability constraints of the AMSS model are violated at the
Ramsey allocation of the Lucas-Stokey model
Nonzero Lagrange multipliers on those constraints make the Ramsey allocation for the AMSS
model differ from the Ramsey allocation for the Lucas-Stokey model
This lecture studies a special AMSS model in which

• The exogenous state variable 𝑠𝑡 is governed by a finite-state Markov chain

• With an arbitrary budget-feasible initial level of government debt, the measurability
constraints

– bind for many periods, but …
– eventually stop binding for good, so …
– in the tail of the Ramsey plan, the Lagrange multipliers 𝛾𝑡 (𝑠𝑡 ) on the AMSS implementability
constraints (8) converge to zero

• After the implementability constraints (8) no longer bind in the tail of the AMSS Ram-
sey plan

– history dependence of the AMSS state variable 𝑥𝑡 vanishes and 𝑥𝑡 becomes a time-
invariant function of the Markov state 𝑠𝑡
– the par value of government debt becomes constant over time so that 𝑏𝑡+1 (𝑠𝑡 ) =
𝑏̄ for 𝑡 ≥ 𝑇 for a sufficiently large 𝑇
– 𝑏̄ < 0, so that the tail of the Ramsey plan instructs the government always to make
a constant par value of risk-free one-period loans to the private sector
– the one-period gross interest rate 𝑅𝑡 (𝑠𝑡 ) on risk-free debt converges to a time-
invariant function of the Markov state 𝑠𝑡

• For a particular 𝑏0 < 0 (i.e., a positive level of initial government loans to the private
sector), the measurability constraints never bind

• In this special case

– the par value 𝑏𝑡+1 (𝑠𝑡 ) = 𝑏̄ of government debt at time 𝑡 and Markov state 𝑠𝑡 is
constant across time and states, but …
– the market value $\bar b / R_t(s_t)$ of government debt at time 𝑡 varies as a time-invariant
function of the Markov state 𝑠𝑡
– fluctuations in the interest rate make gross earnings on government debt $\bar b / R_t(s_t)$
fully insure the gross-of-gross-interest-payments government budget against fluctuations
in government expenditures
– the state variable 𝑥 in a recursive representation of a Ramsey plan is a time-
invariant function of the Markov state for 𝑡 ≥ 0

• In this special case, the Ramsey allocation in the AMSS model agrees with that in a
Lucas-Stokey [90] model in which the same amount of state-contingent debt falls due in all
states tomorrow

– it is a situation in which the Ramsey planner loses nothing from not being able to
purchase state-contingent debt and being restricted to exchange only risk-free debt

• This outcome emerges only when we initialize government debt at a particular 𝑏0 < 0

In a nutshell, the reason for this striking outcome is that at a particular level of risk-free gov-
ernment assets, fluctuations in the one-period risk-free interest rate provide the government
with complete insurance against stochastically varying government expenditures

81.3 Forces at Work

The forces driving asymptotic outcomes here are examples of dynamics present in a more gen-
eral class of incomplete markets models analyzed in [17] (BEGS)
BEGS provide conditions under which government debt under a Ramsey plan converges to an
invariant distribution
BEGS construct approximations to that asymptotically invariant distribution of government
debt under a Ramsey plan
BEGS also compute an approximation to a Ramsey plan’s rate of convergence to that limit-
ing invariant distribution
We shall use the BEGS approximating limiting distribution and the approximating rate of
convergence to help interpret outcomes here
For a long time, the Ramsey plan puts a nontrivial martingale-like component into the par
value of government debt as part of the way that the Ramsey plan imperfectly smooths dis-
tortions from the labor tax rate across time and Markov states
But BEGS show that binding implementability constraints slowly push government debt in
a direction designed to let the government use fluctuations in the equilibrium interest rate,
rather than fluctuations in par values of debt, to insure against shocks to government expenditures

• This is a weak (but unrelenting) force that, starting from an initial debt level, for a
long time is dominated by the stochastic martingale-like component of debt dynam-
ics that the Ramsey planner uses to facilitate imperfect tax-smoothing across time and
states
• This weak force slowly drives the par value of government assets to a constant level
at which the government can completely insure against government expenditure shocks
while shutting down the stochastic component of debt dynamics
• At that point, the tail of the par value of government debt becomes a trivial martingale:
it is constant over time

81.4 Logical Flow of Lecture

We present ideas in the following order

• We describe a two-state AMSS economy and generate a long simulation starting from a
positive initial government debt
• We observe that in a long simulation starting from positive government debt, the par
value of government debt eventually converges to a constant 𝑏̄
• In fact, the par value of government debt converges to the same constant level 𝑏̄ for al-
ternative realizations of the Markov government expenditure process and for alternative
settings of initial government debt 𝑏0
• We reverse engineer a particular value of initial government debt 𝑏0 (it turns out to be
negative) for which the continuation debt moves to 𝑏̄ immediately
• We note that for this particular initial debt 𝑏0 , the Ramsey allocations for the AMSS
economy and the Lucas-Stokey model are identical
– we verify that the LS Ramsey planner chooses to purchase identical claims to
time 𝑡 + 1 consumption for all Markov states tomorrow for each Markov state to-
day
• We compute the BEGS approximations to check how accurately they describe the dy-
namics of the long simulation

81.4.1 Equations from Lucas-Stokey (1983) Model

Although we are studying an AMSS [5] economy, a Lucas-Stokey [90] economy plays an im-
portant role in the reverse-engineering calculation to be described below
For that reason, it is helpful to have readily available some key equations underlying a Ram-
sey plan for the Lucas-Stokey economy
Recall first-order conditions for a Ramsey allocation for the Lucas-Stokey economy
For 𝑡 ≥ 1, these take the form

$$
\begin{aligned}
&(1+\Phi) u_c(c, 1-c-g) + \Phi \left[ c u_{cc}(c, 1-c-g) - (c+g) u_{\ell c}(c, 1-c-g) \right] \\
&\quad = (1+\Phi) u_{\ell}(c, 1-c-g) + \Phi \left[ c u_{c\ell}(c, 1-c-g) - (c+g) u_{\ell\ell}(c, 1-c-g) \right]
\end{aligned} \tag{1}
$$

There is one such equation for each value of the Markov state 𝑠𝑡
In addition, given an initial Markov state, the time 𝑡 = 0 quantities 𝑐0 and 𝑏0 satisfy

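$$
\begin{aligned}
&(1+\Phi) u_c(c, 1-c-g) + \Phi \left[ c u_{cc}(c, 1-c-g) - (c+g) u_{\ell c}(c, 1-c-g) \right] \\
&\quad = (1+\Phi) u_{\ell}(c, 1-c-g) + \Phi \left[ c u_{c\ell}(c, 1-c-g) - (c+g) u_{\ell\ell}(c, 1-c-g) \right] + \Phi (u_{cc} - u_{c,\ell}) b_0
\end{aligned} \tag{2}
$$
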
In addition, the time 𝑡 = 0 budget constraint is satisfied at 𝑐0 and initial government debt 𝑏0 :

$$
b_0 + g_0 = \tau_0 (c_0 + g_0) + \frac{\bar b}{R_0} \tag{3}
$$

where 𝑅0 is the gross interest rate for the Markov state 𝑠0 that is assumed to prevail at time
𝑡 = 0 and 𝜏0 is the time 𝑡 = 0 tax rate

In Eq. (3), it is understood that

$$
\tau_0 = 1 - \frac{u_{l,0}}{u_{c,0}}
$$

$$
R_0^{-1} = \beta \sum_{s=1}^{S} \Pi(s | s_0) \frac{u_c(s)}{u_{c,0}}
$$
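
As a minimal sketch of the two definitions above, the snippet below evaluates 𝜏0 and 𝑅0 under the CRRA specification introduced in the next subsection, where 𝑢ℓ is replaced by −𝑢𝑛 and 𝑛 = 𝑐 + 𝑔. The function names `tax_rate` and `gross_interest_rate` and the consumption values are illustrative assumptions, not part of the lecture's code.

```python
import numpy as np

σ, γ, β = 2.0, 2.0, 0.9          # preference parameters used in this lecture

def u_c(c):
    "Marginal utility of consumption for CRRA utility"
    return c**(-σ)

def u_n(n):
    "Marginal utility of labor (negative, since work is costly)"
    return -n**γ

def tax_rate(c, g):
    "τ = 1 - u_l / u_c with u_l = -u_n and n = c + g"
    return 1 + u_n(c + g) / u_c(c)

def gross_interest_rate(c0, c_next, probs):
    "R_0 from R_0^{-1} = β Σ_s Π(s|s_0) u_c(s) / u_{c,0}"
    return 1 / (β * probs @ (u_c(c_next) / u_c(c0)))

# Hypothetical numbers chosen only for illustration
c0, g0 = 0.9, 0.1
c_next = np.array([0.9, 0.9])    # same consumption in both states tomorrow
probs = np.array([0.5, 0.5])     # Π(s | s_0)

τ_0 = tax_rate(c0, g0)
R_0 = gross_interest_rate(c0, c_next, probs)
print(τ_0, R_0)
```

In this illustrative case consumption tomorrow equals consumption today in every state, so the interest-rate formula collapses to 𝑅0 = 1/𝛽.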

It is useful to transform some of the above equations to forms that are more natural for ana-
lyzing the case of a CRRA utility specification that we shall use in our example economies

81.4.2 Specification with CRRA Utility

As in lectures optimal taxation without state-contingent debt and optimal taxation with
state-contingent debt, we assume that the representative agent has utility function

$$
u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}
$$

and set 𝜎 = 2, 𝛾 = 2, and the discount factor 𝛽 = 0.9


We eliminate leisure from the model and continue to assume that

𝑐𝑡 + 𝑔𝑡 = 𝑛𝑡

The analysis of Lucas and Stokey prevails once we make the following replacements

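$$
\begin{aligned}
u_{\ell}(c, \ell) &\sim -u_n(c, n) \\
u_c(c, \ell) &\sim u_c(c, n) \\
u_{\ell,\ell}(c, \ell) &\sim u_{nn}(c, n) \\
u_{c,c}(c, \ell) &\sim u_{c,c}(c, n) \\
u_{c,\ell}(c, \ell) &\sim 0
\end{aligned}
$$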

With these understandings, equations Eq. (1) and Eq. (2) simplify in the case of the CRRA
utility function
They become

(1 + Φ)[𝑢𝑐 (𝑐) + 𝑢𝑛 (𝑐 + 𝑔)] + Φ[𝑐𝑢𝑐𝑐 (𝑐) + (𝑐 + 𝑔)𝑢𝑛𝑛 (𝑐 + 𝑔)] = 0 (4)

and

(1 + Φ)[𝑢𝑐 (𝑐0 ) + 𝑢𝑛 (𝑐0 + 𝑔0 )] + Φ[𝑐0 𝑢𝑐𝑐 (𝑐0 ) + (𝑐0 + 𝑔0 )𝑢𝑛𝑛 (𝑐0 + 𝑔0 )] − Φ𝑢𝑐𝑐 (𝑐0 )𝑏0 = 0 (5)

In equation Eq. (4), it is understood that 𝑐 and 𝑔 are each functions of the Markov state 𝑠
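
To see what Eq. (4) implies under the CRRA specification, it may help to substitute the derivatives $u_c = c^{-\sigma}$, $u_{cc} = -\sigma c^{-\sigma-1}$, $u_n = -n^{\gamma}$ and $u_{nn} = -\gamma n^{\gamma-1}$ with $n = c + g$. The following is a worked sketch, not a step taken from the original derivation:

```latex
\begin{aligned}
0 &= (1+\Phi)\left[ c^{-\sigma} - n^{\gamma} \right]
     + \Phi \left[ -\sigma c^{-\sigma} - \gamma n^{\gamma} \right] \\
  &= \left[ 1 + \Phi(1-\sigma) \right] c^{-\sigma}
     - \left[ 1 + \Phi(1+\gamma) \right] (c+g)^{\gamma}
\end{aligned}
```

With 𝜎 = 𝛾 = 2 this reads $(1-\Phi) c^{-2} = (1+3\Phi)(c+g)^2$, which is one equation in 𝑐 for each value of 𝑔(𝑠).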
The CRRA utility function is represented in the following class

In [2]: import numpy as np

class CRRAutility:

    def __init__(self,
                 β=0.9,
                 σ=2,
                 γ=2,
                 π=0.5*np.ones((2, 2)),
                 G=np.array([0.1, 0.2]),
                 Θ=np.ones(2),
                 transfers=False):

        self.β, self.σ, self.γ = β, σ, γ
        self.π, self.G, self.Θ, self.transfers = π, G, Θ, transfers

    # Utility function
    def U(self, c, n):
        σ = self.σ
        if σ == 1.:
            U = np.log(c)
        else:
            U = (c**(1 - σ) - 1) / (1 - σ)
        return U - n**(1 + self.γ) / (1 + self.γ)

    # Derivatives of utility function
    def Uc(self, c, n):
        return c**(-self.σ)

    def Ucc(self, c, n):
        return -self.σ * c**(-self.σ - 1)

    def Un(self, c, n):
        return -n**self.γ

    def Unn(self, c, n):
        return -self.γ * n**(self.γ - 1)
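
A quick way to gain confidence in the derivative methods above is to compare them with finite differences. The sketch below restates the same formulas as standalone functions (so that it runs on its own) and checks them at one arbitrary point; the evaluation point and step size `h` are assumptions made for illustration.

```python
import numpy as np

σ, γ = 2.0, 2.0

def U(c, n):
    return (c**(1 - σ) - 1) / (1 - σ) - n**(1 + γ) / (1 + γ)

def Uc(c, n):
    return c**(-σ)

def Ucc(c, n):
    return -σ * c**(-σ - 1)

def Un(c, n):
    return -n**γ

def Unn(c, n):
    return -γ * n**(γ - 1)

c, n, h = 0.9, 1.0, 1e-5   # arbitrary evaluation point and step size

# Central finite differences of U and of its first derivatives
Uc_fd = (U(c + h, n) - U(c - h, n)) / (2 * h)
Un_fd = (U(c, n + h) - U(c, n - h)) / (2 * h)
Ucc_fd = (Uc(c + h, n) - Uc(c - h, n)) / (2 * h)
Unn_fd = (Un(c, n + h) - Un(c, n - h)) / (2 * h)

errors = np.abs([Uc_fd - Uc(c, n), Un_fd - Un(c, n),
                 Ucc_fd - Ucc(c, n), Unn_fd - Unn(c, n)])
print(errors.max())
```

The largest discrepancy should be tiny relative to the derivatives themselves, which are of order one at this point.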

81.5 Example Economy

We set the following parameter values


The Markov state 𝑠𝑡 takes two values, namely, 0, 1
The initial Markov state is 0
The Markov transition matrix is a 2 × 2 matrix with every entry equal to .5, so the 𝑠𝑡 process is
IID
Government expenditures 𝑔(𝑠) equal .1 in Markov state 0 and .2 in Markov state 1
We set preference parameters as follows:

𝛽 = .9
𝜎=2
𝛾=2
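
The IID property can be checked directly from the default transition matrix used by `CRRAutility`, namely `0.5 * np.ones((2, 2))`. The sketch below is only an illustration: the seed, the sample length and the variable names are assumptions.

```python
import numpy as np

π = 0.5 * np.ones((2, 2))        # default transition matrix in CRRAutility

# Each row is a probability distribution over next period's state
rows_sum_to_one = np.allclose(π.sum(axis=1), 1.0)

# Identical rows: tomorrow's distribution does not depend on today's state
rows_identical = np.allclose(π[0], π[1])

# Simulate a path, drawing each state from the row of the previous state
rng = np.random.default_rng(0)   # seed chosen arbitrarily
T = 10_000
s_hist = np.empty(T, dtype=int)
s_hist[0] = 0
for t in range(1, T):
    s_hist[t] = rng.choice(2, p=π[s_hist[t - 1]])

print(rows_sum_to_one, rows_identical, s_hist.mean())
```

Because both rows are the same, the simulated fraction of time spent in state 1 should be close to one half regardless of the initial state.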

Here are several classes that do most of the work for us


The code is mostly taken or adapted from the earlier lectures optimal taxation without state-
contingent debt and optimal taxation with state-contingent debt

In [3]: import numpy as np
from scipy.optimize import root
from quantecon import MarkovChain

class SequentialAllocation:

    '''
    Class that takes a utility object such as CRRAutility as input and
    returns the planner's allocation as a function of the multiplier on the
    implementability constraint μ.
    '''

    def __init__(self, model):

        # Initialize from model object attributes
        self.β, self.π, self.G = model.β, model.π, model.G
        self.mc, self.Θ = MarkovChain(self.π), model.Θ
        self.S = len(model.π)  # Number of states
        self.model = model

        # Find the first best allocation
        self.find_first_best()

    def find_first_best(self):
        '''
        Find the first best allocation
        '''
        model = self.model
        S, Θ, G = self.S, self.Θ, self.G
        Uc, Un = model.Uc, model.Un

        def res(z):
            c = z[:S]
            n = z[S:]
            return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])

        res = root(res, 0.5 * np.ones(2 * S))

        if not res.success:
            raise Exception('Could not find first best')

        self.cFB = res.x[:S]
        self.nFB = res.x[S:]

        # Multiplier on the resource constraint
        self.ΞFB = Uc(self.cFB, self.nFB)
        self.zFB = np.hstack([self.cFB, self.nFB, self.ΞFB])

    def time1_allocation(self, μ):
        '''
        Computes optimal allocation for time t >= 1 for a given μ
        '''
        model = self.model
        S, Θ, G = self.S, self.Θ, self.G
        Uc, Ucc, Un, Unn = model.Uc, model.Ucc, model.Un, model.Unn

        def FOC(z):
            c = z[:S]
            n = z[S:2 * S]
            Ξ = z[2 * S:]
            return np.hstack([Uc(c, n) - μ * (Ucc(c, n) * c + Uc(c, n)) - Ξ,  # FOC of c
                              Un(c, n) - μ * (Unn(c, n) * n + Un(c, n)) +
                              Θ * Ξ,  # FOC of n
                              Θ * n - c - G])

        # Find the root of the first-order condition
        res = root(FOC, self.zFB)
        if not res.success:
            raise Exception('Could not find LS allocation.')
        z = res.x
        c, n, Ξ = z[:S], z[S:2 * S], z[2 * S:]

        # Compute x
        I = Uc(c, n) * c + Un(c, n) * n
        x = np.linalg.solve(np.eye(S) - self.β * self.π, I)

        return c, n, x, Ξ

    def time0_allocation(self, B_, s_0):
        '''
        Finds the optimal allocation given initial government debt B_ and state s_0
        '''
        model, π, Θ, G, β = self.model, self.π, self.Θ, self.G, self.β
        Uc, Ucc, Un, Unn = model.Uc, model.Ucc, model.Un, model.Unn

        # First order conditions of planner's problem
        def FOC(z):
            μ, c, n, Ξ = z
            xprime = self.time1_allocation(μ)[2]
            return np.hstack([Uc(c, n) * (c - B_) + Un(c, n) * n + β * π[s_0] @ xprime,
                              Uc(c, n) - μ * (Ucc(c, n) *
                                              (c - B_) + Uc(c, n)) - Ξ,
                              Un(c, n) - μ * (Unn(c, n) * n +
                                              Un(c, n)) + Θ[s_0] * Ξ,
                              (Θ * n - c - G)[s_0]])

        # Find root
        res = root(FOC, np.array(
            [0, self.cFB[s_0], self.nFB[s_0], self.ΞFB[s_0]]))
        if not res.success:
            raise Exception('Could not find time 0 LS allocation.')

        return res.x

    def time1_value(self, μ):
        '''
        Find the value associated with multiplier μ
        '''
        c, n, x, Ξ = self.time1_allocation(μ)
        U = self.model.U(c, n)
        V = np.linalg.solve(np.eye(self.S) - self.β * self.π, U)
        return c, n, x, V

    def Τ(self, c, n):
        '''
        Computes Τ given c, n
        '''
        model = self.model
        Uc, Un = model.Uc(c, n), model.Un(c, n)

        return 1 + Un / (self.Θ * Uc)

    def simulate(self, B_, s_0, T, sHist=None):
        '''
        Simulates planners policies for T periods
        '''
        model, π, β = self.model, self.π, self.β
        Uc = model.Uc

        if sHist is None:
            sHist = self.mc.simulate(T, s_0)

        cHist, nHist, Bhist, ΤHist, μHist = np.zeros((5, T))
        RHist = np.zeros(T - 1)

        # Time 0
        μ, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
        ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
        Bhist[0] = B_
        μHist[0] = μ

        # Time 1 onward
        for t in range(1, T):
            c, n, x, Ξ = self.time1_allocation(μ)
            Τ = self.Τ(c, n)
            u_c = Uc(c, n)
            s = sHist[t]
            Eu_c = π[sHist[t - 1]] @ u_c
            cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x[s] / \
                u_c[s], Τ[s]
            RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (β * Eu_c)
            μHist[t] = μ

        # RHist has length T - 1, so the histories form a ragged (object) array
        return np.array([cHist, nHist, Bhist, ΤHist, sHist, μHist, RHist],
                        dtype=object)

In [4]: from scipy.optimize import fmin_slsqp

class RecursiveAllocationAMSS:

    def __init__(self, model, μgrid, tol_diff=1e-4, tol=1e-4):

        self.β, self.π, self.G = model.β, model.π, model.G
        self.mc, self.S = MarkovChain(self.π), len(model.π)  # Number of states
        self.Θ, self.model, self.μgrid = model.Θ, model, μgrid
        self.tol_diff, self.tol = tol_diff, tol

        # Solve the time 1 Bellman equation
        self.solve_time1_bellman()
        self.T.time_0 = True  # Bellman equation now solves time 0 problem

    def solve_time1_bellman(self):
        '''
        Solve the time 1 Bellman equation for calibration model and
        initial grid μgrid0
        '''
        model, μgrid0 = self.model, self.μgrid
        π = model.π
        S = len(model.π)

        # First get initial fit from Lucas Stokey solution.
        # Need to change things to be ex ante
        PP = SequentialAllocation(model)
        interp = interpolator_factory(2, None)

        def incomplete_allocation(μ_, s_):
            c, n, x, V = PP.time1_value(μ_)
            return c, n, π[s_] @ x, π[s_] @ V
        cf, nf, xgrid, Vf, xprimef = [], [], [], [], []
        for s_ in range(S):
            c, n, x, V = zip(*map(lambda μ: incomplete_allocation(μ, s_), μgrid0))
            c, n = np.vstack(c).T, np.vstack(n).T
            x, V = np.hstack(x), np.hstack(V)
            xprimes = np.vstack([x] * S)
            cf.append(interp(x, c))
            nf.append(interp(x, n))
            Vf.append(interp(x, V))
            xgrid.append(x)
            xprimef.append(interp(x, xprimes))
        cf, nf, xprimef = fun_vstack(cf), fun_vstack(nf), fun_vstack(xprimef)
        Vf = fun_hstack(Vf)
        policies = [cf, nf, xprimef]

        # Create xgrid
        x = np.vstack(xgrid).T
        xbar = [x.min(0).max(), x.max(0).min()]
        xgrid = np.linspace(xbar[0], xbar[1], len(μgrid0))
        self.xgrid = xgrid

        # Now iterate on Bellman equation
        T = BellmanEquation(model, xgrid, policies, tol=self.tol)
        diff = 1
        while diff > self.tol_diff:
            PF = T(Vf)

            Vfnew, policies = self.fit_policy_function(PF)
            diff = np.abs((Vf(xgrid) - Vfnew(xgrid)) / Vf(xgrid)).max()

            print(diff)
            Vf = Vfnew

        # Store value function policies and Bellman Equations
        self.Vf = Vf
        self.policies = policies
        self.T = T

    def fit_policy_function(self, PF):
        '''
        Fits the policy functions
        '''
        S, xgrid = len(self.π), self.xgrid
        interp = interpolator_factory(3, 0)
        cf, nf, xprimef, Tf, Vf = [], [], [], [], []
        for s_ in range(S):
            PFvec = np.vstack([PF(x, s_) for x in self.xgrid]).T
            Vf.append(interp(xgrid, PFvec[0, :]))
            cf.append(interp(xgrid, PFvec[1:1 + S]))
            nf.append(interp(xgrid, PFvec[1 + S:1 + 2 * S]))
            xprimef.append(interp(xgrid, PFvec[1 + 2 * S:1 + 3 * S]))
            Tf.append(interp(xgrid, PFvec[1 + 3 * S:]))
        policies = fun_vstack(cf), fun_vstack(
            nf), fun_vstack(xprimef), fun_vstack(Tf)
        Vf = fun_hstack(Vf)
        return Vf, policies

    def Τ(self, c, n):
        '''
        Computes Τ given c and n
        '''
        model = self.model
        Uc, Un = model.Uc(c, n), model.Un(c, n)

        return 1 + Un / (self.Θ * Uc)

    def time0_allocation(self, B_, s0):
        '''
        Finds the optimal allocation given initial government debt B_ and
        state s_0
        '''
        PF = self.T(self.Vf)
        z0 = PF(B_, s0)
        c0, n0, xprime0, T0 = z0[1:]
        return c0, n0, xprime0, T0

    def simulate(self, B_, s_0, T, sHist=None):
        '''
        Simulates planners policies for T periods
        '''
        model, π = self.model, self.π
        Uc = model.Uc
        cf, nf, xprimef, Tf = self.policies

        if sHist is None:
            sHist = simulate_markov(π, s_0, T)

        cHist, nHist, Bhist, xHist, ΤHist, THist, μHist = np.zeros((7, T))

        # time 0
        cHist[0], nHist[0], xHist[0], THist[0] = self.time0_allocation(B_, s_0)
        ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
        Bhist[0] = B_
        μHist[0] = self.Vf[s_0](xHist[0])

        # time 1 onward
        for t in range(1, T):
            s_, x, s = sHist[t - 1], xHist[t - 1], sHist[t]
            c, n, xprime, T = cf[s_, :](x), nf[s_, :](
                x), xprimef[s_, :](x), Tf[s_, :](x)

            Τ = self.Τ(c, n)[s]
            u_c = Uc(c, n)
            Eu_c = π[s_, :] @ u_c

            μHist[t] = self.Vf[s](xprime[s])

            cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x / Eu_c, Τ
            xHist[t], THist[t] = xprime[s], T[s]
        return np.array([cHist, nHist, Bhist, ΤHist, THist, μHist, sHist, xHist])

class BellmanEquation:
    '''
    Bellman equation for the continuation of the Lucas-Stokey Problem
    '''

    def __init__(self, model, xgrid, policies0, tol, maxiter=1000):

        self.β, self.π, self.G = model.β, model.π, model.G
        self.S = len(model.π)  # Number of states
        self.Θ, self.model, self.tol = model.Θ, model, tol
        self.maxiter = maxiter

        self.xbar = [min(xgrid), max(xgrid)]
        self.time_0 = False

        self.z0 = {}
        cf, nf, xprimef = policies0

        for s_ in range(self.S):
            for x in xgrid:
                self.z0[x, s_] = np.hstack([cf[s_, :](x),
                                            nf[s_, :](x),
                                            xprimef[s_, :](x),
                                            np.zeros(self.S)])

        self.find_first_best()

    def find_first_best(self):
        '''
        Find the first best allocation
        '''
        model = self.model
        S, Θ, Uc, Un, G = self.S, self.Θ, model.Uc, model.Un, self.G

        def res(z):
            c = z[:S]
            n = z[S:]
            return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])

        res = root(res, 0.5 * np.ones(2 * S))
        if not res.success:
            raise Exception('Could not find first best')

        self.cFB = res.x[:S]
        self.nFB = res.x[S:]
        IFB = Uc(self.cFB, self.nFB) * self.cFB + \
            Un(self.cFB, self.nFB) * self.nFB

        self.xFB = np.linalg.solve(np.eye(S) - self.β * self.π, IFB)

        self.zFB = {}
        for s in range(S):
            self.zFB[s] = np.hstack(
                [self.cFB[s], self.nFB[s], self.π[s] @ self.xFB, 0.])

    def __call__(self, Vf):
        '''
        Given continuation value function next period return value function this
        period return T(V) and optimal policies
        '''
        if not self.time_0:
            def PF(x, s): return self.get_policies_time1(x, s, Vf)
        else:
            def PF(B_, s0): return self.get_policies_time0(B_, s0, Vf)
        return PF

    def get_policies_time1(self, x, s_, Vf):
        '''
        Finds the optimal policies
        '''
        model, β, Θ, G, S, π = self.model, self.β, self.Θ, self.G, self.S, self.π
        U, Uc, Un = model.U, model.Uc, model.Un

        def objf(z):
            c, n, xprime = z[:S], z[S:2 * S], z[2 * S:3 * S]

            Vprime = np.empty(S)
            for s in range(S):
                Vprime[s] = Vf[s](xprime[s])

            return -π[s_] @ (U(c, n) + β * Vprime)

        def cons(z):
            c, n, xprime, T = z[:S], z[S:2 * S], z[2 * S:3 * S], z[3 * S:]
            u_c = Uc(c, n)
            Eu_c = π[s_] @ u_c
            return np.hstack([
                x * u_c / Eu_c - u_c * (c - T) - Un(c, n) * n - β * xprime,
                Θ * n - c - G])

        if model.transfers:
            bounds = [(0., 100)] * S + [(0., 100)] * S + \
                [self.xbar] * S + [(0., 100.)] * S
        else:
            bounds = [(0., 100)] * S + [(0., 100)] * S + \
                [self.xbar] * S + [(0., 0.)] * S
        out, fx, _, imode, smode = fmin_slsqp(objf, self.z0[x, s_],
                                              f_eqcons=cons, bounds=bounds,
                                              full_output=True, iprint=0,
                                              acc=self.tol, iter=self.maxiter)

        if imode > 0:
            raise Exception(smode)

        self.z0[x, s_] = out
        return np.hstack([-fx, out])

    def get_policies_time0(self, B_, s0, Vf):
        '''
        Finds the optimal policies
        '''
        model, β, Θ, G = self.model, self.β, self.Θ, self.G
        U, Uc, Un = model.U, model.Uc, model.Un

        def objf(z):
            c, n, xprime = z[:-1]

            return -(U(c, n) + β * Vf[s0](xprime))

        def cons(z):
            c, n, xprime, T = z
            return np.hstack([
                -Uc(c, n) * (c - B_ - T) - Un(c, n) * n - β * xprime,
                (Θ * n - c - G)[s0]])

        if model.transfers:
            bounds = [(0., 100), (0., 100), self.xbar, (0., 100.)]
        else:
            bounds = [(0., 100), (0., 100), self.xbar, (0., 0.)]
        out, fx, _, imode, smode = fmin_slsqp(objf, self.zFB[s0], f_eqcons=cons,
                                              bounds=bounds, full_output=True, iprint=0)

        if imode > 0:
            raise Exception(smode)

        return np.hstack([-fx, out])

In [5]: from scipy.interpolate import UnivariateSpline

class interpolate_wrapper:

    def __init__(self, F):
        self.F = F

    def __getitem__(self, index):
        return interpolate_wrapper(np.asarray(self.F[index]))

    def reshape(self, *args):
        self.F = self.F.reshape(*args)
        return self

    def transpose(self):
        self.F = self.F.transpose()

    def __len__(self):
        return len(self.F)

    def __call__(self, xvec):
        x = np.atleast_1d(xvec)
        shape = self.F.shape
        if len(x) == 1:
            fhat = np.hstack([f(x) for f in self.F.flatten()])
            return fhat.reshape(shape)
        else:
            fhat = np.vstack([f(x) for f in self.F.flatten()])
            return fhat.reshape(np.hstack((shape, len(x))))

class interpolator_factory:

    def __init__(self, k, s):
        self.k, self.s = k, s

    def __call__(self, xgrid, Fs):
        shape, m = Fs.shape[:-1], Fs.shape[-1]
        Fs = Fs.reshape((-1, m))
        F = []
        xgrid = np.sort(xgrid)  # Sort xgrid
        for Fhat in Fs:
            F.append(UnivariateSpline(xgrid, Fhat, k=self.k, s=self.s))
        return interpolate_wrapper(np.array(F).reshape(shape))

def fun_vstack(fun_list):

    Fs = [IW.F for IW in fun_list]
    return interpolate_wrapper(np.vstack(Fs))

def fun_hstack(fun_list):

    Fs = [IW.F for IW in fun_list]
    return interpolate_wrapper(np.hstack(Fs))

def simulate_markov(π, s_0, T):

    sHist = np.empty(T, dtype=int)
    sHist[0] = s_0
    S = len(π)
    for t in range(1, T):
        sHist[t] = np.random.choice(np.arange(S), p=π[sHist[t - 1]])

    return sHist
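
The two arguments of `interpolator_factory` are passed straight to SciPy's `UnivariateSpline` as the spline degree `k` and the smoothing factor `s`. The sketch below illustrates what the settings `(3, 0)` and `(2, None)` used in this lecture mean; the data being fitted are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

xgrid = np.linspace(0.0, 1.0, 11)
F = xgrid**2                       # hypothetical data to interpolate

# s=0 forces the cubic spline through every data point
exact = UnivariateSpline(xgrid, F, k=3, s=0)

# s=None lets SciPy pick a default smoothing factor, so the fit need not
# pass through the data exactly
smooth = UnivariateSpline(xgrid, F, k=2, s=None)

max_err_exact = np.abs(exact(xgrid) - F).max()
print(max_err_exact, float(exact(0.55)), float(smooth(0.55)))
```

With `s=0` the spline is a pure interpolant, which suits fitting policy functions on a fixed grid; a positive or default `s` trades exactness at the grid points for smoothness.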

81.6 Reverse Engineering Strategy

We can reverse engineer a value 𝑏0 of initial debt due that renders the AMSS measurability
constraints not binding from time 𝑡 = 0 onward
We accomplish this by recognizing that if the AMSS measurability constraints never bind,
then the AMSS allocation and Ramsey plan are equivalent to those for a Lucas-Stokey econ-
omy in which for each period 𝑡 ≥ 0, the government promises to pay the same state-
contingent amount 𝑏̄ in each state tomorrow
This insight tells us to find a 𝑏0 and other fundamentals for the Lucas-Stokey [90] model that
make the Ramsey planner want to borrow the same value 𝑏̄ next period for all states and all
dates
We accomplish this by using various equations for the Lucas-Stokey [90] model presented in
optimal taxation with state-contingent debt
We use the following steps
Step 1: Pick an initial Φ
Step 2: Given that Φ, jointly solve two versions of equation Eq. (4) for 𝑐(𝑠), 𝑠 = 1, 2 associ-
ated with the two values for 𝑔(𝑠), 𝑠 = 1, 2
Step 3: Solve the following equation for $\vec x$

$$
\vec x = (I - \beta \Pi)^{-1} \left[ \vec u_c (\vec n - \vec g) - \vec u_l \vec n \right] \tag{6}
$$


Step 4: After solving for $\vec x$, we can find 𝑏(𝑠𝑡 |𝑠𝑡−1 ) in Markov state 𝑠𝑡 = 𝑠 from
$b(s) = \frac{x(s)}{u_c(s)}$ or the matrix equation

$$
\vec b = \frac{\vec x}{\vec u_c} \tag{7}
$$

Step 5: Compute $J(\Phi) = (b(1) - b(2))^2$
Step 6: Put steps 2 through 5 in a function minimizer and find a Φ that minimizes 𝐽 (Φ)
Step 7: At the value of Φ and the value of 𝑏̄ that emerged from step 6, solve equations
Eq. (5) and Eq. (3) jointly for 𝑐0 , 𝑏0
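
Steps 1 through 6 can be sketched compactly as follows. This is an illustrative variant rather than the lecture's implementation: it assumes the CRRA parameter values of this lecture, solves Eq. (4) with a bracketing root-finder (the bracket is an assumption that suits these parameters), and replaces the minimization of 𝐽(Φ) by root-finding on 𝑏(1) − 𝑏(2) over an assumed interval for Φ. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq

β, σ, γ = 0.9, 2.0, 2.0
Π = 0.5 * np.ones((2, 2))
g = np.array([0.1, 0.2])

def solve_c(Φ, g_s):
    "Step 2: solve Eq. (4) for c given Φ and one value of g"
    def eq4(c):
        n = c + g_s
        u_c, u_cc = c**(-σ), -σ * c**(-σ - 1)
        u_n, u_nn = -n**γ, -γ * n**(γ - 1)
        return (1 + Φ) * (u_c + u_n) + Φ * (c * u_cc + n * u_nn)
    return brentq(eq4, 1e-3, 2.0)   # bracket is an assumption for these parameters

def debt_by_state(Φ):
    "Steps 2-4: consumption, then x from Eq. (6), then b = x / u_c"
    c = np.array([solve_c(Φ, g_s) for g_s in g])
    n = c + g
    u_c, u_l = c**(-σ), n**γ        # u_l = -u_n
    x = np.linalg.solve(np.eye(2) - β * Π, u_c * (n - g) - u_l * n)
    return x / u_c

def gap(Φ):
    "b(1) - b(2); J(Φ) in Step 5 is the square of this gap"
    b = debt_by_state(Φ)
    return b[0] - b[1]

# Step 6, done here by root-finding on the gap instead of minimizing J
Φ_hat = brentq(gap, 0.0, 0.05)      # search interval is an assumption
b_hat = debt_by_state(Φ_hat)
print(Φ_hat, b_hat)
```

Root-finding on the gap avoids the flat region that a squared loss has near its minimum, at the cost of needing an interval on which the gap changes sign.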

81.7 Code for Reverse Engineering

Here is code to do the calculations for us

In [6]: from scipy.optimize import fsolve, fmin

u = CRRAutility()

def min_Φ(Φ):

    g1, g2 = u.G  # Government spending in s=0 and s=1

    # Solve Φ(c)
    def equations(unknowns, Φ):
        c1, c2 = unknowns
        # Second argument of .Uc and first argument of .Un are redundant

        # Set up simultaneous equations
        eq = lambda c, g: (1 + Φ) * (u.Uc(c, 1) + u.Un(1, c + g)) + \
                          Φ * ((c + g) * u.Unn(1, c + g) + c * u.Ucc(c, 1))

        # Return equation evaluated at s=1 and s=2
        return np.array([eq(c1, g1), eq(c2, g2)]).flatten()

    global c1  # Update c1 globally
    global c2  # Update c2 globally

    c1, c2 = fsolve(equations, np.ones(2), args=(Φ))

    uc = u.Uc(np.array([c1, c2]), 1)  # uc(n - g)
    ul = -u.Un(1, np.array([c1 + g1, c2 + g2])) * [c1 + g1, c2 + g2]  # ul(n) = -un(c + g)
    x = np.linalg.solve(np.eye((2)) - u.β * u.π, uc * [c1, c2] - ul)  # solve for x

    global b  # Update b globally
    b = x / uc
    loss = (b[0] - b[1])**2

    return loss

Φ_star = fmin(min_Φ, .1, ftol=1e-14)

Optimization terminated successfully.


Current function value: 0.000000
Iterations: 24
Function evaluations: 48

To recover and print out 𝑏̄

In [7]: b_bar = b[0]
b_bar

Out[7]: -1.0757576567504166

To complete the reverse engineering exercise by jointly determining 𝑐0 , 𝑏0 , we set up a
function that returns two simultaneous equations

In [8]: def solve_cb(unknowns, Φ, b_bar, s=1):

    c0, b0 = unknowns

    # s is a 1-based state label, so s - 1 indexes both G and π
    g0 = u.G[s-1]

    R_0 = u.β * u.π[s-1] @ [u.Uc(c1, 1) / u.Uc(c0, 1), u.Uc(c2, 1) / u.Uc(c0, 1)]
    R_0 = 1 / R_0

    τ_0 = 1 + u.Un(1, c0 + g0) / u.Uc(c0, 1)

    eq1 = τ_0 * (c0 + g0) + b_bar / R_0 - b0 - g0
    eq2 = (1 + Φ) * (u.Uc(c0, 1) + u.Un(1, c0 + g0)) + \
          Φ * (c0 * u.Ucc(c0, 1) + (c0 + g0) * u.Unn(1, c0 + g0)) - \
          Φ * u.Ucc(c0, 1) * b0

    return np.array([eq1, eq2], dtype='float64')

To solve the equations for 𝑐0 , 𝑏0 , we use SciPy’s fsolve function

In [9]: c0, b0 = fsolve(solve_cb, np.array([1., -1.], dtype='float64'), args=(Φ_star, b[0], 1), xtol=1.0e-12)
c0, b0

Out[9]: (0.9344994030900681, -1.0386984075517638)

Thus, we have reverse engineered an initial 𝑏0 = −1.038698407551764 that ought to render
the AMSS measurability constraints slack

81.8 Short Simulation for Reverse-engineered: Initial Debt

The following graph shows simulations of outcomes for both a Lucas-Stokey economy and for
an AMSS economy starting from initial government debt equal to 𝑏0 = −1.038698407551764
These graphs report outcomes for both the Lucas-Stokey economy with complete markets and
the AMSS economy with one-period risk-free debt only

In [10]: import matplotlib.pyplot as plt
%matplotlib inline

μ_grid = np.linspace(-0.09, 0.1, 100)

log_example = CRRAutility()

log_example.transfers = True  # Government can use transfers
log_sequential = SequentialAllocation(log_example)  # Solve sequential problem
log_bellman = RecursiveAllocationAMSS(log_example, μ_grid, tol_diff=1e-10, tol=1e-12)

T = 20
sHist = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
                  0, 0, 0, 1, 1, 1, 1, 1, 1, 0])

sim_seq = log_sequential.simulate(-1.03869841, 0, T, sHist)
sim_bel = log_bellman.simulate(-1.03869841, 0, T, sHist)

titles = ['Consumption', 'Labor Supply', 'Government Debt',
          'Tax Rate', 'Government Spending', 'Output']

# Government spending paths
sim_seq[4] = log_example.G[sHist]
sim_bel[4] = log_example.G[sHist]

# Output paths
sim_seq[5] = log_example.Θ[sHist] * sim_seq[1]
sim_bel[5] = log_example.Θ[sHist] * sim_bel[1]

fig, axes = plt.subplots(3, 2, figsize=(14, 10))

for ax, title, seq, bel in zip(axes.flatten(), titles, sim_seq, sim_bel):
    ax.plot(seq, '-ok', bel, '-^b')
    ax.set(title=title)
    ax.grid()

axes[0, 0].legend(('Complete Markets', 'Incomplete Markets'))
plt.tight_layout()
plt.show()

/home/anju/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:24: RuntimeWarning: divide by zero enco


/home/anju/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:29: RuntimeWarning: divide by zero enco
/home/anju/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:232: RuntimeWarning: invalid value enco
/home/anju/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:232: RuntimeWarning: invalid value enco

0.04094445433234912
0.0016732111459338028
0.0014846748487524172
0.0013137721375787164
0.001181403713496291
0.001055965336274255
0.0009446661646844358
0.0008463807322718293
0.0007560453780620191
0.0006756001036624751
0.0006041528458700388
0.0005396004512131591
0.0004820716911559142
0.0004308273211001684

0.0003848185136981698
0.0003438352175587286
0.000307243693715206
0.0002745009148200469
0.00024531773404782317
0.0002192332430448889
0.00019593539446980383
0.00017514303514117128
0.0001565593983558638
0.00013996737141091305
0.00012514457833358872
0.00011190070779369022
0.0001000702022487836
8.949728533921615e-05
8.004975220206986e-05
7.16059059036149e-05
6.40583656889648e-05
5.731162430892402e-05
5.127968193566545e-05
4.5886529754852955e-05
4.106387898823845e-05
3.675099365037568e-05
3.289361837628717e-05
2.9443289305467077e-05
2.635678797913085e-05
2.3595484132661966e-05
2.1124903957300157e-05
1.891424711454524e-05
1.6936003234214835e-05
1.5165596593393527e-05
1.358106697950504e-05
1.2162792578343118e-05
1.089323614045592e-05
9.756722989261432e-06
8.739240835382216e-06
7.828264537526775e-06
7.012590840428639e-06
6.282206099226885e-06
5.628151985858767e-06
5.042418443402312e-06
4.5178380641774095e-06
4.048002049270609e-06
3.6271748637111453e-06
3.25022483449945e-06
2.9125597419793e-06
2.6100730258792974e-06
2.33908472396273e-06
2.096307136505147e-06
1.8787904889257265e-06
1.6838997430816734e-06
1.509274819366032e-06
1.3528011889214775e-06
1.212587081653834e-06
1.0869381104429176e-06
9.743372244174285e-07
8.73426405689756e-07
7.829877314930334e-07
7.019331006223168e-07
6.292850109121352e-07
5.641704754646274e-07
5.058062142044674e-07
4.534908905846261e-07
4.0659614636622263e-07
3.6455917260464895e-07
3.2687571576858064e-07
2.9309400626589154e-07
2.628097110920697e-07
2.3565904692627078e-07
2.1131781852307158e-07
1.894947440294367e-07
1.699288361713118e-07
1.5238586063734686e-07
1.366568424325186e-07

1.2255365279755824e-07
1.0990783200082102e-07
9.856861272368773e-08
8.840091774987147e-08
7.928334532230156e-08
7.110738489161091e-08
6.377562438179933e-08
5.720073827118772e-08
5.1304550974155735e-08
4.6016827121093976e-08
4.127508285786482e-08
3.702254013429707e-08
3.3208575403099436e-08
2.9788031505649846e-08
2.6720125194025672e-08
2.3968551794263268e-08
2.1500634727809534e-08
1.928709568259096e-08
1.7301644673193848e-08
1.5520805495718083e-08
1.3923446503682317e-08
1.2490628141347746e-08
1.1205412924843752e-08
1.005255424847768e-08
9.018420064493843e-09
8.090776959812253e-09
7.2586201295038205e-09
6.512151645666916e-09
5.842497427160883e-09
5.2417739988686235e-09
4.702866830975856e-09
4.219410867722359e-09
3.7856971691602775e-09
3.3965991981299917e-09
3.047527271191316e-09
2.73435780104547e-09
2.4533959184694e-09
2.201325576919178e-09
1.975173912964314e-09
1.7722736943474094e-09
1.5902318528480405e-09
1.4269032326934397e-09
1.280361209635549e-09
1.1488803057922307e-09
1.030910807308611e-09
9.250638131182712e-10
8.30091415855734e-10
7.44876618462649e-10
6.684152536152628e-10
5.998085081044447e-10
5.382483192957509e-10
4.830097256567513e-10
4.3344408654246964e-10
3.88969172650052e-10
3.4905943032488643e-10
3.1324806778169217e-10
2.811122777111904e-10
2.5227584505600285e-10
2.2639906361282244e-10
2.0317838832934676e-10
1.8234104590203233e-10
1.6364103618734542e-10
1.468608707188693e-10
1.3180218471597189e-10
1.182881710076278e-10
1.0616062455371046e-10
9.527750852134792e-11

The Ramsey allocations and Ramsey outcomes are identical for the Lucas-Stokey and AMSS
economies
This outcome confirms the success of our reverse-engineering exercises
Notice how for 𝑡 ≥ 1, the tax rate is a constant - so is the par value of government debt
However, output and labor supply are both nontrivial time-invariant functions of the Markov
state

81.9 Long Simulation

The following graph shows the par value of government debt and the flat rate tax on labor
income for a long simulation for our sample economy
For the same realization of a government expenditure path, the graph reports outcomes for
two economies

• the gray lines are for the Lucas-Stokey economy with complete markets
• the blue lines are for the AMSS economy with risk-free one-period debt only

For both economies, initial government debt due at time 0 is 𝑏0 = .5


For the Lucas-Stokey complete markets economy, the government debt plotted is 𝑏𝑡+1 (𝑠𝑡+1 )

• Notice that this is a time-invariant function of the Markov state from the beginning

For the AMSS incomplete markets economy, the government debt plotted is 𝑏𝑡+1 (𝑠𝑡 )

• Notice that this is a martingale-like random process that eventually seems to converge
to a constant 𝑏̄ ≈ −1.07
• Notice that the limiting value 𝑏̄ < 0 so that asymptotically the government makes a
constant level of risk-free loans to the public
• In the simulation displayed as well as other simulations we have run, the par value of
government debt converges to about −1.07 after between 1400 and 2000 periods

For the AMSS incomplete markets economy, the marginal tax rate on labor income 𝜏𝑡 con-
verges to a constant

• labor supply and output each converge to time-invariant functions of the Markov state

In [11]: T = 2000 # Set T to 2000 periods

sim_seq_long = log_sequential.simulate(0.5, 0, T)
sHist_long = sim_seq_long[-3]
sim_bel_long = log_bellman.simulate(0.5, 0, T, sHist_long)

titles = ['Government Debt', 'Tax Rate']

fig, axes = plt.subplots(2, 1, figsize=(14, 10))

for ax, title, id in zip(axes.flatten(), titles, [2, 3]):
    ax.plot(sim_seq_long[id], '-k', sim_bel_long[id], '-.b', alpha=0.5)
    ax.set(title=title)
    ax.grid()

axes[0].legend(('Complete Markets', 'Incomplete Markets'))


plt.tight_layout()
plt.show()

81.9.1 Remarks about Long Simulation

As remarked above, after 𝑏𝑡+1 (𝑠𝑡 ) has converged to a constant, the measurability constraints
in the AMSS model cease to bind

• the associated Lagrange multipliers on those implementability constraints converge to
zero

This leads us to seek an initial value of government debt 𝑏0 that renders the measurability
constraints slack from time 𝑡 = 0 onward

• a tell-tale sign of this situation is that the Ramsey planner in a corresponding Lucas-
Stokey economy would instruct the government to issue a constant level of government
debt 𝑏𝑡+1 (𝑠𝑡+1 ) across the two Markov states

We now describe how to find such an initial level of government debt

81.10 BEGS Approximations of Limiting Debt and Convergence Rate

It is useful to link the outcome of our reverse engineering exercise to limiting approximations
constructed by [17]
[17] used a slightly different notation to represent a generalization of the AMSS model
We’ll introduce a version of their notation so that readers can quickly relate notation that
appears in their key formulas to the notation that we have used
BEGS work with objects 𝐵𝑡 , ℬ𝑡 , ℛ𝑡 , 𝒳𝑡 that are related to our notation by

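$$
\begin{aligned}
\mathcal{R}_t & = \frac{u_{c,t}}{u_{c,t-1}} R_{t-1} = \frac{u_{c,t}}{\beta E_{t-1} u_{c,t}} \\
B_t & = \frac{b_{t+1}(s^t)}{R_t(s^t)} \\
b_t(s^{t-1}) & = \mathcal{R}_{t-1} B_{t-1} \\
\mathcal{B}_t & = u_{c,t} B_t = (\beta E_t u_{c,t+1}) b_{t+1}(s^t) \\
\mathcal{X}_t & = u_{c,t} [g_t - \tau_t n_t]
\end{aligned}
$$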

In terms of their notation, equation (44) of [17] expresses the time 𝑡 state 𝑠 government bud-
get constraint as

ℬ(𝑠) = ℛ𝜏 (𝑠, 𝑠− )ℬ− + 𝒳𝜏(𝑠) (𝑠) (8)

where the dependence on 𝜏 is to remind us that these objects depend on the tax rate and 𝑠−
is last period’s Markov state
BEGS interpret random variations in the right side of Eq. (8) as a measure of fiscal risk
composed of

• interest-rate-driven fluctuations in time 𝑡 effective payments due on the government
portfolio, namely, ℛ𝜏 (𝑠, 𝑠− )ℬ− , and
• fluctuations in the effective government deficit 𝒳𝑡

81.10.1 Asymptotic Mean

BEGS give conditions under which the ergodic mean of ℬ𝑡 is

$$
\mathcal{B}^* = - \frac{\mathrm{cov}^{\infty}(\mathcal{R}, \mathcal{X})}{\mathrm{var}^{\infty}(\mathcal{R})} \tag{9}
$$

where the superscript ∞ denotes a moment taken with respect to an ergodic distribution
Formula Eq. (9) presents ℬ∗ as a regression coefficient of 𝒳𝑡 on ℛ𝑡 in the ergodic distribution
This regression coefficient emerges as the minimizer for a variance-minimization problem:

ℬ∗ = argminℬ var(ℛℬ + 𝒳) (10)

The minimand in criterion Eq. (10) is the measure of fiscal risk associated with a given tax-
debt policy that appears on the right side of equation Eq. (8)
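The link between the regression coefficient and the variance-minimization problem can be checked numerically. Since var(ℛℬ + 𝒳) = ℬ² var(ℛ) + 2ℬ cov(ℛ, 𝒳) + var(𝒳) is a quadratic in ℬ, its first-order condition gives ℬ∗ = −cov(ℛ, 𝒳)/var(ℛ). The following is only a sketch: it uses the two-state values of ℛ and 𝒳 reported in the code output of this lecture, assumes 𝜋(𝑠) = .5 for both states, and the helper names `var`, `cov` and `fiscal_risk` as well as the grid bounds are our own choices for illustration.

```python
import numpy as np

# Two equally likely states; R and X are the two-state values
# printed in the code output of this lecture.
π = np.array([0.5, 0.5])
R = np.array([1.055169547122964, 1.1670526750992583])
X = np.array([0.06357685646224803, 0.19251010100512958])

def var(x):
    "Variance of x under the distribution π"
    return π @ x**2 - (π @ x)**2

def cov(x, y):
    "Covariance of x and y under the distribution π"
    return π @ (x * y) - (π @ x) * (π @ y)

def fiscal_risk(B):
    "var(R * B + X), the minimand of the variance-minimization problem"
    return var(R * B + X)

# Closed-form minimizer: minus the regression coefficient of X on R
B_star = -cov(R, X) / var(R)

# Brute-force check on a grid (the grid bounds are an arbitrary choice)
grid = np.linspace(-3, 3, 60001)
B_grid = grid[np.argmin([fiscal_risk(B) for B in grid])]

print(B_star, B_grid)
```

With only two states, ℛ and 𝒳 are perfectly correlated, so the minimized value of the criterion is zero up to rounding error.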
Expressing formula Eq. (9) in terms of our notation tells us that 𝑏̄ should approximately
equal

$$
\hat b = \frac{\mathcal{B}^*}{\beta E_t u_{c,t+1}} \tag{11}
$$

81.10.2 Rate of Convergence

BEGS also derive the following approximation to the rate of convergence to ℬ∗ from an arbi-
trary initial condition

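$$
\frac{E_t (\mathcal{B}_{t+1} - \mathcal{B}^*)}{(\mathcal{B}_t - \mathcal{B}^*)} \approx \frac{1}{1 + \beta^2 \mathrm{var}(\mathcal{R})} \tag{12}
$$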

(See the equation above equation (47) in [17])

81.10.3 Formulas and Code Details

For our example, we describe some code that we use to compute the steady state mean and
the rate of convergence to it
The values of 𝜋(𝑠) are .5, .5
We can then construct 𝒳(𝑠), ℛ(𝑠), 𝑢𝑐 (𝑠) for our two states using the definitions above
We can then construct 𝛽𝐸𝑡−1 𝑢𝑐 = 𝛽 ∑𝑠 𝑢𝑐 (𝑠)𝜋(𝑠), cov(ℛ(𝑠), 𝒳(𝑠)) and var(ℛ(𝑠)) to be
plugged into formula Eq. (11)
We also want to compute var(𝒳)

To compute the variances and covariance, we use the following standard formulas
Temporarily let 𝑥(𝑠), 𝑠 = 1, 2 be an arbitrary random variable
Then we define

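$$
\begin{aligned}
\mu_x & = \sum_s x(s) \pi(s) \\
\mathrm{var}(x) & = \left( \sum_s x(s)^2 \pi(s) \right) - \mu_x^2 \\
\mathrm{cov}(x, y) & = \left( \sum_s x(s) y(s) \pi(s) \right) - \mu_x \mu_y
\end{aligned}
$$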
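As a quick illustration of these moment formulas, here is a minimal self-contained sketch. The function name `moments` and the numbers in the example are hypothetical; only the formulas come from the text. The sketch also checks the "raw moment minus squared mean" form against the equivalent "probability-weighted squared deviation" form.

```python
import numpy as np

def moments(x, y, π):
    """Mean of x, variance of x, and covariance of x and y
    under a discrete probability distribution π."""
    x, y, π = (np.asarray(a, dtype=float) for a in (x, y, π))
    μ_x, μ_y = x @ π, y @ π
    var_x = (x**2) @ π - μ_x**2
    cov_xy = (x * y) @ π - μ_x * μ_y
    return μ_x, var_x, cov_xy

# Hypothetical two-state example with π(s) = .5 in each state
π = [0.5, 0.5]
x = [1.0, 3.0]
y = [2.0, 6.0]

μ_x, var_x, cov_xy = moments(x, y, π)

# Equivalent deviation-from-mean form of the variance
var_x_dev = np.asarray(π) @ (np.asarray(x) - μ_x)**2

print(μ_x, var_x, cov_xy, var_x_dev)
```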

After we compute these moments, we compute ℬ∗ in formula Eq. (9)
After that, we compute the BEGS approximation to the asymptotic mean 𝑏̂ in formula Eq. (11)
We’ll also evaluate the BEGS criterion Eq. (10) at the limiting value ℬ∗

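$$
J(\mathcal{B}^*) = \mathrm{var}(\mathcal{R}) (\mathcal{B}^*)^2 + 2 \mathcal{B}^* \mathrm{cov}(\mathcal{R}, \mathcal{X}) + \mathrm{var}(\mathcal{X}) \tag{13}
$$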

Here are some functions that we’ll use to compute key objects that we want

In [12]: def mean(x):
    '''Returns mean for x given initial state'''
    x = np.array(x)
    return x @ u.π[s]

def variance(x):
    x = np.array(x)
    return x**2 @ u.π[s] - mean(x)**2

def covariance(x, y):
    x, y = np.array(x), np.array(y)
    return x * y @ u.π[s] - mean(x) * mean(y)

Now let’s form the two random variables ℛ, 𝒳 appearing in the BEGS approximating formu-
las

In [13]: u = CRRAutility()

s = 0
c = [0.940580824225584, 0.8943592757759343] # Vector for c
g = u.G # Vector for g
n = c + g # Labor supply, from the resource constraint c + g = n
τ = lambda s: 1 + u.Un(1, n[s]) / u.Uc(c[s], 1)

R_s = lambda s: u.Uc(c[s], n[s]) / (u.β * (u.Uc(c[0], n[0]) * u.π[0, 0] +
                                           u.Uc(c[1], n[1]) * u.π[1, 0]))
X_s = lambda s: u.Uc(c[s], n[s]) * (g[s] - τ(s) * n[s])

R = [R_s(0), R_s(1)]
X = [X_s(0), X_s(1)]

print(f"R, X = {R}, {X}")

R, X = [1.055169547122964, 1.1670526750992583], [0.06357685646224803, 0.19251010100512958]



Now let’s compute the ingredients of the approximating limit and the approximating rate of
convergence

In [14]: bstar = -covariance(R, X) / variance(R)


div = u.β * (u.Uc(c[0], n[0]) * u.π[s, 0] + u.Uc(c[1], n[1]) * u.π[s, 1])
bhat = bstar / div
bhat

Out[14]: -1.0757585378303758

Print out 𝑏̂ and 𝑏̄

In [15]: bhat, b_bar

Out[15]: (-1.0757585378303758, -1.0757576567504166)

So we have

In [16]: bhat - b_bar

Out[16]: -8.810799592140484e-07

These outcomes show that 𝑏̂ does a remarkably good job of approximating 𝑏̄


Next, let’s compute the BEGS fiscal criterion that 𝑏̂ is minimizing

In [17]: Jmin = variance(R) * bstar**2 + 2 * bstar * covariance(R, X) + variance(X)


Jmin

Out[17]: -9.020562075079397e-17

This is machine zero, a verification that 𝑏̂ succeeds in minimizing the nonnegative fiscal cost
criterion 𝐽 (ℬ∗ ) defined in BEGS and in equation Eq. (13) above
Let’s push our luck and compute the mean reversion speed in the formula above equation
(47) in [17]

In [18]: den2 = 1 + (u.β**2) * variance(R)


speedrever = 1/den2
print(f'Mean reversion speed = {speedrever}')

Mean reversion speed = 0.9974715478249827

Now let’s compute the implied mean time to get to within .01 of the limit

In [19]: ttime = np.log(.01) / np.log(speedrever)


print(f"Time to get within .01 of limit = {ttime}")

Time to get within .01 of limit = 1819.0360880098472

The slow rate of convergence and the implied time of roughly 1800 periods to get within one
percent of the limiting value are consistent with our long simulation above, where convergence
took between 1400 and 2000 periods
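The time-to-convergence calculation can be cross-checked by iterating directly on the approximate law of motion 𝐸𝑡 (ℬ𝑡+1 − ℬ∗) ≈ 𝜌(ℬ𝑡 − ℬ∗). The sketch below is an illustration only: it takes 𝜌 to be the mean reversion speed printed above, normalizes the initial gap to 1 (so that .01 means one percent of the initial gap), and ignores the randomness around the conditional mean.

```python
import math

ρ = 0.9974715478249827   # mean reversion speed reported above

# Closed form: solve ρ**t = 0.01 for t
t_formula = math.log(0.01) / math.log(ρ)

# Direct iteration on gap_{t+1} = ρ * gap_t, starting from a gap of 1
gap, periods = 1.0, 0
while gap > 0.01:
    gap *= ρ
    periods += 1

print(t_formula, periods)
```

The iteration stops at the first whole period after the closed-form time, which is the sense in which the logarithm formula gives the "time to get within .01 of the limit".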
82

Fiscal Risk and Government Debt

82.1 Contents

• Overview 82.2

• The Economy 82.3

• Long Simulation 82.4

• Asymptotic Mean and Rate of Convergence 82.5

In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

82.2 Overview

This lecture studies government debt in an AMSS economy [5] of the type described in Opti-
mal Taxation without State-Contingent Debt
We study the behavior of government debt as time 𝑡 → +∞
We use these techniques

• simulations
• a regression coefficient from the tail of a long simulation that allows us to
verify that the asymptotic mean of government debt solves a fiscal-risk mini-
mization problem
• an approximation to the mean of an ergodic distribution of government debt
• an approximation to the rate of convergence to an ergodic distribution of
government debt

We apply tools applicable to more general incomplete markets economies that are presented
on pages 648 - 650 in section III.D of [17] (BEGS)
We study an [5] economy with three Markov states driving government expenditures


• In a previous lecture, we showed that with only two Markov states, it is pos-
sible that eventually endogenous interest rate fluctuations support complete
markets allocations and Ramsey outcomes
• The presence of three states prevents the full spanning that eventually pre-
vails in the two-state example featured in Fiscal Insurance via Fluctuating
Interest Rates

The lack of full spanning means that the ergodic distribution of the par value of government
debt is nontrivial, in contrast to the situation in Fiscal Insurance via Fluctuating Interest
Rates where the ergodic distribution of the par value is concentrated on one point
Nevertheless, [17] (BEGS) establish that, for general settings that include ours, the Ramsey
planner steers government assets to a level that comes as close as possible to providing full
spanning, in a precise sense defined by BEGS that we describe below
We use code constructed in a previous lecture
Warning: Key equations in [17] section III.D carry typos that we correct below

82.3 The Economy

As in Optimal Taxation without State-Contingent Debt and Optimal Taxation with State-
Contingent Debt, we assume that the representative agent has utility function

$$
u(c, n) = \frac{c^{1-\sigma}}{1-\sigma} - \frac{n^{1+\gamma}}{1+\gamma}
$$

We work directly with labor supply instead of leisure


We assume that

𝑐𝑡 + 𝑔𝑡 = 𝑛𝑡

The Markov state 𝑠𝑡 takes three values, namely, 0, 1, 2


The initial Markov state is 0
The Markov transition matrix has every entry equal to 1/3 (that is, it equals 1/3 times a
3 × 3 matrix of ones), so the 𝑠𝑡 process is IID
Government expenditures 𝑔(𝑠) equal .1 in Markov state 0, .2 in Markov state 1, and .3 in
Markov state 2
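To see why this specification makes the state process IID, note that the transition matrix used in the code for this lecture, `(1 / 3) * np.ones((3, 3))`, has identical rows, so the distribution of next period's state does not depend on the current state. The following sketch checks this and computes the implied mean and variance of government expenditures. The variable names are ours, chosen for illustration.

```python
import numpy as np

S = 3
π = np.ones((S, S)) / S          # every transition probability is 1/3
g = np.array([0.1, 0.2, 0.3])    # government expenditures by Markov state

# Each row must be a probability distribution over next period's state
rows_sum_to_one = np.allclose(π.sum(axis=1), 1)

# Identical rows: next period's distribution is independent of today's state
iid = all(np.allclose(π[0], π[s]) for s in range(S))

# Moments of g under the common row of π
mean_g = π[0] @ g
var_g = π[0] @ g**2 - mean_g**2

print(rows_sum_to_one, iid, mean_g, var_g)
```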
We set preference parameters

𝛽 = .9
𝜎=2
𝛾=2

The following Python code sets up the economy

In [2]: import numpy as np

class CRRAutility:

    def __init__(self,
                 β=0.9,
                 σ=2,
                 γ=2,
                 π=0.5*np.ones((2, 2)),
                 G=np.array([0.1, 0.2]),
                 Θ=np.ones(2),
                 transfers=False):

        self.β, self.σ, self.γ = β, σ, γ
        self.π, self.G, self.Θ, self.transfers = π, G, Θ, transfers

    # Utility function
    def U(self, c, n):
        σ = self.σ
        if σ == 1.:
            U = np.log(c)
        else:
            U = (c**(1 - σ) - 1) / (1 - σ)
        return U - n**(1 + self.γ) / (1 + self.γ)

    # Derivatives of utility function
    def Uc(self, c, n):
        return c**(-self.σ)

    def Ucc(self, c, n):
        return -self.σ * c**(-self.σ - 1)

    def Un(self, c, n):
        return -n**self.γ

    def Unn(self, c, n):
        return -self.γ * n**(self.γ - 1)
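As a sanity check on this utility specification, the first-best allocation (no distorting taxes) can be computed by hand when 𝜎 = 𝛾 = 2. Equating the marginal utility of consumption to the marginal disutility of labor, 𝑐^(−2) = 𝑛^2, and using 𝑐 + 𝑔 = 𝑛 gives (𝑛 − 𝑔)𝑛 = 1, so 𝑛 = (𝑔 + √(𝑔² + 4))/2. The sketch below compares this closed form with a simple bisection. The function `first_best_n` and its bracketing interval are our own illustrative choices, not part of the lecture's code.

```python
import math

σ, γ = 2, 2

def Uc(c):
    "Marginal utility of consumption"
    return c**(-σ)

def Un(n):
    "Marginal utility of labor (negative: labor is costly)"
    return -n**γ

def first_best_n(g, hi=5.0, tol=1e-12):
    """Bisection on Uc(n - g) + Un(n) = 0, a function that is
    strictly decreasing in n on the interval (g, hi)."""
    lo = g + 1e-8
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Uc(mid - g) + Un(mid) > 0:
            lo = mid      # marginal benefit still exceeds cost: raise n
        else:
            hi = mid
    return 0.5 * (lo + hi)

results = {}
for g in (0.1, 0.2, 0.3):
    n = first_best_n(g)
    closed_form = (g + math.sqrt(g**2 + 4)) / 2
    results[g] = (n, n - g, closed_form)
    print(g, n, n - g, closed_form)
```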

82.3.1 First and Second Moments

We’ll want first and second moments of some key random variables below
The following code computes these moments; the code is recycled from Fiscal Insurance via
Fluctuating Interest Rates

In [3]: def mean(x, s):
    '''Returns mean for x given initial state'''
    x = np.array(x)
    return x @ u.π[s]

def variance(x, s):
    x = np.array(x)
    return x**2 @ u.π[s] - mean(x, s)**2

def covariance(x, y, s):
    x, y = np.array(x), np.array(y)
    return x * y @ u.π[s] - mean(x, s) * mean(y, s)

82.4 Long Simulation

To generate a long simulation we use the following code


We begin by showing the code that we used in earlier lectures on the AMSS model
Here it is

In [4]: import numpy as np


from scipy.optimize import root
from quantecon import MarkovChain

class SequentialAllocation:

'''
Class that takes CESutility or BGPutility object as input returns
planner's allocation as a function of the multiplier on the
implementability constraint μ.
'''

def __init__(self, model):

# Initialize from model object attributes


self.β, self.π, self.G = model.β, model.π, model.G
self.mc, self.Θ = MarkovChain(self.π), model.Θ
self.S = len(model.π) # Number of states
self.model = model

# Find the first best allocation


self.find_first_best()

def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, G = self.S, self.Θ, self.G
Uc, Un = model.Uc, model.Un

def res(z):
c = z[:S]
n = z[S:]
return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])

res = root(res, 0.5 * np.ones(2 * S))

if not res.success:
raise Exception('Could not find first best')

self.cFB = res.x[:S]
self.nFB = res.x[S:]

# Multiplier on the resource constraint


self.ΞFB = Uc(self.cFB, self.nFB)
self.zFB = np.hstack([self.cFB, self.nFB, self.ΞFB])

def time1_allocation(self, μ):


'''
Computes optimal allocation for time t >= 1 for a given μ
'''
model = self.model
S, Θ, G = self.S, self.Θ, self.G
Uc, Ucc, Un, Unn = model.Uc, model.Ucc, model.Un, model.Unn

def FOC(z):
c = z[:S]
n = z[S:2 * S]
Ξ = z[2 * S:]
return np.hstack([Uc(c, n) - μ * (Ucc(c, n) * c + Uc(c, n)) - Ξ, # FOC of c
Un(c, n) - μ * (Unn(c, n) * n + Un(c, n)) + \
Θ * Ξ, # FOC of n
Θ * n - c - G])

# Find the root of the first-order condition


res = root(FOC, self.zFB)
if not res.success:
raise Exception('Could not find LS allocation.')
z = res.x
c, n, Ξ = z[:S], z[S:2 * S], z[2 * S:]

# Compute x
I = Uc(c, n) * c + Un(c, n) * n
x = np.linalg.solve(np.eye(S) - self.β * self.π, I)

return c, n, x, Ξ

def time0_allocation(self, B_, s_0):


'''
Finds the optimal allocation given initial government debt B_ and state s_0
'''
model, π, Θ, G, β = self.model, self.π, self.Θ, self.G, self.β
Uc, Ucc, Un, Unn = model.Uc, model.Ucc, model.Un, model.Unn

# First order conditions of planner's problem


def FOC(z):
μ, c, n, Ξ = z
xprime = self.time1_allocation(μ)[2]
return np.hstack([Uc(c, n) * (c - B_) + Un(c, n) * n + β * π[s_0] @ xprime,
Uc(c, n) - μ * (Ucc(c, n) *
(c - B_) + Uc(c, n)) - Ξ,
Un(c, n) - μ * (Unn(c, n) * n +
Un(c, n)) + Θ[s_0] * Ξ,
(Θ * n - c - G)[s_0]])

# Find root
res = root(FOC, np.array(
[0, self.cFB[s_0], self.nFB[s_0], self.ΞFB[s_0]]))
if not res.success:
raise Exception('Could not find time 0 LS allocation.')

return res.x

def time1_value(self, μ):


'''
Find the value associated with multiplier μ
'''
c, n, x, Ξ = self.time1_allocation(μ)
U = self.model.U(c, n)
V = np.linalg.solve(np.eye(self.S) - self.β * self.π, U)
return c, n, x, V

def Τ(self, c, n):


'''
Computes Τ given c, n
'''
model = self.model
Uc, Un = model.Uc(c, n), model.Un(c, n)

return 1 + Un / (self.Θ * Uc)

def simulate(self, B_, s_0, T, sHist=None):


'''
Simulates planners policies for T periods
'''
model, π, β = self.model, self.π, self.β
Uc = model.Uc

if sHist is None:
sHist = self.mc.simulate(T, s_0)

cHist, nHist, Bhist, ΤHist, μHist = np.zeros((5, T))


RHist = np.zeros(T - 1)

# Time 0
μ, cHist[0], nHist[0], _ = self.time0_allocation(B_, s_0)
ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
Bhist[0] = B_
μHist[0] = μ

# Time 1 onward
for t in range(1, T):
c, n, x, Ξ = self.time1_allocation(μ)
Τ = self.Τ(c, n)

u_c = Uc(c, n)
s = sHist[t]
Eu_c = π[sHist[t - 1]] @ u_c
cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x[s] / \
u_c[s], Τ[s]
RHist[t - 1] = Uc(cHist[t - 1], nHist[t - 1]) / (β * Eu_c)
μHist[t] = μ

return np.array([cHist, nHist, Bhist, ΤHist, sHist, μHist, RHist])

In [5]: from scipy.optimize import fmin_slsqp

class RecursiveAllocationAMSS:

def __init__(self, model, μgrid, tol_diff=1e-4, tol=1e-4):

self.β, self.π, self.G = model.β, model.π, model.G


self.mc, self.S = MarkovChain(self.π), len(model.π) # Number of states
self.Θ, self.model, self.μgrid = model.Θ, model, μgrid
self.tol_diff, self.tol = tol_diff, tol

# Find the first best allocation


self.solve_time1_bellman()
self.T.time_0 = True # Bellman equation now solves time 0 problem

def solve_time1_bellman(self):
'''
Solve the time 1 Bellman equation for calibration model and
initial grid μgrid0
'''
model, μgrid0 = self.model, self.μgrid
π = model.π
S = len(model.π)

# First get initial fit from Lucas Stokey solution.


# Need to change things to be ex ante
PP = SequentialAllocation(model)
interp = interpolator_factory(2, None)

def incomplete_allocation(μ_, s_):


c, n, x, V = PP.time1_value(μ_)
return c, n, π[s_] @ x, π[s_] @ V
cf, nf, xgrid, Vf, xprimef = [], [], [], [], []
for s_ in range(S):
c, n, x, V = zip(*map(lambda μ: incomplete_allocation(μ, s_), μgrid0))
c, n = np.vstack(c).T, np.vstack(n).T
x, V = np.hstack(x), np.hstack(V)
xprimes = np.vstack([x] * S)
cf.append(interp(x, c))
nf.append(interp(x, n))
Vf.append(interp(x, V))
xgrid.append(x)
xprimef.append(interp(x, xprimes))
cf, nf, xprimef = fun_vstack(cf), fun_vstack(nf), fun_vstack(xprimef)
Vf = fun_hstack(Vf)
policies = [cf, nf, xprimef]

# Create xgrid
x = np.vstack(xgrid).T
xbar = [x.min(0).max(), x.max(0).min()]
xgrid = np.linspace(xbar[0], xbar[1], len(μgrid0))
self.xgrid = xgrid

# Now iterate on Bellman equation


T = BellmanEquation(model, xgrid, policies, tol=self.tol)
diff = 1
while diff > self.tol_diff:
PF = T(Vf)

Vfnew, policies = self.fit_policy_function(PF)


diff = np.abs((Vf(xgrid) - Vfnew(xgrid)) / Vf(xgrid)).max()

print(diff)
Vf = Vfnew

# store value function policies and Bellman Equations


self.Vf = Vf
self.policies = policies
self.T = T

def fit_policy_function(self, PF):


'''
Fits the policy functions
'''
S, xgrid = len(self.π), self.xgrid
interp = interpolator_factory(3, 0)
cf, nf, xprimef, Tf, Vf = [], [], [], [], []
for s_ in range(S):
PFvec = np.vstack([PF(x, s_) for x in self.xgrid]).T
Vf.append(interp(xgrid, PFvec[0, :]))
cf.append(interp(xgrid, PFvec[1:1 + S]))
nf.append(interp(xgrid, PFvec[1 + S:1 + 2 * S]))
xprimef.append(interp(xgrid, PFvec[1 + 2 * S:1 + 3 * S]))
Tf.append(interp(xgrid, PFvec[1 + 3 * S:]))
policies = fun_vstack(cf), fun_vstack(
nf), fun_vstack(xprimef), fun_vstack(Tf)
Vf = fun_hstack(Vf)
return Vf, policies

def Τ(self, c, n):


'''
Computes Τ given c and n
'''
model = self.model
Uc, Un = model.Uc(c, n), model.Un(c, n)

return 1 + Un / (self.Θ * Uc)

def time0_allocation(self, B_, s0):


'''
Finds the optimal allocation given initial government debt B_ and
state s_0
'''
PF = self.T(self.Vf)
z0 = PF(B_, s0)
c0, n0, xprime0, T0 = z0[1:]
return c0, n0, xprime0, T0

def simulate(self, B_, s_0, T, sHist=None):


'''
Simulates planners policies for T periods
'''
model, π = self.model, self.π
Uc = model.Uc
cf, nf, xprimef, Tf = self.policies

if sHist is None:
sHist = simulate_markov(π, s_0, T)

cHist, nHist, Bhist, xHist, ΤHist, THist, μHist = np.zeros((7, T))


# time 0
cHist[0], nHist[0], xHist[0], THist[0] = self.time0_allocation(B_, s_0)
ΤHist[0] = self.Τ(cHist[0], nHist[0])[s_0]
Bhist[0] = B_
μHist[0] = self.Vf[s_0](xHist[0])

# time 1 onward
for t in range(1, T):
s_, x, s = sHist[t - 1], xHist[t - 1], sHist[t]
c, n, xprime, T = cf[s_, :](x), nf[s_, :](
x), xprimef[s_, :](x), Tf[s_, :](x)

Τ = self.Τ(c, n)[s]
u_c = Uc(c, n)
Eu_c = π[s_, :] @ u_c

μHist[t] = self.Vf[s](xprime[s])

cHist[t], nHist[t], Bhist[t], ΤHist[t] = c[s], n[s], x / Eu_c, Τ


xHist[t], THist[t] = xprime[s], T[s]
return np.array([cHist, nHist, Bhist, ΤHist, THist, μHist, sHist, xHist])

class BellmanEquation:
'''
Bellman equation for the continuation of the Lucas-Stokey Problem
'''

def __init__(self, model, xgrid, policies0, tol, maxiter=1000):

self.β, self.π, self.G = model.β, model.π, model.G


self.S = len(model.π) # Number of states
self.Θ, self.model, self.tol = model.Θ, model, tol
self.maxiter = maxiter

self.xbar = [min(xgrid), max(xgrid)]


self.time_0 = False

self.z0 = {}
cf, nf, xprimef = policies0

for s_ in range(self.S):
for x in xgrid:
self.z0[x, s_] = np.hstack([cf[s_, :](x),
nf[s_, :](x),
xprimef[s_, :](x),
np.zeros(self.S)])

self.find_first_best()

def find_first_best(self):
'''
Find the first best allocation
'''
model = self.model
S, Θ, Uc, Un, G = self.S, self.Θ, model.Uc, model.Un, self.G

def res(z):
c = z[:S]
n = z[S:]
return np.hstack([Θ * Uc(c, n) + Un(c, n), Θ * n - c - G])

res = root(res, 0.5 * np.ones(2 * S))


if not res.success:
raise Exception('Could not find first best')

self.cFB = res.x[:S]
self.nFB = res.x[S:]
IFB = Uc(self.cFB, self.nFB) * self.cFB + \
Un(self.cFB, self.nFB) * self.nFB

self.xFB = np.linalg.solve(np.eye(S) - self.β * self.π, IFB)

self.zFB = {}
for s in range(S):
self.zFB[s] = np.hstack(
[self.cFB[s], self.nFB[s], self.π[s] @ self.xFB, 0.])

def __call__(self, Vf):


'''
Given continuation value function next period return value function this
period return T(V) and optimal policies
'''
if not self.time_0:
def PF(x, s): return self.get_policies_time1(x, s, Vf)
else:
def PF(B_, s0): return self.get_policies_time0(B_, s0, Vf)
return PF

def get_policies_time1(self, x, s_, Vf):


'''
Finds the optimal policies
'''
model, β, Θ, G, S, π = self.model, self.β, self.Θ, self.G, self.S, self.π
U, Uc, Un = model.U, model.Uc, model.Un

def objf(z):
c, n, xprime = z[:S], z[S:2 * S], z[2 * S:3 * S]

Vprime = np.empty(S)
for s in range(S):
Vprime[s] = Vf[s](xprime[s])

return -π[s_] @ (U(c, n) + β * Vprime)

def cons(z):
c, n, xprime, T = z[:S], z[S:2 * S], z[2 * S:3 * S], z[3 * S:]
u_c = Uc(c, n)
Eu_c = π[s_] @ u_c
return np.hstack([
x * u_c / Eu_c - u_c * (c - T) - Un(c, n) * n - β * xprime,
Θ * n - c - G])

if model.transfers:
bounds = [(0., 100)] * S + [(0., 100)] * S + \
[self.xbar] * S + [(0., 100.)] * S
else:
bounds = [(0., 100)] * S + [(0., 100)] * S + \
[self.xbar] * S + [(0., 0.)] * S
out, fx, _, imode, smode = fmin_slsqp(objf, self.z0[x, s_],
f_eqcons=cons, bounds=bounds,
full_output=True, iprint=0,
acc=self.tol, iter=self.maxiter)

if imode > 0:
raise Exception(smode)

self.z0[x, s_] = out


return np.hstack([-fx, out])

def get_policies_time0(self, B_, s0, Vf):


'''
Finds the optimal policies
'''
model, β, Θ, G = self.model, self.β, self.Θ, self.G
U, Uc, Un = model.U, model.Uc, model.Un

def objf(z):
c, n, xprime = z[:-1]

return -(U(c, n) + β * Vf[s0](xprime))

def cons(z):
c, n, xprime, T = z
return np.hstack([
-Uc(c, n) * (c - B_ - T) - Un(c, n) * n - β * xprime,
(Θ * n - c - G)[s0]])

if model.transfers:
bounds = [(0., 100), (0., 100), self.xbar, (0., 100.)]
else:
bounds = [(0., 100), (0., 100), self.xbar, (0., 0.)]
out, fx, _, imode, smode = fmin_slsqp(objf, self.zFB[s0], f_eqcons=cons,
bounds=bounds, full_output=True, iprint=0)

if imode > 0:
raise Exception(smode)

return np.hstack([-fx, out])

In [6]: from scipy.interpolate import UnivariateSpline



class interpolate_wrapper:

def __init__(self, F):


self.F = F

def __getitem__(self, index):


return interpolate_wrapper(np.asarray(self.F[index]))

def reshape(self, *args):


self.F = self.F.reshape(*args)
return self

def transpose(self):
self.F = self.F.transpose()

def __len__(self):
return len(self.F)

def __call__(self, xvec):


x = np.atleast_1d(xvec)
shape = self.F.shape
if len(x) == 1:
fhat = np.hstack([f(x) for f in self.F.flatten()])
return fhat.reshape(shape)
else:
fhat = np.vstack([f(x) for f in self.F.flatten()])
return fhat.reshape(np.hstack((shape, len(x))))

class interpolator_factory:

def __init__(self, k, s):


self.k, self.s = k, s

def __call__(self, xgrid, Fs):


shape, m = Fs.shape[:-1], Fs.shape[-1]
Fs = Fs.reshape((-1, m))
F = []
xgrid = np.sort(xgrid) # Sort xgrid
for Fhat in Fs:
F.append(UnivariateSpline(xgrid, Fhat, k=self.k, s=self.s))
return interpolate_wrapper(np.array(F).reshape(shape))

def fun_vstack(fun_list):

Fs = [IW.F for IW in fun_list]


return interpolate_wrapper(np.vstack(Fs))

def fun_hstack(fun_list):

Fs = [IW.F for IW in fun_list]


return interpolate_wrapper(np.hstack(Fs))

def simulate_markov(π, s_0, T):

sHist = np.empty(T, dtype=int)


sHist[0] = s_0
S = len(π)
for t in range(1, T):
sHist[t] = np.random.choice(np.arange(S), p=π[sHist[t - 1]])

return sHist

Next, we show the code that we use to generate a very long simulation starting from initial
government debt equal to .5
Here is a graph of a long simulation of 102000 periods

In [7]: import matplotlib.pyplot as plt
%matplotlib inline

μ_grid = np.linspace(-0.09, 0.1, 100)

log_example = CRRAutility(π=(1 / 3) * np.ones((3, 3)),
                          G=np.array([0.1, 0.2, .3]),
                          Θ=np.ones(3))

log_example.transfers = True # Government can use transfers
log_sequential = SequentialAllocation(log_example) # Solve sequential problem
log_bellman = RecursiveAllocationAMSS(log_example, μ_grid,
                                      tol=1e-12, tol_diff=1e-10)

T = 102000 # Set T to 102000 periods

sim_seq_long = log_sequential.simulate(0.5, 0, T)
sHist_long = sim_seq_long[-3]
sim_bel_long = log_bellman.simulate(0.5, 0, T, sHist_long)

titles = ['Government Debt', 'Tax Rate']

fig, axes = plt.subplots(2, 1, figsize=(10, 8))

for ax, title, id in zip(axes.flatten(), titles, [2, 3]):
    ax.plot(sim_seq_long[id], '-k', sim_bel_long[id], '-.b', alpha=0.5)
    ax.set(title=title)
    ax.grid()

axes[0].legend(('Complete Markets', 'Incomplete Markets'))
plt.tight_layout()
plt.show()

/home/anju/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:24: RuntimeWarning: divide by zero enco


/home/anju/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:29: RuntimeWarning: divide by zero enco
/home/anju/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:232: RuntimeWarning: invalid value enco
/home/anju/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:232: RuntimeWarning: invalid value enco

0.03826635338764132
0.0015144378246369176
0.0013387575048731985
0.0011833202401039893
0.0010600307116126906
0.0009506620324956109
0.0008518776516864095
0.0007625857030935052
0.0006819563061521688
0.0006094002927215782
0.0005443007358277924
0.00048599500336476265
0.00043383959355578774
0.0003872273086410756
0.0003455954121656354
0.0003084287064303067
0.0002752590187447044
0.00024566312918700075
0.00021925988532276431
0.00019570695816949855
0.00017469751640983744
0.00015595697136515873
0.0001392398796073817
0.00012432704754811855
0.00011102285955108606
9.91528320785181e-05
8.85613917694051e-05
7.910986484645073e-05
7.067466534287542e-05
6.314566738064437e-05
5.6424746011174256e-05

5.042447141827191e-05
4.506694213583938e-05
4.0282743557388626e-05
3.6010019182228725e-05
3.219364288206812e-05
2.878448159091498e-05
2.573873836048089e-05
2.3017369984667964e-05
2.05855625553573e-05
1.8412273738832955e-05
1.647009682046267e-05
1.473414850165591e-05
1.318221437445491e-05
1.179465462832215e-05
1.0553942908677422e-05
9.444436171219125e-06
8.452171085911092e-06
7.564681532049729e-06
6.770836845076241e-06
6.0606989929363644e-06
5.4253876521102315e-06
4.856977565561056e-06
4.3483826637469755e-06
3.893276378650257e-06
3.4860031062237643e-06
3.1215108661056906e-06
2.795283854622686e-06
2.503284403583463e-06
2.241904892406543e-06
2.007920705909931e-06
1.798447325327638e-06
1.6109044251479309e-06
1.442988347655394e-06
1.2926351682153122e-06
1.1580010625721907e-06
1.0374365434222132e-06
9.294651116185741e-07
8.327667920437415e-07
7.461587021354829e-07
6.685858974594693e-07
5.991018344113918e-07
5.368603393149832e-07
4.81103577238208e-07
4.3115412525372483e-07
3.864050517478613e-07
3.4631297123891545e-07
3.103915075306082e-07
2.7820599855382486e-07
2.4936652532360413e-07
2.2352414642839936e-07
2.0036654801897765e-07
1.796139868675989e-07
1.6101607241185348e-07
1.443483988609368e-07
1.294101204124584e-07
1.1602136958604934e-07
1.0402103101246419e-07
9.32645840418982e-08
8.362288415242993e-08
7.49800787010894e-08
6.723245436434705e-08
6.028706738630238e-08
5.406065458358515e-08
4.8478615812375635e-08
4.3474113222701845e-08
3.8987258373706536e-08
3.496438798394339e-08
3.1357419654540646e-08
2.8123260340143566e-08
2.5223296649714814e-08
2.2622923200109855e-08
2.0291126502626413e-08
1.820010906835736e-08

1.6324961312300717e-08
1.4643350782351372e-08
1.3135263982637831e-08
1.1782761809326932e-08
1.0569761128861453e-08
9.481846520621847e-09
8.50609563218879e-09
7.63092232387174e-09
6.845941933854109e-09
6.1418410292820715e-09
5.510272681189048e-09
4.9437518940625445e-09
4.4355691169414215e-09
3.979750248408692e-09
3.5708449347232996e-09
3.204016722809105e-09
2.8749276973662518e-09
2.5796921449357712e-09
2.3148170455156384e-09
2.07718006687458e-09
1.8639730005803084e-09
1.6726806861203604e-09
1.5010483122790743e-09
1.3470513529986538e-09
1.2088752663386996e-09
1.0848927139156927e-09
9.73642020099531e-10
8.738161691592579e-10
7.842384749994105e-10
7.038552975803189e-10
6.317228704828643e-10
5.669919588945233e-10
5.089025906360272e-10
4.5677212348350323e-10
4.09988116556057e-10
3.6800306134146346e-10
3.3032151919832457e-10
2.9650399560334535e-10
2.661523729846855e-10
2.389122352703575e-10
2.1446217496797182e-10
1.9251828729107873e-10
1.7282254741959947e-10
1.5514374031919394e-10
1.3927467128385764e-10
1.2503216593604361e-10
1.1224730712531801e-10
1.0077070390381865e-10
9.046923715917151e-11
1402 82. FISCAL RISK AND GOVERNMENT DEBT

The long simulation apparently indicates eventual convergence to an ergodic distribution


It takes about 1000 periods to reach the ergodic distribution – an outcome that is forecast by
approximations to rates of convergence that appear in [17] and that we discuss in a previous
lecture
We discard the first 2000 observations of the simulation and construct the histogram of the
par value of government debt
We obtain the following graph for the histogram of the last 100,000 observations on the par
value of government debt

The black vertical line denotes the sample mean for the last 100,000 observations included in
the histogram; the green vertical line denotes the value of $\frac{\mathcal{B}^*}{E u_c}$, associated with the sample
(presumably) from the ergodic distribution, where $\mathcal{B}^*$ is the regression coefficient described below; the red
vertical line denotes an approximation by [17] to the mean of the ergodic distribution that
can be precomputed before sampling from the ergodic distribution, as described below
Before moving on to discuss the histogram and the vertical lines approximating the ergodic
mean of government debt in more detail, the following graphs show government debt and
taxes early in the simulation, for periods 1-100 and 101 to 200 respectively

In [8]: titles = ['Government Debt', 'Tax Rate']

fig, axes = plt.subplots(4, 1, figsize=(10, 15))

for i, id in enumerate([2, 3]):
    axes[i].plot(sim_seq_long[id][:99], '-k', sim_bel_long[id][:99], '-.b', alpha=0.5)
    axes[i+2].plot(range(100, 199), sim_seq_long[id][100:199], '-k',
                   range(100, 199), sim_bel_long[id][100:199], '-.b', alpha=0.5)
    axes[i].set(title=titles[i])
    axes[i+2].set(title=titles[i])
    axes[i].grid()
    axes[i+2].grid()

axes[0].legend(('Complete Markets', 'Incomplete Markets'))
plt.tight_layout()
plt.show()

For the short samples early in our simulated sample of 102,000 observations, fluctuations in
government debt and the tax rate conceal the weak but inexorable force that the Ramsey
planner puts into both series driving them toward ergodic distributions far from these early
observations

• early observations are more influenced by the initial value of the par value of
government debt than by the ergodic mean of the par value of government
debt
• much later observations are more influenced by the ergodic mean and are in-
dependent of the initial value of the par value of government debt

82.5 Asymptotic Mean and Rate of Convergence

We apply the results of [17] to interpret

• the mean of the ergodic distribution of government debt


• the rate of convergence to the ergodic distribution from an arbitrary initial
government debt

We begin by computing objects required by the theory of section III.i of [17]


As in Fiscal Insurance via Fluctuating Interest Rates, we recall that [17] used a particular
notation to represent what we can regard as a generalization of the AMSS model
We introduce some of the [17] notation so that readers can quickly relate notation that ap-
pears in their key formulas to the notation that we have used in previous lectures here and
here
BEGS work with objects 𝐵𝑡 , ℬ𝑡 , ℛ𝑡 , 𝒳𝑡 that are related to notation that we used in earlier
lectures by

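$$\mathcal{R}_t = \frac{u_{c,t}}{u_{c,t-1}} R_{t-1} = \frac{u_{c,t}}{\beta E_{t-1} u_{c,t}}$$
$$B_t = \frac{b_{t+1}(s^t)}{R_t(s^t)}$$
$$b_t(s^{t-1}) = \mathcal{R}_{t-1} B_{t-1}$$
$$\mathcal{B}_t = u_{c,t} B_t = (\beta E_t u_{c,t+1}) b_{t+1}(s^t)$$
$$\mathcal{X}_t = u_{c,t} [g_t - \tau_t n_t]$$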

[17] call 𝒳𝑡 the effective government deficit, and ℬ𝑡 the effective government debt
Equation (44) of [17] expresses the time 𝑡 state 𝑠 government budget constraint as

ℬ(𝑠) = ℛ𝜏 (𝑠, 𝑠− )ℬ− + 𝒳𝜏 (𝑠) (1)

where the dependence on 𝜏 is to remind us that these objects depend on the tax rate; 𝑠− is
last period’s Markov state
BEGS interpret random variations in the right side of Eq. (1) as fiscal risks generated by

• interest-rate-driven fluctuations in time 𝑡 effective payments due on the government
portfolio, namely, ℛ𝜏 (𝑠, 𝑠− )ℬ− , and
• fluctuations in the effective government deficit 𝒳𝑡

82.5.1 Asymptotic Mean

BEGS give conditions under which the ergodic mean of ℬ𝑡 approximately satisfies the equa-
tion

$$\mathcal{B}^* = -\frac{\operatorname{cov}^\infty(\mathcal{R}_t, \mathcal{X}_t)}{\operatorname{var}^\infty(\mathcal{R}_t)} \tag{2}$$

where the superscript ∞ denotes a moment taken with respect to an ergodic distribution
Formula Eq. (2) represents ℬ∗ as a regression coefficient of 𝒳𝑡 on ℛ𝑡 in the ergodic distribu-
tion
Regression coefficient ℬ∗ solves a variance-minimization problem:

ℬ∗ = argminℬ var∞ (ℛℬ + 𝒳) (3)

The minimand in criterion Eq. (3) measures fiscal risk associated with a given tax-debt policy
that appears on the right side of equation Eq. (1)
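
The link between the regression formula Eq. (2) and the variance-minimization problem Eq. (3) can be checked numerically. The following is only a minimal sketch: it assumes three equally likely IID states and holds ℛ and 𝒳 fixed at the values printed later in this lecture for τ = 0.05, whereas in the full BEGS procedure both objects move with ℬ through the tax rate. The helper names `pop_mean` and `pop_cov` are our own.

```python
import numpy as np

# R and X copied from the cell outputs later in this lecture (τ = 0.05)
R = np.array([1.00116313, 1.10755123, 1.22461897])
X = np.array([0.05457803, 0.18259396, 0.33685546])
p = np.ones(3) / 3                      # assumed IID state probabilities


def pop_mean(x, p):
    "Population mean of x under probabilities p."
    return p @ x


def pop_cov(x, y, p):
    "Population covariance of x and y under probabilities p."
    return p @ ((x - pop_mean(x, p)) * (y - pop_mean(y, p)))


# Closed form: minus the coefficient from regressing X on R
B_reg = -pop_cov(R, X, p) / pop_cov(R, R, p)

# Brute force: minimize var(R * B + X) over a grid of candidate B values
grid = np.linspace(-5, 5, 20_001)
risk = np.array([pop_cov(R * B + X, R * B + X, p) for B in grid])
B_grid = grid[risk.argmin()]

print(B_reg, B_grid)
```

Because var(ℛℬ + 𝒳) is quadratic in ℬ when ℛ and 𝒳 are held fixed, the grid minimizer should agree with the closed form up to the grid spacing.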
Expressing formula Eq. (2) in terms of our notation tells us that the ergodic mean of the par
value 𝑏 of government debt in the AMSS model should approximately equal

$$\hat b = \frac{\mathcal{B}^*}{\beta E(E_t u_{c,t+1})} = \frac{\mathcal{B}^*}{\beta E(u_{c,t+1})} \tag{4}$$

where mathematical expectations are taken with respect to the ergodic distribution

82.5.2 Rate of Convergence

BEGS also derive the following approximation to the rate of convergence to ℬ∗ from an arbi-
trary initial condition

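$$\frac{E_t(\mathcal{B}_{t+1} - \mathcal{B}^*)}{(\mathcal{B}_t - \mathcal{B}^*)} \approx \frac{1}{1 + \beta^2 \operatorname{var}^\infty(\mathcal{R})} \tag{5}$$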

(See the equation above equation (47) in [17])
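
To get a feel for the magnitude implied by Eq. (5), here is a small sketch that evaluates the right side for a three-state example. It assumes β = 0.9 (consistent with the mean of ℛ equal to 1/β ≈ 1.111 in the outputs of this lecture) and uses the ℛ values reported later at the minimizing tax rate. The half-life calculation is our own addition and simply iterates the approximation.

```python
import numpy as np

β = 0.9                                  # assumed discount factor
p = np.ones(3) / 3                       # assumed IID state probabilities
# R evaluated at τ(B*), copied from the outputs later in this lecture
R_star = np.array([0.9998398, 1.10746593, 1.22602761])

var_R = p @ (R_star - p @ R_star)**2     # population variance of R
rate = 1 / (1 + β**2 * var_R)            # right side of Eq. (5)

# Periods needed for the expected gap B_t - B* to halve under Eq. (5)
half_life = np.log(0.5) / np.log(rate)

print(rate, half_life)
```

A rate this close to one means the expected gap shrinks by well under one percent per period, which fits the slow convergence visible in the long simulation.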

82.5.3 More Advanced Material

The remainder of this lecture is about technical material based on formulas from [17]
The topic is interpreting and extending formula Eq. (3) for the ergodic mean ℬ∗

82.5.4 Chicken and Egg

Attributes of the ergodic distribution for ℬ𝑡 appear on the right side of formula Eq. (3) for
the ergodic mean ℬ∗
Thus, formula Eq. (3) is not useful for estimating the mean of the ergodic distribution in advance of
actually computing the ergodic distribution

• we need to know the ergodic distribution to compute the right side of for-
mula Eq. (3)

So the primary use of equation Eq. (3) is how it confirms that the ergodic distribution solves
a fiscal-risk minimization problem
As an example, notice how we used the formula for the mean of ℬ in the ergodic distribution
of the special AMSS economy in Fiscal Insurance via Fluctuating Interest Rates

• first we computed the ergodic distribution using a reverse-engineering construction
• then we verified that ℬ agrees with the mean of that distribution

82.5.5 Approximating the Ergodic Mean

[17] propose an approximation to ℬ∗ that can be computed without first knowing the ergodic
distribution
To construct the BEGS approximation to ℬ∗ , we just follow steps set forth on pages 648 - 650
of section III.D of [17]

• notation in BEGS might be confusing at first sight, so it is important to stare and di-
gest before computing
• there are also some sign errors in the [17] text that we’ll want to correct

Here is a step-by-step description of the [17] approximation procedure

82.5.6 Step by Step

Step 1: For a given 𝜏 we compute a vector of values 𝑐𝜏 (𝑠), 𝑠 = 1, 2, … , 𝑆 that satisfy

$$(1 - \tau) c_\tau(s)^{-\sigma} - (c_\tau(s) + g(s))^{\gamma} = 0$$

This is a nonlinear equation to be solved for 𝑐𝜏 (𝑠), 𝑠 = 1, … , 𝑆


𝑆 = 3 in our case, but we’ll write code for a general integer 𝑆
Typo alert: Please note that there is a sign error in equation (42) of [17] – it should be a
minus rather than a plus in the middle

• We have made the appropriate correction in the above equation
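
Step 1 can be sketched without the lecture's utility class. The parameter values σ = γ = 2 below are assumptions, chosen because they reproduce the consumption vector printed later in this lecture; the function name `foc_residual` is ours.

```python
import numpy as np
from scipy.optimize import root

σ, γ = 2.0, 2.0                     # assumed preference parameters
g = np.array([0.1, 0.2, 0.3])       # government spending in each state
τ = 0.05                            # a given tax rate


def foc_residual(c, τ, g, σ, γ):
    "Residual of (1 - τ) c^(-σ) - (c + g)^γ = 0, one entry per state."
    return (1 - τ) * c**(-σ) - (c + g)**γ


sol = root(foc_residual, np.ones(len(g)), args=(τ, g, σ, γ))
c_τ = sol.x                         # vector c_τ(s), s = 1, ..., S

print(sol.success, c_τ)
```

The residual function works state by state, so the same code handles any number of states 𝑆 as long as `g` has one entry per state.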

Step 2: Knowing 𝑐𝜏 (𝑠), 𝑠 = 1, … , 𝑆 for a given 𝜏 , we want to compute the random variables

$$\mathcal{R}_\tau(s) = \frac{c_\tau(s)^{-\sigma}}{\beta \sum_{s'=1}^S c_\tau(s')^{-\sigma} \pi(s')}$$

and

$$\mathcal{X}_\tau(s) = (c_\tau(s) + g(s))^{1+\gamma} - c_\tau(s)^{1-\sigma}$$

each for 𝑠 = 1, … , 𝑆
BEGS call ℛ𝜏 (𝑠) the effective return on risk-free debt and they call 𝒳𝜏 (𝑠) the effective
government deficit
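
The two Step 2 formulas translate directly into array operations. This sketch assumes β = 0.9, σ = γ = 2, equal IID probabilities, and takes the consumption vector for τ = 0.05 from the output shown later in this lecture; the function names are our own.

```python
import numpy as np

β, σ, γ = 0.9, 2.0, 2.0                               # assumed parameters
g = np.array([0.1, 0.2, 0.3])
π = np.ones(3) / 3                                    # one row of the IID transition matrix
c = np.array([0.93852387, 0.89231015, 0.84858872])    # c_τ(s) for τ = 0.05


def effective_return(c, π, β, σ):
    "R_τ(s) = c(s)^(-σ) / (β Σ_s' c(s')^(-σ) π(s'))."
    return c**(-σ) / (β * (c**(-σ) @ π))


def effective_deficit(c, g, σ, γ):
    "X_τ(s) = (c(s) + g(s))^(1+γ) - c(s)^(1-σ)."
    return (c + g)**(1 + γ) - c**(1 - σ)


R = effective_return(c, π, β, σ)
X = effective_deficit(c, g, σ, γ)

print(R, X)
```

By construction the mean of ℛ under π equals 1/β, which gives a quick sanity check on any implementation.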
Step 3: With the preceding objects in hand, for a given ℬ, we seek a 𝜏 that satisfies

$$\mathcal{B} = -\frac{\beta}{1-\beta} E\mathcal{X}_\tau \equiv -\frac{\beta}{1-\beta} \sum_s \mathcal{X}_\tau(s)\pi(s)$$

This equation says that at a constant discount factor 𝛽, equivalent government debt ℬ equals
the present value of the mean effective government surplus
Typo alert: there is a sign error in equation (46) of [17] –the left side should be multiplied
by −1

• We have made this correction in the above equation

For a given ℬ, let a 𝜏 that solves the above equation be called 𝜏 (ℬ)
We’ll use a Python root solver to find a 𝜏 that solves this equation for a given ℬ
We’ll use this function to induce a function 𝜏 (ℬ)
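
One possible way to build 𝜏(ℬ) is sketched below. It is not the code used in the Execution section: instead of a derivative-based solver started from a guess, it uses the bracketing solver `brentq`, which avoids dependence on a starting value but assumes that the equation changes sign exactly once on the bracket [0, 0.5]. The parameters β = 0.9 and σ = γ = 2 are assumptions, and all function names are ours.

```python
import numpy as np
from scipy.optimize import brentq, root

β, σ, γ = 0.9, 2.0, 2.0             # assumed parameters
g = np.array([0.1, 0.2, 0.3])
π = np.ones(3) / 3                  # IID state probabilities


def solve_c(τ):
    "Step 1: consumption vector c_τ(s) for a given tax rate."
    residual = lambda c: (1 - τ) * c**(-σ) - (c + g)**γ
    return root(residual, np.ones(len(g))).x


def mean_X(τ):
    "Mean effective deficit E X_τ under π."
    c = solve_c(τ)
    return π @ ((c + g)**(1 + γ) - c**(1 - σ))


def τ_of_B(B, lo=0.0, hi=0.5):
    "Tax rate solving B = -β/(1-β) E X_τ, searched on the bracket [lo, hi]."
    return brentq(lambda τ: B + β / (1 - β) * mean_X(τ), lo, hi)


τ_1 = τ_of_B(1.0)
print(τ_1)
```

If the bracket does not contain a sign change, `brentq` raises a `ValueError`, which is a useful signal that the requested ℬ cannot be financed by any tax rate in that range.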
Step 4: With a Python program that computes 𝜏 (ℬ) in hand, next we write a Python func-
tion to compute the random variable

𝐽 (ℬ)(𝑠) = ℛ𝜏(ℬ) (𝑠)ℬ + 𝒳𝜏(ℬ) (𝑠), 𝑠 = 1, … , 𝑆

Step 5: Now that we have a machine to compute the random variable 𝐽 (ℬ)(𝑠), 𝑠 = 1, … , 𝑆,
via a composition of Python functions, we can use the population variance function that we
defined in the code above to construct a function var(𝐽 (ℬ))
We put var(𝐽 (ℬ)) into a function minimizer and compute

ℬ∗ = argminℬ var(𝐽 (ℬ))

Step 6: Next we take the minimizer ℬ∗ and the Python functions for computing means and
variances and compute

$$\text{rate} = \frac{1}{1 + \beta^2 \operatorname{var}(\mathcal{R}_{\tau(\mathcal{B}^*)})}$$

Ultimate outputs of this string of calculations are two scalars

(ℬ∗ , rate)

Step 7: Compute the divisor

𝑑𝑖𝑣 = 𝛽𝐸𝑢𝑐,𝑡+1

and then compute the mean of the par value of government debt in the AMSS model

$$\hat b = \frac{\mathcal{B}^*}{div}$$

In the two-Markov-state AMSS economy in Fiscal Insurance via Fluctuating Interest Rates,
𝐸𝑡 𝑢𝑐,𝑡+1 = 𝐸𝑢𝑐,𝑡+1 in the ergodic distribution and we have confirmed that this formula very
accurately describes a constant par value of government debt that

• supports full fiscal insurance via fluctuating interest parameters, and
• is the limit of government debt as 𝑡 → +∞

In the three-Markov-state economy of this lecture, the par value of government debt fluctu-
ates in a history-dependent way even asymptotically
In this economy, 𝑏̂ given by the above formula approximates the mean of the ergodic distribu-
tion of the par value of government debt

• this is the red vertical line plotted in the histogram of the last 100,000 obser-
vations of our simulation of the par value of government debt plotted above
• the approximation is fairly accurate but not perfect
• so while the approximation circumvents the chicken and egg problem sur-
rounding the much better approximation associated with the green vertical
line, it does so by enlarging the approximation error

82.5.7 Execution

Now let’s move on to compute things step by step


Step 1

In [9]: u = CRRAutility(π=(1 / 3) * np.ones((3, 3)),
                G=np.array([0.1, 0.2, .3]),
                Θ=np.ones(3))

τ = 0.05  # Initial guess of τ (to display calcs along the way)
S = len(u.G)  # Number of states

def solve_c(c, τ, u):
    return (1 - τ) * c**(-u.σ) - (c + u.G)**u.γ

c = root(solve_c, np.ones(S), args=(τ, u)).x  # .x returns the result from root
c

Out[9]: array([0.93852387, 0.89231015, 0.84858872])

In [10]: root(solve_c, np.ones(S), args=(τ, u))

Out[10]: fjac: array([[-0.99990816, -0.00495351, -0.01261467],
       [-0.00515633, 0.99985715, 0.01609659],
       [-0.01253313, -0.01616015, 0.99979086]])
fun: array([ 5.61814373e-10, -4.76900741e-10, 1.17474919e-11])
message: 'The solution converged.'
nfev: 11
qtf: array([1.55568331e-08, 1.28322481e-08, 7.89913426e-11])
r: array([ 4.26943131, 0.08684775, -0.06300593, -4.71278821, -0.0743338 ,
       -5.50778548])
status: 1
success: True
x: array([0.93852387, 0.89231015, 0.84858872])

Step 2

In [11]: n = c + u.G # compute labor supply



82.5.8 Note about Code

Remember that in our code 𝜋 is a 3 × 3 transition matrix


But because we are studying an IID case, 𝜋 has identical rows and we only need to compute
objects for one row of 𝜋
This explains why at some places below we set 𝑠 = 0 just to pick off the first row of 𝜋 in the
calculations

82.5.9 Code

First, let’s compute ℛ and 𝒳 according to our formulas

In [12]: def compute_R_X(τ, u, s):
    c = root(solve_c, np.ones(S), args=(τ, u)).x  # Solve for vector of c's
    div = u.β * (u.Uc(c[0], n[0]) * u.π[s, 0]
                 + u.Uc(c[1], n[1]) * u.π[s, 1]
                 + u.Uc(c[2], n[2]) * u.π[s, 2])
    R = c**(-u.σ) / (div)
    X = (c + u.G)**(1 + u.γ) - c**(1 - u.σ)
    return R, X

In [13]: c**(-u.σ) @ u.π

Out[13]: array([1.25997521, 1.25997521, 1.25997521])

In [14]: u.π

Out[14]: array([[0.33333333, 0.33333333, 0.33333333],
       [0.33333333, 0.33333333, 0.33333333],
       [0.33333333, 0.33333333, 0.33333333]])

We only want unconditional expectations because we are in an IID case


So we’ll set 𝑠 = 0 and just pick off expectations associated with the first row of 𝜋

In [15]: s = 0

R, X = compute_R_X(τ, u, s)

Let’s look at the random variables ℛ, 𝒳

In [16]: R

Out[16]: array([1.00116313, 1.10755123, 1.22461897])

In [17]: mean(R, s)

Out[17]: 1.1111111111111112

In [18]: X

Out[18]: array([0.05457803, 0.18259396, 0.33685546])

In [19]: mean(X, s)

Out[19]: 0.19134248445303795

In [20]: X @ u.π

Out[20]: array([0.19134248, 0.19134248, 0.19134248])

Step 3

In [21]: def solve_τ(τ, B, u, s):
    R, X = compute_R_X(τ, u, s)
    return ((u.β - 1) / u.β) * B - X @ u.π[s]

Note that 𝐵 is a scalar


Let’s try out our method computing 𝜏

In [22]: s = 0
B = 1.0

τ = root(solve_τ, .1, args=(B, u, s)).x[0]  # Very sensitive to starting value
τ

Out[22]: 0.2740159773695818

In the above cell, B is fixed at 1 and 𝜏 is to be computed as a function of B


Note that 0.1 is the initial value for 𝜏 in the root-finding algorithm
Step 4

In [23]: def min_J(B, u, s):
    τ = root(solve_τ, .5, args=(B, u, s)).x[0]  # very sensitive to initial value of τ
    R, X = compute_R_X(τ, u, s)
    return variance(R * B + X, s)

In [24]: min_J(B, u, s)

Out[24]: 0.035564405653720765

Step 6

In [25]: from scipy.optimize import minimize

B_star = minimize(min_J, .5, args=(u, s)).x[0]
B_star

Out[25]: -1.199482032053344

In [26]: n = c + u.G # compute labor supply

In [27]: div = u.β * (u.Uc(c[0], n[0]) * u.π[s, 0]
             + u.Uc(c[1], n[1]) * u.π[s, 1]
             + u.Uc(c[2], n[2]) * u.π[s, 2])

In [28]: B_hat = B_star/div


B_hat

Out[28]: -1.057765110954647

In [29]: τ_star = root(solve_τ, 0.05, args=(B_star, u, s)).x[0]


τ_star

Out[29]: 0.09572926599432369

In [30]: R_star, X_star = compute_R_X(τ_star, u, s)


R_star, X_star

Out[30]: (array([0.9998398 , 1.10746593, 1.22602761]),


array([0.00202709, 0.1246474 , 0.27315286]))

In [31]: rate = 1 / (1 + u.β**2 * variance(R_star, s))


rate

Out[31]: 0.9931353429089931

In [32]: root(solve_c, np.ones(S), args=(τ_star, u)).x

Out[32]: array([0.92643817, 0.88027114, 0.83662633])


83

Competitive Equilibria of Chang Model

83.1 Contents

• Overview 83.2

• Setting 83.3

• Competitive Equilibrium 83.4

• Inventory of Objects in Play 83.5

• Analysis 83.6

• Calculating all Promise-Value Pairs in CE 83.7

• Solving a Continuation Ramsey Planner’s Bellman Equation 83.8

Co-author: Sebastian Graves


In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install polytope

83.2 Overview

This lecture describes how Chang [25] analyzed competitive equilibria and a best competi-
tive equilibrium called a Ramsey plan
He did this by

• characterizing a competitive equilibrium recursively in a way also employed in the dynamic
Stackelberg problems and Calvo model lectures to pose Stackelberg problems in
linear economies, and then
• appropriately adapting an argument of Abreu, Pearce, and Stacchetti [2] to describe key
features of the set of competitive equilibria


Roberto Chang [25] chose a model of Calvo [21] as a simple structure that conveys ideas that
apply more broadly
A textbook version of Chang’s model appears in chapter 25 of [87]
This lecture and Credible Government Policies in Chang Model can be viewed as more so-
phisticated and complete treatments of the topics discussed in Ramsey plans, time inconsis-
tency, sustainable plans
Both this lecture and Credible Government Policies in Chang Model make extensive use of an
idea to which we apply the nickname dynamic programming squared
In dynamic programming squared problems there are typically two interrelated Bellman equa-
tions

• A Bellman equation for a set of agents or followers with value or value function 𝑣𝑎
• A Bellman equation for a principal or Ramsey planner or Stackelberg leader with value
or value function 𝑣𝑝 in which 𝑣𝑎 appears as an argument

We encountered problems with this structure in dynamic Stackelberg problems, optimal taxa-
tion with state-contingent debt, and other lectures

83.2.1 The Setting

First, we introduce some notation


For a sequence of scalars $\vec z \equiv \{z_t\}_{t=0}^\infty$, let $\vec z^t = (z_0, \ldots, z_t)$, $\vec z_t = (z_t, z_{t+1}, \ldots)$

An infinitely lived representative agent and an infinitely lived government exist at dates 𝑡 =
0, 1, …
The objects in play are

• an initial quantity 𝑀−1 of nominal money holdings


• a sequence of inverse money growth rates ℎ⃗ and an associated sequence of nominal
money holdings 𝑀⃗
• a sequence of values of money 𝑞 ⃗
• a sequence of real money holdings 𝑚⃗
• a sequence of total tax collections 𝑥⃗
• a sequence of per capita rates of consumption 𝑐 ⃗
• a sequence of per capita incomes 𝑦 ⃗

A benevolent government chooses sequences $(\vec M, \vec h, \vec x)$ subject to a sequence of budget
constraints and other constraints imposed by competitive equilibrium
Given tax collection and price of money sequences, a representative household chooses
sequences $(\vec c, \vec m)$ of consumption and real balances
In competitive equilibrium, the price of money sequence 𝑞 ⃗ clears markets, thereby reconciling
decisions of the government and the representative household
Chang adopts a version of a model that [21] designed to exhibit time-inconsistency of a Ram-
sey policy in a simple and transparent setting
By influencing the representative household’s expectations, government actions at time 𝑡 af-
fect components of household utilities for periods 𝑠 before 𝑡

When setting a path for monetary expansion rates, the government takes into account how
the household’s anticipations of the government’s future actions affect the household’s current
decisions
The ultimate source of time inconsistency is that a time 0 Ramsey planner takes these effects
into account in designing a plan of government actions for 𝑡 ≥ 0

83.3 Setting

83.3.1 The Household’s Problem

A representative household faces a nonnegative value of money sequence 𝑞 ⃗ and sequences 𝑦,⃗ 𝑥⃗
of income and total tax collections, respectively
The household chooses nonnegative sequences 𝑐,⃗ 𝑀⃗ of consumption and nominal balances,
respectively, to maximize


$$\sum_{t=0}^\infty \beta^t [u(c_t) + v(q_t M_t)] \tag{1}$$

subject to

𝑞𝑡 𝑀𝑡 ≤ 𝑦𝑡 + 𝑞𝑡 𝑀𝑡−1 − 𝑐𝑡 − 𝑥𝑡 (2)

and

𝑞𝑡 𝑀𝑡 ≤ 𝑚̄ (3)

Here 𝑞𝑡 is the reciprocal of the price level at 𝑡, which we can also call the value of money
Chang [25] assumes that

• 𝑢 ∶ R+ → R is twice continuously differentiable, strictly concave, and strictly increasing;
• 𝑣 ∶ R+ → R is twice continuously differentiable and strictly concave;
• $\lim_{c \to 0} u'(c) = \lim_{m \to 0} v'(m) = +\infty$;
• there is a finite level $m = m^f$ such that $v'(m^f) = 0$

The household carries real balances out of a period equal to 𝑚𝑡 = 𝑞𝑡 𝑀𝑡


Inequality Eq. (2) is the household’s time 𝑡 budget constraint
It tells how real balances 𝑞𝑡 𝑀𝑡 carried out of period 𝑡 depend on income, consumption, taxes,
and real balances 𝑞𝑡 𝑀𝑡−1 carried into the period
Equation Eq. (3) imposes an exogenous upper bound 𝑚̄ on the household’s choice of real bal-
ances, where 𝑚̄ ≥ 𝑚𝑓

83.3.2 Government

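The government chooses a sequence of inverse money growth rates with time 𝑡 component
$h_t \equiv \frac{M_{t-1}}{M_t} \in \Pi \equiv [\underline{\pi}, \overline{\pi}]$, where $0 < \underline{\pi} < 1 < \frac{1}{\beta} \leq \overline{\pi}$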

The government faces a sequence of budget constraints with time 𝑡 component

−𝑥𝑡 = 𝑞𝑡 (𝑀𝑡 − 𝑀𝑡−1 )

which by using the definitions of 𝑚𝑡 and ℎ𝑡 can also be expressed as

−𝑥𝑡 = 𝑚𝑡 (1 − ℎ𝑡 ) (4)

The restrictions $m_t \in [0, \bar m]$ and $h_t \in \Pi$ evidently imply that $x_t \in X \equiv [(\underline{\pi} - 1)\bar m, (\overline{\pi} - 1)\bar m]$
We define the set $E \equiv [0, \bar m] \times \Pi \times X$, so that we require that $(m, h, x) \in E$
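
The mapping from a pair of real balances and an inverse money growth rate to tax collections, and the membership test for the set 𝐸, can be sketched in a few lines. The numerical bounds below (upper bound on real balances of 30, inverse money growth rates between 0.9 and 1.3) are hypothetical values chosen only for illustration, and the function names are ours.

```python
import numpy as np

m_bar = 30.0                 # hypothetical upper bound on real balances
π_lo, π_hi = 0.9, 1.3        # hypothetical bounds on inverse money growth h


def tax_from_policy(m, h):
    "Tax collections implied by the budget constraint -x = m (1 - h)."
    return m * (h - 1.0)


def in_E(m, h, x):
    "Check whether (m, h, x) lies in E = [0, m_bar] × Π × X."
    X_lo, X_hi = (π_lo - 1) * m_bar, (π_hi - 1) * m_bar
    return (0 <= m <= m_bar) and (π_lo <= h <= π_hi) and (X_lo <= x <= X_hi)


x = tax_from_policy(20.0, 0.95)   # h < 1 means money grows, so x < 0 (a transfer)
print(x, in_E(20.0, 0.95, x))

# Every (m, h) in [0, m_bar] × Π implies an x inside X
checks = [in_E(m, h, tax_from_policy(m, h))
          for m in np.linspace(0, m_bar, 7)
          for h in np.linspace(π_lo, π_hi, 5)]
print(all(checks))
```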
To represent the idea that taxes are distorting, Chang makes the following assumption about
outcomes for per capita output:

𝑦𝑡 = 𝑓(𝑥𝑡 ), (5)

where 𝑓 ∶ R → R satisfies 𝑓(𝑥) > 0, is twice continuously differentiable, 𝑓 ″ (𝑥) < 0, and
𝑓(𝑥) = 𝑓(−𝑥) for all 𝑥 ∈ R, so that subsidies and taxes are equally distorting
Calvo’s and Chang’s purpose is not to model the causes of tax distortions in any detail but
simply to summarize the outcome of those distortions via the function 𝑓(𝑥)
A key part of the specification is that tax distortions are increasing in the absolute value of
tax revenues
Ramsey plan: A Ramsey plan is a competitive equilibrium that maximizes Eq. (1)
Within-period timing of decisions is as follows:

• first, the government chooses ℎ𝑡 and 𝑥𝑡 ;


• then given 𝑞 ⃗ and its expectations about future values of 𝑥 and 𝑦’s, the household
chooses 𝑀𝑡 and therefore 𝑚𝑡 because 𝑚𝑡 = 𝑞𝑡 𝑀𝑡 ;
• then output 𝑦𝑡 = 𝑓(𝑥𝑡 ) is realized;
• finally 𝑐𝑡 = 𝑦𝑡

This within-period timing confronts the government with choices framed by how the private
sector wants to respond when the government takes time 𝑡 actions that differ from what the
private sector had expected
This consideration will be important in lecture credible government policies when we study
credible government policies
The model is designed to focus on the intertemporal trade-offs between the welfare benefits
of deflation and the welfare costs associated with the high tax collections required to retire
money at a rate that delivers deflation
A benevolent time 0 government can promote utility generating increases in real balances
only by imposing sufficiently large distorting tax collections
To promote the welfare increasing effects of high real balances, the government wants to in-
duce gradual deflation

83.3.3 Household’s Problem

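Given $M_{-1}$ and $\{q_t\}_{t=0}^\infty$, the household’s problem is

$$\mathcal{L} = \max_{\vec c, \vec M} \min_{\vec \lambda, \vec \mu} \sum_{t=0}^\infty \beta^t \big\{ u(c_t) + v(M_t q_t) + \lambda_t [y_t - c_t - x_t + q_t M_{t-1} - q_t M_t] + \mu_t [\bar m - q_t M_t] \big\}$$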

First-order conditions with respect to 𝑐𝑡 and 𝑀𝑡 , respectively, are

𝑢′ (𝑐𝑡 ) = 𝜆𝑡
𝑞𝑡 [𝑢′ (𝑐𝑡 ) − 𝑣′ (𝑀𝑡 𝑞𝑡 )] ≤ 𝛽𝑢′ (𝑐𝑡+1 )𝑞𝑡+1 , = if 𝑀𝑡 𝑞𝑡 < 𝑚̄

The last equation expresses Karush-Kuhn-Tucker complementary slackness conditions (see here)
These insist that the inequality is an equality at an interior solution for $M_t$
Using $h_t = \frac{M_{t-1}}{M_t}$ and $q_t = \frac{m_t}{M_t}$ in these first-order conditions and rearranging implies

𝑚𝑡 [𝑢′ (𝑐𝑡 ) − 𝑣′ (𝑚𝑡 )] ≤ 𝛽𝑢′ (𝑓(𝑥𝑡+1 ))𝑚𝑡+1 ℎ𝑡+1 , = if 𝑚𝑡 < 𝑚̄ (6)

Define the following key variable

𝜃𝑡+1 ≡ 𝑢′ (𝑓(𝑥𝑡+1 ))𝑚𝑡+1 ℎ𝑡+1 (7)

This is real money balances at time 𝑡 + 1 measured in units of marginal utility, which Chang
refers to as ‘the marginal utility of real balances’
From the standpoint of the household at time 𝑡, equation Eq. (7) shows that 𝜃𝑡+1 intermediates
the influences of $(\vec x_{t+1}, \vec m_{t+1})$ on the household’s choice of real balances 𝑚𝑡
By “intermediates” we mean that the future paths $(\vec x_{t+1}, \vec m_{t+1})$ influence 𝑚𝑡 entirely through
their effects on the scalar 𝜃𝑡+1
The observation that the one dimensional promised marginal utility of real balances 𝜃𝑡+1
functions in this way is an important step in constructing a class of competitive equilibria
that have a recursive representation
A closely related observation pervaded the analysis of Stackelberg plans in lecture dynamic
Stackelberg problems
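
To illustrate how 𝜃𝑡+1 summarizes everything about the future that matters for the time 𝑡 choice of real balances, the sketch below solves Eq. (6) for 𝑚𝑡 given ℎ𝑡 and a value of 𝜃𝑡+1. The functional forms 𝑢(𝑐) = log 𝑐, 𝑣(𝑚) = (𝑚𝑚̄ − 𝑚²/2)^{1/2}/500, 𝑓(𝑥) = 180 − (0.4𝑥)² and the values β = 0.8, 𝑚̄ = 30 are illustrative assumptions, not specifications taken from this section. The sketch also assumes the interior root, when it exists, is the one found by the bracketing solver.

```python
import numpy as np
from scipy.optimize import brentq

β, m_bar = 0.8, 30.0                     # assumed parameter values


def u_prime(c):
    return 1.0 / c                       # from the assumed u(c) = log(c)


def v_prime(m):
    # derivative of the assumed v(m) = (m * m_bar - 0.5 * m**2)**0.5 / 500
    return 0.001 * (m_bar - m) / np.sqrt(m * m_bar - 0.5 * m**2)


def f(x):
    return 180.0 - (0.4 * x)**2          # assumed output function, f(x) = f(-x)


def euler_gap(m, h, θ_next):
    "Left side minus right side of Eq. (6), using x = m (h - 1)."
    x = m * (h - 1.0)
    return m * (u_prime(f(x)) - v_prime(m)) - β * θ_next


def real_balances(h, θ_next, m_min=1e-6):
    "Real balances consistent with Eq. (6) for given h_t and θ_{t+1}."
    if euler_gap(m_bar, h, θ_next) <= 0:
        return m_bar                     # corner: the inequality holds at the upper bound
    return brentq(euler_gap, m_min, m_bar, args=(h, θ_next))


m_interior = real_balances(1.0, 0.1)     # interior solution
m_corner = real_balances(1.0, 1.0)       # a large θ' pushes m to the upper bound
print(m_interior, m_corner)
```

Note that `real_balances` never looks at the future paths themselves, only at the scalar `θ_next`, which is exactly the sense in which 𝜃𝑡+1 intermediates their influence.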

83.4 Competitive Equilibrium

Definition:

• A government policy is a pair of sequences (ℎ,⃗ 𝑥)⃗ where ℎ𝑡 ∈ Π ∀𝑡 ≥ 0


• A price system is a nonnegative value of money sequence 𝑞 ⃗
• An allocation is a triple of nonnegative sequences (𝑐,⃗ 𝑚,⃗ 𝑦)⃗

It is required that time 𝑡 components (𝑚𝑡 , 𝑥𝑡 , ℎ𝑡 ) ∈ 𝐸



Definition:
Given $M_{-1}$, a government policy $(\vec h, \vec x)$, price system $\vec q$, and allocation $(\vec c, \vec m, \vec y)$ are said to be
a competitive equilibrium if

• 𝑚𝑡 = 𝑞𝑡 𝑀𝑡 and 𝑦𝑡 = 𝑓(𝑥𝑡 )
• The government budget constraint is satisfied
• Given $\vec q, \vec x, \vec y$, $(\vec c, \vec m)$ solves the household’s problem

83.5 Inventory of Objects in Play

Chang constructs the following objects

1. A set Ω of initial marginal utilities of money 𝜃0

• Let Ω denote the set of initial promised marginal utilities of money 𝜃0 associated with
competitive equilibria
• Chang exploits the fact that a competitive equilibrium consists of a first period outcome
(ℎ0 , 𝑚0 , 𝑥0 ) and a continuation competitive equilibrium with marginal utility of money
𝜃1 ∈ Ω

2. Competitive equilibria that have a recursive representation

• A competitive equilibrium with a recursive representation consists of an initial 𝜃0 and
a four-tuple of functions (ℎ, 𝑚, 𝑥, Ψ) mapping 𝜃 into this period’s (ℎ, 𝑚, 𝑥) and next
period’s 𝜃, respectively
• A competitive equilibrium can be represented recursively by iterating on

ℎ𝑡 = ℎ(𝜃𝑡 )
𝑚𝑡 = 𝑚(𝜃𝑡 )
(8)
𝑥𝑡 = 𝑥(𝜃𝑡 )
𝜃𝑡+1 = Ψ(𝜃𝑡 )

starting from 𝜃0
The range and domain of Ψ(⋅) are both Ω

3. A recursive representation of a Ramsey plan

• A recursive representation of a Ramsey plan is a recursive competitive equilibrium
𝜃0 , (ℎ, 𝑚, 𝑥, Ψ) that, among all recursive competitive equilibria, maximizes
$\sum_{t=0}^\infty \beta^t [u(c_t) + v(q_t M_t)]$
• The Ramsey planner chooses 𝜃0 , (ℎ, 𝑚, 𝑥, Ψ) from among the set of recursive competi-
tive equilibria at time 0
• Iterations on the function Ψ determine subsequent 𝜃𝑡 ’s that summarize the aspects of
the continuation competitive equilibria that influence the household’s decisions
• At time 0, the Ramsey planner commits to this implied sequence $\{\theta_t\}_{t=0}^\infty$ and therefore
to an associated sequence of continuation competitive equilibria

4. A characterization of time-inconsistency of a Ramsey plan

• Imagine that after a ‘revolution’ at time 𝑡 ≥ 1, a new Ramsey planner is given the op-
portunity to ignore history and solve a brand new Ramsey plan
• This new planner would want to reset the 𝜃𝑡 associated with the original Ramsey plan
to 𝜃0
• The incentive to reinitialize 𝜃𝑡 associated with this revolution experiment indicates the
time-inconsistency of the Ramsey plan
• By resetting 𝜃 to 𝜃0 , the new planner avoids the costs at time 𝑡 that the original Ram-
sey planner must pay to reap the beneficial effects that the original Ramsey plan for
𝑠 ≥ 𝑡 had achieved via its influence on the household’s decisions for 𝑠 = 0, … , 𝑡 − 1

83.6 Analysis

A competitive equilibrium is a triple of sequences (𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐸 ∞ that satisfies Eq. (2),
Eq. (3), and Eq. (6)
Chang works with a set of competitive equilibria defined as follows
Definition: 𝐶𝐸 = {(𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐸 ∞ such that Eq. (2), Eq. (3), and Eq. (6) are satisfied }
𝐶𝐸 is not empty because there exists a competitive equilibrium with ℎ𝑡 = 1 for all 𝑡 ≥ 1,
namely, an equilibrium with a constant money supply and constant price level
Chang establishes that 𝐶𝐸 is also compact
Chang makes the following key observation that combines ideas of Abreu, Pearce, and Stac-
chetti [2] with insights of Kydland and Prescott [81]
Proposition: The continuation of a competitive equilibrium is a competitive equilibrium
That is, (𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐶𝐸 implies that (𝑚⃗ 𝑡 , 𝑥𝑡⃗ , ℎ⃗ 𝑡 ) ∈ 𝐶𝐸 ∀ 𝑡 ≥ 1
(Lecture dynamic Stackelberg problems also used a version of this insight)
We can now state that a Ramsey problem is to


$$\max_{(\vec m, \vec x, \vec h) \in E^\infty} \sum_{t=0}^\infty \beta^t [u(c_t) + v(m_t)]$$

subject to restrictions Eq. (2), Eq. (3), and Eq. (6)


Evidently, associated with any competitive equilibrium (𝑚0 , 𝑥0 ) is an implied value of 𝜃0 =
𝑢′ (𝑓(𝑥0 ))(𝑚0 + 𝑥0 )
To bring out a recursive structure inherent in the Ramsey problem, Chang defines the set

Ω = {𝜃 ∈ R such that 𝜃 = 𝑢′ (𝑓(𝑥0 ))(𝑚0 + 𝑥0 ) for some (𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐶𝐸}

Equation Eq. (6) inherits from the household’s Euler equation for money holdings the prop-
erty that the value of 𝑚0 consistent with the representative household’s choices depends on
(ℎ⃗ 1 , 𝑚⃗ 1 )
This dependence is captured in the definition above by making Ω be the set of first period
values of 𝜃0 satisfying 𝜃0 = 𝑢′ (𝑓(𝑥0 ))(𝑚0 + 𝑥0 ) for first period component (𝑚0 , ℎ0 ) of compet-
itive equilibrium sequences (𝑚,⃗ 𝑥,⃗ ℎ)⃗

Chang establishes that Ω is a nonempty and compact subset of R+


Next Chang advances:
Definition: Γ(𝜃) = {(𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐶𝐸|𝜃 = 𝑢′ (𝑓(𝑥0 ))(𝑚0 + 𝑥0 )}
Thus, Γ(𝜃) is the set of competitive equilibrium sequences (𝑚,⃗ 𝑥,⃗ ℎ)⃗ whose first period compo-
nents (𝑚0 , ℎ0 ) deliver the prescribed value 𝜃 for first period marginal utility
If we knew the sets Ω, Γ(𝜃), we could use the following two-step procedure to find at least the
value of the Ramsey outcome to the representative household

1. Find the indirect value function 𝑤(𝜃) defined as


$$w(\theta) = \max_{(\vec m, \vec x, \vec h) \in \Gamma(\theta)} \sum_{t=0}^\infty \beta^t [u(f(x_t)) + v(m_t)]$$

2. Compute the value of the Ramsey outcome by solving $\max_{\theta \in \Omega} w(\theta)$

Thus, Chang states the following


Proposition:
𝑤(𝜃) satisfies the Bellman equation

$$w(\theta) = \max_{x, m, h, \theta'} \{ u(f(x)) + v(m) + \beta w(\theta') \} \tag{9}$$

where maximization is subject to

(𝑚, 𝑥, ℎ) ∈ 𝐸 and 𝜃′ ∈ Ω (10)

and

𝜃 = 𝑢′ (𝑓(𝑥))(𝑚 + 𝑥) (11)

and

−𝑥 = 𝑚(1 − ℎ) (12)

and

𝑚 ⋅ [𝑢′ (𝑓(𝑥)) − 𝑣′ (𝑚)] ≤ 𝛽𝜃′ , = if 𝑚 < 𝑚̄ (13)

Before we use this proposition to recover a recursive representation of the Ramsey plan, note
that the proposition relies on knowing the set Ω
To find Ω, Chang uses the insights of Kydland and Prescott [81] together with a method
based on the Abreu, Pearce, and Stacchetti [2] iteration to convergence on an operator 𝐵 that
maps continuation values into values
We want an operator that maps a continuation 𝜃 into a current 𝜃

Chang lets 𝑄 be a nonempty, bounded subset of R


Elements of the set 𝑄 are taken to be candidate values for continuation marginal utilities
Chang defines an operator

$$B(Q) = \{\theta \in \mathbb{R} \text{ such that there is } (m, x, h, \theta') \in E \times Q \text{ such that Eq. (11), Eq. (12), and Eq. (13) hold}\}$$
Thus, 𝐵(𝑄) is the set of first period 𝜃’s attainable with (𝑚, 𝑥, ℎ) ∈ 𝐸 and some 𝜃′ ∈ 𝑄
Proposition:

1. 𝑄 ⊂ 𝐵(𝑄) implies 𝐵(𝑄) ⊂ Ω (‘self-generation’)


2. Ω = 𝐵(Ω) (‘factorization’)

The proposition characterizes Ω as the largest fixed point of 𝐵


It is easy to establish that 𝐵(𝑄) is a monotone operator
This property allows Chang to compute Ω as the limit of iterations on 𝐵 provided that itera-
tions begin from a sufficiently large initial set
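
The idea of iterating on 𝐵 from a large initial set can be sketched on a grid. This is a rough illustration, not Chang's algorithm or the method used elsewhere in these lectures: it represents each candidate set 𝑄 by an interval [lo, hi] (the convex hull of the attainable 𝜃 values, which is an approximation), and it relies on assumed functional forms 𝑢(𝑐) = log 𝑐, 𝑣(𝑚) = (𝑚𝑚̄ − 𝑚²/2)^{1/2}/500, 𝑓(𝑥) = 180 − (0.4𝑥)² with β = 0.8, 𝑚̄ = 30 and ℎ ∈ [0.9, 1.3]. All names are ours.

```python
import numpy as np

β, m_bar = 0.8, 30.0                          # assumed parameters
h_grid = np.linspace(0.9, 1.3, 41)            # assumed Π, with 1/β <= upper bound
m_grid = np.linspace(0.5, m_bar, 60)


def u_prime(c):
    return 1.0 / c


def v_prime(m):
    return 0.001 * (m_bar - m) / np.sqrt(m * m_bar - 0.5 * m**2)


def f(x):
    return 180.0 - (0.4 * x)**2


M, H = np.meshgrid(m_grid, h_grid, indexing="ij")
X = M * (H - 1.0)                             # Eq. (12): -x = m (1 - h)
θ = u_prime(f(X)) * (M + X)                   # Eq. (11)
need = M * (u_prime(f(X)) - v_prime(M)) / β   # θ' that makes Eq. (13) hold with equality
at_cap = np.isclose(M, m_bar)                 # grid points with m = m_bar


def B(lo, hi, tol=1e-12):
    "Interval hull of the θ's attainable with some continuation θ' in [lo, hi]."
    interior_ok = (~at_cap) & (need >= lo - tol) & (need <= hi + tol)
    cap_ok = at_cap & (need <= hi + tol)      # at m = m_bar only the inequality is required
    ok = interior_ok | cap_ok
    return θ[ok].min(), θ[ok].max()


lo, hi = 0.0, θ.max()                         # a deliberately large initial interval
for _ in range(10_000):
    new = B(lo, hi)
    if new == (lo, hi):                       # reached a fixed point of the grid operator
        break
    lo, hi = new

print(lo, hi)
```

Because the operator is monotone and the initial interval contains its own image, the intervals shrink at every step, mirroring the argument that iterations started from a sufficiently large set converge to the largest fixed point.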

83.6.1 Some Useful Notation

Let ℎ⃗ 𝑡 = (ℎ0 , ℎ1 , … , ℎ𝑡 ) denote a history of inverse money creation rates with time 𝑡 compo-
nent ℎ𝑡 ∈ Π
A government strategy $\sigma = \{\sigma_t\}_{t=0}^\infty$ is a $\sigma_0 \in \Pi$ and for $t \geq 1$ a sequence of functions
$\sigma_t : \Pi^{t-1} \to \Pi$
Chang restricts the government’s choice of strategies to the following space:

𝐶𝐸𝜋 = {ℎ⃗ ∈ Π∞ ∶ there is some (𝑚,⃗ 𝑥)⃗ such that (𝑚,⃗ 𝑥,⃗ ℎ)⃗ ∈ 𝐶𝐸}

In words, 𝐶𝐸𝜋 is the set of money growth sequences consistent with the existence of competi-
tive equilibria
Chang observes that 𝐶𝐸𝜋 is nonempty and compact
Definition: 𝜎 is said to be admissible if for all 𝑡 ≥ 1 and after any history ℎ⃗ 𝑡−1 , the continua-
tion ℎ⃗ 𝑡 implied by 𝜎 belongs to 𝐶𝐸𝜋
Admissibility of 𝜎 means that anticipated policy choices associated with 𝜎 are consistent with
the existence of competitive equilibria after each possible subsequent history
After any history ℎ⃗ 𝑡−1 , admissibility restricts the government’s choice in period 𝑡 to the set

𝐶𝐸𝜋0 = {ℎ ∈ Π ∶ there is ℎ⃗ ∈ 𝐶𝐸𝜋 with ℎ = ℎ0 }.

In words, 𝐶𝐸𝜋0 is the set of all first period money growth rates ℎ = ℎ0 , each of which is con-
sistent with the existence of a sequence of money growth rates ℎ⃗ starting from ℎ0 in the ini-
tial period and for which a competitive equilibrium exists

Remark: $CE_\pi^0 = \{h \in \Pi : \text{there is } (m, \theta') \in [0, \bar m] \times \Omega \text{ such that } m[u'(f((h-1)m)) - v'(m)] \leq \beta\theta' \text{ with equality if } m < \bar m\}$
Definition: An allocation rule is a sequence of functions $\vec\alpha = \{\alpha_t\}_{t=0}^\infty$ such that
$\alpha_t : \Pi^t \to [0, \bar m] \times X$
Thus, the time 𝑡 component of $\alpha_t(h^t)$ is a pair of functions $(m_t(h^t), x_t(h^t))$
Definition: Given an admissible government strategy 𝜎, an allocation rule 𝛼 is called com-
petitive if given any history ℎ⃗ 𝑡−1 and ℎ𝑡 ∈ 𝐶𝐸𝜋0 , the continuations of 𝜎 and 𝛼 after (ℎ⃗ 𝑡−1 , ℎ𝑡 )
induce a competitive equilibrium sequence

83.6.2 Another Operator

At this point it is convenient to introduce another operator that can be used to compute a
Ramsey plan
For computing a Ramsey plan, this operator is wasteful because it works with a state vector
that is bigger than necessary
We introduce this operator because it helps to prepare the way for Chang’s operator called $\tilde D(Z)$ that we shall describe in lecture credible government policies
It is also useful because a fixed point of the operator to be defined here provides a good guess for an initial set from which to initiate iterations on Chang’s set-to-set operator $\tilde D(Z)$ to be described in lecture credible government policies
Let 𝑆 be the set of all pairs (𝑤, 𝜃) of competitive equilibrium values and associated initial
marginal utilities
Let 𝑊 be a bounded set of values in R
Let 𝑍 be a nonempty subset of 𝑊 × Ω
Think of using pairs (𝑤′ , 𝜃′ ) drawn from 𝑍 as candidate continuation (value, 𝜃) pairs
Define the operator

$$D(Z) = \{(w, \theta) : \text{there is } h \in CE_\pi^0 \text{ and a four-tuple } (m(h), x(h), w'(h), \theta'(h)) \in [0, \bar m] \times X \times Z$$

such that

$$w = u(f(x(h))) + v(m(h)) + \beta w'(h) \tag{14}$$

$$\theta = u'(f(x(h)))(m(h) + x(h)) \tag{15}$$

$$x(h) = m(h)(h - 1) \tag{16}$$

$$m(h)(u'(f(x(h))) - v'(m(h))) \leq \beta \theta'(h) \tag{17}$$

$$\text{with equality if } m(h) < \bar m\}$$
It is possible to establish
Proposition:

1. If 𝑍 ⊂ 𝐷(𝑍), then 𝐷(𝑍) ⊂ 𝑆 (‘self-generation’)


2. 𝑆 = 𝐷(𝑆) (‘factorization’)

Proposition:

1. Monotonicity of 𝐷: 𝑍 ⊂ 𝑍 ′ implies 𝐷(𝑍) ⊂ 𝐷(𝑍 ′ )


2. 𝑍 compact implies that 𝐷(𝑍) is compact

It can be shown that 𝑆 is compact and that therefore there exists a (𝑤, 𝜃) pair within this set
that attains the highest possible value 𝑤
This (𝑤, 𝜃) pair is associated with a Ramsey plan
Further, we can compute 𝑆 by iterating to convergence on 𝐷 provided that one begins with a
sufficiently large initial set 𝑆0
As a very useful by-product, the algorithm that finds the largest fixed point 𝑆 = 𝐷(𝑆) also
produces the Ramsey plan, its value 𝑤, and the associated competitive equilibrium

83.7 Calculating all Promise-Value Pairs in CE

Above we have defined the 𝐷(𝑍) operator as:

$$D(Z) = \{(w, \theta) : \exists h \in CE_\pi^0 \text{ and } (m(h), x(h), w'(h), \theta'(h)) \in [0, \bar m] \times X \times Z$$

such that

$$w = u(f(x(h))) + v(m(h)) + \beta w'(h)$$

$$\theta = u'(f(x(h)))(m(h) + x(h))$$

$$x(h) = m(h)(h - 1)$$

$$m(h)(u'(f(x(h))) - v'(m(h))) \leq \beta \theta'(h) \quad (\text{with equality if } m(h) < \bar m)\}$$

We noted that the set 𝑆 can be found by iterating to convergence on 𝐷, provided that we
start with a sufficiently large initial set 𝑆0

Our implementation builds on ideas in this notebook


To find 𝑆 we use a numerical algorithm called the outer hyperplane approximation algorithm
It was invented by Judd, Yeltekin, Conklin [74]
This algorithm constructs the smallest convex set that contains the fixed point of the 𝐷(𝑆)
operator
Given that we are finding the smallest convex set that contains 𝑆, we can represent it on a
computer as the intersection of a finite number of half-spaces
Let 𝐻 be a set of subgradients, and 𝐶 be a set of hyperplane levels
We approximate 𝑆 by:

$$\tilde S = \{(w, \theta) \mid H \cdot (w, \theta) \leq C\}$$

A key feature of this algorithm is that we discretize the action space, i.e., we create a grid of possible values for 𝑚 and ℎ (note that 𝑥 is implied by 𝑚 and ℎ). This discretization simplifies computation of $\tilde S$ by allowing us to find it by solving a sequence of linear programs
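To make the half-space representation concrete, here is a minimal sketch, assuming subgradients placed evenly on the unit circle and hyperplanes tangent to a circle with a given center and radius. The helper names `circle_halfspaces` and `contains` are illustrative and do not appear in the lecture's class.

```python
import numpy as np

def circle_halfspaces(center, radius, N):
    """
    Return (H, C) such that {z : H @ z <= C} is the polygon whose N faces
    are tangent to the circle with the given center and radius.
    """
    angles = np.arange(N) * 2 * np.pi / N
    H = np.column_stack((np.cos(angles), np.sin(angles)))  # unit normals
    C = H @ np.asarray(center) + radius                    # hyperplane levels
    return H, C

def contains(H, C, point, tol=1e-12):
    """Check whether a (w, θ) point satisfies every half-space constraint."""
    return bool(np.all(H @ np.asarray(point) <= C + tol))

H, C = circle_halfspaces(center=(0.0, 0.0), radius=1.0, N=8)
inside = contains(H, C, (0.5, 0.5))
outside = contains(H, C, (2.0, 0.0))
print(inside, outside)
```

Increasing `N` adds faces and tightens the polygon around the set being approximated, at the cost of more linear programs per iteration.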
The outer hyperplane approximation algorithm proceeds as follows:

1. Initialize subgradients, 𝐻, and hyperplane levels, 𝐶0


2. Given a set of subgradients, 𝐻, and hyperplane levels, 𝐶𝑡 , for each subgradient ℎ𝑖 ∈ 𝐻:

• Solve a linear program (described below) for each action in the action space
• Find the maximum and update the corresponding hyperplane level, 𝐶𝑖,𝑡+1

3. If |𝐶𝑡+1 − 𝐶𝑡 | > 𝜖, return to 2

Step 1 simply creates a large initial set 𝑆0


Given some set 𝑆𝑡 , Step 2 then constructs the set 𝑆𝑡+1 = 𝐷(𝑆𝑡 ). The linear program in
Step 2 is designed to construct a set 𝑆𝑡+1 that is as large as possible while satisfying the con-
straints of the 𝐷(𝑆) operator
To do this, for each subgradient ℎ𝑖 , and for each point in the action space (𝑚𝑗 , ℎ𝑗 ), we solve
the following problem:

$$\max_{[w', \theta']} h_i \cdot (w, \theta)$$

subject to

$$H \cdot (w', \theta') \leq C_t$$

$$w = u(f(x_j)) + v(m_j) + \beta w'$$

$$\theta = u'(f(x_j))(m_j + x_j)$$

$$x_j = m_j(h_j - 1)$$

$$m_j(u'(f(x_j)) - v'(m_j)) \leq \beta \theta' \quad (= \text{ if } m_j < \bar m)$$

This problem maximizes the hyperplane level for a given set of actions
The second part of Step 2 then finds the maximum possible hyperplane level across the action
space
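The linear program for a single subgradient and a single action can be sketched with `scipy.optimize.linprog`. This is a simplified illustration under stated assumptions, not the class's implementation: the function name `hyperplane_level` and its arguments are hypothetical, only the equality case (𝑚 < 𝑚̄) is handled, and the Euler term is passed with a positive sign (the class stores it negated).

```python
import numpy as np
from scipy.optimize import linprog

def hyperplane_level(h_i, H, C, u_j, theta_j, euler_j, beta):
    """
    Largest value of h_i . (w, θ) attainable with action j, where
    w = u_j + beta * w' and (w', θ') lies in {z : H @ z <= C}.
    Returns -inf when no continuation pair is feasible.
    """
    # Decision variables are z = (w', θ'); linprog minimizes, so negate.
    # θ is pinned down by the action, so only w' enters the objective.
    cost = [-beta * h_i[0], 0.0]
    # Euler equation with equality: beta * θ' = euler_j
    res = linprog(cost, A_ub=H, b_ub=C,
                  A_eq=[[0.0, beta]], b_eq=[euler_j],
                  bounds=[(None, None), (None, None)])
    if res.status != 0:
        return -np.inf
    w = u_j + beta * res.x[0]
    return h_i[0] * w + h_i[1] * theta_j

# Continuation set: the box [-1, 1] x [-1, 1] written as half-spaces
H_box = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
C_box = np.ones(4)
h_east = np.array([1.0, 0.0])

level_ok = hyperplane_level(h_east, H_box, C_box, 2.0, 0.3, 0.25, 0.5)
level_bad = hyperplane_level(h_east, H_box, C_box, 2.0, 0.3, 1.0, 0.5)
print(level_ok, level_bad)
```

In the second call the Euler equation requires 𝜃′ = 2, which lies outside the box, so the action is discarded. Taking the maximum of these levels over all actions gives the updated level for that subgradient.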
The algorithm constructs a sequence of progressively smaller sets 𝑆𝑡+1 ⊂ 𝑆𝑡 ⊂ 𝑆𝑡−1 ⋯ ⊂ 𝑆0
Step 3 ends the algorithm when the difference between these sets is small enough
We have created a Python class that solves the model assuming the following functional
forms:

$$u(c) = \log(c)$$

$$v(m) = \frac{1}{500}(m \bar m - 0.5 m^2)^{0.5}$$

$$f(x) = 180 - (0.4x)^2$$

The remaining parameters $\{\beta, \bar m, \underline{h}, \bar h\}$ are then variables to be specified for an instance of the Chang class
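As a quick sanity check on these functional forms, the sketch below codes them as standalone functions and compares the analytical derivative of 𝑣 with a central finite difference. The value `mbar = 30.0` matches the argument passed to the class later in the lecture; the function names are illustrative.

```python
import numpy as np

mbar = 30.0  # upper bound on real balances, as used later in the lecture

def u(c):
    return np.log(c)

def v(m):
    return (mbar * m - 0.5 * m**2)**0.5 / 500

def v_prime(m):
    return 0.5 / 500 * (mbar * m - 0.5 * m**2)**(-0.5) * (mbar - m)

def f(x):
    return 180 - (0.4 * x)**2

# Compare the analytical derivative of v with a central finite difference
m_grid = np.linspace(1.0, mbar, 50)
eps = 1e-6
fd = (v(m_grid + eps) - v(m_grid - eps)) / (2 * eps)
max_err = np.max(np.abs(fd - v_prime(m_grid)))
print(max_err, v_prime(mbar), f(3.0) == f(-3.0))
```

With this specification the marginal utility of real balances is zero exactly at 𝑚 = 𝑚̄, and output is symmetric in 𝑥, so taxes and subsidies of the same size are equally distorting.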
Below we use the class to solve the model and plot the resulting equilibrium set, once with
𝛽 = 0.3 and once with 𝛽 = 0.8
(Here we have set the number of subgradients to 10 in order to speed up the code for now -
we can increase accuracy by increasing the number of subgradients)

In [2]: """
Author: Sebastian Graves

Provides a class called ChangModel to solve different
parameterizations of the Chang (1998) model.
"""

import numpy as np
import quantecon as qe
import time

from scipy.spatial import ConvexHull
from scipy.optimize import linprog, minimize, minimize_scalar
from scipy.interpolate import UnivariateSpline
import numpy.polynomial.chebyshev as cheb

class ChangModel:
    """
    Class to solve for the competitive and sustainable sets in the Chang (1998)
    model, for different parameterizations.
    """

    def __init__(self, β, mbar, h_min, h_max, n_h, n_m, N_g):

        # Record parameters
        self.β, self.mbar, self.h_min, self.h_max = β, mbar, h_min, h_max
        self.n_h, self.n_m, self.N_g = n_h, n_m, N_g

        # Create other parameters
        self.m_min = 1e-9
        self.m_max = self.mbar
        self.N_a = self.n_h*self.n_m

        # Utility and production functions
        uc = lambda c: np.log(c)
        uc_p = lambda c: 1/c
        v = lambda m: 1/500 * (mbar * m - 0.5 * m**2)**0.5
        v_p = lambda m: 0.5/500 * (mbar * m - 0.5 * m**2)**(-0.5) * (mbar - m)
        u = lambda h, m: uc(f(h, m)) + v(m)

        def f(h, m):
            x = m * (h - 1)
            f = 180 - (0.4 * x)**2
            return f

        def θ(h, m):
            x = m * (h - 1)
            θ = uc_p(f(h, m)) * (m + x)
            return θ

        # Create set of possible action combinations, A
        A1 = np.linspace(h_min, h_max, n_h).reshape(n_h, 1)
        A2 = np.linspace(self.m_min, self.m_max, n_m).reshape(n_m, 1)
        self.A = np.concatenate((np.kron(np.ones((n_m, 1)), A1),
                                 np.kron(A2, np.ones((n_h, 1)))), axis=1)

        # Pre-compute utility and output vectors
        # euler_vec holds -m * (u'(f(x)) - v'(m)), the negated Euler term
        self.euler_vec = -np.multiply(self.A[:, 1],
                                      uc_p(f(self.A[:, 0], self.A[:, 1]))
                                      - v_p(self.A[:, 1]))
        self.u_vec = u(self.A[:, 0], self.A[:, 1])
        self.Θ_vec = θ(self.A[:, 0], self.A[:, 1])
        self.f_vec = f(self.A[:, 0], self.A[:, 1])
        self.bell_vec = np.multiply(uc_p(f(self.A[:, 0],
                                           self.A[:, 1])),
                                    np.multiply(self.A[:, 1],
                                                (self.A[:, 0] - 1))) \
                        + np.multiply(self.A[:, 1],
                                      v_p(self.A[:, 1]))

        # Find extrema of (w, θ) space for initial guess of equilibrium sets
        p_vec = np.zeros(self.N_a)
        w_vec = np.zeros(self.N_a)
        for i in range(self.N_a):
            p_vec[i] = self.Θ_vec[i]
            w_vec[i] = self.u_vec[i]/(1 - β)

        w_space = np.array([min(w_vec[~np.isinf(w_vec)]),
                            max(w_vec[~np.isinf(w_vec)])])
        p_space = np.array([0, max(p_vec[~np.isinf(w_vec)])])
        self.p_space = p_space

        # Set up hyperplane levels and gradients for iterations
        def SG_H_V(N, w_space, p_space):
            """
            This function initializes the subgradients, hyperplane levels,
            and extreme points of the value set by choosing an appropriate
            origin and radius. It is based on a similar function in QuantEcon's Games.jl
            """

            # First, create a unit circle. Want points placed on [0, 2π]
            inc = 2 * np.pi / N
            degrees = np.arange(0, 2 * np.pi, inc)

            # Points on circle
            H = np.zeros((N, 2))
            for i in range(N):
                x = degrees[i]
                H[i, 0] = np.cos(x)
                H[i, 1] = np.sin(x)

            # Then calculate origin and radius
            o = np.array([np.mean(w_space), np.mean(p_space)])
            r1 = max((max(w_space) - o[0])**2, (o[0] - min(w_space))**2)
            r2 = max((max(p_space) - o[1])**2, (o[1] - min(p_space))**2)
            r = np.sqrt(r1 + r2)

            # Now calculate vertices
            Z = np.zeros((2, N))
            for i in range(N):
                Z[0, i] = o[0] + r*H.T[0, i]
                Z[1, i] = o[1] + r*H.T[1, i]

            # Corresponding hyperplane levels
            C = np.zeros(N)
            for i in range(N):
                C[i] = np.dot(Z[:, i], H[i, :])

            return C, H, Z

        C, self.H, Z = SG_H_V(N_g, w_space, p_space)
        C = C.reshape(N_g, 1)
        self.c0_c, self.c0_s, self.c1_c, self.c1_s = np.copy(C), np.copy(C), np.copy(C), np.copy(C)
        self.z0_s, self.z0_c, self.z1_s, self.z1_c = np.copy(Z), np.copy(Z), np.copy(Z), np.copy(Z)

        self.w_bnds_s, self.w_bnds_c = (w_space[0], w_space[1]), (w_space[0], w_space[1])
        self.p_bnds_s, self.p_bnds_c = (p_space[0], p_space[1]), (p_space[0], p_space[1])

        # Create dictionaries to save equilibrium set for each iteration
        self.c_dic_s, self.c_dic_c = {}, {}
        self.c_dic_s[0], self.c_dic_c[0] = self.c0_s, self.c0_c

    def solve_worst_spe(self):
        """
        Method to solve for BR(Z). See p.449 of Chang (1998)
        """

        p_vec = np.full(self.N_a, np.nan)
        c = [1, 0]

        # Pre-compute constraints
        aineq_mbar = np.vstack((self.H, np.array([0, -self.β])))
        bineq_mbar = np.vstack((self.c0_s, 0))

        aineq = self.H
        bineq = self.c0_s
        aeq = [[0, -self.β]]

        for j in range(self.N_a):
            # Only try if consumption is possible
            if self.f_vec[j] > 0:
                # If m = mbar, use inequality constraint
                if self.A[j, 1] == self.mbar:
                    bineq_mbar[-1] = self.euler_vec[j]
                    res = linprog(c, A_ub=aineq_mbar, b_ub=bineq_mbar,
                                  bounds=(self.w_bnds_s, self.p_bnds_s))
                else:
                    beq = self.euler_vec[j]
                    res = linprog(c, A_ub=aineq, b_ub=bineq, A_eq=aeq, b_eq=beq,
                                  bounds=(self.w_bnds_s, self.p_bnds_s))
                if res.status == 0:
                    p_vec[j] = self.u_vec[j] + self.β * res.x[0]

        # Max over h and min over other variables (see Chang (1998) p.449)
        self.br_z = np.nanmax(np.nanmin(p_vec.reshape(self.n_m, self.n_h), 0))

    def solve_subgradient(self):
        """
        Method to solve for E(Z). See p.449 of Chang (1998)
        """

        # Pre-compute constraints
        aineq_C_mbar = np.vstack((self.H, np.array([0, -self.β])))
        bineq_C_mbar = np.vstack((self.c0_c, 0))

        aineq_C = self.H
        bineq_C = self.c0_c
        aeq_C = [[0, -self.β]]

        aineq_S_mbar = np.vstack((np.vstack((self.H, np.array([0, -self.β]))),
                                  np.array([-self.β, 0])))
        bineq_S_mbar = np.vstack((self.c0_s, np.zeros((2, 1))))

        aineq_S = np.vstack((self.H, np.array([-self.β, 0])))
        bineq_S = np.vstack((self.c0_s, 0))
        aeq_S = [[0, -self.β]]

        # Update maximal hyperplane level
        for i in range(self.N_g):
            c_a1a2_c, t_a1a2_c = np.full(self.N_a, -np.inf), np.zeros((self.N_a, 2))
            c_a1a2_s, t_a1a2_s = np.full(self.N_a, -np.inf), np.zeros((self.N_a, 2))

            c = [-self.H[i, 0], -self.H[i, 1]]

            for j in range(self.N_a):
                # Only try if consumption is possible
                if self.f_vec[j] > 0:

                    # COMPETITIVE EQUILIBRIA
                    # If m = mbar, use inequality constraint
                    if self.A[j, 1] == self.mbar:
                        bineq_C_mbar[-1] = self.euler_vec[j]
                        res = linprog(c, A_ub=aineq_C_mbar, b_ub=bineq_C_mbar,
                                      bounds=(self.w_bnds_c, self.p_bnds_c))
                    # If m < mbar, use equality constraint
                    else:
                        beq_C = self.euler_vec[j]
                        res = linprog(c, A_ub=aineq_C, b_ub=bineq_C, A_eq=aeq_C,
                                      b_eq=beq_C, bounds=(self.w_bnds_c, self.p_bnds_c))
                    if res.status == 0:
                        # Hyperplane level h_i . (w, θ) implied by action j
                        c_a1a2_c[j] = self.H[i, 0] * (self.u_vec[j] + self.β * res.x[0]) \
                                      + self.H[i, 1] * self.Θ_vec[j]
                        t_a1a2_c[j] = res.x

                    # SUSTAINABLE EQUILIBRIA
                    # If m = mbar, use inequality constraint
                    if self.A[j, 1] == self.mbar:
                        bineq_S_mbar[-2] = self.euler_vec[j]
                        bineq_S_mbar[-1] = self.u_vec[j] - self.br_z
                        res = linprog(c, A_ub=aineq_S_mbar, b_ub=bineq_S_mbar,
                                      bounds=(self.w_bnds_s, self.p_bnds_s))
                    # If m < mbar, use equality constraint
                    else:
                        bineq_S[-1] = self.u_vec[j] - self.br_z
                        beq_S = self.euler_vec[j]
                        res = linprog(c, A_ub=aineq_S, b_ub=bineq_S, A_eq=aeq_S,
                                      b_eq=beq_S, bounds=(self.w_bnds_s, self.p_bnds_s))
                    if res.status == 0:
                        c_a1a2_s[j] = self.H[i, 0] * (self.u_vec[j] + self.β * res.x[0]) \
                                      + self.H[i, 1] * self.Θ_vec[j]
                        t_a1a2_s[j] = res.x

            idx_c = np.where(c_a1a2_c == max(c_a1a2_c))[0][0]
            self.z1_c[:, i] = np.array([self.u_vec[idx_c] + self.β * t_a1a2_c[idx_c, 0],
                                        self.Θ_vec[idx_c]])

            idx_s = np.where(c_a1a2_s == max(c_a1a2_s))[0][0]
            self.z1_s[:, i] = np.array([self.u_vec[idx_s] + self.β * t_a1a2_s[idx_s, 0],
                                        self.Θ_vec[idx_s]])

        for i in range(self.N_g):
            self.c1_c[i] = np.dot(self.z1_c[:, i], self.H[i, :])
            self.c1_s[i] = np.dot(self.z1_s[:, i], self.H[i, :])

    def solve_sustainable(self, tol=1e-5, max_iter=250):
        """
        Method to solve for the competitive and sustainable equilibrium sets.
        """

        t = time.time()
        diff = tol + 1
        iters = 0

        print('### --------------- ###')
        print('Solving Chang Model Using Outer Hyperplane Approximation')
        print('### --------------- ### \n')

        print('Maximum difference when updating hyperplane levels:')

        while diff > tol and iters < max_iter:
            iters = iters + 1
            self.solve_worst_spe()
            self.solve_subgradient()
            diff = max(np.maximum(abs(self.c0_c - self.c1_c),
                                  abs(self.c0_s - self.c1_s)))
            print(diff)

            # Update hyperplane levels
            self.c0_c, self.c0_s = np.copy(self.c1_c), np.copy(self.c1_s)

            # Update bounds for w and θ
            wmin_c, wmax_c = np.min(self.z1_c, axis=1)[0], np.max(self.z1_c, axis=1)[0]
            pmin_c, pmax_c = np.min(self.z1_c, axis=1)[1], np.max(self.z1_c, axis=1)[1]

            wmin_s, wmax_s = np.min(self.z1_s, axis=1)[0], np.max(self.z1_s, axis=1)[0]
            pmin_S, pmax_S = np.min(self.z1_s, axis=1)[1], np.max(self.z1_s, axis=1)[1]

            self.w_bnds_s, self.w_bnds_c = (wmin_s, wmax_s), (wmin_c, wmax_c)
            self.p_bnds_s, self.p_bnds_c = (pmin_S, pmax_S), (pmin_c, pmax_c)

            # Save iteration
            self.c_dic_c[iters], self.c_dic_s[iters] = np.copy(self.c1_c), np.copy(self.c1_s)
            self.iters = iters

        elapsed = time.time() - t
        print('Convergence achieved after {} iterations and {} seconds'.format(iters, round(elapsed, 2)))

    def solve_bellman(self, θ_min, θ_max, order, disp=False, tol=1e-7, maxiters=100):
        """
        Continuous Method to solve the Bellman equation in section 25.3
        """
        mbar = self.mbar

        # Utility and production functions
        uc = lambda c: np.log(c)
        uc_p = lambda c: 1 / c
        v = lambda m: 1 / 500 * (mbar * m - 0.5 * m**2)**0.5
        v_p = lambda m: 0.5/500 * (mbar*m - 0.5 * m**2)**(-0.5) * (mbar - m)
        u = lambda h, m: uc(f(h, m)) + v(m)

        def f(h, m):
            x = m * (h - 1)
            f = 180 - (0.4 * x)**2
            return f

        def θ(h, m):
            x = m * (h - 1)
            θ = uc_p(f(h, m)) * (m + x)
            return θ

        # Bounds for Maximization
        lb1 = np.array([self.h_min, 0, θ_min])
        ub1 = np.array([self.h_max, self.mbar - 1e-5, θ_max])
        lb2 = np.array([self.h_min, θ_min])
        ub2 = np.array([self.h_max, θ_max])

        # Initialize Value Function coefficients
        # Calculate roots of Chebyshev polynomial
        k = np.linspace(order, 1, order)
        roots = np.cos((2 * k - 1) * np.pi / (2 * order))
        # Scale to approximation space
        s = θ_min + (roots - -1) / 2 * (θ_max - θ_min)
        # Create a basis matrix
        Φ = cheb.chebvander(roots, order - 1)
        c = np.zeros(Φ.shape[0])

        # Function to minimize and constraints
        def p_fun(x):
            scale = -1 + 2 * (x[2] - θ_min)/(θ_max - θ_min)
            p_fun = - (u(x[0], x[1]) + self.β * np.dot(cheb.chebvander(scale, order - 1), c))
            return p_fun

        def p_fun2(x):
            scale = -1 + 2*(x[1] - θ_min)/(θ_max - θ_min)
            p_fun = - (u(x[0],mbar) + self.β * np.dot(cheb.chebvander(scale, order - 1), c))
            return p_fun

        # First constraint in each tuple: θ versus u'(f(x))x + v'(m)m + βθ'
        # Second constraint: θ = u'(f(x))(m + x)
        cons1 = ({'type': 'eq', 'fun': lambda x: uc_p(f(x[0], x[1])) * x[1] * (x[0] - 1)
                                                 + v_p(x[1]) * x[1] + self.β * x[2] - θ},
                 {'type': 'eq', 'fun': lambda x: uc_p(f(x[0], x[1])) * x[0] * x[1] - θ})
        cons2 = ({'type': 'ineq', 'fun': lambda x: uc_p(f(x[0], mbar)) * mbar * (x[0] - 1)
                                                   + v_p(mbar) * mbar + self.β * x[1] - θ},
                 {'type': 'eq', 'fun': lambda x: uc_p(f(x[0], mbar)) * x[0] * mbar - θ})

        bnds1 = np.concatenate([lb1.reshape(3, 1), ub1.reshape(3, 1)], axis=1)
        bnds2 = np.concatenate([lb2.reshape(2, 1), ub2.reshape(2, 1)], axis=1)

        # Bellman Iterations
        diff = 1
        iters = 1

        while diff > tol:
            # 1. Maximization, given value function guess
            p_iter1 = np.zeros(order)
            for i in range(order):
                θ = s[i]
                res = minimize(p_fun,
                               lb1 + (ub1-lb1) / 2,
                               method='SLSQP',
                               bounds=bnds1,
                               constraints=cons1,
                               tol=1e-10)
                if res.success == True:
                    p_iter1[i] = -p_fun(res.x)
                res = minimize(p_fun2,
                               lb2 + (ub2-lb2) / 2,
                               method='SLSQP',
                               bounds=bnds2,
                               constraints=cons2,
                               tol=1e-10)
                if -p_fun2(res.x) > p_iter1[i] and res.success == True:
                    p_iter1[i] = -p_fun2(res.x)

            # 2. Bellman updating of Value Function coefficients
            c1 = np.linalg.solve(Φ, p_iter1)
            # 3. Compute distance and update
            diff = np.linalg.norm(c - c1)
            if bool(disp == True):
                print(diff)
            c = np.copy(c1)
            iters = iters + 1
            if iters > maxiters:
                print('Convergence failed after {} iterations'.format(maxiters))
                break

        self.θ_grid = s
        self.p_iter = p_iter1
        self.Φ = Φ
        self.c = c
        print('Convergence achieved after {} iterations'.format(iters))

        # Check residuals
        θ_grid_fine = np.linspace(θ_min, θ_max, 100)
        resid_grid = np.zeros(100)
        p_grid = np.zeros(100)
        θ_prime_grid = np.zeros(100)
        m_grid = np.zeros(100)
        h_grid = np.zeros(100)
        for i in range(100):
            θ = θ_grid_fine[i]
            res = minimize(p_fun,
                           lb1 + (ub1-lb1) / 2,
                           method='SLSQP',
                           bounds=bnds1,
                           constraints=cons1,
                           tol=1e-10)
            if res.success == True:
                p = -p_fun(res.x)
                p_grid[i] = p
                θ_prime_grid[i] = res.x[2]
                h_grid[i] = res.x[0]
                m_grid[i] = res.x[1]
            res = minimize(p_fun2,
                           lb2 + (ub2-lb2)/2,
                           method='SLSQP',
                           bounds=bnds2,
                           constraints=cons2,
                           tol=1e-10)
            if -p_fun2(res.x) > p and res.success == True:
                p = -p_fun2(res.x)
                p_grid[i] = p
                θ_prime_grid[i] = res.x[1]
                h_grid[i] = res.x[0]
                m_grid[i] = self.mbar
            scale = -1 + 2 * (θ - θ_min)/(θ_max - θ_min)
            resid_grid[i] = np.dot(cheb.chebvander(scale, order-1), c) - p

        self.resid_grid = resid_grid
        self.θ_grid_fine = θ_grid_fine
        self.θ_prime_grid = θ_prime_grid
        self.m_grid = m_grid
        self.h_grid = h_grid
        self.p_grid = p_grid
        self.x_grid = m_grid * (h_grid - 1)

        # Simulate
        θ_series = np.zeros(31)
        m_series = np.zeros(30)
        h_series = np.zeros(30)

        # Find initial θ
        def ValFun(x):
            scale = -1 + 2*(x - θ_min)/(θ_max - θ_min)
            p_fun = np.dot(cheb.chebvander(scale, order - 1), c)
            return -p_fun

        res = minimize(ValFun,
                       (θ_min + θ_max)/2,
                       bounds=[(θ_min, θ_max)])
        # res.x is a length-1 array; store its single element
        θ_series[0] = res.x[0]

        # Simulate
        for i in range(30):
            θ = θ_series[i]
            res = minimize(p_fun,
                           lb1 + (ub1-lb1)/2,
                           method='SLSQP',
                           bounds=bnds1,
                           constraints=cons1,
                           tol=1e-10)
            if res.success == True:
                p = -p_fun(res.x)
                h_series[i] = res.x[0]
                m_series[i] = res.x[1]
                θ_series[i+1] = res.x[2]
            res2 = minimize(p_fun2,
                            lb2 + (ub2-lb2)/2,
                            method='SLSQP',
                            bounds=bnds2,
                            constraints=cons2,
                            tol=1e-10)
            if -p_fun2(res2.x) > p and res2.success == True:
                h_series[i] = res2.x[0]
                m_series[i] = self.mbar
                θ_series[i+1] = res2.x[1]

        self.θ_series = θ_series
        self.m_series = m_series
        self.h_series = h_series
        self.x_series = m_series * (h_series - 1)

In [3]: ch1 = ChangModel(β=0.3, mbar=30, h_min=0.9, h_max=2, n_h=8, n_m=35, N_g=10)


ch1.solve_sustainable()

### --------------- ###


Solving Chang Model Using Outer Hyperplane Approximation
### --------------- ###

Maximum difference when updating hyperplane levels:


[1.91679545]
[0.66781649]
[0.49234789]
[0.3241217]
[0.19022279]
[0.10862838]
[0.05817151]
[0.02620056]
[0.01836386]
[0.01415009]
[0.00297077]
[0.00089123]
[0.00026737]
[8.02108797e-05]
[2.40632639e-05]
[7.21897917e-06]
Convergence achieved after 16 iterations and 168.65 seconds

In [4]: import polytope
import matplotlib.pyplot as plt
%matplotlib inline

def plot_competitive(ChangModel):
    """
    Method that only plots competitive equilibrium set
    """
    poly_C = polytope.Polytope(ChangModel.H, ChangModel.c1_c)
    ext_C = polytope.extreme(poly_C)

    fig, ax = plt.subplots(figsize=(7, 5))

    ax.set_xlabel('w', fontsize=16)
    ax.set_ylabel(r"$\theta$", fontsize=18)

    ax.fill(ext_C[:,0], ext_C[:,1], 'r', zorder=0)
    ChangModel.min_theta = min(ext_C[:, 1])
    ChangModel.max_theta = max(ext_C[:, 1])

    # Add point showing Ramsey Plan
    idx_Ramsey = np.where(ext_C[:, 0] == max(ext_C[:, 0]))[0][0]
    R = ext_C[idx_Ramsey, :]
    ax.scatter(R[0], R[1], 150, 'black', 'o', zorder=1)
    w_min = min(ext_C[:, 0])

    # Label Ramsey Plan slightly to the right of the point
    ax.annotate("R", xy=(R[0], R[1]), xytext=(R[0] + 0.03 * (R[0] - w_min),
                R[1]), fontsize=18)

    plt.tight_layout()
    plt.show()

plot_competitive(ch1)

`polytope` failed to import `cvxopt.glpk`.


will use `scipy.optimize.linprog`

In [5]: ch2 = ChangModel(β=0.8, mbar=30, h_min=0.9, h_max=1/0.8, n_h=8, n_m=35, N_g=10)


ch2.solve_sustainable()

### --------------- ###


Solving Chang Model Using Outer Hyperplane Approximation
### --------------- ###

Maximum difference when updating hyperplane levels:


[0.06369]
[0.02476]
[0.02153]
[0.01915]
[0.01795]
[0.01642]
[0.01507]
[0.01284]
[0.01106]
[0.00694]
[0.0085]
[0.00781]
[0.00433]
[0.00492]
[0.00303]
[0.00182]
[0.00638]
[0.00116]
[0.00093]
[0.00075]
[0.0006]
[0.00494]
[0.00038]
[0.00121]

[0.00024]
[0.0002]
[0.00016]
[0.00013]
[0.0001]
[0.00008]
[0.00006]
[0.00005]
[0.00004]
[0.00003]
[0.00003]
[0.00002]
[0.00002]
[0.00001]
[0.00001]
[0.00001]
Convergence achieved after 40 iterations and 971.84 seconds

In [6]: plot_competitive(ch2)

83.8 Solving a Continuation Ramsey Planner’s Bellman Equation

In this section we solve the Bellman equation confronting a continuation Ramsey planner
The construction of a Ramsey plan is decomposed into two subproblems in Ramsey plans,
time inconsistency, sustainable plans and dynamic Stackelberg problems

• Subproblem 1 is faced by a sequence of continuation Ramsey planners at 𝑡 ≥ 1



• Subproblem 2 is faced by a Ramsey planner at 𝑡 = 0

The problem is:

$$J(\theta) = \max_{m, x, h, \theta'} u(f(x)) + v(m) + \beta J(\theta')$$

subject to:

$$\theta \leq u'(f(x))x + v'(m)m + \beta \theta'$$

$$\theta = u'(f(x))(m + x)$$

$$x = m(h - 1)$$

$$(m, x, h) \in E$$

$$\theta' \in \Omega$$

To solve this Bellman equation, we must know the set Ω


We have solved the Bellman equation for the two sets of parameter values for which we com-
puted the equilibrium value sets above
Hence for these parameter configurations, we know the bounds of Ω
The two sets of parameters differ only in the level of 𝛽
From the figures earlier in this lecture, we know that when 𝛽 = 0.3, Ω = [0.0088, 0.0499], and
when 𝛽 = 0.8, Ω = [0.0395, 0.2193]

In [7]: ch1 = ChangModel(β=0.3, mbar=30, h_min=0.99, h_max=1/0.3, n_h=8, n_m=35, N_g=50)


ch2 = ChangModel(β=0.8, mbar=30, h_min=0.1, h_max=1/0.8, n_h=20, n_m=50, N_g=50)

/home/anju/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:35: RuntimeWarning: invalid value encou

In [8]: ch1.solve_bellman(θ_min=0.01, θ_max=0.0499, order=30, tol=1e-6)


ch2.solve_bellman(θ_min=0.045, θ_max=0.15, order=30, tol=1e-6)

/home/anju/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:290: RuntimeWarning: invalid value enco

Convergence achieved after 15 iterations


Convergence achieved after 72 iterations

First, a quick check that our approximations of the value functions are good
We do this by calculating the residuals between iterates on the value function on a fine grid:

In [9]: max(abs(ch1.resid_grid)), max(abs(ch2.resid_grid))



Out[9]: (6.463131553502421e-06, 6.875429860997428e-07)
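The residual check relies on a Chebyshev approximation of the value function: coefficients are fitted at Chebyshev nodes and the fit is then evaluated on a finer grid. The sketch below shows that technique in isolation on a known function (`np.log` on [1, 2], chosen purely for illustration). The helper names `cheb_fit` and `cheb_eval` are hypothetical and are not part of the class.

```python
import numpy as np
import numpy.polynomial.chebyshev as cheb

def cheb_fit(func, lo, hi, order):
    """Fit Chebyshev coefficients to func at `order` Chebyshev nodes on [lo, hi]."""
    k = np.arange(1, order + 1)
    roots = np.cos((2 * k - 1) * np.pi / (2 * order))  # nodes in [-1, 1]
    nodes = lo + (roots + 1) / 2 * (hi - lo)           # nodes scaled to [lo, hi]
    Phi = cheb.chebvander(roots, order - 1)            # square basis matrix
    return np.linalg.solve(Phi, func(nodes))

def cheb_eval(coefs, x, lo, hi):
    """Evaluate the fitted approximation at points x in [lo, hi]."""
    scale = -1 + 2 * (np.asarray(x) - lo) / (hi - lo)
    return cheb.chebvander(scale, len(coefs) - 1) @ coefs

coefs = cheb_fit(np.log, 1.0, 2.0, order=10)
fine_grid = np.linspace(1.0, 2.0, 100)
max_resid = np.max(np.abs(cheb_eval(coefs, fine_grid, 1.0, 2.0) - np.log(fine_grid)))
print(max_resid)
```

The approximation is exact at the nodes by construction, so a small residual on the fine grid is the relevant evidence that the fit is good between nodes as well.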

The value functions plotted below trace out the right edges of the sets of equilibrium values
plotted above

In [10]: fig, axes = plt.subplots(1, 2, figsize=(12, 4))

for ax, model in zip(axes, (ch1, ch2)):
    ax.plot(model.θ_grid, model.p_iter)
    ax.set(xlabel=r"$\theta$",
           ylabel=r"$J(\theta)$",
           title=rf"$\beta = {model.β}$")

plt.show()

The next figure plots the optimal policy functions; values of 𝜃′ , 𝑚, 𝑥, ℎ for each value of the
state 𝜃:

In [11]: for model in (ch1, ch2):

    fig, axes = plt.subplots(2, 2, figsize=(12, 6), sharex=True)
    fig.suptitle(rf"$\beta = {model.β}$", fontsize=16)

    plots = [model.θ_prime_grid, model.m_grid,
             model.h_grid, model.x_grid]
    labels = [r"$\theta'$", "$m$", "$h$", "$x$"]

    for ax, plot, label in zip(axes.flatten(), plots, labels):
        ax.plot(model.θ_grid_fine, plot)
        ax.set_xlabel(r"$\theta$", fontsize=14)
        ax.set_ylabel(label, fontsize=14)

    plt.show()

With the first set of parameter values, the value of 𝜃′ chosen by the Ramsey planner quickly
hits the upper limit of Ω
But with the second set of parameters it converges to a value in the interior of the set
Consequently, the choice of $\bar\theta$ is clearly important with the first set of parameter values
One way of seeing this is plotting 𝜃′ (𝜃) for each set of parameters
With the first set of parameter values, this function does not intersect the 45-degree line until $\bar\theta$, whereas in the second set of parameter values, it intersects in the interior

In [12]: fig, axes = plt.subplots(1, 2, figsize=(12, 4))

for ax, model in zip(axes, (ch1, ch2)):
    ax.plot(model.θ_grid_fine, model.θ_prime_grid, label=r"$\theta'(\theta)$")
    ax.plot(model.θ_grid_fine, model.θ_grid_fine, label=r"$\theta$")
    ax.set(xlabel=r"$\theta$", title=rf"$\beta = {model.β}$")

axes[0].legend()
plt.show()
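The intersection with the 45-degree line can also be located numerically rather than read off a plot. The following sketch assumes only a grid and a policy evaluated on that grid. The helper `crossings` and the linear test policy are illustrative and are not taken from the model.

```python
import numpy as np

def crossings(grid, policy):
    """
    Return the points where policy(θ) crosses the 45-degree line, located by
    linear interpolation between adjacent grid points.
    """
    gap = np.asarray(policy) - np.asarray(grid)
    out = []
    for i in range(len(grid) - 1):
        g0, g1 = gap[i], gap[i + 1]
        if g0 == 0:
            out.append(grid[i])
        elif g0 * g1 < 0:
            t = g0 / (g0 - g1)  # fraction of the interval where gap hits zero
            out.append(grid[i] + t * (grid[i + 1] - grid[i]))
    if gap[-1] == 0:
        out.append(grid[-1])
    return out

# Illustrative linear policy with a fixed point at θ = 0.25
grid = np.linspace(0, 1, 11)
policy = 0.5 * grid + 0.125
points = crossings(grid, policy)
print(points)
```

Applied to arrays such as `model.θ_grid_fine` and `model.θ_prime_grid`, an empty list would indicate that the policy stays on one side of the 45-degree line over the whole grid.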

Subproblem 2 is equivalent to the planner choosing the initial value of 𝜃 (i.e. the value which
maximizes the value function)
From this starting point, we can then trace out the paths for $\{\theta_t, m_t, h_t, x_t\}_{t=0}^\infty$ that support this equilibrium
These are shown below for both sets of parameters

In [13]: for model in (ch1, ch2):

    fig, axes = plt.subplots(2, 2, figsize=(12, 6))
    fig.suptitle(rf"$\beta = {model.β}$")

    plots = [model.θ_series, model.m_series, model.h_series, model.x_series]
    labels = [r"$\theta$", "$m$", "$h$", "$x$"]

    for ax, plot, label in zip(axes.flatten(), plots, labels):
        ax.plot(plot)
        ax.set(xlabel='t', ylabel=label)

    plt.show()
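The paths above come from re-solving the maximization at each date. A cheaper approximation, sketched here under the assumption that the policy 𝜃′(𝜃) is already tabulated on a grid, is to iterate on an interpolated policy. The helper `simulate_path` and the linear test policy are hypothetical.

```python
import numpy as np

def simulate_path(theta0, grid, policy, T):
    """Iterate θ_{t+1} = policy(θ_t) for T periods using linear interpolation."""
    path = np.empty(T + 1)
    path[0] = theta0
    for t in range(T):
        path[t + 1] = np.interp(path[t], grid, policy)
    return path

# Illustrative linear policy whose fixed point is θ = 0.25
grid = np.linspace(0, 1, 11)
policy = 0.5 * grid + 0.125
path = simulate_path(1.0, grid, policy, T=60)
print(path[:3], path[-1])
```

Note that `np.interp` holds the policy constant outside the grid, so a path that reaches the edge of the grid stays there, much as 𝜃′ stays at the upper limit of Ω in the first parameterization.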

83.8.1 Next Steps

In Credible Government Policies in Chang Model we shall find a subset of competitive equilibria that are sustainable in the sense that a sequence of government administrations that choose sequentially, rather than once and for all at time 0, will choose to implement them
In the process of constructing them, we shall construct another, smaller set of competitive
equilibria
84

Credible Government Policies in Chang Model

84.1 Contents

• Overview 84.2
• The Setting 84.3
• Calculating the Set of Sustainable Promise-Value Pairs 84.4

Co-author: Sebastian Graves


In addition to what’s in Anaconda, this lecture will need the following libraries

In [1]: !pip install polytope

84.2 Overview

Some of the material in this lecture and competitive equilibria in the Chang model can be
viewed as more sophisticated and complete treatments of the topics discussed in Ramsey
plans, time inconsistency, sustainable plans
This lecture assumes almost the same economic environment analyzed in competitive equilib-
ria in the Chang model
The only change – and it is a substantial one – is the timing protocol for making government
decisions
In competitive equilibria in the Chang model, a Ramsey planner chose a comprehensive gov-
ernment policy once-and-for-all at time 0
Now in this lecture, there is no time 0 Ramsey planner
Instead there is a sequence of government decision-makers, one for each 𝑡
The time 𝑡 government decision-maker chooses time 𝑡 government actions after forecasting
what future governments will do
We use the notion of a sustainable plan proposed in [26], also referred to as a credible public
policy in [124]


Technically, this lecture starts where lecture competitive equilibria in the Chang model on
Ramsey plans within the Chang [25] model stopped
That lecture presents recursive representations of competitive equilibria and a Ramsey plan for
a version of a model of Calvo [21] that Chang used to analyze and illustrate these concepts
We used two operators to characterize competitive equilibria and a Ramsey plan, respectively
In this lecture, we define a credible public policy or sustainable plan
Starting from a large enough initial set 𝑍0 , we use iterations on Chang’s set-to-set operator $\tilde D(Z)$ to compute a set of values associated with sustainable plans
Chang’s operator $\tilde D(Z)$ is closely connected with the operator 𝐷(𝑍) introduced in lecture competitive equilibria in the Chang model

• $\tilde D(Z)$ incorporates all of the restrictions imposed in constructing the operator 𝐷(𝑍), but …

• It adds some additional restrictions

– these additional restrictions incorporate the idea that a plan must be sustainable
– sustainable means that the government wants to implement it at all times after all
histories

84.3 The Setting

We begin by reviewing the set up deployed in competitive equilibria in the Chang model
Chang’s model, adopted from Calvo, is designed to focus on the intertemporal trade-offs be-
tween the welfare benefits of deflation and the welfare costs associated with the high tax col-
lections required to retire money at a rate that delivers deflation
A benevolent time 0 government can promote utility generating increases in real balances
only by imposing an infinite sequence of sufficiently large distorting tax collections
To promote the welfare increasing effects of high real balances, the government wants to in-
duce gradual deflation
We start by reviewing notation
For a sequence of scalars $\vec z \equiv \{z_t\}_{t=0}^\infty$, let $\vec z^t = (z_0, \ldots, z_t)$, $\vec z_t = (z_t, z_{t+1}, \ldots)$

An infinitely lived representative agent and an infinitely lived government exist at dates 𝑡 =
0, 1, …
The objects in play are

• an initial quantity 𝑀−1 of nominal money holdings


• a sequence of inverse money growth rates ℎ⃗ and an associated sequence of nominal
money holdings 𝑀⃗
• a sequence of values of money 𝑞 ⃗
• a sequence of real money holdings 𝑚⃗
• a sequence of total tax collections 𝑥⃗
• a sequence of per capita rates of consumption 𝑐 ⃗
• a sequence of per capita incomes 𝑦 ⃗

A benevolent government chooses sequences $(\vec M, \vec h, \vec x)$ subject to a sequence of budget constraints and other constraints imposed by competitive equilibrium
Given tax collection and price of money sequences, a representative household chooses sequences $(\vec c, \vec m)$ of consumption and real balances
In competitive equilibrium, the price of money sequence 𝑞 ⃗ clears markets, thereby reconciling
decisions of the government and the representative household

84.3.1 The Household’s Problem

A representative household faces a nonnegative value of money sequence $\vec q$ and sequences $\vec y, \vec x$ of income and total tax collections, respectively
The household chooses nonnegative sequences $\vec c, \vec M$ of consumption and nominal balances, respectively, to maximize


$$\sum_{t=0}^\infty \beta^t [u(c_t) + v(q_t M_t)] \tag{1}$$

subject to

𝑞𝑡 𝑀𝑡 ≤ 𝑦𝑡 + 𝑞𝑡 𝑀𝑡−1 − 𝑐𝑡 − 𝑥𝑡 (2)

and

𝑞𝑡 𝑀𝑡 ≤ 𝑚̄ (3)

Here 𝑞𝑡 is the reciprocal of the price level at 𝑡, also known as the value of money
Chang [25] assumes that

• 𝑢 ∶ R+ → R is twice continuously differentiable, strictly concave, and strictly increasing;


• 𝑣 ∶ R+ → R is twice continuously differentiable and strictly concave;
• $\lim_{c \to 0} u'(c) = \lim_{m \to 0} v'(m) = +\infty$;
• there is a finite level 𝑚 = 𝑚𝑓 such that 𝑣′ (𝑚𝑓 ) = 0

Real balances carried out of a period equal 𝑚𝑡 = 𝑞𝑡 𝑀𝑡


Inequality Eq. (2) is the household’s time 𝑡 budget constraint
It tells how real balances 𝑞𝑡 𝑀𝑡 carried out of period 𝑡 depend on income, consumption, taxes,
and real balances 𝑞𝑡 𝑀𝑡−1 carried into the period
Inequality Eq. (3) imposes an exogenous upper bound $\bar m$ on the choice of real balances, where $\bar m \ge m^f$

84.3.2 Government

The government chooses a sequence of inverse money growth rates with time $t$ component $h_t \equiv \frac{M_{t-1}}{M_t} \in \Pi \equiv [\underline\pi, \bar\pi]$, where $0 < \underline\pi < 1 < \frac{1}{\beta} \le \bar\pi$

The government faces a sequence of budget constraints with time 𝑡 component


1446 84. CREDIBLE GOVERNMENT POLICIES IN CHANG MODEL

−𝑥𝑡 = 𝑞𝑡 (𝑀𝑡 − 𝑀𝑡−1 )

which, by using the definitions of 𝑚𝑡 and ℎ𝑡 , can also be expressed as

−𝑥𝑡 = 𝑚𝑡 (1 − ℎ𝑡 ) (4)

The restrictions $m_t \in [0, \bar m]$ and $h_t \in \Pi$ evidently imply that $x_t \in X \equiv [(\underline\pi - 1)\bar m, (\bar\pi - 1)\bar m]$
We define the set $E \equiv [0, \bar m] \times \Pi \times X$, so that we require that $(m, h, x) \in E$
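As a quick numerical illustration of how Eq. (4) pins down the set $X$, the sketch below computes the bounds on $x$ implied by $m \in [0, \bar m]$ and $h \in \Pi$. The function name `tax_bounds` and the parameter values are assumptions chosen for illustration; they are not part of Chang's specification.

```python
def tax_bounds(pi_low, pi_high, m_bar):
    """Bounds on x = m * (h - 1) for m in [0, m_bar] and h in [pi_low, pi_high]."""
    # With pi_low < 1 < pi_high, both extremes are attained at m = m_bar
    return (pi_low - 1) * m_bar, (pi_high - 1) * m_bar


# Illustrative values only (assumed): beta = 0.8, so pi_high must be at least 1 / beta
x_low, x_high = tax_bounds(pi_low=0.9, pi_high=1.25, m_bar=30)
print(x_low, x_high)
```

A negative $x$ is a subsidy financed by money creation, and a positive $x$ is a tax used to retire money.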
To represent the idea that taxes are distorting, Chang makes the following assumption about
outcomes for per capita output:

𝑦𝑡 = 𝑓(𝑥𝑡 ) (5)

where 𝑓 ∶ R → R satisfies 𝑓(𝑥) > 0, is twice continuously differentiable, 𝑓 ″ (𝑥) < 0, and
𝑓(𝑥) = 𝑓(−𝑥) for all 𝑥 ∈ R, so that subsidies and taxes are equally distorting
The purpose is not to model the causes of tax distortions in any detail but simply to summa-
rize the outcome of those distortions via the function 𝑓(𝑥)
A key part of the specification is that tax distortions are increasing in the absolute value of
tax revenues
The government chooses a competitive equilibrium that maximizes Eq. (1)

84.3.3 Within-period Timing Protocol

For the results in this lecture, the timing of actions within a period is important because of
the incentives that it activates
Chang assumed the following within-period timing of decisions:

• first, the government chooses $h_t$ and $x_t$;
• then, given $\vec q$ and its expectations about future values of $x$ and $y$, the household chooses $M_t$ and therefore $m_t$, because $m_t = q_t M_t$;
• then output $y_t = f(x_t)$ is realized;
• finally $c_t = y_t$

This within-period timing confronts the government with choices framed by how the private
sector wants to respond when the government takes time 𝑡 actions that differ from what the
private sector had expected
This timing will shape the incentives confronting the government at each history that are to
be incorporated in the construction of the 𝐷̃ operator below

84.3.4 Household’s Problem

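Given $M_{-1}$ and $\{q_t\}_{t=0}^\infty$, the household’s problem is

$$
\begin{aligned}
\mathcal{L} = \max_{\vec c, \vec M} \min_{\vec \lambda, \vec \mu} \sum_{t=0}^\infty \beta^t \bigl\{ & u(c_t) + v(M_t q_t) + \lambda_t [y_t - c_t - x_t + q_t M_{t-1} - q_t M_t] \\
& + \mu_t [\bar m - q_t M_t] \bigr\}
\end{aligned}
$$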

First-order conditions with respect to 𝑐𝑡 and 𝑀𝑡 , respectively, are

𝑢′ (𝑐𝑡 ) = 𝜆𝑡
𝑞𝑡 [𝑢′ (𝑐𝑡 ) − 𝑣′ (𝑀𝑡 𝑞𝑡 )] ≤ 𝛽𝑢′ (𝑐𝑡+1 )𝑞𝑡+1 , = if 𝑀𝑡 𝑞𝑡 < 𝑚̄

Using $h_t = \frac{M_{t-1}}{M_t}$ and $q_t = \frac{m_t}{M_t}$ in these first-order conditions and rearranging implies

𝑚𝑡 [𝑢′ (𝑐𝑡 ) − 𝑣′ (𝑚𝑡 )] ≤ 𝛽𝑢′ (𝑓(𝑥𝑡+1 ))𝑚𝑡+1 ℎ𝑡+1 , = if 𝑚𝑡 < 𝑚̄ (6)

Define the following key variable

𝜃𝑡+1 ≡ 𝑢′ (𝑓(𝑥𝑡+1 ))𝑚𝑡+1 ℎ𝑡+1 (7)

This is real money balances at time 𝑡 + 1 measured in units of marginal utility, which Chang
refers to as ‘the marginal utility of real balances’
From the standpoint of the household at time $t$, equation Eq. (7) shows that $\theta_{t+1}$ intermediates the influences of $(\vec x_{t+1}, \vec m_{t+1})$ on the household’s choice of real balances $m_t$
By “intermediates” we mean that the future paths $(\vec x_{t+1}, \vec m_{t+1})$ influence $m_t$ entirely through their effects on the scalar $\theta_{t+1}$
The observation that the one dimensional promised marginal utility of real balances 𝜃𝑡+1
functions in this way is an important step in constructing a class of competitive equilibria
that have a recursive representation
A closely related observation pervaded the analysis of Stackelberg plans in dynamic Stackel-
berg problems and the Calvo model
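To make Eq. (6) and Eq. (7) concrete, here is a minimal sketch that evaluates the promised marginal utility $\theta_{t+1}$ and the gap in the household's Euler condition. The function names `theta_next` and `euler_gap` are hypothetical, and the primitives passed in are the functional forms adopted later in this lecture with an assumed $\bar m = 30$.

```python
def theta_next(m_next, h_next, u_prime, f):
    """Promised marginal utility of real balances, as in Eq. (7)."""
    x_next = m_next * (h_next - 1)      # taxes implied by Eq. (4)
    return u_prime(f(x_next)) * m_next * h_next


def euler_gap(m, h, theta_prime, beta, u_prime, v_prime, f):
    """Left side minus right side of Eq. (6): nonpositive, and zero if m < m_bar."""
    x = m * (h - 1)
    return m * (u_prime(f(x)) - v_prime(m)) - beta * theta_prime


# Illustrative primitives (assumed): the lecture's functional forms with m_bar = 30
m_bar = 30
u_prime = lambda c: 1 / c
v_prime = lambda m: 0.5 / 500 * (m_bar * m - 0.5 * m**2) ** (-0.5) * (m_bar - m)
f = lambda x: 180 - (0.4 * x) ** 2

theta = theta_next(m_next=10, h_next=1.0, u_prime=u_prime, f=f)
print(theta)
print(euler_gap(10, 1.0, theta, 0.8, u_prime, v_prime, f))
```

The household's choice of $m_t$ depends on the future only through the single number passed as `theta_prime`, which is the sense in which $\theta_{t+1}$ "intermediates".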

84.3.5 Competitive Equilibrium

Definition:

• A government policy is a pair of sequences $(\vec h, \vec x)$ where $h_t \in \Pi \ \forall t \ge 0$
• A price system is a non-negative value of money sequence $\vec q$
• An allocation is a triple of non-negative sequences $(\vec c, \vec m, \vec y)$

It is required that time 𝑡 components (𝑚𝑡 , 𝑥𝑡 , ℎ𝑡 ) ∈ 𝐸


Definition:
Given $M_{-1}$, a government policy $(\vec h, \vec x)$, price system $\vec q$, and allocation $(\vec c, \vec m, \vec y)$ are said to be a competitive equilibrium if

• $m_t = q_t M_t$ and $y_t = f(x_t)$
• The government budget constraint is satisfied
• Given $\vec q, \vec x, \vec y$, $(\vec c, \vec m)$ solves the household’s problem

84.3.6 A Credible Government Policy

Chang works with a credible government policy with a recursive representation

• Here there is no time 0 Ramsey planner


• Instead there is a sequence of governments, one for each 𝑡, that choose time 𝑡 govern-
ment actions after forecasting what future governments will do

• Let $w = \sum_{t=0}^\infty \beta^t [u(c_t) + v(q_t M_t)]$ be a value associated with a particular competitive equilibrium
• A recursive representation of a credible government policy is a pair of initial conditions
(𝑤0 , 𝜃0 ) and a five-tuple of functions

ℎ(𝑤𝑡 , 𝜃𝑡 ), 𝑚(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 ), 𝑥(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 ), 𝜒(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 ), Ψ(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )


mapping 𝑤𝑡 , 𝜃𝑡 and in some cases ℎ𝑡 into ℎ̂ 𝑡 , 𝑚𝑡 , 𝑥𝑡 , 𝑤𝑡+1 , and 𝜃𝑡+1 , respectively
• Starting from an initial condition (𝑤0 , 𝜃0 ), a credible government policy can be con-
structed by iterating on these functions in the following order that respects the within-
period timing:

ℎ̂ 𝑡 = ℎ(𝑤𝑡 , 𝜃𝑡 )
𝑚𝑡 = 𝑚(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝑥𝑡 = 𝑥(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 ) (8)
𝑤𝑡+1 = 𝜒(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝜃𝑡+1 = Ψ(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )

• Here it is to be understood that ℎ̂ 𝑡 is the action that the government policy instructs
the government to take, while ℎ𝑡 possibly not equal to ℎ̂ 𝑡 is some other action that the
government is free to take at time 𝑡
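The iteration in Eq. (8) can be sketched as a short loop. Everything below is a hedged illustration: `simulate_plan` is a hypothetical helper, and the five policy functions passed to it are constant placeholders, not solutions of the model.

```python
def simulate_plan(h_fn, m_fn, x_fn, chi_fn, psi_fn, w0, theta0, T):
    """Iterate on Eq. (8) for T periods, assuming the government follows the plan."""
    w, theta = w0, theta0
    path = []
    for _ in range(T):
        h_hat = h_fn(w, theta)        # action the plan prescribes
        h = h_hat                     # the government confirms expectations
        m = m_fn(h, w, theta)         # household response
        x = x_fn(h, w, theta)         # taxes
        path.append((h, m, x, w, theta))
        # Continuation value and promised marginal utility for next period
        w, theta = chi_fn(h, w, theta), psi_fn(h, w, theta)
    return path


# Placeholder policy functions (assumed, for illustration only)
h_fn = lambda w, theta: 1.0
m_fn = lambda h, w, theta: 10.0
x_fn = lambda h, w, theta: m_fn(h, w, theta) * (h - 1)
chi_fn = lambda h, w, theta: w
psi_fn = lambda h, w, theta: theta

path = simulate_plan(h_fn, m_fn, x_fn, chi_fn, psi_fn, w0=5.0, theta0=0.05, T=3)
for row in path:
    print(row)
```

The order of the calls inside the loop mirrors the within-period timing: the government moves first, the household responds, and the pair $(w_{t+1}, \theta_{t+1})$ is carried forward.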

The plan is credible if it is in the time 𝑡 government’s interest to execute it


Credibility requires that the plan be such that for all possible choices of ℎ𝑡 that are consistent
with competitive equilibria,

𝑢(𝑓(𝑥(ℎ̂ 𝑡 , 𝑤𝑡 , 𝜃𝑡 ))) + 𝑣(𝑚(ℎ̂ 𝑡 , 𝑤𝑡 , 𝜃𝑡 )) + 𝛽𝜒(ℎ̂ 𝑡 , 𝑤𝑡 , 𝜃𝑡 )


≥ 𝑢(𝑓(𝑥(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 ))) + 𝑣(𝑚(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )) + 𝛽𝜒(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )

so that at each instance and circumstance of choice, a government attains a weakly higher lifetime utility with continuation value $w_{t+1} = \chi(\hat h_t, w_t, \theta_t)$ by adhering to the plan and confirming the associated time $t$ action $\hat h_t$ that the public had expected earlier
Please note the subtle change in arguments of the functions used to represent a competitive
equilibrium and a Ramsey plan, on the one hand, and a credible government plan, on the
other hand
The extra arguments appearing in the functions used to represent a credible plan come from
allowing the government to contemplate disappointing the private sector’s expectation about
its time 𝑡 choice ℎ̂ 𝑡
A credible plan induces the government to confirm the private sector’s expectation
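The credibility inequality can be checked numerically on a grid of deviations. In the sketch below, `is_credible_at` is a hypothetical helper and `payoff` is a toy stand-in for $u(f(x(h, w_t, \theta_t))) + v(m(h, w_t, \theta_t)) + \beta \chi(h, w_t, \theta_t)$ at a fixed $(w_t, \theta_t)$; neither comes from Chang's paper.

```python
def is_credible_at(h_hat, deviations, payoff):
    """True if no deviation h yields a strictly higher payoff than h_hat."""
    return all(payoff(h_hat) >= payoff(h) for h in deviations)


# Toy payoff (assumed): peaked at h = 1.1, standing in for the government's lifetime value
payoff = lambda h: -(h - 1.1) ** 2
deviations = [0.9, 1.0, 1.1, 1.2]

print(is_credible_at(1.1, deviations, payoff))   # True
print(is_credible_at(1.0, deviations, payoff))   # False
```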

The recursive representation of the plan uses the evolution of continuation values to deter the
government from wanting to disappoint the private sector’s expectations
Technically, a Ramsey plan and a credible plan both incorporate history dependence
For a Ramsey plan, this is encoded in the dynamics of the state variable 𝜃𝑡 , a promised
marginal utility that the Ramsey plan delivers to the private sector
For a credible government plan, the two-dimensional state vector $(w_t, \theta_t)$ encodes history dependence

84.3.7 Sustainable Plans

A government strategy 𝜎 and an allocation rule 𝛼 are said to constitute a sustainable plan
(SP) if

1. 𝜎 is admissible
2. Given 𝜎, 𝛼 is competitive
3. After any history $\vec h^{t-1}$, the continuation of $\sigma$ is optimal for the government; i.e., the sequence $\vec h_t$ induced by $\sigma$ after $\vec h^{t-1}$ maximizes over $CE_\pi$ given $\alpha$

Given any history $\vec h^{t-1}$, the continuation of a sustainable plan is a sustainable plan
Let $\Theta = \{(\vec m, \vec x, \vec h) \in CE : \text{there is an SP whose outcome is } (\vec m, \vec x, \vec h)\}$

Sustainable outcomes are elements of Θ


Now consider the space

$$
S = \Bigl\{ (w, \theta) : \text{there is a sustainable outcome } (\vec m, \vec x, \vec h) \in \Theta \text{ with value } w = \sum_{t=0}^\infty \beta^t [u(f(x_t)) + v(m_t)] \text{ and such that } u'(f(x_0))(m_0 + x_0) = \theta \Bigr\}
$$

The space $S$ is a compact subset of $W \times \Omega$ where $W = [\underline w, \bar w]$ is the space of values associated with sustainable plans. Here $\underline w$ and $\bar w$ are finite bounds on the set of values
Because there is at least one sustainable plan, 𝑆 is nonempty
Now recall the within-period timing protocol, which we can depict (ℎ, 𝑥) → 𝑚 = 𝑞𝑀 → 𝑦 = 𝑐
With this timing protocol in mind, the time 0 component of an SP has the following compo-
nents:

1. A period 0 action $\hat h \in \Pi$ that the public expects the government to take, together with subsequent within-period consequences $m(\hat h), x(\hat h)$ when the government acts as expected
2. For any first-period action $h \ne \hat h$ with $h \in CE^0_\pi$, a pair of within-period consequences $m(h), x(h)$ when the government does not act as the public had expected
3. For every $h \in \Pi$, a pair $(w'(h), \theta'(h)) \in S$ to carry into next period

These components must be such that it is optimal for the government to choose ℎ̂ as ex-
pected; and for every possible ℎ ∈ Π, the government budget constraint and the household’s
Euler equation must hold with continuation 𝜃 being 𝜃′ (ℎ)
Given the timing protocol within the model, the representative household’s response to a
government deviation to ℎ ≠ ℎ̂ from a prescribed ℎ̂ consists of a first-period action 𝑚(ℎ)
and associated subsequent actions, together with future equilibrium prices, captured by
(𝑤′ (ℎ), 𝜃′ (ℎ))
At this point, Chang introduces an idea in the spirit of Abreu, Pearce, and Stacchetti [2]
Let 𝑍 be a nonempty subset of 𝑊 × Ω
Think of using pairs (𝑤′ , 𝜃′ ) drawn from 𝑍 as candidate continuation value, promised
marginal utility pairs
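Define the following operator:

$$
\begin{aligned}
\tilde D(Z) = \Bigl\{ (w, \theta) : {} & \text{there is } \hat h \in CE^0_\pi \text{ and for each } h \in CE^0_\pi \\
& \text{a four-tuple } (m(h), x(h), w'(h), \theta'(h)) \in [0, \bar m] \times X \times Z
\end{aligned} \tag{9}
$$

such that

$$
w = u(f(x(\hat h))) + v(m(\hat h)) + \beta w'(\hat h) \tag{10}
$$

$$
\theta = u'(f(x(\hat h)))(m(\hat h) + x(\hat h)) \tag{11}
$$

and for all $h \in CE^0_\pi$

$$
w \ge u(f(x(h))) + v(m(h)) + \beta w'(h) \tag{12}
$$

$$
x(h) = m(h)(h - 1) \tag{13}
$$

and

$$
m(h)(u'(f(x(h))) - v'(m(h))) \le \beta \theta'(h) \tag{14}
$$

with equality if $m(h) < \bar m \Bigr\}$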

This operator adds the key incentive constraint to the conditions that had defined the earlier
𝐷(𝑍) operator defined in competitive equilibria in the Chang model
Condition Eq. (12) requires that the plan deter the government from wanting to take one-shot
deviations when candidate continuation values are drawn from 𝑍
Proposition:

1. If $Z \subset \tilde D(Z)$, then $\tilde D(Z) \subset S$ (‘self-generation’)
2. $S = \tilde D(S)$ (‘factorization’)
84.4. CALCULATING THE SET OF SUSTAINABLE PROMISE-VALUE PAIRS 1451

Proposition:

1. Monotonicity of $\tilde D$: $Z \subset Z'$ implies $\tilde D(Z) \subset \tilde D(Z')$
2. $Z$ compact implies that $\tilde D(Z)$ is compact

Chang establishes that 𝑆 is compact and that therefore there exists a highest value SP and a
lowest value SP
Further, the preceding structure allows Chang to compute 𝑆 by iterating to convergence on 𝐷̃
provided that one begins with a sufficiently large initial set 𝑍0
This structure delivers the following recursive representation of a sustainable outcome:

1. choose an initial (𝑤0 , 𝜃0 ) ∈ 𝑆;


2. generate a sustainable outcome recursively by iterating on Eq. (8), which we repeat here
for convenience:

ℎ̂ 𝑡 = ℎ(𝑤𝑡 , 𝜃𝑡 )
𝑚𝑡 = 𝑚(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝑥𝑡 = 𝑥(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝑤𝑡+1 = 𝜒(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )
𝜃𝑡+1 = Ψ(ℎ𝑡 , 𝑤𝑡 , 𝜃𝑡 )

84.4 Calculating the Set of Sustainable Promise-Value Pairs

Above we defined the $\tilde D(Z)$ operator as Eq. (9)
Chang (1998) provides a method for dealing with the final three constraints
These incentive constraints ensure that the government wants to choose ℎ̂ as the private sec-
tor had expected it to
Chang’s simplification starts from the idea that, when considering whether or not to confirm
the private sector’s expectation, the government only needs to consider the payoff of the best
possible deviation
Equally, to provide incentives to the government, we only need to consider the harshest possi-
ble punishment
Let ℎ denote some possible deviation. Chang defines:

$$
P(h; Z) = \min \; u(f(x)) + v(m) + \beta w'
$$

where the minimization is subject to

$$
x = m(h - 1)
$$

$$
m(u'(f(x)) - v'(m)) \le \beta \theta' \quad (\text{with equality if } m < \bar m)
$$

$$
(m, x, w', \theta') \in [0, \bar m] \times X \times Z
$$

For a given deviation ℎ, this problem finds the worst possible sustainable value
We then define:

𝐵𝑅(𝑍) = max 𝑃 (ℎ; 𝑍) subject to ℎ ∈ 𝐶𝐸𝜋0

$BR(Z)$ is the value of the government’s most tempting deviation
With this in hand, we can define a new operator $E(Z)$ that is equivalent to the $\tilde D(Z)$ operator but simpler to implement:

$$
E(Z) = \Bigl\{ (w, \theta) : \exists\, h \in CE^0_\pi \text{ and } (m(h), x(h), w'(h), \theta'(h)) \in [0, \bar m] \times X \times Z
$$

such that

$$
w = u(f(x(h))) + v(m(h)) + \beta w'(h)
$$

$$
\theta = u'(f(x(h)))(m(h) + x(h))
$$

$$
x(h) = m(h)(h - 1)
$$

$$
m(h)(u'(f(x(h))) - v'(m(h))) \le \beta \theta'(h) \quad (\text{with equality if } m(h) < \bar m)
$$

and

$$
w \ge BR(Z) \Bigr\}
$$

Aside from the final incentive constraint, this is the same as the operator in competitive equi-
libria in the Chang model
Consequently, to implement this operator we just need to add one step to our outer hyperplane approximation algorithm:

1. Initialize subgradients, $H$, and hyperplane levels, $C_0$
2. Given a set of subgradients, $H$, and hyperplane levels, $C_t$, calculate $BR(S_t)$
3. Given $H$, $C_t$, and $BR(S_t)$, for each subgradient $h_i \in H$:

    • Solve a linear program (described below) for each action in the action space
    • Find the maximum and update the corresponding hyperplane level, $C_{i,t+1}$

4. If $|C_{t+1} - C_t| > \epsilon$, return to 2



Step 1 simply creates a large initial set 𝑆0
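One way to build such a large initial set is to place subgradients evenly on the unit circle and choose hyperplane levels tangent to a circle that encloses a box of candidate $(w, \theta)$ values. The sketch below is a simplified, dependency-free illustration; `initial_hyperplanes` is a hypothetical name and the bounds are assumed values.

```python
import math


def initial_hyperplanes(n, w_bounds, theta_bounds):
    """Unit-circle subgradients H and levels C of a polygon containing the box."""
    # Subgradients: n evenly spaced points on the unit circle
    H = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
         for i in range(n)]
    # Centre of the box and radius of the circle through its corners
    o_w = (w_bounds[0] + w_bounds[1]) / 2
    o_theta = (theta_bounds[0] + theta_bounds[1]) / 2
    r = math.hypot(w_bounds[1] - o_w, theta_bounds[1] - o_theta)
    # Each hyperplane is tangent to that circle, so its level is h . o + r
    C = [h_w * o_w + h_theta * o_theta + r for h_w, h_theta in H]
    return H, C


# Illustrative bounds only (assumed)
H, C = initial_hyperplanes(10, w_bounds=(5.0, 7.5), theta_bounds=(0.0, 0.2))
print(len(H), len(C))
```

The set $\{z : H \cdot z \le C\}$ then contains the whole box, so it is a valid outer approximation from which to start shrinking.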


Given some set 𝑆𝑡 , Step 2 then constructs the value 𝐵𝑅(𝑆𝑡 )
To do this, we solve the following problem for each point in the action space $(m_j, h_j)$:

$$
\min_{[w', \theta']} \; u(f(x_j)) + v(m_j) + \beta w'
$$

subject to

$$
H \cdot (w', \theta') \le C_t
$$

$$
x_j = m_j(h_j - 1)
$$

$$
m_j(u'(f(x_j)) - v'(m_j)) \le \beta \theta' \quad (= \text{ if } m_j < \bar m)
$$

This gives us a matrix of possible values, corresponding to each point in the action space
To find 𝐵𝑅(𝑍), we minimize over the 𝑚 dimension and maximize over the ℎ dimension
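The min-then-max step can be sketched on a small table of values. The helper name `best_response_value` and the numbers are made up for illustration; `None` marks an action pair for which the linear program has no feasible solution.

```python
def best_response_value(p):
    """p[i][j]: value at the i-th m grid point and the j-th h grid point."""
    n_h = len(p[0])
    worst_by_h = []
    for j in range(n_h):
        # Harshest punishment for deviation h_j: minimize over the m dimension
        column = [row[j] for row in p if row[j] is not None]
        if column:
            worst_by_h.append(min(column))
    # Most tempting deviation: maximize over the h dimension
    return max(worst_by_h)


# Rows index m, columns index h (toy numbers, assumed)
p = [[1.0, 4.0, None],
     [2.0, 3.0, 5.0],
     [0.5, 6.0, 7.0]]
print(best_response_value(p))   # 5.0
```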
Step 3 then constructs the set 𝑆𝑡+1 = 𝐸(𝑆𝑡 ). The linear program in Step 3 is designed to
construct a set 𝑆𝑡+1 that is as large as possible while satisfying the constraints of the 𝐸(𝑆)
operator
To do this, for each subgradient ℎ𝑖 , and for each point in the action space (𝑚𝑗 , ℎ𝑗 ), we solve
the following problem:

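$$
\max_{[w', \theta']} \; h_i \cdot (w, \theta)
$$

subject to

$$
H \cdot (w', \theta') \le C_t
$$

$$
w = u(f(x_j)) + v(m_j) + \beta w'
$$

$$
\theta = u'(f(x_j))(m_j + x_j)
$$

$$
x_j = m_j(h_j - 1)
$$

$$
m_j(u'(f(x_j)) - v'(m_j)) \le \beta \theta' \quad (= \text{ if } m_j < \bar m)
$$

$$
w \ge BR(Z)
$$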

This problem maximizes the hyperplane level for a given set of actions

The second part of Step 3 then finds the maximum possible hyperplane level across the action
space
The algorithm constructs a sequence of progressively smaller sets 𝑆𝑡+1 ⊂ 𝑆𝑡 ⊂ 𝑆𝑡−1 ⋯ ⊂ 𝑆0
Step 4 ends the algorithm when the difference between these sets is small enough
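The stopping rule in Step 4 amounts to iterating until the largest change in any hyperplane level falls below a tolerance. The loop below is a generic sketch of that rule; `iterate_levels` is a hypothetical name and the `update` function is a toy contraction standing in for Steps 2 and 3.

```python
def iterate_levels(update, c0, tol=1e-5, max_iter=250):
    """Apply update to the levels until the sup-norm change is at most tol."""
    c_old = list(c0)
    diff = float('inf')
    iters = 0
    while diff > tol and iters < max_iter:
        iters += 1
        c_new = update(c_old)
        # Largest absolute change across hyperplane levels
        diff = max(abs(a - b) for a, b in zip(c_new, c_old))
        c_old = c_new
    return c_old, iters, diff


# Toy update (assumed): each level moves halfway toward the value 2
levels, iters, diff = iterate_levels(lambda c: [0.5 * ci + 1 for ci in c], [10.0, 4.0])
print(levels, iters, diff)
```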
We have created a Python class that solves the model assuming the following functional
forms:

$$
u(c) = \log(c)
$$

$$
v(m) = \frac{1}{500}(m \bar m - 0.5 m^2)^{0.5}
$$

$$
f(x) = 180 - (0.4x)^2
$$

The remaining parameters $\{\beta, \bar m, \underline h, \bar h\}$ are then variables to be specified for an instance of the Chang class
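A short check that these functional forms satisfy the assumptions stated earlier can be written as follows. The value `M_BAR = 30` is an assumed illustration; with these forms $v'(\bar m) = 0$, so the satiation level $m^f$ coincides with $\bar m$.

```python
import math

M_BAR = 30  # illustrative value of m_bar (assumed)


def u(c):
    return math.log(c)


def v(m):
    return (M_BAR * m - 0.5 * m**2) ** 0.5 / 500


def v_prime(m):
    # Derivative of v: vanishes at m = M_BAR
    return 0.5 / 500 * (M_BAR * m - 0.5 * m**2) ** (-0.5) * (M_BAR - m)


def f(x):
    # Output falls with the absolute value of tax revenues
    return 180 - (0.4 * x) ** 2


print(v_prime(M_BAR))        # 0.0
print(f(3.0) == f(-3.0))     # True: taxes and subsidies are equally distorting
```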
Below we use the class to solve the model and plot the resulting equilibrium set, once with
𝛽 = 0.3 and once with 𝛽 = 0.8. We also plot the (larger) competitive equilibrium sets, which
we described in competitive equilibria in the Chang model
(We have set the number of subgradients to 10 in order to speed up the code for now. We can
increase accuracy by increasing the number of subgradients)
The following code computes sustainable plans

In [2]: """
Author: Sebastian Graves

Provides a class called ChangModel to solve different
parameterizations of the Chang (1998) model.
"""

import numpy as np
import quantecon as qe
import time

from scipy.spatial import ConvexHull
from scipy.optimize import linprog, minimize, minimize_scalar
from scipy.interpolate import UnivariateSpline
import numpy.polynomial.chebyshev as cheb


class ChangModel:
    """
    Class to solve for the competitive and sustainable sets in the Chang (1998)
    model, for different parameterizations.
    """

    def __init__(self, β, mbar, h_min, h_max, n_h, n_m, N_g):

        # Record parameters
        self.β, self.mbar, self.h_min, self.h_max = β, mbar, h_min, h_max
        self.n_h, self.n_m, self.N_g = n_h, n_m, N_g

        # Create other parameters
        self.m_min = 1e-9
        self.m_max = self.mbar
        self.N_a = self.n_h*self.n_m

        # Utility and production functions
        uc = lambda c: np.log(c)
        uc_p = lambda c: 1/c
        v = lambda m: 1/500 * (mbar * m - 0.5 * m**2)**0.5
        v_p = lambda m: 0.5/500 * (mbar * m - 0.5 * m**2)**(-0.5) * (mbar - m)
        u = lambda h, m: uc(f(h, m)) + v(m)

        def f(h, m):
            x = m * (h - 1)
            f = 180 - (0.4 * x)**2
            return f

        def θ(h, m):
            x = m * (h - 1)
            θ = uc_p(f(h, m)) * (m + x)
            return θ

        # Create set of possible action combinations, A
        A1 = np.linspace(h_min, h_max, n_h).reshape(n_h, 1)
        A2 = np.linspace(self.m_min, self.m_max, n_m).reshape(n_m, 1)
        self.A = np.concatenate((np.kron(np.ones((n_m, 1)), A1),
                                 np.kron(A2, np.ones((n_h, 1)))), axis=1)

        # Pre-compute utility and output vectors
        self.euler_vec = -np.multiply(self.A[:, 1], uc_p(f(self.A[:, 0], self.A[:, 1])) - v_p(self.A[:
        self.u_vec = u(self.A[:, 0], self.A[:, 1])
        self.Θ_vec = θ(self.A[:, 0], self.A[:, 1])
        self.f_vec = f(self.A[:, 0], self.A[:, 1])
        self.bell_vec = np.multiply(uc_p(f(self.A[:, 0],
                                           self.A[:, 1])),
                                    np.multiply(self.A[:, 1],
                                                (self.A[:, 0] - 1))) + np.multiply(self.A[:, 1],
                                                                                   v_p(self.A[:, 1]))

        # Find extrema of (w, θ) space for initial guess of equilibrium sets
        p_vec = np.zeros(self.N_a)
        w_vec = np.zeros(self.N_a)
        for i in range(self.N_a):
            p_vec[i] = self.Θ_vec[i]
            w_vec[i] = self.u_vec[i]/(1 - β)

        w_space = np.array([min(w_vec[~np.isinf(w_vec)]),
                            max(w_vec[~np.isinf(w_vec)])])
        p_space = np.array([0, max(p_vec[~np.isinf(w_vec)])])
        self.p_space = p_space

        # Set up hyperplane levels and gradients for iterations
        def SG_H_V(N, w_space, p_space):
            """
            This function initializes the subgradients, hyperplane levels,
            and extreme points of the value set by choosing an appropriate
            origin and radius. It is based on a similar function in QuantEcon's Games.jl
            """

            # First, create a unit circle. Want points placed on [0, 2π]
            inc = 2 * np.pi / N
            degrees = np.arange(0, 2 * np.pi, inc)

            # Points on circle
            H = np.zeros((N, 2))
            for i in range(N):
                x = degrees[i]
                H[i, 0] = np.cos(x)
                H[i, 1] = np.sin(x)

            # Then calculate origin and radius
            o = np.array([np.mean(w_space), np.mean(p_space)])
            r1 = max((max(w_space) - o[0])**2, (o[0] - min(w_space))**2)
            r2 = max((max(p_space) - o[1])**2, (o[1] - min(p_space))**2)
            r = np.sqrt(r1 + r2)

            # Now calculate vertices
            Z = np.zeros((2, N))
            for i in range(N):
                Z[0, i] = o[0] + r*H.T[0, i]
                Z[1, i] = o[1] + r*H.T[1, i]

            # Corresponding hyperplane levels
            C = np.zeros(N)
            for i in range(N):
                C[i] = np.dot(Z[:, i], H[i, :])

            return C, H, Z

        C, self.H, Z = SG_H_V(N_g, w_space, p_space)
        C = C.reshape(N_g, 1)
        self.c0_c, self.c0_s, self.c1_c, self.c1_s = np.copy(C), np.copy(C), np.copy(C), np.copy(C)
        self.z0_s, self.z0_c, self.z1_s, self.z1_c = np.copy(Z), np.copy(Z), np.copy(Z), np.copy(Z)

        self.w_bnds_s, self.w_bnds_c = (w_space[0], w_space[1]), (w_space[0], w_space[1])
        self.p_bnds_s, self.p_bnds_c = (p_space[0], p_space[1]), (p_space[0], p_space[1])

        # Create dictionaries to save equilibrium set for each iteration
        self.c_dic_s, self.c_dic_c = {}, {}
        self.c_dic_s[0], self.c_dic_c[0] = self.c0_s, self.c0_c

    def solve_worst_spe(self):
        """
        Method to solve for BR(Z). See p.449 of Chang (1998)
        """

        p_vec = np.full(self.N_a, np.nan)
        c = [1, 0]

        # Pre-compute constraints
        aineq_mbar = np.vstack((self.H, np.array([0, -self.β])))
        bineq_mbar = np.vstack((self.c0_s, 0))

        aineq = self.H
        bineq = self.c0_s
        aeq = [[0, -self.β]]

        for j in range(self.N_a):
            # Only try if consumption is possible
            if self.f_vec[j] > 0:
                # If m = mbar, use inequality constraint
                if self.A[j, 1] == self.mbar:
                    bineq_mbar[-1] = self.euler_vec[j]
                    res = linprog(c, A_ub=aineq_mbar, b_ub=bineq_mbar,
                                  bounds=(self.w_bnds_s, self.p_bnds_s))
                else:
                    beq = self.euler_vec[j]
                    res = linprog(c, A_ub=aineq, b_ub=bineq, A_eq=aeq, b_eq=beq,
                                  bounds=(self.w_bnds_s, self.p_bnds_s))
                if res.status == 0:
                    p_vec[j] = self.u_vec[j] + self.β * res.x[0]

        # Max over h and min over other variables (see Chang (1998) p.449)
        self.br_z = np.nanmax(np.nanmin(p_vec.reshape(self.n_m, self.n_h), 0))

    def solve_subgradient(self):
        """
        Method to solve for E(Z). See p.449 of Chang (1998)
        """

        # Pre-compute constraints
        aineq_C_mbar = np.vstack((self.H, np.array([0, -self.β])))
        bineq_C_mbar = np.vstack((self.c0_c, 0))

        aineq_C = self.H
        bineq_C = self.c0_c
        aeq_C = [[0, -self.β]]

        aineq_S_mbar = np.vstack((np.vstack((self.H, np.array([0, -self.β]))),
                                  np.array([-self.β, 0])))
        bineq_S_mbar = np.vstack((self.c0_s, np.zeros((2, 1))))

        aineq_S = np.vstack((self.H, np.array([-self.β, 0])))
        bineq_S = np.vstack((self.c0_s, 0))
        aeq_S = [[0, -self.β]]

        # Update maximal hyperplane level
        for i in range(self.N_g):
            c_a1a2_c, t_a1a2_c = np.full(self.N_a, -np.inf), np.zeros((self.N_a, 2))
            c_a1a2_s, t_a1a2_s = np.full(self.N_a, -np.inf), np.zeros((self.N_a, 2))

            c = [-self.H[i, 0], -self.H[i, 1]]

            for j in range(self.N_a):
                # Only try if consumption is possible
                if self.f_vec[j] > 0:

                    # COMPETITIVE EQUILIBRIA
                    # If m = mbar, use inequality constraint
                    if self.A[j, 1] == self.mbar:
                        bineq_C_mbar[-1] = self.euler_vec[j]
                        res = linprog(c, A_ub=aineq_C_mbar, b_ub=bineq_C_mbar,
                                      bounds=(self.w_bnds_c, self.p_bnds_c))
                    # If m < mbar, use equality constraint
                    else:
                        beq_C = self.euler_vec[j]
                        res = linprog(c, A_ub=aineq_C, b_ub=bineq_C, A_eq = aeq_C,
                                      b_eq = beq_C, bounds=(self.w_bnds_c, self.p_bnds_c))
                    if res.status == 0:
                        c_a1a2_c[j] = self.H[i, 0]*(self.u_vec[j] + self.β * res.x[0]) + self.H[i, 1]
                        t_a1a2_c[j] = res.x

                    # SUSTAINABLE EQUILIBRIA
                    # If m = mbar, use inequality constraint
                    if self.A[j, 1] == self.mbar:
                        bineq_S_mbar[-2] = self.euler_vec[j]
                        bineq_S_mbar[-1] = self.u_vec[j] - self.br_z
                        res = linprog(c, A_ub=aineq_S_mbar, b_ub=bineq_S_mbar,
                                      bounds=(self.w_bnds_s, self.p_bnds_s))
                    # If m < mbar, use equality constraint
                    else:
                        bineq_S[-1] = self.u_vec[j] - self.br_z
                        beq_S = self.euler_vec[j]
                        res = linprog(c, A_ub=aineq_S, b_ub=bineq_S, A_eq = aeq_S,
                                      b_eq = beq_S, bounds=(self.w_bnds_s, self.p_bnds_s))
                    if res.status == 0:
                        c_a1a2_s[j] = self.H[i, 0] * (self.u_vec[j] + self.β*res.x[0]) + self.H[i, 1]
                        t_a1a2_s[j] = res.x

            idx_c = np.where(c_a1a2_c == max(c_a1a2_c))[0][0]
            self.z1_c[:, i] = np.array([self.u_vec[idx_c] + self.β * t_a1a2_c[idx_c, 0],
                                        self.Θ_vec[idx_c]])

            idx_s = np.where(c_a1a2_s == max(c_a1a2_s))[0][0]
            self.z1_s[:, i] = np.array([self.u_vec[idx_s] + self.β*t_a1a2_s[idx_s, 0],
                                        self.Θ_vec[idx_s]])

        for i in range(self.N_g):
            self.c1_c[i] = np.dot(self.z1_c[:, i], self.H[i, :])
            self.c1_s[i] = np.dot(self.z1_s[:, i], self.H[i, :])

    def solve_sustainable(self, tol=1e-5, max_iter=250):
        """
        Method to solve for the competitive and sustainable equilibrium sets.
        """

        t = time.time()
        diff = tol + 1
        iters = 0

        print('### --------------- ###')
        print('Solving Chang Model Using Outer Hyperplane Approximation')
        print('### --------------- ### \n')

        print('Maximum difference when updating hyperplane levels:')

        while diff > tol and iters < max_iter:
            iters = iters + 1
            self.solve_worst_spe()
            self.solve_subgradient()
            diff = max(np.maximum(abs(self.c0_c - self.c1_c),
                                  abs(self.c0_s - self.c1_s)))
            print(diff)

            # Update hyperplane levels
            self.c0_c, self.c0_s = np.copy(self.c1_c), np.copy(self.c1_s)

            # Update bounds for w and θ
            wmin_c, wmax_c = np.min(self.z1_c, axis=1)[0], np.max(self.z1_c, axis=1)[0]
            pmin_c, pmax_c = np.min(self.z1_c, axis=1)[1], np.max(self.z1_c, axis=1)[1]

            wmin_s, wmax_s = np.min(self.z1_s, axis=1)[0], np.max(self.z1_s, axis=1)[0]
            pmin_S, pmax_S = np.min(self.z1_s, axis=1)[1], np.max(self.z1_s, axis=1)[1]

            self.w_bnds_s, self.w_bnds_c = (wmin_s, wmax_s), (wmin_c, wmax_c)
            self.p_bnds_s, self.p_bnds_c = (pmin_S, pmax_S), (pmin_c, pmax_c)

            # Save iteration
            self.c_dic_c[iters], self.c_dic_s[iters] = np.copy(self.c1_c), np.copy(self.c1_s)
            self.iters = iters

        elapsed = time.time() - t
        print('Convergence achieved after {} iterations and {} seconds'.format(iters, round(elapsed, 2

    def solve_bellman(self, θ_min, θ_max, order, disp=False, tol=1e-7, maxiters=100):
        """
        Continuous Method to solve the Bellman equation in section 25.3
        """
        mbar = self.mbar

        # Utility and production functions
        uc = lambda c: np.log(c)
        uc_p = lambda c: 1 / c
        v = lambda m: 1 / 500 * (mbar * m - 0.5 * m**2)**0.5
        v_p = lambda m: 0.5/500 * (mbar*m - 0.5 * m**2)**(-0.5) * (mbar - m)
        u = lambda h, m: uc(f(h, m)) + v(m)

        def f(h, m):
            x = m * (h - 1)
            f = 180 - (0.4 * x)**2
            return f

        def θ(h, m):
            x = m * (h - 1)
            θ = uc_p(f(h, m)) * (m + x)
            return θ

        # Bounds for Maximization
        lb1 = np.array([self.h_min, 0, θ_min])
        ub1 = np.array([self.h_max, self.mbar - 1e-5, θ_max])
        lb2 = np.array([self.h_min, θ_min])
        ub2 = np.array([self.h_max, θ_max])

        # Initialize Value Function coefficients
        # Calculate roots of Chebyshev polynomial
        k = np.linspace(order, 1, order)
        roots = np.cos((2 * k - 1) * np.pi / (2 * order))
        # Scale to approximation space
        s = θ_min + (roots - -1) / 2 * (θ_max - θ_min)
        # Create a basis matrix
        Φ = cheb.chebvander(roots, order - 1)
        c = np.zeros(Φ.shape[0])

        # Function to minimize and constraints
        def p_fun(x):
            scale = -1 + 2 * (x[2] - θ_min)/(θ_max - θ_min)
            p_fun = - (u(x[0], x[1]) + self.β * np.dot(cheb.chebvander(scale, order - 1), c))
            return p_fun

        def p_fun2(x):
            scale = -1 + 2*(x[1] - θ_min)/(θ_max - θ_min)
            p_fun = - (u(x[0],mbar) + self.β * np.dot(cheb.chebvander(scale, order - 1), c))
            return p_fun

        cons1 = ({'type': 'eq', 'fun': lambda x: uc_p(f(x[0], x[1])) * x[1] * (x[0] - 1) + v_p(x[1])
                 {'type': 'eq', 'fun': lambda x: uc_p(f(x[0], x[1])) * x[0] * x[1] - θ})
        cons2 = ({'type': 'ineq', 'fun': lambda x: uc_p(f(x[0], mbar)) * mbar * (x[0] - 1) + v_p(mbar)
                 {'type': 'eq', 'fun': lambda x: uc_p(f(x[0], mbar)) * x[0] * mbar - θ})

        bnds1 = np.concatenate([lb1.reshape(3, 1), ub1.reshape(3, 1)], axis=1)
        bnds2 = np.concatenate([lb2.reshape(2, 1), ub2.reshape(2, 1)], axis=1)

        # Bellman Iterations
        diff = 1
        iters = 1

        while diff > tol:
            # 1. Maximization, given value function guess
            p_iter1 = np.zeros(order)
            for i in range(order):
                θ = s[i]
                res = minimize(p_fun,
                               lb1 + (ub1-lb1) / 2,
                               method='SLSQP',
                               bounds=bnds1,
                               constraints=cons1,
                               tol=1e-10)
                if res.success == True:
                    p_iter1[i] = -p_fun(res.x)
                res = minimize(p_fun2,
                               lb2 + (ub2-lb2) / 2,
                               method='SLSQP',
                               bounds=bnds2,
                               constraints=cons2,
                               tol=1e-10)
                if -p_fun2(res.x) > p_iter1[i] and res.success == True:
                    p_iter1[i] = -p_fun2(res.x)

            # 2. Bellman updating of Value Function coefficients
            c1 = np.linalg.solve(Φ, p_iter1)
            # 3. Compute distance and update
            diff = np.linalg.norm(c - c1)
            if bool(disp == True):
                print(diff)
            c = np.copy(c1)
            iters = iters + 1
            if iters > maxiters:
                print('Convergence failed after {} iterations'.format(maxiters))
                break

        self.θ_grid = s
        self.p_iter = p_iter1
        self.Φ = Φ
        self.c = c
        print('Convergence achieved after {} iterations'.format(iters))

        # Check residuals
        θ_grid_fine = np.linspace(θ_min, θ_max, 100)
        resid_grid = np.zeros(100)
        p_grid = np.zeros(100)
        θ_prime_grid = np.zeros(100)
        m_grid = np.zeros(100)
        h_grid = np.zeros(100)
        for i in range(100):
            θ = θ_grid_fine[i]
            res = minimize(p_fun,
                           lb1 + (ub1-lb1) / 2,
                           method='SLSQP',
                           bounds=bnds1,
                           constraints=cons1,
                           tol=1e-10)
            if res.success == True:
                p = -p_fun(res.x)
                p_grid[i] = p
                θ_prime_grid[i] = res.x[2]
                h_grid[i] = res.x[0]
                m_grid[i] = res.x[1]
            res = minimize(p_fun2,
                           lb2 + (ub2-lb2)/2,
                           method='SLSQP',
                           bounds=bnds2,
                           constraints=cons2,
                           tol=1e-10)
            if -p_fun2(res.x) > p and res.success == True:
                p = -p_fun2(res.x)
                p_grid[i] = p
                θ_prime_grid[i] = res.x[1]
                h_grid[i] = res.x[0]
                m_grid[i] = self.mbar
            scale = -1 + 2 * (θ - θ_min)/(θ_max - θ_min)
            resid_grid[i] = np.dot(cheb.chebvander(scale, order-1), c) - p

        self.resid_grid = resid_grid
        self.θ_grid_fine = θ_grid_fine
        self.θ_prime_grid = θ_prime_grid
        self.m_grid = m_grid
        self.h_grid = h_grid
        self.p_grid = p_grid
        self.x_grid = m_grid * (h_grid - 1)

        # Simulate
        θ_series = np.zeros(31)
        m_series = np.zeros(30)
        h_series = np.zeros(30)

        # Find initial θ
        def ValFun(x):
            scale = -1 + 2*(x - θ_min)/(θ_max - θ_min)
            p_fun = np.dot(cheb.chebvander(scale, order - 1), c)
            return -p_fun

        res = minimize(ValFun,
                       (θ_min + θ_max)/2,
                       bounds=[(θ_min, θ_max)])
        θ_series[0] = res.x

        # Simulate
        for i in range(30):
            θ = θ_series[i]
            res = minimize(p_fun,
                           lb1 + (ub1-lb1)/2,
                           method='SLSQP',
                           bounds=bnds1,
                           constraints=cons1,
                           tol=1e-10)
            if res.success == True:
                p = -p_fun(res.x)
                h_series[i] = res.x[0]
                m_series[i] = res.x[1]
                θ_series[i+1] = res.x[2]
            res2 = minimize(p_fun2,
                            lb2 + (ub2-lb2)/2,
                            method='SLSQP',
                            bounds=bnds2,
                            constraints=cons2,
                            tol=1e-10)
            if -p_fun2(res2.x) > p and res2.success == True:
                h_series[i] = res2.x[0]
                m_series[i] = self.mbar
                θ_series[i+1] = res2.x[1]

        self.θ_series = θ_series
        self.m_series = m_series
        self.h_series = h_series
        self.x_series = m_series * (h_series - 1)

84.4.1 Comparison of Sets

The set of (𝑤, 𝜃) associated with sustainable plans is smaller than the set of (𝑤, 𝜃) pairs asso-
ciated with competitive equilibria, since the additional constraints associated with sustainabil-
ity must also be satisfied
Let’s compute two examples, one with a low 𝛽, another with a higher 𝛽

In [3]: ch1 = ChangModel(β=0.3, mbar=30, h_min=0.9, h_max=2, n_h=8, n_m=35, N_g=10)

In [4]: ch1.solve_sustainable()

### --------------- ###


Solving Chang Model Using Outer Hyperplane Approximation
### --------------- ###

Maximum difference when updating hyperplane levels:


[1.91679545]
[0.66781649]
[0.49234789]
[0.3241217]
[0.19022279]
[0.10862838]
[0.05817151]
[0.02620056]
[0.01836386]
[0.01415009]
[0.00297077]
[0.00089123]
[0.00026737]
[8.02108797e-05]
[2.40632639e-05]
[7.21897917e-06]
Convergence achieved after 16 iterations and 134.97 seconds

The following plot shows both the set of 𝑤, 𝜃 pairs associated with competitive equilibria (in
red) and the smaller set of 𝑤, 𝜃 pairs associated with sustainable plans (in blue)

In [5]: import polytope
import matplotlib.pyplot as plt
%matplotlib inline

def plot_equilibria(ChangModel):
    """
    Method to plot both equilibrium sets
    """
    fig, ax = plt.subplots(figsize=(7, 5))

    ax.set_xlabel('w', fontsize=16)
    ax.set_ylabel(r"$\theta$", fontsize=18)

    poly_S = polytope.Polytope(ChangModel.H, ChangModel.c1_s)
    poly_C = polytope.Polytope(ChangModel.H, ChangModel.c1_c)
    ext_C = polytope.extreme(poly_C)
    ext_S = polytope.extreme(poly_S)

    ax.fill(ext_C[:, 0], ext_C[:, 1], 'r', zorder=-1)
    ax.fill(ext_S[:, 0], ext_S[:, 1], 'b', zorder=0)

    # Add point showing Ramsey Plan
    idx_Ramsey = np.where(ext_C[:, 0] == max(ext_C[:, 0]))[0][0]
    R = ext_C[idx_Ramsey, :]
    ax.scatter(R[0], R[1], 150, 'black', 'o', zorder=1)
    w_min = min(ext_C[:, 0])

    # Label Ramsey Plan slightly to the right of the point
    ax.annotate("R", xy=(R[0], R[1]),
                xytext=(R[0] + 0.03 * (R[0] - w_min),
                        R[1]), fontsize=18)

    plt.tight_layout()
    plt.show()

plot_equilibria(ch1)

`polytope` failed to import `cvxopt.glpk`.


will use `scipy.optimize.linprog`

Evidently, the Ramsey plan, denoted by the 𝑅, is not sustainable


Let’s raise the discount factor and recompute the sets

In [6]: ch2 = ChangModel(β=0.8, mbar=30, h_min=0.9, h_max=1/0.8, n_h=8, n_m=35, N_g=10)

In [7]: ch2.solve_sustainable()

### --------------- ###


Solving Chang Model Using Outer Hyperplane Approximation
### --------------- ###

Maximum difference when updating hyperplane levels:


[0.06369]
[0.02476]
[0.02153]
[0.01915]
[0.01795]
[0.01642]
[0.01507]
[0.01284]
[0.01106]
[0.00694]
[0.0085]
[0.00781]
[0.00433]
[0.00492]
[0.00303]
[0.00182]
[0.00638]
[0.00116]
[0.00093]
[0.00075]
[0.0006]
[0.00494]
[0.00038]
[0.00121]
[0.00024]
[0.0002]
[0.00016]
[0.00013]
[0.0001]
[0.00008]
[0.00006]
[0.00005]
[0.00004]
[0.00003]
[0.00003]
[0.00002]
[0.00002]
[0.00001]
[0.00001]
[0.00001]
Convergence achieved after 40 iterations and 782.13 seconds

Let’s plot both sets

In [8]: plot_equilibria(ch2)
Evidently, the Ramsey plan is now sustainable


Bibliography

[1] Dilip Abreu. On the theory of infinitely repeated games with discounting. Economet-
rica, 56:383–396, 1988.
[2] Dilip Abreu, David Pearce, and Ennio Stacchetti. Toward a theory of discounted re-
peated games with imperfect monitoring. Econometrica, 58(5):1041–1063, September
1990.
[3] Daron Acemoglu, Simon Johnson, and James A Robinson. The colonial origins of com-
parative development: An empirical investigation. The American Economic Review,
91(5):1369–1401, 2001.
[4] S Rao Aiyagari. Uninsured Idiosyncratic Risk and Aggregate Saving. The Quarterly
Journal of Economics, 109(3):659–684, 1994.
[5] S Rao Aiyagari, Albert Marcet, Thomas J Sargent, and Juha Seppälä. Optimal taxation
without state-contingent debt. Journal of Political Economy, 110(6):1220–1254, 2002.
[6] D. B. O. Anderson and J. B. Moore. Optimal Filtering. Dover Publications, 2005.
[7] E. W. Anderson, L. P. Hansen, E. R. McGrattan, and T. J. Sargent. Mechanics of
Forming and Estimating Dynamic Linear Economies. In Handbook of Computational
Economics. Elsevier, vol 1 edition, 1996.
[8] Cristina Arellano. Default risk and income fluctuations in emerging economies. The
American Economic Review, pages 690–712, 2008.
[9] Papoulis Athanasios and S Unnikrishna Pillai. Probability, random variables, and
stochastic processes. Mc-Graw Hill, 1991.
[10] Orazio P Attanasio and Nicola Pavoni. Risk sharing in private information models with
asset accumulation: Explaining the excess smoothness of consumption. Econometrica,
79(4):1027–1068, 2011.
[11] Robert J Barro. On the Determination of the Public Debt. Journal of Political Econ-
omy, 87(5):940–971, 1979.
[12] Jess Benhabib, Alberto Bisin, and Shenghao Zhu. The wealth distribution in bewley
economies with capital income risk. Journal of Economic Theory, 159:489–515, 2015.
[13] L M Benveniste and J A Scheinkman. On the Differentiability of the Value Function in
Dynamic Models of Economics. Econometrica, 47(3):727–732, 1979.
[14] Dmitri Bertsekas. Dynamic Programming and Stochastic Control. Academic Press, New
York, 1975.
[15] Truman Bewley. The permanent income hypothesis: A theoretical formulation. Journal
of Economic Theory, 16(2):252–292, 1977.

1465
1466 BIBLIOGRAPHY

[16] Truman F Bewley. Stationary monetary equilibrium with a continuum of independently


fluctuating consumers. In Werner Hildenbran and Andreu Mas-Colell, editors, Contri-
butions to Mathematical Economics in Honor of Gerard Debreu, pages 27–102. North-
Holland, Amsterdam, 1986.

[17] Anmol Bhandari, David Evans, Mikhail Golosov, and Thomas J. Sargent. Fiscal Policy
and Debt Management with Incomplete Markets. The Quarterly Journal of Economics,
132(2):617–663, 2017.

[18] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[19] Fischer Black and Robert Litterman. Global portfolio optimization. Financial analysts
journal, 48(5):28–43, 1992.

[20] Philip Cagan. The monetary dynamics of hyperinflation. In Milton Friedman, editor,
Studies in the Quantity Theory of Money, pages 25–117. University of Chicago Press,
Chicago, 1956.

[21] Guillermo A. Calvo. On the time consistency of optimal policy in a monetary economy.
Econometrica, 46(6):1411–1428, 1978.

[22] Christopher D Carroll. A Theory of the Consumption Function, with and without Liq-
uidity Constraints. Journal of Economic Perspectives, 15(3):23–45, 2001.

[23] Christopher D Carroll. The method of endogenous gridpoints for solving dynamic
stochastic optimization problems. Economics Letters, 91(3):312–320, 2006.

[24] David Cass. Optimum growth in an aggregative model of capital accumulation. Review
of Economic Studies, 32(3):233–240, 1965.

[25] Roberto Chang. Credible monetary policy in an infinite horizon model: Recursive ap-
proaches. Journal of Economic Theory, 81(2):431–461, 1998.

[26] Varadarajan V Chari and Patrick J Kehoe. Sustainable plans. Journal of Political
Economy, pages 783–802, 1990.

[27] Ronald Harry Coase. The nature of the firm. economica, 4(16):386–405, 1937.

[28] Wilbur John Coleman. Solving the Stochastic Growth Model by Policy-Function Itera-
tion. Journal of Business & Economic Statistics, 8(1):27–29, 1990.

[29] J. D. Cryer and K-S. Chan. Time Series Analysis. Springer, 2nd edition edition, 2008.

[30] Steven J Davis, R Jason Faberman, and John Haltiwanger. The flow approach to labor
markets: New data sources, micro-macro links and the recent downturn. Journal of
Economic Perspectives, 2006.

[31] Angus Deaton. Saving and Liquidity Constraints. Econometrica, 59(5):1221–1248, 1991.

[32] Angus Deaton and Christina Paxson. Intertemporal Choice and Inequality. Journal of
Political Economy, 102(3):437–467, 1994.

[33] Wouter J Den Haan. Comparison of solutions to the incomplete markets model with
aggregate uncertainty. Journal of Economic Dynamics and Control, 34(1):4–27, 2010.

[34] Raymond J Deneckere and Kenneth L Judd. Cyclical and chaotic behavior in a dy-
namic equilibrium model, with implications for fiscal policy. Cycles and chaos in eco-
nomic equilibrium, pages 308–329, 1992.
BIBLIOGRAPHY 1467

[35] J Dickey. Bayesian alternatives to the f-test and least-squares estimate in the normal
linear model. In S.E. Fienberg and A. Zellner, editors, Studies in Bayesian econometrics
and statistics, pages 515–554. North-Holland, Amsterdam, 1975.

[36] Ulrich Doraszelski and Mark Satterthwaite. Computable markov-perfect industry dy-
namics. The RAND Journal of Economics, 41(2):215–243, 2010.

[37] Y E Du, Ehud Lehrer, and A D Y Pauzner. Competitive economy as a ranking device
over networks. submitted, 2013.

[38] R M Dudley. Real Analysis and Probability. Cambridge Studies in Advanced Mathe-
matics. Cambridge University Press, 2002.

[39] Robert F Engle and Clive W J Granger. Co-integration and Error Correction: Repre-
sentation, Estimation, and Testing. Econometrica, 55(2):251–276, 1987.

[40] Richard Ericson and Ariel Pakes. Markov-perfect industry dynamics: A framework for
empirical work. The Review of Economic Studies, 62(1):53–82, 1995.

[41] G W Evans and S Honkapohja. Learning and Expectations in Macroeconomics. Fron-


tiers of Economic Research. Princeton University Press, 2001.

[42] Pablo Fajgelbaum, Edouard Schaal, and Mathieu Taschereau-Dumouchel. Uncertainty


traps. Technical report, National Bureau of Economic Research, 2015.

[43] M. Friedman. A Theory of the Consumption Function. Princeton University Press,


1956.

[44] Milton Friedman and Rose D Friedman. Two Lucky People. University of Chicago
Press, 1998.

[45] David Gale. The theory of linear economic models. University of Chicago press, 1989.

[46] Albert Gallatin. Report on the finances**, november, 1807. In Reports of the Secretary
of the Treasury of the United States, Vol 1. Government printing office, Washington,
DC, 1837.

[47] Olle Häggström. Finite Markov chains and algorithmic applications, volume 52. Cam-
bridge University Press, 2002.

[48] Robert E Hall. Stochastic Implications of the Life Cycle-Permanent Income Hypothesis:
Theory and Evidence. Journal of Political Economy, 86(6):971–987, 1978.

[49] Robert E Hall and Frederic S Mishkin. The Sensitivity of Consumption to Transitory
Income: Estimates from Panel Data on Households. National Bureau of Economic Re-
search Working Paper Series, No. 505, 1982.

[50] Michael J Hamburger, Gerald L Thompson, and Roman L Weil. Computation of expan-
sion rates for the generalized von neumann model of an expanding economy. Economet-
rica, Journal of the Econometric Society, pages 542–547, 1967.

[51] James D Hamilton. What’s real about the business cycle? Federal Reserve Bank of St.
Louis Review, (July-August):435–452, 2005.

[52] L P Hansen and T J Sargent. Robustness. Princeton University Press, 2008.

[53] L P Hansen and T J Sargent. Recursive Models of Dynamic Linear Economies. The
Gorman Lectures in Economics. Princeton University Press, 2013.
1468 BIBLIOGRAPHY

[54] Lars Peter Hansen and Scott F Richard. The Role of Conditioning Information in De-
ducing Testable. Econometrica, 55(3):587–613, May 1987.

[55] Lars Peter Hansen and Thomas J Sargent. Formulating and estimating dynamic linear
rational expectations models. Journal of Economic Dynamics and control, 2:7–46, 1980.

[56] Lars Peter Hansen and Thomas J Sargent. Wanting robustness in macroeconomics.
Manuscript, Department of Economics, Stanford University., 4, 2000.

[57] Lars Peter Hansen and Thomas J. Sargent. Robust control and model uncertainty.
American Economic Review, 91(2):60–66, 2001.

[58] Lars Peter Hansen and Thomas J Sargent. Robustness. Princeton university press, 2008.

[59] Lars Peter Hansen and Thomas J. Sargent. Recursive Linear Models of Dynamic Eco-
nomics. Princeton University Press, Princeton, New Jersey, 2013.

[60] Lars Peter Hansen and José A Scheinkman. Long-term risk: An operator approach.
Econometrica, 77(1):177–234, 2009.

[61] J. Michael Harrison and David M. Kreps. Speculative investor behavior in a stock mar-
ket with heterogeneous expectations. The Quarterly Journal of Economics, 92(2):323–
336, 1978.

[62] J. Michael Harrison and David M. Kreps. Martingales and arbitrage in multiperiod
securities markets. Journal of Economic Theory, 20(3):381–408, June 1979.

[63] John Heaton and Deborah J Lucas. Evaluating the effects of incomplete markets on risk
sharing and asset pricing. Journal of Political Economy, pages 443–487, 1996.

[64] Elhanan Helpman and Paul Krugman. Market structure and international trade. MIT
Press Cambridge, 1985.

[65] O Hernandez-Lerma and J B Lasserre. Discrete-Time Markov Control Processes: Basic


Optimality Criteria. Number Vol 1 in Applications of Mathematics Stochastic Modelling
and Applied Probability. Springer, 1996.

[66] Hugo A Hopenhayn and Edward C Prescott. Stochastic Monotonicity and Stationary
Distributions for Dynamic Economies. Econometrica, 60(6):1387–1406, 1992.

[67] Hugo A Hopenhayn and Richard Rogerson. Job Turnover and Policy Evaluation: A
General Equilibrium Analysis. Journal of Political Economy, 101(5):915–938, 1993.

[68] Mark Huggett. The risk-free rate in heterogeneous-agent incomplete-insurance


economies. Journal of Economic Dynamics and Control, 17(5-6):953–969, 1993.

[69] K Jänich. Linear Algebra. Springer Undergraduate Texts in Mathematics and Technol-
ogy. Springer, 1994.

[70] Robert J. Shiller John Y. Campbell. The Dividend-Price Ratio and Expectations of
Future Dividends and Discount Factors. Review of Financial Studies, 1(3):195–228,
1988.

[71] Boyan Jovanovic. Firm-specific capital and turnover. Journal of Political Economy,
87(6):1246–1260, 1979.

[72] K L Judd. Cournot versus bertrand: A dynamic resolution. Technical report, Hoover
Institution, Stanford University, 1990.
BIBLIOGRAPHY 1469

[73] Kenneth L Judd. On the performance of patents. Econometrica, pages 567–585, 1985.

[74] Kenneth L. Judd, Sevin Yeltekin, and James Conklin. Computing Supergame Equilib-
ria. Econometrica, 71(4):1239–1254, 07 2003.

[75] Takashi Kamihigashi. Elementary results on solutions to the bellman equation of dy-
namic programming: existence, uniqueness, and convergence. Technical report, Kobe
University, 2012.

[76] John G Kemeny, Oskar Morgenstern, and Gerald L Thompson. A generalization of the
von neumann model of an expanding economy. Econometrica, Journal of the Economet-
ric Society, pages 115–135, 1956.

[77] Tomoo Kikuchi, Kazuo Nishimura, and John Stachurski. Span of control, transaction
costs, and the structure of production chains. Theoretical Economics, 13(2):729–760,
2018.

[78] Tjalling C. Koopmans. On the concept of optimal economic growth. In Tjalling C.


Koopmans, editor, The Economic Approach to Development Planning, page 225–287.
Chicago, 1965.

[79] David M. Kreps. Notes on the Theory of Choice. Westview Press, Boulder, Colorado,
1988.

[80] Moritz Kuhn. Recursive Equilibria In An Aiyagari-Style Economy With Permanent


Income Shocks. International Economic Review, 54:807–835, 2013.

[81] Finn E Kydland and Edward C Prescott. Dynamic optimal taxation, rational expecta-
tions and optimal control. Journal of Economic Dynamics and Control, 2:79–91, 1980.

[82] A Lasota and M C MacKey. Chaos, Fractals, and Noise: Stochastic Aspects of Dynam-
ics. Applied Mathematical Sciences. Springer-Verlag, 1994.

[83] Edward E Leamer. Specification searches: Ad hoc inference with nonexperimental data,
volume 53. John Wiley & Sons Incorporated, 1978.

[84] Martin Lettau and Sydney Ludvigson. Consumption, Aggregate Wealth, and Expected
Stock Returns. Journal of Finance, 56(3):815–849, 06 2001.

[85] Martin Lettau and Sydney C. Ludvigson. Understanding Trend and Cycle in Asset
Values: Reevaluating the Wealth Effect on Consumption. American Economic Review,
94(1):276–299, March 2004.

[86] David Levhari and Leonard J Mirman. The great fish war: an example using a dynamic
cournot-nash solution. The Bell Journal of Economics, pages 322–334, 1980.

[87] L Ljungqvist and T J Sargent. Recursive Macroeconomic Theory. MIT Press, 4 edition,
2018.

[88] Robert E Lucas, Jr. Asset prices in an exchange economy. Econometrica: Journal of the
Econometric Society, 46(6):1429–1445, 1978.

[89] Robert E Lucas, Jr. and Edward C Prescott. Investment under uncertainty. Economet-
rica: Journal of the Econometric Society, pages 659–681, 1971.

[90] Robert E Lucas, Jr. and Nancy L Stokey. Optimal Fiscal and Monetary Policy in an
Economy without Capital. Journal of monetary Economics, 12(3):55–93, 1983.
1470 BIBLIOGRAPHY

[91] Albert Marcet and Thomas J Sargent. Convergence of Least-Squares Learning in En-
vironments with Hidden State Variables and Private Information. Journal of Political
Economy, 97(6):1306–1322, 1989.

[92] V Filipe Martins-da Rocha and Yiannis Vailakis. Existence and Uniqueness of a Fixed
Point for Local Contractions. Econometrica, 78(3):1127–1141, 2010.

[93] A Mas-Colell, M D Whinston, and J R Green. Microeconomic Theory, volume 1. Ox-


ford University Press, 1995.

[94] J J McCall. Economics of Information and Job Search. The Quarterly Journal of Eco-
nomics, 84(1):113–126, 1970.

[95] S P Meyn and R L Tweedie. Markov Chains and Stochastic Stability. Cambridge Uni-
versity Press, 2009.

[96] Mario J Miranda and P L Fackler. Applied Computational Economics and Finance.
Cambridge: MIT Press, 2002.

[97] F. Modigliani and R. Brumberg. Utility analysis and the consumption function: An in-
terpretation of cross-section data. In K.K Kurihara, editor, Post-Keynesian Economics.
1954.

[98] John F Muth. Optimal properties of exponentially weighted forecasts. Journal of the
american statistical association, 55(290):299–306, 1960.

[99] Derek Neal. The Complexity of Job Mobility among Young Men. Journal of Labor
Economics, 17(2):237–261, 1999.

[100] J v Neumann. Zur theorie der gesellschaftsspiele. Mathematische annalen, 100(1):295–


320, 1928.

[101] Sophocles J Orfanidis. Optimum Signal Processing: An Introduction. McGraw Hill


Publishing, New York, New York, 1988.

[102] Jenő Pál and John Stachurski. Fitted value function iteration with probability one con-
tractions. Journal of Economic Dynamics and Control, 37(1):251–264, 2013.

[103] Jonathan A Parker. The Reaction of Household Consumption to Predictable Changes


in Social Security Taxes. American Economic Review, 89(4):959–973, 1999.

[104] Martin L Puterman. Markov decision processes: discrete stochastic dynamic program-
ming. John Wiley & Sons, 2005.

[105] Guillaume Rabault. When do borrowing constraints bind? Some new results on the
income fluctuation problem. Journal of Economic Dynamics and Control, 26(2):217–
245, 2002.

[106] F. P. Ramsey. A Contribution to the theory of taxation. Economic Journal, 37(145):47–


61, 1927.

[107] Kevin L Reffett. Production-based asset pricing in monetary economies with transac-
tions costs. Economica, pages 427–443, 1996.

[108] Michael Reiter. Solving heterogeneous-agent models by projection and perturbation.


Journal of Economic Dynamics and Control, 33(3):649–665, 2009.

[109] Steven Roman. Advanced linear algebra, volume 3. Springer, 2005.


BIBLIOGRAPHY 1471

[110] Sherwin Rosen, Kevin M Murphy, and Jose A Scheinkman. Cattle cycles. Journal of
Political Economy, 102(3):468–492, 1994.

[111] Y. A. Rozanov. Stationary Random Processes. Holden-Day, San Francisco, 1967.

[112] John Rust. Numerical dynamic programming in economics. Handbook of computational


economics, 1:619–729, 1996.

[113] Stephen P Ryan. The costs of environmental regulation in a concentrated industry.


Econometrica, 80(3):1019–1061, 2012.

[114] Jaewoo Ryoo and Sherwin Rosen. The engineering labor market. Journal of political
economy, 112(S1):S110–S140, 2004.

[115] Paul A. Samuelson. Interactions between the multiplier analysis and the principle of
acceleration. Review of Economic Studies, 21(2):75–78, 1939.

[116] Thomas Sargent, Lars Peter Hansen, and Will Roberts. Observable implications of
present value budget balance. In Rational Expectations Econometrics. Westview Press,
1991.

[117] Thomas J Sargent. The Demand for Money During Hyperinflations under Rational
Expectations: I. International Economic Review, 18(1):59–82, February 1977.

[118] Thomas J Sargent. Macroeconomic Theory. Academic Press, New York, 2nd edition,
1987.

[119] Jack Schechtman and Vera L S Escudero. Some results on an income fluctuation prob-
lem. Journal of Economic Theory, 16(2):151–166, 1977.

[120] Jose A. Scheinkman. Speculation, Trading, and Bubbles. Columbia University Press,
New York, 2014.

[121] Thomas C Schelling. Models of Segregation. American Economic Review, 59(2):488–


493, 1969.

[122] A N Shiriaev. Probability. Graduate texts in mathematics. Springer. Springer, 2nd


edition, 1995.

[123] N L Stokey, R E Lucas, and E C Prescott. Recursive Methods in Economic Dynamics.


Harvard University Press, 1989.

[124] Nancy L Stokey. Reputation and time consistency. The American Economic Review,
pages 134–139, 1989.

[125] Nancy L. Stokey. Credible public policy. Journal of Economic Dynamics and Control,
15(4):627–656, October 1991.

[126] Kjetil Storesletten, Christopher I Telmer, and Amir Yaron. Consumption and risk shar-
ing over the life cycle. Journal of Monetary Economics, 51(3):609–633, 2004.

[127] R K Sundaram. A First Course in Optimization Theory. Cambridge University Press,


1996.

[128] George Tauchen. Finite state markov-chain approximations to univariate and vector
autoregressions. Economics Letters, 20(2):177–181, 1986.

[129] Daniel Treisman. Russia’s billionaires. The American Economic Review, 106(5):236–241,
2016.
1472 BIBLIOGRAPHY

[130] Ngo Van Long. Dynamic games in the economics of natural resources: a survey. Dy-
namic Games and Applications, 1(1):115–148, 2011.

[131] John Von Neumann. Uber ein okonomsiches gleichungssystem und eine verallgemeiner-
ing des browerschen fixpunktsatzes. In Erge. Math. Kolloq., volume 8, pages 73–83,
1937.

[132] Abraham Wald. Sequential Analysis. John Wiley and Sons, New York, 1947.

[133] Peter Whittle. Prediction and regulation by linear least-square methods. English Univ.
Press, 1963.

[134] Peter Whittle. Prediction and Regulation by Linear Least Squares Methods. University
of Minnesota Press, Minneapolis, Minnesota, 2nd edition, 1983.

[135] Jeffrey M Wooldridge. Introductory econometrics: A modern approach. Nelson Educa-


tion, 2015.

[136] G Alastair Young and Richard L Smith. Essentials of statistical inference. Cambridge
University Press, 2005.
